Prerequisites
This book assumes significant prior knowledge. It is not an introduction to data science or machine learning — it is the advanced sequel.
Required Knowledge
Python Programming
- Object-oriented programming: classes, inheritance, composition, dunder methods
- Decorators, context managers, generators
- Type hints and their practical use
- Package structure and virtual environments
- Comfortable reading and writing production-quality Python (not just notebooks)
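As a rough calibration of the level assumed, the sketch below combines the features listed above in one small module. Every name in it (Vector, logged, record, running_sum) is hypothetical and chosen only for illustration; it is not code from the book. If each construct reads naturally, the Python prerequisite is probably met.

```python
from collections.abc import Iterable, Iterator
from contextlib import contextmanager
from functools import wraps


class Vector:
    """A small 2-D vector with a few dunder methods (hypothetical example)."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __add__(self, other: "Vector") -> "Vector":
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self) -> str:
        return f"Vector({self.x!r}, {self.y!r})"


def logged(func):
    """Decorator that appends the wrapped function's name to logged.calls."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        logged.calls.append(func.__name__)
        return func(*args, **kwargs)

    return wrapper


logged.calls = []  # simple module-level call log for the decorator


@contextmanager
def record(events: list[str], label: str) -> Iterator[None]:
    """Context manager that brackets a block with start/end markers."""
    events.append(f"start {label}")
    try:
        yield
    finally:
        # Runs even if the block raises, like any well-behaved cleanup.
        events.append(f"end {label}")


@logged
def running_sum(vectors: Iterable[Vector]) -> Iterator[Vector]:
    """Generator that lazily yields the cumulative sum of the input vectors."""
    total = Vector(0.0, 0.0)
    for v in vectors:
        total = total + v
        yield total
```

A typical use would be `with record(events, "sum"): list(running_sum(vectors))`, which exercises the context manager, the decorator, and the generator together.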
Machine Learning
- The full ML workflow: problem framing → data preparation → feature engineering → model selection → evaluation → deployment
- Supervised learning: linear/logistic regression, decision trees, random forests, gradient boosting (XGBoost or LightGBM)
- Unsupervised learning: k-means, PCA, basic clustering evaluation
- Model evaluation: cross-validation, train/validation/test splits, ROC curves, precision-recall, calibration
- Regularization: L1, L2, early stopping — conceptually and practically
- Scikit-learn pipelines and custom transformers
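To make the last item concrete, here is a minimal sketch of a custom transformer used inside a scikit-learn pipeline and scored with cross-validation. The ClipOutliers class, the evaluate function, the synthetic dataset, and all parameter values are assumptions made for illustration, not examples taken from the book.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


class ClipOutliers(BaseEstimator, TransformerMixin):
    """Clip each feature to quantile bounds learned during fit (hypothetical)."""

    def __init__(self, lower: float = 0.01, upper: float = 0.99) -> None:
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Bounds are learned from the training fold only, so no test data leaks in.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


def evaluate(seed: int = 0) -> np.ndarray:
    """Return 5-fold ROC AUC scores for the pipeline on synthetic data."""
    X, y = make_classification(n_samples=300, n_features=8, random_state=seed)
    model = Pipeline(
        [
            ("clip", ClipOutliers()),
            ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        ]
    )
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc")
```

Placing the transformer inside the pipeline means it is refit on each training fold, which is the main reason to prefer pipelines over preprocessing the full dataset up front.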
Mathematics
- Linear algebra: Vectors, matrices, dot products, matrix multiplication, transpose, inverse. You should be comfortable with the notation $\mathbf{Ax} = \mathbf{b}$. Eigenvalues and eigenvectors are helpful but will be derived from scratch in Chapter 1.
- Calculus: Derivatives, partial derivatives, the chain rule. You should understand that the gradient points in the direction of steepest ascent. Multivariable calculus is covered rigorously in Chapter 2.
- Probability: Random variables, probability distributions (normal, binomial), expected value, variance, conditional probability, Bayes' theorem. Formal probability theory is developed in Chapter 3.
- Statistics: Hypothesis testing, confidence intervals, p-values, Central Limit Theorem. Familiarity with Bayesian reasoning is helpful.
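The claim that the gradient points in the direction of steepest ascent can be checked numerically. The sketch below is only an illustration: the function f, the evaluation point, and the grid of candidate directions are arbitrary choices, not material from the book.

```python
import math


def f(x: float, y: float) -> float:
    """Example scalar field, chosen arbitrarily for illustration."""
    return x**2 + 3.0 * y


def gradient(x: float, y: float) -> tuple[float, float]:
    """Analytic gradient of f: (df/dx, df/dy) = (2x, 3)."""
    return (2.0 * x, 3.0)


def directional_derivative(x: float, y: float, theta: float, h: float = 1e-6) -> float:
    """Central-difference slope of f at (x, y) along the unit vector at angle theta."""
    dx, dy = math.cos(theta), math.sin(theta)
    return (f(x + h * dx, y + h * dy) - f(x - h * dx, y - h * dy)) / (2.0 * h)


def steepest_direction(x: float, y: float, n: int = 3600) -> float:
    """Angle in radians, among n evenly spaced directions, with the largest slope."""
    angles = [2.0 * math.pi * k / n for k in range(n)]
    return max(angles, key=lambda t: directional_derivative(x, y, t))
```

At the point (1, 1) the angle returned by `steepest_direction` should agree, up to the grid resolution, with `math.atan2(3.0, 2.0)`, the angle of the analytic gradient (2, 3). The slope in that direction should be close to the gradient's norm.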
SQL
- SELECT, JOIN (inner, left, right), GROUP BY, HAVING, subqueries
- Window functions and CTEs are covered at an advanced level in the book, but basic familiarity helps
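For a sense of what "basic familiarity" with window functions and CTEs means here, the following sketch runs one such query through Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are invented for illustration, and the query assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

QUERY = """
WITH daily AS (              -- CTE: collapse orders to one total per day
    SELECT day, SUM(amount) AS total
    FROM orders
    GROUP BY day
)
SELECT
    day,
    AVG(total) OVER (        -- window function: average of all days so far
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg
FROM daily
ORDER BY day
"""


def running_average() -> list[tuple[int, float]]:
    """Build a tiny in-memory table and return (day, running average) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (day INTEGER, amount REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 4.0), (1, 6.0), (2, 20.0), (3, 60.0), (4, 10.0), (4, 20.0)],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

The daily totals in this sample are 10, 20, 60, and 30, so the running averages come out to 10, 15, 30, and 30.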
Software Engineering
- Git: branching, merging, pull requests
- Command line: basic navigation, package management
- Experience with at least one deployment (API, batch prediction, or embedded model)
Recommended but Not Required
- Docker: Basic containerization (Dockerfile, docker-compose). Covered as needed in Part V.
- Cloud platforms: Basic familiarity with AWS, GCP, or Azure. GPU instances are used in Part II.
- Deep learning: Any prior exposure to neural networks (even a tutorial) will make Part II more comfortable. Chapter 6 builds from scratch but moves fast.
- Causal inference: Any prior exposure will help with Part III, but no causal inference background is assumed.
- Bayesian statistics: Having encountered Bayes' theorem in a statistics course is sufficient. Part IV develops Bayesian methods from the ground up.
Self-Assessment
If you can answer "yes" to all of the following, you are ready for this book:
- Can you build a gradient boosting model with scikit-learn, including feature engineering, cross-validation, and hyperparameter tuning?
- Can you explain the bias-variance tradeoff and how regularization addresses it?
- Can you write a Python class with proper __init__, type hints, and docstrings?
- Can you compute a matrix-vector product by hand and explain what it means geometrically?
- Can you take the derivative of $f(x) = x^2 e^{-x}$ using the product rule?
- Can you write a SQL query with a window function to compute a running average?
- Have you deployed at least one ML model (even a simple one) to production?
If you answered "no" to more than two of these, consider starting with the Intermediate Data Science textbook (Book 2 in this trilogy) or strengthening the specific skills before proceeding. Part I (Chapters 1-5) will reinforce mathematical foundations, but it assumes prior exposure — it does not teach these concepts for the first time.
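As a reference point for the calculus question in the list above, the product rule applied to $f(x) = x^2 e^{-x}$ gives the following. This is a worked check of the expected fluency, not a derivation taken from a later chapter.

```latex
\begin{aligned}
f'(x) &= \frac{d}{dx}\left(x^2\right) e^{-x} + x^2 \, \frac{d}{dx}\left(e^{-x}\right) \\
      &= 2x\,e^{-x} - x^2 e^{-x} \\
      &= x(2 - x)\,e^{-x}
\end{aligned}
```

If each step is immediate, including the sign from differentiating $e^{-x}$, the calculus prerequisite is likely met.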
Technical Environment
- Python 3.11+ (3.11 recommended for PyTorch compatibility)
- GPU access recommended for Part II, required for Chapter 26
- See Appendix D for complete environment setup instructions
- See requirements.txt for all Python dependencies