Prerequisites

This book assumes significant prior knowledge. It is not an introduction to data science or machine learning — it is the advanced sequel.

Required Knowledge

Python Programming

  • Object-oriented programming: classes, inheritance, composition, dunder methods
  • Decorators, context managers, generators
  • Type hints and their practical use
  • Package structure and virtual environments
  • Comfortable reading and writing production-quality Python (not just notebooks)
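As a rough calibration of the Python bar, the sketch below combines several of the features listed above: a dataclass with type hints and a dunder method, a decorator, a context manager, and a generator. All names in it (`FeatureSpec`, `count_calls`, `timer`, `batches`, `total`) are hypothetical and are not taken from the book's code. If it reads comfortably without a reference, you are probably at the expected level.

```python
from __future__ import annotations

import functools
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class FeatureSpec:
    """Illustrative container: a named group of feature columns."""

    name: str
    columns: list[str] = field(default_factory=list)

    def __len__(self) -> int:
        # Dunder method: len(spec) reports how many columns the spec holds.
        return len(self.columns)


def count_calls(func: Callable[..., T]) -> Callable[..., T]:
    """Decorator that counts calls in a `calls` attribute on the wrapper."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        wrapper.calls += 1  # type: ignore[attr-defined]
        return func(*args, **kwargs)

    wrapper.calls = 0  # type: ignore[attr-defined]
    return wrapper


@contextmanager
def timer(label: str) -> Iterator[None]:
    """Context manager that prints the wall-clock time spent inside the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.4f}s")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Generator yielding consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


@count_calls
def total(values: Sequence[float]) -> float:
    return sum(values)


spec = FeatureSpec("numeric", ["age", "income", "tenure"])
with timer("batched sums"):
    sums = [total(batch) for batch in batches([1.0, 2.0, 3.0, 4.0, 5.0], 2)]
print(len(spec), sums, total.calls)  # 3 [3.0, 7.0, 5.0] 3
```

The decorator stores its counter on the wrapper function itself, which keeps the example short; a class-based decorator would be the more explicit alternative.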

Machine Learning

  • The full ML workflow: problem framing → data preparation → feature engineering → model selection → evaluation → deployment
  • Supervised learning: linear/logistic regression, decision trees, random forests, gradient boosting (XGBoost or LightGBM)
  • Unsupervised learning: k-means, PCA, basic clustering evaluation
  • Model evaluation: cross-validation, train/validation/test splits, ROC curves, precision-recall, calibration
  • Regularization: L1, L2, early stopping — conceptually and practically
  • Scikit-learn pipelines and custom transformers
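Cross-validation underpins most of the evaluation topics above. In practice you would use scikit-learn's `KFold` or `cross_val_score`; the stdlib-only sketch below re-implements just the index splitting so the mechanics are visible. The helpers `kfold_indices` and `cross_val_mse` are hypothetical, and the "model" (predict the mean of the training folds) is a deliberately trivial baseline chosen to keep the example free of dependencies.

```python
from __future__ import annotations

import random
from statistics import mean
from typing import Iterator, Sequence


def kfold_indices(
    n_samples: int, n_folds: int, seed: int | None = 0
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, val_idx) pairs; each index is validated on exactly once."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)  # shuffle once, then cut into folds
    # The first n_samples % n_folds folds get one extra sample, so sizes differ by <= 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start : start + size]
        train_idx = indices[:start] + indices[start + size :]
        yield train_idx, val_idx
        start += size


def cross_val_mse(y: Sequence[float], n_folds: int = 5) -> list[float]:
    """Per-fold MSE of a baseline that predicts the mean of the training folds."""
    scores: list[float] = []
    for train_idx, val_idx in kfold_indices(len(y), n_folds):
        prediction = mean(y[i] for i in train_idx)  # "fit" on training folds only
        scores.append(mean((y[i] - prediction) ** 2 for i in val_idx))
    return scores


y = [float(v) for v in range(10)]
print(cross_val_mse(y, n_folds=5))  # five non-negative per-fold MSEs
```

The key property is that the validation data for each fold never influences the "fit" for that fold; a real pipeline must preserve the same separation for every preprocessing step, which is what scikit-learn pipelines enforce.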

Mathematics

  • Linear algebra: Vectors, matrices, dot products, matrix multiplication, transpose, inverse. You should be comfortable with the notation $A\mathbf{x} = \mathbf{b}$. Familiarity with eigenvalues and eigenvectors helps, but they are derived from scratch in Chapter 1.
  • Calculus: Derivatives, partial derivatives, the chain rule. You should understand that the gradient points in the direction of steepest ascent. Multivariable calculus is covered rigorously in Chapter 2.
  • Probability: Random variables, probability distributions (normal, binomial), expected value, variance, conditional probability, Bayes' theorem. Formal probability theory is developed in Chapter 3.
  • Statistics: Hypothesis testing, confidence intervals, p-values, Central Limit Theorem. Familiarity with Bayesian reasoning is helpful.
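The linear algebra and calculus bullets can be checked numerically. The sketch below computes $A\mathbf{x}$ as a weighted sum of the columns of $A$ (the view that gives the product its geometric meaning) and compares a chain-rule derivative against a central finite difference. The helper names and the test function $f(x) = \sin(x^2)$ are illustrative choices, not taken from the book.

```python
import math
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    """Compute A x as a weighted sum of A's columns: sum_j x[j] * (column j of A)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have len(x) entries")
    result = [0.0] * len(A)
    for j, weight in enumerate(x):  # one column of A at a time
        for i, row in enumerate(A):
            result[i] += weight * row[j]
    return result


def central_difference(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) by (f(x + h) - f(x - h)) / (2h); error shrinks like h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x: float) -> float:
    return math.sin(x ** 2)


def f_prime(x: float) -> float:
    # Chain rule: outer derivative cos(u) at u = x**2, times inner derivative 2x.
    return math.cos(x ** 2) * 2 * x


rotate_90 = [[0.0, -1.0], [1.0, 0.0]]
print(matvec(rotate_90, [1.0, 0.0]))  # [0.0, 1.0]: the x-axis is rotated onto the y-axis
print(abs(central_difference(f, 1.3) - f_prime(1.3)) < 1e-6)  # True
```

Because $A\mathbf{x}$ is a combination of the columns of $A$, reading off where $A$ sends each basis vector (its columns) tells you what the matrix does geometrically; here the columns show a 90° counterclockwise rotation.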

SQL

  • SELECT, JOIN (inner, left, right), GROUP BY, HAVING, subqueries
  • Window functions and CTEs are covered at an advanced level in the book, but basic familiarity helps
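For calibration, the query below uses a CTE and two window functions: a cumulative running average and a 3-day trailing average of daily totals. It runs through Python's built-in `sqlite3` module, so no database server is needed; this assumes the bundled SQLite is version 3.25 or newer, the first release with window-function support. The `sales` table, its columns, and the data are made up for the example.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing

# A CTE collapses raw rows to daily totals; two window functions then average them.
QUERY = """
WITH daily AS (
    SELECT day, SUM(amount) AS total
    FROM sales
    GROUP BY day
)
SELECT
    day,
    total,
    AVG(total) OVER (
        ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg,
    AVG(total) OVER (
        ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS trailing_avg_3d
FROM daily
ORDER BY day
"""


def daily_averages(rows: list[tuple[str, float]]) -> list[tuple[str, float, float, float]]:
    """Load (day, amount) rows into an in-memory table and run QUERY."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()


rows = [
    ("2024-01-01", 10.0),
    ("2024-01-01", 5.0),
    ("2024-01-02", 30.0),
    ("2024-01-03", 15.0),
    ("2024-01-04", 45.0),
]
for row in daily_averages(rows):
    print(row)  # last row: ('2024-01-04', 45.0, 26.25, 30.0)
```

The `ROWS BETWEEN ...` frame is what distinguishes the two averages: `UNBOUNDED PRECEDING` grows the window from the first day, while `2 PRECEDING` keeps it at most three rows wide.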

Software Engineering

  • Git: branching, merging, pull requests
  • Command line: basic navigation, package management
  • Experience with at least one deployment (API, batch prediction, or embedded model)

Helpful but Not Required

  • Docker: Basic containerization (Dockerfile, docker-compose). Covered as needed in Part V.
  • Cloud platforms: Basic familiarity with AWS, GCP, or Azure. GPU instances are used in Part II.
  • Deep learning: Any prior exposure to neural networks (even a tutorial) will make Part II more comfortable. Chapter 6 builds from scratch but moves fast.
  • Causal inference: Any prior exposure will help with Part III, but no causal inference background is assumed.
  • Bayesian statistics: Having encountered Bayes' theorem in a statistics course is sufficient. Part IV develops Bayesian methods from the ground up.

Self-Assessment

If you can answer "yes" to all of the following, you are ready for this book:

  1. Can you build a gradient boosting model with scikit-learn, including feature engineering, cross-validation, and hyperparameter tuning?
  2. Can you explain the bias-variance tradeoff and how regularization addresses it?
  3. Can you write a Python class with proper __init__, type hints, and docstrings?
  4. Can you compute a matrix-vector product by hand and explain what it means geometrically?
  5. Can you take the derivative of $f(x) = x^2 e^{-x}$ using the product rule?
  6. Can you write a SQL query with a window function to compute a running average?
  7. Have you deployed at least one ML model (even a simple one) to production?

If you answered "no" to more than two of these, consider starting with the Intermediate Data Science textbook (Book 2 in this trilogy) or strengthening the specific skills before proceeding. Part I (Chapters 1-5) will reinforce mathematical foundations, but it assumes prior exposure — it does not teach these concepts for the first time.

Technical Environment

  • Python 3.11 or later (3.11 itself is recommended for PyTorch compatibility)
  • GPU access recommended for Part II, required for Chapter 26
  • See Appendix D for complete environment setup instructions
  • See requirements.txt for all Python dependencies
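Before starting, a quick check of the interpreter and GPU visibility can save time. The sketch below is not the book's setup procedure (Appendix D covers that); `check_environment` is a hypothetical helper. It treats "GPU access" as PyTorch reporting a CUDA device via `torch.cuda.is_available()`, which is an assumption; other accelerators are not checked.

```python
from __future__ import annotations

import importlib.util
import platform
import sys


def check_environment(min_python: tuple[int, int] = (3, 11)) -> dict[str, object]:
    """Report the Python version and, if PyTorch is installed, whether CUDA is visible."""
    report: dict[str, object] = {
        "python": platform.python_version(),
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key:>15}: {value}")
```

PyTorch is imported only if `importlib.util.find_spec` finds it, so the script also runs on a machine where PyTorch is not yet installed.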