Part I: Mathematical and Computational Foundations

"The frontier moves fast, but fundamentals move slow. Attention mechanisms are built on linear algebra that has not changed in a hundred years. If you master the foundations, every new paper becomes readable."


Why This Part Exists

Every algorithm in this book — every neural network, every causal estimator, every Bayesian model, every production optimization — reduces to a sequence of mathematical operations. This part ensures you understand those operations deeply enough to derive, debug, and extend the methods that follow.

This is not a review. The intermediate textbook covered basic linear algebra, introductory calculus, and applied statistics. Here we go further: eigendecomposition and SVD (the X-ray of every matrix in machine learning), multivariate calculus and the optimization landscape (why your model converges or doesn't), rigorous probability theory (the language every model speaks), information theory (why your loss function works), and computational complexity (what is possible before you write a line of code).
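To make the "X-ray" framing concrete, here is a minimal NumPy sketch (the matrix values are arbitrary illustration data, not from the book): the SVD factors a matrix into orthogonal directions scaled by singular values, and truncating those values gives the best low-rank approximation.

```python
import numpy as np

# A small data matrix: 4 samples, 3 features (illustrative values only).
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])

# SVD factors X into U @ diag(s) @ Vt; the singular values in s reveal
# the effective rank and dominant directions of the matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Using all singular values reconstructs X exactly.
X_full = (U * s) @ Vt
assert np.allclose(X_full, X)

# Keeping only the largest singular value gives the best rank-1
# approximation of X in the least-squares sense.
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.round(s, 3))
```

Chapter 1 derives why the truncated reconstruction is optimal (the Eckart-Young theorem territory); here the point is only that three lines of code expose a matrix's internal structure.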

Every concept gets three treatments: the intuition, the mathematics, and the implementation. By the end of Part I, you will have the mathematical vocabulary to read any ML paper published in the last decade.

Chapters in This Part

  • Chapter 1, Linear Algebra for Machine Learning: Eigendecomposition, SVD, matrix calculus, tensor operations
  • Chapter 2, Multivariate Calculus and Optimization: Gradients, Jacobians, backpropagation derivation, SGD variants
  • Chapter 3, Probability Theory and Statistical Inference: MLE derivation, exponential families, Monte Carlo methods
  • Chapter 4, Information Theory for Data Science: Entropy, KL divergence, mutual information, the ELBO
  • Chapter 5, Computational Complexity and Scalability: Algorithm analysis, approximate methods, GPU profiling

Progressive Project Milestones

  • M0 (Chapter 1): Define the recommendation problem as matrix completion. Analyze the user-item interaction matrix.
  • M1 (Chapter 5): Profile the naive approach. Implement approximate nearest neighbor retrieval. Establish latency budgets.
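As a taste of M0, one very simple matrix-completion baseline can be sketched with tools from Chapter 1 alone. This is an assumed toy setup (a dense ratings array with NaN for unobserved entries), not the project's final method: iteratively project onto rank-k matrices via truncated SVD, then restore the observed entries.

```python
import numpy as np

# Toy user-item ratings; NaN marks unobserved entries (illustrative data).
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0]])

mask = ~np.isnan(R)                        # True where a rating is observed
filled = np.where(mask, R, np.nanmean(R))  # crude init: fill with global mean

# Alternating projection: snap to the nearest rank-k matrix via truncated
# SVD, then overwrite the observed cells with their true ratings.
k = 2
for _ in range(50):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
    filled = np.where(mask, R, low_rank)   # keep observed, impute missing

print(np.round(filled, 2))
```

The loop never changes an observed rating; only the NaN cells are imputed from the low-rank structure. Chapter 1 analyzes when and why a low-rank assumption on the user-item matrix is reasonable.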

Prerequisites

A strong intuitive understanding of linear algebra, single-variable calculus, and probability distributions. This part makes that intuition rigorous.
