Further Reading — Chapter 1: Linear Algebra for Machine Learning


1. Gilbert Strang, Introduction to Linear Algebra, 6th edition (Wellesley-Cambridge Press, 2023)

The definitive pedagogical treatment of linear algebra, and the source of the "four fundamental subspaces" framework used in this chapter. Strang's exposition is remarkable for making abstract concepts geometric and intuitive without sacrificing rigor. Chapters 6-7 (eigenvalues and SVD) are directly relevant, and the entire book repays careful study. His MIT OpenCourseWare lectures (18.06) are an excellent companion. If you read one linear algebra textbook in your career, make it this one.

Difficulty: Accessible to anyone with calculus background. Undergraduate to early graduate level.


2. Halko, Martinsson, and Tropp, "Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions," SIAM Review 53(2), 2011

The foundational paper on randomized SVD algorithms. This paper shows that by multiplying a matrix by a random projection and then computing the SVD of the smaller result, you can obtain an accurate rank-$k$ approximation in $O(mnk)$ time instead of $O(mn \min(m,n))$. The paper is remarkably well-written for a mathematics journal article — the exposition is clear, the examples are concrete, and the error bounds are practical. Essential reading for anyone working with matrices too large for exact SVD.

Difficulty: Graduate level. Requires comfort with probability and linear algebra. The main algorithms (Sections 4-5) are accessible; the proofs (Sections 8-10) are more demanding.
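
The two-stage scheme the paper analyzes — sample the range of the matrix with a random test matrix, then take an exact SVD of the small projected matrix — can be sketched in a few lines of NumPy. The function below is an illustrative implementation of that basic idea, not the paper's reference code; the oversampling amount and seed are assumptions.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    # Stage 1: orthonormal basis Q whose span approximates the range of A,
    # found by multiplying A with a Gaussian test matrix and orthogonalizing.
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    # Stage 2: exact SVD of the small (k+p) x n matrix B = Q^T A,
    # then lift the left factor back to the original space.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Exactly rank-5 test matrix: the range finder captures it almost surely,
# so the rank-5 approximation is accurate to near machine precision.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 300))
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
```

The expensive step is the single pass `A @ Omega`, which is where the $O(mnk)$ cost comes from; everything downstream operates on $(k+p)$-sized objects.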


3. Koren, Bell, and Volinsky, "Matrix Factorization Techniques for Recommender Systems," IEEE Computer 42(8), 2009

The paper that popularized matrix factorization for recommendation, written by the team that won the Netflix Prize. It covers the transition from pure SVD to regularized matrix factorization with SGD, including bias terms, temporal dynamics, and implicit feedback. The writing is accessible and practical — it explains not just the math but the engineering decisions (how to handle missing data, how to tune regularization, what features of the singular value spectrum matter). This remains the starting point for understanding collaborative filtering.

Difficulty: Accessible to anyone who has completed this chapter. Applied mathematics with clear engineering motivation.
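The regularized model with bias terms that the paper describes — predict $\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^\top \mathbf{q}_i$ and minimize squared error plus an $\ell_2$ penalty by SGD over observed ratings only — can be sketched as follows. This is a minimal illustration, not the Netflix Prize code; the hyperparameter values are assumptions chosen for the toy data.

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=4, lr=0.02, reg=0.05, epochs=300, seed=0):
    # Latent factors and biases; prediction is mu + b_u + b_i + p_u . q_i.
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    for _ in range(epochs):
        for u, i, r in ratings:  # iterate over observed entries only
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            # Simultaneous update: right-hand sides use the old P[u], Q[i].
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

# Toy (user, item, rating) triples; unobserved cells are simply absent,
# which is how the model sidesteps the missing-data problem of plain SVD.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
mu, bu, bi, P, Q = mf_sgd(ratings, n_users=3, n_items=3)
rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                        for u, i, r in ratings]))
```

Training only on observed triples, rather than imputing a dense matrix, is the key engineering decision the paper highlights.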


4. Petersen and Pedersen, The Matrix Cookbook (Technical University of Denmark, 2012)

A freely available reference document (not a textbook) containing hundreds of matrix calculus identities, organized by category: derivatives of traces, determinants, inverses, eigenvalues, Kronecker products, and more. When you need to derive a gradient expression and cannot remember whether $\frac{\partial}{\partial \mathbf{X}} \text{tr}(\mathbf{AXB}) = \mathbf{A}^\top \mathbf{B}^\top$ or $\mathbf{B}^\top \mathbf{A}^\top$, this is where you look. Keep it bookmarked — you will reference it throughout this book. Available at: matrixcookbook.com.

Difficulty: Reference document, not sequential reading. Requires the linear algebra and matrix calculus background from this chapter.
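When memory fails and the Cookbook is out of reach, a finite-difference check settles such identities numerically in seconds. The sketch below confirms that $\frac{\partial}{\partial \mathbf{X}} \text{tr}(\mathbf{AXB}) = \mathbf{A}^\top \mathbf{B}^\top$ is the correct form (since the function is linear in $\mathbf{X}$, central differences are exact up to floating point).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # shapes chosen so that A X B is square
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

f = lambda X: np.trace(A @ X @ B)

# Central-difference gradient of f with respect to each entry of X.
eps = 1e-6
G = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

# G matches A^T B^T (shape (3, 4), the same as X), not B^T A^T.
match = np.allclose(G, A.T @ B.T, atol=1e-5)
```

The derivation is one line: $d\,\text{tr}(\mathbf{AXB}) = \text{tr}(\mathbf{BA}\,d\mathbf{X})$, so the gradient is $(\mathbf{BA})^\top = \mathbf{A}^\top \mathbf{B}^\top$.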


5. Kolda and Bader, "Tensor Decompositions and Applications," SIAM Review 51(3), 2009

The comprehensive survey on tensor decomposition methods, including CP (CANDECOMP/PARAFAC) and Tucker decompositions. The paper covers definitions, algorithms, applications, and open problems. Sections 1-4 (definitions and CP decomposition) connect directly to Case Study 2 in this chapter. The discussion of uniqueness conditions for CP decomposition is particularly valuable — unlike matrix factorizations, which are non-unique without added constraints such as orthogonality, the CP decomposition is essentially unique under mild conditions, which has important implications for interpretability.

Difficulty: Graduate level. The survey is comprehensive (over 60 pages) but well-organized. Read sections selectively based on interest.
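The alternating least squares (ALS) approach the survey covers for CP can be sketched for a 3-way tensor: fix two factor matrices, solve a linear least-squares problem for the third via the Khatri-Rao product, and cycle. The unfolding conventions and the use of a pseudoinverse below are simplifications for clarity, not the survey's exact formulation.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product: maps (I, R) and (J, R) to (I*J, R).
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    # Fit T[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] by alternating least squares.
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding: T1 = A @ KR(B,C).T
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding
    for _ in range(iters):
        # Each update is the least-squares solution with the other two fixed.
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Exactly rank-2 tensor: ALS should recover a near-exact fit.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((d, 2)) for d in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=2)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
```

Note that the recovered factors match the originals only up to column scaling and permutation — exactly the "essential uniqueness" the survey's uniqueness conditions make precise.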


Further Reading for Chapter 1 of Advanced Data Science