Chapter 2: Further Reading

Textbooks

Foundational Linear Algebra

  • Strang, G. (2016). Introduction to Linear Algebra, 5th ed. Wellesley-Cambridge Press. The gold standard for learning linear algebra with geometric intuition. Strang's emphasis on the "four fundamental subspaces" provides exactly the perspective AI engineers need. The companion MIT OpenCourseWare lectures (18.06) are freely available and highly recommended.

  • Axler, S. (2024). Linear Algebra Done Right, 4th ed. Springer. A more abstract, proof-oriented treatment that avoids determinants until the end. Excellent for developing deep conceptual understanding of vector spaces, linear maps, and the spectral theorem. The fourth edition is available as an open-access PDF.

  • Hoffman, K. & Kunze, R. (1971). Linear Algebra, 2nd ed. Prentice Hall. A rigorous classic for readers who want complete mathematical foundations. Best suited as a reference after building initial intuition from Strang or Axler.

Linear Algebra for Machine Learning and Data Science

  • Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press. Chapter 2 (Linear Algebra), Chapter 3 (Analytic Geometry), and Chapter 4 (Matrix Decompositions) map directly to the topics in this chapter. Available free online at mml-book.github.io. This is perhaps the most directly relevant reference for supplementing the material in this textbook.

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapter 2 provides a concise review of linear algebra tailored to deep learning practitioners. Covers everything from basic operations to eigendecomposition and the trace operator. Available free at deeplearningbook.org.

  • Boyd, S. & Vandenberghe, L. (2018). Introduction to Applied Linear Algebra. Cambridge University Press. An applications-first approach that connects every concept to data science and engineering problems. The companion Julia/Python notebooks are excellent for hands-on practice. Available free at vmls-book.stanford.edu.

Online Courses and Video Series

  • 3Blue1Brown: Essence of Linear Algebra (YouTube). Grant Sanderson's animated video series is the single best resource for building geometric intuition about linear algebra. Each 10-15 minute video covers one concept (span, linear transformations, determinants, eigenvectors, etc.) with stunning visualizations. Start here if you want to truly see linear algebra.

  • MIT 18.06: Linear Algebra (Gilbert Strang, MIT OpenCourseWare). The complete lecture series accompanying Strang's textbook. 34 lectures covering the full undergraduate linear algebra curriculum. Available free at ocw.mit.edu.

  • fast.ai: Computational Linear Algebra for Coders. A top-down, code-first course that teaches linear algebra through applications including NLP, image compression, and PageRank. Uses Python and NumPy throughout. Ideal for programmers who learn best by doing.

Key Papers

Singular Value Decomposition

  • Eckart, C. & Young, G. (1936). "The approximation of one matrix by another of lower rank." Psychometrika, 1(3), 211-218. The original proof that the truncated SVD gives the best low-rank approximation. A foundational result referenced throughout this chapter's treatment of SVD, and illustrated numerically in the sketch after this list.

  • Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM Review, 53(2), 217-288. The definitive reference on randomized SVD algorithms, which make SVD practical for the very large matrices encountered in modern AI. Exercise 2.31 is based on the algorithm presented here; a minimal version of the idea is sketched below.
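
The sketch below is a minimal NumPy illustration of both results, using an arbitrary random matrix, target rank, and oversampling parameter (none of these values come from the chapter). It computes the exact rank-k truncation, which the Eckart-Young theorem guarantees is the best Frobenius-norm approximation, and compares it against a basic randomized approximation in the spirit of Halko et al.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 200)) @ rng.standard_normal((200, 300))  # arbitrary 1000 x 300 matrix
    k, oversample = 20, 10

    # Exact truncated SVD: by Eckart-Young, the best rank-k approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] * s[:k] @ Vt[:k, :]

    # Basic randomized SVD: sample the range of A with a random test matrix,
    # orthonormalize, then take the SVD of the much smaller projected matrix.
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    Uh, sh, Vth = np.linalg.svd(Q.T @ A, full_matrices=False)
    A_rand = (Q @ Uh[:, :k]) * sh[:k] @ Vth[:k, :]

    print("optimal rank-k error:   ", np.linalg.norm(A - A_k))
    print("randomized rank-k error:", np.linalg.norm(A - A_rand))

The randomized error will typically be close to, but never below, the optimal one; power iterations, covered in the paper, tighten the gap further.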

PCA and Dimensionality Reduction

  • Jolliffe, I. T. & Cadima, J. (2016). "Principal component analysis: a review and recent developments." Philosophical Transactions of the Royal Society A, 374(2065). A comprehensive modern review of PCA, covering theoretical foundations, computational methods, sparse PCA, and applications. Useful for understanding the broader landscape beyond the basics covered in Case Study 2.

  • Shlens, J. (2014). "A tutorial on principal component analysis." arXiv:1404.1100. A clear, well-written tutorial that derives PCA from first principles with an emphasis on intuition. Recommended for solidifying your understanding after working through this chapter; a minimal PCA computation is sketched after this list.
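
As a complement to these references, here is a minimal PCA computation in NumPy following the standard center-then-SVD recipe; the data matrix and the number of components kept are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))  # 500 samples, 10 correlated features
    k = 3                                                               # number of components to keep

    # Center the data: PCA operates on deviations from the mean.
    Xc = X - X.mean(axis=0)

    # SVD of the centered data matrix. Rows of Vt are the principal directions,
    # and s**2 / (n - 1) gives the eigenvalues of the sample covariance matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                  # principal axes, shape (k, 10)
    scores = Xc @ components.T           # data projected onto the top-k axes, shape (500, k)
    explained_var = s**2 / (len(X) - 1)
    print("explained variance ratio:", explained_var[:k] / explained_var.sum())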

Applications in AI

  • Koren, Y., Bell, R., & Volinsky, C. (2009). "Matrix factorization techniques for recommender systems." Computer, 42(8), 30-37. Describes how SVD-based matrix factorization powers recommender systems, including the Netflix Prize winning approach. A direct application of the SVD concepts from Section 2.5.

  • Deerwester, S., et al. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), 391-407. The paper that introduced Latent Semantic Analysis (LSA), applying SVD to term-document matrices for information retrieval. One of the earliest and most influential applications of linear algebra to NLP; a toy version of the computation is sketched after this list.
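
The sketch below is a toy illustration of the idea shared by both papers: factor a small term-document (or user-item) matrix with a truncated SVD and compare columns in the resulting low-dimensional latent space. The counts and the choice k = 2 are made up for the example.

    import numpy as np

    # Rows = terms (or users), columns = documents (or items); counts invented for illustration.
    A = np.array([
        [3, 0, 1, 0],
        [2, 0, 0, 1],
        [0, 4, 0, 2],
        [0, 1, 0, 3],
    ], dtype=float)

    k = 2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # each document as a point in k-dimensional latent space

    # Cosine similarity between documents in the latent space.
    unit = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    print(np.round(unit @ unit.T, 2))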

NumPy and Computational References

  • NumPy Documentation: Linear Algebra (numpy.linalg), https://numpy.org/doc/stable/reference/routines.linalg.html. The official reference for all NumPy linear algebra functions. Includes detailed descriptions of the underlying algorithms (LAPACK routines) and numerical considerations.

  • Harris, C. R., et al. (2020). "Array programming with NumPy." Nature, 585, 357-362. The official NumPy paper, describing its design, ecosystem, and role in scientific computing. Useful for understanding why NumPy is structured the way it is.

  • Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. SIAM. The standard graduate text on the numerical aspects of linear algebra. Covers floating-point arithmetic, stability, conditioning, iterative methods, and the algorithms behind NumPy's linalg module. Essential reading for understanding why solve is preferred over inv, and why eigenvalue algorithms work the way they do; the sketch below illustrates the solve-versus-inv point numerically.
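
The solve-versus-inv point is easy to see in practice. The sketch below uses a Hilbert matrix, a classic ill-conditioned example chosen here purely for illustration, and compares the residuals left by numpy.linalg.solve and by explicitly forming the inverse.

    import numpy as np

    n = 12
    # Hilbert matrix H[i, j] = 1 / (i + j + 1): a standard ill-conditioned test case.
    A = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n))
    x_true = np.ones(n)
    b = A @ x_true

    x_solve = np.linalg.solve(A, b)   # factorize and solve directly
    x_inv = np.linalg.inv(A) @ b      # explicitly form the inverse, then multiply

    print("condition number:   ", np.linalg.cond(A))
    print("residual via solve: ", np.linalg.norm(A @ x_solve - b))
    print("residual via inv:   ", np.linalg.norm(A @ x_inv - b))

On a matrix this badly conditioned, both computed solutions are inaccurate, but the residual from solve is typically much smaller; Trefethen & Bau explain why in terms of backward stability.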

Looking Ahead

The linear algebra foundations from this chapter connect directly to several upcoming topics:

  • Chapter 3 (Optimization and Calculus): Matrix calculus extends the operations of this chapter, and the eigendecomposition of the Hessian matrix characterizes the curvature of a loss surface. The condition number of the Hessian governs the convergence rate of gradient descent (see the sketch after this list).
  • Chapter 4 (Probability and Statistics): Covariance matrices are symmetric positive semidefinite, and their eigendecomposition underlies both PCA and the geometry of multivariate Gaussian distributions.
  • Chapters 7-8 (Neural Networks): Weight matrices, attention mechanisms, and embedding layers are all built from the matrix operations covered here. Understanding matrix multiplication and its gradients is essential for implementing and debugging neural networks.
  • Chapter 10 (Dimensionality Reduction): PCA, SVD, and their non-linear extensions (autoencoders, t-SNE, UMAP) will be explored in depth.
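
As a small preview of the Chapter 3 connection, the sketch below runs fixed-step gradient descent on two simple quadratics that differ only in their condition number; the eigenvalues are arbitrary values chosen for illustration. The poorly conditioned problem needs far more iterations to reach the same tolerance.

    import numpy as np

    def gd_iterations(eigenvalues, tol=1e-8, max_iter=100_000):
        """Minimize f(x) = 0.5 * x^T A x for diagonal A by gradient descent; return the iteration count."""
        A = np.diag(eigenvalues)
        x = np.ones(len(eigenvalues))
        step = 1.0 / max(eigenvalues)      # safe fixed step size, 1 / L
        for i in range(max_iter):
            grad = A @ x
            if np.linalg.norm(grad) < tol:
                return i
            x = x - step * grad
        return max_iter

    # Same largest eigenvalue, very different condition numbers.
    print("condition number 10:  ", gd_iterations([1.0, 10.0]), "iterations")
    print("condition number 1000:", gd_iterations([0.01, 10.0]), "iterations")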