Acknowledgments

This textbook is an open-source project, made freely available under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). Code examples are additionally licensed under the MIT License.

Standing on Shoulders

The mathematical foundations in Part I draw from decades of pedagogical excellence in linear algebra (Gilbert Strang, MIT OCW), optimization (Stephen Boyd and Lieven Vandenberghe, Convex Optimization), probability (Larry Wasserman, All of Statistics), and information theory (David MacKay, Information Theory, Inference, and Learning Algorithms).

The deep learning chapters in Part II owe a debt to Ian Goodfellow, Yoshua Bengio, and Aaron Courville (Deep Learning), to Andrej Karpathy's pedagogical code implementations, and to the PyTorch documentation team.

The causal inference chapters in Part III build on the foundational work of Judea Pearl (Causality, The Book of Why), Donald Rubin (the potential outcomes framework), Guido Imbens and Donald Rubin (Causal Inference for Statistics, Social, and Biomedical Sciences), Scott Cunningham (Causal Inference: The Mixtape), and Miguel Hernán and James Robins (Causal Inference: What If).

The Bayesian chapters in Part IV follow the practical Bayesian workflow advocated by Andrew Gelman, Aki Vehtari, and the Stan/PyMC communities.

The production ML systems chapters in Part V learn from Chip Huyen (Designing Machine Learning Systems), Martin Kleppmann (Designing Data-Intensive Applications), the Google ML engineering papers (Sculley et al., "Hidden Technical Debt in Machine Learning Systems"), and the MLOps community.

The responsible AI chapters in Part VI draw on the fairness, accountability, and transparency (FAccT) research community, with particular debts to Solon Barocas and Moritz Hardt (Fairness and Machine Learning), Cynthia Dwork (differential privacy), and the Fairlearn and AIF360 teams.

Open-Source Tools

This book would not exist without the open-source ecosystem: PyTorch, scikit-learn, NumPy, pandas, Matplotlib, Jupyter, PyMC, ArviZ, DoWhy, EconML, Fairlearn, Opacus, Captum, Great Expectations, MLflow, Hugging Face Transformers, PyTorch Geometric, and hundreds of other projects maintained by volunteers and organizations worldwide.

The Trilogy

This book is Book 3 of the DataField.Dev Data Science Trilogy:

  1. Introductory Statistics: Making Sense of Data in the Age of AI
  2. Intermediate Data Science: Machine Learning, Experimentation, and the Craft of Data-Driven Decisions
  3. Advanced Data Science: Deep Learning, Causal Inference, and Production Systems at Scale (this book)

The trilogy is designed to take a curious beginner from statistical literacy, through practical machine learning, to the advanced techniques and systems thinking that define senior data science practice.

Contributing

This is a living document. If you find errors, have suggestions, or want to contribute, please visit the project repository. Every correction, no matter how small, improves the resource for future readers.