Chapter 24: Further Reading
Essential Sources
1. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison, "Hidden Technical Debt in Machine Learning Systems" (NeurIPS, 2015)
The foundational paper on production ML systems engineering. Sculley et al. — a team from Google — identify multiple forms of technical debt specific to ML systems: entanglement (Changing Anything Changes Everything, or CACE), correction cascades, undeclared consumers, data dependency debt, and feedback loops. The paper's central figure — a small "ML Code" box surrounded by vastly larger infrastructure boxes — has become the iconic image of the field and provides the conceptual foundation for this entire chapter.
Reading guidance: Section 2 (Complex Models Erode Boundaries) introduces CACE and explains why ML systems are fundamentally harder to modularize than traditional software. Section 6 (Configuration Debt) covers a form of debt that is underappreciated but accounts for a large fraction of production incidents. Section 7 (Dealing with Changes in the External World) discusses monitoring and testing — topics covered in depth in Chapters 28 and 30 of this book. The paper is short, and every section is worth reading carefully. For a follow-up, see Sculley et al., "Machine Learning: The High-Interest Credit Card of Technical Debt" (SE4ML Workshop at NeurIPS, 2014), which introduced the metaphor in its title. For a recent empirical study of technical debt in practice, see Bogner et al., "Characterizing Technical Debt and Antipatterns in AI-Based Systems: A Systematic Mapping Study" (TechDebt, 2021).
2. Chip Huyen, Designing Machine Learning Systems (O'Reilly, 2022)
The most comprehensive single-volume treatment of production ML system design. Huyen covers the full lifecycle: project scoping, data engineering, feature engineering, model development, model deployment, monitoring, and continual learning. The book is organized around the design decisions that ML engineers face in practice, rather than around algorithms or mathematical foundations, making it an ideal complement to the algorithm-focused Parts I-IV of this textbook.
Reading guidance: Chapter 7 (Model Deployment and Prediction Service) covers the batch vs. real-time serving decision in detail, with system diagrams for common architectures and a practical discussion of model compression and optimization. Chapter 8 (Data Distribution Shifts and Monitoring) provides a taxonomy of distribution shifts (covariate shift, label shift, concept drift) and practical detection methods — this connects directly to the training-serving skew framework in Section 24.6 and anticipates Chapter 30 of this book. Chapter 9 (Continual Learning and Test in Production) covers the experimentation infrastructure discussed in Section 24.10 — shadow mode, canary, A/B testing, and bandits. For readers who want to go deeper on feature stores specifically, Huyen's treatment in Chapter 7 is supplemented by the Feast documentation (feast.dev) and Tecton's engineering blog, which provides case studies of feature store implementations at scale.
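The detection methods Huyen describes are statistical two-sample tests. As a concrete illustration (not drawn from the book — the function names and the example data below are ours), a covariate-shift check for a single numeric feature can be as simple as a Kolmogorov–Smirnov test comparing a training-time reference sample against live serving traffic:

```python
import math

def ks_statistic(reference, live):
    """Largest gap between the two empirical CDFs (two-sample KS)."""
    a, b = sorted(reference), sorted(live)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def drifted(reference, live, alpha=0.05):
    """True if the samples differ at approximate significance level
    alpha, using the large-sample KS critical value."""
    na, nb = len(reference), len(live)
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return ks_statistic(reference, live) > c * math.sqrt((na + nb) / (na * nb))

# Example: serving traffic shifted by +3 relative to training data
# (a covariate shift in Huyen's taxonomy).
train = [(i % 100) / 10 for i in range(1000)]
serve = [x + 3.0 for x in train]
print(drifted(train, serve))  # True: the shift is well above the threshold
```

In practice a monitoring system runs a test like this per feature on a rolling window of serving traffic and alerts when a feature drifts; label shift and concept drift need different tests, since they involve the label distribution rather than the inputs.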
3. Martin Kleppmann, Designing Data-Intensive Applications (O'Reilly, 2017)
Although not ML-specific, Kleppmann's book is the authoritative reference on the distributed systems concepts that underlie production ML infrastructure: consistency models, replication, partitioning, batch and stream processing, and the interplay between them. The feature store, the data pipeline, the model serving layer, and the monitoring system are all data-intensive applications, and understanding their distributed systems properties is essential for debugging production failures and making sound architectural decisions.
Reading guidance: Part III (Derived Data) is the most directly relevant: Chapter 10 (Batch Processing) covers the computational model behind batch feature engineering and batch serving; Chapter 11 (Stream Processing) covers the event-driven architecture behind near-real-time feature computation and streaming features; Chapter 12 (The Future of Data Systems) discusses the lambda architecture (batch + stream processing), which is the conceptual ancestor of the feature store's dual online/offline design. Part II (Distributed Data) provides the foundations for understanding feature store consistency: Chapter 5 (Replication) and Chapter 9 (Consistency and Consensus) explain why eventual consistency — not strong consistency — is the pragmatic choice for online feature stores, and what the trade-offs are. Kleppmann's writing is exceptionally clear, and the book rewards careful reading even for chapters that seem tangential.
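To make the dual online/offline design concrete, here is a toy sketch; all class and method names are hypothetical, and the synchronous dual write deliberately glosses over the asynchronous replication where the eventual-consistency trade-offs Kleppmann analyzes actually arise:

```python
from collections import defaultdict

class ToyFeatureStore:
    """Illustrative dual design: an append-only offline log of
    (timestamp, value) pairs for training, plus a latest-value
    online map for low-latency serving reads."""

    def __init__(self):
        self.offline = defaultdict(list)  # key -> [(ts, value), ...]
        self.online = {}                  # key -> latest value

    def write(self, key, ts, value):
        # One write feeds both stores. Real systems do this
        # asynchronously, so the online view may briefly lag.
        self.offline[key].append((ts, value))
        self.online[key] = value

    def get_online(self, key):
        """Serving path: freshest value, O(1) lookup."""
        return self.online.get(key)

    def get_as_of(self, key, ts):
        """Training path: latest value written at or before ts,
        so training features match what serving would have seen."""
        past = [(t, v) for t, v in self.offline[key] if t <= ts]
        return max(past)[1] if past else None

fs = ToyFeatureStore()
fs.write("user:42/clicks_7d", ts=1, value=3)
fs.write("user:42/clicks_7d", ts=5, value=8)
print(fs.get_online("user:42/clicks_7d"))    # 8: serving reads latest
print(fs.get_as_of("user:42/clicks_7d", 2))  # 3: training reads as-of
```

The as-of read is the point-in-time-correct lookup that prevents label leakage when building training sets; the online read is the hot path that must stay within a serving latency budget.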
4. Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl, "Machine Learning Operations (MLOps): Overview, Definition, and Architecture" (IEEE Access, 2023)
A systematic review that defines MLOps as a discipline and provides a reference architecture for production ML systems. The paper synthesizes patterns from industry (Google, Facebook, Netflix, Uber) and academia into a layered architecture: data management, model development, model deployment, and model monitoring. The reference architecture maps closely to the component map in Section 24.2 and provides a standardized vocabulary for discussing ML system design across organizations.
Reading guidance: Section III presents the reference architecture with detailed component descriptions. Figure 3 is a comprehensive system diagram that extends the StreamRec diagram from Section 24.11 with additional components (data versioning, model lineage, regulatory compliance). Section IV discusses MLOps maturity levels (manual, ML pipeline automation, CI/CD for ML), which provides a useful framework for assessing where an organization stands and what to build next. For a complementary practitioner perspective, see the Google Cloud "MLOps: Continuous Delivery and Automation Pipelines in Machine Learning" whitepaper (2020), which defines three maturity levels (MLOps Level 0, 1, and 2) and describes the infrastructure required at each level.
5. Michael Nygard, Release It! Design and Deploy Production-Ready Software, 2nd edition (Pragmatic Bookshelf, 2018)
The definitive reference on designing software systems for production reliability. Nygard introduces the stability patterns — circuit breakers, bulkheads, timeouts, handshaking, and steady state — that are essential for ML serving infrastructure. The circuit breaker pattern in Section 24.7 is drawn directly from this book. Nygard's key insight is that systems fail not because individual components are unreliable, but because failures propagate through dependencies in ways that the designer did not anticipate.
Reading guidance: Part I (Create Stability) is essential: Chapter 4 (Stability Antipatterns) catalogs the ways systems fail — integration point failures, chain reactions, cascading failures, and blocked threads — and Chapter 5 (Stability Patterns) presents the countermeasures, including the circuit breaker pattern used in this chapter. Chapter 10 (Control Plane) discusses the operational infrastructure for managing production systems, which connects to the monitoring and observability concerns of Chapter 30. Although the book is not ML-specific, every stability pattern it describes applies directly to ML serving systems, and the failure modes it catalogs are exactly those that ML engineers encounter when deploying models to production.
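Nygard's circuit breaker is simple enough to sketch. The three states — closed, open, half-open — follow the book; the class and parameter names below are our own illustrative choices, not an implementation from any particular library:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker. States: closed (calls pass through),
    open (calls fail fast), half-open (one probe call allowed)."""

    def __init__(self, max_failures=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures to trip
        self.reset_timeout = reset_timeout  # seconds before a probe
        self.clock = clock
        self.failures = 0
        self.opened_at = None               # None means closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let one probe call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) open
            raise
        self.failures = 0
        self.opened_at = None  # success: close the circuit
        return result
```

In an ML serving system, fn would be the call to a downstream feature service or model replica; while the breaker is open, the caller falls back immediately — for example, to default feature values — instead of queueing behind a dead dependency, which is exactly the cascading-failure propagation Nygard warns about.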