Chapter 36: Further Reading

Essential Sources

1. Chip Huyen, Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications (O'Reilly, 2022)

The most comprehensive single-volume treatment of ML system design currently available. Huyen covers the full lifecycle — from project scoping and data engineering through model development, deployment, monitoring, and iteration — with a practitioner's focus on the decisions that matter in production rather than the mathematical details of individual algorithms.

Reading guidance: Chapter 7 (Model Deployment and Prediction Service) provides the conceptual framework for the serving architecture used in the StreamRec capstone — batch prediction, online prediction, and the hybrid architectures that combine both. Huyen's treatment of model compression (distillation, pruning, quantization) is particularly relevant for Track C students exploring cost optimization on the technical roadmap. Chapter 9 (Continual Learning and Test in Production) covers the continuous training pipeline and canary deployment patterns used in Chapters 29 and 36, with additional detail on the tradeoffs between stateless retraining (train from scratch on all data) and stateful training (fine-tune on new data only). Chapter 10 (Infrastructure and Tooling for MLOps) provides the build-vs-buy framework that informs the roadmap's build-vs-buy analysis — Huyen's "four layers of ML infrastructure" (storage, compute, development environment, deployment) is a useful mental model for allocating infrastructure investment. The book's emphasis on iterative, production-oriented design aligns with the capstone's Theme 6 (Simplest Model That Works): deploy the simple thing first, measure, improve. For students who want a single companion book to read alongside the capstone project, this is the recommendation.
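
The batch/online hybrid Huyen describes can be sketched in a few lines: serve a precomputed (batch) prediction when one exists, and fall back to computing online otherwise. All names here are illustrative stand-ins, not code from Huyen's book or the StreamRec capstone.

```python
# Minimal sketch of a hybrid serving pattern: cheap lookup of a nightly
# batch job's output, with an online model as the fallback path.
from typing import Callable, Optional

class HybridPredictionService:
    def __init__(self, batch_cache: dict, online_model: Callable[[dict], float]):
        self.batch_cache = batch_cache      # user_id -> precomputed score
        self.online_model = online_model    # feature dict -> score

    def predict(self, user_id: str, features: dict) -> float:
        cached: Optional[float] = self.batch_cache.get(user_id)
        if cached is not None:
            return cached                   # batch path: precomputed offline
        return self.online_model(features)  # online path: computed on request

# Usage: user "a" was scored by the batch job; user "b" was not.
service = HybridPredictionService(
    batch_cache={"a": 0.91},
    online_model=lambda f: min(1.0, 0.25 * f.get("recent_clicks", 0)),
)
print(service.predict("a", {"recent_clicks": 3}))  # 0.91 (from batch cache)
print(service.predict("b", {"recent_clicks": 3}))  # 0.75 (computed online)
```

The design choice the sketch makes concrete is that the batch path trades freshness for latency and cost; the online fallback bounds how stale a prediction can be for users the batch job missed.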

2. Will Larson, Staff Engineer: Leadership Beyond the Management Track (2021)

The definitive book on what senior individual contributors actually do. Larson interviewed dozens of staff and principal engineers at companies including Stripe, Slack, Auth0, and Fastly, and synthesized their experiences into a framework for technical leadership without management authority.

Reading guidance: Chapter 3 (Writing Engineering Strategy) describes the design document and RFC processes that directly inform the capstone's Technical Design Document template. Larson's distinction between a design document (proposes a specific solution to a specific problem) and an engineering strategy (a collection of design documents that form a coherent direction) maps to the distinction between the TDD (the capstone deliverable) and the technical roadmap (the forward-looking strategy document). Chapter 5 (Being Visible) addresses the stakeholder communication challenge from Section 36.6 — how to present technical work to different audiences, when to write vs. present, and how to build credibility through technical communication. Chapter 6 (Operating at Staff) describes the day-to-day activities of staff engineers — design review, code review, mentoring, project leadership, and cross-team coordination — which are the activities that Chapters 37-39 of this book address. For capstone students pursuing Track C, Larson's treatment of Architecture Decision Records, design documents, and technical roadmaps provides additional depth beyond what Section 36.3-36.7 covers.

3. Michael Nygard, "Documenting Architecture Decisions" (Blog post, 2011) and Joel Parker Henderson, Architecture Decision Records (GitHub repository, adr.github.io)

The original lightweight ADR format proposal and the most comprehensive collection of ADR templates, examples, and tooling. Nygard's original post defines the five-section ADR (Title, Status, Context, Decision, Consequences) in a single page. Henderson's repository extends this with dozens of templates, real-world examples, and tool integrations.

Reading guidance: Start with Nygard's original blog post (approximately 800 words) for the core format. The key insight is that ADRs should be short, numbered, and immutable — when a decision is superseded, you write a new ADR that references the old one rather than editing the original. This creates an archaeological record of how the architecture evolved, not just what it currently is. Henderson's repository provides templates for common ML decisions (model selection, feature engineering strategy, serving architecture) and integration with tools like adr-tools (command-line ADR management) and Markdown-based documentation systems. For the capstone project, the MADR template (Markdown Architectural Decision Records) is the most practical: it adds "Considered Alternatives" as a first-class section (which is the structure used in Section 36.4's ADR class). The adr-log command generates a table of contents of all ADRs — useful for the TDD's Section 6.
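
The five-section format plus a first-class alternatives section can be captured in a small record type. This is an illustrative sketch, not the ADR class from Section 36.4; the field names and rendering are assumptions.

```python
# Sketch of an ADR record following Nygard's five sections, plus the
# "Considered Alternatives" section described in the text.
from dataclasses import dataclass, field

@dataclass(frozen=True)  # immutable: superseding means writing a *new* ADR
class ADR:
    number: int
    title: str
    status: str            # e.g. "proposed", "accepted", "superseded by ADR-00N"
    context: str
    decision: str
    consequences: str
    alternatives: list = field(default_factory=list)

    def render(self) -> str:
        alts = "\n".join(f"- {a}" for a in self.alternatives) or "- (none recorded)"
        return (
            f"# ADR-{self.number:03d}: {self.title}\n\n"
            f"## Status\n{self.status}\n\n"
            f"## Context\n{self.context}\n\n"
            f"## Considered Alternatives\n{alts}\n\n"
            f"## Decision\n{self.decision}\n\n"
            f"## Consequences\n{self.consequences}\n"
        )

adr = ADR(
    number=7,
    title="Use Redis for the online feature store",
    status="accepted",
    context="Online lookups must return features within a tight latency budget.",
    decision="Store online features as Redis hashes keyed by entity id.",
    consequences="Adds an operational dependency on Redis.",
    alternatives=["DynamoDB", "In-process cache"],
)
print(adr.render())
```

The `frozen=True` flag enforces the immutability convention in code: a superseded decision is never edited in place, preserving the archaeological record.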

4. D. Sculley, Gary Holt, Daniel Golovin, et al., "Hidden Technical Debt in Machine Learning Systems" (NeurIPS, 2015) and Eric Breck, Shanqing Cai, Eric Nielsen, et al., "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction" (IEEE Big Data, 2017)

The two foundational papers on technical debt in ML systems. Sculley et al. (2015) identifies the unique sources of technical debt in ML: data dependencies, feedback loops, entanglement, undeclared consumers, and configuration debt. Breck et al. (2017) operationalizes these insights into a 28-item rubric (the ML Test Score) that quantifies production readiness.

Reading guidance: Sculley et al.'s Figure 1 — the small "ML Code" box surrounded by vast infrastructure boxes — is the visual thesis of Chapter 24 and the capstone project. Section 3 (Data Dependencies Cost More than Code Dependencies) introduces the concept of "underutilized data dependencies" (features that contribute minimally to model quality but add pipeline complexity), which directly informs the feature selection and data contract decisions in the capstone. Section 6 (Configuration Debt) describes the configuration management challenge addressed in Exercise 36.21. Breck et al.'s ML Test Score rubric is a practical evaluation tool for the capstone: students can compute their system's ML Test Score (0-28) at each project milestone to track production readiness. The rubric's four categories — tests for features and data, tests for model development, tests for ML infrastructure, and monitoring tests — map directly to the testing and monitoring components of the capstone architecture. A Track A system typically scores 8-12; a Track B system 14-18; a Track C system 18-24. Scores above 24 are rare outside major technology companies.
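
A milestone self-assessment using the 0-28 framing above can be a few lines of code: count passing items in each of Breck et al.'s four categories (the paper has seven tests per category), sum them, and map the total to the track bands. The category keys, the example counts, and the handling of band boundaries are assumptions for illustration.

```python
# Sketch of an ML Test Score milestone check: per-category item counts
# summed to a 0-28 total, mapped to the chapter's typical track ranges.
CATEGORIES = ["features_and_data", "model_development",
              "ml_infrastructure", "monitoring"]

def ml_test_score(passed: dict) -> int:
    """`passed` maps each category to the number of rubric items satisfied (0-7)."""
    for cat in CATEGORIES:
        if not 0 <= passed.get(cat, 0) <= 7:
            raise ValueError(f"{cat}: expected 0-7 items")
    return sum(passed.get(cat, 0) for cat in CATEGORIES)

def track_band(score: int) -> str:
    # Boundary handling between bands is arbitrary here.
    if score >= 25: return "beyond typical Track C (rare outside large tech companies)"
    if score >= 18: return "Track C range (18-24)"
    if score >= 14: return "Track B range (14-18)"
    if score >= 8:  return "Track A range (8-12)"
    return "below Track A range"

milestone = {"features_and_data": 4, "model_development": 3,
             "ml_infrastructure": 2, "monitoring": 2}
score = ml_test_score(milestone)
print(score, "->", track_band(score))  # 11 -> Track A range (8-12)
```

Recomputing the score at each milestone turns the rubric into a trend line rather than a one-off audit, which is how the capstone uses it.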

5. Martin Kleppmann, Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Data Systems (O'Reilly, 2017)

The standard reference for distributed systems design, covering the data infrastructure that underlies every production ML system. Kleppmann provides rigorous treatment of replication, partitioning, transactions, stream processing, and batch processing — the building blocks of the feature store, training pipeline, and serving infrastructure.

Reading guidance: Chapter 3 (Storage and Retrieval) explains the data structures behind the feature store's online (Redis hash maps) and offline (Parquet columnar files) storage layers. Chapter 10 (Batch Processing) covers the MapReduce and Spark paradigms that power the batch feature computation pipeline. Chapter 11 (Stream Processing) covers the Kafka-based streaming architecture used in the Track B/C feature store for real-time feature updates. The distinction between log-based and database-based stream processing (Section 11.3) directly informs the ADR for streaming feature architecture. Chapter 12 (The Future of Data Systems) discusses the concept of "derived data" — data computed from other data, which is exactly what features are — and the design principles for keeping derived data consistent with its sources. For capstone students who find themselves debugging feature freshness, consistency, or staleness issues, Kleppmann's treatment of eventual consistency, read-after-write consistency, and causal consistency (Chapters 5 and 9) provides the theoretical framework for understanding what is going wrong and why.
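
The dual-store layout described above — a latest-value online store beside an append-only offline history — can be sketched with in-memory stand-ins (a dict per entity standing in for Redis hashes, a row list standing in for Parquet files). All names are illustrative; this is not the capstone's feature store code.

```python
# Sketch of a dual-store feature layout: online store holds only the latest
# value per feature for low-latency serving; offline store keeps the full
# timestamped history, so training queries can be point-in-time correct.
import time

class FeatureStore:
    def __init__(self):
        self.online = {}    # entity_id -> {feature: latest_value}
        self.offline = []   # append-only rows: (ts, entity_id, feature, value)

    def write(self, entity_id, feature, value, ts=None):
        ts = time.time() if ts is None else ts
        self.online.setdefault(entity_id, {})[feature] = value  # overwrite latest
        self.offline.append((ts, entity_id, feature, value))    # keep full history

    def get_online(self, entity_id):
        return self.online.get(entity_id, {})

    def get_as_of(self, entity_id, feature, as_of):
        """Latest value at or before `as_of` -- avoids training-time leakage."""
        rows = [(ts, v) for ts, e, f, v in self.offline
                if e == entity_id and f == feature and ts <= as_of]
        return max(rows)[1] if rows else None

store = FeatureStore()
store.write("user_1", "clicks_7d", 3.0, ts=100.0)
store.write("user_1", "clicks_7d", 5.0, ts=200.0)
print(store.get_online("user_1"))                    # {'clicks_7d': 5.0}
print(store.get_as_of("user_1", "clicks_7d", 150.0)) # 3.0
```

The sketch makes the "derived data" point concrete: the offline log is the source of truth, and the online store is a derived view that must be kept consistent with it — which is exactly where the consistency chapters become relevant.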