Chapter 29: Further Reading
Essential Sources
1. Danilo Sato, Arif Wider, and Christoph Windheuser, "Continuous Delivery for Machine Learning" (martinfowler.com, 2019)
The foundational practitioner reference for CI/CD in ML systems. Sato, Wider, and Windheuser — writing on Martin Fowler's influential software engineering blog — adapt continuous delivery principles from traditional software engineering to the ML context, introducing the three-artifact pipeline (code, data, model) and the concept of ML-specific deployment stages. The article defines the deployment pipeline that this chapter formalizes: from source control through automated testing to staged production deployment, with the critical addition of data versioning and model validation gates.
Reading guidance: The "End-to-End ML Pipeline" section provides the architectural diagram that maps directly to the pipeline in Section 29.3 — read it alongside the MLCIPipeline class to see how the abstract architecture translates to concrete CI steps. The "Model Serving Patterns" section covers blue-green and canary deployment with ML-specific considerations (delayed feedback, feature store consistency) that extend the comparison in Section 29.7. The "Data Pipeline" section introduces data versioning with DVC, which complements the artifact lineage system in Section 29.4. For the theoretical foundations of continuous delivery that this article adapts, see Jez Humble and David Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley, 2010) — the canonical reference on deployment pipelines that predates the ML-specific adaptations. For a more recent treatment that incorporates LLM deployment challenges, see Huyen, Designing Machine Learning Systems (O'Reilly, 2022), Chapter 9, which extends the deployment pipeline to handle foundation model serving.
2. Chip Huyen, Designing Machine Learning Systems (O'Reilly, 2022)
The most comprehensive single-volume treatment of production ML systems, covering the full lifecycle from data engineering through model deployment to monitoring. Huyen's treatment of deployment and testing in production (Chapters 7 and 9) and monitoring (Chapter 8) provides the conceptual framework that this chapter operationalizes with code. The book's strength is its breadth: it covers deployment patterns (shadow mode, A/B testing, canary, interleaving) with practical examples from industry, and it addresses organizational challenges (who owns deployment, how to coordinate between ML and platform teams) that purely technical treatments omit.
Reading guidance: Chapter 9 ("Continual Learning and Test in Production") covers the deployment patterns from Sections 29.5-29.7 of this chapter with additional patterns not covered here: interleaving (for search ranking), multi-armed bandit deployment, and contextual bandit deployment. Chapter 6 ("Model Development and Offline Evaluation") provides the evaluation methodology that feeds the validation gate. Chapter 8 ("Data Distribution Shifts and Monitoring") covers the drift detection that feeds the retraining triggers in Section 29.8 and connects directly to Chapter 30 of this textbook. The discussion of shadow deployment in Chapter 9 is the clearest published explanation of shadow mode architecture and its limitations. For a complementary industry perspective, see Larysa Visengeriyeva et al., "The ML Test Score" (ml-ops.org, 2023), which surveys deployment practices across 50+ organizations and identifies the most common gaps — canary evaluation and automated rollback are the two capabilities most frequently missing from production ML pipelines.
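Shadow mode, as Huyen describes it, mirrors live traffic to a candidate model whose predictions are logged for offline comparison but never returned to callers. A minimal sketch of that architecture — the model functions and log structure are illustrative stand-ins:

```python
shadow_log = []  # in production: a metrics or logging pipeline

def primary_model(features: dict) -> float:
    return 0.5 * features["x"]   # stand-in for the serving model

def shadow_model(features: dict) -> float:
    return 0.25 * features["x"]  # stand-in for the candidate under evaluation

def predict(features: dict) -> float:
    """Serve the primary prediction; record the shadow prediction on
    the side. A shadow failure must never affect the live response."""
    response = primary_model(features)
    try:
        shadow_log.append({
            "features": features,
            "primary": response,
            "shadow": shadow_model(features),
        })
    except Exception:
        pass  # swallow shadow errors: the caller only ever sees the primary
    return response

print(predict({"x": 10.0}))  # 5.0 — the shadow output (2.5) is only logged
```

The limitation Huyen highlights is visible in the structure: the shadow model's predictions never reach users, so shadow mode can validate correctness and latency but cannot measure user-facing impact.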
3. Google Cloud, "MLOps: Continuous Delivery and Automation Pipelines in Machine Learning" (cloud.google.com/architecture, 2023)
Google's official reference architecture for MLOps, defining the maturity levels (0-2) that Section 29.2 of this chapter extends to Level 3. The document provides detailed architecture diagrams for each level, concrete GCP service mappings (Vertex AI, Cloud Build, Artifact Registry), and a decision framework for choosing the appropriate maturity level based on the model's business criticality and the team's engineering capacity. The architecture is cloud-vendor-specific in its service choices but vendor-agnostic in its principles.
Reading guidance: The "MLOps Level 0: Manual Process" and "MLOps Level 1: ML Pipeline Automation" sections are the most valuable for teams assessing their current maturity. The "MLOps Level 2: CI/CD Pipeline Automation" section maps directly to the deployment pipeline in Section 29.12 — Google's Vertex AI pipeline corresponds to Dagster, Google's Model Registry corresponds to MLflow, and Google's Traffic Split API corresponds to the Istio VirtualService configuration. The architectural diagrams are the clearest published representation of the trigger → train → validate → deploy → monitor loop. For an alternative cloud-vendor perspective, see AWS's "MLOps Foundation Roadmap for Enterprises" (aws.amazon.com/blogs/machine-learning, 2023), which covers the same maturity levels with SageMaker-specific service mappings. For a vendor-neutral perspective, see the Linux Foundation's "MLOps Principles" (ml-ops.org), which defines maturity levels without reference to specific cloud services.
4. Ville Tuulos, Effective Data Science Infrastructure (Manning, 2022)
A practitioner-focused book on the infrastructure stack that supports ML deployment, written by the creator of Metaflow (Netflix's ML infrastructure framework). Tuulos covers the full stack from compute infrastructure through model versioning to deployment, with a focus on the organizational and operational aspects that academic treatments omit: how to design infrastructure that data scientists can use without becoming infrastructure engineers, how to balance standardization with flexibility, and how to evolve infrastructure incrementally rather than building everything at once.
Reading guidance: Part III ("Production") covers deployment infrastructure with a systems engineering perspective that complements this chapter's software engineering perspective. Chapter 10 (Deployment) covers model serving patterns (batch, real-time, streaming) and the infrastructure that supports them — Docker, Kubernetes, and model serving frameworks (Seldon, BentoML). The treatment of "infrastructure as code" for model serving directly supports the containerized deployment approach in Section 29.11. Chapter 11 (Operations) covers monitoring and incident response, bridging this chapter's rollback procedures to Chapter 30's monitoring infrastructure. For a more ML-platform-specific treatment, see the companion open-source project Metaflow (docs.metaflow.org), which provides a complete implementation of the deployment pipeline for AWS infrastructure.
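The serving patterns Tuulos distinguishes — batch, real-time, streaming — differ mainly in how inputs arrive and how results are delivered; the trained model itself can stay the same. A compact illustration of the batch vs. real-time split, with a trivial stand-in for the model's scoring function:

```python
from typing import Iterable

def score(features: dict) -> float:
    """Stand-in for a trained model's predict function."""
    return sum(features.values())

# Real-time serving: one request in, one synchronous response out.
def serve_realtime(features: dict) -> float:
    return score(features)

# Batch serving: score a whole table offline and return results keyed
# by entity id, as you would before writing them to a warehouse.
def serve_batch(rows: Iterable[dict]) -> dict:
    return {row["id"]: score({k: v for k, v in row.items() if k != "id"})
            for row in rows}

print(serve_realtime({"a": 1.0, "b": 2.0}))   # 3.0
print(serve_batch([{"id": "u1", "a": 1.0},
                   {"id": "u2", "a": 4.0}]))  # {'u1': 1.0, 'u2': 4.0}
```

The infrastructure consequences follow from this split: batch serving needs a scheduler and a results store, while real-time serving needs the containerized, autoscaled endpoints that Section 29.11 covers.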
5. Board of Governors of the Federal Reserve System, "SR 11-7: Guidance on Model Risk Management" (2011)
The foundational regulatory document for model risk management in financial institutions. SR 11-7 defines "model risk" as the potential for adverse consequences from decisions based on incorrect or misused model outputs, and establishes requirements for model development, validation, and ongoing monitoring. Every financial institution subject to Federal Reserve supervision must comply with SR 11-7, and the guidance shapes deployment pipelines for any model used in credit decisions, market risk assessment, or capital planning.
Reading guidance: Section III (Model Development, Implementation, and Use) defines the documentation requirements that the automated model change document in Case Study 2 addresses. Section IV (Model Validation) specifies independent validation — the requirement that drives the MRM review gate in the Meridian Financial pipeline. Section V (Governance, Policies, and Controls) covers the organizational structures (model risk committees, kill switches, escalation procedures) that shape the deployment pipeline's approval workflow. For practitioners implementing SR 11-7 compliance in ML systems, the OCC's companion document (OCC 2011-12) provides additional implementation guidance, and the more recent SR 15-18 provides guidance on model risk management for large financial institutions with extensive model inventories. For a practitioner-oriented interpretation, see Patrick Hall and Navdeep Gill, "An Introduction to Machine Learning Interpretability" (O'Reilly, 2019), which translates regulatory requirements into technical specifications for model validation and documentation.
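In pipeline terms, SR 11-7's controls become hard gates: deployment is blocked unless a validator independent of the developer has signed off and the change documentation exists, and a kill switch can halt releases at any time. A schematic sketch of such a gate — the record fields and names are illustrative, not taken from the guidance:

```python
class DeploymentBlocked(Exception):
    """Raised when an SR 11-7-style control is not satisfied."""

KILL_SWITCH = {"halted": False}  # flipped by the model risk committee

def check_deployment_gate(change_record: dict) -> None:
    """Raise DeploymentBlocked unless independent validation,
    documentation, and kill-switch checks all pass."""
    if KILL_SWITCH["halted"]:
        raise DeploymentBlocked("kill switch active")
    if change_record["validator"] == change_record["developer"]:
        raise DeploymentBlocked("validation must be independent of development")
    if not change_record.get("model_change_document"):
        raise DeploymentBlocked("model change document missing")

record = {
    "developer": "alice",
    "validator": "bob",                       # independent second-line reviewer
    "model_change_document": "mcd-2024-017.pdf",
}
check_deployment_gate(record)                 # passes silently
print("gate passed")
```

Encoding the controls as code is exactly the move the Meridian Financial case study makes: the Section IV independence requirement becomes an equality check the pipeline cannot skip, rather than a policy that depends on reviewers remembering it.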