Chapter 12 Exercises: From Model to Production — MLOps
Section A: The Deployment Gap
Exercise 12.1 — Diagnosing Deployment Failures
For each of the following scenarios, identify the primary root cause of the deployment failure using the categories from Section 12.1 (infrastructure gap, organizational silo, missing skills, inadequate testing, no monitoring plan, data pipeline gap, or "works on my laptop" problem). Then propose one concrete action that would have prevented the failure.
a) A data scientist builds a sentiment analysis model in a Jupyter notebook. It achieves 89% accuracy on test data. When the team tries to deploy it, they discover the model depends on a Python library version that conflicts with the company's production environment. The conflict takes three weeks to resolve.
b) A retail company deploys a demand forecasting model. Six months later, a supply chain manager notices that forecasts have been increasingly inaccurate. Investigation reveals that the model was trained on data from three physical stores, but the company opened two new stores four months ago with different customer demographics. No one updated the model or its training data.
c) A fraud detection model performs well in development using a static dataset of historical transactions. In production, it receives transactions in real time. The data scientist discovers that two critical features — "average transaction amount over 30 days" and "number of transactions this week" — cannot be computed in real time because the required aggregation queries take 45 seconds to run.
d) A data science team at a bank builds a credit scoring model. The model development team sits in the analytics department. The deployment team sits in IT. The IT team receives a model file with minimal documentation and is told to "deploy it." Six weeks later, the model is still not in production because the IT team doesn't understand the model's input requirements or serving pattern.
e) A healthcare company deploys an appointment no-show prediction model. The model runs successfully for several months. Then the company migrates its electronic health record system to a new vendor. The model continues to run without errors, but its accuracy drops from 82% to 54% because the new EHR system encodes patient visit history differently.
Exercise 12.2 — Quantifying the Deployment Gap
Your organization has 12 data scientists who have collectively built 47 models over the past two years. Of these 47 models, 6 are in production. Calculate the organization's deployment rate. Then write a one-paragraph analysis: Is this rate typical? What organizational factors might explain it? What would you recommend to improve the rate, and what is the single highest-leverage investment?
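As a starting point for the analysis, the deployment rate is a simple ratio. A minimal sketch using the figures from the exercise:

```python
# Deployment rate: fraction of models built that actually reached production.
models_built = 47
models_in_production = 6

deployment_rate = models_in_production / models_built
models_per_scientist = models_built / 12  # 12 data scientists, two years

print(f"Deployment rate: {deployment_rate:.1%}")
print(f"Models built per scientist: {models_per_scientist:.1f}")
```

The rate itself is the easy part; the written analysis should explain why the other 41 models stalled.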
Exercise 12.3 — The Business Case for MLOps
Your CEO has asked you to justify a $350,000 annual investment in MLOps infrastructure and an ML engineer hire. Currently, each model deployment takes 14 weeks of engineering effort. With MLOps infrastructure, you estimate the deployment time will drop to 4 weeks. Your organization plans to deploy 8 new models in the next 18 months, and you estimate each model generates an average of $500,000 in annual business value once deployed. Build a quantitative business case (including time-to-value acceleration, cost savings, and ROI) that would persuade a financially minded executive.
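One way to structure the arithmetic is sketched below. The fully loaded engineering cost per week is a hypothetical assumption you should replace with your own figure; everything else comes from the exercise statement.

```python
# Sketch of the MLOps business case over an 18-month horizon.
WEEKS_SAVED_PER_MODEL = 14 - 4       # deployment time drops from 14 to 4 weeks
MODELS_PLANNED = 8
ANNUAL_VALUE_PER_MODEL = 500_000
MLOPS_ANNUAL_COST = 350_000
ENGINEER_COST_PER_WEEK = 4_000       # ASSUMPTION: fully loaded weekly cost

# Time-to-value acceleration: each model starts earning ~10 weeks earlier.
value_acceleration = MODELS_PLANNED * WEEKS_SAVED_PER_MODEL * (ANNUAL_VALUE_PER_MODEL / 52)

# Direct savings: engineering weeks no longer spent on deployment plumbing.
engineering_savings = MODELS_PLANNED * WEEKS_SAVED_PER_MODEL * ENGINEER_COST_PER_WEEK

total_cost = MLOPS_ANNUAL_COST * 1.5  # 18 months of the annual investment
total_benefit = value_acceleration + engineering_savings
roi = (total_benefit - total_cost) / total_cost

print(f"Value acceleration:  ${value_acceleration:,.0f}")
print(f"Engineering savings: ${engineering_savings:,.0f}")
print(f"ROI over 18 months:  {roi:.0%}")
```

Note that the acceleration term dominates the savings term: the strongest argument to a financially minded executive is usually earlier value capture, not cheaper engineering.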
Section B: Architecture and Design
Exercise 12.4 — Serving Pattern Selection
For each of the following ML use cases, recommend a serving pattern (batch, real-time, edge, or serverless) and justify your choice. Address latency requirements, traffic patterns, and infrastructure considerations.
a) A bank wants to score all 2.3 million credit card customers for fraud risk every night so that the fraud investigation team can prioritize their morning review queue.
b) An e-commerce company wants to show personalized product recommendations to customers as they browse the website. Average daily traffic is 400,000 sessions.
c) A construction company wants to use a computer vision model to detect safety violations (workers without hard hats) on job sites in remote areas with unreliable internet connectivity.
d) A SaaS company wants to predict which free trial users will convert to paid subscriptions. The marketing team runs conversion campaigns once per week.
e) An agricultural technology company wants to classify crop diseases from smartphone photos taken by farmers in developing countries with limited cellular data access.
f) A small consulting firm wants to use a text classification model to auto-categorize incoming support emails. They receive 50-100 emails per day, with most arriving during business hours.
Exercise 12.5 — Pipeline Architecture Design
You are the ML engineer at a mid-size insurance company. The data science team has built a claims fraud detection model that needs to be deployed to production. Design an end-to-end ML pipeline by specifying:
a) The data sources and ingestion strategy
b) The data validation checks you would implement
c) The feature engineering approach and whether you would use a feature store
d) The model training and evaluation pipeline
e) The serving pattern and deployment strategy
f) The monitoring and alerting system
g) The retraining strategy
Present your design as a written architecture description (not a diagram). For each component, justify your design choice.
Exercise 12.6 — Feature Store Cost-Benefit Analysis
Your organization has three ML models in production — a churn prediction model, a recommendation engine, and a demand forecasting model. Currently, each model has its own feature engineering pipeline. You've discovered that 40% of the features are shared across at least two models, but they're computed independently (and sometimes inconsistently). A feature store implementation would cost $120,000 in engineering effort plus $30,000/year in infrastructure.
a) List the specific risks of the current approach (independent feature pipelines).
b) List the specific benefits of a feature store for this scenario.
c) Estimate the break-even point, considering engineering time savings, reduced inconsistency bugs, and faster time-to-deploy for new models. State your assumptions clearly.
d) Make a recommendation: should the organization invest in a feature store now, or wait? Justify your answer.
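For part (c), a back-of-envelope break-even calculation might look like the sketch below. Every per-item savings figure (hourly rate, hours saved, bug count, cost per bug) is a hypothetical assumption; the point of the exercise is to state yours explicitly and defend them.

```python
# Back-of-envelope break-even analysis for the feature store.
BUILD_COST = 120_000                  # one-time engineering effort
INFRA_COST_PER_YEAR = 30_000          # recurring infrastructure

ENGINEER_HOURLY = 100                 # ASSUMPTION: fully loaded rate
HOURS_SAVED_PER_MODEL_YEAR = 300      # ASSUMPTION: deduplicated pipeline work
MODELS = 3
INCONSISTENCY_BUGS_PER_YEAR = 4       # ASSUMPTION
COST_PER_BUG = 5_000                  # ASSUMPTION: diagnosis + fix + impact

annual_savings = (MODELS * HOURS_SAVED_PER_MODEL_YEAR * ENGINEER_HOURLY
                  + INCONSISTENCY_BUGS_PER_YEAR * COST_PER_BUG)
net_annual = annual_savings - INFRA_COST_PER_YEAR
break_even_years = BUILD_COST / net_annual

print(f"Annual net savings: ${net_annual:,}")
print(f"Break-even: {break_even_years:.1f} years")
```

Under these assumptions the store pays for itself in 1.5 years; part (d) then hinges on whether new models planned for that window change the math.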
Section C: Monitoring and Incident Response
Exercise 12.7 — Monitoring System Design
Design a monitoring system for a real-time product recommendation model deployed as a REST API. For each of the four monitoring levels (infrastructure, data, model performance, business impact), specify:
a) The specific metrics you would track
b) The alert thresholds for each metric
c) The alert severity (warning vs. critical) and escalation path
d) How you would distinguish between a temporary anomaly and a genuine problem
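One common answer to part (d) is debouncing: escalate only after several consecutive threshold breaches, so a single noisy reading does not page anyone. A minimal sketch (the threshold and breach count are illustrative assumptions):

```python
# Debounced alerting: fire only on a sustained breach, not a one-off blip.
class DebouncedAlert:
    """Fires when the metric breaches its threshold `consecutive` times in a row."""

    def __init__(self, threshold, consecutive=3):
        self.threshold = threshold
        self.consecutive = consecutive
        self.breaches = 0

    def observe(self, value):
        # Reset the streak on any healthy reading.
        self.breaches = self.breaches + 1 if value > self.threshold else 0
        return self.breaches >= self.consecutive

alert = DebouncedAlert(threshold=0.5, consecutive=3)
readings = [0.6, 0.4, 0.7, 0.8, 0.9]  # one blip, then a sustained breach
fired = [alert.observe(r) for r in readings]
print(fired)  # → [False, False, False, False, True]
```

The trade-off is detection latency: three five-minute intervals of debouncing means a genuine incident runs fifteen minutes before anyone is paged.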
Exercise 12.8 — Drift Detection Scenarios
For each of the following scenarios, determine whether the situation describes data drift, concept drift, both, or neither. Explain your reasoning and describe the appropriate response.
a) A retailer's churn model was trained on data from 2023. In 2024, the retailer launches a mobile app that attracts a younger demographic. The age distribution of the customer base shifts significantly.
b) A bank's loan default model was trained on data from a period of low interest rates. Interest rates rise sharply, causing borrowers who previously would not have defaulted to struggle with payments. The relationship between income-to-debt ratio and default probability changes.
c) A hospital's readmission prediction model experiences a sudden accuracy drop. Investigation reveals that the hospital's IT team performed a software update that changed the units of a lab test value from mg/dL to mmol/L.
d) An e-commerce company's recommendation model gradually becomes less effective. Investigation reveals that customer preferences have shifted — customers who previously bought office supplies are now buying home fitness equipment (a trend accelerated by remote work). The model was trained on pre-remote-work data.
e) A weather-dependent demand forecasting model performs poorly during an unusually warm winter. Temperature distributions are outside the range the model was trained on, and the model's assumption that cold weather drives hot beverage sales no longer holds because the winter is warm.
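Several of the scenarios above (a, c, e) would be caught by comparing feature distributions between training and production. A minimal sketch using the population stability index (PSI); the 0.2 alert threshold is a common rule of thumb, not a law, and the synthetic age data mirrors scenario (a):

```python
# Population stability index: a simple data-drift score between two samples.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between two 1-D samples; higher means more distribution shift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_ages = rng.normal(45, 10, 5_000)  # customer base at training time
prod_ages = rng.normal(32, 8, 5_000)    # younger app users (scenario a)

print(f"PSI: {psi(train_ages, prod_ages):.2f}")  # well above the 0.2 rule of thumb
```

Note what PSI cannot do: in scenario (b) the input distributions may barely move while the input-to-label relationship changes, so concept drift also needs performance monitoring against delayed ground truth.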
Exercise 12.9 — Incident Response Simulation
You are the on-call ML engineer at a fintech company. At 7:15 a.m. on Monday, you receive the following alert:
CRITICAL: Credit scoring model prediction distribution anomaly. Mean credit score output shifted from 645 to 580 over the past 6 hours. Z-score: 4.7. Threshold: 3.0.
The credit scoring model is a real-time API that scores loan applicants. If the mean score has genuinely dropped, the company would be approving fewer loans than normal — potentially turning away creditworthy applicants.
Walk through your incident response:
a) What is your immediate first action? (Before any investigation.)
b) What are the three most likely root causes you would investigate, in priority order?
c) For each root cause, describe the diagnostic steps you would take.
d) If you determine that the root cause is a data pipeline failure that sent incorrect income data, what is your mitigation strategy?
e) What changes would you recommend in the post-incident review to prevent recurrence?
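To ground the alert itself: a z-score like the one above is typically the shift in the window's mean prediction scaled by the standard error of that mean. A sketch, where the baseline standard deviation and window size are illustrative assumptions (the exercise does not give them):

```python
# How a prediction-distribution alert might compute its z-score.
import math

def mean_shift_z(baseline_mean, baseline_std, window_mean, window_n):
    """Z-score of a recent window's mean against the baseline distribution."""
    standard_error = baseline_std / math.sqrt(window_n)
    return (window_mean - baseline_mean) / standard_error

z = mean_shift_z(baseline_mean=645, baseline_std=60,   # ASSUMPTION: std
                 window_mean=580, window_n=25)          # ASSUMPTION: window size
print(f"z = {z:.1f}")
```

The same magnitude of shift produces a much larger |z| with a bigger window, which is worth keeping in mind when you set the threshold in part (e): window size is itself a sensitivity knob.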
Section D: MLOps Maturity and Strategy
Exercise 12.10 — Maturity Assessment
Evaluate the following organization's MLOps maturity using the Level 0-2 framework from Section 12.11. Cite specific evidence for your assessment and recommend three concrete actions to advance to the next level.
Organization Profile: A 500-person e-commerce company with a 6-person data science team and 2 data engineers. They have 4 models in production:
- A product recommendation model (deployed 18 months ago, never retrained)
- A search ranking model (deployed 10 months ago, retrained manually twice)
- A fraud detection model (deployed 6 months ago, retrained monthly by a data scientist who runs a notebook)
- A customer lifetime value model (deployed 3 months ago)
Models are trained in Jupyter notebooks and exported as pickle files. An engineer manually copies model files to production servers. There is no model registry. Monitoring consists of a Grafana dashboard that tracks API latency and error rates. There is no data drift monitoring. The team does not have on-call rotations — the senior data scientist's phone number is taped to a whiteboard with the label "Call if model breaks."
Exercise 12.11 — MLOps Roadmap Planning
You have been promoted to Head of ML Engineering at a consumer goods company. The company currently has 3 models in production (all Level 0) and plans to deploy 12 more over the next two years. You have budget for 2 additional ML engineers and $200,000 in tooling.
Create a 12-month MLOps roadmap that includes:
a) A phased plan (quarter by quarter) for advancing from Level 0 to Level 1 (at minimum)
b) Specific tools you would adopt at each phase, with justification
c) Team structure and responsibility assignments
d) Key milestones and success metrics for each phase
e) Risk factors and mitigation strategies
Exercise 12.12 — Build vs. Buy for MLOps Platforms
Your organization is deciding between building a custom MLOps stack using open-source tools (MLflow, Airflow, Feast, Evidently) and purchasing an end-to-end MLOps platform (e.g., Databricks, SageMaker, or Vertex AI). Create a comparison analysis across the following dimensions:
a) Initial implementation cost and time
b) Ongoing operational cost (including headcount)
c) Flexibility and customizability
d) Vendor lock-in risk
e) Skill requirements
f) Scalability
Make a recommendation for a mid-size company (50-person engineering team, 8-person data science team, 10 models in production). Justify your choice.
Section E: Organizational and Economic
Exercise 12.13 — Team Structure Design
A financial services firm has 15 data scientists spread across 4 business units (lending, insurance, trading, and compliance). Each business unit has 3-4 data scientists who build models independently. The firm has 2 data engineers and zero ML engineers. Models are deployed ad hoc, and three of the four business units have experienced production model failures in the past year.
a) Diagnose the organizational problem using the team structure models from Section 12.12.
b) Propose a new team structure, including any new hires.
c) Estimate the headcount investment and justify it quantitatively using the deployment gap data from the chapter.
d) Describe how you would handle the transition from the current structure to the new one, including potential resistance from the business units.
Exercise 12.14 — Cost Optimization Case Study
An online marketplace runs 6 ML models in production:
| Model | Serving Pattern | Monthly Inference Cost | Monthly Value |
|---|---|---|---|
| Product recommendations | Real-time | $18,000 | $120,000 |
| Search ranking | Real-time | $12,000 | $85,000 |
| Fraud detection | Real-time | $8,000 | $45,000 |
| Seller quality scoring | Batch (daily) | $2,500 | $30,000 |
| Demand forecasting | Batch (weekly) | $1,500 | $25,000 |
| Customer segmentation | Batch (monthly) | $800 | $15,000 |
a) Calculate the ROI for each model individually.
b) Identify which models have the weakest ROI and propose optimization strategies.
c) The CTO asks you to cut total inference costs by 30% without eliminating any models. Propose specific optimization strategies for each model and estimate the cost reduction achievable.
d) One of the real-time models could potentially be converted to batch prediction without significant business impact. Which one? Justify your answer with business logic.
Exercise 12.15 — Blameless Post-Mortem
Write a blameless post-mortem report for Athena's churn model incident (Section 12.8). Your report should include:
a) Incident Summary — What happened, when, and what was the impact?
b) Timeline — A minute-by-minute or hour-by-hour timeline of the incident from trigger to resolution
c) Root Cause Analysis — Use the "5 Whys" technique to trace from symptom to root cause
d) Contributing Factors — What organizational or process factors contributed to the incident's severity or delayed its detection?
e) Action Items — At least five specific, measurable action items with owners and deadlines
f) Lessons Learned — What did the organization learn that applies beyond this specific incident?
Section F: Integration and Synthesis
Exercise 12.16 — Full MLOps Architecture for a New Model
Athena's marketing team has requested a model that predicts which customers will respond to a promotional email campaign. The model should be run before each campaign (approximately twice per week) and should score all 2.1 million active customers, producing a ranked list of the top 200,000 most likely responders.
Design the full MLOps architecture for this model, addressing:
a) Serving pattern and justification
b) Feature engineering approach (considering existing features in Athena's feature store)
c) Model packaging and deployment strategy
d) Monitoring plan (all four levels)
e) Retraining strategy
f) Integration with Athena's existing MLOps infrastructure (Level 1)
g) Cost estimate (order of magnitude) and ROI justification
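The core computational step in the batch job — ranking 2.1 million scored customers and keeping the top 200,000 — does not require a full sort. A sketch with synthetic stand-in scores (`np.argpartition` selects the top-k in roughly linear time; only the 200,000 survivors are then sorted for the ranked list):

```python
# Batch scoring's ranking step: top 200,000 of 2.1M predicted response scores.
import numpy as np

rng = np.random.default_rng(42)
scores = rng.random(2_100_000)  # stand-in for the model's response probabilities
K = 200_000

top_k = np.argpartition(scores, -K)[-K:]        # unordered indices of the top-k
top_k = top_k[np.argsort(scores[top_k])[::-1]]  # rank them high-to-low

print(len(top_k))                                # → 200000
print(bool(scores[top_k[0]] >= scores[top_k[-1]]))  # → True (descending order)
```

At this scale the ranking step is trivial compared to feature retrieval and inference, which is part of why the batch pattern in part (a) is attractive here.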
Exercise 12.17 — Cross-Chapter Integration: From Problem to Production
Trace the complete journey of Athena's churn prediction model from inception to production, referencing specific concepts from each chapter:
a) How was the problem framed? (Chapter 6 — ML Canvas, Five Questions)
b) What model type was selected and why? (Chapter 7 — classification)
c) How was the model evaluated? (Chapter 11 — business-aligned metrics)
d) How was it deployed and monitored? (Chapter 12 — MLOps)
For each stage, identify the key decision that was made, who made it, and what the alternative was. Then identify the single point in this journey where the project was most at risk of failure, and explain why.
Exercise 12.18 — MLOps Maturity Benchmarking
Research and compare the MLOps capabilities of three organizations: one in financial services, one in technology, and one in retail. (You may use publicly available case studies, conference talks, or blog posts.) For each organization, assess:
a) Their approximate MLOps maturity level
b) The tools and platforms they use
c) The team structure supporting MLOps
d) Their model deployment frequency (how often they deploy new or updated models)
e) Their key challenges and lessons learned
Synthesize your findings into a one-page comparison table and a two-paragraph analysis of common patterns and differences.
These exercises span the strategic, organizational, technical, and economic dimensions of MLOps. They are designed to be worked individually and discussed in groups. Exercises 12.5, 12.11, and 12.16 make excellent team projects. Exercise 12.9 can be run as a classroom simulation — assign roles (on-call engineer, team lead, data engineer, business stakeholder) and walk through the incident response in real time, then have the class write the Exercise 12.15 post-mortem afterward.