Chapter 12 Quiz: From Model to Production — MLOps


Multiple Choice

Question 1. According to research cited in this chapter, approximately what percentage of ML models never make it to production?

a) 25%
b) 50%
c) 70%
d) 87%


Question 2. Which of the following best explains why ML systems require different operational practices than traditional software?

a) ML systems are more expensive to build
b) ML systems are written in Python, which is less reliable than Java or C++
c) ML system behavior depends on data as well as code, and can degrade gradually without producing errors
d) ML systems require GPUs, which are harder to manage than CPUs


Question 3. A retail company runs a churn prediction model every night at 2 a.m. to score all 1.5 million customers. The marketing team accesses the churn scores each morning. Which model serving pattern is this?

a) Real-time inference
b) Batch prediction
c) Edge deployment
d) Serverless inference


Question 4. Training-serving skew occurs when:

a) The model is trained on more data than it receives in production
b) The model performs well on training data but poorly on test data
c) Features are computed differently during training than during production serving
d) The model is deployed on different hardware than it was trained on


Question 5. Which serialization format is recommended for cross-language, cross-platform model deployment?

a) Pickle
b) Joblib
c) ONNX
d) CSV


Question 6. A model registry primarily serves which of the following functions?

a) Storing training data for future model retraining
b) Versioning, storing, and managing model artifacts with metadata and lineage tracking
c) Running model training experiments and logging hyperparameters
d) Serving model predictions via REST APIs


Question 7. A product recommendation model was trained on customer behavior data from 2023. In 2024, the customer base shifts significantly younger as the company expands to a new market. The distribution of the "age" feature changes substantially. This is an example of:

a) Concept drift
b) Data drift
c) Model overfitting
d) Training-serving skew


Question 8. A lending model was trained during a period of low interest rates. When interest rates rise sharply, the relationship between a borrower's debt-to-income ratio and their likelihood of default changes fundamentally — borrowers with the same ratio are now more likely to default. This is an example of:

a) Data drift
b) Concept drift
c) Feature store inconsistency
d) Infrastructure failure


Question 9. In Athena's first production incident (Section 12.8), the root cause was:

a) A bug in the model's algorithm
b) An upstream data pipeline change that caused a critical feature to become null
c) A hardware failure in the production server
d) A sudden change in customer behavior that the model couldn't predict


Question 10. Which retraining strategy involves running a new model alongside the current production model and comparing their performance before switching?

a) Scheduled retraining
b) Triggered retraining
c) Champion-challenger deployment
d) Continuous training


Question 11. In a canary deployment, the new model version initially receives:

a) 100% of production traffic
b) 50% of production traffic
c) A small percentage (typically 1-5%) of production traffic
d) No production traffic — it runs only on test data


Question 12. At MLOps Maturity Level 0, which of the following is typically true?

a) Models are trained using automated pipelines
b) Model deployment is fully automated with CI/CD
c) Models are trained in notebooks and deployed through manual, ad hoc processes
d) Comprehensive monitoring and automated retraining are in place


Question 13. According to the chapter, what is the most common staffing mistake organizations make when scaling ML operations?

a) Hiring too many ML engineers and not enough data scientists
b) Hiring too many data scientists and not enough ML engineers
c) Hiring too many product managers
d) Not hiring a Chief AI Officer


Question 14. Which of the following is NOT one of the three pillars of MLOps as defined in Section 12.2?

a) Data
b) Model
c) Hardware
d) Code


Question 15. A model that is running without errors, accepting inputs, and returning predictions — but producing systematically incorrect results — illustrates which key challenge of ML operations?

a) The deployment gap
b) The cold start problem
c) Silent model failure
d) Feature store inconsistency


Question 16. The Population Stability Index (PSI) is used to detect:

a) Whether a model has been deployed correctly
b) Whether the distribution of input features has changed over time
c) Whether the model's inference latency meets SLA requirements
d) Whether the model's code has been modified since deployment


Question 17. Ravi Mehta's decision to hire an ML engineer instead of an additional data scientist reflects which operational insight?

a) ML engineers are less expensive than data scientists
b) ML engineers can build models faster than data scientists
c) The deployment bottleneck, not the modeling bottleneck, is the primary constraint on getting value from ML
d) Data scientists are not needed once models are in production


Question 18. Which of the following monitoring levels answers the question "Is the model making good predictions?"

a) Infrastructure monitoring
b) Data monitoring
c) Model performance monitoring
d) Business impact monitoring


Question 19. A data contract between a data engineering team and a data science team specifies:

a) The salary and benefits of data engineers
b) The columns, data types, and value ranges that the ML pipeline depends on, requiring notification before changes
c) The service-level agreement for model inference latency
d) The intellectual property ownership of model artifacts


Question 20. Which cost category is most frequently underestimated in production ML, according to both this chapter and Sculley et al.'s research?

a) Training compute
b) Data acquisition
c) Maintenance and ongoing operations
d) Initial infrastructure setup


True or False

Question 21. A feature store's primary purpose is to make model training faster by caching data on GPUs.


Question 22. In a champion-challenger deployment, the challenger model's predictions are served to end users alongside the champion model's predictions.


Question 23. Pickle files carry a security risk because loading a pickle file from an untrusted source can execute arbitrary Python code.


Question 24. At MLOps Maturity Level 2, model retraining can be triggered automatically based on monitoring signals.


Question 25. Inference costs (the cost of running a model on each new prediction request) typically exceed training costs over the model's production lifetime for high-traffic applications.


Answer Key

  1. d) Gartner estimated that approximately 87% of ML models never reach production.

  2. c) ML system behavior depends on data as well as code, and models can degrade gradually (through data drift or concept drift) without producing traditional software errors. This is the fundamental distinction between ML systems and deterministic software.

  3. b) Batch prediction — the model processes a large dataset at scheduled intervals (nightly) and stores pre-computed predictions for later consumption.

  4. c) Training-serving skew occurs when features are computed differently during training versus serving, causing the model to receive inputs it wasn't trained on. The Athena example — where average order value was computed with and without returns in different environments — illustrates this.
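
The Athena example can be made concrete with a small sketch. All the numbers and the filtering rule are invented for illustration; the point is only that two pipelines computing "the same" feature from the same records can disagree:

```python
# One customer's order amounts; the negative entry is a return/refund.
orders = [120.0, 80.0, -30.0, 100.0]

# Training pipeline: returns were filtered out before averaging.
purchases = [o for o in orders if o > 0]
train_aov = sum(purchases) / len(purchases)

# Serving pipeline: the raw order log was averaged, returns included.
serve_aov = sum(orders) / len(orders)

print(train_aov, serve_aov)  # 100.0 vs 67.5 — same customer, different feature
```

The model learned on features like `train_aov` but receives features like `serve_aov` at prediction time, so its inputs are silently out of distribution.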

  5. c) ONNX (Open Neural Network Exchange) is designed for cross-language, cross-platform model deployment. Pickle and Joblib are Python-specific.

  6. b) A model registry is a centralized repository for versioning, storing, and managing model artifacts along with metadata (metrics, lineage, stage) and promotion workflows.

  7. b) Data drift — the distribution of input features (age) changed, even though the relationship between features and target may not have changed.

  8. b) Concept drift — the fundamental relationship between the input features (debt-to-income ratio) and the target (default probability) has changed due to the new interest rate environment.

  9. b) The data engineering team renamed a column, causing the feature pipeline to return nulls instead of values. The model continued to run but produced unreliable predictions with the missing feature.

  10. c) Champion-challenger deployment runs the new model (challenger) alongside the current model (champion) and compares their predictions before promoting the challenger.

  11. c) In canary deployments, the new model receives a small percentage of traffic (typically 1-5%) to limit the blast radius of potential failures.
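
A canary router can be sketched in a few lines. The 5% fraction and the stand-in models below are illustrative, not from the chapter:

```python
import random

CANARY_FRACTION = 0.05  # the new model gets ~5% of traffic

def route(request, champion_model, canary_model):
    """Send a small slice of traffic to the canary; everything else
    goes to the current production (champion) model. Both paths are
    monitored, and the canary is rolled back if it regresses."""
    if random.random() < CANARY_FRACTION:
        return canary_model(request), "canary"
    return champion_model(request), "champion"

# Simulate 10,000 requests with two stand-in models
random.seed(0)
champion = lambda r: "v1-prediction"
canary = lambda r: "v2-prediction"
labels = [route(i, champion, canary)[1] for i in range(10_000)]
canary_share = labels.count("canary") / len(labels)
print(canary_share)  # close to 0.05
```

In practice the split is usually keyed on a stable hash of a user or request ID rather than `random.random()`, so a given user consistently sees one model version.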

  12. c) Level 0 is characterized by manual, ad hoc processes — notebooks for training, manual deployment, no automated testing or monitoring.

  13. b) Organizations typically over-hire data scientists and under-hire ML engineers, which creates a bottleneck at deployment. The recommended ratio is approximately 1 ML engineer for every 1-2 data scientists.

  14. c) The three pillars are data, model, and code. Hardware is an infrastructure concern, not one of the conceptual pillars.

  15. c) Silent model failure — the model produces predictions that appear normal (no errors, proper formatting) but are systematically incorrect. This is why monitoring is essential.

  16. b) PSI measures the degree of change in a variable's distribution between two samples (e.g., training data vs. current production data). It is a standard drift detection metric.
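
A minimal pure-Python PSI sketch, assuming equal-width bins derived from the training sample (production implementations typically use quantile bins); the age distributions echo the scenario from Question 7 and are invented:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = int((x - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each fraction so the log below never sees zero
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(0)
train_ages = [rng.gauss(45, 10) for _ in range(5_000)]  # training-time population
prod_ages = [rng.gauss(32, 8) for _ in range(5_000)]    # younger production population
print(psi(train_ages, train_ages))        # 0.0 — identical samples
print(psi(train_ages, prod_ages) > 0.25)  # True — significant drift
```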

  17. c) Ravi recognized that the bottleneck was deployment, not modeling. Adding another data scientist would produce more models in notebooks; adding an ML engineer would get existing models into production faster.

  18. c) Model performance monitoring tracks prediction quality — accuracy against ground truth, prediction distribution shifts, and subgroup performance.

  19. b) A data contract is a formal agreement specifying which columns, types, and value ranges the downstream system depends on, with requirements for notification before changes.
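
A data contract can also be enforced mechanically at pipeline boundaries. The sketch below validates records against a hypothetical contract; the column names, types, and ranges are invented for illustration:

```python
# Hypothetical contract for an ML pipeline's input table
CONTRACT = {
    "customer_id": {"type": str},
    "age": {"type": int, "min": 0, "max": 120},
    "avg_order_value": {"type": float, "min": 0.0},
}

def validate_row(row: dict) -> list:
    """Return a list of contract violations for one record (empty if clean)."""
    errors = []
    for col, spec in CONTRACT.items():
        if col not in row:
            errors.append(f"missing column: {col}")
            continue
        val = row[col]
        if val is None or not isinstance(val, spec["type"]):
            errors.append(f"{col}: expected {spec['type'].__name__}, got {val!r}")
            continue
        if "min" in spec and val < spec["min"]:
            errors.append(f"{col}: {val} below minimum {spec['min']}")
        if "max" in spec and val > spec["max"]:
            errors.append(f"{col}: {val} above maximum {spec['max']}")
    return errors

print(validate_row({"customer_id": "c1", "age": 34, "avg_order_value": 120.5}))
print(validate_row({"customer_id": "c2", "age": 34, "aov": 120.5}))  # renamed upstream
```

Running such checks where data is handed off would have caught the renamed-column incident from Question 9 before the model ever saw null features.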

  20. c) Both this chapter and Sculley et al.'s "Hidden Technical Debt in Machine Learning Systems" emphasize that maintenance and ongoing operational costs far exceed initial development costs and are consistently underestimated.

  21. False. A feature store's primary purpose is to centralize feature definitions and ensure consistent feature computation across training and serving environments, eliminating training-serving skew and enabling feature reuse.

  22. False. In champion-challenger deployment, the challenger model runs in shadow mode — it makes predictions on the same inputs, but its predictions are logged for comparison, not served to end users. Only the champion's predictions are served.
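
Shadow-mode serving can be sketched as follows; the stand-in models and logging setup are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def serve(request, champion, challenger):
    """Return the champion's prediction to the caller; run the challenger
    on the same input in shadow mode and log its output for offline
    comparison. A challenger failure never affects the served response."""
    served = champion(request)
    try:
        shadow = challenger(request)  # computed and logged, never returned
        log.info("request=%r champion=%r challenger=%r", request, served, shadow)
    except Exception:
        log.exception("challenger failed in shadow mode")
    return served

print(serve({"dti": 0.4}, lambda r: "approve", lambda r: "deny"))  # approve
```

The logged pairs are later joined with ground truth to decide whether the challenger should be promoted.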

  23. True. Pickle deserialization can execute arbitrary Python code, making it a potential security vector. The chapter recommends never loading pickle files from untrusted sources and considering ONNX or containerized approaches for production.
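
The risk is easy to demonstrate: pickle's `__reduce__` hook lets a crafted payload call any function with any arguments at load time. Here the payload merely evaluates an arithmetic expression, standing in for genuinely malicious code:

```python
import pickle

class Exploit:
    # __reduce__ tells pickle how to reconstruct the object; an attacker
    # controls both the callable and its arguments.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for malicious code

payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # merely "loading" runs eval("6 * 7")
print(result)  # → 42
```

A real payload could just as easily invoke `os.system`, which is why untrusted pickle files must never be loaded.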

  24. True. Level 2 (CI/CD Pipeline Automation) includes triggered retraining based on monitoring signals — when data drift exceeds thresholds or model performance drops below minimums, retraining is initiated automatically.

  25. True. For high-traffic models, per-request inference costs accumulate over the model's lifetime and frequently exceed the one-time training cost, especially for real-time serving patterns.
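
A back-of-the-envelope comparison makes the point; every number below is hypothetical:

```python
training_cost = 5_000.0       # one-time training run, USD
cost_per_1k_requests = 0.02   # real-time inference, USD per 1,000 predictions
requests_per_day = 2_000_000
lifetime_days = 365

inference_cost = cost_per_1k_requests * requests_per_day / 1_000 * lifetime_days
print(inference_cost)  # ~14,600 USD — roughly 3x the training cost in one year
```

Under these assumptions, per-request serving overtakes the training bill within a few months, which is why high-traffic systems invest in model compression and batching.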