Chapter 34: Quiz -- MLOps and LLMOps
Question 1
What is the primary purpose of experiment tracking in MLOps?
A) To make training faster B) To ensure reproducibility and enable comparison of different training runs C) To reduce the model size D) To improve data quality
Answer: B Explanation: Experiment tracking systematically records hyperparameters, metrics, artifacts, and environment details for each training run, enabling reproducibility and informed comparison of different approaches.
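A minimal tracking sketch, assuming MLflow and scikit-learn are installed; the experiment name, run name, and hyperparameters below are placeholders.

```python
# Log hyperparameters, a metric, and the model artifact for one training run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

params = {"C": 0.5, "max_iter": 200}

mlflow.set_experiment("quiz-demo")                 # hypothetical experiment name
with mlflow.start_run(run_name="logreg-baseline"):
    mlflow.log_params(params)                      # hyperparameters
    model = LogisticRegression(**params).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    mlflow.log_metric("test_accuracy", acc)        # metrics
    mlflow.sklearn.log_model(model, "model")       # artifact: the trained model
```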
Question 2
Which MLOps maturity level includes automated CI/CD for ML, continuous training, and full observability?
A) Level 0 B) Level 1 C) Level 2 D) Level 3
Answer: C Explanation: Google's MLOps maturity model defines Level 0 as manual processes, Level 1 as ML pipeline automation with basic monitoring, and Level 2 as full CI/CD for ML with automated testing, continuous training, A/B testing, and comprehensive observability.
Question 3
What does DVC (Data Version Control) primarily address?
A) Model serving at scale B) Versioning large datasets and ML pipelines alongside code in Git C) Distributed model training D) Hyperparameter optimization
Answer: B Explanation: DVC extends Git to handle large files, datasets, and ML pipelines. It stores file metadata in Git while keeping actual data in remote storage (S3, GCS, etc.), enabling data versioning alongside code versioning.
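A small illustration of reading a DVC-versioned dataset through its Python API; the repository URL, file path, and tag are hypothetical, and remote storage access is assumed to be configured.

```python
# Open a specific version of a DVC-tracked file, pinned by a Git revision.
import dvc.api

with dvc.api.open(
    "data/train.csv",                              # path tracked by DVC in the repo
    repo="https://github.com/org/ml-project",      # hypothetical repository
    rev="v1.2.0",                                  # Git tag/commit pinning the data version
) as f:
    header = f.readline()
```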
Question 4
What is Population Stability Index (PSI) used for?
A) Measuring model accuracy over time B) Detecting distribution shift between a reference dataset and a new data batch C) Computing feature importance D) Optimizing hyperparameters
Answer: B Explanation: PSI measures how much a variable's distribution has shifted between two datasets (typically training data vs production data). PSI < 0.1 indicates no significant shift, 0.1-0.25 indicates moderate shift, and > 0.25 indicates significant shift requiring investigation.
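A rough way to compute PSI for a numerical feature, using quantile bins derived from the reference sample; the bin count, thresholds, and synthetic data are illustrative.

```python
# Population Stability Index between a reference sample and a new batch.
import numpy as np

def psi(reference, current, n_bins=10, eps=1e-6):
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    # Clip both samples into the reference range so every value is counted.
    ref_counts, _ = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)
    cur_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    ref_frac = np.clip(ref_counts / len(reference), eps, None)   # avoid log(0)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
prod_feature = rng.normal(0.3, 1.2, 10_000)        # shifted distribution
print(psi(train_feature, prod_feature))            # likely above the 0.1 "moderate" threshold
```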
Question 5
In a canary deployment strategy, what is the key principle?
A) Deploy the new model to all users simultaneously B) Gradually shift traffic from the old model to the new model while monitoring for regressions C) Test the model only in a staging environment D) Use shadow mode where both models run but only one serves traffic
Answer: B Explanation: Canary deployment gradually increases traffic to the new model version (e.g., 1% to 5% to 25% to 100%) while continuously monitoring metrics. If problems are detected, traffic is automatically shifted back to the old version.
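A toy router illustrating the idea, with stub functions standing in for the two model versions; in a real rollout, deployment tooling would adjust the fraction in steps and roll back automatically on regressions.

```python
# Route a configurable fraction of requests to the canary model version.
import random

CANARY_FRACTION = 0.05          # start small (e.g. 5%), raise gradually while monitoring

def predict_old(request):       # placeholder for the current production model
    return {"model": "v1", "score": 0.70}

def predict_new(request):       # placeholder for the candidate model
    return {"model": "v2", "score": 0.72}

def route(request):
    if random.random() < CANARY_FRACTION:
        return predict_new(request)
    return predict_old(request)
```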
Question 6
What is the primary difference between data drift and concept drift?
A) Data drift affects inputs while concept drift affects the input-output relationship B) Data drift is faster than concept drift C) Data drift only affects numerical features D) Concept drift only occurs in classification problems
Answer: A Explanation: Data drift (covariate shift) occurs when the input feature distribution changes (P(X) shifts). Concept drift occurs when the relationship between inputs and outputs changes (P(Y|X) shifts). Both require monitoring but have different detection approaches and implications.
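One common way to flag data drift on a numerical feature is a two-sample Kolmogorov-Smirnov test (SciPy); the significance threshold and synthetic data below are illustrative. Concept drift, by contrast, needs labels, e.g. tracking live accuracy against delayed ground truth.

```python
# Flag a shift in P(X) for one feature with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, 5_000)        # reference (training) sample
prod_feature = rng.normal(0.4, 1.0, 5_000)         # recent production sample

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"data drift suspected (KS statistic={stat:.3f})")
```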
Question 7
What does the model registry provide in an MLOps pipeline?
A) A place to store training data B) A centralized system for versioning, staging, and managing model lifecycle transitions C) A tool for hyperparameter search D) A monitoring dashboard
Answer: B Explanation: The model registry serves as the central hub for model management, tracking model versions with metadata, managing stage transitions (development, staging, production, archived), and enabling rollback when needed.
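A sketch using MLflow's model registry as one concrete example; the model name and run id are placeholders, and recent MLflow versions favor aliases over the stage API shown here.

```python
# Register a logged model and promote the new version to Staging.
import mlflow
from mlflow.tracking import MlflowClient

run_id = "..."  # id of a completed training run that logged a model

version = mlflow.register_model(f"runs:/{run_id}/model", "churn-classifier")

client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier", version=version.version, stage="Staging"
)
```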
Question 8
What is the purpose of shadow mode deployment?
A) To save compute costs by running models less frequently B) To run the new model in parallel with the old model on real traffic without serving its predictions, for safe comparison C) To train models in the background D) To anonymize user data
Answer: B Explanation: Shadow mode (or dark launching) runs the new model on real production traffic but does not serve its predictions to users. Both models' outputs are logged and compared, allowing validation of the new model's behavior on real data without risk to users.
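A minimal request handler showing the pattern, with stub functions standing in for the real models.

```python
# Serve the production model; run the candidate in shadow and only log its output.
import logging

logger = logging.getLogger("shadow")

def production_model(features):   # placeholder for the live model
    return 0.71

def candidate_model(features):    # placeholder for the shadowed model
    return 0.68

def handle_request(features):
    served = production_model(features)
    try:
        shadow = candidate_model(features)           # never returned to the user
        logger.info("shadow_compare served=%s shadow=%s", served, shadow)
    except Exception:
        logger.exception("shadow model failed")      # shadow errors must not affect users
    return served
```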
Question 9
Which is NOT a typical component of an LLMOps pipeline?
A) Prompt versioning and management B) Token usage and cost monitoring C) GPU kernel optimization D) Output guardrails and safety filtering
Answer: C Explanation: LLMOps focuses on operational concerns specific to LLM applications: prompt management, evaluation, cost monitoring, guardrails, and observability. GPU kernel optimization is a model development concern, not an operational concern specific to LLM deployment.
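As one example of the operational concerns listed above, token and cost monitoring can start as simple per-call accounting; the prices below are purely illustrative and would come from the provider's current rate card in practice.

```python
# Toy per-call token and cost accounting for an LLM endpoint.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}   # USD per 1K tokens, illustrative only

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K["input"] + \
           (output_tokens / 1000) * PRICE_PER_1K["output"]

print(f"${call_cost(1_200, 350):.5f}")
```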
Question 10
What is the purpose of guardrails in LLM applications?
A) To improve model training speed B) To enforce input/output safety constraints, prevent harmful content, and ensure responses meet quality standards C) To reduce API costs D) To version prompts
Answer: B Explanation: Guardrails are programmatic checks applied to LLM inputs and outputs. Input guardrails may filter off-topic queries and redact PII. Output guardrails may check for harmful content, factual consistency, and format compliance. They are a critical safety layer in production LLM systems.
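A toy sketch of the pattern; the regex, denylist, and length limit are illustrative stand-ins for real policy checks.

```python
# Minimal input/output guardrails: redact emails, block denylisted topics, cap length.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DENYLIST = ["how to build a weapon"]                  # illustrative only

def input_guardrail(user_text: str) -> str:
    return EMAIL_RE.sub("[REDACTED_EMAIL]", user_text)

def output_guardrail(response: str, max_chars: int = 2000) -> str:
    if any(phrase in response.lower() for phrase in DENYLIST):
        return "I can't help with that request."
    return response[:max_chars]

safe_prompt = input_guardrail("My email is jane@example.com, summarize my bill.")
```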
Question 11
In A/B testing for ML models, why is statistical significance important?
A) It makes the model more accurate B) It ensures that observed differences in metrics are unlikely due to random chance C) It reduces deployment time D) It improves data quality
Answer: B Explanation: Statistical significance testing (e.g., chi-squared test, t-test) determines whether the observed performance difference between model variants is real or could be explained by random variation in the traffic split. Without it, you might deploy a worse model based on noise.
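A quick sketch using SciPy's chi-squared test on a 2x2 contingency table of conversions from the two variants; the counts are made up.

```python
# Is the conversion difference between model A and model B larger than chance explains?
import numpy as np
from scipy.stats import chi2_contingency

#                        successes, failures
contingency = np.array([[480, 9520],     # model A: 10,000 requests
                        [545, 9455]])    # model B: 10,000 requests

chi2, p_value, dof, expected = chi2_contingency(contingency)
if p_value < 0.05:
    print(f"difference is statistically significant (p={p_value:.4f})")
else:
    print(f"difference could be noise (p={p_value:.4f})")
```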
Question 12
What is the key advantage of continuous training in MLOps?
A) It eliminates the need for monitoring B) It automatically retrains models when triggered by data drift, performance degradation, or schedules, keeping models current C) It makes training faster D) It requires less data
Answer: B Explanation: Continuous training automates the retraining loop, triggering model updates based on detected drift, performance degradation, new data availability, or scheduled intervals. This keeps models current without manual intervention.
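A sketch of the trigger logic; the thresholds and retraining cadence are illustrative.

```python
# Retrain when drift, degradation, or the schedule fires, whichever comes first.
from datetime import datetime, timedelta, timezone

def should_retrain(psi_max: float, live_accuracy: float, last_trained: datetime) -> bool:
    drift_trigger = psi_max > 0.25                                   # significant input drift
    quality_trigger = live_accuracy < 0.80                           # degraded live performance
    schedule_trigger = datetime.now(timezone.utc) - last_trained > timedelta(days=30)
    return drift_trigger or quality_trigger or schedule_trigger

# e.g. should_retrain(0.07, 0.83, datetime(2024, 1, 1, tzinfo=timezone.utc))
```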
Question 13
Which metric is commonly used to evaluate LLM output faithfulness in a RAG system?
A) BLEU score B) The fraction of claims in the response that are supported by the retrieved context C) Perplexity D) Token generation speed
Answer: B Explanation: Faithfulness measures whether the LLM's response contains only information supported by the retrieved context. It is typically computed by extracting claims from the response and verifying each against the context using NLI models or LLM-based evaluation.
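A schematic version of that computation; `extract_claims` and `is_supported` are naive placeholders for what would be an LLM- or NLI-based judgment in practice.

```python
# Faithfulness = fraction of response claims supported by the retrieved context.
def extract_claims(response: str) -> list[str]:
    return [s.strip() for s in response.split(".") if s.strip()]   # naive sentence split

def is_supported(claim: str, context: str) -> bool:
    return claim.lower() in context.lower()                        # stand-in for NLI/LLM check

def faithfulness(response: str, context: str) -> float:
    claims = extract_claims(response)
    if not claims:
        return 1.0
    return sum(is_supported(c, context) for c in claims) / len(claims)
```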
Question 14
What problem does a feature store solve in MLOps?
A) It speeds up model training B) It provides a centralized, consistent, and versioned repository for features used across training and serving C) It replaces the need for a database D) It automatically selects the best features
Answer: B Explanation: Feature stores solve the training-serving skew problem by providing a single source of truth for feature definitions and computation. They ensure that features computed during training match those used in production serving, while also enabling feature reuse across teams.
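A stripped-down illustration of the fix for training-serving skew: a single feature definition shared by the offline training job and the online serving path, which a feature store formalizes with versioning and point-in-time correctness.

```python
# One source of truth for the feature logic, reused offline and online.
import time

def days_since_last_purchase(last_purchase_ts: float, as_of_ts: float) -> float:
    """Shared feature definition used by both training and serving."""
    return (as_of_ts - last_purchase_ts) / 86_400.0

# Offline (training): computed as of each historical label timestamp.
train_value = days_since_last_purchase(1_700_000_000, 1_702_592_000)

# Online (serving): computed from the live request at inference time.
serve_value = days_since_last_purchase(1_700_000_000, time.time())
```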
Question 15
What is the purpose of model quality gates in a CI/CD pipeline?
A) To speed up deployment B) To automatically block deployment of models that fail to meet minimum performance thresholds on validation data C) To reduce compute costs D) To version model artifacts
Answer: B Explanation: Quality gates are automated checks that prevent models from progressing through the pipeline if they fail to meet criteria such as minimum accuracy, maximum latency, fairness thresholds, or regression tests against the current production model.
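A sketch of a gate script that a CI pipeline could run before allowing promotion; the metrics, thresholds, and regression margin are illustrative.

```python
# Exit non-zero (failing the pipeline) if the candidate misses thresholds or regresses.
import sys

THRESHOLDS = {"min_accuracy": 0.85, "max_p95_latency_ms": 200}

def passes_gates(candidate: dict, production: dict) -> bool:
    if candidate["accuracy"] < THRESHOLDS["min_accuracy"]:
        return False
    if candidate["p95_latency_ms"] > THRESHOLDS["max_p95_latency_ms"]:
        return False
    if candidate["accuracy"] < production["accuracy"] - 0.01:   # no meaningful regression
        return False
    return True

candidate = {"accuracy": 0.88, "p95_latency_ms": 150}
production = {"accuracy": 0.86, "p95_latency_ms": 140}
sys.exit(0 if passes_gates(candidate, production) else 1)
```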
Scoring Guide
| Score | Level | Recommendation |
|---|---|---|
| 14-15 | Expert | Ready to build production MLOps pipelines |
| 11-13 | Advanced | Strong foundation, practice with real deployments |
| 8-10 | Intermediate | Good understanding, build end-to-end pipelines |
| 5-7 | Developing | Review monitoring and deployment concepts |
| 0-4 | Beginning | Re-read the chapter focusing on the ML lifecycle |