Exercises: Chapter 36
The Road to Advanced: Deep Learning, Causal Inference, MLOps, and Where to Go Next
Exercise 1: When to Use Deep Learning (Conceptual)
For each of the following scenarios, recommend whether to start with gradient boosting or deep learning. Justify your answer in 2--3 sentences. If deep learning, name the architecture family (CNN, RNN/LSTM, Transformer).
a) Predicting loan default from 45 tabular features (income, credit score, employment history, loan amount, etc.) on a dataset of 200,000 applications.
b) Classifying customer support tickets into 12 categories based on the ticket text (average length: 150 words).
c) Detecting pneumonia from chest X-rays using a dataset of 15,000 labeled radiographs.
d) Forecasting daily sales for 500 retail products using 3 years of historical data plus calendar features, promotions, and weather.
e) Identifying defective circuit boards from high-resolution photographs taken on the production line.
f) Predicting employee attrition from HR data (tenure, department, salary, performance rating, commute distance, survey responses --- 22 features, 5,000 records).
Exercise 2: PyTorch Fundamentals (Code)
Modify the PyTorch hello world example from the chapter to classify the breast cancer dataset (binary classification) instead of the iris dataset. The network should have three layers: the input features, one hidden layer with 32 neurons, and a single sigmoid output unit.
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import numpy as np
np.random.seed(42)
torch.manual_seed(42)
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Convert to tensors
X_train_t = torch.FloatTensor(X_train)
y_train_t = torch.FloatTensor(y_train).unsqueeze(1) # Shape: (n, 1)
X_test_t = torch.FloatTensor(X_test)
y_test_t = torch.FloatTensor(y_test).unsqueeze(1)
# TODO: Define a 3-layer network for binary classification
# Hint: Use nn.Sequential with Linear, ReLU, and a Sigmoid output.
# The loss function for binary classification is nn.BCELoss().
# TODO: Train for 200 epochs with Adam optimizer (lr=0.001).
# TODO: Evaluate on the test set and print a classification_report.
Questions:
a) What test accuracy do you achieve? How does it compare to a logistic regression baseline (use scikit-learn)?
b) Add a second hidden layer with 16 neurons. Does performance improve?
c) What happens if you remove the StandardScaler? Why?
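For reference, one minimal way to fill in the TODOs. This is a sketch, not the only valid solution: the layer sizes, loss, optimizer, and epoch count follow the exercise statement, and the setup code is repeated so the block runs standalone. A logistic regression baseline is included for question (a).

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

torch.manual_seed(42)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train_t = torch.FloatTensor(X_train)
y_train_t = torch.FloatTensor(y_train).unsqueeze(1)  # Shape: (n, 1)
X_test_t = torch.FloatTensor(X_test)

# Input (30 features) -> hidden (32 neurons, ReLU) -> sigmoid output
model = nn.Sequential(
    nn.Linear(X_train.shape[1], 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Full-batch training for 200 epochs
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

# Evaluate with a 0.5 decision threshold
with torch.no_grad():
    y_pred = (model(X_test_t) >= 0.5).float().squeeze(1).numpy()
acc = (y_pred == y_test).mean()
print(f"Neural net test accuracy: {acc:.3f}")

# Baseline for question (a)
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Logistic regression baseline: {baseline.score(X_test, y_test):.3f}")
```

On this dataset, expect both models to land in a similar range: the tabular data is small and nearly linearly separable, which previews the answer to question (a).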
Exercise 3: Causal Inference --- Identifying the Question (Conceptual)
For each scenario, state whether the question is predictive or causal. If causal, identify the treatment, the outcome, and one potential confounder that could bias a naive comparison.
a) StreamFlow wants to know: "Which subscribers are most likely to churn next month?"
b) StreamFlow wants to know: "Does sending a 20% discount to at-risk subscribers reduce churn?"
c) The manufacturing team wants to know: "Which machines are most likely to fail in the next 30 days?"
d) The manufacturing team wants to know: "Does the new preventive maintenance schedule reduce unplanned downtime?"
e) The hospital wants to know: "Which patients are at high risk of readmission within 30 days?"
f) The hospital wants to know: "Does a nurse follow-up call within 48 hours of discharge reduce readmission?"
Exercise 4: Difference-in-Differences Simulation (Code)
StreamFlow launched a premium support feature (priority chat, dedicated account manager) for subscribers on the Business plan in January. Subscribers on the Professional plan did not get the feature. You want to estimate the causal effect on churn.
Simulate the following scenario and estimate the DiD effect:
import numpy as np
import pandas as pd
np.random.seed(42)
n_per_group = 1000
# Pre-treatment period (October-December): monthly churn rates
# Both groups have a seasonal decline in churn heading into winter
pre_business_churn = [0.15, 0.14, 0.13] # Business plan (treatment group)
pre_professional_churn = [0.12, 0.11, 0.10] # Professional plan (control)
# Post-treatment period (January-March):
# Business plan got premium support (true effect: -4pp reduction)
# Both groups continue seasonal trend
post_business_churn = [0.08, 0.07, 0.07] # Treatment effect included
post_professional_churn = [0.09, 0.08, 0.08] # No treatment, just trend
# TODO: Compute the DiD estimate using the average pre and post rates.
# TODO: Create a visualization showing the parallel trends assumption.
# Plot pre and post churn rates for both groups.
# Use matplotlib. Add vertical line at treatment date.
# TODO: Compute a 95% confidence interval for the DiD estimate
# using bootstrap resampling (1000 iterations).
Questions:
a) What is your DiD estimate of the causal effect of premium support?
b) Do the pre-treatment trends appear parallel? Why does this matter?
c) Name one threat to the validity of this DiD analysis. How would you address it?
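As a point of reference, the point-estimate TODO reduces to a four-number computation (a sketch; the parallel-trends plot and the bootstrap confidence interval are left as stated in the exercise):

```python
import numpy as np

pre_business = [0.15, 0.14, 0.13]       # treatment group, pre-period
pre_professional = [0.12, 0.11, 0.10]   # control group, pre-period
post_business = [0.08, 0.07, 0.07]      # treatment group, post-period
post_professional = [0.09, 0.08, 0.08]  # control group, post-period

# DiD = (treatment change) - (control change):
# the control group's change absorbs the shared seasonal trend.
treat_change = np.mean(post_business) - np.mean(pre_business)
control_change = np.mean(post_professional) - np.mean(pre_professional)
did = treat_change - control_change
print(f"DiD estimate: {did:.3f}")  # -0.040, i.e. a 4pp churn reduction
```

Note that the estimate recovers the simulated -4pp effect exactly here because the seasonal trends were constructed to be parallel; with noisy real data, the bootstrap interval is what tells you how seriously to take the point estimate.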
Exercise 5: MLOps Maturity Assessment (Conceptual + Code)
Using the data maturity assessment concept from Chapter 34, create an MLOps-specific maturity assessment. Define 10 yes/no questions that map to MLOps Levels 0--3.
import numpy as np
np.random.seed(42)
def assess_mlops_maturity(responses: dict) -> dict:
    """
    Assess an organization's MLOps maturity level.

    Parameters
    ----------
    responses : dict
        Keys are capability identifiers, values are booleans.

    Returns
    -------
    dict
        Maturity level (0-3), label, and recommendations.
    """
    # TODO: Define 10 questions covering:
    # - Version control for code
    # - Version control for data
    # - Automated training pipelines
    # - Experiment tracking
    # - Automated testing (data validation, model validation)
    # - CI/CD for model deployment
    # - Feature store
    # - Model monitoring in production
    # - Automated retraining on drift detection
    # - Model governance (registry, approval workflows)
    # TODO: Map scores to levels 0-3 with labels and recommendations.
    pass
# Test with three scenarios:
# Scenario A: A startup with two data scientists working in notebooks.
# Scenario B: StreamFlow after implementing Chapters 29-32.
# Scenario C: A mature fintech with 50+ models in production.
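One possible shape for the scoring logic, with illustrative level cut-offs. The thresholds here are an assumption, not an official maturity rubric, and the ten capability questions themselves are the substance of the exercise:

```python
def assess_mlops_maturity(responses: dict) -> dict:
    """Map 10 yes/no capability answers to a rough MLOps level (0-3)."""
    score = sum(bool(v) for v in responses.values())
    # Illustrative cut-offs -- tune these to your own rubric.
    if score <= 2:
        level, label = 0, "Level 0: manual, notebook-driven process"
    elif score <= 5:
        level, label = 1, "Level 1: automated training pipeline"
    elif score <= 8:
        level, label = 2, "Level 2: CI/CD for models"
    else:
        level, label = 3, "Level 3: automated, governed MLOps"
    return {"level": level, "label": label, "score": score}

# Scenario A sketch: a startup with only code version control in place
# (hypothetical keys q1..q10 stand in for the real capability questions).
startup = {f"q{i}": (i == 1) for i in range(1, 11)}
print(assess_mlops_maturity(startup))  # level 0
```

The recommendations field from the docstring is deliberately omitted here; a natural extension is to attach the next unmet capability for each level as the recommendation.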
Exercise 6: Learning Path Design (Reflection)
This exercise has no code. It is the most important exercise in the chapter.
a) Which of the four paths (NLP, Computer Vision, Experimentation, ML Engineering) interests you most? Write one paragraph explaining why.
b) From that path's "what you need to learn" list, identify the skill you are most excited about and the skill that seems most intimidating. For the intimidating one, identify one specific resource you will use to learn it.
c) Define a capstone project for your chosen path. It must:
- Solve a real problem (not a Kaggle competition)
- Include a clear business framing
- Be completable in 4--6 weeks
- Be deployable (not just a notebook)
d) Write a 6-month learning plan with monthly milestones. Be specific: "Learn transformers" is too vague. "Complete Hugging Face NLP Course chapters 1--4 and fine-tune a BERT classifier on Amazon product reviews" is actionable.
e) Set a 3-month checkpoint question: "Am I still energized by this path?" Write down what you will do if the answer is no.
Exercise 7: The Complete Picture (Conceptual)
This exercise ties together the entire textbook. For each of the following statements, identify which chapter(s) primarily address it and write one sentence explaining the key lesson.
a) "The model's accuracy is 95% on the test set, so it's ready for production."
b) "We don't need to monitor the model after deployment --- it was thoroughly evaluated."
c) "The model predicts that this customer will churn, so the retention offer must have prevented it."
d) "We should use deep learning because it's more powerful."
e) "The features are all numeric, so no preprocessing is needed."
f) "The model is fair because its overall accuracy is the same across demographic groups."
g) "We trained and evaluated on the same dataset, but we used cross-validation so it's fine."
h) "The model's AUC is 0.92, so the business should invest in deploying it."
Exercise 8: Translating Between Worlds (Code)
Write a function that takes a scikit-learn model and translates its evaluation metrics into the "languages" of four different audiences.
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, confusion_matrix
)
np.random.seed(42)
def translate_metrics(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    y_prob: np.ndarray,
    tp_value: float,
    fp_cost: float,
    fn_cost: float,
    context: str = "churn prediction",
) -> dict:
    """
    Translate model metrics into four audiences' languages.

    Returns a dict with keys:
    - 'technical': Standard ML metrics (precision, recall, F1, AUC)
    - 'business': Dollar-value metrics (ROI, cost of errors, break-even)
    - 'executive': One-paragraph summary a CEO can understand
    - 'engineering': Operational metrics (throughput, latency concerns,
      monitoring requirements)
    """
    # TODO: Compute standard metrics.
    # TODO: Compute business value metrics using the expected value framework.
    # TODO: Generate a plain-English executive summary.
    # TODO: Generate engineering deployment notes.
    pass
# Test with StreamFlow churn data:
n = 5000
y_true = np.random.binomial(1, 0.12, n)
y_prob = np.where(
    y_true == 1,
    np.random.beta(4, 2, n),
    np.random.beta(1.5, 5, n),
)
y_pred = (y_prob >= 0.20).astype(int)
result = translate_metrics(
    y_true, y_pred, y_prob,
    tp_value=78.0, fp_cost=30.0, fn_cost=180.0,
    context="StreamFlow churn prediction",
)
for audience, content in result.items():
    print(f"\n{'=' * 50}")
    print(f"Audience: {audience.upper()}")
    print(f"{'=' * 50}")
    print(content)
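To illustrate the 'business' translation, the expected-value piece of the TODO can be sketched directly from the confusion matrix, using the dollar amounts from the test harness above. This is one plausible convention (value of caught churners minus cost of false alarms and misses), not the only way to frame it:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

np.random.seed(42)
n = 5000
y_true = np.random.binomial(1, 0.12, n)
y_prob = np.where(
    y_true == 1,
    np.random.beta(4, 2, n),
    np.random.beta(1.5, 5, n),
)
y_pred = (y_prob >= 0.20).astype(int)

# For binary labels, ravel() yields (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Expected-value framing: retained-customer value for true positives,
# wasted-offer cost for false positives, lost-customer cost for misses.
net_value = tp * 78.0 - fp * 30.0 - fn * 180.0
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Net campaign value: ${net_value:,.0f}")
```

Dividing `net_value` by `n` gives a per-customer figure that is often easier to compare across campaigns of different sizes.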
These exercises support Chapter 36: The Road to Advanced. Return to the chapter for reference material.