Exercises: Chapter 31
Model Deployment
Exercise 1: FastAPI Fundamentals (Code)
Build a minimal FastAPI application that serves a scikit-learn model.
a) Train a RandomForestClassifier on the following synthetic dataset and save it with joblib:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import joblib
X, y = make_classification(
    n_samples=5000, n_features=8, n_informative=6,
    n_redundant=1, flip_y=0.05, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(
    n_estimators=200, max_depth=8, random_state=42
)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.4f}")
joblib.dump(model, "model.joblib")
b) Write a FastAPI application (app.py) with:
- A GET /health endpoint that returns {"status": "healthy"}
- A POST /predict endpoint that accepts a JSON object with 8 float features (f0 through f7) and returns {"probability": <float>, "prediction": <int>}
c) Define Pydantic schemas for the request and response. The request schema should enforce that all 8 features are provided and are valid floats. The response schema should enforce that probability is between 0 and 1 and prediction is either 0 or 1.
d) Start the server with uvicorn app:app --port 8000 and test it with:
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"f0": 1.2, "f1": -0.5, "f2": 0.8, "f3": 2.1, "f4": -1.0, "f5": 0.3, "f6": 1.5, "f7": -0.2}'
Verify that the response includes a probability and a binary prediction.
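For part (c), the schemas might look like the following sketch, assuming Pydantic v2 (the class names `PredictRequest` and `PredictResponse` are illustrative, not prescribed by the exercise):

```python
from pydantic import BaseModel, Field

class PredictRequest(BaseModel):
    # All eight features are required floats; Pydantic rejects
    # missing fields and non-numeric values automatically.
    f0: float
    f1: float
    f2: float
    f3: float
    f4: float
    f5: float
    f6: float
    f7: float

class PredictResponse(BaseModel):
    # Constrain the outputs: probability in [0, 1], prediction in {0, 1}.
    probability: float = Field(ge=0.0, le=1.0)
    prediction: int = Field(ge=0, le=1)
```

Because the response is also a schema, FastAPI will refuse to serialize a malformed prediction, which catches bugs in the endpoint itself.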
Exercise 2: Input Validation (Code)
Extend the application from Exercise 1 to handle edge cases.
a) Add range validation to the Pydantic schema. Features should be between -10.0 and 10.0 (reasonable for standardized features). Test that a request with "f0": 999.0 returns a 422 error.
b) Add an endpoint POST /predict/batch that accepts a list of up to 500 records and returns a list of predictions. Define the batch request and response schemas.
c) Send a request with an empty list ({"records": []}). What status code does FastAPI return? Modify your schema to require at least one record using min_length=1.
d) Send a request where one field is a string instead of a float (e.g., "f0": "hello"). Examine the 422 error response body. How does Pydantic identify which field failed validation?
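A sketch of the extended schemas for parts (a) through (c), again assuming Pydantic v2 (the batch schema names and the `predictions`/`probabilities` response fields are illustrative):

```python
from pydantic import BaseModel, Field

class PredictRequest(BaseModel):
    # Values outside [-10, 10] raise a ValidationError,
    # which FastAPI converts to a 422 response.
    f0: float = Field(ge=-10.0, le=10.0)
    f1: float = Field(ge=-10.0, le=10.0)
    f2: float = Field(ge=-10.0, le=10.0)
    f3: float = Field(ge=-10.0, le=10.0)
    f4: float = Field(ge=-10.0, le=10.0)
    f5: float = Field(ge=-10.0, le=10.0)
    f6: float = Field(ge=-10.0, le=10.0)
    f7: float = Field(ge=-10.0, le=10.0)

class BatchPredictRequest(BaseModel):
    # min_length=1 rejects an empty list; max_length=500 caps batch size.
    records: list[PredictRequest] = Field(min_length=1, max_length=500)

class BatchPredictResponse(BaseModel):
    predictions: list[int]
    probabilities: list[float]
```

For part (d), note that Pydantic's 422 body contains a `loc` entry per error (e.g. `["body", "f0"]`) pinpointing the offending field.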
Exercise 3: Dockerfile (Code)
Write a Dockerfile for the application from Exercise 1.
a) Write a single-stage Dockerfile that:
- Uses python:3.11-slim as the base image
- Copies requirements.txt and installs dependencies
- Copies the application code and model file
- Exposes port 8000
- Runs uvicorn
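One possible single-stage Dockerfile satisfying part (a); the file names follow Exercise 1, and `requirements.txt` is assumed to list fastapi, uvicorn, scikit-learn, and joblib:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Copy requirements first so the dependency layer is cached
# across code-only rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py model.joblib ./
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```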
b) Build and run the container:
docker build -t exercise-api:v1 .
docker run -d --name exercise-api -p 8000:8000 exercise-api:v1
curl http://localhost:8000/health
c) Now convert to a multi-stage build. The first stage installs dependencies; the second stage copies only the installed packages and application code. Compare the image sizes:
docker images | grep exercise-api
How much smaller is the multi-stage image? Why does dropping the build stage reduce the final size?
d) Add a HEALTHCHECK instruction to the Dockerfile. Run the container and verify the health status with docker inspect --format='{{json .State.Health}}' exercise-api.
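A sketch combining parts (c) and (d): a multi-stage build with a HEALTHCHECK. The `--prefix=/install` approach is one of several ways to copy only installed packages; since python:3.11-slim ships without curl, the probe uses Python's standard library instead:

```dockerfile
# Stage 1: install dependencies into an isolated prefix.
FROM python:3.11-slim AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: copy only the installed packages and application code.
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app.py model.joblib ./
EXPOSE 8000

# Probe the /health endpoint with the standard library (no curl in slim images).
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```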
Exercise 4: Latency Profiling (Code + Analysis)
Measure and optimize the prediction latency of your API.
a) Add a timing middleware to your FastAPI app that logs the response time for every request (as shown in the chapter). Make 100 sequential requests and record the average, median, p95, and p99 latencies.
import requests
import time
import numpy as np
url = "http://localhost:8000/predict"
payload = {"f0": 1.2, "f1": -0.5, "f2": 0.8, "f3": 2.1,
           "f4": -1.0, "f5": 0.3, "f6": 1.5, "f7": -0.2}
latencies = []
for _ in range(100):
    start = time.time()
    resp = requests.post(url, json=payload)
    elapsed = (time.time() - start) * 1000
    latencies.append(elapsed)
print(f"Mean: {np.mean(latencies):.1f} ms")
print(f"Median: {np.median(latencies):.1f} ms")
print(f"P95: {np.percentile(latencies, 95):.1f} ms")
print(f"P99: {np.percentile(latencies, 99):.1f} ms")
b) Where does most of the latency come from? Instrument the predict function to time each step separately: request parsing, feature encoding, model inference, and response construction.
c) Compare the latency of 100 individual /predict calls vs. a single /predict/batch call with 100 records. What is the speedup factor for batch inference?
d) If your business requirement is sub-50ms p99 latency, and your current p99 is above that, what changes would you make? Discuss at least two concrete strategies.
Exercise 5: Model Versioning (Conceptual + Code)
Design a model versioning strategy for a production prediction API.
a) Your API currently loads a model at startup from a file path. Modify the code to accept the model path and version from environment variables:
import os
MODEL_VERSION = os.getenv("MODEL_VERSION", "v1.0.0")
MODEL_PATH = os.getenv("MODEL_PATH", "model.joblib")
Include the model version in every prediction response and in the /health endpoint.
b) You have deployed model v1.0 on Monday. On Wednesday, you train a new model v2.0 that has higher AUC on the test set. Describe a canary deployment strategy to safely roll out v2.0:
- What percentage of traffic goes to v2.0 initially?
- What metrics do you monitor during the canary period?
- What conditions trigger a full rollout vs. a rollback?
- How long should the canary period last?
c) Write a docker-compose.yml that runs two containers, one with model v1.0 and one with model v2.0, on different ports. This simulates a blue-green deployment where you can switch traffic between them.
version: "3.8"
services:
  churn-api-blue:
    build: .
    ports:
      - "8001:8000"
    environment:
      - MODEL_VERSION=v1.0
      - MODEL_PATH=model/model_v1.joblib
  churn-api-green:
    build: .
    ports:
      - "8002:8000"
    environment:
      - MODEL_VERSION=v2.0
      - MODEL_PATH=model/model_v2.joblib
Test that both services return different model versions in their /health responses.
Exercise 6: Deployment Decision Matrix (Conceptual)
For each of the following scenarios, recommend a deployment pattern (real-time API, batch job, or hybrid) and justify your choice. Consider latency requirements, cost, and complexity.
a) Fraud detection for credit card transactions. The payment processor needs a risk score before approving each transaction. Transactions happen at a rate of 5,000 per second during peak hours.
b) Customer churn prediction for a quarterly business review. The marketing VP wants a list of the top 10% highest-risk customers to design a retention campaign.
c) Movie recommendations on a streaming platform. When a user opens the app, the home screen shows "Recommended for You."
d) Predictive maintenance for factory equipment. Sensors stream temperature, vibration, and pressure readings every 10 seconds. The model predicts whether a machine will fail in the next 24 hours.
e) Email spam classification. The email server needs to classify incoming messages before delivering them to the inbox.
For each scenario, specify:
- Real-time, batch, or hybrid
- Acceptable latency
- Whether you need an API endpoint, a scheduled job, or both
- Any caching strategies that would help
Exercise 7: End-to-End Deployment (Integration)
This exercise combines everything from the chapter into a single deployment pipeline.
a) Train a model on a dataset of your choice (the synthetic dataset from Exercise 1 is fine). Save the model with joblib.
b) Write a FastAPI application with Pydantic schemas, a /predict endpoint, a /predict/batch endpoint, and a /health endpoint.
c) Write tests (test_api.py) covering:
- Health check returns 200
- Valid prediction returns correct schema
- Missing fields return 422
- Invalid types return 422
- Batch endpoint accepts a list and returns a list
d) Write a Dockerfile with a multi-stage build, a non-root user, and a health check.
e) Write a docker-compose.yml for local development.
f) Build the image, run the container, and verify all endpoints work.
g) (Optional) Deploy to Cloud Run or ECS Fargate. Record the public endpoint URL and test it.
Submit the complete project directory: app.py, schemas.py, test_api.py, Dockerfile, docker-compose.yml, requirements.txt, .dockerignore, and the model artifact.
These exercises support Chapter 31: Model Deployment. Return to the chapter for reference.