Part VI: From Notebook to Production
You have a model. It scores well on your test set. The SHAP plots look reasonable. The stakeholders are excited.
Now ship it.
This is where most data science projects die. Not because the model was bad, but because nobody planned for what happens after the notebook. The preprocessing steps were manual. The experiment results live in a spreadsheet. The model runs on your laptop. The monitoring plan is "we'll check on it later." And six months from now, the model is silently returning predictions based on data distributions that no longer exist, and nobody has noticed.
Part VI bridges the gap between "works in a notebook" and "runs in production." Six chapters covering the engineering, operational, ethical, and business skills that separate a data science project from a data science product.
Chapter 29: Software Engineering for Data Scientists teaches the engineering practices that most data scientists skipped: project structure, testing, code quality, and the art of refactoring a 2,000-line notebook into importable modules. You do not need to become a software engineer. You do need to write code that other people can read, run, and maintain.
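To make the refactoring idea concrete: a notebook cell becomes an importable function with a test guarding it. The helper and feature names below are hypothetical, a minimal sketch of the before/after, not the chapter's actual code.

```python
# Hypothetical helper extracted from an inline notebook cell into a module
# (e.g. features.py) so it can be imported, reused, and tested.
def tenure_bucket(months: int) -> str:
    """Bucket customer tenure for the churn features."""
    if months < 6:
        return "new"
    if months < 24:
        return "established"
    return "loyal"

# A pytest-style unit test that now guards the refactored logic.
def test_tenure_bucket():
    assert tenure_bucket(3) == "new"
    assert tenure_bucket(12) == "established"
    assert tenure_bucket(36) == "loyal"
```

The payoff is not the function itself but the test: the next person who edits the bucketing logic finds out immediately if they broke it.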
Chapter 30: ML Experiment Tracking introduces MLflow and Weights & Biases for tracking every experiment — every hyperparameter set, every metric, every artifact. If you cannot tell someone what hyperparameters produced your best model, you do not have a best model.
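The core idea behind MLflow and Weights & Biases fits in a few lines. This is not their API, just a toy sketch of what tracking buys you: every run's parameters and metrics are recorded, so "which hyperparameters produced the best model?" becomes a query instead of a memory test.

```python
import uuid

class ExperimentLog:
    """Toy experiment tracker: what MLflow/W&B do, minus the UI and storage."""

    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> str:
        """Record one training run's hyperparameters and results."""
        run = {"id": uuid.uuid4().hex[:8], "params": params, "metrics": metrics}
        self.runs.append(run)
        return run["id"]

    def best_run(self, metric: str, maximize: bool = True) -> dict:
        """Answer 'what produced my best model?' by querying logged runs."""
        sign = 1 if maximize else -1
        return max(self.runs, key=lambda r: sign * r["metrics"][metric])
```

Usage: log each run as you train, then `log.best_run("auc")["params"]` tells you exactly what to retrain with. The real tools add persistent storage, artifact versioning, and a UI on top of this pattern.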
Chapter 31: Model Deployment wraps your model in a FastAPI REST API, containerizes it with Docker, and deploys it to the cloud. The product team can now call your model without you being in the room.
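Stripped of the framework, a prediction endpoint is a request/response contract: validate the payload, score it, return JSON. The sketch below shows that contract with the standard library only; the feature names and stand-in model are hypothetical, and in the chapter this handler becomes a FastAPI route.

```python
import json

# Hypothetical feature names for the churn model's input contract.
REQUIRED_FEATURES = ["tenure_months", "monthly_spend", "support_tickets"]

def fake_model_predict(features: dict) -> float:
    # Stand-in for the trained model's predict_proba; coefficients are made up.
    score = 0.1 + 0.02 * features["support_tickets"] - 0.001 * features["tenure_months"]
    return min(max(score, 0.0), 1.0)

def handle_predict(request_body: str) -> str:
    """Validate a JSON payload, score it, and return a JSON response."""
    payload = json.loads(request_body)
    missing = [f for f in REQUIRED_FEATURES if f not in payload]
    if missing:
        # A real API would return HTTP 422 here; we just signal the error.
        return json.dumps({"error": f"missing features: {missing}"})
    return json.dumps({"churn_probability": fake_model_predict(payload)})
```

FastAPI's contribution is doing the validation step declaratively (via Pydantic models) and generating documentation for the contract automatically; Docker's is making the whole thing run identically on any machine.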
Chapter 32: Monitoring Models in Production adds the monitoring that keeps your model honest. Data drift detection, performance decay alerts, and retraining triggers — because every model starts dying the moment it hits production.
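One standard drift signal is the Population Stability Index: bin a reference sample and a live sample the same way, then compare the bin frequencies. A common rule of thumb reads PSI above roughly 0.2 as meaningful drift. A minimal sketch:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1  # index of the bin x falls in
        n = len(sample)
        return [max(c / n, 1e-6) for c in counts]  # floor avoids log(0)

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run this on each feature with training data as `expected` and recent production data as `actual`; a feature whose PSI climbs is a feature whose distribution no longer matches what the model learned from.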
Chapter 33: Fairness, Bias, and Responsible ML ensures your model does not discriminate. Fairness is not a feature you add at the end. It is a constraint you design into the system from the start. The impossibility theorem says you cannot satisfy all fairness criteria simultaneously — so you must choose, and you must document that choice.
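One of the fairness criteria involved in that choice, demographic parity, is simple to measure: compare the model's positive-prediction rate across groups. A minimal sketch of the audit:

```python
def selection_rates(preds: list, groups: list) -> dict:
    """Fraction of positive predictions per demographic group."""
    totals, positives = {}, {}
    for p, g in zip(preds, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + p
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(preds: list, groups: list) -> float:
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(preds, groups)
    return max(rates.values()) - min(rates.values())
```

A gap of zero satisfies demographic parity, but the impossibility theorem means driving this gap to zero generally moves other criteria (like equalized error rates across groups) away from zero. Measuring the gap is the easy part; choosing which gap to minimize, and documenting why, is the chapter's real subject.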
Chapter 34: The Business of Data Science closes the loop. The best model in the world is worthless if nobody uses it. ROI calculation, stakeholder communication, and the hard conversations that data science careers are built on.
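The ROI arithmetic for a churn model reduces to a few terms: of the customers the model flags, how many are true churners (precision), how many of those a retention offer actually saves, what a saved customer is worth, and what contacting everyone costs. The numbers below are entirely hypothetical, a sketch of the calculation rather than StreamFlow's actual figures.

```python
def campaign_roi(n_flagged: int, precision: float, save_rate: float,
                 customer_value: float, cost_per_contact: float) -> float:
    """Return on investment for a retention campaign driven by the model."""
    true_churners = n_flagged * precision        # flagged customers who would churn
    saved = true_churners * save_rate            # churners the offer retains
    revenue = saved * customer_value             # value of retained customers
    cost = n_flagged * cost_per_contact          # campaign cost (everyone flagged)
    return (revenue - cost) / cost
```

With made-up inputs (1,000 flagged, 40% precision, 25% save rate, $600 per customer, $5 per contact), the campaign returns $11 per dollar spent. The same arithmetic also shows why precision matters to the business: halve it and revenue halves while cost stays fixed.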
Progressive Project Milestones
Part VI is where the StreamFlow churn model becomes a product:
- M10 (Chapter 31): Deploy as a FastAPI endpoint with Docker
- M11 (Chapter 32): Add drift detection and monitoring
- M12 (Chapter 33): Audit for fairness across demographics
- M13 (Chapter 34): Calculate ROI and build a stakeholder presentation
What You Need
- Parts I–III completed (the model must exist before you can deploy it)
- Chapter 19 (model interpretation) for the fairness chapter
- FastAPI, Docker, MLflow installed (see requirements.txt)
- Basic familiarity with the command line