Part III: Causal Inference

"The most important and most commonly confused distinction in applied data science: prediction is not causation. Getting it wrong causes real harm."


Why This Part Exists

A hospital builds a model that accurately predicts which patients will be readmitted within 30 days. The model is deployed to target high-risk patients for additional follow-up care. But intervening on high-risk patients changes the very outcome the model predicts. The prediction model cannot answer the question the hospital actually needs answered: "Will this intervention cause fewer readmissions?"

A recommendation system achieves high click-through rates. But does the algorithm cause users to engage with content they would not have discovered, or does it simply predict what users would have clicked anyway? The difference determines whether the system creates value or merely takes credit for organic behavior.

These are causal questions, and no amount of predictive accuracy can answer them.
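The recommendation example can be made concrete with a small simulation. This is a minimal sketch under invented assumptions, not a model of any real system: `TRUE_EFFECT`, the 0.6 organic click slope, the 0.5 targeting threshold, and the `simulate` helper are all hypothetical. It compares the click-rate gap measured when recommendations go only to users already likely to click against the gap measured when recommendations are assigned by coin flip.

```python
import random
from statistics import mean

# Assumed causal lift in click probability from showing a recommendation.
TRUE_EFFECT = 0.05


def simulate(n, targeted, seed=0):
    """Return the treated-minus-untreated difference in click rate.

    targeted=True:  recommend only to users with high latent interest,
                    which is what a good click predictor would do.
    targeted=False: recommend by coin flip (a randomized experiment).
    """
    rng = random.Random(seed)
    treated_clicks, control_clicks = [], []
    for _ in range(n):
        interest = rng.random()  # latent interest drives organic clicks
        if targeted:
            shown = interest > 0.5
        else:
            shown = rng.random() < 0.5
        # Organic click probability plus the causal lift if shown.
        p_click = 0.6 * interest + (TRUE_EFFECT if shown else 0.0)
        clicked = 1 if rng.random() < p_click else 0
        (treated_clicks if shown else control_clicks).append(clicked)
    return mean(treated_clicks) - mean(control_clicks)


naive = simulate(200_000, targeted=True)
randomized = simulate(200_000, targeted=False)
print(f"true effect:           {TRUE_EFFECT:.3f}")
print(f"targeted (confounded): {naive:.3f}")
print(f"randomized experiment: {randomized:.3f}")
```

Under these assumptions the targeted comparison should land near 0.35: the 0.05 causal lift plus 0.6 times the 0.5 gap in average interest between the two groups. The randomized comparison should land near 0.05. Most of the apparent benefit in the targeted case is organic behavior that would have occurred without any recommendation.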

This part teaches you to think causally. Five chapters cover the full landscape: the paradigm shift from prediction to causation, the Rubin potential outcomes framework, Pearl's graphical causal models, the toolkit of estimation methods (matching, propensity scores, instrumental variables, difference-in-differences, regression discontinuity), and the frontier of causal machine learning (heterogeneous treatment effects, uplift modeling, double machine learning).

Chapters in This Part

Chapter                               Focus
15. Beyond Prediction                 Why prediction models fail for causal questions; Simpson's paradox; confounding
16. The Potential Outcomes Framework  Y(0), Y(1), ATE, SUTVA, ignorability, positivity
17. Graphical Causal Models           DAGs, d-separation, backdoor criterion, front-door criterion, do-calculus
18. Causal Estimation Methods         Matching, PSM, IPW, IV, DiD, regression discontinuity
19. Causal Machine Learning           CATEs, causal forests, meta-learners, DML, uplift modeling

Progressive Project Milestone

  • M7 (Chapter 19): Estimate heterogeneous recommendation effects using causal forests. Build a targeting policy that recommends items only when they are likely to cause engagement.

Prerequisites

Chapter 3 (Probability Theory) is essential. No prior causal inference knowledge is assumed — this part develops the framework from first principles.
