Chapter 16: Key Takeaways

  1. Causal inference is a missing data problem, not a modeling problem. Every unit $i$ has two potential outcomes — $Y_i(1)$ under treatment and $Y_i(0)$ under control — but we only observe one. The individual treatment effect $\tau_i = Y_i(1) - Y_i(0)$ is never directly observable. This is the fundamental problem of causal inference (Holland, 1986), and it is a logical impossibility, not a data limitation. Every causal estimation method — from randomized experiments to regression adjustment to the advanced techniques in Chapters 17-19 — is a strategy for estimating what we cannot observe, under stated assumptions about the data-generating process.
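A minimal numerical sketch of the missing data problem (all numbers hypothetical): we simulate both potential outcomes, then discard the one that assignment hides from us, which is exactly what nature does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical potential outcomes -- in real data we never see both columns.
y0 = rng.normal(10, 2, n)            # Y_i(0): outcome under control
y1 = y0 + 3                          # Y_i(1): outcome under treatment (tau_i = 3)
d = rng.integers(0, 2, n)            # treatment assignment

y_obs = np.where(d == 1, y1, y0)     # only one potential outcome is revealed
tau_individual = y1 - y0             # never observable in practice

print(np.column_stack([d, np.round(y_obs, 2)]))
```

For each unit, the counterfactual column is the "missing data" that every method in this chapter tries to fill in under assumptions.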

  2. The naive comparison of treated and control outcomes is biased whenever treatment is not randomly assigned. The decomposition $\mathbb{E}[Y \mid D=1] - \mathbb{E}[Y \mid D=0] = \text{ATT} + \text{Selection Bias}$ makes the bias explicit. In the MediCore case study, selection bias reversed the sign of the estimate: the naive analysis suggested Drug X increased readmission when it actually decreased it, because sicker patients were more likely to receive the drug. In the StreamRec case study, selection bias inflated the recommendation effect by 30% because the algorithm targets users who would engage organically. The direction and magnitude of selection bias depend on how treatment is assigned — which is why understanding the treatment assignment mechanism is as important as modeling the outcome.
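The decomposition is an exact identity that can be verified numerically. This sketch uses a hypothetical MediCore-style setup (sicker patients more likely treated, drug lowers readmission risk); the specific coefficients are illustrative, not the case-study values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

severity = rng.normal(0, 1, n)
# Sicker patients are more likely to receive the drug (confounding).
d = (severity + rng.normal(0, 1, n) > 0.5).astype(int)

y0 = 0.5 * severity + rng.normal(0, 1, n)   # readmission risk if untreated
y1 = y0 - 0.3                               # drug lowers risk by 0.3 for everyone
y = np.where(d == 1, y1, y0)

naive = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()                       # true ATT = -0.3
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(f"naive = {naive:.3f}, ATT = {att:.3f}, bias = {selection_bias:.3f}")
```

Here the positive selection bias swamps the negative ATT, so the naive estimate has the wrong sign — the drug looks harmful even though it helps everyone.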

  3. Three assumptions — SUTVA, ignorability, and positivity — are required for causal identification, and each can fail in distinct ways. SUTVA (no interference + consistency) requires that one unit's treatment does not affect another's outcome and that treatment has a single well-defined version. Ignorability (unconfoundedness) requires that there are no unmeasured confounders — a strong assumption that is untestable from data and must be argued from domain knowledge. Positivity requires that every covariate stratum has both treated and control units. Evaluating these assumptions is not a statistical exercise but a substantive one: it demands understanding of the treatment assignment mechanism, the outcome process, and the institutional context.
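Ignorability cannot be checked from data, but positivity can. A minimal diagnostic (hypothetical stratum variable and assignment rule) flags strata where the treated share is exactly 0 or 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

age_group = rng.integers(0, 4, n)                 # hypothetical covariate strata
# Assignment rule that never treats stratum 3 -> positivity violation there.
p_treat = np.where(age_group == 3, 0.0, 0.5)
d = rng.binomial(1, p_treat)

for g in np.unique(age_group):
    share = d[age_group == g].mean()
    flag = "  <-- positivity violated" if share in (0.0, 1.0) else ""
    print(f"stratum {g}: P(D=1) = {share:.2f}{flag}")
```

In a stratum with no treated (or no control) units, the counterfactual mean is not identified without extrapolation — no estimator can fix that after the fact.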

  4. Randomization is the gold standard because it satisfies ignorability and positivity by design. A coin flip does not know or care about patient health status, user preferences, or any other confounder. Under randomization, the simple difference in means is an unbiased estimator of the ATE, and covariate balance is guaranteed in expectation. But randomization is often infeasible (ethical constraints, practical impossibility, platform unwillingness to show suboptimal recommendations), which is why the observational methods developed in this chapter and the next three are essential.
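"Unbiased in expectation" can be made concrete by re-randomizing the same confounded population many times (same hypothetical potential outcomes as above): the coin flip ignores severity, so the difference-in-means estimates center on the true effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

severity = rng.normal(0, 1, n)
y0 = 0.5 * severity + rng.normal(0, 1, n)
y1 = y0 - 0.3                                  # true ATE = -0.3

estimates = []
for _ in range(500):                           # repeat the experiment
    d = rng.binomial(1, 0.5, n)                # coin flip: independent of severity
    y = np.where(d == 1, y1, y0)
    estimates.append(y[d == 1].mean() - y[d == 0].mean())

print(f"mean estimate over 500 randomizations: {np.mean(estimates):.3f}")
```

Any single randomization still carries sampling noise; randomization buys unbiasedness and known uncertainty, not zero error.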

  5. Regression adjustment controls for observed confounders, but the omitted variable bias formula quantifies what goes wrong when confounders are missed. The bias $\beta_2 \cdot \delta$ depends on two quantities: how strongly the omitted variable affects the outcome ($\beta_2$) and how strongly it is associated with treatment ($\delta$, the slope from regressing the omitted variable on treatment). Both must be nonzero for bias to occur. This formula provides a framework for sensitivity analysis: "How large would $\beta_2 \cdot \delta$ have to be to explain away my estimated effect?" — a question that can be answered with domain knowledge even when the specific confounder is unknown. The formula also explains why "control for everything" is not a valid strategy: variables that are consequences of treatment (mediators, colliders) can introduce bias when conditioned on, as Chapter 17 will formalize.
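The formula is an exact in-sample identity, which a short simulation can confirm (coefficients here are illustrative: $\beta_2 = 1.5$, $\delta = 0.8$, so the short regression overstates the effect by about $1.2$).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

d = rng.binomial(1, 0.5, n).astype(float)
x = 0.8 * d + rng.normal(0, 1, n)                    # delta = 0.8
y = 1.0 + 2.0 * d + 1.5 * x + rng.normal(0, 1, n)    # beta_1 = 2.0, beta_2 = 1.5

ones = np.ones(n)
# Short regression omits x; long regression includes it.
b_short = np.linalg.lstsq(np.column_stack([ones, d]), y, rcond=None)[0]
b_long = np.linalg.lstsq(np.column_stack([ones, d, x]), y, rcond=None)[0]
# delta: slope from regressing the omitted variable on treatment.
delta = np.linalg.lstsq(np.column_stack([ones, d]), x, rcond=None)[0][1]

print(f"bias = {b_short[1] - b_long[1]:.3f}, beta_2 * delta = {b_long[2] * delta:.3f}")
```

The two printed numbers agree to floating-point precision: the short-regression coefficient equals the long-regression coefficient plus $\beta_2 \cdot \delta$, which is what makes the "explain away" sensitivity question answerable.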

  6. Choosing the right estimand — ATE, ATT, or ATU — is as important as estimating it correctly. The ATE answers "What is the average effect across the whole population?", the ATT answers "Did the treated group benefit?", and the ATU answers "Would the untreated group benefit if treated?" These can differ substantially when treatment effects are heterogeneous and treatment is selectively assigned. In the StreamRec case, ATT < ATU because the algorithm recommends where the causal effect is smallest — a finding with direct implications for recommendation targeting. In the MediCore case, ATT > ATE because the drug is given to sicker patients who benefit more. The policy question determines the estimand; the estimand determines the method.
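The StreamRec-style gap between estimands can be sketched with a hypothetical simulation in which the causal lift shrinks for users who would engage organically, and the algorithm targets exactly those users (all coefficients illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

organic = rng.normal(0, 1, n)                 # propensity to engage without recs
tau = 1.0 - 0.5 * organic                     # lift is smallest for organic engagers
# Algorithm recommends to users who would engage anyway (targets high organic).
d = (organic + rng.normal(0, 1, n) > 0.5).astype(int)

ate = tau.mean()
att = tau[d == 1].mean()
atu = tau[d == 0].mean()
print(f"ATT = {att:.3f} < ATE = {ate:.3f} < ATU = {atu:.3f}")
```

With homogeneous effects the three estimands coincide; it is the combination of heterogeneity and selective assignment that pulls them apart — and here it says the largest gains sit with the users the algorithm currently ignores.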

  7. The potential outcomes framework is not just notation — it is a discipline for honest causal reasoning. It forces you to specify exactly what you are estimating (the estimand), what you are assuming (SUTVA, ignorability, positivity), and what could go wrong (violations of each assumption). Every causal claim in this framework carries an explicit set of caveats. This transparency is the framework's greatest strength: it transforms "A causes B" from an untestable assertion into a claim with stated assumptions that colleagues, reviewers, and regulators can evaluate. The next three chapters build on this foundation with graphical models (Chapter 17), estimation methods (Chapter 18), and machine learning approaches (Chapter 19) — each addressing different aspects of the identification and estimation challenges introduced here.