Chapter 19: Further Reading

Essential Sources

1. Susan Athey and Guido Imbens, "Recursive Partitioning for Heterogeneous Causal Effects" (Proceedings of the National Academy of Sciences, 2016); Stefan Wager and Susan Athey, "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests" (Journal of the American Statistical Association, 2018)

These two papers establish the theoretical foundation for causal forests. The 2016 PNAS paper introduces the causal tree: a decision tree that splits to maximize treatment effect heterogeneity rather than predictive accuracy. The key innovations are the honest estimation procedure (using separate data for splitting and estimation) and the criterion that selects splits maximizing the variance of treatment effects across leaves. The 2018 JASA paper extends causal trees to causal forests, proving consistency and asymptotic normality — the result that enables valid confidence intervals for $\tau(x)$.

Reading guidance: The 2016 paper is more accessible. Section 2 defines the causal tree framework and the honest splitting criterion. Section 3 presents the asymptotic theory for honest causal trees. The key result: honest estimation eliminates the bias from adaptive splitting, at the cost of reduced statistical efficiency (using half the data for estimation instead of all of it). The 2018 paper is technically demanding. Section 2 defines the generalized random forest framework. Theorem 3.4 is the main result: under regularity conditions, the causal forest's CATE estimate is asymptotically normal with variance that can be consistently estimated. Sections 4-5 provide practical recommendations. For practitioners, Section 6 (simulations) and Section 7 (empirical application to the National Job Training Partnership Act study) are the most directly useful. The grf package in R (and EconML's CausalForestDML in Python) implements these methods. The companion paper by Athey, Tibshirani, and Wager (2019), "Generalized Random Forests" (Annals of Statistics), generalizes the framework to arbitrary estimating equations — a broader result that subsumes causal forests as a special case.
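The honest-estimation idea is easy to demonstrate outside the forest machinery. The sketch below (synthetic data, a single split, and a simplified heterogeneity criterion standing in for the paper's variance criterion — this is a conceptual illustration, not the grf implementation) picks a split point on one half of the sample and estimates the leaf-level effects on the other half:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)                        # randomized treatment
tau = np.where(x > 0.0, 1.0, 0.0)                # true CATE: a jump at x = 0
y = 0.5 * x + tau * t + rng.normal(0, 0.5, n)

# Honesty: one half of the data chooses the split, the other half
# estimates the leaf effects, eliminating adaptive-splitting bias.
half = n // 2
xs, ts, ys = x[:half], t[:half], y[:half]        # splitting half
xe, te, ye = x[half:], t[half:], y[half:]        # estimation half

def effect(y, t):
    # Difference in means between treated and control within a leaf.
    return y[t == 1].mean() - y[t == 0].mean()

best_thr, best_crit = None, -np.inf
for thr in np.quantile(xs, np.linspace(0.1, 0.9, 17)):
    lo, hi = xs <= thr, xs > thr
    diff = effect(ys[hi], ts[hi]) - effect(ys[lo], ts[lo])
    crit = lo.sum() * hi.sum() * diff ** 2       # simple heterogeneity criterion
    if crit > best_crit:
        best_thr, best_crit = thr, crit

# Honest leaf estimates come from the held-out estimation half only.
left, right = xe <= best_thr, xe > best_thr
print(best_thr, effect(ye[left], te[left]), effect(ye[right], te[right]))
```

The split lands near the true discontinuity at zero, and the held-out leaf estimates recover the two effect levels without the upward bias that re-using the splitting data would induce.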

2. Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins, "Double/Debiased Machine Learning for Treatment and Structural Parameters" (The Econometrics Journal, 2018)

The foundational paper for DML. Chernozhukov et al. solve a fundamental problem: how to use flexible ML estimators for nuisance parameters (outcome models, propensity scores) while still obtaining $\sqrt{n}$-rate, asymptotically normal estimates of causal parameters. The paper introduces two innovations: Neyman-orthogonal moment conditions (which make the causal estimate insensitive to first-stage errors at first order) and cross-fitting (which eliminates overfitting bias from flexible ML models). Together, these innovations allow the use of LASSO, random forests, neural networks, or any other ML method for confounding adjustment, while the causal parameter still achieves the parametric convergence rate.

Reading guidance: The paper is long (69 pages in the journal version) and technically dense. For a first reading, focus on Sections 1-3. Section 1 lays out the problem: flexible ML estimators of the nuisance functions converge slower than the parametric rate (typically no faster than $n^{-1/4}$), yet valid inference requires the resulting bias in the causal parameter to be $o(n^{-1/2})$. Section 2 introduces the partially linear model $Y = \theta D + g(X) + \epsilon$ and derives the Neyman-orthogonal moment condition. The key insight is equation (2.2): the DML moment condition involves the product of treatment and outcome residuals, which makes it robust to first-stage estimation errors. Section 3 formalizes cross-fitting and states the main theorem (Theorem 3.1): under the condition that the product of nuisance estimation rates is $o(n^{-1/2})$, the DML estimator is $\sqrt{n}$-consistent and asymptotically normal. Sections 4-5 extend to more general causal models (IV, panel data). For practitioners, Section 6 (simulations comparing DML to naive plug-in estimation) is essential: the simulations show that the naive approach fails dramatically in high dimensions while DML maintains valid coverage. The EconML library implements the paper's methods as LinearDML, SparseLinearDML, and DML.
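The core of the method — cross-fitting plus the residual-on-residual moment condition — fits in a few lines. The sketch below simulates the partially linear model and uses a cubic polynomial as a stand-in for an arbitrary ML nuisance learner; the variable names are illustrative, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(0, 1, n)            # treatment with confounded mean m(x)
y = 2.0 * d + np.cos(x) + rng.normal(0, 1, n)  # partially linear model, theta = 2

def fit_predict(x_tr, z_tr, x_te):
    # Flexible stand-in for an ML nuisance learner: a cubic polynomial fit.
    return np.polyval(np.polyfit(x_tr, z_tr, 3), x_te)

# Cross-fitting: nuisances for each fold are fit on the *other* fold,
# so overfitting in the first stage cannot leak into the residuals.
folds = [(slice(0, n // 2), slice(n // 2, n)),
         (slice(n // 2, n), slice(0, n // 2))]
y_res, d_res = np.empty(n), np.empty(n)
for train, test in folds:
    y_res[test] = y[test] - fit_predict(x[train], y[train], x[test])
    d_res[test] = d[test] - fit_predict(x[train], d[train], x[test])

# Neyman-orthogonal moment: regress outcome residuals on treatment residuals.
theta_hat = (d_res @ y_res) / (d_res @ d_res)
print(theta_hat)   # close to the true theta = 2
```

Because the moment condition involves the product of the two residuals, first-order errors in either nuisance fit cancel, which is exactly why the final regression recovers $\theta$ at the parametric rate.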

3. Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu, "Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning" (Proceedings of the National Academy of Sciences, 2019)

This paper systematizes the meta-learner framework for CATE estimation, introducing the X-learner and providing a rigorous comparison with the S-learner and T-learner. The paper's contribution is both conceptual (organizing existing approaches into a coherent taxonomy) and methodological (the X-learner itself, which adapts to treatment group imbalance through cross-group imputation).

Reading guidance: Section 2 defines the S-learner and T-learner with formal bias-variance analysis. The key result for the S-learner: regularized models systematically underestimate the treatment effect (Proposition 1). Section 3 introduces the X-learner. The three-step procedure — (1) fit outcome models, (2) impute individual treatment effects using cross-group predictions, (3) combine with propensity weights — is clearly explained with worked examples. Section 4 provides theoretical results on when each meta-learner dominates. The takeaway: the S-learner is best when the treatment effect is very large relative to noise; the T-learner is best when treated and control groups are balanced and the outcome functions differ substantially; the X-learner is best when one group is much larger than the other. Section 5 presents simulations that confirm the theoretical predictions. The paper does not cover the R-learner; for that, see Nie and Wager (2021), "Quasi-Oracle Estimation of Heterogeneous Treatment Effects" (Biometrika), which introduces the R-learner loss function and proves its Neyman-orthogonality property.
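The X-learner's three steps can be sketched directly. The example below uses simple linear outcome models (any regressor could be substituted) and a known propensity score on simulated data with a 10% treated group — the imbalanced setting in which the X-learner is expected to dominate. It is a conceptual sketch, not a library implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(-1, 1, n)
e = 0.1                                    # known propensity: few treated units
t = (rng.uniform(0, 1, n) < e).astype(int)
tau = 1.0 + 0.5 * x                        # true CATE
y = 2.0 * x + tau * t + rng.normal(0, 0.5, n)

def linfit(x, z):
    # Simple linear outcome model; any ML regressor could stand in here.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return lambda xq: beta[0] + beta[1] * xq

# Step 1: fit an outcome model per arm.
mu0 = linfit(x[t == 0], y[t == 0])
mu1 = linfit(x[t == 1], y[t == 1])

# Step 2: impute individual effects via cross-group predictions.
d1 = y[t == 1] - mu0(x[t == 1])            # treated outcome minus predicted control outcome
d0 = mu1(x[t == 0]) - y[t == 0]            # predicted treated outcome minus control outcome

# Step 3: fit CATE models on the imputations, blend with propensity weights.
tau1 = linfit(x[t == 1], d1)
tau0 = linfit(x[t == 0], d0)
cate = lambda xq: e * tau0(xq) + (1 - e) * tau1(xq)
print(cate(0.0), cate(0.5))   # true values: 1.0 and 1.25
```

The propensity weighting puts most of the weight on $\hat\tau_1$, whose imputations lean on $\hat\mu_0$ — the model trained on the large control group — which is precisely how the X-learner exploits group imbalance.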

4. Nicholas J. Radcliffe and Patrick D. Surry, "Real-World Uplift Modelling with Significance-Based Uplift Trees" (2011); Pierre Gutierrez and Jean-Yves Gérardy, "Causal Inference and Uplift Modelling: A Review of the Literature" (2017)

These papers bridge the academic causal inference literature with the applied uplift modeling practice used in marketing, CRM, and digital platforms. Radcliffe is one of the originators of uplift modeling; his work on significance-based uplift trees provides a practical algorithm for direct CATE estimation in trees. Gutierrez and Gérardy's review paper is the best single entry point for practitioners, covering the connection between uplift modeling and causal inference, evaluation metrics (Qini curve, uplift curve, AUUC), and practical implementation guidance.

Reading guidance: Start with Gutierrez and Gérardy (2017). Section 2 establishes the potential outcomes framework for uplift, connecting the marketing terminology to Rubin's framework. Section 3 reviews estimation approaches: the "two-model" approach (equivalent to T-learner), the "class transformation" approach (Jaskowski and Jaroszewicz, 2012), and the "modified outcome" approach (equivalent to the transformed outcome). Section 4 covers evaluation metrics: the Qini curve (cumulative incremental gains), the uplift curve (treatment effect as a function of treated fraction), and AUUC. The key practical insight: never evaluate an uplift model using standard predictive metrics (AUC, accuracy), because these measure predictive quality ($P(Y=1|X)$), not uplift quality ($P(Y(1)=1|X) - P(Y(0)=1|X)$). A model can have high AUC and zero uplift — it perfectly predicts who will convert but cannot distinguish Persuadables from Sure Things. Section 5 reviews applications in direct marketing, customer retention, and pricing. For Radcliffe's original work, the paper introduces the "net information value" criterion for uplift tree splits and discusses practical issues (minimum node size, pruning, ensemble methods) with a directness that reflects industrial experience.
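The Qini construction — and the failure of AUC-style evaluation — is worth seeing in code. The sketch below (illustrative, not a library implementation) scores two models on randomized synthetic data: one that ranks by true uplift, and one that ranks by conversion probability. The second can separate converters from non-converters perfectly yet adds nothing over random targeting:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x0 = rng.normal(0, 1, n)                    # drives baseline conversion
x1 = rng.normal(0, 1, n)                    # drives uplift (persuadability)
t = rng.integers(0, 2, n)                   # randomized treatment
p = 0.5 / (1 + np.exp(-x0))                 # baseline conversion probability
uplift = 0.2 * (x1 > 0)                     # only half the population is persuadable
y = (rng.uniform(0, 1, n) < p + uplift * t).astype(int)

def qini_area(score, y, t):
    # Qini curve: cumulative incremental conversions when targeting by score,
    # with the control arm rescaled to the treated arm's size.
    order = np.argsort(-score)
    yt = np.cumsum(y[order] * t[order])
    yc = np.cumsum(y[order] * (1 - t[order]))
    nt = np.cumsum(t[order])
    nc = np.maximum(np.cumsum(1 - t[order]), 1)
    qini = yt - yc * nt / nc
    diagonal = np.linspace(0, qini[-1], n)   # random-targeting baseline
    return (qini - diagonal).mean()          # area between curve and diagonal

good = qini_area(uplift + rng.normal(0, 0.01, n), y, t)  # ranks by true uplift
bad = qini_area(p, y, t)                                 # high-AUC conversion model
print(good, bad)
```

The conversion-probability score has excellent predictive quality but, because baseline conversion is independent of persuadability here, its Qini curve hugs the random-targeting diagonal — the "high AUC, zero uplift" failure mode described above.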

5. Keith Battocchi, Eleanor Dillon, Maggie Hei, Greg Lewis, Paul Oka, Miruna Oprescu, and Vasilis Syrgkanis, "EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation" (2019)

The documentation and design paper for EconML, the Microsoft Research library that implements the methods of this chapter. EconML provides a unified API for meta-learners (S, T, X), DML variants (LinearDML, SparseLinearDML, CausalForestDML, ForestDML), doubly-robust learners, IV methods, and policy learning. The library's design reflects the academic literature directly: each estimator corresponds to a published method with known theoretical properties.

Reading guidance: The best entry point is the EconML User Guide (https://econml.azurewebsites.net/spec/estimation.html), which maps each estimation method to its theoretical basis and provides worked examples. Key sections: (1) "Treatment Effect Estimation" covers the fit / effect / effect_interval API that all estimators share. (2) "Model Selection and Validation" discusses the RScorer — a model selection criterion based on the R-learner loss that evaluates CATE models without knowing the true $\tau(x)$. This is one of the most practically useful features: the RScorer enables hyperparameter tuning and model comparison for causal ML, partially addressing the evaluation challenge described in Section 19.6. (3) "Interpretability" covers SingleTreeCateInterpreter and SingleTreePolicyInterpreter for producing human-readable treatment rules from black-box CATE models. For the theory underlying each estimator, the library documentation links directly to the relevant papers. The library is actively maintained, and the GitHub repository (https://github.com/Microsoft/EconML) includes example notebooks for common use cases.
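The idea behind the RScorer — scoring candidate CATE models by the R-learner loss on held-out residuals, without ever observing the true $\tau(x)$ — can be sketched independently of EconML. The code below is a conceptual illustration of that criterion, not the library's implementation; the cubic-polynomial nuisance fits are illustrative stand-ins for ML learners:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6000
x = rng.uniform(-1, 1, n)
d = 0.5 * x + rng.normal(0, 1, n)          # treatment depends on x
tau = 1.0 + x                              # true heterogeneous effect
y = tau * d + np.sin(2 * x) + rng.normal(0, 1, n)

def fit_predict(x_tr, z_tr, x_te):
    # Cubic polynomial as a stand-in nuisance learner.
    return np.polyval(np.polyfit(x_tr, z_tr, 3), x_te)

# Cross-fitted residuals on held-out halves, as in DML.
y_res, d_res = np.empty(n), np.empty(n)
for tr, te in [(slice(0, n // 2), slice(n // 2, n)),
               (slice(n // 2, n), slice(0, n // 2))]:
    y_res[te] = y[te] - fit_predict(x[tr], y[tr], x[te])
    d_res[te] = d[te] - fit_predict(x[tr], d[tr], x[te])

def r_loss(cate):
    # R-learner loss: how well cate(x) explains the residual-on-residual relation.
    return np.mean((y_res - cate(x) * d_res) ** 2)

print(r_loss(lambda x: 1.0 + x))                 # candidate matching the true CATE
print(r_loss(lambda x: np.full_like(x, 1.0)))    # constant-effect candidate
```

The candidate that matches the true CATE attains the lower loss, which is what lets a criterion of this form rank CATE models and tune hyperparameters even though $\tau(x)$ is never observed.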