Case Study 1: MediCore Personalized Treatment — CATE Estimation for Drug Subgroup Analysis

Context

MediCore Pharmaceuticals has established (in Case Studies from Chapters 16 and 18) that Drug X reduces 30-day hospital readmission by approximately 4.2 percentage points on average among the 2.1 million patients in its observational EHR database. The causal estimation used inverse probability weighting with doubly-robust augmentation (Chapter 18), and the result survived sensitivity analysis: an unmeasured confounder would need to explain at least 8% of the residual variation in both treatment and outcome to nullify the effect.

But 4.2 percentage points is an average. MediCore's chief medical officer asks the question that changes the analysis: "Which patients benefit most from Drug X, and are there patients for whom Drug X is harmful?"

This is a CATE question. The FDA is increasingly interested in treatment effect heterogeneity for regulatory submissions (FDA Guidance on Enrichment Strategies, 2019), and MediCore's competitor has recently published subgroup analyses suggesting their drug works only for a genetic subpopulation. MediCore needs to understand heterogeneity in Drug X's effect — not to cherry-pick favorable subgroups, but to inform prescribing guidelines, identify safety signals, and prepare for regulatory review.

The Data

The dataset includes 180,000 Drug X recipients (treated) and 1,920,000 patients on standard therapy (control), with the following features:

| Feature | Description | Range |
| --- | --- | --- |
| age | Patient age at admission | 22–98 years |
| egfr | Estimated glomerular filtration rate (renal function) | 8–120 mL/min/1.73 m² |
| hba1c | Glycated hemoglobin (diabetes marker) | 4.2–14.1% |
| systolic_bp | Systolic blood pressure at admission | 85–220 mmHg |
| bmi | Body mass index | 15.2–55.8 kg/m² |
| num_comorbidities | Count of Charlson comorbidity index conditions | 0–12 |
| genetic_marker_rs12345 | Presence of rs12345 polymorphism (binary) | 0/1 (28% prevalence) |
| prior_readmission | Readmitted in the previous 12 months (binary) | 0/1 |
| medication_adherence | PDC (proportion of days covered) score | 0.0–1.0 |
| discharge_disposition | Home, SNF, or rehabilitation facility | Categorical (3 levels) |

Analytical Approach

The team employs a two-stage strategy: (1) estimate CATEs using a causal forest with DML debiasing, then (2) validate and interpret the heterogeneity using a shallow interpreter tree, subgroup contrasts, and sensitivity analysis.

Stage 1: Causal Forest Estimation

import numpy as np
import pandas as pd
from econml.dml import CausalForestDML
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier


def medicore_cate_estimation(
    df: pd.DataFrame,
    feature_cols: list,
    treatment_col: str = "drug_x",
    outcome_col: str = "readmitted_30d",
) -> CausalForestDML:
    """Estimate CATEs for Drug X using CausalForestDML.

    Uses gradient boosting for nuisance estimation (outcome and
    propensity models) and a causal forest for CATE estimation.
    The DML framework ensures that regularization bias in nuisance
    estimation does not leak into CATE estimates.

    Args:
        df: Patient DataFrame.
        feature_cols: Clinical covariates for CATE estimation.
        treatment_col: Binary treatment indicator.
        outcome_col: Binary readmission outcome.

    Returns:
        Fitted CausalForestDML model.
    """
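    # Assumes categorical columns such as discharge_disposition are already
    # numerically encoded (e.g. one-hot); raw strings would break the nuisance models.
    X = df[feature_cols].values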
    D = df[treatment_col].values
    Y = df[outcome_col].values

    model = CausalForestDML(
        model_y=GradientBoostingRegressor(
            n_estimators=300, max_depth=5, learning_rate=0.05,
            subsample=0.8, random_state=42
        ),
        model_t=GradientBoostingClassifier(
            n_estimators=200, max_depth=4, learning_rate=0.05,
            subsample=0.8, random_state=42
        ),
        n_estimators=1000,
        min_samples_leaf=50,  # Conservative: ensure sufficient data per leaf
        max_depth=None,
        criterion="het",
        discrete_treatment=True,  # drug_x is binary, so model_t must be treated as a classifier
        random_state=42,
    )
    model.fit(Y, D, X=X)
    return model
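The debiasing idea behind the DML step can be illustrated on synthetic data. The following is a minimal sketch, not the MediCore pipeline: it assumes a constant treatment effect, uses ordinary least squares for both nuisance models instead of gradient boosting, and replaces the causal forest with a single residual-on-residual slope. The helper names `cross_fit_residuals` and `simulate` are hypothetical.

```python
import numpy as np


def cross_fit_residuals(X, target, n_folds=2, seed=0):
    """Out-of-fold residuals of `target` after a linear fit on X."""
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    Xc = np.column_stack([np.ones(n), X])  # add an intercept column
    resid = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(Xc[train], target[train], rcond=None)
        resid[test] = target[test] - Xc[test] @ coef
    return resid


def simulate(n=4000, tau=-0.5, seed=1):
    """Synthetic confounded data with a known constant effect `tau`."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))  # X0 drives treatment uptake
    D = rng.binomial(1, propensity).astype(float)
    Y = 2.0 * X[:, 0] + X[:, 1] + tau * D + rng.normal(size=n)  # X0 also drives outcome
    return X, D, Y


X, D, Y = simulate()
y_res = cross_fit_residuals(X, Y)  # outcome with covariate signal removed
d_res = cross_fit_residuals(X, D)  # treatment with covariate signal removed
theta_dml = float(d_res @ y_res / (d_res @ d_res))        # residual-on-residual slope
theta_naive = float(Y[D == 1].mean() - Y[D == 0].mean())  # ignores confounding
```

In this simulation the naive difference in means is pushed far from the true effect because the first covariate raises both treatment uptake and the outcome, while the residualized estimate lands close to the true value of -0.5. CausalForestDML applies the same residualization but lets the final-stage effect vary with the covariates.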

Stage 2: Interpreting Heterogeneity

The causal forest produces CATE estimates for all 2.1 million patients. The team analyzes the distribution:

  • Mean CATE: $-0.042$ (consistent with the ATE from Chapter 18)
  • Standard deviation of CATEs: $0.038$
  • Range: $[-0.14, +0.03]$
  • Fraction with $\hat{\tau}(x) < 0$: 89% (Drug X is beneficial)
  • Fraction with $\hat{\tau}(x) > 0$: 11% (Drug X may be harmful)

The 11% with estimated positive CATEs is a safety signal. The team investigates.
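A summary like the one above can be computed directly from the vector of estimated effects. The sketch below assumes the estimates are available as a one-dimensional array (in the case study they would come from `model.effect(X).flatten()`); the helper name `summarize_cates` and the toy values are hypothetical.

```python
import numpy as np


def summarize_cates(tau_hat):
    """Summarize a vector of CATE estimates (negative = fewer readmissions)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    return {
        "mean": float(tau_hat.mean()),
        "std": float(tau_hat.std(ddof=1)),
        "min": float(tau_hat.min()),
        "max": float(tau_hat.max()),
        "frac_beneficial": float((tau_hat < 0).mean()),  # share with tau_hat < 0
        "frac_harmful": float((tau_hat > 0).mean()),     # share with tau_hat > 0
    }


# Hypothetical toy estimates, not MediCore data.
toy_tau = np.array([-0.09, -0.05, -0.03, -0.02, 0.01])
summary = summarize_cates(toy_tau)
```

The two fractions are based on point estimates only, so a patient counted as "harmful" here may still have a confidence interval that includes zero.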

Key Findings

Finding 1: The Genetic Marker Drives the Largest Heterogeneity

The causal forest's feature importances for treatment effect heterogeneity:

| Feature | Importance |
| --- | --- |
| genetic_marker_rs12345 | 0.31 |
| egfr | 0.22 |
| medication_adherence | 0.15 |
| age | 0.11 |
| num_comorbidities | 0.08 |
| hba1c | 0.05 |
| systolic_bp | 0.04 |
| bmi | 0.02 |
| prior_readmission | 0.01 |
| discharge_disposition | 0.01 |

The genetic marker is the strongest driver of treatment effect variation — even though it ranks only 7th in predictive importance for readmission itself. This illustrates the inversion discussed in Section 19.3: features that predict outcomes and features that predict treatment effect heterogeneity are often different.
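A ranking of this kind is just a sort over (feature, importance) pairs. The sketch below uses the importance values reported in this case study, listed in data-dictionary order; in practice they would presumably be read from the fitted forest's `feature_importances_` attribute. The helper name `rank_importances` is hypothetical.

```python
def rank_importances(names, importances):
    """Return (name, importance) pairs sorted from most to least important."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


# Values from the case study, in the order the features appear in the data table.
names = [
    "age", "egfr", "hba1c", "systolic_bp", "bmi", "num_comorbidities",
    "genetic_marker_rs12345", "prior_readmission", "medication_adherence",
    "discharge_disposition",
]
importances = [0.11, 0.22, 0.05, 0.04, 0.02, 0.08, 0.31, 0.01, 0.15, 0.01]

ranked = rank_importances(names, importances)
total = sum(importances)                       # importances are normalized to sum to 1
top_two_share = ranked[0][1] + ranked[1][1]    # genetic marker plus eGFR
```

The genetic marker and eGFR together account for just over half of the total heterogeneity importance, which is why both appear in the subgroup definitions that follow.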

Finding 2: Three Clinically Distinct Subgroups

The team uses SingleTreeCateInterpreter to extract an interpretable summary:

from econml.cate_interpreter import SingleTreeCateInterpreter


def extract_medicore_subgroups(
    model: CausalForestDML,
    X: np.ndarray,
    feature_names: list,
) -> SingleTreeCateInterpreter:
    """Extract interpretable subgroups from the causal forest.

    A shallow decision tree approximates the causal forest's CATE
    function, producing human-readable rules for clinicians.

    Args:
        model: Fitted CausalForestDML.
        X: Patient covariates.
        feature_names: Names for interpretation.

    Returns:
        Fitted interpreter with tree structure.
    """
    interpreter = SingleTreeCateInterpreter(
        include_model_uncertainty=True,
        max_depth=3,
    )
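    interpreter.interpret(model, X)
    # feature_names is not needed to fit the tree; pass it when rendering,
    # e.g. interpreter.plot(feature_names=feature_names).
    return interpreter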

The depth-3 interpreter tree identifies three primary subgroups:

Subgroup A: Strong responders (genetic marker = 1, eGFR $\geq$ 45)

  • Estimated CATE: $-0.092$ (9.2 pp reduction in readmission)
  • 95% CI: $[-0.112, -0.072]$
  • $n = 487,000$ (23% of population)
  • Clinical interpretation: Drug X is highly effective for carriers of rs12345 with adequate renal function.

Subgroup B: Moderate responders (genetic marker = 0, eGFR $\geq$ 60, adherence $\geq$ 0.7)

  • Estimated CATE: $-0.031$ (3.1 pp reduction)
  • 95% CI: $[-0.042, -0.020]$
  • $n = 890,000$ (42% of population)
  • Clinical interpretation: Drug X provides a modest but statistically significant benefit for non-carriers who take it reliably and have good renal function.

Subgroup C: Potential harm (eGFR $< 45$, regardless of genetic marker)

  • Estimated CATE: $+0.018$ (1.8 pp increase in readmission)
  • 95% CI: $[-0.005, +0.041]$
  • $n = 210,000$ (10% of population)
  • Clinical interpretation: Drug X may increase readmission risk for patients with significantly impaired renal function. The confidence interval includes zero, so the effect is not statistically significant at $\alpha = 0.05$, but the direction is concerning.

The remaining 25% of patients fall between these subgroups, with CATEs near zero and wide confidence intervals (insufficient data for precise estimation in these covariate strata).
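The subgroup definitions can be written as explicit rules, which makes it easy to check that they do not overlap. This is a sketch that takes the thresholds exactly as reported above; the function name `assign_subgroup` and the example patients are hypothetical, and the fitted tree may contain additional splits that are not described here.

```python
import numpy as np


def assign_subgroup(marker, egfr, adherence):
    """Label patients A, B, C, or 'other' using the reported subgroup rules."""
    marker = np.asarray(marker)
    egfr = np.asarray(egfr, dtype=float)
    adherence = np.asarray(adherence, dtype=float)

    labels = np.full(marker.shape, "other", dtype=object)
    labels[egfr < 45] = "C"                                          # renal impairment
    labels[(marker == 1) & (egfr >= 45)] = "A"                       # carriers, adequate eGFR
    labels[(marker == 0) & (egfr >= 60) & (adherence >= 0.7)] = "B"  # adherent non-carriers
    return labels


# Hypothetical patients: (marker, eGFR, adherence)
marker = [1, 0, 0, 1, 0, 0]
egfr = [80, 75, 30, 30, 50, 75]
adherence = [0.5, 0.9, 0.9, 0.9, 0.9, 0.4]
labels = assign_subgroup(marker, egfr, adherence)

# Reported population shares (percent); the residual group closes the total.
shares = {"A": 23, "B": 42, "C": 10, "other": 25}
share_total = sum(shares.values())
```

The last two example patients show who ends up in the residual group: non-carriers with eGFR between 45 and 60, and non-carriers with good renal function but adherence below 0.7.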

Finding 3: Sensitivity Analysis by Subgroup

The team applies the Cinelli-Hazlett sensitivity framework (Chapter 18) within each subgroup:

| Subgroup | CATE | Robustness value | Interpretation |
| --- | --- | --- | --- |
| A (strong responders) | $-0.092$ | $R^2 = 0.15$ | An unmeasured confounder would need to explain 15% of residual variation in both treatment and outcome to nullify the effect. Highly robust. |
| B (moderate responders) | $-0.031$ | $R^2 = 0.04$ | Moderate robustness. A confounder as strong as prior_readmission could potentially explain away the effect. |
| C (potential harm) | $+0.018$ | $R^2 = 0.01$ | Fragile. Even a weak unmeasured confounder could change the sign. |

The subgroup C finding is not robust enough for clinical action but warrants further investigation — potentially a targeted RCT in the renal-impaired population.
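For reference, the robustness value in the Cinelli-Hazlett framework has a closed form for a linear regression estimate: with partial Cohen's $f = |t| / \sqrt{\text{df}}$ and $f_q = q f$, $RV_q = \tfrac{1}{2}\left(\sqrt{f_q^4 + 4 f_q^2} - f_q^2\right)$. The sketch below implements that formula only; how the team adapted it to forest-based subgroup CATEs is not stated in this case study, and the t-statistics and degrees of freedom used here are hypothetical.

```python
import math


def robustness_value(t_stat, dof, q=1.0):
    """Cinelli-Hazlett robustness value for an OLS coefficient.

    q is the fraction of the estimate to be explained away
    (q=1 means reducing the estimate all the way to zero).
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)  # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f_q**4 + 4.0 * f_q**2) - f_q**2)


# Hypothetical inputs, chosen only to show how the value scales.
rv_strong = robustness_value(t_stat=10.0, dof=100)  # f = 1.0
rv_weak = robustness_value(t_stat=1.5, dof=100)     # f = 0.15
```

A larger value means a hypothetical confounder would need to explain a larger share of the residual variation in both treatment and outcome to eliminate the estimate, which matches the reading of the table above.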

Targeting Policy

The team constructs a treatment allocation policy:

def medicore_targeting_policy(
    model: CausalForestDML,
    X_new: np.ndarray,
    risk_threshold: float = 0.0,
) -> dict:
    """Apply the targeting policy to new patients.

    Recommends Drug X only for patients with estimated beneficial
    CATE and statistically significant effect (CI excludes zero).

    Args:
        model: Fitted CausalForestDML.
        X_new: Covariates for new patients.
        risk_threshold: CATE cutoff. Treatment is recommended only when the
            estimated CATE is below this value (negative means fewer readmissions).

    Returns:
        Dictionary with recommendations and confidence assessments.
    """
    tau_hat = model.effect(X_new).flatten()
    lb, ub = model.effect_interval(X_new, alpha=0.05)
    lb, ub = lb.flatten(), ub.flatten()

    # Recommend treatment if: (1) point estimate is beneficial and
    # (2) upper bound of CI is below zero (statistically significant benefit)
    recommend = (tau_hat < risk_threshold) & (ub < 0)

    return {
        "recommend_treatment": recommend,
        "estimated_cate": tau_hat,
        "ci_lower": lb,
        "ci_upper": ub,
        "confidence": np.where(ub < 0, "high", np.where(tau_hat < 0, "moderate", "low")),
    }
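To see how the rule behaves, it can be applied to the three subgroup estimates reported earlier. The sketch below repeats the decision logic on plain arrays so that it runs without a fitted model; the helper name `label_confidence` is hypothetical, and the fourth patient (a small negative estimate whose interval crosses zero) is invented to exercise the "moderate" branch.

```python
import numpy as np


def label_confidence(tau_hat, ci_upper, risk_threshold=0.0):
    """Apply the recommend/confidence rule to point estimates and upper CI bounds."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)
    recommend = (tau_hat < risk_threshold) & (ci_upper < 0)
    confidence = np.where(
        ci_upper < 0, "high", np.where(tau_hat < 0, "moderate", "low")
    )
    return recommend, confidence


# Subgroups A, B, C from the case study, plus one hypothetical borderline patient.
tau_hat = [-0.092, -0.031, 0.018, -0.010]
ci_upper = [-0.072, -0.020, 0.041, 0.015]
recommend, confidence = label_confidence(tau_hat, ci_upper)
```

With the default threshold of 0.0, the point-estimate condition is implied by the interval condition, so the threshold only changes the outcome when it is set to a stricter (more negative) value.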

Policy Evaluation

The team compares three policies using the AIPW estimator for policy value:

| Policy | Patients treated | Estimated readmission reduction (total) | Readmission reduction per treated patient |
| --- | --- | --- | --- |
| Treat all | 2,100,000 | 88,200 fewer readmissions | 4.2 pp |
| Treat if $\hat{\tau} < 0$ | 1,869,000 | 91,500 fewer readmissions | 4.9 pp |
| Treat if $\hat{\tau} < 0$ and CI excludes 0 | 1,377,000 | 85,800 fewer readmissions | 6.2 pp |
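The AIPW estimator of policy value combines an outcome model with an inverse-propensity correction applied to patients whose observed treatment matches the policy's recommendation. The sketch below is one standard form of that estimator, not necessarily the team's implementation; the nuisance predictions `mu0`, `mu1`, and `propensity` are assumed to come from cross-fitted models, and the toy numbers are hypothetical.

```python
import numpy as np


def aipw_policy_value(y, d, policy, mu0, mu1, propensity):
    """Estimate the mean outcome if everyone followed `policy` (1 = treat).

    mu0 / mu1: predicted outcome under control / treatment.
    propensity: estimated P(D = 1 | X).
    """
    y, d, policy, mu0, mu1, propensity = (
        np.asarray(a, dtype=float) for a in (y, d, policy, mu0, mu1, propensity)
    )
    mu_pi = np.where(policy == 1, mu1, mu0)                       # model prediction under policy
    p_follow = np.where(policy == 1, propensity, 1 - propensity)  # P(D = policy | X)
    follows = (d == policy).astype(float)                         # observed arm matches policy
    scores = mu_pi + follows / p_follow * (y - mu_pi)             # doubly robust score
    return float(scores.mean())


# Hypothetical toy inputs for four patients.
y = [0.0, 0.2, 0.35, 0.3]
d = [1, 0, 1, 0]
mu0 = [0.2, 0.2, 0.3, 0.3]
mu1 = [0.1, 0.1, 0.35, 0.35]
e = [0.5, 0.5, 0.5, 0.5]

v_none = aipw_policy_value(y, d, [0, 0, 0, 0], mu0, mu1, e)
v_all = aipw_policy_value(y, d, [1, 1, 1, 1], mu0, mu1, e)
v_targeted = aipw_policy_value(y, d, [1, 1, 0, 0], mu0, mu1, e)
```

Because the outcome is readmission, a lower value is better. Under the assumption that the table's totals are measured against treating no one, the number of readmissions prevented by a policy would be the difference between `v_none` and that policy's value, multiplied by the population size.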

The selective policy treats 34% fewer patients (1,377,000 versus 2,100,000) while preventing about 97% as many readmissions (85,800 versus 88,200). The per-patient benefit increases from 4.2 to 6.2 percentage points, a 48% improvement in treatment efficiency. The policy avoids treating 210,000 potentially harmed patients (Subgroup C) and 513,000 patients with no detectable benefit.
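The efficiency figures follow from simple arithmetic on the policy table. The sketch below recomputes them from the reported patient counts and readmission totals; the dictionary keys are hypothetical labels for the three policies.

```python
# (patients treated, readmissions prevented) as reported in the policy table.
policies = {
    "treat_all": (2_100_000, 88_200),
    "tau_negative": (1_869_000, 91_500),
    "tau_negative_ci_excludes_0": (1_377_000, 85_800),
}

# Readmissions prevented per treated patient, in percentage points.
per_treated_pp = {
    name: 100 * prevented / treated
    for name, (treated, prevented) in policies.items()
}

treated_all, prevented_all = policies["treat_all"]
treated_sel, prevented_sel = policies["tau_negative_ci_excludes_0"]

fewer_treated = 1 - treated_sel / treated_all    # share of patients no longer treated
retained_benefit = prevented_sel / prevented_all  # share of prevented readmissions kept
efficiency_gain = (
    per_treated_pp["tau_negative_ci_excludes_0"] / per_treated_pp["treat_all"] - 1
)
not_treated = treated_all - treated_sel           # 210,000 + 513,000 patients
```

The 723,000 untreated patients are the 210,000 in Subgroup C plus the 513,000 with no detectable benefit, and the 1,377,000 who are treated correspond to Subgroups A and B combined.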

Regulatory Considerations

The team prepares three deliverables for FDA communication:

  1. Subgroup analysis report: CATEs with confidence intervals for pre-specified subgroups (by genetic marker, renal function, age), following the ICH E9(R1) framework for estimands.

  2. Companion diagnostic proposal: The strong interaction between genetic marker rs12345 and Drug X efficacy justifies development of a companion diagnostic test. Subgroup A patients (marker-positive, adequate renal function) show a treatment effect more than twice the average.

  3. Safety signal documentation: The Subgroup C finding (potential harm in renal-impaired patients) is documented as a safety signal for post-marketing surveillance, with a recommendation for a targeted confirmatory trial.

Lessons

This case study demonstrates the full causal ML workflow: estimate CATEs using a method with valid confidence intervals (causal forest), interpret heterogeneity with an explainable model (a shallow CATE interpreter tree), validate robustness with sensitivity analysis, and translate findings into actionable clinical policy. The critical insight: the ATE of $-4.2$ percentage points concealed subgroup effects ranging from $-9.2$ pp (genetic marker carriers with adequate renal function) to an estimated $+1.8$ pp (renal-impaired patients). Personalized treatment allocation based on CATEs is substantially more efficient than uniform treatment.