Case Study 1: MediCore Treatment Effect — Defining Potential Outcomes for Drug Efficacy
Context
MediCore Pharmaceuticals has completed a Phase III randomized controlled trial for Drug X — a novel antihypertensive agent — and demonstrated efficacy in a controlled setting. The FDA has approved Drug X, and it is now prescribed in routine clinical practice across MediCore's network of 340 hospitals.
Six months post-approval, the clinical analytics team faces a new question: In the real-world population of patients now receiving Drug X, what is the causal effect on 30-day hospital readmission? The RCT enrolled a carefully selected population (ages 45-75, no more than two comorbidities, stable renal function), but the real-world population is far broader: older, sicker, and more diverse than the trial population. The RCT result may not generalize.
The team has access to electronic health records (EHR) for 2.1 million patients, of whom approximately 180,000 have been prescribed Drug X in the six months since approval. The challenge: this is observational data, and physicians prescribe Drug X selectively based on patient characteristics that also affect readmission risk.
Defining the Potential Outcomes
The clinical team defines the framework precisely:
- Unit: A patient-admission episode (patient $i$ admitted to hospital $h$ at time $t$).
- Treatment: $D_i = 1$ if the patient is prescribed Drug X at discharge; $D_i = 0$ if the patient receives standard antihypertensive therapy.
- Outcome: $Y_i = 1$ if the patient is readmitted to any hospital within 30 days; $Y_i = 0$ otherwise.
- Potential outcomes:
  - $Y_i(1)$: Would patient $i$ be readmitted within 30 days if prescribed Drug X?
  - $Y_i(0)$: Would patient $i$ be readmitted within 30 days if prescribed standard therapy?
The individual treatment effect $\tau_i = Y_i(1) - Y_i(0)$ can take three values for a binary outcome:
| $\tau_i$ | Meaning |
|---|---|
| $-1$ | Drug X prevents readmission (patient would be readmitted without it but not with it) |
| $0$ | Drug X has no effect (same readmission status either way) |
| $+1$ | Drug X causes readmission (patient would not be readmitted without it but is with it — adverse effect) |
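The three values in the table come from four possible pairs of potential outcomes, since $\tau_i = 0$ arises both when the patient is readmitted either way and when the patient is never readmitted. A minimal sketch that enumerates the pairs; the label strings are illustrative and not part of the team's definitions:

```python
from itertools import product

# Illustrative labels for each individual effect value (assumed wording).
LABELS = {-1: "prevents readmission", 0: "no effect", 1: "causes readmission"}


def enumerate_effects():
    """Return (y0, y1, tau, label) for every pair of binary potential outcomes."""
    rows = []
    for y0, y1 in product([0, 1], repeat=2):
        tau = y1 - y0  # individual treatment effect Y(1) - Y(0)
        rows.append((y0, y1, tau, LABELS[tau]))
    return rows


effects = enumerate_effects()
for y0, y1, tau, label in effects:
    print(f"Y(0)={y0}  Y(1)={y1}  tau={tau:+d}  {label}")
```

For any real patient only one member of the pair is observed, so $\tau_i$ itself is never directly measurable.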
Choosing the Estimand
The clinical team deliberates on which estimand to target:
ATE — The average effect across all patients (both those prescribed Drug X and those who were not): $$\text{ATE} = \mathbb{E}[Y(1) - Y(0)] = P(\text{readmitted under Drug X}) - P(\text{readmitted under standard})$$
This answers: "If we switched the entire population from standard therapy to Drug X, how would the readmission rate change?"
ATT — The average effect among patients who actually received Drug X: $$\text{ATT} = \mathbb{E}[Y(1) - Y(0) \mid D = 1]$$
This answers: "Are the patients who are currently receiving Drug X benefiting from it?"
After discussion, the team targets both: the ATT for evaluating current prescribing practices, and the ATE for informing whether Drug X's use should be expanded or restricted.
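To see how the two estimands can diverge, consider a toy table in which both potential outcomes are known for every patient. The six patients below are hypothetical and chosen only to make the arithmetic easy to follow:

```python
# Hypothetical patients as (y0, y1, d). Both potential outcomes are listed
# only because this is a toy table; real data reveal just one per patient.
patients = [
    (1, 0, 1),  # treated, Drug X prevents readmission
    (1, 0, 1),  # treated, Drug X prevents readmission
    (1, 1, 1),  # treated, readmitted either way
    (0, 0, 0),  # untreated, never readmitted
    (0, 0, 0),  # untreated, never readmitted
    (1, 0, 0),  # untreated, would have benefited
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# ATE averages the individual effect over everyone.
ate = mean(y1 - y0 for y0, y1, d in patients)
# ATT averages it over the treated patients only.
att = mean(y1 - y0 for y0, y1, d in patients if d == 1)

print(f"ATE = {ate:+.3f}")
print(f"ATT = {att:+.3f}")
```

In this toy table the treated patients benefit more than the average patient, so the ATT is larger in magnitude than the ATE.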
Evaluating SUTVA
No interference. Drug X treats hypertension, a non-communicable condition. Whether Alice takes Drug X should not affect Bob's readmission risk — unless they share a hospital room and Drug X changes Alice's behavior in ways that affect Bob's care environment. The team judges no-interference as plausible at the patient level, though they note a potential hospital-level interference: if Drug X reduces readmissions for treated patients, this could free hospital beds and improve care quality for all patients in that hospital.
Consistency. Drug X is manufactured in a single formulation (10mg tablets, once daily). However, the team identifies two threats to consistency:
- Dosage variation. Some physicians prescribe 5mg for patients with renal impairment. If "treatment" means "any Drug X prescription," the potential outcome $Y_i(1)$ is ambiguous because it could correspond to 5mg or 10mg.
- Adherence variation. $D_i = 1$ means "prescribed Drug X," but not all patients take it as prescribed. A patient who receives the prescription but does not fill it has $D_i = 1$ but effectively receives control.
Resolution: The team defines treatment as "prescribed Drug X at 10mg" and excludes 5mg prescriptions. Adherence is unmeasured, so the team weighs an intention-to-treat (ITT) analysis against a per-protocol analysis. The primary analysis is ITT: patients are classified by prescription, regardless of adherence. It therefore estimates the effect of prescribing Drug X rather than of taking it, and non-adherence typically pulls that estimate toward zero, which is why it is regarded as conservative.
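The attenuation that non-adherence introduces into an ITT estimate can be sketched with a small simulation. Every number here is an assumption chosen for illustration (70% adherence, a 10-point risk reduction for patients who actually take the drug, randomized prescription, adherence unrelated to baseline risk); none of them come from MediCore data:

```python
import numpy as np


def simulate_itt(n=200_000, adherence=0.7, effect_if_taken=-0.10,
                 base_risk=0.30, seed=0):
    """Return the ITT estimate under imperfect adherence (hypothetical values)."""
    rng = np.random.default_rng(seed)
    # Prescription is randomized here, so confounding plays no role.
    prescribed = rng.binomial(1, 0.5, n)
    # Only prescribed patients can take the drug; a fraction do not adhere.
    takes_drug = prescribed * rng.binomial(1, adherence, n)
    # Readmission risk drops only for patients who actually take the drug.
    risk = base_risk + effect_if_taken * takes_drug
    readmit = rng.binomial(1, risk)
    # ITT compares groups by prescription, ignoring adherence.
    return readmit[prescribed == 1].mean() - readmit[prescribed == 0].mean()


itt = simulate_itt()
print(f"Effect if taken: {-0.10:+.3f}")
print(f"ITT estimate:    {itt:+.3f}")
```

Under these assumptions the ITT estimate should land near adherence times the effect of taking the drug, so it is closer to zero than that effect. If adherence were related to baseline risk, this simple scaling would no longer hold.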
The Naive Analysis and Its Bias
```python
import numpy as np
import pandas as pd


def simulate_medicore_ehr(
    n_patients: int = 50000,
    seed: int = 42,
) -> pd.DataFrame:
    """Simulate MediCore EHR data with realistic confounding.

    Physicians prescribe Drug X based on disease severity, age,
    and comorbidities: the same factors that drive readmission risk.

    Args:
        n_patients: Number of patient-admission episodes.
        seed: Random seed.

    Returns:
        DataFrame with patient characteristics, treatment, and outcomes.
    """
    rng = np.random.RandomState(seed)

    # Patient characteristics (confounders)
    age = rng.normal(65, 12, n_patients).clip(30, 95)
    severity = rng.gamma(2, 1, n_patients)  # Disease severity score
    n_comorbidities = rng.poisson(2.5, n_patients).clip(0, 10)
    prior_admissions = rng.poisson(1.0, n_patients).clip(0, 8)

    # Prescribing model: physicians give Drug X to sicker patients
    prescribe_logit = (
        -1.5
        + 0.02 * (age - 65)
        + 0.4 * severity
        + 0.15 * n_comorbidities
        + 0.2 * prior_admissions
    )
    prescribe_prob = 1 / (1 + np.exp(-prescribe_logit))
    treatment = rng.binomial(1, prescribe_prob)

    # Potential outcomes
    # Y(0): readmission risk under standard therapy
    readmit_logit_0 = (
        -2.0
        + 0.015 * (age - 65)
        + 0.3 * severity
        + 0.1 * n_comorbidities
        + 0.25 * prior_admissions
    )
    readmit_prob_0 = 1 / (1 + np.exp(-readmit_logit_0))
    y0 = rng.binomial(1, readmit_prob_0)

    # Y(1): Drug X reduces readmission risk
    # Effect on the probability scale: -5 percentage points at severity 0,
    # plus a further -1 point per unit of severity (heterogeneous)
    treatment_effect = -0.05 - 0.01 * severity  # More effective for sicker patients
    readmit_prob_1 = (readmit_prob_0 + treatment_effect).clip(0.01, 0.99)
    y1 = rng.binomial(1, readmit_prob_1)

    # Observed outcome: only the potential outcome under the received treatment
    y_obs = treatment * y1 + (1 - treatment) * y0

    return pd.DataFrame({
        "age": age,
        "severity": severity,
        "n_comorbidities": n_comorbidities,
        "prior_admissions": prior_admissions,
        "prescribe_prob": prescribe_prob,
        "treatment": treatment,
        "readmit_prob_0": readmit_prob_0,
        "readmit_prob_1": readmit_prob_1,
        "y0": y0,
        "y1": y1,
        "y_obs": y_obs,
        "true_ite": y1 - y0,
    })


ehr = simulate_medicore_ehr()

# True causal effects (computable only because the simulation stores both outcomes)
true_ate = ehr["true_ite"].mean()
true_att = ehr.loc[ehr["treatment"] == 1, "true_ite"].mean()
true_atu = ehr.loc[ehr["treatment"] == 0, "true_ite"].mean()

# Naive comparison: difference in observed readmission rates
naive = (
    ehr.loc[ehr["treatment"] == 1, "y_obs"].mean()
    - ehr.loc[ehr["treatment"] == 0, "y_obs"].mean()
)

# Selection bias: difference in baseline risk Y(0) between the two groups
selection_bias = (
    ehr.loc[ehr["treatment"] == 1, "y0"].mean()
    - ehr.loc[ehr["treatment"] == 0, "y0"].mean()
)

print("MediCore Drug X Analysis")
print("=" * 50)
print(f"N patients: {len(ehr):,}")
print(f"N treated: {ehr['treatment'].sum():,}")
print(f"Treatment rate: {ehr['treatment'].mean():.1%}")
print()
print(f"True ATE: {true_ate:+.4f}")
print(f"True ATT: {true_att:+.4f}")
print(f"True ATU: {true_atu:+.4f}")
print()
print(f"Naive estimate: {naive:+.4f}")
print(f"Selection bias: {selection_bias:+.4f}")
print()
print("The naive estimate suggests Drug X INCREASES readmission,")
print("when in fact it DECREASES readmission. The sign is reversed")
print("because sicker patients (higher readmission risk) are more")
print("likely to receive Drug X.")
```
```
MediCore Drug X Analysis
==================================================
N patients: 50,000
N treated: 19,827
Treatment rate: 39.7%

True ATE: -0.0556
True ATT: -0.0710
True ATU: -0.0454

Naive estimate: +0.0456
Selection bias: +0.1166

The naive estimate suggests Drug X INCREASES readmission,
when in fact it DECREASES readmission. The sign is reversed
because sicker patients (higher readmission risk) are more
likely to receive Drug X.
```
The Danger of Confounding
This result illustrates the most dangerous form of confounding: sign reversal. The naive estimate is not merely too large or too small; it has the wrong sign. A policymaker relying on the naive comparison would conclude that Drug X harms patients, when in fact it helps them. The naive difference equals the ATT plus the selection bias, so the selection bias ($+0.117$) overwhelms the true effect among the treated ($-0.071$) and produces a positive naive estimate ($+0.046$). It likewise swamps the population-wide ATE ($-0.056$).
This is exactly the structure described in Section 16.4: physicians prescribe Drug X to sicker patients, so $\mathbb{E}[Y(0) \mid D=1] > \mathbb{E}[Y(0) \mid D=0]$ — the treated group has higher baseline readmission risk. The positive selection bias masks the negative treatment effect.
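Because $\mathbb{E}[Y \mid D=1] - \mathbb{E}[Y \mid D=0] = \mathbb{E}[Y(1) - Y(0) \mid D=1] + \big(\mathbb{E}[Y(0) \mid D=1] - \mathbb{E}[Y(0) \mid D=0]\big)$, the naive difference is the ATT plus the selection bias. Similarly, the ATE is the average of the ATT and ATU weighted by the treated share. The short check below applies both identities to the rounded figures printed in the simulation output, so agreement is expected only up to rounding:

```python
# Figures copied from the simulation output (rounded to four decimals).
n_total, n_treated = 50_000, 19_827
att, atu, ate = -0.0710, -0.0454, -0.0556
naive, selection_bias = 0.0456, 0.1166

# Identity 1: naive difference = ATT + selection bias.
naive_check = att + selection_bias

# Identity 2: ATE = P(D=1) * ATT + P(D=0) * ATU.
p_treated = n_treated / n_total
ate_check = p_treated * att + (1 - p_treated) * atu

print(f"ATT + selection bias = {naive_check:+.4f} (reported naive: {naive:+.4f})")
print(f"Weighted ATT and ATU = {ate_check:+.4f} (reported ATE: {ate:+.4f})")
```

Both identities hold for any dataset in which the potential outcomes are known, which is why they can be verified in a simulation but not in real EHR data.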
Regression Adjustment
```python
import statsmodels.api as sm

# Uses `ehr` and `true_ate` from the simulation above.
cov_names = ["age", "severity", "n_comorbidities", "prior_admissions"]
covariates = ehr[cov_names].values

# Build design matrix: constant, treatment indicator, then the four covariates
design = np.column_stack([ehr["treatment"].values, covariates])
design = sm.add_constant(design)

# Linear probability model with heteroskedasticity-robust (HC1) standard errors
model = sm.OLS(ehr["y_obs"].values, design).fit(cov_type="HC1")
reg_estimate = model.params[1]  # index 0 is the constant, index 1 is treatment
reg_se = model.bse[1]

print(f"Regression-adjusted estimate: {reg_estimate:+.4f} (SE: {reg_se:.4f})")
print(f"95% CI: [{reg_estimate - 1.96*reg_se:+.4f}, {reg_estimate + 1.96*reg_se:+.4f}]")
print(f"True ATE: {true_ate:+.4f}")
print(f"Residual bias: {reg_estimate - true_ate:+.4f}")
```
```
Regression-adjusted estimate: -0.0533 (SE: 0.0043)
95% CI: [-0.0618, -0.0449]
True ATE: -0.0556
Residual bias: +0.0023
```
Regression adjustment, controlling for the four observed confounders, recovers the correct sign and a close approximation of the ATE. The residual bias ($+0.002$) is small because the simulation includes all confounders. In practice, unmeasured confounders (socioeconomic status, health literacy, physician-specific tendencies) would likely introduce additional bias that regression adjustment alone cannot eliminate.
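The effect of an unmeasured confounder can be sketched by fitting the same kind of regression with and without one of the confounders. The setup below is deliberately simpler than the MediCore simulation and is entirely hypothetical: `frailty` stands in for any unrecorded factor, the outcome follows a linear probability model, and the true effect is fixed at $-0.05$.

```python
import numpy as np


def ols_treatment_coef(y, d, covariates):
    """Coefficient on d from a least-squares fit of y on [1, d, covariates]."""
    X = np.column_stack([np.ones(len(y)), d] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n = 100_000
severity = rng.gamma(2.0, 1.0, n)   # measured confounder
frailty = rng.normal(0.0, 1.0, n)   # hypothetical unmeasured confounder

# Both confounders raise the chance of being prescribed the drug.
p_treat = 1 / (1 + np.exp(-(-1.0 + 0.4 * severity + 0.8 * frailty)))
d = rng.binomial(1, p_treat)

# Both confounders also raise readmission risk; the drug lowers it.
true_effect = -0.05
p_y = np.clip(0.25 + 0.03 * severity + 0.06 * frailty + true_effect * d,
              0.01, 0.99)
y = rng.binomial(1, p_y)

full = ols_treatment_coef(y, d, [severity, frailty])  # all confounders included
partial = ols_treatment_coef(y, d, [severity])        # frailty omitted

print(f"True effect:              {true_effect:+.4f}")
print(f"Adjusted for both:        {full:+.4f}")
print(f"Adjusted for severity only: {partial:+.4f}")
```

With these assumed coefficients, omitting `frailty` should shift the estimate toward zero because frailer patients are both more likely to be treated and more likely to be readmitted. The size of the shift depends entirely on the assumed strengths, which is what a sensitivity analysis varies.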
Lessons for the Data Scientist
- The naive comparison can have the wrong sign. Observational analyses without causal reasoning can produce conclusions that are not merely imprecise but directionally wrong. The naive analysis suggests Drug X harms patients; the truth is the opposite.
- Choosing the estimand matters. The ATT ($-0.071$) is larger in magnitude than the ATE ($-0.056$) because Drug X is preferentially given to sicker patients who benefit more. Whether MediCore reports the ATT or ATE changes the clinical message.
- Regression adjustment requires all confounders. The method works here because the simulation includes all confounders. A real-world analysis would need to carefully enumerate potential confounders, argue for conditional ignorability, and conduct sensitivity analysis to assess vulnerability to unmeasured confounding.
- Domain knowledge is indispensable. The statistical framework tells us what to estimate and what assumptions we need. But whether those assumptions are plausible (whether we have measured the right confounders, whether SUTVA holds, whether positivity is satisfied) requires clinical expertise, not statistical computation.