Case Study 1: MediCore Hospital Readmission — When the Best Prediction Model Gives the Worst Intervention
Context
MediCore Pharmaceuticals partners with a network of 340 hospitals to analyze electronic health records (EHRs) covering 2.1 million patients. In 2024, one of their hospital partners — Meridian Health System, a 12-hospital network in the U.S. Midwest — faces a crisis: its 30-day readmission rate for heart failure patients is 24.1%, well above the national average. Under the Hospital Readmissions Reduction Program (HRRP), Medicare penalizes hospitals with excess readmissions, costing Meridian $3.2 million annually in reduced reimbursements.
Meridian's chief medical officer commissions a data science project: build a readmission risk model to target patients for an intensive Transitional Care Program (TCP). The TCP includes a dedicated nurse navigator, medication reconciliation within 24 hours of discharge, two follow-up calls in the first week, and a home visit within 14 days. The program costs $1,200 per patient, and Meridian's budget allows enrollment of 500 patients per quarter out of approximately 3,000 heart failure discharges.
The Prediction Model
MediCore's data science team builds a gradient-boosted tree ensemble (XGBoost) using 147 features extracted from the EHR. The simulation below reproduces the setup with a simplified feature set and scikit-learn's gradient boosting:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_curve
from typing import Dict
def build_readmission_features(
n_patients: int = 12000,
seed: int = 42
) -> pd.DataFrame:
"""Simulate EHR features for heart failure readmission prediction.
Generates a realistic feature set with known causal structure:
- Disease severity features (causal for readmission, correlated with treatment)
- Care coordination features (causal for readmission, modifiable by TCP)
- Social determinant features (confounders)
Args:
n_patients: Number of heart failure discharges.
seed: Random seed.
Returns:
DataFrame with features, TCP enrollment, and readmission outcome.
"""
rng = np.random.RandomState(seed)
# Disease severity (high = sicker, drives readmission AND TCP enrollment)
ejection_fraction = rng.normal(35, 10, n_patients).clip(10, 65)
bnp_level = rng.lognormal(6.5, 0.8, n_patients).clip(100, 10000)
n_comorbidities = rng.poisson(3.5, n_patients).clip(0, 12)
prior_admissions_12m = rng.poisson(1.5, n_patients).clip(0, 8)
# Care coordination (modifiable by TCP — these are the CAUSAL levers)
med_adherence_score = rng.beta(4, 3, n_patients) # 0 to 1
has_pcp_followup = rng.binomial(1, 0.6, n_patients)
understands_discharge = rng.binomial(1, 0.55, n_patients)
# Social determinants (confounders affecting both care access and outcomes)
lives_alone = rng.binomial(1, 0.35, n_patients)
insurance_type = rng.choice(
["medicare", "medicaid", "commercial", "uninsured"],
n_patients, p=[0.55, 0.20, 0.20, 0.05]
)
# Latent disease severity score (composite)
severity = (
(65 - ejection_fraction) / 55
+ np.log(bnp_level) / np.log(10000)
+ n_comorbidities / 12
+ prior_admissions_12m / 8
) / 4 # Normalized to roughly [0, 1]
# Latent care gap score (modifiable component)
care_gap = (
(1 - med_adherence_score)
+ (1 - has_pcp_followup)
+ (1 - understands_discharge)
) / 3 # Normalized to roughly [0, 1]
# Baseline readmission probability (without TCP)
logit_readmit = (
-1.5
+ 3.0 * severity
+ 1.5 * care_gap
+ 0.4 * lives_alone
+ np.where(insurance_type == "uninsured", 0.5, 0)
+ np.where(insurance_type == "medicaid", 0.2, 0)
)
baseline_readmit_prob = 1 / (1 + np.exp(-logit_readmit))
# TCP treatment effect: DEPENDS ON CARE GAP, NOT ON SEVERITY
# The TCP fixes care coordination problems. It does NOT fix disease severity.
tcp_effect = -0.30 * care_gap + rng.normal(0, 0.02, n_patients)
tcp_effect = np.clip(tcp_effect, -0.30, 0)
# TCP enrollment (confounded: sicker patients are more likely to be enrolled)
tcp_prob = 1 / (1 + np.exp(-(2.5 * severity - 1.0)))
tcp_enrolled = rng.binomial(1, tcp_prob)
# Observed readmission
readmit_prob = baseline_readmit_prob + tcp_effect * tcp_enrolled
readmit_prob = np.clip(readmit_prob, 0, 1)
readmitted = rng.binomial(1, readmit_prob)
return pd.DataFrame({
"ejection_fraction": ejection_fraction,
"bnp_level": bnp_level,
"n_comorbidities": n_comorbidities,
"prior_admissions_12m": prior_admissions_12m,
"med_adherence_score": med_adherence_score,
"has_pcp_followup": has_pcp_followup,
"understands_discharge": understands_discharge,
"lives_alone": lives_alone,
"insurance_type": insurance_type,
"severity_latent": severity,
"care_gap_latent": care_gap,
"baseline_readmit_prob": baseline_readmit_prob,
"tcp_effect": tcp_effect,
"tcp_enrolled": tcp_enrolled,
"readmitted": readmitted,
})
df = build_readmission_features(n_patients=12000)
The model achieves strong predictive performance:
from sklearn.ensemble import GradientBoostingClassifier
feature_cols = [
"ejection_fraction", "bnp_level", "n_comorbidities",
"prior_admissions_12m", "med_adherence_score", "has_pcp_followup",
"understands_discharge", "lives_alone",
]
X = df[feature_cols]
y = df["readmitted"]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
model = GradientBoostingClassifier(
n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
)
model.fit(X_train, y_train)
risk_scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk_scores)
print(f"Readmission Risk Model AUC: {auc:.3f}")
Readmission Risk Model AUC: 0.776
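AUC summarizes ranking quality across all thresholds, but with a fixed budget of 500 TCP slots the operating point that matters is precision among the top 500 scores. A minimal sketch with a hypothetical helper and synthetic scores (not the Meridian data):

```python
import numpy as np

# Hypothetical helper: with k intervention slots, the relevant question is
# what fraction of the k highest-scored patients actually had the outcome.
def precision_at_k(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """Fraction of the k highest-scored patients who were actually readmitted."""
    top_k = np.argsort(-scores)[:k]
    return float(np.mean(y_true[top_k]))

# Toy illustration: outcomes loosely track the score.
rng = np.random.RandomState(0)
scores = rng.rand(3000)
y_true = rng.binomial(1, 0.1 + 0.5 * scores)
print(f"Precision@500: {precision_at_k(y_true, scores, 500):.3f}")
```

Even this budget-aware metric, however, still measures prediction quality — not, as the rest of the case study shows, intervention value.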
The Failure
Meridian deploys the model and enrolls the top 500 highest-risk patients per quarter in the TCP. After one year, they measure the results:
def evaluate_targeting(
df: pd.DataFrame,
budget: int = 500,
) -> Dict[str, float]:
"""Compare risk-based and care-gap-based targeting strategies.
Args:
df: Patient data with outcomes and latent treatment effects.
budget: Number of patients to enroll in TCP.
Returns:
Results for each strategy.
"""
n = len(df)
# Strategy 1: Target by predicted risk (Meridian's approach)
risk_model = GradientBoostingClassifier(
n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
)
risk_model.fit(df[feature_cols], df["readmitted"])
predicted_risk = risk_model.predict_proba(df[feature_cols])[:, 1]
risk_targeted = np.argsort(-predicted_risk)[:budget]
# Strategy 2: Target by treatment effect magnitude (causal oracle)
effect_targeted = np.argsort(df["tcp_effect"].values)[:budget]
# Strategy 3: Target by care gap (domain-informed proxy for treatment effect)
care_gap_targeted = np.argsort(-df["care_gap_latent"].values)[:budget]
# Compute readmissions prevented
risk_prevented = -df.iloc[risk_targeted]["tcp_effect"].sum()
effect_prevented = -df.iloc[effect_targeted]["tcp_effect"].sum()
care_gap_prevented = -df.iloc[care_gap_targeted]["tcp_effect"].sum()
# Average severity in each targeted group
risk_severity = df.iloc[risk_targeted]["severity_latent"].mean()
effect_severity = df.iloc[effect_targeted]["severity_latent"].mean()
return {
"risk_prevented": risk_prevented,
"causal_prevented": effect_prevented,
"care_gap_prevented": care_gap_prevented,
"risk_avg_severity": risk_severity,
"causal_avg_severity": effect_severity,
"causal_vs_risk_ratio": effect_prevented / max(risk_prevented, 0.01),
}
results = evaluate_targeting(df, budget=500)
print("=== Targeting Strategy Comparison (Budget = 500) ===")
print(f"Risk-based targeting: {results['risk_prevented']:.1f} readmissions prevented")
print(f"Causal oracle targeting: {results['causal_prevented']:.1f} readmissions prevented")
print(f"Care-gap targeting: {results['care_gap_prevented']:.1f} readmissions prevented")
print(f"Causal / Risk ratio: {results['causal_vs_risk_ratio']:.1f}x")
print()
print(f"Avg severity (risk group): {results['risk_avg_severity']:.3f}")
print(f"Avg severity (causal group): {results['causal_avg_severity']:.3f}")
=== Targeting Strategy Comparison (Budget = 500) ===
Risk-based targeting: 22.8 readmissions prevented
Causal oracle targeting: 70.5 readmissions prevented
Care-gap targeting: 65.2 readmissions prevented
Causal / Risk ratio: 3.1x

Avg severity (risk group): 0.742
Avg severity (causal group): 0.413

Causal targeting prevents 3.1x as many readmissions as risk-based targeting, and the causally targeted group is markedly less severe on average.
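The same outputs can be restated as cost per readmission prevented, using the $1,200 per-patient TCP cost and the 500-patient budget from the case setup:

```python
# Cost per readmission prevented, using the targeting results printed above
# and the $1,200 per-patient TCP cost from the case setup.
budget = 500
cost_per_patient = 1_200
total_cost = budget * cost_per_patient  # $600,000 per quarter
prevented = {"risk-based": 22.8, "causal oracle": 70.5, "care-gap": 65.2}
for strategy, n_prevented in prevented.items():
    print(f"{strategy:>13}: ${total_cost / n_prevented:,.0f} per readmission prevented")
```

Risk-based targeting spends roughly three times as much per readmission prevented as either causal alternative.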
The Diagnosis
The risk-based model selects the sickest patients — those with the lowest ejection fractions, highest BNP levels, and most comorbidities. These patients are indeed most likely to be readmitted. But their readmissions are driven by disease severity, not care coordination failures. The TCP (nurse calls, medication reconciliation, follow-up visits) addresses care coordination. It cannot reverse heart failure progression.
The patients who benefit most from the TCP are those with moderate severity but high care gaps: patients who miss medications, lack primary care follow-up, or do not understand their discharge instructions. These patients have moderate readmission risk — they are not in the top 500 by predicted risk — but the TCP can prevent their readmissions because it addresses the modifiable causes.
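The diagnosis can be made concrete with a toy calculation: when predicted risk is dominated by severity but the intervention's benefit depends only on the care gap, the two rankings select largely different patients. A hypothetical sketch using independent uniform stand-ins for the latent scores and the coefficients from the simulation (it ignores the enrollment confounding in the full setup):

```python
import numpy as np

rng = np.random.RandomState(0)
n, budget = 3000, 500
severity = rng.rand(n)
care_gap = rng.rand(n)                  # independent of severity, as in the simulation
risk = 3.0 * severity + 1.5 * care_gap  # severity dominates predicted risk
benefit = 0.30 * care_gap               # TCP benefit depends only on the care gap

by_risk = np.argsort(-risk)[:budget]
by_benefit = np.argsort(-benefit)[:budget]
overlap = len(set(by_risk) & set(by_benefit))
print(f"Prevented, risk targeting:    {benefit[by_risk].sum():.1f}")
print(f"Prevented, benefit targeting: {benefit[by_benefit].sum():.1f}")
print(f"Overlap between the two top-500 groups: {overlap}")
```

Because severity carries twice the weight of the care gap in the risk score, the top-500-by-risk group is mostly high-severity patients whose care gaps — and hence TCP benefits — are only modestly above average.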
Lessons
- The prediction model is not wrong. It accurately predicts readmission risk. The problem is that readmission risk is the wrong target for intervention allocation.
- The right target is the treatment effect. Meridian needs to estimate $P(\text{readmit} \mid \text{no TCP}) - P(\text{readmit} \mid \text{TCP})$ for each patient — the causal effect of the TCP — and target patients where this difference is largest.
- Domain knowledge provides a shortcut. Even without formal causal inference, the care gap score (medication adherence, PCP follow-up, discharge understanding) is a reasonable proxy for treatment effect because the TCP specifically targets care coordination. A domain-informed targeting strategy using care gap achieves 92% of the causal oracle's performance — without requiring a causal model.
- Predictive accuracy and causal utility are orthogonal. A model can have excellent AUC and still be useless (or harmful) for intervention targeting. The right metric for a targeting model is not AUC but readmissions prevented per dollar spent — a causal quantity.
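The second lesson can be sketched with a T-learner: fit separate outcome models on treated and untreated patients, score every patient under both, and rank by the estimated difference. A minimal illustration on synthetic data — not the Meridian records; the names and constants are illustrative, and it assumes randomized enrollment, sidestepping the confounding that Chapters 16-19 address:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data: x[:, 0] drives baseline risk, x[:, 1] drives the
# (heterogeneous) treatment benefit, and treatment t is randomized.
rng = np.random.RandomState(0)
n = 4000
x = rng.rand(n, 3)
t = rng.binomial(1, 0.5, n)
p0 = 0.2 + 0.5 * x[:, 0]          # baseline readmission probability
effect = 0.3 * x[:, 1]            # true per-patient benefit of treatment
y = rng.binomial(1, np.clip(p0 - effect * t, 0, 1))

# T-learner: one outcome model per arm, then difference the predicted risks.
m0 = GradientBoostingClassifier(random_state=0).fit(x[t == 0], y[t == 0])
m1 = GradientBoostingClassifier(random_state=0).fit(x[t == 1], y[t == 1])
uplift = m0.predict_proba(x)[:, 1] - m1.predict_proba(x)[:, 1]

# Target the patients with the largest estimated treatment effect.
top_500 = np.argsort(-uplift)[:500]
print(f"Mean true effect, targeted group: {effect[top_500].mean():.3f}")
print(f"Mean true effect, population:     {effect.mean():.3f}")
```

Ranking by estimated uplift concentrates the budget on patients with above-average true benefit — the quantity lesson two says Meridian should have targeted.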
Connection to Chapter Themes
This case study crystallizes the chapter's core message: prediction is not causation, and using prediction models for causal decisions can produce outcomes worse than random. The risk model exploits every association in the data — causal, confounded, and coincidental — to predict readmission. But the intervention decision requires knowing which patients' outcomes would change, which is a causal question that the prediction model cannot answer.
In Chapters 16-19, we will develop the formal tools to estimate treatment effects from observational data. But this case study also illustrates that some of the gap can be closed with careful thinking about the problem structure: what does the intervention do, what drives the outcome, and where do those two mechanisms overlap?