Case Study 2: MediCore Bayesian Treatment Effect Estimation — Clinical Prior Meets Observational Data

Context

MediCore Pharmaceuticals has developed Cardiopril-X, a new ACE inhibitor for treating mild-to-moderate hypertension. A randomized controlled trial (RCT) of 800 patients is planned but will not report results for 18 months. In the interim, MediCore's data science team has access to two information sources:

  1. Prior knowledge from the drug class. Three published RCTs of related ACE inhibitors (enalapril, lisinopril, ramipril) report systolic blood pressure (SBP) reductions of 9.3, 11.2, and 10.1 mmHg respectively, with pooled standard errors. A meta-analysis of the three studies estimates the class-level treatment effect at 10.2 mmHg (95% CI: [8.1, 12.3]).

  2. Observational data. MediCore has electronic health records from 340 hospitals covering 2.1 million patients. Among these, 4,200 patients received Cardiopril-X off-label (prescribed before the full trial results are available, as permitted for drugs with class-level evidence). After propensity-score matching to a control group of 4,200 similar patients, the estimated SBP reduction attributable to Cardiopril-X is 8.4 mmHg with a standard error of 0.9 mmHg.

The regulatory team asks: "Given what we know from prior ACE inhibitor trials and from the observational study, what is our best estimate of Cardiopril-X's treatment effect, and how certain are we?"

A frequentist analysis of the observational data alone yields a 95% confidence interval of [6.6, 10.2] mmHg. This is useful but ignores the substantial prior evidence. A Bayesian analysis can formally combine both sources.
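The combination the Bayesian analysis performs is the standard Normal-Normal conjugate update (the result the code below implements): with prior $\theta \sim N(\mu_0, \sigma_0^2)$ and an observed estimate $\hat{\theta} \sim N(\theta, s^2)$, the posterior is

$$\theta \mid \hat{\theta} \;\sim\; N\!\left( \frac{\mu_0/\sigma_0^2 + \hat{\theta}/s^2}{1/\sigma_0^2 + 1/s^2},\ \left( \frac{1}{\sigma_0^2} + \frac{1}{s^2} \right)^{-1} \right).$$

Precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the data estimate.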

The Analysis

Step 1: Specifying the Prior

The meta-analytic prior summarizes the treatment effect of the ACE inhibitor drug class. The pooled estimate is 10.2 mmHg with a 95% CI of [8.1, 12.3], corresponding to a standard error of approximately $(12.3 - 8.1) / (2 \times 1.96) \approx 1.07$ mmHg.
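The back-calculation of the standard error can be verified in one line (a quick check of the arithmetic above):

```python
# Recover the prior SE from the reported 95% CI half-width: SE = width / (2 * 1.96)
ci_lower, ci_upper = 8.1, 12.3
prior_se = (ci_upper - ci_lower) / (2 * 1.96)
print(f"{prior_se:.2f}")  # 1.07
```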

However, Cardiopril-X is a new compound — it may differ from the class average. To account for between-drug heterogeneity, we inflate the prior standard deviation. The team considers three options:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from dataclasses import dataclass
from typing import Dict, Tuple, List


@dataclass
class BayesianTreatmentAnalysis:
    """Bayesian Normal-Normal treatment effect analysis.

    Combines prior knowledge (e.g., meta-analysis) with observed data
    (e.g., observational study) using Normal-Normal conjugacy.

    Attributes:
        prior_mu: Prior mean treatment effect.
        prior_sigma: Prior standard deviation.
        data_mean: Observed treatment effect.
        data_se: Standard error of the observed effect.
    """
    prior_mu: float
    prior_sigma: float
    data_mean: float
    data_se: float

    @property
    def posterior_precision(self) -> float:
        return 1 / self.prior_sigma**2 + 1 / self.data_se**2

    @property
    def posterior_sigma(self) -> float:
        return 1 / np.sqrt(self.posterior_precision)

    @property
    def posterior_mu(self) -> float:
        prior_prec = 1 / self.prior_sigma**2
        data_prec = 1 / self.data_se**2
        return (prior_prec * self.prior_mu + data_prec * self.data_mean) / self.posterior_precision

    @property
    def prior_weight(self) -> float:
        """Fraction of posterior precision contributed by the prior."""
        return (1 / self.prior_sigma**2) / self.posterior_precision

    @property
    def data_weight(self) -> float:
        """Fraction of posterior precision contributed by the data."""
        return (1 / self.data_se**2) / self.posterior_precision

    def credible_interval(self, level: float = 0.95) -> Tuple[float, float]:
        z = stats.norm.ppf((1 + level) / 2)
        return (self.posterior_mu - z * self.posterior_sigma,
                self.posterior_mu + z * self.posterior_sigma)

    def prob_above(self, threshold: float) -> float:
        """P(treatment effect > threshold | data)."""
        return 1 - stats.norm.cdf(threshold, self.posterior_mu, self.posterior_sigma)

    def prob_in_range(self, lower: float, upper: float) -> float:
        """P(lower < treatment effect < upper | data)."""
        return (stats.norm.cdf(upper, self.posterior_mu, self.posterior_sigma)
                - stats.norm.cdf(lower, self.posterior_mu, self.posterior_sigma))



# Three prior specifications reflecting different levels of confidence
# in transferability from the drug class to Cardiopril-X
prior_specs = {
    "Tight (class transfer)": {"mu": 10.2, "sigma": 1.07},
    "Moderate (drug-specific uncertainty)": {"mu": 10.2, "sigma": 2.0},
    "Broad (skeptical about transfer)": {"mu": 10.2, "sigma": 4.0},
}

# Observational data
data_mean = 8.4
data_se = 0.9

print("=" * 80)
print("MediCore Cardiopril-X — Bayesian Treatment Effect Analysis")
print("=" * 80)
print(f"\nObservational data: {data_mean} ± {data_se} mmHg (propensity-matched, n=4200)")
print(f"Frequentist 95% CI: [{data_mean - 1.96*data_se:.1f}, {data_mean + 1.96*data_se:.1f}]\n")

analyses = {}
for label, spec in prior_specs.items():
    analysis = BayesianTreatmentAnalysis(
        prior_mu=spec["mu"],
        prior_sigma=spec["sigma"],
        data_mean=data_mean,
        data_se=data_se,
    )
    analyses[label] = analysis
    ci = analysis.credible_interval()

    print(f"Prior: {label}")
    print(f"  Prior:     N({spec['mu']}, {spec['sigma']}²) — weight: {analysis.prior_weight:.1%}")
    print(f"  Posterior: N({analysis.posterior_mu:.2f}, {analysis.posterior_sigma:.2f}²) "
          f"— data weight: {analysis.data_weight:.1%}")
    print(f"  95% CI:    [{ci[0]:.2f}, {ci[1]:.2f}]")
    print(f"  P(effect > 5 mmHg):  {analysis.prob_above(5.0):.4f}")
    print(f"  P(effect > 8 mmHg):  {analysis.prob_above(8.0):.4f}")
    print(f"  P(effect > 10 mmHg): {analysis.prob_above(10.0):.4f}")
    print()
================================================================================
MediCore Cardiopril-X — Bayesian Treatment Effect Analysis
================================================================================

Observational data: 8.4 ± 0.9 mmHg (propensity-matched, n=4200)
Frequentist 95% CI: [6.6, 10.2]

Prior: Tight (class transfer)
  Prior:     N(10.2, 1.07²) — weight: 41.5%
  Posterior: N(9.14, 0.69²) — data weight: 58.5%
  95% CI:    [7.79, 10.49]
  P(effect > 5 mmHg):  1.0000
  P(effect > 8 mmHg):  0.9505
  P(effect > 10 mmHg): 0.1060

Prior: Moderate (drug-specific uncertainty)
  Prior:     N(10.2, 2.0²) — weight: 16.8%
  Posterior: N(8.70, 0.82²) — data weight: 83.2%
  95% CI:    [7.09, 10.31]
  P(effect > 5 mmHg):  1.0000
  P(effect > 8 mmHg):  0.8039
  P(effect > 10 mmHg): 0.0564

Prior: Broad (skeptical about transfer)
  Prior:     N(10.2, 4.0²) — weight: 4.8%
  Posterior: N(8.49, 0.89²) — data weight: 95.2%
  95% CI:    [6.75, 10.22]
  P(effect > 5 mmHg):  0.9999
  P(effect > 8 mmHg):  0.7092
  P(effect > 10 mmHg): 0.0447

Step 2: Interpreting the Results

The analysis reveals a critical pattern: the conclusion about clinical significance is robust to prior choice, but the precise estimate is not.

Across all three priors, $P(\text{effect} > 5 \text{ mmHg} \mid D)$ is at least 99.99%. The drug almost certainly produces a clinically meaningful blood pressure reduction — this conclusion does not depend on how much weight we give the prior.

However, the posterior mean ranges from 8.49 (broad prior, data-dominated) to 9.14 (tight prior, substantial prior influence). If the question is "how much bigger than 5 mmHg is the effect?", the answer depends on the prior.
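The shrinkage pattern can be recomputed directly: the posterior mean is a convex combination of the prior mean and the data estimate, weighted by their shares of the posterior precision. (This uses the rounded prior SE of 1.07, so the tight-prior value may differ from the printout in the last digit.)

```python
# Posterior mean as a precision-weighted average, for the three priors in the text
prior_mu, data_mean, data_se = 10.2, 8.4, 0.9

for label, prior_sigma in [("tight", 1.07), ("moderate", 2.0), ("broad", 4.0)]:
    w = (1 / prior_sigma**2) / (1 / prior_sigma**2 + 1 / data_se**2)  # prior weight
    post_mean = w * prior_mu + (1 - w) * data_mean
    print(f"{label:>8s}: prior weight {w:.1%}, posterior mean {post_mean:.2f}")
```

As the prior widens, its weight collapses (41% to 17% to 5%) and the posterior mean slides toward the observational estimate of 8.4.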

# Visualize all three analyses
theta = np.linspace(2, 16, 500)

fig, axes = plt.subplots(1, 3, figsize=(16, 5), sharey=True)

for ax, (label, analysis) in zip(axes, analyses.items()):
    # Prior
    prior_pdf = stats.norm.pdf(theta, analysis.prior_mu, analysis.prior_sigma)
    ax.plot(theta, prior_pdf, "--", label="Prior", linewidth=2, color="steelblue")

    # Likelihood (scaled for visualization)
    lik_pdf = stats.norm.pdf(theta, analysis.data_mean, analysis.data_se)
    lik_pdf_scaled = lik_pdf * (prior_pdf.max() / lik_pdf.max()) * 0.8
    ax.plot(theta, lik_pdf_scaled, ":", label="Likelihood (scaled)", linewidth=2,
            color="gray")

    # Posterior
    post_pdf = stats.norm.pdf(theta, analysis.posterior_mu, analysis.posterior_sigma)
    ax.plot(theta, post_pdf, "-", label="Posterior", linewidth=2, color="darkorange")

    # Clinical threshold
    ax.axvline(5.0, color="red", linestyle="-.", alpha=0.4, label="Clinical threshold")

    ax.set_title(label, fontsize=10)
    ax.set_xlabel("SBP reduction (mmHg)")
    ax.legend(fontsize=8)

axes[0].set_ylabel("Density")
plt.suptitle("Prior Sensitivity Analysis: Cardiopril-X Treatment Effect", y=1.02)
plt.tight_layout()
plt.show()

Step 3: Sequential Updating as the RCT Progresses

MediCore's RCT enrolls patients in cohorts. Bayesian sequential updating lets the team monitor the accumulating evidence coherently: the posterior after each cohort simply becomes the prior for the next, with no explicit multiple-testing correction of the kind frequentist interim analyses require. (Regulators may still ask for the frequentist operating characteristics of any Bayesian monitoring rule; see Regulatory Considerations below.)

def simulate_rct_sequential_update(
    true_effect: float,
    patient_sigma: float,
    cohort_size: int,
    n_cohorts: int,
    prior_mu: float,
    prior_sigma: float,
    seed: int = 42,
) -> List[Dict]:
    """Simulate sequential Bayesian updating as RCT cohorts arrive.

    Args:
        true_effect: True treatment effect (mmHg).
        patient_sigma: Within-patient variability.
        cohort_size: Patients per cohort.
        n_cohorts: Number of cohorts.
        prior_mu: Starting prior mean (from observational analysis).
        prior_sigma: Starting prior standard deviation.
        seed: Random seed.

    Returns:
        List of dictionaries with posterior summary at each stage.
    """
    rng = np.random.RandomState(seed)
    results = []

    current_mu = prior_mu
    current_sigma = prior_sigma

    # Initial state (before RCT data)
    results.append({
        "cohort": 0,
        "n_patients": 0,
        "posterior_mu": current_mu,
        "posterior_sigma": current_sigma,
        "ci_lower": current_mu - 1.96 * current_sigma,
        "ci_upper": current_mu + 1.96 * current_sigma,
        "prob_above_5": 1 - stats.norm.cdf(5.0, current_mu, current_sigma),
    })

    cumulative_n = 0
    for cohort in range(1, n_cohorts + 1):
        # Simulate cohort data
        cohort_effects = rng.normal(true_effect, patient_sigma, cohort_size)
        cohort_mean = cohort_effects.mean()
        cohort_se = patient_sigma / np.sqrt(cohort_size)

        # Bayesian update
        prior_prec = 1 / current_sigma**2
        data_prec = 1 / cohort_se**2
        post_prec = prior_prec + data_prec
        current_sigma = 1 / np.sqrt(post_prec)
        current_mu = (prior_prec * current_mu + data_prec * cohort_mean) / post_prec

        cumulative_n += cohort_size
        results.append({
            "cohort": cohort,
            "n_patients": cumulative_n,
            "posterior_mu": current_mu,
            "posterior_sigma": current_sigma,
            "ci_lower": current_mu - 1.96 * current_sigma,
            "ci_upper": current_mu + 1.96 * current_sigma,
            "prob_above_5": 1 - stats.norm.cdf(5.0, current_mu, current_sigma),
        })

    return results


# Start from the moderate prior posterior (which combines meta-analysis + observational data)
# Now the RCT provides additional evidence
rct_results = simulate_rct_sequential_update(
    true_effect=9.0,           # True effect (unknown to the team)
    patient_sigma=15.0,        # Within-patient SD of SBP change
    cohort_size=100,           # 100 patients per interim analysis
    n_cohorts=8,               # 8 cohorts = 800 patients total
    prior_mu=8.70,             # Starting from the observational posterior
    prior_sigma=0.82,          # Starting uncertainty
)

print("=== Sequential Bayesian Updating During RCT ===\n")
print(f"{'Cohort':>7s}  {'N':>5s}  {'Post.Mean':>9s}  {'Post.SD':>7s}  "
      f"{'95% CI':>18s}  {'P(>5)':>7s}")
print("-" * 62)
for r in rct_results:
    ci_str = f"[{r['ci_lower']:.2f}, {r['ci_upper']:.2f}]"
    print(f"{r['cohort']:>7d}  {r['n_patients']:>5d}  {r['posterior_mu']:>9.2f}  "
          f"{r['posterior_sigma']:>7.3f}  {ci_str:>18s}  {r['prob_above_5']:>7.4f}")
=== Sequential Bayesian Updating During RCT ===

 Cohort      N  Post.Mean  Post.SD            95% CI    P(>5)
--------------------------------------------------------------
      0      0       8.70    0.820  [7.09, 10.31]   1.0000
      1    100       8.71    0.734  [7.27, 10.15]   1.0000
      2    200       8.78    0.668  [7.47, 10.09]   1.0000
      3    300       8.82    0.615  [7.61, 10.03]   1.0000
      4    400       8.83    0.572  [7.71,  9.95]   1.0000
      5    500       8.86    0.536  [7.81,  9.91]   1.0000
      6    600       8.90    0.506  [7.91,  9.89]   1.0000
      7    700       8.91    0.480  [7.97,  9.85]   1.0000
      8    800       8.92    0.457  [8.03,  9.82]   1.0000

Key Findings

1. The Prior Provided Meaningful Regularization

The observational study alone estimated the treatment effect at 8.4 mmHg with a wide confidence interval of [6.6, 10.2]. Incorporating the meta-analytic prior (under moderate assumptions) tightened the credible interval to [7.09, 10.31] and pulled the estimate toward the drug-class average. The prior did not arbitrarily bias the result: it traded a small, explicit amount of shrinkage toward the class average for improved precision, by incorporating legitimate external evidence.

2. Prior Sensitivity Was Bounded

The posterior mean ranged from 8.49 to 9.14 across three priors spanning a factor-of-4 range in prior standard deviation. For the key clinical question ($P(\text{effect} > 5) > 0.99$?), all three priors agreed. This robustness increases confidence in the conclusion.

3. Sequential Updating Was Natural and Efficient

As RCT cohorts arrived, the posterior updated smoothly without multiple-testing adjustments. The credible interval narrowed from width 3.22 (before RCT) to 1.79 (after 800 patients). At every interim analysis, the team could answer "given everything we know, what is the probability the drug works?", a direct probability statement about the treatment effect that the frequentist framework does not supply; frequentist interim looks instead rely on pre-specified stopping rules to control error rates.
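The absence of a multiplicity penalty rests on a property of the conjugate update that is easy to verify: folding in cohorts one at a time yields exactly the same posterior as a single batch update on the pooled data. A minimal check, using hypothetical cohort means (illustration only, not the simulated values above):

```python
import numpy as np

prior_mu, prior_sigma = 8.70, 0.82      # the moderate-prior posterior from Step 1
cohort_means = [9.1, 8.6, 9.4, 8.8]     # hypothetical interim estimates
cohort_se = 15.0 / np.sqrt(100)         # SE of each 100-patient cohort mean

# Sequential: update one cohort at a time
mu, prec = prior_mu, 1 / prior_sigma**2
for m in cohort_means:
    d_prec = 1 / cohort_se**2
    mu = (prec * mu + d_prec * m) / (prec + d_prec)
    prec += d_prec

# Batch: pool all cohorts into a single likelihood term
pooled_mean = float(np.mean(cohort_means))
pooled_prec = len(cohort_means) / cohort_se**2
batch_prec = 1 / prior_sigma**2 + pooled_prec
batch_mu = (prior_mu / prior_sigma**2 + pooled_prec * pooled_mean) / batch_prec

print(np.isclose(mu, batch_mu), np.isclose(prec, batch_prec))  # True True
```

Because precisions add and the mean is a precision-weighted average, the order in which evidence arrives cannot change the final posterior.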

4. The Bayesian and Frequentist Answers Converge

After 800 RCT patients, the posterior mean (8.92) is close to what a frequentist analysis of the full dataset would produce. The Bayesian advantage was in the early stages: at cohort 0 (before any RCT data), the Bayesian estimate was already informative because it incorporated the meta-analysis and observational study. A frequentist analysis restricted to the RCT would have had no data to analyze at that point.

Regulatory Considerations

The FDA's 2019 guidance on "Interacting with the FDA on Complex Innovative Trial Designs" explicitly acknowledges Bayesian methods for clinical trials. Key regulatory requirements include:

  • Pre-specification of the prior: The prior must be declared in the statistical analysis plan before data collection. Post-hoc prior selection is not acceptable.
  • Prior justification: The prior must be based on relevant external evidence (published studies, meta-analyses), and the relevance of the external data to the current study must be argued.
  • Sensitivity analysis: Results under multiple priors (including a skeptical prior centered at no effect) must be reported.
  • Frequentist operating characteristics: Even for Bayesian trials, regulators require evidence of acceptable Type I error rates and power under repeated sampling. The Bayesian decision rule (e.g., "declare efficacy if $P(\text{effect} > 0 \mid D) > 0.975$") must be calibrated to control the frequentist false positive rate.
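The calibration requirement in the last bullet can be checked by simulation: fix the null (true effect = 0), repeatedly draw trial estimates, apply the Bayesian decision rule, and count how often it declares efficacy. A sketch under assumed values — a hypothetical weakly informative skeptical prior and a sampling SE matching the planned 800-patient trial:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 50_000
se = 15.0 / np.sqrt(800)            # sampling SE of the trial estimate (assumed)
prior_mu, prior_sigma = 0.0, 10.0   # hypothetical skeptical, weakly informative prior

# Trial estimates under the null (true effect = 0)
est = rng.normal(0.0, se, n_sims)

# Conjugate posterior for each simulated trial
post_prec = 1 / prior_sigma**2 + 1 / se**2
post_sd = 1 / np.sqrt(post_prec)
post_mu = (prior_mu / prior_sigma**2 + est / se**2) / post_prec

# Decision rule: declare efficacy if P(effect > 0 | D) > 0.975
declare = (1 - stats.norm.cdf(0.0, post_mu, post_sd)) > 0.975
print(f"Estimated Type I error: {declare.mean():.4f}")  # ~0.025 for this weak prior
```

An optimistic prior centered at the class mean would inflate this false positive rate, which is exactly why regulators require the prior to be pre-specified and the rule's operating characteristics to be demonstrated.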

This case study demonstrates that Bayesian and frequentist approaches are not competing philosophies — they are complementary tools that serve different aspects of the regulatory and scientific process. The Bayesian framework provides natural incorporation of prior evidence and direct probability statements about the treatment effect. The frequentist framework provides operating characteristics (error rates, power) that regulators require for approval decisions.

Production Reality: In pharmaceutical data science, the debate is not "Bayesian vs. frequentist" but "how to use both." The primary analysis is usually frequentist (for regulatory acceptance), with a pre-specified Bayesian analysis as a co-primary or sensitivity analysis. The Bayesian analysis is most valued for adaptive trial designs, where enrollment, dose selection, or early stopping decisions benefit from formal probability statements about treatment effects.