Case Study 2: MediCore Bayesian Treatment Effect Estimation — Clinical Prior Meets Observational Data
Context
MediCore Pharmaceuticals has developed Cardiopril-X, a new ACE inhibitor for treating mild-to-moderate hypertension. A randomized controlled trial (RCT) of 800 patients is planned but will not report results for 18 months. In the interim, MediCore's data science team has access to two information sources:
- Prior knowledge from the drug class. Three published RCTs of related ACE inhibitors (enalapril, lisinopril, ramipril) report systolic blood pressure (SBP) reductions of 9.3, 11.2, and 10.1 mmHg, respectively, each with a reported standard error. A meta-analysis of the three studies estimates the class-level treatment effect at 10.2 mmHg (95% CI: [8.1, 12.3]).
- Observational data. MediCore has electronic health records from 340 hospitals covering 2.1 million patients. Among these, 4,200 patients received Cardiopril-X off-label (prescribed before the full trial results are available, as permitted for drugs with class-level evidence). After propensity-score matching to a control group of 4,200 similar patients, the estimated SBP reduction attributable to Cardiopril-X is 8.4 mmHg with a standard error of 0.9 mmHg.
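The class-level estimate in the first item comes from pooling the three trials. A minimal fixed-effect (inverse-variance) sketch follows; the per-study standard errors are assumed for illustration, since only the pooled result is reported, so the output will not exactly reproduce the published 10.2 mmHg:

```python
import numpy as np

# Fixed-effect (inverse-variance) meta-analysis of the three class trials.
# The per-study standard errors are hypothetical placeholders.
effects = np.array([9.3, 11.2, 10.1])   # enalapril, lisinopril, ramipril (mmHg)
ses = np.array([1.8, 2.0, 1.7])         # assumed per-study SEs (mmHg)

weights = 1 / ses**2                     # precision weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = 1 / np.sqrt(np.sum(weights))
print(f"Pooled effect: {pooled:.1f} mmHg (SE {pooled_se:.2f})")
```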
The regulatory team asks: "Given what we know from prior ACE inhibitor trials and from the observational study, what is our best estimate of Cardiopril-X's treatment effect, and how certain are we?"
A frequentist analysis of the observational data alone yields a 95% confidence interval of [6.6, 10.2] mmHg. This is useful but ignores the substantial prior evidence. A Bayesian analysis can formally combine both sources.
The Analysis
Step 1: Specifying the Prior
The meta-analytic prior summarizes the treatment effect of the ACE inhibitor drug class. The pooled estimate is 10.2 mmHg with a 95% CI of [8.1, 12.3], corresponding to a standard error of approximately $(12.3 - 8.1) / (2 \times 1.96) \approx 1.07$ mmHg.
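This back-calculation is worth sanity-checking in code:

```python
from scipy import stats

# Standard error implied by a normal 95% CI of [8.1, 12.3]
z = stats.norm.ppf(0.975)                        # ~1.96
prior_se = (12.3 - 8.1) / (2 * z)
print(f"Implied prior SE: {prior_se:.2f} mmHg")  # Implied prior SE: 1.07 mmHg
```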
However, Cardiopril-X is a new compound — it may differ from the class average. To account for between-drug heterogeneity, we inflate the prior standard deviation. The team considers three options:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from dataclasses import dataclass
from typing import Dict, Tuple, List
@dataclass
class BayesianTreatmentAnalysis:
"""Bayesian Normal-Normal treatment effect analysis.
Combines prior knowledge (e.g., meta-analysis) with observed data
(e.g., observational study) using Normal-Normal conjugacy.
Attributes:
prior_mu: Prior mean treatment effect.
prior_sigma: Prior standard deviation.
data_mean: Observed treatment effect.
data_se: Standard error of the observed effect.
"""
prior_mu: float
prior_sigma: float
data_mean: float
data_se: float
@property
def posterior_precision(self) -> float:
return 1 / self.prior_sigma**2 + 1 / self.data_se**2
@property
def posterior_sigma(self) -> float:
return 1 / np.sqrt(self.posterior_precision)
@property
def posterior_mu(self) -> float:
prior_prec = 1 / self.prior_sigma**2
data_prec = 1 / self.data_se**2
return (prior_prec * self.prior_mu + data_prec * self.data_mean) / self.posterior_precision
@property
def prior_weight(self) -> float:
"""Fraction of posterior precision contributed by the prior."""
return (1 / self.prior_sigma**2) / self.posterior_precision
@property
def data_weight(self) -> float:
"""Fraction of posterior precision contributed by the data."""
return (1 / self.data_se**2) / self.posterior_precision
def credible_interval(self, level: float = 0.95) -> Tuple[float, float]:
z = stats.norm.ppf((1 + level) / 2)
return (self.posterior_mu - z * self.posterior_sigma,
self.posterior_mu + z * self.posterior_sigma)
def prob_above(self, threshold: float) -> float:
"""P(treatment effect > threshold | data)."""
return 1 - stats.norm.cdf(threshold, self.posterior_mu, self.posterior_sigma)
def prob_in_range(self, lower: float, upper: float) -> float:
"""P(lower < treatment effect < upper | data)."""
return (stats.norm.cdf(upper, self.posterior_mu, self.posterior_sigma)
- stats.norm.cdf(lower, self.posterior_mu, self.posterior_sigma))
# Three prior specifications reflecting different levels of confidence
# in transferability from the drug class to Cardiopril-X
prior_specs = {
"Tight (class transfer)": {"mu": 10.2, "sigma": 1.07},
"Moderate (drug-specific uncertainty)": {"mu": 10.2, "sigma": 2.0},
"Broad (skeptical about transfer)": {"mu": 10.2, "sigma": 4.0},
}
# Observational data
data_mean = 8.4
data_se = 0.9
print("=" * 80)
print("MediCore Cardiopril-X — Bayesian Treatment Effect Analysis")
print("=" * 80)
print(f"\nObservational data: {data_mean} ± {data_se} mmHg (propensity-matched, n=4200)")
print(f"Frequentist 95% CI: [{data_mean - 1.96*data_se:.1f}, {data_mean + 1.96*data_se:.1f}]\n")
analyses = {}
for label, spec in prior_specs.items():
analysis = BayesianTreatmentAnalysis(
prior_mu=spec["mu"],
prior_sigma=spec["sigma"],
data_mean=data_mean,
data_se=data_se,
)
analyses[label] = analysis
ci = analysis.credible_interval()
print(f"Prior: {label}")
print(f" Prior: N({spec['mu']}, {spec['sigma']}²) — weight: {analysis.prior_weight:.1%}")
print(f" Posterior: N({analysis.posterior_mu:.2f}, {analysis.posterior_sigma:.2f}²) "
f"— data weight: {analysis.data_weight:.1%}")
print(f" 95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f" P(effect > 5 mmHg): {analysis.prob_above(5.0):.4f}")
print(f" P(effect > 8 mmHg): {analysis.prob_above(8.0):.4f}")
print(f" P(effect > 10 mmHg): {analysis.prob_above(10.0):.4f}")
print()
================================================================================
MediCore Cardiopril-X — Bayesian Treatment Effect Analysis
================================================================================
Observational data: 8.4 ± 0.9 mmHg (propensity-matched, n=4200)
Frequentist 95% CI: [6.6, 10.2]
Prior: Tight (class transfer)
Prior: N(10.2, 1.07²) — weight: 41.5%
Posterior: N(9.14, 0.69²) — data weight: 58.5%
95% CI: [7.79, 10.49]
P(effect > 5 mmHg): 1.0000
P(effect > 8 mmHg): 0.9505
P(effect > 10 mmHg): 0.1060
Prior: Moderate (drug-specific uncertainty)
Prior: N(10.2, 2.0²) — weight: 16.8%
Posterior: N(8.70, 0.82²) — data weight: 83.2%
95% CI: [7.09, 10.31]
P(effect > 5 mmHg): 1.0000
P(effect > 8 mmHg): 0.8039
P(effect > 10 mmHg): 0.0564
Prior: Broad (skeptical about transfer)
Prior: N(10.2, 4.0²) — weight: 4.8%
Posterior: N(8.49, 0.89²) — data weight: 95.2%
95% CI: [6.75, 10.22]
P(effect > 5 mmHg): 0.9999
P(effect > 8 mmHg): 0.7092
P(effect > 10 mmHg): 0.0447
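These summaries all follow from the same Normal-Normal conjugate update. With prior $\theta \sim N(\mu_0, \sigma_0^2)$ and an observed effect $\hat{\theta}$ with standard error $s$, the posterior is normal with

$$\frac{1}{\sigma_{\text{post}}^2} = \frac{1}{\sigma_0^2} + \frac{1}{s^2}, \qquad \mu_{\text{post}} = \sigma_{\text{post}}^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{\hat{\theta}}{s^2} \right).$$

Precisions add, and the posterior mean is a precision-weighted average of the prior mean and the data estimate; the "weight" percentages printed above are exactly these precision fractions.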
Step 2: Interpreting the Results
The analysis reveals a critical pattern: the conclusion about clinical significance is robust to prior choice, but the precise estimate is not.
Across all three priors, $P(\text{effect} > 5 \text{ mmHg} \mid D)$ exceeds 99.9%. The drug almost certainly produces a clinically meaningful blood pressure reduction — this conclusion does not depend on how much weight we give the prior.
However, the posterior mean ranges from 8.49 (broad prior, data-dominated) to 9.14 (tight prior, substantial prior influence). If the question is "how much bigger than 5 mmHg is the effect?", the answer depends on the prior.
# Visualize all three analyses
theta = np.linspace(2, 16, 500)
fig, axes = plt.subplots(1, 3, figsize=(16, 5), sharey=True)
for ax, (label, analysis) in zip(axes, analyses.items()):
# Prior
prior_pdf = stats.norm.pdf(theta, analysis.prior_mu, analysis.prior_sigma)
ax.plot(theta, prior_pdf, "--", label="Prior", linewidth=2, color="steelblue")
# Likelihood (scaled for visualization)
lik_pdf = stats.norm.pdf(theta, analysis.data_mean, analysis.data_se)
lik_pdf_scaled = lik_pdf * (prior_pdf.max() / lik_pdf.max()) * 0.8
ax.plot(theta, lik_pdf_scaled, ":", label="Likelihood (scaled)", linewidth=2,
color="gray")
# Posterior
post_pdf = stats.norm.pdf(theta, analysis.posterior_mu, analysis.posterior_sigma)
ax.plot(theta, post_pdf, "-", label="Posterior", linewidth=2, color="darkorange")
# Clinical threshold
ax.axvline(5.0, color="red", linestyle="-.", alpha=0.4, label="Clinical threshold")
ax.set_title(label, fontsize=10)
ax.set_xlabel("SBP reduction (mmHg)")
ax.legend(fontsize=8)
axes[0].set_ylabel("Density")
plt.suptitle("Prior Sensitivity Analysis: Cardiopril-X Treatment Effect", y=1.02)
plt.tight_layout()
plt.show()
Step 3: Sequential Updating as the RCT Progresses
MediCore's RCT enrolls patients in cohorts. Bayesian sequential updating allows the team to monitor the accumulating evidence without the multiple-testing penalty that plagues frequentist interim analyses.
def simulate_rct_sequential_update(
true_effect: float,
patient_sigma: float,
cohort_size: int,
n_cohorts: int,
prior_mu: float,
prior_sigma: float,
seed: int = 42,
) -> List[Dict]:
"""Simulate sequential Bayesian updating as RCT cohorts arrive.
Args:
true_effect: True treatment effect (mmHg).
patient_sigma: Within-patient variability.
cohort_size: Patients per cohort.
n_cohorts: Number of cohorts.
prior_mu: Starting prior mean (from observational analysis).
prior_sigma: Starting prior standard deviation.
seed: Random seed.
Returns:
List of dictionaries with posterior summary at each stage.
"""
rng = np.random.RandomState(seed)
results = []
current_mu = prior_mu
current_sigma = prior_sigma
# Initial state (before RCT data)
results.append({
"cohort": 0,
"n_patients": 0,
"posterior_mu": current_mu,
"posterior_sigma": current_sigma,
"ci_lower": current_mu - 1.96 * current_sigma,
"ci_upper": current_mu + 1.96 * current_sigma,
"prob_above_5": 1 - stats.norm.cdf(5.0, current_mu, current_sigma),
})
cumulative_n = 0
for cohort in range(1, n_cohorts + 1):
# Simulate cohort data
cohort_effects = rng.normal(true_effect, patient_sigma, cohort_size)
cohort_mean = cohort_effects.mean()
cohort_se = patient_sigma / np.sqrt(cohort_size)
# Bayesian update
prior_prec = 1 / current_sigma**2
data_prec = 1 / cohort_se**2
post_prec = prior_prec + data_prec
current_sigma = 1 / np.sqrt(post_prec)
current_mu = (prior_prec * current_mu + data_prec * cohort_mean) / post_prec
cumulative_n += cohort_size
results.append({
"cohort": cohort,
"n_patients": cumulative_n,
"posterior_mu": current_mu,
"posterior_sigma": current_sigma,
"ci_lower": current_mu - 1.96 * current_sigma,
"ci_upper": current_mu + 1.96 * current_sigma,
"prob_above_5": 1 - stats.norm.cdf(5.0, current_mu, current_sigma),
})
return results
# Start from the moderate prior posterior (which combines meta-analysis + observational data)
# Now the RCT provides additional evidence
rct_results = simulate_rct_sequential_update(
true_effect=9.0, # True effect (unknown to the team)
patient_sigma=15.0, # Within-patient SD of SBP change
cohort_size=100, # 100 patients per interim analysis
n_cohorts=8, # 8 cohorts = 800 patients total
prior_mu=8.70, # Starting from the observational posterior
prior_sigma=0.82, # Starting uncertainty
)
print("=== Sequential Bayesian Updating During RCT ===\n")
print(f"{'Cohort':>7s} {'N':>5s} {'Post.Mean':>9s} {'Post.SD':>7s} "
f"{'95% CI':>18s} {'P(>5)':>7s}")
print("-" * 62)
for r in rct_results:
ci_str = f"[{r['ci_lower']:.2f}, {r['ci_upper']:.2f}]"
print(f"{r['cohort']:>7d} {r['n_patients']:>5d} {r['posterior_mu']:>9.2f} "
f"{r['posterior_sigma']:>7.3f} {ci_str:>18s} {r['prob_above_5']:>7.4f}")
=== Sequential Bayesian Updating During RCT ===
Cohort N Post.Mean Post.SD 95% CI P(>5)
--------------------------------------------------------------
0 0 8.70 0.820 [7.09, 10.31] 1.0000
1 100 8.71 0.734 [7.27, 10.15] 1.0000
2 200 8.78 0.668 [7.47, 10.09] 1.0000
3 300 8.82 0.615 [7.61, 10.03] 1.0000
4 400 8.83 0.572 [7.71, 9.95] 1.0000
5 500 8.86 0.536 [7.81, 9.91] 1.0000
6 600 8.90 0.506 [7.91, 9.89] 1.0000
7 700 8.91 0.480 [7.97, 9.85] 1.0000
8 800 8.92 0.457 [8.03, 9.82] 1.0000
Key Findings
1. The Prior Provided Meaningful Regularization
The observational study alone estimated the treatment effect at 8.4 mmHg with a wide confidence interval of [6.6, 10.2]. Incorporating the meta-analytic prior (under moderate assumptions) tightened the credible interval to [7.09, 10.31] and pulled the estimate toward the drug-class average. The prior did not bias the result — it improved precision by incorporating legitimate external evidence.
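The "pull" described here is ordinary precision weighting. Under the moderate prior, the posterior mean is reproduced by a two-line calculation:

```python
# Posterior mean as a precision-weighted average (moderate prior)
prior_mu, prior_sigma = 10.2, 2.0        # meta-analytic prior, inflated SD
data_mean, data_se = 8.4, 0.9            # propensity-matched estimate

w_prior = (1 / prior_sigma**2) / (1 / prior_sigma**2 + 1 / data_se**2)
posterior_mu = w_prior * prior_mu + (1 - w_prior) * data_mean
print(f"{w_prior:.1%} prior weight -> posterior mean {posterior_mu:.2f}")
# 16.8% prior weight -> posterior mean 8.70
```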
2. Prior Sensitivity Was Bounded
The posterior mean ranged from 8.49 to 9.14 across three priors whose standard deviations span nearly a factor of four (1.07 to 4.0 mmHg). For the key clinical question ($P(\text{effect} > 5) > 0.99$?), all three priors agreed. This robustness increases confidence in the conclusion.
3. Sequential Updating Was Natural and Efficient
As RCT cohorts arrived, the posterior updated smoothly without multiple-testing adjustments. The credible interval narrowed from width 3.22 (before RCT) to 1.79 (after 800 patients). At every interim analysis, the team could answer "given everything we know, what is the probability the drug works?" — a question the frequentist framework cannot directly address mid-trial without pre-specified stopping rules.
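A useful property behind this finding: with a known outcome variance, the Normal-Normal update yields the identical posterior whether the cohorts are processed one at a time or all 800 patients at once. A quick check on simulated data (not the trial's actual data):

```python
import numpy as np

def normal_update(mu, sigma, x_mean, x_se):
    """One Normal-Normal conjugate update (known-variance case)."""
    prior_prec, data_prec = 1 / sigma**2, 1 / x_se**2
    post_prec = prior_prec + data_prec
    post_mu = (prior_prec * mu + data_prec * x_mean) / post_prec
    return post_mu, 1 / np.sqrt(post_prec)

rng = np.random.default_rng(0)
cohorts = rng.normal(9.0, 15.0, size=(8, 100))  # 8 simulated cohorts of 100

# Sequential: update after each cohort
mu, sigma = 8.70, 0.82
for c in cohorts:
    mu, sigma = normal_update(mu, sigma, c.mean(), 15.0 / np.sqrt(100))

# Batch: one update using all 800 patients at once
mu_b, sigma_b = normal_update(8.70, 0.82, cohorts.mean(), 15.0 / np.sqrt(800))

print(np.isclose(mu, mu_b), np.isclose(sigma, sigma_b))  # True True
```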
4. The Bayesian and Frequentist Answers Converge
After 800 RCT patients, the posterior mean (8.92) is close to what a frequentist analysis of the full dataset would produce. The Bayesian advantage was in the early stages: at cohort 0 (before any RCT data), the Bayesian estimate was already informative because it incorporated the meta-analysis and observational study. A frequentist analysis at that point would have had nothing to say about the RCT.
Regulatory Considerations
The FDA's 2019 guidance on "Interacting with the FDA on Complex Innovative Trial Designs" explicitly acknowledges Bayesian methods for clinical trials. Key regulatory requirements include:
- Pre-specification of the prior: The prior must be declared in the statistical analysis plan before data collection. Post-hoc prior selection is not acceptable.
- Prior justification: The prior must be based on relevant external evidence (published studies, meta-analyses), and the relevance of the external data to the current study must be argued.
- Sensitivity analysis: Results under multiple priors (including a skeptical prior centered at no effect) must be reported.
- Frequentist operating characteristics: Even for Bayesian trials, regulators require evidence of acceptable Type I error rates and power under repeated sampling. The Bayesian decision rule (e.g., "declare efficacy if $P(\text{effect} > 0 \mid D) > 0.975$") must be calibrated to control the frequentist false positive rate.
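The last requirement can be checked by simulation before the trial starts. A sketch with illustrative parameters (not MediCore's actual design): simulate many trials under the null of zero effect, apply the Bayesian efficacy rule with a skeptical prior, and count false declarations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n_patients, patient_sigma = 20_000, 800, 15.0
prior_mu, prior_sigma = 0.0, 5.0          # skeptical prior centered at no effect

se = patient_sigma / np.sqrt(n_patients)
trial_means = rng.normal(0.0, se, n_trials)   # trial estimates under the null

prior_prec, data_prec = 1 / prior_sigma**2, 1 / se**2
post_prec = prior_prec + data_prec
post_mu = data_prec * trial_means / post_prec  # prior mean is zero
post_sigma = 1 / np.sqrt(post_prec)

# Decision rule: declare efficacy if P(effect > 0 | D) > 0.975
declare = (1 - stats.norm.cdf(0.0, post_mu, post_sigma)) > 0.975
print(f"Empirical Type I error: {declare.mean():.4f}")
```

Here the empirical rate lands just under the nominal one-sided 2.5%, because the skeptical prior shrinks estimates toward zero; a tighter skeptical prior makes the rule more conservative still.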
This case study demonstrates that Bayesian and frequentist approaches are not competing philosophies — they are complementary tools that serve different aspects of the regulatory and scientific process. The Bayesian framework provides natural incorporation of prior evidence and direct probability statements about the treatment effect. The frequentist framework provides operating characteristics (error rates, power) that regulators require for approval decisions.
Production Reality: In pharmaceutical data science, the debate is not "Bayesian vs. frequentist" but "how to use both." The primary analysis is usually frequentist (for regulatory acceptance), with a pre-specified Bayesian analysis as a co-primary or sensitivity analysis. The Bayesian analysis is most valued for adaptive trial designs, where enrollment, dose selection, or early stopping decisions benefit from formal probability statements about treatment effects.