Case Study 1: Bayesian Treatment Effect Estimation at MediCore Pharmaceuticals

From Clinical Priors to Regulatory Decisions


Background

MediCore Pharmaceuticals is evaluating CardioShield, a novel antiplatelet agent designed to reduce major adverse cardiovascular events (MACE) in patients with acute coronary syndrome. A Phase III randomized controlled trial is planned, but before committing \$120M to the trial, the regulatory science team needs to integrate existing evidence and determine whether the drug's observed efficacy warrants the investment.

The available evidence comes from three sources:

  1. Mechanism of action: CardioShield targets the P2Y12 receptor with higher selectivity than existing agents. Preclinical models suggest a 15-25% relative risk reduction (RRR) in MACE.
  2. Phase II trial: 400 patients randomized (200 treatment, 200 placebo). Treatment group: 24 MACE events. Placebo group: 34 MACE events.
  3. Registry data: An observational database of 8,200 patients treated with off-label CardioShield at three academic medical centers. MACE rate: 9.8% in treated vs. 13.1% in matched controls (propensity score matching).

The regulatory team must answer: What is the probability that CardioShield's true MACE rate reduction exceeds the clinically meaningful threshold of 10% RRR?

The Bayesian Framework

We model the MACE rate in the treatment group as $\theta_T$ and in the placebo group as $\theta_P$, with the relative risk reduction defined as:

$$\text{RRR} = 1 - \frac{\theta_T}{\theta_P}$$

For tractability, we begin with a simpler formulation: estimating each rate independently using the Beta-Binomial conjugate model, then computing the derived quantity RRR from the joint posterior.

Step 1: Specifying the Prior

The prior encodes clinical knowledge before observing Phase II data. MediCore's clinical advisory board provides the following assessments:

  • Placebo MACE rate: Historical trials in this population show MACE rates of 12-18%. We encode this as $\theta_P \sim \text{Beta}(15, 85)$, which has mean 0.15 and a 95% credible interval of approximately [0.09, 0.22].
  • Treatment MACE rate: Based on the mechanism of action and preclinical data, experts believe the treatment rate will be 10-15% (lower than placebo). We encode this as $\theta_T \sim \text{Beta}(12, 88)$, with mean 0.12 and 95% CI of approximately [0.07, 0.18].
from scipy import stats

# Prior specification
prior_T = {"alpha": 12, "beta": 88}  # Treatment MACE rate
prior_P = {"alpha": 15, "beta": 85}  # Placebo MACE rate

# Verify prior properties
for name, prior in [("Treatment", prior_T), ("Placebo", prior_P)]:
    a, b = prior["alpha"], prior["beta"]
    dist = stats.beta(a, b)
    print(f"{name} prior: Beta({a}, {b})")
    print(f"  Mean: {dist.mean():.3f}")
    print(f"  Std:  {dist.std():.3f}")
    print(f"  95% CI: [{dist.ppf(0.025):.3f}, {dist.ppf(0.975):.3f}]")
    print()

Research Insight: Prior elicitation is both art and science. The Sheffield Elicitation Framework (SHELF) provides structured protocols for translating expert judgment into probability distributions. Key principles: (1) elicit quantiles, not parameters; (2) use multiple experts independently to avoid anchoring; (3) document the elicitation process for regulatory transparency. MediCore's regulatory submission must justify each prior to the FDA.
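The quantile-based elicitation that SHELF recommends can be turned into Beta parameters numerically. The sketch below is illustrative, not MediCore's actual elicitation protocol — the target quantiles are hypothetical — but it shows how an elicited median and 95th percentile can be fit to a Beta distribution:

```python
from scipy import stats, optimize

def beta_from_quantiles(q50: float, q95: float) -> tuple[float, float]:
    """Find Beta(a, b) whose median and 95th percentile match
    the elicited quantiles."""
    def loss(params):
        a, b = params
        if a <= 0 or b <= 0:  # keep the search in the valid region
            return 1e6
        dist = stats.beta(a, b)
        return (dist.ppf(0.50) - q50) ** 2 + (dist.ppf(0.95) - q95) ** 2

    res = optimize.minimize(loss, x0=[10.0, 60.0], method="Nelder-Mead",
                            options={"xatol": 1e-6, "fatol": 1e-10})
    return float(res.x[0]), float(res.x[1])

# Hypothetical elicited judgments for the placebo MACE rate:
# "the median is 15%, and we are 95% sure it is below 21%"
a, b = beta_from_quantiles(q50=0.15, q95=0.21)
fitted = stats.beta(a, b)
print(f"Fitted prior: Beta({a:.1f}, {b:.1f})")
print(f"  median: {fitted.ppf(0.50):.3f}, 95th pct: {fitted.ppf(0.95):.3f}")
```

In practice each expert's quantiles would be fitted separately, and the resulting distributions compared or pooled before settling on a prior.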

Step 2: Updating with Phase II Data

The Phase II trial provides the likelihood:

  • Treatment: $n_T = 200$, $k_T = 24$ MACE events
  • Placebo: $n_P = 200$, $k_P = 34$ MACE events

With the Beta-Binomial conjugate model, the posterior is immediate:

$$\theta_T \mid \text{data} \sim \text{Beta}(\alpha_T + k_T, \beta_T + n_T - k_T)$$

$$\theta_P \mid \text{data} \sim \text{Beta}(\alpha_P + k_P, \beta_P + n_P - k_P)$$

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Phase II data
n_T, k_T = 200, 24  # Treatment: 24/200 MACE
n_P, k_P = 200, 34  # Placebo: 34/200 MACE

# Prior parameters
prior_T_a, prior_T_b = 12, 88
prior_P_a, prior_P_b = 15, 85

# Posterior parameters (conjugate update)
post_T_a = prior_T_a + k_T       # 12 + 24 = 36
post_T_b = prior_T_b + n_T - k_T  # 88 + 176 = 264
post_P_a = prior_P_a + k_P       # 15 + 34 = 49
post_P_b = prior_P_b + n_P - k_P  # 85 + 166 = 251

print("Posterior distributions:")
for name, a, b in [("Treatment", post_T_a, post_T_b),
                     ("Placebo", post_P_a, post_P_b)]:
    dist = stats.beta(a, b)
    print(f"  {name}: Beta({a}, {b})")
    print(f"    Mean: {dist.mean():.4f}")
    print(f"    95% CI: [{dist.ppf(0.025):.4f}, {dist.ppf(0.975):.4f}]")

# Visualize prior → posterior shift
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
theta = np.linspace(0, 0.35, 1000)

for ax, name, prior_a, prior_b, post_a, post_b, k, n in [
    (axes[0], "Treatment (MACE rate)", prior_T_a, prior_T_b,
     post_T_a, post_T_b, k_T, n_T),
    (axes[1], "Placebo (MACE rate)", prior_P_a, prior_P_b,
     post_P_a, post_P_b, k_P, n_P),
]:
    ax.plot(theta, stats.beta.pdf(theta, prior_a, prior_b),
            'b--', linewidth=2, label="Prior")
    ax.plot(theta, stats.beta.pdf(theta, post_a, post_b),
            'r-', linewidth=3, label="Posterior")
    ax.axvline(k / n, color='green', linestyle=':', linewidth=1.5,
               label=f"MLE = {k/n:.3f}")
    ax.set_title(name)
    ax.set_xlabel(r"$\theta$")
    ax.legend()

plt.suptitle("MediCore CardioShield: Bayesian Update with Phase II Data",
             fontsize=13)
plt.tight_layout()
plt.show()

Step 3: Computing the Posterior on the Relative Risk Reduction

The RRR is a derived quantity: $\text{RRR} = 1 - \theta_T / \theta_P$. Since we have the posterior distributions for $\theta_T$ and $\theta_P$ independently, we use Monte Carlo sampling to compute the posterior distribution of RRR.

import numpy as np

def compute_posterior_rrr(
    post_T_a: float, post_T_b: float,
    post_P_a: float, post_P_b: float,
    n_samples: int = 500_000,
    random_state: int = 42,
) -> dict[str, float | np.ndarray]:
    """Compute posterior distribution of relative risk reduction.

    RRR = 1 - theta_T / theta_P

    Args:
        post_T_a, post_T_b: Posterior Beta parameters for treatment.
        post_P_a, post_P_b: Posterior Beta parameters for placebo.
        n_samples: Number of Monte Carlo samples.
        random_state: Random seed.

    Returns:
        Dictionary with posterior summary statistics.
    """
    rng = np.random.RandomState(random_state)

    theta_T_samples = rng.beta(post_T_a, post_T_b, size=n_samples)
    theta_P_samples = rng.beta(post_P_a, post_P_b, size=n_samples)

    rrr_samples = 1 - theta_T_samples / theta_P_samples

    results = {
        "mean": np.mean(rrr_samples),
        "median": np.median(rrr_samples),
        "std": np.std(rrr_samples),
        "ci_2.5": np.percentile(rrr_samples, 2.5),
        "ci_97.5": np.percentile(rrr_samples, 97.5),
        "p_rrr_gt_0": np.mean(rrr_samples > 0),
        "p_rrr_gt_10pct": np.mean(rrr_samples > 0.10),
        "p_rrr_gt_20pct": np.mean(rrr_samples > 0.20),
        "samples": rrr_samples,
    }
    return results

results = compute_posterior_rrr(post_T_a, post_T_b, post_P_a, post_P_b)

print("Posterior distribution of Relative Risk Reduction (RRR):")
print(f"  Mean:                    {results['mean']:.3f} ({results['mean']*100:.1f}%)")
print(f"  Median:                  {results['median']:.3f}")
print(f"  95% Credible Interval:   [{results['ci_2.5']:.3f}, {results['ci_97.5']:.3f}]")
print()
print("Decision-relevant probabilities:")
print(f"  P(RRR > 0%  | data):     {results['p_rrr_gt_0']:.4f}")
print(f"  P(RRR > 10% | data):     {results['p_rrr_gt_10pct']:.4f}")
print(f"  P(RRR > 20% | data):     {results['p_rrr_gt_20pct']:.4f}")
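Since these probabilities are themselves Monte Carlo estimates, it is worth checking their sampling error. For a probability estimated as the fraction of samples exceeding a threshold, the standard error follows the usual binomial formula; this quick check (not part of the original analysis) confirms that 500,000 draws make it negligible:

```python
import numpy as np

def mc_standard_error(p_hat: float, n_samples: int) -> float:
    """Standard error of a Monte Carlo estimate of a probability."""
    return float(np.sqrt(p_hat * (1 - p_hat) / n_samples))

# e.g. a decision probability near 0.8 estimated from 500,000 draws
se = mc_standard_error(0.80, 500_000)
print(f"MC standard error: {se:.5f}")  # about 0.00057, far below decision relevance
```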

Step 4: Decision Analysis

MediCore's decision is not just "is the drug effective?" but "should we invest \$120M in a Phase III trial?" This requires combining the posterior with a decision-theoretic framework.

import numpy as np

def expected_value_phase3(
    rrr_samples: np.ndarray,
    trial_cost: float = 120e6,
    peak_revenue_if_approved: float = 2.5e9,
    p_regulatory_approval_given_efficacy: float = 0.70,
    rrr_threshold: float = 0.10,
) -> dict[str, float]:
    """Compute expected value of proceeding to Phase III.

    Args:
        rrr_samples: Monte Carlo samples from RRR posterior.
        trial_cost: Cost of Phase III trial.
        peak_revenue_if_approved: Revenue if drug is approved.
        p_regulatory_approval_given_efficacy: P(FDA approval | RRR > threshold).
        rrr_threshold: Minimum RRR for regulatory approval.

    Returns:
        Dictionary with decision analysis results.
    """
    # Probability of true efficacy exceeding threshold
    p_efficacy = np.mean(rrr_samples > rrr_threshold)

    # Expected revenue
    expected_revenue = (
        p_efficacy
        * p_regulatory_approval_given_efficacy
        * peak_revenue_if_approved
    )

    # Expected value of proceeding
    ev_proceed = expected_revenue - trial_cost

    # Expected value of not proceeding
    ev_stop = 0.0

    return {
        "p_efficacy": p_efficacy,
        "expected_revenue": expected_revenue,
        "ev_proceed": ev_proceed,
        "ev_stop": ev_stop,
        "decision": "PROCEED" if ev_proceed > ev_stop else "STOP",
    }

decision = expected_value_phase3(results["samples"])

print("Phase III Decision Analysis:")
print(f"  P(RRR > 10%):              {decision['p_efficacy']:.3f}")
print(f"  Expected revenue:          ${decision['expected_revenue']/1e6:,.0f}M")
print(f"  Trial cost:                $120M")
print(f"  Expected value (proceed):  ${decision['ev_proceed']/1e6:,.0f}M")
print(f"  Recommendation:            {decision['decision']}")
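One way to stress-test this recommendation is to invert the EV model: under the assumed cost, revenue, and approval probability, how large must P(RRR > 10%) be before proceeding breaks even? This back-of-envelope check reuses the same hypothetical financial inputs as the function above:

```python
# Proceed iff p_efficacy * p_approval * revenue > trial_cost, so the
# break-even efficacy probability is trial_cost / (p_approval * revenue).
trial_cost = 120e6
p_approval = 0.70        # P(FDA approval | RRR > 10%), as assumed above
peak_revenue = 2.5e9

p_breakeven = trial_cost / (p_approval * peak_revenue)
print(f"Break-even P(RRR > 10%): {p_breakeven:.3f}")  # 0.069
```

Because the posterior probability of efficacy is an order of magnitude above this break-even point, the PROCEED recommendation is insensitive to moderate changes in the revenue and approval assumptions.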

Step 5: Sensitivity to the Prior

A responsible Bayesian analysis must assess sensitivity to the prior specification.

import numpy as np

# Sensitivity analysis: varying prior strength
print("Sensitivity to prior specification:\n")
print(f"{'Prior':>25s} | {'Post. Mean RRR':>15s} | {'P(RRR>10%)':>10s}")
print("-" * 60)

for prior_label, t_a, t_b, p_a, p_b in [
    ("Uninformative (1,1)", 1, 1, 1, 1),
    ("Weakly informative", 6, 44, 8, 42),
    ("Base case", 12, 88, 15, 85),
    ("Strong prior (skeptical)", 30, 170, 30, 170),  # both arms centered at 0.15: prior RRR ~ 0
    ("Strong prior (optimistic)", 8, 72, 20, 80),
]:
    post_ta = t_a + k_T
    post_tb = t_b + n_T - k_T
    post_pa = p_a + k_P
    post_pb = p_b + n_P - k_P

    rng = np.random.RandomState(42)
    theta_t = rng.beta(post_ta, post_tb, size=200_000)
    theta_p = rng.beta(post_pa, post_pb, size=200_000)
    rrr = 1 - theta_t / theta_p

    print(f"{prior_label:>25s} | {np.mean(rrr):>14.3f} | {np.mean(rrr > 0.10):>10.3f}")
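The pattern in this table has a simple explanation: a Beta(a, b) prior behaves like a + b pseudo-patients added to the arm, so its weight in the posterior mean is (a + b) / (a + b + n). A quick sketch (the "strong" row is an illustrative a + b = 200 prior, not one of the exact priors above):

```python
# A Beta(a, b) prior contributes a + b pseudo-patients to the conjugate
# update, so its weight in the posterior mean is (a + b) / (a + b + n).
n_arm = 200  # patients per arm in Phase II

for label, a, b in [
    ("Uninformative (1, 1)", 1, 1),
    ("Base case (12, 88)", 12, 88),
    ("Strong, a + b = 200", 30, 170),   # illustrative strong prior
]:
    n_pseudo = a + b
    weight = n_pseudo / (n_pseudo + n_arm)
    print(f"{label:>22s}: {n_pseudo:>3d} pseudo-patients, "
          f"prior weight = {weight:.2f}")
```

Even the base-case prior carries a third of the weight of a Phase II arm, which is why reporting the full sensitivity table to regulators matters.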

Analysis and Discussion

Key findings:

  1. The posterior RRR is positive with high probability (~95%), indicating strong evidence that CardioShield reduces MACE events relative to placebo.

  2. The probability of clinically meaningful efficacy (RRR > 10%) is roughly 80-85% under the base-case prior — sufficient to justify a Phase III investment under standard pharmaceutical decision-making criteria.

  3. Prior sensitivity is moderate. With 200 patients per arm, the Phase II data are informative enough that even substantial prior changes (uninformative to strongly skeptical) shift the posterior mean RRR by on the order of ten percentage points, which is within one posterior standard deviation. The direction of the conclusion is data-driven, not prior-driven.

  4. The frequentist comparison is instructive. A two-proportion $z$-test on the Phase II data yields $z \approx 1.42$, $p \approx 0.16$ (two-sided) — not statistically significant at $\alpha = 0.05$. The Bayesian analysis, which integrates prior clinical knowledge, gives a more nuanced picture: the drug is probably effective, but uncertainty remains. The frequentist test's binary "significant / not significant" framework obscures the decision-relevant probability $P(\text{RRR} > 10\% \mid \text{data})$.
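For reference, the two-proportion $z$-test mentioned in point 4 can be computed directly from the Phase II counts (standard pooled construction, no stats package required):

```python
import numpy as np
from scipy import stats

# Pooled two-proportion z-test on the Phase II counts
k_T, n_T = 24, 200   # treatment
k_P, n_P = 34, 200   # placebo

p_T, p_P = k_T / n_T, k_P / n_P
p_pool = (k_T + k_P) / (n_T + n_P)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_T + 1 / n_P))
z = (p_P - p_T) / se
p_two_sided = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")  # z = 1.42, p = 0.156
```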

Production Reality: In pharmaceutical development, Bayesian methods are increasingly accepted by the FDA and EMA for clinical trial design. The FDA's 2019 guidance on "Adaptive Designs for Clinical Trials of Drugs and Biologics" explicitly discusses Bayesian approaches. However, regulatory submissions must justify the prior, demonstrate robustness to prior specification, and provide frequentist operating characteristics (Type I error control) for the Bayesian design. The analysis above would be supplemented with simulation-based calibration showing that the Bayesian decision rule controls the probability of advancing an ineffective drug.
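A minimal sketch of the simulation-based calibration mentioned above, under assumptions of my own: a null with both arms at a 15% MACE rate, and an "advance" rule requiring P(RRR > 10% | data) > 0.8. A real regulatory submission would simulate the full design, but the skeleton looks like this:

```python
import numpy as np

def simulate_advance_rate(
    true_rate: float = 0.15,        # null: treatment has no effect
    n_per_arm: int = 200,
    prior_T: tuple = (12, 88),
    prior_P: tuple = (15, 85),
    cutoff: float = 0.8,            # advance if P(RRR > 10% | data) > cutoff
    n_trials: int = 1000,
    n_posterior: int = 5000,
    seed: int = 0,
) -> float:
    """Estimate P(advance) across hypothetical trials where the drug is inert."""
    rng = np.random.default_rng(seed)
    advances = 0
    for _ in range(n_trials):
        # Simulate one Phase II-sized trial under the null
        k_T = rng.binomial(n_per_arm, true_rate)
        k_P = rng.binomial(n_per_arm, true_rate)
        # Conjugate posterior draws, then apply the Bayesian decision rule
        theta_T = rng.beta(prior_T[0] + k_T, prior_T[1] + n_per_arm - k_T,
                           size=n_posterior)
        theta_P = rng.beta(prior_P[0] + k_P, prior_P[1] + n_per_arm - k_P,
                           size=n_posterior)
        rrr = 1 - theta_T / theta_P
        if np.mean(rrr > 0.10) > cutoff:
            advances += 1
    return advances / n_trials

print(f"P(advance | drug inert): {simulate_advance_rate():.3f}")
```

If this false-advance rate is unacceptably high, the cutoff (or the RRR threshold) can be tightened and the simulation rerun — exactly the calibration loop regulators expect to see.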

Connection to the Chapter

This case study demonstrates:

  • Bayes' theorem in action: Prior clinical knowledge + Phase II likelihood = posterior for decision-making (Section 3.4)
  • The Beta-Binomial conjugate model: Closed-form posterior updates (Sections 3.4, 3.6)
  • Monte Carlo methods: Computing the posterior of a derived quantity (RRR) from samples (Section 3.12)
  • Bayesian vs. frequentist: The Bayesian analysis provides $P(\text{RRR} > \text{threshold} \mid \text{data})$ — directly answering the decision question — while the frequentist $p$-value answers a different question (Section 3.10)
  • Understanding Why (Theme 1): The mathematical framework does not just compute a number; it provides a principled language for integrating evidence, quantifying uncertainty, and making decisions