Case Study 1: Maya's Public Health Intervention Comparison

The Setup

Dr. Maya Chen has a problem that money alone can't solve.

Her county health department received a federal grant to reduce cardiovascular disease risk in four underserved communities. The department piloted four different intervention programs over the past year, each in a different community:

  • Program A (Vaccination + Screening): Focused on flu vaccination campaigns and free cardiovascular screening events
  • Program B (Nutrition Education): Weekly community cooking classes and nutritional counseling
  • Program C (Community Fitness): Free fitness classes, walking groups, and gym partnerships
  • Program D (Control): Standard care — pamphlets, a county health hotline, and annual health fairs

Now the grant renewal is due, and the county board wants to know: which program worked best? More importantly, they want to know whether the differences are large enough to justify the cost differences — Program C (fitness) costs nearly three times as much per participant as Program A (vaccination).

Maya has health improvement scores (a composite index from 0 to 100, combining blood pressure change, cholesterol levels, BMI change, and self-reported health) for a random sample of participants in each program.
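The case study doesn't specify how the composite index is built. As a minimal sketch, a weighted combination of the four components might look like the following — the weights here are hypothetical, and each component is assumed to have already been rescaled to 0–100 (higher = more improvement):

```python
import numpy as np

def composite_score(bp_change, chol_change, bmi_change, self_report,
                    weights=(0.3, 0.3, 0.2, 0.2)):
    """Combine four health components into a 0-100 index.

    Components are assumed pre-rescaled to 0-100 (higher = better);
    the weights are hypothetical, not Maya's actual formula.
    """
    components = np.array([bp_change, chol_change, bmi_change, self_report])
    w = np.array(weights)
    return float(np.clip(w @ components, 0, 100))

# One participant: strong blood-pressure improvement, modest elsewhere
print(composite_score(85, 70, 60, 75))
```

In practice each component would first be converted from its clinical units (mmHg, mg/dL, kg/m²) onto the common scale before weighting.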

The Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

np.random.seed(2026)

# ============================================================
# MAYA'S INTERVENTION COMPARISON — COMPLETE ANALYSIS
# ============================================================

# Health improvement scores (0–100 composite index)
vaccination = [72, 78, 65, 74, 80, 71, 76, 69, 77, 73,
               68, 82, 75, 70, 79, 66, 74, 81, 73, 76,
               70, 77, 72, 75, 71]

nutrition   = [68, 75, 71, 62, 69, 74, 66, 72, 67, 70,
               64, 73, 69, 71, 65, 70, 68, 76, 63, 72,
               67, 71, 66, 69, 74]

fitness     = [81, 85, 79, 88, 83, 86, 80, 90, 84, 82,
               78, 87, 83, 85, 81, 79, 86, 89, 82, 84,
               80, 88, 83, 85, 81]

control     = [55, 62, 58, 51, 60, 57, 63, 54, 59, 56,
               53, 61, 57, 55, 64, 52, 58, 60, 56, 59,
               54, 62, 57, 55, 61]

programs = {
    'Vaccination': vaccination,
    'Nutrition': nutrition,
    'Fitness': fitness,
    'Control': control
}

# ---- Part 1: Descriptive Statistics ----
print("=" * 65)
print("MAYA'S INTERVENTION STUDY — COMPLETE ANOVA ANALYSIS")
print("=" * 65)

print(f"\n{'Program':<15} {'n':>5} {'Mean':>8} {'SD':>8} {'Min':>6} {'Max':>6}")
print("-" * 52)
for name, data in programs.items():
    d = np.array(data)
    print(f"{name:<15} {len(d):>5} {d.mean():>8.1f} {d.std(ddof=1):>8.2f} "
          f"{d.min():>6} {d.max():>6}")

all_data = np.concatenate(list(programs.values()))
grand_mean = np.mean(all_data)
print(f"\n{'Grand Mean':<15} {len(all_data):>5} {grand_mean:>8.1f}")

Step 1: Visualize the Data

# Side-by-side box plots
fig, ax = plt.subplots(figsize=(10, 6))
data_to_plot = [vaccination, nutrition, fitness, control]
bp = ax.boxplot(data_to_plot, labels=['Vaccination', 'Nutrition',
                                       'Fitness', 'Control'],
                patch_artist=True)
colors = ['#4C72B0', '#55A868', '#C44E52', '#8172B2']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
ax.set_ylabel('Health Improvement Score (0-100)')
ax.set_title("Maya's Intervention Programs: Health Improvement by Group")
ax.axhline(y=grand_mean, color='gray', linestyle='--', alpha=0.5,
           label=f'Grand Mean = {grand_mean:.1f}')
ax.legend()
plt.tight_layout()
plt.show()

The box plots tell a compelling story before we run a single test. The Fitness group is clearly higher, the Control group clearly lower, and Vaccination and Nutrition fall in between with some overlap.

Step 2: Check Assumptions

# ---- Assumption 1: Independence ----
print("\n" + "=" * 65)
print("ASSUMPTION CHECKS")
print("=" * 65)

print("\n1. Independence:")
print("   Participants were randomly selected from each community.")
print("   No participant appears in more than one program.")
print("   ✓ Independence assumption satisfied by study design.")

# ---- Assumption 2: Normality ----
print("\n2. Normality (Shapiro-Wilk test per group):")
for name, data in programs.items():
    stat, p = stats.shapiro(data)
    status = "✓ Normal" if p > 0.05 else "⚠ Non-normal"
    print(f"   {name:<15} W = {stat:.4f}, p = {p:.4f}  {status}")

# ---- Assumption 3: Equal Variances ----
print("\n3. Equal Variances:")
stat, p_levene = stats.levene(*programs.values())
print(f"   Levene's test: F = {stat:.3f}, p = {p_levene:.4f}")

sds = [np.std(d, ddof=1) for d in programs.values()]
ratio = max(sds) / min(sds)
print(f"   SD ratio (largest/smallest): {ratio:.2f}")
print(f"   Rule of thumb (< 2.0): {'✓ OK' if ratio < 2 else '⚠ Concern'}")
print(f"   Equal group sizes (n=25 each): ✓ Balanced design adds robustness")

All assumptions check out: independence by design, normality within each group (all Shapiro-Wilk p-values > 0.05), and no evidence of unequal variances from Levene's test (a non-significant result doesn't prove the variances are equal, but with a balanced design and an SD ratio well under 2, ANOVA is robust here).
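Had Levene's test flagged unequal variances, the usual fallback is Welch's ANOVA, which drops the equal-variance assumption. SciPy has no built-in Welch ANOVA, so here is a minimal sketch implementing Welch's (1951) F statistic; the synthetic data is illustrative only (three groups with deliberately unequal spread), and the implementation should be checked against a reference before being relied on:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA: robust to unequal group variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                          # precision weights
    grand = np.sum(w * means) / np.sum(w)      # variance-weighted grand mean
    numer = np.sum(w * (means - grand) ** 2) / (k - 1)
    h = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    denom = 1 + 2 * (k - 2) / (k ** 2 - 1) * h
    F = numer / denom
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * h)               # Welch-adjusted denominator df
    p = stats.f.sf(F, df1, df2)
    return F, df1, df2, p

# Synthetic groups with very different spreads (SDs 2, 8, 15)
rng = np.random.default_rng(0)
g1 = rng.normal(50, 2, 30)
g2 = rng.normal(55, 8, 30)
g3 = rng.normal(60, 15, 30)
F, df1, df2, p = welch_anova(g1, g2, g3)
print(f"Welch F({df1}, {df2:.1f}) = {F:.2f}, p = {p:.4f}")
```

Note how the denominator degrees of freedom shrink below N − k; that is the price Welch's method pays for not pooling the variances.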

Step 3: Conduct the ANOVA

# ---- One-Way ANOVA ----
print("\n" + "=" * 65)
print("ONE-WAY ANOVA RESULTS")
print("=" * 65)

F_stat, p_value = stats.f_oneway(*programs.values())

# Manual ANOVA table
k = len(programs)
N = len(all_data)

ss_between = sum(len(d) * (np.mean(d) - grand_mean)**2
                 for d in programs.values())
ss_within = sum(np.sum((np.array(d) - np.mean(d))**2)
                for d in programs.values())
ss_total = np.sum((all_data - grand_mean)**2)

df_between = k - 1
df_within = N - k
df_total = N - 1

ms_between = ss_between / df_between
ms_within = ss_within / df_within

F_manual = ms_between / ms_within
p_manual = stats.f.sf(F_manual, df_between, df_within)  # sf = 1 - cdf, numerically stabler

# Effect size
eta_squared = ss_between / ss_total
omega_squared = (ss_between - df_between * ms_within) / (ss_total + ms_within)

# Print ANOVA table
print(f"\n{'Source':<15} {'SS':>10} {'df':>5} {'MS':>10} {'F':>8} {'p':>10}")
print("-" * 62)
print(f"{'Between':<15} {ss_between:>10.1f} {df_between:>5} {ms_between:>10.2f} "
      f"{F_manual:>8.2f} {p_manual:>10.6f}")
print(f"{'Within':<15} {ss_within:>10.1f} {df_within:>5} {ms_within:>10.2f}")
print(f"{'Total':<15} {ss_total:>10.1f} {df_total:>5}")

print(f"\n  η² = {eta_squared:.3f} (Large effect: {eta_squared*100:.1f}% of "
      f"variability explained)")
print(f"  ω² = {omega_squared:.3f} (Bias-corrected estimate)")
print(f"\n  scipy verification: F = {F_stat:.2f}, p = {p_value:.8f}")

Step 4: Post-Hoc Analysis

# ---- Tukey's HSD ----
print("\n" + "=" * 65)
print("POST-HOC: TUKEY'S HSD (FWER = 0.05)")
print("=" * 65)

data_long = np.concatenate(list(programs.values()))
labels_long = []
for name, data in programs.items():
    labels_long.extend([name] * len(data))

tukey = pairwise_tukeyhsd(endog=data_long, groups=labels_long, alpha=0.05)
print(tukey)

# Summarize significant comparisons
print("\nSignificant pairwise differences (α = 0.05):")
print("-" * 50)
from itertools import combinations
pairs = combinations(tukey.groupsunique, 2)
for (g1, g2), diff, p_adj, rej in zip(pairs, tukey.meandiffs,
                                      tukey.pvalues, tukey.reject):
    if rej:
        print(f"   {g1} vs {g2}: diff = {diff:+.2f}, adj. p = {p_adj:.4f}")
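With equal group sizes, Tukey's HSD reduces to a single critical difference: any two means further apart than $q_{crit}\sqrt{MS_W/n}$ differ significantly. A sketch using SciPy's studentized-range distribution (available in SciPy ≥ 1.7), plugging in the within-group mean square and group size from the ANOVA above:

```python
import numpy as np
from scipy.stats import studentized_range

k, df_within, n_per_group = 4, 96, 25   # from Maya's design
ms_within = 14.35                        # MS_within from the ANOVA table above

# Critical value of the studentized range at 5% family-wise error
q_crit = studentized_range.ppf(0.95, k, df_within)
hsd = q_crit * np.sqrt(ms_within / n_per_group)
print(f"q_crit = {q_crit:.3f}  ->  HSD critical difference = {hsd:.2f} points")
```

Even the closest pair of means (Vaccination vs. Nutrition, about 4.5 points apart) clears this threshold, which is why every comparison in the Tukey table comes out significant.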

Step 5: Interpretation and Policy Implications

# ---- Cost-Effectiveness Analysis ----
print("\n" + "=" * 65)
print("COST-EFFECTIVENESS ANALYSIS")
print("=" * 65)

# Per-participant costs (hypothetical but realistic)
costs = {
    'Vaccination': 150,    # Vaccines + screening events
    'Nutrition': 280,      # Cooking classes + nutritionists
    'Fitness': 420,        # Gym partnerships + instructors
    'Control': 45          # Pamphlets + hotline maintenance
}

means = {name: np.mean(data) for name, data in programs.items()}

print(f"\n{'Program':<15} {'Mean Score':>12} {'Cost/Person':>13} "
      f"{'Score/Dollar':>14}")
print("-" * 58)
for name in programs:
    score_per_dollar = means[name] / costs[name]
    print(f"{name:<15} {means[name]:>12.1f} ${costs[name]:>12} "
          f"{score_per_dollar:>14.3f}")

# Incremental analysis vs. control
print("\n\nIncremental Analysis (vs. Control):")
print(f"{'Program':<15} {'Improvement':>13} {'Added Cost':>12} "
      f"{'Cost per Unit':>14}")
print("-" * 58)
for name in ['Vaccination', 'Nutrition', 'Fitness']:
    improvement = means[name] - means['Control']
    added_cost = costs[name] - costs['Control']
    cost_per_unit = added_cost / improvement
    print(f"{name:<15} {improvement:>+13.1f} ${added_cost:>11} "
          f"${cost_per_unit:>13.2f}")

Maya's Report to the County Board

Maya presents her findings:

Executive Summary

A study of 100 randomly sampled participants across four intervention programs found statistically significant differences in health improvement scores ($F(3, 96) = 202.6$, $p < .001$, $\eta^2 = .86$). The Fitness program produced the highest improvement scores (mean = 83.6), followed by Vaccination (73.8), Nutrition (69.3), and Control (57.6). All pairwise differences were statistically significant by Tukey's HSD.

Key Finding: Intervention program assignment explains 86% of the variability in health improvement scores. This is a very large effect: the programs genuinely differ in their impact on cardiovascular health.
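For reference, the two effect sizes quoted here follow the standard definitions used in the analysis code:

$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}, \qquad \omega^2 = \frac{SS_{\text{between}} - df_{\text{between}} \cdot MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}$$

$\omega^2$ corrects the slight upward bias of $\eta^2$ as an estimate of the population effect, which is why it comes out a little smaller.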

Cost-Effectiveness: While the Fitness program produces the best health outcomes, it costs $420 per participant, nearly three times the Vaccination program ($150). The Vaccination program achieves about 62% of the Fitness program's improvement over Control at 36% of the cost. For every dollar of added cost (vs. Control), Vaccination produces about 0.15 units of improvement, compared to about 0.07 for Fitness.

Recommendation: If the goal is maximum health impact and budget is not a constraint, the Fitness program is the strongest option. If the goal is reaching the most people within a fixed budget, the Vaccination + Screening program offers the best cost-effectiveness. A hybrid approach — Vaccination/Screening for all communities plus targeted Fitness programs for highest-risk individuals — may optimize both reach and impact.

The Ethical Dimension

Maya adds a caveat that the board needs to hear:

Important Limitation: Each program was piloted in a different community. While participants within each community were randomly selected for measurement, the communities themselves were not randomly assigned to programs. Community A may differ from Community D in ways (income, existing health infrastructure, air quality) that affect health improvement scores independently of the intervention.

To truly establish that the Fitness program causes better outcomes — rather than that Community C is simply healthier — we would need to randomize programs across communities or run a crossover study. The current design supports the conclusion that outcomes differed across programs, but the causal mechanism could involve both the intervention and uncontrolled community-level factors.

This is the difference between an observational comparison and a randomized experiment — a theme that has been running through this textbook since Chapter 4. Even with strong ANOVA results, the study design determines what kind of conclusions you can draw.

Discussion Questions

  1. Maya's $\eta^2 = 0.86$ means 86% of variability is "explained" by program membership. But 14% remains unexplained. What factors might contribute to this within-group variability? (Think about individual-level variables — age, baseline health, adherence to the program, etc.)

  2. The county board has a $500,000 budget. Using the cost data above, how many participants could each program serve? How would you allocate the budget if your goal were to maximize total health improvement across the county?

  3. If Maya ran the same study next year and found $\eta^2 = 0.45$ instead of $\eta^2 = 0.86$, would this change the statistical conclusion? Would it change the policy recommendation? Why?

  4. Maya noted that the communities were not randomly assigned to programs. Propose a better study design that would allow stronger causal conclusions. What practical challenges would you face in implementing it?