Case Study 1: Maya's Patient Satisfaction Survey

The Setup

Dr. Maya Chen has been asked to analyze patient satisfaction data for the county health department's annual quality report. The department wants to know whether patient satisfaction differs across three types of care settings: Community Health Centers, Hospital Outpatient Clinics, and Mobile Health Units (vans that provide services in underserved neighborhoods).

This matters. The county board of health is deciding whether to expand the Mobile Health Unit program — the units are expensive to operate, but they reach patients who would otherwise have no access to care. If satisfaction is low, the board might redirect funding to traditional facilities. If satisfaction is comparable or higher, that strengthens the case for expansion.

The data comes from exit surveys where patients rate their overall experience on a 5-point scale:

- 1 = Very Dissatisfied
- 2 = Dissatisfied
- 3 = Neutral
- 4 = Satisfied
- 5 = Very Satisfied

Maya immediately recognizes the analytical challenge: this is ordinal data. The difference between "Neutral" (3) and "Satisfied" (4) isn't necessarily the same as the difference between "Dissatisfied" (2) and "Neutral" (3). Computing a mean satisfaction score treats these as equal intervals — a questionable assumption. And with 15-20 patients per setting, the samples are modest.

"I could run ANOVA," Maya thinks, "and a lot of researchers would. But I'd be computing means of ordinal data and assuming normality for discrete data that can only take five values. The Kruskal-Wallis test is designed for exactly this situation."

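A quick way to see why equal-interval assumptions matter is to relabel the scale with unequally spaced numbers that keep the same order. The sketch below uses small made-up groups (not the study data) and an arbitrary, hypothetical recoding; it illustrates that a rank-based statistic should be unchanged by such a relabeling while a mean-based one generally is not.

```python
from scipy import stats

# Illustrative ratings for three hypothetical groups (NOT the study data)
group_a = [4, 3, 5, 4, 4, 3]
group_b = [3, 2, 3, 4, 2, 3]
group_c = [5, 4, 5, 4, 5, 3]

# Hypothetical order-preserving relabeling: same ordering, unequal spacing
recode = {1: 1, 2: 2, 3: 3, 4: 6, 5: 10}


def relabel(values):
    """Map each rating to its recoded label."""
    return [recode[v] for v in values]


# Kruskal-Wallis uses only ranks, which an increasing relabeling preserves
h_orig, _ = stats.kruskal(group_a, group_b, group_c)
h_new, _ = stats.kruskal(relabel(group_a), relabel(group_b), relabel(group_c))

# ANOVA uses means and variances, which depend on the numeric labels
f_orig, _ = stats.f_oneway(group_a, group_b, group_c)
f_new, _ = stats.f_oneway(relabel(group_a), relabel(group_b), relabel(group_c))

print(f"Kruskal-Wallis H: {h_orig:.4f} -> {h_new:.4f}")
print(f"ANOVA F:          {f_orig:.4f} -> {f_new:.4f}")
```

If the spacing between "Satisfied" and "Very Satisfied" is unknown, a result that shifts with the choice of labels is harder to defend than one that does not.
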
The Data

import numpy as np
from scipy import stats
from itertools import combinations

# ============================================================
# MAYA'S PATIENT SATISFACTION ANALYSIS — COMPLETE STUDY
# ============================================================

# Patient satisfaction ratings (1-5 ordinal scale)
# Community Health Centers
health_centers = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4,
                  5, 4, 3, 4, 4, 5, 3, 4]

# Hospital Outpatient Clinics
outpatient = [3, 4, 2, 3, 4, 3, 2, 3, 4, 3,
              2, 3, 3, 4, 3, 2, 3, 4, 3, 3]

# Mobile Health Units
mobile = [4, 5, 4, 5, 4, 3, 5, 4, 5, 4,
          5, 4, 4, 5, 5]

settings = {
    'Health Center': health_centers,
    'Outpatient': outpatient,
    'Mobile Unit': mobile
}

# ---- Part 1: Descriptive Statistics ----
print("=" * 65)
print("MAYA'S PATIENT SATISFACTION STUDY")
print("=" * 65)

print(f"\n{'Setting':<18} {'n':>4} {'Median':>8} {'Mean':>8} "
      f"{'SD':>8} {'Mode':>6}")
print("-" * 55)
for name, data in settings.items():
    d = np.array(data)
    # When several values tie for most frequent, scipy reports the smallest
    mode_val = stats.mode(d, keepdims=True).mode[0]
    print(f"{name:<18} {len(d):>4} {np.median(d):>8.1f} "
          f"{d.mean():>8.2f} {d.std(ddof=1):>8.2f} "
          f"{mode_val:>6}")

# Distribution of ratings
print(f"\nRating Distribution:")
print(f"{'Setting':<18} {'1':>6} {'2':>6} {'3':>6} "
      f"{'4':>6} {'5':>6}")
print("-" * 50)
for name, data in settings.items():
    d = np.array(data)
    counts = [np.sum(d == v) for v in range(1, 6)]
    print(f"{name:<18} " +
          " ".join(f"{c:>6}" for c in counts))

Why Not ANOVA?

Maya documents her reasoning for using a nonparametric approach:

print("\n" + "=" * 65)
print("WHY NONPARAMETRIC?")
print("=" * 65)

print("""
1. ORDINAL DATA: Satisfaction ratings (1-5) have a meaningful
   order, but the intervals between levels are not necessarily
   equal. "Satisfied" (4) is not twice "Dissatisfied" (2).

2. DISCRETE VALUES: The data can only take 5 integer values.
   A normal distribution is continuous and unbounded; these
   ratings are neither.

3. CEILING/FLOOR EFFECTS: Mobile Unit ratings cluster near 4-5
   (ceiling effect), making the distribution left-skewed.
   Outpatient ratings cluster around 3, creating a different
   shape.

4. MODEST SAMPLE SIZES: With n = 15-20 per group, the CLT
   cannot reliably normalize the sampling distribution of
   means computed from 5-point scale data.

Conclusion: The Kruskal-Wallis test is the appropriate method.
""")

# Demonstrate the normality problem
print("Normality Check (to confirm our reasoning):")
for name, data in settings.items():
    stat, p = stats.shapiro(data)
    print(f"  {name}: Shapiro-Wilk W = {stat:.4f}, p = {p:.4f} "
          f"{'*** Non-normal' if p < 0.05 else ''}")

The Analysis

# ---- Part 2: Kruskal-Wallis Test ----
print("\n" + "=" * 65)
print("KRUSKAL-WALLIS TEST")
print("=" * 65)

H_stat, p_value = stats.kruskal(health_centers, outpatient, mobile)
print(f"\nH = {H_stat:.3f}")
print(f"df = {len(settings) - 1}")
print(f"p-value = {p_value:.6f}")

if p_value < 0.05:
    print("\n*** Significant at α = 0.05 ***")
    print("At least one care setting differs in patient "
          "satisfaction.")
else:
    print("\nNot significant at α = 0.05.")

# ---- Part 3: Post-Hoc Pairwise Comparisons ----
if p_value < 0.05:
    print("\n" + "=" * 65)
    print("POST-HOC: PAIRWISE MANN-WHITNEY U TESTS")
    print("=" * 65)

    pairs = list(combinations(settings.keys(), 2))
    n_comp = len(pairs)
    alpha_bonf = 0.05 / n_comp

    print(f"\nNumber of comparisons: {n_comp}")
    print(f"Bonferroni-corrected α: {alpha_bonf:.4f}")

    print(f"\n{'Comparison':<35} {'U':>8} {'p (raw)':>10} "
          f"{'p (adj)':>10} {'Sig?':>6}")
    print("-" * 72)

    for g1, g2 in pairs:
        stat, p = stats.mannwhitneyu(
            settings[g1], settings[g2],
            alternative='two-sided'
        )
        adj_p = min(p * n_comp, 1.0)
        sig = "***" if adj_p < 0.001 else \
              "**" if adj_p < 0.01 else \
              "*" if adj_p < 0.05 else ""
        # Pad the whole label so the columns line up with the header
        label = f"{g1} vs. {g2}"
        print(f"{label:<35} {stat:>8.1f} "
              f"{p:>10.4f} {adj_p:>10.4f} {sig:>6}")

# ---- Part 4: For Comparison — ANOVA ----
print("\n" + "=" * 65)
print("FOR COMPARISON: ONE-WAY ANOVA (NOT RECOMMENDED)")
print("=" * 65)

F_stat, p_anova = stats.f_oneway(health_centers, outpatient, mobile)
print(f"\nF({len(settings)-1}, "
      f"{sum(len(d) for d in settings.values())-len(settings)}) "
      f"= {F_stat:.3f}")
print(f"p-value = {p_anova:.6f}")
print("\nNote: ANOVA on ordinal 1-5 scale data is technically")
print("inappropriate. We present it only for comparison.")
print(f"\nBoth tests {'agree' if (p_value < 0.05) == (p_anova < 0.05) else 'DISAGREE'} on significance.")
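To see what the Kruskal-Wallis call is doing, the H statistic can be reproduced by hand from pooled ranks. The sketch below assumes the standard textbook formula, H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), divided by the tie correction 1 − Σ(t³ − t) / (N³ − N), and checks the result against `stats.kruskal`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same ratings as in the case study
health_centers = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4, 5, 4, 3, 4, 4, 5, 3, 4]
outpatient = [3, 4, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 3, 4, 3, 2, 3, 4, 3, 3]
mobile = [4, 5, 4, 5, 4, 3, 5, 4, 5, 4, 5, 4, 4, 5, 5]
names = ["Health Center", "Outpatient", "Mobile Unit"]
groups = [health_centers, outpatient, mobile]

# Rank all observations together; tied values share their average rank
pooled = np.concatenate(groups)
n_total = len(pooled)
ranks = stats.rankdata(pooled)

# Split the pooled ranks back into the original groups
split_points = np.cumsum([len(g) for g in groups])[:-1]
group_ranks = np.split(ranks, split_points)

# Uncorrected H from the rank sums
rank_term = sum(r.sum() ** 2 / len(r) for r in group_ranks)
h_raw = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Tie correction: with only a few distinct ratings, ties are heavy
_, tie_counts = np.unique(pooled, return_counts=True)
tie_correction = 1 - np.sum(tie_counts**3 - tie_counts) / (n_total**3 - n_total)
h_manual = h_raw / tie_correction

h_scipy, _ = stats.kruskal(*groups)

for name, r in zip(names, group_ranks):
    print(f"{name:<15} mean rank = {r.mean():.2f}")
print(f"H (manual) = {h_manual:.3f}, H (scipy) = {h_scipy:.3f}")
```

The mean ranks also show the direction of the differences, which the H statistic alone does not.
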

The Interpretation

Maya prepares her analysis for the county board of health:

Statistical Summary:

"Patient satisfaction differs significantly across the three care settings (Kruskal-Wallis $H = 21.77$, $df = 2$, $p < 0.001$).

Post-hoc pairwise comparisons (Mann-Whitney U, Bonferroni-corrected) reveal:

- Mobile Health Units vs. Outpatient Clinics: Mobile units rated significantly higher (median = 4.0 vs. 3.0, adjusted $p < 0.001$).
- Health Centers vs. Outpatient Clinics: Health Centers rated significantly higher (median = 4.0 vs. 3.0, adjusted $p = 0.003$).
- Mobile Health Units vs. Health Centers: No significant difference (both median = 4.0, adjusted $p = 0.22$).

The Kruskal-Wallis test was chosen over ANOVA because satisfaction ratings are ordinal — the numerical labels indicate order but do not have equal intervals between levels."
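For a non-statistical audience, the pairwise results can also be expressed as a probability of superiority: the chance that a randomly chosen patient from one setting gave a higher rating than a randomly chosen patient from another, counting ties as half. This is offered as a supplementary sketch rather than part of Maya's analysis; the helper name `prob_superiority` is hypothetical. The quantity equals the Mann-Whitney U statistic divided by the product of the two sample sizes.

```python
import numpy as np

# Same ratings as in the case study
health_centers = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4, 5, 4, 3, 4, 4, 5, 3, 4]
outpatient = [3, 4, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 3, 4, 3, 2, 3, 4, 3, 3]
mobile = [4, 5, 4, 5, 4, 3, 5, 4, 5, 4, 5, 4, 4, 5, 5]


def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) over all (x, y) pairs."""
    x = np.asarray(x)[:, None]  # column vector
    y = np.asarray(y)[None, :]  # row vector
    return float(np.mean((x > y) + 0.5 * (x == y)))


ps_mobile_out = prob_superiority(mobile, outpatient)
ps_center_out = prob_superiority(health_centers, outpatient)
ps_mobile_center = prob_superiority(mobile, health_centers)

print(f"Mobile Unit vs. Outpatient:      {ps_mobile_out:.3f}")
print(f"Health Center vs. Outpatient:    {ps_center_out:.3f}")
print(f"Mobile Unit vs. Health Center:   {ps_mobile_center:.3f}")
```

A value near 0.5 means the two settings are hard to tell apart; values near 1.0 mean the first setting is rated higher in almost every pairing.
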

The Human Story

Maya pauses before writing her recommendation. The statistics tell a clear story: Mobile Health Units receive satisfaction ratings that are statistically indistinguishable from those of Community Health Centers, and significantly higher than those of Hospital Outpatient Clinics. But Maya knows what these numbers represent.

"Each of those ratings," she tells her colleague, "represents a person who walked into a van in a parking lot (maybe the only healthcare they've received in months) and said 'yes, that was a good experience.' These are patients who might never set foot in a hospital outpatient clinic because of transportation barriers, insurance issues, or distrust of institutional healthcare."

She also notices something the statistics don't capture: the Mobile Health Units have no ratings of 1 or 2. Not a single "Dissatisfied" or "Very Dissatisfied." The Outpatient Clinics have four ratings of 2 and zero ratings of 5.

Theme 2 — Human Stories Behind the Data: Numbers on a 1-5 scale don't capture the grandmother who brought her grandchildren to the mobile unit for their first dental screening, or the patient who rated the outpatient clinic a 2 because he waited three hours. Every data point carries a story. The Kruskal-Wallis test tells us the patterns are real — the stories tell us why they matter.

Maya's Recommendation

print("\n" + "=" * 65)
print("MAYA'S RECOMMENDATION TO THE COUNTY BOARD")
print("=" * 65)

print("""
RECOMMENDATION: Expand the Mobile Health Unit Program

Based on patient satisfaction analysis across three care settings:

1. FINDING: Mobile Health Units achieve the highest patient
   satisfaction ratings (median = 4.0, "Satisfied"), matching
   Community Health Centers and significantly exceeding
   Hospital Outpatient Clinics.

2. CLINICAL SIGNIFICANCE: Not a single Mobile Unit patient
   rated their experience below "Neutral" (3). This suggests
   consistently positive patient experiences.

3. EQUITY CONSIDERATION: Mobile Units serve patients who
   face the greatest barriers to healthcare access. High
   satisfaction among this population suggests the program
   is meeting unmet needs effectively.

4. LIMITATION: This analysis measures satisfaction, not
   clinical outcomes. A patient can be satisfied with their
   experience but still have unmet health needs. Satisfaction
   data should be combined with clinical outcome measures
   for a complete evaluation.

5. STATISTICAL NOTE: The Kruskal-Wallis test was used because
   satisfaction data is ordinal. The analysis is robust to
   the non-normality inherent in 5-point rating scales.
""")

Lessons for Your Own Work

This case study illustrates several principles:

- Match the method to the data. Ordinal data calls for nonparametric methods, even when the "easy" option (ANOVA) would give a similar answer. Methodological rigor builds credibility.
- The nonparametric and parametric tests often agree. Here, both detected a significant difference. That's reassuring but doesn't retroactively justify using the wrong test. Use the right test from the start.
- Descriptive statistics matter more than ever. With ordinal data, the median and mode are more informative than the mean. Frequency distributions (how many 1s, 2s, 3s, etc.) reveal patterns that summary statistics miss.
- Context determines the recommendation. The same statistical result ("Mobile Units match Health Centers") could lead to different recommendations depending on costs, equity goals, and clinical outcomes. Statistics informs the decision; it doesn't make it.
- Document your reasoning. Maya explicitly stated why she chose the Kruskal-Wallis test. This transparency is essential for reproducibility and for building trust with stakeholders who may not be statisticians.