Case Study 2: Bayes at the Hospital — Combining Model Predictions with Clinical Judgment
The Situation
Metro General Hospital has deployed a machine learning model to predict 30-day readmissions. The model outputs a probability for each discharged patient — a single number between 0 and 1 that estimates the likelihood the patient will return to the hospital within a month.
The clinical team has a problem. The model is accurate in aggregate — its AUC is 0.78, solid for healthcare — but individual predictions feel disconnected from clinical reality. A patient with a model score of 0.25 and a devastating social situation (homeless, no medication access, no caregiver) does not feel like a 25% risk to the attending physician. Meanwhile, a patient with a model score of 0.35 who has a supportive family, a primary care follow-up scheduled, and a simple medication regimen does not feel like a 35% risk.
Dr. Sarah Okafor, the chief quality officer, wants a framework that combines the model's output with clinical information that the model does not have access to. She does not want to replace the model. She wants to update its predictions with bedside judgment — formally and transparently.
The answer is Bayes' theorem.
The Framework
The idea is straightforward. The model's prediction becomes the prior — the best estimate before the clinician adds their expertise. Clinical observations become the evidence that updates the prior into a posterior — the final, combined estimate.
For this to work, the clinical team needs to quantify their observations in terms of likelihoods: "Among patients who are readmitted, how often do we see this factor? Among patients who are not readmitted, how often?"
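Before walking through the tables, it helps to see a single update in code. A minimal sketch, with invented likelihood values rather than Metro General's:

```python
# One Bayesian update, written out in full.
# The likelihood values below are invented for illustration.
prior = 0.20                      # P(readmit) before seeing the clinical factor
p_factor_given_readmit = 0.50     # among readmitted patients, how often the factor appears
p_factor_given_no_readmit = 0.10  # among non-readmitted patients

# Law of total probability: P(factor)
p_factor = (p_factor_given_readmit * prior +
            p_factor_given_no_readmit * (1 - prior))

# Bayes' theorem: P(readmit | factor)
posterior = p_factor_given_readmit * prior / p_factor
print(f"{posterior:.3f}")  # 0.556
```

The factor is five times more common among readmitted patients than among non-readmitted ones, so observing it pulls a 20% prior up to about 56%.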
Metro General's data team analyzed three years of readmission data and produced likelihood tables for several clinical factors the model does not capture well.
The Likelihood Tables
Factor 1: Social Support
| Social Support Level | P(factor | readmitted) | P(factor | not readmitted) |
|---|---|---|
| Strong (caregiver at home, transportation, stable housing) | 0.20 | 0.55 |
| Moderate (some support, some gaps) | 0.35 | 0.30 |
| Weak (isolated, unstable housing, no caregiver) | 0.45 | 0.15 |
Readmitted patients are much more likely to have weak social support (45% vs. 15%). Patients with strong support are readmitted far less often (20% vs. 55%).
Factor 2: Medication Complexity
| Medication Complexity | P(factor | readmitted) | P(factor | not readmitted) |
|---|---|---|
| Simple (1-3 medications, no changes at discharge) | 0.15 | 0.40 |
| Moderate (4-7 medications or 1-2 changes) | 0.40 | 0.40 |
| Complex (8+ medications or 3+ changes) | 0.45 | 0.20 |
Complex medication regimens strongly predict readmission. Simple regimens are protective.
Factor 3: Patient Understanding at Discharge
| Understanding Level | P(factor | readmitted) | P(factor | not readmitted) |
|---|---|---|
| Clear (can repeat back care plan, asks good questions) | 0.15 | 0.50 |
| Partial (nods along, some confusion on medications) | 0.45 | 0.35 |
| Poor (confused, cannot repeat instructions, language barrier) | 0.40 | 0.15 |
Patient Case: Margaret Chen, Age 72
Margaret is being discharged after a 4-day stay for heart failure exacerbation. Here is what the model and the clinician each know.
What the model knows (from structured EHR data):
- Age: 72
- Diagnosis: heart failure (HF), Stage C
- Length of stay: 4 days
- Prior admissions in 12 months: 1
- Comorbidities: diabetes, hypertension
- Lab values at discharge: BNP elevated but improving, HbA1c 7.8
Model prediction: P(readmit) = 0.28
What the clinician observes (not in the model):
- Margaret lives alone. Her daughter visits twice a week but works full-time and lives 40 minutes away. Social support: Weak.
- Margaret is being discharged on 9 medications with 3 changes from her admission regimen. Medication complexity: Complex.
- During discharge education, Margaret repeated her medication schedule correctly and asked about sodium restriction. Patient understanding: Clear.
The Bayesian Update
We will apply Bayes' theorem sequentially — each clinical factor updates the probability, and the posterior from one update becomes the prior for the next.
Update 1: Social Support (Weak)
The model's prediction of 0.28 is our prior.
import numpy as np
# Prior: model prediction
prior = 0.28
# Likelihoods for weak social support
p_weak_given_readmit = 0.45
p_weak_given_no_readmit = 0.15
# Total probability of observing weak social support
p_weak = (p_weak_given_readmit * prior +
          p_weak_given_no_readmit * (1 - prior))
# Posterior
posterior_1 = (p_weak_given_readmit * prior) / p_weak
print(f"Prior (model score): {prior:.1%}")
print(f"P(weak support | readmit): {p_weak_given_readmit:.0%}")
print(f"P(weak support | no readmit): {p_weak_given_no_readmit:.0%}")
print(f"P(weak support): {p_weak:.4f}")
print(f"Posterior after social support: {posterior_1:.1%}")
Prior (model score): 28.0%
P(weak support | readmit): 45%
P(weak support | no readmit): 15%
P(weak support): 0.2340
Posterior after social support: 53.8%
The weak social support nearly doubled the estimated readmission probability — from 28% to 53.8%. This makes clinical sense: living alone at 72 with heart failure is exactly the kind of risk that the model's structured features may not capture fully.
Update 2: Medication Complexity (Complex)
The posterior from Update 1 (53.8%) becomes the new prior.
prior_2 = posterior_1 # 0.538
p_complex_given_readmit = 0.45
p_complex_given_no_readmit = 0.20
p_complex = (p_complex_given_readmit * prior_2 +
             p_complex_given_no_readmit * (1 - prior_2))
posterior_2 = (p_complex_given_readmit * prior_2) / p_complex
print(f"Prior (after social support): {prior_2:.1%}")
print(f"Posterior after medication complexity: {posterior_2:.1%}")
Prior (after social support): 53.8%
Posterior after medication complexity: 72.4%
Medication complexity pushed the estimate to 72.4%. Margaret is now in the high-risk category by any reasonable threshold.
Update 3: Patient Understanding (Clear)
But Margaret demonstrated clear understanding of her care plan. This is a protective factor.
prior_3 = posterior_2 # 0.724
p_clear_given_readmit = 0.15
p_clear_given_no_readmit = 0.50
p_clear = (p_clear_given_readmit * prior_3 +
           p_clear_given_no_readmit * (1 - prior_3))
posterior_3 = (p_clear_given_readmit * prior_3) / p_clear
print(f"Prior (after medication complexity): {prior_3:.1%}")
print(f"Posterior after patient understanding: {posterior_3:.1%}")
Prior (after medication complexity): 72.4%
Posterior after patient understanding: 44.1%
Clear understanding pulled the risk back down substantially — from 72.4% to 44.1%. Margaret can articulate her care plan, which partially offsets the risks from living alone and managing complex medications.
Summary of the Bayesian Chain
updates = [
    ("Model prediction (prior)", 0.28),
    ("+ Weak social support", posterior_1),
    ("+ Complex medications", posterior_2),
    ("+ Clear understanding", posterior_3),
]
print("=" * 55)
print(f"{'Stage':<40} {'P(readmit)':>12}")
print("=" * 55)
for label, prob in updates:
    bar = '#' * int(prob * 40)
    print(f"{label:<40} {prob:>10.1%} {bar}")
print("=" * 55)
=======================================================
Stage                                      P(readmit)
=======================================================
Model prediction (prior)                      28.0% ###########
+ Weak social support                         53.8% #####################
+ Complex medications                         72.4% ############################
+ Clear understanding                         44.1% #################
=======================================================
The model alone said 28%. After incorporating clinical observations, the estimate is 44.1%. That is a meaningfully different risk level — one that would likely trigger a different intervention (perhaps a home health visit or a pharmacy consultation) than the raw model score.
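As a sanity check on the chain (a sketch, not part of the hospital's tool), the same number falls out of the odds form of Bayes' theorem: each factor contributes a likelihood ratio, P(factor | readmit) / P(factor | no readmit), and the ratios simply multiply.

```python
def to_odds(p):
    return p / (1 - p)

def to_prob(odds):
    return odds / (1 + odds)

prior = 0.28
# Likelihood ratios from the tables above
lr_weak_support = 0.45 / 0.15         # 3.0, risk factor
lr_complex_meds = 0.45 / 0.20         # 2.25, risk factor
lr_clear_understanding = 0.15 / 0.50  # 0.3, protective factor

posterior_odds = (to_odds(prior) * lr_weak_support *
                  lr_complex_meds * lr_clear_understanding)
print(f"{to_prob(posterior_odds):.1%}")  # 44.1%, matching the sequential updates
```

The odds form also makes the tug-of-war visible: the two risk factors multiply the odds by 6.75, and clear understanding divides them by roughly 3.3.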
A Second Patient: James Walker, Age 58
James is discharged after knee replacement surgery. Routine procedure, no complications.
Model prediction: P(readmit) = 0.18
Clinical observations:
- Strong social support (wife is a retired nurse, lives at home)
- Simple medications (3 medications, no changes)
- Clear understanding (demonstrated exercises and medication schedule)
def bayesian_update(prior, p_evidence_given_positive, p_evidence_given_negative):
    """Single Bayesian update."""
    p_evidence = (p_evidence_given_positive * prior +
                  p_evidence_given_negative * (1 - prior))
    posterior = (p_evidence_given_positive * prior) / p_evidence
    return posterior
# Start with model prediction
p = 0.18
# Update 1: Strong social support (likelihoods for STRONG support, not weak)
p = bayesian_update(p, 0.20, 0.55)
print(f"After strong support: {p:.1%}")
# Update 2: Simple medications
p = bayesian_update(p, 0.15, 0.40)
print(f"After simple medications: {p:.1%}")
# Update 3: Clear understanding
p = bayesian_update(p, 0.15, 0.50)
print(f"After clear understanding: {p:.1%}")
After strong support: 7.4%
After simple medications: 2.9%
After clear understanding: 0.9%
James's risk drops from 18% to under 1%. Every clinical factor is protective. The model flagged him at moderate risk, but clinical judgment — formalized through Bayes — says he is extremely low risk. This patient does not need a follow-up call from the discharge team. Margaret does.
The Automation Question
Dr. Okafor asks: "Can we build this into the discharge workflow?"
The answer is yes, with careful design. The data team builds a discharge assessment tool where nurses record three observations (social support level, medication complexity, patient understanding) using structured dropdowns — not free text. Each selection maps to a row in the likelihood table. The tool automatically applies the Bayesian updates to the model's prediction and displays the final posterior.
def discharge_risk_assessment(model_score, social_support, med_complexity,
                              understanding):
    """
    Combine ML model prediction with clinical assessment
    using sequential Bayesian updates.

    Parameters
    ----------
    model_score : float
        ML model's predicted P(readmission)
    social_support : str
        'strong', 'moderate', or 'weak'
    med_complexity : str
        'simple', 'moderate', or 'complex'
    understanding : str
        'clear', 'partial', or 'poor'

    Returns
    -------
    float
        Updated P(readmission) after clinical factors
    """
    likelihood_tables = {
        'social_support': {
            'strong': (0.20, 0.55),
            'moderate': (0.35, 0.30),
            'weak': (0.45, 0.15),
        },
        'med_complexity': {
            'simple': (0.15, 0.40),
            'moderate': (0.40, 0.40),
            'complex': (0.45, 0.20),
        },
        'understanding': {
            'clear': (0.15, 0.50),
            'partial': (0.45, 0.35),
            'poor': (0.40, 0.15),
        },
    }
    p = model_score
    for factor_name, level in [('social_support', social_support),
                               ('med_complexity', med_complexity),
                               ('understanding', understanding)]:
        p_given_pos, p_given_neg = likelihood_tables[factor_name][level]
        p_evidence = p_given_pos * p + p_given_neg * (1 - p)
        p = (p_given_pos * p) / p_evidence
    return p
# Margaret Chen
margaret_risk = discharge_risk_assessment(
    model_score=0.28,
    social_support='weak',
    med_complexity='complex',
    understanding='clear'
)
print(f"Margaret Chen — Final risk: {margaret_risk:.1%}")
# James Walker
james_risk = discharge_risk_assessment(
    model_score=0.18,
    social_support='strong',
    med_complexity='simple',
    understanding='clear'
)
print(f"James Walker — Final risk: {james_risk:.1%}")
Margaret Chen — Final risk: 44.1%
James Walker — Final risk: 0.9%
Implementation Challenges
The clinical team raises several valid concerns during rollout.
Concern 1: Where do the likelihood tables come from?
The tables must be estimated from Metro General's own data. National averages will not suffice — patient populations, documentation practices, and social determinants vary enormously by hospital. The data team computed the likelihoods from 3 years of discharge records (n=42,000) where nurses had documented these factors.
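A sketch of how a table like this might be estimated, using toy records and hypothetical column names (the real version would run over the ~42,000 documented discharges):

```python
import pandas as pd

# Toy discharge records; the real data would come from the EHR.
df = pd.DataFrame({
    'social_support': ['weak', 'strong', 'weak', 'moderate',
                       'strong', 'strong', 'weak', 'moderate'],
    'readmitted':     [True, False, True, False,
                       False, False, False, True],
})

# Column-normalized frequencies give P(level | readmitted)
# and P(level | not readmitted) directly.
table = pd.crosstab(df['social_support'], df['readmitted'],
                    normalize='columns')
print(table)
```

Each column of the resulting table sums to 1, so the True column is the "P(factor | readmitted)" column of the likelihood tables and the False column is "P(factor | not readmitted)".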
Concern 2: Are the factors independent?
Bayes' theorem as applied here assumes conditional independence — that knowing a patient's social support level does not change the relationship between medication complexity and readmission (given the readmission status). This is not perfectly true. Patients with weak social support are more likely to have complex medications (chronic conditions, poor outpatient management). The sequential update overstates the combined risk slightly.
The mitigation: the data team periodically checks calibration. If patients with a posterior of 40% are actually readmitted 35% of the time, they adjust the likelihood tables or add interaction terms.
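A calibration check along these lines might look like the following sketch. The cohort here is synthetic and calibrated by construction; the real check would compare Metro General's posteriors against observed outcomes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic cohort: each patient's posterior and observed outcome.
# Outcomes are drawn from the posteriors, so this cohort is
# perfectly calibrated by construction.
posterior = rng.uniform(0.05, 0.60, size=5000)
readmitted = rng.random(5000) < posterior

# Bin patients by posterior and compare predicted vs observed rates.
bins = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (posterior >= lo) & (posterior < hi)
    if mask.any():
        print(f"[{lo:.1f}, {hi:.1f}): predicted {posterior[mask].mean():.1%}, "
              f"observed {readmitted[mask].mean():.1%}, n={mask.sum()}")
```

A large gap in any bin (say, predicted 40% but observed 35%) is the signal to revisit the likelihood tables for the factors over-represented in that bin.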
Concern 3: Can clinicians game the system?
If a nurse believes a patient needs more support, they might rate social support as "weak" to trigger a higher risk score and justify the intervention. This is technically gaming, but Dr. Okafor's position is pragmatic: "If the nurse thinks the patient needs help, the patient probably needs help. The Bayesian tool just formalizes the conversation."
Concern 4: Does the order of updates matter?
Mathematically, if the factors are conditionally independent, the order does not matter. The final posterior is the same regardless of whether you update social support or medication complexity first. This is reassuring — it means the tool produces consistent results no matter how the discharge form is organized.
# Verify order independence for Margaret Chen
p1 = discharge_risk_assessment(0.28, 'weak', 'complex', 'clear')
# Manual reorder: understanding first, then social, then meds
p = 0.28
p = bayesian_update(p, 0.15, 0.50) # understanding: clear
p = bayesian_update(p, 0.45, 0.15) # social: weak
p = bayesian_update(p, 0.45, 0.20) # meds: complex
p2 = p
print(f"Original order: {p1:.6f}")
print(f"Reversed order: {p2:.6f}")
print(f"Difference: {abs(p1 - p2):.10f}")
Original order: 0.440559
Reversed order: 0.440559
Difference: 0.0000000000
A Third Patient: Testing the Edge Cases
To stress-test the tool, the data team runs three edge-case scenarios.
Edge Case 1: All factors align with the model. Patient with model score 0.30, moderate support, moderate medications, partial understanding.
p = discharge_risk_assessment(0.30, 'moderate', 'moderate', 'partial')
print(f"All moderate/partial: {p:.1%}")
All moderate/partial: 39.1%
Even "moderate" factors push the risk up (from 30% to 39.1%) because the moderate likelihoods lean slightly toward the readmission side for social support and patient understanding; moderate medication complexity is neutral (0.40 on both sides).
Edge Case 2: Clinical factors contradict the model. Patient with a high model score (0.45) but all protective clinical factors.
p = discharge_risk_assessment(0.45, 'strong', 'simple', 'clear')
print(f"High model score, all protective: {p:.1%}")
High model score, all protective: 3.2%
The clinical factors overwhelm the model. This might seem concerning — should three checkboxes override a machine learning model? But consider what the model might be missing. Perhaps the model's high score is driven by the patient's age and diagnosis history. The model does not know that the patient's daughter is a nurse who will be at home 24/7 for the next two weeks, that the medication regimen is just two pills once daily, and that the patient can explain every detail of their care plan. The clinical factors capture genuinely relevant information.
Edge Case 3: Near-zero prior. Patient with model score 0.03 (very low risk) and all risk factors.
p = discharge_risk_assessment(0.03, 'weak', 'complex', 'poor')
print(f"Very low model score, all risk factors: {p:.1%}")
Very low model score, all risk factors: 35.8%
Even starting from 3%, the accumulation of risk factors pushes the posterior to 35.8%. Bayes' theorem respects the prior — it does not ignore the model entirely — but strong evidence can still move the needle dramatically. This is the mathematically principled behavior: extraordinary evidence overrides ordinary priors.
Sensitivity Analysis: Which Factor Matters Most?
The clinical team wants to know which of the three factors has the biggest impact. The data team runs a sensitivity analysis, starting from the base readmission rate of 15%.
factors = [
    ('Social: strong', (0.20, 0.55)),
    ('Social: weak', (0.45, 0.15)),
    ('Meds: simple', (0.15, 0.40)),
    ('Meds: complex', (0.45, 0.20)),
    ('Understand: clear', (0.15, 0.50)),
    ('Understand: poor', (0.40, 0.15)),
]
base = 0.15
print(f"{'Factor':<25} {'Posterior':>10} {'Change':>10}")
print("-" * 47)
for name, (p_pos, p_neg) in factors:
    post = bayesian_update(base, p_pos, p_neg)
    change = post - base
    print(f"{name:<25} {post:>9.1%} {change:>+9.1%}")
Factor                     Posterior     Change
-----------------------------------------------
Social: strong                  6.0%      -9.0%
Social: weak                   34.6%     +19.6%
Meds: simple                    6.2%      -8.8%
Meds: complex                  28.4%     +13.4%
Understand: clear               5.0%     -10.0%
Understand: poor               32.0%     +17.0%
Weak social support has the largest single-factor impact (+19.6 percentage points), followed by poor patient understanding (+17.0 points). On the protective side, clear understanding has the strongest effect (-10.0 points). This gives the clinical team actionable insight: patient understanding is the factor most within the hospital's control, and moving a patient from poor to clear understanding swings the estimate by 27 percentage points — invest discharge education resources where understanding is poorest.
Results After Six Months
Metro General tracked outcomes for 6 months after deploying the Bayesian discharge tool.
- Model alone (before tool): AUC 0.78, flagged 22% of patients as "high risk" (model score > 25%), captured 61% of actual readmissions.
- Model + Bayesian clinical update: AUC 0.84, flagged 18% of patients, captured 74% of actual readmissions.
The combined system is both more precise (fewer false alarms) and more sensitive (catches more actual readmissions). The clinical factors add information the model genuinely lacks. And because the updates are formalized through Bayes' theorem — not through ad hoc "clinical override" — they are auditable, reproducible, and can be validated against outcomes.
The financial impact was meaningful. Readmissions cost Metro General an average of $14,200 per event (much of it unreimbursed under CMS penalties). The 13-percentage-point improvement in sensitivity — capturing 74% vs. 61% of readmissions — translated to approximately 89 additional patients receiving preventive interventions per quarter. At an intervention cost of roughly $400 per patient (home health visit, pharmacy consultation, follow-up call), the program generated an estimated net savings of $310,000 in the first six months.
The Broader Lesson
This case study illustrates a pattern that extends far beyond healthcare. In any domain where a machine learning model captures some signals but misses others — and this is every domain — Bayes' theorem provides a principled framework for human-AI collaboration.
The model is not replaced. It is not overridden by gut feeling. It is updated with structured expert judgment, using mathematics that dates to 1763, when Reverend Thomas Bayes' paper was posthumously published. The result is better than either the model or the clinician alone.
In the StreamFlow context, the same approach could combine a churn model's prediction with information from a customer success manager: "This customer is migrating to a competitor's platform" (strong evidence of churn) or "This customer just expanded to 50 more seats" (strong evidence of retention). The math is identical. Only the likelihood tables change.
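A hypothetical sketch of that churn workflow; the signal names and likelihood values below are invented for illustration:

```python
# Hypothetical churn signals: (P(signal | churn), P(signal | retain))
churn_likelihoods = {
    'migrating_to_competitor': (0.60, 0.05),
    'expanded_seats': (0.05, 0.30),
}

def churn_update(prior, signal):
    """One Bayesian update of a churn probability given an observed signal."""
    p_pos, p_neg = churn_likelihoods[signal]
    p_evidence = p_pos * prior + p_neg * (1 - prior)
    return p_pos * prior / p_evidence

p = 0.20  # churn model's prediction
print(f"{churn_update(p, 'migrating_to_competitor'):.0%}")  # 75%
print(f"{churn_update(p, 'expanded_seats'):.0%}")           # 4%
```

Same prior, same update rule; only the likelihood table is domain-specific.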
Discussion Questions
1. The likelihood tables are static — estimated once from historical data. How often should they be re-estimated? What signals would indicate the tables need updating?
2. The conditional independence assumption is violated in practice. What would a more sophisticated approach look like? (Hint: think about what a Bayesian network could capture that sequential updates cannot.)
3. A surgeon argues: "I have 30 years of experience. I do not need a formula to tell me which patients will bounce back." How would you respond? What does the Bayesian framework offer that pure clinical intuition does not?
4. The system uses three clinical factors. How would you decide which additional factors to include? What are the tradeoffs of adding more factors to the Bayesian chain?
5. Metro General's base readmission rate is 15%. A rural community hospital's rate is 8%. Can Metro General's likelihood tables be used at the rural hospital? What adjustments would be needed?