Case Study 2: Bayes in the Courtroom — The Prosecutor's Fallacy in Criminal Trials
The Scenario
In 1999, a British solicitor named Sally Clark stood trial for the murder of her two infant sons. Both boys had died suddenly in infancy — one in 1996, the other in 1998. The prosecution's case relied heavily on expert statistical testimony.
The prosecution's medical expert, pediatrician Sir Roy Meadow, testified that the probability of two children in the same family dying of Sudden Infant Death Syndrome (SIDS) was approximately 1 in 73 million. He arrived at this figure by squaring the probability of a single SIDS death (approximately 1 in 8,543 for a family with the Clarks' socioeconomic profile).
$$P(\text{two SIDS deaths}) = \frac{1}{8{,}543} \times \frac{1}{8{,}543} \approx \frac{1}{73{,}000{,}000}$$
The jury heard "1 in 73 million" and convicted. Sally Clark went to prison. She spent more than three years behind bars before her conviction was overturned on appeal.
The statistical reasoning used to convict her contained at least three fatal errors — errors that every student of probability should understand.
Error 1: The Independence Assumption
Meadow multiplied the two probabilities as if they were independent — as if SIDS deaths in the same family were like two independent coin flips. But they're not.
As you learned in Chapter 8, the multiplication rule $P(A \text{ and } B) = P(A) \times P(B)$ applies only to independent events. SIDS deaths in the same family are not independent. They share:
- Genetic factors that may predispose to SIDS
- Environmental factors (sleeping conditions, household exposure)
- Socioeconomic factors (correlated with risk)
- Parental health behaviors (smoking, prenatal care)
Studies published after the trial showed that once one SIDS death has occurred in a family, the probability of a second is substantially higher — perhaps 5 to 10 times the baseline rate, not the baseline rate again. The general multiplication rule should have been used:
$$P(\text{second SIDS} \mid \text{first SIDS}) \gg P(\text{second SIDS})$$
$$P(\text{two SIDS}) = P(\text{first SIDS}) \times P(\text{second SIDS} \mid \text{first SIDS})$$
This alone could change the probability from 1 in 73 million to something like 1 in 1 million — still rare, but 73 times more likely than the figure presented to the jury.
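The effect of dropping the independence assumption can be sketched in a few lines. The dependence multiplier below is an illustrative assumption drawn from the 5-to-10-times range quoted above, not a figure from the trial record:

```python
# Effect of the independence assumption on P(two SIDS deaths).
# dependence_factor is an illustrative assumption (the text suggests 5-10x),
# not a value from the trial record.
p_first = 1 / 8_543                       # single SIDS death, Clarks' profile
dependence_factor = 10                    # assumed: second-death risk ~10x baseline
p_second_given_first = dependence_factor * p_first

p_two_sids_independent = p_first * p_first
p_two_sids_dependent = p_first * p_second_given_first

print(f"Assuming independence: 1 in {1/p_two_sids_independent:,.0f}")
print(f"With dependence:       1 in {1/p_two_sids_dependent:,.0f}")
```

Even this single correction shrinks the headline number by an order of magnitude; the published corrected estimates go further because the baseline rate itself was also contested.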
Error 2: The Prosecutor's Fallacy
Even if the 1 in 73 million figure were correct, it answers the wrong question. This is the prosecutor's fallacy in its purest form.
The prosecution argued, in effect:
"The probability of this evidence (two SIDS deaths) given innocence is 1 in 73 million. Therefore, the probability of innocence given this evidence is 1 in 73 million."
In notation:
$$P(\text{two SIDS deaths} \mid \text{innocent}) = \frac{1}{73{,}000{,}000}$$
$$\therefore P(\text{innocent} \mid \text{two SIDS deaths}) = \frac{1}{73{,}000{,}000} \quad \text{← THIS IS WRONG}$$
This is exactly the $P(A \mid B) \neq P(B \mid A)$ confusion from Section 9.3. To find the actual probability of innocence given the evidence, you need Bayes' theorem — and you need the prior probability of murder.
Error 3: Ignoring the Alternative Hypothesis
The prosecution presented the probability of two natural deaths (SIDS) but never compared it to the probability of two murders. Bayes' theorem requires both.
Let's work through what the calculation should have looked like:
Setting up Bayes' theorem:
Let's define:

- $I$ = innocent (both deaths were SIDS)
- $G$ = guilty (both deaths were homicide)
- $E$ = the evidence (two infant deaths in the same family)
$$P(I \mid E) = \frac{P(E \mid I) \times P(I)}{P(E)}$$
We need:
- $P(E \mid I)$: the probability of two SIDS deaths in one family. Even accepting Meadow's flawed independence assumption, let's use a corrected estimate. Research suggests something like 1 in 1 million to 1 in 2 million when family correlation is accounted for. Let's use $P(E \mid I) \approx \frac{1}{1{,}000{,}000}$.
- $P(E \mid G)$: the probability that a parent who murders two infants would have the observable evidence pattern. This is harder to estimate, but it's not 1. There would need to be opportunity, motive, and method consistent with the evidence. Let's assume $P(E \mid G) \approx \frac{1}{50}$ (acknowledging that double infanticide is itself extremely rare and requires specific circumstances to look like natural death).
- $P(I)$ and $P(G)$: the prior probabilities. How likely is each explanation before looking at the evidence? Double infanticide is extremely rare. The base rate of a parent murdering two children is far lower than 1 in 73 million. Estimates suggest a prior probability of double infanticide of perhaps 1 in 10 million to 1 in 50 million births.
Let's calculate with these estimates:
Using Bayes:
$$P(G \mid E) = \frac{P(E \mid G) \times P(G)}{P(E \mid G) \times P(G) + P(E \mid I) \times P(I)}$$
With estimated values:

- $P(G) = \frac{1}{50{,}000{,}000}$ (prior probability of double infanticide)
- $P(I) = 1 - P(G) \approx 1$
- $P(E \mid G) = \frac{1}{50}$
- $P(E \mid I) = \frac{1}{1{,}000{,}000}$
$$P(G \mid E) = \frac{\frac{1}{50} \times \frac{1}{50{,}000{,}000}}{\frac{1}{50} \times \frac{1}{50{,}000{,}000} + \frac{1}{1{,}000{,}000} \times 1}$$
$$= \frac{\frac{1}{2{,}500{,}000{,}000}}{\frac{1}{2{,}500{,}000{,}000} + \frac{1}{1{,}000{,}000}}$$
$$= \frac{0.0000000004}{0.0000000004 + 0.000001}$$
$$= \frac{0.0000000004}{0.0000010004}$$
$$\approx 0.0004$$
Under these (rough) estimates, the probability of guilt given the evidence is about 0.04% — not the 99.9999986% implied by the prosecution.
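The hand calculation above can be verified directly, using the same estimated inputs:

```python
# Bayes' theorem with the worked example's (rough) estimates.
p_e_given_g = 1 / 50             # P(evidence | guilty)
p_e_given_i = 1 / 1_000_000      # P(evidence | innocent), corrected for dependence
p_g = 1 / 50_000_000             # prior probability of double infanticide
p_i = 1 - p_g

posterior_guilt = (p_e_given_g * p_g) / (p_e_given_g * p_g + p_e_given_i * p_i)
print(f"P(guilty | evidence) = {posterior_guilt:.6f}")   # ≈ 0.0004
```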
Now, these numbers are estimates, and reasonable people can disagree about the inputs. But the point is devastating: the correct framework reverses the conclusion. The Bayesian analysis suggests that two SIDS deaths, while very rare, are actually more likely than double infanticide, because double infanticide is even rarer.
As the Royal Statistical Society stated in a letter to the Lord Chancellor after the case:
"The jury needs to weigh up two competing explanations for the deaths of the children: SIDS or murder. Two deaths by SIDS or two murders are each quite unlikely, but one of these explanations must be true. The frequency of SIDS is quite low, but so is the frequency of infanticide."
The Aftermath
Sally Clark's conviction was overturned in 2003 after it emerged that key pathological evidence had been withheld by the prosecution. She was released from prison, but never recovered from the ordeal. She developed severe psychiatric illness and died in 2007 from acute alcohol poisoning, at the age of 42.
The Royal Statistical Society took the unprecedented step of issuing a public statement criticizing the misuse of statistics in the trial. Professor Sir Roy Meadow was struck off the medical register (though later reinstated on appeal). The case led to a review of hundreds of other convictions where similar statistical testimony had been used.
The Pattern: Where the Prosecutor's Fallacy Appears
The Sally Clark case is extreme, but the prosecutor's fallacy appears in subtler forms throughout the justice system:
DNA Evidence
A prosecutor argues: "The probability of a random DNA match is 1 in 10 billion. Therefore, the defendant is guilty beyond reasonable doubt."
The fallacy: In a database search of 10 million profiles, the probability of at least one coincidental match is approximately:
$$P(\text{at least one match}) = 1 - \left(\frac{9{,}999{,}999{,}999}{10{,}000{,}000{,}000}\right)^{10{,}000{,}000} \approx 0.001$$
That's about 1 in 1,000 — much higher than 1 in 10 billion. The database search changes the problem. (This is related to the birthday problem from Chapter 8.)
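A quick sanity check of this calculation, using the same figures (a 1-in-10-billion random match probability searched against 10 million profiles):

```python
# Probability of at least one coincidental hit in a database trawl,
# plus the expected number of coincidental matches.
p_match = 1 / 10_000_000_000     # random match probability
n_profiles = 10_000_000          # database size

p_at_least_one = 1 - (1 - p_match) ** n_profiles
expected_matches = n_profiles * p_match

print(f"P(at least one coincidental match) = {p_at_least_one:.6f}")
print(f"Expected coincidental matches      = {expected_matches}")
```

For small per-trial probabilities, $1 - (1-p)^n \approx np$, which is why the answer lands near 1 in 1,000.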
Eyewitness Identification
A prosecutor argues: "The witness picked the defendant out of a lineup. The chance of a mistaken identification is only 5%."
The fallacy: The 5% is $P(\text{identified} \mid \text{innocent})$, not $P(\text{innocent} \mid \text{identified})$. If there are 100 equally plausible suspects in a city, the prior probability that this particular defendant is guilty might be only 1%. Bayesian updating with a 95% accurate identification would give:
$$P(\text{guilty} \mid \text{identified}) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} = \frac{0.0095}{0.059} \approx 0.161$$
A 16% probability of guilt — far short of "beyond reasonable doubt."
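The same update, written out in code with the figures from this example:

```python
# Eyewitness identification: Bayesian update with a 1% prior
# (one of 100 equally plausible suspects) and a 95%-accurate witness.
prior = 0.01                     # P(guilty) before the identification
p_id_given_guilty = 0.95         # witness correctly identifies the guilty
p_id_given_innocent = 0.05       # witness mistakenly identifies an innocent

posterior = (p_id_given_guilty * prior) / (
    p_id_given_guilty * prior + p_id_given_innocent * (1 - prior))
print(f"P(guilty | identified) = {posterior:.3f}")   # ≈ 0.161
```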
Professor Washington's Connection
Professor Washington's research on predictive policing algorithms reveals the same pattern. When an algorithm labels someone "high risk," the relevant question is not "how accurate is the algorithm?" but "given this label, what's the actual probability of re-offense?"
As we saw in Section 9.11, even a reasonably accurate algorithm (75% sensitivity) applied to a population with a 20% base rate of re-offense produces a PPV of only 46%. More than half of "high risk" labels are applied to people who will not re-offend.
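That PPV figure can be reproduced with Bayes' theorem. The specificity below is not stated in this passage; it is an assumed value chosen to be consistent with the quoted 46%:

```python
# Positive predictive value of a "high risk" label.
sensitivity = 0.75    # P(flagged | will re-offend), from the text
base_rate = 0.20      # prevalence of re-offense, from the text
specificity = 0.78    # ASSUMED: not given here; picked to match the ~46% PPV

fpr = 1 - specificity
ppv = (sensitivity * base_rate) / (sensitivity * base_rate + fpr * (1 - base_rate))
print(f"PPV = {ppv:.2f}")   # ≈ 0.46
```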
Washington writes in his research notes: "The prosecutor's fallacy isn't just a courtroom phenomenon. It's embedded in every predictive algorithm that confuses P(flagged | will offend) with P(will offend | flagged). And the people harmed by this confusion are disproportionately those already marginalized by the justice system."
The Broader Lesson
The prosecutor's fallacy teaches us three things:
- Always ask "the probability of WHAT given WHAT?" The order matters. $P(A \mid B) \neq P(B \mid A)$, and confusing them can destroy lives.
- Always consider the base rate. Rare events (coincidental DNA matches, double SIDS deaths, false confessions) may seem impossibly unlikely in isolation — but they're less unlikely than the alternative when the alternative is also extremely rare.
- Always compare competing explanations. A probability in isolation is meaningless. You must compare it to the probability of the alternative. Bayes' theorem forces this comparison. Simple probability does not.
Questions for Discussion
- The communication problem. How should expert witnesses present probabilistic evidence to a jury? Draft a one-paragraph explanation of the Sally Clark statistics that a non-technical juror could understand. Avoid the prosecutor's fallacy.
- Database searches and DNA. If a crime lab searches a database of 5 million DNA profiles against crime scene evidence, and the random match probability is 1 in 1 billion, what is the expected number of coincidental matches? How does this change the interpretation compared to testing a single identified suspect?
- The role of prior probabilities. In a criminal trial, what should serve as the "prior probability" of guilt? Is it appropriate to use statistical base rates in a courtroom? What are the arguments for and against?
- Algorithms in sentencing. Some jurisdictions use risk assessment algorithms during sentencing. If an algorithm's PPV for violent recidivism is 40% (meaning 60% of people labeled "high risk" will NOT be violent), should courts be allowed to use this tool in sentencing decisions? What if the PPV differs by race?
- The reverse fallacy. The defense attorney's fallacy is the opposite error: arguing that because there are many possible suspects, the DNA evidence is meaningless. Why is this also wrong? How does Bayes' theorem handle it correctly?
Python Extension
```python
def prosecutors_fallacy_demo(p_evidence_given_innocent,
                             p_evidence_given_guilty,
                             prior_guilt):
    """
    Demonstrate the prosecutor's fallacy by comparing:
    - What the prosecutor claims: P(evidence | innocent) ≈ P(innocent | evidence)
    - What Bayes actually says: P(guilty | evidence)
    """
    # Bayes' theorem
    p_evidence = (p_evidence_given_guilty * prior_guilt +
                  p_evidence_given_innocent * (1 - prior_guilt))
    p_guilty_given_evidence = (p_evidence_given_guilty * prior_guilt /
                               p_evidence)
    p_innocent_given_evidence = 1 - p_guilty_given_evidence

    print("=== Prosecutor's Fallacy Analysis ===")
    print("\nInputs:")
    print(f"  P(evidence | innocent) = {p_evidence_given_innocent:.10f}")
    print(f"  P(evidence | guilty)   = {p_evidence_given_guilty:.4f}")
    print(f"  P(guilty) [prior]      = {prior_guilt:.10f}")
    print("\nThe prosecutor claims:")
    print(f"  'P(innocent | evidence) ≈ P(evidence | innocent) "
          f"= {p_evidence_given_innocent:.10f}'")
    print("  'Therefore guilt is virtually certain.'")
    print("\nBayes' theorem says:")
    print(f"  P(guilty | evidence)   = {p_guilty_given_evidence:.6f}"
          f" ({p_guilty_given_evidence*100:.4f}%)")
    print(f"  P(innocent | evidence) = {p_innocent_given_evidence:.6f}"
          f" ({p_innocent_given_evidence*100:.4f}%)")
    return p_guilty_given_evidence


# Sally Clark case (simplified)
print("--- Sally Clark Case ---")
p_guilt = prosecutors_fallacy_demo(
    p_evidence_given_innocent=1/1_000_000,   # Two SIDS (corrected)
    p_evidence_given_guilty=1/50,            # Two murders look natural
    prior_guilt=1/50_000_000                 # Base rate of double infanticide
)

print("\n\n--- DNA Database Search ---")
# DNA match: 1 in 10 billion random match,
# but searched against 5 million profiles
p_at_least_one_match = 1 - (1 - 1/10_000_000_000)**5_000_000
print(f"P(at least one coincidental match in database) = "
      f"{p_at_least_one_match:.6f}")
prosecutors_fallacy_demo(
    p_evidence_given_innocent=1/10_000_000_000,
    p_evidence_given_guilty=0.999,
    prior_guilt=1/5_000_000   # 1 suspect among 5M profiles
)

# How prior probability affects the conclusion
print("\n\n--- Effect of Prior Probability on Conclusion ---")
print(f"{'Prior P(guilt)':>20} {'P(guilt|evidence)':>20} {'Conclusion':>15}")
print("-" * 60)
priors = [1/100, 1/1000, 1/10000, 1/100000, 1/1000000,
          1/10000000, 1/100000000]
for prior in priors:
    posterior = (0.999 * prior) / (0.999 * prior + 1e-10 * (1 - prior))
    conclusion = "Likely guilty" if posterior > 0.5 else "Likely innocent"
    print(f"{prior:>20.10f} {posterior:>20.6f} {conclusion:>15}")
```
Key Takeaway
The prosecutor's fallacy — confusing P(evidence | innocent) with P(innocent | evidence) — has contributed to wrongful convictions and unjust sentences. Bayes' theorem is not just an academic exercise. It's a safeguard against a reasoning error that can, and has, cost people their freedom and their lives. Every time you see a probability used as evidence, ask: "The probability of WHAT given WHAT?" The answer to that question might change everything.