Chapter 9 Key Takeaways: Forensic DNA Statistics

DataField.Dev

A one-page field card. If you remember nothing else from this chapter, remember the one-line rule at the very bottom.

The core claims

A DNA "match" is empty without a number. "These profiles match" means only that they correspond; the random match probability (RMP) is what tells you whether the correspondence is unremarkable or astronomically rare.
The RMP is a statement about coincidence, not guilt. "1 in 43 million" means that a randomly chosen person, unrelated to the defendant, would have this profile by chance with a probability of 1 in 43 million. It is not "a 1-in-43-million chance the defendant is innocent."
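The likelihood ratio (LR) is the better frame. It compares the probability of the evidence under two stated hypotheses: the defendant is a contributor vs. an unrelated unknown person is. For a single-source match, LR ≈ 1/RMP, because the matching profile is essentially certain if the defendant is the source and has probability RMP if an unrelated person is. Its virtue is that it forces both hypotheses into the open and resists the prosecutor's fallacy.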
Mixtures need software; software needs honesty. Probabilistic genotyping computes an LR for complex mixtures that the human eye cannot untangle, which is a real advance. But that LR is a model's estimate, not a measurement, and two valid programs can disagree by orders of magnitude on the same complex sample.
The prosecutor's fallacy is the chief danger. Transposing $P(\text{match} \mid \text{innocent})$ into $P(\text{innocent} \mid \text{match})$ has helped convict the innocent. Its mirror, the defense fallacy, strips strong evidence of context to make it look worthless. Both are wrong.
Bayes' theorem supplies what the match probability leaves out: the prior. In odds form, posterior odds = LR × prior odds. The scientist supplies the LR (the weight of the evidence); the jury supplies the prior (drawing on all the other evidence). The same LR can yield a coin flip or near-certainty depending on the prior, which is why the number never decides the case alone.
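To make the odds form concrete, here is a minimal Python sketch. It assumes a clean single-source match with no laboratory error, so P(E | Hp) is taken as 1 and LR = 1/RMP, where Hp (the defendant is the source) and Hd (an unrelated unknown person is) are labels introduced here for convenience. The RMP of 1 in 43 million is the chapter's example; both prior odds are invented purely to show the arithmetic, since in court the prior belongs to the jury, not the analyst. The function names are hypothetical.

```python
def lr_single_source(rmp: float) -> float:
    """LR = P(E | Hp) / P(E | Hd), assuming P(E | Hp) = 1 and P(E | Hd) = RMP."""
    if not 0.0 < rmp <= 1.0:
        raise ValueError("RMP must be in (0, 1]")
    return 1.0 / rmp


def posterior_probability(lr: float, prior_odds: float) -> float:
    """Odds-form Bayes: posterior odds = LR x prior odds, returned as P(Hp | E)."""
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


rmp = 1 / 43_000_000          # the chapter's example RMP
lr = lr_single_source(rmp)    # about 43 million

# Hypothetical prior A: nothing but the DNA, with ~43 million equally plausible
# unrelated alternative sources, i.e. prior odds of 1 : 43,000,000.
weak = posterior_probability(lr, prior_odds=1 / 43_000_000)

# Hypothetical prior B: other evidence already makes the defendant a
# 1-in-1,000 candidate, i.e. prior odds of 1 : 999.
strong = posterior_probability(lr, prior_odds=1 / 999)

# Defense-fallacy arithmetic: expected coincidental matchers among those
# 43 million alternatives. The count is correct; calling the evidence
# "worthless" because of it ignores everything else in the case.
expected_coincidental = 43_000_000 * rmp

# Prosecutor's fallacy: reading P(match | innocent) as P(innocent | match),
# which reports 1 - RMP no matter what the prior is.
fallacy = 1.0 - rmp

print(f"LR = {lr:,.0f}")
print(f"prior A: P(Hp | E) = {weak:.3f}")
print(f"prior B: P(Hp | E) = {strong:.6f}")
print(f"fallacious claim   = {fallacy:.8f}")
print(f"expected coincidental matchers = {expected_coincidental:.1f}")
```

With the identical LR, prior A lands at roughly a coin flip and prior B at near-certainty, while the fallacious reading claims near-certainty in both cases. Only the LR belongs in the analyst's report.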

Method-validity verdict (NAS 2009 / PCAST 2016)

Method / claim	Where it sits on the validity spectrum	Key error mode
Single-source DNA match statistics (RMP, LR)	Strong — quantified, peer-reviewed, the field's gold standard	The error is in interpretation/language, not the math (the prosecutor's fallacy)
Probabilistic genotyping — simple mixtures (within validated limits)	Foundationally valid (PCAST 2016)	Operating outside validated limits; biased human inputs
Probabilistic genotyping — complex mixtures (many contributors, low template)	Not yet established (PCAST 2016)	Model-dependence; programs disagree; source-code opacity
Reporting "probability the defendant is the source/guilty"	Not science — a transposed conditional	The prosecutor's fallacy itself

Key terms (one line each)

Likelihood ratio (LR) — how many times more probable the evidence is under one hypothesis than a competing one.
Probabilistic genotyping — software that computes an LR for complex/low-template DNA mixtures.
Prosecutor's fallacy — treating P(evidence | innocent) as P(innocent | evidence).
Defense fallacy — dismissing strong evidence by counting coincidental matchers while ignoring the other evidence against this defendant.
Bayesian reasoning — posterior odds = LR × prior odds; updating belief with the weight of evidence.
Allele frequency — the proportion of chromosomes carrying a given allele; the raw material of match probabilities.
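How allele frequencies become a match probability can be sketched as below. This sketch assumes Hardy-Weinberg equilibrium within each locus (p² for a homozygote, 2pq for a heterozygote) and independence across loci (the product rule). The three-locus profile and its allele frequencies are invented for illustration, the function names are hypothetical, and real casework typically adds corrections for population substructure and relatives that this sketch omits.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency at one locus.

    Homozygote (one allele, frequency p): p^2.
    Heterozygote (alleles with frequencies p and q): 2pq.
    """
    if q is None:
        return p * p
    return 2.0 * p * q


def random_match_probability(profile: Sequence[Tuple[float, ...]]) -> float:
    """Product rule: multiply per-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical three-locus profile; the allele frequencies are made up.
profile = [
    (0.10, 0.20),  # heterozygote -> 2 * 0.10 * 0.20 = 0.04
    (0.05,),       # homozygote   -> 0.05 ** 2 = 0.0025
    (0.15, 0.30),  # heterozygote -> 2 * 0.15 * 0.30 = 0.09
]

rmp_example = random_match_probability(profile)
print(f"RMP = {rmp_example:.2e} (about 1 in {1 / rmp_example:,.0f})")
```

Each additional locus multiplies in another factor below one, which is how a full multi-locus profile can reach figures like 1 in 43 million or rarer.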

Themes advanced in this chapter

The validity spectrum: DNA statistics anchor the strong end because they are quantified. Even there, though, the weak point is interpretation rather than the math, and complex-mixture software is not yet established as valid.
The CSI effect cuts both ways: a jury primed for certainty hears a hedged probability as a verdict (over-trust), while a defense that recasts the LR as mere coincidence can make strong evidence look worthless (under-valuing). Stating the limits out loud is the safeguard.

⚖️ What you can honestly say on the stand

"A random, unrelated person would share this profile by chance with a probability of about 1 in [X]; equivalently, the evidence is about [LR] times more probable if the defendant is a contributor than if an unrelated unknown person is. This speaks to whether the defendant's DNA is present — not to how or when it got there, and not to the probability that he is guilty."

What you may never say: "the probability the defendant is innocent is 1 in [X]"; "the probability he is the source is 99.99%"; "it's a match, so he did it." Each is the prosecutor's fallacy or overstated individualization wearing a number.

The whole discipline in one rule: report the strength of the evidence; never report the probability of guilt.