Appendix E: DNA and Forensic Statistics Reference

This appendix is a standing reference to the one quantitative idea this book asks you to take seriously: the interpretation of a DNA result. It consolidates the machinery built across Chapters 7, 8, and 9 — the STR profile and CODIS, the random match probability, the likelihood ratio, probabilistic genotyping, mixtures, and the prosecutor's fallacy — into a single place you can consult when a number appears and you need to know what it does and does not mean.

The math here is deliberately light. Forensic statistics are taught through meaning and worked examples, not derivations. The discipline is not algebra; it is language — saying exactly what a number licenses and refusing to say more. Every probability in this appendix is expressed two ways, as a fraction and in words, because the words are what keep the meaning honest. And wherever a probability appears, the prosecutor's fallacy is named and refuted in the same passage, because that is the rule this book follows everywhere.

A standing caution on numbers. Every numeric value in this appendix is illustrative — chosen to show the arithmetic, not drawn from any real case or real allele-frequency table. Do not quote them as typical. Real calculations use measured per-locus frequencies and population-specific corrections; the shape of the reasoning is what transfers, never the digits.


E.1 The building blocks: STR loci, alleles, and CODIS

Forensic DNA typing does not read your genes. It targets a small set of locations in non-coding ("junk") DNA where the sequence varies a great deal between people — a deliberate, ethical design choice that makes standard typing reveal essentially nothing about health, traits, or ancestry. The variation it exploits is the short tandem repeat (STR): a short sequence of bases (typically 2–6 letters) repeated, back to back, a variable number of times.

| Term | Definition |
|---|---|
| Locus (pl. loci) | A specific, named physical address in the genome that forensic scientists agree to examine. |
| Allele | The version a person carries at a locus — e.g., 11 repeats, or 14. You carry two alleles per locus, one inherited from each parent. |
| Genotype (at a locus) | A person's pair of alleles at one locus — e.g., (11, 14). Two different alleles = heterozygous; two of the same = homozygous, showing one value. |
| DNA profile | The complete set of alleles a sample shows across all tested loci — a list, locus by locus. The U.S. core panel currently uses 20 STR loci plus a sex marker. |
| Allele frequency | The proportion of chromosomes in a reference population carrying a particular allele at a locus; the raw material from which every match probability is built. An allele on 8% of chromosomes has frequency 0.08. |
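
As a rough sketch of how allele frequencies become the per-locus genotype frequencies used later in this appendix, the snippet below assumes Hardy-Weinberg proportions (random mating, no population structure). That assumption, the function name, and the 8% and 12% frequencies are illustrative choices, not values from this book or from any real frequency table.

```python
from fractions import Fraction


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    p, q: frequencies of the two alleles. Pass only p for a homozygote.
    """
    if q is None:
        return p * p      # homozygote (a, a): p^2
    return 2 * p * q      # heterozygote (a, b): 2pq


# Illustrative frequencies only: allele 11 at 8%, allele 14 at 12%.
p11 = Fraction(8, 100)
p14 = Fraction(12, 100)

het = genotype_frequency(p11, p14)   # genotype (11, 14)
hom = genotype_frequency(p11)        # genotype (11, 11)

print(f"(11, 14): {het}, about 1 in {float(1 / het):.0f}")   # 12/625, about 1 in 52
print(f"(11, 11): {hom}, about 1 in {float(1 / hom):.0f}")   # 4/625, about 1 in 156
```

In words: under these assumed frequencies, roughly one unrelated person in 52 would carry the (11, 14) genotype at this one locus.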

CODIS (Combined DNA Index System) is the U.S. national DNA database, administered by the FBI, that searches an unknown crime-scene profile against profiles on file (convicted-offender, arrestee in many jurisdictions, and unsolved-casework indexes). Two facts about CODIS govern its honest use:

  • CODIS stores numbers, not identities. A "hit" is the system flagging two profiles as candidate matches — not a name on a screen. The database is a pointer, not an answer.
  • A CODIS hit is a lead, not proof. Once the database points to a candidate, the laboratory obtains a fresh reference sample and re-types it; it is that confirmatory direct comparison, reported with its random match probability, that becomes evidence — never the bare fact that a computer flagged a candidate. (The added subtlety of finding a match by searching a large database — the "database-search effect" — is taken up in E.7.)

The core comparison logic, from Chapter 7, is the book's central asymmetry in its purest form: mismatch refutes; match supports. A single clean, reproducible difference at even one locus (the evidence is (15, 16) at a locus, the suspect is (14, 17)) is, barring rare biological exceptions, an exclusion — the suspect is not the source. Agreement at every locus does not prove the suspect is the source; it makes the suspect consistent with the source, and the strength of that consistency is the question the rest of this appendix answers.
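
The asymmetry can be sketched in a few lines. This is a minimal illustration that assumes clean single-source profiles with no dropout or artifacts; the function name, the locus names, and the allele values are hypothetical.

```python
def compare_profiles(evidence, reference):
    """Compare two single-source profiles locus by locus.

    Each profile is a dict mapping a locus name to a pair of allele values.
    Returns "exclusion" if any locus typed in both profiles differs,
    "consistent" if every shared locus agrees, and "no comparison" if the
    profiles share no loci. Assumes clean, reproducible allele calls.
    """
    shared = [locus for locus in evidence if locus in reference]
    if not shared:
        return "no comparison"
    for locus in shared:
        # Allele order within a genotype carries no meaning, so sort first.
        if sorted(evidence[locus]) != sorted(reference[locus]):
            return "exclusion"
    return "consistent"


# Hypothetical locus names and allele values, for illustration only.
evidence = {"LocusA": (15, 16), "LocusB": (11, 14)}
suspect_1 = {"LocusA": (14, 17), "LocusB": (11, 14)}
suspect_2 = {"LocusA": (16, 15), "LocusB": (14, 11)}

print(compare_profiles(evidence, suspect_1))  # exclusion
print(compare_profiles(evidence, suspect_2))  # consistent
```

Note that the function never returns "match proves source": one difference ends the comparison, while full agreement only earns "consistent".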


E.2 The random match probability (RMP)

Random match probability (RMP) — the probability that a randomly chosen, unrelated person from the relevant population would, by coincidence, share the evidence profile.

The RMP is a statement about coincidence. It is the strongest single number in forensic science, and the most frequently misstated.

How it is built — the multiplication that is the engine of DNA's power. At each locus, population studies have measured how common each genotype is. The probability that a random person matches at one locus might be modest — say 1 in 10. But the loci are inherited largely independently, and independent probabilities multiply. Across twenty loci, twenty modest rarities multiply into one extreme one.

🔬 At the Bench — a worked RMP (illustrative numbers). Suppose at six independent loci, the genotype frequencies are roughly $1/20$, $1/15$, $1/50$, $1/8$, $1/30$, and $1/12$. Treating the loci as independent, the chance a random unrelated person matches at all six is: $$\text{RMP} = \frac{1}{20}\times\frac{1}{15}\times\frac{1}{50}\times\frac{1}{8}\times\frac{1}{30}\times\frac{1}{12} \approx \frac{1}{43{,}000{,}000}.$$ In words: if the DNA came from someone other than the defendant, the chance that random person would happen to share this six-locus profile is about 1 in 43 million — roughly one person in a population larger than California. Extend the same multiplication across the full 20-locus panel and the figure routinely falls below one in a trillion — more than a hundred times the population of Earth.
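
The multiplication in the worked example can be checked with exact fractions. This is a sketch using the same illustrative genotype frequencies and the same independence assumption; the variable names are arbitrary.

```python
from fractions import Fraction
from math import prod

# Illustrative per-locus genotype frequencies from the worked example.
genotype_freqs = [Fraction(1, d) for d in (20, 15, 50, 8, 30, 12)]

# Independent loci: the per-locus probabilities multiply.
rmp = prod(genotype_freqs)

print(f"RMP = 1 in {rmp.denominator:,}")  # RMP = 1 in 43,200,000
```

The exact product is 1 in 43.2 million, which the text rounds to 1 in 43 million.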

The real calculation is more careful than a flat product: it uses measured per-locus frequencies, and it applies a population-structure correction (often written $\theta$ or $F_{ST}$) to account for the fact that real populations are not perfectly random-mating. That correction makes the reported number more conservative — larger, and more favorable to the defendant.
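
One widely used form of this correction is the pair of formulas recommended in the 1996 National Research Council report (often called the Balding-Nichols correction). The sketch below assumes that form; this appendix does not say which correction any particular laboratory applies, and the value of theta and the allele frequencies are illustrative.

```python
def hw_genotype_freq(p, q=None):
    """Uncorrected genotype frequency: p^2 (homozygote) or 2pq (heterozygote)."""
    return p * p if q is None else 2 * p * q


def theta_corrected_genotype_freq(theta, p, q=None):
    """Genotype frequency with a population-structure correction (assumed form)."""
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # Homozygote (a, a)
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # Heterozygote (a, b)
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


theta = 0.01           # illustrative value only
p, q = 0.08, 0.12      # illustrative allele frequencies

plain = hw_genotype_freq(p, q)
corrected = theta_corrected_genotype_freq(theta, p, q)

print(f"uncorrected: 1 in {1 / plain:.0f}")
print(f"corrected:   1 in {1 / corrected:.0f}")
```

For these inputs the corrected genotype is reported as more common than the uncorrected one, which is the conservative direction described above. Setting theta to zero recovers the uncorrected value.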

State it both ways, always. "The random match probability is approximately 1 in 1.2 billion" is precise but cold; complete it: "—that is, you would expect to find this profile in roughly one person out of more than a billion, far more than the population of the United States." Expressing the figure as a fraction and in human terms is the difference between a juror understanding the evidence and a juror being overwhelmed by it.

Three hard edges on even this strongest number:

  1. It is computed for unrelated people. Close relatives share much more DNA. The chance a defendant's full sibling coincidentally matches is far higher than the RMP — sometimes only hundreds or thousands to one, not billions. A responsible report says the figure applies to unrelated individuals and that a sibling cannot be excluded by the statistic alone.
  2. It describes coincidence only — not laboratory error. A swapped, contaminated, or mislabeled tube is a separate and realistically larger risk than a one-in-a-billion coincidence, and it does not shrink no matter how many loci you type. Never let the astronomical coincidence figure stand in for the everyday, human risk of a mistake.
  3. It is a statement about the DNA, not the conduct. Even a perfect match says only whose cells these are — never when or how they got there (the transfer problem, E.8), and never that the source did anything wrong.
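
To make the first hard edge concrete, the sketch below compares an unrelated-person match probability with a full-sibling match probability. It assumes the standard textbook per-locus formulas for full siblings, Hardy-Weinberg proportions for unrelated people, and six heterozygous loci with made-up allele frequencies; none of these inputs come from this book.

```python
from math import prod


def unrelated_match_prob(p, q=None):
    """Chance an unrelated person has the genotype: p^2 or 2pq."""
    return p * p if q is None else 2 * p * q


def sibling_match_prob(p, q=None):
    """Chance a full sibling has the same genotype at one locus."""
    if q is None:
        return (1 + p) ** 2 / 4              # homozygote (a, a)
    return (1 + p + q + 2 * p * q) / 4       # heterozygote (a, b)


# Six heterozygous loci with illustrative allele frequencies (p, q).
loci = [(0.10, 0.20)] * 6

unrelated = prod(unrelated_match_prob(p, q) for p, q in loci)
sibling = prod(sibling_match_prob(p, q) for p, q in loci)

print(f"unrelated person: about 1 in {1 / unrelated:,.0f}")
print(f"full sibling:     about 1 in {1 / sibling:,.0f}")
```

Because a sibling matches at any one locus with probability of at least one quarter, adding loci shrinks the sibling figure far more slowly than the unrelated-person figure.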

E.3 The likelihood ratio (LR)

The RMP answers how surprising a match would be if the defendant were not the source. It says nothing, directly, about the competing world in which the defendant is. The tool that holds both worlds in view — and the framework that now dominates forensic interpretation worldwide — is the likelihood ratio.

Likelihood ratio (LR) — a number expressing how much more (or less) probable the observed evidence is under one hypothesis than under a competing hypothesis. In forensic DNA it compares the probability of the results assuming the prosecution's proposition ($H_p$: the defendant is a contributor) to the probability of the same results assuming the defense's proposition ($H_d$: an unknown, unrelated person is). An LR of one million means the evidence is one million times more probable if $H_p$ is true than if $H_d$ is.

$$\text{LR} = \frac{\text{probability of the evidence IF the prosecution's proposition is true}}{\text{probability of the evidence IF the defense's proposition is true}}$$

The discipline the LR enforces: you cannot report one without finishing two sentences out loud — "Under $H_p$, the DNA came from _." "Under $H_d$, the DNA came from _." Get the propositions wrong — make them vague, or stack a straw-man defense hypothesis — and the number is worthless no matter how cleanly it was computed. The math is the easy part; choosing the right pair of hypotheses is the craft.

🔬 At the Bench — LR and RMP are two faces of one coin (single source). For a clean single-source match: if the defendant is the source, the probability of seeing his profile is essentially 1 (we'd expect exactly that). If an unrelated stranger is the source, the probability of seeing the defendant's exact profile is the RMP. So: $$\text{LR} = \frac{1}{\text{RMP}} = \frac{1}{1/43{,}000{,}000} = 43{,}000{,}000.$$ In words: the evidence is about 43 million times more probable if the defendant is the source than if a random unrelated person is. Same strength; framed as a comparison rather than a coincidence.

Why prefer the LR if the two numbers are equivalent for a single source? Because of how each is heard. "The random match probability is 1 in 43 million" invites the listener to slide into "so there's a 1-in-43-million chance he's innocent" — the fallacy. "The evidence is 43 million times more probable if he is a contributor than if an unrelated person is" resists that slide: it is overtly a statement about evidence under two hypotheses, not about the defendant. The LR is not just better mathematics; it is better-defended language. And it earns its keep where the RMP cannot apply at all — the mixture (E.4) — because it can still ask a clean two-hypothesis question even when no single "match probability" can be written down.

Reading the scale. Three reference points:

  • LR = 1 — the evidence is neutral; it favors neither proposition.
  • LR > 1 — the evidence supports $H_p$ over $H_d$ (the larger, the stronger the support).
  • LR < 1 — the evidence supports $H_d$ over $H_p$.
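
The three reference points translate directly into code. This is a minimal sketch with a hypothetical helper name; it reports only direction and order of magnitude, and deliberately attaches no verbal label.

```python
import math


def read_lr(lr):
    """Return which proposition the evidence favors and the log10 of the LR."""
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    if lr > 1:
        favored = "H_p over H_d"
    elif lr < 1:
        favored = "H_d over H_p"
    else:
        favored = "neither (neutral)"
    return favored, math.log10(lr)


for lr in (1_000_000, 1, 0.001):
    favored, log_lr = read_lr(lr)
    print(f"LR = {lr}: favors {favored}; log10(LR) = {log_lr:+.1f}")
```

An LR of 0.001 is as strong as an LR of 1,000, only pointing the other way: the evidence is a thousand times more probable under $H_d$.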

Many laboratories pair a numeric LR with a standardized verbal scale (e.g., an LR in the billions reported as "very strong support"). The verbal scale is a convention, not a law of nature, and it must always be completed by the comparison: "very strong support for the proposition that the defendant contributed, as compared with the proposition that an unrelated unknown person did" — never a free-floating "very strong" that a jury hears as "very strong proof of guilt."


E.4 Mixtures and their interpretation

A DNA mixture is a sample containing DNA from two or more contributors — a doorknob touched by a household, a shared steering wheel, a weapon handled by several people, a victim-plus-assailant sample. Mixtures are the hardest routine problem in forensic DNA, because the clean logic of "read the two alleles, that's the donor" breaks down: now there are more alleles at a locus than one person can own.

   ONE LOCUS: SINGLE SOURCE vs. TWO-PERSON MIXTURE (schematic; not to scale)

   SINGLE SOURCE (heterozygote)            TWO-PERSON MIXTURE
   two alleles, ~equal height              up to four alleles, unequal heights
        █           █                          █
        █           █                          █        █
        █           █                          █        █     █
        █           █                          █        █     █     ▌
       ─┴───────────┴──►                      ─┴────────┴─────┴─────┴──►
        12          16                         12       14    16    18

   Single source: read the two alleles — done.
   Mixture: which alleles pair into which contributor? Peak heights HINT at
   proportions but are unreliable at low template. Allele calls are illustrative.

Sorting a mixture into its contributors (or deciding whether a given person could be among them) is called deconvolution. Several features make it genuinely hard, and each is a place error enters:

  • The number of contributors is itself an estimate. Counting alleles gives a minimum, not the true number — three people can coincidentally share enough alleles to look like two. Misjudging the count skews everything downstream.
  • Allele sharing and stacking. When two contributors share an allele, their peaks overlap and add, so peak height is an ambiguous guide to who is present and in what proportion.
  • Major and minor contributors. Often one person contributed most of the DNA (the major contributor) and another only a little (the minor contributor). The minor profile may sit near the noise floor, full of the dropout and drop-in of a low-template sample — so the hardest contributor to resolve is usually the one a case most wants.
  • Degradation on top of mixing. A mixture that is also heat-degraded or low-template combines every difficulty at once.
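
The first point, that allele counting gives only a floor, can be shown with a short sketch. The helper name, locus names, and allele values are hypothetical, and the count ignores stutter, dropout, and drop-in.

```python
from math import ceil


def min_contributors(mixture):
    """Smallest number of people who could explain the alleles observed.

    mixture: dict mapping a locus name to the distinct alleles seen there.
    Each person carries at most two alleles per locus, so the busiest
    locus sets the floor.
    """
    return max(ceil(len(set(alleles)) / 2) for alleles in mixture.values())


observed = {"LocusA": {12, 14, 16, 18}, "LocusB": {9, 11, 13}}
print(min_contributors(observed))  # 2

# Three people whose genotypes overlap still show only four alleles.
three_people = [(12, 14), (14, 16), (16, 18)]
pooled = {"LocusA": {allele for genotype in three_people for allele in genotype}}
print(min_contributors(pooled))  # 2, although three people contributed
```

The second call is the trap described above: the count says "at least two" and is silent about the third contributor.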

⚠️ Junk-Science Alert — the CPI cautionary tale. For years a common mixture statistic was the Combined Probability of Inclusion (CPI / RMNE) — "the probability that a random person could not be excluded as a contributor." Used carelessly, especially on low-level or ambiguous mixtures, it produced numbers that overstated the strength of an inclusion. When a major U.S. crime laboratory's mixture protocols were found wanting, a large number of past cases had to be reviewed and some re-interpreted, sometimes changing the result. The lesson is not that mixtures are worthless; it is that a mixture statistic is only as good as the interpretation protocol behind it. Some mixtures — too many contributors, too little template — are honestly inconclusive, and "inconclusive" is a valid, ethical result a good lab reports without embarrassment.
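
For reference, the CPI arithmetic itself is simple, which is part of why it was easy to misapply. The sketch below uses the usual definition (at each locus, square the summed frequencies of every allele observed in the mixture, then multiply across loci); the allele frequencies are invented for illustration.

```python
from fractions import Fraction
from math import prod


def combined_probability_of_inclusion(observed_allele_freqs):
    """CPI across loci.

    observed_allele_freqs: one list per locus, holding the population
    frequency of each allele seen in the mixture at that locus.
    """
    return prod(sum(freqs) ** 2 for freqs in observed_allele_freqs)


# Two loci with illustrative frequencies for the alleles observed.
mixture = [
    [Fraction(10, 100), Fraction(15, 100), Fraction(20, 100), Fraction(5, 100)],
    [Fraction(20, 100), Fraction(10, 100), Fraction(10, 100)],
]

cpi = combined_probability_of_inclusion(mixture)
print(f"CPI = {cpi}, about 1 in {1 / cpi}")  # CPI = 1/25, about 1 in 25
```

The calculation takes the list of observed alleles as complete. It uses no peak heights and has no term for dropout, so on a low-level mixture where a contributor's allele may have failed to appear, the premise behind the number no longer holds.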

The honest output of a mixture is rarely "this is his profile." At best it is a likelihood ratio: "the evidence is X times more probable if this person is a contributor than if an unknown, unrelated person is" — a strength-of-evidence statement, with all the honesty and all the room for dispute the LR carries.

Lineage markers, briefly (when standard STRs fail)

Two specialized methods can speak where nuclear STRs are silent, at a steep cost in discriminating power:

  • mtDNA (mitochondrial DNA) survives in rootless hair shafts, old bones, and badly degraded tissue because there are hundreds-to-thousands of copies per cell. But it is maternally inherited and shared along a whole maternal line — you, your mother, your siblings, your maternal cousins all share it — so it cannot individualize. Its match frequencies are far weaker than nuclear STRs (often a fraction of a percent of the population, not one in billions).
  • Y-STRs type the male-only Y chromosome — invaluable for pulling a small male contribution out of an overwhelming female background (a sexual-assault mixture). But the Y passes essentially unchanged from father to son, so a Y-STR profile is shared by all males in a paternal line. A "match" means "consistent with this paternal lineage," not "this man and no other."

The recurring error with both is presenting a lineage match as an individual identification. The honest statement always adds: "...and matches every [maternal/paternal] relative he has, and a measurable fraction of unrelated people who share this common type."


E.5 Probabilistic genotyping (and its black-box problem)

Probabilistic genotyping — software that uses explicit statistical models and computation to interpret DNA profiles, especially complex or low-template mixtures, by computing a likelihood ratio rather than relying on an analyst's subjective peak-by-peak judgment. It models the biological artifacts (stutter, dropout, drop-in, peak-height variation) probabilistically and weighs the evidence under competing propositions.

The promise is real. For decades, analysts read messy mixtures by eye and by hand — deciding subjectively which peaks to "call" — and two analysts given the same mixture and the suspect's profile could reach different conclusions, sometimes nudged by knowing the desired answer. Probabilistic genotyping replaces that irreproducible judgment with a documented, repeatable computation; it can extract usable information from mixtures a human would dismiss; and it carries the uncertainty through the calculation rather than forcing a binary "call it or don't."

Three cautions belong on every report:

  1. The black box. These are commercial products, and developers have, in several cases, resisted defense requests to examine the source code, asserting trade-secret protection. A defendant convicted partly on a number from software he is not permitted to inspect raises a genuine due-process question: how do you cross-examine a calculation you cannot see? Courts have split; the issue remains live — and it is a fairness question, not a scientific one (it connects to the Confrontation Clause; see Appendix D, Melendez-Diaz).
  2. Different programs, different numbers. Independent comparisons have found that two validated programs, run on the same complex mixture, can return LRs differing by orders of magnitude — occasionally pointing in opposite directions. That does not make either "wrong," but it means the number is model-dependent, not a fact of nature.
  3. The edge of the envelope. These tools are validated for certain numbers of contributors and minimum DNA quantities. Pushed past those limits, the output becomes unreliable, and the obligation is to report "beyond validated limits," not to report a number anyway.

The PCAST 2016 verdict (state it precisely): for simple mixtures (few contributors, reasonable amounts of DNA), probabilistic genotyping has foundational validity — the studies show it does what it claims. For complex mixtures (many contributors, low template, heavy overlap), foundational validity was not yet established, and caution is required. The book's recurring lesson in miniature: a method can be valid in one regime and unvalidated in another, and the honest practitioner states which regime the actual sample falls in.

🧠 Cognitive-Bias Watch. Software does not eliminate bias; it relocates it. The analyst still chooses the assumed number of contributors, defines the propositions, and decides whether the sample is within validated limits — choices that can be swayed by knowing the suspect's profile. Best practice is to make as many of these decisions as possible before the reference profile is compared, and to document them. "The computer said so" is no defense against a biased setup. (Sequential unmasking — Chapter 31 — is the safeguard.)


E.6 The prosecutor's fallacy (named and refuted)

This is the most important — and most dangerous — idea in forensic statistics. It is a logical error, not a mathematical one, which is exactly why it slips past intelligent people who can do the arithmetic flawlessly.

Prosecutor's fallacy — the error of treating the probability of the evidence given innocence (e.g., the random match probability) as if it were the probability of innocence given the evidence. It transposes a conditional probability: "the chance a random innocent person would match is 1 in a million" is wrongly restated as "the chance the defendant is innocent is 1 in a million."

In symbols — the one place the notation earns its keep — the RMP is roughly $P(\text{match} \mid \text{innocent})$: the probability of a match given the person is not the source. What a juror cares about is $P(\text{innocent} \mid \text{match})$: the probability the person is not the source given the match. The fallacy is to assume these are the same number. They are not. Confusing $P(A \mid B)$ with $P(B \mid A)$ is one of the oldest mistakes in reasoning.

The refutation — a homely example that breaks the spell. The probability an animal has four legs given it is a cow is essentially 1 — cows have four legs. The probability an animal is a cow given it has four legs is nowhere near 1 — most four-legged animals are not cows. Same two facts, two wildly different conditional probabilities; anyone who swapped them would be obviously wrong. The prosecutor's fallacy is that exact swap, wearing the dignity of a DNA statistic so it no longer looks absurd.

Why the swap matters so much: the probability a juror actually wants — innocence given the match — depends on something the match probability does not contain: how many other people could have left the DNA, and how likely the defendant was to be the source before the DNA was considered (the prior, E.7). A small random match probability does not, by itself, translate into a small probability of innocence.
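
A small calculation shows how far apart the two conditionals can sit. This sketch makes a deliberately artificial assumption: before the DNA, the defendant and each of a fixed number of unrelated alternatives were equally likely to be the source. The function name and all numbers are illustrative.

```python
from fractions import Fraction


def prob_not_source_given_match(rmp, n_alternatives):
    """P(not the source | match) under an equal-prior assumption.

    The true source matches with probability ~1; each of the
    n_alternatives unrelated people matches with probability rmp.
    """
    expected_coincidental = rmp * n_alternatives
    return expected_coincidental / (1 + expected_coincidental)


rmp = Fraction(1, 1_000_000)     # P(match | not the source)
n_alternatives = 10_000_000      # other people who could have left the DNA

posterior = prob_not_source_given_match(rmp, n_alternatives)
print(f"P(match | not source) = {float(rmp):.6f}")
print(f"P(not source | match) = {float(posterior):.3f}")   # 0.909
```

In words: with a one-in-a-million match probability and ten million equally plausible alternatives, about ten coincidental matchers are expected, so the matching defendant is more likely than not to be someone other than the source. Changing the assumed pool changes the answer; the match probability alone never supplies it.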

⚠️ The origin story (Tier-1). The fallacy's most famous early appearance predates DNA: People v. Collins (California, 1968), where a couple was convicted partly on a prosecutor's claim that the probability of a random couple matching the eyewitness description was 1 in 12 million — a figure assembled by multiplying made-up frequencies for independent-seeming traits. The argument invited the jury to read "1 in 12 million that a random couple matches" as "1 in 12 million that this couple is innocent." The California Supreme Court reversed, identifying both the fabricated probabilities and the fallacious leap from coincidence to guilt.

The mirror image, for honesty's sake:

Defense fallacy — the error of dismissing strong evidence by noting that, in a large population, many people would coincidentally match, and concluding the match is therefore nearly worthless. It treats the defendant as if randomly drawn from the whole matching set, ignoring all the other evidence that placed this defendant before the court.

Example: RMP of 1 in 10 million in a country of 60 million → "six people would match by chance; my client is just 1 of 6; that's reasonable doubt." Wrong, for the complementary reason: the defendant was not plucked at random from 60 million — he was brought to court by other evidence the five hypothetical matchers do not share. The prosecutor's fallacy strips the DNA of its limits to make it look like proof; the defense fallacy strips it of its context to make it look weak.

The expert's rule (from the U.K. appellate guidance in cases usually cited as R v. Deen and R v. Doheny and Adams): state the rarity of the profile, and stop there — leave the leap to a conclusion about guilt to the jury. The instant an expert says "the probability he is the source is…," they have left science and committed the fallacy.


E.7 Bayesian reasoning: the prior, the LR, and the posterior

If the prosecutor's fallacy is the disease, Bayesian reasoning is the cure — not because jurors must do arithmetic, but because the Bayesian structure shows exactly what the match probability leaves out and where it belongs.

Bayesian reasoning — a framework for updating belief in a hypothesis as evidence arrives, combining a prior (belief before the evidence) with the weight of the evidence (the likelihood ratio) to produce a posterior (belief after). In odds form:

$$\text{posterior odds} = \text{LR} \times \text{prior odds}$$

In words: your belief after the DNA equals your belief before the DNA, multiplied by how strongly the DNA favors one hypothesis over the other. This single line explains everything wrong with the prosecutor's fallacy: the LR is only the multiplier (the weight of the evidence). To reach a probability of guilt you must multiply by a prior (how likely the defendant was to be the source before the DNA). The fallacy is forgetting the prior — treating the multiplier as if it were already the answer.

🔬 At the Bench — same DNA, opposite conclusions (illustrative numbers). Take a DNA result with an LR of 1,000,000 and consider two cases:

  • Case A — a cold hit with nothing else. A database search of millions turns up the defendant, with no other connection to the crime. The prior odds that this particular person (rather than anyone else who could have been searched) is the source might be low — illustratively, 1 in 1,000,000. $$\text{posterior odds} = 1{,}000{,}000 \times \tfrac{1}{1{,}000{,}000} = 1 \;\;(\text{even odds — a coin flip}).$$ In words: despite an LR in the millions, the DNA alone against a weak prior is far from conclusive.
  • Case B — DNA plus independent evidence. The same LR of 1,000,000, but the defendant was the victim's business partner, stood to gain financially, and was placed near the scene by other evidence. The prior odds might be 1 in 100. $$\text{posterior odds} = 1{,}000{,}000 \times \tfrac{1}{100} = 10{,}000 \text{ to } 1.$$ In words: now the case is strong.

Same DNA. Same LR. Wildly different conclusions — because the prior differs. This is why the scientist supplies the LR and must NOT supply the answer: the prior depends on all the non-DNA evidence the jury alone weighs.
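
The two cases can be reproduced with the odds form directly. This is a sketch using the same illustrative LR and priors; the helper names are arbitrary, and the conversion from odds to probability is included only to show what the posterior odds mean.

```python
from fractions import Fraction


def posterior_odds(lr, prior_odds):
    """Odds form of Bayes' rule: posterior odds = LR x prior odds."""
    return lr * prior_odds


def odds_to_probability(odds):
    """Convert odds in favor (as a single number) to a probability."""
    return odds / (1 + odds)


lr = 1_000_000

case_a = posterior_odds(lr, Fraction(1, 1_000_000))   # cold hit, weak prior
case_b = posterior_odds(lr, Fraction(1, 100))         # DNA plus other evidence

print(f"Case A: odds {case_a} to 1, probability {float(odds_to_probability(case_a)):.4f}")
print(f"Case B: odds {case_b} to 1, probability {float(odds_to_probability(case_b)):.4f}")
```

Case A comes out at even odds (probability 0.5); Case B at 10,000 to 1 (probability about 0.9999). The only input that differs is the prior.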

The division of labor, stated as the book's hardest rule: the scientist owns the LR (the weight of the evidence); the jury owns the prior (everything else) and performs the multiplication. An expert who states a probability of guilt has reached into the jury's box, grabbed the prior, and performed a calculation that is not theirs to perform with information they do not have. Guilt is a posterior. It belongs to the jury.

The cold-hit warning. When a suspect is found only by searching a large database, chance had many opportunities to throw up a coincidental match, and the relevant prior can be very different from a case where independent evidence pointed to the person first. Naively reporting the same RMP can overstate the evidence. The safe practice: disclose that the suspect was identified by a database search, and let the factfinder weigh it, rather than presenting a cold hit as a confirmatory test of an independently chosen suspect.
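
The phrase "many opportunities" can be given a number. The sketch below assumes a database of unrelated people, none of whom is the true source, and asks how many coincidental hits a search would be expected to produce; the RMP and database size are illustrative. It does not settle how a cold-hit statistic should be reported, which remains a matter of debate.

```python
import math


def expected_coincidental_hits(rmp, database_size):
    """Expected number of profiles in the database matching by chance alone."""
    return rmp * database_size


def prob_at_least_one_coincidental_hit(rmp, database_size):
    """1 - (1 - rmp)^N, computed stably for very small rmp."""
    return -math.expm1(database_size * math.log1p(-rmp))


rmp = 1e-6                  # illustrative: a partial profile, 1 in a million
database_size = 5_000_000   # illustrative database size

print(expected_coincidental_hits(rmp, database_size))
print(round(prob_at_least_one_coincidental_hit(rmp, database_size), 3))
```

With these inputs about five coincidental hits are expected, and at least one is nearly certain. With a full-profile RMP of one in a trillion the same search would expect a coincidental hit only rarely, which is why the concern is sharpest for partial or degraded profiles.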

A note on courtroom use. Courts have been skeptical of handing juries Bayes' theorem to compute (in a case usually cited as R v. Adams, the U.K. Court of Appeal was unenthusiastic about turning jurors into amateur statisticians). The mainstream position: the expert reports the LR and explains it plainly; the jury combines it with everything else by judgment, understanding that the DNA is one input, never the whole verdict. The Bayesian framework is invaluable as a way of understanding DNA evidence; it is contested as a procedure imposed on a jury.


E.8 Communicating a result honestly — the script

Everything converges on saying a true thing, out loud, to non-statisticians, under pressure from at least one attorney who would prefer something stronger or weaker than the truth. The entire discipline reduces to one instruction: report the strength of the evidence; never report the probability of guilt. Keep the verb on the evidence, and you stay inside the science; attach it to the person, and you have stepped over the line even if every number is correct.

🔬 At the Bench — the honest script (from Chapter 9, generalized in Chapter 30).

You MAY say:

  • "At twenty genetic locations, the defendant's profile and the crime-scene profile correspond, with no unexplained differences."
  • "The probability that a random, unrelated person from the relevant population would share this profile by chance is approximately 1 in [X] — one person in [Y] times the U.S. population."
  • "The evidence is approximately [LR] times more probable if the defendant is a contributor than if an unrelated, unknown person is."
  • "This is a mixture of at least [N] contributors; the analysis assumed [N] and compared the following propositions…"
  • "This result speaks to whether the defendant's DNA is present. It does not speak to how or when it was deposited."
  • "My method has a known, non-zero error rate; I do not, and cannot, testify to certainty."

You may NOT say:

  • "The probability the defendant is innocent is 1 in [X]." (Prosecutor's fallacy — transposed conditional.)
  • "The probability the defendant is the source is 99.99%." (A posterior; it requires a prior the scientist cannot supply.)
  • "This is a match, so he did it." (Conflates presence of DNA with guilt and with conduct.)
  • "To a reasonable degree of scientific certainty, this is his DNA and no one else's." (Overstated individualization; reserve quantified claims for the quantified statistic.)

The transfer problem, restated as a limit on every statistic. A match — even a one-in-a-trillion, single-source match — establishes only that the cells came from this person. It cannot tell you when they were deposited, how they got there (direct contact, secondary transfer, or planting), or what the person was doing. "His DNA is on the weapon" and "he wielded the weapon" are different statements; the distance between them is the distance between forensic science and a verdict. There are documented instances of a person's DNA reaching a scene they never visited, carried on an object or another person. The discipline (Chapter 31) is to keep the DNA's claim contained to source, and to evaluate the question of act on its own evidence.


E.9 Worked example: putting it together

A consolidated walk-through, with illustrative numbers, exercising every tool above.

The setup. A single-source bloodstain at a scene yields a full 20-locus profile. A suspect, identified by non-DNA investigation (a witness placed him at the scene), is typed and corresponds at all twenty loci. The laboratory computes, against the relevant population with the standard conservative correction, an RMP of 1 in 2 billion.

Step 1 — State the RMP both ways. "A randomly chosen, unrelated person from the relevant population would share this profile with probability about 1 in 2 billion — roughly one person in about six times the U.S. population." (Coincidence, not guilt.)

Step 2 — Convert to an LR (single source). Since $\text{LR} = 1/\text{RMP}$: $$\text{LR} = \frac{1}{1/2{,}000{,}000{,}000} = 2{,}000{,}000{,}000.$$ "The evidence is about two billion times more probable if the suspect is the source than if an unrelated unknown person is — very strong support for the first proposition over the second."

Step 3 — Refuse the fallacy. A detective says: "Two billion to one — there's basically no chance he's innocent." The correction: that transposes the conditional (the prosecutor's fallacy). The 1-in-2-billion figure is the chance a random unrelated person would coincidentally match; it is not the chance the suspect is innocent. The chance of innocence depends on the prior and on everything else in the case.

Step 4 — Apply the prior honestly (Bayes). A witness already placed the suspect at the scene, so the prior odds are not negligible — illustratively, 1 in 100. Then: $$\text{posterior odds} = 2{,}000{,}000{,}000 \times \tfrac{1}{100} = 20{,}000{,}000 \text{ to } 1.$$ A very strong case. But note what changed it: the prior the jury supplied, not the LR the scientist supplied. Had the suspect instead been a bare cold hit with no other connection, the same LR against a far weaker prior would leave far more doubt (E.7, Case A).

Step 5 — State the limits. The match is single-source and clean, so the mixture cautions (E.4) do not apply — but the RMP is for unrelated people (a brother is not excluded by the number alone), it says nothing about laboratory error (a separate, larger risk), and it speaks to presence of the blood, not to how the blood was shed or by whom in what act. The honest verb stays on the evidence throughout: consistent with, strongly supports — never proves, never guilty.


E.10 Quick-reference glossary

| Term | One-line definition | Honest verb / use |
|---|---|---|
| STR | Short tandem repeat; the variable, non-coding region forensic typing reads. | The unit of comparison. |
| Allele / locus / genotype | The version / the address / the pair at that address. | Building blocks of a profile. |
| DNA profile | The set of alleles across all tested loci (20-locus core panel + sex marker). | "corresponds at N loci." |
| CODIS | U.S. national DNA database; stores numbers, returns leads not verdicts. | "the database generated a candidate." |
| Random match probability (RMP) | Chance an unrelated random person coincidentally shares the profile. | "found in ~1 in [X]; a statement of coincidence." |
| Allele frequency | Proportion of chromosomes carrying an allele in a reference population. | Raw material for the RMP. |
| Likelihood ratio (LR) | How many times more probable the evidence is under $H_p$ than $H_d$. | "evidence is [LR]× more probable if $H_p$ than $H_d$." |
| DNA mixture | Sample from two or more contributors; needs deconvolution. | "consistent with being a contributor" — rarely more. |
| Probabilistic genotyping | Software computing an LR for complex/low-template mixtures. | "an LR estimated by a model, not a fact of nature." |
| mtDNA / Y-STR | Lineage markers (maternal / paternal); cannot individualize. | "consistent with this lineage," not "this person." |
| Prosecutor's fallacy | Treating $P(\text{match}\mid\text{innocent})$ as $P(\text{innocent}\mid\text{match})$. | Name it; refuse it. |
| Defense fallacy | Dismissing strong evidence by ignoring the non-DNA context. | Name it; refuse it. |
| Bayesian reasoning | posterior odds = LR × prior odds. | Scientist owns LR; jury owns prior. |

The one sentence to carry: single-source DNA statistics are foundationally valid and quantified; their honest communication is a strength-of-evidence statement about coincidence, never a statement about guilt — and the most common way valid DNA evidence misleads a jury is through the prosecutor's fallacy in the mouth of an expert or an attorney.


Cross-references: Chapter 7 (STR workflow, the DNA profile, CODIS, the RMP); Chapter 8 (touch/trace DNA, low-template stochastic effects, mixtures, mtDNA/Y-STRs, investigative genetic genealogy); Chapter 9 (the likelihood ratio, probabilistic genotyping, the prosecutor's and defense fallacies, Bayesian reasoning, honest communication); Chapter 6 and Appendix F (where these methods sit on the validity spectrum); Chapter 30 and Appendix D (presenting a statistic on the stand without overstatement); Chapter 31 (sequential unmasking and bias in interpretation).