Chapter 9 Exercises: Forensic DNA Statistics
Work these with the chapter open. Items marked with a dagger (†) have worked solutions in the answers appendix; the rest are for discussion or self-check. The set mixes recall, applied reasoning, evidence interpretation, "spot the overstatement," and ethics. No answers appear in this file.
A reminder before you start: in every numerical exercise, treat all probabilities and likelihood ratios as illustrative teaching numbers, not as figures from real cases.
Part A — Recall and definitions
-
Define allele frequency and explain, in one sentence, why it is the raw material from which a random match probability is built. †
-
State the formula for a likelihood ratio in words (numerator and denominator), and name the two hypotheses it typically compares in a forensic DNA case.
-
What does the random match probability measure? Phrase your answer as a conditional probability: "the probability of ____ given ____."
-
Define the prosecutor's fallacy in one sentence, and identify which two conditional probabilities it confuses. †
-
Define the defense fallacy in one sentence, and state the one thing it ignores.
-
What is probabilistic genotyping, and what kind of DNA sample is it designed to interpret that a human analyst struggles with?
-
Write the Bayesian odds relationship that connects prior odds, the likelihood ratio, and posterior odds. State which quantity the forensic scientist supplies and which the jury supplies. †
-
Why do the genotype frequencies at separate genetic loci get multiplied to produce a profile frequency? What property of the loci justifies the multiplication?
Part B — Applied reasoning and calculation (numbers illustrative)
-
A clean single-source crime-scene profile has a random match probability of 1 in 20 million. What is the likelihood ratio comparing "the defendant is the source" to "an unrelated stranger is the source," and why? †
-
An LR is reported as 1 in 500 (that is, 0.002). In plain language, which side does this evidence favor, and roughly how strongly?
-
The genotype frequencies at four independent loci are (illustratively) 1/10, 1/25, 1/4, and 1/40. Compute the approximate combined profile frequency, and express the result both as a fraction and in words ("about one person in …"). †
-
A laboratory reports an LR of 2,000,000 in a two-person mixture case. Write out, as two complete sentences, the prosecution proposition ($H_p$) and the defense proposition ($H_d$) that this number must be comparing.
-
Using "posterior odds = LR × prior odds," compute the posterior odds for an LR of 100,000 and prior odds of (a) 1 in 100,000 and (b) 1 in 50. Comment on how different the two conclusions are despite the identical DNA. †
-
A profile's random match probability is 1 in 10 million in a country of 70 million people. About how many people in that country would be expected to match by chance? Why does this number not, by itself, tell you the probability the defendant is the source? †
-
An analyst computes an RMP of 1 in 3 billion, taking the alternative source to be an unrelated person. The defendant has an identical twin who is also a suspect. Explain why the 1-in-3-billion figure is the wrong number for that alternative, and what kind of evidence would now be needed.
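For checking your own arithmetic in Part B, the three recurring calculations (the product rule across loci, the odds update, and the expected number of chance matches) can be sketched as small Python helpers. This is a minimal sketch, not code from the chapter: the function names are hypothetical, the sample inputs are made-up teaching numbers chosen to differ from every exercise above, and a prior stated as "1 in N" is entered as the fraction 1/N, which is an assumption about how to read that phrasing.

```python
from fractions import Fraction
from math import prod


def profile_frequency(locus_freqs):
    """Product rule: multiply genotype frequencies across loci.

    Only meaningful if the loci are treated as independent.
    """
    return prod(locus_freqs, start=Fraction(1))


def posterior_odds(lr, prior_odds):
    """Odds form of Bayes' rule: posterior odds = LR x prior odds."""
    return lr * prior_odds


def odds_to_probability(odds):
    """Convert odds in favor (e.g. 5, meaning 5 to 1) to a probability."""
    return odds / (1 + odds)


def expected_chance_matches(rmp, population):
    """Expected number of people in a population who match by chance."""
    return rmp * population


# Hypothetical inputs, deliberately not the numbers used in the exercises.
freqs = [Fraction(1, 5), Fraction(1, 8), Fraction(1, 20)]
freq = profile_frequency(freqs)
print(freq)                                  # 1/800

odds = posterior_odds(Fraction(1000), Fraction(1, 200))
print(odds)                                  # 5
print(odds_to_probability(odds))             # 5/6

print(expected_chance_matches(freq, 4000))   # 5
```

Exact fractions are used so that the "about one person in ..." phrasing can be read straight off the denominator. Substitute the inputs from whichever exercise you are working on, and write out your interpretation before you run anything.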
Part C — Evidence interpretation and "spot the overstatement"
-
For each statement, decide whether an expert may say it on the stand, and if not, name the error and rewrite it correctly: (a) "The probability the defendant is innocent is 1 in 6 million." † (b) "The evidence is 6 million times more probable if the defendant is a contributor than if an unrelated person is." (c) "This is a match, so the defendant was at the scene." (d) "A random unrelated person would share this profile with a probability of about 1 in 6 million." (e) "To a reasonable degree of scientific certainty, this DNA is the defendant's and no one else's."
-
A detective tells the press: "The lab says it's a billion to one it's our guy." Identify the fallacy, state what the billion-to-one number actually refers to, and write a one-sentence honest correction. †
-
A defense attorney argues: "Forty people in this state would match by random chance, so my client is just one of forty — that's clearly reasonable doubt." Name the fallacy and explain precisely what the argument ignores about why this defendant is on trial.
-
Two validated probabilistic-genotyping programs are run on the same complex four-person mixture and return likelihood ratios that differ by a factor of a thousand. A prosecutor says, "So one of them is just broken." Critique that interpretation. What is the more accurate way to describe the disagreement?
-
An expert testifies to an LR of 1 billion for a mixture but never states the propositions or the assumed number of contributors. List three questions a competent cross-examiner should ask, and say why each matters. †
-
Read the Evidence (build your own). Using the six-field format from the chapter (THE ITEM, THE CONTEXT, WHAT IT SHOWS, WHAT IT DOESN'T, THE INFERENCE, THE LESSON), write a block for a single-source crime-scene profile that matches a suspect with an RMP of 1 in 50 million — where the suspect is the victim's roommate. Make "WHAT IT DOESN'T" carry the transfer and prior limitations honestly.
Part D — Probabilistic genotyping and the black-box problem
-
State PCAST 2016's two-part verdict on probabilistic genotyping (simple vs. complex mixtures) in your own words, and explain why a single method can be valid in one regime and not the other. †
-
A defense attorney requests the source code of the probabilistic-genotyping program whose output was used as evidence against the defendant; the developer refuses, citing trade-secret protection. Lay out the competing interests, and explain why this is a fairness question rather than a scientific one.
-
Explain how cognitive bias can enter a probabilistic-genotyping analysis even though the computation itself is objective. Name two upstream human decisions that bias could affect, and one safeguard.
-
A mixture is suspected to contain DNA from five contributors at very low template. The software returns a number anyway. What should the analyst's report say, and why is "we got a number, so we'll report it" the wrong instinct here?
Part E — Ethics, communication, and the courtroom
-
You are an analyst, and the prosecutor in pretrial prep asks you to "just tell the jury there's a one-in-a-billion chance it's anyone else." Draft your refusal in two sentences — what you will say instead, and why you cannot say what was asked. †
-
Why is "report the strength of the evidence, never the probability of guilt" a sufficient rule to keep an expert out of both the prosecutor's fallacy and overstated individualization? Explain the connection.
-
The CSI effect (Chapter 1) is said to "cut both ways" in DNA-statistics testimony specifically. Give one way it makes a jury over-trust a DNA result and one way it makes a jury under-value one, and say how an honest expert guards against each. †
-
A suspect was identified only by a cold database search of millions of profiles, with no other connection to the crime. Explain, using the Bayesian framework, why reporting the ordinary RMP as though the suspect had been independently chosen can overstate the evidence — and what should be disclosed.
-
Construct a short courtroom exchange (in the style of the Chapter 1 cross-examination box) in which a careful analyst is pressed to commit the prosecutor's fallacy and declines, holding the line on honest language across at least four question-and-answer turns. †
Part F — Cold-case extension
-
The Mill Creek file. The state lab reports a two-person mixture from the gas-can handle: major contributor consistent with the victim Marcus Diallo, minor contributor with a likelihood ratio strongly supporting Roy Keller over an unknown unrelated person. (a) Write the two propositions ($H_p$, $H_d$) the lab must have compared. † (b) Keller is a co-owner of the property and the renovation. Explain, citing the transfer problem (Chapter 8) and the prior (§9.5), why a strong LR here is consistent with Keller but is not proof he handled the can on the night of the fire. (c) The detective's "one-in-a-billion that it isn't Keller" claim is a prosecutor's fallacy. Write the correction you would enter in the case file, naming the error and restating what the number means.
-
What would strengthen it? Independent of the DNA, list two kinds of evidence that — if found in later chapters — would raise the prior that Keller is the source, and explain (in Bayesian terms) why they would make the same LR far more probative than it is now. †
When you have finished, compare your answers to the worked solutions for the daggered items in the answers appendix, but try every problem first. The interpretation skills in this chapter are the ones most often tested under cross-examination, and they reward practice more than memorization.