Chapter 8: Advanced DNA: Touch DNA, Degraded Samples, Mixtures, and Forensic Genealogy

DataField.Dev

Prerequisites: Chapters 1, 3, 4, 6, 7

Learning Objectives

Explain what touch DNA is, why modern typing is sensitive enough to recover it, and why that sensitivity makes secondary transfer a first-order interpretive problem rather than a footnote.
Describe how degradation and low template change a DNA result — dropout, drop-in, stochastic effects — and why a partial, heat-damaged profile is weaker evidence than a clean one even when it 'matches.'
Define a DNA mixture and explain why deconvoluting two or more contributors is the hardest routine problem in forensic DNA, where interpretation (and error) re-enter a method otherwise prized for objectivity.
State when nuclear STR typing fails and what mtDNA and Y-STR analysis can and cannot add, including why both trade individualizing power for the ability to work on hopeless samples.
Explain how investigative genetic genealogy reconstructs a suspect's family tree from a distant relative's SNP profile, using the Golden State Killer case, and locate the method honestly on the validity spectrum.
Weigh the privacy and ethics of consumer genetic databases as an investigative tool, and articulate why a powerful, valid method can still be a contested one.

"The more sensitive the method, the more careful the question. A test that can find one person's cells on a doorknob can also find the cells of someone who only shook that person's hand." — constructed line, in the voice of this book; it is the warning the whole chapter turns on [constructed teaching example]

Overview

Chapter 7 gave you DNA at its best: a clean, single-source profile from a generous sample — a bloodstain, a semen stain, a swab of saliva — typed at twenty well-separated loci and reported with a random match probability so small that its reciprocal exceeds the population of the Earth several times over. That is the DNA of the textbook and the television show, and it deserves its reputation as the strongest tool in forensic science. This chapter is about everything that happens when the sample is not generous. When the "sample" is the few skin cells a hand left on a gas-can handle. When the DNA has been cooked by a fire, soaked by rain, or sitting in the sun for a week, breaking into fragments too short to read. When the swab picks up not one person but three, their profiles overlaid like three voices talking at once. And when there is no database hit at all — when the profile belongs to a stranger CODIS has never heard of, and the only way forward is to find his third cousin instead.

These are not edge cases. They are most of modern forensic DNA work. The clean single-source stain is the easy day; the hard days are touch DNA, degraded DNA, and mixtures, and the hardest interpretive judgments in the entire field live here. This is the chapter where DNA — the one method this book holds up as rigorously validated — stops being automatic and starts requiring interpretation again. And interpretation, as the whole book keeps insisting, is where human judgment, and therefore human bias and human error, climb back in through the window. A method can be founded on solid science (the second theme) and still be applied badly, at the limit of its sensitivity, by an analyst who knows which answer the detective wants (the third theme). Touch DNA and mixtures are where those two themes collide.

We will also meet this book's emblem of genuine, validated forensic progress: investigative genetic genealogy, the technique that, in 2018, used a distant relative's DNA in a public genealogy database to give a name to the Golden State Killer after forty years. It is real, it is powerful, and — unusually for a celebrated method — it is honest about what it does, because what it produces is not a courtroom conclusion at all but an investigative lead that conventional DNA then confirms. It is also, precisely because it is so powerful, one of the most ethically contested tools in the field. Holding all of that in mind at once — powerful, valid, and contested — is the kind of calibrated thinking this book is trying to build.

In this chapter, you will learn to:

Explain what touch DNA is and why its great sensitivity makes secondary transfer a central problem, not a footnote.
Describe how degradation and low-template DNA distort a profile through allele dropout and stochastic effects, and why "partial match" is a weaker statement than "match."
Define a DNA mixture and explain why deconvolution is the hardest routine problem in forensic DNA.
Say when nuclear STRs fail and what mtDNA and Y-STR typing add — and what they give up to do it.
Walk through investigative genetic genealogy with the Golden State Killer case, and place it honestly on the validity spectrum.
Weigh the privacy and ethics of genetic databases, and hold "powerful," "valid," and "contested" in mind at once.

Learning Paths

This chapter sits on top of Chapter 7 — read that first — and it is the one where DNA's certainty develops cracks. Everyone should read §8.1 and §8.3; they reshape how you hear the word "DNA" for the rest of the book.

🔎 Investigator/CSI: Your collection decisions decide whether the lab gets a usable touch sample or a contaminated mixture. Weight §8.1 (what carries touch DNA, and how you contaminate it just by breathing on it) and §8.5 (how a genealogy lead is generated, and the chain that has to stay clean for it to matter).
🧪 Lab analyst: §8.2 and §8.3 are your bench at its hardest — dropout, drop-in, stochastic thresholds, and the deconvolution of mixtures that probabilistic genotyping (Chapter 9) will try to automate. Know exactly where your interpretation stops being measurement and starts being judgment.
⚖️ Law/courtroom: §8.1's transfer problem and §8.3's mixture ambiguity are where cross-examination lives now that single-source DNA is rarely contested. The question is no longer "is it his DNA?" but "how did it get there, and are you sure it's even resolvably his?"
👥 General reader/juror: A "DNA match" on the news is doing a lot of hidden work. §8.1 and §8.3 teach you to ask the two questions that matter most: how much DNA, and how many people are in it.

8.1 Touch and trace DNA: sensitivity and its perils (secondary transfer)

Begin with a question from a real scene, the kind that drives this whole chapter. A gas can is recovered from a burned cabin. No blood on it, no saliva, no visible stain of any kind — just a plastic handle that somebody, at some point, gripped. Can we get DNA off it? And if we can, what does it mean?

The answer to the first question is, increasingly, yes. Touch DNA (sometimes called contact DNA or, more broadly, trace DNA) is genetic material recovered from the cells a person leaves behind simply by touching a surface — skin cells shed from the hands, along with sweat, sebum, and possibly cells transferred from elsewhere on the body. You shed skin cells constantly, by common estimates hundreds of thousands or more a day; some of them, and the DNA inside them, rub off onto what you handle. Touch DNA names the sample type — DNA from contact rather than from a fluid; trace DNA is the wider umbrella for any tiny, often invisible biological deposit, touch included. Modern STR typing (Chapter 7) is sensitive enough that, on a good day, the few cells on a handle, a steering wheel, a ligature, or a trigger can yield a profile.

That sensitivity is a genuine advance. It has solved cases that left no other biological evidence, and — as always in this book — it has excluded suspects who handled nothing at the scene. But sensitivity is a double-edged instrument, and the edge that cuts the wrong way is the one you must understand cold. The more sensitive the method, the less the result tells you about how the DNA arrived. A bloodstain is a strong statement that its source bled there. A few skin cells are a much weaker statement, because skin cells get around.

🔬 At the Bench Recovering touch DNA usually means swabbing or "scraping" the contact area — a moistened swab, sometimes a dry swab in tandem, sometimes adhesive tape lifts or cutting out the substrate. The yield is often below a nanogram, sometimes a few cells' worth, which pushes the sample into the low-template regime we cover in §8.2. The analyst then amplifies and types as usual, but every choice now matters more: how many amplification cycles, what peak-height threshold counts as a real result, whether to interpret a profile at all. Crucially, anti-contamination discipline becomes everything. A scene tech who touches the handle bare-handed, an analyst who speaks over an open tube, a swab that brushes a glove that brushed something else — any of these can deposit cells that did not come from the perpetrator. At this sensitivity, the lab is partly competing against itself, which is why touch-DNA workflows demand dedicated negative controls, sterile consumables, and sometimes elimination databases of staff profiles.

The deepest problem with touch DNA is secondary transfer: DNA that arrives on an object not because its source touched the object, but because its source touched something or someone else that then touched the object. Shake my hand, then pick up a knife, and your hand may deposit my cells onto the handle — even though I never touched the knife and have never been to the scene. This is not a theoretical worry. Controlled studies have repeatedly shown that DNA can transfer from person to object to object to object, and that under some conditions a person's profile can end up as the major contributor on an item they never touched. (The exact frequencies depend on how much a person sheds, the surfaces, the timing, and moisture; the field is still mapping them, and the honest summary is "it happens often enough that you cannot ignore it.")

Sit with what secondary transfer does to an inference. Single-source blood under a victim's fingernails says, roughly, "this person bled in close contact with the victim." A touch-DNA profile on a weapon says only "cells from this person are on this weapon" — and those cells may have arrived by a handshake two transfers removed. The DNA result can be flawless, the random match probability vanishingly small, the analyst entirely competent, and the inference the source touched this object can still be wrong. The error is not in the typing. It is in the leap from "whose cells" to "who did what."

🧠 Cognitive-Bias Watch Touch DNA is fertile ground for the third theme of this book — bias contaminating interpretation. A few low-level peaks at a locus might be a real allele or might be noise; whether the analyst "sees" the suspect's allele there can drift toward what the analyst expects to find. If the case file says "we like Roy Keller for this," and Keller's profile is on the analyst's screen as a reference, the temptation to read ambiguous peaks as confirming Keller is real and documented. The fix is the same one we name throughout: context management. The analyst interprets the evidence profile first, before seeing the suspect's reference and blind to it whenever possible (we build this out in Chapter 31). At the limits of sensitivity, blind interpretation is not a nicety; it is the difference between reading data and reading your own expectation.

⚖️ In the Courtroom The cross-examination of single-source DNA used to be about whether the lab made an error. The cross-examination of touch DNA is about transfer, and it is devastating when the prosecution has overreached. A competent defense attorney will not dispute that the cells are the defendant's; they will concede it and ask the question the DNA cannot answer: How did they get there, and when? "Your profile establishes whose cells these are. It does not establish that my client touched this object, does it? It is consistent with him having shaken hands with someone who did?" An honest analyst must answer that secondary transfer is possible and that the profile alone does not distinguish direct from indirect deposit. The evidence remains real and often valuable — but its honest strength is "these cells are his," not "he handled this."

The lesson of touch DNA folds neatly back into Chapter 1. A DNA profile, even a perfect one, is a statement about a source of cells, not about an act. The strength of the method has moved the battleground: we rarely argue anymore about whose DNA, and increasingly argue about how it got there. That is real progress — and a permanent new place for error to live.

8.2 Degraded and low-template samples

The clean profile of Chapter 7 assumed long, intact DNA molecules and plenty of them. Real forensic samples often violate both assumptions. A body in a burned cabin, a bone from a shallow grave, a years-old stain on outdoor concrete, a cigarette butt baked in a car — these give the lab DNA that is degraded, low in quantity, or both. Understanding what degradation and low template do to a result is essential, because they produce profiles that look superficially like clean ones but carry far less weight, and an analyst (or a juror) who does not know the difference will overread them.

Start with degradation. DNA is a long molecule, and environmental insults — heat, moisture, ultraviolet light, microbial enzymes, time — break it into shorter and shorter fragments. STR typing reads each locus by amplifying and measuring a stretch of DNA; if the molecule is broken through that stretch, the locus cannot be amplified and produces no signal. The longer STR targets break first, so a degraded sample characteristically yields a profile that is strong at the small loci and fades or vanishes at the large ones.

FIGURE 8.1 — Electropherogram of a heat-degraded single-source sample (schematic, not to scale)
        small STR loci  ───────────────────────────►  large STR loci
peak    █                                              
height  █     █                                        
        █     █     █                                   
        █     █     █     ▌                             
        █     █     █     ▌     ▕      .       .      (nothing)
       ─┴─────┴─────┴─────┴─────┴──────┴───────┴──────────────►
        D3    vWA   D16   D2    ...    larger loci drop out first

  As fragment length increases (left → right), peak heights fall and then disappear.
  This "ski-slope" shape is the signature of degradation: the long targets break first.
  A real EPG carries exact peak heights in RFU; this sketch shows only the trend.

The diagram shows the classic "ski-slope" of a degraded sample: tall, reliable peaks at the small loci on the left, declining peaks in the middle, and nothing at the large loci on the right because those fragments are broken. The practical consequence is a partial profile — a profile typed at fewer loci than the full panel. A partial profile is still useful; it can exclude a suspect outright (a single reliably typed mismatch is enough to exclude) and it can support an inclusion. But it supports that inclusion less strongly than a full profile, because fewer loci mean a larger random match probability — more people in the population share a five-locus profile than share a twenty-locus one. "Partial match" is an honest phrase that means "match, at fewer loci, with the rarity reduced accordingly," and it must never be allowed to sound as strong as a full match.
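
The arithmetic behind that weakening can be sketched in a few lines of Python. This is a minimal illustration, assuming the loci are independent so that per-locus genotype frequencies simply multiply (the product rule). The frequency of 0.1 per locus and the function name `random_match_probability` are hypothetical, chosen only to show the trend; they are not drawn from any population database.

```python
from math import prod


def random_match_probability(genotype_freqs):
    """Multiply per-locus genotype frequencies (product rule, assumes independent loci)."""
    return prod(genotype_freqs)


# Hypothetical: at every typed locus the genotype is carried by 1 person in 10.
full_profile = [0.1] * 20     # all twenty loci typed
partial_profile = [0.1] * 5   # only five small loci survived degradation

rmp_full = random_match_probability(full_profile)
rmp_partial = random_match_probability(partial_profile)

print(f"full profile:    about 1 in {1 / rmp_full:.0e}")
print(f"partial profile: about 1 in {1 / rmp_partial:.0e}")
```

Both profiles agree at every locus that was typed, yet under these assumed frequencies the partial profile is expected in vastly more people. That gap is what "with the rarity reduced accordingly" means in practice.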

🔬 At the Bench Two technical fixes help with degradation, and both have limits. Mini-STRs redesign the primers so the amplified targets are shorter, sitting closer to the repeat region; this recovers some loci that standard kits lose, because a shorter target is more likely to survive intact. And modern multiplex kits are engineered with smaller amplicons overall for exactly this reason. But you cannot amplify a sequence that has been physically severed, and no chemistry restores broken DNA. When nuclear STRs fail entirely, the sample is handed to the methods of §8.4 — mtDNA, which survives where nuclear DNA does not. Knowing which technique a hopeless sample warrants is itself a skill.

Now the related but distinct problem of low-template DNA (also called low-copy-number, or LCN, typing): not damaged DNA, but too little of it — a handful of cells, picograms of template, far below the amount the standard method was validated for. Touch DNA (§8.1) is usually low-template. When you try to type so little starting material, the random, lottery-like nature of the early amplification cycles starts to dominate, producing stochastic effects — distortions that arise because, when only a few molecules are present, chance decides which get copied. Three named consequences matter:

Allele dropout — a true allele fails to amplify and simply does not appear, so a person who is genuinely heterozygous (two different alleles at a locus) looks homozygous (one), or a locus goes blank. Dropout makes a real contributor look like a non-match if you are not accounting for it.
Allele drop-in — a spurious allele appears that belongs to no true contributor, usually a stray molecule of contaminating DNA amplified by chance. Drop-in makes the profile look like it contains someone it does not.
Heightened stochastic imbalance — the two alleles of a heterozygous locus should amplify to roughly equal peak heights; with little template, one can tower over the other or one can drop out entirely, so peak heights stop being a reliable guide to who is and isn't present.
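
A toy model makes the lottery concrete. The sketch below assumes that each template copy of an allele independently survives sampling and the first amplification cycles with some fixed probability. The function names and the survival probability of 0.5 are illustrative assumptions, not validated laboratory parameters, and real stochastic behavior is more complicated than this.

```python
def dropout_probability(copies, survival_prob):
    """Chance that none of `copies` template molecules of one allele survive
    sampling and early amplification (toy model: molecules are independent)."""
    if copies < 0 or not 0.0 <= survival_prob <= 1.0:
        raise ValueError("copies must be >= 0 and survival_prob must lie in [0, 1]")
    return (1.0 - survival_prob) ** copies


def heterozygote_looks_homozygous(copies_per_allele, survival_prob):
    """Chance that exactly one of a heterozygote's two alleles drops out,
    so the locus shows a single peak."""
    d = dropout_probability(copies_per_allele, survival_prob)
    return 2 * d * (1 - d)


# Compare a few-cell touch deposit with a generous stain (copy counts are hypothetical).
for copies in (2, 5, 20, 100):
    d = dropout_probability(copies, survival_prob=0.5)
    h = heterozygote_looks_homozygous(copies, survival_prob=0.5)
    print(f"{copies:>3} copies per allele: dropout {d:.2e}, false homozygote {h:.2e}")
```

In this model dropout is common with a handful of copies and vanishingly rare with a hundred, which is why the same chemistry behaves so differently on a bloodstain and on a doorknob.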

⚠️ Junk-Science Alert Here is the honest, uncomfortable part. Pushing DNA typing to its sensitivity floor — extra amplification cycles, interpreting a profile from a few cells — is not the same well-validated method as standard single-source typing, and it should not be sold to a jury as if it were. The reliability of very-low-template "LCN" interpretation has been seriously contested; in one prominent instance a court inquiry questioned a particular low-copy-number protocol's reliability for casework, and laboratories differ on whether and how to report such results at all. The danger is borrowing the credibility of gold-standard DNA for a result the gold-standard validation does not cover. This is the same move the book flags everywhere — a method's prestige outrunning its proof. When DNA is run at the edge, the honest analyst states that it was, reports the stochastic limits, and resists letting "DNA" do rhetorical work the actual result cannot support.

🔍 Check Your Understanding
1. A degraded sample yields strong peaks at the small loci and nothing at the large ones. Why that pattern, and what is the name for the resulting profile?
2. An analyst running a few skin cells sees a single allele at a locus where the suspect is heterozygous. Name the stochastic effect that could explain the missing allele, and say why it matters for calling a match.
3. Why is "partial match" an honestly weaker statement than "match," even when every typed locus agrees?

The throughline of this section is calibration, the same skill Chapter 1 demanded. Degraded and low-template results are not worthless — far from it — but they are graduated evidence, weaker than a clean profile in proportion to how little and how damaged the DNA was. The competent analyst reports that gradient honestly. The dangerous one lets every "DNA result" sound like the one-in-a-billion result, regardless of whether it came from a bloodstain or from three cells on a doorknob.

8.3 DNA mixtures: the hard problem

If touch DNA is where sensitivity bites back, DNA mixtures are where interpretation re-enters the method in force — and mixtures are, by a wide margin, the hardest routine problem in forensic DNA. A DNA mixture is a biological sample containing DNA from two or more contributors. Real evidence is full of them: a doorknob touched by everyone in a household, a weapon handled by several people, a sexual-assault sample combining victim and assailant, the steering wheel of a shared truck. The moment more than one person's DNA is present, the clean logic of Chapter 7 — read the two alleles at each locus, that's the donor — breaks down, because now there are more alleles at a locus than one person can own, and the analyst must work out who contributed what.

Picture a single locus where you would normally see one or two peaks. In a two-person mixture you might see four. Which two belong to contributor A and which to contributor B? If a suspect's two alleles are both somewhere in that set of four, is that meaningful — or would a large fraction of the population also "fit" into a crowd of four alleles? The problem compounds at every locus and explodes as contributors increase. This untangling has a name: deconvolution — the attempt to separate a mixed DNA profile into the individual profiles of its contributors, or at least to determine whether a given person could be among them.

FIGURE 8.2 — One locus of a two-person mixture vs. a single source (schematic)
  SINGLE SOURCE (heterozygote)            TWO-PERSON MIXTURE
  two alleles, ~equal height              up to four alleles, unequal heights
        █           █                          █                 
        █           █                          █        █        
        █           █                          █        █     █  
        █           █                          █        █     █     ▌
       ─┴───────────┴──►                      ─┴────────┴─────┴─────┴──►
        12          16                         12   14   16    18

  Single source: read the two alleles — done. Mixture: which alleles pair into which
  contributor? Peak heights hint at proportions but are unreliable at low template.
  Numbers are illustrative allele calls, not real data.

Several features make mixtures genuinely hard, and each is a place error can enter:

The number of contributors is itself an estimate. Counting alleles gives a minimum number of people, not the true number — three contributors can, by coincidence, share enough alleles to look like two. Misjudging the contributor count skews everything downstream.
Allele sharing and stacking. When two contributors share an allele, their peaks overlap and add, so peak height is an ambiguous guide to who is present and in what proportion.
Major and minor contributors. Often one person contributed most of the DNA (the major contributor) and another only a little (the minor contributor). The minor profile may be near the noise floor, riddled with the same dropout and drop-in as a low-template sample (§8.2) — so the hardest contributor to resolve is usually the one a case most wants.
Degradation on top of mixing. A mixture that is also heat-degraded or low-template (exactly the cold-case gas-can scenario) combines every difficulty at once: faded large loci, stochastic dropout, and overlapping contributors.
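
The allele-counting rule in the first point above can be written down directly. This is a minimal sketch, assuming autosomal loci where one person carries at most two alleles; the function name `min_contributors` and the allele calls are hypothetical, with locus labels borrowed from Figure 8.1.

```python
from math import ceil


def min_contributors(alleles_by_locus):
    """Lower bound on contributors: one person carries at most two alleles per
    autosomal locus, so n distinct alleles need at least ceil(n / 2) people."""
    most_alleles = max((len(set(a)) for a in alleles_by_locus.values()), default=0)
    return ceil(most_alleles / 2)


# Hypothetical allele calls at three loci of a mixed profile.
mixture = {
    "D3": [14, 15, 16, 18],
    "vWA": [16, 17, 18],
    "D16": [11, 12],
}
print(min_contributors(mixture))
```

The result is only a floor. Three contributors who happen to share alleles can show four or fewer alleles at every locus, and dropout can hide alleles altogether, so the true count may be higher than this number.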

⚠️ Junk-Science Alert Mixture interpretation has a documented history of going wrong, and it is one of the strongest cautions in this chapter. For years, a common approach was the Combined Probability of Inclusion (CPI), also known as Random Man Not Excluded (RMNE): a statistic for "the probability that a random person could not be excluded as a contributor." Used carelessly, especially on low-level or ambiguous mixtures, it produced numbers that overstated the strength of an inclusion. The problem became concrete when a U.S. crime laboratory's mixture-interpretation protocols were found wanting and a large number of past cases had to be reviewed and in some instances re-interpreted, sometimes changing the result. The lesson is not that mixtures are worthless; it is that a mixture statistic is only as good as the interpretation protocol behind it, and that complex, low-template mixtures can be at or beyond the limit of what any method can reliably resolve. Some mixtures — too many contributors, too little template — are honestly inconclusive, and "inconclusive" is a valid, ethical result that a good lab reports without embarrassment.

The state of the art has moved toward probabilistic genotyping — software that models dropout, drop-in, peak heights, and contributor proportions to compute a likelihood ratio for a proposed contributor, rather than forcing a binary include/exclude. That machinery is the subject of Chapter 9, where we also meet its "black-box" problem: several of these programs are proprietary, their source code shielded, so the defense cannot fully audit how a number that helps convict was produced. For now, hold the conceptual point: a mixture rarely yields "this is his profile." At best it yields "the evidence is X times more probable if this person is a contributor than if he is not" — a strength-of-evidence statement, with all the honesty and all the room for dispute that the likelihood ratio carries.
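
The shape of that statement is simple even though the modeling behind it is not. The sketch below shows only the final arithmetic, assuming some model has already produced the probability of the observed peaks under each hypothesis at each locus and that loci can be treated as independent. The probabilities and the function name `likelihood_ratio` are hypothetical; this is not how any particular probabilistic genotyping program computes its inputs.

```python
from math import prod


def likelihood_ratio(p_if_contributor, p_if_not_contributor):
    """LR = P(evidence | person is a contributor) / P(evidence | person is not)."""
    if p_if_not_contributor <= 0:
        raise ValueError("the alternative hypothesis must give the evidence nonzero probability")
    return p_if_contributor / p_if_not_contributor


# Hypothetical per-locus probabilities of the observed peaks under each hypothesis.
per_locus = [(0.60, 0.05), (0.45, 0.09), (0.30, 0.10)]

locus_lrs = [likelihood_ratio(hp, hd) for hp, hd in per_locus]
overall_lr = prod(locus_lrs)  # multiplying assumes the loci are independent

print(f"overall LR is roughly {overall_lr:.0f}")
```

An LR above 1 supports the proposition that the person is a contributor, an LR below 1 supports the alternative, and an LR near 1 means the evidence barely discriminates. Every input is a modeling choice, which is exactly where the room for dispute lies.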

🔬 Read the Evidence

FIGURE 8.3 — "The gas-can handle profile" [the cold case]

  THE ITEM
    A DNA profile developed from the swab of the gas-can handle recovered from the burned Mill Creek cabin — the same handle Chapter 7 sampled for touch DNA.

  THE CONTEXT
    Low-template touch deposit, partially heat-exposed near the fire. Swabbed, amplified, typed at the state lab. The electropherogram shows the ski-slope of degradation AND more than two alleles at several surviving loci.

  WHAT IT SHOWS
    This is a MIXTURE of at least two contributors, heat-degraded. One component is consistent with the victim, Marcus Diallo (his reference is on file from the autopsy). A minor component — low, partial, near the noise floor — is a second, unknown person.

  WHAT IT DOESN'T
    It does NOT, by itself, name the minor contributor; it does not say when or how those cells were deposited (secondary transfer is live); and the minor profile is too partial to treat as a clean single-source profile. The contributor count is a minimum, not a fact.

  THE INFERENCE
    Honestly: "a heat-degraded mixture; major component consistent with the victim; a minor, unknown contributor whose profile is partial." Interpreting it (a likelihood ratio for any named suspect) is Chapter 9's job, not this chapter's — and even then it will be "consistent with," never "identifies."

  THE LESSON
    Sensitivity bought us a profile from a handle; mixing and degradation took back much of its strength. The number of contributors, and how partial the minor one is, govern everything that can honestly be said next.

That figure is the heart of the chapter and the hinge of the cold case. Notice how far the honest reading sits from the television version. We did not "match the gas can to the killer." We recovered a damaged mixture, identified one component as the victim, and flagged a second, partial, unknown component — a lead, not a conclusion. Where that lead goes next is the business of §8.5 and of Chapter 9.

8.4 mtDNA and Y-STRs: when nuclear STRs fail

Sometimes the nuclear STR method of Chapter 7 simply has nothing to work with: the bone is decades old, the hair has no root, the sample is so degraded the ski-slope falls to zero. Two specialized DNA methods can sometimes speak where standard typing is silent — but both buy that ability at a steep price in discriminating power, and understanding that trade-off is the point of this section.

The first is mitochondrial DNA (mtDNA) typing. Most of the DNA this book has discussed lives in the cell nucleus, in just two copies per cell (one inherited from each parent). But the mitochondria — the cell's energy organelles — carry their own small, circular genome, and there are hundreds to thousands of mitochondria per cell. That abundance is the whole point: where the two nuclear copies may be too degraded to read, some of the many mtDNA copies often survive. So mtDNA can be recovered from samples that defeat nuclear STRs — rootless hair shafts (the kind a microscope-only hair examiner used to over-interpret, Chapter 19), old bones and teeth, badly degraded tissue, the hard cases of mass-disaster identification (Chapter 35).

The price is severe and you must state it whenever you state the benefit. mtDNA is inherited maternally and is essentially identical along a maternal line: you, your mother, your siblings, your maternal grandmother, and your mother's sister's children all (barring rare mutations) share the same mtDNA type. So mtDNA cannot individualize. It can say "this hair is consistent with this person — and with everyone in that person's maternal lineage," and it can exclude (a different mtDNA type is a clean exclusion). It narrows; it does not point to one person. Its random match probability is also far weaker than nuclear STRs — instead of one in billions, mtDNA match frequencies are often more like a fraction of a percent of the population, depending on how common the type is. Useful, sometimes decisive for exclusion or identification-by-lineage; never the one-in-a-billion statement.

🔬 At the Bench mtDNA is reported by sequencing a couple of hypervariable segments within the mitochondrial control region (the D-loop) and comparing the sequence, base by base, to a reference and to known types in a population database. Because there are so many copies and contamination is correspondingly easy, mtDNA labs are fanatical about cleanliness — a single stray maternal-line contaminant can dominate. The result is usually expressed as the questioned and known sequences being the same type and the frequency of that type in the database. Note the careful verb: same type, not same person. An honest mtDNA report never lets "consistent with this maternal lineage" drift into "this individual."
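
The "frequency of that type in the database" is, at its simplest, a count. The sketch below shows one common counting approach, including the standard upper confidence limit for a type that has never been observed. Whether a given laboratory reports it this way is an assumption here, and the database size of 1,000 and both function names are hypothetical.

```python
def counting_frequency(times_seen, database_size):
    """Observed frequency of a type in a reference database."""
    if database_size <= 0 or not 0 <= times_seen <= database_size:
        raise ValueError("need 0 <= times_seen <= database_size and database_size > 0")
    return times_seen / database_size


def upper_bound_if_never_seen(database_size, confidence=0.95):
    """Upper confidence limit on a type's true frequency when it has zero
    observations: solve (1 - p) ** N = alpha for p, giving 1 - alpha ** (1 / N)."""
    if database_size <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("database_size must be > 0 and confidence must lie in (0, 1)")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / database_size)


# Hypothetical reference database of 1,000 unrelated individuals.
print(f"type seen 3 times: {counting_frequency(3, 1000):.2%}")
print(f"type never seen:   up to {upper_bound_if_never_seen(1000):.2%} (95% limit)")
```

Either way the answer is a fraction of a percent of the population, not one in billions, and it says nothing to separate the people who share the type from one another.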

The second method is Y-STR typing: STR analysis restricted to the Y chromosome, which only males carry. Its signature use is the sexual-assault mixture where a small amount of male DNA is swamped by a large amount of female DNA — exactly the case where ordinary autosomal typing drowns the male contributor under the victim's profile. Because Y-STRs ignore the female contribution entirely, they can pull a male profile out of that overwhelming female background. They also help count and characterize the number of male contributors in a mixture.

The Y chromosome carries the same lineage limitation as mtDNA, mirrored to the paternal side. The Y is passed essentially unchanged from father to son, so a Y-STR profile is shared by all males in a paternal line — a man, his father, his brothers, his sons, his paternal uncles and their sons. A Y-STR "match" therefore means "consistent with this paternal lineage," not "this man and no other." Like mtDNA, it narrows to a family line rather than an individual, and its match statistics are correspondingly weaker than autosomal STRs.

⚖️ In the Courtroom The recurring error with both lineage markers is the same overstatement this book hunts everywhere: presenting a lineage match as an individual identification. "The mtDNA on the hair matches the defendant" is true and dangerously incomplete; the honest statement adds "and matches every maternal relative he has, and roughly p percent of unrelated people who happen to share this common type." A cross-examiner who knows this will ask, "Doesn't the defendant's brother share this exact Y-profile? His father? His paternal cousins?" — and the honest answer is yes. These methods are valuable precisely because they work on impossible samples; they are dangerous precisely because their narrowing is to a family, not a person, and juries hear "DNA match" as individual proof.

Where do mtDNA and Y-STRs sit on the validity spectrum? The underlying chemistry — sequencing, STR typing — is sound; these are not junk methods. But their discriminating power is intrinsically limited by biology, not by technique, and no improvement in the lab will ever make a maternally shared molecule individualize. They are best understood as honest, validated, lineage-level tools: strong for exclusion, strong for identification when combined with other evidence, genuinely able to work where nothing else can — and constitutionally unable to deliver the one-source certainty of clean nuclear DNA. That combination — real method, modest reach, honestly stated — is exactly the calibration this book wants you to carry.

8.5 Investigative genetic genealogy and the Golden State Killer

Now the triumph. A serial offender known by a rotating set of names (the East Area Rapist, the Original Night Stalker, eventually the Golden State Killer) committed a long series of rapes and murders across California in the 1970s and 1980s and went uncaught for decades. Investigators had his DNA, recovered and typed from crime scenes. They had searched it against CODIS (Chapter 7) for years. CODIS holds profiles from convicted offenders, arrestees, and crime scenes; if your man has never been in the system, his crime-scene profile matches nothing, no matter how clean it is. The case was cold because the answer wasn't in the database. In 2018 it was solved, not by a better STR kit but by a fundamentally different idea: investigative genetic genealogy (IGG).

The logic of IGG inverts the database problem. CODIS asks, "Is this exact person already in our criminal database?" — and for an unknown offender, the answer is no. IGG asks instead, "Are any of this person's relatives in a genealogy database?" — and because nearly everyone has dozens of third and fourth cousins, the answer, increasingly, is yes. Here is the method, step by step:

Generate a different kind of profile. Standard forensic typing reads ~20 STR loci, which is perfect for matching one profile to another but useless for genealogy. IGG re-types the crime-scene DNA across hundreds of thousands of SNPs — single-nucleotide polymorphisms, single-base differences in the genome — producing the same dense kind of profile that consumer ancestry companies generate from spit. (A SNP is a position in the DNA where individuals commonly differ by a single base; large panels of them are what genealogy matching runs on.)
Upload to a genealogy database that permits it. The SNP profile is uploaded to a service whose terms allow law-enforcement matching; in the original case this was the public site GEDmatch, where users voluntarily upload their consumer results to find relatives. The database returns not "the offender" but a list of people who share enough DNA to be distant relatives (second, third, or fourth cousins), ranked by how much they share.
Build family trees backward. Genealogists then take those cousin matches and, using traditional records — censuses, obituaries, marriage and birth records, newspapers — reconstruct the matches' family trees and work toward the common ancestors the offender must also descend from. From two or more distant matches on different sides of a tree, they triangulate downward to a small set of candidate descendants.
Narrow to a candidate, then confirm with ordinary DNA. Demographic facts from the case (approximate age, sex, geography) winnow the candidate descendants to a suspect. In the Golden State Killer case this pointed to Joseph James DeAngelo, a former police officer. Investigators then did the indispensable last step: they obtained an abandoned DNA sample from DeAngelo (from items he threw away) and ran a conventional STR comparison against the crime-scene profile. That confirmed the identification. He was arrested in April 2018 and later pleaded guilty.
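The "ranked by how much they share" step rests on simple arithmetic: on average, full nth cousins share a fraction (1/2)^(2n+1) of their autosomal DNA, so first cousins share about 12.5% and third cousins under 1%. The sketch below uses that expectation to rank hypothetical database hits and guess a cousin degree for each. The match labels and shared fractions are invented, the helper names are illustrative, and actual sharing scatters widely around the average (distant cousins can share nothing detectable), so a real estimate is a range of possible relationships, not a single answer.

```python
import math


def expected_shared_fraction(cousin_degree: int) -> float:
    """Average autosomal fraction shared by full nth cousins: (1/2) ** (2n + 1)."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


def nearest_cousin_degree(shared_fraction: float, max_degree: int = 5) -> int:
    """Pick the cousin degree whose expected sharing is closest on a log scale."""
    if shared_fraction <= 0:
        raise ValueError("shared_fraction must be positive")
    return min(
        range(1, max_degree + 1),
        key=lambda n: abs(math.log(shared_fraction) - math.log(expected_shared_fraction(n))),
    )


# Hypothetical database hits: (label, fraction of autosomal DNA shared with the sample).
matches = [("match_A", 0.0021), ("match_B", 0.0300), ("match_C", 0.0075)]

# Rank the closest relatives first, as a genealogy service's match list would.
ranked = sorted(matches, key=lambda match: match[1], reverse=True)
degrees = [nearest_cousin_degree(fraction) for _, fraction in ranked]

for (label, fraction), degree in zip(ranked, degrees):
    print(f"{label}: shares {fraction:.2%}, roughly a cousin of degree {degree}")
```

The ranked list is only the starting point for the tree-building in the next steps. It tells the genealogist roughly how far back to look for a common ancestor, not who the offender is.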

FIGURE 8.4 — How IGG inverts the database problem (schematic)
  CODIS (offender database)                 IGG (genealogy database)
  ┌───────────────────────┐                 ┌───────────────────────────┐
  │ crime-scene STR profile│                │ crime-scene SNP profile    │
  └───────────┬───────────┘                 └─────────────┬─────────────┘
              │ exact match?                               │ relatives?
              ▼                                            ▼
   ┌──────────────────┐                      ┌──────────────────────────────┐
   │  NO HIT (he's not │                      │ 3rd/4th cousins matched       │
   │  in the database) │                      │      │ build trees from records│
   └──────────────────┘                      │      ▼                        │
        cold case                            │ triangulate to candidates     │
                                             │      │ winnow by age/sex/place │
                                             │      ▼                        │
                                             │ ONE suspect → confirm with     │
                                             │ ordinary STR on abandoned DNA  │
                                             └──────────────────────────────┘
   IGG's output is a LEAD. Conventional DNA still makes the identification.

Two things about IGG matter more than any other, and they are why this book treats it as the emblem of honest progress rather than just impressive progress.

First: IGG does not "identify" anyone. It generates a lead. What comes out of the genealogy work is a name to investigate, not evidence for trial. The actual identification — the thing presented in court — is the conventional STR match between the crime-scene profile and the suspect's confirmed sample, the well-validated method of Chapter 7. This is the rare celebrated technique that is structurally modest: it hands off to gold-standard DNA for the conclusion and never pretends to be the conclusion itself. On the validity spectrum, then, IGG is best understood not as a courtroom identification method (it is not offered as one) but as an investigative lead-generation method — and as a lead generator that is then verified by the strongest method in the field, it is on extraordinarily solid ground. Its error mode is not "convicting the wrong person on genealogy" (the STR confirmation guards against that) but "following genealogical leads to the wrong family branch," which wastes effort and can intrude on innocent relatives' privacy — a real cost, but a different kind than a wrongful conviction.

Second: IGG works on the same DNA as CODIS but answers a different question. CODIS answers "is he already a known offender?" IGG answers "who are his relatives, and can we find him through them?" The genius of the Golden State Killer solution was recognizing that the offender's absence from the criminal database said nothing about his relatives' presence in consumer databases: a reservoir of millions of voluntarily uploaded profiles that no criminal-justice system built and no offender consented to populate. Which is exactly where the ethics get hard (§8.6).

🧠 Cognitive-Bias Watch IGG carries its own bias trap, distinct from bench bias. Once genealogists settle on a candidate family and a likely suspect, every subsequent fact can be read to fit — a confirmation-bias cascade running through public records rather than peak heights. The discipline's safeguard is the same in spirit as the lab's: the genealogical lead is treated as a hypothesis to be tested by independent confirmation (the abandoned-DNA STR comparison, plus conventional investigation), never as a conclusion the rest of the case must be bent to support. A genealogy that "feels right" is not a match; the STR confirmation is.

The Golden State Killer case is this book's emblem of validated forensic progress for a precise reason. It is not progress because it is dramatic (though it is). It is progress because it is honest: a powerful new way to generate a lead, married to the field's most rigorously validated method for confirming it, with the confirmation — not the genealogy — doing the work that puts a name in front of a jury. That is what real forensic advance looks like, and it is the opposite of the bite-mark trajectory we will study in Chapter 16, where a method's confidence ran far ahead of its proof.

8.6 The privacy and ethics of genetic databases (preview Ch. 38)

A method can be powerful, valid, and deeply contested all at once, and IGG is the clearest example in the book. Its power is exactly what makes it troubling, and a forensic scientist who can run a method but cannot reason about its ethics is only half-trained. We preview here what Chapter 38 takes up in full.

The core tension is consent at a distance. When you upload your DNA to a consumer genealogy service to find relatives, you are not only exposing your own genome; you are partially exposing the genomes of everyone who shares your DNA — siblings, parents, children, and cousins who never consented to anything and may not even know the database exists. IGG turns that ambient exposure into an investigative tool: a single distant cousin's voluntary upload can render an entire extended family findable by law enforcement. The offender did not consent; that is fair enough, he is a suspect. But neither did the dozens of innocent relatives whose presence in the database made him findable, and that is the part that should give you pause.

Several specific concerns recur, and a calibrated practitioner can name all of them without either dismissing IGG or romanticizing it:

Scope creep. IGG was first defended for the worst cases — serial murder and rape, the Golden State Killer. The honest question is where the line sits afterward: violent felonies only? Property crime? Identifying unknown remains (a use almost everyone supports)? Each step down expands whose family trees become police-searchable for what.
Database consent and terms. Different services have made different choices about whether to permit law-enforcement matching, sometimes changing their terms after users uploaded under the old ones. People who uploaded to find a half-sibling did not necessarily sign up to help solve a stranger's case. Informed consent is hard when the consequences run through relatives.
Innocent relatives. Being identified as the third cousin of a suspect carries no legal jeopardy, but it is an intrusion — your name in a murder file, an investigator at your relatives' doors — borne by people who did nothing.
Equity and error. Genealogy databases over-represent some populations (notably people of European descent) and under-represent others, so IGG's reach is uneven across groups. And following a genealogical lead to the wrong branch, while caught by STR confirmation before trial, still subjects the wrong family to scrutiny.
Regulation lag. The law governing what police may upload, to which databases, under what oversight, has lagged the technology. Some jurisdictions and some companies have written rules; many gaps remain.

⚖️ In the Courtroom A subtle but important legal point: because IGG produces an investigative lead rather than trial evidence, much of what is contested about it never reaches the jury directly — the jury usually hears the confirmatory STR match, not the genealogy that pointed there. That is part of why IGG has, so far, faced relatively limited courtroom challenge on reliability grounds (the reliable thing, the STR confirmation, is old and validated). The live disputes are instead about how the lead was developed — database terms of service, the propriety of the search, Fourth Amendment questions about searching consumer databases — which surface in suppression motions more than in admissibility fights over scientific validity. The validity question and the propriety question are genuinely different, and conflating them muddles the debate.

The point of this section is not to resolve the ethics; reasonable people, and reasonable jurisdictions, disagree. The point is the habit of mind. The reflex this book has trained — what does this method prove, and how do we know? — has a companion reflex for tools this powerful: who bears the cost of using it, and did they consent? IGG can be, all at once, a triumph that closed unsolvable cases, a model of methodological honesty, and a surveillance capability that outran the rules meant to govern it. Holding those three judgments together, without collapsing into either boosterism or alarm, is the mature forensic posture. We return to it, with the field's other reform debates, in Chapter 38.

🗂️ The Case File

The state lab's report on the gas-can handle comes back, and it is not the clean hit the detectives hoped for. Recall from Chapter 7 that touch DNA was recovered from the handle of the gas can found in the burned Mill Creek cabin, and a profile was developed and searched against CODIS. This chapter tells you what kind of profile it actually is.

The handle yielded a low-template, heat-degraded mixture — exactly the triple-difficulty sample this chapter has been about. The electropherogram shows the ski-slope of degradation (the large loci faded) and more than two alleles at several surviving loci, so it is a mixture of at least two contributors. One component is consistent with the victim, Marcus Diallo, whose reference profile is on file from the autopsy — unsurprising on an object in a structure he was working in. The second is a minor contributor: low, partial, near the noise floor, an unknown person who is not in CODIS.

What this does and does not establish, stated honestly:

It does establish that at least two people's cells are on the handle, one of them very likely the victim, and that a second, unknown person is present as a minor contributor.
It does not name that second person. The minor profile is too partial and too degraded to treat as a clean single-source profile, and CODIS returned no hit for it.
It does not say when or how those cells were deposited. Secondary transfer (§8.1) is fully live: a minor contributor's cells could, in principle, have reached the handle indirectly.
The "at least two" is a minimum contributor count, not a certainty.
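The "at least two" figure comes from a counting rule: one person carries at most two alleles at an autosomal locus, so the locus showing the most alleles sets a floor on the number of contributors. A minimal sketch follows, using real locus names but invented allele calls. It assumes every peak is a true allele, which is exactly what dropout, drop-in, and stutter put in doubt in a degraded, low-template sample.

```python
import math


def minimum_contributors(alleles_by_locus: dict[str, set[str]]) -> int:
    """Fewest people who could explain the observed alleles (maximum allele count rule)."""
    most_alleles = max((len(alleles) for alleles in alleles_by_locus.values()), default=0)
    # Each contributor can account for at most two alleles at any one locus.
    return math.ceil(most_alleles / 2)


# Invented allele calls at three surviving loci (illustrative, not the case data).
handle_profile = {
    "D8S1179": {"12", "13", "14"},
    "D21S11": {"28", "30"},
    "TH01": {"6", "7", "9.3"},
}

floor = minimum_contributors(handle_profile)
print(f"At least {floor} contributor(s)")
```

The result is a floor and nothing more. Contributors who share alleles, or whose alleles dropped out, leave no extra peaks, so the true number can be higher than the count suggests.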

Because CODIS came up empty on the minor contributor, the investigators do what the Golden State Killer case taught the field to do: they ask whether the unknown person can be reached through his relatives. The minor component, partial as it is, is worked up toward an investigative genetic genealogy lead — a SNP profile and a search for distant cousins (§8.5). That lead is generated here; it is not an identification, and nothing in this chapter names a suspect.

Running status. The gas-can DNA is a heat-degraded mixture — victim plus at least one unknown minor contributor — and a genealogy lead has been generated from the minor part. Interpreting that mixture (computing a likelihood ratio for any named person, and refuting the detective who will surely overstate it) is the work of Chapter 9, not this one. No one is included, no one is excluded, and the honest one-line summary is: a mixture exists, a lead exists, and interpretation is owed before any name is spoken.

Conclusion

Chapter 7 showed you DNA at full strength; this chapter showed you DNA at its limits, which is where most real casework actually lives. Touch DNA extended the method's reach to the few cells a hand leaves behind — and in doing so made secondary transfer a first-order problem, because a perfect profile is still only a statement about a source of cells, never about an act. Degradation and low template turn a clean result into a graduated one, distorted by dropout and drop-in, and "partial match" must never be allowed to sound like "match." Mixtures are the hard problem of the field, where deconvolution re-admits human judgment — and human error — into a method otherwise prized for its objectivity, and where "inconclusive" is sometimes the only honest answer. mtDNA and Y-STRs can speak on samples that defeat everything else, but only at the level of a maternal or paternal lineage, never an individual. And investigative genetic genealogy, the chapter's triumph, closed the unsolvable case by inverting the database problem — yet did so honestly, generating a lead that gold-standard DNA then confirmed, while raising privacy questions the law is still catching up to.

Two of the book's themes ran through all of it. The validity spectrum (theme two) does not stop at the door of DNA: within this single, strong discipline, a clean single-source profile and a complex low-template mixture sit far apart, and a genealogy lead is a different kind of claim again. And cognitive bias (theme three) re-enters wherever interpretation does — at the noise floor of a touch sample, in the deconvolution of a mixture, in the construction of a family tree — which is exactly why context management matters most precisely where DNA is hardest.

The cold case now has a mixture and a lead, and an honest analyst's refusal to say more than that. In the next chapter we do the thing this chapter kept deferring: we interpret. Chapter 9 takes the mixture and asks what it is actually worth — how to turn overlapping peaks into a likelihood ratio, how probabilistic genotyping computes one, and how to recognize and refute the prosecutor's fallacy the moment a detective says, "so there's only a one-in-a-billion chance it isn't him."

Key Terms

Touch DNA — genetic material recovered from the skin cells a person sheds onto an object by handling it (also contact DNA); names the sample type, usually low in quantity.
Trace DNA — the broad umbrella for any tiny, often invisible biological deposit (touch DNA included) from which a profile might be recovered.
Low-template DNA — DNA present in very small quantity (few cells / picogram amounts), below standard validation levels, where random amplification effects (stochastic effects) distort the profile.
Secondary transfer — DNA arriving on an object because its source contacted something or someone else that then contacted the object, rather than the source contacting the object directly.
Allele dropout — the failure of a true allele to amplify in a low-template sample, so a heterozygote can appear homozygous or a locus can go blank (drop-in is the converse: a spurious allele appears).
DNA mixture — a biological sample containing DNA from two or more contributors, producing more alleles at a locus than any single person can own.
Deconvolution — the attempt to separate a mixed DNA profile into its individual contributors' profiles, or to assess whether a given person could be among them.
mtDNA (mitochondrial DNA) — the small, high-copy-number, maternally inherited genome in the mitochondria; recoverable from degraded/rootless samples but shared along a maternal line, so it cannot individualize.
Y-STR — STR typing of the male-only Y chromosome; isolates a male contributor from a female-heavy mixture but is shared along a paternal line, so it identifies a lineage, not a person.
Investigative genetic genealogy (IGG) — generating a dense SNP profile from crime-scene DNA, finding distant relatives in a consumer genealogy database, and reconstructing family trees to develop an investigative lead that conventional DNA then confirms.
SNP (single-nucleotide polymorphism) — a genomic position where individuals commonly differ by a single base; large panels of SNPs are the basis of genealogy matching.

Spaced Review

A touch-DNA profile on a knife handle matches the defendant with a tiny random match probability. Name the single most important question the defense will ask that the DNA cannot answer, and the §8.1 concept behind it. (§8.1)
From Chapter 7: what does a CODIS search actually compare, and why did it return nothing for the unknown gas-can contributor — setting up the need for genealogy? (§8.5; Ch. 7)
Why can mtDNA be recovered from a rootless hair when nuclear STRs cannot — and what does mtDNA give up in exchange for that robustness? (§8.4)
Validity spectrum: Where does investigative genetic genealogy sit on the NAS/PCAST validity spectrum, and why is the answer different from where you would place the STR confirmation that follows it? (§8.5; the spectrum from Ch. 1 and Ch. 6)
From Chapter 1: restate "forensic science excludes more reliably than it proves" in the specific context of a degraded mixture — what kind of statement can the gas-can profile make cleanly, and what kind can it only make weakly? (§8.3; Ch. 1, §1.6)