Chapter 29: Emerging Technologies: AI in Forensics, Rapid DNA, Microbial Forensics, and the Future

DataField.Dev


> "Every method in this book was once an emerging technology. The bite mark was once the future, too."

Prerequisites

Chapters 1, 4, 5, 6, 7, 8, 9

Learning Objectives

Explain what rapid DNA is, what it can and cannot do in minutes rather than weeks, and why the trade for speed is a narrowing of the samples and the questions it can responsibly handle.
Define microbial forensics and the necrobiome, describe the kinds of legal questions microbial signatures might answer, and state plainly how far this field is from courtroom readiness.
Describe forensic isotope analysis and how stable-isotope ratios can suggest geographic origin and life history, while distinguishing an investigative lead from a courtroom identification.
Explain how AI and machine learning are entering forensic work, separate their genuine promise from their specific perils — black-box opacity, training-data bias, automation bias — and apply the validity yardstick to an algorithm.
Distinguish investigative genetic genealogy from familial searching, walk through the family-tree-to-suspect logic with the Golden State Killer case, and articulate the privacy and equity questions each raises.
Apply the NAS 2009 / PCAST 2016 validity framework prospectively — to evaluate a method that is not yet established — and state the questions a careful court should ask before any emerging method enters evidence.

In This Chapter

Overview
Learning Paths
29.1 Rapid DNA: minutes, not weeks
29.2 Microbial forensics and the necrobiome
29.3 Forensic isotope analysis: geographic origin
29.4 AI and machine learning in forensics (promise and peril)
29.5 Advanced genetic genealogy and familial searching
29.6 Evaluating tomorrow's methods before they enter court
🗂️ The Case File
Conclusion
Key Terms
Spaced Review



"Every method in this book was once an emerging technology. The bite mark was once the future, too." — constructed line, in the voice of this book; the warning the whole chapter turns on [constructed teaching example]

Overview

There is a particular danger in a chapter about the future, and it is the opposite of the danger that haunts the rest of this book. Everywhere else, we have been correcting overconfidence — methods admitted to courtrooms decades ago on the authority of their practitioners, never validated, sometimes catastrophically wrong. Here the temptation runs the other way: toward credulous excitement, the breathless press release, the conference demonstration, the startup that promises to do in ninety minutes, or with an algorithm, what used to take a laboratory weeks. The reflex this chapter asks you to keep is therefore the same reflex the whole book has been building, simply pointed at tomorrow instead of yesterday: what does this method actually establish, and how do we know? The fact that a technique is new — that it runs on a sleek instrument, or invokes the magic words "artificial intelligence" — is not evidence that it works. It is, if anything, a reason for more caution, because the new method has had less time to be tested, less time to fail in public, and less time to accumulate the boring validation studies that separate a real tool from a plausible one.

This chapter surveys four genuinely promising frontiers and one fast-maturing technique, and holds every one of them to the validity yardstick of Chapter 6 — the NAS 2009 and PCAST 2016 reports. Rapid DNA moves a real, validated method (the STR typing of Chapter 7) out of the laboratory and into a box on a booking-station counter, trading some of the laboratory's safeguards for speed. Microbial forensics and the necrobiome propose to read the communities of bacteria on and around a body — or transferred between people and places — as a new kind of clock and a new kind of trace. Forensic isotope analysis reads the chemical signatures that food and water write into a person's tissues, hinting at where they have lived and traveled. AI and machine learning are quietly entering nearly every discipline already covered, from fingerprint search to probabilistic genotyping to image analysis, carrying both real power and a new family of failure modes. And familial searching, alongside the investigative genetic genealogy you met in Chapter 8, extends DNA's reach from the suspect to the suspect's relatives — the most powerful, and most contested, investigative development of the era.

The unifying lesson is in the last section. Section 29.6 is this book's "evaluate before you trust" discipline applied to the future: a checklist for meeting any of tomorrow's methods at the courtroom door and asking the questions that the bite mark, the hair comparison, and the confident arson investigator were never asked in time. If you finish this chapter able to do that (to be genuinely excited about a new tool and genuinely skeptical of it, in the same breath), you have learned the most durable thing forensic science can teach.

In this chapter, you will learn to:

Explain what rapid DNA can and cannot do, and why speed is bought by narrowing the question.
Define microbial forensics and the necrobiome, and state honestly how far they are from court.
Describe forensic isotope analysis and the difference between a geographic lead and an identification.
Separate the promise of AI/ML in forensics from its specific perils, and apply the validity yardstick to an algorithm.
Distinguish familial searching from investigative genetic genealogy, and reason about the ethics of each.
Apply the NAS/PCAST framework prospectively — to a method that is not yet established.

Learning Paths

🔎 Investigator/CSI: The instruments in §29.1 are landing in booking stations and at scenes, and the decisions about what sample goes into the box are increasingly yours. Weight §29.1 and §29.5: rapid DNA's appropriate-sample question and how a genealogy or familial lead is actually generated and handed off.
🧪 Lab analyst: You are the gatekeeper between a promising method and a courtroom claim. Weight §29.4 (what a model is actually doing, and where validation must happen) and §29.6 (the evaluation discipline). The hardest professional judgment of your career may be saying "not yet" to an exciting tool.
⚖️ Law/courtroom: Every method here will reach a Daubert hearing (Chapter 5) before it reaches a verdict. Sections 29.4, 29.5, and 29.6 are yours: the black-box challenge, the difference between an investigative lead and trial evidence, and the questions that decide admissibility.
👥 General reader/juror: The future is sold to you in headlines. Sections 29.1, 29.4, and 29.6 are the antidote: how to hear "rapid," "AI," and "breakthrough" and ask the one question that matters: has anyone measured how often it is wrong?

29.1 Rapid DNA: minutes, not weeks

Begin where the public's frustration is loudest. In Chapter 4 you learned that a real crime laboratory runs on a backlog measured in weeks or months; in Chapters 7 and 8 you learned why DNA, done properly, is slow — extraction, amplification by PCR, separation by capillary electrophoresis, and interpretation by a trained analyst, each step deliberate and quality-controlled. The detective who needs to know tonight whether the man in the holding cell is the source of a crime-scene profile has, for the whole history of forensic DNA, simply had to wait. Rapid DNA is the technology built to answer that detective.

Rapid DNA is the fully automated generation of an STR profile from a reference sample (typically a cheek, or buccal, swab) inside a single self-contained instrument, without a human analyst performing the steps, in roughly ninety minutes to two hours. You put the swab in a cartridge, the cartridge in the machine, and the machine does extraction, amplification, separation, and an initial profile call end to end. The chemistry is not new; it is the same STR typing validated over thirty years (Chapter 7), miniaturized and ruggedized into a "swab in, profile out" appliance simple enough to sit on a booking-station counter and be run by a trained police officer rather than a laboratory scientist. That last fact, that it moves DNA typing out of the accredited laboratory, is the source of both its promise and every one of its problems.

The promise is real and worth stating plainly. For a single-source reference sample of good quality — a known person's buccal swab — rapid DNA does, quickly and reliably, exactly what a laboratory would do slowly. In the United States, this use was specifically enabled by federal legislation (the Rapid DNA Act of 2017) that created a pathway for approved rapid-DNA instruments to generate profiles from arrestees' reference swabs and, under defined conditions and approved processes, to search them against CODIS (Chapter 7). For booking-station identification — confirming who a person is, linking an arrestee to other cases through the database, clearing or implicating quickly — rapid DNA delivers a genuine operational advance built on a genuinely validated foundation.

🔬 At the Bench

What is actually inside the box, and what is given up to fit it there. A conventional laboratory workflow lets an analyst intervene at every step: re-extract a stubborn sample, adjust the amount of DNA going into PCR, re-run a failed injection, and, crucially, look at the data and exercise judgment before a profile is called. Rapid DNA automates all of that away. The gain is speed and simplicity; the cost is the loss of the human checkpoints. The instrument is validated to handle a clean, abundant, single-source sample, and within that envelope it performs well. Push it outside the envelope (a trace sample, a degraded sample, a mixture) and the very automation that makes it fast removes the analyst who would have recognized the problem. A rapid-DNA instrument does not know that it is in trouble; it returns a result with the same confident formatting whether the input was a pristine buccal swab or a smear of mixed, degraded cells it should never have been asked to type.

That envelope is the entire interpretive content of this section, so let us be exact about its edges. Rapid DNA is designed and validated for reference samples: known individuals, swabbed under controlled conditions, yielding plentiful single-source DNA. The hard, ambiguous, evidentiary samples that dominate real casework — the low-template, heat-degraded mixture on a gas-can handle, for instance — are precisely the samples rapid DNA is least suited to, because they demand the human interpretation (Chapter 8) and the probabilistic reasoning (Chapter 9) that the automated box was built to skip. Running a complex crime-scene sample through a rapid-DNA instrument, outside an accredited laboratory and without a qualified analyst, is not a faster version of good forensic DNA; it is a different and weaker thing wearing the same name.
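The envelope described above can be written down as an explicit gate. The following is a minimal sketch, not any vendor's or agency's actual triage logic: the `Sample` fields, the `route` function, and the routing labels are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical description of a sample; field names are illustrative."""
    kind: str            # "reference" (known person) or "evidence" (crime scene)
    single_source: bool  # False when a mixture is suspected
    degraded: bool       # heat, age, or environmental damage suspected
    low_template: bool   # trace or touch quantities of DNA


def outside_envelope_reasons(sample: Sample) -> list[str]:
    """List every reason the sample falls outside the envelope in the text.

    An empty list means the sample looks like the clean, abundant,
    single-source reference swab the section describes.
    """
    reasons = []
    if sample.kind != "reference":
        reasons.append("not a reference sample from a known individual")
    if not sample.single_source:
        reasons.append("possible mixture")
    if sample.degraded:
        reasons.append("degraded")
    if sample.low_template:
        reasons.append("low-template (trace) sample")
    return reasons


def route(sample: Sample) -> str:
    """Send a sample to the rapid instrument only when no exclusion applies."""
    if outside_envelope_reasons(sample):
        return "accredited laboratory"
    return "rapid DNA"


buccal_swab = Sample("reference", single_source=True, degraded=False, low_template=False)
gas_can_handle = Sample("evidence", single_source=False, degraded=True, low_template=True)

print(route(buccal_swab))
print(route(gas_can_handle))
print(outside_envelope_reasons(gas_can_handle))
```

The point of the sketch is where the judgment sits: the decision about whether a sample belongs in the box is made before the instrument runs, by a person who knows what the sample is, because the instrument itself will not refuse it.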

⚠️ Junk-Science Alert

The danger of rapid DNA is not the chemistry, which is sound. The danger is scope creep dressed as progress: the slide from "fast typing of clean reference swabs" to "field DNA analysis of crime-scene evidence at the scene, by non-analysts, with no laboratory in the loop." The instrument's marketing momentum pushes toward the second use; the science supports only the first. A profile is only as trustworthy as the sample behind it and the interpretation in front of it, and a machine that removes the analyst has removed the person who would have said "this sample is a mixture, stop." When you hear that an agency is running evidentiary samples through rapid DNA in the field, the right response is not excitement. It is the question this book always asks: what is the validated envelope, and is this sample inside it?

Where does rapid DNA sit on the validity spectrum? This is a subtler placement than usual, because the answer depends entirely on the use. For approved reference-sample typing inside its envelope, rapid DNA inherits the strong validity of STR typing itself — it is the gold-standard method, automated, and it sits near the top of the spectrum. For complex evidentiary samples outside its envelope, it is not a validated method at all; it is an automated instrument being misapplied, and the appropriate weight is little to none. The lesson generalizes to everything else in this chapter: a method's validity is never a property of the instrument alone, but of the instrument applied to a particular sample to answer a particular question. Rapid DNA is the cleanest example in the book of a single technology that is simultaneously excellent and dangerous, depending only on whether it is kept inside the lines.

🔍 Check Your Understanding

1. A rapid-DNA instrument returns a clean profile from an arrestee's buccal swab in 100 minutes. Is this a legitimate use? Now the same instrument is used on a degraded crime-scene swab at a scene. What specifically has changed, and which earlier chapter's problem (touch DNA, mixtures) has been ignored?
2. Why does removing the human analyst (the feature that makes rapid DNA fast) also remove a safeguard? Name one thing an analyst would catch that the box will not.

29.2 Microbial forensics and the necrobiome

Now to a frontier far earlier in its development, and therefore one to discuss with correspondingly heavier caveats. Every human body is host to vast communities of microorganisms — bacteria, archaea, fungi, viruses — collectively the microbiome, and these communities live on the skin, in the gut, in the mouth, and almost everywhere else. Microbial forensics is the application of the analysis of these microbial communities to legal questions: estimating time since death from the microbes that colonize a corpse, associating a person with a place or object through transferred skin microbes, or — in its oldest and most established branch — attributing a deliberate biological attack to its source organism and origin. It rests on a real and rapidly advancing science (the sequencing of microbial communities), and it proposes several genuinely intriguing forensic uses. It is also, for most of those uses, nowhere near a courtroom, and a responsible treatment says so first and loudest.

The use most relevant to a death investigation is the necrobiome — the community of organisms, microbial and otherwise, that participates in the decomposition of a body and changes in a characteristic, time-dependent way after death. The idea is a microbial cousin of the entomological clock you met in Chapter 13. Just as insects colonize remains in a predictable succession (and just as that succession can be read, with care, as an estimate of the postmortem interval), the microbial communities on and within a corpse — and in the soil beneath it — shift through a reproducible sequence as decomposition proceeds. Researchers have proposed that sequencing these communities and reading where they sit in that sequence could yield an estimate of the postmortem interval (PMI) — the time since death we first met in Chapter 11 — potentially extending the reach of PMI estimation into intervals where the pathologist's early markers and even the insects have become unreliable.

🔬 At the Bench

How the analysis works, in outline. A swab or tissue or soil sample is taken, the microbial DNA is extracted, and a marker region (often a particular gene that differs enough between microbial taxa to identify them) is sequenced in bulk, producing a census of which microbes are present and in what relative abundance. That community profile is then compared against reference data describing how such communities change over time after death, under given conditions. The promise is a clock that keeps running long after the insects have finished and that might work on remains where other methods fail. The catch is in the phrase "under given conditions": microbial succession, like insect succession, is exquisitely sensitive to temperature, soil chemistry, moisture, the individual's own starting microbiome, and a dozen other variables, and the reference data needed to account for all of that, across the range of real-world scenes, are still being assembled.
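The "census, then compare against reference data" step can be illustrated with a toy calculation. This is a sketch under loud assumptions: the taxa names, read counts, and reference days are invented, and nearest-profile matching by Bray-Curtis dissimilarity is a simple stand-in chosen for clarity, not a description of how any published or validated PMI model works. It also ignores temperature, soil, and every other condition the passage warns about.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p: dict[str, float], q: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(p) | set(q)
    shared = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in taxa)
    return 1.0 - 2.0 * shared / (sum(p.values()) + sum(q.values()))


def closest_reference(sample_counts, reference_counts_by_day):
    """Return (day, dissimilarity) for the reference profile nearest the sample."""
    sample = relative_abundance(sample_counts)
    scored = {
        day: bray_curtis(sample, relative_abundance(counts))
        for day, counts in reference_counts_by_day.items()
    }
    best_day = min(scored, key=scored.get)
    return best_day, scored[best_day]


# HYPOTHETICAL reference data: days since death -> read counts per taxon.
reference = {
    2: {"taxon_A": 80, "taxon_B": 15, "taxon_C": 5},
    10: {"taxon_A": 30, "taxon_B": 50, "taxon_C": 20},
    30: {"taxon_A": 5, "taxon_B": 25, "taxon_C": 70},
}
# HYPOTHETICAL case sample.
case_sample = {"taxon_A": 280, "taxon_B": 520, "taxon_C": 200}

day, distance = closest_reference(case_sample, reference)
print(day, distance)
```

Note what the sketch cannot tell you: a nearest reference day is not an estimate with a known error. Turning it into one requires exactly the condition-spanning reference data and validation studies the text says are still being assembled.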

Two other microbial-forensic ideas deserve mention, with the same sobriety. The first is microbial trace evidence: because each person's skin microbiome is somewhat individualized and is shed onto the objects and surfaces they touch (a microbial echo of Locard's exchange principle, Chapter 3), some researchers have proposed using transferred skin bacteria to associate a person with a touched object or place, as a microbial complement to touch DNA. The second is the field's most mature branch, microbial source-tracking in a bioterrorism or biocrime context: when a pathogen is deliberately released, characterizing the organism's strain and genetic features can help attribute it to a particular source culture or origin, the discipline that came to prominence in the investigation of the 2001 anthrax-letter attacks in the United States. This attribution work is the most developed corner of microbial forensics precisely because it has had the most time, the most resources, and the most rigorous scrutiny.

⚠️ Junk-Science Alert

Microbial forensics for PMI and for trace association is promising research, not established forensic practice, and the gap between those two things is exactly what this book exists to make you respect. The underlying science (microbial sequencing) is real and powerful; the forensic application lacks, as yet, the large, reproducible, condition-spanning validation studies and the measured error rates that the PCAST 2016 report demands before a feature-comparison method should carry weight in court. The correct posture toward a microbial-PMI estimate today is the posture this book recommends toward any method without an established error rate: interested, hopeful, and unwilling to let it convict anyone. The history of forensic science is a graveyard of plausible-sounding methods that entered courtrooms before they were validated. Microbial forensics has the chance to be the rare method that earns its place first. Whether it does depends on the validation, not the promise.

Where on the validity spectrum does microbial forensics sit? Honestly: emerging — below the established methods, not because its science is unsound but because its forensic validation is incomplete. Its bioterrorism-attribution branch is the most mature; its PMI and trace-association uses are research frontiers with real potential and real unanswered questions about reproducibility and error. The point of including it here is not to teach a method you will use tomorrow but to practice the discipline of §29.6: meeting a genuinely exciting science at the door and asking, without embarrassment, where are the validation studies, and what is the error rate? — and being willing to wait for the answer.

29.3 Forensic isotope analysis: geographic origin

Some of the most evocative emerging evidence does not identify a person at all; it tells you where a person — or an unidentified body — has been. Forensic isotope analysis is the measurement of the ratios of stable isotopes of common elements (such as hydrogen, oxygen, carbon, nitrogen, sulfur, and strontium) in biological tissues to infer information about an individual's geographic origin, travel history, and diet. It does not say who someone is; it constrains where they likely lived and what they likely ate, which can be precious when an unidentified body has no name, no dental records, and no database hit — narrowing the search, generating leads, sometimes giving the unidentified dead a region to come home to.

The principle is genuinely elegant, and it rests on solid chemistry. A stable isotope is a non-radioactive variant of an element with a different number of neutrons (and thus a slightly different mass); elements occur naturally as mixtures of isotopes, and the ratio of one isotope to another varies measurably from place to place and food to food. The water you drink carries a hydrogen-and-oxygen isotopic signature that varies with geography and climate — the ratios in rainwater differ systematically across regions — and that signature is incorporated into your tissues. The food you eat carries carbon and nitrogen signatures that reflect the kinds of plants at the base of your food web and your position in it. Strontium isotopes in your teeth and bones reflect the geology of the place whose food and water you consumed. Because different tissues form and turn over at different rates — tooth enamel locks in a signature from childhood and never changes; hair grows out as a continuous tape recording of the last months; bone remodels over years — a single body can carry, in different tissues, a layered record of where this person was at different times of life.
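Because the differences in isotope ratio between places are tiny, stable-isotope measurements are conventionally reported in delta notation: the sample's ratio relative to a reference standard, expressed in parts per thousand (per mil, ‰). A minimal sketch of that arithmetic follows. The function name is hypothetical, and the ratio values are illustrative placeholders rather than measurements from any case.

```python
def delta_per_mil(r_sample: float, r_standard: float) -> float:
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000.

    R is the ratio of the heavy to the light isotope (for example 18O/16O).
    Negative values mean the sample is depleted in the heavy isotope
    relative to the standard; positive values mean it is enriched.
    """
    if r_standard <= 0:
        raise ValueError("standard ratio must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


# ILLUSTRATIVE ratios only; a real laboratory uses the certified value
# of whichever reference standard applies to the element being measured.
R_STANDARD = 0.0020052
R_SAMPLE = R_STANDARD * 0.99  # a sample 1% lower in the heavy isotope

print(delta_per_mil(R_STANDARD, R_STANDARD))
print(delta_per_mil(R_SAMPLE, R_STANDARD))
```

A one percent difference in the raw ratio is a delta of about ten per mil, which is why the notation exists: it turns differences in the fourth or fifth decimal place of a ratio into numbers a reader can compare against a reference map.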

🔬 Read the Evidence

FIGURE 29.1: "What the tissues remember" [constructed teaching example]

THE ITEM: Unidentified skeletal and soft-tissue remains. Samples taken: tooth enamel (forms in childhood, never remodels), a length of scalp hair (records the last several months, root to tip), and cortical bone (remodels over years).
THE CONTEXT: No name, no dental match, no database hit. Stable-isotope ratios (oxygen/hydrogen for water, carbon/nitrogen for diet, strontium for geology) measured in each tissue and compared to published regional reference maps.
WHAT IT SHOWS: The enamel signature is consistent with a childhood spent in one broad climatic/geological region; the bone with adult years in a different region; the hair with a dietary/water shift in the final months. Together, a layered life-history sketch.
WHAT IT DOESN'T: It does not name the person, fix a specific town, or prove a route. Many places share similar isotopic values; reference maps are coarse and uneven; individual variation and unusual diets blur the signal. It is a *region*, not an address.
THE INFERENCE: The remains are *consistent with* an individual who spent childhood in region A, adulthood in region B, and changed diet/location shortly before death. This is an investigative lead to narrow a missing-persons search, not an identification.
THE LESSON: Isotopes answer "from roughly where, and eating roughly what," not "who." Their honest output is a region and a life-history sketch that *narrows*, never a name.

Read that figure carefully, because it contains both the power and the limit. The power: a layered, tissue-by-tissue reconstruction of a life's geography from a body that can otherwise say nothing — a real contribution to identifying the unidentified, and to confirming or refuting a claimed origin or travel history. The limit: every inference is to a region, often a broad one, because many places share similar isotopic signatures, the reference maps that translate a measured ratio into a geographic range are coarse and incomplete (especially outside well-studied regions), and individual idiosyncrasies — an unusual diet, bottled water from elsewhere, a recent move — can scatter the signal. Isotope analysis narrows the field; it does not close it. That is the exclusion-over-proof theme (Theme 1) in a new dialect: the evidence's honest job is to shrink the universe of possible origins, not to pin one.
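"Narrows the field; does not close it" has a simple computational shape: the honest output is the set of every region the measurement cannot rule out. The sketch below assumes an invented reference table and an invented measurement. The region names, ranges, and the `consistent_regions` helper are hypothetical, and real reference maps are continuous surfaces with their own uncertainty rather than tidy intervals.

```python
# HYPOTHETICAL reference table: region -> (low, high) expected delta range, per mil.
REFERENCE_RANGES = {
    "region A": (-12.0, -8.0),
    "region B": (-9.0, -5.0),
    "region C": (-4.0, -1.0),
    "region D": (-11.0, -7.5),
}


def consistent_regions(measured: float, uncertainty: float, ranges: dict) -> list[str]:
    """Return every region whose range overlaps measured +/- uncertainty.

    The result is a list on purpose: overlapping ranges mean several
    regions usually survive, and the method excludes rather than selects.
    """
    low, high = measured - uncertainty, measured + uncertainty
    return sorted(
        name
        for name, (region_low, region_high) in ranges.items()
        if low <= region_high and high >= region_low
    )


# Invented measurement for one tissue.
survivors = consistent_regions(measured=-8.5, uncertainty=0.5, ranges=REFERENCE_RANGES)
excluded = sorted(set(REFERENCE_RANGES) - set(survivors))

print("cannot be ruled out:", survivors)
print("excluded:", excluded)
```

With these made-up numbers one region is excluded and three remain. That is the realistic outcome to expect: a shorter list for investigators to work through, with no single entry singled out by the chemistry.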

⚖️ In the Courtroom

Isotope evidence has a defined and defensible role, and a tempting overreach. Its defensible use is investigative: narrowing a missing-persons search, prioritizing leads, testing whether an unidentified body's likely origin matches a candidate. Used that way, it is a real and valuable tool with a sound chemical basis. The overreach is to present an isotopic "match" to a region as though it identified an individual or proved a specific itinerary. An honest expert testifies to consistency with a region and to the coarseness of the reference data, quantifies confidence in cautious words, and concedes that many origins could produce a similar signature. "The isotopes are consistent with childhood in this broad region" is defensible; "the isotopes prove she grew up in this town" is not. As with palynology (Chapter 13), the chemistry is sound but the forensic precision is far lower than the instrument's decimal places suggest, and the discipline is in matching the claim to the evidence's true resolution.

On the validity spectrum, forensic isotope analysis occupies a position much like the geographic and associative methods of Part IV: the underlying measurement is rigorous analytical chemistry (the same instrumental confidence as Chapter 23's methods), but the forensic inference from measurement to geography is probabilistic, coarse, and reference-data-limited. It is strong for what it honestly claims — a region, a life-history sketch, an investigative lead — and weak the moment it is asked to identify a person or prove an itinerary. Sound instrument, modest inference: a pattern you should now recognize as the signature of an honest emerging method.

29.4 AI and machine learning in forensics (promise and peril)

No emerging technology is reshaping forensic science more broadly, or more quietly, than this one — and none demands the validity yardstick more urgently. AI/ML in forensics is the use of artificial-intelligence and machine-learning systems — algorithms that learn patterns from large sets of training data rather than following rules a human wrote explicitly — to perform or assist forensic tasks: searching fingerprint and face databases, deconvoluting DNA mixtures, detecting deepfakes, triaging digital evidence, comparing images, and more. It is already embedded in tools you have met in earlier chapters: the AFIS that searches a latent against millions of prints (Chapter 14), the probabilistic genotyping software that interprets mixtures (Chapter 9), the systems that flag child-exploitation material in digital forensics (Chapter 25). The question is no longer whether AI is in forensic science. It is whether we are evaluating it with the rigor its power demands — and, too often, the answer is no.

Start with the promise, because it is genuine. Machine-learning systems can find faint, complex patterns in data that exceed human perception; they can search enormous databases in seconds; they are tireless and, unlike a human examiner, do not get bored, anxious, or swayed by what the detective said in the hallway. A well-built, well-validated model applied to the right problem can outperform human analysts on narrow, well-defined tasks, and can do so consistently. For a field whose central weakness, the whole book has argued, is the cognitive bias of the human examiner (Theme 3), the prospect of a tool that does not share those biases is genuinely appealing. That appeal is real, and a reflexive dismissal of AI would be as foolish as a reflexive embrace.

But the perils are specific, and they form a new family of failure modes that every forensic practitioner must learn to name. Three matter most.

The first is the black box. Many machine-learning systems — especially the most powerful "deep learning" ones — produce an output without producing a human-understandable reason for it. The model says "match," or assigns a number, but cannot explain, in terms a person can examine and a court can probe, why. This collides head-on with the entire premise of forensic testimony, which is that an expert's reasoning can be explained, scrutinized, and cross-examined (Chapter 30). An algorithm that cannot show its work is, from the law's perspective, an expert who refuses to answer questions. The defense cannot probe the basis of the conclusion; the jury cannot weigh the reasoning; the Daubert gatekeeper (Chapter 5) cannot assess a method whose inner logic is opaque even to its builders. A correct answer with no auditable reasoning is not, by itself, admissible science — it is an oracle, and oracles have no place on a witness stand.

🧠 Cognitive-Bias Watch

AI does not abolish cognitive bias; it relocates and sometimes amplifies it. Two mechanisms deserve names. The first is automation bias: the well-documented human tendency to over-trust a machine's output, to defer to it precisely because it is a machine and feels objective. An examiner who would have scrutinized a colleague's conclusion may wave through an algorithm's, reasoning that "the computer found it." The cure for examiner bias becomes a new source of error if the human stops thinking. The second is bias laundering: a model trained on historically biased data learns the bias and then re-emits it wearing the costume of mathematical objectivity. If the training data encode the skewed enforcement patterns of the past, the model reproduces them, but now the discriminatory output carries the false authority of an "objective algorithm," which is harder to challenge than an obviously fallible human. The machine did not remove the bias; it hid it behind a decimal point. Watch for both: the human who defers too readily, and the model that has quietly inherited the prejudices of its training set.

The second peril is training-data bias, which the Cognitive-Bias Watch just named and which deserves its own emphasis. A machine-learning model is only as good as the data it learned from. If a facial-recognition system was trained predominantly on faces of one demographic group, it will perform worse — produce more false matches — on the underrepresented groups, a disparity documented across multiple commercial systems. If a recidivism-prediction tool learned from arrest data shaped by decades of uneven policing, it will encode and reproduce that unevenness. The model does not invent fairness; it inherits the statistics of its training set, including their injustices. And because the output looks like neutral computation, the inherited bias is less visible and harder to contest than the same bias in an openly human judgment.
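One way to see why a single headline accuracy figure can hide this disparity is to compute the false-match rate separately for each subgroup. The sketch below uses invented trial outcomes. The group labels, counts, and function name are hypothetical and do not describe any real system or published evaluation.

```python
from collections import defaultdict


def false_match_rate_by_group(trials):
    """False-match rate per group.

    Each trial is (group, predicted_match, truly_same_person). Only
    comparisons of different people can produce a false match, so the
    denominator is the number of truly-different comparisons per group.
    """
    false_matches = defaultdict(int)
    different_person_trials = defaultdict(int)
    for group, predicted_match, truly_same in trials:
        if not truly_same:
            different_person_trials[group] += 1
            if predicted_match:
                false_matches[group] += 1
    return {
        group: false_matches[group] / n
        for group, n in different_person_trials.items()
    }


# INVENTED outcomes: 1,000 different-person comparisons per group.
trials = (
    [("well represented", True, False)] * 1
    + [("well represented", False, False)] * 999
    + [("underrepresented", True, False)] * 20
    + [("underrepresented", False, False)] * 980
)

per_group = false_match_rate_by_group(trials)
pooled = sum(1 for _, predicted, same in trials if predicted and not same) / len(trials)

print(per_group)
print(pooled)
```

In this made-up example the pooled rate looks small, while one group's rate is twenty times the other's. A validation report that gives only the pooled figure would conceal precisely the disparity the paragraph above describes.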

The third peril is the validation gap — and it is the one this whole chapter has been building toward. Too many AI forensic tools are deployed without the kind of independent, blind, adversarial validation that the PCAST 2016 report demanded of any feature-comparison method. A vendor's claim that its system is "99% accurate" is a marketing statement, not a validation study, until you know: accurate on what test set, drawn from what population, measured against what ground truth, with what false-positive and false-negative rates across which subgroups, validated by whom with no stake in the outcome. The same questions Chapter 6 taught you to ask of the bite mark and the hair comparison apply, without modification, to the algorithm. A neural network is not exempt from the validity spectrum because it is sophisticated. If anything, its opacity makes the demand for measured error rates more essential, not less.
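A short piece of arithmetic shows why "99% accurate" settles so little even if taken at face value. The sketch assumes, purely for illustration, that the claim means a 99% true-match rate and a 1% false-match rate per comparison, that every entry scoring above threshold is flagged, and that the true source appears in the database exactly once. None of these assumptions comes from any vendor; the function names are hypothetical.

```python
def expected_false_candidates(false_match_rate: float, database_size: int) -> float:
    """Expected number of wrong entries flagged when one probe is compared to all."""
    return false_match_rate * database_size


def share_of_flags_that_are_correct(
    true_match_rate: float, false_match_rate: float, database_size: int
) -> float:
    """Expected correct flags divided by expected total flags.

    ASSUMPTION: the true source is enrolled exactly once, so there is one
    same-person comparison and (database_size - 1) different-person ones.
    """
    correct = true_match_rate * 1
    wrong = false_match_rate * (database_size - 1)
    return correct / (correct + wrong)


# ASSUMED reading of "99% accurate"; the claim itself does not specify this.
TRUE_MATCH_RATE, FALSE_MATCH_RATE = 0.99, 0.01

for size in (100, 10_000, 1_000_000):
    print(
        size,
        expected_false_candidates(FALSE_MATCH_RATE, size),
        share_of_flags_that_are_correct(TRUE_MATCH_RATE, FALSE_MATCH_RATE, size),
    )
```

Under these assumptions the same "99%" system flags about as many wrong people as right ones in a database of one hundred, and overwhelmingly wrong people in a database of a million. The number in the brochure is unchanged; what changes is the question being asked of it, which is why the test set, the population, and the error rates all have to be stated.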

🔬 Read the Evidence

FIGURE 29.2: "The confident algorithm" [constructed teaching example]

THE ITEM: A facial-recognition system returns a candidate "match" between a grainy surveillance still and a driver's-license photo, reported with a similarity score of 96%.
THE CONTEXT: The vendor advertises "99% accuracy." No independent validation on this population is on file; the training set's demographic composition is undisclosed; the model cannot explain which features drove the score.
WHAT IT SHOWS: The system ranked this photo as its top candidate against its database, by its own internal similarity metric, under its own undisclosed training.
WHAT IT DOESN'T: It does NOT establish identity. "96%" is the model's internal score, not a probability that this is the right person; "99% accuracy" is unverified marketing on an unknown test set; false-match rates may be far higher for this image quality and this subgroup; the model's reasoning is unauditable.
THE INFERENCE: At most, an *investigative lead*: a candidate to investigate by independent means, exactly as IGG produces a lead that DNA must confirm (§29.5). NEVER an identification on the strength of the score.
THE LESSON: A similarity score is not a probability, and a vendor's accuracy claim is not a validation study. Treat an algorithm's output as a lead to be confirmed, demand the measured error rate, and distrust any number whose basis you cannot audit.

The schematic above generalizes into a discipline. An AI forensic tool should be treated exactly as Chapter 8 taught you to treat investigative genetic genealogy: as a powerful generator of leads that must be confirmed by independent, auditable, validated means — never as a source of conclusions in itself. Where an algorithm's output can be confirmed by a transparent method (a human examiner re-doing the comparison with documented reasoning; a conventional DNA match confirming a database candidate), the algorithm has done legitimate work. Where its output is the conclusion, unexplained and unvalidated, it is the bite mark of the twenty-first century: a confident claim of identification with no auditable basis and no measured error rate. The form is new; the failure mode is the oldest one in this book.
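Why a flagged candidate is only a lead can be worked through with Bayes' rule. The sketch below is a simplified model under stated assumptions: the function name `posterior_probability`, the database size, and both error rates are invented for illustration, and it treats a "match" as a yes/no flag instead of modeling a ranked candidate list. Note that the 96% similarity score from Figure 29.2 is not an input at all, because it is not a probability.

```python
def posterior_probability(prior, true_match_rate, false_match_rate):
    """Bayes' rule: P(same person | the system reports a match)."""
    hit_if_same = prior * true_match_rate
    hit_if_different = (1.0 - prior) * false_match_rate
    return hit_if_same / (hit_if_same + hit_if_different)


# Hypothetical numbers, chosen only to show the shape of the problem.
DATABASE_SIZE = 1_000_000   # photos searched
TRUE_MATCH_RATE = 0.99      # flags the right person when present
FALSE_MATCH_RATE = 0.001    # flags a given wrong person 1 time in 1,000

# Before the search, any one photo has about a 1-in-a-million chance of
# being the person in the still (assuming that person is in the database).
database_hit = posterior_probability(1 / DATABASE_SIZE, TRUE_MATCH_RATE, FALSE_MATCH_RATE)

# Contrast: the same flag when independent evidence already made the
# candidate as likely as not to be the source.
corroborated_hit = posterior_probability(0.5, TRUE_MATCH_RATE, FALSE_MATCH_RATE)

print(f"cold database hit:     {database_hit:.4%}")
print(f"independently pointed: {corroborated_hit:.4%}")
```

Under these assumed rates, the same flag is almost worthless after a cold search of a million photos and nearly decisive when independent evidence already pointed at the candidate. The weight comes from the prior and the measured error rates, which is why an unmeasured false-match rate leaves the output uninterpretable.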

Where do AI/ML methods sit on the validity spectrum? It depends entirely on the specific tool, the specific task, and — above all — on whether it has been independently validated with measured, subgroup-specific error rates and auditable reasoning. A narrowly scoped, transparently validated model used as a lead generator and confirmed downstream can sit high. An opaque, vendor-validated, never-independently-tested system offered as proof of identity sits at the bottom, with the discredited methods, no matter how advanced its architecture. The validity spectrum does not care how new or how clever a method is. It cares whether anyone has measured how often it is wrong — and AI, for all its promise, has too often skipped that step.

29.5 Advanced genetic genealogy and familial searching

Chapter 8 introduced investigative genetic genealogy (IGG) and used the Golden State Killer case to show how a distant cousin's DNA, voluntarily uploaded to a consumer genealogy database, could give a name to an offender CODIS had never heard of. This section advances that anchor from the emerging-technology angle — the mechanics that make it work, the related-but-distinct technique of familial searching, the standardization and policy questions that have crystallized since 2018, and the privacy and equity tensions that make this the most contested forensic development of the era. The Golden State Killer case is the load-bearing example, and Case Study 29.1 takes it deeper still; here we use it to draw a distinction that confuses even practitioners: IGG and familial searching are not the same thing, and conflating them muddies both the science and the ethics.

Start with the distinction, because everything follows from it. Familial searching is the deliberate search of a criminal DNA database (in the United States, CODIS) for a partial match to a crime-scene profile — a profile that is similar but not identical to a known offender's — on the theory that a close biological relative (a parent, child, or sibling) of that known offender may be the true source of the crime-scene DNA. It uses the same STR profiles and the same criminal database you met in Chapter 7; it simply searches them differently, looking for the near-misses that kinship produces rather than the exact match it normally seeks. Investigative genetic genealogy, by contrast, uses different genetic markers (hundreds of thousands of SNPs, not ~20 STRs) in a different kind of database (consumer genealogy services, not CODIS) and reaches much more distant relatives (third and fourth cousins) through family-tree reconstruction. Familial searching reaches a close relative already in the criminal system; IGG reaches a distant relative in a consumer system. Different markers, different databases, different reach, different law.
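The "near-miss" logic of familial searching can be sketched as an allele-sharing count. This is a toy illustration under stated assumptions: the locus names are real STR loci, but the genotypes are invented, the function names are hypothetical, and operational familial searches rest on kinship statistics that this sketch does not attempt to model. A parent and child share at least one allele at every locus (barring mutation), which is the pattern the check looks for; unrelated people can show the same pattern by chance at a handful of loci, so four loci prove nothing.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Count alleles in common at one locus (0, 1, or 2), respecting duplicates."""
    overlap = Counter(genotype_a) & Counter(genotype_b)
    return sum(overlap.values())


def compare_profiles(profile_a, profile_b):
    """Classify two STR profiles by allele sharing across their common loci."""
    loci = sorted(set(profile_a) & set(profile_b))
    if not loci:
        raise ValueError("profiles have no loci in common")
    per_locus = {
        locus: shared_alleles(profile_a[locus], profile_b[locus]) for locus in loci
    }
    if all(n == 2 for n in per_locus.values()):
        verdict = "full match"
    elif all(n >= 1 for n in per_locus.values()):
        verdict = "partial match: shares an allele at every locus"
    else:
        verdict = "no parent-child sharing pattern"
    return verdict, per_locus


# Invented genotypes (repeat counts) at four real STR loci.
crime_scene = {"D8S1179": (13, 14), "D21S11": (29, 30), "TH01": (6, 9.3), "FGA": (21, 24)}
known_offender = {"D8S1179": (13, 15), "D21S11": (30, 31), "TH01": (9.3, 9.3), "FGA": (22, 24)}

verdict, detail = compare_profiles(crime_scene, known_offender)
print(verdict)
print(detail)
```

The exact-match search that CODIS normally performs corresponds to the first branch; familial searching deliberately looks at the second. Either way the output is a reason to investigate a family, not a statement about who left the sample.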

FIGURE 29.3 — Two ways DNA reaches a suspect through relatives        [schematic; constructed teaching example]

                          CRIME-SCENE DNA PROFILE
                                    │
              ┌─────────────────────┴──────────────────────┐
              │                                             │
      FAMILIAL SEARCHING                         INVESTIGATIVE GENETIC GENEALOGY
      (CODIS / STR markers)                      (consumer DB / SNP markers)
              │                                             │
   search criminal database for                  generate dense SNP profile;
   a PARTIAL (near) match                         upload to genealogy database
              │                                             │
   partial match → a known                        returns DISTANT relatives
   offender is a close relative                    (3rd–4th cousins), ranked by
   (parent / child / sibling)                      shared DNA
              │                                             │
   investigate that offender's                     build family trees from public
   CLOSE family                                    records; triangulate downward
              │                                             │
              ▼                                             ▼
      ┌───────────────────────────────────────────────────────────┐
      │   A SHORT LIST OF CANDIDATE RELATIVES — AN INVESTIGATIVE   │
      │   LEAD, NOT AN IDENTIFICATION                              │
      └───────────────────────────────┬───────────────────────────┘
                                       │  winnow by age / sex / geography
                                       ▼
                        ONE CANDIDATE SUSPECT
                                       │
                                       ▼
            CONFIRM WITH CONVENTIONAL STR ON A KNOWN/ABANDONED SAMPLE
                  (the gold-standard method of Chapter 7 does the identifying)

   BOTH techniques OUTPUT A LEAD. Conventional DNA still makes the courtroom identification.

Walk the diagram, because its shape is the whole ethical and scientific argument. Both techniques begin with a crime-scene profile and both end at the same place: not with an identification, but with an investigative lead — a candidate or short list of candidates that must then be confirmed by the conventional, gold-standard STR comparison of Chapter 7 against a sample known to come from the suspect. This is the single most important fact about both methods, and the reason IGG is this book's emblem of honest progress: neither familial searching nor IGG is offered to a jury as the identification. The genealogy or the partial-match search points; ordinary, validated DNA confirms. The output of the new, contested, powerful technique is a hypothesis; the output handed to the court is the old, validated match. A method that hands off its conclusion to the strongest tool in the field, and never pretends to be that tool, is being structurally honest about its own limits — which is exactly what the bite mark never did.

For the Golden State Killer specifically — advancing the anchor with detail Chapter 8 did not dwell on — the mechanics reward a closer look. The investigators did not have a single close-cousin match that handed them DeAngelo. They had a set of distant matches, on the order of third cousins, scattered across the branches of a very large extended family. Genealogists then performed the painstaking, weeks-long work that is the actual labor of IGG: building those matches' family trees outward and backward from public records — censuses, obituaries, marriage and birth and death records, newspaper archives — until the trees converged on a set of common ancestors from whom the offender must also descend. From two or more distant matches on different sides of the tree, they triangulated downward through generations of descendants, then winnowed that pool of candidate descendants by the case's known constraints — approximate age, sex, and California geography — until a single living candidate remained. Only then came the indispensable confirmation: a discarded item yielding DeAngelo's DNA, run as a conventional STR comparison against the crime-scene profile. The genealogy generated a name in 2018; the STR match is what put that name in front of a court. The forty-year gap was closed not by a faster instrument but by a new question — "where are his relatives?" — asked of a database no criminal-justice system built.
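The converge, triangulate, and winnow sequence described above can be sketched on a toy pedigree. Every name, date, and field here is hypothetical and invented for illustration; the tree is far shallower than the third-cousin relationships real casework involves, and the records are not drawn from any actual investigation. The code produces a shortlist only, which is exactly the status of the genealogical result before STR confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    sex: str
    birth_year: int
    lived_in_region: bool


def ancestors(person, parents):
    """Everyone `person` descends from, per the child -> parents map."""
    found, stack = set(), list(parents.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def descendants(person, parents):
    """Everyone in the map who descends from `person`."""
    return {p for p in parents if person in ancestors(p, parents)}


def shortlist(matches, parents, records, sex, born_between):
    # 1. Converge: ancestors shared by every database match.
    common = set.intersection(*(ancestors(m, parents) for m in matches))
    # 2. Triangulate downward: all descendants of those shared ancestors.
    pool = set()
    for ancestor in common:
        pool |= descendants(ancestor, parents)
    pool -= set(matches)
    # 3. Winnow by the case's known constraints.
    earliest, latest = born_between
    return sorted(
        name for name in pool
        if name in records
        and records[name].sex == sex
        and earliest <= records[name].birth_year <= latest
        and records[name].lived_in_region
    )


# Toy pedigree: child -> parents (all names hypothetical).
parents = {
    "child_a": ("founder_1", "founder_2"),
    "child_b": ("founder_1", "founder_2"),
    "child_c": ("founder_1", "founder_2"),
    "match_1": ("child_a",),
    "match_2": ("child_b",),
    "cand_x": ("child_c",),
    "cand_y": ("child_c",),
    "cand_z": ("child_a",),
}
records = {
    "child_c": Record("M", 1925, True),
    "cand_x": Record("M", 1958, True),
    "cand_y": Record("F", 1960, True),
    "cand_z": Record("M", 1955, False),
}

leads = shortlist(["match_1", "match_2"], parents, records, sex="M", born_between=(1950, 1970))
print(leads)
```

The winnowing step is only as good as the public records behind it: a missing birth record or an unrecorded adoption silently removes a real candidate from the pool, which is one reason the result is treated as a hypothesis to test and not as an answer.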

⚖️ In the Courtroom

A point that confounds even experienced attorneys: because both familial searching and IGG produce an investigative lead rather than trial evidence, the genetic-genealogy work itself usually never reaches the jury. What the jury hears is the confirmatory STR match — the validated method of Chapter 7 — between the crime-scene profile and the defendant's known sample. This is why challenges to IGG and familial searching have, so far, centered less on scientific reliability (the reliable thing, the STR confirmation, is old and well-validated) and more on how the lead was developed: the terms of service of the databases searched, the propriety and oversight of the search, and Fourth Amendment questions about searching consumer genetic data. The validity question (is the confirming method sound? — yes) and the propriety question (was the search lawful and consented? — contested) are genuinely different, and a court, an attorney, and a forensic scientist must keep them apart. Conflating them produces bad law and muddled ethics.

The privacy and equity questions, which Chapter 8 previewed and Chapter 38 takes up in full, sharpen when the two techniques are seen side by side. Familial searching raises the problem of genetic surveillance by association: the relatives of everyone already in CODIS — a population that, because of uneven enforcement, over-represents some communities — become, in effect, partially searchable, placing the families of the already-databased under a genetic watch they never consented to and that falls unequally across groups. IGG raises the problem of consent at a distance: a single distant cousin's voluntary upload to a consumer service renders an entire extended family findable by law enforcement, none of whom consented, and because consumer genealogy databases over-represent people of European descent, IGG's reach is also uneven across populations. Both techniques, in different ways, extend the state's genetic gaze from the individual to the family — and the law governing what police may search, in which database, under what oversight, has lagged the technology badly. In the United States, an interim federal policy has set conditions on the use of forensic genetic genealogy in federally supported investigations (limiting it, in general, to violent crimes and unidentified-remains cases, and requiring that conventional database searching be exhausted first), but the broader regulatory landscape remains a patchwork.

🧠 Cognitive-Bias Watch

Both techniques carry a bias trap distinct from the bench biases of earlier chapters: once the genealogical or partial-match work settles on a candidate family and a likely suspect, every subsequent fact can be read to fit — a confirmation-bias cascade running through public records and investigative tips rather than through peak heights or ridge detail. The safeguard is the same in spirit as the laboratory's: the lead is a hypothesis to be tested by independent confirmation — the abandoned-sample STR comparison and conventional investigation — never a conclusion the rest of the case must be bent to support. A family tree that "feels right," or a partial match that "must be the brother," is not an identification. The confirming STR match is. The discipline that protects the innocent relative is the refusal to treat the lead as the answer.

Where do these techniques sit on the validity spectrum? The distinction the whole section has drawn pays off here. The confirmation step — the conventional STR comparison that actually identifies — sits where DNA always sits: at the top, strong and validated. The lead-generation step — familial searching's partial-match logic, IGG's SNP-and-trees reconstruction — is best understood not as a courtroom identification method at all but as an investigative method, judged by whether it reliably points investigations in the right direction and by the privacy costs it imposes, not by a courtroom error rate it never claims. Its failure mode is not "convicting the wrong person on genealogy" — the STR confirmation guards against that — but "following leads to the wrong family branch," which wastes effort and intrudes on innocent relatives. That is a real cost, and a different kind than a wrongful conviction. Held to the right yardstick — investigative reliability and privacy, not courtroom individualization — and married to the gold-standard confirmation, these are among the most genuinely valuable developments in the modern field. They are powerful, they are validly grounded because they hand off to a validated method, and they are deeply contested. Holding all three judgments at once is the mature forensic posture.

29.6 Evaluating tomorrow's methods before they enter court

We arrive at the chapter's purpose, and the book's "evaluate before you trust" discipline made into a usable tool. The history this book has told is, at bottom, a history of methods admitted too early — bite marks, microscopic hair comparison, the fire-investigation folklore that helped execute Cameron Todd Willingham (Chapter 22), each accepted on the authority of its practitioners before anyone measured whether it worked, each then very hard to dislodge. The whole field has spent the twenty-first century paying for that haste. The single most valuable thing you can carry out of this chapter is a refusal to repeat it — a checklist for meeting any new method at the courtroom door and asking it the questions the bite mark was never asked in time.

Here is that checklist. It is nothing more than the NAS 2009 / PCAST 2016 framework, turned from a backward-looking audit into a forward-looking gate, and it applies without modification to rapid DNA, microbial forensics, isotopes, AI, genealogy, and to methods not yet invented.

What, precisely, is the claim? State the exact assertion the method makes — and its true strength. "This algorithm identifies the source" is a far stronger and far more suspect claim than "this method generates an investigative lead." Most honest emerging methods are lead generators (isotopes, IGG, AI search); the danger begins when a lead generator is dressed as an identifier.
Has it been independently validated? Not by its vendor or its inventors, but by independent researchers with no stake in the outcome, on realistic samples, with the truth known in advance. A demonstration is not a validation; a published accuracy figure is not a validation until you know who measured it, on what, and how.
What is the measured error rate — and across which subgroups? The decisive question of the whole book. A method with no measured error rate has no business carrying weight in court, no matter how new or how sophisticated. And the error rate must be broken out across populations and conditions, because an aggregate "99% accurate" can hide a far higher failure rate for an underrepresented group or a hard sample type.
Is the reasoning auditable? Can the method show its work in a form a human can examine, a defense can probe, and a Daubert gatekeeper can assess? A black box that cannot explain itself fails this test however often it is right, because forensic testimony rests on examinable reasoning, not oracular pronouncement.
Is it inside its validated envelope? A method validated for one use (rapid DNA on clean reference swabs) is not thereby validated for another (rapid DNA on degraded mixtures at a scene). Validity is a property of method-plus-sample-plus-question, never of the instrument alone.
Who bears the cost, and did they consent? The companion question for powerful tools (genealogy, microbial trace, facial recognition): a method can be valid and still be ethically fraught. Innocent relatives, over-surveilled communities, and the falsely flagged bear real costs that the validity question alone does not capture.
What does it hand off to? The single best sign of an honest emerging method is that it generates a lead confirmed by a transparent, validated downstream method (IGG handing off to STR), rather than offering its own output as the conclusion. A method that confirms itself, with no independent check, is the bite mark's structure in new clothing.
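The seven questions above can also be kept as a simple record that makes unanswered items hard to overlook. The sketch below is one possible way to do that, not a standard instrument: the class `Assessment`, its field names, and the example entry are all hypothetical, and reducing questions such as consent to a yes/no field is a deliberate oversimplification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Assessment:
    method: str
    claim: str  # "lead" or "identification"
    independently_validated: bool = False
    subgroup_error_rates: dict = field(default_factory=dict)
    reasoning_auditable: bool = False
    inside_validated_envelope: bool = False
    costs_and_consent_reviewed: bool = False
    hands_off_to: Optional[str] = None  # confirming method, if any


def open_questions(a):
    """Return the checklist items this assessment has not yet answered."""
    issues = []
    if a.claim not in ("lead", "identification"):
        issues.append("claim not stated precisely")
    if not a.independently_validated:
        issues.append("no independent validation")
    if not a.subgroup_error_rates:
        issues.append("no measured, subgroup-specific error rates")
    if not a.reasoning_auditable:
        issues.append("reasoning not auditable")
    if not a.inside_validated_envelope:
        issues.append("use falls outside the validated envelope")
    if not a.costs_and_consent_reviewed:
        issues.append("costs and consent not examined")
    if a.hands_off_to is None:
        issues.append("no independent confirming method")
    return issues


# Hypothetical entry: a vendor-tested tool offered as an identifier.
vendor_tool = Assessment(method="hypothetical vendor tool", claim="identification")

for item in open_questions(vendor_tool):
    print("UNRESOLVED:", item)
```

An empty list from `open_questions` would not mean a method is admissible; it would mean only that the questions have been asked and answered on the record, which is the precondition the discredited methods never met.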

⚠️ Junk-Science Alert

The newness of a method is not evidence for it, and three rhetorical moves should trigger your skepticism every time. First, the appeal to sophistication — "it uses deep learning / next-generation sequencing / stable isotopes, so it must be reliable." Complexity is not validity; an unvalidated complex method is less trustworthy than a validated simple one, because its failures are harder to see. Second, the vendor's accuracy claim presented as a validation study — a number with no independent measurement, no disclosed test set, no subgroup breakdown, and a commercial interest behind it. Third, the demonstration that stands in for evidence — the impressive conference case, the cold case the new method "cracked," offered as proof the method works in general. A method that worked once, with the answer already known, has not been validated; it has been demonstrated. The bite mark, the hair comparison, and the arson folklore all had their impressive demonstrations too. Demand the boring studies. The boring studies are where the truth is.

🔍 Check Your Understanding

1. A company markets an AI tool that "identifies" a suspect from gait in surveillance video, citing "97% accuracy in our testing." Apply three items from the checklist above. What must you ask before this output could responsibly support anything in court?
2. Why is "it generates a lead that conventional DNA then confirms" (IGG) a sign of methodological honesty, while "the algorithm's output is itself the identification" is a warning sign? Connect this to the structure of the bite-mark error (Chapter 16, previewed).

The deeper point of this section reaches past any one method. The four themes of this book are, in the end, a single discipline applied over and over: forensic science excludes more reliably than it proves (Theme 1), not all methods are equally valid (Theme 2), cognitive bias is the chief threat (Theme 3), and the CSI effect cuts both ways (Theme 4). Those themes were forged on the failures of the past — the methods admitted too early and dislodged too late. Their final test is whether they hold when pointed at the future, at methods that do not yet exist and claims not yet made. The reflex to ask "what does this prove, and how do we know?" is not a museum piece for re-examining old convictions. It is the live skill that will decide whether the next generation of forensic methods earns its place honestly, or repeats the field's oldest mistake on a new and more powerful scale. Carry the question forward. It is the most durable thing this book has to give.

🗂️ The Case File

The genealogy lead pays off — but narrows, it does not name. Recall the state of the Mill Creek file from Chapter 8: the touch DNA on the gas-can handle was a low-template, heat-degraded mixture — one component consistent with the victim, Marcus Diallo, and a minor contributor who was an unknown person, not in CODIS. Because the criminal database was silent, the minor component was worked up, exactly as the Golden State Killer case taught the field to do, toward an investigative genetic genealogy lead: a SNP profile and a search for distant relatives. This chapter tells you what that lead, together with rapid confirmatory typing, established — and, just as importantly, what it did not.

The genealogy work returned distant relatives and, after family-tree reconstruction and winnowing, pointed toward a known associate of the investigation rather than toward a stranger from outside the circle of persons of interest. A confirmatory STR comparison — the gold-standard method of Chapter 7 — was run (the kind of clean reference typing that rapid DNA, §29.1, can perform quickly and within its validated envelope) against the minor component. The result, stated at its honest strength: the minor contributor to the gas-can mixture is not an unknown stranger from outside the established field of persons of interest. The "random intruder" theory — that some unconnected outsider handled the can — is excluded.

What this adds — and only this. This is an exclusion, not an identification, and the distinction is the whole lesson of the chapter and the book. The evidence closes a door: it removes the unknown-stranger explanation for the minor DNA, narrowing the field of possibilities. It does not name the contributor; it does not, by itself, establish how or when those cells reached the handle (secondary transfer, Chapter 8, remains live for a minor, low-template component); and it does not prove that the contributor set the fire or committed any crime. Its honest verb is excludes — of a theory, not of a method's caution. The interpretation of whose contribution the mixture is consistent with, and at what likelihood ratio, was the work of Chapter 9 and is not redone here; what Chapter 29 adds is narrower and cleaner: the stranger theory is excluded.

Running status. The field narrows. The minor contributor to the gas-can DNA is not a random outsider, so the investigation's focus stays inside the established circle of persons of interest rather than chasing an unknown intruder — but no name is spoken, because the science here has earned an exclusion, not an inclusion. Log it in the workbook (Appendix I) as "stranger theory excluded," with the caveats attached: an exclusion of a possibility, not the identification of a person. This is the book's first theme in its purest form — forensic science excludes far more reliably than it proves — and the discipline, as always, is to claim exactly the exclusion and not one inch more. It is a capital mistake to theorize before one has data; the data here have closed a door, and closing a door is not the same as walking through it.

Conclusion

The future of forensic science is neither the salvation the headlines promise nor the threat the cynics fear; it is a set of new methods to be held to the same old yardstick. Rapid DNA takes the gold-standard STR method out of the laboratory and into a ninety-minute box — superb for the clean reference swabs inside its validated envelope, dangerous the moment it is pushed onto the degraded mixtures it was built to skip. Microbial forensics and the necrobiome propose a new clock and a new trace from the communities of bacteria on and around a body — real science, genuinely promising, and honestly not yet validated to the standard a courtroom requires. Forensic isotope analysis reads a layered life-history of geography from the tissues — rigorous chemistry yielding a region and an investigative lead, never a name. AI and machine learning carry real power into every discipline and a new family of perils with them — the black box, training-data bias, automation bias, and above all the validation gap — and must be held to the validity spectrum exactly as the bite mark was, with more scrutiny for their opacity, not less. And familial searching and investigative genetic genealogy, the era's most contested developments, extend DNA's reach from the suspect to the suspect's relatives — powerful, validly grounded because they hand off to the confirmatory STR match, and freighted with privacy and equity questions the law has not caught up to.

Two of the book's themes carried the whole chapter. The validity spectrum (Theme 2) proved to be a forward-looking instrument, not only a backward-looking audit: every emerging method sorts onto it by the same question — what is the measured error rate, and how do we know? And exclusion over proof (Theme 1) found its purest expression in the cold case, where rapid DNA and genealogy did not name a killer but excluded the unknown-stranger theory, narrowing the field by closing a door. That is what honest forensic progress looks like: not a database that chirps a name in nine seconds, but a new way to exclude a possibility, confirmed by a validated method, claimed at exactly its true strength.

We now leave the methods behind and turn, in Part VI, to the humans who use and judge them. The most sophisticated emerging technology in this chapter is still operated, interpreted, and presented by a person, under pressure, in front of a jury — and the next chapters reinterpret everything the book has taught through that human lens. Chapter 30 begins where all evidence finally lives or dies: the witness stand, where an expert must communicate what a method establishes, and what it does not, honestly, under adversarial fire — including, before long, how to explain to a jury why an algorithm's confident output is only a lead, and why "the computer found it" is not, and must never become, the new "the expert said so."

Key Terms

Rapid DNA — the fully automated generation of an STR profile from a reference sample (typically a buccal swab) inside a single self-contained instrument, without a human analyst, in roughly 90 minutes to two hours; validated for clean single-source reference samples, not for complex evidentiary samples.
Microbial forensics — the application of the analysis of microbial communities (bacteria and other microorganisms) to legal questions, including postmortem-interval estimation from the necrobiome, trace association from transferred skin microbes, and the attribution of biological attacks; an emerging field whose forensic validation is, for most uses, incomplete.
Forensic isotope analysis — the measurement of stable-isotope ratios (of elements such as hydrogen, oxygen, carbon, nitrogen, sulfur, and strontium) in biological tissues to infer geographic origin, travel history, and diet, yielding a regional/investigative lead rather than an identification.
AI/ML in forensics — the use of artificial-intelligence and machine-learning systems (algorithms that learn patterns from training data) to perform or assist forensic tasks; powerful but carrying specific perils — black-box opacity, training-data bias, automation bias, and a validation gap — that must be evaluated against the validity spectrum.
Familial searching — the deliberate search of a criminal DNA database (CODIS) for a partial match to a crime-scene profile, on the theory that a close biological relative (parent, child, sibling) of a known offender may be the true source; distinct from investigative genetic genealogy in its markers, database, and reach.

Spaced Review

Rapid DNA performs the same STR chemistry as a full laboratory. Explain why it is appropriately placed high on the validity spectrum for one use and effectively off it for another — and name the feature (lost in automation) that makes the difference. (§29.1; and the validity spectrum from Ch. 1, Ch. 6)
Distinguish familial searching from investigative genetic genealogy on three axes — genetic markers, database searched, and degree of relatedness reached — and state the one thing they have in common at the end. (§29.5; and IGG from Ch. 8)
In Chapter 9 you learned the prosecutor's fallacy and the likelihood ratio. An AI tool reports a "96% similarity score" for a facial match. Explain why that score is not a probability that the match is correct, and why treating it as one is a close cousin of the prosecutor's fallacy. (§29.4; and Ch. 9)
The cold-case Case File in this chapter delivers an exclusion ("stranger theory excluded"), not an identification. Explain how this is the book's first theme (exclusion over proof) in its purest form, and why an honest analyst stops there rather than naming the contributor. (§29.5, the Case File; and Theme 1 from Ch. 1)
Validity-spectrum question. A new microbial-PMI method and a validated single-source DNA test both produce a number. Using the NAS 2009 / PCAST 2016 framework, state why they sit at opposite ends of the spectrum despite both being "DNA-based," and name the specific thing the microbial method still lacks. (§29.2, §29.6; and the spectrum from Ch. 6)