Appendix F: The Validated vs. Questioned Methods Scorecard
This is the single most important reference in the book. It is the validity spectrum (Chapter 1, sharpened in Chapter 6) rendered as a working scorecard — method by method, what the science can honestly support, what goes wrong, and where the two landmark reviews placed it. Every discipline chapter located its own method on this spectrum; this appendix gathers all of them in one place so you can answer, fast, the question that protects an innocent person: Where does this method sit, what claim is being made, and does the science support that claim?
Read three framing rules before any verdict below, because they matter more than memorizing positions.
- Position is a ceiling, not a guarantee. A method's place on the spectrum is the best reliability it can offer when done perfectly. Top-of-spectrum DNA can be dragged to a worthless result by a contaminated tube or a biased setup. The spectrum tells you the method's potential; validity as applied tells you what happened this time.
- The spectrum is question-specific. The same discipline can sit at opposite ends depending on what you ask it. Forensic odontology is near the discredited end for "these teeth made this bite mark" and near the solid end for "these dental records identify this body." Always ask about the specific claim, never the discipline's name.
- The honest verb tracks the position. As foundation weakens, the strongest defensible statement weakens in lockstep: identifies (with a stated probability) → consistent with (at a measured error rate) → class-level association only → the science does not support this comparison at all. An examiner whose confidence rises as the science weakens has inverted the relationship — and that inversion is, almost exactly, the shape of the wrongful convictions in this book.
What "NAS 2009" and "PCAST 2016" mean in the columns below. The NAS 2009 report (Strengthening Forensic Science in the United States) surveyed the whole field and found that, apart from nuclear DNA, the disciplines lacked the rigorous foundation their courtroom authority implied. The PCAST 2016 report zoomed in on feature-comparison methods and asked, of each, whether black-box studies (many examiners, many comparisons of known ground truth, error rate measured) had shown it actually works — distinguishing foundational validity (is the method valid?) from validity as applied (was it done right here?). A method PCAST did not address (e.g., toxicology, digital, accounting) is marked "not a feature-comparison method; outside PCAST's 2016 scope."
On the error-rate figures. Where a number appears it is illustrative of the right order of magnitude, drawn from the qualitative findings the chapters discuss — not a precise rate. Exact rates vary by study and conditions. Treat them as "roughly this big," never as a citable statistic.
F.1 The spectrum at a glance
STRONGER FOUNDATION ◄─────────────────────────────────────────────► WEAKER / DISCREDITED
(validated, error rate known) (no validated basis, or affirmatively debunked)
Single-source DNA > Instrumental > Latent > Simple DNA > Firearms/ > Microscopic > Bite
Toxicology (conf.) chemistry fingerprints mixtures (PG) toolmarks, hair "match", marks
(GC-MS confirmed) (drugs, debris) (valid; non- (valid in complex DNA handwriting (discredited;
zero error) envelope) mixtures, BPA, ID, footwear/ exonerations)
GSR, soil, tire "to one"
shoe/tire class
── what you can honestly say shifts as you move right ──►
"identifies, with "is X, to "consistent with; "X× more "cannot exclude; "cannot exclude; "the science
a stated RMP" analytical false-match rate probable if class-level of limited does not
certainty" ~1 in N" contributor" association" value" support this"
Digital forensics, forensic accounting, forensic pathology, anthropology, entomology, odontology-as-ID, and questioned-document non-handwriting work do not slot neatly onto a single left-right line because they are not "compare-a-feature-to-a-source" methods; their scorecards below state their own honest reach.
F.2 The master scorecard
Each row: the method, its foundational validity, its principal error modes, its NAS 2009 / PCAST 2016 status, and what it can honestly establish (the verb). Detailed notes for the contested methods follow in F.3–F.6.
Biological evidence
| Method | Foundational validity | Known error modes | NAS 2009 / PCAST 2016 status | What it can honestly establish |
|---|---|---|---|---|
| Single-source nuclear DNA | Strong — the model the others are measured against; understood biology, quantified RMP, objective core measurement. | Contamination; sample swap / mislabeling; secondary transfer; relatives share DNA; lab error is a separate, larger risk than coincidence. | NAS: the sole discipline exempted from the central criticism. PCAST: foundationally valid. | Identifies the source of the cells, with a stated random match probability — not when, how, or by whose conduct they were deposited. |
| DNA mixtures (probabilistic genotyping) | Split by regime — valid for simple mixtures within the validated envelope; not established for complex/low-template. | Wrong assumed contributor number; biased human inputs; model-dependence (two valid programs disagree by orders of magnitude); running beyond validated limits; black-box source code. | PCAST: foundationally valid for simple mixtures; not yet established for complex ones. | "The evidence is X times more probable if this person is a contributor than if not" — a likelihood ratio, never "this is his profile." |
| mtDNA / Y-STR (lineage markers) | Sound chemistry, intrinsically limited reach — biology, not technique, caps it. | Presenting a lineage match as individual ID; contamination (mtDNA especially); weak discriminating power. | Underlying typing accepted; cannot individualize by design. | "Consistent with this maternal/paternal lineage" (and all relatives in it) — strong for exclusion, never one-person ID. |
| Serology / body-fluid identification | Solid for identification; presumptive ≠ confirmatory. | Presumptive false positives (plant peroxidases, oxidizers for blood); cannot by itself say whose or when. | Identification methods accepted; the interpretation is where overstatement enters. | "This is human blood / semen / saliva" (after confirmatory test); locates and identifies the fluid, then hands off to DNA for source. |
| Bloodstain pattern analysis (BPA) | Weak-to-contested — physics of single droplets is sound; complex-scene reconstruction is subjective and examiner-dependent. | Examiner-to-examiner disagreement; overinterpretation of pattern as a precise event narrative; assumptions about surface, motion, sequence. | NAS: flagged as resting heavily on examiner experience and "more subjective than scientific." PCAST: not separately validated. | "Consistent with" a class of mechanism / a supported area of origin; not a precise reenactment of who did what, where, in what order. |
Pattern and impression evidence
| Method | Foundational validity | Known error modes | NAS 2009 / PCAST 2016 status | What it can honestly establish |
|---|---|---|---|---|
| Latent fingerprints (ACE-V) | Foundationally valid, with a measured non-zero error rate. The "gold standard" is a human judgment. | False positives (~1 in several hundred to ~1 in a thousand, illustrative); context/confirmation bias (Mayfield); partial/distorted latents; no defined statistical model for "individualization." | NAS: criticized the "zero error rate"/individualization claims. PCAST: foundationally valid, but examiners must report the error rate and drop "certainty." | "The latent and the reference agree, supporting a common source; the method's false-match rate is about 1 in N" — not "identification to the exclusion of all others." |
| Firearms / toolmarks | Not yet established at the time of PCAST — too few appropriately designed black-box studies; plausible but unproven. | Subjective "sufficient agreement" threshold; examiner bias; class vs. individual confusion; no validated error rate for individualization. | NAS: foundational research lacking. PCAST: foundational validity not established; called for the black-box studies. | "Class characteristics are consistent; cannot exclude this firearm/tool" — increasingly, a stated error rate where studies now exist; not "fired by this gun to the exclusion of all others." |
| Footwear & tire impressions | Two-tier: class characteristics solid; individual "this shoe and no other" not established. | Overstating acquired/individual wear features; subjective matching; substrate/casting artifacts. | PCAST: individual-source claims not established; class-level association is the defensible part. | "This class of outsole/tire (size, brand, tread) made the impression" and, with rare distinctive wear, a stronger association — not individualization. |
| Toolmarks (pry bars, cut marks) | Class-level reasonable; individual claims unproven (shares firearms' problem). | Same as firearms; deformation of the mark; subjective comparison. | PCAST: bundled with firearms — not established for individualization. | "Consistent with a tool of this class / this type of action (forced entry)" — class evidence, not one-tool ID. |
| Bite-mark analysis | Not foundationally valid — and PCAST saw little prospect it could become so. Two unproven premises: that dentition is unique and that skin records it reliably. | Skin distorts; high examiner disagreement even on "is this a bite?"; multiple exonerations (e.g., Krone, Brown). | NAS: deeply skeptical. PCAST: not foundationally valid. | Essentially nothing as identification. At most "this could be a bite"; the comparison to a suspect's teeth is not scientifically supported. |
| Microscopic hair comparison | Discredited as a "match" method — morphology cannot individualize. | The FBI review found examiners overstated the strength of hair testimony in a large majority of reviewed cases; juries heard "consistent with" as "his hair." | NAS: not validated for individualization. (FBI/DOJ later acknowledged systemic overstatement.) | "Consistent with" at the level of general characteristics only — weak class evidence; for source, the hair must go to mtDNA. |
| Fibers | Honest class evidence — value rises with rarity; never individualizing. | Treating a common fiber as significant; transfer/persistence ambiguity; overstated "association." | Not individualizing; accepted as class/associative evidence when handled honestly. | "Consistent with originating from this type of textile" — probabilistic, stronger when the fiber is rare; never "this garment and no other." |
| Handwriting / questioned documents (handwriting ID) | Contested subjectivity. Some document tasks are objective (ink/paper chemistry, indented writing via ESDA); handwriting authorship is the weak claim. | Examiner subjectivity; disguised/forged writing; limited validated error rate for authorship. | NAS: questioned the scientific basis of handwriting individualization. | Ink/alteration/indented-writing findings can be strong (instrumental); handwriting authorship is "consistent with / cannot exclude," not a validated individual ID. |
Chemical and toxicological evidence
| Method | Foundational validity | Known error modes | NAS 2009 / PCAST 2016 status | What it can honestly establish |
|---|---|---|---|---|
| Forensic toxicology (confirmed) | Strong when confirmed instrumentally (immunoassay screen → GC-MS/LC-MS confirmation). | Postmortem redistribution; "present ≠ impairing"; specimen choice; presumptive immunoassay false positives if not confirmed. | Not a feature-comparison method; outside PCAST's 2016 scope. Confirmatory chemistry is well founded. | "Substance X is present at concentration Y in this specimen" — interpretation of impairment/cause is the hard, qualitative part. |
| Drug chemistry (controlled-substance ID) | Strong when confirmed — presumptive color/microcrystalline tests suggest; instruments confirm. | The roadside field-test problem — presumptive color tests false-positive and have wrongly jailed people; confirmation is mandatory. | Outside PCAST's 2016 feature-comparison scope; confirmatory ID (GC-MS) is well founded. | "This is [substance]" only after instrumental confirmation; a presumptive field test alone establishes suspicion, not identity. |
| Fire-debris / arson chemistry | Strong for the chemistry; folklore for the old scene "indicators." | The discredited indicators (crazed glass, "pour patterns," alligatoring) — folklore that helped execute Willingham; flashover mimics "arson signatures." | NAS/PCAST context: ignitable-liquid chemistry is sound; the old visual indicators are debunked. | "Gasoline (etc.) is present in the debris" (GC-MS); an incendiary finding must rest on valid origin analysis plus confirmed ignitable liquid — never on indicators alone. |
| Instrumental analysis (GC-MS, FTIR, Raman, SEM-EDX) | Strong — the confirmatory backbone; turns "looks like" into "is." | Sample prep / contamination; misinterpreting a spectrum; matrix effects; over-reading elemental (SEM-EDX) results as source ID. | The analytical sciences these rest on are well validated. | "This compound is [identity], by its mass spectrum / spectrum"; SEM-EDX confirms GSR particle morphology+composition — identity and composition, not source attribution. |
| Gunshot residue (GSR) | Valid for particle identification; weak/contested for what it proves. | Contamination (police cars, booking areas, the shooter's environment); transfer; "absence ≠ didn't shoot, presence ≠ did." | The particle ID (SEM-EDX) is sound; the inference about shooting is where overstatement lives. (The FBI discontinued routine GSR analysis in 2006.) | "Particles characteristic of primer residue are present" — consistent with proximity to a discharge; not proof this person fired a weapon. |
| Paint | Honest class/associative evidence — layer structure can be discriminating. | Treating common paint as significant; overstated "match"; transfer ambiguity. | Accepted as class/associative comparison, especially multi-layer. | "Consistent with originating from this type/layer-sequence of paint" — stronger with more matching layers; not individualization. |
| Glass (refractive index, fracture) | Two-tier: RI/elemental comparison is class evidence; physical fracture fit can be near-individual. | RI overlaps across many sources; overstated significance; (a genuine jigsaw fracture match is the strong exception). | Accepted as class comparison; physical fit is strong when a fragment uniquely re-assembles. | "Consistent with this source of glass" (class) — but a physical fracture fit can support a single-source association. |
| Soil / geological | Honest class/associative evidence — value rises with distinctiveness. | Spatial variability; overstated "match"; common compositions. | Accepted as class/associative when handled with appropriate caution. | "Consistent with the soil at location L," stronger when the soil is distinctive; not "this spot and no other." |
Digital and other domains
| Method | Foundational validity | Known error modes | NAS 2009 / PCAST 2016 status | What it can honestly establish |
|---|---|---|---|---|
| Digital forensics | Strong for integrity; interpretive at the edges. Imaging + hashing make integrity mathematically verifiable. | Mis-attribution (a device/account is not a person); cell-site location is coarse, not GPS-precise; altered timestamps; over-reading metadata. | Not a feature-comparison method; outside PCAST's 2016 scope. Hashing/imaging are rigorous. | "This data existed on this device/medium with verified integrity"; cell-site shows an approximate area, never a pinpoint — device/account activity, not necessarily who acted. |
| Forensic accounting | Strong for the trail; inferential for intent. Audit trails and records are durable and checkable. | Confusing anomaly with fraud; Benford's-law misuse on unsuitable data; assuming motive from a number. | Not a feature-comparison method; outside PCAST's 2016 scope. | "These funds moved thus; here are the anomalies and the documentary trail" — motive/intent is an inference for the factfinder. |
| Forensic pathology (autopsy) | Established medical practice; manner of death carries irreducible judgment. | Manner-of-death disagreement; PMI estimates are ranges, not clocks; cognitive bias from case context. | A medical discipline, not a feature-comparison method; outside PCAST's scope. | Cause of death often well supported; manner (homicide/accident/etc.) is a reasoned opinion; PMI is a window, stated with uncertainty. |
| Forensic anthropology | Established for the biological profile and trauma timing (population-based estimates). | Estimates are ranges (sex, age, ancestry, stature); distinguishing perimortem trauma from heat/animal/postmortem damage is hard. | A skeletal-biology discipline; outside PCAST's feature-comparison scope. | A biological profile (as ranges) and perimortem vs. postmortem trauma reading — narrows identity and informs cause; rarely a stand-alone ID. |
| Forensic odontology (dental ID) | Strong for identifying the dead — the valid half of odontology (contrast bite marks). | Quality/availability of antemortem records; charting/comparison care. | Dental identification is accepted; bite-mark comparison is not (see above). | "These antemortem records identify this body" — a genuine, valid identification of the deceased. |
| Forensic entomology | Sound for a PMI window, within stated assumptions. | Temperature reconstruction; season; fire/burial altering insect access; drugs in tissue shifting development. | An ecological/biological method; outside PCAST's feature-comparison scope. | A minimum postmortem interval estimate, as a range bounded by its assumptions — a clock with error bars, not a timestamp. |
| Criminal profiling | Weak / largely unvalidated as a predictive product (distinct from genuine forensic psychology: competency, sanity, risk). | The Barnum effect (vague descriptions feel accurate); confirmation; pointing investigations the wrong way (Richard Jewell). | Not a feature-comparison method; the predictive profile lacks demonstrated validity. | Investigative suggestion at best; genuine forensic-psychology assessments (competency/risk) are the valid, careful core. Not evidence of who did it. |
F.3 The strong end, read honestly
The methods at the left of the spectrum earned their place by being built outside the courtroom — by scientists with no stake in any verdict — and validated, quantified, and stress-tested before they were asked to convict anyone. That sequence is the whole difference. Three honest cautions keep "strong" from becoming "infallible":
- Single-source DNA is the anchor, but its strength is for its actual claim — identifying the source of cells. Ask it "whose cells are these?" and it is the finest tool forensic science has. Ask it "is this person guilty?" and it is silent — and the most dangerous courtroom moments are when someone mistakes that silence for a yes. The transfer problem, relatives, and ordinary lab error are real edges on even this number (Appendix E).
- Instrumental chemistry and confirmed toxicology are strong because an instrument reads a molecule against a standard, not an examiner's eye against a memory. But "present" is not "impairing," a presumptive test is not a confirmation, and the most common failure is skipping confirmation — the roadside field-test problem that has jailed innocent people on a color change.
- Confirmed identification ≠ correct inference. GC-MS can prove gasoline is in the debris; that does not prove arson. The valid path to an incendiary finding runs through valid origin analysis and confirmed ignitable liquid — never through the debunked indicators that helped execute Cameron Todd Willingham.
F.4 The contested middle: where most courtroom danger lives
The middle of the spectrum is the most consequential, because these methods are admitted every day and their limits are the easiest to overstate. The pattern is consistent: a class-level statement is defensible; an individual-source statement is not.
- Latent fingerprints are the cautionary heart of the middle. PCAST found them foundationally valid — and in the same breath reported a measured, non-zero false-match rate wholly incompatible with the "zero error rate" examiners claimed for a century. The Brandon Mayfield case (Chapter 14) is the proof: the FBI was 100% confident and 100% wrong, driven partly by context and confirmation bias. The honest verb is "agreement supporting a common source, with a stated error rate," never "identification to the exclusion of all others."
- Firearms and toolmarks rest on a subjective "sufficient agreement" threshold that PCAST found inadequately validated by black-box studies. The defensible claim is class-level ("consistent with; cannot exclude"); the leap to "this gun and no other" outran the science.
- Footwear, tire, paint, glass, soil, and fibers are all honest class/associative evidence whose value rises with distinctiveness or rarity and collapses when a common feature is sold as significant. The two genuine exceptions worth knowing: a physical fracture fit (glass, or any broken object that uniquely re-assembles) can support a single-source association, and a rare fiber or distinctive wear pattern strengthens an association without ever reaching individualization.
- GSR and BPA sit lower in the middle for the same reason: the underlying observation can be sound (a primer-residue particle is real; a single droplet's physics is real) while the inference drawn from it (this person fired a gun; the attack happened here, this way, in this order) is contaminated by transfer, examiner subjectivity, or overinterpretation. The FBI discontinued routine GSR analysis in 2006; BPA was flagged by NAS as resting more on examiner experience than on science.
🧠 The bias multiplier. Everything in the middle is human judgment, and human judgment drifts toward the expected answer. When the examiner knows the suspect's profile, the detective's theory, or the "right" result, borderline calls bend that way — not from dishonesty but from ordinary confirmation bias (Chapter 31). This is why context management and sequential unmasking matter most precisely for the methods in this section, and why "experience" is not a substitute for a measured error rate.
F.5 The discredited end
These methods make individual-source claims their science cannot support. The tell, every time, is certainty that rises as the foundation falls.
- Bite-mark comparison is the field's most thoroughly debunked discipline. It needs two unproven premises — that human dentition is unique and that skin reliably records it — and fails both; examiners disagree even on whether a mark is a bite. Multiple exonerations (Ray Krone, Roy Brown, and others) were built on bite-mark "matches." PCAST found it not foundationally valid and saw little prospect that it could be. The honest verb is, in practice, nothing as identification.
- Microscopic hair comparison as a "match" method is discredited for individualization. The FBI's own review found examiners had overstated the strength of hair testimony in the large majority of reviewed cases — not by lying about what they saw under the scope, but by letting a jury hear "microscopically consistent" as "his hair." The valid path for hair-to-source is mtDNA, with its lineage limits stated.
- Handwriting individualization, the old fire "indicators," and profiling-as-prediction belong here too for their strongest claims: each was built on practitioner assertion rather than validation, and each has a documented failure (Willingham for the fire folklore; Richard Jewell for the profile). The instrumental or genuinely-careful parts of these fields (ink chemistry, indented writing, competency assessment) are valid; the individualizing or predictive claims are not.
⚠️ The diagnostic inversion. The DNA analyst — strongest method — speaks in careful probabilities and refuses "identifies" without a number. The bite-mark examiner — weakest — says "this defendant and no other, to a reasonable degree of dental certainty." When the confidence of the testimony runs opposite to the strength of the method, that inversion is itself a warning, and it is almost exactly the shape of the wrongful convictions this book studies.
F.6 How to use this scorecard
When you meet a forensic claim — in a chapter, a courtroom, a news story, or a case file — run it through five questions, in order:
- What is the specific claim? Not "fingerprint evidence" but "the examiner says this print is the defendant's to the exclusion of all others." The discipline is not valid or invalid; a claim is supported or not.
- Where does the method sit for that claim? Use F.2. Remember the spectrum is question-specific — odontology for body ID vs. for bite marks lands in two different columns.
- Is the verb honest for that position? "Identifies (with a number)" only at the strong end; "consistent with / cannot exclude" in the middle; "class-level association" for class evidence; "the science does not support this" at the discredited end. A verb stronger than the position is overstatement.
- What is the error rate — and was it measured? "We've never been wrong" is not an error rate. If the black-box studies were never done, that absence is itself a finding (and a Daubert problem — Appendix D).
- Was the method applied validly here? Position is a ceiling. Check contamination, chain of custody, bias exposure, and whether the sample fell within the method's validated envelope. A top-of-spectrum method, misapplied, can produce a bottom-of-spectrum result.
The reflex this appendix exists to build. By the end of the book you should be unable to hear a forensic claim without the spectrum lighting up: Where does this sit? What claim is being made? Does the science support that claim, and was it done right this time? That reflex — not any single position in any table — is what protects an innocent person.
Cross-references: Chapter 1 (class vs. individual characteristics; the validity spectrum introduced); Chapter 6 (NAS 2009, PCAST 2016, foundational validity vs. validity as applied — the spectrum's backbone); Chapters 7–9 and Appendix E (DNA and its statistics, the strong anchor); Chapters 10, 14–24 (the discipline-by-discipline verdicts summarized here); Chapter 30 and Appendix D (how each verdict becomes honest — or dishonest — testimony); Chapter 31 (the cognitive bias that drags the middle of the spectrum downward); Chapter 38 (the reforms that would make these verdicts the courtroom norm).