Case Study 26.1 — The Boston Marathon Bombing: When Video Was Genuinely Probative — and When Crowdsourced "Identification" Went Catastrophically Wrong
Sourcing and tone. The facts below are drawn from the widely documented public record of the April 15, 2013 Boston Marathon bombing and the investigation that followed. The case is used to teach two opposite lessons in a single event: how surveillance and bystander video can be genuinely probative when used carefully, and how informal, crowdsourced "facial identification" can be catastrophically wrong. We treat a mass-casualty atrocity soberly and confine ourselves to documented, public facts.
Background
On April 15, 2013, two homemade bombs detonated near the finish line of the Boston Marathon, killing three people and injuring hundreds. In the hours and days that followed, investigators faced an image-forensics problem of unprecedented scale: the finish-line area had been recorded, at the moment of the attack, by an enormous number of cameras — commercial surveillance systems on storefronts, news and broadcast cameras, and the personal phones and cameras of thousands of spectators who had come to watch the race. The challenge was not a shortage of footage but an ocean of it, almost none of it made to be evidence, and the task was to reconstruct, from many fragmentary viewpoints, who did what, when.
This is the modern condition of §26.1 made vivid: cameras everywhere, footage from devices owned by no one in common, and a discipline whose first job is to recover, organize, and read that footage honestly.
The forensic evidence — video done carefully
The investigation's use of imagery illustrates the legitimate, probative side of this chapter:
- Reconstructing events from many viewpoints. Investigators assembled footage from commercial cameras and the public into a synchronized account of the finish-line area. By cross-referencing time and geometry across independent recordings, they could trace the movements of individuals through the crowd: behavior over many frames, which §26.1 identifies as among the most defensible uses of footage because sequences of action are far harder to misread than a single ambiguous still.
- Identifying suspects from converging footage. Analysts identified two individuals (later determined to be Tamerlan and Dzhokhar Tsarnaev) whose movements and actions, captured across multiple independent cameras, were consistent with placing the devices. Critically, the identification rested not on a single magic frame but on multiple, mutually corroborating recordings showing consistent appearance, clothing, and behavior, supported thereafter by other investigative evidence.
- Public release of images. The FBI eventually released selected surveillance images of the two suspects to the public, asking for help identifying them. This was a controlled, official release of specific frames, distinct from the uncontrolled crowdsourcing that had already gone wrong (below).
What this side of the case teaches is that video evidence, used with discipline, is genuinely powerful: many independent cameras recording the same scene provide redundancy and cross-checking that a single source cannot, and behavior traced across frames supports strong, defensible inferences. This is image forensics at its honest best — convergence of multiple independent recordings, not the television fantasy of a single enhanced frame resolving everything.
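The idea of cross-referencing time across independent recordings can be made concrete with a small sketch. It is illustrative only and is not a description of the tools or procedure investigators actually used. It assumes each camera's clock is wrong by a constant offset, and that one reference event is visible in every recording so the clocks can be aligned to it. The camera labels, times, and the names `Sighting`, `clock_offsets`, `merged_timeline`, and `independent_sources` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    camera: str        # hypothetical camera label
    local_time: float  # seconds on that camera's own (unsynchronized) clock
    note: str          # what the frames show, described as behavior


def clock_offsets(reference_event_times):
    """Offset to add to each camera's local time so that the shared
    reference event lands at t = 0 on a common timeline.
    Assumes a constant offset per camera (no clock drift)."""
    return {cam: -t for cam, t in reference_event_times.items()}


def merged_timeline(sightings, offsets):
    """Place every sighting on the common timeline, earliest first."""
    aligned = [(s.local_time + offsets[s.camera], s.camera, s.note)
               for s in sightings]
    return sorted(aligned)


def independent_sources(timeline, start, end):
    """Cameras that recorded something inside a common-time window.
    More than one camera means the observation is cross-checked."""
    return {cam for t, cam, _ in timeline if start <= t <= end}


# Made-up values: when the same reference event appears on each camera's clock.
reference = {"storefront_A": 100.0, "broadcast_B": 3605.5, "phone_C": 42.0}
sightings = [
    Sighting("phone_C", 2.0, "person walks past the barrier"),
    Sighting("storefront_A", 40.0, "person enters frame from the left"),
    Sighting("broadcast_B", 3550.5, "same clothing, moving in same direction"),
]

timeline = merged_timeline(sightings, clock_offsets(reference))
for t, cam, note in timeline:
    print(f"{t:7.1f}s  {cam:13s} {note}")
print(independent_sources(timeline, -70.0, -50.0))
```

The design point matches the text: the output is a sequence of behavior on one timeline, and the window query reports how many independent cameras support a given moment. Nothing in the sketch identifies anyone; it only organizes footage so that a human analyst can test a hypothesis against several recordings at once.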
The failure — crowdsourced facial "identification"
The same event produced one of the most instructive failures in the short history of image-based identification, and it belongs in this chapter precisely because it shows what §26.2 warns about facial comparison.
In the days after the bombing, before the official suspects were known, amateur sleuths on social media and online forums attempted to "identify" the bombers by scrutinizing publicly available crowd photographs — picking out individuals who looked "suspicious" (carrying backpacks, standing in particular places) and circulating their faces as possible perpetrators. This informal, crowdsourced facial identification produced false accusations of entirely innocent people. Most painfully, online speculation wrongly named a young man, Sunil Tripathi — a student who had gone missing weeks earlier and was in fact already deceased, unconnected to the bombing — as a suspect, subjecting his grieving family to a wave of accusation during the search. Other innocent spectators were similarly misidentified and publicly accused. A prominent newspaper front page even displayed images of innocent men under suggestive framing.
Every one of these misidentifications was a textbook violation of this chapter's cautions: identifying an unknown person from imagery by appearance is error-prone; "looks suspicious" is not evidence; a face in a crowd photo, absent any validated method, supports nothing about guilt. The crowd was doing facial comparison — the contested, thin-validation discipline of §26.2 — with no training, no method, no error-rate awareness, and enormous confirmation bias, and it produced exactly the wrongful identifications the formal discipline warns are possible even with training.
What the evidence did — and didn't — establish
The contrast is the lesson. The footage, used carefully by investigators — synchronized across many independent cameras, read as behavior over time, corroborated by other evidence — was probative and supported the identification of the actual suspects. The same raw material, used carelessly by a crowd — single suspicious-looking stills, no method, no corroboration, runaway confirmation bias — produced false accusations of the innocent.
Note what distinguishes the two uses, because it is the whole of §26.1–§26.2:
- Convergence vs. a single frame. The valid identification rested on multiple independent recordings agreeing; the false ones rested on isolated stills and a hunch.
- Behavior vs. appearance-of-suspicion. The valid use traced what people did across frames; the false use judged who looked suspicious in a single image.
- Method and corroboration vs. neither. The valid use was embedded in a disciplined investigation with other evidence; the false use was untrained pattern-matching amplified by social media.
- Awareness of error vs. false certainty. Investigators treated identification as a hypothesis to be corroborated; the crowd treated a resemblance as a conviction.
Outcome
Dzhokhar Tsarnaev was apprehended and later convicted; Tamerlan Tsarnaev died during the manhunt. The carefully assembled video record was part of a much larger evidentiary picture. The crowdsourced misidentifications, by contrast, harmed innocent people and their families and became a widely cited cautionary tale about online "investigation," prompting reflection at the platforms and news organizations involved about the dangers of amplifying unverified facial accusations.
The lesson
The Boston Marathon case is, for this chapter, a single event that teaches both halves of the discipline at once. Video evidence is genuinely powerful — when many independent recordings converge, when behavior is read across frames rather than guessed from a single still, and when identification is treated as a hypothesis to be corroborated rather than a resemblance to be trusted. And image-based identification of unknown persons is dangerous and error-prone — most dangerous when done by the untrained, on isolated images, under the pressure of a public emergency and the pull of confirmation bias. The crowdsourcing failure is the CSI effect (Theme 4) in its purest form: a public that believes you can identify the guilty by looking hard at a photo, acting on that belief, and ruining innocent lives. The careful investigative use is exclusion over proof and convergence (Themes 1 and 2) done right: no single frame "proved" anything; the case was built from many honest, corroborating pieces.
For the cold case, the parallel is exact. The gas-station CCTV shows "a person consistent with Keller" — and the discipline is to keep it there, corroborated by other evidence, never converted by a confident analyst (or an eager public) into "that is Keller, and no other." The crowd that named Sunil Tripathi made precisely the leap this book exists to prevent.
Discussion questions
- The valid identification rested on multiple independent cameras, while the false accusations rested on isolated stills. Using §26.1, explain why convergence across independent recordings is so much stronger than a single frame, and why "behavior over many frames" is more defensible than "appearance in one."
- The crowdsourced "identification" was a form of untrained facial comparison (§26.2). List three things the crowd lacked that even a trained facial-comparison examiner would have, and explain why the absence of those things made the misidentifications likely.
- Connect the Sunil Tripathi misidentification to the book's caution about the CSI effect (§1.2). What false belief about image evidence drove the public's behavior, and how is it the same belief that makes "zoom-and-enhance" seem plausible?
- The FBI's controlled release of specific suspect images differed from the crowd's uncontrolled circulation of suspects. Why does that distinction matter for accuracy and for the rights of the innocent?
- Compare the Boston video evidence with the cold case's gas-station CCTV. In both, footage supports an identification. What features made the Boston identification defensible, and what would the cold-case footage need (per §26.2) before "consistent with Keller" could honestly become anything stronger?
- Bias tie-in (Chapter 31, previewed). Confirmation bias drove the crowd to "see" guilt in innocent faces. Explain how the same dynamic could distort a trained analyst who is told the suspect's identity before comparing footage, and what safeguard (context management, blind comparison) would reduce it.