Chapter 26: Video and Image Forensics: Enhancement, Authentication, and Deepfake Detection

DataField.Dev

Prerequisites: Chapters 1, 5, 6, 25

Learning Objectives

Explain what camera footage can and cannot establish on its own, and why ubiquity does not equal probative value.
Define photogrammetry and describe how measurements (especially height) are recovered from images — together with the assumptions and error sources that bound the result.
Separate the real, limited operations of image and video enhancement from the 'zoom-and-enhance' fantasy television has sold, and state plainly what enhancement cannot create.
Describe the principal techniques of image and video authentication — including error level analysis and its well-known limits — and explain why authentication is now a more important forensic frontier than enhancement.
Define a deepfake and explain the detection problem as a moving target, distinguishing what current methods can support from what they cannot.
Define provenance and explain how metadata and content-authentication standards establish where an image came from — and why metadata is fragile, connecting to the digital-evidence handling of Chapter 25.
Place each method on the NAS 2009 / PCAST 2016 validity spectrum and state what an honest analyst can say on the stand about footage, a measurement, an enhancement, or an authentication result.

In This Chapter

Overview
Learning Paths
26.1 The ubiquity of cameras; what footage can establish
26.2 Photogrammetry: measuring from images
26.3 The truth about "enhancement"
26.4 Image and video authentication
26.5 Deepfakes and synthetic media
26.6 Provenance and metadata
🗂️ The Case File
Conclusion
Key Terms
Spaced Review

"The camera does not lie, they used to say. The camera never lied less than people thought, and it lies more now than ever." — a working maxim among forensic image analysts [constructed teaching line, after a sentiment common in the discipline].

Overview

A man walks into a convenience store, buys two red plastic gas cans, and walks out. Eleven seconds of color video, time-stamped, from a camera bolted above the register — and suddenly the question is not what happened but what can we honestly prove from this footage. Is the man in the frame the suspect, or merely someone his height and build? How tall is he, really — can we measure it? Is the time stamp trustworthy? And the alibi video the suspect later hands over, the one that supposedly shows him forty miles away that evening: is it what it claims to be, or has its clock been edited?

These are the questions of image forensics, and they are not the questions television has trained you to ask. On screen, an analyst leans toward a monitor, says "enhance," and a four-pixel blur resolves into a license plate, a face, a reflection of the killer in someone's eye. None of that is real. You cannot conjure detail that the sensor never recorded; the information is simply not there, and no algorithm invents it honestly. What is real — and far more important — is the work of measuring what the image genuinely contains, and of authenticating whether the image is what it purports to be. In an age when a convincing fake can be generated on a laptop, authentication has quietly become the central problem of the field, and "enhancement" has become its most persistent myth.

This chapter takes both seriously. We will be honest about what cameras establish (less than you think, and only with care), about what measurement from images can support (real geometry, real error bars), about what enhancement can and cannot do (clarify, never create), and about the frontier where deepfakes and synthetic media are turning authentication from a niche specialty into a discipline every court will soon need. Throughout, we hold the field to the same yardstick as every other: not "does the expert sound certain?" but "what does the method actually support, and how do we know?"

In this chapter, you will learn to:

Say precisely what footage can and cannot establish, and why a camera being present is not the same as its evidence being probative.
Define photogrammetry and recover a measurement from an image while naming every assumption that could make it wrong.
Tell the real, narrow operations of enhancement from the "zoom-and-enhance" fantasy, and state what enhancement can never do.
Describe image authentication and error level analysis, including the limits that make ELA a screening tool, not a verdict.
Define a deepfake and explain why detecting one is a moving target, not a solved problem.
Define provenance, connect metadata to the digital-evidence handling of Chapter 25, and place every method on the validity spectrum.

Learning Paths

🔎 Investigator/CSI: Your job is recovery and preservation: §26.1 and §26.6. A surveillance system overwrites itself on a loop, often within days; the original file, with its metadata intact, is the evidence, and a phone-camera video of a monitor is not. How you obtain, hash, and document footage at the scene sets the ceiling on everything the lab can later do.

🧪 Lab analyst: Weight §26.2–§26.5. Photogrammetric measurement, the honest limits of enhancement, the authentication toolkit, and deepfake detection are the bench work, and §26.2 and §26.3 are where the discipline earns its keep by refusing to claim more than the pixels contain.

⚖️ Law/courtroom: §26.3, §26.4, and §26.5 are where cross-examination lives: the gap between "enhanced" and "fabricated," the limits of ELA, and the looming problem of authenticating evidence in a world of plausible fakes. The Daubert question (Chapter 5) is sharp here because much of this work is new.

👥 General reader/juror: §26.1 and §26.3 are the antidote to the television version. Watch how a real analyst measures and authenticates, and how often the honest answer is "the image is consistent with, but does not prove."

26.1 The ubiquity of cameras; what footage can establish

Begin with the fact that reorganizes the modern crime scene: we are surrounded by cameras. The convenience-store register cam, the doorbell camera on a porch, the dashcam in a parked car, the automatic license-plate reader at an intersection, the body-worn camera on a responding officer, the phone in nearly every pocket, the satellite and the drone overhead. A single afternoon in a city is recorded, in fragments, by dozens of devices owned by no one in common. Image forensics — the application of scientific methods to the analysis, measurement, and authentication of images and video for legal purposes — exists because that ocean of footage now touches nearly every investigation, and because almost none of it was made to be evidence.

That last clause is the whole lesson of this section. Ubiquity is not the same as probative value, and the gap between them is where careless work goes wrong. A camera being present establishes very little by itself; what matters is what the footage actually shows, how reliably it shows it, and whether it is what it claims to be. Let us be concrete about what footage genuinely can establish, in ascending order of difficulty:

That an event occurred at all. The simplest and often strongest use: the footage shows a transaction, a collision, a door opening. The event is on the record.
When it occurred. Time stamps are powerful — and treacherous. A camera's clock is set by a human or a network and can be wrong, drifted, or in the wrong time zone. A time stamp is a claim made by the recording device, to be verified (against the system's settings, a known synchronized event, or metadata), not a fact to be read off the screen.
Where it occurred. The camera's field of view fixes a location, if you can establish what the camera was actually pointed at and that the footage was not relabeled from another device.
What a person or object did. Movements, sequences, interactions — the footage's richest content, and usually its most defensible, because behavior over many frames is hard to misread.
Who a person is. The hardest and most dangerous use. Recognizing a known individual from clear footage is sometimes reasonable; identifying an unknown person by facial comparison is a contested, error-prone discipline we treat with great caution in §26.2 and the case studies.
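Treating a time stamp as a claim to be verified can be sketched in a few lines. The example below assumes the simplest case: at export, the examiner notes the time the recorder displays alongside a trusted reference clock, and the difference is a constant offset (no drift between the incident and the export). The function names and all dates and times are hypothetical.

```python
from datetime import datetime, timedelta

def clock_offset(dvr_displayed: datetime, reference_time: datetime) -> timedelta:
    """Offset to ADD to a DVR time stamp to express it in reference time."""
    return reference_time - dvr_displayed

def corrected_time(stamp: datetime, offset: timedelta) -> datetime:
    """Apply a previously measured offset to a time stamp read from footage."""
    return stamp + offset

# Hypothetical observation at export: the DVR showed 14:03:10 while a
# trusted reference clock read 14:10:40, so the DVR runs 7 min 30 s slow.
offset = clock_offset(datetime(2024, 3, 2, 14, 3, 10),
                      datetime(2024, 3, 2, 14, 10, 40))

# Hypothetical on-screen stamp from the incident footage: 9:47:00 p.m.
event = corrected_time(datetime(2024, 3, 1, 21, 47, 0), offset)
print("offset:", offset)
print("corrected event time:", event)
```

A constant offset is the optimistic case. A clock that drifts, was reset, or sits in the wrong time zone needs more than one comparison point, and the report should say which assumption was made.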

🔬 At the Bench The single most important thing an analyst does with surveillance video is often the least glamorous: secure the original. A digital video recorder (DVR) typically stores footage in a proprietary format and overwrites it on a rolling loop — sometimes after only a few days. The correct procedure is to export the native file, with its embedded metadata and at full resolution, and to compute a hash value (Chapter 25) so the file's integrity can be proved later. What you must not accept as the evidence: a phone video of the monitor, a screen recording, a re-compressed clip emailed by a witness, or a "converted" file in a convenient format. Every one of those is a copy of a copy, stripped of metadata and degraded by re-compression — and each degradation is a door a defense attorney will walk through. The chain of custody for a video begins at the DVR, not at the lab.
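As a minimal sketch of the hashing step, the function below computes a digest of an exported file by reading it in chunks, so a large native export never has to fit in memory. SHA-256 is an assumption made for illustration; a lab's own procedure determines which algorithm is actually used. The function name is hypothetical.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file at `path`.

    The file is opened read-only and consumed in `chunk_size` pieces,
    so the evidence file itself is never modified.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest at export and recomputing it before analysis gives a simple check: if the two values differ, the file in hand is not bit-for-bit the file that was collected.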

Notice how much of this depends on work done before any analysis: recognizing which cameras exist, obtaining native files quickly before they are overwritten, and documenting where each camera pointed. A perfect lab technique cannot rescue footage that was lost to an overwrite loop or reduced to a blurry phone-of-a-screen copy. This is the same lesson as the crime scene in Chapter 2 — the case is often won or lost in collection — transposed to the digital domain of Chapter 25.

🔍 Check Your Understanding
1. A witness hands police a clear cellphone video she shot of a store's security monitor, showing the incident. Why is this not the evidence you want, and what is?
2. A surveillance clip carries a time stamp reading 9:47 p.m. Name two reasons that stamp could be wrong, and state what you would do before relying on it.

There is one more honest caution to plant before we go further. Footage feels like direct evidence — like seeing the event yourself — and that feeling is exactly its danger. A juror who watches a grainy clip experiences a powerful illusion of having witnessed the truth, and that illusion can override the very real uncertainties about identity, timing, and authenticity that the analyst is duty-bound to flag. The CSI effect (Chapter 1, §1.2) has a video dialect: the public believes both that any footage can be magically clarified and that "I saw it on the video" settles a question the footage does not actually settle. Both halves of that belief are wrong, and both are this chapter's business.

26.2 Photogrammetry: measuring from images

A camera is not only a witness; it is, with care, an instrument of measurement. Photogrammetry is the science of obtaining reliable measurements of real-world objects and distances from photographs or video — most famously, in forensics, estimating the height of a person captured by a surveillance camera. The logic is geometric and genuinely sound: an image is a projection of three-dimensional space onto a two-dimensional sensor, and if you can reconstruct the geometry of that projection, you can work backward from positions in the image to dimensions in the world.

The most defensible photogrammetric method is reverse projection (also called scene-reconstruction photogrammetry). The principle is elegant: return to the actual scene, place a measuring reference — a marked pole, a calibrated target — at the exact spot where the person stood in the footage, and record new images from the same camera in the same position. By comparing the unknown person's apparent height in the original frame against the known reference at the same location, the analyst can estimate the person's true height, because both are subject to the identical projection. The camera's distortions, its angle, its lens — all cancel, because the reference experiences them too.

REVERSE-PROJECTION HEIGHT ESTIMATION  (side view — schematic, not to scale)

        camera ▣
          \  · · · · · · · · · · · ·  line of sight to top of head
           \                      ·
            \        suspect      ·  reference pole
             \         │‖         ·   ║ (known height, placed on the
              \        │‖  ........·...║  exact spot in a return visit)
   floor ──────●───────┴┴──────────────╨────────────────
              camera        STAND POINT (same spot, both)
              base

  The suspect's apparent head-top in the original frame is read against the
  reference pole imaged from the SAME camera position. Because both share the
  camera's angle, lens, and perspective, those distortions cancel, and the
  suspect's true height is estimated from the comparison.
  Measurements are ILLUSTRATIVE; a real reconstruction carries exact figures
  and a stated uncertainty range.

Walk through that diagram, because its strength and its fragility live in the same details. The strength: reverse projection is empirical — it does not assume the camera's parameters from a manufacturer's spec sheet; it measures the actual projection by putting a known object through it. Done well, with the original camera undisturbed and a careful return to the exact stand point, height estimates can be quite good, reported as a range (for example, "approximately 178–184 cm") rather than a false-precision point. This is real science with honest error bars, and courts have accepted it.

Now the fragility, and an honest analyst names every source of it:

The stand point must be right. If you place the reference pole even a short distance from where the person actually stood — nearer or farther from the camera — perspective changes the apparent size, and the estimate drifts. Establishing the exact spot from the footage is itself an inference.
Posture corrupts height. People slouch, lean, crane, rise on their toes, or wear shoes and hats of unknown thickness. A person is rarely standing fully erect, as for a stature measurement, at the moment the frame is captured. Footwear and headwear add centimeters the analyst cannot see; posture can subtract them. The honest output is the height of the figure as imaged, reported as a range wide enough to absorb these unknowns.
The camera must be unchanged. If the surveillance camera was moved, replaced, refocused, or its lens swapped between the incident and the reconstruction, the projection no longer matches and the method breaks.
Image quality bounds precision. A blurry, low-resolution, or interlaced frame makes the top of the head and the floor line ambiguous by several pixels, and each pixel of ambiguity is centimeters of uncertainty at distance.
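The arithmetic behind a reported range can be sketched as follows. This is a simplified illustration, not a full reconstruction workflow: it assumes the reference pole carries graduations whose pixel rows have been read from the return-visit image, interpolates linearly between neighboring graduations, and widens the result by a pixel ambiguity and a flat allowance for posture, footwear, and headwear. All names and numbers are hypothetical.

```python
def height_at_row(row, marks):
    """Interpolate a height (cm) for an image row from pole graduations.

    marks: (pixel_row, height_cm) pairs read from the reference-pole image,
    taken from the same camera position as the questioned frame.
    """
    pts = sorted(marks)  # ascending pixel row, i.e. top of the image first
    for (r0, h0), (r1, h1) in zip(pts, pts[1:]):
        if r0 <= row <= r1:
            return h0 + (row - r0) / (r1 - r0) * (h1 - h0)
    raise ValueError("row lies outside the graduated span of the pole")

def height_range(head_row, marks, pixel_ambiguity, allowance_cm):
    """Return (low, high) in cm after widening for the stated unknowns."""
    a = height_at_row(head_row + pixel_ambiguity, marks)
    b = height_at_row(head_row - pixel_ambiguity, marks)
    low, high = sorted((a, b))
    return low - allowance_cm, high + allowance_cm

# Hypothetical graduations: every 10 cm of pole spans 50 pixel rows here.
marks = [(100, 200.0), (150, 190.0), (200, 180.0), (250, 170.0)]

# Head top read at row 185, uncertain by 3 rows, with a 2 cm allowance.
low, high = height_range(185, marks, pixel_ambiguity=3, allowance_cm=2.0)
print(f"approximately {low:.1f} to {high:.1f} cm")
```

The point of the sketch is the shape of the output: a range whose width is traceable to named sources of uncertainty. A real allowance for posture and footwear would have to be justified rather than picked, and the stand-point error is not modeled here at all.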

⚖️ In the Courtroom Photogrammetric height evidence is at its most honest when it excludes and at its most dangerous when it includes. "The figure in the footage is approximately 165–172 cm; the defendant is 188 cm" is a clean, powerful exclusion — a height mismatch that, like a DNA non-match (Chapter 1, §1.6), can clear a suspect. But "the figure is approximately 180–186 cm, and the defendant is 183 cm" does not identify the defendant; it merely fails to exclude him, alongside the very large fraction of adult men who fall in that band. A skilled cross-examination presses exactly here: How many men are within your range? Did you account for the shoes? For posture? Was the camera in the identical position? The evidence's honest verb is "consistent with," and an analyst who lets a jury hear "the defendant is the man in the video" because the heights overlap has stepped from measurement into overstatement.
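The cross-examiner's question, "How many men are within your range?", can be given a rough number. The sketch below models adult male stature as a normal distribution and computes the share of that population falling inside the estimated band. The mean and standard deviation are illustrative placeholders, not survey figures, and the normal model is itself an assumption; a real analysis would cite population data appropriate to the case.

```python
from statistics import NormalDist

# ASSUMED, illustrative population parameters (not measured data).
ASSUMED_MEAN_CM = 176.0
ASSUMED_SD_CM = 7.0

def fraction_within(low_cm, high_cm, mean_cm=ASSUMED_MEAN_CM, sd_cm=ASSUMED_SD_CM):
    """Share of the modeled population whose height lies in [low_cm, high_cm]."""
    dist = NormalDist(mu=mean_cm, sigma=sd_cm)
    return dist.cdf(high_cm) - dist.cdf(low_cm)

def is_excluded(person_cm, low_cm, high_cm):
    """True when a known height falls outside the estimated range."""
    return not (low_cm <= person_cm <= high_cm)

# The two scenarios from the text.
print(is_excluded(188, 165, 172))   # defendant far outside the range
print(is_excluded(183, 180, 186))   # defendant inside the range
print(f"share of modeled population in 180-186 cm: {fraction_within(180, 186):.0%}")
```

Under these assumed parameters, a substantial share of the modeled population falls inside the band, which is why an overlap supports "consistent with" and nothing stronger.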

There is a related, and far more contested, use of images for identity: facial comparison (sometimes "facial mapping"), in which an examiner compares a face in footage to a known person, feature by feature, and offers an opinion on whether they are the same individual. Unlike reverse-projection height estimation, facial comparison from imagery has a thin validation record and a documented history of error; PCAST-style questions about its measured error rate are largely unanswered, and it has contributed to wrongful identifications (see Case Study 26.1). Automated facial recognition — the software that searches a database for candidate matches — is a powerful investigative lead generator but a poor proof: it returns ranked candidates, its accuracy varies sharply with image quality and, troublingly, with the demographic group of the subject, and a candidate is a starting point for investigation, never an identification on its own. We will return to the bias dimension of these tools; for now, hold the distinction: measuring a dimension (height) from an image rests on solid geometry, while identifying a face from an image rests on much shakier ground.

🧠 Cognitive-Bias Watch Identifying a person from an image is exquisitely vulnerable to expectation. When an analyst already knows whom the detective suspects, the comparison stops being a neutral search for agreement and becomes a hunt for confirmation: ambiguous features get read toward the suspect, dissimilarities get explained away as "lighting" or "angle," and a resemblance hardens into a "match." An automated system's ranked candidate makes this worse, not better — it hands the analyst a name to confirm. The danger is not hypothetical or confined to the untrained; it is the same dynamic that, in a database "candidate match," can anchor an entire investigation on the wrong person, and it is most acute exactly when the stakes and the pressure are highest. The safeguard is the one we will name formally in Chapter 31: keep the suspect's identity (and any "this is probably him") away from the examiner until the comparison is fixed, compare against appropriate non-suspects, and treat a database candidate as a lead to be tested, never an identification to be defended. A facial comparison made knowing the answer is worth far less than one made blind — even when the two agree.

🔍 Check Your Understanding
1. Why does reverse-projection photogrammetry "cancel" the camera's lens distortion and angle, where a method that assumed the camera's specifications from a spec sheet would not?
2. A height estimate of "approximately 181–187 cm" matches the defendant. Explain why this is an exclusion tool that has, in this instance, failed to exclude, and not an identification.

26.3 The truth about "enhancement"

Here is the chapter's central myth, and the place where television has done the most damage to public understanding of forensic science. On screen, enhancement is magic: a smear of pixels, a command to "zoom and enhance," and a readable face or license plate materializes from nothing. In reality, enhancement is the set of operations that make information already present in an image easier for a human to perceive — and it can never add information that was not captured in the first place. That sentence is the whole truth of §26.3, and it is worth reading twice, because nearly every abuse of image evidence is a violation of it.

Think about what a digital image is: a grid of pixels, each holding a recorded value. The detail in the image is fixed at the moment of capture by the sensor's resolution, the lens, the focus, the exposure, the motion, and the compression. If the license plate occupies six blurry pixels, those six pixels contain a fixed, finite amount of information — and no processing can manufacture the dozens of pixels of crisp detail that would be needed to read the plate. The information was never recorded. You cannot enhance your way to data that does not exist; you can only make what is there easier to see, and sometimes you cannot even do that.
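A toy example makes the point concrete. In the sketch below, a coarse "sensor" is modeled as averaging neighboring samples (an assumption; real capture also involves optics, noise, and compression). Two different fine-detail scenes produce exactly the same six recorded pixels, so nothing computed from those six pixels alone can say which scene was in front of the camera. The scene values are made up for illustration.

```python
def downsample(values, factor):
    """Average consecutive groups of `factor` samples, like a coarse sensor."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]

# Two DIFFERENT hypothetical scenes, each with twelve samples of fine detail.
scene_a = [10, 30, 50, 70, 20, 40, 90, 10, 60, 60, 0, 80]
scene_b = [30, 10, 70, 50, 40, 20, 10, 90, 50, 70, 80, 0]

pixels_a = downsample(scene_a, 2)
pixels_b = downsample(scene_b, 2)

print(pixels_a)
print(pixels_b)
print("scenes differ:", scene_a != scene_b)
print("recorded pixels identical:", pixels_a == pixels_b)
```

Any process that turns the six recorded values back into twelve must choose among the scenes consistent with them, and that choice comes from the process, not from the recording.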

What, then, can legitimate enhancement actually do? Real, validated operations include:

Brightness and contrast adjustment — making detail in shadows or blown-out highlights perceptible by remapping the existing tonal values.
Sharpening — increasing local contrast at edges so that boundaries already present become easier for the eye to resolve. (Over-sharpening creates artifacts that can be mistaken for detail — a trap.)
Noise reduction — averaging or filtering to suppress random sensor noise so the underlying signal stands out.
Frame averaging / integration — combining multiple frames of the same static scene to reduce noise, which genuinely recovers a cleaner image because the signal is consistent across frames while the noise is not. This is one of the few techniques that legitimately increases usable detail, and only for stationary subjects.
Geometric correction — undoing known lens distortion (e.g., fisheye warping) using measured parameters, so shapes are rendered truthfully.
Color and de-interlacing corrections — repairing artifacts of the recording format so the existing content is displayed faithfully.
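Frame averaging is the one item in the list above whose benefit is easy to demonstrate numerically. The sketch below simulates a static one-dimensional "scene" (a faint edge), adds independent random noise to each of sixteen frames, and compares the error of a single frame with the error of the average. The scene, noise level, and frame count are arbitrary illustrative choices, and real footage adds complications (motion, compression, correlated noise) that this toy omits.

```python
import random

def noisy_frame(signal, noise_sd, rng):
    """One capture of a static scene: the true signal plus random noise."""
    return [s + rng.gauss(0, noise_sd) for s in signal]

def average_frames(frames):
    """Pixel-by-pixel mean across frames of the same static scene."""
    return [sum(px) / len(frames) for px in zip(*frames)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)                      # fixed seed so the run is repeatable
signal = [100.0] * 50 + [140.0] * 50        # a faint edge in a static scene
frames = [noisy_frame(signal, 20.0, rng) for _ in range(16)]

single_err = rms_error(frames[0], signal)
avg_err = rms_error(average_frames(frames), signal)
print(f"single frame RMS error:  {single_err:.1f}")
print(f"16-frame average error:  {avg_err:.1f}")
```

The averaged error is markedly smaller because the signal repeats in every frame while the noise does not. If the subject moves between frames, that premise fails and averaging blurs instead of clarifying.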

Every one of these operates on information the image already holds. None creates a face, a plate, or a feature that the sensor did not record. And the discipline of honest enhancement is twofold: the analyst must (1) document every operation so that another examiner, given the original and the steps, can reproduce the result, and (2) preserve the unaltered original and work only from a copy of it, never from the original itself.

⚠️ Junk-Science Alert The "zoom-and-enhance" fantasy is not harmless entertainment; it has a courtroom cost in both directions (the CSI effect, §1.2). In one direction, jurors expect that any blurry footage could have been clarified, and they discount a real case when the police "didn't bother" to produce an impossible super-resolution image. In the other — the dangerous one — an over-eager analyst applies aggressive sharpening, interpolation, or "super-resolution" software and presents the output as a faithful rendering of what was recorded, when in fact the algorithm has invented plausible detail. Modern machine-learning "upscalers" are the sharp edge of this danger: a neural network trained on millions of faces can produce a sharp, confident, entirely fabricated face from a blur — a face that looks like a real person and is not. It is hallucinated detail, statistically plausible and forensically worthless, because it reflects the model's training data, not the crime scene's photons. The honest rule: enhancement reveals; it never invents. If a process adds detail that was not in the original capture, its output is not evidence of what happened — it is an illustration of what the software guessed. Any "enhanced" image whose steps cannot be reproduced from the original, or that contains detail finer than the original resolution could support, should be treated as argument, not evidence.

So when is enhancement genuinely useful? When the needed information is in the image but hard to see — detail lurking in shadow, a face washed out by glare, an edge buried in noise, a plate legible at full resolution but overlooked. In those cases, honest, documented enhancement is a real service, and it earns a real place on the validity spectrum: the underlying operations are well-understood signal processing, the results are reproducible, and the limits are knowable. The method is sound. The trouble is never the brightness slider; it is the temptation to cross from revealing into inventing, and to let a jury believe the camera saw more than it did.

🔬 Read the Evidence

FIGURE 26.1: "Two enhancements of the same blur" [constructed teaching example]

THE ITEM: A six-pixel-wide blur in a surveillance frame where a license plate should be. Two processed versions are offered: (A) brightness/contrast and noise-reduction applied, leaving the plate still unreadable but with two characters faintly discernible; (B) a machine-learning "super-resolution" output showing a crisp, fully legible plate number.

THE CONTEXT: Same original file, same analyst, two processing pipelines, both documented.

WHAT IT SHOWS: (A) reveals the limited information actually present: two characters can be read, the rest cannot. (B) displays a sharp plate, but the original six pixels could not contain that many characters of real detail.

WHAT IT DOESN'T: (B) does NOT show what the camera recorded; it shows what a model trained on other plates generated to fill the gap. The crisp characters in (B) are not measurements of the scene; they are a plausible guess.

THE INFERENCE: (A) is admissible enhancement: it reveals the real, partial information ("two characters are consistent with 3 and 8; the remainder is not legible"). (B) is a fabrication wearing the costume of evidence and should be excluded as such.

THE LESSON: Enhancement reveals what the sensor captured and stops there. The moment an output contains more detail than the original resolution can support, it has crossed from science into illustration, and a confident, legible fake is far more dangerous than an honest, partial blur.

26.4 Image and video authentication

If enhancement is the field's great myth, authentication is its real and growing job. Image authentication is the process of determining whether an image or video is what it purports to be — genuine and unaltered, captured by the device and at the time and place claimed — or whether it has been manipulated, fabricated, or mislabeled. As editing tools have become trivial to use and synthetic media has become convincing, this question has moved from the margins of the discipline to its center. Increasingly, the forensic problem is not "what does this footage show?" but "is this footage real at all?"

Authentication draws on several families of technique, and the honest analyst treats them as converging indicators rather than single-test verdicts — because every one of them, used alone, can mislead.

Metadata and container analysis. Every digital image and video file carries embedded data — the provenance trail we cover in §26.6 — describing the capturing device, settings, timestamps, and editing history, along with structural features of the file format itself. Inconsistencies (a JPEG whose internal structure does not match any camera that claims to have made it, a file "last modified" by photo-editing software, a video whose container metadata contradicts its claimed source) are red flags. Metadata is powerful when present and intact — and, crucially, easily stripped or forged, so its absence proves nothing and its presence must itself be authenticated.

Compression and format forensics. Most images and videos are stored with lossy compression (JPEG for stills, various codecs for video), which leaves characteristic, regular patterns. When part of an image is edited and the file is re-saved, the manipulated region often carries a different compression history than its surroundings — a discontinuity that analysis can sometimes detect. Doubly-compressed regions, mismatched JPEG quantization, and broken block-grid alignment are classic signatures of tampering.

Error level analysis (ELA). Because it is so often invoked — and so often misunderstood — error level analysis deserves its own treatment. ELA re-saves a JPEG image at a known compression level and examines the difference between the original and the re-saved version. The reasoning: areas of an image that have been edited and re-saved tend to have a different error level (they compress differently) than areas that have only ever been saved once, so a manipulated region can appear with a distinct brightness in the ELA map. ELA can, in favorable cases, draw the eye to a pasted-in object or a cloned region.

ERROR LEVEL ANALYSIS — what it actually does  (schematic)

  ORIGINAL JPEG  ──► re-save at known quality ──► RE-SAVED JPEG
        │                                              │
        └──────────────►  pixel-by-pixel DIFFERENCE  ◄─┘
                                  │
                                  ▼
                          ELA MAP (brightness = size of error)
        ┌───────────────────────────────────────────┐
        │  · · · · · · · · · ·  uniform (single save)│
        │  · · · ███ · · · · ·  ← a region with a    │
        │  · · · ███ · · · · ·    DIFFERENT error     │
        │  · · · · · · · · · ·    level: possible edit │
        └───────────────────────────────────────────┘

  A bright, anomalous region MAY indicate manipulation — or may be an edge,
  a high-contrast area, or text, all of which produce high error naturally.
  ELA is a SCREENING hint that something deserves a closer look, not proof.

🔬 At the Bench Read that ELA map with discipline, because ELA is the single most over-interpreted tool in amateur image forensics. Bright regions in an ELA map are routinely produced by perfectly innocent features — sharp edges, high-contrast boundaries, overlaid text, and areas of fine texture all compress differently and "light up" — so a bright spot is not a manipulation flag on its own. ELA also fails completely on images that have been re-saved many times (the differences wash out), on formats other than JPEG, and on screenshots. It cannot detect a manipulation that was done and then flattened by a single uniform re-compression. The honest use of ELA is as a quick screening hint that points an analyst toward a region to examine with other methods — never as a standalone verdict. An expert who projects a glowing ELA map and says "the bright area proves this was Photoshopped" is misusing the tool, and a competent cross-examination will dismantle the claim by showing the same glow on the unedited edges of the image.
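The logic of ELA, and one of its blind spots, can be shown with a deliberately simplified model. In the sketch below, a "lossy save" is just rounding each value to a multiple of a fixed step. That is a stand-in assumed for illustration: real JPEG compression quantizes frequency coefficients in 8×8 blocks, and real ELA usually re-saves at a quality different from the original. The toy also cannot reproduce the innocent bright regions at edges and text described above. All values are hypothetical.

```python
def save(values, step):
    """Toy 'lossy save': snap every value to the nearest multiple of `step`."""
    return [step * round(v / step) for v in values]

def ela_map(values, step):
    """Re-save at a known step and report the per-sample absolute difference."""
    resaved = save(values, step)
    return [abs(v - r) for v, r in zip(values, resaved)]

STEP = 8

# A row of pixel values that has been through one lossy save.
original = save([13, 37, 61, 90, 122, 150, 171, 203], STEP)

# An edit: two values pasted in that never went through that save.
edited = list(original)
edited[3:5] = [93, 117]

print("ELA on edited row:   ", ela_map(edited, STEP))

# The forger re-saves the whole row once, uniformly.
flattened = save(edited, STEP)
print("ELA after re-saving: ", ela_map(flattened, STEP))
```

In this toy the pasted samples stand out in the first map and vanish from the second. That is the toy's version of an edit hidden by a single uniform re-compression: the altered values are still in the row, but the error-level signal that pointed to them is gone.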

Physics and content consistency. Some of the most robust authentication checks come from physical impossibility. Do the shadows in the image all fall consistently with a single light source, or does a pasted-in object cast its shadow the wrong way? Are the reflections in eyes, windows, and water geometrically coherent? Do perspective and vanishing lines agree across the scene? Is the noise pattern uniform, or does one region carry the sensor-noise fingerprint of a different camera? These checks are powerful precisely because a forger must get every physical detail right, and small inconsistencies betray a composite. They require expertise and careful reasoning, but they rest on real physics rather than on a single fragile statistic.

Source-device identification. Every camera sensor has tiny manufacturing imperfections that impose a faint, consistent noise pattern — a kind of sensor "fingerprint" (often called photo-response non-uniformity) — on every image it takes. In favorable conditions, this pattern can link an image to the specific device that captured it, or reveal that a region of an image came from a different device than the rest. The technique is real and can be strong, but it depends on image quality and on having the candidate device (or enough reference images from it) for comparison.
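The comparison at the heart of sensor-fingerprint matching is a correlation test, sketched here in toy form. The model is an assumption made for illustration: each camera's fingerprint is a fixed random pattern, and a questioned image's noise residual is that pattern plus much stronger random noise. Real photo-response non-uniformity is multiplicative, is extracted with a denoising filter, and is estimated from many reference images; none of that is reproduced here. All names and parameters are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def noise_residual(fingerprint, rng, random_noise_sd=3.0):
    """Toy residual: the sensor's fixed pattern buried in random noise."""
    return [f + rng.gauss(0, random_noise_sd) for f in fingerprint]

rng = random.Random(1)                       # fixed seed for repeatability
N = 4000                                     # number of pixels in the toy
fingerprint_a = [rng.gauss(0, 1) for _ in range(N)]   # candidate camera A
fingerprint_b = [rng.gauss(0, 1) for _ in range(N)]   # unrelated camera B

questioned = noise_residual(fingerprint_a, rng)        # actually shot on A

same = pearson(questioned, fingerprint_a)
other = pearson(questioned, fingerprint_b)
print(f"correlation with camera A: {same:.3f}")
print(f"correlation with camera B: {other:.3f}")
```

Even with the pattern buried in noise three times its strength, the correlation with the true source sits well above the near-zero value for the unrelated device. With fewer pixels or heavier degradation the two values move closer together, which is the toy's version of the dependence on image quality noted above.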

The discipline's honest posture, then, is convergence under uncertainty. No single test authenticates an image; a defensible conclusion is built from several independent indicators that agree, with each indicator's limits stated. And the field is in an arms race: as detection improves, manipulation improves to evade it, which is exactly why authentication is the frontier and why the next section's problem — synthetic media — is so hard.

🔍 Check Your Understanding
1. An ELA map shows a bright rectangle around some text overlaid on a photo. Why is concluding "the text region was manipulated" an error, and what does the bright region more likely reflect?
2. Why does the absence of editing-software metadata in a file's header fail to establish that the file is authentic?

26.5 Deepfakes and synthetic media

We arrive at the problem that is changing the field fastest. A deepfake is synthetic or manipulated media — most often video or audio — generated or altered by machine-learning techniques (commonly deep neural networks, from which the term takes its name) to depict a real person saying or doing something they did not say or do, with a realism that can be difficult or impossible to detect by eye. The category is broader than face-swapping: it includes fully synthesized faces of people who do not exist, voice cloning that reproduces a specific person's speech, lip-sync alterations that change the words a real recording shows someone speaking, and "puppeteering" that drives one person's face with another's expressions. Alongside the sophisticated deepfake sits the humbler cheapfake — media manipulated with simple, non-AI edits (slowing footage to feign intoxication, mislabeling a real video as a different event, selective cropping) — which is cruder but, because it is so easy to make and share, does enormous real-world damage.

For forensic science, deepfakes pose two distinct threats, and it is worth separating them:

Fabricated inculpatory evidence. A convincing fake could place an innocent person at a scene, in an act, or in a confession that never happened. This is the nightmare the discipline is racing to defend against.
The "liar's dividend." Subtler and arguably more corrosive: once the public knows that convincing fakes exist, genuine evidence can be dismissed as a possible fake. A real, incriminating video can be waved away — "that's probably a deepfake" — and the mere existence of the technology lends that dodge plausibility. Authentic evidence loses some of its power simply because fakery has become conceivable.

How is a deepfake detected? The honest answer is: imperfectly, and with methods that age quickly. Detection approaches fall into a few families:

Artifact detection. Early and current deepfakes often leave subtle tells: inconsistent or absent eye-blinking, unnatural eye reflections that do not match the scene's lighting, mismatched lighting on the face versus the background, irregular blending at the boundary of a swapped face, teeth or hair rendered as a smear, mouth movements that do not perfectly align with the audio (the lip-sync problem), or physiological signals (like the faint color change of blood flow in real skin) that synthesis fails to reproduce. These artifacts are real, but each one is a moving target: as soon as a tell is published, the next generation of models is trained to eliminate it.
Machine-learning classifiers. Detectors are themselves neural networks, trained to distinguish real from synthetic media. They can perform well on the kinds of fakes they were trained on but degrade sharply on novel generation methods or on media that has been compressed and re-shared (as social-media footage almost always is). A detector that scores 99% on a benchmark may fail on a fake made with a method released the month after the detector was trained.
Provenance and authentication-at-capture. The most promising long-run defense is not detection at all but provenance (§26.6): cryptographically signing media at the moment of capture and tracking every subsequent edit, so that genuine media can prove its lineage rather than relying on after-the-fact fake-spotting. We return to this in the next section.

⚖️ In the Courtroom The legal system has not yet metabolized deepfakes, and that is itself the most important fact for a court. There is, as yet, no broadly validated, standardized, error-rate-characterized method for proving a piece of media is a deepfake to the standard Daubert (Chapter 5) demands — the detection methods are new, fast-changing, and not yet backed by the kind of independent, well-designed validation studies PCAST (Chapter 6) made the benchmark for feature-comparison methods. The honest expert says exactly that: "current detection methods can identify certain synthetic-media signatures and can flag anomalies consistent with manipulation, but a clean detector result does not prove a video is authentic, and the field's error rates on novel fakes are not well characterized." Expect authentication and provenance — establishing what a piece of media is and where it came from — to become routine pretrial work, and expect the burden to shift toward parties being able to demonstrate a chain of custody and provenance for the media they offer. A juror's instinct that "video is proof" is exactly the instinct this technology breaks.

Where does deepfake detection sit on the validity spectrum? At present, frankly, in motion and toward the unsettled end: a genuinely important problem, with real and improving techniques, but without the stable, validated, error-rate-characterized foundation that would let an analyst testify to a conclusion with quantified confidence. This is not a counsel of despair — provenance approaches are promising, and artifact and classifier methods have real investigative value — but it is a counsel of honesty. The field is building the airplane in flight, and a witness who claims settled certainty about deepfake detection is claiming more than the science currently supports. Hold synthetic-media evidence to the same yardstick as everything else, and be especially wary when the stakes are high and the method is new.

26.6 Provenance and metadata

The thread that runs through this whole chapter — and connects it back to Chapter 25 — is provenance: the documented origin and history of a digital file, the record of where it came from, what device created it, and what has been done to it since. If authentication asks "is this what it claims to be?", provenance is the evidence that answers the question. And in a world of plausible fakes, provenance is shifting from a nice-to-have to the foundation of trustworthy media evidence.

The traditional carrier of provenance is metadata — data embedded in or attached to a file that describes the file rather than its visible content. For images, the most common standard is EXIF (Exchangeable Image File Format), which can record the camera make and model, the date and time, exposure settings, and — on many phones — GPS coordinates. Video and other formats carry analogous container and stream metadata. When intact and genuine, metadata is investigative gold: it can place a photo at a location and time, link it to a specific device, and reveal an editing history (for instance, a "software" field naming the photo editor that last touched the file).

But — and this is the recurring caution, the same one we met in Chapter 25 — metadata is fragile and forgeable. It is trivially stripped (uploading to most social-media platforms removes EXIF automatically; many messaging apps re-compress and strip), it is editable with free tools (timestamps and GPS coordinates can be rewritten), and its mere presence does not prove its truth. So metadata must itself be authenticated, and its absence must not be over-read: a photo with no EXIF is not thereby fake, and a photo with EXIF claiming a time and place has only made a claim that needs corroboration.

🔬 At the Bench A worked example of reading metadata honestly. Suppose a suspect offers an alibi photo whose EXIF says it was taken at 8:14 p.m. on the relevant evening, at GPS coordinates forty miles from the scene. What does this establish? By itself, less than it appears. The EXIF timestamp reflects the camera's clock, which could have been set wrong (deliberately or not); the GPS field, if the device even recorded it, can be edited after the fact; and the file may have been re-saved in a way that altered or fabricated the metadata. The honest analyst does not declare the alibi confirmed or debunked from the EXIF alone. Instead they ask: Is the metadata internally consistent (do the timestamp, the file's modification times, the device fields, and the image content all agree)? Does it match the device's actual settings? Is there independent corroboration (cell-site data from Chapter 25, a transaction, a second device)? Inconsistency within the metadata (a "date taken" that postdates the "date modified," a software field showing an editor, a timestamp that contradicts the file's other timestamps) is a strong red flag that something has been altered. Consistency, by contrast, is necessary but not sufficient: it is what a genuine file would show, but because it fails to exclude tampering, it does not prove authenticity.

That asymmetry is the section's lesson, and it is the book's lesson again (Chapter 1, §1.6): metadata inconsistency can undercut a claim cleanly, while metadata consistency only fails to exclude tampering. An alibi video whose internal timestamps contradict each other has a real problem; an alibi video whose metadata all agrees has merely passed one test among many.
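The asymmetry can be made concrete with a small sketch. It assumes the metadata has already been extracted into a plain dictionary; the keys (`date_taken`, `date_modified`, `software`), the list of editor names, and the dates are all hypothetical illustrations, not real EXIF tag names or a vetted checklist. The point is the shape of the output: an inconsistency produces a red flag, while a clean pass produces only "consistent, not authenticated," and missing metadata produces no conclusion at all.

```python
from datetime import datetime

# Illustrative substrings only; a real examination would not rely on a fixed list.
EDITOR_HINTS = ("photoshop", "gimp", "lightroom")

def metadata_red_flags(meta):
    """Return a list of internal inconsistencies found in a metadata dict."""
    flags = []
    taken = meta.get("date_taken")
    modified = meta.get("date_modified")
    # A file cannot have been last modified before it was created.
    if taken is not None and modified is not None and taken > modified:
        flags.append("date taken postdates date modified")
    software = (meta.get("software") or "").lower()
    if any(hint in software for hint in EDITOR_HINTS):
        flags.append("software field names an editor")
    return flags

def assess(meta):
    """Three outcomes, none of which is 'authentic'."""
    if not meta:
        # Stripped metadata is common and proves nothing either way.
        return "absent", []
    flags = metadata_red_flags(meta)
    if flags:
        return "red flag", flags
    # Passing these checks fails to exclude tampering; corroborate independently.
    return "consistent, not authenticated", []

# Made-up example records.
suspicious = {
    "date_taken": datetime(2024, 10, 14, 20, 14),
    "date_modified": datetime(2024, 10, 12, 9, 30),
    "software": "Adobe Photoshop",
}
clean = {
    "date_taken": datetime(2024, 10, 14, 20, 14),
    "date_modified": datetime(2024, 10, 14, 20, 14),
    "software": "",
}

for label, record in (("suspicious", suspicious), ("clean", clean), ("stripped", {})):
    print(label, "->", assess(record))
```

Note that no branch returns "authentic." That is deliberate: the strongest positive result these checks can give is that the file has passed one test among many.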

Because metadata is so fragile, the field is moving toward something stronger: content provenance standards that bind authenticity to the file cryptographically. The most prominent effort, the C2PA (Coalition for Content Provenance and Authenticity) standard, defines a tamper-evident way to attach a signed record of a file's origin and edit history — "content credentials" — so that a viewer (or a court) can verify the chain from capture forward and detect if it has been broken. The vision is an authentication-at-capture model: a camera cryptographically signs an image when it is taken, and every subsequent edit is logged in a verifiable manifest. If widely adopted, this would shift media authentication from the losing game of detecting fakes after the fact to the winnable game of proving genuine media's lineage — the same logic by which hashing (Chapter 25) proves a file has not changed. It is not yet universal, and unsigned media will exist for the foreseeable future, but it points toward where trustworthy image evidence is headed.
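The logic of a signed, tamper-evident edit history can be sketched with the standard library alone. This is emphatically not the C2PA format: it is a toy manifest in which each entry records an action, a hash of the content after that action, and a link to the previous entry, all under a signature. As a simplifying assumption, a shared-key HMAC stands in for the certificate-based public-key signatures a real content-credential system would use, and the key, byte strings, and function names are invented for illustration.

```python
import hashlib
import hmac
import json

def sha256_hex(data: bytes) -> str:
    """Content hash: changes if even one byte of the file changes."""
    return hashlib.sha256(data).hexdigest()

def _sign(key: bytes, payload: dict) -> str:
    # HMAC is a stand-in here; real systems sign with a device's private key.
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def append_entry(manifest, key, action, content: bytes):
    """Return a new manifest with one more signed entry chained to the last."""
    prev = manifest[-1]["signature"] if manifest else ""
    payload = {"action": action, "content_hash": sha256_hex(content), "prev": prev}
    return manifest + [dict(payload, signature=_sign(key, payload))]

def verify(manifest, key, content: bytes) -> bool:
    """Check every signature, every chain link, and the final content hash."""
    prev = ""
    for entry in manifest:
        payload = {
            "action": entry["action"],
            "content_hash": entry["content_hash"],
            "prev": entry["prev"],
        }
        if entry["prev"] != prev:
            return False  # chain broken: an entry was removed or reordered
        if not hmac.compare_digest(entry["signature"], _sign(key, payload)):
            return False  # entry altered after signing
        prev = entry["signature"]
    return bool(manifest) and manifest[-1]["content_hash"] == sha256_hex(content)

# Made-up demonstration data.
key = b"demo-device-key"
original = b"raw sensor bytes"
cropped = b"raw sensor"

manifest = append_entry([], key, "capture", original)
manifest = append_entry(manifest, key, "crop", cropped)

ok_genuine = verify(manifest, key, cropped)
ok_swapped_file = verify(manifest, key, b"different pixels")
rewritten = [dict(manifest[0], action="screenshot"), manifest[1]]
ok_rewritten_history = verify(rewritten, key, cropped)

print(ok_genuine, ok_swapped_file, ok_rewritten_history)
```

The design choice worth noticing is that verification never asks whether the content looks fake. It asks only whether the recorded lineage is intact and matches the bytes in hand, which is the same question hashing answers in Chapter 25, extended across a history of edits.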

🔍 Check Your Understanding
1. A photo's EXIF data shows a "date taken" of October 14 but a "date modified" that is earlier. Why is this a red flag, and what does it suggest?
2. Explain why content-provenance signing (proving a file's lineage from capture) is a more durable strategy than after-the-fact deepfake detection.

🗂️ The Case File

Three pieces of imagery, three different strengths. By this point the Mill Creek investigation had already overturned its founding assumption — the autopsy established that Marcus Diallo was dead before the fire (Chapter 11), the fire was incendiary on valid grounds (Chapter 22), and the digital trail had begun to strain Roy Keller's account (Chapter 25). This chapter adds the imagery, and — as always — the discipline is in stating each piece at its true strength and no higher.

The gas-station CCTV. Investigators recovered the native surveillance file (exported from the DVR, hashed, full resolution; not a phone-of-a-monitor copy) from a fuel station, covering the relevant dates a few days before the fire. The footage shows a person purchasing two red plastic gas cans at the register. Reverse-projection photogrammetry, performed by returning to the store and imaging a calibrated reference at the customer's stand point with the same camera, estimated the figure's height as a range consistent with Keller's stature; the figure's build and clothing are likewise consistent. State it honestly: the footage shows a person consistent with Keller buying gas cans; it does not, by photogrammetry and appearance alone, identify him to the exclusion of every other person of similar height and build. The honest verb is consistent with. (Note the link to earlier evidence: gasoline was the confirmed accelerant, Chapters 21–23, which is what makes a gas-can purchase relevant rather than incidental.)

The doorbell camera. A residential security/doorbell camera along a plausible route added a separate sighting of a vehicle and figure consistent with the CCTV — a second, independent fragment of imagery placing a consistent person and vehicle in the relevant window. Again: corroboration of presence and timing, not identification.

The alibi video. Keller offered a video he said showed him forty miles away that evening. On examination, the file's metadata is internally inconsistent — the timestamps do not cohere, and the provenance trail does not support the claimed time of recording. This undercuts the alibi: the video's claim to fix Keller's location that evening is questionable, and an inconsistent provenance trail is a real problem (§26.6). But note the asymmetry honestly — showing the alibi video is unreliable removes a point in Keller's favor; it does not, by itself, prove he was at the cabin.

Running status — corroboration, not proof. Taken together, the imagery corroborates purchase and presence: a person consistent with Keller buying gas cans, a consistent sighting en route, and an alibi whose provenance does not hold up. This converges with the digital, soil, and DNA threads already in the file — but it remains corroboration. Photogrammetry and appearance support "consistent with Keller," not "this is Keller and no other"; the broken alibi subtracts a defense, but adds no direct proof of the killing. Log each piece in the workbook (Appendix I) at its true strength: consistent with for identity, questionable provenance for the alibi, corroborates for the whole. Resist converting "consistent with" into "identified." The case is tightening, but the science here adds bricks, not a verdict. It is a capital mistake to theorize before one has data — and the data here say "consistent," not "certain."

Conclusion

Cameras are everywhere, and almost none of their footage was made to be evidence — which is exactly why image forensics is a discipline of careful measurement and patient authentication rather than the magic television advertises. We have separated the field's real powers from its central myth. Footage can establish that an event happened, when (with verified timing), where, and what was done — and can sometimes exclude a suspect cleanly on height or appearance — but it identifies an unknown person only with great difficulty and great caution. Photogrammetry recovers genuine measurements from images through sound geometry, strongest as an exclusion tool and honest only when reported as a range with its assumptions named. Enhancement reveals information already captured and can never create what the sensor did not record; the "zoom-and-enhance" fantasy, and its modern, dangerous cousin the machine-learning "upscaler" that hallucinates plausible detail, are the chapter's chief junk-science warning. Authentication — built from converging indicators (metadata, compression and format forensics, the often-overstated error level analysis, physics-and-content consistency, and source-device fingerprints), no one of them a verdict alone — has become the field's real frontier, and deepfakes are pushing it forward faster than the law can follow, with detection methods that are real but unsettled and not yet validated to Daubert's standard. Underlying all of it is provenance: the file's documented lineage, carried fragilely by metadata and, increasingly, durably by cryptographic content-authentication standards.

On the validity spectrum, the chapter spreads out honestly: documented photogrammetric measurement and well-understood enhancement operations rest on solid ground; ELA and facial comparison are screening hints and contested opinions, not proofs; deepfake detection is a moving, unsettled frontier. The through-line is the book's: state what the image supports, not what it would be satisfying to claim; reach for exclusion and consistent with before identifies; and remember that the more a piece of media feels like direct truth, the more carefully it must be authenticated. In the next chapter we follow a different kind of trail — not light captured by sensors, but money moved through accounts — as forensic accounting asks the oldest investigative question of all: who benefits?

Key Terms

Image forensics — the application of scientific methods to the analysis, measurement, and authentication of images and video for legal purposes.
Photogrammetry — the science of obtaining reliable real-world measurements (especially a person's height) from photographs or video by reconstructing the geometry of the camera's projection; most defensibly via reverse projection.
Image authentication — the process of determining whether an image or video is genuine and unaltered and is what it purports to be (captured by the claimed device, time, and place), or has been manipulated, fabricated, or mislabeled.
Error level analysis (ELA) — an authentication screening technique that re-saves a JPEG at a known compression level and maps the difference, on the theory that edited regions compress differently; a hint toward areas to examine, not a standalone proof of manipulation.
Deepfake — synthetic or manipulated media generated or altered by machine-learning techniques to depict a real person saying or doing something they did not, with realism that can be difficult to detect.
Enhancement (limits) — operations that make information already present in an image easier to perceive (brightness, contrast, sharpening, noise reduction, frame averaging, geometric correction); it can clarify what was captured but can never add detail the sensor did not record.
Provenance — the documented origin and history of a digital file (its source device, capture time and place, and edit history), carried fragilely by metadata and, increasingly, durably by cryptographic content-authentication standards.

Spaced Review

A surveillance clip is offered as a phone recording of a security monitor, with a time stamp reading 10:02 p.m. Name two distinct integrity problems with this evidence (one about the copy, one about the time stamp) and state what an investigator should have obtained instead. (§26.1)
Reverse-projection photogrammetry estimates a figure at "approximately 168–174 cm," and the suspect is 190 cm. Is this an exclusion or an inclusion, and why is it the stronger of the two kinds of statement this method can make? Connect it to the exclusion-over-proof idea from Chapter 1. (§26.2; Chapter 1, §1.6)
Explain why a machine-learning "super-resolution" image that renders a crisp license plate from a six-pixel blur is not evidence of the plate number, using the one-sentence rule about what enhancement can and cannot do. (§26.3)
The metadata of a digital file connects this chapter to Chapter 25. State the recurring caution both chapters make about metadata, and give one reason its absence proves nothing and its presence must be authenticated. (§26.6; Chapter 25)
Validity-spectrum question. Where does documented reverse-projection photogrammetry sit relative to deepfake detection on the NAS 2009 / PCAST 2016 spectrum, and what specific feature — present in the first, absent in the second — accounts for the difference? (§26.2, §26.5; and the spectrum from Chapter 1)