Prerequisites
- Chapter 29
- Chapter 31
- Chapter 32
Learning Objectives
- Explain in plain terms what a generative image model does, where its pixels come from, and why that makes it categorically different from a camera.
- Distinguish AI-assisted editing (denoise, upscale, masking) from generative editing (fill, extend, replace) and locate any given edit on the capture-to-conjured spectrum.
- Apply the enhancement/manipulation test from Chapter 29 to AI tools and decide, per use, what your genre and your conscience permit.
- Disclose AI involvement honestly using plain language and embedded content credentials, and read the provenance of an image you are handed.
- Articulate where the photographer's contribution remains irreplaceable, and write a stated, defensible personal stance on AI in your own work.
In This Chapter
- Overview
- Learning Paths
- 33.1 What generative image models do (and don't)
- 33.2 AI in the editing pipeline: denoise, upscale, generative fill
- 33.3 The line between editing and generating
- 33.4 Disclosure, labeling, and trust
- 33.5 Copyright, training data, and the open questions
- 33.6 Where the photographer adds irreplaceable value
- Portfolio Checkpoint
- Summary
- Spaced Review
- What's Next
Chapter 33: AI and Generative Imaging: Tools, Disclosure, and What a Photograph Now Means
"The camera is an instrument that teaches people how to see without a camera." — attributed to Dorothea Lange
Overview
Here is a scene you can run in your own head. You photograph a quiet beach at dawn — long, clean, beautiful, exactly the light you waited an hour for. There is one problem: a red plastic bucket someone left near the waterline, dead center, glowing in the soft light, ruining the emptiness you came for. Three years ago you had two honest choices: walk down and move the bucket, or live with it. Today your phone offers a third. You circle the bucket with your finger, the software thinks for two seconds, and the bucket is gone — replaced by sand and foam that were never there, invented to match what the model guessed should be underneath. The photograph now shows an empty beach that, in the moment you pressed the shutter, was not empty.
Is that still a photograph? Is it a lie? Does it matter that no one will ever know? Those three questions are the whole of this chapter, and they do not have tidy answers — but you cannot work honestly in this decade without having thought them through and arrived at a stance you can state and defend.
This is the one chapter in the book where the technology genuinely threatens to change what the word "photograph" means. Everything before this chapter has been about a photograph as a record — light that actually existed, bouncing off objects that were actually there, measured and kept (Chapter 1). Generative AI breaks that chain. It can produce an image that looks exactly like a photograph without any light, any lens, or any moment behind it. So we are not going to teach you to be afraid of these tools, and we are certainly not going to sell them to you. We are going to teach you to understand precisely what they do, where they sit relative to the craft you have spent thirty-two chapters building, and how to be honest — with your viewers, your clients, and yourself — about when and how you use them.
In this chapter, you will learn to:
- Understand what a generative image model actually does — where its pixels come from, and why that makes it a fundamentally different thing from a camera.
- Place every AI tool you meet on a single spectrum from captured to conjured, from a denoise slider that cleans a real frame to a text prompt that invents one whole.
- Extend the enhancement-versus-manipulation test you learned in Chapter 29 to AI, and decide — per image, per genre — what is honest and what is not.
- Disclose AI involvement in plain language and with embedded content credentials, and read the provenance of an image handed to you.
- Locate, clearly and confidently, the large territory of value that AI cannot touch — and write your own stated stance on AI in your work, the deliverable for this chapter.
Learning Paths
📱 Mobile-only: The generative tools arrived on your phone first: magic eraser, generative editing, and AI portrait modes are already in your camera app. Sections 33.2, 33.3, and 33.4 are written for you most of all, because you are the reader most likely to cross the editing/generating line without noticing. Learn where it is.
🎨 Hobbyist: You are deciding what these tools mean for images you make for love, not money. §33.3 (the line) and §33.6 (where you still matter) are the heart of the chapter for you; the copyright material (§33.5) is useful context but lower-stakes.
💼 Pro-track: Disclosure and copyright are not philosophy for you; they are liability. §33.4 (disclosure, labeling, client trust) and §33.5 (copyright, training data, what you can and can't own and sell) are required reading. A misstep here can cost a contract or a contest entry.
🎓 Student: This is the chapter most likely to appear on an exam and in a heated seminar. The Portfolio Checkpoint (your written, defensible stance on AI) is exactly the kind of reflective position you will be asked to articulate again and again. Draft it carefully.
33.1 What generative image models do (and don't)
Start with the thing that matters most, because almost every confusion about AI imaging dissolves once you have it: a camera measures light that was really there; a generative model predicts pixels that were never there. Those are not two flavors of the same activity. They are opposites that happen to produce the same file format.
When you press the shutter, photons that bounced off a real scene strike a sensor, and the camera records, pixel by pixel, how much light of which color arrived at each point (Chapter 2). The image is a measurement. It can be a biased measurement — you chose the light, the moment, the frame, the focus — but it is anchored, at every pixel, to something that physically happened in front of the lens. That anchoring is the entire reason a photograph has ever been trusted as evidence, as memory, as witness.
A generative AI is a computer system that produces new content — here, images — by predicting what is statistically likely, based on patterns it learned from an enormous body of existing images. It is not connected to any scene. When it makes a picture of a beach, no beach was photographed; the system produced an arrangement of pixels that resembles the millions of beach images it was trained on. The result can be photorealistic down to the grain of the sand, and there is no sand. There is no light, no lens, no moment. There is a very sophisticated guess about what a photograph of a beach tends to look like.
How a diffusion model builds a picture — the plain-language version
Most of today's photorealistic image generators are diffusion models: systems that learn to create images by reversing a process of adding noise — starting from pure visual static and removing noise step by step until a coherent image emerges, steered toward your request. You do not need the mathematics to use the tools wisely, but you do need the intuition, because the intuition tells you exactly what these tools are good and bad at.
FIGURE 33.1 — How a diffusion model makes an image (conceptual, left → right)
TRAINING (done once, in advance)
┌───────────────────────────────────────────────────────────────┐
│ millions of ──► add noise ──► learn to ──► a model that can │
│ real images step by step UNDO the noise "denoise" static │
│ (+ captions) until static at every step toward an image │
└───────────────────────────────────────────────────────────────┘
GENERATION (every time you ask for a picture)
[ "a red door in golden light" ]
│ your text prompt steers every denoising step
▼
███▓▓▒▒░░ ──► ▓▒░ shapes ──► door appears ──► refined ──► FINISHED IMAGE
pure noise emerging from fog detail (no door ever existed)
At NO point did light from a real door reach a sensor. The "door" is the model's
best statistical guess at what the words "a red door in golden light" should look like.
Walk through what that diagram is telling you. During training, the model was shown a staggering number of images, each progressively destroyed with noise, and it learned the reverse trick: given a noisy image, predict the slightly-less-noisy version. Do that enough times, starting from pure static, and a clean image condenses out of the fog — like a Polaroid developing, except the picture was never of anything. Your text prompt is the steering: it biases every denoising step toward the words you typed. "A red door" nudges the emerging shapes toward door-ness and red-ness, because in training, images captioned "red door" had certain statistical fingerprints.
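The forward half of that process, the "add noise step by step until static" stage, can be sketched in a few lines. This is a minimal illustration, not how any particular product is built: the linear schedule values (`beta_start`, `beta_end`), the 1,000-step count, and the sine wave standing in for one row of an image are all assumptions chosen for clarity. It shows only the destruction of the signal; the learned reverse step, which is the expensive part, is not implemented here.

```python
import math
import random


def make_alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal's variance surviving at each step.

    Assumes a linear noise schedule; the default values are illustrative.
    """
    alpha_bars = []
    running = 1.0
    for i in range(steps):
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_to_step(signal, alpha_bar, rng):
    """Jump straight to a given noise level: scaled signal plus scaled static."""
    keep = math.sqrt(alpha_bar)
    add = math.sqrt(1.0 - alpha_bar)
    return [keep * x + add * rng.gauss(0.0, 1.0) for x in signal]


def correlation(a, b):
    """Pearson correlation: 1.0 means identical shape, 0.0 means unrelated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(33)
# Stand-in for one row of a real image: a smooth brightness wave.
row = [math.sin(2 * math.pi * i / 64) for i in range(512)]
alpha_bars = make_alpha_bars(1000)

for step in (0, 250, 500, 999):
    noisy = noise_to_step(row, alpha_bars[step], rng)
    print(f"step {step:4d}: correlation with original = {correlation(row, noisy):+.3f}")
```

Running it prints how closely the noised row still tracks the original at four points along the schedule. By the last step the row is, for practical purposes, static, which is the starting point generation works backward from.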
🔬 The Physics: (Optional — skip without penalty.) The honest technical name for what a diffusion model learns is a probability distribution over images. Training estimates, very roughly, "what do real images look like?" — and sampling draws a new point from that learned distribution, conditioned on your prompt. This is why generated images have a recognizable texture of plausibility: they are, by construction, the most likely-looking thing rather than a true thing. It is also why they hallucinate specifics — six-fingered hands, melted typography, jewelry that loops through itself, reflections that do not correspond to the scene. The model has no physical constraint forcing a hand to have five fingers; it only has the statistical tendency that hands usually do. Light in a real photograph obeys the inverse-square law and casts physically consistent shadows because it is light (Chapter 11). Light in a generated image only looks like it obeys those laws, on average, which is why it so often almost-does.
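One way to see why a statistical tendency is not a constraint is a deliberately crude toy. The sketch below is not a diffusion model; it is a hypothetical first-order chain that "learns" only how often one finger is followed by another, from training hands that all have exactly five. Because it never counts, it produces hands with other totals. Every name here is invented for illustration.

```python
import random
from collections import Counter

# Training data: every hand is five "finger" tokens followed by "end".
training_hands = [["finger"] * 5 + ["end"] for _ in range(200)]


def learn_continue_probability(hands):
    """Estimate P(next token is 'finger' | current token is 'finger')."""
    continues = 0
    total = 0
    for hand in hands:
        for current, following in zip(hand, hand[1:]):
            if current == "finger":
                total += 1
                if following == "finger":
                    continues += 1
    return continues / total


def sample_hand(p_continue, rng):
    """Draw one hand using only the local statistic; nothing enforces five."""
    fingers = 1
    while rng.random() < p_continue:
        fingers += 1
    return fingers


p_continue = learn_continue_probability(training_hands)
rng = random.Random(61)
counts = Counter(sample_hand(p_continue, rng) for _ in range(10_000))

print(f"learned P(another finger) = {p_continue:.2f}")
for n in sorted(counts):
    print(f"{n} fingers: {counts[n]} hands")
```

Real generators condition on far more context than this chain does, so their errors are much rarer, but the reason is arguably the same: the count is a pattern the model absorbed, not a rule it must obey.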
What they are genuinely good at — and genuinely bad at
Knowing the mechanism, you can predict the failure modes without anyone listing them, which is far more useful than memorizing a list that next year's models will partly fix.
Generative models are good at: plausible texture and surface; filling small areas with more of what surrounds them (more sky, more sand, more grass); broad stylistic mimicry; and anything where "looks right on average" is good enough. They are bad at: specific truth (a particular person's exact face, real text, a real place's real geometry); physical consistency across an image (shadows, reflections, counts of things, the way a strap actually wraps a shoulder); and anything where being correct matters more than being plausible. A model will happily give you a beautiful, confident, wrong answer, because plausibility is the only thing it was ever optimizing.
🚪 Threshold Concept: A camera answers the question "what was here?" A generative model answers the question "what would something like this probably look like?" Those are different questions, and the gap between them is where every ethical, legal, and creative issue in this chapter lives. The moment you internalize that AI imaging produces plausibility, not evidence, you stop being fooled by how real it looks and start asking the only question that matters: does this image need to be true, or only convincing? For a magazine cover concept, convincing may be plenty. For a news photo, a wedding album, a documentary, or a courtroom, only true will do — and a generative model cannot give you true, no matter how good it looks.
🔄 Check Your Eye:
1. In one sentence, what is the fundamental difference between what a camera produces and what a diffusion model produces?
2. Knowing only that a model optimizes for plausibility, predict one kind of detail it will reliably get wrong in a complex scene.
3. Why does "but it looks completely real" tell you almost nothing about whether an image is true?
Answers
1. A camera measures light that physically reached a sensor from a real scene; a diffusion model predicts pixels statistically, with no scene, light, or moment behind them.
2. Anything requiring specific consistency (hands/finger counts, legible real text, physically correct shadows and reflections, the number of objects, how straps/jewelry connect), because the model only knows what such things usually look like, not what they are.
3. Because the model's entire job is to make plausible-looking images; photorealism is the default output, not evidence of a real referent. Looking real and being real became fully decoupled.
33.2 AI in the editing pipeline: denoise, upscale, generative fill
Now come down from philosophy to the toolbar, because most readers will not be generating whole images — they will be editing real photographs with tools that quietly have AI inside them. These arrived gradually and without fanfare, which is exactly why so many photographers are using generative tools without realizing they have crossed any line at all. Let us sort them, because they are not all the same, and the differences are the whole point.
Picture the modern editing pipeline as a row of tools, ordered by how much they invent:
FIGURE 33.2 — AI in the editing pipeline, ordered by how much it INVENTS
LESS INVENTION ◄──────────────────────────────────────────────► MORE INVENTION
│ │ │ │ │
AI DENOISE AI SHARPEN/ AI MASKING/ AI UPSCALE GENERATIVE FILL
(clean up LENS CORR. SELECTION (enlarge + / EXTEND / REPLACE
real grain) (fix the (find the sky, invent plausible (invent NEW content
real optics) the subject) fine detail) that was never there)
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
interprets corrects understands partly invents fully invents
the real the real the real (texture, edges) (objects, areas,
signal capture content whole regions)
── Everything left of UPSCALE refines WHAT YOU CAPTURED. ──
── UPSCALE and especially FILL begin adding WHAT YOU DID NOT. ──
AI denoise is software that removes the random speckle (noise) from a real photograph by learning what noise looks like versus what real detail looks like, and stripping the former. This is the gentlest end of the spectrum. The grain it removes was a flaw introduced by the sensor at high ISO (Chapter 3); cleaning it brings the image closer to the scene you photographed, not further. It does, mildly, "invent": it guesses which smudges were noise and which were texture, and sometimes guesses wrong, smearing a freckle or a star. But its intent and usual effect are restoration of a real capture. A modern AI denoiser can make an ISO 12800 frame look roughly like ISO 1600, a gain of about three stops, which genuinely expands what you can shoot in the dark.
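The "guesses wrong" failure is easy to reproduce with a classical stand-in. The sketch below uses a three-pixel median filter, which is not an AI denoiser, on a hypothetical row of night-sky pixel values. The filter treats any isolated bright pixel as noise, so it removes the hot speckle and the real star alike.

```python
from statistics import median


def median_denoise(row):
    """Replace each interior pixel with the median of itself and its two neighbours."""
    cleaned = list(row)
    for i in range(1, len(row) - 1):
        cleaned[i] = median(row[i - 1:i + 2])
    return cleaned


# Hypothetical row of dark sky (values near 10) on a 0-255 scale.
STAR_INDEX = 3    # a real one-pixel star
NOISE_INDEX = 9   # a hot pixel produced by sensor noise
sky = [10, 11, 9, 200, 10, 12, 10, 9, 11, 180, 10, 10]

cleaned = median_denoise(sky)
print("before:", sky)
print("after: ", cleaned)
print("noise speckle removed:", cleaned[NOISE_INDEX] < 20)
print("real star survived:   ", cleaned[STAR_INDEX] > 100)
```

A learned denoiser is far better than this at telling stars from speckle, but it faces the same underlying ambiguity: from the pixel values alone, a faint star and a hot pixel can look identical.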
AI sharpening and lens corrections likewise fix the capture: they counter the softening of the lens, correct distortion and chromatic aberration, recover micro-contrast. They interpret the real signal. No new content appears; the door that was photographed is the door you see, rendered crisper.
AI masking and selection is the quiet hero of the modern darkroom: software that identifies regions of your real photo — "select the sky," "select the subject," "select the skin" — so you can adjust them separately (Chapter 25's local adjustments, made effortless). This invents nothing. It only understands what is already in your frame, then hands you a precise mask. It is pure assistance, and almost no one objects to it, because the pixels never change identity — only which of your real pixels get which of your real adjustments.
AI upscaling is software that enlarges an image and invents plausible fine detail to fill the new, larger pixel grid. Here we cross a meaningful threshold. When you upscale a small file to print it large, the software cannot know what the extra detail was — that information was never captured — so it guesses, adding texture and edges that are statistically likely but were not in the original light. For a modest enlargement of a landscape, this is usually invisible and harmless. Pushed hard, it will invent eyelashes that were never resolved, bricks that were never there, text that becomes confident gibberish. It is restoration shading into fabrication, and how far you pushed it decides which.
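The claim that the missing detail "was never captured" can be checked directly. In the sketch below, two hypothetical fine-detail rows (the pixel values are invented) shrink to the same small row, so no enlarger, however clever, can tell from the small file which original it came from. The nearest-neighbour enlarger is a plain stand-in; an AI upscaler would replace the flat result with plausible texture, but it would be choosing, not recovering.

```python
def downsample(row, factor):
    """Shrink a row by averaging each group of `factor` neighbouring pixels."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def upscale_nearest(row, factor):
    """Enlarge a row by repeating each pixel; adds pixels but no information."""
    return [value for value in row for _ in range(factor)]


# Two different "scenes" at full resolution.
fine_a = [0, 100, 100, 0, 50, 50, 50, 50]
fine_b = [100, 0, 0, 100, 0, 100, 100, 0]

small_a = downsample(fine_a, 2)
small_b = downsample(fine_b, 2)
restored = upscale_nearest(small_a, 2)

print("small file from A:", small_a)
print("small file from B:", small_b)
print("same small file:  ", small_a == small_b)
print("enlarged again:   ", restored)
print("matches A or B:   ", restored == fine_a or restored == fine_b)
```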
Generative fill, extend, and replace is the far end: software that invents entirely new content — objects, backgrounds, whole regions — to add to, remove from, or extend a photograph, generating pixels that were never captured to match a prompt or the surrounding image. Generative fill specifically means selecting an area of your photo and asking the model to replace it with something new it generates — remove the bucket and invent sand; extend the sky to change the crop; add a bird that never flew there. This is a diffusion model (§33.1) operating on a patch of your real image. It is no longer editing your photograph; it is partly authoring a new one.
⚠️ Common Mistake: Treating "it's just a tool in my photo editor" as if that settled the ethics. The location of a tool — built into your phone's gallery, sitting in the same toolbar as the crop and the exposure slider — tells you nothing about what it does. Magic-eraser-style generative removal lives one tap away from cropping, but cropping only excludes what was there while generative removal invents what wasn't. The fix is to judge tools by what they do to the record, never by where the button sits. A generative fill is a generative fill whether you ran it in professional software for an hour or tapped it on your phone in two seconds.
🎒 Gear Note: You do not need to buy anything to face every issue in this chapter — that is the point. The most aggressive generative tools shipped first inside flagship phone camera apps and free gallery apps, precisely because phone photos are where the mass market lives. A mobile-only reader has AI denoise running automatically on every night shot, AI upscaling in the "enhance" button, and generative removal a long-press away. The pro-track reader's desktop software has the same capabilities with more control and a clearer audit trail. The principles are identical across both; only the precision and the paper trail differ. Do not imagine this is a problem for other photographers with fancier kit. It is in your pocket right now.
🔄 Check Your Eye:
1. Order these four by how much they invent: generative fill, AI denoise, AI masking, AI upscale.
2. Why is AI masking almost universally uncontroversial, even though it is "AI in your photo"?
3. At which tool does the pipeline cross from refining what you captured to adding what you didn't?
Answers
1. Least → most: AI masking (invents nothing) < AI denoise (interprets the real signal, with occasional mild guessing) < AI upscale (invents plausible fine detail) < generative fill (invents whole new content). Figure 33.2 places denoise furthest left, but both it and masking sit firmly on the refining side.
2. Because it changes nothing about the pixels' identity: it only understands your real frame and hands you a selection; every adjustment afterward is still yours, applied to your real content.
3. At AI upscale it begins (inventing fine detail); at generative fill it is unmistakable (inventing whole objects/regions).
33.3 The line between editing and generating
You now have the tools sorted by how much they invent. The harder question — the one your viewers, clients, and conscience actually care about — is where, on that spectrum, does honest editing become something else? This is not a new question. You met its parent in Chapter 29, and the test you learned there is the single most valuable thing you can carry into the age of AI.
The Chapter 29 test was: enhancement makes the photograph the best possible version of what was actually there; manipulation changes the record of what was there. The boundary is not about how much effort you spent or which tool you used — a single surgical click can be a profound manipulation; an hour of careful work can be pure enhancement. The test is whether the result still truthfully shows what the light recorded.
That test does not change for AI. It just gets sharper teeth, because generative tools make manipulation trivially easy and invisible. So let us draw the line explicitly for AI:
FIGURE 33.3 — The line, drawn for AI tools
REFINING THE RECORD (still a photograph) | AUTHORING NEW CONTENT (now something else)
──────────────────────────────────────────── | ────────────────────────────────────────────
AI denoise a real night frame | Generative-fill a new sky into the frame
AI sharpen / lens correction | Add an object/animal/person that wasn't there
AI masking to dodge & burn (Ch.28) the real | Remove a real person/object that WAS there*
tones | Extend the frame with invented scenery
Modest AI upscale for a larger print | Replace a face / swap a sky / "deghost" reality
Color-grade, crop, straighten (Ch.26–27) | Text-to-image generation of any part
──────────────────────────────────────────── | ────────────────────────────────────────────
The light you recorded is still the source. | Pixels now exist that no light ever made.
* Removal is the genuinely hard middle case — see below.
Read the two columns. On the left, every operation takes the light you actually recorded and presents it at its best — cleaner, sharper, better-toned, better-cropped. The source of every pixel is still the scene. On the right, pixels now exist in the image that no light ever produced; the model invented them. The left column is editing. The right column is generating. And the difference is not aesthetic — it is a difference in what the image is a record of.
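The two columns of Figure 33.3 amount to a lookup on one property: did the operation create pixels that no light produced? A minimal sketch of that lookup follows. The operation names and the `classify_edit_log` helper are hypothetical, and the table deliberately omits removal, which the figure marks as the hard middle case rather than a fixed entry.

```python
# True means the operation authors pixels that were never captured.
INVENTS_PIXELS = {
    "ai_denoise": False,
    "ai_sharpen": False,
    "lens_correction": False,
    "ai_masking": False,
    "modest_ai_upscale": False,
    "color_grade": False,
    "crop": False,
    "straighten": False,
    "generative_sky_replace": True,
    "generative_add_object": True,
    "generative_extend": True,
    "face_replace": True,
    "text_to_image": True,
}


def classify_edit_log(operations):
    """Return 'refining the record' or 'authoring new content' for an edit log.

    One generating step is enough to change the answer. Unknown operations
    raise an error instead of being silently treated as safe.
    """
    unknown = [op for op in operations if op not in INVENTS_PIXELS]
    if unknown:
        raise ValueError(f"unclassified operations: {unknown}")
    if any(INVENTS_PIXELS[op] for op in operations):
        return "authoring new content"
    return "refining the record"


edits = ["ai_denoise", "color_grade", "straighten", "crop"]
print(classify_edit_log(edits))
print(classify_edit_log(edits + ["generative_extend"]))
```

Raising on unknown operations is a design choice worth copying in spirit: a tool you have not yet thought about should prompt a decision, not default to "fine."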
The hard middle: removal
The honest difficulty — and you should never pretend it away — is removal, which can be done with generative fill but is conceptually older than AI. Cloning out a distraction (Chapter 29) and generative-removing it both change the record by taking out something that was there. So why did we treat some removals as defensible enhancement in Chapter 29?
The answer Chapter 29 gave still governs: it depends on whether the removed element was incidental to the photograph's subject or claim. Healing a sensor-dust spot from the sky is enhancement — the spot was never in the world. Cloning out a distant stray cone at the edge of a personal landscape is defensible enhancement for personal art. But the moment the photograph makes a claim about reality — a news photo, a documentary frame, a real-estate listing, a dating profile, a product shot, anything a viewer will read as "this is how it was" — removal of something that genuinely affected the scene becomes manipulation, whether you used a clone stamp in 2009 or generative fill today. AI did not move the line. It just made crossing it a two-second gesture instead of an hour of careful masking, which means you will cross it by accident far more often unless you decide, in advance, where your line is.
💡 Why It Works: The reason "did the source pixels come from the scene?" is a better test than "did I use AI?" is that it survives the technology changing. Tools will keep getting better and blurrier; a denoiser will get more aggressive, a fill will get more seamless. But the question of whether an image is a record of light or a fabrication of plausible pixels is stable. Anchor your ethics to what the image is, not to which menu you opened. A photograph manipulated with a brush in 1925 and one manipulated with a prompt in 2025 fail the same test for the same reason.
🖼️ Read This Frame: Here is the same captured beach, finished two ways, so you can see the line rather than just read about it.
FIGURE 33.4 — "The bucket on the beach, two finishes" [constructed teaching example]
THE FRAME    A long, low dawn beach: wet sand foreground, a thin line of foam, flat sea, a pale graduated sky. In the captured original, a small red plastic bucket sits just left of center near the waterline, catching the warm light.
THE LIGHT    Soft, low, warm side-light from a rising sun off-frame right; long gentle shadows; the whole scene one to two stops brighter at the horizon than the foreground sand.
THE MOMENT   The instant a thin sheet of water slides back down the sand, mirror-bright.
THE CHOICES  Two finishes from the one capture:
  VERSION A (editing): AI-denoise the dim foreground, lift shadows, grade the warmth, straighten the horizon, crop slightly tighter. The bucket REMAINS: it was there. The image is the best version of what the light recorded.
  VERSION B (generating): all of the above, PLUS circle the bucket and generative-fill it away. The software invents sand and foam to cover it. The image now shows an empty beach that, at the moment of capture, was not empty.
THE EFFECT   Both look like clean, beautiful photographs. A viewer cannot tell them apart by eye. The only difference is invisible: Version A is a record of a real morning; Version B asserts a morning that did not occur.
THE LESSON   The line between editing and generating is frequently *invisible in the result*, which is exactly why it has to live in *you*, as a decided rule, before your finger is on the tool. You cannot outsource this to "does it look edited?", because good generation never does.
🔗 Connection: This entire section extends the truth/manipulation thread that runs through the book. It began with the enhancement/manipulation test in Chapter 29 (§29.3–29.4), which handed off explicitly to this chapter, and it is bound to the consent and honesty duties of Chapter 32. The three chapters form one argument: a photograph carries a claim of witness, and you are responsible for not falsifying that claim, whether with a brush, with a clone stamp, or with a prompt.
🔄 Check Your Eye:
1. State the Chapter 29 test in one sentence, and explain why it works just as well for AI tools.
2. Why is removal the hard middle case, and what single question resolves most removals?
3. Why can't you use "does the final image look edited?" as your test for the editing/generating line?
Answers
1. Enhancement presents the best version of what was there; manipulation changes the record of what was there. It works for AI because it asks about the image's relationship to reality, not which tool made the change, and that relationship is what viewers trust.
2. Because removal always changes the record by subtraction, yet some subtractions (incidental clutter, sensor dust) are honest and others (anything affecting the photo's claim) are not; the resolving question is: was the removed element incidental to the photograph's subject or claim?
3. Because competent generation is invisible (Version A and Version B in Figure 33.4 are indistinguishable by eye), so the test must be about the image's source and honesty, decided by you, not about its appearance.
33.4 Disclosure, labeling, and trust
If the line between a recorded and an invented image is often invisible, then the only thing that protects your viewer — and your own credibility — is telling them. This is disclosure: stating what a viewer cannot see for themselves. You met disclosure as a general duty in Chapter 29; here we make it specific to AI, because the stakes and the mechanisms are different.
Disclosure (AI) is the practice of clearly telling viewers, clients, editors, or contests when and how AI was used to generate or substantially alter an image — in plain language a non-expert can understand, and, where possible, in embedded data that travels with the file. It rests on a simple ethical principle: a viewer is entitled to know whether they are looking at a record of something that happened or a fabrication of something that did not — because they will behave differently depending on the answer. They will trust a news photo, grieve at a memorial image, buy a product, swipe right, or convict a defendant partly because they read the image as a record. Removing their ability to know that, silently, is the harm.
Two layers of disclosure: words and credentials
Disclosure works on two layers, and serious practice uses both.
The first layer is plain-language labeling: an actual sentence a human reads. "This image was generated with AI." "Sky replaced using generative fill." "Composite of three exposures; no elements added or removed." It is blunt, it is human-readable, and it is the layer that actually informs people. Match the detail of the label to the stakes of the image: a personal art print needs only a light note; a news or documentary image needs a precise account of exactly what was and wasn't altered; an advertisement is increasingly required by law in some jurisdictions to disclose synthetic imagery, especially of people.
The second layer is content credentials (also called content provenance): tamper-evident metadata, embedded in the image file, that records how an image was made and edited — what device or model produced it, and what edits were applied — and travels with the file so a viewer's software can display it. Think of it as a verifiable nutrition label baked into the pixels' container. Provenance, in this context, means the documented origin and edit-history of an image: where it came from and what was done to it, recorded in a way that can be checked rather than merely claimed.
FIGURE 33.5 — Two layers of disclosure (use both)
LAYER 1 — PLAIN-LANGUAGE LABEL LAYER 2 — CONTENT CREDENTIALS (provenance)
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ A sentence a HUMAN reads: │ │ Signed metadata a MACHINE │
│ "AI-generated." │ travels │ reads, embedded in the file: │
│ "Sky replaced (gen-fill)." │ with the │ • captured by camera X, or │
│ "3-exposure composite; │ image → │ generated by model Y │
│ nothing added/removed." │ │ • edits applied: denoise, │
│ │ │ gen-fill region, upscale │
│ Informs people directly. │ │ • cryptographically signed │
└──────────────────────────────┘ └──────────────────────────────┘
easy to strip / ignore survives re-saves IF the
but it is what humans see chain is preserved; verifiable
Words inform the viewer NOW. Credentials let anyone VERIFY later. Neither alone is enough.
Why both? Because each covers the other's weakness. A plain-language caption is what a human actually reads, but it is trivially removed — crop it off, repost without it, and the disclosure is gone. Content credentials are verifiable and travel inside the file, but only software that reads them surfaces them, and a determined bad actor can strip metadata. Used together — a human sentence and embedded provenance — they make honesty both visible and checkable. The emerging industry standard for this provenance (developed by a broad coalition of camera makers, software companies, and news organizations) aims to make "where did this image come from and what was done to it" answerable by your viewing software, the way a browser padlock answers "is this connection secure."
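The "tamper-evident" property can be illustrated with the standard library alone. This is a simplified sketch, not the industry standard's actual format: real content credentials rely on certificate-based signatures, whereas this example assumes a shared secret and HMAC purely to stay short. The manifest fields, `SIGNING_KEY`, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a shared secret stands in for a real signing certificate.
SIGNING_KEY = b"demo-key-not-for-real-use"


def make_credential(image_bytes, source, edits):
    """Build a manifest bound to the image's exact bytes, then sign it."""
    manifest = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "source": source,
        "edits": edits,
    }
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}


def verify_credential(image_bytes, credential):
    """True only if neither the manifest nor the image changed after signing."""
    payload = json.dumps(credential["manifest"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["signature"]):
        return False
    actual_hash = hashlib.sha256(image_bytes).hexdigest()
    return actual_hash == credential["manifest"]["image_sha256"]


image = b"stand-in bytes for a real image file"
credential = make_credential(image, "camera", ["ai_denoise", "generative_fill"])

print("untouched file verifies:", verify_credential(image, credential))

# Quietly deleting the generative step from the history breaks the signature.
tampered = json.loads(json.dumps(credential))
tampered["manifest"]["edits"] = ["ai_denoise"]
print("edited history verifies:", verify_credential(image, tampered))

# Changing the pixels after signing breaks the hash binding.
print("altered image verifies: ", verify_credential(image + b"!", credential))
```

What the sketch cannot show is the weakness already named in the paragraph above: stripping the credential entirely leaves an ordinary file with nothing to verify, which is why the human-readable label still matters.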
♿ Accessibility & Inclusion: Disclosure and good alt text are cousins, and you should write both as the same discipline. Alt text describes an image for someone who cannot see it (Chapter 1, Chapter 34); AI disclosure describes the image's origin for someone who cannot tell how it was made. Both rest on the same respect: the viewer is entitled to information they cannot get from the pixels alone. When you write alt text for a generated or heavily AI-edited image, say so in the alt text itself — "AI-generated illustration of…" — so that a blind user receives the same disclosure a sighted user gets from the caption. Honesty that only reaches sighted viewers is not honesty; it is a courtesy with a gap in it.
⚠️ Common Mistake: Burying the disclosure where no one will see it — a hashtag in the fortieth line of a caption, a line in a terms-of-service no one reads, metadata stripped on upload. Technically disclosed is not disclosed. The test is whether a normal viewer, in the normal way they encounter the image, would actually understand that it was generated or substantially altered. If the honest answer is "only if they go digging," you have not disclosed; you have created deniability. The fix: put the plain-language label where the image is seen, at the stakes the image carries.
🎞️ Behind the Image: (A constructed but representative vignette.) A photographer enters a respected nature contest with a stunning image of a fox in snow. It places. Then a sharp-eyed judge notices the snow texture repeating in a way real snow never does, and the catchlights in the fox's eyes do not match the scene's single light source: telltale fingerprints of generative fill. The photographer had used AI to "clean up" the background and, in doing so, replaced half of it. They had not lied in words; they simply had not mentioned it. The entry is disqualified, the story spreads, and for a while their name means "the AI fox" in every photo forum. Nothing they generated was illegal. What undid them was the gap between what the image claimed to be (a captured wildlife moment) and what it was. The lesson is not "never use AI." It is: the disclosure is not optional, and the genre's expectation is the contract you are bound by whether you signed it or not.
🔄 Check Your Eye: 1. Name the two layers of AI disclosure and what each one is good at. 2. Why is a single hashtag at the end of a long caption usually not adequate disclosure? 3. What is provenance, and how do content credentials make it useful?
Answers
1. Plain-language labeling (a sentence a human reads; it informs people directly but is easy to strip) and content credentials/provenance (signed metadata a machine reads, embedded in the file; verifiable and travels with the file, but only if software surfaces it and the chain isn't stripped).
2. Because technically disclosed ≠ disclosed: the test is whether a normal viewer in the normal way they meet the image would understand it was AI; a buried hashtag creates deniability, not understanding.
3. Provenance is the documented, checkable origin and edit-history of an image; content credentials embed it as tamper-evident, signed metadata so viewing software can verify how the image was made rather than take a caption's word for it.
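To make "tamper-evident, signed metadata" concrete, here is a minimal sketch of the idea in Python. It is not the real content-credentials format: actual credentials use certificate-based public-key signatures, while this toy uses a shared-secret HMAC from the standard library as a stand-in. The names `SIGNING_KEY`, `make_manifest`, and `verify_manifest` are hypothetical, invented for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. Real content credentials rely on certificate-based
# public-key signatures, not a shared secret like this one.
SIGNING_KEY = b"demo-key-not-for-real-use"


def make_manifest(image_bytes: bytes, edit_history: list[str]) -> dict:
    """Bind an edit history to the exact image bytes, then sign the pair."""
    claim = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "edit_history": edit_history,
    }
    # Serialize with sorted keys so the same claim always signs identically.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(image_bytes: bytes, manifest: dict) -> bool:
    """Return True only if neither the image nor its recorded history changed."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    current_hash = hashlib.sha256(image_bytes).hexdigest()
    image_ok = manifest["claim"]["image_sha256"] == current_hash
    return signature_ok and image_ok
```

The point of the sketch is the pairing: if someone changes a single pixel, or quietly deletes "generative fill" from the history, verification fails. A caption can be rewritten by anyone; a signed record cannot be rewritten without the change showing.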
33.5 Copyright, training data, and the open questions
This section comes with a warning printed on the tin: this is genuinely unsettled, it varies by country, and it is changing while you read. We are not going to hand you false certainty, because anyone who does is either guessing or selling something. What we can do is give you the stable shape of the questions, so you can ask the right ones and find current, local answers (and, for the business-and-contracts side, see Chapter 35, which owns copyright and licensing as a working topic).
Three questions sit at the center, and they are independent — an answer to one does not settle the others.
Question 1: Who, if anyone, owns the output? In several jurisdictions, copyright has historically required human authorship. A photograph qualifies because a human made the creative decisions — the four decisions of Chapter 1. A purely text-prompted AI image, where the human only typed a sentence and the machine made every pictorial choice, sits on much shakier ground: some authorities have indicated that output generated without sufficient human creative control may not be protected by copyright at all, meaning anyone could copy it. The more a human shapes the result — through extensive editing, compositing, combining generations with real photography, and genuine creative selection — the stronger the claim to authorship becomes. The blunt practical takeaway: the photographs you actually capture are clearly yours; the images you merely prompt may belong to no one. This is the opposite of how beginners assume it works.
Question 2: Was it legal to train the model on all those images? Most large image models were trained on enormous sets of images scraped from the open internet — including, almost certainly, copyrighted photographs whose makers never consented. Whether that training was lawful is being fought in courts around the world right now, with different theories and different outcomes pending in different places. You do not need to predict who wins. You need to know two things: that the question is open, and that if you are a working photographer, your images may be in those training sets, which is why this is not an abstract issue for you — it is potentially your own work, used without asking, to build a tool that competes with you.
Question 3: Can a generated image infringe an existing work or a person? Yes, plausibly, in at least two ways. A model can be prompted to closely imitate a specific living artist's protected style or a specific copyrighted character, producing output that may infringe. And it can generate a recognizable real person's likeness — which collides directly with the right of publicity and consent duties you learned in Chapter 32. "The AI made it" is not a defense that has saved anyone; you chose the prompt and you published the result.
FIGURE 33.6 — Three independent copyright questions (don't conflate them)
┌─ Q1: OUTPUT OWNERSHIP ─────────────┐ "Is the image I generated mine to own/sell?"
│ human authorship may be required; │ → captured photos: clearly yours.
│ pure prompts: weak/none in places │ → pure prompts: maybe nobody's. More human
└────────────────────────────────────┘ creative control → stronger claim.
┌─ Q2: TRAINING-DATA LEGALITY ───────┐ "Was it lawful to train on scraped images?"
│ unsettled, litigated, varies by │ → open question; YOUR work may be in the set.
│ country; outcome pending │
└────────────────────────────────────┘
┌─ Q3: OUTPUT INFRINGEMENT ──────────┐ "Can a generation infringe a work or a person?"
│ imitating a protected style/ │ → plausibly yes; real-person likeness also
│ character; a real person's face │ triggers right-of-publicity / consent (Ch.32).
└────────────────────────────────────┘ → "the AI did it" has protected no one.
🎒 Gear Note: A practical hedge that costs nothing: keep your RAW files and your capture metadata. The unaltered RAW from your camera (Chapter 26) is the strongest possible evidence of human authorship and of what the scene actually contained — it is your negative, your alibi, and your proof of provenance all at once. In a world where anyone can generate a convincing fake of your style, the photographer who can produce the original capture, the EXIF, and the edit history holds something the prompter never can: proof that a real moment happened and that they were the one who saw it. Back it up (Chapter 30's 3-2-1 rule).
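One low-effort way to act on this, sketched in Python under stated assumptions: record a SHA-256 fingerprint of every RAW file at import, then re-check it after backups or drive migrations. The function names and the list of file extensions are illustrative choices, not a standard. A matching fingerprint shows a file has not changed since you recorded it; it is not, by itself, legal proof of authorship.

```python
import hashlib
from pathlib import Path

# Example RAW extensions only; adjust to whatever your camera writes.
RAW_SUFFIXES = (".dng", ".cr3", ".nef", ".arw")


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RAW files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each RAW file's relative path to its SHA-256 fingerprint."""
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in RAW_SUFFIXES:
            manifest[path.relative_to(folder).as_posix()] = sha256_of_file(path)
    return manifest


def changed_or_missing(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files from the old manifest that are absent or altered in the new one."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

Run `build_manifest` once when you import a shoot and store the result with your backups; run it again later and pass both results to `changed_or_missing` to confirm every original is still intact.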
🔗 Connection: The copyright and licensing of your own photographs — registering, licensing usage, work-for-hire, contracts, getting paid — is the working subject of Chapter 35. This section is only the AI-specific wrinkle: that generated images may not be ownable, that your work may have trained the tools, and that "the AI did it" shifts no liability off you. Take the durable principles here; take the business mechanics there.
🔄 Check Your Eye: 1. Why might a purely text-prompted image be less protectable than a photo you captured — the opposite of what most beginners assume? 2. Name the three independent copyright questions and confirm they don't answer each other. 3. Why is "the AI generated it" not a shield when a generation depicts a real person?
Answers
1. Because copyright in many places requires human authorship; a captured photo embodies a human's creative decisions, while a pure prompt may lack sufficient human creative control, so the output may be unprotectable (anyone could copy it). More human shaping strengthens the claim.
2. (a) Do you own the output? (b) Was training on scraped images lawful? (c) Can the output infringe a work or person? Each can have any answer independent of the others.
3. Because you chose the prompt and published the result; right-of-publicity and consent (Ch.32) attach to using a real person's likeness regardless of the tool, and no court has accepted the tool as the responsible party.
33.6 Where the photographer adds irreplaceable value
After five sections of disruption, end where an honest accounting leads: the territory generative AI cannot enter, which turns out to be most of what made you want to photograph in the first place. This is not consolation. It is strategy. Knowing precisely what the machine cannot do tells you exactly where to stand.
A generative model can produce a plausible image of a sunset. It cannot stand on the cold sand at 5:40 a.m. and photograph this sunset, the one that happened, with your grandmother in it, on the last morning she was well. Run down the list of what the model structurally cannot do, and notice that none of it is a temporary limitation a better model will fix — each is a consequence of what a generative model is.
It cannot witness. A photograph's oldest power is testimony — this happened, I was there, here is the proof (Chapters 1, 17, 32). A generative model is, by definition, not there and not a record. It can fabricate the look of witness but never its substance. Every photograph that matters as evidence, memory, journalism, or family history draws its value from a well the model cannot reach.
It cannot be present to a real, specific moment. The decisive moment (Chapter 10) — the exact instant a real gesture, a real glance, a real collision of elements aligns — exists only in time, and only a person present can catch it. The model can imitate the style of a decisive moment; it cannot have been at the moment, because the moment was real and singular and gone.
It cannot have a relationship with the subject. The trust between a portrait photographer and a person, built over minutes or years, that lets a real face open (Chapters 13, 17); the access a documentary photographer earns; the consent freely given (Chapter 32) — these are human transactions. A face the model invents trusts no one and reveals nothing real, because there is no one there to reveal.
It cannot make a genuinely personal choice. Your voice (Chapter 39) — the particular way you see light, choose moments, frame the world, the consistent set of decisions that makes your work recognizably yours — is the residue of a real human life lived behind a camera. A model can average ten thousand photographers' choices; it cannot make yours, because yours come from being you, in this body, with this history, in this light.
It cannot take responsibility. A photograph carries duties — to the person in it, to the truth, to the viewer (Chapter 32). Only a person can owe those duties, be trusted to honor them, and be accountable when they don't. That accountability is not a bug in human photography; it is the source of its authority.
FIGURE 33.7 — Where value lives now (the photographer's territory)
WHAT THE MODEL CAN DO WHAT ONLY A PHOTOGRAPHER CAN DO
───────────────────────── ──────────────────────────────────────────
plausible generic imagery WITNESS — this really happened, I was there
average of all styles PRESENCE — being at the real, singular moment
fast, cheap, infinite variants RELATIONSHIP — earned trust, real consent
"looks like a photograph" VOICE — the choices only you would make
no stake, no memory, no duty RESPONSIBILITY — duties owed and honored
───────────────────────── ──────────────────────────────────────────
competes on QUANTITY & PLAUSIBILITY you compete on TRUTH, PRESENCE, MEANING, TRUST
The model floods the world with plausible images. That makes the SCARCE thing —
a true image, made by a present, responsible, particular human — more valuable, not less.
There is an economic edge to this, and it is worth saying plainly. As generative tools make plausible images infinite and free, the plausible image becomes worthless — there is an unlimited supply. What becomes scarce, and therefore valuable, is everything the flood cannot produce: the true image, the witnessed image, the image made by a present and accountable person with a recognizable voice. The photographer who competes with the machine on quantity of plausible content will lose and deserve to. The photographer who competes on truth, presence, relationship, and voice is selling something the machine cannot make at any price. The rise of the fake is, counterintuitively, the best argument for the real that has ever existed.
🔗 Connection: This section is the bridge to the book's final movement. Chapter 36 (history) will show that photography has survived this kind of existential question before — painters declared it the death of art; it was the birth of a new one. Chapter 37 (seeing for a lifetime) and Chapter 39 (finding your voice) are, in effect, the curriculum for everything in Figure 33.7's right-hand column: the human capacities no model can reach. AI does not make this book's project obsolete. It makes it urgent.
📸 In the Field — Photograph what the machine can't. Go make one image whose entire value is something a generative model structurally cannot produce: witness, a specific real moment, a relationship, or your particular voice. Concretely — photograph a real person you know, in real light, at a real moment that mattered, with their consent (Chapter 32); or document something happening now in one of your recurring locations — the busy intersection, the market — that is true and specific and dated to today. Shoot 20, keep 3. Then ask of your best frame: could a model have made this? If the honest answer is "no — because it really happened, and I was there, and only I would have framed it this way," you are standing exactly where the photographer's value now lives. Keep that frame; it may be the one you bring to the Portfolio Checkpoint below.
🔄 Check Your Eye: 1. Name three capacities a generative model structurally cannot have, and say why each is not a temporary limitation. 2. Explain the economic argument: why does a flood of plausible AI images make a true photograph more valuable, not less? 3. Which is the losing game for a photographer — competing with AI on plausible quantity, or on truth/presence/voice — and why?
Answers
1. Any three of: witness (it isn't there and isn't a record), presence to a real singular moment (the moment is real and gone; only a present person catches it), relationship/consent (a human transaction with a real subject), personal voice (the residue of a real particular life), responsibility (only a person can owe and honor duties). None is temporary because each follows from what a generative model is: not a witness, not present, not a person.
2. Because plausible images become infinite and therefore worthless; scarcity shifts to what the flood can't make (the true, witnessed, voiced, accountable image), so that image's value rises.
3. Competing on plausible quantity loses (infinite supply, no human advantage); competing on truth/presence/voice wins because those are exactly what the machine cannot produce at any price.
Portfolio Checkpoint
Throughout this book you have been building a Photography Portfolio of twenty to thirty images. Part VII matures it from a pile of good frames into a considered, defensible body of work — and this chapter's contribution is unusual: it may be words rather than a picture, and that is exactly the point.
Do at least one of these two; you may do both.
(A) Write your stated stance on AI. In roughly 150–300 words, write — in plain language, as if for the "about" page of your portfolio site or the artist statement you will draft in Chapter 34 — your personal stance on AI in your work. Be specific and defensible, not vague. Answer concretely: Which AI tools do you use, and where on the capture-to-conjured spectrum (Figure 33.2) do you draw your line? What will you never do? How will you disclose what you do use? A strong stance reads like a promise a viewer could hold you to: "I use AI denoise and AI masking on my real captures; I never add or remove content with generative fill in any image presented as documentary or journalism; where I make AI-composited art, I label it 'AI composite' plainly and embed content credentials." Vague is worthless here. A stance is only worth the specifics it commits to.
(B) Add one clearly disclosed AI-assisted or comparison piece. Alternatively (or additionally), add to the portfolio one image that uses AI honestly and says so. For example: a real photograph you finished with AI denoise/masking and a one-line disclosure of exactly what was done; or a deliberate side-by-side "editing vs. generating" pair like Figure 33.4, captioned to teach the line; or an openly labeled AI composite that you are presenting as an AI composite, not as a photograph. Whatever you choose, the disclosure is part of the piece, not a footnote.
Why this belongs in the portfolio. A serious body of work in this decade is not just images — it is a position. Clients, galleries, editors, and viewers increasingly want to know where a photographer stands on the truth of their images, and a photographer who has thought it through and can say so plainly is trusted in a way one who hedges is not. Curation note: file your stance (and any disclosed piece) alongside the ethics audit you began in Chapter 32; together they are the integrity of your portfolio, the part that says not just "look how this image was made" but "here is who made it, and what they will and will not do." You will fold this into the artist statement in Chapter 34.
Summary
This chapter confronted the live question of the decade: what a photograph is when an image can be conjured rather than captured. The durable answers, for reference:
- A camera measures light that was there; a generative model predicts pixels that were not. A generative AI produces images by predicting what is statistically likely from patterns in training data; today's photorealistic generators are diffusion models that build an image by reversing noise, steered by a prompt. The output is plausibility, not evidence — which is why it hallucinates specifics (hands, text, shadows, counts) and why "it looks real" tells you nothing about whether it is real.
- Sort AI tools by how much they invent. AI denoise and AI sharpening interpret/correct the real signal; AI masking invents nothing (it only understands your frame); AI upscaling invents plausible fine detail; generative fill / extend / replace invents whole new content. Everything up to upscale refines what you captured; upscale and fill begin adding what you didn't.
- The editing/generating line is the Chapter 29 test with sharper teeth. Enhancement presents the best version of what was there; manipulation changes the record. For AI: did the source pixels come from the scene (editing) or did the model invent them (generating)? Removal is the hard middle — resolved by asking whether the removed element was incidental to the photo's claim. The line is often invisible in the result, so it must live in you as a decided rule.
- Disclose on two layers. Plain-language labeling (a sentence a human reads) informs people but is easy to strip; content credentials / provenance (signed, embedded edit-history) is verifiable but only if software surfaces it. Use both. Technically disclosed ≠ disclosed: the test is whether a normal viewer, in the normal way they meet the image, would understand. Match the detail of the disclosure to the stakes the image carries.
- Copyright is unsettled — know the three independent questions. (1) Output ownership: captured photos are clearly yours; pure prompts may be no one's (human authorship). (2) Training-data legality: open, litigated, varies by country — your work may be in the sets. (3) Output infringement: a generation can infringe a style/character or a real person's likeness (right of publicity, Chapter 32); "the AI did it" has protected no one. Keep your RAWs as proof of authorship; business mechanics live in Chapter 35.
- The photographer's value is now scarcer and clearer. A model cannot witness, be present to a real moment, have a relationship with a subject, make your voice, or take responsibility. As plausible images flood toward free, the true image — witnessed, voiced, accountable — becomes more valuable, not less. Compete on truth, presence, and meaning, never on plausible quantity.
| Situation | Honest default |
|---|---|
| High-ISO real capture, noisy | AI denoise — fine; it refines your real frame |
| Need a precise local edit (Ch.25/28) | AI masking — fine; invents nothing |
| Modest enlargement for print | AI upscale — fine in moderation; disclose if pushed hard |
| Personal art, removing incidental clutter | Defensible (Ch.29 test) — note it |
| News / documentary / journalism | No added or removed content; disclose any process precisely |
| Real-estate, product, dating, anything claiming "this is how it is" | No generative add/remove; it falsifies a claim |
| Openly-labeled AI art / composite | Fine — label it plainly as AI and embed credentials |
| Any image of a real, recognizable person | Consent + right of publicity apply (Ch.32) regardless of tool |
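The sorting rule behind this table (rank each tool by how much it invents, then check that against what the image claims to be) can be sketched as a small lookup in Python. Everything named here is hypothetical: the tool keys, the four levels, and the set of "record" genres are illustrative labels drawn from this chapter's summary, and the returned advice deliberately simplifies the table. It does not replace your own judgment on the hard middle cases, such as removing incidental clutter from personal work.

```python
from enum import IntEnum


class Invention(IntEnum):
    """How much a tool invents, ordered from least to most (illustrative labels)."""
    NOTHING = 0         # only selects regions of your real frame
    REFINES_SIGNAL = 1  # interprets or corrects what was captured
    FINE_DETAIL = 2     # adds plausible detail that was not recorded
    NEW_CONTENT = 3     # adds, extends, or replaces whole content


# Hypothetical tool names mapped to the levels described in the Summary.
TOOL_LEVEL = {
    "ai_masking": Invention.NOTHING,
    "ai_denoise": Invention.REFINES_SIGNAL,
    "ai_sharpen": Invention.REFINES_SIGNAL,
    "ai_upscale": Invention.FINE_DETAIL,
    "generative_fill": Invention.NEW_CONTENT,
    "generative_extend": Invention.NEW_CONTENT,
    "generative_replace": Invention.NEW_CONTENT,
}

# Genres whose images claim "this is how it is" (assumed grouping of table rows).
RECORD_GENRES = {"news", "documentary", "journalism", "real_estate", "product", "dating"}


def honest_default(tool: str, genre: str) -> str:
    """Return a simplified version of the table's honest default for a tool and genre."""
    level = TOOL_LEVEL[tool]
    if level == Invention.NEW_CONTENT:
        if genre in RECORD_GENRES:
            return "avoid: it falsifies the image's claim"
        return "label plainly as AI and embed credentials"
    if level == Invention.FINE_DETAIL:
        return "fine in moderation; disclose if pushed hard"
    return "fine: it refines or selects within your real frame"
```

For example, `honest_default("generative_fill", "news")` returns the "avoid" advice, while `honest_default("ai_masking", "news")` returns "fine". The ordering of the enum is the useful part: the higher the level, the more the decision depends on what the image claims to be.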
Spaced Review
Test yourself on earlier chapters without scrolling back — retrieval is how it sticks.
- (Chapter 29) State the one-sentence test that separates enhancement from manipulation, and give one reason it is not about how much work you did.
- (Chapter 32) A photograph depicts a recognizable real person. Name the consent-related concept and the legal concept that govern using their likeness — and explain why a generated likeness doesn't escape them.
- (Chapter 29) Why is removing a sensor-dust spot from a clear sky enhancement, while removing a real person from that same sky is manipulation, even though both might use a heal/fill tool?
Answers
1. *Enhancement* presents the best version of what was actually there; *manipulation* changes the record of what was there. It is not about effort because a single click can be a profound manipulation (removing a wedding ring) while an hour of dodging can be pure enhancement; the test is the result's relationship to the truth, not the labor.
2. **Informed consent** (the person knowingly agreeing to be photographed/used) and the **right of publicity** (control over the commercial use of one's likeness); a generated likeness doesn't escape them because the duty attaches to *using a real person's likeness*, and you chose to create and publish it.
3. The dust was on the sensor, never in the world, so removing it brings the image *closer* to the scene's truth (enhancement); the person was genuinely there, so removing them makes the photograph assert a false fact (manipulation).
What's Next
You have a stance on what your images are and how honestly you make them. The next move is to put them in front of people. Chapter 34 — Sharing Your Work is about everything that happens after the edit: how to sequence a body of work so the order itself carries meaning, how the print differs from the screen, how to build a portfolio site and use social media without losing your voice to the algorithm, and how to write the artist statement that frames it all — the statement into which you will fold the AI stance you just drafted. You have made the photographs and decided what they mean. Now you learn to let them be seen.