Exercises: Generative AI: Ethics of Creation and Deception
These exercises progress from concept checks to challenging applications. Estimated completion time: 3-4 hours.
Difficulty Guide:

- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)
Part A: Conceptual Understanding ⭐
Test your grasp of core concepts from Chapter 18.
A.1. Section 18.1.1 distinguishes between analytical AI systems and generative AI systems. In your own words, explain this distinction. Why does the shift from analysis to creation raise ethical questions that earlier AI systems did not?
A.2. Explain how a large language model (LLM) generates text, as described in Section 18.1.2. Why is the statement "LLMs understand language" misleading, even though their outputs can be remarkably coherent?
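Before answering A.2, it can help to see the predict-the-next-token loop in miniature. The sketch below is an illustration added here, not the chapter's example: it trains a bigram word model on a toy corpus and greedily emits the most frequent next word. Real LLMs replace the frequency table with a neural network over subword tokens and sample rather than always taking the top choice, but the generation loop has the same shape, and nothing in it requires understanding.

```python
from collections import Counter, defaultdict

# Toy corpus; real models train on trillions of tokens.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows which (a bigram frequency table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, n_words):
    """Greedily emit the most frequent next word: prediction, not understanding."""
    out = [start]
    for _ in range(n_words):
        candidates = follows[out[-1]].most_common(1)
        if not candidates:
            break
        out.append(candidates[0][0])
    return " ".join(out)

print(generate("the", 4))  # prints "the cat sat on the"
```

The output is fluent-looking purely because of co-occurrence counts, which is the seed of an answer to why "LLMs understand language" is misleading.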
A.3. Section 18.1.3 describes how diffusion models generate images. Using the noise-to-image metaphor from the chapter, explain the process in two to three sentences. Then explain why this process makes it possible to create photorealistic images of events that never happened.
A.4. Define "AI hallucination" as presented in Section 18.4. Why is hallucination a particularly dangerous problem in domains such as law, medicine, and journalism?
A.5. Section 18.2.1 presents a debate between artists who call training data collection "theft" and AI companies that compare it to human artists learning by studying existing works. Summarize both positions. Then identify one specific way in which the comparison to human learning breaks down.
A.6. Explain the concept of "ghost work" as introduced in Section 18.2.2. Why does the chapter argue that ghost work is a justice issue and not merely a labor market concern?
A.7. Section 18.6 describes watermarking and the C2PA content provenance standard. What problem do these technologies attempt to solve? Identify one limitation of each approach.
Part B: Applied Analysis ⭐⭐
Analyze scenarios, arguments, and real-world situations using concepts from Chapter 18.
B.1. Consider the following scenario:
A mid-size law firm uses an LLM to draft legal briefs. An associate submits a brief to a court that includes three case citations generated by the LLM. None of the three cases exist — they are hallucinated citations with plausible-sounding names, realistic docket numbers, and fabricated holdings. The judge discovers the fabrication and sanctions the firm.
Using the accountability framework from Chapter 17 and the hallucination analysis from Section 18.4, analyze this scenario. Who is responsible — the associate, the firm, or the LLM provider? How does the concept of the accountability gap apply? What governance measures could prevent this outcome?
B.2. Section 18.3 examines the societal risks of deepfakes. Classify the following deepfake scenarios by the type of harm they represent (epistemic, interpersonal, political, or economic) and explain your reasoning:
- (a) A deepfake video of a CEO announcing a merger that never happened, causing the company's stock price to surge
- (b) A deepfake pornographic image of a high school student, shared among classmates
- (c) A deepfake audio recording of a political candidate making racist statements, released 48 hours before an election
- (d) A deepfake video of a news anchor reporting a fabricated natural disaster, shared on social media
- (e) A deepfake voice call impersonating a grandparent, asking a grandchild to wire money for an "emergency"
B.3. A visual artist discovers that her portfolio — 2,000 original digital paintings posted on her personal website — was included in the training dataset for a popular image generation model. She never consented to this use. Users of the model can now generate images "in the style of [her name]" by including her name in a text prompt. Apply Nissenbaum's contextual integrity framework (introduced in Chapter 7 and referenced in Section 18.2.1) to analyze this situation. Was a norm of information flow violated? Who is responsible?
B.4. Section 18.5 examines generative AI's impact on labor markets. Ray Zhao's team at NovaCorp replaces four junior copywriters with an LLM system that generates marketing content. The system costs $2,000 per month; the copywriters' combined salaries were $180,000 per year. Analyze this decision from:

- (a) An economic efficiency perspective
- (b) A labor rights perspective
- (c) A quality and accountability perspective
- (d) The perspective of the displaced workers
Does the economic efficiency argument settle the question? Why or why not?
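For part (a), the headline arithmetic is worth stating exactly. A quick check using the scenario's figures; note that it ignores oversight labor, error and liability risk, and severance, which parts (b) through (d) ask you to weigh:

```python
llm_annual_cost = 2_000 * 12        # $2,000/month LLM subscription
copywriter_salaries = 180_000       # combined salaries of the four copywriters
annual_savings = copywriter_salaries - llm_annual_cost
print(annual_savings)               # prints 156000
```

The efficiency case rests on this single number; the exercise asks whether a single number can settle the question.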
B.5. The chapter discusses the concept of "model collapse" — the degradation that occurs when generative AI models are trained on content produced by other generative AI models. Explain why this is a problem. Then consider: if the internet becomes saturated with AI-generated text, what are the implications for (a) future AI training, (b) human knowledge production, and (c) the epistemic foundations of democratic society?
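A minimal numeric sketch of the dynamic behind B.5 (an illustration added here, not the chapter's model): fit a Gaussian to samples, then train each new "generation" only on samples drawn from the previous fit. Estimation error compounds and the fitted spread decays toward zero, which is the diversity loss at the heart of model collapse.

```python
import random
import statistics

random.seed(0)

mu, sigma = 0.0, 1.0      # generation 0: the "real" data distribution
n = 10                    # small samples make the effect fast and visible
stds = []
for generation in range(300):
    # Each generation sees only the previous generation's synthetic output.
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    stds.append(sigma)

print(f"fitted std: gen 1 = {stds[0]:.3f}, gen 300 = {stds[-1]:.2e}")
```

Later generations reproduce an ever narrower caricature of the original distribution; analogous degradation has been reported for text and image models trained on their own outputs.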
B.6. Sofia Reyes argues in Section 18.2.2 that the labor question in AI ethics "doesn't get the attention it deserves." Consider the following supply chain for an AI-generated image:
- A photographer takes 10,000 photos over ten years
- A web scraper collects those photos from the photographer's website
- Data annotators in Kenya label the photos for $1.50/hour
- Engineers at an AI company use the labeled data to train a diffusion model
- A marketing agency uses the model to generate images for a client's campaign
- The campaign earns $500,000 in revenue
Map the value chain. Who captures the economic value? Who bears the costs and risks? What governance mechanisms, if any, could redistribute value more equitably along this chain?
Part C: Real-World Application Challenges ⭐⭐-⭐⭐⭐
These exercises ask you to investigate your own experience with generative AI.
C.1. ⭐⭐ Hallucination Detection. Use a publicly available LLM (such as ChatGPT, Claude, or an open-source model) to generate a 500-word essay on a topic you know well. Carefully fact-check every claim, citation, and specific detail. Document: (a) how many factual errors you found, (b) whether any fabricated sources were cited, (c) whether the errors were subtle or obvious, and (d) what a reader without domain expertise would likely have believed. Write a one-page reflection on what this reveals about the reliability of LLM-generated content.
C.2. ⭐⭐ Training Data Investigation. Select a generative AI system you have used or encountered. Research: (a) what training data the system was built on, (b) whether the creators of that training data consented to its use, (c) whether any training data was copyrighted, and (d) what the company's stated policy is on training data sourcing. Note how easy or difficult it was to find this information. Write a one-page assessment of the ethical adequacy of the system's training data practices.
C.3. ⭐⭐⭐ Watermark and Provenance Test. Research the C2PA content provenance standard described in Section 18.6. Identify one platform or tool that has implemented C2PA metadata. Then test it: generate an AI image using a tool that embeds C2PA metadata, download it, and share it through two or three different channels (email, social media, messaging). Does the provenance metadata survive the sharing process? Document your findings and assess whether C2PA is a practical solution to the provenance problem.
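For C.3, a crude way to check whether provenance data survived a sharing channel is to look for C2PA-labelled bytes in the re-downloaded file. The sketch below is a heuristic of my own, not a validator: C2PA manifests are stored in JUMBF containers, so the ASCII labels `jumb` and `c2pa` typically appear in files that carry them. It cannot verify signatures or detect tampering; the official `c2patool` is the right instrument for that.

```python
def has_c2pa_marker(path: str) -> bool:
    """Heuristic: do C2PA/JUMBF label bytes still appear in the file?"""
    with open(path, "rb") as f:
        data = f.read()
    return b"c2pa" in data or b"jumb" in data

# Compare the freshly generated image against copies re-downloaded
# from each sharing channel, e.g.:
# for p in ["original.jpg", "via_email.jpg", "via_social.jpg"]:
#     print(p, has_c2pa_marker(p))
```

A `False` on a shared copy strongly suggests the platform re-encoded the image and stripped the manifest, which is exactly the survivability question the exercise asks you to document.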
C.4. ⭐⭐⭐ Style Imitation Experiment. Using a text-based generative AI tool, prompt the system to write "in the style of" three different well-known authors (one living, one recently deceased, one historical). Evaluate the outputs: (a) How convincingly does the system imitate each author's style? (b) What ethical questions arise from the system's ability to replicate a living author's voice? (c) Should there be a legal or ethical distinction between imitating a living author and a historical one?
Part D: Synthesis & Critical Thinking ⭐⭐⭐
These questions require you to integrate multiple concepts from Chapter 18 and think beyond the material presented.
D.1. The chapter identifies a tension between two positions on training data ethics. Position A: training data collection without consent is a violation of creators' rights that should be prohibited or compensated. Position B: training on publicly available data is analogous to human learning and should be permitted. Write a 400-600 word essay that moves beyond this binary by proposing a third position — a governance framework that addresses the legitimate concerns of both sides. Your framework should be specific enough to be operationalized.
D.2. Dr. Adeyemi observes in Section 18.1.4: "For centuries, we've operated on the assumption that photographs depict reality, that audio recordings capture actual speech, and that video shows actual events. Generative AI breaks all three assumptions simultaneously." Analyze the epistemic implications of this statement. If we can no longer trust that photographs, audio, and video are authentic, what happens to: (a) journalism, (b) criminal evidence, (c) democratic discourse, and (d) personal trust? Is there a viable path to restoring epistemic reliability in the generative AI era?
D.3. The chapter discusses the speed of adoption of generative AI (Section 18.1.5) and the resulting "governance vacuum." Compare the governance vacuum around generative AI to a historical precedent — the introduction of the printing press, the automobile, nuclear weapons, or the internet. What does the historical comparison reveal about likely governance trajectories? What does it miss?
D.4. Section 18.5 examines labor displacement by generative AI. Some economists argue that technological displacement has always been followed by the creation of new jobs — the "lump of labor fallacy" critique. Others argue that generative AI is different because it targets cognitive and creative work that was previously considered uniquely human. Write a 300-500 word analysis evaluating both positions. Reference at least two concepts from the chapter.
Part E: Research & Extension ⭐⭐⭐⭐
These are open-ended projects for students seeking deeper engagement. Each requires independent research beyond the textbook.
E.1. The Copyright Landscape. Research the current state of copyright litigation involving generative AI, focusing on at least two of the following cases: Andersen v. Stability AI, The New York Times v. Microsoft and OpenAI, Getty Images v. Stability AI, or Doe v. GitHub (Copilot). Write a 1,000-word report covering: (a) the plaintiffs' claims, (b) the defendants' arguments, (c) the current status of the litigation, (d) the potential implications of different outcomes for the AI industry, and (e) how the cases connect to the training data ethics debate in Section 18.2.
E.2. Ghost Work Investigation. Research the labor conditions of data annotation workers who train AI systems, focusing on one or more of the following sources: the 2023 Time investigation of OpenAI's Kenyan data labelers, the work of Mary L. Gray and Siddharth Suri (Ghost Work, 2019), or recent reporting on Scale AI, Sama, or Amazon Mechanical Turk. Write a report (800-1,200 words) addressing: (a) what work annotators perform, (b) working conditions and compensation, (c) the psychological impacts of content moderation labeling, (d) the power asymmetry between AI companies and annotators, and (e) what governance mechanisms could improve conditions.
E.3. Comparative Deepfake Governance. Compare the governance approaches to deepfakes and synthetic media in at least three jurisdictions (e.g., the EU AI Act's synthetic media disclosure requirements, US state-level deepfake laws, China's Deep Synthesis Provisions, or South Korea's regulations). Write a comparative analysis (600-1,000 words) identifying: (a) what each jurisdiction requires, (b) how enforcement works, (c) whether the approaches adequately protect against political, interpersonal, and economic deepfake harms, and (d) which approach is most effective and why.
Solutions
Selected solutions are available in appendices/answers-to-selected.md.