Quiz: Generative AI: Ethics of Creation and Deception

Test your understanding before moving to the next chapter. Target: 70% or higher (at least 20 of 28 points) to proceed.


Section 1: Multiple Choice (1 point each)

1. Which of the following best describes the fundamental distinction between analytical AI and generative AI as presented in this chapter?

  • A) Analytical AI uses supervised learning; generative AI uses unsupervised learning.
  • B) Analytical AI processes and classifies existing data; generative AI produces new content that did not previously exist.
  • C) Analytical AI is deterministic; generative AI is probabilistic.
  • D) Analytical AI requires human oversight; generative AI operates autonomously.
**Answer: B)** Analytical AI processes and classifies existing data; generative AI produces new content that did not previously exist. *Explanation:* Section 18.1.1 defines this as the "generative turn." Previous AI systems — search engines, recommendation algorithms, credit scoring models — analyzed, classified, and acted on existing data. Generative AI produces new text, images, audio, and video. This shift is not just technical but ethical: creation raises questions about authorship, truth, originality, and labor that analysis does not. Options A, C, and D describe technical properties that may be true of some systems but do not capture the core distinction the chapter draws.

2. A large language model generates text by:

  • A) Retrieving relevant passages from a database of stored documents and assembling them into coherent responses
  • B) Predicting the most probable next token in a sequence, based on patterns learned from massive training datasets
  • C) Understanding the meaning of a prompt and composing a response based on comprehension and reasoning
  • D) Running a series of if-then rules that match keywords in the prompt to pre-written response templates
**Answer: B)** Predicting the most probable next token in a sequence, based on patterns learned from massive training datasets. *Explanation:* Section 18.1.2 describes LLMs as neural networks trained on vast text corpora to predict the most probable next token (word or word fragment). The chapter emphasizes that despite their fluency, LLMs do not "understand" text — they model statistical patterns. Option A describes a retrieval system, not a generative model. Option C attributes understanding that Section 18.1.2 explicitly denies. Option D describes a rule-based system, which is a fundamentally different architecture.
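To make the mechanism concrete, here is a minimal, illustrative sketch of next-token sampling. Everything in it — the tiny vocabulary, the probabilities, the two-token context — is invented for demonstration; a real LLM computes these distributions with a neural network over a vocabulary of tens of thousands of tokens.

```python
import random

# Toy next-token tables: map a two-token context to a probability
# distribution over possible next tokens. The entries are invented
# for illustration only.
NEXT_TOKEN_PROBS = {
    ("the", "cat"): {"sat": 0.55, "ran": 0.25, "slept": 0.20},
    ("cat", "sat"): {"on": 0.70, "quietly": 0.20, "down": 0.10},
    ("sat", "on"): {"the": 0.80, "a": 0.20},
}

def generate(context, steps):
    """Sample the next token from the learned distribution, step by step."""
    tokens = list(context)
    for _ in range(steps):
        dist = NEXT_TOKEN_PROBS.get(tuple(tokens[-2:]))
        if dist is None:  # a context we have no table for
            break
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return " ".join(tokens)

print(generate(["the", "cat"], steps=3))  # e.g. "the cat sat on the"
```

Note what is absent: nothing in the loop checks whether a continuation is *true*, only whether it is *probable*. That is the sense in which fluency and hallucination (question 5) share a single mechanism.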

3. The ethical problem with web scraping for AI training data, as described in Section 18.2.1, is primarily that:

  • A) Web scraping violates the technical terms of service of most websites.
  • B) The creators of scraped content — writers, artists, photographers — did not consent to their work being used to train AI systems.
  • C) Scraped data is always of low quality and introduces errors into AI models.
  • D) Web scraping is prohibited by the GDPR in all circumstances.
**Answer: B)** The creators of scraped content — writers, artists, photographers — did not consent to their work being used to train AI systems. *Explanation:* Section 18.2.1 frames the core ethical problem as a consent violation: people who published creative work online did not anticipate or agree to that work being ingested by machine learning models. "Publicly available" is not the same as "freely available for any purpose." While terms-of-service violations (A) may be relevant in some cases, the chapter frames the issue as fundamentally about consent and contextual integrity. Option C conflates quality concerns with the ethical question. Option D overstates the GDPR's prohibitions.

4. The concept of "ghost work" as introduced in Section 18.2.2 refers to:

  • A) AI systems that operate without human supervision in the background
  • B) The hidden human labor — data annotation, content labeling, feedback rating — that makes AI systems function
  • C) Automated bots that impersonate human users on social media platforms
  • D) Jobs that have been eliminated by AI automation, leaving "ghost" positions in organizational charts
**Answer: B)** The hidden human labor — data annotation, content labeling, feedback rating — that makes AI systems function. *Explanation:* Section 18.2.2 introduces ghost work (a term from Mary L. Gray and Siddharth Suri) as the invisible, often poorly compensated human labor that enables AI systems to function. Data annotators label images, rate text outputs, and flag toxic content. This work is essential to AI systems but is deliberately hidden by marketing narratives that emphasize machine intelligence. The chapter particularly highlights the exploitative conditions faced by annotators in the Global South, including psychological trauma from content moderation work.

5. Which of the following best defines "AI hallucination" as used in this chapter?

  • A) The phenomenon in which an AI system produces visual distortions in generated images
  • B) A machine learning model's tendency to overfit to training data, producing accurate but ungeneralizable outputs
  • C) The generation of confident, plausible-sounding content that is factually incorrect or entirely fabricated
  • D) The tendency of users to perceive intelligence and understanding in AI systems that possess neither
**Answer: C)** The generation of confident, plausible-sounding content that is factually incorrect or entirely fabricated. *Explanation:* Section 18.4 defines hallucination as the production of content that is fluent, coherent, and presented with confidence but is factually wrong — including fabricated citations, invented statistics, and descriptions of events that never occurred. The term is significant because hallucinated content is often indistinguishable from accurate content without independent verification. Option D describes anthropomorphism, a related but distinct concept. Options A and B describe other technical phenomena.

6. Non-consensual intimate imagery (NCII) created using AI tools is discussed as an example of:

  • A) An economic harm from generative AI that affects the entertainment industry
  • B) A deepfake harm that is interpersonal in nature, disproportionately targeting women and girls
  • C) A hallucination problem unique to image generation models
  • D) A fair use exception under copyright law that permits parody and satire
**Answer: B)** A deepfake harm that is interpersonal in nature, disproportionately targeting women and girls. *Explanation:* Section 18.3 identifies NCII as one of the most urgent harms of generative AI. AI tools can now create photorealistic intimate images of real people from a single photograph — without the subject's consent or knowledge. Studies cited in the chapter indicate that the vast majority of deepfake pornography targets women. This is classified as an interpersonal harm rather than an economic or political one because the damage is to individual dignity, reputation, and psychological well-being — though it also has political implications when used to silence women in public life.

7. Section 18.6 describes the C2PA standard. What problem is C2PA designed to address?

  • A) Preventing AI models from being trained on copyrighted data
  • B) Providing a cryptographic chain of provenance metadata that records how a piece of content was created and edited
  • C) Detecting deepfakes in real time using machine learning classifiers
  • D) Compensating content creators when their work is used in AI training datasets
**Answer: B)** Providing a cryptographic chain of provenance metadata that records how a piece of content was created and edited. *Explanation:* Section 18.6 describes the Coalition for Content Provenance and Authenticity (C2PA) as a technical standard that embeds metadata in digital content — recording whether an image was captured by a camera, generated by AI, or edited by software. The goal is to allow viewers to verify how content was created, addressing the epistemic crisis caused by synthetic media. C2PA does not detect deepfakes (C) — it provides provenance information for content that opts into the standard. It does not address copyright or compensation (A, D).
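To give a rough sense of how provenance checking works in practice, the sketch below inspects a simplified, hypothetical manifest. The field names and the `verify_signature` helper are placeholders invented for illustration — they are not the actual C2PA schema or any real library's API.

```python
# Hypothetical, simplified provenance check in the spirit of C2PA.

def verify_signature(manifest: dict) -> bool:
    # Stand-in for real cryptographic verification of the manifest's
    # signing-certificate chain.
    return manifest.get("signature_valid", False)

def describe_provenance(asset: dict) -> str:
    manifest = asset.get("c2pa_manifest")
    if manifest is None:
        # Key limitation: absence of metadata proves nothing either way.
        return "No provenance data (NOT proof of inauthenticity)."
    if not verify_signature(manifest):
        return "Provenance present but signature invalid; treat as untrusted."
    history = ", ".join(step["action"] for step in manifest["edit_history"])
    return f"Created by {manifest['claim_generator']}; history: {history}"

photo = {
    "c2pa_manifest": {
        "claim_generator": "ExampleCam 4.2",  # camera, editor, or AI tool
        "signature_valid": True,
        "edit_history": [{"action": "created"}, {"action": "cropped"}],
    }
}
print(describe_provenance(photo))
print(describe_provenance({"pixels": "..."}))  # metadata stripped in transit
```

The asymmetry in the no-manifest branch matters: valid metadata supports trust, but missing metadata proves nothing — the same limitation question 12 turns on.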

8. The chapter argues that the speed of generative AI adoption creates a "governance vacuum." This concept means:

  • A) No government has any regulations applicable to generative AI
  • B) The technology is widely used before the rules, norms, and institutions needed to govern it responsibly have been developed
  • C) Companies deliberately avoid governance by incorporating in jurisdictions without AI regulations
  • D) Academic researchers have not yet studied generative AI's impacts
**Answer: B)** The technology is widely used before the rules, norms, and institutions needed to govern it responsibly have been developed. *Explanation:* Section 18.1.5 describes the governance vacuum as the temporal gap between technology adoption and governance adaptation. ChatGPT reached 100 million users within two months of launch — far faster than regulatory, professional, educational, or social institutions could respond. The result is a period in which the technology is deployed at scale without adequate governance. This is not the same as claiming there are no regulations at all (A) — existing laws apply in some cases — but rather that the governance infrastructure is insufficient for the pace and scope of deployment.

9. The analogy that AI companies draw between AI training on publicly available data and human artists learning by studying existing works is contested in the chapter because:

  • A) Human artists never study existing works — they create entirely from imagination
  • B) Unlike a human artist, an AI model can memorize and reproduce training data with high fidelity, and it operates at a commercial scale that the human analogy does not support
  • C) Copyright law explicitly prohibits AI systems from learning from existing works
  • D) Human artists always compensate the artists whose work they study
**Answer: B)** Unlike a human artist, an AI model can memorize and reproduce training data with high fidelity, and it operates at a commercial scale that the human analogy does not support. *Explanation:* Section 18.2.1 presents and critiques this analogy. The chapter acknowledges that human artists do learn by studying others' work but identifies key differences: a human who studies Monet does not memorize every brushstroke and reproduce them on command, while an AI model can generate images "in the style of Monet" with high fidelity. Furthermore, the human analogy involves individual learning, while AI training involves commercial-scale ingestion of millions of works for profit. Option A is false (human artists do study others). Option C overstates current law. Option D is false (human artists do not compensate artists they study).

10. Sofia Reyes argues that the AI labor question "doesn't get the attention it deserves." Which of the following best captures her concern?

  • A) AI companies employ too many engineers, inflating labor costs
  • B) The value created by AI systems depends on low-wage, often exploitative human labor — particularly data annotation — performed by workers in the Global South
  • C) AI systems are not productive enough to justify their development costs
  • D) AI companies should hire more lobbyists to advocate for favorable labor regulations
**Answer: B)** The value created by AI systems depends on low-wage, often exploitative human labor — particularly data annotation — performed by workers in the Global South. *Explanation:* Section 18.2.2 presents Sofia's argument that AI ethics discussions tend to focus on bias and privacy while neglecting the labor dimension: who does the work that makes AI work, under what conditions, and for what pay. The chapter documents data annotators earning $1-2 per hour, experiencing psychological trauma from content moderation work, and lacking basic worker protections — while the value of their labor flows to AI companies and shareholders. This is the power asymmetry at its sharpest.

Section 2: True/False with Justification (1 point each)

11. "AI hallucination is a temporary problem that will be fully solved as language models are trained on larger and higher-quality datasets."

**Answer: False.** *Explanation:* Section 18.4 explains that hallucination is an inherent feature of how LLMs work — they generate text by predicting probable next tokens, not by verifying factual accuracy. Larger datasets and better training can reduce hallucination rates but cannot eliminate them entirely because the generation mechanism is fundamentally probabilistic, not truth-seeking. The chapter notes that even the most advanced models continue to hallucinate, and that reducing hallucination in one domain may not transfer to others. Treating hallucination as "solvable" creates a false sense of reliability.

12. "The C2PA content provenance standard can verify the authenticity of any digital image or video, including content that existed before the standard was implemented."

**Answer: False.** *Explanation:* Section 18.6 clarifies that C2PA is a *prospective* standard — it can record provenance information for content created by cameras, software, or AI systems that implement the standard going forward. It cannot retroactively verify content that existed before C2PA was adopted. Moreover, C2PA is opt-in: content creators and platforms must choose to implement it. Content without C2PA metadata is not thereby proven inauthentic — it simply lacks provenance information. The chapter also notes that C2PA metadata can be stripped by sharing content through channels that do not preserve it.

13. "Copyright law in most jurisdictions has been definitively settled regarding whether AI-generated content is eligible for copyright protection."

**Answer: False.** *Explanation:* Section 18.5 documents the unsettled state of copyright law regarding AI-generated content. The U.S. Copyright Office has ruled that purely AI-generated content cannot be copyrighted because copyright requires human authorship — but cases involving substantial human direction of AI tools remain contested. Other jurisdictions take different positions. The training data side is equally unsettled: multiple major lawsuits (*Andersen v. Stability AI*, *The New York Times v. OpenAI*) are testing whether training on copyrighted content constitutes infringement or fair use. The chapter describes the legal landscape as actively evolving, not settled.

14. "Deepfakes are primarily a political problem — their main risk is election manipulation and propaganda."

**Answer: False.** *Explanation:* Section 18.3 identifies multiple categories of deepfake harm: political (election manipulation, propaganda), interpersonal (non-consensual intimate imagery, harassment, bullying), economic (fraud, stock manipulation, impersonation scams), and epistemic (erosion of trust in all media). The chapter emphasizes that interpersonal harms — particularly NCII targeting women and girls — are currently the most prevalent and damaging category of deepfake harm, even though political deepfakes receive more media attention. Reducing deepfakes to a political problem underestimates the scope of the threat.

15. "The chapter argues that generative AI's impact on labor markets is fundamentally different from previous waves of technological automation because generative AI targets cognitive and creative work."

**Answer: True.** *Explanation:* Section 18.5 presents the argument that while previous waves of automation primarily displaced manual and routine cognitive labor, generative AI targets skills that were previously considered uniquely human: writing, visual art, music composition, translation, coding, and complex analysis. This does not mean that no new jobs will be created — the chapter acknowledges the historical pattern of technological job creation — but it raises the question of whether the "new jobs will emerge" response applies when the technology directly replicates the cognitive and creative capabilities that defined the displaced jobs. The chapter presents this as an open question, not a settled conclusion.

Section 3: Short Answer (2 points each)

16. The chapter uses Nissenbaum's contextual integrity framework to analyze the ethics of training data collection. Explain how contextual integrity applies to the scenario of an artist whose portfolio is scraped for AI training. What norm of information flow is violated?

**Sample answer:** Nissenbaum's contextual integrity framework holds that privacy violations occur when information flows in ways that violate the norms appropriate to the context in which the information was shared. When an artist posts paintings to a portfolio website, the context is professional display: the norms include that viewers will see the work, potential clients will consider hiring the artist, and critics will engage with it.

Training an AI model on those images violates contextual integrity because the information flow — from portfolio to commercial AI training dataset — is not consistent with the norms of professional display. The artist shared work for viewing and appreciation, not for commercial ingestion by a system that can then produce competing work in the artist's style. The violation exists even though the work was "publicly available," because public availability within one context does not authorize use in any context. This distinction between public accessibility and a universal license to use is central to the chapter's analysis of training data ethics.

*Key points for full credit:*

  • Explains contextual integrity as norm-of-context analysis
  • Identifies the specific context (professional display) and the violated norm
  • Distinguishes between "publicly available" and "available for any purpose"

17. Explain the concept of "model collapse" and why it poses a long-term threat to the quality of AI systems and the broader information ecosystem.

**Sample answer:** Model collapse occurs when generative AI systems are trained on content that was itself produced by generative AI systems, creating a feedback loop that progressively degrades output quality. As the internet fills with AI-generated text and images, future training datasets will inevitably include AI-generated content — and models trained on this content will reproduce and amplify the patterns, errors, and biases of their AI predecessors rather than learning from the diversity of genuine human expression.

The long-term threats are twofold. For AI systems, model collapse means diminishing returns: each generation of models trained on AI-polluted data produces outputs that are blander, more homogeneous, and less accurate than the previous generation. For the broader information ecosystem, the proliferation of AI-generated content threatens to dilute the epistemic quality of publicly available knowledge — the very foundation on which AI training depends. This creates a tragedy-of-the-commons dynamic: each AI company benefits from generating content, but the collective result is the degradation of the shared information commons on which all AI systems depend.

*Key points for full credit:*

  • Defines model collapse as a training-on-AI-content feedback loop
  • Identifies the dual threat: to AI quality and to the information ecosystem
  • Notes the commons/collective action dimension
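The feedback loop can be shown with a deliberately tiny toy model. In the sketch below the "model" is just a Gaussian refit, each generation, to samples drawn from the previous generation; because generative models over-represent their most probable outputs, the tails of each sample are dropped before refitting. This is a pedagogical illustration of the mechanism under those stated assumptions, not a simulation of real LLM training.

```python
import random
import statistics

# Toy model collapse: generation 0 is "trained" on human data
# (a standard Gaussian). Every later generation is fit to outputs
# of its predecessor, with tails dropped to mimic a generative
# model's preference for high-probability content. Diversity
# (standard deviation) shrinks steadily toward zero.

random.seed(0)
mean, stdev = 0.0, 1.0  # generation 0

for generation in range(1, 11):
    samples = sorted(random.gauss(mean, stdev) for _ in range(1000))
    kept = samples[100:900]            # keep only the central 80%
    mean = statistics.fmean(kept)      # refit the next "model"
    stdev = statistics.stdev(kept)
    print(f"gen {generation:2d}: stdev = {stdev:.3f}")
```

In this setup the standard deviation falls by roughly a third per generation: rare, tail content vanishes first, and each successive model reproduces a narrower slice of the one before — the "blander, more homogeneous" trajectory the answer describes.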

18. The chapter presents the "governance vacuum" created by generative AI's rapid adoption. Explain what this vacuum is, why it exists, and identify two specific harms that have occurred within this vacuum.

**Sample answer:** The governance vacuum is the gap between the rapid deployment of generative AI technology and the slow development of the rules, institutions, norms, and oversight mechanisms needed to govern it responsibly. It exists because ChatGPT reached 100 million users in two months — faster than any regulatory body, professional association, educational institution, or social norm could adapt. The result is a period in which the technology is used at massive scale while the governance infrastructure remains inadequate.

Two specific harms that have occurred within this vacuum: First, AI-generated NCII (non-consensual intimate imagery) proliferated before most jurisdictions had laws specifically criminalizing it, leaving victims without clear legal recourse in many states and countries. Second, AI hallucination in professional contexts — such as lawyers submitting AI-fabricated case citations to courts — occurred because neither bar associations nor legal education had established clear rules about LLM use in practice before practitioners began using the tools.

*Key points for full credit:*

  • Defines the governance vacuum as a temporal gap between adoption and governance
  • Explains why the gap exists (speed of adoption vs. speed of institutional adaptation)
  • Identifies two specific, concrete harms with clear connections to the vacuum

19. Section 18.5 discusses the tension between economic efficiency and creative labor. A company replaces human workers with AI. The company saves money. The displaced workers lose their livelihoods. Using concepts from the chapter, explain why the economic efficiency argument alone is insufficient to resolve this tension.

**Sample answer:** The economic efficiency argument holds that if AI produces acceptable output at lower cost, the substitution is justified because it creates economic value — savings can be reinvested, prices can fall, and resources can be allocated to higher-value activities. The chapter argues this is insufficient for several reasons.

First, the efficiency calculation ignores distributional consequences: the savings accrue to the company and its shareholders, while the costs — unemployment, lost income, identity disruption — fall entirely on the displaced workers. Economic efficiency measures aggregate welfare, not its distribution. Second, the argument assumes that "acceptable output" is equivalent to the human output it replaces. But the chapter notes that AI-generated content may lack the originality, contextual judgment, and accountability that human creators provide — qualities that are difficult to measure but socially valuable. Third, the efficiency argument ignores the labor conditions upstream in the AI supply chain: the low-wage data annotators whose work enables the system. If the full labor cost of the AI system — including exploited annotation labor — were internalized, the efficiency advantage might narrow significantly.

The chapter does not argue that economic efficiency is irrelevant, but rather that it must be weighed alongside distributive justice, quality considerations, and the full accounting of labor in the AI supply chain.

*Key points for full credit:*

  • Identifies the distributional problem (who gains, who loses)
  • Notes the quality/equivalence assumption
  • References the upstream labor dimension

Section 4: Applied Scenario (5 points)

20. Read the following scenario and answer all parts.

Scenario: TrueVoice

TrueVoice is a startup that offers AI-powered voice cloning for corporate training. Companies record a 30-second sample of a spokesperson's voice, and TrueVoice generates audio of that voice reading any script — enabling rapid production of training materials, product announcements, and internal communications.

One of TrueVoice's clients, Horizon Financial, clones the voice of its CEO, Maria Chen, to produce quarterly earnings call recordings. The recordings are accurate transcriptions of approved scripts. However, an employee at Horizon discovers that the TrueVoice platform's API can be used to generate audio of Maria Chen saying anything — not just approved scripts. The employee generates audio of Chen appearing to announce a major acquisition, and shares it in an investment forum. Horizon's stock price spikes 12% before the fraud is detected.

TrueVoice's terms of service prohibit misuse, and the employee violated company policy. Maria Chen did not know her voice was being cloned for any purpose beyond the approved training materials.

(a) Identify and classify the ethical violations in this scenario. For each, identify which chapter concept it relates to (e.g., consent, deepfakes, hallucination, labor, copyright). (1 point)

(b) Map the accountability chain. Identify at least four actors and evaluate each one's potential liability. (1 point)

(c) Apply the watermarking and provenance framework from Section 18.6. Could C2PA or digital watermarking have prevented or mitigated the harm? What are the limitations? (1 point)

(d) Evaluate TrueVoice's "terms of service" defense — the claim that they prohibited misuse and the employee violated those terms. Is this defense adequate? Why or why not? (1 point)

(e) Propose three governance measures — one technical, one legal, and one institutional — that would reduce the risk of this scenario occurring. (1 point)

**Sample answer:**

**(a)** Ethical violations:

  • **Consent violation (deepfakes/voice cloning):** Maria Chen consented to voice cloning for a specific, limited purpose (training materials). The platform's capability to generate her voice saying anything extends far beyond her consent — a violation of contextual integrity. Relates to Section 18.3 on deepfakes and Section 18.2.1 on consent.
  • **Market manipulation (economic deepfake harm):** The employee generated fabricated audio to influence stock prices — a form of economic fraud enabled by synthetic media. Relates to Section 18.3's taxonomy of deepfake harms.
  • **Inadequate access controls (platform governance):** TrueVoice's API allowed generation of unrestricted content using cloned voices, with only terms of service as a guardrail. Relates to Section 18.6 on governance tools.
  • **Informational harm to investors:** Investors acted on fabricated information, suffering potential financial losses. Relates to the epistemic harm category in Section 18.3.

**(b)** Accountability chain:

  • **The employee:** Directly generated and distributed the fraudulent audio. Violated company policy and securities law. Clearly liable for the fraudulent act itself.
  • **TrueVoice (platform):** Built a tool capable of generating unrestricted voice content from a 30-second sample. Terms of service prohibited misuse but the technical design permitted it. Potential liability: building a tool that foreseeably enables fraud without adequate safeguards. Claim of non-liability: "We prohibited misuse in our terms of service."
  • **Horizon Financial (client):** Authorized voice cloning of its CEO without adequate internal controls on how the cloned voice could be used. Potential liability: failure to establish security protocols around a powerful technology. Claim: "The employee acted outside their authority."
  • **Maria Chen (voice subject):** The person whose identity was exploited. Not liable — she is the victim. But the scenario raises the question of whether she was fully informed about the extent of the cloning capability.
  • **Investors:** Acted on the fabricated audio. While not liable, their losses illustrate the downstream harm of synthetic media in financial markets.

**(c)** C2PA provenance metadata embedded in the audio file could have indicated that the content was AI-generated, allowing platforms and investors to identify it as synthetic before acting. Digital watermarking (inaudible markers in the audio) could have allowed forensic detection even if the audio was shared through channels that strip metadata. However, the limitations are significant: (1) C2PA only works if the sharing platform checks and displays provenance information — many forums and messaging apps do not. (2) Watermarks can potentially be removed by sophisticated actors. (3) Neither technology prevents the initial creation of the harmful content — they only support detection after the fact. (4) In a fast-moving financial market, the damage from a 12% stock spike can occur in minutes, before provenance checks are applied.

**(d)** TrueVoice's terms-of-service defense is inadequate. Terms of service are contractual, not technical — they prohibit misuse but do not prevent it. This mirrors the chapter's argument about the consent fiction: clicking "I agree" to terms of service does not constitute meaningful governance when the technology itself enables the prohibited behavior. A responsible platform would implement technical safeguards (access controls, content restrictions, audit logs) rather than relying solely on contractual prohibition. The TOS defense is analogous to a gun manufacturer claiming no liability because they included a note saying "please don't commit crimes" — the defense acknowledges the risk but does nothing to mitigate it architecturally.

**(e)** Three governance measures:

  1. **Technical:** TrueVoice should implement scope-limited voice generation — the API should accept only pre-approved scripts authorized by the voice subject or their designated agent. All generated audio should be watermarked and embedded with C2PA metadata indicating AI generation. Audit logs should record every generation request with user identity and script content. (A minimal sketch of such a control follows this answer.)
  2. **Legal:** Jurisdictions should require explicit, informed, scope-limited consent from any person whose voice is cloned. Voice cloning without consent should be a civil violation; use of a cloned voice for fraud should carry criminal penalties. Securities regulators should classify AI-generated audio purporting to represent corporate officers as a category of market manipulation subject to existing securities fraud enforcement.
  3. **Institutional:** Horizon Financial should establish an internal AI governance committee that reviews all uses of generative AI tools, maintains access controls, and conducts regular audits of how tools are used. The company should implement a voice cloning policy requiring explicit CEO authorization for each use case, with technical controls enforcing scope limitations.
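To complement the technical measure proposed in (e), here is a minimal sketch of what a scope-limited, audited generation endpoint could look like. All names in it (`generate_voice`, `synthesize_audio`, the approval table) are hypothetical and invented for illustration; this is one possible shape for the control, not a reference implementation.

```python
import hashlib
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("truevoice.audit")

# Scripts the voice subject (or a designated agent) has explicitly
# approved, stored as SHA-256 hashes so approval is content-exact.
APPROVED_SCRIPTS = {
    "maria_chen": {hashlib.sha256(b"Q3 earnings call script v2").hexdigest()},
}

def generate_voice(user_id: str, voice_id: str, script: str) -> bytes:
    """Refuse any script the voice subject has not pre-approved,
    and audit-log every request, allowed or not."""
    script_hash = hashlib.sha256(script.encode()).hexdigest()
    audit_log.info(
        "request user=%s voice=%s script_sha256=%s at=%s",
        user_id, voice_id, script_hash,
        datetime.now(timezone.utc).isoformat(),
    )
    if script_hash not in APPROVED_SCRIPTS.get(voice_id, set()):
        raise PermissionError("Script not approved by the voice subject.")
    return synthesize_audio(voice_id, script)

def synthesize_audio(voice_id: str, script: str) -> bytes:
    # Placeholder: real output would carry an inaudible watermark and
    # C2PA metadata marking it as AI-generated.
    return b"<watermarked audio bytes>"

audio = generate_voice("hr_team", "maria_chen", "Q3 earnings call script v2")
```

The point of the sketch is that the scenario's fraud required an unapproved script: under this design that request is refused and logged rather than fulfilled, turning a contractual prohibition into an architectural one.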

Scoring & Review Recommendations

| Score Range | Assessment | Next Steps |
|---|---|---|
| Below 50% (< 14 pts) | Needs review | Re-read Sections 18.1-18.3 carefully, redo Part A exercises |
| 50-69% (14-19 pts) | Partial understanding | Review specific weak areas, focus on Part B exercises for applied practice |
| 70-85% (20-23 pts) | Solid understanding | Ready to proceed to Chapter 19; review any missed topics briefly |
| Above 85% (24+ pts) | Strong mastery | Proceed to Chapter 19: Autonomous Systems and Moral Machines |
| Section | Points Available |
|---|---|
| Section 1: Multiple Choice | 10 points (10 questions x 1 pt) |
| Section 2: True/False with Justification | 5 points (5 questions x 1 pt) |
| Section 3: Short Answer | 8 points (4 questions x 2 pts) |
| Section 4: Applied Scenario | 5 points (5 parts x 1 pt) |
| **Total** | **28 points** |