Case Study 20-1: Artist Litigation Against Generative AI — The Copyright Frontier
Overview
The emergence of AI image generation has produced some of the most significant intellectual property litigation in recent legal history. Beginning in January 2023, a series of lawsuits challenged the legality of training AI image generators on copyrighted artwork scraped from the internet, the generation of images in specific artists' styles, and the commercial exploitation of artists' creative work without consent or compensation.
These cases sit at the intersection of two domains that have historically coexisted uneasily: the economics of creative work and the development of powerful technology. The plaintiff artists argue that AI image generation is only possible because of their creative work, that they never consented to their work being used for AI training, and that the result of that training directly competes with them commercially. The AI companies argue that training on publicly available images is transformative fair use — the same principle that allows Google to cache web pages, search engines to analyze content, and researchers to study human-created works — and that the alternative would make AI development legally impossible.
The outcomes of this litigation will resolve questions that copyright law has never previously confronted, and they will shape the economics of creative work and AI development for decades.
The Legal Landscape Before 2023
What Copyright Law Protects
Copyright law protects original creative expression — not ideas, not styles, not facts, but the specific original expression that an author has created. The owner of a copyright has the exclusive right to reproduce the work, create derivative works, distribute copies, perform the work publicly, and display it publicly. These rights last for the author's lifetime plus 70 years.
Several things copyright does not protect are relevant to AI:
- Style. The style of a painter, musician, or writer is not copyrightable. You can paint in the style of Vincent van Gogh; you cannot reproduce a specific Van Gogh painting. Style as such is free for anyone to use.
- Ideas. Underlying ideas are not copyrightable. The idea of a hero's journey, the concept of revenge tragedy, the composition of a landscape painting — these are not protected expression.
- Facts. Facts are not copyrightable, though the creative selection and arrangement of facts can be.
The style/expression distinction is crucial in AI image litigation: can AI companies argue that their systems have learned artists' styles (unprotectable) rather than reproducing their specific works (protected)?
Fair Use
Section 107 of the Copyright Act provides that reproduction of copyrighted works for purposes of "criticism, comment, news reporting, teaching, scholarship, or research" may be fair use. Courts assess fair use through a four-factor test:
- Purpose and character of the use: Transformative uses — which add new meaning, expression, or message — are more likely to be fair use. Commercial uses are less likely to be fair use.
- Nature of the copyrighted work: More creative works receive stronger protection.
- Amount and substantiality: The more of the work that is copied, and the more central the copied portion is to the original work, the less likely fair use applies.
- Market effect: Uses that substitute for the original work in its market receive less fair use protection.
These factors must be balanced holistically; no single factor is determinative. The Supreme Court's 2023 decision in Andy Warhol Foundation v. Goldsmith clarified that transformativeness alone does not make a commercial use fair — a significantly transformative use that nonetheless substitutes for the original work in its commercial market may not qualify as fair use.
The Andersen v. Stability AI Litigation
The Complaint
Filed in January 2023 in the Northern District of California, the class action complaint by Sarah Andersen, Kelly McKernan, and Karla Ortiz named Stability AI (developer of Stable Diffusion), Midjourney (developer of the Midjourney image generator), and DeviantArt (an art-sharing platform that had released its own AI image generator called DreamUp, based on Stable Diffusion).
The plaintiffs were professional artists who alleged that their work appeared in LAION-5B — a dataset of approximately 5.85 billion image-text pairs scraped from the internet — which Stability AI used to train Stable Diffusion. The complaint alleged:
- Direct copyright infringement: The scraping of plaintiffs' images to create training data constituted reproduction without authorization.
- Vicarious copyright infringement: Midjourney and DeviantArt were vicariously liable for Stability AI's infringement because they controlled and profited from the infringing system.
- Contributory copyright infringement: Defendants contributed to infringement by providing the means to infringe.
- DMCA violations: Stability AI stripped attribution metadata from images in the training dataset, violating the DMCA's prohibition on removing copyright management information.
- Right of publicity violations: California law protects individuals' rights to control commercial use of their name, likeness, and persona. The plaintiffs alleged that Stable Diffusion could generate images in their specific artistic style, effectively exploiting their artistic persona commercially.
The complaint also included innovative "compression" theory allegations: that AI models do not merely learn from training data but effectively store compressed versions of the training images, such that outputs are not truly original but reconstructed copies.
The District Court's Initial Rulings
Judge William Orrick of the Northern District of California issued a series of significant rulings in 2023 on motions to dismiss:
The court dismissed the copyright infringement claims against Midjourney and DeviantArt, finding that the complaint did not adequately allege that these defendants had directly copied the plaintiffs' works — they had used Stable Diffusion, but did not control the training process that was alleged to constitute infringement.
The court sustained some claims against Stability AI, including the DMCA claim regarding removal of copyright management information. The compression theory — that AI models store copies of training data — was found plausible enough to survive dismissal.
The right of publicity claims survived in part, opening the possibility that generating images "in the style of" a specific living artist could constitute commercial exploitation of their persona.
The litigation is ongoing, with discovery and evidentiary proceedings expected to resolve key factual questions: whether LAION-5B actually contains plaintiffs' specific works; whether Stable Diffusion's outputs are derived from specific training images; and what "compression" of training data actually occurs in the model.
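One of these factual questions — whether LAION-5B contains a plaintiff's specific works — is, at the metadata level, a filtering problem over billions of image-text records. The following is a minimal sketch of that kind of search, assuming hypothetical records with `url` and `caption` fields loosely modeled on LAION's published metadata; the field names, sample records, and matching heuristics are illustrative, not the actual dataset schema or any party's methodology:

```python
# Illustrative sketch: scanning LAION-style metadata records for images
# hosted on domains associated with a particular artist. Records and
# field names are hypothetical stand-ins for the real dataset schema.
from urllib.parse import urlparse

records = [
    {"url": "https://example-artist.com/work/portrait.png",
     "caption": "portrait illustration by example artist"},
    {"url": "https://stock-photos.example.net/cat.jpg",
     "caption": "a cat sitting on a windowsill"},
]

def matches_artist(record, domains, name_terms):
    """Flag a record if it is hosted on one of the artist's domains
    or its caption mentions the artist by name."""
    host = urlparse(record["url"]).netloc.lower()
    caption = record["caption"].lower()
    return (any(host == d or host.endswith("." + d) for d in domains)
            or any(term in caption for term in name_terms))

hits = [r for r in records
        if matches_artist(r, domains={"example-artist.com"},
                          name_terms={"example artist"})]
print(len(hits))  # number of candidate matches for manual review
```

Because captions and URLs in scraped datasets are noisy, any such search yields candidates rather than proof; matches still require manual verification, which is part of why these questions are expected to consume substantial discovery effort.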
The Getty Images Litigation
UK Proceedings
Getty Images filed suit against Stability AI in the UK in January 2023 under UK copyright law. The claim alleged that Stability AI had used approximately 12 million images from Getty's database without a license to train Stable Diffusion, and that the resulting system could reproduce Getty's distinctive watermark overlay in generated images — strong evidence that the system had memorized specific training images.
The Getty watermark finding is particularly striking: Stable Diffusion could generate images containing a blurry but recognizable version of the Getty Images watermark, even when the prompt made no mention of Getty. This is exactly the kind of evidence that supports the "compression" theory — that training data is memorized, not just learned from — and it is devastating to the "transformative use" defense, which depends on the training process being genuinely transformative rather than merely copying.
U.S. Proceedings
Getty filed a parallel U.S. lawsuit against Stability AI in the District of Delaware in February 2023. The U.S. case raised similar copyright claims and added a trademark claim: by generating images with versions of the Getty watermark, Stability AI was alleged to have violated Getty's trademarks. The trademark claim is potentially significant: trademark infringement does not require proof of copying, only likelihood of consumer confusion — and images with a corrupted Getty watermark could confuse consumers about the source of the images.
New York Times v. OpenAI
The Complaint
The New York Times' December 2023 lawsuit against OpenAI and Microsoft represents the highest-profile AI copyright case to date, and potentially the most consequential. The Times' complaint alleges:
- Direct copyright infringement: OpenAI trained GPT-4 on massive amounts of Times content, reproduced Times content in outputs, and enabled Microsoft to incorporate this capability into Bing and Copilot.
- Contributory copyright infringement: Microsoft is liable as a contributor to OpenAI's infringement.
- DMCA violations: OpenAI removed copyright management information from Times articles used in training.
- Unfair competition under New York law.
The complaint included striking exhibits: specific examples of ChatGPT reproducing Times articles nearly verbatim when prompted. These exhibits directly address the market substitution factor in fair use analysis: if users can get Times content through ChatGPT without paying for a Times subscription, the AI system is substituting for the Times in its market.
The Fair Use Battle
The legal core of this case is the fair use analysis. OpenAI's fair use defense rests primarily on transformativeness: the argument that training a language model on text is transformative because the model does not output copies of the training text but generates new text. The training process, OpenAI argues, is analogous to human learning — humans read, process, and internalize information from texts they encounter, and then generate new original expression. AI training does something analogous at scale.
The Times' response has several components: (1) the use is not transformative because the primary purpose — enabling AI to produce text about topics covered by the Times — directly substitutes for Times journalism; (2) the amount copied is enormous and systematic; and (3) the market effect is direct and severe, as demonstrated by the verbatim reproduction examples in the complaint's exhibits.
The outcome will hinge significantly on factual questions: how much Times content actually appears in GPT-4's training data (the training corpus has not been publicly disclosed); how often and under what conditions outputs reproduce Times content; and whether the AI system actually competes with the Times for subscribers. Expert testimony, technical analysis, and extensive discovery will be required before the legal questions can be fully adjudicated.
The Fair Use Analysis: Current State
As of 2025, courts have not yet issued a definitive ruling on whether AI training on copyrighted works is fair use. The landscape is:
Factor 1 (Purpose and character): AI training is commercial — the AI companies use it to generate revenue. The transformativeness of the training process is genuinely contested: there is a credible argument that training a model is transformative (producing a novel system, not a copy), and a credible counterargument that the primary purpose of copying is to enable commercial competition with the sources of the training data.
Factor 2 (Nature of the work): The works at issue are highly creative (original art, journalism, novels) — favoring the copyright holder.
Factor 3 (Amount): In many cases, entire works have been copied into training datasets. This factor favors copyright holders.
Factor 4 (Market effect): This is where the strongest arguments lie for copyright holders. Evidence of verbatim output reproduction (the Times exhibits), the watermark memorization (Getty), and the style imitation effects on professional artists all support significant market harm.
The overall fair use balance is contested and will likely be resolved differently in different factual contexts: the analysis for training image generators on specific artists' work may differ from the analysis for training language models on journalistic text.
Implications
For AI Development
If courts ultimately find that AI training on copyrighted works is not fair use, the implications for the AI industry are severe: existing models may be subject to massive damages and may need to be retrained on licensed or public-domain data. Future AI development would require licenses to training data — either through industry-wide licensing agreements (analogous to the music industry's blanket licensing through ASCAP and BMI) or through public-domain or synthetic data strategies.
The industry is already anticipating these scenarios. Several AI companies have entered into licensing agreements with publishers and news organizations: OpenAI announced licensing agreements with the Associated Press and several publishers. These agreements provide both an alternative source of training data and evidence, in litigation, that licensing is feasible — supporting the argument that the "market substitution" factor weighs against fair use.
For Creative Professionals
The litigation has already changed AI industry practices for creative professionals. Several AI companies have established opt-out mechanisms allowing artists to request that their work be excluded from future training datasets. The practical effectiveness of these mechanisms — which require artists to proactively identify their work and submit removal requests, rather than requiring opt-in consent — is contested by artists' advocates.
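Some of these opt-out mechanisms take the form of machine-readable directives embedded in web pages, which a training crawler may (voluntarily) honor. The sketch below shows how a crawler that respects such directives might check a page before including its images; the `noai`/`noimageai` tokens follow a convention some platforms have published, but whether any given trainer actually honors them is precisely what artists' advocates contest:

```python
# Illustrative sketch: checking an HTML page for an opt-out directive
# before including its images in a training set. The "noai" token follows
# a convention some platforms have published; honoring it is voluntary.
import re

def opted_out(html: str) -> bool:
    """Return True if the page carries a robots meta tag whose content
    includes a 'noai' or 'noimageai' token."""
    for match in re.finditer(
            r'<meta\s+name=["\']robots["\']\s+content=["\']([^"\']*)["\']',
            html, flags=re.IGNORECASE):
        tokens = {t.strip().lower() for t in match.group(1).split(",")}
        if tokens & {"noai", "noimageai"}:
            return True
    return False

page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(opted_out(page))  # True for this page
```

Note that the default in this scheme is inclusion: a page with no directive at all is treated as available. That asymmetry is the structural weakness artists' advocates point to when they argue for opt-in consent instead.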
The cases have also spurred legislative interest. The EU AI Act includes a requirement that AI companies comply with copyright law, including an opt-out mechanism for rights holders who do not want their work used in AI training. Similar provisions have been proposed in U.S. draft legislation.
Discussion Questions
- Walk through the four-factor fair use analysis as applied to OpenAI training GPT-4 on New York Times articles. Where does the analysis point most strongly for each factor, and what is your overall conclusion?
- The Getty Images watermark memorization finding — Stable Diffusion generating images with blurry Getty watermarks — seems like strong evidence against the fair use defense. Why would this specifically undercut the "transformativeness" argument?
- Some AI companies have entered into licensing agreements with publishers. Does the availability of licensing actually undercut the fair use defense (by showing that licensed alternatives exist), or does it show that the market is working without legal intervention?
- If courts find that AI training is not fair use, what would the practical alternatives be? What would it mean for AI development to retrain models only on licensed or public-domain data?
- The right of publicity claim — that generating images "in the style of" a living artist violates their persona rights — is conceptually distinct from the copyright claim. What interests does the right of publicity protect that copyright does not? Do you find it a persuasive theory in the AI context?