Case Study 1: Getty Images vs. Stability AI — The Copyright Battle That Could Shape AI's Future


Introduction

On January 17, 2023, Getty Images filed a lawsuit against Stability AI in the High Court of Justice in London, alleging that Stability AI had copied and processed millions of Getty's copyrighted images — without permission or compensation — to train its Stable Diffusion image generation model. A parallel action was filed in the US District Court for the District of Delaware.

The lawsuit was not a surprise. For months, the AI industry had braced for a legal reckoning over training data rights. But the specifics of the Getty case made it uniquely consequential. Getty Images is one of the world's largest visual content libraries, representing over 477 million images from more than 500,000 contributing photographers and artists. Stability AI's Stable Diffusion is one of the most widely used open-source image generation models, with millions of individual users and thousands of commercial applications built on top of it.

The case sits at the intersection of copyright law, AI technology, and the economics of creative work. Its outcome — along with parallel cases involving text, music, and code — will establish foundational precedents for how generative AI models are built, who profits from them, and what obligations AI companies owe to the creators whose work fueled their training.

For business leaders, the case is not merely a legal curiosity. It has direct implications for any organization that uses AI-generated images, licenses visual content, or operates in creative industries. The frameworks for managing IP risk that emerge from this case will shape generative AI strategy for a generation.


The Parties

Getty Images

Founded in 1995 by Mark Getty and Jonathan Klein, Getty Images became the dominant player in the commercial stock photography market through a series of acquisitions — including PhotoDisc, Tony Stone Images, and iStockphoto. The company's business model is straightforward: photographers and visual artists contribute images to Getty's library, Getty licenses those images to businesses, media organizations, advertisers, and publishers, and revenues are shared between Getty and the contributing creators.

By 2023, Getty represented approximately 477 million creative and editorial images. Its client base included most major advertising agencies, media companies, and Fortune 500 corporations. The company's annual revenue was approximately $900 million. Getty's value proposition rested on two pillars: the quality and breadth of its library, and the legal certainty of its licensing — when you license a Getty image, you receive a clear intellectual property license with known terms and indemnification.

The rise of AI image generation posed an existential threat to this business model. If businesses could generate photorealistic images on demand for pennies, the commercial rationale for licensing stock photography would erode rapidly. But if the AI models that enabled that generation were themselves built on Getty's copyrighted content, the threat became a legal claim.

Stability AI

Stability AI, founded by Emad Mostaque in 2019, developed and released Stable Diffusion — an open-source text-to-image diffusion model — in August 2022. The model quickly became one of the most popular AI image generation tools, distinguished by its open-source availability (anyone could download and run it) and the thriving ecosystem of applications, fine-tuned models, and plugins built around it.

Stability AI raised over $100 million in venture capital and was valued at approximately $1 billion by late 2022. But the company's rapid growth was accompanied by questions about the provenance of its training data. Stable Diffusion was trained on subsets of LAION-5B, a publicly available dataset containing approximately 5.85 billion image-text pairs scraped from the internet. Researchers quickly demonstrated that LAION-5B included millions of copyrighted images from Getty, Shutterstock, and other commercial libraries — including images that still bore visible watermarks.


The Allegations

Getty's complaint centered on several key claims:

Copyright Infringement

Getty alleged that Stability AI "copied and processed millions of images" owned by Getty or its contributors to train Stable Diffusion. The complaint identified over 12 million Getty images in the LAION training dataset. Getty argued that copying these images into the training dataset — regardless of what the model subsequently did with them — constituted reproduction of copyrighted works without authorization.

Evidence of Copying: The Watermark Problem

Perhaps the most striking evidence in the case was that Stable Diffusion could sometimes generate images containing artifacts of the Getty Images watermark — the distinctive overlaid text reading "gettyimages" that Getty places on unlicensed preview images. If the model had been trained only on properly licensed or public domain images, it would have no reason to have learned the Getty watermark pattern. The watermark's appearance in generated outputs was, Getty argued, direct evidence that copyrighted, watermarked preview images were included in the training data.

Business Insight: The watermark evidence illustrates a broader principle: AI models encode patterns from their training data, and those patterns can surface in unexpected ways. For businesses using AI-generated content, this creates a latent risk — the model may have learned from copyrighted source material, and that origin may become visible in outputs. This is one reason why Lena Park (in the chapter) advises businesses to understand what training data their AI tools were built on.

Trademark Infringement

Beyond copyright, Getty alleged trademark infringement. The Getty Images watermark and logo are registered trademarks. By training on watermarked images and generating outputs that sometimes reproduced the watermark, Stability AI was allegedly using Getty's trademarks without authorization in a way that could confuse consumers about the origin or licensing status of the generated images.

Unjust Enrichment

Getty argued that Stability AI profited from Getty's investment in curating, organizing, and licensing visual content without compensating Getty or its contributing creators. In effect, Getty claimed that Stability AI had built a competing product — one that undermined the market for licensed photography — using Getty's own content as raw material.


The Defense

Stability AI's defense rested on several arguments, though the company's legal strategy evolved as the case progressed:

Fair Use (US) and Fair Dealing (UK)

In the US proceeding, Stability AI argued that training a model on copyrighted images constituted fair use — a doctrine that allows limited use of copyrighted material without permission for purposes such as commentary, criticism, research, and transformative use. The key argument was that training an AI model is "transformative" — the model does not reproduce the training images but rather learns abstract patterns and relationships that it uses to generate entirely new images.

The fair use defense in US law is evaluated on four factors: (1) the purpose and character of the use (commercial vs. educational, transformative vs. reproductive), (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used, and (4) the effect on the market for the original work. Stability AI's case was strongest on factor 1 (transformative use) and weakest on factor 4 (market effect), since AI-generated images directly compete with licensed stock photography.

In the UK proceeding, the equivalent doctrine — "fair dealing" — is more restrictive than US fair use, and Stability AI's defense was correspondingly more difficult.

The Images Are Not "Stored"

Stability AI argued that the trained model does not contain or store the training images. The model's parameters encode statistical patterns learned from the training data, not copies of individual images. Therefore, the argument went, the training process is analogous to a human artist studying thousands of photographs to learn composition, lighting, and style — and then creating original work informed by that learning.

Getty countered that the analogy was inapposite: a human artist processes images through biological cognition and creates genuinely original interpretations, while an AI model processes images through mathematical optimization and produces outputs that are statistical interpolations of the training data.

The LAION Dataset Defense

Stability AI argued that it did not directly scrape Getty's images. LAION — a nonprofit research organization based in Germany — created the LAION-5B dataset by crawling publicly accessible web pages and collecting image-URL pairs. Stability AI used this publicly available dataset for training. Whether the images in LAION were copyrighted was, Stability AI argued, LAION's responsibility, not Stability AI's.

This argument faced challenges on both legal and practical grounds. Legally, using copyrighted material does not become lawful simply because a third party assembled the collection. Practically, Stability AI had the technical capability to filter copyrighted images from the training dataset (for example, through image fingerprinting and watermark detection) but chose not to exercise it.
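To make the filtering point concrete, here is a minimal, pure-Python sketch of the kind of perceptual "average hash" fingerprint such filtering relies on. This is an illustrative assumption, not a description of Stability AI's or LAION's actual tooling: production pipelines use dedicated libraries (such as imagehash or Meta's PDQ) and operate on real image files, whereas this toy version takes an 8x8 grid of brightness values.

```python
def average_hash(gray_pixels):
    """Compute a 64-bit average hash from an 8x8 grayscale image.

    gray_pixels: 8x8 list of lists of brightness values (0-255).
    Each bit is 1 if the corresponding pixel is brighter than the
    image's mean brightness, so visually similar images produce
    hashes that differ in only a few bits.
    """
    flat = [p for row in gray_pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits between two fingerprints."""
    return bin(h1 ^ h2).count("1")

# Toy example: an image whose top half is dark and bottom half is bright.
img = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
h = average_hash(img)
# The 32 bottom-half pixels are above the mean, so the low 32 bits are set.
```

A dataset builder could compute such a fingerprint for every crawled image and drop any whose hash falls within a small Hamming distance of a library of known copyrighted works — which is why the "we couldn't have known" framing was practically weak.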


Broader Implications for Business

For Companies Using AI-Generated Images

The Getty case established that using AI-generated images carries IP risk that cannot be entirely eliminated by the end user. Even if you did not train the model and did not select the training data, the content you generate may have originated from copyrighted source material, and the legal framework for your liability remains unsettled.

Practical guidance:

  • Prefer AI tools with known, licensed training data. Adobe Firefly, for example, is trained on Adobe Stock, openly licensed content, and public domain material — and Adobe provides indemnification to enterprise customers. This does not eliminate all risk, but it substantially reduces it.
  • Maintain records of your generation process. Document the prompts used, the tools employed, and the human modifications applied. This documentation strengthens fair use arguments and demonstrates due diligence.
  • Implement similarity screening. Before publishing AI-generated images, compare them against databases of copyrighted images to identify potential matches. Several commercial services now offer this capability.

For Content Creators and Licensors

The case validated the claim that content creators have legal standing to challenge AI companies that train on their work without permission. Whether those challenges ultimately succeed on the merits remains to be seen, but the legal pathway exists and is being actively pursued.

For businesses that create and license original content — publishers, media companies, photography agencies, music labels — the strategic question is whether to resist AI training (through litigation and opt-out mechanisms), negotiate compensation (through licensing agreements with AI companies), or participate directly (by launching their own AI-powered content generation tools).

By 2025, several major content companies had pursued all three strategies simultaneously: litigating against unauthorized use, negotiating licensing deals with some AI companies, and building their own AI capabilities. Getty itself launched a generative AI service in partnership with NVIDIA, trained exclusively on Getty's own licensed content — effectively competing with the tools that had trained on its content without permission.

For AI Companies

The case signaled that the era of "train first, ask permission never" was ending. By 2025, most major AI companies had begun securing licensing agreements for training data, offering indemnification to enterprise customers, and implementing mechanisms for content creators to opt out of training datasets. These changes increased the cost of building AI models but reduced legal risk and established more sustainable business relationships with content creators.

Research Note: The economic analysis firm Oxford Economics estimated in 2024 that comprehensive licensing of training data could increase the cost of training a frontier image generation model by $50-200 million — a significant but manageable expense for well-funded AI companies, and a potentially meaningful revenue stream for content creators. However, the estimate was highly uncertain, as the "fair market value" of training data has no established precedent.


The Status as of Early 2026

As of early 2026, the Getty v. Stability AI case remains in active litigation in both the UK and US. Several significant procedural developments have occurred:

  • In the UK, the High Court allowed Getty's claims to proceed, rejecting Stability AI's attempt to dismiss the case on jurisdictional grounds.
  • In the US, the court denied Stability AI's motion to dismiss, ruling that Getty had plausibly alleged copyright infringement and that the fair use defense would need to be resolved at trial.
  • Settlement discussions have been reported but not confirmed. The AI industry is watching closely, as a trial ruling would establish binding precedent (in the relevant jurisdiction) on the legality of training AI models on copyrighted content.

Parallel developments in other cases — particularly the New York Times v. OpenAI (text), Authors Guild v. OpenAI (books), and record label suits against Suno and Udio (music) — will collectively establish the legal framework for generative AI training data rights.


Discussion Questions

  1. The Fair Use Debate. Is training an AI model on copyrighted images more analogous to a human artist studying photographs for inspiration (arguably fair use) or to a company photocopying an entire library to build a competing product (likely not fair use)? What factors should determine where on this spectrum AI training falls?

  2. The Market Effect. Getty argues that AI-generated images directly compete with licensed stock photography, harming the market for the original copyrighted works. Stability AI might argue that AI generation expands the overall market for visual content by enabling use cases that previously couldn't afford professional photography. How should courts weigh these competing claims about market effect?

  3. The Content Creator's Dilemma. If you were a professional photographer whose images were included in the LAION training dataset without your permission, what would you want the legal outcome to be? Consider: (a) a blanket prohibition on using copyrighted images for training, (b) a compulsory licensing system (training is allowed but compensation is required), (c) an opt-out system (training is allowed by default, but creators can opt out), or (d) no restrictions (training is fair use). What are the practical implications of each option for creators, AI companies, and businesses that use AI-generated content?

  4. Indemnification and Risk Transfer. Adobe offers indemnification for commercial use of images generated by Firefly. Stability AI (as an open-source project) does not. How should this difference factor into a business's choice of AI image generation tools? Is indemnification sufficient to manage IP risk, or are additional safeguards needed?

  5. The Precedent Question. If Getty wins, what precedent does that set for other forms of generative AI — text generation trained on books and articles, code generation trained on open-source repositories, music generation trained on copyrighted recordings? If Stability AI wins, what precedent does that set for the rights of content creators? How should business leaders plan for both outcomes?


This case study relates to Chapter 18's discussion of intellectual property issues in generative AI. For a deeper exploration of the regulatory landscape, see Chapter 28: AI Regulation — Global Landscape.