Case Study 38-2: Clearview AI — When the Entire Internet Becomes a Biometric Database

Background

In January 2020, The New York Times published an investigation revealing the existence of Clearview AI — a facial recognition company that had, without public announcement or any regulatory process, scraped more than three billion photographs from publicly accessible internet sources including Facebook, Instagram, LinkedIn, Venmo, and millions of other websites. These photographs, associated with names and other personal information, formed the searchable database behind a facial recognition system that Clearview was selling to law enforcement agencies across the United States.

The basic capability that Clearview AI offered was striking: upload a photograph of an unknown person's face, and the system would search its database of billions of scraped images to identify them — producing links to their social media profiles, news appearances, and other online presence. For law enforcement agencies that had previously relied on DMV photo databases or mugshot databases for facial recognition, Clearview AI represented a qualitative leap: instead of searching a database of people who had had some prior government contact, Clearview searched the near-entirety of the documented public internet.
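Clearview has not published its architecture, but systems with this capability typically work by mapping each face to a fixed-length embedding vector and ranking database entries by similarity to a probe image. The following is a minimal illustrative sketch of that general technique, not Clearview's implementation; the 128-dimensional embeddings, the URLs, and the similarity threshold are all hypothetical.

```python
import numpy as np

def normalize(v):
    # Scale embeddings to unit length so a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical database: N face embeddings (in practice produced by a
# neural network from scraped photos) paired with their source URLs.
rng = np.random.default_rng(0)
db_embeddings = normalize(rng.standard_normal((1000, 128)))
db_urls = [f"https://example.com/photo/{i}" for i in range(1000)]

def identify(probe, k=3, threshold=0.5):
    """Return up to k database matches whose cosine similarity to the
    probe embedding exceeds the decision threshold."""
    probe = normalize(probe)
    sims = db_embeddings @ probe           # cosine similarity to every entry
    top = np.argsort(sims)[::-1][:k]       # indices of the k best matches
    return [(db_urls[i], float(sims[i])) for i in top if sims[i] >= threshold]

# A probe derived from database entry 42 (plus noise simulating a different
# photo of the same person) should retrieve that entry first.
probe = db_embeddings[42] + 0.05 * rng.standard_normal(128)
matches = identify(probe)
```

The sketch also makes the policy point concrete: once embeddings are computed, each additional query is a cheap vector search, which is why the marginal cost of identifying one more face in a crowd is effectively zero.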

The Architecture of Clearview

Clearview AI's business model depends on a legal theory that has been contested in courts across multiple jurisdictions: that photographs published publicly on the internet are available for any use, including commercial use as AI training data and facial recognition database construction.

Platform terms of service universally prohibit mass scraping of the kind Clearview conducted. Facebook, Google, Twitter, YouTube, and virtually every other platform from which Clearview scraped images sent cease-and-desist letters; Clearview challenged these letters, arguing that the data was publicly accessible and that platforms could not restrict how publicly accessible data was used.

This legal challenge has significant implications beyond Clearview's specific business. The argument that public accessibility equals permission to use would, if sustained, eliminate contextual integrity as a meaningful principle in privacy law: a photograph posted to share with friends and family would be permanently available for any use by any party that could access it, forever, regardless of the context in which it was posted. Nissenbaum's contextual integrity framework — which holds that information flows appropriately when they match the norms of the context in which information was shared — would be legally unenforceable for any information that passes through a publicly accessible system.

Law Enforcement Adoption

Before the Times investigation revealed Clearview's existence, more than 600 law enforcement agencies had already adopted the technology — from the FBI, ICE, and the TSA to police departments ranging from large urban forces to small-town sheriff's offices — along with private customers such as Walmart's asset protection team. The adoption had occurred without public disclosure, without statutory authorization in most jurisdictions, and without any evaluation of the system's accuracy or bias properties.

This pattern of adoption — prior to public knowledge, prior to regulatory framework, prior to accuracy evaluation — is precisely the pattern that has characterized the deployment of surveillance technologies throughout history. The technology is adopted by early-mover agencies; by the time public attention focuses on it, it is already embedded in law enforcement practice and generates institutional resistance to restriction.

Several wrongful identification cases involving Clearview AI have been documented. The pattern of error mirrors that documented for other facial recognition systems: disproportionately high false positive rates for darker-skinned faces. But because Clearview's database is drawn from internet photographs rather than government records, it introduces additional failure modes: photographs of family members who share physical features with the person searched; photographs whose accompanying names or captions misidentify the person depicted; and photographs taken under conditions (lighting, angle, image quality) that degrade recognition accuracy.

The BIPA Litigation

The most significant legal challenge to Clearview AI came not from federal regulators but from a state biometric privacy law: the Illinois Biometric Information Privacy Act (BIPA). BIPA, enacted in 2008, prohibits the collection, use, or dissemination of biometric identifiers (including scans of "face geometry" derived from photographs) without the informed written consent of the person whose biometrics are collected. Uniquely among U.S. privacy laws, BIPA has a private right of action: individuals can sue for statutory damages ($1,000 per negligent violation, $5,000 per intentional or reckless violation) without proving actual harm.

A class action lawsuit brought by Illinois residents alleged that Clearview AI's scraping of their photographs from the internet constituted illegal collection of biometric data without consent. In 2022, Clearview settled the lawsuit, agreeing to a permanent nationwide ban on selling its database to most private companies and individuals, a five-year moratorium on sales to any Illinois entity (including state and local law enforcement), and other use restrictions; sales to law enforcement outside Illinois could continue.

The BIPA litigation illustrates several things that are directly relevant to Chapter 38's analysis: that comprehensive privacy regulation with private enforcement rights can create meaningful accountability for surveillance companies; that state-level regulation can be more effective than federal regulation where federal standards are absent; and that legal challenge can, over time, reshape the privacy landscape for surveillance technologies.

The International Response

In jurisdictions with stronger data protection frameworks, the response to Clearview AI was more categorical than in the United States.

Data protection authorities in the United Kingdom, Australia, Canada, France, Italy, and Greece have each found that Clearview AI's data collection practices violated applicable privacy law — finding that the scraping of photographs without consent constituted unlawful collection of biometric data, that the "public accessibility" argument was insufficient to override data subjects' privacy rights under local law, and that the lack of any legitimate purpose proportionate to the privacy intrusion was disqualifying. Several of these regulators ordered Clearview to delete data relating to their jurisdictions' residents and imposed fines.

The divergence between U.S. and European regulatory responses illustrates a fundamental difference in privacy law architecture: in the EU, GDPR establishes a default of data protection that requires companies to justify data collection; in the United States, data collection is generally permitted unless specifically prohibited. Clearview AI is a case study in what the absence of comprehensive federal privacy legislation enables.

What Clearview AI Reveals About the Future

The Clearview AI case illuminates the surveillance future in several ways that connect directly to Chapter 38's analysis:

1. The retrospective identification problem. Clearview enables law enforcement to take a photograph of someone at a protest, a political meeting, or any public gathering, and identify them. This capability did not previously exist at scale. Its existence changes the chilling effect calculus for public political activity: participation in a public protest is no longer anonymous in the way it was when cameras could capture faces but could not, without enormous manual effort, attach those faces to identities.

2. The permanent record problem. Clearview's database includes photographs from years or decades in the past. Photographs from a person's college years, photographs posted by others without the subject's knowledge, and photographs from moments when the person was not thinking about their digital footprint are all potentially in the database. The past becomes permanently searchable.

3. The consent impossibility. Clearview collected biometric data — legally distinct from ordinary photographs in most biometric privacy frameworks — from billions of people who have no idea their data exists in Clearview's database, no mechanism to access or correct it, and no effective means of having it deleted. In the United States, outside of Illinois and a few other jurisdictions with biometric privacy protections, there is no legal right to know your data is in the database, let alone to have it removed.

4. The public/private boundary erosion. The historical distinction between public surveillance (cameras in public spaces operated by law enforcement) and private life (documented in photographs shared in social contexts) has been effectively eliminated. Social media photographs — posted to maintain relationships, share experiences, perform identity — become inputs to a biometric surveillance system. The context of sharing (personal connection, social performance) is entirely severed from the use (identity verification for law enforcement).

Discussion Questions

  1. Clearview AI argues that photographs published publicly on the internet are available for any use, including commercial data collection. Evaluate this argument using the contextual integrity framework. Does public accessibility imply permission?

  2. The wrongful identification cases associated with Clearview AI follow the same racial bias pattern as other facial recognition systems. What is the relationship between Clearview's database construction methodology (scraping the internet) and its racial bias profile?

  3. The retrospective identification capability enabled by Clearview AI — identifying people at protests or political events from archived photographs — raises First Amendment concerns about chilling effects on political association. How should courts balance this concern against law enforcement interests in identification?

  4. The international regulatory response to Clearview AI was substantially stronger than the U.S. response. What are the implications of this divergence for U.S. residents whose data is in Clearview's database? What legal mechanisms, if any, are available to them?

  5. If you were advising the city government of a major U.S. city about whether to permit local police to use Clearview AI, what would you recommend and why? What conditions, if any, would you attach to permitted use?