Case Study 35.1: The Schwartz Case — Legal Hallucination and Professional Liability

Mata v. Avianca and the Fabricated Precedents


Overview

In the spring of 2023, a personal injury lawsuit in a federal district court in New York became one of the most widely discussed AI failures in the technology's brief public history. The case — Mata v. Avianca — did not involve cutting-edge AI, a major corporation's AI governance failure, or a sophisticated adversarial attack. It involved an experienced attorney, a chatbot, and a series of errors that compounded into a landmark episode for every profession that relies on accurate information and professional judgment.

The episode is often summarized as "a lawyer cited fake cases generated by ChatGPT." That summary is accurate but insufficient. The case is more instructive than the summary suggests, because it illuminates the specific cognitive and procedural failures that occur when professionals adopt AI tools without understanding their limitations, without verification habits appropriate to those limitations, and without the institutional frameworks that professional responsibility requires. Those failures are not unique to Steven Schwartz. They are the predictable result of deploying a technology that produces confident-sounding outputs in professional contexts where confidence is often treated as evidence of accuracy.


Background: The Accident and the Lawsuit

Roberto Mata, a citizen of El Salvador, was a passenger on an Avianca flight in August 2019. During the flight, he alleged, a metal serving cart struck his knee, causing injury. He subsequently brought a personal injury lawsuit against Avianca in the United States District Court for the Southern District of New York. Mata was represented by the law firm Levidow, Levidow & Oberman, a New York personal injury practice.

The case faced a potential procedural obstacle: Avianca argued that the lawsuit was untimely under the two-year limitation period established by the Montreal Convention, the international treaty governing airline liability for passenger injury and death on international flights. Mata's counsel needed to demonstrate that the case could proceed despite this argument, which required legal research to identify supporting precedents for the court's consideration.

It was at this juncture that attorney Steven Schwartz made a decision that would define the remainder of the case.


The Research Decision

Steven Schwartz had been practicing law for more than thirty years. He was not a junior associate unfamiliar with legal research standards. He was an experienced practitioner who, like many lawyers, faced the perpetual pressure of legal work: time constraints, client expectations, and the need to produce thorough research efficiently.

Schwartz decided to use ChatGPT, OpenAI's chatbot built on a large language model, to assist with the research. He later stated in an affidavit that he had never previously used ChatGPT for legal research and had "no reason to believe" that the chatbot would generate false information. He described asking ChatGPT for case precedents supporting his arguments, and the chatbot producing a list of cases with citations.

The cases ChatGPT provided were:

  • Varghese v. China Southern Airlines Co., Ltd.
  • Shaboon v. Egyptair
  • Petersen v. Iran Air
  • Martinez v. Delta Airlines, Inc.
  • Estate of Durden v. KLM Royal Dutch Airlines, NV
  • Zicherman v. Korean Air Lines Co., Ltd.

Some of these names correspond to actual cases that exist in legal databases (Zicherman v. Korean Air Lines is a real Supreme Court case), but the specific citations, holdings, and details that ChatGPT provided were fabricated. The real Zicherman case addressed issues different from the holding ChatGPT attributed to it. The other cases either did not exist at all or did not say what ChatGPT claimed they said.

Schwartz incorporated these citations into a brief filed with the court on behalf of Mata.


Discovery of the Fabrications

When Avianca's counsel could not locate the cited cases in any legal database — Westlaw, LexisNexis, or the court's own records — they filed a letter with the court informing it that the cases did not appear to exist. The court issued an order requiring Mata's counsel to provide copies of the cited cases.

What followed was a sequence of compounding errors that the court would later describe in unflattering terms.

Schwartz, apparently still not understanding what had happened, asked ChatGPT whether the cases were real. ChatGPT confirmed that they were. When Schwartz asked for additional information about the cases, ChatGPT provided detailed descriptions of holdings, parties, and procedural histories, all of them fabricated. Reassured by the chatbot's confirmation, Schwartz submitted an affidavit to the court stating that the cases were real and that he had "confirmed" their existence.

The court was not satisfied. It demanded that counsel produce copies of the cases themselves. Schwartz's supervising partner, Peter LoDuca, filed a brief that was, the court later found, itself misleading. LoDuca stated that Schwartz had "encountered" the cases through legal databases, which was not accurate.

When counsel finally disclosed the truth — that the cases had been produced by ChatGPT — the court's response was severe.


The Court's Response

Judge P. Kevin Castel presided over the resulting proceedings. In a June 2023 opinion and order, Judge Castel described what had occurred with evident disapproval. He noted that the fabricated citations had been submitted in a brief filed with the court, that counsel had been given multiple opportunities to correct the record and had instead compounded the error, and that the initial response to the court's inquiries had not been candid.

Judge Castel found that Schwartz and LoDuca had violated Rule 11 of the Federal Rules of Civil Procedure, which requires attorneys to certify that the legal contentions in filed documents are warranted by existing law and that factual contentions have evidentiary support. He found that they had made "false and misleading statements to the court," both in the original brief and in subsequent filings attempting to explain the situation.

The sanction imposed was $5,000, jointly and severally, to be paid to the court. Schwartz and LoDuca were also ordered to send copies of the opinion to each judge whose name had appeared in the fabricated citations — a specific sanction designed to address the reputational harm to judges falsely credited with opinions they had never written.

The court also referred the matter to the New York State Appellate Division Disciplinary Committee, which investigates attorney misconduct. The precise outcome of that investigation had not been publicly reported as of mid-2024.


The Bar Association Response

The Schwartz case catalyzed responses across the legal profession. The American Bar Association later issued Formal Opinion 512 (July 2024), which addressed lawyers' use of generative AI and emphasized that Model Rule 1.1's competence requirement applies to AI-assisted legal work. The opinion stated that lawyers who use AI tools have a duty to understand the technology's limitations and to verify AI-generated outputs against authoritative sources before relying on them.

State bar associations moved more quickly following the Schwartz case. The New York State Bar Association issued guidance emphasizing that lawyers retain full professional responsibility for AI-assisted work product. The Florida Bar issued an ethics opinion emphasizing that lawyers must review AI-generated work product, including citations, before relying on it. California, Texas, and other states issued similar guidance.

Courts began adopting local rules and standing orders addressing AI use. Judge Brantley Starr of the Northern District of Texas issued an early standing order requiring attorneys to certify either that no generative AI was used in a filing or that any AI-generated material had been checked by a human against authoritative sources. The Fifth Circuit proposed, and ultimately declined to adopt, a similar certification requirement. Dozens of federal district judges adopted orders requiring disclosure when AI was used to assist in drafting court filings, and in some cases requiring specific certifications of human verification.

These responses had a constructive effect: they established that professional responsibility frameworks apply to AI-assisted work and that verification is mandatory. But they also illustrated how the legal profession — like many other professions — had deployed AI tools before developing the norms and training necessary to use them responsibly.


What the Schwartz Case Reveals

The case is pedagogically rich because it reveals several failure modes that are generalizable across professional contexts.

The confidence problem: Schwartz initially stated that he had "no reason to believe" ChatGPT would produce false information. This is a fundamental misunderstanding of how large language models work. LLMs do not retrieve facts from verified databases; they generate statistically plausible text. A user who does not understand this distinction may reasonably but incorrectly treat confident-sounding AI output as verified information. Training professionals in the basic architecture of AI tools — what they can and cannot do — is a prerequisite for responsible use.
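The distinction between retrieval and generation can be made concrete with a toy sketch. The following Markov-chain generator is an illustrative stand-in for an LLM, trained on a few invented citation-like strings (none of the party names or structure here come from any real database). It learns only which words tend to follow which, so it can emit a citation-shaped string that exists nowhere, because nothing in the mechanism ever consults a source of record.

```python
import random

# Toy illustration, NOT a real LLM: a word-level Markov chain trained on a
# handful of invented citation-like strings. It has no database of cases;
# it only learns which words tend to follow which.
training = [
    "Varghese v. China Southern Airlines Co.",
    "Martinez v. Delta Airlines Inc.",
    "Petersen v. Iran Air Corp.",
]

# Build next-word statistics from the training strings.
follows = {}
for line in training:
    words = line.split()
    for a, b in zip(words, words[1:]):
        follows.setdefault(a, []).append(b)

def generate(start="Varghese", max_words=8, seed=0):
    """Emit statistically plausible text one word at a time."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words and out[-1] in follows:
        out.append(rng.choice(follows[out[-1]]))
    return " ".join(out)

# Each word transition is locally plausible, but the whole string may be a
# blend of the training examples that matches no real "case" at all.
print(generate())
```

Every transition the generator makes is statistically well-supported, which is exactly why the output looks confident and well-formed. Scaled up by many orders of magnitude, the same dynamic produces fluent, authoritative-sounding legal citations with no guarantee that any of them exist.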

The self-confirmation problem: When Schwartz asked ChatGPT whether the cited cases were real, ChatGPT confirmed they were and provided additional details. This is a critical feature of the Schwartz episode that is often overlooked: the AI system did not correct its own error when asked. It elaborated on the fabrication. This reflects a known limitation of current LLMs: they have no reliable mechanism for detecting and retracting their own hallucinations. Users who seek to verify AI outputs by asking the AI system itself to verify them are not verifying anything — they are generating additional AI output.

The supervision problem: The involvement of LoDuca, the supervising partner, illustrates the organizational dimension of the failure. LoDuca did not ask Schwartz how the research had been conducted. He did not independently verify the citations. He signed filings that contained inaccurate statements about the source of the research. Professional supervision should include inquiry into research methodology, particularly when AI tools are used.

The verification gap: The straightforward way to verify a legal citation is to look it up in a legal database. Westlaw and LexisNexis both contain comprehensive databases of published court opinions; a case that cannot be found in these databases almost certainly does not exist. The verification step — checking AI-generated citations against authoritative sources — is not technically demanding. It failed here because neither Schwartz nor his colleagues appear to have performed it, apparently accepting ChatGPT's output as verification in itself.
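The required discipline amounts to a simple gate, sketched below. The `lookup` parameter is a hypothetical placeholder for a query against a real authoritative source (a Westlaw or LexisNexis search, for instance; no real API is assumed here). The essential design point is that verification must call something external to the AI system, never the AI system itself.

```python
def verify_citations(citations, lookup):
    """Partition citations by whether an authoritative lookup confirms them.

    `lookup` is any callable that queries a source of record and returns
    True only on a confirmed match. It must be independent of the system
    that generated the citations.
    """
    verified, unverified = [], []
    for cite in citations:
        (verified if lookup(cite) else unverified).append(cite)
    return verified, unverified

# Demo with a stand-in "database" containing one known-real case.
known_cases = {"Zicherman v. Korean Air Lines Co., Ltd."}
ok, flagged = verify_citations(
    [
        "Zicherman v. Korean Air Lines Co., Ltd.",
        "Varghese v. China Southern Airlines Co., Ltd.",
    ],
    lookup=lambda cite: cite in known_cases,
)
print(flagged)  # the unconfirmed citation is flagged for human review
```

Nothing in this gate is technically demanding; the failure in the Schwartz case was procedural, not technological. A firm policy that no AI-suggested citation reaches a filing without passing such a check would have stopped the error at the first step.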

The time pressure dynamic: The case occurred in the context of real practice pressures. Schwartz was facing deadlines. AI tools are appealing precisely because they accelerate research. But time pressure is an environment in which verification steps are most likely to be omitted — which means governance frameworks for AI use in professional settings must make verification mandatory and non-negotiable, not discretionary.


Liability Implications

The court in the Schwartz case stopped short of imposing significant monetary sanctions; $5,000 is modest by legal standards. But the liability landscape for AI-assisted professional errors is developing in ways that may impose far greater costs.

At the professional liability level, malpractice claims arising from AI-assisted errors are a credible risk. If a client suffers harm because an attorney cited fabricated cases, failed to provide accurate legal advice because the AI provided incorrect information, or missed a deadline because an AI-generated calendar was wrong, the attorney may face malpractice liability. Professional liability insurance policies are beginning to grapple with AI-related exclusions and requirements.

At the organizational level, law firms and other professional services firms that fail to implement adequate governance frameworks for AI use may face systemic liability exposure. If a firm allows attorneys to use AI tools without training, verification requirements, or supervision, and a client suffers harm as a result, the firm's systemic failure to govern AI use may be relevant to liability determinations.

At the technology level, questions about OpenAI's liability for harms caused by ChatGPT's hallucinations remain largely unresolved. OpenAI's terms of service contain strong disclaimers of liability and require users to verify AI outputs. Courts have not yet definitively resolved whether these disclaimers are enforceable in all contexts or whether AI developers bear any duty of care for foreseeable professional misuse.


The Broader Professional Reckoning

The Schwartz case was not the last episode of professional AI misuse in the legal context. Subsequent cases documented additional instances of AI-generated fabrications in court filings in various jurisdictions. A Canadian court found fabricated citations submitted by an attorney. Australian courts documented similar episodes. The pattern suggests that the Schwartz case was not an isolated error but an early manifestation of a systematic risk created by the deployment of generative AI tools in professional contexts.

The legal profession's response has been more substantive than in some other professions, in part because the consequences of error in legal filings are immediate and visible — opposing counsel and courts check citations. Other professions — medicine, finance, journalism, engineering — face similar risks from professional reliance on AI-generated information, but with verification challenges that are more complex and consequences that may take longer to surface.

The deeper lesson of the Schwartz case is not about legal ethics specifically. It is about the general risk of deploying AI tools in professional contexts without developing the competence, habits, and institutional frameworks necessary to use them responsibly. The technology will continue to improve; hallucination rates may decline; retrieval-augmented approaches may improve accuracy. But as long as AI systems can produce confident-sounding incorrect outputs, and as long as professional contexts create pressure to trust authoritative-seeming information, the risks illustrated by Schwartz will remain.


Discussion Questions

  1. Schwartz was an experienced attorney with more than thirty years of practice. Does his experience make his error more or less understandable? What does this suggest about the audience for AI literacy training in professional contexts?

  2. The court found that Schwartz had "confirmed" the cases' existence based on ChatGPT's self-confirmation. Should this be treated as a mitigation or an aggravation of the professional responsibility violation? Why?

  3. What specific governance procedures should law firms implement to prevent recurrence of this type of error? Who should be responsible for enforcing compliance with those procedures?

  4. If a client had suffered measurable harm — for instance, if the case had been dismissed because of the fabricated citations — what liability exposure might Schwartz, LoDuca, and their firm face? What legal theories might apply?

  5. How should law school curricula address AI literacy? What should new attorneys know about generative AI before they enter practice?

  6. The court's $5,000 sanction has been described as modest relative to the severity of the conduct. Do you agree? What factors should courts consider when sanctioning AI-related professional misconduct?