In This Chapter
- Opening Hook
- Learning Objectives
- Section 18.1: The Accountability Gap in AI
- Section 18.2: The Many Hands Problem
- Section 18.3: Types of AI Failure Modes
- Section 18.4: Developer Responsibility
- Section 18.5: Deployer Responsibility
- Section 18.6: Operator and User Responsibility
- Section 18.7: Platform and Infrastructure Responsibility
- Section 18.8: Regulatory and Government Responsibility
- Section 18.9: Structural Accountability — Building Systems That Work
- Section 18.10: The Uber/Herzberg Case as Accountability Analysis
- Discussion Questions
Chapter 18: Who Is Responsible When AI Fails?
Opening Hook
On the night of March 18, 2018, Elaine Herzberg was pushing her bicycle across a four-lane road in Tempe, Arizona, when an Uber self-driving vehicle struck and killed her. She became the first pedestrian killed by an autonomous vehicle in a public space. What followed was not a clean story of criminal accountability. It was a sprawling investigation that eventually implicated almost everyone — and convicted almost no one.
The National Transportation Safety Board (NTSB) found that the Uber vehicle's perception system had detected Herzberg six seconds before impact but had misclassified her multiple times — first as an unknown object, then as a vehicle, then as a bicycle — before finally "deciding" she was not going to be in the vehicle's path. The software had been configured to suppress false-positive emergency braking. The safety driver, Rafaela Vasquez, was streaming an episode of the television show "The Voice" on Hulu on her phone. Uber's internal safety protocols were inadequate. Arizona had permitted testing on public roads without any framework for evaluating the vehicles' actual safety, and the state's governor had aggressively recruited autonomous vehicle companies with minimal regulatory oversight, viewing regulation as an obstacle to economic development.
Who was responsible? Uber, the operator of the vehicle? Vasquez, whose attention was elsewhere? The software engineers who disabled the emergency braking to reduce passenger discomfort from "false positives"? The managers who approved deployment with known system limitations? Arizona regulators who permitted testing on public roads without adequate safety evaluation? The federal government, which had no mandatory pre-deployment safety standards for autonomous vehicles?
The NTSB found failures at every level. Uber settled civil claims for an undisclosed sum. Vasquez was charged with negligent homicide and eventually pleaded guilty to a reduced endangerment charge, receiving probation. No Uber executives were prosecuted. No regulatory reform occurred for several years. Uber's self-driving unit was eventually sold.
This chapter is about why accountability is so difficult to assign when AI systems fail — and what legal, organizational, and structural changes could make it more real. The Herzberg case is not exceptional. It is representative of a pattern that recurs across AI domains: distributed causation, technical opacity, organizational diffusion of responsibility, and a legal system that was not designed for autonomous AI systems. Understanding that pattern — and what breaks it — is one of the most important challenges in AI ethics.
Learning Objectives
By the end of this chapter, students will be able to:
- Define the accountability gap in AI systems and explain why it differs from accountability in traditional technology failures.
- Apply the "many hands" problem to real AI failure cases, identifying each party's contribution to harm.
- Categorize AI failures using the taxonomy of specification, bias, robustness, security, integration, governance, and drift failures.
- Articulate the distinct responsibilities of developers, deployers, operators, platforms, and regulators in the AI value chain.
- Evaluate the adequacy of existing legal frameworks — negligence, products liability, civil rights law — for assigning AI accountability.
- Analyze specific accountability failures in the Uber/Herzberg case and the Amazon hiring algorithm case using the frameworks developed in this chapter.
- Describe structural accountability mechanisms — impact assessments, registration, mandatory insurance, audit requirements, and incident reporting — and assess their effectiveness.
- Distinguish between individual accountability (blaming a person) and structural accountability (fixing a system), and explain why both are necessary but neither is sufficient alone.
Section 18.1: The Accountability Gap in AI
Defining Accountability
Accountability, in its most basic sense, is the obligation to explain one's actions and face appropriate consequences for them. It is what separates genuine governance from theater. An accountable actor can be identified, questioned, required to justify what they did, and sanctioned if their actions caused harm. In well-functioning institutions, accountability operates at multiple levels simultaneously: individual workers answer to supervisors; organizations answer to regulators; regulators answer to legislatures; legislatures answer to voters.
Accountability is not the same as responsibility, though the terms are often used interchangeably. Responsibility refers to the causal or moral relationship between an actor and an outcome — the surgeon is responsible for the incision she makes. Accountability is more specifically institutional: it refers to the obligation to report and be judged. A surgeon may be responsible for a medical outcome while being held accountable by a hospital review board, a state licensing authority, a court, and a patient. Each of these accountability mechanisms operates differently and produces different consequences.
Liability is the legal form of accountability: the exposure to legal penalties, damages, or other legal consequences for actions that cause harm. Culpability is the moral concept: the degree to which an actor deserves blame, given what they knew and could reasonably have foreseen. These distinctions matter because AI failures often produce genuine culpability — someone made choices that led to foreseeable harm — without producing legal liability, because the law's mechanisms for assigning liability don't fit the contours of the harm.
Why AI Creates Accountability Gaps
AI systems create accountability gaps through five interlocking mechanisms, each of which would be challenging in isolation. Together, they constitute a genuine structural problem.
Distributed causation. Traditional products liability assumes a relatively simple causal story: a manufacturer made a defective product that injured a user. Even in complex cases, there is usually an identifiable "but-for" cause — a design flaw, a manufacturing error, a failure to warn — that a plaintiff can point to. AI systems, by contrast, are produced through processes involving dozens of teams, hundreds of datasets, thousands of design choices, and deployment decisions made by parties who never communicated directly with the developers. The harm Elaine Herzberg suffered was caused by a sensor array designed by one team, an object classification system trained by another, a braking suppression decision made by a third, a safety monitoring protocol set by a fourth, and a permit regime administered by a state government. Which of these was the "real" cause? All of them were. None of them, individually, constitutes an obviously sufficient cause of her death.
Technical opacity. Many modern AI systems — particularly deep learning systems — are not designed in a way that permits easy causal inspection. When a neural network makes a classification decision, there is often no human-interpretable explanation for why it made that decision. This is not a minor inconvenience. It means that the ordinary legal question — what did the system do, and why? — may be unanswerable in any technically rigorous sense. You can observe that the system misclassified Herzberg six seconds before impact. You cannot easily determine which weights, which training examples, which architectural choices produced that misclassification. This opacity creates what researchers call the "black box problem" — not just a challenge for understanding, but a structural obstacle to accountability.
Organizational diffusion. Modern AI development does not happen in a single organization. Uber's self-driving system incorporated components from multiple vendors, used open-source software libraries, relied on cloud computing infrastructure, and was developed by teams distributed across multiple cities. When harm occurs, each organizational unit can point to others. The vendor provided the sensor. The engineering team integrated it. The safety team set the parameters. The product team set the timeline. The legal team assessed the regulatory risk. The executive team approved deployment. Responsibility is genuinely distributed — and this distribution, which is an ordinary feature of complex technology development, becomes an obstacle to accountability when things go wrong.
Novel legal categories. Existing law was not designed with autonomous AI systems in mind. Products liability law developed in the context of physical manufactured goods. Negligence law was built around individual human actors making specific decisions. Civil rights law was drafted to prohibit intentional discrimination and, later, facially neutral practices with discriminatory effects — but neither category maps cleanly onto algorithms that produce disparate outcomes as an emergent consequence of optimization on historical data. Courts are being asked to stretch existing doctrines to cover AI harms, and those doctrines are often ill-fitting. The gaps that result are not bugs in the legal system — they are genuine lacunae that require legislative attention.
The "computer says no" defense. Perhaps the most insidious accountability gap is the use of AI systems as a shield against human accountability. When an AI system makes a decision — denying a loan, rejecting a job application, setting bail conditions — the humans involved can claim that the decision was made by the system, not by them. This is rarely fully accurate: a human made a decision to use the system, to configure it in a particular way, to act on its outputs without independent review. But the framing of "the computer decided" is powerful. It diffuses accountability, makes decision-making less transparent, and allows harmful outcomes to persist behind a facade of algorithmic objectivity.
The Responsibility Vacuum
The combination of these factors creates what scholars call a responsibility vacuum: a situation in which harm occurs, identifiable parties were involved, yet no one is held genuinely accountable. Each party's contribution was partial, opaque, or legally insulated. The result is that victims are harmed without recourse, harmful systems continue operating, and no one learns what actually needs to change.
The responsibility vacuum is not merely a theoretical problem. It has practical consequences: systems that caused harm continue to operate; organizations facing no consequences have no incentive to invest in safety; and the public, observing that AI failures go unaddressed, may reasonably conclude that no one is in charge. This erodes trust — not just in AI systems, but in the institutions that are supposed to govern them.
Vocabulary Builder
- Accountability: The obligation to explain one's actions and face appropriate consequences; the institutional dimension of responsibility.
- Responsibility: The causal or moral relationship between an actor and an outcome.
- Liability: The legal form of accountability; exposure to legal penalties or damages.
- Culpability: The degree to which an actor deserves moral blame, given what they knew and could have foreseen.
- The many hands problem: The phenomenon in which responsibility for a harmful outcome is distributed across so many actors that no individual can be meaningfully held accountable.
- Moral luck: The way in which factors outside an actor's control affect the moral judgments made about them; two engineers who make identical choices may face vastly different accountability consequences depending on whether an accident happens to occur.
Section 18.2: The Many Hands Problem
Thompson's Framework Applied to AI
In 1980, political theorist Dennis Thompson described what he called "the problem of many hands" in public administration: in complex organizations, responsibility for harmful outcomes is divided among so many individuals that it becomes impossible to attribute moral or legal responsibility to any single actor. Each individual contributed only a small, apparently defensible part of the whole. Each can plausibly say they followed procedure, acted in good faith, or relied on others to catch errors they couldn't see. Yet the harmful outcome is real, and someone must have caused it.
Thompson was writing about bureaucratic organizations and political accountability — about Watergate and My Lai, not about neural networks. But his framework anticipates the AI accountability problem with remarkable precision. Modern AI systems are built by teams of dozens to thousands of people, deployed by organizations that are distinct from the developers, used by professionals who neither built nor deployed the system, and affect individuals who had no say in any of these decisions. At each step, responsibility is diluted. At the end of the chain, when harm occurs, no individual's contribution seems sufficient to ground full accountability.
The AI Value Chain
The AI value chain runs through at least six distinct categories of actors:
Researchers develop the theoretical foundations, publish papers, and often release code and pretrained models. Their work shapes what is technically possible and establishes norms of practice. Researchers can claim that their work is basic science — that they cannot control how it is used.
Developers build specific AI systems, writing code, training models on datasets, making architectural choices, and setting performance targets. They know the system's capabilities and limitations better than anyone. But they often deploy the system into organizational contexts they don't fully understand, and they rarely see the downstream effects of their choices.
Platforms provide the infrastructure on which AI systems operate: cloud computing (AWS, Google Cloud, Azure), application programming interfaces, and pre-trained foundation models that developers build on top of. Platforms make AI accessible at scale, but they also become part of the causal chain when AI causes harm.
Deployers are the organizations that take AI systems built by others (or built in-house) and deploy them in real-world settings. A hospital deploying an AI diagnostic system, a financial institution deploying a credit-scoring model, or an employer deploying an AI hiring tool — these organizations make consequential decisions about where AI is used, for what purposes, and with what safeguards.
Operators are the humans who work with AI systems in professional contexts: the doctor who reviews an AI diagnostic recommendation, the loan officer who relies on an algorithmic credit score, the HR professional who acts on an AI-generated candidate ranking. They are the last human link in the chain before the AI's output affects a real person.
Affected persons are those whose lives are shaped by AI decisions: the job applicant whose resume is screened out, the loan applicant who is denied credit, the criminal defendant whose bail conditions are set in part by a risk algorithm. They are the reason the AI system exists — or claims to — but they have the least power in the system.
The Amazon Case: Mapping Many Hands
Amazon's AI hiring tool provides a textbook illustration of the many hands problem. As reported in 2018, Amazon developed a machine learning tool to screen resumes for technical roles. The system was trained on ten years of historical resumes submitted to Amazon, predominantly from men, in a field that has historically employed far more men than women. The system learned to penalize resumes that included the word "women's" (as in "women's chess club") and to downgrade graduates of all-women's colleges. Amazon quietly shelved the tool in 2017 after discovering these patterns.
Who was responsible? Each of the following played a role:
The engineers who built the system used historical data that encoded existing biases without adequately scrutinizing whether those patterns reflected genuine predictive validity or merely historical discrimination. They can point out that they were using standard machine learning techniques, that the bias was emergent rather than intentional, and that they flagged the problem when they discovered it.
The managers who decided to build and deploy the system chose to automate a high-stakes decision without adequately defining what fairness in hiring means, without requiring that the system demonstrate fairness before deployment, and without establishing adequate monitoring to detect the bias that eventually emerged.
The HR professionals who used the system's outputs in screening decisions were operating it as a tool within a process they were expected to follow. They may not have known the system was producing biased results.
The executives who approved the project set the incentive structure (move fast, reduce costs, increase hiring efficiency) without creating an accountability structure that would have caught the problem earlier. They made the decision to pursue automation in a high-stakes domain without adequate safeguards.
Amazon as an institution failed to conduct adequate pre-deployment testing, failed to establish monitoring mechanisms, and arguably failed to be transparent with job applicants that their applications were being screened by an AI system.
Each of these failures is real. None of them, individually, is an obvious smoking gun. This is the many hands problem: when everyone is somewhat responsible, the ordinary mechanisms for assigning blame — prosecution, professional discipline, reputational consequences — are difficult to activate.
Systemic vs. Individual Failure
The many hands problem teaches an important lesson: many AI failures are systemic rather than individual. They result not from one person's bad judgment but from systems, processes, and incentive structures that produce predictable harm. This distinction matters enormously for how we respond. If Amazon's bias problem was the fault of a rogue engineer, the solution is to fire that engineer. If it was a systemic problem — standard practices, perverse incentives, inadequate governance — then the solution must be systemic as well: mandatory impact assessments, fairness testing requirements, auditing obligations, regulatory oversight.
Emphasizing individual accountability when the problem is systemic produces a "few bad apples" narrative that lets institutions off the hook. It satisfies the emotional demand for a villain without fixing the underlying system. But equally, emphasizing systemic failure to the exclusion of individual accountability can produce a "no one was responsible" conclusion that lets genuinely culpable individuals escape consequences. The challenge is to hold individuals accountable for their specific contributions while also reforming the systems that made harmful outcomes predictable.
The Collingridge Dilemma
The many hands problem is compounded by what philosopher David Collingridge called the "dilemma of control." When a technology is new, it is easy to regulate — but you don't yet know what harms it will produce, because the technology has not been widely deployed. By the time harms are visible and well-understood, the technology has become entrenched: companies have built business models around it, users have become dependent on it, and the political economy of regulation has shifted to favor incumbents. This dilemma applies directly to AI. Amazon's hiring tool was easy to redesign in 2017 — but by then, how many hiring decisions had it influenced? By the time COMPAS's racial disparities were documented by ProPublica in 2016, the system was being used in courts across the United States. By the time autonomous vehicle safety became a serious regulatory concern, Uber and Waymo had already invested billions in the technology and had powerful political and economic interests in the outcome of regulatory decisions.
The Collingridge dilemma does not mean that regulation is futile. It means that regulation must begin earlier, before harms become entrenched. It also means that the many hands problem is structurally embedded in the timeline of technology development — responsibility is hardest to assign precisely when harms are clearest.
Section 18.3: Types of AI Failure Modes
Understanding who is responsible requires understanding how an AI system actually failed. Failures are not all alike, and different failure modes implicate different parties and call for different responses. The following taxonomy covers the most significant categories.
Specification Failure
A specification failure occurs when the AI system was optimized for the wrong objective — when what the system was told to maximize does not correspond to what we actually want. The system may perform exactly as designed; the design itself is the problem.
Facebook's news feed algorithm is a canonical example. The system was optimized to maximize engagement — time on platform, clicks, shares, comments. This is a precisely measurable objective. The system pursued it with extraordinary effectiveness. The problem is that engagement is not equivalent to wellbeing, democratic participation, or truth. Content that provokes outrage, fear, and tribal identity is highly engaging. Misinformation that confirms existing beliefs is more engaging than accurate information that challenges them. A system optimized for engagement in this environment will systematically amplify the most emotionally manipulative, divisive, and false content — which is exactly what happened.
Specification failures are the developer's core responsibility. Choosing an objective function is one of the most consequential design decisions in AI development. Getting it wrong produces systems that are effective at causing harm while being technically successful. The developers and managers who chose "engagement" as the optimization target for Facebook's news feed knew — or should have known — that engagement is not equivalent to any of the social goods the platform claimed to serve.
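The gap between a measurable proxy and the actual goal can be made concrete with a toy ranking example. The data below is entirely hypothetical and illustrative: a feed that ranks purely by predicted engagement will be "working as designed" while systematically promoting the least accurate content.

```python
# Toy illustration of a specification failure (Goodhart's law).
# Hypothetical items: each has an engagement score (what the system
# optimizes) and an accuracy score (what we actually want).
items = [
    {"title": "outrage-bait rumor",     "engagement": 0.92, "accuracy": 0.10},
    {"title": "careful fact-check",     "engagement": 0.35, "accuracy": 0.95},
    {"title": "sensational half-truth", "engagement": 0.81, "accuracy": 0.40},
    {"title": "dry but correct report", "engagement": 0.28, "accuracy": 0.90},
]

# The deployed objective: rank by engagement, highest first.
by_engagement = sorted(items, key=lambda x: x["engagement"], reverse=True)

# The optimizer succeeds on its own terms, yet the top of the feed
# is occupied by the least accurate item.
for item in by_engagement:
    print(item["title"], item["engagement"], item["accuracy"])
```

Nothing in the code is malfunctioning; the harm is entirely in the choice of the sort key.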
Bias and Fairness Failure
A bias or fairness failure occurs when an AI system produces systematically unjust outcomes for identifiable groups. These failures typically arise from training data that encodes historical discrimination, from optimization objectives that fail to account for disparate impact, or from evaluation processes that assess only aggregate performance without examining group-specific outcomes.
The Amazon hiring tool is the paradigmatic example. The COMPAS recidivism tool, discussed at length in Chapters 9 and 30, is another. A credit-scoring algorithm trained on historical lending data in a redlined neighborhood will learn to discriminate by proxy even when race is not an explicit input variable, because the neighborhood's history of racial discrimination has encoded race into dozens of facially neutral variables — property values, credit history, even ZIP code.
Bias failures implicate developers (who trained on biased data and didn't catch the problem), deployers (who used the system without adequate fairness testing), regulators (who did not require pre-deployment validation), and the broader social and historical context that produced the biased data in the first place. Crucially, historical bias is not a sufficient excuse: knowing that your training data encodes discrimination is reason to take extra care in validation, not to proceed without checking.
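One concrete check that developers and deployers can run is the "four-fifths rule" drawn from US employment-discrimination guidance: the selection rate for any group should be at least 80% of the rate for the most-selected group. A minimal sketch with hypothetical numbers:

```python
def selection_rate(selected, applicants):
    """Fraction of applicants who were selected."""
    return selected / applicants

def disparate_impact_ratio(rate_a, rate_b):
    """Ratio of the lower selection rate to the higher one."""
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical screening outcomes for two applicant groups.
rate_men = selection_rate(selected=120, applicants=400)    # 0.30
rate_women = selection_rate(selected=30, applicants=200)   # 0.15

ratio = disparate_impact_ratio(rate_men, rate_women)       # 0.50
if ratio < 0.8:  # the four-fifths threshold
    print(f"Potential adverse impact: ratio = {ratio:.2f}")
```

A ratio below 0.8 is not legal proof of discrimination, but it is exactly the kind of red flag that pre-deployment fairness testing exists to surface.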
Robustness Failure
A robustness failure occurs when an AI system fails under conditions that differ from its training environment. AI systems are trained on distributions of data that represent the world as it was during training. When the deployment environment diverges from the training distribution — because of rare edge cases, environmental changes, or deliberate manipulation — the system may fail catastrophically.
Self-driving vehicles trained in suburban California driving conditions may fail in Boston snowstorms. A medical AI trained on predominantly white patients may misdiagnose conditions in patients of color whose imaging presents differently. A fraud detection system trained on historical transactions may fail to catch new fraud patterns that emerged after training. Robustness failures are primarily the developer's responsibility — they result from inadequate testing, insufficient training data diversity, or overconfident deployment into environments for which the system was not validated.
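One simple (and far from sufficient) guard against robustness failures is to flag deployment inputs that lie far outside the training distribution before acting on the model's output. The sketch below, using only per-feature z-scores against training statistics, is a minimal illustration of the idea, not a production out-of-distribution detector:

```python
import statistics

def fit_stats(training_rows):
    """Per-feature mean and standard deviation from training data."""
    cols = list(zip(*training_rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in cols]

def out_of_distribution(x, stats, z_threshold=3.0):
    """Flag an input if any feature lies more than z_threshold
    standard deviations from its training mean."""
    return any(abs(v - m) / s > z_threshold for v, (m, s) in zip(x, stats))

# Hypothetical training data with two features.
train = [(1.0, 10.0), (1.2, 9.5), (0.9, 10.5), (1.1, 10.2), (1.0, 9.8)]
stats = fit_stats(train)

print(out_of_distribution((1.05, 10.0), stats))  # near the training data: False
print(out_of_distribution((8.0, 10.0), stats))   # far outside it: True
```

Real robustness work requires much more — diverse training data, stress testing, validation in the actual deployment environment — but even this crude check would have refused to trust predictions on inputs the system had never seen anything like.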
Security Failure
A security failure occurs when an AI system is successfully attacked by an adversary. This category includes adversarial examples (carefully crafted inputs designed to fool the model), data poisoning (corrupting the training data to produce systematic errors), model inversion attacks (extracting private training data from model outputs), and backdoor attacks (embedding hidden behaviors triggered by specific inputs).
Adversarial examples are among the most striking findings in contemporary AI research: a deep learning image classifier that correctly identifies a school bus can be made to classify it as an ostrich by adding a pattern of noise imperceptible to the human eye but decisive for the model. An adversarially aware attacker can probe a spam filter and craft malicious emails that exploit specific weaknesses in the filter's decision boundary. Security failures are the developer's and deployer's joint responsibility: developers must design for adversarial robustness, and deployers must assess the threat environment in which they are deploying.
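The mechanics can be shown on a toy model. The sketch below applies the Fast Gradient Sign Method (FGSM) to a hand-written logistic classifier: each feature is nudged by a fixed step in the direction that increases the model's loss. The weights, input, and step size are hypothetical, and the step is deliberately large so the flip is visible; against deep image models the same idea works with perturbations far too small to see.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that x belongs to class 1 under a linear logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method for logistic loss: perturb each feature
    by eps in the sign of the loss gradient. For logistic loss,
    dLoss/dx_i = (p - y) * w_i."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

# Hypothetical trained weights and a correctly classified input.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.6, -0.4, 0.8], 1          # true class: 1

print(predict(w, b, x))              # confidently class 1 (about 0.89)
x_adv = fgsm(w, b, x, y, eps=0.8)
print(predict(w, b, x_adv))          # flips below 0.5: now classified as 0
```

The attack needs only the gradient direction, which is why models exposed through APIs can sometimes be attacked even without full access to their internals.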
Integration Failure
An integration failure occurs when the AI system works as designed, but the broader sociotechnical system of which it is a part fails. The classic example is the automation paradox: systems designed to reduce human error can, when they work well enough to reduce human engagement, produce catastrophic errors when they unexpectedly require human intervention.
The Boeing 737 MAX disasters provide a non-AI illustration: MCAS (Maneuvering Characteristics Augmentation System) worked as designed in some scenarios; the catastrophic failures occurred when pilots needed to override the system and could not. Similarly, an AI medical diagnostic tool might correctly flag high-risk cases while simultaneously degrading a clinician's diagnostic attention to low-risk cases — so that errors in the low-risk cases increase as physicians over-rely on the system. Integration failures implicate system designers, deployers, and operators, because the failure emerges from the interaction between the AI and the human and organizational systems around it.
Governance Failure
A governance failure occurs when adequate oversight was never established over an AI system. No pre-deployment testing, no monitoring, no incident reporting, no accountability chain. The system may or may not work as designed — but no one would know, because no one is looking.
The absence of mandatory pre-deployment safety testing for most AI systems in the United States is a structural governance failure. The lack of any systematic incident reporting requirement — so that AI failures are not aggregated, analyzed, or learned from — is another. Governance failures are primarily the responsibility of deploying organizations and regulators: they result from choices not to establish adequate oversight, often because oversight costs money and the harm it prevents is diffuse and statistical.
Drift Failure
A drift failure occurs when an AI system that was accurate at deployment deteriorates over time as the world changes and the deployment environment diverges from the training environment. A credit-scoring model trained before the 2008 financial crisis will have been trained in a world that looked very different from the post-crisis economy. A language model trained on 2022 internet data will have gaps and errors when deployed into a 2025 information environment. Models that were trained on data from before a pandemic, an economic shock, or a demographic shift may produce increasingly inaccurate predictions as the gap between training and deployment widens.
Drift failures are primarily the deployer's responsibility: they result from the failure to monitor deployed models and update or retire them when performance degrades. They also require adequate monitoring infrastructure, which is a governance responsibility.
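One widely used monitoring statistic for drift is the Population Stability Index (PSI), which compares the distribution of a feature or score at training time with its distribution in production. The sketch below is a minimal stdlib-only implementation with hypothetical data; thresholds follow a common rule of thumb rather than any standard:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-era) sample
    and a current (deployment-era) sample of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def share(sample, a, b, last_bin):
        n = sum(1 for v in sample if a <= v < b or (last_bin and v == b))
        return max(n / len(sample), 1e-6)   # floor avoids log(0)

    total = 0.0
    for i in range(bins):
        e = share(expected, edges[i], edges[i + 1], i == bins - 1)
        a = share(actual, edges[i], edges[i + 1], i == bins - 1)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical model scores: baseline from training, current from production.
baseline = [i / 100 for i in range(100)]
current = [0.5 + i / 200 for i in range(100)]   # distribution shifted upward

print(psi(baseline, baseline))   # 0.0: no shift
print(psi(baseline, current))    # well above 0.25: major drift
```

A deployer running this check on a schedule has an early, quantitative signal that a model needs revalidation, retraining, or retirement; a deployer who never looks has a governance failure waiting to become a drift failure.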
Section 18.4: Developer Responsibility
What Developers Owe
AI developers — the engineers, data scientists, machine learning researchers, and teams who design, train, and test AI systems — occupy a position of unique knowledge and power. They understand the system's design and limitations better than anyone. They make choices that are invisible to users, deployers, and regulators but that determine how the system behaves in millions of real-world interactions. With that knowledge and power comes obligation.
The core developer obligations are: competent implementation, honest documentation, and accurate capability claims. Competent implementation means adhering to professional standards of practice in developing AI systems — not cutting corners on testing, not training on data of unknown provenance, not deploying into high-stakes settings without adequate validation. Honest documentation means accurately describing what the system does, how it was built, what its limitations are, and where it has been tested and found to perform adequately. Accurate capability claims means not overstating what the system can do in marketing materials, technical specifications, or conversations with deployers.
The Negligence Standard
In legal terms, developers' obligations are assessed primarily through the negligence standard: did the developer exercise reasonable care? This requires asking what a reasonably competent AI developer would have done under the circumstances. The answer is informed by professional standards, industry practices, and expert testimony.
Reasonable care in AI development includes: using representative training data and testing for demographic disparities; documenting known limitations and failure modes; conducting adversarial testing before deployment in high-stakes settings; monitoring post-deployment performance; and reporting problems discovered after deployment.
Comparative negligence applies when multiple parties contributed to harm: in the Amazon hiring case, the engineers who discovered the tool was biased and continued to deploy it bore comparative fault alongside the managers who approved deployment and the executives who set the incentive structure.
The "We Were Just Building Tools" Defense
A persistent refrain among AI developers facing accountability questions is that they are merely building tools — that what matters is how those tools are used, not the tools themselves. This defense is inadequate for several reasons.
First, it overstates the neutrality of AI tools. An AI system trained on biased data and deployed in a high-stakes setting is not analogous to a kitchen knife that can be used for cooking or violence. Its design choices — including the training data, the objective function, and the failure to test for bias — make certain harmful uses not merely possible but highly probable. The bias was built in, not added by the deployer.
Second, it ignores the developer's superior knowledge. Developers know things about their systems that deployers, operators, and users do not. They know the training data came from predominantly male engineers. They know the model was tested only on certain population segments. They know the system has a high false-positive rate in certain edge cases. Choosing not to disclose these limitations, or not to test for problems that developers with appropriate domain knowledge should foresee, is a choice — not a neutral act of tool-building.
Third, professional codes of ethics reject this defense explicitly. The ACM Code of Ethics states that computing professionals should "avoid harm" and "be honest and trustworthy." The IEEE Code of Ethics commits members to "hold paramount the safety, health, and welfare of the public." These codes impose obligations on AI developers as professionals, not merely as tool-builders. The "just building tools" defense treats AI development as a craft without professional obligations — an increasingly untenable position as AI systems affect more consequential decisions.
Algorithmic Impact Assessments as Professional Obligation
One structural mechanism for institutionalizing developer responsibility is the algorithmic impact assessment (AIA) — a systematic, documented analysis of an AI system's potential effects on affected populations, conducted before deployment. Impact assessments originated in environmental law (environmental impact assessment) and spread to privacy (privacy impact assessment). Their application to AI is a natural extension.
An adequate AIA requires developers to: describe the system and its intended use; identify affected populations; enumerate potential harms; test for demographic disparities; develop a mitigation plan; and establish a post-deployment monitoring plan. The discipline of completing an AIA forces developers to confront questions about harm, fairness, and accountability that are otherwise easy to defer. Where law or binding policy requires AIAs — as Canada's Directive on Automated Decision-Making does for government AI systems — they become a legally cognizable standard of care: failing to conduct an AIA when one is required is evidence of negligence.
Section 18.5: Deployer Responsibility
The Non-Delegable Nature of Accountability
Organizations that deploy AI systems — in hiring, lending, healthcare, criminal justice, education, or any other consequential domain — have responsibilities that cannot be delegated to the AI vendor or developer. This principle is both ethical and legal: an employer who uses an AI hiring tool that produces racially discriminatory outcomes is subject to employment discrimination law regardless of whether the AI was built in-house or purchased from a vendor. The deployer remains the accountable party.
This matters because the "we just used the vendor's tool" defense is a logical extension of the "we were just building tools" defense — and it is equally inadequate. A hospital that deploys an AI diagnostic system that performs poorly for Black patients cannot avoid accountability by pointing to the AI vendor's contract. An employer that uses an AI screening tool that screens out women cannot escape liability under Title VII by pointing to the software agreement. The deployer made a choice to use the system, to configure it in a particular way, and to act on its outputs — and that choice is theirs.
Due Diligence Obligations
What does responsible AI procurement look like? Due diligence before deploying a third-party AI system should include: reviewing the vendor's documentation of training data, validation methodology, and known limitations; independently testing the system for performance across relevant demographic groups; assessing the system's fitness for the specific deployment context; reviewing the vendor's track record and any history of problems with the system; negotiating contractual rights to audit and to receive notification of material changes; and assessing the regulatory compliance implications of the deployment.
In many organizations, AI procurement is treated like software procurement in general — focused on functionality, price, and vendor reputation, with minimal attention to safety, fairness, or accountability implications. This is changing, slowly, under pressure from regulators and litigants. The EEOC has made clear that employers bear responsibility for the discriminatory effects of AI tools they deploy, regardless of whether they built those tools.
Configuration Responsibility
Deployers often make consequential choices about how AI systems are configured: what threshold to set for a credit-scoring decision; whether to configure an AI hiring tool to weight certain credentials more heavily; how to calibrate a fraud detection system's sensitivity. These configuration choices significantly affect outcomes, and they are made by the deployer, not the developer. Configuration choices that produce discriminatory outcomes are the deployer's responsibility.
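To make the point concrete, here is a minimal sketch (with synthetic, hypothetical scores) of how a deployer's threshold choice alone can change group-level outcomes produced by the same underlying model. The function name and the score values are invented for illustration.

```python
# Illustrative sketch (synthetic data): the same model scores, under three
# different deployer-chosen thresholds, produce very different group-level
# approval rates. The scores and groups below are hypothetical.
def approval_rate(scores, threshold):
    """Fraction of applicants whose model score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical model scores for two demographic groups of applicants.
group_a = [0.62, 0.71, 0.55, 0.80, 0.68, 0.74, 0.59, 0.66]
group_b = [0.58, 0.64, 0.52, 0.70, 0.61, 0.56, 0.67, 0.49]

for threshold in (0.55, 0.60, 0.65):
    rate_a = approval_rate(group_a, threshold)
    rate_b = approval_rate(group_b, threshold)
    print(f"threshold={threshold:.2f}  group A={rate_a:.0%}  "
          f"group B={rate_b:.0%}  ratio={rate_b / rate_a:.2f}")
```

In this toy data, raising the threshold from 0.55 to 0.65 widens the gap between the two groups considerably — a disparity created entirely by a configuration decision the deployer made, not by anything the developer shipped.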
Monitoring Responsibility
Deployers also bear ongoing monitoring responsibility. Deploying an AI system is not a one-time action; it is the beginning of an ongoing operational relationship. Responsible deployment requires: tracking post-deployment performance metrics across demographic groups; investigating and responding to complaints from affected individuals; periodically revalidating the system as the deployment environment changes; and retiring or retraining the system when performance degrades.
The failure to monitor is itself a form of negligence: you cannot claim good faith compliance with non-discrimination obligations if you never checked whether the system was discriminating. The EEOC's enforcement approach to AI hiring tools reflects this: employers are expected to validate their hiring practices on a continuous basis, which means regularly testing AI systems for adverse impact.
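The kind of adverse-impact check the EEOC expects is not exotic. The sketch below applies the EEOC's "four-fifths" rule of thumb — a selection rate below 80% of the highest group's rate is treated as preliminary evidence of adverse impact — to hypothetical selection counts. The function names and numbers are invented for illustration; the four-fifths threshold itself comes from the Uniform Guidelines on Employee Selection Procedures.

```python
# Illustrative sketch of the EEOC "four-fifths" rule of thumb: a group
# selection rate below 80% of the highest group's rate is preliminary
# evidence of adverse impact. Counts below are hypothetical.
def selection_rate(selected, applicants):
    return selected / applicants

def adverse_impact_check(rates, threshold=0.8):
    """Compare each group's selection rate to the highest-rate group."""
    top = max(rates.values())
    return {group: (rate / top, rate / top < threshold)
            for group, rate in rates.items()}

rates = {
    "group_a": selection_rate(48, 100),   # 48% of group A selected
    "group_b": selection_rate(30, 100),   # 30% of group B selected
}
for group, (ratio, flagged) in adverse_impact_check(rates).items():
    note = "  <- flag for review" if flagged else ""
    print(f"{group}: impact ratio {ratio:.2f}{note}")
```

Running a check like this on every hiring cycle's data is the operational meaning of "validating on a continuous basis": a flagged ratio is not proof of discrimination, but it is the signal that obligates investigation.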
EEOC and CFPB Enforcement
Federal regulators have been increasingly explicit about deployer accountability for AI-caused harm. The EEOC's 2022 technical assistance document on AI and the ADA, and its 2023 guidance on artificial intelligence and the employment discrimination laws, make clear that employers are responsible for AI-caused employment discrimination even when the AI was developed by a third party. The CFPB has similarly signaled that financial institutions using AI-based credit models are responsible for those models' compliance with the Equal Credit Opportunity Act and other fair-lending laws.
This enforcement posture reflects a deliberate regulatory choice: rather than pursuing AI vendors, who may lack the resources or the legal exposure, regulators are pursuing the deploying organizations, which have chosen to use the systems and have the power to stop using them. The approach creates incentives for deployers to conduct genuine due diligence and to demand accountability from vendors.
Section 18.6: Operator and User Responsibility
Professional Users of AI
Many AI systems are deployed not directly to consumers but to professional intermediaries: doctors who use AI diagnostic tools, judges who consider AI risk assessments, loan officers who use AI credit models, HR professionals who use AI screening tools. These professionals exercise judgment about whether to follow AI recommendations, how much weight to give them, and when to override them. Their professional training, licensing, and ethical obligations are part of the accountability framework.
Professional users have knowledge that most consumers lack. A doctor reviewing an AI diagnostic recommendation can bring clinical training to bear. A loan officer reviewing an AI credit score can ask questions about the applicant's circumstances. A judge reviewing an AI risk assessment can consider factors the algorithm did not model. This knowledge creates responsibility: professionals who follow AI recommendations without genuinely engaging with the question before them are not fulfilling their professional obligations.
Automation Bias
The core challenge is automation bias: the well-documented cognitive tendency to over-rely on automated recommendations, even in the face of countervailing evidence. In research studies, professionals in many domains — pilots, physicians, financial analysts — demonstrate automation bias: they defer to automated recommendations more than they should, and they fail to catch errors that a genuinely engaged human reviewer would catch.
Automation bias has several components: a tendency to accept automated recommendations more readily than equivalent human recommendations; a tendency to miss errors in automated systems; and a tendency to notice failures less quickly because active engagement is reduced when automation is handling the primary task. These tendencies are strongest when automation is highly reliable most of the time (so that vigilance is not maintained) and when the task is cognitively demanding.
In AI-assisted professional decision-making, automation bias means that the human "in the loop" may not be performing genuine oversight. A doctor who rubber-stamps every AI diagnostic recommendation is not providing meaningful human review — she is providing the appearance of human review while actually delegating the decision to the AI. This matters both ethically and legally: the meaningful human review standard, which appears in the EU AI Act and in EEOC guidance on AI in hiring, requires genuine engagement, not mere presence.
Professional Licensing and Standards
Professional licensing exists, in part, as an accountability mechanism: it ensures that people who make high-stakes decisions on behalf of others have demonstrated competence and are subject to professional discipline. AI does not dissolve these obligations; it changes their application.
A physician who uses an AI diagnostic tool is still obligated to provide competent medical care. If the AI gives a wrong diagnosis and the physician follows it without clinical examination, the physician may be liable for medical malpractice — because a competent physician, exercising the standard of care, would not have relied solely on the AI. A financial advisor who uses an AI-generated investment recommendation is still obligated to provide advice suitable for the client's specific circumstances; an AI recommendation that is not suitable remains the advisor's responsibility.
Legal cases addressing operator reliance on AI are accumulating. Courts have generally declined to treat AI reliance as a shield from professional liability — which is the right outcome. A world in which professionals can escape accountability for bad decisions by claiming to have followed AI recommendations would eliminate professional accountability entirely.
Section 18.7: Platform and Infrastructure Responsibility
Foundation Model Providers
The emergence of large foundation models — GPT-4, Claude, Gemini, LLaMA — has created a new layer of AI accountability that did not exist a decade ago. Foundation model providers occupy a position of extraordinary power in the AI ecosystem: their models are used as the basis for countless downstream applications, many of which the provider never approved, reviewed, or anticipated. A harmful capability baked into a foundation model — a propensity to produce biased outputs, to assist with harmful activities, to hallucinate authoritative-sounding falsehoods — propagates to every downstream application built on it.
What do foundation model providers owe? At minimum: competent implementation of safety measures; honest documentation of known limitations and risks; meaningful terms of service that prohibit harmful uses; and genuine enforcement of those terms. The EU AI Act's provisions on general-purpose AI models impose more specific obligations: risk assessments for models above a capability threshold, technical documentation, compliance with EU copyright law for training data, and (for the highest-capability models) transparency and risk management obligations.
The "we're just a platform" defense — the foundation-model equivalent of "we were just building tools" — is as inadequate here as in other contexts. OpenAI's ChatGPT and Anthropic's Claude are not neutral infrastructure. They are systems with particular capabilities, particular failure modes, and particular social effects, and their providers have chosen to deploy them at scale.
Cloud Infrastructure Responsibility
AWS, Google Cloud, and Microsoft Azure provide the computing infrastructure on which most AI development and deployment occurs. They have access to information about who is running what AI systems, at what scale, on their infrastructure. This creates at least some responsibility to establish terms of service that prohibit hosting AI systems used for clearly illegal purposes, to enforce those terms, and potentially to cooperate with regulatory and law enforcement investigations into AI-caused harm.
Cloud providers have generally argued that they are neutral infrastructure providers with no obligation to investigate or control how their services are used. This argument has become less tenable as cloud providers have themselves entered the AI market as model providers and AI services companies. A company that sells both AI services and the infrastructure those services run on has a more complex relationship to the harms its infrastructure enables than a purely neutral infrastructure provider would.
Section 230 and Its Limits
Section 230 of the Communications Decency Act provides that interactive computer services are not publishers of third-party content, insulating social media platforms from liability for user-generated content in most circumstances. This provision has been centrally important to the development of the internet — and increasingly controversial as platforms have grown into dominant intermediaries that make editorial choices through their algorithmic recommendation systems.
The critical question is whether algorithmic recommendation — the AI-driven curation that determines what content users see — is protected by Section 230 or is a form of editorial conduct that attracts publisher liability. In Gonzalez v. Google (2023), the Supreme Court declined to rule on this question on the merits, leaving the law unsettled. The EU's Digital Services Act takes a different approach, imposing systemic risk assessments and mitigation obligations on very large online platforms — treating algorithm-driven amplification of harmful content as a platform responsibility, not a protected editorial choice.
Section 18.8: Regulatory and Government Responsibility
The Role of Regulators
Regulators exist to address market failures: situations in which individual actors cannot adequately protect themselves and in which the costs of harm are borne by people who are not parties to the relevant transactions. AI harm is a paradigmatic case for regulation: consumers cannot fully evaluate the AI systems that affect them; the people harmed by AI decisions (job applicants, loan seekers, criminal defendants) are not parties to the contracts that govern AI deployment; and the harms are often diffuse, statistical, and difficult to trace to individual decisions.
Effective AI regulation performs several functions: it sets minimum safety and fairness standards that create a level playing field; it requires pre-deployment assessment for high-risk applications; it establishes incident reporting mechanisms that allow learning from failures; and it provides enforcement mechanisms that give bite to legal obligations. The FDA's pre-market review process for medical devices illustrates what proactive AI regulation could look like: significant development burden, but a genuine reduction in the deployment of unsafe systems.
Regulatory Capture
Regulatory capture refers to the process by which regulatory agencies come to serve the interests of the industries they regulate, rather than the public. Capture occurs through multiple mechanisms: regulators who rotate between government and industry ("revolving door"); lobbying that shapes regulatory priorities; industry control of the technical expertise on which regulators depend; and campaign contributions that align legislators with industry interests.
Regulatory capture is a serious risk in AI regulation, precisely because the information asymmetries between industry and regulators are so large. AI companies know their systems far better than any regulator can. They can shape the terms of regulatory debate, propose technical standards that happen to favor their systems, and frame regulatory questions in ways that obscure the relevant tradeoffs. The aggressive recruitment of autonomous vehicle companies to Arizona — the regulatory context for the Herzberg case — is a clear example: state governments were competing for economic development by minimizing regulatory friction, with results that were predictable in retrospect.
The Regulatory Gap
The most fundamental regulatory failure is the absence of any pre-deployment safety requirement for most AI applications. The FDA requires pre-market review for medical devices, including AI-based diagnostic tools used in clinical settings. The FTC requires truthful advertising. The EEOC prohibits employment discrimination, including through AI tools. But for the vast majority of AI applications — AI-driven pricing, AI-driven content recommendation, AI-driven decision-making in government services, AI-driven insurance underwriting — there is no pre-deployment review, no mandatory impact assessment, and no registration requirement.
This regulatory gap means that AI systems are routinely deployed at scale before anyone outside the developing organization has any information about how they work, what they do, or what harms they might cause. By the time problems emerge — as they did with COMPAS, with Amazon's hiring tool, with Facebook's content recommendation algorithms — they may be entrenched in ways that make reform politically and economically difficult.
Section 18.9: Structural Accountability — Building Systems That Work
Beyond Individual Blame
The preceding analysis reveals a fundamental limitation of the individual accountability framework: most AI failures are systemic failures, and addressing them requires systemic responses. Firing the engineer who built a biased hiring tool, charging the safety driver who was watching Hulu, or fining the company whose algorithm discriminated — all of these are appropriate responses to specific harms, but none of them, alone, prevents similar harms from recurring. Individual accountability is necessary but insufficient.
Structural accountability mechanisms are those that address the systemic conditions under which AI failures occur, rather than merely responding to specific failures after the fact. The most important structural mechanisms include:
Mandatory Impact Assessments
Requiring organizations to conduct and document algorithmic impact assessments before deploying AI in high-risk settings is the most widely proposed structural accountability mechanism. Canada's Directive on Automated Decision-Making has required impact assessments for government AI systems since 2019, with the depth of assessment scaled to the stakes of the decision. The EU AI Act requires conformity assessments for high-risk AI systems. Proposed federal legislation in the United States — including the Algorithmic Accountability Act — would extend this requirement to private-sector AI systems.
The value of impact assessments is not that they catch every potential harm — they won't. It is that they force the question. Organizations that must document potential harms are forced to think about them seriously before deployment. Where an assessment identifies problems, there is a documented record that can support liability if the deployer ignored the warning. Where an assessment is required by law and not conducted, the failure to conduct it is itself evidence of negligence.
Registration Requirements
The EU AI Act creates a public database of high-risk AI systems deployed in the EU market. This is a registration requirement: providers of high-risk AI systems must register them before placing them on the market, providing basic information about the system's purpose, design, and provider. Registration creates a publicly accessible record that enables oversight by researchers, journalists, regulators, and affected communities. It also creates accountability through transparency: knowing that your AI system is publicly registered creates incentives to take its compliance seriously.
Insurance Requirements
Requiring deployers of high-risk AI systems to carry liability insurance is a mechanism for forcing internalization of risk. If you deploy an AI system that discriminates in credit decisions, and you must insure against the liability consequences of that discrimination, your insurance premiums will be higher the greater your exposure. This creates a direct financial incentive to invest in bias reduction, monitoring, and compliance — even absent regulatory enforcement. Mandatory insurance also creates a pool of resources available to compensate victims, addressing the problem that individual victims of AI discrimination often lack the resources to pursue litigation against large corporations.
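The incentive mechanism is simple arithmetic. The toy calculation below (all figures hypothetical, and the loading-factor pricing model is a deliberate simplification of how insurers actually rate risk) shows how a risk-based premium converts a reduction in harm rate directly into savings for the deployer.

```python
# Toy illustration (all figures hypothetical): a liability premium priced
# as expected annual loss times an insurer loading factor. Reducing the
# system's harm rate -- e.g., by investing in bias testing and monitoring --
# directly reduces the premium, internalizing the cost of risk.
def annual_premium(decisions_per_year, harm_rate, avg_claim_cost, load_factor=1.4):
    """Expected annual loss, marked up by the insurer's loading factor."""
    expected_loss = decisions_per_year * harm_rate * avg_claim_cost
    return expected_loss * load_factor

before = annual_premium(decisions_per_year=100_000, harm_rate=0.002,
                        avg_claim_cost=25_000)
after = annual_premium(decisions_per_year=100_000, harm_rate=0.0005,
                       avg_claim_cost=25_000)
print(f"premium before mitigation: ${before:,.0f}")
print(f"premium after mitigation:  ${after:,.0f}")
print(f"annual savings from safety investment: ${before - after:,.0f}")
```

Under these made-up numbers, cutting the harm rate by three-quarters cuts the premium by the same proportion — a return on safety investment that exists whether or not a regulator ever inspects the system.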
Audit Requirements
Third-party audit requirements — as established for employment AI by NYC Local Law 144, and for high-risk AI systems by the EU AI Act — create an independent check on deployer compliance. Audit requirements are discussed at length in Chapter 19. For present purposes, the key point is that mandatory external auditing is a structural accountability mechanism: it moves oversight from the internal organization (which has conflicts of interest) to an independent external party with professional obligations.
Incident Reporting
Aviation safety is dramatically better than it was fifty years ago in part because of mandatory incident reporting. Pilots, airlines, and airports are required to report safety incidents, near-misses, and system failures to the FAA, which aggregates the reports, identifies patterns, and develops systemic responses. This learning system works because the data is available, because reporting is protected from most liability consequences (to incentivize honest reporting), and because there is an institutional capacity to analyze and respond to what is reported.
AI currently lacks any comparable system. AI failures occur in scattered contexts — a medical AI misdiagnosis here, a credit algorithm denial there, a content recommendation that amplifies a mass casualty event somewhere else — and there is no institutional mechanism for aggregating them, identifying patterns, or developing systemic responses. Mandatory AI incident reporting, along the lines of aviation incident reporting, would be a transformative structural accountability mechanism.
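What such a reporting system would collect and do is not mysterious. The sketch below is a hypothetical illustration — the record fields, category labels, and severity scale are all invented, loosely modeled on aviation-style reporting — of a minimal incident record and the first aggregation step that turns scattered reports into visible patterns.

```python
# Hypothetical sketch of a minimal AI incident report record and a first
# aggregation step, loosely modeled on aviation-style incident reporting.
# All field names, category labels, and the severity scale are invented.
from collections import Counter
from dataclasses import dataclass

@dataclass
class IncidentReport:
    system_type: str      # e.g., "medical-diagnosis", "credit-scoring"
    harm_category: str    # e.g., "misdiagnosis", "wrongful-denial"
    severity: int         # 1 (near-miss) through 5 (fatality)
    description: str

reports = [
    IncidentReport("credit-scoring", "wrongful-denial", 2, "example narrative"),
    IncidentReport("credit-scoring", "wrongful-denial", 3, "example narrative"),
    IncidentReport("medical-diagnosis", "misdiagnosis", 4, "example narrative"),
]

# Pattern identification begins with simple aggregation: which system
# types and harm categories recur across otherwise scattered contexts?
patterns = Counter((r.system_type, r.harm_category) for r in reports)
for (system, harm), count in patterns.most_common():
    print(f"{system} / {harm}: {count} report(s)")
```

The hard parts of such a system are institutional, not technical: mandating submission, protecting reporters, and funding an agency to analyze what comes in.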
The Case for Strict Liability
Negligence liability requires proving that a defendant failed to exercise reasonable care. For AI harms, this is often practically impossible: the plaintiff typically lacks access to the information needed to prove negligence (the training data, the model architecture, the internal testing results), and the opacity of AI systems makes it difficult to establish that a specific harm resulted from a specific design or deployment choice.
Strict liability — holding AI developers or deployers liable for AI-caused harms regardless of whether they were negligent — would address this problem by removing the need to prove negligence. It would also create powerful incentives for investment in safety: if you are going to be liable for harm whether or not you were negligent, the incentive to prevent harm is much stronger. The EU's proposed AI Liability Directive and Product Liability Directive revision move toward strict liability for certain categories of AI harm. The argument against strict liability — that it will chill AI innovation — is real but must be weighed against the cost of leaving AI victims without recourse.
Section 18.10: The Uber/Herzberg Case as Accountability Analysis
Returning to the Hook
We began with Elaine Herzberg's death — the first pedestrian fatality involving an autonomous vehicle. We now have a framework for analyzing what happened, why accountability was so elusive, and what structural changes the case demanded.
What Each Party Could Have Done Differently
Uber engineers could have designed the object classification system to prioritize pedestrian safety in ambiguous cases; they could have documented and escalated the known limitations of the braking suppression system; they could have refused to approve deployment until the system met minimum safety standards. They did not do these things, at least in part because organizational incentives — competitive pressure to beat Waymo, the need to demonstrate a working system to investors — pushed against caution.
Uber management could have established a safety culture that gave engineers the standing to block deployment; they could have required the system to pass objective safety tests before deployment on public roads; they could have ensured that the emergency braking system was not disabled in ways that compromised pedestrian safety. Instead, internal communications later revealed by the NTSB showed that safety concerns were subordinated to deployment timelines.
Uber as an institution could have established better monitoring protocols for safety drivers; it could have ensured that safety drivers were actually monitoring the road rather than their phones; it could have conducted more rigorous internal safety review before each testing expansion.
Rafaela Vasquez, the safety driver, should have been monitoring the road — that is why she was there. She was not. This is both an individual failure and a failure of Uber's monitoring and enforcement systems: Uber's systems did not detect that safety drivers were routinely distracted during autonomous vehicle tests, even though internal data later showed this was a common problem.
Arizona regulators could have required Uber to demonstrate that the autonomous vehicle system met minimum safety standards before permitting public road testing; they could have required operators to have functioning backup systems; they could have required monitoring of safety driver attentiveness. Instead, Arizona's approach — recruiting autonomous vehicle companies by minimizing regulatory friction — predictably produced this outcome.
Federal regulators (NHTSA) could have established mandatory safety standards for autonomous vehicle testing before permitting public road testing. No such standards existed in 2018.
What the Legal Response Was — and Wasn't
Uber settled the civil wrongful death claims brought by Herzberg's family for an undisclosed sum, without admitting liability. Vasquez was charged with negligent homicide in 2020; she pleaded guilty to a reduced endangerment charge and received probation in 2023. No Uber executives were charged. No regulatory enforcement action was taken against Uber.
The NTSB issued safety recommendations, which are advisory only — they have no enforcement mechanism. NHTSA proposed new guidance for autonomous vehicle testing, which was not finalized. Arizona enacted modest new requirements for autonomous vehicle permits in 2019.
What Changed
The Herzberg case accelerated several developments. Uber suspended its autonomous vehicle testing program for over a year. Several other AV companies adopted more conservative testing protocols. Public and legislative attention to autonomous vehicle safety increased. But the absence of meaningful criminal or regulatory consequences for Uber's failures means that the accountability signal sent by the case was weak. The lesson that corporate AI safety failures have limited consequences for the corporations involved is one that the AI industry cannot afford to keep learning.
Discussion Questions
- The Uber/Herzberg case involved failures at every level — developer, operator, regulator — yet produced minimal legal accountability. What specific legal or regulatory changes would have produced more meaningful accountability, and would those changes have deterred the specific failures that caused Herzberg's death?
- Apply the many hands problem to a specific AI failure you are familiar with (suggest: Amazon hiring, COMPAS, Facebook news feed, or a case from your own professional experience). Who contributed to the harm? How would you distribute moral responsibility across the parties?
- The "we were just building tools" defense is frequently invoked by AI developers when their systems cause harm. Under what circumstances, if any, do you find this defense persuasive? What conditions would need to hold for the defense to be legitimate?
- The chapter distinguishes between individual accountability (blaming a person) and structural accountability (fixing a system). Some critics argue that focusing on structural accountability is a way of letting individuals off the hook. How would you respond to this argument?
- Mandatory insurance for high-risk AI systems would create financial incentives for deployers to invest in AI safety. What objections would the AI industry raise to this requirement? How would you evaluate those objections?
- The Collingridge dilemma suggests that technology is easiest to regulate when it is new but hardest to understand, and hardest to regulate when it is well-understood but entrenched. How does this dilemma apply to AI regulation? What strategies might help navigate it?
- The chapter argues that professional users of AI — doctors, judges, loan officers — have independent obligations that AI reliance does not dissolve. Are there circumstances in which following an AI recommendation without independent review is professionally appropriate? What should those circumstances look like?
Cross-references: Chapter 3 (accountability concepts); Chapter 7 (Amazon hiring); Chapter 9 (COMPAS, fairness metrics); Chapter 22 (whistleblowing); Chapter 30 (COMPAS in criminal justice); Chapter 33 (EU AI Act).