Quiz: Accountability and Audit

Test your understanding before moving to the next chapter. Target: 70% or higher (20 of 28 points) to proceed.


Section 1: Multiple Choice (1 point each)

1. According to Section 17.1.1, accountability comprises three elements. Which of the following correctly lists all three?

  • A) Transparency, fairness, and efficiency
  • B) Answerability, attributability, and enforceability
  • C) Explainability, accuracy, and liability
  • D) Documentation, oversight, and remediation
Answer **B)** Answerability, attributability, and enforceability. *Explanation:* Section 17.1.1 defines accountability as requiring: (1) answerability — someone must explain and justify the decision; (2) attributability — the decision must be traceable to an identifiable person or entity; and (3) enforceability — there must be consequences for unjustified decisions. Algorithmic systems disrupt all three simultaneously: answerability collapses when reasoning is opaque, attributability fragments across multiple actors, and enforceability weakens when legal frameworks have not caught up with the technology.

2. The VitraMed accountability chain in Section 17.1.4 illustrates a situation in which a patient is harmed by an inaccurate risk score. What is the core governance problem this scenario demonstrates?

  • A) The algorithm was not tested before deployment.
  • B) Every actor in the decision chain has a plausible claim of non-liability, so responsibility is distributed into invisibility.
  • C) The physician failed to exercise independent clinical judgment.
  • D) The training data was intentionally biased against certain patient groups.
Answer **B)** Every actor in the decision chain has a plausible claim of non-liability, so responsibility is distributed into invisibility. *Explanation:* Section 17.1.4 presents five actors — VitraMed (developer), partner clinic (deployer), hospital administrator, physician, and data providers — each offering a plausible reason why they are not liable. The developer says they built a tool, not a policy. The deployer says they trusted the vendor. The administrator followed recommended thresholds. The physician relied on the system. The data providers say how the data was used is not their concern. The core problem is structural: accountability has been distributed across the chain so effectively that no single actor bears responsibility. As Mira observes, "the responsibility is so distributed that it's functionally the same as if nobody is."

3. An algorithmic audit that sends matched pairs of test subjects — identical in all respects except for a protected characteristic such as race or gender — through a system to detect disparate treatment is best described as:

  • A) A code audit
  • B) A user experience audit
  • C) An audit study (also called a correspondence study)
  • D) An Algorithmic Impact Assessment
Answer **C)** An audit study (also called a correspondence study). *Explanation:* Section 17.2 describes audit studies as a methodology borrowed from social science in which matched testers who differ only on a characteristic of interest interact with a system to detect differential treatment. This method has been used to detect racial discrimination in hiring platforms, lending algorithms, and housing marketplaces. A code audit (A) examines the system's source code. A user experience audit (B) examines the system from the user's perspective, including interface design and information provided. An AIA (D) is a broader assessment process, not a specific testing method.
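The matched-pair logic described in this answer can be made concrete with a small analysis sketch. This is an illustration under assumptions, not a method from the chapter: the `audit_pairs` helper, the sign test, and the toy scores are all hypothetical, and a real audit would use larger samples and more careful statistics.

```python
# Illustrative sketch (not from the chapter) of analyzing an audit study's
# matched-pair results. Scores and helper names are hypothetical; only
# Python's standard library is used.
from math import comb

def sign_test_p(n_lower: int, n_pairs: int) -> float:
    """Two-sided sign test: probability of a result at least this extreme
    if the system scored both members of each pair identically (p = 0.5)."""
    k = max(n_lower, n_pairs - n_lower)
    tail = sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs
    return min(1.0, 2 * tail)

def audit_pairs(pairs):
    """pairs: (control_score, test_score) tuples for matched applications
    that differ only in the characteristic of interest."""
    diffs = [control - test for control, test in pairs]
    n_lower = sum(d > 0 for d in diffs)  # pairs where the test member scored lower
    return {
        "mean_gap": sum(diffs) / len(diffs),
        "share_test_lower": n_lower / len(diffs),
        "p_value": sign_test_p(n_lower, len(diffs)),
    }

# Toy data: in 9 of 10 matched pairs the test-group application scores lower.
result = audit_pairs([(0.8, 0.6)] * 9 + [(0.6, 0.7)])
```

The design choice mirrors the methodology in the answer: because each pair is identical except for the characteristic of interest, a consistent within-pair score gap is itself evidence of differential treatment, with no access to the system's internals required.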

4. Section 17.2 argues that internal audits and external audits each have distinct advantages and limitations. Which of the following best describes the key advantage of external audits over internal audits?

  • A) External auditors have access to the source code and training data.
  • B) External auditors are less expensive and faster to conduct.
  • C) External auditors bring independence, reducing the risk of conflicts of interest.
  • D) External auditors always use more sophisticated technical methods.
Answer **C)** External auditors bring independence, reducing the risk of conflicts of interest. *Explanation:* The primary advantage of external audits is independence. Internal auditors may face pressure — explicit or implicit — to produce findings favorable to the organization. External auditors, like external financial auditors, derive their credibility from their independence. However, external audits face their own limitations, including lack of access to proprietary systems, models, and data. Option A is actually a disadvantage of external audits — they typically do *not* have the access that internal auditors have. Options B and D are not supported by the chapter.

5. The "many hands problem" as discussed in Section 17.4 refers to:

  • A) The difficulty of training machine learning models when too many engineers contribute to the codebase
  • B) The situation in which responsibility for an outcome is distributed across so many actors that no one is effectively accountable
  • C) The technical challenge of auditing systems that process data from multiple sources
  • D) The political difficulty of passing algorithmic accountability legislation when many stakeholders oppose it
Answer **B)** The situation in which responsibility for an outcome is distributed across so many actors that no one is effectively accountable. *Explanation:* Section 17.4 introduces the many hands problem (a term originating with political philosopher Dennis Thompson) as the situation where a harmful outcome results from the actions and decisions of many different people, but no single person's contribution is sufficient to make them individually responsible. In algorithmic systems, the many hands include data collectors, annotators, model developers, infrastructure providers, deployers, administrators, and end users. The problem is not merely that many people are involved — it is that the distribution of responsibility makes accountability functionally impossible without new governance structures.

6. Section 17.3 describes Algorithmic Impact Assessments (AIAs) as modeled on which existing governance tool?

  • A) Financial audits under the Sarbanes-Oxley Act
  • B) Environmental Impact Assessments (EIAs) under NEPA
  • C) Clinical trials required by the FDA for pharmaceutical approval
  • D) Regulatory sandboxes used by financial regulators
Answer **B)** Environmental Impact Assessments (EIAs) under NEPA. *Explanation:* Section 17.3 draws an explicit analogy between AIAs and Environmental Impact Assessments, which were established under the National Environmental Policy Act (NEPA) in the United States. Both require that potential harms be identified and assessed *before* a system is deployed or a project is built. Both involve stakeholder consultation. And both face similar risks of capture — the possibility that the assessment process becomes a box-checking exercise rather than a genuine evaluation of harm. The AIA framework adapts EIA principles to the algorithmic context.

7. Which of the following best describes the concept of a "regulatory sandbox" as discussed in Section 17.5?

  • A) A secure virtual environment where algorithms are tested before deployment in the real world
  • B) A limited, controlled regulatory environment in which companies can test innovative systems under relaxed rules with regulatory oversight
  • C) A physical space where regulators meet with technology companies to negotiate compliance requirements
  • D) A database of approved algorithms that have passed regulatory review
Answer **B)** A limited, controlled regulatory environment in which companies can test innovative systems under relaxed rules with regulatory oversight. *Explanation:* Section 17.5 describes regulatory sandboxes as governance mechanisms that allow companies to deploy innovative algorithmic systems in a controlled environment with regulatory supervision, temporarily relaxing certain rules to allow experimentation while maintaining oversight. The concept was pioneered in financial regulation (the UK's Financial Conduct Authority) and has been adapted for AI governance in several jurisdictions. Sandboxes aim to balance innovation with accountability by creating a structured space for learning rather than either prohibiting new systems or allowing unrestricted deployment.

8. According to the chapter, which liability framework holds a party responsible for harm caused by their product regardless of whether they were negligent?

  • A) Negligence standard
  • B) Strict liability
  • C) Product liability under a design defect theory
  • D) Due diligence obligation
Answer **B)** Strict liability. *Explanation:* Section 17.4 presents strict liability as a framework in which the responsible party is liable for harm regardless of fault — even if they exercised reasonable care. The argument for applying strict liability to algorithmic systems is that developers and deployers are best positioned to prevent harm, and strict liability creates strong incentives for safety. A negligence standard (A) requires proving the party failed to exercise reasonable care. Product liability under a design defect theory (C) requires showing the product was defectively designed. A due diligence obligation (D) is a procedural requirement, not a liability standard.

9. Eli raises a concern in Section 17.2 about algorithmic audits: "An audit that finds the algorithm treats everyone equally is still a problem if the system should never have been built in the first place." What limitation of algorithmic auditing is Eli identifying?

  • A) Audits are too expensive for community organizations to commission.
  • B) Audits can evaluate fairness within a system but cannot question whether the system should exist.
  • C) Audits rely on access to source code, which companies refuse to provide.
  • D) Audits take too long to complete and the system changes before the audit is finished.
Answer **B)** Audits can evaluate fairness within a system but cannot question whether the system should exist. *Explanation:* Eli's point highlights a fundamental limitation of the audit paradigm: audits are inherently reformist. They assume the system exists and ask whether it works fairly, accurately, and transparently. They do not and cannot ask whether the system should exist at all — whether the decision to automate a particular domain was itself ethically justified. An algorithm that allocates police patrols might pass a technical fairness audit while still representing an unjustifiable expansion of surveillance. This limitation connects to the chapter's broader argument that audits are necessary but not sufficient governance tools.

10. According to the chapter, what is the primary risk of "audit capture"?

  • A) Malicious actors gaining control of the audit process to sabotage the system
  • B) The audit process becoming a routine compliance exercise that legitimizes systems without genuinely evaluating them
  • C) Auditors developing too close a relationship with the technology and losing their critical perspective
  • D) Audit results being leaked to competitors, revealing proprietary information
Answer **B)** The audit process becoming a routine compliance exercise that legitimizes systems without genuinely evaluating them. *Explanation:* Section 17.3 identifies capture as the risk that AIAs and audits become box-checking exercises — performed because they are required but without genuine critical engagement. Drawing on the EIA analogy, the chapter notes that environmental impact assessments have sometimes been criticized for producing lengthy documents that satisfy procedural requirements while failing to prevent environmental harm. The same risk applies to algorithmic audits: a company might hire an auditor, receive a report, and continue operating exactly as before — with the audit serving as a shield against criticism rather than a tool for accountability. Option C describes a related concern but is not the primary definition of capture as used in the chapter.

Section 2: True/False with Justification (1 point each)

11. "The accountability gap is primarily a problem of technical limitations — once AI systems become fully explainable, the gap will close."

Answer **False.** *Explanation:* Section 17.1 argues that the accountability gap is a governance problem, not merely a technical one. Even if every algorithmic system were fully explainable, the many hands problem would remain — responsibility would still be distributed across developers, deployers, administrators, and data providers. Enforceability challenges would persist — legal frameworks would still need to determine who bears liability. And the sheer scale and speed of algorithmic decision-making would still overwhelm traditional accountability mechanisms designed for human decision-makers. Explainability is necessary for accountability but far from sufficient.

12. "An external algorithmic audit that does not have access to the system's source code or training data can still produce valuable findings."

Answer **True.** *Explanation:* Section 17.2 describes audit studies (correspondence tests) and outcome audits that can be conducted without access to internal system components. By sending matched test inputs through a system and analyzing the outputs, or by analyzing the system's real-world outcomes across demographic groups, external auditors can detect disparate treatment and disparate impact without ever seeing the code. ProPublica's audit of the COMPAS recidivism algorithm and numerous housing and employment discrimination studies used precisely this approach. However, the chapter notes that these methods have limitations — they can identify *that* discrimination exists but may not explain *why* or *how* to fix it.

13. "Under a negligence standard, an AI developer would be liable for algorithmic harm only if a plaintiff could demonstrate that the developer failed to exercise reasonable care in the system's design, testing, or deployment."

Answer **True.** *Explanation:* Section 17.4 explains that the negligence standard requires proof that the defendant failed to meet the standard of care expected of a reasonable professional. Applied to algorithmic systems, this would mean a developer is liable only if they failed to take reasonable steps — such as testing for bias, validating accuracy, or providing adequate documentation. The chapter notes that the negligence standard places the burden of proof on the harmed party, who must demonstrate what "reasonable care" looks like for AI development — a standard that does not yet have well-established benchmarks.

14. "Algorithmic Impact Assessments should ideally be conducted after a system has been deployed, using real-world outcome data rather than hypothetical risk scenarios."

Answer **False.** *Explanation:* Section 17.3 argues that AIAs are most valuable when conducted *before* deployment — during the design and planning phase. Like Environmental Impact Assessments, which are required before a project breaks ground, AIAs are designed to anticipate and prevent harm rather than measure it after the fact. Post-deployment auditing and monitoring are also necessary (and the chapter advocates for ongoing monitoring), but the AIA's distinctive value is in the pre-deployment assessment of risks, stakeholder impacts, and mitigation strategies. Conducting an AIA only after deployment would miss the opportunity to prevent harms that could have been foreseen.

15. "The chapter argues that the financial auditing industry provides a perfect model for algorithmic auditing, requiring no significant adaptation."

Answer **False.** *Explanation:* While Section 17.5 draws parallels between financial auditing and algorithmic auditing — both require independence, professional standards, and regulatory backing — the chapter identifies significant differences that require adaptation. Algorithmic systems are dynamic (they change through retraining), their harms are often diffuse and difficult to quantify, and the "accounting standards" equivalent for algorithmic behavior does not yet exist. The chapter also warns that the financial auditing industry has its own history of capture and failure (most notably in the lead-up to the 2008 financial crisis), suggesting that algorithmic auditing must learn from those failures rather than simply replicate the model.

Section 3: Short Answer (2 points each)

16. Explain the difference between disparate treatment and disparate impact in the context of algorithmic auditing. Why is disparate impact particularly challenging to detect and address in algorithmic systems?

**Sample Answer:** Disparate treatment occurs when a system explicitly uses a protected characteristic (such as race or gender) as an input — treating people differently because of who they are. Disparate impact occurs when a system produces unequal outcomes across protected groups even though it does not explicitly use protected characteristics — typically because proxy variables (ZIP code, name, browsing patterns) correlate with protected attributes. Disparate impact is particularly challenging in algorithmic systems for three reasons. First, the proxy relationships are often non-obvious — a model might use hundreds of features, any combination of which could serve as a proxy for race or gender, and identifying these proxies requires sophisticated statistical analysis. Second, machine learning models discover correlations that humans would not intentionally encode, meaning disparate impact can arise without any intent to discriminate. Third, proving disparate impact legally often requires demonstrating that the disparate outcome is not justified by a legitimate business necessity — a standard that is difficult to apply when the model's reasoning is opaque.

*Key points for full credit:*

  • Clear distinction between disparate treatment (explicit use) and disparate impact (facially neutral but unequal outcomes)
  • Explains why algorithmic proxies make disparate impact hard to detect
  • References the intent-vs.-outcome distinction
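The sample answer notes that disparate impact shows up as unequal outcome rates rather than explicit inputs. One common statistical screen — borrowed from US employment-discrimination practice, not a method the chapter prescribes — is the selection-rate ratio, often called the four-fifths rule. The sketch below, with hypothetical group names and numbers, shows the arithmetic:

```python
# Illustrative sketch (not a method the chapter prescribes): the
# selection-rate ratio, the "four-fifths rule" heuristic from US
# employment-discrimination analysis. All numbers are hypothetical.

def selection_rates(outcomes):
    """outcomes: dict mapping group name -> (n_favorable, n_total)."""
    return {group: fav / total for group, (fav, total) in outcomes.items()}

def impact_ratio(outcomes, protected, reference):
    """Ratio of the protected group's favorable-outcome rate to the
    reference group's; values below 0.8 are conventionally a red flag."""
    rates = selection_rates(outcomes)
    return rates[protected] / rates[reference]

# Toy data: favorable rates of 45% vs 60% give a ratio of 0.75.
data = {"group_a": (45, 100), "group_b": (60, 100)}
ratio = impact_ratio(data, protected="group_a", reference="group_b")
flagged = ratio < 0.8
```

Note that a ratio like this detects *that* outcomes diverge, not *why* — consistent with the answer's point that identifying the responsible proxy variables requires deeper analysis.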

17. Dr. Adeyemi describes the accountability gap as "a governance crisis, not a technical glitch." Using at least two specific examples from the chapter, explain what this distinction means and why it matters for the solutions we pursue.

**Sample Answer:** By calling the accountability gap a governance crisis rather than a technical glitch, Dr. Adeyemi means that the problem cannot be solved by better algorithms alone — it requires new institutions, legal frameworks, and accountability structures. If the gap were merely technical (e.g., models are opaque), then improving explainability would suffice. But the chapter demonstrates that the gap has structural causes that persist even with perfect transparency. First, the many hands problem: in the VitraMed accountability chain, five different actors each disclaim responsibility. Better explainability would not change this dynamic — even if the model's reasoning were perfectly transparent, the question of who is liable would remain unresolved without a governance framework that assigns responsibility. Second, the enforceability challenge: even when algorithmic discrimination is detected (as in audit studies of hiring platforms), existing legal frameworks may lack clear mechanisms for enforcement because they were designed for human decision-makers. This is a gap in law and institutional design, not in technology. The distinction matters because it determines what solutions are prioritized. A technical framing leads to investments in explainable AI. A governance framing leads to new liability rules, mandatory audits, institutional reforms, and the creation of accountability-holder designations — responses that address the structural architecture of the problem.

*Key points for full credit:*

  • Explains the distinction between technical and governance framings
  • Provides at least two specific examples from the chapter
  • Connects the framing to the type of solutions pursued

18. Section 17.5 discusses regulatory sandboxes as a governance mechanism. Explain how a regulatory sandbox works, identify one advantage and one risk, and describe a specific scenario in which a sandbox approach would be appropriate.

**Sample Answer:** A regulatory sandbox is a controlled environment in which companies can deploy innovative algorithmic systems under relaxed regulatory requirements but with active regulatory oversight. The sandbox has defined boundaries — a limited time period, a restricted geographic scope or user population, and specific reporting requirements. It allows regulators and companies to learn together about the technology's impacts before broader rules are established. One advantage is that sandboxes prevent premature regulation: rather than banning new technologies or applying ill-fitting existing rules, regulators can observe actual behavior and craft evidence-based rules. One risk is regulatory capture: if the sandbox is designed primarily by industry participants, it may create overly permissive conditions that normalize lax standards and make subsequent regulation harder. An appropriate scenario would be a municipal government considering the deployment of an AI-powered triage system in emergency services. The system's potential to improve response times is significant, but the risks of misclassification (deprioritizing genuinely urgent calls) are high. A sandbox could allow deployment in one district for six months, with real-time monitoring, mandatory incident reporting, and human override on every decision, generating the evidence needed to determine whether broader deployment is appropriate.

*Key points for full credit:*

  • Accurate description of how a sandbox works
  • One advantage and one risk clearly identified
  • A specific, plausible scenario provided

19. Mira observes that the accountability gap is "not that nobody is responsible" but "that the responsibility is so distributed that it's functionally the same as if nobody is." Using the concept of the "many hands problem," explain how distribution of responsibility can be functionally equivalent to absence of responsibility. Propose one governance mechanism that addresses this problem.

**Sample Answer:** When responsibility is distributed across many actors — data providers, developers, deployers, administrators, and end users — each actor's individual contribution to the harmful outcome may seem insufficient to constitute full responsibility. Each can point to others in the chain and argue, with some justification, that their piece was reasonable. The developer built a tool but did not choose how to deploy it. The deployer followed the vendor's recommendations. The administrator set thresholds based on expert guidance. This creates a situation where responsibility exists in theory (someone did make each decision) but is unenforceable in practice (no single actor's contribution is clearly sufficient for liability). This parallels environmental pollution cases where multiple factories each contribute a small amount of contamination to a river — each can argue that their individual discharge is insufficient to cause harm, even as the cumulative effect is devastating. One governance mechanism to address this is the designation of a mandatory "accountability holder" — a legal requirement that every algorithmic system deployed in high-stakes domains must have a single identified entity that bears ultimate liability for the system's outcomes, regardless of how many actors contributed to its development and deployment. This mirrors the "responsible person" concept in pharmaceutical regulation, where the marketing authorization holder bears ultimate responsibility for a drug's safety even though manufacturers, distributors, and prescribers all play roles.

*Key points for full credit:*

  • Explains the mechanism by which distributed responsibility becomes non-accountability
  • Uses the many hands concept accurately
  • Proposes a specific governance solution with reasoning

Section 4: Applied Scenario (5 points)

20. Read the following scenario and answer all parts.

Scenario: MetroAssist

MetroAssist is a municipal benefits administration system used by the city of Harborview to determine eligibility for housing assistance, utility subsidies, and emergency food aid. The system was developed by DataWorks, a private technology company, and deployed by the city's Department of Social Services (DSS).

MetroAssist uses a machine learning model trained on five years of historical benefits data to predict whether applicants are "likely to successfully complete" assistance programs. Applicants who receive low scores are not denied benefits outright but are placed in a "supplementary review" queue that, due to staffing shortages, averages a 14-week wait time — compared to 3 weeks for applicants the system scores as "likely to succeed."

A local legal aid organization, the Harborview Justice Center, conducts an audit study. They submit 200 matched pairs of applications — identical in all respects except that one application in each pair uses a traditionally Black name and the other uses a traditionally white name. They find that applications with traditionally Black names receive scores 18% lower on average, resulting in disproportionate placement in the supplementary review queue.

DataWorks responds: "Our model does not use race as an input. The score differences likely reflect correlations in the historical data." The DSS responds: "We rely on DataWorks' validated model. Placement in supplementary review is not a denial of benefits." The city's mayor responds: "This is a matter between the vendor and the department."

(a) Map the accountability chain for this system. Identify at least four actors and describe each one's role and likely claim of non-liability. (1 point)

(b) Evaluate the Harborview Justice Center's audit methodology. What type of audit did they conduct? What did it successfully demonstrate? What questions does it leave unanswered? (1 point)

(c) Analyze DataWorks' response — "Our model does not use race as an input" — using the concepts of disparate treatment and disparate impact. Is this response adequate? Why or why not? (1 point)

(d) The DSS claims that placement in supplementary review "is not a denial of benefits." Evaluate this claim. Is a 14-week wait time functionally different from a denial for someone experiencing a housing or food emergency? What concept from the chapter does this illustrate? (1 point)

(e) Design a governance framework for MetroAssist that addresses the accountability gap identified in this scenario. Your framework should include at least three specific mechanisms, each linked to a concept from the chapter. (1 point)

**Sample Answer:**

**(a)** Accountability chain:

  • **DataWorks (developer):** Built and trained the model. Likely claim: "We provided a decision-support tool. We validated it for accuracy. Deployment decisions are the city's responsibility."
  • **Department of Social Services (deployer):** Integrated the model into benefits administration. Likely claim: "We relied on the vendor's validated product. We are not AI experts. The model doesn't use race."
  • **City government (policy authority):** Authorized the contract and set the policy framework. Likely claim: "This is an operational matter for the department. We trust our procurement process."
  • **Mayor/elected officials (political authority):** Oversee the department and set budget priorities. Likely claim: "This is a matter between the vendor and the department" — deflecting to the bureaucratic chain.
  • **Historical data generators (past case workers):** Their decisions and biases are embedded in the five years of historical data. Likely claim: not present to make a claim — they are invisible actors in the system.

**(b)** The Harborview Justice Center conducted an **audit study** (correspondence study) — the gold-standard method for detecting disparate treatment in algorithmic systems. It successfully demonstrated that the system produces systematically different scores for applications that differ only by name (a proxy for race), establishing a strong prima facie case of disparate impact. However, the audit leaves unanswered: (1) the *mechanism* by which name influences the score (is the model using name directly, or are names correlated with other features?), (2) whether the disparity exists across other protected characteristics, (3) what other proxy variables contribute, and (4) how to fix the problem. Answering these questions would require a code audit and access to the model's internals.

**(c)** DataWorks' response illustrates the distinction between disparate treatment and disparate impact. It is true that the model may not use race as an explicit input — this would mean no disparate treatment in the narrow sense. But the audit study demonstrates disparate impact: the system produces racially disparate outcomes through proxy correlations in the historical data (such as ZIP code, education, or prior program participation, which are correlated with race due to structural racism). The response is inadequate because disparate impact can constitute discrimination regardless of intent. Under disparate impact doctrine, the absence of explicit racial inputs does not excuse racially disparate outcomes. The relevant question is not whether race is an input but whether the output is equitable.

**(d)** The DSS claim is misleading. For someone experiencing a housing emergency, a 14-week wait is functionally indistinguishable from a denial — they may lose their housing, face eviction, or experience food insecurity during the wait. The chapter's discussion of the accountability gap applies directly: the system has been designed so that no one is formally "denying" benefits, yet the practical impact is that applicants with traditionally Black names wait more than four times longer for critical assistance. This illustrates the concept of **algorithmic harm through process rather than outcome** — the system does not deny benefits, but it imposes a burden so severe that the distinction is academic. It also illustrates how algorithmic systems can create deniability: DSS can claim no one is denied benefits while the system produces a racially disparate burden.

**(e)** Three governance mechanisms:

  1. **Mandatory pre-deployment AIA (Section 17.3):** Before any benefits administration algorithm is deployed, the city should require a public Algorithmic Impact Assessment that includes disparate impact testing across protected characteristics, stakeholder consultation (including representatives from affected communities), and a published report with mitigation plans. This would have identified the risk of racial disparity before deployment.
  2. **Designated accountability holder with liability (Section 17.4):** The city should designate a specific office or official — such as a Chief Data Ethics Officer within DSS — who bears formal responsibility for the system's outcomes. This person cannot deflect to the vendor or the mayor. They are responsible for ongoing monitoring, audit responses, and remediation. This addresses the many hands problem directly.
  3. **Mandatory external audit with public reporting (Sections 17.2/17.5):** The system should be subject to annual external audits by an independent auditor, with results published publicly. The audit should include disparate impact testing, outcome analysis, and stakeholder feedback. Public reporting creates accountability through transparency — residents, journalists, and advocacy organizations can scrutinize the results and demand action.

Scoring & Review Recommendations

| Score Range | Points (of 28) | Assessment | Next Steps |
|---|---|---|---|
| Below 50% | 0-13 pts | Needs review | Re-read Sections 17.1-17.3 carefully; redo Part A exercises |
| 50-69% | 14-19 pts | Partial understanding | Review specific weak areas; focus on Part B exercises for applied practice |
| 70-85% | 20-23 pts | Solid understanding | Ready to proceed to Chapter 18; review any missed topics briefly |
| Above 85% | 24-28 pts | Strong mastery | Proceed to Chapter 18: Generative AI: Ethics of Creation and Deception |

| Section | Points Available |
|---|---|
| Section 1: Multiple Choice | 10 points (10 questions × 1 pt) |
| Section 2: True/False with Justification | 5 points (5 questions × 1 pt) |
| Section 3: Short Answer | 8 points (4 questions × 2 pts) |
| Section 4: Applied Scenario | 5 points (5 parts × 1 pt) |
| **Total** | **28 points** |