Chapter 19: Exercises — Auditing AI Systems
Difficulty ratings: ⭐ (foundational) through ⭐⭐⭐⭐ (advanced). Exercises marked with † are team/collaborative exercises.
Part A: Comprehension and Application
Exercise 1 ⭐ Define the four types of AI audits (technical, process, impact, and compliance) and for each type, provide: (a) the primary question the audit answers, (b) the information the auditor needs access to, and (c) the primary challenge in conducting the audit. Then explain why effective AI governance requires all four types rather than just one.
Exercise 2 ⭐ List the three primary requirements for effective external AI auditing (access, expertise, and independence). For each requirement, explain: (a) why it is necessary for audit credibility; (b) what the primary obstacle to achieving it is; and (c) what mechanism (legal, contractual, or regulatory) could help address the obstacle.
Exercise 3 ⭐ NYC Local Law 144 requires calculation of "impact ratios" comparing selection rates across demographic groups. Explain what an impact ratio is, how it is calculated, and what the four-fifths rule means in the context of AI hiring tool audits. What does an impact ratio below 0.8 indicate, and what action (if any) does it require under LL 144?
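As a starting point for Exercise 3, the impact-ratio arithmetic can be sketched in a few lines. The group names and counts below are invented for illustration, not drawn from any real LL 144 disclosure:

```python
def impact_ratios(selected: dict[str, int], applicants: dict[str, int]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# Hypothetical audit data: applicants and hires per demographic group.
applicants = {"group_a": 200, "group_b": 150}
selected = {"group_a": 60, "group_b": 30}

ratios = impact_ratios(selected, applicants)
# Four-fifths rule: groups whose ratio falls below 0.8 warrant scrutiny.
flagged = [g for g, r in ratios.items() if r < 0.8]
```

In this made-up example group_b's selection rate (0.20) is two-thirds of group_a's (0.30), so its impact ratio is roughly 0.67 and falls below the 0.8 threshold. Whether that finding requires any action under LL 144 is the question the exercise asks you to answer.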
Exercise 4 ⭐ Explain the mathematical incompatibility of fairness metrics that the ProPublica/Northpointe COMPAS controversy revealed. Define demographic parity, equalized odds, and calibration (predictive parity), and explain why these three metrics cannot all be satisfied simultaneously when base rates differ across groups.
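The incompatibility in Exercise 4 can be verified numerically using the algebraic identity FPR = p/(1-p) · (1-PPV)/PPV · TPR, which follows from the definitions of the confusion-matrix quantities. The sketch below uses invented base rates: holding PPV (predictive parity) and TPR equal across two groups with different base rates forces the false positive rates apart, so equalized odds cannot also hold.

```python
def fpr(base_rate: float, ppv: float, tpr: float) -> float:
    """False positive rate implied by a group's base rate, PPV, and TPR.

    Derivation: PPV = TP/(TP+FP) gives FP = TP*(1-PPV)/PPV, and
    TP = TPR * base_rate * N, while negatives number (1-base_rate) * N.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

ppv, tpr = 0.7, 0.65            # held equal across both groups (calibration-style parity)
fpr_a = fpr(0.51, ppv, tpr)     # hypothetical higher-base-rate group
fpr_b = fpr(0.39, ppv, tpr)     # hypothetical lower-base-rate group
# fpr_a > fpr_b, so equalized odds fails despite equal PPV and TPR.
```

The gap (roughly 0.29 versus 0.18 here) exists for any choice of unequal base rates with nondegenerate PPV and TPR, which is the core of the COMPAS dispute: each side was measuring a metric that the other side's metric mathematically precludes.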
Exercise 5 ⭐⭐ You have been hired to conduct a pre-deployment algorithmic impact assessment for an AI-based tool that will be used by a large urban school district to predict which middle school students are at risk of dropping out before graduation. The tool will be trained on historical student data including grades, attendance, disciplinary records, and family income indicators. Identify five specific potential harms the tool could cause, explain how you would test for each, and propose mitigation measures for the three most serious.
Part B: Case Analysis
Exercise 6 ⭐⭐ Compare the ProPublica COMPAS audit with a standard technical audit that would have required full model access. For each of the following dimensions, explain what the output-only audit (ProPublica's methodology) could and could not determine: (a) whether racial disparities exist in predictions; (b) why the disparities exist; (c) whether the tool is more or less accurate than human prediction; (d) whether the disparities arise from model design or training data. What are the implications of these limitations for how output-only audits should be used and disclosed?
Exercise 7 ⭐⭐ NYC Local Law 144 has been criticized for its narrow scope, limited audit standards, and absence of validity requirements. Using the law's text and implementing rules, identify three specific provisions you would amend and explain how your amendments would strengthen the law's effectiveness. Address the counterarguments from affected industries for each amendment.
Exercise 8 ⭐⭐ The chapter discusses the "audit without access" approach. Identify a domain other than criminal justice where this approach could be applied — a domain where AI system outputs and outcomes are both accessible without the deploying organization's cooperation. Describe: (a) what AI system you would audit; (b) how you would construct the audit dataset; (c) what fairness metrics you would examine; and (d) what limitations your approach would have.
Exercise 9 ⭐⭐⭐ Facebook's internal research teams conducted sophisticated studies of the news feed algorithm's effects on user wellbeing and found significant harms. This internal research did not produce an adequate organizational response. Design an internal AI audit function for a social media platform of Facebook's scale that would have produced a more effective response to these findings. Your design should specify: governance structure (who does the audit function report to?); scope (what systems and harms are covered?); process (how are findings documented, escalated, and acted upon?); and accountability (what happens when management ignores significant findings?).
Exercise 10 ⭐⭐⭐ † In teams of three, conduct a simulated bias audit of a public AI system. Choose one of the following publicly available AI systems: (a) a resume screening tool with a free trial; (b) a publicly accessible risk assessment rubric used in your state's child welfare system; or (c) a credit pre-qualification tool available online. Design a testing methodology, collect data by systematically varying inputs, calculate relevant metrics, and present your findings in a written report in the format of a professional bias audit disclosure. Discuss the limitations of your methodology.
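For the "systematically varying inputs" step of Exercise 10, one common design is paired testing: submit two inputs identical except for a demographic proxy such as a name, and compare the outputs (the design used in Bertrand and Mullainathan-style correspondence studies). Everything below is a hypothetical scaffold: `score_resume` is a stand-in for whichever system your team audits, and the name lists are illustrative.

```python
def score_resume(resume: dict) -> float:
    # Stand-in for the audited system; replace with calls to the real tool.
    # This stub scores on skills alone, so paired scores come out identical.
    return min(1.0, len(resume.get("skills", [])) / 10)

# Illustrative demographic-proxy names; a real protocol would use validated lists.
NAMES = {"group_a": "Emily", "group_b": "Lakisha"}

def paired_trial(base_resume: dict) -> dict[str, float]:
    """Score the same resume under names associated with different groups."""
    return {g: score_resume({**base_resume, "name": n}) for g, n in NAMES.items()}

trial = paired_trial({"skills": ["python", "sql", "excel"]})
gap = trial["group_a"] - trial["group_b"]  # per-pair gaps aggregate into disparity metrics
```

Aggregating the per-pair gaps over many trials yields the selection-rate and score disparities your written report should tabulate, alongside the sample sizes needed to say whether the gaps are statistically meaningful.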
Part C: Critical Thinking and Analysis
Exercise 11 ⭐⭐ The chapter notes that requiring auditors to calculate and disclose specific metrics creates an opportunity for gaming — optimizing the audited system to score well on specified metrics without addressing underlying bias. This dynamic is an instance of Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure." How would you design an AI audit requirement that minimizes gaming while remaining practically implementable? What tradeoffs does your approach involve?
Exercise 12 ⭐⭐ Evaluate this claim: "Impact assessments are only as good as the imagination of those conducting them." What does this mean in practice? What failures of imagination have produced inadequate impact assessments in historical cases? How would you design an AIA process that systematically overcomes the limitations of assessors' imagination?
Exercise 13 ⭐⭐⭐ The EU AI Act's conformity assessment framework allows most high-risk AI systems to undergo self-assessment — the organization itself assesses whether its system complies with the Act's requirements. Only the highest-risk AI systems require third-party assessment by a notified body. Construct an argument either for or against this approach. Your argument should engage with: the costs of third-party assessment; the conflicts of interest in self-assessment; the likely compliance quality under each approach; and the EU AI Act's other provisions that might compensate for self-assessment's weaknesses.
Exercise 14 ⭐⭐⭐ The chapter compares AI auditing to financial auditing, noting that financial audit's institutional infrastructure took decades to develop. Identify three specific features of financial audit's institutional infrastructure (standards body, professional credential, regulator, liability regime, mandatory disclosure) and for each: (a) explain how the feature contributes to financial audit effectiveness; (b) describe the AI auditing equivalent (if any currently exists); and (c) identify the specific obstacles to developing an equivalent for AI auditing.
Exercise 15 ⭐⭐⭐ † In teams, develop a model AI audit standard for a specific high-risk application of your team's choosing (suggestions: AI-based medical diagnosis, AI-based benefits eligibility determination, AI-based parole decisions). Your standard should specify: (a) what data the auditor must examine; (b) what technical analyses must be conducted; (c) what fairness metrics must be reported; (d) what conclusions the auditor must reach on specified questions; (e) what must be disclosed publicly; and (f) what auditor qualifications are required. Present your standard as a draft regulation.
Part D: Applied Professional Scenarios
Exercise 16 ⭐⭐ You are the Chief Compliance Officer of a major bank. Your bank uses an AI-based mortgage underwriting tool provided by a third-party vendor. The vendor has provided you with a summary audit report showing that the tool produces loan approval rates within the four-fifths rule across racial groups. Your legal team has flagged that you are required to conduct your own HMDA (Home Mortgage Disclosure Act) analysis under CFPB guidance. What additional audit steps would you take beyond relying on the vendor's audit? What contractual rights would you want in the vendor agreement?
Exercise 17 ⭐⭐ You are a data scientist at a health insurance company that uses an AI tool to flag members for care management programs. A colleague shows you analysis suggesting that the tool is systematically under-flagging Black members compared to white members with similar health conditions. The tool was developed by a third-party vendor. What are your professional obligations? What steps would you take? To whom would you escalate? What documentation would you create?
Exercise 18 ⭐⭐⭐ You have been hired by a city government to conduct an external audit of an AI tool used to predict child maltreatment risk — an AI system that helps child welfare caseworkers prioritize which families to investigate when a maltreatment report is made. The tool's developer will not provide access to the model or training data. Describe in detail: (a) what data sources you would attempt to use for an output-based audit; (b) what specific analyses you would conduct; (c) what fairness metrics are most appropriate for this context and why; (d) what conclusions you could and could not draw from an output-only audit; and (e) what you would recommend to the city if you found significant racial disparities.
Exercise 19 ⭐⭐⭐ † Your team has been hired to design a red-teaming protocol for a large language model that is being deployed as a customer service chatbot for a major bank. The bank is concerned about the following risks: providing incorrect financial advice; generating discriminatory content in responses; being manipulated into revealing confidential information; and generating content that could be used in financial fraud. For each risk, design: (a) specific test inputs designed to elicit the harmful behavior; (b) evaluation criteria for what constitutes a harmful response; and (c) severity ratings for identified failures. Present your red-teaming protocol as a professional document.
Exercise 20 ⭐⭐⭐⭐ Comparative analysis: Identify an AI system deployed by a U.S. government agency (federal, state, or local) in a consequential domain (benefits, criminal justice, immigration, housing). Research the audit or oversight requirements applicable to this system under current law. Then: (a) describe the current legal requirements; (b) assess the adequacy of current oversight relative to the system's risk level; (c) compare the current requirements to what the EU AI Act would require for an equivalent system; and (d) recommend specific oversight improvements, with implementation mechanisms. (1,200–1,500 words)
Part E: Research and Writing
Exercise 21 ⭐⭐ Research the Datasheets for Datasets and Model Cards initiatives. What information does each require? How are they similar to and different from each other? What are their limitations as accountability mechanisms? Identify one currently deployed AI system whose model card or dataset documentation is publicly available, and evaluate whether the documentation is adequate to support meaningful external auditing.
Exercise 22 ⭐⭐⭐ Research the Federal Reserve's SR 11-7 guidance on model risk management. Identify three specific requirements of SR 11-7 that are applicable to AI-based models used in financial services. For each requirement, explain: (a) what it requires; (b) how it has been applied to AI models in practice (use specific examples if available); and (c) what gaps remain in SR 11-7's coverage of advanced AI systems. How should SR 11-7 be updated to address modern AI systems? (700–900 words)
Exercise 23 ⭐⭐⭐ † Team research project: Identify three employers in your region who use AI-based hiring tools. Research: (a) which AI tools they use (if publicly disclosed); (b) whether they have published bias audits under NYC LL 144 or voluntarily; (c) if audits are published, what the audits found and what the quality of the audit methodology appears to be; and (d) whether the employers have made any public statements about their use of AI in hiring. Present your findings in a 15-minute presentation comparing the three employers' approaches to AI audit transparency.
Exercise 24 ⭐⭐⭐⭐ Policy brief: You have been asked to draft a memorandum to the EEOC Commissioner recommending guidance on AI audit requirements for employment selection tools. Your memorandum (1,200–1,500 words) should: (a) summarize the current legal requirements under Title VII and UGESP for AI selection tools; (b) identify the gaps in current requirements relative to the specific challenges of AI systems; (c) propose specific new guidance on what audits must cover, what metrics must be calculated, what must be disclosed, and what triggers a compliance investigation; and (d) address the legal authority of the EEOC to issue such guidance and the likely industry response.
Exercise 25 ⭐⭐⭐⭐ † Capstone exercise: Working in teams, design a comprehensive AI audit framework for a sector of your team's choosing. The framework should address: (a) what AI systems are subject to mandatory auditing; (b) what types of audits are required (technical, process, impact, compliance); (c) what the audit must cover at minimum; (d) who can conduct the audit (independence and credential requirements); (e) what must be publicly disclosed; (f) what the consequences of audit failure are; (g) how the framework fits into the broader regulatory landscape of the chosen sector. Present the framework as a draft regulation and an implementation memo addressed to regulated entities.