Exercises: Chapter 12 — Bias in Healthcare AI
Difficulty ratings: ⭐ Foundational | ⭐⭐ Developing | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Expert
† marks exercises recommended for in-class use or group discussion
Part A: Foundational Comprehension (⭐)
Exercise 1 ⭐ Define proxy bias in your own words, without using technical jargon. Then explain, in two or three sentences, how the Optum health risk algorithm illustrates proxy bias. Why did the algorithm produce racially disparate outcomes even though race was not an input variable?
Exercise 2 ⭐ Match each source of healthcare data bias to its primary mechanism:
| Source | Mechanism |
|---|---|
| Clinical trial underrepresentation | A. Documentation reflects provider attitudes |
| EHR data skew | B. Algorithm learns historically biased treatment patterns |
| Clinical note bias | C. Training data reflects who accessed care, not population |
| Historical treatment as training signal | D. Excluded populations are absent from foundational evidence |
| Measurement tool inaccuracy | E. Physical instrument performs differently across groups |
Write a brief explanation for each match.
Exercise 3 ⭐ The Fitzpatrick skin tone scale ranges from type I (very light) to type VI (deeply pigmented). The Adamson and Smith (2018) study found that fewer than 5 percent of images in major dermatology AI training datasets depicted Fitzpatrick types V and VI.
a. What is the likely consequence for AI classifier performance on patients with Fitzpatrick type V or VI skin? b. Why does underrepresentation in training data degrade model performance on those groups, rather than simply leaving predictions unavailable for them? c. Why might a high overall accuracy score for a skin lesion classifier mask poor performance on darker skin tones?
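Part (c) can be made concrete with arithmetic. The sketch below uses hypothetical counts (the group sizes and correct-prediction counts are invented for illustration, not taken from any study) to show how a small, poorly served subgroup barely moves the overall accuracy figure:

```python
# Hypothetical counts for a skin lesion classifier (illustrative only).
groups = {
    "Fitzpatrick I-IV": {"n": 950, "correct": 912},  # well represented in training data
    "Fitzpatrick V-VI": {"n": 50,  "correct": 30},   # underrepresented
}

total_n = sum(g["n"] for g in groups.values())
total_correct = sum(g["correct"] for g in groups.values())
overall_accuracy = total_correct / total_n

for name, g in groups.items():
    print(f"{name}: accuracy = {g['correct'] / g['n']:.1%}")
print(f"Overall: accuracy = {overall_accuracy:.1%}")
```

With these assumed counts, the classifier is 96.0% accurate on types I–IV but only 60.0% accurate on types V–VI, yet the overall accuracy is 94.2%, because the underrepresented group contributes only 5% of the test set. Aggregate metrics weight groups by their sample size, not by the stakes for the patients in them.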
Exercise 4 ⭐ Explain the concept of the "Yentl syndrome" as described by Bernadine Healy and its relevance to AI systems trained on clinical data. What types of AI systems would be most affected by this legacy, and why?
Exercise 5 ⭐ What is a model card? List five specific pieces of information that a model card for a healthcare AI system should include. Why is each piece of information relevant to a health system making a procurement decision?
Part B: Application and Analysis (⭐⭐)
Exercise 6 ⭐⭐ † A hospital system is considering deploying a commercially available AI tool to predict 30-day readmission risk for hospitalized patients. The vendor reports a single overall discrimination statistic (AUC = 0.82) but no demographic breakdown.
a. What specific questions should the hospital procurement team ask the vendor before purchasing? b. What data would they need access to in order to conduct their own demographic evaluation? c. If the vendor refuses to share training data demographics or subgroup performance data, citing proprietary concerns, what should the hospital do? d. Draft a brief contract clause requiring the vendor to provide demographic performance data.
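For part (b): if the hospital can obtain scored predictions with outcomes and demographic labels, the evaluation itself is straightforward. A minimal sketch, using a tiny invented dataset and the rank-based (Mann-Whitney) definition of AUC, shows how a healthy overall AUC can coexist with poor discrimination in one subgroup:

```python
def auc(labels, scores):
    """Mann-Whitney AUC: probability a random positive case outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scored predictions: (subgroup, true readmission outcome, model risk score).
records = [
    ("group A", 1, 0.9), ("group A", 1, 0.8), ("group A", 0, 0.3), ("group A", 0, 0.2),
    ("group B", 1, 0.6), ("group B", 0, 0.7), ("group B", 1, 0.4), ("group B", 0, 0.5),
]

overall = auc([y for _, y, _ in records], [s for _, _, s in records])
by_group = {
    g: auc([y for gg, y, _ in records if gg == g], [s for gg, _, s in records if gg == g])
    for g in {r[0] for r in records}
}
print(f"Overall AUC: {overall:.2f}")
print(f"Per-group AUC: {by_group}")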
Exercise 7 ⭐⭐ The eGFR race correction adjusted estimated kidney function upward for Black patients, which its designers argued made the estimate more accurate. Yet it delayed Black patients' access to transplant referrals.
a. Explain the mechanism by which a "more accurate" measurement could produce a worse clinical outcome. b. What should the clinical community have done when introducing the race correction in 1999 to reduce the risk of the harm that was eventually documented? c. The NKF/ASN in 2021 recommended eliminating race from eGFR calculations. Formulate the counterargument — what would a defender of the race correction say? Then respond to that argument.
Exercise 8 ⭐⭐ A mental health tech startup has built an AI-powered depression screening tool. The tool was validated in a clinical trial at a university student health center, with a sample that was 78% white, 12% Asian, and 6% Black. The startup wants to deploy the tool in a network of community mental health centers serving predominantly Black and Hispanic populations in urban areas.
a. Identify at least three specific equity concerns with this deployment plan. b. What steps could the startup take before deployment to reduce these risks? c. Should deployment proceed before additional demographic validation? Justify your answer with reference to both the potential benefits and potential harms.
Exercise 9 ⭐⭐ Describe the "substitution risk" in AI mental health care. Under what conditions is AI substitution for human mental health care most likely to occur, and in which populations? Why is substitution more concerning in mental health than in some other clinical domains? What policy or design interventions could reduce substitution risk?
Exercise 10 ⭐⭐ † Compare the regulatory status of: a. An AI-powered radiology tool that analyzes chest X-rays to detect pneumonia b. A commercial health risk stratification algorithm embedded in a hospital EHR c. A consumer mental health chatbot marketed as a "wellness" application
For each, describe the likely FDA regulatory pathway (or exemption) that applies, and identify any gaps in oversight that the current regulatory framework leaves unaddressed. What reforms would close those gaps?
Exercise 11 ⭐⭐ The Obermeyer et al. study found that replacing healthcare cost with clinical indicators of disease burden produced a more equitable risk stratification algorithm. Consider a commercial vendor's perspective:
a. What are the business incentives for using cost as a training target rather than clinical burden indicators? b. What are the technical challenges of using clinical burden indicators? c. What regulatory or market interventions would create incentives for vendors to adopt more equitable training targets?
Exercise 12 ⭐⭐ Reproductive health AI tools — fertility trackers, pregnancy complication predictors — are often marketed as wellness applications, allowing them to avoid FDA oversight. Yet their outputs may influence clinical decisions.
a. Where do you think the line between "wellness application" and "medical device" should be drawn? b. What specific equity concerns arise in this category of tool? c. Design a brief regulatory framework that would address these concerns without impeding beneficial innovation.
Part C: Synthesis and Critical Thinking (⭐⭐⭐)
Exercise 13 ⭐⭐⭐ † The following argument has been made by some AI researchers and industry groups: "Requiring pre-deployment demographic performance testing will slow healthcare AI innovation. The patients who would benefit most from the technology being delayed are often the very populations whose equity is at stake. Faster deployment, even if imperfect, serves health equity better than cautious delay."
Write a 500-word response to this argument. Your response should:
- Identify the strongest version of the argument
- Identify the assumptions it depends on
- Evaluate whether those assumptions are empirically supported
- Articulate your own position with justification
Exercise 14 ⭐⭐⭐ Intersectionality, as a framework for understanding compounded disadvantage, has implications for how AI equity evaluation is designed.
a. Explain why evaluating racial bias and gender bias separately, rather than jointly, might miss harms affecting Black women. b. Design an evaluation protocol for a pregnancy complication prediction AI that would detect intersectional harms. What data would you need? What statistical approaches would you use? c. What practical challenges make intersectional bias evaluation more difficult than single-axis evaluation, and how might they be addressed?
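Part (a) can be illustrated numerically. The sketch below uses invented false-negative counts (all figures are hypothetical) to show how pooling over one demographic axis at a time dilutes a harm concentrated at the intersection of two axes:

```python
from collections import defaultdict

# Hypothetical counts among true positive cases (illustrative only).
# key: (race, sex) -> (true positives, false negatives)
counts = {
    ("white", "male"):   (400, 40),
    ("white", "female"): (400, 40),
    ("Black", "male"):   (150, 15),
    ("Black", "female"): (50,  20),
}

def marginal_fnr(axis):
    """False-negative rate pooled over one demographic axis (0 = race, 1 = sex)."""
    pos, fn = defaultdict(int), defaultdict(int)
    for key, (p, f) in counts.items():
        pos[key[axis]] += p
        fn[key[axis]] += f
    return {g: fn[g] / pos[g] for g in pos}

joint_fnr = {k: f / p for k, (p, f) in counts.items()}

print("By race:", marginal_fnr(0))
print("By sex:", marginal_fnr(1))
print("Joint subgroups:", joint_fnr)
```

With these assumed counts, every single-axis view looks only modestly unequal: the false-negative rate is 17.5% for Black patients overall and about 13.3% for women overall. But the joint subgroup of Black women has a 40% false-negative rate, invisible to either marginal analysis because that subgroup is small relative to the pooled groups it is averaged into. This is the statistical core of intersectional evaluation, and also hints at the challenge in part (c): the joint cells that matter most are often the smallest.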
Exercise 15 ⭐⭐⭐ A study finds that a sepsis prediction AI performs significantly better for white patients than for Black patients in a hospital system serving a racially diverse urban population. The hospital's AI governance committee must decide what to do. Consider the following options:
Option A: Immediately discontinue use of the AI for all patients until the performance gap is addressed.
Option B: Continue use of the AI for all patients, but flag its known performance limitation to clinicians.
Option C: Continue use for white patients, suspend for Black patients pending further evaluation.
Option D: Engage the vendor to remediate the performance gap within 90 days, during which time all clinicians are briefed on the limitation.
For each option, identify its primary clinical and ethical strengths and weaknesses. Which option would you recommend, and why? What additional information would change your recommendation?
Exercise 16 ⭐⭐⭐ The FDA's proposed 2023 guidance on demographic performance reporting for AI medical devices requires manufacturers to demonstrate adequate performance across demographic subgroups. Opponents argue that:
i. Perfect performance parity is statistically unachievable for small subgroup samples
ii. Defining "adequate" performance across subgroups is inherently arbitrary
iii. The requirement will prevent beneficial AI from reaching patients who need it
Evaluate each objection. For each: identify what is factually accurate in the objection, what is exaggerated or unsupported, and what a well-designed regulatory framework would do to address the legitimate concern while preserving the equity mandate.
Exercise 17 ⭐⭐⭐ † The "historical data problem" — that AI trained on records of discriminatory care will learn those discriminatory patterns — may not be fully addressable by collecting more diverse data.
a. Explain the historical data problem using a concrete example from this chapter. b. Identify three technical or methodological approaches that could partially address this problem. c. Identify the limits of each approach. d. Make an argument for why addressing the historical data problem requires changes in clinical practice, not just data science methodology.
Part D: Expert-Level and Scenario Analysis (⭐⭐⭐⭐)
Exercise 18 ⭐⭐⭐⭐ You are the chief ethics officer of a major hospital system. You have been asked to develop a comprehensive healthcare AI equity policy that will govern all AI procurement and deployment decisions for the system. The policy must address:
a. Pre-procurement due diligence requirements b. Conditions under which AI deployment may be approved, conditionally approved, or rejected c. Post-deployment monitoring requirements and triggering thresholds for remediation d. Community engagement requirements for AI affecting specific populations e. Transparency obligations to patients about AI use in their care f. Staff training requirements regarding AI equity
Write the key sections of this policy (approximately 800 words) in policy language suitable for adoption by a healthcare system board.
Exercise 19 ⭐⭐⭐⭐ Consider the claim: "Healthcare AI bias is ultimately a problem of inadequate data, and the solution is more and better data collection from underrepresented populations."
Challenge this claim by: a. Identifying three documented cases from this chapter where the problem is not primarily a data quantity or diversity issue b. Articulating what structural changes — beyond data collection — would be necessary to address healthcare AI bias comprehensively c. Explaining who would need to make those structural changes and what their incentives are or are not to do so d. Evaluating whether comprehensive healthcare AI equity is achievable within the current structure of the U.S. healthcare market
Exercise 20 ⭐⭐⭐⭐ † You are advising a national government in a middle-income country that is considering adopting AI diagnostic tools from major U.S. and European technology companies to extend clinical decision support to underserved rural populations. The tools were validated in high-income country populations.
Draft a policy brief (approximately 600 words) addressed to the health minister that: a. Identifies the specific equity risks of adopting AI validated on demographically different populations b. Recommends conditions under which adoption should or should not proceed c. Proposes a national monitoring and evaluation framework d. Addresses the political reality that refusing to adopt these tools may mean continued underservice of rural populations e. Identifies what the government should demand from AI vendors as conditions of market access
Exercise 21 ⭐⭐⭐⭐ The accountability gap identified in this chapter — vendors blame deploying health systems, health systems blame vendors, patients cannot know they were affected — is a structural problem, not just a communication failure.
a. Map the accountability gap using a stakeholder analysis: identify each actor (AI developer, health system, clinician, regulator, patient), their role in the harm-producing chain, and their claimed reasons for limited accountability. b. Identify analogous accountability gaps in other industries (financial services, consumer products, environmental harm) and the mechanisms society has used to close them. c. Propose a comprehensive accountability framework for healthcare AI that specifies where accountability should reside for different types of harms (development defects, deployment failures, post-deployment drift) and how it should be enforced.
Exercise 22 ⭐⭐⭐⭐ A major health insurance company proposes using an AI algorithm to predict which members are at high risk of opioid use disorder, enabling proactive outreach and preventive intervention. The company argues this is an equity-positive use of AI — it will reach at-risk members before addiction develops. A health equity researcher argues it will stigmatize certain populations and could be used to deny coverage.
a. Map the equity arguments on both sides. b. Identify what evidence would help resolve the empirical disagreements underlying this debate. c. Identify the irreducible values disagreements that empirical evidence cannot resolve. d. Propose a governance process by which this specific use case could be evaluated fairly, including who should participate in the evaluation and what standards should apply.
Exercise 23 ⭐⭐⭐⭐ The clinical integration of AI — the design of how AI outputs are presented within clinical workflows — has significant implications for equity, because it determines whether clinicians can exercise judgment that compensates for algorithmic bias.
a. Design a workflow integration framework for a high-risk clinical AI (e.g., a sepsis prediction tool) that maximizes clinician ability to identify and compensate for potential demographic performance gaps. b. Identify the tensions between your framework's equity goals and the efficiency goals that drive AI adoption. c. Propose evidence-based methods for testing whether your workflow integration design successfully enables equitable clinical judgment.
Exercise 24 ⭐⭐⭐⭐ † The FDA exempts clinical decision support software from medical device regulation when the software "displays, analyzes, or prints medical information about a patient or other medical information such as peer reviewed clinical studies and clinical practice guidelines" and "supports or provides recommendations to a health care professional about prevention, diagnosis, or treatment of a disease or condition" in a way that allows the clinician to independently review the basis for the recommendation.
a. Identify three health risk stratification or clinical AI products that might argue for this exemption even though they have significant clinical impact. b. Evaluate whether the exemption, as worded, is adequate to protect patients from harm from these tools. c. Draft alternative exemption language that would better protect patient safety and equity while preserving innovation space for genuinely low-risk CDS tools.
Exercise 25 ⭐⭐⭐⭐ Comprehensive essay question (1,200–1,500 words):
"The healthcare AI bias crisis is not primarily a technology problem — it is a governance problem. The technology to build more equitable healthcare AI exists. The data collection methods to improve demographic representation exist. The regulatory authority to require demographic performance testing exists. What is missing is the political will, the market incentives, and the accountability mechanisms to require that these tools be built and deployed equitably."
Evaluate this claim. Your essay should:
- Define what you mean by a "technology problem" versus a "governance problem"
- Examine the evidence for and against the claim that adequate technical solutions exist
- Identify the specific governance failures — in regulation, in market structure, in professional norms, in procurement practice — that have allowed biased tools to proliferate
- Propose a governance reform agenda with specific, actionable recommendations
- Acknowledge the strongest counterarguments and respond to them
- Conclude with your own assessment of what the most important single reform would be, and why