Chapter 25 Quiz: Bias in AI Systems
Question 1
What is the most accurate description of AI bias?
A) A rare defect in poorly designed models that can be eliminated with better engineering
B) A systematic and repeatable error in an AI system that creates unfair outcomes for particular groups, often without discriminatory intent
C) The difference between a model's average prediction and the true value, caused by model underfitting
D) An intentional design choice by data scientists who want to discriminate against certain populations
Question 2
Athena's HR screening model amplified the bias in its training data: 78% of model-recommended candidates were under 35, compared to 62% in the historical hiring data. Why do machine learning models tend to amplify existing biases rather than simply replicate them?
A) Machine learning models are programmed to favor the majority class
B) Gradient descent algorithms contain inherent age discrimination
C) Models optimize for patterns that predict historical outcomes, and correlated features compound the signal, pushing predictions further in the direction of the dominant pattern
D) The AutoML tool Athena used was specifically designed to maximize speed over fairness
Question 3
Which of the following best describes historical bias in training data?
A) The data contains recording errors from outdated measurement instruments
B) The data faithfully reflects a world that was itself unfair, and the model learns to replicate that unfairness
C) The data was collected too long ago to be relevant to current conditions
D) The data was deliberately manipulated to produce biased outcomes
Question 4
A credit scoring model does not include race as an input feature. Can the model still produce racially biased outcomes?
A) No: if race is not an input, the model cannot discriminate by race
B) Yes: proxy variables such as ZIP code, name frequency, and school attended can reconstruct racial information from correlated features
C) Only if the model was trained on data that explicitly included race
D) Only if the data scientist intentionally included proxy variables
Question 5
The Obermeyer et al. (2019) study in Science found that a widely used healthcare algorithm discriminated against Black patients. What was the mechanism of bias?
A) The algorithm explicitly used race as a feature and assigned lower scores to Black patients
B) The algorithm used healthcare costs as a proxy for health needs, but Black patients incurred lower costs for the same level of illness due to systemic barriers to care
C) The algorithm was trained on data from a single hospital that served primarily white patients
D) The algorithm used an outdated clinical threshold that did not account for biological differences
Question 6
In the Gender Shades study (Buolamwini & Gebru, 2018), the three commercial facial recognition systems evaluated all had overall accuracy above 90%. Why was this aggregate metric misleading?
A) The benchmark dataset was too small to draw conclusions
B) The 90% figure included test images that were too easy and inflated the results
C) Disaggregated analysis revealed error rates as high as 34.7% for darker-skinned women, a disparity invisible in the aggregate
D) The researchers used an unfair testing methodology that disadvantaged the vendors
Question 7
The "four-fifths rule" for disparate impact states that:
A) A selection process is biased if it selects fewer than 80% of candidates from any group
B) A selection process has adverse impact if the selection rate for a protected group is less than 80% of the selection rate for the group with the highest rate
C) At least four out of five candidates in any selection process must be from the majority group
D) Model accuracy must be at least 80% for all demographic subgroups
Question 8
Athena's HR screening model has a selection rate of 45% for candidates under 35 and 19.6% for candidates 35 and over. What is the disparate impact ratio?
A) 0.436
B) 2.296
C) 0.800
D) 0.254
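For readers checking their arithmetic, the four-fifths rule from Question 7 reduces to a one-line computation. The sketch below uses illustrative selection rates, not the figures from this quiz, so it does not give the answer away:

```python
def disparate_impact_ratio(rate_protected: float, rate_highest: float) -> float:
    """Ratio of a group's selection rate to the highest group's selection rate."""
    if rate_highest <= 0:
        raise ValueError("highest selection rate must be positive")
    return rate_protected / rate_highest

# Illustrative rates (not the quiz's figures): 30% vs. 50% selection.
ratio = disparate_impact_ratio(0.30, 0.50)
print(round(ratio, 3))   # 0.6
print(ratio >= 0.8)      # False: fails the four-fifths rule
```

A ratio below 0.8 flags potential adverse impact under the rule; it is a screening heuristic, not a legal verdict.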
Question 9
What is the key difference between demographic parity and equalized odds as fairness metrics?
A) Demographic parity is used for hiring; equalized odds is used for lending
B) Demographic parity requires equal selection rates across groups regardless of qualification; equalized odds requires equal true positive and false positive rates, conditioning on actual outcome
C) Demographic parity is legally required; equalized odds is optional
D) There is no meaningful difference; they always produce the same result
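The distinction in Question 9 is easiest to see on a toy example where the two metrics disagree. In this sketch (the predictions and labels are invented for illustration), a perfectly accurate classifier satisfies equalized odds in both groups yet violates demographic parity, simply because the groups' base rates differ:

```python
def selection_rate(preds):
    """Fraction of individuals the model selects, ignoring true outcomes."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Fraction of actual positives the model selects."""
    pos = [p for p, y in zip(preds, labels) if y == 1]
    return sum(pos) / len(pos)

def false_positive_rate(preds, labels):
    """Fraction of actual negatives the model selects."""
    neg = [p for p, y in zip(preds, labels) if y == 0]
    return sum(neg) / len(neg)

# Invented groups: the classifier is perfect, but base rates differ.
preds_a, labels_a = [1, 1, 0, 0], [1, 1, 0, 0]
preds_b, labels_b = [1, 0, 0, 0], [1, 0, 0, 0]

# Demographic parity compares raw selection rates: 0.5 vs. 0.25 (violated).
print(selection_rate(preds_a), selection_rate(preds_b))

# Equalized odds conditions on the true outcome: TPR 1.0 vs. 1.0,
# FPR 0.0 vs. 0.0 (satisfied in both groups).
print(true_positive_rate(preds_a, labels_a), true_positive_rate(preds_b, labels_b))
print(false_positive_rate(preds_a, labels_a), false_positive_rate(preds_b, labels_b))
```

The same base-rate mismatch is what drives the impossibility result tested in Question 10.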
Question 10
Chouldechova (2017) proved that when base rates differ across groups, it is mathematically impossible to simultaneously satisfy:
A) Accuracy, precision, and recall
B) Calibration, equal false positive rates, and equal false negative rates
C) Demographic parity, equalized odds, and model accuracy above 90%
D) Pre-processing fairness, in-processing fairness, and post-processing fairness
Question 11
Amazon's AI recruiting tool, trained on 10 years of resume data, penalized resumes containing the word "women's." Which source of bias is primarily responsible?
A) Measurement bias: the resumes were measured incorrectly
B) Historical bias: the training data reflected a decade of predominantly male hiring in tech, and the model learned to replicate that pattern
C) Evaluation bias: the test set did not include enough women's resumes
D) Deployment bias: the model was used for a purpose it was not designed for
Question 12
A predictive policing model directs more patrols to neighborhoods with high historical arrest rates, leading to more arrests in those neighborhoods, which generates more training data reinforcing the original prediction. This is an example of:
A) Historical bias
B) Representation bias
C) A feedback loop
D) Aggregation bias
Question 13
Which mitigation strategy modifies the model's outputs after training to achieve fairness goals?
A) Resampling the training data
B) Adversarial debiasing during training
C) Threshold adjustment using group-specific decision boundaries
D) Adding a fairness penalty to the loss function
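Of the strategies listed above, only one operates after training, on the model's scores. A minimal sketch of that post-processing approach (the scores, group names, and cutoffs below are hypothetical):

```python
def classify(score: float, group: str, thresholds: dict) -> int:
    """Apply a group-specific decision boundary to a trained model's score."""
    return int(score >= thresholds[group])

# Hypothetical cutoffs: the disadvantaged group gets a lower threshold so
# selection rates move toward parity without retraining the model itself.
thresholds = {"group_a": 0.60, "group_b": 0.45}

print(classify(0.50, "group_a", thresholds))  # 0: below group_a's cutoff
print(classify(0.50, "group_b", thresholds))  # 1: above group_b's cutoff
```

Because the underlying model is untouched, this technique trades a small amount of aggregate accuracy for fairness at the decision boundary, the tradeoff quantified later in Question 23.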
Question 14
Ravi's internal investigation reveals that only 12% of candidates rejected by Athena's HR screening model received a secondary human review. This pattern — where humans defer to algorithmic outputs even when they have the authority to override them — is called:
A) Confirmation bias
B) Automation bias
C) Anchoring bias
D) Deployment bias
Question 15
In the "three lines of defense" framework for AI bias:
A) First line = regulators, Second line = auditors, Third line = model builders
B) First line = model builders (test for bias before deployment), Second line = governance function (independent review), Third line = internal audit (periodic bias monitoring)
C) First line = training data, Second line = algorithm, Third line = output
D) First line = legal review, Second line = ethics committee, Third line = public relations
Question 16
A company removes gender from its loan approval model. After deployment, the model still shows significant gender-based disparate impact. The most likely explanation is:
A) The model memorized gender information from the training phase and retained it after removal
B) Other features in the model (occupation, part-time status, browser type, shopping patterns) serve as proxy variables that correlate with gender
C) The model is broken and needs to be retrained from scratch
D) The disparate impact is coincidental and not related to gender
Question 17
Under the EU AI Act (2024), AI systems used in employment decisions are classified as:
A) Minimal risk: no regulatory requirements
B) Limited risk: transparency obligations only
C) High risk: requiring conformity assessments, human oversight, and bias testing
D) Unacceptable risk: prohibited entirely
Question 18
Which statement best describes the relationship between model accuracy and fairness?
A) Higher accuracy always leads to greater fairness
B) Fairness always reduces accuracy significantly, making it impractical for business applications
C) Aggregate accuracy can mask group-level injustice; a model can be highly accurate overall while systematically failing specific demographic groups
D) Accuracy and fairness are completely independent; improving one has no effect on the other
Question 19
An AI dermatology model achieves AUC > 0.90 for lighter skin tones but AUC < 0.75 for darker skin tones. The primary cause is:
A) Biological differences in skin cancer presentation that make darker-skinned patients inherently harder to diagnose
B) Representation bias: fewer than 5% of images in standard dermatology training datasets depict dark-skinned patients
C) The model architecture is fundamentally unsuited for images of darker skin
D) Evaluation bias: the AUC metric is unreliable for darker-skinned populations
Question 20
Professor Okonkwo says: "The most important anti-bias technology is not an algorithm. It is a culture where someone can say, 'I found a problem,' and the response is 'Thank you for finding it,' not 'You're slowing us down.'" This statement emphasizes that:
A) Technical solutions to bias are unnecessary as long as organizational culture is strong
B) Bias mitigation is primarily an organizational and cultural challenge, not just a technical one; tools and algorithms matter, but without a culture of accountability, they will not be used effectively
C) Data scientists should not be responsible for bias detection
D) AI development should be slowed down until bias is fully solved
Question 21
Which of the following is the best first step when a bias audit reveals disparate impact in a deployed model?
A) Lower the classification threshold for the disadvantaged group
B) Halt the model and conduct a root-cause investigation before choosing a mitigation strategy
C) Retrain the model with the sensitive attribute removed
D) Add more data from the disadvantaged group and retrain immediately
Question 22
NK says: "Bias isn't a bug. It's a feature of systems designed by and for a narrow slice of humanity." The AI Now Institute finding that supports this claim is:
A) AI models have an inherent mathematical tendency toward bias
B) The AI workforce is approximately 80% male and over 70% white in the US, and homogeneous teams produce systems with systematic blind spots
C) AI companies intentionally design systems to discriminate
D) Academic AI research focuses exclusively on populations in the United States
Question 23
A post-processing threshold adjustment reduces age-based disparate impact in Athena's model from a DI ratio of 0.574 to approximately 0.97. Overall accuracy drops from 67.8% to 65.1%. What is the most accurate characterization of this result?
A) The accuracy loss is unacceptable and the model should not be adjusted
B) The fairness improvement is modest and may not justify the accuracy reduction
C) A 2.7-percentage-point accuracy reduction is modest relative to the substantial reduction in discriminatory impact, and the original accuracy itself was inflated by the model's bias toward the majority group
D) Threshold adjustment always produces exactly this magnitude of accuracy-fairness tradeoff
Question 24
Lena Park explains that under US employment law, "the algorithm did it" is:
A) A valid legal defense against discrimination claims
B) Not a defense: the law treats AI-driven discrimination the same as human-driven discrimination, and algorithmic discrimination may be more legally risky because the pattern is systematic and provable
C) A valid defense only if the company can demonstrate that no human reviewed the algorithm's decisions
D) A defense that is accepted in US courts but not in EU courts
Question 25
Tom writes in his notebook: "Technical excellence without ethical awareness is not excellence. It is sophisticated negligence." In the context of this chapter, "sophisticated negligence" best describes:
A) Building accurate models while ignoring their impact on people, using technical skill to create systems that efficiently replicate and amplify historical injustice
B) Failing to achieve high model accuracy
C) Using outdated machine learning algorithms instead of modern ones
D) Deploying models without sufficient computational resources
Answer Key Guidance
This quiz covers material from Chapter 25 only. Questions 1-6 test conceptual understanding of bias sources and real-world cases. Questions 7-10 test fairness metrics and measurement. Questions 11-16 test bias mechanisms and mitigation. Questions 17-19 test legal and domain-specific applications. Questions 20-25 test organizational responsibility and synthesis. Detailed answer explanations are available in the appendix.