Chapter 25 Quiz: Bias in AI Systems


Question 1

What is the most accurate description of AI bias?

A) A rare defect in poorly designed models that can be eliminated with better engineering
B) A systematic and repeatable error in an AI system that creates unfair outcomes for particular groups, often without discriminatory intent
C) The difference between a model's average prediction and the true value, caused by model underfitting
D) An intentional design choice by data scientists who want to discriminate against certain populations


Question 2

Athena's HR screening model amplified the bias in its training data: 78% of model-recommended candidates were under 35, compared to 62% in the historical hiring data. Why do machine learning models tend to amplify existing biases rather than simply replicate them?

A) Machine learning models are programmed to favor the majority class
B) Gradient descent algorithms contain inherent age discrimination
C) Models optimize for patterns that predict historical outcomes, and correlated features compound the signal, pushing predictions further in the direction of the dominant pattern
D) The AutoML tool Athena used was specifically designed to maximize speed over fairness


Question 3

Which of the following best describes historical bias in training data?

A) The data contains recording errors from outdated measurement instruments
B) The data faithfully reflects a world that was itself unfair, and the model learns to replicate that unfairness
C) The data was collected too long ago to be relevant to current conditions
D) The data was deliberately manipulated to produce biased outcomes


Question 4

A credit scoring model does not include race as an input feature. Can the model still produce racially biased outcomes?

A) No — if race is not an input, the model cannot discriminate by race
B) Yes — proxy variables such as ZIP code, name frequency, and school attended can reconstruct racial information from correlated features
C) Only if the model was trained on data that explicitly included race
D) Only if the data scientist intentionally included proxy variables


Question 5

The Obermeyer et al. (2019) study in Science found that a widely used healthcare algorithm discriminated against Black patients. What was the mechanism of bias?

A) The algorithm explicitly used race as a feature and assigned lower scores to Black patients
B) The algorithm used healthcare costs as a proxy for health needs, but Black patients incurred lower costs for the same level of illness due to systemic barriers to care
C) The algorithm was trained on data from a single hospital that served primarily white patients
D) The algorithm used an outdated clinical threshold that did not account for biological differences


Question 6

In the Gender Shades study (Buolamwini & Gebru, 2018), all three commercial facial recognition systems evaluated had overall accuracy above 90%. Why was this aggregate metric misleading?

A) The benchmark dataset was too small to draw conclusions
B) The 90% figure included test images that were too easy and inflated the results
C) Disaggregated analysis revealed error rates as high as 34.7% for darker-skinned women — a disparity invisible in the aggregate
D) The researchers used an unfair testing methodology that disadvantaged the vendors


Question 7

The "four-fifths rule" for disparate impact states that:

A) A selection process is biased if it selects fewer than 80% of candidates from any group
B) A selection process has adverse impact if the selection rate for a protected group is less than 80% of the selection rate for the group with the highest rate
C) At least four out of five candidates in any selection process must be from the majority group
D) Model accuracy must be at least 80% for all demographic subgroups


Question 8

Athena's HR screening model has a selection rate of 45% for candidates under 35 and 19.6% for candidates 35 and over. What is the disparate impact ratio?

A) 0.436
B) 2.296
C) 0.800
D) 0.254
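The disparate impact ratio from Question 7's four-fifths rule is simple to compute; here is a minimal Python sketch using the selection rates given in Question 8 (the function name is illustrative, not from the chapter):

```python
def disparate_impact_ratio(rate_protected, rate_highest):
    """Ratio of a protected group's selection rate to the
    highest group's selection rate (the four-fifths rule
    flags ratios below 0.8 as adverse impact)."""
    return rate_protected / rate_highest

# Selection rates from Question 8: 19.6% (35 and over) vs. 45% (under 35)
ratio = disparate_impact_ratio(0.196, 0.45)
print(round(ratio, 3))   # 0.436
print(ratio < 0.8)       # True — fails the four-fifths rule
```

Note that the ratio always puts the lower selection rate in the numerator; dividing the other way (option B's 2.296) inverts the comparison the rule defines.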


Question 9

What is the key difference between demographic parity and equalized odds as fairness metrics?

A) Demographic parity is used for hiring; equalized odds is used for lending
B) Demographic parity requires equal selection rates across groups regardless of qualification; equalized odds requires equal true positive and false positive rates, conditioning on actual outcome
C) Demographic parity is legally required; equalized odds is optional
D) There is no meaningful difference; they always produce the same result
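The distinction in Question 9 can be made concrete in code. A minimal sketch, assuming binary labels and predictions as NumPy arrays (function names are illustrative, and every group/outcome cell is assumed non-empty):

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in selection rates across groups.
    Demographic parity ignores the true labels entirely."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_gaps(y_true, y_pred, group):
    """Gaps in positive-prediction rates, computed separately
    among true negatives (FPR gap) and true positives (TPR gap).
    Equalized odds conditions on the actual outcome."""
    gaps = {}
    for outcome, name in ((0, "fpr_gap"), (1, "tpr_gap")):
        mask = y_true == outcome
        rates = [y_pred[mask & (group == g)].mean()
                 for g in np.unique(group)]
        gaps[name] = max(rates) - min(rates)
    return gaps
```

A model can achieve a zero demographic-parity gap while still having large TPR/FPR gaps, and vice versa, which is why the two metrics can recommend different interventions.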


Question 10

Chouldechova (2017) proved that when base rates differ across groups, it is mathematically impossible to simultaneously satisfy:

A) Accuracy, precision, and recall
B) Calibration, equal false positive rates, and equal false negative rates
C) Demographic parity, equalized odds, and model accuracy above 90%
D) Pre-processing fairness, in-processing fairness, and post-processing fairness


Question 11

Amazon's AI recruiting tool, trained on 10 years of resume data, penalized resumes containing the word "women's." Which source of bias is primarily responsible?

A) Measurement bias — the resumes were measured incorrectly
B) Historical bias — the training data reflected a decade of predominantly male hiring in tech, and the model learned to replicate that pattern
C) Evaluation bias — the test set did not include enough women's resumes
D) Deployment bias — the model was used for a purpose it was not designed for


Question 12

A predictive policing model directs more patrols to neighborhoods with high historical arrest rates, leading to more arrests in those neighborhoods, which generates more training data reinforcing the original prediction. This is an example of:

A) Historical bias
B) Representation bias
C) A feedback loop
D) Aggregation bias


Question 13

Which mitigation strategy modifies the model's outputs after training to achieve fairness goals?

A) Resampling the training data
B) Adversarial debiasing during training
C) Threshold adjustment using group-specific decision boundaries
D) Adding a fairness penalty to the loss function
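Of the strategies listed above, threshold adjustment is the only one that operates on model outputs after training. A toy sketch of the idea (the function, group names, and numbers are invented for illustration, and real thresholds would be tuned on a validation set):

```python
def apply_group_thresholds(scores, groups, thresholds):
    """Post-processing: accept a candidate if their model score
    clears their group's decision boundary. The model itself is
    untouched; only the cutoffs differ by group."""
    return [score >= thresholds[g] for score, g in zip(scores, groups)]

# Hypothetical scores with a stricter cutoff for the over-selected group
scores = [0.9, 0.7, 0.6, 0.8, 0.5]
groups = ["under35", "under35", "over35", "over35", "under35"]
decisions = apply_group_thresholds(
    scores, groups, {"under35": 0.75, "over35": 0.55})
print(decisions)  # [True, False, True, True, False]
```

The appeal of this approach is that it requires no retraining; the cost is that group-specific cutoffs must be chosen, monitored, and legally reviewed.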


Question 14

Ravi's internal investigation reveals that only 12% of candidates rejected by Athena's HR screening model received a secondary human review. This pattern — where humans defer to algorithmic outputs even when they have the authority to override them — is called:

A) Confirmation bias
B) Automation bias
C) Anchoring bias
D) Deployment bias


Question 15

In the "three lines of defense" framework for AI bias:

A) First line = regulators, Second line = auditors, Third line = model builders
B) First line = model builders (test for bias before deployment), Second line = governance function (independent review), Third line = internal audit (periodic bias monitoring)
C) First line = training data, Second line = algorithm, Third line = output
D) First line = legal review, Second line = ethics committee, Third line = public relations


Question 16

A company removes gender from its loan approval model. After deployment, the model still shows significant gender-based disparate impact. The most likely explanation is:

A) The model memorized gender information from the training phase and retained it after removal
B) Other features in the model (occupation, part-time status, browser type, shopping patterns) serve as proxy variables that correlate with gender
C) The model is broken and needs to be retrained from scratch
D) The disparate impact is coincidental and not related to gender


Question 17

Under the EU AI Act (2024), AI systems used in employment decisions are classified as:

A) Minimal risk — no regulatory requirements
B) Limited risk — transparency obligations only
C) High risk — requiring conformity assessments, human oversight, and bias testing
D) Unacceptable risk — prohibited entirely


Question 18

Which statement best describes the relationship between model accuracy and fairness?

A) Higher accuracy always leads to greater fairness
B) Fairness always reduces accuracy significantly, making it impractical for business applications
C) Aggregate accuracy can mask group-level injustice; a model can be highly accurate overall while systematically failing specific demographic groups
D) Accuracy and fairness are completely independent — improving one has no effect on the other


Question 19

An AI dermatology model achieves AUC > 0.90 for lighter skin tones but AUC < 0.75 for darker skin tones. The primary cause is:

A) Biological differences in skin cancer presentation that make darker-skinned patients inherently harder to diagnose
B) Representation bias — fewer than 5% of images in standard dermatology training datasets depict dark-skinned patients
C) The model architecture is fundamentally unsuited for images of darker skin
D) Evaluation bias — the AUC metric is unreliable for darker-skinned populations


Question 20

Professor Okonkwo says: "The most important anti-bias technology is not an algorithm. It is a culture where someone can say, 'I found a problem,' and the response is 'Thank you for finding it,' not 'You're slowing us down.'" This statement emphasizes that:

A) Technical solutions to bias are unnecessary as long as organizational culture is strong
B) Bias mitigation is primarily an organizational and cultural challenge, not just a technical one — tools and algorithms matter, but without a culture of accountability, they will not be used effectively
C) Data scientists should not be responsible for bias detection
D) AI development should be slowed down until bias is fully solved


Question 21

Which of the following is the best first step when a bias audit reveals disparate impact in a deployed model?

A) Lower the classification threshold for the disadvantaged group
B) Halt the model and conduct a root-cause investigation before choosing a mitigation strategy
C) Retrain the model with the sensitive attribute removed
D) Add more data from the disadvantaged group and retrain immediately


Question 22

NK says: "Bias isn't a bug. It's a feature of systems designed by and for a narrow slice of humanity." The AI Now Institute finding that supports this claim is:

A) AI models have an inherent mathematical tendency toward bias
B) The AI workforce is approximately 80% male and over 70% white in the US, and homogeneous teams produce systems with systematic blind spots
C) AI companies intentionally design systems to discriminate
D) Academic AI research focuses exclusively on populations in the United States


Question 23

A post-processing threshold adjustment reduces age-based disparate impact in Athena's model from a DI ratio of 0.574 to approximately 0.97. Overall accuracy drops from 67.8% to 65.1%. What is the most accurate characterization of this result?

A) The accuracy loss is unacceptable and the model should not be adjusted
B) The fairness improvement is modest and may not justify the accuracy reduction
C) A 2.7-percentage-point accuracy reduction is modest relative to the substantial reduction in discriminatory impact, and the original accuracy itself was inflated by the model's bias toward the majority group
D) Threshold adjustment always produces exactly this magnitude of accuracy-fairness tradeoff


Question 24

Lena Park explains that under US employment law, "the algorithm did it" is:

A) A valid legal defense against discrimination claims
B) Not a defense — the law treats AI-driven discrimination the same as human-driven discrimination, and algorithmic discrimination may be more legally risky because the pattern is systematic and provable
C) A valid defense only if the company can demonstrate that no human reviewed the algorithm's decisions
D) A defense that is accepted in US courts but not in EU courts


Question 25

Tom writes in his notebook: "Technical excellence without ethical awareness is not excellence. It is sophisticated negligence." In the context of this chapter, "sophisticated negligence" best describes:

A) Building accurate models while ignoring their impact on people — using technical skill to create systems that efficiently replicate and amplify historical injustice
B) Failing to achieve high model accuracy
C) Using outdated machine learning algorithms instead of modern ones
D) Deploying models without sufficient computational resources


Answer Key Guidance

This quiz covers material from Chapter 25 only. Questions 1-6 test conceptual understanding of bias sources and real-world cases. Questions 7-10 test fairness metrics and measurement. Questions 11-16 test bias mechanisms and mitigation. Questions 17-19 test legal and domain-specific applications. Questions 20-25 test organizational responsibility and synthesis. Detailed answer explanations are available in the appendix.