Exercises — Chapter 10: Bias in Hiring and HR Systems

Part 2: Bias and Fairness

Difficulty Scale: ⭐ Foundational | ⭐⭐ Applied | ⭐⭐⭐ Analytical | ⭐⭐⭐⭐ Integrative

† = Recommended for graded assessment


Part A: Foundational Knowledge

Exercise 10.1 ⭐ Map the AI-powered hiring pipeline for a mid-sized technology company recruiting software engineers. For each of the following stages, identify (a) what the AI tool does, (b) one example vendor or tool type, and (c) one specific bias risk:

  • Candidate sourcing
  • Résumé screening
  • Skills assessment
  • Video interview
  • Offer generation

Exercise 10.2 ⭐ Define the following terms in your own words and provide one concrete example of each from the chapter:

  1. Adverse impact
  2. The four-fifths rule
  3. Criterion validity
  4. Proxy variable
  5. Reasonable accommodation

Exercise 10.3 ⭐ Match each legal framework to its primary application in AI hiring:

Legal Framework                        Primary Hiring Application
Title VII of the Civil Rights Act      A. Requires accommodation alternatives for AI assessment tools
Americans with Disabilities Act        B. Prohibits discrimination based on race, color, religion, sex, national origin
Age Discrimination in Employment Act   C. Requires annual bias audits for automated hiring tools in NYC
NYC Local Law 144                      D. Classifies AI hiring tools as high-risk, requiring documentation and oversight
EU AI Act                              E. Protects workers aged 40 and older from employment discrimination

Exercise 10.4 ⭐ Explain the "proxy variable" problem as it applies to the following scenario: A company trains its ATS model on data from its historical hires, which include candidates' undergraduate institution names as a feature. The model learns to weight applicants from highly selective universities more favorably. What protected characteristics might be proxied by university selectivity, and why does this matter legally?


Exercise 10.5 ⭐⭐ The four-fifths rule in practice. An ATS system at a large financial services firm has processed 10,000 applications for analyst roles. Calculate whether adverse impact is present for each demographic group in the table below, applying the four-fifths rule:

Group                 Applications   Passed Screening   Selection Rate   Adverse Impact?
White candidates      5,000          2,000              ?                ?
Black candidates      2,000          600                ?                ?
Hispanic candidates   1,500          450                ?                ?
Asian candidates      1,500          680                ?                ?

Show your calculations. For any group showing adverse impact, identify what investigation should follow.
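One way to organize the calculation is a short Python sketch that computes each group's selection rate and its impact ratio against the highest-rate group. The counts below are the hypothetical figures from the table; note that the reference group is whichever group has the highest selection rate, not necessarily the largest group.

```python
# Four-fifths (80%) rule: compare each group's selection rate to the
# highest group's rate; a ratio below 0.8 signals potential adverse impact.
groups = {
    "White": (5_000, 2_000),      # (applications, passed screening)
    "Black": (2_000, 600),
    "Hispanic": (1_500, 450),
    "Asian": (1_500, 680),
}

rates = {g: passed / apps for g, (apps, passed) in groups.items()}
reference = max(rates.values())   # highest selection rate across groups

for g, rate in rates.items():
    ratio = rate / reference
    flag = "ADVERSE IMPACT" if ratio < 0.8 else "ok"
    print(f"{g:<9} rate={rate:.3f}  impact ratio={ratio:.3f}  {flag}")
```

Running the numbers this way also surfaces a subtlety worth discussing in your answer: the group with the highest selection rate sets the benchmark against which every other group, including the majority group, is measured.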


Part B: Applied Analysis

Exercise 10.6 ⭐⭐ † The Résumé Audit

Obtain or create two versions of a résumé for a hypothetical candidate applying for the same role — one formatted using a graphic design template (with columns, graphics, and non-standard fonts) and one formatted as a plain text document using standard résumé conventions. Submit both versions to an ATS simulation tool (several are available free online, including Jobscan and Resume Worded).

(a) What differences in parsing quality, keyword detection, and scoring do you observe between the two formats? (b) What types of candidates are most likely to use the design-format résumé, and what are the equity implications of ATS incompatibility with this format? (c) What should ATS vendors do differently, and what should employers require of vendors regarding format compatibility?


Exercise 10.7 ⭐⭐ The Career Gap Analysis

Review the following three résumé excerpts and evaluate how a standard ATS keyword-scoring system would likely treat each gap:

Résumé A: 18-month gap (2019–2021) — no explanation provided

Résumé B: 18-month gap labeled "Career break — caregiving for family member"

Résumé C: 18-month gap labeled "Freelance consulting and skills development"

(a) How might keyword-based ATS systems score each gap differently? (b) Which types of candidates are most likely to have gaps that resemble Résumé A or B versus Résumé C? (c) What ATS design changes would reduce career-gap discrimination against caregivers and veterans?


Exercise 10.8 ⭐⭐ Name Discrimination Simulation

Based on the Bertrand and Mullainathan (2004) methodology and its replications, design a small-scale audit study to test whether a specific company's online application system produces name-based disparate outcomes. Your design should specify:

  • The name pairs you would use (and how you selected them)
  • The role(s) you would target
  • The résumé controls you would hold constant
  • The outcome measure(s) you would track
  • The sample size needed to detect a statistically significant effect
  • The ethical considerations of conducting this type of audit

Note: Do not actually submit fake applications; describe the design as a research proposal.
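For the sample-size bullet, a standard two-proportion power calculation is one defensible starting point. The sketch below uses only the Python standard library and assumes, for illustration, callback rates in the neighborhood of those reported by Bertrand and Mullainathan (roughly 9.7% for White-sounding names vs 6.5% for Black-sounding names); your design should justify whatever rates you assume.

```python
import math
from statistics import NormalDist

def audit_sample_size(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate applications needed *per name group* to detect a
    difference between callback rates p1 and p2 (two-sided z-test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

# Smaller expected gaps require substantially larger samples.
n_per_group = audit_sample_size(0.097, 0.065)
```

The key design lesson the formula makes concrete: halving the expected callback gap roughly quadruples the number of applications needed, which is why published audit studies submit thousands of résumés.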


Exercise 10.9 ⭐⭐ Read the following hypothetical vendor pitch and evaluate its claims:

"Our AI video interview platform analyzes candidates' verbal responses, vocal patterns, and engagement signals to generate a performance prediction score with 89% predictive accuracy. Our system is validated against client hire data across 150 enterprise customers. We have conducted internal adverse impact testing and found no significant disparities across race or gender groups."

Identify at least four specific follow-up questions you would ask this vendor before deploying the platform, explaining why each question matters.


Exercise 10.10 ⭐⭐ † The Disability Accommodation Gap

For each of the following candidates, identify which AI hiring tool creates a potential ADA accommodation issue, what the specific barrier is, and what a reasonable accommodation alternative would be:

Candidate   Condition                                      AI Tool in Use
Amara       Autism spectrum disorder                       HireVue video interview with facial analysis
David       ADHD                                           45-minute timed cognitive ability test
Priya       Bell's palsy (partial facial paralysis)        Video interview with vocal and facial analysis
James       Stutter                                        AI voice analysis interview tool
Chen        Anxiety disorder with physiological symptoms   AI-proctored online assessment flagging head movements

For each case, describe what the accommodation should be and how the employer should communicate accommodation options to candidates proactively.


Part C: Analytical Thinking

Exercise 10.11 ⭐⭐⭐ Competing Fairness Frameworks

Chapter 9 introduced multiple fairness metrics — demographic parity, equalized odds, individual fairness, and others. Apply these frameworks to AI résumé screening:

(a) Define what "demographic parity" would require of an ATS system. (b) Define what "individual fairness" would require. (c) Explain a scenario in which maximizing demographic parity in ATS screening would conflict with maximizing predictive validity for job performance. (d) What does this tension suggest about the limits of a purely technical approach to fair hiring AI?
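For part (a), demographic parity can be stated operationally: the rate at which the ATS advances candidates to the next stage should be (approximately) equal across demographic groups. A minimal sketch of that check, using hypothetical screening decisions:

```python
from collections import defaultdict

def advance_rates(decisions):
    """Per-group rate of positive screening decisions.

    `decisions` is a list of (group, advanced) pairs, where `advanced`
    is True when the ATS passed the candidate to the next stage.
    """
    totals, passed = defaultdict(int), defaultdict(int)
    for group, advanced in decisions:
        totals[group] += 1
        passed[group] += advanced
    return {g: passed[g] / totals[g] for g in totals}

def demographic_parity_gap(decisions):
    """Largest difference in advance rates across groups;
    exact demographic parity would make this 0."""
    rates = advance_rates(decisions).values()
    return max(rates) - min(rates)

# Hypothetical toy data: group A advances at 2/3, group B at 1/3.
sample = [("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False)]
gap = demographic_parity_gap(sample)
```

Notice what the metric ignores: it says nothing about whether the advanced candidates in each group were the most qualified ones, which is exactly where the tension with individual fairness and predictive validity in parts (b) and (c) arises.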


Exercise 10.12 ⭐⭐⭐ The Amazon Case Mechanism

Drawing on Chapter 7's introduction and Chapter 10's analysis, explain in precise technical terms how Amazon's ML-based résumé screening tool learned to systematically downrank women's résumés. Your explanation should address:

  • What training data the model used and why that data was problematic
  • What specific features in résumés correlated with gender in the training data
  • Why the model's behavior was not a bug but an emergent property of the optimization objective
  • What the company would have needed to do differently during development to catch this problem
  • Why the same dynamic could emerge in any organization that trains an ML model on historical hiring data

Exercise 10.13 ⭐⭐⭐ † Vendor Due Diligence Framework

You are the Chief People Officer at a 5,000-employee company. Your recruiting director has proposed deploying a new AI résumé screening platform that the vendor claims will cut time-to-hire by 40% and improve candidate quality. Design a vendor due diligence framework covering:

Part 1: Validity Assessment

  • What validity evidence is required (type, source, independence)?
  • What is the minimum acceptable standard for criterion validity?
  • How do you verify that validation studies apply to your specific roles and candidate pool?

Part 2: Adverse Impact Assessment

  • What adverse impact data must the vendor provide?
  • What is your organization's threshold for acceptable adverse impact before deployment?
  • What monitoring commitments must the vendor make post-deployment?

Part 3: Accommodation and Accessibility

  • What accommodation alternatives must be available?
  • How must accommodation be communicated to candidates?
  • Who is responsible for accommodation logistics — vendor or employer?

Part 4: Contract and Liability

  • What contractual representations must the vendor make?
  • What remedies does your contract provide if adverse impact is discovered post-deployment?
  • How does EEOC guidance on employer liability affect your vendor contracting approach?


Exercise 10.14 ⭐⭐⭐ The Flight Risk Dilemma

Your organization's HR analytics team presents a flight risk prediction model with the following characteristics:

  • 78% accuracy at predicting 12-month voluntary departures
  • Key predictors: performance trend, communication metadata, tenure, recent accommodation requests, access card swipe patterns
  • Demographic analysis: the model flags women with young children at 2.3x the rate of men, and flags employees who have filed accommodation requests at 1.8x the rate of other employees

(a) Identify the legal risks associated with using this model for retention investment decisions. (b) Identify the design flaws that produced the demographic disparities in the flagging rates. (c) Would removing gender and disability status as explicit features from the model solve the problem? Why or why not? (d) What would you recommend your organization do: use the model as-is, redesign it, or abandon it? Justify your recommendation.


Exercise 10.15 ⭐⭐⭐ NYC Local Law 144 Implementation

Your company uses an AI résumé screening tool and an AI video interview platform for positions in your New York City office. Design an implementation plan for compliance with NYC Local Law 144, covering:

  • Timeline for achieving compliance
  • Who is responsible for each compliance element
  • How you will identify and engage an independent bias auditor
  • What adverse impact metrics will be calculated and how
  • How you will prepare and publish public disclosure of audit results
  • How you will implement candidate notification and information provision
  • How you will handle a scenario where the audit reveals adverse impact above the threshold

Part D: Integrative and Capstone Exercises

Exercise 10.16 ⭐⭐⭐⭐ † The Ethical AI Hiring Audit

Select a publicly available AI hiring tool (or use a hypothetical tool based on published vendor documentation). Conduct a structured ethical audit covering:

1. Validity Assessment

  • What does the tool claim to measure?
  • What independent validation evidence exists?
  • Rate the validity evidence strength: None / Insufficient / Adequate / Strong

2. Adverse Impact Profile

  • What protected groups are at risk of adverse impact?
  • What adverse impact data has the vendor published?
  • Apply the four-fifths rule to any available demographic data

3. Legal Compliance

  • Does the tool appear to comply with Title VII, ADA, and ADEA requirements?
  • Does it meet NYC Local Law 144 requirements (if applicable)?
  • Does it meet EU AI Act requirements (if applicable for your scenario)?

4. Accommodation Adequacy

  • What accommodation pathways are documented?
  • Are they proactively communicated to candidates?
  • Are they operationally feasible?

5. Recommendation

  • Deploy / Deploy with conditions / Do not deploy
  • If deploying with conditions: what conditions?
  • If not deploying: what alternative do you recommend?

Write a 1,500-word audit report suitable for presentation to a Chief People Officer.


Exercise 10.17 ⭐⭐⭐⭐ Bias Audit Design

Design a complete bias audit methodology for an organization that uses AI tools at the following hiring stages: (1) ATS keyword screening, (2) video interview AI scoring, and (3) cognitive ability assessment. Your audit design should specify:

  • The data to be collected at each stage
  • The demographic groupings to be analyzed
  • The statistical tests to be applied
  • The adverse impact thresholds that trigger action
  • The documentation requirements
  • The human roles in the audit process
  • The cadence and reporting structure
  • What happens when adverse impact is detected

Your design should be sufficiently detailed that a team with access to standard HR data and a statistician could execute it.
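For the statistical-tests bullet, a two-proportion z-test is one common choice for stage-level pass-rate comparisons, typically reported alongside the four-fifths impact ratio, since statistical and practical significance can diverge. A stdlib-only sketch, with hypothetical counts:

```python
import math
from statistics import NormalDist

def two_proportion_ztest(passed_a, n_a, passed_b, n_b):
    """Two-sided z-test for a difference in pass rates between two groups.
    Returns (z statistic, p-value)."""
    p_a, p_b = passed_a / n_a, passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)   # pooled pass rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical ATS stage: 2,000 of 5,000 vs 600 of 2,000 pass screening.
z, p = two_proportion_ztest(2_000, 5_000, 600, 2_000)
```

A design point your audit should address explicitly: at the applicant volumes a large employer sees, even trivially small rate differences will be statistically significant, which is why the action thresholds in your audit should pair p-values with impact ratios rather than rely on either alone.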


Exercise 10.18 ⭐⭐⭐⭐ † The Stakeholder Analysis

An organization is deploying a new AI-based hiring pipeline. Using the stakeholder framework from Chapter 4, identify and analyze all relevant stakeholders:

For each stakeholder group, specify:

  • Who they are (be specific: not just "candidates" but "candidates with disabilities," "international candidates," "candidates with career gaps")
  • What interests they have at stake
  • What power they have to influence outcomes
  • What risks they face
  • What the organization's obligations toward them are

Then, using the information from all stakeholder analyses, make a recommendation about: (a) which stakeholder harms are most serious and require immediate mitigation, and (b) what governance structures would most effectively represent all stakeholders' interests in ongoing AI hiring tool oversight.


Exercise 10.19 ⭐⭐⭐⭐ Global Comparison: US vs. EU AI Hiring Regulation

Compare the regulatory framework for AI hiring tools in the United States (federal law, EEOC guidance, and NYC Local Law 144) with the EU AI Act framework. Structure your comparison as a table covering:

  • What is regulated
  • What is required of employers
  • What is required of vendors
  • What candidate rights are created
  • What enforcement mechanisms exist
  • What penalties apply

Based on your comparison: (a) In what specific ways are EU candidates better protected than US candidates? (b) What US regulatory changes would be most impactful for improving protection of job seekers? (c) What arguments are made against imposing EU-style requirements in the US, and how strong are these arguments?


Exercise 10.20 ⭐⭐⭐⭐ The Culture Fit Algorithm Problem

A technology company's data science team has built an AI "culture fit" scoring model trained on assessment data from current employees rated as "high performers" by their managers. The model uses personality assessment results, communication style patterns, and cognitive test results to predict culture fit for new hires. The team reports that the model has a correlation of 0.42 with manager ratings of new hire fit at 90 days.

(a) Evaluate the 0.42 correlation coefficient: is this an adequate validity coefficient for a high-stakes employment decision? What is the appropriate benchmark?

(b) The existing "high performer" population used to train the model is 72% male and 85% White. Walk through the specific mechanism by which the culture fit model could encode demographic similarity into its predictions.

(c) The team proposes that the model can be debiased by removing gender and race from the feature set. Evaluate this claim — will it work? Draw on the concept of proxy variables in your answer.

(d) Management argues that the model will improve diversity because it is more objective than manager intuition. Evaluate this claim critically.

(e) Design an alternative approach to "culture" assessment that does not rely on demographic similarity as a predictor.


Exercise 10.21 ⭐⭐ The HireVue Ethics Timeline

Construct a chronological timeline of the HireVue case from 2014 to 2023, identifying:

  • Key deployment milestones
  • Civil liberties and regulatory challenges
  • HireVue's public responses
  • The January 2021 facial analysis abandonment
  • NYC Local Law 144 passage and implementation

For each event, note whether it was driven by: (a) internal ethics review, (b) external legal/regulatory pressure, (c) media and civil society pressure, or (d) market forces. What pattern emerges, and what does it suggest about what drives AI ethics action in hiring technology?


Exercise 10.22 ⭐⭐ The Structured Interview Alternative

Research shows that structured interviews — with standardized questions, behavioral anchors, and consistent scoring rubrics — have higher predictive validity for job performance than unstructured interviews and that inter-rater reliability is substantially higher. Yet many organizations prefer AI video interviews over structured human interviews.

(a) What organizational factors drive preference for AI video interviews over structured human interviews, even when structured human interviews have stronger validity evidence? (b) What would it cost (in time, training, and process design) to implement structured interviews at the scale that a major employer needs? (c) Design a hybrid model that combines the efficiency benefits of AI screening with the validity benefits of structured human interviewing, specifying where each element is applied in the pipeline.


Exercise 10.23 ⭐⭐⭐ The Amazon Case Counterfactual

Amazon's ML-based résumé screening system was scrapped in 2018 after it was found to systematically penalize women's résumés. Suppose you had been a member of Amazon's AI ethics team in 2015, when the system was in development. What process interventions — at what stages of development — could have detected the gender bias before the system was deployed?

Your answer should address:

  • What tests or evaluations could have been performed on the training data before model training
  • What bias tests could have been applied during model development
  • What validation requirements could have been applied before deployment
  • What ongoing monitoring could have caught the problem post-deployment
  • What organizational structures would have been needed to make these interventions effective


Exercise 10.24 ⭐⭐⭐ The Flight Risk Self-Fulfilling Prophecy

Analyze the following causal mechanism: a flight risk model identifies employees as "high departure risk" based on patterns associated with being a woman with a young child. The organization responds by investing less in these employees' development and passing them over for stretch assignments. The employees, receiving less development and fewer opportunities, become more likely to leave. The model's subsequent accuracy improves.

(a) Describe this as a feedback loop, identifying all causal relationships. (b) Using concepts from Chapter 8, explain why this dynamic is an instance of historical bias becoming self-reinforcing. (c) What monitoring or circuit-breaker mechanisms would a well-designed flight risk program build in to prevent this feedback loop? (d) Is it possible to design a flight risk program that does not create these feedback loops? What would it require?


Exercise 10.25 ⭐⭐⭐⭐ † The Ethics Policy Memo

You are the Head of HR Ethics at a 25,000-employee organization. The CEO has asked you to draft an AI hiring ethics policy that will govern the organization's use of AI tools in hiring and employment decisions. The policy must be:

  • Practical enough to be implemented by HR teams without legal or technical specialization
  • Legally sound with respect to Title VII, ADA, ADEA, and NYC Local Law 144
  • Sufficiently protective of candidates' rights to withstand civil liberties scrutiny
  • Aligned with EEOC guidance on AI and employment discrimination

Draft the policy (approximately 1,200 words) covering:

  1. Scope: which tools and decisions the policy covers
  2. Validity standards: what evidence is required before deploying any AI hiring tool
  3. Adverse impact: monitoring requirements, thresholds, and required actions
  4. Accommodation: requirements for candidate accommodation in AI assessment
  5. Transparency: what must be disclosed to candidates, and how
  6. Human oversight: where human judgment is required in the pipeline
  7. Vendor accountability: what vendor contracts must include
  8. Governance: who is responsible for policy enforcement and annual review

Include at least three concrete examples illustrating how the policy would apply to specific tools or scenarios.