In This Chapter
- Opening: The Promise and the Problem
- Learning Objectives
- 10.1 The AI-Powered Hiring Pipeline
- 10.2 Legal Framework for Hiring Discrimination
- 10.3 Résumé Screening Bias
- 10.4 Video Interview Assessment — The HireVue Case
- 10.5 Personality and Cognitive Assessment Tools
- 10.6 AI in Performance Management and Promotion
- 10.7 AI Monitoring and "Flight Risk" Prediction
- 10.8 Disability, Accommodation, and AI Hiring Tools
- 10.9 Building Ethical AI Hiring Practices
- Discussion Questions
Chapter 10: Bias in Hiring and HR Systems
Part 2: Bias and Fairness
Opening: The Promise and the Problem
In 2014, Unilever faced a talent acquisition crisis familiar to every large employer: too many applications, too little time, and growing evidence that traditional screening — résumé review by human recruiters — was slower, more expensive, and arguably more biased than any of their other business processes. The company turned to HireVue, an AI-powered platform that recorded candidates answering preset questions via webcam, then analyzed the footage using algorithms designed to predict job performance.
The results were impressive by one measure. Unilever's time-to-hire dropped by 75 percent. The company processed over half a million video interviews. Recruiters were freed from screening to focus on later-stage evaluation. HireVue marketed the system as eliminating the human biases that caused traditional hiring to fail: no more decisions driven by a candidate's likability, their appearance on a good day, or an interviewer's personal network.
What Unilever — and HireVue, and Goldman Sachs, and the hundreds of other companies that deployed the platform — did not adequately examine was the scientific foundation beneath those algorithms. HireVue's system analyzed candidates' facial expressions, word choice, and vocal tone. It claimed these signals, processed by AI, could predict who would be the best employee. There was one significant problem with this claim: it was not supported by peer-reviewed science.
In 2019, the Electronic Privacy Information Center (EPIC) filed a complaint with the Federal Trade Commission, alleging that HireVue's facial analysis technology was "an unfair and deceptive trade practice" that lacked scientific basis and could discriminate against candidates with disabilities — particularly those with autism spectrum disorder, facial paralysis, or anxiety disorders, whose facial presentation differed from the behavioral norms the AI had been trained to expect.
Two years later, in January 2021, HireVue abandoned facial expression analysis entirely. The company did not admit that the technology was discriminatory. It cited "a lack of consensus in the scientific community" — a formulation that, while technically accurate, implied a consensus had once existed, when in fact the scientific community had never reached one. The technology had been deployed first, studied later, and abandoned only when civil liberties pressure became too great to ignore.
This sequence — deploy now, ask questions later, retreat when confronted — is not unique to HireVue. It is a defining pattern in AI hiring technology, and it is the central problem this chapter addresses.
AI has transformed every stage of the hiring pipeline. Résumé screening, candidate sourcing, video interviews, personality assessment, reference verification, offer prediction, onboarding, performance management, and turnover prediction have all been automated to some degree. Each transformation carries genuine benefits and genuine risks. This chapter maps where AI enters the hiring process, what legal frameworks govern hiring discrimination, where bias has been documented, and what ethical AI hiring practices look like in practice. The chapter is directed at the HR professionals, managers, and executives who make decisions about these tools — and who bear legal and ethical responsibility for what those tools do.
Learning Objectives
By the end of this chapter, readers will be able to:
- Map the stages of the AI-powered hiring pipeline and identify bias risks at each stage.
- Explain the primary legal frameworks governing employment discrimination in the United States, including Title VII, the ADA, and the four-fifths rule, and apply them to AI-driven hiring scenarios.
- Analyze the mechanisms by which automated résumé screening encodes and amplifies historical bias.
- Evaluate the scientific validity claims made by AI video interview vendors and explain why validity matters for legal compliance and ethical practice.
- Identify how AI tools used in performance management and employee monitoring create new risks for workers, including those with disabilities or protected characteristics.
- Apply a vendor due diligence framework when evaluating AI hiring tools, including questions about adverse impact data, accommodation options, and audit history.
- Design an internal bias audit process for AI hiring tools consistent with EEOC guidance and NYC Local Law 144 requirements.
- Articulate the connection between inclusive AI hiring practices and organizational diversity, drawing on the recurring themes of power, accountability, and the gap between stated values and actual practice.
10.1 The AI-Powered Hiring Pipeline
Before a human hiring manager makes a single decision, AI may have already evaluated and ranked hundreds — or thousands — of candidates. Understanding where AI enters the hiring process, what it does, and what risks it creates at each stage is the foundational skill for any executive or HR professional operating in today's talent landscape.
Sourcing: Finding Candidates Before They Find You
The first stage of AI-assisted hiring occurs before any application is submitted. AI sourcing tools — platforms like LinkedIn Recruiter, HireEZ (formerly Hiretual), SeekOut, and Entelo — scrape publicly available professional data to identify passive candidates: people who are not actively looking for work but whose profiles suggest they might be a good match.
These tools use machine learning to predict "candidate fit" based on historical hiring data. The bias risk enters immediately: if the model learns what a successful hire looks like from a company's historical hire data, and that historical data reflects a workforce that skewed toward certain demographics, the sourcing algorithm will preferentially surface candidates who resemble historical hires. LinkedIn's own researchers have documented this dynamic: their "People You May Know" and talent recommendation algorithms have shown gender imbalance in recommendations, reflecting the imbalance in the underlying professional network data.
The sourcing stage is particularly dangerous because it is invisible to candidates and often invisible to HR teams. A recruiting manager who asks a sourcing tool to "find engineers like our best engineers" may not realize they have just instructed an AI to replicate whatever demographic profile currently dominates their engineering team.
Résumé Screening: The First Filter
Once applications arrive, AI-powered applicant tracking systems (ATS) — products from vendors including Workday, SAP SuccessFactors, iCIMS, Greenhouse, Lever, and Oracle Taleo — parse résumés, extract structured information, and rank or filter candidates before any human review occurs.
According to industry surveys, 99 percent of Fortune 500 companies use ATS platforms. Some estimates suggest that more than 75 percent of résumés submitted to large employers are never seen by a human recruiter — they are filtered out by automated screening. The bias risks at this stage are substantial and well-documented; Section 10.3 examines them in depth.
Assessment: Tests Before Interviews
For roles where employers use pre-employment testing, AI has enabled a new generation of assessment tools. Cognitive ability tests, situational judgment tests, and personality assessments — once administered in person or via paper — are now delivered digitally, with AI-powered proctoring that monitors candidate behavior during the test. Platforms like Criteria Corp's HireSelect, Wonderlic, and Revelian offer digital cognitive and personality assessments. Game-based assessments from Pymetrics (now part of Harver) and Knack translate cognitive and personality measurement into app-based tasks that feel less like tests.
The bias risks here are multiple. Cognitive ability assessments have historically shown race-based adverse impact — Black and Hispanic candidates, on average, score lower than White candidates on these instruments, due to factors including differential educational opportunity rather than differential cognitive ability. AI proctoring introduces additional risks: systems that flag "suspicious" eye movements or posture may penalize candidates whose disability or anxiety causes them to move differently.
Video Interviews: AI Analysis of Human Expression
Recorded video interviews — in which candidates answer questions on camera and AI algorithms analyze the footage — represent the most controversial application of AI in hiring. HireVue, Spark Hire, myInterview, and VidCruiter are among the major vendors. HireVue is the largest and most scrutinized, and its case study occupies Section 10.4 and Case Study 01 of this chapter.
The specific signals analyzed vary by vendor, but have included facial expression analysis, eye contact patterns, speaking pace, vocal tone and affect, word choice and semantic content, and physical gesture. The core scientific validity question — whether any of these signals actually predicts job performance — is addressed in detail below.
Reference and Background Checks: AI Verification
Background screening companies including Checkr, Sterling, First Advantage, and HireRight have incorporated AI into the verification process. AI tools can cross-reference databases, flag discrepancies, and process employment verification more quickly than manual review.
The principal bias risk here lies in criminal background checks, which have disparate impact on Black and Hispanic candidates because of well-documented racial disparities in the criminal justice system. The EEOC has issued guidance specifying that blanket criminal record exclusions may violate Title VII; AI systems that automate such exclusions may perpetuate this pattern at scale.
Offer Prediction: Matching Offers to Candidates
A newer category of AI hiring tools attempts to predict whether a candidate, if offered a position, will accept it — and what compensation offer would maximize acceptance probability while minimizing cost to the employer. Vendors offer predictive offer modeling as a feature within larger talent acquisition platforms.
The bias risk here is subtle but significant: systems that predict offer acceptance probability based on historical data may calibrate lower offers for candidates belonging to groups that have historically accepted lower offers — encoding and perpetuating pay inequity rather than correcting it.
Performance Management and Retention: AI After Hire
The AI-powered employment relationship does not end at hire. Organizations increasingly use AI to support performance management (rating employees, predicting high performers, recommending promotions and terminations), turnover prediction (identifying "flight risk" employees before they leave), and employee monitoring (tracking productivity, communication patterns, and attendance). Section 10.6 and Section 10.7 address these post-hire applications in detail.
The key point to establish here is that AI's role in employment is not confined to the front door. It shapes the entire employment lifecycle, and the bias risks that appear in hiring can reappear — often amplified — throughout an employee's tenure.
10.2 Legal Framework for Hiring Discrimination
AI hiring tools do not operate in a legal vacuum. A complex set of federal, state, and local laws governs employment discrimination in the United States, and the application of those laws to AI-driven hiring is actively developing. Understanding this legal landscape is not optional for HR professionals and executives — it is a prerequisite for responsible use of any AI hiring tool.
Title VII of the Civil Rights Act (1964)
Title VII prohibits employment discrimination based on race, color, religion, sex, and national origin. It applies to employers with 15 or more employees and covers all aspects of employment, including hiring, firing, pay, job assignments, promotions, layoffs, training, and benefits.
Title VII recognizes two forms of discrimination:
Disparate treatment involves intentional discrimination — treating an individual worse because of a protected characteristic. AI systems that are explicitly programmed to screen out candidates based on protected characteristics would constitute disparate treatment, though this form is relatively rare.
Disparate impact — sometimes called adverse impact — occurs when a facially neutral practice has a disproportionately negative effect on a protected group, without business necessity justifying it. This form is far more relevant to AI hiring. An algorithm that ranks résumés using criteria that happen to correlate with race, or a video analysis system that penalizes facial expressions more common in certain ethnic groups, may have disparate impact even if race was never explicitly considered.
Under the Supreme Court's ruling in Griggs v. Duke Power Co. (1971), disparate impact can establish a Title VII violation. Once adverse impact is shown, the employer bears the burden of demonstrating that the challenged practice is job related and consistent with business necessity.
Age Discrimination in Employment Act (ADEA, 1967)
The ADEA protects workers aged 40 and older from discrimination in hiring, firing, and other employment decisions. AI systems trained on data that correlates certain features with age — graduation years, school names from certain eras, length of work history, or gaps consistent with retirement and return — may generate systematic age bias without any explicit age signal being considered.
Notably, the Supreme Court has construed the ADEA more narrowly than Title VII (Meacham v. Knolls Atomic Power Laboratory, 2008; Gross v. FBL Financial Services, 2009), making age discrimination claims harder for plaintiffs to win and ADEA litigation more complex.
Americans with Disabilities Act (ADA, 1990)
The ADA prohibits discrimination against qualified individuals with disabilities in all aspects of employment. It requires employers to provide reasonable accommodation to allow disabled individuals to perform essential job functions, and to participate in hiring processes.
The application of the ADA to AI hiring tools is one of the most active areas of legal development. AI assessment tools may systematically disadvantage candidates with disabilities — for example, video interview systems that penalize atypical facial affect (affecting people with autism spectrum disorder or facial paralysis), timed cognitive tests that disadvantage candidates with ADHD or processing differences, or voice analysis systems that penalize speech patterns affected by anxiety disorders or speech impediments. Section 10.8 addresses this in depth.
Critically, an employer who uses an AI assessment tool that fails to offer accommodation alternatives may be liable under the ADA — even if the failure is attributable to the vendor's design, not the employer's intent.
The Four-Fifths Rule: Measuring Adverse Impact
The EEOC's Uniform Guidelines on Employee Selection Procedures (1978) established what is commonly called the "four-fifths rule" (or "80 percent rule") as a threshold for identifying adverse impact. Under this standard:
A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) or eighty percent of the rate for the group with the highest rate will generally be regarded as evidence of adverse impact.
For example, if 50 percent of White applicants pass a particular screening stage, but only 30 percent of Black applicants pass (30/50 = 60%, below the 80% threshold), adverse impact is indicated. This does not automatically constitute a legal violation — the employer may be able to demonstrate job-relatedness — but it triggers legal scrutiny and requires justification.
AI hiring tools, because they process applications at scale, have the potential to create adverse impact at scale. The four-fifths rule provides HR professionals with a concrete, measurable threshold to apply when auditing their own tools.
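The four-fifths check itself is simple enough to automate as a first-pass audit. A minimal sketch in Python, using the hypothetical pass counts from the example above (function names and structure are ours, not EEOC terminology; a real audit would also apply statistical significance tests):

```python
def selection_rates(passed, applied):
    """Selection rate per group: candidates passed / candidates applied.
    Assumes every group has at least one applicant."""
    return {g: passed[g] / applied[g] for g in applied}

def adverse_impact(passed, applied, threshold=0.8):
    """Flag any group whose selection rate falls below the four-fifths
    (80 percent) threshold relative to the highest-rate group."""
    rates = selection_rates(passed, applied)
    best = max(rates.values())
    return {
        g: {"rate": r, "ratio": r / best, "flag": r / best < threshold}
        for g, r in rates.items()
    }

# Hypothetical screening-stage numbers matching the example above:
result = adverse_impact(
    passed={"White": 50, "Black": 30},
    applied={"White": 100, "Black": 100},
)
# Black selection ratio = 0.30 / 0.50 = 0.60, below 0.80, so flagged.
```

A flag from this check is a trigger for investigation and job-relatedness justification, not a verdict in itself — exactly the role the Uniform Guidelines assign to the rule.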
EEOC Guidance on AI and Employment Discrimination (2023)
In May 2023, the EEOC released technical assistance guidance on assessing adverse impact when software, algorithms, and AI are used in employment selection procedures under Title VII. This guidance affirmed that existing law applies fully to AI-powered hiring tools, that employers can be liable for discrimination resulting from AI tools they purchase from vendors, and that the burden of demonstrating job-relatedness rests with the employer — not the vendor.
The guidance specifically called out AI tools' potential to violate the ADA by failing to provide reasonable accommodation alternatives, and emphasized that employers cannot outsource their legal responsibility to vendors.
New York City Local Law 144 (2023)
New York City enacted Local Law 144, effective July 2023, as the first legislation in the United States specifically regulating AI hiring tools. The law requires employers and employment agencies using "automated employment decision tools" to:
- Conduct an annual bias audit of the tool, performed by an independent third party
- Make the results of the bias audit publicly available prior to using the tool
- Notify candidates that an automated tool is being used in their evaluation
- Provide candidates information about the tool and its scoring criteria upon request
The law defines an "automated employment decision tool" as any computational process that substantially assists or replaces discretionary decision-making by employers in hiring or promotion. This definition is broad enough to capture most AI résumé screening and video interview tools.
NYC Local Law 144 is currently the most concrete regulatory benchmark available to US employers. Even employers outside New York would be well-advised to treat its requirements as a baseline for responsible AI hiring practice.
EU AI Act: High-Risk Classification
The European Union AI Act (2024) classifies AI systems used in employment, worker management, and access to self-employment as high-risk AI systems. This classification triggers the Act's most stringent requirements: mandatory risk assessments before deployment, ongoing monitoring, robust documentation, transparency to affected individuals, human oversight mechanisms, and conformity assessments.
For organizations operating across US and EU markets — or that handle applications from EU candidates — the EU AI Act's requirements represent a significant compliance burden and a strong signal of the direction regulatory frameworks are moving globally. Chapter 33 addresses the EU AI Act in full; the hiring-specific provisions are introduced here to provide essential context.
Employer Liability for Vendor Tools
A critical legal principle that HR professionals often misunderstand: you can be liable for discrimination embedded in AI tools you purchase from a vendor. The EEOC has been explicit on this point. Vendor contracts that transfer liability to the vendor do not protect employers from EEOC enforcement or private litigation. The employer is the party that made the hiring decision; the employer bears the compliance obligation.
This principle has practical implications for vendor due diligence, addressed in Section 10.9. It also illustrates one of this book's core recurring themes: the question of power and accountability (T1). When an AI system causes harm, the question of who is responsible does not disappear just because technology and commercial contracts complicate the answer. It falls, legally and ethically, on the employer who chose to deploy the tool.
10.3 Résumé Screening Bias
Automated résumé screening is the most widespread application of AI in hiring, and arguably the one with the highest stakes for job seekers. Understanding how it works and where bias enters is essential for any organization that uses — or is considering — these tools.
How Automated Résumé Screening Works
Modern ATS platforms use natural language processing (NLP) to parse résumés: extracting structured fields (name, location, education, employment history, skills) from unstructured text. Once parsed, résumés are scored against job requirements using a combination of rule-based keyword matching, machine learning models trained on historical hiring data, and sometimes predictive models trained on the characteristics of past successful employees.
The scoring output may be binary (pass/fail against minimum requirements), ranked (top N candidates from a larger pool), or tiered (candidates bucketed into interview-recommended, hold, and reject categories). In many large organizations, a human recruiter never sees a résumé that does not clear the automated threshold.
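The three output modes described above can be sketched as a single post-scoring step. The cut-off values and field names here are illustrative only, not drawn from any real ATS:

```python
def screen(scored_candidates, interview_cut=0.75, hold_cut=0.50, top_n=None):
    """Turn model scores into the three common ATS output modes:
    a binary pass set, a ranked top-N slate, and tier buckets."""
    ranked = sorted(scored_candidates.items(), key=lambda kv: kv[1], reverse=True)
    tiers = {}
    for cand, score in ranked:
        if score >= interview_cut:
            tiers[cand] = "interview-recommended"
        elif score >= hold_cut:
            tiers[cand] = "hold"
        else:
            tiers[cand] = "reject"
    passed = {c for c, s in ranked if s >= hold_cut}          # binary view
    slate = [c for c, _ in ranked[:top_n]] if top_n else []   # ranked view
    return passed, slate, tiers
```

Whatever the output mode, the consequence is the same: candidates below the threshold never reach a human, which is why the scoring model upstream of this step carries the real bias risk.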
The Name Discrimination Problem
One of the most replicated findings in social science is that names perceived as "Black-sounding" receive fewer callbacks than identical résumés with names perceived as "White-sounding." Marianne Bertrand and Sendhil Mullainathan's landmark 2004 study — "Are Emily and Greg More Employable than Lakisha and Jamal?" — found that résumés with White-sounding names received 50 percent more callbacks than identical résumés with Black-sounding names. Multiple replications, including a large meta-analysis of callback field experiments led by Lincoln Quillian, have confirmed and extended this finding.
When an AI system is trained on historical application data that includes callbacks and hire rates, it learns that certain names (and by extension, name patterns associated with specific demographic groups) correlate with negative outcomes. If the system uses name as a feature — even implicitly, through phonetic or syllabic patterns — it can perpetuate this form of discrimination at machine scale. The proxy problem (introduced in Chapter 8) is operative here: the name is a proxy for race, and discriminating on the proxy is legally equivalent to discriminating on the underlying characteristic.
Best-practice résumé screening systems remove candidate names and other demographic identifiers before scoring — a practice known as blind screening or anonymized screening. However, this practice is far from universal, and even when names are removed, other identifiers (school names, location, graduation years, extracurricular activities) may serve as proxies.
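In a system that already parses résumés into structured fields, a blind-screening step can be sketched as a simple field filter. The field names below are hypothetical; production systems need statistical name detection for free text and a proxy list maintained against their own data:

```python
# Fields a blind-screening step might strip before scoring.
# Illustrative list only; each organization must audit its own proxies.
DEMOGRAPHIC_PROXIES = {"name", "address", "graduation_year", "photo_url"}

def anonymize(parsed_resume: dict) -> dict:
    """Drop fields that identify the candidate or proxy for protected
    characteristics; pass job-related fields through unchanged."""
    return {k: v for k, v in parsed_resume.items()
            if k not in DEMOGRAPHIC_PROXIES}
```

As the text notes, this is necessary but not sufficient: surviving fields such as school names and extracurriculars can still act as proxies, which is why adverse impact analysis is needed even on anonymized pipelines.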
Prestige Bias: The Ivy League Filter
Many ATS systems — and many human recruiters — implicitly weight candidate qualifications based on the perceived prestige of educational institutions and prior employers. MIT and Harvard carry more weight than state universities; Goldman Sachs and McKinsey carry more weight than regional firms. When this weighting is encoded into AI screening systems, it amplifies existing socioeconomic and demographic disparities.
Elite university enrollment is itself highly stratified by race and socioeconomic status. A system that preferentially advances candidates from elite institutions is also, indirectly, preferentially advancing candidates from wealthy families and from racial groups that are overrepresented at those institutions. The filter is not facially discriminatory — it claims to select for academic excellence — but its disparate impact is substantial.
The prestige bias extends to employer names. AI systems trained on successful employee data from elite firms will learn to value experience at those firms. This creates a compounding effect: the networks that provide access to elite educational institutions also provide access to prestigious early-career employers, and AI systems trained to replicate the profile of current employees reward that network advantage.
Career Gap Penalties
Most ATS systems penalize résumé gaps — periods where no employment or enrollment is listed. The systems typically interpret gaps as negative signals about employment history or career continuity. However, career gaps are not uniformly distributed across the population:
- Caregiving gaps disproportionately affect women, who are more likely to interrupt careers for childcare, eldercare, or other caregiving responsibilities.
- Medical leave gaps affect individuals with chronic illness or disability.
- Military service transitions create gaps for veterans who served and returned to civilian employment.
- Economic displacement gaps affect workers in industries that experienced structural decline.
A system that penalizes gaps without examining their cause is using a facially neutral criterion that has disparate impact on women, veterans, and individuals with certain disabilities — all groups with some degree of legal protection.
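Detecting gaps is mechanically trivial; the ethical choice is what the system does with the result. This sketch (simplified date handling, assuming already-parsed employment stints) surfaces gaps for human context-gathering and deliberately produces no score adjustment:

```python
from datetime import date

def employment_gaps(stints, min_gap_days=90):
    """Return (end, next_start) pairs where consecutive employment
    stints are separated by more than min_gap_days. Flags for human
    review only -- no penalty is computed here by design."""
    stints = sorted(stints)  # each stint is a (start_date, end_date) pair
    gaps = []
    for (_, prev_end), (next_start, _) in zip(stints, stints[1:]):
        if (next_start - prev_end).days > min_gap_days:
            gaps.append((prev_end, next_start))
    return gaps
```

Routing the flag to a recruiter who can ask about the gap, rather than to a scoring function, is one concrete way to keep a facially neutral criterion from silently generating disparate impact.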
Keyword Matching and Vocabulary Bias
ATS keyword screening operates on the principle that candidates who use specific language to describe their experience — terminology associated with specific tools, methodologies, industries, or roles — are stronger matches than those who do not. The bias risk lies in who has access to the vocabulary the system is optimized for.
Professional vocabulary is partly a function of educational background, professional training, and professional network. Candidates who attended elite universities, worked at large established firms, or have access to career coaches learn the correct terminology earlier and apply it more fluently. Candidates who have equivalent skills but developed them through less prestigious channels — community colleges, small businesses, self-teaching — may lack the vocabulary pattern the ATS expects, even if their underlying competence is equal or superior.
This dynamic creates a troubling feedback loop: companies train their ATS systems on the language of their current employee base, which reflects historical hiring patterns, which may already reflect demographic imbalance, which the system then replicates and amplifies in future hiring.
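The vocabulary effect is easy to demonstrate. In this deliberately crude sketch (the keyword list is hypothetical), two descriptions of the same underlying work score very differently because only one uses the expected corporate vocabulary:

```python
# Hypothetical target vocabulary for a project-management role.
JOB_KEYWORDS = {"agile", "scrum", "stakeholder", "kpi"}

def keyword_score(resume_text: str) -> float:
    """Fraction of target keywords present -- the blunt matching logic
    many rule-based ATS filters still apply."""
    words = set(resume_text.lower().split())
    return len(words & JOB_KEYWORDS) / len(JOB_KEYWORDS)

corporate = "led agile scrum ceremonies with stakeholder kpi reviews"
equivalent = "ran weekly planning meetings and tracked team goals"
# Same work, opposite outcomes: the first scores 1.0, the second 0.0.
```

The candidate who "ran weekly planning meetings" may be every bit as capable as the one who "led agile scrum ceremonies" — the filter measures vocabulary access, not competence.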
The Amazon Case Revisited
Chapter 7 introduced Amazon's ill-fated attempt to build an ML-based résumé screening tool — a system that, trained on ten years of successful hires, learned to penalize résumés that contained the word "women's" (as in "women's chess club" or "women's college") and downgraded graduates of all-women's colleges. Amazon disbanded the team in 2018, but the case remains the defining example of how even sophisticated engineering organizations can produce discriminatory AI systems from unexamined historical data.
The mechanism is worth revisiting in the context of résumé screening specifically. Amazon's training data — successful hires over a decade — reflected the company's historical workforce, which was male-dominated in technical roles. The model learned that the features shared by successful hires correlated with being male. It found those features in résumé language, school names, and activity patterns associated with men. It then systematically downranked résumés that deviated from that pattern, including women's résumés, not because it was explicitly programmed to do so, but because gender patterns were embedded in the training data.
This is historical bias (T8, Chapter 8) operating at its clearest. The model was not broken; it was doing exactly what it was designed to do. It was optimizing for the features associated with success in a historically biased environment. The problem was that "success in a historically biased environment" is not the same as "job-related competence."
What Validated Résumé Screening Looks Like
Ethical résumé screening is possible; it requires deliberate design. Best practices include:
Job-related criteria only. Screening criteria should be derived from a systematic job analysis — a documented assessment of what tasks the role requires and what knowledge, skills, and abilities those tasks demand. Criteria not connected to job requirements should be excluded.
Structured scoring with documented rationale. Each criterion should be weighted and documented, enabling audit and review. Ad hoc or intuitive scoring resists accountability.
Regular adverse impact analysis. The four-fifths rule should be applied at every screening stage. Organizations should calculate pass rates by race, gender, age group, and disability status, and investigate disparities above threshold.
Audit trails. Every screening decision should be logged — not just the outcome, but the criteria applied — so that patterns can be detected and explained.
Anonymized screening where feasible. Removing names, graduation years, and other demographic proxies from initial screening reduces the opportunity for name-based and age-based bias to operate.
10.4 Video Interview Assessment — The HireVue Case
No technology in the AI hiring landscape has attracted more scrutiny, or prompted more instructive failure, than AI-powered video interview assessment. HireVue's trajectory — from promising innovation to civil liberties controversy to partial retreat — offers the clearest available case study in what goes wrong when AI tools are deployed without adequate validity evidence, and why the "we'll fix it later" approach is legally and ethically untenable.
What HireVue and Its Competitors Claim to Do
HireVue and platforms like it — including Retorio, Modern Hire, and the AI features offered by video interview platforms like Spark Hire and myInterview — record candidates answering preset interview questions via webcam. They then apply machine learning algorithms to the recording, extracting signals from three domains:
Verbal content: What words the candidate uses, how they structure their responses, the semantic content of their answers relative to ideal responses derived from high-performing employees.
Vocal characteristics: Speaking rate, pitch variation, volume consistency, vocal confidence markers, pauses and hesitations, and what some vendors call "vocal affect" — emotional tone extracted from acoustic properties.
Facial behavior: Expressions mapped to emotion categories using facial action coding, eye contact frequency and duration, head movement, and what vendors variously describe as "engagement," "enthusiasm," or "confidence" signals.
These signals are combined into a composite score that vendors claim predicts the candidate's likelihood of being a successful employee. HireVue's marketing specifically claimed the system could identify "the best talent, faster" by removing the inconsistency and bias of human interviewers.
The Scientific Validity Problem
The core scientific question is whether these signals — particularly facial expression analysis — actually predict job performance. The answer, as of this writing, is: there is no peer-reviewed evidence that they do.
Facial expression analysis in the context of hiring draws on the contested work of psychologist Paul Ekman — his basic emotion theory and the associated Facial Action Coding System (FACS) — which proposed that certain facial movements correspond to universal emotional states. Ekman's work was enormously influential and formed the basis of multiple commercial applications. However, subsequent research — including a comprehensive 2019 review by Lisa Feldman Barrett and colleagues, published in the Association for Psychological Science's journal Psychological Science in the Public Interest — found that facial expressions do not reliably reveal emotional states, that the relationship between expressions and emotions varies significantly across cultures, and that the premise of universal emotional expression is not supported by the empirical literature.
If the underlying science is not established, the validity of tools built on that science is undermined at its foundation. Vendors have responded to this challenge by claiming that their systems do not require the Ekman theory to be true — only that whatever patterns they identify correlate with job performance in their training data. This argument — that empirical correlation in training data is sufficient validation — is methodologically questionable and legally insufficient.
The EEOC's Uniform Guidelines require that selection procedures, including AI-based tools, demonstrate criterion-related validity (the tool predicts actual job performance), content validity (the tool measures job-related competencies), or construct validity (the tool measures a construct demonstrably linked to job performance). Vendor-provided validation studies that rely on training data correlations rather than independent criterion validation do not meet this standard.
Disability Discrimination in Video Assessment
Beyond the general validity problem, AI video interview tools create specific and documented risks for candidates with disabilities.
Autism spectrum disorder (ASD): People with ASD often have atypical facial affect — reduced or different patterns of facial expression that do not map to the emotional states that neurotypical faces signal. A system trained on neurotypical facial norms will systematically disadvantage candidates with ASD. This is not merely theoretical: disability advocacy organizations have documented multiple cases of candidates with ASD reporting that video interview tools returned poor scores inconsistent with their qualifications.
Facial paralysis: Conditions including Bell's palsy, stroke, and some congenital conditions affect facial movement. Candidates with facial paralysis cannot produce the expressions that video analysis systems are calibrated to evaluate.
Anxiety disorders: Anxiety in interview contexts is common across the population, but candidates with anxiety disorders may exhibit sustained physiological symptoms — vocal tremor, irregular eye contact, altered facial expression — that video analysis systems may code as negative signals unrelated to job performance.
Speech impediments and voice differences: Vocal analysis tools are calibrated on the acoustic patterns of particular speech norms. Candidates with stuttering, hearing-related speech differences, or accents from non-dominant language backgrounds may be systematically disadvantaged.
The ADA requires that employers provide reasonable accommodation to candidates with disabilities who need alternative means to demonstrate their qualifications. An AI video interview system that offers no alternative pathway — or that scores candidates on a dimension (facial expression) where they cannot meaningfully compete due to disability — is a plausible ADA violation. The EEOC's 2023 guidance specifically identified video interview AI as a tool type that raises ADA concerns.
Racial and Gender Bias in Facial Analysis
Chapter 7 introduced the NIST Face Recognition Vendor Test (FRVT) findings, which documented that facial recognition algorithms perform significantly less accurately on darker-skinned faces, women's faces, and older faces. Joy Buolamwini and Timnit Gebru's Gender Shades research (2018) demonstrated error rate disparities of up to 34 percentage points between lighter-skinned men and darker-skinned women in commercial facial analysis systems.
These findings are directly relevant to AI video interview tools. If the underlying facial analysis technology performs differently across demographic groups — which the evidence strongly suggests it does — then video assessment scores are not measuring the same thing for all candidates. A system that is more accurate at reading facial signals in lighter-skinned faces is, effectively, evaluating lighter-skinned and darker-skinned candidates by different standards.
This is not a problem vendors can easily fix by rebalancing their training data. The underlying facial analysis technology reflects fundamental limitations in how computer vision systems have been developed — on datasets that historically overrepresented certain populations and used annotation schemes that do not translate reliably across groups (T4: Diversity and Inclusion).
HireVue's Retreat and the Regulatory Aftermath
Under sustained pressure from EPIC, the American Civil Liberties Union, several state legislatures, and unfavorable media coverage, HireVue announced in January 2021 that it was discontinuing the use of facial expression analysis. The announcement was carefully worded. HireVue did not acknowledge that the technology was discriminatory or invalid. It said that removing facial analysis would "improve fairness" — implying the technology had been fair before, if suboptimally so — and cited "a lack of consensus in the scientific community."
What HireVue retained matters as much as what it abandoned. The company continued to offer verbal content analysis and vocal pattern analysis. These components were presented as on firmer scientific footing, though the peer-reviewed validity evidence for using vocal analysis to predict job performance is also limited.
HireVue's experience illustrates a governance pattern examined in Chapter 18: organizations deploy technology, ethics review happens after deployment when external pressure becomes sufficient, partial retreats are made with minimal acknowledgment of the full scope of the problem, and the surviving portion of the system continues with less scrutiny than the component that attracted attention. This is ethics washing (T3) in its operational form — taking the actions necessary to reduce reputational pressure while preserving as much of the commercial product as possible.
NYC Local Law 144, effective July 2023, created a concrete regulatory baseline that applies to video interview tools meeting the definition of "automated employment decision tools." Employers using HireVue and similar platforms for New York City positions must conduct annual independent bias audits and publish the results — a requirement that would have created significantly different accountability dynamics for HireVue's facial analysis phase if it had been in place earlier.
Alternative Approaches
Structured human interviews, conducted with trained interviewers using standardized questions and documented scoring rubrics, remain the gold standard for legally defensible and empirically validated interview assessment. Research consistently shows that structured interviews have higher predictive validity for job performance than unstructured interviews, and that inter-rater reliability is substantially improved when interviewers follow systematic protocols.
Validated assessment instruments — cognitive ability tests, work sample tests, structured situational judgment tests — developed by industrial-organizational psychologists using rigorous validation methodology provide evidence-based alternatives to AI video assessment. These instruments are not bias-free, but their validity and adverse impact profiles are documented, auditable, and defensible.
The practical question for HR professionals is not whether AI video interviews are convenient — they clearly are — but whether the convenience justifies the legal risk and ethical cost of using tools whose validity has not been established and whose disparate impact on protected groups has not been adequately assessed.
10.5 Personality and Cognitive Assessment Tools
The AI-based personality and cognitive assessment market has grown rapidly, driven by claims that game-based interfaces can assess candidates more engagingly than traditional tests while reducing adverse impact. The reality is more complicated.
The Proliferation of Game-Based Assessment
Pymetrics (acquired by Harver in 2022) was among the first vendors to offer game-based hiring assessment at scale. Candidates play a series of cognitive games — tasks measuring attention, memory, pattern recognition, risk tolerance, and fairness sensitivity — and an AI model trained on high-performing employees generates a match score. Knack offers similar game-based assessments, claiming to identify "hidden talent" that traditional credentials miss. Unilever, JPMorgan, and LinkedIn were among early adopters.
The value proposition is compelling: replace credential-based screening with behavior-based data, enabling candidates without Ivy League credentials to demonstrate competence directly. Some vendors claim this approach reduces adverse impact relative to traditional cognitive tests. This claim deserves scrutiny.
Validity Concerns
Do game-based assessments predict job performance? The answer depends on the specific game, the specific job, and the quality of the validation study. Some games measure established psychological constructs (working memory, processing speed, executive function) that have documented relationships to job performance in cognitively demanding roles. Others measure constructs whose job-relevance is less established.
A particular concern with vendor-provided validation studies is the difference between "predictive validity in our client base" and "criterion validity for a specific role." A system that predicts who was most likely to be hired — or retained — at companies that were themselves selecting on demographic criteria may be validating against a biased criterion. Industrial-organizational psychologists refer to this as criterion contamination.
Cognitive Assessments and Race-Based Adverse Impact
Standardized cognitive ability tests have documented adverse impact on Black and Hispanic candidates — a finding replicated across decades of industrial-organizational research and attributed to factors including differential educational opportunity, test familiarity effects, and stereotype threat. Four-fifths rule analysis of many cognitive assessments yields selection-rate ratios below the 0.8 threshold for a number of racial/ethnic groups — the EEOC's marker for potential adverse impact.
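The four-fifths rule itself is simple arithmetic: compute each group's selection rate, divide by the highest group's rate, and flag any ratio below 0.8. A minimal sketch in Python — group names and counts here are hypothetical, chosen only to illustrate the calculation:

```python
def adverse_impact_ratio(selected, applicants):
    """Each group's selection rate, divided by the highest group's rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# Hypothetical applicant pool (illustrative numbers, not real data):
applicants = {"group_a": 400, "group_b": 300}
selected = {"group_a": 80, "group_b": 36}  # 20% vs. 12% selection rates

ratios = adverse_impact_ratio(selected, applicants)
for group, ratio in ratios.items():
    flag = "potential adverse impact" if ratio < 0.8 else "passes four-fifths rule"
    print(f"{group}: impact ratio {ratio:.2f} ({flag})")
```

Here group_b's 12 percent selection rate is only 60 percent of group_a's 20 percent rate — well below the four-fifths threshold, so the disparity would warrant investigation even before any statistical significance testing.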
Game-based assessments have been marketed in part as reducing this adverse impact. Some early studies from vendors supported this claim. Independent replications have been more equivocal. A 2021 study published in the Journal of Applied Psychology found that some game-based cognitive assessments retained substantial race-based adverse impact despite their gamified presentation, suggesting that if the underlying construct being measured is cognitive ability, the adverse impact associated with that construct does not disappear simply because the delivery mechanism changes.
Personality Assessments and the ADA
The Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) are among the most rigorously validated constructs in personality psychology, and conscientiousness has the strongest meta-analytic relationship to job performance across roles. However, AI-based personality assessments raise ADA concerns when they intrude into psychological health territory.
The ADA prohibits pre-offer medical examinations and inquiries. Questions that are designed or function as mental health screening — asking about anxiety symptoms, social withdrawal, emotional stability in ways that correlate with clinical diagnoses — may constitute prohibited medical inquiries even if they are framed as "personality" questions. The EEOC has issued guidance on this boundary; AI vendors and employers should ensure that personality assessment tools have been reviewed against ADA medical inquiry standards.
The "Culture Fit" Trap
Many AI assessment tools include a "culture fit" component — a prediction of how well the candidate aligns with the hiring organization's culture, often derived from comparing the candidate's assessment profile to the profile of the current employee population.
Culture fit assessment is legally and ethically hazardous precisely because organizational culture tends to reflect the demographic composition of the existing workforce. A culture fit model trained on a predominantly White, male engineering organization will tend to predict that candidates who are most similar to that demographic profile are the best cultural fits. This converts a demographic bias into a competency score. Chapter 8 introduced the proxy variable concept; culture fit AI is a systematic producer of demographic proxies.
The Society for Industrial and Organizational Psychology (SIOP) has published principles for the validation and use of personnel selection procedures that apply directly here. SIOP's principles require that any selection procedure demonstrate criterion-related validity for job performance — not cultural similarity — and that adverse impact be assessed and minimized where alternatives of equal validity exist.
10.6 AI in Performance Management and Promotion
The AI hiring pipeline extends far beyond the front door. Once employees join an organization, AI systems may evaluate their performance, predict their promotion potential, monitor their productivity, and in some cases generate recommendations for their termination. Each of these applications carries bias risks that compound over time.
From Stack Ranking to Algorithmic Performance Management
Microsoft's "stack ranking" practice — requiring managers to evaluate employees against each other, producing a distribution in which the bottom tier were systematically fired — was abandoned in 2013 after widespread internal criticism that it fostered competition over collaboration and produced outcomes that disadvantaged protected groups. The practice did not disappear from the corporate landscape; it migrated into algorithmic form.
AI performance management systems aggregate multiple data streams — project outcomes, peer reviews, manager ratings, productivity metrics, communication activity — into composite performance scores. These scores may feed into promotion algorithms, compensation adjustment tools, and succession planning systems. The promise is objectivity: removing the idiosyncratic variation of individual manager ratings. The risk is that algorithmic objectivity can mask and amplify systematic bias more effectively than human subjectivity, which at least varies.
Intersectional bias in performance AI is a documented concern. Research on manager-assigned performance ratings consistently shows that women, and particularly women of color, receive lower ratings on comparable performance than white men — a finding attributed to multiple mechanisms including in-group favoritism, double standards for evaluating "leadership," and differential attribution of success versus failure. When AI systems are trained on historical performance ratings to predict future performance or identify high-potential employees, they inherit this bias.
Amazon Warehouse: Management by Algorithm
Amazon's warehouse management system provides the most widely reported example of algorithmic performance management. The company's system tracks fulfillment center workers in near-real-time, monitoring units processed per hour, time off-task, breaks, and other productivity metrics. Workers receive automated coaching notices when they fall below productivity targets; the system can generate automatic termination recommendations for sustained underperformance.
The case is significant for several reasons. First, the workers most affected are disproportionately workers of color — Amazon's fulfillment center workforce reflects the demographics of the communities where warehouses are located, which tend to be lower-income areas with large minority populations. Second, the system's productivity standards were set without adequate analysis of disability accommodations — the target rates may be inaccessible to workers with certain physical or cognitive disabilities, who would ordinarily be entitled to modified productivity expectations under the ADA. Third, human supervisory review of automated termination recommendations has reportedly been limited, raising questions about the "human oversight" that employers typically cite as the check on algorithmic decision-making.
Surveillance-Based Performance Management
Key-logging software, screen capture tools, mouse activity monitoring, and attendance tracking AI have proliferated in remote and hybrid work environments. These tools generate productivity data streams that can be fed into performance evaluation systems. The bias risks are real but underexplored: surveillance tools calibrated for standardized work patterns may penalize non-standard but equally productive work habits, disadvantage workers who take more frequent short breaks (which research suggests improves some types of cognitive performance), and create particular challenges for employees with certain disabilities who may require accommodation-related deviations from monitored norms.
Employee morale effects of performance surveillance are also significant and directly relevant to organizational outcomes. Research on surveillance and motivation suggests that excessive monitoring signals distrust, reduces intrinsic motivation, and drives employees most capable of leaving — often the most skilled — to seek employment elsewhere. The AI that monitors may be destroying the very performance it claims to measure.
10.7 AI Monitoring and "Flight Risk" Prediction
Among the more ethically troubling applications of AI in the employment relationship is "flight risk" prediction — systems that analyze employee behavior to identify who is likely to leave before they have indicated any intention to do so.
How Flight Risk Models Work
Flight risk prediction models aggregate signals from across the employment relationship: performance review scores and trends, badge access data (irregular arrival and departure times), email and calendar metadata (meeting acceptance rates, communication volume changes), productivity tool usage patterns, external data including LinkedIn profile updates or increased connection requests, manager assessments, and tenure and career trajectory data.
Vendors offering predictive attrition tools include IBM's Watson Talent, SAP SuccessFactors, Microsoft Viva Insights, and multiple specialized vendors. IBM claimed in 2019 that its AI could predict which employees were likely to leave with 95 percent accuracy — a claim that attracted significant skepticism from researchers and practitioners.
Privacy and Legal Concerns
The data inputs to flight risk models raise serious privacy questions. Email metadata, calendar data, and badge access records are often collected under broad consent provisions buried in employment agreements. Employees who signed those agreements may not understand that their communication patterns and access records are being analyzed by predictive algorithms to forecast their departure.
The National Labor Relations Act (NLRA) protects employees' rights to engage in "concerted activity" — discussing wages and working conditions, organizing, and engaging in collective action. Flight risk models that flag employees based on communication patterns associated with organizing activity, or that identify "disengaged" employees based on participation in employee advocacy efforts, could constitute unlawful surveillance of protected activity. Several NLRA enforcement cases have addressed employer electronic monitoring; flight risk AI creates new questions in this space.
State privacy laws — including the California Privacy Rights Act (CPRA) and the Illinois Biometric Information Privacy Act (BIPA) — place additional constraints on employee monitoring and data use. Illinois in particular has seen significant BIPA litigation over biometric data collection in workplaces.
Discriminatory Patterns in Flight Risk Prediction
Flight risk models, like all predictive models, predict based on patterns in historical data. Historical attrition patterns are not random: they reflect organizational conditions that make certain groups of employees more likely to leave. If an organization has historically had poor retention of women due to inadequate parental leave or hostile workplace culture, a flight risk model will learn that the features associated with being a woman are predictive of departure. The model will then flag women as high flight risk — and the organization may respond by investing less in their development, passing them over for promotions that would improve retention, or managing them more closely. Each response increases the likelihood that the employee will indeed leave, creating a self-fulfilling prophecy that the model's accuracy reinforces.
Similar dynamics apply to employees with disabilities, employees who have filed accommodation requests, and employees who have participated in diversity advocacy or complaint processes. A flight risk model that flagged post-accommodation employees as "likely to leave" would be generating outputs that could support discriminatory management decisions — even if the model never explicitly considered disability status as a feature.
The Chilling Effect
Employees who know they are being monitored by predictive systems change their behavior. They may avoid legitimate activities — attending an informational meeting about their benefits, speaking with a recruiter at a professional conference, updating their LinkedIn profile — because they understand these actions may trigger algorithmic flags. This chilling effect has implications both for individual autonomy and for organizational culture. Organizations that rely heavily on flight risk AI may suppress the very signals — employees exploring their options, seeking market information, engaging with professional communities — that, if responded to constructively, could improve retention.
10.8 Disability, Accommodation, and AI Hiring Tools
The intersection of AI hiring tools and disability rights warrants dedicated attention because the ADA's reasonable accommodation requirement creates specific legal obligations that many AI tools are poorly designed to meet.
The Reasonable Accommodation Requirement in Hiring
Under the ADA, employers must provide reasonable accommodation to qualified individuals with disabilities to enable them to participate in the application process and demonstrate their qualifications for the position. Reasonable accommodations in traditional hiring might include providing applications in accessible formats, allowing additional time on timed assessments, or offering alternative interview formats.
AI hiring tools create new accommodation challenges because their design typically assumes a standardized candidate interaction. A video interview tool assumes the candidate can see and respond to video prompts, can be filmed by webcam, and can be evaluated on facial and vocal signals. A timed cognitive test assumes the candidate can complete tasks within the standard time limit. An AI-proctored assessment assumes the candidate's eye movements and head position conform to the expected patterns of a non-cheating test-taker. Each of these assumptions can be violated by a disability.
Specific Tools and Specific Disabilities
Video interview tools and autism spectrum disorder: As described in Section 10.4, ASD is associated with atypical facial affect and different patterns of eye contact — features that video analysis systems are calibrated to evaluate. A candidate with ASD who would be an excellent employee may score poorly on a video interview tool simply because their facial presentation does not conform to the neurotypical norm the system was trained on. An ADA-compliant alternative must be available.
Timed cognitive tests and ADHD: Attention-deficit/hyperactivity disorder affects processing speed and sustained attention under standardized conditions. Timed cognitive tests may underestimate the abilities of candidates with ADHD who would perform well in actual job conditions — conditions that include more flexibility, movement, and the ability to structure one's own time. Extended time accommodations are a well-established ADA accommodation for educational testing; similar accommodations must be available for employment testing.
AI proctoring and physical disabilities: AI proctoring tools that monitor candidates during assessments — flagging suspicious head movements, irregular eye gaze, or off-camera glances — may systematically flag behaviors associated with physical disabilities. Candidates with conditions affecting head control, eye muscle function, tremor, or posture may be penalized by proctoring systems calibrated on non-disabled behavioral norms.
Voice analysis and speech differences: Vocal analysis tools are calibrated on the speech patterns of their training populations, which historically overrepresent speakers of mainstream American English without speech differences. Candidates with stuttering, accents associated with certain first-language backgrounds, hearing-related speech differences, or voice changes associated with certain medical conditions may score poorly on vocal analysis assessments that have nothing to do with job performance.
EEOC Guidance and Design Imperatives
The EEOC's 2023 technical assistance guidance specifically addressed AI assessment tools and disability accommodation. The guidance affirmed that:
- Employers must provide reasonable accommodation for AI-based assessments, just as for traditional assessments
- Failure to provide an alternative assessment pathway for candidates who cannot use the primary AI tool due to disability may constitute an ADA violation
- Employers cannot rely on vendor assurances that their tool is ADA-compliant without independent verification
The design implication is significant: AI assessment tools should be developed with universal design principles from the outset — designing for the full range of human diversity rather than retrofitting accommodations after the fact. This means testing tools across populations that include individuals with ASD, physical disabilities, speech differences, anxiety disorders, and other conditions before deployment, not after civil liberties complaints arrive.
10.9 Building Ethical AI Hiring Practices
Having mapped the risks, the question for HR professionals and executives is: what does responsible AI hiring look like in practice? The answer is not "avoid AI in hiring" — these tools offer genuine efficiencies that purely manual processes cannot match at scale. The answer is: deploy AI with appropriate safeguards, ongoing monitoring, human oversight, and transparency.
Vendor Due Diligence: Questions You Must Ask
Before deploying any AI hiring tool, organizations should require clear, documented answers to the following questions from any vendor:
Validity and scientific foundation:
- What does your tool claim to measure? What is the peer-reviewed scientific basis for the claim that this measurement predicts job performance?
- Can you provide independent validation studies — conducted by researchers without financial relationship to your company — demonstrating criterion validity for roles similar to the ones we are filling?
- How was the validation data collected? Does the validation sample reflect demographic diversity comparable to the candidate population we will be assessing?
Adverse impact:
- Can you provide adverse impact data — disaggregated by race, gender, age group, and disability status — from the tool's deployment across your client base?
- What is the adverse impact ratio at standard score thresholds for each protected group covered by Title VII and the ADEA?
- Have any of your clients experienced EEOC charges or litigation related to adverse impact from this tool? How were those resolved?
Accommodation:
- What alternative assessment pathways are available for candidates who cannot use the primary tool due to disability?
- Has your accommodation process been reviewed by legal counsel with ADA expertise?
- How do candidates request accommodation, and what is the typical response time?
Audit and transparency:
- Do you conduct annual bias audits? By what methodology? Are results published or available to clients?
- Are you compliant with NYC Local Law 144 requirements?
- What data about candidates is retained, for how long, and under what security conditions?
Job-Related Criteria: The Foundation
Every assessment criterion used in hiring — by AI or by humans — should be traceable to a documented job analysis that establishes its relationship to actual job requirements. A job analysis is a systematic process of identifying the tasks, responsibilities, knowledge, skills, abilities, and other characteristics required for a role. Criteria not supported by job analysis should not be in the screening system.
This principle provides legal protection (the EEOC's Uniform Guidelines require job-relatedness), guides vendor evaluation (ask vendors to show how their tool maps to job analysis for your specific role), and focuses screening on what actually matters for performance.
Audit Requirements: Monitoring What Your System Is Doing
Organizations should conduct regular adverse impact analyses at every stage of their hiring pipeline, not just at the final hiring decision. Funnel analysis — tracking selection rates for protected groups at each stage from application to offer — can reveal where disparate impact is occurring and whether it is concentrated at a particular tool or stage.
NYC Local Law 144 requires annual independent bias audits for automated employment decision tools used in New York City. Even for organizations not subject to this law, the requirements provide a useful template:
- Calculate selection rates by gender and race/ethnicity at each tool-mediated stage
- Apply the four-fifths rule to identify potential adverse impact
- Document methodology, sample sizes, and findings
- Publish results publicly (or at minimum make them available to candidates upon request)
- Act on findings — audit without remediation is audit theater
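The funnel calculation behind such an audit is mechanical: for each tool-mediated stage, compute each group's pass-through rate, divide by the highest group's rate, and flag ratios below 0.8. A minimal sketch in Python — the stage names, group labels, and counts are hypothetical, chosen to show how disparate impact can concentrate at one stage:

```python
# Hypothetical pipeline counts by stage and group (illustrative, not real data).
funnel = {
    "application":   {"group_a": 1000, "group_b": 800},
    "resume_screen": {"group_a": 400,  "group_b": 240},
    "assessment":    {"group_a": 200,  "group_b": 90},
    "offer":         {"group_a": 50,   "group_b": 20},
}

stages = list(funnel)
for prev, cur in zip(stages, stages[1:]):
    # Stage-level selection rate: survivors of this stage / entrants to it.
    rates = {g: funnel[cur][g] / funnel[prev][g] for g in funnel[cur]}
    top = max(rates.values())
    for g, rate in rates.items():
        ratio = rate / top
        status = "FLAG" if ratio < 0.8 else "ok"
        print(f"{prev} -> {cur}: {g} rate {rate:.2f}, impact ratio {ratio:.2f} [{status}]")
```

With these illustrative numbers, the résumé screen and the assessment stage both fail the four-fifths check while the final offer stage passes — exactly the pattern that an end-of-pipeline-only audit would miss.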
Human Oversight: Where Humans Must Remain in the Loop
AI hiring tools should inform human decisions, not replace them. For high-stakes decisions — final selection from a shortlist, any decision that results in rejection without human review of the candidate's qualifications, decisions affecting employees at risk of termination — humans must be genuinely involved, not merely signing off on AI outputs.
Chapter 18 will address responsibility allocation in AI decision-making in full; the principle to apply here is that accountability cannot be delegated to an algorithm. The manager or recruiter who makes the final call bears responsibility for that decision, and discharging that responsibility requires actual engagement with the candidate's qualifications rather than passive acceptance of a score.
Documentation: Building the Audit Trail
Legal defensibility and ethical accountability both require documentation. Organizations should maintain records of:
- The criteria and weights applied to each AI screening stage
- The adverse impact analysis results at each audit cycle
- The accommodation requests received and how they were handled
- The vendor contracts, including what representations were made about validity and adverse impact
- The human override decisions — when humans overrode AI recommendations and why
Documentation serves two distinct purposes: it enables retrospective accountability when something goes wrong, and it creates the prospective discipline of requiring that hiring decisions be articulable and defensible.
Transparency with Candidates
Candidates have a legitimate interest in knowing that AI is involved in evaluating their applications, what that AI is doing, and what recourse is available if they believe it has made an error. NYC Local Law 144 requires candidate notification; in the EU, GDPR Article 22 restricts solely automated decision-making, and Articles 13 and 14 require that candidates be informed when it is used.
Beyond legal compliance, transparency builds the trust that enables candidates to bring their authentic selves to the process. A candidate who learns, after a rejection, that their video was analyzed by a facial expression algorithm has every reason to feel their privacy was violated and their agency disrespected. A candidate who knows in advance that this analysis is occurring and can opt for an alternative pathway has been treated with the respect owed to every applicant.
Inclusive Design: Testing Before Deployment
AI hiring tools should be tested across demographic groups that reflect the candidate population before deployment, not after problems emerge. Testing should include candidates with disabilities, candidates from diverse racial and ethnic backgrounds, candidates with non-standard educational or career paths, and candidates whose language backgrounds may affect voice or written language assessments.
This testing should be conducted by the vendor and independently verified by the employer. The specific populations tested, the sample sizes, and the adverse impact findings should be documented and available for employer review as part of vendor due diligence.
Looking Ahead: Chapters 18 and 19
The practices described in this section represent the operational layer of ethical AI hiring. The broader questions of who is accountable when AI hiring tools cause harm, how audit processes should be structured, and what liability frameworks govern AI discrimination are addressed in Chapters 18 and 19. Readers who have encountered the tensions in this chapter — between innovation efficiency and harm prevention (T2), between stated values and actual practice (T3), between the convenience of AI tools and the dignity owed to job seekers — will find those tensions given full theoretical and legal treatment in Part 4.
What this chapter establishes is the practical foundation: the specific tools, specific bias mechanisms, specific legal requirements, and specific organizational practices that define the difference between AI hiring that serves organizational goals and AI hiring that creates liability, reproduces historical injustice, and fails the people it is supposed to evaluate fairly.
Discussion Questions
- The vendor liability question. The EEOC has affirmed that employers are liable for discrimination caused by AI tools they purchase from vendors, even if the vendor designed the problematic feature. A colleague argues that this places an unfair burden on employers who lack the technical expertise to audit AI systems. Another argues that employer liability is appropriate because employers have the greatest power to demand better tools from vendors. Which position do you find more persuasive, and what are the policy implications of each?
- The HireVue dilemma. When HireVue dropped facial expression analysis in 2021, the company did not refund clients who had used the technology, acknowledge that prior screening decisions may have been discriminatory, or contact candidates who had been rejected based on facial analysis scores. Evaluate this response from the perspectives of legal liability management and genuine ethical accountability. What would a genuinely accountable response have looked like?
- The efficiency-equity tension. AI résumé screening allows companies to process hundreds of thousands of applications that human reviewers could not evaluate. The efficiency benefit is real. But the same scale that makes AI screening efficient makes its bias effects massive. A screening system with 5 percent bias operating on 1 million applications a year affects 50,000 people. How should organizations weigh the efficiency benefit against the harm-at-scale risk? Is there a threshold at which efficiency gains do not justify disparate impact?
- Global variation in AI hiring regulation. NYC Local Law 144 requires bias audits for AI hiring tools used in New York City. The EU AI Act requires human oversight and documentation for all high-risk AI employment tools used in Europe. Many jurisdictions have no specific requirements. A global company faces a choice: apply the most stringent requirements globally, apply jurisdiction-specific requirements, or apply a common minimum. What are the ethical considerations in this choice, and what would you recommend?
- The flight risk prediction question. Your organization's HR analytics team has implemented a flight risk prediction tool that, they report, is 80 percent accurate at predicting which employees will leave in the next 12 months. The CHRO proposes using the scores to prioritize retention investments — spending more development resources on employees scored as high flight risk. A senior manager objects that this approach will reproduce historical inequities by targeting resources at groups that have historically left at higher rates. Who has the stronger argument, and how would you design a more equitable retention investment strategy?
- Accommodations and competitive fairness. A candidate with autism spectrum disorder (ASD) requests an alternative to HireVue's video interview process, citing disability. You offer a structured phone interview instead. Another candidate argues that this creates an unfair advantage — the ASD candidate gets a more personalized assessment while others are scored by the AI. How do you respond to this objection, and what does it reveal about how we think about "fairness" in hiring?
- The culture fit trap. Your organization is considering an AI tool that predicts "culture fit" by comparing candidate assessment profiles to those of your current highest-performing employees. The vendor claims the tool will help you hire people who thrive in your environment. A diversity and inclusion officer raises concerns that the tool will effectively clone your current workforce's demographics. The vendor says their tool has been validated against performance, not demographics. How would you evaluate the vendor's claim, and what additional information would you require before making a deployment decision?
Chapter 10 is part of Part 2: Bias and Fairness. The next chapter examines bias in financial services and credit systems — a domain where many of the algorithmic mechanisms are identical to those in hiring but the legal framework and organizational actors are different. Chapter 18 returns to questions of accountability and responsibility allocation that this chapter has raised but not fully resolved.