Case Study 10.2: Résumé Screening at Scale — Opportunity and Risk in Automated Applicant Tracking

Chapter 10: Bias in Hiring and HR Systems


Overview

Automated applicant tracking systems (ATS) are the invisible infrastructure of modern hiring. When a candidate submits an application online at a Fortune 500 company — through the company's careers portal, through LinkedIn, through Indeed — their résumé almost certainly enters an ATS before any human reads it. The system parses the document, extracts structured information, scores the application against requirements, and either advances or filters the candidate.

Industry surveys consistently report that 99 percent of Fortune 500 companies use ATS platforms. Estimates of the percentage of résumés filtered out before human review range from 70 to 80 percent. At that scale, the bias embedded in these systems — in their keyword models, their formatting requirements, their prestige weights — is not merely a technology problem. It is a labor market problem.

This case study examines how ATS systems work, where bias enters the pipeline, what the documented adverse impacts are, and what better résumé screening looks like. It uses the experience of major vendors and the research of labor economists to ground an analysis that has direct relevance to any organization that receives more applications than it can manually review — which, in practice, means almost every large employer.


1. The Scale Problem: Why Automation Became Inevitable

The numbers are startling. Amazon receives roughly 1 million applications per year. Google processes hundreds of thousands. Even mid-tier employers in competitive markets receive thousands of applications for popular roles. A human recruiting team cannot meaningfully read every application; the cost and time would be prohibitive.

The ATS arose to solve this problem. The first generation of ATS platforms — Taleo (founded 1996, acquired by Oracle in 2012), BrassRing (acquired by Kenexa in 2006, which IBM in turn acquired in 2012), and PeopleSoft (acquired by Oracle in 2005) — were primarily record-keeping systems: databases that replaced the physical filing of paper applications. The screening functionality was initially rule-based: minimum education requirements, minimum years of experience, required certifications. Candidates who did not check the boxes were automatically filtered out.

The second generation added NLP and machine learning. Systems could now parse unstructured résumé text, infer structure even from inconsistent formatting, extract skills from contextual language, and rank candidates rather than merely filter them. Vendors including Workday, SAP SuccessFactors, Greenhouse, iCIMS, Lever, and Jobvite brought ATS functionality to organizations of all sizes. The promise extended from "manage applications efficiently" to "identify the best candidates automatically."

This evolution created a market with an inherent tension. The efficiency benefit of automated screening is real and quantifiable — cost per hire, time to fill, recruiter bandwidth. The bias risk is diffuse and often invisible to the organizations experiencing it — manifesting in diversity outcomes, turnover patterns, and EEOC charges that are attributed to other causes or absorbed as acceptable costs.


2. How ATS Systems Work: Parsing, Scoring, and Ranking

Modern ATS platforms operate in several stages:

Document Parsing

When a candidate submits a résumé, the ATS parses the document — extracting structured data from unstructured text. The parser attempts to identify:

  • Contact information (name, email, phone, location)
  • Education history (institutions, degrees, fields of study, graduation dates)
  • Employment history (employers, job titles, dates, responsibilities)
  • Skills (explicitly listed skills and skills inferred from job descriptions)
  • Certifications, languages, and other credentials

Parsing quality varies substantially across vendors and across résumé formats. Well-formatted standard résumés in common formats (Word documents, simple PDFs) parse more reliably than creative-format résumés, résumés with tables or graphics, résumés created in design software (Canva, InDesign), or résumés following formatting conventions from other countries (including CVs with photos, two-page professional summaries common in European contexts, or chronology formats different from reverse-chronological).

When parsing fails — when the ATS cannot extract structured data from a résumé — the system either rejects the document outright or advances it with missing fields. In the former case, a qualified candidate is eliminated before any human review for a reason having nothing to do with their qualifications.
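The failure mode described above can be made concrete with a minimal sketch. The patterns and the two sample résumés below are invented for illustration; production parsers use trained models and far richer heuristics, but the structural weakness is the same: fields the patterns do not anticipate are silently dropped.

```python
import re

# Illustrative extraction patterns only — not a production parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
SECTION_RE = re.compile(r"^(education|experience|skills|certifications)\b",
                        re.IGNORECASE | re.MULTILINE)

def parse_resume(text: str) -> dict:
    """Return whatever structured data the patterns can recover."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    sections = [m.group(1).lower() for m in SECTION_RE.finditer(text)]
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "sections": sections,
    }

standard = """Jane Doe
jane.doe@example.com | (555) 123-4567

EXPERIENCE
Software Engineer, Acme Corp, 2019-2024

EDUCATION
B.S. Computer Science, State University, 2019
"""

creative = """JANE DOE * jane.doe@example.com
WHERE I'VE WORKED
Acme Corp / Software Engineer / 2019-2024
"""

# The standard resume parses cleanly; the creative one yields no sections,
# so its work history is invisible to every downstream scoring stage.
print(parse_resume(standard)["sections"])
print(parse_resume(creative)["sections"])
```

The point of the sketch is that the creative-format candidate is not scored low; they are never scored at all, because the heading "WHERE I'VE WORKED" falls outside the parser's vocabulary of section names.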

Keyword Matching and Scoring

Once parsed, résumés are scored against job requirements. The basic mechanism is keyword matching: the system searches for the presence (and sometimes frequency) of terms specified in the job requirements — skills, certifications, tools, role titles, industry terms — and scores candidates on how many required and preferred terms appear in their résumé.

More sophisticated systems use semantic matching — identifying conceptually related terms rather than exact keyword matches — and machine learning scoring that weighs factors beyond keywords based on patterns in historical data. A Workday or Greenhouse implementation might learn that candidates who have worked at certain companies, attended certain universities, or used certain terminological patterns have been hired and retained successfully, and weight those factors accordingly.

The output is typically a ranked list or scored pool. Recruiters often see only candidates above a certain threshold, or only the top N candidates from the scored pool. Candidates below the threshold may receive automatic rejection notices — sometimes within seconds of application submission.
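The keyword stage can be sketched in a few lines. Everything below — the required and preferred term sets, the weights, the threshold, and the two candidate snippets — is fabricated for illustration, but the mechanism mirrors the description above: count term hits, combine into a score, rank, and cut at a threshold.

```python
import re

# Hypothetical requisition terms and weights for a data-engineering role.
REQUIRED = {"python", "sql", "etl"}
PREFERRED = {"airflow", "spark"}

def keyword_score(resume_text: str) -> float:
    """Weighted fraction of required and preferred terms present."""
    tokens = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    required_hit = len(REQUIRED & tokens) / len(REQUIRED)
    preferred_hit = len(PREFERRED & tokens) / len(PREFERRED)
    return 0.7 * required_hit + 0.3 * preferred_hit  # illustrative weights

candidates = {
    "A": "Built ETL jobs in Python and SQL, orchestrated with Airflow",
    "B": "Built data pipelines in Python and SQL",  # same work, different words
}
scores = {name: keyword_score(text) for name, text in candidates.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
THRESHOLD = 0.6  # illustrative cutoff
advanced = [name for name, s in ranked if s >= THRESHOLD]
print(ranked, advanced)
```

Candidate B describes the same work as candidate A but says "data pipelines" instead of "ETL", misses the required-term match, and falls below the cutoff — a small-scale preview of the vocabulary problem discussed in the next section.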


3. The Keyword Bias Problem: What Keywords Encode

Keyword screening appears neutral. It looks for evidence of relevant skills and experience — a legitimate hiring purpose. The bias enters in what counts as evidence, who has been able to produce that evidence, and how the system interprets ambiguous signals.

Vocabulary as Class Marker

Professional vocabulary is not uniformly distributed across the population. The terminology that ATS systems recognize and reward is learned through professional training, elite educational contexts, mentoring relationships, and professional networks. Candidates who have attended well-resourced universities, worked at large established firms, had access to professional development programs, or been coached by career advisors know the vocabulary of their target field in ways that candidates who have developed equivalent skills through different pathways may not.

A software engineer who developed the same skills through a coding bootcamp, community college courses, and self-directed learning may not use the same terminology as a graduate of a selective computer science program — not because their skills differ, but because they were not socialized into the same professional linguistic community. An ATS optimized for specific keywords will undercount the bootcamp graduate's qualifications even if both candidates could do the job equally well.

This dynamic creates structural disadvantage that compounds across time: candidates who lack access to the vocabulary-teaching channels are filtered out, reducing their opportunities to build experience in environments where they would acquire the vocabulary, perpetuating their exclusion from future rounds of keyword screening.

The Prestige Signal Problem

Many ATS implementations include scoring components based on employer name and educational institution. These components are typically not explicit weights but emerge from machine learning models trained on historical hire data: if Stanford graduates have historically been hired and retained at higher rates at a given company, the model learns that Stanford is a positive signal.

The problem is that Stanford enrollment reflects socioeconomic and racial stratification that has nothing to do with job performance. A candidate who attended Stanford had, on average, access to greater economic resources, stronger secondary educational preparation, and larger professional networks than an equally capable candidate who attended a less selective institution. The ATS model rewards these access advantages as proxies for performance, encoding socioeconomic and racial privilege into the screening score.

This is the proxy variable problem (Chapter 8) in one of its most consequential forms. Educational institution is a proxy for socioeconomic background, which correlates with race and other protected characteristics. Screening on this proxy, without evidence that it predicts job performance independently of socioeconomic background, is likely to produce disparate impact on racial and ethnic minorities and candidates from lower-income households.


4. Formatting Discrimination: When the Container Defeats the Content

Among the less-discussed but significant adverse impacts of ATS screening is formatting discrimination: the systematic disadvantage experienced by candidates whose résumé format, while coherent to a human reader, is poorly interpreted by automated parsing systems.

Creative and Non-Standard Formats

Candidates in creative fields — graphic designers, marketing professionals, UX researchers, architects — often use résumé formats that demonstrate their design skills. These formats may include visual elements, non-standard section layouts, typography-heavy design, or content embedded in graphics rather than standard text fields. ATS parsers trained on standard business-format résumés may fail to extract information from these documents, causing qualified creative-field candidates to be filtered out before their qualifications are assessed.

This creates a particular irony: candidates whose skills include the ability to communicate effectively through design are penalized by a system that cannot process design.

International Résumé Conventions

CV formatting conventions differ substantially across countries and professional cultures. In Germany, CVs traditionally include a photograph and personal information (date of birth, marital status) that US CVs exclude. Academic CVs used by researchers and scientists can run to many pages and organize experience by publication and presentation rather than employment history. Some cultures use chronological rather than reverse-chronological organization. Résumés from recent immigrants or international candidates may follow formats that ATS systems misparse.

This creates a systematic disadvantage for international candidates and recent immigrants — populations that are already navigating structural disadvantages in the labor market — that has nothing to do with their job qualifications.

PDF and Format Compatibility

Some ATS systems parse PDFs more reliably than Word documents, or vice versa. Some have difficulty with résumés that use tables, text boxes, or columns — common in modern résumé design templates widely available through Canva and similar platforms. Candidates who use popular résumé templates without knowing that the template format is incompatible with common ATS parsers are filtered out for a technical reason entirely unrelated to their qualifications.

Career coaches who work with job seekers in underserved communities note that their clients often use résumé templates accessed through free online platforms — templates that may be incompatible with ATS systems their more resourced counterparts have been advised to avoid. The information about ATS-compatible formatting is available but not uniformly distributed; access to this knowledge is itself a marker of having the right professional networks.


5. Documented Adverse Impacts from ATS Keyword Screening

The adverse impact of keyword-based screening is not merely theoretical. Several lines of evidence document its effects:

Career Gap Penalties at Scale

Research on career gap penalties in hiring has documented that women who take career breaks for caregiving return to job markets facing systematic screening disadvantage. A 2021 study using audit methodology — submitting matched résumés with and without caregiving gaps — found significant callback rate reductions for résumés with gaps, with the penalty disproportionately affecting women's caregiving gaps versus gaps framed as sabbaticals or self-employment. At ATS scale, this penalty operates before any human reviewer can exercise judgment about the context of the gap.

Name-Based Screening Disparities

Marianne Bertrand and Sendhil Mullainathan's 2004 audit study (replicated extensively since) demonstrated substantial name-based callback differences in human screening. Equivalent dynamics operate in ATS systems that retain candidate names during initial parsing and scoring. Systems that use name phonetics, name frequency patterns, or name-associated school records as features — whether intentionally or through emergent patterns in training data — can produce the same name-based disparities that human reviewers produce.

The scale difference is critical. A human recruiter with 500 applications may have name-based bias affect 20-30 outcomes. An ATS processing 50,000 applications with similar bias patterns affects 2,000-3,000 outcomes before a single human is involved.

Veteran and Non-Traditional Career Disadvantages

Veterans returning from military service bring substantial technical skills, leadership experience, and operational competencies. However, military résumé language — which uses military occupational specialty codes, branch-specific terminology, and rank-based role titles — is often poorly parsed by civilian ATS systems. A veteran whose experience in logistics management or cybersecurity operations is described in military terminology may score poorly on a civilian ATS that does not recognize the equivalences.

Similarly, candidates who have built careers through non-traditional pathways — apprenticeships, gig economy work, freelance consulting, community organizing — often have work histories that ATS systems are poorly calibrated to evaluate. The systems reward the structured, institutional career path that correlates with privilege.


6. The Gaming Problem: ATS Optimization and Its Consequences

Once job seekers became aware that ATS systems determine whether their résumés are seen by humans, an industry emerged to help candidates optimize their résumés for ATS detection. Career coaches, résumé writing services, and job search platforms now routinely advise candidates on ATS keyword optimization — tailoring résumé language to match job description keywords, using ATS-parseable formats, and including skill terms in ways that maximize scoring.

This gaming dynamic has several consequences:

The information asymmetry disadvantage. ATS optimization advice is available but not uniformly accessed. Candidates with access to career coaches, university career services, and professional networks learn to optimize their résumés for ATS detection. Candidates who lack these resources — first-generation college students, recent immigrants, workers transitioning from industries with different linguistic conventions — are at a consistent disadvantage.

The signal degradation problem. As candidates learn to optimize for ATS keywords, résumés become less informative and more homogeneous. The diversity of language that a hiring manager might use to identify unusual or exceptional backgrounds is flattened into a standard template that reflects what ATS systems reward rather than what candidates have actually done.

The fairness inversion. Paradoxically, ATS optimization can disadvantage candidates whose qualifications are strongest and most distinctive. A candidate with a genuinely unusual but highly relevant background may struggle to translate their experience into the keyword-dense format that ATS systems reward, while a candidate with more conventional but less distinguished credentials aces the keyword match.


7. What Better Résumé Screening Looks Like

The documented problems with keyword-based ATS screening do not require abandoning automated screening — the scale problem is real, and manual review of every application is not feasible. They require redesigning screening systems around validated criteria and continuous monitoring.

Structured Job Analysis as the Starting Point

Screening criteria should derive from systematic job analysis: a documented process of identifying what tasks the role requires and what knowledge, skills, and abilities those tasks demand. Criteria should be limited to those with demonstrable job-relatedness. Prestige proxies (school name, employer name) should be excluded unless there is specific evidence that they predict performance independent of other, more directly job-related criteria.

Blind Screening

Names, graduation years, addresses, and other demographic-adjacent identifiers should be removed from initial résumé screening. Several major ATS vendors now offer blind screening features. Implementation requires configuration and process discipline, but the bias reduction benefits are substantial and documented.
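A minimal sketch of the redaction step follows. The field names are assumptions — real ATS schemas differ by vendor — and the sketch deliberately shows the hard part: identifiers such as graduation years live inside free-text fields, so dict-level redaction alone is not enough.

```python
import re

# Hypothetical schema: which parsed fields to blank before screening.
BLIND_FIELDS = {"name", "email", "phone", "address", "graduation_year"}
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")  # four-digit years, 1900-2099

def blind(record: dict) -> dict:
    """Redact identifier fields, then scrub years from free-text education."""
    redacted = {k: ("[REDACTED]" if k in BLIND_FIELDS else v)
                for k, v in record.items()}
    if isinstance(redacted.get("education"), str):
        redacted["education"] = YEAR_RE.sub("[YEAR]", redacted["education"])
    return redacted

record = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "education": "B.S. Computer Science, State University, 2019",
    "skills": ["python", "sql"],
}
print(blind(record))
```

Even this toy version illustrates why vendors describe blind screening as requiring "configuration and process discipline": every free-text field is a potential leak of the identifiers the structured redaction removed.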

Validated Semantic Matching

Keyword matching systems that have been validated against job performance — rather than trained on past hiring decisions alone — are significantly preferable to raw keyword frequency scoring. Validation should include adverse impact analysis across protected groups.
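One simple form of semantic matching is an equivalence map that credits alternative phrasings of the same competency. The map below is a hypothetical stand-in; in practice such mappings should come out of validated job analysis, not ad hoc guesses.

```python
# Hypothetical equivalence map: canonical term -> accepted phrasings.
EQUIVALENTS = {
    "etl": ["etl", "data pipeline", "data pipelines"],
    "sql": ["sql", "postgresql", "mysql"],
}

def semantic_hits(resume_text: str, required: set) -> set:
    """Return the required terms evidenced by any equivalent phrasing."""
    text = resume_text.lower()
    return {term for term in required
            if any(phrase in text for phrase in EQUIVALENTS.get(term, [term]))}

# The candidate who wrote "data pipelines" now gets credit for ETL.
print(semantic_hits("Built data pipelines in Python and SQL",
                    {"python", "sql", "etl"}))
```

The design choice worth noting is that the equivalence table is itself a policy artifact: whoever curates it decides whose vocabulary counts, which is exactly why the text insists that it be validated against performance rather than inherited from past hiring language.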

Continuous Adverse Impact Monitoring

Organizations should calculate selection rates by race, gender, age group, and disability status at every stage of screening — including the ATS filter stage. The four-fifths rule provides a practical threshold for identifying stages with potential adverse impact. When adverse impact is identified, the criteria generating it should be reviewed and either justified by job-relatedness evidence or removed.
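The four-fifths check at a single screening stage reduces to a few lines of arithmetic: compute each group's selection rate, divide by the highest group's rate, and flag ratios below 0.8. The group labels and counts below are fabricated for illustration.

```python
def impact_ratios(counts: dict) -> dict:
    """counts maps group -> (advanced, applied).

    Returns, per group, the ratio of its selection rate to the
    highest-rate group's, and whether that ratio clears 0.8.
    """
    rates = {g: adv / applied for g, (adv, applied) in counts.items()}
    top = max(rates.values())
    return {g: (rate / top, rate / top >= 0.8) for g, rate in rates.items()}

# Fabricated counts at the ATS filter stage:
# group_a advances 120 of 400 (30%); group_b advances 45 of 250 (18%).
stage_counts = {"group_a": (120, 400), "group_b": (45, 250)}
print(impact_ratios(stage_counts))
# group_b's ratio is 0.18 / 0.30 = 0.6 < 0.8, so this stage is flagged.
```

Running the same calculation at every stage — parse, keyword filter, recruiter review, interview — is what distinguishes continuous monitoring from a one-time audit: adverse impact at the ATS filter stage is invisible if rates are only computed on final hires.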

Human Review of Edge Cases

Candidates near the ATS threshold — those just below the cutoff score — should receive human review before rejection. These are the candidates most likely to be affected by formatting issues, vocabulary differences, and other technical factors rather than qualifications differences. A small amount of human attention at this edge can prevent a substantial number of qualified candidates from being systematically filtered out.
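The routing rule can be sketched as a three-way split: auto-advance well above the cutoff, auto-reject well below, and send the band just under the threshold to a human. The threshold and band width here are illustrative choices, not recommended values.

```python
THRESHOLD = 0.60    # illustrative cutoff score
REVIEW_BAND = 0.05  # width of the human-review band below the cutoff

def route(score: float) -> str:
    """Return the disposition for a scored candidate."""
    if score >= THRESHOLD:
        return "advance"
    if score >= THRESHOLD - REVIEW_BAND:
        return "human_review"   # near-miss: likely formatting/vocabulary noise
    return "reject"

for s in (0.72, 0.58, 0.31):
    print(s, route(s))
```

The band width is the operational lever: widening it sends more near-misses to humans at higher review cost, narrowing it restores the pure automated cutoff. Tuning it against the adverse impact monitoring described above lets an organization spend scarce recruiter attention exactly where the automated score is least trustworthy.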


8. International Variation: EU AI Act Requirements

The EU AI Act classifies AI systems used in recruitment and selection as high-risk, triggering the Act's most demanding requirements. Organizations operating in European markets or screening EU candidates must meet requirements including:

  • Risk assessment before deployment, documented and updated throughout the system's lifecycle
  • Training data governance, including documentation of data sources and bias mitigation measures taken
  • Technical documentation enabling competent authorities to assess compliance
  • Transparency with candidates about the use of AI in their assessment, including the right to explanation under GDPR Article 22
  • Human oversight mechanisms that allow qualified humans to override AI assessments
  • Post-market monitoring, including incident reporting

These requirements are substantially more demanding than anything currently required in the United States outside of New York City. Organizations that operate globally must meet the EU standard for their EU operations; many are choosing to apply the EU standard broadly as a common minimum.

The contrast between US and EU regulatory environments illustrates the global variation theme (T5) that runs throughout this textbook. The same ATS system deployed in Chicago and Frankfurt may face dramatically different legal requirements, creating both compliance complexity for multinational organizations and very different levels of protection for job seekers in different jurisdictions.


9. The Path Forward: Screening for Competence, Not Credential

The deeper problem with ATS-based résumé screening is not technical — it is conceptual. Most screening systems are built to find candidates who look like candidates who have been hired and succeeded before. This backward-looking model systematically replicates historical patterns, including historical demographic patterns, and excludes candidates who have developed relevant capabilities through non-traditional pathways.

The forward-looking alternative is to screen for demonstrated competence rather than inferred competence from credentials. Work sample tests, structured situational judgment assessments, and skills-based screening — asking candidates to demonstrate job-relevant skills rather than describe credentials associated with job-relevant skills — have consistently stronger predictive validity than credential-based screening and, when designed with adverse impact in mind, can reduce demographic disparate impact while maintaining or improving predictive power.

This shift is not merely ethical; it is strategic. Organizations that continue to screen based on credential proxies in a labor market where talent is increasingly distributed across non-traditional pathways will lose competitive ground to organizations that can identify high performers wherever they come from.

The tools to build better systems exist. What is often missing is the organizational will to invest in validation, to commit to ongoing monitoring, and to accept the short-term friction of changing established processes in exchange for the long-term benefit of screening that works for both organizations and candidates.


Discussion Questions

  1. The "formatting discrimination" problem — ATS systems that cannot parse non-standard résumé formats — disproportionately affects candidates from creative fields, international backgrounds, and lower-income households who use free design templates. Is this a bias problem in the usual sense, or is it simply a technical incompatibility? Does the distinction matter ethically?

  2. Career coaches now routinely advise job seekers to optimize their résumés for ATS keyword detection. This advice is primarily accessible to candidates who can afford career coaching or attend well-resourced universities with career services. How does the existence of this information asymmetry change your assessment of the fairness of ATS-based screening?

  3. An employer argues that using machine learning screening trained on successful historical hires is not discriminatory — it is simply using the best available predictor of future success. Construct the strongest possible counter-argument, drawing on the concepts of historical bias, proxy variables, and the four-fifths rule.

  4. The EU AI Act requires organizations to document their ATS training data governance and demonstrate bias mitigation measures. Suppose you are the HR technology director for a global company. Design a 12-month roadmap for bringing your ATS into EU AI Act compliance, identifying the key decisions, stakeholders, and milestones.


Related: See Chapter 7 (Amazon's hiring algorithm — the canonical ATS bias case); Chapter 8 (proxy variables and historical bias); Chapter 9 (the four-fifths rule and adverse impact measurement); Chapter 33 (EU AI Act compliance requirements).