Case Study 2: Raj's Job Description Audit

Finding Bias Before Posting

Persona: Raj (Software Developer / Team Lead)
Domain: Hiring, team building, job description writing
Bias Type: Gendered language in job descriptions; occupational stereotype defaults
Detection Method: Substitution test on ideal-candidate descriptions; gender decoder tool
Outcome: Revised job descriptions; bias audit workflow adopted for all hiring materials


Background

Raj had recently moved into a team lead role and was hiring for two positions: a senior software engineer and a junior UX researcher. He was actively trying to improve the team's gender balance: like many engineering organizations, it was heavily male, and his manager had explicitly made diversity a hiring priority.

He asked for AI assistance with the job descriptions, knowing that the language in job descriptions has documented effects on who applies. He had read about research showing that masculine-coded language in tech job postings reduces applications from women, and he wanted to produce neutral, broadly appealing descriptions.

He ran his prompts: "Write a job description for a senior software engineer position at a mid-size software company focusing on backend development and API design. The role requires 5+ years of experience, strong system design skills, and experience with cloud infrastructure."

And: "Write a job description for a junior UX researcher position at the same company. The role involves conducting user interviews, synthesizing qualitative data, and presenting findings to the product team."


The Initial Outputs

The two descriptions that came back were professionally formatted, comprehensive, and plausible. They hit the right structural notes: responsibilities, requirements, nice-to-haves, a brief company culture description.

Raj read them for quality and was satisfied. He almost put them directly into the applicant tracking system.

Before posting, he decided to run one additional check — he had read Chapter 31 and decided to apply the substitution test and the gender decoder as a pre-posting workflow step.

The gender decoder check:

He pasted both descriptions into a gender decoder tool (gender-decoder.katmatfield.com, which analyzes text for masculine- and feminine-coded language based on the research of Gaucher, Friesen, and Kay, 2011).

The senior software engineer description came back "masculine-coded"; the tool flagged seven masculine-coded words: "competitive," "dominant," "individual," "confident," "determined," "challenging," "analytical."

The UX researcher description came back slightly "feminine-coded"; the flagged words included "collaborative," "sensitive," "supportive," and "understanding."

Neither description was extreme, but both had drifted in the stereotyped direction: the engineering role toward masculine coding, the researcher role toward feminine coding, even though he had asked for neutral language.
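The check Raj ran can be approximated locally. The sketch below uses an illustrative subset of the Gaucher, Friesen, and Kay (2011) word stems; the lists used by the live decoder tool are longer, so treat this as a first-pass filter, not a replacement for the tool itself.

```python
import re

# Illustrative subsets of the Gaucher, Friesen, and Kay (2011) word stems.
# The full lists used by gender-decoder.katmatfield.com are longer.
MASCULINE_STEMS = ["compet", "domin", "individual", "confiden",
                   "determin", "challeng", "analy", "assert", "lead"]
FEMININE_STEMS = ["collab", "support", "understand", "sensitiv",
                  "interperson", "communal", "depend", "commit"]

def coded_words(text, stems):
    """Return the words in `text` that start with any of the given stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in words if any(w.startswith(s) for s in stems)})

def classify(text):
    """Label text by whichever coded-word list dominates."""
    m = coded_words(text, MASCULINE_STEMS)
    f = coded_words(text, FEMININE_STEMS)
    if len(m) > len(f):
        label = "masculine-coded"
    elif len(f) > len(m):
        label = "feminine-coded"
    else:
        label = "neutral"
    return label, m, f
```

Pasting a draft description into `classify` gives a rough directional reading before the posting ever reaches the decoder tool.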

The substitution test:

He ran a secondary check: he asked the AI, in fresh conversations for each role, to "describe the ideal candidate for this position" — once using "he or she," once specifying "she," once specifying "he."

For the senior software engineer role:
- The "he" version described a candidate who "thrives in competitive environments," "drives architectural decisions," and "leads with technical authority."
- The "she" version described a candidate who "collaborates effectively," "communicates clearly to stakeholders," and "brings strong interpersonal skills alongside technical depth."

The described candidates had the same qualifications on paper. The framing was measurably different. The male candidate was characterized primarily through autonomous technical leadership; the female candidate through interpersonal and communicative competencies.

For the UX researcher role, the pattern shifted rather than reversed: both versions were framed more collaboratively than the engineering descriptions, but the "he" version emphasized "persuasion" and "influencing product decisions" more strongly than the "she" version did.
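The substitution test itself can be scripted so that the pronoun is the only thing that varies across runs. The prompt template, pronoun list, and stopword set below are illustrative assumptions, not Raj's exact wording; the comparison step just surfaces the vocabulary unique to each response.

```python
# Minimal substitution-test harness: build the pronoun variants of one
# prompt, then compare any two responses by the words unique to each.
# The template wording and stopword list are illustrative, not canonical.

PROMPT_TEMPLATE = ("Describe the ideal candidate for this senior software "
                   "engineer position. Refer to the candidate as {pronoun}.")

def prompt_variants(pronouns=("he or she", "she", "he")):
    """One prompt per pronoun, identical except for the substituted term."""
    return {p: PROMPT_TEMPLATE.format(pronoun=p) for p in pronouns}

def distinctive_words(text_a, text_b,
                      stopwords=frozenset({"the", "a", "and", "to",
                                           "of", "in", "with", "who"})):
    """Words appearing only in A's response, and only in B's response."""
    wa = {w.strip(".,").lower() for w in text_a.split()} - stopwords
    wb = {w.strip(".,").lower() for w in text_b.split()} - stopwords
    return sorted(wa - wb), sorted(wb - wa)
```

Feeding the "he" and "she" responses into `distinctive_words` makes the framing gap concrete: any term in one list but not the other is attributable to the pronoun, since nothing else changed.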


The Revision Process

Raj revised both descriptions using explicit instructions:

"Rewrite this job description using gender-neutral language throughout. Avoid masculine-coded terms (competitive, dominant, individual, determined). Use direct, skills-focused language. Describe what the person will do and what skills they need, without coded language about personality traits or work style. Do not use terms that are historically associated with either gender as professional traits."

He ran the revised versions through the gender decoder. Both came back as "neutral" — no significant skew in either direction.

He also reviewed both for what he called "implied candidate profile" — the picture of the person you imagined while reading. The revised software engineer description was more specific about system design and API architecture competencies and less about personality traits like "thriving in challenges." The revised UX researcher description was more focused on methodological skills (interview design, synthesis frameworks, presentation of findings) and less on relationship-oriented language.

Both descriptions were more accurate to what the roles actually required.


The Evaluation Rubric Check

Raj extended the audit to the evaluation rubric he planned to use for resume screening.

He asked the AI to generate evaluation criteria for each role. He then ran the substitution test on the rubric itself: he gave the AI two identical resumes for the software engineering role, varied only by the applicant's name (James Chen vs. Jennifer Chen), and asked it to evaluate both against the rubric.

The evaluations were not dramatically different. But there were consistent marginal differences: "James" received characterizations like "technically confident" and "demonstrates clear ownership of complex systems." "Jennifer" received characterizations like "shows strong problem-solving skills" and "communicates technical work clearly." Both were positive, but the framing was different — one was characterized primarily through ownership and confidence, the other primarily through communication and methodology.
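The name-swap setup is simple enough to script, which also guards against accidentally editing anything besides the name between the two copies. A minimal sketch, with illustrative names and resume text:

```python
# Name-swap step of the rubric audit: produce two resume copies that
# differ only in the applicant's name, and verify that before sending
# both out for evaluation. Names and resume text are illustrative.

def name_swap(resume_text, original_name, swapped_name):
    """Return a copy of the resume with the applicant's name replaced."""
    assert original_name in resume_text, "name must appear in the resume"
    return resume_text.replace(original_name, swapped_name)

def only_difference_is_name(a, b, name_a, name_b):
    """True if replacing name_a with name_b in `a` yields exactly `b`."""
    return a.replace(name_a, name_b) == b

resume_james = ("James Chen\n8 years backend development; designed a "
                "multi-region API gateway; led a cloud migration.")
resume_jennifer = name_swap(resume_james, "James Chen", "Jennifer Chen")

assert only_difference_is_name(resume_james, resume_jennifer,
                               "James Chen", "Jennifer Chen")
```

The final check matters: if the two inputs differ in anything besides the name, differences in the evaluations can no longer be attributed to the name alone.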

Raj updated the rubric to specify behavioral and skills-based criteria without personality characterizations, and added an instruction to the review process: all candidates should be evaluated against the same explicit criteria, not against an implicit "ideal candidate" mental model.


What He Shared With the Team

Raj shared his findings with his manager and two other team leads who were also hiring.

He showed them the gender decoder results (before and after), the substitution test outputs, and the revised materials. His manager's reaction: "I had no idea job descriptions were doing this. I've been writing them this way for years."

They agreed to add two steps to the team's standard hiring process:
1. All job descriptions, AI-generated or human-written, to be run through the gender decoder before posting
2. All interview evaluation rubrics to specify behavioral and skills-based criteria explicitly, and to be reviewed for loaded personality-trait language

The process added approximately 20 minutes to the job description preparation workflow. Raj estimated that a 10% improvement in application diversity (a reasonable figure based on the gender bias literature for this type of intervention) was worth far more than 20 minutes of audit time per posting.


The Harder Conversation

Raj had a harder conversation with himself about the limits of the fix.

He had improved the job descriptions. He had applied the substitution test. He had revised the rubric. But the AI tool he was using for resume screening — a commercial ATS (Applicant Tracking System) with AI-assisted ranking — was a black box. He had no access to its bias audit results. He didn't know whether it applied different implicit scoring to different names, educational institutions, or career trajectory patterns.

He flagged this to his manager as an open issue: "We've fixed the job description language. I don't know if our screening tool is also applying these patterns. We might need to ask the vendor about their bias audit process."

This was the right question to ask — and one that many teams using commercial hiring tools haven't asked. The individual mitigation he had applied to the documents he controlled didn't extend automatically to the tools he used but didn't control.


Lessons

1. Requesting "neutral" language doesn't produce neutral language by default. The AI will produce language patterns associated with how the role is typically described in its training data — which reflects existing occupational demographics and their linguistic coding.

2. The gender decoder is a fast, practical tool that surfaces patterns invisible to casual reading. Twenty minutes and a free tool revealed a systematic issue in Raj's hiring materials that he would not have caught by reading.

3. The substitution test extends beyond the job description to any content that will be used to evaluate candidates. Evaluation rubrics, screening criteria, and interview scorecards all carry the same bias risk as job descriptions.

4. Small language differences compound at hiring scale. A job description that subtly signals a preferred demographic reduces the application pool from the start. The bias doesn't have to be large to be consequential.

5. Individual document mitigation doesn't extend to black-box tools. Fixing what you write and review doesn't fix what commercial tools do in the background. Asking vendors about their bias audit processes is part of responsible deployment.

6. This work supports the hiring goal, not just the ethics goal. Raj was trying to build a more diverse team. The bias in his materials was directly undermining that goal. Bias mitigation in hiring contexts is not separate from performance goals — it is part of achieving them.


Related: Chapter 31, Section 3 (Occupational stereotyping), Section 5 (Bias in professional use — hiring), Section 6 (Substitution test), Section 7 (Explicit representation instructions, reviewing for default assumptions)

Return to: Case Study 1: Alex's Persona Problem — When AI Marketing Copy Had a Default Customer