Case Study 35-1: Joy Buolamwini and the Algorithmic Justice League
From a White Mask to a Movement
The Origin Story
Joy Buolamwini was a graduate student at MIT Media Lab in 2015 when she encountered a problem she could not ignore. Working on a project that used facial detection software to make a robot respond to human faces, she found that the software consistently failed to detect her face. It worked fine with the white faces she tested it on. It worked on her dark-skinned face only when she put on a white mask.
This experience — having to literally wear a white mask to be seen by facial detection technology — was a visceral encounter with algorithmic bias. Buolamwini, a Black woman born in Canada and raised in Mississippi, recognized it immediately as a technical manifestation of a deeper social problem: AI systems were being built that replicated and encoded the biases of the world they were trained on, then deployed as if they were neutral.
She began investigating systematically. What she found became the Gender Shades study.
The Research: Methodology and Findings
Working with Timnit Gebru (then a Microsoft researcher), Buolamwini developed a more rigorous approach to auditing facial analysis systems than any that had been previously applied to commercial AI.
The key methodological innovation was the Pilot Parliaments Benchmark (PPB): a dataset of photographs of parliamentarians from three African countries (Rwanda, Senegal, South Africa) and three European countries (Iceland, Finland, Sweden), designed to ensure roughly equal representation of men and women across darker and lighter skin tones, with skin tone labeled on the Fitzpatrick scale by a board-certified dermatologist.
Prior AI benchmarks — Labeled Faces in the Wild, IJB-A, others — were dominated by lighter-skinned individuals, reflecting the demographics of their sources (stock photography, celebrity images, academic data). By creating a demographically balanced benchmark, Buolamwini and Gebru could measure performance across demographic groups that existing benchmarks obscured.
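The audit logic itself is straightforward once the benchmark is balanced and labeled. Below is a minimal Python sketch of this kind of disaggregated evaluation; the record schema and the `predict` hook are illustrative assumptions, not the study's actual code or data format.

```python
# Disaggregated evaluation in the spirit of Gender Shades.
# The record fields and the `predict` callable are hypothetical stand-ins.
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Group = Tuple[str, str]  # e.g. ('darker', 'female')

def subgroup_error_rates(
    records: Iterable[dict],
    predict: Callable[[str], str],
) -> Dict[Group, float]:
    """Error rate per intersectional subgroup.

    Each record needs 'image' (a path or URL), 'gender' (ground truth),
    and 'skin' (Fitzpatrick type, binarized to 'lighter'/'darker').
    `predict` wraps a commercial gender-classification API.
    """
    totals: Dict[Group, int] = defaultdict(int)
    errors: Dict[Group, int] = defaultdict(int)
    for r in records:
        group = (r["skin"], r["gender"])
        totals[group] += 1
        if predict(r["image"]) != r["gender"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

def max_gap(rates: Dict[Group, float]) -> float:
    """Largest subgroup disparity, the study's headline statistic."""
    return max(rates.values()) - min(rates.values())
```

A single aggregate accuracy figure is just the weighted average of these subgroup rates, which is how a benchmark skewed toward lighter-skinned men can report high overall accuracy while concealing a large `max_gap`.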
The results, published in 2018, as gender classification error rates:

| System | Lighter-skinned men | Darker-skinned women | Gap |
|---|---|---|---|
| Microsoft | 0.0% | 20.8% | 20.8 pts |
| IBM | 0.3% | 34.7% | 34.4 pts |
| Face++ | 0.8% | 34.5% | 33.7 pts |
The maximum gap, 34.4 percentage points between the error rates for lighter-skinned men and darker-skinned women on IBM's system, was the study's headline finding. But the pattern was consistent across all three systems: error rates were lowest for lighter-skinned men and highest for darker-skinned women.
The Industry Response: Improvement Under Scrutiny
One of the most significant outcomes of the Gender Shades study was how quickly the affected companies improved their systems after publication.
Buolamwini and Inioluwa Deborah Raji conducted a follow-up audit, published in 2019 as "Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products":
- Microsoft reduced its error rate for darker-skinned women from 20.8% to 1.5% — a 93% reduction
- IBM reduced its error rate for darker-skinned women from 34.7% to 3.5% — a 90% reduction
- Face++ showed improvement but remained more disparate than the other systems
The rapid improvement demonstrated what Buolamwini called the "actionable" insight: the companies could have achieved better accuracy across demographic groups all along. They had not, because they were not measuring it and had no external pressure to do so. Public disclosure and accountability — not new technology — produced the improvement.
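The percentage reductions above are simple relative changes in subgroup error rate. A quick check of the arithmetic, using the error rates quoted above (a sketch; `relative_reduction` is a hypothetical helper, not part of the audit's published code):

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fractional drop in subgroup error rate between the two audits."""
    return (before_pct - after_pct) / before_pct

# Error rates on darker-skinned women, 2018 audit vs. 2019 follow-up:
print(f"Microsoft: {relative_reduction(20.8, 1.5):.0%}")  # 93%
print(f"IBM:       {relative_reduction(34.7, 3.5):.0%}")  # 90%
```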
The Algorithmic Justice League
From the Gender Shades research, Buolamwini founded the Algorithmic Justice League (AJL), an organization combining art and research to raise awareness about AI bias and advocate for equitable AI systems.
AJL's work spans multiple registers:
Research: Continued auditing of facial recognition and other AI systems. AJL researchers have documented accuracy disparities in additional systems and contexts, including emotion recognition AI, where audits have found significant racial bias, and age estimation systems.
Policy advocacy: AJL has testified before Congress, engaged with FTC enforcement proceedings, and contributed to legislative debates on facial recognition. Buolamwini's testimony has helped shape legislative proposals at both federal and state levels.
Artistic intervention: Buolamwini has combined research with poetry and performance — her spoken word poem "AI, Ain't I A Woman?" plays on Sojourner Truth's famous "Ain't I a Woman?" speech and confronts facial recognition systems with images of Black women historical figures (Sojourner Truth, Shirley Chisholm, Michelle Obama, Oprah Winfrey) that many commercial systems fail to identify correctly as women. The performance renders visible the same invisibility she experienced in her graduate student office.
Public education: AJL's "Safe Face Pledge" asks companies to commit to concrete safeguards, including not selling facial analysis technology for lethal applications or for lawless police surveillance. Its educational outreach has reached broad audiences, most visibly through the 2020 documentary Coded Bias, which follows Buolamwini's work.
The Structural Argument
Buolamwini and Gebru's research is sometimes framed as a technical accuracy problem — fix the training data, achieve balanced accuracy, problem solved. Buolamwini herself has consistently pushed for a deeper structural analysis.
Her argument: the accuracy disparities are not merely technical errors but symptoms of who builds AI systems, whose interests are centered in their development, and who bears the costs when they fail.
The AI industry is demographically homogeneous — predominantly white and male, particularly in technical leadership. The benchmark datasets that have historically been used to measure and celebrate AI systems were built from sources that over-represented those demographics. Systems were declared "accurate" without accounting for their performance on people who didn't resemble the training data.
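The arithmetic of a skewed benchmark makes this concrete. With illustrative numbers (not the study's), a system that fails badly on a minority of a benchmark can still post a flattering headline figure:

```python
# Hypothetical benchmark composition and subgroup accuracies (illustrative only).
share_lighter, share_darker = 0.80, 0.20
acc_lighter, acc_darker = 0.99, 0.70

# Aggregate accuracy is the share-weighted average of subgroup accuracies.
overall = share_lighter * acc_lighter + share_darker * acc_darker
print(f"Overall accuracy: {overall:.1%}")  # 93.2%, despite 70% on darker-skinned faces
```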
The harm falls disproportionately on people already marginalized: Black and Brown people who are already more likely to be surveilled, more likely to be wrongfully suspected, more likely to lack resources to challenge a false accusation.
A purely technical fix — improving accuracy across demographic groups — is necessary. But it is not sufficient. The deeper question is: should facial recognition be deployed in these ways at all, given the power asymmetries and structural inequalities in which it operates?
The Fight Over Company Responses
After Gender Shades, Buolamwini engaged IBM directly, sharing the audit results with the company before publication. In June 2020, IBM's CEO Arvind Krishna wrote a letter to Congress announcing that IBM would no longer offer "general purpose" facial recognition technology and calling for national regulation.
Buolamwini has written about the complexity of this response: an industry leader calling for regulation of a product it had been selling can be genuine reform advocacy or strategic positioning, raising regulatory barriers that limit competition. She argued for vigilance about whether corporate ethics rhetoric translates into actual accountability for harms already caused.
In June 2020, IBM, Amazon, and Microsoft all announced they would stop selling facial recognition to police departments: IBM exited the general-purpose facial recognition business outright, while Amazon and Microsoft declared moratoriums pending federal legislation. Advocates welcomed the announcements as significant corporate responses to documented wrongful arrests and sustained advocacy pressure. Critics noted that the moratoriums were time-limited, voluntary, and maintained exceptions for national security uses, and that they did not bind the smaller vendors that continued selling to police.
Timnit Gebru's Firing and the Limits of Corporate AI Ethics
Timnit Gebru co-authored Gender Shades and was subsequently hired by Google to co-lead its Ethical AI team, one of the most prominent AI ethics appointments in the industry. In December 2020, Google forced Gebru out (Google characterized her exit as a resignation; Gebru and many colleagues called it a firing) after she resisted management's demand to retract a research paper, "On the Dangers of Stochastic Parrots," on the risks and biases of large language models.
The firing generated enormous controversy; Gebru's co-lead Margaret Mitchell was fired in early 2021, and several other AI ethics researchers subsequently left Google. The episode illustrated the structural vulnerability of AI ethics work within corporations: ethics researchers who challenge products or practices their employer wants to defend can face professional consequences.
Gebru subsequently founded the Distributed AI Research Institute (DAIR), an independent research organization not beholden to corporate interests. Her case, and AJL's independent model, both point toward the same conclusion: effective AI accountability requires independence from the companies being held accountable.
Analysis Questions
1. Buolamwini's origin story involves literally wearing a white mask to be seen by facial detection software. How does this personal experience function as both a research motivation and an artistic/rhetorical strategy? What does it communicate that a statistical finding alone cannot?
2. The Gender Shades companies improved their accuracy after public disclosure. Does this represent a successful accountability mechanism? What are its limitations — what would a more robust accountability system look like?
3. Buolamwini distinguishes between technical fixes (better accuracy) and structural analysis (questioning who AI is built for and against). Is this distinction useful? At what point does the call for structural analysis become an obstacle to achievable technical improvement?
4. Timnit Gebru's firing from Google illustrates the risk to ethics researchers working within corporations. How should the AI ethics field be organized institutionally? What is the appropriate role of corporate AI ethics teams, academic researchers, regulatory agencies, and independent organizations like AJL and DAIR?
5. The Algorithmic Justice League combines research, advocacy, and art. AJL's "AI, Ain't I A Woman?" poem/performance is explicitly artistic. What does the combination of art and research accomplish that either alone would not?
This case study connects to Chapter 35 Sections 35.5 (Gender Shades) and 35.8 (regulatory responses). It connects backward to Chapter 33 (art and activism) and forward to Chapter 36 (racial surveillance) and Chapter 38 (AI governance).