Chapter 7: Further Reading and Annotated Sources
This bibliography provides annotated references for the primary sources, key research, and essential journalism underlying Chapter 7. Sources are organized thematically; all are publicly available unless otherwise noted.
I. Foundational Research on Algorithmic Bias
1. Barocas, S., & Selbst, A. D. (2016). Big data's disparate impact. California Law Review, 104(3), 671–732.
This is the foundational legal-academic analysis of how machine learning can produce disparate impacts under existing civil rights law. Barocas and Selbst identify the specific mechanisms through which facially neutral machine learning systems can encode and amplify discrimination, and they apply the disparate impact doctrine rigorously to algorithmic hiring tools. Essential reading for anyone who needs to understand the legal dimensions of algorithmic bias: careful, thorough, and accessible to readers without prior legal training. Solon Barocas has continued this research program; his subsequent work with Moritz Hardt and Arvind Narayanan (see entry 11 below) provides a complementary technical treatment.
2. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163.
This paper provides the formal mathematical proof that several commonly desired fairness criteria — including calibration, equal false positive rates, and equal false negative rates — are mutually incompatible when base rates differ across groups. Chouldechova applies this result specifically to the COMPAS recidivism prediction tool and the ProPublica analysis, demonstrating that the apparent contradiction between Northpointe's and ProPublica's claims was not a matter of one side being wrong but of two mathematically incompatible fairness criteria. The proof is moderately technical (it requires comfort with probability theory), but the implications are profound and widely accessible. This is the single most important technical reference for understanding why the COMPAS controversy admits no purely technical resolution.
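Chouldechova's central identity can be checked with a few lines of arithmetic. The sketch below uses her relation among the base rate p, positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR); the numeric values are hypothetical, chosen only to make the incompatibility visible.

```python
def fpr_from_identity(base_rate, ppv, fnr):
    """Chouldechova (2017): FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR).

    If PPV (a calibration-style criterion) and FNR are held equal
    across groups, the FPR is forced to differ whenever the base
    rates p differ.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)

# Hypothetical numbers: the same PPV and FNR for both groups,
# but different base rates of reoffending.
ppv, fnr = 0.7, 0.3
fpr_a = fpr_from_identity(0.5, ppv, fnr)  # group A, base rate 50%
fpr_b = fpr_from_identity(0.3, ppv, fnr)  # group B, base rate 30%
# fpr_a and fpr_b cannot be equal: the three criteria are incompatible.
```

No choice of ppv and fnr escapes this: as long as the base rates differ, equalizing the other two quantities forces the false positive rates apart.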
3. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.
The original research paper documenting racial bias in the Optum health risk stratification algorithm discussed in Sections 7.2 and 7.4. The research team analyzed the algorithm's outputs for a population of more than 43,000 patients and found that Black patients assigned the same risk score as white patients were significantly sicker, because the algorithm used healthcare cost as a proxy for health need. The paper is methodologically sophisticated, but the key findings are clearly presented, and its publication in Science, one of the world's most prestigious scientific journals, gives it particular authority. The authors include economists, physicians, and computer scientists — a model of the interdisciplinary collaboration that responsible AI research requires.
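The proxy mechanism the paper documents can be illustrated with toy numbers. The records below are hypothetical, not the paper's data; they show only the structural point that ranking by spending is not ranking by need.

```python
# Toy illustration of the cost-as-proxy-for-need mechanism documented
# by Obermeyer et al.; all records and numbers here are hypothetical.
patients = [
    {"id": 1, "group": "white", "need_score": 5.0, "annual_cost": 5000},
    {"id": 2, "group": "Black", "need_score": 5.0, "annual_cost": 3000},
]

# A model trained to predict cost effectively ranks patients by spending...
ranked_by_cost = sorted(patients, key=lambda p: p["annual_cost"], reverse=True)

# ...so the lower-spending patient is treated as lower risk,
# despite having identical health need.
top, bottom = ranked_by_cost
```

When historical spending differs across groups for reasons unrelated to health (access barriers, under-treatment), a cost-trained model reproduces that gap in its risk scores.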
4. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. Proceedings of Innovations in Theoretical Computer Science.
A companion to Chouldechova (2017), this paper independently establishes the mathematical incompatibility of common fairness criteria and extends the result to more general settings, showing that the impossibility is not specific to recidivism prediction but applies to risk scoring in any domain where base rates differ across groups. Together with Chouldechova's paper, it provides the theoretical foundation for understanding why fairness is not a purely technical problem. Somewhat more technically demanding than Chouldechova, but important for readers who want the broader result.
II. The Amazon Hiring Algorithm
5. Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. Available at: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
The original investigative report that disclosed Amazon's biased hiring algorithm to the public. Based on interviews with five current and former Amazon employees, the piece describes the algorithm's design, the discovery of gender bias, and Amazon's decision to shut the tool down. This is the primary source for the Amazon case study and for the public record of this incident. The reporting is careful and specific. Amazon confirmed key aspects of the story while disputing others. Reading the original Reuters report alongside Chapter 7's case study provides a useful exercise in comparing primary source journalism to secondary analysis.
III. Facial Recognition and Demographic Bias
6. Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAccT), 77–91.
The foundational Gender Shades study, evaluating commercial facial analysis systems from Microsoft, IBM, and Face++ across demographic subgroups defined by skin tone and gender. The paper introduced the use of the Fitzpatrick scale for skin tone classification in AI evaluation and documented the dramatic accuracy disparities for darker-skinned women discussed in Chapter 7. It is highly readable, requires no technical background in machine learning, and has become one of the most widely cited papers in the AI fairness literature. Timnit Gebru went on to co-author the foundational "Stochastic Parrots" paper on large language model bias before being controversially dismissed from Google in 2020 — a case discussed in later chapters.
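The paper's core method, disaggregated evaluation, is simple to implement. A minimal sketch follows; the record format is a hypothetical illustration, not the Gender Shades benchmark itself.

```python
from collections import defaultdict

def accuracy_by_subgroup(records):
    """Compute classification accuracy per demographic subgroup.

    records: iterable of (subgroup, y_true, y_pred) tuples — a
    hypothetical format for illustration only.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

# Aggregate accuracy can hide large subgroup gaps: here overall
# accuracy is 98%, but group B is misclassified every time.
data = [("A", 1, 1)] * 90 + [("A", 0, 0)] * 8 + [("B", 1, 0)] * 2
results = accuracy_by_subgroup(data)
```

The point of the exercise is the reporting convention: a single aggregate number would have graded all three commercial systems as "accurate."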
7. Grother, P., Ngan, M., & Hanaoka, K. (2019). Face Recognition Vendor Technology (FRVT) Part 3: Demographic effects. National Institute of Standards and Technology. NISTIR 8280. Available at: https://doi.org/10.6028/NIST.IR.8280
The authoritative NIST evaluation of commercial facial recognition systems for demographic accuracy disparities, described in Case Study 7.2. This is a technical government report evaluating more than 100 algorithms from dozens of vendors using a common evaluation protocol and large-scale datasets. Its technical depth is substantial, but the key findings — the tables showing false positive rates by demographic group — are clearly presented and accessible. Any professional working in contexts where facial recognition is considered for deployment should read at minimum the executive summary and the tables of results. NIST's ongoing FRVT program continues to publish updated evaluations; practitioners should check for newer reports.
8. Hill, K. (2020, June 24). Wrongfully accused by an algorithm. The New York Times. Available at: https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html
The investigative report documenting the wrongful arrest of Robert Williams, the first publicly documented case of a wrongful arrest based on a facial recognition false match in the United States. Hill's reporting is based on direct interviews with Williams and his wife, access to police records, and review of the facial recognition match. The article provides a human account of what the statistical disparities documented in the NIST report mean in practice. Essential reading alongside Case Study 7.2 for understanding the real-world consequences of algorithmic accuracy disparities.
IV. Criminal Justice and Risk Assessment
9. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine bias. ProPublica. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
The investigative report that triggered the public debate about COMPAS recidivism prediction. ProPublica analyzed COMPAS scores and subsequent criminal records for more than 7,000 people in Broward County, Florida, and found that Black defendants were nearly twice as likely as white defendants to be falsely classified as high-risk. This is one of the most consequential pieces of algorithmic accountability journalism ever published. The article itself is accessible; the full methodology is documented in a companion technical piece. Every business professional working in AI should read it — it defined the terms of the public debate about algorithmic bias in criminal justice that has continued for nearly a decade since publication.
10. State v. Loomis, 371 Wis. 2d 235, 881 N.W.2d 749 (2016).
The Wisconsin Supreme Court decision upholding the use of COMPAS risk assessment in criminal sentencing against a due process challenge. The case is discussed in Section 7.7. The opinion is readable by non-lawyers. It raises and partially addresses the key legal questions about algorithmic decision-making in high-stakes contexts: the right to explanation, the propriety of using group-level statistical risk assessments to inform individual decisions, and the rights of defendants to contest automated risk assessments. The decision has been criticized by legal scholars for insufficiently engaging with the due process concerns, and it remains a significant reference in ongoing debates about algorithmic accountability in the justice system.
V. Technical Frameworks for Fairness
11. Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT Press. Available free online at: https://fairmlbook.org
The definitive technical textbook on algorithmic fairness, written by three leading researchers in the field. The book covers mathematical definitions of fairness, the incompatibility results (building on Chouldechova and Kleinberg et al.), causal inference approaches to fairness, and the broader social context of algorithmic bias. It is written for readers with some technical background in machine learning but is structured to be accessible to committed non-specialists. Each chapter is available individually from the website; the chapters on classification and fairness criteria are particularly relevant to the Chapter 7 material. This is the book to read if you want to move beyond the conceptual treatment in Chapter 7 to the technical details of fairness measurement.
12. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229.
This paper proposes "model cards" — structured documentation for machine learning models that includes performance metrics disaggregated by demographic group, intended use cases, and known limitations. The model card framework has been adopted by Google, Hugging Face, and other major AI companies as a transparency standard. Relevant to the discussion of pre-deployment testing and disclosure in Section 7.9. The paper is short and accessible; the framework it proposes is directly applicable by organizations developing or purchasing AI systems.
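The structure the paper proposes is easy to prototype in code. The sketch below loosely follows the sections of Mitchell et al.'s template; the field names and example values are illustrative, not any organization's official schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal model card sketch, loosely after Mitchell et al. (2019).

    Adopters such as Google and Hugging Face define their own schemas;
    this structure is an illustration, not an official format.
    """
    model_name: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    known_limitations: list = field(default_factory=list)
    # Performance metrics disaggregated by demographic group,
    # e.g. {"group_a": {"fpr": 0.05, "fnr": 0.02}}
    metrics_by_group: dict = field(default_factory=dict)

# A hypothetical card for a hypothetical hiring tool:
card = ModelCard(
    model_name="resume-screener-v2",
    intended_use="Rank applications for human review, not auto-rejection.",
    out_of_scope_uses=["fully automated hiring decisions"],
    known_limitations=["trained on historical hires; may encode past bias"],
    metrics_by_group={"women": {"fpr": 0.08}, "men": {"fpr": 0.05}},
)
```

The discipline the framework imposes is in the disaggregated metrics field: a card without per-group numbers is a card that has not asked the Gender Shades question.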
VI. Healthcare AI and Bias
13. Char, D. S., Shah, N. H., & Magnus, D. (2018). Implementing machine learning in health care — addressing ethical challenges. New England Journal of Medicine, 378(11), 981–983.
A concise, authoritative treatment of the ethical challenges of AI in healthcare by physicians and ethicists at Stanford. The piece addresses training data bias, representation gaps, the proxy variable problem as applied to health algorithms, and the clinical consequences of AI systems that perform unequally across demographic groups. Written for a medical audience but fully accessible to business readers. Useful as a complement to the healthcare section of Chapter 7.4.
VII. The Law and Policy Framework
14. Equal Employment Opportunity Commission. (2023). Artificial intelligence and algorithmic fairness initiative: Questions and answers. U.S. EEOC. Available at: https://www.eeoc.gov/laws/guidance/questions-and-answers-clarify-and-provide-a-common-interpretation-uniform-guidelines
The EEOC's technical assistance document on AI and employment discrimination, referenced in Section 7.7. This is the primary regulatory guidance document for US employers using AI in employment decisions. It confirms that Title VII applies to AI hiring tools; identifies specific tools (video interview analysis, résumé screening) as high-risk; and affirms that employers cannot escape liability by attributing discriminatory outcomes to AI vendors. Reading the full document is valuable for HR professionals and legal counsel working with AI hiring tools. The EEOC updates its guidance periodically; verify you are reading the current version.
15. Consumer Financial Protection Bureau. (2022). CFPB circular 2022-03: Adverse action notification requirements in connection with credit decisions based on complex algorithms. CFPB. Available at: https://www.consumerfinance.gov/compliance/circulars/
The CFPB guidance on algorithmic credit decisions, referenced in Section 7.7 and the financial services discussion in Section 7.4. This circular clarifies that lenders must provide specific reasons for adverse credit decisions even when those decisions are made by complex algorithmic models — they cannot cite the model's complexity as a reason they cannot explain the decision. Relevant for financial services professionals and for understanding the tension between complex AI and legal transparency obligations.
VIII. Intersectionality and Social Context
16. Crenshaw, K. W. (1989). Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(1), 139–167.
The original paper in which Kimberlé Crenshaw introduced the concept of intersectionality, discussed in Section 7.6. Written in a legal context — examining how employment discrimination law fails Black women — this paper established a conceptual framework that has become foundational across multiple fields. Business readers who engage seriously with diversity, equity, and inclusion work, or with AI fairness, should read this paper directly rather than encountering the concept only through secondary citations. It is accessible to non-lawyers, historically specific, and intellectually rigorous.
17. Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
A book-length treatment of how search algorithms — specifically Google Search — can reproduce and amplify racial bias and stereotypes. Noble combines computational analysis with critical race theory to document patterns in search results for queries about Black women, Black girls, and other groups. The book is written for a general academic audience and does not require technical background. It provides a different and complementary perspective from the machine learning fairness literature — grounded in the social sciences and humanities rather than in computer science — and is important for understanding the sociotechnical dimensions of algorithmic bias discussed in Section 7.1.
IX. Organizational and Cultural Dimensions
18. Benjamin, R. (2019). Race after technology: Abolitionist tools for the new Jim Code. Polity Press.
A sociological examination of how emerging technologies, including AI systems, can reproduce and extend racial hierarchy — what Benjamin calls the "New Jim Code." The book covers facial recognition, predictive policing, medical AI, and the broader social dynamics of technology-mediated discrimination. Written for a general academic audience. Particularly relevant for the sociotechnical bias concept in Section 7.1 and the structural discrimination discussions throughout Chapter 7. Benjamin's framework helps explain why technical fixes to bias often fail: they address the manifestations of bias without examining the social structures that generate it.
19. Crawford, K. (2021). Atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.
A comprehensive critical examination of AI's social, political, and environmental dimensions. Crawford, a senior principal researcher at Microsoft Research, examines AI's entire supply chain — from rare earth mining through data labor to algorithmic decision-making — and the power structures embedded in each stage. Chapter 4, on classification, is particularly relevant to the bias topics in Chapter 7. The book is accessible, well-sourced, and essential reading for business leaders who want to understand AI's broader social context. Crawford's analysis helps explain why algorithmic bias is not an isolated technical problem but a reflection of broader power arrangements.
X. Global and Comparative Perspectives
20. Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press.
A detailed examination of how automated decision systems — including AI, algorithmic scoring, and automated eligibility determination — affect low-income communities in the United States. Eubanks, a political scientist, documents cases in Indiana (automated Medicaid eligibility), Los Angeles (homeless services), and Pennsylvania (child welfare), showing how systems presented as efficiency tools systematically disadvantage the communities they are supposed to serve. The book gives vivid, case-specific accounts of what the harms discussed abstractly in the algorithmic bias literature look like in practice. Highly recommended for business readers who want to understand algorithmic harm from the perspective of affected communities. Accessible, rigorously researched, and deeply humane.
A Note on Currency
The field of algorithmic bias moves rapidly. Academic conferences — particularly the ACM Conference on Fairness, Accountability, and Transparency (FAccT), whose proceedings appear annually — produce new research that regularly revises the empirical picture. The regulatory landscape in both the United States and Europe is evolving; practitioners should monitor EEOC, CFPB, FTC, and EU AI Office publications for current guidance. The NIST FRVT program publishes updated evaluations of facial recognition systems; check https://www.nist.gov/programs-projects/face-recognition-vendor-testing-frvt for the most recent reports.
The sources listed above represent the foundational literature as of the book's publication. They will remain relevant for conceptual and historical grounding even as the empirical details evolve.