Chapter 8: Further Reading and Annotated Bibliography
Foundational Frameworks
1. Suresh, H., & Guttag, J. (2019). "A Framework for Understanding Sources of Harm Throughout the Machine Learning Life Cycle." arXiv preprint arXiv:1901.10002. Revised 2021.
The organizing framework for Chapter 8. Suresh and Guttag identify seven sources of bias across the ML pipeline; Chapter 8 follows the earlier six-category version of the taxonomy. The paper is remarkably accessible for a technical contribution and provides worked examples for each bias type. Essential reading for any practitioner involved in AI development. The 2021 revision improves the deployment bias section substantially. Available freely on arXiv.
Recommended for: All readers. The primary framework for this chapter.
2. Gebru, T., Morgenstern, J., Vecchione, B., Wortman Vaughan, J., Wallach, H., Daumé III, H., & Crawford, K. (2021). "Datasheets for Datasets." Communications of the ACM, 64(12), 86–92. (Originally an arXiv preprint, 2018.)
The paper that proposed standardized documentation for machine learning datasets, modeled on the datasheets used for electronic components. Gebru and colleagues argue that the absence of standardized documentation for training datasets is a root cause of many AI bias failures. The paper provides the full datasheet template with questions for each category. Highly practical and now widely cited in both academia and industry. Timnit Gebru's subsequent forced departure from Google (2020) over a paper on large language model harms is itself a case study in organizational power dynamics in AI ethics.
Recommended for: Practitioners responsible for data governance, procurement, or AI development. Essential reference.
3. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT), 220–229.
The companion to "Datasheets for Datasets," applied at the model level. Model cards provide a standardized format for documenting a model's intended uses, evaluation methodology, performance across demographic groups, and ethical considerations. The paper is brief and practical, and the template is immediately usable. Many AI companies now publish model cards, though quality varies substantially; this paper provides the standard against which to evaluate them.
Recommended for: Practitioners evaluating AI vendors, product managers, AI governance professionals.
Bias in Machine Learning: Core Research
4. Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). "Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings." Advances in Neural Information Processing Systems (NeurIPS), 29.
The landmark paper demonstrating that word embeddings trained on Google News text encode gender stereotypes — specifically, that analogical reasoning on word vectors reproduces gendered occupational associations. The paper also proposes technical debiasing methods, though subsequent research has questioned whether those methods successfully remove the underlying bias or merely hide it. The core finding — that language models absorb gender stereotypes from training data — has been replicated extensively and is foundational to understanding LLM bias.
Recommended for: Readers seeking technical depth on embedding bias; practitioners using word embeddings or LLMs in downstream applications.
5. Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). "Semantics Derived Automatically from Language Corpora Contain Human-Like Biases." Science, 356(6334), 183–186.
An exceptionally influential paper demonstrating that the biases documented in the Implicit Association Test — a well-validated psychological measure of implicit human bias — are reproduced in word embeddings trained on large text corpora. Using a computational analogue of the IAT (the Word Embedding Association Test, or WEAT), Caliskan and colleagues showed that word embeddings associate white names with pleasant concepts and Black names with unpleasant concepts, replicate gender stereotypes in career/family domains, and reproduce documented patterns of implicit age bias. Published in Science — one of the highest-prestige general science journals — which reflects the significance of the finding for both the AI and social science communities.
Recommended for: Readers seeking the scientific basis for the claim that AI systems absorb human biases from text; interdisciplinary audiences interested in connections between AI and psychology.
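The core WEAT association measure described above is simple enough to sketch directly. The following is a minimal illustration on invented toy vectors; it omits the permutation test and effect-size normalization used in the actual paper, and all set names and numbers here are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (plain lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """s(w, A, B): mean similarity of word vector w to attribute set A
    minus its mean similarity to attribute set B (the WEAT association)."""
    return (sum(cosine(w, a) for a in A) / len(A)
            - sum(cosine(w, b) for b in B) / len(B))

def weat_statistic(X, Y, A, B):
    """Test statistic: total association of target set X minus that of
    target set Y. Positive means X aligns with A and Y with B."""
    return (sum(association(x, A, B) for x in X)
            - sum(association(y, A, B) for y in Y))

# Toy 2-D "embeddings", invented for illustration: the X targets lean
# toward the A attribute direction, the Y targets toward B.
X = [[1.0, 0.1], [0.9, 0.2]]   # one set of target words
Y = [[0.1, 1.0], [0.2, 0.9]]   # the contrasting target words
A = [[1.0, 0.0]]               # attribute direction 1
B = [[0.0, 1.0]]               # attribute direction 2

print(weat_statistic(X, Y, A, B) > 0)  # True: the toy targets separate
```

In the real test, X and Y are target word sets (e.g., two groups of names) and A and B are attribute sets (e.g., pleasant and unpleasant words), with significance assessed by permuting the targets.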
6. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). "Fairness Through Awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS), 214–226.
The formal paper that established why "fairness through blindness" — ignoring protected attributes — is insufficient for achieving non-discrimination. Dwork and colleagues argue that individual fairness requires treating similar individuals similarly, which requires knowing what dimensions of similarity are relevant — including group membership. The paper provides the theoretical foundation for Section 8.8's argument that removing protected attributes does not prevent proxy discrimination. Technical but accessible to readers with some quantitative background.
Recommended for: Technically oriented readers; those interested in the formal foundations of algorithmic fairness.
7. Buolamwini, J., & Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAccT), 77–91.
The Gender Shades study evaluated three commercial face analysis systems on a dataset specifically constructed to be balanced across gender and skin tone categories. The study found error rate disparities of up to 34 percentage points between lighter-skinned men (best performance) and darker-skinned women (worst performance). The paper is a methodological template for bias auditing using disaggregated evaluation and is credited with prompting Microsoft, IBM, and others to update their face analysis systems. Joy Buolamwini's associated TED talk and documentary ("Coded Bias") have made this work widely accessible to non-technical audiences.
Recommended for: All readers. A landmark empirical study with clear practical implications.
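The disaggregated-evaluation methodology at the heart of Gender Shades can be sketched in a few lines: compute error rates per intersectional subgroup rather than a single aggregate number. The records below are invented for illustration and are not the study's data or results:

```python
from collections import defaultdict

# Hypothetical audit records for a face-analysis system: each record is
# (intersectional subgroup, whether the prediction was correct).
records = [
    ("lighter_male", True), ("lighter_male", True),
    ("lighter_male", True), ("lighter_male", True),
    ("darker_female", True), ("darker_female", False),
    ("darker_female", False), ("darker_female", True),
]

def disaggregated_error_rates(records):
    """Error rate per subgroup: the core of a disaggregated evaluation."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        errors[group] += (not correct)
    return {g: errors[g] / totals[g] for g in totals}

rates = disaggregated_error_rates(records)
aggregate = sum(not c for _, c in records) / len(records)
print(rates)      # {'lighter_male': 0.0, 'darker_female': 0.5}
print(aggregate)  # 0.25 — the single aggregate error rate hides the gap
```

The point of the exercise is the contrast: an aggregate metric can look acceptable while one subgroup bears nearly all of the errors.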
Medical Device and Healthcare Bias
8. Sjoding, M. W., Dickson, R. P., Iwashyna, T. J., Gay, S. E., & Valley, T. S. (2020). "Racial Bias in Pulse Oximetry Measurement." New England Journal of Medicine, 383(25), 2477–2478.
The primary research publication documenting racial disparities in pulse oximeter accuracy, forming the empirical basis for Case Study 8.1. Using paired pulse oximeter and arterial blood gas measurements from two large patient cohorts, Sjoding and colleagues found that occult hypoxemia (arterial oxygen saturation below 88% despite a normal-appearing pulse oximeter reading) occurred nearly three times as often in Black patients as in white patients. Brief (a correspondence article) but methodologically clear. Published at the height of the COVID-19 pandemic, this paper prompted broad clinical and regulatory attention to pulse oximeter bias.
Recommended for: All readers interested in Case Study 8.1; healthcare professionals; those studying measurement bias in medical devices.
9. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations." Science, 366(6464), 447–453.
An important companion to the pulse oximeter case, documenting racial bias in a widely used health management algorithm. Obermeyer and colleagues found that the algorithm — used by hundreds of US hospitals to identify patients who would benefit from care management programs — was systematically less likely to flag Black patients than white patients with equivalent underlying illness severity. The cause: the algorithm used healthcare cost as a proxy for health need, and Black patients incur lower costs than white patients with equivalent illness severity because they have historically received less care. The algorithm replicated this pattern. The paper provides a clear mechanism and proposes a remediation approach.
Recommended for: Healthcare professionals, health technology practitioners, anyone studying proxy variable bias in consequential automated systems.
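The proxy-variable mechanism Obermeyer and colleagues identified can be illustrated with a toy simulation, with all numbers hypothetical: when one group incurs systematically lower cost for the same underlying need, ranking patients by cost under-selects that group relative to ranking by need.

```python
import random

random.seed(0)

# Hypothetical population: true health need is identically distributed
# in both groups, but group B's observed costs are 30% lower for the
# same need (standing in for historical under-provision of care).
patients = []
for group, cost_factor in [("A", 1.0), ("B", 0.7)]:
    for _ in range(1000):
        need = random.uniform(0, 10)   # true health need
        cost = need * cost_factor      # observed cost (the proxy label)
        patients.append((group, need, cost))

def share_of_b(key_index):
    """Fraction of group B among the top 20% of patients ranked by the
    chosen key (index 1 = true need, index 2 = observed cost)."""
    top = sorted(patients, key=lambda p: p[key_index], reverse=True)[:400]
    return sum(1 for p in top if p[0] == "B") / len(top)

print(share_of_b(1))  # ranking by true need: roughly 0.5
print(share_of_b(2))  # ranking by cost: well below 0.5 — B is under-flagged
```

The simulation is not the paper's analysis, only a sketch of its causal claim: the algorithm faithfully predicts its target, and the bias enters through the choice of target.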
10. US Food and Drug Administration. (2022). "Pulse Oximetry — Recommended Studies and Labeling." FDA Guidance Document.
The FDA's regulatory response to documented pulse oximeter bias, providing background on the regulatory framework governing medical devices and the specific guidance proposed for improving calibration study diversity. Reading this alongside the Sjoding et al. study provides insight into the gap between scientific documentation of a harm and regulatory response. Note the timeline: the Sjoding study was published in December 2020; the FDA convened an expert panel in November 2022; proposed guidance was issued in early 2024. The pacing of regulatory response to documented harm is itself an important governance lesson.
Recommended for: Readers interested in regulatory accountability and medical device governance.
Large Language Models and Cultural Bias
11. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). "Language Models Are Few-Shot Learners." Advances in Neural Information Processing Systems (NeurIPS), 33, 1877–1901.
The original GPT-3 paper. Section 6 ("Broader Impacts") contains OpenAI's initial discussion of bias and potential harms, which is notably candid about the presence of bias and the difficulty of addressing it. Reading the original bias disclosure alongside subsequent research documenting the extent of the bias provides perspective on the gap between developer acknowledgment and real-world consequence. The full paper is highly technical, but Section 6 is accessible to a general audience and essential context for Case Study 8.2.
Recommended for: Readers studying LLM development and bias; those interested in how AI developers document known limitations.
12. Abid, A., Farooqi, M., & Zou, J. (2021). "Persistent Anti-Muslim Bias in Large Language Models." Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES), 298–306.
The primary research paper documenting anti-Muslim bias in GPT-3, forming the empirical basis for Section 8.9 and Case Study 8.2. The paper uses a prompt completion methodology to demonstrate that GPT-3 associates Muslim identity with violence at dramatically higher rates than comparable religious groups, and tests the robustness of this association across a range of prompt variations. The finding is methodologically transparent and the results are striking. An accessible paper for non-specialists.
Recommended for: All readers studying LLM bias; Case Study 8.2 discussion.
13. Gehman, S., Gururangan, S., Sap, M., Choi, Y., & Smith, N. A. (2020). "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models." Findings of the Association for Computational Linguistics: EMNLP 2020, 3356–3369.
The paper introducing the RealToxicityPrompts dataset for studying toxicity in language models. The paper documents that large language models produce toxic content — hateful, threatening, or sexually explicit — at substantial rates, and that even seemingly innocuous, non-toxic prompts can elicit toxic continuations. An important companion to the Abid et al. anti-Muslim bias study: it establishes the general phenomenon of which anti-Muslim bias is a specific instance.
Recommended for: Readers studying toxicity in language models; practitioners deploying LLMs in consumer-facing applications.
14. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., ... & Lowe, R. (2022). "Training Language Models to Follow Instructions with Human Feedback." Advances in Neural Information Processing Systems (NeurIPS), 35.
The InstructGPT paper describing the RLHF-based approach used to develop aligned language models. Essential background for understanding what RLHF does, how it works, and what limitations the developers themselves acknowledge. Section 5 of the paper discusses limitations including potential for RLHF to introduce new biases from rater demographics, the possibility that alignment fine-tuning reduces performance on some tasks, and the sensitivity of the approach to prompt distribution. Reading the developers' own candid acknowledgment of RLHF limitations is more valuable than most secondary summaries.
Recommended for: Technical readers; anyone studying the alignment tax and RLHF limitations.
Fairness and Algorithmic Accountability
15. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). "Machine Bias: There's Software Used Across the Country to Predict Future Criminals. And It's Biased Against Blacks." ProPublica Investigative Report.
The investigative journalism piece that brought COMPAS recidivism risk assessment bias to public attention and sparked a substantial body of academic research into algorithmic fairness in criminal justice. ProPublica's analysis found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high-risk, while white defendants were more likely to be falsely labeled low-risk. The subsequent academic debate about which fairness metric should apply — and whether Northpointe's counter-claim of calibration fairness was valid — made this case a central example in algorithmic fairness research.
Recommended for: All readers. Essential background on the COMPAS debate and the incompatibility of fairness metrics.
16. Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153–163.
An important technical companion to the ProPublica COMPAS analysis. Chouldechova demonstrates mathematically that when base rates differ between groups, it is impossible to simultaneously satisfy calibration (risk scores that correspond to the same actual recidivism rate in every group) and error rate parity (equal false positive and false negative rates across groups). This result — that different fairness criteria are mathematically incompatible under certain conditions — is one of the most important theoretical findings in algorithmic fairness research and directly informs Chapter 9's treatment of fairness metrics.
Recommended for: Technically oriented readers; those who want to understand the mathematical basis of the fairness criteria debate.
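Chouldechova's impossibility result rests on an identity relating a group's false positive rate to its base rate: FPR = (p / (1 − p)) × ((1 − PPV) / PPV) × (1 − FNR). A short numeric sketch with hypothetical numbers shows that if two groups share the same PPV (calibration) and the same false negative rate but have different base rates p, their false positive rates must differ:

```python
def fpr_from(p, ppv, fnr):
    """Chouldechova's identity: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR),
    where p is the group's base rate of the predicted outcome."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

# Same calibration (PPV) and same false negative rate for both groups,
# but different base rates (all numbers hypothetical):
ppv, fnr = 0.6, 0.3
fpr_group_a = fpr_from(p=0.5, ppv=ppv, fnr=fnr)   # base rate 50%
fpr_group_b = fpr_from(p=0.3, ppv=ppv, fnr=fnr)   # base rate 30%

print(round(fpr_group_a, 3))  # 0.467
print(round(fpr_group_b, 3))  # 0.2
```

Because the base rates enter the identity directly, equalizing PPV and FNR forces the FPRs apart; the only escape is equal base rates or a perfect predictor, which is the content of the theorem.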
17. Raji, I. D., & Buolamwini, J. (2019). "Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products." Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES), 429–435.
A study of whether public disclosure of AI system bias — specifically, the Gender Shades findings — prompted companies to improve their systems. The paper finds that public naming and shaming did produce some improvements, particularly at Microsoft and IBM, though improvements were uneven. The paper raises important questions about the effectiveness of voluntary disclosure vs. mandatory auditing as accountability mechanisms, and provides empirical evidence on which accountability mechanisms actually produce change.
Recommended for: Readers interested in AI governance and accountability; policy-oriented readers.
18. Crawford, K. (2021). "Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence." Yale University Press.
A comprehensive critical analysis of AI's social, political, and environmental dimensions, written for a general audience. Crawford's chapter on AI and bias situates algorithmic discrimination within broader structures of power, labor exploitation, and environmental cost. The book provides essential context for the "recurring themes" of this chapter — power and accountability, ethics washing, and global variation. Crawford argues that AI bias is not a technical failure to be patched but a political and economic phenomenon requiring structural analysis.
Recommended for: All readers. Provides the broader social and political context for the technical material in Chapter 8.
19. Eubanks, V. (2018). "Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor." St. Martin's Press.
A closely researched examination of how automated decision systems — in child welfare, healthcare access, criminal justice, and public benefits — affect low-income communities in the United States. Eubanks documents specific systems and their documented harms through extensive fieldwork and interviews with affected individuals. The book is essential for understanding how AI bias interacts with pre-existing social inequality and why affected communities often cannot meaningfully contest algorithmic decisions. A powerful complement to more technically oriented readings.
Recommended for: All readers. Particularly important for understanding the human consequences of the technical failures described in Chapter 8.
20. Benjamin, R. (2019). "Race After Technology: Abolitionist Tools for the New Jim Code." Polity Press.
Benjamin introduces the concept of the "New Jim Code" — the use of technology to encode and reinforce racial discrimination — and argues that racist algorithms reflect racist social structures, not merely technical errors. The book examines facial recognition, algorithmic policing, predictive analytics in healthcare and education, and other domains. Benjamin argues that "imagined objectivity" — the false belief that algorithms are race-neutral — makes algorithmic discrimination harder to challenge than openly discriminatory human decisions. A sociological and critical perspective that complements the technical taxonomy of Chapter 8.
Recommended for: Readers interested in the social and structural dimensions of AI bias; those studying power and accountability themes.
Note on Open Access
Many of the academic papers listed above are available as preprints on arXiv (arxiv.org) at no cost, including Suresh and Guttag (2019), Gebru et al. (2018) Datasheets, Bolukbasi et al. (2016), Caliskan et al. (2017), and Abid et al. (2021). The ProPublica COMPAS analysis (Angwin et al., 2016) is freely available on ProPublica's website. Sjoding et al. (2020) and Obermeyer et al. (2019) are available through institutional library access.
The books by Crawford, Eubanks, and Benjamin are available in print and digital formats through most academic and public library systems.