Chapter 26 Further Reading: Fairness, Explainability, and Transparency


Algorithmic Fairness: Foundations

1. Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press. The most comprehensive textbook on algorithmic fairness, written by three leading researchers. Covers the mathematical foundations of fairness definitions, the impossibility results, causal reasoning approaches, and the sociotechnical context in which fairness decisions are made. Available free online at fairmlbook.org. Essential for any reader who wants to move beyond the introductory treatment in this chapter to a rigorous understanding of the field.

2. Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163. The paper that proved one version of the impossibility theorem discussed in Section 26.3. Chouldechova demonstrates that when base rates differ across groups, calibration and equal false positive/negative rates cannot be simultaneously achieved. The proof is elegant and accessible. Essential reading for understanding why "just make it fair" is not a well-defined instruction.
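The arithmetic at the heart of Chouldechova's proof is simple enough to check directly. The sketch below uses her identity relating false positive rate to prevalence, positive predictive value, and false negative rate; the group numbers are hypothetical, chosen only to show that two groups satisfying predictive parity (equal PPV) with equal FNR but different base rates are forced to have different FPRs.

```python
def fpr_from(prevalence, ppv, fnr):
    """Chouldechova's identity: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Two hypothetical groups with identical PPV (predictive parity) and
# identical FNR, but different base rates of the predicted outcome.
fpr_a = fpr_from(prevalence=0.3, ppv=0.7, fnr=0.2)  # lower-base-rate group
fpr_b = fpr_from(prevalence=0.5, ppv=0.7, fnr=0.2)  # higher-base-rate group

# The identity forces the false positive rates apart: no threshold or
# recalibration can equalize them while PPV and FNR stay matched.
print(f"group A FPR: {fpr_a:.3f}, group B FPR: {fpr_b:.3f}")
```

Because FPR is pinned down by the other three quantities, "equal error rates" and "equal predictive values" become competing constraints the moment base rates differ — which is why the instruction "just make it fair" underdetermines the design.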

3. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). "Inherent Trade-Offs in the Fair Determination of Risk Scores." Proceedings of Innovations in Theoretical Computer Science (ITCS). The companion impossibility result to Chouldechova's paper, arrived at independently. Kleinberg, Mullainathan, and Raghavan prove that calibration within groups and balance for the positive and negative classes (equal average scores among true positives, and among true negatives, across groups) cannot coexist except in degenerate cases: equal base rates or a perfect predictor. Together with Chouldechova's result, this paper establishes the mathematical foundations for the tradeoff framework in Section 26.3.

4. Hardt, M., Price, E., & Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." Advances in Neural Information Processing Systems (NeurIPS). Introduces the equalized odds fairness definition and proposes a post-processing method for achieving it. The paper's key insight is that equalized odds can be achieved by adjusting the model's threshold for each group — trading off some overall accuracy for equal error rates. The method is practical and widely implemented. Connects directly to the equalized odds discussion in Section 26.2.
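The per-group thresholding idea from Hardt et al. can be sketched in a few lines. This is an illustrative simplification, not the paper's full method (which also handles randomized thresholds and equalizes FPR): given held-out scores and labels for each group, it picks the threshold that achieves a target true positive rate, so that a model which scores one group's positives systematically lower still treats both groups' positives alike. All data here is synthetic.

```python
import numpy as np

def threshold_for_tpr(scores, labels, target_tpr):
    """Lowest threshold whose true positive rate meets target_tpr."""
    pos = np.sort(scores[labels == 1])[::-1]   # positive-class scores, high to low
    k = int(np.ceil(target_tpr * len(pos)))    # number of positives to capture
    return pos[k - 1]                          # threshold admitting the top k

rng = np.random.default_rng(0)
# Synthetic scores: group B's true positives are scored lower on average,
# so a single global threshold would give the groups unequal TPRs.
scores_a = np.concatenate([rng.normal(0.7, 0.1, 500), rng.normal(0.3, 0.1, 500)])
labels_a = np.concatenate([np.ones(500), np.zeros(500)])
scores_b = np.concatenate([rng.normal(0.6, 0.1, 500), rng.normal(0.3, 0.1, 500)])
labels_b = np.concatenate([np.ones(500), np.zeros(500)])

# Per-group thresholds equalize TPR at 0.9 for both groups.
t_a = threshold_for_tpr(scores_a, labels_a, 0.9)
t_b = threshold_for_tpr(scores_b, labels_b, 0.9)
```

The cost of the lower threshold for group B is a higher false positive rate there — the accuracy-for-equality trade the paper makes explicit.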


Explainability: SHAP and LIME

5. Lundberg, S. M., & Lee, S.-I. (2017). "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems (NeurIPS). The foundational SHAP paper. Lundberg and Lee show that Shapley values from cooperative game theory provide a unified framework for feature attribution that satisfies desirable theoretical properties (local accuracy, missingness, consistency). The paper connects several existing explanation methods (LIME, DeepLIFT, layer-wise relevance propagation) as special cases of a single framework. Read this paper to understand why SHAP has become the dominant explanation method.
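For small feature counts, the Shapley values underlying SHAP can be computed exactly by brute force, which makes the paper's "local accuracy" property concrete: the attributions sum to the gap between the prediction and a baseline prediction. The sketch below uses a single baseline vector to represent "missing" features — one common simplification; the SHAP library itself averages over a background distribution.

```python
from itertools import permutations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions: average each feature's marginal
    contribution to f over all orderings, holding absent features
    at their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)
        prev = f(z)
        for i in order:
            z[i] = x[i]              # reveal feature i
            cur = f(z)
            phi[i] += cur - prev     # its marginal contribution in this order
            prev = cur
    return [p / factorial(n) for p in phi]

# Toy model with an interaction between features 0 and 1; the Shapley
# values split the interaction's credit evenly between them.
f = lambda z: 2 * z[0] + z[1] + z[0] * z[1]
phi = shapley_values(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The factorial cost of this enumeration is exactly why the TreeExplainer result (item 6) matters in practice.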

6. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... & Lee, S.-I. (2020). "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence, 2(1), 56-67. Extends the original SHAP framework with fast exact algorithms for tree-based models (TreeExplainer) and introduces global explanation tools (SHAP interaction values, dependence plots). The paper demonstrates that for tree-based models, exact Shapley values can be computed in polynomial time — making SHAP practical for large-scale deployment. Directly relevant to the ExplainabilityDashboard implementation in Section 26.11.

7. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. The paper that introduced LIME (Local Interpretable Model-agnostic Explanations). Ribeiro, Singh, and Guestrin demonstrate that complex models can be explained locally by fitting simple, interpretable models to perturbations of individual predictions. The paper's title captures the fundamental question driving the explainability field. Read alongside the SHAP paper for a complete picture of the two dominant explanation approaches.
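LIME's core move — fit a simple weighted linear model to perturbations around one prediction — is compact enough to sketch. The version below is a simplified illustration, not the lime package's API: it perturbs by switching features back to a baseline, weights samples by how close they stay to the original input, and reads off per-feature local weights. The toy "black box" is linear, so the surrogate recovers it exactly.

```python
import numpy as np

def lime_explain(f, x, baseline, n_samples=2000, width=0.75, seed=0):
    """LIME-style local surrogate: perturb x toward a baseline, weight
    samples by proximity to x, fit a weighted linear model."""
    rng = np.random.default_rng(seed)
    d = len(x)
    masks = rng.integers(0, 2, size=(n_samples, d))   # 1 = keep feature from x
    X = np.where(masks == 1, x, baseline)             # perturbed inputs
    y = np.array([f(row) for row in X])
    dist = 1 - masks.mean(axis=1)                     # fraction of features dropped
    w = np.exp(-(dist ** 2) / width ** 2)             # proximity kernel
    A = np.hstack([masks, np.ones((n_samples, 1))])   # design matrix + intercept
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * np.sqrt(w), rcond=None)
    return coef[:d]                                   # per-feature local weights

f = lambda z: 3 * z[0] - 2 * z[1]                     # toy "black box"
weights = lime_explain(f, x=np.array([1.0, 1.0]), baseline=np.array([0.0, 0.0]))
```

For a genuinely nonlinear model the surrogate is only locally faithful — the source of much of the criticism LIME has attracted, and of the instability concerns discussed in the explainability literature.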

8. Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). christophm.github.io/interpretable-ml-book/. The most accessible and comprehensive guide to interpretable machine learning methods, available free online. Covers SHAP, LIME, partial dependence plots, feature importance, counterfactual explanations, and more — each with clear explanations, visual examples, and practical guidance. An excellent reference for any practitioner building explainability tools. Updated regularly.


Model Documentation

9. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT). The paper that introduced the model card framework discussed in Section 26.9. Mitchell et al. propose a standardized format for documenting machine learning models, inspired by datasheets in the electronics industry. The paper includes detailed examples for two models (a smile detection classifier and a toxicity detector) that serve as templates for practitioners. Now adopted as standard practice by Google, Hugging Face, and many other organizations.
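In code, a model card is just structured documentation that travels with the model. The sketch below mirrors the sections Mitchell et al. propose; the field names are illustrative, not a standardized schema.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    """Minimal sketch of the sections proposed in Mitchell et al. (2019).
    Field names here are illustrative, not a standardized schema."""
    model_details: dict          # developers, version, model type, license
    intended_use: dict           # primary uses and out-of-scope uses
    factors: list                # demographic groups / environments evaluated
    metrics: dict                # performance measures and decision thresholds
    evaluation_data: dict        # datasets, motivation, preprocessing
    training_data: dict
    quantitative_analyses: dict  # results disaggregated by the listed factors
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

card = ModelCard(
    model_details={"version": "1.0", "type": "gradient-boosted trees"},
    intended_use={"primary": "credit line assignment (illustrative)"},
    factors=["age band", "gender"],
    metrics={"auc": 0.87},
    evaluation_data={"source": "held-out 2023 applications (hypothetical)"},
    training_data={"source": "2019-2022 applications (hypothetical)"},
    quantitative_analyses={"auc_by_gender": {"F": 0.86, "M": 0.88}},
)
```

Making the disaggregated results a required field, rather than free text, is the point: the card cannot be "complete" without per-group numbers.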

10. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). "Datasheets for Datasets." Communications of the ACM, 64(12), 86-92. The companion framework to model cards, focused on documenting datasets rather than models. Gebru et al. argue that rigorous dataset documentation is essential for responsible AI development, as many AI failures originate in the training data. The paper provides a comprehensive template of questions that dataset creators should answer, organized into seven categories: motivation, composition, collection process, preprocessing, uses, distribution, and maintenance. Essential reading for any organization that creates or curates training data.


Legal and Regulatory Frameworks

11. Wachter, S., Mittelstadt, B., & Floridi, L. (2017). "Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation." International Data Privacy Law, 7(2), 76-99. A provocative and influential legal analysis arguing that GDPR does not actually create a "right to explanation" — contrary to the popular narrative. The authors distinguish between a right to be informed about the existence of automated processing (which GDPR clearly provides) and a right to receive a meaningful explanation of specific decisions (which is less clear). This paper is essential for understanding the legal debate referenced in Section 26.12 and for avoiding the oversimplification that "GDPR requires explanations."

12. Selbst, A. D., & Barocas, S. (2018). "The Intuitive Appeal of Explainable Machines." Fordham Law Review, 87(3), 1085-1139. A legal and philosophical examination of why we want explanations from machines and what counts as a "good" explanation. Selbst and Barocas argue that the demand for explainability reflects deeper concerns about accountability, contestability, and power — and that technical explanations (feature importance values) may not satisfy these underlying concerns. A thoughtful counterpoint to the assumption that SHAP values solve the explainability problem.

13. European Commission. (2024). Regulation (EU) 2024/1689: The Artificial Intelligence Act. Official Journal of the European Union. The full text of the EU AI Act, the world's first comprehensive AI regulation. Required reading for any organization deploying AI in or for the EU market. The Act establishes risk-based categories, transparency requirements, conformity assessments, and penalties (up to 7 percent of global annual turnover for the most serious violations). Particularly relevant to this chapter are the high-risk AI system requirements for transparency, human oversight, and documentation in Chapter III of the Act.


Applied Fairness and Case Studies

14. Vigdor, N. (2019). "Apple Card Investigated After Gender Discrimination Complaints." The New York Times, November 10, 2019. The news report that brought the Apple Card gender bias controversy to mainstream attention. Vigdor's reporting captures the public's reaction to Goldman Sachs' inability to explain credit limit decisions — the explainability failure that is the core of Case Study 1 in this chapter. Read alongside the NYDFS investigation findings for a complete picture.

15. Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). "Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT). Proposes a practical framework for internal AI auditing that organizations can adopt. The authors, drawing on experience at Google, describe five stages of an algorithmic audit: scoping, mapping, artifact collection, testing, and reflection. Directly relevant to the fairness audit workflow in Exercise 26.13 and to the governance frameworks in Chapter 27.

16. Buolamwini, J., & Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT). The landmark study that demonstrated dramatic accuracy disparities in commercial facial recognition systems across intersections of gender and skin tone. Buolamwini and Gebru found that IBM, Microsoft, and Face++ classifiers had error rates up to 34 percent for darker-skinned women compared to less than 1 percent for lighter-skinned men. The paper's methodology — disaggregated evaluation across demographic subgroups — has become standard practice and is directly reflected in the model card framework's requirement for performance metrics broken down by group.
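The Gender Shades methodology — report accuracy separately for each intersectional subgroup rather than in aggregate — reduces to a small amount of code. The sketch below uses hypothetical labels and group tags purely for illustration.

```python
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy broken down by subgroup, in the spirit of the
    Gender Shades disaggregated-evaluation methodology."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical predictions tagged with intersectional subgroups
# ("DF" = darker-skinned female, "LM" = lighter-skinned male).
y_true = [1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
groups = ["DF", "LM", "DF", "LM", "DF", "LM", "DF", "DF"]
per_group = disaggregated_accuracy(y_true, y_pred, groups)
```

An aggregate accuracy number over this data would hide exactly the disparity the per-group breakdown exposes — which is why model cards require metrics reported this way.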


Broader Context: Ethics and Society

17. O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown. The foundational popular text on algorithmic harm. O'Neil, a mathematician and former quantitative analyst, examines how predictive models in criminal justice, education, hiring, and insurance encode and amplify inequality. The book's core argument — that opaque, widespread, and destructive algorithmic systems constitute "weapons of math destruction" — provides essential context for understanding why explainability and fairness matter beyond regulatory compliance. Accessible to non-technical readers.

18. Rudin, C. (2019). "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead." Nature Machine Intelligence, 1(5), 206-215. A provocative and widely cited argument that in high-stakes applications, organizations should use inherently interpretable models rather than explaining black-box models with post-hoc methods. Rudin argues that the accuracy-interpretability tradeoff is often a myth — well-designed interpretable models frequently match black-box performance. This paper challenges the assumption underlying SHAP and LIME that complex models are necessary and explanation methods can adequately bridge the gap. Essential reading for anyone deciding between interpretable and complex models (Section 26.5).

19. Doshi-Velez, F., & Kim, B. (2017). "Towards a Rigorous Science of Interpretable Machine Learning." arXiv preprint arXiv:1702.08608. A foundational paper that attempts to formalize what "interpretability" means and how it should be evaluated. The authors distinguish between application-grounded, human-grounded, and functionally-grounded evaluation approaches. Important for moving beyond the vague claim that a model is "interpretable" to a rigorous assessment of what type of interpretability is needed and for whom.


Industry Practices and Tooling

20. Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., ... & Zimmermann, T. (2019). "Software Engineering for Machine Learning: A Case Study." IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice. A study of ML engineering practices at Microsoft, documenting the challenges of building production ML systems — including fairness testing, model documentation, and explanation tooling. Provides empirical evidence for the organizational and engineering challenges discussed in this chapter and previews the governance frameworks in Chapter 27.

21. Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J. M. F., & Eckersley, P. (2020). "Explainable Machine Learning in Deployment." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT). A study of how organizations actually deploy explainability tools in practice — not in research papers, but in production systems. The authors survey ML engineers at major technology companies and find that explainability is primarily used for internal debugging rather than external communication, and that the gap between academic explainability research and practical deployment remains wide. A sobering complement to the technical methods discussed in this chapter.

22. FICO. (2022). "Responsible Artificial Intelligence: Driving Fair, Transparent, and Ethical AI." FICO White Paper. FICO's own account of its approach to explainable credit scoring, including its reason code system and its approach to balancing accuracy with interpretability. Provides the industry perspective on the challenges discussed in Case Study 2. Available on FICO's website.


Additional References

23. Corbett-Davies, S., & Goel, S. (2018). "The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning." arXiv preprint arXiv:1808.00023. A comprehensive review of the fairness in machine learning literature that synthesizes the field's key contributions and unresolved tensions. Particularly valuable for its discussion of how different fairness definitions relate to different philosophical traditions (utilitarian, egalitarian, libertarian) and how context should guide the choice among them. An excellent bridge between the technical and philosophical dimensions of fairness.

24. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). "A Survey on Bias and Fairness in Machine Learning." ACM Computing Surveys, 54(6), 1-35. A broad survey covering over 100 papers on bias and fairness in ML, organized by bias type (data bias, algorithmic bias, user interaction bias) and mitigation approach (pre-processing, in-processing, post-processing). Useful as a reference for identifying specific types of bias and finding relevant mitigation techniques.

25. Suresh, H., & Guttag, J. (2021). "A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle." Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO). Identifies six distinct sources of harm in the ML pipeline: historical bias, representation bias, measurement bias, aggregation bias, evaluation bias, and deployment bias. This taxonomy is more granular than the Chapter 25 framework and provides a structured approach to auditing AI systems for potential harms at each stage of development and deployment.


Each item in this reading list was selected because it directly supports concepts introduced in Chapter 26. Items 1-4 deepen the fairness definitions and impossibility results. Items 5-8 provide technical foundations for SHAP, LIME, and interpretability. Items 9-10 cover model and dataset documentation. Items 11-13 address legal and regulatory frameworks. Items 14-16 provide case study context. Items 17-19 offer broader ethical and philosophical perspectives. Items 20-25 cover industry practice and survey literature. For the full bibliography, see Appendix C.