Chapter 39: Further Reading

Foundational Texts

  • Barocas, S., Hardt, M., and Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press. The definitive textbook on fairness in ML, covering definitions, impossibility results, and mitigation strategies. Freely available at https://fairmlbook.org/.

  • O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown. An accessible account of how algorithms can perpetuate and amplify societal inequities.

  • Crawford, K. (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press. Examines the material, social, and environmental costs of AI systems.

Bias and Fairness

Key Papers

  • Buolamwini, J. and Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." FAccT 2018. Demonstrated significant accuracy disparities in commercial facial recognition across gender and skin type.

  • Hardt, M., Price, E., and Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." NeurIPS 2016. Introduced the equalized odds and equal opportunity fairness definitions; both are stated formally in the sketch after this list.

  • Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2). Proved the impossibility of simultaneously satisfying calibration and equalized odds when base rates differ.

  • Kleinberg, J., Mullainathan, S., and Raghavan, M. (2017). "Inherent Trade-Offs in the Fair Determination of Risk Scores." ITCS 2017. An independent proof that calibration and balanced error rates across groups are incompatible except in degenerate cases.

  • Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). "Fairness Through Awareness." ITCS 2012. Introduced the individual fairness framework.
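
For quick reference, the two criteria from Hardt et al. (2016) can be written as follows. This is a sketch in the notation commonly used for these definitions, where Ŷ is the model's prediction, Y the true label, and A the protected attribute:

```latex
% Equalized odds: the prediction \hat{Y} is independent of the protected
% attribute A conditional on the true label Y, i.e. for all groups a, a'
% and both label values y \in \{0, 1\}:
\Pr[\hat{Y} = 1 \mid A = a,\, Y = y] \;=\; \Pr[\hat{Y} = 1 \mid A = a',\, Y = y]

% Equal opportunity relaxes this to the positive class (Y = 1) only,
% i.e. it asks for equal true positive rates across groups:
\Pr[\hat{Y} = 1 \mid A = a,\, Y = 1] \;=\; \Pr[\hat{Y} = 1 \mid A = a',\, Y = 1]
```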

Bias Mitigation

  • Zhang, B. H., Lemoine, B., and Mitchell, M. (2018). "Mitigating Unwanted Biases with Adversarial Learning." AIES 2018. Adversarial debiasing for fair classification.

  • Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. (2013). "Learning Fair Representations." ICML 2013. Learning representations that are invariant to protected attributes.

  • Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., and Wallach, H. (2018). "A Reductions Approach to Fair Classification." ICML 2018. Reducing fair classification to a sequence of standard classification problems.

Privacy

  • Dwork, C. (2006). "Differential Privacy." ICALP 2006. The foundational paper on differential privacy; the guarantee is stated after this list.

  • Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). "Deep Learning with Differential Privacy." CCS 2016. Introduced DP-SGD for training neural networks with differential privacy.

  • Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T. B., Song, D., Erlingsson, U., Oprea, A., and Raffel, C. (2021). "Extracting Training Data from Large Language Models." USENIX Security 2021. Demonstrated that GPT-2 memorizes and can regurgitate training data.

  • Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). "Membership Inference Attacks Against Machine Learning Models." IEEE S&P 2017. The foundational paper on membership inference attacks.
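
For quick reference, the guarantee these papers build on can be stated as follows (a sketch: M is a randomized training or query mechanism, and D, D' are neighbouring datasets differing in a single record):

```latex
% (epsilon, delta)-differential privacy: for every pair of neighbouring
% datasets D, D' and every set of outputs S,
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon}\, \Pr[\, M(D') \in S \,] + \delta

% Dwork (2006) introduced the pure case (delta = 0); DP-SGD (Abadi et al.,
% 2016) achieves the (epsilon, delta) relaxation by clipping per-example
% gradients and adding Gaussian noise before each update.
```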

Regulation

  • European Parliament and Council (2024). Regulation (EU) 2024/1689 (AI Act). The full text of the EU AI Act. Available at https://eur-lex.europa.eu/.

  • NIST (2023). AI Risk Management Framework (AI RMF 1.0). A voluntary US framework for managing AI risks. Available at https://www.nist.gov/itl/ai-risk-management-framework.

  • Smuha, N. A. (2021). "From a 'Race to AI' to a 'Race to AI Regulation': Regulatory Competition for Artificial Intelligence." Law, Innovation and Technology, 13(1). Comparative analysis of AI regulation across jurisdictions.

Documentation and Accountability

  • Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T. (2019). "Model Cards for Model Reporting." FAccT 2019. Proposed standardized documentation for trained ML models.

  • Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daume III, H., and Crawford, K. (2021). "Datasheets for Datasets." Communications of the ACM, 64(12). Proposed standardized documentation for datasets.

AI Safety

  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mane, D. (2016). "Concrete Problems in AI Safety." arXiv:1606.06565. Outlines five practical safety problems: avoiding side effects, reward hacking, scalable oversight, safe exploration, and distributional shift.

  • Ngo, R., Chan, L., and Mindermann, S. (2024). "The Alignment Problem from a Deep Learning Perspective." ICLR 2024. A modern treatment of AI alignment challenges specific to deep learning systems.

  • Hendrycks, D., Carlini, N., Schulman, J., and Steinhardt, J. (2021). "Unsolved Problems in ML Safety." arXiv:2109.13916. A comprehensive catalog of open problems in ML safety.

Environmental Impact

  • Strubell, E., Ganesh, A., and McCallum, A. (2019). "Energy and Policy Considerations for Deep Learning in NLP." ACL 2019. Quantified the carbon footprint of training large NLP models and called for energy reporting.

  • Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., and Dean, J. (2021). "Carbon Emissions and Large Neural Network Training." arXiv:2104.10350. Google's analysis of the carbon footprint of training large models.

Deepfakes

  • Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A., and Ortega-Garcia, J. (2020). "DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection." Information Fusion, 64, 131--148. Comprehensive survey of deepfake generation and detection techniques.

Software and Tools

  • Fairlearn: https://fairlearn.org/. Open-source toolkit, originally from Microsoft, for assessing and improving the fairness of AI systems; a short usage sketch follows this list.

  • AI Fairness 360 (AIF360): https://aif360.mybluemix.net/. IBM's comprehensive fairness toolkit with 70+ metrics and 10+ mitigation algorithms.

  • Opacus: https://opacus.ai/. PyTorch library for training with differential privacy (DP-SGD); a short usage sketch follows this list.

  • What-If Tool: https://pair-code.github.io/what-if-tool/. Google's interactive tool for probing ML models for fairness.
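
A minimal sketch of a group fairness audit with Fairlearn's MetricFrame. The labels, predictions, and sensitive feature below are toy data for illustration only:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate

# Toy labels, predictions, and a binary sensitive feature (illustrative only).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
sex    = np.array(["F", "F", "F", "M", "M", "M", "M", "F"])

# Compute each metric overall and per group of the sensitive feature.
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(frame.by_group)      # metric values broken down by group
print(frame.difference())  # largest between-group gap for each metric
```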
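A minimal sketch of DP-SGD training with Opacus's PrivacyEngine. The model, data, noise_multiplier, and max_grad_norm values are toy placeholders, not recommendations:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy data and model (illustrative only).
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Wrap model, optimizer, and data loader so each step clips per-example
# gradients and adds calibrated Gaussian noise (DP-SGD, Abadi et al. 2016).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # noise scale; illustrative value
    max_grad_norm=1.0,     # per-sample gradient clipping bound; illustrative
)

# One pass over the data with the private optimizer.
for xb, yb in loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()

# Privacy budget spent so far, at a chosen delta.
print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```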