Further Reading: Statistics and AI: Being a Critical Consumer of Data
Books
For Deeper Understanding
Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (2016) O'Neil, a mathematician and former hedge fund quant, coined the term "weapons of math destruction" (WMDs) for algorithms that are opaque, unregulated, and difficult to contest. Her case studies — predatory lending models, teacher evaluation algorithms, recidivism prediction systems — are the narrative companions to the technical analysis in this chapter. O'Neil's central argument is that algorithms encode the biases of their creators and the inequities of their training data, then apply those biases at scale. If this chapter convinced you that algorithmic bias is a statistical problem, this book will show you its human toll.
Hannah Fry, Hello World: Being Human in the Age of Algorithms (2018) Fry, a mathematician at University College London, provides a balanced, accessible overview of algorithms in criminal justice, medicine, transportation, and art. What sets this book apart is Fry's nuanced approach: she doesn't demonize algorithms or evangelize them. Instead, she applies the kind of critical evaluation we've practiced in this chapter — asking what an algorithm does well, where it fails, and what the alternative is. Her chapter on criminal justice covers COMPAS with both rigor and empathy.
Brian Christian, The Alignment Problem: Machine Learning and Human Values (2020) Christian traces the history of how machine learning systems learn — and fail to learn — human values. The book covers reinforcement learning, reward hacking, fairness definitions, and interpretability in depth. For readers who want to understand the technical mechanisms behind the problems described in this chapter (why algorithms overfit, how bias enters training pipelines, what "explainability" actually means), Christian provides the most thorough treatment available for a general audience.
Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (2018) Noble examines how search engine algorithms — trained on patterns in web content — reproduce and amplify racial and gender stereotypes. Her analysis of Google search results for terms like "Black girls" reveals how statistical patterns in training data can produce deeply harmful outputs. This book provides the sociological context for the technical concepts of training data bias and proxy variables discussed in Section 26.3.
Meredith Broussard, Artificial Unintelligence: How Computers Misunderstand the World (2018) Broussard introduces the concept of "technochauvinism" — the assumption that technology solutions are always superior to human solutions. Her critique provides a useful framework for the prediction vs. inference distinction from Section 26.7. Not every problem is best solved by more data and bigger models; some problems require human understanding, judgment, and empathy.
For the Conceptually Curious
Ajay Agrawal, Joshua Gans, and Avi Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence (2018) Three economists argue that AI should be understood as a drop in the cost of prediction — and explore the economic consequences. Their framework maps directly onto our prediction vs. inference distinction: prediction is getting cheaper, but judgment (what to do with predictions) remains expensive and essentially human. An accessible economics-of-AI primer.
Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (2018) Pearl, a Turing Award winner, makes the case that AI systems need causal reasoning — not just pattern recognition — to be truly intelligent. His argument deepens the correlation-vs.-causation thread from Chapter 4 and Chapter 22: current machine learning finds correlations at scale but cannot reason about causes. Pearl's "ladder of causation" (seeing, doing, imagining) provides a framework for understanding the fundamental limits of prediction without causal models.
Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World (2015) Domingos provides a tour of the five major tribes of machine learning (analogizers, Bayesians, symbolists, connectionists, and evolutionaries) and argues they're converging toward a single "master algorithm." For readers who want to understand where supervised learning, unsupervised learning, and reinforcement learning fit in the broader ML landscape, this is an engaging, if occasionally speculative, guide.
Arvind Narayanan and Sayash Kapoor, AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference (2024) A timely and rigorous guide from two Princeton researchers who distinguish between AI applications that work well (computer vision, game playing, language translation) and those that are largely "snake oil" (predicting social outcomes, criminal recidivism, job performance). Their framework aligns closely with the STATS checklist: always ask what the training data looks like, how performance was validated, and whether the claims are supported by the evidence. Essential reading for anyone who wants to be a more critical consumer of AI claims.
Articles and Papers
Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). "Machine Bias." ProPublica. The investigative report that launched the COMPAS debate. ProPublica's analysis of 7,000+ defendants in Broward County, Florida found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high risk. The analysis is statistically rigorous and clearly presented — a model of data journalism. Available free at propublica.org. This is the primary source for Case Study 1.
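ProPublica's headline finding is about false positive rates: among people who did not reoffend, what share were flagged high risk? A minimal sketch of that calculation, using invented confusion-matrix counts (not ProPublica's actual Broward County numbers) chosen to produce a roughly two-to-one disparity:

```python
# False positive rate by group, with hypothetical confusion-matrix counts
# (illustrative only -- NOT ProPublica's actual Broward County data).

def false_positive_rate(false_pos, true_neg):
    """Share of people who did NOT reoffend but were flagged high risk."""
    return false_pos / (false_pos + true_neg)

# Hypothetical (false positives, true negatives) per group
group_a = (805, 990)
group_b = (349, 1139)

fpr_a = false_positive_rate(*group_a)
fpr_b = false_positive_rate(*group_b)
print(f"FPR A = {fpr_a:.2f}, FPR B = {fpr_b:.2f}, ratio = {fpr_a / fpr_b:.1f}")
```

The real analysis in ProPublica's repository applies exactly this metric to the actual defendant-level data.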
Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163. The mathematical proof that when base rates differ between groups, no prediction algorithm can simultaneously equalize false positive rates, false negative rates, and predictive values. This paper makes the impossibility result discussed in Section 26.5 rigorous. The paper is technical but the core theorem is clearly stated and the implications are well explained.
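The core of Chouldechova's theorem can be seen in a few lines of arithmetic. From the definition of positive predictive value, PPV = r·TPR / (r·TPR + (1 − r)·FPR), solving for the false positive rate gives FPR = (r / (1 − r)) · TPR · (1 − PPV) / PPV, where r is the group's base rate. A sketch with made-up parameter values (the base rates, PPV, and TPR below are hypothetical, not from any real instrument):

```python
# If two groups share the same PPV (calibration) and the same TPR
# (sensitivity), their false positive rates are forced by their base rates:
#   FPR = (r / (1 - r)) * TPR * (1 - PPV) / PPV
# so differing base rates imply differing FPRs -- the impossibility
# result in miniature. All numbers below are hypothetical.

def implied_fpr(base_rate, ppv, tpr):
    return (base_rate / (1 - base_rate)) * tpr * (1 - ppv) / ppv

fpr_low  = implied_fpr(base_rate=0.3, ppv=0.6, tpr=0.7)  # lower base rate
fpr_high = implied_fpr(base_rate=0.5, ppv=0.6, tpr=0.7)  # higher base rate
print(f"FPR at r=0.3: {fpr_low:.2f}; FPR at r=0.5: {fpr_high:.2f}")
```

Equalizing calibration thus guarantees unequal error rates whenever base rates differ, which is precisely the trade-off at the heart of the COMPAS debate.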
Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations." Science, 366(6464), 447-453. The landmark study documenting how a widely used healthcare algorithm systematically under-identified Black patients' medical needs by using cost as a proxy for health. The authors show that fixing the algorithm would increase the percentage of Black patients flagged for extra care from 17.7% to 46.5%. This paper demonstrates the proxy variable problem from Section 26.5 with devastating clarity.
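The mechanism Obermeyer et al. document can be reproduced in a toy example. The numbers below are invented, not from the study: assume two groups with identical medical need, where group B incurs only 60% of the cost for the same need (for instance, because of unequal access to care). Ranking by the cost proxy then systematically under-selects group B:

```python
# Toy illustration of the proxy-variable problem (all numbers invented):
# identical need in both groups, but group B's cost is 0.6 * need.

need_levels = list(range(1, 101))  # need scores 1..100 in each group
patients = [("A", n, n) for n in need_levels] \
         + [("B", n, 0.6 * n) for n in need_levels]  # (group, need, cost)

def top_share(patients, key_index, k=40):
    """Share of group B among the k patients ranked highest by one column."""
    flagged = sorted(patients, key=lambda p: p[key_index], reverse=True)[:k]
    return sum(1 for group, _, _ in flagged if group == "B") / k

print("group B share, ranked by need:", top_share(patients, key_index=1))
print("group B share, ranked by cost:", top_share(patients, key_index=2))
```

Ranking by need flags the two groups equally; ranking by the cost proxy flags no one from group B at all, even though their underlying need is identical.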
Dastin, J. (2018). "Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women." Reuters. The reporting on Amazon's internal hiring algorithm that learned to penalize resumes containing the word "women's." A concise, well-reported example of how training data bias produces algorithmic bias. Essential background for the confounding variable analysis in Section 26.5.
Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). "The Parable of Google Flu: Traps in Big Data Analysis." Science, 343(6176), 1203-1205. A post-mortem on Google Flu Trends that identifies two key failures: overfitting to spurious correlations and the instability of search-behavior patterns over time. The authors argue that big data hubris — the assumption that enough data eliminates the need for theory and careful methodology — led to GFT's failure. This paper is the definitive source for Section 26.6's critique of big data fallacies.
Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623. The influential (and controversial) paper arguing that large language models have environmental costs, encode biases from their training data, and can produce text that appears meaningful without genuine understanding. The "stochastic parrot" metaphor — a system that produces statistically likely sequences without comprehension — connects directly to the LLM discussion in Section 26.8.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229. Proposes a standardized framework for documenting machine learning models — including training data, intended uses, performance metrics across subgroups, and limitations. Model cards operationalize many of the questions in the STATS checklist. If you've ever wished AI systems came with a "nutrition label," this paper is where that idea was formalized.
Online Resources
ProPublica's COMPAS Data and Analysis (github.com/propublica/compas-analysis) The full dataset and analysis code from ProPublica's Machine Bias investigation. For students who want to replicate James's analysis (or extend it), the data and R code are publicly available.
Tyler Vigen, Spurious Correlations (tylervigen.com/spurious-correlations) The famous website showing absurd but statistically real correlations between unrelated variables. A vivid, entertaining demonstration of why correlation does not imply causation — and why data mining in large datasets can produce compelling but meaningless patterns.
AI Incident Database (incidentdatabase.ai) A searchable database of real-world AI failures and harms. Each incident is cataloged with details about what went wrong and why. An excellent resource for applying the STATS checklist to real cases.
Algorithmic Justice League (ajl.org) Founded by computer scientist Joy Buolamwini, whose research documented racial and gender bias in facial recognition systems. The site includes research papers, educational resources, and policy recommendations.
Google's "People + AI Guidebook" (pair.withgoogle.com/guidebook) A practical resource for designing AI systems that work well for people. Covers topics including mental models of AI, setting expectations, explaining AI, and handling errors. Useful for understanding the "explanation" side of the transparency debate from Section 26.10.
Connections to Future Chapters
| Topic | Where It Goes Next |
|---|---|
| Algorithmic bias and ethics | Ch.27 deepens the ethical framework — from evaluating AI as a consumer to producing data ethically as a practitioner |
| Simpson's paradox | Ch.27 shows how data can tell opposite stories at different levels of aggregation — a crucial tool for understanding misleading AI claims |
| The STATS checklist | Carries forward to Ch.27 (applied to ethical claims) and Ch.28 (part of your lifelong toolkit) |
| Critical evaluation skills | Ch.28 positions these skills as foundational for continued learning and professional practice |