Bibliography

This bibliography organizes sources into three tiers reflecting the textbook's citation practices, followed by a recommended reading list.


Tier 1: Verified Sources with Full APA 7th Edition Citations

These sources are peer-reviewed publications, government reports, or well-established reference works with verifiable bibliographic information.

Books

Agresti, A. (2018). Statistical methods for the social sciences (5th ed.). Pearson.

Cairo, A. (2016). The truthful art: Data, charts, and maps for communication. New Riders.

Cairo, A. (2019). How charts lie: Getting smarter about visual information. W. W. Norton.

Cleveland, W. S. (1993). Visualizing data. Hobart Press.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.

De Veaux, R. D., Velleman, P. F., & Bock, D. E. (2021). Stats: Data and models (5th ed.). Pearson.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman and Hall/CRC.

Fisher, R. A. (1935). The design of experiments. Oliver and Boyd.

Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. Simon & Schuster.

Huff, D. (1954). How to lie with statistics. W. W. Norton.

Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. Wiley.

McGrayne, S. B. (2011). The theory that would not die: How Bayes' rule cracked the Enigma code, hunted down Russian submarines, and emerged triumphant from two centuries of controversy. Yale University Press.

Moore, D. S., McCabe, G. P., & Craig, B. A. (2021). Introduction to the practice of statistics (10th ed.). W. H. Freeman.

O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.

Salsburg, D. (2001). The lady tasting tea: How statistics revolutionized science in the twentieth century. W. H. Freeman.

Silver, N. (2012). The signal and the noise: Why so many predictions fail — but some don't. Penguin Press.

Taleb, N. N. (2007). The black swan: The impact of the highly improbable. Random House.

Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.

Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.

Wheelan, C. (2013). Naked statistics: Stripping the dread from the data. W. W. Norton.

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.

Journal Articles and Conference Papers

Agresti, A., & Coull, B. A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician, 52(2), 119-126.

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17-21.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425.

Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187(4175), 398-404.

Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101-133.

Casscells, W., Schoenberger, A., & Graboys, T. B. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299(18), 999-1001.

Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153-163.

Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1), 92-101.

Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.

Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263.

Gosset, W. S. [Student]. (1908). The probable error of a mean. Biometrika, 6(1), 1-25.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.

Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788-8790.

Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67(4), 371-378.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157-175.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638-641.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05." The American Statistician, 73(Suppl. 1), 1-19.

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.

Wong, B. (2011). Color blindness. Nature Methods, 8(6), 441.

Government and Institutional Reports

American Statistical Association. (2022). Ethical guidelines for statistical practice (revised). https://www.amstat.org/your-career/ethical-guidelines-for-statistical-practice

Centers for Disease Control and Prevention. (n.d.). Behavioral Risk Factor Surveillance System. https://www.cdc.gov/brfss

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. U.S. Department of Health, Education, and Welfare.

U.S. Census Bureau. (n.d.). American Community Survey. https://www.census.gov/programs-surveys/acs


Tier 2: Attributed Claims

These are claims attributed to specific people, organizations, or time periods but not traced to a specific peer-reviewed publication. They may come from speeches, interviews, popular books, widely known historical events, or general knowledge attributed to a named source.

  • Abraham Wald's analysis of survivorship bias in WWII bomber damage assessment (1943). Widely documented in popular accounts; original statistical work was classified.

  • Florence Nightingale's coxcomb diagrams showing causes of mortality in the Crimean War (1858). Original publication: Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army.

  • The Literary Digest presidential poll of 1936 predicting Landon over Roosevelt. Documented in numerous statistics textbooks and histories of polling.

  • George Gallup's successful prediction of the 1936 election using scientific sampling. Documented in histories of American polling.

  • The Tuskegee Syphilis Study (1932-1972). Extensively documented by the CDC and in numerous historical accounts. See also: Jones, J. H. (1981). Bad blood: The Tuskegee syphilis experiment. Free Press.

  • The Sally Clark wrongful conviction case (1999-2003). Court records are public; statistical analysis discussed in: Hill, R. (2004). Multiple sudden infant deaths — coincidence or beyond coincidence? Paediatric and Perinatal Epidemiology, 18(5), 320-326.

  • Hans Rosling's Gapminder presentations and the "200 Countries, 200 Years, 4 Minutes" BBC visualization. Available at gapminder.org.

  • Google Flu Trends (2008-2015) as a cautionary tale. Discussed in: Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203-1205.

  • Latanya Sweeney's research showing that 87% of Americans are likely uniquely identifiable by date of birth, 5-digit ZIP code, and gender. Original: Sweeney, L. (2000). Simple demographics often identify people uniquely (Carnegie Mellon University Data Privacy Working Paper 3).

  • Arvind Narayanan and Vitaly Shmatikov's de-anonymization of the Netflix Prize dataset (2006). Published in: Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy, 111-125.

  • Amazon's hiring algorithm that penalized resumes containing the word "women's." Reported in Reuters (2018) by Jeffrey Dastin.

  • Carly Fiorina quote: "The goal is to turn data into information, and information into insight." Widely attributed; adapted for the Chapter 28 epigraph.


Tier 3: Illustrative Examples Created for the Text

The following examples and scenarios were created by the authors specifically for this textbook. They are pedagogical illustrations, not reports of actual events or real data analyses.

  • Dr. Maya Chen and her epidemiological analyses — fictional character and all associated datasets, results, and case studies.
  • Alex Rivera and StreamVibe — fictional character, fictional company, and all A/B testing scenarios.
  • Professor James Washington and his algorithmic fairness research — fictional character. Analysis methodology is based on real techniques; specific data values are illustrative.
  • Sam Okafor, the Riverside Raptors, and Daria Williams — fictional characters and all basketball statistical analyses.
  • All worked examples with specific numerical values (unless explicitly attributed to a real dataset).
  • All case study scenarios in case-study-01 and case-study-02 files, except those directly citing real published studies.
  • The Data Detective Portfolio dataset analyses shown in chapter project checkpoints.

Top 15 Resources for Continuing Your Statistical Education

  1. Wheelan, C. (2013). Naked Statistics: Stripping the Dread from the Data. The most accessible introduction to statistical thinking. If you want to recommend one book to a friend who "doesn't do math," this is it.

  2. Kahneman, D. (2011). Thinking, Fast and Slow. How human psychology systematically distorts our interpretation of statistical evidence. Essential for understanding why statistical training matters.

  3. Silver, N. (2012). The Signal and the Noise. How to distinguish real patterns from random noise in prediction. Excellent case studies in politics, sports, weather, and economics.

  4. O'Neil, C. (2016). Weapons of Math Destruction. How algorithms and statistical models can perpetuate inequality. The go-to book for understanding algorithmic bias.

  5. Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). The definitive work on data visualization. Beautiful, profound, and practical.

  6. Knaflic, C. N. (2015). Storytelling with Data. A practical guide to communicating data analysis results effectively. More applied than Tufte.

  7. Gigerenzer, G. (2002). Calculated Risks. How to present and understand probability in a way humans can actually process, notably by restating risks as natural frequencies. Influential in how medical test results are communicated.

  8. McGrayne, S. B. (2011). The Theory That Would Not Die. The remarkable history of Bayes' theorem and its journey from obscurity to a central tool of modern statistics and machine learning.

  9. Salsburg, D. (2001). The Lady Tasting Tea. The stories behind the people who invented modern statistics. Makes Fisher, Pearson, Gosset, and Neyman come alive.

  10. Cairo, A. (2019). How Charts Lie. A practical guide to detecting misleading visualizations in the media. Essential for statistical literacy.

  11. Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists (2nd ed.). O'Reilly. Bridges the gap between traditional statistics and modern data science. Excellent Python examples.

  12. James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). An Introduction to Statistical Learning with Applications in Python. Springer. The natural next step after this textbook if you want to pursue machine learning. Free online at statlearning.com.

  13. Spiegelhalter, D. (2019). The Art of Statistics: Learning from Data. A modern introduction to statistical thinking by one of the UK's most prominent statisticians. Covers Bayesian methods, causal inference, and AI.

  14. Harford, T. (2021). The Data Detective: Ten Easy Rules to Make Sense of Statistics. A journalist's guide to not being fooled by data-driven claims. Engaging writing and real-world examples.

  15. Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. For students who want to go deeper on causal inference — the frontier beyond "correlation does not imply causation."

Online Resources

  • Khan Academy Statistics and Probability (khanacademy.org): Free video lessons covering most topics in this textbook.
  • StatQuest with Josh Starmer (youtube.com/statquest): Engaging video explanations of statistical concepts with memorable visual style.
  • Seeing Theory (seeing-theory.brown.edu): Beautiful interactive visualizations of probability and statistics concepts.
  • Gapminder Tools (gapminder.org/tools): Interactive data exploration tool created by Hans Rosling's foundation.
  • Our World in Data (ourworldindata.org): Research-driven data and visualizations on global issues.
  • Python Data Science Handbook by Jake VanderPlas (jakevdp.github.io/PythonDataScienceHandbook): Free online textbook covering NumPy, pandas, Matplotlib, and scikit-learn.
  • statsmodels documentation (statsmodels.org): Complete reference for the Python statistical modeling library used throughout this textbook.