Chapter 6 Further Reading: Descriptive Statistics for Sports


Foundational Textbooks

Statistics and Data Analysis

  • Devore, J. L. & Berk, K. N. Modern Mathematical Statistics with Applications (2nd Edition). Springer, 2012. A rigorous treatment of descriptive and inferential statistics with strong coverage of distributional theory. Chapters 1-4 provide the mathematical foundations underlying the descriptive methods used in this chapter.

  • Freedman, D., Pisani, R., & Purves, R. Statistics (4th Edition). W. W. Norton, 2007. An exceptionally clear, intuition-driven introduction to statistics. The explanations of average, spread, correlation, and the normal curve are among the best available for building conceptual understanding before diving into sports applications.

  • Wackerly, D. D., Mendenhall, W., & Scheaffer, R. L. Mathematical Statistics with Applications (7th Edition). Cengage Learning, 2008. Comprehensive coverage of probability distributions, moment-generating functions, and order statistics. Useful for readers who want deeper mathematical treatment of the measures of central tendency and variability.

  • Tukey, J. W. Exploratory Data Analysis. Addison-Wesley, 1977. The classic text on EDA. Tukey invented the box plot and formalized many of the visualization techniques discussed in this chapter. Essential reading for understanding the philosophy behind data exploration before formal modeling.


Sports Analytics Books

  • Albert, J., Glickman, M. E., Swartz, T. B., & Koning, R. H. (Eds.) Handbook of Statistical Methods and Analyses in Sports. CRC Press, 2017. A comprehensive reference covering statistical methods across multiple sports. Chapters on football, basketball, baseball, hockey, and soccer each discuss sport-specific descriptive measures and distributional properties.

  • Winston, W. L. Mathletics: How Gamblers, Managers, and Fans Use Mathematics in Sports. Princeton University Press, 2012. Accessible introduction to quantitative sports analysis with numerous examples of descriptive statistics applied to real sports data. Excellent coverage of Pythagorean win expectation and its connection to point differential.

  • Severini, T. A. Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports (2nd Edition). CRC Press, 2020. Mathematically rigorous treatment of sports analytics. Strong coverage of descriptive statistics in context, including scoring distributions, correlation analysis, and the statistical properties of various sports metrics.

  • Tango, T. M., Lichtman, M. G., & Dolphin, A. E. The Book: Playing the Percentages in Baseball. Potomac Books, 2007. While baseball-specific, this book is a masterclass in applying descriptive statistics to evaluate players and strategies. The treatment of sample size, regression to the mean, and weighted averages is applicable across all sports.

  • Oliver, D. Basketball on Paper: Rules and Tools for Performance Analysis. Potomac Books, 2004. Foundational text for basketball analytics. Introduces the "Four Factors" of basketball success and demonstrates how descriptive statistics can identify what truly drives winning. The correlation analysis between various statistics and winning percentage is directly relevant to this chapter.


Sports Betting and Gambling Mathematics

  • Poundstone, W. Fortune's Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street. Hill and Wang, 2005. Narrative history of the Kelly criterion and its application to gambling and investing. While not directly about descriptive statistics, it contextualizes why accurate statistical summaries matter for bet sizing.

  • Levitt, S. D. "Why Are Gambling Markets Organised So Differently from Financial Markets?" The Economic Journal, 114(495), 223-246, 2004. Academic paper examining the structure of sports betting markets. Includes descriptive statistical analysis of NFL point spreads and demonstrates how the statistical properties of betting markets differ from those of efficient financial markets.

  • Boulier, B. L. & Stekler, H. O. "Predicting the Outcomes of National Football League Games." International Journal of Forecasting, 19(2), 257-270, 2003. Uses descriptive and inferential statistics to evaluate NFL prediction methods. Good example of how descriptive statistics form the basis for predictive models.

  • Stern, H. S. "On the Probability of Winning a Football Game." The American Statistician, 45(3), 179-183, 1991. Classic paper examining the distributional properties of NFL scoring and game outcomes. Directly relevant to the discussion of normality assumptions in football scoring data.


Data Visualization

  • Tufte, E. R. The Visual Display of Quantitative Information (2nd Edition). Graphics Press, 2001. The definitive work on data visualization principles. Tufte's concepts of data-ink ratio, chartjunk elimination, and small multiples are essential for creating the clear, informative sports visualizations discussed in this chapter.

  • Wilke, C. O. Fundamentals of Data Visualization. O'Reilly Media, 2019. Modern, practical guide to data visualization using principles from perception science. Available free online at clauswilke.com/dataviz. Excellent reference for choosing the right chart type for sports data analysis.

  • VanderPlas, J. Python Data Science Handbook. O'Reilly Media, 2016. Practical guide to data visualization and analysis using Python's scientific stack (NumPy, Pandas, Matplotlib, Seaborn). The visualization chapters are directly applicable to the Python code examples in this chapter.

  • McKinney, W. Python for Data Analysis (3rd Edition). O'Reilly Media, 2022. The essential reference for Pandas, written by its creator. Covers data manipulation, grouping, aggregation, and descriptive statistics computation in Python. Invaluable for implementing the analyses described in this chapter.


Academic Papers and Research

Scoring Distributions and Key Numbers

  • Stern, H. S. "A Brownian Motion Model for the Progress of Sports Scores." Journal of the American Statistical Association, 89(427), 1128-1134, 1994. Models the evolution of game scores as Brownian motion, providing theoretical justification for why point differentials are approximately normally distributed.

  • Urschel, J. D. & Zhuang, J. "Are NFL Games Getting Closer? Analyzing Margins of Victory from 1978 to 2019." Journal of Sports Analytics, 6(3), 169-178, 2020. Uses descriptive statistics to analyze trends in NFL competitive balance over four decades. Excellent example of longitudinal descriptive analysis.

Home Advantage

  • Jamieson, J. P. "Home Field Advantage in Athletics: A Meta-Analysis." Journal of Applied Social Psychology, 40(7), 1819-1848, 2010. Comprehensive meta-analysis quantifying home advantage across sports using descriptive and inferential statistics. Provides the empirical basis for the home-field adjustments discussed in Case Study 2.

  • Moskowitz, T. J. & Wertheim, L. J. Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won. Crown Archetype, 2011. Data-driven examination of home advantage, referee bias, and other phenomena. Heavy use of descriptive statistics to challenge conventional wisdom.

Correlation and Predictive Value

  • Schuckers, M. E. Statistical Thinking in Sports. CRC Press, 2022. Textbook-style treatment of statistical methods in sports with emphasis on correlation, regression, and model evaluation. Strong bridge between descriptive and inferential methods.

  • Berri, D. J., Schmidt, M. B., & Brook, S. L. The Wages of Wins: Taking Measure of the Many Myths in Modern Sport. Stanford University Press, 2006. Uses correlation analysis and descriptive statistics to debunk common myths about what statistics drive winning in professional sports. Particularly strong on basketball and football.


Online Resources

Websites and Databases

  • Basketball Reference (basketball-reference.com) Comprehensive NBA statistics database. Ideal for practicing descriptive statistics calculations with real data. Includes per-game, per-36-minute, per-100-possession, and advanced statistics.

  • Pro Football Reference (pro-football-reference.com) Complete NFL statistics archive. The "Team Stats" and "Standings" pages provide data suitable for the exercises in this chapter.

  • FanGraphs (fangraphs.com) Advanced baseball statistics and analysis. Their glossary of statistics is an excellent reference for understanding baseball-specific descriptive measures like wOBA, FIP, and WAR.

  • Hockey Reference (hockey-reference.com) NHL statistics database with traditional and advanced metrics. Useful for cross-sport comparisons of scoring distributions and variability.

  • Kaggle Sports Datasets (kaggle.com/datasets?tags=sports) Large collection of freely available sports datasets suitable for practicing the Python code examples from this chapter. Includes NFL play-by-play data, NBA shot logs, and MLB pitch-level data.

Python Libraries Documentation

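  • NumPy (numpy.org/doc) Documentation for numerical computing functions including mean, median, standard deviation, percentiles, and correlation. These functions live in the top-level numpy namespace (for example numpy.mean, numpy.median, numpy.std, numpy.percentile, and numpy.corrcoef) and are the foundation for the code examples in this chapter.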

  • SciPy Statistics (docs.scipy.org/doc/scipy/reference/stats.html) Documentation for advanced statistical functions including trimmed mean, skewness, kurtosis, distribution fitting, and statistical tests used in the exercises.

  • Pandas (pandas.pydata.org/docs) Documentation for the DataFrame operations used to organize and analyze sports data. The describe(), rolling(), corr(), and groupby() methods are essential tools.

  • Matplotlib (matplotlib.org/stable/contents.html) Documentation for the primary Python plotting library. All visualizations in this chapter's code examples use Matplotlib as the rendering backend.

  • Seaborn (seaborn.pydata.org) Documentation for the statistical visualization library built on Matplotlib. Its heatmap, box plot, violin plot, and distribution plot functions simplify the creation of publication-quality visualizations.
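
As a quick orientation to the libraries listed above, the following is a minimal sketch of how the NumPy, SciPy, and Pandas calls mentioned in these entries fit together. The teams and scores are hypothetical values invented for illustration, not real results, and the variable names are assumptions rather than names used elsewhere in the chapter.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical points scored and allowed by two made-up teams (illustrative only).
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "points": [21, 24, 17, 38, 10, 13, 20, 17],
    "allowed": [17, 20, 23, 14, 24, 27, 16, 21],
})
points = games["points"].to_numpy()

# NumPy: central tendency and spread from the top-level namespace.
mean_pts = np.mean(points)
median_pts = np.median(points)
std_pts = np.std(points, ddof=1)          # ddof=1 gives the sample standard deviation
q1, q3 = np.percentile(points, [25, 75])  # quartiles, for the interquartile range

# SciPy: trimmed mean and skewness.
# With 8 values, cutting 12.5% from each tail drops the single lowest and highest score.
trimmed = stats.trim_mean(points, proportiontocut=0.125)
skew = stats.skew(points)

# Pandas: per-team summaries, correlation, and a rolling average.
summary = games.groupby("team")["points"].describe()
r = games["points"].corr(games["allowed"])  # Pearson correlation by default
rolling = games.groupby("team")["points"].rolling(window=2).mean()

print(f"mean={mean_pts:.1f} median={median_pts:.1f} trimmed={trimmed:.2f}")
print(summary[["mean", "std", "min", "max"]])
```

The single 38-point game pulls the mean above both the median and the trimmed mean, which is the kind of outlier sensitivity that comparing the three measures is meant to reveal.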


Video Lectures and Courses

  • Khan Academy: Statistics and Probability (khanacademy.org) Free, comprehensive video series covering all descriptive statistics concepts. Start with "Summarizing quantitative data" for central tendency and variability, and "Exploring bivariate numerical data" for correlation and regression.

  • MIT OpenCourseWare 18.05: Introduction to Probability and Statistics (ocw.mit.edu) Rigorous university-level course with lecture notes, problem sets, and solutions. The first four units cover descriptive statistics with mathematical depth beyond most introductory courses.

  • Harvard CS109: Data Science (cs109.github.io) Combines statistics with Python programming. The early lectures on exploratory data analysis and visualization are directly relevant to this chapter's approach.


For readers who want to deepen their understanding in a structured way:

  1. Start with Freedman, Pisani, & Purves for conceptual foundations
  2. Move to VanderPlas for Python implementation skills
  3. Read Winston's Mathletics for sports-specific applications
  4. Study Oliver's Basketball on Paper or Tango's The Book for deep sport-specific analysis
  5. Explore the academic papers for research-level understanding
  6. Practice with real data from Basketball Reference, Pro Football Reference, and Kaggle datasets