Further Reading: Exploratory Data Analysis for Football


EDA Foundations

"Exploratory Data Analysis" by John Tukey. The seminal work that defined the field. Tukey's philosophy of letting data speak before hypothesis testing remains foundational.

"The Visual Display of Quantitative Information" by Edward Tufte. The bible of data visualization. Essential reading for anyone presenting analytical findings.

"Storytelling with Data" by Cole Nussbaumer Knaflic. Practical guide to creating effective visualizations that communicate clearly.


Python Visualization

Matplotlib Documentation (matplotlib.org). The foundation of Python visualization. Master the basics before moving to higher-level libraries.

Seaborn Tutorial (seaborn.pydata.org). Statistical visualization built on matplotlib. Excellent for EDA with its sensible defaults.

Plotly Python Documentation (plotly.com/python). Interactive visualization library. Great for exploratory dashboards and web-based presentations.

"Python Data Science Handbook" by Jake VanderPlas. Comprehensive coverage of matplotlib, pandas visualization, and exploratory analysis techniques.


Data Quality and Preparation

"Bad Data Handbook" by Q. Ethan McCallum. Real-world stories about data quality issues and how to handle them.

"Python Data Cleaning Cookbook" by Michael Walker. Practical techniques for handling messy data before analysis.

Pandas User Guide: Working with Missing Data. Official pandas documentation on handling NaN values and data imputation.
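
As a quick taste of what that guide covers, here is a minimal sketch of counting and imputing missing values. The DataFrame, its column names (play_type, yards_gained, yards_filled), and its values are invented for illustration, and group-median imputation is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical play-level data; columns and values are made up for this sketch.
plays = pd.DataFrame({
    "play_type": ["pass", "run", "pass", "run", "pass"],
    "yards_gained": [7.0, np.nan, 12.0, 3.0, np.nan],
})

# Count missing values per column before deciding how to handle them.
missing_counts = plays.isna().sum()

# One simple option: fill each gap with the median for the same play type.
group_median = plays.groupby("play_type")["yards_gained"].transform("median")
plays["yards_filled"] = plays["yards_gained"].fillna(group_median)
```

Keeping the filled values in a separate column leaves the original data intact, so the effect of the imputation can still be inspected during exploration.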


Sports Analytics EDA

nflfastR Articles (rbsdm.com/stats). Blog posts demonstrating exploratory analysis with NFL data using R (concepts translate to Python).

Open Source Football (opensourcefootball.com). Community-contributed articles often featuring EDA workflows and visualizations.

FiveThirtyEight Sports (fivethirtyeight.com/sports). Examples of data-driven sports journalism with strong visualization.


Statistical Thinking

"Naked Statistics" by Charles Wheelan. Accessible introduction to statistical concepts underlying EDA.

"The Art of Statistics" by David Spiegelhalter. Modern perspective on extracting meaning from data with real-world examples.

"Statistics Done Wrong" by Alex Reinhart. Common pitfalls in data analysis and how to avoid them.


Visualization Theory

"The Grammar of Graphics" by Leland Wilkinson. Theoretical foundation for modern visualization libraries like ggplot2 and Altair.

"Visual Thinking for Design" by Colin Ware. How human perception affects visualization effectiveness.

"Show Me the Numbers" by Stephen Few. Practical guidance for designing tables, graphs, and other analytical displays.


Interactive Tools

Streamlit Documentation (streamlit.io). Build interactive data apps quickly for exploratory analysis.

Jupyter Notebook Best Practices. Effective use of notebooks for EDA and reproducible analysis.

Altair Documentation (altair-viz.github.io). Declarative visualization library based on Vega-Lite, excellent for quick EDA.


Football-Specific Resources

Expected Points and Win Probability Models (various). Understanding the metrics you're exploring:
- Burke, B. "Expected Points" (Advanced Football Analytics)
- Lopez, M. "NFL analytics" papers

nfl_data_py Documentation (github.com/nflverse/nfl_data_py). Python library documentation for accessing NFL data.

Lee Sharpe's NFL Data Resources. Tools and datasets for NFL analysis.


Online Courses

DataCamp: Exploratory Data Analysis in Python. Interactive course covering pandas profiling and visualization.

Coursera: Data Visualization with Python. IBM course covering matplotlib, seaborn, and plotly.

Kaggle Learn: Data Visualization. Free, hands-on introduction to visualization with Python.


Community and Discussion

r/NFLstatheads (Reddit). Community for NFL statistical analysis discussion.

#NFLAnalytics (Twitter/X). Real-time discussion of NFL analytics.

Sports Analytics Discord Servers. Various communities for real-time discussion and feedback.


Tools Reference

ydata-profiling (formerly pandas-profiling). Automated EDA report generation for pandas DataFrames.

sweetviz. Another automated EDA library with comparison features.

D-Tale. Interactive pandas DataFrame exploration in the browser.