Chapter 5: Further Reading
Data Literacy for Bettors -- Recommended Resources
Pandas and Python Data Analysis
- McKinney, Wes. Python for Data Analysis, 3rd Edition. O'Reilly Media, 2022. The definitive guide to pandas, written by its creator. Covers data loading, cleaning, transformation, and time-series analysis in depth. Chapters 5 (Getting Started with pandas), 7 (Data Cleaning and Preparation), and 10 (Data Aggregation and Group Operations) are directly applicable to sports data work.
- VanderPlas, Jake. Python Data Science Handbook. O'Reilly Media, 2016. Available free at https://jakevdp.github.io/PythonDataScienceHandbook/. Excellent coverage of NumPy, pandas, matplotlib, and scikit-learn. The pandas chapter provides a slightly different perspective from McKinney's and is useful as a complementary reference.
- pandas Official Documentation. https://pandas.pydata.org/docs/ The authoritative reference for every pandas function and method. The "10 Minutes to pandas" tutorial and the "Cookbook" section are particularly useful for quickly finding patterns and idioms.
- Harrison, Matt. Effective Pandas: Patterns for Data Manipulation. Metasnake, 2021. A focused book on pandas best practices, method chaining, and idiomatic code. Useful for bettors who already know basic pandas but want to write cleaner, more efficient pipelines.
Sports Data and Analytics
- Severini, Thomas A. Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports, 2nd Edition. CRC Press, 2020. An accessible introduction to the mathematical and statistical methods used in sports analytics. Covers regression, probability models, and ranking systems with real sports examples.
- Albert, Jim, Mark E. Glickman, Tim B. Swartz, and Ruud H. Koning (eds.). Handbook of Statistical Methods and Analyses in Sports. CRC Press, 2017. A comprehensive academic reference covering statistical methods for baseball, football, basketball, soccer, hockey, and other sports. Includes chapters on data collection methodology and quality considerations.
- Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern Data Science with R, 2nd Edition. CRC Press, 2021. While R-focused rather than Python-focused, this book provides excellent coverage of data wrangling, EDA, and database design principles that transfer directly to any language. The case studies in sports data are particularly relevant.
- Koseler, Kaan, and Matthew Stephan. "Machine Learning Applications in Baseball: A Systematic Literature Review." Applied Artificial Intelligence 31, no. 9-10 (2017): 745-763. A survey of how machine learning has been applied to baseball data, with extensive discussion of data preparation challenges and feature engineering. Many lessons generalize to other sports.
Database Design and SQL
- Beaulieu, Alan. Learning SQL, 3rd Edition. O'Reilly Media, 2020. A practical introduction to SQL that covers everything from basic queries to joins, aggregations, and subqueries. The examples use MySQL but the concepts apply to SQLite and PostgreSQL as well.
- Hernandez, Michael J. Database Design for Mere Mortals, 4th Edition. Addison-Wesley, 2020. A clear, jargon-light guide to relational database design. Covers normalization, primary and foreign keys, and schema design. Useful for bettors designing their first betting database.
- SQLite Documentation. https://www.sqlite.org/docs.html SQLite is the recommended starting database for individual bettors. The documentation is thorough and includes excellent explanations of indexing, query optimization, and the differences between SQLite and larger database systems.
Web Scraping and Data Collection
- Mitchell, Ryan. Web Scraping with Python, 2nd Edition. O'Reilly Media, 2018. Covers BeautifulSoup, Scrapy, and Selenium for web scraping. Includes discussion of legal and ethical considerations, handling JavaScript-rendered pages, and working with APIs. Essential reading for anyone building sports data scrapers.
- The Odds API Documentation. https://the-odds-api.com/liveapi/guides/v4/ Documentation for one of the most accessible free sports odds APIs. Good example of API design and a practical starting point for pulling live and historical odds data programmatically.
Exploratory Data Analysis and Visualization
- Tukey, John W. Exploratory Data Analysis. Addison-Wesley, 1977. The foundational text on EDA by its inventor. Though published in 1977, the philosophy of letting data "speak" before imposing models remains the cornerstone of good analytical practice. The book introduced or popularized stem-and-leaf displays, box plots, and resistant statistics.
- Knaflic, Cole Nussbaumer. Storytelling with Data. Wiley, 2015. A practical guide to creating clear, effective data visualizations. While not sports-specific, the principles of reducing clutter, directing attention, and choosing the right chart type apply directly to communicating betting analysis results.
- Wilke, Claus O. Fundamentals of Data Visualization. O'Reilly Media, 2019. Available free at https://clauswilke.com/dataviz/. A comprehensive guide to visualization principles organized by the type of story you want to tell: distributions, proportions, trends, relationships, and uncertainty. Excellent reference when deciding how to visualize sports data patterns.
Data Quality and Governance
- Redman, Thomas C. Data Driven: Profiting from Your Most Important Business Asset. Harvard Business Press, 2008. A business-focused book on data quality that articulates why bad data is costly and how to build quality into data processes from the start. The frameworks for assessing and improving data quality apply well to personal betting data pipelines.
- Wickham, Hadley. "Tidy Data." Journal of Statistical Software 59, no. 10 (2014). The seminal paper defining "tidy data" principles: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Understanding these principles is essential for structuring sports datasets correctly. Available at https://www.jstatsoft.org/article/view/v059i10.
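To make the tidy-data rules from Wickham's paper concrete, here is a minimal pandas sketch. The table, column names (`book_a`, `book_b`, `home_moneyline`), and prices are invented for illustration and do not come from any real odds feed. It reshapes a "wide" table with one column per sportsbook into a tidy one with one row per game-and-book observation.

```python
import pandas as pd

# Hypothetical "wide" odds table: one row per game, one column per sportsbook.
# Book names and American-odds prices are made up for this example.
wide = pd.DataFrame(
    {
        "game_id": [101, 102],
        "home_team": ["BOS", "DEN"],
        "book_a": [-110, 105],
        "book_b": [-108, 110],
    }
)

# Tidy form: each row is one observation (one game priced by one book),
# and each variable (game, team, sportsbook, price) gets its own column.
tidy = wide.melt(
    id_vars=["game_id", "home_team"],
    value_vars=["book_a", "book_b"],
    var_name="sportsbook",
    value_name="home_moneyline",
).sort_values(["game_id", "sportsbook"], ignore_index=True)

# With tidy data, per-game questions become a single groupby.
# For American odds, the numerically larger price is the better one for the bettor.
best_price = tidy.groupby("game_id")["home_moneyline"].max()

print(tidy)
print(best_price)
```

In the wide layout, adding a third sportsbook means adding a column and rewriting any code that compares prices. In the tidy layout it only adds rows, and the same `groupby` keeps working.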
Betting-Specific Data Resources
- Killer Sports Query Tool. https://killersports.com/ A powerful query tool for historical NFL, NBA, MLB, and NHL data. Allows complex situational queries (e.g., "home underdogs after a bye week in primetime"). Useful for validating patterns discovered during EDA and for understanding what professional bettors investigate.
- Kaggle Sports Datasets. https://www.kaggle.com/datasets?search=sports+betting Community-contributed sports betting datasets, found through Kaggle's dataset search. Quality varies significantly, so apply the data provenance principles from this chapter. Useful for practice and exploration, but always verify against primary sources before using in production models.