Chapter 2: Further Reading
Annotated Bibliography
This curated reading list provides resources for deepening your understanding of NBA data sources, collection methodologies, and data quality practices. Resources are organized by topic and include brief annotations explaining their relevance.
Official Documentation and Primary Sources
NBA Stats API
nba_api Documentation https://github.com/swar/nba_api
Documentation for nba_api, a community-maintained Python client for the NBA.com stats endpoints (the library is not an official NBA product). Essential reference for understanding available endpoints, parameters, and response formats. Includes examples and community-contributed documentation for edge cases.
NBA Stats API Endpoints Reference (Community-maintained documentation)
Various community efforts have documented the NBA Stats API endpoints more thoroughly than official sources. Search for "NBA Stats API endpoints" for current community resources.
Basketball-Reference
Sports Reference Data Use Policy https://www.sports-reference.com/data_use.html
Official policy governing the use of data from Basketball-Reference and other Sports Reference sites. Essential reading before implementing any scraping solution. Outlines acceptable use cases and licensing options.
Basketball-Reference Glossary https://www.basketball-reference.com/about/glossary.html
Comprehensive definitions of all statistics tracked by Basketball-Reference. Invaluable for understanding how metrics are calculated and what they represent. Includes historical context for when statistics were first recorded.
Books
Data Collection and Web Scraping
Web Scraping with Python: Collecting More Data from the Modern Web (2nd Edition) Ryan Mitchell | O'Reilly Media, 2018
Comprehensive guide to web scraping techniques using Python and BeautifulSoup. Covers legal considerations, handling JavaScript-rendered content, and building robust scraping pipelines. Directly applicable to Basketball-Reference and similar sports data sites.
Python for Data Analysis (3rd Edition) Wes McKinney | O'Reilly Media, 2022
Written by the creator of pandas, this book provides essential techniques for data manipulation and cleaning. Chapters on loading data from various sources and handling missing values are particularly relevant to basketball data work.
Basketball Analytics Foundations
Basketball on Paper: Rules and Tools for Performance Analysis Dean Oliver | Potomac Books, 2004
The foundational text for basketball analytics. Oliver introduces the "Four Factors" framework and discusses data requirements for basketball analysis. While some specific tools are dated, the analytical principles remain essential.
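As a quick orientation before reading, the Four Factors can be computed from ordinary team box-score totals. The sketch below uses the definitions commonly published in statistical glossaries (effective field goal percentage, turnover rate, offensive rebound rate, and free throws made per field goal attempt); the book's own treatment may differ in detail. The TeamBoxScore class, its field names, and the sample numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeamBoxScore:
    """Team totals for one game. Field names are illustrative, not from any API."""
    fgm: int      # field goals made
    fga: int      # field goals attempted
    fg3m: int     # three-pointers made
    ftm: int      # free throws made
    fta: int      # free throws attempted
    tov: int      # turnovers
    orb: int      # offensive rebounds
    opp_drb: int  # opponent defensive rebounds


def four_factors(box: TeamBoxScore) -> dict:
    """Compute the Four Factors as fractions (multiply by 100 for percentages)."""
    return {
        # Shooting: a made three counts as 1.5 made field goals.
        "efg_pct": (box.fgm + 0.5 * box.fg3m) / box.fga,
        # Turnovers per estimated play; 0.44 is the usual free-throw weight.
        "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),
        # Share of available offensive rebounds that the team collected.
        "orb_pct": box.orb / (box.orb + box.opp_drb),
        # Free throws made per field goal attempt.
        "ft_rate": box.ftm / box.fga,
    }


# Made-up totals, not taken from a real game.
example = TeamBoxScore(fgm=40, fga=80, fg3m=10, ftm=15, fta=20,
                       tov=12, orb=10, opp_drb=30)
factors = four_factors(example)
```

Note that every input is a standard box-score column, which is why the framework works even with data sources that predate tracking data.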
Basketball Analytics: Objective and Efficient Strategies for Understanding How Teams Win Stephen M. Shea & Christopher E. Baker | CreateSpace, 2013
Bridges traditional statistics and modern analytics approaches. Includes discussion of data sources available at the time and how to work with their limitations. Good introduction to translating data into actionable insights.
Sprawlball: A Visual Tour of the New Era of the NBA Kirk Goldsberry | Houghton Mifflin Harcourt, 2019
Demonstrates the power of spatial data analysis in basketball through innovative visualizations. Chapter on shot chart methodology provides insight into working with location data. Excellent example of data storytelling.
Data Quality and Management
Data Quality: The Accuracy Dimension Jack E. Olson | Morgan Kaufmann, 2003
Academic treatment of data quality concepts including completeness, consistency, and accuracy. Provides frameworks applicable to sports data validation and cleaning. Useful for building systematic data quality processes.
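To make the three dimensions named above concrete, here is a minimal sketch of how they might map onto a single player box-score row. It is not taken from the book. The function name, the field names, and the sample rows are hypothetical, and the only domain fact it relies on is that points equal 2 x FGM + 3PM + FTM.

```python
REQUIRED_FIELDS = ("player", "fgm", "fga", "fg3m", "ftm", "fta", "pts")


def check_box_score_row(row: dict) -> list:
    """Return a list of human-readable problems found in one box-score row."""
    problems = []

    # Completeness: every required field must be present and non-null.
    missing = [name for name in REQUIRED_FIELDS if row.get(name) is None]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
        return problems  # the remaining checks need every field

    # Consistency: makes cannot exceed attempts, and made threes
    # are a subset of made field goals.
    if row["fgm"] > row["fga"]:
        problems.append("fgm exceeds fga")
    if row["fg3m"] > row["fgm"]:
        problems.append("fg3m exceeds fgm")
    if row["ftm"] > row["fta"]:
        problems.append("ftm exceeds fta")

    # Accuracy (internal cross-check): recorded points should match
    # the points implied by made shots.
    implied_pts = 2 * row["fgm"] + row["fg3m"] + row["ftm"]
    if row["pts"] != implied_pts:
        problems.append(f"pts {row['pts']} != {implied_pts} implied by made shots")

    return problems


# Made-up rows for illustration only.
good_row = {"player": "A", "fgm": 8, "fga": 15, "fg3m": 2,
            "ftm": 4, "fta": 5, "pts": 22}
bad_pts_row = dict(good_row, pts=25)
incomplete_row = {"player": "B", "fgm": 3}
```

A cross-check like this only tests internal agreement. Confirming accuracy in the fuller sense still requires comparison against an independent source.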
Bad Data Handbook Q. Ethan McCallum (Editor) | O'Reilly Media, 2012
Collection of essays on recognizing and handling problematic data. Includes real-world case studies and practical advice. The chapter on "data provenance" is particularly relevant to historical sports data challenges.
Academic Papers
NBA Analytics Research
A Starting Point for Analyzing Basketball Statistics Justin Kubatko, Dean Oliver, Kevin Pelton, Dan T. Rosenbaum Journal of Quantitative Analysis in Sports, 2007
Foundational academic paper establishing standards for basketball statistical analysis. Discusses data requirements, metric definitions, and analytical approaches. Essential reading for understanding the field's foundations.
A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes Daniel Cervone, Alexander D'Amour, Luke Bornn, Kirk Goldsberry Journal of the American Statistical Association, 2016
Demonstrates sophisticated use of tracking data for possession modeling. Provides insight into the analytical potential of granular tracking data. Technical but illuminating for understanding what's possible with advanced data.
Counterpoints: Advanced Defensive Metrics for NBA Basketball Alexander Franks, Andrew Miller, Luke Bornn, Kirk Goldsberry MIT Sloan Sports Analytics Conference, 2015
Shows how tracking data enables defensive analysis previously impossible with traditional statistics. Relevant for understanding the value of different data types and their unique analytical contributions.
Data Quality in Sports
Accuracy of Event Data in Football: Assessing Underestimation of Possession Duration Various Authors | International Journal of Performance Analysis in Sport
While focused on soccer, this paper's methodology for assessing event data accuracy applies to basketball play-by-play data. Demonstrates techniques for validating event-level data against video.
Online Resources
Tutorials and Courses
NBA API Python Tutorial Series (Various online platforms)
Search for current tutorials on working with the nba_api library. Video tutorials on YouTube and written guides on Medium provide step-by-step introductions to common data collection tasks.
Sports Analytics with Python (Coursera/edX courses) Various Universities
Several universities offer online courses covering sports analytics fundamentals including data collection. Check Coursera, edX, and university extension programs for current offerings.
Community Resources
r/nba and r/nbadiscussion Subreddits https://reddit.com/r/nba and https://reddit.com/r/nbadiscussion
Active communities discussing NBA analytics. Useful for understanding current debates, finding data sources, and connecting with other analysts. Quality varies but can surface valuable resources.
Thinking Basketball https://www.youtube.com/c/ThinkingBasketball
YouTube channel combining statistical analysis with film study. Demonstrates thoughtful use of data in basketball analysis. Good model for presenting data-driven insights to general audiences.
Cleaning the Glass https://cleaningtheglass.com
Subscription service providing cleaned, context-adjusted NBA statistics. While not a source of raw data, it demonstrates best practices for presenting basketball statistics with appropriate context.
Technical References
REST API Design Best Practices Various Online Resources
Understanding REST API principles helps when working with the NBA Stats API and building your own data services. Search for current best practices guides from tech companies like Google, Microsoft, or Stripe.
Parquet File Format Specification https://parquet.apache.org
Official documentation for Apache Parquet, a columnar file format well suited to analytical data storage. Understanding the format's strengths helps optimize data pipelines.
Historical Data Resources
Archives and Collections
The Basketball-Reference Blog https://www.basketball-reference.com/blog/
Historical articles discussing data collection challenges, methodology changes, and interesting statistical discoveries. Provides context for understanding how basketball statistics have evolved.
APBR (Association for Professional Basketball Research) Various Publications
Organization dedicated to basketball history and statistics. Publications include discussions of historical data challenges and reconstruction efforts for incomplete records.
Data Ethics and Legal Considerations
Computer Fraud and Abuse Act Implications for Web Scraping Various Legal Analysis Articles
Understanding the legal landscape for web scraping is essential. Search for current legal analysis as this area continues to evolve through court decisions.
Terms of Service and Web Scraping EFF (Electronic Frontier Foundation) and Similar Organizations
Resources discussing the intersection of terms of service, automated access, and legal rights. Important for understanding your rights and responsibilities when collecting data.
Recommended Reading Order
For those new to basketball data collection, we recommend the following sequence:
- Start with: nba_api documentation and Basketball-Reference glossary
- Then read: "Web Scraping with Python" for technical foundations
- Follow with: "Basketball on Paper" for analytical context
- Study: Academic papers for advanced understanding
- Monitor: Online communities for current developments
Staying Current
The landscape of sports data changes rapidly. To stay current:
- Follow @nikiellen, @kirkgoldsberry, and other analytics practitioners on Twitter/X
- Attend the MIT Sloan Sports Analytics Conference (held annually in February or March)
- Subscribe to newsletters like Seth Partnow's or Ben Taylor's Substack
- Monitor the nba_api GitHub repository for updates and breaking changes
- Check Basketball-Reference's blog for methodology updates
A Note on Data Access
The sports data landscape continues to evolve with new providers, API changes, and shifting access policies. Some resources listed here may change over time. Always verify current:
- Terms of service
- API endpoint availability
- Rate limiting policies
- Data licensing requirements
When in doubt about data access rights, contact the data provider directly or consult with legal counsel for commercial applications.
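Whatever rate limit a provider publishes, the simplest way to respect it is to enforce a minimum gap between requests. The sketch below is one way to do that using only the standard library. The MinIntervalThrottle class is hypothetical, and the 3-second interval is a placeholder rather than any provider's actual policy, so substitute the value from the terms you have verified.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum number of seconds between successive calls to wait()."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested without delays.
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self._min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is considered to have started.
        self._last = now + delay
        return delay


# Placeholder interval: an assumption, not a published limit.
throttle = MinIntervalThrottle(min_interval_s=3.0)

# Typical use inside a collection loop (fetch is a hypothetical function):
# for url in urls:
#     throttle.wait()
#     fetch(url)
```

Keeping the throttle separate from the request code means the interval can be changed in one place when a provider revises its policy.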
Last updated: Chapter 2 publication date. Check for updated resources at the textbook companion website.