Chapter 2: Further Reading

Annotated Bibliography

This curated reading list provides resources for deepening your understanding of NBA data sources, collection methodologies, and data quality practices. Resources are organized by topic and include brief annotations explaining their relevance.


Official Documentation and Primary Sources

NBA Stats API

nba_api Documentation https://github.com/swar/nba_api

The project documentation for nba_api, a community-maintained Python client for the NBA Stats API. Essential reference for understanding available endpoints, parameters, and response formats. Includes examples and community-contributed documentation for edge cases.

NBA Stats API Endpoints Reference (Community-maintained documentation)

Community projects have documented the NBA Stats API endpoints more thoroughly than any official source. Search for "NBA Stats API endpoints" to find the resources that are currently maintained.


Basketball-Reference

Sports Reference Data Use Policy https://www.sports-reference.com/data_use.html

Official policy governing the use of data from Basketball-Reference and other Sports Reference sites. Essential reading before implementing any scraping solution. Outlines acceptable use cases and licensing options.

Basketball-Reference Glossary https://www.basketball-reference.com/about/glossary.html

Comprehensive definitions of all statistics tracked by Basketball-Reference. Invaluable for understanding how metrics are calculated and what they represent. Includes historical context for when statistics were first recorded.


Books

Data Collection and Web Scraping

Web Scraping with Python: Collecting More Data from the Modern Web (2nd Edition) Ryan Mitchell | O'Reilly Media, 2018

Comprehensive guide to web scraping techniques using Python and BeautifulSoup. Covers legal considerations, handling JavaScript-rendered content, and building robust scraping pipelines. Directly applicable to Basketball-Reference and similar sports data sites.

Python for Data Analysis (3rd Edition) Wes McKinney | O'Reilly Media, 2022

Written by the creator of pandas, this book provides essential techniques for data manipulation and cleaning. Chapters on loading data from various sources and handling missing values are particularly relevant to basketball data work.


Basketball Analytics Foundations

Basketball on Paper: Rules and Tools for Performance Analysis Dean Oliver | Potomac Books, 2004

The foundational text for basketball analytics. Oliver introduces the "Four Factors" framework and discusses data requirements for basketball analysis. While some specific tools are dated, the analytical principles remain essential.
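As a rough illustration of the data the "Four Factors" require, the sketch below computes them from team box-score totals. It assumes the commonly published definitions (for example, the 0.44 free-throw weight and free throws made per field goal attempt); Oliver's own presentation may differ in detail. The TeamBox container and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    """Team totals for one game (hypothetical container, not from any library)."""
    fgm: int      # field goals made
    fga: int      # field goals attempted
    fg3m: int     # three-pointers made
    ftm: int      # free throws made
    fta: int      # free throws attempted
    tov: int      # turnovers
    orb: int      # offensive rebounds
    opp_drb: int  # opponent defensive rebounds


def four_factors(box: TeamBox) -> dict:
    """Return the Four Factors using commonly published definitions (assumed)."""
    return {
        # Shooting: effective field goal percentage
        "efg_pct": (box.fgm + 0.5 * box.fg3m) / box.fga,
        # Turnovers: turnovers per estimated possession-ending play
        "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),
        # Rebounding: share of available offensive rebounds collected
        "orb_pct": box.orb / (box.orb + box.opp_drb),
        # Free throws: made free throws per field goal attempt
        "ft_rate": box.ftm / box.fga,
    }


# Made-up totals, for illustration only.
example = TeamBox(fgm=40, fga=80, fg3m=10, ftm=15, fta=20, tov=12, orb=10, opp_drb=30)
factors = four_factors(example)
```

Note that the rebounding factor needs the opponent's defensive rebounds, so a single team's box score is not enough on its own; this is one of the data requirements the book discusses.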

Basketball Analytics: Objective and Efficient Strategies for Understanding How Teams Win Stephen M. Shea & Christopher E. Baker | CreateSpace, 2013

Bridges traditional statistics and modern analytics approaches. Includes discussion of data sources available at the time and how to work with their limitations. Good introduction to translating data into actionable insights.

Sprawlball: A Visual Tour of the New Era of the NBA Kirk Goldsberry | Houghton Mifflin Harcourt, 2019

Demonstrates the power of spatial data analysis in basketball through innovative visualizations. Chapter on shot chart methodology provides insight into working with location data. Excellent example of data storytelling.


Data Quality and Management

Data Quality: The Accuracy Dimension Jack E. Olson | Morgan Kaufmann, 2003

Academic treatment of data quality concepts including completeness, consistency, and accuracy. Provides frameworks applicable to sports data validation and cleaning. Useful for building systematic data quality processes.
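To make these dimensions concrete, here is a minimal sketch of how they might translate into checks on a single box-score row. The field names and the check_box_line helper are hypothetical, and the "accuracy" check is only an internal cross-check; confirming true accuracy would require comparison against an independent source.

```python
REQUIRED_FIELDS = ("player", "fgm", "fga", "fg3m", "ftm", "fta", "pts")


def check_box_line(row: dict) -> list:
    """Return a list of human-readable problems found in one box-score row."""
    problems = []

    # Completeness: every required field is present and not None.
    missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
        return problems  # the remaining checks need every field

    # Consistency: makes cannot exceed attempts; threes are a subset of field goals.
    if row["fgm"] > row["fga"]:
        problems.append("fgm exceeds fga")
    if row["ftm"] > row["fta"]:
        problems.append("ftm exceeds fta")
    if row["fg3m"] > row["fgm"]:
        problems.append("fg3m exceeds fgm")

    # Accuracy (internal cross-check only): points must follow from made shots.
    expected_pts = 2 * (row["fgm"] - row["fg3m"]) + 3 * row["fg3m"] + row["ftm"]
    if row["pts"] != expected_pts:
        problems.append(f"pts is {row['pts']}, expected {expected_pts}")

    return problems


# Made-up rows, for illustration only.
good = {"player": "A", "fgm": 8, "fga": 15, "fg3m": 2, "ftm": 4, "fta": 5, "pts": 22}
bad = {"player": "B", "fgm": 8, "fga": 7, "fg3m": 2, "ftm": 4, "fta": 5, "pts": 25}
```

Calling check_box_line(good) returns an empty list, while check_box_line(bad) reports both the impossible shooting line and the points mismatch.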

Bad Data Handbook Q. Ethan McCallum (Editor) | O'Reilly Media, 2012

Collection of essays on recognizing and handling problematic data. Includes real-world case studies and practical advice. The chapter on "data provenance" is particularly relevant to historical sports data challenges.


Academic Papers

NBA Analytics Research

A Starting Point for Analyzing Basketball Statistics Justin Kubatko, Dean Oliver, Kevin Pelton, Dan T. Rosenbaum Journal of Quantitative Analysis in Sports, 2007

Foundational academic paper establishing standards for basketball statistical analysis. Discusses data requirements, metric definitions, and analytical approaches. Essential reading for understanding the field's foundations.

A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes Daniel Cervone, Alexander D'Amour, Luke Bornn, Kirk Goldsberry Journal of the American Statistical Association, 2016

Demonstrates sophisticated use of tracking data for possession modeling. Provides insight into the analytical potential of granular tracking data. Technical but illuminating for understanding what's possible with advanced data.

Counterpoints: Advanced Defensive Metrics for NBA Basketball Alexander Franks, Andrew Miller, Luke Bornn, Kirk Goldsberry MIT Sloan Sports Analytics Conference, 2015

Shows how tracking data enables defensive analysis previously impossible with traditional statistics. Relevant for understanding the value of different data types and their unique analytical contributions.


Data Quality in Sports

Accuracy of Event Data in Football: Assessing Underestimation of Possession Duration Various Authors | International Journal of Performance Analysis in Sport

While focused on soccer, this paper's methodology for assessing event data accuracy applies to basketball play-by-play data. Demonstrates techniques for validating event-level data against video.


Online Resources

Tutorials and Courses

NBA API Python Tutorial Series (Various online platforms)

Search for current tutorials on working with the nba_api library. Video tutorials on YouTube and written guides on Medium provide step-by-step introductions to common data collection tasks.

Sports Analytics with Python (Coursera/edX courses) Various Universities

Several universities offer online courses covering sports analytics fundamentals including data collection. Check Coursera, edX, and university extension programs for current offerings.


Community Resources

r/nba and r/nbadiscussion Subreddits https://reddit.com/r/nba and https://reddit.com/r/nbadiscussion

Active communities discussing NBA analytics. Useful for understanding current debates, finding data sources, and connecting with other analysts. Quality varies, but these communities can surface valuable resources.

Thinking Basketball https://www.youtube.com/c/ThinkingBasketball

YouTube channel combining statistical analysis with film study. Demonstrates thoughtful use of data in basketball analysis. Good model for presenting data-driven insights to general audiences.

Cleaning the Glass https://cleaningtheglass.com

Subscription service providing cleaned, context-adjusted NBA statistics. While not a source of raw data, it demonstrates best practices for presenting basketball statistics with appropriate context.


Technical References

REST API Design Best Practices Various Online Resources

Understanding REST API principles helps when working with the NBA API and building your own data services. Search for current best practices guides from tech companies like Google, Microsoft, or Stripe.

Parquet File Format Specification https://parquet.apache.org

Official documentation for the Apache Parquet format recommended for analytical data storage. Understanding the format's strengths helps optimize data pipelines.


Historical Data Resources

Archives and Collections

The Basketball-Reference Blog https://www.basketball-reference.com/blog/

Historical articles discussing data collection challenges, methodology changes, and interesting statistical discoveries. Provides context for understanding how basketball statistics have evolved.

APBR (Association for Professional Basketball Research) Various Publications

Organization dedicated to basketball history and statistics. Publications include discussions of historical data challenges and reconstruction efforts for incomplete records.


Legal and Ethical Considerations

Computer Fraud and Abuse Act Implications for Web Scraping Various Legal Analysis Articles

Understanding the legal landscape for web scraping is essential. Search for current legal analysis as this area continues to evolve through court decisions.

Terms of Service and Web Scraping EFF (Electronic Frontier Foundation) and Similar Organizations

Resources discussing the intersection of terms of service, automated access, and legal rights. Important for understanding your rights and responsibilities when collecting data.


Recommended Reading Order

For those new to basketball data collection, we recommend the following sequence:

  1. Start with: nba_api documentation and Basketball-Reference glossary
  2. Then read: "Web Scraping with Python" for technical foundations
  3. Follow with: "Basketball on Paper" for analytical context
  4. Study: Academic papers for advanced understanding
  5. Monitor: Online communities for current developments

Staying Current

The landscape of sports data changes rapidly. To stay current:

  1. Follow @nikiellen, @kirkgoldsberry, and other analytics practitioners on Twitter/X
  2. Attend the MIT Sloan Sports Analytics Conference (held annually in February or March)
  3. Subscribe to newsletters like Seth Partnow's or Ben Taylor's Substack
  4. Monitor the nba_api GitHub repository for updates and breaking changes
  5. Check Basketball-Reference's blog for methodology updates

A Note on Data Access

The sports data landscape continues to evolve with new providers, API changes, and shifting access policies, so some resources listed here may change over time. Before relying on any source, verify its current:

  • Terms of service
  • API endpoint availability
  • Rate limiting policies
  • Data licensing requirements

When in doubt about data access rights, contact the data provider directly or consult with legal counsel for commercial applications.
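As one way to act on the checklist above, the sketch below reads a robots.txt file with Python's standard library and spaces out requests accordingly. The robots.txt content, the "example-bot" user agent, and the Throttle class are all hypothetical; a real file must be fetched from the site in question, and robots.txt does not replace reading the provider's terms of service.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 3
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum gap, in seconds, between successive requests."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


rules = load_rules(SAMPLE_ROBOTS_TXT)
delay = rules.crawl_delay("*")  # requested gap between requests, if the file states one
throttle = Throttle(min_interval=delay if delay is not None else 1.0)
```

Before each request, a collector would call rules.can_fetch("example-bot", url) and skip disallowed paths, then call throttle.wait() so that requests stay at least the stated interval apart.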


Last updated: Chapter 2 publication date. Check for updated resources at the textbook companion website.