Further Reading: Data Sources and Collection in Soccer
This annotated bibliography provides resources for deeper exploration of soccer data sources and collection.
Essential Documentation
StatsBomb Open Data
GitHub: statsbomb/open-data
The official repository for StatsBomb's free data offering. Includes documentation, specifications, and data files. Essential reading for anyone working with this data.
Best for: Understanding StatsBomb data structure and accessing free data
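As a rough orientation to how the repository is organized, the sketch below builds raw-file URLs for its main JSON collections. The helper name `open_data_url` is hypothetical. The paths assume the repository keeps its files under `data/` on the `master` branch, which should be checked against the repository itself.

```python
"""Build raw GitHub URLs for files in the statsbomb/open-data repository."""

# Assumption: files live under data/ on the master branch of the public repo.
BASE_URL = "https://raw.githubusercontent.com/statsbomb/open-data/master/data"

# Number of numeric ids each collection needs in its path.
_ID_COUNT = {"competitions": 0, "matches": 2, "events": 1, "lineups": 1, "three-sixty": 1}


def open_data_url(kind: str, *ids: int) -> str:
    """Return the URL of one JSON file.

    kind: "competitions", "matches", "events", "lineups", or "three-sixty".
    ids:  nothing for competitions, (competition_id, season_id) for matches,
          (match_id,) for events, lineups, and three-sixty.
    """
    if kind not in _ID_COUNT:
        raise ValueError(f"unknown kind: {kind!r}")
    if len(ids) != _ID_COUNT[kind]:
        raise ValueError(f"{kind!r} needs {_ID_COUNT[kind]} id(s), got {len(ids)}")
    if kind == "competitions":
        return f"{BASE_URL}/competitions.json"
    path = "/".join(str(i) for i in ids)
    return f"{BASE_URL}/{kind}/{path}.json"
```

The ids passed to the helper come from the data itself: `competitions.json` lists competition and season ids, and each matches file lists match ids.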
StatsBomb Data Specification
StatsBomb official documentation
Detailed specification of StatsBomb's event data format, including all event types, qualifiers, and coordinate systems. Critical reference when working with their data.
Best for: Understanding exact definitions of events and fields
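Coordinate systems are a common source of silent errors, so a small conversion sketch may help. It assumes the convention described in the specification: a 120 x 80 grid with the origin at the top-left corner and y increasing downward. The function name and the 105 m x 68 m target pitch are illustrative choices, not part of the specification.

```python
def statsbomb_to_metres(x, y, pitch_length_m=105.0, pitch_width_m=68.0):
    """Map a StatsBomb (x, y) location to metres with the origin at bottom-left.

    Assumes the 120 x 80 StatsBomb grid with a top-left origin; confirm this
    against the version of the specification that matches your data.
    """
    if not (0 <= x <= 120 and 0 <= y <= 80):
        raise ValueError(f"location ({x}, {y}) is outside the 120 x 80 grid")
    x_m = x / 120 * pitch_length_m
    # Flip the vertical axis so that y grows upward, as in most plotting tools.
    y_m = (80 - y) / 80 * pitch_width_m
    return x_m, y_m
```

Under these assumptions the centre spot (60, 40) maps to (52.5, 34.0), and the top-left corner (0, 0) maps to (0.0, 68.0).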
Opta Definitions
Stats Perform documentation (requires access)
Opta's official definitions for their event taxonomy. Important for understanding legacy data and media statistics.
Best for: Working with Opta-derived statistics
Academic Papers on Data Collection
"Quality Assessment of Football Metrics"
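Linke, D., Link, D., & Lames, M. (2018), published in PLoS ONE as "Validation of electronic performance and tracking systems EPTS under field conditions"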
Systematic analysis of positional data quality in football. Examines accuracy, precision, and reliability of different tracking systems.
Best for: Understanding tracking data limitations
"Semi-Automated Possession Tracking"
Various authors in sports engineering journals
Papers examining the methodology behind event and possession tracking, including inter-rater reliability studies.
Best for: Understanding event data accuracy
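Inter-rater reliability in such studies is often summarized with an agreement statistic. The sketch below computes Cohen's kappa for two annotators who labelled the same sequence of events. Kappa is chosen here as one common option, not necessarily the statistic any given paper uses, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

For example, with made-up labels `["pass", "pass", "shot", "shot"]` and `["pass", "pass", "shot", "pass"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.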
"Validation of GPS-Based Positional Data"
Multiple sports science journals
Numerous validation studies comparing GPS to optical tracking systems, establishing accuracy benchmarks.
Best for: Evaluating GPS tracking limitations
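Validation studies of this kind typically report an error statistic between the system under test and a reference system. As a minimal sketch, the function below computes the root-mean-square positional error between time-aligned samples. It assumes both inputs are already synchronized and expressed in the same units and coordinate frame, which is usually the hard part in practice.

```python
import math


def positional_rmse(measured, reference):
    """Root-mean-square distance between paired (x, y) samples."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared Euclidean distance for each time-aligned pair of positions.
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

The result is in the same units as the inputs, so positions in metres give an error in metres.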
Technical Tutorials
Friends of Tracking GitHub
GitHub: Friends-of-Tracking-Data-FoTD
Collection of Jupyter notebooks teaching soccer analytics using open data. Includes data loading, processing, and visualization tutorials.
Best for: Hands-on learning with code examples
McKay Johns' Tutorial Series
Blog and YouTube
Step-by-step tutorials on accessing and processing soccer data from various sources.
Best for: Practical data access guidance
Soccer Analytics Handbook (online resources)
Various community-created guides
Community-maintained guides to working with different data sources and APIs.
Best for: Troubleshooting specific data access issues
Data Provider Resources
StatsBomb Blog
statsbomb.com/articles
Regular articles explaining their data methodology, new features, and analytical approaches. Essential for understanding their data philosophy.
Best for: Staying current with StatsBomb offerings
Stats Perform AI
statsperform.com
Information on Stats Perform's AI and computer vision applications for data collection and analysis.
Best for: Understanding future directions in data collection
Wyscout Documentation
Wyscout help center
Documentation for Wyscout's platform and data exports. Important for those using their scouting platform.
Best for: Wyscout-specific guidance
Python Libraries
statsbombpy
PyPI: statsbombpy
Official Python wrapper for the StatsBomb data API. Well documented, with examples.
pip install statsbombpy
Best for: Accessing StatsBomb open and paid data
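A minimal sketch of a typical first call sequence follows. It assumes open-data access (no credentials) and a working network connection. The wrapper function `load_first_match_events` is a hypothetical helper, while `sb.matches` and `sb.events` are the library's own entry points.

```python
def load_first_match_events(competition_id: int, season_id: int):
    """Return (match_id, events) for the first match listed in a season."""
    # Imported inside the function so this sketch can be loaded even when
    # statsbombpy is not installed; a top-level import is fine in real code.
    from statsbombpy import sb

    # Without credentials, statsbombpy falls back to the free open data.
    matches = sb.matches(competition_id=competition_id, season_id=season_id)
    match_id = int(matches["match_id"].iloc[0])
    events = sb.events(match_id=match_id)  # one row per recorded event
    return match_id, events
```

Valid `competition_id` and `season_id` pairs can be looked up with `sb.competitions()`.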
kloppy
PyPI: kloppy
Standardized data loading for multiple soccer data formats. Converts each provider's format to a common data model.
pip install kloppy
Best for: Working with multiple data sources
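The sketch below shows one way to load a StatsBomb open-data match through kloppy and flatten it to a table. It assumes a recent kloppy release in which `statsbomb.load_open_data` and `to_df()` are available (older releases exposed `to_pandas()` instead). The wrapper name `load_open_match` is hypothetical.

```python
def load_open_match(match_id, event_types=None):
    """Load one StatsBomb open-data match with kloppy and return a data frame."""
    # Imported inside the function so this sketch can be loaded even when
    # kloppy is not installed; a top-level import is fine in real code.
    from kloppy import statsbomb

    # event_types optionally restricts loading, for example ["pass", "shot"].
    dataset = statsbomb.load_open_data(match_id=match_id, event_types=event_types)
    # Flatten the provider-independent dataset into one row per event.
    return dataset.to_df()
```

Because the dataset object is provider-independent, the downstream code stays largely the same when the loader is swapped for another provider's.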
socceraction
PyPI: socceraction
Library for valuing on-ball actions. It converts provider event data to the SPADL format and implements valuation frameworks such as VAEP and Expected Threat (xT).
pip install socceraction
Best for: Advanced event data analysis
mplsoccer
PyPI: mplsoccer
Visualization library for soccer analytics, including pitch plotting and heat maps.
pip install mplsoccer
Best for: Creating soccer visualizations
Industry Reports
Sports Data Landscape Reports
Various consulting firms
Annual reports on the sports data industry, market sizes, and competitive dynamics.
Best for: Business context and market understanding
Conference Presentations
StatsBomb Conference, OptaPro Forum recordings
Presentations from data providers explaining their methodologies and new products.
Best for: Current industry developments
Data Quality and Validation
"Reproducible Research in Sports Analytics"
Various academic papers
Papers discussing best practices for data documentation, validation, and reproducibility in sports analysis.
Best for: Developing rigorous data practices
Data Validation Frameworks
General data science resources
Resources on data validation frameworks (e.g., Great Expectations or pandas-based checks) that can be applied to soccer data.
Best for: Building systematic validation processes
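To make the idea concrete, the sketch below hand-rolls a few checks of the kind a validation framework would express declaratively. The field names (`type`, `minute`, `x`, `y`), the 120 x 80 pitch bounds, and the 130-minute ceiling are illustrative assumptions, not any provider's schema.

```python
REQUIRED_FIELDS = ("type", "minute", "x", "y")  # hypothetical schema


def validate_events(events):
    """Return a list of human-readable problems found in a list of event dicts."""
    problems = []
    for i, event in enumerate(events):
        # Completeness: every required field must be present and not None.
        missing = [field for field in REQUIRED_FIELDS if event.get(field) is None]
        if missing:
            problems.append(f"event {i}: missing {', '.join(missing)}")
            continue
        # Plausibility: allow extra time plus stoppage (assumed upper bound).
        if not 0 <= event["minute"] <= 130:
            problems.append(f"event {i}: implausible minute {event['minute']}")
        # Range: locations must fall on the assumed 120 x 80 grid.
        if not (0 <= event["x"] <= 120 and 0 <= event["y"] <= 80):
            problems.append(f"event {i}: location ({event['x']}, {event['y']}) outside pitch")
    return problems
```

Returning a list of problems, instead of raising on the first one, makes it easier to log every issue in a match file before deciding whether to reject it.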
Recommended Reading Sequence
For Data Access Beginners:
1. StatsBomb Open Data documentation
2. Friends of Tracking tutorials
3. statsbombpy library documentation
4. FBref website exploration
For Pipeline Development:
1. kloppy documentation
2. Data validation resources
3. Database design tutorials
4. ETL best practices
For Industry Understanding:
1. Provider websites and documentation
2. Conference presentations
3. Industry analysis reports
4. Academic validation papers
Web Resources
Direct Links
- StatsBomb Open Data: github.com/statsbomb/open-data
- FBref: fbref.com
- Understat: understat.com
- Transfermarkt: transfermarkt.com
- Football-Data.co.uk: football-data.co.uk
Communities
- Soccer Analytics Twitter/X: #SoccerAnalytics hashtag
- Reddit: r/SoccerAnalytics
- Discord: Various soccer analytics servers
Continue to Chapter 3: Statistical Foundations for Soccer Analysis