Further Reading: Data Sources and Collection in Soccer

This annotated bibliography provides resources for deeper exploration of soccer data sources and collection.


Essential Documentation

StatsBomb Open Data

GitHub: statsbomb/open-data

The official repository for StatsBomb's free data offering. Includes documentation, specifications, and data files. Essential reading for anyone working with this data.

Best for: Understanding StatsBomb data structure and accessing free data
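
As a quick orientation to the repository, the sketch below builds raw-file URLs for its JSON files using only the standard library. The helper names are hypothetical, and the `master` branch and the `data/` directory names are assumptions based on the repository layout at the time of writing, so check them against the repository before relying on them.

```python
import json
from urllib.request import urlopen

# Assumption: raw files are served from the repository's master branch.
BASE_URL = "https://raw.githubusercontent.com/statsbomb/open-data/master/data"


def competitions_url() -> str:
    """URL of the index listing the available competitions and seasons."""
    return f"{BASE_URL}/competitions.json"


def matches_url(competition_id: int, season_id: int) -> str:
    """URL of the match list for one competition season."""
    return f"{BASE_URL}/matches/{competition_id}/{season_id}.json"


def events_url(match_id: int) -> str:
    """URL of the event stream for one match."""
    return f"{BASE_URL}/events/{match_id}.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Download and decode one JSON file (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(competitions_url())` should return a list of competition-season records whose `competition_id` and `season_id` values feed `matches_url`.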


StatsBomb Data Specification

StatsBomb official documentation

Detailed specification of StatsBomb's event data format, including all event types, qualifiers, and coordinate systems. Critical reference when working with their data.

Best for: Understanding exact definitions of events and fields
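
To show why the specification matters in practice, here is a minimal sketch that flattens one nested event record into a flat row. The field names used (`type`, `team`, `player`, `location`, and `end_location` under `pass`) are assumptions based on the open-data JSON and should be verified against the specification; the function name and the sample record are hypothetical.

```python
from typing import Any, Dict, Optional


def flatten_event(event: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Reduce one nested event record to a flat row of commonly used fields."""
    # Some events carry no location at all, so fall back to empty coordinates.
    location = event.get("location") or [None, None]
    # Type-specific details are assumed to sit under a key named after the
    # event type (e.g. "pass"); only pass end locations are extracted here.
    pass_end = event.get("pass", {}).get("end_location") or [None, None]
    return {
        "type": event.get("type", {}).get("name"),
        "team": event.get("team", {}).get("name"),
        "player": event.get("player", {}).get("name"),
        "minute": event.get("minute"),
        "x": location[0],
        "y": location[1],
        "end_x": pass_end[0],
        "end_y": pass_end[1],
    }


# Hypothetical record, hand-written for illustration (not real match data).
sample = {
    "type": {"id": 30, "name": "Pass"},
    "team": {"id": 1, "name": "Example FC"},
    "player": {"id": 10, "name": "A. Player"},
    "minute": 12,
    "location": [60.0, 40.0],
    "pass": {"end_location": [75.5, 32.0]},
}
row = flatten_event(sample)
```

Returning `None` for absent fields rather than raising keeps the function usable across event types that lack a location or a pass block.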


Opta Definitions

Stats Perform documentation (requires access)

Opta's official definitions for their event taxonomy. Important for understanding legacy data and media statistics.

Best for: Working with Opta-derived statistics


Academic Papers on Data Collection

"Validation of Electronic Performance and Tracking Systems EPTS under Field Conditions"

Linke, D., Link, D., & Lames, M. (2018)

Systematic analysis of positional data quality in football. Examines accuracy, precision, and reliability of different tracking systems.

Best for: Understanding tracking data limitations


"Semi-Automated Possession Tracking"

Various authors in sports engineering journals

Papers examining the methodology behind event and possession tracking, including inter-rater reliability studies.

Best for: Understanding event data accuracy


"Validation of GPS-Based Positional Data"

Multiple sports science journals

Numerous validation studies comparing GPS to optical tracking systems, establishing accuracy benchmarks.

Best for: Evaluating GPS tracking limitations


Technical Tutorials

Friends of Tracking GitHub

GitHub: Friends-of-Tracking-Data-FoTD

Collection of Jupyter notebooks teaching soccer analytics using open data. Includes data loading, processing, and visualization tutorials.

Best for: Hands-on learning with code examples


McKay Johns' Tutorial Series

Blog and YouTube

Step-by-step tutorials on accessing and processing soccer data from various sources.

Best for: Practical data access guidance


Soccer Analytics Handbook (online resources)

Various community-created guides

Community-maintained guides to working with different data sources and APIs.

Best for: Troubleshooting specific data access issues


Data Provider Resources

StatsBomb Blog

statsbomb.com/articles

Regular articles explaining their data methodology, new features, and analytical approaches. Essential for understanding their data philosophy.

Best for: Staying current with StatsBomb offerings


Stats Perform AI

statsperform.com

Information on Stats Perform's AI and computer vision applications for data collection and analysis.

Best for: Understanding future directions in data collection


Wyscout Documentation

Wyscout help center

Documentation for Wyscout's platform and data exports. Important for those using their scouting platform.

Best for: Wyscout-specific guidance


Python Libraries

statsbombpy

PyPI: statsbombpy

The official Python wrapper for the StatsBomb data API. Well documented, with usage examples.

pip install statsbombpy

Best for: Accessing StatsBomb open and paid data
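
A minimal sketch of a typical first session with the library follows. It uses the `sb.matches` and `sb.events` calls from statsbombpy; the wrapper function name is hypothetical, and the default IDs are assumed to refer to a competition season that is present in the open data, which should be confirmed with `sb.competitions()`.

```python
def load_first_match_events(competition_id: int = 43, season_id: int = 3):
    """Return the event DataFrame for the first listed match of a season.

    The default IDs are an assumption; confirm them against the output of
    sb.competitions() before relying on them.
    """
    # Imported lazily so this module can be imported without statsbombpy installed.
    from statsbombpy import sb

    # One row per match; the match_id column identifies each match.
    matches = sb.matches(competition_id=competition_id, season_id=season_id)
    first_match_id = int(matches["match_id"].iloc[0])
    return sb.events(match_id=first_match_id)


# Example use (needs network access and `pip install statsbombpy`):
#   events = load_first_match_events()
#   print(events["type"].value_counts().head())
```

When no API credentials are configured, statsbombpy reads from the free open data, so this sketch does not need a paid subscription.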


kloppy

PyPI: kloppy

Standardized loading of event and tracking data from multiple providers. Converts each provider's format to a common data model.

pip install kloppy

Best for: Working with multiple data sources
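
To make the idea of a common standard concrete, the sketch below normalizes pitch coordinates from three provider conventions to a shared unit square with a top-left origin. It does not use kloppy's API. The function and the `SYSTEMS` table are hypothetical, and the conventions encoded here (StatsBomb 120 x 80 with a top-left origin, Opta 0-100 with a bottom-left origin, Wyscout 0-100 with a top-left origin) are assumptions to check against each provider's documentation.

```python
from typing import Dict, Tuple

# (x_max, y_max, y_axis_points_down) for each provider's coordinate system.
# These values are assumptions; verify them against provider documentation.
SYSTEMS: Dict[str, Tuple[float, float, bool]] = {
    "statsbomb": (120.0, 80.0, True),
    "opta": (100.0, 100.0, False),
    "wyscout": (100.0, 100.0, True),
}


def to_unit_square(x: float, y: float, provider: str) -> Tuple[float, float]:
    """Map provider coordinates to [0, 1] x [0, 1] with the origin at top-left."""
    x_max, y_max, y_down = SYSTEMS[provider]
    nx = x / x_max
    ny = y / y_max
    if not y_down:
        # Flip systems whose y axis points up so that all outputs agree.
        ny = 1.0 - ny
    return nx, ny
```

Libraries such as kloppy handle this kind of conversion, along with event-type mapping and orientation, so hand-rolled code like this is best kept for checking results.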


socceraction

PyPI: socceraction

Library for valuing on-the-ball actions. It converts event data to the SPADL format and implements the VAEP and Expected Threat (xT) frameworks.

pip install socceraction

Best for: Advanced event data analysis


mplsoccer

PyPI: mplsoccer

Visualization library for soccer analytics, including pitch plotting and heat maps.

pip install mplsoccer

Best for: Creating soccer visualizations


Industry Reports

Sports Data Landscape Reports

Various consulting firms

Annual reports on the sports data industry, market sizes, and competitive dynamics.

Best for: Business context and market understanding


Conference Presentations

StatsBomb Conference, OptaPro Forum recordings

Presentations from data providers explaining their methodologies and new products.

Best for: Current industry developments


Data Quality and Validation

"Reproducible Research in Sports Analytics"

Various academic papers

Papers discussing best practices for data documentation, validation, and reproducibility in sports analysis.

Best for: Developing rigorous data practices


Data Validation Frameworks

General data science resources

Resources on data validation frameworks (e.g., Great Expectations, or validation checks written in pandas) that apply equally well to soccer data.

Best for: Building systematic validation processes
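
Before adopting a full framework, the same idea can be tried with a few plain-Python checks. The sketch below is an illustration, not the API of any framework named above: the function name, the record layout, and the 120 x 80 pitch bounds are assumptions chosen to resemble StatsBomb-style event data, and the sample records are invented.

```python
from typing import Any, Dict, Iterable, List


def validate_events(
    events: Iterable[Dict[str, Any]],
    x_max: float = 120.0,
    y_max: float = 80.0,
) -> List[str]:
    """Return a list of human-readable problems found in an event stream."""
    problems: List[str] = []
    previous_key = None
    for i, event in enumerate(events):
        # Check 1: every event must declare a type.
        if "type" not in event:
            problems.append(f"event {i}: missing 'type'")
        # Check 2: a location, when present, must lie on the pitch.
        location = event.get("location")
        if location is not None:
            x, y = location[0], location[1]
            if not (0 <= x <= x_max and 0 <= y <= y_max):
                problems.append(f"event {i}: location {location} outside pitch")
        # Check 3: (period, minute, second) must never decrease.
        key = (event.get("period", 0), event.get("minute", 0), event.get("second", 0))
        if previous_key is not None and key < previous_key:
            problems.append(f"event {i}: timestamp goes backwards")
        previous_key = key
    return problems


# Invented records for illustration only.
clean = [
    {"type": "Pass", "period": 1, "minute": 0, "second": 5, "location": [60, 40]},
    {"type": "Shot", "period": 1, "minute": 1, "second": 0, "location": [110, 40]},
]
dirty = [
    {"period": 1, "minute": 5, "second": 0, "location": [130, 40]},
    {"type": "Pass", "period": 1, "minute": 4, "second": 0},
]
clean_report = validate_events(clean)
dirty_report = validate_events(dirty)
```

Collecting problems in a list instead of raising on the first failure makes it easier to report every issue in a match file in a single pass.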


For Data Access Beginners:

  1. StatsBomb Open Data documentation
  2. Friends of Tracking tutorials
  3. statsbombpy library documentation
  4. FBref website exploration

For Pipeline Development:

  1. kloppy documentation
  2. Data validation resources
  3. Database design tutorials
  4. ETL best practices

For Industry Understanding:

  1. Provider websites and documentation
  2. Conference presentations
  3. Industry analysis reports
  4. Academic validation papers


Web Resources

  • StatsBomb Open Data: github.com/statsbomb/open-data
  • FBref: fbref.com
  • Understat: understat.com
  • Transfermarkt: transfermarkt.com
  • Football-Data.co.uk: football-data.co.uk

Communities

  • Soccer Analytics Twitter/X: #SoccerAnalytics hashtag
  • Reddit: r/SoccerAnalytics
  • Discord: Various soccer analytics servers

Continue to Chapter 3: Statistical Foundations for Soccer Analysis