Soccer Data Landscape

Beginner 10 min read Nov 27, 2025

Navigating Soccer Data Sources

Soccer analytics requires quality data. This guide explores the major data providers, from free open datasets to premium professional services, helping you choose the right sources for your projects.

Free & Open Data Sources

FBref (Football Reference)

FREE Aggregated Stats

Best for: Season-long statistics, player comparisons

Coverage:

  • 30+ leagues worldwide
  • Historical data from multiple seasons
  • Advanced data supplied by Opta (StatsBomb before FBref's 2022 provider switch)
  • Advanced metrics (xG, xA, progressive passes)

R: Scraping FBref Data with worldfootballR

# worldfootballR is an R package (there is no Python module of that name)
library(worldfootballR)
library(dplyr)

# Get player-level shooting stats for the Big 5 European leagues
# Leagues covered: Premier League, La Liga, Bundesliga, Serie A, Ligue 1
player_shooting <- fb_big5_advanced_season_stats(
  season_end_year = 2024,
  stat_type = "shooting",
  team_or_player = "player"
)

# Analyze top scorers
# Column names follow worldfootballR's usual output; run names(player_shooting) if they differ
top_scorers <- player_shooting %>%
  slice_max(Gls_Standard, n = 10) %>%
  select(Player, Squad, Gls_Standard, xG_Expected, Sh_Standard)
print("Top 10 Scorers:")
print(top_scorers)

# Get player scouting report, compared against the player's primary position
player_stats <- fb_player_scouting_report(
  player_url = "https://fbref.com/en/players/21a66f6a/Erling-Haaland",
  pos_versus = "primary"
)
print("Player Scouting Report:")
print(player_stats)

Wyscout API (Limited Free Access)

LIMITED FREE Event Data

Best for: Academic research (free academic licenses available)

Features:

  • Extensive coverage across 30+ leagues
  • Detailed event data with tags
  • Player attributes and market values
  • Video clips linked to events
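
To make "event data with tags" concrete, here is a minimal Python sketch that decodes one event shaped like the records in the publicly released Wyscout research dataset. The sample event is made up, and the tag IDs in TAG_NAMES (101 goal, 301 assist, 1801 accurate, 1802 not accurate) follow that public release; the commercial API may use a different schema, so treat every field name here as an assumption.

```python
# Illustrative only: field names and tag IDs follow the public Wyscout
# research dataset; the commercial API may use a different schema.
TAG_NAMES = {
    101: "goal",
    301: "assist",
    1801: "accurate",
    1802: "not_accurate",
}


def decode_tags(event):
    """Return human-readable names for the tag IDs attached to an event."""
    return [
        TAG_NAMES.get(tag["id"], f"unknown_{tag['id']}")
        for tag in event.get("tags", [])
    ]


# Made-up sample event (not real match data).
sample_event = {
    "eventName": "Shot",
    "playerId": 12345,
    "tags": [{"id": 101}, {"id": 1801}],
    "positions": [{"x": 91, "y": 48}, {"x": 100, "y": 50}],
}

labels = decode_tags(sample_event)
is_goal = "goal" in labels
print(labels, is_goal)
```

Unknown tag IDs are kept as "unknown_<id>" rather than dropped, so gaps in the lookup table stay visible during analysis.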

Premium Data Providers

Opta Sports

PREMIUM Event Data

Used by: Premier League, ESPN, major broadcasters

Capabilities:

  • Real-time match data collection
  • 40+ years of historical data
  • 150+ competitions worldwide
  • Advanced metrics and expected goals models
  • Custom API access

Pricing: Enterprise pricing (contact sales)

Best for: Professional clubs, media companies, betting operators

StatsBomb (Professional)

PREMIUM Event Data

Used by: Top European clubs, national teams

Unique Features:

  • 360-degree freeze frames for every event
  • Pressure and defensive action context
  • Industry-leading xG model
  • IQ platform for visual analysis
  • Custom metrics and KPIs

Second Spectrum (Tracking Data)

PREMIUM Tracking Data

Used by: Premier League, Bundesliga, MLS

Technology:

  • Optical tracking system that samples positions 25+ times per second
  • Player and ball x,y coordinates
  • Speed, acceleration, distance metrics
  • Space occupation and team shape analysis
  • Machine learning-powered insights
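
As a rough illustration of how speed and distance metrics fall out of raw x,y samples, the Python sketch below assumes positions in meters sampled at 25 Hz. The frame rate, units, and sample track are assumptions for illustration and do not reflect Second Spectrum's actual delivery format.

```python
import math

FRAME_RATE_HZ = 25  # assumed sampling rate (frames per second)


def step_lengths(positions):
    """Distance in meters between consecutive (x, y) samples."""
    return [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


def frame_speeds(positions, frame_rate_hz=FRAME_RATE_HZ):
    """Speed in m/s for each frame-to-frame step."""
    return [step * frame_rate_hz for step in step_lengths(positions)]


def total_distance(positions):
    """Total distance covered along the track, in meters."""
    return sum(step_lengths(positions))


# Made-up track: the player moves about 0.3 m per frame in a straight line.
track = [(10.0, 20.0), (10.18, 20.24), (10.36, 20.48), (10.54, 20.72)]

print([round(s, 2) for s in frame_speeds(track)])  # about 7.5 m/s per step
print(round(total_distance(track), 2))             # about 0.9 m in total
```

Real tracking feeds are noisy, so production pipelines usually smooth positions before differentiating; that step is left out here to keep the arithmetic visible.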

Comparison Table

Provider        | Cost  | Data Type     | Coverage          | Best For
StatsBomb Open  | Free  | Event         | 7+ competitions   | Learning, portfolios
FBref           | Free  | Aggregated    | 30+ leagues       | Season statistics
Understat       | Free  | xG Stats      | Top 6 leagues     | xG analysis
Opta            | $$$$$ | Event         | 150+ competitions | Professionals
StatsBomb Pro   | $$$$  | Event         | 50+ competitions  | Clubs, analysts
Wyscout         | $$$$  | Event + Video | 30+ leagues       | Scouting, research
Second Spectrum | $$$$$ | Tracking      | Select leagues    | Elite clubs

API Libraries and Tools

Python Libraries

statsbombpy

Official StatsBomb Python library

pip install statsbombpy

mplsoccer

Soccer pitch visualization

pip install mplsoccer

socceraction

Convert events to SPADL actions and value them with VAEP

pip install socceraction

kloppy

Standardize event and tracking data

pip install kloppy

R Packages

Installing R Soccer Packages

# StatsBomb data access (hosted on GitHub, not CRAN)
install.packages("devtools")
devtools::install_github("statsbomb/StatsBombR")

# World football data scraping
install.packages("worldfootballR")

# Soccer pitch plotting
install.packages("ggsoccer")

# Advanced plotting
install.packages("ggplot2")

# Data manipulation
install.packages("dplyr")

# Load libraries
library(StatsBombR)
library(worldfootballR)
library(ggsoccer)
library(ggplot2)
library(dplyr)

Choosing the Right Data Source

Decision Guide:

  • Learning soccer analytics? → Start with StatsBomb Open Data
  • Building a portfolio project? → StatsBomb Open + FBref
  • Academic research? → StatsBomb Open or Wyscout academic license
  • Professional club analysis? → Opta or StatsBomb Professional
  • Broadcasting/media? → Opta with real-time feeds
  • Advanced tactical analysis? → Second Spectrum tracking data

Data Quality Considerations

  • Consistency: Event definitions vary between providers
  • Coverage: Not all providers cover all leagues equally
  • Timeliness: Free data may have delays; professional feeds are real-time
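  • Granularity: Tracking data records every player's position many times per second, while event data logs only on-ball actions
  • Cost vs. Value: Premium data is expensive, but it adds coverage, detail, and real-time delivery that free sources lack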
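
One concrete form of the consistency problem is pitch coordinates. The Python sketch below rescales a location to meters on a 105 x 68 pitch, assuming StatsBomb's 120 x 80 grid and a 0-100 percentage grid for Opta and Wyscout. The pitch size and the PROVIDER_GRIDS table are assumptions made for this example, and axis orientation (which also differs between providers) is deliberately ignored.

```python
PITCH_LENGTH_M = 105.0  # assumed pitch length in meters
PITCH_WIDTH_M = 68.0    # assumed pitch width in meters

# Assumed maximum (x, y) values of each provider's coordinate grid.
PROVIDER_GRIDS = {
    "statsbomb": (120.0, 80.0),
    "opta": (100.0, 100.0),
    "wyscout": (100.0, 100.0),
}


def to_meters(x, y, provider):
    """Rescale provider coordinates to meters (orientation not handled)."""
    max_x, max_y = PROVIDER_GRIDS[provider]
    return (x / max_x * PITCH_LENGTH_M, y / max_y * PITCH_WIDTH_M)


# The center spot in two different grids maps to the same point in meters.
print(to_meters(60, 40, "statsbomb"))
print(to_meters(50, 50, "opta"))
```

Libraries such as kloppy handle this kind of standardization, including orientation, so a hand-rolled version like this is mainly useful for understanding what the conversion involves.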

Getting API Access

StatsBomb Open Data Setup

# Python
pip install statsbombpy

# R (StatsBombR is installed from GitHub, not CRAN)
install.packages("devtools")
devtools::install_github("statsbomb/StatsBombR")

No API key required for open data. Just install and start using!
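
A minimal Python sketch of what "start using" can look like, assuming statsbombpy is installed and the machine has network access. sb.events() is a statsbombpy call; load_events and team_xg are hypothetical helpers written for this example, and the sample rows are made up to mirror the column names statsbombpy returns.

```python
from collections import defaultdict


def load_events(match_id):
    """Fetch one match's open-data events as a list of dicts (needs network)."""
    # Imported lazily so team_xg below can be tried without the library.
    from statsbombpy import sb
    return sb.events(match_id=match_id).to_dict("records")


def team_xg(events):
    """Sum StatsBomb xG per team over shot events."""
    totals = defaultdict(float)
    for event in events:
        if event.get("type") == "Shot":
            totals[event["team"]] += event.get("shot_statsbomb_xg") or 0.0
    return dict(totals)


# Made-up rows shaped like statsbombpy output, for an offline check.
sample = [
    {"type": "Pass", "team": "Team A"},
    {"type": "Shot", "team": "Team A", "shot_statsbomb_xg": 0.25},
    {"type": "Shot", "team": "Team B", "shot_statsbomb_xg": 0.5},
    {"type": "Shot", "team": "Team A", "shot_statsbomb_xg": 0.25},
]
print(team_xg(sample))
```

With network access, sb.competitions() lists the available competitions and sb.matches(competition_id=43, season_id=3) lists matches for one of them (those IDs are assumed here to be the 2018 men's World Cup); passing any match_id from that table to team_xg(load_events(match_id)) gives the per-team xG totals.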

Wyscout Academic Access

Steps:

  1. Visit Wyscout's academic program page
  2. Provide proof of academic affiliation (.edu email)
  3. Describe your research project
  4. Receive API credentials within 2-4 weeks

Ready to Start?

Now that you know the data landscape, proceed to:

  • Set up your development environment
  • Load your first dataset
  • Conduct basic analysis
