Scraping FBref for Stats
Beginner
10 min read
Nov 27, 2025
FBref provides comprehensive soccer statistics powered by StatsBomb and Opta data. While they don't offer an official API, respectful web scraping can extract valuable data for analysis.
## Setting Up Your Scraper
Use Python with requests and BeautifulSoup for basic scraping:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO
import time

def scrape_fbref_table(url):
    """Fetch a page and return its first stats table as a DataFrame."""
    # Add a delay to be respectful to the server
    time.sleep(3)
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the first stats table on the page
    table = soup.find('table')
    # Wrap in StringIO: passing raw HTML strings to read_html is deprecated
    df = pd.read_html(StringIO(str(table)))[0]
    return df
```
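The fetcher above has no error handling, and FBref rate-limits aggressive scrapers with HTTP 429 responses. As a sketch (the helper name and retry defaults are illustrative, not part of any FBref API), a wrapper that honors the `Retry-After` header and backs off between attempts:

```python
import time
import requests

def get_with_retries(url, retries=3, backoff=5):
    """Fetch a URL, backing off when the server answers 429 Too Many Requests.

    `retries` and `backoff` are illustrative defaults; honor the
    Retry-After header when the server provides one.
    """
    headers = {'User-Agent': 'Mozilla/5.0'}
    for attempt in range(retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Wait as instructed, or fall back to a growing delay
        wait = int(response.headers.get('Retry-After', backoff * (attempt + 1)))
        time.sleep(wait)
    raise RuntimeError(f'Rate limited after {retries} attempts: {url}')
```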
## Extracting Player Statistics
```python
# Example: get Premier League player stats
league_url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
player_stats = scrape_fbref_table(league_url)

# Flatten column names if pandas parsed them as a MultiIndex
if isinstance(player_stats.columns, pd.MultiIndex):
    player_stats.columns = ['_'.join(col).strip('_') for col in player_stats.columns]
```
## Scraping Match Data
```python
from io import StringIO

def get_match_stats(match_id):
    """Fetch a match report page and pull out its stats tables."""
    url = f'https://fbref.com/en/matches/{match_id}/'
    time.sleep(3)
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    response.raise_for_status()
    tables = pd.read_html(StringIO(response.text))
    # Table order can vary by match page, so verify these indices
    # against the page before relying on them
    return {
        'team_stats': tables[0],
        'player_stats': tables[1],
        'shots': tables[-1],
    }
```
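One FBref quirk worth knowing: many tables on a page are shipped inside HTML comments and only revealed by JavaScript, so `soup.find('table')` and `pd.read_html` on the raw page miss them. A sketch of recovering those tables by parsing the comments themselves (the function name is my own):

```python
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup, Comment

def extract_commented_tables(html):
    """Parse tables that FBref hides inside HTML comments."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = []
    # Walk every comment node and parse any <table> markup it contains
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if '<table' in comment:
            tables.extend(pd.read_html(StringIO(str(comment))))
    return tables
```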
## Best Practices
When scraping FBref, follow these guidelines:
- Add delays between requests (3-5 seconds minimum)
- Use appropriate User-Agent headers
- Cache downloaded data to avoid repeated requests
- Respect robots.txt directives
- Consider rate limiting during peak hours
- Store data locally rather than querying repeatedly
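The caching guideline above can be sketched with a small disk cache keyed on the URL (the directory name and `max_age` default are illustrative). The fetch function is passed in as a callable so the caching logic stays independent of any particular HTTP client:

```python
import hashlib
import time
from pathlib import Path

CACHE_DIR = Path('fbref_cache')  # hypothetical local cache directory

def cached_get(url, fetch, max_age=86400):
    """Return page HTML from the disk cache if fresh, else fetch and store it.

    `fetch` is any callable taking a URL and returning HTML text.
    `max_age` is the cache lifetime in seconds (one day by default).
    """
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f'{key}.html'
    if path.exists() and time.time() - path.stat().st_mtime < max_age:
        return path.read_text()
    html = fetch(url)
    path.write_text(html)
    return html
```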
## Data Processing
FBref data often requires cleaning:
```python
def clean_fbref_data(df):
    """Basic cleanup for tables scraped from FBref."""
    # Drop header rows that FBref repeats inside long tables
    df = df[df['Player'] != 'Player'].copy()
    # Convert object columns to numeric where possible, leaving
    # genuinely textual columns (names, nations) untouched
    for col in df.select_dtypes(include=['object']).columns:
        converted = pd.to_numeric(df[col], errors='coerce')
        if not converted.isna().all():
            df[col] = converted
    return df
```
Always verify that your scraping practices comply with FBref's terms of service and consider supporting them through their subscription service if you use the data extensively.