Setting Up Your Soccer Analytics Environment
Building Your Analytics Toolkit
This guide walks you through setting up a complete environment for soccer analytics, from installing Python/R to configuring essential libraries and verifying everything works correctly.
Python Setup for Soccer Analytics
Step 1: Install Python
Recommended: Python 3.8 or higher
Download: python.org/downloads
Verify installation:
python --version
# Should output: Python 3.x.x
# On Mac/Linux the command may be python3 instead of python
Step 2: Create Virtual Environment
Setting up isolated environment
# Create virtual environment
python -m venv soccer_env
# Activate (Windows)
soccer_env\Scripts\activate
# Activate (Mac/Linux)
source soccer_env/bin/activate
# Your prompt should now show (soccer_env)
Why Virtual Environments?
Virtual environments keep your project dependencies isolated, preventing version conflicts between projects. Always use one for soccer analytics work.
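A quick way to confirm that the environment is really active, beyond looking at the prompt, is to ask the interpreter itself. The snippet below is a minimal sketch; the function name `in_virtual_environment` is illustrative and not part of any library.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtual_environment() else "not active"
    print(f"Virtual environment is {status}: {sys.prefix}")
```

If this reports "not active" after you ran the activate command, the terminal is most likely using a different Python than the one in soccer_env.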
Step 3: Install Core Libraries
Essential Python packages
# Update pip first
pip install --upgrade pip
# Data manipulation and analysis
pip install pandas numpy scipy
# Visualization
pip install matplotlib seaborn plotly
# Soccer-specific libraries
pip install statsbombpy mplsoccer kloppy socceraction
# Machine learning (optional but useful)
pip install scikit-learn
# Jupyter notebooks
pip install jupyter notebook
# Additional useful tools
pip install requests beautifulsoup4
Step 4: Verify Installation
Test your setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsbombpy import sb
from mplsoccer import Pitch
print("Python Setup Complete!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
# Test StatsBomb connection
comps = sb.competitions()
print(f"\nStatsBomb competitions available: {len(comps)}")
# Test pitch plotting
pitch = Pitch()
fig, ax = pitch.draw(figsize=(10, 7))
plt.title("Soccer Pitch - Setup Successful!")
plt.show()
print("\n✓ All libraries working correctly!")
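If one of the imports above fails, it can help to see every missing package at once instead of fixing them one error at a time. The following sketch uses only the standard library (`importlib.metadata`, available from Python 3.8); the `PACKAGES` list and the helper name `installed_versions` are assumptions made for illustration.

```python
from importlib import metadata

# Distribution names as passed to pip earlier in this guide.
PACKAGES = ["pandas", "numpy", "matplotlib", "statsbombpy", "mplsoccer"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

This only reads package metadata, so it does not import the libraries or contact StatsBomb.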
R Setup for Soccer Analytics
Step 1: Install R and RStudio
Required Software
- R (4.0+): cran.r-project.org
- RStudio: posit.co/download/rstudio-desktop
Verify installation: Open RStudio and run:
version
Step 2: Install Core Packages
Essential R packages for soccer analytics
# Data manipulation
install.packages("tidyverse") # Includes dplyr, ggplot2, tidyr
install.packages("data.table")
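# Soccer data access
# StatsBombR is not on CRAN, so install it from GitHub
install.packages("devtools")
devtools::install_github("statsbomb/StatsBombR")
install.packages("worldfootballR")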
# Visualization
install.packages("ggsoccer")
install.packages("ggrepel")
install.packages("patchwork")
# Additional useful packages
install.packages("plotly") # Interactive plots
install.packages("scales") # Scaling functions
install.packages("lubridate") # Date handling
# Machine learning (optional)
install.packages("caret")
install.packages("randomForest")
print("R packages installed successfully!")
Step 3: Load and Test Libraries
Verify R setup
library(tidyverse)
library(StatsBombR)
library(ggsoccer)
library(worldfootballR)
cat("R Setup Complete!\n")
cat(sprintf("R version: %s\n", R.version.string))
# Test StatsBomb connection
comps <- FreeCompetitions()
cat(sprintf("\nStatsBomb competitions available: %d\n", nrow(comps)))
# Test pitch plotting
ggplot() +
annotate_pitch() +
theme_pitch() +
labs(title = "Soccer Pitch - Setup Successful!") +
coord_fixed(ratio = 1)
cat("\n✓ All packages working correctly!")
IDE Configuration
Visual Studio Code (Python)
Recommended Extensions
- Python - Microsoft's official Python extension
- Jupyter - Notebook support
- Pylance - Fast Python language server
- Python Indent - Correct Python indentation
Configuration
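{
  // Mac/Linux path; on Windows use "./soccer_env/Scripts/python.exe"
  "python.defaultInterpreterPath": "./soccer_env/bin/python",
  "jupyter.jupyterServerType": "local",
  // Recent versions of the Python extension dropped "python.linting.*" and
  // "python.formatting.provider"; formatting now comes from a separate
  // extension such as Black Formatter
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter"
  }
}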
RStudio Configuration
Recommended Settings
Tools → Global Options:
- General: Uncheck "Restore .RData into workspace at startup"
- Code → Display: Enable "Show margin" (80 characters)
- Appearance: Choose your preferred theme
- Packages: Set CRAN mirror to nearest location
Project Structure
Recommended Directory Layout
soccer_analytics/
│
├── data/
│ ├── raw/ # Original data files
│ ├── processed/ # Cleaned datasets
│ └── external/ # Data from external sources
│
├── notebooks/ # Jupyter/R Markdown files
│ ├── exploratory/ # Initial data exploration
│ └── analysis/ # Final analyses
│
├── scripts/ # Python/R scripts
│ ├── data_loading.py # Data import functions
│ ├── preprocessing.py # Data cleaning
│ └── visualization.py # Plotting functions
│
├── outputs/ # Generated files
│ ├── figures/ # Plots and charts
│ └── reports/ # Analysis reports
│
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Creating Project Structure
Bash/Terminal
mkdir -p soccer_analytics/{data/{raw,processed,external},notebooks/{exploratory,analysis},scripts,outputs/{figures,reports}}
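The brace expansion in the command above relies on a shell such as bash or zsh, so it will not work in the Windows Command Prompt. A cross-platform alternative can be sketched in Python with `pathlib`; the names `SUBDIRECTORIES` and `create_project` are illustrative, and the two placeholder files are created empty.

```python
from pathlib import Path

# Directories from the recommended layout, relative to the project root.
SUBDIRECTORIES = [
    "data/raw",
    "data/processed",
    "data/external",
    "notebooks/exploratory",
    "notebooks/analysis",
    "scripts",
    "outputs/figures",
    "outputs/reports",
]


def create_project(root):
    """Create the directory layout under root and return the directory paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRECTORIES:
        path = root / sub
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Empty placeholders for the two top-level files in the layout.
    for filename in ("requirements.txt", "README.md"):
        (root / filename).touch(exist_ok=True)
    return created
```

Calling `create_project("soccer_analytics")` from the folder where the project should live builds the same tree, and running it a second time leaves existing files untouched.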
Creating a Requirements File
Python: requirements.txt
# Data manipulation
pandas>=1.5.0
numpy>=1.23.0
scipy>=1.9.0
# Visualization
matplotlib>=3.6.0
seaborn>=0.12.0
plotly>=5.11.0
# Soccer analytics
statsbombpy>=1.2.0
mplsoccer>=1.1.0
kloppy>=3.5.0
socceraction>=1.2.0
# Machine learning
scikit-learn>=1.1.0
# Jupyter
jupyter>=1.0.0
notebook>=6.5.0
# Utilities
requests>=2.28.0
beautifulsoup4>=4.11.0
Install from requirements:
pip install -r requirements.txt
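After installing, you may want to confirm that each installed version meets the minimum listed in requirements.txt. The sketch below handles only the simple `name>=version` lines used in this guide and compares numeric version parts; it is a simplified stand-in for a full version parser, and all function names are illustrative.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Return (name, minimum_version) pairs from 'name>=version' lines."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ">=" not in line:
            continue
        name, minimum = line.split(">=", 1)
        pairs.append((name.strip(), minimum.strip()))
    return pairs


def version_tuple(version):
    """Turn '1.5.0' into (1, 5, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(text):
    """Return (name, minimum, installed, ok) for every requirement line."""
    report = []
    for name, minimum in parse_requirements(text):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and meets_minimum(installed, minimum)
        report.append((name, minimum, installed, ok))
    return report
```

To use it, pass the file contents, for example `check_requirements(open("requirements.txt").read())`, and look for rows where the last element is `False`.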
R: packages.R
# packages.R - Install all required R packages
required_packages <- c(
  # Data manipulation
  "tidyverse",
  "data.table",
  "lubridate",
  # Soccer data (StatsBombR is installed from GitHub below)
  "worldfootballR",
  "devtools",
  # Visualization
  "ggsoccer",
  "ggrepel",
  "patchwork",
  "plotly",
  "scales",
  # Machine learning
  "caret",
  "randomForest"
)
install_if_missing <- function(packages) {
  new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(new_packages)) {
    install.packages(new_packages)
  }
}
install_if_missing(required_packages)
# StatsBombR is not on CRAN, so install it from GitHub when it is missing
if (!requireNamespace("StatsBombR", quietly = TRUE)) {
  devtools::install_github("statsbomb/StatsBombR")
}
cat("All packages installed!\n")
Run installation script:
source("packages.R")
Testing Your Complete Setup
Python: End-to-End Test
"""
complete_test.py - Comprehensive setup verification
"""
# pandas and numpy are imported here to confirm they are installed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsbombpy import sb
from mplsoccer import Pitch, VerticalPitch


def test_setup():
    print("Testing Soccer Analytics Environment\n" + "=" * 50)

    # Test 1: Data loading
    print("\n1. Testing data access...")
    comps = sb.competitions()
    print(f"   ✓ Loaded {len(comps)} competitions")

    # Test 2: Data processing (competition 43, season 3 is the 2018 FIFA World Cup)
    print("\n2. Testing data manipulation...")
    matches = sb.matches(competition_id=43, season_id=3)
    print(f"   ✓ Processed {len(matches)} matches")

    # Test 3: Basic analysis
    print("\n3. Testing analysis capabilities...")
    events = sb.events(match_id=matches.iloc[0]['match_id'])
    shots = events[events['type'] == 'Shot']
    print(f"   ✓ Analyzed {len(shots)} shots")

    # Test 4: Visualization
    print("\n4. Testing visualization...")
    pitch = Pitch(pitch_type='statsbomb', pitch_color='grass')
    fig, ax = pitch.draw(figsize=(10, 7))

    # Plot shots. Each location is an [x, y] list, so check its type instead
    # of calling pd.notna(), which returns an array for list input.
    for _, shot in shots.iterrows():
        if isinstance(shot['location'], (list, tuple)):
            x, y = shot['location'][:2]
            color = 'red' if shot['shot_outcome'] == 'Goal' else 'white'
            ax.scatter(x, y, c=color, s=100, edgecolors='black', alpha=0.7)

    ax.set_title('Shot Map - Setup Test', fontsize=16, fontweight='bold')
    plt.savefig('test_plot.png', dpi=150, bbox_inches='tight')
    print("   ✓ Created visualization")
    plt.close()

    print("\n" + "=" * 50)
    print("All tests passed! Your environment is ready.")
    print("Test plot saved as: test_plot.png")


if __name__ == "__main__":
    test_setup()
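StatsBomb events store each location as an [x, y] list, and rows without a location typically show up as NaN (a float) once loaded into pandas. The sketch below shows one way to pull coordinates out safely using plain Python; the helper name `shot_coordinates` and the sample records are hypothetical and only illustrate the shape of the data.

```python
def shot_coordinates(records):
    """Yield (x, y, is_goal) for every record that carries a usable location."""
    for record in records:
        location = record.get("location")
        # A missing location is a NaN float rather than a list, so a type
        # check filters it out without any pandas-specific call.
        if not isinstance(location, (list, tuple)) or len(location) < 2:
            continue
        x, y = location[0], location[1]
        yield x, y, record.get("shot_outcome") == "Goal"


if __name__ == "__main__":
    # Hypothetical records in the same shape as shots.to_dict("records").
    sample = [
        {"location": [108.0, 36.5], "shot_outcome": "Goal"},
        {"location": float("nan"), "shot_outcome": "Saved"},
        {"location": [95.2, 50.1], "shot_outcome": "Off T"},
    ]
    for x, y, is_goal in shot_coordinates(sample):
        print(x, y, "goal" if is_goal else "no goal")
```

With a real events table, `shot_coordinates(shots.to_dict("records"))` would feed the same helper.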
R: End-to-End Test
# complete_test.R - Comprehensive setup verification
library(tidyverse)
library(StatsBombR)
library(ggsoccer)

test_setup <- function() {
  cat("Testing Soccer Analytics Environment\n")
  cat(strrep("=", 50), "\n")

  # Test 1: Data loading
  cat("\n1. Testing data access...\n")
  comps <- FreeCompetitions()
  cat(sprintf("   ✓ Loaded %d competitions\n", nrow(comps)))

  # Test 2: Data processing
  # FreeMatches() takes a competitions data frame, so filter to the
  # 2018 FIFA World Cup (competition 43, season 3) first
  cat("\n2. Testing data manipulation...\n")
  world_cup <- comps %>% filter(competition_id == 43, season_id == 3)
  matches <- FreeMatches(world_cup)
  cat(sprintf("   ✓ Processed %d matches\n", nrow(matches)))

  # Test 3: Basic analysis
  cat("\n3. Testing analysis capabilities...\n")
  match_data <- get.matchFree(matches[1, ])
  events <- allclean(match_data)
  shots <- events %>% filter(type.name == "Shot")
  cat(sprintf("   ✓ Analyzed %d shots\n", nrow(shots)))

  # Test 4: Visualization
  cat("\n4. Testing visualization...\n")
  plot <- ggplot(shots, aes(x = location.x, y = location.y,
                            color = shot.outcome.name)) +
    annotate_pitch(dimensions = pitch_statsbomb) +
    geom_point(size = 4, alpha = 0.7) +
    scale_color_manual(values = c(
      "Goal" = "#FF0000",
      "Saved" = "#FFFF00",
      "Off T" = "#FFFFFF",
      "Blocked" = "#808080"
    )) +
    theme_pitch() +
    coord_fixed(ratio = 1) +
    labs(title = "Shot Map - Setup Test",
         color = "Outcome")
  ggsave("test_plot.png", plot, width = 10, height = 7, dpi = 150)
  cat("   ✓ Created visualization\n")

  cat("\n", strrep("=", 50), "\n")
  cat("All tests passed! Your environment is ready.\n")
  cat("Test plot saved as: test_plot.png\n")
}

# Run the test
test_setup()
Common Setup Issues
Python Issues
Issue: "pip not found"
Solution:
python -m ensurepip --upgrade
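Issue: Permission errors during installation
Solution: First check that the virtual environment is activated, since installs inside it do not need admin rights. Outside a virtual environment, install to your user directory instead:
pip install --user package_name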
Issue: StatsBomb data not loading
Solution: Check internet connection and firewall settings. StatsBomb data is fetched from GitHub.
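To separate a network problem from a library problem, you can test whether the open-data files on GitHub are reachable at all. This is a minimal sketch using only the standard library. The URL is an assumption about where the StatsBomb open-data competitions file lives, and the `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request

# Assumed location of the StatsBomb open-data competitions file on GitHub.
OPEN_DATA_URL = (
    "https://raw.githubusercontent.com/statsbomb/open-data/"
    "master/data/competitions.json"
)


def can_reach(url=OPEN_DATA_URL, timeout=10, opener=urllib.request.urlopen):
    """Return True if url answers with HTTP 200 within timeout seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False
```

If `can_reach()` returns `False` while other websites load normally, a firewall or proxy is probably blocking raw.githubusercontent.com.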
R Issues
Issue: Package installation fails
Solution: Update R to the latest version, then try:
install.packages("package_name", dependencies = TRUE)
Issue: "Cannot load shared object"
Solution: Install system dependencies (varies by OS):
# Ubuntu/Debian
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev
# macOS
brew install openssl libxml2
Environment Ready!
With your setup complete, you're ready to:
- Load soccer data from multiple sources
- Perform comprehensive analyses
- Create professional visualizations
- Build machine learning models
Next: Try your first soccer analysis!