Setting Up Your Soccer Analytics Environment

Beginner | 10 min read | Nov 27, 2025

Building Your Analytics Toolkit

This guide walks you through setting up a complete soccer analytics environment in Python or R: installing the language, adding the essential libraries, and verifying that everything works.

Python Setup for Soccer Analytics

Step 1: Install Python

Recommended: Python 3.8 or higher

Download: python.org/downloads

Verify installation:

python --version
# Should output: Python 3.x.x
# On macOS/Linux the command may be python3 instead of python

Step 2: Create Virtual Environment

Setting up an isolated environment

# Create virtual environment
python -m venv soccer_env

# Activate (Windows)
soccer_env\Scripts\activate

# Activate (Mac/Linux)
source soccer_env/bin/activate

# Your prompt should now show (soccer_env)

Why Virtual Environments?

Virtual environments keep your project dependencies isolated, preventing version conflicts between projects. Always use one for soccer analytics work.
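
To confirm that the interpreter you are running really belongs to a virtual environment, a small check like the one below can help. It is a minimal sketch that relies only on the standard library; the function name in_virtualenv is an illustrative choice, not part of any package used in this guide.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {in_virtualenv()}")
```

If the second line reports False after you activated soccer_env, the shell is probably still picking up a different python executable.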

Step 3: Install Core Libraries

Essential Python packages

# Update pip first
pip install --upgrade pip

# Data manipulation and analysis
pip install pandas numpy scipy

# Visualization
pip install matplotlib seaborn plotly

# Soccer-specific libraries
pip install statsbombpy mplsoccer kloppy socceraction

# Machine learning (optional but useful)
pip install scikit-learn

# Jupyter notebooks
pip install jupyter notebook

# Additional useful tools
pip install requests beautifulsoup4

Step 4: Verify Installation

Test your setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsbombpy import sb
from mplsoccer import Pitch

print("Python Setup Complete!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

# Test StatsBomb connection
comps = sb.competitions()
print(f"\nStatsBomb competitions available: {len(comps)}")

# Test pitch plotting
pitch = Pitch()
fig, ax = pitch.draw(figsize=(10, 7))
plt.title("Soccer Pitch - Setup Successful!")
plt.show()

print("\n✓ All libraries working correctly!")
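
If the script above stops at the first failing import, it can be hard to tell which packages are missing. The sketch below reports every package in one pass without importing them or touching the network. The PACKAGES mapping and the check_packages helper are hypothetical names chosen for this example; adjust the mapping to whatever you installed.

```python
from importlib import metadata, util

# Import names used in this guide, mapped to their pip distribution names.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "statsbombpy": "statsbombpy",
    "mplsoccer": "mplsoccer",
    "sklearn": "scikit-learn",
    "bs4": "beautifulsoup4",
}


def check_packages(packages):
    """Return {import_name: version string, or None if the module is not importable}."""
    report = {}
    for import_name, dist_name in packages.items():
        if util.find_spec(import_name) is None:
            report[import_name] = None  # module cannot be found
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under that name.
            report[import_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:<12} {version if version else 'MISSING'}")
```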

R Setup for Soccer Analytics

Step 1: Install R and RStudio

Required Software

  1. R (4.0+): cran.r-project.org
  2. RStudio: posit.co/download/rstudio-desktop

Verify installation: Open RStudio and run:

version

Step 2: Install Core Packages

Essential R packages for soccer analytics

# Data manipulation
install.packages("tidyverse")  # Includes dplyr, ggplot2, tidyr
install.packages("data.table")

# Soccer data access
# StatsBombR is not on CRAN; install it from GitHub
install.packages("remotes")
remotes::install_github("statsbomb/StatsBombR")
install.packages("worldfootballR")

# Visualization
install.packages("ggsoccer")
install.packages("ggrepel")
install.packages("patchwork")

# Additional useful packages
install.packages("plotly")      # Interactive plots
install.packages("scales")      # Scaling functions
install.packages("lubridate")   # Date handling

# Machine learning (optional)
install.packages("caret")
install.packages("randomForest")

print("R packages installed successfully!")

Step 3: Load and Test Libraries

Verify R setup

library(tidyverse)
library(StatsBombR)
library(ggsoccer)
library(worldfootballR)

cat("R Setup Complete!\n")
cat(sprintf("R version: %s\n", R.version.string))

# Test StatsBomb connection
comps <- FreeCompetitions()
cat(sprintf("\nStatsBomb competitions available: %d\n", nrow(comps)))

# Test pitch plotting
ggplot() +
  annotate_pitch() +
  theme_pitch() +
  labs(title = "Soccer Pitch - Setup Successful!") +
  coord_fixed(ratio = 1)

cat("\n✓ All packages working correctly!")

IDE Configuration

Visual Studio Code (Python)

Recommended Extensions

  • Python - Microsoft's official Python extension
  • Jupyter - Notebook support
  • Pylance - Fast Python language server
  • Python Indent - Correct Python indentation

Configuration

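{
    // On Windows use "./soccer_env/Scripts/python.exe"
    "python.defaultInterpreterPath": "./soccer_env/bin/python",
    // Formatting requires the Black Formatter extension (ms-python.black-formatter);
    // the older python.linting.* and python.formatting.* settings are deprecated
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true
    }
}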
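
The interpreter path "./soccer_env/bin/python" in the configuration follows the Mac/Linux layout; on Windows the interpreter lives under Scripts instead. As a rough illustration, the hypothetical helper below (venv_python is not a VS Code or Python API) returns the expected path for either layout so you can paste the right value into your settings.

```python
import os
from pathlib import Path


def venv_python(env_dir="soccer_env", os_name=os.name):
    """Return the expected interpreter path inside a venv for the given OS."""
    env = Path(env_dir)
    if os_name == "nt":  # Windows layout
        return env / "Scripts" / "python.exe"
    return env / "bin" / "python"  # macOS and Linux layout


if __name__ == "__main__":
    print(venv_python())
```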

RStudio Configuration

Recommended Settings

Tools → Global Options:

  • General: Uncheck "Restore .RData into workspace at startup"
  • Code → Display: Enable "Show margin" (80 characters)
  • Appearance: Choose your preferred theme
  • Packages: Set CRAN mirror to nearest location

Project Structure

Recommended Directory Layout

soccer_analytics/
│
├── data/
│   ├── raw/              # Original data files
│   ├── processed/        # Cleaned datasets
│   └── external/         # Data from external sources
│
├── notebooks/            # Jupyter/R Markdown files
│   ├── exploratory/      # Initial data exploration
│   └── analysis/         # Final analyses
│
├── scripts/              # Python/R scripts
│   ├── data_loading.py   # Data import functions
│   ├── preprocessing.py  # Data cleaning
│   └── visualization.py  # Plotting functions
│
├── outputs/              # Generated files
│   ├── figures/          # Plots and charts
│   └── reports/          # Analysis reports
│
├── requirements.txt      # Python dependencies
└── README.md             # Project documentation

Creating Project Structure

Bash/Terminal

mkdir -p soccer_analytics/{data/{raw,processed,external},notebooks/{exploratory,analysis},scripts,outputs/{figures,reports}}
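
The brace expansion in the command above is a Bash/Zsh feature and does not work in the Windows Command Prompt. A cross-platform alternative is to build the same layout with Python's pathlib. This is a minimal sketch: create_project and SUBDIRS are names made up for this example, and the directory list simply mirrors the recommended layout.

```python
from pathlib import Path

# Sub-directories from the recommended layout, relative to the project root.
SUBDIRS = [
    "data/raw", "data/processed", "data/external",
    "notebooks/exploratory", "notebooks/analysis",
    "scripts",
    "outputs/figures", "outputs/reports",
]


def create_project(root="soccer_analytics"):
    """Create the directory layout plus empty top-level files; safe to re-run."""
    root_path = Path(root)
    for sub in SUBDIRS:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    for name in ("requirements.txt", "README.md"):
        (root_path / name).touch(exist_ok=True)  # leaves existing files untouched
    return root_path
```

Calling create_project() from the folder where the project should live creates soccer_analytics/ with all sub-directories; running it again leaves existing files and folders as they are.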

Creating a Requirements File

Python: requirements.txt

# Data manipulation
pandas>=1.5.0
numpy>=1.23.0
scipy>=1.9.0

# Visualization
matplotlib>=3.6.0
seaborn>=0.12.0
plotly>=5.11.0

# Soccer analytics
statsbombpy>=1.2.0
mplsoccer>=1.1.0
kloppy>=3.5.0
socceraction>=1.2.0

# Machine learning
scikit-learn>=1.1.0

# Jupyter
jupyter>=1.0.0
notebook>=6.5.0

# Utilities
requests>=2.28.0
beautifulsoup4>=4.11.0

Install from requirements:

pip install -r requirements.txt
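
To see whether the installed versions actually satisfy the minimums in requirements.txt, you can compare them with importlib.metadata. The sketch below is deliberately simple and rests on two assumptions: every requirement uses the name>=version form shown above, and versions are plain dotted numbers. The helper names (parse_requirements, as_tuple, check) are illustrative; for anything more complex, the packaging library is the more robust route.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Yield (name, minimum_version) pairs from 'name>=x.y.z' lines."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*>=\s*([0-9][0-9.]*)", line)
        if match:
            yield match.group(1), match.group(2)


def as_tuple(version):
    """Turn '1.23.0' into (1, 23, 0), padding short versions with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def check(text, get_version=metadata.version):
    """Return {name: (installed_or_None, minimum, ok)} for each requirement."""
    results = {}
    for name, minimum in parse_requirements(text):
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and as_tuple(installed) >= as_tuple(minimum)
        results[name] = (installed, minimum, ok)
    return results


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        for name, (installed, minimum, ok) in check(handle.read()).items():
            print(f"{name:<16} installed={installed} minimum={minimum} ok={ok}")
```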

R: packages.R

# packages.R - Install all required R packages

required_packages <- c(
  # Data manipulation
  "tidyverse",
  "data.table",
  "lubridate",

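  # Soccer data (StatsBombR is not on CRAN; it is installed from GitHub below)
  "worldfootballR",

  # Visualization
  "ggsoccer",
  "ggrepel",
  "patchwork",
  "plotly",
  "scales",

  # Machine learning
  "caret",
  "randomForest",

  # Needed to install packages from GitHub
  "remotes"
)

# Install only the CRAN packages that are not already present
install_if_missing <- function(packages) {
  new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
  if(length(new_packages)) {
    install.packages(new_packages)
  }
}

install_if_missing(required_packages)

# StatsBombR lives on GitHub, so install it separately
if (!requireNamespace("StatsBombR", quietly = TRUE)) {
  remotes::install_github("statsbomb/StatsBombR")
}

cat("All packages installed!\n")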

Run installation script:

source("packages.R")

Testing Your Complete Setup

Python: End-to-End Test

"""
complete_test.py - Comprehensive setup verification
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsbombpy import sb
from mplsoccer import Pitch, VerticalPitch

def test_setup():
    print("Testing Soccer Analytics Environment\n" + "="*50)

    # Test 1: Data loading
    print("\n1. Testing data access...")
    comps = sb.competitions()
    print(f"   ✓ Loaded {len(comps)} competitions")

    # Test 2: Data processing
    print("\n2. Testing data manipulation...")
    matches = sb.matches(competition_id=43, season_id=3)
    print(f"   ✓ Processed {len(matches)} matches")

    # Test 3: Basic analysis
    print("\n3. Testing analysis capabilities...")
    events = sb.events(match_id=matches.iloc[0]['match_id'])
    shots = events[events['type'] == 'Shot']
    print(f"   ✓ Analyzed {len(shots)} shots")

    # Test 4: Visualization
    print("\n4. Testing visualization...")
    pitch = Pitch(pitch_type='statsbomb', pitch_color='grass')
    fig, ax = pitch.draw(figsize=(10, 7))

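    # Plot shots. Each location is an [x, y] list, so pd.notna() would return
    # an array and break the if-test; check the type instead.
    for _, shot in shots.iterrows():
        location = shot['location']
        if isinstance(location, (list, tuple)) and len(location) >= 2:
            x, y = location[:2]
            color = 'red' if shot['shot_outcome'] == 'Goal' else 'white'
            ax.scatter(x, y, c=color, s=100, edgecolors='black', alpha=0.7)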

    ax.set_title('Shot Map - Setup Test', fontsize=16, fontweight='bold')
    plt.savefig('test_plot.png', dpi=150, bbox_inches='tight')
    print("   ✓ Created visualization")
    plt.close()

    print("\n" + "="*50)
    print("All tests passed! Your environment is ready.")
    print("Test plot saved as: test_plot.png")

if __name__ == "__main__":
    test_setup()

R: End-to-End Test

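# complete_test.R - Comprehensive setup verification

# Load the packages this script depends on
library(tidyverse)
library(StatsBombR)
library(ggsoccer)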

test_setup <- function() {
  cat("Testing Soccer Analytics Environment\n")
  cat(strrep("=", 50), "\n")

  # Test 1: Data loading
  cat("\n1. Testing data access...\n")
  comps <- FreeCompetitions()
  cat(sprintf("   ✓ Loaded %d competitions\n", nrow(comps)))

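  # Test 2: Data processing
  cat("\n2. Testing data manipulation...\n")
  # FreeMatches() expects a data frame of competitions, not ID arguments
  competition <- comps %>% filter(competition_id == 43, season_id == 3)
  matches <- FreeMatches(competition)
  cat(sprintf("   ✓ Processed %d matches\n", nrow(matches)))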

  # Test 3: Basic analysis
  cat("\n3. Testing analysis capabilities...\n")
  match_data <- get.matchFree(matches[1, ])
  events <- allclean(match_data)
  shots <- events %>% filter(type.name == "Shot")
  cat(sprintf("   ✓ Analyzed %d shots\n", nrow(shots)))

  # Test 4: Visualization
  cat("\n4. Testing visualization...\n")

  plot <- ggplot(shots, aes(x = location.x, y = location.y,
                             color = shot.outcome.name)) +
    annotate_pitch(dimensions = pitch_statsbomb) +
    geom_point(size = 4, alpha = 0.7) +
    scale_color_manual(values = c(
      "Goal" = "#FF0000",
      "Saved" = "#FFFF00",
      "Off T" = "#FFFFFF",
      "Blocked" = "#808080"
    )) +
    theme_pitch() +
    coord_fixed(ratio = 1) +
    labs(title = "Shot Map - Setup Test",
         color = "Outcome")

  ggsave("test_plot.png", plot, width = 10, height = 7, dpi = 150)
  cat("   ✓ Created visualization\n")

  cat("\n", strrep("=", 50), "\n")
  cat("All tests passed! Your environment is ready.\n")
  cat("Test plot saved as: test_plot.png\n")
}

# Run the test
test_setup()

Common Setup Issues

Python Issues

Issue: "pip not found"

Solution:

python -m ensurepip --upgrade

Issue: Permission errors during installation

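Solution: Activate your virtual environment first; installing into it needs no elevated permissions. Outside a virtual environment, install for your user only (pip rejects the --user flag inside one):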

pip install --user package_name

Issue: StatsBomb data not loading

Solution: Check your internet connection and firewall settings. StatsBomb open data is fetched from GitHub.
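
To separate a network problem from a library problem, you can test the connection directly with the standard library. The sketch below assumes the open data is served from the raw GitHub URL shown in OPEN_DATA_URL; treat that URL, and the helper name can_reach, as assumptions of this example and not as part of the statsbombpy API.

```python
import urllib.request

# Assumption: the StatsBomb open-data repository is reachable at this raw GitHub URL.
OPEN_DATA_URL = (
    "https://raw.githubusercontent.com/statsbomb/open-data/master/data/competitions.json"
)


def can_reach(url=OPEN_DATA_URL, timeout=10):
    """Return True if the URL answers with HTTP 200, False on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, timeouts and refused connections;
        # ValueError is raised for malformed URLs.
        return False


if __name__ == "__main__":
    print(f"StatsBomb open data reachable: {can_reach()}")
```

If this prints False while other websites load normally, a firewall or proxy is the likely cause.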

R Issues

Issue: Package installation fails

Solution: Update R to the latest version, then try:

install.packages("package_name", dependencies = TRUE)

Issue: "Cannot load shared object"

Solution: Install system dependencies (varies by OS):

# Ubuntu/Debian
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev

# macOS
brew install openssl libxml2

Environment Ready!

With your setup complete, you're ready to:

  • Load soccer data from multiple sources
  • Perform comprehensive analyses
  • Create professional visualizations
  • Build machine learning models

Next: Try your first soccer analysis!
