What is Soccer Analytics?

Beginner 10 min read Nov 27, 2025

The Beautiful Game Meets Data Science

Soccer analytics has evolved from simple statistics like goals and assists into sophisticated, data-driven analysis that influences nearly every aspect of the modern game. From tactical decisions to player recruitment, data science is now an integral part of professional football.

Key Areas of Soccer Analytics

  • Performance Analysis: Player and team metrics, physical output, technical actions
  • Tactical Analysis: Formation analysis, pressing patterns, space creation
  • Expected Goals (xG): Shot quality assessment and goal probability
  • Player Recruitment: Data-driven scouting and transfer decisions
  • Match Prediction: Forecasting outcomes using statistical models
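
As a minimal sketch of the match prediction idea, the snippet below treats each team's goal count as an independent Poisson variable and sums the probabilities of every scoreline. The scoring rates (1.6 and 1.1) and the function names are hypothetical and chosen for illustration; they are not fitted to real data.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the average is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_outcome_probs(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumes each side's goals are independent Poisson counts and
    ignores scorelines above max_goals (negligible for typical rates).
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Hypothetical scoring rates, for illustration only
home_win, draw, away_win = match_outcome_probs(1.6, 1.1)
print(f"Home win: {home_win:.2f}, Draw: {draw:.2f}, Away win: {away_win:.2f}")
```

Independence between the two teams' scores is a simplifying assumption; more careful models adjust for correlated scorelines and for team strength estimated from past results.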

Evolution of Soccer Analytics

Early Days: Basic Statistics

Traditional soccer statistics focused on simple counting metrics:

  • Goals, assists, and clean sheets
  • Shots, corners, and possession percentage
  • Yellow and red cards

The xG Revolution

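Expected Goals (xG) changed how chances are judged by asking: "How many goals should a team or player have scored?" Rather than just counting shots, xG assigns each shot a probability of being scored (between 0 and 1), estimated from how often similar historical shots resulted in goals. Summing those probabilities gives the expected goals for a player, team, or match.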

Python: Simple xG Concept

import pandas as pd
import numpy as np

# Simplified xG calculation based on shot distance and angle
def calculate_simple_xg(distance, angle, shot_type='foot'):
    """
    Calculate basic xG value for a shot

    Parameters:
    - distance: Distance from goal in meters
    - angle: Angle to goal in degrees
    - shot_type: 'foot', 'header', or 'freekick'
    """
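    # Base probability decreases with distance (0.1 is an illustrative
    # decay rate, not a fitted coefficient)
    base_prob = np.exp(-0.1 * distance)

    # Adjust for angle (wider angle = better chance); clip so that
    # angles outside 0-90 degrees cannot push the factor outside 0-1
    angle_factor = np.clip(angle / 90, 0.0, 1.0)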

    # Shot type multipliers
    multipliers = {
        'foot': 1.0,
        'header': 0.7,
        'freekick': 0.05
    }

    xg = base_prob * angle_factor * multipliers.get(shot_type, 1.0)
    return min(xg, 1.0)  # Cap at 1.0

# Example shots
shots = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C'],
    'distance': [6, 18, 25],
    'angle': [45, 30, 15],
    'shot_type': ['foot', 'header', 'foot']
})

shots['xG'] = shots.apply(
    lambda row: calculate_simple_xg(row['distance'], row['angle'], row['shot_type']),
    axis=1
)

print("Shot Quality Analysis:")
print(shots)
print(f"\nTotal xG: {shots['xG'].sum():.2f}")
print(f"Average xG per shot: {shots['xG'].mean():.2f}")
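
The hand-picked coefficients above are only a toy. Because xG is meant to reflect historical data, a slightly more faithful sketch is to estimate scoring probability empirically: group past shots into distance bands and take the share that were goals. The shot records, the 10-meter band width, and the function names below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical historical shots: (distance in meters, 1 if goal else 0)
historical_shots = [
    (4, 1), (5, 0), (5, 1), (7, 1), (8, 0),
    (11, 0), (12, 1), (14, 0), (15, 0), (16, 0),
    (21, 0), (22, 0), (24, 0), (27, 0), (29, 0),
]

def distance_band(distance, width=10):
    """Map a distance to the lower edge of its band (0, 10, 20, ...)."""
    return int(distance // width) * width

def fit_empirical_xg(shots, width=10):
    """Return {band: goals / shots} estimated from past shots."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for distance, is_goal in shots:
        band = distance_band(distance, width)
        attempts[band] += 1
        goals[band] += is_goal
    return {band: goals[band] / attempts[band] for band in attempts}

xg_by_band = fit_empirical_xg(historical_shots)
for band in sorted(xg_by_band):
    print(f"{band}-{band + 10} m: xG per shot = {xg_by_band[band]:.2f}")
```

With only five shots per band, the farthest band comes out at exactly zero, which is a reminder that a usable model needs far more shots than this toy sample.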

R: xG Visualization

library(ggplot2)
library(dplyr)

# Create sample xG data (illustrative numbers, not real season totals)
xg_data <- data.frame(
  player = c("Ronaldo", "Messi", "Haaland", "Mbappe", "Kane"),
  goals = c(18, 21, 24, 19, 20),
  xG = c(15.2, 19.8, 22.1, 16.5, 18.9)
)

# Calculate over/underperformance
xg_data$difference <- xg_data$goals - xg_data$xG
xg_data$performance <- ifelse(xg_data$difference > 0, "Outperforming", "Underperforming")

# Create visualization
ggplot(xg_data, aes(x = reorder(player, difference), y = difference, fill = performance)) +
  geom_col() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  coord_flip() +
  scale_fill_manual(values = c("Outperforming" = "#28a745", "Underperforming" = "#dc3545")) +
  labs(
    title = "Player Goal Performance vs Expected Goals",
    subtitle = "Positive values indicate scoring more than expected",
    x = "Player",
    y = "Goals - xG",
    fill = "Performance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

# Print summary statistics
cat("\nPerformance Summary:\n")
print(xg_data %>% arrange(desc(difference)))

Modern Soccer Analytics Capabilities

1. Event Data Analysis

Each on-ball action is logged with pitch coordinates, a timestamp, and context:

  • Passes (successful, failed, direction, length)
  • Shots (location, body part, pressure, technique)
  • Dribbles, tackles, interceptions
  • Defensive actions and clearances
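
To make the idea of event data concrete, here is a small sketch of how such records might be represented and summarized. The `Event` class and its field names (`event_type`, `x`, `y`, `outcome`) are assumptions for illustration; each real data provider uses its own schema and coordinate system.

```python
from dataclasses import dataclass

@dataclass
class Event:
    minute: int
    player: str
    event_type: str  # e.g. "pass", "shot", "tackle"
    x: float         # pitch coordinates, assumed to run 0-100 on both axes
    y: float
    outcome: str     # e.g. "successful", "failed"

# Hypothetical events, not taken from a real match
events = [
    Event(3, "Player A", "pass", 35.0, 50.0, "successful"),
    Event(3, "Player B", "pass", 52.0, 61.0, "failed"),
    Event(7, "Player A", "pass", 60.0, 40.0, "successful"),
    Event(9, "Player A", "shot", 88.0, 48.0, "failed"),
    Event(12, "Player B", "pass", 45.0, 30.0, "successful"),
]

def pass_completion(events, player):
    """Share of a player's passes that were successful, or None if no passes."""
    passes = [e for e in events if e.event_type == "pass" and e.player == player]
    if not passes:
        return None
    completed = sum(e.outcome == "successful" for e in passes)
    return completed / len(passes)

print(pass_completion(events, "Player A"))  # 2 of 2 passes completed
print(pass_completion(events, "Player B"))  # 1 of 2 passes completed
```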

2. Tracking Data

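Optical tracking systems and GPS wearables capture positions many times per second. Optical systems typically record every player and the ball at around 25 frames per second, while GPS units track only the players wearing them, commonly at about 10 Hz: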

  • Distance covered and sprint speed
  • Acceleration and deceleration patterns
  • Space occupation and team shape
  • Pressing intensity and defensive lines
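
The physical metrics in this list can be derived from raw positions. The sketch below computes distance covered and peak speed for one player from consecutive frames, assuming a 25 Hz sampling rate and coordinates in meters. The positions and variable names are hypothetical.

```python
import math

FRAME_RATE_HZ = 25  # assumed sampling rate (frames per second)

# Hypothetical (x, y) positions in meters for one player, one per frame
positions = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.64, 0.18), (0.88, 0.36)]

def step_distances(positions):
    """Distance in meters moved between each pair of consecutive frames."""
    return [math.dist(p, q) for p, q in zip(positions, positions[1:])]

steps = step_distances(positions)
total_distance = sum(steps)                 # meters covered over all frames
max_speed_ms = max(steps) * FRAME_RATE_HZ   # largest per-frame step, in m/s

print(f"Distance covered: {total_distance:.2f} m")
print(f"Peak speed: {max_speed_ms:.1f} m/s ({max_speed_ms * 3.6:.1f} km/h)")
```

Frame-to-frame differences amplify measurement noise, so in practice positions are usually smoothed before speeds and accelerations are computed. That step is left out here to keep the sketch short.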

3. Video Analysis

Computer vision and machine learning extract insights from match footage:

  • Automatic player and ball tracking
  • Action recognition and classification
  • Tactical formation detection
  • Heat maps and movement patterns
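
Once player positions have been extracted, a heat map is essentially a 2D histogram of where a player spent time. The sketch below bins positions into a coarse grid. The pitch dimensions (105 x 68 m), the grid size, and the sample positions are assumptions for illustration.

```python
PITCH_LENGTH = 105.0  # meters, assumed
PITCH_WIDTH = 68.0    # meters, assumed

def build_heat_map(positions, cols=3, rows=2):
    """Count how many positions fall into each cell of a rows x cols grid."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Clamp so points exactly on the far touchline or goal line
        # land in the last cell instead of falling off the grid
        col = min(int(x / PITCH_LENGTH * cols), cols - 1)
        row = min(int(y / PITCH_WIDTH * rows), rows - 1)
        grid[row][col] += 1
    return grid

# Hypothetical (x, y) positions in meters for one player
positions = [(10, 10), (20, 30), (50, 40), (80, 60), (100, 65), (105, 68)]
heat_map = build_heat_map(positions)

for row in heat_map:
    print(row)
```

Each printed row is a horizontal strip of the pitch and each number is a count of observations in that cell. A plotting library would normally render these counts as colors on a much finer grid.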

Real-World Applications

Liverpool FC's Analytics Success

Liverpool's recruitment strategy relies heavily on data analytics. Their signing of Mohamed Salah in 2017 was partly based on advanced metrics showing that his underlying performance at Roma exceeded his goal output. The data team identified him as undervalued, and he went on to win the Premier League Golden Boot in his first season at the club (2017-18).

Leicester City's 2015-16 Title

Leicester City's remarkable Premier League title win is often cited as an example of analytics-supported recruitment and tactics. The club identified undervalued players like N'Golo Kanté and Riyad Mahrez, and its tactical approach leaned on the effectiveness of counter-attacking football.

Getting Started with Soccer Analytics

Essential Skills

  • Programming: Python or R for data manipulation and visualization
  • Statistics: Understanding probability, regression, and hypothesis testing
  • Soccer Knowledge: Understanding tactics, positions, and game dynamics
  • Data Visualization: Creating clear, insightful charts and graphics

Key Metrics to Understand

| Metric | Description | Use Case |
|---|---|---|
| xG | Expected Goals: shot quality measure | Evaluate attacking performance and finishing |
| xA | Expected Assists: quality of passes that lead to shots | Measure creative contribution |
| PPDA | Passes Per Defensive Action: opponent passes allowed per defensive action (lower means more intense pressing) | Measure pressing intensity |
| Progressive Passes | Passes that advance the ball significantly toward the opponent's goal | Evaluate ball progression ability |
| Shot Creating Actions | Actions leading to shots | Measure offensive contribution |
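
As a worked example of one of these metrics, the sketch below computes PPDA from a simplified event list: opponent passes divided by the pressing team's defensive actions. The event tuples, team names, and the set of actions counted are assumptions. Published PPDA figures usually count only events in a defined zone of the pitch, which is omitted here.

```python
# Action types counted as defensive actions (an assumed definition)
DEFENSIVE_ACTIONS = {"tackle", "interception", "challenge", "foul"}

def ppda(events, pressing_team):
    """Opponent passes allowed per defensive action by pressing_team."""
    opponent_passes = sum(
        1 for team, event_type in events
        if team != pressing_team and event_type == "pass"
    )
    defensive_actions = sum(
        1 for team, event_type in events
        if team == pressing_team and event_type in DEFENSIVE_ACTIONS
    )
    if defensive_actions == 0:
        return float("inf")  # no defensive actions: no measurable pressing
    return opponent_passes / defensive_actions

# Hypothetical (team, event_type) events, for illustration only
events = (
    [("Away", "pass")] * 60
    + [("Home", "tackle")] * 3
    + [("Home", "interception")] * 2
    + [("Home", "foul")] * 1
    + [("Home", "pass")] * 30
)

ppda_home = ppda(events, "Home")
print(f"Home PPDA: {ppda_home:.1f}")  # 60 passes / 6 defensive actions
```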

Common Pitfalls to Avoid

  • Context Matters: A 65% pass completion rate might be excellent for a striker but poor for a center-back
  • Sample Size: Don't draw conclusions from 2-3 matches; trends emerge over 10+ games
  • Position Differences: Compare players in similar positions and roles
  • League Adjustments: Stats from different leagues need context for comparison
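
The sample size point can be made concrete with a confidence interval. The sketch below uses the Wilson score interval to compare the uncertainty around a 30% shot conversion rate observed over 10 shots versus 100 shots. The shot counts are hypothetical, and the 95% level (z = 1.96) is an assumed choice.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width

# Same 30% conversion rate, very different amounts of evidence
low_small, high_small = wilson_interval(3, 10)
low_large, high_large = wilson_interval(30, 100)

print(f"3 goals from 10 shots:   {low_small:.2f} to {high_small:.2f}")
print(f"30 goals from 100 shots: {low_large:.2f} to {high_large:.2f}")
```

The interval for the small sample is far wider, so two players with the same observed rate over a handful of matches may differ a lot in their true ability.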

The Future of Soccer Analytics

The field continues to evolve rapidly:

  • AI and Machine Learning: Predicting player development and injury risk
  • Real-time Analysis: Live tactical insights during matches
  • Wearable Technology: Enhanced biometric and performance tracking
  • Fan Engagement: Interactive data experiences for supporters
  • Broadcast Enhancement: Data-driven graphics and commentary

Your Next Steps

Ready to dive deeper? Continue to the next topics to explore:

  • Available soccer data sources and providers
  • Setting up your Python/R environment for soccer analysis
  • Conducting your first match and player analysis
  • Understanding different types of soccer data
