Expected Goals (xG) Fundamentals

Intermediate 10 min read 444 views Nov 25, 2025

What is Expected Goals (xG)?

Expected Goals (xG) is a metric that quantifies the quality of shooting opportunities by assigning a probability (0 to 1) to each shot based on historical conversion rates. An xG value of 0.15 means that historically, similar shots are converted 15% of the time.

xG represents a fundamental shift in soccer analysis: rather than simply counting goals scored (which depends on finishing quality and luck), xG measures the underlying quality of chances a team creates and faces. Over a full season, xG becomes highly predictive of actual goals scored and is one of the most reliable indicators of team performance.

Mathematical Foundation

Basic xG Model:

xG = P(Goal | Shot Features)

Where P represents the probability of scoring given the shot characteristics. Many xG models use logistic regression as a baseline:

xG = 1 / (1 + e^(-β₀ - β₁X₁ - β₂X₂ - ... - βₙXₙ))

Key variables (X₁, X₂, ... Xₙ) typically include:

  • Distance to goal: Shot distance in meters (primary predictor)
  • Angle: Horizontal angle relative to goal center
  • Type: Header, left foot, right foot, or other
  • Assist type: Cross, through ball, dribble, rebound
  • Defensive pressure: Number of defending players nearby
  • Goalkeeper position: Distance from goal line (advanced models)
  • Shot speed/accuracy: Quality of strike (data-dependent)
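
As a minimal sketch of the logistic formula above, the snippet below evaluates xG for a single shot from a handful of coefficients. The coefficient values and feature names are invented for illustration and are not taken from any fitted model.

```python
import math

# Hypothetical coefficients (illustrative only, not from a fitted model).
COEFFS = {
    "intercept": 0.5,      # beta_0
    "distance_m": -0.12,   # farther shots are less likely to score
    "angle_deg": -0.02,    # wider angles are less likely to score
    "is_header": -0.6,     # headers convert less often than foot shots
}

def xg_from_features(distance_m, angle_deg, is_header):
    """Apply xG = 1 / (1 + e^-(b0 + b1*X1 + ... + bn*Xn))."""
    z = (COEFFS["intercept"]
         + COEFFS["distance_m"] * distance_m
         + COEFFS["angle_deg"] * angle_deg
         + COEFFS["is_header"] * is_header)
    return 1.0 / (1.0 + math.exp(-z))

close_shot = xg_from_features(distance_m=8, angle_deg=5, is_header=0)
long_shot = xg_from_features(distance_m=25, angle_deg=45, is_header=0)
print(f"close: {close_shot:.2f}, long: {long_shot:.2f}")
```

In a real model the coefficients come from fitting on many thousands of historical shots; only the shape of the calculation carries over from this sketch.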

Understanding xG Values

xG Range    | Shooting Quality         | Example
0.00 - 0.05 | Very poor shot           | 30+ meter effort from difficult angle
0.05 - 0.15 | Poor to below average    | Long range shot or heavily defended
0.15 - 0.30 | Average shot             | Outside box, moderate angle
0.30 - 0.50 | Good to very good        | Inside box, clear sight of goal
0.50 - 1.00 | Excellent (clear chance) | One-on-one or tap-in opportunity
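
The bands above can be turned into a small lookup, sketched below. Because adjacent ranges share an endpoint, this sketch assumes a boundary value (for example 0.15) belongs to the higher band; that tie-breaking rule is an assumption, not something the table specifies.

```python
# Upper bounds and labels copied from the table above.
XG_BANDS = [
    (0.05, "Very poor shot"),
    (0.15, "Poor to below average"),
    (0.30, "Average shot"),
    (0.50, "Good to very good"),
    (1.00, "Excellent (clear chance)"),
]

def describe_xg(xg):
    """Return the shooting-quality label for an xG value between 0 and 1."""
    if not 0.0 <= xg <= 1.0:
        raise ValueError("xG must be a probability between 0 and 1")
    for upper, label in XG_BANDS:
        if xg < upper:
            return label
    return XG_BANDS[-1][1]  # xg == 1.0 falls in the top band

for value in (0.03, 0.15, 0.42, 0.76):
    print(f"{value:.2f} -> {describe_xg(value)}")
```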

xG for Teams and Players

Team xG Calculation:

Team xG = Σ xG_value for each shot

Interpretation:

  • Attacking xG: Sum of all xG for shots a team takes (xG For)
  • Defensive xG: Sum of all xG for shots opponents take (xG Against)
  • xG Differential: xG For - xG Against (strong predictor of points over a season)
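
The three quantities above reduce to simple sums over a shot log. The sketch below uses a made-up six-shot match; the team names and xG values are hypothetical.

```python
# Hypothetical shot log for one match: (shooting team, xG of the shot).
shots = [
    ("Home", 0.08), ("Home", 0.35), ("Away", 0.12),
    ("Home", 0.55), ("Away", 0.04), ("Away", 0.21),
]

def team_xg_summary(shots, team):
    """Return xG For, xG Against, and xG Differential for one team."""
    xg_for = sum(xg for side, xg in shots if side == team)
    xg_against = sum(xg for side, xg in shots if side != team)
    return {
        "xg_for": xg_for,
        "xg_against": xg_against,
        "xg_diff": xg_for - xg_against,
    }

home = team_xg_summary(shots, "Home")
print(f"xG For: {home['xg_for']:.2f}")
print(f"xG Against: {home['xg_against']:.2f}")
print(f"xG Differential: {home['xg_diff']:+.2f}")
```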

Real-World Examples

Example 1: Liverpool vs Fulham (2023-24 Season)

Liverpool took 22 shots with 2.8 xG but only scored 1 goal. They created quality chances but underperformed their underlying metric due to poor finishing and goalkeeper saves. Over the season, their xG proved more predictive than this single match outcome.
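
To get a feel for how unusual such a result is, the sketch below computes the distribution of goals from a set of shots, treating each shot as an independent event. The per-shot xG values from the match are not given here, so it assumes all 22 shots had equal xG (2.8 / 22 each); that split is a simplifying assumption.

```python
def goal_distribution(shot_xgs):
    """Return [P(0 goals), P(1 goal), ...] assuming independent shots."""
    dist = [1.0]  # before any shot, zero goals is certain
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # shot not scored
            new[k + 1] += prob * p     # shot scored
        dist = new
    return dist

# Assumption: 22 shots of equal quality that sum to 2.8 xG.
shot_xgs = [2.8 / 22] * 22
dist = goal_distribution(shot_xgs)
p_at_most_one = dist[0] + dist[1]
print(f"P(1 goal or fewer) = {p_at_most_one:.2f}")
```

Under these assumptions, scoring one goal or fewer happens roughly one time in five, so a single low-scoring match says little about the quality of the chances created.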

Example 2: Manchester City 2022-23 Season

Manchester City scored 94 goals in 38 league matches (about 2.47 per match), outperforming their xG through elite finishing and tactical execution. Their xG differential of +1.23 per match was a key indicator of their title-winning performance.

Python Code Example


import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sample shot data
shots_data = {
    'distance_m': [12, 18, 25, 8, 16, 22],
    'angle_deg': [10, 25, 45, 5, 15, 35],
    'shot_type': ['right_foot', 'header', 'right_foot', 'left_foot', 'right_foot', 'header'],
    'assist_type': ['cross', 'through_ball', 'dribble', 'cross', 'through_ball', 'rebound'],
    'defenders_nearby': [2, 3, 1, 0, 2, 4],
    'goal': [1, 0, 0, 1, 1, 0]  # 1 = goal scored, 0 = not scored
}

df = pd.DataFrame(shots_data)

# Feature engineering
df['distance_normalized'] = df['distance_m'] / df['distance_m'].max()
df['angle_normalized'] = df['angle_deg'] / 90

# Encode categorical features
shot_type_encoded = pd.get_dummies(df['shot_type'], prefix='shot')
assist_type_encoded = pd.get_dummies(df['assist_type'], prefix='assist')

# Combine features
X = pd.concat([
    df[['distance_normalized', 'angle_normalized', 'defenders_nearby']],
    shot_type_encoded,
    assist_type_encoded
], axis=1)

y = df['goal']

# Train logistic regression model (xG model)
model = LogisticRegression(random_state=42)
model.fit(X, y)

# Calculate xG for each shot
df['xG'] = model.predict_proba(X)[:, 1]

# Team xG
team_xG = df['xG'].sum()
team_goals = df['goal'].sum()

print(f"Team xG: {team_xG:.2f}")
print(f"Actual Goals: {team_goals}")
print(f"xG per Shot: {team_xG/len(df):.2f}")
print("\nShot-by-shot breakdown:")
print(df[['distance_m', 'angle_deg', 'xG', 'goal']])

R Code Example


# R Implementation of xG Calculation

# Install required packages
# install.packages(c("tidyverse", "caret", "pROC"))

library(tidyverse)
library(caret)

# Sample shot data
shots_data <- tibble(
  distance_m = c(12, 18, 25, 8, 16, 22),
  angle_deg = c(10, 25, 45, 5, 15, 35),
  shot_type = c("right_foot", "header", "right_foot", "left_foot", "right_foot", "header"),
  assist_type = c("cross", "through_ball", "dribble", "cross", "through_ball", "rebound"),
  defenders_nearby = c(2, 3, 1, 0, 2, 4),
  goal = c(1, 0, 0, 1, 1, 0)
)

# Feature engineering
shots_processed <- shots_data %>%
  mutate(
    distance_normalized = distance_m / max(distance_m),
    angle_normalized = angle_deg / 90
  ) %>%
  # One-hot encode categorical features
  mutate(
    shot_right_foot = ifelse(shot_type == "right_foot", 1, 0),
    shot_left_foot = ifelse(shot_type == "left_foot", 1, 0),
    shot_header = ifelse(shot_type == "header", 1, 0),
    assist_cross = ifelse(assist_type == "cross", 1, 0),
    assist_through_ball = ifelse(assist_type == "through_ball", 1, 0),
    assist_dribble = ifelse(assist_type == "dribble", 1, 0),
    assist_rebound = ifelse(assist_type == "rebound", 1, 0)
  )

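# Prepare data for modeling
# Note: X and y are shown for reference only; glm() below uses the formula
# interface and builds its own dummy variables from the factor() terms.
X <- shots_processed %>%
  select(distance_normalized, angle_normalized, defenders_nearby,
         shot_right_foot, shot_left_foot, shot_header,
         assist_cross, assist_through_ball, assist_dribble, assist_rebound)

y <- shots_processed$goal

# Train logistic regression (xG model)
# With only six shots the model has more coefficients than observations, so
# expect separation or convergence warnings; this is an illustration only.
xg_model <- glm(goal ~ distance_normalized + angle_normalized + defenders_nearby +
                   factor(shot_type) + factor(assist_type),
                data = shots_processed,
                family = binomial(link = "logit"))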

# Calculate xG predictions
shots_processed$xG <- predict(xg_model, type = "response")

# Team statistics
team_stats <- shots_processed %>%
  summarise(
    Total_xG = sum(xG),
    Actual_Goals = sum(goal),
    Shots = n(),
    xG_per_Shot = sum(xG) / n(),
    Conversion_Rate = sum(goal) / n(),
    Over_Under_Performance = sum(goal) - sum(xG)
  )

print(team_stats)

# Visualization
shots_processed %>%
  ggplot(aes(x = distance_m, y = xG, color = factor(goal))) +
  geom_point(size = 3) +
  scale_color_manual(values = c("0" = "red", "1" = "green"),
                     labels = c("Missed", "Scored")) +
  labs(title = "Shot Quality (xG) vs Distance",
       x = "Distance to Goal (m)",
       y = "Expected Goals (xG)",
       color = "Outcome") +
  theme_minimal()

Interpretation Guidelines

For Attacking Performance:

  • High xG, High Goals: Team is creating quality chances and finishing well. Sustainable performance.
  • High xG, Low Goals: Team is creating quality chances but missing them. Results are likely to improve, although a persistent gap can also reflect strong performances by opposing goalkeepers.
  • Low xG, High Goals: Team is clinical but not creating many chances. Unsustainable; likely to regress.
  • Low xG, Low Goals: Poor chance creation and finishing. Performance likely to continue.

For Defensive Performance:

  • High xG Against: Giving up quality chances; defensive shape or pressure needs improvement.
  • Low xG Against: Strong defensive organization limiting opponent opportunities.

Practical Applications in Top Leagues

Premier League Usage

xG models are now widely used inside Premier League clubs, and public xG data has become standard in match commentary and post-match analysis. Teams use xG for:

  • Evaluating striker recruitment and performance (beyond just goals)
  • Assessing defensive vulnerabilities
  • Understanding match dynamics and tactical effectiveness

Player Evaluation

Shot-Based Evaluation: A striker with 8 goals from 6.2 xG has overperformed by 1.8 goals; one with 4 goals from 5.8 xG has underperformed by 1.8. Over a full season, actual goals and xG tend to converge for most players.
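
The comparison above is a subtraction of xG from goals. A minimal sketch, with the two strikers given placeholder names:

```python
def finishing_delta(goals, xg):
    """Goals minus xG: positive means overperformance, negative under."""
    return goals - xg

# The two strikers from the text; the names are placeholders.
strikers = {"Striker A": (8, 6.2), "Striker B": (4, 5.8)}

for name, (goals, xg) in strikers.items():
    delta = finishing_delta(goals, xg)
    if delta > 0:
        label = "overperforming"
    elif delta < 0:
        label = "underperforming"
    else:
        label = "on par"
    print(f"{name}: {delta:+.1f} goals vs xG ({label})")
```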

Recruitment Implications: Clubs may prefer a player with consistent 0.35 xG per 90 minutes over one with inconsistent 0.50 xG per 90, recognizing that quality and sustainability matter.

Tactical Optimization

Teams compare their xG against different opponent types and tactical setups to identify which formations and strategies generate the highest quality chances. This informs match preparation and in-game adjustments.

Limitations of xG

Model Specificity: xG values differ between data providers (StatsBomb vs Opta) because they use different models and shot categorizations. A 0.30 xG from one provider might be 0.25 from another.

Finishing Quality Variance: xG models use historical averages. Elite finishers (prime Ronaldo, Messi) convert high-xG chances at above-average rates, while struggling strikers underperform.

Goalkeeper Impact: Basic xG models don't account for goalkeeper position or quality (only some advanced models include goalkeeper position). A strong shot-stopper facing 0.5 xG will, on average, concede fewer goals than a weaker goalkeeper facing the same xG.

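Sample Size: Individual match xG can be misleading. A team with 1.2 xG can lose to a team with 0.8 xG, because a single match contains too few shots for chance quality to decide the result reliably. Over 10+ matches, xG becomes far more predictive.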
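
A rough way to quantify single-match noise is to treat each team's goals as an independent Poisson variable whose mean equals its xG. That is a common simplification rather than part of any xG model, and the sketch below uses it to estimate how often the side with 0.8 xG beats the side with 1.2 xG.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals follow Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def match_outcome_probs(xg_a, xg_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) under independent Poisson goals."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, xg_a) * poisson_pmf(j, xg_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss

win, draw, loss = match_outcome_probs(1.2, 0.8)
print(f"1.2 xG side wins: {win:.2f}, draw: {draw:.2f}, 0.8 xG side wins: {loss:.2f}")
```

Under this assumption the lower-xG side still wins a substantial share of matches, which is why single-match xG comparisons should be read with caution.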

Advanced xG Concepts

Shot Map xG: Visualizing xG distribution across the pitch to understand where a team creates chances (e.g., primarily from crosses vs open play).

Progressive Passing to Shots: Linking pass networks to shot creation to identify which players or passing patterns lead to high-xG opportunities.

Pressure-adjusted xG: Adjusting xG for defensive pressure at the moment of shot to better account for rushed attempts.

Code Examples

Simple xG Model

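Basic xG model using logistic regression with distance and angle features

import numpy as np
from sklearn.linear_model import LogisticRegression

def build_xg_model(shots_df):
    """Fit a basic xG model on shot distance and goal angle.

    Expects columns "x", "y" (goal centered at x=100, y=50) and "goal" (0/1).
    """
    shots = shots_df.copy()  # avoid mutating the caller's DataFrame

    # Distance from the shot location to the center of the goal
    shots["distance"] = np.sqrt((shots["x"] - 100)**2 + (shots["y"] - 50)**2)
    # Angle subtended by the 7.32 m goal mouth. This form is exact only for
    # central shots and assumes distance is in meters, so convert 0-100
    # pitch coordinates to meters first for a realistic angle.
    shots["angle"] = 2 * np.arctan2(7.32 / 2, shots["distance"])

    X = shots[["distance", "angle"]]
    y = shots["goal"]

    model = LogisticRegression()
    model.fit(X, y)
    return model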
