Chapter 7: Expected Goals (xG) Models

Learning Objectives

By the end of this chapter, you will be able to:

  1. Explain the concept of expected goals and its theoretical foundation
  2. Identify the key features that influence shot quality and conversion probability
  3. Build xG models from scratch using logistic regression and machine learning techniques
  4. Evaluate model performance using appropriate classification metrics
  5. Calculate and interpret xG at the shot, match, and season level
  6. Apply xG analysis to evaluate player and team performance
  7. Recognize the limitations and common misinterpretations of xG
  8. Compare different xG model approaches and understand their trade-offs

7.1 Introduction to Expected Goals

Expected Goals, universally abbreviated as xG, has become the most influential advanced metric in modern soccer analytics. At its core, xG answers a deceptively simple question: Given the characteristics of a shot, what is the probability that it results in a goal?

7.1.1 The Problem xG Solves

Traditional soccer statistics suffer from severe limitations when measuring offensive performance:

Goals are noisy signals. A team might dominate a match, create numerous high-quality chances, and still lose 1-0 because a single speculative shot deflected in. Over a season, luck tends to even out, but for shorter samples--a single match, a month of fixtures, or a cup competition--the randomness inherent in goal scoring makes raw goal counts unreliable performance indicators.

Shot counts lack context. Knowing that Team A took 15 shots while Team B took 8 tells us something, but it ignores the vast differences in shot quality. A shot from 35 meters with defenders blocking the path is fundamentally different from a tap-in at the six-yard box, yet both count equally as "shots."

Human perception is biased. Fans and analysts alike tend to remember dramatic goals while forgetting saved efforts from identical positions. Our memories weight outcomes rather than processes, making it difficult to objectively assess performance.

xG addresses these problems by assigning each shot a probability value between 0 and 1, representing the historical likelihood that such a shot results in a goal. This transforms raw shot counts into quality-adjusted measures of chance creation.

Intuition: Think of xG like a batting average in baseball. A single at-bat tells you almost nothing, but over hundreds of plate appearances, batting average becomes a reliable indicator of hitting skill. Similarly, a single shot's xG has high variance, but cumulative xG over a season provides a stable measure of chance quality.

7.1.2 The Intuition Behind xG with Detailed Examples

Before exploring the technical history, let us build a strong intuitive understanding of what xG really represents. Imagine placing 100 different professional soccer players in exactly the same shooting situation--the same location, the same defensive arrangement, the same body position. If 12 of those 100 players score, the xG for that shot is 0.12.

Example 1: The tap-in. A striker receives a square pass at the back post with an open goal from 3 meters out. Historically, roughly 93 out of 100 professional players would score from this position. The xG is approximately 0.93. When a player misses this kind of chance, commentators describe it as a "sitter" precisely because almost everyone converts it.

Example 2: The edge-of-the-box curler. A midfielder collects the ball 20 meters from goal, slightly to the right of center, and curls a shot toward the far corner. With two defenders partially blocking the view and the goalkeeper well-positioned, perhaps 4 out of 100 players score. The xG is about 0.04. When a player converts such a shot, we praise the technique--it was genuinely difficult.

Example 3: The penalty. A penalty kick from 12 yards with only the goalkeeper to beat converts roughly 76% of the time across the major European leagues. The xG is approximately 0.76. This is why penalties are often analyzed separately--they represent a completely different game situation.

Example 4: The one-on-one. An attacker runs through on goal, alone against the goalkeeper, receiving a through ball at roughly 15 meters from goal. If the goalkeeper stays on their line, the attacker has a wide angle. Historically, one-on-ones convert at roughly 30-40%, giving an xG of approximately 0.30-0.40. Many fans believe one-on-ones should always be scored, but the data shows otherwise--goalkeepers are effective at narrowing the angle and forcing the shooter into difficult decisions.

Key Insight: xG does not predict what will happen on any individual shot. It tells us what typically happens across thousands of similar situations. The power of xG emerges when we aggregate many shots, washing out the randomness of individual outcomes.
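A small simulation makes this concrete. The sketch below uses made-up xG values for a season's worth of shots, treats each shot as an independent Bernoulli trial with success probability equal to its xG, and repeats the season many times; individual seasons vary, but the goal totals cluster around the summed xG.

import numpy as np

rng = np.random.default_rng(42)

# Illustrative (made-up) xG values for ~400 shots taken over a season
shot_xg = rng.uniform(0.02, 0.45, size=400)

# Each shot is a Bernoulli trial with p = xG; simulate 10,000 seasons
simulated_goals = rng.binomial(1, shot_xg, size=(10_000, len(shot_xg))).sum(axis=1)

print(f"Total xG: {shot_xg.sum():.1f}")
print(f"Simulated goal totals: mean {simulated_goals.mean():.1f}, "
      f"5th-95th percentile {np.percentile(simulated_goals, [5, 95])}")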

7.1.3 History of xG Development

The concept of expected goals evolved gradually within the analytics community, progressing through several distinct generations of models:

Pre-history: Academic Foundations (2000s). Academic researchers in sports economics and statistics began exploring shot quality quantification well before the "expected goals" terminology emerged. Papers on goal-scoring probabilities examined factors such as shot location and match context, but these efforts remained largely confined to academic journals and did not penetrate the broader soccer conversation.

First Generation: The Blogosphere Pioneers (2012-2014). The term "expected goals" gained traction in the soccer analytics blogosphere during this period. Sam Green is widely credited with publishing one of the earliest public xG models, working with Opta data to demonstrate that shot location alone could predict future goal totals better than past goal totals. His work, published in 2012, showed the analytical community that even simple models held remarkable predictive power.

Around the same time, Michael Caley developed his own xG model at Cartilage Free Captain, incorporating additional features beyond shot location. Caley's model became one of the most widely referenced public xG implementations, and his xG match scorelines gained a significant Twitter following. His work was particularly influential in demonstrating how xG could be used to evaluate match performances and identify teams that were over- or underperforming relative to their underlying chance quality.

Other notable early contributors included Mark Taylor (The Power of Goals), who explored the Poisson distribution's application to soccer scoring, and 11tegen11 (Sander IJtsma), who built models incorporating shot angle and distance. Colin Trainor and Constantinos Chappas also made important early contributions to the public understanding of shot-quality metrics.

Second Generation: Commercial Adoption (2015-2017). Commercial data providers (Opta, StatsBomb, Wyscout) began incorporating xG into their offerings. Models became considerably more sophisticated, incorporating additional features like body part, assist type, game state, and whether the shot followed a dribble or a set piece.

StatsBomb's xG model, developed under the leadership of Ted Knutson, became particularly influential. StatsBomb differentiated itself by incorporating "freeze frame" data--snapshots of all visible player positions at the moment of each shot. This allowed their model to account for factors like the number of defenders between the shooter and the goal, the goalkeeper's position, and whether the shooter was under pressure. The StatsBomb xG model became widely regarded as one of the most accurate publicly available implementations.

During this period, Opta (now Stats Perform) also refined their xG model, and Understat launched as a publicly accessible xG database covering the top five European leagues, making xG data freely available to analysts and fans for the first time.

Third Generation: Mainstream Acceptance and Tracking Data (2018-present). xG entered mainstream discourse. Major broadcasters began displaying xG during and after matches. ESPN, Sky Sports, and Amazon Prime all integrated xG into their coverage. Clubs invested heavily in proprietary models, often combining event data with tracking data (player positions captured at 25 frames per second) to build highly granular models.

The Friends of Tracking initiative, led by David Sumpter and others, created open-source educational materials that democratized understanding of xG models. William Spearman's "Beyond Expected Goals" paper at the MIT Sloan Sports Analytics Conference demonstrated how tracking data could be used to build Expected Possession Value (EPV) models that generalized xG to all on-ball actions.

Today, virtually every professional club employs some form of expected goals analysis, and the metric has spawned related measures like Expected Assists (xA), Expected Goals Against (xGA), and Expected Threat (xT).

Historical Note: The public development of xG models represents one of the most successful examples of open-source analytics collaboration in sports. Many of the foundational ideas were developed by unpaid bloggers who shared their methods freely, and several of these pioneers--including Sam Green, Michael Caley, and Ted Knutson--went on to work in professional soccer analytics.

7.1.4 The Fundamental Principle

At its mathematical core, xG relies on a principle familiar from Chapter 3: regression toward the mean. If a player consistently creates 0.5 xG per shot but scores on 70% of their attempts, probability theory tells us they will eventually regress toward the expected rate. Conversely, a clinical finisher averaging 0.3 xG per shot who converts only 15% is likely experiencing bad luck rather than being a poor finisher.

This principle enables xG to serve two distinct purposes:

  1. Descriptive analysis: Measuring the quality of chances created and allowed
  2. Predictive analysis: Forecasting future goals based on chance quality

The tension between these uses--and when to prioritize each--forms a central theme throughout this chapter.

Common Pitfall: Many analysts use xG purely as a predictive tool ("this team will score more goals next month") or purely as a descriptive tool ("this team created better chances today"). Effective use requires understanding which mode is appropriate. For single-match analysis, xG is descriptive. For multi-match forecasting, xG becomes predictive.


7.2 Features That Determine Shot Quality

The predictive power of an xG model depends entirely on the features it incorporates. Understanding which factors influence shot conversion is essential for both building models and interpreting their outputs. This section covers the feature engineering process in depth, discussing not only what features matter but why they matter and how to extract them from raw data.

7.2.1 Distance from Goal

Distance is the single most important predictor of shot conversion. The relationship follows an intuitive pattern: closer shots convert at higher rates.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsbombpy import sb

# Load World Cup 2018 shots
competitions = sb.competitions()
wc_matches = sb.matches(competition_id=43, season_id=3)

# Collect all shots
all_shots = []
for match_id in wc_matches['match_id']:
    events = sb.events(match_id=match_id)
    shots = events[events['type'] == 'Shot'].copy()
    if len(shots) > 0:
        all_shots.append(shots)

shots_df = pd.concat(all_shots, ignore_index=True)

# Extract shot coordinates
shots_df['x'] = shots_df['location'].apply(lambda loc: loc[0] if isinstance(loc, list) else None)
shots_df['y'] = shots_df['location'].apply(lambda loc: loc[1] if isinstance(loc, list) else None)

# Calculate distance to goal center (StatsBomb coordinates: goal at x=120, y=40)
GOAL_X, GOAL_Y = 120, 40
shots_df['distance'] = np.sqrt((GOAL_X - shots_df['x'])**2 + (GOAL_Y - shots_df['y'])**2)

# Create distance bins and calculate conversion rates
shots_df['distance_bin'] = pd.cut(shots_df['distance'], bins=range(0, 45, 5))
shots_df['is_goal'] = (shots_df['shot_outcome'] == 'Goal').astype(int)

conversion_by_distance = shots_df.groupby('distance_bin').agg({
    'is_goal': ['sum', 'count']
}).droplevel(0, axis=1)
conversion_by_distance['rate'] = conversion_by_distance['sum'] / conversion_by_distance['count']

print("Conversion Rate by Distance:")
print(conversion_by_distance)

The data reveals a steep decline in conversion probability as distance increases:

Distance (m) Shots Goals Conversion Rate
0-5 156 78 50.0%
5-10 412 87 23.1%
10-15 389 45 13.6%
15-20 287 19 8.6%
20-25 198 8 4.0%
25-30 134 3 2.2%
30+ 89 1 1.1%

Importantly, the relationship is non-linear. Moving from 5 meters to 10 meters reduces conversion probability far more than moving from 25 meters to 30 meters. This suggests using distance transformations (log, square root) or polynomial terms in models.

Best Practice: Always include transformed distance features (log distance, distance squared) in your models. The raw distance feature alone forces a linear relationship in logistic regression, which poorly approximates the true steep-then-flat decay curve of conversion rates.

7.2.2 Angle to Goal

Shot angle--the visible width of the goal from the shooter's perspective--provides crucial information independent of distance. A shot from 12 meters directly in front of goal has a much wider target than one from the same distance at a sharp angle near the touchline.

def calculate_shot_angle(x, y, goal_x=120, goal_y=40, goal_width=8):
    """
    Calculate the angle to goal in degrees.

    The angle represents the visible width of the goal from the shot location.
    In StatsBomb coordinates the goal mouth is 8 units wide (posts at y=36 and
    y=44), corresponding to the real 7.32 m / 8-yard goal.
    """
    # Goal post positions
    left_post_y = goal_y - goal_width / 2
    right_post_y = goal_y + goal_width / 2

    # Angles to each post (from shooter's perspective)
    angle_to_left = np.arctan2(left_post_y - y, goal_x - x)
    angle_to_right = np.arctan2(right_post_y - y, goal_x - x)

    # Total angle (absolute difference)
    angle = np.abs(angle_to_right - angle_to_left)

    return np.degrees(angle)

shots_df['angle'] = shots_df.apply(
    lambda row: calculate_shot_angle(row['x'], row['y']), axis=1
)

# Analyze angle effect
shots_df['angle_bin'] = pd.cut(shots_df['angle'], bins=[0, 10, 20, 30, 40, 50, 90])
conversion_by_angle = shots_df.groupby('angle_bin').agg({
    'is_goal': ['sum', 'count']
}).droplevel(0, axis=1)
conversion_by_angle['rate'] = conversion_by_angle['sum'] / conversion_by_angle['count']

Shots from narrow angles (< 15 degrees) convert at roughly 5%, while those from wide angles (> 30 degrees) convert at approximately 15-20%. The angle effect is partially captured by distance (central shots are often closer), but provides additional predictive power when included alongside distance.

The interaction between distance and angle is especially important. A shot from 10 meters at a narrow angle may have lower xG than a shot from 12 meters at a wide, central angle. This is why models that include both features outperform those using either alone.
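To see this concretely, the function above can be applied to two illustrative StatsBomb-style locations: a shot roughly 10 units from goal but well wide of center, and a central shot roughly 12 units out. The coordinates here are made up purely for illustration.

# A tight-angle shot ~10 units from goal vs. a central shot ~12 units out
tight_angle = calculate_shot_angle(x=117, y=49.5)   # close to goal, but wide of center
central = calculate_shot_angle(x=108, y=40)         # farther out, straight in front

print(f"Tight-angle shot (~10 from goal): {tight_angle:.1f} degrees")
print(f"Central shot (~12 from goal):     {central:.1f} degrees")

Despite being closer, the tight-angle shot sees a far narrower target, which is exactly the information the angle feature adds on top of distance.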

7.2.3 Body Part

The body part used to take a shot significantly affects conversion probability. Foot shots convert at higher rates than headers, with the dominant foot typically a little ahead of the weaker foot.

# Analyze conversion by body part
body_part_conversion = shots_df.groupby('shot_body_part').agg({
    'is_goal': ['sum', 'count']
}).droplevel(0, axis=1)
body_part_conversion['rate'] = body_part_conversion['sum'] / body_part_conversion['count']
print("\nConversion by Body Part:")
print(body_part_conversion.sort_values('rate', ascending=False))

Typical findings across datasets:

Body Part Conversion Rate Notes
Right Foot 11-13% Most common shot type
Left Foot 10-12% Slightly lower (fewer left-footed players)
Head 7-9% Lower control, harder to direct
Other 15-25% Rare; often tap-ins or deflections

Headers deserve special consideration. While their baseline conversion rate is lower, headers inside the six-yard box convert at very high rates (40%+), suggesting an interaction between body part and location that sophisticated models should capture.
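This interaction can be checked directly by cross-tabulating body part with a coarse range split, reusing the 'distance', 'shot_body_part', and 'is_goal' columns built earlier (the 6-unit cutoff is a rough proxy for the six-yard box):

# Conversion by body part, split into very close range vs. everything else
shots_df['close_range'] = shots_df['distance'] < 6
interaction = (
    shots_df
    .groupby(['shot_body_part', 'close_range'])['is_goal']
    .agg(['mean', 'count'])
    .rename(columns={'mean': 'conversion_rate', 'count': 'shots'})
)
print(interaction)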

7.2.4 Shot Type and Technique

Beyond body part, the technique employed affects probability:

  • Placed/side-foot shots: Generally more accurate but less powerful
  • Driven/laces shots: More powerful but less precise
  • Volleys: Difficult to control; lower conversion unless close range
  • Chips/lobs: Situational; depend heavily on goalkeeper position
  • Overhead kicks: Very low conversion; primarily desperation attempts
  • Half-volleys: Moderate difficulty; timing is critical

StatsBomb and other providers include detailed shot technique classifications that enable models to differentiate these scenarios. Including shot technique as a categorical feature typically improves model ROC AUC by 0.01-0.02.

7.2.5 Assist Type and Buildup

How the shooter received the ball significantly impacts conversion:

Through balls often result in high-xG chances because they put the attacker behind the defense with time to compose a shot. The average xG from shots preceded by through balls is roughly 0.15-0.20, compared to 0.08-0.10 for all shots.

Crosses typically produce lower-xG opportunities because the shooter must redirect the ball while dealing with defensive pressure and a moving ball. The average xG from headed shots following crosses is roughly 0.04-0.06.

Cutbacks (passes from the byline back into the box) create excellent chances because they often find attackers unmarked with the goal in front of them. Cutback-assisted shots typically have an average xG of 0.12-0.18.

Rebounds from saves or blocks vary widely; sometimes they produce tap-ins, other times they leave the ball at awkward heights or positions. Rebound shots average around 0.15 xG, but with very high variance.

Direct from corner kicks or free kicks represent set-piece situations with their own distinct xG profiles. Headers from corners average about 0.03-0.04 xG, while direct free kicks from typical positions (20-25 meters) average about 0.04-0.06 xG.

# Analyze by assist type (simplified)
# StatsBomb links each shot to the pass that created it via 'shot_key_pass_id';
# joining back to that pass event reveals the assist type (through ball, cross, cutback).
if 'shot_key_pass_id' in shots_df.columns:
    shots_df['is_assisted'] = shots_df['shot_key_pass_id'].notna()

# Set-piece shots are flagged directly in 'shot_type'; rebounds have no direct flag
# and must be inferred from the preceding events (e.g., a save moments earlier).
shots_df['is_free_kick'] = shots_df['shot_type'].str.contains('Free Kick', case=False, na=False)

7.2.6 Game State and Context

Situational factors influence shot quality in subtle ways:

Score differential: Teams trailing may take lower-quality shots out of desperation; teams leading may be more selective. Research shows that trailing teams take approximately 15% more shots per match but from slightly worse positions, resulting in lower average xG per shot.

Time remaining: Late-game desperation affects shot selection similar to score differential. In the final 15 minutes, shots from outside the box increase by roughly 20% when a team is trailing, suggesting urgency overrides shot quality discipline.

Player fatigue: Shots late in matches or by substitutes who just entered may differ systematically. Tired defenders leave more space, potentially increasing xG for the shooting team, but tired shooters may be less accurate.

Home/away: Slight venue effects exist, though they're primarily captured through general team performance rather than shot quality. Home teams historically take slightly more shots from central positions, possibly due to crowd influence on referee positioning or defensive confidence.

Previous actions in the possession: The number of passes before a shot, whether a dribble occurred, and the speed of the attack all correlate with shot quality. Fast breaks following turnovers tend to produce higher xG shots because the defense is disorganized.

Callout: Feature Engineering Checklist for xG Models

When building your own xG model, consider including these features, in rough order of importance:

  1. Distance to goal (and log distance)
  2. Angle to goal (and angle in radians)
  3. Body part (categorical)
  4. Shot type/technique (categorical)
  5. Assist type (through ball, cross, cutback, none)
  6. Play pattern (open play, corner, free kick, counter-attack)
  7. Whether the shot was a first-time shot
  8. Number of defenders between shooter and goal (if available)
  9. Goalkeeper position relative to goal center (if available)
  10. Game state (score differential, time remaining)

7.2.7 Goalkeeper and Defensive Context

The most sophisticated xG models incorporate information about defensive positioning:

Goalkeeper position: A goalkeeper off their line dramatically changes the optimal shot selection and conversion probability. When the goalkeeper is more than 2 meters off their line, chip shots become viable and the xG from certain locations can increase by 50% or more. Conversely, a goalkeeper who has closed down the angle effectively can reduce xG substantially.

Defenders in path: The number of defenders between shooter and goal affects both the physical likelihood of the shot reaching the goal and the psychological pressure on the shooter. Each additional defender in the shooting lane reduces conversion probability by roughly 3-5 percentage points.

Defensive pressure: Was the shooter closed down immediately or given time? Shots taken under high pressure convert at roughly half the rate of uncontested shots from the same location.

These features require tracking data or detailed event annotations, which are not universally available. Models with and without this information can differ substantially in their predictions.

Advanced: Tracking data xG models that include goalkeeper position and defender locations typically improve ROC AUC by 0.03-0.05 over event-data-only models. If your organization has tracking data, these features are worth the additional complexity. If not, event-data models still capture the majority of predictive signal.

Real-World Application: StatsBomb's "freeze frame" data captures the positions of all visible players at the moment of each shot. This enables their xG model to account for defensive pressure and goalkeeper positioning, giving it an edge over models using only shot location. Clubs with access to this data can build more accurate internal models.
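As a rough sketch of how freeze-frame data can be turned into a model feature, the function below counts opposing outfield players standing inside the triangle formed by the shot location and the two posts. It assumes statsbombpy's flattened 'shot_freeze_frame' column, where each entry is a list of dicts with 'location', 'teammate', and 'position' keys; verify the exact field layout against your own data before relying on it.

def count_defenders_in_cone(shot_location, freeze_frame,
                            goal_x=120, left_post_y=36, right_post_y=44):
    """Count opposing outfield players inside the shooter-to-posts triangle."""
    if not isinstance(freeze_frame, list):
        return np.nan

    a = np.array(shot_location, dtype=float)          # shooter
    b = np.array([goal_x, left_post_y], dtype=float)  # left post
    c = np.array([goal_x, right_post_y], dtype=float) # right post

    def sign(p1, p2, p3):
        return (p1[0] - p3[0]) * (p2[1] - p3[1]) - (p2[0] - p3[0]) * (p1[1] - p3[1])

    def in_triangle(p):
        d1, d2, d3 = sign(p, a, b), sign(p, b, c), sign(p, c, a)
        has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
        has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
        return not (has_neg and has_pos)

    count = 0
    for player in freeze_frame:
        if player.get('teammate'):
            continue
        if player.get('position', {}).get('name') == 'Goalkeeper':
            continue
        if in_triangle(np.array(player['location'], dtype=float)):
            count += 1
    return count

# Example usage (column names assumed as above):
# shots_df['defenders_in_cone'] = shots_df.apply(
#     lambda row: count_defenders_in_cone(row['location'], row.get('shot_freeze_frame')),
#     axis=1
# )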


7.3 Building an xG Model from Scratch

With feature understanding established, we can now construct an xG model. We'll progress from simple to complex, demonstrating the trade-offs at each level.

7.3.1 Data Preparation

First, we prepare a modeling dataset with relevant features:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

def prepare_shot_data(shots_df):
    """
    Prepare shot data for xG modeling.

    Returns
    -------
    X : pd.DataFrame
        Feature matrix
    y : pd.Series
        Target variable (1 = goal, 0 = no goal)
    """
    df = shots_df.copy()

    # Extract coordinates
    df['x'] = df['location'].apply(lambda loc: loc[0] if isinstance(loc, list) else None)
    df['y'] = df['location'].apply(lambda loc: loc[1] if isinstance(loc, list) else None)

    # Remove rows with missing coordinates
    df = df.dropna(subset=['x', 'y'])

    # Calculate derived features
    GOAL_X, GOAL_Y = 120, 40
    df['distance'] = np.sqrt((GOAL_X - df['x'])**2 + (GOAL_Y - df['y'])**2)
    df['angle'] = df.apply(
        lambda row: calculate_shot_angle(row['x'], row['y']), axis=1
    )

    # Distance from center line (lateral position)
    df['y_abs'] = np.abs(df['y'] - GOAL_Y)

    # Log distance (captures non-linearity)
    df['log_distance'] = np.log(df['distance'] + 1)

    # Angle in radians (sometimes more useful)
    df['angle_radians'] = np.radians(df['angle'])

    # Target variable
    df['is_goal'] = (df['shot_outcome'] == 'Goal').astype(int)

    # Remove penalty kicks (they have their own xG)
    df = df[df['shot_type'] != 'Penalty']

    return df

# Prepare the data
model_data = prepare_shot_data(shots_df)
print(f"Total shots for modeling: {len(model_data)}")
print(f"Goals: {model_data['is_goal'].sum()} ({model_data['is_goal'].mean():.1%})")

7.3.2 Simple Distance-Only Model (Logistic Regression)

The simplest meaningful xG model uses only distance. This serves as our baseline and demonstrates the core logistic regression approach:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score, brier_score_loss

# Split data
X = model_data[['distance']].values
y = model_data['is_goal'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit simple logistic regression
model_simple = LogisticRegression(random_state=42)
model_simple.fit(X_train, y_train)

# Predictions
y_pred_proba = model_simple.predict_proba(X_test)[:, 1]

# Evaluate
print("Simple Distance Model Performance:")
print(f"Log Loss: {log_loss(y_test, y_pred_proba):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_proba):.4f}")
print(f"Brier Score: {brier_score_loss(y_test, y_pred_proba):.4f}")

# Examine coefficients
print(f"\nIntercept: {model_simple.intercept_[0]:.4f}")
print(f"Distance coefficient: {model_simple.coef_[0][0]:.4f}")

The negative distance coefficient confirms that farther shots have lower conversion probability. We can visualize the implied xG curve:

distances = np.linspace(1, 40, 100).reshape(-1, 1)
xg_predictions = model_simple.predict_proba(distances)[:, 1]

plt.figure(figsize=(10, 6))
plt.plot(distances, xg_predictions, 'b-', linewidth=2)
plt.xlabel('Distance to Goal (meters)')
plt.ylabel('Expected Goals (xG)')
plt.title('Simple Distance-Based xG Model')
plt.grid(True, alpha=0.3)
plt.xlim(0, 40)
plt.ylim(0, 0.5)
plt.show()

The logistic regression approach has a key advantage: interpretability. The model output is a direct function of the input features, and the coefficients tell us exactly how each feature contributes to the prediction. For the distance-only model, we can express the xG formula explicitly:

$$xG = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot distance)}}$$

where $\beta_0$ is the intercept and $\beta_1$ is the distance coefficient. This transparency makes logistic regression a popular choice for organizations that need to explain their models to non-technical stakeholders such as coaches and sporting directors.
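We can verify this transparency by reconstructing a prediction by hand from the fitted intercept and coefficient and comparing it with predict_proba; a quick check using the distance-only model above and an arbitrary distance of 11 units:

# Reconstruct the model's prediction for a shot 11 units from goal
distance = 11.0
linear_term = model_simple.intercept_[0] + model_simple.coef_[0][0] * distance
xg_manual = 1 / (1 + np.exp(-linear_term))

xg_sklearn = model_simple.predict_proba(np.array([[distance]]))[0, 1]
print(f"Manual xG:  {xg_manual:.4f}")
print(f"sklearn xG: {xg_sklearn:.4f}")  # should agree to floating-point precision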

7.3.3 Multi-Feature Logistic Regression

Adding angle and body part improves predictions:

# Prepare features
features_lr = ['distance', 'angle', 'log_distance']
categorical_features = ['shot_body_part']

# Filter to rows with all features present
model_subset = model_data.dropna(subset=features_lr + categorical_features)

# Create preprocessing pipeline
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), features_lr),
        ('cat', OneHotEncoder(drop='first', sparse_output=False), categorical_features)
    ]
)

X = model_subset[features_lr + categorical_features]
y = model_subset['is_goal'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit preprocessing on training data only
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

# Fit logistic regression
model_multi = LogisticRegression(random_state=42, max_iter=1000)
model_multi.fit(X_train_processed, y_train)

# Predictions
y_pred_proba_multi = model_multi.predict_proba(X_test_processed)[:, 1]

# Evaluate
print("\nMulti-Feature Logistic Regression Performance:")
print(f"Log Loss: {log_loss(y_test, y_pred_proba_multi):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_proba_multi):.4f}")
print(f"Brier Score: {brier_score_loss(y_test, y_pred_proba_multi):.4f}")

The multi-feature model should show improvement across all metrics, particularly ROC AUC (which measures discriminative ability).

7.3.4 Random Forest Model

Random forests offer a middle ground between the interpretability of logistic regression and the power of gradient boosting. They handle non-linear relationships and feature interactions naturally without requiring explicit feature engineering:

from sklearn.ensemble import RandomForestClassifier

# Random Forest model
features_rf = ['distance', 'angle', 'x', 'y', 'y_abs', 'log_distance']
categorical_features_rf = ['shot_body_part', 'shot_type']

# Prepare data with one-hot encoding
model_subset_rf = model_data.dropna(subset=features_rf + categorical_features_rf)
model_encoded_rf = pd.get_dummies(
    model_subset_rf[features_rf + categorical_features_rf + ['is_goal']],
    columns=categorical_features_rf,
    drop_first=True
)

feature_cols_rf = [c for c in model_encoded_rf.columns if c != 'is_goal']
X_rf = model_encoded_rf[feature_cols_rf].values
y_rf = model_encoded_rf['is_goal'].values

X_train_rf, X_test_rf, y_train_rf, y_test_rf = train_test_split(
    X_rf, y_rf, test_size=0.2, random_state=42, stratify=y_rf
)

# Fit Random Forest
model_rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=8,
    min_samples_leaf=30,
    random_state=42,
    n_jobs=-1
)
model_rf.fit(X_train_rf, y_train_rf)

# Evaluate
y_pred_rf = model_rf.predict_proba(X_test_rf)[:, 1]
print("\nRandom Forest Performance:")
print(f"Log Loss: {log_loss(y_test_rf, y_pred_rf):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test_rf, y_pred_rf):.4f}")
print(f"Brier Score: {brier_score_loss(y_test_rf, y_pred_rf):.4f}")

Random forests tend to produce poorly calibrated probabilities: each tree's leaf outputs the fraction of its training shots that were goals, and averaging many of these noisy estimates pulls predictions away from 0 and 1 toward the middle of the range. Post-hoc calibration (discussed below) is especially important for random forest xG models.

7.3.5 Gradient Boosting Model

For better performance with non-linear interactions, gradient boosting excels:

from sklearn.ensemble import GradientBoostingClassifier

# Additional features for GB model
features_gb = ['distance', 'angle', 'x', 'y', 'y_abs', 'log_distance']
categorical_features_gb = ['shot_body_part', 'shot_type']

# Prepare data
model_subset_gb = model_data.dropna(subset=features_gb + categorical_features_gb)

# One-hot encode categoricals
model_encoded = pd.get_dummies(
    model_subset_gb[features_gb + categorical_features_gb + ['is_goal']],
    columns=categorical_features_gb,
    drop_first=True
)

feature_cols = [c for c in model_encoded.columns if c != 'is_goal']
X = model_encoded[feature_cols].values
y = model_encoded['is_goal'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit Gradient Boosting
model_gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    min_samples_leaf=20,
    random_state=42
)
model_gb.fit(X_train, y_train)

# Predictions
y_pred_proba_gb = model_gb.predict_proba(X_test)[:, 1]

# Evaluate
print("\nGradient Boosting Model Performance:")
print(f"Log Loss: {log_loss(y_test, y_pred_proba_gb):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_proba_gb):.4f}")
print(f"Brier Score: {brier_score_loss(y_test, y_pred_proba_gb):.4f}")

# Feature importance
importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': model_gb.feature_importances_
}).sort_values('importance', ascending=False)

print("\nTop 10 Feature Importances:")
print(importance.head(10))

Gradient boosting typically achieves the best performance among traditional ML approaches, with ROC AUC scores of 0.78-0.82 on typical xG modeling tasks.

Common Pitfall: Beware of overfitting when using gradient boosting with many features on small datasets (e.g., a single tournament). With only a few hundred shots, complex models can memorize the training data. Always use cross-validation and monitor the gap between training and validation metrics. If training log loss is much lower than validation log loss, reduce model complexity.
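A quick way to monitor this is to compare training and test log loss for the fitted model; a persistently large gap is the classic overfitting signature.

# Compare training and test log loss for the gradient boosting model
train_ll = log_loss(y_train, model_gb.predict_proba(X_train)[:, 1])
test_ll = log_loss(y_test, model_gb.predict_proba(X_test)[:, 1])

print(f"Train log loss: {train_ll:.4f}")
print(f"Test log loss:  {test_ll:.4f}")
print(f"Gap: {test_ll - train_ll:.4f}  (large positive gaps suggest overfitting)")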

7.3.6 Neural Network Approaches

Deep learning has been applied to xG modeling with varying degrees of success. Neural networks can automatically learn complex feature interactions, but they require more data and careful tuning:

import tensorflow as tf
from tensorflow import keras

def build_xg_neural_network(input_dim):
    """
    Build a neural network for xG prediction.

    Architecture: Two hidden layers with dropout for regularization.
    """
    model = keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(16, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['AUC']
    )

    return model

# Training (conceptual)
# model = build_xg_neural_network(input_dim=len(features))
# history = model.fit(X_train, y_train, epochs=50, batch_size=32,
#                     validation_split=0.2, verbose=1)

In practice, neural networks for xG rarely outperform well-tuned gradient boosting models when using only event data features. The advantages of neural networks become more apparent when working with tracking data or image-based inputs (such as frame snapshots from broadcast video). Some research groups have experimented with convolutional neural networks (CNNs) that take a rasterized image of the pitch state as input, allowing the model to learn spatial patterns directly.

Callout: Model Architecture Comparison

  • Logistic Regression: ROC AUC 0.72-0.76; calibration excellent; interpretability high; training data needed low (~5,000 shots)
  • Random Forest: ROC AUC 0.75-0.79; calibration poor (needs post-hoc calibration); interpretability medium; training data needed medium (~10,000 shots)
  • Gradient Boosting: ROC AUC 0.78-0.82; calibration good; interpretability low; training data needed medium (~10,000 shots)
  • Neural Network: ROC AUC 0.77-0.83; calibration variable; interpretability very low; training data needed high (~50,000+ shots)

7.3.7 Calibration: Ensuring Probabilities Are Accurate

A well-calibrated model means that among all shots it assigns 0.15 xG, approximately 15% should actually be goals. Calibration is essential for xG because we aggregate predictions:

from sklearn.calibration import calibration_curve, CalibratedClassifierCV

# Check calibration of GB model
prob_true, prob_pred = calibration_curve(y_test, y_pred_proba_gb, n_bins=10)

plt.figure(figsize=(8, 8))
plt.plot([0, 1], [0, 1], 'k--', label='Perfectly calibrated')
plt.plot(prob_pred, prob_true, 'bo-', label='Gradient Boosting')
plt.xlabel('Mean Predicted Probability')
plt.ylabel('Fraction of Positives')
plt.title('Calibration Curve')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# If calibration is poor, use Platt scaling or isotonic regression
calibrated_model = CalibratedClassifierCV(model_gb, method='isotonic', cv=5)
calibrated_model.fit(X_train, y_train)

y_pred_calibrated = calibrated_model.predict_proba(X_test)[:, 1]
print(f"\nCalibrated Brier Score: {brier_score_loss(y_test, y_pred_calibrated):.4f}")

There are two main approaches to post-hoc calibration:

Platt Scaling fits a logistic regression on top of the model's raw predictions. It works well when the calibration curve is roughly sigmoidal (S-shaped). This is often the case for support vector machines and sometimes for gradient boosting.

Isotonic Regression fits a non-parametric, monotonically increasing function to the predictions. It is more flexible than Platt scaling and can correct arbitrarily shaped calibration curves, but it requires more data to avoid overfitting.

For xG models, isotonic regression is generally preferred because the calibration issues tend to be non-uniform across the probability range.
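Because the better method is ultimately an empirical question, it is worth fitting both on the same model and comparing held-out metrics; a short sketch using the gradient boosting model and split from above:

# Compare Platt scaling ('sigmoid') and isotonic regression on the same base model
for method in ['sigmoid', 'isotonic']:
    calibrated = CalibratedClassifierCV(model_gb, method=method, cv=5)
    calibrated.fit(X_train, y_train)
    preds = calibrated.predict_proba(X_test)[:, 1]
    print(f"{method:>8}: Brier = {brier_score_loss(y_test, preds):.4f}, "
          f"Log Loss = {log_loss(y_test, preds):.4f}")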

7.3.8 Cross-Validation for Robust Evaluation

Single train-test splits can be misleading. Cross-validation provides more reliable estimates:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation
cv_scores_ll = cross_val_score(
    model_gb, X, y, cv=5, scoring='neg_log_loss'
)
cv_scores_auc = cross_val_score(
    model_gb, X, y, cv=5, scoring='roc_auc'
)

print("\n5-Fold Cross-Validation Results:")
print(f"Log Loss: {-cv_scores_ll.mean():.4f} (+/- {cv_scores_ll.std():.4f})")
print(f"ROC AUC: {cv_scores_auc.mean():.4f} (+/- {cv_scores_auc.std():.4f})")

When working with soccer data specifically, consider using temporal cross-validation rather than random splits. Train on earlier seasons and test on later ones to better simulate real-world deployment, where the model must predict future shots based on historical patterns.
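A minimal sketch of such a split, assuming (hypothetically) that a 'season' column has been attached to the encoded modeling DataFrame; adapt the column name to whatever season or date field your data actually carries:

# Hypothetical temporal split: train on earlier seasons, test on the most recent one
latest_season = model_encoded['season'].max()
train_mask = model_encoded['season'] < latest_season

feature_cols_t = [c for c in feature_cols if c != 'season']
X_tr = model_encoded.loc[train_mask, feature_cols_t].values
y_tr = model_encoded.loc[train_mask, 'is_goal'].values
X_te = model_encoded.loc[~train_mask, feature_cols_t].values
y_te = model_encoded.loc[~train_mask, 'is_goal'].values

model_temporal = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=4,
    min_samples_leaf=20, random_state=42
)
model_temporal.fit(X_tr, y_tr)
preds_te = model_temporal.predict_proba(X_te)[:, 1]
print(f"Temporal-split log loss: {log_loss(y_te, preds_te):.4f}")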


7.4 Evaluating xG Model Performance

Understanding how to assess xG model quality requires examining multiple metrics, each capturing different aspects of performance.

7.4.1 Log Loss (Cross-Entropy)

Log loss is the primary metric for probability prediction models:

$$\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1-y_i) \log(1-p_i)]$$

Where $y_i$ is the actual outcome (0 or 1) and $p_i$ is the predicted probability.

Log loss severely penalizes confident wrong predictions. If you predict 0.99 xG and the shot misses, the penalty is enormous. This makes log loss excellent for ensuring calibration.

Typical values:

  • Random guessing (predicting the base rate): ~0.35
  • Simple distance model: ~0.30
  • Good xG model: ~0.26-0.28
  • Excellent model with tracking data: ~0.24-0.26
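The penalty for confident wrong predictions is easy to see by evaluating a single missed shot at different predicted probabilities:

# Log loss contribution of one missed shot (outcome = 0) at various predicted xG values
for p in [0.10, 0.50, 0.90, 0.99]:
    penalty = -np.log(1 - p)
    print(f"Predicted xG {p:.2f}, shot missed -> log loss contribution {penalty:.2f}")

A confident 0.99 prediction that misses costs roughly 4.6, while an uncertain 0.50 prediction that misses costs about 0.7.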

7.4.2 ROC AUC

The Area Under the Receiver Operating Characteristic Curve measures discriminative ability--how well the model ranks shots by quality:

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba_gb)

plt.figure(figsize=(8, 8))
plt.plot(fpr, tpr, 'b-', linewidth=2, label=f'GB Model (AUC = {roc_auc_score(y_test, y_pred_proba_gb):.3f})')
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Interpretation:

  • AUC = 0.5: No discrimination (random)
  • AUC = 0.7-0.8: Acceptable discrimination
  • AUC = 0.8-0.9: Good discrimination
  • AUC > 0.9: Excellent (rarely achieved in xG)

7.4.3 Brier Score

The Brier score is the mean squared error between predictions and outcomes:

$$\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2$$

Lower is better. Brier score combines calibration and discrimination into a single metric. It can also be decomposed into three components: reliability (how well calibrated the predictions are), resolution (how much predictions vary from the base rate), and uncertainty (the inherent variability of the outcome). This decomposition, known as the Murphy decomposition, is useful for diagnosing where a model falls short.

Typical values:

  • Base rate prediction: ~0.08-0.09
  • Good xG model: ~0.07-0.08
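A rough sketch of the Murphy decomposition, computed from binned test-set predictions of the gradient boosting model; because forecasts within each bin are not identical, the three terms recombine only approximately into the Brier score:

def murphy_decomposition(y_true, y_pred, n_bins=10):
    """Approximate reliability/resolution/uncertainty decomposition of the Brier score."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    base_rate = y_true.mean()

    bins = np.clip((y_pred * n_bins).astype(int), 0, n_bins - 1)
    reliability = resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.sum() == 0:
            continue
        f_k = y_pred[mask].mean()   # mean forecast in the bin
        o_k = y_true[mask].mean()   # observed goal rate in the bin
        reliability += mask.sum() * (f_k - o_k) ** 2
        resolution += mask.sum() * (o_k - base_rate) ** 2

    n = len(y_true)
    return reliability / n, resolution / n, base_rate * (1 - base_rate)

rel, res, unc = murphy_decomposition(y_test, y_pred_proba_gb)
print(f"Reliability: {rel:.4f}  Resolution: {res:.4f}  Uncertainty: {unc:.4f}")
print(f"Reliability - Resolution + Uncertainty: {rel - res + unc:.4f}")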

7.4.4 Comparing to Baselines

Always compare your model against meaningful baselines:

# Baseline 1: Predict base rate for everything
base_rate = y_train.mean()
baseline_predictions = np.full(len(y_test), base_rate)

print("Baseline Comparison:")
print(f"\nBase Rate Prediction ({base_rate:.3f} for all shots):")
print(f"  Log Loss: {log_loss(y_test, baseline_predictions):.4f}")
print(f"  Brier Score: {brier_score_loss(y_test, baseline_predictions):.4f}")

# Baseline 2: Use StatsBomb's xG (if available)
# Note: the earlier split was done on NumPy arrays, so we re-split the DataFrame's
# index with the same settings to recover which rows landed in the test set.
if 'shot_statsbomb_xg' in model_subset_gb.columns:
    idx_train, idx_test = train_test_split(
        model_encoded.index.to_numpy(), test_size=0.2, random_state=42, stratify=y
    )
    statsbomb_xg = (
        model_subset_gb.loc[idx_test, 'shot_statsbomb_xg']
        .fillna(0).clip(0.001, 0.999).values  # avoid log(0) in log loss
    )
    print(f"\nStatsBomb xG:")
    print(f"  Log Loss: {log_loss(y_test, statsbomb_xg):.4f}")
    print(f"  Brier Score: {brier_score_loss(y_test, statsbomb_xg):.4f}")

print(f"\nOur GB Model:")
print(f"  Log Loss: {log_loss(y_test, y_pred_proba_gb):.4f}")
print(f"  Brier Score: {brier_score_loss(y_test, y_pred_proba_gb):.4f}")

7.4.5 Lift Curves and Cumulative Gains

For practical application, lift curves show how well the model identifies high-xG opportunities:

def plot_lift_curve(y_true, y_pred, n_bins=10):
    """Plot lift curve showing model performance vs random."""
    df = pd.DataFrame({'true': y_true, 'pred': y_pred})
    df['decile'] = pd.qcut(df['pred'], n_bins, labels=False, duplicates='drop')

    lift = df.groupby('decile')['true'].mean() / y_true.mean()

    plt.figure(figsize=(10, 6))
    plt.bar(range(len(lift)), lift.values)
    plt.axhline(y=1, color='r', linestyle='--', label='Random')
    plt.xlabel('xG Decile (0=lowest, 9=highest)')
    plt.ylabel('Lift (vs. baseline)')
    plt.title('Lift by xG Decile')
    plt.legend()
    plt.show()

    return lift

lift = plot_lift_curve(y_test, y_pred_proba_gb)

A good model should show lift > 2 in the highest decile (top 10% of xG predictions should convert at more than twice the baseline rate).

Best Practice: When evaluating xG models, always check multiple metrics. A model with excellent ROC AUC but poor calibration will rank shots correctly but assign inaccurate probabilities--making aggregated team xG unreliable. Conversely, a well-calibrated model with mediocre AUC may give accurate team totals but fail to distinguish individual shot quality. For production use, prioritize calibration (Brier score) because xG values are almost always summed.


7.5 Interpreting xG at Different Levels

xG gains meaning when aggregated appropriately. The interpretation differs significantly depending on whether we're examining individual shots, matches, or seasons.

7.5.1 Shot-Level xG

At the shot level, xG represents the probability that a specific shot results in a goal. Key considerations:

Single shots are highly variable. A 0.30 xG shot will miss 70% of the time. This isn't model failure--it's the nature of probability.

xG doesn't account for everything. The model doesn't know the shooter's skill level, their confidence, or whether they saw the goalkeeper move. It provides a population-level estimate.

def describe_shot_xg(xg_value):
    """Provide context for a shot's xG value."""
    if xg_value >= 0.40:
        quality = "Excellent chance (big chance)"
    elif xg_value >= 0.20:
        quality = "Good opportunity"
    elif xg_value >= 0.10:
        quality = "Moderate chance"
    elif xg_value >= 0.05:
        quality = "Difficult shot"
    else:
        quality = "Low-quality attempt"

    print(f"xG: {xg_value:.2f}")
    print(f"Quality: {quality}")
    print(f"Expected goals from 100 identical shots: {xg_value * 100:.1f}")

7.5.2 Match-Level xG

Summing xG across all shots in a match gives team-level expected goals:

def calculate_match_xg(events, home_team, away_team):
    """Calculate match-level xG summary."""
    shots = events[events['type'] == 'Shot'].copy()

    # Exclude own goals and assign xG
    shots = shots[shots['shot_type'] != 'Own Goal']
    shots['xg'] = shots['shot_statsbomb_xg'].fillna(0)

    home_xg = shots[shots['team'] == home_team]['xg'].sum()
    away_xg = shots[shots['team'] == away_team]['xg'].sum()

    home_goals = ((shots['team'] == home_team) &
                  (shots['shot_outcome'] == 'Goal')).sum()
    away_goals = ((shots['team'] == away_team) &
                  (shots['shot_outcome'] == 'Goal')).sum()

    return {
        'home_team': home_team,
        'away_team': away_team,
        'home_xg': round(home_xg, 2),
        'away_xg': round(away_xg, 2),
        'home_goals': home_goals,
        'away_goals': away_goals,
        'home_xg_diff': round(home_goals - home_xg, 2),
        'away_xg_diff': round(away_goals - away_xg, 2)
    }

Interpreting match xG:

A match where Team A has 2.5 xG and Team B has 0.8 xG suggests Team A created substantially better chances, regardless of the actual score. However:

  • Single-match variance is high. Even a 2.0 xG advantage only translates to winning roughly 65-70% of the time (the simulation sketch after this list illustrates why).
  • xG doesn't capture shot-stopping. A goalkeeper having an exceptional day will cause actual goals to differ from xG.
  • Own goals aren't in xG. Most models exclude own goals, which can affect match outcomes.
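One way to put a number on that single-match variance is to treat each team's goals as roughly Poisson-distributed with mean equal to its xG total and simulate the match many times. This is a simplification (summing per-shot Bernoulli draws is more faithful when shot-level xG is available, and it ignores game-state effects), but it illustrates how often the "better" team fails to win.

def simulate_match_outcome(home_xg, away_xg, n_sims=100_000, seed=42):
    """Estimate win/draw/loss probabilities from team xG totals via Poisson simulation."""
    rng = np.random.default_rng(seed)
    home_goals = rng.poisson(home_xg, n_sims)
    away_goals = rng.poisson(away_xg, n_sims)
    return {
        'home_win': round((home_goals > away_goals).mean(), 3),
        'draw': round((home_goals == away_goals).mean(), 3),
        'away_win': round((home_goals < away_goals).mean(), 3),
    }

# The 2.5 vs 0.8 example from above: a clear xG edge, yet well short of a guaranteed win
print(simulate_match_outcome(2.5, 0.8))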

7.5.3 xG Timeline Visualization and Match Narratives

One of the most powerful applications of match-level xG is the xG timeline--a cumulative plot of each team's xG throughout the match. These visualizations have become ubiquitous in post-match analysis and tell compelling stories about how matches unfolded.

def plot_xg_timeline(shots_df, home_team, away_team):
    """
    Plot cumulative xG timeline for a match.
    """
    fig, ax = plt.subplots(figsize=(14, 6))

    for team, color in [(home_team, 'blue'), (away_team, 'red')]:
        team_shots = shots_df[shots_df['team'] == team].sort_values('minute')
        team_shots['cumulative_xg'] = team_shots['shot_statsbomb_xg'].cumsum()

        # Step plot for cumulative xG
        minutes = [0] + team_shots['minute'].tolist() + [90]
        cum_xg = [0] + team_shots['cumulative_xg'].tolist() + [team_shots['cumulative_xg'].iloc[-1]]

        ax.step(minutes, cum_xg, where='post', color=color, linewidth=2, label=team)

        # Mark goals with larger dots
        goals = team_shots[team_shots['shot_outcome'] == 'Goal']
        if len(goals) > 0:
            ax.scatter(goals['minute'], goals['cumulative_xg'],
                      color=color, s=100, zorder=5, edgecolors='black')

    ax.set_xlabel('Minute')
    ax.set_ylabel('Cumulative xG')
    ax.set_title(f'{home_team} vs {away_team} - xG Timeline')
    ax.legend()
    ax.set_xlim(0, 95)
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    return fig

An xG timeline reveals several match narratives that a final score cannot:

  • Dominant performance with bad luck: A team with 3.0 xG that loses 0-1 had many high-quality chances that simply did not convert. The timeline shows a steep climb for the losing team, indicating they were unlucky rather than outplayed.
  • Smash and grab: A team with 0.5 xG that wins 1-0 scored from their only meaningful chance. The timeline shows a flat line with a single spike at the goal.
  • Two-half match: The timeline might show one team dominating the first half and the other taking over in the second, something invisible in the final score.

Callout: Reading xG Timelines

When analyzing an xG timeline, look for:

  • Steepness of the curve: Steeper climbs indicate periods of sustained pressure and chance creation.
  • Large jumps: Single vertical spikes indicate "big chances" (xG > 0.30) such as one-on-ones or close-range headers.
  • Flat periods: Extended flat sections indicate periods without meaningful shot creation.
  • Final gap: The difference between the two teams' cumulative xG at full time summarizes the overall balance of chances.

7.5.4 Season-Level xG

Over a full season, xG becomes highly reliable:

def analyze_season_xg(match_data):
    """Analyze season-level xG patterns."""
    # Calculate cumulative xG and goals
    teams = match_data['team'].unique()

    season_summary = []
    for team in teams:
        team_matches = match_data[match_data['team'] == team]

        total_xg = team_matches['xg_for'].sum()
        total_goals = team_matches['goals_for'].sum()
        total_xga = team_matches['xg_against'].sum()
        total_ga = team_matches['goals_against'].sum()

        season_summary.append({
            'team': team,
            'matches': len(team_matches),
            'goals': total_goals,
            'xG': round(total_xg, 1),
            'xG_diff': round(total_goals - total_xg, 1),
            'goals_against': total_ga,
            'xGA': round(total_xga, 1),
            'xGA_diff': round(total_ga - total_xga, 1),
        })

    return pd.DataFrame(season_summary).sort_values('xG', ascending=False)

Key season-level insights:

  • xG is predictive of future goals. A team that underperforms xG by 10 goals in the first half is likely to score closer to xG in the second half.
  • Large xG over/underperformance is unsustainable. Teams rarely deviate from xG by more than plus or minus 10% over a full season.
  • xG tables often diverge from actual tables. This difference highlights teams that were lucky (positive goal diff vs xG diff) or unlucky (negative).

7.6 Team-Level xG Analysis

7.6.1 xG For, xG Against, and xG Difference

The most common team-level xG metrics are:

xG For (xGF): The total expected goals from all shots taken by the team. This measures attacking chance creation quality.

xG Against (xGA): The total expected goals from all shots taken against the team. This measures defensive vulnerability.

xG Difference (xGD): xGF minus xGA. The best single predictor of team quality. Teams with positive xGD are creating better chances than they allow, and over time, they tend to accumulate more points.

def create_xg_table(season_data):
    """Create league table based on xG."""
    season_data['xG_margin'] = season_data['xG'] - season_data['xGA']

    # Approximate points based on xG margin
    # (A more sophisticated approach would simulate matches)
    season_data['expected_ppg'] = 1.5 + (season_data['xG_margin'] / season_data['matches']) * 0.8
    season_data['expected_points'] = (season_data['expected_ppg'] * season_data['matches']).round(0)

    return season_data.sort_values('expected_points', ascending=False)

Research has consistently shown that xG difference is a better predictor of future points than actual goal difference. This is because actual goals contain more noise (from finishing variance, goalkeeper performance, and luck) than xG, which strips away that noise to measure the underlying process.

7.6.2 Team Shot Profiles

Beyond aggregate numbers, examining how a team generates xG provides tactical insight:

def analyze_team_shot_profile(shots_df, team):
    """Analyze a team's shot profile."""
    team_shots = shots_df[shots_df['team'] == team].copy()

    analysis = {
        'total_shots': len(team_shots),
        'total_xg': team_shots['shot_statsbomb_xg'].sum(),
        'xg_per_shot': team_shots['shot_statsbomb_xg'].mean(),
        'shots_inside_box': (team_shots['distance'] < 18).sum(),
        'box_shot_pct': (team_shots['distance'] < 18).mean(),
        'header_pct': (team_shots['shot_body_part'] == 'Head').mean(),
    }

    # Shot location breakdown
    team_shots['zone'] = pd.cut(
        team_shots['distance'],
        bins=[0, 6, 12, 18, 25, 100],
        labels=['6-yard', 'Close', 'Edge of box', 'Outside', 'Long range']
    )
    zone_breakdown = team_shots.groupby('zone')['shot_statsbomb_xg'].agg(['count', 'sum', 'mean'])

    return analysis, zone_breakdown

Teams with high xG per shot are efficient in their chance creation--they are working the ball into dangerous positions before shooting. Teams with many shots but low xG per shot may be settling for low-quality attempts from distance, indicating either a tactical choice or an inability to penetrate the opposition defense.

Callout: Team xG Archetypes

  • High volume, high quality (e.g., peak Manchester City under Guardiola): Many shots, mostly from inside the box. Very high total xG.
  • Low volume, high quality (e.g., Atletico Madrid under Simeone): Fewer shots, but almost all from dangerous positions. Moderate total xG but elite xG per shot.
  • High volume, low quality (e.g., some mid-table teams): Many shots from distance, inflating shot counts without generating proportional xG.
  • Counter-attacking efficiency (e.g., Leicester City 2015-16): Few total shots, but a high proportion of "big chances" from fast breaks.

7.7 Player-Level xG Analysis

7.7.1 xG Per 90 Minutes

The most basic player xG metric normalizes total xG by playing time:

$$xG_{\text{per 90}} = \frac{xG_{\text{total}}}{\text{Minutes Played} / 90}$$

This allows comparison between players with different playing times. Typical values for strikers range from 0.30 to 0.70 xG per 90, with elite number nines exceeding 0.60. Wingers typically range from 0.15 to 0.40.

7.7.2 xG Per Shot

xG per shot measures the average quality of a player's shooting opportunities:

$$xG_{\text{per shot}} = \frac{xG_{\text{total}}}{\text{Total Shots}}$$

A high xG per shot indicates a player who is selective and only shoots from good positions, or a player who receives excellent service. A low xG per shot might indicate a player who takes many speculative efforts from distance.

Typical values:

  • Long-range shooters (e.g., some central midfielders): 0.04-0.06 xG per shot
  • Average forwards: 0.08-0.12 xG per shot
  • Poachers/target men: 0.12-0.18 xG per shot
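A short worked example ties the two formulas together, using made-up numbers for a striker with 12.4 xG from 2,430 minutes and 95 shots:

# Worked example (made-up numbers)
total_xg, minutes, shots = 12.4, 2430, 95

xg_per90 = total_xg / (minutes / 90)
xg_per_shot = total_xg / shots

print(f"xG per 90:   {xg_per90:.2f}")    # ~0.46: solid starting-striker output
print(f"xG per shot: {xg_per_shot:.2f}")  # ~0.13: selective, poacher-like shot diet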

7.7.3 xG Overperformance and Finishing Skill

The difference between a player's actual goals and expected goals reveals their finishing ability:

def analyze_player_finishing(shots_df, min_shots=20):
    """
    Analyze player finishing skill using goals vs xG.

    Parameters
    ----------
    shots_df : pd.DataFrame
        Shot data with player and xG columns
    min_shots : int
        Minimum shots for inclusion

    Returns
    -------
    pd.DataFrame
        Player finishing analysis
    """
    player_shots = shots_df.groupby('player').agg({
        'shot_statsbomb_xg': 'sum',
        'is_goal': 'sum',
        'id': 'count'
    }).rename(columns={
        'shot_statsbomb_xg': 'xG',
        'is_goal': 'goals',
        'id': 'shots'
    })

    # Filter to minimum shots
    player_shots = player_shots[player_shots['shots'] >= min_shots]

    # Calculate metrics
    player_shots['goals_minus_xg'] = player_shots['goals'] - player_shots['xG']
    player_shots['conversion_rate'] = player_shots['goals'] / player_shots['shots']
    player_shots['xg_per_shot'] = player_shots['xG'] / player_shots['shots']

    # Goals per xG (finishing skill indicator)
    player_shots['goals_per_xg'] = player_shots['goals'] / player_shots['xG']

    return player_shots.sort_values('goals_minus_xg', ascending=False).round(2)

Interpretation cautions:

  • Sample size matters enormously. Even 50 shots provides insufficient data to reliably identify finishing skill. Most "elite finishers" regress toward the mean over time.
  • Shot selection conflates with finishing. A player who only shoots from high-xG positions will appear to be a good finisher even if their technique is average.
  • Context varies. Playing for a dominant team affects the types of chances received.

Research suggests genuine finishing skill exists but is smaller than commonly believed--perhaps adding 0.01-0.02 xG per shot for truly elite finishers. A study by StatsBomb found that the year-on-year correlation of goals minus xG for individual players is roughly 0.20-0.30, indicating that most xG outperformance is not persistent.

Real-World Application: In recruitment, clubs often look for players who outperform their xG. However, data-driven clubs have learned that most "elite finishers" regress toward the mean within 1-2 seasons. The exception is a small group of truly world-class finishers (e.g., Messi, Lewandowski) who sustain outperformance over 5+ seasons. When scouting, prioritize players who consistently generate high xG (good positioning) over those who merely outperform low xG (possibly lucky).

7.7.4 Practical Application: Scouting with xG Metrics

xG metrics provide a powerful framework for player scouting across multiple dimensions:

def create_striker_scouting_report(player_data, min_minutes=900):
    """
    Create a comprehensive striker scouting report using xG metrics.
    """
    df = player_data[player_data['minutes'] >= min_minutes].copy()

    # Per-90 metrics
    df['xg_per90'] = df['xG'] / (df['minutes'] / 90)
    df['shots_per90'] = df['shots'] / (df['minutes'] / 90)
    df['xg_per_shot'] = df['xG'] / df['shots']
    df['goals_minus_xg'] = df['goals'] - df['xG']
    df['npxg_per90'] = df['npxG'] / (df['minutes'] / 90)  # Non-penalty xG

    # Scouting categories
    # 1. Volume creators: high xG per 90 (get into lots of good positions)
    # 2. Selective shooters: high xG per shot (only shoot from good spots)
    # 3. Clinical finishers: goals > xG (convert at above-expected rate)
    # 4. Complete forwards: high in all categories

    scouting_cols = [
        'player', 'team', 'age', 'minutes',
        'goals', 'xG', 'npxg_per90', 'shots_per90',
        'xg_per_shot', 'goals_minus_xg'
    ]

    return df[scouting_cols].sort_values('npxg_per90', ascending=False)

When using xG for scouting, the most reliable indicator is non-penalty xG per 90 (npxG/90). This measures how often a player gets into scoring positions, which is a repeatable skill that transfers across teams and leagues. Goals minus xG, by contrast, is much noisier and should be treated with caution for individual player evaluation.

Callout: The Scouting Hierarchy of xG Metrics

Most reliable (use for scouting):

  1. npxG per 90 -- positioning and movement quality
  2. xG per shot -- shot selection discipline
  3. Shots per 90 -- involvement in the attack

Least reliable (treat with caution):

  4. Goals minus xG -- mostly noise at the individual level
  5. Conversion rate -- depends on both shot quality and finishing


7.8 Applications of xG Analysis

7.8.1 Chance Creation Analysis

xG quantifies creativity by measuring the quality of chances a player generates for teammates:

def analyze_chance_creation(events_df):
    """Analyze player chance creation using assisted shot xG."""
    # Identify shot assists (key passes that lead to shots)
    shots = events_df[events_df['type'] == 'Shot'].copy()

    # Get the pass that led to each shot
    shots['assist_player'] = shots.apply(
        lambda x: find_assist_player(events_df, x), axis=1
    )

    # Calculate xA (expected assists)
    xa_by_player = shots.groupby('assist_player')['shot_statsbomb_xg'].sum()
    xa_by_player = xa_by_player.rename('xA').reset_index()

    return xa_by_player.sort_values('xA', ascending=False)

def find_assist_player(events_df, shot_row):
    """Find the player who made the key pass that set up a shot."""
    # StatsBomb shots reference their key pass id (flattened as 'shot_key_pass_id')
    key_pass_id = shot_row.get('shot_key_pass_id')
    if pd.isna(key_pass_id):
        return None
    key_pass = events_df[events_df['id'] == key_pass_id]
    return None if key_pass.empty else key_pass.iloc[0]['player']

Expected Assists (xA) measures the quality of chances created, independent of whether teammates convert them. A player with high xA but few actual assists is creating good opportunities that teammates are missing. This concept is explored in full detail in Chapter 8.

7.8.2 Team Tactical Analysis

xG helps identify team strengths and weaknesses at a tactical level. By breaking down xG by situation type (open play, set pieces, counter-attacks), analysts can identify where a team generates and concedes the most danger.
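A minimal sketch of this breakdown, assuming StatsBomb-style flattened events with the type, team, play_pattern, and shot_statsbomb_xg columns used elsewhere in this chapter:

def xg_by_situation(events_df):
    """Sum the xG each team generates, broken down by situation type."""
    shots = events_df[events_df['type'] == 'Shot']

    # Rows: teams; columns: play patterns such as 'Regular Play',
    # 'From Corner', or 'From Counter'
    breakdown = (
        shots.groupby(['team', 'play_pattern'])['shot_statsbomb_xg']
        .sum()
        .unstack(fill_value=0)
        .round(2)
    )
    # xG conceded by situation can be built the same way after mapping each
    # shot to the defending team using match or lineup information.
    return breakdown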

7.8.3 Goalkeeper Evaluation

Post-shot xG models enable goalkeeper assessment:

def analyze_goalkeeper_performance(shots_df, goalkeeper_team):
    """
    Analyze a goalkeeper's shot-stopping using post-shot xG.

    Post-shot xG accounts for shot placement, providing
    a better baseline for goalkeeper evaluation.
    """
    # Shots on target faced by the goalkeeper's team
    # (simplified: assumes shots_df covers only matches involving this team
    # and that one keeper played every minute; a full version would identify
    # the keeper from lineups or the shot freeze frame)
    shots_against = shots_df[
        (shots_df['team'] != goalkeeper_team) &
        (shots_df['shot_outcome'].isin(['Goal', 'Saved']))
    ].copy()

    # Calculate goals saved above expected
    total_psxg = shots_against['shot_statsbomb_xg'].sum()  # Would use post-shot xG
    goals_conceded = (shots_against['shot_outcome'] == 'Goal').sum()

    goals_saved_above_expected = total_psxg - goals_conceded

    return {
        'shots_faced': len(shots_against),
        'goals_conceded': goals_conceded,
        'post_shot_xg': total_psxg,
        'goals_saved_above_expected': goals_saved_above_expected
    }

Post-shot expected goals (PSxG) accounts for shot placement--where within the goal frame the shot is traveling. This provides a fairer baseline than pre-shot xG because it conditions on factors the goalkeeper cannot influence: where the shot was taken and where it was placed. A goalkeeper who consistently prevents goals from high-PSxG shots is genuinely performing well, not merely benefiting from poor finishing.

7.8.4 Match Prediction

xG forms the foundation for probabilistic match prediction:

import numpy as np
from scipy.stats import poisson

def predict_match(home_xg, away_xg, max_goals=7):
    """
    Predict match outcome probabilities using Poisson model.

    Parameters
    ----------
    home_xg : float
        Home team expected goals
    away_xg : float
        Away team expected goals
    max_goals : int
        Maximum goals to consider per team

    Returns
    -------
    dict
        Outcome probabilities
    """
    # Calculate scoreline probabilities
    scoreline_probs = np.zeros((max_goals + 1, max_goals + 1))

    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            scoreline_probs[h, a] = (
                poisson.pmf(h, home_xg) * poisson.pmf(a, away_xg)
            )

    # Aggregate outcomes
    home_win = np.sum(np.tril(scoreline_probs, -1))  # Below diagonal
    draw = np.sum(np.diag(scoreline_probs))          # Diagonal
    away_win = np.sum(np.triu(scoreline_probs, 1))   # Above diagonal

    # Most likely scorelines
    flat_idx = np.argsort(scoreline_probs.flatten())[::-1]
    top_scorelines = []
    for idx in flat_idx[:5]:
        h, a = divmod(idx, max_goals + 1)
        top_scorelines.append({
            'scoreline': f'{h}-{a}',
            'probability': scoreline_probs[h, a]
        })

    return {
        'home_win': home_win,
        'draw': draw,
        'away_win': away_win,
        'expected_home_goals': home_xg,
        'expected_away_goals': away_xg,
        'most_likely_scorelines': top_scorelines
    }

# Example
prediction = predict_match(1.8, 1.2)
print(f"Home win: {prediction['home_win']:.1%}")
print(f"Draw: {prediction['draw']:.1%}")
print(f"Away win: {prediction['away_win']:.1%}")

7.9 Post-Shot xG (PSxG) vs Pre-Shot xG

7.9.1 Understanding Post-Shot xG

Post-shot xG incorporates shot placement information--where within the goal frame the shot is directed. While pre-shot xG asks "how likely is this shot to score based on where it was taken?", PSxG asks "how likely is this shot to score based on where it was taken AND where it is heading?"

def estimate_post_shot_xg(shot_end_x, shot_end_y, shot_speed=None):
    """
    Estimate post-shot xG based on shot placement.

    Shot end coordinates are within the goal frame:
    - x: horizontal position (0 = left post, goal_width = right post)
    - y: vertical position (0 = ground, crossbar_height = crossbar)

    Note: This is simplified; real PSxG models use more features.
    """
    GOAL_WIDTH = 7.32   # goal mouth is 7.32 m wide
    GOAL_HEIGHT = 2.44  # crossbar is 2.44 m high

    # Distance from center of goal
    center_x = GOAL_WIDTH / 2
    center_y = GOAL_HEIGHT / 2

    dist_from_center = np.sqrt(
        (shot_end_x - center_x)**2 + (shot_end_y - center_y)**2
    )

    # Shots placed in the top corners are harder to save
    corner_bonus = 0
    if (shot_end_x < 1 or shot_end_x > GOAL_WIDTH - 1) and shot_end_y > GOAL_HEIGHT - 0.5:
        corner_bonus = 0.15

    # Base PSxG increases with distance from center (harder for GK)
    base_psxg = 0.3 + (dist_from_center / 5) * 0.4 + corner_bonus

    return min(base_psxg, 0.95)  # Cap at 0.95

7.9.2 When to Use Each

Pre-shot xG is appropriate for:

  • Evaluating chance creation quality (for teams and creative players)
  • Assessing underlying performance levels
  • Predicting future scoring rates
  • Comparing teams' attacking and defensive processes

Post-shot xG is appropriate for:

  • Evaluating goalkeeper performance (Goals Saved Above Expected uses PSxG)
  • Analyzing finishing quality (where within the frame players place their shots)
  • Understanding why a specific match outcome occurred
  • Assessing shot-stopping in specific matches

The distinction matters because pre-shot xG measures the process (how good was the chance?) while PSxG measures the execution (how well was the shot hit?). A team can generate high pre-shot xG through good buildup play regardless of finishing quality, but PSxG requires both good chances and accurate shot placement.

Callout: PSxG for Goalkeeper Evaluation

The gold standard metric for goalkeepers is Goals Saved Above Expected (GSAx) based on PSxG:

GSAx = PSxG of shots on target faced - Goals conceded

A positive GSAx means the goalkeeper is saving more goals than expected given where shots were placed. This is fairer than using pre-shot xG because it controls for shot placement--a factor the goalkeeper cannot influence. Elite goalkeepers typically achieve a GSAx of +5 to +10 per season, while struggling keepers may sit at -5 or worse.


7.10 Limitations and Criticisms of xG

Despite its widespread adoption, xG has important limitations that users must understand.

7.10.1 Model-Dependent Variation

Different xG providers produce different values for identical shots:

Provider     World Cup Final (France)   World Cup Final (Croatia)
StatsBomb    2.35 xG                    1.78 xG
Opta         2.41 xG                    1.65 xG
Understat    2.28 xG                    1.89 xG
FBref        2.35 xG                    1.78 xG

These differences arise from:

  • Feature inclusion: Some models use tracking data; others don't
  • Training data: Models trained on different leagues may generalize differently
  • Methodology: Logistic regression vs. gradient boosting vs. neural networks
  • Penalty handling: Some providers assign a flat 0.76 xG to penalties; others model them separately

When comparing xG figures across players, teams, or matches, always draw them from a single provider.

7.10.2 Missing Context

Standard xG models cannot account for:

Player skill variation: Messi shooting from 20 meters is not equivalent to an average player from the same position. Some analysts argue xG should be player-adjusted; others contend this defeats the purpose of measuring chance quality independently.

Psychological factors: Pressure situations (penalties excluded), fatigue, and crowd effects may influence conversion rates in ways not captured by shot location and type.

Defensive structure: A shot with two defenders in front differs from one with a clear path, but basic xG models treat them identically (tracking data models address this).

Goalkeeper position: The goalkeeper's location dramatically affects chance quality, but event data often doesn't capture it.

7.10.3 Aggregation Assumptions

Summing shot xG assumes independence--that each shot is an independent event. This may not hold:

  • Rebound shots depend on the initial save
  • Fast breaks may involve correlated chances
  • Set pieces create clustered opportunities

More sophisticated models account for shot sequences, but simple summation remains standard practice.
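To make the independence assumption concrete, the following sketch treats each shot as an independent Bernoulli trial with success probability equal to its xG and derives the resulting goal-count distribution; the shot xG values at the bottom are purely illustrative.

import numpy as np

def goal_distribution(shot_xgs, max_goals=8):
    """Goal-count probabilities when each shot is an independent Bernoulli trial."""
    probs = np.zeros(max_goals + 1)
    probs[0] = 1.0  # before any shot, zero goals with certainty
    for p in shot_xgs:
        new_probs = np.zeros(max_goals + 1)
        for k in range(max_goals + 1):
            new_probs[k] += probs[k] * (1 - p)                # shot does not score
            new_probs[min(k + 1, max_goals)] += probs[k] * p  # shot scores
        probs = new_probs
    return probs

# Illustrative match: five shots totalling 1.36 xG
shot_xgs = [0.76, 0.35, 0.12, 0.08, 0.05]
dist = goal_distribution(shot_xgs)
print(f"P(0 goals): {dist[0]:.1%} | P(exactly 1): {dist[1]:.1%} | P(2+): {dist[2:].sum():.1%}")

The same machinery underlies "simulated match" win probabilities, and it is exactly what breaks down when shots are correlated, as with rebounds.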

7.10.4 The "xG Doesn't Account for..." Fallacy

A common criticism pattern: "xG doesn't account for [specific factor], therefore it's flawed."

While technically true--no model captures everything--this critique misses the point. xG isn't meant to perfectly predict every shot; it provides a quality-adjusted measure of chances. A player who consistently beats xG may indeed be a good finisher, but:

  1. The observed outperformance may be smaller than it appears
  2. Sample sizes are usually insufficient for definitive conclusions
  3. Even skilled finishers regress partially toward expected values

7.10.5 Overfitting to Training Data

Models trained on one league may not generalize to others:

# Cross-league evaluation helps identify this
from sklearn.metrics import log_loss, roc_auc_score

def evaluate_cross_league_transfer(model, league_a_data, league_b_data, features):
    """Evaluate model transfer: train on League A, test on League B."""
    X_train = league_a_data[features]
    y_train = league_a_data['is_goal']
    X_test = league_b_data[features]
    y_test = league_b_data['is_goal']

    model.fit(X_train, y_train)
    y_pred = model.predict_proba(X_test)[:, 1]

    return {
        'log_loss': log_loss(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred)
    }

7.10.6 Communicating Uncertainty

xG is often presented with false precision. Stating "Team A had 2.37 xG" implies accuracy that doesn't exist. Better practice:

  • Round to one decimal place for match totals
  • Report confidence intervals when possible
  • Emphasize that xG indicates probability, not certainty
  • Avoid using xG to definitively declare which team "deserved" to win

Common Pitfall: Social media analysts frequently say "Team A deserved to win because they had higher xG." This is an overstatement. Higher xG means a team created better chances, which correlates with deserving to win, but xG does not capture everything about a match (defensive structure, goalkeeper performance, off-ball movement). The correct framing is: "Based on the chances created, Team A would win this match more often than not if it were replayed many times."


7.11 Advanced xG Topics

7.11.1 Non-Shot xG (Expected Threat)

Expected Threat (xT) extends the xG concept to all ball actions, not just shots:

def create_xT_grid(grid_size=(12, 8)):
    """
    Create a simplified Expected Threat grid.

    xT measures the probability of scoring in the next n actions
    from each location on the pitch. A full implementation estimates
    these values from event data; this version uses a geometric approximation.
    """
    # Define grid
    x_bins = np.linspace(0, 120, grid_size[0] + 1)
    y_bins = np.linspace(0, 80, grid_size[1] + 1)

    # Initialize grid
    xT_grid = np.zeros(grid_size)

    # Calculate goal probability from each zone
    # (Simplified: real xT uses Markov chains or action values)
    for i in range(grid_size[0]):
        for j in range(grid_size[1]):
            x_center = (x_bins[i] + x_bins[i+1]) / 2
            y_center = (y_bins[j] + y_bins[j+1]) / 2

            # Rough approximation: probability decreases with distance from goal
            dist_to_goal = np.sqrt((120 - x_center)**2 + (40 - y_center)**2)
            xT_grid[i, j] = max(0, 0.4 - dist_to_goal * 0.008)

    return xT_grid, x_bins, y_bins

xT enables evaluation of ball progression actions (passes, carries) that don't directly result in shots but increase scoring probability. This is explored in full detail in Chapter 9.
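As a quick illustration, a pass or carry can be scored as the change in grid value between its start and end locations. This sketch reuses the simplified grid above and assumes StatsBomb-style pitch coordinates (x in [0, 120], y in [0, 80]):

def action_xt_value(xT_grid, x_bins, y_bins, start_xy, end_xy):
    """Score a pass or carry as xT(end zone) minus xT(start zone)."""
    def zone_value(x, y):
        # np.digitize returns 1-based bin indices; clip to stay on the grid
        i = np.clip(np.digitize(x, x_bins) - 1, 0, xT_grid.shape[0] - 1)
        j = np.clip(np.digitize(y, y_bins) - 1, 0, xT_grid.shape[1] - 1)
        return xT_grid[i, j]

    return zone_value(*end_xy) - zone_value(*start_xy)

# Example: a pass from the halfway line to the edge of the box
xT_grid, x_bins, y_bins = create_xT_grid()
print(action_xt_value(xT_grid, x_bins, y_bins, start_xy=(60, 40), end_xy=(102, 38)))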

7.11.2 Tracking Data xG

With tracking data (player positions at 25Hz), xG models can incorporate:

  • Exact defender positions between shooter and goal
  • Goalkeeper position and movement
  • Shooter's approach angle and speed
  • Teammate positions for potential passes

def calculate_tracking_xg_features(shot_frame):
    """
    Calculate xG features from tracking data snapshot.

    Parameters
    ----------
    shot_frame : dict
        Tracking data at moment of shot, including:
        - shooter_position
        - goalkeeper_position
        - defender_positions
        - ball_speed
    """
    features = {}

    # Goalkeeper positioning
    gk_x, gk_y = shot_frame['goalkeeper_position']
    shooter_x, shooter_y = shot_frame['shooter_position']

    features['gk_to_goal_center_dist'] = np.sqrt(
        (120 - gk_x)**2 + (40 - gk_y)**2
    )
    features['gk_angle_to_shooter'] = np.arctan2(
        shooter_y - gk_y, shooter_x - gk_x
    )

    # Defender blocking
    goal_center = np.array([120, 40])
    shooter_pos = np.array([shooter_x, shooter_y])

    defenders_blocking = 0
    for def_pos in shot_frame['defender_positions']:
        if is_between_shooter_and_goal(shooter_pos, def_pos, goal_center):
            defenders_blocking += 1

    features['defenders_blocking'] = defenders_blocking

    return features

def is_between_shooter_and_goal(shooter, defender, goal, lane_width=1.0):
    """Check if a defender is in the shooting lane (simplified geometric check)."""
    shooter = np.asarray(shooter, dtype=float)
    defender = np.asarray(defender, dtype=float)
    goal = np.asarray(goal, dtype=float)

    shooter_to_goal = goal - shooter
    shooter_to_defender = defender - shooter

    # Project the defender onto the shooter-goal line (0 = at shooter, 1 = at goal)
    projection = np.dot(shooter_to_defender, shooter_to_goal) / np.dot(shooter_to_goal, shooter_to_goal)
    if not 0 < projection < 1:
        return False

    # Only defenders close to the line actually block the shot;
    # lane_width is an assumed blocking radius in meters
    closest_point = shooter + projection * shooter_to_goal
    return np.linalg.norm(defender - closest_point) <= lane_width

Tracking data xG models typically improve ROC AUC by 0.03-0.05 over event-data-only models--a meaningful but not transformative improvement.


7.12 Practical Implementation Guide

7.12.1 Choosing an xG Model

For most users, building your own xG model is unnecessary:

Use public xG values when:

  • Performing descriptive analysis
  • Writing articles or reports
  • The exact xG methodology isn't critical

Build custom models when:

  • You need consistent methodology across different data sources
  • You're developing predictive systems
  • You want to incorporate proprietary features

Recommended public sources:

  • StatsBomb Open Data (via statsbombpy -- see the loading sketch below)
  • FBref (includes StatsBomb xG)
  • Understat (publicly available)
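A minimal loading sketch with statsbombpy; the competition and season ids below are illustrative and should be looked up via sb.competitions() for your use case:

from statsbombpy import sb

# Browse the open-data catalogue to find competition and season ids
competitions = sb.competitions()

# Illustrative ids -- check them against the catalogue above
matches = sb.matches(competition_id=43, season_id=3)

# Pull events for one match and keep the shot columns used in this chapter
events = sb.events(match_id=matches['match_id'].iloc[0])
shots = events[events['type'] == 'Shot'][
    ['team', 'player', 'location', 'shot_statsbomb_xg', 'shot_outcome']
]
print(shots.head())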

7.12.2 Production Pipeline

A production xG system requires:

class XGPipeline:
    """Production xG calculation pipeline."""

    def __init__(self, model_path):
        """Load trained model and preprocessing components."""
        import joblib
        self.model = joblib.load(model_path)
        self.preprocessor = joblib.load(model_path.replace('.pkl', '_preprocessor.pkl'))

    def calculate_xg(self, shot_data):
        """
        Calculate xG for shot data.

        Parameters
        ----------
        shot_data : pd.DataFrame
            Raw shot data with required columns

        Returns
        -------
        pd.Series
            xG values for each shot
        """
        # Validate input
        required_cols = ['x', 'y', 'shot_body_part', 'shot_type']
        missing = set(required_cols) - set(shot_data.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")

        # Feature engineering
        features = self._engineer_features(shot_data)

        # Preprocess
        X = self.preprocessor.transform(features)

        # Predict
        xg = self.model.predict_proba(X)[:, 1]

        return pd.Series(xg, index=shot_data.index, name='xG')

    def _engineer_features(self, data):
        """Create model features from raw data."""
        df = data.copy()

        GOAL_X, GOAL_Y = 120, 40
        df['distance'] = np.sqrt((GOAL_X - df['x'])**2 + (GOAL_Y - df['y'])**2)
        df['angle'] = df.apply(
            lambda row: calculate_shot_angle(row['x'], row['y']), axis=1
        )
        df['log_distance'] = np.log(df['distance'] + 1)
        df['y_abs'] = np.abs(df['y'] - GOAL_Y)

        return df
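A brief usage sketch, assuming a model and its preprocessor have already been trained and saved with joblib under the hypothetical paths shown:

import pandas as pd

# Hypothetical artifacts: 'xg_model.pkl' and 'xg_model_preprocessor.pkl'
pipeline = XGPipeline('xg_model.pkl')

new_shots = pd.DataFrame({
    'x': [108.0, 95.0],
    'y': [38.0, 52.0],
    'shot_body_part': ['Right Foot', 'Head'],
    'shot_type': ['Open Play', 'Open Play']
})

new_shots['xG'] = pipeline.calculate_xg(new_shots)
print(new_shots)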

7.12.3 Model Monitoring

Track model performance over time:

def monitor_xg_calibration(predictions_df, window='monthly'):
    """
    Monitor xG model calibration over time.

    Parameters
    ----------
    predictions_df : pd.DataFrame
        Historical predictions with columns: date, xg, is_goal
    window : str
        Aggregation window ('weekly', 'monthly', 'quarterly')
    """
    df = predictions_df.copy()
    df['period'] = pd.to_datetime(df['date']).dt.to_period(window[0].upper())

    calibration = df.groupby('period').agg({
        'xg': 'sum',
        'is_goal': 'sum'
    })
    calibration['ratio'] = calibration['is_goal'] / calibration['xg']

    # Alert if ratio deviates significantly from 1.0
    calibration['alert'] = np.abs(calibration['ratio'] - 1.0) > 0.15

    return calibration

7.13 Summary

Expected Goals has transformed soccer analytics by providing a principled, probabilistic measure of chance quality. Key takeaways:

  1. xG quantifies shot quality using features like distance, angle, body part, and shot type
  2. The history of xG spans from academic research through blog-era innovation to mainstream adoption, with key figures like Sam Green, Michael Caley, and StatsBomb driving development
  3. Multiple model architectures serve different needs: logistic regression for interpretability, gradient boosting for accuracy, and neural networks for complex data
  4. Evaluation requires multiple metrics: log loss for calibration, ROC AUC for discrimination, Brier score for overall accuracy
  5. Interpretation varies by aggregation level: shot-level xG is highly variable; season-level xG is reliable
  6. Team-level analysis (xGF, xGA, xGD) provides the best single measure of underlying team quality
  7. Player-level analysis should prioritize npxG per 90 for scouting, treating goals-minus-xG with caution
  8. Post-shot xG is essential for goalkeeper evaluation via Goals Saved Above Expected
  9. xG timelines provide rich match narratives beyond what final scores reveal
  10. Limitations exist: model variation, missing context, and communication challenges require careful handling

The next chapter extends these concepts to Expected Assists and Expected Threat, building a comprehensive framework for measuring all aspects of attacking contribution.


Key Formulas

Shot angle calculation: $$\theta = \left| \arctan\left(\frac{y_{right} - y_{shot}}{x_{goal} - x_{shot}}\right) - \arctan\left(\frac{y_{left} - y_{shot}}{x_{goal} - x_{shot}}\right) \right|$$

Log loss: $$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1-y_i) \log(1-p_i)]$$

Brier score: $$BS = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2$$

Poisson match probability: $$P(H=h, A=a) = \frac{e^{-\lambda_H} \lambda_H^h}{h!} \cdot \frac{e^{-\lambda_A} \lambda_A^a}{a!}$$

xG per 90: $$xG_{per90} = \frac{xG_{total}}{Minutes / 90}$$

Goals Saved Above Expected: $$GSAx = PSxG_{faced} - Goals_{conceded}$$


References

  1. Caley, M. (2015). "Premier League Projections and New Expected Goals." Cartilage Free Captain.
  2. Eastwood, M. (2014). "Expected Goals and Support Vector Machines." Pena.lt/y.
  3. Green, S. (2012). "Assessing the performance of Premier League goalscorers." OptaPro Blog.
  4. StatsBomb (2019). "The xG Philosophy." StatsBomb Blog.
  5. Rathke, A. (2017). "An examination of expected goals and shot efficiency in soccer." Journal of Human Sport and Exercise.
  6. Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics Conference.
  7. Sumpter, D. (2019). "Friends of Tracking: Expected Goals." YouTube/GitHub.
  8. Trainor, C. & Chappas, C. (2013). "A Framework for Tactical Analysis and Individual Offensive Production Assessment in Soccer Using Markov Chains." MIT Sloan Sports Analytics Conference.