In This Chapter
- Learning Objectives
- Introduction
- 9.1 The Problem with Endpoint Metrics
- 9.2 Understanding Expected Threat (xT)
- 9.3 Building an xT Model
- 9.4 Progressive Actions and Their xT Value
- 9.5 Player Valuation Using xT
- 9.6 Progressive Passes and Carries
- 9.7 Comparison with VAEP, EPV, and Other Possession Value Models
- 9.8 Practical Applications
- 9.9 Visualization Techniques
- 9.10 Limitations and Considerations
- 9.11 Chapter Summary
- Key Terminology
- Key Formulas
- Further Practice
- References
Chapter 9: Expected Threat (xT) and Ball Progression
Learning Objectives
By the end of this chapter, you will be able to:
- Understand the concept of Expected Threat (xT) and how it differs from xG and xA
- Build and interpret xT grids from historical shot and goal data
- Calculate player and team xT contributions from event data
- Analyze ball progression metrics including progressive passes and carries
- Evaluate player value through possession-based frameworks
- Compare different approaches to valuing on-ball actions
- Apply xT analysis to tactical and recruitment questions
Introduction
Expected Goals (xG) tells us the value of a shot. Expected Assists (xA) tells us the value of the final pass before a shot. But what about all the other actions in a possession--the passes that advance the ball up the field, the dribbles that break defensive lines, the movements that create space? How do we value the hundreds of actions that occur before a shooting opportunity emerges?
Expected Threat (xT) addresses this gap by assigning value to every position on the pitch based on how likely that position is to lead to a goal in the near future. When a player moves the ball from a low-threat area to a high-threat area, they generate positive xT--they have increased their team's chance of scoring.
This chapter introduces xT and related ball progression metrics, providing a comprehensive framework for valuing all on-ball actions, not just shots and assists. These metrics have transformed how clubs evaluate midfielders, full-backs, and other players whose contributions don't always appear in traditional statistics.
9.1 The Problem with Endpoint Metrics
9.1.1 The Missing Middle
Traditional statistics focus on endpoints: goals, assists, shots on target. Even advanced metrics like xG and xA concentrate on the final moments of an attack. But most of a soccer match involves:
- Building possession from deep positions
- Progressing the ball through midfield
- Creating advantageous positions for attacks
- Breaking defensive lines through passes or dribbles
Players who excel at these tasks--deep-lying playmakers, ball-progressing center-backs, inverted full-backs--often appear undervalued by endpoint metrics.
Consider two midfielders:
- Player A: Completes a 40-yard progressive pass that advances his team into the final third, but the attack fizzles
- Player B: Receives the ball 25 yards from goal after Player A's work, makes a simple 5-yard pass, and a teammate shoots
Traditional xA credits Player B for the assist opportunity. Player A gets nothing, despite arguably making the more valuable contribution. This systematic undervaluation of ball progression has real consequences: clubs that rely solely on xG and xA will overlook excellent midfielders and defenders who drive their team's attacking play without appearing in the shot-creation chain.
9.1.2 The Value of Position
The insight behind xT is that position has value. Standing at the edge of the opponent's penalty area is inherently more threatening than standing in your own half, even before any action occurs. This positional value exists because:
- Proximity to goal: Closer positions offer shorter distances and better angles for shooting
- Defensive disruption: Attacking positions force defenders into more difficult decisions
- Transition advantage: Central positions in the final third require multiple defenders to cover
- Shooting opportunity density: Certain zones produce many more shots per possession than others
By quantifying this positional value, we can credit players for improving their team's position, regardless of whether that improvement directly leads to a shot.
Callout: Why xT Matters for Player Valuation
Before xT and similar metrics, evaluating certain positions was extremely difficult with data alone:
- Ball-playing center-backs who drive possession forward appeared no different from passive center-backs in traditional stats
- Deep-lying playmakers who orchestrate attacks from the center circle received no statistical credit for their orchestration
- Inverted full-backs who carry the ball into midfield channels were invisible in xG/xA analysis
xT and progressive action metrics have made these players visible to data-driven scouting for the first time.
9.2 Understanding Expected Threat (xT)
9.2.1 Origin of xT: Karun Singh's Work
Expected Threat was formalized by Karun Singh in a 2019 blog post titled "Introducing Expected Threat (xT)." While the concept of positional value had been explored informally by several analysts, Singh provided the first clear mathematical framework and publicly available implementation.
Singh's key insight was to model the pitch as a Markov chain: the ball's current position determines the probability distribution of future positions, and from any future position, there is some probability of a shot and some probability of scoring. By working backward from these scoring probabilities, you can assign a threat value to every location on the pitch.
The work drew on earlier ideas from several sources. Sarah Rudd presented related concepts at the New England Symposium on Statistics in Sports (NESSIS) as early as 2011, using Markov chains to value passes based on the zones they connect. Marek Kwiatkowski explored similar territory with his "goal-expected" framework. However, Singh's formulation was particularly elegant and accessible, and it quickly caught the attention of the broader analytics community.
Since its introduction, xT has been widely adopted by analytics departments at professional clubs, and the concept has been extended and refined by numerous researchers. It serves as the conceptual foundation for more complex possession value models like VAEP and EPV.
9.2.2 Core Concept
Expected Threat (xT) divides the pitch into a grid of zones (typically 12x8 = 96 zones or 16x12 = 192 zones). Each zone is assigned a value representing the probability that possessing the ball in that zone will lead to a goal in the next few actions.
The fundamental equation:
xT(zone) = P(shot|zone) * P(goal|shot,zone) + P(move|zone) * Sum[P(zone'|zone,move) * xT(zone')]
Where:
- P(shot|zone) = probability of shooting from this zone
- P(goal|shot,zone) = probability of scoring if shooting from this zone
- P(move|zone) = probability of moving the ball (pass/dribble) rather than shooting
- P(zone'|zone,move) = probability of the ball ending up in zone' given a move from zone
- xT(zone') = expected threat value of the destination zone
This recursive equation captures both the immediate shooting threat and the future threat from subsequent actions. The beauty of this formulation is that it propagates value backward from the goal: zones that frequently lead to dangerous positions inherit some of that danger, even if they are far from goal themselves.
Key Insight: The recursive nature of xT means that a zone in the center circle might have a non-trivial xT value not because shots are taken from there, but because passes from that zone frequently reach dangerous areas. This is exactly the kind of value that xG and xA miss entirely.
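To make the recursion concrete, the sketch below solves the fundamental equation on a hypothetical three-zone pitch (defence, midfield, attack). Every probability is invented for illustration rather than estimated from data, and `solve_xt_direct` is an illustrative helper name. Because the equation is linear in the xT values, a tiny system like this can be solved in one step instead of iteratively.

```python
import numpy as np

# Hypothetical three-zone pitch: 0 = defence, 1 = midfield, 2 = attack.
# Every number below is made up for illustration.
shot_prob = np.array([0.0, 0.01, 0.20])   # P(shot | zone)
goal_prob = np.array([0.0, 0.03, 0.12])   # P(goal | shot, zone)
move_prob = np.array([1.0, 0.99, 0.80])   # P(move | zone)

# transition[i, j] = P(ball arrives in zone j | move from zone i).
# Rows sum to less than 1; the remainder is the chance of losing the ball.
transition = np.array([
    [0.50, 0.35, 0.02],
    [0.20, 0.45, 0.15],
    [0.05, 0.25, 0.40],
])


def solve_xt_direct(shot_prob, goal_prob, move_prob, transition):
    """Solve xT = s*g + diag(m) @ T @ xT as a linear system."""
    immediate = shot_prob * goal_prob
    system = np.eye(len(immediate)) - move_prob[:, None] * transition
    return np.linalg.solve(system, immediate)


xt = solve_xt_direct(shot_prob, goal_prob, move_prob, transition)
print(xt)
```

In this toy setup the defensive zone never shoots, yet it ends up with a positive value because moves from it reach the more dangerous zones.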
9.2.3 The Zone-Based Value Surface
To understand what the xT surface looks like in practice, consider the following typical values:
| Zone | Approximate xT | Interpretation |
|---|---|---|
| Own penalty area | 0.001-0.003 | Minimal threat, recovery position |
| Own half | 0.002-0.010 | Building phase |
| Central midfield | 0.005-0.020 | Transition zone |
| Final third wings | 0.010-0.030 | Crossing positions |
| Final third center | 0.030-0.080 | Dangerous territory |
| Edge of box | 0.080-0.150 | High threat |
| Inside penalty area | 0.150-0.400 | Very high threat |
| Central box | 0.300-0.500+ | Maximum threat |
These values represent the probability that possession in that zone leads to a goal within the possession. Several notable patterns emerge from the xT surface:
Centrality premium. Central zones consistently have higher xT than equivalent lateral zones at the same distance from goal. This reflects the fact that central positions offer wider shooting angles and more passing options into dangerous areas.
Non-linear increase near goal. xT values increase slowly through midfield but accelerate rapidly in the final third, particularly within the penalty area. The zone just outside the center of the box might have 5-10 times the xT of a zone 20 meters further back.
Asymmetry in wide areas. Wide zones near the opponent's byline have moderate xT because they enable crosses and cutbacks, but the xT of these zones is lower than central zones at the same distance from goal, reflecting the lower conversion rate of crosses compared to central shots.
9.2.4 The Markov Chain Approach to Calculating xT
The mathematical foundation of xT is a Markov chain--a stochastic model where the probability of future states depends only on the current state, not on the sequence of events that preceded it. In the xT framework:
- States are the zones on the pitch grid
- Transitions are ball movements (passes and carries) between zones
- Absorbing states are goals (the possession ends with a score) and possession losses (the possession ends without a score)
The Markov property assumes that where the ball goes next depends only on where it is now, not on how it got there. This is an approximation--in reality, a ball that arrived via a fast break may have different continuation probabilities than one that arrived through slow build-up--but it is a reasonable simplification that makes the math tractable.
The xT calculation proceeds as follows:
- Estimate shooting probability from each zone: What fraction of actions taken in zone $z$ are shots?
- Estimate scoring probability given a shot: What fraction of shots from zone $z$ result in a goal?
- Estimate transition probabilities: For passes and carries starting in zone $z$, what is the probability distribution of destination zones?
- Solve the value function recursively: Use value iteration to find the xT for each zone that satisfies the fundamental equation.
The value iteration converges because each step of the Markov chain has some probability of ending the possession (through a shot, turnover, or out-of-play), ensuring that the infinite series of future values converges to a finite sum.
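The convergence argument can be checked numerically. The sketch below runs value iteration on a hypothetical two-zone chain (all numbers are illustrative) and records the largest change at each step. The helper name `value_iteration` and the toy inputs are assumptions for this example only.

```python
import numpy as np


def value_iteration(immediate, move_prob, transition,
                    tolerance=1e-6, max_iterations=100):
    """Iterate xT <- immediate + move_prob * (transition @ xT) until stable."""
    xt = immediate.copy()
    deltas = []
    for _ in range(max_iterations):
        updated = immediate + move_prob * (transition @ xt)
        deltas.append(float(np.max(np.abs(updated - xt))))
        xt = updated
        if deltas[-1] < tolerance:
            break
    return xt, deltas


# Hypothetical two-zone chain (0 = build-up, 1 = attack).
immediate = np.array([0.0, 0.03])      # P(shot) * P(goal | shot) per zone
move_prob = np.array([1.0, 0.8])
transition = np.array([[0.6, 0.2],     # rows sum to < 1: the rest is turnover
                       [0.3, 0.4]])

xt, deltas = value_iteration(immediate, move_prob, transition)
print(len(deltas), xt)
```

Each step leaks some probability to turnovers, so the recorded changes shrink every iteration and the result agrees with the direct linear solution.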
Callout: Markov Chains in Plain Language
A Markov chain is simply a system where what happens next depends only on where you are now, not on your history. Think of it like navigating a city: from any intersection, you can turn left, right, or go straight, and the probability of each choice depends on the intersection (the current state) but not on how you got there. xT treats the soccer pitch the same way: from any zone, the ball has certain probabilities of moving to each other zone, and we can calculate the long-run probability of scoring from each starting position.
9.3 Building an xT Model
9.3.1 Data Requirements
To build an xT model, you need:
- Event data with precise coordinates
  - All passes (successful and unsuccessful)
  - All carries/dribbles
  - All shots with outcomes
- Sufficient sample size
  - Minimum ~50,000 actions for stable estimates
  - One full league season typically provides adequate data
  - More data (multiple seasons) produces smoother estimates
- Consistent coordinate system
  - Standard 120x80 or 105x68 coordinate space
  - All data normalized to same orientation (attacking left-to-right)
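The coordinate requirement can be sketched as a small preprocessing step. The example below assumes a hypothetical boolean column `attacks_right` marking events already oriented left-to-right. It also assumes the source data is in 105x68 space and should be rescaled to 120x80. Real providers encode direction differently, so treat the column name and the dimensions as placeholders.

```python
import pandas as pd


def normalize_coordinates(events, source_dims=(105.0, 68.0),
                          target_dims=(120.0, 80.0)):
    """Return a copy with all events attacking left-to-right on target_dims.

    Assumes a hypothetical boolean column 'attacks_right' (True when the
    event is already oriented left-to-right).
    """
    out = events.copy()
    flip = ~out['attacks_right'].astype(bool)
    for x_col, y_col in (('start_x', 'start_y'), ('end_x', 'end_y')):
        # Rotate the pitch 180 degrees for events attacking right-to-left
        out.loc[flip, x_col] = source_dims[0] - out.loc[flip, x_col]
        out.loc[flip, y_col] = source_dims[1] - out.loc[flip, y_col]
        # Rescale into the target coordinate space
        out[x_col] = out[x_col] * target_dims[0] / source_dims[0]
        out[y_col] = out[y_col] * target_dims[1] / source_dims[1]
    return out


events = pd.DataFrame({
    'start_x': [10.0, 95.0], 'start_y': [34.0, 10.0],
    'end_x': [30.0, 75.0], 'end_y': [34.0, 20.0],
    'attacks_right': [True, False],
})
normalized = normalize_coordinates(events)
print(normalized)
```

After normalization, the second event (originally moving right-to-left) advances toward the same goal as the first.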
9.3.2 Grid Definition
Choose your grid resolution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Common grid configurations
GRID_12x8 = (12, 8)    # 96 zones, ~10m x 10m each
GRID_16x12 = (16, 12)  # 192 zones, ~7m x 7m each
GRID_24x16 = (24, 16)  # 384 zones, ~5m x 5m each


def get_zone(x, y, grid_size=(12, 8), pitch_dims=(120, 80)):
    """Convert coordinates to zone indices."""
    zone_x = int(x / pitch_dims[0] * grid_size[0])
    zone_y = int(y / pitch_dims[1] * grid_size[1])
    # Clamp to valid range
    zone_x = max(0, min(zone_x, grid_size[0] - 1))
    zone_y = max(0, min(zone_y, grid_size[1] - 1))
    return zone_x, zone_y


def zone_to_index(x, y, grid_size=(12, 8), pitch_dims=(120, 80)):
    """Convert coordinates to a single flat index."""
    zx, zy = get_zone(x, y, grid_size, pitch_dims)
    return zy * grid_size[0] + zx
Higher resolution captures more nuance but requires more data for stable estimates. The 12x8 grid is a good default that balances resolution against data requirements. For clubs with access to multiple seasons of data, 16x12 provides noticeably better spatial resolution, particularly in the penalty area where small differences in position matter significantly.
Best Practice: Start with a 12x8 grid for initial exploration and increase resolution if your dataset supports it. A useful heuristic: each zone should contain at least 100 actions for stable probability estimates. With a 12x8 grid and a 50,000-action dataset, that gives roughly 520 actions per zone on average--comfortably above the minimum. Actions are unevenly distributed, however, so corner and deep defensive zones can still fall short and should be checked individually.
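One way to apply the 100-actions heuristic is to count actions per zone and list any zone that falls short. The sketch below uses synthetic, uniformly spread start locations purely for demonstration, and the helper names `actions_per_zone` and `sparse_zones` are illustrative.

```python
import numpy as np


def actions_per_zone(xs, ys, grid_size=(12, 8), pitch_dims=(120, 80)):
    """Count actions starting in each zone; returns an array shaped (rows, cols)."""
    cols = np.clip((np.asarray(xs) / pitch_dims[0] * grid_size[0]).astype(int),
                   0, grid_size[0] - 1)
    rows = np.clip((np.asarray(ys) / pitch_dims[1] * grid_size[1]).astype(int),
                   0, grid_size[1] - 1)
    counts = np.zeros((grid_size[1], grid_size[0]), dtype=int)
    np.add.at(counts, (rows, cols), 1)  # accumulates repeated indices correctly
    return counts


def sparse_zones(counts, min_actions=100):
    """Return (row, col) pairs for zones below the minimum action count."""
    return [tuple(int(v) for v in rc) for rc in np.argwhere(counts < min_actions)]


# Synthetic start locations, uniformly spread purely for demonstration;
# real event data clusters in midfield and thins out near the corners.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 120, size=50_000)
ys = rng.uniform(0, 80, size=50_000)
counts = actions_per_zone(xs, ys)
print(counts.min(), sparse_zones(counts))
```

With real data, any zones this check returns are candidates for smoothing or for a coarser grid.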
9.3.3 Transition Matrix Calculation
Build transition matrices showing how the ball moves between zones:
def build_transition_matrices(events_df, grid_size=(12, 8)):
    """
    Build pass and carry transition matrices.

    Returns matrices where M[i,j] = P(end in zone j | move starts in zone i).
    Failed actions count toward the row total but not toward any destination,
    so a row can sum to less than 1; the shortfall is the turnover probability.
    """
    n_zones = grid_size[0] * grid_size[1]
    # Initialize count matrices
    pass_counts = np.zeros((n_zones, n_zones))
    carry_counts = np.zeros((n_zones, n_zones))
    pass_totals = np.zeros(n_zones)
    carry_totals = np.zeros(n_zones)
    # Count transitions
    for _, event in events_df.iterrows():
        if event['type'] not in ('Pass', 'Carry'):
            continue  # other event types may lack end coordinates
        start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
        end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
        completed = event.get('successful', True)
        if event['type'] == 'Pass':
            pass_totals[start_zone] += 1
            if completed:
                pass_counts[start_zone, end_zone] += 1
        else:
            carry_totals[start_zone] += 1
            if completed:
                carry_counts[start_zone, end_zone] += 1
    # Normalize to probabilities; `out` keeps rows with no data at zero
    # instead of leaving them uninitialized
    pass_matrix = np.divide(pass_counts, pass_totals[:, np.newaxis],
                            out=np.zeros_like(pass_counts),
                            where=pass_totals[:, np.newaxis] > 0)
    carry_matrix = np.divide(carry_counts, carry_totals[:, np.newaxis],
                             out=np.zeros_like(carry_counts),
                             where=carry_totals[:, np.newaxis] > 0)
    return pass_matrix, carry_matrix
The transition matrix is the heart of the xT model. Each row represents a starting zone, and each column represents a destination zone. The entry at row $i$, column $j$ gives the probability that a ball movement (pass or carry) starting in zone $i$ ends in zone $j$. These probabilities are estimated simply as the historical frequency of each transition.
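For large event tables, row-by-row loops become slow. The sketch below shows one possible vectorized way to estimate the same frequencies from flat zone indices. It assumes failed moves count toward a zone's total but not toward any destination, so each row sums to that zone's completion rate. The function name and the six example moves are hypothetical.

```python
import numpy as np


def transition_matrix(start_idx, end_idx, successful, n_zones):
    """Row-normalised transition matrix from flat zone indices.

    Failed moves add to the row total but to no destination, so each row
    sums to that zone's completion rate rather than to 1.
    """
    start_idx = np.asarray(start_idx)
    end_idx = np.asarray(end_idx)
    successful = np.asarray(successful, dtype=bool)
    counts = np.zeros((n_zones, n_zones))
    np.add.at(counts, (start_idx[successful], end_idx[successful]), 1)
    totals = np.bincount(start_idx, minlength=n_zones).astype(float)
    return np.divide(counts, totals[:, None],
                     out=np.zeros_like(counts), where=totals[:, None] > 0)


# Hypothetical moves on a 3-zone pitch
start = [0, 0, 0, 0, 1, 1]
end = [1, 1, 2, 0, 2, 2]
ok = [True, True, True, False, True, False]
T = transition_matrix(start, end, ok, n_zones=3)
print(T)
```

Zone 2 has no recorded moves here, so its row stays at zero rather than producing a division error.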
9.3.4 Solving for xT Values
Use value iteration to solve for xT:
def calculate_xt_grid(shot_prob, goal_prob, move_prob,
                      pass_matrix, carry_matrix,
                      grid_size=(12, 8),
                      max_iterations=100, tolerance=1e-6,
                      pass_share=0.7):
    """
    Calculate xT values using value iteration.

    Parameters
    ----------
    shot_prob : array
        Probability of shooting from each zone
    goal_prob : array
        Probability of scoring given a shot from each zone
    move_prob : array
        Probability of moving the ball (rather than shooting) from each zone
    pass_matrix : array
        Transition matrix for passes
    carry_matrix : array
        Transition matrix for carries
    grid_size : tuple
        Grid dimensions
    max_iterations : int
        Maximum number of iterations
    tolerance : float
        Convergence threshold
    pass_share : float
        Assumed share of moves that are passes (the rest are carries)

    Returns
    -------
    array
        xT values for each zone
    """
    # Initialize with shot values
    xT = shot_prob * goal_prob
    for iteration in range(max_iterations):
        xT_old = xT.copy()
        # Calculate expected value of moving
        pass_value = pass_matrix @ xT
        carry_value = carry_matrix @ xT
        # Fixed pass/carry mix; ideally estimated per zone from the data
        move_value = pass_share * pass_value + (1 - pass_share) * carry_value
        # Update xT
        xT = shot_prob * goal_prob + move_prob * move_value
        # Check convergence
        max_change = np.max(np.abs(xT - xT_old))
        if max_change < tolerance:
            print(f"Converged after {iteration + 1} iterations (max change: {max_change:.8f})")
            break
        if iteration == max_iterations - 1:
            print(f"Warning: Did not converge after {max_iterations} iterations (max change: {max_change:.8f})")
    return xT
The value iteration algorithm works as follows:
- Initialize xT values using only the immediate shooting value:
  xT_0(z) = P(shot|z) * P(goal|shot,z)
- Update each zone's value by adding the expected future value from ball movements:
  xT_{n+1}(z) = P(shot|z) * P(goal|shot,z) + P(move|z) * E[xT_n(destination)]
- Repeat until the maximum change between iterations falls below a tolerance threshold (typically 1e-6)
In practice, convergence typically occurs within 5-15 iterations. The speed of convergence depends on the grid resolution and the degree to which value propagates backward from the goal.
9.3.5 Practical Implementation Details
A complete xT implementation requires careful attention to several details:
def build_complete_xt_model(events_df, grid_size=(12, 8), pitch_dims=(120, 80)):
    """
    Build a complete xT model from raw event data.

    Parameters
    ----------
    events_df : pd.DataFrame
        Event data with columns: type, start_x, start_y, end_x, end_y, outcome
    grid_size : tuple
        Grid dimensions (columns, rows)
    pitch_dims : tuple
        Pitch dimensions in the coordinate system

    Returns
    -------
    dict
        Complete xT model with grid values and metadata
    """
    n_zones = grid_size[0] * grid_size[1]
    # Step 1: Calculate zone-level statistics
    shot_count = np.zeros(n_zones)
    goal_count = np.zeros(n_zones)
    move_count = np.zeros(n_zones)
    total_actions = np.zeros(n_zones)
    for _, event in events_df.iterrows():
        zone = zone_to_index(event['start_x'], event['start_y'], grid_size, pitch_dims)
        if event['type'] == 'Shot':
            shot_count[zone] += 1
            if event.get('outcome') == 'Goal':
                goal_count[zone] += 1
            total_actions[zone] += 1
        elif event['type'] in ['Pass', 'Carry']:
            move_count[zone] += 1
            total_actions[zone] += 1
    # Step 2: Calculate probabilities with smoothing
    # Add small constant to avoid division by zero
    epsilon = 1e-8
    shot_prob = shot_count / (total_actions + epsilon)
    goal_prob = np.divide(goal_count, shot_count, where=shot_count > 0, out=np.zeros_like(goal_count))
    move_prob = move_count / (total_actions + epsilon)
    # Step 3: Build transition matrices
    pass_matrix, carry_matrix = build_transition_matrices(events_df, grid_size)
    # Step 4: Solve for xT
    xT = calculate_xt_grid(shot_prob, goal_prob, move_prob,
                           pass_matrix, carry_matrix, grid_size)
    return {
        'xT': xT,
        'grid_size': grid_size,
        'pitch_dims': pitch_dims,
        'shot_prob': shot_prob,
        'goal_prob': goal_prob,
        'move_prob': move_prob,
        'n_actions': total_actions
    }
9.3.6 Handling Edge Cases
Several situations require special handling:
- Zones with no data: Use smoothing or interpolation from neighbors. This is particularly important for corner zones and deep defensive positions where events are rare. A common approach is to add a small uniform prior to all zone counts before calculating probabilities.
- Own penalty area: Very low values, primarily clearance zones. Some implementations set these to zero since possessions in your own box are almost always defensive situations.
- Unsuccessful actions: Account for turnover probability. When a pass fails, the possession effectively ends for the team in question, so the xT of the destination zone should be zero (or negative, if you want to penalize turnovers). For carries that end in a loss of possession, a common penalty is the xT of the zone where possession was lost, applied as a negative value.
- Set pieces: May require separate treatment due to different spatial dynamics. Corner kicks, for instance, move the ball from a fixed position to the penalty area, creating a transition that does not reflect normal open-play dynamics. Some implementations build separate xT grids for open play and set pieces.
- Defensive actions: The basic xT framework only values offensive ball movements. To incorporate defensive contributions, you would need to model the xT prevented by tackles, interceptions, and blocks, which requires a separate analytical framework.
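As one possible reading of the "small prior" idea for sparse zones, the sketch below shrinks each zone's rate toward the overall rate by adding pseudo-counts. This is additive smoothing toward a global mean, not necessarily the uniform prior any particular xT implementation uses. The `prior_strength` parameter is a hypothetical tuning knob, and the counts are illustrative.

```python
import numpy as np


def smoothed_rate(successes, trials, prior_strength=10.0):
    """Shrink per-zone rates toward the overall rate using pseudo-counts.

    prior_strength is a hypothetical tuning knob: the number of imaginary
    trials, at the overall rate, added to every zone.
    """
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    total = trials.sum()
    overall = successes.sum() / total if total > 0 else 0.0
    return (successes + prior_strength * overall) / (trials + prior_strength)


# Illustrative counts: goals and shots in four zones (zone 3 has no shots)
goals = np.array([0, 2, 30, 0])
shots = np.array([1, 40, 200, 0])
goal_prob = smoothed_rate(goals, shots)
print(goal_prob)
```

Zones with many shots barely move, while the empty zone falls back to the overall conversion rate instead of producing a division by zero.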
Callout: Common Implementation Mistakes
When building your own xT model, watch out for these pitfalls:
- Not separating successful and unsuccessful actions. Unsuccessful passes should not contribute to the transition matrix because the ball does not actually arrive at the "destination" -- it is intercepted or goes out of play.
- Including penalties and free kicks in open-play estimates. Set pieces create artificial transitions that distort the movement probabilities.
- Using too fine a grid. With insufficient data, a 24x16 grid will have many empty zones, creating noisy and unreliable estimates.
- Forgetting to normalize orientation. If some events have coordinates attacking left-to-right and others right-to-left, the model will be nonsensical. Always normalize all events to the same attacking direction.
9.4 Progressive Actions and Their xT Value
9.4.1 Carries vs Passes in the xT Framework
A key distinction in the xT framework is between passes and carries, which contribute differently to ball progression:
Passes move the ball between players across potentially large distances. They are high-risk (interception possible) but can bypass defensive lines entirely. A single pass can generate large xT if it moves the ball from midfield into the penalty area.
Carries (also called dribbles or ball progressions) involve a single player moving with the ball. They are generally lower-risk per meter of progression but slower and limited by the player's physical ability. Carries generate xT incrementally as the player advances.
def decompose_xt_by_action_type(events_df, xt_grid, grid_size=(12, 8)):
    """
    Decompose total xT generation into passes and carries.
    """
    pass_xt = 0
    carry_xt = 0
    pass_count = 0
    carry_count = 0
    for _, event in events_df.iterrows():
        start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
        end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
        xt_added = xt_grid[end_zone] - xt_grid[start_zone]
        if event['type'] == 'Pass' and event.get('successful', True):
            pass_xt += xt_added
            pass_count += 1
        elif event['type'] == 'Carry':
            carry_xt += xt_added
            carry_count += 1
    return {
        'pass_xt_total': pass_xt,
        'carry_xt_total': carry_xt,
        'pass_count': pass_count,
        'carry_count': carry_count,
        'xt_per_pass': pass_xt / pass_count if pass_count > 0 else 0,
        'xt_per_carry': carry_xt / carry_count if carry_count > 0 else 0,
        'pass_share': pass_xt / (pass_xt + carry_xt) if (pass_xt + carry_xt) > 0 else 0
    }
In most leagues, passes account for roughly 60-70% of total positive xT generation, with carries accounting for 30-40%. However, this ratio varies dramatically by player profile. Dribbling-oriented wingers may generate 50% or more of their xT through carries, while deep-lying playmakers generate nearly all of theirs through passes.
The distinction between pass xT and carry xT is analytically important because it reveals different player profiles:
- Pass-dominant xT generators (e.g., Toni Kroos, Luka Modric) advance play through vision and technique
- Carry-dominant xT generators (e.g., Adama Traore, Neymar) advance play through dribbling and physical ability
- Balanced contributors (e.g., Kevin De Bruyne) generate xT through both channels roughly equally
9.4.2 xT from Actions: The Basic Calculation
Once we have an xT grid, we can value individual actions:
xT_added(action) = xT(end_zone) - xT(start_zone)
For example:
- Pass from central midfield (xT=0.02) to edge of box (xT=0.10): xT added = +0.08
- Failed dribble losing possession: xT added = -0.02 (the possession xT lost)
- Backward pass to maintain possession: xT added might be slightly negative but acceptable
- Long ball from defense (xT=0.005) to attacking midfielder (xT=0.04): xT added = +0.035
The simplicity of this calculation is one of xT's great strengths. Once the grid is computed, evaluating any action requires only looking up two values and subtracting.
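The lookup-and-subtract step can be written in a few lines. The sketch below uses a hypothetical four-entry grid holding the example values quoted above. It applies the convention that a failed action loses the xT of the zone where it started. The zone labels and the `xt_added` helper are illustrative.

```python
import numpy as np


def xt_added(xt_grid, start_zone, end_zone, successful=True):
    """Destination xT minus origin xT, or minus origin xT on a turnover."""
    if not successful:
        return -float(xt_grid[start_zone])
    return float(xt_grid[end_zone] - xt_grid[start_zone])


# Hypothetical flat grid holding the example values quoted in the text
DEFENCE, MIDFIELD, ATTACKING_MID, EDGE_OF_BOX = range(4)
xt_grid = np.array([0.005, 0.02, 0.04, 0.10])

pass_to_edge = xt_added(xt_grid, MIDFIELD, EDGE_OF_BOX)    # about +0.08
long_ball = xt_added(xt_grid, DEFENCE, ATTACKING_MID)      # about +0.035
failed_dribble = xt_added(xt_grid, MIDFIELD, EDGE_OF_BOX,
                          successful=False)                # -0.02
print(pass_to_edge, long_ball, failed_dribble)
```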
9.5 Player Valuation Using xT
9.5.1 Calculating Player xT
For each player, sum their xT contributions:
def calculate_player_xt(events_df, xt_grid, grid_size=(12, 8)):
    """Calculate xT added by each player."""
    player_xt = {}
    for _, event in events_df.iterrows():
        player = event['player']
        if pd.isna(player):
            continue
        # Get zones
        start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
        end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
        # Calculate xT added
        xt_added = xt_grid[end_zone] - xt_grid[start_zone]
        # Account for unsuccessful actions
        if not event.get('successful', True):
            xt_added = -xt_grid[start_zone]  # Lost possession value
        # Accumulate
        if player not in player_xt:
            player_xt[player] = {'xt_added': 0, 'actions': 0,
                                 'xt_passes': 0, 'xt_carries': 0,
                                 'pass_count': 0, 'carry_count': 0}
        player_xt[player]['xt_added'] += xt_added
        player_xt[player]['actions'] += 1
        if event['type'] == 'Pass':
            player_xt[player]['xt_passes'] += xt_added
            player_xt[player]['pass_count'] += 1
        elif event['type'] == 'Carry':
            player_xt[player]['xt_carries'] += xt_added
            player_xt[player]['carry_count'] += 1
    return player_xt
9.5.2 xT per 90 Minutes
Normalize by playing time:
def calculate_xt_per_90(player_xt, player_minutes):
    """Normalize xT to per-90-minute rate."""
    results = []
    for player, data in player_xt.items():
        minutes = player_minutes.get(player, 0)
        if minutes >= 450:  # Minimum 5 full matches
            xt_per_90 = data['xt_added'] / (minutes / 90)
            actions_per_90 = data['actions'] / (minutes / 90)
            results.append({
                'player': player,
                'xt_total': data['xt_added'],
                'xt_per_90': xt_per_90,
                'actions': data['actions'],
                'actions_per_90': actions_per_90,
                'xt_per_action': data['xt_added'] / data['actions'],
                'xt_passes_per90': data['xt_passes'] / (minutes / 90),
                'xt_carries_per90': data['xt_carries'] / (minutes / 90),
                'minutes': minutes
            })
    return pd.DataFrame(results).sort_values('xt_per_90', ascending=False)
9.5.3 Decomposing xT by Action Type
Break down contributions to understand player profiles:
def decompose_player_xt(events_df, xt_grid, player_name, grid_size=(12, 8)):
    """Decompose a player's xT into action types."""
    player_events = events_df[events_df['player'] == player_name]
    decomposition = {
        'passes': {'xt': 0, 'count': 0},
        'carries': {'xt': 0, 'count': 0},
        'successful_actions': {'xt': 0, 'count': 0},
        'failed_actions': {'xt': 0, 'count': 0}
    }
    # Explicit mapping: appending 's' to the lowercased type would give
    # 'passs' and 'carrys', which never match the keys above
    type_keys = {'Pass': 'passes', 'Carry': 'carries'}
    for _, event in player_events.iterrows():
        start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
        end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
        if event.get('successful', True):
            xt_added = xt_grid[end_zone] - xt_grid[start_zone]
            decomposition['successful_actions']['xt'] += xt_added
            decomposition['successful_actions']['count'] += 1
        else:
            xt_added = -xt_grid[start_zone]
            decomposition['failed_actions']['xt'] += xt_added
            decomposition['failed_actions']['count'] += 1
        action_type = type_keys.get(event['type'])
        if action_type is not None:
            decomposition[action_type]['xt'] += xt_added
            decomposition[action_type]['count'] += 1
    return decomposition
9.5.4 Team-Level xT Analysis
Aggregate to team level for tactical analysis:
def calculate_team_xt(events_df, xt_grid, grid_size=(12, 8)):
    """Calculate team xT metrics."""
    team_data = {}
    for team in events_df['team'].unique():
        team_events = events_df[events_df['team'] == team]
        xt_total = 0
        by_zone = {}
        for _, event in team_events.iterrows():
            start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
            end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
            if event.get('successful', True):
                xt_added = xt_grid[end_zone] - xt_grid[start_zone]
            else:
                xt_added = -xt_grid[start_zone]
            xt_total += xt_added
            # Track by start zone for spatial analysis; key is (row, column)
            zone_key = (start_zone // grid_size[0], start_zone % grid_size[0])
            if zone_key not in by_zone:
                by_zone[zone_key] = 0
            by_zone[zone_key] += xt_added
        matches = team_events['match_id'].nunique()
        team_data[team] = {
            'xt_total': xt_total,
            'xt_per_match': xt_total / matches,
            'matches': matches,
            'xt_by_zone': by_zone
        }
    return team_data
9.6 Progressive Passes and Carries
9.6.1 Defining Progressive Actions
While xT provides a continuous measure of value added, progressive actions offer a simpler, more interpretable metric. A pass or carry is typically considered "progressive" if it:
Standard Definition (Wyscout):
- Moves the ball at least 25% closer to the opponent's goal
- Or enters the penalty area from outside it

Alternative Definition (distance-based):
- Forward passes of at least 30 meters toward goal
- Or passes into the final third from outside it

FBref/StatsBomb Definition:
- Completed passes that move the ball toward the opponent's goal line by at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area; passes from the defending 40% of the pitch are excluded
def is_progressive_pass(start_x, start_y, end_x, end_y):
    """Determine if a pass is progressive (25% closer to goal rule)."""
    # Distance to goal from start and end
    goal_x, goal_y = 120, 40  # Standard coordinates
    start_dist = np.sqrt((goal_x - start_x)**2 + (goal_y - start_y)**2)
    end_dist = np.sqrt((goal_x - end_x)**2 + (goal_y - end_y)**2)
    # Check if 25% closer
    return end_dist < 0.75 * start_dist


def is_progressive_carry(start_x, start_y, end_x, end_y, min_distance=10):
    """Determine if a carry is progressive."""
    # Must move ball forward significantly
    forward_progress = end_x - start_x
    # Distance to goal check (same as pass)
    goal_x, goal_y = 120, 40
    start_dist = np.sqrt((goal_x - start_x)**2 + (goal_y - start_y)**2)
    end_dist = np.sqrt((goal_x - end_x)**2 + (goal_y - end_y)**2)
    return forward_progress >= min_distance and end_dist < 0.75 * start_dist
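The 25% rule is relative, so the distance needed to qualify shrinks as play gets closer to goal. The sketch below works through three hypothetical actions on a 120x80 pitch. It reports the remaining distance to goal as a fraction of the starting distance, where a value below 0.75 meets the rule. The helper names and coordinates are illustrative.

```python
import math

GOAL = (120.0, 40.0)  # centre of the opponent's goal on a 120x80 pitch


def distance_to_goal(x, y):
    """Straight-line distance from (x, y) to the centre of the goal."""
    return math.hypot(GOAL[0] - x, GOAL[1] - y)


def progress_ratio(start_x, start_y, end_x, end_y):
    """Distance to goal after the action divided by distance before it.

    Undefined when the action starts exactly on the goal centre.
    """
    return distance_to_goal(end_x, end_y) / distance_to_goal(start_x, start_y)


midfield_pass = progress_ratio(60, 40, 80, 40)      # 40 / 60
switch_of_play = progress_ratio(60, 20, 60, 60)     # same distance before and after
short_box_pass = progress_ratio(110, 40, 113, 40)   # 7 / 10
print(midfield_pass, switch_of_play, short_box_pass)
```

A 20-unit pass from the halfway line and a 3-unit pass near the six-yard box both satisfy the rule, while a 40-unit switch of play does not.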
9.6.2 Progressive Passing Metrics
Key metrics for progressive passing:
| Metric | Definition | Typical Values (CM) |
|---|---|---|
| Progressive passes | Count of progressive passes | 5-15 per 90 |
| Progressive pass distance | Total forward distance | 200-400m per 90 |
| Progressive pass % | Progressive / total passes | 5-15% |
| Passes into final third | Passes ending in final third | 3-10 per 90 |
| Passes into penalty area | Passes ending in box | 1-4 per 90 |
9.6.3 Progressive Carrying Metrics
Key metrics for progressive carries:
| Metric | Definition | Typical Values |
|---|---|---|
| Progressive carries | Count of progressive carries | 3-10 per 90 |
| Progressive carry distance | Total forward distance | 100-300m per 90 |
| Carries into final third | Carries ending in final third | 1-5 per 90 |
| Carries into penalty area | Carries ending in box | 0-2 per 90 |
9.6.4 Combining Progression Metrics
Total ball progression combines passes and carries:
def calculate_progression_metrics(events_df, player_name, minutes_played):
    """Calculate comprehensive progression metrics."""
    player_events = events_df[events_df['player'] == player_name]
    passes = player_events[player_events['type'] == 'Pass']
    carries = player_events[player_events['type'] == 'Carry']
    # Progressive passes
    prog_passes = passes.apply(
        lambda r: is_progressive_pass(r['start_x'], r['start_y'],
                                      r['end_x'], r['end_y']), axis=1
    )
    # Progressive carries
    prog_carries = carries.apply(
        lambda r: is_progressive_carry(r['start_x'], r['start_y'],
                                       r['end_x'], r['end_y']), axis=1
    )
    # Calculate distances
    prog_pass_dist = passes[prog_passes].apply(
        lambda r: r['end_x'] - r['start_x'], axis=1
    ).sum()
    prog_carry_dist = carries[prog_carries].apply(
        lambda r: r['end_x'] - r['start_x'], axis=1
    ).sum()
    # Normalize to per 90
    factor = 90 / minutes_played
    return {
        'progressive_passes_90': prog_passes.sum() * factor,
        'progressive_carries_90': prog_carries.sum() * factor,
        'total_progressive_actions_90': (prog_passes.sum() + prog_carries.sum()) * factor,
        'progressive_pass_distance_90': prog_pass_dist * factor,
        'progressive_carry_distance_90': prog_carry_dist * factor,
        'total_progressive_distance_90': (prog_pass_dist + prog_carry_dist) * factor
    }
Callout: xT vs. Progressive Actions -- When to Use Each
Use xT when:
- You need a continuous, granular measure of value added
- You want to compare the value of different actions on the same scale
- You are building a player valuation model or scouting tool
Use progressive actions when:
- You need metrics that are easy to communicate to coaches and scouts
- You want a quick proxy for ball progression ability
- You are working with limited data or computational resources
In practice, xT per 90 and progressive actions per 90 correlate highly (r > 0.8 for most positions), meaning either can serve as a reasonable proxy for the other.
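A quick way to check this relationship on your own data is sketched below. It assumes a player-level table with hypothetical columns `xt_per_90` and `progressive_actions_90`; the six rows are invented purely for illustration and are not real measurements.

```python
import pandas as pd

# Hypothetical per-90 figures for six players (illustrative values only).
players = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'E', 'F'],
    'xt_per_90': [0.12, 0.25, 0.18, 0.31, 0.09, 0.22],
    'progressive_actions_90': [6.0, 11.5, 9.0, 14.0, 4.5, 10.0],
})

# Pearson correlation between the two progression measures.
r = players['xt_per_90'].corr(players['progressive_actions_90'])
print(f"Pearson r = {r:.2f}")
```

With real data, it is probably more informative to compute the correlation separately within each position group, since the claim above is made per position.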
9.7 Comparison with VAEP, EPV, and Other Possession Value Models
9.7.1 VAEP (Valuing Actions by Estimating Probabilities)
VAEP, developed by Tom Decroos and colleagues at KU Leuven, takes a different approach. Instead of using a grid, VAEP uses machine learning to directly predict how each action changes the probability of:
1. Scoring in the next 10 actions
2. Conceding in the next 10 actions
VAEP_value(action) = Delta_P(scoring) - Delta_P(conceding)
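The formula can be made concrete with a small worked example. The sketch below assumes the scoring and conceding probabilities before and after an action have already been estimated by trained classifiers; the function name `vaep_value` and the numbers are hypothetical and only illustrate the arithmetic.

```python
def vaep_value(p_score_before, p_score_after,
               p_concede_before, p_concede_after):
    """Return (offensive, defensive, total) value for one action."""
    # Delta_P(scoring): how much the action raised the chance of scoring.
    offensive = p_score_after - p_score_before
    # -Delta_P(conceding): lowering the chance of conceding is a positive value.
    defensive = -(p_concede_after - p_concede_before)
    return offensive, defensive, offensive + defensive

# Illustrative probabilities only; real ones come from the fitted models.
offensive, defensive, total = vaep_value(
    p_score_before=0.020, p_score_after=0.045,
    p_concede_before=0.010, p_concede_after=0.008,
)
print(f"offensive={offensive:.3f}, defensive={defensive:.3f}, total={total:.3f}")
```

In this example the action is credited both for raising the scoring probability and for slightly lowering the conceding probability.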
Advantages over xT:
- Accounts for action-specific features (pass type, body part, pressure)
- Includes defensive value (preventing conceding)
- Can handle complex sequences
- Values the action itself, not just the positional change
Disadvantages:
- More complex to implement
- Requires more features and training data
- Less interpretable ("black box")
- Harder to explain to non-technical stakeholders
One important distinction between xT and VAEP is how they handle unsuccessful actions. In xT, a failed pass is typically penalized by subtracting the xT of the starting zone, but this penalty is coarse--it does not account for how dangerous the turnover location is for the opponent's counter-attack. VAEP, by contrast, explicitly models the probability of conceding after a turnover, meaning that a failed pass in your own defensive third is penalized far more heavily than one in the opponent's half. This makes VAEP more suitable for evaluating players whose style involves risk-taking in dangerous areas.
VAEP has been particularly successful in academic research and has been adopted by several professional clubs. The SPADL (Soccer Player Action Description Language) framework that accompanies VAEP provides a standardized way to represent soccer events across different data providers.
Callout: When VAEP and xT Disagree
Players who rank significantly differently under VAEP and xT often reveal interesting tactical profiles:
- A player ranking higher in VAEP than xT typically performs context-dependent actions well--they make different decisions under pressure versus in space, and VAEP's richer feature set captures this.
- A player ranking higher in xT than VAEP may be generating large spatial progressions but at a high turnover cost, which xT underweights but VAEP captures through its conceding probability component.
When the two frameworks disagree substantially on a player, it is worth investigating the underlying reasons, as the disagreement itself is analytically informative.
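One simple way to surface such disagreements is to compare percentile ranks under the two frameworks. The sketch below assumes a player table with hypothetical columns `vaep_per_90` and `xt_per_90`; the helper name `rank_disagreement` and the sample values are illustrative only.

```python
import pandas as pd

def rank_disagreement(df, col_a='vaep_per_90', col_b='xt_per_90'):
    """Sort players by how far their percentile ranks differ between two metrics."""
    out = df.copy()
    out['rank_a'] = out[col_a].rank(pct=True)
    out['rank_b'] = out[col_b].rank(pct=True)
    # Positive gap: rated higher by col_a; negative gap: rated higher by col_b.
    out['rank_gap'] = out['rank_a'] - out['rank_b']
    return out.sort_values('rank_gap', key=lambda s: s.abs(), ascending=False)

# Made-up values for four players.
sample = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D'],
    'vaep_per_90': [0.30, 0.10, 0.20, 0.25],
    'xt_per_90': [0.10, 0.30, 0.20, 0.25],
})
gaps = rank_disagreement(sample)
print(gaps[['player', 'rank_gap']])
```

Players at the top of the sorted table are the ones worth a closer look, whichever direction the gap runs.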
9.7.2 EPV (Expected Possession Value)
EPV, developed by Javier Fernandez and Luke Bornn, creates a continuous surface rather than a grid, using tracking data to account for:
- Player positions (both teams)
- Ball location and trajectory
- Space control and pressure
- Available passing options
EPV is generally the most accurate of the possession value models discussed here, but it requires tracking data, which is not always available. It answers the question: "Given the exact positions of all 22 players and the ball, what is the probability that this possession ends in a goal?"
The key advantage of EPV over xT is that it is context-dependent: the same ball position can have very different EPV values depending on where the defenders and attackers are positioned. xT, by contrast, assigns the same value to a zone regardless of the defensive structure around it.
9.7.3 xGChain and xGBuildup
As discussed in Chapter 8, xGChain and xGBuildup provide simpler alternatives that credit all players involved in possessions ending in shots. These metrics share xT's goal of valuing non-endpoint actions but use a fundamentally different approach: rather than calculating positional threat, they credit the xG of the final shot back to the players involved in the possession chain.
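As a reminder of how this backward crediting works, here is a minimal sketch under one common convention: xGChain gives each involved player the possession's total shot xG, and xGBuildup does the same while ignoring shots and key passes. The column names (`possession_id`, `type`, `shot_xg`) and the event rows are hypothetical.

```python
import pandas as pd

# Hypothetical events: one row per on-ball action, shot xG on shot rows only.
events = pd.DataFrame({
    'possession_id': [1, 1, 1, 1, 2, 2],
    'player': ['CB', 'CM', 'AM', 'ST', 'CM', 'ST'],
    'type': ['Pass', 'Pass', 'Key Pass', 'Shot', 'Key Pass', 'Shot'],
    'shot_xg': [0.0, 0.0, 0.0, 0.30, 0.0, 0.10],
})

def xg_chain_and_buildup(events):
    """Credit each possession's shot xG to the players involved in it."""
    # Total xG per possession (zero when the possession produced no shot).
    poss_xg = events.groupby('possession_id')['shot_xg'].sum()

    def credit(rows):
        # One credit per (possession, player) pair, however many touches.
        involved = rows[['possession_id', 'player']].drop_duplicates()
        return involved['possession_id'].map(poss_xg).groupby(involved['player']).sum()

    chain = credit(events)
    # Buildup: same crediting, but shots and key passes do not count as involvement.
    buildup = credit(events[~events['type'].isin(['Shot', 'Key Pass'])])
    return pd.DataFrame({'xg_chain': chain, 'xg_buildup': buildup}).fillna(0.0)

result = xg_chain_and_buildup(events)
print(result)
```

Note that no positional value is computed anywhere: a player's credit depends only on whether the possession ended in a shot, which is the key contrast with xT.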
9.7.4 Comparison of Frameworks
| Framework | Data Required | Complexity | Interpretability | Accuracy | Best For |
|---|---|---|---|---|---|
| xT | Event data | Low | High | Moderate | Quick analysis, scouting |
| VAEP | Event data + features | Medium | Medium | Good | Detailed player evaluation |
| EPV | Tracking data | High | Low | High | Elite-level tactical analysis |
| xGChain / xGBuildup | Event data | Low | High | Limited | Build-up contribution |
| Progressive actions | Event data | Very Low | Very High | Low | Communication, basic scouting |
9.7.5 Choosing a Framework
Select based on your resources and needs:
- xT: Best for quick, interpretable analysis with event data only. Ideal for mid-tier clubs without tracking data or extensive data science resources.
- VAEP: Better accuracy from event data alone, at some cost to interpretability. Suitable for clubs with data science capacity who want more granular player evaluation.
- EPV: Highest accuracy, but only when tracking data is available. Used by elite clubs with comprehensive data infrastructure.
- Progressive actions: Simplest approach, most transparent for communication with coaches and scouts.
Callout: The Practical Reality of Model Choice
In professional soccer analytics, the choice of possession value model often comes down to practical considerations rather than theoretical optimality:
- Data availability. Most clubs outside the top leagues do not have tracking data, ruling out EPV.
- Engineering resources. VAEP requires significant ML infrastructure; xT can be built in an afternoon.
- Communication needs. Coaches and sporting directors need to understand the numbers. xT and progressive actions are far easier to explain than VAEP or EPV.
- Update frequency. xT grids are stable across seasons and can be computed once. VAEP models require retraining as new data arrives.
9.8 Practical Applications
9.8.1 Scouting and Recruitment
xT excels at identifying players who progress the ball effectively:
Use Case: Finding Ball-Progressing Center-Backs
def scout_progressive_defenders(player_data, position='CB'):
    """Find defenders with high ball progression."""
    defenders = player_data[player_data['position'] == position]
    # Key metrics for progressive defenders
    metrics = defenders[[
        'player', 'team',
        'progressive_passes_90',
        'progressive_carries_90',
        'xt_per_90',
        'pass_completion_pct'
    ]].copy()
    # Composite score
    metrics['progression_score'] = (
        metrics['progressive_passes_90'].rank(pct=True) * 0.35 +
        metrics['progressive_carries_90'].rank(pct=True) * 0.25 +
        metrics['xt_per_90'].rank(pct=True) * 0.30 +
        metrics['pass_completion_pct'].rank(pct=True) * 0.10
    )
    return metrics.sort_values('progression_score', ascending=False)
This type of scouting query has become increasingly important as modern tactical systems demand center-backs who can initiate attacks from deep positions. A center-back ranking in the top 10% for xT per 90 among all center-backs is likely a strong ball-progressor who could thrive in a possession-oriented system.
Use Case: Identifying Undervalued Midfielders
Midfielders who generate high xT but low xA may be undervalued by traditional statistics. They consistently advance play into dangerous areas but do not always deliver the final pass. These players are often available at lower transfer fees because their contributions are invisible in headline statistics.
def find_undervalued_midfielders(player_data, min_minutes=1500):
    """
    Find midfielders with high ball progression but low assist numbers.
    These players may be undervalued in the market.
    """
    midfielders = player_data[
        (player_data['position'].isin(['CM', 'CDM', 'DM'])) &
        (player_data['minutes'] >= min_minutes)
    ].copy()
    midfielders['xt_per90'] = midfielders['xt_total'] / (midfielders['minutes'] / 90)
    midfielders['xa_per90'] = midfielders['xA'] / (midfielders['minutes'] / 90)
    # High progression, low direct chance creation
    midfielders['progression_ratio'] = midfielders['xt_per90'] / (midfielders['xa_per90'] + 0.01)
    return midfielders.sort_values('xt_per90', ascending=False)
9.8.2 Tactical Analysis
Analyze how teams build attacks:
Use Case: Comparing Build-Up Patterns
def analyze_buildup_patterns(team_events, xt_grid, grid_size=(12, 8)):
    """Analyze team's xT generation by zone."""
    # Define zones
    zone_xt = {'defensive': 0, 'middle': 0, 'final': 0}
    zone_actions = {'defensive': 0, 'middle': 0, 'final': 0}
    for _, event in team_events.iterrows():
        start_zone = zone_to_index(event['start_x'], event['start_y'], grid_size)
        end_zone = zone_to_index(event['end_x'], event['end_y'], grid_size)
        if event.get('successful', True):
            xt_added = xt_grid[end_zone] - xt_grid[start_zone]
        else:
            xt_added = -xt_grid[start_zone]
        start_x = event['start_x']
        if start_x < 40:
            zone_xt['defensive'] += xt_added
            zone_actions['defensive'] += 1
        elif start_x < 80:
            zone_xt['middle'] += xt_added
            zone_actions['middle'] += 1
        else:
            zone_xt['final'] += xt_added
            zone_actions['final'] += 1
    # Percentage distribution
    total = sum(max(v, 0) for v in zone_xt.values())
    zone_pct = {k: max(v, 0)/total*100 if total > 0 else 0 for k, v in zone_xt.items()}
    return zone_xt, zone_pct, zone_actions
Teams with high defensive-third xT generation are skilled at building from the back. Teams with high middle-third xT generation excel at transitional play. Teams concentrated in the final third may rely on direct play or pressing high to win the ball in advanced positions.
9.8.3 Player Development
Track progression improvements over time:
def track_progression_development(player_name, seasons_data):
    """Track a player's ball progression metrics over seasons."""
    development = []
    for season, data in seasons_data.items():
        metrics = calculate_progression_metrics(
            data['events'],
            player_name,
            data['minutes']
        )
        metrics['season'] = season
        development.append(metrics)
    return pd.DataFrame(development)
Young players who show improving xT and progressive action numbers across consecutive seasons are likely developing their ability to influence the game at higher levels. This trajectory data is valuable for academy player evaluation and loan decisions.
9.8.4 Match Analysis
Use xT to analyze specific matches:
def analyze_match_xt(match_events, xt_grid, grid_size=(12, 8)):
    """Analyze xT generation throughout a match."""
    # By time period
    # Note: events at minute >= 90 (stoppage time) fall outside these bins
    time_periods = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75), (75, 90)]
    analysis = {}
    for start, end in time_periods:
        period_events = match_events[
            (match_events['minute'] >= start) &
            (match_events['minute'] < end)
        ]
        for team in period_events['team'].unique():
            team_events = period_events[period_events['team'] == team]
            xt_total = 0
            for _, event in team_events.iterrows():
                sz = zone_to_index(event['start_x'], event['start_y'], grid_size)
                ez = zone_to_index(event['end_x'], event['end_y'], grid_size)
                if event.get('successful', True):
                    xt_total += xt_grid[ez] - xt_grid[sz]
                else:
                    xt_total -= xt_grid[sz]
            key = f"{team}_{start}-{end}"
            analysis[key] = xt_total
    return analysis
9.9 Visualization Techniques
9.9.1 xT Heatmaps
Visualize the xT grid:
def plot_xt_grid(xt_values, grid_size=(12, 8)):
    """Plot xT grid as heatmap."""
    fig, ax = plt.subplots(figsize=(14, 9))
    # Reshape to grid
    xt_matrix = xt_values.reshape(grid_size[1], grid_size[0])
    # Draw pitch background
    draw_pitch(ax)
    # Overlay heatmap
    extent = [0, 120, 0, 80]
    im = ax.imshow(xt_matrix, extent=extent, origin='lower',
                   cmap='RdYlGn', alpha=0.7, aspect='auto')
    # Colorbar
    cbar = plt.colorbar(im, ax=ax, shrink=0.8)
    cbar.set_label('Expected Threat', fontsize=12)
    ax.set_title('Expected Threat (xT) Grid', fontsize=14)
    return fig
9.9.2 Player xT Maps
Show where players generate xT:
def plot_player_xt_zones(player_events, xt_grid, grid_size=(12, 8)):
    """Plot where a player adds xT."""
    fig, ax = plt.subplots(figsize=(14, 9))
    draw_pitch(ax)
    # Calculate xT added per zone
    zone_xt = {}
    for _, event in player_events.iterrows():
        start_zone = get_zone(event['start_x'], event['start_y'])
        sz_idx = zone_to_index(event['start_x'], event['start_y'], grid_size)
        ez_idx = zone_to_index(event['end_x'], event['end_y'], grid_size)
        if event.get('successful', True):
            xt_added = xt_grid[ez_idx] - xt_grid[sz_idx]
        else:
            xt_added = -xt_grid[sz_idx]
        if start_zone not in zone_xt:
            zone_xt[start_zone] = 0
        zone_xt[start_zone] += xt_added
    # Plot as bubbles
    for zone, xt in zone_xt.items():
        # Convert zone to coords (assumes 10-unit zones, i.e. a 12x8 grid)
        x_center = (zone[0] + 0.5) * 10
        y_center = (zone[1] + 0.5) * 10
        color = 'green' if xt > 0 else 'red'
        size = abs(xt) * 2000
        ax.scatter(x_center, y_center, s=size, c=color, alpha=0.5)
    ax.set_title('xT Generation by Zone', fontsize=14)
    return fig
9.9.3 Progressive Action Maps
Visualize progressive passes and carries:
def plot_progressive_actions(player_events, action_type='Pass'):
    """Plot progressive passes or carries."""
    fig, ax = plt.subplots(figsize=(14, 9))
    draw_pitch(ax)
    actions = player_events[player_events['type'] == action_type]
    # Use the progressive check that matches the action type
    is_progressive = (is_progressive_carry if action_type == 'Carry'
                      else is_progressive_pass)
    for _, action in actions.iterrows():
        is_prog = is_progressive(
            action['start_x'], action['start_y'],
            action['end_x'], action['end_y']
        )
        if is_prog:
            ax.annotate('',
                        xy=(action['end_x'], action['end_y']),
                        xytext=(action['start_x'], action['start_y']),
                        arrowprops=dict(arrowstyle='->', color='blue',
                                        lw=1, alpha=0.7))
    label = 'Carries' if action_type == 'Carry' else 'Passes'
    ax.set_title(f'Progressive {label}', fontsize=14)
    return fig
9.10 Limitations and Considerations
9.10.1 xT Limitations
The basic xT model has several known limitations:
1. Ignores defensive positioning. This is the most significant limitation. A pass into the box is not equally valuable against 2 defenders versus 5. xT assigns the same value to a zone regardless of whether the space is open or heavily defended. This means xT can overvalue actions against deep blocks (where defenders are concentrated in high-xT zones) and undervalue actions in transition (where the defense is disorganized but the ball may be in a lower-xT zone).
2. Context-independent. The same position has the same value regardless of game state. Being in the opponent's box at 0-0 in the 10th minute is valued identically to being there at 3-0 up in the 89th minute, even though the tactical context is completely different.
3. Backward passes undervalued. Necessary possession retention appears negative in xT because backward passes move the ball from higher-xT to lower-xT zones. Consider a midfielder who passes back to a center-back (a negative xT action for the midfielder), after which the center-back plays a progressive pass forward (a positive xT action). The sequence nets out to a modest positive overall, yet the midfielder is debited for a backward pass that was essential for repositioning and creating the subsequent opportunity.
4. Set pieces. Corner kicks don't fit the position-based framework well. A corner kick moves the ball from a relatively low-xT zone (the corner) to a high-xT zone (the penalty area), generating artificial xT that does not reflect the same type of attacking progression as open play.
5. Does not capture off-ball movement. xT measures only on-ball actions. A striker's intelligent run that drags a defender out of position, creating space for a teammate, generates zero xT but may be more valuable than many on-ball actions.
9.10.2 Practical Limitations of Coarse Grid Resolution
The choice of grid resolution introduces a trade-off that deserves explicit attention. When zones are too coarse--as in the common 12x8 grid--actions within the same zone receive zero xT credit even if they meaningfully change the ball's position. For example, in a 12x8 grid each zone covers approximately 10 meters by 10 meters. A carry from the back edge of a zone to the front edge covers nearly 10 meters of forward progression yet generates exactly zero xT because the start and end zones are identical. This "within-zone blindness" systematically undervalues short progressive actions, particularly carries and one-two passing sequences that advance the ball in small increments. In the penalty area, where even 2-3 meters of positional difference can dramatically change the shooting angle and expected goal probability, this problem is especially acute. A player who dribbles from the corner of the box to a central position 8 meters away may cross from one high-xT zone into an adjacent high-xT zone with similar values, receiving minimal credit for what was actually a very valuable action. Analysts should be aware that coarse grids tend to favor players who make long-range progressions (large zone-to-zone jumps) over those who make incremental but cumulatively valuable short progressions.
Callout: Mitigating Grid Resolution Problems
If you are restricted to a coarse grid due to limited data, two practical workarounds can help:
- Interpolation: Instead of using discrete zone lookups, interpolate xT values based on the exact coordinates within each zone. This converts the step-function xT surface into a smooth gradient and ensures that every forward movement receives proportional credit.
- Hybrid metrics: Combine xT with progressive distance metrics, which do not suffer from grid resolution issues. A player's total progressive carry distance captures value that coarse xT misses.
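The interpolation workaround can be sketched with plain bilinear interpolation between zone centers. This is one possible approach, not a prescribed one. The sketch assumes the xT grid is stored as a 2-D NumPy array of shape (rows along y, columns along x) on a 120x80 pitch. The function name `interpolated_xt` and the toy grid are hypothetical.

```python
import numpy as np

def interpolated_xt(xt_grid, x, y, pitch_length=120.0, pitch_width=80.0):
    """Bilinearly interpolate xT at (x, y) from values at the zone centers."""
    n_rows, n_cols = xt_grid.shape
    # Fractional grid position, where integer values fall on zone centers.
    fx = np.clip(x / (pitch_length / n_cols) - 0.5, 0, n_cols - 1)
    fy = np.clip(y / (pitch_width / n_rows) - 0.5, 0, n_rows - 1)
    c0, r0 = int(np.floor(fx)), int(np.floor(fy))
    c1, r1 = min(c0 + 1, n_cols - 1), min(r0 + 1, n_rows - 1)
    tx, ty = fx - c0, fy - r0
    # Blend along x on the two bracketing rows, then blend along y.
    row_lo = (1 - tx) * xt_grid[r0, c0] + tx * xt_grid[r0, c1]
    row_hi = (1 - tx) * xt_grid[r1, c0] + tx * xt_grid[r1, c1]
    return float((1 - ty) * row_lo + ty * row_hi)

# Toy 12x8 grid whose value rises by 0.01 per column toward the opponent's goal.
toy_grid = np.tile(np.linspace(0.01, 0.12, 12), (8, 1))

# A carry from x=61 to x=69 stays inside one zone (columns span 10 units).
discrete_gain = toy_grid[4, int(69 // 10)] - toy_grid[4, int(61 // 10)]
smooth_gain = interpolated_xt(toy_grid, 69, 40) - interpolated_xt(toy_grid, 61, 40)
print(discrete_gain, round(smooth_gain, 4))
```

The discrete lookup credits this carry with nothing, while the interpolated surface gives it a small positive value proportional to the ground covered.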
9.10.3 Extensions and Improvements to Basic xT
Several extensions have been proposed to address xT's limitations:
Context-dependent xT. Rather than a single xT grid, some implementations build separate grids for different game states (open play vs. set pieces, score margin, time period). This partially addresses the context-independence limitation.
Risk-adjusted xT. By incorporating turnover probabilities more explicitly, risk-adjusted xT penalizes high-risk actions more heavily. A player who generates 0.10 xT per successful pass but loses the ball 30% of the time looks different from one who generates 0.08 xT per pass with a 5% turnover rate.
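The comparison in the risk-adjusted example can be worked through numerically. The sketch below is only one simple way to fold turnover risk into a per-attempt value; it assumes a flat penalty of 0.03 xT per lost ball (a hypothetical figure standing in for the start-zone xT), and the function name is illustrative.

```python
def expected_xt_per_attempt(xt_if_successful, turnover_rate, turnover_penalty):
    """Expected xT of one attempted pass, weighting success against turnover."""
    return ((1 - turnover_rate) * xt_if_successful
            - turnover_rate * turnover_penalty)

penalty = 0.03  # hypothetical xT forfeited when the ball is lost

# Ambitious passer: 0.10 xT when it works, but 30% of attempts are lost.
risky = expected_xt_per_attempt(0.10, 0.30, penalty)
# Conservative passer: 0.08 xT when it works, only 5% of attempts are lost.
safe = expected_xt_per_attempt(0.08, 0.05, penalty)
print(f"risky={risky:.4f}, safe={safe:.4f}")
```

Under these assumptions the conservative passer comes out ahead per attempt, even though each of the ambitious passer's successful passes is worth more.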
Directional xT. Rather than a scalar value per zone, directional xT assigns different values depending on the direction of the incoming action. A pass arriving from the wing creates a different threat profile than one arriving from a central position, even if both end in the same zone.
Opponent-adjusted xT. By computing xT grids specific to each opponent's defensive structure, analysts can better estimate the actual threat generated against different defensive styles.
9.10.4 Data Quality Issues
- Coordinate accuracy: Small errors compound across many actions. If the data provider's coordinate system has systematic biases (e.g., consistently placing events 1-2 meters from their true location), xT calculations will be affected.
- Carry detection: Some providers do not capture carries well. If carries are underreported, the xT model will underweight their contribution.
- Definition inconsistencies: "Progressive" varies by provider and by analyst. Always be explicit about which definition you are using.
9.10.5 Interpretation Cautions
- Position matters: A fullback with high xT might just take lots of crosses from wide positions, which inflates their xT through sheer volume rather than genuine quality of progression.
- Role dependence: Defensive midfielders naturally generate less xT because they operate in zones with lower xT differentials. Compare within positions, never across positions.
- Risk profiles: High xT can come with high turnover rates. A player who attempts many ambitious passes will generate high gross xT but may also lose possession frequently. Always look at net xT (the xT gained from successful actions minus the penalties for failed ones).
- Team context: Playing for a possession-heavy team inflates raw xT because there are simply more on-ball actions to accumulate xT from.
9.10.6 Best Practices
- Always normalize by playing time (per 90)
- Compare within positions, not across positions
- Consider xT alongside success rates (do not reward high-risk, low-success play)
- Use multiple metrics together for a full picture
- Account for team playing style and league context
- Separate open-play xT from set-piece xT where possible
- Look at both gross xT (from successful actions) and net xT (including penalties for turnovers)
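The gross versus net distinction, together with per-90 normalization, can be sketched as follows. The sketch assumes a table of already-valued actions with hypothetical columns `xt_added` (negative for failed actions) and `successful`; the sample rows are made up.

```python
import pandas as pd

def gross_and_net_xt_per_90(actions, minutes_played):
    """Return (gross, net) xT per 90 from a table of valued actions."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    factor = 90 / minutes_played
    # Gross: xT added by successful actions only.
    gross = actions.loc[actions['successful'], 'xt_added'].sum()
    # Net: every action, so failed actions contribute their negative values.
    net = actions['xt_added'].sum()
    return gross * factor, net * factor

# Made-up actions from a 45-minute appearance.
actions = pd.DataFrame({
    'xt_added': [0.05, 0.02, -0.03, 0.10, -0.04],
    'successful': [True, True, False, True, False],
})
gross_90, net_90 = gross_and_net_xt_per_90(actions, minutes_played=45)
print(f"gross={gross_90:.2f}, net={net_90:.2f}")
```

A large gap between the two figures is a hint that a player's output comes with a meaningful turnover cost.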
Callout: The Complementary Metric Stack
For comprehensive player evaluation, combine xT with other metrics:
- xT per 90 + xG per 90 + xA per 90 gives a complete attacking picture
- Progressive passes/90 + Progressive carries/90 provides interpretable context for xT numbers
- Pass completion % + xT per action distinguishes efficient progressors from reckless ones
- Defensive actions per 90 rounds out the picture for midfielders and defenders
9.11 Chapter Summary
Expected Threat (xT) and ball progression metrics fill a crucial gap in soccer analytics by valuing actions that occur before shooting opportunities emerge. Key takeaways:
- xT originated from Karun Singh's work formalizing the concept of positional value on the pitch, building on earlier ideas from Sarah Rudd and others
- The zone-based value surface assigns threat values to every pitch position based on the probability of that position leading to a goal
- The Markov chain approach models ball movement as a stochastic process, propagating value backward from the goal through value iteration
- Progressive actions (passes and carries) offer a simpler alternative to xT for measuring ball progression
- Carries and passes contribute differently to xT, revealing distinct player profiles (dribblers vs. passers)
- Player valuation using xT makes visible the contributions of deep-lying playmakers, ball-playing defenders, and other traditionally undervalued positions
- Comparison with VAEP and EPV reveals trade-offs between complexity, accuracy, interpretability, and data requirements
- Practical implementation requires careful handling of grid resolution, data quality, unsuccessful actions, and set pieces
- Limitations include context-independence, inability to capture defensive positioning, and undervaluation of backward passes
- Extensions like context-dependent xT and risk-adjusted xT address some of these limitations
In the next chapter, we will explore Passing Networks and Analysis, building on these concepts to understand team structure and playing patterns through network science approaches.
Key Terminology
| Term | Definition |
|---|---|
| Expected Threat (xT) | Probability that possession in a zone leads to a goal |
| xT Grid | Division of pitch into zones with assigned threat values |
| Transition Matrix | Probability matrix showing ball movement between zones |
| Value Iteration | Algorithm for solving xT values recursively |
| Progressive Pass | Pass that moves ball significantly closer to goal |
| Progressive Carry | Ball carry that advances possession toward goal |
| VAEP | Machine learning framework valuing actions by goal probability change |
| EPV | Continuous possession value surface using tracking data |
| Markov Chain | Stochastic model where future states depend only on the current state |
| Context-Dependent xT | Extension of xT that varies values by game state |
Key Formulas
xT Fundamental Equation: $$xT(z) = P(shot|z) \cdot P(goal|shot,z) + P(move|z) \cdot \sum_{z'} P(z'|z,move) \cdot xT(z')$$
xT Added by an Action: $$xT_{added} = xT(z_{end}) - xT(z_{start})$$
xT for Failed Actions: $$xT_{failed} = -xT(z_{start})$$
xT per 90: $$xT_{per90} = \frac{\sum xT_{added}}{Minutes / 90}$$
Progressive Pass Criterion: $$d_{end} < 0.75 \cdot d_{start}$$
Where $d$ is the distance from the point to the center of the opponent's goal.
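To see the xT fundamental equation in action, the sketch below solves it by value iteration on a toy three-zone pitch. All probabilities are invented for illustration. As a labeled assumption, the transition rows sum to less than 1, with the remainder treated as lost possession.

```python
import numpy as np

# Toy pitch with three zones: 0 = deep, 1 = midfield, 2 = penalty area.
shot_prob = np.array([0.00, 0.05, 0.40])   # P(shot | z)
goal_prob = np.array([0.00, 0.03, 0.15])   # P(goal | shot, z)
move_prob = 1.0 - shot_prob                # P(move | z)
# P(z' | z, move); each row sums to < 1 (the rest is lost possession).
transition = np.array([
    [0.50, 0.30, 0.02],
    [0.20, 0.45, 0.15],
    [0.05, 0.25, 0.40],
])

def solve_xt(shot_prob, goal_prob, move_prob, transition, n_iter=200):
    """Apply the xT equation repeatedly, starting from zero threat everywhere."""
    xt = np.zeros(len(shot_prob))
    for _ in range(n_iter):
        # xT(z) = P(shot|z) P(goal|shot,z) + P(move|z) * sum_z' P(z'|z,move) xT(z')
        xt = shot_prob * goal_prob + move_prob * (transition @ xt)
    return xt

xt = solve_xt(shot_prob, goal_prob, move_prob, transition)
print(np.round(xt, 4))
```

Even though the deep zone never produces a shot directly, it ends up with a positive value, because threat propagates backward through the transition probabilities.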
Further Practice
- Build an xT grid from one full season of StatsBomb data
- Calculate xT per 90 for all midfielders and identify top progressors
- Compare progressive passing leaders with xT leaders--explain differences
- Analyze how a specific team generates xT by zone
- Track a young player's ball progression development over multiple seasons
- Build separate xT grids for home and away matches and compare the differences
- Implement risk-adjusted xT that penalizes turnovers proportional to the xT of the zone where possession was lost
References
- Singh, K. (2019). "Introducing Expected Threat (xT)." Karun.in.
- Rudd, S. (2011). "A Framework for Tactical Analysis and Individual Offensive Production Assessment in Soccer." MIT Sloan Sports Analytics Conference.
- Decroos, T. et al. (2019). "Actions Speak Louder than Goals: Valuing Player Actions in Soccer." KDD Conference.
- Fernandez, J. & Bornn, L. (2018). "Wide Open Spaces: A Statistical Technique for Measuring Space Creation in Professional Soccer." MIT Sloan Sports Analytics Conference.
- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics Conference.
- Sumpter, D. (2019). "Expected Threat." Friends of Tracking, YouTube.
- StatsBomb (2021). "Progressive Passes and Carries." StatsBomb IQ Documentation.