Learning Objectives
- Understand the nuances of measuring possession beyond raw percentages
- Calculate and interpret possession sequence metrics
- Measure territorial control through field position and zone dominance
- Analyze pressing effectiveness and defensive territorial strategies
- Build possession efficiency metrics combining volume and quality
- Compare high-press versus deep-block approaches using data
- Integrate possession metrics with xG and xT for comprehensive team evaluation
In This Chapter
- Introduction
- 11.1 Understanding Possession
- 11.2 Territorial Control
- 11.3 Possession Value Models
- 11.4 Possession Efficiency
- 11.5 Pressing and Possession Regain
- 11.6 Transition Analysis
- 11.7 Game State Effects on Possession Patterns
- 11.8 Visualization
- 11.9 Practical Implementation
- 11.10 Applications and Case Studies
- 11.11 Summary
- References
Chapter 11: Possession and Territorial Control
Introduction
Possession has long been considered a fundamental indicator of team dominance in soccer. The intuition is straightforward: if your team has the ball, the opponent cannot score. This simple logic has driven tactical philosophies from the Total Football of the 1970s Dutch teams through the tiki-taka of Barcelona and Spain in the 2010s. Yet possession statistics alone tell an incomplete story. A team might dominate possession while struggling to create chances, or a counter-attacking side might concede possession deliberately while controlling the most dangerous areas of the pitch.
This chapter develops a comprehensive framework for measuring and analyzing possession and territorial control. We move beyond simple possession percentage to examine how teams control space, where they establish dominance, and how possession translates into attacking threat. By integrating concepts from previous chapters -- particularly Expected Threat (xT) and passing networks -- we build sophisticated metrics that capture the quality and effectiveness of possession rather than merely its quantity.
The distinction between possessing the ball and controlling territory is crucial. A team circulating the ball in their defensive third technically has possession but has ceded territorial control to the opponent pressing high. Conversely, a team with the ball in the opponent's penalty area has both possession and territorial dominance. Our metrics must capture these nuances to provide actionable insights for analysts and coaches.
The evolution of possession analytics mirrors the evolution of soccer tactics. Early statistical analysis simply counted time on the ball. The next generation measured where possession occurred. Today's frontier integrates possession location, speed, direction, and outcome to create holistic pictures of how teams control matches. This chapter traces that evolution and equips you with state-of-the-art tools for each layer of analysis.
11.1 Understanding Possession
11.1.1 Defining Possession: Ball Possession vs. Territorial Possession
Possession seems simple to define but proves surprisingly complex in practice. At the broadest level, we must distinguish between two fundamentally different concepts:
Ball possession refers to which team has the ball. At any given moment during live play, one team controls the ball. The percentage of time (or passes, or touches) each team spends in control is the traditional "possession" statistic displayed on television broadcasts.
Territorial possession refers to which team dominates specific areas of the pitch. A team can have territorial dominance without having the ball -- for example, a pressing team whose players occupy advanced positions while the opposition goalkeeper takes a goal kick. Territorial possession is about spatial control, not ball control.
The distinction matters because the two concepts can diverge dramatically. A team playing out from the back under intense pressure may have ball possession but zero territorial dominance -- their possession is occurring entirely in their own half under duress. Understanding which type of possession you are measuring, and which is more relevant to your analytical question, is essential.
11.1.2 Possession Percentage: Calculation Methods and Their Differences
Different organizations calculate possession differently, and the differences are not trivial:
Time-Based Possession: The proportion of match time each team has the ball.
$$\text{Possession}_A = \frac{\text{Time with ball}_A}{\text{Total playing time}} \times 100\%$$
Time-based possession requires tracking the exact moment possession changes, which is straightforward with tracking data but can be approximate with event data. The advantage of time-based possession is that it accounts for the actual duration of control -- a team that holds the ball for 30 seconds on each possession is contributing more than one that loses it after 5 seconds, even if both complete the same number of passes.
Pass-Based Possession: The proportion of successful passes each team completes.
$$\text{Possession}_A = \frac{\text{Passes}_A}{\text{Passes}_A + \text{Passes}_B} \times 100\%$$
Pass-based possession is the most common method in event data analysis because it is easy to compute and does not require timing information. However, it inflates the possession of teams that play many short passes and underestimates teams that use fewer, longer passes. A team playing 300 short passes in their own half would appear to have more possession than a team playing 200 forward passes that generate more attacking threat.
Touch-Based Possession: The proportion of ball touches by each team.
$$\text{Possession}_A = \frac{\text{Touches}_A}{\text{Touches}_A + \text{Touches}_B} \times 100\%$$
Touch-based possession counts every interaction with the ball -- passes, dribbles, shots, clearances, and so on. This is more inclusive than pass-based possession and captures activities like dribbling and carrying that pass counts miss.
These methods yield slightly different results, though they typically correlate strongly (r > 0.90). Event data providers like StatsBomb and Opta use variations of these approaches. For our analysis, we primarily use pass-based possession as it aligns naturally with event data, but we note the alternatives where relevant.
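To make the gap between the definitions concrete, the following minimal sketch computes pass-based and touch-based possession from a list of (team, event type) pairs. The helper name `possession_shares`, the toy data, and the set of event types counted as touches are assumptions made for illustration; they are not any provider's official definition.

```python
from collections import Counter

# Assumption: event types treated as "touches" in this sketch only.
TOUCH_TYPES = {"Pass", "Carry", "Dribble", "Shot", "Clearance"}


def possession_shares(events, team_a, team_b):
    """Return pass-based and touch-based possession (%) for team_a.

    events: list of (team, event_type) tuples.
    """
    passes = Counter(team for team, etype in events if etype == "Pass")
    touches = Counter(team for team, etype in events if etype in TOUCH_TYPES)

    def share(counts):
        total = counts[team_a] + counts[team_b]
        return 100.0 * counts[team_a] / total if total else float("nan")

    return {"pass_based": share(passes), "touch_based": share(touches)}


# Toy data: team A passes more, team B carries and dribbles more.
toy_events = (
    [("A", "Pass")] * 6 + [("A", "Carry")] * 2
    + [("B", "Pass")] * 4 + [("B", "Carry")] * 4 + [("B", "Dribble")] * 4
)
shares = possession_shares(toy_events, "A", "B")
print(shares)
```

In this toy match team A leads on passes but trails on touches, which is the kind of divergence that makes it important to state which definition a reported figure uses.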
Common Pitfall: TV broadcast possession statistics often use a proprietary blend of time-based and touch-based methods that may not match what you compute from event data. Do not assume that your calculated possession percentage will exactly match the broadcast graphic. Differences of 2-3 percentage points are common and reflect methodological choices, not errors.
Intuition: Possession in soccer is like time of possession in American football -- useful context, but far from the whole story. A team that holds the ball for 70% of the match in their own half is not dominating; they are likely under pressure and recycling cautiously. What matters is not how much you have the ball, but what you do with it and where you have it. This chapter teaches you to measure the "what" and "where" alongside the "how much."
11.1.3 The Possession Paradox
Research has shown that the relationship between possession and match outcomes is nuanced:
- Positive correlation with points: Teams with higher average possession generally finish higher in league tables
- Diminishing returns: Beyond approximately 55-60% possession, additional possession shows decreasing correlation with winning
- Style dependency: Counter-attacking teams can be highly successful with low possession
- Context sensitivity: Possession value depends on where the ball is and what happens with it
This paradox motivates our exploration of possession quality metrics beyond raw percentages.
Real-World Application: The 2018 World Cup provided a striking example of the possession paradox. Germany, the defending champions and one of the most possession-oriented teams in the tournament, were eliminated in the group stage despite averaging 68% possession. South Korea, who beat them 2-0 in the final group match, did so with well under 40% of the ball. This result underscored that possession without penetration is not dominance -- it is sterile control.
The academic literature confirms the paradox across multiple leagues and seasons. Collet (2013) found that possession has a positive but moderate correlation with league points in the top five European leagues -- roughly r = 0.4, meaning possession explains only about 16% of the variation in points. Other factors -- shooting quality, defensive organization, set pieces, and luck -- account for far more. This statistical reality does not diminish the value of possession, but it reframes it: possession is one ingredient of success, not a guarantee.
11.1.4 The Myth of "Sterile Possession"
The phrase "sterile possession" describes possession that circulates the ball without creating attacking threat. It is one of the most commonly invoked critiques in tactical analysis, but it deserves careful examination.
Sterile possession can be identified quantitatively. If a team has high possession percentage but low xG generation, low field tilt, and few entries into the final third, their possession is genuinely failing to create threat. However, there are important caveats:
Possession as defense. Even sterile possession prevents the opponent from attacking. A team holding 65% possession is limiting the opponent to 35% -- fewer opportunities for the other side to create chances. This defensive benefit of possession is real and measurable: teams with higher possession tend to face fewer shots, even if their own attacking output is modest.
Patience vs. sterility. High-possession teams that play patiently, probing for openings before accelerating into the final third, may appear "sterile" during long periods of build-up play but then produce high-quality chances when they eventually penetrate. Judging possession quality requires looking at outcomes over the full match, not just isolated sequences.
Opponent quality effects. What appears to be sterile possession may actually be excellent defending by the opponent. When a strong defensive team limits a possession-oriented side to circulating the ball in non-threatening areas, the blame lies partly with the defense's quality, not solely with the attack's inefficiency.
Best Practice: Before labeling a team's possession as "sterile," check their xG per possession, final third entries, and field tilt. True sterile possession produces low values across all these metrics. If xG per possession is normal but total xG is low, the team may simply have fewer possessions than usual (perhaps because the opponent is also holding the ball effectively).
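The checklist above can be turned into a simple rule. The sketch below flags a match as sterile only when possession is high and every threat indicator is low. The function name and all four thresholds are hypothetical values chosen for illustration; in practice they would need to be calibrated against league-wide distributions.

```python
def flag_sterile_possession(
    possession_pct,
    xg_per_possession,
    final_third_entries,
    field_tilt,
    *,
    min_possession=55.0,   # assumed cutoff for "high possession"
    xg_floor=0.010,        # assumed cutoff for low xG per possession
    entries_floor=30,      # assumed cutoff for few final-third entries
    tilt_floor=0.25,       # assumed cutoff for low field tilt
):
    """Return True only if possession is high AND all threat metrics are low."""
    high_possession = possession_pct >= min_possession
    low_threat = (
        xg_per_possession < xg_floor
        and final_third_entries < entries_floor
        and field_tilt < tilt_floor
    )
    return high_possession and low_threat


# High possession, weak on every indicator: flagged.
sterile = flag_sterile_possession(65, 0.005, 20, 0.18)
# Same match but with normal xG per possession: not flagged.
not_sterile = flag_sterile_possession(65, 0.020, 20, 0.18)
```

Requiring all indicators to be low is a deliberately conservative design choice: a team with normal xG per possession but low totals is not flagged, in line with the caveat about simply having fewer possessions.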
11.1.5 Possession Sequences
A possession sequence represents a continuous period of team control, ending when:
- The opposing team gains possession
- Play stops (out of bounds, foul, etc.)
- A goal is scored
Key sequence metrics include:
Sequence Length: Number of passes or events in the sequence
$$L_s = \text{count of events in sequence } s$$
Sequence Duration: Time elapsed during the sequence
$$D_s = t_{end} - t_{start}$$
Sequence Progression: Net movement toward goal
$$P_s = x_{final} - x_{initial}$$
Sequence Speed: Rate of progression toward goal
$$S_s = \frac{P_s}{D_s}$$
Sequence Directness: Ratio of net progression to total distance covered
$$\text{Directness}_s = \frac{x_{final} - x_{initial}}{\sum |x_{i+1} - x_i|}$$
A directness value close to 1 means the ball moved almost entirely forward; a value close to 0 means extensive lateral and backward passing relative to net progress.
import pandas as pd
import numpy as np
from statsbombpy import sb


def identify_possession_sequences(events_df, team_name):
    """
    Identify possession sequences for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze

    Returns
    -------
    list of DataFrame
        Each DataFrame is one possession sequence
    """
    # Sort chronologically. 'period' goes first (when present) so that
    # first-half stoppage time does not interleave with the second half.
    sort_cols = [c for c in ['period', 'minute', 'second', 'index']
                 if c in events_df.columns]
    events = events_df.sort_values(sort_cols).reset_index(drop=True)

    sequences = []
    current_sequence = []
    current_team = None

    # Note: any event credited to the other team (including defensive
    # actions such as pressures) is treated as a change of possession.
    for _, event in events.iterrows():
        event_team = event['team']

        # Possession change or play stoppage
        if event_team != current_team or event['type'] in ['Half Start', 'Half End']:
            if current_sequence and current_team == team_name:
                sequences.append(pd.DataFrame(current_sequence))
            current_sequence = []
            current_team = event_team

        current_sequence.append(event)

    # Final sequence
    if current_sequence and current_team == team_name:
        sequences.append(pd.DataFrame(current_sequence))

    return sequences


def analyze_sequence(sequence_df):
    """
    Calculate metrics for a single possession sequence.

    Parameters
    ----------
    sequence_df : DataFrame
        Events in the sequence

    Returns
    -------
    dict
        Sequence metrics
    """
    # Basic counts
    n_events = len(sequence_df)
    n_passes = len(sequence_df[sequence_df['type'] == 'Pass'])

    # Duration in seconds between the first and last event
    first_event = sequence_df.iloc[0]
    final_event = sequence_df.iloc[-1]
    start_time = first_event['minute'] * 60 + first_event.get('second', 0)
    end_time = final_event['minute'] * 60 + final_event.get('second', 0)
    duration = end_time - start_time

    # Spatial progression along the x-axis (toward the opponent's goal)
    start_loc = first_event.get('location')
    end_loc = final_event.get('location')

    if isinstance(start_loc, list) and isinstance(end_loc, list):
        progression = end_loc[0] - start_loc[0]
        start_x = start_loc[0]
        end_x = end_loc[0]
    else:
        progression = 0
        start_x = end_x = None

    # Outcome
    ends_in_shot = final_event['type'] == 'Shot'
    ends_in_goal = ends_in_shot and final_event.get('shot_outcome') == 'Goal'

    return {
        'n_events': n_events,
        'n_passes': n_passes,
        'duration': duration,
        'progression': progression,
        'start_x': start_x,
        'end_x': end_x,
        'ends_in_shot': ends_in_shot,
        'ends_in_goal': ends_in_goal
    }
Common Pitfall: When analyzing possession sequences, be careful about how you define sequence boundaries. Different data providers use different rules for when a possession ends. Some count a deflection as a possession change, while others do not. Ensure consistency within your analysis and document your definition explicitly.
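The sequence speed and directness formulas defined earlier in this section can be sketched directly from the ordered x-coordinates of a sequence's events. The function name `sequence_speed_and_directness` and the toy coordinates are assumptions for illustration; coordinates are taken to be on a StatsBomb-style 0-120 scale.

```python
def sequence_speed_and_directness(x_positions, duration_seconds):
    """Compute P_s, S_s and Directness_s for one sequence.

    x_positions: ordered x-coordinates of the sequence's events.
    duration_seconds: D_s, time between the first and last event.
    """
    if len(x_positions) < 2:
        return {"progression": 0.0, "speed": None, "directness": None}

    # P_s = x_final - x_initial
    progression = x_positions[-1] - x_positions[0]
    # Sum of |x_{i+1} - x_i| over consecutive events
    path_length = sum(abs(b - a) for a, b in zip(x_positions, x_positions[1:]))

    speed = progression / duration_seconds if duration_seconds > 0 else None
    directness = progression / path_length if path_length > 0 else None
    return {"progression": progression, "speed": speed, "directness": directness}


# Toy sequence: forward, a short backward pass, then a long forward ball.
metrics = sequence_speed_and_directness([30, 45, 40, 70], duration_seconds=10)
print(metrics)
```

Here the ball travels 50 units along the x-axis in total to gain 40, so one backward pass is enough to pull directness below 1. Returning `None` for zero-duration or zero-movement sequences avoids division by zero without inventing a value.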
11.1.6 Build-Up Play Analysis
Build-up play -- the phase where a team constructs an attack from their own half -- is a critical component of possession that deserves dedicated analysis. Teams vary enormously in how they build attacks: some play short from the goalkeeper, others go long to a target forward, and many use a combination depending on opposition pressing intensity.
Key metrics for build-up play analysis include:
Build-up origin: Where possessions begin (goal kick, throw-in, open play recovery)
Goalkeeper involvement: How often and how the goalkeeper participates in build-up. Short distributions, long kicks, and throws each indicate different tactical approaches.
Build-up speed: Time from possession start to reaching the middle third. Faster build-up suggests a more direct approach; slower build-up suggests patient construction.
Build-up route: Whether the team builds through the center, through the flanks, or uses switches of play. This can be measured by the y-coordinate variance of passes during the build-up phase.
Real-World Application: Manchester City under Pep Guardiola are famous for building from the back. Analysis of their possession sequences reveals that they average 4-5 passes before crossing the halfway line, compared to 2-3 for more direct teams. Their build-up also shows significantly higher y-coordinate variance, indicating frequent switches of play designed to find the weak side of the opposition press.
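Two of the build-up metrics described in this section, the number of passes needed to reach the halfway line and the y-coordinate variance of the build-up, can be sketched as follows. The input format (a list of passes with `start` and `end` coordinates), the function name, and the choice to measure variance on pass end locations are all assumptions for this example.

```python
from statistics import pvariance

HALFWAY_X = 60.0  # assumes a StatsBomb-style pitch with x in [0, 120]


def buildup_profile(passes):
    """Summarise the build-up phase of one possession.

    passes: ordered list of dicts with 'start' and 'end' as (x, y) tuples.
    Counts passes played from the own half up to and including the one
    that first reaches the opposition half.
    """
    n_passes = 0
    end_ys = []
    for p in passes:
        if p["start"][0] >= HALFWAY_X:
            break  # ball is already in the opposition half
        n_passes += 1
        end_ys.append(p["end"][1])
        if p["end"][0] >= HALFWAY_X:
            break  # this pass crossed the halfway line
    return {
        "passes_to_halfway": n_passes,
        "y_variance": pvariance(end_ys) if end_ys else None,
    }


# Toy possession: two switches of play, then a pass across halfway.
toy_passes = [
    {"start": (10, 40), "end": (20, 10)},
    {"start": (20, 10), "end": (35, 70)},
    {"start": (35, 70), "end": (65, 40)},
    {"start": (65, 40), "end": (90, 40)},
]
profile = buildup_profile(toy_passes)
print(profile)
```

Averaging these two values over all of a team's possessions that start in their own half would give a rough build-up profile that can be compared across teams.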
11.2 Territorial Control
11.2.1 Defining Territory
Territorial control measures where on the pitch teams establish dominance. Unlike possession, which is binary (one team has the ball), territory can be shared or contested. We measure territory through:
- Average field position: Where events occur
- Zone dominance: Proportion of actions in each pitch zone
- Spatial control models: Probabilistic ownership of pitch areas
11.2.2 Field Tilt and Territorial Dominance Metrics
Average X Position: The mean horizontal position of a team's actions
$$\bar{X}_{team} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $x_i$ is the x-coordinate of each action.
Field Tilt: The proportion of touches in the attacking third
$$\text{Tilt} = \frac{\text{Touches in attacking third}}{\text{Total touches}}$$
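Field Tilt is one of the most underrated metrics in soccer analytics. It is simple to calculate, easy to explain to non-technical audiences, and strongly correlated with match dominance. A team with a Field Tilt above 0.40 (40% of touches in the attacking third) is exerting significant territorial pressure. Most top teams average 0.30-0.35, so values above 0.40 indicate exceptional attacking dominance in a given match. Note that some analysts instead define field tilt as a team's share of both teams' combined final-third touches or passes; figures computed that way are not directly comparable with the single-team definition used here.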
Territorial Index: Comparison of average positions between teams
$$TI = \frac{\bar{X}_{team} - \bar{X}_{opponent}}{120} + 0.5$$
Values above 0.5 indicate territorial advantage. The Territorial Index is especially useful for comparing the two sides in a specific match because it directly captures the spatial battle between them.
Defensive Line Height: The average x-position of a team's defensive actions indicates how high they defend. A high defensive line (average defensive action x > 50) indicates an aggressive territorial approach, while a low line (x < 40) indicates a deep-block approach. This metric can be derived from the location of tackles, interceptions, and pressures.
Advanced: For a more sophisticated measure of territorial control, consider spatial models built on tracking data. The simplest is a Voronoi tessellation, which assigns every point on the pitch to the nearest player. Pitch control models (pioneered by William Spearman) refine this idea by accounting for player velocities and the time each player needs to reach a location. Both approaches produce a continuous "control map" showing which team dominates each area. Tracking data is required, but the resulting maps provide the most accurate picture of spatial dominance available.
def calculate_field_position(events_df, team_name):
    """
    Calculate field position metrics for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze

    Returns
    -------
    dict
        Field position metrics
    """
    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    # Extract coordinates
    x_coords = []
    y_coords = []
    for loc in team_events['location']:
        if isinstance(loc, list) and len(loc) >= 2:
            x_coords.append(loc[0])
            y_coords.append(loc[1])

    if not x_coords:
        return None

    x_coords = np.array(x_coords)
    y_coords = np.array(y_coords)

    # Metrics
    avg_x = np.mean(x_coords)
    avg_y = np.mean(y_coords)

    # Field tilt (attacking third is x > 80 on the 120-unit pitch length)
    attacking_third = np.sum(x_coords > 80) / len(x_coords)
    middle_third = np.sum((x_coords >= 40) & (x_coords <= 80)) / len(x_coords)
    defensive_third = np.sum(x_coords < 40) / len(x_coords)

    return {
        'avg_x': avg_x,
        'avg_y': avg_y,
        'attacking_third': attacking_third,
        'middle_third': middle_third,
        'defensive_third': defensive_third,
        'field_tilt': attacking_third
    }
Common Pitfall: When calculating territorial metrics, remember that StatsBomb coordinates always orient the attacking direction to the right (x = 120). This means the attacking third for a team is always x > 80, regardless of which end they physically defend. Other data providers may not normalize coordinates this way, so always verify the coordinate convention before analysis.
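The Territorial Index and Defensive Line Height from Section 11.2.2 can be sketched in a few lines. Both function names are hypothetical, and the inputs are assumed to be x-coordinates already normalized so that each team attacks toward x = 120. The 50 and 40 cutoffs are the ones quoted in the text.

```python
def territorial_index(team_x, opponent_x, pitch_length=120.0):
    """TI = (mean x of team - mean x of opponent) / pitch_length + 0.5."""
    mean_team = sum(team_x) / len(team_x)
    mean_opp = sum(opponent_x) / len(opponent_x)
    return (mean_team - mean_opp) / pitch_length + 0.5


def defensive_line_label(defensive_action_x, high=50.0, low=40.0):
    """Classify line height from the mean x of defensive actions."""
    mean_x = sum(defensive_action_x) / len(defensive_action_x)
    if mean_x > high:
        return "high"
    if mean_x < low:
        return "low"
    return "medium"


# Toy match: our actions average x = 66, the opponent's average x = 54.
ti_team = territorial_index([60, 72, 66], [50, 58, 54])
ti_opp = territorial_index([50, 58, 54], [60, 72, 66])
line = defensive_line_label([55, 48, 53])
```

By construction the two teams' index values sum to 1, so a value of 0.6 for one side implies 0.4 for the other.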
11.2.3 Zone-Based Analysis
Dividing the pitch into zones enables detailed territorial analysis:
def calculate_zone_control(events_df, team_name, n_x=6, n_y=3):
    """
    Calculate zone control percentages.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    n_x : int
        Number of horizontal zones
    n_y : int
        Number of vertical zones

    Returns
    -------
    ndarray
        Zone control matrix (proportion of actions in each zone)
    """
    zone_counts = np.zeros((n_y, n_x))

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    for loc in team_events['location']:
        if not isinstance(loc, list):
            continue
        x, y = loc[0], loc[1]

        # Convert to zone indices
        zone_x = min(int(x / 120 * n_x), n_x - 1)
        zone_y = min(int(y / 80 * n_y), n_y - 1)
        zone_counts[zone_y, zone_x] += 1

    # Normalize to proportions
    total = zone_counts.sum()
    if total > 0:
        zone_counts = zone_counts / total

    return zone_counts


def compare_zone_control(events_df, team1, team2, n_x=6, n_y=3):
    """
    Compare zone control between two teams.

    Returns matrix where positive values indicate team1 dominance,
    negative values indicate team2 dominance.
    """
    zone1 = calculate_zone_control(events_df, team1, n_x, n_y)
    zone2 = calculate_zone_control(events_df, team2, n_x, n_y)
    return zone1 - zone2
11.2.4 Spatial Control Models
Advanced spatial control models estimate the probability of each team controlling any point on the pitch. These typically require tracking data but can be approximated from event data:
$$P(control | x, y) = \sigma\left(\sum_{i} w_i \cdot K(x, y, x_i, y_i)\right)$$
where $\sigma$ is the logistic (sigmoid) function, $K$ is a kernel function (often Gaussian) centered on the event location $(x_i, y_i)$, and $w_i$ are weights based on event importance.
from scipy.ndimage import gaussian_filter


def estimate_spatial_control(events_df, team_name, grid_size=(12, 8), sigma=1.5):
    """
    Estimate spatial control using kernel density estimation.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    grid_size : tuple
        (n_x, n_y) grid dimensions
    sigma : float
        Gaussian smoothing parameter (in grid cells)

    Returns
    -------
    ndarray
        Control map scaled to [0, 1] (relative intensity, not a
        calibrated probability)
    """
    n_x, n_y = grid_size
    control_map = np.zeros((n_y, n_x))

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    # Weight by event type
    event_weights = {
        'Pass': 1.0,
        'Carry': 1.0,
        'Shot': 2.0,
        'Dribble': 1.5,
        'Ball Receipt*': 0.5,
        'Pressure': 0.5
    }

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue
        x, y = loc[0], loc[1]

        zone_x = min(int(x / 120 * n_x), n_x - 1)
        zone_y = min(int(y / 80 * n_y), n_y - 1)
        weight = event_weights.get(event['type'], 1.0)
        control_map[zone_y, zone_x] += weight

    # Apply Gaussian smoothing
    control_map = gaussian_filter(control_map, sigma=sigma)

    # Normalize to [0, 1]
    if control_map.max() > 0:
        control_map = control_map / control_map.max()

    return control_map
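For comparison, the kernel formula at the start of this section can also be evaluated directly rather than through a smoothed histogram. The sketch below is one possible reading of that formula and rests on an assumption not stated in the text: the team's events carry positive weights and the opponent's events carry negative weights, so that the sigmoid returns 0.5 where neither side has an edge. The function name, the bandwidth, and the toy events are illustrative.

```python
import numpy as np


def kernel_control(points, weights, grid_x, grid_y, bandwidth=10.0):
    """Evaluate sigmoid(sum_i w_i * K(x, y, x_i, y_i)) on a grid.

    points: (n, 2) event locations; weights: (n,) signed event weights
    (assumed positive for the team, negative for the opponent).
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)          # each (n_y, n_x)

    # Squared distance from every grid point to every event: (n_y, n_x, n)
    dx = gx[..., None] - pts[:, 0]
    dy = gy[..., None] - pts[:, 1]
    kernel = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * bandwidth ** 2))

    score = (kernel * w).sum(axis=-1)
    return 1.0 / (1.0 + np.exp(-score))


# Toy example: one team shot near the opponent's box, one opponent shot
# near ours, evaluated at three points along the central channel.
control = kernel_control(
    points=[(100, 40), (20, 40)],
    weights=[2.0, -2.0],
    grid_x=[20, 60, 100],
    grid_y=[40],
)
```

The bandwidth plays the same role as `sigma` in the grid-based version, but is expressed in pitch units rather than grid cells.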
11.3 Possession Value Models
11.3.1 Beyond Simple Possession: Linking Possession to Expected Outcomes
Raw possession percentage fails to capture possession quality. A team with 70% possession but never entering the final third generates less threat than a team with 40% possession but consistently reaching dangerous areas. Possession value models address this by weighting possession by its threat-generating potential.
The key insight driving modern possession value frameworks is that not all possession is created equal. Possessing the ball on the halfway line generates negligible threat; possessing it at the edge of the penalty area generates significant threat. By applying location-based value models -- such as Expected Threat (xT) from Chapter 9 -- to possession data, we create metrics that capture the quality of possession alongside its quantity.
11.3.2 Effective Possession: Possession in Dangerous Areas
We can weight possession by the Expected Threat values of locations controlled:
$$\text{xT Possession} = \sum_{e \in events} xT(x_e, y_e) \cdot w_e$$
where $w_e$ is an event weight (e.g., duration or importance).
def calculate_xt_possession(events_df, team_name, xt_grid):
    """
    Calculate xT-weighted possession.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    xt_grid : ndarray
        xT values for each zone

    Returns
    -------
    dict
        xT possession metrics
    """
    grid_y, grid_x = xt_grid.shape

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    total_xt = 0
    n_events = 0

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue
        x, y = loc[0], loc[1]

        zone_x = min(int(x / 120 * grid_x), grid_x - 1)
        zone_y = min(int(y / 80 * grid_y), grid_y - 1)

        total_xt += xt_grid[zone_y, zone_x]
        n_events += 1

    return {
        'total_xt_possession': total_xt,
        'avg_xt_possession': total_xt / n_events if n_events > 0 else 0,
        'n_events': n_events
    }
11.3.3 Dangerous Possession
We can define "dangerous possession" as possession in high-xT zones:
$$\text{Dangerous Possession \%} = \frac{\text{Events where } xT > \theta}{\text{Total events}}$$
The threshold $\theta$ is a design choice. A common value is 0.05 -- meaning zones from which a possession is expected to end in a goal more than 5% of the time. This typically corresponds to areas inside or near the penalty area and in central positions in the attacking third.
def calculate_dangerous_possession(events_df, team_name, xt_grid, threshold=0.05):
    """
    Calculate proportion of possession in dangerous areas.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    xt_grid : ndarray
        xT values
    threshold : float
        xT threshold for "dangerous"

    Returns
    -------
    float
        Proportion of dangerous possession
    """
    grid_y, grid_x = xt_grid.shape

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    dangerous = 0
    total = 0

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue
        x, y = loc[0], loc[1]

        zone_x = min(int(x / 120 * grid_x), grid_x - 1)
        zone_y = min(int(y / 80 * grid_y), grid_y - 1)

        total += 1
        if xt_grid[zone_y, zone_x] > threshold:
            dangerous += 1

    return dangerous / total if total > 0 else 0
Intuition: Dangerous possession percentage tells you what fraction of a team's ball touches occur in threatening positions. Two teams can both have 50% possession, but if one has 12% dangerous possession and the other has 5%, the first team is creating significantly more threat from a similar amount of the ball. This is a far more informative metric than raw possession percentage for understanding match dynamics.
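The same zone lookup recurs in several functions in this section, so it can be convenient to vectorize it. The sketch below is one way to do that with NumPy. The helper names are hypothetical, and the 3 x 4 xT grid contains made-up values used purely to illustrate the calculation; a real analysis would use a fitted xT grid.

```python
import numpy as np


def zone_indices(x, y, grid_shape, pitch=(120.0, 80.0)):
    """Map coordinate arrays to (row, col) zone indices on the grid.

    Assumes coordinates lie within the pitch (0 <= x <= 120, 0 <= y <= 80).
    """
    n_y, n_x = grid_shape
    zx = np.minimum((np.asarray(x) / pitch[0] * n_x).astype(int), n_x - 1)
    zy = np.minimum((np.asarray(y) / pitch[1] * n_y).astype(int), n_y - 1)
    return zy, zx


def dangerous_share(x, y, xt_grid, threshold=0.05):
    """Fraction of events located in zones whose xT exceeds the threshold."""
    zy, zx = zone_indices(x, y, xt_grid.shape)
    values = xt_grid[zy, zx]
    return float((values > threshold).mean()) if values.size else 0.0


# Made-up xT grid (rows = pitch width, columns = pitch length).
toy_grid = np.array([
    [0.01, 0.02, 0.04, 0.10],
    [0.01, 0.03, 0.06, 0.30],
    [0.01, 0.02, 0.04, 0.10],
])
xs = [10, 50, 70, 100, 115]
ys = [40, 40, 40, 40, 10]
share = dangerous_share(xs, ys, toy_grid)
```

Three of the five toy events fall in zones above the 0.05 threshold, so the dangerous possession share is 0.6.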
11.3.4 Possession-Adjusted Metrics
Many metrics benefit from possession adjustment to enable fair comparison between possession-dominant and counter-attacking teams:
$$\text{Metric per 100 possessions} = \frac{\text{Metric}}{\text{Total possessions}} \times 100$$
Without possession adjustment, high-possession teams will naturally produce higher raw totals for most attacking metrics simply because they have more opportunities. Adjusting per possession normalizes for this, revealing which teams are more efficient with their opportunities.
def possession_adjust_metrics(team_metrics, possession_sequences):
    """
    Adjust team metrics for possession volume.

    Parameters
    ----------
    team_metrics : dict
        Raw metric values
    possession_sequences : list
        Team's possession sequences

    Returns
    -------
    dict
        Possession-adjusted metrics
    """
    n_possessions = len(possession_sequences)

    if n_possessions == 0:
        return team_metrics

    adjusted = {}
    for key, value in team_metrics.items():
        # Only numeric metrics can be expressed per 100 possessions
        if isinstance(value, (int, float)):
            adjusted[f'{key}_per_100poss'] = value / n_possessions * 100
        adjusted[key] = value

    adjusted['n_possessions'] = n_possessions
    return adjusted
Best Practice: Always present both raw and possession-adjusted metrics side by side. Raw metrics show total output; adjusted metrics show efficiency. A team with high raw xG but low xG per possession is generating volume through sheer dominance of the ball. A team with modest raw xG but high xG per possession is lethally efficient with limited opportunities. Both profiles can be successful, and coaches need to see both dimensions.
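A short worked example shows why the two views can disagree. The team names and numbers below are hypothetical, chosen only to illustrate the arithmetic of the per-100-possessions formula.

```python
def per_100_possessions(value, n_possessions):
    """Metric per 100 possessions; None when there are no possessions."""
    if n_possessions <= 0:
        return None
    return value / n_possessions * 100


# Hypothetical match totals (illustrative numbers, not real data).
teams = {
    "Team A": {"xg": 2.0, "possessions": 125},  # ball-dominant side
    "Team B": {"xg": 1.2, "possessions": 60},   # counter-attacking side
}

summary = {
    name: {
        "raw_xg": t["xg"],
        "xg_per_100": per_100_possessions(t["xg"], t["possessions"]),
    }
    for name, t in teams.items()
}

for name, row in summary.items():
    print(f"{name}: raw xG {row['raw_xg']:.2f}, "
          f"xG per 100 possessions {row['xg_per_100']:.2f}")
```

Team A leads on raw xG (2.0 against 1.2), but Team B is more efficient (2.0 against 1.6 per 100 possessions). Reporting only one of the two columns would hide half of this picture.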
11.4 Possession Efficiency
11.4.1 Defining Efficiency
Possession efficiency measures how well a team converts possession into attacking threat or goals:
$$\text{Possession Efficiency} = \frac{\text{Value Created}}{\text{Possession Volume}}$$
Different efficiency metrics capture different aspects:
Shot Efficiency: Shots per possession sequence
$$\text{Shot Rate} = \frac{\text{Possessions ending in shot}}{\text{Total possessions}}$$
Typically, 8% to 15% of possessions end in a shot. Elite attacking teams may reach 15-18%, while struggling sides may fall below 8%.
xG Efficiency: Expected goals per possession
$$\text{xG Rate} = \frac{\text{Total xG}}{\text{Total possessions}}$$
xT Efficiency: Threat generated per possession
$$\text{xT Rate} = \frac{\text{Total xT generated}}{\text{Total possessions}}$$
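Once each possession has been summarized as one row, the three rates above reduce to group-wise means. The sketch below assumes a table with one row per possession and hypothetical column names (`team`, `ends_in_shot`, `xg`, `xt_generated`); the values are toy data.

```python
import pandas as pd

# Toy per-possession table (one row per possession sequence).
seq_df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B"],
    "ends_in_shot": [True, False, False, False, True, False],
    "xg": [0.20, 0.0, 0.0, 0.0, 0.30, 0.0],
    "xt_generated": [0.10, 0.02, 0.00, 0.04, 0.12, 0.02],
})

# Each rate is a mean over possessions, i.e. total value / total possessions.
rates = seq_df.groupby("team").agg(
    possessions=("xg", "size"),
    shot_rate=("ends_in_shot", "mean"),
    xg_rate=("xg", "mean"),
    xt_rate=("xt_generated", "mean"),
)
print(rates)
```

Because every rate shares the same denominator (the number of possessions), teams with very different possession volumes can be compared on the same table.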
11.4.2 Possession Sequences: Length, Speed, and Directness
The characteristics of possession sequences reveal fundamental tactical choices:
Long sequences (10+ passes) are characteristic of patient, possession-oriented teams. Shot probability rises with sequence length up to a point -- research shows that sequences of 6-10 passes have the highest shot rate -- but the marginal returns diminish beyond about 10 passes, as the defense has time to reorganize.
Short sequences (1-3 passes) are characteristic of direct, counter-attacking teams. While each individual sequence has a lower probability of producing a shot, the shots that do result tend to be of higher quality (higher xG per shot) because they catch the defense out of shape.
Sequence speed -- measured as meters progressed per second -- distinguishes fast transitions from patient build-up. Counter-pressing teams like Liverpool under Klopp showed the highest sequence speeds in the Premier League, reflecting their philosophy of attacking quickly after winning the ball.
Sequence directness captures whether a team progresses linearly toward goal or circulates laterally. A team with high directness plays the ball forward at every opportunity; a team with low directness recycles possession and probes for openings.
class PossessionEfficiencyAnalyzer:
    """
    Analyze possession efficiency for a team.

    Attributes
    ----------
    team_name : str
        Team to analyze
    sequences : list
        Possession sequences
    """

    def __init__(self, events_df, team_name, xt_grid=None):
        """
        Initialize analyzer.

        Parameters
        ----------
        events_df : DataFrame
            Match events
        team_name : str
            Team name
        xt_grid : ndarray, optional
            xT values for threat calculation
        """
        self.team_name = team_name
        self.events_df = events_df
        self.xt_grid = xt_grid

        # Build sequences
        self.sequences = identify_possession_sequences(events_df, team_name)
        self._analyze_sequences()

    def _analyze_sequences(self):
        """Analyze all possession sequences."""
        self.sequence_metrics = []

        for seq in self.sequences:
            metrics = analyze_sequence(seq)

            # Add xT if available
            if self.xt_grid is not None:
                metrics['xt_generated'] = self._calculate_sequence_xt(seq)

            self.sequence_metrics.append(metrics)

    def _calculate_sequence_xt(self, sequence_df):
        """Calculate xT generated in a sequence."""
        grid_y, grid_x = self.xt_grid.shape
        total_xt = 0

        for i in range(len(sequence_df) - 1):
            start = sequence_df.iloc[i]
            end = sequence_df.iloc[i + 1]

            start_loc = start.get('location')
            end_loc = end.get('location')

            if not (isinstance(start_loc, list) and isinstance(end_loc, list)):
                continue

            # Get zones
            sz_x = min(int(start_loc[0] / 120 * grid_x), grid_x - 1)
            sz_y = min(int(start_loc[1] / 80 * grid_y), grid_y - 1)
            ez_x = min(int(end_loc[0] / 120 * grid_x), grid_x - 1)
            ez_y = min(int(end_loc[1] / 80 * grid_y), grid_y - 1)

            # Only positive changes in xT count as threat generated
            xt_delta = self.xt_grid[ez_y, ez_x] - self.xt_grid[sz_y, sz_x]
            total_xt += max(0, xt_delta)

        return total_xt

    def get_efficiency_metrics(self):
        """
        Calculate overall efficiency metrics.

        Returns
        -------
        dict
            Efficiency metrics
        """
        n_sequences = len(self.sequences)
        if n_sequences == 0:
            return {}

        df = pd.DataFrame(self.sequence_metrics)

        # Basic rates
        shot_rate = df['ends_in_shot'].sum() / n_sequences
        goal_rate = df['ends_in_goal'].sum() / n_sequences

        # Progression
        avg_progression = df['progression'].mean()

        # Duration efficiency
        avg_duration = df['duration'].mean()

        metrics = {
            'n_possessions': n_sequences,
            'shot_rate': shot_rate,
            'goal_rate': goal_rate,
            'avg_progression': avg_progression,
            'avg_duration': avg_duration,
            'avg_passes': df['n_passes'].mean()
        }

        # xT if available
        if 'xt_generated' in df.columns:
            metrics['total_xt_generated'] = df['xt_generated'].sum()
            metrics['xt_per_possession'] = df['xt_generated'].mean()

        return metrics

    def get_sequence_quality_distribution(self):
        """
        Analyze distribution of sequence quality.

        Returns
        -------
        Series
            Share of sequences in each quality category
        """
        df = pd.DataFrame(self.sequence_metrics)

        # Nothing to categorize if the team had no sequences
        if df.empty:
            return pd.Series(dtype=float)

        # Categorize sequences
        def categorize(row):
            if row['ends_in_goal']:
                return 'Goal'
            elif row['ends_in_shot']:
                return 'Shot'
            elif row.get('progression', 0) > 30:
                return 'Good progression'
            elif row.get('n_passes', 0) >= 5:
                return 'Sustained'
            else:
                return 'Unproductive'

        df['category'] = df.apply(categorize, axis=1)
        return df['category'].value_counts(normalize=True)
11.4.3 Efficiency Comparison
Comparing efficiency between teams or across matches:
def compare_possession_efficiency(events_df, team1, team2, xt_grid=None):
    """
    Compare possession efficiency between two teams.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team1, team2 : str
        Team names
    xt_grid : ndarray, optional
        xT values

    Returns
    -------
    DataFrame
        Comparison of efficiency metrics
    """
    analyzer1 = PossessionEfficiencyAnalyzer(events_df, team1, xt_grid)
    analyzer2 = PossessionEfficiencyAnalyzer(events_df, team2, xt_grid)

    metrics1 = analyzer1.get_efficiency_metrics()
    metrics2 = analyzer2.get_efficiency_metrics()

    # One row per team, one column per metric
    comparison = pd.DataFrame({
        team1: metrics1,
        team2: metrics2
    }).T

    return comparison
11.5 Pressing and Possession Regain
11.5.1 The Connection to Possession
Pressing -- applying defensive pressure to regain the ball -- directly affects possession dynamics. High-pressing teams aim to regain possession in dangerous areas, while low-block teams concede possession but in less threatening locations. The choice between these approaches has profound implications for both possession metrics and match outcomes.
The relationship between pressing and possession is bidirectional. High pressing leads to more possession regained in advanced areas, which in turn creates shorter distances to goal and higher-quality attacking opportunities. But pressing also carries risk: if the press is beaten, the pressing team is left with players out of position and vulnerable to counter-attacks. Understanding this risk-reward tradeoff quantitatively is one of the most valuable applications of possession analytics.
11.5.2 PPDA: Passes Per Defensive Action
PPDA measures pressing intensity:
$$PPDA = \frac{\text{Opponent passes allowed}}{\text{Defensive actions in opponent's half}}$$
Lower PPDA indicates more intense pressing (fewer opponent passes allowed per defensive action). Typical PPDA values range from about 6 (extremely intense pressing, like Jürgen Klopp's Liverpool at their peak) to about 15 (very passive defending, like a deep-block team protecting a lead).
The implementation below restricts both counts to the opponent's defensive third: opponent passes with x < 40, and the pressing team's defensive actions (tackles, interceptions, fouls, pressures) with x > 80, which is the same zone seen from the pressing team's attacking direction. Passing defending_third_threshold=60 widens the zone to the opponent's half, as in the formula above:
def calculate_ppda(events_df, pressing_team, defending_third_threshold=40):
    """
    Calculate PPDA for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    pressing_team : str
        Team whose pressing to measure
    defending_third_threshold : float
        X-coordinate marking the end of the opponent's defensive zone
        (40 = defensive third, 60 = own half on a 120-unit pitch)

    Returns
    -------
    float
        PPDA value
    """
    # Get opponent
    teams = events_df['team'].unique()
    opponent = [t for t in teams if t != pressing_team][0]

    # Opponent passes in their defensive zone (opponent's own coordinates)
    opponent_passes = events_df[
        (events_df['team'] == opponent) &
        (events_df['type'] == 'Pass') &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] < defending_third_threshold
        ))
    ]

    # The same zone from the pressing team's perspective: each team's
    # coordinates run in its own attacking direction, so mirror the threshold
    pressing_zone_start = 120 - defending_third_threshold

    defensive_actions = events_df[
        (events_df['team'] == pressing_team) &
        (events_df['type'].isin(['Pressure', 'Tackle', 'Interception', 'Foul Committed'])) &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] > pressing_zone_start
        ))
    ]

    n_passes = len(opponent_passes)
    n_def_actions = len(defensive_actions)

    # No defensive actions means no measurable pressing
    return n_passes / n_def_actions if n_def_actions > 0 else float('inf')
Common Pitfall: PPDA definitions vary across the analytics community. Some analysts use the opponent's half rather than their defensive third, some exclude fouls from defensive actions, and some weight actions differently. When comparing PPDA values across sources, always verify the exact definition being used. The trends and relative comparisons are usually consistent, but absolute values may differ.
11.5.3 Pressing Intensity and High Turnovers
Beyond PPDA, several additional metrics capture pressing behavior:
Pressing height: The average x-coordinate of pressure events. Higher values indicate more aggressive pressing.
Pressing success rate: The proportion of pressure events that result in a turnover within a short time window (typically 5 seconds).
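Both definitions above can be sketched on a simplified event list. This is an illustration under stated assumptions, not the chapter's implementation: events are plain dicts with hypothetical keys `team`, `type`, `x`, and `t` (seconds), and a "turnover" is approximated by the pressing team's own ball recovery or interception inside the window.

```python
def pressing_profile(events, team, window=5.0):
    """Pressing height and success rate for `team`.

    `events` is a list of dicts with assumed keys 'team', 'type',
    'x' (pitch x-coordinate) and 't' (match time in seconds).
    """
    pressures = [e for e in events
                 if e['team'] == team and e['type'] == 'Pressure']
    if not pressures:
        return {'pressing_height': None, 'pressing_success_rate': None}

    # Times at which the team won the ball back
    regain_times = [e['t'] for e in events
                    if e['team'] == team
                    and e['type'] in ('Ball Recovery', 'Interception')]

    # A pressure succeeds if any regain follows it within `window` seconds
    successes = sum(
        any(0 <= r - p['t'] <= window for r in regain_times)
        for p in pressures
    )
    return {
        'pressing_height': sum(p['x'] for p in pressures) / len(pressures),
        'pressing_success_rate': successes / len(pressures),
    }

sample = [
    {'team': 'A', 'type': 'Pressure', 'x': 90, 't': 10.0},
    {'team': 'A', 'type': 'Ball Recovery', 'x': 88, 't': 12.5},
    {'team': 'A', 'type': 'Pressure', 'x': 50, 't': 60.0},
    {'team': 'B', 'type': 'Pass', 'x': 30, 't': 61.0},
    {'team': 'A', 'type': 'Interception', 'x': 45, 't': 70.0},
]
profile = pressing_profile(sample, 'A')
```

In the sample, the first pressure is followed by a recovery 2.5 seconds later and counts as a success; the second is not followed by a regain until 10 seconds later and does not.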
High turnovers measure ball recoveries in dangerous areas:
def calculate_high_turnovers(events_df, team_name, threshold_x=80):
    """
    Count high turnovers (ball recoveries in attacking third).

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    threshold_x : float
        X-coordinate threshold for "high"

    Returns
    -------
    dict
        High turnover metrics
    """
    recovery_types = ['Ball Recovery', 'Interception']

    # Recoveries beyond the threshold, i.e. in the attacking third by default
    high_recoveries = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'].isin(recovery_types)) &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] > threshold_x
        ))
    ]

    total_recoveries = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'].isin(recovery_types))
    ]

    return {
        'high_turnovers': len(high_recoveries),
        'total_recoveries': len(total_recoveries),
        'high_turnover_rate': len(high_recoveries) / len(total_recoveries)
        if len(total_recoveries) > 0 else 0
    }
Real-World Application: Research by Gegenpressing (2019) showed that possessions starting from high turnovers (ball won in the attacking third) produce shots with approximately 30% higher xG per shot than possessions starting from deep recoveries. This quantifies the tactical logic of high pressing: winning the ball higher up creates better scoring opportunities because the defense has less time and space to organize.
11.5.4 Counter-Pressing (Gegenpressing) Analytics
Counter-pressing measures immediate pressure after losing possession. The term "Gegenpressing" was popularized by Jürgen Klopp, but the concept is central to many modern tactical systems. The idea is that the moment of possession loss is also the moment at which the opponent is least prepared to keep the ball -- they have just switched from defending to attacking and are not yet organized. Immediate pressing in this window can win the ball back before the opponent can counter-attack.
Key metrics for counter-pressing analysis:
Regain time: How quickly does the team win the ball back after losing it? Measured in seconds from possession loss to next ball recovery.
Regain location: Where does the team win the ball back relative to where they lost it? Regaining close to the loss location suggests effective counter-pressing; regaining far back suggests the team retreated.
Counter-press success rate: What proportion of possession losses lead to regaining the ball within a defined time window (typically 5-8 seconds)?
def analyze_counter_pressing(events_df, team_name, time_window=5):
    """
    Analyze counter-pressing behavior after possession loss.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    time_window : int
        Seconds after loss to measure pressing

    Returns
    -------
    dict
        Counter-pressing metrics
    """
    events = events_df.sort_values(['minute', 'second', 'index']).reset_index(drop=True)

    # Find possession losses
    losses = []
    for i in range(1, len(events)):
        current = events.iloc[i]
        previous = events.iloc[i-1]

        # Possession loss = team had ball, now opponent has it
        if previous['team'] == team_name and current['team'] != team_name:
            losses.append({
                'index': i,
                'minute': previous['minute'],
                'second': previous.get('second', 0),
                'location': previous.get('location')
            })

    # Analyze pressing after each loss
    counter_press_success = 0
    counter_press_attempts = 0

    for loss in losses:
        loss_time = loss['minute'] * 60 + loss['second']

        # Look for pressing actions within time window
        for j in range(loss['index'], min(loss['index'] + 20, len(events))):
            event = events.iloc[j]
            event_time = event['minute'] * 60 + event.get('second', 0)

            if event_time - loss_time > time_window:
                break

            if event['team'] == team_name:
                # Every pressing action in the window counts as an attempt
                if event['type'] in ['Pressure', 'Tackle', 'Interception']:
                    counter_press_attempts += 1

                # Stop at the first regain for this loss
                if event['type'] in ['Ball Recovery', 'Interception']:
                    counter_press_success += 1
                    break

    return {
        'possession_losses': len(losses),
        'counter_press_attempts': counter_press_attempts,
        'counter_press_regains': counter_press_success,
        'counter_press_success_rate': counter_press_success / len(losses)
        if len(losses) > 0 else 0
    }
Advanced: The most sophisticated counter-pressing analysis measures not just whether the ball was regained but what happened next. Did the regain lead to a shot? How much xT was generated in the possession following the counter-press? This "counter-press value" metric connects the defensive action directly to attacking outcomes, providing a complete picture of the pressing team's return on investment.
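A minimal sketch of such a "counter-press value" measure follows. It is an illustration only: the function name, the dict keys (`team`, `type`, `t`, `xg`), and the 15-second follow-up window are assumptions, and shots are attributed to a regain purely by time, without checking that the team kept the ball in between.

```python
def counter_press_value(events, team, regain_times, follow_window=15.0):
    """Summarize what the team created after each counter-press regain.

    `events` is a list of dicts with assumed keys 'team', 'type', 't'
    (seconds) and, for shots, 'xg'. `regain_times` lists the times at
    which the team won the ball back through counter-pressing.
    """
    per_regain = []
    for regain_t in regain_times:
        # The team's shots taken shortly after this regain
        shots = [e for e in events
                 if e['team'] == team and e['type'] == 'Shot'
                 and 0 <= e['t'] - regain_t <= follow_window]
        per_regain.append({
            'led_to_shot': bool(shots),
            'xg': sum(s.get('xg', 0.0) for s in shots),
        })

    n = len(per_regain)
    return {
        'regains': n,
        'shot_rate': sum(r['led_to_shot'] for r in per_regain) / n if n else 0.0,
        'xg_per_regain': sum(r['xg'] for r in per_regain) / n if n else 0.0,
    }

sample_events = [
    {'team': 'A', 'type': 'Shot', 't': 105.0, 'xg': 0.2},
    {'team': 'A', 'type': 'Pass', 't': 301.0},
]
value = counter_press_value(sample_events, 'A', regain_times=[100.0, 300.0])
```

Replacing the xG sum with the xT generated in the following possession gives the threat-based variant described above.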
11.6 Transition Analysis
11.6.1 Attack-to-Defense Transitions
Transitions -- the moments when possession changes hands -- are among the most dangerous phases of play. When a team loses the ball, they are momentarily disorganized, and the opponent can exploit this vulnerability. The speed and quality of defensive transitions determine how effectively a team responds to losing possession.
Key metrics for defensive transitions include:
Recovery time: Seconds from possession loss to the team's first defensive action (pressure, tackle, or retreat to defensive shape).
Shape recovery distance: How far back the team's defensive line moves in the 10 seconds after losing possession. A team that maintains its shape has a small recovery distance; one that is caught out of position has a large recovery distance.
Counter-attack concession rate: What proportion of possession losses lead to the opponent creating a shot within 15 seconds?
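The last of these metrics is simple to compute once loss and shot timestamps are available. The sketch below assumes both are given as match times in seconds; the function name and inputs are illustrative.

```python
def counter_attack_concession_rate(loss_times, opponent_shot_times, window=15.0):
    """Share of possession losses followed by an opponent shot within `window` seconds."""
    if not loss_times:
        return 0.0
    conceded = sum(
        any(0 < shot - loss <= window for shot in opponent_shot_times)
        for loss in loss_times
    )
    return conceded / len(loss_times)

# Four losses; only the first is followed by an opponent shot inside 15 seconds
rate = counter_attack_concession_rate(
    loss_times=[100.0, 400.0, 900.0, 1500.0],
    opponent_shot_times=[108.0, 1530.0],
)
```

Recovery time and shape recovery distance need richer inputs (defensive action timestamps and, for shape, tracking data), so they are not sketched here.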
11.6.2 Defense-to-Attack Transitions
When a team wins the ball, the speed and directness of their subsequent actions determine whether they can exploit the opponent's disorganization:
Transition speed: Meters progressed toward goal per second in the first 10 seconds after winning possession.
Transition directness: Proportion of actions in the first 10 seconds that move the ball forward (as opposed to sideways or backward).
Fast break frequency: How often does the team generate a shot within 15 seconds of winning possession?
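Transition speed and directness can be sketched for a single regain as follows. The input format is an assumption made for illustration: each on-ball action is a tuple of (seconds since the regain, start x, end x), with x increasing toward the opponent's goal. Distances are in whatever units the coordinates use, so the result is only in meters if the coordinates are.

```python
def transition_profile(actions, window=10.0):
    """Speed and directness of the first `window` seconds after a regain.

    `actions` is an ordered list of (t, x_start, x_end) tuples, where t is
    seconds elapsed since the ball was won (assumed format).
    """
    early = [a for a in actions if 0 <= a[0] <= window]
    if not early:
        return {'speed': 0.0, 'directness': 0.0}

    # Net forward progress from the first action's start to the last action's end
    progress = early[-1][2] - early[0][1]
    # Actions that moved the ball toward the opponent's goal
    forward = sum(1 for _, x0, x1 in early if x1 > x0)

    return {
        'speed': progress / window,
        'directness': forward / len(early),
    }

# Three actions fall inside the 10-second window; the fourth is ignored
profile_t = transition_profile([
    (0.0, 40, 55),
    (3.0, 55, 50),
    (6.0, 50, 80),
    (12.0, 80, 95),
])
```

Fast break frequency follows the same pattern as the counter-attack concession rate, with the team's own regain times and shot times as inputs.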
Intuition: Transitions are where the biggest mismatches occur in soccer. During settled possession, both teams are organized. During transitions, one team is momentarily exposed. The best counter-attacking teams in the world -- Leicester City in 2015-16, Real Madrid under Zidane, Inter Milan under Conte -- are masters of exploiting these momentary imbalances. Measuring transition speed and quality tells you how effectively a team capitalizes on these windows.
11.7 Game State Effects on Possession Patterns
11.7.1 How Scores Change Possession Behavior
Game state -- whether a team is winning, drawing, or losing -- has a profound and measurable effect on possession patterns. Understanding these effects is critical for interpreting possession statistics correctly.
When winning: Teams can afford to be patient, and the nature of their possession changes. Some leading teams keep the ball for longer spells, although published game-state studies more often report that leading teams hold a smaller possession share than trailing teams. Average field position typically drops (they sit deeper), and sequence directness decreases (they circulate more and attack less aggressively). Pressing intensity often decreases as well, with PPDA rising.
When losing: Teams tend to increase pressing intensity (lower PPDA), push their average field position higher, and increase sequence directness. Centralization in the passing network often increases as the team funnels play through their most creative players in search of a goal.
When drawing: Behavior is closest to the team's "default" tactical setup, making drawing-state metrics the most representative of a team's intrinsic style.
Best Practice: When comparing possession metrics across teams or matches, always control for game state. A team that spent 60 minutes leading will show different possession patterns than one that spent 60 minutes trailing, regardless of their tactical philosophy. The cleanest comparisons use drawing-state data only, or weight metrics by game state proportionally.
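One way to weight by game state is to re-average a team's per-state values using a common reference mix of game states, so that every team is compared as if it had spent the same share of time winning, drawing, and losing. The sketch below is a rough illustration: the reference shares are made-up numbers, and averaging a ratio such as PPDA this way is an approximation.

```python
def state_adjusted_metric(metric_by_state, reference_shares):
    """Reweight per-state metric values to a common game-state mix.

    `metric_by_state` maps 'winning'/'drawing'/'losing' to the team's value
    in that state (None if the team never spent time in it).
    `reference_shares` maps the same keys to reference time shares.
    """
    available = {s: v for s, v in metric_by_state.items() if v is not None}
    total_weight = sum(reference_shares[s] for s in available)
    if total_weight == 0:
        return None
    # Renormalize so missing states do not shrink the result
    return sum(v * reference_shares[s] for s, v in available.items()) / total_weight

# Hypothetical reference mix, for illustration only
reference = {'winning': 0.3, 'drawing': 0.4, 'losing': 0.3}

adjusted_ppda = state_adjusted_metric(
    {'winning': 12.0, 'drawing': 9.0, 'losing': 7.5}, reference
)
never_led = state_adjusted_metric(
    {'winning': None, 'drawing': 9.0, 'losing': 7.5}, reference
)
```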
11.7.2 Situational Possession Analysis
For advanced analysis, segment possession metrics by game state:
def analyze_possession_by_game_state(events_df, team_name):
    """
    Analyze possession patterns separately for each game state.

    Returns possession metrics when winning, drawing, and losing.
    """
    # Determine game state at each minute (simplified).
    # Note: own goals are not recorded as Shot events in StatsBomb data,
    # so they are ignored by this filter.
    goals = events_df[
        (events_df['type'] == 'Shot') &
        (events_df['shot_outcome'] == 'Goal')
    ].sort_values('minute')

    teams = events_df['team'].unique()
    opponent = [t for t in teams if t != team_name][0]

    score = {team_name: 0, opponent: 0}
    state_at_minute = {}

    # Cover every minute present in the data, including stoppage and extra time
    last_minute = int(events_df['minute'].max())

    for minute in range(0, last_minute + 1):
        goals_this_min = goals[goals['minute'] == minute]
        for _, g in goals_this_min.iterrows():
            score[g['team']] += 1

        if score[team_name] > score[opponent]:
            state_at_minute[minute] = 'winning'
        elif score[team_name] < score[opponent]:
            state_at_minute[minute] = 'losing'
        else:
            state_at_minute[minute] = 'drawing'

    # Segment events by game state
    results = {}
    for state in ['winning', 'drawing', 'losing']:
        state_minutes = [m for m, s in state_at_minute.items() if s == state]
        if not state_minutes:
            continue

        state_events = events_df[events_df['minute'].isin(state_minutes)]
        field_pos = calculate_field_position(state_events, team_name)

        if field_pos:
            results[state] = {
                'minutes': len(state_minutes),
                **field_pos
            }

    return results
11.8 Visualization
11.8.1 Possession Maps
import matplotlib.pyplot as plt
import numpy as np
from mplsoccer import Pitch


def plot_possession_heatmap(events_df, team_name, ax=None):
    """
    Create possession heatmap showing event density.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to visualize
    ax : matplotlib.axes, optional
        Axis to plot on

    Returns
    -------
    fig, ax
    """
    if ax is None:
        pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                      line_color='white')
        fig, ax = pitch.draw(figsize=(12, 8))
    else:
        fig = ax.figure
        pitch = Pitch(pitch_type='statsbomb')
        pitch.draw(ax=ax)

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    x_coords = []
    y_coords = []
    for loc in team_events['location']:
        if isinstance(loc, list) and len(loc) >= 2:
            x_coords.append(loc[0])
            y_coords.append(loc[1])

    # Hexbin heatmap
    hexbin = ax.hexbin(x_coords, y_coords, gridsize=15, cmap='YlOrRd',
                       alpha=0.7, mincnt=1, extent=[0, 120, 0, 80])

    ax.set_title(f'{team_name} Possession Heatmap', fontsize=14, color='white')

    return fig, ax


def plot_territorial_comparison(events_df, team1, team2, figsize=(14, 6)):
    """
    Compare territorial control between two teams.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team1, team2 : str
        Team names
    figsize : tuple
        Figure size

    Returns
    -------
    fig, axes
    """
    fig, axes = plt.subplots(1, 2, figsize=figsize)

    for ax, team, color in zip(axes, [team1, team2], ['Blues', 'Reds']):
        pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                      line_color='white')
        pitch.draw(ax=ax)

        team_events = events_df[
            (events_df['team'] == team) &
            (events_df['location'].notna())
        ]

        x_coords = []
        y_coords = []
        for loc in team_events['location']:
            if isinstance(loc, list):
                x_coords.append(loc[0])
                y_coords.append(loc[1])

        if x_coords:
            ax.hexbin(x_coords, y_coords, gridsize=12, cmap=color,
                      alpha=0.6, mincnt=1, extent=[0, 120, 0, 80])

            # Mark the team's average field position
            avg_x = np.mean(x_coords)
            ax.axvline(avg_x, color='white', linestyle='--', alpha=0.7)
            ax.text(avg_x + 2, 75, f'Avg: {avg_x:.1f}m', color='white', fontsize=10)

        ax.set_title(f'{team}', fontsize=12, color='white')

    plt.tight_layout()
    return fig, axes
11.8.2 Possession Flow Diagrams
def plot_possession_flow(events_df, team_name, ax=None):
    """
    Visualize possession flow between pitch thirds.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    ax : matplotlib.axes, optional
        Axis to plot on

    Returns
    -------
    fig, ax
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 6))
    else:
        fig = ax.figure

    # Completed passes only (a missing pass_outcome means the pass succeeded)
    team_passes = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'] == 'Pass') &
        (events_df['pass_outcome'].isna())
    ]

    zones = ['Defensive', 'Middle', 'Attacking']
    zone_bounds = [(0, 40), (40, 80), (80, 120)]

    def get_zone(x):
        for i, (low, high) in enumerate(zone_bounds):
            if low <= x < high:
                return zones[i]
        return zones[-1]

    # Count transitions
    flow = {(z1, z2): 0 for z1 in zones for z2 in zones}

    for _, p in team_passes.iterrows():
        start_loc = p.get('location')
        end_loc = p.get('pass_end_location')

        if not (isinstance(start_loc, list) and isinstance(end_loc, list)):
            continue

        start_zone = get_zone(start_loc[0])
        end_zone = get_zone(end_loc[0])
        flow[(start_zone, end_zone)] += 1

    # Create Sankey-like visualization
    zone_x = {z: i for i, z in enumerate(zones)}
    max_flow = max(flow.values()) if flow.values() else 1

    for (start, end), count in flow.items():
        if count == 0:
            continue

        x1 = zone_x[start]
        x2 = zone_x[end]

        # Offset for different directions
        if x2 > x1:
            y_offset = 0.1
            color = 'green'
        elif x2 < x1:
            y_offset = -0.1
            color = 'red'
        else:
            y_offset = 0
            color = 'gray'

        # Arrow width and opacity scale with pass volume
        width = count / max_flow * 5 + 0.5
        alpha = count / max_flow * 0.5 + 0.3

        ax.annotate('', xy=(x2, 0.5 + y_offset), xytext=(x1, 0.5 - y_offset),
                    arrowprops=dict(arrowstyle='->', color=color,
                                    lw=width, alpha=alpha,
                                    connectionstyle='arc3,rad=0.2'))

        # Label
        mid_x = (x1 + x2) / 2
        ax.text(mid_x, 0.5 + y_offset * 2, str(count), fontsize=9, ha='center')

    # Zone labels
    for zone in zones:
        x = zone_x[zone]
        ax.text(x, 0.5, zone, fontsize=12, ha='center', va='center',
                bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))

    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(0, 1)
    ax.axis('off')
    ax.set_title(f'{team_name} Passing Flow', fontsize=14)

    return fig, ax
11.8.3 Sequence Quality Dashboard
def plot_possession_dashboard(events_df, team_name, figsize=(14, 10)):
    """
    Create comprehensive possession analysis dashboard.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    figsize : tuple
        Figure size

    Returns
    -------
    fig
    """
    fig = plt.figure(figsize=figsize)

    # Layout: 2x2 grid (three plots plus a metrics panel)
    ax1 = fig.add_subplot(2, 2, 1)  # Heatmap
    ax2 = fig.add_subplot(2, 2, 2)  # Sequence length distribution
    ax3 = fig.add_subplot(2, 2, 3)  # Flow diagram
    ax4 = fig.add_subplot(2, 2, 4)  # Metrics table

    # 1. Possession heatmap
    pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                  line_color='white')
    pitch.draw(ax=ax1)

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    x_coords = [loc[0] for loc in team_events['location']
                if isinstance(loc, list)]
    y_coords = [loc[1] for loc in team_events['location']
                if isinstance(loc, list)]

    if x_coords:
        ax1.hexbin(x_coords, y_coords, gridsize=12, cmap='YlOrRd',
                   alpha=0.6, mincnt=1, extent=[0, 120, 0, 80])
    ax1.set_title('Possession Locations', fontsize=11, color='white')

    # 2. Sequence length distribution
    sequences = identify_possession_sequences(events_df, team_name)
    seq_lengths = [len(s) for s in sequences]

    # Guard against a team with no recorded sequences
    if seq_lengths:
        ax2.hist(seq_lengths, bins=range(1, max(seq_lengths) + 2),
                 color='steelblue', edgecolor='white', alpha=0.7)
    avg_seq_length = np.mean(seq_lengths) if seq_lengths else 0.0
    ax2.set_xlabel('Sequence Length (events)')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Possession Sequence Lengths', fontsize=11)

    # 3. Passing flow
    plot_possession_flow(events_df, team_name, ax=ax3)

    # 4. Metrics summary
    ax4.axis('off')

    # Calculate metrics
    field_pos = calculate_field_position(events_df, team_name) or {}
    efficiency = PossessionEfficiencyAnalyzer(events_df, team_name).get_efficiency_metrics()

    metrics_text = f"""
{team_name} Possession Metrics
Total Possessions: {efficiency.get('n_possessions', 'N/A')}
Average Sequence Length: {avg_seq_length:.1f} events
Shot Rate: {efficiency.get('shot_rate', 0)*100:.1f}%
Average Field Position: {field_pos.get('avg_x', 0):.1f}m
Attacking Third: {field_pos.get('attacking_third', 0)*100:.1f}%
Defensive Third: {field_pos.get('defensive_third', 0)*100:.1f}%
Avg Progression: {efficiency.get('avg_progression', 0):.1f}m
"""

    ax4.text(0.1, 0.9, metrics_text, transform=ax4.transAxes,
             fontsize=11, verticalalignment='top', fontfamily='monospace',
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

    plt.tight_layout()
    return fig
11.9 Practical Implementation
11.9.1 Complete Analysis Pipeline
class PossessionAnalyzer:
    """
    Comprehensive possession and territorial control analyzer.

    Provides unified interface for all possession metrics.
    """

    def __init__(self, events_df, team_name, opponent_name=None, xt_grid=None):
        """
        Initialize analyzer.

        Parameters
        ----------
        events_df : DataFrame
            Match events
        team_name : str
            Team to analyze
        opponent_name : str, optional
            Opponent team name
        xt_grid : ndarray, optional
            xT values for quality weighting
        """
        self.events_df = events_df
        self.team_name = team_name
        self.opponent_name = opponent_name or self._find_opponent()
        self.xt_grid = xt_grid

        # Build sequences
        self.sequences = identify_possession_sequences(events_df, team_name)

        # Calculate basic metrics
        self._calculate_metrics()

    def _find_opponent(self):
        """Find opponent team name."""
        teams = self.events_df['team'].unique()
        opponents = [t for t in teams if t != self.team_name]
        return opponents[0] if opponents else None

    def _calculate_metrics(self):
        """Calculate all possession metrics."""
        # Field position
        self.field_position = calculate_field_position(
            self.events_df, self.team_name
        )

        # Zone control
        self.zone_control = calculate_zone_control(
            self.events_df, self.team_name
        )

        # Possession efficiency
        self.efficiency = PossessionEfficiencyAnalyzer(
            self.events_df, self.team_name, self.xt_grid
        ).get_efficiency_metrics()

        # Pressing metrics need an opponent in the data
        if self.opponent_name:
            self.ppda = calculate_ppda(self.events_df, self.team_name)
            self.high_turnovers = calculate_high_turnovers(
                self.events_df, self.team_name
            )
            self.counter_pressing = analyze_counter_pressing(
                self.events_df, self.team_name
            )

    def get_summary(self):
        """Get comprehensive summary of possession metrics."""
        summary = {
            'team': self.team_name,
            'n_possessions': len(self.sequences),
            **self.field_position,
            **self.efficiency
        }

        if hasattr(self, 'ppda'):
            summary['ppda'] = self.ppda
            summary.update(self.high_turnovers)
            summary.update(self.counter_pressing)

        return summary

    def compare_to_opponent(self):
        """Compare possession metrics to opponent."""
        if not self.opponent_name:
            return None

        opponent_analyzer = PossessionAnalyzer(
            self.events_df, self.opponent_name, self.team_name, self.xt_grid
        )

        comparison = pd.DataFrame({
            self.team_name: self.get_summary(),
            self.opponent_name: opponent_analyzer.get_summary()
        }).T

        return comparison

    def plot_dashboard(self, figsize=(14, 10)):
        """Create possession analysis dashboard."""
        return plot_possession_dashboard(
            self.events_df, self.team_name, figsize
        )
11.10 Applications and Case Studies
11.10.1 Style Identification
Possession and territorial metrics enable objective style classification:
- Possession-dominant: High possession %, high field position, low PPDA
- Counter-attacking: Low possession %, high efficiency, low field position
- Pressing-intensive: Low PPDA, high counter-pressing success, high turnovers in attacking third
- Deep-defending: High defensive third %, low attacking third %, high PPDA
These classifications are not mutually exclusive. A team can be both possession-dominant and pressing-intensive (like Guardiola's Barcelona or Manchester City), or counter-attacking and pressing-intensive (like Klopp's Liverpool in certain phases).
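A rule-based version of this classification might look like the sketch below. Every threshold is a hypothetical placeholder chosen for illustration, not a value from this chapter or from league data, and the metric key names are assumptions; in practice the cut-offs would be calibrated against a league-wide distribution. Because the rules are checked independently, a team can receive more than one label.

```python
# Hypothetical cut-offs for illustration only; calibrate against league data.
THRESHOLDS = {
    'high_possession': 0.55,
    'low_possession': 0.45,
    'high_avg_x': 60.0,
    'low_avg_x': 50.0,
    'low_ppda': 9.0,
    'high_ppda': 13.0,
    'high_xg_per_possession': 0.06,
    'high_counter_press': 0.30,
    'high_turnover_rate': 0.20,
    'high_defensive_third': 0.40,
    'low_attacking_third': 0.20,
}

def classify_style(m, t=THRESHOLDS):
    """Return every style label whose rule the metrics dict `m` satisfies."""
    labels = []
    if (m['possession'] >= t['high_possession']
            and m['avg_x'] >= t['high_avg_x']
            and m['ppda'] <= t['low_ppda']):
        labels.append('possession-dominant')
    if (m['possession'] <= t['low_possession']
            and m['xg_per_possession'] >= t['high_xg_per_possession']
            and m['avg_x'] <= t['low_avg_x']):
        labels.append('counter-attacking')
    if (m['ppda'] <= t['low_ppda']
            and m['counter_press_success'] >= t['high_counter_press']
            and m['high_turnover_rate'] >= t['high_turnover_rate']):
        labels.append('pressing-intensive')
    if (m['defensive_third'] >= t['high_defensive_third']
            and m['attacking_third'] <= t['low_attacking_third']
            and m['ppda'] >= t['high_ppda']):
        labels.append('deep-defending')
    return labels

team_metrics = {
    'possession': 0.62, 'avg_x': 65.0, 'ppda': 8.0,
    'xg_per_possession': 0.07, 'counter_press_success': 0.35,
    'high_turnover_rate': 0.22, 'defensive_third': 0.20,
    'attacking_third': 0.35,
}
styles = classify_style(team_metrics)
```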
Real-World Application: Multi-dimensional style profiles enable more nuanced scouting. Rather than simply labeling a team as "possession-based," analysts can describe them precisely: "60% possession, 0.35 field tilt, 8.2 PPDA, 32% counter-press success, 0.08 xG per possession." This level of detail allows coaches to prepare specific tactical responses and recruitment departments to find stylistic matches.
11.10.2 Match Analysis
Combining possession metrics with outcome analysis:
# Assumes the statsbombpy package, as the `sb` alias and match ID suggest
from statsbombpy import sb


def analyze_match_possession(match_id, xt_grid=None):
    """
    Complete possession analysis for a match.

    Parameters
    ----------
    match_id : int
        StatsBomb match ID
    xt_grid : ndarray, optional
        xT grid for quality weighting

    Returns
    -------
    dict
        Complete analysis results
    """
    events = sb.events(match_id=match_id)
    teams = events['team'].unique()

    # One summary dict per team
    results = {}
    for team in teams:
        analyzer = PossessionAnalyzer(events, team, xt_grid=xt_grid)
        results[team] = analyzer.get_summary()

    # Add comparison (one row per team)
    results['comparison'] = pd.DataFrame(results).T

    return results
11.10.3 Possession Value Frameworks: Linking Possession to Expected Outcomes
The ultimate goal of possession analytics is to link possession patterns to expected outcomes -- goals scored and conceded. Several frameworks accomplish this:
xG chain: Each player's contribution to possessions that result in shots, weighted by the xG of those shots. This measures how involved a player is in the team's most threatening possessions.
Possession value added (PVA): The total xT generated by a player's actions during their possessions. This measures how much a player moves the ball into more threatening positions.
Expected possession value (EPV): A continuous model that assigns a value to each moment of possession based on the current ball location, nearby player positions, and game state. EPV requires tracking data but provides the most comprehensive picture of possession value.
These frameworks represent the frontier of possession analytics, connecting the "how much" and "where" of possession to the "so what" of expected outcomes. By combining the metrics from this chapter -- possession percentage, field tilt, dangerous possession, sequence characteristics, pressing intensity -- with outcome models like xG and xT, analysts can build complete pictures of how teams create and prevent goals through their control of the ball and territory.
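Of the three frameworks, xG chain is the simplest to compute from event data. The sketch below assumes each possession has already been reduced to the set of players involved and the xG of its shot, if any; the input format and function name are illustrative. Each involved player is credited once per possession, however many touches they had.

```python
from collections import defaultdict

def xg_chain(possessions):
    """Total xG of shot-ending possessions each player was involved in.

    `possessions` is a list of dicts with assumed keys 'players'
    (names of players who touched the ball) and 'shot_xg'
    (xG of the possession's shot, or None if it produced no shot).
    """
    totals = defaultdict(float)
    for possession in possessions:
        shot_xg = possession.get('shot_xg')
        if not shot_xg:
            continue  # possessions without a shot contribute nothing
        # set() credits each player once, regardless of number of touches
        for player in set(possession['players']):
            totals[player] += shot_xg
    return dict(totals)

chain = xg_chain([
    {'players': ['A', 'B', 'A', 'C'], 'shot_xg': 0.3},
    {'players': ['B', 'D'], 'shot_xg': None},
    {'players': ['B', 'C'], 'shot_xg': 0.1},
])
```

Player D appears only in a possession without a shot and so receives no xG chain credit, while B and C are credited for both shot-ending possessions.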
Advanced: The emerging field of "expected possession value" (EPV), pioneered by Fernandez, Bornn, and Cervone, assigns a real-time value to every moment of ball possession based on the probability of the current possession resulting in a goal. This framework subsumes many of the metrics in this chapter into a single unified model. While computationally intensive and requiring tracking data, EPV represents the theoretical ideal toward which possession analytics is converging.
11.11 Summary
This chapter developed a comprehensive framework for analyzing possession and territorial control:
- Possession fundamentals: Ball vs. territorial possession, calculation methods, the possession paradox, sterile possession, and possession sequences
- Territorial metrics: Field position, field tilt, defensive line height, zone control, and spatial dominance models
- Possession value: xT-weighted possession, dangerous possession, and possession-adjusted metrics
- Efficiency metrics: Shot rate, xG per possession, sequence length, speed, directness, and quality distribution
- Pressing integration: PPDA, pressing intensity, high turnovers, counter-pressing analytics
- Transition analysis: Attack-to-defense and defense-to-attack transition metrics
- Game state effects: How winning, drawing, and losing alter possession patterns
- Visualization: Heatmaps, flow diagrams, dashboards, and territorial comparisons
- Possession value frameworks: xG chain, PVA, and EPV connecting possession to outcomes
The key insight is that possession quality matters more than quantity. A team with 40% possession but excellent efficiency can outperform a team with 60% possession but poor conversion. Modern analytics requires measuring not just how much of the ball a team has, but what they do with it and where they do it.
These metrics integrate naturally with passing networks (Chapter 10) and Expected Threat (Chapter 9) to provide holistic understanding of team attacking play. Combined with defensive metrics (Part III), they enable complete tactical analysis.
References
- Mackay, N. (2017). Predicting goal probabilities for possessions in football. MIT Sloan Sports Analytics Conference.
- Fernandez, J., & Bornn, L. (2018). Wide open spaces: A statistical technique for measuring space creation in professional soccer. MIT Sloan Sports Analytics Conference.
- Spearman, W. (2018). Beyond expected goals. MIT Sloan Sports Analytics Conference.
- Trainor, C., & Chappas, G. (2019). Possession-based models for player and team behavior. StatsBomb Innovation in Football Conference.
- Power, P., et al. (2017). Not all passes are created equal: Objectively measuring the risk and reward of passes in soccer. KDD 2017.
- Collet, C. (2013). The possession game? A comparative analysis of ball retention and team success in European and international football. Journal of Sports Sciences, 31(2), 123-136.
- Fernandez, J., Bornn, L., & Cervone, D. (2021). A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions. Machine Learning, 110(6), 1389-1427.
- Robberechts, P., & Davis, J. (2020). Valuing on-the-ball actions in soccer: A critical comparison of xT and VAEP. AAAI Workshop on AI in Team Sports.