Learning Objectives

  • Understand the nuances of measuring possession beyond raw percentages
  • Calculate and interpret possession sequence metrics
  • Measure territorial control through field position and zone dominance
  • Analyze pressing effectiveness and defensive territorial strategies
  • Build possession efficiency metrics combining volume and quality
  • Compare high-press versus deep-block approaches using data
  • Integrate possession metrics with xG and xT for comprehensive team evaluation

Chapter 11: Possession and Territorial Control

Introduction

Possession has long been considered a fundamental indicator of team dominance in soccer. The intuition is straightforward: if your team has the ball, the opponent cannot score. This simple logic has driven tactical philosophies from the Total Football of the 1970s Dutch teams through the tiki-taka of Barcelona and Spain in the 2010s. Yet possession statistics alone tell an incomplete story. A team might dominate possession while struggling to create chances, or a counter-attacking side might concede possession deliberately while controlling the most dangerous areas of the pitch.

This chapter develops a comprehensive framework for measuring and analyzing possession and territorial control. We move beyond simple possession percentage to examine how teams control space, where they establish dominance, and how possession translates into attacking threat. By integrating concepts from previous chapters -- particularly Expected Threat (xT) and passing networks -- we build sophisticated metrics that capture the quality and effectiveness of possession rather than merely its quantity.

The distinction between possessing the ball and controlling territory is crucial. A team circulating the ball in their defensive third technically has possession but has ceded territorial control to the opponent pressing high. Conversely, a team with the ball in the opponent's penalty area has both possession and territorial dominance. Our metrics must capture these nuances to provide actionable insights for analysts and coaches.

The evolution of possession analytics mirrors the evolution of soccer tactics. Early statistical analysis simply counted time on the ball. The next generation measured where possession occurred. Today's frontier integrates possession location, speed, direction, and outcome to create holistic pictures of how teams control matches. This chapter traces that evolution and equips you with state-of-the-art tools for each layer of analysis.

11.1 Understanding Possession

11.1.1 Defining Possession: Ball Possession vs. Territorial Possession

Possession seems simple to define but proves surprisingly complex in practice. At the broadest level, we must distinguish between two fundamentally different concepts:

Ball possession refers to which team has the ball. At any given moment during live play, one team controls the ball. The percentage of time (or passes, or touches) each team spends in control is the traditional "possession" statistic displayed on television broadcasts.

Territorial possession refers to which team dominates specific areas of the pitch. A team can have territorial dominance without having the ball -- for example, a pressing team whose players occupy advanced positions while the opposition goalkeeper takes a goal kick. Territorial possession is about spatial control, not ball control.

The distinction matters because the two concepts can diverge dramatically. A team playing out from the back under intense pressure may have ball possession but zero territorial dominance -- their possession is occurring entirely in their own half under duress. Understanding which type of possession you are measuring, and which is more relevant to your analytical question, is essential.

11.1.2 Possession Percentage: Calculation Methods and Their Differences

Different organizations calculate possession differently, and the differences are not trivial:

Time-Based Possession: The proportion of match time each team has the ball.

$$\text{Possession}_A = \frac{\text{Time with ball}_A}{\text{Total playing time}} \times 100\%$$

Time-based possession requires tracking the exact moment possession changes, which is straightforward with tracking data but can be approximate with event data. The advantage of time-based possession is that it accounts for the actual duration of control -- a team that holds the ball for 30 seconds on each possession is contributing more than one that loses it after 5 seconds, even if both complete the same number of passes.

Pass-Based Possession: The proportion of successful passes each team completes.

$$\text{Possession}_A = \frac{\text{Passes}_A}{\text{Passes}_A + \text{Passes}_B} \times 100\%$$

Pass-based possession is the most common method in event data analysis because it is easy to compute and does not require timing information. However, it inflates the possession of teams that play many short passes and underestimates teams that use fewer, longer passes. A team playing 300 short passes in their own half would appear to have more possession than a team playing 200 forward passes that generate more attacking threat.

Touch-Based Possession: The proportion of ball touches by each team.

Touch-based possession counts every interaction with the ball -- passes, dribbles, shots, clearances, and so on. This is more inclusive than pass-based possession and captures activities like dribbling and carrying that pass counts miss.

These methods yield slightly different results, though they typically correlate strongly (r > 0.90). Event data providers like StatsBomb and Opta use variations of these approaches. For our analysis, we primarily use pass-based possession as it aligns naturally with event data, but we note the alternatives where relevant.
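To make the difference between the counting methods concrete, here is a minimal sketch that computes pass-based and touch-based shares from an event table. The function name `possession_shares`, the `TOUCH_TYPES` set, and the toy data are assumptions made for illustration; the `pass_outcome` check relies on the StatsBomb convention that completed passes leave that field empty.

```python
import pandas as pd

# Event types counted as a "touch" here. This set is an assumption for
# illustration, not a provider standard.
TOUCH_TYPES = {"Pass", "Carry", "Dribble", "Shot", "Clearance"}


def possession_shares(events_df, team_a, team_b):
    """Return pass-based and touch-based possession (%) for team_a."""
    passes = events_df[events_df["type"] == "Pass"]
    # StatsBomb-style data leaves pass_outcome empty for completed passes.
    if "pass_outcome" in passes.columns:
        passes = passes[passes["pass_outcome"].isna()]
    pass_counts = passes["team"].value_counts()

    touches = events_df[events_df["type"].isin(TOUCH_TYPES)]
    touch_counts = touches["team"].value_counts()

    def share(counts):
        a, b = counts.get(team_a, 0), counts.get(team_b, 0)
        return 100.0 * a / (a + b) if (a + b) > 0 else float("nan")

    return {"pass_based": share(pass_counts), "touch_based": share(touch_counts)}


# Toy match fragment (hypothetical data)
toy = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "B", "A"],
    "type": ["Pass", "Pass", "Carry", "Pass", "Pass",
             "Clearance", "Shot", "Dribble"],
})
shares = possession_shares(toy, "A", "B")
print(shares)
```

On this toy fragment the two methods disagree noticeably, because team B's clearance and shot count as touches but not as passes.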

Common Pitfall: TV broadcast possession statistics often use a proprietary blend of time-based and touch-based methods that may not match what you compute from event data. Do not assume that your calculated possession percentage will exactly match the broadcast graphic. Differences of 2-3 percentage points are common and reflect methodological choices, not errors.

Intuition: Possession in soccer is like time of possession in American football -- useful context, but far from the whole story. A team that holds the ball for 70% of the match in their own half is not dominating; they are likely under pressure and recycling cautiously. What matters is not how much you have the ball, but what you do with it and where you have it. This chapter teaches you to measure the "what" and "where" alongside the "how much."

11.1.3 The Possession Paradox

Research has shown that the relationship between possession and match outcomes is nuanced:

  1. Positive correlation with points: Teams with higher average possession generally finish higher in league tables
  2. Diminishing returns: Beyond approximately 55-60% possession, additional possession shows decreasing correlation with winning
  3. Style dependency: Counter-attacking teams can be highly successful with low possession
  4. Context sensitivity: Possession value depends on where the ball is and what happens with it

This paradox motivates our exploration of possession quality metrics beyond raw percentages.

Real-World Application: The 2018 World Cup provided a striking example of the possession paradox. Germany, the defending champions and one of the most possession-oriented teams in the tournament, were eliminated in the group stage despite averaging 68% possession. They lost to Mexico and to South Korea, both of whom had far less of the ball. In the same tournament, Spain dominated possession against hosts Russia in the round of 16 and still went out on penalties. These results underscored that possession without penetration is not dominance -- it is sterile control.

The academic literature confirms the paradox across multiple leagues and seasons. Collet (2013) found that possession has a positive but moderate correlation with league points in the top five European leagues -- roughly r = 0.4, meaning possession explains only about 16% of the variation in points. Other factors -- shooting quality, defensive organization, set pieces, and luck -- account for far more. This statistical reality does not diminish the value of possession, but it reframes it: possession is one ingredient of success, not a guarantee.
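The 16% figure follows from squaring the correlation coefficient, which gives the share of variance in points that a linear relationship with possession accounts for:

$$r = 0.4 \quad\Rightarrow\quad r^2 = 0.4^2 = 0.16$$

By the same arithmetic, the remaining $1 - r^2 = 0.84$ of the variation is left to the other factors.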

11.1.4 The Myth of "Sterile Possession"

The phrase "sterile possession" describes possession that circulates the ball without creating attacking threat. It is one of the most commonly invoked critiques in tactical analysis, but it deserves careful examination.

Sterile possession can be identified quantitatively. If a team has high possession percentage but low xG generation, low field tilt, and few entries into the final third, their possession is genuinely failing to create threat. However, there are important caveats:

Possession as defense. Even sterile possession prevents the opponent from attacking. A team holding 65% possession is limiting the opponent to 35% -- fewer opportunities for the other side to create chances. This defensive benefit of possession is real and measurable: teams with higher possession tend to face fewer shots, even if their own attacking output is modest.

Patience vs. sterility. High-possession teams that play patiently, probing for openings before accelerating into the final third, may appear "sterile" during long periods of build-up play but then produce high-quality chances when they eventually penetrate. Judging possession quality requires looking at outcomes over the full match, not just isolated sequences.

Opponent quality effects. What appears to be sterile possession may actually be excellent defending by the opponent. When a strong defensive team limits a possession-oriented side to circulating the ball in non-threatening areas, the blame lies partly with the defense's quality, not solely with the attack's inefficiency.

Best Practice: Before labeling a team's possession as "sterile," check their xG per possession, final third entries, and field tilt. True sterile possession produces low values across all these metrics. If xG per possession is normal but total xG is low, the team may simply have fewer possessions than usual (perhaps because the opponent is also holding the ball effectively).
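The checklist above can be sketched as a simple rule. The function name and every cut-off below are hypothetical placeholders chosen for illustration; they are not values from the literature and should be calibrated against your own league data.

```python
# Illustrative cut-offs only (assumptions); calibrate on real data.
DEFAULT_THRESHOLDS = {
    "possession_pct": 60.0,      # "high" possession at or above this
    "xg_per_possession": 0.010,  # "low" threat per possession below this
    "final_third_entries": 40,   # "few" entries below this
    "field_tilt": 0.25,          # "low" attacking-third share below this
}


def sterile_possession_check(possession_pct, xg_per_possession,
                             final_third_entries, field_tilt,
                             thresholds=DEFAULT_THRESHOLDS):
    """Flag possession as sterile only if every indicator agrees."""
    flags = {
        "high_possession": possession_pct >= thresholds["possession_pct"],
        "low_xg_per_possession":
            xg_per_possession < thresholds["xg_per_possession"],
        "few_final_third_entries":
            final_third_entries < thresholds["final_third_entries"],
        "low_field_tilt": field_tilt < thresholds["field_tilt"],
    }
    # The right-hand side is evaluated before the new key is added.
    flags["sterile"] = all(flags.values())
    return flags


sterile = sterile_possession_check(66.0, 0.006, 28, 0.18)
efficient = sterile_possession_check(66.0, 0.015, 28, 0.18)
print(sterile["sterile"], efficient["sterile"])
```

Requiring all indicators to agree is a deliberately conservative design choice: a team with normal xG per possession is not labeled sterile even if its entry count is low.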

11.1.5 Possession Sequences

A possession sequence represents a continuous period of team control, ending when:

  • The opposing team gains possession
  • Play stops (out of bounds, foul, etc.)
  • A goal is scored

Key sequence metrics include:

Sequence Length: Number of passes or events in the sequence

$$L_s = \text{count of events in sequence } s$$

Sequence Duration: Time elapsed during the sequence

$$D_s = t_{end} - t_{start}$$

Sequence Progression: Net movement toward goal

$$P_s = x_{final} - x_{initial}$$

Sequence Speed: Rate of progression toward goal

$$S_s = \frac{P_s}{D_s}$$

Sequence Directness: Ratio of net progression to total distance covered

$$\text{Directness}_s = \frac{x_{final} - x_{initial}}{\sum |x_{i+1} - x_i|}$$

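A directness value close to 1 means the ball moved almost entirely forward; a value close to 0 means forward gains were largely cancelled by backward passes. Because the formula uses only x-displacements, purely lateral passes do not lower the score. If sideways circulation should count as well, use the full (Euclidean) pass lengths in the denominator.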
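As a worked example of the five definitions above, the following sketch computes them for one made-up sequence. The function name `sequence_shape` and the input values are illustrative assumptions; x-coordinates are on the 0-120 scale and times are in seconds.

```python
def sequence_shape(xs, times):
    """Length, duration, progression, speed and directness of one sequence.

    xs    : x-coordinates of consecutive events
    times : matching timestamps in seconds
    """
    progression = xs[-1] - xs[0]
    duration = times[-1] - times[0]
    # Total longitudinal distance: sum of absolute x-steps between events
    path = sum(abs(b - a) for a, b in zip(xs, xs[1:]))
    return {
        "length": len(xs),
        "duration": duration,
        "progression": progression,
        "speed": progression / duration if duration > 0 else None,
        "directness": progression / path if path > 0 else None,
    }


# Forward 15, back 5, forward 20, forward 15 over 15 seconds
m = sequence_shape(xs=[30, 45, 40, 60, 75], times=[0, 3, 6, 10, 15])
print(m)
```

Here the ball gains 45 units net while travelling 55 units along the x-axis, so directness is 45/55 (about 0.82) and speed is 3 units per second.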

import pandas as pd
import numpy as np
from statsbombpy import sb

def identify_possession_sequences(events_df, team_name):
    """
    Identify possession sequences for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze

    Returns
    -------
    list of DataFrame
        Each DataFrame is one possession sequence
    """
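    # Sort chronologically. Include 'period' when present so that first-half
    # stoppage time is not interleaved with the start of the second half.
    sort_keys = [c for c in ['period', 'minute', 'second', 'index']
                 if c in events_df.columns]
    events = events_df.sort_values(sort_keys).reset_index(drop=True)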

    sequences = []
    current_sequence = []
    current_team = None

    for idx, event in events.iterrows():
        event_team = event['team']

        # Possession change or play stoppage
        if event_team != current_team or event['type'] in ['Half Start', 'Half End']:
            if current_sequence and current_team == team_name:
                sequences.append(pd.DataFrame(current_sequence))
            current_sequence = []
            current_team = event_team

        current_sequence.append(event)

    # Final sequence
    if current_sequence and current_team == team_name:
        sequences.append(pd.DataFrame(current_sequence))

    return sequences


def analyze_sequence(sequence_df):
    """
    Calculate metrics for a single possession sequence.

    Parameters
    ----------
    sequence_df : DataFrame
        Events in the sequence

    Returns
    -------
    dict
        Sequence metrics
    """
    # Basic counts
    n_events = len(sequence_df)
    n_passes = len(sequence_df[sequence_df['type'] == 'Pass'])

    # Duration
    start_time = sequence_df.iloc[0]['minute'] * 60 + sequence_df.iloc[0].get('second', 0)
    end_time = sequence_df.iloc[-1]['minute'] * 60 + sequence_df.iloc[-1].get('second', 0)
    duration = end_time - start_time

    # Spatial progression
    start_loc = sequence_df.iloc[0].get('location')
    end_loc = sequence_df.iloc[-1].get('location')

    if isinstance(start_loc, list) and isinstance(end_loc, list):
        progression = end_loc[0] - start_loc[0]
        start_x = start_loc[0]
        end_x = end_loc[0]
    else:
        progression = 0
        start_x = end_x = None

    # Outcome
    final_event = sequence_df.iloc[-1]
    ends_in_shot = final_event['type'] == 'Shot'
    ends_in_goal = ends_in_shot and final_event.get('shot_outcome') == 'Goal'

    return {
        'n_events': n_events,
        'n_passes': n_passes,
        'duration': duration,
        'progression': progression,
        'start_x': start_x,
        'end_x': end_x,
        'ends_in_shot': ends_in_shot,
        'ends_in_goal': ends_in_goal
    }

Common Pitfall: When analyzing possession sequences, be careful about how you define sequence boundaries. Different data providers use different rules for when a possession ends. Some count a deflection as a possession change, while others do not. Ensure consistency within your analysis and document your definition explicitly.
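One way to keep boundaries consistent is to reuse the provider's own possession counter rather than inferring changes from the acting team. The sketch below assumes StatsBomb-style `possession` (integer id) and `possession_team` columns; the function name and toy data are hypothetical.

```python
import pandas as pd


def sequences_from_possession_id(events_df, team_name):
    """Group a team's on-ball events by the provider's possession id.

    Assumes 'possession' and 'possession_team' columns exist
    (StatsBomb-style naming).
    """
    owned = events_df[
        (events_df["possession_team"] == team_name)
        # Drop the opponent's defensive actions (e.g. Pressure) that are
        # logged inside this team's possession.
        & (events_df["team"] == team_name)
    ]
    return [group for _, group in owned.groupby("possession", sort=True)]


toy = pd.DataFrame({
    "possession": [1, 1, 1, 2, 2, 3, 3],
    "possession_team": ["A", "A", "A", "B", "B", "A", "A"],
    "team": ["A", "B", "A", "B", "B", "A", "A"],
    "type": ["Pass", "Pressure", "Pass", "Pass", "Shot", "Pass", "Carry"],
})
seqs = sequences_from_possession_id(toy, "A")
print(len(seqs), [len(s) for s in seqs])
```

Whichever rule you choose, apply it to every match in the sample so that sequence counts remain comparable.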

11.1.6 Build-Up Play Analysis

Build-up play -- the phase where a team constructs an attack from their own half -- is a critical component of possession that deserves dedicated analysis. Teams vary enormously in how they build attacks: some play short from the goalkeeper, others go long to a target forward, and many use a combination depending on opposition pressing intensity.

Key metrics for build-up play analysis include:

Build-up origin: Where possessions begin (goal kick, throw-in, open play recovery)

Goalkeeper involvement: How often and how the goalkeeper participates in build-up. Short distributions, long kicks, and throws each indicate different tactical approaches.

Build-up speed: Time from possession start to reaching the middle third. Faster build-up suggests a more direct approach; slower build-up suggests patient construction.

Build-up route: Whether the team builds through the center, through the flanks, or uses switches of play. This can be measured by the y-coordinate variance of passes during the build-up phase.
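Two of these metrics, build-up speed and build-up route, can be sketched for a single possession as follows. The function name, the input layout (a list of dicts with `t`, `x`, `y`), and the choice of x = 40 as the start of the middle third are assumptions for illustration.

```python
import numpy as np


def build_up_profile(sequence, middle_third_x=40.0):
    """Build-up speed and width for one possession.

    sequence : list of dicts with 't' (seconds), 'x' and 'y'
    Returns None when the possession does not start in the defensive third.
    """
    start = sequence[0]
    if start["x"] >= middle_third_x:
        return None

    # Collect events up to and including the first one in the middle third
    phase, reached_at = [], None
    for ev in sequence:
        phase.append(ev)
        if ev["x"] >= middle_third_x:
            reached_at = ev["t"]
            break

    return {
        "reached_middle_third": reached_at is not None,
        "build_up_seconds":
            None if reached_at is None else reached_at - start["t"],
        # Higher variance in y suggests more switching between flanks
        "y_variance": float(np.var([ev["y"] for ev in phase])),
        "n_events": len(phase),
    }


seq = [
    {"t": 0, "x": 10, "y": 40},
    {"t": 4, "x": 18, "y": 20},
    {"t": 9, "x": 30, "y": 60},
    {"t": 12, "x": 45, "y": 50},
]
profile = build_up_profile(seq)
print(profile)
```

Averaging `build_up_seconds` and `y_variance` over all of a team's build-up possessions gives a compact style profile that can be compared across opponents.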

Real-World Application: Manchester City under Pep Guardiola are famous for building from the back. Analysis of their possession sequences reveals that they average 4-5 passes before crossing the halfway line, compared to 2-3 for more direct teams. Their build-up also shows significantly higher y-coordinate variance, indicating frequent switches of play designed to find the weak side of the opposition press.

11.2 Territorial Control

11.2.1 Defining Territory

Territorial control measures where on the pitch teams establish dominance. Unlike possession, which is binary (one team has the ball), territory can be shared or contested. We measure territory through:

  1. Average field position: Where events occur
  2. Zone dominance: Proportion of actions in each pitch zone
  3. Spatial control models: Probabilistic ownership of pitch areas

11.2.2 Field Tilt and Territorial Dominance Metrics

Average X Position: The mean horizontal position of a team's actions

$$\bar{X}_{team} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $x_i$ is the x-coordinate of each action.

Field Tilt: The proportion of touches in the attacking third

$$\text{Tilt} = \frac{\text{Touches in attacking third}}{\text{Total touches}}$$

Field Tilt is one of the most underrated metrics in soccer analytics. It is simple to calculate, easy to explain to non-technical audiences, and strongly correlated with match dominance. A team with a Field Tilt above 0.40 (40% of touches in the attacking third) is exerting significant territorial pressure. Most top teams average 0.30-0.35, so values above 0.40 indicate exceptional attacking dominance in a given match.

Territorial Index: Comparison of average positions between teams

$$TI = \frac{\bar{X}_{team} - \bar{X}_{opponent}}{120} + 0.5$$

Values above 0.5 indicate territorial advantage. The Territorial Index is especially useful for comparing the two sides in a specific match because it directly captures the spatial battle between them.
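A short numeric sketch of the Territorial Index, using made-up x-coordinates. It assumes each team's coordinates are expressed in its own attacking-right frame, as in StatsBomb data; the function name is hypothetical.

```python
def territorial_index(team_xs, opponent_xs, pitch_length=120.0):
    """TI = (mean x of team - mean x of opponent) / pitch length + 0.5."""
    mean_team = sum(team_xs) / len(team_xs)
    mean_opp = sum(opponent_xs) / len(opponent_xs)
    return (mean_team - mean_opp) / pitch_length + 0.5


# Team averages x = 66, opponent averages x = 54 (hypothetical values)
ti = territorial_index([70, 62, 66], [50, 58, 54])
print(round(ti, 3))
```

A 12-unit gap in average position moves the index from 0.5 to 0.6, which illustrates how compressed the scale is: most matches will fall in a narrow band around 0.5.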

Defensive Line Height: The average x-position of a team's defensive actions indicates how high they defend. A high defensive line (average defensive action x > 50) indicates an aggressive territorial approach, while a low line (x < 40) indicates a deep-block approach. This metric can be derived from the location of tackles, interceptions, and pressures.
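The defensive line height described above might be computed along these lines. The event-type labels passed as `defensive_types` are assumptions; providers name these events differently (some record tackles as a subtype of another event), so adapt the labels to your data.

```python
import pandas as pd


def defensive_line_height(events_df, team_name,
                          defensive_types=("Tackle", "Interception",
                                           "Pressure")):
    """Average x of a team's defensive actions, with a coarse label."""
    mask = ((events_df["team"] == team_name)
            & events_df["type"].isin(defensive_types))
    xs = [loc[0] for loc in events_df.loc[mask, "location"]
          if isinstance(loc, list)]
    if not xs:
        return None
    avg_x = sum(xs) / len(xs)
    # Cut-offs follow the text: above 50 is high, below 40 is low
    label = "high" if avg_x > 50 else "low" if avg_x < 40 else "medium"
    return {"avg_defensive_x": avg_x, "line": label}


toy = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A"],
    "type": ["Tackle", "Interception", "Pressure", "Tackle", "Pass"],
    "location": [[55, 30], [60, 40], [47, 20], [30, 40], [90, 40]],
})
line = defensive_line_height(toy, "A")
print(line)
```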

Advanced: For a more sophisticated measure of territorial control, consider spatial models built on tracking data. The simplest is a Voronoi diagram, which assigns every point on the pitch to the nearest player and produces a "control map" showing which team dominates each area. Pitch control models (such as those developed by William Spearman) refine this idea by accounting for player velocities and the time each player needs to reach a point, yielding a continuous probability of control rather than a hard assignment. Tracking data is required, but these models provide a far more detailed picture of spatial dominance than event data can.
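As a rough illustration of the nearest-player idea, the sketch below approximates a Voronoi assignment on a grid for a single tracking frame. The function name, grid resolution, and player positions are assumptions; both teams' coordinates must be in one shared pitch frame, and player velocity is ignored.

```python
import numpy as np


def nearest_player_control(team_xy, opponent_xy, grid=(60, 40),
                           pitch=(120.0, 80.0)):
    """Assign each grid cell to the team with the nearest player.

    Returns a boolean map of shape (n_y, n_x) that is True where the
    team's nearest player is closer, plus the team's share of the pitch.
    """
    n_x, n_y = grid
    # Cell centres along each axis
    gx = (np.arange(n_x) + 0.5) * pitch[0] / n_x
    gy = (np.arange(n_y) + 0.5) * pitch[1] / n_y
    cells = np.stack(np.meshgrid(gx, gy), axis=-1).reshape(-1, 2)

    def min_dist(players):
        pts = np.asarray(players, dtype=float)
        diff = cells[:, None, :] - pts[None, :, :]
        return np.linalg.norm(diff, axis=2).min(axis=1)

    owned = min_dist(team_xy) < min_dist(opponent_xy)
    return owned.reshape(n_y, n_x), float(owned.mean())


# One player each, mirrored around the halfway line (toy frame)
control_map, share = nearest_player_control([(30, 40)], [(90, 40)])
print(control_map.shape, share)
```

With one player per side mirrored around the halfway line, each team owns exactly half of the cells, which is a convenient sanity check before feeding in real frames.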

def calculate_field_position(events_df, team_name):
    """
    Calculate field position metrics for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze

    Returns
    -------
    dict
        Field position metrics
    """
    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    # Extract coordinates
    x_coords = []
    y_coords = []

    for loc in team_events['location']:
        if isinstance(loc, list) and len(loc) >= 2:
            x_coords.append(loc[0])
            y_coords.append(loc[1])

    if not x_coords:
        return None

    x_coords = np.array(x_coords)

    # Metrics
    avg_x = np.mean(x_coords)
    avg_y = np.mean(y_coords)

    # Field tilt (attacking third is x > 80 on the 120-unit StatsBomb pitch)
    attacking_third = np.sum(x_coords > 80) / len(x_coords)
    middle_third = np.sum((x_coords >= 40) & (x_coords <= 80)) / len(x_coords)
    defensive_third = np.sum(x_coords < 40) / len(x_coords)

    return {
        'avg_x': avg_x,
        'avg_y': avg_y,
        'attacking_third': attacking_third,
        'middle_third': middle_third,
        'defensive_third': defensive_third,
        'field_tilt': attacking_third
    }

Common Pitfall: When calculating territorial metrics, remember that StatsBomb coordinates always orient the attacking direction to the right (x = 120). This means the attacking third for a team is always x > 80, regardless of which end they physically defend. Other data providers may not normalize coordinates this way, so always verify the coordinate convention before analysis.

11.2.3 Zone-Based Analysis

Dividing the pitch into zones enables detailed territorial analysis:

def calculate_zone_control(events_df, team_name, n_x=6, n_y=3):
    """
    Calculate zone control percentages.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    n_x : int
        Number of zones along the pitch length (x-axis)
    n_y : int
        Number of zones across the pitch width (y-axis)

    Returns
    -------
    ndarray
        Zone control matrix (proportion of actions in each zone)
    """
    zone_counts = np.zeros((n_y, n_x))

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    for loc in team_events['location']:
        if not isinstance(loc, list):
            continue

        x, y = loc[0], loc[1]

        # Convert to zone indices
        zone_x = min(int(x / 120 * n_x), n_x - 1)
        zone_y = min(int(y / 80 * n_y), n_y - 1)

        zone_counts[zone_y, zone_x] += 1

    # Normalize to proportions
    total = zone_counts.sum()
    if total > 0:
        zone_counts = zone_counts / total

    return zone_counts


def compare_zone_control(events_df, team1, team2, n_x=6, n_y=3):
    """
    Compare zone control between two teams.

    Returns matrix where positive values indicate team1 dominance,
    negative values indicate team2 dominance.
    """
    zone1 = calculate_zone_control(events_df, team1, n_x, n_y)
    zone2 = calculate_zone_control(events_df, team2, n_x, n_y)

    return zone1 - zone2

11.2.4 Spatial Control Models

Advanced spatial control models estimate the probability of each team controlling any point on the pitch. These typically require tracking data but can be approximated from event data:

$$P(control | x, y) = \sigma\left(\sum_{i} w_i \cdot K(x, y, x_i, y_i)\right)$$

where $K$ is a kernel function (often Gaussian) and $w_i$ are weights based on event importance.

from scipy.ndimage import gaussian_filter

def estimate_spatial_control(events_df, team_name, grid_size=(12, 8), sigma=1.5):
    """
    Estimate spatial control using kernel density estimation.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    grid_size : tuple
        (n_x, n_y) grid dimensions
    sigma : float
        Gaussian smoothing parameter

    Returns
    -------
    ndarray
        Relative control intensity scaled to [0, 1] (not a calibrated probability)
    """
    n_x, n_y = grid_size
    control_map = np.zeros((n_y, n_x))

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    # Weight by event type
    event_weights = {
        'Pass': 1.0,
        'Carry': 1.0,
        'Shot': 2.0,
        'Dribble': 1.5,
        'Ball Receipt*': 0.5,
        'Pressure': 0.5
    }

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue

        x, y = loc[0], loc[1]
        zone_x = min(int(x / 120 * n_x), n_x - 1)
        zone_y = min(int(y / 80 * n_y), n_y - 1)

        weight = event_weights.get(event['type'], 1.0)
        control_map[zone_y, zone_x] += weight

    # Apply Gaussian smoothing
    control_map = gaussian_filter(control_map, sigma=sigma)

    # Normalize to [0, 1]
    if control_map.max() > 0:
        control_map = control_map / control_map.max()

    return control_map
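The function above approximates the control formula with a smoothed histogram. A more literal reading of $P(control | x, y)$ is sketched below: a Gaussian kernel sum passed through a sigmoid. Treating opponent events as negative weights is our assumption (the formula does not specify it), as are the function name and the `bandwidth` parameter. All coordinates must share one pitch frame, so opponent locations from StatsBomb data would need to be flipped first.

```python
import numpy as np


def kernel_control(points, team_events, opponent_events, bandwidth=10.0):
    """Sigmoid of a signed Gaussian kernel sum at each query point.

    points : sequence of (x, y) query locations
    *_events : sequences of (x, y, weight) tuples
    """
    points = np.asarray(points, dtype=float)

    def kernel_sum(events):
        if len(events) == 0:
            return np.zeros(len(points))
        ev = np.asarray(events, dtype=float)
        # Squared distance from every point to every event: shape (m, k)
        d2 = ((points[:, None, :] - ev[None, :, :2]) ** 2).sum(axis=2)
        return (ev[:, 2] * np.exp(-d2 / (2 * bandwidth ** 2))).sum(axis=1)

    score = kernel_sum(team_events) - kernel_sum(opponent_events)
    return 1.0 / (1.0 + np.exp(-score))


p = kernel_control(
    points=[(100, 40), (60, 40), (20, 40)],
    team_events=[(100, 40, 1.0)],
    opponent_events=[(20, 40, 1.0)],
)
print(p)
```

Points equidistant from both teams' activity come out at 0.5, and the output stays strictly between 0 and 1, which is the behavior the sigmoid in the formula is meant to guarantee.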

11.3 Possession Value Models

11.3.1 Beyond Simple Possession: Linking Possession to Expected Outcomes

Raw possession percentage fails to capture possession quality. A team with 70% possession but never entering the final third generates less threat than a team with 40% possession but consistently reaching dangerous areas. Possession value models address this by weighting possession by its threat-generating potential.

The key insight driving modern possession value frameworks is that not all possession is created equal. Possessing the ball on the halfway line generates negligible threat; possessing it at the edge of the penalty area generates significant threat. By applying location-based value models -- such as Expected Threat (xT) from Chapter 9 -- to possession data, we create metrics that capture the quality of possession alongside its quantity.

11.3.2 Effective Possession: Possession in Dangerous Areas

We can weight possession by the Expected Threat values of locations controlled:

$$\text{xT Possession} = \sum_{e \in events} xT(x_e, y_e) \cdot w_e$$

where $w_e$ is an event weight (e.g., duration or importance).

def calculate_xt_possession(events_df, team_name, xt_grid):
    """
    Calculate xT-weighted possession.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    xt_grid : ndarray
        xT values for each zone

    Returns
    -------
    dict
        xT possession metrics
    """
    grid_y, grid_x = xt_grid.shape

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    total_xt = 0
    n_events = 0

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue

        x, y = loc[0], loc[1]
        zone_x = min(int(x / 120 * grid_x), grid_x - 1)
        zone_y = min(int(y / 80 * grid_y), grid_y - 1)

        total_xt += xt_grid[zone_y, zone_x]
        n_events += 1

    return {
        'total_xt_possession': total_xt,
        'avg_xt_possession': total_xt / n_events if n_events > 0 else 0,
        'n_events': n_events
    }

11.3.3 Dangerous Possession

We can define "dangerous possession" as possession in high-xT zones:

$$\text{Dangerous Possession \%} = \frac{\text{Events where } xT > \theta}{\text{Total events}}$$

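The threshold $\theta$ is a design choice. A common value is 0.05 -- meaning zones from which possession is expected to lead to a goal more than 5% of the time. Note that xT is not the chance of scoring with a single action; it is the probability that the possession eventually produces a goal from that location. Values above 0.05 typically correspond to areas inside or near the penalty area and to central positions in the attacking third.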

def calculate_dangerous_possession(events_df, team_name, xt_grid, threshold=0.05):
    """
    Calculate proportion of possession in dangerous areas.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    xt_grid : ndarray
        xT values
    threshold : float
        xT threshold for "dangerous"

    Returns
    -------
    float
        Proportion of dangerous possession
    """
    grid_y, grid_x = xt_grid.shape

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    dangerous = 0
    total = 0

    for _, event in team_events.iterrows():
        loc = event['location']
        if not isinstance(loc, list):
            continue

        x, y = loc[0], loc[1]
        zone_x = min(int(x / 120 * grid_x), grid_x - 1)
        zone_y = min(int(y / 80 * grid_y), grid_y - 1)

        total += 1
        if xt_grid[zone_y, zone_x] > threshold:
            dangerous += 1

    return dangerous / total if total > 0 else 0

Intuition: Dangerous possession percentage tells you what fraction of a team's ball touches occur in threatening positions. Two teams can both have 50% possession, but if one has 12% dangerous possession and the other has 5%, the first team is creating significantly more threat from a similar amount of the ball. This is a far more informative metric than raw possession percentage for understanding match dynamics.
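Because the threshold is a design choice, it is worth checking how sensitive the metric is to it. The sketch below uses a tiny made-up xT grid and six hypothetical event locations; the zone lookup mirrors the indexing used in the functions above.

```python
import numpy as np

# Toy xT values (assumed): rows are y-zones, columns are x-zones
xt_grid = np.array([
    [0.005, 0.010, 0.030, 0.080],
    [0.006, 0.015, 0.050, 0.200],
    [0.005, 0.010, 0.030, 0.080],
])

# Hypothetical event locations on the 120 x 80 pitch
locations = [(10, 40), (50, 40), (70, 10), (100, 40), (110, 70), (95, 5)]


def dangerous_share(locations, xt_grid, threshold):
    """Fraction of locations whose zone xT exceeds the threshold."""
    n_y, n_x = xt_grid.shape
    hits = 0
    for x, y in locations:
        zx = min(int(x / 120 * n_x), n_x - 1)
        zy = min(int(y / 80 * n_y), n_y - 1)
        hits += int(xt_grid[zy, zx] > threshold)
    return hits / len(locations)


by_threshold = {t: dangerous_share(locations, xt_grid, t)
                for t in (0.02, 0.05, 0.10)}
print(by_threshold)
```

If the ranking of teams changes materially between nearby thresholds, report the metric at more than one value rather than relying on a single cut-off.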

11.3.4 Possession-Adjusted Metrics

Many metrics benefit from possession adjustment to enable fair comparison between possession-dominant and counter-attacking teams:

$$\text{Metric per 100 possessions} = \frac{\text{Metric}}{\text{Total possessions}} \times 100$$

Without possession adjustment, high-possession teams will naturally produce higher raw totals for most attacking metrics simply because they have more opportunities. Adjusting per possession normalizes for this, revealing which teams are more efficient with their opportunities.

def possession_adjust_metrics(team_metrics, possession_sequences):
    """
    Adjust team metrics for possession volume.

    Parameters
    ----------
    team_metrics : dict
        Raw metric values
    possession_sequences : list
        Team's possession sequences

    Returns
    -------
    dict
        Possession-adjusted metrics
    """
    n_possessions = len(possession_sequences)

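    if n_possessions == 0:
        # Nothing to normalize by; return the raw metrics with the count attached
        return {**team_metrics, 'n_possessions': 0}

    adjusted = {}
    for key, value in team_metrics.items():
        # Accept NumPy scalars as well as Python numbers, but skip booleans
        is_number = isinstance(value, (int, float, np.integer, np.floating))
        if is_number and not isinstance(value, (bool, np.bool_)):
            adjusted[f'{key}_per_100poss'] = value / n_possessions * 100
        adjusted[key] = value

    adjusted['n_possessions'] = n_possessions
    return adjusted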

Best Practice: Always present both raw and possession-adjusted metrics side by side. Raw metrics show total output; adjusted metrics show efficiency. A team with high raw xG but low xG per possession is generating volume through sheer dominance of the ball. A team with modest raw xG but high xG per possession is lethally efficient with limited opportunities. Both profiles can be successful, and coaches need to see both dimensions.
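A two-team toy comparison shows why both views matter. The team names and figures below are invented for illustration.

```python
# Hypothetical single-match figures
teams = {
    "Dominant FC": {"xg": 2.1, "possessions": 120},
    "Counter United": {"xg": 1.4, "possessions": 60},
}


def per_100(metric, possessions):
    """Scale a raw total to a per-100-possessions rate."""
    return metric / possessions * 100


adjusted = {name: per_100(t["xg"], t["possessions"])
            for name, t in teams.items()}

for name, t in teams.items():
    print(f"{name}: raw xG {t['xg']:.2f}, "
          f"xG per 100 possessions {adjusted[name]:.2f}")
```

The dominant side leads on raw xG (2.1 against 1.4), but the counter-attacking side is ahead per 100 possessions (about 2.33 against 1.75), which is exactly the pair of profiles described above.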

11.4 Possession Efficiency

11.4.1 Defining Efficiency

Possession efficiency measures how well a team converts possession into attacking threat or goals:

$$\text{Possession Efficiency} = \frac{\text{Value Created}}{\text{Possession Volume}}$$

Different efficiency metrics capture different aspects:

Shot Efficiency: Shots per possession sequence

$$\text{Shot Rate} = \frac{\text{Possessions ending in shot}}{\text{Total possessions}}$$

Typically, 8-15% of possessions end in a shot. Elite attacking teams may reach 15-18%, while struggling sides may fall below 8%.

xG Efficiency: Expected goals per possession

$$\text{xG Rate} = \frac{\text{Total xG}}{\text{Total possessions}}$$

xT Efficiency: Threat generated per possession

$$\text{xT Rate} = \frac{\text{Total xT generated}}{\text{Total possessions}}$$
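The three rates can be computed directly from a list of per-sequence metric dictionaries. In this sketch the `xg` key is a hypothetical addition (the sequence metrics shown earlier do not include it), and `xt_generated` is treated as optional.

```python
def efficiency_rates(sequence_metrics):
    """Shot, xG and xT rates per possession from per-sequence dicts."""
    n = len(sequence_metrics)
    if n == 0:
        return {"shot_rate": 0.0, "xg_rate": 0.0, "xt_rate": 0.0}
    shots = sum(1 for m in sequence_metrics if m.get("ends_in_shot"))
    total_xg = sum(m.get("xg", 0.0) for m in sequence_metrics)
    total_xt = sum(m.get("xt_generated", 0.0) for m in sequence_metrics)
    return {
        "shot_rate": shots / n,
        "xg_rate": total_xg / n,
        "xt_rate": total_xt / n,
    }


# Ten toy possessions: two end in shots worth 0.10 and 0.30 xG
toy_sequences = (
    [{"ends_in_shot": False, "xt_generated": 0.02} for _ in range(8)]
    + [{"ends_in_shot": True, "xg": 0.10, "xt_generated": 0.02},
       {"ends_in_shot": True, "xg": 0.30, "xt_generated": 0.02}]
)
rates = efficiency_rates(toy_sequences)
print(rates)
```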

11.4.2 Possession Sequences: Length, Speed, and Directness

The characteristics of possession sequences reveal fundamental tactical choices:

Long sequences (10+ passes) are characteristic of patient, possession-oriented teams. Longer sequences are generally more likely to end in a shot than very short ones -- research suggests that sequences of 6-10 passes have the highest shot rate -- but the marginal returns diminish beyond about 10 passes, as the defense has time to reorganize.

Short sequences (1-3 passes) are characteristic of direct, counter-attacking teams. While each individual sequence has a lower probability of producing a shot, the shots that do result tend to be of higher quality (higher xG per shot) because they catch the defense out of shape.

Sequence speed -- measured as meters progressed per second -- distinguishes fast transitions from patient build-up. Counter-pressing teams such as Liverpool under Klopp are typically associated with very high sequence speeds, reflecting their philosophy of attacking quickly after winning the ball.

Sequence directness captures whether a team progresses linearly toward goal or circulates laterally. A team with high directness plays the ball forward at every opportunity; a team with low directness recycles possession and probes for openings.
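To examine how shot rate varies with sequence length in your own data, the sequences can be bucketed by pass count. The bucket edges, labels, and function name below are assumptions chosen to match the ranges discussed above; sequences with zero passes fall outside the first bin and are excluded.

```python
import pandas as pd


def shot_rate_by_length(sequence_metrics,
                        bins=(0, 3, 5, 10, float("inf")),
                        labels=("1-3", "4-5", "6-10", "11+")):
    """Shot rate and sample size per pass-count bucket."""
    df = pd.DataFrame(sequence_metrics)
    # Right-closed bins: (0, 3], (3, 5], (5, 10], (10, inf)
    df["bucket"] = pd.cut(df["n_passes"], bins=list(bins),
                          labels=list(labels))
    grouped = df.groupby("bucket", observed=False)["ends_in_shot"]
    return grouped.agg(["mean", "count"])


toy_metrics = [
    {"n_passes": 1, "ends_in_shot": False},
    {"n_passes": 2, "ends_in_shot": False},
    {"n_passes": 3, "ends_in_shot": True},
    {"n_passes": 7, "ends_in_shot": True},
    {"n_passes": 8, "ends_in_shot": False},
    {"n_passes": 12, "ends_in_shot": False},
]
by_length = shot_rate_by_length(toy_metrics)
print(by_length)
```

Always read the `count` column alongside the rate: buckets with only a handful of sequences produce unstable shot rates, and empty buckets show up as missing values.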

class PossessionEfficiencyAnalyzer:
    """
    Analyze possession efficiency for a team.

    Attributes
    ----------
    team_name : str
        Team to analyze
    sequences : list
        Possession sequences
    """

    def __init__(self, events_df, team_name, xt_grid=None):
        """
        Initialize analyzer.

        Parameters
        ----------
        events_df : DataFrame
            Match events
        team_name : str
            Team name
        xt_grid : ndarray, optional
            xT values for threat calculation
        """
        self.team_name = team_name
        self.events_df = events_df
        self.xt_grid = xt_grid

        # Build sequences
        self.sequences = identify_possession_sequences(events_df, team_name)
        self._analyze_sequences()

    def _analyze_sequences(self):
        """Analyze all possession sequences."""
        self.sequence_metrics = []

        for seq in self.sequences:
            metrics = analyze_sequence(seq)

            # Add xT if available
            if self.xt_grid is not None:
                metrics['xt_generated'] = self._calculate_sequence_xt(seq)

            self.sequence_metrics.append(metrics)

    def _calculate_sequence_xt(self, sequence_df):
        """Calculate xT generated in a sequence."""
        grid_y, grid_x = self.xt_grid.shape
        total_xt = 0

        for i in range(len(sequence_df) - 1):
            start = sequence_df.iloc[i]
            end = sequence_df.iloc[i + 1]

            start_loc = start.get('location')
            end_loc = end.get('location')

            if not (isinstance(start_loc, list) and isinstance(end_loc, list)):
                continue

            # Get zones
            sz_x = min(int(start_loc[0] / 120 * grid_x), grid_x - 1)
            sz_y = min(int(start_loc[1] / 80 * grid_y), grid_y - 1)
            ez_x = min(int(end_loc[0] / 120 * grid_x), grid_x - 1)
            ez_y = min(int(end_loc[1] / 80 * grid_y), grid_y - 1)

            xt_delta = self.xt_grid[ez_y, ez_x] - self.xt_grid[sz_y, sz_x]
            total_xt += max(0, xt_delta)

        return total_xt

    def get_efficiency_metrics(self):
        """
        Calculate overall efficiency metrics.

        Returns
        -------
        dict
            Efficiency metrics
        """
        n_sequences = len(self.sequences)

        if n_sequences == 0:
            return {}

        df = pd.DataFrame(self.sequence_metrics)

        # Basic rates
        shot_rate = df['ends_in_shot'].sum() / n_sequences
        goal_rate = df['ends_in_goal'].sum() / n_sequences

        # Progression
        avg_progression = df['progression'].mean()

        # Duration efficiency
        avg_duration = df['duration'].mean()

        metrics = {
            'n_possessions': n_sequences,
            'shot_rate': shot_rate,
            'goal_rate': goal_rate,
            'avg_progression': avg_progression,
            'avg_duration': avg_duration,
            'avg_passes': df['n_passes'].mean()
        }

        # xT if available
        if 'xt_generated' in df.columns:
            metrics['total_xt_generated'] = df['xt_generated'].sum()
            metrics['xt_per_possession'] = df['xt_generated'].mean()

        return metrics

    def get_sequence_quality_distribution(self):
        """
        Analyze distribution of sequence quality.

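        Returns
        -------
        Series
            Share of sequences in each quality category
        """
        df = pd.DataFrame(self.sequence_metrics)

        # No sequences: return an empty result instead of failing in apply()
        if df.empty:
            return pd.Series(dtype=float)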

        # Categorize sequences
        def categorize(row):
            if row['ends_in_goal']:
                return 'Goal'
            elif row['ends_in_shot']:
                return 'Shot'
            elif row.get('progression', 0) > 30:
                return 'Good progression'
            elif row.get('n_passes', 0) >= 5:
                return 'Sustained'
            else:
                return 'Unproductive'

        df['category'] = df.apply(categorize, axis=1)

        return df['category'].value_counts(normalize=True)

11.4.3 Efficiency Comparison

Comparing efficiency between teams or across matches:

def compare_possession_efficiency(events_df, team1, team2, xt_grid=None):
    """
    Compare possession efficiency between two teams.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team1, team2 : str
        Team names
    xt_grid : ndarray, optional
        xT values

    Returns
    -------
    DataFrame
        Comparison of efficiency metrics
    """
    analyzer1 = PossessionEfficiencyAnalyzer(events_df, team1, xt_grid)
    analyzer2 = PossessionEfficiencyAnalyzer(events_df, team2, xt_grid)

    metrics1 = analyzer1.get_efficiency_metrics()
    metrics2 = analyzer2.get_efficiency_metrics()

    comparison = pd.DataFrame({
        team1: metrics1,
        team2: metrics2
    }).T

    return comparison

11.5 Pressing and Possession Regain

11.5.1 The Connection to Possession

Pressing -- applying defensive pressure to regain the ball -- directly affects possession dynamics. High-pressing teams aim to regain possession in dangerous areas, while low-block teams concede possession but in less threatening locations. The choice between these approaches has profound implications for both possession metrics and match outcomes.

The relationship between pressing and possession is bidirectional. High pressing leads to more possession regained in advanced areas, which in turn creates shorter distances to goal and higher-quality attacking opportunities. But pressing also carries risk: if the press is beaten, the pressing team is left with players out of position and vulnerable to counter-attacks. Understanding this risk-reward tradeoff quantitatively is one of the most valuable applications of possession analytics.

11.5.2 PPDA: Passes Per Defensive Action

PPDA measures pressing intensity:

$$PPDA = \frac{\text{Opponent passes in the pressing zone}}{\text{Defensive actions in the pressing zone}}$$

Lower PPDA indicates more intense pressing (fewer opponent passes allowed per defensive action). Typical PPDA values range from about 6 (extremely intense pressing, like peak Jurgen Klopp Liverpool) to about 15 (very passive defending, like a deep-block team protecting a lead).

The pressing zone is a region high up the pitch, in the opponent's territory. The implementation below uses the opponent's defensive third: it counts opponent passes made there and the pressing team's defensive actions (tackles, interceptions, fouls, pressures) in the same zone:

def calculate_ppda(events_df, pressing_team, defending_third_threshold=40):
    """
    Calculate PPDA for a team.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    pressing_team : str
        Team whose pressing to measure
    defending_third_threshold : float
        X-coordinate marking defending team's defensive third

    Returns
    -------
    float
        PPDA value
    """
    # Get opponent
    teams = events_df['team'].unique()
    opponent = [t for t in teams if t != pressing_team][0]

    # Opponent passes in their defensive third
    opponent_passes = events_df[
        (events_df['team'] == opponent) &
        (events_df['type'] == 'Pass') &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] < defending_third_threshold
        ))
    ]

    # Defensive actions by pressing team in the same zone. Locations are
    # recorded from the acting team's perspective, so the opponent's
    # defensive third is the pressing team's attacking third.
    pressing_zone_start = 120 - defending_third_threshold
    defensive_actions = events_df[
        (events_df['team'] == pressing_team) &
        (events_df['type'].isin(['Pressure', 'Tackle', 'Interception', 'Foul Committed'])) &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] > pressing_zone_start
        ))
    ]

    n_passes = len(opponent_passes)
    n_def_actions = len(defensive_actions)

    return n_passes / n_def_actions if n_def_actions > 0 else float('inf')

Common Pitfall: PPDA definitions vary across the analytics community. Some analysts use the opponent's half rather than their defensive third, some exclude fouls from defensive actions, and some weight actions differently. When comparing PPDA values across sources, always verify the exact definition being used. The trends and relative comparisons are usually consistent, but absolute values may differ.

11.5.3 Pressing Intensity and High Turnovers

Beyond PPDA, several additional metrics capture pressing behavior:

Pressing height: The average x-coordinate of pressure events. Higher values indicate more aggressive pressing.

Pressing success rate: The proportion of pressure events that result in a turnover within a short time window (typically 5 seconds).
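Both definitions can be sketched on a simplified event table. The function name `pressing_profile` and the flat `time` (seconds) and `x` columns are assumptions for this example; real event data would first need its timestamps and `location` lists converted into that shape.

```python
import pandas as pd


def pressing_profile(events, team, window=5.0):
    """Pressing height and success rate (hypothetical helper).

    events : DataFrame with columns 'team', 'type', 'time' (seconds), 'x'.
    A pressure counts as successful if the same team records a ball
    recovery or interception within `window` seconds of it.
    """
    events = events.sort_values('time')
    pressures = events[(events['team'] == team) & (events['type'] == 'Pressure')]
    if pressures.empty:
        return {'pressing_height': None, 'pressing_success_rate': 0.0}

    regain_times = events.loc[
        (events['team'] == team)
        & events['type'].isin(['Ball Recovery', 'Interception']),
        'time',
    ].to_numpy()

    successes = sum(
        bool(((regain_times >= t) & (regain_times <= t + window)).any())
        for t in pressures['time']
    )
    return {
        'pressing_height': pressures['x'].mean(),
        'pressing_success_rate': successes / len(pressures),
    }


toy = pd.DataFrame({
    'team': ['A', 'A', 'B', 'A', 'A'],
    'type': ['Pressure', 'Ball Recovery', 'Pass', 'Pressure', 'Pass'],
    'time': [10.0, 12.0, 30.0, 40.0, 50.0],
    'x': [90.0, 85.0, 30.0, 70.0, 60.0],
})
profile = pressing_profile(toy, 'A')
```

In the toy data the first pressure is followed by a recovery two seconds later and the second is not, so half of the pressures succeed.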

High turnovers measure ball recoveries in dangerous areas:

def calculate_high_turnovers(events_df, team_name, threshold_x=80):
    """
    Count high turnovers (ball recoveries in attacking third).

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    threshold_x : float
        X-coordinate threshold for "high"

    Returns
    -------
    dict
        High turnover metrics
    """
    recovery_types = ['Ball Recovery', 'Interception']

    high_recoveries = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'].isin(recovery_types)) &
        (events_df['location'].apply(
            lambda x: isinstance(x, list) and x[0] > threshold_x
        ))
    ]

    total_recoveries = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'].isin(recovery_types))
    ]

    return {
        'high_turnovers': len(high_recoveries),
        'total_recoveries': len(total_recoveries),
        'high_turnover_rate': len(high_recoveries) / len(total_recoveries)
                              if len(total_recoveries) > 0 else 0
    }

Real-World Application: Research by Gegenpressing (2019) showed that possessions starting from high turnovers (ball won in the attacking third) produce shots with approximately 30% higher xG per shot than possessions starting from deep recoveries. This quantifies the tactical logic of high pressing: winning the ball higher up creates better scoring opportunities because the defense has less time and space to organize.

11.5.4 Counter-Pressing (Gegenpressing) Analytics

Counter-pressing measures immediate pressure after losing possession. The term "Gegenpressing" was popularized by Jurgen Klopp but the concept is central to many modern tactical systems. The idea is that the moment of possession loss is the moment of greatest defensive vulnerability for the opponent -- they have just transitioned from defense to attack and are not yet organized. Immediate pressing in this window can win the ball back before the opponent can counter-attack.

Key metrics for counter-pressing analysis:

Regain time: How quickly does the team win the ball back after losing it? Measured in seconds from possession loss to next ball recovery.

Regain location: Where does the team win the ball back relative to where they lost it? Regaining close to the loss location suggests effective counter-pressing; regaining far back suggests the team retreated.

Counter-press success rate: What proportion of possession losses lead to regaining the ball within a defined time window (typically 5-8 seconds)?

def analyze_counter_pressing(events_df, team_name, time_window=5):
    """
    Analyze counter-pressing behavior after possession loss.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    time_window : int
        Seconds after loss to measure pressing

    Returns
    -------
    dict
        Counter-pressing metrics
    """
    events = events_df.sort_values(['minute', 'second', 'index']).reset_index(drop=True)

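    # Prefer the provider's possession label when present: defensive events
    # (e.g. Pressure) carry the defending team's name in 'team', so comparing
    # 'team' alone would count them as possession losses.
    holder_col = 'possession_team' if 'possession_team' in events.columns else 'team'

    # Find possession losses
    losses = []
    for i in range(1, len(events)):
        current = events.iloc[i]
        previous = events.iloc[i-1]

        # Possession loss = team had ball, now opponent has it
        if previous[holder_col] == team_name and current[holder_col] != team_name: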
            losses.append({
                'index': i,
                'minute': previous['minute'],
                'second': previous.get('second', 0),
                'location': previous.get('location')
            })

    # Analyze pressing after each loss
    counter_press_success = 0
    counter_press_attempts = 0

    for loss in losses:
        loss_time = loss['minute'] * 60 + loss['second']

        # Look for pressing actions within time window
        for j in range(loss['index'], min(loss['index'] + 20, len(events))):
            event = events.iloc[j]
            event_time = event['minute'] * 60 + event.get('second', 0)

            if event_time - loss_time > time_window:
                break

            if event['team'] == team_name:
                if event['type'] in ['Pressure', 'Tackle', 'Interception']:
                    counter_press_attempts += 1

                if event['type'] in ['Ball Recovery', 'Interception']:
                    counter_press_success += 1
                    break

    return {
        'possession_losses': len(losses),
        'counter_press_attempts': counter_press_attempts,
        'counter_press_regains': counter_press_success,
        'counter_press_success_rate': counter_press_success / len(losses)
                                       if len(losses) > 0 else 0
    }

Advanced: The most sophisticated counter-pressing analysis measures not just whether the ball was regained but what happened next. Did the regain lead to a shot? How much xT was generated in the possession following the counter-press? This "counter-press value" metric connects the defensive action directly to attacking outcomes, providing a complete picture of the pressing team's return on investment.
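Regain time, listed among the key metrics above, is not returned by `analyze_counter_pressing`. A minimal sketch of it follows, assuming a simplified table with a `time` column in seconds and a `possession_team` column naming the side in possession at each event. The helper name `regain_times` is hypothetical.

```python
import pandas as pd


def regain_times(events, team):
    """Seconds from each possession loss to the next regain (hypothetical helper).

    events : DataFrame sorted by time, with columns 'time' (seconds)
             and 'possession_team'.
    """
    holder = events['possession_team'].to_numpy()
    times = events['time'].to_numpy()

    regains = []
    loss_time = None
    for i in range(1, len(holder)):
        lost = holder[i - 1] == team and holder[i] != team
        won_back = holder[i - 1] != team and holder[i] == team
        if lost:
            # Time of the opponent's first event in their new possession
            loss_time = times[i]
        elif won_back and loss_time is not None:
            regains.append(float(times[i] - loss_time))
            loss_time = None
    return regains


toy = pd.DataFrame({
    'time': [0, 3, 6, 8, 10, 20, 24, 40],
    'possession_team': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'A'],
})
times_a = regain_times(toy, 'A')
within_5s = sum(t <= 5 for t in times_a) / len(times_a)
```

The share of regain times under a chosen cutoff (5 seconds here) gives a counter-press success rate that does not depend on which defensive event types the data provider records.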

11.6 Transition Analysis

11.6.1 Attack-to-Defense Transitions

Transitions -- the moments when possession changes hands -- are among the most dangerous phases of play. When a team loses the ball, they are momentarily disorganized, and the opponent can exploit this vulnerability. The speed and quality of defensive transitions determine how effectively a team responds to losing possession.

Key metrics for defensive transitions include:

Recovery time: Seconds from possession loss to the team's first defensive action (pressure, tackle, or retreat to defensive shape).

Shape recovery distance: How far back the team's defensive line moves in the 10 seconds after losing possession. A team that maintains its shape has a small recovery distance; one that is caught out of position has a large recovery distance.

Counter-attack concession rate: What proportion of possession losses lead to the opponent creating a shot within 15 seconds?
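The counter-attack concession rate is the simplest of the three metrics above to compute once loss and shot timestamps are available. The sketch below assumes both are given as plain lists of match seconds; the function name is hypothetical.

```python
from bisect import bisect_left


def counter_attack_concession_rate(loss_times, opponent_shot_times, window=15.0):
    """Share of possession losses followed by an opponent shot within
    `window` seconds (hypothetical helper; times in match seconds)."""
    if not loss_times:
        return 0.0

    shots = sorted(opponent_shot_times)
    conceded = 0
    for t in loss_times:
        # First opponent shot at or after the loss
        i = bisect_left(shots, t)
        if i < len(shots) and shots[i] - t <= window:
            conceded += 1
    return conceded / len(loss_times)


# Four losses; only the first is punished with a shot inside 15 seconds
rate = counter_attack_concession_rate(
    loss_times=[100, 400, 900, 1500],
    opponent_shot_times=[110, 930, 1600],
)
```

Swapping the roles of the two teams (the team's own regains and its own shots) gives the fast-break frequency used for attacking transitions.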

11.6.2 Defense-to-Attack Transitions

When a team wins the ball, the speed and directness of their subsequent actions determine whether they can exploit the opponent's disorganization:

Transition speed: Meters progressed toward goal per second in the first 10 seconds after winning possession.

Transition directness: Proportion of actions in the first 10 seconds that move the ball forward (as opposed to sideways or backward).

Fast break frequency: How often does the team generate a shot within 15 seconds of winning possession?
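Transition speed and directness can be sketched from ball positions sampled after a regain. The `(t, x)` input format and the name `transition_profile` are assumptions for this example, and directness is approximated as the share of steps that move the ball forward.

```python
def transition_profile(samples, window=10.0):
    """Speed and forward share in the first `window` seconds after a regain
    (hypothetical helper).

    samples : list of (t, x) ball positions, where t is seconds since the
              regain and x increases toward the opponent's goal.
    """
    early = [(t, x) for t, x in samples if t <= window]
    if len(early) < 2:
        return {'speed': 0.0, 'forward_share': 0.0}

    elapsed = early[-1][0] - early[0][0]
    progressed = early[-1][1] - early[0][1]

    # Step-by-step change in x between consecutive samples
    steps = [b[1] - a[1] for a, b in zip(early, early[1:])]
    forward_share = sum(dx > 0 for dx in steps) / len(steps)

    return {
        'speed': progressed / elapsed if elapsed > 0 else 0.0,
        'forward_share': forward_share,
    }


# The sample at t=12 falls outside the 10-second window and is ignored
counter = transition_profile([(0, 30), (2, 45), (5, 70), (8, 95), (12, 100)])
```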

Intuition: Transitions are where the biggest mismatches occur in soccer. During settled possession, both teams are organized. During transitions, one team is momentarily exposed. The best counter-attacking teams in the world -- Leicester City in 2015-16, Real Madrid under Zidane, Inter Milan under Conte -- are masters of exploiting these momentary imbalances. Measuring transition speed and quality tells you how effectively a team capitalizes on these windows.

11.7 Game State Effects on Possession Patterns

11.7.1 How Scores Change Possession Behavior

Game state -- whether a team is winning, drawing, or losing -- has a profound and measurable effect on possession patterns. Understanding these effects is critical for interpreting possession statistics correctly.

When winning: Possession percentage can move in either direction. Some teams keep the ball to run down the clock, but studies of match status more often find that leading teams hold less of it as they sit deeper and invite pressure. The nature of their possession changes as well: average field position typically drops, and sequence directness decreases (they circulate more and attack less aggressively). Pressing intensity often decreases too, with PPDA rising.

When losing: Teams tend to increase pressing intensity (lower PPDA), push their average field position higher, and increase sequence directness. Centralization in the passing network often increases as the team funnels play through their most creative players in search of a goal.

When drawing: Behavior is closest to the team's "default" tactical setup, making drawing-state metrics the most representative of a team's intrinsic style.

Best Practice: When comparing possession metrics across teams or matches, always control for game state. A team that spent 60 minutes leading will show different possession patterns than one that spent 60 minutes trailing, regardless of their tactical philosophy. The cleanest comparisons use drawing-state data only, or weight metrics by game state proportionally.
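One possible reading of "weight metrics by game state proportionally" is to re-weight each team's per-state values to a common game-state mix, so that no team is flattered or penalized by how long it spent ahead or behind. The sketch below assumes the per-state values have already been computed; the function name and the equal-thirds reference mix are illustrative choices, not values from the chapter.

```python
def game_state_adjusted(metric_by_state, reference_share):
    """Re-weight a per-state metric to a common game-state mix
    (hypothetical helper).

    metric_by_state : dict mapping state -> metric value for this team
    reference_share : dict mapping state -> share of time in the
                      reference mix (e.g. a league-wide average)
    """
    # Use only states the team actually experienced
    states = [s for s in metric_by_state if s in reference_share]
    total = sum(reference_share[s] for s in states)
    if total == 0:
        return None
    return sum(metric_by_state[s] * reference_share[s] for s in states) / total


ppda_by_state = {'winning': 12.0, 'drawing': 9.0, 'losing': 7.0}
equal_mix = {'winning': 1 / 3, 'drawing': 1 / 3, 'losing': 1 / 3}
adjusted_ppda = game_state_adjusted(ppda_by_state, equal_mix)
```

Averaging ratios such as PPDA this way is a simplification. A stricter version would re-weight the underlying pass and defensive-action counts before dividing.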

11.7.2 Situational Possession Analysis

For advanced analysis, segment possession metrics by game state:

def analyze_possession_by_game_state(events_df, team_name):
    """
    Analyze possession patterns separately for each game state.

    Returns possession metrics when winning, drawing, and losing.
    """
    # Determine game state at each minute (simplified)
    goals = events_df[
        (events_df['type'] == 'Shot') &
        (events_df['shot_outcome'] == 'Goal')
    ].sort_values('minute')

    teams = events_df['team'].unique()
    opponent = [t for t in teams if t != team_name][0]

    score = {team_name: 0, opponent: 0}
    state_at_minute = {}

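    # Cover stoppage and extra time instead of stopping at a fixed minute
    last_minute = int(events_df['minute'].max())
    for minute in range(0, last_minute + 1):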
        goals_this_min = goals[goals['minute'] == minute]
        for _, g in goals_this_min.iterrows():
            score[g['team']] += 1

        if score[team_name] > score[opponent]:
            state_at_minute[minute] = 'winning'
        elif score[team_name] < score[opponent]:
            state_at_minute[minute] = 'losing'
        else:
            state_at_minute[minute] = 'drawing'

    # Segment events by game state
    results = {}
    for state in ['winning', 'drawing', 'losing']:
        state_minutes = [m for m, s in state_at_minute.items() if s == state]
        if not state_minutes:
            continue

        state_events = events_df[events_df['minute'].isin(state_minutes)]
        field_pos = calculate_field_position(state_events, team_name)

        if field_pos:
            results[state] = {
                'minutes': len(state_minutes),
                **field_pos
            }

    return results

11.8 Visualization

11.8.1 Possession Maps

import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch
import seaborn as sns

def plot_possession_heatmap(events_df, team_name, ax=None):
    """
    Create possession heatmap showing event density.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to visualize
    ax : matplotlib.axes, optional
        Axis to plot on

    Returns
    -------
    fig, ax
    """
    if ax is None:
        pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                     line_color='white')
        fig, ax = pitch.draw(figsize=(12, 8))
    else:
        fig = ax.figure
        pitch = Pitch(pitch_type='statsbomb')
        pitch.draw(ax=ax)

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    x_coords = []
    y_coords = []

    for loc in team_events['location']:
        if isinstance(loc, list) and len(loc) >= 2:
            x_coords.append(loc[0])
            y_coords.append(loc[1])

    # Hexbin heatmap
    hexbin = ax.hexbin(x_coords, y_coords, gridsize=15, cmap='YlOrRd',
                       alpha=0.7, mincnt=1, extent=[0, 120, 0, 80])

    ax.set_title(f'{team_name} Possession Heatmap', fontsize=14, color='white')

    return fig, ax


def plot_territorial_comparison(events_df, team1, team2, figsize=(14, 6)):
    """
    Compare territorial control between two teams.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team1, team2 : str
        Team names
    figsize : tuple
        Figure size

    Returns
    -------
    fig, axes
    """
    fig, axes = plt.subplots(1, 2, figsize=figsize)

    for ax, team, color in zip(axes, [team1, team2], ['Blues', 'Reds']):
        pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                     line_color='white')
        pitch.draw(ax=ax)

        team_events = events_df[
            (events_df['team'] == team) &
            (events_df['location'].notna())
        ]

        x_coords = []
        y_coords = []

        for loc in team_events['location']:
            if isinstance(loc, list):
                x_coords.append(loc[0])
                y_coords.append(loc[1])

        if x_coords:
            ax.hexbin(x_coords, y_coords, gridsize=12, cmap=color,
                     alpha=0.6, mincnt=1, extent=[0, 120, 0, 80])

            avg_x = np.mean(x_coords)
            ax.axvline(avg_x, color='white', linestyle='--', alpha=0.7)
            ax.text(avg_x + 2, 75, f'Avg: {avg_x:.1f}m', color='white', fontsize=10)

        ax.set_title(f'{team}', fontsize=12, color='white')

    plt.tight_layout()
    return fig, axes

11.8.2 Possession Flow Diagrams

def plot_possession_flow(events_df, team_name, ax=None):
    """
    Visualize possession flow between pitch thirds.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    ax : matplotlib.axes, optional
        Axis to plot on

    Returns
    -------
    fig, ax
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 6))
    else:
        fig = ax.figure

    # Count passes between zones
    team_passes = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'] == 'Pass') &
        (events_df['pass_outcome'].isna())
    ]

    zones = ['Defensive', 'Middle', 'Attacking']
    zone_bounds = [(0, 40), (40, 80), (80, 120)]

    def get_zone(x):
        for i, (low, high) in enumerate(zone_bounds):
            if low <= x < high:
                return zones[i]
        return zones[-1]

    # Count transitions
    flow = {(z1, z2): 0 for z1 in zones for z2 in zones}

    for _, p in team_passes.iterrows():
        start_loc = p.get('location')
        end_loc = p.get('pass_end_location')

        if not (isinstance(start_loc, list) and isinstance(end_loc, list)):
            continue

        start_zone = get_zone(start_loc[0])
        end_zone = get_zone(end_loc[0])
        flow[(start_zone, end_zone)] += 1

    # Create Sankey-like visualization
    zone_x = {z: i for i, z in enumerate(zones)}

    # flow always has nine keys, so guard on the maximum value instead
    max_flow = max(flow.values()) or 1

    for (start, end), count in flow.items():
        if count == 0:
            continue

        x1 = zone_x[start]
        x2 = zone_x[end]

        # Offset for different directions
        if x2 > x1:
            y_offset = 0.1
            color = 'green'
        elif x2 < x1:
            y_offset = -0.1
            color = 'red'
        else:
            y_offset = 0
            color = 'gray'

        width = count / max_flow * 5 + 0.5
        alpha = count / max_flow * 0.5 + 0.3

        ax.annotate('', xy=(x2, 0.5 + y_offset), xytext=(x1, 0.5 - y_offset),
                   arrowprops=dict(arrowstyle='->', color=color,
                                  lw=width, alpha=alpha,
                                  connectionstyle='arc3,rad=0.2'))

        # Label
        mid_x = (x1 + x2) / 2
        ax.text(mid_x, 0.5 + y_offset * 2, str(count), fontsize=9, ha='center')

    # Zone labels
    for zone in zones:
        x = zone_x[zone]
        ax.text(x, 0.5, zone, fontsize=12, ha='center', va='center',
               bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))

    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(0, 1)
    ax.axis('off')
    ax.set_title(f'{team_name} Passing Flow', fontsize=14)

    return fig, ax

11.8.3 Sequence Quality Dashboard

def plot_possession_dashboard(events_df, team_name, figsize=(14, 10)):
    """
    Create comprehensive possession analysis dashboard.

    Parameters
    ----------
    events_df : DataFrame
        Match events
    team_name : str
        Team to analyze
    figsize : tuple
        Figure size

    Returns
    -------
    fig : matplotlib.figure.Figure
    """
    fig = plt.figure(figsize=figsize)

    # Layout: 2x2 grid plus metrics panel
    ax1 = fig.add_subplot(2, 2, 1)  # Heatmap
    ax2 = fig.add_subplot(2, 2, 2)  # Sequence length distribution
    ax3 = fig.add_subplot(2, 2, 3)  # Flow diagram
    ax4 = fig.add_subplot(2, 2, 4)  # Metrics table

    # 1. Possession heatmap
    pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                 line_color='white')
    pitch.draw(ax=ax1)

    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    x_coords = [loc[0] for loc in team_events['location']
                if isinstance(loc, list)]
    y_coords = [loc[1] for loc in team_events['location']
                if isinstance(loc, list)]

    if x_coords:
        ax1.hexbin(x_coords, y_coords, gridsize=12, cmap='YlOrRd',
                  alpha=0.6, mincnt=1, extent=[0, 120, 0, 80])

    ax1.set_title('Possession Locations', fontsize=11, color='white')

    # 2. Sequence length distribution
    sequences = identify_possession_sequences(events_df, team_name)
    seq_lengths = [len(s) for s in sequences]

    # Guard against matches where no sequences were identified
    if seq_lengths:
        ax2.hist(seq_lengths, bins=range(1, max(seq_lengths) + 2),
                 color='steelblue', edgecolor='white', alpha=0.7)
    ax2.set_xlabel('Sequence Length (events)')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Possession Sequence Lengths', fontsize=11)

    # 3. Passing flow
    plot_possession_flow(events_df, team_name, ax=ax3)

    # 4. Metrics summary
    ax4.axis('off')

    # Calculate metrics
    field_pos = calculate_field_position(events_df, team_name)
    efficiency = PossessionEfficiencyAnalyzer(events_df, team_name).get_efficiency_metrics()

    metrics_text = f"""
    {team_name} Possession Metrics

    Total Possessions: {efficiency.get('n_possessions', 'N/A')}
    Average Sequence Length: {np.mean(seq_lengths):.1f} events
    Shot Rate: {efficiency.get('shot_rate', 0)*100:.1f}%

    Average Field Position: {field_pos.get('avg_x', 0):.1f}m
    Attacking Third: {field_pos.get('attacking_third', 0)*100:.1f}%
    Defensive Third: {field_pos.get('defensive_third', 0)*100:.1f}%

    Avg Progression: {efficiency.get('avg_progression', 0):.1f}m
    """

    ax4.text(0.1, 0.9, metrics_text, transform=ax4.transAxes,
            fontsize=11, verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

    plt.tight_layout()
    return fig

11.9 Practical Implementation

11.9.1 Complete Analysis Pipeline

class PossessionAnalyzer:
    """
    Comprehensive possession and territorial control analyzer.

    Provides unified interface for all possession metrics.
    """

    def __init__(self, events_df, team_name, opponent_name=None, xt_grid=None):
        """
        Initialize analyzer.

        Parameters
        ----------
        events_df : DataFrame
            Match events
        team_name : str
            Team to analyze
        opponent_name : str, optional
            Opponent team name
        xt_grid : ndarray, optional
            xT values for quality weighting
        """
        self.events_df = events_df
        self.team_name = team_name
        self.opponent_name = opponent_name or self._find_opponent()
        self.xt_grid = xt_grid

        # Build sequences
        self.sequences = identify_possession_sequences(events_df, team_name)

        # Calculate basic metrics
        self._calculate_metrics()

    def _find_opponent(self):
        """Find opponent team name."""
        teams = self.events_df['team'].unique()
        opponents = [t for t in teams if t != self.team_name]
        return opponents[0] if opponents else None

    def _calculate_metrics(self):
        """Calculate all possession metrics."""
        # Field position
        self.field_position = calculate_field_position(
            self.events_df, self.team_name
        )

        # Zone control
        self.zone_control = calculate_zone_control(
            self.events_df, self.team_name
        )

        # Possession efficiency
        self.efficiency = PossessionEfficiencyAnalyzer(
            self.events_df, self.team_name, self.xt_grid
        ).get_efficiency_metrics()

        # PPDA
        if self.opponent_name:
            self.ppda = calculate_ppda(self.events_df, self.team_name)
            self.high_turnovers = calculate_high_turnovers(
                self.events_df, self.team_name
            )
            self.counter_pressing = analyze_counter_pressing(
                self.events_df, self.team_name
            )

    def get_summary(self):
        """Get comprehensive summary of possession metrics."""
        summary = {
            'team': self.team_name,
            'n_possessions': len(self.sequences),
            **self.field_position,
            **self.efficiency
        }

        if hasattr(self, 'ppda'):
            summary['ppda'] = self.ppda
            summary.update(self.high_turnovers)
            summary.update(self.counter_pressing)

        return summary

    def compare_to_opponent(self):
        """Compare possession metrics to opponent."""
        if not self.opponent_name:
            return None

        opponent_analyzer = PossessionAnalyzer(
            self.events_df, self.opponent_name, self.team_name, self.xt_grid
        )

        comparison = pd.DataFrame({
            self.team_name: self.get_summary(),
            self.opponent_name: opponent_analyzer.get_summary()
        }).T

        return comparison

    def plot_dashboard(self, figsize=(14, 10)):
        """Create possession analysis dashboard."""
        return plot_possession_dashboard(
            self.events_df, self.team_name, figsize
        )

11.10 Applications and Case Studies

11.10.1 Style Identification

Possession and territorial metrics enable objective style classification:

  • Possession-dominant: High possession %, high field position, low PPDA
  • Counter-attacking: Low possession %, high efficiency, low field position
  • Pressing-intensive: Low PPDA, high counter-pressing success, high turnovers in attacking third
  • Deep-defending: High defensive third %, low attacking third %, high PPDA

These classifications are not mutually exclusive. A team can be both possession-dominant and pressing-intensive (like Guardiola's Barcelona or Manchester City), or counter-attacking and pressing-intensive (like Klopp's Liverpool in certain phases).
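A rule-based labeller makes the overlap between styles concrete. The sketch below is deliberately simplified: it uses only four inputs, and every cutoff is an illustrative assumption rather than a calibrated threshold. The function name `style_labels` is hypothetical.

```python
def style_labels(possession_pct, avg_x, ppda, xg_per_possession):
    """Return every style label a team qualifies for (hypothetical helper).

    All cutoffs below are illustrative assumptions, not calibrated values.
    avg_x is average field position on a 0-120 pitch scale.
    """
    labels = []
    if possession_pct >= 55 and avg_x >= 60 and ppda <= 10:
        labels.append('possession-dominant')
    if possession_pct <= 45 and avg_x <= 55 and xg_per_possession >= 0.08:
        labels.append('counter-attacking')
    if ppda <= 9:
        labels.append('pressing-intensive')
    if ppda >= 13 and avg_x <= 50:
        labels.append('deep-defending')
    return labels


high_press_side = style_labels(62, 65, 8.0, 0.07)
low_block_side = style_labels(40, 48, 14.5, 0.09)
```

Returning a list rather than a single label reflects the point above: one team can satisfy several style definitions at once. In practice the cutoffs would be set relative to league distributions, for example as percentiles.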

Real-World Application: Multi-dimensional style profiles enable more nuanced scouting. Rather than simply labeling a team as "possession-based," analysts can describe them precisely: "60% possession, 0.35 field tilt, 8.2 PPDA, 32% counter-press success, 0.08 xG per possession." This level of detail allows coaches to prepare specific tactical responses and recruitment departments to find stylistic matches.

11.10.2 Match Analysis

Combining possession metrics with outcome analysis:

def analyze_match_possession(match_id, xt_grid=None):
    """
    Complete possession analysis for a match.

    Parameters
    ----------
    match_id : int
        StatsBomb match ID
    xt_grid : ndarray, optional
        xT grid for quality weighting

    Returns
    -------
    dict
        Complete analysis results
    """
    # Assumes the StatsBomb client is imported: from statsbombpy import sb
    events = sb.events(match_id=match_id)
    teams = events['team'].unique()

    results = {}

    for team in teams:
        analyzer = PossessionAnalyzer(events, team, xt_grid=xt_grid)
        results[team] = analyzer.get_summary()

    # Add comparison
    results['comparison'] = pd.DataFrame(results).T

    return results

11.10.3 Possession Value Frameworks: Linking Possession to Expected Outcomes

The ultimate goal of possession analytics is to link possession patterns to expected outcomes -- goals scored and conceded. Several frameworks accomplish this:

xG chain: Each player's contribution to possessions that result in shots, weighted by the xG of those shots. This measures how involved a player is in the team's most threatening possessions.

Possession value added (PVA): The total xT generated by a player's actions during their possessions. This measures how much a player moves the ball into more threatening positions.

Expected possession value (EPV): A continuous model that assigns a value to each moment of possession based on the current ball location, nearby player positions, and game state. EPV requires tracking data but provides the most comprehensive picture of possession value.

These frameworks represent the frontier of possession analytics, connecting the "how much" and "where" of possession to the "so what" of expected outcomes. By combining the metrics from this chapter -- possession percentage, field tilt, dangerous possession, sequence characteristics, pressing intensity -- with outcome models like xG and xT, analysts can build complete pictures of how teams create and prevent goals through their control of the ball and territory.
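The first two frameworks need only event data and can be sketched in a few lines. The example below uses plain-Python event records rather than the StatsBomb DataFrame so that it runs standalone. The record fields (`possession`, `player`, `type`, `xg`, `xt_start`, `xt_end`, `completed`) are assumptions for illustration, and PVA is computed here as the summed xT difference over a player's completed passes and carries, which is one reading of "xT generated" rather than a canonical definition.

```python
from collections import defaultdict


def xg_chain(events):
    """Credit each player with the shot xG of every possession they took part in.

    Assumes `events` is already filtered to the team in possession.
    """
    poss_xg = defaultdict(float)     # possession id -> total shot xG
    poss_players = defaultdict(set)  # possession id -> players involved
    for e in events:
        poss_players[e["possession"]].add(e["player"])
        if e["type"] == "Shot":
            poss_xg[e["possession"]] += e.get("xg", 0.0)

    chain = defaultdict(float)
    for poss, players in poss_players.items():
        for player in players:
            # Each player is credited once per possession, however many touches
            chain[player] += poss_xg.get(poss, 0.0)
    return dict(chain)


def possession_value_added(events):
    """Sum xT gained (or lost) by each player's completed passes and carries."""
    pva = defaultdict(float)
    for e in events:
        if e["type"] in ("Pass", "Carry") and e.get("completed", True):
            pva[e["player"]] += e["xt_end"] - e["xt_start"]
    return dict(pva)


# Invented events: possession 1 ends in a shot, possession 2 breaks down
events = [
    {"possession": 1, "player": "A", "type": "Pass", "xt_start": 0.01, "xt_end": 0.03},
    {"possession": 1, "player": "B", "type": "Carry", "xt_start": 0.03, "xt_end": 0.08},
    {"possession": 1, "player": "C", "type": "Shot", "xg": 0.12},
    {"possession": 2, "player": "A", "type": "Pass", "xt_start": 0.02, "xt_end": 0.01},
    {"possession": 2, "player": "B", "type": "Pass", "xt_start": 0.01, "xt_end": 0.05,
     "completed": False},
]

print(xg_chain(events))
print(possession_value_added(events))
```

In this toy data all three players receive the same xG chain credit because they share the one possession that produced a shot, while PVA separates them: only players who actually moved the ball into more valuable zones accumulate value. EPV is omitted because it depends on tracking data.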

Advanced: The expected possession value (EPV) framework, pioneered by Fernandez, Bornn, and Cervone, assigns a real-time value to every moment of ball possession based on the probability of the current possession resulting in a goal. It subsumes many of the metrics in this chapter into a single unified model. While computationally intensive and dependent on tracking data, EPV represents the theoretical ideal toward which possession analytics is converging.

11.11 Summary

This chapter developed a comprehensive framework for analyzing possession and territorial control:

  1. Possession fundamentals: Ball vs. territorial possession, calculation methods, the possession paradox, sterile possession, and possession sequences
  2. Territorial metrics: Field position, field tilt, defensive line height, zone control, and spatial dominance models
  3. Possession value: xT-weighted possession, dangerous possession, and possession-adjusted metrics
  4. Efficiency metrics: Shot rate, xG per possession, sequence length, speed, directness, and quality distribution
  5. Pressing integration: PPDA, pressing intensity, high turnovers, counter-pressing analytics
  6. Transition analysis: Attack-to-defense and defense-to-attack transition metrics
  7. Game state effects: How winning, drawing, and losing alter possession patterns
  8. Visualization: Heatmaps, flow diagrams, dashboards, and territorial comparisons
  9. Possession value frameworks: xG chain, PVA, and EPV connecting possession to outcomes

The key insight is that possession quality matters more than quantity. A team with 40% possession but excellent efficiency can outperform a team with 60% possession but poor conversion. Modern analytics requires measuring not just how much of the ball a team has, but what they do with it and where they do it.

These metrics integrate naturally with passing networks (Chapter 10) and Expected Threat (Chapter 9) to provide a holistic understanding of team attacking play. Combined with defensive metrics (Part III), they enable complete tactical analysis.

References

  1. Mackay, N. (2017). Predicting goal probabilities for possessions in football. MIT Sloan Sports Analytics Conference.

  2. Fernandez, J., & Bornn, L. (2018). Wide open spaces: A statistical technique for measuring space creation in professional soccer. MIT Sloan Sports Analytics Conference.

  3. Spearman, W. (2018). Beyond expected goals. MIT Sloan Sports Analytics Conference.

  4. Trainor, C., & Chappas, G. (2019). Possession-based models for player and team behavior. StatsBomb Innovation in Football Conference.

  5. Power, P., et al. (2017). Not all passes are created equal: Objectively measuring the risk and reward of passes in soccer. KDD 2017.

  6. Collet, C. (2013). The possession game? A comparative analysis of ball retention and team success in European and international football. Journal of Sports Sciences, 31(2), 123-136.

  7. Fernandez, J., Bornn, L., & Cervone, D. (2021). A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions. Machine Learning, 110(6), 1389-1427.

  8. Robberechts, P., & Davis, J. (2020). Valuing on-the-ball actions in soccer: A critical comparison of xT and VAEP. AAAI Workshop on AI in Team Sports.