Learning Objectives

  • Explain why spatial ownership is central to modern tactical analysis
  • Construct and interpret Voronoi diagrams from tracking data
  • Implement a probabilistic pitch control model following the Fernandez-Bornn framework
  • Quantify space creation and exploitation at both the individual and team levels
  • Analyse off-ball movement to identify runs that create tactical advantages
  • Define and compute metrics for dangerous space identification
  • Apply spatial analytics to real match scenarios for tactical insight
  • Understand computational considerations for production-grade spatial analysis
  • Compare zone-based and continuous spatial models

Chapter 17: Spatial Analysis and Pitch Control

"Football is a game of spaces. Creating them, occupying them, denying them." --- Johan Cruyff


17.1 Introduction to Spatial Analytics

Association football is, at its core, a territorial contest. Two teams compete for 90 minutes to control, exploit, and deny regions of a 105 m x 68 m rectangle. Although match narratives have traditionally been told through on-ball events --- passes completed, shots taken, tackles won --- the vast majority of a player's time is spent without the ball. Tracking data, now captured at 25 Hz in most elite leagues, finally allows us to quantify what happens in those 97--99 % of frames where a player is simply occupying space.

Spatial analytics is the branch of soccer data science that converts raw positional information into actionable measures of territorial control and tactical structure. Its tools range from classical computational geometry (Voronoi tessellations, convex hulls, Delaunay triangulations) to modern probabilistic models (pitch control surfaces, influence functions, expected threat grids). Together they answer questions such as:

  • Which team controls more of the pitch at the moment of a key pass?
  • How much space does a striker create for team-mates by making a decoy run?
  • Where are the dangerous pockets of space that a defence consistently fails to cover?

This chapter provides a rigorous yet practical introduction to each of these topics. We will build models from first principles, implement them in Python, and apply them to realistic match scenarios.

17.1.1 Why Space Matters More Than Possession

Possession percentages remain one of the most quoted statistics in broadcast media, yet they carry surprisingly little predictive power. Research by Collet (2013) showed that raw possession explains less than 5 % of the variance in league points across the top five European leagues. What matters is not how long a team has the ball but where and how it uses the space available.

Consider two teams that each average 50 % possession. Team A circulates the ball in its own half, rarely breaking defensive lines. Team B consistently advances into the half-spaces and the area between the opposition's defensive and midfield lines. The spatial footprints of these two teams tell radically different stories despite identical possession figures.

Spatial analytics replaces scalar possession with a rich, field-level picture of territorial control. Instead of asking "who has the ball?" we ask "who controls which regions of the pitch, and how does that control evolve over time?"

Callout --- The Positional Play Revolution: Pep Guardiola's tactical philosophy of "juego de posicion" (positional play) is fundamentally a spatial philosophy. Its core principle is that the optimal pass is not always forward---it is the pass that moves the ball to the zone where the team has the most spatial advantage. This philosophy has driven demand for spatial analytics tools that can quantify which zones a team controls and how that control changes with each action. Clubs employing positional play systems were among the earliest adopters of pitch control models, because these models directly formalize the concept that underpins their tactical identity.

17.1.2 The Tracking-Data Revolution

Before the widespread deployment of optical and GPS tracking systems, spatial analysis was limited to manual notation and aggregated heat-maps. Modern tracking data (from providers such as Second Spectrum, Hawk-Eye, Signality, and SkillCorner) delivers $(x, y)$ coordinates for every player and the ball, typically at 25 frames per second. A single 90-minute match therefore produces roughly:

$$ N_{\text{frames}} = 90 \times 60 \times 25 = 135{,}000 \text{ frames} $$

Each frame contains positions for the 22 players on the pitch (20 outfield players and two goalkeepers), up to 3 substitutes warming up, and the ball, yielding millions of data points per match. This torrent of information is both an opportunity and a challenge: opportunity because it captures the full spatiotemporal structure of the game; challenge because naive analyses can drown in noise.
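As a quick sanity check on these volumes, the sketch below counts coordinate values for one match. It assumes that only the 22 players and the ball are tracked and that each object contributes a single $(x, y)$ pair per frame; substitutes and any provider-specific extra fields are ignored.

```python
# Rough data-volume estimate for one match of tracking data.
# Assumptions (not taken from any provider specification): 22 players plus
# the ball are tracked, and each object yields one (x, y) pair per frame.
MATCH_MINUTES = 90
FRAME_RATE_HZ = 25
TRACKED_OBJECTS = 22 + 1   # players + ball
COORDS_PER_OBJECT = 2      # x and y

n_frames = MATCH_MINUTES * 60 * FRAME_RATE_HZ
n_values = n_frames * TRACKED_OBJECTS * COORDS_PER_OBJECT

print(f"frames per match:  {n_frames:,}")
print(f"coordinate values: {n_values:,}")
```

Under these assumptions a match yields a little over six million coordinate values before stoppage time is counted.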

The models in this chapter are designed to compress tracking data into interpretable, decision-relevant summaries.

Beyond optical tracking, broadcast-derived tracking (from providers like SkillCorner and Metrica Sports) uses computer vision to extract player positions from standard broadcast footage. While less precise than optical tracking (typical positional error of 0.5-1.0 m vs. 0.1-0.3 m for optical systems), broadcast-derived tracking has the enormous advantage of availability: it can be generated for any match with broadcast footage, spanning leagues and competitions where optical tracking is not installed. This democratization of tracking data is expanding the reach of spatial analytics beyond the elite leagues.

17.1.3 A Roadmap of the Chapter

| Section | Topic | Key Output |
|---|---|---|
| 17.2 | Voronoi Diagrams | Tessellation of the pitch into dominant regions |
| 17.3 | Pitch Control Models | Probabilistic surface of territorial control |
| 17.4 | Space Creation & Exploitation | Metrics for how players generate and use space |
| 17.5 | Off-Ball Movement Analysis | Quantifying runs, decoys, and rotations |
| 17.6 | Dangerous Space Identification | Locating high-value regions the defence fails to cover |
| 17.7 | Defensive Shape and Spatial Compactness | Measuring team structure without the ball |
| 17.8 | Zone-Based vs. Continuous Spatial Models | Comparing the two major paradigms |
| 17.9 | Tactical Applications | Putting it all together for match and recruitment analysis |
| 17.10 | Computational Considerations | Scalability and production deployment |
| 17.11 | Future Directions | Where spatial analytics is heading |

17.2 Voronoi Diagrams in Soccer

17.2.1 Definition and Geometric Foundations

Given a set of $n$ points (sites) $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ in the Euclidean plane, the Voronoi diagram partitions the plane into $n$ convex regions $\{V_1, V_2, \ldots, V_n\}$ such that every point in $V_i$ is closer to $p_i$ than to any other site:

$$ V_i = \bigl\{ x \in \mathbb{R}^2 \;\big|\; \|x - p_i\| \le \|x - p_j\| \;\;\forall\, j \ne i \bigr\} $$

In a soccer context the sites are the players on the pitch: all 22, or only the 20 outfield players when goalkeepers are excluded or treated separately. The regions represent the dominant area of each player, that is, the portion of the pitch to which that player is the nearest.

Callout --- Historical Note

Voronoi diagrams are named after the Ukrainian mathematician Georgy Voronoi (1908), but the concept appeared independently in the work of Dirichlet (1850) and Descartes (1644). In soccer analytics they were first popularised by Taki and Hasegawa (2000) and later refined by Kim (2004).

The dual of the Voronoi diagram is the Delaunay triangulation, which connects sites whose Voronoi cells share an edge. Delaunay edges are useful for identifying passing lanes because they link pairs of players who are geometrically adjacent.
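To make the duality concrete, the following sketch extracts the Delaunay edges for a handful of positions using `scipy.spatial.Delaunay`. The helper name `delaunay_neighbours` and the five toy positions are illustrative assumptions, not part of any established pipeline.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_neighbours(positions: np.ndarray) -> set[tuple[int, int]]:
    """Return index pairs (i, j), i < j, of players joined by a Delaunay edge."""
    tri = Delaunay(positions)
    edges = set()
    for simplex in tri.simplices:            # each simplex is a triangle of 3 indices
        for a in range(3):
            i, j = sorted((int(simplex[a]), int(simplex[(a + 1) % 3])))
            edges.add((i, j))
    return edges

# Toy layout: four players on the corners of a square and one in the middle
positions = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0],
                      [0.0, 10.0], [5.0, 5.0]])
edges = delaunay_neighbours(positions)
print(sorted(edges))
```

In this layout the central player is adjacent to all four corners, while the two diagonals are not edges because the central player sits between those corners. The triangulation is computed on the unbounded plane, so pairs that are adjacent only far outside the pitch may still appear as edges.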

17.2.2 Computing Voronoi Diagrams

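SciPy's scipy.spatial.Voronoi computes the diagram with the Qhull library. (Fortune's sweep-line algorithm is the classic $O(n \log n)$ method, but it is not what SciPy uses.) With at most 22 sites per frame, the cost of a single diagram is negligible: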

import numpy as np
from scipy.spatial import Voronoi

# Example: 22 player positions (x, y) in metres
positions = np.random.uniform(low=[0, 0], high=[105, 68], size=(22, 2))
vor = Voronoi(positions)

The resulting object contains:

  • vor.vertices --- coordinates of Voronoi vertices.
  • vor.regions --- lists of vertex indices forming each cell.
  • vor.ridge_vertices --- pairs of vertices forming each ridge (edge).
  • vor.point_region --- mapping from input point index to region index.

Because the pitch is a bounded rectangle, unbounded Voronoi cells must be clipped to $[0, 105] \times [0, 68]$. We demonstrate clipping in Code Example 01.

from shapely.geometry import Polygon, box
from scipy.spatial import Voronoi
import numpy as np

def clipped_voronoi(positions: np.ndarray,
                    pitch_length: float = 105.0,
                    pitch_width: float = 68.0) -> list[Polygon]:
    """Compute Voronoi cells clipped to the pitch boundary.

    Args:
        positions: (n, 2) array of player positions.
        pitch_length: Length of the pitch in metres.
        pitch_width: Width of the pitch in metres.

    Returns:
        List of Shapely Polygon objects, one per player.
    """
    # Add mirror points outside the pitch to handle boundary cells
    mirror_points = []
    for p in positions:
        mirror_points.append([-p[0], p[1]])            # left mirror
        mirror_points.append([2 * pitch_length - p[0], p[1]])  # right mirror
        mirror_points.append([p[0], -p[1]])             # bottom mirror
        mirror_points.append([p[0], 2 * pitch_width - p[1]])   # top mirror

    all_points = np.vstack([positions, mirror_points])
    vor = Voronoi(all_points)
    pitch_box = box(0, 0, pitch_length, pitch_width)

    cells = []
    for i in range(len(positions)):
        region_idx = vor.point_region[i]
        region = vor.regions[region_idx]
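        if -1 in region or len(region) < 3:
            # Not expected once the mirror points are added; return an empty
            # cell rather than wrongly assigning the whole pitch to one player
            cells.append(Polygon())
        else:
            # Voronoi cells are convex, so taking the convex hull makes the
            # result independent of the order in which vertices are listed
            polygon = Polygon(vor.vertices[region]).convex_hull
            cells.append(polygon.intersection(pitch_box))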
    return cells

17.2.3 Interpreting Dominant Regions

The area of a player's Voronoi cell --- the dominant region area --- measures how much space that player "owns" at a given instant. Large cells for defenders may indicate a stretched back line; large cells for attackers may suggest isolation.

Aggregated across a match, dominant-region statistics yield insights such as:

  • Team compactness: The Voronoi areas of all players on the pitch always sum to the full pitch area ($105 \times 68 = 7{,}140$ m$^2$), so one team's total is simply the complement of the other's and varies from frame to frame. The variance of the areas within a team reveals how compact or spread-out that team is.
  • Defensive exposure: If a centre-back's Voronoi cell extends deep into the half-space, it may indicate that a full-back has pushed too high.
  • Pressing intensity: During a high press, the pressing team's forward players will have small Voronoi cells in the opponent's defensive third, indicating spatial congestion near the ball.

Callout --- Voronoi Areas as a Pressing Diagnostic: During a high press, we expect the pressing team's forward players to have small, tightly clustered Voronoi cells in the opponent's defensive third, and the opponent's defenders to have shrinking cells because they are being enclosed. The ratio of the pressing players' mean Voronoi area to the pressed defenders' mean Voronoi area within the pressing zone gives a simple, interpretable measure of pressing intensity. As a rule of thumb, values below 0.7 (the pressers' cells are at least 30% smaller) indicate an effective press, while values above 1.2 indicate a disjointed press in which the pressing players are too spread out.
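The area statistics discussed above can be approximated without any polygon clipping by assigning each cell of a fine grid to its nearest player. The sketch below does this with NumPy only. The function names, the 0.5 m cell size, the random positions, and the choice of which player indices count as "pressers" and "pressed" are all illustrative assumptions; the ratio is also computed on whole cells rather than being restricted to a pressing zone.

```python
import numpy as np

def dominant_region_areas(positions, pitch_length=105.0, pitch_width=68.0, cell=0.5):
    """Approximate Voronoi areas (m^2) by assigning grid cells to the nearest player."""
    xs = np.arange(cell / 2, pitch_length, cell)
    ys = np.arange(cell / 2, pitch_width, cell)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel()])          # (G, 2) cell centres
    d2 = ((grid[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)                                  # nearest player per cell
    return np.bincount(owner, minlength=len(positions)) * cell ** 2

def pressing_ratio(areas, pressers, pressed):
    """Mean cell area of the pressing players divided by that of the pressed players."""
    return areas[pressers].mean() / areas[pressed].mean()

rng = np.random.default_rng(0)
positions = rng.uniform([0, 0], [105, 68], size=(22, 2))      # rows 0-10: team A
areas = dominant_region_areas(positions)
team_a_share = areas[:11].sum() / areas.sum()                  # varies frame to frame
spread_a = areas[:11].var()                                    # compactness proxy
ratio = pressing_ratio(areas, pressers=[0, 1, 2], pressed=[11, 12, 13])
```

The grid approach trades a small discretisation error for simplicity; halving `cell` quadruples the number of grid points, so the exact clipped polygons remain preferable when many frames must be processed at high precision.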

17.2.4 Limitations of Voronoi Models

The classic Voronoi diagram assigns space based solely on Euclidean distance. This ignores:

  1. Player velocity: A player sprinting toward a point may arrive before a closer but stationary opponent.
  2. Orientation and body shape: A player facing the ball can react faster than one facing away.
  3. Physical capabilities: Differences in maximum speed and acceleration.

These limitations motivate the weighted and probabilistic models introduced in Section 17.3.

17.2.5 Weighted Voronoi Diagrams

An intermediate step between the naive Voronoi and full probabilistic pitch control is the weighted Voronoi diagram (also called a power diagram or Laguerre diagram). Here, each player is assigned a weight $w_i$ based on their velocity, and the dominant region is defined as:

$$ V_i^w = \bigl\{ x \in \mathbb{R}^2 \;\big|\; \|x - p_i\|^2 - w_i \le \|x - p_j\|^2 - w_j \;\;\forall\, j \ne i \bigr\} $$

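A natural choice for the weight is an increasing function of the player's speed, so that faster players claim larger regions. Note that a power diagram assigns one constant weight per player. Making the weight depend on the direction to the target point, so that a player's region expands in the direction of travel, is a useful extension, but the resulting partition is no longer a power diagram with straight-edged cells.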

This approach is computationally simpler than full pitch control models while capturing the most important factor that the naive Voronoi ignores: velocity.
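A minimal sketch of the weighted assignment rule follows, evaluated along the line between two players. The weight used here (the squared distance a player would cover in a 1 s look-ahead at current speed) is a hypothetical choice made purely for illustration; the text does not prescribe a particular weight function.

```python
import numpy as np

def power_diagram_owner(grid: np.ndarray, positions: np.ndarray,
                        weights: np.ndarray) -> np.ndarray:
    """Index of the player minimising ||x - p_i||^2 - w_i at each grid point."""
    d2 = ((grid[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    return (d2 - weights[None, :]).argmin(axis=1)

# Two players 20 m apart; the second is moving at 6 m/s, the first is stationary.
positions = np.array([[40.0, 34.0], [60.0, 34.0]])
speeds = np.array([0.0, 6.0])
weights = (speeds * 1.0) ** 2          # hypothetical: (speed x 1 s look-ahead)^2

# Sample the straight line between the two players every 0.1 m
xs = np.linspace(40.0, 60.0, 201)
grid = np.column_stack([xs, np.full_like(xs, 34.0)])

owner_weighted = power_diagram_owner(grid, positions, weights)
owner_plain = power_diagram_owner(grid, positions, np.zeros(2))
boundary_weighted = xs[np.argmax(owner_weighted == 1)]   # first point owned by player 1
boundary_plain = xs[np.argmax(owner_plain == 1)]
```

Solving $(x - 40)^2 = (x - 60)^2 - 36$ gives a boundary at $x = 49.1$, so the moving player gains roughly 0.9 m of territory along this line compared with the unweighted midpoint at $x = 50$.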


17.3 Pitch Control Models

17.3.1 From Voronoi to Probability

A Voronoi diagram assigns every point on the pitch to exactly one player with probability 1. Reality is more nuanced: two players of comparable speed and proximity share a contested zone. Pitch control models replace the binary assignment with a continuous probability surface $\mathrm{PC}(x, y) \in [0, 1]$, where values near 1 indicate strong control by the home team and values near 0 indicate strong control by the away team.

17.3.2 The Fernandez--Bornn Framework

Fernandez and Bornn (2018) introduced one of the most influential pitch control models. The core idea is to define an influence function $I_i(x)$ for each player $i$, then aggregate influences across teams.

17.3.2.1 Influence Function

For player $i$ at position $p_i$ with velocity $v_i$, the influence at a target point $x$ is modelled as a bivariate Gaussian centred not at $p_i$ but at a predicted future position that accounts for momentum:

$$ \mu_i = p_i + v_i \cdot \Delta t $$

where $\Delta t$ is a short look-ahead time (typically 0.5--1.5 s). The covariance matrix $\Sigma_i$ is elongated along the velocity direction to reflect the fact that a player can cover more ground in the direction they are already moving:

$$ \Sigma_i = R(\theta_i) \begin{pmatrix} \sigma_{\parallel}^2 & 0 \\ 0 & \sigma_{\perp}^2 \end{pmatrix} R(\theta_i)^\top $$

where $R(\theta_i)$ is the rotation matrix aligning the major axis with $v_i$, $\sigma_{\parallel}$ is the spread along the velocity direction, and $\sigma_{\perp}$ is the spread perpendicular to it.

The influence value is:

$$ I_i(x) = \frac{1}{2\pi |\Sigma_i|^{1/2}} \exp\!\Bigl(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\Bigr) $$

17.3.2.2 Team Influence and Pitch Control

The total influence of team $A$ at point $x$ is:

$$ I_A(x) = \sum_{i \in A} I_i(x) $$

Pitch control for team $A$ is then:

$$ \mathrm{PC}_A(x) = \frac{I_A(x)}{I_A(x) + I_B(x)} $$

This normalisation ensures $\mathrm{PC}_A(x) + \mathrm{PC}_B(x) = 1$ everywhere.

Callout --- Model Parameters

Typical parameter ranges used in the literature:

| Parameter | Symbol | Range |
|---|---|---|
| Look-ahead time | $\Delta t$ | 0.5--1.5 s |
| Parallel spread | $\sigma_{\parallel}$ | 5--15 m |
| Perpendicular spread | $\sigma_{\perp}$ | 3--8 m |
| Speed scaling | n/a | Linear with $\|v_i\|$ |

The optimal parameter values depend on the tracking data source and the specific application. Parameters should be tuned using validation data (e.g., predicting which team wins loose balls or receives passes in contested zones).

17.3.2.3 Implementation Details

A practical implementation of the Fernandez-Bornn model requires careful handling of several edge cases:

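  • Stationary players: When $\|v_i\| \approx 0$, the velocity direction is undefined or dominated by tracking noise, so $\theta_i$ and hence $R(\theta_i)$ cannot be estimated reliably. The usual remedy is to make the covariance isotropic ($\sigma_{\parallel} = \sigma_{\perp}$, a circular influence), in which case the rotation has no effect. If an elongated shape is kept at low speed, a default orientation (e.g., facing the ball) should be used instead.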
  • Goalkeepers: Goalkeepers have a different influence profile than outfield players because their movement is constrained to a smaller area but they have the advantage of using their hands. Many implementations exclude goalkeepers from the pitch control calculation or apply a separate influence function within the penalty area.
  • Ball position: The influence function can be modulated by proximity to the ball. Players near the ball should have higher influence because they are more likely to be actively contesting for possession.

import numpy as np

def compute_influence(player_pos: np.ndarray, player_vel: np.ndarray,
                      target: np.ndarray, dt: float = 0.7,
                      sigma_par: float = 10.0, sigma_perp: float = 5.0
                      ) -> float:
    """Compute Fernandez-Bornn influence of a player at a target point.

    Args:
        player_pos: (2,) array of player position.
        player_vel: (2,) array of player velocity.
        target: (2,) array of the target point.
        dt: Look-ahead time in seconds.
        sigma_par: Spread along velocity direction (metres).
        sigma_perp: Spread perpendicular to velocity (metres).

    Returns:
        Influence value (unnormalised density).
    """
    speed = np.linalg.norm(player_vel)
    mu = player_pos + player_vel * dt

    if speed > 0.5:
        theta = np.arctan2(player_vel[1], player_vel[0])
        # Scale spreads by speed
        s_par = sigma_par * (1 + speed / 10.0)
        s_perp = sigma_perp
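    else:
        # Near-stationary: the direction is unreliable, so use an isotropic
        # (circular) influence; theta then has no effect on Sigma.
        # Using sigma_perp as the common spread is a modelling assumption.
        theta = 0.0
        s_par = sigma_perp
        s_perp = sigma_perp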

    cos_t, sin_t = np.cos(theta), np.sin(theta)
    R = np.array([[cos_t, -sin_t], [sin_t, cos_t]])
    D = np.diag([s_par ** 2, s_perp ** 2])
    Sigma = R @ D @ R.T

    diff = target - mu
    Sigma_inv = np.linalg.inv(Sigma)
    exponent = -0.5 * diff @ Sigma_inv @ diff
    det = np.linalg.det(Sigma)
    return np.exp(exponent) / (2 * np.pi * np.sqrt(det))

17.3.3 Spearman's Time-to-Intercept Model

An alternative formulation, developed by William Spearman (2017) and used at Liverpool FC, replaces the Gaussian influence with a physical model of player motion. For each point $x$ on the pitch, the model computes the time to intercept $\tau_i(x)$ for every player $i$:

$$ \tau_i(x) = \frac{-v_{i,\parallel} + \sqrt{v_{i,\parallel}^2 + 2a_{\max}\|x - p_i\|}}{a_{\max}} $$

where $v_{i,\parallel}$ is the component of velocity toward $x$ and $a_{\max}$ is maximum acceleration. Pitch control is then computed by comparing the earliest arrival time of each team, possibly with a logistic or sigmoid transition:

$$ \mathrm{PC}_A(x) = \sigma\!\bigl(\kappa\,[\tau_{B,\min}(x) - \tau_{A,\min}(x)]\bigr) $$

where $\sigma$ is the logistic function and $\kappa$ controls the sharpness of the transition.
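The two formulas above translate almost directly into code. The sketch below is a simplified reading of them, not Spearman's full model: it uses the constant-acceleration arrival time exactly as written, with no reaction time or top-speed cap. The values $a_{\max} = 7$ m/s$^2$ and $\kappa = 2$ s$^{-1}$ are illustrative assumptions, as are the function names.

```python
import numpy as np

def time_to_intercept(pos, vel, target, a_max=7.0):
    """Constant-acceleration arrival time, following the formula in the text."""
    diff = target - pos
    dist = np.linalg.norm(diff)
    if dist == 0.0:
        return 0.0
    v_par = float(vel @ diff) / dist           # velocity component toward the target
    return (-v_par + np.sqrt(v_par ** 2 + 2.0 * a_max * dist)) / a_max

def pitch_control_tti(team_a, team_b, target, kappa=2.0, a_max=7.0):
    """Logistic comparison of each team's earliest arrival time at the target."""
    tau_a = min(time_to_intercept(p, v, target, a_max) for p, v in team_a)
    tau_b = min(time_to_intercept(p, v, target, a_max) for p, v in team_b)
    return 1.0 / (1.0 + np.exp(-kappa * (tau_b - tau_a)))

target = np.array([60.0, 34.0])
attacker = (np.array([46.0, 34.0]), np.array([5.0, 0.0]))   # 14 m away, running at the target
defender = (np.array([74.0, 34.0]), np.array([0.0, 0.0]))   # 14 m away, standing still
pc = pitch_control_tti([attacker], [defender], target)
print(f"PC_A at target: {pc:.2f}")
```

Both players are equidistant from the target, so a Voronoi model would call the point a tie; the arrival-time model favours the attacker because of the existing momentum.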

Callout --- Comparing the Two Models: The Fernandez-Bornn and Spearman models represent two philosophically different approaches. The Fernandez-Bornn model is statistical: it defines influence through probability density functions and aggregates them. Spearman's model is physical: it computes actual arrival times based on kinematic equations. In practice, both produce similar pitch control surfaces for most game situations. They diverge most when players are moving at high speed in opposite directions, where the physics-based model tends to be more accurate because it properly accounts for deceleration and direction-change costs. The choice between them often comes down to computational preference and the level of physics detail desired.

17.3.4 Grid Discretisation

In practice, the pitch is discretised into a grid of cells (e.g., $105 \times 68$ one-metre cells or a coarser $53 \times 34$ two-metre grid). The pitch control value is computed at each cell centre, yielding a matrix that can be visualised as a heat-map.

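import numpy as np

# Evaluate pitch control at the centres of a 53 x 34 grid (cells of about 2 m).
# team_A and team_B are lists of (position, velocity) pairs of (2,) arrays;
# compute_influence is the function defined in Section 17.3.2.3.
nx, ny = 53, 34
grid_x = (np.arange(nx) + 0.5) * 105.0 / nx   # cell centres along the length
grid_y = (np.arange(ny) + 0.5) * 68.0 / ny    # cell centres across the width
XX, YY = np.meshgrid(grid_x, grid_y)
target_points = np.column_stack([XX.ravel(), YY.ravel()])

PC = np.full(len(target_points), 0.5)         # 0.5 where neither team has influence
for idx, pt in enumerate(target_points):
    inf_A = sum(compute_influence(pos, vel, pt) for pos, vel in team_A)
    inf_B = sum(compute_influence(pos, vel, pt) for pos, vel in team_B)
    total = inf_A + inf_B
    if total > 0:
        PC[idx] = inf_A / total

PC = PC.reshape(XX.shape)                     # (34, 53) matrix for the heat-map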

The resolution is a trade-off between computational cost and visual fidelity. For real-time applications, 2-metre grids are common; for offline analysis, 1-metre or sub-metre grids are feasible.
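Because the per-point loop is slow in pure Python, a vectorised evaluation is usually preferable in practice. The following self-contained sketch evaluates each player's Gaussian over the whole grid at once. It follows the influence function described in Section 17.3.2, with near-stationary players treated as isotropic; the function names, the 0.5 m/s speed threshold, and the default parameters are assumptions carried over for illustration.

```python
import numpy as np

def team_influence(pos, vel, targets, dt=0.7, sigma_par=10.0, sigma_perp=5.0):
    """Summed Gaussian influence of one team at every target point.

    pos, vel: (n, 2) arrays; targets: (G, 2) array. Returns a (G,) array.
    """
    total = np.zeros(len(targets))
    for p, v in zip(pos, vel):
        speed = np.linalg.norm(v)
        if speed > 0.5:
            theta = np.arctan2(v[1], v[0])
            s_par = sigma_par * (1 + speed / 10.0)
        else:                                   # near-stationary: isotropic influence
            theta, s_par = 0.0, sigma_perp
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        sigma_inv = R @ np.diag([1 / s_par ** 2, 1 / sigma_perp ** 2]) @ R.T
        diff = targets - (p + v * dt)           # offset from the look-ahead position
        expo = -0.5 * np.einsum("gi,ij,gj->g", diff, sigma_inv, diff)
        total += np.exp(expo) / (2 * np.pi * s_par * sigma_perp)
    return total

def pitch_control_grid(pos_a, vel_a, pos_b, vel_b, nx=53, ny=34,
                       pitch_length=105.0, pitch_width=68.0):
    """Pitch control for team A at the centres of an nx x ny grid, shape (ny, nx)."""
    xs = (np.arange(nx) + 0.5) * pitch_length / nx
    ys = (np.arange(ny) + 0.5) * pitch_width / ny
    XX, YY = np.meshgrid(xs, ys)
    targets = np.column_stack([XX.ravel(), YY.ravel()])
    inf_a = team_influence(pos_a, vel_a, targets)
    inf_b = team_influence(pos_b, vel_b, targets)
    total = inf_a + inf_b
    pc = np.full(len(targets), 0.5)             # undecided where both influences vanish
    np.divide(inf_a, total, out=pc, where=total > 0)
    return pc.reshape(ny, nx)

# One stationary player per team, placed symmetrically about the halfway line
pc = pitch_control_grid(np.array([[30.0, 34.0]]), np.zeros((1, 2)),
                        np.array([[75.0, 34.0]]), np.zeros((1, 2)))
```

The loop now runs over players rather than grid points, so the cost per frame is a few dozen array operations regardless of grid resolution.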

17.3.5 Interpreting Pitch Control Surfaces

A pitch control surface is typically rendered as a colour map where red indicates home-team control and blue indicates away-team control (or vice versa). Key visual patterns include:

  • High press: The attacking team's pitch control extends deep into the opponent's defensive third.
  • Low block: A defending team has a dense band of control in front of its own penalty area.
  • Transition moment: Immediately after a turnover, the pitch control surface is in flux, with the formerly attacking team's influence still extended forward.
  • Overloads: Asymmetric pitch control on one flank, indicating a numerical advantage.

When presenting pitch control surfaces to coaches and players, it is helpful to annotate them with key tactical features: arrows showing the direction of potential passes into open space, circles highlighting zones of contested control, and labels identifying the players whose influence dominates each region. Raw heat maps, while visually striking, are often too abstract for non-technical audiences without these annotations.

17.3.6 Validation and Calibration

A pitch control model should be validated against observed outcomes. One approach is to evaluate how well the model predicts which team wins a loose ball or a 50/50 contest. Fernandez and Bornn report an area under the ROC curve (AUC) of approximately 0.78 for their model, compared with 0.73 for the naive Voronoi baseline.

Calibration can also be assessed by grouping all grid cells by predicted pitch control decile and comparing the predicted control probabilities with the empirical frequency of the team successfully playing the ball to those locations.
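A reliability table of this kind takes only a few lines to compute. In the sketch below, "decile" is interpreted as ten equal-width bins of predicted control, which is one reasonable reading; quantile-based bins would work equally well. The data are synthetic and perfectly calibrated by construction, so the example only demonstrates the mechanics, not the performance of any real model.

```python
import numpy as np

def calibration_table(predicted: np.ndarray, outcome: np.ndarray, n_bins: int = 10):
    """Mean predicted control vs. empirical success rate per equal-width bin.

    Returns a list of (bin_low, bin_high, mean_predicted, empirical_rate, count);
    empty bins are skipped.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitising against the interior edges yields bin indices 0 .. n_bins - 1
    idx = np.digitize(predicted, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(edges[b]), float(edges[b + 1]),
                         float(predicted[mask].mean()),
                         float(outcome[mask].mean()), int(mask.sum())))
    return rows

# Synthetic, perfectly calibrated data: success occurs with probability `predicted`
rng = np.random.default_rng(1)
predicted = rng.uniform(size=20_000)
outcome = (rng.uniform(size=20_000) < predicted).astype(float)

rows = calibration_table(predicted, outcome)
for low, high, mean_pred, rate, count in rows:
    print(f"[{low:.1f}, {high:.1f})  predicted={mean_pred:.3f}  observed={rate:.3f}  n={count}")
```

For a well-calibrated model the predicted and observed columns track each other closely; systematic gaps in particular bins point to where the influence parameters need retuning.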

Additional validation approaches include:

  • Pass completion prediction: A pass into a zone with high pitch control for the passing team should succeed more often than a pass into a contested zone. If the model is well-calibrated, pass completion rate should increase monotonically with pitch control at the receiving location.
  • Shot quality prediction: Shots taken from zones of high pitch control should, on average, be higher quality (higher xG) because the shooter has more time and space. This correlation provides an indirect validation of the model.
  • Consistency across matches: The overall pitch control balance (total area controlled by each team) should correlate with possession statistics, with pitch control being a more informative measure.

17.4 Space Creation and Exploitation

17.4.1 Defining Space Creation

Space creation occurs when a player's movement increases the area or quality of space available to a team-mate. The most common mechanism is a pull-away run: an attacker drags a defender out of position, opening a pocket for a team-mate to exploit.

We quantify space creation using the delta dominant-region area method:

$$ \Delta A_j(t) = A_j^{\text{actual}}(t) - A_j^{\text{counterfactual}}(t) $$

where $A_j^{\text{actual}}(t)$ is team-mate $j$'s Voronoi area at time $t$ and $A_j^{\text{counterfactual}}(t)$ is what $j$'s area would be if the space-creating player $i$ had remained stationary since time $t_0$. The counterfactual is computed by freezing player $i$'s position while letting all other players move as observed.
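The counterfactual can be sketched with a grid-based area estimate: compute the team-mate's area once with the observed positions and once with the creator moved back to the position held at $t_0$. The three-player scenario, the helper names, and the 0.5 m grid below are illustrative assumptions; with so few players the absolute areas are unrealistically large, but the sign and mechanics of $\Delta A_j$ are the point.

```python
import numpy as np

def voronoi_areas(positions, pitch_length=105.0, pitch_width=68.0, cell=0.5):
    """Grid approximation of each player's Voronoi area in m^2."""
    xs = np.arange(cell / 2, pitch_length, cell)
    ys = np.arange(cell / 2, pitch_width, cell)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    d2 = ((grid[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(positions)) * cell ** 2

def delta_area(pos_t0, pos_t, creator, teammate):
    """Team-mate's actual area minus the area with the creator frozen at t0."""
    actual = voronoi_areas(pos_t)[teammate]
    counterfactual_pos = pos_t.copy()
    counterfactual_pos[creator] = pos_t0[creator]    # freeze only the creator
    counterfactual = voronoi_areas(counterfactual_pos)[teammate]
    return actual - counterfactual

# Index 0: striker (space creator), 1: midfielder (team-mate), 2: centre-back
pos_t0 = np.array([[80.0, 34.0], [70.0, 34.0], [85.0, 34.0]])
pos_t = np.array([[85.0, 50.0], [70.0, 34.0], [88.0, 48.0]])   # striker runs wide, CB follows
gain = delta_area(pos_t0, pos_t, creator=0, teammate=1)
print(f"Delta A for the midfielder: {gain:.1f} m^2")
```

All other players, including the defender who followed the run, stay at their observed positions in the counterfactual, exactly as the definition above requires.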

17.4.2 Space Exploitation Metrics

Space creation is only valuable if it is exploited. We define exploitation as a subsequent event --- typically a pass into the newly created space --- that advances play.

$$ \text{Exploitation Rate} = \frac{\text{Number of passes into created space}}{\text{Number of space-creation events}} $$

A more granular metric weights exploitation by the quality of the space:

$$ \text{Weighted Exploitation} = \sum_{k} \Delta A_k \cdot xT(x_k, y_k) $$

where $xT(x_k, y_k)$ is the expected threat value at the location where the pass is received. This combines the quantity of space created with its positional value.
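Both metrics reduce to simple sums over space-creation events. The sketch below assumes each event has already been labelled as exploited or not and carries the xT value at the reception point; the `SpaceCreationEvent` record and the four sample events are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class SpaceCreationEvent:
    delta_area: float        # m^2 of space created (Delta A)
    exploited: bool          # was a pass played into the created space?
    xt_at_reception: float   # xT at the reception point (unused if not exploited)

def exploitation_rate(events):
    """Share of space-creation events followed by a pass into the created space."""
    return sum(e.exploited for e in events) / len(events) if events else 0.0

def weighted_exploitation(events):
    """Sum of Delta A times xT over the exploited events."""
    return sum(e.delta_area * e.xt_at_reception for e in events if e.exploited)

# Made-up events for illustration only
events = [
    SpaceCreationEvent(40.0, True, 0.05),
    SpaceCreationEvent(25.0, False, 0.0),
    SpaceCreationEvent(60.0, True, 0.10),
    SpaceCreationEvent(10.0, False, 0.0),
]
rate = exploitation_rate(events)
weighted = weighted_exploitation(events)
```

Here two of the four events are exploited, giving a rate of 0.5, and the weighted total is $40 \times 0.05 + 60 \times 0.10 = 8.0$.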

Callout --- The Space Creation-Exploitation Disconnect: One of the most interesting findings from spatial analysis is that the best space creators are not always on teams that best exploit the space they create. A striker making brilliant runs into the channels may create space repeatedly, but if the midfield lacks the vision or passing quality to exploit that space, the effort is wasted. This disconnect---high space creation but low exploitation---is a diagnostic marker for a team whose attacking talent is misaligned. It suggests that the team needs not a better striker but a better playmaker who can recognise and deliver into the created space.

17.4.3 Individual Space-Creation Profiles

By accumulating $\Delta A$ values across a season, we can build profiles that distinguish different archetypes:

| Player Type | Typical $\overline{\Delta A}$ (m$^2$/90) | Mechanism |
|---|---|---|
| Pressing forward | 150--250 | Diagonal runs stretching the back line |
| False nine | 100--180 | Dropping deep, pulling centre-backs forward |
| Inverted winger | 120--200 | Cutting inside, creating width for the full-back |
| Box-to-box midfielder | 80--140 | Late runs into the box from deep |

17.4.4 Team-Level Space Creation

At the team level, the total space created in the final third per possession is a strong correlate of chance quality. Teams that consistently generate $\geq 50$ m$^2$ of space in the final third per possession tend to produce higher expected-goals totals, controlling for possession volume.

Callout --- Practical Tip

When computing counterfactuals, freeze only the space-creating player. Freezing multiple players simultaneously introduces unrealistic scenarios and inflates $\Delta A$ estimates.

17.4.5 Spatial Value Surfaces

A spatial value surface assigns a continuous value to every point on the pitch, representing the probability of scoring if the ball is moved to that location. The most well-known spatial value surface is the Expected Threat (xT) grid developed by Karun Singh (2019).

The xT model divides the pitch into a grid (typically 12 x 8 zones) and assigns each zone a value equal to the probability of a goal being scored within the next $n$ actions if a team has the ball in that zone. This is computed via a Markov chain:

$$ xT(z) = P(\text{shoot} | z) \cdot P(\text{goal} | z, \text{shoot}) + P(\text{move} | z) \cdot \sum_{z'} P(z' | z, \text{move}) \cdot xT(z') $$

where $z$ is the current zone, "shoot" and "move" represent the decision to shoot or move the ball, and the summation covers all possible destination zones.

The resulting surface has intuitive properties: the penalty area has the highest values (typically $xT > 0.05$), the centre circle has moderate values ($xT \approx 0.01$), and a team's own penalty area has very low values ($xT < 0.001$). The gradient of the xT surface reveals which regions of the pitch offer the highest marginal value for ball progression.
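The recursive definition of xT can be solved by repeated substitution, starting from zero everywhere. The sketch below does this for a toy pitch with three zones; the probabilities are invented for illustration and are not estimates from real data. Rows of the transition matrix sum to less than one, with the remainder representing moves that lose possession.

```python
import numpy as np

def solve_xt(p_shoot, p_goal, p_move, transition, n_iter=200, tol=1e-12):
    """Iterate xT(z) = p_shoot * p_goal + p_move * sum_z' T[z, z'] * xT(z')."""
    xt = np.zeros_like(p_shoot, dtype=float)
    for _ in range(n_iter):
        new_xt = p_shoot * p_goal + p_move * (transition @ xt)
        if np.max(np.abs(new_xt - xt)) < tol:
            return new_xt
        xt = new_xt
    return xt

# Toy zones: 0 = own third, 1 = middle third, 2 = final third (invented numbers)
p_shoot = np.array([0.00, 0.02, 0.20])
p_goal = np.array([0.00, 0.03, 0.12])
p_move = 1.0 - p_shoot
transition = np.array([
    [0.5, 0.3, 0.0],    # from own third; the remaining 0.2 is lost possession
    [0.2, 0.4, 0.2],
    [0.0, 0.3, 0.4],
])
xt = solve_xt(p_shoot, p_goal, p_move, transition)
print(np.round(xt, 4))
```

Each iteration extends the horizon by one action, so stopping after $n$ iterations corresponds to the "within the next $n$ actions" reading of xT, while iterating to convergence gives the long-run value.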

Callout --- xT vs. Pitch Control: Complementary, Not Competing: Expected Threat and pitch control are answers to different questions. xT asks "how valuable is this zone?" while pitch control asks "who controls this zone?" The product of the two---pitch-control-weighted xT---combines both questions into a single, powerful metric that measures "how much threatening territory does each team control right now?" This combined metric is more informative than either component alone.
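As a minimal sketch of the combined metric, the function below multiplies a pitch control grid by an xT grid and sums the result for each team. It assumes both grids share the same shape, that the home team attacks toward increasing $x$, and that the away team's xT surface is the home surface rotated by 180 degrees. The 2 x 4 toy grids are invented for illustration.

```python
import numpy as np

def controlled_threat(pc_home, xt_home):
    """Pitch-control-weighted xT for each team on a shared (ny, nx) grid.

    pc_home: home-team pitch control in [0, 1].
    xt_home: xT for a team attacking toward increasing x.
    """
    xt_away = xt_home[::-1, ::-1]               # away team attacks the other way
    home = float(np.sum(pc_home * xt_home))
    away = float(np.sum((1.0 - pc_home) * xt_away))
    return home, away

xt_home = np.tile(np.array([0.001, 0.005, 0.02, 0.08]), (2, 1))   # toy 2 x 4 grid

# Evenly contested pitch: both teams control the same amount of threat
even = controlled_threat(np.full((2, 4), 0.5), xt_home)

# Home dominates its attacking half; the other half is evenly contested
pc = np.tile(np.array([0.5, 0.5, 0.8, 0.8]), (2, 1))
home_threat, away_threat = controlled_threat(pc, xt_home)
```

In the second scenario the home team's controlled threat is about 0.166 against roughly 0.102 for the away team, even though raw area control differs far less.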


17.5 Off-Ball Movement Analysis

17.5.1 The Invisible Game

Off-ball movement is often called the "invisible" aspect of soccer. Legendary coaches from Arrigo Sacchi to Pep Guardiola have emphasised that the quality of a team's play is determined more by what players do without the ball than with it.

Tracking data has finally made the invisible visible. In this section we develop a systematic framework for classifying, measuring, and evaluating off-ball runs.

17.5.2 Taxonomy of Off-Ball Runs

We distinguish several categories of purposeful off-ball movement:

  1. Penetrating run: Movement toward the opponent's goal line, typically behind the defensive line.
  2. Lateral run: Horizontal movement designed to create passing angles or stretch the defence.
  3. Dropping run: Movement away from the opponent's goal, designed to receive in space between the lines.
  4. Decoy run: A run that the player does not expect to receive the ball on, but which pulls defenders out of position.
  5. Rotation: A positional exchange between two or more team-mates.

Each type of run serves a distinct tactical purpose and should be evaluated against different criteria. A penetrating run is valuable if it creates a goal-scoring opportunity; a decoy run is valuable if it creates space for a teammate, regardless of whether the runner ever touches the ball.

17.5.3 Detecting Runs Algorithmically

A simple heuristic for detecting penetrating runs from tracking data:

  1. Compute the player's velocity vector $v_i(t)$.
  2. Project onto the direction toward the opponent's goal centre: $v_{\text{goal}} = v_i \cdot \hat{g}$, where $\hat{g}$ is the unit vector from $p_i$ toward the goal.
  3. The player is making a penetrating run if $v_{\text{goal}} > v_{\min}$ (e.g., 3 m/s) and the player is in the attacking half.
  4. Cluster consecutive frames satisfying this criterion into discrete run events.
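
import numpy as np

def detect_penetrating_runs(
    positions: np.ndarray,      # (T, 2) array of player positions
    velocities: np.ndarray,     # (T, 2) array of player velocities
    goal_centre: np.ndarray,    # (2,) target goal position
    v_min: float = 3.0,         # m/s threshold
    min_frames: int = 13,       # ~0.5 s at 25 Hz
    pitch_length: float = 105.0,
) -> list[tuple[int, int]]:
    """Detect penetrating runs and return (start, end) frame pairs.

    Each pair is half-open: the run covers frames start .. end - 1.
    """
    direction = goal_centre - positions
    norms = np.linalg.norm(direction, axis=1, keepdims=True)
    norms[norms == 0] = 1.0     # avoid division by zero at the goal centre
    unit_dir = direction / norms

    # Speed component toward the goal centre (step 2)
    projection = np.sum(velocities * unit_dir, axis=1)
    # Attacking half: within half a pitch length of the target goal line (step 3)
    in_attacking_half = np.abs(positions[:, 0] - goal_centre[0]) < pitch_length / 2
    mask = (projection > v_min) & in_attacking_half

    # Group consecutive active frames into run events (step 4)
    runs = []
    start = None
    for t, active in enumerate(mask):
        if active and start is None:
            start = t
        elif not active and start is not None:
            if t - start >= min_frames:
                runs.append((start, t))
            start = None
    # Close a run that is still active at the final frame
    if start is not None and len(mask) - start >= min_frames:
        runs.append((start, len(mask)))
    return runs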

For detecting lateral runs and dropping runs, the same framework applies with different projection directions. Lateral runs project onto the axis running across the pitch (the $y$-axis in the coordinates used in this chapter); dropping runs project onto the direction away from the opponent's goal.
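The different projections can be sketched for a single frame as follows. The function names, the shared 3 m/s threshold, and the priority order (penetrating, then dropping, then lateral) are assumptions made for illustration; a production classifier would also need the frame-clustering step used for penetrating runs.

```python
import numpy as np

def run_components(position, velocity, goal_centre):
    """Signed speed toward goal and unsigned lateral speed for one frame (m/s)."""
    to_goal = goal_centre - position
    dist = np.linalg.norm(to_goal)
    g_hat = to_goal / dist if dist > 0 else np.zeros(2)
    v_goal = float(velocity @ g_hat)         # > 0 penetrating, < 0 dropping
    v_lateral = abs(float(velocity[1]))      # across the pitch (y-axis)
    return v_goal, v_lateral

def classify_frame(position, velocity, goal_centre, v_min=3.0):
    """Label one frame as penetrating, dropping, lateral, or none."""
    v_goal, v_lateral = run_components(position, velocity, goal_centre)
    if v_goal > v_min:
        return "penetrating"
    if -v_goal > v_min:
        return "dropping"
    if v_lateral > v_min:
        return "lateral"
    return "none"

goal = np.array([105.0, 34.0])
pos = np.array([70.0, 34.0])
labels = [classify_frame(pos, np.array(v), goal)
          for v in ([5.0, 0.0], [-4.0, 0.0], [0.0, 4.0], [1.0, 1.0])]
print(labels)
```

A diagonal run that exceeds the threshold in the goal direction is labelled penetrating under this ordering, even if its lateral component is also large.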

17.5.4 Evaluating Run Quality

Not all runs are equal. We evaluate each run along several dimensions:

  • Depth gained ($\Delta x$): How far toward goal the player advanced, measured along the length of the pitch.
  • Space created ($\Delta A$): How much Voronoi area was freed for team-mates (Section 17.4).
  • Defenders engaged ($n_{\text{def}}$): How many opponents adjusted their position in response.
  • Pass received: Whether the run resulted in a successful reception.
  • Positional value change ($\Delta xT$): The change in expected threat associated with the movement.

A composite run quality score can be constructed as a weighted combination:

$$ Q_{\text{run}} = w_1 \Delta x + w_2 \Delta A + w_3 n_{\text{def}} + w_4 \mathbb{1}_{\text{received}} \cdot \Delta xT $$

Weights can be learned by regressing on downstream outcomes (e.g., shots within 10 seconds of the run).
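One simple way to learn the weights is ordinary least squares on a matrix whose columns match the four terms of $Q_{\text{run}}$. The sketch below checks the procedure on synthetic data generated from known weights; the weights, feature ranges, and noise level are all invented. With a binary outcome such as "shot within 10 seconds", logistic regression would be the more natural choice, so the linear fit here is a simplifying assumption.

```python
import numpy as np

def run_features(depth, area, n_def, received, delta_xt):
    """Feature vector matching the four terms of Q_run."""
    return np.array([depth, area, n_def, float(received) * delta_xt])

def fit_run_weights(X, y):
    """Ordinary least-squares estimate of (w1, w2, w3, w4)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic check: generate outcomes from known weights, then recover them
rng = np.random.default_rng(0)
true_w = np.array([0.05, 0.01, 0.2, 10.0])
n = 500
X = np.column_stack([
    rng.uniform(0, 30, n),                              # depth gained (m)
    rng.uniform(0, 100, n),                             # space created (m^2)
    rng.integers(0, 4, n),                              # defenders engaged
    rng.integers(0, 2, n) * rng.uniform(0, 0.1, n),     # received x delta xT
])
y = X @ true_w + rng.normal(0, 0.01, n)

w_hat = fit_run_weights(X, y)
q = float(run_features(12.0, 30.0, 2, True, 0.04) @ w_hat)   # score for one run
```

Because the four terms have very different units and scales, the fitted weights are not directly comparable; standardising the features first makes their relative importance easier to read.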

17.5.5 Case Example: The Decoy Run

A classic tactical scenario illustrates the power of off-ball analysis. Consider a striker who makes a diagonal run into the channel, dragging a centre-back wide. The ball is instead played centrally to a midfielder who has found space vacated by the centre-back. Without tracking data, the striker receives zero credit in event-based systems. Spatial analysis, however, reveals that the striker's movement was the causal mechanism that created the chance.

This example highlights the fundamental limitation of event data and the revolutionary potential of tracking data. Players like Thomas Muller, whose primary contribution is spatial---finding and creating pockets of space through intelligent movement---are systematically undervalued by traditional event-based statistics.

17.5.6 Collective Off-Ball Movement Patterns

Individual runs are important, but the most effective attacking teams coordinate their off-ball movement. Common coordinated patterns include:

  • Overload-and-isolate: Three or four players crowd one side of the pitch, drawing defenders toward them, then the ball is switched to an isolated player on the opposite flank who faces a 1v1 situation.
  • Staggered depth runs: Two or three players make runs at different depths (near post, far post, cutback zone), creating a decision dilemma for the defence.
  • Third-man movement: Player A moves to draw a defender, creating space for Player B, who receives the ball and immediately plays into Player C, who has moved into the space vacated by the original defender.

Detecting and quantifying these coordinated patterns requires analyzing the movement of multiple players simultaneously, which is significantly more complex than single-player run detection. Graph-based representations---where players are nodes and their spatial relationships are edges---offer a promising framework for this analysis.


17.6 Dangerous Space Identification

17.6.1 What Makes Space "Dangerous"?

Not all space on the pitch is equally valuable. The penalty area is obviously more dangerous than the centre circle, but within the final third there are gradations that depend on the defensive shape, the proximity of attackers, and the speed of play.

We define dangerous space as regions satisfying three criteria simultaneously:

  1. High positional value: $xT(x, y) > \tau_{xT}$ (e.g., top 20 % of xT values).
  2. Accessible to the attacking team: $\mathrm{PC}_A(x, y) > 0.5$ or within a short passing distance of a controlled zone.
  3. Currently unoccupied by a defender: No defender within a minimum radius (e.g., 3 m).

17.6.2 The Dangerous-Space Matrix

Combining pitch control with expected threat yields a Dangerous-Space Matrix (DSM):

$$ \mathrm{DSM}(x, y) = \mathrm{PC}_A(x, y) \cdot xT(x, y) \cdot \bigl(1 - D_{\text{def}}(x, y)\bigr) $$

where $D_{\text{def}}(x, y) \in [0, 1]$ is a defender-density term computed as:

$$ D_{\text{def}}(x, y) = 1 - \prod_{j \in B} \Bigl(1 - \exp\!\bigl(-\|(x, y) - p_j\|^2 / (2\sigma_d^2)\bigr)\Bigr) $$

where $B$ is the set of defending players, $p_j$ is the position of defender $j$, and $\sigma_d \approx 3$ m. High DSM values identify locations that are both valuable and vulnerable.
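The two formulas above translate almost directly into NumPy. The sketch below assumes the pitch control and xT values have already been evaluated at the same grid points; the function names and the toy inputs are illustrative.

```python
import numpy as np

def defender_density(grid: np.ndarray, defenders: np.ndarray,
                     sigma_d: float = 3.0) -> np.ndarray:
    """D_def at each grid point: 1 - prod_j (1 - exp(-d_j^2 / (2 sigma_d^2)))."""
    diff = grid[:, None, :] - defenders[None, :, :]   # (n_grid, n_def, 2)
    dist_sq = np.sum(diff ** 2, axis=2)               # (n_grid, n_def)
    kernel = np.exp(-dist_sq / (2.0 * sigma_d ** 2))
    return 1.0 - np.prod(1.0 - kernel, axis=1)

def dangerous_space_matrix(pc_a, xt, d_def):
    """DSM = PC_A * xT * (1 - D_def), evaluated element-wise."""
    return pc_a * xt * (1.0 - d_def)

# Toy example: the first grid point sits exactly on a defender
grid = np.array([[50.0, 30.0], [90.0, 34.0]])
defenders = np.array([[50.0, 30.0], [70.0, 40.0]])
d_def = defender_density(grid, defenders)
dsm = dangerous_space_matrix(np.array([0.8, 0.8]), np.array([0.1, 0.1]), d_def)
```

A grid point occupied by a defender gets a density of 1 and therefore a DSM of 0, while a point far from every defender keeps almost the full product of control and threat.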

17.6.3 Aggregated Dangerous Space Metrics

Over a match or season we can compute:

  • Dangerous Space Volume (DSV): Total DSM summed across the grid, per frame, averaged over the match.

$$ \text{DSV} = \frac{1}{T} \sum_{t=1}^{T} \sum_{(x,y) \in \mathcal{G}} \mathrm{DSM}_t(x, y) \cdot \delta A $$

where $\delta A$ is the area of each grid cell and $T$ is the number of frames.

  • Dangerous Space Exploitation (DSE): The fraction of entries into dangerous space that lead to a shot within 15 seconds.

  • Defensive Dangerous Space Conceded (DDSC): The dangerous space volume computed from the opponent's perspective, measuring defensive vulnerability.
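A minimal sketch of the first two aggregates follows. It assumes the DSM has been stacked into an array with one slice per frame, and that entry and shot times are given in seconds; the function names and the exact definition of an "entry" are assumptions made for illustration.

```python
import numpy as np

def dangerous_space_volume(dsm_frames: np.ndarray, cell_area: float) -> float:
    """Mean over frames of the area-weighted DSM sum."""
    per_frame = dsm_frames.reshape(dsm_frames.shape[0], -1).sum(axis=1)
    return float((per_frame * cell_area).mean())

def dangerous_space_exploitation(entry_times, shot_times,
                                 window: float = 15.0) -> float:
    """Fraction of entries followed by a shot within `window` seconds."""
    entries = np.asarray(entry_times, dtype=float)
    shots = np.asarray(shot_times, dtype=float)
    if entries.size == 0:
        return float("nan")
    lag = shots[None, :] - entries[:, None]   # (n_entries, n_shots)
    followed = np.any((lag >= 0.0) & (lag <= window), axis=1)
    return float(followed.mean())

dsv = dangerous_space_volume(np.full((4, 3, 2), 0.01), cell_area=1.0)
dse = dangerous_space_exploitation([10.0, 100.0, 200.0], [20.0, 230.0])
```

DDSC can reuse `dangerous_space_volume` unchanged by passing the DSM computed with the opponent as the attacking team.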

17.6.4 Spatial Entropy

Another useful concept is spatial entropy, which measures the unpredictability of a team's attacking patterns:

$$ H = -\sum_{k=1}^{K} p_k \log p_k $$

where $p_k$ is the proportion of a team's final-third entries that occur in zone $k$. High entropy indicates diverse, hard-to-predict attack patterns; low entropy suggests predictability (e.g., always attacking down one flank).
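The entropy can be computed directly from a list of zone labels, one per final-third entry. The sketch below uses the natural logarithm, which is an assumption since the formula does not fix a base; the zone labels are illustrative.

```python
import math
from collections import Counter

def spatial_entropy(zone_labels) -> float:
    """Shannon entropy (natural log) of the distribution of entries by zone."""
    counts = Counter(zone_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

one_sided = ["left"] * 8 + ["centre"] + ["right"]
balanced = ["left", "centre", "right"] * 4

h_one_sided = spatial_entropy(one_sided)
h_balanced = spatial_entropy(balanced)
```

With $K$ zones the maximum possible value is $\log K$, reached when entries are spread evenly, so dividing by $\log K$ gives a score between 0 and 1 that is comparable across zone layouts.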

Callout --- Tactical Insight

Teams with high spatial entropy in the final third tend to create more xG per possession, because defences cannot anticipate the point of attack. However, some coaches deliberately accept low entropy to exploit a known defensive weakness on one side. The optimal level of spatial entropy depends on the opponent: against a well-organized defence, high entropy is needed to probe for weaknesses; against a disorganized defence, concentrating attacks on the weakest point (low entropy) may be more efficient.

17.6.5 Half-Space Dominance

The half-spaces---the narrow corridors between the centre of the pitch and the wide areas---have received increasing attention in modern tactical analysis. These zones are particularly dangerous because they force defenders into difficult decisions: if a centre-back steps out to engage a player in the half-space, they leave a gap in the defensive line; if they hold position, the half-space player has time and space to pick a pass.

We can measure a team's half-space dominance as:

$$ \text{HS Dominance} = \frac{1}{T} \sum_{t=1}^{T} \left[ \sum_{(x,y) \in \text{HS}} \mathrm{PC}_A(x, y, t) \cdot \delta A \right] $$

where HS denotes the half-space zones (typically defined as the strips 15-25 m either side of the pitch's long central axis). Teams with high half-space dominance in the attacking third, such as Manchester City under Guardiola, tend to create more central, high-quality chances.
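A sketch of the half-space dominance sum is shown below. It assumes pitch control is stored as a (frames, grid points) array and that each grid point carries a signed lateral offset from the pitch's central axis; both the layout and the function name are illustrative, and the 15 m and 25 m bounds are taken from the definition above.

```python
import numpy as np

def half_space_dominance(pc_frames: np.ndarray, lateral_offset: np.ndarray,
                         cell_area: float, inner: float = 15.0,
                         outer: float = 25.0) -> float:
    """Average over frames of team A's control summed over half-space cells."""
    offset = np.abs(lateral_offset)
    in_half_space = (offset >= inner) & (offset <= outer)
    per_frame = pc_frames[:, in_half_space].sum(axis=1) * cell_area
    return float(per_frame.mean())

# Five grid columns across the pitch, two frames of pitch control
lateral_offset = np.array([-30.0, -20.0, 0.0, 20.0, 30.0])
pc_frames = np.array([
    [0.5, 0.6, 0.5, 0.8, 0.5],
    [0.5, 0.4, 0.5, 0.6, 0.5],
])
hs_dominance = half_space_dominance(pc_frames, lateral_offset, cell_area=1.0)
```

To restrict the measure to the attacking third, the same mask can be combined with a condition on the longitudinal coordinate.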


17.7 Defensive Shape and Spatial Compactness

17.7.1 Measuring Team Shape

Defensive shape---the spatial configuration of a team when it does not have the ball---is a fundamental concept in tactical analysis. A well-organized defensive shape denies space in dangerous areas, forces the opponent to play in low-value zones, and creates conditions for pressing traps.

We quantify defensive shape using several geometric metrics:

  • Convex hull area: The area of the smallest convex polygon containing all outfield players of the defending team. A smaller convex hull indicates a more compact defensive block.

$$ \text{Compactness} = \frac{A_{\text{convex hull}}}{A_{\text{pitch}}} $$

  • Effective playing space: The area between the highest defender and the lowest attacker (the "effective" part of the pitch where the game is actively contested). This is typically much smaller than the full pitch area.

  • Inter-line distances: The distances between the defensive line, the midfield line, and the attacking line. Well-organized teams maintain consistent inter-line distances of approximately 10-15 metres, compressing the space available to the opponent.
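The convex hull compactness ratio can be sketched without external geometry libraries. The code below uses Andrew's monotone chain for the hull and the shoelace formula for its area; the function names are illustrative, and a library routine such as SciPy's `ConvexHull` would be an equally valid choice.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            # Drop points that would make a clockwise or straight turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point repeats as the other half's first

    return half_hull(pts) + half_hull(reversed(pts))

def polygon_area(vertices) -> float:
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def compactness(points, pitch_area: float = 105.0 * 68.0) -> float:
    """Convex hull area as a fraction of the pitch area."""
    return polygon_area(convex_hull(points)) / pitch_area

block = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]
hull = convex_hull(block)
ratio = compactness(block)
```

The interior point (5, 5) does not appear in the hull, which is why the metric is sensitive to the outermost players only.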

17.7.2 Defensive Line Metrics

The defensive line---the vertical position of the deepest defenders---is one of the most important tactical parameters. Key metrics include:

  • Average defensive line height: The mean y-coordinate (toward the opponent's goal) of the back four/five when the team is out of possession.
  • Defensive line stability: The standard deviation of the defensive line height across frames. A high standard deviation (low stability) indicates a poorly synchronized backline vulnerable to through-balls.
  • Offside trap frequency: The number of times the defensive line steps up collectively, catching opponents offside. This requires tight coordination and is a sign of an organized defence.

import numpy as np

def compute_defensive_line_metrics(
    defender_positions: np.ndarray,  # (T, n_defenders, 2) array
    attacking_direction: int = 1,     # 1 or -1
) -> dict:
    """Compute defensive line metrics from tracking data.

    Args:
        defender_positions: Array of defender positions over time.
        attacking_direction: Direction the defending team attacks
            (1 = toward +y, -1 = toward -y).

    Returns:
        Dictionary of defensive line metrics.
    """
    # Sort each frame's defenders by y so that index 0 is the lowest y
    sorted_y = np.sort(defender_positions[:, :, 1], axis=1)
    # Defensive line height is the position of the second-deepest defender,
    # so that a single defender who has dropped well behind the line
    # does not distort the estimate (requires at least two defenders)
    if attacking_direction == 1:
        line_height = sorted_y[:, 1]   # own goal at low y: second-lowest y
    else:
        line_height = sorted_y[:, -2]  # own goal at high y: second-highest y

    return {
        "avg_line_height": np.mean(line_height),
        "line_stability": np.std(line_height),
        "max_depth": np.min(line_height) if attacking_direction == 1 else np.max(line_height),
    }

17.7.3 Pressing Shape Analysis

When a team transitions from defence to pressing, their spatial shape changes dramatically. The key metrics for pressing shape are:

  • Pressing cone width: When the first player engages the ball carrier, the angle subtended by the supporting pressers defines the "pressing cone." A narrow cone (< 45 degrees) indicates a coordinated press that restricts the opponent's passing options. A wide cone (> 90 degrees) indicates a disjointed press that is easy to play through.

  • Pressing coverage: The proportion of the opponent's passing options that are covered by the pressing team. This is computed by checking whether each potential pass recipient has a pressing player within interception range.

  • Defensive transition compactness: How quickly the team becomes compact after losing the ball. Measured as the rate of change of convex hull area in the 3-5 seconds following a turnover.
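Pressing coverage, as described in the list above, reduces to a nearest-presser check for each passing option. The sketch below assumes a fixed interception radius of 3 m, which is a placeholder rather than a value from the chapter; a fuller model would account for pass speed and presser velocity.

```python
import numpy as np

def pressing_coverage(receivers, pressers, intercept_radius: float = 3.0) -> float:
    """Fraction of potential receivers with a presser within the radius."""
    receivers = np.asarray(receivers, dtype=float).reshape(-1, 2)
    pressers = np.asarray(pressers, dtype=float).reshape(-1, 2)
    if len(receivers) == 0:
        return float("nan")
    if len(pressers) == 0:
        return 0.0
    # Pairwise distances, shape (n_receivers, n_pressers)
    dist = np.linalg.norm(receivers[:, None, :] - pressers[None, :, :], axis=2)
    covered = dist.min(axis=1) <= intercept_radius
    return float(covered.mean())

receivers = [(10.0, 10.0), (20.0, 10.0), (30.0, 30.0), (40.0, 5.0)]
pressers = [(11.0, 11.0), (21.0, 12.0), (50.0, 50.0)]
coverage = pressing_coverage(receivers, pressers)
```

In this toy frame two of the four passing options have a presser close enough to intercept, so half of the options are covered.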

Callout --- The Compactness-Coverage Trade-Off: A perfectly compact defensive block leaves no space centrally but may concede width. A widely spread defensive shape covers more of the pitch but leaves gaps between players. The optimal balance depends on the team's personnel (fast defenders can cover more space) and the opponent's attacking style (wide-play teams require wider coverage; central-play teams require more compactness). Spatial analysis allows coaches to quantify this trade-off and find the optimal defensive configuration for each opponent.


17.8 Zone-Based vs. Continuous Spatial Models

17.8.1 The Zone-Based Approach

Zone-based models divide the pitch into a fixed grid of discrete zones and assign properties (value, control, danger) to each zone. The most common configurations are:

  • 6 x 4 grid (24 zones): Used in the original xT model. Simple and interpretable, but too coarse for detailed tactical analysis.
  • 12 x 8 grid (96 zones): The standard for xT calculations. Each zone is approximately 8.75m x 8.5m, which is fine enough for most purposes.
  • 18 x 12 grid (216 zones): Higher resolution, used in more detailed models.

Advantages of zone-based models:

  • Computationally simple and fast
  • Easy to interpret and communicate to coaches
  • Work with event data (no tracking data required)
  • Aggregate naturally across matches and seasons

Limitations:

  • Arbitrary boundary effects (a pass from zone A to zone B is valued differently from a pass that crosses the same distance but stays within a single zone)
  • Cannot capture within-zone variation (the edge of the penalty area near the corner is very different from the centre of the penalty area, but they may fall in the same zone)
  • Resolution is fixed and cannot adapt to tactical context
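To make the boundary effect concrete, the sketch below maps a pitch coordinate to a zone index on the 12 x 8 grid. It assumes x runs along the 105 m length and y along the 68 m width, both measured from one corner; the function name and the row-major numbering are illustrative choices.

```python
def zone_index(x: float, y: float, n_cols: int = 12, n_rows: int = 8,
               length: float = 105.0, width: float = 68.0) -> int:
    """Row-major zone index for a point on the pitch."""
    # Clamp so that points on or just beyond the far touchline stay on the grid
    col = min(max(int(x / length * n_cols), 0), n_cols - 1)
    row = min(max(int(y / width * n_rows), 0), n_rows - 1)
    return row * n_cols + col

# Two locations 10 cm apart fall on either side of the 8.75 m zone boundary
zone_a = zone_index(8.7, 10.0)
zone_b = zone_index(8.8, 10.0)
far_corner = zone_index(105.0, 68.0)
```

The two nearly identical locations receive different zone values, which is exactly the discontinuity that continuous models avoid.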

17.8.2 The Continuous Approach

Continuous models treat the pitch as a continuous surface and assign values to every point, not just zone centres. Pitch control models (Section 17.3) are inherently continuous. Other continuous approaches include:

  • Kernel density estimation (KDE): Smooth the locations of events (shots, passes, tackles) using a Gaussian kernel to produce a continuous surface of event density.
  • Gaussian process regression: Fit a Gaussian process to observed outcomes (e.g., goal probability) at specific locations, producing a smooth surface with uncertainty estimates.
  • Neural network surfaces: Deep learning models (such as the SoccerMap architecture by Fernandez and Bornn, 2021) that learn continuous spatial value functions directly from data.

Advantages of continuous models:

  • No arbitrary boundary effects
  • Can capture fine-grained spatial variation
  • Naturally integrate with tracking data
  • Can be conditioned on game context (score state, time remaining)

Limitations:

  • Higher computational cost
  • Require tracking data for full benefit
  • Harder to interpret and communicate
  • Risk of overfitting to noise
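The kernel density approach listed above can be sketched in a few lines of NumPy. The example assumes an isotropic Gaussian kernel with a 5 m bandwidth, which is a placeholder; in practice the bandwidth would be tuned, for instance by cross-validation, to limit the overfitting risk noted in the limitations.

```python
import numpy as np

def gaussian_kde_surface(events: np.ndarray, grid: np.ndarray,
                         bandwidth: float = 5.0) -> np.ndarray:
    """Event density (per square metre) at each grid point."""
    diff = grid[:, None, :] - events[None, :, :]   # (n_grid, n_events, 2)
    dist_sq = np.sum(diff ** 2, axis=2)
    kernel = np.exp(-dist_sq / (2.0 * bandwidth ** 2))
    kernel /= 2.0 * np.pi * bandwidth ** 2          # normalise each 2-D Gaussian
    return kernel.mean(axis=1)

# Cell centres of a 1 m x 1 m grid over a 105 m x 68 m pitch
xs, ys = np.meshgrid(np.arange(0.5, 105.0, 1.0), np.arange(0.5, 68.0, 1.0))
grid = np.column_stack([xs.ravel(), ys.ravel()])

events = np.array([[52.5, 34.0], [90.0, 34.0], [88.0, 30.0]])
surface = gaussian_kde_surface(events, grid)
total_mass = surface.sum() * 1.0  # density times 1 square-metre cell area
```

Because each kernel is normalised, the surface sums to roughly one over the pitch, with a small shortfall where kernels near the goal line spill outside the playing area.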

17.8.3 Hybrid Approaches

In practice, many analysts use hybrid approaches that combine the strengths of both paradigms:

  1. Use zone-based models for season-level aggregation and comparison (where the large sample size averages out within-zone variation).
  2. Use continuous models for frame-level analysis in specific match situations (where the precise spatial structure matters).
  3. Use zone-based overlays on continuous surfaces for communication (divide the pitch control surface into zones and report the average control in each zone).

Callout --- Choosing the Right Model: The choice between zone-based and continuous models is not primarily a technical one---it is driven by the question being asked and the audience for the answer. A coaching staff reviewing a specific attacking sequence needs continuous spatial detail. A recruitment department comparing players across a season needs zone-based aggregation. A board presentation needs simple, high-level summaries. The best spatial analysts are fluent in both paradigms and choose the right tool for each context.


17.9 Applications in Tactical Analysis

17.9.1 Pre-Match Preparation

Spatial models are invaluable for scouting upcoming opponents:

  • Defensive vulnerability maps: Compute the average DSM of the opponent when defending. This reveals systematic gaps --- for example, a team that consistently leaves dangerous space in the right half-space when its left-back pushes high.
  • Pressing triggers: Identify the pitch zones where the opponent's pitch control drops sharply after a back pass, indicating opportunities to win possession high.
  • Set-piece ownership: Voronoi analysis of corner kicks and free kicks reveals which zones are typically contested and which are conceded.

17.9.2 In-Match Analysis

Real-time spatial analysis is increasingly used in the technical area:

  • Half-time adjustments: Comparing first-half pitch control surfaces against the match plan to diagnose tactical issues.
  • Substitution impact: Overlaying the pitch control surface before and after a substitution to see how the spatial structure changed.
  • Fatigue detection: As players tire, their influence functions shrink (lower $\sigma_{\parallel}$ due to reduced top speed), which is visible in the pitch control surface.

Callout --- Real-Time Constraints: In-match spatial analysis faces severe time constraints. Analysts have approximately 15 minutes at half-time to diagnose issues and recommend adjustments. This means that spatial models must produce results in seconds, not minutes. Pre-computed dashboards with automated anomaly detection ("our pitch control in the left half-space is 15% below our season average") are more useful than raw pitch control surfaces that require manual interpretation.

17.9.3 Post-Match Review

After the final whistle, spatial models support deeper analysis:

  • Passage-of-play breakdowns: For each goal or clear chance, reconstruct the pitch control surface at each key moment (e.g., the pass before the assist, the assist itself, the shot) to identify the spatial trigger.
  • Pressing effectiveness: Compute how much pitch control the pressing team gains in the five seconds after initiating a press, distinguishing successful from failed pressing sequences.
  • Positional maps under control context: Traditional average-position plots are misleading because they ignore the context of possession. By conditioning on pitch control state (e.g., only frames where the team controls > 60 % of the pitch), we get a more meaningful picture of team shape.

17.9.4 Player Recruitment and Evaluation

Spatial metrics open new dimensions for scouting:

  • Space creation per 90: Identifies players who generate value through off-ball movement, a quality invisible in traditional statistics.
  • Dangerous-space entries per 90: Measures a player's ability to penetrate into high-value zones.
  • Pitch control won on the dribble: Quantifies how much territorial advantage a ball carrier generates through individual actions.

These metrics are particularly valuable for identifying undervalued players whose contributions are not captured by goals, assists, or expected assists.

17.9.5 Integration with Expected Threat

Pitch control and expected threat (xT) are natural complements. While xT assigns a value to each zone based on historical probability of scoring from that location, pitch control tells us who controls that zone at any given moment. The product --- pitch-control-weighted xT --- gives a dynamic, frame-by-frame measure of threatening possession:

$$ \text{PC-xT}_A(t) = \sum_{(x,y) \in \mathcal{G}} \mathrm{PC}_A(x, y, t) \cdot xT(x, y) \cdot \delta A $$

Differencing this quantity before and after an action (pass, carry, dribble) yields the spatial value added (SVA) of that action, providing a spatially-aware alternative to traditional expected-threat models.
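A minimal sketch of the PC-xT sum and the resulting SVA is given below. It assumes pitch control and xT are stored on the same grid with a constant cell area; the function names and the 2 x 2 toy grid are illustrative.

```python
import numpy as np

def pc_weighted_xt(pc_a: np.ndarray, xt: np.ndarray, cell_area: float) -> float:
    """Sum over the grid of PC_A * xT * cell area for one frame."""
    return float(np.sum(pc_a * xt) * cell_area)

def spatial_value_added(pc_before: np.ndarray, pc_after: np.ndarray,
                        xt: np.ndarray, cell_area: float) -> float:
    """Change in pitch-control-weighted xT across an action."""
    return (pc_weighted_xt(pc_after, xt, cell_area)
            - pc_weighted_xt(pc_before, xt, cell_area))

xt = np.array([[0.01, 0.05], [0.02, 0.20]])
pc_before = np.array([[0.9, 0.4], [0.8, 0.2]])
pc_after = np.array([[0.9, 0.5], [0.8, 0.5]])

sva = spatial_value_added(pc_before, pc_after, xt, cell_area=1.0)
```

Most of the gain in this toy case comes from the cell with the highest xT, where control rises from 0.2 to 0.5, which illustrates why winning control in valuable zones matters more than winning it elsewhere.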


17.10 Computational Considerations and Scalability

17.10.1 The Scale of the Problem

Processing spatial data at scale presents significant computational challenges. A single match at 25 Hz generates 135,000 frames. Computing pitch control at each frame on a $105 \times 68$ grid requires evaluating the influence of 22 players at 7,140 grid points per frame, yielding approximately $22 \times 7{,}140 \times 135{,}000 \approx 21.2$ billion influence function evaluations per match.

For a full season of 380 matches, the total computation exceeds 8 trillion evaluations---clearly beyond the scope of naive implementations.
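The arithmetic behind these figures can be checked in a few lines. The constants below restate the numbers in the text and assume exactly 90 minutes of play with no stoppage time.

```python
FRAME_RATE_HZ = 25
MATCH_SECONDS = 90 * 60
PLAYERS = 22
GRID_POINTS = 105 * 68          # one grid point per square metre
MATCHES_PER_SEASON = 380

frames_per_match = FRAME_RATE_HZ * MATCH_SECONDS                # 135,000
evals_per_match = PLAYERS * GRID_POINTS * frames_per_match      # about 21.2 billion
evals_per_season = evals_per_match * MATCHES_PER_SEASON         # about 8.06 trillion

print(f"{frames_per_match:,} frames per match")
print(f"{evals_per_match:,} evaluations per match")
print(f"{evals_per_season:,} evaluations per season")
```

Stoppage time and extra time push the real totals somewhat higher, so these numbers are best read as lower bounds.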

17.10.2 Optimisation Strategies

Several strategies reduce the computational burden:

  1. Spatial locality: A player's influence is negligible beyond approximately 3 standard deviations from their mean position. This allows pruning: for each grid point, evaluate only the players whose influence exceeds a threshold (typically within 30-40m), reducing the effective number of players per grid point from 22 to typically 4-8.

  2. Temporal subsampling: Full 25 Hz computation is rarely necessary. Subsampling to 5 Hz (every 5th frame) reduces computation by 80% with minimal information loss for most tactical analyses. For event-specific analysis (e.g., the moment of a pass), full-resolution computation can be applied selectively.

  3. Vectorised computation: NumPy's broadcasting and vectorized operations allow computing all grid points simultaneously, replacing Python loops with optimized C-level operations. A well-vectorized implementation can compute pitch control for a single frame in under 10 milliseconds.

import numpy as np

def vectorized_pitch_control(
    team_a_pos: np.ndarray,    # (n_a, 2)
    team_a_vel: np.ndarray,    # (n_a, 2)
    team_b_pos: np.ndarray,    # (n_b, 2)
    team_b_vel: np.ndarray,    # (n_b, 2)
    grid: np.ndarray,          # (n_grid, 2)
    dt: float = 0.7,
    sigma_par: float = 10.0,
    sigma_perp: float = 5.0,
) -> np.ndarray:
    """Compute pitch control using vectorised operations.

    Args:
        team_a_pos: Positions of team A players.
        team_a_vel: Velocities of team A players.
        team_b_pos: Positions of team B players.
        team_b_vel: Velocities of team B players.
        grid: Target grid points.
        dt: Look-ahead time.
        sigma_par: Parallel spread.
        sigma_perp: Perpendicular spread.

    Returns:
        Array of pitch control values for team A at each grid point.
    """
    def team_influence(pos, vel):
        # Predict future positions
        mu = pos + vel * dt  # (n_players, 2)
        # Compute distances to all grid points
        # mu: (n_players, 1, 2), grid: (1, n_grid, 2)
        diff = grid[np.newaxis, :, :] - mu[:, np.newaxis, :]  # (n_players, n_grid, 2)
        # Simplified isotropic influence (for speed)
        sigma_sq = (sigma_par ** 2 + sigma_perp ** 2) / 2
        dist_sq = np.sum(diff ** 2, axis=2)  # (n_players, n_grid)
        influence = np.exp(-dist_sq / (2 * sigma_sq))
        return np.sum(influence, axis=0)  # (n_grid,)

    inf_a = team_influence(team_a_pos, team_a_vel)
    inf_b = team_influence(team_b_pos, team_b_vel)
    return inf_a / (inf_a + inf_b + 1e-10)

  4. GPU acceleration: For production systems processing hundreds of matches, GPU-based implementations using PyTorch or JAX can achieve 10-100x speedups over CPU-only NumPy code. The pitch control computation is embarrassingly parallel across grid points, making it an excellent candidate for GPU acceleration.

  5. Pre-computation and caching: For analysis that requires pitch control at only specific moments (e.g., the frame when a pass is released), pre-computing and caching pitch control for all events in a match is far more efficient than computing the full 135,000-frame surface.
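The first two strategies in the list, spatial locality and temporal subsampling, can be sketched as follows. The isotropic spread of 8 m and the 30 m cutoff are assumptions for illustration, and the function names are hypothetical. Note that this version still computes every pairwise distance and only skips the exponentials; a spatial index such as a KD-tree would be needed to avoid the distance computation as well.

```python
import numpy as np

def subsample(frames, source_hz: int = 25, target_hz: int = 5):
    """Keep every (source_hz // target_hz)-th frame."""
    return frames[:: source_hz // target_hz]

def pruned_influence(positions: np.ndarray, grid: np.ndarray,
                     sigma: float = 8.0, cutoff: float = 30.0):
    """Summed isotropic influence per grid point, ignoring distant players."""
    diff = grid[:, None, :] - positions[None, :, :]   # (n_grid, n_players, 2)
    dist_sq = np.sum(diff ** 2, axis=2)
    near = dist_sq <= cutoff ** 2
    influence = np.zeros_like(dist_sq)
    # Evaluate the exponential only for player-grid pairs inside the cutoff
    influence[near] = np.exp(-dist_sq[near] / (2.0 * sigma ** 2))
    return influence.sum(axis=1), near.sum(axis=1)

rng = np.random.default_rng(0)
positions = rng.uniform([0.0, 0.0], [105.0, 68.0], size=(22, 2))
xs, ys = np.meshgrid(np.arange(2.5, 105.0, 5.0), np.arange(2.0, 68.0, 4.0))
grid = np.column_stack([xs.ravel(), ys.ravel()])

total, players_used = pruned_influence(positions, grid)
kept = subsample(np.arange(135_000))
```

Each skipped pair would have contributed less than $\exp(-30^2 / (2 \cdot 8^2)) \approx 0.001$, so the pruning error per grid point is bounded by that value times the number of players.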

17.10.3 Approximation Techniques

When exact pitch control is too expensive, several approximation techniques offer good accuracy at reduced cost:

  • Coarse-to-fine refinement: Compute pitch control on a coarse grid (e.g., $26 \times 17$), identify regions of interest (high value, high contention), then refine only those regions at higher resolution.
  • Nearest-neighbor approximation: For each grid point, compute the time-to-arrive for only the two nearest players from each team, rather than all players. This closely matches the full computation across most of the pitch (where only nearby players matter) and is less accurate in contested zones, where several players can reach the point at similar times.
  • Lookup table interpolation: Pre-compute influence values for a range of distances and speeds, then interpolate at runtime. This replaces expensive exponential evaluations with table lookups.
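The lookup-table idea from the list above can be sketched with `np.interp`. The table below covers distance only, for an assumed isotropic spread of 8 m and a 0.25 m table spacing; extending it to speed would need a two-dimensional table. Whether a table lookup actually beats `np.exp` depends on the hardware and array sizes, so it should be benchmarked before adoption.

```python
import numpy as np

SIGMA = 8.0  # assumed isotropic influence spread in metres

# Pre-computed once: influence at distances 0, 0.25, ..., 60 m
TABLE_DIST = np.linspace(0.0, 60.0, 241)
TABLE_INFLUENCE = np.exp(-TABLE_DIST ** 2 / (2.0 * SIGMA ** 2))

def influence_lookup(dist: np.ndarray) -> np.ndarray:
    """Linearly interpolated influence; values beyond 60 m clamp to the last entry."""
    return np.interp(dist, TABLE_DIST, TABLE_INFLUENCE)

query = np.random.default_rng(1).uniform(0.0, 60.0, size=10_000)
approx = influence_lookup(query)
exact = np.exp(-query ** 2 / (2.0 * SIGMA ** 2))
max_error = float(np.max(np.abs(approx - exact)))
```

For linear interpolation the error is bounded by $h^2 \max|f''| / 8$, which for this spacing and spread is roughly $1.2 \times 10^{-4}$, well below the noise in typical tracking data.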

Callout --- Production vs. Research: Research implementations of pitch control models prioritize correctness and flexibility. Production implementations prioritize speed and scalability. The two are often very different codebases. When moving from research to production, expect to rewrite most of the spatial computation code, replace Python loops with vectorized operations, add caching layers, and implement error handling for tracking data quality issues (missing frames, misidentified players, calibration errors).


17.11 Future Directions in Spatial Analytics

17.11.1 Deep Learning Approaches

The next generation of spatial models is being driven by deep learning. Key developments include:

  • SoccerMap (Fernandez & Bornn, 2021): A convolutional neural network that takes tracking data as input and directly outputs pitch control and spatial value surfaces. Unlike the analytical models in Section 17.3, SoccerMap learns the relationship between player configurations and spatial value from data, potentially capturing patterns that hand-crafted models miss.
  • Graph neural networks for tactical patterns: Representing each frame as a graph (players as nodes, spatial relationships as edges) and using graph neural networks to classify tactical states, predict actions, and evaluate spatial configurations.
  • Sequence models for movement prediction: Recurrent neural networks and transformers that predict future player trajectories, enabling forward-looking pitch control that accounts for where players will be, not just where they are.

17.11.2 Three-Dimensional Spatial Analysis

Current spatial models operate in two dimensions, ignoring the vertical dimension. However, the height of the ball and the jumping ability of players are critical factors in aerial duels, crosses, and set pieces. Future models that incorporate the z-coordinate will capture:

  • Aerial control surfaces: Pitch control extended to three dimensions for crossed balls and set pieces.
  • Trajectory-aware passing models: Models that consider the arc of a pass, not just its origin and destination, to assess whether a lofted ball can clear defenders.
  • Goalkeeper reach models: Three-dimensional influence functions for goalkeepers that account for their ability to reach high shots.

17.11.3 Integration with Biomechanical Data

As wearable sensor technology advances, tracking data is being augmented with biomechanical information: body orientation, limb positions, fatigue indicators, and muscle load. Integrating this data with spatial models will enable:

  • Orientation-aware pitch control: A player facing away from the ball has less effective influence than one facing toward it. Body orientation data allows this to be modeled explicitly.
  • Fatigue-adjusted influence functions: As a match progresses, tired players cover less ground and accelerate more slowly. Dynamically adjusting influence function parameters based on physical load data will produce more accurate pitch control in the later stages of matches.
  • Injury risk surfaces: Combining spatial demands (how much ground a player needs to cover) with biomechanical load data to identify moments where injury risk is elevated.

17.11.4 Generative Models for Tactical Simulation

Perhaps the most exciting frontier is the use of generative models to simulate alternative tactical scenarios. Given a specific match situation, a generative model could answer questions like:

  • "What would the pitch control surface look like if the left-back held a deeper position?"
  • "If we change the pressing trigger from the centre-forward to the inside-forward, how does the expected pitch control in the pressing zone change?"
  • "What is the optimal off-ball movement pattern for the front three in this specific defensive configuration?"

These counterfactual simulations would transform spatial analytics from a descriptive tool (what happened) into a prescriptive tool (what should happen), directly supporting tactical decision-making.

Callout --- The Limits of Spatial Models: Despite their power, spatial models have fundamental limitations. They cannot capture psychological factors (a defender's confidence, a striker's composure), communication between players, or the qualitative difference between a player who controls the ball cleanly and one who takes a heavy touch. Spatial models describe where players are and where they can go, but not the full richness of what they can do when they get there. The best spatial analysts combine quantitative spatial models with qualitative video analysis, using each to compensate for the other's blind spots.

17.11.5 Ethical and Practical Considerations

As with all advanced analytics, spatial models must be deployed responsibly:

  • Data access: Tracking data is not publicly available for most leagues. Analysts working with public event data can approximate some spatial metrics using freeze-frame data (available in StatsBomb open data), but full pitch control requires tracking data.
  • Model uncertainty: Pitch control surfaces are estimates, not ground truth. Communicating uncertainty (e.g., via confidence bands) is essential when presenting to coaches and decision-makers.
  • Computational cost: Frame-by-frame pitch control computation for a full match can take minutes even on modern hardware. Efficient implementations using vectorised NumPy operations and spatial indexing (e.g., KD-trees) are necessary for production systems.
  • Player privacy: Tracking data captures player movement at high resolution, raising questions about data ownership and privacy, particularly when data is used for purposes beyond the original collection context (e.g., selling movement profiles to third parties).

Summary

Spatial analysis and pitch control represent the frontier of tactical analytics in professional soccer. By moving beyond on-ball events to model the full positional structure of the game, we gain insights into space creation, off-ball movement, defensive vulnerability, and territorial control that are invisible in traditional statistics.

The progression from simple Voronoi diagrams to probabilistic pitch control models mirrors the broader arc of sports analytics: from deterministic, geometry-based heuristics to probabilistic, physics-informed models. Each step adds nuance and predictive power at the cost of additional complexity and data requirements.

The tools and frameworks presented in this chapter --- Voronoi tessellations, Fernandez--Bornn influence functions, Spearman's time-to-intercept model, space creation metrics, off-ball run detection, dangerous-space identification, defensive shape analysis, and spatial value surfaces --- form the core toolkit of the modern spatial analyst. In the hands of a skilled practitioner, they transform raw tracking coordinates into actionable tactical intelligence.

The field continues to evolve rapidly, with deep learning, three-dimensional models, biomechanical integration, and generative simulation pushing the boundaries of what spatial analysis can accomplish. What remains constant is the fundamental insight captured in Cruyff's opening quote: football is a game of spaces, and the team that better understands, creates, and exploits space will, more often than not, prevail.


References

  • Collet, C. (2013). The possession game? A comparative analysis of ball retention and team success in European and international football, 2007--2010. Journal of Sports Sciences, 31(2), 123--136.
  • Fernandez, J., & Bornn, L. (2018). Wide Open Spaces: A statistical technique for measuring space creation in professional soccer. MIT Sloan Sports Analytics Conference.
  • Fernandez, J., & Bornn, L. (2021). SoccerMap: A Deep Learning Architecture for Visually-Interpretable Analysis in Soccer. ECML PKDD.
  • Kim, S. (2004). Voronoi Analysis of a Soccer Game. Nonlinear Analysis: Modelling and Control, 9(3), 233--240.
  • Power, P., Ruiz, H., Wei, X., & Lucey, P. (2017). Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data. KDD.
  • Singh, K. (2019). Introducing Expected Threat (xT). Blog post.
  • Spearman, W., Basye, A., Dick, G., Hotovy, R., & Pop, P. (2017). Physics-Based Modeling of Pass Probabilities in Soccer. MIT Sloan Sports Analytics Conference.
  • Spearman, W. (2018). Beyond Expected Goals. MIT Sloan Sports Analytics Conference.
  • Taki, T., & Hasegawa, J. (2000). Visualization of dominant region in team games and its application to teamwork analysis. Proceedings of Computer Graphics International.