Learning Objectives
- Describe FIFA pitch dimension standards and explain how variability affects spatial analysis
- Define and convert between the major coordinate systems used by data providers (StatsBomb, Opta, Wyscout, FIFA EPTS)
- Apply affine transformations to translate, scale, rotate, and reflect coordinate data
- Partition the pitch into meaningful zones and compute zone-based summary statistics
- Create publication-quality pitch visualizations using mplsoccer
- Generate and interpret heat maps and kernel density estimates of player and ball positions
- Construct shot maps and pass network visualizations on the pitch
- Explain the foundational ideas behind pitch control models including Voronoi and Fernandez-Bornn approaches
- Understand action density surfaces and spatial value models
In This Chapter
- Chapter Overview
- 6.1 Pitch Dimensions and Standardization
- 6.2 Coordinate Systems and Transformations
- 6.3 Zones and Regions of Interest
- 6.4 Visualizing Spatial Data
- 6.5 Heat Maps and Density Estimation
- 6.6 Introduction to Pitch Control Concepts
- 6.7 Action Density and Spatial Value Surfaces
- Chapter Summary
- Looking Ahead
Chapter 6: The Soccer Pitch as a Coordinate System
"Football is a game of spaces. Whoever controls the space, controls the game." -- Johan Cruyff
Chapter Overview
Every event in a soccer match -- a pass, a tackle, a shot -- happens at a specific location on the pitch. To analyze the game quantitatively, we need a precise language for describing where things happen. That language is coordinate geometry.
In this chapter we treat the soccer pitch as a two-dimensional coordinate plane. We begin with the physical object itself: the Laws of the Game specify pitch dimensions within a range, and data providers each impose their own coordinate system on that rectangle. Understanding these systems, and being able to convert fluently between them, is the essential first skill of spatial soccer analytics.
From there we move to interpretation. Raw coordinates are rarely useful on their own; analysts need ways to aggregate positions into meaningful regions (zones, thirds, half-spaces) and to visualize spatial patterns (scatter plots, heat maps, kernel density surfaces). We explore shot maps, pass maps, and pass networks as applied spatial visualizations. We close the chapter by introducing the concept of pitch control -- the idea that a player's influence extends beyond the single point they occupy -- and the related idea of spatial value surfaces, setting the stage for the tracking-data chapters later in the book.
By the end of this chapter you will be comfortable loading event data, transforming it into a common coordinate frame, partitioning the pitch into analytically useful regions, and producing professional-quality spatial visualizations.
6.1 Pitch Dimensions and Standardization
6.1.1 The Laws of the Game
The International Football Association Board (IFAB) Laws of the Game specify that a pitch must be rectangular, with the following dimensional constraints:
| Dimension | Minimum | Maximum | Recommended (international) |
|---|---|---|---|
| Length (touchline) | 90 m | 120 m | 105 m |
| Width (goal line) | 45 m | 90 m | 68 m |
For international matches the Laws narrow this range to 100 to 110 m in length and 64 to 75 m in width, and FIFA recommends 105 m x 68 m. This recommended size is the de facto standard adopted by most professional leagues and data providers. Throughout this book, unless stated otherwise, we assume these dimensions.
The allowable range is enormous. A pitch could legally be as small as 90 x 45 m (4,050 m^2) or as large as 120 x 90 m (10,800 m^2), a roughly 2.7-fold difference in playing area. In practice, top-flight leagues operate within a much narrower band, but meaningful variation still exists.
Common Pitfall: Not every stadium uses the 105 x 68 standard. The Premier League, for example, has historically allowed pitches as narrow as 64 m. If you compare spatial data across venues without normalizing for pitch size, your analysis may contain a systematic bias. Always check whether your data provider has already normalized coordinates to a standard pitch.
6.1.2 Key Pitch Markings and Their Positions
Key pitch markings and their metric positions (assuming 105 x 68 m):
| Feature | Position |
|---|---|
| Centre spot | (52.5, 34.0) |
| Centre circle radius | 9.15 m |
| Penalty spot | 11.0 m from goal line |
| Penalty area | 16.5 m from each post, 16.5 m deep |
| Goal area (6-yard box) | 5.5 m from each post, 5.5 m deep |
| Goal width | 7.32 m |
| Goal height | 2.44 m |
| Corner arc radius | 1.0 m |
These markings are precisely defined by the Laws of the Game and do not vary between venues. The penalty spot is always 11.0 m from the goal line. The goal always measures 7.32 m wide and 2.44 m tall. These fixed dimensions are critical anchor points for spatial calculations such as shot angle, shot distance, and expected goals models.
6.1.3 Variation in Pitch Sizes Across Leagues and Venues
Although the 105 x 68 standard dominates international competition, domestic leagues allow --- and clubs exploit --- dimensional variation. Some notable examples:
| Club / Venue | Approximate Dimensions | Strategic Rationale |
|---|---|---|
| Barcelona (Camp Nou) | 105 x 68 m | Standard; suits possession-based play |
| Stoke City (bet365 Stadium, historical) | 100 x 64 m | Narrower pitch compressed space and suited a direct, physical style |
| Manchester City (Etihad Stadium) | 105 x 68 m | Standard; full width supports wide overloads |
The Premier League standardized pitch sizes to 105 x 68 m for the 2024-25 season onward, but historically some clubs used narrower or shorter pitches. In the Bundesliga, dimensions have long been more standardized, while in Serie A, some older stadiums with running tracks (like the Stadio Olimpico) feature slightly different proportions.
Real-World Application: When Tony Pulis managed Stoke City, the club reportedly used one of the narrowest permitted pitches in the Premier League. This reduced the space available for opponents' wide players and favored Stoke's direct, physical style of play. Analysts comparing player performance across venues must account for such tactical dimensions.
6.1.4 Why Dimensions Matter for Analytics
Pitch size affects the game in measurable ways. A wider pitch stretches defences and creates more space for attackers on the flanks. A shorter pitch compresses midfield and favours high-pressing teams. When we compute metrics such as pass distance, shot angle, or defensive compactness, the underlying pitch dimensions directly determine the numerical result.
Consider a simple shot-angle calculation. A shot is taken from position $(x, y)$ toward a goal whose posts are at $(x_g, y_{g1})$ and $(x_g, y_{g2})$, with $y_{g1} < y_{g2}$. The angle subtended by the goal mouth is:
$$\theta = \arctan\!\left(\frac{y_{g2} - y}{x_g - x}\right) - \arctan\!\left(\frac{y_{g1} - y}{x_g - x}\right)$$
If the pitch width changes but the data is not renormalized, the $y$-coordinates shift, and $\theta$ is miscalculated. This is one of many reasons to always work in a consistent coordinate frame.
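To make the formula concrete, here is a minimal sketch that evaluates $\theta$ in the metre frame used throughout this chapter: 105 x 68, origin at the bottom-left, attacking toward $x = 105$. The function name `shot_angle` is our own. It uses `np.arctan2` instead of a plain `arctan` of the ratio, so a shot taken level with the goal line ($x = x_g$) does not divide by zero. For $x < x_g$ it reduces exactly to the formula above.

```python
import numpy as np

GOAL_WIDTH = 7.32  # metres, fixed by the Laws of the Game


def shot_angle(x, y, pitch_length: float = 105.0, pitch_width: float = 68.0):
    """Angle (radians) subtended by the goal mouth from shot location (x, y).

    Assumes metres with the origin at the bottom-left corner and the goal
    on the line x = pitch_length, centred at y = pitch_width / 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_g = pitch_length
    y_g1 = pitch_width / 2 - GOAL_WIDTH / 2  # lower post
    y_g2 = pitch_width / 2 + GOAL_WIDTH / 2  # upper post
    dx = x_g - x
    # arctan2(dy, dx) equals arctan(dy / dx) for dx > 0 and stays defined at dx = 0
    return np.arctan2(y_g2 - y, dx) - np.arctan2(y_g1 - y, dx)


penalty_angle = shot_angle(94.0, 34.0)  # central shot from the penalty spot
# The same shot if the centre of a 64 m-wide pitch (y = 32) is read in a 68 m frame
shifted_angle = shot_angle(94.0, 32.0)
```

The penalty-spot angle comes out at about 0.64 rad (roughly 37 degrees). Shifting the shot 2 m off-centre, as an unnormalized 64 m-wide pitch would, gives a noticeably smaller angle.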
The effect extends beyond individual calculations. Metrics like defensive compactness (the area enclosed by a team's outfield players), high-press intensity (the frequency of pressing actions in the opponent's third), and passing distance distributions all depend on pitch dimensions. Even heat maps can be distorted if they are generated on a 100 x 64 pitch but displayed on a 105 x 68 template.
6.1.5 Normalizing Pitch Coordinates
When pitch dimensions vary, we can normalize every position to a unit pitch of dimensions $[0, 1] \times [0, 1]$:
$$x_{\text{norm}} = \frac{x}{L}, \qquad y_{\text{norm}} = \frac{y}{W}$$
where $L$ is the pitch length and $W$ is the pitch width. This allows apples-to-apples comparison across venues. You can then rescale to any desired reference frame (e.g., 105 x 68) by multiplying back:
$$x_{105} = x_{\text{norm}} \times 105, \qquad y_{68} = y_{\text{norm}} \times 68$$
```python
import numpy as np


def normalize_coordinates(
    x: np.ndarray,
    y: np.ndarray,
    pitch_length: float,
    pitch_width: float,
) -> tuple[np.ndarray, np.ndarray]:
    """Normalize coordinates to the unit square [0,1] x [0,1]."""
    return x / pitch_length, y / pitch_width


def rescale_coordinates(
    x_norm: np.ndarray,
    y_norm: np.ndarray,
    target_length: float = 105.0,
    target_width: float = 68.0,
) -> tuple[np.ndarray, np.ndarray]:
    """Rescale unit-square coordinates to a target pitch size."""
    return x_norm * target_length, y_norm * target_width
```
Best Practice: Always record the source coordinate system and pitch dimensions in your data pipeline's metadata. A simple JSON sidecar file or a column in your DataFrame will save hours of debugging later.
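One way to follow this advice is sketched below. It assumes a plain JSON file stored next to the data. The field names (`coordinate_system`, `pitch_length`, and so on) and both helper functions are hypothetical choices for illustration, not a standard schema.

```python
import json
import tempfile
from pathlib import Path


def write_coordinate_metadata(
    path: Path,
    coordinate_system: str,
    pitch_length: float,
    pitch_width: float,
    attacking_direction: str = "left_to_right",
) -> dict:
    """Write a JSON sidecar describing the coordinate frame of a dataset.

    The field names are illustrative; adapt them to your own pipeline.
    """
    meta = {
        "coordinate_system": coordinate_system,
        "pitch_length": pitch_length,
        "pitch_width": pitch_width,
        "attacking_direction": attacking_direction,
    }
    Path(path).write_text(json.dumps(meta, indent=2))
    return meta


def read_coordinate_metadata(path: Path) -> dict:
    """Load the sidecar so downstream code can pick the right converter."""
    return json.loads(Path(path).read_text())


# Example: record that an event export uses StatsBomb's native 120 x 80 grid.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "events.meta.json"
    write_coordinate_metadata(sidecar, "statsbomb", 120.0, 80.0)
    loaded = read_coordinate_metadata(sidecar)
```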
6.1.6 The Third Dimension: Ball Height and Tracking Data
While most event data treats the pitch as a two-dimensional surface, tracking data and ball-tracking systems increasingly provide a third dimension: the height ($z$-coordinate) of the ball. This is critical for analyzing:
- Aerial duels: Knowing the ball's height at the moment of a header.
- Crosses: Distinguishing between low, driven crosses and high, looping crosses.
- Shot trajectories: Computing whether a shot passed over or under the crossbar.
- Goalkeeper positioning: Assessing whether a goalkeeper was at the correct height to make a save.
The FIFA EPTS (Electronic Performance and Tracking Systems) standard includes a $z$-axis measured in metres from the pitch surface. We will work primarily in two dimensions throughout this chapter but will return to three-dimensional analysis in Chapter 18 (Tracking Data Analysis).
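The sketch below shows why the $z$-axis matters: it decides whether a ball crossed the goal line inside the goal frame. It makes three assumptions:

- The input is two consecutive ball positions $(x, y, z)$ in metres that straddle the goal line at $x = 105$.
- $y$ is measured from the bottom touchline, as in our metre frame.
- The ball's radius and the thickness of the posts can be ignored.

The helper names are hypothetical.

```python
GOAL_WIDTH = 7.32   # metres, fixed by the Laws of the Game
GOAL_HEIGHT = 2.44  # metres


def goal_line_crossing(p0, p1, goal_x: float = 105.0):
    """Linearly interpolate (y, z) where the segment p0 -> p1 crosses x = goal_x.

    p0 and p1 are (x, y, z) ball positions from consecutive tracking frames,
    assumed to lie on opposite sides of the goal line (so their x values differ).
    """
    x0, y0, z0 = p0
    x1, y1, z1 = p1
    t = (goal_x - x0) / (x1 - x0)  # fraction of the segment travelled at the line
    return y0 + t * (y1 - y0), z0 + t * (z1 - z0)


def inside_goal_frame(y: float, z: float, pitch_width: float = 68.0) -> bool:
    """True if a crossing at lateral position y and height z is between the posts and under the bar."""
    within_posts = abs(y - pitch_width / 2) <= GOAL_WIDTH / 2
    under_bar = 0.0 <= z <= GOAL_HEIGHT
    return within_posts and under_bar


# A ball rising from 2.0 m to 3.0 m as it passes the line crosses at 2.5 m: over the bar.
y_cross, z_cross = goal_line_crossing((104.0, 34.0, 2.0), (106.0, 34.0, 3.0))
on_target = inside_goal_frame(y_cross, z_cross)
```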
6.2 Coordinate Systems and Transformations
6.2.1 Data Provider Coordinate Systems
The three major event-data providers and the FIFA EPTS tracking standard each use a different coordinate system. Understanding these differences is essential for anyone who works with more than one data source.
StatsBomb
- Pitch dimensions: 120 x 80 (arbitrary units, not metres)
- Origin: top-left corner of the pitch when the attacking team attacks left-to-right
- $x$-axis: runs left to right (0 to 120)
- $y$-axis: runs top to bottom (0 to 80)
- Events are oriented so that the team performing the action always attacks toward $x = 120$, in both halves
Opta
- Pitch dimensions: 100 x 100 (percentage-based)
- Origin: bottom-left corner, with events oriented so that the team performing the action attacks left to right
- $x$-axis: 0 to 100 (own goal line to opposition goal line)
- $y$-axis: 0 to 100 (right touchline to left touchline from the perspective of the attacking team, i.e., $y$ increases upward)
Wyscout
- Pitch dimensions: 100 x 100 (percentage-based)
- Origin: top-left (similar to screen coordinates)
- $x$-axis: 0 to 100 (left to right, own goal to opposition goal)
- $y$-axis: 0 to 100 (top to bottom)
FIFA EPTS (tracking data)
- Pitch dimensions: actual metres (105 x 68 for standard pitches)
- Origin: centre of the pitch (0, 0)
- $x$-axis: -52.5 to +52.5
- $y$-axis: -34.0 to +34.0
- Includes a $z$-axis for ball height
| Provider | Length units | Width units | Origin | y-direction |
|---|---|---|---|---|
| StatsBomb | 0 -- 120 | 0 -- 80 | Top-left | Downward |
| Opta | 0 -- 100 | 0 -- 100 | Bottom-left | Upward |
| Wyscout | 0 -- 100 | 0 -- 100 | Top-left | Downward |
| FIFA EPTS | -52.5 -- 52.5 | -34.0 -- 34.0 | Centre | Upward |
Intuition: Think of each provider's system as a different "ruler" placed on the same physical object. The events themselves do not change -- only the numbers used to describe them. Coordinate transformations are just ruler-to-ruler conversions.
6.2.2 Why Do Providers Use Different Systems?
The divergence in coordinate systems is partly historical and partly practical. StatsBomb's 120 x 80 grid keeps a realistic aspect ratio. It has 9,600 unit cells, slightly fewer than the 10,000 of a 100 x 100 grid, but one unit spans a similar physical distance along both axes: 0.875 m along the length and 0.85 m across the width of a 105 x 68 m pitch. Opta and Wyscout chose percentage-based systems to abstract away from physical pitch dimensions entirely. Every pitch maps to the same [0, 100] x [0, 100] grid regardless of its actual size, at the cost of non-square units: 1.05 m along the length but only 0.68 m across the width of a standard pitch. FIFA EPTS uses real-world metres centred on the pitch midpoint because tracking data must align with physical camera systems and GPS receivers.
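The practical consequence of non-square units is easy to demonstrate. The sketch below compares two passes that both measure 10 units in raw Opta coordinates, one along the pitch and one across it. The helper names are hypothetical, and a 105 x 68 m pitch is assumed.

```python
import numpy as np


def raw_distance(x0, y0, x1, y1):
    """Euclidean distance computed directly in provider units (can be misleading)."""
    return np.hypot(x1 - x0, y1 - y0)


def opta_distance_metres(x0, y0, x1, y1, pitch_length=105.0, pitch_width=68.0):
    """Distance after scaling each Opta axis to metres before combining them."""
    dx = (x1 - x0) * pitch_length / 100.0
    dy = (y1 - y0) * pitch_width / 100.0
    return np.hypot(dx, dy)


# Both passes are 10 units long in raw Opta coordinates...
raw_forward = raw_distance(50, 50, 60, 50)
raw_lateral = raw_distance(50, 50, 50, 60)
# ...but they cover different physical distances.
forward_m = opta_distance_metres(50, 50, 60, 50)  # 10.5 m along the pitch
lateral_m = opta_distance_metres(50, 50, 50, 60)  # 6.8 m across the pitch
```

Converting to metres before computing distances or angles avoids this distortion.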
The $y$-axis direction is the most treacherous difference. StatsBomb and Wyscout use a "screen coordinate" convention where $y$ increases downward (matching how computer graphics typically work). Opta and FIFA EPTS use a "mathematical" convention where $y$ increases upward. Mixing up the $y$-direction is one of the most common sources of bugs in spatial soccer analysis.
Common Pitfall: If your pass arrows are all pointing in the wrong vertical direction, or your shot map shows goals being scored at the wrong end of the pitch, the first thing to check is the $y$-axis convention. This error is so common that experienced analysts check it reflexively when working with a new dataset.
6.2.3 Affine Transformations
Every conversion between the systems above can be expressed as an affine transformation: a combination of scaling, reflection, and translation. In two dimensions the general form is:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
For our purposes, the off-diagonal elements ($b$ and $c$) are zero because the coordinate axes are never rotated between providers -- only scaled, translated, or reflected. So the transformation simplifies to:
$$x' = s_x \cdot x + t_x, \qquad y' = s_y \cdot y + t_y$$
where $s_x, s_y$ are scale factors and $t_x, t_y$ are translations. A reflection (flipping the $y$-axis) is encoded as a negative scale factor.
The beauty of affine transformations is that they are composable. If you know how to convert from System A to System B, and from System B to System C, you can compose the two transformations to convert directly from A to C. This is why a hub-and-spoke architecture (with one canonical system at the centre) is the most efficient approach.
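Composition becomes a single matrix product if each conversion is written in homogeneous coordinates. We append a constant 1 to every point so that the translation fits inside a 3 x 3 matrix. The sketch below uses helper names of our own and a 105 x 68 m metre frame. It builds a StatsBomb-to-metres matrix and a metres-to-Opta matrix from their scale factors and translations, then multiplies them to obtain a direct StatsBomb-to-Opta conversion.

```python
import numpy as np


def affine_matrix(s_x: float, s_y: float, t_x: float, t_y: float) -> np.ndarray:
    """3x3 homogeneous matrix for x' = s_x * x + t_x, y' = s_y * y + t_y."""
    return np.array([
        [s_x, 0.0, t_x],
        [0.0, s_y, t_y],
        [0.0, 0.0, 1.0],
    ])


def apply_affine(m: np.ndarray, x, y):
    """Apply a homogeneous affine matrix to arrays of x and y coordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    points = np.vstack([x, y, np.ones_like(x)])  # shape (3, n)
    out = m @ points
    return out[0], out[1]


# StatsBomb (120 x 80, y down) -> metres (105 x 68, y up): negative scale reflects y.
sb_to_m = affine_matrix(105 / 120, -68 / 80, 0.0, 68.0)
# Metres -> Opta (100 x 100, y up): pure scaling.
m_to_opta = affine_matrix(100 / 105, 100 / 68, 0.0, 0.0)

# Matrices compose right-to-left: sb_to_m is applied first, then m_to_opta.
sb_to_opta = m_to_opta @ sb_to_m

x_opta, y_opta = apply_affine(sb_to_opta, [60.0, 120.0], [40.0, 0.0])
```

The composed matrix has $s_x = 100/120$, $s_y = -100/80$ and $t_y = 100$. The StatsBomb centre spot (60, 40) therefore lands on (50, 50), and the corner (120, 0) lands on (100, 100).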
6.2.4 Converting Between Providers
Let us derive the conversion from StatsBomb to standard metres (105 x 68, origin at bottom-left, $y$ upward).
- Scale $x$: StatsBomb $x$ runs from 0 to 120; we want 0 to 105. Scale factor: $s_x = 105 / 120 = 0.875$.
- Reflect and scale $y$: StatsBomb $y$ runs from 0 (top) to 80 (bottom); we want 0 (bottom) to 68 (top). We need to flip: $y' = 68 - (68/80) \cdot y$, i.e., $s_y = -68/80 = -0.85$ and $t_y = 68$.
```python
def statsbomb_to_metres(
    x: np.ndarray, y: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Convert StatsBomb coordinates (120x80) to metres (105x68, origin bottom-left)."""
    x_m = x * (105.0 / 120.0)
    y_m = 68.0 - y * (68.0 / 80.0)
    return x_m, y_m
```
Similarly, for Opta (already $y$-up, percentage-based):
```python
def opta_to_metres(
    x: np.ndarray, y: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Convert Opta coordinates (100x100) to metres (105x68, origin bottom-left)."""
    x_m = x * (105.0 / 100.0)
    y_m = y * (68.0 / 100.0)
    return x_m, y_m
```
And for Wyscout ($y$-down, percentage-based):
```python
def wyscout_to_metres(
    x: np.ndarray, y: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Convert Wyscout coordinates (100x100) to metres (105x68, origin bottom-left)."""
    x_m = x * (105.0 / 100.0)
    y_m = 68.0 - y * (68.0 / 100.0)
    return x_m, y_m
```
For FIFA EPTS (origin at centre):
```python
def epts_to_metres(
    x: np.ndarray, y: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Convert FIFA EPTS coordinates (centre origin) to metres (105x68, origin bottom-left)."""
    x_m = x + 52.5
    y_m = y + 34.0
    return x_m, y_m
```
Best Practice: Write a single `convert_coordinates()` function that accepts `source` and `target` arguments. Internally, route through a canonical frame (e.g., metres with bottom-left origin). This way you only need $2N$ converters (one into and one out of the canonical frame per provider) instead of $N(N-1)$ pairwise converters.
6.2.5 A Unified Conversion Pipeline
```python
from typing import Literal

import numpy as np

CoordSystem = Literal["statsbomb", "opta", "wyscout", "epts", "metres"]

# Each function maps FROM a provider TO metres (105x68, bottom-left origin, y-up).
_TO_METRES = {
    "statsbomb": statsbomb_to_metres,
    "opta": opta_to_metres,
    "wyscout": wyscout_to_metres,
    "epts": epts_to_metres,
    "metres": lambda x, y: (x, y),
}


# Inverse maps: FROM metres TO each provider's native units.
def metres_to_statsbomb(x, y):
    """Metres -> StatsBomb 120x80 (y flipped back to point downward)."""
    return x * (120.0 / 105.0), (68.0 - y) * (80.0 / 68.0)


def metres_to_opta(x, y):
    """Metres -> Opta 100x100 (y stays upward)."""
    return x * (100.0 / 105.0), y * (100.0 / 68.0)


def metres_to_wyscout(x, y):
    """Metres -> Wyscout 100x100 (y flipped back to point downward)."""
    return x * (100.0 / 105.0), (68.0 - y) * (100.0 / 68.0)


def metres_to_epts(x, y):
    """Metres -> FIFA EPTS (origin moved back to the centre spot)."""
    return x - 52.5, y - 34.0


_FROM_METRES = {
    "statsbomb": metres_to_statsbomb,
    "opta": metres_to_opta,
    "wyscout": metres_to_wyscout,
    "epts": metres_to_epts,
    "metres": lambda x, y: (x, y),
}


def convert_coordinates(
    x: np.ndarray,
    y: np.ndarray,
    source: CoordSystem,
    target: CoordSystem,
) -> tuple[np.ndarray, np.ndarray]:
    """Convert coordinates between any two supported systems via metres."""
    x_m, y_m = _TO_METRES[source](x, y)
    return _FROM_METRES[target](x_m, y_m)
```
6.2.6 Validating Your Coordinate Transformations
A simple but effective validation strategy is to convert known landmark positions and check that they land where expected. The penalty spot, for example, should always map to a point 11.0 m from the goal line and centered on the pitch width.
```python
def validate_transformation(source: CoordSystem) -> None:
    """Validate a coordinate transformation by converting known landmarks to metres."""
    # Expected positions in metres: centre spot (52.5, 34.0) and attacking
    # penalty spot (105 - 11 = 94.0, 34.0).
    landmarks = {
        # StatsBomb places the penalty spot at x = 108 on its own grid; linear
        # scaling maps this to 94.5 m, about 0.5 m from the true 94.0 m.
        "statsbomb": {"centre": (60.0, 40.0), "penalty_spot_attacking": (108.0, 40.0)},
        "opta": {"centre": (50.0, 50.0), "penalty_spot_attacking": (94.0 / 105 * 100, 50.0)},
        "wyscout": {"centre": (50.0, 50.0), "penalty_spot_attacking": (94.0 / 105 * 100, 50.0)},
        "epts": {"centre": (0.0, 0.0), "penalty_spot_attacking": (41.5, 0.0)},
    }
    if source not in landmarks:
        return
    for name, (x, y) in landmarks[source].items():
        x_m, y_m = convert_coordinates(
            np.array([x]), np.array([y]), source, "metres"
        )
        print(f"{source} {name}: ({x:.2f}, {y:.2f}) -> metres ({x_m[0]:.1f}, {y_m[0]:.1f})")
```
Best Practice: Always validate your coordinate transformations before running a full analysis. Converting two or three known landmarks takes thirty seconds and can save hours of debugging caused by a flipped axis or incorrect scale factor.
6.2.7 Direction of Play and Half Alignment
Most providers encode events so that a team always attacks in one direction (e.g., left-to-right). When they do not, or when you need to compare first-half and second-half data, you must flip the coordinates for one half:
$$x_{\text{flipped}} = L - x, \qquad y_{\text{flipped}} = W - y$$
where $L$ and $W$ are the pitch dimensions in the current system.
This is particularly relevant when working with tracking data, where the raw coordinates reflect the actual physical position on the pitch and teams switch ends at half-time. Without aligning the direction of play, a team's first-half attacking actions would appear on one side of the pitch and its second-half attacking actions on the other. Heat maps, pass maps, and shot maps built from such data are unreadable.
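Here is a minimal sketch of this flip for an event table. It assumes columns named `period`, `x` and `y` in the 105 x 68 metre frame. It also assumes you have already determined which periods need flipping for the team of interest. The function name and column names are hypothetical.

```python
import pandas as pd


def align_direction_of_play(
    df: pd.DataFrame,
    flip_periods=(2,),
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> pd.DataFrame:
    """Rotate coordinates by 180 degrees in the given periods.

    Applies x -> L - x and y -> W - y so the team attacks the same way in
    every period. Assumes hypothetical columns 'period', 'x' and 'y'.
    """
    out = df.copy()
    mask = out["period"].isin(flip_periods)
    out.loc[mask, "x"] = pitch_length - out.loc[mask, "x"]
    out.loc[mask, "y"] = pitch_width - out.loc[mask, "y"]
    return out


# The same attacking position recorded at opposite ends in each half
events = pd.DataFrame({"period": [1, 2], "x": [90.0, 15.0], "y": [30.0, 38.0]})
aligned = align_direction_of_play(events)
```

After alignment, both rows describe the same spot, (90, 30).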
Common Pitfall: Forgetting to align direction of play is one of the most frequent sources of error in spatial analytics. If your heat map shows a striker spending most of the match in their own half, check the direction-of-play flag before questioning the data.
6.2.8 Handling Edge Cases and Data Quality
Real-world coordinate data is not always clean. Common issues include:
- Coordinates outside the pitch boundary. Events near the touchline or goal line may have coordinates slightly outside [0, L] x [0, W] due to measurement imprecision. Clipping to the valid range is usually appropriate.
- Missing coordinates. Some events (substitutions, cards) may not have meaningful spatial coordinates. These should be filtered before spatial analysis.
- Duplicate events. Some providers may record the same physical event multiple times (e.g., a pass from both the passer's and receiver's perspective). Deduplication is essential.
- Coordinate precision. StatsBomb provides coordinates to one decimal place; Opta provides integers. This affects the effective spatial resolution of your analysis.
```python
def clean_coordinates(
    x: np.ndarray,
    y: np.ndarray,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> tuple[np.ndarray, np.ndarray]:
    """Clean coordinates by clipping to pitch boundaries and removing NaN values."""
    # Drop entries with a missing (NaN) or infinite coordinate. The returned
    # arrays are shorter than the inputs whenever entries are dropped.
    mask = np.isfinite(x) & np.isfinite(y)
    x_clean = np.clip(x[mask], 0, pitch_length)
    y_clean = np.clip(y[mask], 0, pitch_width)
    return x_clean, y_clean
```
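Because `clean_coordinates` drops entries, its output no longer lines up with the rows of the original event table. When the coordinates live in a pandas DataFrame, a sketch like the following keeps every column aligned while handling missing coordinates, boundary overshoot and duplicates. The column names are assumptions, and so is the choice of columns that define a duplicate. Real duplicate records may differ in other fields, so check your provider's documentation.

```python
import numpy as np
import pandas as pd


def clean_event_frame(
    df: pd.DataFrame,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    key_cols: tuple[str, ...] = ("period", "timestamp", "player", "type", "x", "y"),
) -> pd.DataFrame:
    """Drop events without coordinates, clip to the pitch, and remove duplicates."""
    out = df.dropna(subset=["x", "y"]).copy()      # e.g., substitutions, cards
    out["x"] = out["x"].clip(0.0, pitch_length)    # measurement overshoot
    out["y"] = out["y"].clip(0.0, pitch_width)
    out = out.drop_duplicates(subset=list(key_cols))  # assumed duplicate key
    return out.reset_index(drop=True)


events = pd.DataFrame({
    "period": [1, 1, 1, 1],
    "timestamp": [10.0, 10.0, 12.5, 20.0],
    "player": ["A", "A", "B", "C"],
    "type": ["Pass", "Pass", "Pass", "Substitution"],
    "x": [50.0, 50.0, 106.2, np.nan],
    "y": [30.0, 30.0, -0.4, np.nan],
})
cleaned = clean_event_frame(events)
```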
6.3 Zones and Regions of Interest
6.3.1 Why Zones?
Raw $(x, y)$ coordinates are precise but not always interpretable. Coaches and analysts think in terms of regions: "the final third", "the left half-space", "the box". Zoning the pitch bridges the gap between continuous spatial data and the discrete language of tactical analysis.
Zoning also helps with sample size. A single pixel on a heat map may contain only one or two events; grouping into zones aggregates events into bins large enough for meaningful statistics. This is particularly important for metrics like "completion rate in zone X" or "xG per shot from zone Y," which require enough events in each zone to produce stable estimates.
6.3.2 Thirds
The simplest zoning scheme divides the pitch into three equal longitudinal strips:
| Zone | $x$-range (metres) | Tactical meaning |
|---|---|---|
| Defensive third | 0.0 -- 35.0 | Build-up under pressure |
| Middle third | 35.0 -- 70.0 | Progression and transition |
| Attacking third (final third) | 70.0 -- 105.0 | Chance creation |
```python
def assign_third(x: np.ndarray, pitch_length: float = 105.0) -> np.ndarray:
    """Assign each x-coordinate to a pitch third (1=defensive, 2=middle, 3=attacking)."""
    third_length = pitch_length / 3
    return np.clip(np.floor(x / third_length).astype(int) + 1, 1, 3)
```
The thirds partition is the most commonly used zoning scheme in broadcast analysis and coaching discussions. Phrases like "we need to be better in the final third" or "they won the ball back in the middle third" are part of the everyday vocabulary of the sport. Mapping these phrases to precise $x$-coordinate ranges allows analysts to quantify concepts that coaches describe qualitatively.
Intuition: The three-thirds partition roughly corresponds to three phases of play: build-up (defensive third), progression (middle third), and creation/finishing (attacking third). While the boundaries are somewhat arbitrary, they align well with how coaches organize their tactical plans.
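Zone-based summary statistics follow directly from a zone assignment. The sketch below computes the number of passes and the completion rate for passes starting in each third. It uses the same floor-and-clip rule as `assign_third`, so boundary values land in the same zone. It assumes a DataFrame with an `x` column in metres (attacking toward $x = 105$) and a boolean `completed` column. Both column names, and the function name, are hypothetical.

```python
import numpy as np
import pandas as pd


def completion_by_third(passes: pd.DataFrame, pitch_length: float = 105.0) -> pd.DataFrame:
    """Pass count and completion rate by the third in which each pass starts."""
    labels = np.array(["defensive", "middle", "attacking"])
    # Same rule as assign_third: floor(x / (L/3)), clipped to the three thirds
    idx = np.clip(np.floor(passes["x"].to_numpy() / (pitch_length / 3)).astype(int), 0, 2)
    third = pd.Categorical(labels[idx], categories=labels)
    summary = passes.groupby(third, observed=False)["completed"].agg(["count", "mean"])
    return summary.rename(columns={"count": "passes", "mean": "completion_rate"})


passes = pd.DataFrame({
    "x": [10.0, 20.0, 50.0, 80.0, 90.0, 100.0],
    "completed": [True, True, True, False, True, False],
})
summary = completion_by_third(passes)
```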
6.3.3 Channels and Half-Spaces
German-language tactical analysis popularized the division of the pitch into five vertical channels, including the two half-spaces (Halbräume):
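(Ranges are given in the bottom-left, $y$-up metre frame with play toward $x = 105$, so $y = 0$ is the attacking team's right touchline.)
| Channel | $y$-range (metres) | Width |
|---|---|---|
| Right wing | 0.0 -- 11.33 | ~11.3 m |
| Right half-space | 11.33 -- 26.83 | ~15.5 m |
| Centre | 26.83 -- 41.17 | ~14.3 m |
| Left half-space | 41.17 -- 56.67 | ~15.5 m |
| Left wing | 56.67 -- 68.0 | ~11.3 m |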
The half-spaces are considered the most dangerous areas because they sit between the centre-backs and the full-backs, offering diagonal passing and shooting angles. A player who receives the ball in the half-space can face the goal diagonally, giving them a wider view of the pitch and more options than a player on the wing (who faces the byline) or in the centre (who faces a wall of defenders).
The exact boundaries of the half-spaces vary by source. Some analysts use equal fifths of the pitch width (each 13.6 m), while others use the boundaries defined by the penalty area width and the touchlines. The version above is based on the influential work of René Marić and other German-language tactical writers.
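The channel table translates into a lookup with `np.digitize`. This sketch uses the tabulated boundaries, plus an alternative set of equal-fifth boundaries for comparison. The function names are our own. Channels are returned as indices 0 to 4, counted upward from $y = 0$; map them to left/right labels according to your frame's orientation.

```python
import numpy as np

# Interior channel boundaries in metres across a 68 m-wide pitch (from the table)
TABLE_EDGES = np.array([11.33, 26.83, 41.17, 56.67])


def equal_fifth_edges(pitch_width: float = 68.0) -> np.ndarray:
    """Alternative: five equal channels, each pitch_width / 5 wide (13.6 m)."""
    return np.linspace(0.0, pitch_width, 6)[1:-1]


def assign_channel(y, edges: np.ndarray = TABLE_EDGES) -> np.ndarray:
    """Channel index 0-4 for each y, counted upward from y = 0.

    Index 0 is the wing touching y = 0, index 2 is the centre, and
    indices 1 and 3 are the half-spaces.
    """
    return np.digitize(np.asarray(y, dtype=float), edges)


points = [5.0, 12.5, 34.0, 50.0, 60.0]
channels = assign_channel(points)                              # table boundaries
channels_fifths = assign_channel(points, equal_fifth_edges())  # equal fifths
```

A position at $y = 12.5$ m falls in a half-space under the table's boundaries but on the wing under equal fifths. This illustrates how much the choice of boundaries can matter near the edges.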
Real-World Application: Pep Guardiola's Manchester City teams from 2017 onward systematically exploit the half-spaces. By measuring the proportion of progressive passes entering these channels, analysts can quantify a team's tactical adherence to positional play. Data from the 2022-23 season showed that Manchester City directed nearly 40% of their final-third entries through the half-spaces, compared to a league average of roughly 28%.
6.3.4 Zone 14
Zone 14 is a concept from the seminal work of analysts at Prozone (now part of Stats Perform). It refers to the central area just outside the penalty box, roughly defined as:
$$70.0 \leq x \leq 88.5, \qquad 20.5 \leq y \leq 47.5$$
(in standard metres). This zone is considered critical because actions there -- especially through-balls and shots -- correlate strongly with goals. Zone 14 offers a combination of proximity to goal and lateral space that makes it the most dangerous area on the pitch for creative actions.
Research has shown that passes completed into Zone 14 have a significantly higher probability of leading to a shot within the next five seconds compared to passes completed into adjacent zones. This makes "entries into Zone 14" a useful proxy for creative output, particularly for evaluating central midfielders and attacking midfielders.
```python
def is_zone_14(
    x: np.ndarray,
    y: np.ndarray,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> np.ndarray:
    """Return a boolean mask for events in Zone 14."""
    x_min = pitch_length * (2 / 3)  # 70.0
    x_max = pitch_length - 16.5  # 88.5 (edge of penalty area)
    y_min = (pitch_width / 2) - 13.5  # 20.5
    y_max = (pitch_width / 2) + 13.5  # 47.5
    return (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)
```
6.3.5 Custom Grid Zoning
For more granular analysis, we can overlay a regular grid. A common choice is a 6 x 4 grid (24 zones), but any resolution is possible.
```python
def assign_grid_zone(
    x: np.ndarray,
    y: np.ndarray,
    n_x: int = 6,
    n_y: int = 4,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> tuple[np.ndarray, np.ndarray]:
    """Assign events to cells in an n_x by n_y grid.

    Returns:
        col: column index (0 to n_x-1, left to right)
        row: row index (0 to n_y-1, bottom to top)
    """
    col = np.clip(
        np.floor(x / (pitch_length / n_x)).astype(int), 0, n_x - 1
    )
    row = np.clip(
        np.floor(y / (pitch_width / n_y)).astype(int), 0, n_y - 1
    )
    return col, row
```
The choice of grid resolution depends on the analysis. A 6 x 4 grid (24 zones) provides a good balance between tactical interpretability and statistical sample size. A 12 x 8 grid (96 zones) offers more spatial detail but requires a larger dataset to produce stable estimates in each zone. For expected threat models (Chapter 13), grids of 12 x 8 or 16 x 12 are common.
Best Practice: When choosing a grid resolution, compute the average number of events per cell. If any cell has fewer than 20-30 events, your zone-based statistics will be unstable. Either reduce the resolution, combine adjacent cells, or use kernel density estimation instead.
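A quick way to run this check is to count events per cell with `np.histogram2d`. It bins along $x$ first and $y$ second, so `counts[col, row]` follows the same column and row ordering as `assign_grid_zone`. The synthetic uniform data below is only for illustration. Real event locations are far from uniform, so the sparsest cells matter more than the average.

```python
import numpy as np


def events_per_cell(x, y, n_x=6, n_y=4, pitch_length=105.0, pitch_width=68.0):
    """Count events in each cell of an n_x by n_y grid; result has shape (n_x, n_y)."""
    counts, _, _ = np.histogram2d(
        x, y, bins=[n_x, n_y], range=[[0.0, pitch_length], [0.0, pitch_width]]
    )
    return counts


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 105.0, size=1000)
y = rng.uniform(0.0, 68.0, size=1000)

coarse = events_per_cell(x, y)        # 6 x 4 = 24 cells, about 42 events each on average
fine = events_per_cell(x, y, 12, 8)   # 12 x 8 = 96 cells, about 10 events each on average
sparse_fine = int((fine < 20).sum())  # cells below the 20-event rule of thumb
```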
6.3.6 The Penalty Area and the "Box"
The penalty area extends 16.5 m from each goal post and 16.5 m into the pitch. In standard metres (origin bottom-left):
$$88.5 \leq x \leq 105.0, \qquad 13.84 \leq y \leq 54.16$$
where $y_{\text{min}} = (68 - 7.32)/2 - 16.5 = 13.84$ and $y_{\text{max}} = 68 - 13.84 = 54.16$.
More precisely, using the IFAB definition:
- Goal posts are at $y = (68 - 7.32)/2 = 30.34$ and $y = (68 + 7.32)/2 = 37.66$.
- The penalty area extends 16.5 m outward from each post along the goal line, then 16.5 m into the field.
- So the penalty area spans $y \in [30.34 - 16.5, 37.66 + 16.5] = [13.84, 54.16]$ and $x \in [88.5, 105.0]$.
```python
def is_in_penalty_area(
    x: np.ndarray,
    y: np.ndarray,
    attacking: bool = True,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> np.ndarray:
    """Return a boolean mask for events inside the penalty area."""
    pa_depth = 16.5
    pa_half_width = 7.32 / 2 + 16.5  # 20.16
    y_centre = pitch_width / 2
    if attacking:
        x_check = (x >= pitch_length - pa_depth) & (x <= pitch_length)
    else:
        x_check = (x >= 0) & (x <= pa_depth)
    y_check = (y >= y_centre - pa_half_width) & (y <= y_centre + pa_half_width)
    return x_check & y_check
```
Intuition: Each penalty area covers roughly 9.3% of the total pitch area (16.5 x 40.32 = 665 m^2 out of 105 x 68 = 7,140 m^2), yet a disproportionate share of goals originate from actions inside it. This is why zone-based shot analysis almost always starts with an "inside/outside the box" split.
6.3.7 Combining Zones: The 15-Zone and 18-Zone Models
Sophisticated tactical analysis often uses hybrid zoning schemes that combine thirds and channels. A 3 x 5 grid (thirds x channels) produces 15 zones. Splitting the three central attacking-third zones (the two half-spaces and the centre) into an inside-the-box part and an outside-the-box part adds three more zones. The result is an 18-zone model that is widely used in professional scouting platforms.
These hybrid models allow analysts to answer questions like: "How many progressive passes did this midfielder complete into the left half-space in the attacking third?" or "What percentage of opponent entries into our defensive third came through the wings versus the centre?" Such questions are fundamental to tactical analysis and recruitment.
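A 15-zone (thirds x channels) label can be built by combining the two lookups. The sketch below assumes the metre frame, third boundaries at 35 m and 70 m, and the channel boundaries tabulated in Section 6.3.3. The zone numbering (third index x 5 + channel index, with channels counted upward from $y = 0$) is our own convention. Counting pass end locations per zone then answers questions like the ones above.

```python
import numpy as np

THIRD_EDGES = np.array([35.0, 70.0])                    # along the 105 m length
CHANNEL_EDGES = np.array([11.33, 26.83, 41.17, 56.67])  # across the 68 m width


def zone_15(x, y) -> np.ndarray:
    """Zone id 0-14 = third index (0-2) * 5 + channel index (0-4)."""
    third = np.digitize(np.asarray(x, dtype=float), THIRD_EDGES)
    channel = np.digitize(np.asarray(y, dtype=float), CHANNEL_EDGES)
    return third * 5 + channel


# End locations of three hypothetical passes
end_x = np.array([80.0, 80.0, 10.0])
end_y = np.array([20.0, 34.0, 5.0])
zones = zone_15(end_x, end_y)
passes_per_zone = np.bincount(zones, minlength=15)
```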
6.4 Visualizing Spatial Data
6.4.1 Drawing the Pitch
Before plotting any data, we need to draw the pitch itself. The mplsoccer library provides production-ready pitch drawings with minimal code:
```python
from mplsoccer import Pitch

# StatsBomb-style pitch (120 x 80)
pitch = Pitch(pitch_type="statsbomb", pitch_color="grass", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
```
You can also specify other pitch types: "opta", "wyscout", "uefa", "metricasports", "custom", or "tracab".
The mplsoccer library, developed by Andrew Rowlinson, has become the standard tool for pitch visualizations in the Python soccer analytics community. It handles the fiddly geometry of pitch markings (arcs, circles, penalty areas) and provides a consistent API for overlaying data on the pitch.
Best Practice: When publishing visualizations, use `pitch_color="#1a1a2e"` (dark background) with `line_color="#e0e0e0"` (light lines) for readability. A grass texture can obscure subtle data patterns, and dark backgrounds make colored data points and heat map overlays more visually prominent.
6.4.2 Scatter Plots
The simplest spatial visualization is a scatter plot of event locations:
```python
import matplotlib.pyplot as plt
import pandas as pd
from mplsoccer import Pitch


def plot_events(
    df: pd.DataFrame,
    x_col: str = "x",
    y_col: str = "y",
    pitch_type: str = "statsbomb",
    title: str = "Event Locations",
) -> tuple[plt.Figure, plt.Axes]:
    """Plot events as a scatter plot on a soccer pitch."""
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    pitch.scatter(
        df[x_col], df[y_col], ax=ax, s=30, color="#e74c3c", edgecolors="white",
        linewidth=0.5, alpha=0.7, zorder=2,
    )
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Scatter plots work best when the number of events is small enough that individual points are distinguishable --- typically fewer than 200 events. For a single match's passes (typically 400-600 per team), scatter plots become cluttered. For a season's worth of touches (thousands of events), they are almost unreadable. In these cases, heat maps or binned statistics are more appropriate.
6.4.3 Shot Maps and Their Construction
Shot maps are one of the most popular and intuitive spatial visualizations in soccer analytics. They plot each shot as a point on the pitch, with visual encoding (color, size, marker shape) conveying additional information about the shot's outcome and quality.
A well-designed shot map encodes:
- Position: The $(x, y)$ location of the shot.
- Outcome: Encoded by color (as in the code below) or by marker style, e.g., filled markers for goals, open markers for saves, and distinct shapes for blocked and off-target shots.
- xG value: Marker size proportional to the shot's expected goals value.
```python
import matplotlib.pyplot as plt
import pandas as pd
from mplsoccer import Pitch


def plot_shot_map(
    df: pd.DataFrame,
    pitch_type: str = "statsbomb",
    title: str = "Shot Map",
) -> tuple[plt.Figure, plt.Axes]:
    """Plot a shot map with xG-sized markers and outcome coloring.

    Expects columns 'x', 'y', 'xg' and 'outcome' in pitch_type's coordinates.
    """
    pitch = Pitch(
        pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc",
        half=True,  # Show only the attacking half
    )
    fig, ax = pitch.draw(figsize=(12, 8))
    # Define outcome colors
    outcome_colors = {
        "Goal": "#2ecc71",
        "Saved": "#f39c12",
        "Blocked": "#95a5a6",
        "Off Target": "#e74c3c",
        "Off T": "#e74c3c",  # StatsBomb's raw label for off-target shots
        "Wayward": "#e74c3c",
        "Post": "#3498db",
    }
    for outcome, group in df.groupby("outcome"):
        color = outcome_colors.get(outcome, "#ffffff")  # white for unmapped labels
        pitch.scatter(
            group["x"], group["y"], ax=ax,
            s=group["xg"] * 500 + 20,  # Scale marker area by xG
            color=color, edgecolors="white", linewidth=0.8,
            alpha=0.8, zorder=3, label=outcome,
        )
    ax.legend(loc="upper left", fontsize=9, framealpha=0.7)
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Real-World Application: Shot maps are a staple of post-match analysis in professional clubs and media outlets. They immediately reveal patterns such as: "This team takes too many shots from outside the box," or "This striker consistently gets into high-xG positions." The visual clarity of a shot map often communicates these patterns more effectively than a table of numbers.
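Patterns like these can also be checked numerically alongside the map. A minimal sketch, assuming StatsBomb-style coordinates (120 x 80, attacking toward x = 120) and the same `x`, `y`, `xg`, `outcome` columns used by `plot_shot_map`; the penalty-area bounds follow from the box being 18 units deep and 44 units wide on that grid. The helper name `shot_profile` is our own.

```python
import pandas as pd


def shot_profile(
    df: pd.DataFrame,
    box_x_min: float = 102.0,
    box_y_min: float = 18.0,
    box_y_max: float = 62.0,
) -> dict[str, float]:
    """Summarize shot locations and chance quality for one team or player.

    Assumes StatsBomb-style coordinates and columns 'x', 'y', 'xg', 'outcome'.
    """
    # Penalty area on the StatsBomb grid: x >= 102 and 18 <= y <= 62
    in_box = (df["x"] >= box_x_min) & df["y"].between(box_y_min, box_y_max)
    outside = ~in_box
    return {
        "share_outside_box": float(outside.mean()),
        "xg_per_shot": float(df["xg"].mean()),
        "xg_per_shot_outside_box": float(df.loc[outside, "xg"].mean()),
        # Positive values mean more goals than the chances were worth
        "goals_minus_xg": float((df["outcome"] == "Goal").sum() - df["xg"].sum()),
    }
```

A high `share_outside_box` combined with a low `xg_per_shot` is the numerical signature of "too many shots from distance".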
6.4.4 Arrow Plots for Passes
Passes have a start and end location, making arrow plots natural:
```python
from mplsoccer import Pitch
import pandas as pd
import matplotlib.pyplot as plt


def plot_passes(
    df: pd.DataFrame,
    pitch_type: str = "statsbomb",
    title: str = "Pass Map",
) -> tuple[plt.Figure, plt.Axes]:
    """Plot passes as arrows on a soccer pitch.

    Expects columns 'x', 'y', 'end_x', 'end_y', and 'outcome'.
    """
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    # Separate completed and incomplete passes
    completed = df[df["outcome"] == "Complete"]
    incomplete = df[df["outcome"] != "Complete"]
    pitch.arrows(
        completed["x"], completed["y"],
        completed["end_x"], completed["end_y"],
        ax=ax, color="#2ecc71", width=2, headwidth=5, alpha=0.6,
    )
    pitch.arrows(
        incomplete["x"], incomplete["y"],
        incomplete["end_x"], incomplete["end_y"],
        ax=ax, color="#e74c3c", width=1, headwidth=4, alpha=0.4,
    )
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Pass maps can be filtered by pass type (progressive, backward, lateral, cross, through ball) to isolate specific tactical patterns. For example, plotting only progressive passes (those that move the ball significantly closer to the opponent's goal) reveals a team's preferred build-up routes.
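"Significantly closer" needs an operational threshold before it can be filtered. The sketch below uses an illustrative rule of our own choosing: a pass counts as progressive if the ball's distance to the centre of the opponent's goal shrinks by at least 25%. Providers and analysts use different definitions, so treat the threshold as a parameter. Coordinates are assumed StatsBomb-style, with the goal centre at (120, 40).

```python
import numpy as np
import pandas as pd


def progressive_mask(
    df: pd.DataFrame,
    goal_x: float = 120.0,
    goal_y: float = 40.0,
    min_reduction: float = 0.25,
) -> pd.Series:
    """Flag passes that end at least `min_reduction` (a fraction) closer to goal.

    Illustrative rule only. Assumes columns 'x', 'y', 'end_x', 'end_y'.
    """
    start_dist = np.hypot(goal_x - df["x"], goal_y - df["y"])
    end_dist = np.hypot(goal_x - df["end_x"], goal_y - df["end_y"])
    return end_dist <= (1.0 - min_reduction) * start_dist
```

The mask plugs straight into the pass map, e.g. `plot_passes(passes[progressive_mask(passes)], title="Progressive Passes")`.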
6.4.5 Pass Networks
A pass network goes beyond individual pass arrows to show the average positions of players and the volume of passes between each pair. Pass networks reveal a team's tactical shape and the key passing connections within the squad.
To construct a pass network:
- Compute each player's average position (mean $x$, mean $y$) based on their on-ball actions.
- Count the number of passes between each pair of players who were on the pitch simultaneously.
- Plot the average positions as nodes and the pass counts as edges, with node size proportional to involvement and edge width proportional to pass volume.
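Steps 1 and 2 can be sketched with pandas. This minimal version makes simplifying assumptions: node positions come from pass origins only (averaging all on-ball actions, as step 1 describes, needs the full event table), involvement is passes made plus passes received, and the input is already restricted to completed passes in a window when the same eleven players were on the pitch. Players who only ever receive the ball get no node here. The resulting table has the `player`, `avg_x`, `avg_y`, `count` columns that `plot_pass_network` expects; edge counts are computed inside that function.

```python
import pandas as pd


def build_node_table(df_passes: pd.DataFrame) -> pd.DataFrame:
    """Average positions and involvement counts for pass-network nodes.

    Assumes columns 'passer', 'recipient', 'x', 'y' (pass origin).
    """
    # Step 1 (simplified): average location of each player's passes
    made = df_passes.groupby("passer").agg(
        avg_x=("x", "mean"), avg_y=("y", "mean"), made=("x", "count")
    )
    # Passes received, combined with passes made as "involvement"
    received = df_passes.groupby("recipient").size().rename("received")
    nodes = made.join(received, how="left").fillna({"received": 0})
    nodes["count"] = nodes["made"] + nodes["received"]
    return nodes.rename_axis("player").reset_index()[["player", "avg_x", "avg_y", "count"]]
```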
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mplsoccer import Pitch


def plot_pass_network(
    df_passes: pd.DataFrame,
    df_positions: pd.DataFrame,
    min_passes: int = 3,
    pitch_type: str = "statsbomb",
    title: str = "Pass Network",
) -> tuple[plt.Figure, plt.Axes]:
    """Plot a pass network showing average positions and pass connections.

    Args:
        df_passes: DataFrame with columns ['passer', 'recipient', 'x', 'y'].
        df_positions: DataFrame with columns ['player', 'avg_x', 'avg_y', 'count'].
        min_passes: Minimum number of passes between a pair to draw an edge.
        pitch_type: mplsoccer pitch type matching the data's coordinate system.
        title: Figure title.
    """
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    # Draw edges (pass connections); counts are directional, so A->B and B->A
    # are drawn as separate, overlapping lines
    pass_counts = df_passes.groupby(["passer", "recipient"]).size().reset_index(name="count")
    pass_counts = pass_counts[pass_counts["count"] >= min_passes]
    for _, row in pass_counts.iterrows():
        passer_pos = df_positions[df_positions["player"] == row["passer"]]
        recipient_pos = df_positions[df_positions["player"] == row["recipient"]]
        if len(passer_pos) == 0 or len(recipient_pos) == 0:
            continue  # Skip pairs where either player has no average position
        pitch.lines(
            passer_pos["avg_x"].values, passer_pos["avg_y"].values,
            recipient_pos["avg_x"].values, recipient_pos["avg_y"].values,
            ax=ax, lw=row["count"] / 3, color="white", alpha=0.4, zorder=2,
        )
    # Draw nodes (player positions), sized by involvement
    pitch.scatter(
        df_positions["avg_x"], df_positions["avg_y"], ax=ax,
        s=df_positions["count"] * 3, color="#e74c3c",
        edgecolors="white", linewidth=1.5, zorder=3,
    )
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Intuition: Pass networks are to soccer what circuit diagrams are to electronics. They show how information (the ball) flows through the system (the team). A highly centralized network (with one player receiving and distributing most passes) looks very different from a decentralized one (with even distribution across players). These structural differences correspond to different tactical philosophies.
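One way to put a number on the centralized versus decentralized distinction is to measure how unevenly involvement is spread across players. The sketch below uses the Gini coefficient of the per-player `count` column (for example, passes made plus received). This is our illustrative choice of summary; graph-theoretic centrality measures are a common alternative.

```python
import numpy as np


def involvement_gini(counts: np.ndarray) -> float:
    """Gini coefficient of pass involvement across players.

    0 means every player is equally involved; the maximum, (n - 1) / n,
    means one player is involved in every pass.
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)  # ranks of the sorted values, 1..n
    return float(2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n)
```

A team that funnels play through a single playmaker will score noticeably higher than one that circulates the ball evenly.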
6.4.6 Binned Statistics (Grid Heat Maps)
Rather than plotting every event, we can count events per zone:
```python
from mplsoccer import Pitch
import numpy as np
import matplotlib.pyplot as plt


def plot_bin_statistic(
    x: np.ndarray,
    y: np.ndarray,
    pitch_type: str = "statsbomb",
    bins: tuple[int, int] = (6, 4),
    title: str = "Binned Event Count",
) -> tuple[plt.Figure, plt.Axes]:
    """Plot a binned statistic (count) on the pitch.

    bins is (number of bins along the length, number across the width).
    """
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    bin_stat = pitch.bin_statistic(x, y, statistic="count", bins=bins)
    pitch.heatmap(bin_stat, ax=ax, cmap="hot", edgecolors="#1a1a2e")
    # Centre each count label within its bin
    pitch.label_heatmap(bin_stat, ax=ax, str_format="{:.0f}", color="white", fontsize=12,
                        ha="center", va="center")
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Common Pitfall: The choice of bin size dramatically affects what patterns you see. Too few bins mask spatial detail; too many create noisy, hard-to-read maps. Start with a coarse grid such as 6 x 4 (each third split in two along the length, four bands across the width) for a tactically readable first look, then refine as needed.
6.4.7 Working with Pitch Templates and Overlays
In professional settings, analysts often work with custom pitch templates that include team branding, sponsor logos, or tactical annotations. The mplsoccer library supports custom pitch dimensions and colors, making it straightforward to create branded visualizations.
Common overlays include:
- Zonal boundaries: Lines marking thirds, channels, or half-spaces.
- Defensive and pressing lines: Horizontal lines showing the average position of a team's defensive or pressing line.
- Expected threat contours: Contour lines showing the spatial distribution of threat across the pitch (see Section 6.7).
- Formation templates: Dots showing the expected positions in a given formation (e.g., 4-3-3, 3-5-2).
```python
from mplsoccer import Pitch
import matplotlib.pyplot as plt


def plot_pitch_with_zones(
    pitch_type: str = "statsbomb",
    pitch_length: float = 120.0,
    pitch_width: float = 80.0,
) -> tuple[plt.Figure, plt.Axes]:
    """Draw a pitch with third and channel boundary overlays.

    pitch_length and pitch_width must match the chosen pitch_type
    (120 x 80 for StatsBomb).
    """
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    # Draw third boundaries
    for frac in [1/3, 2/3]:
        ax.axvline(x=frac * pitch_length, color="#555555", linestyle="--",
                   linewidth=1, alpha=0.6)
    # Draw channel boundaries (approximate): wide, half-space, centre, half-space, wide
    for frac in [1/6, 2/6, 4/6, 5/6]:
        ax.axhline(y=frac * pitch_width, color="#555555", linestyle="--",
                   linewidth=1, alpha=0.4)
    ax.set_title("Pitch with Zone Boundaries", fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
6.5 Heat Maps and Density Estimation
6.5.1 From Bins to Smooth Surfaces
Binned heat maps are easy to understand, but they suffer from boundary effects: an event at $x = 35.1$ falls in the middle third, while one at $x = 34.9$ falls in the defensive third, even though they are 20 cm apart. Kernel Density Estimation (KDE) solves this by replacing each data point with a smooth "bump" (a kernel function) and summing all bumps to produce a continuous density surface.
The philosophical difference is important. Binned heat maps treat the pitch as a collection of discrete zones. KDE treats it as a continuous surface where density varies smoothly from point to point. The continuous approach is more faithful to the underlying reality: a player's influence does not abruptly change at an arbitrary zonal boundary.
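A quick numerical check makes the boundary effect concrete. The sketch below uses synthetic, uniformly scattered events on a 105 x 68 m pitch (illustrative data only): `np.digitize` places two points 20 cm apart into different thirds, while a SciPy Gaussian KDE fitted to the same events gives them almost the same density.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic events scattered over a 105 x 68 m pitch (illustrative data only)
rng = np.random.default_rng(0)
events = np.vstack([rng.uniform(0, 105, 500), rng.uniform(0, 68, 500)])

third_edges = [0.0, 35.0, 70.0, 105.0]
point_a = np.array([[34.9], [34.0]])  # (x, y) as a column, 10 cm short of the line
point_b = np.array([[35.1], [34.0]])  # 10 cm past the line

# Binned view: np.digitize returns 1 for the defensive third, 2 for the middle third
bin_a = int(np.digitize(point_a[0, 0], third_edges))
bin_b = int(np.digitize(point_b[0, 0], third_edges))

# Smooth view: the KDE density is almost identical at the two points
kde = gaussian_kde(events)
density_ratio = float(kde(point_b)[0] / kde(point_a)[0])
print(bin_a, bin_b, round(density_ratio, 3))
```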
6.5.2 Kernel Density Estimation: Theory
Given $N$ observed positions $(x_i, y_i)$, the KDE estimate of the density at any point $(x, y)$ is:
$$\hat{f}(x, y) = \frac{1}{N} \sum_{i=1}^{N} K_H\!\left(\begin{pmatrix} x - x_i \\ y - y_i \end{pmatrix}\right)$$
where $K_H$ is a bivariate kernel function with bandwidth matrix $H$. The most common choice is the Gaussian kernel:
$$K_H(\mathbf{u}) = \frac{1}{2\pi |H|^{1/2}} \exp\!\left(-\frac{1}{2} \mathbf{u}^T H^{-1} \mathbf{u}\right)$$
For practical purposes, we often use a diagonal bandwidth matrix $H = \text{diag}(h_x^2, h_y^2)$, which allows different smoothing in the $x$ and $y$ directions. This is useful because the pitch is longer than it is wide, and the natural scale of variation may differ along the two axes.
Other kernel choices include the Epanechnikov kernel, which has compact support and minimizes the asymptotic mean integrated squared error among non-negative kernels, and the uniform kernel, which produces a simple moving-window average. In practice, the choice of kernel matters less than the choice of bandwidth.
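The estimator above translates almost line for line into NumPy. This sketch assumes the diagonal bandwidth matrix $H = \text{diag}(h_x^2, h_y^2)$, with $h_x$ and $h_y$ given directly in pitch units (metres here). It builds an array of shape (evaluation points, observations), so very large inputs such as tracking data would need to be processed in chunks.

```python
import numpy as np


def kde_diagonal(
    x_obs: np.ndarray,
    y_obs: np.ndarray,
    x_eval: np.ndarray,
    y_eval: np.ndarray,
    h_x: float,
    h_y: float,
) -> np.ndarray:
    """Evaluate f_hat(x, y) with a Gaussian kernel and H = diag(h_x^2, h_y^2).

    x_eval and y_eval can have any matching shape; the result has that shape.
    """
    # Scaled offsets between every evaluation point and every observation
    dx = (np.asarray(x_eval)[..., None] - np.asarray(x_obs)) / h_x
    dy = (np.asarray(y_eval)[..., None] - np.asarray(y_obs)) / h_y
    # For diagonal H: u^T H^-1 u = dx^2 + dy^2 and |H|^(1/2) = h_x * h_y
    kernel = np.exp(-0.5 * (dx**2 + dy**2)) / (2.0 * np.pi * h_x * h_y)
    return kernel.mean(axis=-1)  # the (1/N) * sum over observations
```

Passing meshgrid arrays as `x_eval` and `y_eval` returns a density surface ready for `contourf`.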
6.5.3 Choosing the Bandwidth
The bandwidth $h$ controls the trade-off between bias and variance:
- Small $h$: The estimate is spiky and fits the data closely (low bias, high variance).
- Large $h$: The estimate is smooth and may oversmooth genuine features (high bias, low variance).
Common bandwidth selection methods include:
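- Silverman's rule of thumb: $h = \left(\frac{4}{(d+2)N}\right)^{1/(d+4)} \sigma$ for $d$ dimensions, which in one dimension becomes $h = \left(\frac{4}{3N}\right)^{1/5} \sigma \approx 1.06 \sigma N^{-1/5}$
- Scott's rule: $h = N^{-1/(d+4)} \sigma$ for $d$ dimensions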
- Cross-validation: minimize the integrated squared error via leave-one-out CV
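Both rules reduce to a scalar factor multiplied by each axis's standard deviation. The sketch below implements the general $d$-dimensional forms. It also shows that in two dimensions, the case for pitch data, Silverman's and Scott's factors coincide, because $4/(d+2) = 1$ when $d = 2$. SciPy's `gaussian_kde` uses these same two factors for `bw_method="scott"` and `"silverman"`.

```python
import numpy as np


def scott_factor(n: int, d: int = 2) -> float:
    """Scott's rule: h = n^(-1/(d+4)) * sigma."""
    return n ** (-1.0 / (d + 4))


def silverman_factor(n: int, d: int = 2) -> float:
    """Silverman's rule: h = (4 / ((d + 2) n))^(1/(d+4)) * sigma."""
    return (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


def rule_of_thumb_bandwidths(
    x: np.ndarray, y: np.ndarray, rule: str = "scott"
) -> tuple[float, float]:
    """Per-axis bandwidths (h_x, h_y), in the same units as x and y."""
    n = len(x)
    factor = silverman_factor(n) if rule == "silverman" else scott_factor(n)
    return float(factor * np.std(x, ddof=1)), float(factor * np.std(y, ddof=1))
```

For example, with hypothetical numbers: 2,000 touches give a 2D factor of $2000^{-1/6} \approx 0.28$. A player whose touches have a standard deviation of 20 m along the length would then get $h_x \approx 5.6$ m.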
For soccer-specific applications, a useful heuristic is:
| Data type | Typical N | Suggested bandwidth (m) |
|---|---|---|
| Touches (per player-season) | 1,500–3,000 | 3–5 |
| Passes (per player-season) | 1,000–2,500 | 4–7 |
| Shots (per player-season) | 50–150 | 8–12 |
| Defensive actions (per player-season) | 100–300 | 6–10 |
| Tracking positions (per player, per match) | 50,000–100,000 | 1–3 |
Intuition: Think of the bandwidth as the "blur radius" on a photograph. A small radius keeps fine details but amplifies noise; a large radius smooths everything into vague blobs. A bandwidth of roughly 5–10 metres is a reasonable starting point for event data with only hundreds of events. Larger samples, and especially tracking data with tens of thousands of frames, can support smaller bandwidths, as the table suggests. Bandwidths here are quoted in metres, so rescale them if your coordinates use provider units such as StatsBomb's 120 x 80 grid.
6.5.4 Implementing Heat Maps with mplsoccer
```python
from mplsoccer import Pitch
import numpy as np
import matplotlib.pyplot as plt


def plot_kde_heatmap(
    x: np.ndarray,
    y: np.ndarray,
    pitch_type: str = "statsbomb",
    title: str = "Density Heat Map",
    cmap: str = "hot",
    levels: int = 100,
) -> tuple[plt.Figure, plt.Axes]:
    """Plot a kernel density estimate heat map on the pitch."""
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, ax = pitch.draw(figsize=(12, 8))
    # mplsoccer's kdeplot handles the KDE internally
    pitch.kdeplot(
        x, y, ax=ax,
        cmap=cmap, fill=True, levels=levels,
        thresh=0.05, alpha=0.7,
    )
    ax.set_title(title, fontsize=16, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
6.5.5 Computing the KDE Surface Directly
Calling SciPy ourselves, rather than going through mplsoccer's plotting wrapper, exposes the mechanics and returns the density grid for further analysis:
```python
from scipy.stats import gaussian_kde
import numpy as np


def compute_kde_surface(
    x: np.ndarray,
    y: np.ndarray,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    grid_resolution: int = 100,
    bandwidth_method: str | float = "scott",
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Compute a KDE surface over the pitch.

    Args:
        x: x-coordinates of events (in metres).
        y: y-coordinates of events (in metres).
        pitch_length: Length of the pitch in metres.
        pitch_width: Width of the pitch in metres.
        grid_resolution: Number of grid points along each axis.
        bandwidth_method: 'scott', 'silverman', or a scalar passed to
            gaussian_kde's bw_method (a multiplier on the data's spread).

    Returns:
        X, Y: meshgrid arrays for the evaluation grid.
        Z: density values at each grid point.
    """
    positions = np.vstack([x, y])
    kde = gaussian_kde(positions, bw_method=bandwidth_method)
    x_grid = np.linspace(0, pitch_length, grid_resolution)
    y_grid = np.linspace(0, pitch_width, grid_resolution)
    X, Y = np.meshgrid(x_grid, y_grid)
    # Evaluate the KDE at every grid point, then restore the grid shape
    grid_positions = np.vstack([X.ravel(), Y.ravel()])
    Z = kde(grid_positions).reshape(X.shape)
    return X, Y, Z
```
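The bandwidth table is in metres, but `gaussian_kde` (and therefore `compute_kde_surface`) interprets a scalar `bw_method` as a unitless multiplier. The kernel covariance is `factor**2` times the data covariance, so `bw_method=5` does not mean 5 m. Below is a minimal conversion sketch, assuming metre coordinates; the helper name is our own. It hits the target width exactly along $x$. The $y$ width then follows as the factor times the standard deviation of $y$; if both widths must be fixed independently, a diagonal-$H$ estimator is the better tool.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_with_metre_bandwidth(x: np.ndarray, y: np.ndarray, h_x_m: float) -> gaussian_kde:
    """Fit gaussian_kde so the kernel's x standard deviation is h_x_m metres."""
    # Kernel std along x = factor * std(x), so solve for the factor
    factor = h_x_m / np.std(x, ddof=1)
    return gaussian_kde(np.vstack([x, y]), bw_method=factor)
```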
6.5.6 Comparing Players with Heat Maps
One powerful application is comparing the spatial profiles of two players:
```python
from mplsoccer import Pitch
import numpy as np
import matplotlib.pyplot as plt


def compare_player_heatmaps(
    x1: np.ndarray, y1: np.ndarray, name1: str,
    x2: np.ndarray, y2: np.ndarray, name2: str,
    pitch_type: str = "statsbomb",
) -> tuple[plt.Figure, np.ndarray]:
    """Side-by-side KDE heat maps for two players."""
    pitch = Pitch(pitch_type=pitch_type, pitch_color="#1a1a2e", line_color="#c7d5cc")
    fig, axes = pitch.draw(nrows=1, ncols=2, figsize=(20, 8))
    pitch.kdeplot(x1, y1, ax=axes[0], cmap="Blues", fill=True, levels=100, thresh=0.05)
    axes[0].set_title(name1, fontsize=14, color="white")
    pitch.kdeplot(x2, y2, ax=axes[1], cmap="Reds", fill=True, levels=100, thresh=0.05)
    axes[1].set_title(name2, fontsize=14, color="white")
    fig.set_facecolor("#1a1a2e")
    fig.suptitle("Spatial Profile Comparison", fontsize=18, color="white", y=1.02)
    return fig, axes
```
Real-World Application: Recruitment analysts routinely compare the heat maps of potential signings to the player they would replace. If the replacement candidate's spatial profile is dramatically different, the tactical fit may be poor regardless of aggregate statistics. For example, if a team is replacing a striker who drops deep to link play with one who primarily operates on the shoulder of the last defender, the team's build-up patterns will need to change fundamentally.
6.5.7 Difference Heat Maps
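An even more informative comparison is the difference heat map, which subtracts one player's density surface from another's. Positive regions show where the first player's density is higher; negative regions show where the second player's is. Because each KDE surface integrates to one, the difference compares the shape of the two spatial profiles, not how many actions each player makes.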
```python
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch


def plot_difference_heatmap(
    x1: np.ndarray, y1: np.ndarray, name1: str,
    x2: np.ndarray, y2: np.ndarray, name2: str,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    grid_resolution: int = 100,
) -> tuple[plt.Figure, plt.Axes]:
    """Plot the difference in density between two players.

    Uses compute_kde_surface from Section 6.5.5; coordinates are in metres.
    """
    X, Y, Z1 = compute_kde_surface(x1, y1, pitch_length, pitch_width, grid_resolution)
    _, _, Z2 = compute_kde_surface(x2, y2, pitch_length, pitch_width, grid_resolution)
    Z_diff = Z1 - Z2
    pitch = Pitch(
        pitch_type="custom", pitch_length=pitch_length, pitch_width=pitch_width,
        pitch_color="#1a1a2e", line_color="#c7d5cc",
    )
    fig, ax = pitch.draw(figsize=(12, 8))
    # Symmetric levels keep zero difference at the neutral centre of the colormap
    vmax = max(float(np.abs(Z_diff).max()), 1e-12)
    ax.contourf(X, Y, Z_diff, levels=np.linspace(-vmax, vmax, 21), cmap="RdBu_r", alpha=0.7)
    ax.set_title(f"Density Difference: {name1} (red) vs {name2} (blue)",
                 fontsize=14, color="white", pad=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
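To compare volume as well as shape, one option is to scale each player's unit-mass density by their events per 90 minutes before subtracting. This is our own illustrative choice, not a standard definition. The sketch assumes metre coordinates, a minutes-played figure for each player, and a meshgrid `X`, `Y` such as the one returned by `compute_kde_surface`. The result is in events per 90 minutes per square metre.

```python
import numpy as np
from scipy.stats import gaussian_kde


def rate_weighted_difference(
    x1: np.ndarray, y1: np.ndarray, minutes1: float,
    x2: np.ndarray, y2: np.ndarray, minutes2: float,
    X: np.ndarray, Y: np.ndarray,
) -> np.ndarray:
    """Difference of per-90 event-rate surfaces for two players."""
    grid = np.vstack([X.ravel(), Y.ravel()])
    rate1 = len(x1) * 90.0 / minutes1  # events per 90 minutes
    rate2 = len(x2) * 90.0 / minutes2
    f1 = gaussian_kde(np.vstack([x1, y1]))(grid).reshape(X.shape)
    f2 = gaussian_kde(np.vstack([x2, y2]))(grid).reshape(X.shape)
    return rate1 * f1 - rate2 * f2
```

Two players with identical spatial profiles but different activity levels produce a flat plain difference map, but a one-signed map here.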
6.6 Introduction to Pitch Control Concepts
6.6.1 Beyond Points: Influence and Control
So far, we have treated players and events as isolated points on the pitch. But a player is not a dimensionless dot; they exert influence over a region of space around them. A fast player running at full speed controls more space ahead of them than behind. A goalkeeper's reach extends roughly 2 metres in every direction. The concept of pitch control formalizes this intuition.
Pitch control models assign to every point $(x, y)$ on the pitch a value $P(x, y) \in [0, 1]$ representing the probability that a given team could gain possession of a ball played to that location.
The applications of pitch control are numerous:
- Evaluating passing options: Which locations on the pitch can a player pass to with high probability of success?
- Measuring space creation: How much controlled space does a team have in dangerous areas?
- Analyzing off-ball movement: Which runs create the most space for teammates?
- Assessing defensive coverage: Are there gaps in the defensive structure that opponents could exploit?
6.6.2 Voronoi Tessellation: A Starting Point
The simplest model of spatial dominance is the Voronoi tessellation. Given player positions $\{(x_i, y_i)\}$, the Voronoi cell of player $i$ is the set of all points on the pitch closer to player $i$ than to any other player:
$$V_i = \{(x, y) : \|(x, y) - (x_i, y_i)\| \leq \|(x, y) - (x_j, y_j)\| \; \forall j \neq i\}$$
The pitch is partitioned into cells, one per player, and each cell represents the area that player "controls" under the assumption that whichever player is nearest will win the ball.
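```python
from scipy.spatial import Voronoi
import numpy as np


def compute_voronoi(
    x: np.ndarray,
    y: np.ndarray,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> Voronoi:
    """Compute the Voronoi tessellation for player positions, clipped to the pitch.

    Each player is reflected across the four pitch boundaries. The bisector
    between a player and their reflection is the boundary line itself, so the
    players' cells end exactly at the touchlines and goal lines. Positions
    must lie strictly inside the pitch (a point on a line would coincide
    with its own reflection).

    Args:
        x: x-coordinates of player positions.
        y: y-coordinates of player positions.
        pitch_length: Length of the pitch.
        pitch_width: Width of the pitch.

    Returns:
        A scipy.spatial.Voronoi object. The first len(x) input points are the
        players; vor.regions[vor.point_region[i]] is player i's cell.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    points = np.column_stack([x, y])
    mirrors = [
        np.column_stack([-x, y]),                    # reflected across x = 0
        np.column_stack([2 * pitch_length - x, y]),  # across x = pitch_length
        np.column_stack([x, -y]),                    # across y = 0
        np.column_stack([x, 2 * pitch_width - y]),   # across y = pitch_width
    ]
    return Voronoi(np.vstack([points, *mirrors]))
```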
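Because each cell is read as "area of control", a natural summary is each player's share of the pitch. Rather than clipping polygons, the sketch below approximates areas on a raster. Every small square cell is assigned to its nearest player, which is the Voronoi rule applied point by point and is automatically bounded by the pitch. The 0.5 m cell size is an assumption that trades accuracy for speed.

```python
import numpy as np


def voronoi_areas_on_grid(
    players_xy: np.ndarray,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    cell_size: float = 0.5,
) -> np.ndarray:
    """Approximate each player's Voronoi area (m^2) inside the pitch."""
    players_xy = np.asarray(players_xy, dtype=float)
    # Centres of square cells tiling the pitch exactly
    xs = np.arange(cell_size / 2, pitch_length, cell_size)
    ys = np.arange(cell_size / 2, pitch_width, cell_size)
    X, Y = np.meshgrid(xs, ys)
    centres = np.stack([X.ravel(), Y.ravel()], axis=1)  # (n_cells, 2)
    # Squared distance from every cell centre to every player
    d2 = ((centres[:, None, :] - players_xy[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)  # index of the nearest player for each cell
    counts = np.bincount(nearest, minlength=len(players_xy))
    return counts * cell_size**2
```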
Voronoi tessellations are computationally cheap and visually striking. They are often used in broadcast graphics to show "areas of influence" during live matches. However, their simplicity is also their limitation.
6.6.3 Limitations of Voronoi Models
Voronoi diagrams are elegant but make several unrealistic assumptions:
- All players are equally fast. In reality, a player sprinting toward a location at 9 m/s can reach it before a stationary player 5 m closer.
- All directions are equivalent. Players accelerate faster in their current direction of motion than laterally.
- The ball's trajectory is ignored. A pass to a specific location takes time to arrive; during that time, players move.
- Goalkeepers are treated like outfield players. In reality, goalkeepers have different movement constraints and influence patterns.
- Static snapshot. Voronoi diagrams use instantaneous positions and ignore velocities entirely.
These limitations motivate more sophisticated models.
Common Pitfall: Voronoi diagrams can be visually compelling and easy to compute, which makes them tempting to over-interpret. A Voronoi cell that shows a defender "controlling" a large area behind them does not account for the fact that the defender is sprinting forward and would take several seconds to turn and cover that space. Always treat Voronoi as a rough approximation, not a precise model of spatial control.
6.6.4 Velocity-Aware Pitch Control: The Fernandez-Bornn Model
Two influential papers from 2018, Fernandez and Bornn (2018) and Spearman (2018), extend Voronoi models by incorporating player motion. The core idea, central to Spearman's physics-based model, is to replace Euclidean distance with time-to-reach:
$$T_i(x, y) = \text{time for player } i \text{ to reach } (x, y) \text{ given current position, velocity, and max acceleration}$$
Then the pitch control at $(x, y)$ for team $A$ is:
$$PC_A(x, y) = \frac{\sum_{i \in A} p_i(x, y)}{\sum_{i \in A} p_i(x, y) + \sum_{j \in B} p_j(x, y)}$$
where $p_i(x, y)$ is the probability that player $i$ arrives at $(x, y)$ before the ball, modeled (for instance) as a logistic function of the difference between ball travel time and player arrival time.
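To make $T_i(x, y)$ and $p_i(x, y)$ concrete, here is a deliberately simple sketch. The simplifications are ours, not those of the published models. A player keeps their current velocity during a fixed reaction time and then runs straight at a constant top speed, which ignores acceleration limits. $p_i$ is a logistic function of how much earlier the player arrives than the ball. The parameter values are illustrative, not fitted.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function


def time_to_reach(
    pos: np.ndarray,
    vel: np.ndarray,
    target: np.ndarray,
    reaction_time: float = 0.7,
    max_speed: float = 5.0,
) -> float:
    """Simplified T_i: drift with the current velocity for `reaction_time`
    seconds, then run straight to the target at `max_speed` (m/s)."""
    after_reaction = pos + vel * reaction_time
    return float(reaction_time + np.linalg.norm(target - after_reaction) / max_speed)


def arrival_probability(t_player: float, t_ball: float, spread: float = 0.45) -> float:
    """Logistic p_i: close to 1 when the player arrives well before the ball."""
    return float(expit((t_ball - t_player) / spread))


def pitch_control_at(target, t_ball, team_a, team_b) -> float:
    """PC_A at one point; team_a and team_b are lists of (position, velocity) pairs."""
    p_a = sum(arrival_probability(time_to_reach(p, v, target), t_ball) for p, v in team_a)
    p_b = sum(arrival_probability(time_to_reach(p, v, target), t_ball) for p, v in team_b)
    return p_a / (p_a + p_b)
```

Unlike the Voronoi rule, a defender sprinting toward the target can out-control a closer but stationary attacker, because velocity enters through `time_to_reach`.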
The Fernandez-Bornn model takes a complementary route: it represents a player's "influence area" as a bivariate Gaussian distribution centered on the player's current position but stretched in the direction of their velocity. The covariance matrix of the Gaussian is determined by the player's speed and direction:
$$\Sigma_i = R_i \cdot S_i \cdot R_i^T$$
where $R_i$ is a rotation matrix aligned with the player's velocity direction and $S_i$ is a scaling matrix that stretches the influence area further in the direction of motion.
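The construction $\Sigma_i = R_i S_i R_i^T$ is easy to check numerically. Rotating a diagonal matrix of variances by the velocity angle produces an ellipse elongated along the direction of travel. The scaling rule below (base radius, speed cap, and how the variances grow and shrink) is an invented illustration of the qualitative behaviour, not the parameterization from Fernandez and Bornn (2018).

```python
import numpy as np


def influence_covariance(
    velocity: np.ndarray,
    base_radius: float = 4.0,
    speed_cap: float = 10.0,
) -> np.ndarray:
    """Sigma_i = R_i S_i R_i^T for one player (illustrative scaling only)."""
    speed = float(np.hypot(velocity[0], velocity[1]))
    theta = float(np.arctan2(velocity[1], velocity[0]))  # direction of motion
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    stretch = min(speed / speed_cap, 1.0)  # 0 when standing, 1 at the cap
    # Variances along (first) and across (second) the direction of motion
    S = np.diag([(base_radius * (1.0 + stretch)) ** 2,
                 (base_radius * (1.0 - 0.5 * stretch)) ** 2])
    return R @ S @ R.T
```

The matrix can be passed as `cov` to `scipy.stats.multivariate_normal(mean=position, cov=...)` to evaluate a player's influence surface on a grid.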
Advanced: The full pitch control model requires tracking data (player and ball positions, typically sampled at 10 to 25 Hz, from which velocities are derived) and a ball physics model. We will implement it in detail in Chapter 18 (Tracking Data Analysis). For now, the key takeaway is the conceptual shift from nearest-player to fastest-to-arrive.
6.6.5 Computing Simple Pitch Control
Here is a simplified pitch control computation in which each player's influence decays with Euclidean distance as a Gaussian, and Team A's share of the summed influence gives the control value:
```python
import numpy as np


def simple_pitch_control(
    team_a_xy: np.ndarray,
    team_b_xy: np.ndarray,
    grid_resolution: int = 50,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    sigma: float = 10.0,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Compute a simple distance-based pitch control surface.

    Args:
        team_a_xy: (N, 2) array of Team A player positions (x, y).
        team_b_xy: (M, 2) array of Team B player positions (x, y).
        grid_resolution: Number of grid points per axis.
        pitch_length: Length of the pitch in metres.
        pitch_width: Width of the pitch in metres.
        sigma: Controls the spread of each player's influence (metres).

    Returns:
        X, Y: meshgrid arrays.
        PC: pitch control values in [0, 1] for Team A.
    """
    x_grid = np.linspace(0, pitch_length, grid_resolution)
    y_grid = np.linspace(0, pitch_width, grid_resolution)
    X, Y = np.meshgrid(x_grid, y_grid)
    grid = np.stack([X, Y], axis=-1)  # (res, res, 2)

    def team_influence(team_xy: np.ndarray) -> np.ndarray:
        # Sum of Gaussian influence bumps, one per player
        influence = np.zeros((grid_resolution, grid_resolution))
        for pos in team_xy:
            dist = np.linalg.norm(grid - pos, axis=-1)
            influence += np.exp(-dist**2 / (2 * sigma**2))
        return influence

    inf_a = team_influence(team_a_xy)
    inf_b = team_influence(team_b_xy)
    # Avoid division by zero far from all players
    total = inf_a + inf_b + 1e-10
    PC = inf_a / total
    return X, Y, PC
```
6.6.6 Visualizing Pitch Control
```python
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch


def plot_pitch_control(
    X: np.ndarray,
    Y: np.ndarray,
    PC: np.ndarray,
    team_a_xy: np.ndarray,
    team_b_xy: np.ndarray,
    title: str = "Pitch Control",
) -> tuple[plt.Figure, plt.Axes]:
    """Visualize a pitch control surface with player positions (105 x 68 m pitch)."""
    pitch = Pitch(
        pitch_type="custom", pitch_length=105, pitch_width=68,
        pitch_color="#1a1a2e", line_color="#c7d5cc",
    )
    fig, ax = pitch.draw(figsize=(12, 8))
    # Fixed levels on [0, 1] so colours mean the same thing in every frame:
    # blue = Team A control, red = Team B control
    ax.contourf(X, Y, PC, levels=np.linspace(0, 1, 21), cmap="RdBu", alpha=0.7)
    # Plot players
    ax.scatter(team_a_xy[:, 0], team_a_xy[:, 1], c="blue", s=100,
               edgecolors="white", linewidth=1.5, zorder=5, label="Team A")
    ax.scatter(team_b_xy[:, 0], team_b_xy[:, 1], c="red", s=100,
               edgecolors="white", linewidth=1.5, zorder=5, label="Team B")
    ax.set_title(title, fontsize=16, color="white", pad=10)
    ax.legend(loc="upper left", fontsize=10)
    fig.set_facecolor("#1a1a2e")
    return fig, ax
```
Real-World Application: Liverpool's data science team reportedly uses pitch control models to evaluate passing options. By computing the pitch control surface at each frame of tracking data, analysts can assess whether a player made the "optimal" pass (the one directed at the point of highest friendly pitch control) or chose a lower-probability option.
6.6.7 Measuring Space Control: Aggregated Metrics
Once we have a pitch control surface, we can derive useful aggregated metrics:
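- Total controlled area: The integral of pitch control over the entire pitch gives a single soft measure of how much space a team dominates; counting only the area where pitch control exceeds 0.5, as the function below does, gives a hard alternative.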
- Controlled area in the attacking third: Restricting the integral to the final third measures attacking spatial dominance.
- Space around the ball: Computing pitch control in a radius around the ball carrier's position measures how much freedom they have.
```python
import numpy as np


def compute_controlled_area(
    PC: np.ndarray,
    X: np.ndarray,
    Y: np.ndarray,
    x_range: tuple[float, float] | None = None,
    y_range: tuple[float, float] | None = None,
) -> float:
    """Compute the area controlled by Team A (PC > 0.5) in a given region.

    Args:
        PC: Pitch control values (Team A).
        X, Y: Meshgrid arrays.
        x_range: Optional (min, max) to restrict the x-axis region.
        y_range: Optional (min, max) to restrict the y-axis region.

    Returns:
        Controlled area in square metres (each grid point counts as one cell).
    """
    mask = PC > 0.5
    if x_range is not None:
        mask &= (X >= x_range[0]) & (X <= x_range[1])
    if y_range is not None:
        mask &= (Y >= y_range[0]) & (Y <= y_range[1])
    dx = X[0, 1] - X[0, 0]
    dy = Y[1, 0] - Y[0, 0]
    cell_area = dx * dy
    return float(np.sum(mask) * cell_area)
```
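The function above implements the hard (PC > 0.5) version and counts every grid point as a full cell. Because the grid includes both touchlines, this slightly overstates areas. Two complementary sketches follow. The first is the soft integral from the first bullet, using trapezoid weights so that an all-ones surface integrates to exactly the pitch area. The second is the "space around the ball" measure, computed as the mean pitch control within a radius of the ball carrier; the 10 m default is an arbitrary illustrative choice.

```python
import numpy as np


def soft_controlled_area(PC: np.ndarray, X: np.ndarray, Y: np.ndarray) -> float:
    """Trapezoid-rule integral of PC over the grid, in square metres."""
    dx = X[0, 1] - X[0, 0]
    dy = Y[1, 0] - Y[0, 0]
    # Edge points get half weight so the domain matches the pitch exactly
    wx = np.full(X.shape[1], dx)
    wx[[0, -1]] = dx / 2
    wy = np.full(X.shape[0], dy)
    wy[[0, -1]] = dy / 2
    return float(np.sum(PC * np.outer(wy, wx)))


def control_near_ball(
    PC: np.ndarray, X: np.ndarray, Y: np.ndarray,
    ball_xy: tuple[float, float], radius: float = 10.0,
) -> float:
    """Mean Team A pitch control within `radius` metres of the ball carrier."""
    near = (X - ball_xy[0]) ** 2 + (Y - ball_xy[1]) ** 2 <= radius**2
    return float(PC[near].mean())
```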
6.7 Action Density and Spatial Value Surfaces
6.7.1 Action Density Surfaces
Beyond simple event counts and heat maps, action density surfaces show the spatial distribution of specific action types (passes, shots, tackles, pressures), normalized to a rate per match or per 90 minutes. These surfaces reveal not just where a player or team is active, but how intensely they engage in specific actions across different regions of the pitch.
For example, a "pressing density surface" shows where a team applies its pressing actions most intensely. A "progressive carry density surface" shows where a player initiates ball carries that advance the ball toward the opponent's goal. By comparing these surfaces across teams or across time periods (e.g., first half vs. second half), analysts can identify tactical patterns and adjustments.
Intuition: An action density surface is like a topographic map of a specific tactical behavior. The "peaks" show where the behavior is most concentrated, and the "valleys" show where it is sparse. Reading these surfaces becomes second nature with practice and reveals patterns that are invisible in aggregate statistics.
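A minimal version of the per-90 normalization on a zone grid is a binned action density. The same scaling applies to a KDE surface: multiply the unit-mass density by the per-90 count. The sketch assumes metre coordinates and a known minutes-played total. Note that `np.histogram2d` returns an array indexed `[x_bin, y_bin]`, the transpose of the rows-are-$y$ layout produced by `np.meshgrid`.

```python
import numpy as np


def action_density_per90(
    x: np.ndarray,
    y: np.ndarray,
    minutes_played: float,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    bins: tuple[int, int] = (6, 4),
) -> np.ndarray:
    """Actions per 90 minutes in each zone, shape (n_x_bins, n_y_bins)."""
    counts, _, _ = np.histogram2d(
        x, y, bins=bins, range=[[0, pitch_length], [0, pitch_width]]
    )
    return counts * (90.0 / minutes_played)
```

Computing this separately for, say, first and second halves (with the minutes for each half) gives directly comparable surfaces.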
6.7.2 Spatial Value Surfaces
A spatial value surface assigns a numerical value to every location on the pitch, representing how "dangerous" or "valuable" that location is from an attacking perspective. The most prominent example is the Expected Threat (xT) model, which we will cover in detail in Chapter 13.
The basic idea is simple: from historical data, compute the probability that a possession that reaches location $(x, y)$ eventually results in a goal. Locations close to the opponent's goal have high value; locations deep in a team's own half have low value. The resulting surface looks like a gradually ascending slope from the defensive goal to the attacking goal, with a steep rise near the penalty area.
$$xT(x, y) = P(\text{goal} \mid \text{possession at } (x, y))$$
This surface serves as a foundation for valuing player actions. A pass that moves the ball from a low-value location to a high-value location is more valuable than one that moves it laterally at the same value level. A ball carry that advances from $xT = 0.01$ to $xT = 0.05$ "adds" 0.04 units of expected threat.
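The bookkeeping behind "adds 0.04 units" is a grid lookup followed by a subtraction. The sketch below makes several assumptions: the xT grid is stored with rows for $y$ cells and columns for $x$ cells (the layout `initialize_xt_surface` below returns), coordinates are in metres, and the attack runs toward $x = 105$. The 2 x 3 grid in the test is a toy example, not real xT values.

```python
import numpy as np


def xt_lookup(xt: np.ndarray, x, y, pitch_length: float = 105.0, pitch_width: float = 68.0):
    """xT of the cell containing (x, y); xt is indexed [y_cell, x_cell]."""
    n_y, n_x = xt.shape
    row = np.clip((np.asarray(y, dtype=float) / pitch_width * n_y).astype(int), 0, n_y - 1)
    col = np.clip((np.asarray(x, dtype=float) / pitch_length * n_x).astype(int), 0, n_x - 1)
    return xt[row, col]


def xt_added(xt: np.ndarray, x_start, y_start, x_end, y_end,
             pitch_length: float = 105.0, pitch_width: float = 68.0):
    """Expected threat added by moving the ball from start to end."""
    end = xt_lookup(xt, x_end, y_end, pitch_length, pitch_width)
    start = xt_lookup(xt, x_start, y_start, pitch_length, pitch_width)
    return end - start
```

Both functions accept arrays, so a whole table of passes or carries can be valued in one call.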
6.7.3 From Pitch Control to Spatial Value
Pitch control tells us who controls a location; it does not tell us how valuable that control is. Combining pitch control with a spatial value model produces a composite metric: Pitch Control x Expected Threat. This quantifies the value of space that a team dominates, offering a single number that captures both spatial dominance and tactical positioning.
$$\text{Space Value}_A = \iint_{\text{pitch}} PC_A(x, y) \cdot xT(x, y) \, dx \, dy$$
where $xT(x, y)$ is the expected threat at position $(x, y)$.
This integral can be approximated on a grid:
$$\text{Space Value}_A \approx \sum_{i,j} PC_A(x_i, y_j) \cdot xT(x_i, y_j) \cdot \Delta x \cdot \Delta y$$
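The grid approximation needs pitch control and xT evaluated at the same points. The sketch below assumes a coarse xT grid (rows for $y$ cells, columns for $x$ cells, metre coordinates) and looks it up at every pitch-control grid point before applying the Riemann sum above. It works best with a cell-centred grid, so that $\Delta x \, \Delta y$ times the number of points equals the pitch area.

```python
import numpy as np


def space_value(
    PC: np.ndarray, X: np.ndarray, Y: np.ndarray, xt: np.ndarray,
    pitch_length: float = 105.0, pitch_width: float = 68.0,
) -> float:
    """Riemann-sum approximation of the integral of PC_A * xT over the pitch."""
    n_y, n_x = xt.shape
    # Look up the coarse xT cell containing each pitch-control grid point
    rows = np.clip((Y / pitch_width * n_y).astype(int), 0, n_y - 1)
    cols = np.clip((X / pitch_length * n_x).astype(int), 0, n_x - 1)
    xt_on_grid = xt[rows, cols]
    dx = X[0, 1] - X[0, 0]
    dy = Y[1, 0] - Y[0, 0]
    return float(np.sum(PC * xt_on_grid) * dx * dy)
```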
Teams that control high-value areas of the pitch (near the opponent's penalty area, in the half-spaces) will have higher space value than teams that control large areas of low-value space (deep in their own half, near the touchlines).
Real-World Application: Combining pitch control with spatial value models allows analysts to answer questions like: "During this match, which team controlled more dangerous space?" or "In the 10 minutes before the goal, how did the losing team's control of high-value space change?" These questions connect abstract spatial models to concrete tactical narratives.
6.7.4 Building Spatial Value Surfaces from Data
A simple approach to constructing a spatial value surface uses a grid-based Markov chain. Divide the pitch into $M \times N$ cells. For each cell, estimate from historical data:
- $P(\text{shot} \mid i,j)$: the probability that the next action by a team in possession in cell $(i,j)$ is a shot.
- $P(\text{goal} \mid \text{shot from } i,j)$: the probability that a shot from this cell results in a goal (this is essentially the xG for shots from this location).
- $P(\text{move} \mid i,j)$: the probability that the next action instead moves the ball, by pass or carry.
- $P(\text{move to } (k,l) \mid (i,j))$: given a move, the probability that the ball arrives in cell $(k,l)$.
The expected threat for each cell is then:
$$xT(i,j) = P(\text{shot} \mid i,j) \times P(\text{goal} \mid \text{shot from } i,j) + P(\text{move} \mid i,j) \times \sum_{(k,l)} P(\text{move to } (k,l) \mid (i,j)) \times xT(k,l)$$
This recursive formula can be solved iteratively (value iteration) or as a system of linear equations. We implement this fully in Chapter 13.
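Here is a minimal value-iteration sketch of the recursion above, with the grid flattened to $K$ cells. The four input probabilities are assumed to be already estimated from historical counts. Starting from xT = 0, each sweep applies the formula once. Because P(move) < 1 in every cell (some possessions end in a shot or a turnover), the updates contract and converge.

```python
import numpy as np


def solve_xt(
    shoot: np.ndarray,
    move: np.ndarray,
    goal: np.ndarray,
    transition: np.ndarray,
    n_iter: int = 1000,
    tol: float = 1e-10,
) -> np.ndarray:
    """Value iteration for xT on K flattened cells.

    shoot, move, goal: length-K arrays of P(shot|cell), P(move|cell),
        P(goal|shot from cell).
    transition: (K, K) matrix; row k gives P(move to each cell | move from k).
    """
    xt = np.zeros_like(shoot, dtype=float)
    for _ in range(n_iter):
        # xT = P(shot) * P(goal | shot) + P(move) * sum_k T[., k] * xT[k]
        new = shoot * goal + move * (transition @ xt)
        if np.max(np.abs(new - xt)) < tol:
            return new
        xt = new
    return xt
```

Because the recursion is linear in xT, the same values can also be obtained in one step with `np.linalg.solve(np.eye(K) - move[:, None] * transition, shoot * goal)`. Value iteration is shown because it mirrors the formula term by term.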
```python
import numpy as np


def initialize_xt_surface(
    n_x: int = 12,
    n_y: int = 8,
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
) -> np.ndarray:
    """Initialize a simple Expected Threat surface using distance to goal.

    This is a placeholder; the real xT model (Chapter 13) is data-driven.
    Returns an array of shape (n_y, n_x): rows are y cells, columns are x cells.
    """
    # Cell-centre coordinates along each axis
    x_centres = np.linspace(
        pitch_length / (2 * n_x), pitch_length * (1 - 1 / (2 * n_x)), n_x
    )
    y_centres = np.linspace(
        pitch_width / (2 * n_y), pitch_width * (1 - 1 / (2 * n_y)), n_y
    )
    X, Y = np.meshgrid(x_centres, y_centres)
    # Simple approximation: xT increases with proximity to goal centre
    goal_x = pitch_length
    goal_y = pitch_width / 2
    dist_to_goal = np.sqrt((X - goal_x) ** 2 + (Y - goal_y) ** 2)
    max_dist = np.sqrt(pitch_length ** 2 + (pitch_width / 2) ** 2)
    xt = np.exp(-dist_to_goal / max_dist * 5)  # Exponential decay
    xt = xt / xt.max() * 0.35  # Normalize so the maximum xT is 0.35
    return xt
```
Best Practice: When building spatial value surfaces, always validate against actual goal-scoring data. The surface should predict which locations are most likely to lead to goals. Compare the predicted "most dangerous zones" to the actual distribution of goals in your dataset.
Chapter Summary
In this chapter we established the soccer pitch as a two-dimensional coordinate plane and built the foundational tools for spatial analysis:
- Pitch dimensions are standardized at 105 x 68 m for international play, but variability exists across domestic leagues and venues. Always normalize or document your coordinate system, and be aware that pitch size affects spatial metrics.
- Coordinate systems differ across data providers (StatsBomb, Opta, Wyscout, FIFA EPTS). Conversions are affine transformations: simple scaling, translation, and reflection operations. A hub-and-spoke architecture through a canonical metres-based system provides the most maintainable conversion pipeline.
- Zones and regions (thirds, channels, half-spaces, Zone 14, penalty area) bridge continuous coordinates and tactical language. They are essential for aggregation, interpretation, and communication with coaches and scouts.
- Visualization starts with scatter plots and arrow maps, progresses through binned statistics, and reaches its most informative form with kernel density estimation (heat maps). Shot maps, pass maps, and pass networks are specialized visualizations that answer specific tactical questions.
- Heat maps and KDE produce smooth density surfaces from discrete events. The bandwidth parameter controls the bias-variance trade-off, and the choice of bandwidth should be guided by the number of events and the desired level of spatial detail.
- Pitch control extends the spatial framework from points to influence regions. Even the simplest Voronoi model provides useful insights, and velocity-aware models (Fernandez-Bornn, Spearman) form the backbone of modern tracking-data analysis.
- Spatial value surfaces assign a numerical value to every location on the pitch, enabling the valuation of player actions based on where they move the ball. Combining pitch control with spatial value produces a powerful composite metric that captures both spatial dominance and tactical positioning.
These tools will recur throughout the remainder of the book. Every metric we build -- from Expected Goals (Chapter 11) to Expected Threat (Chapter 13) to pressing intensity (Chapter 19) -- depends on accurately representing and manipulating spatial data on the pitch.
Looking Ahead
In Chapter 7: Introduction to Event Data, we will see how data providers record the raw events (passes, shots, tackles) that populate the coordinate systems we have just studied. We will load real StatsBomb data, parse the JSON structure, and build our first event-data analysis pipeline.