Chapter 6: Key Takeaways

The Soccer Pitch as a Coordinate System


1. Pitch Dimensions and Standardization

  • The FIFA standard is 105 m x 68 m, but the Laws of the Game permit pitches from 90-120 m long and 45-90 m wide.
  • Always verify and document the pitch dimensions underlying your data.
  • Normalize coordinates to a unit square $[0,1] \times [0,1]$ when comparing across venues, then rescale to a standard reference frame.
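
The normalize-then-rescale step can be sketched in a few lines. This is a minimal sketch, assuming the origin sits at a pitch corner and both axes already point the same way in every source. The function names `normalize` and `rescale` are hypothetical helpers, not part of any library.

```python
def normalize(x, y, length, width):
    """Map a point on a length x width pitch (corner origin) into the unit square."""
    return x / length, y / width


def rescale(u, v, length=105.0, width=68.0):
    """Map unit-square coordinates onto a reference pitch (default 105 m x 68 m)."""
    return u * length, v * width


# A point 50 m along and 30 m across a hypothetical 100 m x 64 m pitch.
u, v = normalize(50.0, 30.0, length=100.0, width=64.0)  # (0.5, 0.46875)
x_ref, y_ref = rescale(u, v)                              # (52.5, 31.875)
```

Proportional rescaling preserves relative position (the halfway line stays at halfway), but it distorts markings whose size is fixed by the Laws. For example, the penalty area is always 16.5 m deep, so its edge at 83.5 m on a 100 m pitch maps to 87.675 m rather than 88.5 m on the reference pitch.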

2. Coordinate Systems

Provider     Dimensions          Origin        y-direction
StatsBomb    120 x 80            Top-left      Down
Opta         100 x 100           Bottom-left   Up
Wyscout      100 x 100           Top-left      Down
FIFA EPTS    105 x 68 (metres)   Centre        Up
  • Every conversion between providers is an affine transformation: $x' = s_x \cdot x + t_x$, $y' = s_y \cdot y + t_y$.
  • Route all conversions through a canonical system (e.g., standard metres) to reduce $N(N-1) \approx N^2$ pairwise converters to $2N$ (one map into and one map out of the canonical system per provider).
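
Routing every conversion through one canonical frame can be sketched as follows. This is a minimal sketch under stated assumptions: the canonical frame is taken to be 105 m x 68 m with origin bottom-left and y pointing up (an arbitrary choice), the provider specs are read off the table above, and the FIFA EPTS entry assumes a 105 m x 68 m pitch. The names `Affine`, `corner_frame`, `PROVIDERS` and `convert`, and the provider keys, are hypothetical, not any provider's or library's API.

```python
from dataclasses import dataclass

# Assumed canonical frame: metres, origin at the bottom-left corner, y pointing up.
CANON_L, CANON_W = 105.0, 68.0


@dataclass(frozen=True)
class Affine:
    """x' = sx * x + tx, y' = sy * y + ty: provider coordinates -> canonical metres."""
    sx: float
    tx: float
    sy: float
    ty: float

    def forward(self, x, y):
        """Provider frame -> canonical frame."""
        return self.sx * x + self.tx, self.sy * y + self.ty

    def inverse(self, x, y):
        """Canonical frame -> provider frame."""
        return (x - self.tx) / self.sx, (y - self.ty) / self.sy


def corner_frame(length, width, y_down):
    """Affine map for a corner-origin provider (y_down=True for top-left origins)."""
    sx = CANON_L / length
    if y_down:
        # Flip y: provider y = 0 (top touchline) maps to canonical y = CANON_W.
        return Affine(sx, 0.0, -CANON_W / width, CANON_W)
    return Affine(sx, 0.0, CANON_W / width, 0.0)


# One map per provider; forward + inverse gives the 2N converters.
PROVIDERS = {
    "statsbomb": corner_frame(120.0, 80.0, y_down=True),
    "opta": corner_frame(100.0, 100.0, y_down=False),
    "wyscout": corner_frame(100.0, 100.0, y_down=True),
    # Centre origin, y up, already in metres (assumes a 105 m x 68 m pitch).
    "fifa_epts": Affine(1.0, CANON_L / 2, 1.0, CANON_W / 2),
}


def convert(x, y, source, target):
    """Convert a point between any two providers via the canonical frame."""
    xm, ym = PROVIDERS[source].forward(x, y)
    return PROVIDERS[target].inverse(xm, ym)
```

For example, `convert(60.0, 40.0, "statsbomb", "opta")` maps the StatsBomb centre spot to approximately `(50.0, 50.0)`, up to floating-point rounding. Adding a new provider means writing one `Affine` entry rather than a converter for every existing provider.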

3. Direction of Play

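  • Event-data providers typically record each event from the acting team's perspective, attacking left to right. Tracking data usually keeps a fixed frame in which teams switch ends at half-time. Always verify.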
  • Flip coordinates when aligning halves: $x' = L - x$, $y' = W - y$ for corner-origin systems ($x' = -x$, $y' = -y$ for centre-origin systems such as FIFA EPTS).
  • A misaligned direction of play is one of the most common data-pipeline bugs.
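
Aligning direction of play amounts to rotating every right-to-left point by 180 degrees about the centre spot. This is a minimal sketch, assuming a corner-origin frame and events stored as dicts with hypothetical keys `x`, `y` and a boolean `attacks_left` flag derived from whatever direction-of-play field your provider supplies.

```python
def flip(x, y, length, width):
    """Rotate a point 180 degrees about the pitch centre (corner-origin frames)."""
    return length - x, width - y


def align_to_left_to_right(events, length=105.0, width=68.0):
    """Return copies of events in which every team attacks left to right.

    Assumes each event is a dict with hypothetical keys 'x', 'y' and a boolean
    'attacks_left' flag; flipped events get attacks_left set to False.
    """
    aligned = []
    for ev in events:
        if ev["attacks_left"]:
            x, y = flip(ev["x"], ev["y"], length, width)
            ev = {**ev, "x": x, "y": y, "attacks_left": False}
        aligned.append(ev)
    return aligned
```

Flipping both axes, rather than x alone, keeps a team's left and right wings consistent after the switch; flipping only x would mirror the wings. Applying `flip` twice returns the original point, which makes a cheap sanity check in a pipeline.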

4. Zones and Regions

  • Thirds: Defensive (0-35 m), Middle (35-70 m), Attacking (70-105 m).
  • Five channels: Left wing, left half-space, centre, right half-space, right wing.
  • Zone 14: The central area just outside the penalty box, a high-value zone for chance creation.
  • Custom grids: A 6 x 4 or 6 x 5 grid provides a good balance of granularity and interpretability.
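
Assigning events to thirds and grid cells is a simple lookup. This is a minimal sketch, assuming a 105 m x 68 m corner-origin frame with attacks running left to right and points lying on the pitch. Channel boundaries are omitted because they are not defined numerically here. The names `third` and `grid_cell` are hypothetical.

```python
def third(x, length=105.0):
    """Classify x into the defensive, middle or attacking third (left-to-right attack)."""
    if x < length / 3:
        return "defensive"
    if x < 2 * length / 3:
        return "middle"
    return "attacking"


def grid_cell(x, y, nx=6, ny=4, length=105.0, width=68.0):
    """Return the (column, row) index of the nx x ny grid cell containing (x, y).

    Points exactly on the far goal line or touchline are clamped into the last cell.
    """
    col = min(int(x / length * nx), nx - 1)
    row = min(int(y / width * ny), ny - 1)
    return col, row
```

With the defaults, `grid_cell(52.5, 34.0)` is `(3, 2)`: the centre spot sits on a cell corner and falls into the cell up and to the right of it.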

5. Visualization

  • Scatter plots show individual events; arrow plots show passes with direction.
  • Binned heat maps aggregate events into grid cells; choice of bin size strongly affects readability.
  • Use mplsoccer for production-quality pitch drawings with consistent coordinate handling.
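
The aggregation behind a binned heat map can be sketched without any plotting library. mplsoccer's `Pitch.bin_statistic` and `Pitch.heatmap` handle binning and drawing together. The pure-Python function below, a hypothetical `bin_counts`, shows only the counting step, assuming a 105 m x 68 m corner-origin frame.

```python
from collections import Counter


def bin_counts(points, nx=6, ny=4, length=105.0, width=68.0):
    """Count events per cell of an nx x ny grid.

    Returns a list of ny rows, each holding nx counts (row 0 is y near 0).
    """
    counts = Counter()
    for x, y in points:
        col = min(int(x / length * nx), nx - 1)  # clamp points on the far goal line
        row = min(int(y / width * ny), ny - 1)   # clamp points on the far touchline
        counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(nx)] for r in range(ny)]
```

Re-running the same events with `nx`, `ny` doubled or halved is a quick way to see how strongly bin size changes the picture.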

6. Heat Maps and KDE

  • Kernel Density Estimation replaces discrete bins with smooth density surfaces.
  • The bandwidth parameter controls smoothing: small = spiky (low bias, high variance), large = blurry (high bias, low variance).
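  • Silverman's rule of thumb, $h \approx 1.06 \sigma N^{-1/5}$ with $\sigma$ the standard deviation along one axis, provides a reasonable automatic bandwidth. It is derived for one-dimensional data; the two-dimensional analogue scales as $N^{-1/6}$.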
  • Start with bandwidths of 5-10 m for event data; refine as needed.
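
A Gaussian KDE evaluated at a single pitch location can be sketched as follows. This is a minimal sketch, assuming a diagonal bandwidth matrix $H = \mathrm{diag}(h_x^2, h_y^2)$, with the one-dimensional rule of thumb applied separately to each axis (a common simplification). The names `silverman_bandwidth` and `kde` are hypothetical; in practice a library such as SciPy's `gaussian_kde` would be used.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * N^(-1/5) for one axis."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def kde(points, x, y, hx, hy):
    """Gaussian product-kernel density estimate at (x, y).

    Implements f(x, y) = (1/N) * sum_i K_H(x - x_i, y - y_i) with
    H = diag(hx^2, hy^2), so each kernel is 1/(2*pi*hx*hy) * exp(-z^2 / 2).
    """
    norm = 1.0 / (2 * math.pi * hx * hy * len(points))
    total = 0.0
    for xi, yi in points:
        zx = (x - xi) / hx
        zy = (y - yi) / hy
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total
```

Evaluating `kde` over a regular grid of `(x, y)` locations produces the smooth surface that replaces binned counts. Comparing a small bandwidth with a large one makes the bias-variance trade-off visible directly.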

7. Pitch Control

  • Players exert influence over a region, not just a single point.
  • Voronoi tessellation is the simplest model: each point belongs to the nearest player.
  • Velocity-aware models replace distance with time-to-reach, incorporating speed and direction.
  • Pitch control $PC(x,y) \in [0,1]$ is the probability that a team would gain possession if the ball were moved to $(x,y)$.
  • Combining pitch control with Expected Threat yields a spatial value metric.
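
The step from Voronoi control to a velocity-aware model can be sketched for a single target point. This is a minimal sketch, not a published model. The time-to-reach rule (drift at current velocity during a reaction time, then run straight at a fixed top speed), the weight $p_i = e^{-t_i/\tau}$ plugged into the ratio listed under Essential Formulas, and every parameter value are illustrative assumptions. Player dicts with `pos` and `vel` keys are hypothetical.

```python
import math


def time_to_reach(player, target, max_speed=7.0, reaction=0.7):
    """Hypothetical time-to-reach (s): drift at current velocity for `reaction`
    seconds, then run in a straight line at `max_speed` (m/s) to the target."""
    (px, py), (vx, vy) = player["pos"], player["vel"]
    rx, ry = px + vx * reaction, py + vy * reaction
    return reaction + math.dist((rx, ry), target) / max_speed


def nearest_team(target, team_a, team_b):
    """Voronoi-style control: the team of the closest player wins the point."""
    da = min(math.dist(p["pos"], target) for p in team_a)
    db = min(math.dist(p["pos"], target) for p in team_b)
    return "A" if da <= db else "B"


def pitch_control_a(target, team_a, team_b, tau=0.5):
    """PC_A = sum_A p_i / (sum_A p_i + sum_B p_j), assuming p_i = exp(-t_i / tau)."""
    pa = sum(math.exp(-time_to_reach(p, target) / tau) for p in team_a)
    pb = sum(math.exp(-time_to_reach(p, target) / tau) for p in team_b)
    return pa / (pa + pb)


# Equidistant players: Voronoi gives the tie to A, but B is already running
# towards the point, so the velocity-aware model favours B.
a = [{"pos": (40.0, 34.0), "vel": (0.0, 0.0)}]
b = [{"pos": (60.0, 34.0), "vel": (-5.0, 0.0)}]
print(nearest_team((50.0, 34.0), a, b), pitch_control_a((50.0, 34.0), a, b))
```

This example shows why Voronoi diagrams are not true pitch control: distance alone ignores where players are already heading.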

Essential Formulas

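Concept               Formula
Normalization         $x_{\text{norm}} = x / L$, $y_{\text{norm}} = y / W$
Affine transform      $x' = s_x \cdot x + t_x$, $y' = s_y \cdot y + t_y$
Shot angle            $\theta = \arctan\!\left(\frac{y_{g2}-y}{x_g-x}\right) - \arctan\!\left(\frac{y_{g1}-y}{x_g-x}\right)$
2D Gaussian KDE       $\hat{f}(x,y) = \frac{1}{N}\sum_{i=1}^{N} K_H(x-x_i,\, y-y_i)$
Silverman bandwidth   $h \approx 1.06\,\sigma N^{-1/5}$
Pitch control         $PC_A = \frac{\sum_{i\in A} p_i}{\sum_{i\in A} p_i + \sum_{j\in B} p_j}$

Here $L$ and $W$ are the pitch length and width, $x_g$ is the goal line, $y_{g1} < y_{g2}$ are the goalpost positions, $K_H$ is a Gaussian kernel with bandwidth matrix $H$, and $p_i$ is a model-specific control weight for player $i$.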
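
The shot-angle formula translates directly into code. This is a minimal sketch, assuming a 105 m x 68 m frame with the attacked goal line at $x_g = 105$ and posts 7.32 m apart centred on $y = 34$. The name `shot_angle` and its defaults are hypothetical. Using `math.atan2` gives the same value as the arctangent formula whenever $x < x_g$, and it avoids dividing by zero for shots taken from the goal line.

```python
import math

GOAL_X = 105.0                                   # assumed goal line (105 m pitch)
POST_LOW, POST_HIGH = 34.0 - 3.66, 34.0 + 3.66   # posts 7.32 m apart, centred on y = 34


def shot_angle(x, y, goal_x=GOAL_X, y_g1=POST_LOW, y_g2=POST_HIGH):
    """Angle (radians) subtended by the goal mouth at shot location (x, y).

    Implements theta = atan((y_g2 - y)/(x_g - x)) - atan((y_g1 - y)/(x_g - x))
    using atan2, which agrees for x < x_g and stays defined at x = x_g.
    """
    dx = goal_x - x
    return math.atan2(y_g2 - y, dx) - math.atan2(y_g1 - y, dx)
```

From the penalty spot, `shot_angle(94.0, 34.0)` equals $2\arctan(3.66/11)$, about 0.642 rad (roughly 36.8 degrees). The angle shrinks quickly as the shooter moves wide.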

Common Mistakes to Avoid

  1. Mixing coordinate systems without conversion.
  2. Ignoring the direction-of-play flag.
  3. Assuming all pitches are 105 x 68 m without checking.
  4. Choosing KDE bandwidth without considering sample size.
  5. Interpreting Voronoi diagrams as true pitch control.

Connections to Other Chapters

  • Chapter 7 (Event Data): The coordinate systems studied here are the spatial backbone of event data.
  • Chapter 11 (Expected Goals): Shot location in the coordinate system is the primary input to xG models.
  • Chapter 13 (Expected Threat): Spatial value models divide the pitch into a grid and assign threat values to each cell.
  • Chapter 18 (Tracking Data): Pitch control models require tracking data, i.e. the positions (and derived velocities) of all players, typically sampled at around 25 Hz.
  • Chapter 22 (Pressing Analytics): Defensive shape metrics quantify pressing intensity and compactness.