Chapter 6: Key Takeaways
The Soccer Pitch as a Coordinate System
1. Pitch Dimensions and Standardization
- The FIFA standard is 105 m x 68 m, but the Laws of the Game permit pitches from 90-120 m long and 45-90 m wide.
- Always verify and document the pitch dimensions underlying your data.
- Normalize coordinates to a unit square $[0,1] \times [0,1]$ when comparing across venues, then rescale to a standard reference frame.
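The normalize-then-rescale step can be sketched as follows. This is a minimal illustration, assuming positions are already in metres with the origin at one corner of the pitch; the function names and the choice of 105 x 68 m as the reference frame are assumptions made for the example.

```python
STANDARD_LENGTH_M = 105.0  # reference frame length (assumed for this sketch)
STANDARD_WIDTH_M = 68.0    # reference frame width (assumed for this sketch)


def normalize(x, y, length, width):
    """Map a position in metres on a venue of the given size onto [0, 1] x [0, 1]."""
    return x / length, y / width


def to_standard(x, y, length, width):
    """Rescale a venue position to the 105 x 68 m reference frame."""
    x_norm, y_norm = normalize(x, y, length, width)
    return x_norm * STANDARD_LENGTH_M, y_norm * STANDARD_WIDTH_M


# The centre spot of a 100 x 64 m pitch maps to the centre spot of the reference pitch.
centre = to_standard(50.0, 32.0, length=100.0, width=64.0)
```

Note that rescaling preserves relative position, not absolute distance: a 10 m pass on a short pitch becomes slightly longer in the reference frame.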
2. Coordinate Systems
| Provider | Dimensions | Origin | y-direction |
|---|---|---|---|
| StatsBomb | 120 x 80 | Top-left | Down |
| Opta | 100 x 100 | Bottom-left | Up |
| Wyscout | 100 x 100 | Top-left | Down |
| FIFA EPTS | 105 x 68 (metres) | Centre | Up |
- Every conversion between providers is an affine transformation: $x' = s_x \cdot x + t_x$, $y' = s_y \cdot y + t_y$.
- Route all conversions through a canonical system (e.g., standard metres) to reduce roughly $N^2$ pairwise converters to $2N$ (one forward and one inverse transform per provider).
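One way to route conversions through a canonical frame is sketched below. The canonical frame here (metres, 105 x 68, origin bottom-left, y pointing up) is an assumption of the example, as are the names `Affine`, `TO_METRES`, and `convert`; the provider scales and y-directions follow the table above. Direction of play is not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Per-axis affine map: x' = sx * x + tx, y' = sy * y + ty."""
    sx: float
    tx: float
    sy: float
    ty: float

    def apply(self, x, y):
        return self.sx * x + self.tx, self.sy * y + self.ty

    def inverse(self):
        # Solve x' = sx * x + tx for x (and likewise for y).
        return Affine(1 / self.sx, -self.tx / self.sx, 1 / self.sy, -self.ty / self.sy)


# Provider -> canonical metres. A negative sy flips a downward y-axis.
TO_METRES = {
    "statsbomb": Affine(105 / 120, 0.0, -68 / 80, 68.0),   # top-left origin, y down
    "opta": Affine(105 / 100, 0.0, 68 / 100, 0.0),         # bottom-left origin, y up
    "wyscout": Affine(105 / 100, 0.0, -68 / 100, 68.0),    # top-left origin, y down
}


def convert(x, y, source, target):
    """Convert between providers via the canonical frame (2N transforms in total)."""
    x_m, y_m = TO_METRES[source].apply(x, y)
    return TO_METRES[target].inverse().apply(x_m, y_m)


# The StatsBomb centre spot (60, 40) should land on the Opta centre spot.
opta_centre = convert(60, 40, "statsbomb", "opta")
```

Adding a new provider then means adding one entry to the dictionary rather than one converter per existing provider.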
3. Direction of Play
- Most providers orient attacks left-to-right, but always verify.
- Flip coordinates when aligning halves: $x' = L - x$, $y' = W - y$, where $L$ and $W$ are the pitch length and width in the working coordinate system.
- A misaligned direction of play is one of the most common data-pipeline bugs.
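The flip can be sketched as below, assuming events are dictionaries in metres on a 105 x 68 pitch. The `attacking_right` flag is a hypothetical field name standing in for whatever direction-of-play indicator a provider supplies.

```python
def flip(x, y, length=105.0, width=68.0):
    """Rotate a position 180 degrees about the centre spot: x' = L - x, y' = W - y."""
    return length - x, width - y


def align_left_to_right(events, length=105.0, width=68.0):
    """Return copies of the events with every attack oriented left-to-right.

    Each event is a dict with 'x', 'y', and a boolean 'attacking_right'
    (a hypothetical flag; real feeds name and encode this differently).
    """
    aligned = []
    for event in events:
        if event["attacking_right"]:
            aligned.append(dict(event))
        else:
            x, y = flip(event["x"], event["y"], length, width)
            aligned.append({**event, "x": x, "y": y, "attacking_right": True})
    return aligned


events = [
    {"x": 90.0, "y": 20.0, "attacking_right": True},
    {"x": 15.0, "y": 48.0, "attacking_right": False},  # same spot, other half
]
aligned = align_left_to_right(events)
```

Flipping both axes keeps left and right consistent from the attacking team's point of view; flipping only $x$ would mirror the wings.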
4. Zones and Regions
- Thirds: Defensive (0-35 m), Middle (35-70 m), Attacking (70-105 m).
- Five channels: Left wing, left half-space, centre, right half-space, right wing.
- Zone 14: The central area just outside the penalty box, a high-value zone for chance creation.
- Custom grids: A 6 x 4 or 6 x 5 grid provides a good balance of granularity and interpretability.
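Zone assignment reduces to integer binning, as the sketch below shows. It assumes metres on a 105 x 68 pitch with the origin bottom-left, y pointing up, and the team attacking toward increasing $x$, so the team's right wing sits at low $y$. The equal-width channels are a simplifying assumption; many analysts instead align channel boundaries with the penalty-area markings.

```python
THIRDS = ("defensive", "middle", "attacking")
# Ordered from y = 0 upward for a team attacking toward +x with y pointing up.
CHANNELS = ("right wing", "right half-space", "centre", "left half-space", "left wing")


def _bin(value, extent, n_bins):
    """Index of the equal-width bin containing value, clamped to the pitch."""
    index = int(value / extent * n_bins)
    return min(max(index, 0), n_bins - 1)


def third(x, length=105.0):
    return THIRDS[_bin(x, length, 3)]


def channel(y, width=68.0):
    return CHANNELS[_bin(y, width, 5)]


def grid_cell(x, y, nx=6, ny=4, length=105.0, width=68.0):
    """(column, row) of the cell containing (x, y) in an nx-by-ny grid."""
    return _bin(x, length, nx), _bin(y, width, ny)


location = (80.0, 34.0)
labels = (third(location[0]), channel(location[1]), grid_cell(*location))
```

Clamping places events recorded exactly on the far touchline or goal line in the last bin rather than out of range.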
5. Visualization
- Scatter plots show individual events; arrow plots show passes with direction.
- Binned heat maps aggregate events into grid cells; choice of bin size strongly affects readability.
- Use `mplsoccer` for production-quality pitch drawings with consistent coordinate handling.
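To see how bin size changes a binned heat map, the counting step can be written in plain Python before any plotting. This is an illustrative sketch with made-up touch locations, assuming metres on a 105 x 68 pitch with all points inside the pitch; `bin_counts` is a name chosen for the example, not a library function.

```python
def bin_counts(points, nx, ny, length=105.0, width=68.0):
    """Count points per cell; counts[row][col] with row indexed along y."""
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        col = min(int(x / length * nx), nx - 1)  # clamp points on the far edge
        row = min(int(y / width * ny), ny - 1)
        counts[row][col] += 1
    return counts


# Made-up touch locations, for illustration only.
touches = [(10, 10), (12, 14), (50, 34), (90, 60), (100, 64)]

coarse = bin_counts(touches, nx=3, ny=2)  # few large cells: smooth but imprecise
fine = bin_counts(touches, nx=6, ny=4)    # more cells: sharper but sparser
```

With only a handful of events, the finer grid leaves most cells empty, which is the readability trade-off described above.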
6. Heat Maps and KDE
- Kernel Density Estimation replaces discrete bins with smooth density surfaces.
- The bandwidth parameter controls smoothing: small = spiky (low bias, high variance), large = blurry (high bias, low variance).
- Silverman's rule of thumb, $h \approx 1.06 \sigma N^{-1/5}$ (the one-dimensional form, applied per axis), provides a reasonable automatic starting bandwidth.
- Start with bandwidths of 5-10 m for event data; refine as needed.
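The bandwidth rule and the density estimate can be sketched together as follows. This example assumes a product Gaussian kernel with a separate bandwidth per axis, which is one common simplification of the general bandwidth matrix, and it uses the sample standard deviation for $\sigma$. The function names are chosen for illustration.

```python
import math
import statistics


def silverman_bandwidth(values):
    """One-dimensional rule of thumb: h = 1.06 * sigma * N^(-1/5).

    sigma is the sample standard deviation (n - 1 denominator).
    """
    sigma = statistics.stdev(values)
    return 1.06 * sigma * len(values) ** (-1 / 5)


def gaussian_kde_2d(points, hx, hy):
    """Return a function f(x, y) giving the kernel density estimate.

    Uses a product of two 1-D Gaussian kernels with bandwidths hx and hy.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def density(x, y):
        total = 0.0
        for xi, yi in points:
            ux = (x - xi) / hx
            uy = (y - yi) / hy
            total += math.exp(-0.5 * (ux * ux + uy * uy))
        return norm * total

    return density


# Made-up shot locations in metres, for illustration only.
shots = [(88.0, 30.0), (92.0, 36.0), (95.0, 34.0), (90.0, 40.0), (85.0, 28.0)]
hx = silverman_bandwidth([x for x, _ in shots])
hy = silverman_bandwidth([y for _, y in shots])
density = gaussian_kde_2d(shots, hx, hy)
```

Evaluating `density` on a grid of pitch locations gives the smooth surface that replaces the binned counts.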
7. Pitch Control
- Players exert influence over a region, not just a single point.
- Voronoi tessellation is the simplest model: each point belongs to the nearest player.
- Velocity-aware models replace distance with time-to-reach, incorporating speed and direction.
- Pitch control $PC(x,y) \in [0,1]$ represents the probability that a team would win possession if the ball arrived at $(x,y)$.
- Combining pitch control with Expected Threat yields a spatial value metric.
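The difference between a Voronoi assignment and a velocity-aware model can be sketched as below. Everything beyond the ratio formula $PC_A = \sum_{i \in A} p_i / (\sum_{i \in A} p_i + \sum_{j \in B} p_j)$ is an assumption of the example: the time-to-reach model (constant velocity during a reaction time, then a straight run at maximum speed), the parameter values, and the exponential influence $p_i = e^{-t_i/\tau}$ are simple illustrative choices, not a calibrated model.

```python
import math


def nearest_team(target, players):
    """Voronoi assignment: the team of the player closest to the target."""
    return min(players, key=lambda p: math.dist(target, p["pos"]))["team"]


def time_to_reach(target, player, reaction_time=0.7, max_speed=5.0):
    """Assumed model: keep current velocity for reaction_time, then run at max_speed."""
    px, py = player["pos"]
    vx, vy = player["vel"]
    projected = (px + vx * reaction_time, py + vy * reaction_time)
    return reaction_time + math.dist(target, projected) / max_speed


def pitch_control(target, players, team, tau=0.5):
    """PC for `team` at `target`, with influence p_i = exp(-t_i / tau) (an assumption)."""
    influence = {}
    for player in players:
        weight = math.exp(-time_to_reach(target, player) / tau)
        influence[player["team"]] = influence.get(player["team"], 0.0) + weight
    return influence.get(team, 0.0) / sum(influence.values())


players = [
    {"team": "A", "pos": (42.0, 34.0), "vel": (0.0, 0.0)},   # closer, but standing still
    {"team": "B", "pos": (60.0, 34.0), "vel": (-5.0, 0.0)},  # farther, already sprinting in
]
target = (50.0, 34.0)
voronoi_owner = nearest_team(target, players)
control_a = pitch_control(target, players, "A")
```

In this made-up situation the Voronoi model awards the location to team A, while the velocity-aware model gives team B the larger share because its player arrives sooner.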
Essential Formulas
| Concept | Formula |
|---|---|
| Normalization | $x_{\text{norm}} = x / L$ |
| Affine transform | $x' = s_x \cdot x + t_x$ |
| Shot angle | $\theta = \arctan\!\left(\frac{y_{g2}-y}{x_g-x}\right) - \arctan\!\left(\frac{y_{g1}-y}{x_g-x}\right)$ |
| 2D Gaussian KDE | $\hat{f}(x,y) = \frac{1}{N}\sum_i K_H(x-x_i, y-y_i)$ |
| Silverman bandwidth | $h \approx 1.06\sigma N^{-1/5}$ |
| Pitch control | $PC_A = \frac{\sum_{i\in A} p_i}{\sum_{i\in A} p_i + \sum_{j\in B} p_j}$ |
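As a worked instance of the shot-angle formula, the sketch below computes the angle subtended by the goal mouth from a shot location. It assumes metres on a 105 x 68 pitch attacking toward $x_g = 105$, with the posts at $y_{g1}$ and $y_{g2}$ placed 3.66 m either side of the centre line (the goal is 7.32 m wide). It uses `atan2` in place of `arctan` of a ratio, which gives the same value when the shooter is in front of the goal line and avoids division by zero on it.

```python
import math

GOAL_X = 105.0               # x-coordinate of the goal line (assumed frame)
POST_LOW = 34.0 - 3.66       # y_g1
POST_HIGH = 34.0 + 3.66      # y_g2


def shot_angle(x, y):
    """Angle in radians subtended by the goal mouth at (x, y)."""
    dx = GOAL_X - x
    return math.atan2(POST_HIGH - y, dx) - math.atan2(POST_LOW - y, dx)


# From the penalty spot, 11 m out on the centre line.
penalty_spot_angle = math.degrees(shot_angle(94.0, 34.0))

# The same distance from the goal line but wide of the posts: a narrower view.
wide_angle = math.degrees(shot_angle(94.0, 14.0))
```

The angle shrinks both with distance and with lateral offset, which is why it is often used as a shot-location feature alongside distance.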
Common Mistakes to Avoid
- Mixing coordinate systems without conversion.
- Ignoring the direction-of-play flag.
- Assuming all pitches are 105 x 68 m without checking.
- Choosing KDE bandwidth without considering sample size.
- Interpreting Voronoi diagrams as true pitch control.
Connections to Other Chapters
- Chapter 7 (Event Data): The coordinate systems studied here are the spatial backbone of event data.
- Chapter 11 (Expected Goals): Shot location in the coordinate system is the primary input to xG models.
- Chapter 13 (Expected Threat): Spatial value models divide the pitch into a grid and assign threat values to each cell.
- Chapter 18 (Tracking Data): Pitch control models require player tracking data, typically sampled at 25 Hz.
- Chapter 22 (Pressing Analytics): Defensive shape metrics quantify pressing intensity and compactness.