Chapter 6 Exercises

Part A: Conceptual Questions (Difficulty: *)

A.1 State the FIFA-recommended dimensions for an international soccer pitch. Why do the Laws of the Game permit a range of dimensions rather than a single fixed size?

A.2 Explain in your own words why different data providers use different coordinate systems. What historical or practical reasons might drive this?

A.3 Define the term affine transformation. Which specific types of affine transformation are needed to convert between soccer data provider coordinate systems?

A.4 What is the difference between a normalized coordinate system (unit square) and a metric coordinate system? Give one advantage and one disadvantage of each.

A.5 Explain why the direction-of-play convention matters when analyzing spatial data. Describe a concrete scenario where ignoring it would produce misleading results.

A.6 What is Zone 14? Why is it considered tactically important?

A.7 Define the "half-spaces" in the five-channel model of the pitch. Why are they considered more dangerous than the wings or the centre?

A.8 Explain the bias-variance trade-off in kernel density estimation using the analogy of blurring a photograph.

A.9 What is the fundamental assumption of a Voronoi tessellation when applied to player positions? List two ways this assumption is unrealistic.

A.10 In one or two sentences, explain how pitch control differs from a simple Voronoi diagram.

Part B: Computational Problems (Difficulty: **)

B.1 A shot is taken from position $(92, 30)$ in standard metres (105 x 68, origin bottom-left). The goal posts are at $(105, 30.34)$ and $(105, 37.66)$. Compute the shot angle $\theta$ in degrees using the formula from Section 6.1.2.

B.2 Convert the following StatsBomb coordinates to standard metres (105 x 68, origin bottom-left, $y$-up):

- (a) $(60, 40)$
- (b) $(100, 20)$
- (c) $(120, 0)$
- (d) $(0, 80)$

B.3 Convert the following Opta coordinates to standard metres:

- (a) $(50, 50)$
- (b) $(100, 100)$
- (c) $(75, 25)$

B.4 Convert the following Wyscout coordinates to standard metres:

- (a) $(50, 50)$
- (b) $(0, 0)$
- (c) $(100, 100)$

B.5 A tracking data file uses FIFA EPTS coordinates. A player is at $(-20.0, 15.0)$. Convert this to:

- (a) Standard metres (origin bottom-left)
- (b) StatsBomb coordinates
- (c) Opta coordinates

B.6 Write the $2 \times 2$ scale matrix and translation vector for converting Wyscout coordinates to StatsBomb coordinates. Verify your answer by converting the Wyscout point $(50, 50)$.

B.7 An analyst has event data from a stadium with a pitch measuring 100 m x 65 m. The coordinates are recorded in raw metres on that pitch. Normalize the coordinates to the unit square, then rescale to the standard 105 x 68 m pitch. For the point $(75, 50)$, what are the final coordinates?
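The two-step procedure in B.7 can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `rescale_to_standard` is hypothetical, it assumes both pitches share the same origin and axis orientation, and it is demonstrated on the centre spot rather than on the exercise point.

```python
from typing import Tuple

STANDARD_LENGTH = 105.0  # metres
STANDARD_WIDTH = 68.0    # metres


def rescale_to_standard(
    x: float, y: float, pitch_length: float, pitch_width: float
) -> Tuple[float, float]:
    """Normalize raw metres to the unit square, then rescale to 105 x 68."""
    # Step 1: divide by the actual pitch size to obtain values in [0, 1].
    u = x / pitch_length
    v = y / pitch_width
    # Step 2: multiply by the standard pitch size.
    return u * STANDARD_LENGTH, v * STANDARD_WIDTH


# The centre spot of a 100 x 65 pitch should land on the standard centre spot.
centre = rescale_to_standard(50.0, 32.5, pitch_length=100.0, pitch_width=65.0)
```

Landmarks that sit at fixed proportions of the pitch (centre spot, corners, halfway line) are preserved by this mapping; markings with fixed metric sizes, such as the penalty area, are not.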

B.8 A player's touch locations in StatsBomb coordinates during a match are: $(80, 30)$, $(85, 35)$, $(90, 40)$, $(95, 25)$, $(100, 45)$, $(110, 38)$. Assign each touch to a pitch third (defensive, middle, attacking) after converting to standard metres.

B.9 Using the five-channel model from Section 6.3.3, assign each of the following positions (in standard metres) to a channel:

- (a) $(50, 5)$
- (b) $(50, 20)$
- (c) $(50, 34)$
- (d) $(50, 55)$
- (e) $(50, 65)$

B.10 Compute the area (in square metres) of:

- (a) The penalty area
- (b) The goal area (6-yard box)
- (c) Zone 14 (as defined in Section 6.3.4)
- (d) Then express each of the areas in (a) to (c) as a percentage of the total pitch area (105 x 68 = 7140 m$^2$).

Part C: Programming Problems (Difficulty: ***)

C.1 Write a Python function convert_coordinates(x, y, source, target) that converts between all four systems (StatsBomb, Opta, Wyscout, FIFA EPTS) by routing through standard metres. Include type hints and a docstring. Test it with at least three example conversions.

C.2 Write a function flip_direction(x, y, pitch_length, pitch_width) that flips coordinates to account for direction of play. Test it by flipping StatsBomb coordinates and verifying that a point near one goal maps to the corresponding point near the other goal.

C.3 Generate 200 random event positions uniformly distributed across the pitch (in standard metres). Write code to:

- (a) Assign each event to a pitch third and print the count per third.
- (b) Assign each event to one of the 5 channels and print the count per channel.
- (c) Assign each event to a cell in a 6 x 4 grid and print the grid as a 2D array of counts.

C.4 Using mplsoccer, draw a pitch and plot the following elements:

- The centre circle
- Both penalty areas
- The penalty spots
- Custom markers at the positions $(30, 34)$, $(52.5, 34)$, and $(80, 34)$ (in standard metres)

Adjust your pitch_type accordingly or use a custom pitch.

C.5 Write a function that takes a DataFrame of pass events (with columns x, y, end_x, end_y, outcome) and produces a pass map on a pitch. Color completed passes green and incomplete passes red. Include arrows and use appropriate transparency.

C.6 Using scipy.stats.gaussian_kde, write a function that:

- (a) Takes an array of $(x, y)$ positions and computes a KDE surface over the pitch.
- (b) Plots the surface as a filled contour on a pitch drawn with mplsoccer.
- (c) Accepts a bandwidth parameter and demonstrates the effect of three different bandwidth values on the same data.

C.7 Simulate the positions of 22 players on a pitch (11 per team). Compute and plot the Voronoi tessellation using scipy.spatial.Voronoi. Color each cell according to the team. Clip the tessellation to the pitch boundaries.

C.8 Using your simulated player positions from C.7, compute the simple pitch control surface from Section 6.6.5. Plot it as a filled contour and overlay the player positions. Experiment with the sigma parameter and describe how it changes the result.

C.9 Write a function that computes the total "controlled area" for each team, defined as the area of the region where that team's pitch control value exceeds 0.5:

$$A_{\text{team}} = \iint_{\text{pitch}} \mathbf{1}[PC_{\text{team}}(x, y) > 0.5] \, dx \, dy$$

Apply it to your simulated data from C.8 and report the area each team controls.
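One way to approximate the double integral in C.9 is a midpoint Riemann sum: count the grid cells whose centre satisfies the indicator and multiply by the cell area. The sketch below assumes a standard 105 x 68 pitch and uses a hypothetical stand-in surface, `toy_pc`, in place of the pitch control model from Section 6.6.5. It uses only the standard library; a vectorized NumPy version would be the more usual choice on real data.

```python
from typing import Callable

PITCH_LENGTH = 105.0  # metres
PITCH_WIDTH = 68.0    # metres


def controlled_area(
    pc: Callable[[float, float], float],
    nx: int = 210,
    ny: int = 136,
) -> float:
    """Approximate the area where pc(x, y) > 0.5 with a midpoint Riemann sum."""
    dx = PITCH_LENGTH / nx
    dy = PITCH_WIDTH / ny
    cell_area = dx * dy
    cells_controlled = 0
    for i in range(nx):
        x = (i + 0.5) * dx  # x-coordinate of the cell centre
        for j in range(ny):
            y = (j + 0.5) * dy  # y-coordinate of the cell centre
            if pc(x, y) > 0.5:
                cells_controlled += 1
    return cells_controlled * cell_area


def toy_pc(x: float, y: float) -> float:
    """Hypothetical stand-in surface: the team fully controls its own half."""
    return 1.0 if x < PITCH_LENGTH / 2 else 0.0


area = controlled_area(toy_pc)  # half the pitch for this stand-in surface
```

The grid resolution trades accuracy for speed: cells that straddle the 0.5 contour are counted entirely in or out, so a finer grid reduces the error along the boundary.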

C.10 Download the StatsBomb open data for a match of your choice (e.g., a 2018 World Cup match). Extract all pass events for one team. Produce:

- (a) A scatter plot of pass origins.
- (b) An arrow map of passes.
- (c) A binned heat map (6 x 5 grid) of pass origins.
- (d) A KDE heat map of pass origins.

Arrange all four plots in a 2 x 2 figure.

Part D: Advanced / Open-Ended Problems (Difficulty: ****)

D.1 Pitch-size normalization study. Research the actual pitch dimensions for at least five Premier League stadiums. Write code that takes event data in raw metres for each stadium, normalizes to the unit square, and rescales to 105 x 68. Quantify the maximum coordinate shift (in metres) that normalization produces for each stadium.

D.2 Custom zone optimization. Instead of using fixed grid zones, propose a data-driven zoning scheme. Cluster a set of shot locations (from StatsBomb open data) into zones using k-means clustering. Compare the within-zone variance of expected goals (xG) between your data-driven zones and a regular 6 x 5 grid. Which approach produces more homogeneous zones?

D.3 Anisotropic KDE. A Gaussian KDE with a single scalar bandwidth smooths equally in every direction (an isotropic kernel). In soccer, a player's activity is often directional (e.g., a winger hugs the touchline, so their touches are compressed laterally but extended longitudinally). Implement a KDE with an anisotropic, diagonal bandwidth matrix:

$$H = \begin{pmatrix} h_x^2 & 0 \\ 0 & h_y^2 \end{pmatrix}$$

where $h_x \neq h_y$. Demonstrate its effect on a winger's touch data.
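With a diagonal $H$, the Gaussian kernel factorizes into a product of two one-dimensional Gaussians, one per axis. The sketch below evaluates the resulting density at a single point; it is an illustration under that assumption, not a full solution. The function name `anisotropic_kde` and the touch coordinates are hypothetical, and only the standard library is used.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def anisotropic_kde(
    x: float, y: float, points: Sequence[Point], hx: float, hy: float
) -> float:
    """Density at (x, y) from a Gaussian KDE with H = diag(hx^2, hy^2)."""
    # Normalizing constant of a 2D Gaussian with covariance H, averaged over n points.
    norm = 1.0 / (2.0 * math.pi * hx * hy * len(points))
    total = 0.0
    for px, py in points:
        zx = (x - px) / hx  # standardized offset along the pitch length
        zy = (y - py) / hy  # standardized offset across the pitch width
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total


# Made-up touches near one touchline, in standard metres (illustrative only).
touches = [(70.0, 4.0), (80.0, 5.0), (90.0, 3.0), (85.0, 6.0)]
density_on_flank = anisotropic_kde(80.0, 5.0, touches, hx=10.0, hy=3.0)
density_central = anisotropic_kde(80.0, 34.0, touches, hx=10.0, hy=3.0)
```

Choosing $h_x > h_y$ here spreads the estimate along the touchline while keeping it narrow across the pitch, which is the effect the exercise asks you to demonstrate.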

D.4 Velocity-aware pitch control. Extend the simple pitch control model from Section 6.6.5 to incorporate player velocity vectors. Assume each player has a velocity $(v_x, v_y)$, and model their time-to-reach as:

$$T_i(x, y) = \frac{\|(x, y) - (x_i + v_{xi} \cdot \Delta t, y_i + v_{yi} \cdot \Delta t)\|}{v_{\max}}$$

where $\Delta t$ is a short lookahead time (e.g., 0.5 s) and $v_{\max} = 8$ m/s. Compute and visualize the resulting pitch control surface for a scenario with two attackers running into space.
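The time-to-reach formula above can be evaluated directly. The following is a minimal sketch for a single player and a single target point; the function name `time_to_reach` and the example numbers are hypothetical, and the defaults follow the values suggested in the exercise ($\Delta t = 0.5$ s, $v_{\max} = 8$ m/s).

```python
import math


def time_to_reach(
    x: float,
    y: float,
    px: float,
    py: float,
    vx: float,
    vy: float,
    dt: float = 0.5,
    v_max: float = 8.0,
) -> float:
    """Time for a player at (px, py) with velocity (vx, vy) to reach (x, y).

    The player's position is first projected forward by dt seconds along the
    current velocity; the remaining distance is then covered at v_max.
    """
    projected_x = px + vx * dt
    projected_y = py + vy * dt
    return math.hypot(x - projected_x, y - projected_y) / v_max


# Illustrative case: the same target, with and without a run towards it.
t_running = time_to_reach(61.0, 34.0, px=50.0, py=34.0, vx=6.0, vy=0.0)
t_standing = time_to_reach(61.0, 34.0, px=50.0, py=34.0, vx=0.0, vy=0.0)
```

A player already moving towards the target gets a shorter time than a stationary player at the same position, which is what shifts the pitch control surface in favour of runners.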

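D.5 Heat map comparison metric. Given two heat maps $f_1(x, y)$ and $f_2(x, y)$ (e.g., for two players), each normalized to integrate to 1 over the pitch, define a similarity metric using the Bhattacharyya coefficient, which ranges from 0 (no overlap) to 1 (identical distributions):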

$$BC = \int \sqrt{f_1(x, y) \cdot f_2(x, y)} \, dx \, dy$$

Implement this metric and compute it for pairs of players from the same team. Which positions tend to have the most similar spatial profiles? Which are most distinct?
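On binned heat maps the integral reduces to a sum over cells: if each grid is normalized so that its cells sum to 1, the cell-area factors cancel and $BC = \sum_k \sqrt{p_k q_k}$. The sketch below assumes both heat maps are count grids on the same binning; the function name `bhattacharyya` and the small example grids are hypothetical.

```python
import math
from typing import Sequence

Grid = Sequence[Sequence[float]]


def bhattacharyya(h1: Grid, h2: Grid) -> float:
    """Bhattacharyya coefficient of two binned heat maps on the same grid."""
    total1 = sum(sum(row) for row in h1)
    total2 = sum(sum(row) for row in h2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("each heat map needs a positive total count")
    coefficient = 0.0
    for row1, row2 in zip(h1, h2):
        for count1, count2 in zip(row1, row2):
            # Convert counts to cell probabilities before taking the product.
            coefficient += math.sqrt((count1 / total1) * (count2 / total2))
    return coefficient


# Made-up 2 x 2 count grids (illustrative only).
left_sided = [[4.0, 0.0], [4.0, 0.0]]
right_sided = [[0.0, 4.0], [0.0, 4.0]]
bc_same = bhattacharyya(left_sided, left_sided)
bc_disjoint = bhattacharyya(left_sided, right_sided)
```

For KDE surfaces rather than counts, the same function can be applied to the density values sampled on a common grid, since the normalization step rescales them to cell probabilities.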