Chapter 18 Exercises: Tracking Data Analytics
Section 18.1 --- Understanding Tracking Data
Exercise 18.1. A tracking system records data at 25 Hz for a 95-minute match (including stoppage time). Calculate the total number of frames recorded. If each frame contains position data for 20 outfield players, 2 goalkeepers, and 1 ball entity, how many individual position records are generated for the entire match?
Exercise 18.2. A data provider uses a coordinate system with the origin at the bottom-left corner of the pitch, with the $x$-axis running along the length (0 to 105 m) and the $y$-axis along the width (0 to 68 m). Convert this to a center-origin system where coordinates range from $(-52.5, -34)$ to $(52.5, 34)$. Write the transformation equations and apply them to the point $(75, 20)$.
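As a starting point for Exercise 18.2, the change of origin can be sketched in Python. The function name `to_center_origin` and its default pitch dimensions are illustrative assumptions. The sketch only translates the axes; it assumes both systems use meters and the same axis directions.

```python
def to_center_origin(x, y, length=105.0, width=68.0):
    """Shift bottom-left-origin pitch coordinates to a center origin.

    Assumes both systems share units (meters) and axis directions,
    so the conversion is a pure translation by half the pitch size.
    """
    return x - length / 2.0, y - width / 2.0


# The two opposite corners should map onto the stated center-origin range.
bottom_left = to_center_origin(0.0, 0.0)
top_right = to_center_origin(105.0, 68.0)
```

If a provider flips the attacking direction between halves, an additional sign change would be needed; that case is not handled here.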
Exercise 18.3. Raw tracking data for a player contains the following $(x, y)$ coordinates over 5 consecutive frames at 25 Hz: $(12.0, 7.0)$, $(12.1, 7.05)$, $(12.3, 7.15)$, $(12.2, 7.10)$, $(12.5, 7.25)$. The fourth frame appears to be an anomaly (the player does not actually reverse direction). Apply linear interpolation to correct the fourth frame using the third and fifth frames, and compare the corrected trajectory to the original.
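One possible way to set up the correction in Exercise 18.3 is shown below. The helper `interpolate_frame` is a hypothetical name. With evenly spaced frames, the replaced frame sits halfway in time between its neighbours, so the interpolation weight is 0.5.

```python
def interpolate_frame(p_prev, p_next, fraction=0.5):
    """Linearly interpolate between two (x, y) positions.

    fraction is the relative time of the missing frame between the two
    known frames; 0.5 is the midpoint for evenly spaced frames.
    """
    return tuple(a + fraction * (b - a) for a, b in zip(p_prev, p_next))


frames = [(12.0, 7.0), (12.1, 7.05), (12.3, 7.15), (12.2, 7.10), (12.5, 7.25)]

# Replace the fourth frame (index 3) using the third and fifth frames.
corrected = list(frames)
corrected[3] = interpolate_frame(frames[2], frames[4])
```

Comparing `frames` and `corrected` side by side shows whether the $x$-coordinate still reverses direction after the correction.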
Exercise 18.4. Explain why smoothing is necessary before computing velocity and acceleration from positional tracking data. Describe the trade-off between noise reduction and signal distortion when choosing the window size for a Savitzky-Golay filter.
Exercise 18.5. Write a Python function that takes a DataFrame of tracking data with columns ['frame_id', 'timestamp', 'player_id', 'x', 'y'] and performs the following preprocessing steps: (a) identifies and reports missing frames for each player, (b) interpolates missing positions using cubic spline interpolation, and (c) applies Savitzky-Golay smoothing to the resulting positions.
Section 18.2 --- Physical Performance Metrics
Exercise 18.6. Using the speed zone definitions provided in Section 18.2.2, classify the following instantaneous speeds: (a) 3.5 m/s, (b) 8.8 m/s, (c) 1.2 m/s, (d) 7.0 m/s, (e) 10.3 m/s.
Exercise 18.7. A player's speed time series (in m/s) over 10 seconds at 1 Hz is: $[2.0, 3.5, 7.0, 8.5, 9.5, 9.8, 9.2, 8.0, 4.5, 3.0]$. Identify all sprint events using a threshold of 9.0 m/s and a minimum duration of 1.0 second. Report the start time, end time, duration, and peak speed of each detected sprint.
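A minimal sketch of threshold-based sprint detection for Exercise 18.7 follows. The conventions are assumptions rather than the chapter's definitions: a sample counts as sprinting when its speed is at or above the threshold, sample $i$ is taken at time $i / f_s$, and a run of $n$ consecutive samples is taken to last $n / f_s$ seconds.

```python
def detect_sprints(speeds, threshold, min_duration, sample_rate_hz):
    """Return sprint events with start time, end time, duration, and peak speed.

    Assumed conventions: speed >= threshold counts as sprinting, sample i is
    at time i / sample_rate_hz, and n consecutive samples last n / sample_rate_hz s.
    """
    dt = 1.0 / sample_rate_hz
    events = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, s in enumerate(list(speeds) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            duration = (i - start) * dt
            if duration >= min_duration:
                events.append({
                    "start_time": start * dt,
                    "end_time": (i - 1) * dt,  # time of the last sprinting sample
                    "duration": duration,
                    "peak_speed": max(speeds[start:i]),
                })
            start = None
    return events


speeds = [2.0, 3.5, 7.0, 8.5, 9.5, 9.8, 9.2, 8.0, 4.5, 3.0]
sprints = detect_sprints(speeds, threshold=9.0, min_duration=1.0, sample_rate_hz=1.0)
```

An alternative convention defines duration as the end time minus the start time, which yields a value one sample period shorter. State which convention you use in your answer.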
Exercise 18.8. Compute the metabolic power for a player moving at $s = 7.0$ m/s with a horizontal acceleration of $a_h = 2.5$ m/s$^2$. Use the equivalent slope model described in Section 18.2.4 with $g = 9.81$ m/s$^2$ and efficiency factor $\eta = 0.25$.
Exercise 18.9. A center midfielder covers the following distances in each speed zone during a match: Walking: 4,200 m; Jogging: 4,800 m; Running: 1,800 m; High-speed running: 900 m; Sprinting: 320 m. Calculate (a) total distance, (b) high-speed running distance (HSRD), (c) the percentage of total distance covered at high intensity (high-speed running + sprinting).
Exercise 18.10. Explain the concept of individualized speed thresholds. For a player with $v_{\max} = 11.2$ m/s, compute the individualized thresholds for high-speed running ($>75\%$ of $v_{\max}$) and sprinting ($>90\%$ of $v_{\max}$). How do these compare to the standard fixed thresholds?
Section 18.3 --- Speed and Acceleration Analysis
Exercise 18.11. Given the following position data at 25 Hz, compute the velocity and speed at $t = 0.08$ s using central finite differences:
| $t$ (s) | $x$ (m) | $y$ (m) |
|---|---|---|
| 0.00 | 22.00 | 12.00 |
| 0.04 | 22.18 | 12.08 |
| 0.08 | 22.40 | 12.20 |
| 0.12 | 22.66 | 12.36 |
| 0.16 | 22.96 | 12.56 |
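The central-difference calculation for the table above can be sketched as follows. The function name `central_velocity` is illustrative. It uses the samples on either side of index `i`, so it applies only to interior samples.

```python
import math


def central_velocity(times, xs, ys, i):
    """Central-difference velocity (vx, vy) and speed at interior sample i."""
    dt = times[i + 1] - times[i - 1]  # spans two frame intervals
    vx = (xs[i + 1] - xs[i - 1]) / dt
    vy = (ys[i + 1] - ys[i - 1]) / dt
    return vx, vy, math.hypot(vx, vy)


t = [0.00, 0.04, 0.08, 0.12, 0.16]
x = [22.00, 22.18, 22.40, 22.66, 22.96]
y = [12.00, 12.08, 12.20, 12.36, 12.56]

# t = 0.08 s is index 2 in the table.
vx, vy, speed = central_velocity(t, x, y, i=2)
```

At the first and last samples a one-sided (forward or backward) difference would be needed instead.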
Exercise 18.12. Using the velocity data computed in Exercise 18.11, compute the acceleration vector at $t = 0.08$ s. Decompose it into tangential and normal components. Interpret the physical meaning of each component.
Exercise 18.13. Write a Python function that takes a player's position time series and computes the tangential and normal acceleration at each time step. Test it with synthetic data of a player running in a circular arc of radius 15 m at constant speed 6 m/s.
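A possible skeleton for Exercise 18.13 is given below, using only the standard library. It is a sketch under stated assumptions: uniformly sampled positions, central differences for velocity, and the second central difference of position for acceleration. The normal component is returned signed (positive for a left turn), which is a convention chosen here and not necessarily the chapter's.

```python
import math


def accel_components(xs, ys, dt):
    """Tangential and normal acceleration at each interior sample.

    Returns a list of (a_tangential, a_normal) tuples for samples 1..n-2.
    a_normal is signed: positive when the path curves to the left.
    Samples with zero speed yield (nan, nan) because direction is undefined.
    """
    out = []
    for i in range(1, len(xs) - 1):
        vx = (xs[i + 1] - xs[i - 1]) / (2 * dt)
        vy = (ys[i + 1] - ys[i - 1]) / (2 * dt)
        ax = (xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt ** 2
        ay = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / dt ** 2
        speed = math.hypot(vx, vy)
        if speed == 0.0:
            out.append((math.nan, math.nan))
            continue
        a_t = (ax * vx + ay * vy) / speed  # projection onto the velocity direction
        a_n = (vx * ay - vy * ax) / speed  # component perpendicular to velocity
        out.append((a_t, a_n))
    return out


# Synthetic test: counter-clockwise arc, radius 15 m, constant speed 6 m/s, 25 Hz.
radius, run_speed, dt = 15.0, 6.0, 0.04
omega = run_speed / radius
xs = [radius * math.cos(omega * k * dt) for k in range(50)]
ys = [radius * math.sin(omega * k * dt) for k in range(50)]
components = accel_components(xs, ys, dt)
```

For uniform circular motion the tangential component should be close to zero and the normal component close to $v^2 / r$. With real, noisy tracking data the positions would need smoothing first, since the second difference amplifies noise strongly.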
Exercise 18.14. A player decelerates from 10.0 m/s to 2.0 m/s in 1.5 seconds. Assuming constant deceleration, compute (a) the deceleration magnitude, (b) the distance covered during deceleration, and (c) the mechanical energy dissipated per kilogram of body mass.
Exercise 18.15. Describe how the choice of frame rate (10 Hz vs. 25 Hz) affects the accuracy of acceleration calculations. What additional preprocessing step becomes more critical at higher frame rates?
Section 18.4 --- Distance and Work Rate Metrics
Exercise 18.16. A substitute enters the match at the 60th minute and plays until the 90th minute (plus 5 minutes of stoppage time). They cover 3,800 m in total. A starter plays the full 95 minutes and covers 11,200 m. Compare their work rates in m/min. What can you conclude about relative intensity?
Exercise 18.17. Compute the total distance covered by a player whose speed time series (at 10 Hz) for a 10-second window is: $[3.0, 3.2, 3.5, 4.0, 4.8, 7.5, 8.0, 8.2, 7.8, 7.0, ..., 3.0]$ (assume 100 samples total with linear interpolation between the given values for the remaining samples). Compare the result obtained by summing Euclidean displacements versus integrating the speed time series.
Exercise 18.18. Write a Python function that computes the distance covered in each 5-minute interval of a match, and produces a bar chart showing the temporal profile. Test it with synthetic data for a player who covers approximately 120 m/min in the first half and 105 m/min in the second half.
Exercise 18.19. A team's possession percentage is 62%. A midfielder covers 12,100 m total. If the team has possession for 55 minutes and the opponent for 33 minutes (with 7 minutes of dead ball), estimate the expected in-possession and out-of-possession distances assuming the player's work rate is 15% higher out of possession than in possession.
Exercise 18.20. Explain why high-speed running distance (HSRD) is considered a more sensitive indicator of fatigue than total distance. Provide numerical examples to support your argument using hypothetical first-half vs. second-half data.
Section 18.5 --- Synchronization and Collective Movement
Exercise 18.21. Ten outfield players of a team have the following positions at a single frame:
| Player | $x$ (m) | $y$ (m) |
|---|---|---|
| 1 | -35.0 | 0.0 |
| 2 | -27.0 | 17.0 |
| 3 | -27.0 | -17.0 |
| 4 | -22.0 | 27.0 |
| 5 | -22.0 | -27.0 |
| 6 | -12.0 | 12.0 |
| 7 | -12.0 | -12.0 |
| 8 | 7.0 | 22.0 |
| 9 | 7.0 | -22.0 |
| 10 | 17.0 | 0.0 |
Compute: (a) the team centroid, (b) the stretch index, (c) the team length and width, and (d) the approximate team surface area using the convex hull.
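The four quantities can be computed with a short standard-library sketch. Two definitions are assumed here and should be checked against the chapter: the stretch index is taken as the mean Euclidean distance of the players from the centroid, and team length and width are taken as the ranges of $x$ and $y$. The hull uses Andrew's monotone chain algorithm and the area uses the shoelace formula.

```python
import math


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def team_shape(points):
    """Centroid, stretch index, length, width, and convex hull area."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "centroid": (cx, cy),
        # Assumed definition: mean distance of players from the centroid.
        "stretch_index": sum(math.hypot(x - cx, y - cy) for x, y in points) / n,
        "length": max(xs) - min(xs),
        "width": max(ys) - min(ys),
        "hull_area": polygon_area(convex_hull(points)),
    }


players = [(-35.0, 0.0), (-27.0, 17.0), (-27.0, -17.0), (-22.0, 27.0),
           (-22.0, -27.0), (-12.0, 12.0), (-12.0, -12.0), (7.0, 22.0),
           (7.0, -22.0), (17.0, 0.0)]
shape = team_shape(players)
```

Players that lie strictly inside the hull do not affect the surface area, which is one reason the stretch index and the hull area can move independently.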
Exercise 18.22. Write a Python function that computes the Voronoi tessellation for all 22 players on the pitch at a single frame and returns the area of each player's Voronoi cell. Clip the cells to the pitch boundaries ($-52.5 \leq x \leq 52.5$, $-34 \leq y \leq 34$).
Exercise 18.23. Two teammates' $x$-positions over 100 frames are given by $x_A(t) = 10\sin(0.1t)$ and $x_B(t) = 10\sin(0.1t + \phi)$, where $\phi$ is a phase offset. Compute the Pearson correlation between $x_A$ and $x_B$ for $\phi = 0$, $\phi = \pi/4$, $\phi = \pi/2$, and $\phi = \pi$. Interpret each result in terms of team synchronization.
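The correlations in Exercise 18.23 can be evaluated numerically with the sketch below. It assumes that $t$ is the integer frame index $0, 1, \ldots, 99$; the exercise does not state the time unit, so this is an interpretation.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def phase_correlation(phi, n_frames=100):
    """Correlation between x_A and x_B for a given phase offset phi."""
    # Assumption: t is the frame index, t = 0 .. n_frames - 1.
    x_a = [10 * math.sin(0.1 * t) for t in range(n_frames)]
    x_b = [10 * math.sin(0.1 * t + phi) for t in range(n_frames)]
    return pearson(x_a, x_b)


results = {
    "0": phase_correlation(0.0),
    "pi/4": phase_correlation(math.pi / 4),
    "pi/2": phase_correlation(math.pi / 2),
    "pi": phase_correlation(math.pi),
}
```

Because 100 frames do not cover a whole number of oscillation periods, the values for the intermediate offsets will differ slightly from the idealized $\cos\phi$.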
Exercise 18.24. Explain the difference between the convex hull area and the effective playing area concept. Under what match situations would you expect these two measures to diverge significantly?
Exercise 18.25. A team's stretch index over the course of a match has a mean of 17.2 m with a standard deviation of 3.1 m. During attacking possessions, the mean increases to 20.4 m; during defending phases, it drops to 14.1 m. Calculate the expansion ratio (attacking SI / defending SI) and discuss what this reveals about the team's tactical approach.
Section 18.6 --- Fatigue and Load Monitoring
Exercise 18.26. A player's work rate (m/min) across 18 five-minute intervals of a match is:
$[125, 130, 128, 118, 132, 127, 115, 120, 122, 110, 108, 115, 105, 102, 112, 98, 95, 100]$
Compute: (a) the average work rate for each half, (b) the percentage decline from first half to second half, and (c) identify the interval with the lowest work rate and discuss possible causes.
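The half-by-half comparison can be set up as in the following sketch. It assumes that the 18 intervals split evenly into two halves of nine intervals each, and that interval $k$ (counting from zero) spans minutes $5k$ to $5k + 5$.

```python
work_rate = [125, 130, 128, 118, 132, 127, 115, 120, 122,
             110, 108, 115, 105, 102, 112, 98, 95, 100]


def half_summary(rates):
    """Mean work rate per half, percentage decline, and index of the minimum.

    Assumes the intervals split evenly between the two halves.
    """
    mid = len(rates) // 2
    first = sum(rates[:mid]) / mid
    second = sum(rates[mid:]) / (len(rates) - mid)
    decline_pct = 100.0 * (first - second) / first
    lowest_idx = min(range(len(rates)), key=rates.__getitem__)
    return first, second, decline_pct, lowest_idx


first, second, decline, lowest = half_summary(work_rate)

# Convert the 0-based interval index into a (start, end) window in minutes.
lowest_window = (5 * lowest, 5 * lowest + 5)
```

The numerical output only locates the weakest interval; explaining it (fatigue, game state, tactical changes) still requires match context.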
Exercise 18.27. Compute the Acute-to-Chronic Workload Ratio for a player with the following daily total distance (in meters) over the past 28 days:
| Days | Daily total distance (m) |
|---|---|
| 1--7 | $[8500, 0, 9200, 0, 10500, 0, 11000]$ |
| 8--14 | $[8000, 0, 9500, 0, 10000, 0, 10800]$ |
| 15--21 | $[9000, 0, 9800, 0, 11500, 0, 10200]$ |
| 22--28 | $[12000, 0, 13000, 0, 14000, 0, 12500]$ |
Use both rolling average and EWMA methods. Is the player in the "danger zone"?
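Both methods can be sketched in a few lines. Several choices below are assumptions and not the chapter's definitions: the acute window is 7 days and the chronic window 28 days, the rolling method compares mean daily loads, the EWMA decay is $\lambda = 2 / (N + 1)$, and the EWMA is seeded with the first day's load.

```python
daily = [8500, 0, 9200, 0, 10500, 0, 11000,
         8000, 0, 9500, 0, 10000, 0, 10800,
         9000, 0, 9800, 0, 11500, 0, 10200,
         12000, 0, 13000, 0, 14000, 0, 12500]


def acwr_rolling(loads, acute_days=7, chronic_days=28):
    """Ratio of the mean daily load over the acute and chronic windows."""
    acute = sum(loads[-acute_days:]) / acute_days
    chronic = sum(loads[-chronic_days:]) / chronic_days
    return acute / chronic


def ewma(loads, span):
    """Exponentially weighted moving average with decay 2 / (span + 1).

    Seeded with the first load value, which is an assumed initialization.
    """
    lam = 2.0 / (span + 1)
    value = loads[0]
    for load in loads[1:]:
        value = lam * load + (1 - lam) * value
    return value


def acwr_ewma(loads, acute_days=7, chronic_days=28):
    """Ratio of the acute EWMA to the chronic EWMA at the final day."""
    return ewma(loads, acute_days) / ewma(loads, chronic_days)


ratio_rolling = acwr_rolling(daily)
ratio_ewma = acwr_ewma(daily)
```

The two ratios generally differ because the EWMA weights recent days more heavily. Compare each against the danger-zone threshold used in the chapter before drawing a conclusion.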
Exercise 18.28. A coach wants to implement a real-time substitution recommendation system. Define three quantitative criteria based on tracking data that would trigger a substitution alert. For each criterion, specify the metric, the threshold, and the time window over which it should be evaluated.
Exercise 18.29. Write a Python function that detects transient fatigue episodes. Define a transient fatigue episode as a period of at least 3 consecutive minutes where a player's rolling 1-minute work rate drops below 70% of their match average. Return the start time, duration, and preceding high-intensity period for each episode.
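One way to structure the detector in Exercise 18.29 is sketched below. It assumes the rolling 1-minute work rate has already been computed and is supplied as one value every `step_s` seconds. The exercise does not define the "preceding high-intensity period", so the sketch reports the mean work rate over a look-back window as a stand-in; the parameter `lookback_s` and all key names are hypothetical.

```python
def detect_transient_fatigue(rolling_rate, step_s=60.0, drop_fraction=0.70,
                             min_duration_s=180.0, lookback_s=300.0):
    """Find runs where the rolling work rate stays below a fraction of the mean.

    rolling_rate: rolling 1-minute work rate (m/min), one value per step_s seconds.
    Returns dicts with start_time_s, duration_s, and preceding_mean_rate, where
    preceding_mean_rate is an assumed proxy for the preceding high-intensity period.
    """
    match_avg = sum(rolling_rate) / len(rolling_rate)
    limit = drop_fraction * match_avg
    lookback_n = int(lookback_s / step_s)
    episodes = []
    start = None
    # A trailing +inf sentinel closes an episode that runs to the final sample.
    for i, rate in enumerate(list(rolling_rate) + [float("inf")]):
        if rate < limit and start is None:
            start = i
        elif rate >= limit and start is not None:
            duration = (i - start) * step_s
            if duration >= min_duration_s:
                before = rolling_rate[max(0, start - lookback_n):start]
                episodes.append({
                    "start_time_s": start * step_s,
                    "duration_s": duration,
                    "preceding_mean_rate": sum(before) / len(before) if before else None,
                })
            start = None
    return episodes


# Synthetic series: steady play, a 5-minute burst, a 4-minute drop, a 2-minute dip.
rates = [120] * 10 + [150] * 5 + [60] * 4 + [120] * 10 + [70] * 2 + [120] * 9
episodes = detect_transient_fatigue(rates)
```

In the synthetic series only the 4-minute drop is long enough to qualify; the 2-minute dip falls below the threshold but is discarded by the minimum-duration rule.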
Exercise 18.30. Discuss the limitations of using total distance as a fatigue indicator. Propose a composite fatigue index that combines at least three different tracking-derived metrics, and explain how you would weight each component.
Section 18.7 --- Integrating Tracking with Event Data
Exercise 18.31. Write a Python function that, given a pass event (with timestamp and coordinates) and the corresponding tracking frame, computes the following contextual features: (a) number of opponents within 5 m of the passer, (b) distance to the nearest opponent, (c) speed of the passer at the moment of the pass, (d) angle between the pass direction and the passer's movement direction.
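The four features in Exercise 18.31 reduce to a few distance and angle calculations once the pass has been matched to a tracking frame. The sketch below assumes that matching is already done and that the passer's velocity has been computed from the tracking data; all argument and key names are illustrative, not a provider's schema.

```python
import math


def pass_context(passer_xy, passer_velocity, pass_end_xy, opponents_xy, radius=5.0):
    """Contextual features for one pass from a single synchronized tracking frame.

    Positions are (x, y) in meters; passer_velocity is (vx, vy) in m/s.
    """
    px, py = passer_xy
    dists = [math.hypot(ox - px, oy - py) for ox, oy in opponents_xy]

    vx, vy = passer_velocity
    speed = math.hypot(vx, vy)

    dx, dy = pass_end_xy[0] - px, pass_end_xy[1] - py
    pass_len = math.hypot(dx, dy)
    if speed > 0 and pass_len > 0:
        cos_a = (vx * dx + vy * dy) / (speed * pass_len)
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    else:
        angle = None  # undefined for a stationary passer or zero-length pass

    return {
        "opponents_within_radius": sum(d <= radius for d in dists),
        "nearest_opponent_dist": min(dists) if dists else None,
        "passer_speed": speed,
        "pass_movement_angle_deg": angle,
    }


# Passer moving along +x at 3 m/s and passing straight along +y.
features = pass_context(
    passer_xy=(0.0, 0.0),
    passer_velocity=(3.0, 0.0),
    pass_end_xy=(0.0, 10.0),
    opponents_xy=[(3.0, 4.0), (10.0, 0.0), (0.0, -2.0)],
)
```

The angle is returned unsigned, in the range 0 to 180 degrees. A signed angle (left versus right of the movement direction) would need the cross product as well.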
Exercise 18.32. For a shot event, use tracking data to compute: (a) the visible goal angle from the shooter's position, accounting for the goalkeeper's position and any blocking defenders, (b) the number of defenders between the shooter and the goal, and (c) the shooter's speed at the moment of the shot. Discuss how these features could improve an xG model.
Exercise 18.33. Implement a simplified pitch control model. For a single frame of tracking data, compute the pitch control surface on a $105 \times 68$ grid (1 m resolution). Use a simple influence function where each player's influence at a point is inversely proportional to their estimated time-to-reach that point. Visualize the result as a heatmap.
Exercise 18.34. Describe the challenges of synchronizing event data and tracking data from different providers. Propose a robust synchronization algorithm that handles: (a) different clock references, (b) annotation timing errors of up to 2 seconds, and (c) missing events.
Exercise 18.35. Design an analysis pipeline that combines tracking and event data to evaluate pressing effectiveness. The pipeline should: (a) identify pressing triggers from event data (turnovers), (b) extract the tracking data for a 10-second window following each trigger, (c) compute pressing metrics (closing speed, number of players involved, team compactness), and (d) link the outcome (ball won, foul, passed out of pressure) to the pressing intensity. Write pseudocode for this pipeline.
Advanced / Open-Ended Problems
Exercise 18.36. Design and implement a formation detection algorithm using tracking data. The algorithm should: (a) identify the formation template (e.g., 4-3-3, 4-4-2) at each frame, (b) compute formation stability (how often the detected formation matches the nominal formation), and (c) identify formation transitions during the match.
Exercise 18.37. Build a player similarity model using tracking data. Define a feature vector for each player based on their physical performance profile (total distance, HSRD, sprint count, acceleration count, work rate, time in each speed zone) and spatial behavior (average position, position heatmap, Voronoi cell area). Use cosine similarity or Euclidean distance in feature space to find the most similar player pairs across different teams.
Exercise 18.38. Implement a simple Expected Possession Value (EPV) model that uses both event and tracking data. For each frame of a possession, the model should estimate the probability of the possession ending in a goal, considering: (a) the current ball position, (b) the attacking team's formation and positions, (c) the defending team's formation and positions, and (d) the pitch control surface. Train the model on historical data using logistic regression.
Exercise 18.39. Write a Python pipeline that processes a full match of tracking data and generates a comprehensive physical performance report for each player. The report should include: total distance, distance in each speed zone, number and characteristics of sprints, acceleration/deceleration counts, work rate by 5-minute interval, and a fatigue assessment. Output the report as a formatted table and a set of visualizations.
Exercise 18.40. Critically evaluate the metabolic power model (Section 18.2.4) by implementing it and comparing its output to simple speed-based energy expenditure estimates. Generate synthetic data for three scenarios: (a) constant-speed running at 5 m/s, (b) repeated sprints with full recovery, and (c) variable-speed running with frequent direction changes. Analyze when the two models agree and diverge.