Chapter 23: Exercises
Section 23.2 -- Manual Video Tagging Systems
Exercise 23.1 (Conceptual) Design a hierarchical tagging schema for set-piece analysis that covers corner kicks, free kicks, throw-ins, and goal kicks. Your schema should have at least three levels of hierarchy and include both the event type and the outcome. Provide the full tree structure and brief definitions for each tag.
Exercise 23.2 (Calculation) Two analysts independently tag the same 90-minute match for shot events. Analyst A tags 18 shots and Analyst B tags 21 shots. They agree on 16 events. Assuming the remaining tagged events are unique to each analyst: (a) Calculate the observed agreement $p_o$ as the proportion of events both agree on relative to the total unique events identified. (b) If the probability of each analyst tagging any given moment as a shot by chance is 0.02, calculate $p_e$. (c) Compute Cohen's kappa $\kappa$. (d) Interpret the result. Would you consider this acceptable for a professional tagging system?
Exercise 23.3 (Practical) You are building a tagging schema for pressing analysis. Define exactly five tags that capture the essential information about a pressing event. For each tag, specify: (a) the tag name, (b) the data type (categorical, numerical, timestamp), (c) the possible values or range, and (d) a precise definition that would achieve high inter-rater reliability.
Exercise 23.4 (Data Analysis) Given the following inter-rater reliability data for a tagging schema with six event categories, identify which categories need refinement and suggest specific improvements:
| Category | $\kappa$ | $p_o$ | $p_e$ |
|---|---|---|---|
| Goals | 0.98 | 0.99 | 0.50 |
| Shots | 0.82 | 0.91 | 0.50 |
| Passes | 0.75 | 0.85 | 0.40 |
| Tackles | 0.58 | 0.72 | 0.33 |
| Fouls | 0.51 | 0.68 | 0.35 |
| Pressing Triggers | 0.43 | 0.60 | 0.30 |
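Before interpreting the table, it can help to confirm that each row is internally consistent with the definition $\kappa = (p_o - p_e) / (1 - p_e)$. The following is a minimal sketch of that check; the function name `cohens_kappa` and the `table` dictionary are illustrative names, not part of the chapter.

```python
def cohens_kappa(p_o, p_e):
    """Cohen's kappa from observed agreement p_o and chance agreement p_e."""
    if p_e >= 1.0:
        raise ValueError("p_e must be below 1 for kappa to be defined")
    return (p_o - p_e) / (1.0 - p_e)


# (reported kappa, p_o, p_e) copied from the table in Exercise 23.4
table = {
    "Goals": (0.98, 0.99, 0.50),
    "Shots": (0.82, 0.91, 0.50),
    "Passes": (0.75, 0.85, 0.40),
    "Tackles": (0.58, 0.72, 0.33),
    "Fouls": (0.51, 0.68, 0.35),
    "Pressing Triggers": (0.43, 0.60, 0.30),
}

for name, (reported, p_o, p_e) in table.items():
    recomputed = cohens_kappa(p_o, p_e)
    print(f"{name:18s} reported={reported:.2f} recomputed={recomputed:.2f}")
```

The recomputed values should match the reported ones to two decimal places; deciding which categories need refinement is still left to the reader.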
Exercise 23.5 (Design) A club's video analysis department processes 50 matches per week on behalf of all club departments. Each match requires 45 minutes of tagging time with the current schema. The department has three full-time analysts, each working 40 hours per week. (a) Can they handle the workload? (b) If not, propose two strategies to address the gap: one involving schema simplification and one involving technology.
Section 23.3 -- Computer Vision Fundamentals
Exercise 23.6 (Calculation) A broadcast camera captures a soccer pitch at 1920x1080 resolution. The visible portion of the pitch is approximately 100m x 60m. (a) Calculate the spatial resolution in meters per pixel for both dimensions. (b) A soccer ball has a diameter of 22 cm. How many pixels wide would the ball appear at the center of the frame? (c) At the far end of the pitch, the effective resolution halves due to perspective. How many pixels wide would the ball appear there? (d) What is the minimum resolution needed for reliable ball detection if the detector requires at least 8 pixels across the ball?
Exercise 23.7 (Programming) Implement a function that converts pixel coordinates to pitch coordinates using a homography matrix. Your function should: (a) Accept a 3x3 homography matrix and an array of pixel coordinates. (b) Apply the projective transformation. (c) Return real-world pitch coordinates in meters. (d) Include input validation and handle the case where the homography determinant is near zero.
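As a possible starting point for part (b), the core projective step can be sketched in plain Python. The function name `pixels_to_pitch` and the example matrices are hypothetical; the input validation in part (d), including the near-zero determinant check, is deliberately left to the reader.

```python
def pixels_to_pitch(H, pixels):
    """Apply a 3x3 homography H (nested lists) to a list of (u, v) pixel points.

    Each point is lifted to homogeneous coordinates (u, v, 1), multiplied
    by H, and divided by the third component w.
    """
    out = []
    for u, v in pixels:
        x = H[0][0] * u + H[0][1] * v + H[0][2]
        y = H[1][0] * u + H[1][1] * v + H[1][2]
        w = H[2][0] * u + H[2][1] * v + H[2][2]
        if abs(w) < 1e-12:
            # The point lies on the line that H sends to infinity.
            raise ValueError(f"pixel ({u}, {v}) has no finite image")
        out.append((x / w, y / w))
    return out


# Hypothetical matrices chosen only to make the arithmetic easy to follow.
H_affine = [[0.1, 0.0, -5.0], [0.0, 0.1, -2.0], [0.0, 0.0, 1.0]]
H_scaled = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]

print(pixels_to_pitch(H_affine, [(100, 50)]))
print(pixels_to_pitch(H_scaled, [(4, 6)]))
```

The second matrix shows why the division by $w$ matters: scaling the whole homography leaves the mapping unchanged only because of that normalization.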
Exercise 23.8 (Mathematical) Given four point correspondences between image coordinates and pitch coordinates:
| Image (pixels) | Pitch (meters) |
|---|---|
| (120, 540) | (0, 0) |
| (1800, 540) | (105, 0) |
| (480, 180) | (0, 68) |
| (1440, 180) | (105, 68) |
(a) Set up the system of equations for the Direct Linear Transform (DLT) algorithm. (b) Explain why at least four point correspondences are needed. (c) Why might you want to use more than four points in practice?
Exercise 23.9 (Conceptual) Explain the concept of transfer learning in the context of soccer CV. Specifically: (a) Why is training a deep CNN from scratch impractical for most soccer applications? (b) Which layers of a pre-trained network would you freeze vs. fine-tune for a player detection task? (c) What types of soccer-specific data would you need for fine-tuning?
Exercise 23.10 (Calculation) A CV pipeline processes video at 25 fps. The detection model takes 15ms per frame, the tracking model takes 5ms, and the event detection model takes 8ms. (a) Can this pipeline run in real-time if the models run sequentially? (b) What is the maximum throughput in frames per second? (c) If the detection and event detection models can run in parallel (with tracking depending on detection output), what is the new maximum throughput?
Section 23.4 -- Object Detection and Tracking
Exercise 23.11 (Calculation) A player detection model produces the following results on a test set of 500 frames:
- True Positives: 9,200
- False Positives: 450
- False Negatives: 800
(a) Calculate precision, recall, and F1 score. (b) If the model uses a confidence threshold of 0.5, and lowering it to 0.3 would increase TP to 9,700 but also increase FP to 1,200, should you lower the threshold? Justify your answer for both a scouting application and an officiating application.
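A small helper for checking hand calculations of detection metrics might look like the sketch below. The function name and the counts in the usage line are illustrative and are deliberately not the exercise's numbers.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw detection counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 true positives, 10 false positives, 30 misses.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Running the helper at two different thresholds makes the trade-off in part (b) concrete: one call per threshold, then compare which metric moved and by how much.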
Exercise 23.12 (Programming) Implement the Intersection over Union (IoU) calculation for two bounding boxes. Your function should: (a) Accept two bounding boxes in the format $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$. (b) Calculate the area of intersection. (c) Calculate the area of union. (d) Return the IoU value. (e) Handle edge cases (non-overlapping boxes, identical boxes, zero-area boxes).
Exercise 23.13 (Mathematical) A Kalman filter tracks a player with state vector $\mathbf{x} = (x, y, \dot{x}, \dot{y})^T$. The player is at position $(50, 30)$ m with velocity $(2, 1)$ m/s at time $t$. The time step is $\Delta t = 0.04$ seconds (25 fps). (a) Write the state transition matrix $\mathbf{F}$ for a constant-velocity model. (b) Predict the state at time $t + \Delta t$. (c) A detection places the player at $(50.12, 30.06)$ at time $t + \Delta t$. Only position is measured, so the Kalman gain is the $4 \times 2$ matrix $\mathbf{K} = \begin{pmatrix} 0.6 & 0 \\ 0 & 0.6 \\ 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}$. Compute the updated state estimate.
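The predict and update steps can be sketched as follows, assuming a constant-velocity model and a position-only measurement, so that the gain has four rows and two columns. The function names and all numbers in the usage lines are hypothetical and differ from the exercise on purpose.

```python
def predict(state, dt):
    """Constant-velocity prediction x' = F x for state (x, y, vx, vy).

    F is the identity with dt in the (x, vx) and (y, vy) entries, so
    positions advance by velocity * dt and velocities are unchanged.
    """
    x, y, vx, vy = state
    return (x + dt * vx, y + dt * vy, vx, vy)


def update(predicted, measurement, gain):
    """Correction x_new = x_pred + K (z - H x_pred), with H selecting (x, y).

    gain is a 4x2 nested list: one row per state component, one column
    per measured coordinate.
    """
    innovation = (measurement[0] - predicted[0], measurement[1] - predicted[1])
    return tuple(
        p + row[0] * innovation[0] + row[1] * innovation[1]
        for p, row in zip(predicted, gain)
    )


# Hypothetical values, not the ones in the exercise.
state = (10.0, 20.0, 4.0, -2.0)
gain = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.0], [0.0, 0.2]]
pred = predict(state, dt=0.1)
print(pred)
print(update(pred, (10.5, 19.7), gain))
```

Note that the velocity estimate changes even though velocity is never measured directly: the lower two rows of the gain translate position error into a velocity correction.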
Exercise 23.14 (Analysis) The Hungarian algorithm is used for data association in multi-object tracking. Consider a scenario with three detections and three existing tracks, with the following cost matrix:
$$\mathbf{C} = \begin{pmatrix} 2.1 & 10.5 & 7.3 \\ 9.2 & 1.8 & 8.1 \\ 4.9 & 8.7 & 2.4 \end{pmatrix}$$
(a) Find the optimal assignment by inspection or by a systematic method. (b) What is the total assignment cost? (c) If a maximum cost threshold of 7.0 is applied (assignments above this cost are rejected), which assignments would be made? (d) What happens to unassigned detections and unassigned tracks?
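For matrices this small, an answer can be checked by brute force over all permutations. The sketch below is not the Hungarian algorithm itself; it enumerates every assignment, which is only practical for a handful of tracks. The function names and the cost matrix are hypothetical, and the matrix is chosen so that greedily taking the cheapest cell first is not optimal.

```python
from itertools import permutations


def best_assignment(cost):
    """Exhaustively find the minimum-cost one-to-one assignment.

    Returns (pairs, total) where pairs is a list of (row, column) tuples.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    total = sum(cost[i][best[i]] for i in range(n))
    return list(enumerate(best)), total


def gate(pairs, cost, max_cost):
    """Keep only the assigned pairs whose cost does not exceed max_cost."""
    return [(i, j) for i, j in pairs if cost[i][j] <= max_cost]


# Hypothetical cost matrix, not the one in the exercise.
cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
pairs, total = best_assignment(cost)
print(pairs, total)
print(gate(pairs, cost, max_cost=1.5))
```

In this example the cheapest single cell (row 1, column 1) is not part of the optimal assignment, which is the reason a global method is used instead of greedy matching.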
Exercise 23.15 (Practical) You are evaluating a player tracking system by comparing its output against ground truth GPS data for 11 players over a 10-minute period. The system produces position estimates at 25 Hz. (a) How many position comparisons will you make in total? (b) Define three metrics you would use to evaluate tracking accuracy. (c) The system achieves a mean position error of 1.2m with a standard deviation of 0.8m. For tactical analysis requiring 0.5m accuracy, is this system suitable? What about for broadcasting graphics overlays?
Exercise 23.16 (Programming) Implement a simple team classification algorithm that uses k-means clustering on color histograms extracted from player bounding box crops. Your solution should: (a) Extract the torso region from each bounding box. (b) Compute a normalized color histogram for each crop. (c) Cluster players into three groups (two teams + referees). (d) Return team labels.
Exercise 23.17 (Conceptual) Explain three strategies for handling occlusion in multi-object tracking. For each strategy, describe: (a) how it works, (b) its advantages, (c) its limitations, and (d) the types of occlusion it handles best (brief, partial, full, group).
Section 23.5 -- Pose Estimation Applications
Exercise 23.18 (Calculation) A pose estimation model detects 17 COCO keypoints for a player. The keypoint coordinates for a player preparing to shoot are (in pixels):
| Keypoint | x | y |
|---|---|---|
| Left Shoulder | 450 | 200 |
| Right Shoulder | 490 | 205 |
| Left Hip | 455 | 280 |
| Right Hip | 485 | 278 |
| Left Knee | 440 | 340 |
| Right Knee | 510 | 330 |
| Left Ankle | 430 | 400 |
| Right Ankle | 530 | 360 |
(a) Calculate the body orientation angle from the shoulder line. (b) Calculate the knee angle of the right leg (the kicking leg). (c) Estimate which direction the player is facing. (d) If the right ankle moves to (550, 340) in the next frame (0.04s later), what is the angular velocity of the right lower leg about the right knee?
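The geometric building blocks for this exercise can be sketched as three small helpers. The names are illustrative, the usage points are hypothetical, and the code assumes image coordinates with $y$ increasing downward, as is usual for pixel data.

```python
import math


def segment_angle(p, q):
    """Angle in degrees of the segment p -> q relative to the image x-axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def joint_angle(a, b, c):
    """Interior angle in degrees at joint b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(abs(cross), dot))


def angular_velocity(angle_before, angle_after, dt):
    """Angular velocity in degrees per second, using the shortest rotation."""
    diff = (angle_after - angle_before + 180.0) % 360.0 - 180.0
    return diff / dt


# Hypothetical points: a right angle at the middle point, then a straight limb.
print(joint_angle((0, 0), (0, 10), (10, 10)))
print(joint_angle((0, 0), (1, 0), (2, 0)))
print(angular_velocity(170.0, -170.0, dt=0.04))
```

For the knee angle, the three points would be hip, knee, and ankle in that order; for part (d), `segment_angle` from knee to ankle is evaluated in both frames and the two results are passed to `angular_velocity`.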
Exercise 23.19 (Programming) Write a function that takes a sequence of pose keypoints for a player over multiple frames and computes a "fatigue index" based on trunk lean angle. The function should: (a) Calculate the trunk angle (the angle between the vertical and the line from the midpoint of the hips to the midpoint of the shoulders) for each frame. (b) Compute the average trunk angle for the first 15 minutes and the last 15 minutes. (c) Return the ratio of the last-15-minute average to the first-15-minute average as the fatigue index.
Exercise 23.20 (Analysis) A goalkeeper's dive is captured over 8 frames at 25 fps (0.04 s per frame, 0.32 seconds total). The hip keypoint positions are:
| Frame | x (m) | y (m) |
|---|---|---|
| 1 | 0.0 | 1.0 |
| 2 | 0.1 | 1.0 |
| 3 | 0.3 | 0.95 |
| 4 | 0.6 | 0.85 |
| 5 | 1.0 | 0.70 |
| 6 | 1.4 | 0.50 |
| 7 | 1.7 | 0.35 |
| 8 | 1.9 | 0.25 |
(a) Plot the dive trajectory. (b) Calculate the horizontal and vertical velocities at each time step. (c) Estimate the launch angle of the dive (the angle of the velocity vector at frame 3). (d) Calculate the total distance covered by the hip during the dive.
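Parts (b) through (d) rest on finite differences between consecutive frames. A minimal sketch, assuming forward differences (the velocity attributed to frame $i$ uses frames $i$ and $i+1$), is shown below; the function names and the sample points are hypothetical.

```python
import math


def velocities(points, dt):
    """Forward-difference velocity (vx, vy) for each consecutive frame pair."""
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def direction_deg(velocity):
    """Direction of a velocity vector in degrees above the horizontal."""
    return math.degrees(math.atan2(velocity[1], velocity[0]))


def path_length(points):
    """Total distance travelled along the polyline through the points."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical trajectory, not the table in the exercise.
pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
vel = velocities(pts, dt=0.5)
print(vel)
print(direction_deg(vel[0]))
print(path_length(pts))
```

With $n$ frames, forward differences give $n - 1$ velocity estimates. Central differences are a common alternative when a smoother estimate at interior frames is wanted.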
Exercise 23.21 (Research) Discuss the potential and limitations of using pose estimation for injury risk assessment. Consider: (a) What biomechanical indicators visible through pose estimation are associated with injury risk? (b) What are the accuracy requirements for such an application? (c) How would you validate such a system? (d) What ethical considerations arise from automated injury risk monitoring?
Section 23.6 -- Automated Event Detection
Exercise 23.22 (Programming) Implement a rule-based corner kick detector using tracking data. Your detector should identify when:
- The ball goes out of play over the end line
- The ball was last touched by a defending player
- Play resumes with the ball placed in a corner arc
Define appropriate thresholds and handle edge cases.
Exercise 23.23 (Calculation) An event detection model for shot detection produces the following confusion matrix on a test set:
| | Predicted Shot | Predicted No Shot |
|---|---|---|
| Actual Shot | 142 | 18 |
| Actual No Shot | 35 | 4,805 |
(a) Calculate precision, recall, F1 score, and accuracy. (b) Explain why accuracy is a misleading metric here. (c) If this model is used for automated highlight generation, which error type (false positive or false negative) is more costly? Why? (d) Suggest a threshold adjustment strategy to optimize for the more important metric.
Exercise 23.24 (Design) Design an automated system for detecting pressing events from tracking data. Your design should include: (a) A precise definition of a "pressing event." (b) The input features required. (c) The detection algorithm (rule-based, ML, or hybrid). (d) The evaluation methodology. (e) At least three edge cases the system must handle.
Exercise 23.25 (Mathematical) A temporal convolutional network (TCN) for event detection uses causal convolutions with kernel size $k = 3$ and dilation factors $d = [1, 2, 4, 8]$ across four layers. (a) Calculate the receptive field of this network (how many past frames each output depends on). (b) At 25 fps, how many seconds of context does this represent? (c) Is this sufficient for detecting a counterattack that typically develops over 5--8 seconds? (d) How would you modify the architecture to increase the receptive field?
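For a stack of dilated causal convolutions, each convolution with kernel size $k$ and dilation $d$ extends the receptive field by $(k - 1)\,d$ frames. The sketch below encodes that rule under the assumption of one convolution per layer by default; some TCN variants use two convolutions per residual block, which the hypothetical `convs_per_layer` parameter accounts for. The usage values are illustrative and differ from the exercise.

```python
def receptive_field(kernel_size, dilations, convs_per_layer=1):
    """Number of input frames (including the current one) seen by one output."""
    return 1 + convs_per_layer * (kernel_size - 1) * sum(dilations)


def context_seconds(frames, fps):
    """Convert a receptive field in frames to seconds of video."""
    return frames / fps


# Hypothetical configuration: kernel size 2, dilations 1, 2, 4.
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4])
print(rf, context_seconds(rf, fps=25))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4], convs_per_layer=2))
```

The formula makes the options for part (d) visible: the receptive field grows linearly with kernel size and with the sum of the dilations, so adding layers with doubling dilation grows it roughly exponentially in depth.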
Exercise 23.26 (Analysis) Compare rule-based and machine learning approaches to event detection by filling in a comparison table for the following criteria: accuracy on well-defined events, accuracy on ambiguous events, training data requirements, interpretability, maintenance effort, generalization to new leagues, and computational cost. Justify each rating.
Exercise 23.27 (Practical) You have access to a dataset of 200 annotated matches with event labels. You want to train a shot detection model. (a) How would you split the data into training, validation, and test sets? Why is a random split potentially problematic? (b) How many positive examples (shots) would you expect in your dataset? (Assume an average of 25 shots per match.) (c) Describe a data augmentation strategy to address class imbalance. (d) Define the loss function you would use and explain why.
Section 23.7 -- Future of CV in Soccer
Exercise 23.28 (Essay) Write a 500-word analysis of how CV-based tracking from broadcast video could democratize soccer analytics. Consider: (a) current barriers to data access for lower-league clubs, (b) the types of analysis that would become possible, (c) the accuracy trade-offs compared to dedicated tracking systems, and (d) the potential impact on competitive balance.
Exercise 23.29 (Design) Design a multi-modal event detection system that combines video, audio, and text (commentary) to detect goals. Describe: (a) The features extracted from each modality. (b) The fusion strategy (early fusion, late fusion, or attention-based). (c) How you would handle missing modalities (e.g., no commentary available). (d) The expected improvement over a video-only system.
Exercise 23.30 (Ethics) A club wants to use CV-based pose estimation to monitor player fatigue during training sessions and automatically adjust training load. Discuss: (a) The potential benefits for player welfare. (b) Privacy concerns for players. (c) The risk of over-reliance on automated systems. (d) How you would design the system to balance these competing concerns. (e) What policies should govern the use and storage of this data?
Exercise 23.31 (Calculation) A club is considering two tracking solutions:
- System A (Dedicated optical): \$200,000 installation + \$50,000/year. Accuracy: 0.2m. Home matches only (25/season).
- System B (CV from broadcast): \$30,000/year subscription. Accuracy: 1.5m. All matches (50/season).
(a) Calculate the 5-year total cost for each system. (b) Calculate the cost per tracked match over 5 years. (c) If the club's scouting department also uses System B to track 200 additional opposition matches per year, recalculate the cost per tracked match. (d) Under what circumstances would System A provide better value despite its higher cost?
Exercise 23.32 (Research) Investigate and summarize the current state of automated offside detection technology (as used in VAR systems). Your summary should cover: (a) The hardware setup required. (b) The CV algorithms used for player limb detection. (c) The accuracy and reliability of the system. (d) Controversies or limitations that have been reported. (e) How the system handles edge cases (e.g., player on the ground, overlapping players).
Exercise 23.33 (Integration) This exercise connects Chapter 23 to earlier chapters. Describe how CV-derived tracking data would feed into each of the following analytical frameworks: (a) Expected Goals (xG) models (Chapter 14). (b) Passing networks (Chapter 16). (c) Pressing metrics (Chapter 17). For each framework, identify what additional information CV provides beyond traditional event data, and what limitations CV-derived data might introduce.