Chapter 27 Exercises
Section 27.1: Real-Time Data Infrastructure
Exercise 1: Latency Budget Allocation
A club's real-time analytics pipeline has a total latency budget of 2,000 ms. The capture latency is 50 ms and the render latency is 80 ms. If the transmission latency is 120 ms, how much latency budget remains for processing? If the processing stage must run three sequential models (tracking enrichment, formation detection, and xG computation), what is the maximum allowable latency per model assuming equal allocation?
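As a starting point, the budget arithmetic can be sketched in a few lines. The function and variable names below are illustrative assumptions, not part of the chapter's code; the sketch simply subtracts the fixed stage latencies from the total and divides what is left equally among the sequential models.

```python
def remaining_budget_ms(total_ms, fixed_stage_ms):
    """Return the budget left after subtracting the fixed stage latencies."""
    return total_ms - sum(fixed_stage_ms.values())


def per_model_budget_ms(processing_ms, n_models):
    """Split a processing budget equally across sequential models."""
    return processing_ms / n_models


# Stage latencies taken from the exercise statement (milliseconds).
stages = {"capture": 50, "transmission": 120, "render": 80}

processing_ms = remaining_budget_ms(2000, stages)
per_model_ms = per_model_budget_ms(processing_ms, 3)

print(f"Processing budget: {processing_ms} ms")
print(f"Per-model budget:  {per_model_ms:.1f} ms")
```

Equal allocation is only the simplest policy; a heavier model could be given a larger share by replacing the division with a weighted split.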
Exercise 2: Throughput Calculation
A tracking system produces data at 25 Hz for 23 entities (22 players + 1 ball). Each tracking frame event is 256 bytes. Calculate:
(a) The total event rate in events per second.
(b) The required bandwidth in megabytes per second.
(c) The number of partitions needed when using a Kafka broker with 10 ms latency and 1 MB/s throughput per partition.
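One way to organise the calculation is as three small helpers. This is a sketch under two assumptions that the exercise does not state explicitly: each entity emits one event per frame, and a megabyte is taken as 1,000,000 bytes. All function names are hypothetical.

```python
import math


def event_rate_per_s(frame_rate_hz, n_entities):
    """Events per second, assuming one event per entity per frame."""
    return frame_rate_hz * n_entities


def bandwidth_mb_per_s(events_per_s, bytes_per_event, bytes_per_mb=1_000_000):
    """Bandwidth in MB/s (decimal megabytes by default, an assumption)."""
    return events_per_s * bytes_per_event / bytes_per_mb


def partitions_needed(bandwidth_mb_s, partition_mb_s):
    """Smallest whole number of partitions that covers the bandwidth."""
    return max(1, math.ceil(bandwidth_mb_s / partition_mb_s))


rate = event_rate_per_s(25, 23)
bandwidth = bandwidth_mb_per_s(rate, 256)
partitions = partitions_needed(bandwidth, 1.0)

print(f"{rate} events/s, {bandwidth:.4f} MB/s, {partitions} partition(s)")
```

The sketch sizes partitions on throughput alone; the 10 ms broker latency affects the latency budget rather than the partition count.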
Exercise 3: Serialization Format Comparison
Write a Python program that creates a sample TrackingFrame object, serializes it to JSON, and compares the JSON size to that of a hand-packed binary format (using struct.pack). Calculate the size ratio (JSON bytes divided by binary bytes).
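A minimal sketch of the comparison follows. The TrackingFrame fields used here are assumptions chosen for illustration, since the chapter's actual class definition is not reproduced in this exercise set, and the binary layout is one of many reasonable choices.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class TrackingFrame:
    """Hypothetical frame layout; the chapter's real fields may differ."""
    timestamp_ms: int
    entity_id: int
    x: float
    y: float
    speed: float


# "<" = little-endian with no padding, q = int64, H = uint16, 3f = three float32
BINARY_LAYOUT = "<qH3f"

frame = TrackingFrame(timestamp_ms=1_234_567, entity_id=7, x=52.5, y=34.0, speed=6.2)

json_bytes = json.dumps(asdict(frame)).encode("utf-8")
binary_bytes = struct.pack(
    BINARY_LAYOUT, frame.timestamp_ms, frame.entity_id, frame.x, frame.y, frame.speed
)
size_ratio = len(json_bytes) / len(binary_bytes)

print(f"JSON: {len(json_bytes)} B, binary: {len(binary_bytes)} B, ratio: {size_ratio:.2f}")
```

Packing coordinates as float32 trades precision for size, so a round trip through struct.unpack will not reproduce values such as 6.2 exactly.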
Exercise 4: Event Schema Design
Design a data schema (as a Python dataclass) for a real-time match event that covers passes, shots, tackles, and fouls. Include all fields necessary for downstream analytics. Justify your design choices in comments.
Exercise 5: Message Broker Selection
A club is choosing between Redis Streams and Apache Kafka for their real-time pipeline. They need to process a single match at a time, with a development team of two engineers. Write a 300-word comparative analysis recommending one solution, considering latency, throughput, operational complexity, and team size.
Section 27.2: Live Match Analytics
Exercise 6: Pressing Intensity Calculation
Given the following data for 4 defending players at time $t$:
| Player | Speed (m/s) | Angle to Ball Carrier (degrees) | Distance to Ball (m) |
|---|---|---|---|
| D1 | 8.2 | 15 | 10.5 |
| D2 | 4.1 | 45 | 14.0 |
| D3 | 9.0 | 10 | 8.2 |
| D4 | 3.5 | 80 | 20.0 |
Using a pressing engagement radius of $R = 15$ meters, compute the pressing intensity $\text{PI}(t)$.
Exercise 7: Momentum Score Implementation
Implement a Python function that computes the momentum score $M(t)$ given arrays of xT rate, pressing intensity, possession share, and territorial index over a rolling window. Use weights $\alpha = 0.3$, $\beta = 0.25$, $\gamma = 0.25$, $\delta = 0.2$.
Exercise 8: Formation Detection
Given the following outfield player positions for a team (excluding goalkeeper):
positions = [
    (15, 20), (15, 34), (15, 48), (15, 62),  # Back four
    (35, 25), (35, 40), (35, 55),            # Midfield three
    (55, 15), (55, 40), (55, 65),            # Front three
]
(a) Compute the centroid of each "line" of players.
(b) Classify this as a 4-3-3, 4-4-2, or 3-5-2 formation by computing the assignment cost to reference templates.
(c) What happens to the classification if position (35, 40) moves to (48, 40)?
Exercise 9: Real-Time xG Model
Build a simple logistic regression xG model using the following training data:
| Distance (m) | Angle (degrees) | Goal (0/1) |
|---|---|---|
| 8 | 25 | 1 |
| 15 | 12 | 0 |
| 6 | 35 | 1 |
| 22 | 8 | 0 |
| 10 | 20 | 1 |
| 30 | 5 | 0 |
| 12 | 18 | 0 |
| 7 | 30 | 1 |
Train the model and compute the xG for a shot from 11 meters at a 22-degree angle.
Exercise 10: Fatigue Index
A player has the following high-speed running (HSR) data over 6 time intervals (each 15 minutes):
hsr_distances = [180, 210, 195, 160, 120, 90] # meters per interval
Using an exponential decay factor of $\lambda = 0.1$ per interval, compute the exponentially weighted HSR metric at the end of the match (after interval 6). Compare this to the simple cumulative HSR.
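The sketch below shows one plausible form of the weighting, in which an interval that ended k intervals before the final one is multiplied by exp(-λk). This form is an assumption; if the chapter defines the exponentially weighted metric differently (for example with a normalising constant), the helper should be adapted. The function name is hypothetical.

```python
import math


def exp_weighted_total(values, decay):
    """Sum values with weight exp(-decay * age), where age 0 is the last interval.

    Assumed form of the weighting; not taken from the chapter text.
    """
    n = len(values)
    return sum(v * math.exp(-decay * (n - 1 - i)) for i, v in enumerate(values))


hsr_distances = [180, 210, 195, 160, 120, 90]  # meters per interval

weighted_hsr = exp_weighted_total(hsr_distances, 0.1)
cumulative_hsr = sum(hsr_distances)

print(f"Exponentially weighted HSR: {weighted_hsr:.1f} m")
print(f"Simple cumulative HSR:      {cumulative_hsr} m")
```

With a positive decay factor the weighted figure is always below the cumulative one, because every interval except the last is discounted.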
Section 27.3: Decision Support Systems
Exercise 11: Substitution Optimization
A team is losing 0-1 in the 65th minute. They have the following substitution options:
| Sub In | Quality Rating | Context Fit |
|---|---|---|
| Player A (striker) | 9.5 | 0.9 |
| Player B (midfielder) | 9.0 | 0.7 |
| Player C (winger) | 10.0 | 0.6 |
The player to be substituted (Player X) has a quality rating of 8.5 and a fatigue index of 0.4. Using the simplified substitution impact model from Section 27.3.2, with $R(65) = 0.28$ (remaining influence factor), compute $\Delta\text{xPoints}$ for each option and recommend the optimal substitution.
Exercise 12: Tactical Gap Detection
Implement a Python function that computes the defensive gap between midfield and defensive lines, given lists of y-coordinates for midfielders and defenders. Test it with sample data and define an "alert threshold" above which the system should warn the coaching staff.
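A possible starting point is sketched here. It assumes that the y-axis runs from goal to goal, so that the gap between lines is the difference of their mean y-coordinates, and it uses a placeholder threshold of 18 m that is purely illustrative. Both function names are hypothetical.

```python
from statistics import mean


def line_gap(midfielder_ys, defender_ys):
    """Distance between the average heights of the midfield and defensive lines."""
    return abs(mean(midfielder_ys) - mean(defender_ys))


def gap_alert(gap_m, threshold_m=18.0):
    """Return True when the gap exceeds the (placeholder) alert threshold."""
    return gap_m > threshold_m


# Sample data invented for testing only.
sample_midfielders = [40, 42, 38]
sample_defenders = [20, 22, 18, 20]

gap = line_gap(sample_midfielders, sample_defenders)
print(f"Gap: {gap} m, alert: {gap_alert(gap)}")
```

Using the mean makes the measure sensitive to a single player stepping out of line; the median is a more robust alternative worth comparing.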
Exercise 13: Set-Piece Pattern Matching
Given the following feature vectors for 3 historical corner kick routines and one observed setup:
historical = [
    [0.8, 0.2, 0.6, 0.1],  # Near post delivery
    [0.2, 0.8, 0.3, 0.7],  # Far post delivery
    [0.5, 0.5, 0.9, 0.2],  # Short corner
]
observed = [0.7, 0.3, 0.5, 0.15]
Compute the similarity score $S$ for each historical routine with $\sigma = 0.5$. Which routine is most likely being executed?
Exercise 14: Confidence Intervals
A substitution recommendation model outputs the following xPoints changes across 10 ensemble members:
ensemble_outputs = [0.12, 0.08, 0.15, -0.02, 0.11, 0.09, 0.20, 0.05, 0.13, 0.07]
Compute the mean, standard deviation, and 95% confidence interval. Should the system flag this recommendation as high-confidence or low-confidence? Justify with a threshold.
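The summary statistics can be sketched with the standard library alone. The interval below is a Student-t interval for the ensemble mean, which assumes that the members are roughly independent and normally distributed; the critical value 2.262 is the two-sided 95% value for 9 degrees of freedom. The helper name is hypothetical, and the flagging threshold is left to the reader.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for n - 1 = 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262


def mean_ci(samples, t_crit):
    """Return (mean, sample standard deviation, (low, high)) for the mean."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(len(samples))
    return m, s, (m - half_width, m + half_width)


ensemble_outputs = [0.12, 0.08, 0.15, -0.02, 0.11, 0.09, 0.20, 0.05, 0.13, 0.07]

ens_mean, ens_std, (ci_low, ci_high) = mean_ci(ensemble_outputs, T_CRIT_95_DF9)
print(f"mean={ens_mean:.3f}, std={ens_std:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

One candidate confidence rule is to check whether the interval excludes zero, but a threshold on the interval width is an equally defensible choice.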
Exercise 15: Decision Tree for Tactical Adjustments
Design a decision tree (pseudocode or Python) that takes the following inputs and outputs a tactical recommendation:
- Pressing intensity (high/low)
- Territorial dominance (our half/their half)
- Score differential (-2 to +2)
- Time remaining (minutes)
Section 27.4: Visualization for Quick Decisions
Exercise 16: Sparkline Generation
Write a Python function using matplotlib that generates a sparkline (a small, word-sized line chart) from a list of values. The function should produce a figure no larger than 150x30 pixels, with no axes, labels, or borders.
Exercise 17: Voronoi Pitch Control
Using the plot_pitch_control function from Section 27.4.5 as a starting point, extend it to:
(a) Color the Voronoi regions with transparency based on the nearest player's speed (faster = more saturated).
(b) Add a ball position marker.
(c) Add player jersey numbers as text labels.
Exercise 18: Alert Dashboard Mockup
Design (in code or pseudocode) a dashboard layout with three panels:
1. A momentum bar showing the current momentum score (-1 to +1).
2. A fatigue alert list showing players sorted by fatigue index.
3. A mini pitch map showing current positions.
Implement this using matplotlib with a 3-panel figure.
Exercise 19: Color-Blind Accessible Design
The standard color encoding (green/amber/red/blue) is problematic for color-blind users. Propose an alternative encoding scheme that uses both color and shape/pattern to convey the same urgency levels. Implement a sample alert display in Python.
Exercise 20: Heat Map Generation
Write a Python function that generates a real-time pressing heat map from the last 5 minutes of player position data. Use a 2D Gaussian kernel density estimate and overlay it on a pitch diagram.
Section 27.5: Bench-Side Technology
Exercise 21: Failure Mode Analysis
For a real-time analytics system deployed at a stadium, list 5 plausible failure scenarios not covered in Table 27.5.4. For each, describe the impact and propose a mitigation strategy.
Exercise 22: Communication Protocol Design
Design a structured halftime briefing template (as a Python dictionary/data structure) that includes:
- Key performance indicators for both teams
- Top 3 tactical observations
- Substitution recommendations with confidence levels
- Set-piece analysis summary
- Physical load warnings
Exercise 23: Regulatory Compliance Checklist
Research and create a compliance checklist (minimum 10 items) for deploying real-time analytics technology at a UEFA Champions League match. Consider FIFA regulations, UEFA-specific rules, and data protection requirements.
Section 27.6: Post-Match Rapid Analysis
Exercise 24: Automated Report Generator
Extend the generate_player_summary function from Section 27.6.2 to include:
- Defensive actions (tackles, interceptions, clearances)
- Positional heat map description (which zones the player occupied most)
- Comparison to season average for each metric
Exercise 25: Video Tagging System
Design a Python class MatchEventTagger that:
- Stores tagged events with timestamps, event types, players involved, and pitch zones.
- Supports querying by event type, player, zone, or time range.
- Generates playlists (ordered lists of time ranges) for common queries.
Exercise 26: Physical Load Report
Write a Python function that computes the metabolic power for a player given arrays of velocity and acceleration sampled at 25 Hz. Plot the metabolic power over time and annotate peaks above a threshold.
Exercise 27: Natural Language Match Summary
Write a function that takes a dictionary of match statistics (possession, shots, xG, passes, etc.) for both teams and generates a 3-paragraph natural language summary of the match. Use template-based NLG with conditional logic for different match narratives (dominant win, close contest, upset, etc.).
Section 27.7: Building Real-Time Pipelines
Exercise 28: Windowing Implementation
Implement three windowing strategies in Python:
(a) Tumbling window: aggregate events into non-overlapping 5-minute bins.
(b) Sliding window: compute a rolling average over 5-minute windows, updated every 30 seconds.
(c) Session window: group events into possession sequences separated by gaps of more than 3 seconds.
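Two of the three strategies can be sketched as plain batch functions, which may help clarify the intended semantics before building streaming versions. The event representation, (timestamp in seconds, value) pairs, and the function names are assumptions; the sliding window is deliberately left out.

```python
from collections import defaultdict


def tumbling_windows(events, width_s):
    """Group (timestamp_s, value) pairs into non-overlapping bins.

    Keys are bin start times in seconds; each event lands in exactly one bin.
    """
    bins = defaultdict(list)
    for ts, value in events:
        bins[int(ts // width_s) * width_s].append(value)
    return dict(sorted(bins.items()))


def session_windows(timestamps, max_gap_s):
    """Split timestamps into sessions wherever the gap exceeds max_gap_s."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap_s:
            sessions[-1].append(ts)  # gap small enough: extend current session
        else:
            sessions.append([ts])    # gap too large (or first event): new session
    return sessions


print(tumbling_windows([(10, 1), (299, 2), (300, 3), (601.5, 4)], 300))
print(session_windows([0, 1, 2.5, 7, 8, 20], 3))
```

Both helpers sort or bin a complete list, so they sidestep late and out-of-order arrivals, which a real stream processor has to handle explicitly.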
Exercise 29: Pipeline Latency Profiling
Using the measure_pipeline_latency function from Section 27.7.4, create a test harness that:
(a) Generates 10,000 synthetic tracking events.
(b) Defines a mock pipeline function with random processing time (5-50 ms).
(c) Measures and plots the latency distribution (histogram + CDF).
Exercise 30: State Management Simulation
Implement a stateful stream processor that maintains cumulative distance, sprint count, and rolling average speed for each player. The processor should:
(a) Handle out-of-order events (events arriving with timestamps earlier than the latest processed event).
(b) Support checkpointing (serialize state to a file every 1,000 events).
(c) Support recovery from a checkpoint.
Exercise 31: End-to-End Pipeline
Build a complete mini real-time pipeline that:
1. Reads simulated tracking data from a CSV file (one row per frame).
2. Computes per-player distance, speed, and acceleration in real-time.
3. Detects formations every 30 seconds.
4. Computes a simple momentum score.
5. Outputs a live-updating text-based dashboard to the console.
Exercise 32: Security Audit
Design a security audit checklist for a real-time soccer analytics system. Include at least 15 items covering network security, data encryption, access control, logging, and compliance. Implement a Python function that validates a system configuration dictionary against this checklist.
Exercise 33: Scaling Analysis
A scouting network needs to process 8 concurrent matches, each generating 25 Hz tracking data for 23 entities. Each event requires 2 ms of processing time on a single core.
(a) Calculate the total processing load in core-milliseconds per second.
(b) How many CPU cores are needed with a 70% utilization target?
(c) If cloud compute costs \$0.05 per core-hour, what is the cost per match-day (assuming 8 matches over 3 hours)?
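The capacity arithmetic can be sketched as follows. The sketch assumes one event per entity per frame, that all 8 matches run concurrently for the full 3 hours, and that cores are provisioned (and billed) for the whole period; the function names are hypothetical.

```python
import math


def load_core_ms_per_s(matches, frame_rate_hz, entities, ms_per_event):
    """Total processing demand in core-milliseconds per wall-clock second."""
    return matches * frame_rate_hz * entities * ms_per_event


def cores_needed(core_ms_per_s, utilization_target):
    """Whole cores required so that average utilization stays at or below target."""
    fully_busy_cores = core_ms_per_s / 1000.0
    return math.ceil(fully_busy_cores / utilization_target)


def compute_cost(cores, hours, price_per_core_hour):
    """Cost of keeping the given number of cores provisioned for the period."""
    return cores * hours * price_per_core_hour


load = load_core_ms_per_s(8, 25, 23, 2)
cores = cores_needed(load, 0.70)
cost = compute_cost(cores, 3, 0.05)

print(f"Load: {load} core-ms/s, cores: {cores}, cost: ${cost:.2f}")
```

Rounding up to whole cores before pricing matters here: billing on the fractional core count would give a slightly lower figure.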
Challenge Problems
Exercise 34: Adversarial Robustness
An opponent becomes aware that your team uses real-time formation detection. They deliberately adopt a fluid, positionless formation to confuse your system. (a) Describe how this would affect the formation detection algorithm from Section 27.2.3. (b) Propose two alternative approaches that would be more robust to this adversarial strategy. (c) Implement one of these approaches in Python.
Exercise 35: Causal Inference in Real-Time
A common pitfall in momentum analysis is confusing correlation with causation. Design an experiment (using historical match data) to test whether the momentum score from Section 27.2.2 has genuine predictive power for subsequent goals, or whether it merely reflects recent events. Implement the experimental design in Python, including appropriate statistical tests.