Chapter 27 Exercises

Section 27.1: Real-Time Data Infrastructure

Exercise 1: Latency Budget Allocation

A club's real-time analytics pipeline has a total latency budget of 2,000 ms. The capture latency is 50 ms and the render latency is 80 ms. If the transmission latency is 120 ms, how much latency budget remains for processing? If the processing stage must run three sequential models (tracking enrichment, formation detection, and xG computation), what is the maximum allowable latency per model assuming equal allocation?
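
As a starting point, the budget arithmetic can be set up as a short script. This is a minimal sketch that assumes the four stages (capture, transmission, processing, render) are the only consumers of the budget and that they run strictly in sequence. The variable names are illustrative, not taken from the chapter.

```python
# Latency budget sketch. Assumes the stages run strictly in sequence and
# that no other stage consumes part of the 2,000 ms budget.
TOTAL_BUDGET_MS = 2000
fixed_stages_ms = {"capture": 50, "transmission": 120, "render": 80}
models = ["tracking_enrichment", "formation_detection", "xg_computation"]

# Whatever the fixed stages do not use is available for processing.
processing_budget_ms = TOTAL_BUDGET_MS - sum(fixed_stages_ms.values())

# Equal allocation across the sequential models.
per_model_budget_ms = processing_budget_ms / len(models)

print(f"Processing budget: {processing_budget_ms} ms")
print(f"Per-model budget:  {per_model_budget_ms:.1f} ms")
```

If the models could run in parallel rather than sequentially, each would be allowed the full processing budget, which is worth keeping in mind when interpreting the result.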

Exercise 2: Throughput Calculation

A tracking system produces data at 25 Hz for 23 entities (22 players + 1 ball). Each tracking frame event is 256 bytes. Calculate: (a) The total event rate in events per second. (b) The required bandwidth in megabytes per second. (c) If using a Kafka broker with 10 ms latency and 1 MB/s throughput per partition, how many partitions are needed?
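
One possible way to organize the calculation is sketched below. It assumes one 256-byte event per entity per frame and takes 1 MB to mean 10^6 bytes; both are interpretations of the exercise text rather than facts stated in it. The partition count here is driven by throughput only, so the 10 ms broker latency does not enter this sketch.

```python
import math

# Assumptions: one 256-byte event per entity per frame; 1 MB = 10**6 bytes.
FRAME_RATE_HZ = 25
ENTITIES = 23  # 22 players + 1 ball
EVENT_BYTES = 256
PARTITION_THROUGHPUT_MB_S = 1.0

# (a) Event rate in events per second.
events_per_second = FRAME_RATE_HZ * ENTITIES

# (b) Bandwidth in megabytes per second.
bandwidth_mb_s = events_per_second * EVENT_BYTES / 1e6

# (c) Partitions needed for throughput alone (at least one).
partitions = max(1, math.ceil(bandwidth_mb_s / PARTITION_THROUGHPUT_MB_S))

print(events_per_second, round(bandwidth_mb_s, 4), partitions)
```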

Exercise 3: Serialization Format Comparison

Write a Python program that creates a sample TrackingFrame object and serializes it to JSON, then compares the JSON size to a hand-packed binary format (using struct.pack). Calculate the compression ratio.
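
A possible skeleton for this comparison is shown below. The fields of `TrackingFrame` and the binary layout string are hypothetical choices made for illustration; the chapter's own `TrackingFrame` definition may contain different fields.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass
class TrackingFrame:
    # Hypothetical fields; the chapter's TrackingFrame may differ.
    timestamp_ms: int
    entity_id: int
    x: float
    y: float
    speed: float


frame = TrackingFrame(timestamp_ms=1_234_567, entity_id=7, x=52.5, y=34.0, speed=6.2)

# Text encoding: field names are repeated in every message.
json_bytes = json.dumps(asdict(frame)).encode("utf-8")

# Binary encoding. "<" = little-endian with no padding;
# q = int64, H = uint16, f = float32 (so floats lose some precision).
BINARY_LAYOUT = "<qHfff"
binary_bytes = struct.pack(
    BINARY_LAYOUT, frame.timestamp_ms, frame.entity_id, frame.x, frame.y, frame.speed
)

ratio = len(json_bytes) / len(binary_bytes)
print(f"JSON: {len(json_bytes)} B, binary: {len(binary_bytes)} B, ratio: {ratio:.2f}x")
```

The binary form is smaller mainly because it omits field names and stores numbers in fixed width, at the cost of requiring both ends to agree on the layout in advance.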

Exercise 4: Event Schema Design

Design a data schema (as a Python dataclass) for a real-time match event that covers passes, shots, tackles, and fouls. Include all fields necessary for downstream analytics. Justify your design choices in comments.

Exercise 5: Message Broker Selection

A club is choosing between Redis Streams and Apache Kafka for their real-time pipeline. They need to process a single match at a time, with a development team of two engineers. Write a 300-word comparative analysis recommending one solution, considering latency, throughput, operational complexity, and team size.


Section 27.2: Live Match Analytics

Exercise 6: Pressing Intensity Calculation

Given the following data for 4 defending players at time $t$:

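| Player | Speed (m/s) | Angle to Ball Carrier (degrees) | Distance to Ball (m) |
|--------|-------------|---------------------------------|----------------------|
| D1     | 8.2         | 15                              | 10.5                 |
| D2     | 4.1         | 45                              | 14.0                 |
| D3     | 9.0         | 10                              | 8.2                  |
| D4     | 3.5         | 80                              | 20.0                 |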

Using a pressing engagement radius of $R = 15$ meters, compute the pressing intensity $\text{PI}(t)$.

Exercise 7: Momentum Score Implementation

Implement a Python function that computes the momentum score $M(t)$ given arrays of xT rate, pressing intensity, possession share, and territorial index over a rolling window. Use weights $\alpha = 0.3$, $\beta = 0.25$, $\gamma = 0.25$, $\delta = 0.2$.
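
The sketch below shows one way such a function might be structured. It assumes that $M(t)$ is a weighted sum of the rolling means of the four inputs and that the inputs are already normalized to comparable scales; the exact definition of the momentum score is given in the chapter and may differ. The function name and the `window` parameter are illustrative.

```python
from statistics import fmean


def momentum_score(xt_rate, pressing, possession, territory, window=5,
                   alpha=0.3, beta=0.25, gamma=0.25, delta=0.2):
    """Weighted sum of the rolling means of four equally long series.

    Assumed form: M(t) = alpha*xT + beta*PI + gamma*possession + delta*territory,
    with each term averaged over the last `window` samples.
    """
    series = (xt_rate, pressing, possession, territory)
    n = len(xt_rate)
    if any(len(s) != n for s in series):
        raise ValueError("all input series must have the same length")
    weights = (alpha, beta, gamma, delta)
    scores = []
    for t in range(n):
        start = max(0, t - window + 1)  # shorter window at the start of the match
        scores.append(
            sum(w * fmean(s[start:t + 1]) for w, s in zip(weights, series))
        )
    return scores


# Because the weights sum to 1, constant inputs of 1.0 give a score of 1.0.
print(momentum_score([1.0] * 4, [1.0] * 4, [1.0] * 4, [1.0] * 4))
```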

Exercise 8: Formation Detection

Given the following outfield player positions for a team (excluding goalkeeper):

positions = [
    (15, 20), (15, 34), (15, 48), (15, 62),   # Back four
    (35, 25), (35, 40), (35, 55),               # Midfield three
    (55, 15), (55, 40), (55, 65),               # Front three
]

(a) Compute the centroid of each "line" of players. (b) Classify this as a 4-3-3, 4-4-2, or 3-5-2 formation by computing the assignment cost to reference templates. (c) What happens to the classification if the player at (35, 40) moves to (48, 40)?

Exercise 9: Real-Time xG Model

Build a simple logistic regression xG model using the following training data:

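| Distance (m) | Angle (degrees) | Goal (0/1) |
|--------------|-----------------|------------|
| 8            | 25              | 1          |
| 15           | 12              | 0          |
| 6            | 35              | 1          |
| 22           | 8               | 0          |
| 10           | 20              | 1          |
| 30           | 5               | 0          |
| 12           | 18              | 0          |
| 7            | 30              | 1          |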

Train the model and compute the xG for a shot from 11 meters at a 22-degree angle.
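
One way to approach this without external libraries is sketched below. Note that the eight shots are perfectly separable by distance alone (every goal is from 10 m or closer, every miss from 12 m or farther), so an unregularized fit would push the coefficients toward infinity. The sketch therefore adds an L2 penalty; the penalty strength, learning rate, and epoch count are assumed values, and the predicted xG depends on them.

```python
import math

# Training data from the exercise: (distance_m, angle_deg, goal).
shots = [(8, 25, 1), (15, 12, 0), (6, 35, 1), (22, 8, 0),
         (10, 20, 1), (30, 5, 0), (12, 18, 0), (7, 30, 1)]


def mean_and_std(values):
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


d_mean, d_std = mean_and_std([s[0] for s in shots])
a_mean, a_std = mean_and_std([s[1] for s in shots])


def features(distance, angle):
    # Bias term plus standardized distance and angle.
    return [1.0, (distance - d_mean) / d_std, (angle - a_mean) / a_std]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


X = [features(d, a) for d, a, _ in shots]
y = [g for _, _, g in shots]

# Batch gradient descent with an L2 penalty (all three values are assumptions).
LEARNING_RATE, L2, EPOCHS = 0.1, 0.1, 5000
weights = [0.0, 0.0, 0.0]
for _ in range(EPOCHS):
    grad = [0.0, 0.0, 0.0]
    for xi, yi in zip(X, y):
        error = sigmoid(sum(w * v for w, v in zip(weights, xi))) - yi
        for j in range(3):
            grad[j] += error * xi[j] / len(X)
    for j in range(3):
        penalty = L2 * weights[j] if j > 0 else 0.0  # bias is not penalized
        weights[j] -= LEARNING_RATE * (grad[j] + penalty)


def predict_xg(distance, angle):
    z = sum(w * v for w, v in zip(weights, features(distance, angle)))
    return sigmoid(z)


print(f"xG(11 m, 22 deg) = {predict_xg(11, 22):.3f}")
```

With only eight training shots, the result is best read as an illustration of the mechanics rather than a usable xG estimate.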

Exercise 10: Fatigue Index

A player has the following high-speed running (HSR) data over 6 time intervals (each 15 minutes):

hsr_distances = [180, 210, 195, 160, 120, 90]  # meters per interval

Using an exponential decay factor of $\lambda = 0.1$ per interval, compute the exponentially weighted HSR metric at the end of the match (after interval 6). Compare this to the simple cumulative HSR.
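
The comparison can be set up as follows. This sketch assumes the weighting $w_i = e^{-\lambda (n - i)}$ for interval $i$ of $n$, so that the final interval has weight 1 and earlier intervals are progressively discounted. If the chapter defines the weighted metric differently (for example, with normalized weights), the weights line should be adapted accordingly.

```python
import math

hsr_distances = [180, 210, 195, 160, 120, 90]  # meters per 15-minute interval
DECAY = 0.1  # lambda, per interval

n = len(hsr_distances)

# Assumed weighting: interval i (1-based) gets weight exp(-lambda * (n - i)),
# so the most recent interval counts fully and older ones count less.
weights = [math.exp(-DECAY * (n - i)) for i in range(1, n + 1)]

weighted_hsr = sum(w * d for w, d in zip(weights, hsr_distances))
cumulative_hsr = sum(hsr_distances)

print(f"Exponentially weighted HSR: {weighted_hsr:.1f} m")
print(f"Simple cumulative HSR:      {cumulative_hsr} m")
```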


Section 27.3: Decision Support Systems

Exercise 11: Substitution Optimization

A team is losing 0-1 in the 65th minute. They have the following substitution options:

| Sub In                | Quality Rating | Context Fit |
|-----------------------|----------------|-------------|
| Player A (striker)    | 9.5            | 0.9         |
| Player B (midfielder) | 9.0            | 0.7         |
| Player C (winger)     | 10.0           | 0.6         |

The player to be substituted (Player X) has a quality rating of 8.5 and a fatigue index of 0.4. Using the simplified substitution impact model from Section 27.3.2, with $R(65) = 0.28$ (remaining influence factor), compute $\Delta\text{xPoints}$ for each option and recommend the optimal substitution.

Exercise 12: Tactical Gap Detection

Implement a Python function that computes the defensive gap between midfield and defensive lines, given lists of y-coordinates for midfielders and defenders. Test it with sample data and define an "alert threshold" above which the system should warn the coaching staff.
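
A minimal sketch of such a function is given below. It assumes the y-coordinate runs along the length of the pitch (goal to goal), measures the gap between the mean positions of the two lines, and uses a hypothetical alert threshold of 18 m. The function names, the sample coordinates, and the threshold are all illustrative assumptions.

```python
from statistics import fmean

ALERT_THRESHOLD_M = 18.0  # hypothetical value; tune against historical data


def line_gap(midfielder_ys, defender_ys):
    """Distance in meters between the mean positions of the two lines."""
    if not midfielder_ys or not defender_ys:
        raise ValueError("both lines need at least one player")
    return abs(fmean(midfielder_ys) - fmean(defender_ys))


def gap_alert(midfielder_ys, defender_ys, threshold=ALERT_THRESHOLD_M):
    """Return the gap and whether it exceeds the alert threshold."""
    gap = line_gap(midfielder_ys, defender_ys)
    return gap, gap > threshold


# Sample data (made up for testing): three midfielders, four defenders.
gap, alert = gap_alert([42.0, 45.0, 44.0], [20.0, 22.0, 21.0, 19.0])
print(f"gap = {gap:.1f} m, alert = {alert}")
```

Using the mean of each line is only one option; the distance between the deepest midfielder and the highest defender would be more sensitive to a single player out of position.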

Exercise 13: Set-Piece Pattern Matching

Given the following feature vectors for 3 historical corner kick routines and one observed setup:

historical = [
    [0.8, 0.2, 0.6, 0.1],  # Near post delivery
    [0.2, 0.8, 0.3, 0.7],  # Far post delivery
    [0.5, 0.5, 0.9, 0.2],  # Short corner
]
observed = [0.7, 0.3, 0.5, 0.15]

Compute the similarity score $S$ for each historical routine with $\sigma = 0.5$. Which routine is most likely being executed?
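
The computation might be organized as below. The sketch assumes a Gaussian kernel, $S = \exp\left(-\lVert \mathbf{a} - \mathbf{b} \rVert^2 / (2\sigma^2)\right)$; if the chapter defines $S$ differently, only the `similarity` function needs to change. The dictionary keys are labels added for readability.

```python
import math

historical = {
    "near_post": [0.8, 0.2, 0.6, 0.1],
    "far_post": [0.2, 0.8, 0.3, 0.7],
    "short_corner": [0.5, 0.5, 0.9, 0.2],
}
observed = [0.7, 0.3, 0.5, 0.15]
SIGMA = 0.5


def similarity(a, b, sigma=SIGMA):
    """Assumed Gaussian kernel: exp(-||a - b||^2 / (2 * sigma^2))."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2 * sigma ** 2))


scores = {name: similarity(vec, observed) for name, vec in historical.items()}
best_match = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Most similar routine:", best_match)
```

Any kernel that decreases with Euclidean distance produces the same ranking; the choice of $\sigma$ affects how sharply the scores separate, not their order.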

Exercise 14: Confidence Intervals

A substitution recommendation model outputs the following xPoints changes across 10 ensemble members:

ensemble_outputs = [0.12, 0.08, 0.15, -0.02, 0.11, 0.09, 0.20, 0.05, 0.13, 0.07]

Compute the mean, standard deviation, and 95% confidence interval. Should the system flag this recommendation as high-confidence or low-confidence? Justify with a threshold.
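
The statistics can be computed with the standard library, as in the sketch below. It assumes a Student's t interval for the mean (critical value 2.262 for 9 degrees of freedom) and uses a hypothetical flagging rule: the recommendation counts as high-confidence only if the whole interval lies above zero. Other thresholds are equally defensible and should be justified in the answer.

```python
import math
from statistics import mean, stdev

ensemble_outputs = [0.12, 0.08, 0.15, -0.02, 0.11, 0.09, 0.20, 0.05, 0.13, 0.07]

n = len(ensemble_outputs)
sample_mean = mean(ensemble_outputs)
sample_std = stdev(ensemble_outputs)  # sample standard deviation (divides by n - 1)

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262
half_width = T_CRIT_95_DF9 * sample_std / math.sqrt(n)
ci_low, ci_high = sample_mean - half_width, sample_mean + half_width

# Hypothetical rule: high-confidence only if the whole interval is above zero.
high_confidence = ci_low > 0

print(f"mean = {sample_mean:.3f}, std = {sample_std:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f}), high confidence: {high_confidence}")
```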

Exercise 15: Decision Tree for Tactical Adjustments

Design a decision tree (pseudocode or Python) that takes the following inputs and outputs a tactical recommendation:
- Pressing intensity (high/low)
- Territorial dominance (our half/their half)
- Score differential (-2 to +2)
- Time remaining (minutes)


Section 27.4: Visualization for Quick Decisions

Exercise 16: Sparkline Generation

Write a Python function using matplotlib that generates a sparkline (a small, word-sized line chart) from a list of values. The function should produce a figure no larger than 150x30 pixels, with no axes, labels, or borders.

Exercise 17: Voronoi Pitch Control

Using the plot_pitch_control function from Section 27.4.5 as a starting point, extend it to: (a) Color the Voronoi regions with transparency based on the nearest player's speed (faster = more saturated). (b) Add a ball position marker. (c) Add player jersey numbers as text labels.

Exercise 18: Alert Dashboard Mockup

Design (in code or pseudocode) a dashboard layout with three panels:
1. A momentum bar showing the current momentum score (-1 to +1).
2. A fatigue alert list showing players sorted by fatigue index.
3. A mini pitch map showing current positions.

Implement this using matplotlib with a 3-panel figure.

Exercise 19: Color-Blind Accessible Design

The standard color encoding (green/amber/red/blue) is problematic for color-blind users. Propose an alternative encoding scheme that uses both color and shape/pattern to convey the same urgency levels. Implement a sample alert display in Python.

Exercise 20: Heat Map Generation

Write a Python function that generates a real-time pressing heat map from the last 5 minutes of player position data. Use a 2D Gaussian kernel density estimate and overlay it on a pitch diagram.


Section 27.5: Bench-Side Technology

Exercise 21: Failure Mode Analysis

For a real-time analytics system deployed at a stadium, list 5 plausible failure scenarios not covered in Table 27.5.4. For each, describe the impact and propose a mitigation strategy.

Exercise 22: Communication Protocol Design

Design a structured halftime briefing template (as a Python dictionary/data structure) that includes:
- Key performance indicators for both teams
- Top 3 tactical observations
- Substitution recommendations with confidence levels
- Set-piece analysis summary
- Physical load warnings

Exercise 23: Regulatory Compliance Checklist

Research and create a compliance checklist (minimum 10 items) for deploying real-time analytics technology at a UEFA Champions League match. Consider FIFA regulations, UEFA-specific rules, and data protection requirements.


Section 27.6: Post-Match Rapid Analysis

Exercise 24: Automated Report Generator

Extend the generate_player_summary function from Section 27.6.2 to include:
- Defensive actions (tackles, interceptions, clearances)
- Positional heat map description (which zones the player occupied most)
- Comparison to season average for each metric

Exercise 25: Video Tagging System

Design a Python class MatchEventTagger that:
- Stores tagged events with timestamps, event types, players involved, and pitch zones.
- Supports querying by event type, player, zone, or time range.
- Generates playlists (ordered lists of time ranges) for common queries.

Exercise 26: Physical Load Report

Write a Python function that computes the metabolic power for a player given arrays of velocity and acceleration sampled at 25 Hz. Plot the metabolic power over time and annotate peaks above a threshold.

Exercise 27: Natural Language Match Summary

Write a function that takes a dictionary of match statistics (possession, shots, xG, passes, etc.) for both teams and generates a 3-paragraph natural language summary of the match. Use template-based NLG with conditional logic for different match narratives (dominant win, close contest, upset, etc.).


Section 27.7: Building Real-Time Pipelines

Exercise 28: Windowing Implementation

Implement three windowing strategies in Python: (a) Tumbling window: aggregate events into non-overlapping 5-minute bins. (b) Sliding window: compute a rolling average over 5-minute windows, updated every 30 seconds. (c) Session window: group events into possession sequences separated by gaps longer than 3 seconds.
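
The three strategies can be prototyped on an in-memory list before moving to a streaming framework. The sketch below assumes events are `(timestamp_seconds, value)` tuples; the function names and the sample events are illustrative, and the sliding-window version favors clarity over efficiency (it rescans all events for every window).

```python
from collections import defaultdict


def tumbling_windows(events, size_s=300):
    """Sum values in non-overlapping bins; returns {bin_start_s: total}."""
    bins = defaultdict(float)
    for ts, value in events:
        bins[int(ts // size_s) * size_s] += value
    return dict(bins)


def sliding_averages(events, size_s=300, step_s=30):
    """Mean value over (end - size_s, end] for window ends every step_s."""
    if not events:
        return []
    last_ts = max(ts for ts, _ in events)
    results = []
    end = step_s
    while end <= last_ts + step_s:
        window = [v for ts, v in events if end - size_s < ts <= end]
        if window:  # skip windows that contain no events
            results.append((end, sum(window) / len(window)))
        end += step_s
    return results


def session_windows(events, max_gap_s=3.0):
    """Split events into sessions wherever the gap exceeds max_gap_s."""
    sessions = []
    for ts, value in sorted(events):
        # Compare against the timestamp of the last event in the open session.
        if sessions and ts - sessions[-1][-1][0] <= max_gap_s:
            sessions[-1].append((ts, value))
        else:
            sessions.append([(ts, value)])
    return sessions


events = [(1.0, 2.0), (2.5, 4.0), (10.0, 6.0), (301.0, 8.0)]
print(tumbling_windows(events))
print(sliding_averages(events)[:2])
print(len(session_windows(events)), "sessions")
```

In a live pipeline the same logic would run incrementally, and late or out-of-order events would need explicit handling, which this batch-style sketch sidesteps by sorting.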

Exercise 29: Pipeline Latency Profiling

Using the measure_pipeline_latency function from Section 27.7.4, create a test harness that: (a) Generates 10,000 synthetic tracking events. (b) Defines a mock pipeline function with random processing time (5-50 ms). (c) Measures and plots the latency distribution (histogram + CDF).

Exercise 30: State Management Simulation

Implement a stateful stream processor that maintains cumulative distance, sprint count, and rolling average speed for each player. The processor should: (a) Handle out-of-order events (events arriving with timestamps earlier than the latest processed event). (b) Support checkpointing (serialize state to a file every 1,000 events). (c) Support recovery from a checkpoint.

Exercise 31: End-to-End Pipeline

Build a complete mini real-time pipeline that:
1. Reads simulated tracking data from a CSV file (one row per frame).
2. Computes per-player distance, speed, and acceleration in real-time.
3. Detects formations every 30 seconds.
4. Computes a simple momentum score.
5. Outputs a live-updating text-based dashboard to the console.

Exercise 32: Security Audit

Design a security audit checklist for a real-time soccer analytics system. Include at least 15 items covering network security, data encryption, access control, logging, and compliance. Implement a Python function that validates a system configuration dictionary against this checklist.

Exercise 33: Scaling Analysis

A scouting network needs to process 8 concurrent matches, each generating 25 Hz tracking data for 23 entities. Each event requires 2 ms of processing time on a single core. (a) Calculate the total processing load in core-milliseconds per second. (b) How many CPU cores are needed with a 70% utilization target? (c) If cloud compute costs $0.05 per core-hour, what is the cost per match-day (assuming 8 matches over 3 hours)?
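
The arithmetic can be laid out as in the sketch below. It assumes one event per entity per frame, that all 8 matches run concurrently over the same 3-hour period, and that whole cores are provisioned for the full duration; these are interpretations of the exercise rather than stated facts.

```python
import math

# Assumptions: one event per entity per frame; all matches run concurrently;
# whole cores are rented for the full 3 hours.
MATCHES = 8
FRAME_RATE_HZ = 25
ENTITIES = 23
PROCESSING_MS_PER_EVENT = 2
TARGET_UTILIZATION = 0.70
COST_PER_CORE_HOUR = 0.05  # dollars
MATCH_DAY_HOURS = 3

# (a) Processing load in core-milliseconds per second.
events_per_second = MATCHES * FRAME_RATE_HZ * ENTITIES
load_core_ms_per_s = events_per_second * PROCESSING_MS_PER_EVENT

# (b) One core supplies 1000 core-ms per second; derate by target utilization.
busy_cores = load_core_ms_per_s / 1000
cores_needed = math.ceil(busy_cores / TARGET_UTILIZATION)

# (c) Cost of running those cores for the match-day.
match_day_cost = cores_needed * MATCH_DAY_HOURS * COST_PER_CORE_HOUR

print(load_core_ms_per_s, cores_needed, f"${match_day_cost:.2f}")
```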


Challenge Problems

Exercise 34: Adversarial Robustness

An opponent becomes aware that your team uses real-time formation detection. They deliberately adopt a fluid, positionless formation to confuse your system. Describe (a) how this would affect the formation detection algorithm from Section 27.2.3, (b) two alternative approaches that would be more robust to this adversarial strategy, and (c) implement one of these approaches in Python.

Exercise 35: Causal Inference in Real-Time

A common pitfall in momentum analysis is confusing correlation with causation. Design an experiment (using historical match data) to test whether the momentum score from Section 27.2.2 has genuine predictive power for subsequent goals, or whether it merely reflects recent events. Implement the experimental design in Python, including appropriate statistical tests.