Chapter 26 Exercises

Section 26.1: Understanding Soccer Injuries

Exercise 26.1 (Conceptual) Define the injury burden metric and explain why it provides a more useful summary of a team's injury problem than incidence or severity alone. Calculate the injury burden for hamstring injuries given an incidence of 1.2 injuries per 1,000 match hours and a mean severity of 18 days.

Exercise 26.2 (Data Analysis) The following table shows injury data for a professional soccer squad over three seasons:

| Season | Total Injuries | Total Days Lost | Total Exposure (hrs) | League Position |
|---|---|---|---|---|
| 2021-22 | 58 | 1,240 | 7,200 | 6th |
| 2022-23 | 42 | 890 | 7,400 | 3rd |
| 2023-24 | 51 | 1,100 | 7,100 | 5th |

(a) Calculate the injury incidence rate (per 1,000 hours) for each season. (b) Calculate the mean severity (days per injury) for each season. (c) Calculate the injury burden for each season. (d) Discuss the relationship between injury metrics and league position in this dataset.
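The three metrics in parts (a) to (c) can be sketched as a small helper. This is a minimal illustration, assuming injury burden is defined as days lost per 1,000 exposure hours (equivalently, incidence multiplied by mean severity); the function name `injury_metrics` and the example season are hypothetical and are not taken from the table above.

```python
def injury_metrics(n_injuries, days_lost, exposure_hours):
    """Return (incidence per 1,000 h, mean severity in days, burden in days per 1,000 h)."""
    if n_injuries < 0 or days_lost < 0 or exposure_hours <= 0:
        raise ValueError("counts must be non-negative and exposure must be positive")
    incidence = n_injuries / exposure_hours * 1000
    # Mean severity is undefined with zero injuries; report 0.0 in that case (an assumption).
    severity = days_lost / n_injuries if n_injuries else 0.0
    # Burden equals incidence * severity, which simplifies to days lost per 1,000 h.
    burden = days_lost / exposure_hours * 1000
    return incidence, severity, burden


# Hypothetical season used only to show the call pattern.
incidence, severity, burden = injury_metrics(n_injuries=30, days_lost=600, exposure_hours=6000)
print(f"incidence={incidence:.1f}/1000h, severity={severity:.1f} d, burden={burden:.1f} d/1000h")
```

Reporting all three values together makes it easy to see when two seasons with similar incidence differ sharply in burden.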

Exercise 26.3 (Statistical) A club's data department has identified the following injury incidence rates by position:

  • Goalkeepers: 3.2 per 1,000 hours
  • Center-backs: 7.8 per 1,000 hours
  • Full-backs: 9.4 per 1,000 hours
  • Central midfielders: 8.1 per 1,000 hours
  • Wingers: 10.9 per 1,000 hours
  • Strikers: 10.2 per 1,000 hours

Using a Poisson model, calculate the probability that a winger sustains at least one injury during a 200-hour exposure period.
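Under a Poisson model with a constant rate, the expected number of injuries is the rate multiplied by the exposure, and the probability of at least one injury is one minus the probability of zero events. A minimal sketch follows; the function name `prob_at_least_one` and the example rate are illustrative assumptions, not values from the list above.

```python
import math


def prob_at_least_one(rate_per_1000h, exposure_hours):
    """P(at least one injury) for a Poisson process with a constant rate."""
    expected_injuries = rate_per_1000h * exposure_hours / 1000
    # P(N >= 1) = 1 - P(N = 0) = 1 - exp(-expected count)
    return 1.0 - math.exp(-expected_injuries)


# Hypothetical rate of 5.0 injuries per 1,000 h over 100 h of exposure.
p = prob_at_least_one(5.0, 100)
print(f"P(at least one injury) = {p:.3f}")
```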

Exercise 26.4 (Research) Explain the difference between intrinsic and extrinsic injury risk factors. For each category, identify three risk factors that are modifiable and three that are non-modifiable. Discuss how analytics can contribute to managing each modifiable factor.

Section 26.2: Load Monitoring Frameworks

Exercise 26.5 (Calculation) A player's daily sRPE loads (in arbitrary units) over the past 35 days are provided below. Calculate: (a) The 7-day acute workload (days 29-35). (b) The 28-day chronic workload (average weekly load for days 1-28). (c) The acute:chronic workload ratio. (d) Classify the ACWR according to Gabbett's zones (sweet spot: 0.8-1.3, danger zone: >1.5).

Daily loads (AU): 450, 600, 0, 500, 700, 550, 0, 400, 650, 0, 480, 620, 500, 0, 350, 600, 0, 520, 700, 580, 0, 420, 550, 0, 500, 600, 480, 0, 700, 800, 650, 0, 750, 600, 500.
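The calculation described in parts (a) to (d) can be sketched as follows. This is one possible reading, assuming the acute load is the sum of the final 7 days, the chronic load is the average weekly load over the preceding 28 days, and the two windows do not overlap, as the exercise specifies. The function names and the constant-load example are hypothetical; loads between the two named zones are simply reported as unclassified.

```python
def uncoupled_acwr(daily_loads, acute_days=7, chronic_days=28):
    """Return (acute load, chronic weekly load, ACWR) with non-overlapping windows."""
    needed = acute_days + chronic_days
    if len(daily_loads) < needed:
        raise ValueError(f"need at least {needed} days of data")
    window = list(daily_loads)[-needed:]
    chronic_block = window[:chronic_days]   # earlier days
    acute_block = window[chronic_days:]     # most recent days
    acute = sum(acute_block)
    # Scale the chronic total to the length of one acute window (28 / 7 = 4 weeks).
    chronic_weekly = sum(chronic_block) / (chronic_days / acute_days)
    if chronic_weekly == 0:
        return acute, chronic_weekly, None  # ratio undefined
    return acute, chronic_weekly, acute / chronic_weekly


def classify_acwr(acwr):
    """Label an ACWR using only the two zones named in the exercise."""
    if acwr is None:
        return "undefined"
    if 0.8 <= acwr <= 1.3:
        return "sweet spot"
    if acwr > 1.5:
        return "danger zone"
    return "outside the two named zones"


# Hypothetical player with a constant 100 AU per day for 35 days.
acute, chronic, ratio = uncoupled_acwr([100] * 35)
print(acute, chronic, ratio, classify_acwr(ratio))
```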

Exercise 26.6 (Programming) Write a Python function that calculates the EWMA-based ACWR for a given time series of daily loads. Your function should: (a) Accept a list or array of daily loads and optional parameters for acute and chronic window sizes. (b) Calculate EWMA for both acute (default N=7) and chronic (default N=28) windows. (c) Return the ACWR time series. (d) Handle edge cases (e.g., division by zero when chronic load is zero).

Test your function on the data from Exercise 26.5.
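As a reference point for the implementation, one commonly used form of the exponentially weighted moving average is given below. The decay constant $\lambda = 2/(N+1)$ and the initialization $\text{EWMA}_1 = \text{load}_1$ are assumptions here; the chapter's own definition takes precedence if it differs.

$$\text{EWMA}_t = \lambda \cdot \text{load}_t + (1 - \lambda) \cdot \text{EWMA}_{t-1}, \qquad \lambda = \frac{2}{N + 1}$$

$$\text{ACWR}_t = \frac{\text{EWMA}_t^{(N=7)}}{\text{EWMA}_t^{(N=28)}}$$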

Exercise 26.7 (Critical Thinking) Describe the mathematical coupling artifact in the rolling-average ACWR model. Provide a numerical example demonstrating how a high acute load can dampen the ACWR due to its inclusion in the chronic window. Propose and justify a correction.

Exercise 26.8 (Data Analysis) A sports scientist has collected the following data for a midfielder over a 4-week period:

| Week | Total Distance (km) | HSR Distance (km) | Sprint Distance (m) | sRPE Load (AU) | Sleep (hrs) | Wellness (1-10) |
|---|---|---|---|---|---|---|
| 1 | 42.5 | 4.2 | 320 | 2,800 | 9.8 | 9.5 |
| 2 | 48.3 | 7.1 | 410 | 3,200 | 9.2 | 9.0 |
| 3 | 55.1 | 8.8 | 520 | 3,900 | 8.5 | 7.5 |
| 4 | 38.0 | 3.5 | 280 | 2,400 | 9.0 | 8.0 |

(a) Calculate training monotony and strain for weeks 1-3 (assume 6 training days per week with load evenly distributed plus one rest day). (b) Identify any concerning trends in the data. (c) If you were the sports scientist, what recommendations would you make for week 5?
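For part (a), training monotony is usually computed as the mean daily load divided by the standard deviation of daily load over the week, and strain as the weekly load multiplied by monotony. The sketch below assumes the population standard deviation (`statistics.pstdev`); some sources use the sample standard deviation instead, which changes the numbers. The function name and the example week are hypothetical.

```python
import statistics


def monotony_and_strain(daily_loads):
    """Return (monotony, strain) for one week of daily loads, rest days included as 0."""
    weekly_load = sum(daily_loads)
    mean_load = statistics.fmean(daily_loads)
    sd_load = statistics.pstdev(daily_loads)  # assumption: population SD
    if sd_load == 0:
        # Identical load every day: monotony is undefined (division by zero).
        return None, None
    monotony = mean_load / sd_load
    strain = weekly_load * monotony
    return monotony, strain


# Hypothetical week with two rest days, for illustration only.
monotony, strain = monotony_and_strain([400, 500, 0, 300, 600, 450, 0])
print(f"monotony={monotony:.2f}, strain={strain:.0f} AU")
```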

Exercise 26.9 (Conceptual) Compare and contrast three different internal load metrics: session RPE (sRPE), Banister's TRIMP, and Edwards' training load. Discuss the advantages and limitations of each, and explain in what contexts each might be preferred.

Exercise 26.10 (Programming) Implement a comprehensive daily load monitoring dashboard data generator in Python. Create synthetic data for a 25-player squad over a 40-week season including:

  • Daily GPS metrics (total distance, HSR, sprints)
  • Daily sRPE
  • Weekly wellness scores
  • Match days (2 per week during congested periods, 1 otherwise)

Calculate rolling ACWR for total distance and sRPE for each player.

Section 26.3: Injury Risk Models

Exercise 26.11 (Mathematical) Derive the precision (positive predictive value) for an injury risk model as a function of sensitivity, specificity, and base rate using Bayes' theorem. Plot precision as a function of base rate for sensitivities of 0.7, 0.8, and 0.9 with specificity fixed at 0.90.
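A hand derivation can be checked numerically with a few lines of code. The sketch below computes precision from the expected proportions of true and false positives; the function name `precision` and the base rates in the loop are illustrative assumptions, and plotting is left to the reader.

```python
def precision(sensitivity, specificity, base_rate):
    """Positive predictive value from sensitivity, specificity, and base rate."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    flagged = true_positives + false_positives
    if flagged == 0:
        return float("nan")  # nothing is ever flagged, so precision is undefined
    return true_positives / flagged


# Illustrative base rates; specificity fixed at 0.90 as in the exercise.
for base_rate in (0.01, 0.05, 0.20):
    print(base_rate, round(precision(0.8, 0.90, base_rate), 3))
```

Evaluating the function over a grid of base rates gives the curve to plot for each sensitivity.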

Exercise 26.12 (Programming) Build a logistic regression injury risk model using synthetic data: (a) Generate a synthetic dataset of 5,000 player-days with features: ACWR, age, previous injuries (count), cumulative load, sleep hours. (b) Simulate injury outcomes with a realistic base rate of approximately 1.5%. (c) Fit a logistic regression model with L2 regularization. (d) Evaluate the model using AUC-ROC, precision-recall curves, and the Brier score. (e) Interpret the coefficients in terms of odds ratios.

Exercise 26.13 (Programming) Extend Exercise 26.12 by: (a) Fitting a Random Forest classifier to the same dataset. (b) Comparing the Random Forest to the logistic regression model using AUC-ROC and calibration curves. (c) Extracting and plotting feature importance from the Random Forest. (d) Discussing which model you would recommend for deployment in a club setting and why.

Exercise 26.14 (Statistical) A club's injury risk model has AUC-ROC = 0.72. The head of performance asks: "Is this model good enough to use?" Write a 500-word memo addressing this question, covering: (a) What AUC-ROC means in practical terms. (b) How the low base rate affects the model's practical utility. (c) What additional metrics should be examined beyond AUC-ROC. (d) How the model could be used despite imperfect accuracy.

Exercise 26.15 (Advanced) Implement a Cox proportional hazards model for time-to-injury analysis: (a) Generate synthetic survival data for 100 players over a 300-day season, with covariates including ACWR, age, and fitness level. (b) Fit a Cox PH model using the lifelines library. (c) Plot survival curves stratified by ACWR quartile. (d) Test the proportional hazards assumption. (e) Interpret the hazard ratios.

Exercise 26.16 (Feature Engineering) Design a feature engineering pipeline for an injury risk model. For each feature, specify: (a) The raw data source. (b) The transformation applied. (c) The physiological or empirical rationale. (d) Expected direction of effect on injury risk.

Create at least 15 features spanning load, readiness, history, and contextual domains.

Section 26.4: Recovery and Return-to-Play

Exercise 26.17 (Modeling) A player's CMJ height data (in cm) following a hamstring injury and return to play are:

  • Day 0 (post-injury): 27.0 cm
  • Day 7: 30.0 cm
  • Day 14: 34.0 cm
  • Day 21: 36.5 cm
  • Day 28: 38.0 cm
  • Baseline: 40.0 cm

(a) Fit the exponential recovery model $R(t) = R_0 + (R_{\text{baseline}} - R_0)(1 - e^{-t/\tau})$ to this data using nonlinear least squares. (b) Estimate the time constant $\tau$. (c) Predict when the player will reach 95% of baseline. (d) Discuss the limitations of this simple model.
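Once $\tau$ has been estimated, the model can be evaluated and inverted in closed form: solving $R(t) = R_{\text{target}}$ gives $t = -\tau \ln\bigl(1 - (R_{\text{target}} - R_0)/(R_{\text{baseline}} - R_0)\bigr)$. The sketch below implements both directions; the function names and the parameter values in the example are hypothetical and are not a fit to the data above.

```python
import math


def recovery(t, r0, r_baseline, tau):
    """Exponential recovery model R(t) from the exercise."""
    return r0 + (r_baseline - r0) * (1 - math.exp(-t / tau))


def time_to_reach(target, r0, r_baseline, tau):
    """Invert R(t) = target; only defined for r0 <= target < r_baseline."""
    fraction = (target - r0) / (r_baseline - r0)
    if not 0 <= fraction < 1:
        raise ValueError("target must lie in [r0, r_baseline)")
    return -tau * math.log(1 - fraction)


# Hypothetical parameters, for illustration only.
t_half = time_to_reach(target=25.0, r0=20.0, r_baseline=30.0, tau=10.0)
print(f"days to recover half of the deficit: {t_half:.2f}")
```

The model only approaches baseline asymptotically, so the inversion rejects targets at or above baseline.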

Exercise 26.18 (Decision Analysis) A player has completed rehabilitation and achieved 92% of baseline strength. The team has a crucial Champions League match in 3 days. The head coach wants to include the player. Using a decision-theoretic framework: (a) Define the relevant outcomes and their probabilities (you may use reasonable estimates). (b) Assign costs to each outcome. (c) Calculate the expected cost of playing vs. not playing the player. (d) What additional information would change your recommendation?

Exercise 26.19 (Data Analysis) Analyze the following re-injury data:

| Player | Initial Injury Duration (days) | Days Before Re-injury | Re-injury Duration (days) |
|---|---|---|---|
| A | 14 | 18 | 21 |
| B | 21 | 45 | 28 |
| C | 10 | 12 | 18 |
| D | 28 | 60 | 35 |
| E | 7 | 8 | 14 |
| F | 35 | 90 | 42 |

(a) Calculate the mean ratio of re-injury duration to initial injury duration. (b) Is there a correlation between initial injury duration and time to re-injury? (c) Fit a simple linear regression predicting re-injury duration from initial injury duration. (d) Discuss what these patterns suggest for return-to-play decision-making.

Exercise 26.20 (Programming) Build a return-to-play readiness score that combines multiple metrics into a single composite index. The score should: (a) Accept inputs: CMJ (% baseline), strength (% baseline), wellness score (0-10), training load tolerance (% target), and psychological readiness (0-10). (b) Apply appropriate weights to each component (justify your weights). (c) Return a readiness score (0-100) and a traffic light classification (red/amber/green). (d) Test with at least 5 example player profiles.

Section 26.5: Scheduling and Rotation Strategy

Exercise 26.21 (Optimization) A team has 8 matches in 25 days. They have a squad of 22 outfield players and 3 goalkeepers. Formulate a linear programming model to optimize squad rotation: (a) Define decision variables, objective function, and constraints. (b) Include positional requirements (4 defenders, 3 midfielders, 3 forwards per match in a 4-3-3). (c) Add a constraint that no outfield player starts more than 5 of the 8 matches. (d) Add a constraint that no player starts 3 consecutive matches. (e) Solve the model using Python's scipy.optimize or PuLP.

Exercise 26.22 (Simulation) Simulate a congested fixture period of 6 matches in 18 days: (a) Create a roster of 20 outfield players, each with a quality rating (1-100), a fatigue accumulation rate, and a recovery rate. (b) Simulate three rotation strategies: (i) no rotation (always play the best XI), (ii) systematic rotation (rotate 3-4 players per match), (iii) optimized rotation (using a greedy algorithm). (c) Track cumulative fatigue, match performance, and injury probability for each strategy. (d) Compare the total expected performance and injury outcomes across strategies.

Exercise 26.23 (Case Analysis) During the 2023-24 season, a Premier League club plays Saturday-Tuesday-Saturday for three consecutive weeks. Analyze the following scenario:

  • The squad has 24 fit outfield players.
  • 4 players are considered "indispensable" (they should play every important match).
  • 2 matches are against top-6 opponents (high importance).
  • 4 matches are against lower-half opponents (moderate importance).

Design a rotation plan that: (a) protects the 4 key players from starting all 6 matches, (b) fields the strongest available XI for the 2 big matches, and (c) ensures no player exceeds an ACWR of 1.3. Present your plan as a table.

Exercise 26.24 (Programming) Implement the performance decay model from Section 26.5.3 in Python: (a) Define a function that calculates player quality for a given match based on their match history in the preceding 14 days. (b) Calibrate the fatigue sensitivity parameter $\delta$ using reasonable assumptions. (c) Plot quality decay for a player starting matches on days 1, 4, 7, and 10 versus a player starting only on days 1 and 7. (d) Discuss how this model could be integrated into a rotation optimization framework.

Section 26.6: Long-Term Player Health

Exercise 26.25 (Data Analysis) A club has career load data for 50 players. The following summary statistics are provided:

| Age Group | Mean Career Distance (km) | Mean Injury Rate (per 1,000 hrs) | Mean Recovery Time (days) |
|---|---|---|---|
| <23 | 8,500 | 8.2 | 12 |
| 23-27 | 22,000 | 9.1 | 15 |
| 28-31 | 38,000 | 10.8 | 19 |
| 32+ | 52,000 | 13.4 | 24 |

(a) Calculate the correlation between age group and injury rate, using midpoint ages (state the representative ages you assume for the open-ended <23 and 32+ groups). (b) Calculate the correlation between career km and recovery time. (c) Propose a model that predicts recovery time as a function of age and career load. (d) Discuss the implications for managing veteran players.

Exercise 26.26 (Modeling) Implement the aging capacity model from Section 26.6.4: $$C(a) = C_{\text{peak}} \cdot e^{-\lambda(a - a_{\text{peak}})^2}$$

(a) Plot the capacity curve for $C_{\text{peak}} = 100$, $a_{\text{peak}} = 26$, and $\lambda = 0.005, 0.01, 0.02$. (b) For each value of $\lambda$, calculate the age at which capacity drops to 80% of peak. (c) Discuss what the $\lambda$ parameter represents physiologically. (d) How would you estimate $\lambda$ from real data? What data would you need?
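The capacity model can be evaluated directly, and the age at which capacity falls to a given fraction $f$ of peak follows from solving $e^{-\lambda(a - a_{\text{peak}})^2} = f$, which gives $a = a_{\text{peak}} \pm \sqrt{-\ln f / \lambda}$. The sketch below is a minimal illustration; the function names are hypothetical, and the 50% threshold in the example is chosen only to show the call pattern.

```python
import math


def capacity(age, c_peak=100.0, a_peak=26.0, lam=0.01):
    """Gaussian-shaped capacity curve C(a) from the exercise."""
    return c_peak * math.exp(-lam * (age - a_peak) ** 2)


def ages_at_fraction(fraction, a_peak=26.0, lam=0.01):
    """Return the (younger, older) ages at which C(a) equals fraction * C_peak."""
    if not 0 < fraction <= 1 or lam <= 0:
        raise ValueError("need 0 < fraction <= 1 and lam > 0")
    offset = math.sqrt(-math.log(fraction) / lam)
    return a_peak - offset, a_peak + offset


# Illustrative threshold of 50% of peak with lam = 0.01.
younger, older = ages_at_fraction(0.5, lam=0.01)
print(f"capacity is 50% of peak at ages {younger:.1f} and {older:.1f}")
```

Because the curve is symmetric about the peak age, the model returns two ages; the one above $a_{\text{peak}}$ corresponds to age-related decline.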

Exercise 26.27 (Programming) Create a career load tracker that: (a) Simulates a player's career from age 18 to 35 with seasonal load data. (b) Applies age-dependent load capacity limits. (c) Tracks cumulative load and plots it against a recommended career load trajectory. (d) Flags seasons where annual load exceeds recommended limits. (e) Estimates career load percentile relative to a population of simulated players.

Section 26.7: Integration with Performance Staff

Exercise 26.28 (Design) Design a daily squad status dashboard for a head coach. Specify: (a) The top-level view (what information is visible at a glance). (b) The drill-down views (what information is available on request). (c) The color-coding scheme and threshold definitions. (d) The data update frequency and latency requirements. (e) Sketch a wireframe layout (describe in words or pseudocode).

Exercise 26.29 (Communication) Write a one-page brief for a head coach explaining why a key player should be rested for an upcoming match. Your brief should: (a) Present the relevant load data in a non-technical format. (b) Quantify the injury risk in terms the coach can relate to. (c) Propose an alternative (e.g., start with reduced minutes, substitute role). (d) Acknowledge the competitive cost of resting the player. (e) Avoid jargon and statistical terminology.

Exercise 26.30 (Ethics) A club's data science team has developed an injury risk model that can predict with reasonable accuracy which players are likely to suffer season-ending injuries within the next 12 months. The club's director of football asks the team to provide these predictions for contract negotiation purposes (to avoid extending contracts of high-risk players).

(a) Identify the ethical concerns with this request. (b) Propose guidelines for the appropriate use of injury prediction data. (c) Discuss how player unions and collective bargaining agreements might address this issue. (d) Consider the analogy with genetic testing in employment/insurance contexts.

Exercise 26.31 (Capstone) Design a complete injury prevention analytics program for a newly promoted Premier League club. Your proposal should cover: (a) Data collection infrastructure (hardware, software, processes). (b) Staffing requirements (roles, qualifications, reporting structure). (c) Key metrics and KPIs for the program. (d) A phased implementation plan (Year 1, Year 2, Year 3). (e) Expected costs and projected benefits (injury reduction, financial savings). (f) How you would measure the program's success.

Exercise 26.32 (Research) Write a 1,000-word literature review on one of the following topics: (a) The predictive validity of the ACWR for injury in professional soccer. (b) Machine learning approaches to injury prediction: promises and pitfalls. (c) The role of psychological factors in injury risk and recovery. (d) Sleep and recovery in elite soccer: current evidence and practical recommendations.

Include at least 8 peer-reviewed references.