Chapter 5 Key Takeaways

Core Concepts at a Glance

1. Traditional Statistics Are Necessary but Insufficient

Traditional box-score statistics (goals, assists, pass completion rate, shots, clean sheets) remain widely understood and useful for basic communication. However, they suffer from five systematic limitations:

| Limitation | Example |
| --- | --- |
| Lack of context | Goals do not reflect opponent quality |
| No event weighting | All shots count equally regardless of location |
| Credit assignment | Only the final passer gets an assist |
| Small sample sizes | 80--100 shots per season yield noisy conversion rates |
| Selection bias | Ambitious passers are penalized by completion rates |
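
To make the small-sample limitation concrete, the sketch below computes a Wilson score interval for a conversion rate. The function name `wilson_interval` and the example of 10 goals from 90 shots are illustrative assumptions, not figures from the chapter.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Hypothetical striker: 10 goals from 90 shots in one season.
low, high = wilson_interval(10, 90)
print(f"conversion rate 95% interval: {low:.3f} to {high:.3f}")
```

For this hypothetical player the interval runs from roughly 6% to 19%, which is why one season of shots says little about finishing skill on its own.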

2. Desirable Metric Properties

A well-designed soccer metric should exhibit:

| Property | Definition | Test |
| --- | --- | --- |
| Validity | Measures what it claims to measure | Correlation with observable outcomes |
| Reliability | Consistent under similar conditions | Split-half correlation |
| Discrimination | Separates genuinely different players | ICC > 0.3 |
| Interpretability | Stakeholders understand its meaning | Express in natural units (goals, points) |
| Actionability | Points toward a decision | Tied to recruitment, tactics, or training |

3. Signal-to-Noise Decomposition

$$\text{Observed Value} = \text{True Talent} + \text{Context Effects} + \text{Random Noise}$$

Good metric design maximizes signal (true talent) relative to noise through larger samples, context adjustments, and appropriate normalization.
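
The effect of sample size on signal can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: context effects are set to zero, talent and noise are both Gaussian, and the noise standard deviation of 3.0 is an arbitrary choice. The helper names `pearson` and `simulate_observed` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_observed(true_talent, n_matches, noise_sd, rng):
    """Average n_matches noisy observations around each player's true talent."""
    return [
        sum(t + rng.gauss(0, noise_sd) for _ in range(n_matches)) / n_matches
        for t in true_talent
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
talent = [rng.gauss(0, 1) for _ in range(300)]  # assumed talent distribution

for n_matches in (5, 50):
    observed = simulate_observed(talent, n_matches, noise_sd=3.0, rng=rng)
    print(n_matches, "matches -> correlation with true talent:",
          round(pearson(talent, observed), 2))
```

With more matches per player, the averaged observation tracks true talent more closely, because the random-noise term shrinks while the talent term does not.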

4. Rate vs. Counting Statistics

| Type | Definition | Best For | Watch Out For |
| --- | --- | --- | --- |
| Counting | Accumulates over time (e.g., total goals) | Volume, squad contribution, awards | Penalizes players with less playing time |
| Rate | Normalized by denominator (e.g., goals per 90) | Efficiency, cross-player comparison | Unreliable with small samples (< 900 min) |

Per-90 normalization:

$$\text{Metric per 90} = \frac{\text{Count}}{\text{Minutes}} \times 90$$
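
A minimal sketch of the per-90 formula, paired with the 900-minute sample-size check mentioned in this chapter. The function name `per_90` and the example players are hypothetical.

```python
def per_90(count: float, minutes: float, min_minutes: float = 900.0) -> tuple[float, bool]:
    """Return (rate per 90 minutes, whether the sample meets min_minutes)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rate = count * 90 / minutes
    return rate, minutes >= min_minutes


# Hypothetical players: a regular starter and a substitute with few minutes.
for name, goals, minutes in [("starter", 12, 2160), ("substitute", 3, 270)]:
    rate, reliable = per_90(goals, minutes)
    flag = "" if reliable else " (small sample, treat with caution)"
    print(f"{name}: {rate:.2f} goals per 90{flag}")
```

The substitute's rate is twice the starter's, but it rests on only 270 minutes, so the flag matters more than the number.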

5. Context Adjustments

| Adjustment | Formula / Approach | Purpose |
| --- | --- | --- |
| Opponent | Raw × (League Avg Conceded / Opp Avg Conceded) | Fair comparison across opposition quality |
| Game state | Re-weight to league-average game-state distribution | Remove tactical behavior bias |
| Possession | Offensive: Raw × (50% / Team Poss%) | Normalize for opportunity |
| Venue | Multiply by home/away correction factor | Remove home advantage bias |
| League | Scale by relative league strength estimate | Enable cross-league comparison |
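
The opponent and possession adjustments can be sketched as two small functions. The function names and the example inputs are hypothetical; possession is passed as a fraction (0.40 for 40%), and the defensive variant uses the opponent's share of possession, as in the Key Formulas table.

```python
def opponent_adjusted(raw: float, league_avg_conceded: float, opp_avg_conceded: float) -> float:
    """Scale a raw value by league-average vs. opponent-average amount conceded."""
    if opp_avg_conceded <= 0:
        raise ValueError("opp_avg_conceded must be positive")
    return raw * league_avg_conceded / opp_avg_conceded


def possession_adjusted(raw: float, team_possession: float, offensive: bool = True) -> float:
    """Normalize to a 50% possession share; team_possession is a fraction in (0, 1)."""
    if not 0 < team_possession < 1:
        raise ValueError("team_possession must be strictly between 0 and 1")
    share = team_possession if offensive else 1 - team_possession
    return raw * 0.50 / share


# Hypothetical inputs: 2.0 xG against a defence conceding 1.0 per match
# in a league that averages 1.25 conceded per match.
raw_xg = 2.0
print("raw:", raw_xg, "opponent-adjusted:", opponent_adjusted(raw_xg, 1.25, 1.0))

# Hypothetical inputs: 10 progressive passes for a team with 40% possession.
raw_passes = 10.0
print("raw:", raw_passes, "possession-adjusted:", possession_adjusted(raw_passes, 0.40))
```

Printing the raw value next to the adjusted one keeps the size of each adjustment visible to the reader.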

6. The Three Pillars of Validation

| Pillar | Question | Method | Benchmark |
| --- | --- | --- | --- |
| Stability | Is it consistent over time? | Split-half reliability, Spearman-Brown | r > 0.5 |
| Discrimination | Does it separate players? | Intraclass correlation (ICC) | ICC > 0.3 |
| Predictive power | Does it forecast outcomes? | First-half-to-second-half correlation | R^2 > baseline |
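
The stability check can be sketched as follows. This is one possible implementation, assuming an odd/even split of each player's matches (the chapter summary does not say which split to use); the function names and the per-match values are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half: float) -> float:
    """Step a half-sample correlation up to full-sample reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(values_by_player):
    """Correlate odd-match and even-match averages; return (r_half, r_full)."""
    odd = [mean(v[0::2]) for v in values_by_player]
    even = [mean(v[1::2]) for v in values_by_player]
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical per-match values of one metric for four players.
players = [
    [3, 4, 2, 5, 3, 4],
    [1, 2, 1, 1, 2, 0],
    [5, 6, 4, 7, 5, 6],
    [2, 1, 3, 2, 2, 3],
]
r_half, r_full = split_half_reliability(players)
print(f"r_half = {r_half:.2f}, Spearman-Brown r_full = {r_full:.2f}")
```

The stepped-up value `r_full` is the one to compare against the r > 0.5 benchmark, since each half contains only half of the available matches.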

Stabilization point:

$$n^* = \frac{1 - \text{ICC}}{\text{ICC}}$$
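
A short sketch of the stabilization formula. It assumes the ICC is measured on single observations (for example, one match); under that assumption, the generalized Spearman-Brown formula gives a reliability of 0.5 for an average over $n^*$ observations. The function names and the ICC value of 0.25 are hypothetical.

```python
def stabilization_point(icc: float) -> float:
    """n* = (1 - ICC) / ICC, for a single-observation ICC in (0, 1)."""
    if not 0 < icc < 1:
        raise ValueError("icc must be strictly between 0 and 1")
    return (1 - icc) / icc


def reliability_at(n: float, icc: float) -> float:
    """Generalized Spearman-Brown: reliability of the mean of n observations."""
    return n * icc / (1 + (n - 1) * icc)


icc = 0.25  # hypothetical per-match ICC
n_star = stabilization_point(icc)
print(f"n* = {n_star:.1f} matches")
print(f"reliability at n*: {reliability_at(n_star, icc):.2f}")
```

A lower ICC pushes the stabilization point out quickly: halving the ICC more than doubles the number of observations needed.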

7. Stabilization Reference Table

| Metric | Approx. Matches to Stabilize |
| --- | --- |
| Pass completion % | 6--8 |
| Tackle rate | 8--10 |
| Shot volume | 10--12 |
| xG per shot | 15--20 |
| Goal conversion rate | 35--40+ |
| Save percentage | 30--40+ |

8. Communication Principles

  1. Lead with the question, not the method.
  2. Use natural units (goals, points, wins).
  3. Provide comparisons (league average, positional percentile).
  4. Visualize uncertainty (confidence intervals, ranges).
  5. Tell a story that connects data to decisions.
  6. Build trust incrementally through transparency, track record, and humility.

Key Formulas

| Formula | Expression |
| --- | --- |
| Per-90 rate | $\frac{\text{Count}}{\text{Minutes}} \times 90$ |
| Opponent adjustment | $\text{Raw} \times \frac{\text{League Avg}}{\text{Opp Avg}}$ |
| Possession adj. (offense) | $\text{Raw} \times \frac{0.50}{\text{Team Poss}}$ |
| Possession adj. (defense) | $\text{Raw} \times \frac{0.50}{1 - \text{Team Poss}}$ |
| Spearman-Brown | $r_{\text{full}} = \frac{2 r_{\text{half}}}{1 + r_{\text{half}}}$ |
| ICC | $\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$ |
| Stabilization point | $n^* = \frac{1 - \text{ICC}}{\text{ICC}}$ |
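
The ICC expression is a ratio of variance components, which have to be estimated from data. One common estimator, swapped in here as an assumption rather than taken from the chapter, is the one-way ANOVA form for a balanced design (every player has the same number of observations). The function name `icc_oneway` and the sample data are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA ICC for balanced data: groups[i] holds k observations for player i."""
    n = len(groups)
    k = len(groups[0]) if groups else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 players with the same number (>= 2) of observations")
    grand = mean(x for g in groups for x in g)
    player_means = [mean(g) for g in groups]
    # Mean squares between and within players.
    ms_between = k * sum((m - grand) ** 2 for m in player_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, player_means) for x in g
    ) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        return 0.0  # no variation at all, so nothing to discriminate
    # Negative estimates are truncated to zero.
    return max(0.0, (ms_between - ms_within) / denom)


# Hypothetical per-match values for three players, four matches each.
data = [[3, 4, 2, 5], [1, 2, 1, 0], [5, 6, 4, 7]]
print(f"ICC = {icc_oneway(data):.2f}")
```

Here `(ms_between - ms_within) / k` plays the role of the between-player variance and `ms_within` the within-player variance, so the returned ratio matches the ICC expression in the table.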

Metric Classification Quick Reference

| Type | Question | Examples |
| --- | --- | --- |
| Descriptive | What happened? | Total goals, shot map, pass map |
| Predictive | What will happen? | xG, xA, points projection |
| Prescriptive | What should we do? | Transfer recommendation score |

Common Mistakes to Avoid

  1. Comparing per-90 rates without checking sample size (minimum ~900 minutes).
  2. Using pass completion rate without specifying pass type or difficulty.
  3. Applying context adjustments without reporting raw values alongside.
  4. Treating a single season of goal data as a reliable measure of finishing skill.
  5. Presenting 50 metrics on a dashboard when 5 would suffice.
  6. Reporting metrics to four decimal places when two significant figures are appropriate.
  7. Confusing descriptive findings ("xG was higher than goals") with predictive conclusions ("he will regress").
  8. Ignoring possession context when comparing players across teams with different styles.

Self-Check Questions

Before moving to Chapter 6, make sure you can answer each of the following:

  • [ ] Can I explain why pass completion rate is misleading without additional context?
  • [ ] Can I compute a per-90 rate and explain when to use it vs. a counting statistic?
  • [ ] Can I apply at least two types of context adjustment (e.g., opponent, possession)?
  • [ ] Can I describe the split-half reliability method and interpret the resulting correlation?
  • [ ] Can I calculate a stabilization point given an ICC value?
  • [ ] Can I outline a presentation strategy for a non-technical audience?
  • [ ] Can I list the five desirable properties of a good metric?
  • [ ] Can I distinguish between descriptive, predictive, and prescriptive metrics?