Chapter 5 Key Takeaways
Core Concepts at a Glance
1. Traditional Statistics Are Necessary but Insufficient
Traditional box-score statistics (goals, assists, pass completion rate, shots, clean sheets) remain widely understood and useful for basic communication. However, they suffer from five systematic limitations:
| Limitation | Example |
|---|---|
| Lack of context | Goals do not reflect opponent quality |
| No event weighting | All shots count equally regardless of location |
| Credit assignment | Only the final passer gets an assist |
| Small sample sizes | 80--100 shots per season yield noisy conversion rates |
| Selection bias | Ambitious passers are penalized by completion rates |
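To make the small-sample row concrete, the sketch below puts a rough 95% interval around a single-season conversion rate. It assumes shots are independent with a constant scoring probability and uses the normal approximation to the binomial. The function name `conversion_interval` and the 9-goals-from-90-shots input are illustrative and not taken from the chapter.

```python
import math


def conversion_interval(goals, shots, z=1.96):
    """Return (rate, low, high) for a shot conversion rate.

    Assumption: shots are independent Bernoulli trials and the normal
    approximation to the binomial is adequate (it is rough for small samples).
    """
    if shots <= 0:
        raise ValueError("shots must be positive")
    rate = goals / shots
    std_err = math.sqrt(rate * (1 - rate) / shots)
    low = max(0.0, rate - z * std_err)
    high = min(1.0, rate + z * std_err)
    return rate, low, high


# Illustrative input: 9 goals from 90 shots in one season.
rate, low, high = conversion_interval(9, 90)
print(f"conversion {rate:.3f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 4% to 16%, wide enough that a single season cannot separate an average finisher from a good one.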
2. Desirable Metric Properties
A well-designed soccer metric should exhibit:
| Property | Definition | Test |
|---|---|---|
| Validity | Measures what it claims to measure | Correlation with observable outcomes |
| Reliability | Consistent under similar conditions | Split-half correlation |
| Discrimination | Separates genuinely different players | ICC > 0.3 |
| Interpretability | Stakeholders understand its meaning | Express in natural units (goals, points) |
| Actionability | Points toward a decision | Tied to recruitment, tactics, or training |
3. Signal-to-Noise Decomposition
$$\text{Observed Value} = \text{True Talent} + \text{Context Effects} + \text{Random Noise}$$
Good metric design maximizes signal (true talent) relative to noise through larger samples, context adjustments, and appropriate normalization.
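The decomposition can be illustrated with a small variance calculation. This is a sketch under stated assumptions, not the chapter's own model: the three components are treated as independent, random noise is assumed to average out as 1/n over n matches, and context effects are assumed to persist across the sample (for example, a fixed team style). The function name and the variance values are hypothetical.

```python
def signal_share(var_talent, var_context, var_noise, n_matches):
    """Share of variance in an n-match average that comes from true talent.

    Assumptions (illustration only): components are independent, random
    noise shrinks as 1/n, and context effects do not shrink with n.
    """
    if n_matches <= 0:
        raise ValueError("n_matches must be positive")
    total = var_talent + var_context + var_noise / n_matches
    return var_talent / total


# Hypothetical variance components, chosen only to show the pattern.
for n in (1, 5, 25, 100):
    raw = signal_share(0.04, 0.04, 1.0, n)      # context left in
    adjusted = signal_share(0.04, 0.0, 1.0, n)  # context adjusted away
    print(f"n={n:>3}  raw={raw:.2f}  context-adjusted={adjusted:.2f}")
```

Larger samples raise the signal share in both cases, but only the context adjustment lets it approach 1.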
4. Rate vs. Counting Statistics
| Type | Definition | Best For | Watch Out For |
|---|---|---|---|
| Counting | Accumulates over time (e.g., total goals) | Volume, squad contribution, awards | Penalizes players with less playing time |
| Rate | Normalized by a denominator such as minutes played (e.g., goals per 90) | Efficiency, cross-player comparison | Unreliable with small samples (< 900 min) |
Per-90 normalization:
$$\text{Metric per 90} = \frac{\text{Count}}{\text{Minutes}} \times 90$$
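A minimal sketch of the per-90 formula, paired with the sample-size check from the table above. The function name `per_90`, the `min_minutes` parameter, and the two players are hypothetical; the 900-minute default follows this chapter's rule of thumb and is not a hard statistical cutoff.

```python
def per_90(count, minutes, min_minutes=900):
    """Return (rate_per_90, reliable) for a counting statistic.

    `reliable` is False when the player has fewer than `min_minutes`.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rate = count / minutes * 90
    return rate, minutes >= min_minutes


# Hypothetical players: same goals per 90, very different sample sizes.
starter = per_90(12, 2430)   # 27 full matches
substitute = per_90(2, 405)  # 4.5 full matches
print(starter, substitute)
```

Both players come out at the same rate, but only the starter clears the minutes threshold, which is why rates should be reported together with the minutes behind them.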
5. Context Adjustments
| Adjustment | Formula / Approach | Purpose |
|---|---|---|
| Opponent | Raw × (League Avg Conceded / Opp Avg Conceded) | Fair comparison across opposition quality |
| Game state | Re-weight to league-average game-state distribution | Remove tactical behavior bias |
| Possession | Offensive: Raw × (50% / Team Poss%); defensive: Raw × (50% / (100% − Team Poss%)) | Normalize for opportunity |
| Venue | Multiply by home/away correction factor | Remove home advantage bias |
| League | Scale by relative league strength estimate | Enable cross-league comparison |
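The opponent, possession, and game-state rows above can be sketched as three small functions. All names and example numbers are hypothetical. The game-state function is one possible reading of "re-weight to league-average distribution": it assumes per-state rates are already available for each player.

```python
def opponent_adjust(raw, league_avg_conceded, opp_avg_conceded):
    """Scale output by opponent strength: Raw x (League Avg / Opp Avg)."""
    if opp_avg_conceded <= 0:
        raise ValueError("opp_avg_conceded must be positive")
    return raw * league_avg_conceded / opp_avg_conceded


def possession_adjust(raw, team_possession, defensive=False):
    """Normalize to a 50% possession baseline.

    `team_possession` is a fraction strictly between 0 and 1. Defensive
    actions are scaled by the share of time the team is without the ball.
    """
    if not 0 < team_possession < 1:
        raise ValueError("team_possession must be between 0 and 1")
    share = (1 - team_possession) if defensive else team_possession
    return raw * 0.5 / share


def game_state_adjust(rate_by_state, league_state_share):
    """Weight per-state rates by the league-average game-state mix."""
    return sum(rate_by_state[s] * league_state_share[s] for s in league_state_share)


# Hypothetical inputs.
adj_xg = opponent_adjust(2.0, 1.25, 1.0)             # opponent concedes less than average
adj_passes = possession_adjust(12, 0.60)             # offensive action, 60% possession
adj_tackles = possession_adjust(3, 0.60, defensive=True)
adj_rate = game_state_adjust(
    {"leading": 0.2, "level": 0.4, "trailing": 0.6},
    {"leading": 0.3, "level": 0.4, "trailing": 0.3},
)
print(adj_xg, adj_passes, adj_tackles, adj_rate)
```

Output against a stingy defence is scaled up, offensive volume on a high-possession team is scaled down, and defensive volume on the same team is scaled up.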
6. The Three Pillars of Validation
| Pillar | Question | Method | Benchmark |
|---|---|---|---|
| Stability | Is it consistent over time? | Split-half reliability, Spearman-Brown | r > 0.5 |
| Discrimination | Does it separate players? | Intraclass correlation (ICC) | ICC > 0.3 |
| Predictive power | Does it forecast outcomes? | First-half-to-second-half correlation | $R^2$ > baseline |
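The stability pillar can be sketched as follows. This is a minimal illustration, assuming an odd/even split of each player's matches (one common choice among several) and per-match values stored in match order. The helper names `pearson` and `split_half_reliability` and the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(values_by_player):
    """Return (r_half, r_full) from an odd/even split of each player's matches.

    r_half is the correlation between half-sample means across players;
    r_full applies the Spearman-Brown correction to estimate the
    reliability of the full sample.
    """
    odd_means, even_means = [], []
    for values in values_by_player.values():
        odd, even = values[0::2], values[1::2]
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    r_half = pearson(odd_means, even_means)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Toy per-match values for three hypothetical players.
data = {"A": [1, 2, 3, 2], "B": [4, 3, 4, 5], "C": [6, 7, 8, 5]}
r_half, r_full = split_half_reliability(data)
print(f"r_half={r_half:.3f}, r_full={r_full:.3f}")
```

In practice the correlation would be computed over many players; three are used here only to keep the arithmetic easy to follow.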
Stabilization point (the number of observations at which reliability reaches 0.5):
$$n^* = \frac{1 - \text{ICC}}{\text{ICC}}$$
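One way to see where this expression comes from, assuming ICC here denotes the reliability of a single observation and that the Spearman-Brown relation extends from doubling the sample to averaging $n$ observations:

$$R_n = \frac{n \cdot \text{ICC}}{1 + (n - 1)\,\text{ICC}}$$

Setting $R_n = 0.5$ and solving for $n$:

$$2 n^* \,\text{ICC} = 1 + (n^* - 1)\,\text{ICC} \;\Rightarrow\; n^* \,\text{ICC} = 1 - \text{ICC} \;\Rightarrow\; n^* = \frac{1 - \text{ICC}}{\text{ICC}}$$

As an illustrative check with a made-up value, $\text{ICC} = 0.10$ gives $n^* = 0.90 / 0.10 = 9$ observations. With $n = 2$ the first expression reduces to the Spearman-Brown formula.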
7. Stabilization Reference Table
| Metric | Approx. Matches to Stabilize |
|---|---|
| Pass completion % | 6--8 |
| Tackle rate | 8--10 |
| Shot volume | 10--12 |
| xG per shot | 15--20 |
| Goal conversion rate | 35--40+ |
| Save percentage | 30--40+ |
8. Communication Principles
- Lead with the question, not the method.
- Use natural units (goals, points, wins).
- Provide comparisons (league average, positional percentile).
- Visualize uncertainty (confidence intervals, ranges).
- Tell a story that connects data to decisions.
- Build trust incrementally through transparency, track record, and humility.
Key Formulas
| Formula | Expression |
|---|---|
| Per-90 rate | $\frac{\text{Count}}{\text{Minutes}} \times 90$ |
| Opponent adjustment | $\text{Raw} \times \frac{\text{League Avg}}{\text{Opp Avg}}$ |
| Possession adj. (offense) | $\text{Raw} \times \frac{0.50}{\text{Team Poss}}$ |
| Possession adj. (defense) | $\text{Raw} \times \frac{0.50}{1 - \text{Team Poss}}$ |
| Spearman-Brown | $r_{\text{full}} = \frac{2 r_{\text{half}}}{1 + r_{\text{half}}}$ |
| ICC | $\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$ |
| Stabilization point | $n^* = \frac{1 - \text{ICC}}{\text{ICC}}$ |
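The ICC and stabilization-point rows can be tied together in code. The sketch below estimates the variance components with a one-way ANOVA, which is one standard estimator and not necessarily the one used in the chapter. It assumes balanced data (the same number of observations per player). Function names and data are hypothetical.

```python
def icc_one_way(values_by_player):
    """One-way ANOVA estimate of ICC for balanced data."""
    groups = list(values_by_player.values())
    n_players = len(groups)
    k = len(groups[0]) if groups else 0
    if n_players < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 players, each with the same number (>= 2) of observations")
    grand_mean = sum(sum(g) for g in groups) / (n_players * k)
    means = [sum(g) / k for g in groups]
    # Mean squares between and within players.
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n_players - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (n_players * (k - 1))
    # Variance components; a negative between-player estimate is clamped to zero.
    var_between = max(0.0, (ms_between - ms_within) / k)
    var_within = ms_within
    total = var_between + var_within
    return var_between / total if total > 0 else 0.0


def stabilization_point(icc):
    """n* = (1 - ICC) / ICC, in units of one observation."""
    if not 0 < icc <= 1:
        raise ValueError("icc must be in (0, 1]")
    return (1 - icc) / icc


# Toy data: two observations for each of three hypothetical players.
toy = {"A": [1, 5], "B": [4, 8], "C": [8, 10]}
icc = icc_one_way(toy)
n_star = stabilization_point(icc)
print(icc, n_star)
```

The toy values are chosen so that between-player and within-player variance are equal, giving an ICC of 0.5 and a stabilization point of one observation. Real match data would typically produce a much lower ICC and a correspondingly larger $n^*$.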
Metric Classification Quick Reference
| Type | Question answered | Examples |
|---|---|---|
| Descriptive | What happened? | Total goals, shot map, pass map |
| Predictive | What will happen? | xG, xA, points projection |
| Prescriptive | What should we do? | Transfer recommendation score |
Common Mistakes to Avoid
- Comparing per-90 rates without checking sample size (minimum ~900 minutes).
- Using pass completion rate without specifying pass type or difficulty.
- Applying context adjustments without reporting raw values alongside.
- Treating a single season of goal data as a reliable measure of finishing skill.
- Presenting 50 metrics on a dashboard when 5 would suffice.
- Reporting metrics to four decimal places when two significant figures are appropriate.
- Confusing descriptive findings ("xG was higher than goals") with predictive conclusions ("he will regress").
- Ignoring possession context when comparing players across teams with different styles.
Self-Check Questions
Before moving to Chapter 6, make sure you can answer each of the following:
- [ ] Can I explain why pass completion rate is misleading without additional context?
- [ ] Can I compute a per-90 rate and explain when to use it vs. a counting statistic?
- [ ] Can I apply at least two types of context adjustment (e.g., opponent, possession)?
- [ ] Can I describe the split-half reliability method and interpret the resulting correlation?
- [ ] Can I calculate a stabilization point given an ICC value?
- [ ] Can I outline a presentation strategy for a non-technical audience?
- [ ] Can I list the five desirable properties of a good metric?
- [ ] Can I distinguish between descriptive, predictive, and prescriptive metrics?