Case Study 1: Evaluating a Hot Streak
Introduction
In October 2023, a Premier League striker named Marcus Thompson (fictional) captured headlines with an extraordinary scoring run: 12 goals in 8 consecutive matches. Pundits declared him "unplayable," agents fielded calls from top clubs, and his market value reportedly tripled. The club's analytics department was asked a seemingly simple question: Is Thompson genuinely playing at an elite level, or is this a statistical anomaly?
This case study walks through the statistical reasoning required to evaluate such claims, demonstrating how probability theory, sample size considerations, and regression to the mean inform player evaluation.
The Scenario
Background Data
Marcus Thompson's career statistics before the hot streak:
| Season | Matches | Goals | xG | Goals per 90 |
|---|---|---|---|---|
| 2019-20 | 28 | 8 | 9.2 | 0.26 |
| 2020-21 | 34 | 12 | 13.5 | 0.32 |
| 2021-22 | 32 | 14 | 12.8 | 0.39 |
| 2022-23 | 35 | 11 | 14.1 | 0.28 |
- Career totals before streak: 129 matches, 45 goals, 49.6 xG
- Career goals per 90: 0.31
- Career xG per 90: 0.35
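As a quick arithmetic check, the season rows can be summed directly. The sketch below assumes pandas is available; the column names are illustrative, and the `goals_minus_xG` column is an addition that does not appear in the original table.

```python
import pandas as pd

# Season rows copied from the background table above
seasons = pd.DataFrame({
    "season": ["2019-20", "2020-21", "2021-22", "2022-23"],
    "matches": [28, 34, 32, 35],
    "goals": [8, 12, 14, 11],
    "xG": [9.2, 13.5, 12.8, 14.1],
})

# Goals minus xG per season (added column, not in the original table)
seasons["goals_minus_xG"] = seasons["goals"] - seasons["xG"]

# Career totals across the four seasons
totals = seasons[["matches", "goals", "xG"]].sum()

print(seasons.to_string(index=False))
print(f"Career: {totals['matches']:.0f} matches, "
      f"{totals['goals']:.0f} goals, {totals['xG']:.1f} xG")
print(f"Career goals per xG: {totals['goals'] / totals['xG']:.2f}")
```

Before the streak, Thompson had scored slightly fewer goals than his xG over his career (45 from 49.6), which is useful context for judging the finishing seen during the streak.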
The Hot Streak
| Match | Opposition | Goals | xG | Shot xG |
|---|---|---|---|---|
| 1 | Brighton | 2 | 0.45 | 0.08, 0.37 |
| 2 | West Ham | 1 | 0.28 | 0.28 |
| 3 | Wolves | 2 | 0.72 | 0.15, 0.57 |
| 4 | Newcastle | 1 | 0.89 | 0.89 |
| 5 | Everton | 2 | 0.35 | 0.12, 0.23 |
| 6 | Burnley | 1 | 0.52 | 0.52 |
| 7 | Crystal Palace | 2 | 0.41 | 0.18, 0.23 |
| 8 | Luton | 1 | 0.64 | 0.64 |
- Streak totals: 8 matches, 12 goals, 4.26 xG
- Streak goals per 90: 1.50
- Streak xG per 90: 0.53
The Analysis
Step 1: Quantifying the Overperformance
Thompson scored 12 goals from 4.26 xG, an overperformance of +7.74 goals. That is about 2.8 times his expected total, or 182% above it.
import numpy as np
from scipy import stats
# The data
goals_scored = 12
xG_total = 4.26
overperformance = goals_scored - xG_total
print(f"Goals scored: {goals_scored}")
print(f"Total xG: {xG_total:.2f}")
print(f"Overperformance: {overperformance:.2f} goals")
print(f"Conversion ratio: {goals_scored/xG_total:.1%}")
Key question: How likely is this level of overperformance?
Step 2: Modeling Goal Scoring
Using the Poisson distribution, we can calculate the probability of scoring exactly 12 goals when xG = 4.26:
$$P(X = 12 | \lambda = 4.26) = \frac{e^{-4.26} \times 4.26^{12}}{12!}$$
from scipy.stats import poisson
xG = 4.26
# Probability of exactly 12 goals
p_exactly_12 = poisson.pmf(12, xG)
print(f"P(exactly 12 goals | xG=4.26) = {p_exactly_12:.6f}")
print(f"That's about 1 in {1/p_exactly_12:.0f}")
# Probability of 12 or more goals
p_12_or_more = 1 - poisson.cdf(11, xG)
print(f"P(12+ goals | xG=4.26) = {p_12_or_more:.6f}")
print(f"That's about 1 in {1/p_12_or_more:.0f}")
Output:
P(exactly 12 goals | xG=4.26) = 0.001053
That's about 1 in 950
P(12+ goals | xG=4.26) = 0.001544
That's about 1 in 648
Under the Poisson model this is very unlikely: only about 1 in 650 eight-match spells with this xG would produce 12 or more goals by chance.
Step 3: The Multiple Comparisons Problem
But wait: Thompson isn't the only player we're observing. The Premier League has approximately 500 outfield players. If we observe all of them over 8-match windows throughout a season (roughly 40 overlapping windows), we're making 20,000 comparisons.
# Multiple comparisons
n_players = 500
n_windows = 40
total_comparisons = n_players * n_windows
# Expected number of events as rare as the Step 2 tail probability
expected_extreme_events = total_comparisons * p_12_or_more
print(f"Total 8-match windows observed: {total_comparisons:,}")
print(f"Expected extreme overperformers: {expected_extreme_events:.1f}")
Output:
Total 8-match windows observed: 20,000
Expected extreme overperformers: 30.9
Critical insight: On this rough count, we should expect about 30 player-windows per season to look as improbable as Thompson's streak purely by chance. Treat the figure as an order-of-magnitude guide: overlapping windows are not independent, so one lucky run shows up in several adjacent windows and the number of distinct streaks is smaller.
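The tail probability from Step 2 and the expected count from Step 3 can be cross-checked in a few lines. This is a sketch under the same simplifying assumptions as the text (Poisson-distributed goals, 500 players, 40 windows), plus one more that is labelled in the code: windows are treated as independent, which overlapping windows are not. The seed and the number of simulated draws are arbitrary choices.

```python
import numpy as np
from scipy.stats import poisson

xG = 4.26
n_windows_total = 500 * 40  # players x windows, as assumed in Step 3

# Exact Poisson tail: P(X >= 12) when the mean is 4.26
p_tail = poisson.sf(11, xG)

# Monte Carlo cross-check of the same tail probability
rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
draws = rng.poisson(lam=xG, size=2_000_000)
p_sim = np.mean(draws >= 12)

# Expected count, and the chance of at least one such event, IF the
# windows were independent (an assumption; overlapping windows are not)
expected_events = n_windows_total * p_tail
p_at_least_one = 1 - (1 - p_tail) ** n_windows_total

print(f"Exact tail probability:     {p_tail:.6f}")
print(f"Simulated tail probability: {p_sim:.6f}")
print(f"Expected events per season: {expected_events:.1f}")
print(f"P(at least one event):      {p_at_least_one:.4f}")
```

Using `poisson.sf(11, xG)` rather than `1 - poisson.cdf(11, xG)` gives the same quantity while avoiding loss of precision when the tail is very small.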
Step 4: Bayesian Update
Rather than asking "how unlikely is this streak?", we should ask "what do we now believe about Thompson's true ability?"
Prior belief (before streak):
- Thompson's true scoring rate: approximately 0.31 goals per 90
- Standard deviation of true ability: approximately 0.08 (based on player population)

New evidence:
- 8 matches with 1.50 goals per 90
Bayesian update:
def bayesian_update(prior_mean, prior_sd, observed_mean, observed_n, observation_sd):
    """
    Update beliefs using Bayesian inference.
    Assumes normal distributions for simplicity.
    """
    # Precision is the inverse of variance; more precise sources get more weight
    prior_precision = 1 / (prior_sd ** 2)
    observation_precision = observed_n / (observation_sd ** 2)
    posterior_precision = prior_precision + observation_precision
    posterior_sd = np.sqrt(1 / posterior_precision)
    # Posterior mean is the precision-weighted average of prior and data
    posterior_mean = (
        prior_precision * prior_mean + observation_precision * observed_mean
    ) / posterior_precision
    return posterior_mean, posterior_sd

# Thompson's update
prior_mean = 0.31       # Career rate
prior_sd = 0.08         # Population SD
observed_mean = 1.50    # Streak rate
observed_n = 8          # Matches
observation_sd = 0.8    # Match-to-match variability
posterior_mean, posterior_sd = bayesian_update(
    prior_mean, prior_sd, observed_mean, observed_n, observation_sd
)
print(f"Prior estimate: {prior_mean:.2f} goals/90")
print(f"Observed during streak: {observed_mean:.2f} goals/90")
print(f"Posterior estimate: {posterior_mean:.2f} goals/90")
print(f"95% credible interval: ({posterior_mean - 1.96*posterior_sd:.2f}, "
      f"{posterior_mean + 1.96*posterior_sd:.2f})")
Output:
Prior estimate: 0.31 goals/90
Observed during streak: 1.50 goals/90
Posterior estimate: 0.40 goals/90
95% credible interval: (0.25, 0.55)
Key insight: Even after this remarkable streak, our best estimate of Thompson's true ability increased only from 0.31 to 0.40 goals per 90—a modest improvement, not a transformation into an elite scorer.
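The size of the update follows from how the two sources of information are weighted. Rewriting the posterior mean from `bayesian_update` as a weighted average (the weight symbol $w$ is introduced here for illustration) gives:

$$w = \frac{n/\sigma_{\text{obs}}^2}{1/\sigma_{\text{prior}}^2 + n/\sigma_{\text{obs}}^2} = \frac{8/0.8^2}{1/0.08^2 + 8/0.8^2} = \frac{12.5}{168.75} \approx 0.074$$

$$\mu_{\text{post}} = (1 - w)\,\mu_{\text{prior}} + w\,\bar{x} \approx 0.926 \times 0.31 + 0.074 \times 1.50 \approx 0.40$$

Under these assumed standard deviations the streak receives only about 7% of the weight, which is why the estimate moves so little. A wider prior or a smaller match-to-match SD would give the streak more influence.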
Step 5: Regression to the Mean Prediction
Based on our analysis, what should we expect from Thompson's next 8 matches?
# Expected regression
predicted_next_8 = posterior_mean * 8
print(f"Expected goals in next 8 matches: {predicted_next_8:.1f}")
# Compared to continuing at streak rate
if_streak_continued = 1.50 * 8
print(f"If streak rate continued: {if_streak_continued:.1f}")
# Compared to prior rate
if_prior_rate = 0.31 * 8
print(f"At prior career rate: {if_prior_rate:.1f}")
Output:
Expected goals in next 8 matches: 3.2
If streak rate continued: 12.0
At prior career rate: 2.5
Prediction: Thompson should score approximately 3-4 goals in his next 8 matches, not 12.
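A point forecast hides how much eight matches can vary. The sketch below puts a rough range around the prediction, assuming goals over the next 8 matches follow a Poisson distribution with the rate fixed at the posterior mean. It ignores the uncertainty in that rate and assumes a full 90 minutes per match, so the true spread is somewhat wider. The variable names are illustrative.

```python
from scipy.stats import poisson

posterior_rate = 0.398  # posterior mean from Step 4 (0.40 after rounding)
n_next = 8              # matches in the forecast window
lam = posterior_rate * n_next

# Central 90% range of goals, holding the rate fixed at the posterior mean
low, high = poisson.interval(0.90, lam)

# Chance of repeating a 12-goal (or better) run at this rate
p_repeat = poisson.sf(11, lam)

print(f"Expected goals: {lam:.1f}")
print(f"Central 90% range: {low:.0f} to {high:.0f} goals")
print(f"P(12+ goals again): {p_repeat:.6f}")
```

The range is wide relative to the point estimate, so a return of two goals or of five would both be consistent with the same underlying ability.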
Step 6: Examining the xG Data
A deeper look at Thompson's shot quality during the streak:
import pandas as pd

# One row per recorded shot during the streak (xG values from the match table)
shots = pd.DataFrame({
    'match': [1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8],
    'xG': [0.08, 0.37, 0.28, 0.15, 0.57, 0.89, 0.12, 0.23, 0.52, 0.18, 0.23, 0.64],
    'goal': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})
print("Shot quality analysis:")
print(f"Total shots: {len(shots)}")
print(f"Total xG: {shots['xG'].sum():.2f}")
print(f"Average shot xG: {shots['xG'].mean():.3f}")
print(f"Shots with xG < 0.10: {(shots['xG'] < 0.10).sum()}")
print(f"Shots with xG > 0.50: {(shots['xG'] > 0.50).sum()}")

# Conversion by shot quality: actual conversion vs. mean xG within each band
print("\nConversion by shot quality:")
bands = {
    'Low xG (<0.20)': shots[shots['xG'] < 0.20],
    'Mid xG (0.20-0.50)': shots[(shots['xG'] >= 0.20) & (shots['xG'] < 0.50)],
    'High xG (0.50+)': shots[shots['xG'] >= 0.50],
}
for label, band in bands.items():
    print(f"{label}: {len(band)} shots, "
          f"{band['goal'].mean():.0%} conversion "
          f"(expected: ~{band['xG'].mean():.0%})")
Key observation: Thompson converted 100% of his shots during the streak, including several low-probability chances. This pattern is unsustainable.
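How unlikely is a perfect conversion record? One way to quantify it is to treat each shot as an independent trial whose success probability equals its xG, which gives a Poisson-binomial distribution for total goals. The sketch below builds that distribution by convolution. Independence between shots and well-calibrated xG values are assumptions, and the variable names are illustrative.

```python
import numpy as np

# Shot xG values from the streak table (all 12 recorded shots were scored)
shot_xg = np.array([0.08, 0.37, 0.28, 0.15, 0.57, 0.89,
                    0.12, 0.23, 0.52, 0.18, 0.23, 0.64])

# Distribution of total goals when each shot is an independent Bernoulli
# trial with success probability equal to its xG. Convolving the per-shot
# [miss, goal] distributions builds the result up one shot at a time.
goal_dist = np.array([1.0])
for p in shot_xg:
    goal_dist = np.convolve(goal_dist, [1 - p, p])

p_all_scored = goal_dist[-1]  # P(12 goals from 12 shots)
expected_goals = np.dot(np.arange(len(goal_dist)), goal_dist)

print(f"Expected goals from these shots: {expected_goals:.2f}")
print(f"P(scoring all {len(shot_xg)} shots): {p_all_scored:.2e}")
print(f"Most likely goal count: {goal_dist.argmax()}")
```

Under this shot-level model the probability of scoring every shot is simply the product of the xG values, and it comes out far smaller than the Poisson figure in Step 2. A Poisson count has more variance than a sum of 12 fixed-probability shots, so the Poisson model is the more forgiving of the two.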
What Actually Happened
In the 10 matches after the streak, Thompson scored 4 goals from 5.8 xG, almost exactly the 4.0 goals implied by the posterior rate of 0.40 per 90. His "hot streak" was indeed largely a statistical anomaly.
Post-Streak Performance
| Period | Matches | Goals | xG | Goals/90 |
|---|---|---|---|---|
| Pre-streak career | 129 | 45 | 49.6 | 0.31 |
| Hot streak | 8 | 12 | 4.26 | 1.50 |
| Post-streak | 10 | 4 | 5.8 | 0.36 |
| Rest of season | 20 | 7 | 9.1 | 0.32 |
Thompson finished the season with 23 goals—a career best, but far from the 45+ goals extrapolated from his streak.
Lessons Learned
1. Base Rates Matter
Thompson had 129 matches of evidence suggesting a ~0.31 goals per 90 rate. Eight matches—no matter how spectacular—cannot overturn that evidence.
2. Extreme Events Are Expected
With hundreds of players and dozens of observation windows, extreme streaks will occur every season. The question isn't "did something unusual happen?" but "is this unusual event meaningful?"
3. Look at the Process, Not Just Outcomes
Thompson's xG per 90 during the streak (0.53) was about 50% above his career rate (0.35), while his goals per 90 (1.50) were nearly five times his career rate (0.31). Most of the extra output came from unsustainable conversion, not from a fundamental change in shot quality or volume.
4. Use Bayesian Reasoning
Rather than treating the streak as either "real" or "fake," we can update our beliefs proportionally. The streak does provide some evidence of improved performance, just much less than naive extrapolation suggests.
5. Regression to the Mean Is Not Optional
Extreme performances regress because they are partly luck. This isn't pessimism—it's mathematics.
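Lesson 3 can be made concrete by splitting the rise in goal output into a chance-creation factor and a finishing factor, using only the per-90 figures quoted in this case study. This is a sketch; the factor names are illustrative and not part of the original analysis.

```python
# Per-90 figures quoted in the case study
career_goals_per90, career_xg_per90 = 0.31, 0.35
streak_goals_per90, streak_xg_per90 = 1.50, 0.53

# goals/90 = (xG/90) x (goals per xG), so the increase in output is the
# product of a chance-creation factor and a finishing factor
output_factor = streak_goals_per90 / career_goals_per90
chance_factor = streak_xg_per90 / career_xg_per90
finishing_factor = output_factor / chance_factor

print(f"Goal output multiplied by:     {output_factor:.2f}")
print(f"Chance creation multiplied by: {chance_factor:.2f}")
print(f"Finishing multiplied by:       {finishing_factor:.2f}")
```

The finishing factor is the larger of the two, and finishing is the component most exposed to luck over a small number of shots.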
Extension Exercises
- Sensitivity Analysis: How would your conclusions change if Thompson had the same 12 goals but from 8.0 xG instead of 4.26 xG?
- Alternative Priors: A scout argues Thompson recently changed his running technique and should have a different prior. Model this with prior mean = 0.50 and see how conclusions change.
- Simulation Study: Write code to simulate 10,000 seasons with 500 players and compare the number of extreme streaks per season with the expected count from Step 3.
- Transfer Decision: Thompson's club receives a €60M offer during the streak. Using expected value calculations, evaluate whether to sell.
Code Summary
"""
Complete analysis code for the hot streak case study.
"""
import numpy as np
from scipy.stats import poisson


def analyze_hot_streak(goals, xG, career_rate, n_matches,
                       prior_sd=0.08, obs_sd=0.8):
    """
    Full analysis of a hot streak.

    prior_sd is the population SD of true ability and obs_sd the
    match-to-match variability, both as assumed in Step 4.
    """
    results = {}
    # Basic overperformance
    results['overperformance'] = goals - xG
    results['conversion_ratio'] = goals / xG
    # Probability under Poisson; sf(k - 1) gives P(X >= k)
    results['p_exact'] = poisson.pmf(goals, xG)
    results['p_or_more'] = poisson.sf(goals - 1, xG)
    # Bayesian posterior (normal approximation, as in Step 4)
    prior_precision = 1 / prior_sd ** 2
    obs_precision = n_matches / obs_sd ** 2
    posterior_precision = prior_precision + obs_precision
    observed_rate = goals / n_matches
    results['posterior_mean'] = (
        prior_precision * career_rate + obs_precision * observed_rate
    ) / posterior_precision
    results['posterior_sd'] = np.sqrt(1 / posterior_precision)
    return results


# Example usage: the streak length is passed in directly rather than
# being estimated from xG
results = analyze_hot_streak(
    goals=12,
    xG=4.26,
    career_rate=0.31,
    n_matches=8
)
for key, value in results.items():
    print(f"{key}: {value:.4f}")
Summary
This case study demonstrates the critical importance of statistical reasoning in player evaluation. A remarkable 12-goal streak that appeared to herald the emergence of an elite striker was, in fact, largely a predictable statistical anomaly. By combining probability theory, Bayesian inference, and regression to the mean, we correctly predicted Thompson's subsequent regression while still accounting for the possibility of genuine improvement.
The lesson for soccer analytics is clear: extraordinary claims require extraordinary evidence, and 8 matches—no matter how spectacular—rarely provide it.