Case Study 1: Evaluating a Hot Streak

Introduction

In October 2023, a Premier League striker named Marcus Thompson (fictional) captured headlines with an extraordinary scoring run: 12 goals in 8 consecutive matches. Pundits declared him "unplayable," agents fielded calls from top clubs, and his market value reportedly tripled. The club's analytics department was asked a seemingly simple question: Is Thompson genuinely playing at an elite level, or is this a statistical anomaly?

This case study walks through the statistical reasoning required to evaluate such claims, demonstrating how probability theory, sample size considerations, and regression to the mean inform player evaluation.

The Scenario

Background Data

Marcus Thompson's career statistics before the hot streak:

| Season | Matches | Goals | xG | Goals per 90 |
|---|---|---|---|---|
| 2019-20 | 28 | 8 | 9.2 | 0.26 |
| 2020-21 | 34 | 12 | 13.5 | 0.32 |
| 2021-22 | 32 | 14 | 12.8 | 0.39 |
| 2022-23 | 35 | 11 | 14.1 | 0.28 |

- Career totals before streak: 129 matches, 45 goals, 49.6 xG
- Career goals per 90: 0.31
- Career xG per 90: 0.35
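
As a quick consistency check, the season table can be aggregated in a few lines. The sketch below uses only the figures from the table, and the variable names are illustrative. It also shows that Thompson scored slightly fewer goals than his xG before the streak, which is useful context when judging his finishing later on.

```python
# Season rows copied from the table above: (season, matches, goals, xG)
seasons = [
    ("2019-20", 28, 8, 9.2),
    ("2020-21", 34, 12, 13.5),
    ("2021-22", 32, 14, 12.8),
    ("2022-23", 35, 11, 14.1),
]

total_matches = sum(row[1] for row in seasons)
total_goals = sum(row[2] for row in seasons)
total_xg = sum(row[3] for row in seasons)

print(f"Matches: {total_matches}, goals: {total_goals}, xG: {total_xg:.1f}")
print(f"Goals minus xG: {total_goals - total_xg:+.1f}")
print(f"Goals per xG: {total_goals / total_xg:.2f}")
```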

The Hot Streak

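| Match | Opposition | Goals | xG | Shot xG |
|---|---|---|---|---|
| 1 | Brighton | 2 | 0.45 | 0.08, 0.37 |
| 2 | West Ham | 1 | 0.28 | 0.28 |
| 3 | Wolves | 2 | 0.72 | 0.15, 0.57 |
| 4 | Newcastle | 1 | 0.89 | 0.89 |
| 5 | Everton | 2 | 0.35 | 0.12, 0.23 |
| 6 | Burnley | 1 | 0.52 | 0.52 |
| 7 | Crystal Palace | 2 | 0.41 | 0.18, 0.23 |
| 8 | Luton | 1 | 0.64 | 0.64 |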

- Streak totals: 8 matches, 12 goals, 4.26 xG
- Streak goals per 90: 1.50
- Streak xG per 90: 0.53

The Analysis

Step 1: Quantifying the Overperformance

Thompson scored 12 goals from 4.26 xG, an overperformance of +7.74 goals (about 182% above expectation, or 2.82 times his xG).

import numpy as np
from scipy import stats

# The data
goals_scored = 12
xG_total = 4.26
overperformance = goals_scored - xG_total

print(f"Goals scored: {goals_scored}")
print(f"Total xG: {xG_total:.2f}")
print(f"Overperformance: {overperformance:.2f} goals")
print(f"Conversion ratio: {goals_scored/xG_total:.1%}")

Key question: How likely is this level of overperformance?

Step 2: Modeling Goal Scoring

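Treating Thompson's goal total over the streak as a Poisson random variable with mean equal to his total xG (a common simplification), we can calculate the probability of scoring exactly 12 goals when xG = 4.26: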

$$P(X = 12 | \lambda = 4.26) = \frac{e^{-4.26} \times 4.26^{12}}{12!}$$

from scipy.stats import poisson

xG = 4.26

# Probability of exactly 12 goals
p_exactly_12 = poisson.pmf(12, xG)
print(f"P(exactly 12 goals | xG=4.26) = {p_exactly_12:.6f}")
print(f"That's about 1 in {1/p_exactly_12:.0f}")

# Probability of 12 or more goals
p_12_or_more = 1 - poisson.cdf(11, xG)
print(f"P(12+ goals | xG=4.26) = {p_12_or_more:.6f}")
print(f"That's about 1 in {1/p_12_or_more:.0f}")

Output:

P(exactly 12 goals | xG=4.26) = 0.001053
That's about 1 in 950
P(12+ goals | xG=4.26) = 0.001544
That's about 1 in 648

This is very unlikely: a player who generates 4.26 xG would score 12 or more goals by chance in roughly 1 of every 650 such windows.

Step 3: The Multiple Comparisons Problem

But Thompson isn't the only player we're observing. The Premier League has approximately 500 outfield players. If we observe all of them over 8-match windows throughout a season (roughly 40 overlapping windows), we're making 20,000 comparisons.

# Multiple comparisons
n_players = 500
n_windows = 40
total_comparisons = n_players * n_windows

# Expected number of "1 in 648" events (tail probability from Step 2)
p_streak = 1 / 648
expected_extreme_events = total_comparisons * p_streak

print(f"Total 8-match windows observed: {total_comparisons:,}")
print(f"Expected extreme overperformers: {expected_extreme_events:.1f}")

Output:

Total 8-match windows observed: 20,000
Expected extreme overperformers: 30.9

Critical insight: If every one of those windows carried the same 4.26 xG, we would expect about 31 streaks this extreme per season purely by chance. Overlapping windows are correlated and most players generate far less xG, so the realistic count is lower, but a league-wide search for hot streaks will still turn up chance-driven ones regularly.
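
The same reasoning can be written as a short standalone calculation. The sketch below recomputes the Poisson tail with the standard library (it evaluates the same quantity as `1 - poisson.cdf(11, xG)` in Step 2). It assumes, unrealistically, that every player-window is an independent draw with the same 4.26 xG. Under that assumption it reports the expected number of such streaks and the chance of seeing at least one. The helper name `poisson_tail` is ours.

```python
import math

def poisson_tail(k_min, lam):
    """P(X >= k_min) for X ~ Poisson(lam), via the complement of the lower sum."""
    lower = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_min))
    return 1.0 - lower

p_streak = poisson_tail(12, 4.26)      # chance of 12+ goals from 4.26 xG
n_windows_total = 500 * 40             # players x windows, as in the text

expected_count = n_windows_total * p_streak
# Chance that at least one window is this extreme (independence assumed)
p_at_least_one = 1.0 - (1.0 - p_streak) ** n_windows_total

print(f"Tail probability per window: {p_streak:.6f}")
print(f"Expected extreme windows per season: {expected_count:.1f}")
print(f"P(at least one): {p_at_least_one:.4f}")
```

Because overlapping windows share matches and few players generate this much xG, the count is best read as an upper-end illustration rather than a forecast.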

Step 4: Bayesian Update

Rather than asking "how unlikely is this streak?", we should ask "what do we now believe about Thompson's true ability?"

Prior belief (before streak):
- Thompson's true scoring rate: approximately 0.31 goals per 90
- Standard deviation of true ability: approximately 0.08 (based on player population)

New evidence:
- 8 matches with 1.50 goals per 90

Bayesian update:

def bayesian_update(prior_mean, prior_sd, observed_mean, observed_n, observation_sd):
    """
    Update beliefs using Bayesian inference.

    Assumes normal distributions for simplicity.
    """
    prior_precision = 1 / (prior_sd ** 2)
    observation_precision = observed_n / (observation_sd ** 2)

    posterior_precision = prior_precision + observation_precision
    posterior_sd = np.sqrt(1 / posterior_precision)

    posterior_mean = (
        prior_precision * prior_mean + observation_precision * observed_mean
    ) / posterior_precision

    return posterior_mean, posterior_sd

# Thompson's update
prior_mean = 0.31  # Career rate
prior_sd = 0.08    # Population SD
observed_mean = 1.50  # Streak rate
observed_n = 8     # Matches
observation_sd = 0.8  # Match-to-match variability

posterior_mean, posterior_sd = bayesian_update(
    prior_mean, prior_sd, observed_mean, observed_n, observation_sd
)

print(f"Prior estimate: {prior_mean:.2f} goals/90")
print(f"Observed during streak: {observed_mean:.2f} goals/90")
print(f"Posterior estimate: {posterior_mean:.2f} goals/90")
print(f"95% credible interval: ({posterior_mean - 1.96*posterior_sd:.2f}, "
      f"{posterior_mean + 1.96*posterior_sd:.2f})")

Output:

Prior estimate: 0.31 goals/90
Observed during streak: 1.50 goals/90
Posterior estimate: 0.40 goals/90
95% credible interval: (0.25, 0.55)

Key insight: Even after this remarkable streak, our best estimate of Thompson's true ability increased only from 0.31 to 0.40 goals per 90—a modest improvement, not a transformation into an elite scorer.
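
One way to see why the estimate moves so little is to rewrite the posterior mean as a weighted average of the prior and the observed rate. The sketch below reuses the inputs above and the same normal approximation as `bayesian_update`. The names `weight_on_streak` and `matches_for_equal_weight` are illustrative.

```python
prior_mean, prior_sd = 0.31, 0.08        # career rate and population SD
observed_mean, observed_n = 1.50, 8      # streak rate and matches
observation_sd = 0.8                     # assumed match-to-match SD

prior_precision = 1 / prior_sd**2
observation_precision = observed_n / observation_sd**2

# Share of the posterior mean that comes from the streak
weight_on_streak = observation_precision / (prior_precision + observation_precision)
posterior_mean = (1 - weight_on_streak) * prior_mean + weight_on_streak * observed_mean

# Matches needed before the data carry as much weight as the prior
matches_for_equal_weight = (observation_sd / prior_sd) ** 2

print(f"Weight on streak: {weight_on_streak:.3f}")
print(f"Posterior mean: {posterior_mean:.2f} goals/90")
print(f"Matches for equal weight: {matches_for_equal_weight:.0f}")
```

Under these assumptions the streak carries roughly 7% of the weight, and it would take about 100 matches of data to match the influence of the prior.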

Step 5: Regression to the Mean Prediction

Based on our analysis, what should we expect from Thompson's next 8 matches?

# Expected regression
predicted_next_8 = posterior_mean * 8
print(f"Expected goals in next 8 matches: {predicted_next_8:.1f}")

# Compared to continuing at streak rate
if_streak_continued = 1.50 * 8
print(f"If streak rate continued: {if_streak_continued:.1f}")

# Compared to prior rate
if_prior_rate = 0.31 * 8
print(f"At prior career rate: {if_prior_rate:.1f}")

Output:

Expected goals in next 8 matches: 3.2
If streak rate continued: 12.0
At prior career rate: 2.5

Prediction: Thompson should score approximately 3-4 goals in his next 8 matches, not 12.
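
A point forecast hides the spread around it. As a rough sketch, assume goals over the next 8 matches are Poisson with mean equal to the posterior rate times 8. This ignores the uncertainty in that rate, which would widen the range somewhat. Under that assumption the distribution of outcomes can be tabulated as follows. The function name `poisson_pmf` is ours.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson mean of lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

posterior_rate = 0.40                 # goals per 90, from Step 4
lam_next_8 = posterior_rate * 8       # expected goals over the next 8 matches

cumulative = 0.0
for goals in range(0, 9):
    p = poisson_pmf(goals, lam_next_8)
    cumulative += p
    print(f"{goals} goals: {p:.3f} (cumulative {cumulative:.3f})")

# Chance of repeating the streak total under this forecast
p_repeat = 1.0 - sum(poisson_pmf(k, lam_next_8) for k in range(12))
print(f"P(12+ goals again): {p_repeat:.6f}")
```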

Step 6: Examining the xG Data

A deeper look at Thompson's shot quality during the streak:

import pandas as pd

shots = pd.DataFrame({
    'match': [1,1,2,3,3,4,5,5,6,7,7,8],
    'xG': [0.08, 0.37, 0.28, 0.15, 0.57, 0.89, 0.12, 0.23, 0.52, 0.18, 0.23, 0.64],
    'goal': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})

print("Shot quality analysis:")
print(f"Total shots: {len(shots)}")
print(f"Total xG: {shots['xG'].sum():.2f}")
print(f"Average shot xG: {shots['xG'].mean():.3f}")
print(f"Shots with xG < 0.10: {(shots['xG'] < 0.10).sum()}")
print(f"Shots with xG > 0.50: {(shots['xG'] > 0.50).sum()}")

# Conversion by shot quality: actual conversion vs. mean xG in each band
print("\nConversion by shot quality:")
bands = {
    "Low xG (<0.20)": shots[shots['xG'] < 0.20],
    "Mid xG (0.20-0.50)": shots[(shots['xG'] >= 0.20) & (shots['xG'] < 0.50)],
    "High xG (0.50+)": shots[shots['xG'] >= 0.50],
}

for label, band in bands.items():
    print(f"{label}: {len(band)} shots, "
          f"{band['goal'].mean():.0%} conversion "
          f"(expected: ~{band['xG'].mean():.0%})")

Key observation: Thompson converted 100% of his shots during the streak, including several low-probability chances. This pattern is unsustainable.
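
The Poisson model in Step 2 treats goals as an open-ended count, but with shot-level data the chance of scoring from every one of these 12 shots can be computed directly. The sketch below assumes the shots are independent and that each shot's xG is its true scoring probability. Both are simplifications.

```python
import math

# Per-shot xG values from the streak table
shot_xg = [0.08, 0.37, 0.28, 0.15, 0.57, 0.89, 0.12, 0.23, 0.52, 0.18, 0.23, 0.64]

# Probability that every shot is scored, assuming independent shots
p_all_scored = math.prod(shot_xg)

print(f"Shots: {len(shot_xg)}, total xG: {sum(shot_xg):.2f}")
print(f"P(all {len(shot_xg)} scored): {p_all_scored:.2e}")
print(f"Roughly 1 in {1 / p_all_scored:,.0f}")
```

The figure comes out far smaller than the Poisson tail in Step 2, since a sum of 12 yes/no shots is less variable than a Poisson count with the same mean. Seen at shot level, the perfect conversion looks even less repeatable.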

What Actually Happened

In the 10 matches after the streak, Thompson scored 4 goals from 5.8 xG, almost exactly the 4.0 goals implied by the posterior rate of 0.40 per 90. His "hot streak" was indeed largely a statistical anomaly.

Post-Streak Performance

| Period | Matches | Goals | xG | Goals/90 |
|---|---|---|---|---|
| Pre-streak career | 129 | 45 | 49.6 | 0.31 |
| Hot streak | 8 | 12 | 4.26 | 1.50 |
| Post-streak | 10 | 4 | 5.8 | 0.36 |
| Rest of season | 20 | 7 | 9.1 | 0.32 |

Thompson finished the season with 23 goals—a career best, but far from the 45+ goals extrapolated from his streak.

Lessons Learned

1. Base Rates Matter

Thompson had 129 matches of evidence suggesting a ~0.31 goals per 90 rate. Eight matches—no matter how spectacular—cannot overturn that evidence.

2. Extreme Events Are Expected

With hundreds of players and dozens of observation windows, extreme streaks will occur every season. The question isn't "did something unusual happen?" but "is this unusual event meaningful?"

3. Look at the Process, Not Just Outcomes

Thompson's xG per 90 during the streak (0.53) was about 50% above his career rate (0.35), while his goals per 90 rose almost fivefold (0.31 to 1.50). Most of the extreme goal output therefore came from unsustainable conversion, not from a fundamental change in shot quality or volume.
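
The split between chance creation and finishing can be made explicit, since goals per 90 equals xG per 90 multiplied by goals per xG. The sketch below uses the per-90 figures quoted in this case study. The variable names are illustrative.

```python
# Per-90 figures quoted in the case study
career_goals_p90, career_xg_p90 = 0.31, 0.35
streak_goals_p90, streak_xg_p90 = 1.50, 0.53

# goals/90 = (xG/90) * (goals per xG), so the ratios multiply
chance_ratio = streak_xg_p90 / career_xg_p90
finishing_ratio = (streak_goals_p90 / streak_xg_p90) / (career_goals_p90 / career_xg_p90)
total_ratio = streak_goals_p90 / career_goals_p90

print(f"Chance creation (xG/90): x{chance_ratio:.2f}")
print(f"Finishing (goals per xG): x{finishing_ratio:.2f}")
print(f"Goals/90 overall: x{total_ratio:.2f}")
```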

4. Use Bayesian Reasoning

Rather than treating the streak as either "real" or "fake," we can update our beliefs proportionally. The streak does provide some evidence of improved performance, just much less than naive extrapolation suggests.

5. Regression to the Mean Is Not Optional

Extreme performances regress because they are partly luck. This isn't pessimism—it's mathematics.
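
A small simulation illustrates the point. The sketch below assumes a hypothetical population of players whose true rates are drawn from a normal distribution with mean 0.31 and SD 0.08 (the prior from Step 4), with goals in each 8-match window drawn from a Poisson distribution. The sampler, seed, and population size are our choices, not part of the original analysis.

```python
import math
import random

random.seed(42)

def sample_poisson(lam):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, random.random()
    while product > threshold:
        count += 1
        product *= random.random()
    return count

n_players, n_matches = 5000, 8
records = []
for _ in range(n_players):
    true_rate = max(0.01, random.gauss(0.31, 0.08))   # goals per match
    first = sample_poisson(true_rate * n_matches)     # first 8-match window
    second = sample_poisson(true_rate * n_matches)    # next 8-match window
    records.append((first, second))

# Select the hottest 1% of first windows and see what they did next
records.sort(key=lambda r: r[0], reverse=True)
hot = records[: n_players // 100]
hot_first = sum(r[0] for r in hot) / len(hot)
hot_second = sum(r[1] for r in hot) / len(hot)

print(f"Hot group, first window: {hot_first:.2f} goals")
print(f"Hot group, next window:  {hot_second:.2f} goals")
```

Nothing about the simulated players changes between windows, so any drop in the hot group's output is purely a selection effect.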

Extension Exercises

  1. Sensitivity Analysis: How would your conclusions change if Thompson had the same 12 goals but from 8.0 xG instead of 4.26 xG?

  2. Alternative Priors: A scout argues Thompson recently changed his running technique and should have a different prior. Model this with prior mean = 0.50 and see how conclusions change.

  3. Simulation Study: Write code to simulate 10,000 seasons with 500 players, count how many extreme streaks occur per season on average, and compare the result with the estimate from Step 3.

  4. Transfer Decision: Thompson's club receives a €60M offer during the streak. Using expected value calculations, evaluate whether to sell.
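
As a possible starting point for the first exercise, the tail probability from Step 2 can be recomputed across a range of xG totals. This is only a sketch under the same Poisson assumption. The helper `poisson_tail` and the grid of xG values are ours.

```python
import math

def poisson_tail(k_min, lam):
    """P(X >= k_min) for X ~ Poisson(lam)."""
    lower = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_min))
    return 1.0 - lower

goals = 12
tails = {xg: poisson_tail(goals, xg) for xg in (4.26, 6.0, 8.0, 10.0)}
for xg_total, p in tails.items():
    print(f"xG = {xg_total:5.2f}: P({goals}+ goals) = {p:.4f}")
```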

Code Summary

"""
Complete analysis code for the hot streak case study.
"""

import numpy as np
from scipy.stats import poisson


def analyze_hot_streak(goals, xG, n_matches, career_rate,
                       prior_sd=0.08, observation_sd=0.8):
    """
    Full analysis of a hot streak.

    goals, xG      -- totals over the streak
    n_matches      -- number of matches in the streak
    career_rate    -- prior goals per 90
    prior_sd       -- SD of true ability across players
    observation_sd -- match-to-match SD of goals
    """
    results = {}

    # Basic overperformance
    results['overperformance'] = goals - xG
    results['conversion_ratio'] = goals / xG

    # Probability under Poisson; sf(k - 1) gives P(X >= k)
    results['p_exact'] = poisson.pmf(goals, xG)
    results['p_or_more'] = poisson.sf(goals - 1, xG)

    # Bayesian posterior (normal approximation, as in Step 4)
    prior_precision = 1 / prior_sd ** 2
    obs_precision = n_matches / observation_sd ** 2
    posterior_precision = prior_precision + obs_precision
    observed_rate = goals / n_matches

    results['posterior_mean'] = (
        prior_precision * career_rate + obs_precision * observed_rate
    ) / posterior_precision
    results['posterior_sd'] = np.sqrt(1 / posterior_precision)

    return results


# Example usage
results = analyze_hot_streak(
    goals=12,
    xG=4.26,
    n_matches=8,
    career_rate=0.31,
)

for key, value in results.items():
    print(f"{key}: {value:.4f}")

Summary

This case study demonstrates the critical importance of statistical reasoning in player evaluation. A remarkable 12-goal streak that appeared to herald the emergence of an elite striker was, in fact, largely a predictable statistical anomaly. By combining probability theory, Bayesian inference, and regression to the mean, we correctly predicted Thompson's subsequent regression while still accounting for the possibility of genuine improvement.

The lesson for soccer analytics is clear: extraordinary claims require extraordinary evidence, and 8 matches—no matter how spectacular—rarely provide it.