# Chapter 5: Statistical Foundations for Football Analytics

> "Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital." — Aaron Levenstein
## Learning Objectives
By the end of this chapter, you will be able to:
- Apply probability concepts to football decision-making
- Understand and interpret Expected Points Added (EPA)
- Calculate and use Win Probability models
- Conduct hypothesis testing for football questions
- Recognize and avoid common statistical pitfalls
- Apply regression techniques to football problems
## 5.1 Probability Foundations

### Why Probability Matters in Football
Football is fundamentally a game of probabilities. Every decision—whether to go for it on fourth down, when to throw deep, how to allocate the salary cap—involves weighing uncertain outcomes.
**Key Probability Concepts:**

```python
import numpy as np
from scipy import stats

# Basic probability: a kicker makes 85% of field goals
fg_probability = 0.85

# Expected value: what are the expected points from a 40-yard attempt?
# Assume: make = 3 points, miss = 0 points (simplified)
expected_points = fg_probability * 3 + (1 - fg_probability) * 0
print(f"Expected points: {expected_points:.2f}")  # 2.55
```
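The same expected-value arithmetic supports decision comparisons like the fourth-down calls mentioned above. A minimal sketch in which every probability and point value is an assumption for illustration, not a measured rate:

```python
# Hypothetical 4th-and-2 at the opponent's 25-yard line: kick or go for it?
# All numbers below are illustrative assumptions.
p_fg = 0.85                 # assumed make rate from ~43 yards
ep_kick = p_fg * 3          # ignores the field-position cost of a miss

p_convert = 0.55            # assumed 4th-and-2 conversion rate
ep_if_converted = 4.0       # rough EP of a fresh set of downs inside the 25
ep_go = p_convert * ep_if_converted

print(f"Kick: {ep_kick:.2f} EP, Go: {ep_go:.2f} EP")  # Kick: 2.55 EP, Go: 2.20 EP
```

Under these assumptions the kick edges out going for it, but shifting the assumed conversion rate by a few points flips the answer, which is why estimating these inputs well matters.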
### Conditional Probability
The probability of event A given that event B has occurred:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
**Football Example: Third Down Conversion**

```python
def conditional_probability_example(pbp):
    """
    Calculate P(First Down | Third and Short): the probability of
    converting given that we are in a third-and-short situation.
    """
    total_plays = len(pbp)

    # P(Third and Short)
    third_short = len(pbp.query("down == 3 and ydstogo <= 3"))
    p_third_short = third_short / total_plays

    # P(First Down AND Third and Short)
    first_down_short = len(
        pbp.query("down == 3 and ydstogo <= 3 and first_down == 1")
    )
    p_first_and_short = first_down_short / total_plays

    # P(First Down | Third and Short) = P(both) / P(Third and Short)
    return p_first_and_short / p_third_short
```
### Bayes' Theorem
Bayes' theorem allows us to update probabilities as new information arrives:
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
**Application: Updating Win Probability**

At kickoff, each team has approximately 50% win probability (adjusted slightly for home advantage). As the game progresses, we update this probability based on:

- Score differential
- Time remaining
- Field position
- Down and distance
- Timeouts remaining
```python
def bayes_win_probability_update():
    """
    Conceptual example of Bayesian updating in football.

    Prior: pre-game win probability = 50%
    Evidence: team scores the opening touchdown
    Posterior: updated win probability ≈ 58%
    """
    prior_win_prob = 0.50
    p_score_first_given_win = 0.58   # Teams that win score first more often
    p_score_first_given_lose = 0.42  # Complement among losing teams
    p_score_first = 0.50             # Either team equally likely to score first

    # Bayes: P(Win | Score First) = P(Score First | Win) * P(Win) / P(Score First)
    posterior = (p_score_first_given_win * prior_win_prob) / p_score_first
    return posterior  # 0.58
```
## 5.2 Expected Points Added (EPA)

### The Foundation Metric

Expected Points Added (EPA) is the foundational metric in modern football analytics. It measures how much each play changes a team's expected point outcome.

**The Concept:**

Every down-and-distance situation has an expected point value. For example:

- 1st and 10 at own 25 ≈ 1.0 expected points
- 1st and 10 at opponent's 25 ≈ 4.0 expected points
- 1st and Goal at the 1 ≈ 6.3 expected points

EPA measures the change: `EPA = EP_after - EP_before`
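Using the illustrative expected-point values above, a 50-yard completion from a team's own 25 to the opponent's 25 is worth roughly:

```python
# Illustrative EP values from the examples above
ep_before = 1.0   # 1st and 10 at own 25
ep_after = 4.0    # 1st and 10 at opponent's 25
epa = ep_after - ep_before
print(f"EPA: {epa:.1f}")  # EPA: 3.0
```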
### Building an Expected Points Model

```python
class ExpectedPointsModel:
    """
    Simplified Expected Points model.

    The real nflfastR model uses:
    - Multinomial logistic regression
    - 7 outcome categories (TD, FG, safety, etc. for each team)
    - Game state variables (down, distance, field position, etc.)
    """

    def __init__(self):
        self.point_values = {
            'touchdown': 7,    # Including extra point
            'field_goal': 3,
            'safety': 2,
            'no_score': 0
        }

    def calculate_expected_points(self, yardline_100, down, ydstogo):
        """
        Calculate expected points for a situation.

        Parameters
        ----------
        yardline_100 : int
            Yards from opponent's goal (1-99)
        down : int
            Current down (1-4)
        ydstogo : int
            Yards needed for first down

        Returns
        -------
        float
            Expected points value
        """
        # Simplified model using historical averages;
        # real models use regression with many features.

        # Base EP by field position (simplified linear approximation)
        base_ep = (100 - yardline_100) * 0.06 - 1.5

        # Down adjustment
        down_adjustment = {1: 0.3, 2: 0.0, 3: -0.3, 4: -0.6}
        base_ep += down_adjustment.get(down, 0)

        # Distance adjustment (longer = harder)
        if ydstogo > 10:
            base_ep -= 0.3
        elif ydstogo <= 3:
            base_ep += 0.2

        return base_ep

    def calculate_epa(self, play_data):
        """
        Calculate EPA for a play.

        Parameters
        ----------
        play_data : dict
            Contains before/after state

        Returns
        -------
        float
            Expected Points Added
        """
        ep_before = self.calculate_expected_points(
            play_data['yardline_100_before'],
            play_data['down_before'],
            play_data['ydstogo_before']
        )

        # Handle scoring plays and turnovers
        if play_data.get('touchdown'):
            ep_after = 7
        elif play_data.get('turnover'):
            # Opponent takes over: negate their EP from the flipped field position
            ep_after = -self.calculate_expected_points(
                100 - play_data['yardline_100_after'],
                1,
                10
            )
        else:
            ep_after = self.calculate_expected_points(
                play_data['yardline_100_after'],
                play_data['down_after'],
                play_data['ydstogo_after']
            )

        return ep_after - ep_before
```
### Interpreting EPA
| EPA Value | Interpretation |
|---|---|
| +0.5 | Good play, above average |
| +1.0 | Very good play |
| +2.0 | Excellent play (long gain, TD) |
| 0.0 | Average play |
| -0.5 | Below average play |
| -2.0 | Bad play (turnover, sack) |
| -4.0 | Disaster (pick-six territory) |
```python
def interpret_epa(epa):
    """Provide a qualitative interpretation of an EPA value."""
    if epa >= 2.0:
        return "Excellent play"
    elif epa >= 1.0:
        return "Very good play"
    elif epa >= 0.5:
        return "Good play"
    elif epa >= -0.5:
        return "Average play"
    elif epa >= -1.0:
        return "Below average"
    elif epa >= -2.0:
        return "Bad play"
    else:
        return "Disaster"
```
### EPA Applications

**1. Player Evaluation**
```python
def evaluate_quarterback(pbp, min_dropbacks=200):
    """Evaluate QBs using EPA."""
    qb_stats = (
        pbp
        .query("pass == 1")
        .groupby('passer_player_name')
        .agg(
            dropbacks=('pass', 'count'),
            total_epa=('epa', 'sum'),
            epa_per_play=('epa', 'mean'),
            success_rate=('success', 'mean')
        )
        .query("dropbacks >= @min_dropbacks")
        .sort_values('epa_per_play', ascending=False)
    )
    return qb_stats
```
**2. Play-Calling Analysis**
```python
def analyze_play_calling(pbp, team):
    """Analyze play-calling efficiency by down."""
    team_plays = pbp.query("posteam == @team")
    analysis = (
        team_plays
        .groupby(['down', 'play_type'])
        .agg(
            plays=('play_id', 'count'),
            epa=('epa', 'mean')
        )
        .reset_index()
        .pivot(index='down', columns='play_type', values='epa')
    )
    return analysis
```
## 5.3 Win Probability

### Understanding Win Probability

Win Probability (WP) estimates the likelihood of winning at any point during a game, based on the current game state.

**Factors in Win Probability:**

- Score differential
- Time remaining
- Down and distance
- Field position
- Timeouts (for each team)
- Whether team has the ball
### Win Probability Model

```python
class WinProbabilityModel:
    """
    Simplified Win Probability model.

    Real models use logistic regression on historical game data.
    """

    def calculate_wp(
        self,
        score_diff: int,
        seconds_remaining: int,
        has_ball: bool = True,
        yardline_100: int = 75,
        down: int = 1,
        ydstogo: int = 10,
        home: bool = True
    ) -> float:
        """
        Calculate win probability.

        Parameters
        ----------
        score_diff : int
            Score differential (positive = winning)
        seconds_remaining : int
            Seconds left in the game
        has_ball : bool
            Whether the team has possession
        yardline_100 : int
            Yards from opponent's goal
        down : int
            Current down (unused in this simplified version)
        ydstogo : int
            Yards to first down (unused in this simplified version)
        home : bool
            Whether team is home

        Returns
        -------
        float
            Win probability (0-1)
        """
        # Simplified model; real models fit logistic regression
        # to historical game outcomes.

        time_factor = seconds_remaining / 3600  # Fraction of game remaining

        # Score adjustment: each point is worth roughly 3% of win
        # probability at kickoff, and a given lead matters more as
        # time runs out (factor grows from 1x at kickoff to 2x at the end).
        score_impact = score_diff * 0.03 * (2 - time_factor)

        # Home advantage
        home_advantage = 0.03 if home else -0.03

        # Possession bonus (larger closer to the opponent's goal)
        if has_ball:
            possession_bonus = 0.05 * (1 - yardline_100 / 100) * time_factor
        else:
            possession_bonus = 0

        # Combine factors
        base_wp = 0.5 + score_impact + home_advantage + possession_bonus

        # Clamp to valid range
        return max(0.01, min(0.99, base_wp))

    def calculate_wpa(self, wp_before: float, wp_after: float) -> float:
        """
        Calculate Win Probability Added: the change in win
        probability from before a play to after it.
        """
        return wp_after - wp_before
```
### Win Probability Added (WPA)

WPA measures how much each play changed the team's chances of winning:
```python
def calculate_game_wpa(pbp, game_id, team):
    """
    Calculate cumulative WPA through a game.

    Parameters
    ----------
    pbp : pd.DataFrame
        Play-by-play data
    game_id : str
        Game identifier
    team : str
        Team to calculate WPA for

    Returns
    -------
    pd.DataFrame
        Play-by-play with cumulative WPA
    """
    game = pbp.query("game_id == @game_id").copy()

    # WPA from `team`'s perspective (flip the sign on opponent possessions)
    game['team_wpa'] = game.apply(
        lambda x: x['wpa'] if x['posteam'] == team else -x['wpa'],
        axis=1
    )

    # Cumulative WPA
    game['cumulative_wpa'] = game['team_wpa'].cumsum()

    return game[['play_id', 'desc', 'wpa', 'team_wpa', 'cumulative_wpa']]
```
### Clutch Performance

WPA identifies clutch performers—players who contribute most when the stakes are highest:
```python
def identify_clutch_performers(pbp, position='QB', min_plays=50):
    """
    Find players with the highest total WPA.

    Parameters
    ----------
    pbp : pd.DataFrame
        Play-by-play data
    position : str
        Position to analyze ('QB', 'RB', or receiver otherwise)
    min_plays : int
        Minimum plays for inclusion

    Returns
    -------
    pd.DataFrame
        Players ranked by total WPA
    """
    if position == 'QB':
        player_col = 'passer_player_name'
        filter_expr = 'pass == 1'
    elif position == 'RB':
        player_col = 'rusher_player_name'
        filter_expr = 'rush == 1'
    else:
        player_col = 'receiver_player_name'
        filter_expr = 'pass == 1'

    clutch_stats = (
        pbp
        .query(filter_expr)
        .groupby(player_col)
        .agg(
            plays=('wpa', 'count'),
            total_wpa=('wpa', 'sum'),
            avg_wpa=('wpa', 'mean')
        )
        .query("plays >= @min_plays")
        .sort_values('total_wpa', ascending=False)
    )

    return clutch_stats
```
## 5.4 Statistical Inference

### Hypothesis Testing in Football

Hypothesis testing helps us determine whether observed differences are statistically significant or could be due to chance.

**Common Football Questions:**

- Is home field advantage real?
- Does this QB perform differently in cold weather?
- Is the difference between two receivers' catch rates meaningful?
### The t-Test
```python
import numpy as np
from scipy import stats

def test_home_field_advantage(pbp):
    """
    Test whether home teams have a significant EPA advantage.

    H0: No difference between home and away EPA
    H1: Home teams have higher EPA
    """
    # Separate home and away plays
    home_epa = pbp[pbp['posteam'] == pbp['home_team']]['epa'].dropna()
    away_epa = pbp[pbp['posteam'] == pbp['away_team']]['epa'].dropna()

    # Welch's t-test (does not assume equal variances)
    t_stat, p_value = stats.ttest_ind(home_epa, away_epa, equal_var=False)

    # Effect size (Cohen's d)
    pooled_std = np.sqrt((home_epa.std()**2 + away_epa.std()**2) / 2)
    cohens_d = (home_epa.mean() - away_epa.mean()) / pooled_std

    print(f"Home EPA: {home_epa.mean():.4f}")
    print(f"Away EPA: {away_epa.mean():.4f}")
    print(f"Difference: {home_epa.mean() - away_epa.mean():.4f}")
    print(f"t-statistic: {t_stat:.4f}")
    print(f"p-value: {p_value:.4e}")
    print(f"Cohen's d: {cohens_d:.4f}")

    return {
        'home_mean': home_epa.mean(),
        'away_mean': away_epa.mean(),
        't_stat': t_stat,
        'p_value': p_value,
        'cohens_d': cohens_d
    }
```
### Confidence Intervals
```python
def calculate_confidence_interval(data, confidence=0.95):
    """
    Calculate a confidence interval for the mean.

    Parameters
    ----------
    data : array-like
        Sample data
    confidence : float
        Confidence level

    Returns
    -------
    tuple
        (lower bound, mean, upper bound)
    """
    data = np.array(data)
    n = len(data)
    mean = data.mean()
    se = stats.sem(data)  # Standard error of the mean

    # t-critical value
    t_crit = stats.t.ppf((1 + confidence) / 2, n - 1)
    margin = t_crit * se

    return (mean - margin, mean, mean + margin)
```
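As a sanity check, `scipy.stats.t.interval` produces the same interval directly; here it is applied to simulated per-play EPA values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.05, scale=1.0, size=500)  # simulated per-play EPA

# 95% t-interval for the mean
low, high = stats.t.interval(0.95, len(sample) - 1,
                             loc=sample.mean(), scale=stats.sem(sample))
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

The interval is centered on the sample mean, with width driven by the standard error, so it narrows as the play count grows.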
**Application: Is a QB's EPA Significantly Different from Average?**
```python
def test_qb_above_average(pbp, qb_name, min_plays=200):
    """
    Test if a QB's EPA is significantly above league average.

    Parameters
    ----------
    pbp : pd.DataFrame
        Play-by-play data
    qb_name : str
        QB name to test
    min_plays : int
        Minimum plays required

    Returns
    -------
    dict
        Test results
    """
    # Get the QB's pass plays
    qb_passes = pbp.query(
        "pass == 1 and passer_player_name == @qb_name"
    )['epa'].dropna()

    if len(qb_passes) < min_plays:
        return {'error': f'Only {len(qb_passes)} plays, need {min_plays}'}

    # League average
    league_avg = pbp.query("pass == 1")['epa'].mean()

    # One-sample t-test against league average
    t_stat, p_value = stats.ttest_1samp(qb_passes, league_avg)

    # One-tailed (is the QB above average?)
    p_value_one_tail = p_value / 2 if t_stat > 0 else 1 - p_value / 2

    # Confidence interval
    ci_low, mean, ci_high = calculate_confidence_interval(qb_passes)

    print(f"{qb_name} EPA Analysis:")
    print(f"  Mean EPA: {mean:.4f}")
    print(f"  95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
    print(f"  League Average: {league_avg:.4f}")
    print(f"  p-value (one-tailed): {p_value_one_tail:.4f}")

    significant = p_value_one_tail < 0.05 and mean > league_avg

    return {
        'qb_mean': mean,
        'league_avg': league_avg,
        'ci': (ci_low, ci_high),
        'p_value': p_value_one_tail,
        'significant': significant
    }
```
### Multiple Comparisons Problem

When testing many hypotheses, some will appear significant by chance:
```python
def adjust_p_values(p_values, method='bonferroni'):
    """
    Adjust p-values for multiple comparisons.

    Parameters
    ----------
    p_values : array-like
        Original p-values
    method : str
        'bonferroni' or 'fdr' (Benjamini-Hochberg)

    Returns
    -------
    array
        Adjusted p-values
    """
    p_values = np.array(p_values)
    n = len(p_values)

    if method == 'bonferroni':
        return np.minimum(p_values * n, 1.0)
    elif method == 'fdr':
        # Benjamini-Hochberg procedure
        sorted_idx = np.argsort(p_values)
        sorted_p = p_values[sorted_idx]
        adjusted_sorted = sorted_p * n / np.arange(1, n + 1)
        # Enforce monotonicity over the *sorted* p-values,
        # from largest to smallest, then cap at 1
        adjusted_sorted = np.minimum.accumulate(adjusted_sorted[::-1])[::-1]
        adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
        # Map back to the original order
        adjusted = np.empty(n)
        adjusted[sorted_idx] = adjusted_sorted
        return adjusted
```
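Bonferroni by hand on a few hypothetical p-values shows how quickly the correction bites; after adjustment, the 0.04 result is no longer significant at the 0.05 level:

```python
import numpy as np

p = np.array([0.001, 0.020, 0.040, 0.300])  # hypothetical raw p-values
bonferroni = np.minimum(p * len(p), 1.0)    # multiply by number of tests, cap at 1
print(bonferroni)  # 0.004, 0.08, 0.16, 1.0
```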
## 5.5 Regression Analysis

### Linear Regression for Football
```python
import statsmodels.api as sm

def regression_example(pbp):
    """
    Predict EPA using game situation variables.
    """
    # Prepare data (query's `in` syntax avoids needing the python engine)
    plays = pbp.query("play_type in ['pass', 'run']").dropna(subset=['epa']).copy()

    # Features
    features = ['down', 'ydstogo', 'yardline_100', 'score_differential', 'pass']
    X = plays[features].dropna()
    y = plays.loc[X.index, 'epa']

    # Fit OLS with an intercept
    X_const = sm.add_constant(X)
    model = sm.OLS(y, X_const).fit()

    print(model.summary())
    return model
```
### Logistic Regression for Binary Outcomes
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def predict_success(pbp):
    """
    Build logistic regression to predict play success.
    """
    plays = pbp.query("play_type in ['pass', 'run']").dropna(subset=['success']).copy()

    # Features
    features = ['down', 'ydstogo', 'yardline_100', 'score_differential', 'pass']
    X = plays[features].dropna()
    y = plays.loc[X.index, 'success']

    # Fit logistic regression
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # Coefficients (odds ratio = multiplicative change in odds per unit)
    coef_df = pd.DataFrame({
        'feature': features,
        'coefficient': model.coef_[0],
        'odds_ratio': np.exp(model.coef_[0])
    })

    print("Logistic Regression Coefficients:")
    print(coef_df)

    return model, coef_df
```
### Regularization

To prevent overfitting when using many features:
```python
from sklearn.linear_model import Ridge, Lasso

def regularized_regression(X, y, alpha=1.0, method='ridge'):
    """
    Fit a regularized regression.

    Parameters
    ----------
    X : array-like
        Features
    y : array-like
        Target
    alpha : float
        Regularization strength
    method : str
        'ridge' (L2) or 'lasso' (L1)

    Returns
    -------
    model
        Fitted model
    """
    if method == 'ridge':
        model = Ridge(alpha=alpha)
    else:
        model = Lasso(alpha=alpha)

    model.fit(X, y)
    return model
```
## 5.6 Common Statistical Pitfalls

### Sample Size Issues
```python
import numpy as np
from scipy.stats import norm

def check_sample_size(n, effect_size=0.2, power=0.8, alpha=0.05):
    """
    Check if a sample size is adequate for detecting an effect
    in a two-sample comparison (n observations per group).

    Parameters
    ----------
    n : int
        Sample size per group
    effect_size : float
        Expected effect size (Cohen's d)
    power : float
        Desired statistical power
    alpha : float
        Significance level (two-sided)

    Returns
    -------
    dict
        Power analysis results
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)

    # Required n per group (normal approximation to the two-sample t-test)
    required_n = ((z_alpha + z_beta) / effect_size) ** 2 * 2

    # Approximate power achieved at the current n
    actual_power = norm.cdf(effect_size * np.sqrt(n / 2) - z_alpha)

    return {
        'current_n': n,
        'required_n': int(np.ceil(required_n)),
        'adequate': n >= required_n,
        'actual_power': actual_power
    }
```
### Regression to the Mean

**The Problem:** Extreme performances tend to regress toward average.
```python
def demonstrate_regression_to_mean(pbp):
    """
    Show regression to the mean in QB performance.

    QBs who perform extremely well in week 1 tend to
    regress toward average in subsequent weeks.
    """
    # QB performance by week
    qb_weekly = (
        pbp
        .query("pass == 1")
        .groupby(['passer_player_name', 'week'])
        .agg(
            passes=('pass', 'count'),
            epa=('epa', 'mean')
        )
        .query("passes >= 20")
        .reset_index()
    )

    # Top performers in week 1
    week1_top = (
        qb_weekly
        .query("week == 1")
        .nlargest(5, 'epa')
        ['passer_player_name']
        .tolist()
    )

    # Track their performance in subsequent weeks
    tracking = (
        qb_weekly
        [qb_weekly['passer_player_name'].isin(week1_top)]
        .pivot(index='passer_player_name', columns='week', values='epa')
    )

    print("Week 1 Top Performers - EPA by Week:")
    print(tracking)
    print("\nNote how extreme week 1 performers tend to regress toward average.")

    return tracking
```
### Survivorship Bias

Only looking at successful outcomes can lead to wrong conclusions:
```python
def survivorship_bias_example():
    """
    Demonstrate survivorship bias in draft analysis.

    If we only analyze starters, we miss all the players
    who were drafted but never made an impact.
    """
    print("Survivorship Bias Warning:")
    print("-" * 50)
    print("When analyzing 'successful' draft picks:")
    print("- Don't forget about busts who are no longer in the league")
    print("- A sample of 'starters' is biased toward success")
    print("- True hit rate requires counting all picks, not just successes")
    print()
    print("Example: '1st round QBs have a 65% success rate'")
    print("Reality: This often only counts QBs who are still playing,")
    print("         not the ones who flamed out after 2 seasons.")
```
### Selection Bias
```python
def selection_bias_example():
    """
    Explain selection bias in football analytics.
    """
    print("Selection Bias Examples in Football:")
    print("-" * 50)
    print()
    print("1. Fourth Down Analysis")
    print("   - Teams only go for it when they think they can convert")
    print("   - Observed conversion rate is biased upward")
    print("   - Can't conclude 'teams should always go for it'")
    print()
    print("2. Deep Pass Analysis")
    print("   - QBs throw deep when receivers are open")
    print("   - Completion rate reflects receiver separation, not just QB skill")
    print()
    print("3. Blitz Analysis")
    print("   - Defenses blitz when they expect pass")
    print("   - Comparing EPA on blitz vs non-blitz is confounded")
```
## 5.7 Effect Sizes and Practical Significance

### Beyond p-values

Statistical significance ≠ practical significance.
```python
def calculate_effect_size(group1, group2):
    """
    Calculate various effect size measures.

    Parameters
    ----------
    group1 : array-like
        First group
    group2 : array-like
        Second group

    Returns
    -------
    dict
        Effect size measures
    """
    # Cohen's d (sample standard deviations, ddof=1)
    pooled_std = np.sqrt(
        (np.std(group1, ddof=1)**2 + np.std(group2, ddof=1)**2) / 2
    )
    cohens_d = (np.mean(group1) - np.mean(group2)) / pooled_std

    # Percent difference
    pct_diff = (np.mean(group1) - np.mean(group2)) / np.mean(group2) * 100

    # Conventional interpretation of |d|
    if abs(cohens_d) < 0.2:
        interpretation = "negligible"
    elif abs(cohens_d) < 0.5:
        interpretation = "small"
    elif abs(cohens_d) < 0.8:
        interpretation = "medium"
    else:
        interpretation = "large"

    return {
        'cohens_d': cohens_d,
        'pct_diff': pct_diff,
        'interpretation': interpretation
    }
```
### Football-Specific Thresholds
| Effect Size | EPA Difference | Interpretation |
|---|---|---|
| Negligible | < 0.02 | No practical difference |
| Small | 0.02-0.05 | Noticeable over many plays |
| Medium | 0.05-0.10 | Meaningful strategic difference |
| Large | > 0.10 | Major performance difference |
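The table above can be encoded as a small helper, with the band boundaries taken directly from the table:

```python
def classify_epa_difference(diff):
    """Classify an EPA/play difference by practical significance."""
    d = abs(diff)
    if d < 0.02:
        return "negligible"
    elif d < 0.05:
        return "small"
    elif d < 0.10:
        return "medium"
    return "large"

print(classify_epa_difference(0.07))   # medium
print(classify_epa_difference(0.015))  # negligible
```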
## 5.8 Chapter Summary
Statistical foundations enable rigorous football analysis:
- Probability provides the framework for decision-making under uncertainty
- EPA quantifies play value by measuring expected point changes
- Win Probability tracks game state and identifies crucial moments
- Hypothesis testing determines whether differences are meaningful
- Regression models relationships and makes predictions
- Avoiding pitfalls ensures conclusions are valid
These tools form the basis for all advanced football analytics covered in subsequent chapters.
## Chapter Summary Checklist
After completing this chapter, you should be able to:
- [ ] Calculate conditional probabilities for game situations
- [ ] Explain how EPA is calculated and what it measures
- [ ] Interpret Win Probability and WPA
- [ ] Conduct and interpret hypothesis tests
- [ ] Build regression models for football outcomes
- [ ] Recognize and avoid common statistical pitfalls
## Preview: Chapter 6
Part 2 begins with quarterback evaluation—applying these statistical foundations to measure passer performance beyond traditional stats.