Key Takeaways: Statistical Foundations for Football Analytics

One-page reference for Chapter 5 concepts


Core Metrics

EPA (Expected Points Added)

EPA = EP_after_play - EP_before_play
  • Measures how much a play changed expected point outcome
  • Positive = good play, negative = bad play
  • Average around 0 by construction
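The EPA definition above is just a column subtraction in practice. A minimal sketch, assuming a play-by-play table with illustrative `ep_before`/`ep_after` columns (the names are hypothetical, not from any specific data source):

```python
import pandas as pd

# Hypothetical play-by-play data; column names are illustrative
plays = pd.DataFrame({
    "ep_before": [0.5, 2.1, -0.3],
    "ep_after":  [1.2, 1.8,  0.4],
})

# EPA = EP_after_play - EP_before_play
plays["epa"] = plays["ep_after"] - plays["ep_before"]
print(plays["epa"].round(2).tolist())  # [0.7, -0.3, 0.7]
```

WPA works identically with win-probability columns in place of expected points.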

Win Probability (WP) & WPA

WPA = WP_after_play - WP_before_play
  • WP: Probability of winning at any game state
  • WPA: How much play changed win probability
  • Context-dependent (game situation matters)

Key Probability Formulas

Conditional Probability

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Bayes' Theorem

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Expected Value

$$E[X] = \sum_{i} x_i \cdot P(x_i)$$
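Both formulas reduce to a few lines of arithmetic. A sketch with made-up probabilities (all numbers here are illustrative, not from real play data):

```python
# Expected value of a 4th-down decision:
# convert (+2.0 EP) with P = 0.55, fail (-1.5 EP) with P = 0.45
outcomes = [2.0, -1.5]
probs = [0.55, 0.45]
ev = sum(x * p for x, p in zip(outcomes, probs))
print(round(ev, 3))  # 0.425

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), illustrative numbers
p_a, p_b_given_a, p_b = 0.3, 0.6, 0.4
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.45
```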


Hypothesis Testing Quick Reference

| Test | Use Case |
| --- | --- |
| One-sample t-test | Is this QB's EPA different from league avg? |
| Two-sample t-test | Do home teams have higher EPA than away? |
| Paired t-test | Same player, two conditions (home/away) |
| z-test for proportions | Is this catch rate different from 65%? |
| Chi-square | Association between two categorical variables |

P-value Interpretation

  • p < 0.05: Statistically significant at the α = 0.05 level
  • p = 0.03: "If the null hypothesis is true, there is a 3% chance of a result at least this extreme"
  • A p-value is not P(hypothesis is true)

Effect Size Guidelines

| Cohen's d | Interpretation |
| --- | --- |
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |
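Cohen's d is the mean difference scaled by the pooled standard deviation. A minimal sketch with made-up EPA samples (the groups and values are illustrative):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled std. dev."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical per-play EPA samples for two groups
group1 = [0.10, 0.15, 0.05, 0.20, 0.12]
group2 = [0.02, 0.08, 0.00, 0.10, 0.05]
print(round(cohens_d(group1, group2), 2))  # 1.51 -> "large" by the table above
```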

Football EPA Thresholds

| Difference | Meaning |
| --- | --- |
| < 0.02 EPA | No practical difference |
| 0.02 - 0.05 | Small but noticeable |
| 0.05 - 0.10 | Meaningful |
| > 0.10 | Large difference |

Confidence Interval Formula

$$\bar{x} \pm t_{crit} \cdot \frac{s}{\sqrt{n}}$$

from scipy import stats

# n, mean, and se (= s / sqrt(n)) come from your sample
ci = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)

Regression Quick Reference

Linear Regression

import statsmodels.api as sm
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

Logistic Regression

import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit(X, y)
odds_ratios = np.exp(model.coef_)  # exponentiate coefficients to get odds ratios

Common Pitfalls

| Pitfall | Description | Solution |
| --- | --- | --- |
| Small sample | Wide CIs, unreliable estimates | Report sample size, use CI |
| Multiple comparisons | False positives accumulate | Bonferroni or FDR correction |
| Regression to mean | Extreme values regress toward avg | Don't overreact to outliers |
| Survivorship bias | Only analyzing successes | Include failures in analysis |
| Selection bias | Non-random sample | Acknowledge limitations |
| p-hacking | Testing until significant | Pre-register hypotheses |
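The multiple-comparisons corrections mentioned above are one function call in statsmodels. A sketch with hypothetical p-values (the values themselves are illustrative):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from testing many player splits at once
pvals = [0.001, 0.02, 0.04, 0.30, 0.049]

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg: less conservative, controls the false discovery rate
reject_fdr, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print(reject_bonf.tolist())  # only the strongest result survives Bonferroni
print(reject_fdr.tolist())   # FDR keeps an additional result
```

Note how p = 0.049, "significant" on its own, is rejected by both corrections once the other tests are accounted for.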

Quick Statistical Tests

from scipy import stats

# Two-sample t-test
t, p = stats.ttest_ind(group1, group2)

# One-sample t-test
t, p = stats.ttest_1samp(data, population_mean)

# Correlation
r, p = stats.pearsonr(x, y)

# Chi-square
chi2, p, dof, expected = stats.chi2_contingency(table)

When to Use What

| Question | Analysis |
| --- | --- |
| Is X different from average? | One-sample t-test |
| Are A and B different? | Two-sample t-test |
| Predict continuous Y | Linear regression |
| Predict binary Y | Logistic regression |
| Reduce Type I error | Multiple testing correction |
| Practical importance | Effect size (Cohen's d) |

Red Flags in Analysis

  • "Significant" with tiny effect size
  • No sample size reported
  • Cherry-picked time period
  • No confidence intervals
  • Correlation treated as causation
  • p = 0.049 after many tests

Preview: Chapter 6

Part 2 begins with Quarterback Evaluation—applying these statistical foundations to measure passing performance.