# Key Takeaways: Statistical Foundations for Football Analytics

*A one-page reference for Chapter 5 concepts*
## Core Metrics

### EPA (Expected Points Added)

$$\text{EPA} = EP_{\text{after play}} - EP_{\text{before play}}$$

- Measures how much a single play changed the expected point outcome
- Positive = good play for the offense; negative = bad play
- League-wide average is approximately 0 by construction of the EP model
### Win Probability (WP) & WPA

$$\text{WPA} = WP_{\text{after play}} - WP_{\text{before play}}$$

- WP: probability of winning given the current game state
- WPA: how much a play changed the win probability
- Context-dependent: the same play is worth far more WPA in a close late-game situation than in a blowout
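Both deltas can be computed directly from play-by-play data. A minimal sketch, assuming hypothetical `ep_before`/`ep_after` and `wp_before`/`wp_after` columns already produced by an EP/WP model (the column names are illustrative, not from a specific data source):

```python
import pandas as pd

# Hypothetical play-by-play rows with model-estimated EP and WP
plays = pd.DataFrame({
    "ep_before": [2.4, 0.6], "ep_after": [4.1, 0.2],
    "wp_before": [0.55, 0.48], "wp_after": [0.61, 0.46],
})
plays["epa"] = plays["ep_after"] - plays["ep_before"]   # EPA per play
plays["wpa"] = plays["wp_after"] - plays["wp_before"]   # WPA per play
```

The first play gains 1.7 EPA; the second loses 0.4 EPA despite only a small WP swing, illustrating that the two metrics measure different things.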
## Key Probability Formulas

### Conditional Probability

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
### Bayes' Theorem

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
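Bayes' theorem inverts a conditional: a short worked sketch with made-up rates (not real league data), estimating the probability a play is a pass given a shotgun formation:

```python
# Illustrative numbers only, not real league data
p_pass = 0.58                 # P(A): prior probability a play is a pass
p_shotgun_given_pass = 0.80   # P(B|A): shotgun rate on pass plays
p_shotgun = 0.64              # P(B): overall shotgun rate

# Bayes' theorem: P(pass | shotgun) = P(shotgun | pass) * P(pass) / P(shotgun)
p_pass_given_shotgun = p_shotgun_given_pass * p_pass / p_shotgun  # ≈ 0.725
```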
### Expected Value

$$E[X] = \sum_{i} x_i \cdot P(x_i)$$
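The expected-value formula is the core of fourth-down decision models: weight each outcome's point value by its probability. A minimal worked sketch with made-up values and probabilities:

```python
# Hypothetical go-for-it outcomes: (EP if it happens, probability)
outcomes = [(4.0, 0.5),    # converted: drive continues in scoring range
            (-1.5, 0.5)]   # stopped: opponent takes over on downs

# E[X] = sum of x_i * P(x_i)
ev = sum(x * p for x, p in outcomes)  # → 1.25
```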
## Hypothesis Testing Quick Reference
| Test | Use Case |
|---|---|
| One-sample t-test | Is this QB's EPA different from league avg? |
| Two-sample t-test | Do home teams have higher EPA than away? |
| Paired t-test | Same player, two conditions (home/away) |
| z-test for proportions | Is this catch rate different from 65%? |
| Chi-square | Association between two categorical variables |
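The z-test for proportions from the table can be computed by hand from the normal approximation. A sketch with illustrative numbers (is a 70% catch rate on 120 targets different from a 65% baseline?):

```python
import numpy as np
from scipy import stats

p_hat, p0, n = 0.70, 0.65, 120  # observed rate, null rate, sample size

# z = (observed - null) / standard error under the null
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
```

Here z ≈ 1.15 and p ≈ 0.25: a 5-point difference on 120 targets is not enough to reject the 65% baseline.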
### P-value Interpretation

- p < 0.05: statistically significant at the conventional 5% significance level
- p = 0.03 means: "if the null hypothesis is true, there is a 3% chance of a result at least this extreme"
- p is NOT the probability that the hypothesis is true
## Effect Size Guidelines

| Cohen's d | Interpretation |
|---|---|
| < 0.2 | Negligible |
| 0.2–0.5 | Small |
| 0.5–0.8 | Medium |
| > 0.8 | Large |
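Cohen's d is the mean difference scaled by the pooled standard deviation. A minimal sketch of the pooled-SD form for two independent groups:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled standard deviation weights each group's variance by its df
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd
```

For example, `cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives about -0.63, a medium effect by the table above.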
## Football EPA Thresholds

| EPA/play difference | Meaning |
|---|---|
| < 0.02 | No practical difference |
| 0.02–0.05 | Small but noticeable |
| 0.05–0.10 | Meaningful |
| > 0.10 | Large difference |
## Confidence Interval Formula

$$\bar{x} \pm t_{crit} \cdot \frac{s}{\sqrt{n}}$$

```python
import numpy as np
from scipy import stats

n, mean = len(data), np.mean(data)
se = np.std(data, ddof=1) / np.sqrt(n)  # standard error of the mean
ci = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)
```
## Regression Quick Reference

### Linear Regression

```python
import statsmodels.api as sm

model = sm.OLS(y, sm.add_constant(X)).fit()  # add_constant adds the intercept
print(model.summary())
```

### Logistic Regression

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit(X, y)
odds_ratios = np.exp(model.coef_)  # exponentiated coefficients are odds ratios
```
## Common Pitfalls
| Pitfall | Description | Solution |
|---|---|---|
| Small sample | Wide CIs, unreliable estimates | Report sample size, use CI |
| Multiple comparisons | False positives accumulate | Bonferroni or FDR correction |
| Regression to mean | Extreme values regress toward avg | Don't overreact to outliers |
| Survivorship bias | Only analyzing successes | Include failures in analysis |
| Selection bias | Non-random sample | Acknowledge limitations |
| p-hacking | Testing until significant | Pre-register hypotheses |
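The multiple-comparisons correction in the table can be as simple as Bonferroni: with m tests, multiply each p-value by m (capped at 1) before comparing to alpha. A sketch with illustrative p-values (statsmodels' `multipletests` offers this plus FDR variants):

```python
# Illustrative p-values from four separate tests
p_values = [0.008, 0.049, 0.021, 0.300]
m, alpha = len(p_values), 0.05

# Bonferroni adjustment: p_adj = min(p * m, 1)
adjusted = [min(p * m, 1.0) for p in p_values]
significant = [p_adj < alpha for p_adj in adjusted]
```

Note that the raw p = 0.049 result no longer survives once the correction accounts for all four tests.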
## Quick Statistical Tests

```python
from scipy import stats

# Two-sample t-test
t, p = stats.ttest_ind(group1, group2)

# One-sample t-test
t, p = stats.ttest_1samp(data, population_mean)

# Pearson correlation
r, p = stats.pearsonr(x, y)

# Chi-square test of independence
chi2, p, dof, expected = stats.chi2_contingency(table)
```
## When to Use What
| Question | Analysis |
|---|---|
| Is X different from average? | One-sample t-test |
| Are A and B different? | Two-sample t-test |
| Predict continuous Y | Linear regression |
| Predict binary Y | Logistic regression |
| Reduce Type I error | Multiple testing correction |
| Practical importance | Effect size (Cohen's d) |
## Red Flags in Analysis
- "Significant" with tiny effect size
- No sample size reported
- Cherry-picked time period
- No confidence intervals
- Correlation treated as causation
- p = 0.049 after many tests
## Preview: Chapter 6

Part 2 begins with Quarterback Evaluation, applying these statistical foundations to measuring passing performance.