Chapter 18: Key Takeaways - Game Outcome Prediction

Quick Reference Summary

This chapter covered building complete game prediction systems using rating systems, feature engineering, and machine learning.


Core Concepts

Prediction Targets

Target           Output            Use Case                 Key Metric
---------------  ----------------  -----------------------  -----------
Win Probability  0-1 probability   Broadcasting, analytics  Brier Score
Point Spread     Expected margin   Betting, power rankings  MAE
Straight-Up      Binary winner     Season records           Accuracy
Total Points     Combined score    Over/under betting       RMSE

Theoretical Limits

Prediction Ceiling: ~75-78%
- Due to inherent game variance (σ ≈ 13.5 points)
- Even perfect knowledge of the "true" spread cannot push accuracy past this ceiling
- Random component is irreducible

Good Model Targets:
- Accuracy: 65-70%
- AUC-ROC: 0.72-0.78
- Brier Score: < 0.22

Essential Formulas

Spread-Probability Conversion

P(win) = 1 - Φ(spread / σ),  σ ≈ 13.5  (spread < 0 for the favorite)
P(win) ≈ 1 / (1 + e^(spread / 7.4))  # Logistic approximation (7.4 ≈ σ√3/π)

spread = -Φ⁻¹(P) × σ

Example conversions:
  -7 points → 70% win probability
  -14 points → 85% win probability
  -3 points → 59% win probability
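These conversions follow directly from the normal model above; a minimal sketch using only the standard library's `NormalDist`, with σ = 13.5 as stated (function names are illustrative):

```python
from statistics import NormalDist

SIGMA = 13.5               # std. dev. of scoring margins, per the model above
STD_NORMAL = NormalDist()  # standard normal, provides Φ (cdf) and Φ⁻¹ (inv_cdf)

def spread_to_prob(spread, sigma=SIGMA):
    """Favorite's win probability (spread is negative for the favorite)."""
    return 1 - STD_NORMAL.cdf(spread / sigma)

def prob_to_spread(p, sigma=SIGMA):
    """Implied point spread from a win probability (inverse of the above)."""
    return -STD_NORMAL.inv_cdf(p) * sigma
```

For example, `spread_to_prob(-7)` returns roughly 0.70 and `spread_to_prob(-14)` roughly 0.85, matching the conversions listed above.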

Elo Rating Update

Expected Score:
  E = 1 / (1 + 10^((R_opp - R_self) / 400))

Rating Update:
  R_new = R_old + K × (Actual - Expected)

Home Field Adjustment:
  Add ~65 Elo points to home team for expected score

Season Reversion:
  R_season_start = R_prior + 0.33 × (1500 - R_prior)

Brier Skill Score

BSS = 1 - (BS_model / BS_baseline)

Where:
  BS = mean((P_predicted - Y_actual)²)
  BS_baseline = Brier score of a constant forecast at the base rate
                (e.g., always predicting the home win rate; equals the outcome variance)

BSS > 0: Model beats baseline
BSS < 0: Model is worse than baseline
BSS = 1: Perfect prediction
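Both quantities can be sketched in a few lines of plain Python, using the observed base rate as the constant baseline forecast (function names are illustrative):

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes):
    """Skill relative to always forecasting the observed base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_baseline = brier_score([base_rate] * len(outcomes), outcomes)
    return 1 - brier_score(probs, outcomes) / bs_baseline
```

A perfect forecaster scores BSS = 1; forecasting the base rate itself scores exactly 0.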

Code Patterns

Elo Rating System

class EloSystem:
    def __init__(self, k=20, hfa=65, reversion=0.33):
        self.k = k                  # update step size
        self.hfa = hfa              # home-field advantage in Elo points
        self.reversion = reversion  # fraction reverted to the mean each offseason
        self.ratings = {}

    def expected_prob(self, home, away):
        h_rating = self.ratings.get(home, 1500) + self.hfa
        a_rating = self.ratings.get(away, 1500)
        return 1 / (1 + 10 ** ((a_rating - h_rating) / 400))

    def update(self, home, away, home_won):
        expected = self.expected_prob(home, away)
        actual = 1.0 if home_won else 0.0
        change = self.k * (actual - expected)

        self.ratings[home] = self.ratings.get(home, 1500) + change
        self.ratings[away] = self.ratings.get(away, 1500) - change

    def new_season(self):
        """Apply the season reversion formula: revert toward the 1500 mean."""
        for team in self.ratings:
            self.ratings[team] += self.reversion * (1500 - self.ratings[team])

Feature Engineering

def create_game_features(home_stats, away_stats):
    """Create differential features."""
    features = {}

    # Differentials
    for stat in ['off_epa', 'def_epa', 'win_pct', 'ppg']:
        features[f'{stat}_diff'] = home_stats[stat] - away_stats[stat]

    # Aggregates
    features['total_quality'] = home_stats['total_epa'] + away_stats['total_epa']

    # Indicators
    features['home_field'] = 1

    return features

Ensemble Prediction

def ensemble_predict(elo_prob, ml_prob, elo_weight=0.35):
    """Combine Elo and ML predictions as a weighted average."""
    return elo_weight * elo_prob + (1 - elo_weight) * ml_prob

# Optimize the weight on a validation set
import numpy as np
from sklearn.metrics import brier_score_loss

best_weight = min(
    np.arange(0.1, 0.6, 0.05),
    key=lambda w: brier_score_loss(y_val, w * elo + (1 - w) * ml)
)
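The same weight search can also be written without sklearn or NumPy; a stdlib sketch (the toy probability lists in the usage below stand in for real validation data):

```python
def val_brier(probs, outcomes):
    """Brier score over a validation set."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def best_elo_weight(elo_probs, ml_probs, y_val, step=0.05):
    """Grid-search the Elo weight that minimizes validation Brier score."""
    candidates = [i * step for i in range(int(round(1 / step)) + 1)]
    return min(
        candidates,
        key=lambda w: val_brier(
            [w * e + (1 - w) * m for e, m in zip(elo_probs, ml_probs)],
            y_val,
        ),
    )
```

Sanity check: if the ML probabilities are perfect and the Elo probabilities are uninformative (all 0.5), the search puts all weight on ML, and vice versa.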

Temporal Cross-Validation

from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def temporal_cv(X, y, dates, model, n_splits=5):
    """Cross-validate with temporal ordering."""
    sorted_idx = dates.argsort()
    X_sorted = X.iloc[sorted_idx]
    y_sorted = y.iloc[sorted_idx]

    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = cross_val_score(model, X_sorted, y_sorted, cv=tscv)

    return scores.mean(), scores.std()

Model Selection Guide

Scenario                      Recommended Approach
----------------------------  ----------------------------------------
Limited data (< 500 games)    Elo ratings alone
Standard prediction           Elo + Logistic Regression ensemble
Complex feature interactions  Gradient Boosting
Production system             Calibrated ensemble with multiple models
Real-time updates             Elo (low computational cost)
Maximum accuracy              Neural network ensemble

Common Pitfalls

1. Data Leakage

Wrong: Using game outcome features to predict game outcome

# BAD - includes future information
features = ['home_score', 'away_score', 'final_margin']

Right: Use only pre-game information

# GOOD - only prior information
features = ['home_rating', 'away_rating', 'rest_days']

2. Random Split for Time Series

Wrong: Random train/test split

# BAD - future games in training, past in test
X_train, X_test = train_test_split(X, test_size=0.2)

Right: Temporal split

# GOOD - always train on past, test on future
train = data[data['season'] < 2023]
test = data[data['season'] >= 2023]

3. Ignoring Calibration

Wrong: Only checking accuracy

# BAD - high accuracy but poor probabilities
print(f"Accuracy: {accuracy:.1%}")

Right: Assess calibration

# GOOD - check probability quality
from sklearn.metrics import brier_score_loss
print(f"Accuracy: {accuracy:.1%}")
print(f"Brier Score: {brier:.4f}")
print(f"ECE: {ece:.4f}")
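Expected calibration error (ECE) is not provided directly by sklearn's metrics module; a minimal sketch with equal-width probability bins (the bin count and binning scheme are choices, not part of the original text):

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Average |predicted prob - observed win rate| per bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp p == 1.0 into the last bin
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(p for p, _ in members) / len(members)
            hit_rate = sum(y for _, y in members) / len(members)
            ece += (len(members) / n) * abs(avg_conf - hit_rate)
    return ece
```

Well-calibrated forecasts score near 0; per the red flags below, values above roughly 0.05 suggest poor calibration.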

4. Overfitting to Small Samples

Wrong: Complex model with few games

# BAD - too many parameters for data size
model = RandomForestClassifier(n_estimators=500, max_depth=20)
model.fit(X_train[:100], y_train[:100])  # Only 100 games!

Right: Match complexity to data

# GOOD - simpler model for small data
model = LogisticRegression(C=0.1)  # Regularized
model.fit(X_train, y_train)

Evaluation Checklist

Before Training

  • [ ] No data leakage (future info not in features)
  • [ ] Temporal split implemented
  • [ ] Features use only pre-game information
  • [ ] Baseline accuracy calculated

Model Comparison

  • [ ] Multiple models evaluated
  • [ ] Cross-validation used (TimeSeriesSplit)
  • [ ] Feature importance analyzed
  • [ ] Ensemble combinations tested

Evaluation

  • [ ] Accuracy vs. baseline compared
  • [ ] Brier Score calculated
  • [ ] Calibration curve plotted
  • [ ] ATS performance measured (if applicable)
  • [ ] Confidence stratification analyzed

Quick Reference Tables

Typical Metrics by Model Quality

Quality Level  Accuracy  AUC-ROC    Brier Score
-------------  --------  ---------  -----------
Poor           < 55%     < 0.55     > 0.28
Baseline       55-60%    0.55-0.65  0.25-0.28
Good           60-65%    0.65-0.72  0.22-0.25
Very Good      65-70%    0.72-0.78  0.18-0.22
Excellent      > 70%     > 0.78     < 0.18

Feature Importance Rankings (Typical)

Rank  Feature Type             Description
----  -----------------------  -------------------------
1     Elo/Rating Difference    Overall team quality
2     Efficiency Differential  EPA, success rate
3     Recent Form              Last 5 games' performance
4     Win Percentage Diff      Season record comparison
5     Home Field               ~3-point advantage
6     Rest Differential        Bye week, short week
7     Turnover Margin          Ball security
8     Head-to-Head             Historical matchup

Spread to Probability Quick Reference

Spread  Win Prob  Upset Rate
------  --------  ----------
-21     94%       6%
-14     85%       15%
-10     77%       23%
-7      70%       30%
-3      59%       41%
Pick    50%       50%
+3      41%       N/A
+7      30%       N/A

Red Flags

Warning signs your model may have issues:

  1. Accuracy > 75% → Likely data leakage
  2. Train >> Test accuracy → Overfitting
  3. Probabilities clustered near 0.5 → Model not discriminating
  4. ECE > 0.05 → Poor calibration
  5. ATS < 50% → Not beating a coin flip against the spread
  6. Negative Brier Skill Score → Worse than baseline

Next Steps

After mastering game prediction, proceed to:
  - Chapter 19: Player Performance Forecasting
  - Chapter 20: Recruiting Analytics
  - Chapter 21: Win Probability Models
  - Chapter 22: Machine Learning Applications