In This Chapter
- Learning Objectives
- Introduction
- 18.1 The Game Prediction Problem
- 18.2 Rating Systems for Team Strength
- 18.3 Feature Engineering for Game Prediction
- 18.4 Machine Learning Models for Game Prediction
- 18.5 Evaluating Game Predictions
- 18.6 Production Deployment
- Summary
- Chapter 18 Exercises
- Chapter 18 Code Examples
Chapter 18: Game Outcome Prediction
Learning Objectives
By the end of this chapter, you will be able to:
- Build comprehensive game prediction models using multiple approaches
- Engineer advanced features capturing team strength, matchup dynamics, and situational factors
- Implement Elo rating systems and power rankings for football prediction
- Apply ensemble methods to combine predictions from multiple models
- Evaluate model performance against betting markets and other benchmarks
- Deploy production-ready prediction systems for real-time game forecasting
Introduction
Predicting the outcome of college football games represents one of the most challenging and rewarding applications of sports analytics. With billions of dollars wagered annually on college football, sophisticated models compete against Vegas odds, while teams and broadcasters seek accurate predictions for strategic planning and entertainment. This chapter presents state-of-the-art techniques for game outcome prediction, building upon the machine learning foundations from Chapter 17.
Game prediction differs fundamentally from other prediction tasks because of the inherent unpredictability of football. Unlike baseball's large sample sizes or basketball's frequent scoring, football games are decided by relatively few plays, each with enormous variance. A single fumble, interception, or blown coverage can swing a game by 7-14 points. Despite this variance, systematic patterns exist—better teams win more often, home teams have advantages, and certain matchups favor specific playing styles.
The goal of this chapter is not to create a perfect prediction system (which is impossible), but to build models that capture meaningful signal, produce well-calibrated probabilities, and outperform naive baselines. We'll examine multiple approaches: statistical models built from team metrics, Elo-style rating systems, machine learning classifiers, and ensemble methods that combine predictions intelligently.
18.1 The Game Prediction Problem
18.1.1 What We're Predicting
Game prediction typically targets one or more outcomes:
Win Probability: The probability that a specific team wins the game. This is the most common prediction target.
Point Spread: The expected margin of victory. Related to win probability but captures game competitiveness.
Total Points (Over/Under): The total points scored by both teams. Less common in academic analytics but critical for betting markets.
Straight-Up Winner: Binary classification—which team wins without regard to margin.
Each target has different characteristics and requires different modeling approaches:
import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional
from enum import Enum
class PredictionTarget(Enum):
"""Types of game predictions."""
WIN_PROBABILITY = 'win_probability'
POINT_SPREAD = 'point_spread'
TOTAL_POINTS = 'total_points'
STRAIGHT_UP = 'straight_up'
@dataclass
class GamePrediction:
"""
Complete prediction for a single game.
Attributes:
-----------
game_id : str
Unique game identifier
home_team : str
Home team name
away_team : str
Away team name
home_win_prob : float
Probability home team wins (0-1)
predicted_spread : float
Predicted margin (positive = home favored)
predicted_total : float
Predicted total points
confidence : float
Model confidence in prediction
model_name : str
Name of model generating prediction
"""
game_id: str
home_team: str
away_team: str
home_win_prob: float
predicted_spread: float
predicted_total: float
confidence: float = 0.5
model_name: str = 'default'
@property
def away_win_prob(self) -> float:
"""Away team win probability."""
return 1 - self.home_win_prob
@property
def predicted_winner(self) -> str:
"""Predicted winner based on probability."""
return self.home_team if self.home_win_prob > 0.5 else self.away_team
def to_dict(self) -> Dict:
"""Convert to dictionary."""
return {
'game_id': self.game_id,
'home_team': self.home_team,
'away_team': self.away_team,
'home_win_prob': self.home_win_prob,
'away_win_prob': self.away_win_prob,
'predicted_spread': self.predicted_spread,
'predicted_total': self.predicted_total,
'predicted_winner': self.predicted_winner,
'confidence': self.confidence,
'model': self.model_name
}
18.1.2 The Challenge of Football Prediction
Football prediction is inherently difficult for several reasons:
Small Sample Sizes: Each team plays only 12-14 games per season, providing limited data to assess team strength.
High Variance Per Play: Individual plays have enormous variance. A 60-yard touchdown pass and an interception might both come from similar pre-snap situations.
Dynamic Team Strength: Teams improve or regress throughout the season due to injuries, coaching adjustments, player development, and momentum.
Context Dependence: Team performance varies by opponent, weather, venue, time of day, and broader season context.
Selection Effects: Bowl games and playoffs match teams differently than regular season, creating selection bias in historical data.
Understanding these challenges helps set appropriate expectations and guides modeling decisions:
def calculate_baseline_accuracy(data: pd.DataFrame) -> Dict:
"""
Calculate baseline prediction accuracy.
Parameters:
-----------
data : pd.DataFrame
Historical game data with 'home_win' column
Returns:
--------
Dict : Baseline accuracy metrics
"""
home_win_rate = data['home_win'].mean()
baselines = {
'home_always': home_win_rate, # Always pick home team
'away_always': 1 - home_win_rate, # Always pick away team
'coin_flip': 0.50, # Random guess
'majority_class': max(home_win_rate, 1 - home_win_rate)
}
    # For context, FBS home teams have historically won roughly 55-60% of games
print("Baseline Prediction Accuracy:")
print(f" Home team wins: {home_win_rate:.1%}")
print(f" Always pick home: {baselines['home_always']:.1%}")
print(f" Random guess: {baselines['coin_flip']:.1%}")
print(f" Majority class: {baselines['majority_class']:.1%}")
return baselines
def calculate_prediction_ceiling(n_games: int = 1000) -> float:
"""
Estimate theoretical prediction ceiling for football.
Based on analysis of closing lines and actual outcomes,
the practical ceiling is approximately 75-78% for individual games.
Parameters:
-----------
n_games : int
Number of simulated games
Returns:
--------
float : Estimated ceiling accuracy
"""
np.random.seed(42)
# True team strength difference (in points)
true_spread = np.random.normal(0, 10, n_games)
# Actual margin includes game variance (std dev ~13-14 points)
actual_margin = true_spread + np.random.normal(0, 13.5, n_games)
# Perfect knowledge of spread
predicted_winner = true_spread > 0
actual_winner = actual_margin > 0
accuracy = (predicted_winner == actual_winner).mean()
print(f"Theoretical ceiling with perfect spread knowledge: {accuracy:.1%}")
print("Note: This assumes spread captures all predictable information")
return accuracy
18.1.3 Relationship Between Spread and Win Probability
Point spread and win probability are closely related but distinct. The market point spread represents the expected margin of victory, while win probability represents the likelihood of winning regardless of margin.
The relationship between spread and win probability follows a sigmoid-like curve, but the exact shape depends on assumptions about score variance:
from scipy.stats import norm
def spread_to_probability(spread: float,
std_dev: float = 13.5) -> float:
"""
Convert point spread to win probability.
Uses a normal distribution assumption for score margin.
Parameters:
-----------
spread : float
Point spread (positive = team favored)
std_dev : float
Standard deviation of score margins (typically 13-14)
Returns:
--------
float : Win probability
"""
# P(win) = P(actual_margin > 0) where actual_margin ~ N(spread, std_dev)
prob = 1 - norm.cdf(0, loc=spread, scale=std_dev)
return prob
def probability_to_spread(prob: float,
std_dev: float = 13.5) -> float:
"""
Convert win probability to implied point spread.
Parameters:
-----------
prob : float
Win probability (0-1)
std_dev : float
Standard deviation of score margins
Returns:
--------
float : Implied point spread
"""
# Inverse of spread_to_probability
spread = norm.ppf(prob) * std_dev
return spread
def demonstrate_spread_probability_relationship():
"""Show the spread-probability relationship."""
spreads = np.arange(-28, 29, 1)
probs = [spread_to_probability(s) for s in spreads]
print("\nSpread to Win Probability Conversion:")
print("-" * 40)
    for spread in [-21, -14, -7, -3, 0, 3, 7, 14, 21]:
        prob = spread_to_probability(spread)
        print(f" Spread {spread:+3d}: {prob:.1%} win probability")
return spreads, probs
# Example output
demonstrate_spread_probability_relationship()
18.2 Rating Systems for Team Strength
Rating systems provide a principled way to estimate team strength from game results. These ratings can then predict future games by comparing team strengths.
18.2.1 The Elo Rating System
Developed by Arpad Elo for chess, the Elo system has been successfully adapted to football. The key insight is that rating differences predict win probabilities, and actual results update ratings proportionally to how surprising they were.
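Before walking through the full implementation below, note that the update rule itself fits in two lines. A minimal numeric sketch (the 100-point rating gap is invented purely for illustration; the K-factor and home-advantage constants match the defaults used in the class that follows):
# Expected home win probability given a 100-point Elo edge plus 65 points of home advantage
expected_home = 1 / (1 + 10 ** (-(100 + 65) / 400))   # ~0.72
# If the home team wins (actual result = 1), a K-factor of 20 shifts each rating by
rating_change = 20 * (1 - expected_home)               # ~ +5.6 Elo for home, -5.6 for away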
class EloRatingSystem:
"""
Elo rating system for college football.
The system maintains ratings for all teams and updates them
after each game based on actual vs expected results.
"""
def __init__(self,
initial_rating: float = 1500,
k_factor: float = 20,
home_advantage: float = 65,
mean_reversion: float = 0.25):
"""
Initialize Elo system.
Parameters:
-----------
initial_rating : float
Starting rating for new teams
k_factor : float
Maximum rating change per game
home_advantage : float
Home field advantage in Elo points
mean_reversion : float
Fraction of rating that reverts to mean each season
"""
self.initial_rating = initial_rating
self.k_factor = k_factor
self.home_advantage = home_advantage
self.mean_reversion = mean_reversion
self.ratings = {}
self.history = []
def get_rating(self, team: str) -> float:
"""Get current rating for a team."""
return self.ratings.get(team, self.initial_rating)
def expected_win_prob(self, home_team: str, away_team: str) -> float:
"""
Calculate expected win probability for home team.
Uses logistic curve with home advantage adjustment.
Parameters:
-----------
home_team : str
Home team name
away_team : str
Away team name
Returns:
--------
float : Home team win probability
"""
home_rating = self.get_rating(home_team) + self.home_advantage
away_rating = self.get_rating(away_team)
rating_diff = home_rating - away_rating
# Logistic function: maps rating diff to probability
# 400 point diff ≈ 91% win probability
expected = 1 / (1 + 10 ** (-rating_diff / 400))
return expected
def update_ratings(self, home_team: str, away_team: str,
home_score: int, away_score: int,
margin_of_victory_multiplier: bool = True) -> Dict:
"""
Update ratings after a game.
Parameters:
-----------
home_team, away_team : str
Team names
home_score, away_score : int
Final scores
margin_of_victory_multiplier : bool
Whether to adjust K based on margin
Returns:
--------
Dict : Update details
"""
# Get pre-game ratings
home_rating_pre = self.get_rating(home_team)
away_rating_pre = self.get_rating(away_team)
# Expected result
expected_home = self.expected_win_prob(home_team, away_team)
# Actual result (1 for win, 0 for loss, 0.5 for tie)
if home_score > away_score:
actual_home = 1.0
elif home_score < away_score:
actual_home = 0.0
else:
actual_home = 0.5
        # Margin-of-victory multiplier (FiveThirtyEight-style: bigger wins move ratings more,
        # but blowouts by heavy favorites are dampened)
        if margin_of_victory_multiplier:
            margin = abs(home_score - away_score)
            # Elo edge of the winning team, including home advantage
            winner_elo_diff = home_rating_pre + self.home_advantage - away_rating_pre
            if actual_home < 0.5:
                winner_elo_diff = -winner_elo_diff
            mov_mult = np.log(max(margin, 1) + 1) * (2.2 / (winner_elo_diff * 0.001 + 2.2))
            mov_mult = min(mov_mult, 3.0)  # Cap at 3x
else:
mov_mult = 1.0
# Calculate rating change
k_effective = self.k_factor * mov_mult
rating_change = k_effective * (actual_home - expected_home)
# Update ratings
self.ratings[home_team] = home_rating_pre + rating_change
self.ratings[away_team] = away_rating_pre - rating_change
# Record history
result = {
'home_team': home_team,
'away_team': away_team,
'home_score': home_score,
'away_score': away_score,
'home_rating_pre': home_rating_pre,
'away_rating_pre': away_rating_pre,
'home_rating_post': self.ratings[home_team],
'away_rating_post': self.ratings[away_team],
'expected_home': expected_home,
'actual_home': actual_home,
'rating_change': rating_change
}
self.history.append(result)
return result
def apply_season_mean_reversion(self):
"""
Apply mean reversion at season boundary.
Ratings regress toward mean to account for roster turnover.
"""
mean_rating = np.mean(list(self.ratings.values())) if self.ratings else self.initial_rating
for team in self.ratings:
current = self.ratings[team]
self.ratings[team] = current + self.mean_reversion * (mean_rating - current)
def get_rankings(self, top_n: int = 25) -> pd.DataFrame:
"""Get top teams by rating."""
if not self.ratings:
return pd.DataFrame()
rankings = pd.DataFrame([
{'team': team, 'rating': rating}
for team, rating in self.ratings.items()
]).sort_values('rating', ascending=False)
rankings['rank'] = range(1, len(rankings) + 1)
return rankings.head(top_n)
def evaluate_predictions(self, test_games: pd.DataFrame) -> Dict:
"""
Evaluate prediction accuracy on test games.
Parameters:
-----------
test_games : pd.DataFrame
Games with home_team, away_team, home_win columns
Returns:
--------
Dict : Evaluation metrics
"""
predictions = []
actuals = []
for _, game in test_games.iterrows():
pred_prob = self.expected_win_prob(game['home_team'], game['away_team'])
predictions.append(pred_prob)
actuals.append(game['home_win'])
predictions = np.array(predictions)
actuals = np.array(actuals)
# Binary predictions
pred_binary = (predictions > 0.5).astype(int)
from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss
return {
'accuracy': accuracy_score(actuals, pred_binary),
'auc_roc': roc_auc_score(actuals, predictions),
'brier_score': brier_score_loss(actuals, predictions),
'log_loss': -np.mean(
actuals * np.log(predictions + 1e-10) +
(1 - actuals) * np.log(1 - predictions + 1e-10)
)
}
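A minimal usage sketch of the class, run over a hypothetical slate of results (team names and scores are invented for illustration; in practice the loop would run over a full chronologically ordered schedule):
import pandas as pd

games = pd.DataFrame([
    {'home_team': 'Team A', 'away_team': 'Team B', 'home_score': 34, 'away_score': 17},
    {'home_team': 'Team C', 'away_team': 'Team A', 'home_score': 21, 'away_score': 28},
    {'home_team': 'Team B', 'away_team': 'Team C', 'home_score': 13, 'away_score': 24},
])

elo = EloRatingSystem(k_factor=20, home_advantage=65)
for game in games.itertuples():
    elo.update_ratings(game.home_team, game.away_team,
                       game.home_score, game.away_score)

print(elo.get_rankings(top_n=3))
print(f"Team A at Team C: {elo.expected_win_prob('Team C', 'Team A'):.1%} home win probability")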
18.2.2 Margin-Based Rating Systems
Pure Elo considers only wins and losses. Margin-based systems incorporate point differential for more signal:
class MarginRatingSystem:
"""
Rating system based on scoring margin.
Similar to Massey Ratings or simple power rankings.
Uses least squares to solve for team strengths.
"""
def __init__(self, home_advantage: float = 2.5):
"""
Initialize margin rating system.
Parameters:
-----------
home_advantage : float
Home field advantage in points
"""
self.home_advantage = home_advantage
self.ratings = {}
self.games = []
def add_game(self, home_team: str, away_team: str,
home_score: int, away_score: int):
"""Add a game to the system."""
self.games.append({
'home_team': home_team,
'away_team': away_team,
'home_score': home_score,
'away_score': away_score,
'margin': home_score - away_score
})
def solve_ratings(self, ridge_alpha: float = 0.1) -> Dict[str, float]:
"""
Solve for team ratings using ridge regression.
Model: margin = home_rating - away_rating + home_advantage + noise
Parameters:
-----------
ridge_alpha : float
Regularization parameter
Returns:
--------
Dict[str, float] : Team ratings
"""
if not self.games:
return {}
df = pd.DataFrame(self.games)
# Get all teams
all_teams = sorted(set(df['home_team']) | set(df['away_team']))
team_to_idx = {team: i for i, team in enumerate(all_teams)}
n_teams = len(all_teams)
n_games = len(df)
# Build design matrix
# Each row: home team gets +1, away team gets -1
X = np.zeros((n_games, n_teams))
y = df['margin'].values - self.home_advantage # Remove HFA
for i, game in df.iterrows():
X[i, team_to_idx[game['home_team']]] = 1
X[i, team_to_idx[game['away_team']]] = -1
# Ridge regression for stability
from sklearn.linear_model import Ridge
model = Ridge(alpha=ridge_alpha, fit_intercept=False)
model.fit(X, y)
# Extract ratings
self.ratings = {
team: model.coef_[idx]
for team, idx in team_to_idx.items()
}
return self.ratings
def predict_spread(self, home_team: str, away_team: str) -> float:
"""Predict point spread for a matchup."""
home_rating = self.ratings.get(home_team, 0)
away_rating = self.ratings.get(away_team, 0)
return home_rating - away_rating + self.home_advantage
def predict_win_prob(self, home_team: str, away_team: str,
std_dev: float = 13.5) -> float:
"""Predict win probability from spread."""
spread = self.predict_spread(home_team, away_team)
return spread_to_probability(spread, std_dev)
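The same hypothetical schedule can drive the margin-based system; a short sketch:
margin_system = MarginRatingSystem(home_advantage=2.5)
margin_system.add_game('Team A', 'Team B', 34, 17)
margin_system.add_game('Team C', 'Team A', 21, 28)
margin_system.add_game('Team B', 'Team C', 13, 24)

ratings = margin_system.solve_ratings(ridge_alpha=0.1)
for team, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {rating:+.1f}")
print(f"Team A at Team C: {margin_system.predict_spread('Team C', 'Team A'):+.1f} point spread")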
18.2.3 Adjusted Efficiency Ratings
Modern rating systems often use play-level efficiency metrics rather than just game scores:
class AdjustedEfficiencyRatings:
"""
Rating system based on adjusted offensive and defensive efficiency.
Similar to approaches used by ESPN's FPI or SP+.
Adjusts for opponent strength and game context.
"""
def __init__(self):
self.off_ratings = {}
self.def_ratings = {}
self.overall_ratings = {}
self.games = []
def add_game_efficiency(self, team: str, opponent: str,
off_epa: float, def_epa: float,
is_home: bool = True):
"""
Add game efficiency data.
Parameters:
-----------
team : str
Team name
opponent : str
Opponent name
off_epa : float
Offensive EPA per play
def_epa : float
Defensive EPA per play (negative is better)
is_home : bool
Whether team was home
"""
self.games.append({
'team': team,
'opponent': opponent,
'off_epa': off_epa,
'def_epa': def_epa,
'is_home': is_home
})
def calculate_ratings(self, n_iterations: int = 10) -> None:
"""
Calculate adjusted efficiency ratings iteratively.
Uses iterative adjustment to account for opponent strength.
Parameters:
-----------
n_iterations : int
Number of adjustment iterations
"""
df = pd.DataFrame(self.games)
all_teams = sorted(set(df['team']) | set(df['opponent']))
# Initialize with raw averages
for team in all_teams:
team_games = df[df['team'] == team]
if len(team_games) > 0:
self.off_ratings[team] = team_games['off_epa'].mean()
self.def_ratings[team] = team_games['def_epa'].mean()
else:
self.off_ratings[team] = 0.0
self.def_ratings[team] = 0.0
# Iterative adjustment
for iteration in range(n_iterations):
new_off = {}
new_def = {}
for team in all_teams:
team_games = df[df['team'] == team]
if len(team_games) == 0:
new_off[team] = 0.0
new_def[team] = 0.0
continue
# Adjust offensive production for opponent defense
adj_off = []
for _, game in team_games.iterrows():
opp_def = self.def_ratings.get(game['opponent'], 0)
# If opponent defense is bad (high EPA allowed), adjust down
adjustment = game['off_epa'] - opp_def
adj_off.append(adjustment)
# Adjust defensive production for opponent offense
adj_def = []
for _, game in team_games.iterrows():
opp_off = self.off_ratings.get(game['opponent'], 0)
# If opponent offense is good, adjust up (less negative)
adjustment = game['def_epa'] + opp_off
adj_def.append(adjustment)
new_off[team] = np.mean(adj_off)
new_def[team] = np.mean(adj_def)
self.off_ratings = new_off
self.def_ratings = new_def
# Calculate overall rating
for team in all_teams:
self.overall_ratings[team] = (
self.off_ratings[team] - self.def_ratings[team]
)
def predict_game(self, home_team: str, away_team: str,
home_advantage: float = 0.03) -> Dict:
"""
Predict game outcome.
Parameters:
-----------
home_team : str
Home team
away_team : str
Away team
home_advantage : float
Home advantage in EPA units
Returns:
--------
Dict : Prediction details
"""
# Home offense vs away defense
home_off = self.off_ratings.get(home_team, 0) + home_advantage
away_def = self.def_ratings.get(away_team, 0)
home_expected_off = home_off - away_def
# Away offense vs home defense
away_off = self.off_ratings.get(away_team, 0)
home_def = self.def_ratings.get(home_team, 0) + home_advantage * 0.5
away_expected_off = away_off - home_def
# Net efficiency difference
efficiency_diff = home_expected_off - away_expected_off
# Convert to spread (roughly 20 points per 0.1 EPA/play)
predicted_spread = efficiency_diff * 200
# Convert to probability
win_prob = spread_to_probability(predicted_spread)
return {
'home_win_prob': win_prob,
'predicted_spread': predicted_spread,
'home_expected_epa': home_expected_off,
'away_expected_epa': away_expected_off,
'efficiency_diff': efficiency_diff
}
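A brief sketch of driving the efficiency ratings from per-game EPA summaries (the EPA-per-play values are invented for illustration; in practice they would come from aggregated play-by-play data):
eff = AdjustedEfficiencyRatings()
eff.add_game_efficiency('Team A', 'Team B', off_epa=0.18, def_epa=-0.06, is_home=True)
eff.add_game_efficiency('Team B', 'Team A', off_epa=-0.06, def_epa=0.18, is_home=False)
eff.add_game_efficiency('Team A', 'Team C', off_epa=0.10, def_epa=0.02, is_home=False)
eff.add_game_efficiency('Team C', 'Team A', off_epa=0.02, def_epa=0.10, is_home=True)

eff.calculate_ratings(n_iterations=10)
print(eff.predict_game('Team A', 'Team B'))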
18.3 Feature Engineering for Game Prediction
Effective feature engineering is critical for game prediction models. Features should capture team strength, matchup dynamics, situational factors, and historical patterns.
18.3.1 Team Strength Features
class TeamStrengthFeatures:
"""
Generate team strength features for game prediction.
"""
def __init__(self, lookback_games: int = 5):
"""
Initialize feature generator.
Parameters:
-----------
lookback_games : int
Number of recent games for rolling features
"""
self.lookback_games = lookback_games
def generate_features(self, team_data: pd.DataFrame,
game_date: pd.Timestamp) -> Dict:
"""
Generate features for a team at a specific point in time.
Parameters:
-----------
team_data : pd.DataFrame
Historical team game data
game_date : pd.Timestamp
Date of game to predict (features use only prior data)
Returns:
--------
Dict : Feature dictionary
"""
# Filter to games before this date
prior_games = team_data[team_data['date'] < game_date]
if len(prior_games) == 0:
return self._default_features()
# Season to date features
season = game_date.year if game_date.month > 6 else game_date.year - 1
season_games = prior_games[prior_games['season'] == season]
features = {}
# Win percentage
features['season_win_pct'] = (
season_games['win'].mean() if len(season_games) > 0 else 0.5
)
# Scoring features
features['ppg'] = season_games['points_for'].mean() if len(season_games) > 0 else 25
features['papg'] = season_games['points_against'].mean() if len(season_games) > 0 else 25
features['point_diff_per_game'] = features['ppg'] - features['papg']
# Efficiency features (if available)
if 'off_epa' in season_games.columns:
features['off_epa'] = season_games['off_epa'].mean()
features['def_epa'] = season_games['def_epa'].mean()
features['total_epa'] = features['off_epa'] - features['def_epa']
# Rolling features (recent form)
recent = season_games.tail(self.lookback_games)
if len(recent) > 0:
features['recent_win_pct'] = recent['win'].mean()
features['recent_ppg'] = recent['points_for'].mean()
features['recent_papg'] = recent['points_against'].mean()
else:
features['recent_win_pct'] = features['season_win_pct']
features['recent_ppg'] = features['ppg']
features['recent_papg'] = features['papg']
# Turnover features
        if {'turnovers', 'turnovers_forced'}.issubset(season_games.columns):
features['turnover_margin'] = (
(season_games['turnovers_forced'] - season_games['turnovers']).mean()
)
else:
features['turnover_margin'] = 0
# Third down and red zone (if available)
        if {'third_down_pct', 'red_zone_pct'}.issubset(season_games.columns):
features['third_down_pct'] = season_games['third_down_pct'].mean()
features['red_zone_pct'] = season_games['red_zone_pct'].mean()
return features
def _default_features(self) -> Dict:
"""Return default features for new/unknown teams."""
return {
'season_win_pct': 0.5,
'ppg': 25,
'papg': 25,
'point_diff_per_game': 0,
'off_epa': 0,
'def_epa': 0,
'total_epa': 0,
'recent_win_pct': 0.5,
'recent_ppg': 25,
'recent_papg': 25,
'turnover_margin': 0
}
class MatchupFeatures:
"""
Generate matchup-specific features.
"""
@staticmethod
def generate_differential_features(home_features: Dict,
away_features: Dict) -> Dict:
"""
Create differential features between teams.
Parameters:
-----------
home_features : Dict
Home team features
away_features : Dict
Away team features
Returns:
--------
Dict : Differential features
"""
diff_features = {}
# For each numeric feature, create differential
for key in home_features:
if isinstance(home_features[key], (int, float)):
diff_features[f'{key}_diff'] = (
home_features[key] - away_features.get(key, 0)
)
# Matchup-specific features
diff_features['total_quality'] = (
home_features.get('total_epa', 0) +
away_features.get('total_epa', 0)
)
# Stylistic matchups
diff_features['offense_matchup'] = (
home_features.get('off_epa', 0) +
away_features.get('def_epa', 0) # Good offense vs bad defense
)
        diff_features['defense_matchup'] = (
            -home_features.get('def_epa', 0) -
            away_features.get('off_epa', 0)  # Strong home defense facing a weak away offense scores high
        )
return diff_features
@staticmethod
def generate_head_to_head_features(history: pd.DataFrame,
home_team: str,
away_team: str,
n_games: int = 5) -> Dict:
"""
Generate head-to-head historical features.
Parameters:
-----------
history : pd.DataFrame
Historical games between teams
home_team : str
Home team name
away_team : str
Away team name
n_games : int
Number of recent games to consider
Returns:
--------
Dict : Head-to-head features
"""
# Filter to games between these teams
h2h = history[
((history['home_team'] == home_team) & (history['away_team'] == away_team)) |
((history['home_team'] == away_team) & (history['away_team'] == home_team))
].tail(n_games)
if len(h2h) == 0:
return {
'h2h_games': 0,
'h2h_win_pct': 0.5,
'h2h_avg_margin': 0
}
# Calculate win percentage for home team
home_wins = h2h[
((h2h['home_team'] == home_team) & (h2h['home_win'] == 1)) |
((h2h['away_team'] == home_team) & (h2h['home_win'] == 0))
]
# Average margin (from home team perspective)
margins = []
for _, game in h2h.iterrows():
if game['home_team'] == home_team:
margin = game['home_score'] - game['away_score']
else:
margin = game['away_score'] - game['home_score']
margins.append(margin)
return {
'h2h_games': len(h2h),
'h2h_win_pct': len(home_wins) / len(h2h),
'h2h_avg_margin': np.mean(margins)
}
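A quick sketch of how the differential features compose, using two hypothetical feature dictionaries of the kind TeamStrengthFeatures produces:
home_features = {'season_win_pct': 0.75, 'ppg': 34.0, 'papg': 20.0,
                 'point_diff_per_game': 14.0, 'off_epa': 0.15,
                 'def_epa': -0.05, 'total_epa': 0.20}
away_features = {'season_win_pct': 0.50, 'ppg': 27.0, 'papg': 26.0,
                 'point_diff_per_game': 1.0, 'off_epa': 0.05,
                 'def_epa': 0.02, 'total_epa': 0.03}

diff = MatchupFeatures.generate_differential_features(home_features, away_features)
print(diff['season_win_pct_diff'])   # 0.25
print(diff['off_epa_diff'])          # 0.10
print(diff['total_quality'])         # 0.23 (combined quality of the matchup)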
18.3.2 Situational Features
class SituationalFeatures:
"""
Generate situational and contextual features.
"""
@staticmethod
def generate_rest_features(home_last_game: pd.Timestamp,
away_last_game: pd.Timestamp,
game_date: pd.Timestamp) -> Dict:
"""
Generate rest and scheduling features.
Parameters:
-----------
home_last_game : pd.Timestamp
Date of home team's last game
away_last_game : pd.Timestamp
Date of away team's last game
game_date : pd.Timestamp
Date of upcoming game
Returns:
--------
Dict : Rest features
"""
home_rest = (game_date - home_last_game).days
away_rest = (game_date - away_last_game).days
return {
'home_rest_days': home_rest,
'away_rest_days': away_rest,
'rest_advantage': home_rest - away_rest,
'home_short_rest': 1 if home_rest < 7 else 0,
'away_short_rest': 1 if away_rest < 7 else 0,
'home_bye': 1 if home_rest > 10 else 0,
'away_bye': 1 if away_rest > 10 else 0
}
@staticmethod
def generate_travel_features(home_lat: float, home_lon: float,
away_lat: float, away_lon: float) -> Dict:
"""
Generate travel-related features.
Uses great circle distance approximation.
"""
from math import radians, sin, cos, sqrt, atan2
R = 3959 # Earth radius in miles
lat1, lon1 = radians(home_lat), radians(home_lon)
lat2, lon2 = radians(away_lat), radians(away_lon)
dlat = lat2 - lat1
dlon = lon2 - lon1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * atan2(sqrt(a), sqrt(1-a))
distance = R * c
# Time zone difference (simplified)
tz_diff = abs(home_lon - away_lon) / 15 # Approximate hours
return {
'travel_distance': distance,
'timezone_diff': tz_diff,
'long_travel': 1 if distance > 1000 else 0,
'cross_country': 1 if distance > 2000 else 0
}
@staticmethod
def generate_season_context(game_week: int, game_type: str,
home_record: Tuple[int, int],
away_record: Tuple[int, int]) -> Dict:
"""
Generate season context features.
Parameters:
-----------
game_week : int
Week number in season
game_type : str
'regular', 'conference_championship', 'bowl', 'playoff'
home_record : Tuple[int, int]
Home team (wins, losses)
away_record : Tuple[int, int]
Away team (wins, losses)
Returns:
--------
Dict : Context features
"""
home_wins, home_losses = home_record
away_wins, away_losses = away_record
return {
'game_week': game_week,
'early_season': 1 if game_week <= 4 else 0,
'mid_season': 1 if 5 <= game_week <= 9 else 0,
'late_season': 1 if game_week >= 10 else 0,
'is_bowl': 1 if game_type == 'bowl' else 0,
'is_playoff': 1 if game_type == 'playoff' else 0,
'is_conf_championship': 1 if game_type == 'conference_championship' else 0,
'home_games_played': home_wins + home_losses,
'away_games_played': away_wins + away_losses,
'home_elimination_game': 1 if home_losses >= 3 else 0, # Simplified
'away_elimination_game': 1 if away_losses >= 3 else 0
}
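As a quick check of the rest-day logic, a sketch with hypothetical dates (a team coming off a bye gets away_bye = 1):
import pandas as pd

rest = SituationalFeatures.generate_rest_features(
    home_last_game=pd.Timestamp('2024-10-26'),
    away_last_game=pd.Timestamp('2024-10-19'),
    game_date=pd.Timestamp('2024-11-02')
)
print(rest['home_rest_days'], rest['away_rest_days'])   # 7 14
print(rest['rest_advantage'], rest['away_bye'])         # -7 1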
18.4 Machine Learning Models for Game Prediction
18.4.1 Complete Prediction Pipeline
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
class GamePredictionPipeline:
"""
Complete machine learning pipeline for game prediction.
"""
def __init__(self):
self.feature_generators = {
'team_strength': TeamStrengthFeatures(),
'matchup': MatchupFeatures,
'situational': SituationalFeatures
}
self.scaler = StandardScaler()
self.model = None
self.feature_columns = None
self.is_fitted = False
def prepare_game_features(self, game: Dict,
team_history: pd.DataFrame) -> pd.DataFrame:
"""
Prepare all features for a single game.
Parameters:
-----------
game : Dict
Game information
team_history : pd.DataFrame
Historical team performance data
Returns:
--------
pd.DataFrame : Feature row
"""
game_date = pd.Timestamp(game['date'])
# Team strength features
home_strength = self.feature_generators['team_strength'].generate_features(
team_history[team_history['team'] == game['home_team']],
game_date
)
away_strength = self.feature_generators['team_strength'].generate_features(
team_history[team_history['team'] == game['away_team']],
game_date
)
# Differential features
diff_features = MatchupFeatures.generate_differential_features(
home_strength, away_strength
)
# Combine all features
features = {**diff_features}
features['home_field'] = 1
return pd.DataFrame([features])
def prepare_training_data(self, games: pd.DataFrame,
team_history: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
"""
Prepare training data from historical games.
Parameters:
-----------
games : pd.DataFrame
Games to prepare (with home_win labels)
team_history : pd.DataFrame
Team performance history
Returns:
--------
Tuple[pd.DataFrame, pd.Series] : Features and labels
"""
all_features = []
for _, game in games.iterrows():
features = self.prepare_game_features(game.to_dict(), team_history)
all_features.append(features)
X = pd.concat(all_features, ignore_index=True)
y = games['home_win']
self.feature_columns = X.columns.tolist()
return X, y
def fit(self, X: pd.DataFrame, y: pd.Series,
model_type: str = 'gradient_boosting',
calibrate: bool = True) -> 'GamePredictionPipeline':
"""
Fit the prediction model.
Parameters:
-----------
X : pd.DataFrame
Feature matrix
y : pd.Series
Labels
model_type : str
'logistic', 'random_forest', or 'gradient_boosting'
calibrate : bool
Whether to calibrate probabilities
Returns:
--------
self : Fitted pipeline
"""
# Scale features
X_scaled = self.scaler.fit_transform(X)
# Select base model
if model_type == 'logistic':
base_model = LogisticRegression(max_iter=1000, random_state=42)
elif model_type == 'random_forest':
base_model = RandomForestClassifier(
n_estimators=200, max_depth=6, random_state=42
)
elif model_type == 'gradient_boosting':
base_model = GradientBoostingClassifier(
n_estimators=200, max_depth=4, learning_rate=0.05, random_state=42
)
else:
raise ValueError(f"Unknown model type: {model_type}")
# Calibrate if requested
if calibrate:
self.model = CalibratedClassifierCV(base_model, cv=5, method='isotonic')
else:
self.model = base_model
self.model.fit(X_scaled, y)
self.is_fitted = True
return self
    def predict(self, X: pd.DataFrame) -> Dict:
"""
Generate prediction for a game.
Parameters:
-----------
X : pd.DataFrame
Game features (single row)
Returns:
--------
        Dict : Prediction with home win probability, spread, and total
"""
if not self.is_fitted:
raise ValueError("Pipeline not fitted")
X_scaled = self.scaler.transform(X)
prob = self.model.predict_proba(X_scaled)[0, 1]
spread = probability_to_spread(prob)
# Estimate total (simplified)
total = 55 # Average total
return {
'home_win_prob': prob,
'predicted_spread': spread,
'predicted_total': total
}
def cross_validate(self, X: pd.DataFrame, y: pd.Series,
cv: int = 5) -> Dict:
"""
Cross-validate the model.
Parameters:
-----------
X : pd.DataFrame
Features
y : pd.Series
Labels
cv : int
Number of folds
Returns:
--------
Dict : CV results
"""
X_scaled = self.scaler.fit_transform(X)
# Use TimeSeriesSplit for temporal data
tscv = TimeSeriesSplit(n_splits=cv)
accuracy_scores = cross_val_score(
self.model, X_scaled, y, cv=tscv, scoring='accuracy'
)
auc_scores = cross_val_score(
self.model, X_scaled, y, cv=tscv, scoring='roc_auc'
)
return {
'cv_accuracy_mean': accuracy_scores.mean(),
'cv_accuracy_std': accuracy_scores.std(),
'cv_auc_mean': auc_scores.mean(),
'cv_auc_std': auc_scores.std()
}
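Putting the pieces together, a small end-to-end sketch of the pipeline on a toy dataset (team names, dates, scores, and labels are all invented; with real data you would train on full seasons and enable calibration):
import pandas as pd

# Toy team history: per-team game logs with the columns TeamStrengthFeatures expects
team_history = pd.DataFrame([
    {'team': 'Team A', 'date': pd.Timestamp('2024-09-07'), 'season': 2024,
     'win': 1, 'points_for': 38, 'points_against': 17},
    {'team': 'Team A', 'date': pd.Timestamp('2024-09-14'), 'season': 2024,
     'win': 1, 'points_for': 31, 'points_against': 20},
    {'team': 'Team B', 'date': pd.Timestamp('2024-09-07'), 'season': 2024,
     'win': 0, 'points_for': 14, 'points_against': 27},
    {'team': 'Team B', 'date': pd.Timestamp('2024-09-14'), 'season': 2024,
     'win': 1, 'points_for': 24, 'points_against': 21},
])

# Toy training games with labels
games = pd.DataFrame([
    {'date': '2024-09-21', 'home_team': 'Team A', 'away_team': 'Team B', 'home_win': 1},
    {'date': '2024-09-28', 'home_team': 'Team B', 'away_team': 'Team A', 'home_win': 0},
])

pipeline = GamePredictionPipeline()
X, y = pipeline.prepare_training_data(games, team_history)
pipeline.fit(X, y, model_type='logistic', calibrate=False)  # calibration needs far more data

upcoming = {'date': '2024-10-05', 'home_team': 'Team A', 'away_team': 'Team B'}
features = pipeline.prepare_game_features(upcoming, team_history)
print(pipeline.predict(features))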
18.4.2 Ensemble Methods
Combining multiple models often produces better predictions than any single model:
class EnsemblePredictor:
"""
Ensemble predictor combining multiple models.
"""
def __init__(self):
self.models = {}
self.weights = {}
self.scaler = StandardScaler()
self.is_fitted = False
def add_model(self, name: str, model, weight: float = 1.0):
"""Add a model to the ensemble."""
self.models[name] = model
self.weights[name] = weight
def add_default_models(self):
"""Add default ensemble of models."""
self.models = {
'logistic': LogisticRegression(max_iter=1000, random_state=42),
'random_forest': RandomForestClassifier(
n_estimators=200, max_depth=6, random_state=42
),
'gradient_boosting': GradientBoostingClassifier(
n_estimators=200, max_depth=4, learning_rate=0.05, random_state=42
)
}
# Equal weights by default
self.weights = {name: 1.0 for name in self.models}
def fit(self, X: pd.DataFrame, y: pd.Series,
optimize_weights: bool = True) -> 'EnsemblePredictor':
"""
Fit all models in the ensemble.
Parameters:
-----------
X : pd.DataFrame
Features
y : pd.Series
Labels
optimize_weights : bool
Whether to optimize weights based on CV performance
Returns:
--------
self : Fitted ensemble
"""
X_scaled = self.scaler.fit_transform(X)
cv_scores = {}
for name, model in self.models.items():
print(f"Fitting {name}...")
# Cross-validate to get weight
if optimize_weights:
scores = cross_val_score(model, X_scaled, y, cv=5, scoring='roc_auc')
cv_scores[name] = scores.mean()
# Fit on full data
model.fit(X_scaled, y)
# Optimize weights based on CV performance
if optimize_weights:
total_score = sum(cv_scores.values())
self.weights = {
name: score / total_score
for name, score in cv_scores.items()
}
print("\nOptimized weights:")
for name, weight in self.weights.items():
print(f" {name}: {weight:.3f}")
self.is_fitted = True
return self
def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
"""
Get ensemble probability predictions.
Uses weighted average of model predictions.
"""
if not self.is_fitted:
raise ValueError("Ensemble not fitted")
X_scaled = self.scaler.transform(X)
weighted_probs = np.zeros(len(X))
total_weight = sum(self.weights.values())
for name, model in self.models.items():
probs = model.predict_proba(X_scaled)[:, 1]
weighted_probs += self.weights[name] * probs
return weighted_probs / total_weight
def predict(self, X: pd.DataFrame) -> np.ndarray:
"""Get binary predictions."""
probs = self.predict_proba(X)
return (probs > 0.5).astype(int)
def evaluate(self, X: pd.DataFrame, y: pd.Series) -> Dict:
"""Evaluate ensemble performance."""
from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss
probs = self.predict_proba(X)
preds = (probs > 0.5).astype(int)
# Ensemble metrics
ensemble_metrics = {
'ensemble_accuracy': accuracy_score(y, preds),
'ensemble_auc': roc_auc_score(y, probs),
'ensemble_brier': brier_score_loss(y, probs)
}
# Individual model metrics
X_scaled = self.scaler.transform(X)
for name, model in self.models.items():
model_probs = model.predict_proba(X_scaled)[:, 1]
model_preds = (model_probs > 0.5).astype(int)
ensemble_metrics[f'{name}_accuracy'] = accuracy_score(y, model_preds)
ensemble_metrics[f'{name}_auc'] = roc_auc_score(y, model_probs)
return ensemble_metrics
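The ensemble is driven the same way as the single-model pipeline. A sketch, assuming X and y are a prepared feature matrix and label vector covering enough games for 5-fold cross-validation (the toy two-game example above is too small for that):
ensemble = EnsemblePredictor()
ensemble.add_default_models()
ensemble.fit(X, y, optimize_weights=True)   # weights set from each model's CV AUC

# In practice, evaluate on a held-out season; training data is reused here only to show the call
metrics = ensemble.evaluate(X, y)
print(f"Ensemble accuracy: {metrics['ensemble_accuracy']:.1%}, AUC: {metrics['ensemble_auc']:.3f}")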
18.5 Evaluating Game Predictions
18.5.1 Metrics for Game Prediction
class GamePredictionEvaluator:
"""
Comprehensive evaluation for game predictions.
"""
@staticmethod
def evaluate_accuracy(y_true: np.ndarray, y_pred: np.ndarray,
y_prob: np.ndarray) -> Dict:
"""
Evaluate prediction accuracy.
Parameters:
-----------
y_true : np.ndarray
Actual outcomes (0/1)
y_pred : np.ndarray
Predicted outcomes (0/1)
y_prob : np.ndarray
Predicted probabilities
Returns:
--------
Dict : Accuracy metrics
"""
from sklearn.metrics import (
accuracy_score, precision_score, recall_score,
f1_score, roc_auc_score, brier_score_loss, log_loss
)
return {
'accuracy': accuracy_score(y_true, y_pred),
'precision': precision_score(y_true, y_pred),
'recall': recall_score(y_true, y_pred),
'f1': f1_score(y_true, y_pred),
'auc_roc': roc_auc_score(y_true, y_prob),
'brier_score': brier_score_loss(y_true, y_prob),
'log_loss': log_loss(y_true, y_prob)
}
@staticmethod
def evaluate_by_confidence(y_true: np.ndarray, y_prob: np.ndarray,
n_bins: int = 5) -> pd.DataFrame:
"""
Evaluate accuracy by prediction confidence.
Parameters:
-----------
y_true : np.ndarray
Actual outcomes
y_prob : np.ndarray
Predicted probabilities
n_bins : int
Number of confidence bins
Returns:
--------
pd.DataFrame : Accuracy by confidence level
"""
# Calculate confidence as distance from 0.5
confidence = np.abs(y_prob - 0.5) * 2 # 0 to 1 scale
# Create bins
bins = np.linspace(0, 1, n_bins + 1)
bin_labels = [f'{bins[i]:.0%}-{bins[i+1]:.0%}' for i in range(n_bins)]
results = []
for i in range(n_bins):
mask = (confidence >= bins[i]) & (confidence < bins[i+1])
if mask.sum() > 0:
pred_binary = (y_prob[mask] > 0.5).astype(int)
accuracy = (pred_binary == y_true[mask]).mean()
results.append({
'confidence_bin': bin_labels[i],
'n_games': mask.sum(),
'accuracy': accuracy,
'avg_confidence': confidence[mask].mean()
})
return pd.DataFrame(results)
@staticmethod
def compare_to_baseline(y_true: np.ndarray, y_pred: np.ndarray,
y_prob: np.ndarray) -> Dict:
"""
Compare model to various baselines.
Parameters:
-----------
y_true : np.ndarray
Actual outcomes
y_pred : np.ndarray
Model predictions
y_prob : np.ndarray
Model probabilities
Returns:
--------
Dict : Comparison metrics
"""
# Baselines
home_rate = y_true.mean()
home_always = home_rate
away_always = 1 - home_rate
coin_flip = 0.5
model_accuracy = (y_pred == y_true).mean()
# Information gain metrics
from sklearn.metrics import brier_score_loss
model_brier = brier_score_loss(y_true, y_prob)
baseline_brier = brier_score_loss(y_true, np.full_like(y_prob, home_rate))
return {
'model_accuracy': model_accuracy,
'home_always_baseline': home_always,
'coin_flip_baseline': coin_flip,
'improvement_over_home': model_accuracy - home_always,
'improvement_over_coin': model_accuracy - coin_flip,
'model_brier': model_brier,
'baseline_brier': baseline_brier,
'brier_skill_score': 1 - (model_brier / baseline_brier)
}
@staticmethod
def evaluate_against_spread(y_true_margin: np.ndarray,
predicted_spread: np.ndarray,
market_spread: np.ndarray) -> Dict:
"""
Evaluate predictions against the spread.
Parameters:
-----------
y_true_margin : np.ndarray
Actual game margins (home - away)
predicted_spread : np.ndarray
Model predicted spreads
market_spread : np.ndarray
Market/Vegas spreads
Returns:
--------
Dict : ATS metrics
"""
# Model covers when prediction is on correct side of market
model_pick = predicted_spread > market_spread
actual_cover = y_true_margin > market_spread
model_ats_accuracy = (model_pick == actual_cover).mean()
# MAE comparison
model_mae = np.abs(predicted_spread - y_true_margin).mean()
market_mae = np.abs(market_spread - y_true_margin).mean()
return {
'model_ats_accuracy': model_ats_accuracy,
'model_mae': model_mae,
'market_mae': market_mae,
'mae_vs_market': model_mae - market_mae,
'games_evaluated': len(y_true_margin)
}
18.5.2 Calibration Assessment
def assess_calibration(y_true: np.ndarray, y_prob: np.ndarray,
n_bins: int = 10) -> Dict:
"""
Comprehensive calibration assessment.
Parameters:
-----------
y_true : np.ndarray
Actual outcomes
y_prob : np.ndarray
Predicted probabilities
n_bins : int
Number of calibration bins
Returns:
--------
Dict : Calibration metrics and data
"""
    from sklearn.calibration import calibration_curve
    from sklearn.metrics import brier_score_loss
# Calculate calibration curve
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)
# Expected Calibration Error (ECE)
bin_boundaries = np.linspace(0, 1, n_bins + 1)
ece = 0
mce = 0 # Maximum Calibration Error
calibration_data = []
for i in range(n_bins):
low, high = bin_boundaries[i], bin_boundaries[i+1]
mask = (y_prob >= low) & (y_prob < high)
if mask.sum() > 0:
bin_accuracy = y_true[mask].mean()
bin_confidence = y_prob[mask].mean()
bin_size = mask.sum()
error = abs(bin_accuracy - bin_confidence)
ece += (bin_size / len(y_prob)) * error
mce = max(mce, error)
calibration_data.append({
'bin': f'{low:.1f}-{high:.1f}',
'n_samples': bin_size,
'avg_confidence': bin_confidence,
'actual_accuracy': bin_accuracy,
'calibration_error': error
})
return {
'ece': ece,
'mce': mce,
'calibration_curve': (prob_true, prob_pred),
'calibration_data': pd.DataFrame(calibration_data),
'brier_score': brier_score_loss(y_true, y_prob)
}
import matplotlib.pyplot as plt

def plot_calibration_and_distribution(y_true: np.ndarray,
y_prob: np.ndarray,
title: str = 'Model Calibration') -> plt.Figure:
"""
Plot calibration curve with prediction distribution.
"""
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10),
gridspec_kw={'height_ratios': [2, 1]})
# Calibration plot
from sklearn.calibration import calibration_curve
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
ax1.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
ax1.plot(prob_pred, prob_true, 's-', label='Model')
ax1.set_xlabel('Mean Predicted Probability')
ax1.set_ylabel('Fraction of Positives')
ax1.set_title(title)
ax1.legend(loc='lower right')
ax1.grid(True, alpha=0.3)
# Distribution of predictions
ax2.hist(y_prob, bins=50, edgecolor='black', alpha=0.7)
ax2.axvline(x=0.5, color='red', linestyle='--', label='Decision threshold')
ax2.set_xlabel('Predicted Probability')
ax2.set_ylabel('Count')
ax2.set_title('Distribution of Predictions')
ax2.legend()
fig.tight_layout()
return fig
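A short sketch of running the calibration assessment, here against synthetic probabilities so the call pattern is clear (real use would pass held-out game outcomes and model probabilities):
import numpy as np

rng = np.random.default_rng(42)
y_prob = rng.uniform(0.05, 0.95, size=500)              # synthetic predicted probabilities
y_true = (rng.uniform(size=500) < y_prob).astype(int)   # outcomes drawn to roughly match them

calibration = assess_calibration(y_true, y_prob, n_bins=10)
print(f"ECE: {calibration['ece']:.3f}, MCE: {calibration['mce']:.3f}")
print(calibration['calibration_data'][['bin', 'n_samples', 'actual_accuracy']])

fig = plot_calibration_and_distribution(y_true, y_prob, title='Synthetic Calibration Check')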
18.6 Production Deployment
18.6.1 Complete Prediction System
class ProductionGamePredictor:
"""
Production-ready game prediction system.
Combines Elo ratings, ML features, and ensemble prediction.
"""
def __init__(self, config: Dict = None):
"""
Initialize production predictor.
Parameters:
-----------
config : Dict
Configuration options
"""
self.config = config or self._default_config()
# Components
self.elo_system = EloRatingSystem(
k_factor=self.config['elo_k_factor'],
home_advantage=self.config['elo_home_advantage']
)
self.feature_pipeline = GamePredictionPipeline()
self.ensemble = EnsemblePredictor()
self.is_trained = False
self.training_metrics = None
def _default_config(self) -> Dict:
"""Default configuration."""
return {
'elo_k_factor': 20,
'elo_home_advantage': 65,
'elo_weight': 0.3,
'ml_weight': 0.7,
'calibration_method': 'isotonic',
'min_games_for_prediction': 3
}
def train(self, historical_games: pd.DataFrame,
team_history: pd.DataFrame,
test_season: int = 2023) -> Dict:
"""
Train the complete system.
Parameters:
-----------
historical_games : pd.DataFrame
Game results with scores
team_history : pd.DataFrame
Team performance data
test_season : int
Season to hold out for testing
Returns:
--------
Dict : Training results
"""
print("=" * 60)
print("TRAINING GAME PREDICTION SYSTEM")
print("=" * 60)
# 1. Build Elo ratings from history
print("\n1. Building Elo ratings...")
train_games = historical_games[historical_games['season'] < test_season]
for _, game in train_games.iterrows():
self.elo_system.update_ratings(
game['home_team'], game['away_team'],
game['home_score'], game['away_score']
)
print(f" Processed {len(train_games)} games")
# 2. Prepare ML features
print("\n2. Preparing ML features...")
X_train, y_train = self.feature_pipeline.prepare_training_data(
train_games, team_history
)
print(f" Generated {len(X_train.columns)} features")
# 3. Train ensemble
print("\n3. Training ensemble models...")
self.ensemble.add_default_models()
self.ensemble.fit(X_train, y_train, optimize_weights=True)
# 4. Evaluate on test set
print("\n4. Evaluating on test set...")
test_games = historical_games[historical_games['season'] >= test_season]
X_test, y_test = self.feature_pipeline.prepare_training_data(
test_games, team_history
)
self.training_metrics = self.ensemble.evaluate(X_test, y_test)
print(f"\n Test Results:")
print(f" Accuracy: {self.training_metrics['ensemble_accuracy']:.1%}")
print(f" AUC-ROC: {self.training_metrics['ensemble_auc']:.3f}")
print(f" Brier Score: {self.training_metrics['ensemble_brier']:.4f}")
self.is_trained = True
return self.training_metrics
def predict_game(self, home_team: str, away_team: str,
game_context: Dict = None) -> GamePrediction:
"""
Generate prediction for an upcoming game.
Parameters:
-----------
home_team : str
Home team name
away_team : str
Away team name
game_context : Dict
Additional context (date, venue, etc.)
Returns:
--------
GamePrediction : Complete prediction
"""
if not self.is_trained:
raise ValueError("System not trained. Call train() first.")
# Get Elo-based prediction
elo_prob = self.elo_system.expected_win_prob(home_team, away_team)
# For production, we would generate features from current team data
# Here we'll combine with Elo prediction
# Weighted combination (simplified for demo)
elo_weight = self.config['elo_weight']
ml_weight = self.config['ml_weight']
# In production, would get ML prediction from current features
# For now, use Elo as proxy
combined_prob = elo_prob # Simplified
# Calculate spread and confidence
predicted_spread = probability_to_spread(combined_prob)
# Confidence based on rating difference
home_rating = self.elo_system.get_rating(home_team)
away_rating = self.elo_system.get_rating(away_team)
rating_diff = abs(home_rating - away_rating)
confidence = min(rating_diff / 400, 1.0) # Normalize
return GamePrediction(
game_id=f"{home_team}_{away_team}_{game_context.get('date', 'unknown')}",
home_team=home_team,
away_team=away_team,
home_win_prob=combined_prob,
predicted_spread=predicted_spread,
predicted_total=55, # Simplified
confidence=confidence,
model_name='production_ensemble'
)
def predict_week(self, games: List[Dict]) -> pd.DataFrame:
"""
Generate predictions for a week of games.
Parameters:
-----------
games : List[Dict]
List of game dictionaries with home_team, away_team
Returns:
--------
pd.DataFrame : All predictions
"""
predictions = []
for game in games:
pred = self.predict_game(
game['home_team'],
game['away_team'],
game
)
predictions.append(pred.to_dict())
df = pd.DataFrame(predictions)
df = df.sort_values('home_win_prob', ascending=False)
return df
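Finally, a sketch of how the production system might be driven, assuming historical_games and team_history DataFrames in the formats used throughout this chapter (loading them from your data store is omitted):
predictor = ProductionGamePredictor()

# historical_games: season, date, home_team, away_team, home_score, away_score, home_win
# team_history: per-team game logs as expected by TeamStrengthFeatures
predictor.train(historical_games, team_history, test_season=2023)

week_games = [
    {'date': '2024-11-09', 'home_team': 'Team A', 'away_team': 'Team B'},
    {'date': '2024-11-09', 'home_team': 'Team C', 'away_team': 'Team D'},
]
print(predictor.predict_week(week_games))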
Summary
Game outcome prediction represents a sophisticated application of machine learning to sports analytics. Key takeaways from this chapter:
- Multiple Approaches: Effective prediction combines rating systems (Elo), statistical features, and machine learning models. No single approach dominates across all situations.
- Feature Engineering is Critical: The quality of features determines model performance more than algorithm choice. Focus on capturing team strength, matchup dynamics, and situational factors.
- Calibration Matters: Well-calibrated probabilities enable better decision-making. Always assess calibration alongside accuracy.
- Respect Uncertainty: Football is inherently unpredictable. Even perfect models would achieve only ~75-78% accuracy due to game variance.
- Evaluation Against Baselines: Always compare to meaningful baselines (home team advantage, market lines) to assess true model value.
- Temporal Validation: Use proper temporal splits to avoid data leakage and assess realistic performance.
- Ensemble Methods: Combining diverse models typically outperforms any single approach.
The next chapter extends these techniques to player performance forecasting, predicting individual player outcomes throughout a season.
Chapter 18 Exercises
See exercises.md for practice problems ranging from basic Elo implementation to building complete prediction systems.
Chapter 18 Code Examples
- example-01-elo-system.py: Complete Elo rating implementation
- example-02-feature-engineering.py: Advanced feature generation
- example-03-ml-pipeline.py: Full prediction pipeline with ensemble
- example-04-evaluation.py: Comprehensive model evaluation