Chapter 18: Introduction to Prediction Models
Introduction
Every NFL game generates a flood of predictions: Vegas point spreads, TV analyst picks, fantasy projections, and Twitter hot takes. But what separates a rigorous prediction model from a lucky guess? How do we build systems that consistently outperform random chance—or even the market?
This chapter introduces the foundations of NFL prediction modeling:
- What makes a prediction model - Components, inputs, and outputs
- Evaluation metrics - How to measure if your model actually works
- Common pitfalls - Why most prediction attempts fail
- Building blocks - The fundamental approaches we'll explore in subsequent chapters
By the end of this chapter, you'll understand what goes into a real prediction model and have a framework for evaluating any predictive system—yours or others'.
What Is a Prediction Model?
Definition
A prediction model is a systematic method for generating forecasts about uncertain future events based on available information. For NFL predictions, this typically means:
- Inputs: Historical data, team ratings, injuries, weather, etc.
- Process: Mathematical transformation of inputs to outputs
- Outputs: Predicted winner, point spread, win probability, player stats
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
@dataclass
class GamePrediction:
"""Standard prediction output format."""
game_id: str
home_team: str
away_team: str
# Core predictions
predicted_winner: str
home_win_probability: float
predicted_spread: float # Negative = home favored
predicted_total: float
# Confidence measures
confidence: float # 0-1 scale
model_uncertainty: float
# Optional detailed predictions
home_score: Optional[float] = None
away_score: Optional[float] = None
def basic_prediction_model(home_rating: float, away_rating: float,
home_field_advantage: float = 2.5) -> GamePrediction:
"""
Simplest possible prediction model.
Args:
home_rating: Home team strength (points above average)
away_rating: Away team strength (points above average)
home_field_advantage: HFA in points
Returns:
GamePrediction object
"""
# Predicted spread
spread = away_rating - home_rating - home_field_advantage
# Convert to win probability
    # Elo-style logistic conversion from spread to win probability (the divisor of 8 is a tuning choice)
home_wp = 1 / (1 + 10 ** (spread / 8))
# Predicted total (simplified)
league_avg_total = 45.0
total = league_avg_total + (home_rating + away_rating) / 2
# Predicted scores
home_score = (total / 2) - (spread / 2)
away_score = (total / 2) + (spread / 2)
return GamePrediction(
game_id="example",
home_team="HOME",
away_team="AWAY",
predicted_winner="HOME" if home_wp > 0.5 else "AWAY",
home_win_probability=round(home_wp, 3),
predicted_spread=round(spread, 1),
predicted_total=round(total, 1),
confidence=abs(home_wp - 0.5) * 2, # Higher when more certain
model_uncertainty=0.15, # Fixed for this simple model
home_score=round(home_score, 1),
away_score=round(away_score, 1)
)
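A quick usage sketch with hypothetical ratings (the numbers are illustrative, not fitted values):
# Hypothetical example: home team rated +4.0, away team rated -1.0
pred = basic_prediction_model(home_rating=4.0, away_rating=-1.0)
print(pred.predicted_spread)        # -7.5 (home favored by 7.5)
print(pred.home_win_probability)    # ~0.90 under this logistic
print(pred.home_score, pred.away_score)  # 27.0 19.5
Note that this mapping is aggressive: empirically, 7-point favorites win roughly three-quarters of the time, not 90%, so the divisor of 8 is a parameter worth revisiting once you start evaluating.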
The Prediction Pipeline
Every prediction model follows a similar pipeline:
[Raw Data] → [Feature Engineering] → [Model] → [Predictions] → [Evaluation]
↑ ↓
└──────────── [Feedback Loop] ←───────────────┘
class PredictionPipeline:
"""
Standard prediction model pipeline.
Example usage:
pipeline = PredictionPipeline()
pipeline.load_data(schedules, pbp)
pipeline.engineer_features()
pipeline.train_model()
predictions = pipeline.predict(upcoming_games)
pipeline.evaluate(predictions, results)
"""
def __init__(self):
self.raw_data = None
self.features = None
self.model = None
self.predictions = None
def load_data(self, schedules: pd.DataFrame,
pbp: pd.DataFrame = None) -> None:
"""Load raw data sources."""
self.schedules = schedules
self.pbp = pbp
# Filter to completed games for training
self.completed = schedules[schedules['home_score'].notna()].copy()
print(f"Loaded {len(self.completed)} completed games")
def engineer_features(self) -> pd.DataFrame:
"""
Transform raw data into model features.
This is where domain knowledge becomes crucial.
"""
features = []
for _, game in self.completed.iterrows():
# Calculate team ratings up to this point
home_rating = self._get_team_rating(game['home_team'], game)
away_rating = self._get_team_rating(game['away_team'], game)
features.append({
'game_id': game['game_id'],
'home_team': game['home_team'],
'away_team': game['away_team'],
'home_rating': home_rating,
'away_rating': away_rating,
'rating_diff': home_rating - away_rating,
'home_score': game['home_score'],
'away_score': game['away_score'],
'actual_spread': game['away_score'] - game['home_score']
})
self.features = pd.DataFrame(features)
return self.features
def _get_team_rating(self, team: str, game: pd.Series) -> float:
"""Get team rating based on games before this one."""
# Get prior games for this team
prior = self.completed[
(self.completed['game_id'] < game['game_id']) &
((self.completed['home_team'] == team) |
(self.completed['away_team'] == team))
]
if len(prior) < 3:
return 0.0 # Default to league average
# Simple rating: average point differential
margins = []
for _, g in prior.tail(8).iterrows(): # Last 8 games
if g['home_team'] == team:
margins.append(g['home_score'] - g['away_score'])
else:
margins.append(g['away_score'] - g['home_score'])
return np.mean(margins)
def train_model(self) -> None:
"""Train the prediction model."""
if self.features is None:
raise ValueError("Must engineer features first")
# For this simple model, we just validate the relationship
# between rating difference and actual spread
from scipy import stats
correlation = stats.pearsonr(
self.features['rating_diff'],
-self.features['actual_spread'] # Negate: positive diff = home wins
)
print(f"Rating vs Spread correlation: {correlation[0]:.3f}")
self.model = {'trained': True, 'correlation': correlation[0]}
def predict(self, games: pd.DataFrame) -> List[GamePrediction]:
"""Generate predictions for upcoming games."""
predictions = []
for _, game in games.iterrows():
home_rating = self._get_team_rating(game['home_team'], game)
away_rating = self._get_team_rating(game['away_team'], game)
pred = basic_prediction_model(home_rating, away_rating)
pred.game_id = game['game_id']
pred.home_team = game['home_team']
pred.away_team = game['away_team']
predictions.append(pred)
self.predictions = predictions
return predictions
def evaluate(self, predictions: List[GamePrediction],
results: pd.DataFrame) -> Dict:
"""Evaluate prediction accuracy."""
        evaluated = 0
        correct = 0
        total_error = 0.0
        brier_sum = 0.0
for pred in predictions:
result = results[results['game_id'] == pred.game_id]
            if len(result) == 0:
                continue
            evaluated += 1
            actual = result.iloc[0]
actual_winner = actual['home_team'] if \
actual['home_score'] > actual['away_score'] else actual['away_team']
actual_spread = actual['away_score'] - actual['home_score']
# Straight up accuracy
if pred.predicted_winner == actual_winner:
correct += 1
# Spread error
total_error += abs(pred.predicted_spread - actual_spread)
# Brier score
actual_home_win = 1 if actual_winner == pred.home_team else 0
brier_sum += (pred.home_win_probability - actual_home_win) ** 2
        n = evaluated  # Only count games that had a known result
        return {
            'games_evaluated': n,
            'straight_up_accuracy': correct / n if n > 0 else 0,
            'mean_absolute_error': total_error / n if n > 0 else 0,
            'brier_score': brier_sum / n if n > 0 else 0
        }
Types of NFL Predictions
1. Game Outcome Predictions
The most common prediction type: who wins?
@dataclass
class OutcomePrediction:
"""Simple win/loss prediction."""
home_team: str
away_team: str
predicted_winner: str
win_probability: float
confidence: str # "high", "medium", "low"
def classify_confidence(probability: float) -> str:
"""Classify prediction confidence level."""
deviation = abs(probability - 0.5)
if deviation > 0.25:
return "high"
elif deviation > 0.10:
return "medium"
else:
return "low"
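A brief, hypothetical sketch tying the dataclass and the confidence helper together:
# Hypothetical: the model gives the home team a 64% chance to win
prob = 0.64
pick = OutcomePrediction(
    home_team="HOME",
    away_team="AWAY",
    predicted_winner="HOME" if prob > 0.5 else "AWAY",
    win_probability=prob,
    confidence=classify_confidence(prob)  # 0.14 from even money -> "medium"
)
print(pick)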
2. Point Spread Predictions
Predicting the margin of victory:
@dataclass
class SpreadPrediction:
"""Point spread prediction."""
home_team: str
away_team: str
predicted_spread: float # Positive = away favored
spread_std: float # Uncertainty
cover_probability: float # If betting vs a line
def predict_against_spread(predicted_spread: float,
market_spread: float,
model_std: float = 13.5) -> Dict:
"""
Predict outcome against the spread.
Args:
predicted_spread: Model's predicted spread
market_spread: Vegas/market spread
model_std: Standard deviation of spread predictions
Returns:
Cover probabilities and edge
"""
from scipy import stats
# Difference between model and market
edge = predicted_spread - market_spread
# Probability home covers (beats the spread)
# If market_spread is -7 and we predict -10, home covers more often
    home_cover_prob = stats.norm.cdf(market_spread, loc=predicted_spread, scale=model_std)
return {
'predicted_spread': predicted_spread,
'market_spread': market_spread,
'edge': edge,
'home_cover_prob': round(home_cover_prob, 3),
'away_cover_prob': round(1 - home_cover_prob, 3),
'recommended_side': 'home' if home_cover_prob > 0.53 else
'away' if home_cover_prob < 0.47 else 'no_bet'
}
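A worked example using the hypothetical numbers from the comment above (the market has the home team at -7, the model says -10):
result = predict_against_spread(predicted_spread=-10.0, market_spread=-7.0)
print(result['edge'])              # -3.0: the model likes the home side by 3 points
print(result['home_cover_prob'])   # ~0.59 with a 13.5-point standard deviation
print(result['recommended_side'])  # 'home'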
3. Total Points Predictions
Predicting combined score:
def predict_total(home_offense: float, away_offense: float,
home_defense: float, away_defense: float,
pace_factor: float = 1.0) -> Dict:
"""
Predict game total points.
Args:
home_offense: Home team offensive rating (points/game)
away_offense: Away team offensive rating
home_defense: Home team defensive rating (points allowed/game)
away_defense: Away team defensive rating
pace_factor: Adjustment for game pace
    Returns:
        Expected points for each team and the predicted total
    """
# Expected points for each team
home_expected = (home_offense + away_defense) / 2 * pace_factor
away_expected = (away_offense + home_defense) / 2 * pace_factor
predicted_total = home_expected + away_expected
return {
'home_points': round(home_expected, 1),
'away_points': round(away_expected, 1),
'predicted_total': round(predicted_total, 1),
'total_std': 10.0 # Typical standard deviation
}
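A hypothetical usage sketch with made-up ratings: a strong home offense against a leaky away defense.
totals = predict_total(home_offense=27.0, away_offense=20.0,
                       home_defense=21.0, away_defense=25.0)
print(totals['home_points'])      # 26.0 = (27 + 25) / 2
print(totals['away_points'])      # 20.5 = (20 + 21) / 2
print(totals['predicted_total'])  # 46.5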
4. Season-Long Predictions
Predicting full season outcomes:
def predict_season_wins(team_rating: float,
schedule_sos: float,
games: int = 17) -> Dict:
"""
Predict season win total.
Args:
team_rating: Team's power rating
schedule_sos: Strength of schedule (opponent avg rating)
games: Number of games
"""
    from scipy import stats  # used for the playoff-probability estimate below
    # Expected win probability per game
avg_opponent = schedule_sos
rating_diff = team_rating - avg_opponent
# Average win probability
avg_wp = 1 / (1 + 10 ** (-rating_diff / 8))
# Add home field for half the games
home_wp = 1 / (1 + 10 ** (-(rating_diff + 2.5) / 8))
away_wp = 1 / (1 + 10 ** (-(rating_diff - 2.5) / 8))
blended_wp = (home_wp + away_wp) / 2
expected_wins = blended_wp * games
# Variance (binomial)
variance = games * blended_wp * (1 - blended_wp)
std = np.sqrt(variance)
return {
'expected_wins': round(expected_wins, 1),
'win_std': round(std, 1),
'win_range_90': (
max(0, round(expected_wins - 1.65 * std, 0)),
min(games, round(expected_wins + 1.65 * std, 0))
),
        'playoff_probability': round(
            1 - stats.norm.cdf(9.5, loc=expected_wins, scale=std), 2  # 9.5 ≈ a rough 10-win playoff threshold
        )
}
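A hypothetical sketch: a team rated 3 points better than average facing a roughly league-average schedule.
season = predict_season_wins(team_rating=3.0, schedule_sos=0.0)
print(season['expected_wins'])   # ~11.6 under this (aggressive) logistic
print(season['win_range_90'])    # wide 90% interval, roughly 8 to 15 wins
print(season['playoff_probability'])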
Evaluation Metrics
Why Evaluation Matters
A prediction model is only as good as its track record. Without proper evaluation, you can't distinguish skill from luck.
from dataclasses import dataclass
from typing import List
import numpy as np
@dataclass
class ModelEvaluation:
"""Comprehensive model evaluation results."""
# Sample size
n_predictions: int
# Accuracy metrics
straight_up_accuracy: float
ats_accuracy: float # Against the spread
# Calibration metrics
brier_score: float
log_loss: float
# Error metrics
mae_spread: float # Mean absolute error on spread
rmse_spread: float # Root mean squared error
# Comparison to baseline
vs_random: float # Improvement over 50%
vs_market: float # Improvement over market
class ModelEvaluator:
"""
Comprehensive prediction model evaluator.
Example usage:
evaluator = ModelEvaluator()
results = evaluator.evaluate(predictions, actuals)
evaluator.print_report(results)
"""
def evaluate(self, predictions: List[Dict],
actuals: pd.DataFrame) -> ModelEvaluation:
"""
Evaluate prediction accuracy.
Args:
predictions: List of prediction dicts with game_id, spread, probability
actuals: DataFrame with actual results
"""
n = len(predictions)
if n == 0:
raise ValueError("No predictions to evaluate")
        evaluated = 0    # Predictions with a known result
        correct_su = 0   # Straight up
        correct_ats = 0  # Against spread
        n_ats = 0        # Predictions graded against a market spread
        brier_sum = 0.0
        log_loss_sum = 0.0
        spread_errors = []
for pred in predictions:
actual = actuals[actuals['game_id'] == pred['game_id']]
            if len(actual) == 0:
                continue
            evaluated += 1
            actual = actual.iloc[0]
# Actual outcome
home_won = actual['home_score'] > actual['away_score']
actual_spread = actual['away_score'] - actual['home_score']
# Straight up accuracy
pred_home_wins = pred.get('home_win_prob', 0.5) > 0.5
if pred_home_wins == home_won:
correct_su += 1
# ATS accuracy (if market spread available)
            if 'market_spread' in pred and 'predicted_spread' in pred:
                n_ats += 1
                # Did our predicted side cover?
our_side = 'home' if pred['predicted_spread'] < pred['market_spread'] else 'away'
home_covered = actual_spread < pred['market_spread']
if (our_side == 'home' and home_covered) or \
(our_side == 'away' and not home_covered):
correct_ats += 1
# Brier score
prob = pred.get('home_win_prob', 0.5)
outcome = 1 if home_won else 0
brier_sum += (prob - outcome) ** 2
# Log loss
prob_clipped = np.clip(prob, 0.001, 0.999)
if home_won:
log_loss_sum -= np.log(prob_clipped)
else:
log_loss_sum -= np.log(1 - prob_clipped)
# Spread error
if 'predicted_spread' in pred:
spread_errors.append(pred['predicted_spread'] - actual_spread)
        if evaluated == 0:
            raise ValueError("No predictions matched the provided results")
        # Calculate metrics using only the games that were actually graded
        spread_errors = np.array(spread_errors)
        return ModelEvaluation(
            n_predictions=evaluated,
            straight_up_accuracy=correct_su / evaluated,
            ats_accuracy=correct_ats / n_ats if n_ats > 0 else 0,
            brier_score=brier_sum / evaluated,
            log_loss=log_loss_sum / evaluated,
            mae_spread=np.mean(np.abs(spread_errors)) if len(spread_errors) > 0 else 0,
            rmse_spread=np.sqrt(np.mean(spread_errors ** 2)) if len(spread_errors) > 0 else 0,
            vs_random=correct_su / evaluated - 0.5,
            vs_market=correct_ats / n_ats - 0.5 if n_ats > 0 else 0
        )
def print_report(self, evaluation: ModelEvaluation) -> None:
"""Print formatted evaluation report."""
print(f"\n{'='*50}")
print("Model Evaluation Report")
print(f"{'='*50}")
print(f"\n--- Sample Size ---")
print(f"Predictions evaluated: {evaluation.n_predictions}")
print(f"\n--- Accuracy ---")
print(f"Straight-up: {evaluation.straight_up_accuracy:.1%}")
print(f"Against spread: {evaluation.ats_accuracy:.1%}")
print(f"\n--- Calibration ---")
print(f"Brier score: {evaluation.brier_score:.4f}")
print(f"Log loss: {evaluation.log_loss:.4f}")
print(f"\n--- Spread Error ---")
print(f"MAE: {evaluation.mae_spread:.2f} points")
print(f"RMSE: {evaluation.rmse_spread:.2f} points")
print(f"\n--- vs Baselines ---")
print(f"vs Random (50%): {evaluation.vs_random:+.1%}")
print(f"vs Market: {evaluation.vs_market:+.1%}")
print(f"\n{'='*50}\n")
Key Metrics Explained
1. Straight-Up Accuracy
Simply: what percentage of winners did you predict correctly?
def calculate_straight_up_accuracy(predictions: List, actuals: List) -> float:
"""
Calculate straight-up prediction accuracy.
Baseline: ~50% (coin flip)
Good model: 55-60%
Elite model: 60-65%
Unrealistic: >70% sustained
"""
correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
return correct / len(predictions)
2. Against-the-Spread (ATS) Accuracy
For betting: do you beat the point spread more than 50% of the time?
def calculate_ats_accuracy(predictions: List[float],
market_spreads: List[float],
actual_spreads: List[float]) -> float:
"""
Calculate against-the-spread accuracy.
To profit betting spreads, need >52.4% (accounting for vig).
Args:
predictions: Your predicted spreads
market_spreads: Vegas spreads
actual_spreads: Actual game spreads
"""
correct = 0
for pred, market, actual in zip(predictions, market_spreads, actual_spreads):
# Which side would you bet?
bet_home = pred < market # You think home is better than market
# Did that side cover?
home_covered = actual < market
if bet_home == home_covered:
correct += 1
return correct / len(predictions)
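Where does 52.4% come from? At standard -110 pricing you risk 110 to win 100, so the break-even win rate p must satisfy 100p - 110(1 - p) = 0, i.e. p = 110/210 ≈ 0.524. A quick sketch:
def breakeven_win_rate(risk: float = 110, win: float = 100) -> float:
    """Win rate needed to break even when risking `risk` to win `win`."""
    return risk / (risk + win)
print(breakeven_win_rate())     # 0.5238... at standard -110 juice
print(breakeven_win_rate(105))  # ~0.512 at reduced -105 juice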
3. Brier Score
Measures probability calibration: are your 70% predictions winning 70% of the time?
def calculate_brier_score(probabilities: List[float],
outcomes: List[int]) -> float:
"""
Calculate Brier score.
Brier = mean((probability - outcome)^2)
Perfect: 0.0
Random (50%): 0.25
Good model: 0.20-0.22
"""
return np.mean([(p - o) ** 2 for p, o in zip(probabilities, outcomes)])
def analyze_calibration(probabilities: List[float],
outcomes: List[int],
n_bins: int = 10) -> pd.DataFrame:
"""
Analyze probability calibration by binning predictions.
Well-calibrated: predicted probability ≈ actual win rate
"""
bins = np.linspace(0, 1, n_bins + 1)
results = []
for i in range(n_bins):
mask = (np.array(probabilities) >= bins[i]) & \
(np.array(probabilities) < bins[i + 1])
if sum(mask) > 0:
bin_probs = np.array(probabilities)[mask]
bin_outcomes = np.array(outcomes)[mask]
results.append({
'bin_start': bins[i],
'bin_end': bins[i + 1],
'n_predictions': sum(mask),
'mean_probability': np.mean(bin_probs),
'actual_win_rate': np.mean(bin_outcomes),
'calibration_error': abs(np.mean(bin_probs) - np.mean(bin_outcomes))
})
return pd.DataFrame(results)
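A synthetic sketch of the calibration check: outcomes are drawn at exactly the stated probabilities, so the table should show small calibration errors.
rng = np.random.default_rng(0)
probs = list(rng.uniform(0.2, 0.8, size=1000))
outcomes = list((rng.uniform(size=1000) < probs).astype(int))
print(f"Brier: {calculate_brier_score(probs, outcomes):.3f}")
print(analyze_calibration(probs, outcomes)[
    ['mean_probability', 'actual_win_rate', 'calibration_error']
])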
4. Mean Absolute Error (MAE)
Average spread prediction error in points:
def calculate_mae(predicted_spreads: List[float],
actual_spreads: List[float]) -> float:
"""
Calculate mean absolute error on spreads.
NFL games have high variance: ~13-14 points standard deviation.
Good model MAE: 10-12 points
Market MAE: ~10 points
"""
return np.mean([abs(p - a) for p, a in zip(predicted_spreads, actual_spreads)])
Common Pitfalls
Pitfall 1: Overfitting
Training a model that works perfectly on past data but fails on new data:
def demonstrate_overfitting():
"""
Show how overfitting occurs.
"""
# Generate sample data
np.random.seed(42)
n = 100
# True relationship: spread ≈ rating_diff + noise
rating_diff = np.random.normal(0, 5, n)
noise = np.random.normal(0, 13, n) # NFL has ~13pt std dev
actual_spread = rating_diff + noise
# Overfit model: memorize training data
from sklearn.tree import DecisionTreeRegressor
# Deep tree memorizes noise
overfit_model = DecisionTreeRegressor(max_depth=None)
overfit_model.fit(rating_diff.reshape(-1, 1), actual_spread)
# Simple model: linear relationship
simple_coef = np.corrcoef(rating_diff, actual_spread)[0, 1] * \
np.std(actual_spread) / np.std(rating_diff)
# Training error
overfit_train_error = np.mean(np.abs(
overfit_model.predict(rating_diff.reshape(-1, 1)) - actual_spread
))
simple_train_error = np.mean(np.abs(
rating_diff * simple_coef - actual_spread
))
# Test on new data
new_rating_diff = np.random.normal(0, 5, n)
new_noise = np.random.normal(0, 13, n)
new_actual = new_rating_diff + new_noise
overfit_test_error = np.mean(np.abs(
overfit_model.predict(new_rating_diff.reshape(-1, 1)) - new_actual
))
simple_test_error = np.mean(np.abs(
new_rating_diff * simple_coef - new_actual
))
return {
'overfit_train_mae': overfit_train_error,
'overfit_test_mae': overfit_test_error,
'simple_train_mae': simple_train_error,
'simple_test_mae': simple_test_error,
'lesson': 'Overfit model: great training, poor testing'
}
Pitfall 2: Data Leakage
Using information that wouldn't be available at prediction time:
def demonstrate_data_leakage():
"""
Show common data leakage scenarios.
"""
leakage_examples = {
'using_final_season_stats': {
'problem': 'Using end-of-season stats to predict mid-season games',
'solution': 'Only use data available before each game'
},
'using_game_result_features': {
'problem': 'Including yards, turnovers from the game being predicted',
'solution': 'Strictly separate features from outcome'
},
'future_opponent_info': {
'problem': 'Using opponent stats from games after this one',
'solution': 'Time-aware feature engineering'
},
'injury_reports': {
'problem': 'Using game-day injury info for earlier predictions',
'solution': 'Only use info available at prediction time'
}
}
return leakage_examples
def proper_temporal_split(games: pd.DataFrame,
train_seasons: List[int],
test_seasons: List[int]) -> Tuple:
"""
Properly split data to avoid leakage.
Key: Test data must be strictly after training data.
"""
train = games[games['season'].isin(train_seasons)]
test = games[games['season'].isin(test_seasons)]
# Verify no overlap
assert train['season'].max() < test['season'].min(), \
"Training data must precede test data"
return train, test
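A brief usage sketch, assuming `games` is a schedule DataFrame (with a season column) that has already been loaded:
# Hypothetical split: train on 2018-2022, evaluate on 2023
train_games, test_games = proper_temporal_split(
    games,
    train_seasons=[2018, 2019, 2020, 2021, 2022],
    test_seasons=[2023]
)
print(f"{len(train_games)} training games, {len(test_games)} test games")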
Pitfall 3: Ignoring Variance
NFL outcomes have enormous randomness:
def understand_nfl_variance():
"""
Quantify NFL game variance.
"""
# NFL point spread standard deviation is ~13-14 points
spread_std = 13.5
# This means even large predicted edges have significant uncertainty
scenarios = [
{'predicted_edge': 3, 'description': 'Small edge'},
{'predicted_edge': 7, 'description': 'Medium edge'},
{'predicted_edge': 14, 'description': 'Large edge (rare)'}
]
from scipy import stats
results = []
for s in scenarios:
        # Probability the favored side wins outright (true margin greater than zero)
        win_prob = stats.norm.cdf(s['predicted_edge'] / spread_std)
        results.append({
            'edge': s['predicted_edge'],
            'description': s['description'],
            'win_probability': round(win_prob, 3),
            'upset_probability': round(1 - win_prob, 3)
})
return {
'spread_std': spread_std,
'scenarios': results,
        'key_insight': 'Even 7-point favorites lose outright roughly 30% of the time under this model'
}
Pitfall 4: Sample Size Illusions
Small samples create false confidence:
def sample_size_analysis():
"""
Show how sample size affects reliability.
"""
true_accuracy = 0.55 # Actual model skill
sample_sizes = [20, 50, 100, 200, 500, 1000]
results = []
for n in sample_sizes:
# Standard error of proportion
se = np.sqrt(true_accuracy * (1 - true_accuracy) / n)
# 95% confidence interval
ci_low = true_accuracy - 1.96 * se
ci_high = true_accuracy + 1.96 * se
# Probability of appearing >60% accurate by chance
from scipy import stats
prob_look_great = 1 - stats.norm.cdf((0.60 - true_accuracy) / se)
results.append({
'sample_size': n,
'ci_width': round(ci_high - ci_low, 3),
'ci_low': round(ci_low, 3),
'ci_high': round(ci_high, 3),
'prob_misleading': round(prob_look_great, 3)
})
return pd.DataFrame(results)
Building Blocks of Prediction Models
Component 1: Team Ratings
The foundation of most models—a single number representing team quality:
def create_simple_rating_system(games: pd.DataFrame) -> Dict[str, float]:
"""
Create basic team ratings from game results.
This is the simplest possible rating: average point differential.
"""
teams = set(games['home_team'].tolist() + games['away_team'].tolist())
ratings = {}
for team in teams:
home = games[games['home_team'] == team]
away = games[games['away_team'] == team]
home_diff = (home['home_score'] - home['away_score']).sum()
away_diff = (away['away_score'] - away['home_score']).sum()
total_games = len(home) + len(away)
ratings[team] = (home_diff + away_diff) / total_games if total_games > 0 else 0
# Normalize to mean 0
mean_rating = np.mean(list(ratings.values()))
ratings = {team: r - mean_rating for team, r in ratings.items()}
return ratings
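A tiny hand-made example (fake teams and scores) so the mechanics are visible:
toy_games = pd.DataFrame([
    {'home_team': 'AAA', 'away_team': 'BBB', 'home_score': 27, 'away_score': 20},
    {'home_team': 'BBB', 'away_team': 'CCC', 'home_score': 17, 'away_score': 24},
    {'home_team': 'CCC', 'away_team': 'AAA', 'home_score': 21, 'away_score': 28},
])
print(create_simple_rating_system(toy_games))
# After centering: AAA +7.0 (won both games by 7), BBB -7.0, CCC 0.0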
Component 2: Home Field Advantage
Account for the home team's edge:
def estimate_home_field_advantage(games: pd.DataFrame) -> Dict:
"""
Estimate home field advantage from historical data.
"""
margins = games['home_score'] - games['away_score']
home_wins = (margins > 0).sum()
total_games = len(games)
return {
'avg_margin': round(margins.mean(), 2),
'home_win_pct': round(home_wins / total_games, 3),
'median_margin': round(margins.median(), 1),
'std_margin': round(margins.std(), 2)
}
Component 3: Adjustments
Factors that modify the base prediction:
def calculate_adjustments(game: Dict, context: Dict) -> Dict:
"""
Calculate prediction adjustments for various factors.
"""
adjustments = {}
# Rest advantage
home_rest = context.get('home_days_rest', 7)
away_rest = context.get('away_days_rest', 7)
adjustments['rest'] = (home_rest - away_rest) * 0.5 # ~0.5 pts per day
# Travel
away_travel_miles = context.get('away_travel_miles', 0)
if away_travel_miles > 2000:
adjustments['travel'] = 1.5 # Long travel hurts away team
elif away_travel_miles > 1000:
adjustments['travel'] = 0.5
else:
adjustments['travel'] = 0
# Timezone (west to east is hardest)
tz_diff = context.get('timezone_diff', 0) # Positive = away traveling east
if tz_diff >= 2:
adjustments['timezone'] = 1.0
elif tz_diff <= -2:
adjustments['timezone'] = -0.5 # East to west is easier
else:
adjustments['timezone'] = 0
    # Weather (for outdoor games); positive values favor the home team
    if context.get('is_outdoor', False):
        temp = context.get('temperature', 65)
        wind = context.get('wind_speed', 5)
        if temp < 32:
            adjustments['cold'] = 0.5  # Extreme cold slightly favors the home team
        if wind > 15:
            adjustments['wind'] = 0.0  # High wind mainly suppresses totals; no spread impact assumed here
# Injuries (simplified)
home_injury_impact = context.get('home_injury_impact', 0)
away_injury_impact = context.get('away_injury_impact', 0)
adjustments['injuries'] = away_injury_impact - home_injury_impact
adjustments['total'] = sum(adjustments.values())
return adjustments
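A hypothetical context sketch: the away team is on one fewer day of rest, flying about 2,400 miles, and crossing two time zones for a cold outdoor game.
context = {
    'home_days_rest': 7, 'away_days_rest': 6,
    'away_travel_miles': 2400,
    'timezone_diff': 2,        # away team traveling west-to-east
    'is_outdoor': True, 'temperature': 28, 'wind_speed': 8,
}
adj = calculate_adjustments(game={}, context=context)
print(adj['total'])  # 0.5 (rest) + 1.5 (travel) + 1.0 (timezone) + 0.5 (cold) = 3.5 toward the home team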
Component 4: Uncertainty Quantification
Good models know what they don't know:
def quantify_prediction_uncertainty(base_spread: float,
sample_games: int,
rating_confidence: float) -> Dict:
"""
Quantify uncertainty in a prediction.
Args:
base_spread: Point spread prediction
sample_games: Number of games used for ratings
rating_confidence: Confidence in team ratings (0-1)
"""
# Base uncertainty: NFL game variance
base_std = 13.5
# Additional uncertainty from small samples
sample_uncertainty = 5 / np.sqrt(max(sample_games, 1))
# Rating uncertainty
rating_uncertainty = (1 - rating_confidence) * 3
# Combined uncertainty (root sum of squares)
total_std = np.sqrt(base_std**2 + sample_uncertainty**2 + rating_uncertainty**2)
# Confidence interval
from scipy import stats
ci_90 = stats.norm.interval(0.90, loc=base_spread, scale=total_std)
return {
'predicted_spread': base_spread,
'total_uncertainty': round(total_std, 1),
'ci_90_low': round(ci_90[0], 1),
'ci_90_high': round(ci_90[1], 1),
'components': {
'game_variance': base_std,
'sample_uncertainty': round(sample_uncertainty, 1),
'rating_uncertainty': round(rating_uncertainty, 1)
}
}
A Complete Simple Model
Putting it all together:
class SimpleNFLPredictor:
"""
A complete but simple NFL prediction model.
This demonstrates all the core components while remaining
interpretable and educational.
Example usage:
model = SimpleNFLPredictor()
model.fit(historical_games)
        prediction = model.predict(home_team='KC', away_team='BUF')
"""
def __init__(self, hfa: float = 2.5, recency_weight: float = 0.1):
"""
Initialize predictor.
Args:
hfa: Home field advantage in points
recency_weight: Weight decay for older games
"""
self.hfa = hfa
self.recency_weight = recency_weight
self.ratings = {}
self.is_fitted = False
def fit(self, games: pd.DataFrame) -> 'SimpleNFLPredictor':
"""
Fit the model to historical games.
Args:
games: DataFrame with home_team, away_team, home_score, away_score
"""
# Sort by date/week
games = games.sort_values(['season', 'week'])
# Calculate ratings using recency-weighted average
teams = set(games['home_team'].tolist() + games['away_team'].tolist())
for team in teams:
team_games = games[
(games['home_team'] == team) | (games['away_team'] == team)
]
margins = []
weights = []
for i, (_, game) in enumerate(team_games.iterrows()):
if game['home_team'] == team:
margin = game['home_score'] - game['away_score'] - self.hfa
else:
margin = game['away_score'] - game['home_score'] + self.hfa
margins.append(margin)
# More recent games weighted higher
weight = (1 - self.recency_weight) ** (len(team_games) - i - 1)
weights.append(weight)
if margins:
self.ratings[team] = np.average(margins, weights=weights)
else:
self.ratings[team] = 0
# Normalize to mean 0
mean_rating = np.mean(list(self.ratings.values()))
self.ratings = {team: r - mean_rating for team, r in self.ratings.items()}
self.is_fitted = True
return self
def predict(self, home_team: str, away_team: str,
adjustments: Dict = None) -> GamePrediction:
"""
Make a prediction for a single game.
Args:
home_team: Home team abbreviation
away_team: Away team abbreviation
adjustments: Optional adjustment factors
"""
if not self.is_fitted:
raise ValueError("Model must be fit before predicting")
home_rating = self.ratings.get(home_team, 0)
away_rating = self.ratings.get(away_team, 0)
# Base spread (negative = home favored)
spread = away_rating - home_rating - self.hfa
# Apply adjustments
if adjustments:
spread -= adjustments.get('total', 0)
# Convert to probability
home_wp = 1 / (1 + 10 ** (spread / 8))
# Predicted scores
avg_total = 45.0
home_score = (avg_total - spread) / 2
away_score = (avg_total + spread) / 2
return GamePrediction(
game_id=f"{home_team}_{away_team}",
home_team=home_team,
away_team=away_team,
predicted_winner=home_team if home_wp > 0.5 else away_team,
home_win_probability=round(home_wp, 3),
predicted_spread=round(spread, 1),
predicted_total=avg_total,
confidence=round(abs(home_wp - 0.5) * 2, 2),
model_uncertainty=13.5,
home_score=round(home_score, 1),
away_score=round(away_score, 1)
)
def predict_season(self, schedule: pd.DataFrame) -> pd.DataFrame:
"""
Predict all games in a schedule.
Args:
schedule: DataFrame with home_team, away_team for each game
"""
predictions = []
for _, game in schedule.iterrows():
pred = self.predict(game['home_team'], game['away_team'])
predictions.append({
'game_id': pred.game_id,
'home_team': pred.home_team,
'away_team': pred.away_team,
'predicted_winner': pred.predicted_winner,
'home_win_prob': pred.home_win_probability,
'spread': pred.predicted_spread
})
return pd.DataFrame(predictions)
def get_rankings(self) -> pd.DataFrame:
"""Get team rankings by rating."""
rankings = pd.DataFrame([
{'team': team, 'rating': rating}
for team, rating in self.ratings.items()
])
return rankings.sort_values('rating', ascending=False).reset_index(drop=True)
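A hedged end-to-end sketch. It assumes `historical_games` is a completed-games DataFrame with season, week, home_team, away_team, home_score, and away_score columns (e.g. filtered from an nflverse schedule); the loading step is up to you.
model = SimpleNFLPredictor(hfa=2.5, recency_weight=0.1)
model.fit(historical_games)
# Single game
print(model.predict('KC', 'BUF'))
# Implied power rankings
print(model.get_rankings().head(10))
# A slate of upcoming games (home_team / away_team columns only)
upcoming = pd.DataFrame([
    {'home_team': 'KC', 'away_team': 'BUF'},
    {'home_team': 'PHI', 'away_team': 'DAL'},
])
print(model.predict_season(upcoming))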
Summary
Key Concepts:
- Prediction models are systematic - Not guesses, but mathematical transformations of data
- Evaluation is essential - Without proper metrics, you can't distinguish skill from luck
- Pitfalls are everywhere - Overfitting, leakage, variance, and small samples trap many modelers
- Building blocks combine - Ratings + HFA + adjustments + uncertainty = prediction
What Makes a Good Model:
| Aspect | Poor Model | Good Model |
|---|---|---|
| Inputs | One or two factors | Multiple relevant features |
| Evaluation | "Feels right" | Rigorous backtesting |
| Uncertainty | Ignored | Quantified |
| Updates | Static | Learns from new data |
Next Steps:
The following chapters will build on these foundations:
- Chapter 19: Elo and Power Ratings - Sophisticated team rating systems
- Chapter 20: Machine Learning - Advanced modeling techniques
- Chapter 21: Game Simulation - Monte Carlo approaches
- Chapter 22: Betting Markets - Using market information
Preview: Chapter 19
Next, we'll dive deep into Elo and Power Ratings—the backbone of most NFL prediction systems. You'll learn how to build rating systems that automatically adjust based on game results, handle margin of victory, and account for opponent strength.
References
- Silver, N. (2012). The Signal and the Noise
- Winston, W. L. (2012). Mathletics
- Football Outsiders. "DVOA Methodology"
- FiveThirtyEight. "How Our NFL Predictions Work"
- Brier, G. W. (1950). "Verification of Forecasts Expressed in Terms of Probability"