Chapter 18: Introduction to Prediction Models

Introduction

Every NFL game generates a flood of predictions: Vegas point spreads, TV analyst picks, fantasy projections, and Twitter hot takes. But what separates a rigorous prediction model from a lucky guess? How do we build systems that consistently outperform random chance—or even the market?

This chapter introduces the foundations of NFL prediction modeling:

  • What makes a prediction model - Components, inputs, and outputs
  • Evaluation metrics - How to measure if your model actually works
  • Common pitfalls - Why most prediction attempts fail
  • Building blocks - The fundamental approaches we'll explore in subsequent chapters

By the end of this chapter, you'll understand what goes into a real prediction model and have a framework for evaluating any predictive system—yours or others'.


What Is a Prediction Model?

Definition

A prediction model is a systematic method for generating forecasts about uncertain future events based on available information. For NFL predictions, this typically means:

  • Inputs: Historical data, team ratings, injuries, weather, etc.
  • Process: Mathematical transformation of inputs to outputs
  • Outputs: Predicted winner, point spread, win probability, player stats
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass

@dataclass
class GamePrediction:
    """Standard prediction output format."""
    game_id: str
    home_team: str
    away_team: str

    # Core predictions
    predicted_winner: str
    home_win_probability: float
    predicted_spread: float  # Negative = home favored
    predicted_total: float

    # Confidence measures
    confidence: float  # 0-1 scale
    model_uncertainty: float

    # Optional detailed predictions
    home_score: Optional[float] = None
    away_score: Optional[float] = None


def basic_prediction_model(home_rating: float, away_rating: float,
                           home_field_advantage: float = 2.5) -> GamePrediction:
    """
    Simplest possible prediction model.

    Args:
        home_rating: Home team strength (points above average)
        away_rating: Away team strength (points above average)
        home_field_advantage: HFA in points

    Returns:
        GamePrediction object
    """
    # Predicted spread
    spread = away_rating - home_rating - home_field_advantage

    # Convert to win probability with an Elo-style logistic curve;
    # the divisor controls how quickly probability moves with the spread
    home_wp = 1 / (1 + 10 ** (spread / 8))

    # Predicted total (simplified)
    league_avg_total = 45.0
    total = league_avg_total + (home_rating + away_rating) / 2

    # Predicted scores
    home_score = (total / 2) - (spread / 2)
    away_score = (total / 2) + (spread / 2)

    return GamePrediction(
        game_id="example",
        home_team="HOME",
        away_team="AWAY",
        predicted_winner="HOME" if home_wp > 0.5 else "AWAY",
        home_win_probability=round(home_wp, 3),
        predicted_spread=round(spread, 1),
        predicted_total=round(total, 1),
        confidence=abs(home_wp - 0.5) * 2,  # Higher when more certain
        model_uncertainty=0.15,  # Fixed for this simple model
        home_score=round(home_score, 1),
        away_score=round(away_score, 1)
    )
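
For example, with hypothetical ratings of +3.0 for the home team and -2.0 for the visitor (made-up numbers, not derived from real data), the model produces a home-favored spread. The divisor of 8 makes the logistic curve fairly steep, so tuning it against historical results is a natural refinement:

# Hypothetical ratings, purely for illustration
example = basic_prediction_model(home_rating=3.0, away_rating=-2.0)
print(example.predicted_spread)      # -7.5 (home favored by 7.5)
print(example.home_win_probability)  # ~0.90 under this logistic curve
print(example.predicted_total)       # 45.5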

The Prediction Pipeline

Every prediction model follows a similar pipeline:

[Raw Data] → [Feature Engineering] → [Model] → [Predictions] → [Evaluation]
     ↑                                              ↓
     └──────────── [Feedback Loop] ←───────────────┘
class PredictionPipeline:
    """
    Standard prediction model pipeline.

    Example usage:
        pipeline = PredictionPipeline()
        pipeline.load_data(schedules, pbp)
        pipeline.engineer_features()
        pipeline.train_model()
        predictions = pipeline.predict(upcoming_games)
        pipeline.evaluate(predictions, results)
    """

    def __init__(self):
        self.raw_data = None
        self.features = None
        self.model = None
        self.predictions = None

    def load_data(self, schedules: pd.DataFrame,
                  pbp: pd.DataFrame = None) -> None:
        """Load raw data sources."""
        self.schedules = schedules
        self.pbp = pbp

        # Filter to completed games for training
        self.completed = schedules[schedules['home_score'].notna()].copy()
        print(f"Loaded {len(self.completed)} completed games")

    def engineer_features(self) -> pd.DataFrame:
        """
        Transform raw data into model features.

        This is where domain knowledge becomes crucial.
        """
        features = []

        for _, game in self.completed.iterrows():
            # Calculate team ratings up to this point
            home_rating = self._get_team_rating(game['home_team'], game)
            away_rating = self._get_team_rating(game['away_team'], game)

            features.append({
                'game_id': game['game_id'],
                'home_team': game['home_team'],
                'away_team': game['away_team'],
                'home_rating': home_rating,
                'away_rating': away_rating,
                'rating_diff': home_rating - away_rating,
                'home_score': game['home_score'],
                'away_score': game['away_score'],
                'actual_spread': game['away_score'] - game['home_score']
            })

        self.features = pd.DataFrame(features)
        return self.features

    def _get_team_rating(self, team: str, game: pd.Series) -> float:
        """Get team rating based on games before this one."""
        # Get prior games for this team
        prior = self.completed[
            (self.completed['game_id'] < game['game_id']) &
            ((self.completed['home_team'] == team) |
             (self.completed['away_team'] == team))
        ]

        if len(prior) < 3:
            return 0.0  # Default to league average

        # Simple rating: average point differential
        margins = []
        for _, g in prior.tail(8).iterrows():  # Last 8 games
            if g['home_team'] == team:
                margins.append(g['home_score'] - g['away_score'])
            else:
                margins.append(g['away_score'] - g['home_score'])

        return np.mean(margins)

    def train_model(self) -> None:
        """Train the prediction model."""
        if self.features is None:
            raise ValueError("Must engineer features first")

        # For this simple model, we just validate the relationship
        # between rating difference and actual spread
        from scipy import stats

        correlation = stats.pearsonr(
            self.features['rating_diff'],
            -self.features['actual_spread']  # Negate: positive diff = home wins
        )

        print(f"Rating vs Spread correlation: {correlation[0]:.3f}")
        self.model = {'trained': True, 'correlation': correlation[0]}

    def predict(self, games: pd.DataFrame) -> List[GamePrediction]:
        """Generate predictions for upcoming games."""
        predictions = []

        for _, game in games.iterrows():
            home_rating = self._get_team_rating(game['home_team'], game)
            away_rating = self._get_team_rating(game['away_team'], game)

            pred = basic_prediction_model(home_rating, away_rating)
            pred.game_id = game['game_id']
            pred.home_team = game['home_team']
            pred.away_team = game['away_team']

            predictions.append(pred)

        self.predictions = predictions
        return predictions

    def evaluate(self, predictions: List[GamePrediction],
                 results: pd.DataFrame) -> Dict:
        """Evaluate prediction accuracy."""
        evaluated = 0
        correct = 0
        total_error = 0
        brier_sum = 0

        for pred in predictions:
            result = results[results['game_id'] == pred.game_id]
            if len(result) == 0:
                continue

            actual = result.iloc[0]
            evaluated += 1
            actual_winner = actual['home_team'] if \
                actual['home_score'] > actual['away_score'] else actual['away_team']
            actual_spread = actual['away_score'] - actual['home_score']

            # Straight up accuracy
            if pred.predicted_winner == actual_winner:
                correct += 1

            # Spread error
            total_error += abs(pred.predicted_spread - actual_spread)

            # Brier score
            actual_home_win = 1 if actual_winner == pred.home_team else 0
            brier_sum += (pred.home_win_probability - actual_home_win) ** 2

        n = evaluated  # Only count predictions with a matching result
        return {
            'games_evaluated': n,
            'straight_up_accuracy': correct / n if n > 0 else 0,
            'mean_absolute_error': total_error / n if n > 0 else 0,
            'brier_score': brier_sum / n if n > 0 else 0
        }

Types of NFL Predictions

1. Game Outcome Predictions

The most common prediction type: who wins?

@dataclass
class OutcomePrediction:
    """Simple win/loss prediction."""
    home_team: str
    away_team: str
    predicted_winner: str
    win_probability: float
    confidence: str  # "high", "medium", "low"

def classify_confidence(probability: float) -> str:
    """Classify prediction confidence level."""
    deviation = abs(probability - 0.5)
    if deviation > 0.25:
        return "high"
    elif deviation > 0.10:
        return "medium"
    else:
        return "low"

2. Point Spread Predictions

Predicting the margin of victory:

@dataclass
class SpreadPrediction:
    """Point spread prediction."""
    home_team: str
    away_team: str
    predicted_spread: float  # Positive = away favored
    spread_std: float  # Uncertainty
    cover_probability: float  # If betting vs a line

def predict_against_spread(predicted_spread: float,
                           market_spread: float,
                           model_std: float = 13.5) -> Dict:
    """
    Predict outcome against the spread.

    Args:
        predicted_spread: Model's predicted spread
        market_spread: Vegas/market spread
        model_std: Standard deviation of spread predictions

    Returns:
        Cover probabilities and edge
    """
    from scipy import stats

    # Difference between model and market
    edge = predicted_spread - market_spread

    # Probability home covers: the actual (away - home) margin lands below the market line
    # If market_spread is -7 and we predict -10, home covers more often than not
    home_cover_prob = stats.norm.cdf(market_spread, loc=predicted_spread, scale=model_std)

    return {
        'predicted_spread': predicted_spread,
        'market_spread': market_spread,
        'edge': edge,
        'home_cover_prob': round(home_cover_prob, 3),
        'away_cover_prob': round(1 - home_cover_prob, 3),
        'recommended_side': 'home' if home_cover_prob > 0.53 else
                           'away' if home_cover_prob < 0.47 else 'no_bet'
    }
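
As a quick illustration with made-up numbers, suppose our model makes the home team a 10-point favorite while the market has them at 7:

# Hypothetical numbers: model says home by 10, market says home by 7
ats = predict_against_spread(predicted_spread=-10.0, market_spread=-7.0)
print(ats['edge'])             # -3.0 (model likes the home side more than the market does)
print(ats['home_cover_prob'])  # ~0.59 with a 13.5-point standard deviation
print(ats['recommended_side']) # 'home'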

3. Total Points Predictions

Predicting combined score:

def predict_total(home_offense: float, away_offense: float,
                  home_defense: float, away_defense: float,
                  pace_factor: float = 1.0) -> Dict:
    """
    Predict game total points.

    Args:
        home_offense: Home team offensive rating (points/game)
        away_offense: Away team offensive rating
        home_defense: Home team defensive rating (points allowed/game)
        away_defense: Away team defensive rating
        pace_factor: Adjustment for game pace

    Returns:
        Predicted team scores, game total, and a typical total standard deviation
    """

    # Expected points for each team
    home_expected = (home_offense + away_defense) / 2 * pace_factor
    away_expected = (away_offense + home_defense) / 2 * pace_factor

    predicted_total = home_expected + away_expected

    return {
        'home_points': round(home_expected, 1),
        'away_points': round(away_expected, 1),
        'predicted_total': round(predicted_total, 1),
        'total_std': 10.0  # Typical standard deviation
    }
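
For example, with hypothetical per-game ratings (a strong home offense against a generous away defense, and so on):

# Hypothetical ratings: points scored per game (offense) and allowed per game (defense)
total_pred = predict_total(home_offense=27.0, away_offense=21.0,
                           home_defense=20.0, away_defense=24.0)
print(total_pred['home_points'])      # 25.5 = (27.0 + 24.0) / 2
print(total_pred['away_points'])      # 20.5 = (21.0 + 20.0) / 2
print(total_pred['predicted_total'])  # 46.0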

4. Season-Long Predictions

Predicting full season outcomes:

def predict_season_wins(team_rating: float,
                        schedule_sos: float,
                        games: int = 17) -> Dict:
    """
    Predict season win total.

    Args:
        team_rating: Team's power rating
        schedule_sos: Strength of schedule (opponent avg rating)
        games: Number of games
    """
    from scipy import stats

    # Rating difference vs. the average opponent faced
    rating_diff = team_rating - schedule_sos

    # Add home field for half the games
    home_wp = 1 / (1 + 10 ** (-(rating_diff + 2.5) / 8))
    away_wp = 1 / (1 + 10 ** (-(rating_diff - 2.5) / 8))
    blended_wp = (home_wp + away_wp) / 2

    expected_wins = blended_wp * games

    # Variance (binomial)
    variance = games * blended_wp * (1 - blended_wp)
    std = np.sqrt(variance)

    return {
        'expected_wins': round(expected_wins, 1),
        'win_std': round(std, 1),
        'win_range_90': (
            max(0, round(expected_wins - 1.65 * std, 0)),
            min(games, round(expected_wins + 1.65 * std, 0))
        ),
        'playoff_probability': round(
            1 - stats.norm.cdf(9.5, loc=expected_wins, scale=std), 2
        )  # Rough proxy: probability of finishing with 10 or more wins
    }
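
A quick sketch with made-up inputs, a team rated about 3 points above average facing a roughly league-average schedule:

# Hypothetical inputs: +3.0 power rating, league-average opponents (sos = 0)
season = predict_season_wins(team_rating=3.0, schedule_sos=0.0)
print(season['expected_wins'], '+/-', season['win_std'])
print(season['win_range_90'], season['playoff_probability'])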

Evaluation Metrics

Why Evaluation Matters

A prediction model is only as good as its track record. Without proper evaluation, you can't distinguish skill from luck.

from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class ModelEvaluation:
    """Comprehensive model evaluation results."""

    # Sample size
    n_predictions: int

    # Accuracy metrics
    straight_up_accuracy: float
    ats_accuracy: float  # Against the spread

    # Calibration metrics
    brier_score: float
    log_loss: float

    # Error metrics
    mae_spread: float  # Mean absolute error on spread
    rmse_spread: float  # Root mean squared error

    # Comparison to baseline
    vs_random: float  # Improvement over 50%
    vs_market: float  # Improvement over market


class ModelEvaluator:
    """
    Comprehensive prediction model evaluator.

    Example usage:
        evaluator = ModelEvaluator()
        results = evaluator.evaluate(predictions, actuals)
        evaluator.print_report(results)
    """

    def evaluate(self, predictions: List[Dict],
                 actuals: pd.DataFrame) -> ModelEvaluation:
        """
        Evaluate prediction accuracy.

        Args:
            predictions: List of prediction dicts with game_id, spread, probability
            actuals: DataFrame with actual results
        """
        if len(predictions) == 0:
            raise ValueError("No predictions to evaluate")

        n = 0        # Predictions with a matching result
        n_ats = 0    # Predictions with a market spread to compare against
        correct_su = 0  # Straight up
        correct_ats = 0  # Against spread
        brier_sum = 0
        log_loss_sum = 0
        spread_errors = []

        for pred in predictions:
            actual = actuals[actuals['game_id'] == pred['game_id']]
            if len(actual) == 0:
                continue

            actual = actual.iloc[0]
            n += 1

            # Actual outcome
            home_won = actual['home_score'] > actual['away_score']
            actual_spread = actual['away_score'] - actual['home_score']

            # Straight up accuracy
            pred_home_wins = pred.get('home_win_prob', 0.5) > 0.5
            if pred_home_wins == home_won:
                correct_su += 1

            # ATS accuracy (if market spread available)
            if 'market_spread' in pred and 'predicted_spread' in pred:
                n_ats += 1
                # Did our predicted side cover?
                our_side = 'home' if pred['predicted_spread'] < pred['market_spread'] else 'away'
                home_covered = actual_spread < pred['market_spread']

                if (our_side == 'home' and home_covered) or \
                   (our_side == 'away' and not home_covered):
                    correct_ats += 1

            # Brier score
            prob = pred.get('home_win_prob', 0.5)
            outcome = 1 if home_won else 0
            brier_sum += (prob - outcome) ** 2

            # Log loss
            prob_clipped = np.clip(prob, 0.001, 0.999)
            if home_won:
                log_loss_sum -= np.log(prob_clipped)
            else:
                log_loss_sum -= np.log(1 - prob_clipped)

            # Spread error
            if 'predicted_spread' in pred:
                spread_errors.append(pred['predicted_spread'] - actual_spread)

        # Calculate metrics
        spread_errors = np.array(spread_errors)

        if n == 0:
            raise ValueError("No predictions matched the provided results")

        ats_accuracy = correct_ats / n_ats if n_ats > 0 else 0

        return ModelEvaluation(
            n_predictions=n,
            straight_up_accuracy=correct_su / n,
            ats_accuracy=ats_accuracy,
            brier_score=brier_sum / n,
            log_loss=log_loss_sum / n,
            mae_spread=np.mean(np.abs(spread_errors)) if len(spread_errors) > 0 else 0,
            rmse_spread=np.sqrt(np.mean(spread_errors ** 2)) if len(spread_errors) > 0 else 0,
            vs_random=correct_su / n - 0.5,
            vs_market=ats_accuracy - 0.5 if n_ats > 0 else 0
        )

    def print_report(self, evaluation: ModelEvaluation) -> None:
        """Print formatted evaluation report."""
        print(f"\n{'='*50}")
        print("Model Evaluation Report")
        print(f"{'='*50}")

        print(f"\n--- Sample Size ---")
        print(f"Predictions evaluated: {evaluation.n_predictions}")

        print(f"\n--- Accuracy ---")
        print(f"Straight-up:       {evaluation.straight_up_accuracy:.1%}")
        print(f"Against spread:    {evaluation.ats_accuracy:.1%}")

        print(f"\n--- Calibration ---")
        print(f"Brier score:       {evaluation.brier_score:.4f}")
        print(f"Log loss:          {evaluation.log_loss:.4f}")

        print(f"\n--- Spread Error ---")
        print(f"MAE:               {evaluation.mae_spread:.2f} points")
        print(f"RMSE:              {evaluation.rmse_spread:.2f} points")

        print(f"\n--- vs Baselines ---")
        print(f"vs Random (50%):   {evaluation.vs_random:+.1%}")
        print(f"vs Market:         {evaluation.vs_market:+.1%}")

        print(f"\n{'='*50}\n")

Key Metrics Explained

1. Straight-Up Accuracy

Simply: what percentage of winners did you predict correctly?

def calculate_straight_up_accuracy(predictions: List, actuals: List) -> float:
    """
    Calculate straight-up prediction accuracy.

    Baseline: ~50% (coin flip)
    Good model: 55-60%
    Elite model: 60-65%
    Unrealistic: >70% sustained
    """
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(predictions)

2. Against-the-Spread (ATS) Accuracy

For betting: do you beat the point spread more than 50% of the time?

def calculate_ats_accuracy(predictions: List[float],
                           market_spreads: List[float],
                           actual_spreads: List[float]) -> float:
    """
    Calculate against-the-spread accuracy.

    To profit betting spreads, need >52.4% (accounting for vig).

    Args:
        predictions: Your predicted spreads
        market_spreads: Vegas spreads
        actual_spreads: Actual game spreads
    """
    correct = 0
    for pred, market, actual in zip(predictions, market_spreads, actual_spreads):
        # Which side would you bet?
        bet_home = pred < market  # You think home is better than market

        # Did that side cover?
        home_covered = actual < market

        if bet_home == home_covered:
            correct += 1

    return correct / len(predictions)
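
The 52.4% figure follows from standard -110 pricing: you risk 110 to win 100, so expected profit is zero when your win rate equals 110 / 210. A tiny helper (added here for illustration, not part of any library) makes the arithmetic explicit:

def ats_breakeven(risk: float = 110.0, win: float = 100.0) -> float:
    """Break-even win rate for spread bets at the given odds."""
    # Expected profit is zero when win * p = risk * (1 - p),
    # which solves to p = risk / (risk + win)
    return risk / (risk + win)

print(f"Break-even: {ats_breakeven():.3f}")  # 0.524 at standard -110 odds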

3. Brier Score

Measures probability calibration: are your 70% predictions winning 70% of the time?

def calculate_brier_score(probabilities: List[float],
                          outcomes: List[int]) -> float:
    """
    Calculate Brier score.

    Brier = mean((probability - outcome)^2)

    Perfect: 0.0
    Random (50%): 0.25
    Good model: 0.20-0.22
    """
    return np.mean([(p - o) ** 2 for p, o in zip(probabilities, outcomes)])


def analyze_calibration(probabilities: List[float],
                        outcomes: List[int],
                        n_bins: int = 10) -> pd.DataFrame:
    """
    Analyze probability calibration by binning predictions.

    Well-calibrated: predicted probability ≈ actual win rate
    """
    bins = np.linspace(0, 1, n_bins + 1)
    results = []

    for i in range(n_bins):
        mask = (np.array(probabilities) >= bins[i]) & \
               (np.array(probabilities) < bins[i + 1])

        if sum(mask) > 0:
            bin_probs = np.array(probabilities)[mask]
            bin_outcomes = np.array(outcomes)[mask]

            results.append({
                'bin_start': bins[i],
                'bin_end': bins[i + 1],
                'n_predictions': sum(mask),
                'mean_probability': np.mean(bin_probs),
                'actual_win_rate': np.mean(bin_outcomes),
                'calibration_error': abs(np.mean(bin_probs) - np.mean(bin_outcomes))
            })

    return pd.DataFrame(results)
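
To see what a calibration table looks like, here is a minimal sketch on synthetic data, where outcomes are drawn to match the stated probabilities; a well-calibrated model shows actual win rates close to each bin's mean probability:

# Synthetic, illustrative data only
rng = np.random.default_rng(0)
probs = rng.uniform(0.2, 0.8, size=500)
outcomes = (rng.uniform(size=500) < probs).astype(int)  # Outcomes consistent with the probabilities

calibration = analyze_calibration(list(probs), list(outcomes))
print(calibration[['mean_probability', 'actual_win_rate', 'calibration_error']])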

4. Mean Absolute Error (MAE)

Average spread prediction error in points:

def calculate_mae(predicted_spreads: List[float],
                  actual_spreads: List[float]) -> float:
    """
    Calculate mean absolute error on spreads.

    NFL games have high variance: ~13-14 points standard deviation.

    Good model MAE: 10-12 points
    Market MAE: ~10 points
    """
    return np.mean([abs(p - a) for p, a in zip(predicted_spreads, actual_spreads)])

Common Pitfalls

Pitfall 1: Overfitting

Training a model that works perfectly on past data but fails on new data:

def demonstrate_overfitting():
    """
    Show how overfitting occurs.
    """
    # Generate sample data
    np.random.seed(42)
    n = 100

    # True relationship: spread ≈ rating_diff + noise
    rating_diff = np.random.normal(0, 5, n)
    noise = np.random.normal(0, 13, n)  # NFL has ~13pt std dev
    actual_spread = rating_diff + noise

    # Overfit model: memorize training data
    from sklearn.tree import DecisionTreeRegressor

    # Deep tree memorizes noise
    overfit_model = DecisionTreeRegressor(max_depth=None)
    overfit_model.fit(rating_diff.reshape(-1, 1), actual_spread)

    # Simple model: linear relationship
    simple_coef = np.corrcoef(rating_diff, actual_spread)[0, 1] * \
                  np.std(actual_spread) / np.std(rating_diff)

    # Training error
    overfit_train_error = np.mean(np.abs(
        overfit_model.predict(rating_diff.reshape(-1, 1)) - actual_spread
    ))
    simple_train_error = np.mean(np.abs(
        rating_diff * simple_coef - actual_spread
    ))

    # Test on new data
    new_rating_diff = np.random.normal(0, 5, n)
    new_noise = np.random.normal(0, 13, n)
    new_actual = new_rating_diff + new_noise

    overfit_test_error = np.mean(np.abs(
        overfit_model.predict(new_rating_diff.reshape(-1, 1)) - new_actual
    ))
    simple_test_error = np.mean(np.abs(
        new_rating_diff * simple_coef - new_actual
    ))

    return {
        'overfit_train_mae': overfit_train_error,
        'overfit_test_mae': overfit_test_error,
        'simple_train_mae': simple_train_error,
        'simple_test_mae': simple_test_error,
        'lesson': 'Overfit model: great training, poor testing'
    }

Pitfall 2: Data Leakage

Using information that wouldn't be available at prediction time:

def demonstrate_data_leakage():
    """
    Show common data leakage scenarios.
    """
    leakage_examples = {
        'using_final_season_stats': {
            'problem': 'Using end-of-season stats to predict mid-season games',
            'solution': 'Only use data available before each game'
        },
        'using_game_result_features': {
            'problem': 'Including yards, turnovers from the game being predicted',
            'solution': 'Strictly separate features from outcome'
        },
        'future_opponent_info': {
            'problem': 'Using opponent stats from games after this one',
            'solution': 'Time-aware feature engineering'
        },
        'injury_reports': {
            'problem': 'Using game-day injury info for earlier predictions',
            'solution': 'Only use info available at prediction time'
        }
    }

    return leakage_examples


def proper_temporal_split(games: pd.DataFrame,
                          train_seasons: List[int],
                          test_seasons: List[int]) -> Tuple:
    """
    Properly split data to avoid leakage.

    Key: Test data must be strictly after training data.
    """
    train = games[games['season'].isin(train_seasons)]
    test = games[games['season'].isin(test_seasons)]

    # Verify no overlap
    assert train['season'].max() < test['season'].min(), \
        "Training data must precede test data"

    return train, test

Pitfall 3: Ignoring Variance

NFL outcomes have enormous randomness:

def understand_nfl_variance():
    """
    Quantify NFL game variance.
    """
    # NFL point spread standard deviation is ~13-14 points
    spread_std = 13.5

    # This means even large predicted edges have significant uncertainty
    scenarios = [
        {'predicted_edge': 3, 'description': 'Small edge'},
        {'predicted_edge': 7, 'description': 'Medium edge'},
        {'predicted_edge': 14, 'description': 'Large edge (rare)'}
    ]

    from scipy import stats

    results = []
    for s in scenarios:
        # Probability the favored team wins outright (final margin > 0)
        cover_prob = stats.norm.cdf(s['predicted_edge'] / spread_std)

        results.append({
            'edge': s['predicted_edge'],
            'description': s['description'],
            'cover_probability': round(cover_prob, 3),
            'upset_probability': round(1 - cover_prob, 3)
        })

    return {
        'spread_std': spread_std,
        'scenarios': results,
        'key_insight': 'Even 7-point favorites lose outright roughly 25-30% of the time'
    }

Pitfall 4: Sample Size Illusions

Small samples create false confidence:

def sample_size_analysis():
    """
    Show how sample size affects reliability.
    """
    true_accuracy = 0.55  # Actual model skill

    sample_sizes = [20, 50, 100, 200, 500, 1000]
    results = []

    for n in sample_sizes:
        # Standard error of proportion
        se = np.sqrt(true_accuracy * (1 - true_accuracy) / n)

        # 95% confidence interval
        ci_low = true_accuracy - 1.96 * se
        ci_high = true_accuracy + 1.96 * se

        # Probability of appearing >60% accurate by chance
        from scipy import stats
        prob_look_great = 1 - stats.norm.cdf((0.60 - true_accuracy) / se)

        results.append({
            'sample_size': n,
            'ci_width': round(ci_high - ci_low, 3),
            'ci_low': round(ci_low, 3),
            'ci_high': round(ci_high, 3),
            'prob_misleading': round(prob_look_great, 3)
        })

    return pd.DataFrame(results)

Building Blocks of Prediction Models

Component 1: Team Ratings

The foundation of most models—a single number representing team quality:

def create_simple_rating_system(games: pd.DataFrame) -> Dict[str, float]:
    """
    Create basic team ratings from game results.

    This is the simplest possible rating: average point differential.
    """
    teams = set(games['home_team'].tolist() + games['away_team'].tolist())
    ratings = {}

    for team in teams:
        home = games[games['home_team'] == team]
        away = games[games['away_team'] == team]

        home_diff = (home['home_score'] - home['away_score']).sum()
        away_diff = (away['away_score'] - away['home_score']).sum()
        total_games = len(home) + len(away)

        ratings[team] = (home_diff + away_diff) / total_games if total_games > 0 else 0

    # Normalize to mean 0
    mean_rating = np.mean(list(ratings.values()))
    ratings = {team: r - mean_rating for team, r in ratings.items()}

    return ratings

Component 2: Home Field Advantage

Account for the home team's edge:

def estimate_home_field_advantage(games: pd.DataFrame) -> Dict:
    """
    Estimate home field advantage from historical data.
    """
    margins = games['home_score'] - games['away_score']
    home_wins = (margins > 0).sum()
    total_games = len(games)

    return {
        'avg_margin': round(margins.mean(), 2),
        'home_win_pct': round(home_wins / total_games, 3),
        'median_margin': round(margins.median(), 1),
        'std_margin': round(margins.std(), 2)
    }

Component 3: Adjustments

Factors that modify the base prediction:

def calculate_adjustments(game: Dict, context: Dict) -> Dict:
    """
    Calculate prediction adjustments for various factors.
    """
    adjustments = {}

    # Rest advantage
    home_rest = context.get('home_days_rest', 7)
    away_rest = context.get('away_days_rest', 7)
    adjustments['rest'] = (home_rest - away_rest) * 0.5  # ~0.5 pts per day

    # Travel
    away_travel_miles = context.get('away_travel_miles', 0)
    if away_travel_miles > 2000:
        adjustments['travel'] = 1.5  # Long travel hurts away team
    elif away_travel_miles > 1000:
        adjustments['travel'] = 0.5
    else:
        adjustments['travel'] = 0

    # Timezone (west to east is hardest)
    tz_diff = context.get('timezone_diff', 0)  # Positive = away traveling east
    if tz_diff >= 2:
        adjustments['timezone'] = 1.0
    elif tz_diff <= -2:
        adjustments['timezone'] = -0.5  # East to west is easier
    else:
        adjustments['timezone'] = 0

    # Weather (for outdoor games)
    if context.get('is_outdoor', False):
        temp = context.get('temperature', 65)
        wind = context.get('wind_speed', 5)

        if temp < 32:
            adjustments['cold'] = -0.5  # Favors home team slightly
        if wind > 15:
            adjustments['wind'] = -1.0  # Reduces scoring

    # Injuries (simplified)
    home_injury_impact = context.get('home_injury_impact', 0)
    away_injury_impact = context.get('away_injury_impact', 0)
    adjustments['injuries'] = away_injury_impact - home_injury_impact

    adjustments['total'] = sum(adjustments.values())

    return adjustments
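
A sketch of how the context dictionary is meant to look; every key is optional and falls back to a sensible default, and all values below are hypothetical:

# Hypothetical game context
context = {
    'home_days_rest': 10,       # Home team coming off extra rest
    'away_days_rest': 6,
    'away_travel_miles': 1500,
    'timezone_diff': 1,
    'is_outdoor': True,
    'temperature': 28,
    'wind_speed': 10,
    'home_injury_impact': 1.0,  # Estimated points lost to home injuries
    'away_injury_impact': 0.0,
}

adj = calculate_adjustments(game={}, context=context)
print(adj['total'])  # +1.0: rest +2.0, travel +0.5, cold -0.5, injuries -1.0

Because SimpleNFLPredictor.predict (later in this chapter) subtracts this total from the spread, positive totals shift the prediction toward the home team.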

Component 4: Uncertainty Quantification

Good models know what they don't know:

def quantify_prediction_uncertainty(base_spread: float,
                                    sample_games: int,
                                    rating_confidence: float) -> Dict:
    """
    Quantify uncertainty in a prediction.

    Args:
        base_spread: Point spread prediction
        sample_games: Number of games used for ratings
        rating_confidence: Confidence in team ratings (0-1)
    """
    # Base uncertainty: NFL game variance
    base_std = 13.5

    # Additional uncertainty from small samples
    sample_uncertainty = 5 / np.sqrt(max(sample_games, 1))

    # Rating uncertainty
    rating_uncertainty = (1 - rating_confidence) * 3

    # Combined uncertainty (root sum of squares)
    total_std = np.sqrt(base_std**2 + sample_uncertainty**2 + rating_uncertainty**2)

    # Confidence interval
    from scipy import stats
    ci_90 = stats.norm.interval(0.90, loc=base_spread, scale=total_std)

    return {
        'predicted_spread': base_spread,
        'total_uncertainty': round(total_std, 1),
        'ci_90_low': round(ci_90[0], 1),
        'ci_90_high': round(ci_90[1], 1),
        'components': {
            'game_variance': base_std,
            'sample_uncertainty': round(sample_uncertainty, 1),
            'rating_uncertainty': round(rating_uncertainty, 1)
        }
    }
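
For instance, with a hypothetical 6-point home favorite rated from only 8 games at moderate confidence, game-to-game variance still dominates the total uncertainty:

# Hypothetical inputs
u = quantify_prediction_uncertainty(base_spread=-6.0, sample_games=8,
                                    rating_confidence=0.7)
print(u['total_uncertainty'])           # ~13.6 points, barely above the 13.5 baseline
print(u['ci_90_low'], u['ci_90_high'])  # Roughly -28 to +16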

A Complete Simple Model

Putting it all together:

class SimpleNFLPredictor:
    """
    A complete but simple NFL prediction model.

    This demonstrates all the core components while remaining
    interpretable and educational.

    Example usage:
        model = SimpleNFLPredictor()
        model.fit(historical_games)
        prediction = model.predict(home_team='KC', away_team='BUF')
    """

    def __init__(self, hfa: float = 2.5, recency_weight: float = 0.1):
        """
        Initialize predictor.

        Args:
            hfa: Home field advantage in points
            recency_weight: Weight decay for older games
        """
        self.hfa = hfa
        self.recency_weight = recency_weight
        self.ratings = {}
        self.is_fitted = False

    def fit(self, games: pd.DataFrame) -> 'SimpleNFLPredictor':
        """
        Fit the model to historical games.

        Args:
            games: DataFrame with home_team, away_team, home_score, away_score
        """
        # Sort by date/week
        games = games.sort_values(['season', 'week'])

        # Calculate ratings using recency-weighted average
        teams = set(games['home_team'].tolist() + games['away_team'].tolist())

        for team in teams:
            team_games = games[
                (games['home_team'] == team) | (games['away_team'] == team)
            ]

            margins = []
            weights = []

            for i, (_, game) in enumerate(team_games.iterrows()):
                if game['home_team'] == team:
                    margin = game['home_score'] - game['away_score'] - self.hfa
                else:
                    margin = game['away_score'] - game['home_score'] + self.hfa

                margins.append(margin)
                # More recent games weighted higher
                weight = (1 - self.recency_weight) ** (len(team_games) - i - 1)
                weights.append(weight)

            if margins:
                self.ratings[team] = np.average(margins, weights=weights)
            else:
                self.ratings[team] = 0

        # Normalize to mean 0
        mean_rating = np.mean(list(self.ratings.values()))
        self.ratings = {team: r - mean_rating for team, r in self.ratings.items()}

        self.is_fitted = True
        return self

    def predict(self, home_team: str, away_team: str,
                adjustments: Dict = None) -> GamePrediction:
        """
        Make a prediction for a single game.

        Args:
            home_team: Home team abbreviation
            away_team: Away team abbreviation
            adjustments: Optional adjustment factors
        """
        if not self.is_fitted:
            raise ValueError("Model must be fit before predicting")

        home_rating = self.ratings.get(home_team, 0)
        away_rating = self.ratings.get(away_team, 0)

        # Base spread (negative = home favored)
        spread = away_rating - home_rating - self.hfa

        # Apply adjustments
        if adjustments:
            spread -= adjustments.get('total', 0)

        # Convert to probability
        home_wp = 1 / (1 + 10 ** (spread / 8))

        # Predicted scores
        avg_total = 45.0
        home_score = (avg_total - spread) / 2
        away_score = (avg_total + spread) / 2

        return GamePrediction(
            game_id=f"{home_team}_{away_team}",
            home_team=home_team,
            away_team=away_team,
            predicted_winner=home_team if home_wp > 0.5 else away_team,
            home_win_probability=round(home_wp, 3),
            predicted_spread=round(spread, 1),
            predicted_total=avg_total,
            confidence=round(abs(home_wp - 0.5) * 2, 2),
            model_uncertainty=13.5,
            home_score=round(home_score, 1),
            away_score=round(away_score, 1)
        )

    def predict_season(self, schedule: pd.DataFrame) -> pd.DataFrame:
        """
        Predict all games in a schedule.

        Args:
            schedule: DataFrame with home_team, away_team for each game
        """
        predictions = []

        for _, game in schedule.iterrows():
            pred = self.predict(game['home_team'], game['away_team'])
            predictions.append({
                'game_id': pred.game_id,
                'home_team': pred.home_team,
                'away_team': pred.away_team,
                'predicted_winner': pred.predicted_winner,
                'home_win_prob': pred.home_win_probability,
                'spread': pred.predicted_spread
            })

        return pd.DataFrame(predictions)

    def get_rankings(self) -> pd.DataFrame:
        """Get team rankings by rating."""
        rankings = pd.DataFrame([
            {'team': team, 'rating': rating}
            for team, rating in self.ratings.items()
        ])
        return rankings.sort_values('rating', ascending=False).reset_index(drop=True)
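
A minimal end-to-end sketch on a tiny hand-made set of results; real usage would pass several seasons of schedule data with the same columns:

# Hand-made results, purely for illustration
history = pd.DataFrame({
    'season':     [2023, 2023, 2023, 2023],
    'week':       [1, 1, 2, 2],
    'home_team':  ['KC', 'BUF', 'KC', 'NYJ'],
    'away_team':  ['DET', 'NYJ', 'BUF', 'DET'],
    'home_score': [20, 27, 24, 13],
    'away_score': [21, 17, 20, 30],
})

model = SimpleNFLPredictor(hfa=2.5, recency_weight=0.1)
model.fit(history)

print(model.get_rankings())  # Teams sorted by recency-weighted rating
pred = model.predict('KC', 'BUF')
print(pred.predicted_spread, pred.home_win_probability)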

Summary

Key Concepts:

  1. Prediction models are systematic - Not guesses, but mathematical transformations of data
  2. Evaluation is essential - Without proper metrics, you can't distinguish skill from luck
  3. Pitfalls are everywhere - Overfitting, leakage, variance, and small samples trap many modelers
  4. Building blocks combine - Ratings + HFA + adjustments + uncertainty = prediction

What Makes a Good Model:

Aspect        Poor Model            Good Model
Inputs        One or two factors    Multiple relevant features
Evaluation    "Feels right"         Rigorous backtesting
Uncertainty   Ignored               Quantified
Updates       Static                Learns from new data

Next Steps:

The following chapters build on these foundations:

  • Chapter 19: Elo and Power Ratings - Sophisticated team rating systems
  • Chapter 20: Machine Learning - Advanced modeling techniques
  • Chapter 21: Game Simulation - Monte Carlo approaches
  • Chapter 22: Betting Markets - Using market information


Preview: Chapter 19

Next, we'll dive deep into Elo and Power Ratings—the backbone of most NFL prediction systems. You'll learn how to build rating systems that automatically adjust based on game results, handle margin of victory, and account for opponent strength.


References

  1. Silver, N. (2012). The Signal and the Noise
  2. Winston, W. L. (2012). Mathletics
  3. Football Outsiders. "DVOA Methodology"
  4. FiveThirtyEight. "How Our NFL Predictions Work"
  5. Brier, G. W. (1950). "Verification of Forecasts Expressed in Terms of Probability"