Chapter 18: Game Outcome Prediction

Learning Objectives

By the end of this chapter, you will be able to:

  • Build comprehensive game prediction models using multiple approaches
  • Engineer advanced features capturing team strength, matchup dynamics, and situational factors
  • Implement Elo rating systems and power rankings for football prediction
  • Apply ensemble methods to combine predictions from multiple models
  • Evaluate model performance against betting markets and other benchmarks
  • Deploy production-ready prediction systems for real-time game forecasting

Introduction

Predicting the outcome of college football games represents one of the most challenging and rewarding applications of sports analytics. With billions of dollars wagered annually on college football, sophisticated models compete against Vegas odds, while teams and broadcasters seek accurate predictions for strategic planning and entertainment. This chapter presents state-of-the-art techniques for game outcome prediction, building upon the machine learning foundations from Chapter 17.

Game prediction differs fundamentally from other prediction tasks because of the inherent unpredictability of football. Unlike baseball's large sample sizes or basketball's frequent scoring, football games are decided by relatively few plays, each with enormous variance. A single fumble, interception, or blown coverage can swing a game by 7-14 points. Despite this variance, systematic patterns exist—better teams win more often, home teams have advantages, and certain matchups favor specific playing styles.

The goal of this chapter is not to create a perfect prediction system (which is impossible), but to build models that capture meaningful signal, produce well-calibrated probabilities, and outperform naive baselines. We'll examine multiple approaches: statistical models built from team metrics, Elo-style rating systems, machine learning classifiers, and ensemble methods that combine predictions intelligently.


18.1 The Game Prediction Problem

18.1.1 What We're Predicting

Game prediction typically targets one or more outcomes:

Win Probability: The probability that a specific team wins the game. This is the most common prediction target.

Point Spread: The expected margin of victory. Related to win probability but captures game competitiveness.

Total Points (Over/Under): The total points scored by both teams. Less common in academic analytics but critical for betting markets.

Straight-Up Winner: Binary classification—which team wins without regard to margin.

Each target has different characteristics and requires different modeling approaches:

import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional
from enum import Enum


class PredictionTarget(Enum):
    """Types of game predictions."""
    WIN_PROBABILITY = 'win_probability'
    POINT_SPREAD = 'point_spread'
    TOTAL_POINTS = 'total_points'
    STRAIGHT_UP = 'straight_up'


@dataclass
class GamePrediction:
    """
    Complete prediction for a single game.

    Attributes:
    -----------
    game_id : str
        Unique game identifier
    home_team : str
        Home team name
    away_team : str
        Away team name
    home_win_prob : float
        Probability home team wins (0-1)
    predicted_spread : float
        Predicted margin (positive = home favored)
    predicted_total : float
        Predicted total points
    confidence : float
        Model confidence in prediction
    model_name : str
        Name of model generating prediction
    """
    game_id: str
    home_team: str
    away_team: str
    home_win_prob: float
    predicted_spread: float
    predicted_total: float
    confidence: float = 0.5
    model_name: str = 'default'

    @property
    def away_win_prob(self) -> float:
        """Away team win probability."""
        return 1 - self.home_win_prob

    @property
    def predicted_winner(self) -> str:
        """Predicted winner based on probability."""
        return self.home_team if self.home_win_prob > 0.5 else self.away_team

    def to_dict(self) -> Dict:
        """Convert to dictionary."""
        return {
            'game_id': self.game_id,
            'home_team': self.home_team,
            'away_team': self.away_team,
            'home_win_prob': self.home_win_prob,
            'away_win_prob': self.away_win_prob,
            'predicted_spread': self.predicted_spread,
            'predicted_total': self.predicted_total,
            'predicted_winner': self.predicted_winner,
            'confidence': self.confidence,
            'model': self.model_name
        }

18.1.2 The Challenge of Football Prediction

Football prediction is inherently difficult for several reasons:

Small Sample Sizes: Each team plays only 12-14 games per season, providing limited data to assess team strength.

High Variance Per Play: Individual plays have enormous variance. A 60-yard touchdown pass and an interception might both come from similar pre-snap situations.

Dynamic Team Strength: Teams improve or regress throughout the season due to injuries, coaching adjustments, player development, and momentum.

Context Dependence: Team performance varies by opponent, weather, venue, time of day, and broader season context.

Selection Effects: Bowl games and playoffs match teams differently than regular season, creating selection bias in historical data.

Understanding these challenges helps set appropriate expectations and guides modeling decisions:

def calculate_baseline_accuracy(data: pd.DataFrame) -> Dict:
    """
    Calculate baseline prediction accuracy.

    Parameters:
    -----------
    data : pd.DataFrame
        Historical game data with 'home_win' column

    Returns:
    --------
    Dict : Baseline accuracy metrics
    """
    home_win_rate = data['home_win'].mean()

    baselines = {
        'home_always': home_win_rate,  # Always pick home team
        'away_always': 1 - home_win_rate,  # Always pick away team
        'coin_flip': 0.50,  # Random guess
        'majority_class': max(home_win_rate, 1 - home_win_rate)
    }

    # Report baselines (CFB home teams historically win roughly 55-60% of games)
    print("Baseline Prediction Accuracy:")
    print(f"  Home team wins: {home_win_rate:.1%}")
    print(f"  Always pick home: {baselines['home_always']:.1%}")
    print(f"  Random guess: {baselines['coin_flip']:.1%}")
    print(f"  Majority class: {baselines['majority_class']:.1%}")

    return baselines


def calculate_prediction_ceiling(n_games: int = 1000) -> float:
    """
    Estimate theoretical prediction ceiling for football.

    Based on analysis of closing lines and actual outcomes,
    the practical ceiling is approximately 75-78% for individual games.

    Parameters:
    -----------
    n_games : int
        Number of simulated games

    Returns:
    --------
    float : Estimated ceiling accuracy
    """
    np.random.seed(42)

    # True team strength difference (in points)
    true_spread = np.random.normal(0, 10, n_games)

    # Actual margin includes game variance (std dev ~13-14 points)
    actual_margin = true_spread + np.random.normal(0, 13.5, n_games)

    # Perfect knowledge of spread
    predicted_winner = true_spread > 0
    actual_winner = actual_margin > 0

    accuracy = (predicted_winner == actual_winner).mean()

    print(f"Theoretical ceiling with perfect spread knowledge: {accuracy:.1%}")
    print("Note: This assumes spread captures all predictable information")

    return accuracy

18.1.3 Relationship Between Spread and Win Probability

Point spread and win probability are closely related but distinct. The market point spread represents the expected margin of victory, while win probability represents the likelihood of winning regardless of margin.

The relationship between spread and win probability follows a sigmoid-like curve whose exact shape depends on the assumed variance of game scores. Under a normal model for the final margin, P(home win) = Φ(spread / σ), where Φ is the standard normal CDF and σ ≈ 13.5 points is the standard deviation of actual margins around the spread:

from scipy.stats import norm


def spread_to_probability(spread: float,
                         std_dev: float = 13.5) -> float:
    """
    Convert point spread to win probability.

    Uses a normal distribution assumption for score margin.

    Parameters:
    -----------
    spread : float
        Point spread (positive = team favored)
    std_dev : float
        Standard deviation of score margins (typically 13-14)

    Returns:
    --------
    float : Win probability
    """
    # P(win) = P(actual_margin > 0) where actual_margin ~ N(spread, std_dev)
    prob = 1 - norm.cdf(0, loc=spread, scale=std_dev)
    return prob


def probability_to_spread(prob: float,
                         std_dev: float = 13.5) -> float:
    """
    Convert win probability to implied point spread.

    Parameters:
    -----------
    prob : float
        Win probability (0-1)
    std_dev : float
        Standard deviation of score margins

    Returns:
    --------
    float : Implied point spread
    """
    # Inverse of spread_to_probability
    spread = norm.ppf(prob) * std_dev
    return spread


def demonstrate_spread_probability_relationship():
    """Show the spread-probability relationship."""
    spreads = np.arange(-28, 29, 1)
    probs = [spread_to_probability(s) for s in spreads]

    print("\nSpread to Win Probability Conversion:")
    print("-" * 40)
    for spread in [-21, -14, -7, -3, 0, 3, 7, 14, 21]:
        prob = spread_to_probability(spread)
        print(f"  Spread {spread:+3d}: {prob:.1%} win probability")

    return spreads, probs


# Example output
demonstrate_spread_probability_relationship()

18.2 Rating Systems for Team Strength

Rating systems provide a principled way to estimate team strength from game results. These ratings can then predict future games by comparing team strengths.

18.2.1 The Elo Rating System

Developed by Arpad Elo for chess, the Elo system has been successfully adapted to football. The key insight is that rating differences predict win probabilities, and actual results update ratings proportionally to how surprising they were.

class EloRatingSystem:
    """
    Elo rating system for college football.

    The system maintains ratings for all teams and updates them
    after each game based on actual vs expected results.
    """

    def __init__(self,
                 initial_rating: float = 1500,
                 k_factor: float = 20,
                 home_advantage: float = 65,
                 mean_reversion: float = 0.25):
        """
        Initialize Elo system.

        Parameters:
        -----------
        initial_rating : float
            Starting rating for new teams
        k_factor : float
            Maximum rating change per game
        home_advantage : float
            Home field advantage in Elo points
        mean_reversion : float
            Fraction of rating that reverts to mean each season
        """
        self.initial_rating = initial_rating
        self.k_factor = k_factor
        self.home_advantage = home_advantage
        self.mean_reversion = mean_reversion
        self.ratings = {}
        self.history = []

    def get_rating(self, team: str) -> float:
        """Get current rating for a team."""
        return self.ratings.get(team, self.initial_rating)

    def expected_win_prob(self, home_team: str, away_team: str) -> float:
        """
        Calculate expected win probability for home team.

        Uses logistic curve with home advantage adjustment.

        Parameters:
        -----------
        home_team : str
            Home team name
        away_team : str
            Away team name

        Returns:
        --------
        float : Home team win probability
        """
        home_rating = self.get_rating(home_team) + self.home_advantage
        away_rating = self.get_rating(away_team)

        rating_diff = home_rating - away_rating

        # Logistic function: maps rating diff to probability
        # 400 point diff ≈ 91% win probability
        expected = 1 / (1 + 10 ** (-rating_diff / 400))

        return expected

    def update_ratings(self, home_team: str, away_team: str,
                      home_score: int, away_score: int,
                      margin_of_victory_multiplier: bool = True) -> Dict:
        """
        Update ratings after a game.

        Parameters:
        -----------
        home_team, away_team : str
            Team names
        home_score, away_score : int
            Final scores
        margin_of_victory_multiplier : bool
            Whether to adjust K based on margin

        Returns:
        --------
        Dict : Update details
        """
        # Get pre-game ratings
        home_rating_pre = self.get_rating(home_team)
        away_rating_pre = self.get_rating(away_team)

        # Expected result
        expected_home = self.expected_win_prob(home_team, away_team)

        # Actual result (1 for win, 0 for loss, 0.5 for tie)
        if home_score > away_score:
            actual_home = 1.0
        elif home_score < away_score:
            actual_home = 0.0
        else:
            actual_home = 0.5

        # Margin-of-victory multiplier (FiveThirtyEight-style): grows with the
        # log of the margin and is dampened when the favorite wins, limiting
        # rating inflation from expected blowouts
        if margin_of_victory_multiplier:
            margin = abs(home_score - away_score)
            if actual_home > 0.5:
                winner_rating_diff = (home_rating_pre + self.home_advantage) - away_rating_pre
            else:
                winner_rating_diff = away_rating_pre - (home_rating_pre + self.home_advantage)
            mov_mult = np.log(max(margin, 1) + 1) * (
                2.2 / (0.001 * winner_rating_diff + 2.2)
            )
            mov_mult = min(mov_mult, 3.0)  # Cap at 3x
        else:
            mov_mult = 1.0

        # Calculate rating change
        k_effective = self.k_factor * mov_mult
        rating_change = k_effective * (actual_home - expected_home)

        # Update ratings
        self.ratings[home_team] = home_rating_pre + rating_change
        self.ratings[away_team] = away_rating_pre - rating_change

        # Record history
        result = {
            'home_team': home_team,
            'away_team': away_team,
            'home_score': home_score,
            'away_score': away_score,
            'home_rating_pre': home_rating_pre,
            'away_rating_pre': away_rating_pre,
            'home_rating_post': self.ratings[home_team],
            'away_rating_post': self.ratings[away_team],
            'expected_home': expected_home,
            'actual_home': actual_home,
            'rating_change': rating_change
        }
        self.history.append(result)

        return result

    def apply_season_mean_reversion(self):
        """
        Apply mean reversion at season boundary.

        Ratings regress toward mean to account for roster turnover.
        """
        mean_rating = np.mean(list(self.ratings.values())) if self.ratings else self.initial_rating

        for team in self.ratings:
            current = self.ratings[team]
            self.ratings[team] = current + self.mean_reversion * (mean_rating - current)

    def get_rankings(self, top_n: int = 25) -> pd.DataFrame:
        """Get top teams by rating."""
        if not self.ratings:
            return pd.DataFrame()

        rankings = pd.DataFrame([
            {'team': team, 'rating': rating}
            for team, rating in self.ratings.items()
        ]).sort_values('rating', ascending=False)

        rankings['rank'] = range(1, len(rankings) + 1)

        return rankings.head(top_n)

    def evaluate_predictions(self, test_games: pd.DataFrame) -> Dict:
        """
        Evaluate prediction accuracy on test games.

        Parameters:
        -----------
        test_games : pd.DataFrame
            Games with home_team, away_team, home_win columns

        Returns:
        --------
        Dict : Evaluation metrics
        """
        predictions = []
        actuals = []

        for _, game in test_games.iterrows():
            pred_prob = self.expected_win_prob(game['home_team'], game['away_team'])
            predictions.append(pred_prob)
            actuals.append(game['home_win'])

        predictions = np.array(predictions)
        actuals = np.array(actuals)

        # Binary predictions
        pred_binary = (predictions > 0.5).astype(int)

        from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss

        return {
            'accuracy': accuracy_score(actuals, pred_binary),
            'auc_roc': roc_auc_score(actuals, predictions),
            'brier_score': brier_score_loss(actuals, predictions),
            'log_loss': -np.mean(
                actuals * np.log(predictions + 1e-10) +
                (1 - actuals) * np.log(1 - predictions + 1e-10)
            )
        }
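
A minimal usage sketch (the matchups and scores below are invented purely to exercise the API) shows the typical workflow: process games in chronological order, then read off rankings and a forward-looking win probability.

# Hypothetical results used only for illustration
elo = EloRatingSystem(k_factor=20, home_advantage=65)

sample_games = [
    ('Georgia', 'Clemson', 34, 10),
    ('Alabama', 'Texas', 24, 34),
    ('Georgia', 'Alabama', 27, 24),
]

for home, away, home_pts, away_pts in sample_games:
    elo.update_ratings(home, away, home_pts, away_pts)

print(elo.get_rankings(top_n=4))
print(f"P(Georgia beats Texas at home): "
      f"{elo.expected_win_prob('Georgia', 'Texas'):.1%}")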

18.2.2 Margin-Based Rating Systems

Pure Elo considers only wins and losses. Margin-based systems incorporate point differential for more signal:

class MarginRatingSystem:
    """
    Rating system based on scoring margin.

    Similar to Massey Ratings or simple power rankings.
    Uses least squares to solve for team strengths.
    """

    def __init__(self, home_advantage: float = 2.5):
        """
        Initialize margin rating system.

        Parameters:
        -----------
        home_advantage : float
            Home field advantage in points
        """
        self.home_advantage = home_advantage
        self.ratings = {}
        self.games = []

    def add_game(self, home_team: str, away_team: str,
                home_score: int, away_score: int):
        """Add a game to the system."""
        self.games.append({
            'home_team': home_team,
            'away_team': away_team,
            'home_score': home_score,
            'away_score': away_score,
            'margin': home_score - away_score
        })

    def solve_ratings(self, ridge_alpha: float = 0.1) -> Dict[str, float]:
        """
        Solve for team ratings using ridge regression.

        Model: margin = home_rating - away_rating + home_advantage + noise

        Parameters:
        -----------
        ridge_alpha : float
            Regularization parameter

        Returns:
        --------
        Dict[str, float] : Team ratings
        """
        if not self.games:
            return {}

        df = pd.DataFrame(self.games)

        # Get all teams
        all_teams = sorted(set(df['home_team']) | set(df['away_team']))
        team_to_idx = {team: i for i, team in enumerate(all_teams)}
        n_teams = len(all_teams)
        n_games = len(df)

        # Build design matrix
        # Each row: home team gets +1, away team gets -1
        X = np.zeros((n_games, n_teams))
        y = df['margin'].values - self.home_advantage  # Remove HFA

        for i, game in df.iterrows():
            X[i, team_to_idx[game['home_team']]] = 1
            X[i, team_to_idx[game['away_team']]] = -1

        # Ridge regression for stability
        from sklearn.linear_model import Ridge
        model = Ridge(alpha=ridge_alpha, fit_intercept=False)
        model.fit(X, y)

        # Extract ratings
        self.ratings = {
            team: model.coef_[idx]
            for team, idx in team_to_idx.items()
        }

        return self.ratings

    def predict_spread(self, home_team: str, away_team: str) -> float:
        """Predict point spread for a matchup."""
        home_rating = self.ratings.get(home_team, 0)
        away_rating = self.ratings.get(away_team, 0)

        return home_rating - away_rating + self.home_advantage

    def predict_win_prob(self, home_team: str, away_team: str,
                        std_dev: float = 13.5) -> float:
        """Predict win probability from spread."""
        spread = self.predict_spread(home_team, away_team)
        return spread_to_probability(spread, std_dev)
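
The least-squares ratings are identified only up to an additive constant (adding the same number to every team leaves all margins unchanged), so the ridge penalty also serves to anchor them near zero. A short sketch with invented scores illustrates the fit-then-predict workflow:

# Hypothetical matchups used only for illustration
margin_system = MarginRatingSystem(home_advantage=2.5)
margin_system.add_game('Ohio State', 'Michigan', 30, 24)
margin_system.add_game('Michigan', 'Penn State', 28, 17)
margin_system.add_game('Penn State', 'Ohio State', 20, 31)

ratings = margin_system.solve_ratings(ridge_alpha=0.1)
print({team: round(rating, 1) for team, rating in ratings.items()})
print(f"Ohio State hosting Michigan, predicted spread: "
      f"{margin_system.predict_spread('Ohio State', 'Michigan'):+.1f}")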

18.2.3 Adjusted Efficiency Ratings

Modern rating systems often use play-level efficiency metrics rather than just game scores:

class AdjustedEfficiencyRatings:
    """
    Rating system based on adjusted offensive and defensive efficiency.

    Similar to approaches used by ESPN's FPI or SP+.
    Adjusts for opponent strength and game context.
    """

    def __init__(self):
        self.off_ratings = {}
        self.def_ratings = {}
        self.overall_ratings = {}
        self.games = []

    def add_game_efficiency(self, team: str, opponent: str,
                           off_epa: float, def_epa: float,
                           is_home: bool = True):
        """
        Add game efficiency data.

        Parameters:
        -----------
        team : str
            Team name
        opponent : str
            Opponent name
        off_epa : float
            Offensive EPA per play
        def_epa : float
            Defensive EPA per play (negative is better)
        is_home : bool
            Whether team was home
        """
        self.games.append({
            'team': team,
            'opponent': opponent,
            'off_epa': off_epa,
            'def_epa': def_epa,
            'is_home': is_home
        })

    def calculate_ratings(self, n_iterations: int = 10) -> None:
        """
        Calculate adjusted efficiency ratings iteratively.

        Uses iterative adjustment to account for opponent strength.

        Parameters:
        -----------
        n_iterations : int
            Number of adjustment iterations
        """
        df = pd.DataFrame(self.games)

        all_teams = sorted(set(df['team']) | set(df['opponent']))

        # Initialize with raw averages
        for team in all_teams:
            team_games = df[df['team'] == team]
            if len(team_games) > 0:
                self.off_ratings[team] = team_games['off_epa'].mean()
                self.def_ratings[team] = team_games['def_epa'].mean()
            else:
                self.off_ratings[team] = 0.0
                self.def_ratings[team] = 0.0

        # Iterative adjustment
        for iteration in range(n_iterations):
            new_off = {}
            new_def = {}

            for team in all_teams:
                team_games = df[df['team'] == team]

                if len(team_games) == 0:
                    new_off[team] = 0.0
                    new_def[team] = 0.0
                    continue

                # Adjust offensive production for opponent defense
                adj_off = []
                for _, game in team_games.iterrows():
                    opp_def = self.def_ratings.get(game['opponent'], 0)
                    # If opponent defense is bad (high EPA allowed), adjust down
                    adjustment = game['off_epa'] - opp_def
                    adj_off.append(adjustment)

                # Adjust defensive production for opponent offense
                adj_def = []
                for _, game in team_games.iterrows():
                    opp_off = self.off_ratings.get(game['opponent'], 0)
                    # If opponent offense is good, credit the defense
                    # (subtracting shifts the adjusted figure toward negative)
                    adjustment = game['def_epa'] - opp_off
                    adj_def.append(adjustment)

                new_off[team] = np.mean(adj_off)
                new_def[team] = np.mean(adj_def)

            self.off_ratings = new_off
            self.def_ratings = new_def

        # Calculate overall rating
        for team in all_teams:
            self.overall_ratings[team] = (
                self.off_ratings[team] - self.def_ratings[team]
            )

    def predict_game(self, home_team: str, away_team: str,
                    home_advantage: float = 0.03) -> Dict:
        """
        Predict game outcome.

        Parameters:
        -----------
        home_team : str
            Home team
        away_team : str
            Away team
        home_advantage : float
            Home advantage in EPA units

        Returns:
        --------
        Dict : Prediction details
        """
        # Home offense vs away defense: a strong (negative) away defense
        # drags the home offensive expectation down
        home_off = self.off_ratings.get(home_team, 0) + home_advantage
        away_def = self.def_ratings.get(away_team, 0)
        home_expected_off = home_off + away_def

        # Away offense vs home defense (home defense gets a smaller boost,
        # i.e. its EPA allowed is nudged downward)
        away_off = self.off_ratings.get(away_team, 0)
        home_def = self.def_ratings.get(home_team, 0) - home_advantage * 0.5
        away_expected_off = away_off + home_def

        # Net efficiency difference
        efficiency_diff = home_expected_off - away_expected_off

        # Convert to spread (roughly 20 points per 0.1 EPA/play advantage)
        predicted_spread = efficiency_diff * 200

        # Convert to probability
        win_prob = spread_to_probability(predicted_spread)

        return {
            'home_win_prob': win_prob,
            'predicted_spread': predicted_spread,
            'home_expected_epa': home_expected_off,
            'away_expected_epa': away_expected_off,
            'efficiency_diff': efficiency_diff
        }
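
A brief sketch with invented per-game EPA figures shows the full loop: record each team-game, run the iterative adjustment, then predict a matchup. Note that each game appears twice, once from each team's perspective.

# Hypothetical EPA-per-play figures, entered once per team per game
eff = AdjustedEfficiencyRatings()
eff.add_game_efficiency('Oregon', 'Washington', off_epa=0.15, def_epa=-0.05, is_home=True)
eff.add_game_efficiency('Washington', 'Oregon', off_epa=-0.05, def_epa=0.15, is_home=False)
eff.add_game_efficiency('Oregon', 'Utah', off_epa=0.10, def_epa=0.02, is_home=False)
eff.add_game_efficiency('Utah', 'Oregon', off_epa=0.02, def_epa=0.10, is_home=True)

eff.calculate_ratings(n_iterations=10)
print(eff.predict_game('Oregon', 'Washington'))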

18.3 Feature Engineering for Game Prediction

Effective feature engineering is critical for game prediction models. Features should capture team strength, matchup dynamics, situational factors, and historical patterns.

18.3.1 Team Strength Features

class TeamStrengthFeatures:
    """
    Generate team strength features for game prediction.
    """

    def __init__(self, lookback_games: int = 5):
        """
        Initialize feature generator.

        Parameters:
        -----------
        lookback_games : int
            Number of recent games for rolling features
        """
        self.lookback_games = lookback_games

    def generate_features(self, team_data: pd.DataFrame,
                         game_date: pd.Timestamp) -> Dict:
        """
        Generate features for a team at a specific point in time.

        Parameters:
        -----------
        team_data : pd.DataFrame
            Historical team game data
        game_date : pd.Timestamp
            Date of game to predict (features use only prior data)

        Returns:
        --------
        Dict : Feature dictionary
        """
        # Filter to games before this date
        prior_games = team_data[team_data['date'] < game_date]

        if len(prior_games) == 0:
            return self._default_features()

        # Season to date features
        season = game_date.year if game_date.month > 6 else game_date.year - 1
        season_games = prior_games[prior_games['season'] == season]

        features = {}

        # Win percentage
        features['season_win_pct'] = (
            season_games['win'].mean() if len(season_games) > 0 else 0.5
        )

        # Scoring features
        features['ppg'] = season_games['points_for'].mean() if len(season_games) > 0 else 25
        features['papg'] = season_games['points_against'].mean() if len(season_games) > 0 else 25
        features['point_diff_per_game'] = features['ppg'] - features['papg']

        # Efficiency features (if available)
        if 'off_epa' in season_games.columns:
            features['off_epa'] = season_games['off_epa'].mean()
            features['def_epa'] = season_games['def_epa'].mean()
            features['total_epa'] = features['off_epa'] - features['def_epa']

        # Rolling features (recent form)
        recent = season_games.tail(self.lookback_games)
        if len(recent) > 0:
            features['recent_win_pct'] = recent['win'].mean()
            features['recent_ppg'] = recent['points_for'].mean()
            features['recent_papg'] = recent['points_against'].mean()
        else:
            features['recent_win_pct'] = features['season_win_pct']
            features['recent_ppg'] = features['ppg']
            features['recent_papg'] = features['papg']

        # Turnover features
        if {'turnovers', 'turnovers_forced'}.issubset(season_games.columns):
            features['turnover_margin'] = (
                (season_games['turnovers_forced'] - season_games['turnovers']).mean()
            )
        else:
            features['turnover_margin'] = 0

        # Third down and red zone (if available)
        if {'third_down_pct', 'red_zone_pct'}.issubset(season_games.columns):
            features['third_down_pct'] = season_games['third_down_pct'].mean()
            features['red_zone_pct'] = season_games['red_zone_pct'].mean()

        return features

    def _default_features(self) -> Dict:
        """Return default features for new/unknown teams."""
        return {
            'season_win_pct': 0.5,
            'ppg': 25,
            'papg': 25,
            'point_diff_per_game': 0,
            'off_epa': 0,
            'def_epa': 0,
            'total_epa': 0,
            'recent_win_pct': 0.5,
            'recent_ppg': 25,
            'recent_papg': 25,
            'turnover_margin': 0
        }


class MatchupFeatures:
    """
    Generate matchup-specific features.
    """

    @staticmethod
    def generate_differential_features(home_features: Dict,
                                       away_features: Dict) -> Dict:
        """
        Create differential features between teams.

        Parameters:
        -----------
        home_features : Dict
            Home team features
        away_features : Dict
            Away team features

        Returns:
        --------
        Dict : Differential features
        """
        diff_features = {}

        # For each numeric feature, create differential
        for key in home_features:
            if isinstance(home_features[key], (int, float)):
                diff_features[f'{key}_diff'] = (
                    home_features[key] - away_features.get(key, 0)
                )

        # Matchup-specific features
        diff_features['total_quality'] = (
            home_features.get('total_epa', 0) +
            away_features.get('total_epa', 0)
        )

        # Stylistic matchups
        diff_features['offense_matchup'] = (
            home_features.get('off_epa', 0) +
            away_features.get('def_epa', 0)  # Good offense vs bad defense
        )
        diff_features['defense_matchup'] = (
            -home_features.get('def_epa', 0) -
            away_features.get('off_epa', 0)  # Good defense vs good offense
        )

        return diff_features

    @staticmethod
    def generate_head_to_head_features(history: pd.DataFrame,
                                      home_team: str,
                                      away_team: str,
                                      n_games: int = 5) -> Dict:
        """
        Generate head-to-head historical features.

        Parameters:
        -----------
        history : pd.DataFrame
            Historical games between teams
        home_team : str
            Home team name
        away_team : str
            Away team name
        n_games : int
            Number of recent games to consider

        Returns:
        --------
        Dict : Head-to-head features
        """
        # Filter to games between these teams
        h2h = history[
            ((history['home_team'] == home_team) & (history['away_team'] == away_team)) |
            ((history['home_team'] == away_team) & (history['away_team'] == home_team))
        ].tail(n_games)

        if len(h2h) == 0:
            return {
                'h2h_games': 0,
                'h2h_win_pct': 0.5,
                'h2h_avg_margin': 0
            }

        # Calculate win percentage for home team
        home_wins = h2h[
            ((h2h['home_team'] == home_team) & (h2h['home_win'] == 1)) |
            ((h2h['away_team'] == home_team) & (h2h['home_win'] == 0))
        ]

        # Average margin (from home team perspective)
        margins = []
        for _, game in h2h.iterrows():
            if game['home_team'] == home_team:
                margin = game['home_score'] - game['away_score']
            else:
                margin = game['away_score'] - game['home_score']
            margins.append(margin)

        return {
            'h2h_games': len(h2h),
            'h2h_win_pct': len(home_wins) / len(h2h),
            'h2h_avg_margin': np.mean(margins)
        }

18.3.2 Situational Features

class SituationalFeatures:
    """
    Generate situational and contextual features.
    """

    @staticmethod
    def generate_rest_features(home_last_game: pd.Timestamp,
                              away_last_game: pd.Timestamp,
                              game_date: pd.Timestamp) -> Dict:
        """
        Generate rest and scheduling features.

        Parameters:
        -----------
        home_last_game : pd.Timestamp
            Date of home team's last game
        away_last_game : pd.Timestamp
            Date of away team's last game
        game_date : pd.Timestamp
            Date of upcoming game

        Returns:
        --------
        Dict : Rest features
        """
        home_rest = (game_date - home_last_game).days
        away_rest = (game_date - away_last_game).days

        return {
            'home_rest_days': home_rest,
            'away_rest_days': away_rest,
            'rest_advantage': home_rest - away_rest,
            'home_short_rest': 1 if home_rest < 7 else 0,
            'away_short_rest': 1 if away_rest < 7 else 0,
            'home_bye': 1 if home_rest > 10 else 0,
            'away_bye': 1 if away_rest > 10 else 0
        }

    @staticmethod
    def generate_travel_features(home_lat: float, home_lon: float,
                                away_lat: float, away_lon: float) -> Dict:
        """
        Generate travel-related features.

        Uses great circle distance approximation.
        """
        from math import radians, sin, cos, sqrt, atan2

        R = 3959  # Earth radius in miles

        lat1, lon1 = radians(home_lat), radians(home_lon)
        lat2, lon2 = radians(away_lat), radians(away_lon)

        dlat = lat2 - lat1
        dlon = lon2 - lon1

        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * atan2(sqrt(a), sqrt(1-a))
        distance = R * c

        # Time zone difference (simplified)
        tz_diff = abs(home_lon - away_lon) / 15  # Approximate hours

        return {
            'travel_distance': distance,
            'timezone_diff': tz_diff,
            'long_travel': 1 if distance > 1000 else 0,
            'cross_country': 1 if distance > 2000 else 0
        }

    @staticmethod
    def generate_season_context(game_week: int, game_type: str,
                               home_record: Tuple[int, int],
                               away_record: Tuple[int, int]) -> Dict:
        """
        Generate season context features.

        Parameters:
        -----------
        game_week : int
            Week number in season
        game_type : str
            'regular', 'conference_championship', 'bowl', 'playoff'
        home_record : Tuple[int, int]
            Home team (wins, losses)
        away_record : Tuple[int, int]
            Away team (wins, losses)

        Returns:
        --------
        Dict : Context features
        """
        home_wins, home_losses = home_record
        away_wins, away_losses = away_record

        return {
            'game_week': game_week,
            'early_season': 1 if game_week <= 4 else 0,
            'mid_season': 1 if 5 <= game_week <= 9 else 0,
            'late_season': 1 if game_week >= 10 else 0,
            'is_bowl': 1 if game_type == 'bowl' else 0,
            'is_playoff': 1 if game_type == 'playoff' else 0,
            'is_conf_championship': 1 if game_type == 'conference_championship' else 0,
            'home_games_played': home_wins + home_losses,
            'away_games_played': away_wins + away_losses,
            'home_elimination_game': 1 if home_losses >= 3 else 0,  # Simplified
            'away_elimination_game': 1 if away_losses >= 3 else 0
        }
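
A quick sketch (the dates and coordinates are placeholders) shows the rest and travel helpers in isolation:

# Home team played a week ago; away team is coming off a bye
rest = SituationalFeatures.generate_rest_features(
    home_last_game=pd.Timestamp('2024-10-05'),
    away_last_game=pd.Timestamp('2024-09-28'),
    game_date=pd.Timestamp('2024-10-12')
)
print(rest)

# Rough stadium coordinates chosen to represent a long cross-country trip
travel = SituationalFeatures.generate_travel_features(
    home_lat=33.2, home_lon=-87.5,
    away_lat=47.7, away_lon=-122.3
)
print(travel)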

18.4 Machine Learning Models for Game Prediction

18.4.1 Complete Prediction Pipeline

from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV


class GamePredictionPipeline:
    """
    Complete machine learning pipeline for game prediction.
    """

    def __init__(self):
        self.feature_generators = {
            'team_strength': TeamStrengthFeatures(),
            'matchup': MatchupFeatures,
            'situational': SituationalFeatures
        }
        self.scaler = StandardScaler()
        self.model = None
        self.feature_columns = None
        self.is_fitted = False

    def prepare_game_features(self, game: Dict,
                             team_history: pd.DataFrame) -> pd.DataFrame:
        """
        Prepare all features for a single game.

        Parameters:
        -----------
        game : Dict
            Game information
        team_history : pd.DataFrame
            Historical team performance data

        Returns:
        --------
        pd.DataFrame : Feature row
        """
        game_date = pd.Timestamp(game['date'])

        # Team strength features
        home_strength = self.feature_generators['team_strength'].generate_features(
            team_history[team_history['team'] == game['home_team']],
            game_date
        )
        away_strength = self.feature_generators['team_strength'].generate_features(
            team_history[team_history['team'] == game['away_team']],
            game_date
        )

        # Differential features
        diff_features = MatchupFeatures.generate_differential_features(
            home_strength, away_strength
        )

        # Combine all features
        features = {**diff_features}
        features['home_field'] = 1

        return pd.DataFrame([features])

    def prepare_training_data(self, games: pd.DataFrame,
                             team_history: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
        """
        Prepare training data from historical games.

        Parameters:
        -----------
        games : pd.DataFrame
            Games to prepare (with home_win labels)
        team_history : pd.DataFrame
            Team performance history

        Returns:
        --------
        Tuple[pd.DataFrame, pd.Series] : Features and labels
        """
        all_features = []

        for _, game in games.iterrows():
            features = self.prepare_game_features(game.to_dict(), team_history)
            all_features.append(features)

        X = pd.concat(all_features, ignore_index=True)
        y = games['home_win']

        self.feature_columns = X.columns.tolist()

        return X, y

    def fit(self, X: pd.DataFrame, y: pd.Series,
           model_type: str = 'gradient_boosting',
           calibrate: bool = True) -> 'GamePredictionPipeline':
        """
        Fit the prediction model.

        Parameters:
        -----------
        X : pd.DataFrame
            Feature matrix
        y : pd.Series
            Labels
        model_type : str
            'logistic', 'random_forest', or 'gradient_boosting'
        calibrate : bool
            Whether to calibrate probabilities

        Returns:
        --------
        self : Fitted pipeline
        """
        # Scale features
        X_scaled = self.scaler.fit_transform(X)

        # Select base model
        if model_type == 'logistic':
            base_model = LogisticRegression(max_iter=1000, random_state=42)
        elif model_type == 'random_forest':
            base_model = RandomForestClassifier(
                n_estimators=200, max_depth=6, random_state=42
            )
        elif model_type == 'gradient_boosting':
            base_model = GradientBoostingClassifier(
                n_estimators=200, max_depth=4, learning_rate=0.05, random_state=42
            )
        else:
            raise ValueError(f"Unknown model type: {model_type}")

        # Calibrate if requested
        if calibrate:
            self.model = CalibratedClassifierCV(base_model, cv=5, method='isotonic')
        else:
            self.model = base_model

        self.model.fit(X_scaled, y)
        self.is_fitted = True

        return self

    def predict(self, X: pd.DataFrame) -> Dict:
        """
        Generate prediction for a game.

        Parameters:
        -----------
        X : pd.DataFrame
            Game features (single row)

        Returns:
        --------
        Dict : Prediction components (win probability, spread, total)
        """
        if not self.is_fitted:
            raise ValueError("Pipeline not fitted")

        X_scaled = self.scaler.transform(X)

        prob = self.model.predict_proba(X_scaled)[0, 1]
        spread = probability_to_spread(prob)

        # Estimate total (simplified placeholder: league-average total)
        total = 55

        return {
            'home_win_prob': prob,
            'predicted_spread': spread,
            'predicted_total': total
        }

    def cross_validate(self, X: pd.DataFrame, y: pd.Series,
                      cv: int = 5) -> Dict:
        """
        Cross-validate the model.

        Parameters:
        -----------
        X : pd.DataFrame
            Features
        y : pd.Series
            Labels
        cv : int
            Number of folds

        Returns:
        --------
        Dict : CV results
        """
        X_scaled = self.scaler.fit_transform(X)

        # Use TimeSeriesSplit for temporal data
        tscv = TimeSeriesSplit(n_splits=cv)

        accuracy_scores = cross_val_score(
            self.model, X_scaled, y, cv=tscv, scoring='accuracy'
        )
        auc_scores = cross_val_score(
            self.model, X_scaled, y, cv=tscv, scoring='roc_auc'
        )

        return {
            'cv_accuracy_mean': accuracy_scores.mean(),
            'cv_accuracy_std': accuracy_scores.std(),
            'cv_auc_mean': auc_scores.mean(),
            'cv_auc_std': auc_scores.std()
        }

18.4.2 Ensemble Methods

Combining multiple models often produces better predictions than any single model:

class EnsemblePredictor:
    """
    Ensemble predictor combining multiple models.
    """

    def __init__(self):
        self.models = {}
        self.weights = {}
        self.scaler = StandardScaler()
        self.is_fitted = False

    def add_model(self, name: str, model, weight: float = 1.0):
        """Add a model to the ensemble."""
        self.models[name] = model
        self.weights[name] = weight

    def add_default_models(self):
        """Add default ensemble of models."""
        self.models = {
            'logistic': LogisticRegression(max_iter=1000, random_state=42),
            'random_forest': RandomForestClassifier(
                n_estimators=200, max_depth=6, random_state=42
            ),
            'gradient_boosting': GradientBoostingClassifier(
                n_estimators=200, max_depth=4, learning_rate=0.05, random_state=42
            )
        }
        # Equal weights by default
        self.weights = {name: 1.0 for name in self.models}

    def fit(self, X: pd.DataFrame, y: pd.Series,
           optimize_weights: bool = True) -> 'EnsemblePredictor':
        """
        Fit all models in the ensemble.

        Parameters:
        -----------
        X : pd.DataFrame
            Features
        y : pd.Series
            Labels
        optimize_weights : bool
            Whether to optimize weights based on CV performance

        Returns:
        --------
        self : Fitted ensemble
        """
        X_scaled = self.scaler.fit_transform(X)

        cv_scores = {}

        for name, model in self.models.items():
            print(f"Fitting {name}...")

            # Cross-validate to get weight
            if optimize_weights:
                scores = cross_val_score(model, X_scaled, y, cv=5, scoring='roc_auc')
                cv_scores[name] = scores.mean()

            # Fit on full data
            model.fit(X_scaled, y)

        # Optimize weights based on CV performance
        if optimize_weights:
            total_score = sum(cv_scores.values())
            self.weights = {
                name: score / total_score
                for name, score in cv_scores.items()
            }

            print("\nOptimized weights:")
            for name, weight in self.weights.items():
                print(f"  {name}: {weight:.3f}")

        self.is_fitted = True
        return self

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        """
        Get ensemble probability predictions.

        Uses weighted average of model predictions.
        """
        if not self.is_fitted:
            raise ValueError("Ensemble not fitted")

        X_scaled = self.scaler.transform(X)

        weighted_probs = np.zeros(len(X))
        total_weight = sum(self.weights.values())

        for name, model in self.models.items():
            probs = model.predict_proba(X_scaled)[:, 1]
            weighted_probs += self.weights[name] * probs

        return weighted_probs / total_weight

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        """Get binary predictions."""
        probs = self.predict_proba(X)
        return (probs > 0.5).astype(int)

    def evaluate(self, X: pd.DataFrame, y: pd.Series) -> Dict:
        """Evaluate ensemble performance."""
        from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss

        probs = self.predict_proba(X)
        preds = (probs > 0.5).astype(int)

        # Ensemble metrics
        ensemble_metrics = {
            'ensemble_accuracy': accuracy_score(y, preds),
            'ensemble_auc': roc_auc_score(y, probs),
            'ensemble_brier': brier_score_loss(y, probs)
        }

        # Individual model metrics
        X_scaled = self.scaler.transform(X)
        for name, model in self.models.items():
            model_probs = model.predict_proba(X_scaled)[:, 1]
            model_preds = (model_probs > 0.5).astype(int)

            ensemble_metrics[f'{name}_accuracy'] = accuracy_score(y, model_preds)
            ensemble_metrics[f'{name}_auc'] = roc_auc_score(y, model_probs)

        return ensemble_metrics
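
The sketch below fits the default ensemble on a small synthetic feature matrix purely to demonstrate the mechanics; with two made-up columns and noisy labels it is not a meaningful benchmark, and in practice you would evaluate on held-out games rather than the training set.

# Synthetic differential features (purely illustrative)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'point_diff_per_game_diff': rng.normal(0, 10, 400),
    'recent_win_pct_diff': rng.normal(0, 0.3, 400),
})
# Outcomes loosely driven by the first feature plus game-level noise
y_demo = pd.Series(
    (X_demo['point_diff_per_game_diff'] + rng.normal(0, 12, 400) > 0).astype(int)
)

ensemble = EnsemblePredictor()
ensemble.add_default_models()
ensemble.fit(X_demo, y_demo, optimize_weights=True)
print(ensemble.evaluate(X_demo, y_demo))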

18.5 Evaluating Game Predictions

18.5.1 Metrics for Game Prediction

class GamePredictionEvaluator:
    """
    Comprehensive evaluation for game predictions.
    """

    @staticmethod
    def evaluate_accuracy(y_true: np.ndarray, y_pred: np.ndarray,
                         y_prob: np.ndarray) -> Dict:
        """
        Evaluate prediction accuracy.

        Parameters:
        -----------
        y_true : np.ndarray
            Actual outcomes (0/1)
        y_pred : np.ndarray
            Predicted outcomes (0/1)
        y_prob : np.ndarray
            Predicted probabilities

        Returns:
        --------
        Dict : Accuracy metrics
        """
        from sklearn.metrics import (
            accuracy_score, precision_score, recall_score,
            f1_score, roc_auc_score, brier_score_loss, log_loss
        )

        return {
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred),
            'recall': recall_score(y_true, y_pred),
            'f1': f1_score(y_true, y_pred),
            'auc_roc': roc_auc_score(y_true, y_prob),
            'brier_score': brier_score_loss(y_true, y_prob),
            'log_loss': log_loss(y_true, y_prob)
        }

    @staticmethod
    def evaluate_by_confidence(y_true: np.ndarray, y_prob: np.ndarray,
                               n_bins: int = 5) -> pd.DataFrame:
        """
        Evaluate accuracy by prediction confidence.

        Parameters:
        -----------
        y_true : np.ndarray
            Actual outcomes
        y_prob : np.ndarray
            Predicted probabilities
        n_bins : int
            Number of confidence bins

        Returns:
        --------
        pd.DataFrame : Accuracy by confidence level
        """
        # Calculate confidence as distance from 0.5
        confidence = np.abs(y_prob - 0.5) * 2  # 0 to 1 scale

        # Create bins
        bins = np.linspace(0, 1, n_bins + 1)
        bin_labels = [f'{bins[i]:.0%}-{bins[i+1]:.0%}' for i in range(n_bins)]

        results = []
        for i in range(n_bins):
            mask = (confidence >= bins[i]) & (confidence < bins[i+1])
            if mask.sum() > 0:
                pred_binary = (y_prob[mask] > 0.5).astype(int)
                accuracy = (pred_binary == y_true[mask]).mean()

                results.append({
                    'confidence_bin': bin_labels[i],
                    'n_games': mask.sum(),
                    'accuracy': accuracy,
                    'avg_confidence': confidence[mask].mean()
                })

        return pd.DataFrame(results)

    @staticmethod
    def compare_to_baseline(y_true: np.ndarray, y_pred: np.ndarray,
                           y_prob: np.ndarray) -> Dict:
        """
        Compare model to various baselines.

        Parameters:
        -----------
        y_true : np.ndarray
            Actual outcomes
        y_pred : np.ndarray
            Model predictions
        y_prob : np.ndarray
            Model probabilities

        Returns:
        --------
        Dict : Comparison metrics
        """
        # Baselines
        home_rate = y_true.mean()
        home_always = home_rate
        away_always = 1 - home_rate
        coin_flip = 0.5

        model_accuracy = (y_pred == y_true).mean()

        # Information gain metrics
        from sklearn.metrics import brier_score_loss

        model_brier = brier_score_loss(y_true, y_prob)
        baseline_brier = brier_score_loss(y_true, np.full_like(y_prob, home_rate))

        return {
            'model_accuracy': model_accuracy,
            'home_always_baseline': home_always,
            'coin_flip_baseline': coin_flip,
            'improvement_over_home': model_accuracy - home_always,
            'improvement_over_coin': model_accuracy - coin_flip,
            'model_brier': model_brier,
            'baseline_brier': baseline_brier,
            'brier_skill_score': 1 - (model_brier / baseline_brier)
        }

    @staticmethod
    def evaluate_against_spread(y_true_margin: np.ndarray,
                               predicted_spread: np.ndarray,
                               market_spread: np.ndarray) -> Dict:
        """
        Evaluate predictions against the spread.

        Parameters:
        -----------
        y_true_margin : np.ndarray
            Actual game margins (home - away)
        predicted_spread : np.ndarray
            Model predicted spreads
        market_spread : np.ndarray
            Market/Vegas spreads

        Returns:
        --------
        Dict : ATS metrics
        """
        # Model covers when prediction is on correct side of market
        model_pick = predicted_spread > market_spread
        actual_cover = y_true_margin > market_spread

        model_ats_accuracy = (model_pick == actual_cover).mean()

        # MAE comparison
        model_mae = np.abs(predicted_spread - y_true_margin).mean()
        market_mae = np.abs(market_spread - y_true_margin).mean()

        return {
            'model_ats_accuracy': model_ats_accuracy,
            'model_mae': model_mae,
            'market_mae': market_mae,
            'mae_vs_market': model_mae - market_mae,
            'games_evaluated': len(y_true_margin)
        }

18.5.2 Calibration Assessment

def assess_calibration(y_true: np.ndarray, y_prob: np.ndarray,
                      n_bins: int = 10) -> Dict:
    """
    Comprehensive calibration assessment.

    Parameters:
    -----------
    y_true : np.ndarray
        Actual outcomes
    y_prob : np.ndarray
        Predicted probabilities
    n_bins : int
        Number of calibration bins

    Returns:
    --------
    Dict : Calibration metrics and data
    """
    from sklearn.calibration import calibration_curve
    from sklearn.metrics import brier_score_loss

    # Calculate calibration curve
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)

    # Expected Calibration Error (ECE)
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0
    mce = 0  # Maximum Calibration Error

    calibration_data = []

    for i in range(n_bins):
        low, high = bin_boundaries[i], bin_boundaries[i+1]
        mask = (y_prob >= low) & (y_prob < high)

        if mask.sum() > 0:
            bin_accuracy = y_true[mask].mean()
            bin_confidence = y_prob[mask].mean()
            bin_size = mask.sum()

            error = abs(bin_accuracy - bin_confidence)
            ece += (bin_size / len(y_prob)) * error
            mce = max(mce, error)

            calibration_data.append({
                'bin': f'{low:.1f}-{high:.1f}',
                'n_samples': bin_size,
                'avg_confidence': bin_confidence,
                'actual_accuracy': bin_accuracy,
                'calibration_error': error
            })

    return {
        'ece': ece,
        'mce': mce,
        'calibration_curve': (prob_true, prob_pred),
        'calibration_data': pd.DataFrame(calibration_data),
        'brier_score': brier_score_loss(y_true, y_prob)
    }


import matplotlib.pyplot as plt


def plot_calibration_and_distribution(y_true: np.ndarray,
                                      y_prob: np.ndarray,
                                      title: str = 'Model Calibration') -> plt.Figure:
    """
    Plot calibration curve with prediction distribution.
    """
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10),
                                   gridspec_kw={'height_ratios': [2, 1]})

    # Calibration plot
    from sklearn.calibration import calibration_curve
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)

    ax1.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
    ax1.plot(prob_pred, prob_true, 's-', label='Model')

    ax1.set_xlabel('Mean Predicted Probability')
    ax1.set_ylabel('Fraction of Positives')
    ax1.set_title(title)
    ax1.legend(loc='lower right')
    ax1.grid(True, alpha=0.3)

    # Distribution of predictions
    ax2.hist(y_prob, bins=50, edgecolor='black', alpha=0.7)
    ax2.axvline(x=0.5, color='red', linestyle='--', label='Decision threshold')
    ax2.set_xlabel('Predicted Probability')
    ax2.set_ylabel('Count')
    ax2.set_title('Distribution of Predictions')
    ax2.legend()

    fig.tight_layout()
    return fig

18.6 Production Deployment

18.6.1 Complete Prediction System

class ProductionGamePredictor:
    """
    Production-ready game prediction system.

    Combines Elo ratings, ML features, and ensemble prediction.
    """

    def __init__(self, config: Dict = None):
        """
        Initialize production predictor.

        Parameters:
        -----------
        config : Dict
            Configuration options
        """
        self.config = config or self._default_config()

        # Components
        self.elo_system = EloRatingSystem(
            k_factor=self.config['elo_k_factor'],
            home_advantage=self.config['elo_home_advantage']
        )
        self.feature_pipeline = GamePredictionPipeline()
        self.ensemble = EnsemblePredictor()

        self.is_trained = False
        self.training_metrics = None

    def _default_config(self) -> Dict:
        """Default configuration."""
        return {
            'elo_k_factor': 20,
            'elo_home_advantage': 65,
            'elo_weight': 0.3,
            'ml_weight': 0.7,
            'calibration_method': 'isotonic',
            'min_games_for_prediction': 3
        }

    def train(self, historical_games: pd.DataFrame,
             team_history: pd.DataFrame,
             test_season: int = 2023) -> Dict:
        """
        Train the complete system.

        Parameters:
        -----------
        historical_games : pd.DataFrame
            Game results with scores
        team_history : pd.DataFrame
            Team performance data
        test_season : int
            Season to hold out for testing

        Returns:
        --------
        Dict : Training results
        """
        print("=" * 60)
        print("TRAINING GAME PREDICTION SYSTEM")
        print("=" * 60)

        # 1. Build Elo ratings from history
        print("\n1. Building Elo ratings...")
        train_games = historical_games[historical_games['season'] < test_season]

        for _, game in train_games.iterrows():
            self.elo_system.update_ratings(
                game['home_team'], game['away_team'],
                game['home_score'], game['away_score']
            )
        print(f"   Processed {len(train_games)} games")

        # 2. Prepare ML features
        print("\n2. Preparing ML features...")
        X_train, y_train = self.feature_pipeline.prepare_training_data(
            train_games, team_history
        )
        print(f"   Generated {len(X_train.columns)} features")

        # 3. Train ensemble
        print("\n3. Training ensemble models...")
        self.ensemble.add_default_models()
        self.ensemble.fit(X_train, y_train, optimize_weights=True)

        # 4. Evaluate on test set
        print("\n4. Evaluating on test set...")
        test_games = historical_games[historical_games['season'] >= test_season]
        X_test, y_test = self.feature_pipeline.prepare_training_data(
            test_games, team_history
        )

        self.training_metrics = self.ensemble.evaluate(X_test, y_test)

        print(f"\n   Test Results:")
        print(f"   Accuracy: {self.training_metrics['ensemble_accuracy']:.1%}")
        print(f"   AUC-ROC: {self.training_metrics['ensemble_auc']:.3f}")
        print(f"   Brier Score: {self.training_metrics['ensemble_brier']:.4f}")

        self.is_trained = True

        return self.training_metrics

    def predict_game(self, home_team: str, away_team: str,
                    game_context: Dict = None) -> GamePrediction:
        """
        Generate prediction for an upcoming game.

        Parameters:
        -----------
        home_team : str
            Home team name
        away_team : str
            Away team name
        game_context : Dict
            Additional context (date, venue, etc.)

        Returns:
        --------
        GamePrediction : Complete prediction
        """
        if not self.is_trained:
            raise ValueError("System not trained. Call train() first.")

        game_context = game_context or {}

        # Get Elo-based prediction
        elo_prob = self.elo_system.expected_win_prob(home_team, away_team)

        # For production, we would generate features from current team data
        # Here we'll combine with Elo prediction

        # Weighted combination (simplified for demo)
        elo_weight = self.config['elo_weight']
        ml_weight = self.config['ml_weight']

        # In production, would get ML prediction from current features
        # For now, use Elo as proxy
        combined_prob = elo_prob  # Simplified

        # Calculate spread and confidence
        predicted_spread = probability_to_spread(combined_prob)

        # Confidence based on rating difference
        home_rating = self.elo_system.get_rating(home_team)
        away_rating = self.elo_system.get_rating(away_team)
        rating_diff = abs(home_rating - away_rating)
        confidence = min(rating_diff / 400, 1.0)  # Normalize

        return GamePrediction(
            game_id=f"{home_team}_{away_team}_{game_context.get('date', 'unknown')}",
            home_team=home_team,
            away_team=away_team,
            home_win_prob=combined_prob,
            predicted_spread=predicted_spread,
            predicted_total=55,  # Simplified
            confidence=confidence,
            model_name='production_ensemble'
        )

    def predict_week(self, games: List[Dict]) -> pd.DataFrame:
        """
        Generate predictions for a week of games.

        Parameters:
        -----------
        games : List[Dict]
            List of game dictionaries with home_team, away_team

        Returns:
        --------
        pd.DataFrame : All predictions
        """
        predictions = []

        for game in games:
            pred = self.predict_game(
                game['home_team'],
                game['away_team'],
                game
            )
            predictions.append(pred.to_dict())

        df = pd.DataFrame(predictions)
        df = df.sort_values('home_win_prob', ascending=False)

        return df
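
A sketch of a driver script for the full system, assuming hypothetical games.csv and team_history.csv files that supply the columns used above (season, date, home_team, away_team, home_score, away_score, home_win, plus per-team performance history):

# Hypothetical data files; substitute whatever source feeds your pipeline
games = pd.read_csv('games.csv', parse_dates=['date'])
team_history = pd.read_csv('team_history.csv', parse_dates=['date'])

predictor = ProductionGamePredictor()
predictor.train(games, team_history, test_season=2023)

upcoming = [
    {'home_team': 'Michigan', 'away_team': 'Ohio State', 'date': '2024-11-30'},
    {'home_team': 'Auburn', 'away_team': 'Alabama', 'date': '2024-11-30'},
]
print(predictor.predict_week(upcoming))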

Summary

Game outcome prediction represents a sophisticated application of machine learning to sports analytics. Key takeaways from this chapter:

  1. Multiple Approaches: Effective prediction combines rating systems (Elo), statistical features, and machine learning models. No single approach dominates across all situations.

  2. Feature Engineering is Critical: The quality of features determines model performance more than algorithm choice. Focus on capturing team strength, matchup dynamics, and situational factors.

  3. Calibration Matters: Well-calibrated probabilities enable better decision-making. Always assess calibration alongside accuracy.

  4. Respect Uncertainty: Football is inherently unpredictable. Even perfect models would achieve only ~75-78% accuracy due to game variance.

  5. Evaluation Against Baselines: Always compare to meaningful baselines (home team advantage, market lines) to assess true model value.

  6. Temporal Validation: Use proper temporal splits to avoid data leakage and assess realistic performance.

  7. Ensemble Methods: Combining diverse models typically outperforms any single approach.

The next chapter extends these techniques to player performance forecasting, predicting individual player outcomes throughout a season.


Chapter 18 Exercises

See exercises.md for practice problems ranging from basic Elo implementation to building complete prediction systems.

Chapter 18 Code Examples

  • example-01-elo-system.py: Complete Elo rating implementation
  • example-02-feature-engineering.py: Advanced feature generation
  • example-03-ml-pipeline.py: Full prediction pipeline with ensemble
  • example-04-evaluation.py: Comprehensive model evaluation