Chapter 18: Exercises - Game Outcome Prediction
Exercise Overview
These exercises build progressively from conceptual foundations and basic rating systems to complete game prediction pipelines.
Level 1: Conceptual Foundation
Exercise 1.1: Understanding Prediction Targets
Task: Match each prediction target with its appropriate evaluation metric.
| Prediction Target | Primary Metric |
|---|---|
| Win Probability | A. Mean Absolute Error |
| Point Spread | B. Brier Score |
| Total Points | C. Against-the-Spread % |
Questions:
1. Why is accuracy alone insufficient for evaluating win probability models?
2. What makes predicting the point spread different from predicting the winner?
3. How does the relationship between spread and win probability help model evaluation?
Exercise 1.2: Baseline Calculations
Task: Given the following data from a college football season, calculate baseline accuracies.
- Total games: 800
- Home team wins: 472
- Average margin (home - away): +3.2 points

Calculate:
1. Always-pick-home baseline accuracy
2. Random-guess baseline accuracy
3. Majority-class baseline accuracy
4. If a model achieves 62% accuracy, what is the improvement over each baseline?
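One way to check your hand calculations is to script them. This sketch uses only the figures given above and reports improvements in percentage points (an assumption; you may also want relative improvement). Because home teams won more than half of the games, the majority-class baseline coincides with always picking the home team.

```python
# Figures from Exercise 1.2
total_games = 800
home_wins = 472
model_accuracy = 0.62

home_win_rate = home_wins / total_games  # fraction of games won by the home team

baselines = {
    "always_home": home_win_rate,
    "random_guess": 0.5,  # expected accuracy of a fair coin flip
    "majority_class": max(home_win_rate, 1 - home_win_rate),
}

# Improvement of the model over each baseline, in percentage points
improvement_pts = {name: 100 * (model_accuracy - acc) for name, acc in baselines.items()}

for name, pts in improvement_pts.items():
    print(f"{name}: baseline {baselines[name]:.1%}, model improves by {pts:.1f} pts")
```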
Exercise 1.3: Spread-Probability Conversion
Task: Using a standard deviation of 13.5 points for game margins (negative spreads mean the home team is favored):
1. Convert these spreads to win probabilities:
   - Home -7
   - Home +3
   - Home -14
   - Pick'em (0)
2. Convert these win probabilities to implied spreads:
   - 65% home win probability
   - 80% home win probability
   - 45% home win probability
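Assuming the final margin is normally distributed around the expected margin (the negative of the home spread s) with σ = 13.5, the conversions are P(home win) = Φ(-s / σ) and s = -σ · Φ⁻¹(p). A minimal sketch using the standard library's `statistics.NormalDist`; the function names are illustrative:

```python
from statistics import NormalDist

SIGMA = 13.5  # standard deviation of game margins, from Exercise 1.3
STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def spread_to_prob(home_spread: float, sigma: float = SIGMA) -> float:
    """Home win probability for a home spread (negative = home favored)."""
    expected_margin = -home_spread
    return STD_NORMAL.cdf(expected_margin / sigma)


def prob_to_spread(home_win_prob: float, sigma: float = SIGMA) -> float:
    """Implied home spread (negative = home favored) for a win probability."""
    expected_margin = sigma * STD_NORMAL.inv_cdf(home_win_prob)
    return -expected_margin


for spread in (-7, 3, -14, 0):
    print(f"Home {spread:+d}: P(home win) = {spread_to_prob(spread):.3f}")
for p in (0.65, 0.80, 0.45):
    print(f"{p:.0%}: implied home spread = {prob_to_spread(p):+.1f}")
```

The two functions are inverses of each other, which makes a convenient sanity check on your answers.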
Level 2: Building Rating Systems
Exercise 2.1: Simple Elo Implementation
Task: Implement a basic Elo rating system.
```python
from typing import Tuple


class SimpleElo:
    """
    Implement basic Elo rating functionality.

    Requirements:
    1. Initialize all teams at 1500
    2. K-factor of 20
    3. No home field advantage (yet)
    """

    def __init__(self, k_factor: float = 20):
        # YOUR CODE HERE
        pass

    def expected_score(self, rating_a: float, rating_b: float) -> float:
        """
        Calculate expected score for team A.

        Use the standard Elo formula: E = 1 / (1 + 10^((Rb - Ra) / 400))
        """
        # YOUR CODE HERE
        pass

    def update_rating(self, rating: float, expected: float,
                      actual: float, k: float) -> float:
        """
        Update a single rating.

        New rating = Old rating + K * (Actual - Expected)
        """
        # YOUR CODE HERE
        pass

    def process_game(self, team_a: str, team_b: str,
                     a_won: bool) -> Tuple[float, float]:
        """
        Process a game result and return new ratings for both teams.

        Parameters:
        -----------
        team_a, team_b : str
            Team names
        a_won : bool
            True if team_a won

        Returns:
        --------
        Tuple[float, float] : New ratings for (team_a, team_b)
        """
        # YOUR CODE HERE
        pass
```
Test your implementation:
```python
# Test case
elo = SimpleElo()

# Team A beats Team B (both start at 1500)
new_a, new_b = elo.process_game("TeamA", "TeamB", True)
print(f"TeamA: {new_a:.1f}, TeamB: {new_b:.1f}")
# Expected: TeamA: 1510.0, TeamB: 1490.0 (each side moves K * (1 - 0.5) = 10 points)
```
Exercise 2.2: Adding Home Field Advantage
Task: Extend your Elo system to include home field advantage.
```python
class EloWithHFA(SimpleElo):
    """
    Extend SimpleElo with home field advantage.

    Home team gets a rating boost when calculating expected score.
    Standard CFB HFA is about 60-70 Elo points (~3 actual points).
    """

    def __init__(self, k_factor: float = 20, hfa: float = 65):
        super().__init__(k_factor)
        self.hfa = hfa

    def process_game(self, home_team: str, away_team: str,
                     home_won: bool) -> Tuple[float, float]:
        """
        Process game with home field advantage.

        The expected score calculation should add HFA to the home team's rating.
        The actual rating update should NOT include HFA.
        """
        # YOUR CODE HERE
        pass
```
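The "60-70 Elo points ≈ 3 points" rule of thumb can be checked by chaining two models from this chapter: treat the Elo expected score as a win probability, then map it to a margin with the normal model from Exercise 1.3 (σ = 13.5). This conversion is an assumption used for illustration, not the only way to relate Elo to points.

```python
from statistics import NormalDist


def elo_win_prob(elo_diff: float) -> float:
    """Standard Elo expected score for a team rated elo_diff points higher."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))


def elo_diff_to_points(elo_diff: float, sigma: float = 13.5) -> float:
    """Expected point margin implied by an Elo gap, assuming normally distributed margins."""
    return sigma * NormalDist().inv_cdf(elo_win_prob(elo_diff))


for hfa in (60, 65, 70):
    print(f"HFA {hfa} Elo -> {elo_diff_to_points(hfa):.2f} points")
```

Under these assumptions, 60-70 Elo points corresponds to roughly 2.9 to 3.4 points, consistent with the docstring above.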
Exercise 2.3: Margin-Based Ratings
Task: Implement a simple margin-based rating system.
```python
from typing import Dict


class SimpleMarginRating:
    """
    Rating system that accounts for margin of victory.

    Fit ratings with regularized (ridge) least squares regression:
        margin = home_rating - away_rating + home_advantage + error
    """

    def __init__(self, home_advantage: float = 3.0):
        self.home_advantage = home_advantage
        self.games = []
        self.ratings = {}

    def add_game(self, home_team: str, away_team: str,
                 home_score: int, away_score: int):
        """Record a game result."""
        # YOUR CODE HERE
        pass

    def calculate_ratings(self) -> Dict[str, float]:
        """
        Calculate ratings using ridge regression. (Plain least squares has
        no unique solution: adding the same constant to every rating leaves
        every predicted margin unchanged.)

        Hint: Create a design matrix where each row is a game:
        - Column for each team (+1 for home, -1 for away, 0 otherwise)
        - Target is (margin - home_advantage)
        """
        # YOUR CODE HERE
        pass

    def predict_margin(self, home_team: str, away_team: str) -> float:
        """Predict margin for a matchup."""
        # YOUR CODE HERE
        pass
```
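The hint above translates into a few lines of NumPy. This sketch builds the +1/-1 design matrix and solves the closed-form ridge system (XᵀX + αI)β = Xᵀy. The regularization strength `alpha` and the tuple format for games are assumptions. A side effect of ridge here is that the fitted ratings sum to zero, because every row of X sums to zero.

```python
from typing import Dict, List, Tuple

import numpy as np

Game = Tuple[str, str, int, int]  # (home_team, away_team, home_score, away_score)


def fit_margin_ratings(games: List[Game], home_advantage: float = 3.0,
                       alpha: float = 1.0) -> Dict[str, float]:
    teams = sorted({team for game in games for team in game[:2]})
    col = {team: i for i, team in enumerate(teams)}

    X = np.zeros((len(games), len(teams)))
    y = np.zeros(len(games))
    for row, (home, away, home_score, away_score) in enumerate(games):
        X[row, col[home]] = 1.0    # home team column
        X[row, col[away]] = -1.0   # away team column
        y[row] = (home_score - away_score) - home_advantage

    # Closed-form ridge solution: (X^T X + alpha * I) beta = X^T y
    beta = np.linalg.solve(X.T @ X + alpha * np.eye(len(teams)), X.T @ y)
    return {team: float(b) for team, b in zip(teams, beta)}


ratings = fit_margin_ratings([("A", "B", 20, 7), ("B", "C", 20, 7)])
print(ratings)
```

With `alpha=1.0`, the two 13-point home wins (10 points above the assumed home advantage) produce ratings shrunk toward zero relative to an unregularized fit; larger `alpha` shrinks more.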
Level 3: Feature Engineering
Exercise 3.1: Team Strength Features
Task: Create a feature generator for team strength metrics.
```python
from typing import Dict

import pandas as pd


class TeamFeatureGenerator:
    """
    Generate team strength features from game history.
    """

    def __init__(self, lookback_games: int = 5):
        self.lookback_games = lookback_games

    def generate(self, team: str, games: pd.DataFrame,
                 cutoff_date: pd.Timestamp) -> Dict[str, float]:
        """
        Generate features for a team using only games before cutoff_date.

        Required features:
        1. season_win_pct - Season win percentage
        2. ppg - Points per game
        3. papg - Points allowed per game
        4. recent_win_pct - Win % in last N games
        5. recent_point_diff - Point differential in last N games
        6. home_win_pct - Win % in home games
        7. away_win_pct - Win % in away games

        Parameters:
        -----------
        team : str
            Team name
        games : pd.DataFrame
            Games with columns: date, home_team, away_team,
            home_score, away_score, season
        cutoff_date : pd.Timestamp
            Only use games before this date

        Returns:
        --------
        Dict[str, float] : Feature dictionary
        """
        # YOUR CODE HERE
        pass
```
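A useful first step is to reshape the home/away game table into one row per game from the team's own perspective, filtering strictly before `cutoff_date` so no future information leaks in. The sketch below assumes that "season" means the season of the team's most recent game before the cutoff, that the last-N window may span seasons, that `recent_point_diff` is a total rather than a per-game average, and that a team with no history gets an empty dictionary; all of these are choices you may want to change.

```python
from typing import Dict

import pandas as pd


def team_games_before(team: str, games: pd.DataFrame, cutoff_date: pd.Timestamp) -> pd.DataFrame:
    """One row per game played by `team` strictly before cutoff_date, from that team's view."""
    involved = (games["home_team"] == team) | (games["away_team"] == team)
    past = games[(games["date"] < cutoff_date) & involved]
    is_home = past["home_team"] == team
    view = pd.DataFrame({
        "date": past["date"],
        "season": past["season"],
        "is_home": is_home,
        "pts_for": past["home_score"].where(is_home, past["away_score"]),
        "pts_against": past["away_score"].where(is_home, past["home_score"]),
    }).sort_values("date")
    view["won"] = view["pts_for"] > view["pts_against"]
    return view


def team_features_sketch(team: str, games: pd.DataFrame, cutoff_date: pd.Timestamp,
                         lookback_games: int = 5) -> Dict[str, float]:
    view = team_games_before(team, games, cutoff_date)
    if view.empty:
        return {}  # assumption: no history -> no features
    season = view[view["season"] == view["season"].iloc[-1]]  # most recent season only
    recent = view.tail(lookback_games)                         # may span seasons
    home, away = season[season["is_home"]], season[~season["is_home"]]
    nan = float("nan")
    return {
        "season_win_pct": float(season["won"].mean()),
        "ppg": float(season["pts_for"].mean()),
        "papg": float(season["pts_against"].mean()),
        "recent_win_pct": float(recent["won"].mean()),
        "recent_point_diff": float((recent["pts_for"] - recent["pts_against"]).sum()),
        "home_win_pct": float(home["won"].mean()) if len(home) else nan,
        "away_win_pct": float(away["won"].mean()) if len(away) else nan,
    }


# Hypothetical example data in the column layout described above
example_games = pd.DataFrame({
    "date": pd.to_datetime(["2023-09-02", "2023-09-09", "2023-09-16", "2023-09-23"]),
    "home_team": ["A", "B", "A", "C"],
    "away_team": ["B", "A", "C", "A"],
    "home_score": [28, 21, 35, 10],
    "away_score": [14, 24, 7, 17],
    "season": [2023, 2023, 2023, 2023],
})
print(team_features_sketch("A", example_games, pd.Timestamp("2023-09-20")))
```

The game on 2023-09-23 falls after the cutoff and is ignored, which is exactly the leakage protection the docstring asks for.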
Exercise 3.2: Matchup Features
Task: Create differential features from team features.
```python
from typing import Dict


def create_matchup_features(home_features: Dict[str, float],
                            away_features: Dict[str, float]) -> Dict[str, float]:
    """
    Create matchup features from individual team features.

    Required features:
    1. All differential features (home - away)
    2. total_quality - Sum of both teams' point differentials
    3. pace_matchup - Combined scoring rates
    4. home_field - Always 1 (indicator)

    Parameters:
    -----------
    home_features : Dict
        Home team feature dictionary
    away_features : Dict
        Away team feature dictionary

    Returns:
    --------
    Dict[str, float] : Matchup features
    """
    # YOUR CODE HERE
    pass
```
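A minimal sketch of the pattern: take home-minus-away differences for every key both dictionaries share, then add the aggregate features. It assumes the feature names from Exercise 3.1, interprets "point differential" as `recent_point_diff` and "combined scoring rate" as the sum of both teams' `ppg`; other interpretations are equally valid.

```python
from typing import Dict


def matchup_features_sketch(home: Dict[str, float], away: Dict[str, float]) -> Dict[str, float]:
    # Differential for every feature both teams share (home minus away)
    feats = {f"{name}_diff": home[name] - away[name]
             for name in sorted(home.keys() & away.keys())}
    # Assumption: "point differential" means recent_point_diff from Exercise 3.1
    feats["total_quality"] = home["recent_point_diff"] + away["recent_point_diff"]
    # Assumption: "combined scoring rate" means the sum of both teams' points per game
    feats["pace_matchup"] = home["ppg"] + away["ppg"]
    feats["home_field"] = 1.0
    return feats
```

Differencing keeps the model symmetric in team identity, so it learns how much a gap in quality matters rather than memorizing absolute levels.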
Exercise 3.3: Situational Features
Task: Generate situational context features.
```python
from typing import Dict

import pandas as pd


def create_situational_features(game_info: Dict) -> Dict[str, float]:
    """
    Create situational and context features.

    game_info contains:
    - game_date: pd.Timestamp
    - home_last_game_date: pd.Timestamp
    - away_last_game_date: pd.Timestamp
    - week: int (1-15)
    - game_type: str ('regular', 'bowl', 'playoff')
    - weather: str ('dome', 'good', 'rain', 'cold')

    Required features:
    1. home_rest_days - Days since home team's last game
    2. away_rest_days - Days since away team's last game
    3. rest_advantage - Difference in rest days
    4. home_bye_week - 1 if home has 10+ days rest
    5. away_bye_week - 1 if away has 10+ days rest
    6. early_season - 1 if week <= 4
    7. late_season - 1 if week >= 10
    8. is_bowl - 1 if bowl game
    9. is_rivalry - 1 if rivalry game (you define the rivalries; this
       requires team names, so add home_team and away_team to game_info)
    10. bad_weather - 1 if rain or cold

    Returns:
    --------
    Dict[str, float] : Situational features
    """
    # YOUR CODE HERE
    pass
```
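Rest-day features come straight from timestamp arithmetic: subtracting two `pd.Timestamp` values gives a `pd.Timedelta`, whose `.days` attribute is the whole-day count. This sketch covers features 1 to 5 only, and assumes `rest_advantage` is home minus away.

```python
from typing import Dict

import pandas as pd


def rest_features(game_date: pd.Timestamp, home_last: pd.Timestamp,
                  away_last: pd.Timestamp, bye_threshold: int = 10) -> Dict[str, float]:
    home_rest = (game_date - home_last).days  # Timedelta -> whole days
    away_rest = (game_date - away_last).days
    return {
        "home_rest_days": float(home_rest),
        "away_rest_days": float(away_rest),
        "rest_advantage": float(home_rest - away_rest),  # assumption: home minus away
        "home_bye_week": float(home_rest >= bye_threshold),
        "away_bye_week": float(away_rest >= bye_threshold),
    }


print(rest_features(pd.Timestamp("2023-10-14"), pd.Timestamp("2023-09-30"), pd.Timestamp("2023-10-07")))
```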
Level 4: Complete Prediction Pipeline
Exercise 4.1: Training Pipeline
Task: Build a complete training pipeline.
```python
from typing import Dict, Tuple

import pandas as pd
from sklearn.preprocessing import StandardScaler


class GamePredictionTrainer:
    """
    Complete training pipeline for game prediction.
    """

    def __init__(self, model_type: str = 'gradient_boosting'):
        self.model_type = model_type
        self.scaler = StandardScaler()
        self.model = None
        self.feature_columns = None

    def prepare_data(self, games: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
        """
        Prepare features and labels from game data.

        For each game, generate features using only prior information
        to avoid data leakage.

        Returns:
        --------
        Tuple[pd.DataFrame, pd.Series] : (X, y)
        """
        # YOUR CODE HERE
        pass

    def temporal_split(self, X: pd.DataFrame, y: pd.Series,
                       dates: pd.Series, test_start: str) -> Tuple:
        """
        Split data temporally.

        Returns:
        --------
        Tuple : (X_train, X_test, y_train, y_test)
        """
        # YOUR CODE HERE
        pass

    def train(self, X_train: pd.DataFrame, y_train: pd.Series,
              calibrate: bool = True) -> Dict:
        """
        Train the model with optional calibration.

        Returns:
        --------
        Dict : Training results including CV scores
        """
        # YOUR CODE HERE
        pass

    def evaluate(self, X_test: pd.DataFrame, y_test: pd.Series) -> Dict:
        """
        Comprehensive evaluation on test set.

        Include:
        - Accuracy, AUC, Brier score
        - Comparison to baselines
        - Calibration assessment

        Returns:
        --------
        Dict : Evaluation metrics
        """
        # YOUR CODE HERE
        pass
```
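Two pieces of this pipeline are easy to get subtly wrong: the split must be by date (never shuffled), and the baseline must be estimated from training data only. The sketch below shows a date-based split plus Brier score, accuracy, a constant-rate baseline, and a binned calibration table, using NumPy so it has no extra dependencies. For AUC, `sklearn.metrics.roc_auc_score` is a standard choice. The function names and the 10-bin calibration layout are assumptions.

```python
from typing import Dict

import numpy as np
import pandas as pd


def temporal_split(X: pd.DataFrame, y: pd.Series, dates, test_start: str):
    """Games before test_start go to training; everything on or after goes to test."""
    train_mask = np.asarray(pd.to_datetime(dates) < pd.Timestamp(test_start))
    return X.loc[train_mask], X.loc[~train_mask], y.loc[train_mask], y.loc[~train_mask]


def probability_metrics(p_home, home_won, train_home_rate: float, n_bins: int = 10) -> Dict:
    p = np.asarray(p_home, dtype=float)
    y = np.asarray(home_won, dtype=float)
    brier = float(np.mean((p - y) ** 2))
    accuracy = float(np.mean((p >= 0.5) == (y == 1)))
    # Baseline: always predict the home win rate observed in the training data
    brier_baseline = float(np.mean((train_home_rate - y) ** 2))
    # Calibration: mean predicted vs. observed win rate within each probability bin
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    calibration = [
        (b, float(p[bins == b].mean()), float(y[bins == b].mean()), int((bins == b).sum()))
        for b in range(n_bins) if (bins == b).any()
    ]
    return {"brier": brier, "accuracy": accuracy,
            "brier_baseline": brier_baseline, "calibration": calibration}
```

A well-calibrated model has mean predicted and observed rates close together in every populated bin; a Brier score below `brier_baseline` means the model beats simply knowing how often home teams win.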
Exercise 4.2: Ensemble Predictor
Task: Implement an ensemble that combines multiple models.
```python
from typing import Dict, Optional

import pandas as pd


class GamePredictionEnsemble:
    """
    Ensemble predictor combining Elo and ML models.
    """

    def __init__(self):
        # Your Elo implementation (e.g., EloWithHFA from Exercise 2.2)
        self.elo = EloRatingSystem()
        self.ml_model = None
        self.weights = {'elo': 0.3, 'ml': 0.7}

    def train(self, games: pd.DataFrame) -> Dict:
        """
        Train both Elo and ML components.

        Steps:
        1. Process all games through Elo to build ratings
        2. Generate ML features
        3. Train ML model
        4. Optionally optimize weights based on CV

        Returns:
        --------
        Dict : Training results
        """
        # YOUR CODE HERE
        pass

    def predict(self, home_team: str, away_team: str,
                additional_features: Optional[Dict] = None) -> Dict:
        """
        Generate ensemble prediction.

        Combine Elo probability with ML probability using weights.

        Returns:
        --------
        Dict : Prediction with probability, spread, confidence
        """
        # YOUR CODE HERE
        pass

    def optimize_weights(self, validation_games: pd.DataFrame) -> Dict:
        """
        Optimize ensemble weights on validation data.

        Find weights that minimize Brier score or maximize AUC.

        Returns:
        --------
        Dict : Optimized weights
        """
        # YOUR CODE HERE
        pass
```
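With only two components, weight optimization can be a simple grid search over the Elo weight w (the ML weight is 1 - w), scoring each blend by Brier score on held-out games. A minimal sketch, assuming you already have aligned arrays of Elo probabilities, ML probabilities, and outcomes for the validation games:

```python
from typing import Dict

import numpy as np


def optimize_blend_weight(p_elo, p_ml, home_won, n_grid: int = 21) -> Dict[str, float]:
    p_elo = np.asarray(p_elo, dtype=float)
    p_ml = np.asarray(p_ml, dtype=float)
    y = np.asarray(home_won, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid)  # candidate Elo weights 0.00, 0.05, ..., 1.00
    briers = [np.mean((w * p_elo + (1 - w) * p_ml - y) ** 2) for w in grid]
    w_elo = float(grid[int(np.argmin(briers))])
    return {"elo": w_elo, "ml": 1.0 - w_elo}
```

Optimize the weights on games the ML model was not trained on; otherwise the in-sample ML probabilities look better than they are and the search overweights them.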
Level 5: Advanced Applications
Exercise 5.1: Season Simulation
Task: Build a season simulation system using your prediction model.
```python
from typing import Dict, List

import pandas as pd


class SeasonSimulator:
    """
    Monte Carlo season simulation using game predictions.
    """

    def __init__(self, predictor):
        """
        Initialize with a trained predictor.

        Parameters:
        -----------
        predictor : GamePredictionEnsemble or similar
            Trained prediction model
        """
        self.predictor = predictor

    def simulate_game(self, home_team: str, away_team: str) -> Dict:
        """
        Simulate a single game outcome.

        Use predicted probability to sample winner.
        Use spread and variance to sample margin.

        Returns:
        --------
        Dict : Simulated result with winner, margin, scores
        """
        # YOUR CODE HERE
        pass

    def simulate_season(self, schedule: List[Dict],
                        n_simulations: int = 10000) -> pd.DataFrame:
        """
        Simulate entire remaining season.

        Parameters:
        -----------
        schedule : List[Dict]
            List of games with home_team, away_team, date
        n_simulations : int
            Number of Monte Carlo iterations

        Returns:
        --------
        pd.DataFrame : Results with columns:
            - team
            - avg_wins
            - win_distribution (10th, 25th, 50th, 75th, 90th percentiles of wins)
            - playoff_probability
            - conference_championship_probability
        """
        # YOUR CODE HERE
        pass

    def playoff_scenarios(self, current_standings: pd.DataFrame,
                          remaining_schedule: List[Dict],
                          n_simulations: int = 10000) -> pd.DataFrame:
        """
        Calculate playoff probability for each team.

        Returns:
        --------
        pd.DataFrame : Team playoff probabilities
        """
        # YOUR CODE HERE
        pass
```
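If the winner and the margin are sampled independently, a simulation can produce a "home win by -6". One way to keep them consistent is to sample only the margin from Normal(expected margin, σ) and take the winner from its sign; under the normal model of Exercise 1.3 the resulting home win rate matches the spread-implied probability. The sketch below vectorizes all simulations per game with NumPy. The `expected_margin` callback (positive = home favored) stands in for your predictor and is a hypothetical interface, as is σ = 13.5.

```python
from typing import Callable, Dict, List

import numpy as np
import pandas as pd


def simulate_win_totals(schedule: List[Dict], expected_margin: Callable[[str, str], float],
                        sigma: float = 13.5, n_simulations: int = 10000,
                        seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    teams = sorted({g[k] for g in schedule for k in ("home_team", "away_team")})
    col = {t: i for i, t in enumerate(teams)}
    wins = np.zeros((n_simulations, len(teams)), dtype=int)  # one row per simulated season

    for g in schedule:
        mu = expected_margin(g["home_team"], g["away_team"])
        margins = rng.normal(mu, sigma, size=n_simulations)  # sampled home margins
        home_won = margins > 0                                 # winner follows the margin's sign
        wins[:, col[g["home_team"]]] += home_won
        wins[:, col[g["away_team"]]] += ~home_won

    out = pd.DataFrame({"team": teams, "avg_wins": wins.mean(axis=0)})
    percentiles = np.percentile(wins, [10, 25, 50, 75, 90], axis=0)
    for label, values in zip(["p10", "p25", "p50", "p75", "p90"], percentiles):
        out[label] = values
    return out
```

Keeping the full `wins` matrix (rather than only averages) is what lets you answer playoff questions later: apply your tiebreak and qualification rules to each simulated season row, then count how often each team qualifies.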
Exercise 5.2: Betting System Evaluation
Task: Build a system to evaluate predictions against betting markets.
```python
from typing import Dict

import pandas as pd


class BettingEvaluator:
    """
    Evaluate prediction model for betting applications.
    """

    def __init__(self, min_edge: float = 0.02):
        """
        Parameters:
        -----------
        min_edge : float
            Minimum edge required to recommend a bet (default 2%)
        """
        self.min_edge = min_edge
        self.results = []

    def evaluate_spread_bets(self, predictions: pd.DataFrame,
                             market_spreads: pd.DataFrame,
                             actual_results: pd.DataFrame) -> Dict:
        """
        Evaluate against-the-spread performance.

        Parameters:
        -----------
        predictions : pd.DataFrame
            Model predictions with predicted_spread
        market_spreads : pd.DataFrame
            Vegas/market spreads
        actual_results : pd.DataFrame
            Actual game results with margin

        Returns:
        --------
        Dict : ATS evaluation including:
            - Overall ATS record
            - ROI assuming -110 juice
            - Record when model has edge > min_edge
            - Breakdown by spread range
        """
        # YOUR CODE HERE
        pass

    def kelly_criterion(self, predicted_prob: float,
                        market_prob: float,
                        fraction: float = 0.25) -> float:
        """
        Calculate Kelly Criterion bet size.

        Parameters:
        -----------
        predicted_prob : float
            Model's estimated probability
        market_prob : float
            Implied probability from market odds
        fraction : float
            Kelly fraction (0.25 = quarter Kelly)

        Returns:
        --------
        float : Recommended bet size as fraction of bankroll
        """
        # YOUR CODE HERE
        pass

    def backtest(self, historical_predictions: pd.DataFrame,
                 historical_lines: pd.DataFrame,
                 actual_results: pd.DataFrame,
                 bankroll: float = 10000) -> pd.DataFrame:
        """
        Backtest betting strategy.

        Returns:
        --------
        pd.DataFrame : Backtest results with bankroll over time
        """
        # YOUR CODE HERE
        pass
```
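The arithmetic behind these methods is compact. A home bet covers when the actual home margin plus the home spread is positive (home -7 winning by 10 covers by 3). At -110, a winning bet returns 100/110 units of profit per unit staked, so break-even is 110/210 ≈ 52.4% of decided bets. For Kelly sizing, if the market probability q is converted to no-vig net odds b = (1 - q)/q, the full-Kelly fraction (bp - (1 - p))/b simplifies to (p - q)/(1 - q). The helpers below are hypothetical names illustrating these formulas; they ignore vig in the Kelly step and treat pushes as refunded stakes excluded from ROI, both assumptions.

```python
def home_covers(home_margin: float, home_spread: float) -> float:
    """1.0 if home covers, 0.0 if not, 0.5 for a push (home_spread < 0 means home favored)."""
    result = home_margin + home_spread
    if result > 0:
        return 1.0
    if result < 0:
        return 0.0
    return 0.5


def ats_roi(wins: int, losses: int, american_odds: int = -110) -> float:
    """Profit per unit staked on decided bets; valid for negative American odds."""
    profit_per_win = 100 / abs(american_odds)
    bets = wins + losses
    return (wins * profit_per_win - losses) / bets if bets else 0.0


def fractional_kelly(predicted_prob: float, market_prob: float,
                     fraction: float = 0.25, min_edge: float = 0.02) -> float:
    """Fraction of bankroll to stake; 0 when the edge is below min_edge."""
    edge = predicted_prob - market_prob
    if edge < min_edge:
        return 0.0
    full_kelly = edge / (1 - market_prob)  # (p - q) / (1 - q) with no-vig odds
    return fraction * full_kelly
```

For example, `ats_roi(110, 100)` is essentially zero: a 52.4% ATS record only breaks even at -110.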
Exercise 5.3: Real-Time Prediction API
Task: Design a prediction API for production use.
```python
from dataclasses import dataclass
from typing import List, Optional
import json


@dataclass
class PredictionRequest:
    """Request format for game prediction."""
    home_team: str
    away_team: str
    game_date: str
    neutral_site: bool = False
    weather: Optional[str] = None


@dataclass
class PredictionResponse:
    """Response format for game prediction."""
    game_id: str
    home_team: str
    away_team: str
    home_win_probability: float
    away_win_probability: float
    predicted_spread: float
    predicted_total: float
    confidence: float
    model_version: str
    timestamp: str


class PredictionAPI:
    """
    Production-ready prediction API.
    """

    def __init__(self, model_path: str):
        """
        Load trained model from disk.

        Parameters:
        -----------
        model_path : str
            Path to saved model artifacts
        """
        # YOUR CODE HERE
        pass

    def validate_request(self, request: PredictionRequest) -> bool:
        """
        Validate incoming prediction request.

        Check:
        - Team names are valid
        - Date is in valid format
        - Teams have sufficient history for prediction

        Returns:
        --------
        bool : True if valid
        """
        # YOUR CODE HERE
        pass

    def predict(self, request: PredictionRequest) -> PredictionResponse:
        """
        Generate prediction from request.

        Returns:
        --------
        PredictionResponse : Formatted prediction response
        """
        # YOUR CODE HERE
        pass

    def batch_predict(self, requests: List[PredictionRequest]) -> List[PredictionResponse]:
        """
        Generate predictions for multiple games.

        Returns:
        --------
        List[PredictionResponse] : All predictions
        """
        # YOUR CODE HERE
        pass

    def to_json(self, response: PredictionResponse) -> str:
        """Convert response to JSON string."""
        # YOUR CODE HERE
        pass
```
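Because the request and response types are dataclasses, the standard library already covers two of these methods: `dataclasses.asdict` plus `json.dumps` handles serialization, and `datetime.strptime` rejects malformed or impossible dates. The sketch assumes ISO `YYYY-MM-DD` dates; the helper names are hypothetical.

```python
import json
from dataclasses import asdict, is_dataclass
from datetime import datetime


def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """True if value parses with fmt (assumed ISO format); also rejects dates like Feb 30."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def dataclass_to_json(obj) -> str:
    """Serialize a dataclass instance (e.g., a PredictionResponse) to a JSON string."""
    if not is_dataclass(obj) or isinstance(obj, type):
        raise TypeError("expected a dataclass instance")
    return json.dumps(asdict(obj), sort_keys=True)
```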
Bonus Challenge: Conference Championship Predictor
Task: Build a system specifically for predicting conference championships.
Conference championship games have unique characteristics:
- Both teams are typically ranked
- They are often rematches of regular-season games
- The stakes are high
- They are usually played at neutral sites

Design and implement a model that:
1. Incorporates regular-season head-to-head results
2. Accounts for championship-game-specific factors
3. Handles the small sample size (limited historical data)
4. Evaluates performance on historical championship games
```python
class ConferenceChampionshipPredictor:
    """
    Specialized predictor for conference championship games.

    Your implementation should:
    1. Weight recent performance more heavily
    2. Include head-to-head factors
    3. Account for neutral site
    4. Consider "big game" performance history
    """
    # YOUR IMPLEMENTATION HERE
    pass
```
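For requirement 1, one common technique is exponential decay: each game's weight halves every `half_life` games back in time. A minimal sketch on a chronological list of point margins; the half-life value is an assumption you would tune, ideally on the regular-season data that is far more plentiful than championship games.

```python
from typing import Sequence


def recency_weighted_margin(margins: Sequence[float], half_life: float = 4.0) -> float:
    """Weighted mean margin; margins are in chronological order (most recent last)."""
    n = len(margins)
    if n == 0:
        raise ValueError("need at least one game")
    # The most recent game gets weight 1; a game k games earlier gets 0.5 ** (k / half_life)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)
```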
Submission Guidelines
For each exercise:
1. Include all code with appropriate comments
2. Add docstrings explaining your approach
3. Provide test cases demonstrating correctness
4. Include sample output showing results

For Level 4-5 exercises:
- Include evaluation metrics on the provided test data
- Document any assumptions made
- Explain design decisions and trade-offs
- Compare your approach to baselines