Capstone Project 1: Complete NFL Prediction and Betting System


Project Overview

This capstone project challenges you to design, build, test, and document a complete end-to-end NFL game prediction and betting system. You will collect real data, engineer meaningful features, train multiple predictive models, rigorously backtest their performance, and construct a production-ready pipeline that generates weekly bet recommendations with proper bankroll management.

This project synthesizes material from nearly every part of the textbook. You will apply probability theory (Chapter 2), expected value analysis (Chapter 3), bankroll management (Chapter 4), data literacy skills (Chapter 5), regression analysis (Chapter 9), Bayesian reasoning (Chapter 10), market understanding (Chapter 11), line shopping strategies (Chapter 12), value betting theory (Chapter 13), Kelly staking (Chapter 14), NFL-specific modeling (Chapter 15), time series methods (Chapter 23), Monte Carlo simulation (Chapter 24), optimization (Chapter 25), rating systems (Chapter 26), advanced regression and classification (Chapter 27), feature engineering (Chapter 28), neural networks (Chapter 29), model evaluation (Chapter 30), production pipeline design (Chapter 31), and discipline systems (Chapter 37).

The result should be a system you could realistically deploy for a live NFL season -- not a toy exercise, but a genuine analytical tool grounded in sound statistical methodology.

Learning Objectives

Upon completing this project, you will be able to:

  1. Build automated data collection pipelines for NFL play-by-play data, odds, and contextual information.
  2. Engineer predictive features grounded in football analytics research (EPA, DVOA-style metrics, schedule effects).
  3. Train, tune, and evaluate multiple model families for spread, totals, and moneyline prediction.
  4. Conduct rigorous walk-forward backtesting that avoids look-ahead bias and reflects realistic execution.
  5. Implement Kelly-based bet sizing with fractional scaling and risk constraints.
  6. Design a production pipeline that runs autonomously on a weekly schedule.
  7. Communicate your methodology and results in a professional technical report.

Requirements and Deliverables

Specific, Measurable Requirements

Requirement | Minimum Standard | Exceeds Expectations
Data sources | 2 play-by-play sources + 1 odds source | 3+ play-by-play sources + 3+ odds sources
Historical seasons collected | 5 seasons (2019--2023) | 8+ seasons (2016--2023)
Engineered features | 50 unique features | 80+ unique features across 6+ categories
Model types trained | 3 distinct model families | 5+ model families with ensemble
Walk-forward backtest seasons | 3 out-of-sample seasons | 5+ out-of-sample seasons
Bet types modeled | Spreads and totals | Spreads, totals, and moneylines
Brier score (spread ATS) | Below 0.260 | Below 0.250
Documentation length | 15-page technical report | 20+ pages with supplementary analysis
Code quality | Runs without errors, basic comments | Fully documented, typed, with tests

Final Deliverables

  1. Complete Python codebase organized per the project structure below, with a README.md explaining setup and execution.
  2. Technical report (15--20 pages) covering methodology, feature importance, model comparison, backtest results, and critical self-evaluation.
  3. Backtest results dashboard -- either a Jupyter notebook with interactive plots or a Streamlit/Dash web application.
  4. 10-minute presentation with slides summarizing the approach and key findings.

Phase 1: Data Collection

Duration: Week 1 (of 8)

Relevant chapters: Chapter 5 (Data Literacy), Chapter 15 (Modeling the NFL), Chapter 31 (ML Betting Pipeline)

1.1 NFL Play-by-Play Data

The foundation of any NFL model is play-by-play data. Your primary source is the nfl_data_py package, which wraps nflfastR data and provides pre-computed EPA (Expected Points Added) and WPA (Win Probability Added) values for every play.

Required data fields: game_id, season, week, home_team, away_team, posteam, defteam, play_type, yards_gained, epa, wpa, success (binary: EPA > 0), down, ydstogo, yardline_100, score_differential, half_seconds_remaining, pass_attempt, rush_attempt, interception, fumble, sack, penalty, touchdown, field_goal_attempt, punt, qb_name, receiver_name, rusher_name, air_yards, yards_after_catch.

"""
phase1_data_collection.py
Collect and store NFL play-by-play data, schedule information,
odds history, and contextual features.
"""

import nfl_data_py as nfl
import pandas as pd
import sqlite3
import requests
import time
from pathlib import Path
from datetime import datetime
from typing import List, Optional

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)
DB_PATH = DATA_DIR / "nfl_betting.db"

SEASONS = list(range(2016, 2024))  # 2016 through 2023

# ---------------------------------------------------------------------------
# 1. Play-by-Play Data
# ---------------------------------------------------------------------------

def collect_play_by_play(seasons: List[int]) -> pd.DataFrame:
    """
    Download NFL play-by-play data for the specified seasons using nfl_data_py.
    This includes pre-computed EPA and WPA for every play.

    Returns a DataFrame with one row per play, all plays across all seasons.
    """
    print(f"Downloading play-by-play data for seasons: {seasons}")
    pbp = nfl.import_pbp_data(seasons, downcast=True)

    # Filter to regular season and playoffs
    pbp = pbp[pbp["season_type"].isin(["REG", "POST"])].copy()

    # Verify critical columns exist
    required_cols = [
        "game_id", "season", "week", "home_team", "away_team",
        "posteam", "defteam", "play_type", "yards_gained", "epa",
        "wpa", "success", "down", "ydstogo", "score_differential",
        "half_seconds_remaining", "pass_attempt", "rush_attempt",
        "interception", "fumble_lost", "sack", "touchdown"
    ]
    missing = [c for c in required_cols if c not in pbp.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")

    print(f"Collected {len(pbp):,} plays across {len(seasons)} seasons")
    return pbp


def aggregate_game_level_stats(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Aggregate play-by-play data to the game-team level.
    Produces one row per team per game with offensive and defensive summaries.
    """
    # Offensive stats (when team has possession)
    off = pbp.groupby(["game_id", "season", "week", "posteam"]).agg(
        off_plays=("play_type", "count"),
        off_epa_total=("epa", "sum"),
        off_epa_per_play=("epa", "mean"),
        off_success_rate=("success", "mean"),
        off_pass_attempts=("pass_attempt", "sum"),
        off_rush_attempts=("rush_attempt", "sum"),
        off_yards=("yards_gained", "sum"),
        off_touchdowns=("touchdown", "sum"),
        off_interceptions=("interception", "sum"),
        off_fumbles_lost=("fumble_lost", "sum"),
        off_sacks_taken=("sack", "sum"),
    ).reset_index().rename(columns={"posteam": "team"})

    # Defensive stats (when opposing team has possession)
    defense = pbp.groupby(["game_id", "season", "week", "defteam"]).agg(
        def_plays=("play_type", "count"),
        def_epa_total=("epa", "sum"),
        def_epa_per_play=("epa", "mean"),
        def_success_rate=("success", "mean"),
        def_pass_attempts_faced=("pass_attempt", "sum"),
        def_rush_attempts_faced=("rush_attempt", "sum"),
        def_yards_allowed=("yards_gained", "sum"),
        def_touchdowns_allowed=("touchdown", "sum"),
        def_interceptions_forced=("interception", "sum"),
        def_fumbles_forced=("fumble_lost", "sum"),
        def_sacks=("sack", "sum"),
    ).reset_index().rename(columns={"defteam": "team"})

    # Merge offensive and defensive stats
    game_stats = off.merge(
        defense,
        on=["game_id", "season", "week", "team"],
        how="outer"
    )

    print(f"Aggregated {len(game_stats):,} team-game records")
    return game_stats


# ---------------------------------------------------------------------------
# 2. Schedule and Results
# ---------------------------------------------------------------------------

def collect_schedules(seasons: List[int]) -> pd.DataFrame:
    """
    Download NFL schedules with game results.
    Returns one row per game with teams, scores, location, and surface info.
    """
    schedules = nfl.import_schedules(seasons)
    schedules = schedules[[
        "game_id", "season", "game_type", "week", "gameday", "weekday",
        "gametime", "away_team", "away_score", "home_team", "home_score",
        "location", "roof", "surface", "temp", "wind", "away_rest",
        "home_rest", "away_moneyline", "home_moneyline", "spread_line",
        "away_spread_odds", "home_spread_odds", "total_line",
        "under_odds", "over_odds", "div_game", "overtime"
    ]].copy()

    schedules["home_margin"] = schedules["home_score"] - schedules["away_score"]
    schedules["total_score"] = schedules["home_score"] + schedules["away_score"]

    print(f"Collected {len(schedules):,} games across {len(seasons)} seasons")
    return schedules


# ---------------------------------------------------------------------------
# 3. Historical Odds Data
# ---------------------------------------------------------------------------

def collect_odds_history(seasons: List[int]) -> pd.DataFrame:
    """
    Collect historical closing odds from multiple sportsbooks.

    Primary source: The odds data embedded in nfl_data_py schedules (from
    ESPN / standard consensus lines).

    Secondary sources to supplement:
    - Australian Sports Betting dataset (free, covers NFL from 2007+)
      URL: https://www.aussportsbetting.com/data/
    - Pro-Football-Reference game lines
    - Kaggle NFL odds datasets

    For multi-book analysis, you should also integrate data from the
    Odds API (https://the-odds-api.com/) which provides lines from
    DraftKings, FanDuel, BetMGM, Caesars, and others.
    """
    # Start with schedule-embedded odds
    schedules = nfl.import_schedules(seasons)
    odds = schedules[[
        "game_id", "season", "week", "away_team", "home_team",
        "away_moneyline", "home_moneyline", "spread_line",
        "away_spread_odds", "home_spread_odds",
        "total_line", "under_odds", "over_odds"
    ]].copy()

    odds.rename(columns={
        "spread_line": "consensus_spread",
        "total_line": "consensus_total",
        "home_moneyline": "home_ml",
        "away_moneyline": "away_ml"
    }, inplace=True)

    # nfl_data_py's spread_line is positive when the home team is favored;
    # flip the sign so consensus_spread follows betting convention
    # (home favorite = negative number), which the Elo-derived spread and
    # the backtest cover logic both assume.
    odds["consensus_spread"] = -odds["consensus_spread"]

    print(f"Collected odds for {len(odds):,} games")
    return odds


def fetch_odds_api_snapshot(api_key: str, sport: str = "americanfootball_nfl") -> dict:
    """
    Fetch current odds from The Odds API (requires free API key).
    Returns odds from multiple sportsbooks for current NFL games.
    Free tier: 500 requests/month.
    """
    url = f"https://api.the-odds-api.com/v4/sports/{sport}/odds/"
    params = {
        "apiKey": api_key,
        "regions": "us",
        "markets": "spreads,totals,h2h",
        "oddsFormat": "american",
        "bookmakers": "draftkings,fanduel,betmgm,caesars,pointsbet"
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


# ---------------------------------------------------------------------------
# 4. Contextual Data: Injuries, Weather, Travel
# ---------------------------------------------------------------------------

def collect_injury_data(seasons: List[int]) -> pd.DataFrame:
    """
    Collect NFL injury report data. nfl_data_py provides weekly injury data
    including player name, team, position, injury type, and game status
    (Out, Doubtful, Questionable, Probable).
    """
    injuries = nfl.import_injuries(seasons)
    injuries = injuries[[
        "season", "week", "team", "full_name", "position",
        "report_status", "practice_status"
    ]].copy()

    # Encode severity for modeling
    status_map = {
        "Out": 1.0,
        "Doubtful": 0.85,
        "Questionable": 0.50,
        "Probable": 0.10,
    }
    injuries["miss_probability"] = injuries["report_status"].map(status_map).fillna(0.0)

    print(f"Collected {len(injuries):,} injury records")
    return injuries


def calculate_travel_distance(home_team: str, away_team: str,
                              team_locations: dict) -> float:
    """
    Calculate approximate travel distance (miles) between team cities.
    team_locations should map team abbreviations to (latitude, longitude).
    Uses the Haversine formula.
    """
    import math

    lat1, lon1 = team_locations[away_team]
    lat2, lon2 = team_locations[home_team]

    R = 3959  # Earth radius in miles
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2 +
         math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
         math.sin(dlon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    return R * c


# NFL team locations (latitude, longitude)
TEAM_LOCATIONS = {
    "ARI": (33.5276, -112.2626), "ATL": (33.7573, -84.4009),
    "BAL": (39.2780, -76.6227),  "BUF": (42.7738, -78.7870),
    "CAR": (35.2258, -80.8528),  "CHI": (41.8623, -87.6167),
    "CIN": (39.0955, -84.5160),  "CLE": (41.5061, -81.6995),
    "DAL": (32.7473, -97.0945),  "DEN": (39.7439, -105.0201),
    "DET": (42.3400, -83.0456),  "GB":  (44.5013, -88.0622),
    "HOU": (29.6847, -95.4107),  "IND": (39.7601, -86.1639),
    "JAX": (30.3239, -81.6373),  "KC":  (39.0489, -94.4839),
    "LV":  (36.0909, -115.1833), "LAC": (33.9535, -118.3392),
    "LAR": (33.9535, -118.3392), "MIA": (25.9580, -80.2389),
    "MIN": (44.9736, -93.2575),  "NE":  (42.0909, -71.2643),
    "NO":  (29.9511, -90.0812),  "NYG": (40.8128, -74.0742),
    "NYJ": (40.8128, -74.0742),  "PHI": (39.9008, -75.1675),
    "PIT": (40.4468, -80.0158),  "SF":  (37.4032, -121.9698),
    "SEA": (47.5952, -122.3316), "TB":  (27.9759, -82.5033),
    "TEN": (36.1665, -86.7713),  "WAS": (38.9076, -76.8645),
}


# ---------------------------------------------------------------------------
# 5. Database Storage
# ---------------------------------------------------------------------------

def store_to_database(df: pd.DataFrame, table_name: str,
                      db_path: Path = DB_PATH) -> None:
    """Store a DataFrame to SQLite for persistent, queryable access."""
    conn = sqlite3.connect(str(db_path))
    df.to_sql(table_name, conn, if_exists="replace", index=False)
    conn.close()
    print(f"Stored {len(df):,} rows to table '{table_name}'")


def load_from_database(table_name: str, db_path: Path = DB_PATH) -> pd.DataFrame:
    """Load a table from the SQLite database."""
    conn = sqlite3.connect(str(db_path))
    df = pd.read_sql(f"SELECT * FROM {table_name}", conn)
    conn.close()
    return df


# ---------------------------------------------------------------------------
# Main Collection Pipeline
# ---------------------------------------------------------------------------

def run_data_collection():
    """Execute the complete data collection pipeline."""
    print("=" * 60)
    print("NFL BETTING SYSTEM - DATA COLLECTION PIPELINE")
    print("=" * 60)

    # Play-by-play
    pbp = collect_play_by_play(SEASONS)
    store_to_database(pbp, "play_by_play")

    # Aggregated game-level stats
    game_stats = aggregate_game_level_stats(pbp)
    store_to_database(game_stats, "game_stats")

    # Schedules and results
    schedules = collect_schedules(SEASONS)
    store_to_database(schedules, "schedules")

    # Odds
    odds = collect_odds_history(SEASONS)
    store_to_database(odds, "odds_history")

    # Injuries
    injuries = collect_injury_data(SEASONS)
    store_to_database(injuries, "injuries")

    print("\nData collection complete.")
    print(f"Database stored at: {DB_PATH}")


if __name__ == "__main__":
    run_data_collection()

Phase 1 Checklist

  • [ ] Play-by-play data downloaded for all target seasons
  • [ ] Game-level aggregated statistics computed and stored
  • [ ] Schedule and results data collected with home margins and totals
  • [ ] Odds data collected from at least one source; multi-book data integrated if available
  • [ ] Injury reports downloaded and severity-encoded
  • [ ] Travel distances computed for all team matchups
  • [ ] All data stored in SQLite database with consistent team abbreviations
  • [ ] Data quality checks passed: no missing game IDs, scores match schedule data, EPA values present for 95%+ of plays (a minimal check script is sketched below)
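
The final item can be automated. The following is a minimal sketch of those quality checks, assuming the SQLite tables created by run_data_collection() above (play_by_play and schedules); adjust the thresholds to your own tolerance.

import sqlite3

import pandas as pd


def run_quality_checks(db_path: str = "data/nfl_betting.db") -> dict:
    """Run the Phase 1 data-quality checks and return pass/fail flags."""
    conn = sqlite3.connect(db_path)
    pbp = pd.read_sql("SELECT game_id, epa FROM play_by_play", conn)
    sched = pd.read_sql("SELECT game_id, home_score, away_score FROM schedules", conn)
    conn.close()

    results = {
        # Every play must carry a game identifier
        "no_missing_game_ids": bool(pbp["game_id"].notna().all()),
        # EPA should be present for at least 95% of plays
        "epa_coverage_95pct": bool(pbp["epa"].notna().mean() >= 0.95),
        # Completed games need both scores so margins and totals can be computed
        "scores_present": bool(sched[["home_score", "away_score"]].notna().all(axis=1).mean() > 0.99),
        # Every play's game should also appear in the schedules table
        "pbp_games_in_schedules": bool(pbp["game_id"].isin(sched["game_id"]).all()),
    }
    for name, passed in results.items():
        print(f"{'PASS' if passed else 'FAIL'}: {name}")
    return results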

Phase 2: Feature Engineering

Duration: Weeks 2--3

Relevant chapters: Chapter 15 (NFL Modeling), Chapter 23 (Time Series), Chapter 26 (Ratings), Chapter 28 (Feature Engineering)

You must engineer a minimum of 50 features across the following categories. Each feature should have a documented rationale grounded in football analytics research.

2.1 EPA-Based Efficiency Features

These are the core predictive features. Research consistently shows EPA-based metrics are more predictive of future performance than traditional stats like yards and points (see Chapter 15, Section 15.2).

"""
phase2_feature_engineering.py
Engineer predictive features from raw NFL data.
"""

import pandas as pd
import numpy as np
from typing import Tuple

# ---------------------------------------------------------------------------
# EPA-Based Efficiency (Features 1-16)
# ---------------------------------------------------------------------------

def compute_rolling_epa_features(game_stats: pd.DataFrame,
                                  windows: list = [3, 5, 8]) -> pd.DataFrame:
    """
    Compute rolling EPA-based efficiency metrics over multiple windows.

    The choice of window sizes follows Chapter 23 (Time Series): short windows
    (3 games) capture recent form, medium windows (5 games) balance signal
    and noise, and longer windows (8 games) approximate half-season trends.

    Features produced per window:
      - rolling_off_epa_per_play: offensive EPA/play (higher is better)
      - rolling_def_epa_per_play: defensive EPA/play (lower is better)
      - rolling_off_success_rate: fraction of plays with positive EPA
      - rolling_def_success_rate: fraction of opponent plays with positive EPA
      - rolling_pass_epa: EPA/play on pass plays only
      - rolling_rush_epa: EPA/play on rush plays only
      - rolling_epa_margin: offensive EPA minus defensive EPA allowed
    """
    # Sort by team and chronological order
    df = game_stats.sort_values(["team", "season", "week"]).copy()

    for w in windows:
        grp = df.groupby("team")

        # Offensive EPA per play
        df[f"off_epa_per_play_r{w}"] = grp["off_epa_per_play"].transform(
            lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
        )

        # Defensive EPA per play
        df[f"def_epa_per_play_r{w}"] = grp["def_epa_per_play"].transform(
            lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
        )

        # Offensive success rate
        df[f"off_success_rate_r{w}"] = grp["off_success_rate"].transform(
            lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
        )

        # Defensive success rate
        df[f"def_success_rate_r{w}"] = grp["def_success_rate"].transform(
            lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
        )

        # Combined EPA margin
        df[f"epa_margin_r{w}"] = df[f"off_epa_per_play_r{w}"] - df[f"def_epa_per_play_r{w}"]

    return df


def compute_pass_rush_splits(pbp: pd.DataFrame,
                              game_stats: pd.DataFrame) -> pd.DataFrame:
    """
    Compute separate EPA metrics for pass and rush plays.
    Pass EPA is generally more stable and predictive than rush EPA
    (see Chapter 15, Section 15.2).
    """
    pass_epa = pbp[pbp["pass_attempt"] == 1].groupby(
        ["game_id", "posteam"]
    )["epa"].mean().reset_index().rename(
        columns={"posteam": "team", "epa": "pass_epa_per_play"}
    )

    rush_epa = pbp[pbp["rush_attempt"] == 1].groupby(
        ["game_id", "posteam"]
    )["epa"].mean().reset_index().rename(
        columns={"posteam": "team", "epa": "rush_epa_per_play"}
    )

    df = game_stats.merge(pass_epa, on=["game_id", "team"], how="left")
    df = df.merge(rush_epa, on=["game_id", "team"], how="left")

    return df


# ---------------------------------------------------------------------------
# Schedule and Rest Features (Features 17-26)
# ---------------------------------------------------------------------------

def compute_schedule_features(schedules: pd.DataFrame) -> pd.DataFrame:
    """
    Compute schedule-related features that affect game outcomes.

    Rest advantage: Teams with more days of rest perform better on average.
    Chapter 15 documents a roughly 1-point spread advantage per extra
    rest day beyond the standard 7 days.

    Bye week: Teams coming off bye weeks historically perform above
    expectation, though this edge has diminished as markets have adjusted.

    Travel: Long-distance travel, especially westward, has a measurable
    fatigue effect (see Chapter 15, Section 15.3).
    """
    df = schedules.copy()

    # Rest differential
    df["rest_differential"] = df["home_rest"] - df["away_rest"]

    # Bye week flags (rest >= 13 days typically indicates bye)
    df["home_off_bye"] = (df["home_rest"] >= 13).astype(int)
    df["away_off_bye"] = (df["away_rest"] >= 13).astype(int)

    # Short rest flags (Thursday games, rest <= 5)
    df["home_short_rest"] = (df["home_rest"] <= 5).astype(int)
    df["away_short_rest"] = (df["away_rest"] <= 5).astype(int)

    # Division game flag (tighter, lower-scoring, more unpredictable)
    # div_game is already in schedule data

    # Time zone differential (proxy for travel fatigue)
    timezone_map = {
        "ARI": -7, "LAR": -8, "LAC": -8, "SF": -8, "SEA": -8,
        "LV": -8, "DEN": -7, "DAL": -6, "HOU": -6, "KC": -6,
        "MIN": -6, "CHI": -6, "GB": -6, "NO": -6, "TEN": -6,
        "IND": -5, "JAX": -5, "ATL": -5, "CAR": -5, "CIN": -5,
        "CLE": -5, "DET": -5, "MIA": -5, "TB": -5, "BAL": -5,
        "BUF": -5, "NE": -5, "NYG": -5, "NYJ": -5, "PHI": -5,
        "PIT": -5, "WAS": -5
    }
    df["tz_diff"] = (
        df["away_team"].map(timezone_map).fillna(-6) -
        df["home_team"].map(timezone_map).fillna(-6)
    )

    # Game timing: gametime is an Eastern-time "HH:MM" string, so primetime
    # kickoffs (TNF/SNF/MNF) are those starting at 20:00 ET or later
    df["is_primetime"] = df["gametime"].apply(
        lambda x: 1 if pd.notna(x) and str(x) >= "20:00" else 0
    )

    # Week of season (early vs. late); weeks 19-22 are the playoffs, which
    # would otherwise fall outside the bins and become NaN
    df["season_phase"] = pd.cut(
        df["week"],
        bins=[0, 4, 9, 13, 18, 22],
        labels=["early", "mid_early", "mid_late", "late", "post"]
    )

    return df


# ---------------------------------------------------------------------------
# Rating System Features (Features 27-34)
# ---------------------------------------------------------------------------

def compute_elo_ratings(schedules: pd.DataFrame,
                        k_factor: float = 20.0,
                        home_advantage: float = 48.0,
                        mean_reversion: float = 0.33) -> pd.DataFrame:
    """
    Compute Elo ratings for all NFL teams using the methodology from
    Chapter 26 (Ratings and Ranking Systems).

    Parameters follow the FiveThirtyEight NFL Elo approach:
    - k_factor: 20 (how quickly ratings update after each game)
    - home_advantage: 48 Elo points (approximately 3 points on the spread)
    - mean_reversion: 1/3 regression to mean (1505) between seasons

    The Elo rating difference between two teams converts to a spread
    prediction via: predicted_spread = elo_diff / 25
    """
    teams = set(schedules["home_team"]).union(set(schedules["away_team"]))
    elo = {team: 1505.0 for team in teams}
    elo_records = []

    current_season = None

    for _, game in schedules.sort_values(["season", "week"]).iterrows():
        season = game["season"]

        # Between-season mean reversion
        if season != current_season:
            if current_season is not None:
                for team in elo:
                    elo[team] = elo[team] * (1 - mean_reversion) + 1505 * mean_reversion
            current_season = season

        home = game["home_team"]
        away = game["away_team"]

        # Store pre-game Elo
        home_elo_pre = elo[home]
        away_elo_pre = elo[away]

        # Expected outcome
        elo_diff = home_elo_pre + home_advantage - away_elo_pre
        expected_home = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

        # Actual outcome
        if pd.notna(game.get("home_score")) and pd.notna(game.get("away_score")):
            margin = game["home_score"] - game["away_score"]
            actual_home = 1.0 if margin > 0 else (0.5 if margin == 0 else 0.0)

            # Margin-of-victory multiplier (diminishing returns for blowouts).
            # Following the FiveThirtyEight formula, the Elo difference is
            # taken from the winner's perspective so heavy favorites gain
            # less credit for running up the score.
            winner_elo_diff = elo_diff if margin > 0 else -elo_diff
            mov_mult = np.log(abs(margin) + 1) * (2.2 / (winner_elo_diff * 0.001 + 2.2))

            # Update Elo
            update = k_factor * mov_mult * (actual_home - expected_home)
            elo[home] += update
            elo[away] -= update

        elo_records.append({
            "game_id": game["game_id"],
            "season": season,
            "week": game["week"],
            "home_team": home,
            "away_team": away,
            "home_elo": home_elo_pre,
            "away_elo": away_elo_pre,
            "elo_diff": home_elo_pre - away_elo_pre,
            "elo_predicted_spread": -(home_elo_pre + home_advantage - away_elo_pre) / 25.0,
            "elo_home_win_prob": expected_home,
        })

    return pd.DataFrame(elo_records)


# ---------------------------------------------------------------------------
# Injury Impact Features (Features 35-40)
# ---------------------------------------------------------------------------

def compute_injury_features(injuries: pd.DataFrame,
                            schedules: pd.DataFrame) -> pd.DataFrame:
    """
    Quantify the impact of injuries on team strength.

    Uses positional value weights derived from Chapter 15: QB injuries
    have the largest impact on team performance, followed by edge
    rushers and offensive tackles.

    Positional weights (approximate points of spread impact):
      QB: 5.0, EDGE/OLB: 1.2, OT: 1.0, WR1: 0.8, CB: 0.7,
      RB: 0.3, all others: 0.2
    """
    position_weights = {
        "QB": 5.0, "DE": 1.2, "OLB": 1.2, "OT": 1.0, "T": 1.0,
        "WR": 0.8, "CB": 0.7, "S": 0.5, "DT": 0.5, "LB": 0.4,
        "ILB": 0.4, "G": 0.4, "C": 0.3, "RB": 0.3, "TE": 0.3,
        "K": 0.2, "P": 0.1
    }

    inj = injuries.copy()
    inj["pos_weight"] = inj["position"].map(position_weights).fillna(0.2)
    inj["weighted_impact"] = inj["miss_probability"] * inj["pos_weight"]

    # Aggregate to team-week level
    team_injury = inj.groupby(["season", "week", "team"]).agg(
        injury_count=("full_name", "count"),
        injury_impact_total=("weighted_impact", "sum"),
        qb_injury_flag=("position", lambda x: int(
            any((x == "QB") & (inj.loc[x.index, "miss_probability"] > 0.5))
        )),
    ).reset_index()

    return team_injury


# ---------------------------------------------------------------------------
# Weather Features (Features 41-45)
# ---------------------------------------------------------------------------

def compute_weather_features(schedules: pd.DataFrame) -> pd.DataFrame:
    """
    Engineer weather features that affect scoring and strategy.
    Cold, windy conditions reduce scoring and favor unders/running teams.
    Indoor games normalize these effects (see Chapter 15, Section 15.3).
    """
    df = schedules.copy()

    df["is_dome"] = df["roof"].isin(["dome", "closed"]).astype(int)
    df["temp_cold"] = ((df["temp"] < 35) & (df["is_dome"] == 0)).astype(int)
    df["wind_high"] = ((df["wind"] > 15) & (df["is_dome"] == 0)).astype(int)
    df["wind_extreme"] = ((df["wind"] > 25) & (df["is_dome"] == 0)).astype(int)

    # Interaction: cold AND windy is worse than either alone
    df["cold_windy"] = (df["temp_cold"] & df["wind_high"]).astype(int)

    return df


# ---------------------------------------------------------------------------
# Market-Derived Features (Features 46-52)
# ---------------------------------------------------------------------------

def compute_market_features(odds: pd.DataFrame,
                             schedules: pd.DataFrame) -> pd.DataFrame:
    """
    Derive features from the betting market itself.
    The market is an information aggregator; features derived from
    line movements and implied probabilities are highly predictive
    (see Chapter 11, Understanding Betting Markets).
    """
    df = schedules.merge(odds, on=["game_id", "season", "week",
                                    "away_team", "home_team"], how="left")

    # Implied probability from moneyline (Chapter 2)
    def ml_to_prob(ml):
        if pd.isna(ml):
            return np.nan
        if ml > 0:
            return 100.0 / (ml + 100.0)
        else:
            return abs(ml) / (abs(ml) + 100.0)

    df["home_implied_prob"] = df["home_ml"].apply(ml_to_prob)
    df["away_implied_prob"] = df["away_ml"].apply(ml_to_prob)

    # Remove vig using multiplicative method (Chapter 2, Section 2.3)
    total_prob = df["home_implied_prob"] + df["away_implied_prob"]
    df["home_fair_prob"] = df["home_implied_prob"] / total_prob
    df["away_fair_prob"] = df["away_implied_prob"] / total_prob

    # Spread as a feature (market's best estimate of margin)
    # Already available as consensus_spread

    # Total as a feature (market's estimate of combined scoring)
    # Already available as consensus_total

    return df


# ---------------------------------------------------------------------------
# Combine All Features
# ---------------------------------------------------------------------------

def build_feature_matrix(db_path: str = "data/nfl_betting.db") -> pd.DataFrame:
    """
    Load all data, compute all features, and produce the final
    feature matrix for modeling. One row per game, with separate
    home and away feature columns.
    """
    import sqlite3
    conn = sqlite3.connect(db_path)

    pbp = pd.read_sql("SELECT * FROM play_by_play", conn)
    game_stats = pd.read_sql("SELECT * FROM game_stats", conn)
    schedules = pd.read_sql("SELECT * FROM schedules", conn)
    odds = pd.read_sql("SELECT * FROM odds_history", conn)
    injuries = pd.read_sql("SELECT * FROM injuries", conn)
    conn.close()

    # Compute all feature groups
    epa_features = compute_rolling_epa_features(game_stats)
    epa_features = compute_pass_rush_splits(pbp, epa_features)
    schedule_features = compute_schedule_features(schedules)
    elo_features = compute_elo_ratings(schedules)
    injury_features = compute_injury_features(injuries, schedules)
    weather_features = compute_weather_features(schedules)
    market_features = compute_market_features(odds, schedules)

    # Merge into single game-level DataFrame
    # (detailed merge logic depends on your schema; join on game_id)
    features = schedule_features.merge(elo_features, on=[
        "game_id", "season", "week", "home_team", "away_team"
    ], how="left")

    features = features.merge(weather_features[[
        "game_id", "is_dome", "temp_cold", "wind_high",
        "wind_extreme", "cold_windy"
    ]], on="game_id", how="left")

    features = features.merge(market_features[[
        "game_id", "home_implied_prob", "away_implied_prob",
        "home_fair_prob", "away_fair_prob",
        "consensus_spread", "consensus_total"
    ]], on="game_id", how="left")

    # Merge rolling EPA for home and away teams separately
    for side, team_col in [("home", "home_team"), ("away", "away_team")]:
        side_epa = epa_features.rename(
            columns={c: f"{side}_{c}" for c in epa_features.columns
                     if c not in ["game_id", "season", "week", "team"]}
        )
        side_epa = side_epa.rename(columns={"team": team_col})
        features = features.merge(
            side_epa, on=["game_id", "season", "week", team_col], how="left"
        )

    # Merge injuries for home and away
    for side, team_col in [("home", "home_team"), ("away", "away_team")]:
        side_inj = injury_features.rename(
            columns={c: f"{side}_{c}" for c in injury_features.columns
                     if c not in ["season", "week", "team"]}
        )
        side_inj = side_inj.rename(columns={"team": team_col})
        features = features.merge(
            side_inj, on=["season", "week", team_col], how="left"
        )

    # Differential features (home minus away)
    for w in [3, 5, 8]:
        features[f"epa_diff_r{w}"] = (
            features.get(f"home_off_epa_per_play_r{w}", 0) -
            features.get(f"away_off_epa_per_play_r{w}", 0)
        )

    features["injury_diff"] = (
        features.get("away_injury_impact_total", 0) -
        features.get("home_injury_impact_total", 0)
    )

    print(f"Final feature matrix: {features.shape[0]} games x {features.shape[1]} columns")
    return features

2.2 Complete Feature List

The following table catalogs the minimum 50 features you should engineer. Group them by category and document each one.

# | Feature Name | Category | Description | Chapter Reference
1-3 | off_epa_per_play_r{3,5,8} | EPA Efficiency | Rolling offensive EPA per play | Ch 15, 28
4-6 | def_epa_per_play_r{3,5,8} | EPA Efficiency | Rolling defensive EPA per play | Ch 15, 28
7-9 | off_success_rate_r{3,5,8} | EPA Efficiency | Rolling offensive success rate | Ch 15
10-12 | def_success_rate_r{3,5,8} | EPA Efficiency | Rolling defensive success rate | Ch 15
13-15 | epa_margin_r{3,5,8} | EPA Efficiency | Offense EPA minus defense EPA | Ch 15
16 | pass_epa_per_play | EPA Splits | EPA per pass play | Ch 15
17 | rush_epa_per_play | EPA Splits | EPA per rush play | Ch 15
18 | rest_differential | Schedule | Home rest days minus away rest days | Ch 15
19 | home_off_bye | Schedule | Home team coming off bye week | Ch 15
20 | away_off_bye | Schedule | Away team coming off bye week | Ch 15
21 | home_short_rest | Schedule | Home team on short rest | Ch 15
22 | away_short_rest | Schedule | Away team on short rest | Ch 15
23 | div_game | Schedule | Divisional matchup flag | Ch 15
24 | tz_diff | Schedule | Time zone differential | Ch 15
25 | is_primetime | Schedule | Primetime game flag | Ch 15
26 | season_phase | Schedule | Phase of season (early/mid/late) | Ch 23
27 | home_elo | Ratings | Home team Elo rating | Ch 26
28 | away_elo | Ratings | Away team Elo rating | Ch 26
29 | elo_diff | Ratings | Elo difference (home minus away) | Ch 26
30 | elo_predicted_spread | Ratings | Elo-derived point spread | Ch 26
31 | elo_home_win_prob | Ratings | Elo-derived home win probability | Ch 26
32-34 | massey_off, massey_def, massey_overall | Ratings | Massey ratings (Chapter 26 exercise) | Ch 26
35 | injury_impact_total | Injuries | Weighted injury severity sum | Ch 15, 32
36 | qb_injury_flag | Injuries | Starting QB out or doubtful | Ch 15
37 | injury_count | Injuries | Number of players on injury report | Ch 15
38 | injury_diff | Injuries | Injury impact differential | Ch 15
39-40 | positional_injury_off, positional_injury_def | Injuries | Offensive vs defensive injury impact | Ch 15
41 | is_dome | Weather | Indoor game flag | Ch 15
42 | temp_cold | Weather | Temperature below 35F | Ch 15
43 | wind_high | Weather | Wind above 15 mph | Ch 15
44 | wind_extreme | Weather | Wind above 25 mph | Ch 15
45 | cold_windy | Weather | Cold and windy interaction | Ch 15
46 | home_implied_prob | Market | Market-implied home win probability | Ch 2, 11
47 | home_fair_prob | Market | Vig-removed implied probability | Ch 2
48 | consensus_spread | Market | Consensus point spread | Ch 11
49 | consensus_total | Market | Consensus over/under total | Ch 11
50 | spread_vs_elo | Hybrid | Market spread minus Elo spread | Ch 11, 26
51 | home_record_r5 | Trend | Home team W-L last 5 games (see sketch below) | Ch 23
52 | away_record_r5 | Trend | Away team W-L last 5 games (see sketch below) | Ch 23
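
Features 51 and 52 are not implemented in the module above. A minimal sketch, assuming the schedules DataFrame from Phase 1 (with home_margin already computed); ties count as losses in this sketch:

import pandas as pd


def compute_rolling_record(schedules: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Rolling win rate over each team's previous `window` games."""
    # Reshape to one row per team per game with a win indicator
    home = schedules[["game_id", "season", "week", "home_team", "home_margin"]].rename(
        columns={"home_team": "team"})
    home["win"] = (home["home_margin"] > 0).astype(float)

    away = schedules[["game_id", "season", "week", "away_team", "home_margin"]].rename(
        columns={"away_team": "team"})
    away["win"] = (away["home_margin"] < 0).astype(float)

    long_df = pd.concat([home, away]).sort_values(["team", "season", "week"])

    # shift(1) keeps the current game out of its own feature (no look-ahead)
    long_df[f"record_r{window}"] = long_df.groupby("team")["win"].transform(
        lambda x: x.shift(1).rolling(window, min_periods=1).mean()
    )
    return long_df[["game_id", "team", f"record_r{window}"]]

Merge the result onto the feature matrix twice, once keyed on home_team and once on away_team, mirroring the EPA merge in build_feature_matrix(), to produce home_record_r5 and away_record_r5.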

Phase 2 Checklist

  • [ ] All 50+ features computed with shift(1) to prevent look-ahead bias
  • [ ] Rolling windows use only past data (never current game)
  • [ ] Elo ratings computed sequentially with proper between-season regression
  • [ ] Injury features weighted by positional importance
  • [ ] Weather features handle dome games correctly
  • [ ] Feature correlation matrix examined; highly correlated features (r > 0.90) addressed (see the sketch after this checklist)
  • [ ] Feature distributions visualized and outliers investigated
  • [ ] Missing values documented with imputation strategy
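
The following is a minimal sketch of the correlation screen mentioned above, assuming the matrix returned by build_feature_matrix() and a list of numeric feature columns:

import pandas as pd


def find_redundant_features(features: pd.DataFrame, feature_cols: list,
                            threshold: float = 0.90) -> list:
    """Return feature pairs whose absolute correlation exceeds the threshold."""
    corr = features[feature_cols].corr().abs()
    pairs = []
    for i, col_a in enumerate(feature_cols):
        for col_b in feature_cols[i + 1:]:
            if corr.loc[col_a, col_b] > threshold:
                pairs.append((col_a, col_b, round(float(corr.loc[col_a, col_b]), 3)))
    return pairs

For each flagged pair, keep the feature with the stronger football rationale and drop or combine the other.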

Phase 3: Model Building

Duration: Weeks 3--4

Relevant chapters: Chapter 9 (Regression), Chapter 27 (Advanced Regression/Classification), Chapter 29 (Neural Networks)

You must build at least three distinct model families. Each model targets one or more of: (a) point spread prediction, (b) game total prediction, (c) moneyline (win probability) prediction.

3.1 Model Specifications

Model A: Ridge/Lasso Regression (Chapter 9) - Target: home_margin (continuous) for spread prediction - Regularization selected via cross-validation - Serves as the interpretable baseline

Model B: XGBoost Gradient Boosting (Chapter 27) - Target: home_margin (regression) and home_win (classification) - Hyperparameter tuning via Optuna (Chapter 29, Section 29.5) - Feature importance analysis using SHAP (Chapter 27, Section 27.5)

Model C: Neural Network (Chapter 29) - Architecture: feed-forward with 2--3 hidden layers, dropout regularization - Target: home_margin (regression head) and home_win_probability (classification head) - Multi-task learning configuration

Model D (bonus): Ensemble/Stacking - Combine predictions from Models A, B, and C using a meta-learner - Stacking follows Chapter 27, Section 27.2 - A minimal sketch appears after the module below

"""
phase3_model_building.py
Train spread, totals, and moneyline prediction models.
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import xgboost as xgb
import torch
import torch.nn as nn
from typing import Dict, Tuple, List
import optuna
import shap

# ---------------------------------------------------------------------------
# Model A: Ridge Regression Baseline
# ---------------------------------------------------------------------------

class SpreadModelRidge:
    """
    Linear regression with L2 regularization for spread prediction.
    This is the interpretable baseline model (Chapter 9).
    """

    def __init__(self, alphas: list = None):
        if alphas is None:
            alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
        self.pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("ridge", RidgeCV(alphas=alphas, cv=5))
        ])
        self.feature_names = None

    def fit(self, X: pd.DataFrame, y: pd.Series):
        self.feature_names = list(X.columns)
        self.pipeline.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return self.pipeline.predict(X)

    def get_coefficients(self) -> pd.DataFrame:
        # Report coefficients on the standardized (z-scored) features so
        # their magnitudes are directly comparable across features.
        ridge = self.pipeline.named_steps["ridge"]
        coefs = ridge.coef_
        return pd.DataFrame({
            "feature": self.feature_names,
            "coefficient": coefs,
            "abs_coefficient": np.abs(coefs)
        }).sort_values("abs_coefficient", ascending=False)


# ---------------------------------------------------------------------------
# Model B: XGBoost
# ---------------------------------------------------------------------------

class SpreadModelXGB:
    """
    XGBoost gradient boosting model for spread prediction (Chapter 27).
    Supports both regression (spread prediction) and classification
    (win probability).
    """

    def __init__(self, task: str = "regression", params: dict = None):
        self.task = task
        default_params = {
            "n_estimators": 500,
            "max_depth": 6,
            "learning_rate": 0.05,
            "subsample": 0.8,
            "colsample_bytree": 0.8,
            "min_child_weight": 5,
            "reg_alpha": 0.1,
            "reg_lambda": 1.0,
            "random_state": 42,
        }
        if task == "regression":
            default_params["objective"] = "reg:squarederror"
            default_params["eval_metric"] = "rmse"
        else:
            default_params["objective"] = "binary:logistic"
            default_params["eval_metric"] = "logloss"

        if params:
            default_params.update(params)

        if task == "regression":
            self.model = xgb.XGBRegressor(**default_params)
        else:
            self.model = xgb.XGBClassifier(**default_params)

    def fit(self, X: pd.DataFrame, y: pd.Series,
            eval_set: list = None):
        # In xgboost >= 1.6 early stopping is configured on the estimator
        # (the early_stopping_rounds fit() argument was removed in 2.0).
        if eval_set:
            self.model.set_params(early_stopping_rounds=50)
            self.model.fit(X, y, eval_set=eval_set, verbose=False)
        else:
            self.model.fit(X, y, verbose=False)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return self.model.predict(X)

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        if self.task == "classification":
            return self.model.predict_proba(X)[:, 1]
        raise ValueError("predict_proba only available for classification")

    def explain(self, X: pd.DataFrame) -> shap.Explanation:
        explainer = shap.TreeExplainer(self.model)
        return explainer(X)

    @staticmethod
    def tune_hyperparameters(X_train, y_train, X_val, y_val,
                              n_trials: int = 100) -> dict:
        """
        Bayesian hyperparameter optimization using Optuna
        (Chapter 29, Section 29.5).
        """
        def objective(trial):
            params = {
                "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
                "max_depth": trial.suggest_int("max_depth", 3, 10),
                "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
                "subsample": trial.suggest_float("subsample", 0.6, 1.0),
                "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
                "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
                "reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
                "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
            }
            model = xgb.XGBRegressor(**params, random_state=42,
                                      objective="reg:squarederror",
                                      early_stopping_rounds=50)
            model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
                      verbose=False)
            preds = model.predict(X_val)
            rmse = np.sqrt(np.mean((preds - y_val) ** 2))
            return rmse

        study = optuna.create_study(direction="minimize")
        study.optimize(objective, n_trials=n_trials)
        return study.best_params


# ---------------------------------------------------------------------------
# Model C: Neural Network (PyTorch)
# ---------------------------------------------------------------------------

class NFLNet(nn.Module):
    """
    Multi-task neural network for NFL prediction (Chapter 29).
    Predicts both point spread (regression) and win probability
    (classification) simultaneously.
    """

    def __init__(self, input_dim: int, hidden_dims: list = None,
                 dropout: float = 0.3):
        super().__init__()
        if hidden_dims is None:
            hidden_dims = [128, 64, 32]

        layers = []
        prev_dim = input_dim
        for dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, dim),
                nn.BatchNorm1d(dim),
                nn.ReLU(),
                nn.Dropout(dropout),
            ])
            prev_dim = dim

        self.shared = nn.Sequential(*layers)
        self.spread_head = nn.Linear(prev_dim, 1)   # Regression
        self.win_head = nn.Sequential(
            nn.Linear(prev_dim, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        shared = self.shared(x)
        spread = self.spread_head(shared).squeeze(-1)
        win_prob = self.win_head(shared).squeeze(-1)
        return spread, win_prob


class NFLNetTrainer:
    """Training harness for the multi-task neural network."""

    def __init__(self, input_dim: int, lr: float = 0.001,
                 weight_decay: float = 1e-4, spread_weight: float = 0.7,
                 win_weight: float = 0.3):
        self.model = NFLNet(input_dim)
        self.optimizer = torch.optim.Adam(
            self.model.parameters(), lr=lr, weight_decay=weight_decay
        )
        self.spread_loss_fn = nn.MSELoss()
        self.win_loss_fn = nn.BCELoss()
        self.spread_weight = spread_weight
        self.win_weight = win_weight
        self.scaler = StandardScaler()

    def fit(self, X_train: np.ndarray, y_spread: np.ndarray,
            y_win: np.ndarray, epochs: int = 200, batch_size: int = 64,
            X_val: np.ndarray = None, y_spread_val: np.ndarray = None,
            patience: int = 20):
        X_scaled = self.scaler.fit_transform(X_train)
        X_t = torch.FloatTensor(X_scaled)
        y_s_t = torch.FloatTensor(y_spread)
        y_w_t = torch.FloatTensor(y_win)

        best_val_loss = float("inf")
        patience_counter = 0

        self.model.train()
        for epoch in range(epochs):
            indices = torch.randperm(len(X_t))
            epoch_loss = 0.0
            n_batches = 0

            for i in range(0, len(X_t), batch_size):
                batch_idx = indices[i:i+batch_size]
                X_batch = X_t[batch_idx]
                y_s_batch = y_s_t[batch_idx]
                y_w_batch = y_w_t[batch_idx]

                self.optimizer.zero_grad()
                pred_spread, pred_win = self.model(X_batch)

                loss_spread = self.spread_loss_fn(pred_spread, y_s_batch)
                loss_win = self.win_loss_fn(pred_win, y_w_batch)
                loss = self.spread_weight * loss_spread + self.win_weight * loss_win

                loss.backward()
                self.optimizer.step()

                epoch_loss += loss.item()
                n_batches += 1

            # Early stopping on validation set
            if X_val is not None:
                val_loss = self._validate(X_val, y_spread_val)
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    patience_counter = 0
                else:
                    patience_counter += 1
                    if patience_counter >= patience:
                        print(f"Early stopping at epoch {epoch}")
                        break

    def _validate(self, X_val, y_spread_val):
        self.model.eval()
        X_scaled = self.scaler.transform(X_val)
        X_t = torch.FloatTensor(X_scaled)
        with torch.no_grad():
            pred_spread, _ = self.model(X_t)
        self.model.train()
        return float(nn.MSELoss()(pred_spread, torch.FloatTensor(y_spread_val)))

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        self.model.eval()
        X_scaled = self.scaler.transform(X)
        X_t = torch.FloatTensor(X_scaled)
        with torch.no_grad():
            spread, win_prob = self.model(X_t)
        return spread.numpy(), win_prob.numpy()
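
Model D is not implemented above. The following is a minimal sketch of the stacking meta-learner, reusing the module's pandas/numpy imports and assuming the base models expose a common predict() interface on the same feature matrix (the NFLNetTrainer would need a thin wrapper that returns only its spread head):

from sklearn.linear_model import LinearRegression


class SpreadModelStack:
    """Stack base-model predictions with a linear meta-learner (Chapter 27, Section 27.2)."""

    def __init__(self, base_models: list):
        self.base_models = base_models      # e.g. [ridge_model, xgb_model, nn_wrapper]
        self.meta_model = LinearRegression()

    def fit(self, oof_predictions: pd.DataFrame, y: pd.Series):
        # oof_predictions must hold *out-of-fold* base-model predictions
        # (one column per base model); in-sample predictions would let the
        # meta-learner overfit to the base models' training errors.
        self.meta_model.fit(oof_predictions, y)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        base_preds = np.column_stack([m.predict(X) for m in self.base_models])
        return self.meta_model.predict(base_preds)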

Phase 3 Checklist

  • [ ] Ridge regression trained and coefficient table produced
  • [ ] XGBoost trained with early stopping; hyperparameters tuned via Optuna
  • [ ] Neural network trained with multi-task loss and early stopping
  • [ ] SHAP analysis completed for XGBoost model
  • [ ] All models produce predictions for: spread, totals, and win probability
  • [ ] No data leakage: all features use only past information relative to each game

Phase 4: Backtesting

Duration: Weeks 4--5

Relevant chapters: Chapter 30 (Model Evaluation), Chapter 24 (Monte Carlo Simulation), Chapter 8 (Hypothesis Testing)

4.1 Walk-Forward Validation Protocol

You must use walk-forward (expanding window) validation as described in Chapter 30, Section 30.3. This is the only evaluation methodology that realistically simulates how you would use the model in practice.

"""
phase4_backtesting.py
Walk-forward validation and performance evaluation.
"""

import pandas as pd
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error
from typing import List, Dict, Tuple
import warnings

# ---------------------------------------------------------------------------
# Walk-Forward Validation
# ---------------------------------------------------------------------------

def walk_forward_backtest(features: pd.DataFrame,
                           target_col: str,
                           model_class,
                           model_params: dict,
                           feature_cols: List[str],
                           train_start_season: int,
                           test_start_season: int,
                           test_end_season: int,
                           retrain_frequency: str = "season") -> pd.DataFrame:
    """
    Walk-forward validation for NFL betting models (Chapter 30).

    Protocol:
    1. Train on all data from train_start_season through season N-1.
    2. Predict all games in season N.
    3. Advance to season N+1, retrain on all data through season N.
    4. Repeat until test_end_season.

    The retrain_frequency parameter controls how often the model is
    retrained. For NFL, "season" (retrain before each season) is the
    standard approach given the 272-game regular season.

    This function returns a DataFrame of out-of-sample predictions
    alongside actual outcomes, suitable for evaluation.
    """
    all_predictions = []

    for test_season in range(test_start_season, test_end_season + 1):
        # Training data: all completed seasons before the test season
        train_mask = features["season"] < test_season
        test_mask = features["season"] == test_season

        X_train = features.loc[train_mask, feature_cols].copy()
        y_train = features.loc[train_mask, target_col].copy()
        X_test = features.loc[test_mask, feature_cols].copy()
        y_test = features.loc[test_mask, target_col].copy()

        # Drop rows with missing features
        valid_train = X_train.dropna().index
        X_train = X_train.loc[valid_train]
        y_train = y_train.loc[valid_train]

        valid_test = X_test.dropna().index
        X_test = X_test.loc[valid_test]
        y_test = y_test.loc[valid_test]

        if len(X_train) == 0 or len(X_test) == 0:
            continue

        # Train model
        model = model_class(**model_params)
        model.fit(X_train, y_train)

        # Predict
        preds = model.predict(X_test)

        # Store predictions
        test_df = features.loc[valid_test, [
            "game_id", "season", "week", "home_team", "away_team",
            target_col, "consensus_spread", "consensus_total"
        ]].copy()
        test_df["prediction"] = preds
        test_df["residual"] = test_df[target_col] - preds

        all_predictions.append(test_df)

        print(f"Season {test_season}: trained on {len(X_train)} games, "
              f"tested on {len(X_test)} games, "
              f"RMSE = {np.sqrt(mean_squared_error(y_test, preds)):.3f}")

    return pd.concat(all_predictions, ignore_index=True)


# ---------------------------------------------------------------------------
# Evaluation Metrics
# ---------------------------------------------------------------------------

def evaluate_spread_model(predictions: pd.DataFrame) -> Dict[str, float]:
    """
    Comprehensive evaluation of a spread prediction model.

    Metrics (from Chapter 30):
    - RMSE: root mean squared error of margin prediction
    - MAE: mean absolute error
    - ATS record: win rate picking against the spread
    - Brier score: for ATS picks as binary predictions
    - Correlation: predicted vs actual margin
    """
    df = predictions.copy()

    # Point spread accuracy
    rmse = np.sqrt(mean_squared_error(df["home_margin"], df["prediction"]))
    mae = np.mean(np.abs(df["home_margin"] - df["prediction"]))

    # Against-the-spread analysis
    df["model_ats_pick"] = np.where(
        df["prediction"] + df["consensus_spread"] > 0, "home", "away"
    )
    df["actual_ats"] = np.where(
        df["home_margin"] + df["consensus_spread"] > 0, "home", "away"
    )
    # Exclude pushes
    non_push = df["home_margin"] + df["consensus_spread"] != 0
    ats_correct = (df.loc[non_push, "model_ats_pick"] ==
                   df.loc[non_push, "actual_ats"])
    ats_record = ats_correct.mean()

    # Brier score for ATS (convert to probability-like confidence)
    # Using logistic transformation of predicted edge
    df["ats_edge"] = df["prediction"] + df["consensus_spread"]
    df["ats_probability"] = 1.0 / (1.0 + np.exp(-df["ats_edge"] / 6.0))
    df["ats_actual"] = (df["home_margin"] + df["consensus_spread"] > 0).astype(float)
    brier = brier_score_loss(
        df.loc[non_push, "ats_actual"],
        df.loc[non_push, "ats_probability"]
    )

    # Correlation
    corr = df["home_margin"].corr(df["prediction"])

    return {
        "rmse": rmse,
        "mae": mae,
        "ats_record": ats_record,
        "ats_games": non_push.sum(),
        "brier_score": brier,
        "correlation": corr
    }


def profit_simulation(predictions: pd.DataFrame,
                       unit_size: float = 100.0,
                       min_edge: float = 1.0,
                       vig: float = -110) -> Dict[str, float]:
    """
    Simulate flat-bet profitability of the model's ATS picks.

    Only bets where the model disagrees with the market by at least
    min_edge points. Assumes standard -110 vig on all bets.

    Returns profit metrics from Chapter 3 and Chapter 37.
    """
    df = predictions.copy()

    # Model edge = predicted home margin + home spread (betting convention);
    # positive means the model likes the home side more than the market does
    df["model_edge"] = df["prediction"] + df["consensus_spread"]
    df["bet_side"] = np.where(df["model_edge"] > 0, "home", "away")
    df["abs_edge"] = np.abs(df["model_edge"])

    # Filter to bets meeting minimum edge threshold
    bets = df[df["abs_edge"] >= min_edge].copy()

    if len(bets) == 0:
        return {"total_bets": 0, "profit": 0, "roi": 0}

    # Determine win/loss
    bets["actual_cover"] = np.where(
        bets["bet_side"] == "home",
        bets["home_margin"] + bets["consensus_spread"] > 0,
        bets["home_margin"] + bets["consensus_spread"] < 0
    )
    bets["push"] = (bets["home_margin"] + bets["consensus_spread"]) == 0

    # Profit calculation at -110
    risk_per_bet = unit_size
    win_per_bet = unit_size * (100.0 / abs(vig))

    bets["profit"] = np.where(
        bets["push"], 0.0,
        np.where(bets["actual_cover"], win_per_bet, -risk_per_bet)
    )

    total_profit = bets["profit"].sum()
    total_risked = len(bets) * risk_per_bet
    roi = total_profit / total_risked * 100

    # Calculate max drawdown
    cumulative = bets["profit"].cumsum()
    running_max = cumulative.cummax()
    drawdown = cumulative - running_max
    max_drawdown = drawdown.min()

    return {
        "total_bets": len(bets),
        "wins": int(bets["actual_cover"].sum()),
        "losses": int((~bets["actual_cover"] & ~bets["push"]).sum()),
        "pushes": int(bets["push"].sum()),
        "win_rate": bets["actual_cover"].mean(),
        "total_profit": total_profit,
        "total_risked": total_risked,
        "roi_pct": roi,
        "max_drawdown": max_drawdown,
        "avg_edge": bets["abs_edge"].mean(),
    }

4.2 Required Evaluation Outputs

  1. Model comparison table showing RMSE, MAE, ATS record, Brier score, and correlation for each model across each test season.
  2. Calibration plot (Chapter 30, Section 30.4) showing predicted probability vs actual outcome frequency, with at least 10 bins (a minimal sketch appears after this list).
  3. Profit simulation curve showing cumulative profit over time at three edge thresholds (0.5, 1.0, 2.0 points).
  4. SHAP summary plot for the XGBoost model showing the top 20 most important features.
  5. Season-by-season breakdown to identify whether performance is consistent or driven by one outlier season.
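
The following is a minimal sketch of the calibration plot (output 2), assuming the predictions DataFrame carries the ats_probability and ats_actual columns computed in evaluate_spread_model():

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.calibration import calibration_curve


def plot_calibration(predictions: pd.DataFrame, n_bins: int = 10) -> None:
    """Plot observed cover frequency against predicted cover probability."""
    frac_pos, mean_pred = calibration_curve(
        predictions["ats_actual"], predictions["ats_probability"],
        n_bins=n_bins, strategy="quantile"
    )
    plt.plot(mean_pred, frac_pos, marker="o", label="model")
    plt.plot([0, 1], [0, 1], linestyle="--", label="perfect calibration")
    plt.xlabel("Predicted cover probability")
    plt.ylabel("Observed cover frequency")
    plt.legend()
    plt.show()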

Phase 4 Checklist

  • [ ] Walk-forward backtest completed across 3+ out-of-sample seasons
  • [ ] No look-ahead bias in any feature or training step
  • [ ] All three model families evaluated on identical test sets
  • [ ] Brier score computed and compared to a naive baseline (market implied probability)
  • [ ] Profit simulation run with realistic -110 vig assumption
  • [ ] Calibration plots generated for all models
  • [ ] Results are statistically tested: is the ATS record significantly above 52.4%? (Chapter 8; see the sketch below)
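
The following is a minimal sketch of the significance test in the final item, assuming wins and total_bets come from profit_simulation() above:

from scipy import stats


def ats_significance(wins: int, total_bets: int, breakeven: float = 0.524) -> float:
    """One-sided binomial test: is the ATS win rate above the -110 break-even rate?"""
    result = stats.binomtest(wins, total_bets, p=breakeven, alternative="greater")
    return result.pvalue

Note that a 55% win rate over a few hundred bets typically does not clear conventional significance thresholds, so report the p-value honestly rather than declaring an edge.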

Phase 5: Betting Strategy

Duration: Week 6

Relevant chapters: Chapter 4 (Bankroll Management), Chapter 12 (Line Shopping), Chapter 13 (Value Betting), Chapter 14 (Advanced Bankroll), Chapter 25 (Optimization)

5.1 Kelly-Based Bet Sizing

Implement fractional Kelly sizing as described in Chapter 4 and fully derived in Chapter 14. You must use fractional Kelly (quarter or half Kelly) to account for estimation uncertainty.

"""
phase5_betting_strategy.py
Kelly sizing, line shopping, and bet recommendation engine.
"""

import numpy as np
import pandas as pd
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass

@dataclass
class BetRecommendation:
    game_id: str
    season: int
    week: int
    home_team: str
    away_team: str
    bet_side: str         # "home_spread", "away_spread", "over", "under"
    model_edge: float     # in points (spread) or probability (moneyline)
    best_odds: int        # American odds at best available book
    best_book: str        # Which sportsbook has the best line
    kelly_fraction: float # Recommended bet size as fraction of bankroll
    bet_amount: float     # Dollar amount at current bankroll
    expected_value: float # Expected profit of the bet


def fractional_kelly(win_prob: float, american_odds: int,
                      kelly_fraction: float = 0.25) -> float:
    """
    Calculate fractional Kelly bet size (Chapter 4, Chapter 14).

    Full Kelly maximizes the geometric growth rate of bankroll but
    requires perfectly known probabilities. Since our probabilities
    are estimated (with error), we use fractional Kelly:

    Full Kelly: f* = (bp - q) / b
    where b = decimal payout ratio, p = win probability, q = 1 - p

    Fractional Kelly: f = kelly_fraction * f*

    Quarter Kelly (0.25) is recommended for sports betting because:
    1. It reduces the impact of probability estimation errors
    2. Betting a fraction c of full Kelly scales variance by roughly c^2 while
       keeping roughly c(2 - c) of the growth rate, so half Kelly keeps ~75%
       of growth with ~25% of the variance, and quarter Kelly is more
       conservative still
    3. It makes drawdowns more psychologically manageable (Chapter 36)
    """
    if american_odds > 0:
        decimal_odds = 1.0 + american_odds / 100.0
    else:
        decimal_odds = 1.0 + 100.0 / abs(american_odds)

    b = decimal_odds - 1.0  # Net payout per unit risked
    p = win_prob
    q = 1.0 - p

    full_kelly = (b * p - q) / b

    if full_kelly <= 0:
        return 0.0

    return kelly_fraction * full_kelly

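# Worked example (hypothetical numbers, not prescribed by the project): with an
# estimated 55% win probability at -110 odds, b = 100/110 ~= 0.909, so
#   full Kelly     f* = (0.909 * 0.55 - 0.45) / 0.909 ~= 0.055
#   quarter Kelly  0.25 * f* ~= 0.014  (about 1.4% of bankroll)
# i.e. fractional_kelly(0.55, -110, kelly_fraction=0.25) returns ~0.014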

def compare_lines_across_books(game_id: str,
                                 odds_by_book: Dict[str, Dict]) -> Dict:
    """
    Find the best available line across sportsbooks (Chapter 12).

    odds_by_book format:
    {
        "DraftKings": {"home_spread": -3.0, "home_spread_odds": -110, ...},
        "FanDuel": {"home_spread": -2.5, "home_spread_odds": -115, ...},
        ...
    }

    Returns the best price for each bet type and the book offering it.
    Note: this compares prices only; when books post different point lines
    (e.g. -2.5 vs -3.0), weigh the line value itself as well (Chapter 12).
    """
    best_lines = {}

    for bet_type in ["home_spread", "away_spread", "over", "under",
                     "home_ml", "away_ml"]:
        best_odds = -999
        best_book = None
        best_line = None

        for book, lines in odds_by_book.items():
            odds_key = f"{bet_type}_odds"
            if odds_key in lines and lines[odds_key] is not None:
                if lines[odds_key] > best_odds:
                    best_odds = lines[odds_key]
                    best_book = book
                    best_line = lines.get(bet_type)

        best_lines[bet_type] = {
            "line": best_line,
            "odds": best_odds,
            "book": best_book
        }

    return best_lines


def generate_weekly_recommendations(
    predictions: pd.DataFrame,
    bankroll: float,
    odds_by_game: Dict[str, Dict[str, Dict]],
    min_edge_spread: float = 1.0,
    min_edge_total: float = 1.5,
    min_ev_pct: float = 0.02,
    kelly_fraction: float = 0.25,
    max_bet_pct: float = 0.05,
    max_weekly_exposure: float = 0.20
) -> List[BetRecommendation]:
    """
    Generate bet recommendations for a given week.

    Constraints (from Chapter 4 and Chapter 14):
    - min_edge_spread: minimum model edge in points to trigger a spread bet
    - min_edge_total: minimum model edge in points to trigger a total bet
    - min_ev_pct: minimum expected value as a percentage of bet amount
    - kelly_fraction: fraction of full Kelly to bet (default: quarter Kelly)
    - max_bet_pct: maximum single bet as fraction of bankroll (hard cap)
    - max_weekly_exposure: maximum total weekly exposure as fraction of bankroll
    """
    recommendations = []
    total_exposure = 0.0

    for _, game in predictions.iterrows():
        game_id = game["game_id"]

        if game_id not in odds_by_game:
            continue

        best_lines = compare_lines_across_books(game_id, odds_by_game[game_id])

        # Spread bet evaluation
        model_edge = game["prediction"] + game["consensus_spread"]
        if abs(model_edge) >= min_edge_spread:
            bet_side = "home_spread" if model_edge > 0 else "away_spread"
            line_info = best_lines[bet_side]

            if line_info["odds"] > -999:
                # Convert point edge to cover probability with a logistic
                # mapping. The 5.5 scale is a placeholder; calibrate it to
                # your own backtest (see the sketch after this listing).
                win_prob = 1.0 / (1.0 + np.exp(-abs(model_edge) / 5.5))

                # EV per unit risked: p * b - q, where b is the net payout
                b = (100.0 / abs(line_info["odds"])
                     if line_info["odds"] < 0
                     else line_info["odds"] / 100.0)
                ev_pct = win_prob * b - (1 - win_prob)

                if ev_pct >= min_ev_pct:
                    kelly = fractional_kelly(win_prob, line_info["odds"],
                                              kelly_fraction)
                    bet_pct = min(kelly, max_bet_pct)
                    bet_amount = round(bankroll * bet_pct, 2)

                    if total_exposure + bet_pct <= max_weekly_exposure and bet_amount > 0:
                        recommendations.append(BetRecommendation(
                            game_id=game_id,
                            season=game["season"],
                            week=game["week"],
                            home_team=game["home_team"],
                            away_team=game["away_team"],
                            bet_side=bet_side,
                            model_edge=abs(model_edge),
                            best_odds=line_info["odds"],
                            best_book=line_info["book"],
                            kelly_fraction=bet_pct,
                            bet_amount=bet_amount,
                            expected_value=ev_pct * bet_amount
                        ))
                        total_exposure += bet_pct

    # Sort by expected value, highest first
    recommendations.sort(key=lambda r: r.expected_value, reverse=True)
    return recommendations
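
The hard-coded 5.5 scale in the edge-to-probability mapping above is only a
placeholder. One way to replace it, sketched under the assumption that your
Phase 4 backtest produced a frame of historical bets with an absolute-edge
column and a cover indicator (column names here are illustrative):

import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_edge_to_prob(backtest_bets: pd.DataFrame):
    """Fit P(cover) as a function of absolute model edge (points) and return
    a callable mapping edge -> win probability for use in Phase 5."""
    X = backtest_bets[["abs_edge"]].to_numpy()
    y = backtest_bets["actual_cover"].astype(int).to_numpy()
    model = LogisticRegression().fit(X, y)
    return lambda edge: float(model.predict_proba([[abs(edge)]])[0, 1])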

Phase 5 Checklist

  • [ ] Fractional Kelly sizing implemented with quarter-Kelly default
  • [ ] Line shopping comparison across 3+ sportsbooks
  • [ ] Minimum edge thresholds calibrated against backtest results
  • [ ] Maximum bet size capped at 5% of bankroll
  • [ ] Maximum weekly exposure capped at 20% of bankroll
  • [ ] Expected value calculated for every recommended bet
  • [ ] Bet recommendations sorted by EV for portfolio construction

Phase 6: Production Pipeline

Duration: Week 7

Relevant chapters: Chapter 31 (ML Betting Pipeline), Chapter 37 (Discipline and Systems)

6.1 Weekly Automation Pipeline

Build a pipeline that runs every Tuesday (once new data from the prior week is available) and produces bet recommendations for the upcoming NFL week.

"""
phase6_production_pipeline.py
Weekly automated NFL betting pipeline.
"""

import schedule  # third-party job scheduler: pip install schedule
import time
import logging
from datetime import datetime
from pathlib import Path

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("pipeline.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)


class NFLBettingPipeline:
    """
    End-to-end weekly pipeline (Chapter 31).

    Workflow executed every Tuesday:
    1. Update data: pull latest play-by-play, injury reports, and odds
    2. Recompute features: rebuild feature matrix with latest game results
    3. Update model: retrain if configured (monthly or per-season)
    4. Generate predictions: run model on upcoming week's games
    5. Compare lines: fetch current odds across sportsbooks
    6. Generate recommendations: produce bet recommendations with Kelly sizing
    7. Log and report: store results and send notification
    """

    def __init__(self, config_path: str = "config.yaml"):
        self.config = self._load_config(config_path)
        self.bankroll = self.config.get("bankroll", 10000.0)
        self.current_season = self.config.get("current_season", 2024)
        self.current_week = self.config.get("current_week", 1)

    def _load_config(self, path: str) -> dict:
        import yaml
        with open(path) as f:
            return yaml.safe_load(f)

    def run_weekly_pipeline(self):
        """Execute the complete weekly pipeline."""
        logger.info("=" * 60)
        logger.info(f"NFL BETTING PIPELINE - Season {self.current_season}, "
                     f"Week {self.current_week}")
        logger.info("=" * 60)

        try:
            # Step 1: Update data
            logger.info("Step 1: Updating data...")
            self._update_data()

            # Step 2: Recompute features
            logger.info("Step 2: Computing features...")
            features = self._compute_features()

            # Step 3: Update model (if scheduled)
            logger.info("Step 3: Checking model freshness...")
            model = self._update_model_if_needed(features)

            # Step 4: Generate predictions
            logger.info("Step 4: Generating predictions...")
            predictions = self._generate_predictions(model, features)

            # Step 5: Compare lines
            logger.info("Step 5: Fetching current odds...")
            odds = self._fetch_current_odds()

            # Step 6: Generate recommendations
            logger.info("Step 6: Generating bet recommendations...")
            recommendations = self._generate_recommendations(predictions, odds)

            # Step 7: Log and report
            logger.info("Step 7: Logging results and sending report...")
            self._log_results(recommendations)
            self._send_report(recommendations)

            logger.info(f"Pipeline complete. {len(recommendations)} "
                        f"recommendations generated.")

        except Exception as e:
            logger.error(f"Pipeline failed: {e}", exc_info=True)
            self._send_alert(f"Pipeline failure: {e}")

    def _update_data(self):
        """Pull latest play-by-play and injury data."""
        # Re-run data collection for current season
        # (Implementation reuses Phase 1 code)
        pass

    def _compute_features(self):
        """Rebuild feature matrix with latest data."""
        # (Implementation reuses Phase 2 code)
        pass

    def _update_model_if_needed(self, features):
        """Retrain model if the retrain schedule requires it."""
        # Check last retrain date; retrain if more than N weeks ago
        # or if it is the start of a new season
        pass

    def _generate_predictions(self, model, features):
        """Run the ensemble model on upcoming games."""
        pass

    def _fetch_current_odds(self):
        """Fetch live odds from multiple sportsbooks via API."""
        pass

    def _generate_recommendations(self, predictions, odds):
        """Produce bet recommendations using Phase 5 strategy code."""
        pass

    def _log_results(self, recommendations):
        """Store recommendations to database for tracking (Chapter 37)."""
        pass

    def _send_report(self, recommendations):
        """Send weekly report via email or Slack webhook."""
        pass

    def _send_alert(self, message):
        """Send error alert for pipeline failures."""
        pass


# Schedule the pipeline to run every Tuesday at 10:00 AM
if __name__ == "__main__":
    pipeline = NFLBettingPipeline()
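    # For manual testing (see the Phase 6 checklist), call
    # pipeline.run_weekly_pipeline() once directly instead of waiting
    # for the scheduler.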

    schedule.every().tuesday.at("10:00").do(pipeline.run_weekly_pipeline)

    logger.info("Pipeline scheduler started. Waiting for next run...")
    while True:
        schedule.run_pending()
        time.sleep(60)

Phase 6 Checklist

  • [ ] Pipeline runs end-to-end from data update through recommendation
  • [ ] Error handling catches and reports failures at each step
  • [ ] Results logged to database for performance tracking (Chapter 37)
  • [ ] Notifications sent via email, Slack, or similar
  • [ ] Pipeline can be run manually for testing
  • [ ] Configuration file externalizes parameters (bankroll, thresholds, API keys)
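
A minimal sketch of the externalized configuration: the first three keys are
the ones NFLBettingPipeline reads above, while the remaining keys illustrate
the thresholds and credentials the checklist asks you to externalize (names
and values are examples, not requirements).

import yaml

EXAMPLE_CONFIG = """
bankroll: 10000.0
current_season: 2024
current_week: 1
kelly_fraction: 0.25        # illustrative strategy parameter
max_weekly_exposure: 0.20   # illustrative strategy parameter
odds_api_key: "YOUR_KEY"    # never commit real keys to version control
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
assert config["bankroll"] > 0 and 0 < config["max_weekly_exposure"] <= 1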

Project Structure

nfl-betting-system/
|-- README.md
|-- requirements.txt
|-- config.yaml
|-- pipeline.log
|
|-- src/
|   |-- __init__.py
|   |-- data/
|   |   |-- __init__.py
|   |   |-- collect_pbp.py          # Phase 1: play-by-play collection
|   |   |-- collect_odds.py         # Phase 1: odds collection
|   |   |-- collect_injuries.py     # Phase 1: injury data
|   |   |-- collect_weather.py      # Phase 1: weather data
|   |   |-- database.py             # SQLite read/write utilities
|   |
|   |-- features/
|   |   |-- __init__.py
|   |   |-- epa_features.py         # Phase 2: EPA-based features
|   |   |-- schedule_features.py    # Phase 2: schedule/rest features
|   |   |-- elo_ratings.py          # Phase 2: Elo rating system
|   |   |-- injury_features.py      # Phase 2: injury impact
|   |   |-- weather_features.py     # Phase 2: weather features
|   |   |-- market_features.py      # Phase 2: market-derived features
|   |   |-- build_matrix.py         # Phase 2: combine all features
|   |
|   |-- models/
|   |   |-- __init__.py
|   |   |-- ridge_model.py          # Phase 3: Ridge regression
|   |   |-- xgboost_model.py        # Phase 3: XGBoost
|   |   |-- neural_net.py           # Phase 3: PyTorch neural network
|   |   |-- ensemble.py             # Phase 3: model ensemble
|   |   |-- calibration.py          # Phase 3: probability calibration
|   |
|   |-- evaluation/
|   |   |-- __init__.py
|   |   |-- backtest.py             # Phase 4: walk-forward validation
|   |   |-- metrics.py              # Phase 4: evaluation metrics
|   |   |-- profit_sim.py           # Phase 4: profit simulation
|   |   |-- visualizations.py       # Phase 4: plots and dashboards
|   |
|   |-- strategy/
|   |   |-- __init__.py
|   |   |-- kelly.py                # Phase 5: Kelly sizing
|   |   |-- line_shopping.py        # Phase 5: cross-book comparison
|   |   |-- recommendations.py      # Phase 5: bet recommendation engine
|   |
|   |-- pipeline/
|       |-- __init__.py
|       |-- weekly_pipeline.py      # Phase 6: automated weekly pipeline
|       |-- scheduler.py            # Phase 6: scheduling
|       |-- notifications.py        # Phase 6: alerting
|
|-- notebooks/
|   |-- 01_data_exploration.ipynb
|   |-- 02_feature_analysis.ipynb
|   |-- 03_model_comparison.ipynb
|   |-- 04_backtest_results.ipynb
|   |-- 05_strategy_analysis.ipynb
|
|-- data/
|   |-- nfl_betting.db              # SQLite database
|   |-- raw/                        # Raw downloaded data
|   |-- processed/                  # Processed feature files
|
|-- models/
|   |-- ridge_latest.pkl
|   |-- xgboost_latest.json
|   |-- neural_net_latest.pt
|
|-- reports/
|   |-- weekly/                     # Weekly recommendation reports
|   |-- backtest_report.html        # Backtest dashboard
|   |-- technical_report.pdf        # Final technical report
|
|-- tests/
    |-- test_data_collection.py
    |-- test_features.py
    |-- test_models.py
    |-- test_strategy.py
    |-- test_pipeline.py

Grading Rubric

Component | Weight | Excellent (90-100%) | Good (75-89%) | Satisfactory (60-74%) | Needs Work (<60%)
Data Pipeline | 15% | Automated collection from 3+ sources; robust error handling; complete data quality checks | 2 sources; basic error handling; data quality checked | 1 source; manual steps required; minimal quality checks | Incomplete data; critical errors; no quality assurance
Feature Engineering | 20% | 80+ features across 6+ categories; all grounded in analytics research; no leakage | 50-79 features; 4-5 categories; no leakage | 30-49 features; 3 categories; minor leakage risk | <30 features; obvious data leakage; poorly motivated
Modeling | 20% | 4+ model types with ensemble; hyperparameter tuning; SHAP analysis; multi-task learning | 3 model types; tuning performed; basic feature importance | 2 model types; default hyperparameters; no interpretability | 1 model; no tuning; no understanding of model behavior
Backtesting | 20% | Walk-forward across 5+ seasons; all metrics computed; statistical significance tested; calibration excellent | 3-4 seasons; most metrics computed; calibration checked | 2 seasons; basic metrics only; no calibration analysis | No proper temporal separation; metrics incomplete
Strategy & Production | 15% | Kelly sizing with constraints; multi-book line shopping; fully automated pipeline with monitoring | Kelly sizing; single book; semi-automated pipeline | Flat bet sizing; no line shopping; manual process | No sizing logic; no strategy; no pipeline
Documentation & Report | 10% | Professional technical report; clear code documentation; insightful self-evaluation | Complete report; adequate documentation; some reflection | Incomplete report; minimal documentation | Missing report; no documentation

Suggested Timeline

Week | Phase | Key Activities | Milestone
1 | Phase 1: Data Collection | Set up environment, download data, build database | Database populated with 5+ seasons
2 | Phase 2: Feature Engineering (Part 1) | EPA features, schedule features, Elo ratings | 30+ features computed
3 | Phase 2 (cont.) + Phase 3 starts | Injury/weather/market features; begin Ridge model | 50+ features; baseline model trained
4 | Phase 3: Model Building | XGBoost and neural network; hyperparameter tuning | All 3+ models trained
5 | Phase 4: Backtesting | Walk-forward validation; evaluation metrics; profit simulation | Backtest report complete
6 | Phase 5: Betting Strategy | Kelly sizing; line shopping; recommendation engine | Strategy module functional
7 | Phase 6: Production Pipeline | Automation; scheduling; monitoring; alerting | Pipeline runs end-to-end
8 | Documentation & Polish | Technical report; presentation; code cleanup | All deliverables submitted

Chapter Reference Index

The following chapters are directly applied in this capstone project:

  • Chapter 2 (Probability and Odds): Implied probability extraction, vig removal
  • Chapter 3 (Expected Value): EV calculation for bet recommendations
  • Chapter 4 (Bankroll Management): Kelly Criterion, bankroll constraints
  • Chapter 5 (Data Literacy): Data collection, cleaning, storage
  • Chapter 8 (Hypothesis Testing): Statistical significance of ATS record
  • Chapter 9 (Regression Analysis): Ridge regression baseline model
  • Chapter 10 (Bayesian Thinking): Prior information in probability estimation
  • Chapter 11 (Betting Markets): Market-derived features, line interpretation
  • Chapter 12 (Line Shopping): Multi-book comparison, CLV tracking
  • Chapter 13 (Value Betting): Systematic value identification framework
  • Chapter 14 (Advanced Bankroll): Full Kelly derivation, fractional Kelly, portfolio theory
  • Chapter 15 (Modeling the NFL): EPA metrics, injury impact, schedule factors, home field advantage
  • Chapter 23 (Time Series): Rolling windows, mean reversion, seasonal patterns
  • Chapter 24 (Monte Carlo Simulation): Profit simulation, confidence intervals
  • Chapter 25 (Optimization): Portfolio-level bet sizing optimization
  • Chapter 26 (Ratings and Rankings): Elo rating system, Massey ratings
  • Chapter 27 (Advanced Regression/Classification): XGBoost, SHAP, calibration
  • Chapter 28 (Feature Engineering): Feature design principles, domain features
  • Chapter 29 (Neural Networks): Multi-task PyTorch model, Optuna tuning
  • Chapter 30 (Model Evaluation): Walk-forward validation, Brier score, calibration plots
  • Chapter 31 (ML Betting Pipeline): System architecture, automation, monitoring
  • Chapter 36 (Psychology): Managing variance and drawdowns emotionally
  • Chapter 37 (Discipline and Systems): Bet logging, performance tracking, review processes

Tips for Success

  1. Start with data quality. The single most common failure mode in sports prediction projects is bad data. Verify your data against known results before building anything.

  2. Baseline early. Get Ridge regression working in week 2. Use it as your reference point for all subsequent models. If a complex model cannot beat Ridge, something is wrong.

  3. Respect temporal ordering. Every feature, every training split, every evaluation must strictly respect the arrow of time. Even one feature that inadvertently uses future information will invalidate your entire backtest (a short sketch follows this list).

  4. The market is your toughest competitor. The closing spread is an extremely efficient estimate of the expected margin; independent models rarely improve on it by more than a point or two of average error. A model that consistently beats the market by even 0.5 points is exceptional.

  5. Bet sizing matters more than prediction accuracy. A perfectly calibrated model with poor bet sizing will underperform a decent model with disciplined Kelly sizing (Chapter 14).

  6. Document everything. Your future self (and your grader) will thank you. Record every decision, every hyperparameter choice, and every result.
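
To make tip 3 concrete, here is a minimal sketch of a leakage-safe rolling
feature (column names are illustrative): the value for week t must be computed
from weeks 1 through t-1 only, hence the shift before the rolling mean.

import pandas as pd

def add_rolling_epa(team_games: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a rolling offensive-EPA feature without look-ahead bias."""
    df = team_games.sort_values(["team", "season", "week"]).copy()
    df["off_epa_rolling"] = (
        df.groupby("team")["off_epa_per_play"]
          .transform(lambda s: s.shift(1).rolling(window, min_periods=1).mean())
    )
    return df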


This capstone project integrates material from Chapters 2--5, 8--15, 23--31, 36, and 37 of The Sports Betting Textbook.