Capstone Project 1: Complete NFL Prediction and Betting System
Project Overview
This capstone project challenges you to design, build, test, and document a complete end-to-end NFL game prediction and betting system. You will collect real data, engineer meaningful features, train multiple predictive models, rigorously backtest their performance, and construct a production-ready pipeline that generates weekly bet recommendations with proper bankroll management.
This project synthesizes material from nearly every part of the textbook. You will apply probability theory (Chapter 2), expected value analysis (Chapter 3), bankroll management (Chapter 4), data literacy skills (Chapter 5), regression analysis (Chapter 9), Bayesian reasoning (Chapter 10), market understanding (Chapter 11), line shopping strategies (Chapter 12), value betting theory (Chapter 13), Kelly staking (Chapter 14), NFL-specific modeling (Chapter 15), time series methods (Chapter 23), Monte Carlo simulation (Chapter 24), optimization (Chapter 25), rating systems (Chapter 26), advanced regression and classification (Chapter 27), feature engineering (Chapter 28), neural networks (Chapter 29), model evaluation (Chapter 30), production pipeline design (Chapter 31), and discipline systems (Chapter 37).
The result should be a system you could realistically deploy for a live NFL season -- not a toy exercise, but a genuine analytical tool grounded in sound statistical methodology.
Learning Objectives
Upon completing this project, you will be able to:
- Build automated data collection pipelines for NFL play-by-play data, odds, and contextual information.
- Engineer predictive features grounded in football analytics research (EPA, DVOA-style metrics, schedule effects).
- Train, tune, and evaluate multiple model families for spread, totals, and moneyline prediction.
- Conduct rigorous walk-forward backtesting that avoids look-ahead bias and reflects realistic execution.
- Implement Kelly-based bet sizing with fractional scaling and risk constraints.
- Design a production pipeline that runs autonomously on a weekly schedule.
- Communicate your methodology and results in a professional technical report.
Requirements and Deliverables
Specific, Measurable Requirements
| Requirement | Minimum Standard | Exceeds Expectations |
|---|---|---|
| Data sources | 2 play-by-play sources + 1 odds source | 3+ play-by-play sources + 3+ odds sources |
| Historical seasons collected | 5 seasons (2019--2023) | 8+ seasons (2016--2023) |
| Engineered features | 50 unique features | 80+ unique features across 6+ categories |
| Model types trained | 3 distinct model families | 5+ model families with ensemble |
| Walk-forward backtest seasons | 3 out-of-sample seasons | 5+ out-of-sample seasons |
| Bet types modeled | Spreads and totals | Spreads, totals, and moneylines |
| Brier score (spread ATS) | Below 0.260 | Below 0.250 |
| Documentation length | 15 pages technical report | 20+ pages with supplementary analysis |
| Code quality | Runs without errors, basic comments | Fully documented, typed, with tests |
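For reference, the ATS Brier score in the table is the mean squared error between your predicted cover probability and the 0/1 outcome. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def brier_score(predicted_probs: np.ndarray, outcomes: np.ndarray) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((np.asarray(predicted_probs) - np.asarray(outcomes)) ** 2))

# A constant 50% prediction scores exactly 0.25 on any set of outcomes,
# which is why the table treats 0.250 as the line between noise and skill.
coin_flip = brier_score(np.full(4, 0.5), np.array([1, 0, 1, 1]))  # 0.25
```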
Final Deliverables
- Complete Python codebase organized per the project structure below, with a README.md explaining setup and execution.
- Technical report (15--20 pages) covering methodology, feature importance, model comparison, backtest results, and critical self-evaluation.
- Backtest results dashboard -- either a Jupyter notebook with interactive plots or a Streamlit/Dash web application.
- 10-minute presentation with slides summarizing the approach and key findings.
Phase 1: Data Collection
Duration: Week 1 (of 8)
Relevant chapters: Chapter 5 (Data Literacy), Chapter 15 (Modeling the NFL), Chapter 31 (ML Betting Pipeline)
1.1 NFL Play-by-Play Data
The foundation of any NFL model is play-by-play data. Your primary source is the nfl_data_py package, which wraps nflfastR data and provides pre-computed EPA (Expected Points Added) and WPA (Win Probability Added) values for every play.
Required data fields: game_id, season, week, home_team, away_team, posteam, defteam, play_type, yards_gained, epa, wpa, success (binary: EPA > 0), down, ydstogo, yardline_100, score_differential, half_seconds_remaining, pass_attempt, rush_attempt, interception, fumble_lost, sack, penalty, touchdown, field_goal_attempt, punt, qb_name, receiver_name, rusher_name, air_yards, yards_after_catch.
"""
phase1_data_collection.py
Collect and store NFL play-by-play data, schedule information,
odds history, and contextual features.
"""
import nfl_data_py as nfl
import pandas as pd
import sqlite3
import requests
import time
from pathlib import Path
from datetime import datetime
from typing import List, Optional
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)
DB_PATH = DATA_DIR / "nfl_betting.db"
SEASONS = list(range(2016, 2024)) # 2016 through 2023
# ---------------------------------------------------------------------------
# 1. Play-by-Play Data
# ---------------------------------------------------------------------------
def collect_play_by_play(seasons: List[int]) -> pd.DataFrame:
"""
Download NFL play-by-play data for the specified seasons using nfl_data_py.
This includes pre-computed EPA and WPA for every play.
Returns a DataFrame with one row per play, all plays across all seasons.
"""
print(f"Downloading play-by-play data for seasons: {seasons}")
pbp = nfl.import_pbp_data(seasons, downcast=True)
# Filter to regular season and playoffs
pbp = pbp[pbp["season_type"].isin(["REG", "POST"])].copy()
# Verify critical columns exist
required_cols = [
"game_id", "season", "week", "home_team", "away_team",
"posteam", "defteam", "play_type", "yards_gained", "epa",
"wpa", "success", "down", "ydstogo", "score_differential",
"half_seconds_remaining", "pass_attempt", "rush_attempt",
"interception", "fumble_lost", "sack", "touchdown"
]
missing = [c for c in required_cols if c not in pbp.columns]
if missing:
raise ValueError(f"Missing expected columns: {missing}")
print(f"Collected {len(pbp):,} plays across {len(seasons)} seasons")
return pbp
def aggregate_game_level_stats(pbp: pd.DataFrame) -> pd.DataFrame:
"""
Aggregate play-by-play data to the game-team level.
Produces one row per team per game with offensive and defensive summaries.
"""
# Offensive stats (when team has possession)
off = pbp.groupby(["game_id", "season", "week", "posteam"]).agg(
off_plays=("play_type", "count"),
off_epa_total=("epa", "sum"),
off_epa_per_play=("epa", "mean"),
off_success_rate=("success", "mean"),
off_pass_attempts=("pass_attempt", "sum"),
off_rush_attempts=("rush_attempt", "sum"),
off_yards=("yards_gained", "sum"),
off_touchdowns=("touchdown", "sum"),
off_interceptions=("interception", "sum"),
off_fumbles_lost=("fumble_lost", "sum"),
off_sacks_taken=("sack", "sum"),
).reset_index().rename(columns={"posteam": "team"})
# Defensive stats (when opposing team has possession)
defense = pbp.groupby(["game_id", "season", "week", "defteam"]).agg(
def_plays=("play_type", "count"),
def_epa_total=("epa", "sum"),
def_epa_per_play=("epa", "mean"),
def_success_rate=("success", "mean"),
def_pass_attempts_faced=("pass_attempt", "sum"),
def_rush_attempts_faced=("rush_attempt", "sum"),
def_yards_allowed=("yards_gained", "sum"),
def_touchdowns_allowed=("touchdown", "sum"),
def_interceptions_forced=("interception", "sum"),
def_fumbles_forced=("fumble_lost", "sum"),
def_sacks=("sack", "sum"),
).reset_index().rename(columns={"defteam": "team"})
# Merge offensive and defensive stats
game_stats = off.merge(
defense,
on=["game_id", "season", "week", "team"],
how="outer"
)
print(f"Aggregated {len(game_stats):,} team-game records")
return game_stats
# ---------------------------------------------------------------------------
# 2. Schedule and Results
# ---------------------------------------------------------------------------
def collect_schedules(seasons: List[int]) -> pd.DataFrame:
"""
Download NFL schedules with game results.
Returns one row per game with teams, scores, location, and surface info.
"""
schedules = nfl.import_schedules(seasons)
schedules = schedules[[
"game_id", "season", "game_type", "week", "gameday", "weekday",
"gametime", "away_team", "away_score", "home_team", "home_score",
"location", "roof", "surface", "temp", "wind", "away_rest",
"home_rest", "away_moneyline", "home_moneyline", "spread_line",
"away_spread_odds", "home_spread_odds", "total_line",
"under_odds", "over_odds", "div_game", "overtime"
]].copy()
schedules["home_margin"] = schedules["home_score"] - schedules["away_score"]
schedules["total_score"] = schedules["home_score"] + schedules["away_score"]
print(f"Collected {len(schedules):,} games across {len(seasons)} seasons")
return schedules
# ---------------------------------------------------------------------------
# 3. Historical Odds Data
# ---------------------------------------------------------------------------
def collect_odds_history(seasons: List[int]) -> pd.DataFrame:
"""
Collect historical closing odds from multiple sportsbooks.
Primary source: The odds data embedded in nfl_data_py schedules (from
ESPN / standard consensus lines).
Secondary sources to supplement:
- Australian Sports Betting dataset (free, covers NFL from 2007+)
URL: https://www.aussportsbetting.com/data/
- Pro-Football-Reference game lines
- Kaggle NFL odds datasets
For multi-book analysis, you should also integrate data from the
Odds API (https://the-odds-api.com/) which provides lines from
DraftKings, FanDuel, BetMGM, Caesars, and others.
"""
# Start with schedule-embedded odds
schedules = nfl.import_schedules(seasons)
odds = schedules[[
"game_id", "season", "week", "away_team", "home_team",
"away_moneyline", "home_moneyline", "spread_line",
"away_spread_odds", "home_spread_odds",
"total_line", "under_odds", "over_odds"
]].copy()
odds.rename(columns={
"spread_line": "consensus_spread",
"total_line": "consensus_total",
"home_moneyline": "home_ml",
"away_moneyline": "away_ml"
}, inplace=True)
print(f"Collected odds for {len(odds):,} games")
return odds
def fetch_odds_api_snapshot(api_key: str, sport: str = "americanfootball_nfl") -> dict:
"""
Fetch current odds from The Odds API (requires free API key).
Returns odds from multiple sportsbooks for current NFL games.
Free tier: 500 requests/month.
"""
url = f"https://api.the-odds-api.com/v4/sports/{sport}/odds/"
params = {
"apiKey": api_key,
"regions": "us",
"markets": "spreads,totals,h2h",
"oddsFormat": "american",
"bookmakers": "draftkings,fanduel,betmgm,caesars,pointsbet"
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
return response.json()
# ---------------------------------------------------------------------------
# 4. Contextual Data: Injuries, Weather, Travel
# ---------------------------------------------------------------------------
def collect_injury_data(seasons: List[int]) -> pd.DataFrame:
"""
Collect NFL injury report data. nfl_data_py provides weekly injury data
including player name, team, position, injury type, and game status
(Out, Doubtful, Questionable, Probable).
"""
injuries = nfl.import_injuries(seasons)
injuries = injuries[[
"season", "week", "team", "full_name", "position",
"report_status", "practice_status"
]].copy()
# Encode severity for modeling
status_map = {
"Out": 1.0,
"Doubtful": 0.85,
"Questionable": 0.50,
"Probable": 0.10,
}
injuries["miss_probability"] = injuries["report_status"].map(status_map).fillna(0.0)
print(f"Collected {len(injuries):,} injury records")
return injuries
def calculate_travel_distance(home_team: str, away_team: str,
team_locations: dict) -> float:
"""
Calculate approximate travel distance (miles) between team cities.
team_locations should map team abbreviations to (latitude, longitude).
Uses the Haversine formula.
"""
import math
lat1, lon1 = team_locations[away_team]
lat2, lon2 = team_locations[home_team]
R = 3959 # Earth radius in miles
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = (math.sin(dlat / 2) ** 2 +
math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
math.sin(dlon / 2) ** 2)
c = 2 * math.asin(math.sqrt(a))
return R * c
# NFL team locations (latitude, longitude)
TEAM_LOCATIONS = {
"ARI": (33.5276, -112.2626), "ATL": (33.7573, -84.4009),
"BAL": (39.2780, -76.6227), "BUF": (42.7738, -78.7870),
"CAR": (35.2258, -80.8528), "CHI": (41.8623, -87.6167),
"CIN": (39.0955, -84.5160), "CLE": (41.5061, -81.6995),
"DAL": (32.7473, -97.0945), "DEN": (39.7439, -105.0201),
"DET": (42.3400, -83.0456), "GB": (44.5013, -88.0622),
"HOU": (29.6847, -95.4107), "IND": (39.7601, -86.1639),
"JAX": (30.3239, -81.6373), "KC": (39.0489, -94.4839),
"LV": (36.0909, -115.1833), "LAC": (33.9535, -118.3392),
"LAR": (33.9535, -118.3392), "MIA": (25.9580, -80.2389),
"MIN": (44.9736, -93.2575), "NE": (42.0909, -71.2643),
"NO": (29.9511, -90.0812), "NYG": (40.8128, -74.0742),
"NYJ": (40.8128, -74.0742), "PHI": (39.9008, -75.1675),
"PIT": (40.4468, -80.0158), "SF": (37.4032, -121.9698),
"SEA": (47.5952, -122.3316), "TB": (27.9759, -82.5033),
"TEN": (36.1665, -86.7713), "WAS": (38.9076, -76.8645),
}
# ---------------------------------------------------------------------------
# 5. Database Storage
# ---------------------------------------------------------------------------
def store_to_database(df: pd.DataFrame, table_name: str,
db_path: Path = DB_PATH) -> None:
"""Store a DataFrame to SQLite for persistent, queryable access."""
conn = sqlite3.connect(str(db_path))
df.to_sql(table_name, conn, if_exists="replace", index=False)
conn.close()
print(f"Stored {len(df):,} rows to table '{table_name}'")
def load_from_database(table_name: str, db_path: Path = DB_PATH) -> pd.DataFrame:
"""Load a table from the SQLite database."""
conn = sqlite3.connect(str(db_path))
df = pd.read_sql(f"SELECT * FROM {table_name}", conn)
conn.close()
return df
# ---------------------------------------------------------------------------
# Main Collection Pipeline
# ---------------------------------------------------------------------------
def run_data_collection():
"""Execute the complete data collection pipeline."""
print("=" * 60)
print("NFL BETTING SYSTEM - DATA COLLECTION PIPELINE")
print("=" * 60)
# Play-by-play
pbp = collect_play_by_play(SEASONS)
store_to_database(pbp, "play_by_play")
# Aggregated game-level stats
game_stats = aggregate_game_level_stats(pbp)
store_to_database(game_stats, "game_stats")
# Schedules and results
schedules = collect_schedules(SEASONS)
store_to_database(schedules, "schedules")
# Odds
odds = collect_odds_history(SEASONS)
store_to_database(odds, "odds_history")
# Injuries
injuries = collect_injury_data(SEASONS)
store_to_database(injuries, "injuries")
print("\nData collection complete.")
print(f"Database stored at: {DB_PATH}")
if __name__ == "__main__":
run_data_collection()
Phase 1 Checklist
- [ ] Play-by-play data downloaded for all target seasons
- [ ] Game-level aggregated statistics computed and stored
- [ ] Schedule and results data collected with home margins and totals
- [ ] Odds data collected from at least one source; multi-book data integrated if available
- [ ] Injury reports downloaded and severity-encoded
- [ ] Travel distances computed for all team matchups
- [ ] All data stored in SQLite database with consistent team abbreviations
- [ ] Data quality checks passed: no missing game IDs, scores match schedule data, EPA values present for 95%+ of plays
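The quality checks in the final item can be sketched as a small validation step. The column names match the pipeline above, the 95% EPA-coverage threshold is the one stated in the checklist, and `run_quality_checks` is an illustrative helper rather than part of any library:

```python
import pandas as pd

def run_quality_checks(schedules: pd.DataFrame, pbp: pd.DataFrame) -> dict:
    """Run a subset of the Phase 1 quality checks; returns a pass/fail summary."""
    results = {}
    # No missing game IDs in either table
    results["no_missing_game_ids"] = bool(
        schedules["game_id"].notna().all() and pbp["game_id"].notna().all()
    )
    # Every play's game_id should appear in the schedule
    results["pbp_games_in_schedule"] = bool(
        pbp["game_id"].isin(set(schedules["game_id"])).all()
    )
    # EPA present for at least 95% of plays
    results["epa_coverage_ok"] = bool(pbp["epa"].notna().mean() >= 0.95)
    return results
```

Run it once after `run_data_collection()` and refuse to proceed to Phase 2 until every check passes.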
Phase 2: Feature Engineering
Duration: Weeks 2--3
Relevant chapters: Chapter 15 (NFL Modeling), Chapter 23 (Time Series), Chapter 26 (Ratings), Chapter 28 (Feature Engineering)
You must engineer a minimum of 50 features across the following categories. Each feature should have a documented rationale grounded in football analytics research.
2.1 EPA-Based Efficiency Features
These are the core predictive features. Research consistently shows EPA-based metrics are more predictive of future performance than traditional stats like yards and points (see Chapter 15, Section 15.2).
"""
phase2_feature_engineering.py
Engineer predictive features from raw NFL data.
"""
import pandas as pd
import numpy as np
from typing import Tuple
# ---------------------------------------------------------------------------
# EPA-Based Efficiency (Features 1-16)
# ---------------------------------------------------------------------------
def compute_rolling_epa_features(game_stats: pd.DataFrame,
windows: list = [3, 5, 8]) -> pd.DataFrame:
"""
Compute rolling EPA-based efficiency metrics over multiple windows.
The choice of window sizes follows Chapter 23 (Time Series): short windows
(3 games) capture recent form, medium windows (5 games) balance signal
and noise, and longer windows (8 games) approximate half-season trends.
Features produced per window:
- rolling_off_epa_per_play: offensive EPA/play (higher is better)
- rolling_def_epa_per_play: defensive EPA/play (lower is better)
- rolling_off_success_rate: fraction of plays with positive EPA
- rolling_def_success_rate: fraction of opponent plays with positive EPA
- rolling_pass_epa: EPA/play on pass plays only
- rolling_rush_epa: EPA/play on rush plays only
- rolling_epa_margin: offensive EPA minus defensive EPA allowed
"""
# Sort by team and chronological order
df = game_stats.sort_values(["team", "season", "week"]).copy()
for w in windows:
grp = df.groupby("team")
# Offensive EPA per play
df[f"off_epa_per_play_r{w}"] = grp["off_epa_per_play"].transform(
lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
)
# Defensive EPA per play
df[f"def_epa_per_play_r{w}"] = grp["def_epa_per_play"].transform(
lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
)
# Offensive success rate
df[f"off_success_rate_r{w}"] = grp["off_success_rate"].transform(
lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
)
# Defensive success rate
df[f"def_success_rate_r{w}"] = grp["def_success_rate"].transform(
lambda x: x.shift(1).rolling(w, min_periods=max(1, w // 2)).mean()
)
# Combined EPA margin
df[f"epa_margin_r{w}"] = df[f"off_epa_per_play_r{w}"] - df[f"def_epa_per_play_r{w}"]
return df
def compute_pass_rush_splits(pbp: pd.DataFrame,
game_stats: pd.DataFrame) -> pd.DataFrame:
"""
Compute separate EPA metrics for pass and rush plays.
Pass EPA is generally more stable and predictive than rush EPA
(see Chapter 15, Section 15.2).
"""
pass_epa = pbp[pbp["pass_attempt"] == 1].groupby(
["game_id", "posteam"]
)["epa"].mean().reset_index().rename(
columns={"posteam": "team", "epa": "pass_epa_per_play"}
)
rush_epa = pbp[pbp["rush_attempt"] == 1].groupby(
["game_id", "posteam"]
)["epa"].mean().reset_index().rename(
columns={"posteam": "team", "epa": "rush_epa_per_play"}
)
df = game_stats.merge(pass_epa, on=["game_id", "team"], how="left")
df = df.merge(rush_epa, on=["game_id", "team"], how="left")
return df
# ---------------------------------------------------------------------------
# Schedule and Rest Features (Features 17-26)
# ---------------------------------------------------------------------------
def compute_schedule_features(schedules: pd.DataFrame) -> pd.DataFrame:
"""
Compute schedule-related features that affect game outcomes.
Rest advantage: Teams with more days of rest perform better on average.
Chapter 15 documents a roughly 1-point spread advantage per extra
rest day beyond the standard 7 days.
Bye week: Teams coming off bye weeks historically perform above
expectation, though this edge has diminished as markets have adjusted.
Travel: Long-distance travel, especially westward, has a measurable
fatigue effect (see Chapter 15, Section 15.3).
"""
df = schedules.copy()
# Rest differential
df["rest_differential"] = df["home_rest"] - df["away_rest"]
# Bye week flags (rest >= 13 days typically indicates bye)
df["home_off_bye"] = (df["home_rest"] >= 13).astype(int)
df["away_off_bye"] = (df["away_rest"] >= 13).astype(int)
# Short rest flags (Thursday games, rest <= 5)
df["home_short_rest"] = (df["home_rest"] <= 5).astype(int)
df["away_short_rest"] = (df["away_rest"] <= 5).astype(int)
# Division game flag (tighter, lower-scoring, more unpredictable)
# div_game is already in schedule data
# Time zone differential (proxy for travel fatigue)
timezone_map = {
"ARI": -7, "LAR": -8, "LAC": -8, "SF": -8, "SEA": -8,
"LV": -8, "DEN": -7, "DAL": -6, "HOU": -6, "KC": -6,
"MIN": -6, "CHI": -6, "GB": -6, "NO": -6, "TEN": -6,
"IND": -5, "JAX": -5, "ATL": -5, "CAR": -5, "CIN": -5,
"CLE": -5, "DET": -5, "MIA": -5, "TB": -5, "BAL": -5,
"BUF": -5, "NE": -5, "NYG": -5, "NYJ": -5, "PHI": -5,
"PIT": -5, "WAS": -5
}
df["tz_diff"] = (
df["away_team"].map(timezone_map).fillna(-6) -
df["home_team"].map(timezone_map).fillna(-6)
)
# Game timing features
    df["is_primetime"] = df["gametime"].apply(
        lambda x: 1 if pd.notna(x) and str(x) >= "20:00" else 0  # 8 PM ET or later (TNF/SNF/MNF)
    )
# Week of season (early vs. late)
df["season_phase"] = pd.cut(
df["week"],
bins=[0, 4, 9, 13, 18],
labels=["early", "mid_early", "mid_late", "late"]
)
return df
# ---------------------------------------------------------------------------
# Rating System Features (Features 27-34)
# ---------------------------------------------------------------------------
def compute_elo_ratings(schedules: pd.DataFrame,
k_factor: float = 20.0,
home_advantage: float = 48.0,
mean_reversion: float = 0.33) -> pd.DataFrame:
"""
Compute Elo ratings for all NFL teams using the methodology from
Chapter 26 (Ratings and Ranking Systems).
Parameters follow the FiveThirtyEight NFL Elo approach:
- k_factor: 20 (how quickly ratings update after each game)
- home_advantage: 48 Elo points (approximately 3 points on the spread)
- mean_reversion: 1/3 regression to mean (1505) between seasons
    The Elo difference, including home advantage, converts to a home spread
    prediction via: predicted_spread = -(home_elo - away_elo + home_advantage) / 25,
    where negative values mean the home team is favored.
"""
teams = set(schedules["home_team"]).union(set(schedules["away_team"]))
elo = {team: 1505.0 for team in teams}
elo_records = []
current_season = None
for _, game in schedules.sort_values(["season", "week"]).iterrows():
season = game["season"]
# Between-season mean reversion
if season != current_season:
if current_season is not None:
for team in elo:
elo[team] = elo[team] * (1 - mean_reversion) + 1505 * mean_reversion
current_season = season
home = game["home_team"]
away = game["away_team"]
# Store pre-game Elo
home_elo_pre = elo[home]
away_elo_pre = elo[away]
# Expected outcome
elo_diff = home_elo_pre + home_advantage - away_elo_pre
expected_home = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
# Actual outcome
        if pd.notna(game.get("home_score")) and pd.notna(game.get("away_score")):
            margin = game["home_score"] - game["away_score"]
            actual_home = 1.0 if margin > 0 else (0.5 if margin == 0 else 0.0)
            # Margin-of-victory multiplier (diminishing returns for blowouts).
            # The Elo gap is taken from the winner's perspective, per the
            # FiveThirtyEight formula, so favorites gain less for expected
            # wins and underdogs gain more for upsets.
            winner_elo_diff = elo_diff if margin >= 0 else -elo_diff
            mov_mult = np.log(abs(margin) + 1) * (2.2 / (winner_elo_diff * 0.001 + 2.2))
            # Update Elo (zero-sum transfer between the two teams)
            update = k_factor * mov_mult * (actual_home - expected_home)
            elo[home] += update
            elo[away] -= update
elo_records.append({
"game_id": game["game_id"],
"season": season,
"week": game["week"],
"home_team": home,
"away_team": away,
"home_elo": home_elo_pre,
"away_elo": away_elo_pre,
"elo_diff": home_elo_pre - away_elo_pre,
"elo_predicted_spread": -(home_elo_pre + home_advantage - away_elo_pre) / 25.0,
"elo_home_win_prob": expected_home,
})
return pd.DataFrame(elo_records)
# ---------------------------------------------------------------------------
# Injury Impact Features (Features 35-40)
# ---------------------------------------------------------------------------
def compute_injury_features(injuries: pd.DataFrame,
schedules: pd.DataFrame) -> pd.DataFrame:
"""
Quantify the impact of injuries on team strength.
Uses positional value weights derived from Chapter 15: QB injuries
have the largest impact on team performance, followed by edge
rushers and offensive tackles.
Positional weights (approximate points of spread impact):
QB: 5.0, EDGE/OLB: 1.2, OT: 1.0, WR1: 0.8, CB: 0.7,
RB: 0.3, all others: 0.2
"""
position_weights = {
"QB": 5.0, "DE": 1.2, "OLB": 1.2, "OT": 1.0, "T": 1.0,
"WR": 0.8, "CB": 0.7, "S": 0.5, "DT": 0.5, "LB": 0.4,
"ILB": 0.4, "G": 0.4, "C": 0.3, "RB": 0.3, "TE": 0.3,
"K": 0.2, "P": 0.1
}
    inj = injuries.copy()
    inj["pos_weight"] = inj["position"].map(position_weights).fillna(0.2)
    inj["weighted_impact"] = inj["miss_probability"] * inj["pos_weight"]
    # Flag likely QB absences up front; this avoids indexing back into the
    # frame from inside a groupby lambda.
    inj["qb_out"] = ((inj["position"] == "QB") &
                     (inj["miss_probability"] > 0.5)).astype(int)
    # Aggregate to team-week level
    team_injury = inj.groupby(["season", "week", "team"]).agg(
        injury_count=("full_name", "count"),
        injury_impact_total=("weighted_impact", "sum"),
        qb_injury_flag=("qb_out", "max"),
    ).reset_index()
return team_injury
# ---------------------------------------------------------------------------
# Weather Features (Features 41-45)
# ---------------------------------------------------------------------------
def compute_weather_features(schedules: pd.DataFrame) -> pd.DataFrame:
"""
Engineer weather features that affect scoring and strategy.
Cold, windy conditions reduce scoring and favor unders/running teams.
Indoor games normalize these effects (see Chapter 15, Section 15.3).
"""
df = schedules.copy()
df["is_dome"] = df["roof"].isin(["dome", "closed"]).astype(int)
df["temp_cold"] = ((df["temp"] < 35) & (df["is_dome"] == 0)).astype(int)
df["wind_high"] = ((df["wind"] > 15) & (df["is_dome"] == 0)).astype(int)
df["wind_extreme"] = ((df["wind"] > 25) & (df["is_dome"] == 0)).astype(int)
# Interaction: cold AND windy is worse than either alone
df["cold_windy"] = (df["temp_cold"] & df["wind_high"]).astype(int)
return df
# ---------------------------------------------------------------------------
# Market-Derived Features (Features 46-52)
# ---------------------------------------------------------------------------
def compute_market_features(odds: pd.DataFrame,
schedules: pd.DataFrame) -> pd.DataFrame:
"""
Derive features from the betting market itself.
The market is an information aggregator; features derived from
line movements and implied probabilities are highly predictive
(see Chapter 11, Understanding Betting Markets).
"""
df = schedules.merge(odds, on=["game_id", "season", "week",
"away_team", "home_team"], how="left")
# Implied probability from moneyline (Chapter 2)
def ml_to_prob(ml):
if pd.isna(ml):
return np.nan
if ml > 0:
return 100.0 / (ml + 100.0)
else:
return abs(ml) / (abs(ml) + 100.0)
df["home_implied_prob"] = df["home_ml"].apply(ml_to_prob)
df["away_implied_prob"] = df["away_ml"].apply(ml_to_prob)
# Remove vig using multiplicative method (Chapter 2, Section 2.3)
total_prob = df["home_implied_prob"] + df["away_implied_prob"]
df["home_fair_prob"] = df["home_implied_prob"] / total_prob
df["away_fair_prob"] = df["away_implied_prob"] / total_prob
# Spread as a feature (market's best estimate of margin)
# Already available as consensus_spread
# Total as a feature (market's estimate of combined scoring)
# Already available as consensus_total
return df
# ---------------------------------------------------------------------------
# Combine All Features
# ---------------------------------------------------------------------------
def build_feature_matrix(db_path: str = "data/nfl_betting.db") -> pd.DataFrame:
"""
Load all data, compute all features, and produce the final
feature matrix for modeling. One row per game, with separate
home and away feature columns.
"""
import sqlite3
conn = sqlite3.connect(db_path)
pbp = pd.read_sql("SELECT * FROM play_by_play", conn)
game_stats = pd.read_sql("SELECT * FROM game_stats", conn)
schedules = pd.read_sql("SELECT * FROM schedules", conn)
odds = pd.read_sql("SELECT * FROM odds_history", conn)
injuries = pd.read_sql("SELECT * FROM injuries", conn)
conn.close()
# Compute all feature groups
epa_features = compute_rolling_epa_features(game_stats)
epa_features = compute_pass_rush_splits(pbp, epa_features)
schedule_features = compute_schedule_features(schedules)
elo_features = compute_elo_ratings(schedules)
injury_features = compute_injury_features(injuries, schedules)
weather_features = compute_weather_features(schedules)
market_features = compute_market_features(odds, schedules)
# Merge into single game-level DataFrame
# (detailed merge logic depends on your schema; join on game_id)
features = schedule_features.merge(elo_features, on=[
"game_id", "season", "week", "home_team", "away_team"
], how="left")
features = features.merge(weather_features[[
"game_id", "is_dome", "temp_cold", "wind_high",
"wind_extreme", "cold_windy"
]], on="game_id", how="left")
features = features.merge(market_features[[
"game_id", "home_implied_prob", "away_implied_prob",
"home_fair_prob", "away_fair_prob"
]], on="game_id", how="left")
# Merge rolling EPA for home and away teams separately
for side, team_col in [("home", "home_team"), ("away", "away_team")]:
side_epa = epa_features.rename(
columns={c: f"{side}_{c}" for c in epa_features.columns
if c not in ["game_id", "season", "week", "team"]}
)
side_epa = side_epa.rename(columns={"team": team_col})
features = features.merge(
side_epa, on=["game_id", "season", "week", team_col], how="left"
)
# Merge injuries for home and away
for side, team_col in [("home", "home_team"), ("away", "away_team")]:
side_inj = injury_features.rename(
columns={c: f"{side}_{c}" for c in injury_features.columns
if c not in ["season", "week", "team"]}
)
side_inj = side_inj.rename(columns={"team": team_col})
features = features.merge(
side_inj, on=["season", "week", team_col], how="left"
)
# Differential features (home minus away)
for w in [3, 5, 8]:
features[f"epa_diff_r{w}"] = (
features.get(f"home_off_epa_per_play_r{w}", 0) -
features.get(f"away_off_epa_per_play_r{w}", 0)
)
features["injury_diff"] = (
features.get("away_injury_impact_total", 0) -
features.get("home_injury_impact_total", 0)
)
print(f"Final feature matrix: {features.shape[0]} games x {features.shape[1]} columns")
return features
2.2 Complete Feature List
The following table catalogs 52 candidate features, satisfying the 50-feature minimum. Group them by category and document each one.
| # | Feature Name | Category | Description | Chapter Reference |
|---|---|---|---|---|
| 1-3 | off_epa_per_play_r{3,5,8} | EPA Efficiency | Rolling offensive EPA per play | Ch 15, 28 |
| 4-6 | def_epa_per_play_r{3,5,8} | EPA Efficiency | Rolling defensive EPA per play | Ch 15, 28 |
| 7-9 | off_success_rate_r{3,5,8} | EPA Efficiency | Rolling offensive success rate | Ch 15 |
| 10-12 | def_success_rate_r{3,5,8} | EPA Efficiency | Rolling defensive success rate | Ch 15 |
| 13-15 | epa_margin_r{3,5,8} | EPA Efficiency | Offense EPA minus defense EPA | Ch 15 |
| 16 | pass_epa_per_play | EPA Splits | EPA per pass play | Ch 15 |
| 17 | rush_epa_per_play | EPA Splits | EPA per rush play | Ch 15 |
| 18 | rest_differential | Schedule | Home rest days minus away rest days | Ch 15 |
| 19 | home_off_bye | Schedule | Home team coming off bye week | Ch 15 |
| 20 | away_off_bye | Schedule | Away team coming off bye week | Ch 15 |
| 21 | home_short_rest | Schedule | Home team on short rest | Ch 15 |
| 22 | away_short_rest | Schedule | Away team on short rest | Ch 15 |
| 23 | div_game | Schedule | Divisional matchup flag | Ch 15 |
| 24 | tz_diff | Schedule | Time zone differential | Ch 15 |
| 25 | is_primetime | Schedule | Primetime game flag | Ch 15 |
| 26 | season_phase | Schedule | Phase of season (early/mid/late) | Ch 23 |
| 27 | home_elo | Ratings | Home team Elo rating | Ch 26 |
| 28 | away_elo | Ratings | Away team Elo rating | Ch 26 |
| 29 | elo_diff | Ratings | Elo difference (home minus away) | Ch 26 |
| 30 | elo_predicted_spread | Ratings | Elo-derived point spread | Ch 26 |
| 31 | elo_home_win_prob | Ratings | Elo-derived home win probability | Ch 26 |
| 32-34 | massey_off, massey_def, massey_overall | Ratings | Massey ratings (Chapter 26 exercise) | Ch 26 |
| 35 | injury_impact_total | Injuries | Weighted injury severity sum | Ch 15, 32 |
| 36 | qb_injury_flag | Injuries | Starting QB out or doubtful | Ch 15 |
| 37 | injury_count | Injuries | Number of players on injury report | Ch 15 |
| 38 | injury_diff | Injuries | Injury impact differential | Ch 15 |
| 39-40 | positional_injury_off, positional_injury_def | Injuries | Offensive vs defensive injury impact | Ch 15 |
| 41 | is_dome | Weather | Indoor game flag | Ch 15 |
| 42 | temp_cold | Weather | Temperature below 35F | Ch 15 |
| 43 | wind_high | Weather | Wind above 15 mph | Ch 15 |
| 44 | wind_extreme | Weather | Wind above 25 mph | Ch 15 |
| 45 | cold_windy | Weather | Cold and windy interaction | Ch 15 |
| 46 | home_implied_prob | Market | Market-implied home win probability | Ch 2, 11 |
| 47 | home_fair_prob | Market | Vig-removed implied probability | Ch 2 |
| 48 | consensus_spread | Market | Consensus point spread | Ch 11 |
| 49 | consensus_total | Market | Consensus over/under total | Ch 11 |
| 50 | spread_vs_elo | Hybrid | Market spread minus Elo spread | Ch 11, 26 |
| 51 | home_record_r5 | Trend | Home team W-L last 5 games | Ch 23 |
| 52 | away_record_r5 | Trend | Away team W-L last 5 games | Ch 23 |
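The rolling EPA features (rows 1-15) all follow the same shift-then-roll pattern to prevent look-ahead bias. A minimal sketch of that pattern, assuming a long-format per-team game table with `team`, `season`, `week`, and `off_epa_per_play` columns (column names illustrative):

```python
import pandas as pd

def add_rolling_epa(games: pd.DataFrame, windows=(3, 5, 8)) -> pd.DataFrame:
    """Per-team rolling offensive EPA; shift(1) guarantees each row
    sees only games played BEFORE it (no look-ahead)."""
    games = games.sort_values(["team", "season", "week"]).copy()
    for w in windows:
        games[f"off_epa_per_play_r{w}"] = (
            games.groupby("team")["off_epa_per_play"]
            .transform(lambda s: s.shift(1).rolling(w, min_periods=1).mean())
        )
    return games
```

The `shift(1)` before `rolling()` is the crucial step: it excludes the current game's own result from every window, so the first game of a team's history is always NaN.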
Phase 2 Checklist
- [ ] All 50+ features computed with shift(1) to prevent look-ahead bias
- [ ] Rolling windows use only past data (never current game)
- [ ] Elo ratings computed sequentially with proper between-season regression
- [ ] Injury features weighted by positional importance
- [ ] Weather features handle dome games correctly
- [ ] Feature correlation matrix examined; highly correlated features (r > 0.90) addressed
- [ ] Feature distributions visualized and outliers investigated
- [ ] Missing values documented with imputation strategy
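The sequential-Elo checklist item can be sketched as a per-game update plus an offseason regression step (Chapter 26). The constants here -- K = 20, a 55-point home-field bump, one-third regression toward 1500 -- are illustrative starting points, not fitted values:

```python
def update_elo(r_home: float, r_away: float, home_win: float,
               k: float = 20.0, hfa: float = 55.0):
    """One-game Elo update with a home-field bump.
    home_win is 1.0 (home win), 0.0 (loss), or 0.5 (tie)."""
    exp_home = 1.0 / (1.0 + 10 ** (-((r_home + hfa) - r_away) / 400.0))
    delta = k * (home_win - exp_home)
    return r_home + delta, r_away - delta   # zero-sum update

def regress_between_seasons(rating: float, mean: float = 1500.0,
                            weight: float = 1 / 3) -> float:
    """Offseason regression: pull each team part-way back to the mean."""
    return rating + weight * (mean - rating)
```

Apply `update_elo` game by game in chronological order, and `regress_between_seasons` to every team's rating before week 1 of each new season.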
Phase 3: Model Building
Duration: Weeks 3--4
Relevant chapters: Chapter 9 (Regression), Chapter 27 (Advanced Regression/Classification), Chapter 29 (Neural Networks)
You must build at least three distinct model families. Each model targets one or more of: (a) point spread prediction, (b) game total prediction, (c) moneyline (win probability) prediction.
3.1 Model Specifications
Model A: Ridge/Lasso Regression (Chapter 9) - Target: home_margin (continuous) for spread prediction - Regularization selected via cross-validation - Serves as the interpretable baseline
Model B: XGBoost Gradient Boosting (Chapter 27) - Target: home_margin (regression) and home_win (classification) - Hyperparameter tuning via Optuna (Chapter 29, Section 29.5) - Feature importance analysis using SHAP (Chapter 27, Section 27.5)
Model C: Neural Network (Chapter 29) - Architecture: feed-forward with 2--3 hidden layers, dropout regularization - Target: home_margin (regression head) and home_win_probability (classification head) - Multi-task learning configuration
Model D (bonus): Ensemble/Stacking - Combine predictions from Models A, B, and C using a meta-learner - Stacking follows Chapter 27, Section 27.2
"""
phase3_model_building.py
Train spread, totals, and moneyline prediction models.
"""
import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import xgboost as xgb
import torch
import torch.nn as nn
from typing import Dict, Tuple, List
import optuna
import shap
# ---------------------------------------------------------------------------
# Model A: Ridge Regression Baseline
# ---------------------------------------------------------------------------
class SpreadModelRidge:
"""
Linear regression with L2 regularization for spread prediction.
This is the interpretable baseline model (Chapter 9).
"""
def __init__(self, alphas: list = None):
if alphas is None:
alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
self.pipeline = Pipeline([
("scaler", StandardScaler()),
("ridge", RidgeCV(alphas=alphas, cv=5))
])
self.feature_names = None
def fit(self, X: pd.DataFrame, y: pd.Series):
self.feature_names = list(X.columns)
self.pipeline.fit(X, y)
return self
def predict(self, X: pd.DataFrame) -> np.ndarray:
return self.pipeline.predict(X)
    def get_coefficients(self) -> pd.DataFrame:
        ridge = self.pipeline.named_steps["ridge"]
        scaler = self.pipeline.named_steps["scaler"]
        # ridge.coef_ is on the standardized scale; divide by scaler.scale_
        # to express each coefficient per original feature unit.
        coefs = ridge.coef_ / scaler.scale_
        return pd.DataFrame({
            "feature": self.feature_names,
            "coefficient": coefs,
            "abs_coefficient": np.abs(coefs)
        }).sort_values("abs_coefficient", ascending=False)
# ---------------------------------------------------------------------------
# Model B: XGBoost
# ---------------------------------------------------------------------------
class SpreadModelXGB:
"""
XGBoost gradient boosting model for spread prediction (Chapter 27).
Supports both regression (spread prediction) and classification
(win probability).
"""
def __init__(self, task: str = "regression", params: dict = None):
self.task = task
default_params = {
"n_estimators": 500,
"max_depth": 6,
"learning_rate": 0.05,
"subsample": 0.8,
"colsample_bytree": 0.8,
"min_child_weight": 5,
"reg_alpha": 0.1,
"reg_lambda": 1.0,
"random_state": 42,
}
if task == "regression":
default_params["objective"] = "reg:squarederror"
default_params["eval_metric"] = "rmse"
else:
default_params["objective"] = "binary:logistic"
default_params["eval_metric"] = "logloss"
if params:
default_params.update(params)
if task == "regression":
self.model = xgb.XGBRegressor(**default_params)
else:
self.model = xgb.XGBClassifier(**default_params)
    def fit(self, X: pd.DataFrame, y: pd.Series,
            eval_set: list = None):
        # In xgboost >= 2.0, early_stopping_rounds is an estimator
        # parameter rather than a fit() argument, so set it via
        # set_params before fitting with an eval_set.
        if eval_set:
            self.model.set_params(early_stopping_rounds=50)
            self.model.fit(X, y, eval_set=eval_set, verbose=False)
        else:
            self.model.fit(X, y, verbose=False)
        return self
def predict(self, X: pd.DataFrame) -> np.ndarray:
return self.model.predict(X)
def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
if self.task == "classification":
return self.model.predict_proba(X)[:, 1]
raise ValueError("predict_proba only available for classification")
def explain(self, X: pd.DataFrame) -> shap.Explanation:
explainer = shap.TreeExplainer(self.model)
return explainer(X)
@staticmethod
def tune_hyperparameters(X_train, y_train, X_val, y_val,
n_trials: int = 100) -> dict:
"""
Bayesian hyperparameter optimization using Optuna
(Chapter 29, Section 29.5).
"""
def objective(trial):
params = {
"n_estimators": trial.suggest_int("n_estimators", 100, 1000),
"max_depth": trial.suggest_int("max_depth", 3, 10),
"learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
"subsample": trial.suggest_float("subsample", 0.6, 1.0),
"colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
"min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
"reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
"reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
}
            model = xgb.XGBRegressor(**params, random_state=42,
                                     objective="reg:squarederror",
                                     early_stopping_rounds=50)
            model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
                      verbose=False)
preds = model.predict(X_val)
rmse = np.sqrt(np.mean((preds - y_val) ** 2))
return rmse
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=n_trials)
return study.best_params
# ---------------------------------------------------------------------------
# Model C: Neural Network (PyTorch)
# ---------------------------------------------------------------------------
class NFLNet(nn.Module):
"""
Multi-task neural network for NFL prediction (Chapter 29).
Predicts both point spread (regression) and win probability
(classification) simultaneously.
"""
def __init__(self, input_dim: int, hidden_dims: list = None,
dropout: float = 0.3):
super().__init__()
if hidden_dims is None:
hidden_dims = [128, 64, 32]
layers = []
prev_dim = input_dim
for dim in hidden_dims:
layers.extend([
nn.Linear(prev_dim, dim),
nn.BatchNorm1d(dim),
nn.ReLU(),
nn.Dropout(dropout),
])
prev_dim = dim
self.shared = nn.Sequential(*layers)
self.spread_head = nn.Linear(prev_dim, 1) # Regression
self.win_head = nn.Sequential(
nn.Linear(prev_dim, 1),
nn.Sigmoid()
)
def forward(self, x):
shared = self.shared(x)
spread = self.spread_head(shared).squeeze(-1)
win_prob = self.win_head(shared).squeeze(-1)
return spread, win_prob
class NFLNetTrainer:
"""Training harness for the multi-task neural network."""
def __init__(self, input_dim: int, lr: float = 0.001,
weight_decay: float = 1e-4, spread_weight: float = 0.7,
win_weight: float = 0.3):
self.model = NFLNet(input_dim)
self.optimizer = torch.optim.Adam(
self.model.parameters(), lr=lr, weight_decay=weight_decay
)
self.spread_loss_fn = nn.MSELoss()
self.win_loss_fn = nn.BCELoss()
self.spread_weight = spread_weight
self.win_weight = win_weight
self.scaler = StandardScaler()
def fit(self, X_train: np.ndarray, y_spread: np.ndarray,
y_win: np.ndarray, epochs: int = 200, batch_size: int = 64,
X_val: np.ndarray = None, y_spread_val: np.ndarray = None,
patience: int = 20):
X_scaled = self.scaler.fit_transform(X_train)
X_t = torch.FloatTensor(X_scaled)
y_s_t = torch.FloatTensor(y_spread)
y_w_t = torch.FloatTensor(y_win)
        best_val_loss = float("inf")
        patience_counter = 0
        best_state = None
        self.model.train()
        for epoch in range(epochs):
            indices = torch.randperm(len(X_t))
            epoch_loss = 0.0
            n_batches = 0
            for i in range(0, len(X_t), batch_size):
                batch_idx = indices[i:i+batch_size]
                X_batch = X_t[batch_idx]
                y_s_batch = y_s_t[batch_idx]
                y_w_batch = y_w_t[batch_idx]
                self.optimizer.zero_grad()
                pred_spread, pred_win = self.model(X_batch)
                loss_spread = self.spread_loss_fn(pred_spread, y_s_batch)
                loss_win = self.win_loss_fn(pred_win, y_w_batch)
                loss = self.spread_weight * loss_spread + self.win_weight * loss_win
                loss.backward()
                self.optimizer.step()
                epoch_loss += loss.item()
                n_batches += 1
            # Early stopping on validation loss; checkpoint the best weights
            if X_val is not None:
                val_loss = self._validate(X_val, y_spread_val)
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    patience_counter = 0
                    best_state = {k: v.clone()
                                  for k, v in self.model.state_dict().items()}
                else:
                    patience_counter += 1
                    if patience_counter >= patience:
                        print(f"Early stopping at epoch {epoch}")
                        break
        # Restore the best checkpoint rather than the last (possibly worse) one
        if best_state is not None:
            self.model.load_state_dict(best_state)
def _validate(self, X_val, y_spread_val):
self.model.eval()
X_scaled = self.scaler.transform(X_val)
X_t = torch.FloatTensor(X_scaled)
with torch.no_grad():
pred_spread, _ = self.model(X_t)
self.model.train()
return float(nn.MSELoss()(pred_spread, torch.FloatTensor(y_spread_val)))
def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
self.model.eval()
X_scaled = self.scaler.transform(X)
X_t = torch.FloatTensor(X_scaled)
with torch.no_grad():
spread, win_prob = self.model(X_t)
return spread.numpy(), win_prob.numpy()
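Model D (stacking) has no reference code above. A minimal sketch of the Chapter 27, Section 27.2 idea using a least-squares meta-learner, with the essential caveat that it must be fit on out-of-fold base-model predictions -- fitting the meta-learner on in-sample predictions leaks. Function names are illustrative:

```python
import numpy as np

def fit_stack_weights(oof_preds: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit a linear meta-learner on OUT-OF-FOLD base predictions.
    oof_preds has shape (n_games, n_models)."""
    X = np.column_stack([oof_preds, np.ones(len(y))])  # append intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # per-model weights followed by the intercept

def stack_predict(preds: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Blend new base-model predictions using the learned weights."""
    X = np.column_stack([preds, np.ones(len(preds))])
    return X @ coef
```

A regularized meta-learner (e.g., non-negative or ridge-penalized weights) is often safer when base models are highly correlated; plain least squares keeps the sketch short.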
Phase 3 Checklist
- [ ] Ridge regression trained and coefficient table produced
- [ ] XGBoost trained with early stopping; hyperparameters tuned via Optuna
- [ ] Neural network trained with multi-task loss and early stopping
- [ ] SHAP analysis completed for XGBoost model
- [ ] All models produce predictions for: spread, totals, and win probability
- [ ] No data leakage: all features use only past information relative to each game
Phase 4: Backtesting
Duration: Weeks 4--5
Relevant chapters: Chapter 30 (Model Evaluation), Chapter 24 (Monte Carlo Simulation), Chapter 8 (Hypothesis Testing)
4.1 Walk-Forward Validation Protocol
You must use walk-forward (expanding window) validation as described in Chapter 30, Section 30.3. This is the only evaluation methodology that realistically simulates how you would use the model in practice.
"""
phase4_backtesting.py
Walk-forward validation and performance evaluation.
"""
import pandas as pd
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error
from typing import List, Dict, Tuple
import warnings
# ---------------------------------------------------------------------------
# Walk-Forward Validation
# ---------------------------------------------------------------------------
def walk_forward_backtest(features: pd.DataFrame,
target_col: str,
model_class,
model_params: dict,
feature_cols: List[str],
train_start_season: int,
test_start_season: int,
test_end_season: int,
retrain_frequency: str = "season") -> pd.DataFrame:
"""
Walk-forward validation for NFL betting models (Chapter 30).
Protocol:
1. Train on all data from train_start_season through season N-1.
2. Predict all games in season N.
3. Advance to season N+1, retrain on all data through season N.
4. Repeat until test_end_season.
    The retrain_frequency parameter is a hook for finer retraining
    schedules; this implementation retrains once per season ("season"),
    the standard approach for the NFL's 272-game regular season.
This function returns a DataFrame of out-of-sample predictions
alongside actual outcomes, suitable for evaluation.
"""
all_predictions = []
for test_season in range(test_start_season, test_end_season + 1):
# Training data: all completed seasons before the test season
train_mask = features["season"] < test_season
test_mask = features["season"] == test_season
X_train = features.loc[train_mask, feature_cols].copy()
y_train = features.loc[train_mask, target_col].copy()
X_test = features.loc[test_mask, feature_cols].copy()
y_test = features.loc[test_mask, target_col].copy()
# Drop rows with missing features
valid_train = X_train.dropna().index
X_train = X_train.loc[valid_train]
y_train = y_train.loc[valid_train]
valid_test = X_test.dropna().index
X_test = X_test.loc[valid_test]
y_test = y_test.loc[valid_test]
if len(X_train) == 0 or len(X_test) == 0:
continue
# Train model
model = model_class(**model_params)
model.fit(X_train, y_train)
# Predict
preds = model.predict(X_test)
# Store predictions
test_df = features.loc[valid_test, [
"game_id", "season", "week", "home_team", "away_team",
target_col, "consensus_spread", "consensus_total"
]].copy()
test_df["prediction"] = preds
test_df["residual"] = test_df[target_col] - preds
all_predictions.append(test_df)
print(f"Season {test_season}: trained on {len(X_train)} games, "
f"tested on {len(X_test)} games, "
f"RMSE = {np.sqrt(mean_squared_error(y_test, preds)):.3f}")
return pd.concat(all_predictions, ignore_index=True)
# ---------------------------------------------------------------------------
# Evaluation Metrics
# ---------------------------------------------------------------------------
def evaluate_spread_model(predictions: pd.DataFrame) -> Dict[str, float]:
"""
Comprehensive evaluation of a spread prediction model.
Metrics (from Chapter 30):
- RMSE: root mean squared error of margin prediction
- MAE: mean absolute error
- ATS record: win rate picking against the spread
- Brier score: for ATS picks as binary predictions
- Correlation: predicted vs actual margin
"""
df = predictions.copy()
# Point spread accuracy
rmse = np.sqrt(mean_squared_error(df["home_margin"], df["prediction"]))
mae = np.mean(np.abs(df["home_margin"] - df["prediction"]))
    # Against-the-spread analysis.
    # Convention: consensus_spread is the home line (negative when the
    # home team is favored), so the home side covers when
    # home_margin + consensus_spread > 0.
    df["model_ats_pick"] = np.where(
        df["prediction"] + df["consensus_spread"] > 0, "home", "away"
    )
    df["actual_ats"] = np.where(
        df["home_margin"] + df["consensus_spread"] > 0, "home", "away"
    )
# Exclude pushes
non_push = df["home_margin"] + df["consensus_spread"] != 0
ats_correct = (df.loc[non_push, "model_ats_pick"] ==
df.loc[non_push, "actual_ats"])
ats_record = ats_correct.mean()
# Brier score for ATS (convert to probability-like confidence)
# Using logistic transformation of predicted edge
df["ats_edge"] = df["prediction"] + df["consensus_spread"]
df["ats_probability"] = 1.0 / (1.0 + np.exp(-df["ats_edge"] / 6.0))
df["ats_actual"] = (df["home_margin"] + df["consensus_spread"] > 0).astype(float)
brier = brier_score_loss(
df.loc[non_push, "ats_actual"],
df.loc[non_push, "ats_probability"]
)
# Correlation
corr = df["home_margin"].corr(df["prediction"])
return {
"rmse": rmse,
"mae": mae,
"ats_record": ats_record,
"ats_games": non_push.sum(),
"brier_score": brier,
"correlation": corr
}
def profit_simulation(predictions: pd.DataFrame,
unit_size: float = 100.0,
min_edge: float = 1.0,
vig: float = -110) -> Dict[str, float]:
"""
Simulate flat-bet profitability of the model's ATS picks.
Only bets where the model disagrees with the market by at least
min_edge points. Assumes standard -110 vig on all bets.
Returns profit metrics from Chapter 3 and Chapter 37.
"""
df = predictions.copy()
# Model edge = model predicted margin - market spread (negative = home favored more)
df["model_edge"] = df["prediction"] + df["consensus_spread"]
df["bet_side"] = np.where(df["model_edge"] > 0, "home", "away")
df["abs_edge"] = np.abs(df["model_edge"])
# Filter to bets meeting minimum edge threshold
bets = df[df["abs_edge"] >= min_edge].copy()
if len(bets) == 0:
return {"total_bets": 0, "profit": 0, "roi": 0}
# Determine win/loss
bets["actual_cover"] = np.where(
bets["bet_side"] == "home",
bets["home_margin"] + bets["consensus_spread"] > 0,
bets["home_margin"] + bets["consensus_spread"] < 0
)
bets["push"] = (bets["home_margin"] + bets["consensus_spread"]) == 0
# Profit calculation at -110
risk_per_bet = unit_size
win_per_bet = unit_size * (100.0 / abs(vig))
bets["profit"] = np.where(
bets["push"], 0.0,
np.where(bets["actual_cover"], win_per_bet, -risk_per_bet)
)
total_profit = bets["profit"].sum()
total_risked = len(bets) * risk_per_bet
roi = total_profit / total_risked * 100
# Calculate max drawdown
cumulative = bets["profit"].cumsum()
running_max = cumulative.cummax()
drawdown = cumulative - running_max
max_drawdown = drawdown.min()
return {
"total_bets": len(bets),
"wins": int(bets["actual_cover"].sum()),
"losses": int((~bets["actual_cover"] & ~bets["push"]).sum()),
"pushes": int(bets["push"].sum()),
"win_rate": bets["actual_cover"].mean(),
"total_profit": total_profit,
"total_risked": total_risked,
"roi_pct": roi,
"max_drawdown": max_drawdown,
"avg_edge": bets["abs_edge"].mean(),
}
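Chapter 24 is listed as relevant but not exercised in the code above. One natural use is to bootstrap-resample the realized per-bet profits from `profit_simulation` and put an uncertainty band around total season profit; a sketch (the function name and percentile choices are illustrative, not part of the required API):

```python
import numpy as np

def monte_carlo_profit(bet_profits, n_sims: int = 10_000, seed: int = 0):
    """Bootstrap-resample realized per-bet profits to estimate the
    distribution of total profit over a season of the same length
    (Chapter 24). Returns the 5th/50th/95th percentiles."""
    profits = np.asarray(bet_profits, dtype=float)
    rng = np.random.default_rng(seed)
    sims = rng.choice(profits, size=(n_sims, len(profits)),
                      replace=True).sum(axis=1)
    return {"p05": float(np.percentile(sims, 5)),
            "p50": float(np.percentile(sims, 50)),
            "p95": float(np.percentile(sims, 95))}
```

If the 5th percentile of season profit is deeply negative, a nominally profitable backtest may still be indistinguishable from variance.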
4.2 Required Evaluation Outputs
- Model comparison table showing RMSE, MAE, ATS record, Brier score, and correlation for each model across each test season.
- Calibration plot (Chapter 30, Section 30.4) showing predicted probability vs actual outcome frequency, with at least 10 bins.
- Profit simulation curve showing cumulative profit over time at three edge thresholds (0.5, 1.0, 2.0 points).
- SHAP summary plot for the XGBoost model showing the top 20 most important features.
- Season-by-season breakdown to identify whether performance is consistent or driven by one outlier season.
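The table underlying the required calibration plot can be computed as follows; a sketch assuming you pass vectors of predicted win probabilities and 0/1 outcomes, with 10 equal-width bins per Section 30.4:

```python
import numpy as np
import pandas as pd

def calibration_table(probs, outcomes, n_bins: int = 10) -> pd.DataFrame:
    """Bin predicted probabilities into equal-width bins and compare the
    mean prediction in each bin to the empirical outcome frequency."""
    df = pd.DataFrame({"prob": probs, "outcome": outcomes})
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    df["bin"] = pd.cut(df["prob"], bins=edges, include_lowest=True)
    return (df.groupby("bin", observed=True)
              .agg(mean_pred=("prob", "mean"),
                   emp_freq=("outcome", "mean"),
                   n=("outcome", "size"))
              .reset_index())
```

Plotting `emp_freq` against `mean_pred` (with the 45-degree line for reference) produces the calibration plot; a well-calibrated model's points hug the diagonal.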
Phase 4 Checklist
- [ ] Walk-forward backtest completed across 3+ out-of-sample seasons
- [ ] No look-ahead bias in any feature or training step
- [ ] All three model families evaluated on identical test sets
- [ ] Brier score computed and compared to a naive baseline (market implied probability)
- [ ] Profit simulation run with realistic -110 vig assumption
- [ ] Calibration plots generated for all models
- [ ] Results are statistically tested: is ATS record significantly above 52.4%? (Chapter 8)
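The last checklist item can be sketched as a one-sided test against the 52.4% break-even rate at -110 odds. This uses the normal approximation for brevity; prefer an exact binomial test for small samples:

```python
from math import erf, sqrt

def ats_significance(wins: int, n: int, p0: float = 0.524) -> dict:
    """One-sided z-test of H0: true ATS win rate <= p0 (Chapter 8).
    p0 = 0.524 is the break-even rate at standard -110 pricing."""
    p_hat = wins / n
    se = sqrt(p0 * (1.0 - p0) / n)              # std error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * (1.0 - erf(z / sqrt(2.0)))  # upper-tail normal p-value
    return {"win_rate": p_hat, "z": z, "p_value": p_value}
```

Note how many games it takes: even a genuine 55% handicapper needs several hundred bets before the result is statistically distinguishable from break-even.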
Phase 5: Betting Strategy
Duration: Week 6
Relevant chapters: Chapter 4 (Bankroll Management), Chapter 12 (Line Shopping), Chapter 13 (Value Betting), Chapter 14 (Advanced Bankroll), Chapter 25 (Optimization)
5.1 Kelly-Based Bet Sizing
Implement fractional Kelly sizing as described in Chapter 4 and fully derived in Chapter 14. You must use fractional Kelly (quarter or half Kelly) to account for estimation uncertainty.
"""
phase5_betting_strategy.py
Kelly sizing, line shopping, and bet recommendation engine.
"""
import numpy as np
import pandas as pd
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
@dataclass
class BetRecommendation:
game_id: str
season: int
week: int
home_team: str
away_team: str
bet_side: str # "home_spread", "away_spread", "over", "under"
model_edge: float # in points (spread) or probability (moneyline)
best_odds: int # American odds at best available book
best_book: str # Which sportsbook has the best line
kelly_fraction: float # Recommended bet size as fraction of bankroll
bet_amount: float # Dollar amount at current bankroll
expected_value: float # Expected profit of the bet
def fractional_kelly(win_prob: float, american_odds: int,
kelly_fraction: float = 0.25) -> float:
"""
Calculate fractional Kelly bet size (Chapter 4, Chapter 14).
Full Kelly maximizes the geometric growth rate of bankroll but
requires perfectly known probabilities. Since our probabilities
are estimated (with error), we use fractional Kelly:
Full Kelly: f* = (bp - q) / b
where b = decimal payout ratio, p = win probability, q = 1 - p
Fractional Kelly: f = kelly_fraction * f*
Quarter Kelly (0.25) is recommended for sports betting because:
1. It reduces the impact of probability estimation errors
2. It reduces variance by ~75% while sacrificing only ~25% of growth rate
3. It makes drawdowns more psychologically manageable (Chapter 36)
"""
if american_odds > 0:
decimal_odds = 1.0 + american_odds / 100.0
else:
decimal_odds = 1.0 + 100.0 / abs(american_odds)
b = decimal_odds - 1.0 # Net payout per unit risked
p = win_prob
q = 1.0 - p
full_kelly = (b * p - q) / b
if full_kelly <= 0:
return 0.0
return kelly_fraction * full_kelly
def compare_lines_across_books(game_id: str,
odds_by_book: Dict[str, Dict]) -> Dict:
"""
Find the best available line across sportsbooks (Chapter 12).
odds_by_book format:
{
"DraftKings": {"home_spread": -3.0, "home_spread_odds": -110, ...},
"FanDuel": {"home_spread": -2.5, "home_spread_odds": -115, ...},
...
}
Returns the best line for each bet type and the book offering it.
"""
    best_lines = {}
    for bet_type in ["home_spread", "away_spread", "over", "under",
                     "home_ml", "away_ml"]:
        best_odds = -999      # sentinel lower than any real American odds
        best_book = None
        best_line = None
        for book, lines in odds_by_book.items():
            odds_key = f"{bet_type}_odds"
            if odds_key in lines and lines[odds_key] is not None:
                # Note: this compares prices only. When the posted points
                # differ across books (e.g., -2.5 vs -3.0), compare the
                # lines themselves before the prices.
                if lines[odds_key] > best_odds:
                    best_odds = lines[odds_key]
                    best_book = book
                    best_line = lines.get(bet_type)
        best_lines[bet_type] = {
            "line": best_line,
            "odds": best_odds,
            "book": best_book
        }
    return best_lines
def generate_weekly_recommendations(
predictions: pd.DataFrame,
bankroll: float,
odds_by_game: Dict[str, Dict[str, Dict]],
min_edge_spread: float = 1.0,
min_edge_total: float = 1.5,
min_ev_pct: float = 0.02,
kelly_fraction: float = 0.25,
max_bet_pct: float = 0.05,
max_weekly_exposure: float = 0.20
) -> List[BetRecommendation]:
"""
Generate bet recommendations for a given week.
Constraints (from Chapter 4 and Chapter 14):
- min_edge_spread: minimum model edge in points to trigger a spread bet
- min_edge_total: minimum model edge in points to trigger a total bet
- min_ev_pct: minimum expected value as a percentage of bet amount
- kelly_fraction: fraction of full Kelly to bet (default: quarter Kelly)
- max_bet_pct: maximum single bet as fraction of bankroll (hard cap)
- max_weekly_exposure: maximum total weekly exposure as fraction of bankroll
"""
recommendations = []
total_exposure = 0.0
for _, game in predictions.iterrows():
game_id = game["game_id"]
if game_id not in odds_by_game:
continue
best_lines = compare_lines_across_books(game_id, odds_by_game[game_id])
# Spread bet evaluation
model_edge = game["prediction"] + game["consensus_spread"]
if abs(model_edge) >= min_edge_spread:
bet_side = "home_spread" if model_edge > 0 else "away_spread"
line_info = best_lines[bet_side]
if line_info["odds"] > -999:
# Convert point edge to win probability
# Using logistic function calibrated to historical data
win_prob = 1.0 / (1.0 + np.exp(-abs(model_edge) / 5.5))
                # Net decimal payout per unit risked at the best price
                b = (100 / abs(line_info["odds"])
                     if line_info["odds"] < 0
                     else line_info["odds"] / 100)
                ev_pct = win_prob * b - (1 - win_prob)
if ev_pct >= min_ev_pct:
kelly = fractional_kelly(win_prob, line_info["odds"],
kelly_fraction)
bet_pct = min(kelly, max_bet_pct)
bet_amount = round(bankroll * bet_pct, 2)
if total_exposure + bet_pct <= max_weekly_exposure and bet_amount > 0:
recommendations.append(BetRecommendation(
game_id=game_id,
season=game["season"],
week=game["week"],
home_team=game["home_team"],
away_team=game["away_team"],
bet_side=bet_side,
model_edge=abs(model_edge),
best_odds=line_info["odds"],
best_book=line_info["book"],
kelly_fraction=bet_pct,
bet_amount=bet_amount,
expected_value=ev_pct * bet_amount
))
total_exposure += bet_pct
# Sort by expected value, highest first
recommendations.sort(key=lambda r: r.expected_value, reverse=True)
return recommendations
Phase 5 Checklist
- [ ] Fractional Kelly sizing implemented with quarter-Kelly default
- [ ] Line shopping comparison across 3+ sportsbooks
- [ ] Minimum edge thresholds calibrated against backtest results
- [ ] Maximum bet size capped at 5% of bankroll
- [ ] Maximum weekly exposure capped at 20% of bankroll
- [ ] Expected value calculated for every recommended bet
- [ ] Bet recommendations sorted by EV for portfolio construction
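Several steps above (fair probabilities, EV per bet, line shopping) depend on converting American odds to probabilities and removing the vig. A minimal Chapter 2-style sketch for a two-way market:

```python
def implied_prob(american: int) -> float:
    """Raw implied probability of American odds (still contains the vig)."""
    if american > 0:
        return 100.0 / (american + 100.0)
    return -american / (-american + 100.0)

def no_vig_probs(odds_a: int, odds_b: int):
    """Normalize both sides of a two-way market so they sum to 1."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    overround = pa + pb   # > 1; the excess is the book's margin
    return pa / overround, pb / overround
```

For a standard -110/-110 market, each raw implied probability is 110/210 ≈ 52.4% (the break-even rate tested in Phase 4), and vig removal returns 50/50.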
Phase 6: Production Pipeline
Duration: Week 7
Relevant chapters: Chapter 31 (ML Betting Pipeline), Chapter 37 (Discipline and Systems)
6.1 Weekly Automation Pipeline
Build a pipeline that runs every Tuesday (once new data from the prior week is available) and produces bet recommendations for the upcoming NFL week.
"""
phase6_production_pipeline.py
Weekly automated NFL betting pipeline.
"""
import schedule
import time
import logging
from datetime import datetime
from pathlib import Path
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s",
handlers=[
logging.FileHandler("pipeline.log"),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
class NFLBettingPipeline:
"""
End-to-end weekly pipeline (Chapter 31).
Workflow executed every Tuesday:
1. Update data: pull latest play-by-play, injury reports, and odds
2. Recompute features: rebuild feature matrix with latest game results
3. Update model: retrain if configured (monthly or per-season)
4. Generate predictions: run model on upcoming week's games
5. Compare lines: fetch current odds across sportsbooks
6. Generate recommendations: produce bet recommendations with Kelly sizing
7. Log and report: store results and send notification
"""
def __init__(self, config_path: str = "config.yaml"):
self.config = self._load_config(config_path)
self.bankroll = self.config.get("bankroll", 10000.0)
self.current_season = self.config.get("current_season", 2024)
self.current_week = self.config.get("current_week", 1)
def _load_config(self, path: str) -> dict:
import yaml
with open(path) as f:
return yaml.safe_load(f)
def run_weekly_pipeline(self):
"""Execute the complete weekly pipeline."""
logger.info("=" * 60)
logger.info(f"NFL BETTING PIPELINE - Season {self.current_season}, "
f"Week {self.current_week}")
logger.info("=" * 60)
try:
# Step 1: Update data
logger.info("Step 1: Updating data...")
self._update_data()
# Step 2: Recompute features
logger.info("Step 2: Computing features...")
features = self._compute_features()
# Step 3: Update model (if scheduled)
logger.info("Step 3: Checking model freshness...")
model = self._update_model_if_needed(features)
# Step 4: Generate predictions
logger.info("Step 4: Generating predictions...")
predictions = self._generate_predictions(model, features)
# Step 5: Compare lines
logger.info("Step 5: Fetching current odds...")
odds = self._fetch_current_odds()
# Step 6: Generate recommendations
logger.info("Step 6: Generating bet recommendations...")
recommendations = self._generate_recommendations(predictions, odds)
# Step 7: Log and report
logger.info("Step 7: Logging results and sending report...")
self._log_results(recommendations)
self._send_report(recommendations)
logger.info(f"Pipeline complete. {len(recommendations)} "
f"recommendations generated.")
except Exception as e:
logger.error(f"Pipeline failed: {e}", exc_info=True)
self._send_alert(f"Pipeline failure: {e}")
def _update_data(self):
"""Pull latest play-by-play and injury data."""
# Re-run data collection for current season
# (Implementation reuses Phase 1 code)
pass
def _compute_features(self):
"""Rebuild feature matrix with latest data."""
# (Implementation reuses Phase 2 code)
pass
def _update_model_if_needed(self, features):
"""Retrain model if the retrain schedule requires it."""
# Check last retrain date; retrain if more than N weeks ago
# or if it is the start of a new season
pass
def _generate_predictions(self, model, features):
"""Run the ensemble model on upcoming games."""
pass
def _fetch_current_odds(self):
"""Fetch live odds from multiple sportsbooks via API."""
pass
def _generate_recommendations(self, predictions, odds):
"""Produce bet recommendations using Phase 5 strategy code."""
pass
def _log_results(self, recommendations):
"""Store recommendations to database for tracking (Chapter 37)."""
pass
def _send_report(self, recommendations):
"""Send weekly report via email or Slack webhook."""
pass
def _send_alert(self, message):
"""Send error alert for pipeline failures."""
pass
# Schedule the pipeline to run every Tuesday at 10:00 AM
if __name__ == "__main__":
pipeline = NFLBettingPipeline()
schedule.every().tuesday.at("10:00").do(pipeline.run_weekly_pipeline)
logger.info("Pipeline scheduler started. Waiting for next run...")
while True:
schedule.run_pending()
time.sleep(60)
Phase 6 Checklist
- [ ] Pipeline runs end-to-end from data update through recommendation
- [ ] Error handling catches and reports failures at each step
- [ ] Results logged to database for performance tracking (Chapter 37)
- [ ] Notifications sent via email, Slack, or similar
- [ ] Pipeline can be run manually for testing
- [ ] Configuration file externalizes parameters (bankroll, thresholds, API keys)
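The configuration checklist item can be sketched as follows. The keys mirror parameters that appear in Phases 5 and 6 (`bankroll`, `current_season`, `current_week`, `kelly_fraction`, `max_bet_pct`, `max_weekly_exposure`, `min_edge_spread`); the validation rules are illustrative:

```python
import yaml

EXAMPLE_CONFIG = """
bankroll: 10000.0
current_season: 2024
current_week: 1
kelly_fraction: 0.25
max_bet_pct: 0.05
max_weekly_exposure: 0.20
min_edge_spread: 1.0
"""

def load_and_validate_config(text: str) -> dict:
    """Parse the YAML config and fail fast on unsafe parameter values."""
    cfg = yaml.safe_load(text)
    if not 0.0 < cfg["kelly_fraction"] <= 1.0:
        raise ValueError("kelly_fraction must be in (0, 1]")
    if cfg["max_bet_pct"] > cfg["max_weekly_exposure"]:
        raise ValueError("single-bet cap cannot exceed weekly exposure cap")
    return cfg
```

Failing fast on a misconfigured risk parameter is cheap insurance: a typo that turns quarter Kelly into full Kelly should stop the pipeline, not size the week's bets.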
Project Structure
nfl-betting-system/
|-- README.md
|-- requirements.txt
|-- config.yaml
|-- pipeline.log
|
|-- src/
| |-- __init__.py
| |-- data/
| | |-- __init__.py
| | |-- collect_pbp.py # Phase 1: play-by-play collection
| | |-- collect_odds.py # Phase 1: odds collection
| | |-- collect_injuries.py # Phase 1: injury data
| | |-- collect_weather.py # Phase 1: weather data
| | |-- database.py # SQLite read/write utilities
| |
| |-- features/
| | |-- __init__.py
| | |-- epa_features.py # Phase 2: EPA-based features
| | |-- schedule_features.py # Phase 2: schedule/rest features
| | |-- elo_ratings.py # Phase 2: Elo rating system
| | |-- injury_features.py # Phase 2: injury impact
| | |-- weather_features.py # Phase 2: weather features
| | |-- market_features.py # Phase 2: market-derived features
| | |-- build_matrix.py # Phase 2: combine all features
| |
| |-- models/
| | |-- __init__.py
| | |-- ridge_model.py # Phase 3: Ridge regression
| | |-- xgboost_model.py # Phase 3: XGBoost
| | |-- neural_net.py # Phase 3: PyTorch neural network
| | |-- ensemble.py # Phase 3: model ensemble
| | |-- calibration.py # Phase 3: probability calibration
| |
| |-- evaluation/
| | |-- __init__.py
| | |-- backtest.py # Phase 4: walk-forward validation
| | |-- metrics.py # Phase 4: evaluation metrics
| | |-- profit_sim.py # Phase 4: profit simulation
| | |-- visualizations.py # Phase 4: plots and dashboards
| |
| |-- strategy/
| | |-- __init__.py
| | |-- kelly.py # Phase 5: Kelly sizing
| | |-- line_shopping.py # Phase 5: cross-book comparison
| | |-- recommendations.py # Phase 5: bet recommendation engine
| |
| |-- pipeline/
| |-- __init__.py
| |-- weekly_pipeline.py # Phase 6: automated weekly pipeline
| |-- scheduler.py # Phase 6: scheduling
| |-- notifications.py # Phase 6: alerting
|
|-- notebooks/
| |-- 01_data_exploration.ipynb
| |-- 02_feature_analysis.ipynb
| |-- 03_model_comparison.ipynb
| |-- 04_backtest_results.ipynb
| |-- 05_strategy_analysis.ipynb
|
|-- data/
| |-- nfl_betting.db # SQLite database
| |-- raw/ # Raw downloaded data
| |-- processed/ # Processed feature files
|
|-- models/
| |-- ridge_latest.pkl
| |-- xgboost_latest.json
| |-- neural_net_latest.pt
|
|-- reports/
| |-- weekly/ # Weekly recommendation reports
| |-- backtest_report.html # Backtest dashboard
| |-- technical_report.pdf # Final technical report
|
|-- tests/
|-- test_data_collection.py
|-- test_features.py
|-- test_models.py
|-- test_strategy.py
|-- test_pipeline.py
Grading Rubric
| Component | Weight | Excellent (90-100%) | Good (75-89%) | Satisfactory (60-74%) | Needs Work (<60%) |
|---|---|---|---|---|---|
| Data Pipeline | 15% | Automated collection from 3+ sources; robust error handling; complete data quality checks | 2 sources; basic error handling; data quality checked | 1 source; manual steps required; minimal quality checks | Incomplete data; critical errors; no quality assurance |
| Feature Engineering | 20% | 80+ features across 6+ categories; all grounded in analytics research; no leakage | 50-79 features; 4-5 categories; no leakage | 30-49 features; 3 categories; minor leakage risk | <30 features; obvious data leakage; poorly motivated |
| Modeling | 20% | 4+ model types with ensemble; hyperparameter tuning; SHAP analysis; multi-task learning | 3 model types; tuning performed; basic feature importance | 2 model types; default hyperparameters; no interpretability | 1 model; no tuning; no understanding of model behavior |
| Backtesting | 20% | Walk-forward across 5+ seasons; all metrics computed; statistical significance tested; calibration excellent | 3-4 seasons; most metrics computed; calibration checked | 2 seasons; basic metrics only; no calibration analysis | No proper temporal separation; metrics incomplete |
| Strategy & Production | 15% | Kelly sizing with constraints; multi-book line shopping; fully automated pipeline with monitoring | Kelly sizing; single book; semi-automated pipeline | Flat bet sizing; no line shopping; manual process | No sizing logic; no strategy; no pipeline |
| Documentation & Report | 10% | Professional technical report; clear code documentation; insightful self-evaluation | Complete report; adequate documentation; some reflection | Incomplete report; minimal documentation | Missing report; no documentation |
Suggested Timeline
| Week | Phase | Key Activities | Milestone |
|---|---|---|---|
| 1 | Phase 1: Data Collection | Set up environment, download data, build database | Database populated with 5+ seasons |
| 2 | Phase 2: Feature Engineering (Part 1) | EPA features, schedule features, Elo ratings | 30+ features computed |
| 3 | Phase 2 (cont.) + Phase 3 starts | Injury/weather/market features; begin Ridge model | 50+ features; baseline model trained |
| 4 | Phase 3: Model Building | XGBoost and neural network; hyperparameter tuning | All 3+ models trained |
| 5 | Phase 4: Backtesting | Walk-forward validation; evaluation metrics; profit simulation | Backtest report complete |
| 6 | Phase 5: Betting Strategy | Kelly sizing; line shopping; recommendation engine | Strategy module functional |
| 7 | Phase 6: Production Pipeline | Automation; scheduling; monitoring; alerting | Pipeline runs end-to-end |
| 8 | Documentation & Polish | Technical report; presentation; code cleanup | All deliverables submitted |
Chapter Reference Index
The following chapters are directly applied in this capstone project:
- Chapter 2 (Probability and Odds): Implied probability extraction, vig removal
- Chapter 3 (Expected Value): EV calculation for bet recommendations
- Chapter 4 (Bankroll Management): Kelly Criterion, bankroll constraints
- Chapter 5 (Data Literacy): Data collection, cleaning, storage
- Chapter 8 (Hypothesis Testing): Statistical significance of ATS record
- Chapter 9 (Regression Analysis): Ridge regression baseline model
- Chapter 10 (Bayesian Thinking): Prior information in probability estimation
- Chapter 11 (Betting Markets): Market-derived features, line interpretation
- Chapter 12 (Line Shopping): Multi-book comparison, CLV tracking
- Chapter 13 (Value Betting): Systematic value identification framework
- Chapter 14 (Advanced Bankroll): Full Kelly derivation, fractional Kelly, portfolio theory
- Chapter 15 (Modeling the NFL): EPA metrics, injury impact, schedule factors, home field advantage
- Chapter 23 (Time Series): Rolling windows, mean reversion, seasonal patterns
- Chapter 24 (Monte Carlo Simulation): Profit simulation, confidence intervals
- Chapter 25 (Optimization): Portfolio-level bet sizing optimization
- Chapter 26 (Ratings and Rankings): Elo rating system, Massey ratings
- Chapter 27 (Advanced Regression/Classification): XGBoost, SHAP, calibration
- Chapter 28 (Feature Engineering): Feature design principles, domain features
- Chapter 29 (Neural Networks): Multi-task PyTorch model, Optuna tuning
- Chapter 30 (Model Evaluation): Walk-forward validation, Brier score, calibration plots
- Chapter 31 (ML Betting Pipeline): System architecture, automation, monitoring
- Chapter 36 (Psychology): Managing variance and drawdowns emotionally
- Chapter 37 (Discipline and Systems): Bet logging, performance tracking, review processes
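As a concrete illustration of the Chapter 2 material, here is one minimal sketch of implied-probability extraction and vig removal for a two-way market (proportional normalization is assumed; the textbook may also cover other de-vigging methods):

```python
def implied_prob(american_odds):
    """Convert American odds to the bookmaker's implied probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def remove_vig(odds_a, odds_b):
    """Normalize a two-way market so the fair probabilities sum to 1.

    The raw implied probabilities sum to more than 1; the excess is
    the bookmaker's vig (overround).
    """
    p_a, p_b = implied_prob(odds_a), implied_prob(odds_b)
    overround = p_a + p_b
    return p_a / overround, p_b / overround

# A standard -110/-110 spread market de-vigs to 0.50 per side
fair_a, fair_b = remove_vig(-110, -110)
```

The gap between a model's probability and the de-vigged market probability is the raw input to the expected-value calculation in the recommendation engine.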
Tips for Success
- Start with data quality. The single most common failure mode in sports prediction projects is bad data. Verify your data against known results before building anything.
- Baseline early. Get Ridge regression working in week 2 and use it as the reference point for every subsequent model. If a complex model cannot beat Ridge, something is wrong.
- Respect temporal ordering. Every feature, every training split, and every evaluation must strictly respect the arrow of time. Even one feature that inadvertently uses future information will invalidate your entire backtest.
- The market is your toughest competitor. NFL closing spreads are among the most efficient forecasts of expected margin available. A model that consistently beats the closing line by even 0.5 points is exceptional.
- Bet sizing matters more than prediction accuracy. A perfectly calibrated model with poor bet sizing will underperform a decent model with disciplined Kelly sizing (Chapter 14).
- Document everything. Your future self (and your grader) will thank you. Record every decision, every hyperparameter choice, and every result.
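To make the sizing point concrete, here is one possible fractional Kelly sketch for `strategy/kelly.py` (the quarter-Kelly multiplier and 2% cap are illustrative defaults, not values prescribed by the project):

```python
def kelly_fraction(p_win, decimal_odds, fraction=0.25, cap=0.02):
    """Fractional Kelly stake as a share of bankroll.

    p_win        -- model's estimated win probability
    decimal_odds -- payout odds (e.g. ~1.909 for -110)
    fraction     -- Kelly multiplier (quarter Kelly here)
    cap          -- hard ceiling on any single bet
    """
    b = decimal_odds - 1.0  # net odds received per unit staked
    full_kelly = (p_win * b - (1.0 - p_win)) / b
    if full_kelly <= 0:
        return 0.0  # no positive edge: no bet
    return min(full_kelly * fraction, cap)

# 54% win probability at -110: a small positive stake, well under the cap
stake = kelly_fraction(0.54, 1.9091)
```

The cap enforces the risk constraints from Chapter 14: even when the model claims a large edge, a single game never receives more than a fixed slice of bankroll.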
This capstone project integrates material from Chapters 2--5, 8--15, 23--31, 36, and 37 of The Sports Betting Textbook.