Case Study: Building an Edge from Scratch --- One Model's Journey from Idea to Profit
Executive Summary
This case study follows the development of an NFL prediction model built by a quantitative analyst ("Sam") who had no prior sports betting experience but strong statistical skills. Over nearly three years, from initial concept in early 2023 through partway into a third NFL season of live betting, Sam iterated through four model versions, tracked every bet in meticulous detail, and used calibration analysis and closing line value (CLV) to evaluate whether the model had genuine predictive value. The journey illustrates the practical reality of value betting: the first model lost money, the second broke even, and the third finally produced a sustainable edge, reaching +3.1% ROI with +2.0% average CLV in its first full season and +1.5% average CLV across all 1,800 tracked bets. The key lesson is that building a winning model is an iterative process in which systematic evaluation, not raw results, guides improvement.
Background
The Modeler's Profile
Sam is a 29-year-old data scientist at a technology company with a PhD in applied statistics. In early 2023, Sam read a series of academic papers on sports betting market efficiency and was struck by two findings: first, that closing lines at sharp sportsbooks are highly efficient but not perfectly so; and second, that systematic model-based approaches have historically found small but exploitable edges, particularly in less-efficient markets.
Sam had three advantages: strong technical skills in Python, statistics, and machine learning; familiarity with Bayesian reasoning; and the discipline to follow a structured research process. Sam had one critical disadvantage: zero domain knowledge of football.
The Plan
Sam allocated 6 months to model development before placing a single real bet. The plan was:
- Months 1-2: Collect historical NFL data, build a baseline Elo model, and establish benchmark metrics.
- Months 3-4: Enhance the model with additional features (efficiency metrics, situational adjustments, injury data).
- Months 5-6: Backtest on held-out data, perform calibration analysis, and identify the model's strengths and weaknesses.
- Season 1 (2023): Live-test with small stakes ($50/bet), tracking CLV on every bet.
- Season 2 (2024): Iterate based on Season 1 findings, increase stakes if CLV is positive.
The Analysis
Phase 1: The Baseline Elo Model (Model v1.0)
Sam started with a standard Elo model with the following parameters:
| Parameter | Value | Rationale |
|---|---|---|
| K-factor | 20 | Standard starting point for NFL |
| Home advantage | 65 Elo points | Historical home-field edge (~59% expected win rate for evenly matched teams) |
| Initial rating | 1500 | Convention |
| Mean reversion | 33% per season | Captures roster turnover |
| MoV adjustment | Log-based multiplier | Reduces impact of blowouts |
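A minimal sketch of how these parameters drive a single rating update appears below. The function names and the exact form of the margin-of-victory multiplier are illustrative assumptions rather than code from case-study-code.py; the multiplier shown is a common log-based form that damps blowout results.

```python
import math

K = 20              # update speed
HOME_ADV = 65       # Elo points credited to the home team
MEAN_RATING = 1500  # scale center
REVERSION = 0.33    # fraction reverted toward the mean each off-season

def home_win_prob(home_elo: float, away_elo: float) -> float:
    """Standard logistic Elo expectation with home advantage applied."""
    diff = (home_elo + HOME_ADV) - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def mov_multiplier(margin: int, winner_elo_diff: float) -> float:
    """Log-based margin-of-victory multiplier (illustrative form): grows
    slowly with the margin and shrinks when the favorite wins big."""
    return math.log(abs(margin) + 1.0) * 2.2 / (0.001 * winner_elo_diff + 2.2)

def update_ratings(home_elo, away_elo, home_pts, away_pts):
    """Apply one game result to both ratings (zero-sum shift)."""
    p_home = home_win_prob(home_elo, away_elo)
    result = 1.0 if home_pts > away_pts else (0.5 if home_pts == away_pts else 0.0)
    # Elo gap from the winner's perspective, for the damping term above.
    gap = (home_elo + HOME_ADV) - away_elo
    if home_pts < away_pts:
        gap = -gap
    shift = K * mov_multiplier(home_pts - away_pts, gap) * (result - p_home)
    return home_elo + shift, away_elo - shift

def offseason_revert(elo: float) -> float:
    """33% mean reversion per season, capturing roster turnover."""
    return elo + REVERSION * (MEAN_RATING - elo)
```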
Sam trained the model on four seasons of NFL data (2018-2021) and evaluated it on the held-out 2022 season (272 games):
| Metric | Model v1.0 | Naive Baseline (55% home) | Sharp Closing Line |
|---|---|---|---|
| Brier score | 0.237 | 0.248 | 0.213 |
| Log loss | 0.681 | 0.693 | 0.665 |
| AUC | 0.605 | 0.500 | 0.680 |
| Calibration error | 0.035 | 0.042 | 0.012 |
The baseline Elo model was better than the naive baseline but substantially worse than the sharp closing line. The Brier score gap (0.237 vs. 0.213) indicated that betting the model's raw probabilities against a sharp book would be unprofitable --- the market was more accurate.
Key Observation: The calibration analysis revealed that v1.0 was overconfident on strong favorites (predicting 75%+ when actual win rate was closer to 70%) and underconfident on close games (predicting 50-55% when actual rate was closer to 53-57%). This asymmetric miscalibration created specific opportunities.
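The bin-level check that surfaces this asymmetry can be sketched as follows. It assumes arrays of predicted home-win probabilities and 0/1 outcomes; the frequency-weighted calibration error shown is one common definition, and the chapter's code may define it differently.

```python
import numpy as np

def calibration_report(p_pred, y, n_bins: int = 10):
    """Brier score, log loss, calibration error, and a per-bin table.
    p_pred: predicted home-win probabilities; y: 1 if the home team won."""
    p_pred, y = np.asarray(p_pred, float), np.asarray(y, float)
    eps = 1e-12
    brier = np.mean((p_pred - y) ** 2)
    log_loss = -np.mean(y * np.log(p_pred + eps) + (1 - y) * np.log(1 - p_pred + eps))

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    cal_error, rows = 0.0, []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p_pred >= lo) & (p_pred < hi)
        if not in_bin.any():
            continue
        avg_pred, actual = p_pred[in_bin].mean(), y[in_bin].mean()
        cal_error += in_bin.mean() * abs(avg_pred - actual)  # frequency-weighted
        rows.append((lo, hi, int(in_bin.sum()), avg_pred, actual))
    return brier, log_loss, cal_error, rows
```

Comparing `avg_pred` against `actual` in the top bins is what exposes the overconfident-favorites pattern described above.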
Phase 2: Adding Features (Model v2.0)
Sam enhanced the model with three feature categories:
Efficiency Metrics: Offensive and defensive yards per play, adjusted for strength of opponent. Sam computed rolling 8-game averages, weighted by recency.
Situational Adjustments: Rest advantage (extra days between games), travel distance (cross-country flights versus short trips), and divisional rivalry effects (which tended to compress point spreads).
Weather Data: Temperature, wind speed, and precipitation at the game venue. Sam hypothesized that extreme weather would favor lower-scoring games and teams with strong rushing attacks.
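A minimal sketch of the recency-weighted 8-game average behind the efficiency metrics, assuming a geometric decay; the 0.85 decay rate is an illustrative assumption, as the case study does not specify the weighting scheme.

```python
import numpy as np

def recency_weighted_average(values, window: int = 8, decay: float = 0.85):
    """Rolling average of a team's last `window` games, most recent weighted
    highest. `values` is ordered oldest-to-newest; decay < 1 discounts older
    games geometrically (0.85 is illustrative, not from the case study)."""
    recent = np.asarray(values[-window:], dtype=float)
    weights = decay ** np.arange(len(recent) - 1, -1, -1)  # oldest gets smallest weight
    return float(np.sum(weights * recent) / np.sum(weights))

# e.g., yards per play over a team's last eight games, newest last:
ypp = [5.1, 5.4, 4.8, 5.9, 6.2, 5.5, 6.0, 6.3]
print(round(recency_weighted_average(ypp), 2))
```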
Model v2.0 was trained on 2018-2021 and tested on the held-out 2022 season:
| Metric | v1.0 | v2.0 | Sharp Closing Line |
|---|---|---|---|
| Brier score | 0.237 | 0.224 | 0.213 |
| Log loss | 0.681 | 0.668 | 0.665 |
| Calibration error | 0.035 | 0.021 | 0.012 |
The improvement was meaningful: the Brier score dropped from 0.237 to 0.224, and the calibration error fell from 0.035 to 0.021. However, the model still trailed the sharp closing line (0.224 vs. 0.213). Sam noted that the log loss was now very close to the sharp line's (0.668 vs. 0.665), suggesting the model was nearly competitive in overall probabilistic accuracy, though its calibration still lagged the market's 0.012.
Phase 3: Bayesian Combination (Model v3.0)
Rather than trying to beat the market entirely with the model, Sam adopted the Bayesian combination approach from Section 13.1 of the chapter. The logic: use the model's output as one signal and the sharp closing line as another, combining them with calibrated weights.
Sam used model weight 0.30 and market weight 0.70, based on the relative Brier scores. The combined probability was computed in log-odds space:
$$ \text{logit}(p_{\text{combined}}) = 0.30 \times \text{logit}(p_{\text{model}}) + 0.70 \times \text{logit}(p_{\text{market}}) $$
When the combined probability exceeded the soft book's implied probability by at least two percentage points, Sam would bet. A sketch of this rule follows.
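Concretely, assuming `p_market` is the de-vigged probability from the sharp closing line, the combiner and entry rule might look like this (the function names are illustrative, not from case-study-code.py):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def combine(p_model: float, p_market: float, w_model: float = 0.30) -> float:
    """Weighted combination in log-odds space (weights sum to 1)."""
    z = w_model * logit(p_model) + (1.0 - w_model) * logit(p_market)
    return inv_logit(z)

def implied_prob(american_odds: int) -> float:
    """Implied probability of American odds, vig included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

def is_value_bet(p_model, p_market, soft_odds, threshold=0.02, w_model=0.30):
    """Bet when the combined probability beats the soft book's implied
    probability by at least the threshold (2 points in the case study)."""
    return combine(p_model, p_market, w_model) - implied_prob(soft_odds) >= threshold

# e.g., model says 58%, de-vigged market says 54%, soft book offers -105:
print(is_value_bet(0.58, 0.54, -105))  # True: combined ~55.2% vs implied ~51.2%
```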
Backtest Results (2022 season, using actual closing lines and soft book lines):
| Metric | v3.0 Combined |
|---|---|
| Bets identified | 312 |
| Average edge | +3.8% |
| Simulated CLV | +2.3% |
| Simulated ROI (at -108 avg odds) | +3.1% |
The model identified value primarily in three areas: totals in extreme weather games, spreads where rest and travel advantages were mispriced, and moneylines where the public overbet popular teams.
Season 1 (2023): Live Testing
Sam opened accounts at six sportsbooks and began live betting with $50 flat stakes. Every bet was recorded with the full set of fields described in Section 13.3.
Season 1 Results:
| Metric | Value |
|---|---|
| Total bets | 584 |
| Win rate | 51.9% |
| Average odds obtained | -109.2 |
| ROI | -0.8% |
| Mean CLV | +0.7% |
| CLV hit rate | 55.2% |
| Total profit/loss | -$234 |
The season was a financial disappointment: Sam lost $234 despite the model flagging apparent value on every bet placed. The CLV data, however, told a more nuanced story:
CLV was positive. The +0.7% mean CLV indicated that Sam's combined probabilities were, on average, slightly better than the closing line. The evidence was statistically weak (t = 1.42, one-sided p = 0.078) but directionally encouraging.
The loss was consistent with variance. A bettor with +0.7% CLV placing 584 bets at -109 average odds would be profitable only approximately 58% of the time in simulation. Sam's loss was within the expected range.
Model weakness identified. CLV was negative on moneyline bets (-0.3%) but positive on spreads (+0.9%) and totals (+1.4%). The moneyline model was not adding value above the market.
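The ~58% figure can be reproduced with a simulation along the following lines, which treats mean CLV as a proxy for per-bet expected value (an assumption, since CLV and EV are only approximately interchangeable):

```python
import numpy as np

rng = np.random.default_rng(7)

N_BETS, MEAN_CLV, N_SIMS = 584, 0.007, 100_000
win_payout = 100.0 / 109.0           # profit per unit staked at -109 odds

# Win probability implying a +0.7% expected value per unit staked:
#   p * win_payout - (1 - p) = MEAN_CLV  =>  p = (1 + MEAN_CLV) / (1 + win_payout)
p_win = (1.0 + MEAN_CLV) / (1.0 + win_payout)

wins = rng.binomial(N_BETS, p_win, size=N_SIMS)
profit = wins * win_payout - (N_BETS - wins)
print(f"P(season ends profitable) ~ {(profit > 0).mean():.0%}")  # roughly 57-58%
```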
Off-Season Iteration (Model v3.1)
Based on Season 1 findings, Sam made four changes:
- Dropped moneyline bets. The model had no edge in this market.
- Increased weather feature weighting. Totals CLV was strongest in weather-affected games, suggesting the market was slow to price in weather.
- Added snap count data. Sam obtained pre-game snap count projections that helped predict total scoring more accurately.
- Adjusted the Bayesian weights. After observing that the model outperformed the market on totals, Sam used a higher model weight (0.40) for totals and kept 0.30 for spreads.
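One way to make that last weight choice systematic rather than eyeballed is a grid search on held-out log loss, fit separately per market. This is a sketch of the idea, not Sam's documented procedure:

```python
import numpy as np

def blend_log_loss(w, p_model, p_market, y, eps=1e-12):
    """Held-out log loss of the log-odds blend at model weight w."""
    z = (w * np.log(p_model / (1 - p_model))
         + (1 - w) * np.log(p_market / (1 - p_market)))
    p = 1.0 / (1.0 + np.exp(-z))
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def best_model_weight(p_model, p_market, y, grid=np.linspace(0, 1, 101)):
    """Model weight minimizing held-out log loss. Run once on totals-only
    games and once on spread-only games to get per-market weights."""
    losses = [blend_log_loss(w, p_model, p_market, y) for w in grid]
    return float(grid[int(np.argmin(losses))])
```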
Season 2 (2024): The Breakthrough
Sam increased stakes to $100/bet and continued disciplined tracking.
Season 2 Results:
| Metric | Value |
|---|---|
| Total bets | 628 |
| Win rate | 53.5% |
| Average odds obtained | -107.8 |
| ROI | +3.1% |
| Mean CLV | +2.0% |
| CLV hit rate | 60.8% |
| Total profit | +$1,946 |
By Market Type:
| Market | Bets | CLV | ROI | CLV Hit Rate |
|---|---|---|---|---|
| Spreads | 380 | +1.6% | +2.2% | 58.4% |
| Totals | 248 | +2.6% | +4.5% | 64.1% |
The results validated the iterative approach:
- CLV was statistically significant. Mean CLV of +2.0% over 628 bets gave t = 4.21, p < 0.001.
- The totals edge was strong. The +2.6% CLV on totals confirmed that weather and snap count data provided genuine predictive value that the market underpriced.
- Line shopping contributed. Sam's average odds of -107.8 beat the standard -110 by roughly two cents, cutting the break-even win rate by about half a percentage point through line comparison across six books; the arithmetic below puts this at roughly one percentage point of ROI.
- CLV and ROI aligned. The +2.0% CLV, plus the line-shopping contribution, predicted approximately +3% ROI, and the actual +3.1% ROI matched closely.
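The line-shopping arithmetic is worth making explicit. The break-even win rate at American odds of $-x$ is $x/(x+100)$: 52.4% at -110 but 51.9% at Sam's average -107.8. At Sam's 53.5% win rate,

$$ \text{ROI} = p \cdot \frac{100}{x} - (1 - p) $$

gives $0.535 \times \tfrac{100}{110} - 0.465 \approx +2.1\%$ at -110 versus $0.535 \times \tfrac{100}{107.8} - 0.465 \approx +3.1\%$ at -107.8. Line shopping therefore accounts for roughly one percentage point of the season's ROI.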
Season 3 (2025 through Week 10): Continuation
Sam increased stakes to $150/bet. Through Week 10:
| Metric | Value |
|---|---|
| Total bets | 588 |
| Win rate | 53.1% |
| Mean CLV | +1.8% |
| ROI | +2.4% |
| Total profit | +$2,117 |
The slight CLV decline (from 2.0% to 1.8%) was expected. Sam monitored rolling 100-bet CLV and noticed that it dipped to +1.2% during Weeks 6-8 (coinciding with key injuries not yet in the snap count projections). After adjusting the injury model, CLV recovered to +2.1% in Weeks 9-10.
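A rolling monitor of this kind can be sketched as follows. The 100-bet window matches the case study; the 0.5% alarm floor is an illustrative threshold, not a value from the text.

```python
import numpy as np

def rolling_clv(clv_per_bet, window: int = 100):
    """Trailing mean CLV over the last `window` bets."""
    clv = np.asarray(clv_per_bet, dtype=float)
    return np.convolve(clv, np.ones(window) / window, mode="valid")

def edge_decay_alarms(clv_per_bet, window: int = 100, floor: float = 0.005):
    """Indices of bets at which the rolling CLV sits below the floor
    (0.5% here is an illustrative threshold, not the case study's)."""
    roll = rolling_clv(clv_per_bet, window)
    return np.flatnonzero(roll < floor) + window - 1
```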
Cumulative Results Across Three Seasons
| Metric | Season 1 | Season 2 | Season 3* | Total |
|---|---|---|---|---|
| Bets | 584 | 628 | 588 | 1,800 |
| Win Rate | 51.9% | 53.5% | 53.1% | 52.9% |
| Mean CLV | +0.7% | +2.0% | +1.8% | +1.5% |
| ROI | -0.8% | +3.1% | +2.4% | +2.1% |
| Profit | -$234 | +$1,946 | +$2,117 | +$3,829 |
*Season 3 through Week 10. Total ROI is total profit ($3,829) divided by total amount staked ($180,200 across the $50, $100, and $150 stake levels).
The 1,800-bet CLV of +1.5% is highly significant (t = 5.8, p < 0.001), confirming that the model has genuine predictive value. The Season 1 loss was attributable to variance and model immaturity, not to a fundamentally flawed approach.
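The t-statistic follows from a one-sample test on per-bet CLV:

$$ t = \frac{\bar{c}}{s_c / \sqrt{n}} $$

With $\bar{c} = 0.015$, $n = 1800$, and $t = 5.8$, the implied per-bet standard deviation is $s_c \approx 0.015 \times \sqrt{1800} / 5.8 \approx 0.11$, consistent with the roughly 0.12 implied by the per-season tests (e.g., Season 2: $0.02 \times \sqrt{628} / 4.21 \approx 0.12$).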
Key Findings
- The first model was not profitable, and that was expected. Building a winning model is iterative. Model v1.0 was worse than the market, v2.0 was competitive, and v3.0/v3.1 were profitable. Bettors who expect their first model to print money will be disappointed.
- CLV guided the iteration. Without CLV tracking, Sam would have concluded after Season 1 that the approach was failing (negative ROI). With CLV, Sam could see that the process was marginally sound (positive CLV) and that specific improvements (dropping moneylines, enhancing the weather model) could lift the edge above the profitability threshold.
- The Bayesian combination was key. Sam's model alone (Brier 0.224) was not good enough to beat the market (Brier 0.213). But the model contributed incremental information that, when combined with the market's signal, produced a combined estimate better than either input alone. This is the practical application of Section 13.1's Bayesian framework.
- Market-specific edges matter. Sam's edge was concentrated in totals (+2.6% CLV) and moderate in spreads (+1.6% CLV). The moneyline market provided no edge. Without per-market CLV tracking, Sam might have continued placing unprofitable moneyline bets.
- The investment in tracking paid off. Sam's detailed journal --- recording the pre-bet thesis, counter-arguments, and post-event review for every bet --- was the foundation for systematic improvement. The journal entries for weather-affected totals repeatedly noted that the model's thesis was validated (weather suppressed scoring as predicted), which provided the confidence to increase the model weight for totals.
The Python Analysis
The accompanying code (case-study-code.py) implements the full modeling pipeline used in this case study, including:
- Elo model with calibration analysis that computes Brier score, log loss, and bin-level calibration tables.
- Bayesian probability combiner that merges model and market probabilities in log-odds space.
- Value identification pipeline that scans for bets where combined probability exceeds implied probability by a configurable threshold.
- Season simulation framework that generates synthetic seasons and tracks CLV across model iterations.
- Rolling CLV monitor with edge decay detection that flags when model performance is deteriorating.
Discussion Questions
- Sam's model outperforms the market on totals but not on moneylines. What specific features of the totals market might make it less efficient than the moneyline market? Is this exploitable long-term?
- The Bayesian combination used model weight 0.30 for spreads and 0.40 for totals. How would you determine the optimal weights? Design an experiment using historical data to calibrate these weights.
- Sam's Season 1 loss was -$234 despite positive CLV. If Season 1 had produced a -$1,500 loss (still consistent with the CLV variance), should Sam have continued? At what loss level should a bettor with positive CLV consider stopping?
- Sam retrained the model once between seasons. Is annual retraining sufficient, or should the model update continuously? Discuss the trade-offs between stability and responsiveness.
- Sam's edge on totals appears driven by weather data and snap count projections. If a competing data provider makes similar projections freely available, how quickly will Sam's edge erode? What preemptive steps should Sam take?