Case Study: Building an Edge from Scratch --- One Model's Journey from Idea to Profit
Executive Summary
This case study follows the development of an NFL prediction model built by a quantitative analyst ("Sam") who had no prior sports betting experience but strong statistical skills. Over nearly three years, from initial concept in early 2023 through partway into a third NFL season of live betting, Sam iterated through four model versions, tracked every bet in meticulous detail, and used calibration analysis and closing line value (CLV) to evaluate whether the model had genuine predictive value. The journey illustrates the practical reality of value betting: the first model lost money, the second broke even, and the third finally produced a sustainable edge, reaching +3.1% ROI with +2.0% average CLV in its first full season and +1.5% average CLV across all 1,800 tracked bets. The key lesson is that building a winning model is an iterative process in which systematic evaluation, not raw results, guides improvement.
Background
The Modeler's Profile
Sam is a 29-year-old data scientist at a technology company with a PhD in applied statistics. In early 2023, Sam read a series of academic papers on sports betting market efficiency and was struck by two findings: first, that closing lines at sharp sportsbooks are highly efficient but not perfectly so; and second, that systematic model-based approaches have historically found small but exploitable edges, particularly in less-efficient markets.
Sam had three advantages: strong technical skills in Python, statistics, and machine learning; familiarity with Bayesian reasoning; and the discipline to follow a structured research process. Sam had one critical disadvantage: zero domain knowledge of football.
The Plan
Sam allocated 6 months to model development before placing a single real bet. The plan was:
- Months 1-2: Collect historical NFL data, build a baseline Elo model, and establish benchmark metrics.
- Months 3-4: Enhance the model with additional features (efficiency metrics, situational adjustments, injury data).
- Months 5-6: Backtest on held-out data, perform calibration analysis, and identify the model's strengths and weaknesses.
- Season 1 (2023): Live-test with small stakes ($50/bet), tracking CLV on every bet.
- Season 2 (2024): Iterate based on Season 1 findings, increase stakes if CLV is positive.
The Analysis
Phase 1: The Baseline Elo Model (Model v1.0)
Sam started with a standard Elo model with the following parameters:
| Parameter | Value | Rationale |
|---|---|---|
| K-factor | 20 | Standard starting point for NFL |
| Home advantage | 65 Elo points | Historical home-field edge (~59% expected win rate for evenly matched teams) |
| Initial rating | 1500 | Convention |
| Mean reversion | 33% per season | Captures roster turnover |
| MoV adjustment | Log-based multiplier | Reduces impact of blowouts |
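A minimal sketch of how these parameters drive a single rating update appears below. The function names and the exact form of the margin-of-victory multiplier are illustrative assumptions rather than code from case-study-code.py; the multiplier shown is a common log-based form that damps blowout results.

```python
import math

K = 20              # update speed
HOME_ADV = 65       # Elo points credited to the home team
MEAN_RATING = 1500  # scale center
REVERSION = 0.33    # fraction reverted toward the mean each off-season

def home_win_prob(home_elo: float, away_elo: float) -> float:
    """Standard logistic Elo expectation with home advantage applied."""
    diff = (home_elo + HOME_ADV) - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def mov_multiplier(margin: int, winner_elo_diff: float) -> float:
    """Log-based margin-of-victory multiplier (illustrative form): grows
    slowly with the margin and shrinks when the favorite wins big."""
    return math.log(abs(margin) + 1.0) * 2.2 / (0.001 * winner_elo_diff + 2.2)

def update_ratings(home_elo, away_elo, home_pts, away_pts):
    """Apply one game result to both ratings (zero-sum shift)."""
    p_home = home_win_prob(home_elo, away_elo)
    result = 1.0 if home_pts > away_pts else (0.5 if home_pts == away_pts else 0.0)
    # Elo gap from the winner's perspective, for the damping term above.
    gap = (home_elo + HOME_ADV) - away_elo
    if home_pts < away_pts:
        gap = -gap
    shift = K * mov_multiplier(home_pts - away_pts, gap) * (result - p_home)
    return home_elo + shift, away_elo - shift

def offseason_revert(elo: float) -> float:
    """33% mean reversion per season, capturing roster turnover."""
    return elo + REVERSION * (MEAN_RATING - elo)
```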
Sam trained the model on four seasons of NFL data (2018-2021) and evaluated it on the held-out 2022 season (272 games):
| Metric | Model v1.0 | Naive Baseline (55% home) | Sharp Closing Line |
|---|---|---|---|
| Brier score | 0.237 | 0.248 | 0.213 |
| Log loss | 0.681 | 0.693 | 0.665 |
| AUC | 0.605 | 0.500 | 0.680 |
| Calibration error | 0.035 | 0.042 | 0.012 |
The baseline Elo model was better than the naive baseline but substantially worse than the sharp closing line. The Brier score gap (0.237 vs. 0.213) indicated that betting the model's raw probabilities against a sharp book would be unprofitable --- the market was more accurate.
Key Observation: The calibration analysis revealed that v1.0 was overconfident on strong favorites (predicting 75%+ when actual win rate was closer to 70%) and underconfident on close games (predicting 50-55% when actual rate was closer to 53-57%). This asymmetric miscalibration created specific opportunities.
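The bin-level check that surfaces this asymmetry can be sketched as follows. It assumes arrays of predicted home-win probabilities and 0/1 outcomes; the frequency-weighted calibration error shown is one common definition, and the chapter's code may define it differently.

```python
import numpy as np

def calibration_report(p_pred, y, n_bins: int = 10):
    """Brier score, log loss, calibration error, and a per-bin table.
    p_pred: predicted home-win probabilities; y: 1 if the home team won."""
    p_pred, y = np.asarray(p_pred, float), np.asarray(y, float)
    eps = 1e-12
    brier = np.mean((p_pred - y) ** 2)
    log_loss = -np.mean(y * np.log(p_pred + eps) + (1 - y) * np.log(1 - p_pred + eps))

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    cal_error, rows = 0.0, []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p_pred >= lo) & (p_pred < hi)
        if not in_bin.any():
            continue
        avg_pred, actual = p_pred[in_bin].mean(), y[in_bin].mean()
        cal_error += in_bin.mean() * abs(avg_pred - actual)  # frequency-weighted
        rows.append((lo, hi, int(in_bin.sum()), avg_pred, actual))
    return brier, log_loss, cal_error, rows
```

Comparing `avg_pred` against `actual` in the top bins is what exposes the overconfident-favorites pattern described above.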
Phase 2: Adding Features (Model v2.0)
Sam enhanced the model with three feature categories:
Efficiency Metrics: Offensive and defensive yards per play, adjusted for strength of opponent. Sam computed rolling 8-game averages, weighted by recency.
Situational Adjustments: Rest advantage (extra days between games), travel distance (cross-country flights versus short trips), and divisional rivalry effects (which tended to compress point spreads).
Weather Data: Temperature, wind speed, and precipitation at the game venue. Sam hypothesized that extreme weather would favor lower-scoring games and teams with strong rushing attacks.
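A minimal sketch of the recency-weighted 8-game average behind the efficiency metrics, assuming a geometric decay; the 0.85 decay rate is an illustrative assumption, as the case study does not specify the weighting scheme.

```python
import numpy as np

def recency_weighted_average(values, window: int = 8, decay: float = 0.85):
    """Rolling average of a team's last `window` games, most recent weighted
    highest. `values` is ordered oldest-to-newest; decay < 1 discounts older
    games geometrically (0.85 is illustrative, not from the case study)."""
    recent = np.asarray(values[-window:], dtype=float)
    weights = decay ** np.arange(len(recent) - 1, -1, -1)  # oldest gets smallest weight
    return float(np.sum(weights * recent) / np.sum(weights))

# e.g., yards per play over a team's last eight games, newest last:
ypp = [5.1, 5.4, 4.8, 5.9, 6.2, 5.5, 6.0, 6.3]
print(round(recency_weighted_average(ypp), 2))
```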
Model v2.0 was trained on 2018-2021 and tested on the held-out 2022 season:
| Metric | v1.0 | v2.0 | Sharp Closing Line |
|---|---|---|---|
| Brier score | 0.237 | 0.224 | 0.213 |
| Log loss | 0.681 | 0.668 | 0.665 |
| Calibration error | 0.035 | 0.021 | 0.012 |
The improvement was meaningful: the Brier score dropped from 0.237 to 0.224, and the calibration error fell from 0.035 to 0.021. However, the model still trailed the sharp closing line (0.224 vs. 0.213). Sam noted that the log loss was now very close to the sharp line's (0.668 vs. 0.665), suggesting the model was nearly competitive in overall probabilistic accuracy, though its calibration still lagged the market's 0.012.
Phase 3: Bayesian Combination (Model v3.0)
Rather than trying to beat the market entirely with the model, Sam adopted the Bayesian combination approach from Section 13.1 of the chapter. The logic: use the model's output as one signal and the sharp closing line as another, combining them with calibrated weights.
Sam used model weight 0.30 and market weight 0.70, based on the relative Brier scores. The combined probability was computed in log-odds space:
$$ \text{logit}(p_{\text{combined}}) = 0.30 \times \text{logit}(p_{\text{model}}) + 0.70 \times \text{logit}(p_{\text{market}}) $$
When the combined probability exceeded the soft book's implied probability by at least two percentage points, Sam would bet. A sketch of this rule follows.
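Concretely, assuming `p_market` is the de-vigged probability from the sharp closing line, the combiner and entry rule might look like this (the function names are illustrative, not from case-study-code.py):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def combine(p_model: float, p_market: float, w_model: float = 0.30) -> float:
    """Weighted combination in log-odds space (weights sum to 1)."""
    z = w_model * logit(p_model) + (1.0 - w_model) * logit(p_market)
    return inv_logit(z)

def implied_prob(american_odds: int) -> float:
    """Implied probability of American odds, vig included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

def is_value_bet(p_model, p_market, soft_odds, threshold=0.02, w_model=0.30):
    """Bet when the combined probability beats the soft book's implied
    probability by at least the threshold (2 points in the case study)."""
    return combine(p_model, p_market, w_model) - implied_prob(soft_odds) >= threshold

# e.g., model says 58%, de-vigged market says 54%, soft book offers -105:
print(is_value_bet(0.58, 0.54, -105))  # True: combined ~55.2% vs implied ~51.2%
```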
Backtest Results (2022 season, using actual closing lines and soft book lines):
| Metric | v3.0 Combined |
|---|---|
| Bets identified | 312 |
| Average edge | +3.8% |
| Simulated CLV | +2.3% |
| Simulated ROI (at -108 avg odds) | +3.1% |
The model identified value primarily in three areas: totals in extreme weather games, spreads where rest and travel advantages were mispriced, and moneylines where the public overbet popular teams.
Season 1 (2023): Live Testing
Sam opened accounts at six sportsbooks and began live betting with $50 flat stakes. Every bet was recorded with the full set of fields described in Section 13.3.
Season 1 Results:
| Metric | Value |
|---|---|
| Total bets | 584 |
| Win rate | 51.9% |
| Average odds obtained | -109.2 |
| ROI | -0.8% |
| Mean CLV | +0.7% |
| CLV hit rate | 55.2% |
| Total profit/loss | -$234 |
The season was a financial disappointment: Sam lost $234 despite the model flagging apparent value on every bet placed. The CLV data, however, told a more nuanced story:
CLV was positive. The +0.7% mean CLV indicated that Sam's combined probabilities were, on average, slightly better than the closing line. The evidence was statistically weak (t = 1.42, one-sided p = 0.078) but directionally encouraging.
The loss was consistent with variance. A bettor with +0.7% CLV placing 584 bets at -109 average odds would be profitable only approximately 58% of the time in simulation. Sam's loss was within the expected range.
Model weakness identified. CLV was negative on moneyline bets (-0.3%) but positive on spreads (+0.9%) and totals (+1.4%). The moneyline model was not adding value above the market.
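The ~58% figure can be reproduced with a simulation along the following lines, which treats mean CLV as a proxy for per-bet expected value (an assumption, since CLV and EV are only approximately interchangeable):

```python
import numpy as np

rng = np.random.default_rng(7)

N_BETS, MEAN_CLV, N_SIMS = 584, 0.007, 100_000
win_payout = 100.0 / 109.0           # profit per unit staked at -109 odds

# Win probability implying a +0.7% expected value per unit staked:
#   p * win_payout - (1 - p) = MEAN_CLV  =>  p = (1 + MEAN_CLV) / (1 + win_payout)
p_win = (1.0 + MEAN_CLV) / (1.0 + win_payout)

wins = rng.binomial(N_BETS, p_win, size=N_SIMS)
profit = wins * win_payout - (N_BETS - wins)
print(f"P(season ends profitable) ~ {(profit > 0).mean():.0%}")  # roughly 57-58%
```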
Off-Season Iteration (Model v3.1)
Based on Season 1 findings, Sam made four changes:
- Dropped moneyline bets. The model had no edge in this market.
- Increased weather feature weighting. Totals CLV was strongest in weather-affected games, suggesting the market was slow to price in weather.
- Added snap count data. Sam obtained pre-game snap count projections that helped predict total scoring more accurately.
- Adjusted the Bayesian weights. After observing that the model outperformed the market on totals, Sam used a higher model weight (0.40) for totals and kept 0.30 for spreads.
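One way to make that last weight choice systematic rather than eyeballed is a grid search on held-out log loss, fit separately per market. This is a sketch of the idea, not Sam's documented procedure:

```python
import numpy as np

def blend_log_loss(w, p_model, p_market, y, eps=1e-12):
    """Held-out log loss of the log-odds blend at model weight w."""
    z = (w * np.log(p_model / (1 - p_model))
         + (1 - w) * np.log(p_market / (1 - p_market)))
    p = 1.0 / (1.0 + np.exp(-z))
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def best_model_weight(p_model, p_market, y, grid=np.linspace(0, 1, 101)):
    """Model weight minimizing held-out log loss. Run once on totals-only
    games and once on spread-only games to get per-market weights."""
    losses = [blend_log_loss(w, p_model, p_market, y) for w in grid]
    return float(grid[int(np.argmin(losses))])
```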
Season 2 (2024): The Breakthrough
Sam increased stakes to $100/bet and continued disciplined tracking.
Season 2 Results:
| Metric | Value |
|---|---|
| Total bets | 628 |
| Win rate | 53.5% |
| Average odds obtained | -107.8 |
| ROI | +3.1% |
| Mean CLV | +2.0% |
| CLV hit rate | 60.8% |
| Total profit | +$1,946 |
By Market Type:
| Market | Bets | CLV | ROI | CLV Hit Rate |
|---|---|---|---|---|
| Spreads | 380 | +1.6% | +2.2% | 58.4% |
| Totals | 248 | +2.6% | +4.5% | 64.1% |
The results validated the iterative approach:
- CLV was statistically significant. Mean CLV of +2.0% over 628 bets gave t = 4.21, p < 0.001.
- The totals edge was strong. The +2.6% CLV on totals confirmed that weather and snap count data provided genuine predictive value that the market underpriced.
- Line shopping contributed. Sam's average odds of -107.8 beat the standard -110 by roughly two cents, cutting the break-even win rate by about half a percentage point through line comparison across six books; the arithmetic below puts this at roughly one percentage point of ROI.
- CLV and ROI aligned. The +2.0% CLV, plus the line-shopping contribution, predicted approximately +3% ROI, and the actual +3.1% ROI matched closely.
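The line-shopping arithmetic is worth making explicit. The break-even win rate at American odds of $-x$ is $x/(x+100)$: 52.4% at -110 but 51.9% at Sam's average -107.8. At Sam's 53.5% win rate,

$$ \text{ROI} = p \cdot \frac{100}{x} - (1 - p) $$

gives $0.535 \times \tfrac{100}{110} - 0.465 \approx +2.1\%$ at -110 versus $0.535 \times \tfrac{100}{107.8} - 0.465 \approx +3.1\%$ at -107.8. Line shopping therefore accounts for roughly one percentage point of the season's ROI.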
Season 3 (2025 through Week 10): Continuation
Sam increased stakes to $150/bet. Through Week 10:
| Metric | Value |
|---|---|
| Total bets | 588 |
| Win rate | 53.1% |
| Mean CLV | +1.8% |
| ROI | +2.4% |
| Total profit | +$2,117 |
The slight CLV decline (from 2.0% to 1.8%) was expected. Sam monitored rolling 100-bet CLV and noticed that it dipped to +1.2% during Weeks 6-8 (coinciding with key injuries not yet in the snap count projections). After adjusting the injury model, CLV recovered to +2.1% in Weeks 9-10.
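A rolling monitor of this kind can be sketched as follows. The 100-bet window matches the case study; the 0.5% alarm floor is an illustrative threshold, not a value from the text.

```python
import numpy as np

def rolling_clv(clv_per_bet, window: int = 100):
    """Trailing mean CLV over the last `window` bets."""
    clv = np.asarray(clv_per_bet, dtype=float)
    return np.convolve(clv, np.ones(window) / window, mode="valid")

def edge_decay_alarms(clv_per_bet, window: int = 100, floor: float = 0.005):
    """Indices of bets at which the rolling CLV sits below the floor
    (0.5% here is an illustrative threshold, not the case study's)."""
    roll = rolling_clv(clv_per_bet, window)
    return np.flatnonzero(roll < floor) + window - 1
```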
Cumulative Results Across Three Seasons
| Metric | Season 1 | Season 2 | Season 3* | Total |
|---|---|---|---|---|
| Bets | 584 | 628 | 588 | 1,800 |
| Win Rate | 51.9% | 53.5% | 53.1% | 52.9% |
| Mean CLV | +0.7% | +2.0% | +1.8% | +1.5% |
| ROI | -0.8% | +3.1% | +2.4% | +2.1% |
| Profit | -$234 | +$1,946 | +$2,117 | +$3,829 |
*Season 3 through Week 10. Total ROI is total profit ($3,829) divided by total amount staked ($180,200 across the $50, $100, and $150 stake levels).
The 1,800-bet CLV of +1.5% is highly significant (t = 5.8, p < 0.001), confirming that the model has genuine predictive value. The Season 1 loss was attributable to variance and model immaturity, not to a fundamentally flawed approach.
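The t-statistic follows from a one-sample test on per-bet CLV:

$$ t = \frac{\bar{c}}{s_c / \sqrt{n}} $$

With $\bar{c} = 0.015$, $n = 1800$, and $t = 5.8$, the implied per-bet standard deviation is $s_c \approx 0.015 \times \sqrt{1800} / 5.8 \approx 0.11$, consistent with the roughly 0.12 implied by the per-season tests (e.g., Season 2: $0.02 \times \sqrt{628} / 4.21 \approx 0.12$).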
Key Findings
- The first model was not profitable, and that was expected. Building a winning model is iterative. Model v1.0 was worse than the market, v2.0 was competitive, and v3.0/v3.1 were profitable. Bettors who expect their first model to print money will be disappointed.
- CLV guided the iteration. Without CLV tracking, Sam would have concluded after Season 1 that the approach was failing (negative ROI). With CLV, Sam could see that the process was marginally sound (positive CLV) and that specific improvements (dropping moneylines, enhancing the weather model) could lift the edge above the profitability threshold.
- The Bayesian combination was key. Sam's model alone (Brier 0.224) was not good enough to beat the market (Brier 0.213). But the model contributed incremental information that, when combined with the market's signal, produced a combined estimate better than either input alone. This is the practical application of Section 13.1's Bayesian framework.
- Market-specific edges matter. Sam's edge was concentrated in totals (+2.6% CLV) and moderate in spreads (+1.6% CLV). The moneyline market provided no edge. Without per-market CLV tracking, Sam might have continued placing unprofitable moneyline bets.
- The investment in tracking paid off. Sam's detailed journal --- recording the pre-bet thesis, counter-arguments, and post-event review for every bet --- was the foundation for systematic improvement. The journal entries for weather-affected totals repeatedly noted that the model's thesis was validated (weather suppressed scoring as predicted), which provided the confidence to increase the model weight for totals.
The Python Analysis
The accompanying code (case-study-code.py) implements the full modeling pipeline used in this case study, including:
- Elo model with calibration analysis that computes Brier score, log loss, and bin-level calibration tables.
- Bayesian probability combiner that merges model and market probabilities in log-odds space.
- Value identification pipeline that scans for bets where combined probability exceeds implied probability by a configurable threshold.
- Season simulation framework that generates synthetic seasons and tracks CLV across model iterations.
- Rolling CLV monitor with edge decay detection that flags when model performance is deteriorating.
Discussion Questions
- Sam's model outperforms the market on totals but not on moneylines. What specific features of the totals market might make it less efficient than the moneyline market? Is this exploitable long-term?
- The Bayesian combination used model weight 0.30 for spreads and 0.40 for totals. How would you determine the optimal weights? Design an experiment using historical data to calibrate these weights.
- Sam's Season 1 loss was -$234 despite positive CLV. If Season 1 had produced a -$1,500 loss (still consistent with the CLV variance), should Sam have continued? At what loss level should a bettor with positive CLV consider stopping?
- Sam retrained the model once between seasons. Is annual retraining sufficient, or should the model update continuously? Discuss the trade-offs between stability and responsiveness.
- Sam's edge on totals appears driven by weather data and snap count projections. If a competing data provider makes similar projections freely available, how quickly will Sam's edge erode? What preemptive steps should Sam take?