Chapter 13: Value Betting Theory and Practice

> "A value bet is simply a bet where the probability of a given outcome is greater than what the odds reflect. It's the only way to make money long-term." -- Pinnacle Sports educational content
Value betting is the intellectual core of professional sports betting. While Chapter 12 focused on getting the best price once you have decided to bet, this chapter addresses the more fundamental question: how do you determine whether a bet has positive expected value in the first place? We will develop rigorous frameworks for estimating true probabilities, systematically identifying value, tracking your bets with precision, evaluating your edge over time, and adapting when markets evolve.
13.1 True Probability vs. Market Probability
The Fundamental Question
Every bet you place is a statement about probability. When you bet on Team A at +150 (implied probability 40%), you are asserting that Team A's true win probability exceeds 40%. But where does your estimate of "true probability" come from?
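That 40% figure comes from the standard American-odds conversion; as a quick sketch (the helper name is mine):

```python
def implied_prob_from_american(odds: float) -> float:
    """Implied win probability from American odds (vig included)."""
    if odds > 0:
        return 100 / (odds + 100)           # Underdog: risk 100 to win `odds`
    return abs(odds) / (abs(odds) + 100)    # Favorite: risk `odds` to win 100

print(f"{implied_prob_from_american(150):.0%}")  # +150 -> 40%
```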
There are two broad approaches:
- Model-based estimation: Build a quantitative model that outputs probability estimates from input features
- Market-based estimation: Use the market itself (specifically, the no-vig closing line at a sharp book) as the best available estimate, and look for deviations at soft books
Both approaches have strengths and weaknesses, and the best practitioners often combine them.
Model-Based Probability Estimation
A model-based approach constructs an explicit mapping from observable features to outcome probabilities:
$$ P(\text{outcome}) = f(\mathbf{x}) $$
where $\mathbf{x}$ is a vector of features (team ratings, injuries, weather, rest, etc.) and $f$ is your model.
Building a Simple Elo-Based Model
The Elo rating system, originally designed for chess, can be adapted for team sports. It provides a natural probability framework:
import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional
@dataclass
class EloModel:
"""
Elo rating model for team sports with probability estimation.
Parameters
----------
k_factor : float
Learning rate for rating updates (higher = more reactive)
home_advantage : float
Elo points added to home team's rating
initial_rating : float
Starting rating for new teams
mean_reversion : float
Fraction of rating to revert to mean between seasons (0-1)
"""
k_factor: float = 20.0
home_advantage: float = 65.0
initial_rating: float = 1500.0
mean_reversion: float = 0.33
ratings: Dict[str, float] = field(default_factory=dict)
history: List[dict] = field(default_factory=list)
def get_rating(self, team: str) -> float:
"""Get a team's current rating, initializing if necessary."""
if team not in self.ratings:
self.ratings[team] = self.initial_rating
return self.ratings[team]
def predict_probability(
self,
home_team: str,
away_team: str,
neutral: bool = False
) -> Tuple[float, float]:
"""
Predict win probability for each team.
Parameters
----------
home_team : str
away_team : str
neutral : bool
If True, no home advantage is applied
Returns
-------
tuple of (home_win_prob, away_win_prob)
"""
home_rating = self.get_rating(home_team)
away_rating = self.get_rating(away_team)
hfa = 0 if neutral else self.home_advantage
rating_diff = home_rating + hfa - away_rating
# Standard Elo probability formula
home_prob = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
away_prob = 1.0 - home_prob
return home_prob, away_prob
def update(
self,
home_team: str,
away_team: str,
home_score: float,
away_score: float,
neutral: bool = False
):
"""
Update ratings after a game result.
Parameters
----------
home_team, away_team : str
home_score, away_score : float
Actual game scores
neutral : bool
"""
home_prob, away_prob = self.predict_probability(
home_team, away_team, neutral
)
# Determine actual result (1=home win, 0.5=draw, 0=away win)
if home_score > away_score:
home_actual = 1.0
elif home_score < away_score:
home_actual = 0.0
else:
home_actual = 0.5
away_actual = 1.0 - home_actual
        # Margin-of-victory multiplier (FiveThirtyEight-style): larger
        # margins increase the update, damped when the pre-game favorite
        # (measured by the Elo spread, not win probability) wins big
        mov = abs(home_score - away_score)
        hfa = 0 if neutral else self.home_advantage
        winner_elo_diff = (
            self.get_rating(home_team) + hfa - self.get_rating(away_team)
        )
        if home_actual < 0.5:
            winner_elo_diff = -winner_elo_diff
        mov_multiplier = np.log(max(mov, 1) + 1) * (
            2.2 / (winner_elo_diff * 0.001 + 2.2)
        )
# Update ratings
home_update = self.k_factor * mov_multiplier * (home_actual - home_prob)
self.ratings[home_team] = self.get_rating(home_team) + home_update
self.ratings[away_team] = self.get_rating(away_team) - home_update
# Record prediction for calibration analysis
self.history.append({
'home': home_team,
'away': away_team,
'home_prob': home_prob,
'home_actual': home_actual,
'home_score': home_score,
'away_score': away_score,
})
def season_reset(self):
"""Apply mean reversion between seasons."""
for team in self.ratings:
self.ratings[team] = (
self.initial_rating * self.mean_reversion +
self.ratings[team] * (1 - self.mean_reversion)
)
def calibration_analysis(self, n_bins: int = 10) -> dict:
"""
Analyze how well-calibrated the model's probabilities are.
Returns
-------
dict with calibration metrics
"""
if not self.history:
return {}
probs = np.array([h['home_prob'] for h in self.history])
actuals = np.array([h['home_actual'] for h in self.history])
# Bin probabilities
bins = np.linspace(0, 1, n_bins + 1)
calibration = []
for i in range(n_bins):
mask = (probs >= bins[i]) & (probs < bins[i+1])
if mask.sum() > 0:
calibration.append({
'bin_center': (bins[i] + bins[i+1]) / 2,
'predicted': probs[mask].mean(),
'actual': actuals[mask].mean(),
'count': mask.sum(),
})
# Brier score
brier = np.mean((probs - actuals) ** 2)
# Log loss
eps = 1e-15
probs_clipped = np.clip(probs, eps, 1 - eps)
log_loss = -np.mean(
actuals * np.log(probs_clipped) +
(1 - actuals) * np.log(1 - probs_clipped)
)
return {
'calibration': calibration,
'brier_score': brier,
'log_loss': log_loss,
'n_predictions': len(probs),
}
# Example usage
model = EloModel(k_factor=25, home_advantage=70)
# Simulate some games
games = [
('Team A', 'Team B', 28, 24),
('Team C', 'Team A', 31, 17),
('Team B', 'Team C', 21, 21),
('Team A', 'Team C', 35, 28),
('Team B', 'Team A', 17, 30),
]
for home, away, hs, as_ in games:
prob_h, prob_a = model.predict_probability(home, away)
print(f"{home} vs {away}: P(home)={prob_h:.3f}, P(away)={prob_a:.3f} "
f"=> Result: {hs}-{as_}")
model.update(home, away, hs, as_)
print("\nFinal Ratings:")
for team, rating in sorted(model.ratings.items(), key=lambda x: -x[1]):
print(f" {team}: {rating:.1f}")
Beyond Elo: Feature-Rich Models
While Elo provides a solid baseline, advanced models incorporate many more features. Here is a logistic regression approach:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.calibration import calibration_curve
import pandas as pd
import numpy as np
def build_game_prediction_model(
games_df: pd.DataFrame,
feature_columns: list,
target_column: str = 'home_win'
) -> dict:
"""
Build a logistic regression model for game outcome prediction.
Parameters
----------
games_df : pd.DataFrame
Historical game data with features
feature_columns : list
Column names to use as features
target_column : str
Binary target column (1=home win, 0=away win)
Returns
-------
dict with model, calibration metrics, and cross-validation scores
"""
X = games_df[feature_columns].values
y = games_df[target_column].values
# Fit model
model = LogisticRegression(
penalty='l2',
C=1.0,
max_iter=1000,
random_state=42
)
model.fit(X, y)
# Cross-validation
cv_scores = cross_val_score(
model, X, y, cv=5, scoring='neg_brier_score'
)
# Calibration
predicted_probs = model.predict_proba(X)[:, 1]
fraction_positive, mean_predicted = calibration_curve(
y, predicted_probs, n_bins=10
)
# Feature importance
importance = dict(zip(feature_columns, model.coef_[0]))
return {
'model': model,
'cv_brier_scores': -cv_scores,
'mean_brier': -cv_scores.mean(),
'calibration_actual': fraction_positive,
'calibration_predicted': mean_predicted,
'feature_importance': importance,
}
# Example feature set for NFL prediction
example_features = [
'home_elo_diff', # Elo rating difference
'home_off_dvoa', # Offensive efficiency
'away_off_dvoa',
'home_def_dvoa', # Defensive efficiency
'away_def_dvoa',
'home_rest_days', # Days since last game
'away_rest_days',
'home_travel_miles', # Travel distance
'away_travel_miles',
'is_divisional', # Divisional rivalry game
'home_injuries_impact', # Injury-adjusted rating
'away_injuries_impact',
'dome_game', # Indoor vs outdoor
'temperature', # Game-time temperature
'wind_speed', # Wind speed at venue
]
Market-Based Probability Estimation
The alternative to building your own model is to treat the betting market itself as a highly sophisticated prediction machine. The efficient market hypothesis (applied to sports betting) suggests that the closing line at a sharp sportsbook (like Pinnacle) represents the best available estimate of true probability.
Under this framework, your goal shifts from "estimating the true probability" to "finding sportsbooks whose lines deviate from the sharp market consensus."
$$ \text{Value} = p_{\text{sharp closing}} - p_{\text{implied at soft book}} $$
This approach has significant advantages:
| Aspect | Model-Based | Market-Based |
|---|---|---|
| Skill required | Very high (statistics, domain expertise) | Moderate (line comparison, accounts) |
| Data requirements | Extensive historical data, features | Real-time odds from sharp and soft books |
| Scalability | Hard to maintain across many sports | Easy to apply across all sports |
| Edge source | Model outperforms market | Soft books lag sharp market |
| Risk | Model error, overfitting | Account restrictions, line speed |
| Sustainability | Lasts as long as model edge exists | Lasts until books improve pricing |
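In code, the market-based workflow is only a few lines: strip the vig from the sharp book's two-way prices by normalizing, then compare against the soft book's implied probability. A minimal sketch, with hypothetical odds:

```python
def american_to_prob(odds: float) -> float:
    """Implied probability (vig included) from American odds."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)

def no_vig_prob(odds_side: float, odds_other_side: float) -> float:
    """No-vig probability for one side of a two-way market."""
    p1 = american_to_prob(odds_side)
    p2 = american_to_prob(odds_other_side)
    return p1 / (p1 + p2)  # Normalize so the two sides sum to 1

# Hypothetical sharp book prices the market -110 / -102
p_sharp = no_vig_prob(-110, -102)
# A soft book offers +105 on the same side
p_soft_implied = american_to_prob(105)
print(f"Sharp no-vig: {p_sharp:.3f}, soft implied: {p_soft_implied:.3f}, "
      f"value: {p_sharp - p_soft_implied:+.3f}")
```

If the sharp no-vig probability exceeds the soft book's implied probability, the soft price is a candidate value bet.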
Key Insight: Many professional bettors use a hybrid approach. They build models to identify which side of a game they favor, then use market-based analysis (closing line value) to validate that their model actually has an edge. If your model consistently agrees with the direction the sharp market moves, your model has genuine predictive value.
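One way to run that model-versus-market check: compare the direction your model leans off the opening line with the direction the sharp line actually moves. A sketch (the helper name and sample numbers are illustrative):

```python
import numpy as np

def directional_agreement(model_probs, open_probs, close_probs):
    """
    Fraction of games where the model's disagreement with the opening
    line predicted the direction the sharp line subsequently moved.
    """
    model_probs = np.asarray(model_probs)
    open_probs = np.asarray(open_probs)
    close_probs = np.asarray(close_probs)
    model_lean = np.sign(model_probs - open_probs)
    line_move = np.sign(close_probs - open_probs)
    moved = line_move != 0  # Ignore games where the line never moved
    return float(np.mean(model_lean[moved] == line_move[moved]))

# Hypothetical model vs opening/closing no-vig probabilities
agreement = directional_agreement(
    model_probs=[0.55, 0.40, 0.62, 0.48],
    open_probs=[0.52, 0.44, 0.60, 0.48],
    close_probs=[0.56, 0.41, 0.58, 0.50],
)
print(f"Directional agreement: {agreement:.0%}")
```

On this toy sample the model anticipated the line move in 2 of 4 games; over a large sample, sustained agreement well above 50% is evidence of genuine predictive value.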
Estimating "True" Probability: A Bayesian Framework
We can formalize the combination of model-based and market-based information using Bayesian updating:
$$ P(\text{outcome} | \text{model, market}) \propto P(\text{outcome} | \text{model}) \times P(\text{market odds} | \text{outcome}) $$
In practice, this means combining your model's probability with the market's probability, weighting by your confidence in each:
def bayesian_probability_combination(
model_prob: float,
market_prob: float,
model_confidence: float = 0.3,
market_confidence: float = 0.7
) -> float:
"""
Combine model and market probabilities using a weighted log-odds approach.
This is equivalent to a Bayesian update under the assumption that
both model and market provide independent information, weighted
by our confidence in each.
Parameters
----------
model_prob : float
Your model's estimated probability (0 to 1)
market_prob : float
Market-implied no-vig probability (0 to 1)
model_confidence : float
Weight on model (0 to 1)
market_confidence : float
Weight on market (0 to 1)
Returns
-------
float
Combined probability estimate
"""
# Convert to log-odds (logit)
def logit(p):
p = np.clip(p, 1e-10, 1 - 1e-10)
return np.log(p / (1 - p))
def inv_logit(x):
return 1 / (1 + np.exp(-x))
# Normalize weights
total = model_confidence + market_confidence
w_model = model_confidence / total
w_market = market_confidence / total
# Combine in log-odds space (more appropriate than linear averaging)
combined_logit = (
w_model * logit(model_prob) +
w_market * logit(market_prob)
)
return inv_logit(combined_logit)
# Example: Model says 55%, market says 48%
combined = bayesian_probability_combination(
model_prob=0.55,
market_prob=0.48,
model_confidence=0.3,
market_confidence=0.7
)
print(f"Model: 55.0%, Market: 48.0%, Combined: {combined*100:.1f}%")
# Output: Model: 55.0%, Market: 48.0%, Combined: 50.1%
The log-odds weighting is superior to simple linear averaging because it correctly handles probabilities near the extremes (0 or 1) and respects the multiplicative nature of odds.
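The difference shows up clearly near the extremes. A quick comparison on an illustrative pair of estimates:

```python
import numpy as np

def logit(p: float) -> float:
    return np.log(p / (1 - p))

def inv_logit(x: float) -> float:
    return 1 / (1 + np.exp(-x))

model_p, market_p = 0.98, 0.80
linear = 0.5 * model_p + 0.5 * market_p
log_odds = inv_logit(0.5 * (logit(model_p) + logit(market_p)))
print(f"Linear average: {linear:.3f}, log-odds average: {log_odds:.3f}")
```

The log-odds average equals the geometric mean of the two odds (here sqrt(49 * 4) = 14, i.e. 14/15 ~ 0.933), so it respects their multiplicative structure; the linear average (0.890) treats probability differences near 0.9 the same as differences near 0.5.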
13.2 Systematic Value Identification
The Value Equation
A bet has positive expected value when:
$$ E[\text{profit}] = p_{\text{true}} \times \text{win amount} - (1 - p_{\text{true}}) \times \text{lose amount} > 0 $$
Equivalently, for a bet at American odds $o$:
$$ \text{Value exists when: } p_{\text{true}} > p_{\text{implied}}(o) $$
The percentage edge is:
$$ \text{Edge} = \frac{p_{\text{true}}}{p_{\text{implied}}} - 1 $$
For example, if you estimate the true probability at 45% and the implied probability from the odds is 40%:
$$ \text{Edge} = \frac{0.45}{0.40} - 1 = 12.5\% $$
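Starting from American odds rather than a pre-computed implied probability, the same arithmetic looks like this (using +150, whose implied probability is the 40% in the example):

```python
def implied_prob(american_odds: float) -> float:
    """Implied probability from American odds."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return abs(american_odds) / (abs(american_odds) + 100)

def edge(true_prob: float, american_odds: float) -> float:
    """Percentage edge of a probability estimate over the quoted odds."""
    return true_prob / implied_prob(american_odds) - 1

print(f"Edge: {edge(0.45, 150):.1%}")  # 0.45 / 0.40 - 1 = 12.5%
```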
Edge Thresholds: How Much Edge Do You Need?
Not all positive-edge bets are worth taking. You need to consider:
- Estimation uncertainty: Your probability estimate has error bars
- Transaction costs: Time, effort, opportunity cost of capital
- Vig erosion: Even "value" bets pay vig, reducing your effective edge
- Variance tolerance: Low-edge bets have high variance relative to expected return
Here is a framework for minimum edge thresholds:
def minimum_edge_threshold(
confidence_in_estimate: float,
probability_range: tuple,
bet_type: str = 'standard',
kelly_fraction: float = 0.25
) -> float:
    """
    Calculate the minimum edge required before placing a bet.

    The threshold scales inversely with your confidence in the
    probability estimate: the less certain you are, the larger the
    perceived edge must be before the bet is worth taking.

    Parameters
    ----------
    confidence_in_estimate : float
        How confident you are in your probability estimate (0-1).
        This determines how much your estimated edge is discounted.
    probability_range : tuple
        (lower_bound, upper_bound) of your probability estimate.
        A stricter variant would additionally require the edge to
        remain positive at the lower bound of this range.
    bet_type : str
        'standard' (sides/totals), 'prop', 'live', or 'futures'
    kelly_fraction : float
        Your Kelly fraction (typically 0.2-0.3)

    Returns
    -------
    float
        Minimum edge threshold (as a proportion, not percentage)
    """
    # Base threshold varies by bet type
    base_thresholds = {
        'standard': 0.02,  # 2% for efficient markets
        'prop': 0.03,      # 3% for less efficient prop markets
        'live': 0.04,      # 4% for fast-moving live markets
        'futures': 0.05,   # 5% for high-vig futures
    }
    base = base_thresholds.get(bet_type, 0.03)
    # Scale the base threshold by confidence: lower confidence demands
    # a proportionally larger perceived edge before betting
    required_edge = base / confidence_in_estimate
    return required_edge
# Example: You estimate Team A has a 55% chance (range: 50%-60%)
threshold = minimum_edge_threshold(
confidence_in_estimate=0.7,
probability_range=(0.50, 0.60),
bet_type='standard'
)
print(f"Minimum edge threshold: {threshold*100:.1f}%")
# A 70% confident bettor in a standard market needs ~2.9% edge
A Framework for Systematic Value Identification
Here is a complete workflow for identifying value bets:
from dataclasses import dataclass
from enum import Enum
from typing import List
class ValueRating(Enum):
"""Rating categories for value opportunities."""
NO_VALUE = 0
MARGINAL = 1 # Edge 1-2%: generally not worth betting
MODERATE = 2 # Edge 2-4%: bet at reduced Kelly
STRONG = 3 # Edge 4-7%: bet at standard Kelly
VERY_STRONG = 4 # Edge 7%+: bet at full Kelly fraction
@dataclass
class ValueAssessment:
"""Complete assessment of a potential value bet."""
event: str
selection: str
market: str
your_probability: float
your_prob_lower: float
your_prob_upper: float
best_available_odds: float
best_book: str
implied_probability: float
edge: float
edge_lower: float
edge_upper: float
value_rating: ValueRating
recommended_kelly: float
recommended_stake_pct: float
confidence_level: float
notes: str = ""
def assess_value(
event: str,
selection: str,
market: str,
your_prob: float,
your_prob_std: float,
best_odds: float,
best_book: str,
bankroll: float = 10000,
kelly_fraction: float = 0.25
) -> ValueAssessment:
"""
Perform a complete value assessment for a potential bet.
Parameters
----------
event : str
Event description
selection : str
What you're betting on
market : str
Market type
your_prob : float
Your point estimate of the probability
your_prob_std : float
Standard deviation of your probability estimate
best_odds : float
Best available American odds
best_book : str
Sportsbook offering the best odds
bankroll : float
Current bankroll
kelly_fraction : float
Fraction of full Kelly to use
Returns
-------
ValueAssessment
"""
# Calculate implied probability from odds
if best_odds > 0:
implied = 100 / (best_odds + 100)
else:
implied = abs(best_odds) / (abs(best_odds) + 100)
# Calculate edge and confidence interval
edge = your_prob / implied - 1
prob_lower = max(0.01, your_prob - 1.96 * your_prob_std)
prob_upper = min(0.99, your_prob + 1.96 * your_prob_std)
edge_lower = prob_lower / implied - 1
edge_upper = prob_upper / implied - 1
# Confidence that edge is positive
# (probability that true prob > implied prob)
from scipy.stats import norm
z = (your_prob - implied) / your_prob_std
confidence = norm.cdf(z)
# Determine value rating
if edge_lower > 0.07:
rating = ValueRating.VERY_STRONG
elif edge_lower > 0.04:
rating = ValueRating.STRONG
elif edge_lower > 0.01:
rating = ValueRating.MODERATE
elif edge > 0.01:
rating = ValueRating.MARGINAL
else:
rating = ValueRating.NO_VALUE
# Kelly criterion calculation
if best_odds > 0:
decimal_odds = best_odds / 100 + 1
else:
decimal_odds = 100 / abs(best_odds) + 1
b = decimal_odds - 1 # Net odds
full_kelly = (your_prob * b - (1 - your_prob)) / b
full_kelly = max(0, full_kelly)
# Apply fraction and confidence adjustment
adjusted_kelly = full_kelly * kelly_fraction * min(confidence, 1.0)
recommended_stake = adjusted_kelly * bankroll
return ValueAssessment(
event=event,
selection=selection,
market=market,
your_probability=your_prob,
your_prob_lower=prob_lower,
your_prob_upper=prob_upper,
best_available_odds=best_odds,
best_book=best_book,
implied_probability=implied,
edge=edge,
edge_lower=edge_lower,
edge_upper=edge_upper,
value_rating=rating,
recommended_kelly=adjusted_kelly,
recommended_stake_pct=adjusted_kelly * 100,
confidence_level=confidence,
)
# Example usage
assessment = assess_value(
event="Patriots vs Bills",
selection="Patriots +3",
market="spread",
your_prob=0.55,
your_prob_std=0.04,
best_odds=-105,
best_book="BookD",
)
print(f"Value Assessment: {assessment.event}")
print(f" Selection: {assessment.selection}")
print(f" Your Probability: {assessment.your_probability:.1%}")
print(f" 95% CI: [{assessment.your_prob_lower:.1%}, {assessment.your_prob_upper:.1%}]")
print(f" Implied Probability: {assessment.implied_probability:.1%}")
print(f" Edge: {assessment.edge:.1%}")
print(f" Edge 95% CI: [{assessment.edge_lower:.1%}, {assessment.edge_upper:.1%}]")
print(f" Confidence Edge > 0: {assessment.confidence_level:.1%}")
print(f" Value Rating: {assessment.value_rating.name}")
print(f" Recommended Stake: {assessment.recommended_stake_pct:.2f}% of bankroll")
Multi-Factor Value Scoring
In practice, value is not one-dimensional. A sophisticated bettor considers multiple factors when evaluating a bet:
def multi_factor_value_score(
edge: float,
clv_track_record: float,
market_efficiency: float,
liquidity: float,
correlation_with_existing_bets: float,
time_to_event_hours: float
) -> float:
"""
Compute a composite value score combining multiple factors.
Parameters
----------
edge : float
Estimated edge (e.g., 0.05 for 5%)
clv_track_record : float
Your historical CLV in this market type (-1 to +1, typically -0.05 to +0.05)
market_efficiency : float
How efficient the market is (0=very efficient, 1=very inefficient)
liquidity : float
How easy it is to get your desired stake down (0=impossible, 1=easy)
correlation_with_existing_bets : float
Correlation with bets already in your portfolio (0=uncorrelated, 1=identical)
time_to_event_hours : float
Hours until the event starts
Returns
-------
float
Composite value score (higher = more attractive)
"""
# Weight each factor
weights = {
'edge': 0.35,
'track_record': 0.20,
'market_efficiency': 0.15,
'liquidity': 0.10,
'diversification': 0.10,
'timing': 0.10,
}
# Normalize edge to 0-1 scale (cap at 15% edge)
edge_score = min(edge / 0.15, 1.0)
# Track record: positive CLV history boosts confidence
track_score = max(0, min(1, (clv_track_record + 0.05) / 0.10))
# Market efficiency: prefer less efficient markets
efficiency_score = market_efficiency
# Liquidity: penalize if can't get full size down
liquidity_score = liquidity
# Diversification: penalize correlated bets
diversification_score = 1.0 - correlation_with_existing_bets
# Timing: slight preference for events further out (more time to manage)
timing_score = min(time_to_event_hours / 48, 1.0)
composite = (
weights['edge'] * edge_score +
weights['track_record'] * track_score +
weights['market_efficiency'] * efficiency_score +
weights['liquidity'] * liquidity_score +
weights['diversification'] * diversification_score +
weights['timing'] * timing_score
)
return composite
Callout: The "Confidence Trap"
One of the most common mistakes in value betting is overconfidence in your probability estimates. If your model says a team has a 60% chance of winning and the market says 50%, the most likely explanation is not that you have found a 10% edge -- it is that your model is wrong. Always ask: "Why would the market be mispricing this?" If you cannot articulate a specific, verifiable reason (e.g., injury not yet priced in, weather change, biased public perception), treat your edge estimate with extreme skepticism and reduce your stake accordingly.
13.3 Tracking and Recording Bets
Why Tracking Matters
Professional bettors are meticulous record-keepers. Without detailed tracking, you cannot:
- Calculate your actual ROI across different sports, markets, and bet types
- Measure your CLV to determine if your betting process is sound
- Identify strengths and weaknesses in your approach
- Detect when your edge is decaying and adapt accordingly
- File accurate tax returns (in jurisdictions where gambling income is taxable)
What to Track
At minimum, record the following for every bet:
| Field | Example | Purpose |
|---|---|---|
| Date/Time placed | 2025-01-15 14:32 EST | Timing analysis |
| Sport | NFL | Filter by sport |
| Event | Patriots vs Bills | Game identification |
| Market | Spread | Market type analysis |
| Selection | Patriots +3 | What you bet |
| Odds at placement | -108 | CLV calculation |
| Closing odds | -114 | CLV calculation |
| Closing spread (if applicable) | Patriots +2.5 | Spread CLV |
| Stake | $250 | P&L calculation |
| Sportsbook | BookD | Book performance tracking |
| Result | Win | P&L calculation |
| Profit/Loss | +$231.48 | Bottom line |
| Your estimated probability | 0.55 | Edge tracking |
| Model used | Elo+LR v3.2 | Model comparison |
| Confidence level | High | Qualitative assessment |
| Notes | "Weather shift, wind 25mph" | Context for review |
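The CLV columns in the table come straight from the odds conversion. A self-contained sketch using the table's example numbers (-108 at placement, -114 at close):

```python
def american_to_implied_prob(odds: float) -> float:
    """Convert American odds to implied probability."""
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    return 100 / (odds + 100)

placed = american_to_implied_prob(-108)   # ~0.519
closing = american_to_implied_prob(-114)  # ~0.533
clv = closing - placed                    # Positive: you beat the close
print(f"CLV: {clv*100:+.2f} probability points")  # +1.35
```

The line moved toward the side you bet, so the price you locked in was cheaper than the closing consensus: positive CLV.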
Building a Bet Tracking System
"""
bet_tracker.py
A comprehensive bet tracking and journaling system.
"""
import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timezone
from typing import Optional, List
from dataclasses import dataclass, asdict
@dataclass
class Bet:
"""Represents a single bet with all relevant metadata."""
# Required fields
sport: str
event: str
market: str
selection: str
odds_placed: float
stake: float
sportsbook: str
# Optional fields (filled in later)
bet_id: Optional[int] = None
date_placed: Optional[str] = None
odds_closing: Optional[float] = None
spread_placed: Optional[float] = None
spread_closing: Optional[float] = None
result: Optional[str] = None # 'win', 'loss', 'push', 'void'
profit_loss: Optional[float] = None
your_probability: Optional[float] = None
model_name: Optional[str] = None
confidence: Optional[str] = None # 'low', 'medium', 'high'
notes: Optional[str] = None
tags: Optional[str] = None # Comma-separated tags
class BetTracker:
"""
Comprehensive bet tracking and analysis system.
Parameters
----------
db_path : str
Path to SQLite database for bet storage
"""
def __init__(self, db_path: str = "bet_journal.db"):
self.db_path = db_path
self._init_database()
def _init_database(self):
"""Initialize the bet tracking database."""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS bets (
bet_id INTEGER PRIMARY KEY AUTOINCREMENT,
date_placed TEXT NOT NULL,
sport TEXT NOT NULL,
event TEXT NOT NULL,
market TEXT NOT NULL,
selection TEXT NOT NULL,
odds_placed REAL NOT NULL,
odds_closing REAL,
spread_placed REAL,
spread_closing REAL,
stake REAL NOT NULL,
sportsbook TEXT NOT NULL,
result TEXT,
profit_loss REAL,
your_probability REAL,
model_name TEXT,
confidence TEXT,
notes TEXT,
tags TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_bets_sport ON bets(sport)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_bets_date ON bets(date_placed)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_bets_sportsbook ON bets(sportsbook)
""")
conn.commit()
conn.close()
def add_bet(self, bet: Bet) -> int:
"""
Add a new bet to the tracker.
Parameters
----------
bet : Bet
The bet to record
Returns
-------
int
The bet_id of the newly inserted bet
"""
if bet.date_placed is None:
bet.date_placed = datetime.now(timezone.utc).isoformat()
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute("""
INSERT INTO bets (date_placed, sport, event, market, selection,
odds_placed, odds_closing, spread_placed,
spread_closing, stake, sportsbook, result,
profit_loss, your_probability, model_name,
confidence, notes, tags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
bet.date_placed, bet.sport, bet.event, bet.market,
bet.selection, bet.odds_placed, bet.odds_closing,
bet.spread_placed, bet.spread_closing, bet.stake,
bet.sportsbook, bet.result, bet.profit_loss,
bet.your_probability, bet.model_name, bet.confidence,
bet.notes, bet.tags
))
bet_id = cursor.lastrowid
conn.commit()
conn.close()
return bet_id
def update_result(
self,
bet_id: int,
result: str,
odds_closing: Optional[float] = None,
spread_closing: Optional[float] = None
):
"""
Update a bet with its result and closing line information.
Parameters
----------
bet_id : int
result : str
'win', 'loss', 'push', or 'void'
odds_closing : float, optional
Closing odds
spread_closing : float, optional
Closing spread
"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get the bet to calculate P&L
cursor.execute("SELECT odds_placed, stake FROM bets WHERE bet_id = ?",
(bet_id,))
row = cursor.fetchone()
if not row:
conn.close()
raise ValueError(f"Bet {bet_id} not found")
odds, stake = row
# Calculate profit/loss
if result == 'win':
if odds > 0:
profit_loss = stake * odds / 100
else:
profit_loss = stake * 100 / abs(odds)
elif result == 'loss':
profit_loss = -stake
elif result == 'push':
profit_loss = 0
elif result == 'void':
profit_loss = 0
else:
raise ValueError(f"Invalid result: {result}")
cursor.execute("""
UPDATE bets SET
result = ?,
profit_loss = ?,
odds_closing = COALESCE(?, odds_closing),
spread_closing = COALESCE(?, spread_closing),
updated_at = CURRENT_TIMESTAMP
WHERE bet_id = ?
""", (result, profit_loss, odds_closing, spread_closing, bet_id))
conn.commit()
conn.close()
def get_performance_summary(
self,
sport: Optional[str] = None,
market: Optional[str] = None,
sportsbook: Optional[str] = None,
date_from: Optional[str] = None,
date_to: Optional[str] = None
) -> dict:
"""
Generate a comprehensive performance summary with optional filters.
Returns
-------
dict with performance metrics
"""
conn = sqlite3.connect(self.db_path)
query = "SELECT * FROM bets WHERE result IS NOT NULL"
params = []
if sport:
query += " AND sport = ?"
params.append(sport)
if market:
query += " AND market = ?"
params.append(market)
if sportsbook:
query += " AND sportsbook = ?"
params.append(sportsbook)
if date_from:
query += " AND date_placed >= ?"
params.append(date_from)
if date_to:
query += " AND date_placed <= ?"
params.append(date_to)
df = pd.read_sql_query(query, conn, params=params)
conn.close()
if len(df) == 0:
return {'error': 'No bets found matching criteria'}
# Core metrics
total_bets = len(df)
wins = (df['result'] == 'win').sum()
losses = (df['result'] == 'loss').sum()
pushes = (df['result'] == 'push').sum()
total_staked = df['stake'].sum()
total_profit = df['profit_loss'].sum()
roi = total_profit / total_staked if total_staked > 0 else 0
# CLV metrics (where closing odds are available)
clv_df = df.dropna(subset=['odds_closing'])
if len(clv_df) > 0:
clv_values = []
for _, row in clv_df.iterrows():
placed_implied = american_to_implied_prob(row['odds_placed'])
closing_implied = american_to_implied_prob(row['odds_closing'])
clv = closing_implied - placed_implied
clv_values.append(clv)
mean_clv = np.mean(clv_values)
clv_n = len(clv_values)
else:
mean_clv = None
clv_n = 0
        # Win rate with 95% confidence interval (Wilson score interval)
        z = 1.96
n = wins + losses # Exclude pushes
if n > 0:
p_hat = wins / n
denominator = 1 + z**2 / n
center = (p_hat + z**2 / (2*n)) / denominator
spread = z * np.sqrt(
(p_hat * (1 - p_hat) + z**2 / (4*n)) / n
) / denominator
win_rate_ci = (center - spread, center + spread)
else:
p_hat = 0
win_rate_ci = (0, 0)
return {
'total_bets': total_bets,
'wins': wins,
'losses': losses,
'pushes': pushes,
'win_rate': p_hat,
'win_rate_ci_95': win_rate_ci,
'total_staked': total_staked,
'total_profit': total_profit,
'roi': roi,
'roi_pct': roi * 100,
'mean_clv': mean_clv,
'clv_bets_tracked': clv_n,
'avg_odds': df['odds_placed'].mean(),
'avg_stake': df['stake'].mean(),
'max_win': df['profit_loss'].max(),
'max_loss': df['profit_loss'].min(),
'longest_win_streak': _longest_streak(df, 'win'),
'longest_loss_streak': _longest_streak(df, 'loss'),
}
def _longest_streak(df: pd.DataFrame, result_type: str) -> int:
"""Calculate the longest streak of a given result type."""
max_streak = 0
current_streak = 0
for result in df['result']:
if result == result_type:
current_streak += 1
max_streak = max(max_streak, current_streak)
else:
current_streak = 0
return max_streak
def american_to_implied_prob(odds: float) -> float:
"""Convert American odds to implied probability."""
if odds < 0:
return abs(odds) / (abs(odds) + 100)
else:
return 100 / (odds + 100)
Building a Betting Journal: Beyond Raw Numbers
A bet tracker captures the quantitative data, but a betting journal captures the qualitative reasoning. For each bet, record:
- Pre-bet thesis: Why do you believe this bet has value? What specific factor does the market undervalue?
- Counter-arguments: What could make this bet wrong? What is the strongest argument against your position?
- Key variables: What information, if it changes, would cause you to reverse your opinion?
- Post-event review: Was your thesis correct? Did you win/lose for the right reasons?
@dataclass
class JournalEntry:
"""A qualitative journal entry accompanying a bet."""
bet_id: int
pre_bet_thesis: str
counter_arguments: str
key_variables: str
confidence_reasoning: str
post_event_review: Optional[str] = None
lessons_learned: Optional[str] = None
# Example journal entry
entry = JournalEntry(
bet_id=142,
pre_bet_thesis=(
"Patriots +3 has value because the market overweights the Bills' "
"recent offensive output, which came against weak defenses (JAX, NYG). "
"The Patriots defense ranks top-5 in pressure rate and the Bills' "
"O-line has been graded poorly by PFF in the last 4 weeks."
),
counter_arguments=(
"Bills have won 6 straight, covering the spread in 4 of them. Josh Allen "
"historically plays well in cold weather. Patriots offense has been "
"anemic, averaging 14 ppg in the last 3."
),
key_variables=(
"Wind speed (above 20 mph it favors the Under more than either side). "
"Trent Brown availability for the Bills OL. "
"Patriots' WR1 status (questionable with hamstring)."
),
confidence_reasoning=(
"Medium-High confidence. The defensive matchup angle is strong and "
"quantifiable. My model shows +3.5 as fair spread. Getting +3 at -105 "
"is marginal but above threshold."
),
)
Callout: The Reviewing Habit
Set aside time weekly to review your bets -- not just the results, but your reasoning. The most valuable learning happens when you correctly identified value but lost (bad luck, good process) or when you incorrectly identified value but won (good luck, bad process). Track how often your pre-bet thesis was validated by post-game analysis. This meta-tracking is what separates the bettor who improves over time from the one who stays stagnant.
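The meta-tracking described in the callout can be as simple as a two-way table of process quality against outcome. A minimal sketch, assuming a hypothetical `thesis_validated` flag that you record during the post-event review (it is not a column the tracker above stores):

```python
import pandas as pd

# Hypothetical review log: the outcome of each bet plus whether the
# post-event review validated the pre-bet thesis (good process).
reviews = pd.DataFrame({
    'result': ['win', 'win', 'loss', 'loss', 'win', 'loss', 'win', 'loss'],
    'thesis_validated': [True, True, True, False, False, True, True, True],
})

# Rows: was the process good? Columns: did the bet win?
table = pd.crosstab(reviews['thesis_validated'], reviews['result'])
print(table)

# The off-diagonal cells are the instructive ones:
#   validated thesis + loss -> bad luck, good process (keep doing this)
#   invalid thesis + win    -> good luck, bad process (do not repeat)
validation_rate = reviews['thesis_validated'].mean()
print(f"Thesis validation rate: {validation_rate:.0%}")
```

Over time, a rising validation rate with flat results suggests variance; a falling one suggests the reasoning itself needs work.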
13.4 Evaluating Your Edge Over Time
The Sample Size Problem
Sports betting has inherently high variance. Even a bettor with a genuine 3% ROI edge will experience substantial swings:
import numpy as np
from scipy import stats
def required_sample_size(
true_roi: float,
avg_odds: float = -110,
confidence: float = 0.95,
power: float = 0.80
) -> int:
"""
Calculate the number of bets needed to statistically confirm an edge.
Uses a one-sample z-test framework where the null hypothesis
is ROI = 0 and the alternative is ROI = true_roi.
Parameters
----------
true_roi : float
Your true ROI (e.g., 0.03 for 3%)
avg_odds : float
Average American odds of your bets
confidence : float
Desired confidence level (1 - alpha)
power : float
Desired statistical power (1 - beta)
Returns
-------
int
Minimum number of bets required
"""
# Standard deviation of a single bet's return
# For a -110 bet: win +$90.91 or lose -$100 on a $100 bet
if avg_odds < 0:
win_payout = 100 / abs(avg_odds)
else:
win_payout = avg_odds / 100
# Approximate win probability given the ROI
# Expected return = p * win_payout - (1-p) = ROI
# p * (win_payout + 1) - 1 = ROI
# p = (1 + ROI) / (1 + win_payout)
p = (1 + true_roi) / (1 + win_payout)
# Variance of a single bet's return (per $1 staked)
variance = p * win_payout**2 + (1-p) * 1 - true_roi**2
sigma = np.sqrt(variance)
# Required sample size (z-test)
z_alpha = stats.norm.ppf(1 - (1 - confidence) / 2)
z_beta = stats.norm.ppf(power)
n = ((z_alpha + z_beta) * sigma / true_roi) ** 2
return int(np.ceil(n))
# How many bets to confirm various edges?
for roi in [0.01, 0.02, 0.03, 0.05, 0.08, 0.10]:
n = required_sample_size(roi)
print(f" ROI = {roi*100:.0f}%: {n:>6,} bets needed")
Expected output:
ROI = 1%: 71,275 bets needed
ROI = 2%: 17,795 bets needed
ROI = 3%: 7,897 bets needed
ROI = 5%: 2,833 bets needed
ROI = 8%: 1,099 bets needed
ROI = 10%: 699 bets needed
These numbers are sobering. A bettor with a solid 3% ROI edge needs nearly 8,000 bets at standard -110 juice to confirm their edge at 95% confidence with 80% power. At 5 bets per day, that is more than four years of betting.
Confidence Intervals on ROI
Rather than waiting for statistical significance, a more practical approach is to track your confidence interval on ROI and watch it narrow over time:
def roi_confidence_interval(
bets_df: pd.DataFrame,
confidence: float = 0.95,
method: str = 'bootstrap'
) -> dict:
"""
Calculate confidence interval on ROI.
Parameters
----------
bets_df : pd.DataFrame
Must have 'stake' and 'profit_loss' columns
confidence : float
Confidence level
method : str
'normal' for normal approximation, 'bootstrap' for bootstrap
Returns
-------
dict with ROI estimate and confidence interval
"""
stakes = bets_df['stake'].values
profits = bets_df['profit_loss'].values
returns = profits / stakes # Per-bet return
roi = profits.sum() / stakes.sum()
n = len(returns)
if method == 'normal':
se = returns.std(ddof=1) / np.sqrt(n)
z = stats.norm.ppf((1 + confidence) / 2)
ci = (roi - z * se, roi + z * se)
elif method == 'bootstrap':
n_bootstrap = 10000
bootstrap_rois = []
for _ in range(n_bootstrap):
idx = np.random.choice(n, size=n, replace=True)
boot_roi = profits[idx].sum() / stakes[idx].sum()
bootstrap_rois.append(boot_roi)
alpha = (1 - confidence) / 2
ci = (
np.percentile(bootstrap_rois, alpha * 100),
np.percentile(bootstrap_rois, (1 - alpha) * 100)
)
return {
'roi': roi,
'roi_pct': roi * 100,
'ci_lower': ci[0],
'ci_upper': ci[1],
'ci_lower_pct': ci[0] * 100,
'ci_upper_pct': ci[1] * 100,
'n_bets': n,
'total_staked': stakes.sum(),
'total_profit': profits.sum(),
}
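To see the bootstrap interval behave as expected, here is a self-contained run on synthetic bets rather than a tracker export: flat $100 stakes at -110 with an assumed 54% true win rate (a real but modest edge). The roi_confidence_interval function above would produce the same kind of interval from real data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
stake = 100.0
win_payout = 100 / 110  # profit per $1 staked on a -110 winner

# Synthetic results: 54% true win rate
wins = rng.random(n) < 0.54
profits = np.where(wins, stake * win_payout, -stake)

roi = profits.sum() / (n * stake)

# Percentile bootstrap on ROI, resampling bets with replacement
boot = np.empty(10_000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[i] = profits[idx].sum() / (n * stake)
ci_lower, ci_upper = np.percentile(boot, [2.5, 97.5])

print(f"ROI: {roi:+.2%}, 95% CI: [{ci_lower:+.2%}, {ci_upper:+.2%}]")
```

Note how wide the interval still is after 500 bets: roughly ±8 percentage points, consistent with the sample size analysis above.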
Regression to the Mean: Understanding Variance
A critical concept for evaluating performance is regression to the mean. Early results are heavily influenced by variance, and extreme early performance (both good and bad) tends to moderate over time.
def simulate_regression_to_mean(
true_roi: float = 0.03,
n_bets: int = 2000,
n_simulations: int = 5000,
checkpoints: list = None
) -> pd.DataFrame:
"""
Simulate how observed ROI converges to true ROI over time.
Parameters
----------
true_roi : float
True underlying ROI
n_bets : int
Maximum number of bets to simulate
n_simulations : int
Number of simulation paths
checkpoints : list of int
Bet counts at which to record statistics
Returns
-------
pd.DataFrame with statistics at each checkpoint
"""
if checkpoints is None:
checkpoints = [50, 100, 200, 500, 1000, 2000]
# Win probability for -110 bets with given ROI
win_payout = 100 / 110 # ~0.909
p_win = (1 + true_roi) / (1 + win_payout)
results = []
for cp in checkpoints:
observed_rois = []
for _ in range(n_simulations):
outcomes = np.random.binomial(1, p_win, cp)
profit = outcomes.sum() * win_payout - (1 - outcomes).sum()
observed_roi = profit / cp
observed_rois.append(observed_roi)
observed_rois = np.array(observed_rois)
results.append({
'n_bets': cp,
'mean_roi': observed_rois.mean(),
'std_roi': observed_rois.std(),
'pct_profitable': (observed_rois > 0).mean() * 100,
'pct_within_1pct': (
(observed_rois > true_roi - 0.01) &
(observed_rois < true_roi + 0.01)
).mean() * 100,
'worst_5pct': np.percentile(observed_rois, 5) * 100,
'best_5pct': np.percentile(observed_rois, 95) * 100,
})
return pd.DataFrame(results)
# Run and display
convergence = simulate_regression_to_mean(true_roi=0.03)
print("Regression to Mean: 3% True ROI Bettor at -110")
print(f"{'Bets':>6} | {'Mean ROI':>9} | {'Std Dev':>8} | "
f"{'% Profitable':>13} | {'5th %ile':>9} | {'95th %ile':>10}")
print("-" * 70)
for _, row in convergence.iterrows():
print(f"{row['n_bets']:>6.0f} | {row['mean_roi']*100:>8.2f}% | "
f"{row['std_roi']*100:>7.2f}% | {row['pct_profitable']:>12.1f}% | "
f"{row['worst_5pct']:>8.1f}% | {row['best_5pct']:>9.1f}%")
Expected output:
Regression to Mean: 3% True ROI Bettor at -110
Bets | Mean ROI | Std Dev | % Profitable | 5th %ile | 95th %ile
----------------------------------------------------------------------
50 | 3.01% | 13.60% | 58.8% | -19.5% | 25.5%
100 | 3.00% | 9.62% | 62.1% | -12.8% | 18.8%
200 | 3.00% | 6.80% | 67.0% | -8.2% | 14.2%
500 | 3.01% | 4.30% | 75.7% | -4.1% | 10.1%
1000 | 3.00% | 3.04% | 83.8% | -2.0% | 8.0%
2000 | 3.00% | 2.15% | 91.8% | -0.5% | 6.5%
Notice that even after 200 bets, a 3% ROI bettor is only profitable 67% of the time in simulation. This is why patience and process-focus (CLV) are so important.
Multi-Dimensional Performance Evaluation
Evaluate performance across multiple dimensions simultaneously:
def comprehensive_evaluation(tracker: BetTracker) -> dict:
"""
Perform multi-dimensional performance evaluation.
Returns performance broken down by:
- Sport
- Market type
- Sportsbook
- Day of week
- Odds range
- Confidence level
"""
conn = sqlite3.connect(tracker.db_path)
df = pd.read_sql_query(
"SELECT * FROM bets WHERE result IS NOT NULL", conn
)
conn.close()
if len(df) == 0:
return {}
evaluations = {}
# By sport
sport_perf = {}
for sport in df['sport'].unique():
subset = df[df['sport'] == sport]
sport_perf[sport] = {
'n_bets': len(subset),
'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
'win_rate': (subset['result'] == 'win').mean(),
}
evaluations['by_sport'] = sport_perf
# By market type
market_perf = {}
for market in df['market'].unique():
subset = df[df['market'] == market]
market_perf[market] = {
'n_bets': len(subset),
'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
'win_rate': (subset['result'] == 'win').mean(),
}
evaluations['by_market'] = market_perf
# By odds range
df['odds_bucket'] = pd.cut(
df['odds_placed'],
bins=[-500, -200, -150, -120, -100, 100, 120, 150, 200, 500],
# 9 bin edges minus 1 = 9 intervals, so pd.cut needs 9 labels
labels=['Heavy Fav', 'Big Fav', 'Med Fav', 'Slight Fav', 'Pick',
'Slight Dog', 'Med Dog', 'Big Dog', 'Heavy Dog']
)
odds_perf = {}
for bucket in df['odds_bucket'].dropna().unique():
subset = df[df['odds_bucket'] == bucket]
if len(subset) >= 10:
odds_perf[str(bucket)] = {
'n_bets': len(subset),
'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
'win_rate': (subset['result'] == 'win').mean(),
}
evaluations['by_odds_range'] = odds_perf
# By confidence level
if df['confidence'].notna().any():
conf_perf = {}
for conf in df['confidence'].dropna().unique():
subset = df[df['confidence'] == conf]
conf_perf[conf] = {
'n_bets': len(subset),
'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
'win_rate': (subset['result'] == 'win').mean(),
}
evaluations['by_confidence'] = conf_perf
return evaluations
13.5 When Markets Correct: Adapting Your Approach
The Lifecycle of an Edge
Every edge in sports betting has a lifecycle:
- Discovery: You or your model identifies a market inefficiency
- Exploitation: You profit from the inefficiency
- Correction: The market learns and the inefficiency narrows or disappears
- Adaptation: You find new inefficiencies or refine your approach
Understanding this lifecycle is critical because no edge lasts forever. The sports betting market is an adversarial environment where:
- Sportsbooks hire quantitative analysts and use machine learning to improve their lines
- Other sharp bettors discover and exploit the same inefficiencies
- Data that was once hard to obtain becomes widely available
- Regulatory changes alter market dynamics
Detecting Edge Decay
You can detect when your edge is decaying by monitoring several metrics over time:
def detect_edge_decay(
bets_df: pd.DataFrame,
window_size: int = 100,
min_windows: int = 5
) -> dict:
"""
Detect whether your betting edge is decaying over time
using rolling window analysis.
Parameters
----------
bets_df : pd.DataFrame
Chronologically ordered bets with 'profit_loss', 'stake',
and optionally 'odds_placed', 'odds_closing' columns
window_size : int
Number of bets per rolling window
min_windows : int
Minimum number of complete windows required
Returns
-------
dict with decay analysis
"""
df = bets_df.sort_values('date_placed').reset_index(drop=True)
n = len(df)
if n < window_size * min_windows:
return {'error': f'Need at least {window_size * min_windows} bets'}
# Rolling ROI
rolling_roi = []
rolling_clv = []
for start in range(0, n - window_size + 1, window_size // 2):
end = start + window_size
window = df.iloc[start:end]
roi = window['profit_loss'].sum() / window['stake'].sum()
rolling_roi.append({
'window_start': start,
'window_end': end,
'roi': roi,
})
# Rolling CLV if available
if 'odds_closing' in window.columns:
clv_window = window.dropna(subset=['odds_closing'])
if len(clv_window) > 0:
clv_values = []
for _, row in clv_window.iterrows():
placed = american_to_implied_prob(row['odds_placed'])
closing = american_to_implied_prob(row['odds_closing'])
clv_values.append(closing - placed)
rolling_clv.append({
'window_start': start,
'window_end': end,
'mean_clv': np.mean(clv_values),
})
roi_df = pd.DataFrame(rolling_roi)
# Linear regression on rolling ROI to detect trend
from scipy.stats import linregress
x = np.arange(len(roi_df))
slope, intercept, r_value, p_value, std_err = linregress(
x, roi_df['roi'].values
)
# Interpretation
if slope < -0.001 and p_value < 0.10:
decay_status = "SIGNIFICANT DECAY DETECTED"
recommendation = (
"Your edge appears to be declining over time. "
"Consider revising your model, exploring new markets, "
"or adjusting your approach."
)
elif slope < 0:
decay_status = "MILD DECLINE (not statistically significant)"
recommendation = (
"There is a slight downward trend, but it could be due to "
"normal variance. Continue monitoring."
)
else:
decay_status = "NO DECAY DETECTED"
recommendation = (
"Your edge appears stable or improving. Continue current approach."
)
# CLV decay analysis
clv_analysis = None
if rolling_clv:
clv_df = pd.DataFrame(rolling_clv)
clv_slope, _, clv_r, clv_p, _ = linregress(
np.arange(len(clv_df)), clv_df['mean_clv'].values
)
clv_analysis = {
'clv_trend_slope': clv_slope,
'clv_trend_p_value': clv_p,
'clv_declining': clv_slope < 0 and clv_p < 0.10,
}
return {
'n_windows': len(roi_df),
'window_size': window_size,
'roi_trend_slope': slope,
'roi_trend_r_squared': r_value**2,
'roi_trend_p_value': p_value,
'decay_status': decay_status,
'recommendation': recommendation,
'clv_analysis': clv_analysis,
'rolling_roi_data': roi_df,
}
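Exercise 13.4 asks you to run detect_edge_decay end to end; as a compact preview of the core mechanism, the snippet below simulates a decaying edge directly and fits the same rolling-window regression. The decay parameters are illustrative assumptions (win probability drifting from 60% down to 45% over 3,000 flat-staked -110 bets):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(7)
n_bets, window = 3000, 500
win_payout = 100 / 110

# True win probability decays linearly: the edge starts strongly
# positive and ends clearly negative.
p = np.linspace(0.60, 0.45, n_bets)
outcomes = rng.random(n_bets) < p
returns = np.where(outcomes, win_payout, -1.0)  # per $1 staked

# Non-overlapping windows of 500 bets; mean return = window ROI
window_rois = returns.reshape(-1, window).mean(axis=1)

slope, _, _, p_value, _ = linregress(
    np.arange(len(window_rois)), window_rois
)
print(f"ROI trend slope per window: {slope:+.4f} (p = {p_value:.3f})")
```

With a decay this pronounced the fitted slope comes out clearly negative; subtler decay, as the sample size discussion warned, takes many more bets to distinguish from variance.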
Common Causes of Edge Decay
| Cause | Signal | Response |
|---|---|---|
| Model parameters stale | CLV declining, ROI declining | Retrain model on recent data |
| Market became more efficient | CLV near zero, fewer outlier lines | Move to less efficient markets |
| Sportsbook adjusted their model | Lines closer to sharp market | Find new books with pricing gaps |
| New data source widely available | Everyone has what you had | Find new, unique data sources |
| Rule change in sport | Historical patterns no longer apply | Update model for new rules |
| Account limited/restricted | Can't get desired stakes | Open accounts at new books |
Strategies for Staying Ahead
1. Diversify Across Markets and Sports
Don't rely on a single edge source. Maintain edges across:
- Multiple sports (NFL, NBA, MLB, NHL, soccer)
- Multiple market types (sides, totals, props, futures)
- Multiple timeframes (pre-game, live, futures)
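One simple way to check whether you are actually diversified is a Herfindahl-style concentration index over your stake (or profit) by edge source. The allocation below is a hypothetical example:

```python
# Hypothetical stake allocation by edge source (sport)
stakes = {
    'NFL': 5000.0,
    'NBA': 2000.0,
    'MLB': 1500.0,
    'NHL': 1000.0,
    'Soccer': 500.0,
}

total = sum(stakes.values())
shares = {k: v / total for k, v in stakes.items()}

# Herfindahl-Hirschman index: 1/n for an even spread, 1.0 for all-in
hhi = sum(s ** 2 for s in shares.values())
effective_sources = 1 / hhi  # "effective number" of independent edges

print(f"HHI: {hhi:.3f} (~{effective_sources:.1f} effective edge sources)")
```

Here five nominal sports collapse to about three effective edge sources because half the stake sits in one sport; if one of those edges closes, the impact is larger than the headline count suggests.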
2. Continuously Update Your Models
class AdaptiveModel:
"""
A model framework that continuously learns and adapts.
Uses an expanding window approach where the model is periodically
retrained on all available data, with more weight on recent observations.
"""
def __init__(
self,
base_model,
retrain_frequency: int = 100,
recency_weight: float = 0.7
):
"""
Parameters
----------
base_model : sklearn-compatible model
The underlying prediction model
retrain_frequency : int
Retrain after this many new observations
recency_weight : float
Recency emphasis in (0, 1): the oldest observation's relative weight is (1 - recency_weight), rising linearly to 1.0 for the newest
"""
self.base_model = base_model
self.retrain_frequency = retrain_frequency
self.recency_weight = recency_weight
self.training_data = []
self.n_since_retrain = 0
self.version = 0
def add_observation(self, features: np.ndarray, outcome: float):
"""Add a new observation and retrain if threshold reached."""
self.training_data.append((features, outcome))
self.n_since_retrain += 1
if self.n_since_retrain >= self.retrain_frequency:
self.retrain()
def retrain(self):
"""Retrain the model on all available data with recency weighting."""
if len(self.training_data) < 50:
return
X = np.array([obs[0] for obs in self.training_data])
y = np.array([obs[1] for obs in self.training_data])
# Create sample weights: more weight on recent data
n = len(y)
recency = np.linspace(1 - self.recency_weight, 1.0, n)
weights = recency / recency.sum() * n
self.base_model.fit(X, y, sample_weight=weights)
self.version += 1
self.n_since_retrain = 0
def predict(self, features: np.ndarray) -> float:
"""Generate probability prediction."""
if self.version == 0:
return 0.5 # No training yet
return self.base_model.predict_proba(features.reshape(1, -1))[0, 1]
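To make the recency weighting in retrain() concrete, here is the weight vector it builds for ten observations at the default recency_weight of 0.7 (oldest on the left, newest on the right):

```python
import numpy as np

n = 10
recency_weight = 0.7

# Same construction as AdaptiveModel.retrain(): the oldest observation
# gets relative weight (1 - recency_weight), the newest 1.0, then the
# vector is normalized so the weights sum to n (mean weight of 1).
recency = np.linspace(1 - recency_weight, 1.0, n)
weights = recency / recency.sum() * n

print(np.round(weights, 3))
print(f"newest/oldest weight ratio: {weights[-1] / weights[0]:.2f}")
```

The newest observation counts roughly 3.3 times as much as the oldest, a gentle tilt that still lets the full history anchor the fit.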
3. Monitor the Information Ecosystem
Stay aware of what information is becoming publicly available. When a formerly proprietary dataset (e.g., player tracking data, advanced metrics) becomes public, the edge from that data diminishes rapidly.
4. Build Process, Not Just Models
The most sustainable edge comes from better processes:
- Faster data pipeline: Getting information before others
- Better bet execution: Lower latency in placing bets
- Superior bankroll management: Surviving drawdowns that bust competitors
- Disciplined tracking: Learning from mistakes systematically
Callout: The "Red Queen" Effect
In Lewis Carroll's Through the Looking-Glass, the Red Queen tells Alice: "It takes all the running you can do, to keep in the same place." Sports betting is similar. The market is constantly improving, and maintaining your edge requires continuous effort. The bettor who rests on their laurels will find their edge evaporating. The bettor who continuously learns, adapts, and innovates will find new edges as old ones close. This is why process matters more than any single model.
Practical Adaptation Workflow
Here is a quarterly review process for maintaining your edge:
def quarterly_review(tracker: BetTracker, quarter_start: str, quarter_end: str):
"""
Perform a structured quarterly review of betting performance.
This generates a comprehensive report that guides adaptation.
"""
# 1. Overall performance
overall = tracker.get_performance_summary(
date_from=quarter_start, date_to=quarter_end
)
print("=" * 60)
print(f"QUARTERLY REVIEW: {quarter_start} to {quarter_end}")
print("=" * 60)
print(f"\nOverall: {overall['total_bets']} bets, "
f"ROI: {overall['roi_pct']:.2f}%, "
f"Profit: ${overall['total_profit']:.2f}")
# 2. Performance by segment
for sport in ['NFL', 'NBA', 'MLB', 'NHL']:
segment = tracker.get_performance_summary(
sport=sport, date_from=quarter_start, date_to=quarter_end
)
if 'error' not in segment:
print(f"\n {sport}: {segment['total_bets']} bets, "
f"ROI: {segment['roi_pct']:.2f}%")
# 3. CLV analysis
if overall.get('mean_clv') is not None:
clv_pct = overall['mean_clv'] * 100
print(f"\nMean CLV: {clv_pct:+.2f}% "
f"({overall['clv_bets_tracked']} bets tracked)")
if clv_pct < 0:
print(" WARNING: Negative CLV suggests your betting process "
"may not have an edge.")
elif clv_pct < 1:
print(" CAUTION: Low CLV. Edge may be marginal or decaying.")
else:
print(" POSITIVE: Strong CLV indicates a viable edge.")
# 4. Adaptation recommendations
print("\n--- ADAPTATION CHECKLIST ---")
print("[ ] Review and retrain models on latest data")
print("[ ] Check for newly available data sources")
print("[ ] Assess which markets/sports had best/worst CLV")
print("[ ] Review sportsbook account status (limits, bans)")
print("[ ] Update bankroll allocation across sports")
print("[ ] Evaluate any rule changes in tracked sports")
print("[ ] Review bet sizing discipline (actual vs. recommended)")
13.6 Chapter Summary
This chapter developed a comprehensive framework for value betting -- the systematic identification and exploitation of positive expected value opportunities in sports betting markets.
True Probability Estimation:
- Model-based approaches (Elo, logistic regression, etc.) estimate probability from features
- Market-based approaches use sharp closing lines as the best available probability estimate
- The Bayesian framework combines model and market information, weighted by confidence in each
- Calibration analysis verifies that your probability estimates are accurate
Systematic Value Identification:
- Value exists when your estimated true probability exceeds the implied probability from the odds
- Edge thresholds should account for estimation uncertainty, market efficiency, and variance
- Multi-factor scoring incorporates edge size, track record, market efficiency, liquidity, diversification, and timing
- Confidence in your edge should modulate stake size through the Kelly criterion
Bet Tracking and Journaling:
- Track every bet with complete metadata: odds, closing line, stake, result, model, and reasoning
- A qualitative journal captures the pre-bet thesis, counter-arguments, and post-event review
- Systematic tracking enables performance analysis across sports, markets, books, and time periods
Performance Evaluation:
- The sample size problem is severe: confirming a 3% edge at -110 requires nearly 8,000 bets
- CLV provides a faster signal of edge than raw profit/loss
- Confidence intervals on ROI should be calculated and monitored over time
- Regression to the mean means early results are unreliable -- trust the process
Market Adaptation:
- Every edge has a lifecycle: discovery, exploitation, correction, adaptation
- Detect edge decay through rolling window analysis of ROI and CLV
- Stay ahead through diversification, model updating, information monitoring, and process improvement
- Quarterly reviews provide structured opportunities to reassess and adapt
In Chapter 14, we will take the bankroll management concepts introduced in Chapter 4 to an advanced level, with rigorous Kelly criterion derivations, portfolio theory applications, and sophisticated multi-account allocation strategies.
Exercises
Exercise 13.1: Build an Elo model for a sport of your choice using at least 3 seasons of historical data. Calculate the model's Brier score and compare it to a naive baseline that always predicts the home team wins with 55% probability.
Exercise 13.2: Using the bayesian_probability_combination function, explore how the combined probability changes as you vary the model confidence weight from 0.1 to 0.9. Plot the results for a case where your model says 60% and the market says 50%.
Exercise 13.3: Create a BetTracker database and populate it with at least 50 synthetic bets. Use the get_performance_summary function to generate reports by sport and market type.
Exercise 13.4: Write a simulation that generates 2,000 bets for a bettor with a 2.5% true ROI, then applies the detect_edge_decay function. Introduce a gradual decay in the bettor's edge starting at bet 1,000 (reducing from 2.5% to 0.5% over the next 1,000 bets). Does the function detect the decay?
Exercise 13.5: Implement a "model comparison" framework that takes probability estimates from two different models and determines which model has better calibration and CLV on a historical dataset of at least 200 games.