Chapter 13: Value Betting Theory and Practice

"A value bet is simply a bet where the probability of a given outcome is greater than what the odds reflect. It's the only way to make money long-term." -- Pinnacle Sports educational content

Value betting is the intellectual core of professional sports betting. While Chapter 12 focused on getting the best price once you have decided to bet, this chapter addresses the more fundamental question: how do you determine whether a bet has positive expected value in the first place? We will develop rigorous frameworks for estimating true probabilities, systematically identifying value, tracking your bets with precision, evaluating your edge over time, and adapting when markets evolve.


13.1 True Probability vs. Market Probability

The Fundamental Question

Every bet you place is a statement about probability. When you bet on Team A at +150 (implied probability 40%), you are asserting that Team A's true win probability exceeds 40%. But where does your estimate of "true probability" come from?
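
The conversion from American odds to implied probability is mechanical; a quick sketch (the same helper reappears in the tracking code later in this chapter):

def american_to_implied(odds: float) -> float:
    """Convert American odds to the implied win probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)

print(f"{american_to_implied(150):.3f}")   # 0.400 -> betting +150 asserts p > 40%
print(f"{american_to_implied(-110):.3f}")  # 0.524 -> the standard -110 hurdle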

There are two broad approaches:

  1. Model-based estimation: Build a quantitative model that outputs probability estimates from input features
  2. Market-based estimation: Use the market itself (specifically, the no-vig closing line at a sharp book) as the best available estimate, and look for deviations at soft books

Both approaches have strengths and weaknesses, and the best practitioners often combine them.

Model-Based Probability Estimation

A model-based approach constructs an explicit mapping from observable features to outcome probabilities:

$$ P(\text{outcome}) = f(\mathbf{x}) $$

where $\mathbf{x}$ is a vector of features (team ratings, injuries, weather, rest, etc.) and $f$ is your model.

Building a Simple Elo-Based Model

The Elo rating system, originally designed for chess, can be adapted for team sports. It provides a natural probability framework:

import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional

@dataclass
class EloModel:
    """
    Elo rating model for team sports with probability estimation.

    Parameters
    ----------
    k_factor : float
        Learning rate for rating updates (higher = more reactive)
    home_advantage : float
        Elo points added to home team's rating
    initial_rating : float
        Starting rating for new teams
    mean_reversion : float
        Fraction of rating to revert to mean between seasons (0-1)
    """
    k_factor: float = 20.0
    home_advantage: float = 65.0
    initial_rating: float = 1500.0
    mean_reversion: float = 0.33
    ratings: Dict[str, float] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)

    def get_rating(self, team: str) -> float:
        """Get a team's current rating, initializing if necessary."""
        if team not in self.ratings:
            self.ratings[team] = self.initial_rating
        return self.ratings[team]

    def predict_probability(
        self,
        home_team: str,
        away_team: str,
        neutral: bool = False
    ) -> Tuple[float, float]:
        """
        Predict win probability for each team.

        Parameters
        ----------
        home_team : str
        away_team : str
        neutral : bool
            If True, no home advantage is applied

        Returns
        -------
        tuple of (home_win_prob, away_win_prob)
        """
        home_rating = self.get_rating(home_team)
        away_rating = self.get_rating(away_team)

        hfa = 0 if neutral else self.home_advantage
        rating_diff = home_rating + hfa - away_rating

        # Standard Elo probability formula
        home_prob = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
        away_prob = 1.0 - home_prob

        return home_prob, away_prob

    def update(
        self,
        home_team: str,
        away_team: str,
        home_score: float,
        away_score: float,
        neutral: bool = False
    ):
        """
        Update ratings after a game result.

        Parameters
        ----------
        home_team, away_team : str
        home_score, away_score : float
            Actual game scores
        neutral : bool
        """
        home_prob, away_prob = self.predict_probability(
            home_team, away_team, neutral
        )

        # Determine actual result (1=home win, 0.5=draw, 0=away win)
        if home_score > away_score:
            home_actual = 1.0
        elif home_score < away_score:
            home_actual = 0.0
        else:
            home_actual = 0.5

        away_actual = 1.0 - home_actual

        # Margin-of-victory multiplier (FiveThirtyEight-style): shrink
        # updates for blowouts by favorites, since those margins carry
        # less information about relative strength
        mov = abs(home_score - away_score)
        hfa = 0 if neutral else self.home_advantage
        rating_diff = self.get_rating(home_team) + hfa - self.get_rating(away_team)
        winner_diff = rating_diff if home_actual >= 0.5 else -rating_diff
        mov_multiplier = np.log(max(mov, 1) + 1) * (
            2.2 / (winner_diff * 0.001 + 2.2)
        )

        # Update ratings
        home_update = self.k_factor * mov_multiplier * (home_actual - home_prob)
        self.ratings[home_team] = self.get_rating(home_team) + home_update
        self.ratings[away_team] = self.get_rating(away_team) - home_update

        # Record prediction for calibration analysis
        self.history.append({
            'home': home_team,
            'away': away_team,
            'home_prob': home_prob,
            'home_actual': home_actual,
            'home_score': home_score,
            'away_score': away_score,
        })

    def season_reset(self):
        """Apply mean reversion between seasons."""
        for team in self.ratings:
            self.ratings[team] = (
                self.initial_rating * self.mean_reversion +
                self.ratings[team] * (1 - self.mean_reversion)
            )

    def calibration_analysis(self, n_bins: int = 10) -> dict:
        """
        Analyze how well-calibrated the model's probabilities are.

        Returns
        -------
        dict with calibration metrics
        """
        if not self.history:
            return {}

        probs = np.array([h['home_prob'] for h in self.history])
        actuals = np.array([h['home_actual'] for h in self.history])

        # Bin probabilities
        bins = np.linspace(0, 1, n_bins + 1)
        calibration = []

        for i in range(n_bins):
            mask = (probs >= bins[i]) & (probs < bins[i+1])
            if mask.sum() > 0:
                calibration.append({
                    'bin_center': (bins[i] + bins[i+1]) / 2,
                    'predicted': probs[mask].mean(),
                    'actual': actuals[mask].mean(),
                    'count': mask.sum(),
                })

        # Brier score
        brier = np.mean((probs - actuals) ** 2)

        # Log loss
        eps = 1e-15
        probs_clipped = np.clip(probs, eps, 1 - eps)
        log_loss = -np.mean(
            actuals * np.log(probs_clipped) +
            (1 - actuals) * np.log(1 - probs_clipped)
        )

        return {
            'calibration': calibration,
            'brier_score': brier,
            'log_loss': log_loss,
            'n_predictions': len(probs),
        }

# Example usage
model = EloModel(k_factor=25, home_advantage=70)

# Simulate some games
games = [
    ('Team A', 'Team B', 28, 24),
    ('Team C', 'Team A', 31, 17),
    ('Team B', 'Team C', 21, 21),
    ('Team A', 'Team C', 35, 28),
    ('Team B', 'Team A', 17, 30),
]

for home, away, hs, as_ in games:
    prob_h, prob_a = model.predict_probability(home, away)
    print(f"{home} vs {away}: P(home)={prob_h:.3f}, P(away)={prob_a:.3f} "
          f"=> Result: {hs}-{as_}")
    model.update(home, away, hs, as_)

print("\nFinal Ratings:")
for team, rating in sorted(model.ratings.items(), key=lambda x: -x[1]):
    print(f"  {team}: {rating:.1f}")

Beyond Elo: Feature-Rich Models

While Elo provides a solid baseline, advanced models incorporate many more features. Here is a logistic regression approach:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.calibration import calibration_curve
import pandas as pd
import numpy as np

def build_game_prediction_model(
    games_df: pd.DataFrame,
    feature_columns: list,
    target_column: str = 'home_win'
) -> dict:
    """
    Build a logistic regression model for game outcome prediction.

    Parameters
    ----------
    games_df : pd.DataFrame
        Historical game data with features
    feature_columns : list
        Column names to use as features
    target_column : str
        Binary target column (1=home win, 0=away win)

    Returns
    -------
    dict with model, calibration metrics, and cross-validation scores
    """
    X = games_df[feature_columns].values
    y = games_df[target_column].values

    # Fit model
    model = LogisticRegression(
        penalty='l2',
        C=1.0,
        max_iter=1000,
        random_state=42
    )
    model.fit(X, y)

    # Cross-validation
    cv_scores = cross_val_score(
        model, X, y, cv=5, scoring='neg_brier_score'
    )

    # Calibration
    predicted_probs = model.predict_proba(X)[:, 1]
    fraction_positive, mean_predicted = calibration_curve(
        y, predicted_probs, n_bins=10
    )

    # Feature importance
    importance = dict(zip(feature_columns, model.coef_[0]))

    return {
        'model': model,
        'cv_brier_scores': -cv_scores,
        'mean_brier': -cv_scores.mean(),
        'calibration_actual': fraction_positive,
        'calibration_predicted': mean_predicted,
        'feature_importance': importance,
    }

# Example feature set for NFL prediction
example_features = [
    'home_elo_diff',        # Elo rating difference
    'home_off_dvoa',        # Offensive efficiency
    'away_off_dvoa',
    'home_def_dvoa',        # Defensive efficiency
    'away_def_dvoa',
    'home_rest_days',       # Days since last game
    'away_rest_days',
    'home_travel_miles',    # Travel distance
    'away_travel_miles',
    'is_divisional',        # Divisional rivalry game
    'home_injuries_impact', # Injury-adjusted rating
    'away_injuries_impact',
    'dome_game',            # Indoor vs outdoor
    'temperature',          # Game-time temperature
    'wind_speed',           # Wind speed at venue
]

Market-Based Probability Estimation

The alternative to building your own model is to treat the betting market itself as a highly sophisticated prediction machine. The efficient market hypothesis (applied to sports betting) suggests that the closing line at a sharp sportsbook (like Pinnacle) represents the best available estimate of true probability.

Under this framework, your goal shifts from "estimating the true probability" to "finding sportsbooks whose lines deviate from the sharp market consensus."

$$ \text{Value} = p_{\text{sharp closing}} - p_{\text{implied at soft book}} $$
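
In practice, computing $p_{\text{sharp closing}}$ means stripping the vig from the sharp book's two-sided prices. A minimal sketch using proportional devigging (the quoted odds are hypothetical):

def no_vig_probability(odds_a: float, odds_b: float) -> tuple:
    """Remove the vig proportionally from a two-sided market."""
    def implied(odds):
        return 100 / (odds + 100) if odds > 0 else abs(odds) / (abs(odds) + 100)

    p_a, p_b = implied(odds_a), implied(odds_b)
    overround = p_a + p_b   # exceeds 1.0 by the book's margin
    return p_a / overround, p_b / overround

# Sharp book quotes -115 / +105; a soft book hangs +120 on side B
_, p_sharp_b = no_vig_probability(-115, 105)
p_soft_b = 100 / (120 + 100)
print(f"Sharp no-vig P(B): {p_sharp_b:.3f}, soft implied: {p_soft_b:.3f}, "
      f"value: {p_sharp_b - p_soft_b:+.3f}")
# Sharp no-vig P(B): 0.477, soft implied: 0.455, value: +0.022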

This approach has significant advantages, as the comparison below shows:

| Aspect | Model-Based | Market-Based |
|---|---|---|
| Skill required | Very high (statistics, domain expertise) | Moderate (line comparison, accounts) |
| Data requirements | Extensive historical data, features | Real-time odds from sharp and soft books |
| Scalability | Hard to maintain across many sports | Easy to apply across all sports |
| Edge source | Model outperforms market | Soft books lag sharp market |
| Risk | Model error, overfitting | Account restrictions, line speed |
| Sustainability | Lasts as long as model edge exists | Lasts until books improve pricing |

Key Insight: Many professional bettors use a hybrid approach. They build models to identify which side of a game they favor, then use market-based analysis (closing line value) to validate that their model actually has an edge. If your model consistently agrees with the direction the sharp market moves, your model has genuine predictive value.
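
One way to operationalize that check: compare the side your model leaned toward at the opening line with the direction the sharp line actually closed. A minimal sketch, assuming you log opening and closing no-vig probabilities next to your model's estimates (all numbers hypothetical):

import numpy as np

def model_market_agreement(model_probs, open_probs, close_probs) -> float:
    """Fraction of games where the model's lean relative to the opener
    matched the direction the closing line moved."""
    model_probs, open_probs, close_probs = map(
        np.asarray, (model_probs, open_probs, close_probs)
    )
    model_lean = np.sign(model_probs - open_probs)  # side the model favored
    line_move = np.sign(close_probs - open_probs)   # way the market moved
    decided = line_move != 0                        # ignore unmoved lines
    return float((model_lean[decided] == line_move[decided]).mean())

agreement = model_market_agreement(
    model_probs=[0.58, 0.45, 0.62, 0.51],
    open_probs=[0.55, 0.48, 0.57, 0.52],
    close_probs=[0.57, 0.46, 0.60, 0.53],
)
print(f"Model agreed with line movement {agreement:.0%} of the time")  # 75%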

Estimating "True" Probability: A Bayesian Framework

We can formalize the combination of model-based and market-based information using Bayesian updating:

$$ P(\text{outcome} | \text{model, market}) \propto P(\text{outcome} | \text{model}) \times P(\text{market odds} | \text{outcome}) $$

In practice, this means combining your model's probability with the market's probability, weighting by your confidence in each:

import numpy as np

def bayesian_probability_combination(
    model_prob: float,
    market_prob: float,
    model_confidence: float = 0.3,
    market_confidence: float = 0.7
) -> float:
    """
    Combine model and market probabilities using a weighted log-odds approach.

    This is equivalent to a Bayesian update under the assumption that
    both model and market provide independent information, weighted
    by our confidence in each.

    Parameters
    ----------
    model_prob : float
        Your model's estimated probability (0 to 1)
    market_prob : float
        Market-implied no-vig probability (0 to 1)
    model_confidence : float
        Weight on model (0 to 1)
    market_confidence : float
        Weight on market (0 to 1)

    Returns
    -------
    float
        Combined probability estimate
    """
    # Convert to log-odds (logit)
    def logit(p):
        p = np.clip(p, 1e-10, 1 - 1e-10)
        return np.log(p / (1 - p))

    def inv_logit(x):
        return 1 / (1 + np.exp(-x))

    # Normalize weights
    total = model_confidence + market_confidence
    w_model = model_confidence / total
    w_market = market_confidence / total

    # Combine in log-odds space (more appropriate than linear averaging)
    combined_logit = (
        w_model * logit(model_prob) +
        w_market * logit(market_prob)
    )

    return inv_logit(combined_logit)

# Example: Model says 55%, market says 48%
combined = bayesian_probability_combination(
    model_prob=0.55,
    market_prob=0.48,
    model_confidence=0.3,
    market_confidence=0.7
)
print(f"Model: 55.0%, Market: 48.0%, Combined: {combined*100:.1f}%")
# Output: Model: 55.0%, Market: 48.0%, Combined: 50.1%

The log-odds weighting is superior to simple linear averaging because it correctly handles probabilities near the extremes (0 or 1) and respects the multiplicative nature of odds.
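
To see why, compare the two schemes when one estimate sits near an extreme (illustrative numbers):

import numpy as np

model_p, market_p = 0.98, 0.60
w_model, w_market = 0.3, 0.7

linear = w_model * model_p + w_market * market_p

logit = lambda p: np.log(p / (1 - p))
combined_logit = w_model * logit(model_p) + w_market * logit(market_p)
log_odds = 1 / (1 + np.exp(-combined_logit))

# The log-odds average (0.810) pulls harder toward the extreme estimate
# than the linear average (0.714), reflecting the multiplicative odds scale
print(f"Linear average: {linear:.3f}")
print(f"Log-odds average: {log_odds:.3f}")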


13.2 Systematic Value Identification

The Value Equation

A bet has positive expected value when:

$$ E[\text{profit}] = p_{\text{true}} \times \text{win amount} - (1 - p_{\text{true}}) \times \text{lose amount} > 0 $$

Equivalently, for a bet at American odds $o$:

$$ \text{Value exists when: } p_{\text{true}} > p_{\text{implied}}(o) $$

The percentage edge is:

$$ \text{Edge} = \frac{p_{\text{true}}}{p_{\text{implied}}} - 1 $$

For example, if you estimate the true probability at 45% and the implied probability from the odds is 40%:

$$ \text{Edge} = \frac{0.45}{0.40} - 1 = 12.5\% $$

Edge Thresholds: How Much Edge Do You Need?

Not all positive-edge bets are worth taking. You need to consider:

  1. Estimation uncertainty: Your probability estimate has error bars
  2. Transaction costs: Time, effort, opportunity cost of capital
  3. Vig erosion: Even "value" bets pay vig, reducing your effective edge
  4. Variance tolerance: Low-edge bets have high variance relative to expected return

Here is a framework for minimum edge thresholds:

def minimum_edge_threshold(
    confidence_in_estimate: float,
    bet_type: str = 'standard'
) -> float:
    """
    Calculate the minimum edge required before placing a bet.

    The base threshold for the market type is scaled by the inverse of
    your confidence in the probability estimate, so a shakier estimate
    must show a larger perceived edge before the bet qualifies.

    Parameters
    ----------
    confidence_in_estimate : float
        How confident you are in your probability estimate (0-1);
        lower confidence inflates the threshold
    bet_type : str
        'standard' (sides/totals), 'prop', 'live', or 'futures'

    Returns
    -------
    float
        Minimum edge threshold (as a proportion, not percentage)
    """
    # Base threshold varies by bet type
    base_thresholds = {
        'standard': 0.02,   # 2% for efficient markets
        'prop': 0.03,       # 3% for less efficient prop markets
        'live': 0.04,       # 4% for fast-moving live markets
        'futures': 0.05,    # 5% for high-vig futures
    }

    base = base_thresholds.get(bet_type, 0.03)

    # Scale by inverse confidence: a larger perceived edge is required
    # when you trust your probability estimate less
    required_edge = base / confidence_in_estimate

    return required_edge

# Example: you are 70% confident in your probability estimate
threshold = minimum_edge_threshold(
    confidence_in_estimate=0.7,
    bet_type='standard'
)
print(f"Minimum edge threshold: {threshold*100:.1f}%")
# A 70% confident bettor in a standard market needs ~2.9% edge

A Framework for Systematic Value Identification

Here is a complete workflow for identifying value bets:

from dataclasses import dataclass
from enum import Enum
from typing import List

class ValueRating(Enum):
    """Rating categories for value opportunities."""
    NO_VALUE = 0
    MARGINAL = 1       # Edge 1-2%: generally not worth betting
    MODERATE = 2       # Edge 2-4%: bet at reduced Kelly
    STRONG = 3         # Edge 4-7%: bet at standard Kelly
    VERY_STRONG = 4    # Edge 7%+: bet at full Kelly fraction

@dataclass
class ValueAssessment:
    """Complete assessment of a potential value bet."""
    event: str
    selection: str
    market: str
    your_probability: float
    your_prob_lower: float
    your_prob_upper: float
    best_available_odds: float
    best_book: str
    implied_probability: float
    edge: float
    edge_lower: float
    edge_upper: float
    value_rating: ValueRating
    recommended_kelly: float
    recommended_stake_pct: float
    confidence_level: float
    notes: str = ""

def assess_value(
    event: str,
    selection: str,
    market: str,
    your_prob: float,
    your_prob_std: float,
    best_odds: float,
    best_book: str,
    bankroll: float = 10000,
    kelly_fraction: float = 0.25
) -> ValueAssessment:
    """
    Perform a complete value assessment for a potential bet.

    Parameters
    ----------
    event : str
        Event description
    selection : str
        What you're betting on
    market : str
        Market type
    your_prob : float
        Your point estimate of the probability
    your_prob_std : float
        Standard deviation of your probability estimate
    best_odds : float
        Best available American odds
    best_book : str
        Sportsbook offering the best odds
    bankroll : float
        Current bankroll
    kelly_fraction : float
        Fraction of full Kelly to use

    Returns
    -------
    ValueAssessment
    """
    # Calculate implied probability from odds
    if best_odds > 0:
        implied = 100 / (best_odds + 100)
    else:
        implied = abs(best_odds) / (abs(best_odds) + 100)

    # Calculate edge and confidence interval
    edge = your_prob / implied - 1
    prob_lower = max(0.01, your_prob - 1.96 * your_prob_std)
    prob_upper = min(0.99, your_prob + 1.96 * your_prob_std)
    edge_lower = prob_lower / implied - 1
    edge_upper = prob_upper / implied - 1

    # Confidence that edge is positive
    # (probability that true prob > implied prob)
    from scipy.stats import norm
    z = (your_prob - implied) / your_prob_std
    confidence = norm.cdf(z)

    # Determine value rating
    if edge_lower > 0.07:
        rating = ValueRating.VERY_STRONG
    elif edge_lower > 0.04:
        rating = ValueRating.STRONG
    elif edge_lower > 0.01:
        rating = ValueRating.MODERATE
    elif edge > 0.01:
        rating = ValueRating.MARGINAL
    else:
        rating = ValueRating.NO_VALUE

    # Kelly criterion calculation
    if best_odds > 0:
        decimal_odds = best_odds / 100 + 1
    else:
        decimal_odds = 100 / abs(best_odds) + 1

    b = decimal_odds - 1  # Net odds
    full_kelly = (your_prob * b - (1 - your_prob)) / b
    full_kelly = max(0, full_kelly)

    # Apply fraction and confidence adjustment
    adjusted_kelly = full_kelly * kelly_fraction * min(confidence, 1.0)
    recommended_stake = adjusted_kelly * bankroll

    return ValueAssessment(
        event=event,
        selection=selection,
        market=market,
        your_probability=your_prob,
        your_prob_lower=prob_lower,
        your_prob_upper=prob_upper,
        best_available_odds=best_odds,
        best_book=best_book,
        implied_probability=implied,
        edge=edge,
        edge_lower=edge_lower,
        edge_upper=edge_upper,
        value_rating=rating,
        recommended_kelly=adjusted_kelly,
        recommended_stake_pct=adjusted_kelly * 100,
        confidence_level=confidence,
    )

# Example usage
assessment = assess_value(
    event="Patriots vs Bills",
    selection="Patriots +3",
    market="spread",
    your_prob=0.55,
    your_prob_std=0.04,
    best_odds=-105,
    best_book="BookD",
)

print(f"Value Assessment: {assessment.event}")
print(f"  Selection: {assessment.selection}")
print(f"  Your Probability: {assessment.your_probability:.1%}")
print(f"  95% CI: [{assessment.your_prob_lower:.1%}, {assessment.your_prob_upper:.1%}]")
print(f"  Implied Probability: {assessment.implied_probability:.1%}")
print(f"  Edge: {assessment.edge:.1%}")
print(f"  Edge 95% CI: [{assessment.edge_lower:.1%}, {assessment.edge_upper:.1%}]")
print(f"  Confidence Edge > 0: {assessment.confidence_level:.1%}")
print(f"  Value Rating: {assessment.value_rating.name}")
print(f"  Recommended Stake: {assessment.recommended_stake_pct:.2f}% of bankroll")

Multi-Factor Value Scoring

In practice, value is not one-dimensional. A sophisticated bettor considers multiple factors when evaluating a bet:

def multi_factor_value_score(
    edge: float,
    clv_track_record: float,
    market_efficiency: float,
    liquidity: float,
    correlation_with_existing_bets: float,
    time_to_event_hours: float
) -> float:
    """
    Compute a composite value score combining multiple factors.

    Parameters
    ----------
    edge : float
        Estimated edge (e.g., 0.05 for 5%)
    clv_track_record : float
        Your historical CLV in this market type (-1 to +1, typically -0.05 to +0.05)
    market_efficiency : float
        How efficient the market is (0=very efficient, 1=very inefficient)
    liquidity : float
        How easy it is to get your desired stake down (0=impossible, 1=easy)
    correlation_with_existing_bets : float
        Correlation with bets already in your portfolio (0=uncorrelated, 1=identical)
    time_to_event_hours : float
        Hours until the event starts

    Returns
    -------
    float
        Composite value score (higher = more attractive)
    """
    # Weight each factor
    weights = {
        'edge': 0.35,
        'track_record': 0.20,
        'market_efficiency': 0.15,
        'liquidity': 0.10,
        'diversification': 0.10,
        'timing': 0.10,
    }

    # Normalize edge to 0-1 scale (cap at 15% edge)
    edge_score = min(edge / 0.15, 1.0)

    # Track record: positive CLV history boosts confidence
    track_score = max(0, min(1, (clv_track_record + 0.05) / 0.10))

    # Market efficiency: prefer less efficient markets
    efficiency_score = market_efficiency

    # Liquidity: penalize if can't get full size down
    liquidity_score = liquidity

    # Diversification: penalize correlated bets
    diversification_score = 1.0 - correlation_with_existing_bets

    # Timing: slight preference for events further out (more time to manage)
    timing_score = min(time_to_event_hours / 48, 1.0)

    composite = (
        weights['edge'] * edge_score +
        weights['track_record'] * track_score +
        weights['market_efficiency'] * efficiency_score +
        weights['liquidity'] * liquidity_score +
        weights['diversification'] * diversification_score +
        weights['timing'] * timing_score
    )

    return composite
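
A usage sketch with hypothetical inputs:

score = multi_factor_value_score(
    edge=0.05,                          # 5% estimated edge
    clv_track_record=0.02,              # historically +2% CLV in this market
    market_efficiency=0.4,              # moderately efficient market
    liquidity=0.9,                      # full stake easily placed
    correlation_with_existing_bets=0.1, # nearly independent of open positions
    time_to_event_hours=36,
)
print(f"Composite value score: {score:.3f}")  # ~0.572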

Callout: The "Confidence Trap"

One of the most common mistakes in value betting is overconfidence in your probability estimates. If your model says a team has a 60% chance of winning and the market says 50%, the most likely explanation is not that you have found a 10% edge -- it is that your model is wrong. Always ask: "Why would the market be mispricing this?" If you cannot articulate a specific, verifiable reason (e.g., injury not yet priced in, weather change, biased public perception), treat your edge estimate with extreme skepticism and reduce your stake accordingly.


13.3 Tracking and Recording Bets

Why Tracking Matters

Professional bettors are meticulous record-keepers. Without detailed tracking, you cannot:

  1. Calculate your actual ROI across different sports, markets, and bet types
  2. Measure your CLV to determine if your betting process is sound
  3. Identify strengths and weaknesses in your approach
  4. Detect when your edge is decaying and adapt accordingly
  5. File accurate tax returns (in jurisdictions where gambling income is taxable)

What to Track

At minimum, record the following for every bet:

| Field | Example | Purpose |
|---|---|---|
| Date/Time placed | 2025-01-15 14:32 EST | Timing analysis |
| Sport | NFL | Filter by sport |
| Event | Patriots vs Bills | Game identification |
| Market | Spread | Market type analysis |
| Selection | Patriots +3 | What you bet |
| Odds at placement | -108 | CLV calculation |
| Closing odds | -114 | CLV calculation |
| Closing spread (if applicable) | Patriots +2.5 | Spread CLV |
| Stake | $250 | P&L calculation |
| Sportsbook | BookD | Book performance tracking |
| Result | Win | P&L calculation |
| Profit/Loss | +$231.48 | Bottom line |
| Your estimated probability | 0.55 | Edge tracking |
| Model used | Elo+LR v3.2 | Model comparison |
| Confidence level | High | Qualitative assessment |
| Notes | "Weather shift, wind 25mph" | Context for review |

Building a Bet Tracking System

"""
bet_tracker.py
A comprehensive bet tracking and journaling system.
"""

import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timezone
from typing import Optional, List
from dataclasses import dataclass, asdict

@dataclass
class Bet:
    """Represents a single bet with all relevant metadata."""
    # Required fields
    sport: str
    event: str
    market: str
    selection: str
    odds_placed: float
    stake: float
    sportsbook: str

    # Optional fields (filled in later)
    bet_id: Optional[int] = None
    date_placed: Optional[str] = None
    odds_closing: Optional[float] = None
    spread_placed: Optional[float] = None
    spread_closing: Optional[float] = None
    result: Optional[str] = None  # 'win', 'loss', 'push', 'void'
    profit_loss: Optional[float] = None
    your_probability: Optional[float] = None
    model_name: Optional[str] = None
    confidence: Optional[str] = None  # 'low', 'medium', 'high'
    notes: Optional[str] = None
    tags: Optional[str] = None  # Comma-separated tags

class BetTracker:
    """
    Comprehensive bet tracking and analysis system.

    Parameters
    ----------
    db_path : str
        Path to SQLite database for bet storage
    """

    def __init__(self, db_path: str = "bet_journal.db"):
        self.db_path = db_path
        self._init_database()

    def _init_database(self):
        """Initialize the bet tracking database."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS bets (
                bet_id INTEGER PRIMARY KEY AUTOINCREMENT,
                date_placed TEXT NOT NULL,
                sport TEXT NOT NULL,
                event TEXT NOT NULL,
                market TEXT NOT NULL,
                selection TEXT NOT NULL,
                odds_placed REAL NOT NULL,
                odds_closing REAL,
                spread_placed REAL,
                spread_closing REAL,
                stake REAL NOT NULL,
                sportsbook TEXT NOT NULL,
                result TEXT,
                profit_loss REAL,
                your_probability REAL,
                model_name TEXT,
                confidence TEXT,
                notes TEXT,
                tags TEXT,
                created_at TEXT DEFAULT CURRENT_TIMESTAMP,
                updated_at TEXT DEFAULT CURRENT_TIMESTAMP
            )
        """)

        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bets_sport ON bets(sport)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bets_date ON bets(date_placed)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bets_sportsbook ON bets(sportsbook)
        """)

        conn.commit()
        conn.close()

    def add_bet(self, bet: Bet) -> int:
        """
        Add a new bet to the tracker.

        Parameters
        ----------
        bet : Bet
            The bet to record

        Returns
        -------
        int
            The bet_id of the newly inserted bet
        """
        if bet.date_placed is None:
            bet.date_placed = datetime.now(timezone.utc).isoformat()

        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            INSERT INTO bets (date_placed, sport, event, market, selection,
                            odds_placed, odds_closing, spread_placed,
                            spread_closing, stake, sportsbook, result,
                            profit_loss, your_probability, model_name,
                            confidence, notes, tags)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            bet.date_placed, bet.sport, bet.event, bet.market,
            bet.selection, bet.odds_placed, bet.odds_closing,
            bet.spread_placed, bet.spread_closing, bet.stake,
            bet.sportsbook, bet.result, bet.profit_loss,
            bet.your_probability, bet.model_name, bet.confidence,
            bet.notes, bet.tags
        ))

        bet_id = cursor.lastrowid
        conn.commit()
        conn.close()
        return bet_id

    def update_result(
        self,
        bet_id: int,
        result: str,
        odds_closing: Optional[float] = None,
        spread_closing: Optional[float] = None
    ):
        """
        Update a bet with its result and closing line information.

        Parameters
        ----------
        bet_id : int
        result : str
            'win', 'loss', 'push', or 'void'
        odds_closing : float, optional
            Closing odds
        spread_closing : float, optional
            Closing spread
        """
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Get the bet to calculate P&L
        cursor.execute("SELECT odds_placed, stake FROM bets WHERE bet_id = ?",
                       (bet_id,))
        row = cursor.fetchone()
        if not row:
            conn.close()
            raise ValueError(f"Bet {bet_id} not found")

        odds, stake = row

        # Calculate profit/loss
        if result == 'win':
            if odds > 0:
                profit_loss = stake * odds / 100
            else:
                profit_loss = stake * 100 / abs(odds)
        elif result == 'loss':
            profit_loss = -stake
        elif result == 'push':
            profit_loss = 0
        elif result == 'void':
            profit_loss = 0
        else:
            raise ValueError(f"Invalid result: {result}")

        cursor.execute("""
            UPDATE bets SET
                result = ?,
                profit_loss = ?,
                odds_closing = COALESCE(?, odds_closing),
                spread_closing = COALESCE(?, spread_closing),
                updated_at = CURRENT_TIMESTAMP
            WHERE bet_id = ?
        """, (result, profit_loss, odds_closing, spread_closing, bet_id))

        conn.commit()
        conn.close()

    def get_performance_summary(
        self,
        sport: Optional[str] = None,
        market: Optional[str] = None,
        sportsbook: Optional[str] = None,
        date_from: Optional[str] = None,
        date_to: Optional[str] = None
    ) -> dict:
        """
        Generate a comprehensive performance summary with optional filters.

        Returns
        -------
        dict with performance metrics
        """
        conn = sqlite3.connect(self.db_path)

        query = "SELECT * FROM bets WHERE result IS NOT NULL"
        params = []

        if sport:
            query += " AND sport = ?"
            params.append(sport)
        if market:
            query += " AND market = ?"
            params.append(market)
        if sportsbook:
            query += " AND sportsbook = ?"
            params.append(sportsbook)
        if date_from:
            query += " AND date_placed >= ?"
            params.append(date_from)
        if date_to:
            query += " AND date_placed <= ?"
            params.append(date_to)

        df = pd.read_sql_query(query, conn, params=params)
        conn.close()

        if len(df) == 0:
            return {'error': 'No bets found matching criteria'}

        # Core metrics
        total_bets = len(df)
        wins = (df['result'] == 'win').sum()
        losses = (df['result'] == 'loss').sum()
        pushes = (df['result'] == 'push').sum()

        total_staked = df['stake'].sum()
        total_profit = df['profit_loss'].sum()
        roi = total_profit / total_staked if total_staked > 0 else 0

        # CLV metrics (where closing odds are available)
        clv_df = df.dropna(subset=['odds_closing'])
        if len(clv_df) > 0:
            clv_values = []
            for _, row in clv_df.iterrows():
                placed_implied = american_to_implied_prob(row['odds_placed'])
                closing_implied = american_to_implied_prob(row['odds_closing'])
                clv = closing_implied - placed_implied
                clv_values.append(clv)
            mean_clv = np.mean(clv_values)
            clv_n = len(clv_values)
        else:
            mean_clv = None
            clv_n = 0

        # Win rate with 95% Wilson score confidence interval
        z = 1.96
        n = wins + losses  # Exclude pushes
        if n > 0:
            p_hat = wins / n
            denominator = 1 + z**2 / n
            center = (p_hat + z**2 / (2*n)) / denominator
            spread = z * np.sqrt(
                (p_hat * (1 - p_hat) + z**2 / (4*n)) / n
            ) / denominator
            win_rate_ci = (center - spread, center + spread)
        else:
            p_hat = 0
            win_rate_ci = (0, 0)

        return {
            'total_bets': total_bets,
            'wins': wins,
            'losses': losses,
            'pushes': pushes,
            'win_rate': p_hat,
            'win_rate_ci_95': win_rate_ci,
            'total_staked': total_staked,
            'total_profit': total_profit,
            'roi': roi,
            'roi_pct': roi * 100,
            'mean_clv': mean_clv,
            'clv_bets_tracked': clv_n,
            'avg_odds': df['odds_placed'].mean(),
            'avg_stake': df['stake'].mean(),
            'max_win': df['profit_loss'].max(),
            'max_loss': df['profit_loss'].min(),
            'longest_win_streak': _longest_streak(df, 'win'),
            'longest_loss_streak': _longest_streak(df, 'loss'),
        }

def _longest_streak(df: pd.DataFrame, result_type: str) -> int:
    """Calculate the longest streak of a given result type."""
    max_streak = 0
    current_streak = 0
    for result in df['result']:
        if result == result_type:
            current_streak += 1
            max_streak = max(max_streak, current_streak)
        else:
            current_streak = 0
    return max_streak

def american_to_implied_prob(odds: float) -> float:
    """Convert American odds to implied probability."""
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    else:
        return 100 / (odds + 100)
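
A brief usage sketch tying the pieces together (the odds, stake, and result mirror the tracking-table example above):

tracker = BetTracker(db_path="bet_journal.db")

bet_id = tracker.add_bet(Bet(
    sport="NFL", event="Patriots vs Bills", market="spread",
    selection="Patriots +3", odds_placed=-108, stake=250.0,
    sportsbook="BookD", your_probability=0.55,
))

# Settle the bet once the game finishes and the closing line is known
tracker.update_result(bet_id, result="win", odds_closing=-114)

summary = tracker.get_performance_summary(sport="NFL")
print(f"ROI: {summary['roi_pct']:.2f}%, mean CLV: {summary['mean_clv']:+.4f}")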

Building a Betting Journal: Beyond Raw Numbers

A bet tracker captures the quantitative data, but a betting journal captures the qualitative reasoning. For each bet, record:

  1. Pre-bet thesis: Why do you believe this bet has value? What specific factor does the market undervalue?
  2. Counter-arguments: What could make this bet wrong? What is the strongest argument against your position?
  3. Key variables: What information, if it changes, would cause you to reverse your opinion?
  4. Post-event review: Was your thesis correct? Did you win/lose for the right reasons?

@dataclass
class JournalEntry:
    """A qualitative journal entry accompanying a bet."""
    bet_id: int
    pre_bet_thesis: str
    counter_arguments: str
    key_variables: str
    confidence_reasoning: str
    post_event_review: Optional[str] = None
    lessons_learned: Optional[str] = None

# Example journal entry
entry = JournalEntry(
    bet_id=142,
    pre_bet_thesis=(
        "Patriots +3 has value because the market overweights the Bills' "
        "recent offensive output, which came against weak defenses (JAX, NYG). "
        "The Patriots defense ranks top-5 in pressure rate and the Bills' "
        "O-line has been graded poorly by PFF in the last 4 weeks."
    ),
    counter_arguments=(
        "Bills have won 6 straight, including covering in 4. Josh Allen "
        "historically plays well in cold weather. Patriots offense has been "
        "anemic, averaging 14 ppg in the last 3."
    ),
    key_variables=(
        "Wind speed (if >20 mph, favors Under more than side bet). "
        "Trent Brown availability for the Bills OL. "
        "Patriots' WR1 status (questionable with hamstring)."
    ),
    confidence_reasoning=(
        "Medium-High confidence. The defensive matchup angle is strong and "
        "quantifiable. My model shows +3.5 as fair spread. Getting +3 at -105 "
        "is marginal but above threshold."
    ),
)

Callout: The Reviewing Habit

Set aside time weekly to review your bets -- not just the results, but your reasoning. The most valuable learning happens when you correctly identified value but lost (bad luck, good process) or when you incorrectly identified value but won (good luck, bad process). Track how often your pre-bet thesis was validated by post-game analysis. This meta-tracking is what separates the bettor who improves over time from the one who stays stagnant.
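
One lightweight way to quantify that meta-tracking: after each post-event review, mark whether the thesis held up, then cross-tabulate against results. A minimal sketch (the 'result' and 'thesis_validated' fields are assumptions, filled in during review):

def process_review_tally(entries: list) -> dict:
    """Cross-tabulate bet results against thesis validation.

    Each entry is a dict with 'result' ('win' or 'loss') and
    'thesis_validated' (True or False).
    """
    tally = {'good_process_win': 0, 'good_process_loss': 0,
             'bad_process_win': 0, 'bad_process_loss': 0}
    for e in entries:
        process = 'good' if e['thesis_validated'] else 'bad'
        tally[f"{process}_process_{e['result']}"] += 1
    return tally

# 'bad_process_win' (good luck, bad process) is the category to watch
print(process_review_tally([
    {'result': 'win', 'thesis_validated': True},
    {'result': 'win', 'thesis_validated': False},
    {'result': 'loss', 'thesis_validated': True},
]))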


13.4 Evaluating Your Edge Over Time

The Sample Size Problem

Sports betting has inherently high variance. Even a bettor with a genuine 3% ROI edge will experience substantial swings:

import numpy as np
from scipy import stats

def required_sample_size(
    true_roi: float,
    avg_odds: float = -110,
    confidence: float = 0.95,
    power: float = 0.80
) -> int:
    """
    Calculate the number of bets needed to statistically confirm an edge.

    Uses a one-sample z-test framework where the null hypothesis
    is ROI = 0 and the alternative is ROI = true_roi.

    Parameters
    ----------
    true_roi : float
        Your true ROI (e.g., 0.03 for 3%)
    avg_odds : float
        Average American odds of your bets
    confidence : float
        Desired confidence level (1 - alpha)
    power : float
        Desired statistical power (1 - beta)

    Returns
    -------
    int
        Minimum number of bets required
    """
    # Standard deviation of a single bet's return
    # For a -110 bet: win +$90.91 or lose -$100 on a $100 bet
    if avg_odds < 0:
        win_payout = 100 / abs(avg_odds)
    else:
        win_payout = avg_odds / 100

    # Approximate win probability given the ROI
    # Expected return = p * win_payout - (1-p) = ROI
    # p * (win_payout + 1) - 1 = ROI
    # p = (1 + ROI) / (1 + win_payout)
    p = (1 + true_roi) / (1 + win_payout)

    # Variance of a single bet's return (per $1 staked)
    variance = p * win_payout**2 + (1-p) * 1 - true_roi**2
    sigma = np.sqrt(variance)

    # Required sample size (z-test)
    z_alpha = stats.norm.ppf(1 - (1 - confidence) / 2)
    z_beta = stats.norm.ppf(power)

    n = ((z_alpha + z_beta) * sigma / true_roi) ** 2

    return int(np.ceil(n))

# How many bets to confirm various edges?
for roi in [0.01, 0.02, 0.03, 0.05, 0.08, 0.10]:
    n = required_sample_size(roi)
    print(f"  ROI = {roi*100:.0f}%: {n:>6,} bets needed")

Expected output:

  ROI = 1%: 71,274 bets needed
  ROI = 2%: 17,795 bets needed
  ROI = 3%:  7,897 bets needed
  ROI = 5%:  2,833 bets needed
  ROI = 8%:  1,099 bets needed
  ROI = 10%:    699 bets needed

These numbers are sobering. A bettor with a solid 3% ROI edge needs nearly 8,000 bets at standard -110 juice to confirm their edge at 95% confidence with 80% power. At 5 bets per day, that is more than four years of betting.

Confidence Intervals on ROI

Rather than waiting for statistical significance, a more practical approach is to track your confidence interval on ROI and watch it narrow over time:

def roi_confidence_interval(
    bets_df: pd.DataFrame,
    confidence: float = 0.95,
    method: str = 'bootstrap'
) -> dict:
    """
    Calculate confidence interval on ROI.

    Parameters
    ----------
    bets_df : pd.DataFrame
        Must have 'stake' and 'profit_loss' columns
    confidence : float
        Confidence level
    method : str
        'normal' for normal approximation, 'bootstrap' for bootstrap

    Returns
    -------
    dict with ROI estimate and confidence interval
    """
    stakes = bets_df['stake'].values
    profits = bets_df['profit_loss'].values
    returns = profits / stakes  # Per-bet return

    roi = profits.sum() / stakes.sum()
    n = len(returns)

    if method == 'normal':
        se = returns.std(ddof=1) / np.sqrt(n)
        z = stats.norm.ppf((1 + confidence) / 2)
        ci = (roi - z * se, roi + z * se)

    elif method == 'bootstrap':
        n_bootstrap = 10000
        bootstrap_rois = []
        for _ in range(n_bootstrap):
            idx = np.random.choice(n, size=n, replace=True)
            boot_roi = profits[idx].sum() / stakes[idx].sum()
            bootstrap_rois.append(boot_roi)

        alpha = (1 - confidence) / 2
        ci = (
            np.percentile(bootstrap_rois, alpha * 100),
            np.percentile(bootstrap_rois, (1 - alpha) * 100)
        )

    return {
        'roi': roi,
        'roi_pct': roi * 100,
        'ci_lower': ci[0],
        'ci_upper': ci[1],
        'ci_lower_pct': ci[0] * 100,
        'ci_upper_pct': ci[1] * 100,
        'n_bets': n,
        'total_staked': stakes.sum(),
        'total_profit': profits.sum(),
    }
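
A usage sketch on a synthetic betting record (all numbers hypothetical; a real call would pass your tracked bets):

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 400
wins = rng.random(n) < 0.54  # ~54% win rate at -110, roughly a 3% ROI
bets = pd.DataFrame({
    'stake': np.full(n, 100.0),
    'profit_loss': np.where(wins, 100 * 100 / 110, -100.0),
})

result = roi_confidence_interval(bets, method='bootstrap')
print(f"ROI: {result['roi_pct']:.2f}% "
      f"(95% CI: [{result['ci_lower_pct']:.2f}%, {result['ci_upper_pct']:.2f}%])")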

Regression to the Mean: Understanding Variance

A critical concept for evaluating performance is regression to the mean. Early results are heavily influenced by variance, and extreme early performance (both good and bad) tends to moderate over time.

def simulate_regression_to_mean(
    true_roi: float = 0.03,
    n_bets: int = 2000,
    n_simulations: int = 5000,
    checkpoints: list = None
) -> pd.DataFrame:
    """
    Simulate how observed ROI converges to true ROI over time.

    Parameters
    ----------
    true_roi : float
        True underlying ROI
    n_bets : int
        Maximum number of bets to simulate
    n_simulations : int
        Number of simulation paths
    checkpoints : list of int
        Bet counts at which to record statistics

    Returns
    -------
    pd.DataFrame with statistics at each checkpoint
    """
    if checkpoints is None:
        checkpoints = [50, 100, 200, 500, 1000, 2000]

    # Win probability for -110 bets with given ROI
    win_payout = 100 / 110  # ~0.909
    p_win = (1 + true_roi) / (1 + win_payout)

    results = []
    for cp in checkpoints:
        observed_rois = []
        for _ in range(n_simulations):
            outcomes = np.random.binomial(1, p_win, cp)
            profit = outcomes.sum() * win_payout - (1 - outcomes).sum()
            observed_roi = profit / cp
            observed_rois.append(observed_roi)

        observed_rois = np.array(observed_rois)
        results.append({
            'n_bets': cp,
            'mean_roi': observed_rois.mean(),
            'std_roi': observed_rois.std(),
            'pct_profitable': (observed_rois > 0).mean() * 100,
            'pct_within_1pct': (
                (observed_rois > true_roi - 0.01) &
                (observed_rois < true_roi + 0.01)
            ).mean() * 100,
            'worst_5pct': np.percentile(observed_rois, 5) * 100,
            'best_5pct': np.percentile(observed_rois, 95) * 100,
        })

    return pd.DataFrame(results)

# Run and display
convergence = simulate_regression_to_mean(true_roi=0.03)
print("Regression to Mean: 3% True ROI Bettor at -110")
print(f"{'Bets':>6} | {'Mean ROI':>9} | {'Std Dev':>8} | "
      f"{'% Profitable':>13} | {'5th %ile':>9} | {'95th %ile':>10}")
print("-" * 70)
for _, row in convergence.iterrows():
    print(f"{row['n_bets']:>6.0f} | {row['mean_roi']*100:>8.2f}% | "
          f"{row['std_roi']*100:>7.2f}% | {row['pct_profitable']:>12.1f}% | "
          f"{row['worst_5pct']:>8.1f}% | {row['best_5pct']:>9.1f}%")

Expected output:

Regression to Mean: 3% True ROI Bettor at -110
  Bets |  Mean ROI |  Std Dev | % Profitable |  5th %ile | 95th %ile
----------------------------------------------------------------------
    50 |     3.01% |   13.60% |         58.8% |    -19.5% |      25.5%
   100 |     3.00% |    9.62% |         62.1% |    -12.8% |      18.8%
   200 |     3.00% |    6.80% |         67.0% |     -8.2% |      14.2%
   500 |     3.01% |    4.30% |         75.7% |     -4.1% |       10.1%
  1000 |     3.00% |    3.04% |         83.8% |     -2.0% |       8.0%
  2000 |     3.00% |    2.15% |         91.8% |     -0.5% |       6.5%

Notice that even after 200 bets, a 3% ROI bettor is only profitable 67% of the time in simulation. This is why patience and process-focus (CLV) are so important.

Multi-Dimensional Performance Evaluation

Evaluate performance across multiple dimensions simultaneously:

def comprehensive_evaluation(tracker: BetTracker) -> dict:
    """
    Perform multi-dimensional performance evaluation.

    Returns performance broken down by:
    - Sport
    - Market type
    - Sportsbook
    - Day of week
    - Odds range
    - Confidence level
    """
    conn = sqlite3.connect(tracker.db_path)
    df = pd.read_sql_query(
        "SELECT * FROM bets WHERE result IS NOT NULL", conn
    )
    conn.close()

    if len(df) == 0:
        return {}

    evaluations = {}

    # By sport
    sport_perf = {}
    for sport in df['sport'].unique():
        subset = df[df['sport'] == sport]
        sport_perf[sport] = {
            'n_bets': len(subset),
            'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
            'win_rate': (subset['result'] == 'win').mean(),
        }
    evaluations['by_sport'] = sport_perf

    # By market type
    market_perf = {}
    for market in df['market'].unique():
        subset = df[df['market'] == market]
        market_perf[market] = {
            'n_bets': len(subset),
            'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
            'win_rate': (subset['result'] == 'win').mean(),
        }
    evaluations['by_market'] = market_perf

    # By odds range
    df['odds_bucket'] = pd.cut(
        df['odds_placed'],
        bins=[-500, -200, -150, -120, -100, 100, 120, 150, 200, 500],
        labels=['Heavy Fav', 'Big Fav', 'Med Fav', 'Slight Fav', 'Pick',
                'Slight Dog', 'Med Dog', 'Big Dog', 'Heavy Dog']
    )
    odds_perf = {}
    for bucket in df['odds_bucket'].dropna().unique():
        subset = df[df['odds_bucket'] == bucket]
        if len(subset) >= 10:
            odds_perf[str(bucket)] = {
                'n_bets': len(subset),
                'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
                'win_rate': (subset['result'] == 'win').mean(),
            }
    evaluations['by_odds_range'] = odds_perf

    # By confidence level
    if df['confidence'].notna().any():
        conf_perf = {}
        for conf in df['confidence'].dropna().unique():
            subset = df[df['confidence'] == conf]
            conf_perf[conf] = {
                'n_bets': len(subset),
                'roi': subset['profit_loss'].sum() / subset['stake'].sum(),
                'win_rate': (subset['result'] == 'win').mean(),
            }
        evaluations['by_confidence'] = conf_perf

    return evaluations


13.5 When Markets Correct: Adapting Your Approach

The Lifecycle of an Edge

Every edge in sports betting has a lifecycle:

  1. Discovery: You or your model identifies a market inefficiency
  2. Exploitation: You profit from the inefficiency
  3. Correction: The market learns and the inefficiency narrows or disappears
  4. Adaptation: You find new inefficiencies or refine your approach

Understanding this lifecycle is critical because no edge lasts forever. The sports betting market is an adversarial environment where:

  • Sportsbooks hire quantitative analysts and use machine learning to improve their lines
  • Other sharp bettors discover and exploit the same inefficiencies
  • Data that was once hard to obtain becomes widely available
  • Regulatory changes alter market dynamics

Detecting Edge Decay

You can detect when your edge is decaying by monitoring several metrics over time:

def detect_edge_decay(
    bets_df: pd.DataFrame,
    window_size: int = 100,
    min_windows: int = 5
) -> dict:
    """
    Detect whether your betting edge is decaying over time
    using rolling window analysis.

    Parameters
    ----------
    bets_df : pd.DataFrame
        Chronologically ordered bets with 'profit_loss', 'stake',
        and optionally 'odds_placed', 'odds_closing' columns
    window_size : int
        Number of bets per rolling window
    min_windows : int
        Minimum number of complete windows required

    Returns
    -------
    dict with decay analysis
    """
    df = bets_df.sort_values('date_placed').reset_index(drop=True)
    n = len(df)

    if n < window_size * min_windows:
        return {'error': f'Need at least {window_size * min_windows} bets'}

    # Rolling ROI
    rolling_roi = []
    rolling_clv = []

    for start in range(0, n - window_size + 1, window_size // 2):
        end = start + window_size
        window = df.iloc[start:end]

        roi = window['profit_loss'].sum() / window['stake'].sum()
        rolling_roi.append({
            'window_start': start,
            'window_end': end,
            'roi': roi,
        })

        # Rolling CLV if available
        if 'odds_closing' in window.columns:
            clv_window = window.dropna(subset=['odds_closing'])
            if len(clv_window) > 0:
                clv_values = []
                for _, row in clv_window.iterrows():
                    placed = american_to_implied_prob(row['odds_placed'])
                    closing = american_to_implied_prob(row['odds_closing'])
                    clv_values.append(closing - placed)
                rolling_clv.append({
                    'window_start': start,
                    'window_end': end,
                    'mean_clv': np.mean(clv_values),
                })

    roi_df = pd.DataFrame(rolling_roi)

    # Linear regression on rolling ROI to detect trend
    from scipy.stats import linregress
    x = np.arange(len(roi_df))
    slope, intercept, r_value, p_value, std_err = linregress(
        x, roi_df['roi'].values
    )

    # Interpretation
    if slope < -0.001 and p_value < 0.10:
        decay_status = "SIGNIFICANT DECAY DETECTED"
        recommendation = (
            "Your edge appears to be declining over time. "
            "Consider revising your model, exploring new markets, "
            "or adjusting your approach."
        )
    elif slope < 0:
        decay_status = "MILD DECLINE (not statistically significant)"
        recommendation = (
            "There is a slight downward trend, but it could be due to "
            "normal variance. Continue monitoring."
        )
    else:
        decay_status = "NO DECAY DETECTED"
        recommendation = (
            "Your edge appears stable or improving. Continue current approach."
        )

    # CLV decay analysis
    clv_analysis = None
    if rolling_clv:
        clv_df = pd.DataFrame(rolling_clv)
        clv_slope, _, clv_r, clv_p, _ = linregress(
            np.arange(len(clv_df)), clv_df['mean_clv'].values
        )
        clv_analysis = {
            'clv_trend_slope': clv_slope,
            'clv_trend_p_value': clv_p,
            'clv_declining': clv_slope < 0 and clv_p < 0.10,
        }

    return {
        'n_windows': len(roi_df),
        'window_size': window_size,
        'roi_trend_slope': slope,
        'roi_trend_r_squared': r_value**2,
        'roi_trend_p_value': p_value,
        'decay_status': decay_status,
        'recommendation': recommendation,
        'clv_analysis': clv_analysis,
        'rolling_roi_data': roi_df,
    }

Common Causes of Edge Decay

| Cause | Signal | Response |
|---|---|---|
| Model parameters stale | CLV declining, ROI declining | Retrain model on recent data |
| Market became more efficient | CLV near zero, fewer outlier lines | Move to less efficient markets |
| Sportsbook adjusted their model | Lines closer to sharp market | Find new books with pricing gaps |
| New data source widely available | Everyone has what you had | Find new, unique data sources |
| Rule change in sport | Historical patterns no longer apply | Update model for new rules |
| Account limited/restricted | Can't get desired stakes | Open accounts at new books |

Strategies for Staying Ahead

1. Diversify Across Markets and Sports

Don't rely on a single edge source. Maintain edges across:

  • Multiple sports (NFL, NBA, MLB, NHL, soccer)
  • Multiple market types (sides, totals, props, futures)
  • Multiple timeframes (pre-game, live, futures)

2. Continuously Update Your Models

class AdaptiveModel:
    """
    A model framework that continuously learns and adapts.

    Uses an expanding window approach where the model is periodically
    retrained on all available data, with more weight on recent observations.
    """

    def __init__(
        self,
        base_model,
        retrain_frequency: int = 100,
        recency_weight: float = 0.7
    ):
        """
        Parameters
        ----------
        base_model : sklearn-compatible model
            The underlying prediction model
        retrain_frequency : int
            Retrain after this many new observations
        recency_weight : float
            Weight given to last season's data vs. all historical (0-1)
        """
        self.base_model = base_model
        self.retrain_frequency = retrain_frequency
        self.recency_weight = recency_weight
        self.training_data = []
        self.n_since_retrain = 0
        self.version = 0

    def add_observation(self, features: np.ndarray, outcome: float):
        """Add a new observation and retrain if threshold reached."""
        self.training_data.append((features, outcome))
        self.n_since_retrain += 1

        if self.n_since_retrain >= self.retrain_frequency:
            self.retrain()

    def retrain(self):
        """Retrain the model on all available data with recency weighting."""
        if len(self.training_data) < 50:
            return

        X = np.array([obs[0] for obs in self.training_data])
        y = np.array([obs[1] for obs in self.training_data])

        # Create sample weights: more weight on recent data
        n = len(y)
        recency = np.linspace(1 - self.recency_weight, 1.0, n)
        weights = recency / recency.sum() * n

        self.base_model.fit(X, y, sample_weight=weights)
        self.version += 1
        self.n_since_retrain = 0

    def predict(self, features: np.ndarray) -> float:
        """Generate probability prediction."""
        if self.version == 0:
            return 0.5  # No training yet
        return self.base_model.predict_proba(features.reshape(1, -1))[0, 1]

3. Monitor the Information Ecosystem

Stay aware of what information is becoming publicly available. When a formerly proprietary dataset (e.g., player tracking data, advanced metrics) becomes public, the edge from that data diminishes rapidly.

4. Build Process, Not Just Models

The most sustainable edge comes from better processes:

  • Faster data pipeline: Getting information before others
  • Better bet execution: Lower latency in placing bets
  • Superior bankroll management: Surviving drawdowns that bust competitors
  • Disciplined tracking: Learning from mistakes systematically

Callout: The "Red Queen" Effect

In Lewis Carroll's Through the Looking-Glass, the Red Queen tells Alice: "It takes all the running you can do, to keep in the same place." Sports betting is similar. The market is constantly improving, and maintaining your edge requires continuous effort. The bettor who rests on their laurels will find their edge evaporating. The bettor who continuously learns, adapts, and innovates will find new edges as old ones close. This is why process matters more than any single model.

Practical Adaptation Workflow

Here is a quarterly review process for maintaining your edge:

def quarterly_review(tracker: BetTracker, quarter_start: str, quarter_end: str):
    """
    Perform a structured quarterly review of betting performance.

    This generates a comprehensive report that guides adaptation.
    """
    # 1. Overall performance
    overall = tracker.get_performance_summary(
        date_from=quarter_start, date_to=quarter_end
    )

    print("=" * 60)
    print(f"QUARTERLY REVIEW: {quarter_start} to {quarter_end}")
    print("=" * 60)
    print(f"\nOverall: {overall['total_bets']} bets, "
          f"ROI: {overall['roi_pct']:.2f}%, "
          f"Profit: ${overall['total_profit']:.2f}")

    # 2. Performance by segment
    for sport in ['NFL', 'NBA', 'MLB', 'NHL']:
        segment = tracker.get_performance_summary(
            sport=sport, date_from=quarter_start, date_to=quarter_end
        )
        if 'error' not in segment:
            print(f"\n  {sport}: {segment['total_bets']} bets, "
                  f"ROI: {segment['roi_pct']:.2f}%")

    # 3. CLV analysis
    if overall.get('mean_clv') is not None:
        clv_pct = overall['mean_clv'] * 100
        print(f"\nMean CLV: {clv_pct:+.2f}% "
              f"({overall['clv_bets_tracked']} bets tracked)")

        if clv_pct < 0:
            print("  WARNING: Negative CLV suggests your betting process "
                  "may not have an edge.")
        elif clv_pct < 1:
            print("  CAUTION: Low CLV. Edge may be marginal or decaying.")
        else:
            print("  POSITIVE: Strong CLV indicates a viable edge.")

    # 4. Adaptation recommendations
    print("\n--- ADAPTATION CHECKLIST ---")
    print("[ ] Review and retrain models on latest data")
    print("[ ] Check for newly available data sources")
    print("[ ] Assess which markets/sports had best/worst CLV")
    print("[ ] Review sportsbook account status (limits, bans)")
    print("[ ] Update bankroll allocation across sports")
    print("[ ] Evaluate any rule changes in tracked sports")
    print("[ ] Review bet sizing discipline (actual vs. recommended)")



13.6 Chapter Summary

This chapter developed a comprehensive framework for value betting -- the systematic identification and exploitation of positive expected value opportunities in sports betting markets.

True Probability Estimation:

  • Model-based approaches (Elo, logistic regression, etc.) estimate probability from features
  • Market-based approaches use sharp closing lines as the best available probability estimate
  • The Bayesian framework combines model and market information, weighted by confidence in each
  • Calibration analysis verifies that your probability estimates are accurate

Systematic Value Identification:

  • Value exists when your estimated true probability exceeds the implied probability from the odds
  • Edge thresholds should account for estimation uncertainty, market efficiency, and variance
  • Multi-factor scoring incorporates edge size, track record, market efficiency, liquidity, diversification, and timing
  • Confidence in your edge should modulate stake size through the Kelly criterion

Bet Tracking and Journaling:

  • Track every bet with complete metadata: odds, closing line, stake, result, model, and reasoning
  • A qualitative journal captures the pre-bet thesis, counter-arguments, and post-event review
  • Systematic tracking enables performance analysis across sports, markets, books, and time periods

Performance Evaluation:

  • The sample size problem is severe: confirming a 3% edge requires nearly 8,000 bets
  • CLV provides a faster signal of edge than raw profit/loss
  • Confidence intervals on ROI should be calculated and monitored over time
  • Regression to the mean means early results are unreliable -- trust the process

Market Adaptation:

  • Every edge has a lifecycle: discovery, exploitation, correction, adaptation
  • Detect edge decay through rolling window analysis of ROI and CLV
  • Stay ahead through diversification, model updating, information monitoring, and process improvement
  • Quarterly reviews provide structured opportunities to reassess and adapt

In Chapter 14, we will take the bankroll management concepts introduced in Chapter 4 to an advanced level, with rigorous Kelly criterion derivations, portfolio theory applications, and sophisticated multi-account allocation strategies.


Exercises

Exercise 13.1: Build an Elo model for a sport of your choice using at least 3 seasons of historical data. Calculate the model's Brier score and compare it to a naive baseline that always predicts the home team wins with 55% probability.

Exercise 13.2: Using the bayesian_probability_combination function, explore how the combined probability changes as you vary the model confidence weight from 0.1 to 0.9. Plot the results for a case where your model says 60% and the market says 50%.

Exercise 13.3: Create a BetTracker database and populate it with at least 50 synthetic bets. Use the get_performance_summary function to generate reports by sport and market type.

Exercise 13.4: Write a simulation that generates 2,000 bets for a bettor with a 2.5% true ROI, then applies the detect_edge_decay function. Introduce a gradual decay in the bettor's edge starting at bet 1,000 (reducing from 2.5% to 0.5% over the next 1,000 bets). Does the function detect the decay?

Exercise 13.5: Implement a "model comparison" framework that takes probability estimates from two different models and determines which model has better calibration and CLV on a historical dataset of at least 200 games.