Case Study: Closing Line Value --- The Ultimate Edge Metric
Chapter 3, Case Study 2
Estimated reading time: 25 minutes
Code reference: code/case-study-code.py (Section 2)
Prerequisites: Chapter 3 core material, Case Study 1 recommended, basic Python and pandas
Introduction
In Case Study 1, we followed a bettor through 1,000 bets and observed how positive expected value plays out in practice --- slowly, noisily, and with stomach-churning drawdowns. One of the most vexing problems that case study raised but did not resolve is the diagnostic problem: how does a bettor know, in real time, whether their model is actually producing positive EV bets? Results accumulate slowly. Variance obscures the signal. A bettor could be underwater for months despite having a genuine edge, or profitable for months despite having none.
This case study introduces the single most powerful diagnostic metric available to a sports bettor: Closing Line Value (CLV). CLV measures whether the odds at which you placed your bet were better than the final odds available at the market close (typically the moment the game begins). If you consistently beat the closing line, you are almost certainly a long-term winner. If you consistently get worse odds than the closing line, you are almost certainly a long-term loser. No amount of results-based analysis matches CLV's predictive power.
Why the Closing Line Matters
The closing line is the market's final and most informed assessment of the true probability of an event. By the time a line closes, it has absorbed all publicly available information, the bets of sharp and recreational bettors alike, and the adjustments of bookmaker algorithms. Research by academic economists and industry practitioners has consistently shown that closing lines are the most efficient point estimate of true probabilities in sports betting markets.
This means the closing line functions as a benchmark --- a "market truth" against which individual bettor performance can be measured. If you bet Team A at -105 and the line closes at -115, you got a better number than the market's final assessment. You had an informational or timing advantage. If this pattern repeats over hundreds of bets, it is strong evidence that your handicapping process identifies real value.
The foundational insight is this: you do not need to wait for game outcomes to evaluate your edge. CLV provides a real-time, outcome-independent measure of betting skill.
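To make the -105 vs. -115 example above concrete, here is a short sketch that converts American odds to implied probabilities and computes CLV in probability points. The american_to_prob helper is our own illustration, not part of the case-study code:

```python
def american_to_prob(odds: int) -> float:
    """Convert American odds to implied probability."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)

bet_prob = american_to_prob(-105)      # the price you took: ~51.2%
closing_prob = american_to_prob(-115)  # the market's final price: ~53.5%
clv = closing_prob - bet_prob          # positive = you beat the close
print(f"CLV: {clv * 100:+.2f} probability points")
```

This prints a CLV of +2.27 probability points: the bettor locked in a meaningfully better price than the market's final assessment.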
Data Dictionary
We simulate a dataset of 2,000 bets with full line history.
| Column | Type | Description |
|---|---|---|
| bet_id | int | Unique identifier within each bettor profile (1 through 2000) |
| date | datetime | Date the bet was placed |
| sport | str | NFL, NBA, or NCAAB |
| bet_type | str | "spread" or "total" |
| bettor | str | Bettor profile label: "Sharp", "Average", or "Square" |
| opening_odds | float | Decimal odds at market open |
| bet_odds | float | Decimal odds at time of bet placement |
| closing_odds | float | Decimal odds at market close (game start) |
| opening_prob | float | Implied probability at open: 1 / opening_odds |
| bet_prob | float | Implied probability at bet time: 1 / bet_odds |
| closing_prob | float | Implied probability at close: 1 / closing_odds |
| clv_odds | float | bet_odds - closing_odds (positive = beat closing line) |
| clv_prob | float | closing_prob - bet_prob (positive = beat closing line) |
| true_prob | float | Latent true probability (for simulation purposes) |
| result | str | "win" or "loss" |
| stake | float | Amount wagered, fixed at $100 |
| profit | float | Actual profit on the bet |
Synthetic Data Generation
We simulate three classes of bettors within the same dataset to compare their CLV profiles and long-term results.
import numpy as np
import pandas as pd
np.random.seed(2024)
N_BETS = 2000
STAKE = 100.0
def generate_bettor_data(
n_bets: int,
clv_mean: float,
clv_std: float,
label: str,
) -> pd.DataFrame:
"""Generate synthetic betting data for one bettor profile.
Args:
n_bets: Number of bets to simulate.
clv_mean: Mean CLV in probability points (e.g., 0.02 = 2%).
clv_std: Standard deviation of CLV.
label: Bettor label for identification.
Returns:
DataFrame with complete bet records.
"""
dates = pd.date_range(
start="2024-09-01", end="2025-06-30", periods=n_bets
)
# Generate the closing line as the "true market"
# Standard -110 line implies 1.909 decimal odds
closing_odds = np.clip(
np.random.normal(loc=1.91, scale=0.12, size=n_bets),
a_min=1.55, a_max=2.65,
)
closing_prob = 1.0 / closing_odds
# CLV: how much the bettor beats the closing line
clv_prob = np.random.normal(loc=clv_mean, scale=clv_std, size=n_bets)
# Bet odds: derived from closing line minus the CLV advantage
bet_prob = closing_prob - clv_prob  # lower implied prob = better odds
bet_prob = np.clip(bet_prob, 0.30, 0.75)
clv_prob = closing_prob - bet_prob  # recompute after clipping so the stored column matches the data dictionary
bet_odds = 1.0 / bet_prob
# Opening odds: closing line with some random opening offset
opening_offset = np.random.normal(loc=0.0, scale=0.02, size=n_bets)
opening_prob = np.clip(closing_prob + opening_offset, 0.30, 0.75)
opening_odds = 1.0 / opening_prob
# True probability: closing line is a noisy estimate of truth
# The closing line is the best available estimate, with small residual error
true_prob = np.clip(
closing_prob + np.random.normal(0, 0.01, size=n_bets),
0.25, 0.80,
)
# Simulate outcomes
results = np.where(
np.random.uniform(0, 1, size=n_bets) < true_prob, "win", "loss"
)
profits = np.where(
results == "win",
STAKE * (bet_odds - 1),
-STAKE,
)
clv_odds = bet_odds - closing_odds
sports = np.random.choice(
["NFL", "NBA", "NCAAB"], size=n_bets, p=[0.35, 0.40, 0.25]
)
bet_types = np.random.choice(
["spread", "total"], size=n_bets, p=[0.60, 0.40]
)
return pd.DataFrame({
"bet_id": range(1, n_bets + 1),
"date": dates,
"sport": sports,
"bet_type": bet_types,
"bettor": label,
"opening_odds": opening_odds,
"bet_odds": bet_odds,
"closing_odds": closing_odds,
"opening_prob": opening_prob,
"bet_prob": bet_prob,
"closing_prob": closing_prob,
"clv_odds": clv_odds,
"clv_prob": clv_prob,
"true_prob": true_prob,
"result": results,
"stake": STAKE,
"profit": profits,
})
# Three bettor profiles
sharp_bets = generate_bettor_data(N_BETS, clv_mean=0.020, clv_std=0.015, label="Sharp")
average_bets = generate_bettor_data(N_BETS, clv_mean=0.000, clv_std=0.015, label="Average")
square_bets = generate_bettor_data(N_BETS, clv_mean=-0.015, clv_std=0.015, label="Square")
all_bets = pd.concat([sharp_bets, average_bets, square_bets], ignore_index=True)
The three bettor profiles represent distinct archetypes:
- Sharp (CLV mean = +2.0%): Consistently bets into lines before the market moves against them. This bettor has a model or information source that identifies value before the market fully prices it in. Their bets, on average, are placed at odds 2 probability points better than the closing line.
- Average (CLV mean = 0.0%): Gets odds that, on average, are right at the closing line. This bettor is neither ahead of nor behind the market. After accounting for the vig, this bettor is a long-term loser.
- Square (CLV mean = -1.5%): Consistently gets worse odds than the closing line. This bettor tends to bet into lines after value has already been taken by sharper players, or bets based on public sentiment that the market has already incorporated and moved past.
CLV and Profitability: The Core Relationship
The first analysis establishes the fundamental relationship between CLV and bottom-line results.
for bettor_label in ["Sharp", "Average", "Square"]:
subset = all_bets[all_bets["bettor"] == bettor_label]
total_profit = subset["profit"].sum()
total_staked = subset["stake"].sum()
roi = total_profit / total_staked * 100
avg_clv = subset["clv_prob"].mean() * 100
win_rate = (subset["result"] == "win").mean() * 100
print(f"\n{bettor_label} Bettor:")
print(f" Average CLV: {avg_clv:+.2f}%")
print(f" Win Rate: {win_rate:.1f}%")
print(f" Total Profit: ${total_profit:,.0f}")
print(f" ROI: {roi:+.2f}%")
Typical output:
| Bettor | Avg CLV | Win Rate | Total Profit | ROI |
|---|---|---|---|---|
| Sharp | +2.0% | 53.8% | +$3,950 | +1.98% |
| Average | +0.0% | 52.2% | -$2,100 | -1.05% |
| Square | -1.5% | 51.8% | -$5,200 | -2.60% |
The pattern is striking. The Sharp bettor, who consistently beats the closing line by 2 percentage points, is profitable with an ROI near +2%. The Average bettor, who gets odds at the closing line, shows a small loss in this run; against real closing lines, which carry the vig, a zero-CLV bettor bleeds at roughly the rate of the juice (around -4.5% ROI at standard -110 pricing). The Square bettor, who consistently gets worse odds than the closing line, loses at an accelerated rate.
Notice that the win rates are remarkably similar. The difference between the Sharp bettor's 53.8% and the Square bettor's 51.8% is only 2 percentage points. This is nearly invisible in any reasonable sample of bets. But the profit difference is enormous: more than $9,000 over 2,000 bets. The leverage comes from the odds, not the win rate. The Sharp bettor wins bets at better prices, meaning each win pays more and each loss costs the same.
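A quick back-of-the-envelope check makes the price leverage explicit. Holding the win rate fixed, the sign of the expected value flips on price alone. The odds and win rate below are illustrative assumptions, not values taken from the simulation:

```python
STAKE = 100.0
WIN_RATE = 0.53  # identical win rate for both prices

for label, dec_odds in [("better price", 1.95), ("worse price", 1.87)]:
    # EV per bet = P(win) * payout - P(loss) * stake
    ev = WIN_RATE * STAKE * (dec_odds - 1) - (1 - WIN_RATE) * STAKE
    print(f"{label} ({dec_odds}): EV = ${ev:+.2f} per $100 bet")
```

At decimal odds of 1.95 the 53% winner earns about +$3.35 per $100 bet; at 1.87 the same 53% winner loses about $0.89 per bet. Same handicapping, opposite outcomes, purely because of the price obtained.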
Visualizing CLV Distributions
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharey=True)
colors = {"Sharp": "green", "Average": "gray", "Square": "red"}
for ax, label in zip(axes, ["Sharp", "Average", "Square"]):
subset = all_bets[all_bets["bettor"] == label]
ax.hist(subset["clv_prob"] * 100, bins=50, color=colors[label],
alpha=0.7, edgecolor="black", linewidth=0.5)
ax.axvline(x=0, color="black", linewidth=1.5, linestyle="--")
ax.axvline(x=subset["clv_prob"].mean() * 100, color="blue",
linewidth=2, linestyle="-", label=f"Mean: {subset['clv_prob'].mean()*100:+.2f}%")
ax.set_title(f"{label} Bettor CLV Distribution", fontsize=13)
ax.set_xlabel("CLV (probability points %)", fontsize=11)
ax.legend(fontsize=10)
axes[0].set_ylabel("Frequency", fontsize=11)
plt.tight_layout()
plt.savefig("clv_distributions.png", dpi=150)
plt.show()
The distributions overlap substantially. Even the Sharp bettor has individual bets with negative CLV --- roughly 9% of wagers under these simulation parameters, and a far larger share in practice when the edge is smaller or prices are noisier. The edge is in the average, not in every individual bet. This is why single-bet CLV is noisy, but cumulative CLV over hundreds of bets is highly predictive.
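Because clv_prob is drawn from a normal distribution in the generator above, the expected share of negative-CLV bets for each profile follows directly from the normal CDF (ignoring the small effect of clipping):

```python
from scipy.stats import norm

# (mean, std) of clv_prob for each simulated profile
profiles = {
    "Sharp": (0.020, 0.015),
    "Average": (0.000, 0.015),
    "Square": (-0.015, 0.015),
}
for label, (mu, sigma) in profiles.items():
    p_neg = norm.cdf(0.0, loc=mu, scale=sigma)  # P(CLV < 0)
    print(f"{label}: {p_neg * 100:.1f}% of bets expected to have negative CLV")
```

This yields roughly 9% for the Sharp bettor, 50% for the Average bettor, and 84% for the Square bettor, matching what the histograms show.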
Cumulative Profit by CLV Decile
A powerful way to demonstrate CLV's predictive power is to sort all bets by their CLV and group them into deciles.
for bettor_label in ["Sharp", "Average", "Square"]:
subset = all_bets[all_bets["bettor"] == bettor_label].copy()
subset["clv_decile"] = pd.qcut(subset["clv_prob"], q=10, labels=False)
decile_summary = subset.groupby("clv_decile").agg(
avg_clv=("clv_prob", "mean"),
win_rate=("result", lambda x: (x == "win").mean()),
total_profit=("profit", "sum"),
n_bets=("bet_id", "count"),
).reset_index()
decile_summary["roi"] = (
decile_summary["total_profit"] / (decile_summary["n_bets"] * STAKE) * 100
)
print(f"\n{bettor_label} Bettor - CLV Decile Analysis:")
print(decile_summary.to_string(index=False))
Across all three bettor types, the pattern is consistent: higher CLV deciles are more profitable than lower CLV deciles. The relationship is monotonic or nearly so. Bets in the top CLV decile (where the bettor got the best price relative to the closing line) are substantially more profitable than bets in the bottom decile. This holds regardless of whether the bettor is sharp, average, or square.
fig, ax = plt.subplots(figsize=(12, 6))
for label, color in colors.items():
subset = all_bets[all_bets["bettor"] == label].copy()
subset["clv_decile"] = pd.qcut(subset["clv_prob"], q=10, labels=False)
decile_roi = subset.groupby("clv_decile").apply(
lambda x: x["profit"].sum() / (len(x) * STAKE) * 100
)
ax.plot(decile_roi.index, decile_roi.values, marker="o",
color=color, linewidth=2, label=label)
ax.axhline(y=0, color="black", linewidth=0.8, linestyle=":")
ax.set_xlabel("CLV Decile (0 = worst, 9 = best)", fontsize=12)
ax.set_ylabel("ROI (%)", fontsize=12)
ax.set_title("ROI by CLV Decile for Each Bettor Type", fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("roi_by_clv_decile.png", dpi=150)
plt.show()
CLV as a Rolling Diagnostic
One of CLV's greatest practical advantages is that it can be computed in real time, without waiting for game outcomes. A bettor can track their rolling average CLV over, say, the last 100 bets as a health check on their process.
window = 100
fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)
for ax, label, color in zip(axes, ["Sharp", "Average", "Square"],
["green", "gray", "red"]):
subset = all_bets[all_bets["bettor"] == label].reset_index(drop=True)
rolling_clv = subset["clv_prob"].rolling(window=window).mean() * 100
rolling_roi = subset["profit"].rolling(window=window).sum() / (window * STAKE) * 100
ax.plot(subset.index, rolling_clv, color="blue",
linewidth=1.5, label=f"Rolling {window}-bet CLV %")
ax.plot(subset.index, rolling_roi, color=color,
linewidth=1, alpha=0.6, label=f"Rolling {window}-bet ROI %")
ax.axhline(y=0, color="black", linewidth=0.8, linestyle=":")
ax.set_ylabel("Percentage (%)", fontsize=11)
ax.set_title(f"{label} Bettor: Rolling CLV vs ROI", fontsize=13)
ax.legend(fontsize=10, loc="upper right")
ax.grid(True, alpha=0.3)
axes[2].set_xlabel("Bet Number", fontsize=12)
plt.tight_layout()
plt.savefig("rolling_clv_vs_roi.png", dpi=150)
plt.show()
The rolling CLV line is smoother and more stable than the rolling ROI line. This is because CLV is a property of the price obtained, while ROI also depends on game outcomes (which are noisy). For the Sharp bettor, the rolling CLV hovers consistently above zero, providing ongoing confirmation that the model is finding genuine value --- even during stretches where the rolling ROI dips negative due to variance.
This is the practical payoff: CLV tells you whether your process is working before the outcomes do.
Correlation Analysis
We now quantify the relationship between CLV and profitability using correlation and regression.
from scipy import stats
print("Correlation between CLV and Profit (per bettor type):")
for label in ["Sharp", "Average", "Square"]:
subset = all_bets[all_bets["bettor"] == label]
r, p = stats.pearsonr(subset["clv_prob"], subset["profit"])
print(f" {label}: r = {r:.4f}, p-value = {p:.6f}")
The correlation between individual-bet CLV and individual-bet profit is positive but modest (typically r = 0.10 to 0.15). This is expected: on any single bet, the outcome is dominated by the binary win/loss result, not the price. But when we aggregate --- computing the correlation between average CLV and average ROI across, say, 50-bet blocks --- the correlation strengthens dramatically.
block_size = 50
for label in ["Sharp", "Average", "Square"]:
subset = all_bets[all_bets["bettor"] == label].reset_index(drop=True)
n_blocks = len(subset) // block_size
blocks = []
for i in range(n_blocks):
block = subset.iloc[i * block_size:(i + 1) * block_size]
blocks.append({
"block": i,
"avg_clv": block["clv_prob"].mean(),
"roi": block["profit"].sum() / (block_size * STAKE),
})
block_df = pd.DataFrame(blocks)
r, p = stats.pearsonr(block_df["avg_clv"], block_df["roi"])
print(f"{label}: Block-level r = {r:.4f}, p = {p:.6f}")
At the 50-bet block level, the correlation typically jumps to r = 0.40 to 0.60, confirming that CLV is a strong predictor of profitability at aggregated scales.
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, label, color in zip(axes, ["Sharp", "Average", "Square"],
["green", "gray", "red"]):
subset = all_bets[all_bets["bettor"] == label].reset_index(drop=True)
n_blocks = len(subset) // block_size
blocks = []
for i in range(n_blocks):
block = subset.iloc[i * block_size:(i + 1) * block_size]
blocks.append({
"avg_clv": block["clv_prob"].mean() * 100,
"roi": block["profit"].sum() / (block_size * STAKE) * 100,
})
block_df = pd.DataFrame(blocks)
ax.scatter(block_df["avg_clv"], block_df["roi"],
color=color, alpha=0.6, edgecolors="black", linewidth=0.5)
# Regression line
slope, intercept, r, p, se = stats.linregress(
block_df["avg_clv"], block_df["roi"]
)
x_range = np.linspace(block_df["avg_clv"].min(), block_df["avg_clv"].max(), 100)
ax.plot(x_range, slope * x_range + intercept,
color="blue", linewidth=2, linestyle="--",
label=f"r={r:.2f}, slope={slope:.2f}")
ax.axhline(y=0, color="black", linewidth=0.5, linestyle=":")
ax.axvline(x=0, color="black", linewidth=0.5, linestyle=":")
ax.set_xlabel("Average CLV (%)", fontsize=11)
ax.set_ylabel("ROI (%)", fontsize=11)
ax.set_title(f"{label} Bettor: CLV vs ROI (50-bet blocks)", fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("clv_roi_correlation.png", dpi=150)
plt.show()
The Vig Threshold
A subtle but critical point: a bettor who has exactly zero CLV (gets odds at the closing line) is not a break-even bettor. They are a losing bettor, because the closing line includes the vig. To break even after vig, a bettor needs positive CLV that approximately offsets the juice.
At standard -110 lines, each side's implied probability is approximately 52.4%, while the fair probability in a balanced market would be 50%. The vig inflates each side's implied probability by roughly 2.4 percentage points, half of the book's 4.8% total overround. Measured against the raw closing line, a bettor therefore needs roughly that full per-side margin, about +2 to +2.5 percentage points of CLV, just to break even; measured against the no-vig closing line, zero CLV is the break-even point.
# Demonstrate the vig threshold
total_overround = 0.048  # implied probs at -110/-110 sum to ~104.8%
break_even_clv = total_overround / 2  # per-side margin: CLV vs the raw close needed to break even
print(f"Total overround in probability points: {total_overround:.3f}")
print(f"Estimated CLV needed to break even: {break_even_clv:.3f}")
print(f"Sharp bettor average CLV: {sharp_bets['clv_prob'].mean():.3f}")
print(f"Average bettor average CLV: {average_bets['clv_prob'].mean():.3f}")
print(f"Square bettor average CLV: {square_bets['clv_prob'].mean():.3f}")
The Sharp bettor's +2.0% CLV sits near this real-world break-even threshold, the Average bettor's 0.0% CLV falls well short (hence sustained losses against vig-inclusive lines), and the Square bettor's -1.5% CLV puts them even further behind. Note that in our simulation the true probabilities are centered on the closing probabilities, so the simulated closing line is effectively vig-free; that is why the Sharp bettor's +2.0% CLV translates almost directly into the roughly +2% ROI reported earlier.
CLV by Sport and Bet Type
A bettor's CLV may vary systematically across sports and bet types, revealing where their model is strongest and where it may need improvement.
for label in ["Sharp"]:
subset = all_bets[all_bets["bettor"] == label]
breakdown = subset.groupby(["sport", "bet_type"]).agg(
avg_clv=("clv_prob", lambda x: x.mean() * 100),
roi=("profit", lambda x: x.sum() / (len(x) * STAKE) * 100),
n_bets=("bet_id", "count"),
win_rate=("result", lambda x: (x == "win").mean() * 100),
).reset_index()
print(f"\n{label} Bettor - Breakdown by Sport and Bet Type:")
print(breakdown.to_string(index=False))
This segmentation analysis is valuable for process improvement. If a bettor finds that their CLV is strongly positive in NBA spreads but near zero in NFL totals, they might allocate more of their action to the market where their model has the strongest edge.
Practical Implementation: Building a CLV Tracker
For any serious bettor, tracking CLV requires recording two additional data points per bet: the odds at the time of bet placement and the closing odds for the same selection. Most sportsbooks do not make it easy to retrieve closing lines, but several third-party services (such as odds comparison sites and line history databases) archive this data.
The implementation in code/example-03-performance-tracker.py includes a clv_analysis() method that automates this calculation. The key fields to record for each bet are:
- Bet timestamp --- when you placed the bet
- Bet odds --- the exact decimal (or American) odds you received
- Closing odds --- the final odds for the same selection at game time
- Market --- which sportsbook or market you used
With these four fields, CLV can be computed per bet, averaged over any time window, and correlated with outcomes to validate that the relationship holds in your specific betting record.
Limitations of CLV
CLV is powerful but not without caveats.
First, CLV assumes closing lines are efficient. In most major markets (NFL sides, NBA spreads, major soccer leagues), this is a well-supported assumption. In smaller or more exotic markets (minor league sports, obscure props, early-season college lines), closing lines may be less efficient, and CLV becomes a noisier signal.
Second, CLV measures relative value, not absolute truth. A bettor who consistently beats the closing line in a market where the closing line itself is systematically biased might have less real edge than their CLV suggests. This is rare in major markets but possible in niche ones.
Third, CLV does not capture correlation. If a bettor places multiple correlated bets (e.g., the same side across multiple books), the CLV for each individual bet looks the same, but the overall portfolio risk is higher than independent bets would suggest.
Fourth, the vig-free closing line is the proper benchmark. In practice, one should compare the bettor's odds against the no-vig closing line rather than the raw closing line, to account for the fact that closing lines themselves contain vig. The analysis above uses simplified raw comparisons for clarity; the full implementation in the code companion applies the vig-free adjustment.
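One common way to strip the vig from a two-way market is the proportional (multiplicative) method: divide each side's implied probability by the book's total overround. Other devig methods exist (some weight toward the favorite to account for longshot bias); the no_vig_probs helper below is our own sketch of the proportional version, not the companion code's implementation:

```python
def no_vig_probs(dec_odds_a: float, dec_odds_b: float) -> tuple:
    """Remove vig from a two-way market by proportional scaling."""
    qa, qb = 1.0 / dec_odds_a, 1.0 / dec_odds_b
    overround = qa + qb  # sums to > 1.0 when the book charges vig
    return qa / overround, qb / overround

# Standard -110/-110 closing line expressed as decimal odds
fair_a, fair_b = no_vig_probs(1.909, 1.909)
print(f"Raw implied prob per side: {1 / 1.909:.4f}")  # ~0.5238
print(f"No-vig prob per side:      {fair_a:.4f}")     # 0.5000
```

CLV measured against fair_a rather than the raw closing probability removes the vig distortion from the benchmark.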
Key Takeaways
- CLV is the strongest available predictor of long-term betting profitability. Bettors who consistently beat the closing line are profitable; those who do not are not. The relationship is robust across sports, bet types, and sample sizes.
- CLV works because closing lines are efficient. The closing line reflects the market's best estimate of true probability. Beating it means you obtained a price better than the market's consensus, which translates directly into expected profit.
- CLV provides real-time feedback. Unlike win rate or ROI, which require game outcomes and large samples to be meaningful, CLV can be computed the moment a game starts. A bettor can evaluate their process quality within weeks rather than months.
- The vig threshold is real. Zero CLV against the raw closing line is not break-even; it is a losing proposition because the vig is embedded in the line. A bettor needs positive CLV of roughly the per-side margin (about 2.4 probability points at standard -110 pricing, half the total overround) to reach break-even.
- Segment your CLV analysis. Breaking down CLV by sport, bet type, time of bet placement (e.g., early week vs. game day), and market (primary book vs. off-market) reveals where your edge is strongest and where it may not exist at all.
- CLV and results converge over large samples. In the short run, a bettor with strong CLV might still lose money due to variance. But as the sample grows, the correlation between CLV and ROI tightens, and the two metrics converge to tell the same story.
The complete code for this case study, including all three bettor simulations, decile analysis, correlation analysis, and visualizations, is available in code/case-study-code.py.