Case Study 1: Building a Professional Betting Journal from Scratch
Executive Summary
Record-keeping is the unsexy foundation of profitable sports betting. Every serious bettor knows they should keep a journal; far fewer actually maintain one with the discipline and depth required to generate actionable insight. This case study follows David, a software engineer who transitioned from recreational to semi-professional NBA betting, through the twelve-month process of designing, implementing, and iterating on a comprehensive digital betting journal. Starting with a simple Google Sheet that captured seven fields per bet, David progressively expanded his system through three major iterations --- each driven by specific analytical questions his existing journal could not answer. By the end of the year, his journal had evolved into a Python-backed SQLite database capturing 28 fields per bet, integrated with automated performance analysis, CLV computation, and emotional state tracking. The case documents each iteration in detail, including the Python code that powered the system, the specific insights each version unlocked, and the performance impact of those insights. David's experience demonstrates that a journal is not a static tool but a living system that evolves alongside the bettor it serves.
Background
The Bettor Profile
David had been betting casually on NBA games for two years, using a combination of his own observations and a simple regression model that predicted point spreads based on team efficiency ratings, pace, and home court advantage. His model was decent but not exceptional --- backtesting suggested a 2-3% edge on NBA sides when value existed.
For his first two years, David kept no records beyond what his sportsbook's transaction history showed. He believed he was profitable but had no way to verify this, let alone diagnose which aspects of his process were working and which were not. His "system" was to check his model output each morning, compare it to the available lines, and bet when the discrepancy felt large enough. Stake sizing was intuitive --- "more when I feel good about it, less when I don't."
The catalyst for change was a conversation with a professional bettor who asked David three questions he could not answer: "What is your CLV over the last six months?", "Which bet types are your most profitable?", and "How does your performance change when you override your model?" The inability to answer any of these convinced David that his lack of record-keeping was not merely inconvenient --- it was actively costing him money.
Version 1: The Minimum Viable Journal (Months 1-3)
Design Philosophy
David followed the principle of starting simple and expanding based on need. His first journal was a Google Sheet with seven columns:
| Field | Type | Example |
|---|---|---|
| Date | Date | 2024-11-15 |
| Game | Text | Celtics vs. Pacers |
| Selection | Text | Celtics -4.5 |
| Odds | Decimal | 1.909 |
| Stake | Currency | $200 |
| Result | W/L/P | W |
| P&L | Currency | $182 |
Implementation
The sheet required approximately 30 seconds per bet to complete. David entered bets on his phone immediately after placing them and updated results each morning while checking scores.
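Even this seven-field schema supports the core profitability check. A minimal sketch of the arithmetic, assuming the sheet is exported to a DataFrame with lowercase column names (the names and sample rows here are illustrative, not David's actual data):

```python
import pandas as pd

# Hypothetical export of the seven-field sheet; column names are assumptions.
bets = pd.DataFrame({
    "date": ["2024-11-15", "2024-11-16", "2024-11-17"],
    "selection": ["Celtics -4.5", "Pacers +6.0", "Knicks -2.5"],
    "odds": [1.909, 1.952, 1.870],
    "stake": [200.0, 150.0, 200.0],
    "result": ["W", "L", "W"],
    # P&L for a win at decimal odds is stake * (odds - 1).
    "profit_loss": [181.8, -150.0, 174.0],
})

roi = bets["profit_loss"].sum() / bets["stake"].sum() * 100
win_rate = (bets["result"] == "W").mean() * 100

print(f"Bets: {len(bets)}, win rate: {win_rate:.1f}%, ROI: {roi:+.2f}%")
```

Note that ROI here is total profit divided by total amount staked, not by bankroll, which is why a 65% win rate at near-even odds can still translate into a single-digit ROI.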
Insights from Version 1
After three months and 187 bets, the simple journal revealed several surprises:
- He was betting more than he realized. David had estimated he placed "about 4 bets per week." The actual average was 7.2, with spikes to 15 on busy NBA slates.
- His overall ROI was +1.8%. Positive, but much lower than the 3% his model backtesting suggested. The gap was unexplained.
- Weekend bets underperformed. Saturday and Sunday bets had a -2.1% ROI versus +3.4% for weekday bets. David had no hypothesis for why.
Limitations
The journal could answer "Am I profitable?" but could not answer "Why?" or "Where?" The absence of model output, reasoning, and emotional state data made it impossible to diagnose the ROI gap or the weekend underperformance.
Version 2: Adding Analytical Depth (Months 4-7)
Design Philosophy
David's second iteration added the fields needed to answer diagnostic questions. He migrated from Google Sheets to a Python-backed CSV system, which allowed automated calculations.
New Fields Added
| Field | Purpose |
|---|---|
| Model spread | Enable CLV analysis |
| Opening line | Detect anchoring |
| Closing line | Compute CLV |
| Confidence (1-10) | Calibration analysis |
| Reasoning (free text) | Bias detection |
| Override flag (Y/N) | Override performance analysis |
| Emotional state | Tilt correlation |
| Bankroll at time | Accurate stake percentage |
Key Python Code
"""Version 2 journal with automated CLV computation.
Extends the CSV-based journal with closing line value analysis,
category performance breakdowns, and basic bias detection.
"""
import pandas as pd
import numpy as np
from typing import Dict, Optional
class JournalV2:
"""Enhanced betting journal with analytical capabilities."""
def __init__(self, csv_path: str) -> None:
"""Load journal data from CSV.
Args:
csv_path: Path to the journal CSV file.
"""
self.df = pd.read_csv(csv_path, parse_dates=["date"])
self.df = self.df.sort_values("date").reset_index(drop=True)
self._prepare_fields()
def _prepare_fields(self) -> None:
"""Compute derived fields."""
self.df["won"] = (self.df["result"] == "W").astype(int)
self.df["stake_pct"] = (
self.df["stake"] / self.df["bankroll"] * 100
)
# CLV: difference between your odds and closing odds
# Positive means you got a better price than the close
if "closing_odds" in self.df.columns:
self.df["clv"] = self.df["odds"] - self.df["closing_odds"]
def compute_clv_summary(self) -> Dict:
"""Compute closing line value metrics."""
if "clv" not in self.df.columns:
return {"error": "No CLV data available."}
clv_data = self.df.dropna(subset=["clv"])
return {
"avg_clv": round(clv_data["clv"].mean(), 4),
"pct_beating_close": round(
(clv_data["clv"] > 0).mean() * 100, 1
),
"clv_positive_roi": round(
clv_data[clv_data["clv"] > 0]["profit_loss"].sum()
/ max(clv_data[clv_data["clv"] > 0]["stake"].sum(), 1)
* 100, 2
),
}
def performance_by_day(self) -> pd.DataFrame:
"""Break down performance by day of week."""
self.df["day_of_week"] = self.df["date"].dt.day_name()
return self.df.groupby("day_of_week").agg(
bets=("result", "count"),
wins=("won", "sum"),
total_staked=("stake", "sum"),
total_pl=("profit_loss", "sum"),
).assign(
win_rate=lambda x: round(x["wins"] / x["bets"] * 100, 1),
roi=lambda x: round(
x["total_pl"] / x["total_staked"] * 100, 2
),
)
def override_analysis(self) -> Dict:
"""Compare model-following vs override performance."""
if "is_override" not in self.df.columns:
return {"error": "No override data."}
overrides = self.df[self.df["is_override"] == True]
following = self.df[self.df["is_override"] == False]
return {
"override_count": len(overrides),
"override_roi": round(
overrides["profit_loss"].sum()
/ max(overrides["stake"].sum(), 1) * 100, 2
) if len(overrides) > 0 else 0.0,
"model_following_roi": round(
following["profit_loss"].sum()
/ max(following["stake"].sum(), 1) * 100, 2
) if len(following) > 0 else 0.0,
}
Insights from Version 2
With four months of enriched data (312 additional bets, 499 total), David discovered:
- The weekend problem was emotional. Saturday bets coincided with social drinking and fatigue. His emotional state ratings on Saturdays averaged 4.2/10 versus 6.8/10 on weekdays. Bets placed when his emotional state was below 5 had a -4.3% ROI.
- His CLV was positive overall (+1.2 cents/dollar) but negative on override bets (-0.8 cents/dollar). His model was better than his gut.
- Overrides destroyed value. Model-following bets yielded +3.1%; override bets yielded -2.4%. The 14% of his bets that were overrides were dragging down his entire portfolio.
- Confirmation bias was visible in reasoning text. Bets with reasoning containing narrative language ("momentum," "revenge game") had -3.8% ROI. Bets with purely quantitative reasoning had +4.1% ROI.
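The narrative-language finding came from scanning the free-text reasoning field for storytelling vocabulary. A minimal sketch of such a scan; the keyword list here is illustrative, not David's actual list:

```python
# Illustrative narrative-language markers; David's actual list is not given.
NARRATIVE_TERMS = {"momentum", "revenge", "due", "streak", "clutch"}

def flag_narrative(reasoning: str) -> bool:
    """Return True if the free-text reasoning contains narrative language."""
    words = reasoning.lower().split()
    return any(term in words for term in NARRATIVE_TERMS)

print(flag_narrative("Revenge game after last month's blowout"))
print(flag_narrative("Model edge 3.1 pts vs market, pace mismatch"))
```

Tagging each bet this way lets the journal compute separate ROI figures for narrative-flagged and quantitative-only reasoning, which is exactly the split that exposed the -3.8% vs. +4.1% gap.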
Version 3: Full Automation (Months 8-12)
Design Philosophy
Version 3 migrated the journal to a SQLite database with a Python command-line interface for data entry, integrated the discipline enforcement system from Chapter 37, and added automated weekly and monthly reporting.
Architecture
The system comprised three components: a JournalDB class for data persistence, a DisciplineGate class that validated every proposed bet before recording, and a ReportEngine class that generated automated performance reports.
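The case study does not reproduce the Version 3 source, but the JournalDB persistence layer might look something like the sketch below. The table name, column set, and method signature are assumptions for illustration (the real schema captured 28 fields):

```python
import sqlite3

class JournalDB:
    """Minimal sketch of the Version 3 persistence layer (illustrative)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        # Abbreviated schema; the real journal captured 28 fields per bet.
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS bets (
                id INTEGER PRIMARY KEY,
                date TEXT NOT NULL,
                selection TEXT NOT NULL,
                odds REAL NOT NULL,
                stake REAL NOT NULL,
                closing_odds REAL,
                is_override INTEGER DEFAULT 0,
                emotional_state INTEGER,
                result TEXT,
                profit_loss REAL
            )"""
        )

    def record_bet(self, **fields) -> int:
        """Insert a bet row and return its row id."""
        cols = ", ".join(fields)
        placeholders = ", ".join("?" for _ in fields)
        cur = self.conn.execute(
            f"INSERT INTO bets ({cols}) VALUES ({placeholders})",
            tuple(fields.values()),
        )
        self.conn.commit()
        return cur.lastrowid

db = JournalDB()
bet_id = db.record_bet(date="2025-03-01", selection="Celtics -4.5",
                       odds=1.909, stake=200.0)
print(f"recorded bet #{bet_id}")
```

Keeping persistence behind one class is what makes the other two components pluggable: the DisciplineGate validates a proposed bet before `record_bet` is ever called, and the ReportEngine reads from the same table.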
Key Insight: The Integration Effect
The most valuable aspect of Version 3 was not any single feature but the integration of journal, discipline, and analysis into a single workflow. Every bet followed the same path:
- David's model flagged an opportunity.
- He entered the proposed bet into the system.
- The discipline gate checked all rules (stake limits, loss limits, CLV expectation).
- If approved, the system prompted for reasoning and emotional state.
- The bet was recorded with all 28 fields populated.
- Results were entered the next morning.
- Weekly and monthly reports ran automatically.
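The discipline-gate check in this workflow can be sketched as a list of rule evaluations, each contributing a rejection reason. The specific thresholds below (2% max stake, 5% daily loss limit) are illustrative stand-ins, not David's actual rules:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProposedBet:
    stake: float
    bankroll: float
    daily_loss: float      # today's realized loss so far, as a positive number
    expected_clv: float    # model line minus market line, in points

def check_bet(bet: ProposedBet) -> Optional[str]:
    """Return a rejection reason, or None if the bet passes every rule.

    Thresholds are illustrative; the real gate enforced David's own limits.
    """
    reasons: List[str] = []
    if bet.stake > 0.02 * bet.bankroll:
        reasons.append("stake exceeds 2% of bankroll")
    if bet.daily_loss > 0.05 * bet.bankroll:
        reasons.append("daily loss limit reached")
    if bet.expected_clv <= 0:
        reasons.append("no expected edge vs the market line")
    return "; ".join(reasons) or None

print(check_bet(ProposedBet(stake=500, bankroll=10_000,
                            daily_loss=0, expected_clv=1.5)))
```

Because the gate runs before the bet is recorded, a rejection never reaches the journal as a placed bet, which is how the system enforces rules rather than merely logging violations.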
This integration eliminated the friction that had caused David to skip journal entries in Version 1. The journal was no longer a separate task; it was part of the betting workflow itself.
Results After Twelve Months
David's twelve-month journey produced measurable improvement across every dimension:
| Metric | Before Journal | After 12 Months |
|---|---|---|
| Estimated ROI | "Probably positive" | +3.8% (verified) |
| CLV | Unknown | +1.8 cents/dollar |
| Override rate | Unknown | Reduced from 14% to 3% |
| Emotional state tracking | None | 100% of sessions |
| Weekend ROI | Unknown (-2.1%) | +1.2% (after process changes) |
| Model adherence | Inconsistent | 97% |
The single largest improvement came from eliminating judgment-based overrides, which David estimated saved approximately 1.5 percentage points of ROI annually. The second-largest came from the Saturday protocol (no betting after 8 PM, no betting after any alcohol consumption), which converted his worst day into a slightly positive one.
Key Lessons
- Start simple and expand based on need. David's Version 1 was seven fields in a Google Sheet. It was incomplete, but it was better than nothing and it revealed the questions that drove Version 2. Attempting to build the full 28-field database from Day 1 would have been overwhelming and likely abandoned.
- The journal must be embedded in the workflow. When the journal was a separate task (Versions 1 and 2), David's completion rate hovered around 85%. When it became an integrated part of the betting process (Version 3), completion reached 99%.
- Qualitative data is as valuable as quantitative data. The most impactful discoveries (emotional state correlation, narrative language impact, override performance) came from fields that many bettors consider optional: reasoning text, emotional state, and override flags.
- The journal evolves with the bettor. Each version was right for its time. A beginner does not need 28 fields; they need the habit of recording anything at all. The sophistication should grow with the bettor's analytical ability and the questions they need to answer.
- Integration is the multiplier. Journal, discipline, and analysis systems are each valuable individually, but their combined value is multiplicative, not additive. The integration creates a self-reinforcing cycle where data drives insight, insight drives process change, and process change generates better data.
Discussion Questions
- David's Version 1 captured only seven fields. If you could add only one field to this minimum set, which would you choose and why?
- The override analysis showed that David's model was better than his gut. Under what conditions might overrides add value, and how would you design a system that preserves valuable overrides while eliminating destructive ones?
- David discovered that Saturday performance was degraded by alcohol and fatigue. How should a bettor balance the social dimensions of sports fandom with the discipline requirements of professional betting?
- The migration from Google Sheets to a Python-backed system required significant technical skill. How can non-technical bettors achieve similar analytical depth without coding?
- David's journal completion rate jumped from 85% to 99% when the journal was embedded in the workflow. What other habit-formation principles from behavioral science could be applied to journal discipline?