Capstone Project 2: Multi-Sport Arbitrage and Value Detection Platform
Project Overview
In this capstone project, you will design and build a production-grade platform that monitors real-time odds from multiple sportsbooks across multiple sports, detects arbitrage opportunities and value bets, and provides an alert and tracking system for managing a betting portfolio. This project puts you in the role of a quantitative betting operations engineer -- someone who builds the infrastructure that turns analytical insights into executable trades.
Arbitrage betting (Chapter 25, Section 25.3) exploits pricing discrepancies between sportsbooks to guarantee a profit regardless of the outcome. Value betting (Chapter 13) identifies situations where a sportsbook's implied probability is lower than the true probability of an event, creating a positive expected value opportunity. Both strategies require fast, reliable data infrastructure, precise mathematical calculations, and disciplined execution tracking.
This project draws on material from across the textbook: odds conversion and implied probability (Chapter 2), expected value theory (Chapter 3), bankroll management (Chapter 4), data literacy and database design (Chapter 5), the betting marketplace and line shopping (Chapters 11--12), value betting theory (Chapter 13), staking strategies (Chapter 14), optimization (Chapter 25), the ML betting pipeline (Chapter 31), discipline systems (Chapter 37), and risk management (Chapter 38).
Learning Objectives
Upon completing this project, you will be able to:
- Architect a real-time data pipeline that ingests odds from multiple sportsbook APIs concurrently.
- Implement the mathematical foundations of arbitrage detection for two-way, three-way, and multi-leg markets.
- Build value detection engines that compare model-derived probabilities to market odds.
- Create real-time alerting infrastructure for time-sensitive betting opportunities.
- Design and implement bet logging and P&L tracking systems.
- Build web-based dashboards for operational monitoring and performance analysis.
- Manage a multi-sport, multi-sportsbook portfolio with proper risk controls.
Requirements and Deliverables
Specific, Measurable Requirements
| Requirement | Minimum Standard | Exceeds Expectations |
|---|---|---|
| Sportsbooks monitored | 5 distinct books | 8+ books including offshore |
| Sports covered | 3 sports (e.g., NFL, NBA, MLB) | 5+ sports including international |
| Arbitrage detection latency | < 5 seconds from odds update to alert | < 1 second |
| Value model sports | 1 sport with probability model | 3+ sports with calibrated models |
| Alert channels | 1 channel (email or console) | 3+ channels (email, SMS, Slack, push) |
| Dashboard | Static Jupyter notebook | Interactive web app (Streamlit/Dash) |
| Bet logging | CSV/spreadsheet | Database with full API tracking |
| Backtest period | 1 month simulated operation | 3+ months with performance attribution |
| Documentation | README + inline comments | Architecture doc + API doc + user guide |
Final Deliverables
- Working platform with complete source code, configuration files, and deployment instructions.
- System architecture document (5--8 pages) with data flow diagrams, component descriptions, and technology choices.
- One-month simulated operation report showing detected opportunities, executed bets, and P&L analysis.
- Live demonstration (15 minutes) showing the platform running, detecting an opportunity, and generating an alert.
Phase 1: Odds Data Infrastructure
Duration: Weeks 1--2 (of 8)
Relevant chapters: Chapter 5 (Data Literacy), Chapter 11 (Betting Markets), Chapter 12 (Line Shopping), Chapter 31 (ML Pipeline)
1.1 Sportsbook API Integration
The foundation of this platform is reliable, fast odds data from multiple sportsbooks. Your primary data source is The Odds API (https://the-odds-api.com/), which aggregates odds from dozens of sportsbooks in a single standardized API.
"""
phase1_odds_infrastructure.py
Real-time odds collection from multiple sportsbooks across multiple sports.
"""
import json
import logging
import sqlite3
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import pandas as pd  # Needed at import time for the DataFrame return annotations below
import requests
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
@dataclass
class OddsConfig:
"""Configuration for the odds collection system."""
api_key: str
db_path: str = "data/odds_platform.db"
poll_interval_seconds: int = 60
regions: str = "us,eu,uk,au"
markets: str = "h2h,spreads,totals"
odds_format: str = "american"
bookmakers: str = (
"draftkings,fanduel,betmgm,caesars,pointsbet,"
"wynnbet,bet365,unibet,bovada,betonlineag"
)
    sports: Optional[List[str]] = None
def __post_init__(self):
if self.sports is None:
self.sports = [
"americanfootball_nfl",
"basketball_nba",
"baseball_mlb",
"icehockey_nhl",
"soccer_epl",
]
# ---------------------------------------------------------------------------
# Data Models
# ---------------------------------------------------------------------------
@dataclass
class OddsSnapshot:
"""A single odds snapshot for one outcome at one sportsbook."""
snapshot_id: str
timestamp: str
sport: str
event_id: str
home_team: str
away_team: str
commence_time: str
bookmaker: str
market: str # "h2h", "spreads", "totals"
outcome_name: str # team name, "Over", "Under"
price: int # American odds
    point: Optional[float]  # spread or total line (None for h2h)
# ---------------------------------------------------------------------------
# Odds Collection Engine
# ---------------------------------------------------------------------------
class OddsCollector:
"""
Collects odds from The Odds API and stores them in SQLite.
Design decisions (informed by Chapter 31):
- SQLite for simplicity; upgrade to PostgreSQL for production scale
- Concurrent fetching across sports using ThreadPoolExecutor
- Full snapshot history stored for line movement analysis
- Deduplication based on bookmaker + event + market + outcome + price
"""
def __init__(self, config: OddsConfig):
self.config = config
self.db_path = Path(config.db_path)
self.db_path.parent.mkdir(parents=True, exist_ok=True)
self._init_database()
def _init_database(self):
"""Create the odds database schema."""
conn = sqlite3.connect(str(self.db_path))
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS odds_snapshots (
snapshot_id TEXT PRIMARY KEY,
timestamp TEXT NOT NULL,
sport TEXT NOT NULL,
event_id TEXT NOT NULL,
home_team TEXT NOT NULL,
away_team TEXT NOT NULL,
commence_time TEXT,
bookmaker TEXT NOT NULL,
market TEXT NOT NULL,
outcome_name TEXT NOT NULL,
price INTEGER NOT NULL,
point REAL
)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_event_book_market
ON odds_snapshots (event_id, bookmaker, market)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_timestamp
ON odds_snapshots (timestamp)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS api_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
sport TEXT NOT NULL,
status_code INTEGER,
remaining_requests INTEGER,
events_returned INTEGER,
error_message TEXT
)
""")
conn.commit()
conn.close()
def fetch_sport_odds(self, sport: str) -> List[OddsSnapshot]:
"""
Fetch current odds for a single sport from The Odds API.
API documentation: https://the-odds-api.com/liveapi/guides/v4/
Free tier: 500 requests/month
Paid tiers scale up to unlimited requests
Returns a list of OddsSnapshot objects.
"""
url = f"https://api.the-odds-api.com/v4/sports/{sport}/odds/"
params = {
"apiKey": self.config.api_key,
"regions": self.config.regions,
"markets": self.config.markets,
"oddsFormat": self.config.odds_format,
"bookmakers": self.config.bookmakers,
}
        try:
            response = requests.get(url, params=params, timeout=30)
            remaining = response.headers.get("x-requests-remaining", "unknown")
            logger.info(f"[{sport}] Status: {response.status_code}, "
                        f"Remaining API calls: {remaining}")
            response.raise_for_status()
            data = response.json()  # Parse the body once and reuse it below
            self._log_api_call(sport, response.status_code,
                               int(remaining) if remaining != "unknown" else None,
                               len(data))
        except requests.RequestException as e:
            logger.error(f"[{sport}] API request failed: {e}")
            self._log_api_call(sport, getattr(e.response, "status_code", None),
                               None, 0, str(e))
            return []
snapshots = []
now = datetime.now(timezone.utc).isoformat()
for event in data:
event_id = event["id"]
home = event.get("home_team", "")
away = event.get("away_team", "")
commence = event.get("commence_time", "")
for bookmaker in event.get("bookmakers", []):
book_key = bookmaker["key"]
for market in bookmaker.get("markets", []):
market_key = market["key"]
                    for outcome in market.get("outcomes", []):
                        if "price" not in outcome:
                            continue  # Skip malformed outcomes with no price
                        snapshot_id = (
                            f"{event_id}_{book_key}_{market_key}_"
                            f"{outcome['name']}_{now}"
                        )
                        snapshots.append(OddsSnapshot(
                            snapshot_id=snapshot_id,
                            timestamp=now,
                            sport=sport,
                            event_id=event_id,
                            home_team=home,
                            away_team=away,
                            commence_time=commence,
                            bookmaker=book_key,
                            market=market_key,
                            outcome_name=outcome["name"],
                            price=int(outcome["price"]),
                            point=outcome.get("point"),
                        ))
return snapshots
def fetch_all_sports(self) -> List[OddsSnapshot]:
"""
Fetch odds for all configured sports concurrently.
Uses ThreadPoolExecutor for parallel API calls (Chapter 31).
"""
all_snapshots = []
with ThreadPoolExecutor(max_workers=5) as executor:
futures = {
executor.submit(self.fetch_sport_odds, sport): sport
for sport in self.config.sports
}
for future in as_completed(futures):
sport = futures[future]
try:
snapshots = future.result()
all_snapshots.extend(snapshots)
logger.info(f"[{sport}] Collected {len(snapshots)} "
f"odds snapshots")
except Exception as e:
logger.error(f"[{sport}] Collection failed: {e}")
return all_snapshots
def store_snapshots(self, snapshots: List[OddsSnapshot]):
"""Store odds snapshots to the database."""
if not snapshots:
return
conn = sqlite3.connect(str(self.db_path))
cursor = conn.cursor()
rows = [
(s.snapshot_id, s.timestamp, s.sport, s.event_id,
s.home_team, s.away_team, s.commence_time,
s.bookmaker, s.market, s.outcome_name, s.price, s.point)
for s in snapshots
]
cursor.executemany("""
INSERT OR IGNORE INTO odds_snapshots
(snapshot_id, timestamp, sport, event_id, home_team,
away_team, commence_time, bookmaker, market,
outcome_name, price, point)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", rows)
conn.commit()
conn.close()
logger.info(f"Stored {len(snapshots)} snapshots to database")
def _log_api_call(self, sport, status_code, remaining, events, error=None):
conn = sqlite3.connect(str(self.db_path))
conn.execute("""
INSERT INTO api_log (timestamp, sport, status_code,
remaining_requests, events_returned,
error_message)
VALUES (?, ?, ?, ?, ?, ?)
""", (datetime.now(timezone.utc).isoformat(), sport,
status_code, remaining, events, error))
conn.commit()
conn.close()
def run_continuous_collection(self):
"""
Run the collector in a continuous loop with configurable
polling interval. This is the production data ingestion loop
(Chapter 31, Section 31.2).
"""
logger.info(f"Starting continuous collection. "
f"Poll interval: {self.config.poll_interval_seconds}s")
logger.info(f"Sports: {self.config.sports}")
logger.info(f"Database: {self.db_path}")
while True:
try:
snapshots = self.fetch_all_sports()
self.store_snapshots(snapshots)
logger.info(f"Cycle complete. Total: {len(snapshots)} snapshots. "
f"Sleeping {self.config.poll_interval_seconds}s...")
except Exception as e:
logger.error(f"Collection cycle failed: {e}", exc_info=True)
time.sleep(self.config.poll_interval_seconds)
# ---------------------------------------------------------------------------
# Data Access Layer
# ---------------------------------------------------------------------------
class OddsDataAccess:
"""
Query interface for the odds database.
Provides methods for retrieving current odds, historical snapshots,
and line movement data.
"""
def __init__(self, db_path: str = "data/odds_platform.db"):
self.db_path = db_path
def get_current_odds(self, sport: Optional[str] = None,
event_id: Optional[str] = None) -> pd.DataFrame:
"""
Get the most recent odds snapshot for each event/bookmaker/market
combination. This is the "current state" of the market.
"""
import pandas as pd
conn = sqlite3.connect(self.db_path)
query = """
SELECT o.*
FROM odds_snapshots o
INNER JOIN (
SELECT event_id, bookmaker, market, outcome_name,
MAX(timestamp) as max_ts
FROM odds_snapshots
{where_clause}
GROUP BY event_id, bookmaker, market, outcome_name
) latest
ON o.event_id = latest.event_id
AND o.bookmaker = latest.bookmaker
AND o.market = latest.market
AND o.outcome_name = latest.outcome_name
AND o.timestamp = latest.max_ts
"""
conditions = []
params = []
if sport:
conditions.append("sport = ?")
params.append(sport)
if event_id:
conditions.append("event_id = ?")
params.append(event_id)
where_clause = ""
if conditions:
where_clause = "WHERE " + " AND ".join(conditions)
query = query.format(where_clause=where_clause)
df = pd.read_sql(query, conn, params=params)
conn.close()
return df
def get_line_movement(self, event_id: str, bookmaker: str,
market: str) -> pd.DataFrame:
"""Get the full history of odds changes for a specific line."""
import pandas as pd
conn = sqlite3.connect(self.db_path)
df = pd.read_sql("""
SELECT timestamp, outcome_name, price, point
FROM odds_snapshots
WHERE event_id = ? AND bookmaker = ? AND market = ?
ORDER BY timestamp
""", conn, params=[event_id, bookmaker, market])
conn.close()
return df
def get_upcoming_events(self, sport: Optional[str] = None) -> pd.DataFrame:
"""Get unique upcoming events with their earliest commence time."""
import pandas as pd
conn = sqlite3.connect(self.db_path)
query = """
SELECT DISTINCT event_id, sport, home_team, away_team,
commence_time
FROM odds_snapshots
WHERE commence_time > datetime('now')
"""
params = []
if sport:
query += " AND sport = ?"
params.append(sport)
query += " ORDER BY commence_time"
df = pd.read_sql(query, conn, params=params)
conn.close()
return df
1.2 Database Design
The database schema stores every odds snapshot with a timestamp, enabling both real-time queries (current best odds) and historical analysis (line movements, closing line value). The schema follows the data modeling principles from Chapter 5.
Core tables:
| Table | Purpose | Key Columns |
|---|---|---|
| odds_snapshots | Every odds observation | event_id, bookmaker, market, outcome, price, point, timestamp |
| events | Unique sporting events | event_id, sport, home, away, commence_time, result |
| arbitrage_opportunities | Detected arb instances | event_id, timestamp, market, arb_percentage, books_involved |
| value_opportunities | Detected value bets | event_id, timestamp, market, model_prob, implied_prob, edge |
| bets | Executed bet records | bet_id, event_id, bookmaker, market, outcome, odds, stake, result |
| bankroll_log | Running bankroll balance | timestamp, balance, change, reason |
| api_log | API call tracking | timestamp, sport, status, remaining_requests |
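The snapshot table directly supports the line-shopping queries of Chapter 12. As a sketch (hypothetical prices, schema as above), the best available moneyline per outcome can be pulled with a single grouped query. Numeric MAX of American odds is the bettor-best price because the American-to-decimal mapping is monotonic, and SQLite's bare-column rule for MIN/MAX aggregates returns the bookmaker from the row that achieved the maximum:

```python
import sqlite3

# Hypothetical snapshot rows for one NBA game (made-up prices)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE odds_snapshots (
        snapshot_id TEXT PRIMARY KEY, timestamp TEXT, sport TEXT,
        event_id TEXT, home_team TEXT, away_team TEXT, commence_time TEXT,
        bookmaker TEXT, market TEXT, outcome_name TEXT,
        price INTEGER, point REAL
    )
""")
rows = [
    ("s1", "t0", "basketball_nba", "e1", "Lakers", "Celtics", "t1",
     "draftkings", "h2h", "Lakers", -110, None),
    ("s2", "t0", "basketball_nba", "e1", "Lakers", "Celtics", "t1",
     "fanduel", "h2h", "Lakers", -105, None),
    ("s3", "t0", "basketball_nba", "e1", "Lakers", "Celtics", "t1",
     "draftkings", "h2h", "Celtics", 100, None),
    ("s4", "t0", "basketball_nba", "e1", "Lakers", "Celtics", "t1",
     "fanduel", "h2h", "Celtics", -102, None),
]
conn.executemany(
    "INSERT INTO odds_snapshots VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", rows)

# Best (numerically highest) American price per outcome, with the book
best = {
    outcome: (book, price)
    for outcome, book, price in conn.execute("""
        SELECT outcome_name, bookmaker, MAX(price)
        FROM odds_snapshots
        WHERE event_id = 'e1' AND market = 'h2h'
        GROUP BY outcome_name
    """)
}
```

Here `best` maps each outcome to the book offering the top price (fanduel's -105 beats draftkings' -110 on the Lakers side).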
Phase 1 Checklist
- [ ] The Odds API key obtained and tested (free tier is sufficient for development)
- [ ] Odds collection works for all target sports
- [ ] Database schema created with proper indexes
- [ ] Continuous collection loop runs without crashing for 24 hours
- [ ] API rate limits respected and tracked
- [ ] Data quality verified: odds values are reasonable, no null event IDs
- [ ] Line movement history accumulating correctly
Phase 2: Arbitrage Detection Engine
Duration: Weeks 2--3
Relevant chapters: Chapter 2 (Probability and Odds), Chapter 25 (Optimization, Section 25.3)
2.1 Mathematical Foundation
An arbitrage opportunity exists when the combined implied probabilities of all outcomes at the best available odds across sportsbooks sum to less than 1.0. Chapter 25, Section 25.3 provides the formal definition:
For a two-outcome market (e.g., NFL moneyline), arbitrage exists when:
1/decimal_odds_A + 1/decimal_odds_B < 1.0
The arbitrage percentage is:
arb_pct = (1.0 - (1/decimal_odds_A + 1/decimal_odds_B)) * 100
The optimal stake allocation to guarantee equal profit regardless of outcome follows from Chapter 25:
stake_A = total_stake * (1/decimal_odds_A) / (1/decimal_odds_A + 1/decimal_odds_B)
stake_B = total_stake * (1/decimal_odds_B) / (1/decimal_odds_A + 1/decimal_odds_B)
For three-outcome markets (e.g., soccer with draw), the same logic extends to three legs.
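A worked example makes the formulas concrete. Suppose, hypothetically, the best available prices across books are +105 on side A and +102 on side B:

```python
def american_to_decimal(american: int) -> float:
    # Chapter 2 conversion: +150 -> 2.50, -200 -> 1.50
    return 1.0 + (american / 100.0 if american > 0 else 100.0 / abs(american))

dec_a = american_to_decimal(105)   # 2.05, hypothetical best price on side A
dec_b = american_to_decimal(102)   # 2.02, hypothetical best price on side B

total_implied = 1.0 / dec_a + 1.0 / dec_b   # ~0.9829 < 1.0 -> arbitrage exists
arb_pct = (1.0 - total_implied) * 100.0     # ~1.71%

total_stake = 100.0
stake_a = total_stake * (1.0 / dec_a) / total_implied   # ~$49.63
stake_b = total_stake * (1.0 / dec_b) / total_implied   # ~$50.37

# Either outcome returns total_stake / total_implied (~$101.74), so the
# profit is locked in regardless of the result.
payout_a = stake_a * dec_a
payout_b = stake_b * dec_b
```

Note the payout surplus (about 1.74% of the stake here) is slightly larger than the arb percentage (1.71%), because the stakes compound through 1/total_implied rather than (1 - total_implied).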
"""
phase2_arbitrage_detection.py
Real-time arbitrage detection across multiple sportsbooks.
"""
import numpy as np
import pandas as pd
from itertools import combinations, product
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass
from datetime import datetime, timezone
import logging
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Odds Conversion Utilities (Chapter 2)
# ---------------------------------------------------------------------------
def american_to_decimal(american: int) -> float:
"""
Convert American odds to decimal odds (Chapter 2, Section 2.2).
American +150 -> Decimal 2.50
American -200 -> Decimal 1.50
"""
if american > 0:
return 1.0 + american / 100.0
elif american < 0:
return 1.0 + 100.0 / abs(american)
    else:
        raise ValueError(
            "American odds of 0 are undefined; even money is +100 or -100")
def american_to_implied_prob(american: int) -> float:
    """
    Convert American odds to implied probability (Chapter 2, Section 2.3).
    This is the raw implied probability, which still includes the vig.
    """
    if american > 0:
        return 100.0 / (american + 100.0)
    elif american < 0:
        return abs(american) / (abs(american) + 100.0)
    else:
        raise ValueError(
            "American odds of 0 are undefined; even money is +100 or -100")
# ---------------------------------------------------------------------------
# Arbitrage Detection
# ---------------------------------------------------------------------------
@dataclass
class ArbitrageOpportunity:
"""Represents a detected arbitrage opportunity."""
event_id: str
sport: str
home_team: str
away_team: str
market: str
    arb_percentage: float  # 100 * (1 - total implied probability)
    total_implied_probability: float  # Sum of implied probs (< 1.0 for arb)
legs: List[Dict] # Each leg: book, outcome, odds, stake_pct
detected_at: str
estimated_profit_per_100: float # Profit per $100 wagered
class ArbitrageDetector:
"""
Detects arbitrage opportunities across sportsbooks in real time.
Mathematical basis from Chapter 25, Section 25.3:
Arbitrage exists when the best available odds across all sportsbooks
for each outcome in a market produce a combined implied probability
strictly less than 1.0.
Detection algorithm:
1. For each event and market, find the best odds for each outcome
across all sportsbooks.
2. Convert each best odds to implied probability.
3. Sum the implied probabilities.
4. If the sum < 1.0, an arbitrage opportunity exists.
5. Calculate optimal stake allocation and guaranteed profit.
"""
def __init__(self, min_arb_pct: float = 0.1, max_arb_pct: float = 20.0):
"""
Parameters:
- min_arb_pct: minimum arbitrage percentage to report (filters noise)
- max_arb_pct: maximum arb percentage to report (very large arbs
are usually data errors, per Chapter 25)
"""
self.min_arb_pct = min_arb_pct
self.max_arb_pct = max_arb_pct
def detect_two_way_arbitrage(self, event_odds: pd.DataFrame) -> Optional[ArbitrageOpportunity]:
"""
Detect arbitrage in a two-outcome market (moneyline, spread, total).
event_odds should contain columns:
event_id, sport, home_team, away_team, bookmaker, market,
outcome_name, price
For a two-way market, there should be exactly 2 distinct outcomes.
"""
# Get unique outcomes
outcomes = event_odds["outcome_name"].unique()
if len(outcomes) != 2:
return None
event_id = event_odds["event_id"].iloc[0]
sport = event_odds["sport"].iloc[0]
home = event_odds["home_team"].iloc[0]
away = event_odds["away_team"].iloc[0]
market = event_odds["market"].iloc[0]
# Find best odds for each outcome
best_odds = {}
for outcome in outcomes:
outcome_df = event_odds[event_odds["outcome_name"] == outcome]
best_row = outcome_df.loc[outcome_df["price"].idxmax()]
best_odds[outcome] = {
"bookmaker": best_row["bookmaker"],
"price": int(best_row["price"]),
"decimal_odds": american_to_decimal(int(best_row["price"])),
"implied_prob": american_to_implied_prob(int(best_row["price"])),
"point": best_row.get("point"),
}
# Check for arbitrage
total_implied = sum(o["implied_prob"] for o in best_odds.values())
if total_implied < 1.0:
arb_pct = (1.0 - total_implied) * 100.0
if arb_pct < self.min_arb_pct or arb_pct > self.max_arb_pct:
return None
# Calculate optimal stake allocation
legs = []
for outcome, odds_info in best_odds.items():
stake_pct = (1.0 / odds_info["decimal_odds"]) / total_implied
legs.append({
"outcome": outcome,
"bookmaker": odds_info["bookmaker"],
"american_odds": odds_info["price"],
"decimal_odds": odds_info["decimal_odds"],
"implied_probability": odds_info["implied_prob"],
"stake_percentage": stake_pct,
"point": odds_info["point"],
})
# Profit per $100 total stake
# If we invest $100 total, we get back:
# stake_on_A * decimal_odds_A = (100 * stake_pct_A) * decimal_A
# This is the same regardless of which outcome hits
profit_per_100 = 100.0 * (1.0 / total_implied - 1.0)
return ArbitrageOpportunity(
event_id=event_id,
sport=sport,
home_team=home,
away_team=away,
market=market,
arb_percentage=arb_pct,
total_implied_probability=total_implied,
legs=legs,
detected_at=datetime.now(timezone.utc).isoformat(),
estimated_profit_per_100=profit_per_100,
)
return None
def detect_three_way_arbitrage(self, event_odds: pd.DataFrame) -> Optional[ArbitrageOpportunity]:
"""
Detect arbitrage in a three-outcome market (soccer: home/draw/away).
The math extends naturally from the two-way case:
Sum of 1/decimal_odds for all three outcomes must be < 1.0.
"""
outcomes = event_odds["outcome_name"].unique()
if len(outcomes) != 3:
return None
event_id = event_odds["event_id"].iloc[0]
sport = event_odds["sport"].iloc[0]
home = event_odds["home_team"].iloc[0]
away = event_odds["away_team"].iloc[0]
market = event_odds["market"].iloc[0]
best_odds = {}
for outcome in outcomes:
outcome_df = event_odds[event_odds["outcome_name"] == outcome]
best_row = outcome_df.loc[outcome_df["price"].idxmax()]
best_odds[outcome] = {
"bookmaker": best_row["bookmaker"],
"price": int(best_row["price"]),
"decimal_odds": american_to_decimal(int(best_row["price"])),
"implied_prob": american_to_implied_prob(int(best_row["price"])),
}
total_implied = sum(o["implied_prob"] for o in best_odds.values())
if total_implied < 1.0:
arb_pct = (1.0 - total_implied) * 100.0
if arb_pct < self.min_arb_pct or arb_pct > self.max_arb_pct:
return None
legs = []
for outcome, odds_info in best_odds.items():
stake_pct = (1.0 / odds_info["decimal_odds"]) / total_implied
legs.append({
"outcome": outcome,
"bookmaker": odds_info["bookmaker"],
"american_odds": odds_info["price"],
"decimal_odds": odds_info["decimal_odds"],
"implied_probability": odds_info["implied_prob"],
"stake_percentage": stake_pct,
})
profit_per_100 = 100.0 * (1.0 / total_implied - 1.0)
return ArbitrageOpportunity(
event_id=event_id,
sport=sport,
home_team=home,
away_team=away,
market=market,
arb_percentage=arb_pct,
total_implied_probability=total_implied,
legs=legs,
detected_at=datetime.now(timezone.utc).isoformat(),
estimated_profit_per_100=profit_per_100,
)
return None
def scan_all_events(self, current_odds: pd.DataFrame) -> List[ArbitrageOpportunity]:
"""
Scan all current events and markets for arbitrage opportunities.
Iterates over every unique (event_id, market) combination and
checks for arbitrage using the appropriate method (2-way or 3-way).
"""
opportunities = []
groups = current_odds.groupby(["event_id", "market"])
for (event_id, market), group in groups:
n_outcomes = group["outcome_name"].nunique()
if n_outcomes == 2:
arb = self.detect_two_way_arbitrage(group)
elif n_outcomes == 3:
arb = self.detect_three_way_arbitrage(group)
else:
continue
if arb is not None:
opportunities.append(arb)
# Sort by arb percentage (most profitable first)
opportunities.sort(key=lambda x: x.arb_percentage, reverse=True)
logger.info(f"Scanned {len(groups)} event-markets, "
f"found {len(opportunities)} arbitrage opportunities")
return opportunities
def calculate_arb_stakes(total_stake: float,
                         arb: ArbitrageOpportunity) -> Dict[str, float]:
    """
    Calculate exact dollar stakes for each leg of an arbitrage.

    Returns a dictionary with the per-outcome stakes, the total staked,
    and the guaranteed profit. The guaranteed profit is
    total_stake * (1 / total_implied_probability - 1), the payout surplus,
    which is slightly larger than total_stake * arb_percentage / 100.
    """
    stakes = {
        leg["outcome"]: round(total_stake * leg["stake_percentage"], 2)
        for leg in arb.legs
    }
    profit_fraction = 1.0 / arb.total_implied_probability - 1.0
    return {
        "stakes": stakes,
        "total_staked": sum(stakes.values()),
        "guaranteed_profit": round(total_stake * profit_fraction, 2),
        "profit_percentage": round(100.0 * profit_fraction, 4),
    }
Phase 2 Checklist
- [ ] Two-way arbitrage detection implemented and tested against known examples
- [ ] Three-way arbitrage detection implemented for soccer markets
- [ ] Arbitrage scanning runs across all events in < 5 seconds
- [ ] Data error filtering in place (arbs > 20% flagged for review)
- [ ] Stake calculation verified: guaranteed profit math checked by hand
- [ ] Arbitrage opportunities stored to database with timestamps
Phase 3: Value Detection
Duration: Weeks 3--4
Relevant chapters: Chapter 2 (Implied Probability), Chapter 3 (Expected Value), Chapter 13 (Value Betting), Chapter 27 (Classification)
3.1 Value Betting Framework
Value detection requires comparing model-derived "true" probabilities to market-implied probabilities. When your model's probability significantly exceeds the market's implied probability (after removing the vig), you have a value bet (Chapter 13, Section 13.1).
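A small worked example with hypothetical prices and model output: a two-way market quoted at -105 on both sides carries about 2.4 points of vig; multiplicative devigging makes each side fair at 0.50, so a model probability of 0.55 implies a 5-point edge and roughly +7.4% expected value.

```python
def american_to_decimal(american: int) -> float:
    # Chapter 2 conversion: +150 -> 2.50, -200 -> 1.50
    return 1.0 + (american / 100.0 if american > 0 else 100.0 / abs(american))

# Hypothetical two-way market quoted at -105 on each side
dec = american_to_decimal(-105)          # ~1.9524
raw = [1.0 / dec, 1.0 / dec]             # each ~0.5122; sum ~1.0244 (the vig)

# Multiplicative vig removal (Chapter 2): normalize to sum to 1.0
fair = [p / sum(raw) for p in raw]       # 0.50 each after devigging

model_prob = 0.55                        # hypothetical model estimate
edge = model_prob - fair[0]              # 0.05 edge vs the fair probability
ev = model_prob * dec - 1.0              # ~+0.0738 EV per unit staked (Chapter 3)
```

The edge is measured against the vig-free probability, but the EV is computed at the actual quoted odds; both thresholds must clear before the detector flags a bet.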
"""
phase3_value_detection.py
Identify value betting opportunities by comparing model probabilities
to market-implied probabilities.
"""
import numpy as np
import pandas as pd
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
from datetime import datetime, timezone
from scipy.stats import norm
import logging

# Reuse the Chapter 2 odds converters defined in Phase 2
from phase2_arbitrage_detection import (
    american_to_decimal,
    american_to_implied_prob,
)
logger = logging.getLogger(__name__)
@dataclass
class ValueOpportunity:
"""Represents a detected value betting opportunity."""
event_id: str
sport: str
home_team: str
away_team: str
market: str
outcome: str
model_probability: float
market_implied_probability: float
fair_implied_probability: float # After vig removal
edge_vs_fair: float # model_prob - fair_implied_prob
expected_value_pct: float # EV as percentage of stake
best_odds: int # American odds at best book
best_book: str
confidence: str # "low", "medium", "high"
detected_at: str
class ValueDetector:
"""
Detects value betting opportunities across sports.
The value detection process (Chapter 13):
1. Obtain model probability estimates for each outcome
2. Extract market-implied probabilities from best available odds
3. Remove the vig using the multiplicative method (Chapter 2)
4. Compare model probability to vig-free implied probability
5. If model_prob > fair_implied_prob by a significant margin,
flag as a value bet
6. Calculate expected value: EV = model_prob * (decimal_odds) - 1
"""
def __init__(self, min_edge: float = 0.03, min_ev_pct: float = 0.02,
confidence_thresholds: Tuple[float, float] = (0.03, 0.06)):
"""
Parameters:
- min_edge: minimum probability edge to consider (3% default)
- min_ev_pct: minimum EV as fraction of stake (2% default)
- confidence_thresholds: (medium, high) edge thresholds
"""
self.min_edge = min_edge
self.min_ev_pct = min_ev_pct
self.low_threshold = confidence_thresholds[0]
self.high_threshold = confidence_thresholds[1]
def remove_vig_multiplicative(self, implied_probs: List[float]) -> List[float]:
"""
Remove the vig using the multiplicative method (Chapter 2, Section 2.3).
For a market with outcomes having implied probabilities p1, p2, ..., pn,
the vig-free probabilities are:
fair_pi = pi / sum(all pi)
This is the simplest vig removal method and assumes the vig is
distributed proportionally across all outcomes.
"""
total = sum(implied_probs)
if total == 0:
return implied_probs
return [p / total for p in implied_probs]
    def remove_vig_power(self, implied_probs: List[float],
                         tol: float = 1e-8) -> List[float]:
        """
        Remove the vig using the power method (Chapter 2, Section 2.3).
        Note the power method is distinct from Shin's method. It is more
        accurate than the multiplicative method when the vig is
        asymmetrically distributed across outcomes (in practice,
        longshots tend to carry a larger share of the vig).
        Solves for k such that: sum(pi^k) = 1
        where pi are the raw implied probabilities.
        """
from scipy.optimize import brentq
def objective(k):
return sum(p ** k for p in implied_probs) - 1.0
try:
k = brentq(objective, 0.5, 2.0, xtol=tol)
return [p ** k for p in implied_probs]
except ValueError:
# Fallback to multiplicative if power method fails
return self.remove_vig_multiplicative(implied_probs)
def detect_value(self, event_odds: pd.DataFrame,
model_probs: Dict[str, float],
vig_method: str = "multiplicative") -> List[ValueOpportunity]:
"""
Detect value opportunities for a single event.
Parameters:
- event_odds: DataFrame with current odds from all sportsbooks
- model_probs: dict mapping outcome_name to model probability
e.g., {"Kansas City Chiefs": 0.62, "Buffalo Bills": 0.38}
- vig_method: "multiplicative" or "power" (Chapter 2)
"""
if event_odds.empty:
return []
event_id = event_odds["event_id"].iloc[0]
sport = event_odds["sport"].iloc[0]
home = event_odds["home_team"].iloc[0]
away = event_odds["away_team"].iloc[0]
market = event_odds["market"].iloc[0]
outcomes = event_odds["outcome_name"].unique()
opportunities = []
# Find best odds for each outcome
best_odds = {}
for outcome in outcomes:
outcome_df = event_odds[event_odds["outcome_name"] == outcome]
best_row = outcome_df.loc[outcome_df["price"].idxmax()]
best_odds[outcome] = {
"bookmaker": best_row["bookmaker"],
"price": int(best_row["price"]),
"implied_prob": american_to_implied_prob(int(best_row["price"])),
"decimal_odds": american_to_decimal(int(best_row["price"])),
}
# Remove vig to get fair probabilities
raw_probs = [best_odds[o]["implied_prob"] for o in outcomes]
if vig_method == "power":
fair_probs = self.remove_vig_power(raw_probs)
else:
fair_probs = self.remove_vig_multiplicative(raw_probs)
fair_prob_map = dict(zip(outcomes, fair_probs))
# Check each outcome for value
for outcome in outcomes:
if outcome not in model_probs:
continue
model_prob = model_probs[outcome]
fair_prob = fair_prob_map[outcome]
market_prob = best_odds[outcome]["implied_prob"]
decimal_odds = best_odds[outcome]["decimal_odds"]
# Edge = model probability minus vig-free market probability
edge = model_prob - fair_prob
# Expected value: EV = p * odds - 1 (Chapter 3)
ev_pct = model_prob * decimal_odds - 1.0
if edge >= self.min_edge and ev_pct >= self.min_ev_pct:
# Assign confidence level
if edge >= self.high_threshold:
confidence = "high"
elif edge >= self.low_threshold:
confidence = "medium"
else:
confidence = "low"
opportunities.append(ValueOpportunity(
event_id=event_id,
sport=sport,
home_team=home,
away_team=away,
market=market,
outcome=outcome,
model_probability=round(model_prob, 4),
market_implied_probability=round(market_prob, 4),
fair_implied_probability=round(fair_prob, 4),
edge_vs_fair=round(edge, 4),
expected_value_pct=round(ev_pct, 4),
best_odds=best_odds[outcome]["price"],
best_book=best_odds[outcome]["bookmaker"],
confidence=confidence,
detected_at=datetime.now(timezone.utc).isoformat(),
))
return opportunities
# ---------------------------------------------------------------------------
# Sport-Specific Probability Models
# ---------------------------------------------------------------------------
class SimpleEloModel:
"""
A simple Elo-based probability model for value detection.
This serves as the baseline probability engine. For production use,
replace with the sport-specific models from Chapters 15--22.
The Elo model converts rating differences to win probabilities
using the logistic function (Chapter 26):
P(A wins) = 1 / (1 + 10^((Rb - Ra) / 400))
"""
def __init__(self, k_factor: float = 20.0, home_adv: float = 100.0,
initial_rating: float = 1500.0):
self.k = k_factor
self.home_adv = home_adv
self.ratings = {}
self.initial_rating = initial_rating
def get_rating(self, team: str) -> float:
return self.ratings.get(team, self.initial_rating)
def predict_win_probability(self, home_team: str,
away_team: str) -> Dict[str, float]:
"""
Predict win probability for a matchup.
Returns {home_team: prob, away_team: prob}.
"""
home_r = self.get_rating(home_team) + self.home_adv
away_r = self.get_rating(away_team)
expected_home = 1.0 / (1.0 + 10.0 ** ((away_r - home_r) / 400.0))
expected_away = 1.0 - expected_home
return {
home_team: round(expected_home, 4),
away_team: round(expected_away, 4),
}
def update(self, home_team: str, away_team: str, home_won: bool):
"""Update ratings after a game result."""
probs = self.predict_win_probability(home_team, away_team)
actual_home = 1.0 if home_won else 0.0
home_r = self.get_rating(home_team)
away_r = self.get_rating(away_team)
update = self.k * (actual_home - probs[home_team])
self.ratings[home_team] = home_r + update
self.ratings[away_team] = away_r - update
def train_on_historical_results(self, results: pd.DataFrame):
"""
Train the Elo model on historical game results.
results should have: home_team, away_team, home_won (bool)
"""
for _, game in results.iterrows():
self.update(game["home_team"], game["away_team"], game["home_won"])
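The Elo-to-probability conversion used by `SimpleEloModel` is easy to sanity-check in isolation. This sketch reimplements the logistic formula inline so it runs standalone; the rating values are illustrative only:

```python
def elo_win_prob(rating_home: float, rating_away: float,
                 home_adv: float = 100.0) -> float:
    """P(home wins) from an Elo rating gap (logistic, base 10, scale 400)."""
    effective_home = rating_home + home_adv
    return 1.0 / (1.0 + 10.0 ** ((rating_away - effective_home) / 400.0))

# Equal ratings: the home advantage alone should give the home side an edge
p = elo_win_prob(1500.0, 1500.0)
print(round(p, 4))   # ≈ 0.6401 with a 100-point home advantage
```

Note that with `home_adv=0.0` and equal ratings the formula returns exactly 0.5, a useful unit test for any Elo implementation.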
3.2 Multi-Sport Probability Models
You must implement a probability model for at least one sport (three or more to exceed expectations). Each model should be calibrated (Chapter 27, Section 27.3) and backtested (Chapter 30).
| Sport | Recommended Model Approach | Chapter Reference |
|---|---|---|
| NFL | EPA-based regression + Elo ensemble | Ch 15, 26, 27 |
| NBA | Adjusted net rating + Four Factors model | Ch 16, 26 |
| MLB | Starting pitcher FIP + bullpen ERA | Ch 17, 9 |
| NHL | Corsi/Fenwick possession model | Ch 18, 26 |
| Soccer (EPL) | Expected goals (xG) + Elo | Ch 19, 26 |
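Whichever sport you choose, verify calibration before trusting the model for value detection (Chapter 27, Section 27.3). A minimal, framework-free calibration check, with a function name and binning scheme of our own choosing, bins predictions and compares mean predicted probability to the empirical win rate in each bin:

```python
import numpy as np

def calibration_table(probs: np.ndarray, outcomes: np.ndarray,
                      n_bins: int = 10) -> list:
    """Bin model probabilities and compare to empirical win rates.
    A well-calibrated model has mean_predicted ~ empirical_rate per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.sum() == 0:
            continue  # skip empty bins
        rows.append({
            "bin": f"{lo:.1f}-{hi:.1f}",
            "n": int(mask.sum()),
            "mean_predicted": float(probs[mask].mean()),
            "empirical_rate": float(outcomes[mask].mean()),
        })
    return rows

# Synthetic demo: outcomes drawn from the predicted probabilities themselves,
# so the table should show near-perfect calibration in every bin
rng = np.random.default_rng(42)
p = rng.uniform(0.05, 0.95, 20000)
y = (rng.uniform(size=20000) < p).astype(float)
for row in calibration_table(p, y):
    print(row)
```

On real data, a systematic gap between the two columns (for example, the model saying 60% when the empirical rate is 55%) is exactly the kind of miscalibration that produces false value signals.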
Phase 3 Checklist
- [ ] At least one sport has a calibrated probability model
- [ ] Vig removal implemented using both multiplicative and power methods
- [ ] Value detection tested against known value situations
- [ ] Edge thresholds calibrated against historical backtest
- [ ] Confidence levels assigned to each value opportunity
- [ ] Expected value calculated for every flagged opportunity
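The two vig-removal methods required by the checklist can be sketched as follows. The multiplicative method normalizes by the overround; the power method solves numerically for an exponent k with sum(p_i^k) = 1 (a bisection here, though any root-finder works). The function names mirror the `remove_vig_*` methods called earlier, but these are standalone sketches:

```python
from typing import List

def remove_vig_multiplicative(raw_probs: List[float]) -> List[float]:
    """Normalize so probabilities sum to 1 (each divided by the overround)."""
    total = sum(raw_probs)
    return [p / total for p in raw_probs]

def remove_vig_power(raw_probs: List[float], tol: float = 1e-10) -> List[float]:
    """Find k such that sum(p_i ** k) == 1, then return the powered probs.
    Shrinks the favorite's probability less than the multiplicative method."""
    lo, hi = 0.5, 10.0  # k > 1 whenever the book has positive vig
    while hi - lo > tol:
        k = (lo + hi) / 2.0
        if sum(p ** k for p in raw_probs) > 1.0:
            lo = k   # powered probs still sum above 1 -> need a larger exponent
        else:
            hi = k
    return [p ** k for p in raw_probs]

raw = [0.55, 0.50]                     # implied probs with a 5% overround
print(remove_vig_multiplicative(raw))  # sums to 1
print(remove_vig_power(raw))           # sums to 1, slightly different split
```

For two-way markets the methods differ only modestly; the power method matters most in longshot-heavy markets, where multiplicative normalization overstates the fair probability of underdogs.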
Phase 4: Alert System
Duration: Weeks 4--5
Relevant chapters: Chapter 31 (ML Pipeline, monitoring and alerting), Chapter 37 (Discipline Systems)
4.1 Real-Time Notification Engine
When the platform detects an arbitrage or value opportunity, it must notify you immediately. Time sensitivity is critical for arbitrage (opportunities may last seconds) and important for value (lines move as information is priced in).
"""
phase4_alert_system.py
Real-time alert system for arbitrage and value opportunities.
"""
import smtplib
import requests
import json
import logging
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from typing import List, Optional
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
logger = logging.getLogger(__name__)
@dataclass
class AlertConfig:
"""Configuration for alert channels."""
# Email
smtp_server: str = ""
smtp_port: int = 587
email_from: str = ""
email_to: str = ""
email_password: str = ""
# Slack
slack_webhook_url: str = ""
# Telegram
telegram_bot_token: str = ""
telegram_chat_id: str = ""
# Console (always enabled)
console_enabled: bool = True
# Alert thresholds
    arb_alert_min_pct: float = 0.5    # Minimum arb % (in percentage points) to alert
    value_alert_min_ev: float = 0.03  # Minimum EV (as a fraction; 0.03 = 3%) to alert
class AlertManager:
"""
Manages sending alerts through multiple channels.
Design principle (Chapter 37): Alerts should be actionable, timely,
and not overwhelming. Too many low-quality alerts lead to alert
fatigue and missed opportunities.
"""
def __init__(self, config: AlertConfig):
self.config = config
self.alert_history = []
def send_arbitrage_alert(self, arb: 'ArbitrageOpportunity',
stake_info: dict):
"""Send an alert for a detected arbitrage opportunity."""
if arb.arb_percentage < self.config.arb_alert_min_pct:
return
subject = (f"ARB ALERT: {arb.sport} | "
f"{arb.home_team} vs {arb.away_team} | "
f"{arb.arb_percentage:.2f}%")
body_lines = [
f"ARBITRAGE OPPORTUNITY DETECTED",
f"{'=' * 50}",
f"Sport: {arb.sport}",
f"Event: {arb.home_team} vs {arb.away_team}",
f"Market: {arb.market}",
f"Arb Percentage: {arb.arb_percentage:.2f}%",
f"Profit per $100: ${arb.estimated_profit_per_100:.2f}",
f"",
f"LEGS:",
]
for leg in arb.legs:
body_lines.append(
f" {leg['outcome']}: {leg['bookmaker']} @ "
f"{leg['american_odds']:+d} | "
f"Stake: {leg['stake_percentage']:.1%}"
)
if stake_info:
body_lines.extend([
f"",
f"STAKE ALLOCATION (for ${stake_info['total_staked']:.2f}):",
])
for outcome, amount in stake_info["stakes"].items():
body_lines.append(f" {outcome}: ${amount:.2f}")
body_lines.append(
f"Guaranteed Profit: ${stake_info['guaranteed_profit']:.2f}"
)
body_lines.append(f"\nDetected at: {arb.detected_at}")
body = "\n".join(body_lines)
self._dispatch_alert(subject, body, priority="high")
def send_value_alert(self, value: 'ValueOpportunity'):
"""Send an alert for a detected value betting opportunity."""
if value.expected_value_pct < self.config.value_alert_min_ev:
return
subject = (f"VALUE: {value.sport} | {value.outcome} | "
f"Edge: {value.edge_vs_fair:.1%} | "
f"EV: {value.expected_value_pct:.1%}")
body_lines = [
f"VALUE BET DETECTED",
f"{'=' * 50}",
f"Sport: {value.sport}",
f"Event: {value.home_team} vs {value.away_team}",
f"Market: {value.market}",
f"Bet On: {value.outcome}",
f"",
f"ANALYSIS:",
f" Model Probability: {value.model_probability:.1%}",
f" Market Implied (raw): {value.market_implied_probability:.1%}",
f" Market Implied (fair): {value.fair_implied_probability:.1%}",
f" Edge vs Fair: {value.edge_vs_fair:.1%}",
f" Expected Value: {value.expected_value_pct:.1%}",
f" Confidence: {value.confidence.upper()}",
f"",
f"EXECUTION:",
f" Best Odds: {value.best_odds:+d} @ {value.best_book}",
f"\nDetected at: {value.detected_at}",
]
body = "\n".join(body_lines)
priority = "high" if value.confidence == "high" else "medium"
self._dispatch_alert(subject, body, priority=priority)
def _dispatch_alert(self, subject: str, body: str,
priority: str = "medium"):
"""Send alert through all configured channels."""
self.alert_history.append({
"timestamp": datetime.utcnow().isoformat(),
"subject": subject,
"priority": priority,
})
if self.config.console_enabled:
self._send_console(subject, body)
if self.config.email_from:
self._send_email(subject, body)
if self.config.slack_webhook_url:
self._send_slack(subject, body, priority)
if self.config.telegram_bot_token:
self._send_telegram(subject, body)
def _send_console(self, subject: str, body: str):
border = "=" * 60
print(f"\n{border}")
print(f" ALERT: {subject}")
print(f"{border}")
print(body)
print(f"{border}\n")
def _send_email(self, subject: str, body: str):
try:
msg = MIMEMultipart()
msg["From"] = self.config.email_from
msg["To"] = self.config.email_to
msg["Subject"] = subject
msg.attach(MIMEText(body, "plain"))
with smtplib.SMTP(self.config.smtp_server,
self.config.smtp_port) as server:
server.starttls()
server.login(self.config.email_from,
self.config.email_password)
server.send_message(msg)
logger.info(f"Email alert sent: {subject}")
except Exception as e:
logger.error(f"Email alert failed: {e}")
def _send_slack(self, subject: str, body: str, priority: str):
try:
emoji = ":rotating_light:" if priority == "high" else ":chart_with_upwards_trend:"
payload = {
"text": f"{emoji} *{subject}*\n```{body}```"
}
response = requests.post(
self.config.slack_webhook_url,
json=payload, timeout=10
)
response.raise_for_status()
logger.info(f"Slack alert sent: {subject}")
except Exception as e:
logger.error(f"Slack alert failed: {e}")
def _send_telegram(self, subject: str, body: str):
try:
url = (f"https://api.telegram.org/bot"
f"{self.config.telegram_bot_token}/sendMessage")
payload = {
"chat_id": self.config.telegram_chat_id,
"text": f"*{subject}*\n\n{body}",
"parse_mode": "Markdown"
}
response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()
logger.info(f"Telegram alert sent: {subject}")
except Exception as e:
logger.error(f"Telegram alert failed: {e}")
Phase 4 Checklist
- [ ] At least one alert channel functional (email, Slack, or Telegram)
- [ ] Arbitrage alerts include all legs, odds, stake allocation, and profit
- [ ] Value alerts include model probability, market probability, edge, and EV
- [ ] Alert thresholds configurable to prevent alert fatigue
- [ ] Alert history logged for later review
- [ ] Alert delivery latency under 10 seconds from detection
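One way to make the "prevent alert fatigue" checklist item concrete is a per-opportunity cooldown placed in front of `AlertManager._dispatch_alert`. This sketch is an assumption-level illustration; the `AlertThrottle` class and its dedup-key format are our own additions, not part of the platform code above:

```python
import time
from typing import Dict

class AlertThrottle:
    """Suppress repeat alerts for the same opportunity within a cooldown window."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, event_id: str, market: str, outcome: str) -> bool:
        key = f"{event_id}|{market}|{outcome}"   # assumed dedup-key format
        now = time.monotonic()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False   # same opportunity, still inside the cooldown window
        self._last_sent[key] = now
        return True

throttle = AlertThrottle(cooldown_seconds=300.0)
print(throttle.should_send("evt1", "h2h", "Home"))  # True: first alert
print(throttle.should_send("evt1", "h2h", "Home"))  # False: duplicate
print(throttle.should_send("evt1", "h2h", "Away"))  # True: different outcome
```

A gate like `if throttle.should_send(...)` before each `_dispatch_alert` call keeps a persistent opportunity from re-alerting on every polling cycle while still allowing a fresh alert once the window expires.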
Phase 5: Execution and Tracking
Duration: Weeks 5--6
Relevant chapters: Chapter 4 (Bankroll Management), Chapter 14 (Advanced Bankroll), Chapter 37 (Discipline and Record-Keeping)
5.1 Bet Logging and Portfolio Management
Every bet must be logged with full detail per Chapter 37's discipline framework. The bet log is the foundation of all performance analysis.
"""
phase5_execution_tracking.py
Bet logging, portfolio management, and P&L tracking.
"""
import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timezone
from typing import Optional, List, Dict
from dataclasses import dataclass
import uuid
import logging
logger = logging.getLogger(__name__)
@dataclass
class BetRecord:
"""Complete record of a single bet (Chapter 37, Section 37.1)."""
bet_id: str
timestamp: str
event_id: str
sport: str
home_team: str
away_team: str
market: str # h2h, spreads, totals
outcome: str # team name, Over, Under
bookmaker: str
    odds_placed: int                # American odds at time of placement
    line_placed: Optional[float]    # spread/total line at placement (None for h2h)
    closing_odds: Optional[int]     # Closing line odds (filled after event)
    closing_line: Optional[float]   # Closing spread/total (filled after event)
    stake: float                    # Dollar amount risked
    strategy: str                   # "arbitrage", "value", "model"
    model_probability: float        # Model's estimated probability
    market_probability: float       # Market implied probability at placement
    edge_at_placement: float        # model_prob - fair_market_prob
    confidence: str                 # "low", "medium", "high"
    notes: str                      # Free text notes
    result: str                     # "win", "loss", "push", "pending"
    profit: float                   # Dollar profit/loss (filled after settlement)
    clv: Optional[float]            # Closing Line Value (filled after event)
class BetTracker:
"""
Comprehensive bet tracking and portfolio management system.
Records every bet with the full set of fields recommended in
Chapter 37 (Section 37.1), including reasoning, model output,
and market context.
"""
def __init__(self, db_path: str = "data/odds_platform.db",
initial_bankroll: float = 10000.0):
self.db_path = db_path
self.initial_bankroll = initial_bankroll
self._init_tables()
def _init_tables(self):
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS bets (
bet_id TEXT PRIMARY KEY,
timestamp TEXT NOT NULL,
event_id TEXT NOT NULL,
sport TEXT NOT NULL,
home_team TEXT NOT NULL,
away_team TEXT NOT NULL,
market TEXT NOT NULL,
outcome TEXT NOT NULL,
bookmaker TEXT NOT NULL,
odds_placed INTEGER NOT NULL,
line_placed REAL,
closing_odds INTEGER,
closing_line REAL,
stake REAL NOT NULL,
strategy TEXT NOT NULL,
model_probability REAL,
market_probability REAL,
edge_at_placement REAL,
confidence TEXT,
notes TEXT,
result TEXT DEFAULT 'pending',
profit REAL DEFAULT 0.0,
clv REAL
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS bankroll_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
balance REAL NOT NULL,
change REAL NOT NULL,
reason TEXT NOT NULL,
bet_id TEXT
)
""")
# Initialize bankroll if empty
cursor.execute("SELECT COUNT(*) FROM bankroll_log")
if cursor.fetchone()[0] == 0:
cursor.execute("""
INSERT INTO bankroll_log (timestamp, balance, change, reason)
VALUES (?, ?, ?, ?)
""", (datetime.now(timezone.utc).isoformat(),
self.initial_bankroll, self.initial_bankroll,
"Initial deposit"))
conn.commit()
conn.close()
def place_bet(self, bet: BetRecord) -> str:
"""Record a new bet and update bankroll."""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute("""
INSERT INTO bets
(bet_id, timestamp, event_id, sport, home_team, away_team,
market, outcome, bookmaker, odds_placed, line_placed,
stake, strategy, model_probability, market_probability,
edge_at_placement, confidence, notes)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
bet.bet_id, bet.timestamp, bet.event_id, bet.sport,
bet.home_team, bet.away_team, bet.market, bet.outcome,
bet.bookmaker, bet.odds_placed, bet.line_placed,
bet.stake, bet.strategy, bet.model_probability,
bet.market_probability, bet.edge_at_placement,
bet.confidence, bet.notes
))
# Update bankroll (deduct stake)
current_balance = self.get_current_bankroll()
new_balance = current_balance - bet.stake
cursor.execute("""
INSERT INTO bankroll_log (timestamp, balance, change, reason, bet_id)
VALUES (?, ?, ?, ?, ?)
""", (bet.timestamp, new_balance, -bet.stake,
f"Bet placed: {bet.outcome} @ {bet.bookmaker}", bet.bet_id))
conn.commit()
conn.close()
logger.info(f"Bet placed: {bet.bet_id} | {bet.outcome} "
f"@ {bet.odds_placed} | ${bet.stake:.2f}")
return bet.bet_id
def settle_bet(self, bet_id: str, result: str,
closing_odds: Optional[int] = None,
closing_line: Optional[float] = None):
"""
Settle a bet and update bankroll.
Also calculates Closing Line Value (CLV) -- the difference
between the odds you received and the closing odds.
CLV is the most reliable predictor of long-term profitability
(Chapter 12, Section 12.2).
"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get bet details
cursor.execute("SELECT * FROM bets WHERE bet_id = ?", (bet_id,))
row = cursor.fetchone()
if not row:
conn.close()
raise ValueError(f"Bet not found: {bet_id}")
columns = [d[0] for d in cursor.description]
bet_dict = dict(zip(columns, row))
odds_placed = bet_dict["odds_placed"]
stake = bet_dict["stake"]
# Calculate profit
if result == "win":
if odds_placed > 0:
profit = stake * (odds_placed / 100.0)
else:
profit = stake * (100.0 / abs(odds_placed))
elif result == "push":
profit = 0.0
else: # loss
profit = -stake
# Calculate CLV
clv = None
if closing_odds is not None:
placed_decimal = american_to_decimal(odds_placed)
closing_decimal = american_to_decimal(closing_odds)
# CLV as percentage: (placed_decimal / closing_decimal - 1) * 100
clv = (placed_decimal / closing_decimal - 1.0) * 100.0
# Update bet record
cursor.execute("""
UPDATE bets
SET result = ?, profit = ?, closing_odds = ?,
closing_line = ?, clv = ?
WHERE bet_id = ?
""", (result, profit, closing_odds, closing_line, clv, bet_id))
# Update bankroll (return stake + profit for wins, return stake for pushes)
current_balance = self.get_current_bankroll()
if result == "win":
change = stake + profit # Get stake back plus winnings
elif result == "push":
change = stake # Get stake back
else:
change = 0.0 # Stake already deducted at placement
new_balance = current_balance + change
cursor.execute("""
INSERT INTO bankroll_log (timestamp, balance, change, reason, bet_id)
VALUES (?, ?, ?, ?, ?)
""", (datetime.now(timezone.utc).isoformat(), new_balance, change,
f"Settled: {result} (profit: ${profit:.2f})", bet_id))
conn.commit()
conn.close()
logger.info(f"Bet settled: {bet_id} | {result} | "
f"Profit: ${profit:.2f} | CLV: {clv}")
def get_current_bankroll(self) -> float:
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute("""
SELECT balance FROM bankroll_log
ORDER BY id DESC LIMIT 1
""")
result = cursor.fetchone()
conn.close()
return result[0] if result else self.initial_bankroll
def get_performance_summary(self, sport: Optional[str] = None,
strategy: Optional[str] = None,
start_date: Optional[str] = None) -> Dict:
"""
Calculate comprehensive performance metrics (Chapter 37, Section 37.2).
"""
conn = sqlite3.connect(self.db_path)
query = "SELECT * FROM bets WHERE result != 'pending'"
params = []
if sport:
query += " AND sport = ?"
params.append(sport)
if strategy:
query += " AND strategy = ?"
params.append(strategy)
if start_date:
query += " AND timestamp >= ?"
params.append(start_date)
bets = pd.read_sql(query, conn, params=params)
conn.close()
if bets.empty:
return {"total_bets": 0, "message": "No settled bets found"}
total_staked = bets["stake"].sum()
total_profit = bets["profit"].sum()
wins = (bets["result"] == "win").sum()
losses = (bets["result"] == "loss").sum()
pushes = (bets["result"] == "push").sum()
# ROI (Chapter 3, Section 3.4)
roi = (total_profit / total_staked * 100) if total_staked > 0 else 0
# CLV analysis (Chapter 12)
clv_data = bets["clv"].dropna()
avg_clv = clv_data.mean() if len(clv_data) > 0 else None
        # Drawdown analysis (Chapter 14, Section 14.4)
        # Sort chronologically first: SQL row order is not guaranteed
        bets = bets.sort_values("timestamp")
        cumulative_profit = bets["profit"].cumsum()
        running_max = cumulative_profit.cummax()
        drawdown = cumulative_profit - running_max
        max_drawdown = drawdown.min()
        # Sharpe ratio proxy (Chapter 14, Section 14.2): per-bet returns,
        # annualized with sqrt(252) on the rough assumption of one bet per day
        bet_returns = bets["profit"] / bets["stake"]
        if len(bets) > 1 and bet_returns.std() > 0:
            sharpe = (bet_returns.mean() / bet_returns.std()) * np.sqrt(252)
        else:
            sharpe = None
return {
"total_bets": len(bets),
"wins": int(wins),
"losses": int(losses),
"pushes": int(pushes),
"win_rate": wins / (wins + losses) if (wins + losses) > 0 else 0,
"total_staked": round(total_staked, 2),
"total_profit": round(total_profit, 2),
"roi_pct": round(roi, 2),
"avg_stake": round(bets["stake"].mean(), 2),
"avg_clv": round(avg_clv, 3) if avg_clv is not None else None,
"max_drawdown": round(max_drawdown, 2),
"sharpe_ratio": round(sharpe, 2) if sharpe is not None else None,
"current_bankroll": round(self.get_current_bankroll(), 2),
}
Phase 5 Checklist
- [ ] Bet logging captures all 20+ fields from Chapter 37
- [ ] Bankroll balance updated in real time after every bet and settlement
- [ ] CLV calculated for every settled bet with available closing lines
- [ ] Performance summary includes ROI, win rate, CLV, max drawdown, Sharpe ratio
- [ ] Filtering by sport, strategy, and date range works correctly
- [ ] Bet settlement handles win, loss, and push cases correctly
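The CLV arithmetic from `settle_bet` is worth verifying in isolation before wiring it into settlement. This standalone sketch reimplements `american_to_decimal` inline so it runs by itself:

```python
def american_to_decimal(american: int) -> float:
    """Convert American odds to decimal odds."""
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)

def closing_line_value(odds_placed: int, closing_odds: int) -> float:
    """CLV as a percentage: how much better your price was than the close.
    Positive means you beat the closing line."""
    return (american_to_decimal(odds_placed) /
            american_to_decimal(closing_odds) - 1.0) * 100.0

# Took +110, line closed at +100: you beat the close by 5%
print(round(closing_line_value(110, 100), 2))    # 5.0
# Took -110, line closed at -105: you got the worse price
print(round(closing_line_value(-110, -105), 2))  # -2.22
```

Because CLV compares decimal odds ratios, it handles sign changes across the +100/-100 boundary correctly, which naive American-odds subtraction does not.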
Phase 6: Analysis and Reporting
Duration: Weeks 6--7
Relevant chapters: Chapter 30 (Model Evaluation), Chapter 37 (Performance Analysis), Chapter 41 (Performance Attribution)
6.1 Web Dashboard
Build an interactive dashboard using Streamlit, Dash, or a similar framework. The dashboard should provide operational monitoring and performance analysis.
Required dashboard pages:
- Live Monitor -- Current odds, active arbitrage opportunities, active value opportunities, pending bets.
- Arbitrage History -- All detected arbitrage opportunities with timestamps, sizes, and execution status.
- Value History -- All detected value opportunities with edge, EV, and outcome.
- Portfolio Performance -- Cumulative P&L chart, bankroll curve, drawdown chart, ROI by sport, ROI by strategy.
- CLV Analysis -- CLV distribution, average CLV by sport, CLV vs. actual profit scatter plot.
- Performance Attribution -- Decompose returns by sport, by strategy (arbitrage vs. value), by sportsbook, by market type, and by time period.
"""
phase6_dashboard.py
Streamlit-based web dashboard for the arbitrage and value platform.
"""
import streamlit as st
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from datetime import datetime, timedelta
import sqlite3
def load_data(db_path: str = "data/odds_platform.db"):
"""Load all data from the platform database."""
conn = sqlite3.connect(db_path)
bets = pd.read_sql("SELECT * FROM bets", conn)
bankroll = pd.read_sql("SELECT * FROM bankroll_log", conn)
conn.close()
return bets, bankroll
def main():
st.set_page_config(page_title="Betting Platform", layout="wide")
st.title("Multi-Sport Arbitrage & Value Detection Platform")
bets, bankroll = load_data()
# Sidebar navigation
page = st.sidebar.selectbox("Page", [
"Live Monitor",
"Portfolio Performance",
"Arbitrage History",
"Value History",
"CLV Analysis",
"Performance Attribution"
])
if page == "Portfolio Performance":
render_portfolio_page(bets, bankroll)
elif page == "CLV Analysis":
render_clv_page(bets)
elif page == "Performance Attribution":
render_attribution_page(bets)
# ... additional pages
def render_portfolio_page(bets: pd.DataFrame, bankroll: pd.DataFrame):
"""Render the portfolio performance page."""
st.header("Portfolio Performance")
settled = bets[bets["result"] != "pending"].copy()
if settled.empty:
st.warning("No settled bets yet.")
return
# Key metrics row
col1, col2, col3, col4, col5 = st.columns(5)
total_profit = settled["profit"].sum()
total_staked = settled["stake"].sum()
roi = (total_profit / total_staked * 100) if total_staked > 0 else 0
win_rate = (settled["result"] == "win").mean() * 100
col1.metric("Total P&L", f"${total_profit:,.2f}")
col2.metric("ROI", f"{roi:.1f}%")
col3.metric("Win Rate", f"{win_rate:.1f}%")
col4.metric("Total Bets", len(settled))
col5.metric("Bankroll", f"${bankroll['balance'].iloc[-1]:,.2f}")
# Cumulative P&L chart
settled = settled.sort_values("timestamp")
settled["cumulative_pl"] = settled["profit"].cumsum()
fig = go.Figure()
fig.add_trace(go.Scatter(
x=list(range(len(settled))),
y=settled["cumulative_pl"],
mode="lines",
name="Cumulative P&L",
line=dict(color="green" if total_profit > 0 else "red", width=2)
))
fig.update_layout(
title="Cumulative Profit & Loss",
xaxis_title="Bet Number",
yaxis_title="Cumulative P&L ($)",
height=400
)
st.plotly_chart(fig, use_container_width=True)
# Drawdown chart (Chapter 14, Section 14.4)
running_max = settled["cumulative_pl"].cummax()
drawdown = settled["cumulative_pl"] - running_max
fig_dd = go.Figure()
fig_dd.add_trace(go.Scatter(
x=list(range(len(settled))),
y=drawdown,
fill="tozeroy",
name="Drawdown",
line=dict(color="red")
))
fig_dd.update_layout(
title="Drawdown",
xaxis_title="Bet Number",
yaxis_title="Drawdown ($)",
height=300
)
st.plotly_chart(fig_dd, use_container_width=True)
# Performance by sport
by_sport = settled.groupby("sport").agg(
bets=("bet_id", "count"),
profit=("profit", "sum"),
staked=("stake", "sum"),
win_rate=("result", lambda x: (x == "win").mean())
).reset_index()
by_sport["roi"] = by_sport["profit"] / by_sport["staked"] * 100
st.subheader("Performance by Sport")
st.dataframe(by_sport.style.format({
"profit": "${:,.2f}",
"staked": "${:,.2f}",
"win_rate": "{:.1%}",
"roi": "{:.1f}%"
}))
def render_clv_page(bets: pd.DataFrame):
"""
Render CLV analysis page.
CLV (Closing Line Value) is the single most important metric
for evaluating a bettor's long-term edge (Chapter 12, Section 12.2).
Positive CLV means you consistently beat the closing line,
which is strongly predictive of long-term profitability.
"""
st.header("Closing Line Value Analysis")
clv_bets = bets[bets["clv"].notna()].copy()
if clv_bets.empty:
st.warning("No CLV data available yet.")
return
avg_clv = clv_bets["clv"].mean()
    st.metric("Average CLV", f"{avg_clv:.2f}%",
              help="Positive CLV = consistently beating the closing line")
# CLV distribution
fig = px.histogram(clv_bets, x="clv", nbins=30,
title="CLV Distribution")
fig.add_vline(x=0, line_dash="dash", line_color="red")
fig.add_vline(x=avg_clv, line_dash="dash", line_color="green")
st.plotly_chart(fig, use_container_width=True)
# CLV vs actual profit
fig2 = px.scatter(clv_bets, x="clv", y="profit",
color="sport",
title="CLV vs. Actual Profit per Bet")
st.plotly_chart(fig2, use_container_width=True)
def render_attribution_page(bets: pd.DataFrame):
"""
Performance attribution analysis (Chapter 41, Section 41.4).
Decompose returns to understand where profit is coming from.
"""
st.header("Performance Attribution")
settled = bets[bets["result"] != "pending"].copy()
if settled.empty:
st.warning("No settled bets yet.")
return
# Attribution by strategy
st.subheader("By Strategy")
by_strat = settled.groupby("strategy").agg(
bets=("bet_id", "count"),
profit=("profit", "sum"),
staked=("stake", "sum"),
).reset_index()
by_strat["roi"] = by_strat["profit"] / by_strat["staked"] * 100
fig = px.bar(by_strat, x="strategy", y="profit",
title="Profit by Strategy", color="strategy")
st.plotly_chart(fig, use_container_width=True)
# Attribution by bookmaker
st.subheader("By Sportsbook")
by_book = settled.groupby("bookmaker").agg(
bets=("bet_id", "count"),
profit=("profit", "sum"),
avg_clv=("clv", "mean"),
).reset_index()
st.dataframe(by_book.style.format({
"profit": "${:,.2f}",
"avg_clv": "{:.2f}%"
}))
if __name__ == "__main__":
main()
Phase 6 Checklist
- [ ] Dashboard has at least 4 functional pages
- [ ] Cumulative P&L chart displays correctly
- [ ] Drawdown chart updates with new data
- [ ] Performance segmented by sport, strategy, and bookmaker
- [ ] CLV analysis page shows distribution and relationship to profit
- [ ] Dashboard loads within 3 seconds
Grading Rubric
| Component | Weight | Excellent (90-100%) | Good (75-89%) | Satisfactory (60-74%) | Needs Work (<60%) |
|---|---|---|---|---|---|
| Odds Infrastructure | 20% | 8+ books, 5+ sports, < 1s latency, robust error handling | 5 books, 3 sports, < 5s latency | 3 books, 2 sports, works but fragile | Fewer than 3 books; frequent failures |
| Arbitrage Detection | 20% | 2-way + 3-way detection; cross-market arbs; data error filtering | 2-way + 3-way; basic filtering | 2-way only; no data validation | Incorrect math; missed arbs |
| Value Detection | 20% | 3+ sport models, calibrated, power vig removal, backtested | 1 sport model, multiplicative vig removal | Elo-only baseline; not calibrated | No probability model; arbitrary thresholds |
| Alert & Execution | 15% | 3+ channels; full bet logging with CLV; automated settlement | 1 channel; bet logging; manual settlement | Console only; basic logging | No alerts; no tracking |
| Dashboard & Reporting | 15% | Interactive web app; 6+ pages; full attribution analysis | Static notebook; 4 pages; basic charts | Minimal visualization; 2 pages | No dashboard; raw numbers only |
| Documentation | 10% | Architecture doc + API doc + user guide; clean code | Architecture doc; adequate comments | README only; sparse comments | No documentation |
Suggested Timeline
| Week | Phase | Key Activities | Milestone |
|---|---|---|---|
| 1 | Phase 1: Infrastructure | API setup, database schema, initial data collection | Odds flowing into database |
| 2 | Phase 1 (cont.) + Phase 2 | Continuous collection; two-way arbitrage detection | Arbs detected on live data |
| 3 | Phase 2 (cont.) + Phase 3 | Three-way arbs; begin value model for first sport | First value bets identified |
| 4 | Phase 3 (cont.) + Phase 4 | Calibrate value model; build alert system | Alerts firing on detection |
| 5 | Phase 4 (cont.) + Phase 5 | Multi-channel alerts; bet logging system | Full bet tracking operational |
| 6 | Phase 5 (cont.) + Phase 6 | Settlement; P&L tracking; begin dashboard | Performance metrics available |
| 7 | Phase 6: Dashboard | Complete all dashboard pages; simulated operation | Dashboard functional |
| 8 | Documentation & Polish | Architecture doc; simulated operation report; cleanup | All deliverables submitted |
Chapter Reference Index
- Chapter 2 (Probability and Odds): Odds conversion, implied probability, vig removal methods
- Chapter 3 (Expected Value): EV calculation for value bets
- Chapter 4 (Bankroll Management): Bankroll allocation across arbitrage and value strategies
- Chapter 5 (Data Literacy): Database design, data quality, API integration
- Chapter 11 (Betting Markets): Market structure, how odds are set, market efficiency
- Chapter 12 (Line Shopping): Multi-book comparison, CLV tracking and interpretation
- Chapter 13 (Value Betting): True vs. market probability, systematic value identification
- Chapter 14 (Advanced Bankroll): Portfolio theory for multi-strategy operations, drawdown management
- Chapter 25 (Optimization): Arbitrage detection algorithms, portfolio optimization
- Chapter 26 (Ratings and Rankings): Elo model as baseline probability engine
- Chapter 27 (Advanced Regression): Probability calibration for value models
- Chapter 30 (Model Evaluation): Backtesting value models, calibration assessment
- Chapter 31 (ML Betting Pipeline): System architecture, data pipelines, monitoring
- Chapter 37 (Discipline Systems): Bet logging, performance tracking, systematic processes
- Chapter 38 (Risk Management): Loss limits, risk controls, responsible gambling
- Chapter 41 (Putting It Together): Performance attribution, portfolio approach
Tips for Success
- Start with data reliability. Your arbitrage detector is only as good as your odds data. A single erroneous odds value will produce phantom arbitrage opportunities. Build data validation from day one.
- Arbitrage windows are narrow. Real arbitrage opportunities in liquid markets typically last seconds to minutes, so your system's value is proportional to its speed. Focus on latency optimization.
- Beware of execution risk. Detecting an arbitrage is not the same as capturing it: sportsbooks may reject bets, limits may be too low, or one leg may execute while the other fails. Account for these scenarios in your P&L tracking.
- CLV is your north star. The single best indicator of whether your value detection is working is Closing Line Value (Chapter 12). If you consistently get better odds than the closing line, you are likely a long-term winner even when short-term results are negative.
- Calibration matters enormously. An uncalibrated probability model will generate false value signals. Apply Chapter 27's calibration techniques and Chapter 30's evaluation methods rigorously.
- Build for monitoring, not just detection. The most successful platforms tell you when something is wrong (stale data, model drift, API failures) as quickly as they tell you when something is right (opportunities).
This capstone project integrates material from Chapters 2--5, 11--14, 25--27, 30--31, 37--38, and 41 of The Sports Betting Textbook.