Capstone Project 2: Multi-Sport Arbitrage and Value Detection Platform


Project Overview

In this capstone project, you will design and build a production-grade platform that monitors real-time odds from multiple sportsbooks across multiple sports, detects arbitrage opportunities and value bets, and provides an alert and tracking system for managing a betting portfolio. This project puts you in the role of a quantitative betting operations engineer -- someone who builds the infrastructure that turns analytical insights into executable trades.

Arbitrage betting (Chapter 25, Section 25.3) exploits pricing discrepancies between sportsbooks to guarantee a profit regardless of the outcome. Value betting (Chapter 13) identifies situations where a sportsbook's implied probability is lower than the true probability of an event, creating a positive expected value opportunity. Both strategies require fast, reliable data infrastructure, precise mathematical calculations, and disciplined execution tracking.

This project draws on material from across the textbook: odds conversion and implied probability (Chapter 2), expected value theory (Chapter 3), bankroll management (Chapter 4), data literacy and database design (Chapter 5), the betting marketplace and line shopping (Chapters 11--12), value betting theory (Chapter 13), staking strategies (Chapter 14), optimization (Chapter 25), the ML betting pipeline (Chapter 31), discipline systems (Chapter 37), and risk management (Chapter 38).

Learning Objectives

Upon completing this project, you will be able to:

  1. Architect a real-time data pipeline that ingests odds from multiple sportsbook APIs concurrently.
  2. Implement the mathematical foundations of arbitrage detection for two-way, three-way, and multi-leg markets.
  3. Build value detection engines that compare model-derived probabilities to market odds.
  4. Create real-time alerting infrastructure for time-sensitive betting opportunities.
  5. Design and implement bet logging and P&L tracking systems.
  6. Build web-based dashboards for operational monitoring and performance analysis.
  7. Manage a multi-sport, multi-sportsbook portfolio with proper risk controls.

Requirements and Deliverables

Specific, Measurable Requirements

Requirement                 | Minimum Standard                       | Exceeds Expectations
Sportsbooks monitored       | 5 distinct books                       | 8+ books including offshore
Sports covered              | 3 sports (e.g., NFL, NBA, MLB)         | 5+ sports including international
Arbitrage detection latency | < 5 seconds from odds update to alert  | < 1 second
Value model sports          | 1 sport with probability model         | 3+ sports with calibrated models
Alert channels              | 1 channel (email or console)           | 3+ channels (email, SMS, Slack, push)
Dashboard                   | Static Jupyter notebook                | Interactive web app (Streamlit/Dash)
Bet logging                 | CSV/spreadsheet                        | Database with full API tracking
Backtest period             | 1 month simulated operation            | 3+ months with performance attribution
Documentation               | README + inline comments               | Architecture doc + API doc + user guide

Final Deliverables

  1. Working platform with complete source code, configuration files, and deployment instructions.
  2. System architecture document (5--8 pages) with data flow diagrams, component descriptions, and technology choices.
  3. One-month simulated operation report showing detected opportunities, executed bets, and P&L analysis.
  4. Live demonstration (15 minutes) showing the platform running, detecting an opportunity, and generating an alert.

Phase 1: Odds Data Infrastructure

Duration: Weeks 1--2 (of 8)

Relevant chapters: Chapter 5 (Data Literacy), Chapter 11 (Betting Markets), Chapter 12 (Line Shopping), Chapter 31 (ML Pipeline)

1.1 Sportsbook API Integration

The foundation of this platform is reliable, fast odds data from multiple sportsbooks. Your primary data source is The Odds API (https://the-odds-api.com/), which aggregates odds from dozens of sportsbooks in a single standardized API.

"""
phase1_odds_infrastructure.py
Real-time odds collection from multiple sportsbooks across multiple sports.
"""

import requests
import pandas as pd  # needed at module level for the pd.DataFrame annotations in OddsDataAccess
import sqlite3
import time
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, asdict
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

@dataclass
class OddsConfig:
    """Configuration for the odds collection system."""
    api_key: str
    db_path: str = "data/odds_platform.db"
    poll_interval_seconds: int = 60
    regions: str = "us,eu,uk,au"
    markets: str = "h2h,spreads,totals"
    odds_format: str = "american"
    bookmakers: str = (
        "draftkings,fanduel,betmgm,caesars,pointsbet,"
        "wynnbet,bet365,unibet,bovada,betonlineag"
    )
    sports: Optional[List[str]] = None

    def __post_init__(self):
        if self.sports is None:
            self.sports = [
                "americanfootball_nfl",
                "basketball_nba",
                "baseball_mlb",
                "icehockey_nhl",
                "soccer_epl",
            ]


# ---------------------------------------------------------------------------
# Data Models
# ---------------------------------------------------------------------------

@dataclass
class OddsSnapshot:
    """A single odds snapshot for one outcome at one sportsbook."""
    snapshot_id: str
    timestamp: str
    sport: str
    event_id: str
    home_team: str
    away_team: str
    commence_time: str
    bookmaker: str
    market: str         # "h2h", "spreads", "totals"
    outcome_name: str   # team name, "Over", "Under"
    price: int          # American odds
    point: Optional[float] = None  # spread or total line (None for h2h)


# ---------------------------------------------------------------------------
# Odds Collection Engine
# ---------------------------------------------------------------------------

class OddsCollector:
    """
    Collects odds from The Odds API and stores them in SQLite.

    Design decisions (informed by Chapter 31):
    - SQLite for simplicity; upgrade to PostgreSQL for production scale
    - Concurrent fetching across sports using ThreadPoolExecutor
    - Full snapshot history stored for line movement analysis
    - Primary-key deduplication on snapshot_id (event + book + market + outcome + timestamp)
    """

    def __init__(self, config: OddsConfig):
        self.config = config
        self.db_path = Path(config.db_path)
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self._init_database()

    def _init_database(self):
        """Create the odds database schema."""
        conn = sqlite3.connect(str(self.db_path))
        cursor = conn.cursor()

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS odds_snapshots (
                snapshot_id TEXT PRIMARY KEY,
                timestamp TEXT NOT NULL,
                sport TEXT NOT NULL,
                event_id TEXT NOT NULL,
                home_team TEXT NOT NULL,
                away_team TEXT NOT NULL,
                commence_time TEXT,
                bookmaker TEXT NOT NULL,
                market TEXT NOT NULL,
                outcome_name TEXT NOT NULL,
                price INTEGER NOT NULL,
                point REAL
            )
        """)

        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_event_book_market
            ON odds_snapshots (event_id, bookmaker, market)
        """)

        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_timestamp
            ON odds_snapshots (timestamp)
        """)

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS api_log (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                sport TEXT NOT NULL,
                status_code INTEGER,
                remaining_requests INTEGER,
                events_returned INTEGER,
                error_message TEXT
            )
        """)

        conn.commit()
        conn.close()

    def fetch_sport_odds(self, sport: str) -> List[OddsSnapshot]:
        """
        Fetch current odds for a single sport from The Odds API.

        API documentation: https://the-odds-api.com/liveapi/guides/v4/
        Free tier: 500 requests/month
        Paid tiers scale up to unlimited requests

        Returns a list of OddsSnapshot objects.
        """
        url = f"https://api.the-odds-api.com/v4/sports/{sport}/odds/"
        params = {
            "apiKey": self.config.api_key,
            "regions": self.config.regions,
            "markets": self.config.markets,
            "oddsFormat": self.config.odds_format,
            "bookmakers": self.config.bookmakers,
        }

        try:
            response = requests.get(url, params=params, timeout=30)
            remaining = response.headers.get("x-requests-remaining", "unknown")
            logger.info(f"[{sport}] Status: {response.status_code}, "
                        f"Remaining API calls: {remaining}")

            self._log_api_call(sport, response.status_code,
                               int(remaining) if remaining != "unknown" else None,
                               len(response.json()) if response.ok else 0)

            response.raise_for_status()
            data = response.json()

        except requests.RequestException as e:
            logger.error(f"[{sport}] API request failed: {e}")
            self._log_api_call(sport, getattr(e.response, 'status_code', None),
                               None, 0, str(e))
            return []

        snapshots = []
        now = datetime.now(timezone.utc).isoformat()

        for event in data:
            event_id = event["id"]
            home = event.get("home_team", "")
            away = event.get("away_team", "")
            commence = event.get("commence_time", "")

            for bookmaker in event.get("bookmakers", []):
                book_key = bookmaker["key"]

                for market in bookmaker.get("markets", []):
                    market_key = market["key"]

                    for outcome in market.get("outcomes", []):
                        price = outcome.get("price")
                        if price is None:
                            continue  # skip outcomes with no quoted price

                        snapshot_id = (
                            f"{event_id}_{book_key}_{market_key}_"
                            f"{outcome['name']}_{now}"
                        )

                        snapshots.append(OddsSnapshot(
                            snapshot_id=snapshot_id,
                            timestamp=now,
                            sport=sport,
                            event_id=event_id,
                            home_team=home,
                            away_team=away,
                            commence_time=commence,
                            bookmaker=book_key,
                            market=market_key,
                            outcome_name=outcome["name"],
                            price=int(price),
                            point=outcome.get("point"),
                        ))

        return snapshots

    def fetch_all_sports(self) -> List[OddsSnapshot]:
        """
        Fetch odds for all configured sports concurrently.
        Uses ThreadPoolExecutor for parallel API calls (Chapter 31).
        """
        all_snapshots = []

        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = {
                executor.submit(self.fetch_sport_odds, sport): sport
                for sport in self.config.sports
            }

            for future in as_completed(futures):
                sport = futures[future]
                try:
                    snapshots = future.result()
                    all_snapshots.extend(snapshots)
                    logger.info(f"[{sport}] Collected {len(snapshots)} "
                                f"odds snapshots")
                except Exception as e:
                    logger.error(f"[{sport}] Collection failed: {e}")

        return all_snapshots

    def store_snapshots(self, snapshots: List[OddsSnapshot]):
        """Store odds snapshots to the database."""
        if not snapshots:
            return

        conn = sqlite3.connect(str(self.db_path))
        cursor = conn.cursor()

        rows = [
            (s.snapshot_id, s.timestamp, s.sport, s.event_id,
             s.home_team, s.away_team, s.commence_time,
             s.bookmaker, s.market, s.outcome_name, s.price, s.point)
            for s in snapshots
        ]

        cursor.executemany("""
            INSERT OR IGNORE INTO odds_snapshots
            (snapshot_id, timestamp, sport, event_id, home_team,
             away_team, commence_time, bookmaker, market,
             outcome_name, price, point)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, rows)

        conn.commit()
        conn.close()
        logger.info(f"Stored {len(snapshots)} snapshots to database")

    def _log_api_call(self, sport, status_code, remaining, events, error=None):
        conn = sqlite3.connect(str(self.db_path))
        conn.execute("""
            INSERT INTO api_log (timestamp, sport, status_code,
                                  remaining_requests, events_returned,
                                  error_message)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (datetime.now(timezone.utc).isoformat(), sport,
              status_code, remaining, events, error))
        conn.commit()
        conn.close()

    def run_continuous_collection(self):
        """
        Run the collector in a continuous loop with configurable
        polling interval. This is the production data ingestion loop
        (Chapter 31, Section 31.2).
        """
        logger.info(f"Starting continuous collection. "
                     f"Poll interval: {self.config.poll_interval_seconds}s")
        logger.info(f"Sports: {self.config.sports}")
        logger.info(f"Database: {self.db_path}")

        while True:
            try:
                snapshots = self.fetch_all_sports()
                self.store_snapshots(snapshots)
                logger.info(f"Cycle complete. Total: {len(snapshots)} snapshots. "
                            f"Sleeping {self.config.poll_interval_seconds}s...")
            except Exception as e:
                logger.error(f"Collection cycle failed: {e}", exc_info=True)

            time.sleep(self.config.poll_interval_seconds)


# ---------------------------------------------------------------------------
# Data Access Layer
# ---------------------------------------------------------------------------

class OddsDataAccess:
    """
    Query interface for the odds database.
    Provides methods for retrieving current odds, historical snapshots,
    and line movement data.
    """

    def __init__(self, db_path: str = "data/odds_platform.db"):
        self.db_path = db_path

    def get_current_odds(self, sport: Optional[str] = None,
                          event_id: Optional[str] = None) -> pd.DataFrame:
        """
        Get the most recent odds snapshot for each event/bookmaker/market
        combination. This is the "current state" of the market.
        """
        import pandas as pd

        conn = sqlite3.connect(self.db_path)

        query = """
            SELECT o.*
            FROM odds_snapshots o
            INNER JOIN (
                SELECT event_id, bookmaker, market, outcome_name,
                       MAX(timestamp) as max_ts
                FROM odds_snapshots
                {where_clause}
                GROUP BY event_id, bookmaker, market, outcome_name
            ) latest
            ON o.event_id = latest.event_id
               AND o.bookmaker = latest.bookmaker
               AND o.market = latest.market
               AND o.outcome_name = latest.outcome_name
               AND o.timestamp = latest.max_ts
        """

        conditions = []
        params = []
        if sport:
            conditions.append("sport = ?")
            params.append(sport)
        if event_id:
            conditions.append("event_id = ?")
            params.append(event_id)

        where_clause = ""
        if conditions:
            where_clause = "WHERE " + " AND ".join(conditions)

        query = query.format(where_clause=where_clause)
        df = pd.read_sql(query, conn, params=params)
        conn.close()
        return df

    def get_line_movement(self, event_id: str, bookmaker: str,
                           market: str) -> pd.DataFrame:
        """Get the full history of odds changes for a specific line."""
        import pandas as pd

        conn = sqlite3.connect(self.db_path)
        df = pd.read_sql("""
            SELECT timestamp, outcome_name, price, point
            FROM odds_snapshots
            WHERE event_id = ? AND bookmaker = ? AND market = ?
            ORDER BY timestamp
        """, conn, params=[event_id, bookmaker, market])
        conn.close()
        return df

    def get_upcoming_events(self, sport: Optional[str] = None) -> pd.DataFrame:
        """Get unique upcoming events, ordered by commence time."""
        import pandas as pd

        conn = sqlite3.connect(self.db_path)
        # commence_time is stored as ISO-8601 with a 'T' separator, so
        # compare against the same format rather than datetime('now'),
        # which uses a space separator and would mis-order same-day events.
        query = """
            SELECT DISTINCT event_id, sport, home_team, away_team,
                            commence_time
            FROM odds_snapshots
            WHERE commence_time > strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
        """
        params = []
        if sport:
            query += " AND sport = ?"
            params.append(sport)

        query += " ORDER BY commence_time"
        df = pd.read_sql(query, conn, params=params)
        conn.close()
        return df

1.2 Database Design

The database schema stores every odds snapshot with a timestamp, enabling both real-time queries (current best odds) and historical analysis (line movements, closing line value). The schema follows the data modeling principles from Chapter 5.

Core tables:

Table                   | Purpose                  | Key Columns
odds_snapshots          | Every odds observation   | event_id, bookmaker, market, outcome, price, point, timestamp
events                  | Unique sporting events   | event_id, sport, home, away, commence_time, result
arbitrage_opportunities | Detected arb instances   | event_id, timestamp, market, arb_percentage, books_involved
value_opportunities     | Detected value bets      | event_id, timestamp, market, model_prob, implied_prob, edge
bets                    | Executed bet records     | bet_id, event_id, bookmaker, market, outcome, odds, stake, result
bankroll_log            | Running bankroll balance | timestamp, balance, change, reason
api_log                 | API call tracking        | timestamp, sport, status, remaining_requests
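The Phase 1 code creates only odds_snapshots and api_log; the remaining tables follow the same pattern. As a sketch, the DDL for two of them might look like the following -- column names track the table above, but the exact types and constraints are illustrative, not prescribed:

```python
import sqlite3

# Illustrative DDL for two Phase 1 tables not created by
# OddsCollector._init_database(); adapt columns to your own design.
DDL = """
CREATE TABLE IF NOT EXISTS arbitrage_opportunities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    market TEXT NOT NULL,
    arb_percentage REAL NOT NULL,
    books_involved TEXT NOT NULL      -- e.g., a JSON list of bookmaker keys
);
CREATE TABLE IF NOT EXISTS bets (
    bet_id TEXT PRIMARY KEY,
    event_id TEXT NOT NULL,
    bookmaker TEXT NOT NULL,
    market TEXT NOT NULL,
    outcome TEXT NOT NULL,
    odds INTEGER NOT NULL,            -- American odds at execution time
    stake REAL NOT NULL,
    result TEXT                       -- 'win', 'loss', 'push', or NULL if open
);
"""

conn = sqlite3.connect(":memory:")  # in-memory here; use the platform DB in practice
conn.executescript(DDL)
```

Using executescript keeps multi-statement schema setup in one call; in the platform itself these statements would live alongside the existing `CREATE TABLE` blocks in `_init_database`.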

Phase 1 Checklist

  • [ ] The Odds API key obtained and tested (free tier is sufficient for development)
  • [ ] Odds collection works for all target sports
  • [ ] Database schema created with proper indexes
  • [ ] Continuous collection loop runs without crashing for 24 hours
  • [ ] API rate limits respected and tracked
  • [ ] Data quality verified: odds values are reasonable, no null event IDs
  • [ ] Line movement history accumulating correctly

Phase 2: Arbitrage Detection Engine

Duration: Weeks 2--3

Relevant chapters: Chapter 2 (Probability and Odds), Chapter 25 (Optimization, Section 25.3)

2.1 Mathematical Foundation

An arbitrage opportunity exists when the combined implied probabilities of all outcomes at the best available odds across sportsbooks sum to less than 1.0. Chapter 25, Section 25.3 provides the formal definition:

For a two-outcome market (e.g., NFL moneyline), arbitrage exists when:

1/decimal_odds_A + 1/decimal_odds_B < 1.0

The arbitrage percentage is:

arb_pct = (1.0 - (1/decimal_odds_A + 1/decimal_odds_B)) * 100

The optimal stake allocation to guarantee equal profit regardless of outcome follows from Chapter 25:

stake_A = total_stake * (1/decimal_odds_A) / (1/decimal_odds_A + 1/decimal_odds_B)
stake_B = total_stake * (1/decimal_odds_B) / (1/decimal_odds_A + 1/decimal_odds_B)

For three-outcome markets (e.g., soccer with draw), the same logic extends to three legs.
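As a hand-check before implementing the detector, suppose the best available prices on a two-way market are +120 at one book and -110 at another (hypothetical numbers, not from any real market):

```python
def american_to_decimal(american: int) -> float:
    # Chapter 2 conversion: +120 -> 2.20, -110 -> 1.9091
    return 1 + american / 100 if american > 0 else 1 + 100 / abs(american)

odds_a, odds_b = 120, -110                        # best prices at two books
dec_a, dec_b = american_to_decimal(odds_a), american_to_decimal(odds_b)

total_inv = 1 / dec_a + 1 / dec_b                 # ~0.9784 < 1.0 -> arbitrage
arb_pct = (1 - total_inv) * 100                   # ~2.16%

total_stake = 100.0
stake_a = total_stake * (1 / dec_a) / total_inv   # ~$46.46
stake_b = total_stake * (1 / dec_b) / total_inv   # ~$53.54

# Both outcomes pay total_stake / total_inv, so the profit is identical
payout_a = stake_a * dec_a                        # ~$102.21 either way
payout_b = stake_b * dec_b
```

Note the payout is total_stake / total_inv regardless of outcome, so the realized return on stake is 1/total_inv - 1 (about 2.21% here), slightly more than the arb percentage 1 - total_inv.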

"""
phase2_arbitrage_detection.py
Real-time arbitrage detection across multiple sportsbooks.
"""

import numpy as np
import pandas as pd
from itertools import combinations, product
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass
from datetime import datetime, timezone
import logging

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Odds Conversion Utilities (Chapter 2)
# ---------------------------------------------------------------------------

def american_to_decimal(american: int) -> float:
    """
    Convert American odds to decimal odds (Chapter 2, Section 2.2).
    American +150 -> Decimal 2.50
    American -200 -> Decimal 1.50
    """
    if american > 0:
        return 1.0 + american / 100.0
    elif american < 0:
        return 1.0 + 100.0 / abs(american)
    else:
        raise ValueError("American odds of 0 are undefined (even money is +100 or -100)")


def american_to_implied_prob(american: int) -> float:
    """
    Convert American odds to implied probability (Chapter 2, Section 2.3).
    This is the raw implied probability, which still includes the vig.
    """
    if american > 0:
        return 100.0 / (american + 100.0)
    elif american < 0:
        return abs(american) / (abs(american) + 100.0)
    else:
        raise ValueError("American odds of 0 are undefined (even money is +100 or -100)")


# ---------------------------------------------------------------------------
# Arbitrage Detection
# ---------------------------------------------------------------------------

@dataclass
class ArbitrageOpportunity:
    """Represents a detected arbitrage opportunity."""
    event_id: str
    sport: str
    home_team: str
    away_team: str
    market: str
    arb_percentage: float           # Guaranteed profit percentage
    total_implied_probability: float # Sum of implied probs (< 1.0 for arb)
    legs: List[Dict]                # Each leg: book, outcome, odds, stake_pct
    detected_at: str
    estimated_profit_per_100: float  # Profit per $100 wagered


class ArbitrageDetector:
    """
    Detects arbitrage opportunities across sportsbooks in real time.

    Mathematical basis from Chapter 25, Section 25.3:
    Arbitrage exists when the best available odds across all sportsbooks
    for each outcome in a market produce a combined implied probability
    strictly less than 1.0.

    Detection algorithm:
    1. For each event and market, find the best odds for each outcome
       across all sportsbooks.
    2. Convert each best odds to implied probability.
    3. Sum the implied probabilities.
    4. If the sum < 1.0, an arbitrage opportunity exists.
    5. Calculate optimal stake allocation and guaranteed profit.
    """

    def __init__(self, min_arb_pct: float = 0.1, max_arb_pct: float = 20.0):
        """
        Parameters:
        - min_arb_pct: minimum arbitrage percentage to report (filters noise)
        - max_arb_pct: maximum arb percentage to report (very large arbs
          are usually data errors, per Chapter 25)
        """
        self.min_arb_pct = min_arb_pct
        self.max_arb_pct = max_arb_pct

    def detect_two_way_arbitrage(self, event_odds: pd.DataFrame) -> Optional[ArbitrageOpportunity]:
        """
        Detect arbitrage in a two-outcome market (moneyline, spread, total).

        event_odds should contain columns:
        event_id, sport, home_team, away_team, bookmaker, market,
        outcome_name, price

        For a two-way market, there should be exactly 2 distinct outcomes.
        """
        # Get unique outcomes
        outcomes = event_odds["outcome_name"].unique()
        if len(outcomes) != 2:
            return None

        event_id = event_odds["event_id"].iloc[0]
        sport = event_odds["sport"].iloc[0]
        home = event_odds["home_team"].iloc[0]
        away = event_odds["away_team"].iloc[0]
        market = event_odds["market"].iloc[0]

        # Find best odds for each outcome
        best_odds = {}
        for outcome in outcomes:
            outcome_df = event_odds[event_odds["outcome_name"] == outcome]
            best_row = outcome_df.loc[outcome_df["price"].idxmax()]
            best_odds[outcome] = {
                "bookmaker": best_row["bookmaker"],
                "price": int(best_row["price"]),
                "decimal_odds": american_to_decimal(int(best_row["price"])),
                "implied_prob": american_to_implied_prob(int(best_row["price"])),
                "point": best_row.get("point"),
            }

        # Check for arbitrage
        total_implied = sum(o["implied_prob"] for o in best_odds.values())

        if total_implied < 1.0:
            arb_pct = (1.0 - total_implied) * 100.0

            if arb_pct < self.min_arb_pct or arb_pct > self.max_arb_pct:
                return None

            # Calculate optimal stake allocation
            legs = []
            for outcome, odds_info in best_odds.items():
                stake_pct = (1.0 / odds_info["decimal_odds"]) / total_implied
                legs.append({
                    "outcome": outcome,
                    "bookmaker": odds_info["bookmaker"],
                    "american_odds": odds_info["price"],
                    "decimal_odds": odds_info["decimal_odds"],
                    "implied_probability": odds_info["implied_prob"],
                    "stake_percentage": stake_pct,
                    "point": odds_info["point"],
                })

            # Profit per $100 total stake
            # If we invest $100 total, we get back:
            # stake_on_A * decimal_odds_A = (100 * stake_pct_A) * decimal_A
            # This is the same regardless of which outcome hits
            profit_per_100 = 100.0 * (1.0 / total_implied - 1.0)

            return ArbitrageOpportunity(
                event_id=event_id,
                sport=sport,
                home_team=home,
                away_team=away,
                market=market,
                arb_percentage=arb_pct,
                total_implied_probability=total_implied,
                legs=legs,
                detected_at=datetime.now(timezone.utc).isoformat(),
                estimated_profit_per_100=profit_per_100,
            )

        return None

    def detect_three_way_arbitrage(self, event_odds: pd.DataFrame) -> Optional[ArbitrageOpportunity]:
        """
        Detect arbitrage in a three-outcome market (soccer: home/draw/away).

        The math extends naturally from the two-way case:
        Sum of 1/decimal_odds for all three outcomes must be < 1.0.
        """
        outcomes = event_odds["outcome_name"].unique()
        if len(outcomes) != 3:
            return None

        event_id = event_odds["event_id"].iloc[0]
        sport = event_odds["sport"].iloc[0]
        home = event_odds["home_team"].iloc[0]
        away = event_odds["away_team"].iloc[0]
        market = event_odds["market"].iloc[0]

        best_odds = {}
        for outcome in outcomes:
            outcome_df = event_odds[event_odds["outcome_name"] == outcome]
            best_row = outcome_df.loc[outcome_df["price"].idxmax()]
            best_odds[outcome] = {
                "bookmaker": best_row["bookmaker"],
                "price": int(best_row["price"]),
                "decimal_odds": american_to_decimal(int(best_row["price"])),
                "implied_prob": american_to_implied_prob(int(best_row["price"])),
            }

        total_implied = sum(o["implied_prob"] for o in best_odds.values())

        if total_implied < 1.0:
            arb_pct = (1.0 - total_implied) * 100.0

            if arb_pct < self.min_arb_pct or arb_pct > self.max_arb_pct:
                return None

            legs = []
            for outcome, odds_info in best_odds.items():
                stake_pct = (1.0 / odds_info["decimal_odds"]) / total_implied
                legs.append({
                    "outcome": outcome,
                    "bookmaker": odds_info["bookmaker"],
                    "american_odds": odds_info["price"],
                    "decimal_odds": odds_info["decimal_odds"],
                    "implied_probability": odds_info["implied_prob"],
                    "stake_percentage": stake_pct,
                })

            profit_per_100 = 100.0 * (1.0 / total_implied - 1.0)

            return ArbitrageOpportunity(
                event_id=event_id,
                sport=sport,
                home_team=home,
                away_team=away,
                market=market,
                arb_percentage=arb_pct,
                total_implied_probability=total_implied,
                legs=legs,
                detected_at=datetime.now(timezone.utc).isoformat(),
                estimated_profit_per_100=profit_per_100,
            )

        return None

    def scan_all_events(self, current_odds: pd.DataFrame) -> List[ArbitrageOpportunity]:
        """
        Scan all current events and markets for arbitrage opportunities.

        Iterates over every unique (event_id, market) combination and
        checks for arbitrage using the appropriate method (2-way or 3-way).
        """
        opportunities = []

        groups = current_odds.groupby(["event_id", "market"])
        for (event_id, market), group in groups:
            n_outcomes = group["outcome_name"].nunique()

            if n_outcomes == 2:
                arb = self.detect_two_way_arbitrage(group)
            elif n_outcomes == 3:
                arb = self.detect_three_way_arbitrage(group)
            else:
                continue

            if arb is not None:
                opportunities.append(arb)

        # Sort by arb percentage (most profitable first)
        opportunities.sort(key=lambda x: x.arb_percentage, reverse=True)

        logger.info(f"Scanned {len(groups)} event-markets, "
                     f"found {len(opportunities)} arbitrage opportunities")

        return opportunities


def calculate_arb_stakes(total_stake: float,
                         arb: ArbitrageOpportunity) -> Dict[str, float]:
    """
    Calculate exact dollar amounts to stake on each leg of an arbitrage.

    Returns a dictionary with per-outcome stakes, the total staked, and
    the guaranteed profit. With stakes proportional to implied probability,
    every outcome pays total_stake / total_implied, so the guaranteed
    profit is total_stake * (1/total_implied - 1) -- consistent with
    estimated_profit_per_100 in the detector.
    """
    stakes = {}
    for leg in arb.legs:
        stakes[leg["outcome"]] = round(total_stake * leg["stake_percentage"], 2)

    guaranteed_profit = round(
        total_stake * (1.0 / arb.total_implied_probability - 1.0), 2
    )

    return {
        "stakes": stakes,
        "total_staked": sum(stakes.values()),
        "guaranteed_profit": guaranteed_profit,
        "profit_percentage": arb.estimated_profit_per_100,
    }

Phase 2 Checklist

  • [ ] Two-way arbitrage detection implemented and tested against known examples
  • [ ] Three-way arbitrage detection implemented for soccer markets
  • [ ] Arbitrage scanning runs across all events in < 5 seconds
  • [ ] Data error filtering in place (arbs > 20% flagged for review)
  • [ ] Stake calculation verified: guaranteed profit math checked by hand
  • [ ] Arbitrage opportunities stored to database with timestamps

Phase 3: Value Detection

Duration: Weeks 3--4

Relevant chapters: Chapter 2 (Implied Probability), Chapter 3 (Expected Value), Chapter 13 (Value Betting), Chapter 27 (Classification)

3.1 Value Betting Framework

Value detection requires comparing model-derived "true" probabilities to market-implied probabilities. When your model's probability significantly exceeds the market's implied probability (after removing the vig), you have a value bet (Chapter 13, Section 13.1).
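A worked sketch of that comparison, using hypothetical numbers: a two-way market quoted at the standard -110 on both sides, and a model that puts the home side at 55%:

```python
def american_to_implied_prob(american: int) -> float:
    # Chapter 2 conversion; includes the vig
    return 100 / (american + 100) if american > 0 else abs(american) / (abs(american) + 100)

# Hypothetical market: -110 both sides (standard juice)
p_home = american_to_implied_prob(-110)     # 0.5238, vig-inclusive
p_away = american_to_implied_prob(-110)     # 0.5238
overround = p_home + p_away                 # 1.0476 > 1.0 (the vig)

# Multiplicative vig removal: scale so probabilities sum to 1
fair_home = p_home / overround              # 0.50

model_home = 0.55                           # hypothetical model estimate
edge = model_home - fair_home               # 0.05 -> clears a 3% min_edge

decimal_odds = 1 + 100 / 110                # 1.9091
ev = model_home * decimal_odds - 1          # +0.05 -> +5% of stake
```

The edge (model vs. vig-free probability) and the EV (expected return per unit stake at the actual quoted price) are distinct numbers that happen to coincide here; the detector thresholds both separately.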

"""
phase3_value_detection.py
Identify value betting opportunities by comparing model probabilities
to market-implied probabilities.
"""

import pandas as pd
from typing import Dict, List, Tuple
from dataclasses import dataclass
from datetime import datetime, timezone
import logging

# Odds-conversion helpers (american_to_implied_prob, american_to_decimal)
# are assumed to be importable from the Phase 1 module.

logger = logging.getLogger(__name__)


@dataclass
class ValueOpportunity:
    """Represents a detected value betting opportunity."""
    event_id: str
    sport: str
    home_team: str
    away_team: str
    market: str
    outcome: str
    model_probability: float
    market_implied_probability: float
    fair_implied_probability: float  # After vig removal
    edge_vs_fair: float              # model_prob - fair_implied_prob
    expected_value_pct: float        # EV as percentage of stake
    best_odds: int                   # American odds at best book
    best_book: str
    confidence: str                  # "low", "medium", "high"
    detected_at: str


class ValueDetector:
    """
    Detects value betting opportunities across sports.

    The value detection process (Chapter 13):
    1. Obtain model probability estimates for each outcome
    2. Extract market-implied probabilities from best available odds
    3. Remove the vig using the multiplicative method (Chapter 2)
    4. Compare model probability to vig-free implied probability
    5. If model_prob > fair_implied_prob by a significant margin,
       flag as a value bet
    6. Calculate expected value: EV = model_prob * (decimal_odds) - 1
    """

    def __init__(self, min_edge: float = 0.03, min_ev_pct: float = 0.02,
                 confidence_thresholds: Tuple[float, float] = (0.03, 0.06)):
        """
        Parameters:
        - min_edge: minimum probability edge to consider (3% default)
        - min_ev_pct: minimum EV as fraction of stake (2% default)
        - confidence_thresholds: edge cutoffs for "medium" and "high"
          confidence; edges between min_edge and the first cutoff are
          "low". With the defaults, min_edge equals the medium cutoff,
          so lower min_edge if you want "low"-confidence bets surfaced.
        """
        self.min_edge = min_edge
        self.min_ev_pct = min_ev_pct
        self.low_threshold = confidence_thresholds[0]
        self.high_threshold = confidence_thresholds[1]

    def remove_vig_multiplicative(self, implied_probs: List[float]) -> List[float]:
        """
        Remove the vig using the multiplicative method (Chapter 2, Section 2.3).

        For a market with outcomes having implied probabilities p1, p2, ..., pn,
        the vig-free probabilities are:
            fair_pi = pi / sum(all pi)

        This is the simplest vig removal method and assumes the vig is
        distributed proportionally across all outcomes.
        """
        total = sum(implied_probs)
        if total == 0:
            return implied_probs
        return [p / total for p in implied_probs]

    def remove_vig_power(self, implied_probs: List[float],
                          tol: float = 1e-8) -> List[float]:
        """
        Remove the vig using the power method (Chapter 2, Section 2.3).
        Unlike the proportional method, the power method removes more
        margin from longshots, which better matches markets where the
        vig is asymmetrically distributed (favorite-longshot bias).

        Solves for k such that: sum(pi^k) = 1
        where pi are the raw implied probabilities.
        """
        from scipy.optimize import brentq

        def objective(k):
            return sum(p ** k for p in implied_probs) - 1.0

        try:
            k = brentq(objective, 0.5, 2.0, xtol=tol)
            return [p ** k for p in implied_probs]
        except ValueError:
            # Fallback to multiplicative if power method fails
            return self.remove_vig_multiplicative(implied_probs)

    def detect_value(self, event_odds: pd.DataFrame,
                      model_probs: Dict[str, float],
                      vig_method: str = "multiplicative") -> List[ValueOpportunity]:
        """
        Detect value opportunities for a single event.

        Parameters:
        - event_odds: DataFrame with current odds from all sportsbooks
        - model_probs: dict mapping outcome_name to model probability
          e.g., {"Kansas City Chiefs": 0.62, "Buffalo Bills": 0.38}
        - vig_method: "multiplicative" or "power" (Chapter 2)
        """
        if event_odds.empty:
            return []

        event_id = event_odds["event_id"].iloc[0]
        sport = event_odds["sport"].iloc[0]
        home = event_odds["home_team"].iloc[0]
        away = event_odds["away_team"].iloc[0]
        market = event_odds["market"].iloc[0]

        outcomes = event_odds["outcome_name"].unique()
        opportunities = []

        # Find best odds for each outcome
        best_odds = {}
        for outcome in outcomes:
            outcome_df = event_odds[event_odds["outcome_name"] == outcome]
            best_row = outcome_df.loc[outcome_df["price"].idxmax()]
            best_odds[outcome] = {
                "bookmaker": best_row["bookmaker"],
                "price": int(best_row["price"]),
                "implied_prob": american_to_implied_prob(int(best_row["price"])),
                "decimal_odds": american_to_decimal(int(best_row["price"])),
            }

        # Remove vig to get fair probabilities
        raw_probs = [best_odds[o]["implied_prob"] for o in outcomes]
        if vig_method == "power":
            fair_probs = self.remove_vig_power(raw_probs)
        else:
            fair_probs = self.remove_vig_multiplicative(raw_probs)

        fair_prob_map = dict(zip(outcomes, fair_probs))

        # Check each outcome for value
        for outcome in outcomes:
            if outcome not in model_probs:
                continue

            model_prob = model_probs[outcome]
            fair_prob = fair_prob_map[outcome]
            market_prob = best_odds[outcome]["implied_prob"]
            decimal_odds = best_odds[outcome]["decimal_odds"]

            # Edge = model probability minus vig-free market probability
            edge = model_prob - fair_prob

            # Expected value: EV = p * odds - 1 (Chapter 3)
            ev_pct = model_prob * decimal_odds - 1.0

            if edge >= self.min_edge and ev_pct >= self.min_ev_pct:
                # Assign confidence level
                if edge >= self.high_threshold:
                    confidence = "high"
                elif edge >= self.low_threshold:
                    confidence = "medium"
                else:
                    confidence = "low"

                opportunities.append(ValueOpportunity(
                    event_id=event_id,
                    sport=sport,
                    home_team=home,
                    away_team=away,
                    market=market,
                    outcome=outcome,
                    model_probability=round(model_prob, 4),
                    market_implied_probability=round(market_prob, 4),
                    fair_implied_probability=round(fair_prob, 4),
                    edge_vs_fair=round(edge, 4),
                    expected_value_pct=round(ev_pct, 4),
                    best_odds=best_odds[outcome]["price"],
                    best_book=best_odds[outcome]["bookmaker"],
                    confidence=confidence,
                    detected_at=datetime.now(timezone.utc).isoformat(),
                ))

        return opportunities


# ---------------------------------------------------------------------------
# Sport-Specific Probability Models
# ---------------------------------------------------------------------------

class SimpleEloModel:
    """
    A simple Elo-based probability model for value detection.

    This serves as the baseline probability engine. For production use,
    replace with the sport-specific models from Chapters 15--22.

    The Elo model converts rating differences to win probabilities
    using the logistic function (Chapter 26):
        P(A wins) = 1 / (1 + 10^((Rb - Ra) / 400))
    """

    def __init__(self, k_factor: float = 20.0, home_adv: float = 100.0,
                 initial_rating: float = 1500.0):
        self.k = k_factor
        self.home_adv = home_adv
        self.ratings = {}
        self.initial_rating = initial_rating

    def get_rating(self, team: str) -> float:
        return self.ratings.get(team, self.initial_rating)

    def predict_win_probability(self, home_team: str,
                                  away_team: str) -> Dict[str, float]:
        """
        Predict win probability for a matchup.
        Returns {home_team: prob, away_team: prob}.
        """
        home_r = self.get_rating(home_team) + self.home_adv
        away_r = self.get_rating(away_team)

        expected_home = 1.0 / (1.0 + 10.0 ** ((away_r - home_r) / 400.0))
        expected_away = 1.0 - expected_home

        return {
            home_team: round(expected_home, 4),
            away_team: round(expected_away, 4),
        }

    def update(self, home_team: str, away_team: str, home_won: bool):
        """Update ratings after a game result."""
        probs = self.predict_win_probability(home_team, away_team)
        actual_home = 1.0 if home_won else 0.0

        home_r = self.get_rating(home_team)
        away_r = self.get_rating(away_team)

        update = self.k * (actual_home - probs[home_team])
        self.ratings[home_team] = home_r + update
        self.ratings[away_team] = away_r - update

    def train_on_historical_results(self, results: pd.DataFrame):
        """
        Train the Elo model on historical game results.
        results should have: home_team, away_team, home_won (bool)
        """
        for _, game in results.iterrows():
            self.update(game["home_team"], game["away_team"], game["home_won"])
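The vig-removal arithmetic above can be sanity-checked by hand on a standard -110/-110 market, where symmetry means any correct method must recover a fair coin (a minimal sketch, independent of the classes above):

```python
# A standard -110 / -110 spread market: both sides carry the same vig.
p = 110.0 / 210.0          # implied probability of -110 (risk 110 to win 100)
raw = [p, p]
overround = sum(raw)       # ~1.0476; the excess over 1.0 is the bookmaker's margin
fair = [x / overround for x in raw]

# Multiplicative vig removal should recover a fair coin.
assert abs(fair[0] - 0.5) < 1e-12 and abs(fair[1] - 0.5) < 1e-12
# The power method agrees on symmetric markets: p^k + p^k = 1 forces p^k = 0.5.
```

On asymmetric markets the two methods diverge, which is exactly when the choice of vig_method matters.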

3.2 Multi-Sport Probability Models

You must implement probability models for at least one sport (three for exceeding expectations). Each model should be calibrated (Chapter 27, Section 27.3) and backtested (Chapter 30).

  Sport          Recommended Model Approach                 Chapter Reference
  -----          --------------------------                 -----------------
  NFL            EPA-based regression + Elo ensemble        Ch 15, 26, 27
  NBA            Adjusted net rating + Four Factors model   Ch 16, 26
  MLB            Starting pitcher FIP + bullpen ERA         Ch 17, 9
  NHL            Corsi/Fenwick possession model             Ch 18, 26
  Soccer (EPL)   Expected goals (xG) + Elo                  Ch 19, 26
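Whichever sport you choose, the calibration requirement can be checked with a simple reliability table: bucket the model's predicted probabilities and compare each bucket's average prediction to the observed win rate. A minimal sketch (function and parameter names are illustrative, not from the textbook):

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins: int = 10):
    """Return (avg predicted prob, observed win rate, count) per probability bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Map each prediction to a bin: [0, 0.1) -> 0, ..., [0.9, 1.0] -> 9
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((float(probs[mask].mean()),
                         float(outcomes[mask].mean()),
                         int(mask.sum())))
    return rows

# A well-calibrated model has avg prediction ~= observed rate in every bin.
rows = calibration_table([0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0])
```

Large gaps between the first two columns in any well-populated bin indicate miscalibration that will distort every edge estimate downstream.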

Phase 3 Checklist

  • [ ] At least one sport has a calibrated probability model
  • [ ] Vig removal implemented using both multiplicative and power methods
  • [ ] Value detection tested against known value situations
  • [ ] Edge thresholds calibrated against historical backtest
  • [ ] Confidence levels assigned to each value opportunity
  • [ ] Expected value calculated for every flagged opportunity

Phase 4: Alert System

Duration: Weeks 4--5

Relevant chapters: Chapter 31 (ML Pipeline, monitoring and alerting), Chapter 37 (Discipline Systems)

4.1 Real-Time Notification Engine

When the platform detects an arbitrage or value opportunity, it must notify you immediately. Time sensitivity is critical for arbitrage (opportunities may last seconds) and important for value (lines move as information is priced in).

"""
phase4_alert_system.py
Real-time alert system for arbitrage and value opportunities.
"""

import smtplib
import requests
import logging
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from dataclasses import dataclass
from datetime import datetime

logger = logging.getLogger(__name__)


@dataclass
class AlertConfig:
    """Configuration for alert channels."""
    # Email
    smtp_server: str = ""
    smtp_port: int = 587
    email_from: str = ""
    email_to: str = ""
    email_password: str = ""

    # Slack
    slack_webhook_url: str = ""

    # Telegram
    telegram_bot_token: str = ""
    telegram_chat_id: str = ""

    # Console (always enabled)
    console_enabled: bool = True

    # Alert thresholds
    arb_alert_min_pct: float = 0.5    # Minimum arb percentage (0.5 = 0.5%) to trigger alert
    value_alert_min_ev: float = 0.03  # Minimum EV as a fraction of stake (0.03 = 3%)


class AlertManager:
    """
    Manages sending alerts through multiple channels.

    Design principle (Chapter 37): Alerts should be actionable, timely,
    and not overwhelming. Too many low-quality alerts lead to alert
    fatigue and missed opportunities.
    """

    def __init__(self, config: AlertConfig):
        self.config = config
        self.alert_history = []

    def send_arbitrage_alert(self, arb: 'ArbitrageOpportunity',
                              stake_info: dict):
        """Send an alert for a detected arbitrage opportunity."""
        if arb.arb_percentage < self.config.arb_alert_min_pct:
            return

        subject = (f"ARB ALERT: {arb.sport} | "
                    f"{arb.home_team} vs {arb.away_team} | "
                    f"{arb.arb_percentage:.2f}%")

        body_lines = [
            f"ARBITRAGE OPPORTUNITY DETECTED",
            f"{'=' * 50}",
            f"Sport: {arb.sport}",
            f"Event: {arb.home_team} vs {arb.away_team}",
            f"Market: {arb.market}",
            f"Arb Percentage: {arb.arb_percentage:.2f}%",
            f"Profit per $100: ${arb.estimated_profit_per_100:.2f}",
            f"",
            f"LEGS:",
        ]

        for leg in arb.legs:
            body_lines.append(
                f"  {leg['outcome']}: {leg['bookmaker']} @ "
                f"{leg['american_odds']:+d} | "
                f"Stake: {leg['stake_percentage']:.1%}"
            )

        if stake_info:
            body_lines.extend([
                f"",
                f"STAKE ALLOCATION (for ${stake_info['total_staked']:.2f}):",
            ])
            for outcome, amount in stake_info["stakes"].items():
                body_lines.append(f"  {outcome}: ${amount:.2f}")
            body_lines.append(
                f"Guaranteed Profit: ${stake_info['guaranteed_profit']:.2f}"
            )

        body_lines.append(f"\nDetected at: {arb.detected_at}")
        body = "\n".join(body_lines)

        self._dispatch_alert(subject, body, priority="high")

    def send_value_alert(self, value: 'ValueOpportunity'):
        """Send an alert for a detected value betting opportunity."""
        if value.expected_value_pct < self.config.value_alert_min_ev:
            return

        subject = (f"VALUE: {value.sport} | {value.outcome} | "
                    f"Edge: {value.edge_vs_fair:.1%} | "
                    f"EV: {value.expected_value_pct:.1%}")

        body_lines = [
            f"VALUE BET DETECTED",
            f"{'=' * 50}",
            f"Sport: {value.sport}",
            f"Event: {value.home_team} vs {value.away_team}",
            f"Market: {value.market}",
            f"Bet On: {value.outcome}",
            f"",
            f"ANALYSIS:",
            f"  Model Probability: {value.model_probability:.1%}",
            f"  Market Implied (raw): {value.market_implied_probability:.1%}",
            f"  Market Implied (fair): {value.fair_implied_probability:.1%}",
            f"  Edge vs Fair: {value.edge_vs_fair:.1%}",
            f"  Expected Value: {value.expected_value_pct:.1%}",
            f"  Confidence: {value.confidence.upper()}",
            f"",
            f"EXECUTION:",
            f"  Best Odds: {value.best_odds:+d} @ {value.best_book}",
            f"\nDetected at: {value.detected_at}",
        ]

        body = "\n".join(body_lines)
        priority = "high" if value.confidence == "high" else "medium"
        self._dispatch_alert(subject, body, priority=priority)

    def _dispatch_alert(self, subject: str, body: str,
                        priority: str = "medium"):
        """Send alert through all configured channels."""
        self.alert_history.append({
            "timestamp": datetime.utcnow().isoformat(),
            "subject": subject,
            "priority": priority,
        })

        if self.config.console_enabled:
            self._send_console(subject, body)

        if self.config.email_from:
            self._send_email(subject, body)

        if self.config.slack_webhook_url:
            self._send_slack(subject, body, priority)

        if self.config.telegram_bot_token:
            self._send_telegram(subject, body)

    def _send_console(self, subject: str, body: str):
        border = "=" * 60
        print(f"\n{border}")
        print(f"  ALERT: {subject}")
        print(f"{border}")
        print(body)
        print(f"{border}\n")

    def _send_email(self, subject: str, body: str):
        try:
            msg = MIMEMultipart()
            msg["From"] = self.config.email_from
            msg["To"] = self.config.email_to
            msg["Subject"] = subject
            msg.attach(MIMEText(body, "plain"))

            with smtplib.SMTP(self.config.smtp_server,
                               self.config.smtp_port) as server:
                server.starttls()
                server.login(self.config.email_from,
                              self.config.email_password)
                server.send_message(msg)

            logger.info(f"Email alert sent: {subject}")
        except Exception as e:
            logger.error(f"Email alert failed: {e}")

    def _send_slack(self, subject: str, body: str, priority: str):
        try:
            emoji = ":rotating_light:" if priority == "high" else ":chart_with_upwards_trend:"
            payload = {
                "text": f"{emoji} *{subject}*\n```{body}```"
            }
            response = requests.post(
                self.config.slack_webhook_url,
                json=payload, timeout=10
            )
            response.raise_for_status()
            logger.info(f"Slack alert sent: {subject}")
        except Exception as e:
            logger.error(f"Slack alert failed: {e}")

    def _send_telegram(self, subject: str, body: str):
        try:
            url = (f"https://api.telegram.org/bot"
                   f"{self.config.telegram_bot_token}/sendMessage")
            payload = {
                "chat_id": self.config.telegram_chat_id,
                "text": f"*{subject}*\n\n{body}",
                "parse_mode": "Markdown"
            }
            response = requests.post(url, json=payload, timeout=10)
            response.raise_for_status()
            logger.info(f"Telegram alert sent: {subject}")
        except Exception as e:
            logger.error(f"Telegram alert failed: {e}")
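One common guard against the alert fatigue that AlertManager's docstring warns about is a per-opportunity cooldown, so the same arb or value bet does not re-alert on every scan cycle. A minimal sketch — the class and parameter names here are illustrative, not part of the platform spec:

```python
import time
from typing import Dict, Optional

class AlertThrottle:
    """Suppress repeat alerts for the same opportunity key within a cooldown window."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # alerted on this opportunity recently -- suppress
        self._last_sent[key] = now
        return True

throttle = AlertThrottle(cooldown_seconds=300.0)
first = throttle.should_send("evt123:h2h:arb", now=0.0)     # first sighting: send
repeat = throttle.should_send("evt123:h2h:arb", now=60.0)   # within cooldown: suppress
later = throttle.should_send("evt123:h2h:arb", now=400.0)   # cooldown expired: send
```

A natural key is event_id plus market plus strategy, so distinct opportunities on the same game still alert independently.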

Phase 4 Checklist

  • [ ] At least one alert channel functional (email, Slack, or Telegram)
  • [ ] Arbitrage alerts include all legs, odds, stake allocation, and profit
  • [ ] Value alerts include model probability, market probability, edge, and EV
  • [ ] Alert thresholds configurable to prevent alert fatigue
  • [ ] Alert history logged for later review
  • [ ] Alert delivery latency under 10 seconds from detection

Phase 5: Execution and Tracking

Duration: Weeks 5--6

Relevant chapters: Chapter 4 (Bankroll Management), Chapter 14 (Advanced Bankroll), Chapter 37 (Discipline and Record-Keeping)

5.1 Bet Logging and Portfolio Management

Every bet must be logged with full detail per Chapter 37's discipline framework. The bet log is the foundation of all performance analysis.

"""
phase5_execution_tracking.py
Bet logging, portfolio management, and P&L tracking.
"""

import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timezone
from typing import Optional, List, Dict
from dataclasses import dataclass
import uuid
import logging

# american_to_decimal (Phase 1 odds-conversion helper) is assumed to be
# importable here; settle_bet uses it for the CLV calculation.

logger = logging.getLogger(__name__)


@dataclass
class BetRecord:
    """Complete record of a single bet (Chapter 37, Section 37.1)."""
    bet_id: str
    timestamp: str
    event_id: str
    sport: str
    home_team: str
    away_team: str
    market: str           # h2h, spreads, totals
    outcome: str          # team name, Over, Under
    bookmaker: str
    odds_placed: int      # American odds at time of placement
    line_placed: Optional[float]  # spread/total line at placement (None for h2h)
    stake: float          # Dollar amount risked
    strategy: str         # "arbitrage", "value", "model"
    model_probability: float   # Model's estimated probability
    market_probability: float  # Market implied probability at placement
    edge_at_placement: float   # model_prob - fair_market_prob
    confidence: str       # "low", "medium", "high"
    notes: str            # Free text notes
    # Fields below are unknown at placement time and filled at settlement
    result: str = "pending"               # "win", "loss", "push", "pending"
    profit: float = 0.0                   # Dollar profit/loss
    closing_odds: Optional[int] = None    # Closing line odds
    closing_line: Optional[float] = None  # Closing spread/total
    clv: Optional[float] = None           # Closing Line Value


class BetTracker:
    """
    Comprehensive bet tracking and portfolio management system.

    Records every bet with the full set of fields recommended in
    Chapter 37 (Section 37.1), including reasoning, model output,
    and market context.
    """

    def __init__(self, db_path: str = "data/odds_platform.db",
                 initial_bankroll: float = 10000.0):
        self.db_path = db_path
        self.initial_bankroll = initial_bankroll
        self._init_tables()

    def _init_tables(self):
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS bets (
                bet_id TEXT PRIMARY KEY,
                timestamp TEXT NOT NULL,
                event_id TEXT NOT NULL,
                sport TEXT NOT NULL,
                home_team TEXT NOT NULL,
                away_team TEXT NOT NULL,
                market TEXT NOT NULL,
                outcome TEXT NOT NULL,
                bookmaker TEXT NOT NULL,
                odds_placed INTEGER NOT NULL,
                line_placed REAL,
                closing_odds INTEGER,
                closing_line REAL,
                stake REAL NOT NULL,
                strategy TEXT NOT NULL,
                model_probability REAL,
                market_probability REAL,
                edge_at_placement REAL,
                confidence TEXT,
                notes TEXT,
                result TEXT DEFAULT 'pending',
                profit REAL DEFAULT 0.0,
                clv REAL
            )
        """)

        cursor.execute("""
            CREATE TABLE IF NOT EXISTS bankroll_log (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                balance REAL NOT NULL,
                change REAL NOT NULL,
                reason TEXT NOT NULL,
                bet_id TEXT
            )
        """)

        # Initialize bankroll if empty
        cursor.execute("SELECT COUNT(*) FROM bankroll_log")
        if cursor.fetchone()[0] == 0:
            cursor.execute("""
                INSERT INTO bankroll_log (timestamp, balance, change, reason)
                VALUES (?, ?, ?, ?)
            """, (datetime.now(timezone.utc).isoformat(),
                  self.initial_bankroll, self.initial_bankroll,
                  "Initial deposit"))

        conn.commit()
        conn.close()

    def place_bet(self, bet: BetRecord) -> str:
        """Record a new bet and update bankroll."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            INSERT INTO bets
            (bet_id, timestamp, event_id, sport, home_team, away_team,
             market, outcome, bookmaker, odds_placed, line_placed,
             stake, strategy, model_probability, market_probability,
             edge_at_placement, confidence, notes)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            bet.bet_id, bet.timestamp, bet.event_id, bet.sport,
            bet.home_team, bet.away_team, bet.market, bet.outcome,
            bet.bookmaker, bet.odds_placed, bet.line_placed,
            bet.stake, bet.strategy, bet.model_probability,
            bet.market_probability, bet.edge_at_placement,
            bet.confidence, bet.notes
        ))

        # Update bankroll (deduct stake)
        current_balance = self.get_current_bankroll()
        new_balance = current_balance - bet.stake

        cursor.execute("""
            INSERT INTO bankroll_log (timestamp, balance, change, reason, bet_id)
            VALUES (?, ?, ?, ?, ?)
        """, (bet.timestamp, new_balance, -bet.stake,
              f"Bet placed: {bet.outcome} @ {bet.bookmaker}", bet.bet_id))

        conn.commit()
        conn.close()

        logger.info(f"Bet placed: {bet.bet_id} | {bet.outcome} "
                     f"@ {bet.odds_placed} | ${bet.stake:.2f}")
        return bet.bet_id

    def settle_bet(self, bet_id: str, result: str,
                    closing_odds: Optional[int] = None,
                    closing_line: Optional[float] = None):
        """
        Settle a bet and update bankroll.

        Also calculates Closing Line Value (CLV) -- the difference
        between the odds you received and the closing odds.
        CLV is the most reliable predictor of long-term profitability
        (Chapter 12, Section 12.2).
        """
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Get bet details
        cursor.execute("SELECT * FROM bets WHERE bet_id = ?", (bet_id,))
        row = cursor.fetchone()
        if not row:
            conn.close()
            raise ValueError(f"Bet not found: {bet_id}")

        columns = [d[0] for d in cursor.description]
        bet_dict = dict(zip(columns, row))

        odds_placed = bet_dict["odds_placed"]
        stake = bet_dict["stake"]

        # Calculate profit
        if result == "win":
            if odds_placed > 0:
                profit = stake * (odds_placed / 100.0)
            else:
                profit = stake * (100.0 / abs(odds_placed))
        elif result == "push":
            profit = 0.0
        else:  # loss
            profit = -stake

        # Calculate CLV
        clv = None
        if closing_odds is not None:
            placed_decimal = american_to_decimal(odds_placed)
            closing_decimal = american_to_decimal(closing_odds)
            # CLV as percentage: (placed_decimal / closing_decimal - 1) * 100
            clv = (placed_decimal / closing_decimal - 1.0) * 100.0

        # Update bet record
        cursor.execute("""
            UPDATE bets
            SET result = ?, profit = ?, closing_odds = ?,
                closing_line = ?, clv = ?
            WHERE bet_id = ?
        """, (result, profit, closing_odds, closing_line, clv, bet_id))

        # Update bankroll (return stake + profit for wins, return stake for pushes)
        current_balance = self.get_current_bankroll()
        if result == "win":
            change = stake + profit  # Get stake back plus winnings
        elif result == "push":
            change = stake  # Get stake back
        else:
            change = 0.0  # Stake already deducted at placement

        new_balance = current_balance + change

        cursor.execute("""
            INSERT INTO bankroll_log (timestamp, balance, change, reason, bet_id)
            VALUES (?, ?, ?, ?, ?)
        """, (datetime.now(timezone.utc).isoformat(), new_balance, change,
              f"Settled: {result} (profit: ${profit:.2f})", bet_id))

        conn.commit()
        conn.close()

        logger.info(f"Bet settled: {bet_id} | {result} | "
                     f"Profit: ${profit:.2f} | CLV: {clv}")

    def get_current_bankroll(self) -> float:
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            SELECT balance FROM bankroll_log
            ORDER BY id DESC LIMIT 1
        """)
        result = cursor.fetchone()
        conn.close()
        return result[0] if result else self.initial_bankroll

    def get_performance_summary(self, sport: Optional[str] = None,
                                 strategy: Optional[str] = None,
                                 start_date: Optional[str] = None) -> Dict:
        """
        Calculate comprehensive performance metrics (Chapter 37, Section 37.2).
        """
        conn = sqlite3.connect(self.db_path)

        query = "SELECT * FROM bets WHERE result != 'pending'"
        params = []

        if sport:
            query += " AND sport = ?"
            params.append(sport)
        if strategy:
            query += " AND strategy = ?"
            params.append(strategy)
        if start_date:
            query += " AND timestamp >= ?"
            params.append(start_date)

        bets = pd.read_sql(query, conn, params=params)
        conn.close()

        if bets.empty:
            return {"total_bets": 0, "message": "No settled bets found"}

        total_staked = bets["stake"].sum()
        total_profit = bets["profit"].sum()
        wins = (bets["result"] == "win").sum()
        losses = (bets["result"] == "loss").sum()
        pushes = (bets["result"] == "push").sum()

        # ROI (Chapter 3, Section 3.4)
        roi = (total_profit / total_staked * 100) if total_staked > 0 else 0

        # CLV analysis (Chapter 12)
        clv_data = bets["clv"].dropna()
        avg_clv = clv_data.mean() if len(clv_data) > 0 else None

        # Drawdown analysis (Chapter 14, Section 14.4)
        # Sort chronologically so the cumulative P&L curve is meaningful
        bets = bets.sort_values("timestamp")
        cumulative_profit = bets["profit"].cumsum()
        running_max = cumulative_profit.cummax()
        drawdown = cumulative_profit - running_max
        max_drawdown = drawdown.min()

        # Sharpe ratio proxy (Chapter 14, Section 14.2). Returns here are
        # per-bet, so the sqrt(252) annualization assumes roughly one bet
        # per day; treat this as a rough comparative metric only.
        if len(bets) > 1:
            per_bet_returns = bets["profit"] / bets["stake"]
            ret_std = per_bet_returns.std()
            sharpe = ((per_bet_returns.mean() / ret_std) * np.sqrt(252)
                      if ret_std > 0 else None)
        else:
            sharpe = None

        return {
            "total_bets": len(bets),
            "wins": int(wins),
            "losses": int(losses),
            "pushes": int(pushes),
            "win_rate": wins / (wins + losses) if (wins + losses) > 0 else 0,
            "total_staked": round(total_staked, 2),
            "total_profit": round(total_profit, 2),
            "roi_pct": round(roi, 2),
            "avg_stake": round(bets["stake"].mean(), 2),
            "avg_clv": round(avg_clv, 3) if avg_clv is not None else None,
            "max_drawdown": round(max_drawdown, 2),
            "sharpe_ratio": round(sharpe, 2) if sharpe is not None else None,
            "current_bankroll": round(self.get_current_bankroll(), 2),
        }
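The CLV formula used in settle_bet is worth working through once by hand. The sketch below is self-contained, restating the standard American-to-decimal conversion rather than importing it:

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds to decimal odds."""
    return 1.0 + (odds / 100.0 if odds > 0 else 100.0 / abs(odds))

# You bet at -105; the market closed at -120, i.e., the line moved toward your side.
placed = american_to_decimal(-105)    # ~1.952
closing = american_to_decimal(-120)   # ~1.833
clv_pct = (placed / closing - 1.0) * 100.0  # positive: you beat the close by ~6.5%
```

A consistently positive CLV across many bets means you are systematically getting better prices than the closing market, which is why Chapter 12 treats it as the most reliable leading indicator of long-term profitability.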

Phase 5 Checklist

  • [ ] Bet logging captures all 20+ fields from Chapter 37
  • [ ] Bankroll balance updated in real time after every bet and settlement
  • [ ] CLV calculated for every settled bet with available closing lines
  • [ ] Performance summary includes ROI, win rate, CLV, max drawdown, Sharpe ratio
  • [ ] Filtering by sport, strategy, and date range works correctly
  • [ ] Bet settlement handles win, loss, and push cases correctly

Phase 6: Analysis and Reporting

Duration: Weeks 6--7

Relevant chapters: Chapter 30 (Model Evaluation), Chapter 37 (Performance Analysis), Chapter 41 (Performance Attribution)

6.1 Web Dashboard

Build an interactive dashboard using Streamlit, Dash, or a similar framework. The dashboard should provide operational monitoring and performance analysis.

Required dashboard pages:

  1. Live Monitor -- Current odds, active arbitrage opportunities, active value opportunities, pending bets.
  2. Arbitrage History -- All detected arbitrage opportunities with timestamps, sizes, and execution status.
  3. Value History -- All detected value opportunities with edge, EV, and outcome.
  4. Portfolio Performance -- Cumulative P&L chart, bankroll curve, drawdown chart, ROI by sport, ROI by strategy.
  5. CLV Analysis -- CLV distribution, average CLV by sport, CLV vs. actual profit scatter plot.
  6. Performance Attribution -- Decompose returns by sport, by strategy (arbitrage vs. value), by sportsbook, by market type, and by time period.
"""
phase6_dashboard.py
Streamlit-based web dashboard for the arbitrage and value platform.
"""

import streamlit as st
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from datetime import datetime, timedelta
import sqlite3


def load_data(db_path: str = "data/odds_platform.db"):
    """Load all data from the platform database."""
    conn = sqlite3.connect(db_path)
    try:
        bets = pd.read_sql("SELECT * FROM bets", conn)
        bankroll = pd.read_sql("SELECT * FROM bankroll_log", conn)
    finally:
        # Close the connection even if a query fails.
        conn.close()
    return bets, bankroll


def main():
    st.set_page_config(page_title="Betting Platform", layout="wide")
    st.title("Multi-Sport Arbitrage & Value Detection Platform")

    bets, bankroll = load_data()

    # Sidebar navigation
    page = st.sidebar.selectbox("Page", [
        "Live Monitor",
        "Portfolio Performance",
        "Arbitrage History",
        "Value History",
        "CLV Analysis",
        "Performance Attribution"
    ])

    if page == "Portfolio Performance":
        render_portfolio_page(bets, bankroll)
    elif page == "CLV Analysis":
        render_clv_page(bets)
    elif page == "Performance Attribution":
        render_attribution_page(bets)
    # ... additional pages


def render_portfolio_page(bets: pd.DataFrame, bankroll: pd.DataFrame):
    """Render the portfolio performance page."""
    st.header("Portfolio Performance")

    settled = bets[bets["result"] != "pending"].copy()

    if settled.empty:
        st.warning("No settled bets yet.")
        return

    # Key metrics row
    col1, col2, col3, col4, col5 = st.columns(5)

    total_profit = settled["profit"].sum()
    total_staked = settled["stake"].sum()
    roi = (total_profit / total_staked * 100) if total_staked > 0 else 0
    win_rate = (settled["result"] == "win").mean() * 100

    col1.metric("Total P&L", f"${total_profit:,.2f}")
    col2.metric("ROI", f"{roi:.1f}%")
    col3.metric("Win Rate", f"{win_rate:.1f}%")
    col4.metric("Total Bets", len(settled))
    if bankroll.empty:
        col5.metric("Bankroll", "n/a")
    else:
        col5.metric("Bankroll", f"${bankroll['balance'].iloc[-1]:,.2f}")

    # Cumulative P&L chart
    settled = settled.sort_values("timestamp")
    settled["cumulative_pl"] = settled["profit"].cumsum()

    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=list(range(len(settled))),
        y=settled["cumulative_pl"],
        mode="lines",
        name="Cumulative P&L",
        line=dict(color="green" if total_profit > 0 else "red", width=2)
    ))
    fig.update_layout(
        title="Cumulative Profit & Loss",
        xaxis_title="Bet Number",
        yaxis_title="Cumulative P&L ($)",
        height=400
    )
    st.plotly_chart(fig, use_container_width=True)

    # Drawdown chart (Chapter 14, Section 14.4)
    running_max = settled["cumulative_pl"].cummax()
    drawdown = settled["cumulative_pl"] - running_max

    fig_dd = go.Figure()
    fig_dd.add_trace(go.Scatter(
        x=list(range(len(settled))),
        y=drawdown,
        fill="tozeroy",
        name="Drawdown",
        line=dict(color="red")
    ))
    fig_dd.update_layout(
        title="Drawdown",
        xaxis_title="Bet Number",
        yaxis_title="Drawdown ($)",
        height=300
    )
    st.plotly_chart(fig_dd, use_container_width=True)

    # Performance by sport
    by_sport = settled.groupby("sport").agg(
        bets=("bet_id", "count"),
        profit=("profit", "sum"),
        staked=("stake", "sum"),
        win_rate=("result", lambda x: (x == "win").mean())
    ).reset_index()
    by_sport["roi"] = by_sport["profit"] / by_sport["staked"] * 100

    st.subheader("Performance by Sport")
    st.dataframe(by_sport.style.format({
        "profit": "${:,.2f}",
        "staked": "${:,.2f}",
        "win_rate": "{:.1%}",
        "roi": "{:.1f}%"
    }))


def render_clv_page(bets: pd.DataFrame):
    """
    Render CLV analysis page.

    CLV (Closing Line Value) is the single most important metric
    for evaluating a bettor's long-term edge (Chapter 12, Section 12.2).
    Positive CLV means you consistently beat the closing line,
    which is strongly predictive of long-term profitability.
    """
    st.header("Closing Line Value Analysis")

    clv_bets = bets[bets["clv"].notna()].copy()

    if clv_bets.empty:
        st.warning("No CLV data available yet.")
        return

    avg_clv = clv_bets["clv"].mean()
    st.metric("Average CLV", f"{avg_clv:.2f}%",
              delta="Positive = beating the market" if avg_clv > 0 else "Negative = market beats you")

    # CLV distribution
    fig = px.histogram(clv_bets, x="clv", nbins=30,
                        title="CLV Distribution")
    fig.add_vline(x=0, line_dash="dash", line_color="red")
    fig.add_vline(x=avg_clv, line_dash="dash", line_color="green")
    st.plotly_chart(fig, use_container_width=True)

    # CLV vs actual profit
    fig2 = px.scatter(clv_bets, x="clv", y="profit",
                       color="sport",
                       title="CLV vs. Actual Profit per Bet")
    st.plotly_chart(fig2, use_container_width=True)


def render_attribution_page(bets: pd.DataFrame):
    """
    Performance attribution analysis (Chapter 41, Section 41.4).
    Decompose returns to understand where profit is coming from.
    """
    st.header("Performance Attribution")

    settled = bets[bets["result"] != "pending"].copy()

    if settled.empty:
        st.warning("No settled bets yet.")
        return

    # Attribution by strategy
    st.subheader("By Strategy")
    by_strat = settled.groupby("strategy").agg(
        bets=("bet_id", "count"),
        profit=("profit", "sum"),
        staked=("stake", "sum"),
    ).reset_index()
    by_strat["roi"] = by_strat["profit"] / by_strat["staked"] * 100

    fig = px.bar(by_strat, x="strategy", y="profit",
                  title="Profit by Strategy", color="strategy")
    st.plotly_chart(fig, use_container_width=True)

    # Attribution by bookmaker
    st.subheader("By Sportsbook")
    by_book = settled.groupby("bookmaker").agg(
        bets=("bet_id", "count"),
        profit=("profit", "sum"),
        avg_clv=("clv", "mean"),
    ).reset_index()

    st.dataframe(by_book.style.format({
        "profit": "${:,.2f}",
        "avg_clv": "{:.2f}%"
    }))


if __name__ == "__main__":
    main()

Phase 6 Checklist

  • [ ] Dashboard has at least 4 functional pages
  • [ ] Cumulative P&L chart displays correctly
  • [ ] Drawdown chart updates with new data
  • [ ] Performance segmented by sport, strategy, and bookmaker
  • [ ] CLV analysis page shows distribution and relationship to profit
  • [ ] Dashboard loads within 3 seconds
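The 3-second load target usually comes down to not re-querying SQLite on every widget interaction. In Streamlit this is what `st.cache_data(ttl=...)` is for; the underlying idea can be sketched framework-free as a small TTL cache (the helper name and the cache policy are illustrative):

```python
import time
from functools import wraps


def ttl_cache(seconds: float):
    """Cache the result of a no-argument loader for `seconds`.
    Illustrative only; in a Streamlit app, @st.cache_data(ttl=seconds)
    provides the same behavior with proper invalidation."""
    def decorator(fn):
        state = {"value": None, "expires": 0.0}

        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if state["value"] is None or now >= state["expires"]:
                state["value"] = fn()  # refresh from the database
                state["expires"] = now + seconds
            return state["value"]
        return wrapper
    return decorator
```

Wrapping the data loader this way means switching dashboard pages reuses the cached frames instead of re-reading the database each time.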

Grading Rubric

Grade bands: Excellent (90-100%), Good (75-89%), Satisfactory (60-74%), Needs Work (<60%).

  • Odds Infrastructure (20%)
      Excellent: 8+ books, 5+ sports, < 1s latency, robust error handling
      Good: 5 books, 3 sports, < 5s latency
      Satisfactory: 3 books, 2 sports, works but fragile
      Needs Work: fewer than 3 books; frequent failures

  • Arbitrage Detection (20%)
      Excellent: 2-way + 3-way detection; cross-market arbs; data error filtering
      Good: 2-way + 3-way; basic filtering
      Satisfactory: 2-way only; no data validation
      Needs Work: incorrect math; missed arbs

  • Value Detection (20%)
      Excellent: 3+ sport models, calibrated, power vig removal, backtested
      Good: 1 sport model, multiplicative vig removal
      Satisfactory: Elo-only baseline; not calibrated
      Needs Work: no probability model; arbitrary thresholds

  • Alert & Execution (15%)
      Excellent: 3+ channels; full bet logging with CLV; automated settlement
      Good: 1 channel; bet logging; manual settlement
      Satisfactory: console only; basic logging
      Needs Work: no alerts; no tracking

  • Dashboard & Reporting (15%)
      Excellent: interactive web app; 6+ pages; full attribution analysis
      Good: static notebook; 4 pages; basic charts
      Satisfactory: minimal visualization; 2 pages
      Needs Work: no dashboard; raw numbers only

  • Documentation (10%)
      Excellent: architecture doc + API doc + user guide; clean code
      Good: architecture doc; adequate comments
      Satisfactory: README only; sparse comments
      Needs Work: no documentation

Suggested Timeline

  • Week 1 -- Phase 1: Infrastructure. Key activities: API setup, database schema, initial data collection. Milestone: odds flowing into database.
  • Week 2 -- Phase 1 (cont.) + Phase 2. Key activities: continuous collection; two-way arbitrage detection. Milestone: arbs detected on live data.
  • Week 3 -- Phase 2 (cont.) + Phase 3. Key activities: three-way arbs; begin value model for first sport. Milestone: first value bets identified.
  • Week 4 -- Phase 3 (cont.) + Phase 4. Key activities: calibrate value model; build alert system. Milestone: alerts firing on detection.
  • Week 5 -- Phase 4 (cont.) + Phase 5. Key activities: multi-channel alerts; bet logging system. Milestone: full bet tracking operational.
  • Week 6 -- Phase 5 (cont.) + Phase 6. Key activities: settlement; P&L tracking; begin dashboard. Milestone: performance metrics available.
  • Week 7 -- Phase 6: Dashboard. Key activities: complete all dashboard pages; simulated operation. Milestone: dashboard functional.
  • Week 8 -- Documentation & Polish. Key activities: architecture doc; simulated operation report; cleanup. Milestone: all deliverables submitted.

Chapter Reference Index

  • Chapter 2 (Probability and Odds): Odds conversion, implied probability, vig removal methods
  • Chapter 3 (Expected Value): EV calculation for value bets
  • Chapter 4 (Bankroll Management): Bankroll allocation across arbitrage and value strategies
  • Chapter 5 (Data Literacy): Database design, data quality, API integration
  • Chapter 11 (Betting Markets): Market structure, how odds are set, market efficiency
  • Chapter 12 (Line Shopping): Multi-book comparison, CLV tracking and interpretation
  • Chapter 13 (Value Betting): True vs. market probability, systematic value identification
  • Chapter 14 (Advanced Bankroll): Portfolio theory for multi-strategy operations, drawdown management
  • Chapter 25 (Optimization): Arbitrage detection algorithms, portfolio optimization
  • Chapter 26 (Ratings and Rankings): Elo model as baseline probability engine
  • Chapter 27 (Advanced Regression): Probability calibration for value models
  • Chapter 30 (Model Evaluation): Backtesting value models, calibration assessment
  • Chapter 31 (ML Betting Pipeline): System architecture, data pipelines, monitoring
  • Chapter 37 (Discipline Systems): Bet logging, performance tracking, systematic processes
  • Chapter 38 (Risk Management): Loss limits, risk controls, responsible gambling
  • Chapter 41 (Putting It Together): Performance attribution, portfolio approach

Tips for Success

  1. Start with data reliability. Your arbitrage detector is only as good as your odds data. An erroneous odds value will produce phantom arbitrage opportunities. Build data validation from day one.

  2. Arbitrage windows are narrow. Real arbitrage opportunities in liquid markets typically last seconds to minutes. Your system's value is proportional to its speed. Focus on latency optimization.

  3. Beware of execution risk. Detecting an arbitrage is not the same as capturing it. Sportsbooks may reject bets, limits may be too low, or one leg may execute while the other fails. Always account for these scenarios in your P&L tracking.

  4. CLV is your north star. The single best indicator of whether your value detection is working is Closing Line Value (Chapter 12). If you consistently get better odds than the closing line, you are likely a long-term winner even if short-term results are negative.

  5. Calibration matters enormously. An uncalibrated probability model will generate false value signals. Use Chapter 27's calibration techniques and Chapter 30's evaluation methods rigorously.

  6. Build for monitoring, not just detection. The most successful platforms are those that tell you when something is wrong (stale data, model drift, API failures) as quickly as they tell you when something is right (opportunities).
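
The data-validation advice in tip 1 can be made concrete with a consensus check: a quote whose implied probability sits far from other books' quotes on the same outcome is more likely a stale or fat-fingered feed than a real opportunity. A minimal sketch (the function name and the 15% threshold are illustrative):

```python
from statistics import median


def is_suspect_quote(decimal_odds, peer_odds, max_rel_dev=0.15):
    """Flag a quote whose implied probability deviates from the median
    implied probability at other books by more than max_rel_dev
    (relative) -- the usual signature of phantom arbitrage."""
    if decimal_odds <= 1.0 or any(o <= 1.0 for o in peer_odds):
        return True  # structurally invalid decimal odds are always suspect
    implied = 1.0 / decimal_odds
    consensus = median(1.0 / o for o in peer_odds)
    return abs(implied - consensus) / consensus > max_rel_dev
```

Running every incoming quote through a check like this before the arbitrage detector sees it eliminates most phantom opportunities at the source.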
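
Tip 3's one-leg failure scenario can be quantified up front: given the equal-payout stakes for a two-way arb, you can compute exactly what the position looks like if the second leg is rejected. A sketch with illustrative names, assuming decimal odds:

```python
def arb_stakes(odds_a, odds_b, total):
    """Equal-payout stakes for a two-way arbitrage (decimal odds).
    Returns (stake_a, stake_b, guaranteed_profit); the profit is
    negative when the implied probabilities sum to 1 or more."""
    inv_total = 1.0 / odds_a + 1.0 / odds_b
    stake_a = total * (1.0 / odds_a) / inv_total
    return stake_a, total - stake_a, total * (1.0 / inv_total - 1.0)


def single_leg_range(stake_a, odds_a):
    """P&L range if only leg A executes: (A wins, A loses). A rejected
    second leg turns a locked-in profit into directional exposure."""
    return stake_a * (odds_a - 1.0), -stake_a
```

For example, at 2.10/2.10 with a $100 total outlay, the locked profit is $5, but if only the first leg executes the position swings between roughly +$55 and -$50.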
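
A minimal version of the CLV metric from tip 4, in decimal-odds terms (a fuller treatment would first strip the vig from the closing line; the function name is illustrative):

```python
def clv_pct(bet_odds, closing_odds):
    """Closing Line Value in percent: positive when the price you took
    beats the closing price, negative when the market closed at a
    better price than your entry."""
    return (bet_odds / closing_odds - 1.0) * 100.0
```

A bet taken at 2.10 on a line that closes at 2.00 has a CLV of +5%.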
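
Tip 5's calibration check does not need a modeling library to get started: a Brier score plus a reliability table built from raw predictions and 0/1 outcomes will expose gross miscalibration (function names and the default bin count are illustrative):

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1
    outcomes; lower is better, and 0.25 is the score of always
    predicting 0.5."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=10):
    """Bucket predictions and compare mean forecast to observed win
    frequency per bucket -- large gaps indicate miscalibration.
    Returns (mean_forecast, observed_freq, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for cell in bins:
        if cell:
            mean_p = sum(p for p, _ in cell) / len(cell)
            freq = sum(y for _, y in cell) / len(cell)
            rows.append((round(mean_p, 3), round(freq, 3), len(cell)))
    return rows
```

If the mean forecast in a bucket is 0.60 but the observed frequency is 0.52, the model is generating false value signals at that probability level.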
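
The monitoring mindset in tip 6 starts with something as small as a staleness check per bookmaker feed (names and the cutoff are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_feeds(last_update: dict, max_age: timedelta,
                now: Optional[datetime] = None) -> list:
    """Return bookmakers whose most recent odds update is older than
    max_age -- the kind of alert that should fire before a stale quote
    produces a phantom arbitrage."""
    now = now or datetime.now(timezone.utc)
    return sorted(book for book, ts in last_update.items()
                  if now - ts > max_age)
```

Running this on a schedule and alerting on any non-empty result gives the "tell me when something is wrong" half of the platform for almost no code.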


This capstone project integrates material from Chapters 2--5, 11--14, 25--27, 30--31, 37--38, and 41 of The Sports Betting Textbook.