Seasons 1-3 (3,690 games): Walk-forward training and validation. - Season 4 (1,230 games): Model selection, calibration, and hyperparameter tuning. - Season 5 (1,230 games): Final evaluation and backtest. Touched only once after all decisions are made. → Chapter 30 Quiz: Model Evaluation and Selection
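A minimal sketch of the season-based partition described above, using synthetic game records. The `season` field, the role labels, and the helper `split_by_season` are illustrative assumptions, not names taken from the chapter.

```python
from typing import Dict, List

# Role of each season in the evaluation protocol described above.
SEASON_ROLES = {1: "train", 2: "train", 3: "train", 4: "selection", 5: "final_test"}

def split_by_season(games: List[dict]) -> Dict[str, List[dict]]:
    """Partition games by season so that no later season leaks into an earlier role."""
    parts: Dict[str, List[dict]] = {"train": [], "selection": [], "final_test": []}
    for game in games:
        parts[SEASON_ROLES[game["season"]]].append(game)
    return parts

# Synthetic stand-in data: five seasons of 1,230 games each.
games = [{"season": s, "game_id": i} for s in range(1, 6) for i in range(1230)]
parts = split_by_season(games)
print({name: len(rows) for name, rows in parts.items()})
```

Keeping the final season in its own bucket makes it harder to consult it by accident during tuning.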
(a) Sentiment Analysis:
Text A: Compound approximately -0.5 (injury + uncertainty terms dominate) - Text B: Compound approximately +0.7 (winning, elite, positive terms) - Text C: Compound approximately -0.8 (ruled out is -3.0, breaking news context) → Chapter 32 Quiz: Natural Language Processing for Betting
Text A: {player: "Jayson Tatum", injury: "right ankle soreness", status: "questionable", body_part: "right ankle", injury_type: "soreness", confidence: 0.70} - Text C: {player: "Jayson Tatum", injury: "ruled out", status: "out", confidence: 0.70}. Updated status from questionable to out. → Chapter 32 Quiz: Natural Language Processing for Betting
(c) Event Detection:
Text A: INJURY_UPDATE, significance 0.7, star player modifier 1.5x = expected line move ~3.0 pts - Text B: GENERAL_NEWS, significance 0.1 - Text C: INJURY_UPDATE, significance 0.7 + 0.2 (BREAKING) = 0.9, star player 1.5x = expected line move ~4.5 pts → Chapter 32 Quiz: Natural Language Processing for Betting
(c) Model Integration:
Starting pitcher: Feature = pitcher's season ERA, WHIP, expected win-share value. This is the highest-impact feature, as the starting pitcher determines 30-40% of outcome variance. - Bullpen: Feature = bullpen freshness score (inverse of aggregate recent IP for relievers). High workload = negative i → Chapter 32 Quiz: Natural Language Processing for Betting
(c) Quarter-Kelly stakes:
KC -3: b = 0.926, f* = (0.5405*0.926 - 0.4595)/0.926 = 0.0443, quarter = 1.11%, stake = **$222** - Over 47.5: b = 0.909, f* = (0.5412*0.909 - 0.4588)/0.909 = 0.0365, quarter = 0.91%, stake = **$182** - BUF ML: b = 1.50, f* = (0.4203*1.50 - 0.5797)/1.50 = 0.0340, quarter = 0.85%, stake = **$170** → Chapter 13 Quiz: Value Betting Theory and Practice
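The quarter-Kelly arithmetic above can be sketched as a small helper. The function names are hypothetical, and the $20,000 bankroll is an assumption back-solved from the quoted stake and percentage; the excerpt does not state it.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly fraction f* = (p*b - q) / b, where q = 1 - p and b is net odds."""
    q = 1.0 - p
    return (p * b - q) / b

def fractional_kelly_stake(p: float, b: float, bankroll: float,
                           fraction: float = 0.25) -> float:
    """Stake a fixed fraction of full Kelly; never stake on a negative edge."""
    return fraction * max(kelly_fraction(p, b), 0.0) * bankroll

BANKROLL = 20_000  # assumption: not stated in the excerpt

# KC -3 leg from the answer above: p = 0.5405, b = 0.926.
f_star = kelly_fraction(0.5405, 0.926)
stake = fractional_kelly_stake(0.5405, 0.926, BANKROLL)
print(f"full Kelly = {f_star:.4f}, quarter-Kelly stake = ${stake:.2f}")
```

Because the answer key rounds the quarter-Kelly percentage before multiplying by the bankroll, a stake computed at full precision can differ from the printed figure by a dollar or so.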
(i) NLP Social Media Monitoring:
**How it works:** NLP models continuously scan Twitter/X, Instagram, team accounts, and news feeds for keywords indicating injuries, lineup changes, or other material information. When detected, the system automatically flags the information and may trigger line adjustments within seconds, often bef → Chapter 40 Quiz: The Future of Sports Betting
(ii) Reinforcement Learning for Line Adjustment:
**How it works:** An RL agent learns optimal pricing strategies through trial-and-error interaction with bettor behavior. The agent observes bet flow (volume, source, timing) and adjusts lines to maximize expected profit, learning to shade prices based on observed bettor patterns (e.g., adding extra → Chapter 40 Quiz: The Future of Sports Betting
(iii) Behavioral Segmentation:
**How it works:** ML classification models analyze each bettor's activity in real time: bet types, timing, sizing, sport preferences, response to promotions, and win/loss patterns. Bettors are classified into segments (casual, engaged, semi-sharp, sharp, VIP) and receive different treatment (limits, → Chapter 40 Quiz: The Future of Sports Betting
(iv) Computer Vision Pre-Game Analysis:
**How it works:** Cameras in stadiums and training facilities capture player movements during warmups. Computer vision models analyze gait, range of motion, and activity levels to detect potential injuries or physical limitations not yet publicly known. This information feeds into the odds compilati → Chapter 40 Quiz: The Future of Sports Betting
+$5.43
Game 2: Over 45.5 at -105. EV = 0.55 x 95.24 - 0.45 x 100 = 52.38 - 45.00 = **+$7.38** - Game 3: BUF ML at +150. Win $150 per $100 staked with 42% probability, lose $100 with 58% probability. EV per $100 = 0.42 x 150 - 0.58 x 100 = 63.00 - 58.00 = **+$5.00**. At $500: EV = $25.00. → Chapter 12 Quiz: Line Shopping and Odds Optimization
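The expected-value calculations above follow one pattern, sketched here. The helper names `profit_per_unit` and `expected_value` are hypothetical.

```python
def profit_per_unit(american_odds: int) -> float:
    """Net profit per 1 unit staked at the given American odds."""
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / abs(american_odds)

def expected_value(p_win: float, american_odds: int, stake: float = 100.0) -> float:
    """EV = p * (profit if the bet wins) - (1 - p) * stake."""
    return p_win * stake * profit_per_unit(american_odds) - (1.0 - p_win) * stake

ev_game2 = expected_value(0.55, -105)              # Over 45.5 at -105
ev_game3 = expected_value(0.42, +150)              # BUF ML at +150
ev_game3_500 = expected_value(0.42, +150, 500.0)   # same bet at a $500 stake
print(round(ev_game2, 2), round(ev_game3, 2), round(ev_game3_500, 2))
```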
Create a Google Sheet with one row per bet and columns: Date, Sport, Selection, Odds, Stake, Bankroll, Reasoning (2-3 sentences), Confidence (1-10), Mood (1-10), Result, P&L, Was Process Followed (Y/N). - Bookmark the sheet on phone and laptop for quick access. → Chapter 37 Quiz: Discipline, Systems, and Record-Keeping
Add a "Summary" tab to the Google Sheet with formulas for: total bets, win rate, total P&L, ROI, and average stake. - Set a recurring calendar event: "Weekly Review" every Sunday at 7 PM (30 minutes). During the review, update the summary, read through the week's reasoning column, and note any patte → Chapter 37 Quiz: Discipline, Systems, and Record-Keeping
Write a one-page "Rules Card" (physical index card or phone note) containing: minimum edge threshold (e.g., 2%), maximum stake (e.g., 3% of bankroll), required reasoning length (minimum 2 sentences), and the question "Have I considered at least one reason this bet could lose?" - Tape the card next t → Chapter 37 Quiz: Discipline, Systems, and Record-Keeping
4. Enforce (15 minutes to set up):
Write three non-negotiable rules at the top of the Rules Card: daily loss limit (e.g., 5% of bankroll), maximum bets per day (e.g., 8), and no betting after midnight. - Tell one trusted person (friend, partner, fellow bettor) about these three rules and ask them to hold you accountable. → Chapter 37 Quiz: Discipline, Systems, and Record-Keeping
4. Ensemble Methods for Imbalanced Data
**BalancedRandomForest:** Random forest variant that balances each bootstrap sample. - **EasyEnsemble:** Trains multiple models on balanced subsets and averages. - **RUSBoost:** Combines random undersampling with boosting. → Chapter 27: Advanced Regression and Classification
Add a "Lessons" tab to the Google Sheet. After each weekly review, write one sentence: "This week I learned..." and one sentence: "Next week I will change..." - Set a recurring calendar event: "Monthly Deep Review" on the first Sunday of each month (1 hour). → Chapter 37 Quiz: Discipline, Systems, and Record-Keeping
60%
The effective sample size is $\alpha + \beta = 6 + 4 = 10$ (equivalent to having observed 6 wins in 10 games) - The distribution is moderately concentrated around 60%, expressing moderate confidence → Chapter 10 Quiz: Bayesian Thinking for Bettors
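As a rough illustration of the Beta(6, 4) summary above, the sketch below computes the effective sample size, mean, and standard deviation, then applies a conjugate update. The function names and the 7-3 record used for the update are hypothetical.

```python
import math

def beta_summary(alpha: float, beta: float) -> dict:
    """Effective sample size, mean, and standard deviation of a Beta(alpha, beta)."""
    total = alpha + beta
    variance = alpha * beta / (total ** 2 * (total + 1))
    return {"effective_n": total, "mean": alpha / total, "sd": math.sqrt(variance)}

def update(alpha: float, beta: float, wins: int, losses: int) -> tuple:
    """Conjugate update: observed wins add to alpha, observed losses add to beta."""
    return alpha + wins, beta + losses

prior = beta_summary(6, 4)
posterior = beta_summary(*update(6, 4, wins=7, losses=3))  # hypothetical new games
print(prior)
print(posterior)
```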
64.29%
Profit if win: \$100 x (100/180) = \$55.56 - EV = (0.62 x \$55.56) - (0.38 x \$100) = \$34.44 - \$38.00 = **-\$3.56** - This is a negative EV bet. Your model says 62% but the line implies 64.29%, meaning the sportsbook is pricing the Bucks as more likely to win than your model estimates. → Chapter 1 Quiz: Introduction to Sports Betting
A well-known sharp group has identified Cardinals -6 as mispriced. Their models say the fair line is Cardinals -7.5. - They simultaneously hit Cardinals -6 at Pinnacle, Circa, Bookmaker, and three offshore books. - Bet sizes: $5,000 to $25,000 per book, depending on limits. → Chapter 11: Understanding Betting Markets
DraftKings, FanDuel, and BetMGM --- which monitor sharp book movements --- adjust their lines from Cardinals -6 to Cardinals -6.5. - Books that are slower to react briefly show Cardinals -6, creating an arbitrage window. → Chapter 11: Understanding Betting Markets
Demonstrates the law of large numbers and builds intuition for probability convergence. - **`OddsConverter`** --- A complete class for converting between all five major odds formats. Use this as a utility throughout your betting operations. - **`analyze_market()`** --- Extracts implied probabilities → Chapter 2: Probability and Odds
Maintain multiple models and regularly evaluate their performance - Develop new models and data sources continuously, even when current ones are working - Monitor edge metrics (CLV, ROI by strategy) for signs of decay - Be willing to retire strategies that are no longer profitable - Be willing to sc → Chapter 41: Putting It All Together
Advantages of professional status:
Losses deductible against all income (not just winnings) - Business expenses deductible (data subscriptions, software, equipment, travel) - Estimated tax savings: $4,000 this year → Chapter 38 Quiz: Risk Management and Responsible Gambling
Advantages:
Extremely simple to implement and track - Minimizes the impact of any single loss - Removes emotional decision-making from bet sizing - Robust to estimation errors in your edge → Chapter 4: Bankroll Management Fundamentals
**Always available liquidity:** An AMM always offers a price, unlike an order book which requires counterparties. Bettors can always place bets without waiting for a match. - **Simpler user experience:** Bettors interact with a pool rather than navigating an order book, reducing the complexity barri → Chapter 40 Quiz: The Future of Sports Betting
AMM Disadvantages:
**Higher effective margins:** AMM pricing formulas typically produce wider spreads than competitive order books, especially for large bets that significantly impact pool pricing. - **Impermanent loss risk:** Liquidity providers face the risk that one-sided betting patterns drain the pool on the winn → Chapter 40 Quiz: The Future of Sports Betting
Anticipate industry trends
global market expansion, regulatory evolution, and technological convergence with financial markets --- and position your betting operation to adapt as the landscape changes. → Part IX: Industry and Future
Application and vetting:
Background investigations of owners, directors, and key employees - Financial capacity requirements (minimum capital, bond requirements) - Technical standards compliance (system testing, security audits) - Operational plans (responsible gambling, AML, data protection) → Chapter 39: The Sports Betting Industry
The confidence interval spanning zero means the model could actually be unprofitable once deployed. - Backtest ROI may overstate live performance due to execution differences (getting worse odds, missed bets, etc.). - Model selection bias (you chose this model because it looked good in the backtest) → Chapter 30 Quiz: Model Evaluation and Selection
Arguments for deployment (with caution):
The Brier score (0.208) and ECE (0.015) indicate a genuinely well-calibrated and discriminating model, which is a good foundation. - The confidence interval is centered on positive ROI, with the lower bound only slightly negative. - Three seasons is a reasonable evaluation period but may not provide → Chapter 30 Quiz: Model Evaluation and Selection
Assess emerging technologies
AI-driven pricing, blockchain platforms, betting exchanges, and micro-betting --- for their potential impact on market structure, efficiency, and opportunity. → Part IX: Industry and Future
Historical odds data for Australian sports (AFL, NRL, A-League) and international sports. Free CSV downloads. Includes closing Pinnacle lines. → Appendix D: Data Sources Directory
Lakers: **-135 at Sportsbook Y** (better for the bettor; lower vig on the favorite) - Celtics: **+130 at Sportsbook X** (higher payout on the underdog) → Chapter 2 Quiz: Probability and Odds
specifically, the Tree-structured Parzen Estimator (TPE) --- to intelligently explore the hyperparameter space. Unlike grid or random search, Optuna uses the results of previous trials to decide which configurations to try next, focusing on promising regions of the search space. → Chapter 29: Neural Networks for Sports Prediction
Behavioral warning signs:
Preoccupation with betting that interferes with work or relationships. - Irritability or restlessness when unable to bet. - Using betting to escape negative emotions (anxiety, depression, boredom) rather than as genuine entertainment. - Repeatedly attempting to cut back or stop without success. - Ch → Chapter 36 Quiz: The Psychology of Betting
when you placed the bet 2. **Bet odds** --- the exact decimal (or American) odds you received 3. **Closing odds** --- the final odds for the same selection at game time 4. **Market** --- which sportsbook or market you used → Case Study: Closing Line Value --- The Ultimate Edge Metric
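A minimal sketch of a record holding the fields listed above, with one common way to compute closing line value from decimal odds (bet odds divided by closing odds, minus one). The class name, field names, and sample values are assumptions, and this ratio ignores the vig embedded in the closing price.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class BetRecord:
    placed_at: datetime    # when you placed the bet
    bet_odds: float        # decimal odds you received
    closing_odds: float    # final decimal odds for the same selection
    market: str            # sportsbook or market used

    def clv(self) -> float:
        """Positive when the price taken was better than the closing price."""
        return self.bet_odds / self.closing_odds - 1.0

# Hypothetical example: took 2.10, market closed at 2.00.
record = BetRecord(datetime(2024, 1, 7, 12, 30), 2.10, 2.00, "Example Book")
print(f"CLV = {record.clv():+.1%}")
```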
Bet Timing:
The optimal time to bet depends on your edge source and the sport - NFL and NBA bets generally favor early action for model-based edges - MLB bets should account for confirmed pitchers and weather - Steam moves represent coordinated sharp action and are generally not to be faded - Combining systemat → Chapter 12: Line Shopping and Market Analysis
Bet Tracking and Journaling:
Track every bet with complete metadata: odds, closing line, stake, result, model, and reasoning - A qualitative journal captures the pre-bet thesis, counter-arguments, and post-event review - Systematic tracking enables performance analysis across sports, markets, books, and time periods → Chapter 13: Value Betting Theory and Practice
The world's largest betting exchange. Historical data available through the Betfair Historical Data portal (subscription required). Includes tick-by-tick price data, matched volumes, and full order book snapshots. - API: Free with Betfair account. Supports placing bets, streaming prices, and histori → Appendix D: Data Sources Directory
Player X: Sportsbook B at -160 (lower implied probability of 61.54% vs. 64.29%, so better odds for the bettor) - Player Y: Sportsbook A at +155 (lower implied probability of 39.22% vs. 40.82%, so better odds for the bettor) → Chapter 2 Exercises: Probability and Odds
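The comparison above reduces to converting each price to an implied probability and taking the lowest. In this sketch the -180 and +145 prices are back-solved from the quoted 64.29% and 40.82%, and the book assigned to each is an assumption.

```python
def implied_probability(american_odds: int) -> float:
    """Implied probability (vig included) of an American price."""
    if american_odds < 0:
        return abs(american_odds) / (abs(american_odds) + 100.0)
    return 100.0 / (american_odds + 100.0)

def best_book(offers: dict) -> str:
    """Book whose price carries the lowest implied probability for the bettor."""
    return min(offers, key=lambda book: implied_probability(offers[book]))

player_x = {"Sportsbook A": -180, "Sportsbook B": -160}
player_y = {"Sportsbook A": +155, "Sportsbook B": +145}

for name, offers in (("Player X", player_x), ("Player Y", player_y)):
    book = best_book(offers)
    print(name, book, f"{implied_probability(offers[book]):.2%}")
```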
Over/under props may be mispriced because the tails are fatter than the normal model assumes - Betting on extreme alternate lines (e.g., "over 35.5" at long odds) may offer value if the book uses a normal distribution to set prices - The player is both more predictable (many games near the mean) and → Chapter 6 Quiz: Descriptive Statistics for Sports
Narrative bias: constructing a story from the game action ("this team looks sluggish") that may not reflect probabilistic reality. - Availability bias: the visual impressions from the last few minutes of play are disproportionately salient. - Overconfidence in subjective observation: eye-test impres → Chapter 36 Quiz: The Psychology of Betting
Biases on the "trust the model" side:
Automation bias: over-trusting the model because it is quantitative, even when it may be using stale inputs or missing live-game context. → Chapter 36 Quiz: The Psychology of Betting
blue-chip ratio
the percentage of a team's roster composed of 4-star and 5-star recruits---is one of the strongest predictors of championship-level performance. The finding is striking in its simplicity: → Chapter 20: Modeling College Sports
C
Challenges in African markets:
Responsible gambling infrastructure is often underdeveloped - Payment processing can be complex (reliance on mobile money) - Regulatory capacity and enforcement may be limited - Internet connectivity and speed constraints affect real-time products - Tax structures are sometimes punitive and unstable → Chapter 40: The Future of Sports Betting
Challenges:
High margins erode expected value - Models must be extremely fast and accurate - The data infrastructure required is significant - Edge may be transient, as operators rapidly improve automated pricing - Regulatory restrictions may limit availability in some jurisdictions → Chapter 40: The Future of Sports Betting
CLV is the most reliable predictor of long-term betting profitability - It measures the quality of your betting decisions independently of outcome variance - CLV converges to a meaningful signal faster than raw win/loss records - Calculating CLV requires recording both your placed odds and the closi → Chapter 12: Line Shopping and Market Analysis
social media sentiment, injury reports, weather, referee tendencies, and player tracking data --- through automated pipelines that deliver actionable features to your models. → Part VII: Live Betting and Advanced Markets
Combine using weighted averaging
it is robust and requires minimal tuning. Use stacking only when you have enough historical data (500+ games) to train the meta-model reliably. 6. **Recalibrate the ensemble** periodically. Optimal weights shift over time as the competitive landscape changes. → Chapter 26: Ratings and Ranking Systems
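Weighted averaging of model probabilities can be as simple as the sketch below. The three model outputs and their weights are made-up values for illustration, and `weighted_average` is a hypothetical helper.

```python
from typing import Sequence

def weighted_average(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-model win probabilities using non-negative weights."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total

# Hypothetical outputs from three rating systems for the same game.
model_probs = [0.58, 0.62, 0.55]
model_weights = [0.5, 0.3, 0.2]
print(round(weighted_average(model_probs, model_weights), 4))
```

Dividing by the weight total means the weights do not have to be normalized in advance, which keeps periodic re-weighting simple.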
Common regulatory trends:
Mandatory KYC and AML compliance - Responsible gambling requirements (deposit limits, self-exclusion) - Advertising restrictions, particularly regarding minors and vulnerable populations - Data localization requirements - Integrity monitoring obligations - Increasing tax rates as governments seek to → Chapter 40: The Future of Sports Betting
Common tests used in this book:
**One-sample z-test:** Test whether a bettor's win rate p-hat differs from the break-even rate p_0. z = (p-hat - p_0) / sqrt(p_0*(1-p_0)/n). - **Two-sample t-test:** Compare means of two groups (e.g., model A performance vs. model B). - **Chi-squared goodness-of-fit:** Test whether observed frequenc → Appendix A: Mathematical Foundations
Kaizen, A/B testing, feedback loops, and systematic mistake analysis --- ensure that your process improves over time. The compounding effect of small, consistent improvements is one of the most powerful edges available to a sports bettor. → Chapter 37: Discipline, Systems, and Record-Keeping
Convention notes:
Bold lowercase letters (**x**, **w**) denote column vectors. - Bold uppercase letters (**X**, **A**) denote matrices. - Subscripts index elements: x_i is the i-th element of vector **x**; X_{ij} is the element in row i, column j of matrix **X**. - Hats (^) over parameters denote estimators or fitted → Appendix A: Mathematical Foundations
Same-game parlays can be +EV when books fail to properly price positive correlations - The Gaussian copula method enables simulation of correlated parlay outcomes - Parlays compound edge but also compound vig; they are only advantageous with sufficient per-leg edge or mispriced correlations - System → Chapter 14: Advanced Bankroll and Staking Strategies
Critically evaluate any quantitative betting claim
**Inefficiency is the opportunity.** Every emerging market discussed in this chapter is less efficient than the corresponding major market. The quantitative bettor's advantage is proportional to the market's inefficiency. - **Data challenges require creative solutions.** From esports patch disruptio → Chapter 22: Modeling Emerging Markets
Current approaches:
Fractional Kelly (betting a fixed fraction of Kelly, typically 25--50%) as a heuristic for accounting for estimation error - Bayesian Kelly, which integrates over the posterior distribution of the edge - Robust optimization approaches that maximize worst-case expected log-wealth → Chapter 42: Research Frontiers
Current landscape (as of early 2026):
38+ states plus DC have legalized sports betting in some form - Mobile betting is legal in approximately 30 states - Several additional states have legislation pending or under active consideration - Notable holdouts include California (the largest potential market), Texas, and Georgia, though legis → Chapter 40: The Future of Sports Betting
Customer Due Diligence (CDD):
Identity verification for all customers (KYC) - Enhanced due diligence for high-value or high-risk customers - Ongoing monitoring of customer activity - Record-keeping of all transactions (typically 5--7 years) → Chapter 39: The Sports Betting Industry
The nflfastR/nfl_data_py ecosystem provides rich play-by-play data with pre-calculated EPA, WPA, and other advanced metrics dating back to 1999. - EPA (Expected Points Added) is the foundational metric for measuring play and player value, superior to raw yardage because it accounts for game situatio → Chapter 15: Modeling the NFL
Data Collector
Fetches odds from the API on a schedule 2. **Database** -- Stores historical odds in SQLite 3. **Analyzer** -- Compares current odds to find value 4. **Alerter** -- Sends notifications when value is detected → Chapter 12: Line Shopping and Market Analysis
structuring match results for regression. 2. **Parameter estimation** — fitting a Poisson regression to extract attack ratings, defense ratings, and home advantage. 3. **Prediction** — computing scoreline probabilities and derived market probabilities (1X2, Over/Under, BTTS). 4. **Value identificati → Case Study: Poisson Modeling of Soccer — Predicting Match Outcomes
Data Snooping Simulation:
Simulate 100 betting strategies, each with 200 bets at true 50% win rate - Show the distribution of p-values (should be approximately uniform) - Demonstrate how many strategies appear "significant" by chance - Apply corrections and show the reduction in false discoveries → Chapter 8 Exercises: Hypothesis Testing and Statistical Significance
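One possible shape for this exercise, using only the standard library. The normal-approximation p-value and the Bonferroni correction are choices made for this sketch; the exercise does not name a specific test or correction.

```python
import math
import random

def one_sided_p_value(wins: int, n: int, p0: float = 0.5) -> float:
    """Normal-approximation p-value for the alternative 'true win rate > p0'."""
    z = (wins / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulate_p_values(n_strategies: int = 100, n_bets: int = 200, seed: int = 0) -> list:
    """Each strategy is pure noise: every bet wins with probability 0.5."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_strategies):
        wins = sum(rng.random() < 0.5 for _ in range(n_bets))
        p_values.append(one_sided_p_value(wins, n_bets))
    return p_values

ALPHA = 0.05
p_values = simulate_p_values()
naive_hits = sum(p < ALPHA for p in p_values)
bonferroni_hits = sum(p < ALPHA / len(p_values) for p in p_values)
print(f"'significant' at alpha={ALPHA}: {naive_hits} of {len(p_values)}")
print(f"still significant after Bonferroni: {bonferroni_hits}")
```

Sorting `p_values` and plotting them against their rank is a quick check of the roughly uniform shape the exercise asks for.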
a mechanism for bringing real-world data onto the blockchain --- reports the result. The oracle might be a decentralized network of reporters (as in Augur or UMA), a trusted data feed (as in Chainlink), or a combination of approaches. → Chapter 40: The Future of Sports Betting
Decentralized Protocol (1.93, 1% fee, $5 gas):
Profit if win: odds of 1.93 pay $930 on a $1,000 bet. - After 1% fee: $930 x 0.99 = $920.70 - After gas: $920.70 - $5 = $915.70 - Win: 0.52 x $915.70 = $476.16 - Lose: 0.48 x (-$1,000 - $5) = 0.48 x (-$1,005) = - → Chapter 40 Quiz: The Future of Sports Betting
Deep Feature Synthesis (DFS)
automatically generating features by applying transformation and aggregation primitives to relational datasets. For sports data, this is particularly powerful because our data naturally has relational structure: games contain players, players belong to teams, teams belong to leagues, games occur at → Chapter 28: Feature Engineering for Sports Prediction
Periodic emotional check-ins (every 30-45 minutes) - Monitor for behavioral indicators: faster decisions, larger bets, skipped checklist items - Track the post-loss cascade: frustration, rumination, action-seeking, process abandonment → Chapter 36 Key Takeaways: The Psychology of Betting
Detection:
New "discipline drift" alerts: if checklist completion drops below 95% in any week, or if average reasoning length drops below 25 words, or if stake sizes exceed 2.5% more than twice in a week, an automatic alert fires. - Weekly review template expanded to include explicit review of every override a → Case Study 2: The Discipline Breakdown --- A Bettor's Worst Month
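The three alert thresholds quoted above could be checked with a rule like the following sketch. The `WeekLog` structure and its field names are assumptions; only the thresholds come from the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WeekLog:
    checklist_completion: float    # fraction of bets with a completed checklist
    avg_reasoning_words: float     # mean length of the reasoning field, in words
    stake_fractions: List[float]   # each stake as a fraction of bankroll

def drift_alerts(week: WeekLog) -> List[str]:
    """Return one message per threshold breached during the week."""
    alerts = []
    if week.checklist_completion < 0.95:
        alerts.append("checklist completion below 95%")
    if week.avg_reasoning_words < 25:
        alerts.append("average reasoning length below 25 words")
    if sum(stake > 0.025 for stake in week.stake_fractions) > 2:
        alerts.append("stake above 2.5% of bankroll more than twice")
    return alerts

# Hypothetical week: checklist slipping and three oversized stakes.
week = WeekLog(0.90, 31.0, [0.01, 0.03, 0.03, 0.03])
print(drift_alerts(week))
```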
Different models and risk assessments
Each book's trading desk uses proprietary models 2. **Different customer bases** -- A book with sharp action will move lines differently than a recreational-heavy book 3. **Different hold targets** -- Some books aim for 4.5% hold; others target 6% or more 4. **Timing of line moves** -- Books react t → Chapter 12: Line Shopping and Market Analysis
dimension-free
unlike numerical integration methods whose cost explodes with dimensionality, Monte Carlo works just as well in 1,000 dimensions as in one. It is a curse because improving precision is expensive: going from 1% precision to 0.1% precision requires 100 times as many samples. → Chapter 24: Simulation and Monte Carlo Methods
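The factor of 100 follows from the standard error of a Monte Carlo mean shrinking like sigma / sqrt(N). A minimal sketch, assuming "precision" means the standard error of the estimate and using the worst-case standard deviation of a win/loss indicator:

```python
def samples_for_standard_error(sigma: float, target_se: float) -> float:
    """Samples N needed so that sigma / sqrt(N) equals the target standard error."""
    return (sigma / target_se) ** 2

SIGMA = 0.5  # worst case for a 0/1 outcome, reached at p = 0.5

n_coarse = samples_for_standard_error(SIGMA, 0.01)   # 1% precision
n_fine = samples_for_standard_error(SIGMA, 0.001)    # 0.1% precision
print(f"N at 1%: {n_coarse:.0f}, N at 0.1%: {n_fine:.0f}, ratio: {n_fine / n_coarse:.0f}")
```

The ratio does not depend on `SIGMA` or on the dimension of the problem, which is the "dimension-free" property at work.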
Disadvantages:
Does not account for varying levels of confidence or edge across different bets - Does not adjust for odds (a bet at +300 and a bet at -150 receive the same stake) - The fixed dollar amount gradually becomes a larger percentage of bankroll during losing streaks and a smaller percentage during winnin → Chapter 4: Bankroll Management Fundamentals
Discover hidden structure
interactions, nonlinearities, and temporal patterns --- that domain experts might overlook. - **Scale across markets**, applying the same pipeline to spreads, totals, moneylines, player props, and futures with minimal modification. → Part VI: Machine Learning for Sports Betting
combine different model families, not just different hyperparameters. 3. **Calibration is not optional.** Uncalibrated probabilities lead to incorrect bet sizing and phantom value. 4. **Class imbalance requires special handling** for rare-outcome markets, but always recalibrate after rebalancing. 5. → Chapter 27: Advanced Regression and Classification
Does the data fit?
Always validate with visual inspection and formal goodness-of-fit tests (Section 7.5). - If the fit is poor, consider extensions: negative binomial for overdispersed counts, t-distribution for heavy-tailed continuous data, mixture models for multimodal data. → Chapter 7: Probability Distributions in Betting
Drawdown Management:
Maximum drawdown analysis quantifies the worst expected decline in bankroll - Recovery time grows rapidly with drawdown depth; a 30% drawdown can take hundreds of bets to recover - Pre-committed drawdown policies prevent emotional decision-making during inevitable downswings - The psychological impa → Chapter 14: Advanced Bankroll and Staking Strategies
Drawdowns are psychologically devastating
even with a positive edge, extended drawdowns cause self-doubt and poor decision-making 2. **Drawdowns determine survival** -- if you run out of bankroll, your edge is meaningless 3. **Drawdowns inform bankroll sizing** -- your initial bankroll should be large enough to survive expected drawdowns → Chapter 14: Advanced Bankroll and Staking Strategies
E
Edge uncertainty
240 bets is a small sample. The 95% confidence interval for their true win rate is approximately 47.9% to 60.5%. The lower end of this interval means they may have no edge at all. (2) **Flat bet sizing** -- they bet a fixed $300 regardless of bankroll, meaning early in their history they were riskin → Chapter 4 Quiz: Bankroll Management Fundamentals
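The interval quoted above is consistent with a normal-approximation (Wald) interval. The win count is not given in this excerpt; 130 wins out of 240 is an assumption chosen because it reproduces the quoted bounds.

```python
import math

def win_rate_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Wald interval for a win rate: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = wins / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

WINS, BETS = 130, 240  # WINS is an assumption; only the 240 bets are stated
low, high = win_rate_interval(WINS, BETS)
print(f"95% CI: {low:.1%} to {high:.1%}")
```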
Embedding dimensions:
Teams: 32 teams, d = min(50, ceil(32/2)) = 16. Two team embeddings (home and away) contribute 32 dimensions. - Venues: 30 stadiums, d = min(50, ceil(30/2)) = 15. One venue embedding contributes 15 dimensions. - Total embedding dimensions: 16 + 16 + 15 = 47. → Chapter 29 Quiz: Neural Networks for Sports Prediction
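The sizing rule used in this answer, d = min(50, ceil(n/2)), is easy to check in code. The function name is hypothetical.

```python
import math

def embedding_dim(cardinality: int, cap: int = 50) -> int:
    """Embedding width under the rule d = min(cap, ceil(cardinality / 2))."""
    return min(cap, math.ceil(cardinality / 2))

team_dim = embedding_dim(32)    # 32 teams
venue_dim = embedding_dim(30)   # 30 stadiums

# Home and away teams each get their own lookup, plus one venue lookup.
total_dim = 2 * team_dim + venue_dim
print(team_dim, venue_dim, total_dim)
```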
Emotional review:
What was your emotional state during the week? - Were there tilt episodes? What triggered them? How did you respond? - Did external factors (stress, fatigue, personal events) affect your betting? → Chapter 37: Discipline, Systems, and Record-Keeping
Emotional warning signs:
Significant mood swings tied to betting outcomes - Feeling anxious or irritable when not betting - Using betting as a coping mechanism for stress, depression, or boredom - Feeling a "rush" from the act of placing a bet (as distinct from the intellectual satisfaction of identifying value) - Experienc → Chapter 38: Risk Management and Responsible Gambling
entity embeddings
dense, learned vector representations of teams --- can capture meaningful similarity structures directly from game outcome data. We train a neural network with team embeddings on five NBA seasons (2019-2024), analyze the learned embedding space to discover that teams cluster by playing style and con → Case Study 1: Entity Embeddings for NBA Team Representation
Undocumented but widely used public endpoints for scores, schedules, standings, and rosters across all major sports. JSON format. No authentication required for basic endpoints. - Example: `site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard` - Coverage: NFL, NBA, MLB, NHL, college sports, → Appendix D: Data Sources Directory
Estimated pages: 10 | Estimated words: 4,000
B.1 Standard Normal Distribution Table - B.2 Student's t-Distribution Critical Values - B.3 Chi-Squared Distribution Critical Values - B.4 F-Distribution Critical Values - B.5 Poisson Distribution Tables - B.6 Binomial Coefficients and Probability Tables → The Sports Betting Textbook: From Strategies to Mathematical Models
Estimated pages: 20 | Estimated words: 8,000
A.1 Set Theory and Combinatorics - A.2 Calculus Review (derivatives, integrals, optimization) - A.3 Linear Algebra (vectors, matrices, eigenvalues) - A.4 Probability Theory (axioms, distributions, expectation) - A.5 Optimization (unconstrained, constrained, convex) - A.6 Information Theory (entropy, → The Sports Betting Textbook: From Strategies to Mathematical Models
This arrangement involves deception of the sportsbook and potentially the regulatory authorities. - It undermines the integrity of the KYC process, which exists for anti-money laundering and responsible gambling purposes. - The chapter explicitly states: "Operating multiple accounts at the same spor → Chapter 38 Quiz: Risk Management and Responsible Gambling
Evaluation methodology:
**Validation scheme:** Walk-forward validation across three full NBA seasons (e.g., train on all data before each game, predict that game, advance). No random splitting. - **Primary metric:** Brier score (average squared error of predicted win probabilities). - **Secondary metric:** Simulated ROI as → Chapter 28 Quiz: Feature Engineering for Sports Betting
Evidence AGAINST (or at least not conclusive):
The p-value (0.024) is significant but not overwhelming; it would not survive a more stringent threshold (alpha = 0.01, p = 0.024 > 0.01) - The confidence interval for win rate (50.04% to 56.96%) barely excludes 50% - Cannot reject the null that his true win rate equals the breakeven rate (p = 0.263 → Case Study 1: Is This Bettor Skilled or Lucky? A Statistical Investigation
Evidence FOR skill:
Win rate of 53.5% is statistically significant at the 5% level (z = 1.98, p = 0.024) against the null of 50% - Consistent performance across sports and seasons (no evidence of a single lucky streak) - Profitable in absolute terms (+$3,527) → Case Study 1: Is This Bettor Skilled or Lucky? A Statistical Investigation
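A sketch of the one-sample z-test behind these figures. The sample size is not stated in this excerpt; 428 wins in 800 bets is an assumption that matches the quoted 53.5% and z = 1.98. The break-even rate assumes standard -110 pricing.

```python
import math

def z_test_win_rate(wins: int, n: int, p0: float) -> tuple:
    """One-sided z-test of H0: true win rate = p0 against H1: true win rate > p0."""
    p_hat = wins / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

WINS, N = 428, 800        # assumption: reproduces 53.5% and z = 1.98
BREAKEVEN = 110 / 210     # assumption: break-even win rate at -110

z_coin, p_coin = z_test_win_rate(WINS, N, 0.50)
z_be, p_be = z_test_win_rate(WINS, N, BREAKEVEN)
print(f"vs 50%:        z = {z_coin:.2f}, p = {p_coin:.3f}")
print(f"vs break-even: z = {z_be:.2f}, p = {p_be:.3f}")
```

Testing against the break-even rate rather than 50% is the comparison that matters for profitability, and it is the much harder bar to clear.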
Examples:
Early injury information (knowing about a key player's injury before it is publicly announced) - Lineup information for sports where lineups are not announced until close to game time - Detailed scouting information about player form, tactical plans, or team chemistry - Weather information that is m → Chapter 42: Research Frontiers
E(Thursday, Underdog Covers) = 123 * 830 / 1621 = 62.98 - E(Thursday, Favorite Covers) = 123 * 791 / 1621 = 60.02 - E(Sunday, Underdog Covers) = 1498 * 830 / 1621 = 767.02 - E(Sunday, Favorite Covers) = 1498 * 791 / 1621 = 730.98 → Case Study 2: The Thursday Night Football Effect — Myth or Reality?
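The expected counts above all come from the rule (row total x column total) / grand total. A short sketch using the marginal totals from the calculation; the variable and function names are hypothetical.

```python
def expected_counts(row_totals: list, col_totals: list) -> list:
    """Expected cell counts under independence: row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

row_totals = [123, 1498]   # Thursday games, Sunday games
col_totals = [830, 791]    # underdog covers, favorite covers

table = expected_counts(row_totals, col_totals)
for label, row in zip(("Thursday", "Sunday"), table):
    print(label, [round(cell, 2) for cell in row])
```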
Expected high correlations:
EPA/play (offense) and Yards per play (offense): r likely > 0.75, as both measure offensive efficiency per play. - EPA/play (offense) and Success rate (offense): r likely > 0.65, as successful plays drive positive EPA. - EPA/play (defense) and Yards per play (defense): r likely > 0.70, same logic as → Chapter 28 Quiz: Feature Engineering for Sports Betting
the tendency of casual bettors to overbet longshots (because the potential payoff is exciting) and underbet favorites (because the payoff seems small). As a result, the implied probability of longshots is often inflated more than that of favorites relative to the true probability. → Chapter 2: Probability and Odds
feature clustering
grouping correlated features together and replacing each cluster with a single representative or aggregate. This is particularly useful when you have many features measuring similar constructs (e.g., multiple metrics of defensive quality). → Chapter 28: Feature Engineering for Sports Prediction
Feature leakage
using a team's full-season win percentage to predict mid-season games. At game 40, you would only know the win percentage through game 39, but including the full season's record introduces future information. → Chapter 30 Quiz: Model Evaluation and Selection
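One way to avoid this leak is to compute the win percentage from prior games only, as in the sketch below. The function name and the short results list are hypothetical.

```python
from typing import List, Optional

def win_pct_entering_each_game(results: List[int]) -> List[Optional[float]]:
    """For each game, the win percentage over earlier games only.

    `results` holds 1 for a win and 0 for a loss, in chronological order.
    The first game has no history, so its feature is None.
    """
    features: List[Optional[float]] = []
    wins = 0
    for games_played, result in enumerate(results):
        features.append(wins / games_played if games_played else None)
        wins += result  # the current result only becomes visible to later games
    return features

results = [1, 0, 1, 1]
print(win_pct_entering_each_game(results))
```

The leaky version would assign `sum(results) / len(results)` to every game, which lets game 1 see the outcome of game 4.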
features
numerical representations that capture the underlying dynamics that drive outcomes. This transformation process, known as **feature engineering**, is widely regarded as the single most impactful step in any predictive modeling pipeline. → Chapter 28: Feature Engineering for Sports Prediction
Federal tax obligations:
**All winnings are taxable.** This includes winnings from sports betting, casino games, fantasy sports, and any other form of gambling. - **Reporting threshold for sportsbooks.** Sportsbooks are required to issue a Form W-2G for certain winnings. For sports betting, this is triggered when winnings o → Chapter 38: Risk Management and Responsible Gambling
Financial warning signs:
Betting more than he can afford to lose (defined as money allocated for essentials like rent, food, or debt payments). - Increasing bet sizes to "make up for" previous losses. - Borrowing money to fund betting. - Hiding the extent of betting from a partner or family. → Chapter 36 Quiz: The Psychology of Betting
Football-Data.co.uk (football-data.co.uk)
Free downloadable CSV files with historical match results and bookmaker odds for major European soccer leagues from the mid-1990s to present. Includes Bet365, Pinnacle, and market average odds. - Essential resource for soccer betting research. Updated weekly during seasons. → Appendix D: Data Sources Directory
Friend A (modeler) likely biases:
**Overconfidence:** Friend A may be too certain that the model is sound and dismiss Friend B's concerns without adequate consideration. - **Sunk cost fallacy:** Friend A has invested significant time building the model and may resist acknowledging potential problems. - **Confirmation bias:** Friend → Chapter 36 Quiz: The Psychology of Betting
Friend B (bankroll provider) likely biases:
**Loss aversion:** The pain of watching a 12% decline in capital is psychologically more intense than the pleasure of equivalent gains, driving a desire to reduce exposure. - **Outcome orientation:** Friend B may evaluate the strategy based on the three-month result rather than the process quality o → Chapter 36 Quiz: The Psychology of Betting
Further Reading
Kahneman, Daniel. *Thinking, Fast and Slow*. Farrar, Straus and Giroux, 2011. - Duke, Annie. *Thinking in Bets: Making Smarter Decisions When You Don't Have All the Facts*. Portfolio, 2018. - Thaler, Richard H. *Misbehaving: The Making of Behavioral Economics*. W.W. Norton, 2015. - Tetlock, Philip E → Chapter 36: The Psychology of Betting
Official data rights holder for many NCAA sports and several international leagues. Live data feeds and trading tools. → Appendix D: Data Sources Directory
Glicko-2 parameters:
$\mu$: Rating on the Glicko-2 scale (related to Elo by $\mu = (R - 1500) / 173.7178$) - $\phi$: Rating deviation (analogous to standard deviation of the rating estimate) - $\sigma$: Volatility (how much the player's true ability tends to fluctuate) - $\tau$: System constant that constrains volatilit → Chapter 21: Modeling Combat Sports and Tennis
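The scale conversion for $\mu$ can be sketched in a few lines. This covers only the rating conversion stated above and its inverse; the helper names are illustrative and the full Glicko-2 update is not shown.

```python
GLICKO2_SCALE = 173.7178  # scaling constant from the conversion formula above


def rating_to_mu(rating, center=1500.0):
    """Convert an Elo-style rating R to the Glicko-2 scale: mu = (R - 1500) / 173.7178."""
    return (rating - center) / GLICKO2_SCALE


def mu_to_rating(mu, center=1500.0):
    """Inverse conversion back to the familiar rating scale."""
    return mu * GLICKO2_SCALE + center


print(rating_to_mu(1500))  # an average-rated player sits at mu = 0
print(round(rating_to_mu(1800), 4))
```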
H
Helpline Numbers (Repeat for Emphasis)
**US:** National Council on Problem Gambling, 1-800-522-4700 (24/7) - **UK:** GamCare, 0808-8020-133 - **Australia:** Gambling Help, 1800-858-858 - **International:** Gamblers Anonymous, www.gamblersanonymous.org → Chapter 38: Risk Management and Responsible Gambling
the mathematical advantage the sportsbook holds. It is extracted through the vig. No matter which side the public bets, the house expects to profit approximately 4.55 cents on every dollar wagered. → Chapter 3: Expected Value and the Bettor's Edge
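As a rough check on the 4.55-cent figure, the sketch below computes the theoretical hold for a two-way market. It assumes standard -110 pricing on both sides and balanced action (assumptions not stated in the entry itself); the function names are illustrative.

```python
def american_to_implied(odds):
    """Implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def theoretical_hold(odds_a, odds_b):
    """Share of total handle the book keeps if action is balanced by payout."""
    overround = american_to_implied(odds_a) + american_to_implied(odds_b)
    return 1 - 1 / overround


print(round(theoretical_hold(-110, -110) * 100, 2))  # percent of each dollar wagered
```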
checklists, decision rules, automated guardrails, and review cycles --- that enforce discipline independent of your emotional state on any given day. → Part VIII: Psychology and Discipline
Implications for bettors:
Greater market access as more jurisdictions legalize - More standardized consumer protections globally - Potentially fewer opportunities for regulatory arbitrage as frameworks converge - Better data availability as regulated markets require transparent reporting - Increasing tax burden on operators → Chapter 40: The Future of Sports Betting
Leg 1: 1/1.80 = 55.56% - Leg 2: 1/2.10 = 47.62% - Leg 3: 1/1.50 = 66.67% - Combined: 0.5556 × 0.4762 × 0.6667 = 0.1764 = **17.64%** → Chapter 2 Exercises: Probability and Odds
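The same calculation in code, as a small sketch (the function name is illustrative). Multiplying the legs treats them as independent, which is an assumption of the calculation rather than a property of every parlay.

```python
from math import prod


def parlay_implied_probability(decimal_odds):
    """Product of each leg's implied probability (1 / decimal odds)."""
    return prod(1 / odds for odds in decimal_odds)


legs = [1.80, 2.10, 1.50]
combined = parlay_implied_probability(legs)
print(f"{combined:.2%}")
```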
In this chapter, you will learn to:
Convert between American odds, decimal odds, fractional odds, and implied probabilities fluently - Calculate the vigorish embedded in any set of odds and assess the true cost of each wager - Set up a Python-based betting analysis environment with basic tracking and calculation tools → Chapter 1: Introduction to Sports Betting
Increased bet frequency
placing bets outside normal patterns, including bets on unfamiliar sports or markets to "get action." 2. **Increased stake sizes** --- betting larger amounts than the model or staking plan recommends, often to recover recent losses quickly. 3. **Ignoring model output** --- overriding quantitative re → Chapter 36 Quiz: The Psychology of Betting
Interaction features
differences, ratios, and products of corresponding team statistics --- capture matchup dynamics that individual team features cannot represent. Style clash metrics quantify how playing styles interact. → Chapter 28: Feature Engineering for Sports Prediction
International cooperation:
Information-sharing agreements between regulators - Harmonization efforts through organizations like the International Association of Gaming Regulators (IAGR) - Sports integrity monitoring networks (IBIA, sport-specific bodies) operating globally - Emerging standards for data use, algorithmic fairne → Chapter 40: The Future of Sports Betting
If your analysis requires data not available through free sources (real-time feeds, player tracking, niche sports), evaluate paid providers. Start with free tiers and trial periods. → Chapter 5: Data Literacy for Bettors
Is the data available on a reference site?
**Yes:** Check if `pandas.read_html()` can extract it directly. If not, use `requests` + `BeautifulSoup` while respecting robots.txt and rate limits. - **No:** Continue to step 4. → Chapter 5: Data Literacy for Bettors
Is the quantity continuous or discrete?
Continuous (margins, ratings, efficiency metrics) --> Consider **normal** (or t-distribution for heavy tails). - Discrete (counts, wins) --> Go to question 2. → Chapter 7: Probability Distributions in Betting
Advantage: Non-parametric and can correct any monotonic miscalibration pattern, no matter how complex. - Disadvantage: Requires more calibration data (at least 500-1000 observations) to avoid overfitting. With small calibration sets, it can produce noisy, unreliable calibration mappings. → Chapter 30 Quiz: Model Evaluation and Selection
K
Kaggle / GitHub Community Datasets
Various scraped odds datasets appear periodically. Notable: the Pinnacle closing line dataset for NFL (covers 2007-present in some versions), and comprehensive soccer odds collections. → Appendix D: Data Sources Directory
Kaggle Datasets (kaggle.com)
Numerous sports datasets contributed by the community. Quality varies. Notable datasets include historical NFL game results with spreads, NBA shot logs, MLB Statcast data, and various soccer datasets. → Appendix D: Data Sources Directory
KC -3
smallest uncertainty relative to edge, most efficient market for the model's strength; (2) **BUF ML** -- highest raw edge, though higher uncertainty; (3) **Over 47.5** -- lowest edge-to-uncertainty ratio. All bets should be sized conservatively at quarter-Kelly or less given the MARGINAL ratings. Th → Chapter 13 Quiz: Value Betting Theory and Practice
Kelly Criterion Derivation:
The Kelly criterion maximizes the expected logarithm of wealth, which is equivalent to maximizing the long-run geometric growth rate - The formula $f^* = (pb - q)/b$ emerges naturally from calculus optimization of the growth function - Logarithmic utility provides ruin avoidance, asymptotic optimali → Chapter 14: Advanced Bankroll and Staking Strategies
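A minimal sketch of the formula $f^* = (pb - q)/b$, with a fractional-Kelly multiplier added for illustration. The inputs below are illustrative, and the floor at zero (no bet when the edge is negative) is a common convention assumed here rather than part of the formula itself.

```python
def kelly_fraction(p, b):
    """Full-Kelly stake as a fraction of bankroll.

    p: estimated win probability; b: net decimal payout per unit staked.
    """
    q = 1 - p
    return max((p * b - q) / b, 0.0)  # assumed convention: never stake on a negative edge


def fractional_kelly(p, b, multiplier=0.25):
    """Scale the full-Kelly fraction down, e.g. quarter-Kelly."""
    return multiplier * kelly_fraction(p, b)


full = kelly_fraction(0.5405, 0.926)
print(round(full, 4), round(fractional_kelly(0.5405, 0.926), 4))
```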
Key areas:
Backend services (bet processing, trading platforms, settlement engines) - Frontend and mobile development (iOS, Android, web applications) - Data engineering (ETL pipelines, data warehousing, real-time streaming) - Infrastructure and DevOps (cloud infrastructure, CI/CD, monitoring) - Machine learni → Chapter 39: The Sports Betting Industry
Key MCMC concepts for practitioners:
**Warmup/burn-in:** The chain needs time to "find" the high-probability region of the posterior. Discard early samples. - **Chains:** Run multiple independent chains and check that they agree (convergence). - **Effective sample size:** Due to autocorrelation, 4,000 MCMC samples may contain only 1,00 → Chapter 10: Bayesian Thinking for Bettors
Key principles:
Use rolling windows rather than season-long averages to capture current form - Adjust for opponent quality (strength of schedule) - Include contextual features (home/away, rest days, travel distance) - Engineer interaction features where domain knowledge suggests them - Avoid look-ahead bias: featur → Chapter 41: Putting It All Together
Key properties of SHAP:
**Local accuracy:** SHAP values sum to the actual prediction. - **Consistency:** If a feature's contribution increases, its SHAP value never decreases. - **Missingness:** Features not in the model get zero SHAP value. - **Additivity:** For ensemble models, SHAP values of the ensemble equal the sum o → Chapter 27: Advanced Regression and Classification
Key questions:
Can features and model architectures developed for NBA prediction transfer to international basketball leagues? - Can in-play models trained on one sport (e.g., tennis) inform in-play modeling for another (e.g., soccer)? - How should models handle structural breaks --- rule changes, expansion, or pa → Chapter 42: Research Frontiers
Key responsibilities:
Setting opening lines and managing line movement throughout the betting cycle - Monitoring competitor pricing and market consensus - Managing liability and exposure within risk parameters - Identifying and responding to sharp betting action - Making rapid decisions during live events → Chapter 39: The Sports Betting Industry
Key sub-problems:
How do sportsbooks' detection algorithms work, and what betting patterns trigger limiting? - What is the optimal strategy for distributing bets across accounts to maximize total volume while minimizing the probability of being limited? - How should the bettor trade off between exploiting a large edg → Chapter 42: Research Frontiers
Key Takeaways
Cognitive biases are not signs of weakness but features of human cognition that require active countermeasures in a betting context. - Generate your own projections before viewing market lines to prevent anchoring. - Actively seek disconfirming evidence for every bet you consider (the "red team" app → Chapter 36: The Psychology of Betting
Key Takeaways:
The Poisson distribution is a natural model for soccer goals because goals are rare, discrete events occurring at a roughly constant rate. - A log-linear Poisson regression extracts attack, defense, and home advantage parameters from historical data. - The independent Poisson model generates a compl → Case Study: Poisson Modeling of Soccer — Predicting Match Outcomes
L
League-Level Analysis:
Correlation matrix of all statistics vs. win percentage - Ranking of teams by consistency (CV of point differential) - League-wide scoring distribution with normality tests → Chapter 6 Exercises: Descriptive Statistics for Sports
Learning priorities:
Stay current on advances in machine learning and statistical methodology - Follow academic research in sports analytics and prediction markets - Monitor industry developments (new operators, regulatory changes, product innovations) - Track changes in the sports themselves (rule changes, strategic ev → Chapter 41: Putting It All Together
Legal analysis:
Operating accounts in another person's name (known as "beard" or "runner" accounts) violates the terms of service of every legitimate sportsbook. - It may constitute fraud, as the friend would be providing false identity information for KYC purposes. - In regulated US markets, this may violate state → Chapter 38 Quiz: Risk Management and Responsible Gambling
Limitations of built-in importance:
**Biased toward high-cardinality features.** Features with many unique values have more potential split points and are used more often, even if they are not more predictive. - **Does not account for feature interactions.** A feature might be unimportant on its own but highly important in combination → Chapter 27: Advanced Regression and Classification
Limitations:
Assumes the future resembles the past, which is not always true. - Sensitive to how you define the reference class (which seasons? Which teams? Which conditions?). - Does not account for game-specific information (injuries, weather, motivation). → Chapter 2: Probability and Odds
Line Shopping Impact:
Even small improvements in odds (3-5 cents) compound dramatically over thousands of bets - The difference between shopping and not shopping can double or triple a bettor's annual ROI - Key numbers in football and basketball make half-point improvements disproportionately valuable → Chapter 12: Line Shopping and Market Analysis
Current odds, active arbitrage opportunities, active value opportunities, pending bets. 2. **Arbitrage History** -- All detected arbitrage opportunities with timestamps, sizes, and execution status. 3. **Value History** -- All detected value opportunities with edge, EV, and outcome. 4. **Portfolio P → Capstone Project 2: Multi-Sport Arbitrage and Value Detection Platform
Every edge has a lifecycle: discovery, exploitation, correction, adaptation - Detect edge decay through rolling window analysis of ROI and CLV - Stay ahead through diversification, model updating, information monitoring, and process improvement - Quarterly reviews provide structured opportunities to → Chapter 13: Value Betting Theory and Practice
Market Patterns:
Key numbers (3, 7) create structural features in the NFL margin distribution that affect teaser strategy and spread pricing. - Wong teasers crossing both 3 and 7 have historically been profitable at standard -110 pricing. - Divisional games show reduced home-field advantage and more competitive outc → Chapter 15: Modeling the NFL
Difference in key statistics (offensive rating A minus defensive rating B) - Historical head-to-head record - Style matchup indicators (e.g., run-heavy offense vs. weak run defense) → Chapter 27: Advanced Regression and Classification
the tendency for extreme observations to be followed by less extreme ones --- is one of the most important statistical phenomena in sports. It was first identified by Sir Francis Galton in the 1880s, who noticed that unusually tall parents tended to have children who were tall but closer to the popu → Chapter 23: Time Series Analysis for Betting
Real-time integrity monitoring of micro-betting patterns by operators and independent integrity bodies - Limitations on which micro-betting markets are offered (excluding easily manipulable events) - Collaboration between operators, leagues, and law enforcement - Use of AI-based anomaly detection sy → Chapter 40: The Future of Sports Betting
Current Brier score (7-day rolling) - Calibration plot (predicted vs. actual probability) - Feature importance drift over time - Prediction volume trend → Chapter 31: The Complete ML Betting Pipeline
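For reference, the Brier score on such a dashboard is the mean squared difference between predicted probabilities and 0/1 outcomes. A minimal sketch follows; the function name is illustrative and the 7-day windowing is left out.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)


# Hypothetical predictions and results
print(brier_score([0.7, 0.4], [1, 0]))
```

A model that always predicts 0.5 scores 0.25, which makes a convenient baseline when reading the rolling value.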
Model training leakage
training the prediction model on all 5 seasons of data, then "backtesting" on games from those same seasons. The model has already seen the outcomes it is predicting, so the backtest results are unrealistically good. Walk-forward validation prevents this by retraining at each step using only past da → Chapter 30 Quiz: Model Evaluation and Selection
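The walk-forward idea can be sketched as a generator of (training seasons, test season) pairs, where each test season is predicted using only earlier seasons. The function name and the `min_train` parameter are illustrative assumptions.

```python
def walk_forward_splits(seasons, min_train=1):
    """Yield (train_seasons, test_season) pairs using only past data for training."""
    ordered = sorted(seasons)
    for i in range(min_train, len(ordered)):
        # Everything before index i is the past; season i is the unseen test set.
        yield ordered[:i], ordered[i]


for train, test in walk_forward_splits([1, 2, 3, 4, 5]):
    print(f"train on {train}, test on {test}")
```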
Opponent-adjusted efficiency ratings form the basis of spread prediction. An iterative adjustment process accounts for strength of schedule. - Ridge regression with time-series cross-validation provides a principled framework for combining multiple efficiency features into spread and totals predicti → Chapter 15: Modeling the NFL
Multi-Account and Multi-Sport Allocation:
Bankroll allocation across sportsbooks should consider juice efficiency, limits, restriction risk, and promotional value - Seasonal allocation shifts as sports calendars change throughout the year - Regular (weekly) rebalancing maintains optimal allocation as account balances drift - October-Novembe → Chapter 14: Advanced Bankroll and Staking Strategies
N
natural experiments
situations where a treatment is assigned in a quasi-random manner, allowing causal inference without a formal experiment. → Chapter 42: Research Frontiers
NBA API (nba.com/stats)
Accessed via the `nba_api` Python package. Player tracking data, shot charts, lineup combinations, hustle stats, and advanced box scores. - Limitations: Rate-limited. Headers must include a valid referer. → Appendix D: Data Sources Directory
NBA examples:
Charlotte Bobcats -> Charlotte Hornets (2014) - New Jersey Nets -> Brooklyn Nets (2012) - Seattle SuperSonics -> Oklahoma City Thunder (2008) → Chapter 5: Data Literacy for Bettors
Washington: Redskins (through 2019) -> Football Team (2020-2021) -> Commanders (2022+) - Oakland Raiders -> Las Vegas Raiders (2020) - San Diego Chargers -> Los Angeles Chargers (2017) - St. Louis Rams -> Los Angeles Rams (2016) → Chapter 5: Data Literacy for Bettors
nflverse (github.com/nflverse)
Community-maintained R and Python packages for NFL data. Includes play-by-play data from nflfastR, roster information, next-gen stats, and draft picks. The `nfl_data_py` Python package provides easy access. - Coverage: NFL play-by-play from 1999, with EPA and WPA calculations. → Appendix D: Data Sources Directory
O
Odds formats
American, decimal, fractional, Hong Kong, and Malay. Each format encodes the same information in a different way, and a professional bettor must be fluent in all of them. → Chapter 2: Probability and Odds
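To illustrate that the formats carry the same information, the sketch below converts a decimal price into the other four, using the standard definitions of each format. The function name and output layout are illustrative.

```python
from fractions import Fraction


def convert_decimal_odds(decimal_odds):
    """Express one decimal price in American, fractional, Hong Kong, and Malay formats."""
    if decimal_odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    profit = decimal_odds - 1  # net profit per unit staked (this is the Hong Kong price)
    american = round(profit * 100) if profit >= 1 else round(-100 / profit)
    # Malay: positive for prices at or below evens, negative (stake needed to win 1) above evens.
    malay = profit if profit <= 1 else -1 / profit
    frac = Fraction(str(decimal_odds)) - 1
    return {
        "decimal": decimal_odds,
        "american": american,
        "fractional": f"{frac.numerator}/{frac.denominator}",
        "hong_kong": profit,
        "malay": malay,
        "implied_probability": 1 / decimal_odds,
    }


print(convert_decimal_odds(2.5))
print(convert_decimal_odds(1.5))
```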
Odds-Portal (oddsportal.com)
Historical opening and closing odds from dozens of bookmakers. Covers all major sports globally. Free access for manual use; scraping is against terms of service. - Sports: Soccer, basketball, hockey, tennis, American football, baseball, and more. → Appendix D: Data Sources Directory
OddsJam (oddsjam.com)
Real-time odds comparison across US sportsbooks. Positive EV finder and arbitrage scanner. Subscription required ($99+/month). → Appendix D: Data Sources Directory
Offensive Efficiency Features:
`home_off_epa`: Home team's offensive Expected Points Added per play (season-to-date rolling average, excluding the current game). EPA measures the value of each play relative to league-average expectations, making it a pace-neutral efficiency metric. - `away_off_epa`: Away team's offensive EPA per → Case Study: Building an NFL Game Totals Model with Regression
Ongoing obligations:
Regular reporting of financial and operational metrics - Payment of licensing fees and taxes - Maintenance of internal controls - Submission to regulatory audits and investigations → Chapter 39: The Sports Betting Industry
Open questions:
How efficient are specific sub-markets (player props, live betting, micro-betting)? Most efficiency research focuses on game-level spreads and totals. The efficiency of newer, higher-margin markets is less understood. - How quickly do betting markets incorporate new information? There is evidence th → Chapter 42: Research Frontiers
Operate across all major betting market types
pregame, live, props, and futures --- with a unified analytical framework that adapts the same core principles to each market's unique characteristics. → Part VII: Live Betting and Advanced Markets
Operator obligations:
Staff training on identifying problem gambling behaviors - Customer interaction requirements when harmful behavior is detected - Marketing restrictions (no targeting of self-excluded individuals, restrictions on bonus offers to at-risk customers) - Contributions to responsible gambling research and → Chapter 39: The Sports Betting Industry
Opportunities:
High volume means more opportunities to find edge, even if each individual edge is small - The rapid feedback loop (bet, outcome, repeat) is conducive to machine learning approaches - Less mature markets may contain larger inefficiencies than established game-level markets - Latency advantages (fast → Chapter 40: The Future of Sports Betting
`home_plays_per_game`: Home team's average offensive plays per game, a proxy for pace. - `away_plays_per_game`: Away team's average offensive plays per game. - `combined_pace`: Sum of both teams' plays per game, capturing the expected overall tempo. → Case Study: Building an NFL Game Totals Model with Regression
patch updates
developer-released modifications to game balance, maps, and mechanics --- that can fundamentally alter the competitive dynamics. This creates a unique modeling challenge: historical data may become less relevant after a significant patch. → Chapter 22: Modeling Emerging Markets
Patterns to look for:
Correlation between emotional state and bet sizing deviations. - Degradation of process compliance during losing streaks. - Systematic differences between high-confidence and low-confidence bet outcomes (calibration check). - Times of day or specific sports where process compliance is weakest. - Pro → Chapter 36 Quiz: The Psychology of Betting
Per-Bet Entry (complete for each bet placed):
Standard bet log fields (sport, market, odds, stake, model probability) - Time spent on analysis (minutes) - Checklist compliance (items completed out of total) - Override (Y/N and reason if yes) - Emotional state at time of bet (1-5) → Case Study 2: The Tilt Diary --- A Season of Emotional Tracking
The sample size problem is severe: confirming a 3% edge requires 4,000+ bets - CLV provides a faster signal of edge than raw profit/loss - Confidence intervals on ROI should be calculated and monitored over time - Regression to the mean means early results are unreliable -- trust the process → Chapter 13: Value Betting Theory and Practice
Test provisional hypotheses on held-out data (different seasons, different leagues, or prospective tracking). - Apply pre-registered, one-shot hypothesis tests. - Require significance at $\alpha = 0.01$ or stricter. → Chapter 8: Hypothesis Testing and Statistical Significance
Phase 3: Economic Validation
Even statistically significant angles may not be profitable after accounting for vig, line movement, and execution costs. - Simulate actual betting (with realistic odds, bet timing, and bankroll constraints). - Require positive expected value after all costs. → Chapter 8: Hypothesis Testing and Statistical Significance
Phase 4: Monitored Deployment
Begin betting the angle with small stakes. - Track results prospectively. - Compare ongoing results to pre-deployment expectations. - Establish clear criteria for abandoning the angle if results deteriorate. → Chapter 8: Hypothesis Testing and Statistical Significance
Pinnacle API
Pinnacle offers a free API for current odds on all markets; access requires a funded Pinnacle account. Its lines are widely regarded as the sharpest in the market. - Documentation: `pinnacle.com/en/api` → Appendix D: Data Sources Directory
Platoon effects
the handedness advantage of batters against opposite-handed pitchers --- are robust, persistent, and quantifiable. They should be incorporated into every lineup-level projection. → Chapter 17: Modeling MLB
Platt scaling:
Advantage: Requires very few parameters (just a slope and intercept), making it robust with small calibration sets (as few as 200-300 observations). - Disadvantage: Assumes the miscalibration is a monotonic sigmoid transformation, which may not capture more complex calibration errors (e.g., a model → Chapter 30 Quiz: Model Evaluation and Selection
Player Impact:
Quarterback value dominates all other positional values. Elite QBs are worth 8-12 points over replacement per game. - Injury adjustments should be probabilistic (accounting for questionable/doubtful designations) and position-specific. - The market generally prices high-profile QB injuries efficient → Chapter 15: Modeling the NFL
Player protection tools:
Deposit limits (daily, weekly, monthly) - Loss limits and wager limits - Session time limits and reality checks - Self-exclusion options (operator-level and jurisdiction-wide) - Cool-off periods and account closure options → Chapter 39: The Sports Betting Industry
Portfolio Theory:
Markowitz mean-variance optimization adapts naturally to bet portfolios - Correlated bets reduce the diversification benefit; account for covariance in allocation - Optimal portfolio allocation considers edge, correlation, market efficiency, liquidity, and timing - Diversification across 5-15 simult → Chapter 14: Advanced Bankroll and Staking Strategies
position
accepting an unbalanced book when they believe one side is more likely than the other. Books with sophisticated models may intentionally accept more risk on the side they believe is less likely to win, effectively acting as bettors themselves. This is a significant evolution from the traditional "ba → Chapter 2: Probability and Odds
Position A: Trader (NFL)
Base salary: $95,000 - Annual bonus target: 15% of base - Career progression: Senior Trader in 3 years, Head of Trading in 6--8 years - Required skills: Deep NFL knowledge, statistical literacy, fast decision-making under pressure → Chapter 39 Quiz: The Sports Betting Industry
Position B: Data Scientist (Pricing Models)
Base salary: $130,000 - Annual bonus target: 10% of base - Career progression: Senior DS in 2--3 years, Lead in 4--5 years - Required skills: Python, ML frameworks, statistics, SQL, some domain knowledge → Chapter 39 Quiz: The Sports Betting Industry
Positive correlations:
Team moneyline and team spread (same team) - Team moneyline and over (winning team contributes to total) - Player props within the same game (game environment affects all) - Weather-sensitive bets (wind, rain affect multiple outcomes) → Chapter 14: Advanced Bankroll and Staking Strategies
Practical probability estimation
using historical frequencies, power ratings, and calibration analysis to develop and validate your own probability estimates. → Chapter 2: Probability and Odds
Practical risks:
If discovered, both parties could face account closure, forfeiture of funds (including unrealized profits), and potential legal consequences. - The friend assumes liability for tax reporting on income they did not earn, creating complications for both parties. → Chapter 38 Quiz: Risk Management and Responsible Gambling
Practical Tools:
We built a complete Python system for collecting, storing, analyzing, and alerting on odds across multiple sportsbooks - The system uses The Odds API for data collection, SQLite for storage, and configurable alerting - This infrastructure can be extended with dashboards, mobile alerts, and bankroll → Chapter 12: Line Shopping and Market Analysis
Prevention (before tilt occurs):
Pre-session emotional self-assessment (mood, stress, sleep) - Go/reduce/no-go decision based on composite readiness score - Session length limits and scheduled breaks → Chapter 36 Key Takeaways: The Psychology of Betting
Prevention:
Override policy changed from "information-based overrides allowed with documentation" to "zero overrides for 90 days; then information-based only with pre-approval from accountability partner." - The discipline enforcer was modified to make loss limits truly impossible to override --- the system now refuses → Case Study 2: The Discipline Breakdown --- A Bettor's Worst Month
probability calibration
the art and science of ensuring your model's probabilities are trustworthy. We then address the challenge of **imbalanced outcomes** (predicting rare events like upsets or blowouts) and conclude with **model interpretability**, using SHAP values and related tools to understand not just what your mod → Chapter 27: Advanced Regression and Classification
Probability fundamentals
the axioms, addition and multiplication rules, conditional probability, and independence. These are not abstract concepts but practical tools you will use every time you evaluate a bet. → Chapter 2: Probability and Odds
Process review:
Were all bets placed according to your documented process? - Were there any deviations? If so, what triggered them? - Were stopping rules activated? Did you follow them? - Were there bets you should have made but did not (missed opportunities)? → Chapter 37: Discipline, Systems, and Record-Keeping
Q
Quality checks at this stage:
Are all expected data sources available and up to date? - Are there missing values, duplicates, or obvious errors? - Do team/player identifiers match across sources? - Are historical and current data on the same scale and definition? → Chapter 41: Putting It All Together
Quantitative review:
Total bets placed - Win/loss record - Total profit/loss - ROI for the week - Average stake size (and whether it deviated from your plan) - CLV summary (average CLV captured, percentage of bets with positive CLV) → Chapter 37: Discipline, Systems, and Record-Keeping
R
r = 0.98
higher, as expected, since win rate mechanically determines ROI for fixed-odds bets. However, win rate is not a skill metric; it is an outcome metric. The relevant comparison is between CLV as a leading indicator (available before results are known) and profitability as the lagging indicator. → Case Study: Closing Line Value Across 5,000 Bets --- Separating Skill from Luck
reach
the distance from fingertip to fingertip with arms extended. A fighter with a significant reach advantage can strike from a distance where their opponent cannot effectively counter, control range, and use jabs and front kicks to keep shorter-armed opponents at bay. → Chapter 21: Modeling Combat Sports and Tennis
Read any betting market
point spreads, totals, moneylines, props, futures --- and immediately understand the implied probability, the embedded vig, and the no-vig fair price. → Part I: Foundations of Sports Betting
Open accounts under own name at additional legitimate sportsbooks - Use betting exchanges (Betfair, etc.) where peer-to-peer matching reduces restriction risk - Diversify across jurisdictions where legally permitted - Accept account limitations as a cost of doing business → Chapter 38 Quiz: Risk Management and Responsible Gambling
Recovery (after a tilt episode):
Full session stop; no more bets until next day - Written post-mortem: trigger, emotional progression, decisions made, financial impact - Review and reinforce the pre-bet checklist before next session → Chapter 36 Key Takeaways: The Psychology of Betting
abrupt shifts in team performance caused by trades, injuries, coaching changes, or lineup adjustments. This case study builds an LSTM-based model that processes a team's game-by-game performance sequence and learns to detect when a team has entered a new performance regime. Using three NBA seasons o → Case Study 2: LSTM Models for Detecting NBA Team Performance Regime Changes
Regularization strategy:
Dropout: 0.3 between hidden layers (moderately aggressive for this dataset size) - Weight decay: 1e-4 (standard for Adam) - Batch normalization: After each linear layer, before dropout - Early stopping: Patience of 15 epochs, monitoring validation loss → Chapter 29 Quiz: Neural Networks for Sports Prediction
Regulatory enforcement:
Regular compliance audits - Significant fines for violations (UK operators have faced fines exceeding $20 million for responsible gambling failures) - License suspension or revocation for serious or repeated violations → Chapter 39: The Sports Betting Industry
Requirements:
Accept betting records as input (wins, losses, pushes, odds for each bet) - Implement a z-test for proportions (one-sided and two-sided) - Implement a t-test for profit/loss per bet (one-sided and two-sided) - Calculate exact binomial test p-values - Generate confidence intervals (both Wald and Wils → Chapter 8 Exercises: Hypothesis Testing and Statistical Significance
residuals
the difference between the actual outcomes and the current predictions. 3. Fit a decision tree to these residuals. 4. Add this tree to the ensemble (scaled by a learning rate). 5. Repeat steps 2--4 for a specified number of iterations. → Chapter 27: Advanced Regression and Classification
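The residual-fitting loop can be sketched in plain Python for a single feature, using depth-1 trees (stumps) as the weak learner. This is a toy illustration under stated assumptions, not a production implementation: the initial prediction is assumed to be the mean of the targets, the data are hypothetical, and all names are illustrative.

```python
def fit_stump(x, residuals):
    """Depth-1 regression tree on one feature: choose the split that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # dropping the max keeps the right side non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: repeatedly fit trees to the residuals."""
    base = sum(y) / len(y)  # assumed starting prediction: the target mean
    preds = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]   # actual minus current prediction
        tree = fit_stump(x, residuals)                      # fit a tree to the residuals
        trees.append(tree)                                  # add it to the ensemble
        preds = [pi + learning_rate * tree(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(t(xi) for t in trees)


x = [1, 2, 3, 4, 5, 6]                 # hypothetical feature values
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]     # hypothetical targets
model = boost(x, y)
print(round(model(1), 2), round(model(6), 2))
```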
Response:
New mandatory cooling-off protocol: any week with more than one discipline flag triggers a 48-hour betting pause, followed by a formal review before resumption. - Accountability partner notified automatically of any hard stop overrides (which should now be impossible) and of any discipline drift ale → Case Study 2: The Discipline Breakdown --- A Bettor's Worst Month
Rest and Schedule Features:
`home_rest_days`: Days since the home team's last game. Values of 0 indicate a back-to-back. - `away_rest_days`: Days since the away team's last game. - `home_b2b`: Binary indicator --- 1 if the home team is on the second game of a back-to-back. - `away_b2b`: Binary indicator for the away team. - `r → Case Study: Logistic Regression for NBA Moneyline Predictions
Rest and Travel:
The back-to-back effect is one of the most well-documented patterns in NBA betting, worth approximately 2.5 to 4.0 points depending on the opponent's rest situation and travel involved. - The market has improved at pricing B2B effects but still does not fully account for cumulative schedule fatigue → Chapter 16: Modeling the NBA
Retrosheet (retrosheet.org)
Free play-by-play data for every MLB game from 1921 to present. Event files can be parsed with the Chadwick tools. Essential for baseball simulation and historical analysis. → Appendix D: Data Sources Directory
Risk Assessment:
Operators must conduct and maintain a documented AML risk assessment - Risk-based approach to due diligence (higher-risk customers receive more scrutiny) - Regular review and updating of AML policies and procedures → Chapter 39: The Sports Betting Industry
Junior Trader: $55,000--$80,000 - Trader: $75,000--$120,000 - Senior Trader: $110,000--$170,000 - Head of Trading: $150,000--$250,000+ → Chapter 39: The Sports Betting Industry
Scoring Guide:
90-100: Excellent mastery of bankroll management concepts - 80-89: Strong understanding with minor gaps - 70-79: Adequate understanding; review weak areas - 60-69: Significant gaps; revisit chapter material - Below 60: Comprehensive review recommended → Chapter 4 Quiz: Bankroll Management Fundamentals
seasonal effects
systematic deviations in performance or betting market behavior that recur at regular intervals. From a time series perspective, a seasonal effect is a predictable component of the series that repeats with a known period. → Chapter 23: Time Series Analysis for Betting
Second Spectrum / Hawk-Eye
Optical tracking data for NBA and Premier League. Player and ball position at 25 fps. Powers advanced spatial analytics but is extremely expensive and typically limited to teams and media partners. → Appendix D: Data Sources Directory
Detailed description of your betting approach - Models used and their theoretical basis - Staking methodology with justification - How your strategy evolved from pre-season to final form - Reference specific textbook concepts and chapters → Capstone Project 3: Season-Long Betting Simulation Challenge
Section 3: Performance Analysis (3--4 pages)
All metrics from the table above, with visualizations - Cumulative P&L chart with drawdown overlay - Calibration plot (predicted probability vs. actual win rate, 10 bins minimum) - Performance breakdown by sport, market type, month, and confidence level - CLV distribution plot - Comparison of actual → Capstone Project 3: Season-Long Betting Simulation Challenge
Section 4: Skill vs. Variance Analysis (2 pages)
Was your result (good or bad) primarily skill or variance? - Use the binomial test or bootstrap confidence interval (Chapter 8, Chapter 24) to assess whether your win rate is significantly different from breakeven (52.4% at -110) - Compare your actual ROI to the distribution of ROIs a random bettor → Capstone Project 3: Season-Long Betting Simulation Challenge
Section 5: Behavioral and Process Review (2 pages)
Honest assessment of your decision-making discipline - Specific examples of good process (even with bad outcomes) - Specific examples of bad process (even with good outcomes) - How did drawdowns affect your behavior? Did you deviate from strategy? - What cognitive biases (Chapter 36) did you observe → Capstone Project 3: Season-Long Betting Simulation Challenge
Section 6: Lessons Learned (1--2 pages)
What are the three most important things you learned? - How has this simulation changed your understanding of sports betting? - What surprised you most? - What specific textbook concepts proved most valuable in practice? - What concepts seemed important in theory but were hard to apply in practice? → Capstone Project 3: Season-Long Betting Simulation Challenge
Section A: Overall Philosophy (1 page)
What is your betting philosophy? (e.g., model-driven, market-driven, hybrid) - Which sports and markets will you prioritize and why? - What is your edge hypothesis? Where do you believe inefficiencies exist? - How will you balance expected value against variance? - Reference the value betting framew → Capstone Project 3: Season-Long Betting Simulation Challenge
What staking method will you use? (Flat bet, percentage, Kelly, fractional Kelly) - Justify your choice using the analysis from Chapter 4 and Chapter 14. - What is your standard unit size? - What are your maximum bet size rules? - Under what conditions will you adjust your unit size? - What are your → Capstone Project 3: Season-Long Betting Simulation Challenge
Section B: Model Assessment (1 page)
Is your model performing as expected? - Calibration analysis: are your probability estimates accurate? (Chapter 30) - Which sports/markets have been profitable and which have not? - Feature importance or model diagnostics - Any evidence of model degradation or changing market conditions? → Capstone Project 3: Season-Long Betting Simulation Challenge
Section C: Model Description (1--2 pages)
What predictive model(s) will you use? - What data inputs drive your model? - How do you estimate probabilities? - How do you identify value relative to the market? - What are your minimum edge thresholds for placing a bet? - What is your process for each sport you plan to bet? - Reference specific → Capstone Project 3: Season-Long Betting Simulation Challenge
Section C: Strategy Adjustments (1 page)
Based on the data, what will you change for the second half? - Adjustments to staking (increase/decrease unit size, change Kelly fraction) - Adjustments to model (new features, recalibration, different thresholds) - Sports or markets you will add, drop, or reallocate capital toward - Justify every c → Capstone Project 3: Season-Long Betting Simulation Challenge
Section D: Process and Discipline (0.5 page)
Describe your weekly workflow: how will you analyze games, select bets, and record decisions? - What safeguards do you have against tilt and emotional betting (Chapter 36)? - How will you track your performance (Chapter 37)? → Capstone Project 3: Season-Long Betting Simulation Challenge
sensitivity analysis
understanding how the optimal solution changes when parameters vary. In betting, our probability estimates are uncertain, so we want to know: → Chapter 25: Optimization Methods for Betting
Series
a one-dimensional array with labels. When you load NFL play-by-play data, each row is a play and each column is an attribute of that play (down, distance, yard line, passer, receiver, expected points added, and so on). → Chapter 5: Data Literacy for Bettors
Set and enforce responsible gambling limits
loss thresholds, session constraints, and cooling-off triggers --- backed by mathematical analysis of their impact on long-term outcomes. → Part VIII: Psychology and Discipline
Driving performance, tee shots on par 4s and par 5s 2. **SG: Approach (SG:APP)** --- Approach shots into the green 3. **SG: Around the Green (SG:ARG)** --- Chips, pitches, and bunker shots 4. **SG: Putting (SG:PUTT)** --- Performance on the putting surface → Chapter 22: Modeling Emerging Markets
simulation
the art and science of using random number generation to explore the behavior of complex systems. Monte Carlo methods, named after the famous casino in Monaco, use repeated random sampling to obtain numerical results that would be intractable to compute analytically. Where Chapter 23 built models fr → Chapter 24: Simulation and Monte Carlo Methods
Situational bettor (uses simple rules):
Rules: bet pass on 3rd & long, bet run on goal line, avoid 1st down bets - Accuracy improvement: ~3% above baseline - At 15% margin, still loses approximately $4.20 per $50 bet - Over a game: ~$21 in expected losses on 5 selective bets → Case Study: Micro-Betting Pricing and Profitability Analysis
Situational Features:
`dome`: Binary indicator for games played in a dome or retractable-roof stadium. Dome games tend to score higher due to controlled conditions. - `wind_speed`: Wind speed in mph at kickoff. High wind suppresses passing and reduces scoring. - `temperature`: Temperature in Fahrenheit. Extreme cold can → Case Study: Building an NFL Game Totals Model with Regression
Skills required:
Deep knowledge of specific sports (most traders specialize in one or two sports) - Statistical literacy and comfort with quantitative models - Ability to process information quickly under time pressure - Understanding of market microstructure and price formation - Programming skills (Python, SQL) in → Chapter 39: The Sports Betting Industry
Sportradar (sportradar.com)
Industry-standard data provider. Official data partner of the NFL, NBA, NHL, MLB, and NASCAR. Real-time feeds, play-by-play, player props data, and proprietary advanced metrics. - Pricing: Enterprise-level; typically $10,000+ annually for research tiers. Developer trials available with limited call → Appendix D: Data Sources Directory
Sports Reference Family (sports-reference.com)
**Pro-Football-Reference.com** -- Comprehensive NFL statistics from 1920 to present. Game logs, player stats, advanced metrics (ANY/A, DVOA-like), draft data, and coaching records. CSV export available for most tables. - **Basketball-Reference.com** -- NBA, ABA, WNBA, and international data. Box sco → Appendix D: Data Sources Directory
Historical line movement data for NFL, NBA, MLB, NHL, and college sports. Shows opening and closing lines at major sportsbooks along with public betting percentages. - Free access with registration; premium tiers available. → Appendix D: Data Sources Directory
Stats Perform (statsperform.com)
Formerly Opta. Premier soccer data provider. Event-level data with 2,000+ events per match. Expected goals models, player ratings, and possession metrics. - Also covers basketball, American football, baseball, cricket, tennis. - Pricing: Enterprise. Academic partnerships available. → Appendix D: Data Sources Directory
Statsbomb (statsbomb.com)
High-quality soccer event data. Free tier covers select competitions (La Liga, Champions League finals, NWSL). Full product includes 360-degree freeze-frame data showing all player positions at each event. - Pricing: Tiered; academic access programs exist. → Appendix D: Data Sources Directory
Step 1 -- Data Ingestion:
Input: External APIs (schedule, odds, injury reports, box scores from prior games) - Process: API clients fetch data with retry logic and rate limiting. Raw data is validated against expected schemas and stored in the raw data store (database or files). - Output: Raw game data, current odds, updated → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 1: Determine your bankroll.
Identify the total amount you can allocate to betting without impacting your financial stability. - Express this in units (recommend 1 unit = 1% of bankroll to start). → Chapter 4: Bankroll Management Fundamentals
Input: Raw data from the data store, feature transformer definitions - Process: Feature transformers compute rolling averages, Elo ratings, rest days, and other features for all 30 NBA teams using only data prior to today. Features are stored in the feature store with version metadata. - Output: Fea → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 2: Assess your edge.
If you have a documented, backtested edge with a history of at least 200 bets: proceed to Kelly-based approaches. - If you are new, testing a model, or do not have a quantified edge: use flat betting at 1% of bankroll. → Chapter 4: Bankroll Management Fundamentals
Input: Feature vectors for all 8 games, active model from model registry - Process: The prediction service loads the active model, retrieves features for each game, runs inference, and outputs calibrated probabilities. Each prediction is logged with a unique ID and feature snapshot. - Output: Home-w → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 3: Choose your Kelly fraction.
New model or uncertain edge: Quarter Kelly (0.25) - Moderately confident, some track record: Half Kelly (0.50) - Highly confident, long track record, well-calibrated probabilities: Three-quarter Kelly (0.75) - Never: Full Kelly (unless you are conducting academic research) → Chapter 4: Bankroll Management Fundamentals
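As a rough sketch of how a chosen Kelly fraction turns into a stake, the snippet below applies the standard Kelly formula f* = (p*b - q)/b and scales it by a multiplier such as 0.25 or 0.50. The function names and the example bankroll are hypothetical, and clamping negative edges to a zero stake is an assumption of this sketch.

```python
def kelly_fraction(p, b):
    """Full-Kelly fraction for win probability p and net odds b (profit per unit staked)."""
    q = 1.0 - p
    return (p * b - q) / b


def fractional_kelly_stake(bankroll, p, b, multiplier=0.25):
    """Stake in currency units at a fraction of full Kelly; never negative."""
    f_star = kelly_fraction(p, b)
    return max(0.0, multiplier * f_star) * bankroll


# Example: estimated 55% win probability at -110 (net odds 100/110).
quarter_stake = fractional_kelly_stake(10_000, 0.55, 100 / 110, multiplier=0.25)
half_stake = fractional_kelly_stake(10_000, 0.55, 100 / 110, multiplier=0.50)
```

Because the multiplier scales the stake linearly, moving from quarter to half Kelly doubles the amount risked on every bet, which is why the more aggressive fractions are reserved for well-calibrated models.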
Step 3: Manage the book.
If one side has significantly more liability, consider shading the line (adjusting odds slightly to attract the other side). - If the book has a strong opinion that differs from the action, hold the line and accept the imbalanced position. → Chapter 11: Understanding Betting Markets
Step 4 -- Edge Computation and Bet Decision:
Input: Predicted probabilities, current market odds from multiple sportsbooks - Process: For each game and market, compute the edge (model probability minus implied probability). Compare edge to the minimum threshold. For games with sufficient edge (3 out of 8), compute Kelly criterion bet sizes. - → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 4: Calculate risk of ruin.
Use the Monte Carlo simulator from Section 4.5 to estimate your risk of ruin under your chosen strategy. - If the risk of ruin exceeds your tolerance (see the framework in Section 4.5.5), reduce your Kelly fraction or increase your bankroll. → Chapter 4: Bankroll Management Fundamentals
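For readers without the Section 4.5 simulator at hand, a stripped-down stand-in might look like the following. It is not the book's simulator: it assumes every bet has the same win probability and odds, stakes a fixed fraction of the current bankroll, and defines "ruin" as the bankroll falling to a chosen share of its starting value (50% by default), since proportional staking never reaches exactly zero. All names and defaults are hypothetical.

```python
import random


def risk_of_ruin(p_win, net_odds, stake_fraction,
                 n_bets=500, n_sims=500, ruin_level=0.5, seed=0):
    """Estimate the share of simulated paths that touch ruin_level.

    The bankroll starts at 1.0; each bet stakes stake_fraction of the
    current bankroll and wins net_odds times the stake with probability p_win.
    """
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    ruined = 0
    for _ in range(n_sims):
        bankroll = 1.0
        for _ in range(n_bets):
            stake = stake_fraction * bankroll
            if rng.random() < p_win:
                bankroll += stake * net_odds
            else:
                bankroll -= stake
            if bankroll <= ruin_level:
                ruined += 1
                break
    return ruined / n_sims


# Compare a conservative and an aggressive stake for the same assumed edge.
conservative = risk_of_ruin(0.54, 100 / 110, stake_fraction=0.01)
aggressive = risk_of_ruin(0.54, 100 / 110, stake_fraction=0.10)
```

If the estimate for the chosen staking plan exceeds the tolerance, rerunning with a smaller `stake_fraction` shows how much the Kelly fraction would need to come down.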
Step 4: Closing line.
The closing line reflects the full information set: sharp action, recreational action, news, and the book's own assessment. - This is the book's final "best guess" at fair value. → Chapter 11: Understanding Betting Markets
Step 5 -- Risk Management:
Input: 3 bet recommendations, current portfolio state (today's existing bets, cumulative P&L) - Process: Check each recommendation against risk limits: single bet maximum, daily exposure limit, correlated exposure limits (e.g., not too much exposure to one team). Adjust sizes downward if any limit i → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 5: Set review points.
Review your unit size monthly or quarterly. - Increase unit size only after sustained growth (25--50% bankroll increase) with a sample of 200+ bets. - Decrease unit size immediately upon a 25% drawdown. - Track the performance of each confidence tier separately if using a tiered system. → Chapter 4: Bankroll Management Fundamentals
Step 6 -- Bet Execution:
Input: Approved bets with target sportsbook and odds - Process: The execution engine verifies that the current odds at the target sportsbook still match (or are better than) the odds used in the edge calculation. If odds have moved unfavorably beyond a tolerance, the bet is skipped. Otherwise, the b → Chapter 31 Quiz: The Complete ML Betting Pipeline
Step 7 -- Monitoring and Logging:
Input: Data from all prior steps - Process: All predictions, decisions, and executions are logged. Monitoring tracks latency, data freshness, model performance, and P&L. Alerts fire for anomalies. - Output: Audit trail, dashboards, alerts → Chapter 31 Quiz: The Complete ML Betting Pipeline
Strength of Schedule Adjustment:
`home_sos`: Average net rating of the home team's opponents over their last 15 games. A team with a 10-5 record against strong opponents is better than a 10-5 record against weak ones. - `away_sos`: Same for the away team. → Case Study: Logistic Regression for NBA Moneyline Predictions
Strengths:
Simple, transparent, and easy to compute. - Provides a solid baseline or "prior" probability. - Can be computed for very specific situations (e.g., "How often does a team favored by 3--6 points on the road cover the spread?"). → Chapter 2: Probability and Odds
Operators must file Suspicious Activity Reports (SARs) with financial intelligence units when they detect potentially illicit activity - Common triggers include: structured deposits designed to avoid reporting thresholds, rapid movement of funds through accounts with minimal betting activity, and us → Chapter 39: The Sports Betting Industry
pre-bet checklists, qualification criteria, and process documentation --- shift decision-making from emotional impulse to structured analysis. Like airline pilots using checklists, bettors use these tools not because they are incompetent but because they recognize that even experts make errors under → Chapter 37: Discipline, Systems, and Record-Keeping
Systematic Value Identification:
Value exists when your estimated true probability exceeds the implied probability from the odds - Edge thresholds should account for estimation uncertainty, market efficiency, and variance - Multi-factor scoring incorporates edge size, track record, market efficiency, liquidity, diversification, and → Chapter 13: Value Betting Theory and Practice
T
tabular
rows of games with columns of features --- a domain where tree-based models like XGBoost have traditionally dominated. The datasets are **small** by deep learning standards: a decade of NBA games yields roughly 12,000 observations, compared to the millions or billions of examples used to train image → Chapter 29: Neural Networks for Sports Prediction
Team pace ratings
adjusted possessions per 48 minutes, weighted by recency 2. **Rest differentials** --- the difference in days of rest between teams, with a nonlinear response (2+ days of rest advantage had diminishing returns) 3. **Altitude/travel adjustment** --- teams playing at altitude (Denver) or after cross-c → Case Study: The Decay and Renewal of a Betting Edge --- Adapting Through Three Market Regimes
Team Strength Features:
`home_net_rating`: Home team's net rating (offensive rating minus defensive rating), computed as a rolling average over the last 15 games. Net rating is the single most predictive team-level statistic in basketball analytics. - `away_net_rating`: Away team's net rating, computed identically. - `net_ → Case Study: Logistic Regression for NBA Moneyline Predictions
Team-level statistics (rolling):
Offensive/defensive efficiency (points per 100 possessions, yards per play) - Win/loss record over recent games (last 5, 10, 20) - Home/away splits - Rating system outputs (Elo, Glicko-2, Massey, PageRank from Chapter 26) → Chapter 27: Advanced Regression and Classification
Temporal Adjustments:
Early-season statistics (Weeks 1--4) are noisy. We apply Bayesian shrinkage, blending current-season averages with a prior based on the previous season's performance. The shrinkage weight decays exponentially as more games are played. → Case Study: Building an NFL Game Totals Model with Regression
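One way to picture the blend described above is a prior weight that starts at 1 and decays exponentially with games played. The sketch below is illustrative only: the function name and the `decay_rate` value are assumptions, not the case study's fitted parameters.

```python
import math


def shrunk_estimate(current_avg, prior_avg, games_played, decay_rate=0.35):
    """Blend the current-season average with a prior from last season.

    The prior's weight is exp(-decay_rate * games_played): 1.0 before any
    games are played, shrinking toward 0 as the sample grows.
    """
    prior_weight = math.exp(-decay_rate * games_played)
    return prior_weight * prior_avg + (1.0 - prior_weight) * current_avg


# A team that averaged 44 total points last season but 52 through two games.
week_2 = shrunk_estimate(current_avg=52.0, prior_avg=44.0, games_played=2)
week_10 = shrunk_estimate(current_avg=52.0, prior_avg=44.0, games_played=10)
```

With these assumed settings the Week 2 estimate stays close to the midpoint of the two inputs, while by Week 10 the current-season average dominates.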
Tennis
Point-by-point data, well-understood probability models, frequent scoring 2. **Soccer** -- Massive live handle in European markets, goal-based state changes create clear mispricing windows 3. **NFL Football** -- Rich game state, natural pauses allow careful analysis, but sophisticated books 4. **NBA → Chapter 33: Live and In-Play Betting
Testing:
Test with a simulated bettor who has a 54% true win rate over 500 bets - Test with a simulated bettor who has a 50% true win rate over 500 bets (should usually fail to reject) - Test with a small sample (30 bets) to observe wide confidence intervals → Chapter 8 Exercises: Hypothesis Testing and Statistical Significance
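The test scenarios above can be set up with a small simulator and a z-test for proportions. This is a sketch, not a reference solution to the exercise: it ignores pushes, uses the normal approximation, and takes a 50% win rate as the default null hypothesis (passing `p0=0.524` would test against the -110 breakeven instead). The function names are hypothetical.

```python
import math
import random


def z_test_proportion(wins, n, p0=0.5, two_sided=False):
    """z-test of an observed win rate against a null win rate p0.

    Returns (z, p_value). The one-sided alternative is 'win rate > p0'.
    """
    p_hat = wins / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z)
    if two_sided:
        return z, 2.0 * min(upper_tail, 1.0 - upper_tail)
    return z, upper_tail


def simulate_wins(true_win_rate, n_bets, seed=0):
    """Count wins for a simulated bettor with a fixed true win rate."""
    rng = random.Random(seed)
    return sum(rng.random() < true_win_rate for _ in range(n_bets))


# The three scenarios from the list above.
for label, rate, n_bets in [("54% bettor", 0.54, 500),
                            ("50% bettor", 0.50, 500),
                            ("small sample", 0.54, 30)]:
    wins = simulate_wins(rate, n_bets)
    z, p_value = z_test_proportion(wins, n_bets)
    print(f"{label}: {wins}/{n_bets} wins, z = {z:.2f}, one-sided p = {p_value:.3f}")
```

Changing `seed` shows how much the verdict for the same true win rate varies from one simulated season to the next.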
The daily feedback loop:
Place bets according to your process. - Record results the same day. - Brief emotional check-in: How do you feel? Were there deviations? Were stopping rules triggered? → Chapter 37: Discipline, Systems, and Record-Keeping
Dean Oliver's Four Factors (eFG%, TOV%, OREB%, FT Rate) explain over 90% of the variance in NBA team win percentage. - Shooting efficiency (eFG%) is by far the most important factor, followed by turnovers, offensive rebounding, and free throw rate. - Pace is a critical adjustment factor: per-possess → Chapter 16: Modeling the NBA
The monthly feedback loop:
Conduct the monthly review (Section 37.2.2). - Run the full performance dashboard. - Run the bias audit from Chapter 36. - Run the calibration analysis from Chapter 36. - Identify systemic patterns (profitable and unprofitable categories). - Decide whether to adjust model parameters, qualification c → Chapter 37: Discipline, Systems, and Record-Keeping
The Odds API (the-odds-api.com)
Real-time and pre-match odds from 40+ bookmakers via REST API. Covers major US and international sports. Free tier: 500 requests/month. Paid tiers from $20/month. - Endpoints: head-to-head, spreads, totals, player props. - Python: `requests.get('https://api.the-odds-api.com/v4/sports/americanfootbal → Appendix D: Data Sources Directory
The overround
the sportsbook's built-in profit margin. Understanding how much you are being "taxed" on each bet, and how that tax varies across books and markets, is essential for determining where to focus your betting activity. → Chapter 2: Probability and Odds
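To make the "tax" concrete, the sketch below converts American odds to implied probabilities and sums them; the amount by which the sum exceeds 1 is the overround. The function names are hypothetical.

```python
def american_to_implied(odds):
    """Implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def overround(american_odds):
    """Sum of implied probabilities across all outcomes, minus 1."""
    return sum(american_to_implied(o) for o in american_odds) - 1.0


# A standard two-way market priced at -110 on each side.
standard_margin = overround([-110, -110])
# A reduced-juice book pricing the same market at -105 on each side.
reduced_margin = overround([-105, -105])
```

Comparing the two values shows how the same market can carry a noticeably different margin at different books.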
The quarterly/seasonal feedback loop:
Review the entire season or quarter. - Evaluate whether your edge has grown, shrunk, or remained stable. - Conduct A/B tests on process changes. - Update your operating manual. - Set goals for the next quarter. → Chapter 37: Discipline, Systems, and Record-Keeping
The true probability
the actual likelihood of the event occurring, which no one knows with certainty. 2. **The implied probability** --- the probability embedded in the sportsbook's odds, which includes the overround. 3. **Your estimated probability** --- your best assessment of the true probability based on your analys → Chapter 2: Probability and Odds
PSI < 0.10: No significant drift. The distributions are essentially the same. - 0.10 <= PSI < 0.25: Moderate drift. Investigation is warranted; the model may need monitoring more closely. - PSI >= 0.25: Significant drift. The model should be retrained or the data pipeline investigated for errors. → Chapter 31 Quiz: The Complete ML Betting Pipeline
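The thresholds above can be applied with a short helper. This sketch assumes the common Population Stability Index definition (the sum over bins of (actual - expected) * ln(actual / expected)) and that both inputs are already bin proportions over the same bins; the function names and the small floor used to avoid taking the log of zero are assumptions.

```python
import math


def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for expected, actual in zip(expected_props, actual_props):
        expected = max(expected, eps)  # floor empty bins to keep log finite
        actual = max(actual, eps)
        total += (actual - expected) * math.log(actual / expected)
    return total


def classify_drift(psi_value):
    """Map a PSI value onto the three bands listed above."""
    if psi_value < 0.10:
        return "no significant drift"
    if psi_value < 0.25:
        return "moderate drift"
    return "significant drift"


# Training-time feature distribution versus this week's, in four bins.
training = [0.25, 0.25, 0.25, 0.25]
current = [0.40, 0.30, 0.20, 0.10]
status = classify_drift(psi(training, current))
```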
Tier 1 (15% drawdown, bankroll at $12,750):
Reduce to 1/6th Kelly (from quarter-Kelly). - Review last 50 bets for process errors. - Continue betting if process is sound. - Return to quarter-Kelly when drawdown recovers to 10%. → Chapter 14 Quiz: Advanced Bankroll and Staking Strategies
Tier 2 (25% drawdown, bankroll at $11,250):
Reduce to 1/8th Kelly. - Full model audit: check data inputs, coefficient stability, and out-of-sample performance. - Reduce to NBA only (pause lower-volume sports). - Return to Tier 1 protocol when drawdown recovers to 20%. → Chapter 14 Quiz: Advanced Bankroll and Staking Strategies
Tier 3 (32% drawdown, bankroll at $10,200):
Pause all live betting for one week. - Complete model rebuild using last 200 bets. - Run a 50-bet paper trading test before resuming. - Resume at 1/8th Kelly; return to Tier 2 protocol after 25 consecutive bets without a new low. → Chapter 14 Quiz: Advanced Bankroll and Staking Strategies
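The three tiers can be encoded as a simple lookup from drawdown to Kelly multiplier. The sketch below is a simplification under stated assumptions: it infers a $15,000 peak bankroll from the dollar figures in the tier headings, treats Tier 3 as a multiplier of zero while betting is paused, and does not model the recovery rules (for example, returning to quarter Kelly only once the drawdown recovers to 10%). The function names are hypothetical.

```python
def drawdown_tier(peak_bankroll, current_bankroll):
    """Return 0 (normal) or the tier number triggered by the current drawdown."""
    drawdown = (peak_bankroll - current_bankroll) / peak_bankroll
    if drawdown >= 0.32:
        return 3
    if drawdown >= 0.25:
        return 2
    if drawdown >= 0.15:
        return 1
    return 0


# Kelly multiplier while each tier is active (Tier 3 pauses live betting).
KELLY_MULTIPLIER = {0: 1 / 4, 1: 1 / 6, 2: 1 / 8, 3: 0.0}


def kelly_multiplier(peak_bankroll, current_bankroll):
    """Kelly multiplier implied by the tier table for the current drawdown."""
    return KELLY_MULTIPLIER[drawdown_tier(peak_bankroll, current_bankroll)]


# Bankroll levels from the tier headings, assuming a $15,000 peak.
tiers = [drawdown_tier(15_000, level) for level in (14_000, 12_750, 11_250, 10_200)]
```

A production version would also need to remember which tier is currently active, because the exit conditions differ from the entry thresholds.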
Trace of the bug:
For -150: `1 + 100/(-150)` = `1 - 0.6667` = `0.3333` (wrong; should be 1.6667) - For +200: `1 + 200/100` = `3.0` (correct) - Combined: `0.3333 * 3.0` = `1.0` (wrong; should be 5.0) → Chapter 2 Quiz: Probability and Odds
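A corrected conversion that avoids the sign error traced above might look like this. The function names are hypothetical; the key point is that negative American odds need `100 / abs(odds)` rather than `odds / 100` with the sign carried through.

```python
def american_to_decimal(odds):
    """Convert American odds to decimal odds."""
    if odds == 0:
        raise ValueError("American odds cannot be zero")
    if odds > 0:
        return 1 + odds / 100
    # Negative odds: the bettor risks abs(odds) to win 100.
    return 1 + 100 / abs(odds)


def parlay_decimal(american_legs):
    """Combined decimal odds of a parlay: the product of each leg's decimal odds."""
    combined = 1.0
    for odds in american_legs:
        combined *= american_to_decimal(odds)
    return combined


# The two legs from the trace: -150 and +200.
leg_favorite = american_to_decimal(-150)
leg_underdog = american_to_decimal(200)
combined = parlay_decimal([-150, 200])
```

Any decimal price below 1.0 is a quick sanity check that a conversion has gone wrong, since decimal odds always include the returned stake.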
LSTMs, embeddings, and attention networks --- for sequential sports prediction tasks, knowing when they add value over simpler approaches. → Part VI: Machine Learning for Sports Betting
Travel and Fatigue:
`away_travel_miles`: Approximate distance the away team traveled for this game, computed from team city coordinates. Long-distance travel (cross-country games) correlates with reduced performance. → Case Study: Logistic Regression for NBA Moneyline Predictions
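One common way to approximate distance from city coordinates is the haversine (great-circle) formula. The case study does not say which method it uses, so treat this as an assumed implementation; the function name and the rounded coordinates in the example are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate city-center coordinates for a cross-country trip.
boston = (42.36, -71.06)
los_angeles = (34.05, -118.24)
away_travel_miles = haversine_miles(*boston, *los_angeles)
```

Great-circle distance understates actual flight paths slightly, but as a model feature the ordering of trips by length matters more than the exact mileage.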
True Probability Estimation:
Model-based approaches (Elo, logistic regression, etc.) estimate probability from features - Market-based approaches use sharp closing lines as the best available probability estimate - The Bayesian framework combines model and market information, weighted by confidence in each - Calibration analysi → Chapter 13: Value Betting Theory and Practice
Types of missing data in sports:
**Did Not Play (DNP):** A player was on the roster but did not enter the game. Their stats are legitimately zero for counting stats (points, rebounds, assists) but should be treated as missing for rate stats (shooting percentage, yards per attempt). - **Injury/Suspension:** The player was unavailabl → Chapter 5: Data Literacy for Bettors
Types of self-exclusion:
**Single-operator exclusion.** You exclude yourself from one specific sportsbook. This is useful if you have a problematic relationship with a specific platform (e.g., one that offers live betting features that trigger impulsive behavior). → Chapter 38: Risk Management and Responsible Gambling
the book has mispriced the event - **Stale lines** -- the book has not yet reacted to news or sharp action - **Different market structure** -- the book may be dealing to a different number intentionally → Chapter 12: Line Shopping and Market Analysis
Variants of SMOTE:
**Borderline-SMOTE:** Only oversamples minority examples near the decision boundary. - **SMOTE-ENN:** Combines SMOTE with Edited Nearest Neighbors to clean noisy examples. - **ADASYN:** Adaptively generates more synthetic examples for harder-to-classify minority instances. → Chapter 27: Advanced Regression and Classification
Stake at 50% of normal size (1% of bankroll instead of 2%) - Maximum 3 bets per day (instead of normal 5-7) - Full pre-bet checklist required for every bet, no exceptions - End-of-day review required with written notes - If any loss limit is hit at reduced levels, stop for 48 hours → Chapter 38 Quiz: Risk Management and Responsible Gambling
Week 2 --- Gradual Escalation:
Stake at 75% of normal size - Maximum 5 bets per day - Full pre-bet checklist continues - If week 1 was profitable and process-compliant: continue escalation - If week 1 was not process-compliant: repeat week 1 protocol → Chapter 38 Quiz: Risk Management and Responsible Gambling
**Model variations.** Does adding a new feature to your model improve performance? Run both versions in parallel for a defined sample size and compare. - **Staking strategies.** Does switching from flat staking to Kelly-based staking improve risk-adjusted returns? Run both approaches on paper for a → Chapter 37: Discipline, Systems, and Record-Keeping
What remains open:
How to incorporate model uncertainty that changes over time (your model may be better calibrated in some contexts than others) - How to account for the correlation between edge estimation error and bet frequency (if your model systematically overestimates edge, you also systematically overbet) - Opt → Chapter 42: Research Frontiers
What this does NOT mean:
It does NOT mean that losses will be "corrected" by future wins (gambler's fallacy). - It does NOT guarantee any specific result over any finite number of bets. - It does NOT eliminate the need for bankroll management. → Chapter 3 Key Takeaways: Expected Value and the Bettor's Edge
What this means for bettors:
A +EV bettor will be profitable in the long run with near certainty. - A -EV bettor will lose money in the long run with near certainty. - Short-term results (dozens or even hundreds of bets) can deviate dramatically from EV due to variance. - The sportsbook leverages the LLN by processing millions → Chapter 3 Key Takeaways: Expected Value and the Bettor's Edge
When appropriate:
When the data spans several orders of magnitude - When the standard deviation is proportional to the mean - When multiplicative relationships are more natural than additive ones - When you need normality for subsequent statistical tests → Chapter 6 Quiz: Descriptive Statistics for Sports
When NOT appropriate for sports scoring data:
Individual game scoring rarely spans orders of magnitude (typically 5-50 points, not 1-1000) - Additive models are usually more natural for game scores - The skewness may be mild enough that robust methods (trimmed mean, median) suffice - Interpretation becomes harder (what does "log-points" mean to → Chapter 6 Quiz: Descriptive Statistics for Sports
Early in the season (small $n$) - When teams have limited track records (expansion teams, major roster overhauls) - For rare events (championship probabilities, perfect seasons) → Chapter 10: Bayesian Thinking for Bettors
When the line barely moves:
The market's initial assessment was well-calibrated. - The opening line was already close to the consensus fair value. - There were no major information surprises or imbalances in sharp action. → Chapter 11: Understanding Betting Markets
When the line moves significantly:
The market received substantial new information between opening and closing. - The opening line was likely less accurate, and the closing line better reflects the true probability. - Bettors who identified the correct direction early captured value. → Chapter 11: Understanding Betting Markets
When to loosen limits (cautiously):
When your bankroll has grown substantially and your percentage-based limits translate to dollar amounts that are trivially small relative to your daily volume - When you have a long, documented track record of staying within your limits and your CLV analysis confirms a sustained edge - When you are → Chapter 38: Risk Management and Responsible Gambling
When to tighten limits:
During prolonged drawdowns (when your bankroll is significantly below its peak) - When your model's edge appears to be declining (based on CLV analysis) - During periods of personal stress or emotional instability - When you are new to a sport, bet type, or market that you have not yet established a → Chapter 38: Risk Management and Responsible Gambling
When to use histograms in sports:
Examining the distribution of final scores - Checking whether margins of victory follow a normal distribution - Comparing home vs. away scoring distributions → Chapter 6: Descriptive Statistics for Sports
When to use it for sports prediction:
**Use it** when training on a GPU (CUDA device). The speedup is most noticeable when data loading is a bottleneck, such as with large datasets or complex data preprocessing. - **Do not use it** when training on CPU only, as pinned memory allocation has slightly higher overhead than standard allocati → Chapter 29 Quiz: Neural Networks for Sports Prediction
When to use the median in sports:
**Home run totals:** The distribution of home runs per player is heavily right-skewed. A handful of sluggers hit 40-plus while most hitters are in single digits. The median gives a better sense of a "typical" player. - **Margin of victory:** Blowouts skew the mean. The median margin tells you what a → Chapter 6: Descriptive Statistics for Sports
Without normalization:
The policy gradient would have very high variance, requiring many more episodes to converge. - The learning rate would need to be very small to prevent instability in high-reward episodes, but this would make learning painfully slow in low-reward episodes. - In sports betting environments where indi → Chapter 42 Quiz: Research Frontiers