Chapter 2 Exercises: Probability and Odds
Part A: Conceptual Questions (8 Problems)
A1. State the three axioms of probability (Kolmogorov axioms) and explain how each one applies to a sporting event. Use a tennis match between two players as your example, and describe what it would mean if any of the axioms were violated in the context of betting.
A2. A sportsbook lists the following outcomes for an NBA game:
- Team A wins by 10+ points
- Team A wins by 1-9 points
- Team B wins by 1-9 points
- Team B wins by 10+ points
Explain why these outcomes satisfy the requirement of being mutually exclusive and collectively exhaustive. Now consider a sportsbook that also offers "Total points over 210.5" on the same game. Are the original four outcomes and this fifth outcome mutually exclusive? Explain the distinction between mutually exclusive events in the same market versus across different markets.
A3. Define the difference between independent and dependent events. For each of the following pairs, argue whether they are independent or dependent, and justify your reasoning:
a) The outcome of Game 1 of a best-of-seven playoff series and the outcome of Game 2.
b) The result of a coin toss to determine which team kicks off and the final result of a football match.
c) A starting pitcher being announced as injured two hours before game time and the movement of the moneyline for that game.
d) The outcome of an English Premier League match on Saturday and the outcome of a La Liga match on the same Saturday.
A4. Explain the concept of conditional probability using the following scenario: In the NFL, a team that scores first wins approximately 65% of the time. A bettor placed a pre-game wager on Team A at even odds. Team B scores first.
a) What is the conditional probability that Team A wins, given that Team B scored first?
b) How does this relate to the concept of live (in-play) betting odds?
c) Why does the sportsbook adjust its lines after the first score?
A5. The Gambler's Fallacy is the mistaken belief that past independent events affect future probabilities. Provide three specific examples of how the Gambler's Fallacy manifests in sports betting:
a) One example involving a roulette-style independent event within a sport.
b) One example involving a team's winning or losing streak.
c) One example involving a bettor's personal history of wins and losses.
For each, explain why the reasoning is fallacious and what the correct probabilistic reasoning should be.
A6. Distinguish between frequentist probability and subjective (Bayesian) probability in the context of sports betting. A sharp bettor estimates that a particular underdog has a 40% chance of winning, while the sportsbook's implied probability is 30%. Explain:
a) What the frequentist interpretation of "40% chance" would mean.
b) What the Bayesian interpretation of "40% chance" would mean.
c) Why the distinction matters for the bettor's decision-making process.
A7. Explain the Law of Large Numbers and its relevance to sports betting. A bettor has identified bets that they believe have a 5% edge. After 20 bets, they are down 10 units. Does this outcome contradict their claimed edge? How many bets might they need before they can be reasonably confident their results reflect their true edge? Discuss the emotional and financial challenges this mathematical reality creates for bettors.
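One way to put numbers on this question is sketched below. It assumes, for illustration only, that every bet is a one-unit stake at even money, so a 5% edge corresponds to a 52.5% win probability and being down 10 units after 20 bets means 5 wins and 15 losses. The function names are hypothetical, and the two-standard-deviation threshold is just one possible reading of "reasonably confident".

```python
from math import ceil, comb, sqrt


def prob_at_most_k_wins(n: int, k: int, p: float) -> float:
    """Binomial probability of k or fewer wins in n independent bets."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bets_for_edge_to_show(edge: float, p: float, z: float = 2.0) -> int:
    """Rough number of bets before expected profit sits z standard deviations above zero.

    Assumes one-unit stakes at even money, so each bet returns +1 or -1.
    """
    sd_per_bet = 2 * sqrt(p * (1 - p))  # standard deviation of a +1/-1 outcome
    return ceil((z * sd_per_bet / edge) ** 2)


p_win = 0.525  # assumed: a 5% edge at even money
p_down_10 = prob_at_most_k_wins(20, 5, p_win)  # 5 or fewer wins = down 10 or more
n_needed = bets_for_edge_to_show(0.05, p_win)

print(f"P(down 10+ units after 20 bets) = {p_down_10:.4f}")
print(f"Bets needed for a 2-sigma signal: about {n_needed}")
```

Under these assumptions the required sample comes out in the neighborhood of 1,600 bets, far more than the 20 observed.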
A8. The concept of "value" in betting is fundamentally a probability statement. Define what it means for a bet to have positive expected value (+EV). Then explain the paradox: if a bettor consistently finds +EV bets, why can they still lose money over a meaningful sample? Connect this to the concepts of variance, bankroll management, and the difference between a single trial and a long-run expectation.
Part B: Calculation Problems (7 Problems)
B1. Convert the following American odds to decimal odds and fractional odds. Then calculate the implied probability for each.
| American Odds | Decimal Odds | Fractional Odds | Implied Probability |
|---|---|---|---|
| -150 | | | |
| +200 | | | |
| -110 | | | |
| +450 | | | |
| -300 | | | |
| +100 | | | |
| -10000 | | | |
Show all work for each conversion.
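Hand conversions can be cross-checked with a short script. The sketch below is one possible approach, not a reference solution: the function names are hypothetical, and it treats any American value strictly between -100 and +100 as invalid.

```python
from fractions import Fraction


def american_to_decimal(american: int) -> float:
    """Decimal odds (total return per unit staked) from American odds."""
    if abs(american) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


def american_to_fractional(american: int) -> str:
    """Fractional odds (profit/stake) in lowest terms, e.g. '2/3'."""
    if abs(american) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    frac = Fraction(american, 100) if american > 0 else Fraction(100, -american)
    return f"{frac.numerator}/{frac.denominator}"


def american_to_implied(american: int) -> float:
    """Implied probability is the reciprocal of the decimal odds."""
    return 1 / american_to_decimal(american)


for odds in (-150, 200, -110, 450, -300, 100, -10000):
    print(
        f"{odds:+6d}  {american_to_decimal(odds):7.3f}  "
        f"{american_to_fractional(odds):>6}  {american_to_implied(odds):7.2%}"
    )
```

Using `Fraction` keeps the fractional odds exact and reduces them automatically, which avoids rounding problems with prices such as -110.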
Answers
| American Odds | Decimal Odds | Fractional Odds | Implied Probability |
|---|---|---|---|
| -150 | 1.667 | 2/3 | 60.00% |
| +200 | 3.000 | 2/1 | 33.33% |
| -110 | 1.909 | 10/11 | 52.38% |
| +450 | 5.500 | 9/2 | 18.18% |
| -300 | 1.333 | 1/3 | 75.00% |
| +100 | 2.000 | 1/1 | 50.00% |
| -10000 | 1.010 | 1/100 | 99.01% |
**Conversion formulas used:**
For negative American odds (e.g., -150):
- Decimal = 1 + (100 / |American|) = 1 + (100/150) = 1.667
- Implied Probability = |American| / (|American| + 100) = 150/250 = 60%
For positive American odds (e.g., +200):
- Decimal = 1 + (American / 100) = 1 + (200/100) = 3.000
- Implied Probability = 100 / (American + 100) = 100/300 = 33.33%
B2. A sportsbook offers the following lines on an MLB game:
- New York Yankees: -135
- Boston Red Sox: +120
a) Calculate the implied probability for each outcome.
b) Calculate the total implied probability (overround/vig).
c) Calculate the bookmaker's margin percentage.
d) Remove the vig to find the "fair" or "true" implied probabilities using the multiplicative method.
e) If a sharp bettor believes the Yankees have a 55% true probability of winning, is there value on either side?
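The overround and multiplicative vig removal reduce to a sum and a normalization. A minimal sketch, assuming a two-outcome market quoted in American odds (the helper names are hypothetical):

```python
def implied_from_american(american: int) -> float:
    """Implied probability (vig included) from American odds."""
    if american > 0:
        return 100 / (american + 100)
    return -american / (-american + 100)


def remove_vig_multiplicative(implied: list[float]) -> list[float]:
    """Scale the implied probabilities proportionally so they sum to 1."""
    total = sum(implied)
    return [p / total for p in implied]


implied = [implied_from_american(-135), implied_from_american(120)]
overround = sum(implied)                   # total implied probability
margin = overround - 1                     # bookmaker's margin
fair = remove_vig_multiplicative(implied)  # vig-removed probabilities

print(f"Overround: {overround:.2%}, margin: {margin:.2%}")
print(f"Fair probabilities: {fair[0]:.2%} / {fair[1]:.2%}")
```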
Answers
a) **Implied probabilities:**
- Yankees (-135): 135 / (135 + 100) = 135 / 235 = 57.45%
- Red Sox (+120): 100 / (120 + 100) = 100 / 220 = 45.45%
b) **Total implied probability (overround):** 57.45% + 45.45% = 102.90%
c) **Bookmaker's margin:** 102.90% - 100% = 2.90%. Alternatively expressed as margin on turnover: (102.90 - 100) / 102.90 = 2.82%
d) **Vig-removed probabilities (multiplicative method):**
- Yankees: 57.45% / 102.90% = 55.83%
- Red Sox: 45.45% / 102.90% = 44.17%
- Check: 55.83% + 44.17% = 100.00%
e) **Value assessment:** A bet has value only when the bettor's probability exceeds the implied probability of the odds actually offered, vig included. The bettor's 55% for the Yankees is below the line's 57.45% (and below the vig-removed 55.83%), so there is no value on the Yankees. The bettor's implied probability for the Red Sox is 45%, which is above the vig-removed 44.17% but still below the 45.45% implied by the +120 line. There is no value on either side for this bettor.
B3. Convert the following decimal odds to American odds and fractional odds:
a) 1.50 b) 2.75 c) 1.05 d) 8.00 e) 1.91
Answers
a) **1.50:**
- American: Since decimal < 2.00, it's a favorite: -100 / (1.50 - 1) = -100 / 0.50 = **-200**
- Fractional: (1.50 - 1) = 0.50 = **1/2**
b) **2.75:**
- American: Since decimal >= 2.00, it's an underdog: (2.75 - 1) x 100 = **+175**
- Fractional: (2.75 - 1) = 1.75 = **7/4**
c) **1.05:**
- American: -100 / (1.05 - 1) = -100 / 0.05 = **-2000**
- Fractional: (1.05 - 1) = 0.05 = **1/20**
d) **8.00:**
- American: (8.00 - 1) x 100 = **+700**
- Fractional: (8.00 - 1) = 7.00 = **7/1**
e) **1.91:**
- American: -100 / (1.91 - 1) = -100 / 0.91 = **-110** (rounded)
- Fractional: (1.91 - 1) = 0.91 = approximately **10/11**
B4. An English bookmaker lists the following fractional odds on a Premier League match:
- Home Win: 6/4
- Draw: 9/4
- Away Win: 7/4
a) Convert each to decimal odds and implied probability.
b) Calculate the overround.
c) What is the bookmaker's expected profit per 100 currency units wagered (assuming equal action on all outcomes)?
d) A bettor believes the true probabilities are: Home 38%, Draw 28%, Away 34%. On which outcome(s), if any, should they bet?
Answers
a) **Conversions:**
- Home Win (6/4): Decimal = 1 + 6/4 = 2.50; Implied = 1/2.50 = 40.00%
- Draw (9/4): Decimal = 1 + 9/4 = 3.25; Implied = 1/3.25 = 30.77%
- Away Win (7/4): Decimal = 1 + 7/4 = 2.75; Implied = 1/2.75 = 36.36%
b) **Overround:** 40.00% + 30.77% + 36.36% = 107.13%
c) **Expected profit per 100 units:** Interpreting "equal action" as a balanced book (stakes in proportion to the implied probabilities, so the payout is the same whichever outcome occurs), the bookmaker's margin on turnover = (107.13 - 100) / 107.13 = 6.66%. Expected profit per 100 units wagered = 6.66 units.
d) **Value assessment:** Compare bettor's probabilities to implied probabilities:
- Home: Bettor 38% vs. Implied 40.00% -> No value (bettor thinks less likely than line implies)
- Draw: Bettor 28% vs. Implied 30.77% -> No value
- Away: Bettor 34% vs. Implied 36.36% -> No value
The bettor should not bet on any outcome. In all three cases, the bookmaker's implied probability is higher than the bettor's estimated probability, meaning the line already overestimates each outcome's likelihood (from the bettor's perspective).
B5. A parlay (accumulator) consists of the following three independent bets:
- Bet 1: Decimal odds 1.80
- Bet 2: Decimal odds 2.10
- Bet 3: Decimal odds 1.50
a) Calculate the combined decimal odds of the parlay.
b) Calculate the implied probability of all three bets winning (using the individual implied probabilities).
c) Calculate the implied probability from the combined parlay odds.
d) If a bettor wagers $50 on this parlay, what is the total payout if all three win?
e) The sportsbook offers a "parlay bonus" of 10% on three-leg parlays. What are the new effective combined odds and payout?
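Parlay arithmetic is a product of the leg prices. The sketch below assumes independent legs and, as an assumption about how the bonus works, applies any bonus to profit only rather than to the total return. Function names are hypothetical.

```python
from math import prod


def parlay_decimal(legs: list[float]) -> float:
    """Combined decimal odds: the product of the individual leg odds."""
    return prod(legs)


def parlay_return(stake: float, legs: list[float], profit_bonus: float = 0.0) -> float:
    """Total return (stake included) if every leg wins.

    profit_bonus is a fraction (0.10 for 10%) and is ASSUMED to apply to profit only.
    """
    profit = stake * (parlay_decimal(legs) - 1) * (1 + profit_bonus)
    return stake + profit


legs = [1.80, 2.10, 1.50]
combined = parlay_decimal(legs)
base_return = parlay_return(50, legs)
bonus_return = parlay_return(50, legs, profit_bonus=0.10)

print(f"Combined odds: {combined:.2f}, implied probability: {1 / combined:.2%}")
print(f"Return on $50: ${base_return:.2f}; with bonus: ${bonus_return:.2f}")
print(f"Effective odds with bonus: {bonus_return / 50:.3f}")
```

If a sportsbook instead applies the bonus to the total return, the last factor would multiply `stake * parlay_decimal(legs)` and the result would differ, so the terms of the offer matter.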
Answers
a) **Combined decimal odds:** 1.80 x 2.10 x 1.50 = **5.67**
b) **Implied probability from individual legs:**
- Leg 1: 1/1.80 = 55.56%
- Leg 2: 1/2.10 = 47.62%
- Leg 3: 1/1.50 = 66.67%
- Combined: 0.5556 x 0.4762 x 0.6667 = 0.1764 = **17.64%**
c) **Implied probability from combined parlay odds:** 1/5.67 = **17.64%** (matches, as expected)
d) **Payout on $50 wager:** $50 x 5.67 = **$283.50** (total return including stake). Profit = $283.50 - $50 = **$233.50**
e) **With 10% parlay bonus:** Bonus applies to profit: $233.50 x 1.10 = $256.85. Total return = $50 + $256.85 = **$306.85**. Effective combined odds = $306.85 / $50 = **6.137**
B6. Two sportsbooks offer lines on the same tennis match:
Sportsbook A:
- Player X: -180
- Player Y: +155
Sportsbook B:
- Player X: -160
- Player Y: +145
a) Calculate the overround for each sportsbook.
b) Which sportsbook offers better value for a bet on Player X? On Player Y?
c) Is there an arbitrage opportunity? If so, calculate the guaranteed profit percentage, and determine the optimal stake allocation for a total investment of $1,000.
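The arbitrage test is mechanical once the best price per outcome is known: sum the reciprocals of the best decimal odds, and an arbitrage exists only if that sum is below 1. The sketch below uses a hypothetical `find_arbitrage` helper. The second market is invented purely to show the stake allocation when an arbitrage does exist.

```python
from typing import Optional


def find_arbitrage(
    best_decimal: dict[str, float], total_stake: float
) -> Optional[tuple[dict[str, float], float]]:
    """Return (stakes per outcome, guaranteed profit %) or None if no arbitrage.

    best_decimal maps each outcome to the best decimal odds found across all books.
    """
    inverse_sum = sum(1 / d for d in best_decimal.values())
    if inverse_sum >= 1:
        return None
    # Stake in proportion to 1/odds so every outcome returns the same amount.
    stakes = {name: total_stake * (1 / d) / inverse_sum for name, d in best_decimal.items()}
    profit_pct = (1 / inverse_sum - 1) * 100
    return stakes, profit_pct


# Best prices from the problem: X at -160 (decimal 1.625), Y at +155 (decimal 2.55).
tennis = find_arbitrage({"Player X": 1.625, "Player Y": 2.55}, 1000)
print("Tennis match:", tennis)

# Invented market, for illustration only, where an arbitrage does exist.
stakes, profit_pct = find_arbitrage({"Player X": 2.10, "Player Y": 2.10}, 1000)
print(stakes, f"{profit_pct:.2f}%")
```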
Answers
a) **Overround calculations:**
Sportsbook A:
- Player X (-180): 180/280 = 64.29%
- Player Y (+155): 100/255 = 39.22%
- Overround: 64.29% + 39.22% = **103.50%**
Sportsbook B:
- Player X (-160): 160/260 = 61.54%
- Player Y (+145): 100/245 = 40.82%
- Overround: 61.54% + 40.82% = **102.35%**
(Totals are computed from unrounded probabilities, so they can differ by 0.01 from the sum of the rounded figures shown.)
b) **Better value:**
- Player X: Sportsbook B at -160 (lower implied probability of 61.54% vs. 64.29%, so better odds for the bettor)
- Player Y: Sportsbook A at +155 (lower implied probability of 39.22% vs. 40.82%, so better odds for the bettor)
c) **Arbitrage check:** Best odds: Player X at -160 (Sportsbook B) and Player Y at +155 (Sportsbook A). Combined implied: 61.54% + 39.22% = **100.75%**. Since the combined implied probability exceeds 100%, there is **no arbitrage opportunity**. An arbitrage exists only when the combined implied probability of the best available odds across books is below 100%.
B7. A futures market for the winner of the FIFA World Cup lists the following odds (decimal) for the top 8 favorites:
| Team | Decimal Odds |
|---|---|
| Brazil | 4.50 |
| France | 5.00 |
| England | 7.00 |
| Argentina | 8.00 |
| Germany | 9.00 |
| Spain | 10.00 |
| Netherlands | 15.00 |
| Portugal | 17.00 |
The remaining teams in the tournament (24 others) have a combined implied probability (before vig removal) of 22%.
a) Calculate the implied probability for each of the top 8 teams.
b) Calculate the total overround for the entire market.
c) Explain why futures markets typically have much higher overrounds than game-day moneylines.
d) Using the multiplicative method, calculate the vig-removed probability for Brazil and for Portugal.
Answers
a) **Implied probabilities:**
| Team | Decimal Odds | Implied Probability |
|---|---|---|
| Brazil | 4.50 | 22.22% |
| France | 5.00 | 20.00% |
| England | 7.00 | 14.29% |
| Argentina | 8.00 | 12.50% |
| Germany | 9.00 | 11.11% |
| Spain | 10.00 | 10.00% |
| Netherlands | 15.00 | 6.67% |
| Portugal | 17.00 | 5.88% |
b) **Total overround:**
- Top 8 sum: 22.22 + 20.00 + 14.29 + 12.50 + 11.11 + 10.00 + 6.67 + 5.88 = 102.67%
- Rest of field: 22.00%
- Total: 102.67% + 22.00% = **124.67%**
- Overround = 24.67%
c) **Why futures have higher overrounds:** Futures markets have higher overrounds because (1) there are many more possible outcomes, each adding a small margin; (2) the event is far in the future, creating more uncertainty; (3) lower liquidity on individual outcomes means wider spreads; (4) the bookmaker faces greater risk from sharp bettors having an information advantage over a longer time horizon.
d) **Vig-removed probabilities (multiplicative method):**
- Brazil: 22.22% / 124.67% = **17.83%**
- Portugal: 5.88% / 124.67% = **4.72%**
Part C: Programming Problems (5 Problems)
C1. Build an OddsConverter class in Python with the following specifications:
- Constructor accepts odds in any format (American, decimal, or fractional) with a format indicator.
- Methods: `to_american()`, `to_decimal()`, `to_fractional()`, `to_implied_probability()`
- All methods should return properly formatted values.
- Handle edge cases: even money (+100 / 2.00 / 1/1), heavy favorites (-10000 / 1.01), long shots (+5000 / 51.00).
- Include input validation that raises meaningful errors for invalid odds (e.g., decimal odds below 1.0, American odds of 0).
- Write at least 10 unit tests covering normal cases and edge cases.
```python
# Starter structure (implement fully):
class OddsConverter:
    def __init__(self, odds_value, odds_format='american'):
        """
        Initialize with odds in any format.

        Args:
            odds_value: The odds value. For fractional, pass as string "6/4" or tuple (6, 4).
            odds_format: One of 'american', 'decimal', 'fractional'
        """
        pass

    def to_american(self):
        """Return American odds as integer (e.g., -150 or +200)."""
        pass

    def to_decimal(self):
        """Return decimal odds rounded to 3 decimal places."""
        pass

    def to_fractional(self):
        """Return fractional odds as a string (e.g., '3/2')."""
        pass

    def to_implied_probability(self):
        """Return implied probability as a float between 0 and 1."""
        pass

    def __repr__(self):
        """String representation showing all formats."""
        pass
```
C2. Write a batch conversion tool that reads a CSV file containing odds from a sportsbook and outputs a new CSV with all formats and implied probabilities. The tool should:
- Accept an input CSV with columns: `event,outcome,odds_value,odds_format`
- Output a CSV with columns: `event,outcome,american,decimal,fractional,implied_probability`
- Handle mixed formats in the input (some rows American, some decimal, etc.)
- Include summary statistics at the bottom: average margin per event, highest/lowest implied probability
- Use the `OddsConverter` class from Problem C1
```python
# Example input CSV format:
# event,outcome,odds_value,odds_format
# Lakers vs Celtics,Lakers,-150,american
# Lakers vs Celtics,Celtics,2.60,decimal
# Man City vs Arsenal,Home,4/6,fractional
# Man City vs Arsenal,Draw,3.40,decimal
# Man City vs Arsenal,Away,+350,american
```
C3. Create an overround analyzer that takes a set of odds for all outcomes in a market and computes:
- Individual implied probabilities for each outcome
- Total overround (raw and as percentage)
- Vig-removed ("true") probabilities using three different methods:
  1. Multiplicative method (proportional reduction)
  2. Additive method (equal reduction from each outcome)
  3. Power method (raise each implied probability to a common exponent chosen so they sum to 1; often used as a simpler alternative to Shin's method)
- A comparison table showing how the three methods differ
- A visualization (using matplotlib) as a grouped bar chart comparing raw vs. vig-removed probabilities
```python
# Starter structure:
def analyze_overround(odds_dict, odds_format='american'):
    """
    Args:
        odds_dict: Dictionary mapping outcome names to odds values
                   e.g., {'Lakers': -150, 'Celtics': +130}
        odds_format: Format of the odds values

    Returns:
        Dictionary containing all analysis results
    """
    pass
```
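The three vig-removal methods can be sketched on a list of raw implied probabilities. This is one possible reading of the specification, with hypothetical function names: the power method here solves for the exponent by bisection, and the exercise may intend a different formulation. The sketch assumes every implied probability lies strictly between 0 and 1 and that they sum to more than 1.

```python
def devig_multiplicative(implied: list[float]) -> list[float]:
    """Divide each implied probability by the total."""
    total = sum(implied)
    return [p / total for p in implied]


def devig_additive(implied: list[float]) -> list[float]:
    """Subtract an equal share of the excess from each outcome.

    Note: this can go negative for long shots in high-margin markets.
    """
    excess = (sum(implied) - 1) / len(implied)
    return [p - excess for p in implied]


def devig_power(implied: list[float], iterations: int = 200) -> list[float]:
    """Find exponent k with sum(p**k) == 1 by bisection, then return p**k."""
    lo, hi = 1.0, 2.0
    while sum(p**hi for p in implied) > 1:  # widen until the root is bracketed
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(p**mid for p in implied) > 1:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return [p**k for p in implied]


raw = [1 / 2.10, 1 / 3.40, 1 / 3.80]  # example three-way market in decimal odds
for name, method in [("multiplicative", devig_multiplicative),
                     ("additive", devig_additive),
                     ("power", devig_power)]:
    print(f"{name:>14}: " + "  ".join(f"{p:.4f}" for p in method(raw)))
```

The methods differ mainly in how much margin they strip from favorites versus long shots, which is what the comparison table in this problem is meant to expose.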
C4. Build a probability calibration checker. Given a CSV of historical bets with predicted probabilities and actual outcomes, the tool should:
- Group predictions into probability bins (e.g., 0-10%, 10-20%, ..., 90-100%)
- Calculate the actual win rate within each bin
- Compute a calibration score (Brier score)
- Generate a calibration plot (predicted probability vs. actual frequency)
- Identify ranges where the model is overconfident or underconfident
- Output a textual summary of calibration quality
```python
# Example input CSV:
# event,predicted_probability,actual_outcome
# Game 1,0.65,1
# Game 2,0.30,0
# Game 3,0.72,1
def calibration_check(predictions_csv_path):
    """
    Analyze how well predicted probabilities match actual outcomes.

    Returns:
        dict with calibration metrics, bin data, and summary text
    """
    pass
```
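The core of the calibration check is the Brier score (mean squared difference between predicted probability and the 0/1 outcome) plus a per-bin comparison of predicted and actual frequencies. A minimal sketch on in-memory lists, with hypothetical function names and without the CSV reading or plotting the problem also asks for:

```python
from collections import defaultdict


def brier_score(predicted: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(predicted)


def calibration_bins(predicted: list[float], outcomes: list[int], n_bins: int = 10):
    """Map bin index -> (mean predicted probability, actual win rate, count)."""
    grouped = defaultdict(list)
    for p, y in zip(predicted, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        grouped[index].append((p, y))
    table = {}
    for index in sorted(grouped):
        rows = grouped[index]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(y for _, y in rows) / len(rows)
        table[index] = (mean_pred, win_rate, len(rows))
    return table


# The three rows from the example CSV above.
preds = [0.65, 0.30, 0.72]
results = [1, 0, 1]
score = brier_score(preds, results)
bins = calibration_bins(preds, results)

print(f"Brier score: {score:.4f}")
for index, (mean_pred, win_rate, count) in bins.items():
    print(f"bin {index}: predicted {mean_pred:.2f}, actual {win_rate:.2f}, n={count}")
```

A bin whose actual win rate falls below its mean predicted probability suggests overconfidence in that range, and the reverse suggests underconfidence. With only a few bets per bin the comparison is very noisy.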
C5. Create an interactive odds comparison dashboard (command-line interface) that:
- Allows the user to input odds from multiple sportsbooks for the same event
- Displays a formatted table comparing odds, implied probabilities, and margins across books
- Highlights the best available odds for each outcome
- Checks for arbitrage opportunities and, if found, calculates optimal stake allocation
- Computes the "market consensus" probability (average of vig-removed probabilities across all books)
```python
# The program should support a session like:
# > Enter event name: Lakers vs Celtics
# > Number of sportsbooks: 3
# > Sportsbook 1 name: DraftKings
# > DraftKings - Lakers odds (American): -150
# > DraftKings - Celtics odds (American): +130
# > Sportsbook 2 name: FanDuel
# > ...
#
# Output:
# ┌─────────────┬────────────┬────────────┬────────────┐
# │ Outcome     │ DraftKings │ FanDuel    │ BetMGM     │
# ├─────────────┼────────────┼────────────┼────────────┤
# │ Lakers      │ -150       │ -145*      │ -155       │
# │ Celtics     │ +130       │ +125       │ +135*      │
# ├─────────────┼────────────┼────────────┼────────────┤
# │ Overround   │ 3.5%       │ 3.6%       │ 3.3%       │
# └─────────────┴────────────┴────────────┴────────────┘
# * Best available odds
# Arbitrage: Not found
# Market consensus: Lakers 58.0%, Celtics 42.0%
```
Part D: Analysis Problems (5 Problems)
D1. Visit a major online sportsbook (or use provided sample data below) and record the moneyline odds for 10 upcoming games across two different sports. For each game:
a) Convert all odds to implied probabilities.
b) Calculate the overround.
c) Compare the overround between the two sports. Which sport has tighter margins? Hypothesize why.
Sample data if live odds are unavailable:
| Sport | Game | Home Odds | Away Odds |
|---|---|---|---|
| NBA | Lakers vs Celtics | -140 | +120 |
| NBA | Warriors vs Bucks | +105 | -125 |
| NBA | Nuggets vs Suns | -180 | +155 |
| NBA | Heat vs 76ers | +110 | -130 |
| NBA | Mavericks vs Clippers | -115 | -105 |
| NFL | Chiefs vs Bills | -150 | +130 |
| NFL | Eagles vs Cowboys | -120 | +100 |
| NFL | 49ers vs Ravens | +140 | -165 |
| NFL | Lions vs Bengals | -105 | -115 |
| NFL | Dolphins vs Jets | -200 | +170 |
D2. Compare the odds offered by three different sportsbooks for the same five events. For each event:
a) Identify which book offers the best odds for each outcome.
b) Calculate the overround for each book.
c) Determine whether any arbitrage opportunities exist.
d) Compute the "best available" combined line (cherry-picking the best odds across books) and its combined overround.
e) Write a one-paragraph analysis of what the differences in margins tell you about each sportsbook's strategy.
D3. Analyze a three-way market (e.g., soccer match with Home/Draw/Away). Using the following odds from a Premier League match:
- Home Win: 2.10
- Draw: 3.40
- Away Win: 3.80
a) Calculate the raw implied probabilities and overround.
b) Apply all three vig-removal methods from Problem C3.
c) Discuss which vig-removal method you believe is most appropriate for soccer and why.
d) If a bettor's model predicts Home 44%, Draw 27%, Away 29%, identify any value bets.
e) Calculate the expected value (in units) per unit wagered for each possible bet.
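Expected value per unit staked follows directly from the definition: with model probability p and decimal odds d, a winning bet returns d - 1 and a losing bet returns -1, so EV = p(d - 1) - (1 - p) = p·d - 1. A short sketch using the figures from this problem (the function name is hypothetical):

```python
def expected_value(prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked: win (d - 1) with prob p, lose 1 otherwise."""
    return prob * decimal_odds - 1


odds = {"Home": 2.10, "Draw": 3.40, "Away": 3.80}
model = {"Home": 0.44, "Draw": 0.27, "Away": 0.29}

ev = {name: expected_value(model[name], odds[name]) for name in odds}
value_bets = [name for name, value in ev.items() if value > 0]

for name, value in ev.items():
    print(f"{name}: EV = {value:+.3f} units per unit staked")
print("Value bets:", value_bets)
```

A bet shows value exactly when p·d > 1, which is the same condition as the model probability exceeding the raw implied probability 1/d.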
D4. Examine how odds move over time. Consider the following moneyline movements for an NFL game from Monday to Sunday:
| Day | Home Odds | Away Odds |
|---|---|---|
| Monday | -130 | +110 |
| Tuesday | -135 | +115 |
| Wednesday | -140 | +120 |
| Thursday | -140 | +120 |
| Friday | -145 | +125 |
| Saturday | -150 | +130 |
| Sunday (Kickoff) | -155 | +135 |
a) Calculate the implied probability for both sides on each day.
b) Calculate the overround on each day.
c) Plot the implied probability trend for the home team over the week.
d) What might explain the consistent movement toward the home team?
e) If a bettor believed the true probability was constant at 57% for the home team, on which day(s) was there value, and how much expected value per unit?
D5. Obtain the closing odds and actual results for 50 games in a sport of your choice (or use the dataset below). Perform a calibration analysis:
a) Group the implied probabilities (of the favorite) into deciles.
b) Calculate the actual win rate in each group.
c) Plot a calibration curve.
d) Calculate the Brier score for the sportsbook's closing line implied probabilities.
e) How well-calibrated are the closing lines? Discuss whether sportsbooks systematically overestimate or underestimate favorites at different probability levels.
A sample dataset of 50 results with closing implied probabilities is available in the course supplementary materials.
Part E: Research Problems (5 Problems)
E1. Research and write a 500-word report comparing how odds are displayed in at least four different countries or regions (e.g., United States, United Kingdom, Continental Europe, Asia/Hong Kong). Your report should cover:
- The standard odds format used in each region
- Historical reasons for the format's development
- How each format handles even-money bets and heavy favorites
- Whether bettors in each region tend to think in terms of probability or potential payout
- The trend toward standardization in online betting platforms
E2. Investigate the concept of the "efficient market hypothesis" as applied to sports betting markets. Write a 400-word summary addressing:
- What market efficiency means in the context of betting odds
- Evidence for and against the efficiency of closing lines
- The role of sharp bettors in driving lines toward efficiency
- How the vig complicates the assessment of market efficiency
- Why some researchers argue that betting markets are more efficient than financial markets for short-term predictions
E3. Research the history and mathematics of the "overround" in bookmaking. Your report (300-500 words) should include:
- When and where the concept of overround originated
- How overround percentages have changed over time (pre-internet vs. modern online markets)
- Typical overround ranges for different sports and bet types today
- The competitive dynamics that have driven margins lower in recent years
- The relationship between overround and the number of possible outcomes in a market
E4. Compare the odds and margins offered by at least three types of betting operators:
- A traditional bookmaker (e.g., Bet365, William Hill)
- A betting exchange (e.g., Betfair, Smarkets)
- A US-focused sportsbook (e.g., DraftKings, FanDuel)
For each, research and document:
- Their typical margin/commission structure
- How they handle two-way vs. three-way markets
- The role of the bettor vs. the house in setting odds
- Advantages and disadvantages for the bettor
- Which is most favorable for sharp (professional) bettors and why
E5. Research the concept of "Shin probabilities" (Shin, 1991, 1992, 1993) as a method for extracting true probabilities from bookmaker odds. Write a 400-word summary that includes:
- The theoretical basis for Shin's method
- How it differs from the simple multiplicative vig-removal method
- The concept of the "insider trading" parameter (z) in Shin's model
- Practical applications of Shin probabilities in sports analytics
- Limitations and criticisms of the approach
- References to at least two academic papers that have applied or extended Shin's work
Submission Guidelines
- Part A: Written answers, approximately one paragraph per sub-question.
- Part B: Show all mathematical work. Partial credit for correct process with arithmetic errors.
- Part C: Submit well-documented Python code with docstrings, type hints, and unit tests. Code should run without errors.
- Part D: Include all calculations, tables, and any visualizations. Analysis should be supported by the data.
- Part E: Cite all sources. Academic papers should use standard citation format. Web sources should include URL and access date.
Estimated total time: 3 hours (Part A: 30 min, Part B: 40 min, Part C: 60 min, Part D: 30 min, Part E: 20 min)