Chapter 11 Exercises: Understanding Betting Markets
Instructions: Complete all exercises in the parts assigned by your instructor. Show all work for calculation problems. For programming challenges, include comments explaining your logic and provide sample output. Where datasets are referenced, use the companion data files or generate synthetic data as described.
Part A: Market Structure and Efficiency (10 exercises, 5 points each)
Exercise A.1 --- Opening Line Formation
A sportsbook opens a line on an NFL game at Kansas City -3 (-110) / Buffalo +3 (-110). Within 30 minutes, a syndicate places $250,000 on Buffalo +3. The book moves the line to Kansas City -2.5.
(a) Explain why the book moved the line in this direction.
(b) If the syndicate's true probability estimate for Buffalo covering +3 is 54%, calculate whether there was value at +3 (-110) and whether value still exists at +2.5 (-110). State any assumption you make about the value of the half point.
(c) If no further sharp action arrives, what does the line movement tell recreational bettors about the "smart money" position?
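As a possible starting point for part (b), the break-even arithmetic can be sketched as follows. The helper name `american_to_implied` is illustrative, not part of any library, and the sketch only handles the +3 (-110) price. Adjusting the 54% estimate for the lost half point at +2.5 is left to your own stated assumption.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the implied (break-even) probability."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


break_even = american_to_implied(-110)   # price on both sides in this exercise
estimate = 0.54                          # syndicate's estimate from part (b)
edge = estimate - break_even             # positive means value at that price

print(f"break-even: {break_even:.4f}, edge: {edge:+.4f}")
```

A positive `edge` means the estimated probability exceeds what the price requires to break even.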
Exercise A.2 --- The Efficient Market Hypothesis in Betting
The Efficient Market Hypothesis (EMH) has three forms: weak, semi-strong, and strong.
(a) Define each form as it applies to sports betting markets.
(b) Under which form would it be impossible to profit by analyzing historical line movement data? Explain why.
(c) Most researchers consider sports betting markets to be approximately semi-strong efficient. What types of information would still provide an edge under this classification?
(d) Give one empirical example from sports betting research that supports semi-strong efficiency and one that challenges it.
Exercise A.3 --- Measuring Market Efficiency with Calibration
You have the following closing line implied probabilities and observed outcomes for 500 NFL games:
| Implied Probability Bin | Games in Bin | Actual Wins |
|---|---|---|
| 50-55% | 120 | 63 |
| 55-60% | 105 | 62 |
| 60-65% | 95 | 59 |
| 65-70% | 80 | 54 |
| 70-75% | 60 | 44 |
| 75-80% | 40 | 31 |
(a) Calculate the actual win percentage for each bin.
(b) Plot (or describe) a calibration chart comparing implied probability (midpoint of each bin) to actual win percentage.
(c) Based on your analysis, does this market appear well-calibrated? Where are the largest deviations?
(d) What statistical test would you use to formally assess calibration, and what is its null hypothesis?
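One way to organize the calculations for parts (a) through (c) is sketched below, using the table from this exercise. Treating each bin's midpoint as the expected win probability and using a binomial standard error are simplifying assumptions. This is a rough per-bin check, not the formal test that part (d) asks about.

```python
import math

# (bin midpoint, games in bin, actual wins) from the table in this exercise
bins = [
    (0.525, 120, 63),
    (0.575, 105, 62),
    (0.625, 95, 59),
    (0.675, 80, 54),
    (0.725, 60, 44),
    (0.775, 40, 31),
]

rows = []
for midpoint, n, wins in bins:
    actual = wins / n
    # Binomial standard error if the midpoint were the true win probability
    se = math.sqrt(midpoint * (1 - midpoint) / n)
    z = (actual - midpoint) / se
    rows.append((midpoint, actual, z))
    print(f"implied={midpoint:.3f}  actual={actual:.3f}  z={z:+.2f}")
```

Bins whose z-value is large in absolute terms are the ones to discuss in part (c).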
Exercise A.4 --- Sharp vs. Recreational Money
A sportsbook reports the following on a college basketball game:
- 78% of bets are on Team A against the spread
- 55% of money wagered is on Team B against the spread
- The line has moved from Team A -5 to Team A -3.5
(a) Explain the discrepancy between the percentage of bets and the percentage of money.
(b) What does this pattern suggest about whether the "sharp" money is on Team A or Team B?
(c) Why do sportsbooks move lines in response to money rather than number of bets?
(d) If you are a recreational bettor seeing this information on a public dashboard, should you automatically follow the sharp money? Discuss two caveats.
Exercise A.5 --- Line Movement Anatomy
An NBA total opens at 218.5 and closes at 222.0. The following timeline is provided:
| Time | Line | Event |
|---|---|---|
| Open (10:00 AM) | 218.5 | Market opens |
| 10:45 AM | 219.5 | Sharp action detected |
| 1:00 PM | 220.0 | Gradual public action |
| 4:30 PM | 220.5 | Injury report: Starting PG upgraded to probable |
| 6:00 PM | 221.5 | Second wave of sharp action |
| 7:00 PM (close) | 222.0 | Final adjustment |
(a) Calculate the total line movement from open to close.
(b) What percentage of the total movement occurred due to identifiable sharp action versus other factors?
(c) If you had bet the Over at the opening line of 218.5 and the closing line was 222.0, calculate your CLV in points.
(d) Explain why the injury report moved the line upward. What assumption about the point guard's impact is embedded in the 0.5-point move?
Exercise A.6 --- The Favorite-Longshot Bias
Research has documented that longshot bets (high-odds outcomes) tend to be overpriced, meaning their implied probability overstates their true chance of winning by more than the vig alone explains, while favorites tend to be relatively underpriced.
(a) Explain the favorite-longshot bias in your own words. Why does it persist?
(b) A moneyline market offers Team A at -350 and Team B at +280. Calculate the implied probabilities including vig and the no-vig probabilities (using the multiplicative method).
(c) If the favorite-longshot bias applies to this market, which side is more likely to offer value, and why?
(d) Describe one sport or market type where the favorite-longshot bias has been documented as especially strong, and one where it is weak or reversed.
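For part (b), the multiplicative method divides each raw implied probability by their sum (the overround). A minimal sketch follows. The function names are illustrative, and other vig-removal methods (additive, power, Shin) would give slightly different numbers.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the raw implied probability (vig included)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def remove_vig_multiplicative(odds_a: int, odds_b: int) -> tuple:
    """Scale both raw probabilities by the same factor so they sum to 1."""
    raw_a = american_to_implied(odds_a)
    raw_b = american_to_implied(odds_b)
    overround = raw_a + raw_b
    return raw_a / overround, raw_b / overround


fair_a, fair_b = remove_vig_multiplicative(-350, 280)
print(f"no-vig A: {fair_a:.4f}, no-vig B: {fair_b:.4f}")
```

Because both sides are scaled by the same factor, this method removes proportionally the same amount of vig from the favorite and the longshot, which is worth keeping in mind for part (c).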
Exercise A.7 --- Information Incorporation Speed
At 3:47 PM, a major news outlet reports that a starting quarterback has been ruled out for tonight's game. The following line movements are observed:
| Book | Pre-News Line | 3:48 PM | 3:50 PM | 3:55 PM | 4:00 PM |
|---|---|---|---|---|---|
| Sharp Book A | -7.0 | -5.5 | -5.0 | -4.5 | -4.5 |
| Market Maker B | -7.0 | -6.0 | -5.5 | -5.0 | -4.5 |
| Retail Book C | -7.0 | -7.0 | -6.5 | -6.0 | -5.0 |
| Retail Book D | -7.0 | -7.0 | -7.0 | -6.5 | -5.5 |
(a) Which book incorporates information the fastest? Which is slowest?
(b) If you could place a bet at any book within 30 seconds of the news, which book offers the most value and on which side?
(c) Calculate the value (in spread points) of betting the underdog at Retail Book D at 3:55 PM versus the consensus closing line of -4.5.
(d) Explain why different books have different speeds of adjustment. What structural factors cause this?
Exercise A.8 --- Reverse Line Movement
In an NFL game, 72% of the public bets and 68% of the money are on the home team at -3. Despite this lopsided action, the line moves from -3 to -2.5.
(a) Define "reverse line movement" and explain why it occurs.
(b) What inference can you draw about the nature of the betting action on the away team?
(c) If you observe reverse line movement, does this guarantee that the sharp side is correct? Present a scenario where following RLM would have led to a loss.
(d) What additional data would you want to analyze before using RLM as a betting signal?
Exercise A.9 --- Market Pricing of Key Numbers
In NFL betting, the numbers 3 and 7 are "key numbers" because many games are decided by exactly 3 or 7 points.
(a) Using the historical average that approximately 15% of NFL games are decided by exactly 3 points, explain why lines are "sticky" at -3 and +3.
(b) One book offers Team A -2.5 (-110) and another offers Team A -3 (-120). Assume your model gives Team A exactly a 52% chance of covering a 2.5-point spread and a 48.5% chance of covering a 3-point spread, so the remaining 3.5% is the probability of a push at -3. Calculate the EV of each bet per $100 wagered.
(c) Explain the concept of "buying the half point" and when it is mathematically justified around key numbers.
(d) Why are key numbers less relevant in basketball and baseball betting?
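The EV comparison in part (b) needs a push probability for the -3 bet. The sketch below assumes the push probability equals the gap between the two cover probabilities, since a margin of exactly 3 covers -2.5 but pushes at -3. The function name `ev_per_100` is illustrative.

```python
def ev_per_100(p_win: float, p_push: float, odds: int) -> float:
    """Expected profit on a $100 stake; a push returns the stake (profit 0)."""
    profit = 100 * 100 / -odds if odds < 0 else float(odds)
    p_loss = 1.0 - p_win - p_push
    return p_win * profit - p_loss * 100


p_cover_2_5 = 0.52
p_cover_3 = 0.485
p_push_3 = p_cover_2_5 - p_cover_3   # assumption: margin of exactly 3

ev_minus_2_5 = ev_per_100(p_cover_2_5, 0.0, -110)
ev_minus_3 = ev_per_100(p_cover_3, p_push_3, -120)
print(f"-2.5 (-110): {ev_minus_2_5:+.2f}   -3 (-120): {ev_minus_3:+.2f}")
```

The same function can be reused for part (c) by pricing a bought half point as a change in both the odds and the win and push probabilities.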
Exercise A.10 --- Steam Moves and Market Coordination
A "steam move" occurs when multiple sportsbooks simultaneously adjust their lines on the same game in the same direction within a very short time window.
(a) Describe the mechanism by which steam moves occur. Who initiates them and how do they propagate?
(b) Why do sportsbooks often move lines even when they have not received significant action on the affected game at their own book?
(c) A bettor claims to have a strategy of "chasing steam" --- betting at books that have not yet moved in response to a steam move. Evaluate the long-term viability of this strategy, including at least two obstacles.
(d) How do automated line-feed services and algorithms contribute to the speed and coordination of steam moves in modern betting markets?
Part B: Closing Line Value (10 exercises, 5 points each)
Exercise B.1 --- CLV Calculation Basics
You placed the following five bets during an NFL week:
| Bet | Your Line | Closing Line | Side |
|---|---|---|---|
| Game 1 | KC -3 (-110) | KC -4.5 (-110) | KC -3 |
| Game 2 | BUF +7 (-110) | BUF +6 (-110) | BUF +7 |
| Game 3 | Over 44.5 (-110) | Over 45.5 (-105) | Over 44.5 |
| Game 4 | LAR -1 (-105) | LAR -1 (-110) | LAR -1 |
| Game 5 | NYJ +3 (-115) | NYJ +2.5 (-110) | NYJ +3 |
(a) For each bet, determine whether you achieved positive or negative CLV.
(b) Express the CLV in points (for spread/total bets) and in implied probability terms (for all bets).
(c) What does a consistent pattern of positive CLV suggest about a bettor's process?
Exercise B.2 --- CLV as a Predictor of Profitability
A bettor provides the following summary of their last 1,000 bets:
- Average CLV: +1.8% (in implied probability)
- Win rate: 51.2% on -110 lines
- ROI: +2.1%
(a) Calculate the win rate needed to break even on standard -110 bets.
(b) Is this bettor's actual win rate consistent with their reported CLV? Show the expected ROI given +1.8% CLV.
(c) Explain why CLV is considered a better predictor of long-term profitability than short-term win rate.
(d) If the bettor's CLV dropped to +0.5% over the next 500 bets while their ROI remained at +2.1%, what should they conclude?
Exercise B.3 --- CLV Across Different Markets
Calculate the CLV in implied probability for each of the following bets:
(a) You bet Team A moneyline at +150. The closing line for Team A is +130.
(b) You bet Under 225.5 at -105. The closing line for Under is 223.5 at -110.
(c) You bet Player X Over 22.5 points at -120. The closing line is Over 23.5 at -110.
(d) You bet Team B first-half spread +1.5 at -110. The closing line is +1.0 at -110.
For each, state whether the CLV is positive or negative and calculate the implied probability difference.
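When only the price changes, as in part (a), CLV in implied probability is the closing implied probability minus the implied probability of the price you took. The sketch below uses raw (vig-included) probabilities and illustrative function names. It does not handle parts (b) through (d), where the number itself also moves and a points-to-probability conversion is needed.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the raw implied probability."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def clv_implied(bet_odds: int, closing_odds: int) -> float:
    """Closing implied probability minus bet implied probability.

    Positive means the market closed at a shorter price than the one taken.
    """
    return american_to_implied(closing_odds) - american_to_implied(bet_odds)


clv = clv_implied(150, 130)   # part (a): bet at +150, closed at +130
print(f"CLV: {clv:+.4f}")
```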
Exercise B.4 --- Sample Size and CLV Reliability
(a) A bettor has positive CLV on 55% of their bets over a sample of 50 bets. Calculate the 95% confidence interval for their true CLV hit rate using a normal approximation.
(b) How many bets would the bettor need for the 95% confidence interval to have a total width of no more than 5 percentage points?
(c) Explain why a sample of 200 bets is generally considered the minimum for drawing meaningful conclusions about a bettor's CLV performance.
(d) If a bettor has a true CLV hit rate of 58%, what is the probability that their observed hit rate falls below 50% over a 100-bet sample?
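Parts (a) and (b) both rest on the normal-approximation interval for a proportion. A minimal sketch follows, assuming z = 1.96 for 95% and reading "width" as the total width of the interval (upper bound minus lower bound). The function names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def proportion_ci(p_hat: float, n: int, z: float = Z_95) -> tuple:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def required_n(p_hat: float, total_width: float, z: float = Z_95) -> int:
    """Smallest n whose interval is no wider than total_width."""
    half_width = total_width / 2
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / half_width ** 2)


low, high = proportion_ci(0.55, 50)
n_needed = required_n(0.55, 0.05)
print(f"95% CI: [{low:.4f}, {high:.4f}], bets needed: {n_needed}")
```

Note how wide the interval is at 50 bets. That observation is a useful lead-in to part (c).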
Exercise B.5 --- The Relationship Between CLV and Profitability
A study finds that in a dataset of 10,000 NFL spread bets, the correlation between bet-level CLV (in points) and bet-level profit is 0.12.
(a) Interpret this correlation coefficient. Is it strong, moderate, or weak?
(b) Why is the correlation between CLV and profit at the individual bet level relatively weak, even though CLV is a strong predictor of long-term profitability?
(c) If you grouped bets into bins of 100 and correlated average CLV with average profit per bin, would you expect the correlation to be higher, lower, or the same? Explain.
(d) Design a simulation to demonstrate that a bettor with consistent +2% CLV is very likely to be profitable over 1,000 bets despite high bet-level variance. Describe the parameters you would use.
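One possible design for part (d) is sketched below. It rests on a strong simplifying assumption that you should state in your answer: every bet is placed at -110, and +2% CLV is modeled as a true win probability 2 percentage points above the -110 break-even rate. The function name, seed scheme, and number of trials are illustrative choices.

```python
import random


def simulate_bettor(n_bets: int = 1000, edge: float = 0.02,
                    odds: int = -110, seed: int = 0) -> float:
    """Return the ROI (profit per unit staked) over n_bets flat-stake bets."""
    rng = random.Random(seed)
    payout = 100 / -odds                      # profit per $1 at negative odds
    p_win = -odds / (-odds + 100) + edge      # break-even rate plus assumed edge
    profit = 0.0
    for _ in range(n_bets):
        profit += payout if rng.random() < p_win else -1.0
    return profit / n_bets


# Repeat the 1,000-bet season many times to see the spread of outcomes
rois = [simulate_bettor(seed=s) for s in range(500)]
mean_roi = sum(rois) / len(rois)
share_profitable = sum(r > 0 for r in rois) / len(rois)
print(f"mean ROI: {mean_roi:+.4f}, profitable seasons: {share_profitable:.1%}")
```

Under these assumptions, most but not all simulated 1,000-bet samples finish in profit, which is the point about bet-level variance.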
Exercise B.6 --- CLV and Market Type
A bettor tracks their CLV across different market types over 500 bets:
| Market Type | Bets | Avg CLV (implied prob) | Win Rate |
|---|---|---|---|
| NFL spreads | 150 | +1.5% | 53.3% |
| NFL totals | 100 | +2.1% | 55.0% |
| NBA spreads | 120 | +0.8% | 51.7% |
| NBA player props | 80 | +3.2% | 54.0% (at mixed odds) |
| MLB moneylines | 50 | +1.0% | Not comparable (varied odds) |
(a) Which market type shows the highest CLV? What might explain this?
(b) The NBA spread CLV is +0.8%. Is the corresponding 51.7% win rate significantly different from the break-even rate at the 95% level, given the sample size of 120 bets? Use a one-sample proportion test, assuming standard -110 pricing for the break-even rate.
(c) Why might player props markets offer more CLV opportunity than main market spreads?
(d) Advise this bettor on how to allocate their time and bankroll across these markets based on the CLV data.
Exercise B.7 --- Negative CLV Analysis
A bettor discovers that they have average CLV of -1.3% over 300 bets, yet they have an ROI of +1.5%.
(a) Is it possible for a bettor to be profitable with negative CLV? Explain the mathematical conditions under which this could occur.
(b) What are three possible explanations for this apparent contradiction?
(c) Should the bettor be concerned about their long-term prospects despite the positive ROI? Why or why not?
(d) What would you recommend the bettor do to investigate whether their positive ROI is sustainable?
Exercise B.8 --- CLV Calculation with Vig Adjustment
When calculating CLV, some analysts use the raw implied probability from the offered odds, while others remove the vig first.
(a) You bet Team A -3 at -110 (implied probability 52.38%). The closing line is Team A -4 (-108) with the opposing side at +4 (-112). Calculate the closing no-vig probability for Team A -4.
(b) Calculate your CLV using (i) the raw closing implied probability and (ii) the no-vig closing probability.
(c) Which method is more accurate for measuring your true edge? Explain.
(d) Over a large sample, how would the two methods differ in their average CLV estimates?
Exercise B.9 --- CLV Distribution Analysis
You have CLV data for 400 bets with the following distribution:
- Mean CLV: +1.2%
- Standard deviation of CLV: 4.8%
- Skewness: +0.3
(a) What percentage of individual bets have negative CLV, assuming an approximately normal distribution?
(b) Calculate the standard error of the mean CLV.
(c) Construct a 99% confidence interval for the true mean CLV.
(d) The positive skewness suggests that a few bets have very large positive CLV. What types of bets or market situations might produce these outliers?
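Parts (a) through (c) can be checked with the standard library's `statistics.NormalDist`. The sketch below takes the normality assumption from part (a) at face value and ignores the stated skewness, so treat the outputs as approximations.

```python
import math
from statistics import NormalDist

mean_clv, sd_clv, n = 1.2, 4.8, 400   # in percent, from this exercise

# (a) share of bets below zero under a normal model
share_negative = NormalDist(mean_clv, sd_clv).cdf(0.0)

# (b) standard error of the mean
standard_error = sd_clv / math.sqrt(n)

# (c) two-sided 99% interval for the mean
z_99 = NormalDist().inv_cdf(0.995)
ci_low = mean_clv - z_99 * standard_error
ci_high = mean_clv + z_99 * standard_error

print(f"negative share: {share_negative:.4f}")
print(f"SE: {standard_error:.3f}, 99% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```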
Exercise B.10 --- Building a CLV Tracking System
Design a CLV tracking system (describe the architecture; do not code it) that includes:
(a) Data collection: What data points must be recorded at the time of bet placement and at market close? List at least eight fields.
(b) Calculation engine: Describe the algorithm for computing CLV for spread bets, total bets, and moneyline bets. Note any differences in the calculation methodology.
(c) Reporting: Design a weekly CLV report template that a bettor would use to evaluate their performance. Include at least five metrics.
(d) Alerting: Describe two automated alerts the system should generate based on CLV trends.
Part C: Applied Analysis and Programming (10 exercises, 6 points each)
Exercise C.1 --- Market Efficiency Test
Write a Python script that tests the efficiency of a sports betting market using the following approach:
(a) Generate (or load) a dataset of 1,000 games with closing implied probabilities and binary outcomes (win/loss).
(b) Bin the games by implied probability (in 5% increments from 50% to 85%).
(c) For each bin, calculate the actual win rate and the 95% confidence interval.
(d) Plot a calibration chart and calculate the Brier score for the market's closing probabilities.
(e) Interpret your results: is the market well-calibrated?
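For the Brier score in part (d), a minimal standard-library sketch on synthetic data is shown below. The data-generating assumption is that outcomes are drawn from the closing probability itself, that is, a perfectly calibrated market. The seed and function name are arbitrary illustrative choices. Binning, confidence intervals, and plotting are left to your script.

```python
import random

rng = random.Random(11)  # fixed seed so the synthetic data is reproducible

# Synthetic favorites: closing implied probability between 50% and 85%,
# with each outcome drawn from that same probability.
probs = [rng.uniform(0.50, 0.85) for _ in range(1000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


score = brier_score(probs, outcomes)
baseline = brier_score([0.5] * len(outcomes), outcomes)  # uninformative forecast
print(f"Brier score: {score:.4f}  (coin-flip baseline: {baseline:.4f})")
```

Comparing against the 0.25 coin-flip baseline gives the score some context when you write up part (e).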
Exercise C.2 --- Line Movement Tracker
Build a Python class LineMovementTracker that:
(a) Accepts a game identifier and records timestamped line updates (time, line, book name).
(b) Computes total movement, sharp action periods (movements > 0.5 points in < 30 minutes), and the percentage of total movement attributable to sharp action.
(c) Identifies the opening and closing lines.
(d) Generates a text-based summary of the line movement history.
Test your class with at least three different movement scenarios.
Exercise C.3 --- CLV Calculator
Implement a Python function calculate_clv() that:
(a) Accepts bet type (spread, total, moneyline), your odds at bet time, and the closing odds.
(b) Computes CLV in implied probability terms, handling the vig removal for both your line and the closing line.
(c) Returns a dictionary with: raw CLV, no-vig CLV, your implied probability, closing implied probability, and edge estimate.
(d) Test with at least five different bet types and odds combinations.
Exercise C.4 --- Sharp Money Detector
Write a Python function that takes as input:
- Percentage of bets on each side
- Percentage of money on each side
- Line movement direction and magnitude
The function should output a "sharp indicator" score from -10 (strong sharp lean on Side A) to +10 (strong sharp lean on Side B), along with a text explanation of the signals detected.
Exercise C.5 --- Market Efficiency Simulation
Create a Monte Carlo simulation in Python that:
(a) Simulates a betting market with an efficient market maker (closing line reflects true probability + noise with standard deviation of 1%).
(b) Simulates 1,000 bettors: 950 recreational (random picks) and 50 sharp (models with 1-3% edge before vig).
(c) Tracks each bettor's CLV and profit over 500 simulated bets.
(d) Produces a scatter plot of average CLV vs. ROI across all bettors and calculates the correlation.
Exercise C.6 --- Key Number Analysis
Using the provided NFL historical data (or synthetic data), write a Python analysis that:
(a) Calculates the frequency of each final margin from 1 to 20 points over a 5-season dataset (historical or simulated).
(b) Identifies the key numbers (margins that occur with disproportionate frequency).
(c) Quantifies the value of buying a half-point onto or off of each key number.
(d) Creates a visualization showing the distribution of margins and highlighting key numbers.
Exercise C.7 --- Information Incorporation Speed Analysis
Design and implement a Python simulation that models how different types of sportsbooks incorporate new information:
(a) A "sharp" book that adjusts within 1 minute of information arrival.
(b) A "market maker" book that adjusts within 3 minutes.
(c) A "retail" book that adjusts within 10-15 minutes.
(d) Measure the theoretical profit available to a bettor who can place bets at retail books within 2 minutes of information reaching sharp books.
Exercise C.8 --- Steam Move Detector
Write a Python function detect_steam_move() that analyzes a stream of line movements from multiple books and identifies a steam move when:
- Three or more books move in the same direction
- Within a 5-minute window
- By at least 0.5 points each
The function should return the games affected, the direction and magnitude, and the timestamp of the steam move.
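The three criteria above could be sketched as follows. The input layout (tuples of timestamp, game, book, old line, new line), the "up"/"down" direction labels, the averaged magnitude, and the sample data are all assumptions made for illustration. A real line feed would dictate its own record format.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detect_steam_move(moves, min_books=3, window=timedelta(minutes=5),
                      min_move=0.5):
    """moves: iterable of (timestamp, game_id, book, old_line, new_line)."""
    by_game_dir = defaultdict(list)
    for ts, game, book, old, new in moves:
        delta = new - old
        if abs(delta) >= min_move:            # ignore moves under the threshold
            direction = "up" if delta > 0 else "down"
            by_game_dir[(game, direction)].append((ts, book, abs(delta)))

    steam = []
    for (game, direction), events in by_game_dir.items():
        events.sort(key=lambda e: e[0])
        for i, (start, _, _) in enumerate(events):
            in_window = [e for e in events[i:] if e[0] - start <= window]
            books = {book for _, book, _ in in_window}
            if len(books) >= min_books:
                avg = sum(m for _, _, m in in_window) / len(in_window)
                steam.append({"game": game, "direction": direction,
                              "magnitude": avg, "timestamp": start,
                              "books": sorted(books)})
                break  # report the first qualifying window per game/direction
    return steam


t0 = datetime(2024, 1, 7, 15, 47)  # hypothetical timestamp
moves = [
    (t0, "G1", "A", -7.0, -5.5),
    (t0 + timedelta(minutes=1), "G1", "B", -7.0, -6.0),
    (t0 + timedelta(minutes=3), "G1", "C", -7.0, -6.5),
    (t0 + timedelta(minutes=2), "G2", "A", 44.5, 45.0),
]
result = detect_steam_move(moves)
print(result)
```

Counting distinct books, rather than raw events, keeps one book that moves twice from being mistaken for two.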
Exercise C.9 --- Calibration Analysis Tool
Build a comprehensive calibration analysis tool in Python that:
(a) Accepts a dataset of (predicted_probability, actual_outcome) pairs.
(b) Generates a calibration plot with confidence bands.
(c) Computes the Brier score, log loss, and expected calibration error (ECE).
(d) Performs the Hosmer-Lemeshow goodness-of-fit test.
(e) Outputs a written interpretation of the results.
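Two of the metrics in part (c) can be sketched with the standard library alone. The version of ECE below uses equal-width bins weighted by bin size, which is one common convention. The bin count, the clipping constant, and the toy inputs are illustrative assumptions. The Hosmer-Lemeshow test and the plot are not covered here.

```python
import math


def log_loss(probs, outcomes, eps=1e-12):
    """Average negative log-likelihood of the outcomes under the forecasts."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)          # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Size-weighted mean gap between average forecast and win rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(avg_p - win_rate)
    return ece


outcomes = [1] * 6 + [0] * 4                    # toy sample: 6 wins in 10
ece_good = expected_calibration_error([0.6] * 10, outcomes)
ece_bad = expected_calibration_error([0.8] * 10, outcomes)
ll_good = log_loss([0.6] * 10, outcomes)
print(f"ECE at 0.6: {ece_good:.3f}, ECE at 0.8: {ece_bad:.3f}, "
      f"log loss at 0.6: {ll_good:.4f}")
```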
Exercise C.10 --- Full Market Analysis Pipeline
Combine the tools from Exercises C.1-C.9 into a single pipeline that:
(a) Loads a season's worth of betting data (use synthetic data with at least 2,000 games).
(b) Runs market efficiency tests on closing lines.
(c) Analyzes line movement patterns and identifies the percentage of games with significant sharp action.
(d) Calculates CLV for a simulated bettor who bets 200 games using a model with known edge.
(e) Produces a comprehensive report with at least five tables and three visualizations summarizing the findings.
Scoring Summary
| Section | Exercises | Points Each | Total |
|---|---|---|---|
| Part A: Market Structure and Efficiency | 10 | 5 | 50 |
| Part B: Closing Line Value | 10 | 5 | 50 |
| Part C: Applied Analysis and Programming | 10 | 6 | 60 |
| Total | 30 | --- | 160 |