Chapter 6 Exercises: Descriptive Statistics for Sports
Part A: Measures of Central Tendency (Exercises 1-6)
Exercise 1: NFL Scoring Averages
The following table shows points scored by the Kansas City Chiefs in their first 10 games of a season:
| Game | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pts | 34 | 20 | 41 | 17 | 27 | 30 | 44 | 23 | 31 | 26 |
(a) Calculate the mean points scored per game.
(b) Calculate the median points scored per game.
(c) If the sportsbook set an over/under line at 28.5 for their next game, which measure of central tendency would better inform your betting decision? Explain your reasoning.
(d) Suppose Game 7 (44 points) was actually a data entry error and should have been 24 points. Recalculate the mean and median. Which measure changed more? What does this tell you about robustness?
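One way to check parts (a), (b), and (d) is Python's standard-library statistics module. This is a minimal sketch; the list simply re-enters the table, and corrected is an illustrative name for the fixed data.

```python
from statistics import mean, median

points = [34, 20, 41, 17, 27, 30, 44, 23, 31, 26]

# Part (d): Game 7 is index 6; replace the mistyped 44 with 24
corrected = points.copy()
corrected[6] = 24

for label, data in [("original", points), ("corrected", corrected)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```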
Exercise 2: Weighted Batting Averages
A baseball player has the following batting averages across different pitch types:
| Pitch Type | At-Bats | Batting Avg |
|---|---|---|
| Fastball | 180 | .312 |
| Curveball | 65 | .245 |
| Slider | 90 | .278 |
| Changeup | 45 | .200 |
| Cutter | 20 | .350 |
(a) Calculate the simple (unweighted) mean of the batting averages.
(b) Calculate the weighted mean batting average using at-bats as weights.
(c) Why does the weighted mean differ from the simple mean? Which is more meaningful for evaluating the player's overall performance?
(d) A sportsbook offers a prop bet on whether this player will get a hit in his next at-bat. The implied probability from the odds is 30%. If the next pitch is expected to be a fastball, is there value in this bet? What if the pitch type is unknown?
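The weighted mean in part (b) is total hits divided by total at-bats. A minimal sketch, assuming the averages in the table are exact:

```python
# At-bats and batting average per pitch type, copied from the table above
splits = {
    "Fastball": (180, 0.312),
    "Curveball": (65, 0.245),
    "Slider": (90, 0.278),
    "Changeup": (45, 0.200),
    "Cutter": (20, 0.350),
}

# Part (a): every pitch type counts equally
averages = [avg for _, avg in splits.values()]
simple_mean = sum(averages) / len(averages)

# Part (b): weight each average by its at-bats (= total hits / total at-bats)
total_ab = sum(ab for ab, _ in splits.values())
weighted_mean = sum(ab * avg for ab, avg in splits.values()) / total_ab

print(f"simple = {simple_mean:.4f}, weighted = {weighted_mean:.4f}")
```

Comparing either figure, and the fastball average on its own, with the 30% implied probability (0.300) is the starting point for part (d).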
Exercise 3: Trimmed Mean for Soccer Goals
The following data shows goals scored per game by a Premier League team over 20 matches:
0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 7, 2, 1, 1, 3, 0, 2, 1, 1, 6
(a) Calculate the mean, median, and mode.
(b) Calculate the 10% trimmed mean (remove the top and bottom 10% of values before averaging).
(c) Calculate the 20% trimmed mean.
(d) The team's over/under line is set at 1.5 goals for their next match. Based on your calculations, would you lean toward the over or under? Justify using the appropriate central tendency measure.
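A sketch for parts (a) through (c). The helper trimmed_mean is hypothetical; it removes int(proportion * n) values from each end of the sorted data, which is the same truncation rule scipy.stats.trim_mean documents, so the two should agree.

```python
from statistics import mean, median, multimode

goals = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 7, 2, 1, 1, 3, 0, 2, 1, 1, 6]

def trimmed_mean(values, proportion):
    """Sort, drop int(proportion * n) values from each end, average the rest."""
    data = sorted(values)
    k = int(proportion * len(data))
    return mean(data[k:len(data) - k])

print("mean:", mean(goals), "median:", median(goals), "mode:", multimode(goals))
print("10% trimmed:", trimmed_mean(goals, 0.10))
print("20% trimmed:", trimmed_mean(goals, 0.20))
```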
Exercise 4: Comparing Averages Across Leagues
You are comparing scoring environments across four basketball leagues:
| League | Mean PPG | Median PPG | Mode PPG |
|---|---|---|---|
| NBA | 112.4 | 111.0 | 108 |
| EuroLeague | 79.8 | 80.5 | 82 |
| NBL (Aus) | 88.2 | 87.0 | 85 |
| CBA (China) | 105.6 | 104.0 | 101 |
(a) For each league, determine the skewness direction based on the relationship between mean, median, and mode.
(b) Explain why the NBA distribution is right-skewed. What types of games create the long right tail?
(c) If you were building a model to predict game totals in the EuroLeague, would you use the mean or median as your baseline? Why?
Exercise 5: Programming Central Tendency
Write a Python function that accepts a list of game scores and returns a dictionary containing:
- Mean
- Median
- Mode (handle multimodal cases)
- 10% trimmed mean
- Geometric mean (useful for rates and ratios)
Test your function with the following NBA team game scores:
scores = [98, 105, 112, 88, 134, 101, 99, 107, 95, 110,
103, 115, 92, 108, 121, 97, 106, 100, 113, 104]
Verify your results against numpy and scipy.stats built-in functions.
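A minimal sketch of the requested function using only the standard library (statistics.geometric_mean needs Python 3.8 or later). The function name central_tendency and the trim parameter are illustrative choices, not part of the exercise.

```python
from statistics import geometric_mean, mean, median, multimode

def central_tendency(scores, trim=0.10):
    """Return common central-tendency measures for a list of positive scores."""
    data = sorted(scores)
    k = int(trim * len(data))  # values cut from each end for the trimmed mean
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),  # every value tied for the highest count
        "trimmed_mean": mean(data[k:len(data) - k]),
        "geometric_mean": geometric_mean(data),  # requires positive values
    }

scores = [98, 105, 112, 88, 134, 101, 99, 107, 95, 110,
          103, 115, 92, 108, 121, 97, 106, 100, 113, 104]
result = central_tendency(scores)
for name, value in result.items():
    print(name, value)
```

Because every score in this sample is unique, multimode returns all 20 values, so the report should say the data has no single mode. For the cross-check, numpy.mean, numpy.median, scipy.stats.trim_mean(scores, 0.1), and scipy.stats.gmean(scores) cover the same quantities.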
Exercise 6: Moving Averages for Trend Detection
An NHL team's goals scored over 15 consecutive games are:
2, 3, 1, 4, 2, 5, 3, 4, 6, 3, 5, 4, 7, 5, 6
(a) Calculate the 3-game simple moving average for games 3 through 15.
(b) Calculate the 5-game simple moving average for games 5 through 15.
(c) Calculate the exponentially weighted moving average (EWMA) with alpha = 0.3 for all 15 games.
(d) Plot all three moving averages on the same graph. Which one is most responsive to recent performance changes? Which would you prefer for betting purposes?
(e) Write Python code to compute and plot all three moving averages.
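A sketch of the three smoothers for parts (a) to (c). The EWMA is seeded with the first game's value, an assumption that matches pandas Series.ewm(alpha=0.3, adjust=False).mean(); other seeding choices change the early values.

```python
goals = [2, 3, 1, 4, 2, 5, 3, 4, 6, 3, 5, 4, 7, 5, 6]

def sma(values, window):
    """Simple moving average; the first value corresponds to game `window`."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

def ewma(values, alpha):
    """EWMA seeded with the first observation: s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

sma3 = sma(goals, 3)   # games 3-15
sma5 = sma(goals, 5)   # games 5-15
ew = ewma(goals, 0.3)  # games 1-15
print([round(v, 2) for v in sma3])
print([round(v, 2) for v in sma5])
print([round(v, 2) for v in ew])
```

For part (e), plot sma3 against games 3 to 15, sma5 against games 5 to 15, and ew against games 1 to 15, for example with matplotlib's plt.plot.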
Part B: Measures of Variability (Exercises 7-12)
Exercise 7: Team Consistency Analysis
Two NFL quarterbacks have the following passer ratings over 8 games:
QB A: 98.2, 105.7, 92.4, 110.3, 95.8, 102.1, 99.6, 107.4
QB B: 72.5, 130.2, 55.8, 142.1, 88.3, 115.6, 68.9, 138.7
(a) Calculate the mean, variance, and standard deviation for each quarterback.
(b) Calculate the coefficient of variation (CV) for each.
(c) Calculate the interquartile range (IQR) for each.
(d) Which quarterback would you prefer to bet on for a prop bet requiring at least a 95.0 passer rating? Show the reasoning using the empirical rule (assuming approximate normality).
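A sketch for parts (a), (b), and (d) using statistics.NormalDist (Python 3.8+). It uses the sample standard deviation (n - 1 denominator) and treats each quarterback's ratings as normal, the same approximation the exercise asks for; prob_at_least is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

qb_a = [98.2, 105.7, 92.4, 110.3, 95.8, 102.1, 99.6, 107.4]
qb_b = [72.5, 130.2, 55.8, 142.1, 88.3, 115.6, 68.9, 138.7]

def prob_at_least(ratings, threshold):
    """Normal-approximation probability that the next rating is >= threshold."""
    dist = NormalDist(mean(ratings), stdev(ratings))  # sample std (n - 1)
    return 1 - dist.cdf(threshold)

for name, ratings in [("QB A", qb_a), ("QB B", qb_b)]:
    cv = stdev(ratings) / mean(ratings)
    print(f"{name}: mean={mean(ratings):.2f} sd={stdev(ratings):.2f} "
          f"CV={cv:.3f} P(>=95)={prob_at_least(ratings, 95.0):.3f}")
```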
Exercise 8: Spread Variability and ATS Records
The following data shows the margin of victory (positive = win, negative = loss) for a college football team over 12 games:
14, -3, 7, 28, -10, 3, 21, -7, 10, 35, 1, -14
The point spreads for these games were:
-7, 3, -3, -14, 6, -1, -10, 4, -6, -17, -2, 8
(a) Calculate the mean and standard deviation of the actual margins.
(b) Calculate the ATS (against the spread) margin for each game: ATS margin = actual margin + spread. Because a favored team's spread is negative, a positive ATS margin means the team covered.
(c) Calculate the mean and standard deviation of the ATS margins.
(d) In what percentage of games did the team cover the spread? Treat an ATS margin of exactly 0 as a push. Is this consistent with the expected ~50% rate?
(e) Calculate the variance of the ATS margins. How does this compare to the variance of actual margins? What does this relationship tell you about how well the spreads captured the team's true ability?
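The ATS arithmetic in parts (b) to (e) can be sketched as follows. Pushes are excluded from the cover rate here, a common convention but an assumption.

```python
from statistics import mean, stdev, variance

margins = [14, -3, 7, 28, -10, 3, 21, -7, 10, 35, 1, -14]
spreads = [-7, 3, -3, -14, 6, -1, -10, 4, -6, -17, -2, 8]

# A negative spread means the team was favored, so adding the spread
# subtracts the points it was expected to win by.
ats = [m + s for m, s in zip(margins, spreads)]

covers = sum(a > 0 for a in ats)
pushes = sum(a == 0 for a in ats)
losses = sum(a < 0 for a in ats)
cover_rate = covers / (covers + losses)  # pushes excluded

print(f"margins: mean={mean(margins):.2f} sd={stdev(margins):.2f} var={variance(margins):.1f}")
print(f"ATS:     mean={mean(ats):.2f} sd={stdev(ats):.2f} var={variance(ats):.1f}")
print(f"record {covers}-{losses}-{pushes}, cover rate {cover_rate:.1%}")
```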
Exercise 9: Coefficient of Variation Across Sports
Calculate the coefficient of variation for scoring in each sport using the following season data:
NFL (Points per game, 10 teams):
23.5, 28.1, 19.8, 25.4, 30.2, 21.7, 26.9, 18.3, 24.6, 27.8
NBA (Points per game, 10 teams):
108.2, 115.7, 102.4, 110.9, 119.3, 105.8, 112.6, 99.1, 107.5, 114.8
MLB (Runs per game, 10 teams):
4.2, 5.1, 3.8, 4.7, 5.5, 3.5, 4.9, 3.2, 4.4, 5.0
NHL (Goals per game, 10 teams):
2.8, 3.4, 2.5, 3.1, 3.7, 2.3, 3.3, 2.1, 2.9, 3.5
(a) For each sport, calculate the mean, standard deviation, and CV.
(b) Which sport has the most relative variability in team scoring? Which has the least?
(c) How does scoring variability affect the difficulty of setting accurate point spreads and totals? Which sport should theoretically have the most accurate lines?
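A sketch for part (a). CV here is the sample standard deviation divided by the mean; switching to the population standard deviation scales every CV by the same factor (all samples have n = 10), so the ranking in part (b) does not change.

```python
from statistics import mean, stdev

scoring = {
    "NFL": [23.5, 28.1, 19.8, 25.4, 30.2, 21.7, 26.9, 18.3, 24.6, 27.8],
    "NBA": [108.2, 115.7, 102.4, 110.9, 119.3, 105.8, 112.6, 99.1, 107.5, 114.8],
    "MLB": [4.2, 5.1, 3.8, 4.7, 5.5, 3.5, 4.9, 3.2, 4.4, 5.0],
    "NHL": [2.8, 3.4, 2.5, 3.1, 3.7, 2.3, 3.3, 2.1, 2.9, 3.5],
}

# CV = sample standard deviation / mean, a unit-free measure of spread
cv = {sport: stdev(vals) / mean(vals) for sport, vals in scoring.items()}
for sport, value in sorted(cv.items(), key=lambda kv: kv[1]):
    vals = scoring[sport]
    print(f"{sport}: mean={mean(vals):.2f} sd={stdev(vals):.3f} CV={value:.3f}")
```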
Exercise 10: Range-Based Volatility
A basketball player's scoring over 20 games is:
22, 18, 31, 15, 28, 24, 19, 35, 12, 27,
20, 33, 16, 25, 29, 21, 14, 38, 23, 26
(a) Calculate the range.
(b) Calculate the IQR.
(c) Identify any outliers using the 1.5 * IQR rule.
(d) Calculate the mean absolute deviation (MAD).
(e) A sportsbook sets the player's points prop at 23.5. Using the standard deviation, estimate the probability the player scores over 23.5 (assume normality). Does this match the actual proportion from the data?
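A sketch for parts (a) through (e). Quartile conventions differ between textbooks and libraries; this uses statistics.quantiles with method="inclusive", which interpolates linearly between order statistics like numpy.percentile's default, so hand-computed quartiles may differ slightly.

```python
from statistics import NormalDist, mean, quantiles, stdev

points = [22, 18, 31, 15, 28, 24, 19, 35, 12, 27,
          20, 33, 16, 25, 29, 21, 14, 38, 23, 26]

data_range = max(points) - min(points)

q1, _, q3 = quantiles(points, n=4, method="inclusive")
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in points if x < low_fence or x > high_fence]

avg = mean(points)
mean_abs_dev = sum(abs(x - avg) for x in points) / len(points)

# Normal approximation vs. the observed share of games over 23.5
p_normal = 1 - NormalDist(avg, stdev(points)).cdf(23.5)
p_actual = sum(x > 23.5 for x in points) / len(points)

print(f"range={data_range} Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
print(f"MAD={mean_abs_dev:.2f} P_normal(>23.5)={p_normal:.3f} actual={p_actual:.2f}")
```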
Exercise 11: Programming Variability Metrics
Write a Python class called SportVariability that:
- Accepts a list of numeric data points (e.g., game scores).
- Computes and stores: variance, standard deviation, CV, IQR, range, MAD (mean absolute deviation, as in Exercise 10), and median absolute deviation.
- Has a method consistency_rating() that returns "High," "Medium," or "Low" based on the CV (thresholds: CV < 0.10 = High, 0.10-0.25 = Medium, > 0.25 = Low).
- Has a method identify_outliers(method='iqr') supporting both the IQR method and the z-score method.
- Has a summary() method that prints a formatted report.
Test with data for two contrasting teams.
Exercise 12: Rolling Volatility
Using the following 20-game scoring sequence for an NBA team:
102, 98, 115, 108, 95, 121, 104, 99, 110, 113,
107, 125, 96, 118, 103, 130, 94, 112, 109, 128
(a) Calculate the 5-game rolling standard deviation for games 5 through 20.
(b) Plot the rolling standard deviation over time.
(c) Identify periods of high and low volatility.
(d) If a sportsbook adjusts their lines slowly, during which periods would you expect to find the most value? Why?
(e) Write Python code that computes rolling volatility and flags games where the rolling std exceeds 1.5 times the overall std.
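A sketch for parts (a) and (e) using the sample standard deviation. pandas users can get the same rolling values from pd.Series(scores).rolling(5).std(), which also defaults to ddof=1 and returns NaN for the first four games.

```python
from statistics import stdev

scores = [102, 98, 115, 108, 95, 121, 104, 99, 110, 113,
          107, 125, 96, 118, 103, 130, 94, 112, 109, 128]
window = 5

# rolling[0] is the std of games 1-5, rolling[-1] the std of games 16-20
rolling = [stdev(scores[i - window:i]) for i in range(window, len(scores) + 1)]

overall = stdev(scores)
# enumerate(..., start=window) labels each value with the game its window ends on
flagged_games = [game for game, sd in enumerate(rolling, start=window)
                 if sd > 1.5 * overall]

for game, sd in enumerate(rolling, start=window):
    print(f"game {game}: rolling sd = {sd:.2f}")
print(f"overall sd = {overall:.2f}, flagged games: {flagged_games}")
```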
Part C: Distribution Shape and Correlation (Exercises 13-18)
Exercise 13: Skewness and Kurtosis
Calculate the skewness and kurtosis for the following three NFL teams' point differentials over 16 games:
Team A (Dominant):
14, 7, 21, 10, 3, 17, 28, 6, 11, 14, 24, 8, 19, 13, 7, 10
Team B (Average):
3, -7, 10, -3, 7, -1, 14, -10, 5, -5, 8, -8, 2, -4, 11, -6
Team C (Erratic):
35, -21, 1, -28, 42, -3, -17, 30, -14, 7, -35, 24, -9, 38, -24, 3
(a) Calculate the mean, standard deviation, skewness, and kurtosis for each team. Report excess kurtosis (kurtosis minus 3), so that a normal distribution scores 0.
(b) Create histograms for each team. Describe the shape of each distribution.
(c) Which team would be hardest for a sportsbook to set accurate lines for? Explain using the statistical measures.
(d) For each team, estimate the probability of a point differential exceeding +14 using both (i) the normal approximation and (ii) the actual data. Discuss any discrepancies.
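A sketch of the moment-based formulas behind parts (a) and (d). These are the population (biased) versions that scipy.stats.skew and scipy.stats.kurtosis return by default, with kurtosis reported as excess kurtosis (normal = 0); bias-corrected sample formulas give somewhat different numbers for n = 16. The helper moments is hypothetical.

```python
from statistics import NormalDist, mean, stdev

teams = {
    "Team A": [14, 7, 21, 10, 3, 17, 28, 6, 11, 14, 24, 8, 19, 13, 7, 10],
    "Team B": [3, -7, 10, -3, 7, -1, 14, -10, 5, -5, 8, -8, 2, -4, 11, -6],
    "Team C": [35, -21, 1, -28, 42, -3, -17, 30, -14, 7, -35, 24, -9, 38, -24, 3],
}

def moments(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

results = {}
for name, diffs in teams.items():
    skew, ex_kurt = moments(diffs)
    p_normal = 1 - NormalDist(mean(diffs), stdev(diffs)).cdf(14)  # part (d)(i)
    p_actual = sum(d > 14 for d in diffs) / len(diffs)            # part (d)(ii)
    results[name] = (mean(diffs), stdev(diffs), skew, ex_kurt, p_normal, p_actual)
    print(f"{name}: mean={mean(diffs):.2f} sd={stdev(diffs):.2f} skew={skew:+.3f} "
          f"excess kurt={ex_kurt:+.3f} P(>14) normal={p_normal:.3f} actual={p_actual:.3f}")
```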
Exercise 14: Correlation Between Offensive and Defensive Stats
The following data shows season averages for 10 NBA teams:
| Team | PPG (Offense) | Opp PPG (Defense) | Rebounds | Assists | Win% |
|---|---|---|---|---|---|
| A | 114.2 | 108.5 | 45.3 | 26.1 | .634 |
| B | 108.7 | 105.2 | 43.8 | 24.5 | .573 |
| C | 112.5 | 112.8 | 44.1 | 25.8 | .488 |
| D | 105.3 | 100.1 | 46.2 | 22.9 | .622 |
| E | 118.9 | 115.4 | 42.7 | 28.3 | .524 |
| F | 101.4 | 107.6 | 47.5 | 21.4 | .427 |
| G | 110.8 | 103.9 | 44.9 | 25.2 | .610 |
| H | 116.1 | 113.7 | 41.6 | 27.6 | .512 |
| I | 103.9 | 98.4 | 48.1 | 23.1 | .659 |
| J | 107.2 | 110.3 | 45.7 | 24.0 | .451 |
(a) Calculate the Pearson correlation coefficient between PPG and Win%.
(b) Calculate the Pearson correlation coefficient between Opp PPG and Win%.
(c) Calculate the correlation between Net Rating (PPG - Opp PPG) and Win%.
(d) Which single variable is the best predictor of Win%? Is offense or defense more correlated with winning?
(e) Create a correlation matrix for all five variables and identify the three strongest correlations.
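A standard-library sketch of the Pearson calculations in parts (a) to (c). statistics.correlation (Python 3.10+) or numpy.corrcoef should give the same values, and numpy.corrcoef applied to all five columns produces the matrix for part (e).

```python
from math import sqrt

ppg = [114.2, 108.7, 112.5, 105.3, 118.9, 101.4, 110.8, 116.1, 103.9, 107.2]
opp_ppg = [108.5, 105.2, 112.8, 100.1, 115.4, 107.6, 103.9, 113.7, 98.4, 110.3]
win_pct = [.634, .573, .488, .622, .524, .427, .610, .512, .659, .451]
net = [o - d for o, d in zip(ppg, opp_ppg)]  # Net Rating as defined in part (c)

def pearson(x, y):
    """Pearson r = cov(x, y) / (sd_x * sd_y); the n - 1 factors cancel."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

for label, xs in [("PPG", ppg), ("Opp PPG", opp_ppg), ("Net", net)]:
    print(f"r({label}, Win%) = {pearson(xs, win_pct):+.3f}")
```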
Exercise 15: Scatter Plot Analysis
Using the data from Exercise 14:
(a) Create a scatter plot of Net Rating vs. Win% with a linear regression line.
(b) Calculate the R-squared value. What percentage of the variance in Win% is explained by Net Rating?
(c) Based on the regression equation, what Net Rating corresponds to a .500 Win%?
(d) If a team has a Net Rating of +5.0, what Win% does the model predict? How confident should you be in this prediction?
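Ordinary least squares with one predictor reduces to a few sums, sketched below for parts (b) to (d). The Net Rating values are PPG minus Opp PPG from the Exercise 14 table; with only ten teams, treat any prediction as a rough point estimate.

```python
net = [5.7, 3.5, -0.3, 5.2, 3.5, -6.2, 6.9, 2.4, 5.5, -3.1]  # PPG - Opp PPG
win_pct = [.634, .573, .488, .622, .524, .427, .610, .512, .659, .451]

n = len(net)
mx, my = sum(net) / n, sum(win_pct) / n
sxx = sum((x - mx) ** 2 for x in net)
sxy = sum((x - mx) * (y - my) for x, y in zip(net, win_pct))

slope = sxy / sxx               # least-squares slope
intercept = my - slope * mx     # the fitted line passes through (mean x, mean y)
predicted = [intercept + slope * x for x in net]
ss_res = sum((y - p) ** 2 for y, p in zip(win_pct, predicted))
ss_tot = sum((y - my) ** 2 for y in win_pct)
r_squared = 1 - ss_res / ss_tot

net_at_500 = (0.500 - intercept) / slope   # part (c)
win_at_plus5 = intercept + slope * 5.0     # part (d)
print(f"Win% = {intercept:.4f} + {slope:.4f} * Net, R^2 = {r_squared:.3f}")
print(f"Net for .500: {net_at_500:+.2f}; predicted Win% at +5.0: {win_at_plus5:.3f}")
```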
Exercise 16: Correlation vs. Causation in Sports
Consider the following correlations found in real sports data:
| Variable Pair | Correlation (r) |
|---|---|
| NFL team rushing yards per game vs. Win% | +0.42 |
| NFL team time of possession vs. Win% | +0.38 |
| NFL team third-down conversion rate vs. Win% | +0.61 |
| NBA team three-point attempts vs. Win% | +0.15 |
| MLB team batting average vs. Win% | +0.44 |
| MLB team ERA vs. Win% | -0.67 |
(a) For each pair, discuss whether the relationship is likely causal, partially causal, or spurious.
(b) NFL teams that are winning tend to run the ball more in the second half. How does this confound the rushing yards vs. Win% correlation? What is this phenomenon called?
(c) If you were building a betting model, which correlations above would you rely on most heavily? Which would you treat with skepticism?
(d) Describe a method to test whether a correlation between two sports statistics is genuinely predictive rather than just descriptive.
Exercise 17: Rank Correlation
The following table shows 8 NFL teams ranked by two different metrics:
| Team | Rank by Total Yards | Rank by Points Scored |
|---|---|---|
| A | 1 | 3 |
| B | 2 | 1 |
| C | 3 | 5 |
| D | 4 | 2 |
| E | 5 | 4 |
| F | 6 | 8 |
| G | 7 | 6 |
| H | 8 | 7 |
(a) Calculate Spearman's rank correlation coefficient.
(b) Interpret the result. Are yardage rankings and scoring rankings strongly related?
(c) Why might Spearman's rank correlation be preferable to Pearson's correlation when comparing rankings across different sports metrics?
(d) Team C ranks 3rd in yards but 5th in points. What football concept explains why some teams gain many yards but don't score proportionally?
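For untied ranks, Spearman's coefficient has the closed form rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is each team's rank difference. A quick sketch; with tied ranks, scipy.stats.spearmanr should be used instead of this shortcut.

```python
rank_yards = [1, 2, 3, 4, 5, 6, 7, 8]   # teams A-H
rank_points = [3, 1, 5, 2, 4, 8, 6, 7]

# Spearman's rho for untied ranks
n = len(rank_yards)
d_squared = sum((a - b) ** 2 for a, b in zip(rank_yards, rank_points))
rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(f"sum d^2 = {d_squared}, rho = {rho:.4f}")
```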
Exercise 18: Autocorrelation in Sports Performance
An NBA team's point differentials over 12 consecutive games are:
+8, +12, +5, +15, -3, -7, -11, -2, +6, +10, +14, +9
(a) Calculate the lag-1 autocorrelation (correlation between each game's differential and the previous game's differential).
(b) Calculate the lag-2 autocorrelation.
(c) Do these results suggest the team goes through "hot" and "cold" streaks? Or is the pattern consistent with random variation?
(d) Write Python code to calculate and plot the autocorrelation function (ACF) for lags 1 through 6. Use statsmodels.tsa.stattools.acf for verification.
(e) How would evidence of significant autocorrelation affect your betting strategy?
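A sketch of the lag-k sample autocorrelation for parts (a), (b), and (d). It divides by the full sum of squared deviations, which is the normalization statsmodels.tsa.stattools.acf uses with its default adjusted=False, so acf(diffs, nlags=6)[1:] should agree up to floating-point error. The 1.96 / sqrt(n) band is the usual rough white-noise reference, not a formal test.

```python
diffs = [8, 12, 5, 15, -3, -7, -11, -2, 6, 10, 14, 9]

def autocorr(values, lag):
    """Sample autocorrelation at `lag`, normalized by the lag-0 sum of squares."""
    n = len(values)
    mu = sum(values) / n
    denom = sum((x - mu) ** 2 for x in values)
    num = sum((values[t] - mu) * (values[t - lag] - mu) for t in range(lag, n))
    return num / denom

acf_values = [autocorr(diffs, k) for k in range(1, 7)]
band = 1.96 / len(diffs) ** 0.5  # approximate 95% band under white noise
for k, r in enumerate(acf_values, start=1):
    flag = "outside" if abs(r) > band else "inside"
    print(f"lag {k}: r = {r:+.3f} ({flag} +/-{band:.3f})")
```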
Part D: Data Visualization and Interpretation (Exercises 19-24)
Exercise 19: Histogram Construction and Analysis
Using the following distribution of MLB game total runs (50 games):
5, 8, 3, 11, 7, 6, 9, 4, 10, 7, 6, 8, 5, 12, 3, 7, 9, 6, 8, 4,
11, 7, 5, 10, 6, 8, 7, 9, 3, 14, 6, 8, 5, 7, 11, 4, 9, 7, 6, 10,
8, 5, 7, 13, 6, 9, 4, 8, 7, 15
(a) Construct a frequency distribution table with bins of width 2 (1-2, 3-4, 5-6, 7-8, 9-10, 11-12, 13-14, 15-16).
(b) Create a histogram using Python's matplotlib.
(c) Calculate the proportion of games with total runs over 8.5 (a common MLB over/under).
(d) Overlay a normal distribution curve on the histogram using the sample mean and standard deviation. Does the data appear approximately normal?
(e) If the over/under is set at 8.5 and the vig implies 52.4% probability for both sides, does the data suggest value on the over or the under?
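A sketch for parts (a), (c), and (e). The -110 break-even rate is 110 / 210, about 52.4%.

```python
runs = [5, 8, 3, 11, 7, 6, 9, 4, 10, 7, 6, 8, 5, 12, 3, 7, 9, 6, 8, 4,
        11, 7, 5, 10, 6, 8, 7, 9, 3, 14, 6, 8, 5, 7, 11, 4, 9, 7, 6, 10,
        8, 5, 7, 13, 6, 9, 4, 8, 7, 15]

# Part (a): width-2 bins 1-2, 3-4, ..., 15-16
table = {f"{lo}-{lo + 1}": sum(1 for x in runs if lo <= x <= lo + 1)
         for lo in range(1, 17, 2)}
for label, count in table.items():
    print(f"{label:>6}: {count}")

# Parts (c) and (e): observed over rate vs. the -110 break-even rate
p_over = sum(x > 8.5 for x in runs) / len(runs)
break_even = 110 / 210
print(f"over 8.5: {p_over:.1%}, under: {1 - p_over:.1%}, break-even: {break_even:.1%}")
```

For part (b), plt.hist(runs, bins=range(1, 18, 2)) in matplotlib produces the same bins, since the edges 1, 3, 5, ... make each bin cover the integers 1-2, 3-4, and so on.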
Exercise 20: Box Plot Comparison
Create side-by-side box plots for the following three NBA players' scoring over 15 games:
Player X: 22, 25, 18, 30, 21, 27, 24, 19, 26, 23, 28, 20, 25, 22, 31
Player Y: 15, 35, 12, 40, 18, 32, 8, 38, 20, 28, 14, 36, 10, 42, 22
Player Z: 24, 25, 23, 26, 24, 25, 23, 27, 24, 25, 26, 23, 25, 24, 26
(a) For each player, calculate the five-number summary (min, Q1, median, Q3, max).
(b) Create box plots using matplotlib or seaborn.
(c) Identify any outliers for each player.
(d) A sportsbook sets all three players' points props at 24.5. For which player is the over/under most difficult to predict? For which is it easiest? Explain using the box plot characteristics.
(e) If you had to bet one player over 24.5, which would you choose and why?
Exercise 21: Time Series Visualization
An NFL team's weekly point spreads and actual margins over a 17-game season:
| Week | Spread | Actual Margin |
|---|---|---|
| 1 | -3.0 | +7 |
| 2 | -4.5 | -3 |
| 3 | -6.0 | +14 |
| 4 | -3.5 | +1 |
| 5 | -7.0 | +10 |
| 6 | -5.5 | -7 |
| 7 | -4.0 | +3 |
| 8 | -6.5 | +21 |
| 9 | -8.0 | +6 |
| 10 | -7.5 | +12 |
| 11 | -9.0 | +2 |
| 12 | -6.0 | -1 |
| 13 | -7.0 | +15 |
| 14 | -8.5 | +8 |
| 15 | -10.0 | +3 |
| 16 | -7.5 | +10 |
| 17 | -9.0 | +5 |
(a) Create a dual-axis time series plot showing both the spread and actual margin over the season.
(b) Calculate the ATS margin for each week and plot a cumulative ATS margin chart.
(c) Calculate the rolling 4-week ATS performance.
(d) At what point in the season did the market "catch up" to this team's true ability? Identify the inflection point.
(e) Write Python code to create all three visualizations.
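A sketch of the ATS series behind parts (b) and (c), using ATS margin = actual margin + spread (positive means the team covered). itertools.accumulate builds the cumulative series; plotting is left to matplotlib.

```python
from itertools import accumulate

spreads = [-3.0, -4.5, -6.0, -3.5, -7.0, -5.5, -4.0, -6.5, -8.0,
           -7.5, -9.0, -6.0, -7.0, -8.5, -10.0, -7.5, -9.0]
margins = [7, -3, 14, 1, 10, -7, 3, 21, 6, 12, 2, -1, 15, 8, 3, 10, 5]

ats = [m + s for m, s in zip(margins, spreads)]      # weekly ATS margin
cumulative = list(accumulate(ats))                   # part (b)
# Part (c): rolling 4-week mean ATS margin, first value at week 4
rolling4 = [sum(ats[i - 4:i]) / 4 for i in range(4, len(ats) + 1)]
covers = sum(a > 0 for a in ats)

for week, (a, c) in enumerate(zip(ats, cumulative), start=1):
    print(f"week {week:2d}: ATS {a:+5.1f}, cumulative {c:+6.1f}")
print(f"covers: {covers} of {len(ats)}")
```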
Exercise 22: Heatmap Creation
Create a correlation heatmap using the following NBA team statistics for 8 teams:
| Team | PPG | RPG | APG | SPG | BPG | TOV | FG% | 3P% | FT% | Win% |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 112 | 44 | 25 | 7.8 | 5.2 | 14 | .478 | .372 | .785 | .610 |
| B | 108 | 46 | 23 | 8.1 | 4.8 | 13 | .462 | .358 | .792 | .573 |
| C | 115 | 42 | 27 | 7.2 | 4.5 | 15 | .485 | .385 | .770 | .549 |
| D | 104 | 48 | 21 | 8.5 | 5.8 | 12 | .455 | .342 | .805 | .634 |
| E | 119 | 41 | 28 | 6.9 | 4.2 | 16 | .492 | .395 | .762 | .524 |
| F | 106 | 45 | 24 | 7.5 | 5.0 | 13 | .468 | .365 | .798 | .585 |
| G | 110 | 43 | 26 | 7.3 | 4.7 | 14 | .475 | .378 | .780 | .561 |
| H | 102 | 47 | 22 | 8.3 | 5.5 | 11 | .450 | .335 | .810 | .646 |
(a) Calculate the full 10x10 correlation matrix.
(b) Create an annotated heatmap using seaborn.
(c) Identify the three strongest positive correlations and three strongest negative correlations.
(d) Which statistics are most positively correlated with Win%? Which are most negatively correlated?
(e) Based on the heatmap, suggest which statistics a betting model should prioritize.
Exercise 23: Distribution Comparison Visualization
Create overlapping density plots comparing the point distributions of three different eras of NFL scoring:
Era 1 (2000-2005 style), 30 games:
13, 17, 20, 10, 24, 14, 21, 16, 19, 23, 12, 18, 27, 15, 22,
11, 20, 17, 25, 14, 19, 16, 21, 13, 26, 18, 15, 22, 20, 17
Era 2 (2010-2015 style), 30 games:
21, 27, 17, 31, 24, 20, 28, 23, 19, 34, 22, 26, 16, 30, 25,
18, 29, 21, 33, 24, 27, 20, 32, 23, 28, 19, 35, 26, 22, 30
Era 3 (2020-2025 style), 30 games:
24, 30, 20, 35, 27, 23, 31, 26, 22, 38, 25, 29, 19, 34, 28,
21, 33, 24, 37, 27, 31, 23, 36, 26, 32, 22, 41, 29, 25, 34
(a) Create overlapping kernel density estimate (KDE) plots for all three eras.
(b) Calculate descriptive statistics for each era (mean, median, std, skewness).
(c) Quantify the scoring inflation across eras. What is the average increase per era?
(d) If a sportsbook uses historical averages without adjusting for era, how would this create value opportunities?
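A sketch for parts (b) and (c). "Average increase per era" is computed here as the mean of the two era-to-era changes in mean points, which is one reasonable reading of part (c).

```python
from statistics import mean, median, stdev

eras = {
    "Era 1": [13, 17, 20, 10, 24, 14, 21, 16, 19, 23, 12, 18, 27, 15, 22,
              11, 20, 17, 25, 14, 19, 16, 21, 13, 26, 18, 15, 22, 20, 17],
    "Era 2": [21, 27, 17, 31, 24, 20, 28, 23, 19, 34, 22, 26, 16, 30, 25,
              18, 29, 21, 33, 24, 27, 20, 32, 23, 28, 19, 35, 26, 22, 30],
    "Era 3": [24, 30, 20, 35, 27, 23, 31, 26, 22, 38, 25, 29, 19, 34, 28,
              21, 33, 24, 37, 27, 31, 23, 36, 26, 32, 22, 41, 29, 25, 34],
}

means = {name: mean(pts) for name, pts in eras.items()}
for name, pts in eras.items():
    print(f"{name}: mean={means[name]:.2f} median={median(pts)} sd={stdev(pts):.2f}")

era_means = list(means.values())
increases = [later - earlier for earlier, later in zip(era_means, era_means[1:])]
avg_increase = sum(increases) / len(increases)
print(f"era-to-era increases: {[round(x, 2) for x in increases]}, average {avg_increase:.2f}")
```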
Exercise 24: QQ-Plot Analysis
Using the 50-game MLB data from Exercise 19:
(a) Create a QQ-plot against a normal distribution.
(b) Create a QQ-plot against a Poisson distribution with lambda equal to the sample mean.
(c) Which theoretical distribution better fits the data? Explain your reasoning from the QQ-plot patterns.
(d) Perform a Shapiro-Wilk test for normality and a chi-squared goodness-of-fit test against both the normal and Poisson distributions. Report the p-values.
(e) Why does the distributional assumption matter for setting over/under lines? How would using the wrong distribution affect a sportsbook's edge?
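QQ-plot coordinates can be computed without a plotting library: sort the data and pair each value with the theoretical quantile at plotting position (i - 0.5)/n. The sketch below does this for a normal fitted by the sample mean and standard deviation and for a Poisson with lambda equal to the sample mean. poisson_ppf is a hypothetical helper (scipy.stats.poisson.ppf provides the same quantity), and the variance-to-mean ratio is an extra quick check, since a Poisson variable has variance equal to its mean.

```python
import math
from statistics import NormalDist, mean, stdev, variance

runs = [5, 8, 3, 11, 7, 6, 9, 4, 10, 7, 6, 8, 5, 12, 3, 7, 9, 6, 8, 4,
        11, 7, 5, 10, 6, 8, 7, 9, 3, 14, 6, 8, 5, 7, 11, 4, 9, 7, 6, 10,
        8, 5, 7, 13, 6, 9, 4, 8, 7, 15]

def poisson_ppf(p, lam):
    """Smallest integer k with P(X <= k) >= p for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k

n = len(runs)
lam = mean(runs)
observed = sorted(runs)
probs = [(i - 0.5) / n for i in range(1, n + 1)]  # plotting positions

normal = NormalDist(lam, stdev(runs))
normal_q = [normal.inv_cdf(p) for p in probs]
poisson_q = [poisson_ppf(p, lam) for p in probs]

dispersion = variance(runs) / lam  # about 1 for Poisson data
print(f"lambda = {lam:.2f}, variance/mean = {dispersion:.2f}")
```

Plot observed against normal_q and against poisson_q (for example with plt.scatter) and add the 45-degree line; the Poisson quantiles are integers, so expect horizontal steps. scipy.stats.probplot can draw the normal version directly.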
Part E: Applied Betting Analysis (Exercises 25-30)
Exercise 25: Building a Descriptive Statistics Dashboard
Write a Python program that creates a comprehensive descriptive statistics dashboard for a sports team. The dashboard should include:
- Summary statistics panel: Mean, median, mode, std, CV, skewness, kurtosis.
- Distribution panel: Histogram with KDE overlay and normal curve.
- Trend panel: Time series of scores with moving averages (3-game, 5-game, 10-game).
- Volatility panel: Rolling standard deviation over time.
- Comparison panel: Box plots comparing home vs. away performance.
Use the following data for an NBA team over 40 games (first 20 home, last 20 away):
Home: 112, 108, 119, 105, 115, 110, 122, 107, 118, 113, 109, 124, 106, 116, 111, 120, 108, 117, 114, 121
Away: 102, 98, 108, 95, 105, 100, 112, 97, 107, 103, 99, 114, 96, 106, 101, 110, 98, 107, 104, 111
Exercise 26: Over/Under Value Detection
A sportsbook sets an NBA game total at 215.5 (-110 both sides). You have the following data:
Team A last 20 games (total points in game):
210, 225, 198, 232, 215, 208, 221, 203, 228, 212,
219, 205, 230, 211, 224, 200, 218, 209, 226, 214
Team B last 20 games (total points in game):
205, 218, 195, 228, 210, 203, 222, 198, 225, 208,
215, 200, 226, 207, 220, 196, 214, 205, 223, 210
(a) Calculate descriptive statistics for both teams' game totals.
(b) Calculate the expected game total using a simple average of the two teams' means.
(c) Calculate the percentage of games each team would have gone over 215.5.
(d) Calculate the z-score for 215.5 based on each team's distribution.
(e) Assuming -110 odds require a 52.4% win rate to be profitable, is there value on either side? Show the full calculation.
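A sketch for parts (a) to (e). The -110 break-even rate is 110 / 210, about 52.4%; over_rate is a hypothetical helper.

```python
from statistics import mean, stdev

team_a = [210, 225, 198, 232, 215, 208, 221, 203, 228, 212,
          219, 205, 230, 211, 224, 200, 218, 209, 226, 214]
team_b = [205, 218, 195, 228, 210, 203, 222, 198, 225, 208,
          215, 200, 226, 207, 220, 196, 214, 205, 223, 210]
LINE = 215.5
BREAK_EVEN = 110 / 210  # win rate needed to profit at -110

def over_rate(totals, line=LINE):
    """Share of games whose total finished above the line."""
    return sum(t > line for t in totals) / len(totals)

expected_total = (mean(team_a) + mean(team_b)) / 2   # part (b)
for name, totals in [("Team A", team_a), ("Team B", team_b)]:
    z = (LINE - mean(totals)) / stdev(totals)         # part (d)
    print(f"{name}: mean={mean(totals):.1f} sd={stdev(totals):.2f} "
          f"over rate={over_rate(totals):.0%} z={z:+.2f}")
print(f"expected total = {expected_total:.1f}, break-even = {BREAK_EVEN:.1%}")
```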
Exercise 27: Player Prop Analysis
A sportsbook offers the following player prop for an NBA player: Over/Under 22.5 points (-115/-105).
The player's last 30 games scoring:
25, 18, 30, 22, 27, 15, 32, 20, 24, 28,
19, 26, 21, 33, 17, 29, 23, 16, 31, 24,
20, 27, 22, 35, 18, 26, 23, 14, 30, 25
(a) Calculate the mean, median, and standard deviation.
(b) What percentage of games did the player score over 22.5?
(c) Calculate the z-score for 22.5 and the corresponding probability assuming normality.
(d) The -115 over price implies a probability of 53.5%. The -105 under price implies 51.2%. Is there value on either side?
(e) Split the data into first 15 and last 15 games. Is there a trend? How does recent form affect your analysis?
(f) Write Python code to perform this complete analysis and output a recommendation.
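American odds convert to a break-even probability as |odds| / (|odds| + 100) for negative odds and 100 / (odds + 100) for positive odds; the percentages in part (d) follow from this rule. A sketch of parts (b) to (e) with a hypothetical implied_prob helper:

```python
from statistics import NormalDist, mean, median, stdev

def implied_prob(american_odds):
    """Break-even probability for American odds (includes the vig)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

points = [25, 18, 30, 22, 27, 15, 32, 20, 24, 28,
          19, 26, 21, 33, 17, 29, 23, 16, 31, 24,
          20, 27, 22, 35, 18, 26, 23, 14, 30, 25]
LINE = 22.5

hit_rate = sum(p > LINE for p in points) / len(points)   # part (b)
z = (LINE - mean(points)) / stdev(points)                 # part (c)
p_over_normal = 1 - NormalDist().cdf(z)

over_edge = hit_rate - implied_prob(-115)                 # part (d)
under_edge = (1 - hit_rate) - implied_prob(-105)

first15, last15 = points[:15], points[15:]                # part (e)
print(f"mean={mean(points)} median={median(points)} sd={stdev(points):.2f}")
print(f"hit rate={hit_rate:.1%}, normal P(over)={p_over_normal:.1%}")
print(f"edge over={over_edge:+.1%}, edge under={under_edge:+.1%}")
print(f"first 15 mean={mean(first15):.1f}, last 15 mean={mean(last15):.1f}")
```

A positive edge only says the historical hit rate beats the break-even price; the trend in part (e) should inform how much weight the full 30-game sample deserves.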
Exercise 28: Cross-Sport Comparison
You want to determine which sport offers the most predictable outcomes for betting purposes. Analyze the following data showing the favorite's margin of victory across 20 games in each sport:
NFL:
7, -3, 14, 1, -7, 10, 3, -14, 21, -1, 6, -10, 17, 4, -6, 8, 2, -3, 11, 5
NBA:
5, -2, 12, 3, -8, 7, 1, -5, 15, -3, 8, -1, 10, 6, -4, 9, 2, -7, 11, 4
MLB (run line of -1.5):
2, -1, 4, -3, 1, 3, -2, -4, 5, 0, 2, -1, 3, 1, -3, 4, -2, 1, 2, -1
NHL (puck line of -1.5):
1, -2, 3, -1, 2, -3, 1, -2, 4, 0, 1, -1, 2, -2, 3, 1, -1, -3, 2, 1
(a) For each sport, calculate: mean, standard deviation, CV, and the percentage of games the favorite won outright.
(b) Which sport has the highest favorite win rate? Which has the most variance in margin?
(c) Calculate the Sharpe ratio for each sport: (mean margin) / (std of margin).
(d) Based on your analysis, in which sport is it most efficient to bet favorites? Explain using descriptive statistics.
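A sketch for parts (a) to (c). Margins of exactly 0 are counted as not won outright, an assumption worth stating in your answer; note also that CV becomes unstable when the mean margin is close to zero, as it is for MLB and NHL here.

```python
from statistics import mean, stdev

margins = {
    "NFL": [7, -3, 14, 1, -7, 10, 3, -14, 21, -1, 6, -10, 17, 4, -6, 8, 2, -3, 11, 5],
    "NBA": [5, -2, 12, 3, -8, 7, 1, -5, 15, -3, 8, -1, 10, 6, -4, 9, 2, -7, 11, 4],
    "MLB": [2, -1, 4, -3, 1, 3, -2, -4, 5, 0, 2, -1, 3, 1, -3, 4, -2, 1, 2, -1],
    "NHL": [1, -2, 3, -1, 2, -3, 1, -2, 4, 0, 1, -1, 2, -2, 3, 1, -1, -3, 2, 1],
}

def summarize(values):
    m, s = mean(values), stdev(values)
    return {
        "mean": m,
        "std": s,
        "cv": s / m,                                          # unstable if m is near 0
        "win_rate": sum(v > 0 for v in values) / len(values),  # 0 counts as not won
        "sharpe": m / s,                                      # part (c) definition
    }

summary = {sport: summarize(vals) for sport, vals in margins.items()}
for sport, stats in summary.items():
    print(sport, {k: round(v, 3) for k, v in stats.items()})
```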
Exercise 29: Detecting Line Movement Value
The following data shows opening and closing spreads for 15 NFL games, along with the actual margin:
| Game | Open Spread | Close Spread | Actual Margin |
|---|---|---|---|
| 1 | -3.0 | -3.5 | -7 |
| 2 | -7.0 | -6.0 | +3 |
| 3 | -1.0 | -2.5 | -10 |
| 4 | -5.5 | -7.0 | -14 |
| 5 | -3.0 | -3.0 | +1 |
| 6 | -10.0 | -8.5 | -6 |
| 7 | -4.5 | -6.0 | -3 |
| 8 | -2.5 | -3.0 | -8 |
| 9 | -6.0 | -7.5 | -12 |
| 10 | -1.0 | -1.0 | +4 |
| 11 | -8.0 | -9.5 | -7 |
| 12 | -3.5 | -5.0 | -9 |
| 13 | -6.5 | -6.0 | -2 |
| 14 | -4.0 | -3.0 | +1 |
| 15 | -7.0 | -8.5 | -10 |
Note: Negative margin means the favorite won.
(a) Calculate the ATS record for the favorite using opening spreads vs. closing spreads.
(b) Calculate the mean and standard deviation of ATS margins for both opening and closing spreads.
(c) Is there a correlation between line movement (close - open) and the ATS margin using closing lines?
(d) Do closing lines appear more accurate than opening lines? Quantify the difference.
(e) Write Python code to analyze line movement patterns and determine if betting "into the steam" (following the line movement) is profitable.
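A sketch for parts (a), (b), and (d), following the table's convention that a negative actual margin means the favorite won. Under that convention, the favorite's ATS margin is spread - actual margin (positive means the favorite covered). Mean absolute ATS margin is used as one simple accuracy measure for part (d); other choices, such as RMSE, are equally reasonable.

```python
from statistics import mean, stdev

open_spread = [-3.0, -7.0, -1.0, -5.5, -3.0, -10.0, -4.5, -2.5,
               -6.0, -1.0, -8.0, -3.5, -6.5, -4.0, -7.0]
close_spread = [-3.5, -6.0, -2.5, -7.0, -3.0, -8.5, -6.0, -3.0,
                -7.5, -1.0, -9.5, -5.0, -6.0, -3.0, -8.5]
actual = [-7, 3, -10, -14, 1, -6, -3, -8, -12, 4, -7, -9, -2, 1, -10]

def favorite_ats(spreads, margins):
    """The favorite's winning margin is -m; adding the negative spread gives ATS."""
    return [-m + s for s, m in zip(spreads, margins)]

ats_open = favorite_ats(open_spread, actual)
ats_close = favorite_ats(close_spread, actual)
movement = [c - o for o, c in zip(open_spread, close_spread)]  # < 0: moved toward favorite

for label, ats in [("open", ats_open), ("close", ats_close)]:
    wins, losses = sum(a > 0 for a in ats), sum(a < 0 for a in ats)
    print(f"{label}: {wins}-{losses}, mean={mean(ats):+.2f} sd={stdev(ats):.2f}")

mae_open = mean([abs(a) for a in ats_open])
mae_close = mean([abs(a) for a in ats_close])
print(f"mean |ATS|: open={mae_open:.2f}, close={mae_close:.2f}")
```

For part (c), statistics.correlation(movement, ats_close) (Python 3.10+) gives the Pearson correlation between line movement and the closing-line ATS margin.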
Exercise 30: Comprehensive Season Analysis
Write a complete Python program that performs a season-long descriptive statistics analysis. The program should:
- Data Generation: Create a realistic simulated 17-game NFL season for 8 teams, with each game including: points scored, points allowed, yards gained, yards allowed, turnovers, and time of possession.
- Team-Level Analysis: For each team, calculate:
  - All measures of central tendency for points scored
  - All measures of variability for point differential
  - Skewness and kurtosis of the scoring distribution
  - Home vs. away splits with statistical comparison
- League-Level Analysis:
  - Correlation matrix of all statistics vs. win percentage
  - Ranking of teams by consistency (CV of point differential)
  - League-wide scoring distribution with normality tests
- Betting Analysis:
  - Simulated spreads and ATS performance
  - Over/under analysis with value detection
  - Identification of most and least predictable teams
- Visualization Suite:
  - League-wide scoring histogram
  - Team consistency comparison (box plots)
  - Correlation heatmap
  - ATS performance chart
  - Rolling averages for each team
The program should output a complete report with all statistics, visualizations, and betting recommendations.
Solutions
Solutions for all exercises are available in the accompanying Python file:
code/exercise-solutions.py
For exercises requiring written explanations, reference answers are provided as comments within the solution code.