Exercises: Introduction to Prediction Models
Exercise 1: Basic Prediction
A model predicts:
- Home team rating: +4.5
- Away team rating: +1.2
- Home field advantage: 2.5 points
Calculate:
a) The predicted point spread
b) The home team's win probability, using wp = 1 / (1 + 10^(spread/8)), where the spread is in Vegas convention (negative = home favored)
c) The predicted scores if the total is 46 points
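If you want to check your arithmetic, here is a minimal sketch in plain Python (the variable names are ours, not part of the prompt):

```python
# Exercise 1 check: spread, win probability, and implied scores.
home_rating, away_rating, hfa = 4.5, 1.2, 2.5

# a) Expected home margin; the Vegas-convention spread flips the sign.
margin = home_rating - away_rating + hfa   # positive -> home favored
spread = -margin                           # Vegas notation

# b) Win probability from the formula in the prompt.
wp = 1 / (1 + 10 ** (spread / 8))

# c) Split a 46-point total around the expected margin.
total = 46
home_score = (total + margin) / 2
away_score = (total - margin) / 2

print(f"spread={spread:+.1f}, wp={wp:.3f}, "
      f"score={home_score:.1f}-{away_score:.1f}")
```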
Exercise 2: Brier Score Calculation
Calculate the Brier score for these predictions:
| Game | Predicted Home Win Prob | Actual Winner |
|---|---|---|
| 1 | 0.65 | Home |
| 2 | 0.72 | Home |
| 3 | 0.55 | Away |
| 4 | 0.40 | Away |
| 5 | 0.80 | Away |
Compare to the baseline Brier score of always predicting 0.50.
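A short sketch for checking your answer (the outcomes are encoded by hand from the table above):

```python
# Brier score: mean squared error between predicted probability and outcome.
preds = [0.65, 0.72, 0.55, 0.40, 0.80]
outcomes = [1, 1, 0, 0, 0]  # 1 = home win, from the "Actual Winner" column

brier = sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)
baseline = sum((0.5 - o) ** 2 for o in outcomes) / len(outcomes)  # = 0.25
print(f"model Brier={brier:.4f}, coin-flip baseline={baseline:.4f}")
```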
Exercise 3: Straight-Up vs ATS Accuracy
A model made these predictions vs actual results (all spreads are from the home team's perspective, so negative means the home team is favored; the actual spread is the final margin in the same convention):
| Game | Predicted Spread | Vegas Spread | Actual Spread |
|---|---|---|---|
| 1 | -6.5 | -7.0 | -3 |
| 2 | -3.0 | -3.5 | -7 |
| 3 | +1.5 | +2.5 | +10 |
| 4 | -10.0 | -7.0 | -14 |
| 5 | +4.0 | +3.0 | -2 |
Calculate:
a) Straight-up accuracy
b) Against-the-spread accuracy
c) Mean absolute error
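One way to set up the computation, assuming the home-perspective convention described above (a push exactly on the Vegas line would be a no-bet; none occurs here):

```python
predicted = [-6.5, -3.0, 1.5, -10.0, 4.0]
vegas     = [-7.0, -3.5, 2.5,  -7.0, 3.0]
actual    = [-3.0, -7.0, 10.0, -14.0, -2.0]
n = len(actual)

# a) Straight-up: did the predicted winner (sign of the spread) match?
su = sum((p < 0) == (a < 0) for p, a in zip(predicted, actual)) / n

# b) ATS: the model takes the side its number favors relative to Vegas;
#    the bet wins if the actual result lands on that same side of the line.
ats = sum((p - v) * (a - v) > 0
          for p, v, a in zip(predicted, vegas, actual)) / n

# c) Mean absolute error of the predicted spread.
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
print(f"SU={su:.0%}, ATS={ats:.0%}, MAE={mae:.1f}")
```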
Exercise 4: Sample Size Analysis
A model shows 58% accuracy over 50 games.
a) Calculate the standard error (SE = sqrt(p*(1-p)/n))
b) Calculate the 95% confidence interval
c) Can we conclude the model is better than 50%?
d) How many games would we need to be confident the model is >52%?
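A starter sketch using only the standard library (part d assumes the observed 58% rate would persist, which is itself an assumption worth discussing):

```python
import math

p, n = 0.58, 50
se = math.sqrt(p * (1 - p) / n)        # a) standard error
lo, hi = p - 1.96 * se, p + 1.96 * se  # b) 95% confidence interval
print(f"SE={se:.4f}, 95% CI=({lo:.3f}, {hi:.3f})")  # c) does it exclude 0.50?

# d) Solve p - 1.96*sqrt(p*(1-p)/n) > 0.52 for n, holding p at 0.58.
n_needed = (1.96 / (p - 0.52)) ** 2 * p * (1 - p)
print(f"need roughly n > {math.ceil(n_needed)} games")
```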
Exercise 5: Identifying Data Leakage
For each feature, identify if it would cause data leakage when predicting a Week 8 game:
a) Team's rushing yards per game (through Week 7)
b) Team's final season win total
c) Opponent's passing EPA from the game being predicted
d) Weather forecast for game day
e) Team's average point differential from Weeks 1-7
f) Quarterback's completion percentage for the season
Exercise 6: Building a Simple Rating System
Given these game results:
| Home | Away | Home Score | Away Score |
|---|---|---|---|
| A | B | 28 | 21 |
| C | A | 17 | 24 |
| B | C | 31 | 14 |
| A | D | 35 | 10 |
| D | B | 20 | 27 |
| C | D | 28 | 28 |
a) Calculate each team's average point differential (accounting for an HFA of 3 points)
b) Rank the teams
c) Predict the spread for a game with D at home against A
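A sketch of the bookkeeping, with the HFA stripped out of each raw margin before averaging:

```python
from collections import defaultdict

# (home, away, home_score, away_score)
games = [("A", "B", 28, 21), ("C", "A", 17, 24), ("B", "C", 31, 14),
         ("A", "D", 35, 10), ("D", "B", 20, 27), ("C", "D", 28, 28)]
HFA = 3.0

diffs = defaultdict(list)
for home, away, hs, aws in games:
    adj_margin = hs - aws - HFA      # remove the home edge from the raw margin
    diffs[home].append(adj_margin)
    diffs[away].append(-adj_margin)

# a) and b): average adjusted differential, sorted best to worst.
ratings = {t: sum(d) / len(d) for t, d in diffs.items()}
for team, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {r:+.2f}")

# c) D hosting A: rating gap plus home field, reported in Vegas notation.
margin = ratings["D"] - ratings["A"] + HFA
print(f"D vs A spread: {-margin:+.1f}")
```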
Exercise 7: Calibration Analysis
Group these predictions and check calibration:
| Predicted Prob | Outcome (1=home win) |
|---|---|
| 0.55 | 1 |
| 0.62 | 0 |
| 0.71 | 1 |
| 0.48 | 0 |
| 0.75 | 1 |
| 0.52 | 1 |
| 0.68 | 1 |
| 0.45 | 0 |
| 0.58 | 0 |
| 0.73 | 1 |
Create three half-open bins: [0.40, 0.55), [0.55, 0.65), and [0.65, 0.80), so each prediction falls in exactly one bin. For each bin, calculate the average predicted probability, the actual win rate, and the calibration error (the absolute difference between the two).
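A sketch of the binning step; the half-open intervals are our convention so that each prediction lands in exactly one bin:

```python
preds    = [0.55, 0.62, 0.71, 0.48, 0.75, 0.52, 0.68, 0.45, 0.58, 0.73]
outcomes = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1]

for lo, hi in [(0.40, 0.55), (0.55, 0.65), (0.65, 0.80)]:
    in_bin = [(p, o) for p, o in zip(preds, outcomes) if lo <= p < hi]
    avg_pred = sum(p for p, _ in in_bin) / len(in_bin)
    win_rate = sum(o for _, o in in_bin) / len(in_bin)
    print(f"[{lo:.2f}, {hi:.2f}): n={len(in_bin)}, avg pred={avg_pred:.3f}, "
          f"actual={win_rate:.3f}, calib error={abs(avg_pred - win_rate):.3f}")
```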
Exercise 8: Implementing the Prediction Pipeline
Write Python code to implement:
```python
from typing import Dict, List

import pandas as pd


class SimplePredictionPipeline:
    def __init__(self):
        pass

    def load_data(self, games: pd.DataFrame) -> None:
        """Load and prepare game data."""
        pass

    def calculate_ratings(self) -> Dict[str, float]:
        """Calculate team ratings from results."""
        pass

    def predict_game(self, home: str, away: str) -> Dict:
        """Predict a single game."""
        pass

    def evaluate(self, predictions: List, actuals: pd.DataFrame) -> Dict:
        """Evaluate prediction accuracy."""
        pass
```
Exercise 9: Uncertainty Quantification
A model predicts Team A beats Team B by 7 points.
Given:
- Base game standard deviation: σ = 13.5 points
- Rating sample size: 10 games (adds uncertainty)
- Model confidence: 0.7
a) Calculate the total prediction uncertainty
b) Construct a 90% confidence interval for the spread
c) What's the probability Team B actually wins?
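There is more than one defensible way to combine these inputs. The sketch below inflates the base variance with a simple rating-noise term of σ²/n per team (our assumption, not the only valid choice) and leaves it to you to decide how, if at all, the 0.7 model confidence should enter:

```python
from statistics import NormalDist

mu, sigma, n = 7.0, 13.5, 10

# a) One option: add rating-noise variance of sigma^2/n for each team.
sigma_total = (sigma**2 + 2 * sigma**2 / n) ** 0.5

# b) Two-sided 90% interval for the final margin.
z = NormalDist().inv_cdf(0.95)
lo, hi = mu - z * sigma_total, mu + z * sigma_total

# c) Team B wins whenever the realized margin falls below zero.
p_b_wins = NormalDist(mu, sigma_total).cdf(0)
print(f"sigma={sigma_total:.2f}, 90% CI=({lo:.1f}, {hi:.1f}), "
      f"P(B wins)={p_b_wins:.3f}")
```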
Exercise 10: Overfitting Detection
You have training data (100 games) and test data (50 games).
Model A results:
- Training accuracy: 72%
- Test accuracy: 53%

Model B results:
- Training accuracy: 58%
- Test accuracy: 56%
a) Which model is overfit? b) Which model would you deploy? c) What does this tell us about model complexity?
Exercise 11: Converting Between Formats
Convert between prediction formats:
a) Spread of -7 → Win probability
b) Win probability of 0.65 → Spread
c) Home score 27, Away score 24 → Spread and total
d) Spread -3, Total 44 → Predicted scores
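The conversions follow from the win-probability formula in Exercise 1 plus basic algebra on margin and total; a sketch:

```python
import math

def spread_to_wp(spread):
    """Vegas-convention spread (negative = home favored) -> home win prob."""
    return 1 / (1 + 10 ** (spread / 8))

def wp_to_spread(wp):
    """Inverse of spread_to_wp."""
    return 8 * math.log10(1 / wp - 1)

def scores_to_spread_total(home, away):
    return away - home, home + away                    # spread, total

def spread_total_to_scores(spread, total):
    margin = -spread                                   # expected home margin
    return (total + margin) / 2, (total - margin) / 2

print(f"a) {spread_to_wp(-7):.3f}")
print(f"b) {wp_to_spread(0.65):+.2f}")
print(f"c) {scores_to_spread_total(27, 24)}")
print(f"d) {spread_total_to_scores(-3, 44)}")
```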
Exercise 12: Adjustment Factors
Calculate the total adjustment for this game:
- Home team: 10 days rest
- Away team: 6 days rest
- Away team traveled 2,500 miles
- Timezone difference: Away is 3 hours behind (traveling east)
- Temperature: 25°F (outdoor stadium)
- Home team missing starting QB (estimate: -3 points)
Use these rules:
- Rest: 0.5 points per day of rest difference
- Long travel: 1.5 points
- Timezone: 1 point
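A sketch of the tally from the home team's perspective; note that the prompt assigns no point value to the 25°F cold, so that piece is left for you to argue:

```python
rest_adj    = (10 - 6) * 0.5  # 0.5 pts per day of rest advantage
travel_adj  = 1.5             # away team's 2,500-mile trip favors the home side
tz_adj      = 1.0             # 3-hour eastward shift, per the prompt's rule
qb_adj      = -3.0            # home team missing its starting QB
weather_adj = 0.0             # no rule given for 25F; justify your own estimate

total = rest_adj + travel_adj + tz_adj + qb_adj + weather_adj
print(f"total adjustment (home perspective): {total:+.1f} points")
```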
Exercise 13: Comparing to Market
Your model predictions vs Vegas lines:
| Game | Your Spread | Vegas Spread |
|---|---|---|
| 1 | -6.0 | -7.5 |
| 2 | -3.0 | -2.5 |
| 3 | +1.0 | -1.0 |
| 4 | -10.0 | -9.0 |
| 5 | +5.0 | +6.5 |
a) For which games do you have the biggest "edge"?
b) Which side would you bet for each game?
c) What's the average absolute difference from Vegas?
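A sketch for the edge calculation; a negative difference means your number favors the home side more than the market does:

```python
model = [-6.0, -3.0, 1.0, -10.0, 5.0]
vegas = [-7.5, -2.5, -1.0, -9.0, 6.5]

for game, (m, v) in enumerate(zip(model, vegas), start=1):
    edge = m - v
    side = "home" if edge < 0 else "away"
    print(f"Game {game}: edge={edge:+.1f} -> lean {side}")

avg_abs = sum(abs(m - v) for m, v in zip(model, vegas)) / len(model)
print(f"average |model - Vegas| = {avg_abs:.2f}")
```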
Exercise 14: Season Win Projection
Project season wins for a team with:
- Power rating: +3.0 (points above average)
- Strength of schedule (SOS): -0.5 (slightly below-average opponents)
- Games remaining: 17
a) Calculate the expected win probability per game
b) Project total wins
c) Calculate a 90% confidence interval for wins
d) Estimate playoff probability (assume 10+ wins needed)
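One way in, using the σ = 13.5 margin model from Exercise 16 for the per-game win probability (you could just as well use the formula from Exercise 1) and a binomial approximation that assumes independent games:

```python
from statistics import NormalDist

rating, sos, games_left, sigma = 3.0, -0.5, 17, 13.5

# a) Expected margin vs the schedule, mapped to a win probability.
margin = rating - sos
wp = NormalDist().cdf(margin / sigma)

# b) Expected wins, with c) a binomial std dev for a rough 90% interval.
exp_wins = games_left * wp
sd = (games_left * wp * (1 - wp)) ** 0.5
lo, hi = exp_wins - 1.645 * sd, exp_wins + 1.645 * sd

# d) Normal approximation to P(wins >= 10), with a continuity correction.
p_playoffs = 1 - NormalDist(exp_wins, sd).cdf(9.5)
print(f"wp={wp:.3f}, wins={exp_wins:.1f}, 90% CI=({lo:.1f}, {hi:.1f}), "
      f"P(10+)={p_playoffs:.3f}")
```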
Exercise 15: Building a Model Evaluation Framework
Create a comprehensive evaluation class:
```python
from typing import Dict, List

import pandas as pd


class ModelEvaluator:
    def __init__(self, predictions: List[Dict], actuals: pd.DataFrame):
        pass

    def straight_up_accuracy(self) -> float:
        pass

    def ats_accuracy(self, market_spreads: List[float]) -> float:
        pass

    def brier_score(self) -> float:
        pass

    def mae_spread(self) -> float:
        pass

    def calibration_plot(self) -> None:
        """Create calibration visualization."""
        pass

    def generate_report(self) -> Dict:
        """Generate comprehensive evaluation report."""
        pass
```
Exercise 16: Understanding Variance
The standard deviation of NFL final margins around the point spread is approximately 13.5 points.
a) If you predict a team wins by 7, what's the probability they actually lose?
b) If you're 60% confident in a pick, what spread does that imply?
c) Over 100 games with true 55% skill, what's the probability of appearing to have <50% accuracy?
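A sketch using the normal margin model for parts a and b and an exact binomial sum for part c:

```python
from math import comb
from statistics import NormalDist

sigma = 13.5
nd = NormalDist()

# a) P(team actually loses | predicted margin = +7)
print(f"a) {nd.cdf(-7 / sigma):.3f}")

# b) Margin implied by 60% win confidence (invert the normal model).
print(f"b) {sigma * nd.inv_cdf(0.60):+.2f}")

# c) P(49 or fewer wins in 100 games at a true 55% rate) -- exact binomial.
p = sum(comb(100, k) * 0.55**k * 0.45**(100 - k) for k in range(50))
print(f"c) {p:.3f}")
```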
Exercise 17: Feature Importance Analysis
You have these potential features for a prediction model:
1. Team power rating
2. Opponent power rating
3. Home field advantage
4. Rest days difference
5. Travel distance
6. Weather conditions
7. Injuries
8. Recent form (last 3 games)
Rank these by expected importance and justify your ranking.
Exercise 18: Temporal Data Split
Given games from 2018-2023:
a) Design a proper train/test split
b) Explain why you can't randomly split NFL data
c) How would you implement walk-forward validation? (A starter sketch follows below.)
d) What's the minimum test set size you'd need?
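For part c, a minimal sketch of season-level walk-forward validation; the `season` column name is our assumption about the data layout:

```python
import pandas as pd

def walk_forward_splits(games: pd.DataFrame, seasons, min_train=2):
    """Yield (test_season, train, test): train on everything strictly earlier."""
    for i in range(min_train, len(seasons)):
        train = games[games["season"] < seasons[i]]   # past seasons only
        test = games[games["season"] == seasons[i]]   # the held-out season
        yield seasons[i], train, test

# With 2018-2023: train on 2018-19 -> test 2020, train 2018-20 -> test 2021, ...
```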
Exercise 19: Model Comparison
Two models made predictions for the same 200 games:
| Metric | Model A | Model B |
|---|---|---|
| SU Accuracy | 58% | 55% |
| ATS Accuracy | 51% | 53% |
| Brier Score | 0.235 | 0.218 |
| MAE | 11.2 | 10.5 |
a) Which model is better for picking winners?
b) Which model is better for betting?
c) Which model has better probability calibration?
d) Overall, which model would you prefer, and why?
Exercise 20: Complete Prediction Model
Build a complete prediction model that:
- Loads historical game data
- Engineers features (ratings, HFA, adjustments)
- Makes predictions for upcoming games
- Evaluates performance
- Reports uncertainty
```python
from typing import Dict, Optional

import pandas as pd


class CompletePredictionModel:
    """
    A full prediction model implementation.

    Requirements:
    - Calculate team ratings
    - Apply home field advantage
    - Include at least 2 adjustment factors
    - Output win probability, spread, and confidence
    - Implement proper evaluation
    """

    def __init__(self):
        pass

    def fit(self, games: pd.DataFrame) -> None:
        pass

    def predict(self, home: str, away: str,
                context: Optional[Dict] = None) -> Dict:
        pass

    def evaluate(self, test_games: pd.DataFrame) -> Dict:
        pass
```
Test with sample data and report evaluation metrics.
Submission Guidelines
For coding exercises:
1. Include all imports
2. Add docstrings explaining your approach
3. Test with the provided sample data
4. Include example output
5. Report evaluation metrics where applicable
For analytical exercises:
1. Show all calculations
2. Explain your reasoning
3. Discuss uncertainty and limitations