Exercises: Introduction to Prediction Models
Exercise 1: Basic Prediction
A model predicts:
- Home team rating: +4.5
- Away team rating: +1.2
- Home field advantage: 2.5 points
Calculate:
a) The predicted point spread
b) The home team's win probability, using wp = 1 / (1 + 10^(spread/8)), where the spread is in Vegas convention (negative = home favored)
c) The predicted scores if the total is 46 points
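If you want to check your arithmetic, here is a minimal sketch in plain Python (the variable names are ours, not part of the prompt):

```python
# Exercise 1 check: spread, win probability, and implied scores.
home_rating, away_rating, hfa = 4.5, 1.2, 2.5

# a) Expected home margin; the Vegas-convention spread flips the sign.
margin = home_rating - away_rating + hfa   # positive -> home favored
spread = -margin                           # Vegas notation

# b) Win probability from the formula in the prompt.
wp = 1 / (1 + 10 ** (spread / 8))

# c) Split a 46-point total around the expected margin.
total = 46
home_score = (total + margin) / 2
away_score = (total - margin) / 2

print(f"spread={spread:+.1f}, wp={wp:.3f}, "
      f"score={home_score:.1f}-{away_score:.1f}")
```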
Exercise 2: Brier Score Calculation
Calculate the Brier score for these predictions:
| Game | Predicted Home Win Prob | Actual Winner |
|---|---|---|
| 1 | 0.65 | Home |
| 2 | 0.72 | Home |
| 3 | 0.55 | Away |
| 4 | 0.40 | Away |
| 5 | 0.80 | Away |
Compare to the baseline Brier score of always predicting 0.50.
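A short sketch for checking your answer (the outcomes are encoded by hand from the table above):

```python
# Brier score: mean squared error between predicted probability and outcome.
preds = [0.65, 0.72, 0.55, 0.40, 0.80]
outcomes = [1, 1, 0, 0, 0]  # 1 = home win, from the "Actual Winner" column

brier = sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)
baseline = sum((0.5 - o) ** 2 for o in outcomes) / len(outcomes)  # = 0.25
print(f"model Brier={brier:.4f}, coin-flip baseline={baseline:.4f}")
```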
Exercise 3: Straight-Up vs ATS Accuracy
A model made these predictions vs actual results (all spreads are from the home team's perspective, so negative means the home team is favored; the actual spread is the final margin in the same convention):
| Game | Predicted Spread | Vegas Spread | Actual Spread |
|---|---|---|---|
| 1 | -6.5 | -7.0 | -3 |
| 2 | -3.0 | -3.5 | -7 |
| 3 | +1.5 | +2.5 | +10 |
| 4 | -10.0 | -7.0 | -14 |
| 5 | +4.0 | +3.0 | -2 |
Calculate:
a) Straight-up accuracy
b) Against-the-spread accuracy
c) Mean absolute error
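One way to set up the computation, assuming the home-perspective convention described above (a push exactly on the Vegas line would be a no-bet; none occurs here):

```python
predicted = [-6.5, -3.0, 1.5, -10.0, 4.0]
vegas     = [-7.0, -3.5, 2.5,  -7.0, 3.0]
actual    = [-3.0, -7.0, 10.0, -14.0, -2.0]
n = len(actual)

# a) Straight-up: did the predicted winner (sign of the spread) match?
su = sum((p < 0) == (a < 0) for p, a in zip(predicted, actual)) / n

# b) ATS: the model takes the side its number favors relative to Vegas;
#    the bet wins if the actual result lands on that same side of the line.
ats = sum((p - v) * (a - v) > 0
          for p, v, a in zip(predicted, vegas, actual)) / n

# c) Mean absolute error of the predicted spread.
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
print(f"SU={su:.0%}, ATS={ats:.0%}, MAE={mae:.1f}")
```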
Exercise 4: Sample Size Analysis
A model shows 58% accuracy over 50 games.
a) Calculate the standard error (SE = sqrt(p*(1-p)/n))
b) Calculate the 95% confidence interval
c) Can we conclude the model is better than 50%?
d) How many games would we need to be confident the model is >52%?
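A starter sketch using only the standard library (part d assumes the observed 58% rate would persist, which is itself an assumption worth discussing):

```python
import math

p, n = 0.58, 50
se = math.sqrt(p * (1 - p) / n)        # a) standard error
lo, hi = p - 1.96 * se, p + 1.96 * se  # b) 95% confidence interval
print(f"SE={se:.4f}, 95% CI=({lo:.3f}, {hi:.3f})")  # c) does it exclude 0.50?

# d) Solve p - 1.96*sqrt(p*(1-p)/n) > 0.52 for n, holding p at 0.58.
n_needed = (1.96 / (p - 0.52)) ** 2 * p * (1 - p)
print(f"need roughly n > {math.ceil(n_needed)} games")
```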
Exercise 5: Identifying Data Leakage
For each feature, identify if it would cause data leakage when predicting a Week 8 game:
a) Team's rushing yards per game (through Week 7)
b) Team's final season win total
c) Opponent's passing EPA from the game being predicted
d) Weather forecast for game day
e) Team's average point differential from Weeks 1-7
f) Quarterback's completion percentage for the season
Exercise 6: Building a Simple Rating System
Given these game results:
| Home | Away | Home Score | Away Score |
|---|---|---|---|
| A | B | 28 | 21 |
| C | A | 17 | 24 |
| B | C | 31 | 14 |
| A | D | 35 | 10 |
| D | B | 20 | 27 |
| C | D | 28 | 28 |
a) Calculate each team's average point differential (accounting for an HFA of 3 points)
b) Rank the teams
c) Predict the spread for a game with D at home against A
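A sketch of the bookkeeping, with the HFA stripped out of each raw margin before averaging:

```python
from collections import defaultdict

# (home, away, home_score, away_score)
games = [("A", "B", 28, 21), ("C", "A", 17, 24), ("B", "C", 31, 14),
         ("A", "D", 35, 10), ("D", "B", 20, 27), ("C", "D", 28, 28)]
HFA = 3.0

diffs = defaultdict(list)
for home, away, hs, aws in games:
    adj_margin = hs - aws - HFA      # remove the home edge from the raw margin
    diffs[home].append(adj_margin)
    diffs[away].append(-adj_margin)

# a) and b): average adjusted differential, sorted best to worst.
ratings = {t: sum(d) / len(d) for t, d in diffs.items()}
for team, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {r:+.2f}")

# c) D hosting A: rating gap plus home field, reported in Vegas notation.
margin = ratings["D"] - ratings["A"] + HFA
print(f"D vs A spread: {-margin:+.1f}")
```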
Exercise 7: Calibration Analysis
Group these predictions and check calibration:
| Predicted Prob | Outcome (1=home win) |
|---|---|
| 0.55 | 1 |
| 0.62 | 0 |
| 0.71 | 1 |
| 0.48 | 0 |
| 0.75 | 1 |
| 0.52 | 1 |
| 0.68 | 1 |
| 0.45 | 0 |
| 0.58 | 0 |
| 0.73 | 1 |
Create three half-open bins: [0.40, 0.55), [0.55, 0.65), and [0.65, 0.80), so each prediction falls in exactly one bin. For each bin, calculate the average predicted probability, the actual win rate, and the calibration error (the absolute difference between the two).
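A sketch of the binning step; the half-open intervals are our convention so that each prediction lands in exactly one bin:

```python
preds    = [0.55, 0.62, 0.71, 0.48, 0.75, 0.52, 0.68, 0.45, 0.58, 0.73]
outcomes = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1]

for lo, hi in [(0.40, 0.55), (0.55, 0.65), (0.65, 0.80)]:
    in_bin = [(p, o) for p, o in zip(preds, outcomes) if lo <= p < hi]
    avg_pred = sum(p for p, _ in in_bin) / len(in_bin)
    win_rate = sum(o for _, o in in_bin) / len(in_bin)
    print(f"[{lo:.2f}, {hi:.2f}): n={len(in_bin)}, avg pred={avg_pred:.3f}, "
          f"actual={win_rate:.3f}, calib error={abs(avg_pred - win_rate):.3f}")
```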
Exercise 8: Implementing the Prediction Pipeline
Write Python code to implement:
```python
from typing import Dict, List

import pandas as pd


class SimplePredictionPipeline:
    def __init__(self):
        pass

    def load_data(self, games: pd.DataFrame) -> None:
        """Load and prepare game data."""
        pass

    def calculate_ratings(self) -> Dict[str, float]:
        """Calculate team ratings from results."""
        pass

    def predict_game(self, home: str, away: str) -> Dict:
        """Predict a single game."""
        pass

    def evaluate(self, predictions: List, actuals: pd.DataFrame) -> Dict:
        """Evaluate prediction accuracy."""
        pass
```
Exercise 9: Uncertainty Quantification
A model predicts Team A beats Team B by 7 points.
Given:
- Base game standard deviation: σ = 13.5 points
- Rating sample size: 10 games (adds uncertainty)
- Model confidence: 0.7
a) Calculate the total prediction uncertainty
b) Construct a 90% confidence interval for the spread
c) What's the probability Team B actually wins?
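There is more than one defensible way to combine these inputs. The sketch below inflates the base variance with a simple rating-noise term of σ²/n per team (our assumption, not the only valid choice) and leaves it to you to decide how, if at all, the 0.7 model confidence should enter:

```python
from statistics import NormalDist

mu, sigma, n = 7.0, 13.5, 10

# a) One option: add rating-noise variance of sigma^2/n for each team.
sigma_total = (sigma**2 + 2 * sigma**2 / n) ** 0.5

# b) Two-sided 90% interval for the final margin.
z = NormalDist().inv_cdf(0.95)
lo, hi = mu - z * sigma_total, mu + z * sigma_total

# c) Team B wins whenever the realized margin falls below zero.
p_b_wins = NormalDist(mu, sigma_total).cdf(0)
print(f"sigma={sigma_total:.2f}, 90% CI=({lo:.1f}, {hi:.1f}), "
      f"P(B wins)={p_b_wins:.3f}")
```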
Exercise 10: Overfitting Detection
You have training data (100 games) and test data (50 games).
Model A results:
- Training accuracy: 72%
- Test accuracy: 53%

Model B results:
- Training accuracy: 58%
- Test accuracy: 56%
a) Which model is overfit? b) Which model would you deploy? c) What does this tell us about model complexity?
Exercise 11: Converting Between Formats
Convert between prediction formats:
a) Spread of -7 → Win probability
b) Win probability of 0.65 → Spread
c) Home score 27, Away score 24 → Spread and total
d) Spread -3, Total 44 → Predicted scores
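The conversions follow from the win-probability formula in Exercise 1 plus basic algebra on margin and total; a sketch:

```python
import math

def spread_to_wp(spread):
    """Vegas-convention spread (negative = home favored) -> home win prob."""
    return 1 / (1 + 10 ** (spread / 8))

def wp_to_spread(wp):
    """Inverse of spread_to_wp."""
    return 8 * math.log10(1 / wp - 1)

def scores_to_spread_total(home, away):
    return away - home, home + away                    # spread, total

def spread_total_to_scores(spread, total):
    margin = -spread                                   # expected home margin
    return (total + margin) / 2, (total - margin) / 2

print(f"a) {spread_to_wp(-7):.3f}")
print(f"b) {wp_to_spread(0.65):+.2f}")
print(f"c) {scores_to_spread_total(27, 24)}")
print(f"d) {spread_total_to_scores(-3, 44)}")
```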
Exercise 12: Adjustment Factors
Calculate the total adjustment for this game:
- Home team: 10 days rest
- Away team: 6 days rest
- Away team traveled 2,500 miles
- Timezone difference: Away is 3 hours behind (traveling east)
- Temperature: 25°F (outdoor stadium)
- Home team missing starting QB (estimate: -3 points)
Use these rules:
- Rest: 0.5 points per day of rest difference
- Long travel: 1.5 points
- Timezone: 1 point
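A sketch of the tally from the home team's perspective; note that the prompt assigns no point value to the 25°F cold, so that piece is left for you to argue:

```python
rest_adj    = (10 - 6) * 0.5  # 0.5 pts per day of rest advantage
travel_adj  = 1.5             # away team's 2,500-mile trip favors the home side
tz_adj      = 1.0             # 3-hour eastward shift, per the prompt's rule
qb_adj      = -3.0            # home team missing its starting QB
weather_adj = 0.0             # no rule given for 25F; justify your own estimate

total = rest_adj + travel_adj + tz_adj + qb_adj + weather_adj
print(f"total adjustment (home perspective): {total:+.1f} points")
```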
Exercise 13: Comparing to Market
Your model predictions vs Vegas lines:
| Game | Your Spread | Vegas Spread |
|---|---|---|
| 1 | -6.0 | -7.5 |
| 2 | -3.0 | -2.5 |
| 3 | +1.0 | -1.0 |
| 4 | -10.0 | -9.0 |
| 5 | +5.0 | +6.5 |
a) For which games do you have the biggest "edge"?
b) Which side would you bet for each game?
c) What's the average absolute difference from Vegas?
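A sketch for the edge calculation; a negative difference means your number favors the home side more than the market does:

```python
model = [-6.0, -3.0, 1.0, -10.0, 5.0]
vegas = [-7.5, -2.5, -1.0, -9.0, 6.5]

for game, (m, v) in enumerate(zip(model, vegas), start=1):
    edge = m - v
    side = "home" if edge < 0 else "away"
    print(f"Game {game}: edge={edge:+.1f} -> lean {side}")

avg_abs = sum(abs(m - v) for m, v in zip(model, vegas)) / len(model)
print(f"average |model - Vegas| = {avg_abs:.2f}")
```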
Exercise 14: Season Win Projection
Project season wins for a team with:
- Power rating: +3.0 (points above average)
- Strength of schedule (SOS): -0.5 (slightly below-average opponents)
- Games remaining: 17
a) Calculate the expected win probability per game
b) Project total wins
c) Calculate a 90% confidence interval for wins
d) Estimate playoff probability (assume 10+ wins needed)
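One way in, using the σ = 13.5 margin model from Exercise 16 for the per-game win probability (you could just as well use the formula from Exercise 1) and a binomial approximation that assumes independent games:

```python
from statistics import NormalDist

rating, sos, games_left, sigma = 3.0, -0.5, 17, 13.5

# a) Expected margin vs the schedule, mapped to a win probability.
margin = rating - sos
wp = NormalDist().cdf(margin / sigma)

# b) Expected wins, with c) a binomial std dev for a rough 90% interval.
exp_wins = games_left * wp
sd = (games_left * wp * (1 - wp)) ** 0.5
lo, hi = exp_wins - 1.645 * sd, exp_wins + 1.645 * sd

# d) Normal approximation to P(wins >= 10), with a continuity correction.
p_playoffs = 1 - NormalDist(exp_wins, sd).cdf(9.5)
print(f"wp={wp:.3f}, wins={exp_wins:.1f}, 90% CI=({lo:.1f}, {hi:.1f}), "
      f"P(10+)={p_playoffs:.3f}")
```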
Exercise 15: Building a Model Evaluation Framework
Create a comprehensive evaluation class:
```python
from typing import Dict, List

import pandas as pd


class ModelEvaluator:
    def __init__(self, predictions: List[Dict], actuals: pd.DataFrame):
        pass

    def straight_up_accuracy(self) -> float:
        pass

    def ats_accuracy(self, market_spreads: List[float]) -> float:
        pass

    def brier_score(self) -> float:
        pass

    def mae_spread(self) -> float:
        pass

    def calibration_plot(self) -> None:
        """Create calibration visualization."""
        pass

    def generate_report(self) -> Dict:
        """Generate comprehensive evaluation report."""
        pass
```
Exercise 16: Understanding Variance
The standard deviation of NFL final margins around the point spread is approximately 13.5 points.
a) If you predict a team wins by 7, what's the probability they actually lose?
b) If you're 60% confident in a pick, what spread does that imply?
c) Over 100 games with true 55% skill, what's the probability of appearing to have <50% accuracy?
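A sketch using the normal margin model for parts a and b and an exact binomial sum for part c:

```python
from math import comb
from statistics import NormalDist

sigma = 13.5
nd = NormalDist()

# a) P(team actually loses | predicted margin = +7)
print(f"a) {nd.cdf(-7 / sigma):.3f}")

# b) Margin implied by 60% win confidence (invert the normal model).
print(f"b) {sigma * nd.inv_cdf(0.60):+.2f}")

# c) P(49 or fewer wins in 100 games at a true 55% rate) -- exact binomial.
p = sum(comb(100, k) * 0.55**k * 0.45**(100 - k) for k in range(50))
print(f"c) {p:.3f}")
```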
Exercise 17: Feature Importance Analysis
You have these potential features for a prediction model:
1. Team power rating
2. Opponent power rating
3. Home field advantage
4. Rest days difference
5. Travel distance
6. Weather conditions
7. Injuries
8. Recent form (last 3 games)
Rank these by expected importance and justify your ranking.
Exercise 18: Temporal Data Split
Given games from 2018-2023:
a) Design a proper train/test split
b) Explain why you can't randomly split NFL data
c) How would you implement walk-forward validation? (A starter sketch follows below.)
d) What's the minimum test set size you'd need?
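For part c, a minimal sketch of season-level walk-forward validation; the `season` column name is our assumption about the data layout:

```python
import pandas as pd

def walk_forward_splits(games: pd.DataFrame, seasons, min_train=2):
    """Yield (test_season, train, test): train on everything strictly earlier."""
    for i in range(min_train, len(seasons)):
        train = games[games["season"] < seasons[i]]   # past seasons only
        test = games[games["season"] == seasons[i]]   # the held-out season
        yield seasons[i], train, test

# With 2018-2023: train on 2018-19 -> test 2020, train 2018-20 -> test 2021, ...
```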
Exercise 19: Model Comparison
Two models made predictions for the same 200 games:
| Metric | Model A | Model B |
|---|---|---|
| SU Accuracy | 58% | 55% |
| ATS Accuracy | 51% | 53% |
| Brier Score | 0.235 | 0.218 |
| MAE | 11.2 | 10.5 |
a) Which model is better for picking winners?
b) Which model is better for betting?
c) Which model has better probability calibration?
d) Overall, which model would you prefer, and why?
Exercise 20: Complete Prediction Model
Build a complete prediction model that:
- Loads historical game data
- Engineers features (ratings, HFA, adjustments)
- Makes predictions for upcoming games
- Evaluates performance
- Reports uncertainty
```python
from typing import Dict, Optional

import pandas as pd


class CompletePredictionModel:
    """
    A full prediction model implementation.

    Requirements:
    - Calculate team ratings
    - Apply home field advantage
    - Include at least 2 adjustment factors
    - Output win probability, spread, and confidence
    - Implement proper evaluation
    """

    def __init__(self):
        pass

    def fit(self, games: pd.DataFrame) -> None:
        pass

    def predict(self, home: str, away: str,
                context: Optional[Dict] = None) -> Dict:
        pass

    def evaluate(self, test_games: pd.DataFrame) -> Dict:
        pass
```
Test with sample data and report evaluation metrics.
Submission Guidelines
For coding exercises:
1. Include all imports
2. Add docstrings explaining your approach
3. Test with the provided sample data
4. Include example output
5. Report evaluation metrics where applicable
For analytical exercises:
1. Show all calculations
2. Explain your reasoning
3. Discuss uncertainty and limitations