Exercises: Elo and Power Ratings


Exercise 1: Basic Elo Calculation

The Chiefs have an Elo rating of 1650 and the Broncos have 1420.

Tasks:
  a) Calculate the expected score for the Chiefs
  b) If K=28 and the Chiefs win, what are both teams' new ratings?
  c) If K=28 and the Broncos upset the Chiefs, what are both teams' new ratings?
  d) Compare the magnitude of rating changes in scenarios (b) and (c)
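
A short script can help check the hand calculations. The sketch below assumes the standard logistic Elo formula on a 400-point scale, which this exercise does not state explicitly; the function names are illustrative only.

```python
def expected_score(rating_a, rating_b):
    """Expected score for team A against team B (assumed 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=28):
    """Return (new_a, new_b). score_a is 1 if team A wins, 0 if team A loses."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Elo is zero-sum: whatever team A gains, team B loses.
    return rating_a + delta, rating_b - delta
```

Calling `update_ratings(1650, 1420, 1)` and `update_ratings(1650, 1420, 0)` covers scenarios (b) and (c); comparing the two deltas is the point of task (d).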


Exercise 2: Home Advantage Impact

Two evenly matched teams (both rated 1500) play. Home advantage is 48 Elo points.

Tasks:
  a) What is the home team's expected score?
  b) Convert this to a win probability
  c) What point spread does this imply (using 25 Elo = 1 point)?
  d) If home advantage decreased to 40 Elo, how would probabilities change?
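
One common way to model home advantage is to add it to the home team's rating before computing the expected score. The sketch below assumes that approach and the 400-point logistic scale; the function names are hypothetical.

```python
def home_win_probability(home_rating, away_rating, home_advantage=48):
    """Expected score for the home team after adding the home-advantage bonus."""
    diff = home_rating + home_advantage - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def implied_spread(home_rating, away_rating, home_advantage=48, elo_per_point=25):
    """Points by which the home team is favored (negative means underdog)."""
    return (home_rating + home_advantage - away_rating) / elo_per_point
```

Rerunning both functions with `home_advantage=40` gives the comparison asked for in task (d).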


Exercise 3: K-Factor Exploration

Process the following game sequence with K=20 and K=40 separately. Team A starts at 1500, Team B starts at 1500.

  • Game 1: Team A beats Team B
  • Game 2: Team B beats Team A
  • Game 3: Team A beats Team B
  • Game 4: Team A beats Team B

Tasks:
  a) Track ratings after each game for both K values
  b) What are the final ratings with each K-factor?
  c) Plot the rating trajectories
  d) Which K-factor would be better for a volatile team? For a consistent team?


Exercise 4: Margin of Victory Multiplier

Implement the FiveThirtyEight margin multiplier:

def margin_multiplier(margin, elo_diff):
    """
    margin: Winner's margin of victory (always positive)
    elo_diff: Winner's Elo - Loser's Elo
    """
    # Your implementation here

Tasks:
  a) Calculate the multiplier for a 7-point favorite winning by 14 (convert point spreads to elo_diff using 25 Elo = 1 point, as in Exercise 2)
  b) Calculate the multiplier for a 7-point underdog winning by 14
  c) Calculate the multiplier for a 14-point favorite winning by 28
  d) Explain why underdog blowouts produce higher multipliers
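
The exercise does not reproduce the formula. A commonly cited form of FiveThirtyEight's NFL multiplier is ln(|margin| + 1) × 2.2 / (0.001 × elo_diff + 2.2). The sketch below assumes that form, so check it against your reference material before relying on it.

```python
import math


def margin_multiplier(margin, elo_diff):
    """
    margin: Winner's margin of victory (always positive)
    elo_diff: Winner's Elo - Loser's Elo
    Assumed form: ln(|margin| + 1) * 2.2 / (0.001 * elo_diff + 2.2)
    """
    # The log term rewards larger margins with diminishing returns.
    size_term = math.log(abs(margin) + 1)
    # The second term shrinks the reward when the winner was already favored.
    favorite_correction = 2.2 / (elo_diff * 0.001 + 2.2)
    return size_term * favorite_correction
```

Because `elo_diff` is negative when the underdog wins, the denominator shrinks and the multiplier grows, which is the behavior task (d) asks you to explain.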


Exercise 5: Season Regression

End-of-2023 ratings:

  • Chiefs: 1680
  • Cowboys: 1620
  • Patriots: 1350
  • Bears: 1380

Tasks:
  a) Apply 1/3 regression to the mean for all teams
  b) What is the new rating spread between Chiefs and Bears?
  c) If the Patriots had a roster continuity of 0.6 while others had 0.8, how might you adjust regression differently?
  d) Why do we regress more heavily for teams with major changes?
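
Regression to the mean pulls each rating a fixed fraction of the way back toward the league average. The sketch below assumes a league mean of 1500, which the exercise does not specify; the `fraction` parameter is exposed so that a continuity-based adjustment for task (c) can be tried.

```python
def regress_to_mean(rating, mean=1500.0, fraction=1 / 3):
    """Move rating `fraction` of the way toward `mean` (assumed league average)."""
    return rating + fraction * (mean - rating)
```

For example, `regress_to_mean(1800)` moves a rating one third of the way from 1800 toward 1500. A larger `fraction` for low-continuity teams is one possible design, not the only one.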


Exercise 6: Building an SRS

Given these game results (no home advantage for simplicity):

Home  Away  Home Score  Away Score
A     B     28          21
B     C     24          17
C     A     20          27
A     C     31          14
B     A     21          28
C     B     10          24

Tasks:
  a) Calculate each team's average margin
  b) Calculate each team's strength of schedule (average opponent rating) after one iteration
  c) Iterate until ratings converge (3-4 iterations)
  d) Compare final SRS ratings to simple average margins
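
The iteration described in the tasks can be sketched as follows: each rating is the team's average margin plus the average rating of its opponents, repeated until the numbers settle. This is a minimal illustration with no home advantage; the function name and tuple layout are assumptions.

```python
def simple_rating_system(games, iterations=50):
    """games: list of (home, away, home_pts, away_pts). Returns {team: rating}."""
    margins, opponents = {}, {}
    for home, away, home_pts, away_pts in games:
        margins.setdefault(home, []).append(home_pts - away_pts)
        margins.setdefault(away, []).append(away_pts - home_pts)
        opponents.setdefault(home, []).append(away)
        opponents.setdefault(away, []).append(home)

    avg_margin = {team: sum(m) / len(m) for team, m in margins.items()}
    ratings = dict(avg_margin)  # iteration 0: rating equals average margin

    for _ in range(iterations):
        # Strength of schedule: average current rating of each team's opponents.
        sos = {
            team: sum(ratings[opp] for opp in opps) / len(opps)
            for team, opps in opponents.items()
        }
        ratings = {team: avg_margin[team] + sos[team] for team in ratings}
        # Re-center so the league average stays at zero.
        league_mean = sum(ratings.values()) / len(ratings)
        ratings = {team: r - league_mean for team, r in ratings.items()}
    return ratings
```

Passing `iterations=1` shows the first-iteration values for task (b). Plain iteration can oscillate on some schedules (for instance, two teams that only play each other), in which case solving the equivalent linear system directly is a safer choice.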


Exercise 7: Elo vs SRS Comparison

Using the same games from Exercise 6:

Tasks:
  a) Process games through Elo (K=30, neutral site) in the order given
  b) Compare Elo ratings to SRS ratings from Exercise 6
  c) Reorder the games and process through Elo again
  d) Do the final Elo ratings differ? Why?
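
To experiment with game order, a sequential processor is enough. The sketch below assumes win/loss-only updates on the 400-point logistic scale with every team starting at 1500; `run_elo` is a hypothetical helper name.

```python
def run_elo(games, k=30, start=1500.0):
    """games: list of (winner, loser) tuples, processed in order at a neutral site."""
    ratings = {}
    for winner, loser in games:
        r_win = ratings.setdefault(winner, start)
        r_lose = ratings.setdefault(loser, start)
        exp_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
        delta = k * (1.0 - exp_win)  # the winner always scores 1
        ratings[winner] = r_win + delta
        ratings[loser] = r_lose - delta
    return ratings
```

Running the same set of results in two different orders, for example `run_elo([("A", "B"), ("B", "A")])` versus `run_elo([("B", "A"), ("A", "B")])`, gives different final ratings because each update depends on the ratings at the time of the game.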


Exercise 8: Converting Ratings to Spreads

You have three rating systems producing these outputs for Chiefs vs Broncos:

  • Elo: Chiefs 1650, Broncos 1420
  • SRS: Chiefs +8.5, Broncos -6.2
  • Efficiency: Chiefs +12.5, Broncos -4.8

Tasks:
  a) Convert each to a predicted spread (with 2.5 point HFA for Chiefs at home)
  b) Why might the three systems disagree?
  c) Create an ensemble prediction using equal weights
  d) Create a weighted ensemble using Brier score inverse weighting (assume Elo=0.218, SRS=0.220, Efficiency=0.215)
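
Inverse weighting gives each system a weight proportional to 1 divided by its Brier score, normalized so the weights sum to 1. The sketch below shows that arithmetic; the function names and dictionary layout are illustrative assumptions.

```python
def inverse_score_weights(scores):
    """scores: {system_name: Brier score}. Lower scores receive larger weights."""
    inverse = {name: 1.0 / score for name, score in scores.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_ensemble(predictions, weights):
    """predictions: {system_name: predicted spread}. Returns the weighted average."""
    return sum(predictions[name] * weights[name] for name in predictions)
```

When the Brier scores are close together, as they are in task (d), the resulting weights land near the equal-weight case, which is worth noting when comparing (c) and (d).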


Exercise 9: Calibration Analysis

Your Elo system produces these probability buckets over 100 games:

Predicted Range  Games  Actual Wins
50-55%           15     8
55-60%           20     12
60-65%           25     15
65-70%           18     14
70-75%           12     10
75-80%           6      5
80%+             4      4

Tasks:
  a) Calculate actual win rates for each bucket
  b) Plot a calibration curve (predicted vs actual)
  c) Is the system over-confident, under-confident, or well-calibrated?
  d) What adjustments would improve calibration?
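
A calibration check compares each bucket's predicted probability with its observed win rate. The sketch below assumes the bucket midpoint is a fair stand-in for the predicted probability; for the open-ended "80%+" bucket you must choose an upper bound yourself, and that choice is an assumption.

```python
def calibration_table(buckets):
    """
    buckets: list of (low, high, games, wins), probabilities given as fractions.
    Returns a list of (predicted, actual, actual - predicted) per bucket.
    """
    rows = []
    for low, high, games, wins in buckets:
        predicted = (low + high) / 2  # midpoint as the representative prediction
        actual = wins / games
        rows.append((predicted, actual, actual - predicted))
    return rows
```

Positive gaps mean teams in that bucket won more often than predicted, negative gaps mean they won less often. With only a handful of games per bucket, small gaps may be noise.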


Exercise 10: Real-Time Updates

Design a system that updates Elo ratings in real time during games.

Tasks:
  a) How would you calculate a "current game value" based on the live score?
  b) Should you update ratings during the game or wait until final?
  c) What problems arise with updating ratings mid-game?
  d) How might you handle overtime games?


Exercise 11: Efficiency Rating Implementation

Implement a simplified efficiency rating using success rate:

def calculate_success_rate(plays):
    """
    Success rate based on down and yards needed.
    1st down: 45% of needed yards
    2nd down: 60% of needed yards
    3rd/4th down: 100% of needed yards
    """
    # Your implementation here

Tasks:
  a) Calculate success rate for a team with these play results:
       • 1st & 10: 5 yards (success?)
       • 2nd & 5: 2 yards (success?)
       • 3rd & 3: 4 yards (success?)
       • 1st & 10: 3 yards (success?)
       • 2nd & 7: 8 yards (success?)
  b) What is the overall success rate?
  c) Why might success rate differ from EPA?
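
One possible shape for the implementation, using the thresholds from the docstring above. The tuple layout `(down, to_go, gained)` is an assumption made for this sketch, as is the choice to return 0.0 for an empty play list.

```python
# Fraction of the needed yards that counts as a success, by down.
THRESHOLDS = {1: 0.45, 2: 0.60, 3: 1.0, 4: 1.0}


def is_success(down, to_go, gained):
    """True if the play gained at least the required share of the yards to go."""
    return gained >= THRESHOLDS[down] * to_go


def calculate_success_rate(plays):
    """plays: list of (down, to_go, gained) tuples. Returns a fraction in [0, 1]."""
    if not plays:
        return 0.0
    successes = sum(is_success(down, to_go, gained) for down, to_go, gained in plays)
    return successes / len(plays)
```

A play such as 1st & 10 for 5 yards would be passed as `(1, 10, 5)`.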


Exercise 12: Handling Team Changes

The Broncos sign a Pro Bowl quarterback in free agency. Their current Elo is 1380.

Tasks:
  a) What manual adjustment might you apply (justify your choice)?
  b) How many games would it take for Elo to naturally correct if no adjustment is made?
  c) What are the risks of manual adjustments?
  d) Design an approach using rookie contract data or cap investment instead of manual adjustments


Exercise 13: Cross-Season Evaluation

You want to evaluate your Elo system across 5 seasons.

Tasks:
  a) Design a proper evaluation methodology (training/test splits)
  b) How should you handle season boundaries?
  c) What metrics should you track across seasons?
  d) How would you determine if performance is declining over time?


Exercise 14: Ensemble Optimization

You have three rating systems with these historical performances:

System      SU Accuracy  Brier Score  Spread MAE  Correlation
Elo         60.2%        0.218        10.8        0.65
SRS         59.8%        0.221        10.5        0.68
Efficiency  61.0%        0.215        11.2        0.70

The Correlation column shows how strongly each system's errors correlate with the errors of the other systems.

Tasks:
  a) Which single system would you choose for spread predictions?
  b) How might lower correlation improve ensemble performance?
  c) Design an optimal weighting scheme
  d) Why might the ensemble beat all individual systems?


Exercise 15: Schedule Effects

Team A plays this schedule:

  • Weeks 1-4: Four home games against weak opponents
  • Weeks 5-8: Four road games against strong opponents

Tasks:
  a) How would Elo ratings evolve through this schedule?
  b) Would SRS ratings differ? Why?
  c) When during the season would each system most accurately rate Team A?
  d) How could you modify Elo to be less sensitive to schedule front-loading?


Exercise 16: Variance and Uncertainty

Your model predicts the Ravens to beat the Browns by 7 points.

Tasks:
  a) Using σ = 13.5 points, what is the 90% prediction interval for the actual margin?
  b) What is the probability the Browns win outright?
  c) What is the probability the Ravens cover a -7 spread?
  d) What is the probability of a victory by 14 or more points for either team?
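
These questions can be explored by treating the final margin as normally distributed around the predicted margin. That normality is a modeling assumption, and the helper names below are hypothetical. The sketch uses `statistics.NormalDist` from the Python standard library (3.8+).

```python
from statistics import NormalDist


def margin_distribution(predicted_margin, sigma=13.5):
    """Assumed model: actual margin ~ Normal(predicted_margin, sigma)."""
    return NormalDist(mu=predicted_margin, sigma=sigma)


def central_interval(dist, coverage=0.90):
    """Return (low, high) bounds containing `coverage` of the distribution."""
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


def prob_margin_below(dist, threshold):
    """Probability that the actual margin falls below `threshold`."""
    return dist.cdf(threshold)
```

With `dist = margin_distribution(7)`, the call `prob_margin_below(dist, 0)` approximates the chance of an outright upset (ignoring ties), and `1 - prob_margin_below(dist, 14)` gives one of the two tails needed for task (d).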


Exercise 17: Historical K-Factor Analysis

Using historical data (simulated or real), test K-factors from 15 to 50 in increments of 5.

Tasks:
  a) For each K-factor, calculate straight-up accuracy on held-out games
  b) Plot K-factor vs accuracy
  c) What is the optimal K-factor? Does it differ by season?
  d) Is there a tradeoff between early-season and late-season accuracy?


Exercise 18: Rating Stability Analysis

Track the Chiefs' Elo rating through a season where they:

  • Start 6-0
  • Go 1-3 in weeks 7-10
  • Finish 4-0

Tasks:
  a) With K=28, what is the approximate rating trajectory?
  b) Do the ratings reflect the mid-season slump appropriately?
  c) Where is the rating at season end vs. where it probably should be?
  d) How might you modify the system to better handle hot/cold streaks?


Exercise 19: Market Comparison

Your model produces these lines vs the market for Week 10:

Game   Your Spread  Market  Difference
A @ B  -3.5         -6.0    +2.5
C @ D  +1.0         -1.5    +2.5
E @ F  -7.0         -7.0    0
G @ H  +4.5         +3.0    +1.5

Tasks:
  a) Which games offer the most "value" (biggest model-market difference)?
  b) If the market is efficient, what does disagreement imply?
  c) How would you evaluate whether your model or the market was right over time?
  d) At what threshold of disagreement should you consider betting?


Exercise 20: Complete Rating System

Build a complete NFL Elo system with these features:

  1. Margin-adjusted updates
  2. Configurable home advantage
  3. Season-to-season regression
  4. Calibration to spreads
  5. Evaluation metrics

Tasks:
  a) Implement the full system as a Python class
  b) Process 3 seasons of simulated games
  c) Evaluate straight-up accuracy, Brier score, and MAE
  d) Generate end-of-season power rankings
  e) Compare to a simple win percentage ranking


Programming Challenges

Challenge A: Elo Visualization

Create a visualization dashboard showing:

  • Team rating trajectories over time
  • League rating distribution
  • Game prediction accuracy by week
  • Calibration plot

Challenge B: Rating System Tournament

Implement Elo, SRS, and a simple efficiency system. Run a "tournament" where each system predicts a test set of games. Compare:

  • Overall accuracy
  • Performance by spread size (close games vs blowouts)
  • Early vs late season performance
  • Home vs away game prediction

Challenge C: Adaptive K-Factor

Implement a K-factor that varies based on:

  • How many games the team has played
  • How confident we are in the rating (recent variance)
  • Whether the team has had major changes

Evaluate whether adaptive K outperforms fixed K.

Challenge D: NFL vs College Football Elo

Adapt your NFL Elo system for college football. Consider:

  • 130+ teams instead of 32
  • Teams don't all play each other
  • Much larger talent disparities
  • More total games per season, but fewer games per team

What parameter changes are necessary?