Exercises: Elo and Power Ratings
Exercise 1: Basic Elo Calculation
The Chiefs have an Elo rating of 1650 and the Broncos have 1420.
Tasks:
a) Calculate the expected score for the Chiefs
b) If K=28 and the Chiefs win, what are both teams' new ratings?
c) If K=28 and the Broncos upset the Chiefs, what are both teams' new ratings?
d) Compare the magnitude of rating changes in scenarios (b) and (c)
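As a starting point, the standard logistic Elo formulas can be sketched as below. The function names `expected_score` and `update_ratings` are hypothetical helpers, not part of the chapter, and the sketch assumes a decisive result with no ties and no home advantage.

```python
def expected_score(rating_a, rating_b):
    """Expected score for team A under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(winner, loser, k):
    """Return (new_winner, new_loser) after a decisive result (no ties)."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Sanity check on evenly matched teams: 50% expectation, K/2 points exchanged.
print(expected_score(1500, 1500))        # 0.5
print(update_ratings(1500, 1500, k=20))  # (1510.0, 1490.0)
```

The update is zero-sum: whatever the winner gains, the loser gives up. Calling `update_ratings(1420, 1650, k=28)` handles the upset case in part (c), because the first argument is always the winner.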
Exercise 2: Home Advantage Impact
Two evenly-matched teams (both rated 1500) play. Home advantage is 48 Elo points.
Tasks:
a) What is the home team's expected score?
b) Convert this to a win probability
c) What point spread does this imply (using 25 Elo = 1 point)?
d) If home advantage decreased to 40 Elo, how would probabilities change?
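One way to fold home advantage into the calculation is to add it to the home team's rating before applying the Elo curve. The sketch below assumes that convention; `home_win_probability` and `elo_diff_to_spread` are hypothetical names, and the 25 Elo per point conversion is the one given in the exercise.

```python
def home_win_probability(home_rating, away_rating, home_advantage=48.0):
    """Elo expected score for the home team after adding the home bonus."""
    diff = (home_rating + home_advantage) - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_diff_to_spread(elo_diff, elo_per_point=25.0):
    """Convert an Elo gap to points; positive means the first team is favored."""
    return elo_diff / elo_per_point


p = home_win_probability(1500, 1500)
spread = elo_diff_to_spread(48.0)
print(f"home win probability: {p:.3f}, implied spread: {spread:.2f} points")
```

Passing `home_advantage=40.0` gives the comparison needed for part (d).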
Exercise 3: K-Factor Exploration
Process the following game sequence with K=20 and K=40 separately. Team A starts at 1500, Team B starts at 1500.
- Game 1: Team A beats Team B
- Game 2: Team B beats Team A
- Game 3: Team A beats Team B
- Game 4: Team A beats Team B

Tasks:
a) Track ratings after each game for both K values
b) What are the final ratings with each K-factor?
c) Plot the rating trajectories
d) Which K-factor would be better for a volatile team? For a consistent team?
Exercise 4: Margin of Victory Multiplier
Implement the FiveThirtyEight margin multiplier:
```python
def margin_multiplier(margin, elo_diff):
    """
    margin: Winner's margin of victory (always positive)
    elo_diff: Winner's Elo - Loser's Elo
    """
    # Your implementation here
```
Tasks:
a) Calculate the multiplier for a 7-point favorite winning by 14
b) Calculate the multiplier for a 7-point underdog winning by 14
c) Calculate the multiplier for a 14-point favorite winning by 28
d) Explain why underdog blowouts produce higher multipliers
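For reference, the margin-of-victory multiplier as it is commonly cited from FiveThirtyEight's published NFL Elo description is shown below. Treat the constants as an assumption and check them against the chapter text before relying on them. The symbols `margin` and `elo_diff` match the function arguments in this exercise.

```latex
\text{multiplier} = \ln\bigl(\lvert \text{margin} \rvert + 1\bigr) \times \frac{2.2}{0.001 \cdot \text{elo\_diff} + 2.2}
```

Because `elo_diff` is the winner's rating minus the loser's, it is negative when an underdog wins. A point spread can be converted to `elo_diff` with the 25 Elo = 1 point conversion from Exercise 2, so a 7-point favorite corresponds to a gap of 175 Elo points.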
Exercise 5: Season Regression
End-of-2023 ratings:
- Chiefs: 1680
- Cowboys: 1620
- Patriots: 1350
- Bears: 1380

Tasks:
a) Apply 1/3 regression to the mean for all teams
b) What is the new rating spread between Chiefs and Bears?
c) If the Patriots had a roster continuity of 0.6 while others had 0.8, how might you adjust regression differently?
d) Why do we regress more heavily for teams with major changes?
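A minimal sketch of offseason regression follows. It assumes a league mean of 1500, which the exercise does not state explicitly, and `regress_to_mean` is a hypothetical helper name.

```python
def regress_to_mean(rating, fraction=1 / 3, league_mean=1500.0):
    """Move a rating `fraction` of the way back toward the league mean."""
    return league_mean + (1.0 - fraction) * (rating - league_mean)


# A team 100 points above the mean keeps half of that edge at fraction=0.5.
print(round(regress_to_mean(1600.0, fraction=0.5), 1))  # 1550.0
```

For part (c), one option is to make `fraction` depend on roster continuity, so that lower continuity produces a larger pull toward the mean. The exact mapping is a design choice left to the exercise.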
Exercise 6: Building an SRS
Given these game results (no home advantage for simplicity):
| Home | Away | Home Score | Away Score |
|---|---|---|---|
| A | B | 28 | 21 |
| B | C | 24 | 17 |
| C | A | 20 | 27 |
| A | C | 31 | 14 |
| B | A | 21 | 28 |
| C | B | 10 | 24 |
Tasks:
a) Calculate each team's average margin
b) Calculate each team's strength of schedule (average opponent rating) after one iteration
c) Iterate until ratings converge (3-4 iterations)
d) Compare final SRS ratings to simple average margins
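The iteration can be sketched as follows, using the usual SRS definition (rating = average margin + average opponent rating) and starting from ratings equal to average margins. The function name `simple_rating_system` and the fixed iteration count are assumptions made for illustration.

```python
from collections import defaultdict

# (home, away, home_score, away_score), copied from the table above
GAMES = [
    ("A", "B", 28, 21),
    ("B", "C", 24, 17),
    ("C", "A", 20, 27),
    ("A", "C", 31, 14),
    ("B", "A", 21, 28),
    ("C", "B", 10, 24),
]


def simple_rating_system(games, iterations=50):
    """Iterate rating = average margin + average opponent rating."""
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for home, away, home_pts, away_pts in games:
        margins[home].append(home_pts - away_pts)
        margins[away].append(away_pts - home_pts)
        opponents[home].append(away)
        opponents[away].append(home)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # iteration 0: ratings equal average margins
    for _ in range(iterations):
        ratings = {
            t: avg_margin[t]
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in avg_margin
        }
    return avg_margin, ratings


avg_margin, srs = simple_rating_system(GAMES)
for team in sorted(srs):
    print(team, round(avg_margin[team], 2), round(srs[team], 2))
```

This plain fixed-point iteration settles quickly on the balanced schedule above, but it is not guaranteed to converge for every schedule. Solving the equivalent linear system is a more robust alternative.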
Exercise 7: Elo vs SRS Comparison
Using the same games from Exercise 6:
Tasks:
a) Process games through Elo (K=30, neutral site) in the order given
b) Compare Elo ratings to SRS ratings from Exercise 6
c) Reorder the games and process through Elo again
d) Do the final Elo ratings differ? Why?
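To experiment with game order, a small harness like the one below may help. It assumes every team starts at 1500 and uses plain win/loss updates with no margin adjustment; `run_elo` is a hypothetical helper, and the winner/loser pairs are read off the Exercise 6 table.

```python
def expected_score(rating_a, rating_b):
    """Expected score for team A under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_elo(results, k=30.0, start=1500.0):
    """Process (winner, loser) pairs in order on a neutral site."""
    ratings = {}
    for winner, loser in results:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


# Winners and losers from the Exercise 6 table, in the order given.
RESULTS = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "C"), ("A", "B"), ("B", "C")]

in_order = run_elo(RESULTS)
reversed_order = run_elo(list(reversed(RESULTS)))
for team in sorted(in_order):
    print(team, round(in_order[team], 1), round(reversed_order[team], 1))
```

Reversing the list is only one possible reordering; shuffling with `random.shuffle` on a copy of `RESULTS` gives others to compare.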
Exercise 8: Converting Ratings to Spreads
You have three rating systems producing these outputs for Chiefs vs Broncos:
- Elo: Chiefs 1650, Broncos 1420
- SRS: Chiefs +8.5, Broncos -6.2
- Efficiency: Chiefs +12.5, Broncos -4.8
Tasks:
a) Convert each to a predicted spread (with a 2.5-point home-field advantage for the Chiefs at home)
b) Why might the three systems disagree?
c) Create an ensemble prediction using equal weights
d) Create a weighted ensemble using inverse Brier score weighting (assume Elo=0.218, SRS=0.220, Efficiency=0.215)
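Inverse-score weighting can be sketched as below: each system's weight is proportional to 1 divided by its Brier score, so lower (better) scores earn larger weights. The helper names `inverse_score_weights` and `weighted_prediction` are hypothetical.

```python
def inverse_score_weights(scores):
    """Weight each system by 1/score, normalized so the weights sum to 1."""
    inverse = {name: 1.0 / s for name, s in scores.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}


def weighted_prediction(predictions, weights):
    """Weighted average of per-system predictions (for example, spreads)."""
    return sum(predictions[name] * weights[name] for name in predictions)


brier = {"Elo": 0.218, "SRS": 0.220, "Efficiency": 0.215}
weights = inverse_score_weights(brier)
print({name: round(w, 4) for name, w in weights.items()})
```

Because the three Brier scores are so close, the resulting weights are nearly equal, which is worth keeping in mind when comparing parts (c) and (d).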
Exercise 9: Calibration Analysis
Your Elo system produces these probability buckets over 100 games:
| Predicted Range | Games | Actual Wins |
|---|---|---|
| 50-55% | 15 | 8 |
| 55-60% | 20 | 12 |
| 60-65% | 25 | 15 |
| 65-70% | 18 | 14 |
| 70-75% | 12 | 10 |
| 75-80% | 6 | 5 |
| 80%+ | 4 | 4 |
Tasks:
a) Calculate actual win rates for each bucket
b) Plot a calibration curve (predicted vs actual)
c) Is the system over-confident, under-confident, or well-calibrated?
d) What adjustments would improve calibration?
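The bucket arithmetic can be organized as in the sketch below. Using the midpoint of each range as the "predicted" value is an assumption, and the 0.85 midpoint for the open-ended 80%+ bucket is a guess that should be replaced with the true average prediction if it is available. `calibration_table` is a hypothetical helper.

```python
# (label, assumed midpoint of the predicted range, games, actual wins)
BUCKETS = [
    ("50-55%", 0.525, 15, 8),
    ("55-60%", 0.575, 20, 12),
    ("60-65%", 0.625, 25, 15),
    ("65-70%", 0.675, 18, 14),
    ("70-75%", 0.725, 12, 10),
    ("75-80%", 0.775, 6, 5),
    ("80%+", 0.850, 4, 4),  # midpoint for the open-ended bucket is an assumption
]


def calibration_table(buckets):
    """Return (label, predicted, actual, actual - predicted) for each bucket."""
    rows = []
    for label, predicted, games, wins in buckets:
        actual = wins / games
        rows.append((label, predicted, actual, actual - predicted))
    return rows


for label, predicted, actual, gap in calibration_table(BUCKETS):
    print(f"{label:>7}  predicted {predicted:.3f}  actual {actual:.3f}  gap {gap:+.3f}")
```

The `(predicted, actual)` pairs are the points for the calibration curve in part (b). Small buckets such as the last two carry a lot of sampling noise, so their gaps deserve less weight when judging part (c).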
Exercise 10: Real-Time Updates
Design a system that updates Elo ratings in real-time during games.
Tasks:
a) How would you calculate a "current game value" based on the live score?
b) Should you update ratings during the game or wait until final?
c) What problems arise with updating ratings mid-game?
d) How might you handle overtime games?
Exercise 11: Efficiency Rating Implementation
Implement a simplified efficiency rating using success rate:
```python
def calculate_success_rate(plays):
    """
    Success rate based on down and yards needed.
    1st down: 45% of needed yards
    2nd down: 60% of needed yards
    3rd/4th down: 100% of needed yards
    """
    # Your implementation here
```
Tasks:
a) Calculate success rate for a team with these play results:
- 1st & 10: 5 yards (success?)
- 2nd & 5: 2 yards (success?)
- 3rd & 3: 4 yards (success?)
- 1st & 10: 3 yards (success?)
- 2nd & 7: 8 yards (success?)
b) What is the overall success rate?
c) Why might success rate differ from EPA?
Exercise 12: Handling Team Changes
The Broncos sign a Pro Bowl quarterback in free agency. Their current Elo is 1380.
Tasks:
a) What manual adjustment might you apply (justify your choice)?
b) How many games would it take for Elo to naturally correct if no adjustment is made?
c) What are the risks of manual adjustments?
d) Design an approach using rookie contract data or cap investment instead of manual adjustments
Exercise 13: Cross-Season Evaluation
You want to evaluate your Elo system across 5 seasons.
Tasks:
a) Design a proper evaluation methodology (training/test splits)
b) How should you handle season boundaries?
c) What metrics should you track across seasons?
d) How would you determine if performance is declining over time?
Exercise 14: Ensemble Optimization
You have three rating systems with these historical performances:
| System | SU Accuracy | Brier Score | Spread MAE | Correlation |
|---|---|---|---|---|
| Elo | 60.2% | 0.218 | 10.8 | 0.65 |
| SRS | 59.8% | 0.221 | 10.5 | 0.68 |
| Efficiency | 61.0% | 0.215 | 11.2 | 0.70 |
The Correlation column measures how strongly each system's prediction errors are correlated with the errors of the other systems.
Tasks:
a) Which single system would you choose for spread predictions?
b) How might lower correlation improve ensemble performance?
c) Design an optimal weighting scheme
d) Why might the ensemble beat all individual systems?
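A standard result helps with parts (b) and (d). As a simplifying assumption, suppose the $n$ systems have errors $e_i$ with equal variance $\sigma^2$ and a common pairwise correlation $\rho$ (neither holds exactly for the table above). The variance of the equally weighted ensemble error is then:

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\,\sigma^2
  = \sigma^2\left(\rho + \frac{1-\rho}{n}\right)
```

When $\rho = 1$ the ensemble is no better than a single system, while smaller $\rho$ shrinks the error variance toward $\sigma^2 / n$.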
Exercise 15: Schedule Effects
Team A plays this schedule:
- Weeks 1-4: Four home games against weak opponents
- Weeks 5-8: Four road games against strong opponents

Tasks:
a) How would Elo ratings evolve through this schedule?
b) Would SRS ratings differ? Why?
c) When during the season would each system most accurately rate Team A?
d) How could you modify Elo to be less sensitive to schedule front-loading?
Exercise 16: Variance and Uncertainty
Your model predicts the Ravens to beat the Browns by 7 points.
Tasks:
a) Using σ = 13.5, what is the 90% prediction interval for the actual margin?
b) What is the probability the Browns win outright?
c) What is the probability the Ravens cover a -7 spread?
d) What is the probability that either team wins by 14 or more points?
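These questions can be explored with Python's standard-library `statistics.NormalDist`, assuming the final margin (Ravens minus Browns) is normally distributed around the predicted 7 points with σ = 13.5. The sketch treats the margin as continuous, so it ignores ties and pushes; the variable names are illustrative.

```python
from statistics import NormalDist

# Model the final margin (Ravens minus Browns) as Normal(mean=7, sd=13.5).
margin = NormalDist(mu=7.0, sigma=13.5)

low, high = margin.inv_cdf(0.05), margin.inv_cdf(0.95)    # central 90% interval
p_browns_win = margin.cdf(0.0)                            # margin below zero
p_ravens_cover = 1.0 - margin.cdf(7.0)                    # margin above 7
p_blowout = margin.cdf(-14.0) + (1.0 - margin.cdf(14.0))  # |margin| of 14 or more

print(f"90% interval: ({low:.1f}, {high:.1f})")
print(f"P(Browns win): {p_browns_win:.3f}")
print(f"P(Ravens cover -7): {p_ravens_cover:.3f}")
print(f"P(14+ point win by either team): {p_blowout:.3f}")
```

`NormalDist` requires Python 3.8 or later. Real NFL margins are discrete and cluster on key numbers such as 3 and 7, so these values are approximations.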
Exercise 17: Historical K-Factor Analysis
Using historical data (simulated or real), test K-factors from 15 to 50 in increments of 5.
Tasks:
a) For each K-factor, calculate straight-up accuracy on held-out games
b) Plot K-factor vs accuracy
c) What is the optimal K-factor? Does it differ by season?
d) Is there a tradeoff between early-season and late-season accuracy?
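A K-factor sweep on simulated data might be structured like the sketch below. Everything about the simulation is an assumption made for illustration: 16 teams with fixed true ratings, random neutral-site matchups, and a 500-game warm-up before predictions are scored. Each prediction is made before the rating update for that game, so the scored games are genuinely held out.

```python
import random


def expected_score(rating_a, rating_b):
    """Expected score for team A under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_games(n_teams=16, n_games=2000, seed=0):
    """Hypothetical data: fixed true ratings, random neutral-site matchups."""
    rng = random.Random(seed)
    true_ratings = [rng.gauss(1500.0, 100.0) for _ in range(n_teams)]
    games = []
    for _ in range(n_games):
        a, b = rng.sample(range(n_teams), 2)
        a_wins = rng.random() < expected_score(true_ratings[a], true_ratings[b])
        games.append((a, b, a_wins))
    return games


def held_out_accuracy(games, k, n_teams=16, warmup=500):
    """Update on every game, but score predictions only after the warm-up."""
    ratings = [1500.0] * n_teams
    correct = scored = 0
    for index, (a, b, a_wins) in enumerate(games):
        p_a = expected_score(ratings[a], ratings[b])  # predicted before updating
        if index >= warmup:
            scored += 1
            correct += (p_a >= 0.5) == a_wins
        delta = k * ((1.0 if a_wins else 0.0) - p_a)
        ratings[a] += delta
        ratings[b] -= delta
    return correct / scored


games = simulate_games()
accuracy_by_k = {k: held_out_accuracy(games, k) for k in range(15, 55, 5)}
for k, acc in accuracy_by_k.items():
    print(k, round(acc, 3))
```

Because the simulated team strengths never change, this setup tends to favor smaller K values than real NFL data would. Adding drift to `true_ratings` or season resets makes the comparison in parts (c) and (d) more realistic.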
Exercise 18: Rating Stability Analysis
Track the Chiefs' Elo rating through a season where they:
- Start 6-0
- Go 1-3 in weeks 7-10
- Finish 4-0

Tasks:
a) With K=28, what is the approximate rating trajectory?
b) Do the ratings reflect the mid-season slump appropriately?
c) Where is the rating at season end vs. where it probably should be?
d) How might you modify the system to better handle hot/cold streaks?
Exercise 19: Market Comparison
Your model produces these lines vs the market for Week 10:
| Game | Your Spread | Market | Difference |
|---|---|---|---|
| A @ B | -3.5 | -6.0 | +2.5 |
| C @ D | +1.0 | -1.5 | +2.5 |
| E @ F | -7.0 | -7.0 | 0 |
| G @ H | +4.5 | +3.0 | +1.5 |
Tasks:
a) Which games offer the most "value" (biggest model-market difference)?
b) If the market is efficient, what does disagreement imply?
c) How would you evaluate whether your model or the market was right over time?
d) At what threshold of disagreement should you consider betting?
Exercise 20: Complete Rating System
Build a complete NFL Elo system with these features:
- Margin-adjusted updates
- Configurable home advantage
- Season-to-season regression
- Calibration to spreads
- Evaluation metrics
Tasks:
a) Implement the full system as a Python class
b) Process 3 seasons of simulated games
c) Evaluate straight-up accuracy, Brier score, and MAE
d) Generate end-of-season power rankings
e) Compare to a simple win percentage ranking
Programming Challenges
Challenge A: Elo Visualization
Create a visualization dashboard showing:
- Team rating trajectories over time
- League rating distribution
- Game prediction accuracy by week
- Calibration plot
Challenge B: Rating System Tournament
Implement Elo, SRS, and a simple efficiency system. Run a "tournament" where each system predicts a test set of games. Compare:
- Overall accuracy
- Performance by spread size (close games vs blowouts)
- Early vs late season performance
- Home vs away game prediction
Challenge C: Adaptive K-Factor
Implement a K-factor that varies based on:
- How many games the team has played
- How confident we are in the rating (recent variance)
- Whether the team has had major changes
Evaluate whether adaptive K outperforms fixed K.
Challenge D: NFL vs College Football Elo
Adapt your NFL Elo system for college football. Consider:
- 130+ teams instead of 32
- Teams don't all play each other
- Much larger talent disparities
- Fewer games per team each season (roughly 12 regular-season games versus 17 in the NFL), but many more games league-wide
What parameter changes are necessary?