Case Study: Tracking the 2023 49ers Through Elo

How rating systems capture team trajectories and what they miss


Introduction

The 2023 San Francisco 49ers entered the season with championship expectations after falling one game short of the Super Bowl the previous year. By season's end they had reached the Super Bowl, only to lose to the Kansas City Chiefs in overtime. This case study tracks the 49ers through an Elo rating system, examining what the ratings captured, when they adjusted appropriately, and where they struggled to reflect reality.


Initial Setup

Starting Ratings

The 49ers finished 2022 with one of the highest Elo ratings in the league:
- End of 2022: 1678 Elo
- After 1/3 regression: 1619 Elo
- Implied spread vs. an average team: -4.8 points

This starting point reflected the 49ers' 13-4 regular season record and NFC Championship appearance.
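The starting figures are consistent with regressing each team one-third of the way toward a league mean of 1500 and converting rating gaps to points at 25 Elo per point. Neither constant is stated above; both are inferred from the numbers, so treat this as a sketch under those assumptions:

```python
LEAGUE_MEAN = 1500   # assumed league-average rating (inferred, not stated)
ELO_PER_POINT = 25   # assumed Elo points per point of spread (inferred)


def regress_to_mean(rating: float, fraction: float = 1 / 3,
                    mean: float = LEAGUE_MEAN) -> float:
    """Move a rating `fraction` of the way back toward the league mean."""
    return rating - fraction * (rating - mean)


def elo_gap_to_spread(elo_gap: float) -> float:
    """Convert an Elo gap to a point spread (negative means favored)."""
    return -elo_gap / ELO_PER_POINT


sf_start = regress_to_mean(1678)    # 49ers' end-of-2022 rating
pit_start = regress_to_mean(1470)   # Steelers' end-of-2022 rating
print(round(sf_start), round(pit_start))                           # 1619 1480
print(round(elo_gap_to_spread(round(sf_start) - LEAGUE_MEAN), 1))  # -4.8
```

The same conversion reproduces the Week 1 line: a 139-point gap plus 48 points of home field is 187 Elo, or about 7.5 points.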

System Parameters

For this analysis, we use:
- K-factor: 28
- Home advantage: 48 Elo (~1.9 points at the 25-Elo-per-point conversion used for spreads here)
- Margin multiplier: FiveThirtyEight-style logarithmic
- Margin cap: 24 points
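One way to express these parameters in code, assuming the standard logistic expectation on a 400-point scale and the multiplier FiveThirtyEight published for its NFL model, ln(|margin| + 1) × 2.2 / (0.001 × winner's Elo edge + 2.2), with the margin capped at 24. Including home field in the winner's edge is an assumption, and `elo_update` is a hypothetical standalone function rather than any library's API:

```python
import math

K_FACTOR = 28
HOME_ADVANTAGE = 48
MARGIN_CAP = 24


def expected_score(rating_diff: float) -> float:
    """Win probability for a side with an Elo edge of rating_diff."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


def margin_multiplier(margin: int, winner_elo_edge: float) -> float:
    """Log margin multiplier, damped when the favorite wins big."""
    capped = min(abs(margin), MARGIN_CAP)
    return math.log(capped + 1) * 2.2 / (winner_elo_edge * 0.001 + 2.2)


def elo_update(home_rating: float, away_rating: float,
               home_score: int, away_score: int,
               neutral: bool = False) -> float:
    """Return the home team's rating change (the away team gets the negative)."""
    hfa = 0 if neutral else HOME_ADVANTAGE
    diff = home_rating + hfa - away_rating      # home edge including home field
    expected = expected_score(diff)
    margin = home_score - away_score
    actual = 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    # Winner's edge includes home field (an assumption of this sketch)
    winner_edge = diff if margin >= 0 else -diff
    return K_FACTOR * margin_multiplier(margin, winner_edge) * (actual - expected)


# Divisional round: 49ers (1728) host Packers (1560) and win 24-21
print(round(elo_update(1728, 1560, 24, 21), 1))   # 7.9
```

The result, about +7.9, matches the +8 reported for the Divisional round. Other weeks will not necessarily reproduce the figures quoted in this case study exactly.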


Week-by-Week Analysis

Week 1: vs Steelers (Home)

Pre-game:
- 49ers: 1619
- Steelers: 1480 (after regression from 1470)
- Expected: 49ers 72% favorite, -7.5 spread

Result: 49ers 30, Steelers 7 (Margin: +23)

Post-game:
- Margin multiplier: 2.8 (large margin, damped because the 49ers were the expected winner)
- Rating change: +22 Elo
- New rating: 1641

Analysis: The blowout boosted the 49ers' rating, but because they were heavy favorites, the multiplier damped the gain. The system appropriately credited a dominant performance without overreacting.
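The dampening can be seen by evaluating the multiplier for the same 23-point margin with the winner's Elo edge flipped. This sketch assumes the FiveThirtyEight-style formula and that the 187-point edge (139 rating gap plus 48 home field) is what enters it; under those assumptions it gives about 2.9 for Week 1, close to the 2.8 quoted above:

```python
import math


def margin_multiplier(margin: int, winner_elo_edge: float, cap: int = 24) -> float:
    """FiveThirtyEight-style log multiplier with an autocorrelation damper."""
    return math.log(min(abs(margin), cap) + 1) * 2.2 / (winner_elo_edge * 0.001 + 2.2)


# Same 23-point margin; only the winner's pre-game Elo edge changes
favorite = margin_multiplier(23, winner_elo_edge=187)    # Week 1 setup (assumed edge)
underdog = margin_multiplier(23, winner_elo_edge=-187)   # mirror image: underdog wins
print(f"{favorite:.2f} vs {underdog:.2f}")               # 2.93 vs 3.47
```

The damper shrinks the multiplier when the favorite wins big, which limits the rating inflation that would otherwise come from strong teams routinely beating weak ones.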

Week 2: @ Rams (Road)

Pre-game:
- 49ers: 1641
- Rams: 1510
- Expected: 49ers 65% favorite, -4.7 spread

Result: 49ers 30, Rams 23 (Margin: +7)

Post-game:
- Margin multiplier: 2.0
- Rating change: +11 Elo
- New rating: 1652

Analysis: A solid road win against a division rival. The rating continued climbing, but moderately—the 7-point margin against a decent opponent warranted adjustment without overreaction.

Week 3: vs Giants (Home)

Pre-game:
- 49ers: 1652
- Giants: 1435
- Expected: 49ers 81% favorite, -10.5 spread

Result: 49ers 30, Giants 12 (Margin: +18)

Post-game:
- Margin multiplier: 2.5
- Rating change: +15 Elo
- New rating: 1667

Analysis: Another comfortable win. The 49ers were establishing themselves as an elite team, and the ratings reflected this through steady gains.

Week 4: @ Cardinals (Road)

Pre-game:
- 49ers: 1667
- Cardinals: 1395
- Expected: 49ers 80% favorite, -8.4 spread

Result: 49ers 35, Cardinals 16 (Margin: +19)

Post-game:
- Rating change: +14 Elo
- New rating: 1681

Analysis: The 49ers continued dominating, reaching their highest rating of the season after four games. Through Week 4, the Elo system accurately captured their trajectory: an elite team living up to expectations.


The Mid-Season Test

Week 5: @ Cowboys (Road)

This was the first true test—two elite teams meeting on the road.

Pre-game:
- 49ers: 1681
- Cowboys: 1635
- Expected: 49ers 56% favorite, -1.8 spread

Result: Cowboys 42, 49ers 10 (Margin: -32)

Post-game:
- Margin multiplier: 4.2 (blowout win by the underdog)
- Rating change: -72 Elo
- New rating: 1609

Analysis: This is where margin-adjusted Elo shows both strength and weakness.

The Good: The massive drop reflected a truly surprising result. An elite team losing by 32 points to another contender signaled either a significant weakness or an outlier performance.

The Concern: One game, regardless of margin, rarely reflects a true change in team quality. The 49ers went from 1681 to 1609, a 72-point swing, on the strength of a single game in which several things went wrong at once (Brock Purdy's early struggles, defensive breakdowns, special teams miscues).

Week 6: @ Browns (Road)

Pre-game:
- 49ers: 1609
- Browns: 1545
- Expected: 49ers 59% favorite, -2.5 spread

Result: Browns 31, 49ers 17 (Margin: -14)

Post-game:
- Rating change: -35 Elo
- New rating: 1574

Analysis: Back-to-back losses dropped the 49ers from 1681 to 1574—over 100 Elo points in two weeks. The rating system was doing what it was designed to do: respond to results. But was this accurate?


The Bounce Back

Weeks 7-10: Recovery

Week  Opponent      Result  Margin  Rating Change  New Rating
7     Vikings (H)   W       +8      +18            1592
8     Bengals (H)   W       +14     +25            1617
9     BYE           -       -       -              1617
10    Jaguars (A)   W       +12     +23            1640

By Week 10, the 49ers had recovered most of their lost rating. The system's self-correction was working—wins rebuilt what losses had taken away.

The Rating "Memory"

An important observation: after Week 10, their first game back from the bye, the 49ers' rating (1640) was still slightly below where it stood after Week 2 (1652). The team had gone 6-2 over its first 8 games, yet the Cowboys blowout still weighed heavily on its trajectory.

This illustrates how margin adjustments let a single lopsided result dominate: the 32-point loss cost 72 Elo points, more than the wins by 18 to 23 points in Weeks 1, 3, and 4 gained combined (+22, +15, +14 = +51).


Late Season Form

Weeks 11-18: Championship Push

Week  Opponent        Result  Margin  Rating Change  New Rating
11    @ Buccaneers    W       +14     +18            1658
12    @ Seahawks      W       +12     +16            1674
13    @ Eagles        W       +10     +15            1689
14    Seahawks        W       +14     +13            1702
15    Cardinals       W       +28     +16            1718
16    @ Ravens        L       -2      -8             1710
17    @ Commanders    W       +12     +14            1724
18    Rams            W       +1      +4             1728

Regular-season peak: 1728 after Week 18 (second-highest in the league, behind the Chiefs)

Key Insight: The Week 16 loss to the Ravens cost only 8 Elo points because:
1. The Ravens were highly rated (1705).
2. The margin was only 2 points.
3. It was a road game, so home field lowered the 49ers' expected score.

The system correctly identified this as a reasonable loss against an elite opponent.


Playoff Performance

Divisional Round: vs Packers (Home)

Pre-game:
- 49ers: 1728
- Packers: 1560
- Expected: 49ers 77% favorite

Result: 49ers 24, Packers 21 (Margin: +3)

Post-game:
- Rating change: +8 Elo
- New rating: 1736

Analysis: The narrow win against an underdog Packers team was a yellow flag. The rating system gave minimal credit for barely surviving.

NFC Championship: vs Lions (Home)

Pre-game:
- 49ers: 1736
- Lions: 1642
- Expected: 49ers 68% favorite

Result: 49ers 34, Lions 31 (Margin: +3)

Post-game:
- Rating change: +6 Elo
- New rating: 1742

Analysis: Another close game against a team the 49ers were expected to beat comfortably. The ratings inched up, but less than if they'd won convincingly.

Super Bowl LVIII: vs Chiefs (Neutral)

Pre-game:
- 49ers: 1742
- Chiefs: 1758
- Expected: 49ers slight underdogs (48%)

Result: Chiefs 25, 49ers 22 OT (Margin: -3)

Post-game:
- Rating change: -5 Elo
- Final rating: 1737

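Analysis: The Super Bowl loss barely affected the rating because:
1. The 49ers were slight underdogs, so a loss was close to the expected result.
2. The 3-point margin kept the multiplier small.

The overtime finish itself did not enter the calculation; the update sees only the final score.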

The system correctly identified this as a coin-flip game that could have gone either way.
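Neutral-site games such as the Super Bowl simply drop the home-field term. A minimal sketch, assuming the logistic 400-point scale and the 48-point home advantage from the system parameters; `win_probability` and its `venue` argument are hypothetical names:

```python
def win_probability(rating: float, opp_rating: float,
                    home_advantage: float = 48, venue: str = "home") -> float:
    """Elo win probability; venue is 'home', 'away', or 'neutral'."""
    hfa = {"home": home_advantage, "away": -home_advantage, "neutral": 0}[venue]
    return 1 / (1 + 10 ** (-(rating + hfa - opp_rating) / 400))


# Super Bowl LVIII: 49ers (1742) vs Chiefs (1758) at a neutral site
sb = win_probability(1742, 1758, venue="neutral")
print(f"{sb:.1%}")   # 47.7%
```

A 16-point Elo gap with no home field is worth only about two percentage points of win probability, which is why the matchup reads as a near coin flip.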


Season Summary

Rating Trajectory

Start:     1619 (after regression)
Peak:      1742 (after NFC Championship)
Valley:    1574 (Week 6)
Final:     1737 (post-Super Bowl)
Range:     168 Elo points
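The summary figures can be recomputed directly from the post-game ratings reported in the week-by-week sections. This is a plain bookkeeping sketch; the labels are ours:

```python
# Post-game 49ers ratings as reported above (Week 9 was a bye)
ratings = {
    "Start": 1619, "Wk 1": 1641, "Wk 2": 1652, "Wk 3": 1667, "Wk 4": 1681,
    "Wk 5": 1609, "Wk 6": 1574, "Wk 7": 1592, "Wk 8": 1617, "Wk 10": 1640,
    "Wk 11": 1658, "Wk 12": 1674, "Wk 13": 1689, "Wk 14": 1702,
    "Wk 15": 1718, "Wk 16": 1710, "Wk 17": 1724, "Wk 18": 1728,
    "Divisional": 1736, "NFC Champ": 1742, "Super Bowl": 1737,
}

peak = max(ratings, key=ratings.get)
valley = min(ratings, key=ratings.get)
print(peak, ratings[peak])                 # NFC Champ 1742
print(valley, ratings[valley])             # Wk 6 1574
print(ratings[peak] - ratings[valley])     # 168
```

Because the rating kept rising through the first two playoff games, the season peak falls after the NFC Championship rather than after Week 18.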

What the Ratings Captured Well

1. Overall Team Quality: Final rating of 1737 correctly identified the 49ers as an elite team—one of the top 2-3 in the league.

2. Blowout Significance: The Cowboys loss appropriately triggered concern. While the 72-point drop was dramatic, a 32-point loss to a playoff-caliber opponent usually signals real issues to address.

3. Consistent Performance Value: The late-season winning streak steadily rebuilt rating, reflecting sustained excellent play.

4. Close Game Evaluation: Narrow playoff wins against good opponents generated appropriate skepticism through minimal rating gains.

What the Ratings Missed

1. Injury Impact: The rating system had no mechanism to account for the Christian McCaffrey injury or various offensive line issues that affected performance during the mid-season slump.

2. True Talent Stability: The 49ers' "true talent" probably didn't drop 107 Elo points in Weeks 5-6, then recover completely. The volatility exceeded actual team quality changes.

3. Playoff Context: The ratings treated playoff games equally to regular season games. Some analysts argue playoff performance deserves different weighting due to higher competition intensity.

4. Scheme and Matchup Factors: The Cowboys exposed specific 49ers vulnerabilities. Elo could not distinguish a particularly bad matchup from a general team weakness.


Lessons for Rating System Design

1. Margin Multiplier Calibration

The 72-point swing after the Cowboys loss may have been excessive. Consider:
- Lower margin caps for upset losses
- Heavier regression after extreme results
- Context-aware multipliers
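A lower cap for upset losses could look like the following. This sketch assumes the FiveThirtyEight-style multiplier from the parameters, and the upset cap of 14 points is a hypothetical value chosen for illustration, not a calibrated one:

```python
import math


def capped_multiplier(margin: int, winner_elo_edge: float,
                      normal_cap: int = 24, upset_cap: int = 14) -> float:
    """Log margin multiplier with a tighter cap when the favorite loses.

    upset_cap=14 is a hypothetical illustration value.
    """
    # A negative edge for the winner means the underdog won (an upset)
    cap = upset_cap if winner_elo_edge < 0 else normal_cap
    damper = 2.2 / (winner_elo_edge * 0.001 + 2.2)
    return math.log(min(abs(margin), cap) + 1) * damper


# A 32-point upset by a team rated 46 Elo points lower,
# the gap the Week 5 preview used
print(round(capped_multiplier(32, -46), 2))                # 2.77
print(round(capped_multiplier(32, -46, upset_cap=24), 2))  # 3.29
```

Only upsets are affected: a favorite's blowout win is still capped at 24 points, so the change targets the case where one lopsided loss drags down a strong team.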

2. Sample Size Awareness

Elo updated fully after each game, but early-season games should perhaps carry less weight:
- Use a lower effective K early in the season
- Blend more heavily with prior-year ratings
- Increase uncertainty estimates for early predictions
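A lower early-season K can be implemented as a linear ramp up to the full K-factor of 28. The 60% starting fraction and six-week ramp below are hypothetical tuning values:

```python
def effective_k(week: int, base_k: float = 28, floor: float = 0.6,
                ramp_weeks: int = 6) -> float:
    """Scale K linearly from floor * base_k in Week 1 to base_k by ramp_weeks.

    floor=0.6 and ramp_weeks=6 are hypothetical, uncalibrated values.
    """
    if week >= ramp_weeks:
        return base_k
    fraction = floor + (1 - floor) * (week - 1) / (ramp_weeks - 1)
    return base_k * fraction


for week in (1, 3, 5, 6, 10):
    print(week, round(effective_k(week), 1))
```

Under this schedule a Week 5 result would move ratings by about 92% of what the same result would in Week 10; the ramp length and floor would need to be fit on historical seasons.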

3. External Factors

Pure Elo ignores injuries, weather, rest, and other factors that affect single-game performance. Consider:
- Injury adjustments to ratings
- Rest day modifiers
- Travel/timezone factors
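These adjustments would typically be applied to the pre-game rating used for a prediction, leaving the results-based rating itself untouched. In the sketch below, the `GameContext` fields and every per-unit value are hypothetical placeholders, not estimates:

```python
from dataclasses import dataclass


@dataclass
class GameContext:
    """Pre-game context; all fields are hypothetical adjustment inputs."""
    injury_elo: float = 0.0      # Elo lost to absent players (estimated elsewhere)
    rest_days_edge: int = 0      # own rest days minus opponent's rest days
    timezones_crossed: int = 0   # time zones crossed by this team to reach the game


# Hypothetical per-unit adjustments, for illustration only
ELO_PER_REST_DAY = 5.0
ELO_PER_TIMEZONE = 4.0


def adjusted_rating(base: float, ctx: GameContext) -> float:
    """Apply injury, rest, and travel modifiers to a pure-results rating."""
    return (base
            - ctx.injury_elo
            + ELO_PER_REST_DAY * ctx.rest_days_edge
            - ELO_PER_TIMEZONE * ctx.timezones_crossed)


# Week 6 trip to Cleveland with a hypothetical 20-point injury deduction
print(adjusted_rating(1609, GameContext(injury_elo=20, timezones_crossed=3)))  # 1577.0
```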

4. Calibration Against Market

Throughout the season, the 49ers' Elo-implied lines differed from market lines:
- After Week 6: Elo had the 49ers at -2.5 against their next opponent; the market had -5.5
- Before Week 15: Elo had the 49ers at -8.0 against the Cardinals; the market had -10.5

The market, drawing on more information sources, often differed from pure-result Elo. This suggests:
- Elo is a component, not a complete system
- Blending with market information likely improves accuracy
- Pure result-based ratings have inherent limitations
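The simplest form of blending is a weighted average of the two spreads. The 30% Elo weight here is a hypothetical choice; in practice it would be fit on historical games, for example by minimizing squared error against final margins:

```python
def blend_spread(elo_spread: float, market_spread: float,
                 elo_weight: float = 0.3) -> float:
    """Weighted average of an Elo spread and a market spread.

    elo_weight=0.3 is a hypothetical value, not a fitted one.
    """
    if not 0.0 <= elo_weight <= 1.0:
        raise ValueError("elo_weight must be between 0 and 1")
    return elo_weight * elo_spread + (1 - elo_weight) * market_spread


# After Week 6: Elo said -2.5, the market said -5.5
print(round(blend_spread(-2.5, -5.5), 2))   # -4.6
```

Setting `elo_weight` to 0 or 1 recovers the pure market or pure Elo line, which makes it easy to check whether blending actually improves on either input.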


The Complete Code

```python
import pandas as pd

# Assumes a MarginAdjustedElo class is in scope that exposes a `ratings`
# dict plus update(home, away, home_score, away_score) and get_rating(team).


def track_49ers_2023_season():
    """
    Track the 49ers through the 2023 season with margin-adjusted Elo.

    Returns a DataFrame with one row per 49ers game.
    """
    elo = MarginAdjustedElo(k_factor=28, home_advantage=48)

    # Seed SF with its post-regression rating and each opponent with the
    # pre-game rating quoted in the week-by-week analysis
    initial_ratings = {'SF': 1619, 'PIT': 1480, 'LAR': 1510,
                       'NYG': 1435, 'ARI': 1395, 'DAL': 1635}
    for team, rating in initial_ratings.items():
        elo.ratings[team] = rating

    games = [
        # Regular season
        {'week': 1, 'home': 'SF', 'away': 'PIT', 'home_score': 30, 'away_score': 7},
        {'week': 2, 'home': 'LAR', 'away': 'SF', 'home_score': 23, 'away_score': 30},
        {'week': 3, 'home': 'SF', 'away': 'NYG', 'home_score': 30, 'away_score': 12},
        {'week': 4, 'home': 'ARI', 'away': 'SF', 'home_score': 16, 'away_score': 35},
        {'week': 5, 'home': 'DAL', 'away': 'SF', 'home_score': 42, 'away_score': 10},
        # ... rest of season
    ]

    trajectory = []
    for game in games:
        sf_home = game['home'] == 'SF'
        pre_rating = elo.get_rating('SF')

        # update() always takes the home team first, so the same call
        # works whether SF is home or away
        elo.update(game['home'], game['away'],
                   game['home_score'], game['away_score'])
        post_rating = elo.get_rating('SF')

        sf_score = game['home_score'] if sf_home else game['away_score']
        opp_score = game['away_score'] if sf_home else game['home_score']

        trajectory.append({
            'week': game['week'],
            'opponent': game['away'] if sf_home else game['home'],
            'result': 'W' if sf_score > opp_score else 'L' if sf_score < opp_score else 'T',
            'pre_rating': pre_rating,
            'post_rating': post_rating,
            'change': post_rating - pre_rating,
        })

    return pd.DataFrame(trajectory)
```

Conclusion

The 2023 49ers season demonstrates both the power and limitations of Elo ratings. The system accurately identified the 49ers as an elite team, appropriately credited consistent winning, and correctly assessed the Super Bowl as a coin-flip game.

However, the mid-season volatility (107 Elo points lost, then recovered) likely exceeded the actual change in team quality. It stemmed from margin-adjusted updates reacting strongly to the Cowboys blowout and the Browns loss that followed.

The key insight: Elo provides a useful signal but works best as one component of a broader analysis framework. Blending Elo with efficiency metrics, injury information, and market data creates a more complete picture than any single system alone.

For practitioners, the 49ers case suggests:
1. Trust the directional signal (the elite team remained elite)
2. Question extreme swings (the 107-point drop was overdone)
3. Context matters (the Cowboys matchup was particularly unfavorable)
4. Close games tell a story (narrow playoff wins suggested vulnerability)

Rating systems are tools for understanding team quality, not crystal balls for predicting outcomes. The 49ers entered the Super Bowl as the second-best team in football by Elo and lost by 3 points in overtime to the best. Sometimes that's exactly how it works out.