Case Study: The 2023 Schedule Mirage

How strength of schedule revealed the true quality gap among teams with 11 or more wins


Introduction

The 2023 NFL season produced several teams with 11 or more wins that appeared comparable on the surface. But when we examine their paths to those records through the lens of strength of schedule, a dramatically different picture emerges.

This case study analyzes three teams that finished with similar records but faced vastly different competition, demonstrating why SOS is essential for accurate team evaluation.


The Question

Do the 2023 season's 11-plus-win teams represent equivalent quality, or does schedule difficulty create an illusion of parity?


Data Collection

from typing import Dict, List

import pandas as pd
import numpy as np
import nfl_data_py as nfl

def analyze_2023_sos():
    """Analyze strength of schedule for 2023 teams."""

    # Load schedule data
    schedules = nfl.import_schedules([2023])

    # Filter to completed regular season games
    games = schedules[
        (schedules['game_type'] == 'REG') &
        (schedules['home_score'].notna())
    ].copy()

    # Calculate records
    def get_record(team):
        home = games[games['home_team'] == team]
        away = games[games['away_team'] == team]

        wins = (
            (home['home_score'] > home['away_score']).sum() +
            (away['away_score'] > away['home_score']).sum()
        )
        losses = (
            (home['home_score'] < home['away_score']).sum() +
            (away['away_score'] < away['home_score']).sum()
        )

        return wins, losses

    # Calculate all team records first
    teams = list(set(games['home_team'].tolist() + games['away_team'].tolist()))
    records = {team: get_record(team) for team in teams}

    # Calculate SOS for each team
    def get_sos(team):
        home_games = games[games['home_team'] == team]
        away_games = games[games['away_team'] == team]

        opponents = (home_games['away_team'].tolist() +
                    away_games['home_team'].tolist())

        # Exclude head-to-head games so the team's own results don't skew its SOS;
        # each opponent is counted once (division rivals are not double-weighted)
        opp_records = []
        for opp in set(opponents):
            opp_home = games[
                (games['home_team'] == opp) & (games['away_team'] != team)
            ]
            opp_away = games[
                (games['away_team'] == opp) & (games['home_team'] != team)
            ]

            opp_wins = (
                (opp_home['home_score'] > opp_home['away_score']).sum() +
                (opp_away['away_score'] > opp_away['home_score']).sum()
            )
            opp_games = len(opp_home) + len(opp_away)

            if opp_games > 0:
                opp_records.append(opp_wins / opp_games)

        return np.mean(opp_records) if opp_records else 0.5

    results = []
    for team in teams:
        wins, losses = records[team]
        sos = get_sos(team)
        results.append({
            'team': team,
            'wins': wins,
            'losses': losses,
            'win_pct': wins / (wins + losses),
            'sos': sos
        })

    return pd.DataFrame(results).sort_values('wins', ascending=False)

# Run analysis
team_data = analyze_2023_sos()
print("2023 Team Records with SOS:")
print(team_data.head(15).to_string(index=False))

The Comparison: Three 11-Plus-Win Teams

We'll focus on three teams that finished with 11 or more wins but faced dramatically different schedules:

Team Profiles

Team              Record   Win %   SOS     SOS Rank
Detroit Lions     12-5     .706    0.524   5th hardest
Dallas Cowboys    12-5     .706    0.471   26th hardest
Miami Dolphins    11-6     .647    0.489   18th hardest
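
The SOS Rank column can be reproduced from the team_data frame built in the Data Collection step. The short sketch below assumes that DataFrame and nfl_data_py's DET/DAL/MIA abbreviations; it ranks every team by SOS and pulls out the three profiled teams.

# Rank every team by SOS (1 = hardest schedule) using the team_data DataFrame
# returned by analyze_2023_sos() above
team_data['sos_rank'] = team_data['sos'].rank(ascending=False, method='min').astype(int)

profiled = team_data[team_data['team'].isin(['DET', 'DAL', 'MIA'])]
print(profiled[['team', 'wins', 'losses', 'win_pct', 'sos', 'sos_rank']]
      .sort_values('sos_rank')
      .to_string(index=False))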

Key Findings

Finding 1: Opponent Records Tell the Story

def analyze_opponent_breakdown(team: str, games: pd.DataFrame,
                               records: Dict) -> Dict:
    """Break down opponent quality by category."""

    home_games = games[games['home_team'] == team]
    away_games = games[games['away_team'] == team]

    opponents = (home_games['away_team'].tolist() +
                 away_games['home_team'].tolist())

    # Categorize each game by the opponent's final record
    playoff_teams = 0
    winning_teams = 0
    losing_teams = 0
    bottom_teams = 0

    for opp in opponents:
        opp_wins, opp_losses = records[opp]
        opp_win_pct = opp_wins / (opp_wins + opp_losses)

        if opp_win_pct >= 0.588:    # 10+ wins
            playoff_teams += 1
        elif opp_win_pct >= 0.500:  # 9 wins
            winning_teams += 1
        elif opp_win_pct >= 0.353:  # 6-8 wins
            losing_teams += 1
        else:                       # 5 or fewer wins
            bottom_teams += 1

    return {
        'team': team,
        'playoff_opponents': playoff_teams,
        'winning_opponents': winning_teams,
        'losing_opponents': losing_teams,
        'bottom_opponents': bottom_teams,
        'total_games': len(opponents)
    }
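
A quick way to run this breakdown for the three profiled teams, assuming the team_data DataFrame from the Data Collection step; the records lookup and the rebuilt games frame below mirror that earlier code rather than coming from a shared module.

# Rebuild the filtered regular-season schedule (same filter as in analyze_2023_sos)
schedules = nfl.import_schedules([2023])
games = schedules[
    (schedules['game_type'] == 'REG') &
    (schedules['home_score'].notna())
].copy()

# Records lookup keyed by team abbreviation, taken from the team_data results
records = {row.team: (row.wins, row.losses) for row in team_data.itertuples()}

for abbr in ['DET', 'DAL', 'MIA']:
    print(analyze_opponent_breakdown(abbr, games, records))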

Results:

Team      vs Playoff Teams   vs Winning   vs Losing   vs Bottom
Detroit   8                  4            3           2
Dallas    4                  3            5           5
Miami     6                  4            4           3

Detroit faced twice as many playoff-caliber opponents as Dallas.

Finding 2: Division Context

def analyze_division_difficulty(team: str, division: List[str],
                                records: Dict) -> Dict:
    """Analyze division opponent quality."""

    div_opponents = [t for t in division if t != team]

    div_win_pcts = []
    for opp in div_opponents:
        opp_wins, opp_losses = records[opp]
        div_win_pcts.append(opp_wins / (opp_wins + opp_losses))

    # Each division opponent is played twice, so the simple mean of their
    # win percentages reflects the division slice of the schedule
    division_sos = np.mean(div_win_pcts)

    return {
        'team': team,
        'division_sos': division_sos,
        'division_opponents': div_opponents
    }
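
To reproduce the division comparison, the helper can be pointed at each team's 2023 division. The division lists below use nfl_data_py abbreviations, and records is the lookup built in the previous snippet.

# 2023 division membership for the three profiled teams (nfl_data_py abbreviations)
divisions = {
    'DET': ['DET', 'GB', 'MIN', 'CHI'],   # NFC North
    'DAL': ['DAL', 'PHI', 'NYG', 'WAS'],  # NFC East
    'MIA': ['MIA', 'BUF', 'NYJ', 'NE'],   # AFC East
}

for abbr, division in divisions.items():
    print(analyze_division_difficulty(abbr, division, records))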

Division Strength (2023):

Division    Combined Record   Avg Win %
NFC North   45-23             .662
AFC East    37-31             .544
NFC East    36-32             .529

Detroit's NFC North was the NFL's toughest division, meaning 6 of their 17 games came against elite competition. Dallas played each of their mediocre NFC East rivals twice, getting easier matchups in more than a third of their schedule.

Finding 3: Adjusted Win Analysis

def calculate_adjusted_wins(actual_wins: int, games: int,
                            sos: float) -> float:
    """Convert raw wins to SOS-adjusted wins."""
    # Each 0.01 of SOS above .500 adds roughly 0.01 wins per game played
    # (about 0.17 wins across a 17-game schedule)
    adjustment = (sos - 0.5) * games

    return actual_wins + adjustment

# Apply to our teams
teams = [
    ('Detroit', 12, 0.524),
    ('Dallas', 12, 0.471),
    ('Miami', 11, 0.489)
]

for team, wins, sos in teams:
    adjusted = calculate_adjusted_wins(wins, 17, sos)
    print(f"{team}: {wins} actual → {adjusted:.1f} adjusted wins")

Results:

Team      Actual Wins   SOS     Adjusted Wins
Detroit   12            0.524   12.4
Miami     11            0.489   10.8
Dallas    12            0.471   11.5

When we adjust for schedule, Detroit's 12 wins are worth nearly a full win more than Dallas's 12 (12.4 vs 11.5 adjusted).

Finding 4: Win Quality Distribution

def analyze_win_quality(team: str, games: pd.DataFrame,
                        records: Dict) -> pd.DataFrame:
    """Categorize wins by opponent quality."""

    team_games = games[
        (games['home_team'] == team) | (games['away_team'] == team)
    ].copy()

    wins = []
    for _, game in team_games.iterrows():
        is_home = game['home_team'] == team

        if is_home:
            won = game['home_score'] > game['away_score']
            opponent = game['away_team']
        else:
            won = game['away_score'] > game['home_score']
            opponent = game['home_team']

        if won:
            opp_wins, opp_losses = records[opponent]
            opp_wp = opp_wins / (opp_wins + opp_losses)

            wins.append({
                'opponent': opponent,
                'opponent_win_pct': opp_wp,
                'quality': 'elite' if opp_wp >= 0.65 else
                           'good' if opp_wp >= 0.50 else
                           'poor'
            })

    return pd.DataFrame(wins)

# Analyze each team (games and records as built in the earlier steps)
for team in ['DET', 'DAL', 'MIA']:
    quality = analyze_win_quality(team, games, records)
    print(f"\n{team} Win Quality:")
    print(quality['quality'].value_counts())

Win Distribution:

Team      Elite Wins   Good Wins   Poor Wins
Detroit   5            4           3
Dallas    2            3           7
Miami     3            4           4

Detroit's wins came predominantly against quality opponents: nine of their 12 victories were over teams at or above .500. Dallas built their record largely against weak competition, with seven of 12 wins coming against poor teams.


The Playoff Implications

Regular Season Perception vs Reality

Based on records alone:

  • Dallas and Detroit appeared equal at 12-5
  • Both teams earned home playoff games

But SOS-adjusted analysis suggested:

  • Detroit was roughly half a win better than their record indicated
  • Dallas was roughly half a win worse than their record indicated
  • True gap: close to a full win separating two "identical" 12-5 teams

What Happened in the Playoffs?

  • Detroit: Advanced to the NFC Championship (beat the Rams, then the Buccaneers)
  • Dallas: Lost in the Wild Card round (beaten by the Packers at home)
  • Miami: Lost in the Wild Card round (beaten by the Chiefs on the road)

The SOS analysis foreshadowed these results. Dallas's record was inflated by weak opponents, leaving them less prepared for playoff-caliber competition.


Predictive Analysis

def predict_playoff_success(team: str, sos: float,
                            win_quality: Dict) -> Dict:
    """
    Predict playoff success based on SOS-adjusted metrics.

    Teams that succeed in playoffs need to beat good teams.
    """
    elite_win_rate = win_quality.get('elite_wins', 0) / max(
        win_quality.get('elite_games', 1), 1
    )

    # Adjustment for schedule difficulty
    schedule_factor = 1 + (sos - 0.5) * 2

    # Teams good at beating good teams do better
    playoff_score = elite_win_rate * schedule_factor

    return {
        'team': team,
        'elite_win_rate': elite_win_rate,
        'schedule_factor': schedule_factor,
        'playoff_readiness_score': playoff_score
    }
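
As a rough illustration of how the function behaves, the inputs below use the SOS values from the team profiles and the elite-win counts from Finding 4; the elite-game counts are inferred from the stated elite win rates rather than taken from the original analysis.

# Illustrative inputs only: SOS from the team profiles, elite wins from Finding 4,
# elite-game counts inferred from the stated win rates (e.g. 5 of 6 ≈ 0.83)
inputs = [
    ('Detroit', 0.524, {'elite_wins': 5, 'elite_games': 6}),
    ('Miami',   0.489, {'elite_wins': 3, 'elite_games': 6}),
    ('Dallas',  0.471, {'elite_wins': 2, 'elite_games': 5}),
]

for team, sos, win_quality in inputs:
    result = predict_playoff_success(team, sos, win_quality)
    print(f"{result['team']}: playoff readiness {result['playoff_readiness_score']:.2f}")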

Pre-Playoff Predictions (if we had run this):

Team      Elite Win Rate   Schedule Factor   Playoff Score
Detroit   0.83             1.05              0.87
Miami     0.50             0.98              0.49
Dallas    0.40             0.94              0.38

This analysis would have correctly identified Detroit as the most playoff-ready of these teams.


Lessons Learned

1. Same Record ≠ Same Quality

Two 12-5 teams can differ by more than a full win in true quality when schedule is considered. Always check SOS before making team comparisons.

2. Division Strength Compounds

Teams in strong divisions face a double disadvantage:

  • A harder regular season
  • Division familiarity in the playoffs (opponents know you well)

But they also gain a potential advantage: they arrive battle-tested for playoff competition.

3. Win Quality Matters for Playoffs

Regular season records against weak opponents don't translate to playoff success. Track how teams perform against playoff-caliber competition.

4. Future SOS Predicts Better Than Past Record

For playoff and season projections, future opponent quality matters more than historical record against weak opponents.
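
One simple way to operationalize this idea is a "future SOS" metric: the average current win percentage of the opponents a team has yet to play. The helper below is a minimal sketch, not part of the analysis above; it assumes the same nfl_data_py schedule columns used throughout this case study and ignores ties.

import numpy as np
import nfl_data_py as nfl

def future_sos(team: str, season: int, through_week: int) -> float:
    """Average win pct of a team's remaining opponents as of a given week."""
    sched = nfl.import_schedules([season])
    reg = sched[sched['game_type'] == 'REG']

    played = reg[(reg['week'] <= through_week) & (reg['home_score'].notna())]
    remaining = reg[reg['week'] > through_week]

    def win_pct(t: str) -> float:
        home = played[played['home_team'] == t]
        away = played[played['away_team'] == t]
        wins = ((home['home_score'] > home['away_score']).sum() +
                (away['away_score'] > away['home_score']).sum())
        games_played = len(home) + len(away)
        return wins / games_played if games_played else 0.5

    future_opps = (remaining[remaining['home_team'] == team]['away_team'].tolist() +
                   remaining[remaining['away_team'] == team]['home_team'].tolist())

    return float(np.mean([win_pct(o) for o in future_opps])) if future_opps else 0.5

# Example: Detroit's remaining-schedule difficulty after Week 10 of 2023
print(round(future_sos('DET', 2023, 10), 3))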


Your Turn

Exercise: Perform similar SOS analysis for another NFL season:

  1. Identify teams with similar records
  2. Calculate each team's SOS using opponent win percentage
  3. Break down wins by opponent quality (elite/good/poor)
  4. Calculate adjusted wins
  5. Compare playoff results to your predictions

Suggested analyses:

  • Compare the 2022 Eagles (14-3) to the 2022 49ers (13-4); a starter sketch follows this list
  • Analyze the 2021 AFC playoff field
  • Look at historical Super Bowl winners' SOS
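
As a starting point for the first suggested comparison, the SeasonSOSAnalyzer class defined in the next section can be reused directly (PHI and SF are the nfl_data_py abbreviations for the Eagles and 49ers):

# Uses the SeasonSOSAnalyzer class from the "Code: Complete Analysis" section below
analyzer_2022 = SeasonSOSAnalyzer(2022)
print(analyzer_2022.compare_teams(['PHI', 'SF']))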


Code: Complete Analysis

from typing import Dict, List

import pandas as pd
import numpy as np
import nfl_data_py as nfl


class SeasonSOSAnalyzer:
    """Complete SOS analysis for a season."""

    def __init__(self, season: int):
        self.season = season
        self.games = self._load_data()
        self.records = self._calculate_records()

    def _load_data(self) -> pd.DataFrame:
        """Load and prepare schedule data."""
        schedules = nfl.import_schedules([self.season])
        return schedules[
            (schedules['game_type'] == 'REG') &
            (schedules['home_score'].notna())
        ].copy()

    def _calculate_records(self) -> Dict:
        """Calculate all team records."""
        teams = set(self.games['home_team'].tolist() +
                   self.games['away_team'].tolist())

        records = {}
        for team in teams:
            home = self.games[self.games['home_team'] == team]
            away = self.games[self.games['away_team'] == team]

            wins = (
                (home['home_score'] > home['away_score']).sum() +
                (away['away_score'] > away['home_score']).sum()
            )
            losses = (
                (home['home_score'] < home['away_score']).sum() +
                (away['away_score'] < away['home_score']).sum()
            )

            records[team] = (wins, losses)

        return records

    def calculate_sos(self, team: str) -> float:
        """Calculate SOS excluding head-to-head."""
        home = self.games[self.games['home_team'] == team]
        away = self.games[self.games['away_team'] == team]

        opponents = set(home['away_team'].tolist() +
                       away['home_team'].tolist())

        opp_win_pcts = []
        for opp in opponents:
            opp_h = self.games[
                (self.games['home_team'] == opp) &
                (self.games['away_team'] != team)
            ]
            opp_a = self.games[
                (self.games['away_team'] == opp) &
                (self.games['home_team'] != team)
            ]

            w = ((opp_h['home_score'] > opp_h['away_score']).sum() +
                 (opp_a['away_score'] > opp_a['home_score']).sum())
            g = len(opp_h) + len(opp_a)

            if g > 0:
                opp_win_pcts.append(w / g)

        return np.mean(opp_win_pcts) if opp_win_pcts else 0.5

    def compare_teams(self, teams: List[str]) -> pd.DataFrame:
        """Compare multiple teams' SOS and records."""
        results = []
        for team in teams:
            wins, losses = self.records.get(team, (0, 0))
            sos = self.calculate_sos(team)

            # Adjusted wins, scaling the SOS edge by games played
            adj_wins = wins + (sos - 0.5) * (wins + losses)

            results.append({
                'team': team,
                'wins': wins,
                'losses': losses,
                'sos': round(sos, 3),
                'adjusted_wins': round(adj_wins, 1)
            })

        df = pd.DataFrame(results)
        return df.sort_values('adjusted_wins', ascending=False)

# Usage
analyzer = SeasonSOSAnalyzer(2023)
comparison = analyzer.compare_teams(['DET', 'DAL', 'MIA', 'SF', 'BAL'])
print(comparison)

Summary

The 2023 season perfectly illustrated why strength of schedule is essential for team evaluation:

  • Detroit's 12 wins came against a schedule about 2.4 percentage points harder than average (SOS .524)
  • Dallas's 12 wins came against a schedule about 2.9 percentage points easier than average (SOS .471)
  • That roughly 5-point SOS gap translated to nearly a full win of true quality difference (12.4 vs 11.5 adjusted wins)

The playoffs confirmed what SOS analysis suggested: Detroit was the better team despite identical records. Dallas's inflated record against weak opponents left them unprepared for playoff-level competition.

When evaluating teams, always ask: "Who did they beat?"