Case Study: Quarterback Consistency Analysis

"The most important thing for a quarterback is consistency. You don't want a guy who throws for 400 yards one week and 100 the next." — Bill Parcells

Executive Summary

In this case study, you'll analyze quarterback performance using descriptive statistics to answer a key question: Is a consistent quarterback more valuable than a boom-or-bust performer? You'll build a comprehensive comparison framework using mean, standard deviation, and distribution analysis.

Skills Applied:

- Central tendency comparison
- Variability measurement
- Distribution analysis
- Z-score standardization
- Composite scoring


Background

The Debate

Fantasy football players and NFL scouts often face this decision:

Quarterback A: Reliable 250-yard passer with occasional 300-yard games
Quarterback B: Alternates between 180-yard duds and 350-yard explosions

Both might average 265 yards per game, but they're very different players. This case study quantifies that difference.

The Data

We'll analyze two quarterbacks over a 12-game sample:

import pandas as pd
import numpy as np
from scipy import stats

# Create quarterback comparison data (values are hand-specified, so no random seed is needed)

qb_data = pd.DataFrame({
    "game": list(range(1, 13)) * 2,
    "quarterback": ["Steady Steve"] * 12 + ["Volatile Vic"] * 12,
    "passing_yards": [
        # Steady Steve: consistent performer
        255, 268, 242, 275, 251, 263, 248, 272, 259, 245, 267, 255,
        # Volatile Vic: boom-or-bust
        185, 342, 210, 318, 175, 355, 195, 328, 165, 360, 190, 335
    ],
    "touchdowns": [
        2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2,  # Steve
        0, 4, 1, 3, 0, 4, 1, 4, 0, 5, 1, 3   # Vic
    ],
    "interceptions": [
        1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,  # Steve
        2, 0, 1, 1, 2, 0, 2, 0, 3, 0, 1, 0   # Vic
    ],
    "completion_pct": [
        65, 68, 63, 70, 64, 67, 66, 69, 65, 64, 68, 66,  # Steve
        52, 72, 55, 71, 50, 75, 53, 73, 48, 76, 54, 70   # Vic
    ],
    "passer_rating": [
        88.5, 95.2, 82.4, 102.3, 86.1, 93.8, 91.2, 98.5, 89.3, 80.5, 94.1, 90.8,
        62.5, 118.5, 70.2, 105.3, 55.8, 125.2, 65.4, 115.8, 50.2, 132.5, 68.3, 108.2
    ]
})

print(qb_data.head(15))

Phase 1: Central Tendency Comparison

Calculate Basic Averages

def calculate_qb_averages(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate average statistics for each quarterback.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback game data

    Returns
    -------
    pd.DataFrame
        Average statistics per quarterback
    """
    averages = df.groupby("quarterback").agg({
        "passing_yards": "mean",
        "touchdowns": "mean",
        "interceptions": "mean",
        "completion_pct": "mean",
        "passer_rating": "mean"
    }).round(1)

    return averages

averages = calculate_qb_averages(qb_data)
print("\nQUARTERBACK AVERAGES")
print("=" * 60)
print(averages)

Expected Output:

                    passing_yards  touchdowns  interceptions  completion_pct  passer_rating
quarterback
Steady Steve               258.3         2.0            0.4            66.3           91.1
Volatile Vic               263.2         2.2            1.0            62.4           89.8

Initial Observations

At first glance:

- Volatile Vic averages slightly more yards (263 vs 258)
- Volatile Vic has slightly more TDs (2.2 vs 2.0)
- But Vic also has more than double the interceptions (1.0 vs 0.4)
- Steve has a higher completion percentage and passer rating

The averages are close, but they hide important differences.
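To make that point concrete, here is a minimal illustration with hypothetical numbers (not the case-study data): two yardage sequences share the same mean, yet their spreads differ sharply.

```python
import numpy as np

# Two hypothetical five-game yardage sequences with identical means
steady = np.array([250, 255, 260, 265, 270])
volatile = np.array([180, 210, 260, 310, 340])

# Both means are 260.0, but the sample standard deviations differ by ~8x
print(steady.mean(), volatile.mean())
print(steady.std(ddof=1), volatile.std(ddof=1))
```

The mean alone cannot distinguish these two players; the standard deviation can.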


Phase 2: Variability Analysis

Standard Deviation Comparison

def calculate_qb_variability(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate variability metrics for each quarterback.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback game data

    Returns
    -------
    pd.DataFrame
        Variability statistics per quarterback
    """
    variability = df.groupby("quarterback").agg({
        "passing_yards": ["mean", "std", "min", "max"],
        "touchdowns": ["mean", "std"],
        "passer_rating": ["mean", "std"]
    }).round(1)

    # Flatten column names
    variability.columns = [f"{col[0]}_{col[1]}" for col in variability.columns]

    # Add coefficient of variation
    variability["yards_cv"] = (
        variability["passing_yards_std"] / variability["passing_yards_mean"] * 100
    ).round(1)

    variability["rating_cv"] = (
        variability["passer_rating_std"] / variability["passer_rating_mean"] * 100
    ).round(1)

    return variability

variability = calculate_qb_variability(qb_data)
print("\nVARIABILITY ANALYSIS")
print("=" * 60)
print(variability)

Expected Output:

              passing_yards_mean  passing_yards_std  passing_yards_min  passing_yards_max  ...  yards_cv  rating_cv
quarterback                                                                                 ...
Steady Steve              258.3               10.8                242                275  ...       4.2        6.9
Volatile Vic              263.2               81.3                165                360  ...      30.9       33.7

Key Findings

  1. Yards variability: Steve's std is 10.8 yards; Vic's is 81.3 yards (roughly 7.5x higher!)
  2. Yards CV: Steve 4.2% vs Vic 30.9%
  3. Range: Steve's range is 33 yards; Vic's is 195 yards
  4. Rating CV: Steve 6.9% vs Vic 33.7%

Volatile Vic is truly volatile—his performance swings are massive.
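The coefficient of variation is used here because raw standard deviations cannot be compared across statistics measured on different scales. A small sketch with hypothetical round numbers in the spirit of the table above:

```python
# Hypothetical summary numbers: std alone can't compare stats on different scales
yards_mean, yards_std = 263.0, 81.0      # passing yards
rating_mean, rating_std = 90.0, 30.0     # passer rating

# The raw stds (81 vs 30) suggest yards are far "more variable", but the
# scale-free CV shows both stats swing by a similar proportion of their mean.
yards_cv = yards_std / yards_mean * 100
rating_cv = rating_std / rating_mean * 100
print(f"yards CV: {yards_cv:.1f}%  rating CV: {rating_cv:.1f}%")
```

Dividing by the mean puts both statistics on a common percentage scale, which is why the CV columns above are directly comparable.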


Phase 3: Distribution Analysis

Visualizing the Distributions

def analyze_distributions(df: pd.DataFrame, stat: str) -> pd.DataFrame:
    """
    Analyze distribution shape for a statistic.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback data
    stat : str
        Statistic column to analyze

    Returns
    -------
    pd.DataFrame
        Distribution metrics
    """
    results = []

    for qb in df["quarterback"].unique():
        qb_data_subset = df[df["quarterback"] == qb][stat]

        results.append({
            "quarterback": qb,
            "statistic": stat,
            "mean": qb_data_subset.mean(),
            "median": qb_data_subset.median(),
            "skewness": stats.skew(qb_data_subset),
            "kurtosis": stats.kurtosis(qb_data_subset),
            "q1": qb_data_subset.quantile(0.25),
            "q3": qb_data_subset.quantile(0.75),
            "iqr": qb_data_subset.quantile(0.75) - qb_data_subset.quantile(0.25)
        })

    return pd.DataFrame(results).round(2)

yards_dist = analyze_distributions(qb_data, "passing_yards")
print("\nYARDS DISTRIBUTION ANALYSIS")
print("=" * 60)
print(yards_dist)

rating_dist = analyze_distributions(qb_data, "passer_rating")
print("\nPASSER RATING DISTRIBUTION")
print("=" * 60)
print(rating_dist)

Interpreting the Distributions

Steady Steve:

- Mean ≈ Median (symmetric distribution)
- Low skewness and kurtosis
- Narrow IQR (consistent performance band)

Volatile Vic:

- Larger spread on every measure
- Potentially bimodal (good games vs bad games)
- Wide IQR spanning nearly the entire range
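One quick way to probe the suspected bimodality is to bucket Vic's yardage into bins and look for a hollow middle. A minimal sketch using the yardage values from the dataset above:

```python
import numpy as np

# Volatile Vic's yardage, copied from the case-study data
vic_yards = [185, 342, 210, 318, 175, 355, 195, 328, 165, 360, 190, 335]

# Bucket games into 50-yard bins; a bimodal pattern shows up as
# counts clustered at both ends with an empty middle bin.
counts, edges = np.histogram(vic_yards, bins=[150, 200, 250, 300, 350, 400])
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.0f}-{hi:.0f} yds: {'#' * n}")
```

The 250-300 yard bin is empty: Vic's games cluster at the low and high ends, with nothing in between.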


Phase 4: Game-by-Game Floor Analysis

Calculating "Floor" Performance

In many contexts, the worst-case scenario matters more than the average.
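One common way to quantify that worst case is a low percentile rather than the single minimum. This sketch (a percentile-based variant, not part of the case-study functions) computes a 10th-percentile "floor" from the yardage values above:

```python
import numpy as np

steve_yards = [255, 268, 242, 275, 251, 263, 248, 272, 259, 245, 267, 255]
vic_yards = [185, 342, 210, 318, 175, 355, 195, 328, 165, 360, 190, 335]

# 10th percentile: the yardage level each QB clears in roughly 90% of games
for name, yards in [("Steady Steve", steve_yards), ("Volatile Vic", vic_yards)]:
    print(f"{name}: floor (p10) = {np.percentile(yards, 10):.0f} yards")
```

Steve's 10th-percentile floor sits near his average; Vic's is far below his, which is exactly what the worst-3-games analysis below formalizes.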

def analyze_floor_ceiling(df: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze floor (worst games) and ceiling (best games) performance.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback data

    Returns
    -------
    pd.DataFrame
        Floor and ceiling analysis
    """
    results = []

    for qb in df["quarterback"].unique():
        qb_subset = df[df["quarterback"] == qb]

        # Floor: worst 3 games
        floor_games = qb_subset.nsmallest(3, "passer_rating")
        # Ceiling: best 3 games
        ceiling_games = qb_subset.nlargest(3, "passer_rating")

        results.append({
            "quarterback": qb,
            "floor_avg_yards": floor_games["passing_yards"].mean(),
            "floor_avg_rating": floor_games["passer_rating"].mean(),
            "floor_avg_td": floor_games["touchdowns"].mean(),
            "floor_avg_int": floor_games["interceptions"].mean(),
            "ceiling_avg_yards": ceiling_games["passing_yards"].mean(),
            "ceiling_avg_rating": ceiling_games["passer_rating"].mean(),
            "ceiling_avg_td": ceiling_games["touchdowns"].mean(),
            "ceiling_avg_int": ceiling_games["interceptions"].mean()
        })

    return pd.DataFrame(results).round(1)

floor_ceiling = analyze_floor_ceiling(qb_data)
print("\nFLOOR AND CEILING ANALYSIS")
print("=" * 60)
print(floor_ceiling.T)

Floor vs Ceiling Interpretation

Steady Steve:

- Floor: ~83 rating, 246 yards, 1.3 TDs, 1.0 INTs
- Ceiling: ~99 rating, 272 yards, 2.7 TDs, 0.3 INTs
- Narrow gap between floor and ceiling

Volatile Vic:

- Floor: ~56 rating, 175 yards, 0.0 TDs, 2.3 INTs
- Ceiling: ~125 rating, 352 yards, 4.3 TDs, 0.0 INTs
- Massive gap between floor and ceiling


Phase 5: Standardized Comparison

Z-Score Analysis

def calculate_zscore_profile(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate z-scores for each performance relative to that QB's average.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback data

    Returns
    -------
    pd.DataFrame
        Data with z-scores added
    """
    result = df.copy()

    # Calculate z-scores within each QB
    for qb in df["quarterback"].unique():
        mask = result["quarterback"] == qb
        for stat in ["passing_yards", "touchdowns", "passer_rating"]:
            qb_data_subset = result.loc[mask, stat]
            result.loc[mask, f"{stat}_z"] = (
                (qb_data_subset - qb_data_subset.mean()) / qb_data_subset.std()
            )

    return result

qb_with_z = calculate_zscore_profile(qb_data)

# Show extreme performances
print("\nEXTREME PERFORMANCES (|z| > 1.5)")
print("=" * 60)
extreme = qb_with_z[
    (abs(qb_with_z["passing_yards_z"]) > 1.5) |
    (abs(qb_with_z["passer_rating_z"]) > 1.5)
][["quarterback", "game", "passing_yards", "passer_rating",
   "passing_yards_z", "passer_rating_z"]]
print(extreme.round(2))

Cross-QB Comparison

def compare_across_qbs(df: pd.DataFrame) -> pd.DataFrame:
    """
    Create league-wide z-scores for cross-QB comparison.

    Parameters
    ----------
    df : pd.DataFrame
        All quarterback data

    Returns
    -------
    pd.DataFrame
        Comparison metrics
    """
    # Calculate league-wide stats
    league_stats = {
        "passing_yards": {"mean": df["passing_yards"].mean(),
                         "std": df["passing_yards"].std()},
        "touchdowns": {"mean": df["touchdowns"].mean(),
                       "std": df["touchdowns"].std()},
        "interceptions": {"mean": df["interceptions"].mean(),
                          "std": df["interceptions"].std()},
        "passer_rating": {"mean": df["passer_rating"].mean(),
                          "std": df["passer_rating"].std()}
    }

    results = []
    for qb in df["quarterback"].unique():
        qb_subset = df[df["quarterback"] == qb]

        profile = {"quarterback": qb}
        for stat, params in league_stats.items():
            qb_mean = qb_subset[stat].mean()
            z = (qb_mean - params["mean"]) / params["std"]
            profile[f"{stat}_z"] = round(z, 2)

        results.append(profile)

    return pd.DataFrame(results)

cross_comparison = compare_across_qbs(qb_data)
print("\nCROSS-QB COMPARISON (League Z-Scores)")
print("=" * 60)
print(cross_comparison)

Phase 6: Decision Framework

Building a Composite Score

def calculate_consistency_score(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate a composite consistency score.

    Weights:
    - Average performance: 40%
    - Consistency (inverse of CV): 30%
    - Floor performance: 20%
    - Ceiling upside: 10%

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback data

    Returns
    -------
    pd.DataFrame
        Consistency scores
    """
    results = []

    for qb in df["quarterback"].unique():
        qb_subset = df[df["quarterback"] == qb]

        # Average performance (normalized 0-100)
        avg_rating = qb_subset["passer_rating"].mean()
        avg_score = (avg_rating - 50) / 100 * 100  # Normalize from 50-150 range

        # Consistency (inverse of CV, capped)
        cv = qb_subset["passer_rating"].std() / qb_subset["passer_rating"].mean()
        consistency_score = max(0, 100 - cv * 200)  # Lower CV = higher score

        # Floor (worst 3 games average)
        floor_rating = qb_subset.nsmallest(3, "passer_rating")["passer_rating"].mean()
        floor_score = (floor_rating - 50) / 100 * 100

        # Ceiling (best 3 games average)
        ceiling_rating = qb_subset.nlargest(3, "passer_rating")["passer_rating"].mean()
        ceiling_score = (ceiling_rating - 50) / 100 * 100

        # Composite
        composite = (
            avg_score * 0.40 +
            consistency_score * 0.30 +
            floor_score * 0.20 +
            ceiling_score * 0.10
        )

        results.append({
            "quarterback": qb,
            "avg_score": round(avg_score, 1),
            "consistency_score": round(consistency_score, 1),
            "floor_score": round(floor_score, 1),
            "ceiling_score": round(ceiling_score, 1),
            "composite": round(composite, 1)
        })

    return pd.DataFrame(results).sort_values("composite", ascending=False)

scores = calculate_consistency_score(qb_data)
print("\nCOMPOSITE CONSISTENCY SCORES")
print("=" * 60)
print(scores)
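The 40/30/20/10 weighting is a judgment call, and the ranking can flip under different priorities. The sketch below uses approximate component scores (round numbers consistent with the formulas above, not exact outputs of the case-study code) and a hypothetical upside-heavy weighting:

```python
# Approximate component scores (0-100 scale) for the two profiles
profiles = {
    "Steady Steve": {"avg": 41, "consistency": 86, "floor": 33, "ceiling": 49},
    "Volatile Vic": {"avg": 40, "consistency": 33, "floor": 6,  "ceiling": 75},
}

# Default weights vs a hypothetical weighting that prizes ceiling upside
weightings = {
    "default (40/30/20/10)":      {"avg": .40, "consistency": .30, "floor": .20, "ceiling": .10},
    "upside-heavy (40/10/10/40)": {"avg": .40, "consistency": .10, "floor": .10, "ceiling": .40},
}

for label, w in weightings.items():
    print(label)
    for qb, s in profiles.items():
        composite = sum(s[k] * w[k] for k in w)
        print(f"  {qb}: {composite:.1f}")
```

Under the default weights Steve wins comfortably, but shifting weight from consistency and floor to ceiling puts Vic ahead, which previews discussion question 1 below.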

Phase 7: Conclusions and Recommendations

Final Analysis

def generate_recommendation(df: pd.DataFrame) -> str:
    """
    Generate quarterback recommendation based on analysis.

    Parameters
    ----------
    df : pd.DataFrame
        Quarterback data

    Returns
    -------
    str
        Recommendation text
    """
    scores = calculate_consistency_score(df)
    variability = calculate_qb_variability(df)

    lines = []
    lines.append("QUARTERBACK RECOMMENDATION REPORT")
    lines.append("=" * 60)
    lines.append("")

    # Overall winner
    winner = scores.iloc[0]["quarterback"]
    lines.append(f"RECOMMENDED: {winner}")
    lines.append("")

    # Key findings
    lines.append("KEY FINDINGS:")
    lines.append("-" * 40)

    for _, row in scores.iterrows():
        qb = row["quarterback"]
        lines.append(f"\n{qb}:")
        lines.append(f"  Composite Score: {row['composite']}")
        lines.append(f"  Average Performance: {row['avg_score']}")
        lines.append(f"  Consistency: {row['consistency_score']}")
        lines.append(f"  Floor Protection: {row['floor_score']}")
        lines.append(f"  Ceiling Upside: {row['ceiling_score']}")

    lines.append("")
    lines.append("CONTEXT-SPECIFIC RECOMMENDATIONS:")
    lines.append("-" * 40)
    lines.append("• Need reliable starter: Steady Steve")
    lines.append("• Looking for boom potential: Volatile Vic")
    lines.append("• Risk-averse team: Steady Steve")
    lines.append("• Behind in 4th quarter: Volatile Vic")

    return "\n".join(lines)

recommendation = generate_recommendation(qb_data)
print(recommendation)

Discussion Questions

  1. How would the analysis change if interceptions were weighted more heavily?

  2. In what game situations would you prefer Volatile Vic over Steady Steve?

  3. How might sample size (12 games vs 48 games) affect our confidence in these conclusions?

  4. What other statistics would you want to include in a more comprehensive analysis?

  5. How would you adjust this framework for comparing running backs or receivers?


Your Turn: Extensions

Option A: Multi-Quarterback Analysis

Extend this analysis to compare 4+ quarterbacks with different profiles:

- Consistent average performer
- Boom-or-bust player
- High-floor, low-ceiling
- Rookie with small sample

Option B: Situational Splits

Analyze how consistency changes based on:

- Home vs away games
- Against winning vs losing teams
- First half vs second half of season

Option C: Predictive Value

Investigate: Does first-half season consistency predict second-half performance?
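As a starting point for this extension, a sketch that splits each quarterback's 12 games into halves and compares yardage CVs (using the yardage values from this case study):

```python
import numpy as np

qbs = {
    "Steady Steve": [255, 268, 242, 275, 251, 263, 248, 272, 259, 245, 267, 255],
    "Volatile Vic": [185, 342, 210, 318, 175, 355, 195, 328, 165, 360, 190, 335],
}

def cv(values):
    """Coefficient of variation (%), using the sample standard deviation."""
    arr = np.asarray(values, dtype=float)
    return arr.std(ddof=1) / arr.mean() * 100

# Compare games 1-6 against games 7-12 for each QB
for name, yards in qbs.items():
    first, second = yards[:6], yards[6:]
    print(f"{name}: first-half CV {cv(first):.1f}%, second-half CV {cv(second):.1f}%")
```

In this small sample each quarterback's first-half CV closely matches his second-half CV, hinting that consistency is a stable trait, though 6-game halves are far too small to draw a real conclusion.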


Key Takeaways

  1. Averages hide variability: Two quarterbacks with identical averages can be fundamentally different players

  2. Consistency has value: A reliable floor often matters more than occasional ceilings

  3. Context matters: The "better" player depends on team needs and game situations

  4. Multiple metrics needed: No single statistic captures the full picture

  5. Standard deviation is essential: It quantifies what the mean cannot show