Case Study 2: Analyzing Scoring Consistency for MVP Voting

Overview

Scenario: A major sports network is producing a segment analyzing MVP candidates through the lens of scoring consistency. They want to go beyond simple averages to understand which players are most reliable scorers and how consistency should factor into MVP considerations.

Duration: 2-3 hours
Difficulty: Intermediate
Prerequisites: Chapter 5 concepts, variability measures, distribution analysis


Background

The MVP race typically features players with similar scoring averages. This analysis asks whether scoring consistency should separate those candidates and how to measure it properly.

Research Questions:

  1. How do MVP candidates differ in scoring consistency?
  2. Is consistency correlated with team success?
  3. Should voters weight consistency more heavily?
  4. How do we properly measure and communicate consistency?


Part 1: Measuring Consistency

1.1 Multiple Approaches to Consistency

"""
MVP Scoring Consistency Analysis
Case Study 2 - Chapter 5: Descriptive Statistics
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from typing import Dict, Tuple
from dataclasses import dataclass


@dataclass
class ConsistencyMetrics:
    """Container for player consistency metrics."""
    player_name: str
    games: int
    mean: float
    median: float
    std: float
    cv: float  # Coefficient of variation
    iqr: float
    range: float
    skewness: float
    games_above_20: int
    games_below_10: int
    streak_longest_above_avg: int


def calculate_consistency_metrics(game_log: pd.DataFrame,
                                   points_col: str = 'PTS') -> ConsistencyMetrics:
    """
    Calculate comprehensive consistency metrics from game log.

    Args:
        game_log: DataFrame with game-by-game data
        points_col: Name of points column

    Returns:
        ConsistencyMetrics object with all measurements
    """
    points = game_log[points_col].dropna()

    mean = points.mean()
    median = points.median()
    std = points.std()
    cv = (std / mean * 100) if mean > 0 else 0

    q1 = points.quantile(0.25)
    q3 = points.quantile(0.75)
    iqr = q3 - q1

    # Streak analysis
    above_avg = (points >= mean).astype(int)
    streak_lengths = []
    current_streak = 0

    for val in above_avg:
        if val == 1:
            current_streak += 1
        else:
            if current_streak > 0:
                streak_lengths.append(current_streak)
            current_streak = 0

    if current_streak > 0:
        streak_lengths.append(current_streak)

    longest_streak = max(streak_lengths) if streak_lengths else 0

    return ConsistencyMetrics(
        player_name=game_log['PLAYER_NAME'].iloc[0] if 'PLAYER_NAME' in game_log.columns else 'Unknown',
        games=len(points),
        mean=mean,
        median=median,
        std=std,
        cv=cv,
        iqr=iqr,
        range=points.max() - points.min(),
        skewness=points.skew(),
        games_above_20=(points >= 20).sum(),
        games_below_10=(points < 10).sum(),
        streak_longest_above_avg=longest_streak
    )


def compare_consistency(players_data: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """
    Compare consistency metrics across multiple players.

    Args:
        players_data: Dictionary mapping player names to their game logs

    Returns:
        DataFrame with consistency comparison
    """
    metrics_list = []

    for player_name, game_log in players_data.items():
        metrics = calculate_consistency_metrics(game_log)
        metrics_list.append({
            'Player': metrics.player_name,
            'Games': metrics.games,
            'PPG': round(metrics.mean, 1),
            'Median': round(metrics.median, 1),
            'Std Dev': round(metrics.std, 1),
            'CV%': round(metrics.cv, 1),
            'IQR': round(metrics.iqr, 1),
            'Range': round(metrics.range, 0),
            'Games 20+': metrics.games_above_20,
            'Games <10': metrics.games_below_10,
            'Longest Hot Streak': metrics.streak_longest_above_avg
        })

    return pd.DataFrame(metrics_list)
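
The sketch below shows one way to exercise these helpers. The two candidates and their point totals are simulated with numpy purely for illustration; in practice, players_data would be built from real game logs (for example, one CSV export per candidate).

# Illustrative usage with simulated game logs (not real data).
# In practice, load each candidate's actual game-by-game points instead.
rng = np.random.default_rng(42)

players_data = {
    'Player A': pd.DataFrame({
        'PLAYER_NAME': 'Player A',
        'PTS': rng.normal(28, 5, 75).clip(min=0).round()   # steadier scorer
    }),
    'Player B': pd.DataFrame({
        'PLAYER_NAME': 'Player B',
        'PTS': rng.normal(28, 9, 75).clip(min=0).round()   # streakier scorer
    }),
}

comparison = compare_consistency(players_data)
print(comparison.to_string(index=False))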

1.2 Statistical Interpretation

def interpret_cv(cv: float) -> str:
    """
    Interpret coefficient of variation for scoring.

    Args:
        cv: Coefficient of variation (percentage)

    Returns:
        Interpretation string
    """
    if cv < 25:
        return "Highly consistent - elite reliability"
    elif cv < 35:
        return "Consistent - dependable scorer"
    elif cv < 45:
        return "Moderately variable - some fluctuation"
    elif cv < 55:
        return "Variable - significant fluctuation"
    else:
        return "Highly variable - unpredictable"


def calculate_reliability_score(metrics: ConsistencyMetrics) -> float:
    """
    Calculate composite reliability score (0-100).

    This score combines multiple consistency measures into
    a single interpretable metric.

    Args:
        metrics: ConsistencyMetrics object

    Returns:
        Reliability score from 0 to 100
    """
    # Lower CV is better (invert and scale)
    cv_component = max(0, 100 - metrics.cv * 2)

    # Higher percentage of games above 20 is better
    if metrics.games > 0:
        above_20_pct = (metrics.games_above_20 / metrics.games) * 100
    else:
        above_20_pct = 0

    # Fewer games below 10 points is better (stored as % of games at 10+)
    if metrics.games > 0:
        above_10_pct = 100 - (metrics.games_below_10 / metrics.games) * 100
    else:
        above_10_pct = 100

    # Combine with weights
    reliability = (
        cv_component * 0.40 +
        above_20_pct * 0.35 +
        above_10_pct * 0.25
    )

    return min(100, max(0, reliability))
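
A quick check on the synthetic candidates from the earlier sketch shows how the CV interpretation and the composite score are meant to be read together (the 40/35/25 weights are the defaults coded above, not an established standard).

# Interpret one synthetic candidate (players_data from the earlier sketch)
example = calculate_consistency_metrics(players_data['Player A'])
print(f"CV = {example.cv:.1f}% -> {interpret_cv(example.cv)}")
print(f"Reliability score: {calculate_reliability_score(example):.1f}/100")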

Part 2: Visual Analysis

2.1 Distribution Comparisons

def create_scoring_distribution_comparison(players_data: Dict[str, pd.DataFrame],
                                            figsize: Tuple = (16, 10)) -> plt.Figure:
    """
    Create comprehensive distribution comparison visualization.

    Args:
        players_data: Dictionary mapping player names to game logs
        figsize: Figure dimensions

    Returns:
        Matplotlib Figure
    """
    n_players = len(players_data)
    fig, axes = plt.subplots(2, n_players, figsize=figsize)

    if n_players == 1:
        axes = axes.reshape(-1, 1)

    colors = plt.cm.Set2(np.linspace(0, 1, n_players))

    for idx, (player_name, game_log) in enumerate(players_data.items()):
        points = game_log['PTS'].dropna()
        metrics = calculate_consistency_metrics(game_log)

        # Histogram with KDE
        ax1 = axes[0, idx]
        ax1.hist(points, bins=15, density=True, alpha=0.7,
                 color=colors[idx], edgecolor='black')
        points.plot.kde(ax=ax1, linewidth=2, color='black')
        ax1.axvline(metrics.mean, color='red', linestyle='--',
                    linewidth=2, label=f'Mean: {metrics.mean:.1f}')
        ax1.axvline(metrics.median, color='green', linestyle='-',
                    linewidth=2, label=f'Median: {metrics.median:.1f}')
        ax1.set_title(f'{player_name}\nCV = {metrics.cv:.1f}%')
        ax1.set_xlabel('Points')
        ax1.legend(fontsize=8)

        # Box plot with individual games
        ax2 = axes[1, idx]
        bp = ax2.boxplot(points, vert=True, patch_artist=True)
        bp['boxes'][0].set_facecolor(colors[idx])
        bp['boxes'][0].set_alpha(0.7)

        # Overlay individual games
        jitter = np.random.normal(1, 0.04, size=len(points))
        ax2.scatter(jitter, points, alpha=0.4, s=15, color='gray')

        ax2.set_ylabel('Points')
        ax2.set_xticklabels([player_name])

    fig.suptitle('Scoring Distribution Comparison - MVP Candidates', fontsize=14)
    plt.tight_layout()

    return fig


def create_game_by_game_comparison(players_data: Dict[str, pd.DataFrame],
                                    rolling_window: int = 10,
                                    figsize: Tuple = (14, 8)) -> plt.Figure:
    """
    Create game-by-game trend comparison.

    Args:
        players_data: Dictionary mapping player names to game logs
        rolling_window: Window size for rolling average
        figsize: Figure dimensions

    Returns:
        Matplotlib Figure
    """
    fig, axes = plt.subplots(2, 1, figsize=figsize)

    colors = plt.cm.Set2(np.linspace(0, 1, len(players_data)))

    # Raw game scores
    ax1 = axes[0]
    for (player_name, game_log), color in zip(players_data.items(), colors):
        points = game_log['PTS'].dropna().values
        ax1.plot(range(len(points)), points, alpha=0.3, color=color)
        ax1.scatter(range(len(points)), points, s=10, alpha=0.5,
                   color=color, label=player_name)

    ax1.set_xlabel('Game Number')
    ax1.set_ylabel('Points')
    ax1.set_title('Game-by-Game Scoring')
    ax1.legend()

    # Rolling averages
    ax2 = axes[1]
    for (player_name, game_log), color in zip(players_data.items(), colors):
        points = game_log['PTS'].dropna()
        rolling = points.rolling(rolling_window, min_periods=1).mean()
        ax2.plot(range(len(rolling)), rolling.values, linewidth=2,
                color=color, label=player_name)

    ax2.set_xlabel('Game Number')
    ax2.set_ylabel('Points (Rolling Average)')
    ax2.set_title(f'{rolling_window}-Game Rolling Average')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    return fig

2.2 Consistency Rankings

def create_consistency_dashboard(comparison_df: pd.DataFrame,
                                  figsize: Tuple = (16, 10)) -> plt.Figure:
    """
    Create comprehensive consistency dashboard.

    Args:
        comparison_df: DataFrame from compare_consistency()
        figsize: Figure dimensions

    Returns:
        Matplotlib Figure
    """
    fig = plt.figure(figsize=figsize)

    # Layout: 2x2 grid
    ax1 = fig.add_subplot(2, 2, 1)
    ax2 = fig.add_subplot(2, 2, 2)
    ax3 = fig.add_subplot(2, 2, 3)
    ax4 = fig.add_subplot(2, 2, 4)

    players = comparison_df['Player'].tolist()
    colors = plt.cm.Set2(np.linspace(0, 1, len(players)))

    # 1. PPG vs CV scatter
    ax1.scatter(comparison_df['PPG'], comparison_df['CV%'],
                s=150, c=colors, edgecolors='black')
    for i, player in enumerate(players):
        ax1.annotate(player.split()[-1],  # Last name
                     (comparison_df['PPG'].iloc[i], comparison_df['CV%'].iloc[i]),
                     xytext=(5, 5), textcoords='offset points', fontsize=9)
    ax1.set_xlabel('Points Per Game')
    ax1.set_ylabel('Coefficient of Variation (%)')
    ax1.set_title('Volume vs Consistency')
    ax1.axhline(35, color='green', linestyle='--', alpha=0.5, label='Consistent threshold')
    ax1.legend()

    # 2. CV comparison bar chart
    sorted_df = comparison_df.sort_values('CV%')
    colors_sorted = [colors[players.index(p)] for p in sorted_df['Player']]
    ax2.barh(sorted_df['Player'], sorted_df['CV%'], color=colors_sorted)
    ax2.set_xlabel('Coefficient of Variation (%)')
    ax2.set_title('Consistency Ranking (Lower is Better)')
    ax2.axvline(35, color='green', linestyle='--', alpha=0.5)

    # 3. Games 20+ vs Games <10
    width = 0.35
    x = np.arange(len(players))

    bars1 = ax3.bar(x - width/2, comparison_df['Games 20+'], width,
                    label='Games 20+', color='green', alpha=0.7)
    bars2 = ax3.bar(x + width/2, comparison_df['Games <10'], width,
                    label='Games <10', color='red', alpha=0.7)

    ax3.set_xticks(x)
    ax3.set_xticklabels([p.split()[-1] for p in players], rotation=45, ha='right')
    ax3.set_ylabel('Number of Games')
    ax3.set_title('High vs Low Scoring Games')
    ax3.legend()

    # 4. Summary table
    ax4.axis('off')

    table_data = comparison_df[['Player', 'PPG', 'CV%', 'IQR', 'Games 20+']].round(1)
    table = ax4.table(
        cellText=table_data.values,
        colLabels=table_data.columns,
        loc='center',
        cellLoc='center'
    )
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1.2, 1.5)
    ax4.set_title('Summary Statistics', y=0.95)

    plt.suptitle('MVP Candidate Scoring Consistency Analysis', fontsize=16)
    plt.tight_layout()

    return fig

Part 3: Statistical Testing

3.1 Comparing Variability

def levene_test_scoring_variance(players_data: Dict[str, pd.DataFrame]) -> Dict:
    """
    Test whether players have significantly different scoring variances.

    Uses Levene's test which is robust to non-normal distributions.

    Args:
        players_data: Dictionary mapping player names to game logs

    Returns:
        Dictionary with test results
    """
    scoring_arrays = [df['PTS'].dropna().values for df in players_data.values()]

    statistic, p_value = stats.levene(*scoring_arrays)

    return {
        'test': "Levene's Test for Equality of Variances",
        'statistic': statistic,
        'p_value': p_value,
        'interpretation': (
            "Significant difference in variance" if p_value < 0.05
            else "No significant difference in variance"
        ),
        'variances': {
            name: df['PTS'].var() for name, df in players_data.items()
        }
    }


def bootstrap_cv_confidence_interval(points: np.ndarray,
                                      n_bootstrap: int = 1000,
                                      confidence: float = 0.95) -> Tuple[float, float]:
    """
    Calculate bootstrap confidence interval for coefficient of variation.

    Args:
        points: Array of scoring data
        n_bootstrap: Number of bootstrap samples
        confidence: Confidence level

    Returns:
        Tuple of (lower_bound, upper_bound)
    """
    cvs = []

    for _ in range(n_bootstrap):
        sample = np.random.choice(points, size=len(points), replace=True)
        # ddof=1 gives the sample std, matching the pandas default used elsewhere
        cv = sample.std(ddof=1) / sample.mean() * 100
        cvs.append(cv)

    alpha = 1 - confidence
    lower = np.percentile(cvs, alpha/2 * 100)
    upper = np.percentile(cvs, (1 - alpha/2) * 100)

    return lower, upper
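
A point estimate of CV from a 70-80 game sample carries real sampling error, so it helps to report the bootstrap interval next to it. The loop below does that for the synthetic candidates from the earlier sketch.

# Report each candidate's CV with a 95% bootstrap interval
for name, game_log in players_data.items():
    pts = game_log['PTS'].dropna().values
    point_cv = pts.std(ddof=1) / pts.mean() * 100
    lower, upper = bootstrap_cv_confidence_interval(pts)
    print(f"{name}: CV = {point_cv:.1f}% (95% CI: {lower:.1f}%-{upper:.1f}%)")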


def compare_consistency_significance(player1_data: pd.DataFrame,
                                      player2_data: pd.DataFrame,
                                      n_bootstrap: int = 1000) -> Dict:
    """
    Test whether two players have significantly different consistency.

    Uses a permutation test on the pooled scores to compare coefficients of variation.

    Args:
        player1_data: Game log for player 1
        player2_data: Game log for player 2
        n_bootstrap: Number of permutation resamples

    Returns:
        Dictionary with comparison results
    """
    points1 = player1_data['PTS'].dropna().values
    points2 = player2_data['PTS'].dropna().values

    # Observed difference (ddof=1 for the sample std, matching pandas)
    cv1 = points1.std(ddof=1) / points1.mean() * 100
    cv2 = points2.std(ddof=1) / points2.mean() * 100
    observed_diff = cv1 - cv2

    # Permutation test: shuffle the pooled scores and re-split into two groups
    cv_diffs = []
    combined = np.concatenate([points1, points2])
    n1 = len(points1)

    for _ in range(n_bootstrap):
        np.random.shuffle(combined)
        sample1 = combined[:n1]
        sample2 = combined[n1:]

        perm_cv1 = sample1.std(ddof=1) / sample1.mean() * 100
        perm_cv2 = sample2.std(ddof=1) / sample2.mean() * 100
        cv_diffs.append(perm_cv1 - perm_cv2)

    p_value = np.mean(np.abs(cv_diffs) >= np.abs(observed_diff))

    return {
        'player1_cv': cv1,
        'player2_cv': cv2,
        'observed_difference': observed_diff,
        'p_value': p_value,
        'significant': p_value < 0.05,
        'interpretation': (
            f"Player 1 is {'more' if cv1 > cv2 else 'less'} variable "
            f"(p = {p_value:.3f}, {'significant' if p_value < 0.05 else 'not significant'})"
        )
    }
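
Used on the two synthetic candidates, the head-to-head test reads as follows (results will vary with the simulated data and the random permutations).

# Head-to-head consistency comparison of the two synthetic candidates
head_to_head = compare_consistency_significance(players_data['Player A'],
                                                players_data['Player B'])
print(head_to_head['interpretation'])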

Part 4: Application to MVP Voting

4.1 Creating a Consistency-Weighted MVP Score

def calculate_consistency_adjusted_rating(metrics: ConsistencyMetrics,
                                           raw_rating: float,
                                           consistency_weight: float = 0.2) -> float:
    """
    Adjust a player's MVP rating based on consistency.

    Args:
        metrics: ConsistencyMetrics for the player
        raw_rating: Original MVP rating (e.g., from traditional stats)
        consistency_weight: How much to weight consistency (0-1)

    Returns:
        Adjusted rating
    """
    reliability = calculate_reliability_score(metrics)

    # Scale reliability to adjustment factor (0.9 to 1.1)
    # 50 reliability = no adjustment
    # 100 reliability = +10%
    # 0 reliability = -10%
    adjustment = 1 + (reliability - 50) / 500

    # Blend original rating with consistency adjustment
    adjusted = (
        raw_rating * (1 - consistency_weight) +
        raw_rating * adjustment * consistency_weight
    )

    return adjusted
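
To see the scale of the adjustment, here is the arithmetic for a hypothetical candidate with a reliability score of 75 and a raw rating of 90. With the default consistency_weight of 0.2, even a clearly above-average reliability score moves the rating by only about one percent, which keeps consistency a tiebreaker rather than a dominant factor.

# Worked example (hypothetical numbers):
#   reliability = 75 -> adjustment = 1 + (75 - 50) / 500 = 1.05
#   raw_rating = 90, consistency_weight = 0.2
#   adjusted = 90 * 0.8 + 90 * 1.05 * 0.2 = 72.0 + 18.9 = 90.9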


def generate_mvp_consistency_report(players_data: Dict[str, pd.DataFrame],
                                     raw_ratings: Dict[str, float]) -> str:
    """
    Generate comprehensive MVP consistency analysis report.

    Args:
        players_data: Dictionary mapping player names to game logs
        raw_ratings: Dictionary mapping player names to base MVP ratings

    Returns:
        Formatted report string
    """
    report = []
    report.append("=" * 70)
    report.append("MVP SCORING CONSISTENCY ANALYSIS REPORT")
    report.append("=" * 70)

    comparison_df = compare_consistency(players_data)

    report.append("\n## CONSISTENCY RANKINGS (by Coefficient of Variation)")
    report.append("-" * 50)

    sorted_df = comparison_df.sort_values('CV%')
    for rank, (_, row) in enumerate(sorted_df.iterrows(), 1):
        cv_interp = interpret_cv(row['CV%'])
        report.append(
            f"{rank}. {row['Player']}: CV = {row['CV%']:.1f}% ({cv_interp})"
        )

    report.append("\n## RELIABILITY SCORES")
    report.append("-" * 50)

    for player_name, game_log in players_data.items():
        metrics = calculate_consistency_metrics(game_log)
        reliability = calculate_reliability_score(metrics)
        report.append(f"{player_name}: {reliability:.1f}/100")

    report.append("\n## CONSISTENCY-ADJUSTED MVP RATINGS")
    report.append("-" * 50)

    adjusted_ratings = []
    for player_name, game_log in players_data.items():
        metrics = calculate_consistency_metrics(game_log)
        raw = raw_ratings.get(player_name, 50)
        adjusted = calculate_consistency_adjusted_rating(metrics, raw)
        adjusted_ratings.append((player_name, raw, adjusted))

    adjusted_ratings.sort(key=lambda x: x[2], reverse=True)
    for player, raw, adjusted in adjusted_ratings:
        change = ((adjusted / raw) - 1) * 100
        direction = "+" if change > 0 else ""
        report.append(
            f"{player}: {raw:.1f} -> {adjusted:.1f} ({direction}{change:.1f}%)"
        )

    report.append("\n## KEY INSIGHTS")
    report.append("-" * 50)

    # Most consistent
    most_consistent = sorted_df.iloc[0]
    report.append(f"- Most consistent: {most_consistent['Player']} (CV = {most_consistent['CV%']:.1f}%)")

    # Most variable
    most_variable = sorted_df.iloc[-1]
    report.append(f"- Most variable: {most_variable['Player']} (CV = {most_variable['CV%']:.1f}%)")

    report.append("\n" + "=" * 70)

    return "\n".join(report)
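
Finally, a small driver ties the pieces together. It uses the synthetic players_data and made-up raw MVP ratings; a real analysis would substitute actual game logs and whatever baseline rating the production team already uses.

# End-to-end run on the synthetic candidates (illustrative only)
raw_ratings = {'Player A': 88.0, 'Player B': 91.0}   # hypothetical baseline ratings

print(generate_mvp_consistency_report(players_data, raw_ratings))

variance_test = levene_test_scoring_variance(players_data)
print(f"{variance_test['interpretation']} (p = {variance_test['p_value']:.3f})")

comparison_df = compare_consistency(players_data)
create_scoring_distribution_comparison(players_data)
create_game_by_game_comparison(players_data)
create_consistency_dashboard(comparison_df)
plt.show()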

Discussion Questions

Question 1: MVP Criteria

Should consistency be weighted in MVP voting? How much? Make an argument for and against.

Question 2: Context

A player on a bad team might have more variable scoring because their team's game plan is inconsistent. How do you account for team context in a consistency analysis?

Question 3: Sample Size

How many games are needed for a reliable consistency measure? What happens with injury-shortened seasons?

Question 4: Trade-offs

Is it possible that some variance is good (e.g., "rising to the occasion" in big games)? How would you distinguish good variance from bad?


Deliverables

  1. Analysis Code: Complete Python implementation of consistency metrics
  2. Visualization Suite: Distribution comparisons, trend analysis, dashboards
  3. Statistical Report: Written analysis with significance testing
  4. Broadcast Summary: Executive summary suitable for TV presentation
  5. Data Appendix: Full statistical tables for all candidates

Key Takeaways

  1. PPG alone is insufficient - consistency measures add important context
  2. Multiple metrics capture different aspects of consistency (CV, IQR, game counts)
  3. Statistical significance helps distinguish real differences from noise
  4. Visualization communicates consistency effectively to general audiences
  5. Context matters - consistency should be one factor among many in evaluation