Case Study 2: Analyzing Scoring Consistency for MVP Voting
Overview
Scenario: A major sports network is producing a segment analyzing MVP candidates through the lens of scoring consistency. They want to go beyond simple averages to understand which players are most reliable scorers and how consistency should factor into MVP considerations.
Duration: 2-3 hours
Difficulty: Intermediate
Prerequisites: Chapter 5 concepts, variability measures, distribution analysis
Background
The MVP race typically features players with similar scoring averages. This analysis explores whether consistency matters and how to measure it properly.
Research Questions:
1. How do MVP candidates differ in scoring consistency?
2. Is consistency correlated with team success?
3. Should voters weight consistency more heavily?
4. How do we properly measure and communicate consistency?
Part 1: Measuring Consistency
1.1 Multiple Approaches to Consistency
"""
MVP Scoring Consistency Analysis
Case Study 2 - Chapter 5: Descriptive Statistics
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from typing import Dict, List, Tuple
from dataclasses import dataclass
@dataclass
class ConsistencyMetrics:
"""Container for player consistency metrics."""
player_name: str
games: int
mean: float
median: float
std: float
cv: float # Coefficient of variation
iqr: float
range: float
skewness: float
games_above_20: int
games_below_10: int
streak_longest_above_avg: int
def calculate_consistency_metrics(game_log: pd.DataFrame,
points_col: str = 'PTS') -> ConsistencyMetrics:
"""
Calculate comprehensive consistency metrics from game log.
Args:
game_log: DataFrame with game-by-game data
points_col: Name of points column
Returns:
ConsistencyMetrics object with all measurements
"""
points = game_log[points_col].dropna()
mean = points.mean()
median = points.median()
std = points.std()
cv = (std / mean * 100) if mean > 0 else 0
q1 = points.quantile(0.25)
q3 = points.quantile(0.75)
iqr = q3 - q1
# Streak analysis
above_avg = (points >= mean).astype(int)
streak_lengths = []
current_streak = 0
for val in above_avg:
if val == 1:
current_streak += 1
else:
if current_streak > 0:
streak_lengths.append(current_streak)
current_streak = 0
if current_streak > 0:
streak_lengths.append(current_streak)
longest_streak = max(streak_lengths) if streak_lengths else 0
return ConsistencyMetrics(
player_name=game_log['PLAYER_NAME'].iloc[0] if 'PLAYER_NAME' in game_log else 'Unknown',
games=len(points),
mean=mean,
median=median,
std=std,
cv=cv,
iqr=iqr,
range=points.max() - points.min(),
skewness=points.skew(),
games_above_20=(points >= 20).sum(),
games_below_10=(points < 10).sum(),
streak_longest_above_avg=longest_streak
)
def compare_consistency(players_data: Dict[str, pd.DataFrame]) -> pd.DataFrame:
"""
Compare consistency metrics across multiple players.
Args:
players_data: Dictionary mapping player names to their game logs
Returns:
DataFrame with consistency comparison
"""
metrics_list = []
for player_name, game_log in players_data.items():
metrics = calculate_consistency_metrics(game_log)
metrics_list.append({
'Player': metrics.player_name,
'Games': metrics.games,
'PPG': round(metrics.mean, 1),
'Median': round(metrics.median, 1),
'Std Dev': round(metrics.std, 1),
'CV%': round(metrics.cv, 1),
'IQR': round(metrics.iqr, 1),
'Range': round(metrics.range, 0),
'Games 20+': metrics.games_above_20,
'Games <10': metrics.games_below_10,
'Longest Hot Streak': metrics.streak_longest_above_avg
})
return pd.DataFrame(metrics_list)
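A minimal sketch of how these functions might be called, using synthetic game logs so the example is self-contained; the player names and scoring numbers below are illustrative placeholders, not real data:

# Hypothetical usage: build small synthetic game logs and compare them.
rng = np.random.default_rng(42)
players_data = {
    'Player A': pd.DataFrame({'PLAYER_NAME': 'Player A',
                              'PTS': rng.normal(30, 5, 72).round().clip(0)}),
    'Player B': pd.DataFrame({'PLAYER_NAME': 'Player B',
                              'PTS': rng.normal(30, 9, 72).round().clip(0)}),
}

comparison_df = compare_consistency(players_data)
print(comparison_df.to_string(index=False))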
1.2 Statistical Interpretation
def interpret_cv(cv: float) -> str:
"""
Interpret coefficient of variation for scoring.
Args:
cv: Coefficient of variation (percentage)
Returns:
Interpretation string
"""
if cv < 25:
return "Highly consistent - elite reliability"
elif cv < 35:
return "Consistent - dependable scorer"
elif cv < 45:
return "Moderately variable - some fluctuation"
elif cv < 55:
return "Variable - significant fluctuation"
else:
return "Highly variable - unpredictable"
def calculate_reliability_score(metrics: ConsistencyMetrics) -> float:
"""
Calculate composite reliability score (0-100).
This score combines multiple consistency measures into
a single interpretable metric.
Args:
metrics: ConsistencyMetrics object
Returns:
Reliability score from 0 to 100
"""
# Lower CV is better (invert and scale)
cv_component = max(0, 100 - metrics.cv * 2)
# Higher percentage of games above 20 is better
if metrics.games > 0:
above_20_pct = (metrics.games_above_20 / metrics.games) * 100
else:
above_20_pct = 0
# Lower percentage of games below 10 is better
if metrics.games > 0:
below_10_pct = 100 - (metrics.games_below_10 / metrics.games) * 100
else:
below_10_pct = 100
# Combine with weights
reliability = (
cv_component * 0.40 +
above_20_pct * 0.35 +
below_10_pct * 0.25
)
return min(100, max(0, reliability))
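A quick check of how the CV interpretation and the composite reliability score behave together, reusing the synthetic players_data dictionary assumed from the previous example:

for player_name, game_log in players_data.items():
    metrics = calculate_consistency_metrics(game_log)
    reliability = calculate_reliability_score(metrics)
    print(f"{player_name}: CV = {metrics.cv:.1f}% ({interpret_cv(metrics.cv)}), "
          f"reliability = {reliability:.1f}/100")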
Part 2: Visual Analysis
2.1 Distribution Comparisons
def create_scoring_distribution_comparison(players_data: Dict[str, pd.DataFrame],
figsize: Tuple = (16, 10)) -> plt.Figure:
"""
Create comprehensive distribution comparison visualization.
Args:
players_data: Dictionary mapping player names to game logs
figsize: Figure dimensions
Returns:
Matplotlib Figure
"""
n_players = len(players_data)
fig, axes = plt.subplots(2, n_players, figsize=figsize)
if n_players == 1:
axes = axes.reshape(-1, 1)
colors = plt.cm.Set2(np.linspace(0, 1, n_players))
for idx, (player_name, game_log) in enumerate(players_data.items()):
points = game_log['PTS'].dropna()
metrics = calculate_consistency_metrics(game_log)
# Histogram with KDE
ax1 = axes[0, idx]
ax1.hist(points, bins=15, density=True, alpha=0.7,
color=colors[idx], edgecolor='black')
points.plot.kde(ax=ax1, linewidth=2, color='black')
ax1.axvline(metrics.mean, color='red', linestyle='--',
linewidth=2, label=f'Mean: {metrics.mean:.1f}')
ax1.axvline(metrics.median, color='green', linestyle='-',
linewidth=2, label=f'Median: {metrics.median:.1f}')
ax1.set_title(f'{player_name}\nCV = {metrics.cv:.1f}%')
ax1.set_xlabel('Points')
ax1.legend(fontsize=8)
# Box plot with individual games
ax2 = axes[1, idx]
bp = ax2.boxplot(points, vert=True, patch_artist=True)
bp['boxes'][0].set_facecolor(colors[idx])
bp['boxes'][0].set_alpha(0.7)
# Overlay individual games
jitter = np.random.normal(1, 0.04, size=len(points))
ax2.scatter(jitter, points, alpha=0.4, s=15, color='gray')
ax2.set_ylabel('Points')
ax2.set_xticklabels([player_name])
fig.suptitle('Scoring Distribution Comparison - MVP Candidates', fontsize=14)
plt.tight_layout()
return fig
def create_game_by_game_comparison(players_data: Dict[str, pd.DataFrame],
rolling_window: int = 10,
figsize: Tuple = (14, 8)) -> plt.Figure:
"""
Create game-by-game trend comparison.
Args:
players_data: Dictionary mapping player names to game logs
rolling_window: Window size for rolling average
figsize: Figure dimensions
Returns:
Matplotlib Figure
"""
fig, axes = plt.subplots(2, 1, figsize=figsize)
colors = plt.cm.Set2(np.linspace(0, 1, len(players_data)))
# Raw game scores
ax1 = axes[0]
for (player_name, game_log), color in zip(players_data.items(), colors):
points = game_log['PTS'].dropna().values
ax1.plot(range(len(points)), points, alpha=0.3, color=color)
ax1.scatter(range(len(points)), points, s=10, alpha=0.5,
color=color, label=player_name)
ax1.set_xlabel('Game Number')
ax1.set_ylabel('Points')
ax1.set_title('Game-by-Game Scoring')
ax1.legend()
# Rolling averages
ax2 = axes[1]
for (player_name, game_log), color in zip(players_data.items(), colors):
points = game_log['PTS'].dropna()
rolling = points.rolling(rolling_window, min_periods=1).mean()
ax2.plot(range(len(rolling)), rolling.values, linewidth=2,
color=color, label=player_name)
ax2.set_xlabel('Game Number')
ax2.set_ylabel('Points (Rolling Average)')
ax2.set_title(f'{rolling_window}-Game Rolling Average')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
return fig
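Both figures in this subsection can be generated directly from the players_data dictionary built earlier; the output file names are illustrative:

fig_dist = create_scoring_distribution_comparison(players_data)
fig_trend = create_game_by_game_comparison(players_data, rolling_window=10)
fig_dist.savefig('mvp_scoring_distributions.png', dpi=150, bbox_inches='tight')
fig_trend.savefig('mvp_rolling_averages.png', dpi=150, bbox_inches='tight')
plt.show()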
2.2 Consistency Rankings
def create_consistency_dashboard(comparison_df: pd.DataFrame,
figsize: Tuple = (16, 10)) -> plt.Figure:
"""
Create comprehensive consistency dashboard.
Args:
comparison_df: DataFrame from compare_consistency()
figsize: Figure dimensions
Returns:
Matplotlib Figure
"""
fig = plt.figure(figsize=figsize)
# Layout: 2x2 grid
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
players = comparison_df['Player'].tolist()
colors = plt.cm.Set2(np.linspace(0, 1, len(players)))
# 1. PPG vs CV scatter
ax1.scatter(comparison_df['PPG'], comparison_df['CV%'],
s=150, c=colors, edgecolors='black')
for i, player in enumerate(players):
ax1.annotate(player.split()[-1], # Last name
(comparison_df['PPG'].iloc[i], comparison_df['CV%'].iloc[i]),
xytext=(5, 5), textcoords='offset points', fontsize=9)
ax1.set_xlabel('Points Per Game')
ax1.set_ylabel('Coefficient of Variation (%)')
ax1.set_title('Volume vs Consistency')
ax1.axhline(35, color='green', linestyle='--', alpha=0.5, label='Consistent threshold')
ax1.legend()
# 2. CV comparison bar chart
sorted_df = comparison_df.sort_values('CV%')
colors_sorted = [colors[players.index(p)] for p in sorted_df['Player']]
ax2.barh(sorted_df['Player'], sorted_df['CV%'], color=colors_sorted)
ax2.set_xlabel('Coefficient of Variation (%)')
ax2.set_title('Consistency Ranking (Lower is Better)')
ax2.axvline(35, color='green', linestyle='--', alpha=0.5)
# 3. Games 20+ vs Games <10
width = 0.35
x = np.arange(len(players))
bars1 = ax3.bar(x - width/2, comparison_df['Games 20+'], width,
label='Games 20+', color='green', alpha=0.7)
bars2 = ax3.bar(x + width/2, comparison_df['Games <10'], width,
label='Games <10', color='red', alpha=0.7)
ax3.set_xticks(x)
ax3.set_xticklabels([p.split()[-1] for p in players], rotation=45, ha='right')
ax3.set_ylabel('Number of Games')
ax3.set_title('High vs Low Scoring Games')
ax3.legend()
# 4. Summary table
ax4.axis('off')
table_data = comparison_df[['Player', 'PPG', 'CV%', 'IQR', 'Games 20+']].round(1)
table = ax4.table(
cellText=table_data.values,
colLabels=table_data.columns,
loc='center',
cellLoc='center'
)
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.2, 1.5)
ax4.set_title('Summary Statistics', y=0.95)
plt.suptitle('MVP Candidate Scoring Consistency Analysis', fontsize=16)
plt.tight_layout()
return fig
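The dashboard takes the comparison table rather than the raw game logs, so it can be built from the output of compare_consistency(); the file name is again a placeholder:

dashboard_fig = create_consistency_dashboard(comparison_df)
dashboard_fig.savefig('mvp_consistency_dashboard.png', dpi=150, bbox_inches='tight')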
Part 3: Statistical Testing
3.1 Comparing Variability
def levene_test_scoring_variance(players_data: Dict[str, pd.DataFrame]) -> Dict:
"""
Test whether players have significantly different scoring variances.
Uses Levene's test which is robust to non-normal distributions.
Args:
players_data: Dictionary mapping player names to game logs
Returns:
Dictionary with test results
"""
scoring_arrays = [df['PTS'].dropna().values for df in players_data.values()]
statistic, p_value = stats.levene(*scoring_arrays)
return {
'test': "Levene's Test for Equality of Variances",
'statistic': statistic,
'p_value': p_value,
'interpretation': (
"Significant difference in variance" if p_value < 0.05
else "No significant difference in variance"
),
'variances': {
name: df['PTS'].var() for name, df in players_data.items()
}
}
def bootstrap_cv_confidence_interval(points: np.ndarray,
n_bootstrap: int = 1000,
confidence: float = 0.95) -> Tuple[float, float]:
"""
Calculate bootstrap confidence interval for coefficient of variation.
Args:
points: Array of scoring data
n_bootstrap: Number of bootstrap samples
confidence: Confidence level
Returns:
Tuple of (lower_bound, upper_bound)
"""
cvs = []
for _ in range(n_bootstrap):
sample = np.random.choice(points, size=len(points), replace=True)
cv = sample.std() / sample.mean() * 100
cvs.append(cv)
alpha = 1 - confidence
lower = np.percentile(cvs, alpha/2 * 100)
upper = np.percentile(cvs, (1 - alpha/2) * 100)
return lower, upper
def compare_consistency_significance(player1_data: pd.DataFrame,
player2_data: pd.DataFrame,
n_bootstrap: int = 1000) -> Dict:
"""
Test whether two players have significantly different consistency.
Uses bootstrap to compare coefficients of variation.
Args:
player1_data: Game log for player 1
player2_data: Game log for player 2
n_bootstrap: Number of bootstrap samples
Returns:
Dictionary with comparison results
"""
points1 = player1_data['PTS'].dropna().values
points2 = player2_data['PTS'].dropna().values
# Observed difference
cv1 = points1.std() / points1.mean() * 100
cv2 = points2.std() / points2.mean() * 100
observed_diff = cv1 - cv2
# Bootstrap for significance
cv_diffs = []
combined = np.concatenate([points1, points2])
n1 = len(points1)
for _ in range(n_bootstrap):
np.random.shuffle(combined)
sample1 = combined[:n1]
sample2 = combined[n1:]
boot_cv1 = sample1.std() / sample1.mean() * 100
boot_cv2 = sample2.std() / sample2.mean() * 100
cv_diffs.append(boot_cv1 - boot_cv2)
p_value = np.mean(np.abs(cv_diffs) >= np.abs(observed_diff))
return {
'player1_cv': cv1,
'player2_cv': cv2,
'observed_difference': observed_diff,
'p_value': p_value,
'significant': p_value < 0.05,
'interpretation': (
f"Player 1 is {'more' if cv1 > cv2 else 'less'} variable "
f"(p = {p_value:.3f}, {'significant' if p_value < 0.05 else 'not significant'})"
)
}
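A short sketch tying the three tests together on the synthetic players_data from Part 1 (the specific player keys are placeholders from that earlier example):

# Test for equal scoring variances across all candidates
levene_result = levene_test_scoring_variance(players_data)
print(f"{levene_result['interpretation']} (p = {levene_result['p_value']:.3f})")

# Bootstrap confidence interval for one player's CV
points_a = players_data['Player A']['PTS'].dropna().values
low, high = bootstrap_cv_confidence_interval(points_a)
print(f"Player A CV 95% CI: [{low:.1f}%, {high:.1f}%]")

# Permutation-style comparison of two players' CVs
pairwise = compare_consistency_significance(players_data['Player A'],
                                            players_data['Player B'])
print(pairwise['interpretation'])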
Part 4: Application to MVP Voting
4.1 Creating a Consistency-Weighted MVP Score
def calculate_consistency_adjusted_rating(metrics: ConsistencyMetrics,
raw_rating: float,
consistency_weight: float = 0.2) -> float:
"""
Adjust a player's MVP rating based on consistency.
Args:
metrics: ConsistencyMetrics for the player
raw_rating: Original MVP rating (e.g., from traditional stats)
consistency_weight: How much to weight consistency (0-1)
Returns:
Adjusted rating
"""
reliability = calculate_reliability_score(metrics)
# Scale reliability to adjustment factor (0.9 to 1.1)
# 50 reliability = no adjustment
# 100 reliability = +10%
# 0 reliability = -10%
adjustment = 1 + (reliability - 50) / 500
# Blend original rating with consistency adjustment
adjusted = (
raw_rating * (1 - consistency_weight) +
raw_rating * adjustment * consistency_weight
)
return adjusted
def generate_mvp_consistency_report(players_data: Dict[str, pd.DataFrame],
raw_ratings: Dict[str, float]) -> str:
"""
Generate comprehensive MVP consistency analysis report.
Args:
players_data: Dictionary mapping player names to game logs
raw_ratings: Dictionary mapping player names to base MVP ratings
Returns:
Formatted report string
"""
report = []
report.append("=" * 70)
report.append("MVP SCORING CONSISTENCY ANALYSIS REPORT")
report.append("=" * 70)
comparison_df = compare_consistency(players_data)
report.append("\n## CONSISTENCY RANKINGS (by Coefficient of Variation)")
report.append("-" * 50)
sorted_df = comparison_df.sort_values('CV%')
for rank, (_, row) in enumerate(sorted_df.iterrows(), 1):
cv_interp = interpret_cv(row['CV%'])
report.append(
f"{rank}. {row['Player']}: CV = {row['CV%']:.1f}% ({cv_interp})"
)
report.append("\n## RELIABILITY SCORES")
report.append("-" * 50)
for player_name, game_log in players_data.items():
metrics = calculate_consistency_metrics(game_log)
reliability = calculate_reliability_score(metrics)
report.append(f"{player_name}: {reliability:.1f}/100")
report.append("\n## CONSISTENCY-ADJUSTED MVP RATINGS")
report.append("-" * 50)
adjusted_ratings = []
for player_name, game_log in players_data.items():
metrics = calculate_consistency_metrics(game_log)
raw = raw_ratings.get(player_name, 50)
adjusted = calculate_consistency_adjusted_rating(metrics, raw)
adjusted_ratings.append((player_name, raw, adjusted))
adjusted_ratings.sort(key=lambda x: x[2], reverse=True)
for player, raw, adjusted in adjusted_ratings:
change = ((adjusted / raw) - 1) * 100
direction = "+" if change > 0 else ""
report.append(
f"{player}: {raw:.1f} -> {adjusted:.1f} ({direction}{change:.1f}%)"
)
report.append("\n## KEY INSIGHTS")
report.append("-" * 50)
# Most consistent
most_consistent = sorted_df.iloc[0]
report.append(f"- Most consistent: {most_consistent['Player']} (CV = {most_consistent['CV%']:.1f}%)")
# Most variable
most_variable = sorted_df.iloc[-1]
report.append(f"- Most variable: {most_variable['Player']} (CV = {most_variable['CV%']:.1f}%)")
report.append("\n" + "=" * 70)
return "\n".join(report)
Discussion Questions
Question 1: MVP Criteria
Should consistency be weighted in MVP voting? If so, how heavily? Make one argument for and one against.
Question 2: Context
A player on a bad team might have more variable scoring because their team's game plan is inconsistent. How would you account for team context in a consistency analysis?
Question 3: Sample Size
How many games are needed for a reliable consistency measure? What happens with injury-shortened seasons?
Question 4: Trade-offs
Is it possible that some variance is good (e.g., "rising to the occasion" in big games)? How would you distinguish good variance from bad?
Deliverables
- Analysis Code: Complete Python implementation of consistency metrics
- Visualization Suite: Distribution comparisons, trend analysis, dashboards
- Statistical Report: Written analysis with significance testing
- Broadcast Summary: Executive summary suitable for TV presentation
- Data Appendix: Full statistical tables for all candidates
Key Takeaways
- PPG alone is insufficient - consistency measures add important context
- Multiple metrics capture different aspects of consistency (CV, IQR, game counts)
- Statistical significance helps distinguish real differences from noise
- Visualization communicates consistency effectively to general audiences
- Context matters - consistency should be one factor among many in evaluation