Chapter 14: Key Takeaways - Player and Team Comparison Charts

Quick Reference Card

Comparison Principles

| Principle | Description | Example |
|---|---|---|
| Common Baseline | All entities measured against same standard | Compare QBs on completion %, not QB to RB |
| Consistent Scale | Same visual encoding for all compared entities | 1 inch = 10 yards for all teams |
| Relevant Context | Raw values with percentiles, averages, or benchmarks | "72% completion (85th percentile)" |

Chart Type Selection Guide

When to Use Each Chart

| Chart Type | Best For | Avoid When |
|---|---|---|
| Radar Chart | Multi-dimensional profiles (5-8 metrics) | Precise quantitative comparison needed |
| Horizontal Bar | Ranking many entities on one metric | Fewer than 5 entities |
| Grouped Bar | Comparing 3-5 entities across 3-4 metrics | Too many entities or metrics |
| Stacked Bar | Showing composition of totals | Parts don't sum to meaningful whole |
| Dumbbell Chart | Before/after or two-point comparisons | More than 2 time points |
| Slope Chart | Change between two periods with many entities | More than 15 entities |
| Bump Chart | Ranking changes over time | Absolute values matter more than rank |
| Small Multiples | Same analysis across many groups | Groups aren't comparable |
| Similarity Heatmap | Finding clusters and outliers | Fewer than 6 entities |
| Percentile Chart | Contextualizing individual values | Population data unavailable |

Code Patterns

Radar Chart Template

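import numpy as np
import matplotlib.pyplot as plt

def create_radar(values, metrics, title):
    # One spoke per metric, evenly spaced around the circle
    angles = np.linspace(0, 2*np.pi, len(metrics), endpoint=False).tolist()
    # Repeat the first point to close the polygon; build a new list so the
    # caller's values are not modified
    angles += angles[:1]
    values = list(values) + list(values[:1])

    fig, ax = plt.subplots(subplot_kw=dict(polar=True))
    ax.plot(angles, values, 'o-', linewidth=2)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metrics)
    ax.set_ylim(0, 1)  # Expects values already normalized to 0-1
    ax.set_title(title)
    return fig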

Dumbbell Chart Template

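import matplotlib.pyplot as plt

def create_dumbbell(entities, val1, val2, labels):
    # labels is assumed to be a pair of legend names for val1 and val2
    fig, ax = plt.subplots()
    for y, (v1, v2) in enumerate(zip(val1, val2)):
        # Green connector for an increase, red for a decrease or no change
        color = '#2a9d8f' if v2 > v1 else '#e76f51'
        ax.plot([v1, v2], [y, y], color=color, linewidth=2, zorder=1)
    # Draw each series once so the legend gets one entry per series
    ys = list(range(len(entities)))
    ax.scatter(val1, ys, s=80, color='#264653', label=labels[0], zorder=2)
    ax.scatter(val2, ys, s=80, color='#f4a261', label=labels[1], zorder=2)
    ax.set_yticks(ys)
    ax.set_yticklabels(entities)
    ax.invert_yaxis()  # First entity at top
    ax.legend()
    return fig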
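
Dumbbell charts are usually easier to scan when rows are ordered by the size of the change. The helper below is a minimal sketch of that ordering step; `order_by_change` and the team values are hypothetical and not part of the template above.

```python
def order_by_change(entities, val1, val2):
    """Sort three parallel, non-empty lists so the largest increase comes first."""
    rows = sorted(zip(entities, val1, val2),
                  key=lambda row: row[2] - row[1],
                  reverse=True)
    names, first, second = (list(column) for column in zip(*rows))
    return names, first, second

# Made-up yards-per-play values for two seasons
teams = ["Team A", "Team B", "Team C"]
before = [5.1, 6.0, 4.8]
after = [5.4, 5.6, 5.9]

names, first, second = order_by_change(teams, before, after)
print(names)
```

The three returned lists can be passed to a dumbbell function in place of the unsorted inputs.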

Bump Chart Template

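import matplotlib.pyplot as plt

def create_bump(rankings, periods, highlight=None):
    # rankings maps each entity to a list of ranks, one per period
    fig, ax = plt.subplots()
    for entity, ranks in rankings.items():
        # Fade entities outside the highlight set; with no highlight,
        # draw every line at full opacity
        alpha = 1.0 if not highlight or entity in highlight else 0.3
        ax.plot(range(len(periods)), ranks, 'o-', alpha=alpha)
    ax.set_xticks(range(len(periods)))
    ax.set_xticklabels(periods)
    ax.invert_yaxis()  # Rank 1 at top
    return fig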
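
A bump chart needs ranks, not raw values. The sketch below shows one way to derive per-period ranks from raw scores, assuming higher values are better and that tied entities share the better rank. `ranks_by_period` and the point totals are hypothetical.

```python
def ranks_by_period(values):
    """Convert {entity: [value per period]} into {entity: [rank per period]}.

    Rank 1 goes to the highest value in each period; ties share a rank.
    """
    entities = list(values)
    n_periods = len(next(iter(values.values())))
    rankings = {entity: [] for entity in entities}
    for p in range(n_periods):
        for entity in entities:
            # Rank is one more than the number of entities that did better
            better = sum(1 for other in entities
                         if values[other][p] > values[entity][p])
            rankings[entity].append(better + 1)
    return rankings

# Made-up points scored over three weeks
points = {
    "Team A": [21, 30, 17],
    "Team B": [24, 27, 35],
    "Team C": [10, 31, 20],
}
rankings = ranks_by_period(points)
print(rankings)
```

The resulting dictionary has the shape a bump chart function expects: one list of ranks per entity, aligned with the list of periods.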

Percentile Color Scale

def get_percentile_color(pct):
    """Map a percentile (0-100) to its zone color."""
    if pct >= 75:
        return '#1a9641'  # Elite (green)
    if pct >= 50:
        return '#a6d96a'  # Above avg (light green)
    if pct >= 25:
        return '#ffffbf'  # Average (yellow)
    if pct >= 10:
        return '#fdae61'  # Below avg (orange)
    return '#d7191c'      # Poor (red)

Fairness Checklist

Before publishing any comparison:

  • [ ] Position equivalence: Are compared players at same position/role?
  • [ ] Opportunity adjustment: Normalized for games played, attempts, snaps?
  • [ ] Era adjustment: Accounted for rule/strategy changes if comparing across years?
  • [ ] Metric balance: Selected metrics fairly represent all compared entities?
  • [ ] Context provided: Includes percentiles, averages, or benchmarks?
  • [ ] Scale appropriate: Axes start at sensible values, not truncated to exaggerate?
  • [ ] Methodology disclosed: How were similarities/rankings calculated?
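
The opportunity adjustment item can be illustrated with a small rate calculation. This is a sketch under assumed inputs: `per_opportunity` and the yardage and attempt totals are hypothetical.

```python
def per_opportunity(total, opportunities, per=1):
    """Return a rate such as yards per attempt, or None with no opportunities."""
    if opportunities <= 0:
        return None
    return per * total / opportunities

# Made-up season totals
starter = {"yards": 3600, "attempts": 500}
backup = {"yards": 800, "attempts": 100}

starter_ypa = per_opportunity(starter["yards"], starter["attempts"])
backup_ypa = per_opportunity(backup["yards"], backup["attempts"])
print(starter_ypa, backup_ypa)
```

Raw totals favor the starter, while the per-attempt rate favors the backup on a much smaller sample, which is why the checklist pairs opportunity adjustment with context.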

Normalization Formulas

Min-Max Normalization (0-1 Scale)

normalized = (value - min) / (max - min)
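
The formula divides by zero when every value is the same, and values outside the reference range fall outside 0-1. A minimal sketch that handles both cases follows; returning 0.5 for a zero-width range and clamping to [0, 1] are assumptions of this example, not rules from the chapter.

```python
def min_max(value, lo, hi):
    """Scale value to 0-1 given the reference minimum and maximum."""
    if hi == lo:
        # Assumption: with no spread, place the value mid-scale
        return 0.5
    scaled = (value - lo) / (hi - lo)
    # Assumption: clamp values that fall outside the reference range
    return min(1.0, max(0.0, scaled))

print(min_max(65, 55, 75))
print(min_max(80, 55, 75))
```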

Z-Score Standardization

z = (value - mean) / std_dev
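
The formula does not say whether `std_dev` is the population or the sample standard deviation. The sketch below assumes the population form, which is also what scikit-learn's `StandardScaler` uses; `z_scores` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing deviates from the mean
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

completion_pcts = [60, 62, 64, 66, 68]  # made-up league sample
z = z_scores(completion_pcts)
print(z)
```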

Percentile Rank

from scipy.stats import percentileofscore
pct = percentileofscore(population, value)
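
Percentile rank has several conventions for ties, which `percentileofscore` exposes through its `kind` argument. As an illustration of what is being computed, here is a dependency-free sketch that counts ties as half; to my understanding this corresponds to the `'mean'` convention, and `percentile_rank` is a hypothetical name.

```python
def percentile_rank(population, value):
    """Percent of the population below value, counting ties as half."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for x in population if x < value)
    equal = sum(1 for x in population if x == value)
    return 100.0 * (below + 0.5 * equal) / len(population)

league = [58, 61, 63, 65, 66, 68, 70, 72]  # made-up completion percentages
pct = percentile_rank(league, 68)
print(pct)
```

Whichever convention is used, it should be stated alongside the chart so readers can reproduce the percentiles.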

Inverted Metrics (lower is better)

inverted_normalized = 1 - (value - min) / (max - min)
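
Mixed profiles need the inversion applied per metric so that a larger normalized value always means better. The sketch below shows one way to do that with a direction flag; the metric names and league ranges in `RANGES` are made up for illustration.

```python
# Hypothetical league ranges: (min, max, higher_is_better)
RANGES = {
    "completion_pct": (55.0, 75.0, True),
    "yards_per_attempt": (5.5, 9.5, True),
    "interception_pct": (0.5, 4.5, False),  # lower is better
}

def normalize_profile(stats, ranges=RANGES):
    """Return 0-1 scores in the order of `ranges`, where 1 is always best."""
    scores = []
    for metric, (lo, hi, higher_is_better) in ranges.items():
        scaled = (stats[metric] - lo) / (hi - lo)
        scores.append(scaled if higher_is_better else 1 - scaled)
    return scores

qb = {"completion_pct": 70.0, "yards_per_attempt": 7.5, "interception_pct": 1.5}
profile = normalize_profile(qb)
print(profile)
```

A list like `profile`, together with `list(RANGES)` as the metric labels, is the kind of 0-1 input a radar chart expects.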

Color Conventions

Performance Encoding

| Meaning | Color | Hex |
|---|---|---|
| Elite/Positive | Green | #2a9d8f |
| Below/Negative | Red | #e76f51 |
| Neutral/Average | Yellow | #e9c46a |
| Primary data | Dark blue | #264653 |
| Secondary data | Orange | #f4a261 |

Percentile Zones

| Zone | Percentile | Color |
|---|---|---|
| Elite | 75-100 | #1a9641 |
| Above Average | 50-75 | #a6d96a |
| Average | 25-50 | #ffffbf |
| Below Average | 10-25 | #fdae61 |
| Poor | 0-10 | #d7191c |
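
The zones can also be stored as data and looked up, which keeps names and colors in one place. This is a sketch, assuming a value on a shared boundary (for example exactly 75) belongs to the higher zone; `percentile_zone` is a hypothetical helper.

```python
from bisect import bisect_right

# Lower bound of each zone, in ascending order
LOWER_BOUNDS = [0, 10, 25, 50, 75]
ZONES = [
    ("Poor", "#d7191c"),
    ("Below Average", "#fdae61"),
    ("Average", "#ffffbf"),
    ("Above Average", "#a6d96a"),
    ("Elite", "#1a9641"),
]

def percentile_zone(pct):
    """Return (zone name, hex color) for a percentile between 0 and 100."""
    if not 0 <= pct <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # Index of the last lower bound that is <= pct
    return ZONES[bisect_right(LOWER_BOUNDS, pct) - 1]

print(percentile_zone(42))
print(percentile_zone(75))
```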

Common Mistakes to Avoid

1. Cherry-Picking Metrics

Wrong: Include only passing metrics when comparing a better passer to a better runner
Right: Include a balanced set of metrics relevant to the position

2. Inconsistent Baselines

Wrong: Start one bar at 0, another at 50
Right: All bars start at same baseline

3. Missing Context

Wrong: "Player has 68% completion rate"
Right: "Player has 68% completion rate (42nd percentile)"
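
Labels with context can be generated instead of typed by hand. A minimal sketch follows, with hypothetical helper names and an assumed label layout; only the ordinal suffix logic is standard English usage.

```python
def ordinal(n):
    """Return an integer with its English ordinal suffix, e.g. 42 -> '42nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def with_context(label, value, percentile):
    """Format a percentage value together with its rounded percentile rank."""
    return f"{label}: {value:.0f}% ({ordinal(round(percentile))} percentile)"

print(with_context("Completion rate", 68, 42.3))
```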

4. Radar Chart Overload

Wrong: 15 metrics crammed into radar chart
Right: 5-8 carefully selected metrics

5. Ignoring Opportunity

Wrong: Compare a backup's per-play stats to a starter's without qualification
Right: Report sample size and game situations alongside the per-play stats


Similarity Calculation

Euclidean Distance

from scipy.spatial.distance import euclidean
dist = euclidean(player_a_features, player_b_features)
similarity = 1 / (1 + dist)
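
Euclidean distance is sensitive to units: a feature measured in thousands will dominate one measured in single digits. The sketch below uses made-up player features to show the nearest neighbor changing once each column is standardized. It uses `np.linalg.norm` for the distance, which is equivalent for this purpose.

```python
import numpy as np

# Made-up features: [passing yards, yards per attempt]
features = np.array([
    [4200.0, 7.8],  # Player A
    [4150.0, 6.0],  # Player B
    [3900.0, 7.7],  # Player C
    [2500.0, 7.0],  # Player D
    [5000.0, 7.2],  # Player E
])

# Standardize each column to mean 0 and standard deviation 1
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

def similarity(u, v):
    """Map Euclidean distance to a 0-1 similarity, as in 1 / (1 + dist)."""
    return 1 / (1 + np.linalg.norm(u - v))

raw_ab = similarity(features[0], features[1])
raw_ac = similarity(features[0], features[2])
scaled_ab = similarity(scaled[0], scaled[1])
scaled_ac = similarity(scaled[0], scaled[2])
print(raw_ab > raw_ac, scaled_ac > scaled_ab)
```

On raw values Player B looks closest to Player A because their yardage totals are near each other. After scaling, Player C is closer, since the gap in yards per attempt between A and B is large relative to the spread of that metric.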

Cosine Similarity

from numpy import dot
from numpy.linalg import norm
similarity = dot(a, b) / (norm(a) * norm(b))
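
A similarity heatmap needs the full pairwise matrix, not a single pair. The following is a sketch of computing it in one step with NumPy; `cosine_similarity_matrix` and the profiles are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Return the n x n matrix of cosine similarities between rows."""
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine similarity is undefined for all-zero rows")
    unit = X / norms  # Scale each row to length 1
    return unit @ unit.T

# Made-up normalized profiles; the second row is half of the first
profiles = [
    [0.90, 0.20, 0.40],
    [0.45, 0.10, 0.20],
    [0.10, 0.80, 0.30],
]
sim = cosine_similarity_matrix(profiles)
print(np.round(sim, 3))
```

The first two profiles score 1.0 against each other even though one is half the size of the other. Cosine similarity compares the shape of a profile and ignores its magnitude, so it suits questions about playing style more than questions about production level.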

Hierarchical Clustering

from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.preprocessing import StandardScaler

scaled = StandardScaler().fit_transform(features)
linkage_matrix = linkage(scaled, method='ward')
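
The linkage matrix describes the full merge tree. To get flat group labels from it, SciPy provides `fcluster`. The sketch below uses made-up features and asks for two clusters; it standardizes with NumPy instead of `StandardScaler` to stay self-contained, and the cluster count is an assumption of the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up normalized features for four players
features = np.array([
    [0.90, 0.80],
    [0.85, 0.75],
    [0.20, 0.30],
    [0.25, 0.20],
])

# Column-wise standardization (population standard deviation)
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
linkage_matrix = linkage(scaled, method="ward")

# Cut the tree so that at most two clusters remain
labels = fcluster(linkage_matrix, t=2, criterion="maxclust")
print(labels)
```

The first two players receive one label and the last two another. The same `linkage_matrix` can be passed to `dendrogram` to draw the tree.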

Quick Decision Tree

What type of comparison?
│
├── Single metric, many entities → Horizontal Bar Chart
│
├── Multiple metrics, few entities (2-4) → Grouped Bar or Radar
│
├── Two time points → Dumbbell or Slope Chart
│
├── Rankings over time → Bump Chart
│
├── Same analysis, many groups → Small Multiples
│
├── Finding similar entities → Heatmap + Dendrogram
│
└── Value in context → Percentile Chart

Key Terminology

| Term | Definition |
|---|---|
| Bump Chart | Visualization tracking ranking positions over time |
| Dendrogram | Tree diagram showing hierarchical clustering |
| Diverging Bar | Bar chart centered on baseline showing +/- deviation |
| Dumbbell Chart | Paired dots connected by lines for two-value comparison |
| Percentile Rank | Position within distribution (0-100 scale) |
| Radar Chart | Multi-axis chart with values radiating from center |
| Similarity Matrix | Grid of pairwise similarity scores |
| Slope Chart | Two-column connected dot chart showing change |
| Small Multiples | Repeated chart structure for pattern recognition |