Chapter 14: Key Takeaways - Player and Team Comparison Charts

Quick Reference Card

Comparison Principles

Principle	Description	Example
Common Baseline	All entities measured against same standard	Compare QBs on completion %, not QB to RB
Consistent Scale	Same visual encoding for all compared entities	1 inch = 10 yards for all teams
Relevant Context	Raw values with percentiles, averages, or benchmarks	"72% completion (85th percentile)"

Chart Type Selection Guide

When to Use Each Chart

Chart Type	Best For	Avoid When
Radar Chart	Multi-dimensional profiles (5-8 metrics)	Precise quantitative comparison needed
Horizontal Bar	Ranking many entities on one metric	Fewer than 5 entities
Grouped Bar	Comparing 3-5 entities across 3-4 metrics	Too many entities or metrics
Stacked Bar	Showing composition of totals	Parts don't sum to meaningful whole
Dumbbell Chart	Before/after or two-point comparisons	More than 2 time points
Slope Chart	Change between two periods with many entities	More than 15 entities
Bump Chart	Ranking changes over time	Absolute values matter more than rank
Small Multiples	Same analysis across many groups	Groups aren't comparable
Similarity Heatmap	Finding clusters and outliers	Fewer than 6 entities
Percentile Chart	Contextualizing individual values	Population data unavailable

Code Patterns

Radar Chart Template

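import numpy as np
import matplotlib.pyplot as plt

def create_radar(values, metrics, title):
    # One spoke per metric, evenly spaced around the circle
    angles = np.linspace(0, 2*np.pi, len(metrics), endpoint=False).tolist()
    # Repeat the first point to close the polygon; copy values so the
    # caller's list is not mutated
    angles += angles[:1]
    values = list(values) + list(values[:1])

    fig, ax = plt.subplots(subplot_kw=dict(polar=True))
    ax.plot(angles, values, 'o-', linewidth=2)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metrics)
    ax.set_ylim(0, 1)  # Expects values already normalized to 0-1
    ax.set_title(title)
    return fig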

Dumbbell Chart Template

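import matplotlib.pyplot as plt

def create_dumbbell(entities, val1, val2, labels):
    # labels is assumed to be (name of first value, name of second value),
    # used here for the legend
    fig, ax = plt.subplots()
    for y, (v1, v2) in enumerate(zip(val1, val2)):
        # Green connector for an increase, red otherwise
        color = '#2a9d8f' if v2 > v1 else '#e76f51'
        ax.plot([v1, v2], [y, y], color=color, linewidth=2, zorder=1)
        # Label only the first pair of dots so the legend has one entry each
        ax.scatter([v1], [y], s=80, color='#264653', zorder=2,
                   label=labels[0] if y == 0 else None)
        ax.scatter([v2], [y], s=80, color='#f4a261', zorder=2,
                   label=labels[1] if y == 0 else None)
    ax.set_yticks(range(len(entities)))
    ax.set_yticklabels(entities)
    ax.invert_yaxis()  # First entity at top
    ax.legend()
    return fig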

Bump Chart Template

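import matplotlib.pyplot as plt

def create_bump(rankings, periods, highlight=None):
    # rankings: dict mapping entity -> list of ranks, one per period
    fig, ax = plt.subplots()
    for entity, ranks in rankings.items():
        # Fade entities outside the highlight list; with no highlight list,
        # draw every line at full opacity
        alpha = 1.0 if not highlight or entity in highlight else 0.3
        ax.plot(range(len(periods)), ranks, 'o-', alpha=alpha)
    ax.set_xticks(range(len(periods)))
    ax.set_xticklabels(periods)
    ax.invert_yaxis()  # Rank 1 at top
    return fig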
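
The template takes `rankings` as a mapping from each entity to a list of ranks, one per period. When only raw values are available, the ranks could be derived along the following lines. The function name `rankings_from_values`, the `higher_is_better` flag, and the sample point totals are illustrative assumptions, not part of the chapter. Ties are broken by input order here, which may not suit every analysis.

```python
def rankings_from_values(values, higher_is_better=True):
    """Convert per-period values into per-period ranks (1 = best).

    values: dict mapping entity -> list of values, one per period
            (all lists are assumed to have the same length).
    """
    entities = list(values)
    n_periods = len(next(iter(values.values())))
    rankings = {entity: [] for entity in entities}
    for period in range(n_periods):
        # Stable sort: entities with equal values keep their input order
        ordered = sorted(entities,
                         key=lambda e: values[e][period],
                         reverse=higher_is_better)
        for rank, entity in enumerate(ordered, start=1):
            rankings[entity].append(rank)
    return rankings


# Made-up weekly point totals for three teams
points = {"Team A": [21, 35, 10],
          "Team B": [28, 14, 24],
          "Team C": [17, 20, 31]}
rankings = rankings_from_values(points)
# rankings["Team A"] is [2, 1, 3]
```

The resulting dictionary has the shape the template expects, so it can be passed as `rankings` together with a matching list of period labels.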

Percentile Color Scale

def get_percentile_color(pct):
    if pct >= 75: return '#1a9641'    # Elite (green)
    elif pct >= 50: return '#a6d96a'  # Above avg (light green)
    elif pct >= 25: return '#ffffbf'  # Average (yellow)
    elif pct >= 10: return '#fdae61'  # Below avg (orange)
    else: return '#d7191c'            # Poor (red)

Fairness Checklist

Before publishing any comparison:

[ ] Position equivalence: Are compared players at same position/role?
[ ] Opportunity adjustment: Normalized for games played, attempts, snaps?
[ ] Era adjustment: Accounted for rule/strategy changes if comparing across years?
[ ] Metric balance: Selected metrics fairly represent all compared entities?
[ ] Context provided: Includes percentiles, averages, or benchmarks?
[ ] Scale appropriate: Axes start at sensible values, not truncated to exaggerate?
[ ] Methodology disclosed: How were similarities/rankings calculated?
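
The opportunity-adjustment item can be made concrete with a small rate helper. This is a minimal sketch: the name `per_opportunity`, the default threshold of 100 opportunities, and the sample stat lines are assumptions chosen for illustration, and a sensible threshold depends on the metric.

```python
def per_opportunity(total, opportunities, min_opportunities=100):
    """Return total per opportunity, or None when the sample is too small."""
    if opportunities < min_opportunities:
        # Flag a thin sample instead of reporting a noisy rate
        return None
    return total / opportunities


# Hypothetical rushing lines: (yards, carries)
starter_ypc = per_opportunity(1200, 250)  # 4.8 yards per carry
backup_ypc = per_opportunity(90, 12)      # None: only 12 carries
```

Returning `None` forces the chart code to decide explicitly how to show small samples (omit them, grey them out, or annotate them) rather than plotting them next to full-season rates.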

Normalization Formulas

Min-Max Normalization (0-1 Scale)

normalized = (value - min) / (max - min)

Z-Score Standardization

z = (value - mean) / std_dev
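
A worked instance of the formula, using only the standard library. The sketch assumes the population standard deviation (`pstdev`), and the completion percentages are made up for illustration.

```python
from statistics import mean, pstdev


def z_score(value, population):
    """Standardize value against a reference population."""
    mu = mean(population)
    sigma = pstdev(population)  # population std; stdev() gives the sample version
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return (value - mu) / sigma


completion_pct = [60, 65, 70, 75, 80]  # made-up values
print(round(z_score(80, completion_pct), 2))  # 1.41
```

Here the mean is 70 and the population standard deviation is about 7.07, so a value of 80 sits roughly 1.41 standard deviations above average.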

Percentile Rank

from scipy.stats import percentileofscore
pct = percentileofscore(population, value)

Inverted Metrics (lower is better)

inverted_normalized = 1 - (value - min) / (max - min)
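
The min-max and inverted formulas can share one helper. The sketch below is one possible arrangement: the name `min_max`, the `lower_is_better` flag, the clamping to the 0-1 range, and the 0.5 fallback for a zero-width range are all assumptions added here, not part of the formulas above.

```python
def min_max(value, lo, hi, lower_is_better=False):
    """Scale value to 0-1 against a reference range [lo, hi]."""
    if hi == lo:
        return 0.5  # assumption: no spread, so place the value mid-scale
    scaled = (value - lo) / (hi - lo)
    # Clamp values that fall outside the reference range
    scaled = min(1.0, max(0.0, scaled))
    return 1 - scaled if lower_is_better else scaled


# Made-up reference ranges
completion = min_max(68, lo=55, hi=75)                           # about 0.65
interceptions = min_max(8, lo=4, hi=20, lower_is_better=True)    # 0.75
```

After this step, a higher score means better on every axis, which keeps a radar or grouped bar chart readable in one direction.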

Color Conventions

Performance Encoding

Meaning	Color	Hex
Elite/Positive	Green	#2a9d8f
Below/Negative	Red	#e76f51
Neutral/Average	Yellow	#e9c46a
Primary data	Dark blue	#264653
Secondary data	Orange	#f4a261

Percentile Zones

Zone	Percentile	Color
Elite	75-100	#1a9641
Above Average	50-75	#a6d96a
Average	25-50	#ffffbf
Below Average	10-25	#fdae61
Poor	0-10	#d7191c

Common Mistakes to Avoid

1. Cherry-Picking Metrics

Wrong: Include only passing metrics when comparing a better passer to a better runner
Right: Include a balanced set of relevant position metrics

2. Inconsistent Baselines

Wrong: Start one bar at 0, another at 50
Right: All bars start at same baseline

3. Missing Context

Wrong: "Player has 68% completion rate"
Right: "Player has 68% completion rate (42nd percentile)"

4. Radar Chart Overload

Wrong: 15 metrics crammed into radar chart
Right: 5-8 carefully selected metrics

5. Ignoring Opportunity

Wrong: Compare backup's per-play stats to starter's
Right: Note sample size and game situations

Similarity Calculation

Euclidean Distance

from scipy.spatial.distance import euclidean
dist = euclidean(player_a_features, player_b_features)
similarity = 1 / (1 + dist)
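
Applying the same conversion to every pair of players yields the similarity matrix behind a heatmap. The sketch below uses the standard library's `math.dist` (Python 3.8+) in place of SciPy. The function name `similarity_matrix` and the feature vectors are illustrative assumptions, and the features are assumed to be scaled already, since Euclidean distance is dominated by whichever feature has the largest range.

```python
from math import dist  # Euclidean distance, Python 3.8+


def similarity_matrix(features):
    """Pairwise 1 / (1 + distance) scores.

    features: dict mapping name -> list of already-scaled feature values.
    Returns (names, matrix) where matrix[i][j] compares names[i] and names[j].
    """
    names = list(features)
    matrix = [[1 / (1 + dist(features[a], features[b])) for b in names]
              for a in names]
    return names, matrix


# Made-up, already-scaled feature vectors
players = {"QB A": [0.0, 0.0], "QB B": [3.0, 4.0], "QB C": [0.0, 1.0]}
names, matrix = similarity_matrix(players)
# matrix[0][2] is 0.5 (distance 1); every diagonal entry is 1.0 (distance 0)
```

The matrix is symmetric, and scores fall in the interval (0, 1], with 1 meaning identical feature vectors.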

Cosine Similarity

from numpy import dot
from numpy.linalg import norm
similarity = dot(a, b) / (norm(a) * norm(b))
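
Cosine similarity compares the direction of two feature vectors and ignores their magnitude, so it answers a different question than Euclidean distance. The pure-Python sketch below mirrors the NumPy expression above. The function name `cosine_similarity`, the zero-vector check, and the sample vectors are assumptions added for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same profile shape at twice the volume: direction is identical
print(round(cosine_similarity([1, 2, 3], [2, 4, 6]), 6))  # 1.0
# No overlap between the two profiles
print(cosine_similarity([1, 0], [0, 1]))                  # 0.0
```

In the first pair a Euclidean-based score would report the players as different because of the volume gap, while cosine similarity treats them as the same style. Which behavior is appropriate depends on whether volume should count toward similarity.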

Hierarchical Clustering

from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.preprocessing import StandardScaler

scaled = StandardScaler().fit_transform(features)
linkage_matrix = linkage(scaled, method='ward')
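
Once the linkage matrix exists, SciPy's `fcluster` can cut the tree into a fixed number of groups. The sketch below is an illustration under stated assumptions: the feature values are made up, the choice of two clusters is arbitrary, and the standardization is done with NumPy rather than `StandardScaler` (both subtract the column mean and divide by the population standard deviation).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up features: rows are players, columns are two metrics
features = np.array([
    [4.5, 300.0],
    [4.7, 310.0],
    [8.9, 120.0],
    [9.1, 115.0],
])

# Standardize each column to mean 0 and standard deviation 1
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

linkage_matrix = linkage(scaled, method='ward')

# Cut the tree so that at most two clusters remain
labels = fcluster(linkage_matrix, t=2, criterion='maxclust')
print(labels)  # one integer cluster label per player
```

With this data the first two players should share one label and the last two the other. The label numbers themselves carry no meaning beyond group membership.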

Quick Decision Tree

What type of comparison?
│
├── Single metric, many entities → Horizontal Bar Chart
│
├── Multiple metrics, few entities (2-4) → Grouped Bar or Radar
│
├── Two time points → Dumbbell or Slope Chart
│
├── Rankings over time → Bump Chart
│
├── Same analysis, many groups → Small Multiples
│
├── Finding similar entities → Heatmap + Dendrogram
│
└── Value in context → Percentile Chart
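
For scripts that generate charts in bulk, the tree above could be encoded as a simple lookup. This is only a sketch: the function name `suggest_chart` and the key strings are hypothetical labels invented here, while the chart names come from the tree.

```python
# Hypothetical keys naming each branch of the decision tree
CHART_BY_COMPARISON = {
    "single_metric_many_entities": "Horizontal Bar Chart",
    "multiple_metrics_few_entities": "Grouped Bar or Radar",
    "two_time_points": "Dumbbell or Slope Chart",
    "rankings_over_time": "Bump Chart",
    "same_analysis_many_groups": "Small Multiples",
    "finding_similar_entities": "Heatmap + Dendrogram",
    "value_in_context": "Percentile Chart",
}


def suggest_chart(comparison):
    """Return the suggested chart type for a comparison category."""
    try:
        return CHART_BY_COMPARISON[comparison]
    except KeyError:
        options = ", ".join(sorted(CHART_BY_COMPARISON))
        raise ValueError(f"unknown comparison {comparison!r}; expected one of: {options}")


print(suggest_chart("rankings_over_time"))  # Bump Chart
```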

Key Terminology

Term	Definition
Bump Chart	Visualization tracking ranking positions over time
Dendrogram	Tree diagram showing hierarchical clustering
Diverging Bar	Bar chart centered on baseline showing +/- deviation
Dumbbell Chart	Paired dots connected by lines for two-value comparison
Percentile Rank	Position within distribution (0-100 scale)
Radar Chart	Multi-axis chart with values radiating from center
Similarity Matrix	Grid of pairwise similarity scores
Slope Chart	Two-column connected dot chart showing change
Small Multiples	Repeated chart structure for pattern recognition