Chapter 26: Case Study 1 - Discovering Player Archetypes Through Clustering

Introduction

Traditional basketball positions (PG, SG, SF, PF, C) were established decades ago when the game looked fundamentally different. Modern basketball features positionless play, stretch bigs, and point forwards that these labels fail to capture. This case study applies unsupervised learning to discover data-driven player archetypes that better represent how the game is actually played.

Part 1: Problem Definition

The Limitation of Traditional Positions

Consider these modern players:

- Nikola Jokic: Listed as a center, but ranks among the league leaders in assists
- Giannis Antetokounmpo: Listed as a forward, but often initiates the offense like a point guard
- Draymond Green: Listed as a forward, rarely scores, but is an elite defender and playmaker

Traditional positions fail to capture what these players actually do on the court.

Objective

Use clustering to discover natural groupings of players based on their statistical profiles, creating archetypes that reflect modern playing styles.

Part 2: Data Preparation

Feature Selection

We selected 20 features capturing different aspects of play:

Scoring:
- Points per 75 possessions
- True Shooting %
- Usage Rate
- % of points from 3PT, 2PT, FT

Playmaking:
- Assists per 75 possessions
- Assist %
- Turnover Rate

Rebounding:
- Rebounds per 75 possessions
- Offensive Rebound %
- Defensive Rebound %

Defense:
- Steals per 75 possessions
- Blocks per 75 possessions
- Defensive Box Plus/Minus

Physical Profile:
- Height
- Playing time distribution (starter %)
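The rate features above can be derived from season totals. The following is a minimal sketch, not the pipeline used for this case study. The column names (`points`, `assists`, `fga`, `fta`, `possessions`) and the two players' totals are made up for illustration, and `possessions` is assumed to mean team possessions while the player was on the floor. The 0.44 free-throw weight is the conventional True Shooting coefficient.

```python
import pandas as pd


def per_75(stat_total, possessions):
    """Scale a counting stat to a per-75-possessions rate."""
    return 75.0 * stat_total / possessions


def true_shooting_pct(points, fga, fta):
    """True Shooting % = PTS / (2 * (FGA + 0.44 * FTA)), expressed as a percentage."""
    return 100.0 * points / (2.0 * (fga + 0.44 * fta))


# Hypothetical season totals for two made-up players (assumption, not real data).
raw = pd.DataFrame(
    {
        "points": [1500, 600],
        "assists": [450, 90],
        "fga": [1100, 450],
        "fta": [400, 100],
        "possessions": [4500, 3000],
    },
    index=["player_a", "player_b"],
)

rates = pd.DataFrame(
    {
        "pts_per_75": per_75(raw["points"], raw["possessions"]),
        "ast_per_75": per_75(raw["assists"], raw["possessions"]),
        "ts_pct": true_shooting_pct(raw["points"], raw["fga"], raw["fta"]),
    }
)
print(rates.round(1))
```

Using per-possession rates rather than per-game averages keeps pace and playing time from leaking into the clustering.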

Data Cleaning

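from sklearn.preprocessing import StandardScaler

# Keep players with at least 500 minutes so rate stats rest on a stable sample
df = df[df['minutes'] >= 500].copy()

# Standardize all features (zero mean, unit variance) so that no single
# statistic dominates the distance calculations used by the clustering.
# feature_columns is the list of selected feature column names.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[feature_columns])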

Dataset: 350 players from the 2022-23 season who met the minimum-minutes threshold.

Part 3: Determining Number of Clusters

Elbow Method

Plotting within-cluster sum of squares (inertia) vs. K:

K    Inertia    % Variance Explained
2    4,850      31%
3    3,920      44%
4    3,250      54%
5    2,780      60%
6    2,410      66%
7    2,150      69%
8    1,940      72%

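Each additional cluster beyond K=6 reduces inertia by fewer than 300, compared with 930 for the step from K=2 to K=3, so the elbow appears around K=6-7.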
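The elbow table can be produced with a short loop. This sketch uses a synthetic stand-in for the scaled player matrix (`make_blobs` with 350 rows and 20 columns is an assumption, not the real data), so the printed numbers will differ from the table. On standardized features the total sum of squares equals players × features (350 × 20 = 7,000), and variance explained is 1 − inertia / total. The table above is consistent with this: for K=6, 1 − 2,410 / 7,000 ≈ 66%.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled player matrix (assumption: 350 x 20).
X_raw, _ = make_blobs(n_samples=350, n_features=20, centers=6, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)

# Each standardized column has mean 0 and unit (population) variance,
# so the total sum of squares is n_samples * n_features.
total_ss = float((X_scaled ** 2).sum())

elbow = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    elbow[k] = (km.inertia_, 1.0 - km.inertia_ / total_ss)

for k, (inertia, explained) in elbow.items():
    print(f"K={k}: inertia={inertia:,.0f}, variance explained={explained:.0%}")
```

Fixing `random_state` and using several initializations (`n_init=10`) keeps the curve reproducible, since K-means can converge to different local optima.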

Silhouette Analysis

K    Silhouette Score
4    0.21
5    0.24
6    0.26
7    0.25
8    0.23

Optimal K: 6 clusters. The silhouette score peaks at 0.26 for K=6, which also falls within the elbow range.
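A silhouette sweep follows the same pattern as the elbow loop. The sketch below again assumes a synthetic stand-in for the scaled feature matrix, so its scores are not comparable to the table above. It only illustrates the selection procedure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled player matrix (assumption, not real data).
X_raw, _ = make_blobs(n_samples=350, n_features=20, centers=6, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)

scores = {}
for k in range(4, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    # Mean silhouette over all points: near 1 means tight, well-separated
    # clusters; near 0 means overlapping clusters.
    scores[k] = silhouette_score(X_scaled, labels)

# Pick the K with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Combining the two criteria, as done here, is a common safeguard: the elbow narrows the range and the silhouette score picks within it.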

Part 4: Clustering Results

The Six Archetypes

Cluster 1: Primary Ball Handlers (n=45)
- Profile: High usage, high assists, high scoring
- Exemplars: Luka Doncic, Trae Young, Ja Morant
- Traditional equivalent: Point Guards

Cluster 2: Scoring Wings (n=52)
- Profile: High scoring, moderate efficiency, low assists
- Exemplars: Devin Booker, Jaylen Brown, Zach LaVine
- Traditional equivalent: Shooting Guards/Small Forwards

Cluster 3: Three-and-D Wings (n=68)
- Profile: High 3PT%, low usage, positive defense
- Exemplars: Mikal Bridges, OG Anunoby, Herb Jones
- Traditional equivalent: Role player wings

Cluster 4: Stretch Bigs (n=42)
- Profile: High rebounding, moderate 3PT attempts, interior defense
- Exemplars: Brook Lopez, Myles Turner, Karl-Anthony Towns
- Traditional equivalent: Modern centers/power forwards

Cluster 5: Traditional Bigs (n=38)
- Profile: Highest rebounding, rim protection, low perimeter involvement
- Exemplars: Rudy Gobert, Clint Capela, Mitchell Robinson
- Traditional equivalent: Classic centers

Cluster 6: Playmaking Bigs (n=35)
- Profile: High assists for position, versatile scoring
- Exemplars: Nikola Jokic, Domantas Sabonis, Bam Adebayo
- Traditional equivalent: None

Cluster Statistics

Archetype          Pts/75   Ast/75   Reb/75   TS%    3PA Rate
Ball Handlers      24.2     9.8      5.1      56.8   34%
Scoring Wings      22.5     3.2      5.8      57.2   38%
3-and-D Wings      13.8     2.1      5.2      59.1   52%
Stretch Bigs       14.5     2.4      10.2     60.5   28%
Traditional Bigs   11.2     1.8      12.5     64.2   2%
Playmaking Bigs    18.8     6.2      11.8     61.5   15%
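A profile table like the one above is typically built by fitting the final model and averaging the original, unscaled features per cluster. The sketch below assumes a small made-up dataset with three hypothetical player types and three placeholder columns. It shows the mechanics only and does not reproduce the six archetypes.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_columns = ["pts_per_75", "ast_per_75", "reb_per_75"]  # placeholder names

# Three made-up player types (guard-like, wing-like, big-like), 30 players each.
centers = np.array([[24.0, 9.0, 5.0], [14.0, 2.0, 5.0], [12.0, 2.0, 12.0]])
values = np.vstack([c + rng.normal(scale=1.0, size=(30, 3)) for c in centers])
df = pd.DataFrame(values, columns=feature_columns)

# Cluster on standardized features.
X_scaled = StandardScaler().fit_transform(df[feature_columns])
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Profile each cluster in original units so the table stays readable.
profile = df.groupby("cluster")[feature_columns].mean().round(1)
profile["n"] = df.groupby("cluster").size()
print(profile)
```

Cluster numbers returned by K-means are arbitrary, so archetype names are assigned afterwards by reading the profile table and the exemplar players.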

Part 5: Validation

Expert Review

We presented the clusters (without names) to 5 NBA scouts:

- 92% agreement on cluster membership for star players
- 78% agreement for role players
- All five agreed the clustering was "more useful" than traditional positions

Temporal Stability

Tracking cluster assignment across seasons:

- 85% of players stay in the same cluster year-to-year
- Changes typically occur for young players or after major injuries
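Year-to-year retention can be computed by comparing archetype labels for players who qualified in both seasons. The sketch below uses made-up player IDs and labels. It assumes the clusters from each season have already been matched to the same named archetypes, since raw K-means cluster numbers are arbitrary and cannot be compared directly across separate fits.

```python
import pandas as pd

# Hypothetical archetype labels for two consecutive seasons (made-up data).
season_a = pd.Series(
    {"p1": "Ball Handler", "p2": "3-and-D Wing", "p3": "Traditional Big",
     "p4": "Scoring Wing", "p5": "Stretch Big"}
)
season_b = pd.Series(
    {"p1": "Ball Handler", "p2": "Scoring Wing", "p3": "Traditional Big",
     "p4": "Scoring Wing", "p6": "Playmaking Big"}
)

# Keep only players present in both seasons (inner join on player ID).
both = pd.concat([season_a, season_b], axis=1, keys=["prev", "curr"], join="inner")

# Share of players whose archetype did not change.
retention = (both["prev"] == both["curr"]).mean()

# Transition counts: rows are last season's archetype, columns this season's.
transitions = pd.crosstab(both["prev"], both["curr"])

print(f"Retention: {retention:.0%}")
print(transitions)
```

The off-diagonal cells of the transition table show which moves between archetypes actually occur, which is the raw material for tracking role changes over time.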

Predictive Value

Using archetypes instead of positions improved:

- Lineup +/- prediction: +8% R-squared
- Trade value models: +12% accuracy
- Salary prediction: +5% accuracy
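One way to run this kind of comparison is to fit the same model twice, once with one-hot position labels and once with one-hot archetype labels, and compare cross-validated R-squared. The sketch below is illustrative only. The data are synthetic, the target is constructed to depend on archetype, and `position` is a hypothetical coarser label covering two archetypes each, so the gap appears by construction. Only the comparison procedure carries over to real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300

# Synthetic labels: six archetypes, and a coarser three-way "position"
# that lumps pairs of archetypes together (assumption for illustration).
archetype = rng.integers(0, 6, size=n)
position = archetype // 2

# Synthetic target that depends on archetype plus noise.
effects = np.array([3.0, -1.0, 0.5, 2.0, -2.0, 1.0])
target = effects[archetype] + rng.normal(scale=1.0, size=n)


def cv_r2(labels):
    """Mean 5-fold cross-validated R-squared using one-hot encoded labels."""
    X = pd.get_dummies(pd.Series(labels).astype(str)).to_numpy(dtype=float)
    return cross_val_score(LinearRegression(), X, target, cv=5, scoring="r2").mean()


r2_position = cv_r2(position)
r2_archetype = cv_r2(archetype)
print(f"R2 with positions:  {r2_position:.2f}")
print(f"R2 with archetypes: {r2_archetype:.2f}")
```

Cross-validation matters here because archetypes introduce more categories than positions, and in-sample fit alone would reward the extra parameters.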

Part 6: Applications

Lineup Construction

Instead of "we need a point guard," teams can target specific archetypes:

- "We need a Playmaking Big to pair with our Traditional Big"
- "Our Ball Handler needs a Scoring Wing, not another 3-and-D"

Trade Evaluation

Comparing players within archetypes is more meaningful than comparing players who merely share a position:

- Jokic vs. Sabonis (both Playmaking Bigs)
- Rather than Jokic vs. Embiid (same position, different archetypes)

Development Tracking

Track player development by cluster movement:

- A young player moving from 3-and-D Wing to Scoring Wing indicates offensive growth
- An aging scorer moving to 3-and-D Wing indicates role adaptation

Part 7: Limitations and Extensions

Limitations

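  1. Minute threshold excludes specialists: Players below the 500-minute cutoff may form archetypes of their own that this analysis cannot detect
  2. Defense is hard to capture: Steals, blocks, and Defensive Box Plus/Minus don't fully represent defensive impact
  3. Context-dependent: A player's stats depend on team context, so a change of team or role can change the assigned archetype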

Potential Extensions

  1. Tracking data features: Add speed, distance, shot location data
  2. Lineup-level clustering: Cluster five-man lineups, not just individuals
  3. Hierarchical clustering: Create sub-archetypes within main clusters
  4. Dynamic archetypes: Allow a player's archetype to change within a game
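As an illustration of the hierarchical extension, sub-archetypes can be sketched by running a second clustering inside each main cluster. The example below is one possible approach under stated assumptions, not the method used in this case study. It uses a synthetic stand-in for the scaled feature matrix, splits every main cluster into exactly two sub-clusters with Ward linkage, and labels them `main.sub`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled player matrix (assumption, not real data).
X_raw, _ = make_blobs(n_samples=350, n_features=20, centers=6, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)

# Step 1: main archetypes from K-means.
main = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_scaled)

# Step 2: split each main archetype into two sub-archetypes with Ward linkage.
sub = np.zeros(len(X_scaled), dtype=int)
for c in np.unique(main):
    mask = main == c
    sub[mask] = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(
        X_scaled[mask]
    )

# Combined label such as "3.1" = main cluster 3, sub-cluster 1.
labels = [f"{m}.{s}" for m, s in zip(main, sub)]
print(sorted(set(labels)))
```

In practice the number of sub-clusters would be chosen per archetype, for example with the same silhouette check used for the main clustering, rather than fixed at two.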

Conclusion

Clustering reveals six distinct player archetypes that better represent modern basketball than traditional positions. The "Playmaking Big" archetype in particular captures a player type that has no traditional equivalent but has become increasingly valuable. Teams using data-driven archetypes for lineup construction, trade evaluation, and player development can gain competitive advantages over those relying on outdated positional labels.

Exercises

Exercise 1

Apply the same clustering methodology to a different season. Do the same archetypes emerge? How stable are individual player assignments?

Exercise 2

Add defensive tracking data (contest rate, deflections) to the feature set. Does a seventh cluster emerge?

Exercise 3

Create a visualization showing each team's archetype distribution. Which teams have the most/least balanced rosters?

Exercise 4

Build a recommendation system that suggests trade targets based on archetype needs.