Chapter 26: Case Study 1 - Discovering Player Archetypes Through Clustering
Introduction
Traditional basketball positions (PG, SG, SF, PF, C) were established decades ago when the game looked fundamentally different. Modern basketball features positionless play, stretch bigs, and point forwards that these labels fail to capture. This case study applies unsupervised learning to discover data-driven player archetypes that better represent how the game is actually played.
Part 1: Problem Definition
The Limitation of Traditional Positions
Consider these modern players:
- Nikola Jokic: Listed as a center, yet ranks among the league leaders in assists
- Giannis Antetokounmpo: Listed as a forward, but often initiates the offense like a point guard
- Draymond Green: Listed as a forward; rarely scores, but is an elite defender and playmaker
Traditional positions fail to capture what these players actually do on the court.
Objective
Use clustering to discover natural groupings of players based on their statistical profiles, creating archetypes that reflect modern playing styles.
Part 2: Data Preparation
Feature Selection
We selected 20 features capturing different aspects of play, including:
Scoring:
- Points per 75 possessions
- True Shooting %
- Usage Rate
- % of points from 3PT, 2PT, FT

Playmaking:
- Assists per 75 possessions
- Assist %
- Turnover Rate

Rebounding:
- Rebounds per 75 possessions
- Offensive Rebound %
- Defensive Rebound %

Defense:
- Steals per 75 possessions
- Blocks per 75 possessions
- Defensive Box Plus/Minus

Physical Profile:
- Height
- Playing time distribution (starter %)
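The per-75-possession features rescale counting stats so that players with different minutes and team pace are comparable. The sketch below shows one way to compute such rates. The column names (`points`, `assists`, `possessions`) and the toy totals are assumptions for illustration; the case study does not say how possessions were estimated.

```python
import pandas as pd

def per_75(df, stat_cols, poss_col="possessions"):
    """Rescale counting stats to a per-75-possession rate.

    Column names are illustrative assumptions, not the case study's schema.
    """
    rates = df[stat_cols].div(df[poss_col], axis=0) * 75
    return rates.add_suffix("_per75")

# Toy totals for illustration only (not real player data)
totals = pd.DataFrame(
    {"points": [1500, 600], "assists": [450, 90], "possessions": [4500, 3000]},
    index=["player_a", "player_b"],
)
rates = per_75(totals, ["points", "assists"])
print(rates)
```

Here `player_a` scores 1500 points over 4500 possessions, which works out to 25.0 points per 75 possessions.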
Data Cleaning

    from sklearn.preprocessing import StandardScaler

    # Keep players with at least 500 minutes so per-possession rates are stable
    df = df[df['minutes'] >= 500].copy()

    # Standardize all features to zero mean and unit variance
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df[feature_columns])

Dataset: 350 players from the 2022-23 season who met the minimum-minutes threshold.
Part 3: Determining Number of Clusters
Elbow Method
Plotting within-cluster sum of squares (inertia) vs. K:
| K | Inertia | % Variance Explained |
|---|---|---|
| 2 | 4,850 | 31% |
| 3 | 3,920 | 44% |
| 4 | 3,250 | 54% |
| 5 | 2,780 | 60% |
| 6 | 2,410 | 66% |
| 7 | 2,150 | 69% |
| 8 | 1,940 | 72% |
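The curve has no sharp elbow. The drop in inertia shrinks steadily (930 from K=2 to K=3, 370 from K=5 to K=6, 210 from K=7 to K=8) and begins to level off around K=6-7.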
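A table like the one above can be produced by fitting K-means over a range of K and recording the inertia. The sketch below is one way to do that, not the case study's actual code. It runs on synthetic data from `make_blobs` as a stand-in for the player matrix, and it assumes "% variance explained" means one minus inertia divided by the total sum of squares.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

def elbow_table(X_scaled, k_values, random_state=0):
    """Return (k, inertia, share of variance explained) for each k."""
    # Total sum of squares around the global mean; for StandardScaler
    # output this equals n_samples * n_features (unit variance per column).
    total_ss = float(((X_scaled - X_scaled.mean(axis=0)) ** 2).sum())
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_scaled)
        rows.append((k, km.inertia_, 1 - km.inertia_ / total_ss))
    return rows

# Synthetic stand-in for the player matrix (350 rows, 20 features)
X_demo, _ = make_blobs(n_samples=350, n_features=20, centers=6, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

rows = elbow_table(X_demo, range(2, 9))
for k, inertia, explained in rows:
    print(f"K={k}  inertia={inertia:8.1f}  explained={explained:.0%}")
```

Under this definition the table's figures are consistent with 350 players and 20 standardized features, which gives a total sum of squares of 350 × 20 = 7,000. At K=6, for example, 1 − 2,410 / 7,000 ≈ 66%.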
Silhouette Analysis
| K | Silhouette Score |
|---|---|
| 4 | 0.21 |
| 5 | 0.24 |
| 6 | 0.26 |
| 7 | 0.25 |
| 8 | 0.23 |
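Selected K: 6 clusters. K=6 has the highest silhouette score (0.26) and falls within the elbow range. The margin over K=7 (0.25) is small, and scores this low point to overlapping rather than sharply separated groups, so the archetypes are best read as tendencies.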
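The silhouette comparison can be sketched the same way: fit K-means for each candidate K and score the resulting labels with scikit-learn's `silhouette_score`. The data below is synthetic, and `silhouette_by_k` is a hypothetical helper name, so the scores it prints will not match the table.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def silhouette_by_k(X_scaled, k_values, random_state=0):
    """Return {k: mean silhouette score} for K-means fits at each k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X_scaled)
        scores[k] = silhouette_score(X_scaled, labels)
    return scores

# Synthetic stand-in for the standardized player matrix
X_demo, _ = make_blobs(n_samples=350, n_features=20, centers=6, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

scores = silhouette_by_k(X_demo, range(4, 9))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Synthetic blobs separate far more cleanly than real player data, so their scores will sit well above the 0.21 to 0.26 range reported here.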
Part 4: Clustering Results
The Six Archetypes
Cluster 1: Primary Ball Handlers (n=45)
- Profile: High usage, high assists, moderate scoring
- Exemplars: Luka Doncic, Trae Young, Ja Morant
- Traditional equivalent: Point Guards

Cluster 2: Scoring Wings (n=52)
- Profile: High scoring, moderate efficiency, low assists
- Exemplars: Devin Booker, Jaylen Brown, Zach LaVine
- Traditional equivalent: Shooting Guards/Small Forwards

Cluster 3: Three-and-D Wings (n=68)
- Profile: High 3PT%, low usage, positive defense
- Exemplars: Mikal Bridges, OG Anunoby, Herb Jones
- Traditional equivalent: Role player wings

Cluster 4: Stretch Bigs (n=42)
- Profile: High rebounding, moderate 3PT attempts, interior defense
- Exemplars: Brook Lopez, Myles Turner, Karl-Anthony Towns
- Traditional equivalent: Modern centers/power forwards

Cluster 5: Traditional Bigs (n=38)
- Profile: Highest rebounding, rim protection, low perimeter involvement
- Exemplars: Rudy Gobert, Clint Capela, Mitchell Robinson
- Traditional equivalent: Classic centers

Cluster 6: Playmaking Bigs (n=35)
- Profile: High assists for position, versatile scoring
- Exemplars: Nikola Jokic, Domantas Sabonis, Bam Adebayo
- Traditional equivalent: None
Cluster Statistics
| Archetype | Pts/75 | Ast/75 | Reb/75 | TS% | 3PA Rate |
|---|---|---|---|---|---|
| Ball Handlers | 24.2 | 9.8 | 5.1 | 56.8 | 34% |
| Scoring Wings | 22.5 | 3.2 | 5.8 | 57.2 | 38% |
| 3-and-D Wings | 13.8 | 2.1 | 5.2 | 59.1 | 52% |
| Stretch Bigs | 14.5 | 2.4 | 10.2 | 60.5 | 28% |
| Traditional Bigs | 11.2 | 1.8 | 12.5 | 64.2 | 2% |
| Playmaking Bigs | 18.8 | 6.2 | 11.8 | 61.5 | 15% |
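A profile table of this kind is usually built by averaging the raw (unscaled) stats within each cluster, so the numbers stay in basketball units. The sketch below assumes a frame of raw stats and a matching sequence of cluster labels. The function name, column names, and toy values are hypothetical.

```python
import pandas as pd

def archetype_profiles(raw_stats, labels):
    """Mean of each unscaled stat per cluster, with cluster size in column 'n'."""
    labelled = raw_stats.assign(cluster=list(labels))
    profile = labelled.groupby("cluster").mean().round(1)
    profile.insert(0, "n", labelled.groupby("cluster").size())
    return profile

# Toy values for illustration only
raw = pd.DataFrame(
    {"pts_per75": [24.0, 25.0, 11.0, 12.0], "ast_per75": [9.0, 10.0, 2.0, 1.0]}
)
profile = archetype_profiles(raw, [0, 0, 1, 1])
print(profile)
```

Naming the clusters ("Ball Handlers", "Stretch Bigs") is a manual step done after inspecting a table like this. K-means itself only returns integer labels.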
Part 5: Validation
Expert Review
We presented the clusters (without names) to five NBA scouts:
- 92% agreement on cluster membership for star players
- 78% agreement for role players
- All five rated the clustering "more useful" than traditional positions
Temporal Stability
Tracking cluster assignment across seasons:
- 85% of players stay in the same cluster year-to-year
- Changes typically occur for young players or after major injuries
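A year-to-year retention figure can be computed as the share of players present in both seasons whose archetype is unchanged. The sketch below assumes each season's assignments are already expressed in the same named archetypes. K-means label integers are arbitrary between fits, so in practice the seasons must be aligned first, for example by assigning the new season with the model fitted on the reference season. The case study does not state which approach it used.

```python
import pandas as pd

def retention_rate(prev, curr):
    """Share of players present in both seasons who keep the same archetype."""
    both = pd.concat([prev.rename("prev"), curr.rename("curr")], axis=1, join="inner")
    return float((both["prev"] == both["curr"]).mean())

# Hypothetical assignments indexed by player id
prev = pd.Series({"a": "Ball Handler", "b": "3-and-D Wing",
                  "c": "Traditional Big", "d": "Scoring Wing"})
curr = pd.Series({"a": "Ball Handler", "b": "Scoring Wing",
                  "c": "Traditional Big", "d": "Scoring Wing", "e": "Stretch Big"})

rate = retention_rate(prev, curr)
print(rate)  # 0.75: three of the four shared players are unchanged
```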
Predictive Value
Using archetypes instead of positions improved:
- Lineup +/- prediction: +8% R-squared
- Trade value models: +12% accuracy
- Salary prediction: +5% accuracy
Part 6: Applications
Lineup Construction
Instead of "we need a point guard," teams can target specific archetypes:
- "We need a Playmaking Big to pair with our Traditional Big"
- "Our Ball Handler needs a Scoring Wing, not another 3-and-D"
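Archetype-based roster needs can be expressed as a simple count comparison between a lineup and a target template. The sketch below is illustrative only. The template is a made-up example, not a recommendation from the case study, and `missing_archetypes` is a hypothetical helper.

```python
from collections import Counter

def missing_archetypes(lineup, required):
    """Return how many players of each required archetype the lineup lacks."""
    have = Counter(lineup.values())
    return {a: n - have[a] for a, n in required.items() if have[a] < n}

# Hypothetical lineup (player -> archetype) and target template
lineup = {
    "player_1": "Ball Handler",
    "player_2": "3-and-D Wing",
    "player_3": "3-and-D Wing",
    "player_4": "Traditional Big",
}
template = {"Ball Handler": 1, "Scoring Wing": 1, "3-and-D Wing": 1, "Traditional Big": 1}

gaps = missing_archetypes(lineup, template)
print(gaps)  # {'Scoring Wing': 1}
```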
Trade Evaluation
Comparing players within an archetype is more meaningful than comparing players who merely share a position:
- Jokic vs. Sabonis (both Playmaking Bigs)
- Rather than Jokic vs. Embiid (same position, different archetypes)
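One way to find within-archetype comparables is to rank same-cluster players by distance in the standardized feature space. The sketch below uses Euclidean distance as an assumed similarity measure. The function name, feature names, and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def comparables(X_scaled, archetypes, player, top_n=3):
    """Rank same-archetype players by Euclidean distance to `player`."""
    same = archetypes[archetypes == archetypes.loc[player]]
    peers = same.index.drop(player)
    diffs = X_scaled.loc[peers].to_numpy() - X_scaled.loc[player].to_numpy()
    dists = np.linalg.norm(diffs, axis=1)
    return pd.Series(dists, index=peers).sort_values().head(top_n)

# Toy standardized features; player "d" is closest overall but in another archetype
X = pd.DataFrame(
    {"f1": [0.0, 1.0, 3.0, 0.1], "f2": [0.0, 0.0, 4.0, 0.0]},
    index=["a", "b", "c", "d"],
)
arch = pd.Series({"a": "Playmaking Big", "b": "Playmaking Big",
                  "c": "Playmaking Big", "d": "Traditional Big"})

result = comparables(X, arch, "a")
print(result)
```

In the toy data, player `d` is excluded despite being nearest to `a`, because the search is restricted to the same archetype.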
Development Tracking
Track player development by cluster movement:
- A young player moving from 3-and-D to Scoring Wing indicates offensive growth
- An aging scorer moving to 3-and-D indicates role adaptation
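Cluster movement can be extracted from a long table of per-season assignments by comparing each row with the same player's previous season. The sketch below assumes a hypothetical schema with `player`, `season`, and `archetype` columns, and archetype names that are consistent across seasons.

```python
import pandas as pd

def archetype_changes(history):
    """List season-to-season archetype changes, one row per change."""
    h = history.sort_values(["player", "season"]).copy()
    # Previous season's archetype for the same player (NaN for a first season)
    h["previous"] = h.groupby("player")["archetype"].shift(1)
    changed = h[h["previous"].notna() & (h["previous"] != h["archetype"])]
    return changed[["player", "season", "previous", "archetype"]].reset_index(drop=True)

# Hypothetical history, deliberately unsorted
history = pd.DataFrame({
    "player": ["x", "y", "x", "y"],
    "season": [2023, 2022, 2022, 2023],
    "archetype": ["Scoring Wing", "Traditional Big", "3-and-D Wing", "Traditional Big"],
})

changes = archetype_changes(history)
print(changes)
```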
Part 7: Limitations and Extensions
Limitations
- Minute threshold excludes specialists: Very limited-minute players may form their own archetypes
- Defense hard to capture: Defensive stats don't fully represent impact
- Context-dependent: Player stats depend on team context
Potential Extensions
- Tracking data features: Add speed, distance, shot location data
- Lineup-level clustering: Cluster five-man lineups, not just individuals
- Hierarchical clustering: Create sub-archetypes within main clusters
- Dynamic archetypes: Allow archetype to change within games
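The hierarchical extension could be approached by re-clustering the members of a single archetype. The sketch below uses scikit-learn's `AgglomerativeClustering` with Ward linkage, which is one possible choice and not a method the case study prescribes. The helper name and toy points are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def sub_archetypes(X_scaled, labels, parent, n_sub=2):
    """Split one parent cluster into n_sub groups; returns {row index: sub-label}."""
    X_scaled = np.asarray(X_scaled)
    idx = np.flatnonzero(np.asarray(labels) == parent)
    model = AgglomerativeClustering(n_clusters=n_sub, linkage="ward")
    sub = model.fit_predict(X_scaled[idx])
    return dict(zip(idx.tolist(), sub.tolist()))

# Toy points: rows 0-3 belong to parent cluster 0 and form two tight pairs
X_toy = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [9.0, 0.0], [9.0, 0.2]])
parent_labels = [0, 0, 0, 0, 1, 1]

subs = sub_archetypes(X_toy, parent_labels, parent=0)
print(subs)
```

Only rows from the chosen parent cluster appear in the result, so sub-archetypes can be explored one archetype at a time.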
Conclusion
Clustering reveals six distinct player archetypes that better represent modern basketball than traditional positions. The "Playmaking Big" archetype in particular captures a player type that has no traditional equivalent but has become increasingly valuable. Teams using data-driven archetypes for lineup construction, trade evaluation, and player development can gain competitive advantages over those relying on outdated positional labels.
Exercises
Exercise 1
Apply the same clustering methodology to a different season. Do the same archetypes emerge? How stable are individual player assignments?
Exercise 2
Add defensive tracking data (contest rate, deflections) to the feature set. Does a seventh cluster emerge?
Exercise 3
Create a visualization showing each team's archetype distribution. Which teams have the most/least balanced rosters?
Exercise 4
Build a recommendation system that suggests trade targets based on archetype needs.