College to WNBA Translation
Nov 27, 2025
Draft Prospect Evaluation
Projecting college players to WNBA success is one of the most challenging aspects of basketball analytics. Performance in the NCAA doesn't always translate directly to professional success due to differences in competition level, pace, physicality, and team context.
Key Translation Factors
- Statistical Performance: Points, rebounds, assists per 40 minutes
- Efficiency Metrics: True shooting %, PER, win shares
- Competition Level: Conference strength adjustment
- Age and Experience: Age-relative-to-competition matters
- Physical Attributes: Height, wingspan, athleticism
- Skill Versatility: Shooting range, defensive ability, playmaking
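Two of the factors above reduce to short formulas. The sketch below, in plain Python, scales a per-game stat to a 40-minute basis and computes true shooting percentage with the common 0.44 free-throw weight. The function names and the sample stat line are illustrative assumptions, not part of any library.

```python
def per_40(stat_per_game: float, minutes_per_game: float) -> float:
    """Scale a per-game stat to a 40-minute basis."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return stat_per_game * 40.0 / minutes_per_game


def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True shooting %: points divided by twice the shooting attempts (FGA + 0.44 * FTA)."""
    attempts = fga + 0.44 * fta
    if attempts <= 0:
        raise ValueError("player has no shot attempts")
    return points / (2.0 * attempts)


# Hypothetical stat line: 17.5 points in 35 minutes on 13 FGA and 5 FTA per game
pts_40 = per_40(17.5, 35.0)
ts = true_shooting_pct(17.5, 13.0, 5.0)
print(f"Points per 40: {pts_40:.1f}")
print(f"True shooting: {ts:.3f}")
```

Using actual minutes per game, rather than a fixed assumed value, is what lets per-40 rates separate a productive reserve from a high-minute starter.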
Python: College-to-WNBA Projection Model
Python: WNBA Draft Prospect Evaluation
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
# Sample dataset: Historical NCAA to WNBA player data
# In practice, this would be a comprehensive database of draft picks and their careers
historical_prospects = pd.DataFrame({
'player': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E',
'Player F', 'Player G', 'Player H', 'Player I', 'Player J'],
'college_ppg': [22.5, 19.8, 18.2, 16.5, 21.2, 17.8, 15.5, 20.1, 14.2, 19.5],
'college_rpg': [9.5, 6.8, 5.2, 10.2, 7.5, 4.8, 8.5, 5.5, 9.8, 6.2],
'college_apg': [3.8, 5.2, 6.8, 2.5, 3.2, 7.5, 2.8, 4.2, 3.5, 5.8],
'college_ts_pct': [0.585, 0.562, 0.545, 0.598, 0.575, 0.538, 0.612, 0.555, 0.602, 0.548],
'college_per': [28.5, 24.8, 22.5, 26.2, 27.1, 23.5, 25.8, 24.2, 26.5, 23.8],
'height_inches': [75, 71, 69, 78, 74, 68, 76, 72, 77, 70],
'age_at_draft': [21.5, 22.2, 21.8, 22.5, 21.2, 22.8, 21.5, 22.0, 21.8, 22.5],
'conference_strength': [8.5, 7.2, 9.1, 6.8, 8.2, 8.8, 7.5, 8.0, 7.8, 8.5],
# WNBA career outcomes (target variables)
'wnba_ppg': [15.2, 12.5, 10.8, 8.5, 14.1, 11.2, 10.5, 13.2, 9.8, 11.8],
'wnba_per': [18.5, 15.2, 14.1, 13.8, 17.2, 14.5, 15.8, 16.1, 14.2, 15.0],
'wnba_minutes': [28.5, 24.2, 22.5, 18.2, 26.8, 23.5, 21.5, 25.2, 20.5, 24.0]
})
# =============================================================================
# 1. Feature Engineering
# =============================================================================
def engineer_features(df):
    """Create advanced features for projection model"""
    # Per-40 minute stats (normalize for playing time)
    df['college_pts_per_40'] = df['college_ppg'] * (40 / 35)  # Assume 35 mpg
    df['college_reb_per_40'] = df['college_rpg'] * (40 / 35)
    df['college_ast_per_40'] = df['college_apg'] * (40 / 35)
    # Age-adjusted metrics (younger = more upside)
    df['age_adjustment'] = 23 - df['age_at_draft']  # Positive = younger
    df['age_adj_per'] = df['college_per'] * (1 + 0.05 * df['age_adjustment'])
    # Conference-adjusted stats
    df['adj_ppg'] = df['college_ppg'] * (df['conference_strength'] / 8.0)
    df['adj_per'] = df['college_per'] * (df['conference_strength'] / 8.0)
    # Versatility score (combination of skills)
    df['versatility'] = (
        (df['college_ppg'] / 20) +
        (df['college_rpg'] / 8) +
        (df['college_apg'] / 5)
    ) / 3
    # Height-adjusted scoring
    df['height_adj_scoring'] = df['college_ppg'] / (df['height_inches'] / 72)
    return df
historical_prospects = engineer_features(historical_prospects)
# =============================================================================
# 2. Build Projection Model
# =============================================================================
# Select features for model
features = ['college_pts_per_40', 'college_reb_per_40', 'college_ast_per_40',
'college_ts_pct', 'age_adj_per', 'height_inches',
'conference_strength', 'versatility']
X = historical_prospects[features]
y_ppg = historical_prospects['wnba_ppg']
y_per = historical_prospects['wnba_per']
# Split data (in practice, use larger dataset)
X_train, X_test, y_train_ppg, y_test_ppg = train_test_split(
X, y_ppg, test_size=0.2, random_state=42
)
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train Random Forest model
model_ppg = RandomForestRegressor(
n_estimators=100,
max_depth=5,
min_samples_split=2,
random_state=42
)
model_ppg.fit(X_train_scaled, y_train_ppg)
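# Evaluate model (the hold-out set is only 2 players here, so test metrics are illustrative)
train_score = model_ppg.score(X_train_scaled, y_train_ppg)
test_predictions = model_ppg.predict(X_test_scaled)
test_rmse = np.sqrt(mean_squared_error(y_test_ppg, test_predictions))
print("=== WNBA PPG Projection Model ===")
print(f"Training R²: {train_score:.3f}")
print(f"Test RMSE: {test_rmse:.2f} PPG")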
# Feature importance
feature_importance = pd.DataFrame({
'feature': features,
'importance': model_ppg.feature_importances_
}).sort_values('importance', ascending=False)
print("\nFeature Importance for WNBA Success:")
print(feature_importance)
# =============================================================================
# 3. Translation Factors
# =============================================================================
def calculate_translation_factor(college_stats, wnba_stats, stat_name):
    """Calculate average translation factor from college to WNBA"""
    college_avg = college_stats.mean()
    wnba_avg = wnba_stats.mean()
    translation = wnba_avg / college_avg
    return translation
# Calculate translation factors
ppg_factor = calculate_translation_factor(
historical_prospects['college_ppg'],
historical_prospects['wnba_ppg'],
'PPG'
)
per_factor = calculate_translation_factor(
historical_prospects['college_per'],
historical_prospects['wnba_per'],
'PER'
)
print("\n=== College to WNBA Translation Factors ===")
print(f"PPG Translation: {ppg_factor:.3f} (WNBA = {ppg_factor:.1%} of college)")
print(f"PER Translation: {per_factor:.3f} (WNBA = {per_factor:.1%} of college)")
# =============================================================================
# 4. Prospect Evaluation Function
# =============================================================================
def evaluate_prospect(prospect_data, model, scaler, features):
    """Evaluate a college prospect's WNBA projection"""
    # Engineer features
    prospect_df = pd.DataFrame([prospect_data])
    prospect_df = engineer_features(prospect_df)
    # Prepare features
    X_prospect = prospect_df[features]
    X_prospect_scaled = scaler.transform(X_prospect)
    # Predict WNBA performance
    predicted_ppg = model.predict(X_prospect_scaled)[0]
    # Heuristic ±15% band (a placeholder, not a statistically derived confidence interval)
    confidence_range = predicted_ppg * 0.15
    return {
        'predicted_ppg': predicted_ppg,
        'range_low': predicted_ppg - confidence_range,
        'range_high': predicted_ppg + confidence_range,
        'projection_grade': classify_projection(predicted_ppg)
    }
def classify_projection(ppg):
    """Classify prospect tier based on projected PPG"""
    if ppg >= 15:
        return "Star (Top 5 Pick)"
    elif ppg >= 12:
        return "Solid Starter (Top 15 Pick)"
    elif ppg >= 9:
        return "Role Player (Roster Spot)"
    else:
        return "Developmental (Borderline)"
# Example: Evaluate new prospect
new_prospect = {
'college_ppg': 20.5,
'college_rpg': 8.2,
'college_apg': 4.5,
'college_ts_pct': 0.580,
'college_per': 26.8,
'height_inches': 74,
'age_at_draft': 21.8,
'conference_strength': 8.5
}
projection = evaluate_prospect(new_prospect, model_ppg, scaler, features)
print("\n=== New Prospect Evaluation ===")
print(f"Projected WNBA PPG: {projection['predicted_ppg']:.1f}")
print(f"Confidence Range: {projection['range_low']:.1f} - {projection['range_high']:.1f}")
print(f"Projection Grade: {projection['projection_grade']}")
# =============================================================================
# 5. Position-Specific Analysis
# =============================================================================
# In practice, separate models for guards vs forwards vs posts
position_adjustments = {
'Guard': {'ppg_weight': 1.1, 'ast_weight': 1.3, 'reb_weight': 0.8},
'Forward': {'ppg_weight': 1.0, 'ast_weight': 1.0, 'reb_weight': 1.1},
'Post': {'ppg_weight': 0.9, 'ast_weight': 0.7, 'reb_weight': 1.3}
}
print("\n=== Position-Specific Adjustments ===")
for position, weights in position_adjustments.items():
    print(f"{position}: {weights}")
print("\n=== Translation Model Complete ===")
print("✓ Feature engineering")
print("✓ Projection model trained")
print("✓ Translation factors calculated")
print("✓ Prospect evaluation framework")
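The translation factors in the Python example are point estimates from only ten players, so it helps to see how much they move under resampling. The sketch below applies a percentile bootstrap to the PPG factor using the sample values from the table above. The helper names, the resample count, and the seed are assumptions made for illustration.

```python
import random
import statistics

# Sample values copied from the historical_prospects table above
college_ppg = [22.5, 19.8, 18.2, 16.5, 21.2, 17.8, 15.5, 20.1, 14.2, 19.5]
wnba_ppg = [15.2, 12.5, 10.8, 8.5, 14.1, 11.2, 10.5, 13.2, 9.8, 11.8]


def translation_factor(college, wnba):
    """Ratio of mean WNBA value to mean college value."""
    return statistics.mean(wnba) / statistics.mean(college)


def bootstrap_interval(college, wnba, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for the translation factor.

    Players are resampled with replacement as (college, wnba) pairs.
    """
    rng = random.Random(seed)
    pairs = list(zip(college, wnba))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        c, w = zip(*sample)
        estimates.append(translation_factor(c, w))
    estimates.sort()
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


point = translation_factor(college_ppg, wnba_ppg)
low, high = bootstrap_interval(college_ppg, wnba_ppg)
print(f"PPG translation factor: {point:.3f}")
print(f"95% bootstrap interval: {low:.3f} to {high:.3f}")
```

Resampling whole players keeps each college value paired with the same player's WNBA outcome, which is what the ratio depends on.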
R: NCAA to WNBA Projection with wehoop
library(wehoop)
library(tidyverse)
library(randomForest)
library(caret)
# The wehoop package provides access to both WNBA and NCAA women's basketball data
# =============================================================================
# 1. Load NCAA Women's Basketball Data
# =============================================================================
# Load NCAA women's basketball player box scores
ncaa_players <- wehoop::load_wbb_player_box(seasons = 2023)
# Calculate season averages for college players
ncaa_season_stats <- ncaa_players %>%
group_by(athlete_id, athlete_display_name, team_display_name) %>%
summarise(
games = n(),
college_ppg = mean(points, na.rm = TRUE),
college_rpg = mean(rebounds, na.rm = TRUE),
college_apg = mean(assists, na.rm = TRUE),
college_minutes = mean(minutes, na.rm = TRUE),
college_fgm = sum(field_goals_made, na.rm = TRUE),
college_fga = sum(field_goals_attempted, na.rm = TRUE),
college_fg3m = sum(three_point_field_goals_made, na.rm = TRUE),
college_ftm = sum(free_throws_made, na.rm = TRUE),
college_fta = sum(free_throws_attempted, na.rm = TRUE),
.groups = "drop"
) %>%
filter(games >= 15) %>%
mutate(
# Calculate efficiency metrics
college_ts_pct = college_ppg / (2 * (college_fga/games + 0.44 * college_fta/games)),
college_fg_pct = college_fgm / college_fga,
# Per-40 minute stats
college_pts_per_40 = college_ppg * (40 / college_minutes),
college_reb_per_40 = college_rpg * (40 / college_minutes),
college_ast_per_40 = college_apg * (40 / college_minutes),
# Simplified PER proxy: points + rebounds + assists per 40 minutes (not the full PER formula)
college_per = (college_ppg + college_rpg + college_apg) / college_minutes * 40
)
cat("=== Top NCAA Prospects by Stats ===\n")
print(ncaa_season_stats %>%
select(athlete_display_name, team_display_name, college_ppg,
college_rpg, college_apg, college_ts_pct) %>%
arrange(desc(college_ppg)) %>%
head(10))
# =============================================================================
# 2. Historical Translation Analysis
# =============================================================================
# In practice, you would have a database linking NCAA players to WNBA careers
# For demonstration, we'll create a synthetic historical dataset
# (seeded so the random draws are reproducible)
set.seed(42)
historical_translation <- tibble(
player = paste("Player", LETTERS[1:20]),
college_ppg = runif(20, 12, 25),
college_rpg = runif(20, 4, 11),
college_apg = runif(20, 2, 8),
college_ts_pct = runif(20, 0.50, 0.65),
college_per = runif(20, 20, 32),
height_inches = sample(68:78, 20, replace = TRUE),
age_at_draft = runif(20, 21, 23),
conference_strength = runif(20, 6, 10),
# WNBA outcomes
wnba_ppg = NA,
wnba_per = NA
)
# Synthetic translation assumption: WNBA PPG is 55-68% of college PPG,
# and WNBA PER is 58-72% of college PER
historical_translation <- historical_translation %>%
mutate(
wnba_ppg = college_ppg * runif(n(), 0.55, 0.68),
wnba_per = college_per * runif(n(), 0.58, 0.72),
wnba_minutes = 20 + (college_ppg - 12) * 0.8 + rnorm(n(), 0, 2)
)
# =============================================================================
# 3. Calculate Translation Factors
# =============================================================================
translation_factors <- historical_translation %>%
summarise(
ppg_factor = mean(wnba_ppg / college_ppg, na.rm = TRUE),
per_factor = mean(wnba_per / college_per, na.rm = TRUE),
ppg_sd = sd(wnba_ppg / college_ppg, na.rm = TRUE)
)
cat("\n=== College to WNBA Translation Factors ===\n")
cat(sprintf("PPG Translation: %.3f (±%.3f)\n",
translation_factors$ppg_factor,
translation_factors$ppg_sd))
cat(sprintf("PER Translation: %.3f\n",
translation_factors$per_factor))
cat(sprintf("\nInterpretation: WNBA PPG ≈ %.1f%% of College PPG\n",
translation_factors$ppg_factor * 100))
# =============================================================================
# 4. Feature Engineering
# =============================================================================
prepare_prospect_features <- function(data) {
data %>%
mutate(
# Age adjustment (younger players have more upside)
age_adjustment = 23 - age_at_draft,
age_adj_per = college_per * (1 + 0.05 * age_adjustment),
# Conference-adjusted stats
conf_adj_ppg = college_ppg * (conference_strength / 8),
# Versatility score
versatility = (college_ppg/20 + college_rpg/8 + college_apg/5) / 3,
# Per-40 minute rates
pts_per_40 = college_ppg * (40 / 35), # Assume 35 mpg
reb_per_40 = college_rpg * (40 / 35),
ast_per_40 = college_apg * (40 / 35)
)
}
historical_translation <- prepare_prospect_features(historical_translation)
# =============================================================================
# 5. Build Projection Model
# =============================================================================
# Select features
model_features <- c("pts_per_40", "reb_per_40", "ast_per_40",
"college_ts_pct", "age_adj_per", "height_inches",
"conference_strength", "versatility")
# Prepare training data
model_data <- historical_translation %>%
select(all_of(model_features), wnba_ppg) %>%
drop_na()
# Split data
set.seed(42)
train_index <- createDataPartition(model_data$wnba_ppg, p = 0.8, list = FALSE)
train_data <- model_data[train_index, ]
test_data <- model_data[-train_index, ]
# Train Random Forest model
rf_model <- randomForest(
wnba_ppg ~ .,
data = train_data,
ntree = 100,
mtry = 3,
importance = TRUE
)
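# Model performance (R² here is the squared Pearson correlation between
# predicted and actual values; the test set is only a few rows)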
train_pred <- predict(rf_model, train_data)
test_pred <- predict(rf_model, test_data)
train_r2 <- cor(train_pred, train_data$wnba_ppg)^2
test_r2 <- cor(test_pred, test_data$wnba_ppg)^2
cat("\n=== WNBA Projection Model Performance ===\n")
cat(sprintf("Training R²: %.3f\n", train_r2))
cat(sprintf("Testing R²: %.3f\n", test_r2))
# Feature importance
importance_df <- importance(rf_model) %>%
as.data.frame() %>%
rownames_to_column("feature") %>%
arrange(desc(`%IncMSE`))
cat("\n=== Feature Importance ===\n")
print(importance_df)
# =============================================================================
# 6. Prospect Evaluation Function
# =============================================================================
evaluate_wnba_prospect <- function(prospect_stats, model, translation_factors) {
# Prepare prospect data
prospect_df <- prepare_prospect_features(prospect_stats)
# Predict WNBA performance
predicted_ppg <- predict(model, newdata = prospect_df)
# Heuristic ±20% band (a placeholder, not a statistically derived confidence interval)
conf_interval <- predicted_ppg * 0.20
# Classification
grade <- case_when(
predicted_ppg >= 15 ~ "Star Potential (Top 5)",
predicted_ppg >= 12 ~ "Solid Starter (Top 15)",
predicted_ppg >= 9 ~ "Role Player",
TRUE ~ "Developmental"
)
tibble(
predicted_wnba_ppg = predicted_ppg,
range_low = predicted_ppg - conf_interval,
range_high = predicted_ppg + conf_interval,
projection_grade = grade
)
}
# Example: Evaluate a top college prospect
top_college_prospect <- tibble(
college_ppg = 22.5,
college_rpg = 9.2,
college_apg = 4.8,
college_ts_pct = 0.595,
college_per = 28.5,
height_inches = 75,
age_at_draft = 21.5,
conference_strength = 9.0
)
prospect_evaluation <- evaluate_wnba_prospect(
top_college_prospect,
rf_model,
translation_factors
)
cat("\n=== Prospect Evaluation ===\n")
cat(sprintf("College PPG: %.1f\n", top_college_prospect$college_ppg))
cat(sprintf("Projected WNBA PPG: %.1f\n", prospect_evaluation$predicted_wnba_ppg))
cat(sprintf("Confidence Range: %.1f - %.1f\n",
prospect_evaluation$range_low,
prospect_evaluation$range_high))
cat(sprintf("Grade: %s\n", prospect_evaluation$projection_grade))
# =============================================================================
# 7. Draft Class Evaluation
# =============================================================================
# Evaluate multiple prospects
evaluate_draft_class <- function(prospects_df, model) {
  prospects_df %>%
    rowwise() %>%
    mutate(
      # pick(everything()) replaces the deprecated cur_data() (dplyr >= 1.1.0)
      evaluation = list(evaluate_wnba_prospect(
        pick(everything()), model, translation_factors
      ))
    ) %>%
    ungroup() %>%
    unnest(evaluation) %>%
    arrange(desc(predicted_wnba_ppg))
}
cat("\n=== Translation Framework Complete ===\n")
cat("✓ NCAA data loaded and processed\n")
cat("✓ Translation factors calculated\n")
cat("✓ Projection model trained\n")
cat("✓ Prospect evaluation framework ready\n")
cat("✓ Draft class analysis capability\n")
Translation Challenges
College-to-WNBA projection is inherently uncertain. Factors beyond statistics—work ethic, basketball IQ, injury history, and fit with team system—play crucial roles in professional success. Statistical models provide a baseline expectation, but scouting and qualitative evaluation remain essential.
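One way to put a number on that uncertainty, rather than assuming a fixed percentage band, is to measure how far a simple projection misses on players it has not seen. The sketch below runs leave-one-out validation with a ratio-based projection on the ten sample players from the Python example. The function name is hypothetical, and with so few players the result is only illustrative.

```python
import statistics

# Sample values from the Python example's historical_prospects table
college_ppg = [22.5, 19.8, 18.2, 16.5, 21.2, 17.8, 15.5, 20.1, 14.2, 19.5]
wnba_ppg = [15.2, 12.5, 10.8, 8.5, 14.1, 11.2, 10.5, 13.2, 9.8, 11.8]


def loo_errors(college, wnba):
    """Leave-one-out errors for a projection of the form wnba = factor * college.

    For each player, the factor is fitted on the other players only,
    then used to project the held-out player.
    """
    errors = []
    for i in range(len(college)):
        train_c = college[:i] + college[i + 1:]
        train_w = wnba[:i] + wnba[i + 1:]
        factor = sum(train_w) / sum(train_c)
        errors.append(wnba[i] - factor * college[i])
    return errors


errors = loo_errors(college_ppg, wnba_ppg)
mae = statistics.mean(abs(e) for e in errors)
worst = max(abs(e) for e in errors)
print(f"Leave-one-out MAE: {mae:.2f} PPG")
print(f"Largest miss: {worst:.2f} PPG")
```

The spread of these held-out errors gives an empirical range to report around a projection, and it widens or narrows with the data instead of being fixed in advance.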
Effective Prospect Evaluation
- Adjust for conference strength and competition quality
- When production is similar, favor the younger player (more upside)
- Prioritize efficiency over raw scoring volume
- Consider versatility—players with multiple skills translate better
- Account for positional scarcity in WNBA evaluation
Common Projection Mistakes
- Overvaluing high-usage scorers on weak college teams
- Ignoring shooting efficiency and defensive ability
- Not adjusting for age (senior dominance vs freshman potential)
- Underestimating the physicality gap from college to pros
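Some of these mistakes can be caught with simple screening rules before any model is run. The sketch below flags prospects whose scoring volume is not backed by efficiency or competition level, or who are old for their draft class. The thresholds, field names, and the two example prospects are hypothetical and would need tuning against real draft data.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    ppg: float
    ts_pct: float
    conference_strength: float  # same scale as conference_strength in the examples above
    age_at_draft: float


def red_flags(p, min_ts=0.55, min_conf=7.0, volume_ppg=18.0, max_age=22.5):
    """Return a list of warning strings for a prospect (all thresholds are assumptions)."""
    flags = []
    high_volume = p.ppg >= volume_ppg
    if high_volume and p.conference_strength < min_conf:
        flags.append("high-volume scorer against weak competition")
    if high_volume and p.ts_pct < min_ts:
        flags.append("scoring volume without efficiency")
    if p.age_at_draft > max_age:
        flags.append("older than most of the draft class")
    return flags


# Two made-up prospects for illustration
prospect_x = Prospect("Prospect X", ppg=21.0, ts_pct=0.52,
                      conference_strength=6.5, age_at_draft=22.8)
prospect_y = Prospect("Prospect Y", ppg=16.0, ts_pct=0.60,
                      conference_strength=8.5, age_at_draft=21.4)

for p in (prospect_x, prospect_y):
    flags = red_flags(p)
    print(f"{p.name}: {', '.join(flags) if flags else 'no flags'}")
```

A flag is a prompt for closer scouting, not a verdict. Defensive ability and physicality, the other items on the list, are not captured by these box-score fields.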