College to WNBA Translation

Beginner · 10 min read · Nov 27, 2025

Draft Prospect Evaluation

Projecting college players to WNBA success is one of the most challenging aspects of basketball analytics. Performance in the NCAA doesn't always translate directly to professional success due to differences in competition level, pace, physicality, and team context.

Key Translation Factors

  • Statistical Performance: Points, rebounds, assists per 40 minutes
  • Efficiency Metrics: True shooting %, PER, win shares
  • Competition Level: Conference strength adjustment
  • Age and Experience: Age-relative-to-competition matters
  • Physical Attributes: Height, wingspan, athleticism
  • Skill Versatility: Shooting range, defensive ability, playmaking
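
Two of the factors above, per-40-minute rates and true shooting percentage, are simple formulas. The sketch below computes both with the standard library only. The helper names and the season line are hypothetical and made up for illustration; the 0.44 free-throw weight is the usual true shooting convention.

```python
def per_40(stat_per_game: float, minutes_per_game: float) -> float:
    """Scale a per-game stat to a per-40-minute rate."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return stat_per_game * 40.0 / minutes_per_game


def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta
    if attempts <= 0:
        raise ValueError("player has no shooting attempts")
    return points / (2.0 * attempts)


# Hypothetical season line (made-up numbers, not a real player)
season = {"points": 640, "fga": 450, "fta": 160, "ppg": 20.0, "mpg": 32.0}

pts_per_40 = per_40(season["ppg"], season["mpg"])
ts_pct = true_shooting_pct(season["points"], season["fga"], season["fta"])

print(f"Points per 40: {pts_per_40:.1f}")
print(f"True shooting: {ts_pct:.3f}")
```

Using each player's actual minutes, rather than a flat assumption, is what makes per-40 rates comparable between starters and bench players.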

Python: College-to-WNBA Projection Model

Python: WNBA Draft Prospect Evaluation

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset: Historical NCAA to WNBA player data
# In practice, this would be a comprehensive database of draft picks and their careers

historical_prospects = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E',
               'Player F', 'Player G', 'Player H', 'Player I', 'Player J'],
    'college_ppg': [22.5, 19.8, 18.2, 16.5, 21.2, 17.8, 15.5, 20.1, 14.2, 19.5],
    'college_rpg': [9.5, 6.8, 5.2, 10.2, 7.5, 4.8, 8.5, 5.5, 9.8, 6.2],
    'college_apg': [3.8, 5.2, 6.8, 2.5, 3.2, 7.5, 2.8, 4.2, 3.5, 5.8],
    'college_ts_pct': [0.585, 0.562, 0.545, 0.598, 0.575, 0.538, 0.612, 0.555, 0.602, 0.548],
    'college_per': [28.5, 24.8, 22.5, 26.2, 27.1, 23.5, 25.8, 24.2, 26.5, 23.8],
    'height_inches': [75, 71, 69, 78, 74, 68, 76, 72, 77, 70],
    'age_at_draft': [21.5, 22.2, 21.8, 22.5, 21.2, 22.8, 21.5, 22.0, 21.8, 22.5],
    'conference_strength': [8.5, 7.2, 9.1, 6.8, 8.2, 8.8, 7.5, 8.0, 7.8, 8.5],
    # WNBA career outcomes (target variables)
    'wnba_ppg': [15.2, 12.5, 10.8, 8.5, 14.1, 11.2, 10.5, 13.2, 9.8, 11.8],
    'wnba_per': [18.5, 15.2, 14.1, 13.8, 17.2, 14.5, 15.8, 16.1, 14.2, 15.0],
    'wnba_minutes': [28.5, 24.2, 22.5, 18.2, 26.8, 23.5, 21.5, 25.2, 20.5, 24.0]
})

# =============================================================================
# 1. Feature Engineering
# =============================================================================

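def engineer_features(df):
    """Create advanced features for projection model"""

    # Work on a copy so the caller's DataFrame is not modified in place
    df = df.copy()

    # Per-40 minute stats. The sample data has no minutes column, so a flat
    # 35 mpg is assumed; use each player's actual minutes when available.
    df['college_pts_per_40'] = df['college_ppg'] * (40 / 35)
    df['college_reb_per_40'] = df['college_rpg'] * (40 / 35)
    df['college_ast_per_40'] = df['college_apg'] * (40 / 35)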

    # Age-adjusted metrics (younger = more upside)
    df['age_adjustment'] = 23 - df['age_at_draft']  # Positive = younger
    df['age_adj_per'] = df['college_per'] * (1 + 0.05 * df['age_adjustment'])

    # Conference-adjusted stats
    df['adj_ppg'] = df['college_ppg'] * (df['conference_strength'] / 8.0)
    df['adj_per'] = df['college_per'] * (df['conference_strength'] / 8.0)

    # Versatility score (combination of skills)
    df['versatility'] = (
        (df['college_ppg'] / 20) +
        (df['college_rpg'] / 8) +
        (df['college_apg'] / 5)
    ) / 3

    # Height-adjusted scoring
    df['height_adj_scoring'] = df['college_ppg'] / (df['height_inches'] / 72)

    return df

historical_prospects = engineer_features(historical_prospects)

# =============================================================================
# 2. Build Projection Model
# =============================================================================

# Select features for model
features = ['college_pts_per_40', 'college_reb_per_40', 'college_ast_per_40',
            'college_ts_pct', 'age_adj_per', 'height_inches',
            'conference_strength', 'versatility']

X = historical_prospects[features]
y_ppg = historical_prospects['wnba_ppg']
y_per = historical_prospects['wnba_per']

# Split data (in practice, use larger dataset)
X_train, X_test, y_train_ppg, y_test_ppg = train_test_split(
    X, y_ppg, test_size=0.2, random_state=42
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Random Forest model
model_ppg = RandomForestRegressor(
    n_estimators=100,
    max_depth=5,
    min_samples_split=2,
    random_state=42
)

model_ppg.fit(X_train_scaled, y_train_ppg)

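# Evaluate model. The hold-out set contains only 2 players here, so treat the
# test metrics as a smoke test rather than a real estimate of accuracy.
train_score = model_ppg.score(X_train_scaled, y_train_ppg)
test_predictions = model_ppg.predict(X_test_scaled)
test_r2 = r2_score(y_test_ppg, test_predictions)
test_rmse = np.sqrt(mean_squared_error(y_test_ppg, test_predictions))

print("=== WNBA PPG Projection Model ===")
print(f"Training R²: {train_score:.3f}")
print(f"Test R²: {test_r2:.3f}")
print(f"Test RMSE: {test_rmse:.2f} PPG")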

# Feature importance
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': model_ppg.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance for WNBA Success:")
print(feature_importance)

# =============================================================================
# 3. Translation Factors
# =============================================================================

def calculate_translation_factor(college_stats, wnba_stats, stat_name):
    """Calculate average translation factor from college to WNBA"""

    college_avg = college_stats.mean()
    wnba_avg = wnba_stats.mean()

    translation = wnba_avg / college_avg

    return translation

# Calculate translation factors
ppg_factor = calculate_translation_factor(
    historical_prospects['college_ppg'],
    historical_prospects['wnba_ppg'],
    'PPG'
)

per_factor = calculate_translation_factor(
    historical_prospects['college_per'],
    historical_prospects['wnba_per'],
    'PER'
)

print("\n=== College to WNBA Translation Factors ===")
print(f"PPG Translation: {ppg_factor:.3f} (WNBA = {ppg_factor:.1%} of college)")
print(f"PER Translation: {per_factor:.3f} (WNBA = {per_factor:.1%} of college)")

# =============================================================================
# 4. Prospect Evaluation Function
# =============================================================================

def evaluate_prospect(prospect_data, model, scaler, features):
    """Evaluate a college prospect's WNBA projection"""

    # Engineer features
    prospect_df = pd.DataFrame([prospect_data])
    prospect_df = engineer_features(prospect_df)

    # Prepare features
    X_prospect = prospect_df[features]
    X_prospect_scaled = scaler.transform(X_prospect)

    # Predict WNBA performance
    predicted_ppg = model.predict(X_prospect_scaled)[0]

    # Heuristic ±15% band (a placeholder, not a statistical confidence interval)
    confidence_range = predicted_ppg * 0.15

    return {
        'predicted_ppg': predicted_ppg,
        'range_low': predicted_ppg - confidence_range,
        'range_high': predicted_ppg + confidence_range,
        'projection_grade': classify_projection(predicted_ppg)
    }

def classify_projection(ppg):
    """Classify prospect tier based on projected PPG"""
    if ppg >= 15:
        return "Star (Top 5 Pick)"
    elif ppg >= 12:
        return "Solid Starter (Top 15 Pick)"
    elif ppg >= 9:
        return "Role Player (Roster Spot)"
    else:
        return "Developmental (Borderline)"

# Example: Evaluate new prospect
new_prospect = {
    'college_ppg': 20.5,
    'college_rpg': 8.2,
    'college_apg': 4.5,
    'college_ts_pct': 0.580,
    'college_per': 26.8,
    'height_inches': 74,
    'age_at_draft': 21.8,
    'conference_strength': 8.5
}

projection = evaluate_prospect(new_prospect, model_ppg, scaler, features)

print("\n=== New Prospect Evaluation ===")
print(f"Projected WNBA PPG: {projection['predicted_ppg']:.1f}")
print(f"Confidence Range: {projection['range_low']:.1f} - {projection['range_high']:.1f}")
print(f"Projection Grade: {projection['projection_grade']}")

# =============================================================================
# 5. Position-Specific Analysis
# =============================================================================

# In practice, separate models for guards vs forwards vs posts
position_adjustments = {
    'Guard': {'ppg_weight': 1.1, 'ast_weight': 1.3, 'reb_weight': 0.8},
    'Forward': {'ppg_weight': 1.0, 'ast_weight': 1.0, 'reb_weight': 1.1},
    'Post': {'ppg_weight': 0.9, 'ast_weight': 0.7, 'reb_weight': 1.3}
}

print("\n=== Position-Specific Adjustments ===")
for position, weights in position_adjustments.items():
    print(f"{position}: {weights}")

print("\n=== Translation Model Complete ===")
print("✓ Feature engineering")
print("✓ Projection model trained")
print("✓ Translation factors calculated")
print("✓ Prospect evaluation framework")
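
The position weights at the end of the Python listing are printed but never applied. One possible way to use them, shown here as a sketch rather than the article's method, is to weight the three components of the versatility score. The function name `position_weighted_score` is hypothetical; the weights, the 20/8/5 normalizers, and the example stat line are copied from the listing above.

```python
# Weights copied from the position_adjustments table in the listing above
position_adjustments = {
    'Guard': {'ppg_weight': 1.1, 'ast_weight': 1.3, 'reb_weight': 0.8},
    'Forward': {'ppg_weight': 1.0, 'ast_weight': 1.0, 'reb_weight': 1.1},
    'Post': {'ppg_weight': 0.9, 'ast_weight': 0.7, 'reb_weight': 1.3},
}


def position_weighted_score(ppg, rpg, apg, position):
    """Versatility-style score with position weights (an assumed combination)."""
    w = position_adjustments[position]
    return (
        w['ppg_weight'] * ppg / 20
        + w['reb_weight'] * rpg / 8
        + w['ast_weight'] * apg / 5
    ) / 3


# Stat line of the example prospect evaluated in the listing above
ppg, rpg, apg = 20.5, 8.2, 4.5

scores = {
    position: position_weighted_score(ppg, rpg, apg, position)
    for position in position_adjustments
}

for position, score in scores.items():
    print(f"{position}: {score:.3f}")
```

The same stat line scores differently depending on the role it is judged against, which is the motivation for fitting separate models per position once enough data is available.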

R: NCAA to WNBA Projection with wehoop

library(wehoop)
library(tidyverse)
library(randomForest)
library(caret)

# The wehoop package provides access to both WNBA and NCAA women's basketball data

# =============================================================================
# 1. Load NCAA Women's Basketball Data
# =============================================================================

# Load NCAA women's basketball player box scores
ncaa_players <- wehoop::load_wbb_player_box(seasons = 2023)

# Calculate season averages for college players
ncaa_season_stats <- ncaa_players %>%
  group_by(athlete_id, athlete_display_name, team_display_name) %>%
  summarise(
    games = n(),
    college_ppg = mean(points, na.rm = TRUE),
    college_rpg = mean(rebounds, na.rm = TRUE),
    college_apg = mean(assists, na.rm = TRUE),
    college_minutes = mean(minutes, na.rm = TRUE),
    college_fgm = sum(field_goals_made, na.rm = TRUE),
    college_fga = sum(field_goals_attempted, na.rm = TRUE),
    college_fg3m = sum(three_point_field_goals_made, na.rm = TRUE),
    college_ftm = sum(free_throws_made, na.rm = TRUE),
    college_fta = sum(free_throws_attempted, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(games >= 15) %>%
  mutate(
    # Calculate efficiency metrics
    college_ts_pct = college_ppg / (2 * (college_fga/games + 0.44 * college_fta/games)),
    college_fg_pct = college_fgm / college_fga,

    # Per-40 minute stats
    college_pts_per_40 = college_ppg * (40 / college_minutes),
    college_reb_per_40 = college_rpg * (40 / college_minutes),
    college_ast_per_40 = college_apg * (40 / college_minutes),

    # Crude productivity proxy (points + rebounds + assists per 40), not true PER
    college_per = (college_ppg + college_rpg + college_apg) / college_minutes * 40
  )

cat("=== Top NCAA Prospects by Stats ===\n")
print(ncaa_season_stats %>%
  select(athlete_display_name, team_display_name, college_ppg,
         college_rpg, college_apg, college_ts_pct) %>%
  arrange(desc(college_ppg)) %>%
  head(10))

# =============================================================================
# 2. Historical Translation Analysis
# =============================================================================

# In practice, you would have a database linking NCAA players to WNBA careers
# For demonstration, we'll create a synthetic historical dataset

set.seed(42)  # Make the synthetic data reproducible
historical_translation <- tibble(
  player = paste("Player", LETTERS[1:20]),
  college_ppg = runif(20, 12, 25),
  college_rpg = runif(20, 4, 11),
  college_apg = runif(20, 2, 8),
  college_ts_pct = runif(20, 0.50, 0.65),
  college_per = runif(20, 20, 32),
  height_inches = sample(68:78, 20, replace = TRUE),
  age_at_draft = runif(20, 21, 23),
  conference_strength = runif(20, 6, 10),
  # WNBA outcomes
  wnba_ppg = NA,
  wnba_per = NA
)

# Synthetic assumption (not measured): WNBA PPG is 55-68% of college PPG
# and WNBA PER is 58-72% of college PER
historical_translation <- historical_translation %>%
  mutate(
    wnba_ppg = college_ppg * runif(n(), 0.55, 0.68),
    wnba_per = college_per * runif(n(), 0.58, 0.72),
    wnba_minutes = 20 + (college_ppg - 12) * 0.8 + rnorm(n(), 0, 2)
  )

# =============================================================================
# 3. Calculate Translation Factors
# =============================================================================

translation_factors <- historical_translation %>%
  summarise(
    ppg_factor = mean(wnba_ppg / college_ppg, na.rm = TRUE),
    per_factor = mean(wnba_per / college_per, na.rm = TRUE),
    ppg_sd = sd(wnba_ppg / college_ppg, na.rm = TRUE)
  )

cat("\n=== College to WNBA Translation Factors ===\n")
cat(sprintf("PPG Translation: %.3f (±%.3f)\n",
            translation_factors$ppg_factor,
            translation_factors$ppg_sd))
cat(sprintf("PER Translation: %.3f\n",
            translation_factors$per_factor))
cat(sprintf("\nInterpretation: WNBA PPG ≈ %.1f%% of College PPG\n",
            translation_factors$ppg_factor * 100))

# =============================================================================
# 4. Feature Engineering
# =============================================================================

prepare_prospect_features <- function(data) {
  data %>%
    mutate(
      # Age adjustment (younger players have more upside)
      age_adjustment = 23 - age_at_draft,
      age_adj_per = college_per * (1 + 0.05 * age_adjustment),

      # Conference-adjusted stats
      conf_adj_ppg = college_ppg * (conference_strength / 8),

      # Versatility score
      versatility = (college_ppg/20 + college_rpg/8 + college_apg/5) / 3,

      # Per-40 minute rates
      pts_per_40 = college_ppg * (40 / 35),  # Assume 35 mpg
      reb_per_40 = college_rpg * (40 / 35),
      ast_per_40 = college_apg * (40 / 35)
    )
}

historical_translation <- prepare_prospect_features(historical_translation)

# =============================================================================
# 5. Build Projection Model
# =============================================================================

# Select features
model_features <- c("pts_per_40", "reb_per_40", "ast_per_40",
                   "college_ts_pct", "age_adj_per", "height_inches",
                   "conference_strength", "versatility")

# Prepare training data
model_data <- historical_translation %>%
  select(all_of(model_features), wnba_ppg) %>%
  drop_na()

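# Split data with a plain random 80/20 split (16 train, 4 test rows here).
# createDataPartition() rounds up within quantile groups, so on a dataset
# this small it can assign nearly every row to training.
set.seed(42)
train_index <- sample(seq_len(nrow(model_data)), size = floor(0.8 * nrow(model_data)))
train_data <- model_data[train_index, ]
test_data <- model_data[-train_index, ]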

# Train Random Forest model
rf_model <- randomForest(
  wnba_ppg ~ .,
  data = train_data,
  ntree = 100,
  mtry = 3,
  importance = TRUE
)

# Model performance
train_pred <- predict(rf_model, train_data)
test_pred <- predict(rf_model, test_data)

train_r2 <- cor(train_pred, train_data$wnba_ppg)^2
test_r2 <- cor(test_pred, test_data$wnba_ppg)^2

cat("\n=== WNBA Projection Model Performance ===\n")
cat(sprintf("Training R²: %.3f\n", train_r2))
cat(sprintf("Testing R²: %.3f\n", test_r2))

# Feature importance
importance_df <- importance(rf_model) %>%
  as.data.frame() %>%
  rownames_to_column("feature") %>%
  arrange(desc(`%IncMSE`))

cat("\n=== Feature Importance ===\n")
print(importance_df)

# =============================================================================
# 6. Prospect Evaluation Function
# =============================================================================

evaluate_wnba_prospect <- function(prospect_stats, model, translation_factors) {
  # Prepare prospect data
  prospect_df <- prepare_prospect_features(prospect_stats)

  # Predict WNBA performance
  predicted_ppg <- predict(model, newdata = prospect_df)

  # Heuristic ±20% band (a placeholder, not a statistical confidence interval)
  conf_interval <- predicted_ppg * 0.20

  # Classification
  grade <- case_when(
    predicted_ppg >= 15 ~ "Star Potential (Top 5)",
    predicted_ppg >= 12 ~ "Solid Starter (Top 15)",
    predicted_ppg >= 9 ~ "Role Player",
    TRUE ~ "Developmental"
  )

  tibble(
    predicted_wnba_ppg = predicted_ppg,
    range_low = predicted_ppg - conf_interval,
    range_high = predicted_ppg + conf_interval,
    projection_grade = grade
  )
}

# Example: Evaluate a top college prospect
top_college_prospect <- tibble(
  college_ppg = 22.5,
  college_rpg = 9.2,
  college_apg = 4.8,
  college_ts_pct = 0.595,
  college_per = 28.5,
  height_inches = 75,
  age_at_draft = 21.5,
  conference_strength = 9.0
)

prospect_evaluation <- evaluate_wnba_prospect(
  top_college_prospect,
  rf_model,
  translation_factors
)

cat("\n=== Prospect Evaluation ===\n")
cat(sprintf("College PPG: %.1f\n", top_college_prospect$college_ppg))
cat(sprintf("Projected WNBA PPG: %.1f\n", prospect_evaluation$predicted_wnba_ppg))
cat(sprintf("Confidence Range: %.1f - %.1f\n",
            prospect_evaluation$range_low,
            prospect_evaluation$range_high))
cat(sprintf("Grade: %s\n", prospect_evaluation$projection_grade))

# =============================================================================
# 7. Draft Class Evaluation
# =============================================================================

# Evaluate multiple prospects
# evaluate_wnba_prospect() is vectorised over rows, so no row-wise loop is
# needed; translation_factors is passed explicitly instead of read as a global
evaluate_draft_class <- function(prospects_df, model, translation_factors) {
  bind_cols(
    prospects_df,
    evaluate_wnba_prospect(prospects_df, model, translation_factors)
  ) %>%
    arrange(desc(predicted_wnba_ppg))
}

cat("\n=== Translation Framework Complete ===\n")
cat("✓ NCAA data loaded and processed\n")
cat("✓ Translation factors calculated\n")
cat("✓ Projection model trained\n")
cat("✓ Prospect evaluation framework ready\n")
cat("✓ Draft class analysis capability\n")

Translation Challenges

College-to-WNBA projection is inherently uncertain. Factors beyond statistics—work ethic, basketball IQ, injury history, and fit with team system—play crucial roles in professional success. Statistical models provide a baseline expectation, but scouting and qualitative evaluation remain essential.
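
To put a rough number on that uncertainty, the sketch below runs a leave-one-out check of the simplest possible baseline: multiply college PPG by the average translation factor. It reuses the ten sample values from the Python listing above, so the result describes toy data only. The function names are hypothetical, and leave-one-out is swapped in here in place of the listing's single train/test split because ten players are too few to hold out a meaningful test set.

```python
from statistics import mean

# Sample values copied from the historical_prospects table above
college_ppg = [22.5, 19.8, 18.2, 16.5, 21.2, 17.8, 15.5, 20.1, 14.2, 19.5]
wnba_ppg = [15.2, 12.5, 10.8, 8.5, 14.1, 11.2, 10.5, 13.2, 9.8, 11.8]


def translation_factor(college, pro):
    """Ratio of average pro production to average college production."""
    return mean(pro) / mean(college)


def leave_one_out_errors(college, pro):
    """Predict each player from a factor fitted on all the other players."""
    errors = []
    for i in range(len(college)):
        train_college = college[:i] + college[i + 1:]
        train_pro = pro[:i] + pro[i + 1:]
        factor = translation_factor(train_college, train_pro)
        prediction = factor * college[i]
        errors.append(pro[i] - prediction)
    return errors


overall_factor = translation_factor(college_ppg, wnba_ppg)
errors = leave_one_out_errors(college_ppg, wnba_ppg)
mae = mean(abs(e) for e in errors)
worst = max(abs(e) for e in errors)

print(f"Overall PPG translation factor: {overall_factor:.3f}")
print(f"Leave-one-out mean absolute error: {mae:.2f} PPG")
print(f"Largest single miss: {worst:.2f} PPG")
```

An error spread measured this way is a more defensible basis for a projection range than a fixed percentage band, although with real data it should be computed on far more than ten players.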

Effective Prospect Evaluation

  • Adjust for conference strength and competition quality
  • When production is similar, favor the younger player (more upside)
  • Prioritize efficiency over raw scoring volume
  • Consider versatility—players with multiple skills translate better
  • Account for positional scarcity in WNBA evaluation
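
The first two guidelines can be sketched with the same adjustment formulas used in the feature engineering code above: a 5% bonus per year younger than 23, and scaling by conference strength relative to a baseline of 8. Chaining the two adjustments into one number is an assumption made for this illustration, and the three prospects are invented.

```python
def age_adjusted(value, age_at_draft, reference_age=23.0, bonus_per_year=0.05):
    """Boost production for players younger than the reference age."""
    return value * (1 + bonus_per_year * (reference_age - age_at_draft))


def conference_adjusted(value, conference_strength, baseline=8.0):
    """Scale production by conference strength relative to a baseline."""
    return value * conference_strength / baseline


# Invented prospects with identical raw PER but different contexts
prospects = [
    {"name": "Prospect X", "college_per": 26.0, "age_at_draft": 21.0, "conference_strength": 8.0},
    {"name": "Prospect Y", "college_per": 26.0, "age_at_draft": 23.0, "conference_strength": 8.0},
    {"name": "Prospect Z", "college_per": 26.0, "age_at_draft": 22.0, "conference_strength": 7.0},
]

for p in prospects:
    by_age = age_adjusted(p["college_per"], p["age_at_draft"])
    p["adjusted_per"] = conference_adjusted(by_age, p["conference_strength"])

ranked = sorted(prospects, key=lambda p: p["adjusted_per"], reverse=True)

for p in ranked:
    print(f"{p['name']}: raw PER {p['college_per']:.1f}, adjusted {p['adjusted_per']:.2f}")
```

With equal raw production, the younger player in an average conference ranks first, and the player from the weaker conference drops to last despite being a year younger than Prospect Y.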

Common Projection Mistakes

  • Overvaluing high-usage scorers on weak college teams
  • Ignoring shooting efficiency and defensive ability
  • Not adjusting for age (senior dominance vs freshman potential)
  • Underestimating the physicality gap from college to pros
