NHL Draft Analytics

Beginner | 10 min read | Nov 27, 2025

NHL Draft Prediction Models

NHL draft success is notoriously difficult to predict, but analytics can sharpen scouting decisions. By analyzing historical draft data and player development patterns, teams can identify which pre-draft metrics best predict NHL success and build models to evaluate prospects.

Key Draft Analytics Components

  • Performance Metrics: Points per game, goals, assists, plus/minus
  • Physical Attributes: Height, weight, skating speed, age at draft
  • League Quality Scores: Adjusting for competition level
  • Success Definition: Games played, career longevity, impact metrics
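
The league-quality and success-definition components can be sketched as a small feature-preparation step. This is a minimal illustration, not a reference implementation: the league multipliers are hypothetical placeholders rather than published translation factors, and the column names `league`, `league_factor`, `nhl_games_played`, and `adj_points_per_game` are assumptions. Only the 100-game threshold comes from the model code in this article.

```python
import pandas as pd

# Hypothetical league multipliers (illustrative only, not real translation factors)
LEAGUE_FACTOR = {"OHL": 1.00, "USHL": 0.85, "SHL": 1.60}

NHL_REGULAR_GAMES = 100  # success threshold: 100+ NHL games


def prepare_prospects(df: pd.DataFrame) -> pd.DataFrame:
    """Add a league-adjusted scoring rate and a binary success label."""
    out = df.copy()
    # Scale raw points per game by the strength of the league it was earned in
    out["league_factor"] = out["league"].map(LEAGUE_FACTOR)
    out["adj_points_per_game"] = out["points_per_game"] * out["league_factor"]
    # Label: 1 if the player reached the games-played threshold, else 0
    out["nhl_success"] = (out["nhl_games_played"] >= NHL_REGULAR_GAMES).astype(int)
    return out


# Made-up prospects for demonstration
prospects = pd.DataFrame({
    "player": ["A", "B", "C"],
    "league": ["OHL", "USHL", "SHL"],
    "points_per_game": [1.20, 1.50, 0.50],
    "nhl_games_played": [250, 12, 100],
})
prepared = prepare_prospects(prospects)
print(prepared[["player", "adj_points_per_game", "nhl_success"]])
```

Keeping the adjustment and the label in one function makes it easy to change the success definition (for example, a different games threshold) and see how sensitive the model is to that choice.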

Building a Draft Prediction Model

Python: Random Forest Draft Model

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load draft and performance data
draft_data = pd.read_csv('nhl_draft_history.csv')

# Feature engineering for draft prediction
features = ['points_per_game', 'goals', 'assists',
            'plus_minus', 'age_at_draft', 'height_inches',
            'weight_lbs', 'league_quality_score']

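# Features (X) and target (y): nhl_success flags whether the player
# became an NHL regular (100+ games)
X = draft_data[features]
y = draft_data['nhl_success']

# Split data; stratify so train and test keep a similar success rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features (optional here: random forests are insensitive to feature scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)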

# Train draft prediction model
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=20,
    random_state=42
)

model.fit(X_train_scaled, y_train)

# Evaluate model
train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)

print(f"Training Accuracy: {train_score:.3f}")
print(f"Testing Accuracy: {test_score:.3f}")

# Feature importance for draft evaluation
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nMost Important Draft Factors:")
print(feature_importance)

# Predict success probability for new prospect
new_prospect = pd.DataFrame({
    'points_per_game': [1.45],
    'goals': [42],
    'assists': [38],
    'plus_minus': [15],
    'age_at_draft': [18.2],
    'height_inches': [73],
    'weight_lbs': [195],
    'league_quality_score': [8.5]
})

prospect_scaled = scaler.transform(new_prospect)
# predict_proba columns follow model.classes_; column 1 is the second class
success_prob = model.predict_proba(prospect_scaled)[0, 1]

print(f"\nProspect NHL Success Probability: {success_prob:.1%}")
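
Accuracy alone can flatter a draft model: most drafted players never become NHL regulars, so always predicting "no" already scores well. A minimal sketch of checking the model against that baseline follows. It runs on synthetic stand-in data generated with `make_classification` (not real draft history), and the roughly 20% positive rate is an assumption chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for draft data: 8 features, roughly 20% positive class
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = RandomForestClassifier(
    n_estimators=200, max_depth=10, min_samples_split=20, random_state=42
).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, model.predict(X_test))
# ROC AUC scores the ranking of predicted probabilities rather than hard 0/1 calls
model_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy:    {model_acc:.3f}")
print(f"Model ROC AUC:     {model_auc:.3f}")
```

If the model's accuracy is only marginally above the baseline, a ranking metric such as ROC AUC gives a clearer picture of whether it separates future regulars from the rest.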

R: Draft Prediction and Analysis

library(tidyverse)
library(randomForest)
library(caret)

# Load NHL draft data
draft_data <- read_csv("nhl_draft_history.csv")

# Prepare features for modeling
draft_features <- draft_data %>%
  select(points_per_game, goals, assists, plus_minus,
         age_at_draft, height_inches, weight_lbs,
         league_quality_score, nhl_success)

# Split data
set.seed(42)
# Stratify on the class label so train and test keep similar success rates
train_index <- createDataPartition(as.factor(draft_features$nhl_success),
                                   p = 0.8, list = FALSE)
train_data <- draft_features[train_index, ]
test_data <- draft_features[-train_index, ]

# Train random forest model
rf_model <- randomForest(
  as.factor(nhl_success) ~ .,
  data = train_data,
  ntree = 200,
  mtry = 3,
  importance = TRUE
)

# Model performance
train_pred <- predict(rf_model, train_data)
test_pred <- predict(rf_model, test_data)

train_acc <- mean(train_pred == train_data$nhl_success)
test_acc <- mean(test_pred == test_data$nhl_success)

cat(sprintf("Training Accuracy: %.3f\n", train_acc))
cat(sprintf("Testing Accuracy: %.3f\n", test_acc))

# Variable importance
importance_df <- importance(rf_model) %>%
  as.data.frame() %>%
  rownames_to_column("variable") %>%
  arrange(desc(MeanDecreaseGini))

cat("Most Important Draft Factors:\n")
print(importance_df)

# Predict for new prospect
new_prospect <- data.frame(
  points_per_game = 1.45,
  goals = 42,
  assists = 38,
  plus_minus = 15,
  age_at_draft = 18.2,
  height_inches = 73,
  weight_lbs = 195,
  league_quality_score = 8.5
)

# type = "prob" returns one column per class; column 2 is the second factor level
success_prob <- predict(rf_model, new_prospect, type = "prob")
cat(sprintf("\nProspect NHL Success Probability: %.1f%%\n",
            success_prob[1, 2] * 100))

# Visualize feature importance
ggplot(importance_df[1:8, ],
       aes(x = reorder(variable, MeanDecreaseGini),
           y = MeanDecreaseGini)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "NHL Draft Prediction - Feature Importance",
       x = "Feature", y = "Importance Score") +
  theme_minimal()

Draft Round Analysis

Historical success rates vary dramatically by draft round. First-round picks have roughly a 60% chance of playing 100+ NHL games, while seventh-round picks reach that mark only about 10% of the time. These baseline rates help put individual predictions in context: a 30% projection is weak for a first-round pick but strong for a seventh-rounder.
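
One way to use these baselines is to compute the success rate per round from historical data and express a model's probability as a multiple of it. The sketch below uses a tiny made-up sample sized to reproduce the approximate 60% and 10% figures; the `draft_round` column and the `lift_over_round` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Tiny made-up sample; real use would load the full draft history
history = pd.DataFrame({
    "draft_round": [1] * 5 + [7] * 10,
    "nhl_success": [1, 1, 1, 0, 0] + [1] + [0] * 9,
})

# Baseline success rate and sample size per round
round_rates = (
    history.groupby("draft_round")["nhl_success"]
    .agg(success_rate="mean", n_players="count")
)
print(round_rates)


def lift_over_round(model_prob: float, draft_round: int) -> float:
    """Ratio of a model probability to the round's historical base rate."""
    return model_prob / round_rates.loc[draft_round, "success_rate"]


# The same 30% projection is below the round 1 base rate but above the round 7 one
print(f"Round 1 lift: {lift_over_round(0.30, 1):.2f}")
print(f"Round 7 lift: {lift_over_round(0.30, 7):.2f}")
```

A lift below 1 means the model rates the prospect below a typical pick from that round, and a lift above 1 means above typical. Reporting `n_players` alongside the rate is a reminder that late-round baselines can rest on few successes.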

Key Predictive Factors

  • Points per game is typically the strongest predictor
  • Age at draft matters: younger players with similar stats tend to have higher upside
  • League quality adjustments are critical for fair comparisons
  • Physical attributes have moderate predictive power
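
Rankings like these usually come from a random forest's built-in importances, which are impurity-based and can be misleading for high-cardinality features. Permutation importance on held-out data is a common cross-check. The sketch below is a minimal illustration on synthetic data in which success is driven mainly by scoring rate by construction, so it demonstrates the method rather than confirming the ranking; the coefficients and distributions are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1500

# Synthetic prospects: success depends mostly on scoring rate BY CONSTRUCTION
X = pd.DataFrame({
    "points_per_game": rng.normal(1.0, 0.3, n),
    "age_at_draft": rng.uniform(17.8, 19.0, n),
    "height_inches": rng.normal(73, 2, n),
})
signal = 4.0 * (X["points_per_game"] - 1.0) - 1.0 * (X["age_at_draft"] - 18.4)
y = (signal + rng.normal(0, 0.5, n) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the score drop
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
perm_importance = pd.Series(
    result.importances_mean, index=X.columns
).sort_values(ascending=False)
print(perm_importance)
```

On real draft data, a feature that ranks high on impurity-based importance but near zero on permutation importance deserves a closer look before it influences scouting decisions.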
