Chapter 20: Recruiting Analytics
Introduction
Recruiting is the lifeblood of college football programs. Unlike professional sports where teams can acquire talent through trades and free agency, college football programs build their rosters primarily through recruiting high school prospects. This chapter explores how data analytics transforms the recruiting process, from evaluating individual prospects to building optimized recruiting classes.
The application of analytics to recruiting addresses fundamental questions: How do we identify which high school players will succeed at the college level? How should programs allocate limited recruiting resources across thousands of prospects? Can we predict which players will develop into starters, stars, or NFL draft picks?
Learning Objectives:
By the end of this chapter, you will be able to:
- Understand the recruiting data landscape and evaluation metrics
- Build prospect evaluation models using measurables and performance data
- Analyze recruiting class composition and team needs
- Predict player development trajectories
- Evaluate recruiting efficiency across programs
- Apply machine learning to recruit ranking and classification
20.1 The Recruiting Data Landscape
Types of Recruiting Data
Modern recruiting analytics draws from multiple data sources:
Physical Measurables:
- Height, weight, body composition
- Speed metrics (40-yard dash, shuttle times)
- Explosiveness (vertical jump, broad jump)
- Position-specific measurements

Performance Data:
- High school statistics
- Camp and combine performances
- Game film evaluations
- All-star game appearances

Contextual Information:
- Competition level (state, classification)
- Team quality and scheme
- Teammate quality
- Coaching quality

Evaluation Scores:
- Composite ratings (247Sports, Rivals, ESPN, On3)
- Position-specific grades
- Star ratings (2-5 star scale)
- National/state/position rankings
Understanding Composite Ratings
The industry standard uses star ratings on a 2-5 scale, with only roughly the top 30 players nationally receiving 5-star designations each cycle. Understanding what these ratings represent is essential:
import numpy as np
import pandas as pd
from typing import Dict, List, Tuple
class RecruitingRatingsExplainer:
"""
Explain and analyze recruiting rating systems.
"""
def __init__(self):
# Approximate distributions by star rating
self.star_distributions = {
5: {'count_per_year': 30, 'percentile_min': 0.9985},
4: {'count_per_year': 320, 'percentile_min': 0.98},
3: {'count_per_year': 1800, 'percentile_min': 0.90},
2: {'count_per_year': 5000, 'percentile_min': 0.70}
}
# Historical outcomes by star rating
self.outcomes = {
5: {
'start_rate': 0.85,
'all_conference_rate': 0.45,
'draft_rate': 0.55,
'first_round_rate': 0.20
},
4: {
'start_rate': 0.65,
'all_conference_rate': 0.18,
'draft_rate': 0.22,
'first_round_rate': 0.04
},
3: {
'start_rate': 0.35,
'all_conference_rate': 0.06,
'draft_rate': 0.06,
'first_round_rate': 0.01
},
2: {
'start_rate': 0.12,
'all_conference_rate': 0.015,
'draft_rate': 0.015,
'first_round_rate': 0.001
}
}
def explain_rating(self, rating: float, stars: int) -> Dict:
"""
Explain what a rating means.
Parameters:
-----------
rating : float
Composite rating (typically 0.8000 - 1.0000)
stars : int
Star rating (2-5)
Returns:
--------
dict : Rating explanation
"""
outcomes = self.outcomes.get(stars, self.outcomes[2])
dist = self.star_distributions.get(stars, self.star_distributions[2])
return {
'rating': rating,
'stars': stars,
'national_percentile': dist['percentile_min'],
'approximate_national_rank': self._estimate_rank(rating),
'expected_outcomes': outcomes,
'interpretation': self._get_interpretation(stars)
}
def _estimate_rank(self, rating: float) -> int:
"""Estimate national rank from rating."""
# Higher rating = lower (better) rank; piecewise-linear and monotonic
# Rating of 1.0000 = #1; 0.9800 ≈ top 100
if rating >= 0.9990:
return int((1.0 - rating) * 10000) + 1
elif rating >= 0.9800:
return 11 + int((0.9990 - rating) * 5000)
else:
return 106 + int((0.9800 - rating) * 3000)
def _get_interpretation(self, stars: int) -> str:
"""Get human-readable interpretation."""
interpretations = {
5: "Elite prospect. Immediate impact expected. Future NFL draft pick.",
4: "Blue-chip prospect. Likely multi-year starter. Strong development potential.",
3: "Solid prospect. May develop into starter with time. Depth contributor.",
2: "Developmental prospect. Project player requiring significant development."
}
return interpretations.get(stars, "Unknown rating level")
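Because the outcome rates above are per-recruit probabilities, the expected production of an entire class is simply their sum. A minimal standalone sketch, using the illustrative rates from the table above (the class composition is hypothetical):

```python
# Illustrative per-recruit outcome rates from the table above
# (approximate historical figures, not verified data).
DRAFT_RATE = {5: 0.55, 4: 0.22, 3: 0.06, 2: 0.015}
START_RATE = {5: 0.85, 4: 0.65, 3: 0.35, 2: 0.12}

def expected_outcomes(star_counts):
    """Sum per-recruit probabilities to get expected counts for a class."""
    drafted = sum(n * DRAFT_RATE[s] for s, n in star_counts.items())
    starters = sum(n * START_RATE[s] for s, n in star_counts.items())
    return {'expected_drafted': drafted, 'expected_starters': starters}

# A hypothetical 25-man class: 2 five-stars, 8 four-stars, 15 three-stars
result = expected_outcomes({5: 2, 4: 8, 3: 15})
```

Even a class this strong projects to fewer than four draft picks, which is why blue-chip volume, not just headline recruits, drives roster quality.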
Data Quality Considerations
Recruiting data presents unique quality challenges:
class RecruitingDataQuality:
"""
Assess and handle recruiting data quality issues.
"""
def __init__(self):
self.known_issues = {
'measurables': [
'Self-reported heights often inflated',
'Camp times may differ from verified times',
'Weight fluctuates during season'
],
'statistics': [
'Competition level varies significantly',
'Scheme differences affect raw numbers',
'Team quality impacts individual stats',
'Incomplete data for many prospects'
],
'ratings': [
'Ratings bias toward prospects who attend camps',
'Regional exposure differences',
'Late bloomers often underrated',
'Rating inflation over time'
]
}
def assess_data_completeness(self,
recruit_data: pd.DataFrame) -> Dict:
"""
Assess completeness of recruiting dataset.
Parameters:
-----------
recruit_data : pd.DataFrame
Recruiting data with various features
Returns:
--------
dict : Completeness assessment
"""
assessment = {}
# Check each column
for col in recruit_data.columns:
missing = recruit_data[col].isna().sum()
total = len(recruit_data)
assessment[col] = {
'missing_count': missing,
'missing_pct': missing / total,
'complete_pct': 1 - (missing / total)
}
# Overall completeness
assessment['overall'] = {
'avg_completeness': np.mean([v['complete_pct']
for v in assessment.values()
if isinstance(v, dict) and 'complete_pct' in v]),
'fully_complete_rows': (recruit_data.notna().all(axis=1)).sum()
}
return assessment
def standardize_measurables(self,
recruit_data: pd.DataFrame) -> pd.DataFrame:
"""
Standardize physical measurables.
Applies common adjustments:
- Convert heights to inches
- Verify 40-time ranges
- Flag suspicious values
"""
df = recruit_data.copy()
# Height standardization
if 'height' in df.columns:
# Convert "6-2" format to inches
if df['height'].dtype == object:
df['height_inches'] = df['height'].apply(
self._parse_height
)
else:
df['height_inches'] = df['height']
# 40-time validation
if 'forty_time' in df.columns:
df['forty_valid'] = df['forty_time'].between(4.2, 5.5)
df.loc[~df['forty_valid'], 'forty_time'] = np.nan
# Weight validation
if 'weight' in df.columns:
df['weight_valid'] = df['weight'].between(150, 350)
return df
def _parse_height(self, height_str: str) -> float:
"""Convert height string to inches."""
if pd.isna(height_str):
return np.nan
try:
if '-' in str(height_str):
feet, inches = str(height_str).split('-')
return int(feet) * 12 + int(inches)
else:
return float(height_str)
except (ValueError, TypeError):
return np.nan
20.2 Prospect Evaluation Models
Physical Profile Analysis
Physical measurables provide objective data for prospect evaluation:
class PhysicalProfileAnalyzer:
"""
Analyze prospect physical profiles by position.
"""
def __init__(self):
# Position-specific ideal ranges (based on NFL combine data)
self.position_profiles = {
'QB': {
'height': (74, 78), # 6'2" - 6'6"
'weight': (210, 240),
'forty': (4.5, 5.0),
'arm_length': (31, 34)
},
'RB': {
'height': (68, 73), # 5'8" - 6'1"
'weight': (195, 225),
'forty': (4.35, 4.6),
'vertical': (34, 42)
},
'WR': {
'height': (70, 76), # 5'10" - 6'4"
'weight': (180, 220),
'forty': (4.3, 4.55),
'vertical': (35, 44)
},
'TE': {
'height': (75, 79), # 6'3" - 6'7"
'weight': (240, 265),
'forty': (4.5, 4.8),
'vertical': (32, 38)
},
'OT': {
'height': (76, 80), # 6'4" - 6'8"
'weight': (290, 330),
'forty': (5.0, 5.4),
'arm_length': (33, 36)
},
'EDGE': {
'height': (74, 78),
'weight': (240, 275),
'forty': (4.5, 4.8),
'vertical': (32, 40)
},
'CB': {
'height': (69, 74), # 5'9" - 6'2"
'weight': (180, 205),
'forty': (4.3, 4.5),
'vertical': (36, 44)
}
}
def calculate_physical_score(self,
measurables: Dict,
position: str) -> Dict:
"""
Calculate physical profile score for position.
Parameters:
-----------
measurables : dict
Player's physical measurements
position : str
Target position
Returns:
--------
dict : Physical profile analysis
"""
if position not in self.position_profiles:
return {'error': f'Unknown position: {position}'}
profile = self.position_profiles[position]
scores = {}
for metric, (low, high) in profile.items():
if metric not in measurables or pd.isna(measurables[metric]):
scores[metric] = {'score': None, 'percentile': None}
continue
value = measurables[metric]
# For speed metrics, lower is better
if metric in ['forty']:
if value <= low:
score = 100
elif value >= high:
score = 50
else:
score = 100 - 50 * (value - low) / (high - low)
else:
# For other metrics, in-range is ideal
mid = (low + high) / 2
range_size = (high - low) / 2
if low <= value <= high:
score = 100 - 20 * abs(value - mid) / range_size
elif value < low:
score = max(0, 80 - 30 * (low - value) / range_size)
else:
score = max(0, 80 - 30 * (value - high) / range_size)
scores[metric] = {
'value': value,
'ideal_range': (low, high),
'score': score
}
# Overall physical score
valid_scores = [s['score'] for s in scores.values()
if s.get('score') is not None]
return {
'position': position,
'metric_scores': scores,
'overall_score': np.mean(valid_scores) if valid_scores else None,
'metrics_evaluated': len(valid_scores)
}
def find_position_fit(self,
measurables: Dict) -> List[Dict]:
"""
Find best position fits for a prospect.
Returns positions ranked by physical fit.
"""
fits = []
for position in self.position_profiles.keys():
analysis = self.calculate_physical_score(measurables, position)
if analysis.get('overall_score') is not None:
fits.append({
'position': position,
'fit_score': analysis['overall_score'],
'details': analysis['metric_scores']
})
return sorted(fits, key=lambda x: x['fit_score'], reverse=True)
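The piecewise scoring inside calculate_physical_score can be isolated for inspection: in-range values lose up to 20 points as they drift from the range midpoint, while out-of-range values start at 80 and decline 30 points per half-range of deviation. A standalone sketch (the WR height range is taken from the profiles above):

```python
def range_score(value, low, high):
    """Score a measurable against an ideal range (mirrors the logic above)."""
    mid = (low + high) / 2
    half = (high - low) / 2
    if low <= value <= high:
        return 100 - 20 * abs(value - mid) / half  # 100 at midpoint, 80 at edges
    if value < low:
        return max(0.0, 80 - 30 * (low - value) / half)
    return max(0.0, 80 - 30 * (value - high) / half)

# WR height ideal range: 70-76 inches
mid_score = range_score(73, 70, 76)    # range midpoint
edge_score = range_score(76, 70, 76)   # range edge
below = range_score(67, 70, 76)        # one half-range below the minimum
```

Note the deliberate asymmetry: a value just outside the range (80) scores the same as one at the range edge, so the penalty ramps up smoothly rather than cliff-edging.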
Performance-Based Evaluation
High school statistics require careful context adjustment:
class HighSchoolStatsAnalyzer:
"""
Analyze and contextualize high school statistics.
"""
def __init__(self):
# Competition level adjustments
self.competition_factors = {
'texas_6a': 1.0, # Top competition
'california_d1': 1.0,
'florida_8a': 1.0,
'ohio_d1': 0.95,
'other_6a': 0.90,
'smaller_state_top': 0.85,
'lower_classification': 0.75,
'unknown': 0.80
}
# Position-specific stat weights
self.stat_weights = {
'QB': {
'completion_pct': 0.20,
'yards_per_attempt': 0.25,
'td_int_ratio': 0.25,
'qb_rating': 0.30
},
'RB': {
'yards_per_carry': 0.35,
'total_yards': 0.25,
'td_rate': 0.20,
'receiving': 0.20
},
'WR': {
'yards_per_catch': 0.30,
'catch_rate': 0.25,
'total_yards': 0.25,
'td_rate': 0.20
}
}
def contextualize_stats(self,
stats: Dict,
position: str,
competition_level: str) -> Dict:
"""
Adjust statistics for competition level.
Parameters:
-----------
stats : dict
Raw high school statistics
position : str
Player position
competition_level : str
Level of competition
Returns:
--------
dict : Contextualized statistics
"""
factor = self.competition_factors.get(
competition_level,
self.competition_factors['unknown']
)
# Apply competition adjustment to rate stats
adjusted = {}
for stat, value in stats.items():
if stat in ['yards_per_carry', 'yards_per_catch', 'yards_per_attempt']:
# Rate stats get partial adjustment
adjusted[stat] = value * (0.7 + 0.3 * factor)
elif stat in ['completion_pct', 'catch_rate']:
# Percentage stats adjusted less
adjusted[stat] = value * (0.8 + 0.2 * factor)
else:
# Counting stats not directly adjusted
adjusted[stat] = value
adjusted['competition_factor'] = factor
adjusted['competition_level'] = competition_level
return adjusted
def calculate_production_score(self,
stats: Dict,
position: str) -> float:
"""
Calculate weighted production score.
"""
weights = self.stat_weights.get(position, {})
if not weights:
return None
score = 0
weight_sum = 0
for stat, weight in weights.items():
if stat in stats and stats[stat] is not None:
# Normalize to 0-100 scale
normalized = self._normalize_stat(stat, stats[stat])
score += weight * normalized
weight_sum += weight
return score / weight_sum if weight_sum > 0 else None
def _normalize_stat(self, stat: str, value: float) -> float:
"""Normalize stat to 0-100 scale."""
# Position-specific normalization ranges
ranges = {
'completion_pct': (0.50, 0.75),
'yards_per_attempt': (5.0, 12.0),
'td_int_ratio': (1.0, 6.0),
'yards_per_carry': (4.0, 10.0),
'yards_per_catch': (10.0, 25.0),
'catch_rate': (0.50, 0.80)
}
if stat not in ranges:
return 50 # Default middle score
low, high = ranges[stat]
normalized = (value - low) / (high - low) * 100
return max(0, min(100, normalized))
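The normalization step maps each raw stat onto a 0-100 scale by clamped min-max scaling over a fixed band, so the weighted production score is a weighted average of comparable quantities. Sketched standalone, with the yards-per-attempt band from above:

```python
def normalize_stat(value, low, high):
    """Clamped min-max normalization to a 0-100 scale."""
    return max(0.0, min(100.0, (value - low) / (high - low) * 100))

# Yards per attempt over the 5.0-12.0 band used above
mid = normalize_stat(8.5, 5.0, 12.0)       # exactly halfway through the band
ceiling = normalize_stat(13.0, 5.0, 12.0)  # above the band, clamped to 100
floor = normalize_stat(4.0, 5.0, 12.0)     # below the band, clamped to 0
```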
20.3 Composite Prospect Scoring
Building a Prospect Evaluation Model
Combining multiple data sources into a unified score:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
class ProspectEvaluationModel:
"""
Machine learning model for prospect evaluation.
Predicts probability of various career outcomes:
- Becoming a multi-year starter
- Earning all-conference honors
- Being drafted to NFL
"""
def __init__(self):
self.models = {}
self.scalers = {}
self.feature_importances = {}
# Features used in evaluation
self.features = [
# Physical
'height_inches', 'weight', 'forty_time',
'vertical_jump', 'shuttle',
# Ratings
'composite_rating', 'star_rating',
'position_rank', 'state_rank',
# Performance
'production_score', 'competition_factor',
# Context
'program_strength', 'position_need'
]
def train(self,
training_data: pd.DataFrame,
outcomes: List[str] = None):
"""
Train evaluation models on historical data.
Parameters:
-----------
training_data : pd.DataFrame
Historical recruit data with outcomes
outcomes : list
Outcome variables to predict
"""
if outcomes is None:
outcomes = ['became_starter', 'all_conference', 'drafted']
# Prepare features
X = training_data[self.features].copy()
# Handle missing values
X = X.fillna(X.median())
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
self.scalers['main'] = scaler
# Train model for each outcome
for outcome in outcomes:
if outcome not in training_data.columns:
continue
y = training_data[outcome].values
model = GradientBoostingClassifier(
n_estimators=100,
max_depth=4,
learning_rate=0.1,
random_state=42
)
# Cross-validation
cv_scores = cross_val_score(model, X_scaled, y, cv=5, scoring='roc_auc')
# Train on full data
model.fit(X_scaled, y)
self.models[outcome] = model
self.feature_importances[outcome] = dict(zip(
self.features,
model.feature_importances_
))
print(f"Model for {outcome}:")
print(f" CV AUC: {cv_scores.mean():.3f} (+/- {cv_scores.std()*2:.3f})")
def evaluate_prospect(self, prospect_data: Dict) -> Dict:
"""
Evaluate a prospect using trained models.
Parameters:
-----------
prospect_data : dict
Prospect's features
Returns:
--------
dict : Evaluation results with probabilities
"""
# Prepare features
X = pd.DataFrame([prospect_data])[self.features]
X = X.fillna(0)  # Crude default; production use should impute with the training medians
X_scaled = self.scalers['main'].transform(X)
# Get predictions from each model
predictions = {}
for outcome, model in self.models.items():
prob = model.predict_proba(X_scaled)[0, 1]
predictions[outcome] = {
'probability': prob,
'percentile': self._prob_to_percentile(prob, outcome)
}
# Calculate composite score
composite = self._calculate_composite_score(predictions)
return {
'outcome_predictions': predictions,
'composite_score': composite,
'top_factors': self._get_top_factors(prospect_data)
}
def _calculate_composite_score(self, predictions: Dict) -> float:
"""Calculate weighted composite score."""
weights = {
'became_starter': 0.30,
'all_conference': 0.35,
'drafted': 0.35
}
score = 0
for outcome, pred in predictions.items():
if outcome in weights:
score += weights[outcome] * pred['probability']
return score * 100 # Scale to 0-100
def _prob_to_percentile(self, prob: float, outcome: str) -> float:
"""Convert probability to percentile rank."""
# Based on typical distributions
base_rates = {
'became_starter': 0.35,
'all_conference': 0.10,
'drafted': 0.08
}
base = base_rates.get(outcome, 0.10)
return min(99, max(1, prob / base * 50))
def _get_top_factors(self, prospect_data: Dict) -> List[Dict]:
"""Identify top contributing factors."""
factors = []
for outcome, importances in self.feature_importances.items():
for feature, importance in importances.items():
if importance > 0.05 and feature in prospect_data:
factors.append({
'feature': feature,
'value': prospect_data[feature],
'importance': importance,
'outcome': outcome
})
return sorted(factors, key=lambda x: x['importance'], reverse=True)[:5]
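The composite score is a weighted average of the three outcome probabilities; because all-conference and draft outcomes are rare, even a strong prospect lands well below 100. A self-contained sketch using the weights above and the illustrative historical 4-star rates from section 20.1:

```python
WEIGHTS = {'became_starter': 0.30, 'all_conference': 0.35, 'drafted': 0.35}

def composite_score(probs):
    """Weighted sum of outcome probabilities, scaled to 0-100."""
    return 100 * sum(WEIGHTS[k] * p for k, p in probs.items() if k in WEIGHTS)

# Probabilities in line with the 4-star historical rates shown earlier
score = composite_score({
    'became_starter': 0.65,
    'all_conference': 0.18,
    'drafted': 0.22,
})
```

The 0-100 scale is therefore relative: scores cluster in the 5-50 range, and a score near 60 already indicates an elite profile.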
Rating Validation and Calibration
Assessing how well ratings predict outcomes:
class RatingValidator:
"""
Validate and calibrate recruiting ratings.
"""
def __init__(self):
self.calibration_data = None
def calculate_rating_accuracy(self,
historical_data: pd.DataFrame,
rating_col: str,
outcome_col: str) -> Dict:
"""
Calculate accuracy of ratings predicting outcomes.
Parameters:
-----------
historical_data : pd.DataFrame
Data with ratings and outcomes
rating_col : str
Column with ratings
outcome_col : str
Column with outcome variable
Returns:
--------
dict : Accuracy metrics
"""
from sklearn.metrics import roc_auc_score, brier_score_loss
# Remove missing values
df = historical_data[[rating_col, outcome_col]].dropna()
ratings = df[rating_col].values
outcomes = df[outcome_col].values
# Normalize ratings to probabilities (0-1)
ratings_normalized = (ratings - ratings.min()) / (ratings.max() - ratings.min())
# Calculate metrics
auc = roc_auc_score(outcomes, ratings_normalized)
brier = brier_score_loss(outcomes, ratings_normalized)
# Calibration by decile
calibration = self._calculate_calibration(ratings_normalized, outcomes)
return {
'auc': auc,
'brier_score': brier,
'correlation': np.corrcoef(ratings, outcomes)[0, 1],
'calibration': calibration,
'n_samples': len(df)
}
def _calculate_calibration(self,
predictions: np.ndarray,
outcomes: np.ndarray,
n_bins: int = 10) -> List[Dict]:
"""Calculate calibration by prediction bins."""
bins = np.linspace(0, 1, n_bins + 1)
calibration = []
for i in range(n_bins):
mask = (predictions >= bins[i]) & (predictions < bins[i+1])
if mask.sum() > 0:
calibration.append({
'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
'predicted': predictions[mask].mean(),
'actual': outcomes[mask].mean(),
'count': mask.sum()
})
return calibration
def compare_rating_services(self,
historical_data: pd.DataFrame,
rating_cols: List[str],
outcome_col: str) -> pd.DataFrame:
"""
Compare accuracy of different rating services.
Parameters:
-----------
historical_data : pd.DataFrame
Data with multiple rating columns
rating_cols : list
List of rating column names
outcome_col : str
Outcome variable
Returns:
--------
pd.DataFrame : Comparison of rating services
"""
comparisons = []
for col in rating_cols:
if col not in historical_data.columns:
continue
accuracy = self.calculate_rating_accuracy(
historical_data, col, outcome_col
)
comparisons.append({
'rating_service': col,
'auc': accuracy['auc'],
'brier_score': accuracy['brier_score'],
'correlation': accuracy['correlation'],
'n_samples': accuracy['n_samples']
})
return pd.DataFrame(comparisons).sort_values('auc', ascending=False)
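A well-calibrated rating should match observed outcome rates bin by bin: if recruits rated around 0.3 succeed about 30% of the time, the rating is calibrated there. A synthetic check of the binning logic (the data here is simulated, not real recruiting data):

```python
import numpy as np

def calibration_bins(preds, outcomes, n_bins=5):
    """Mean predicted probability vs. actual outcome rate per bin."""
    edges = np.linspace(0, 1, n_bins + 1)
    rows = []
    for i in range(n_bins):
        mask = (preds >= edges[i]) & (preds < edges[i + 1])
        if mask.any():
            rows.append({'predicted': preds[mask].mean(),
                         'actual': outcomes[mask].mean(),
                         'count': int(mask.sum())})
    return rows

# Perfectly calibrated synthetic data: outcome rate equals the prediction
rng = np.random.default_rng(42)
preds = rng.uniform(0, 1, 10_000)
outcomes = (rng.uniform(0, 1, 10_000) < preds).astype(float)
rows = calibration_bins(preds, outcomes)
```

On real recruiting data, the interesting finding is usually the deviation: ratings tend to be discriminative (high AUC) but miscalibrated as raw probabilities, which is why the normalization step above is only a rough proxy.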
20.4 Recruiting Class Analysis
Class Composition Optimization
Building an optimal recruiting class requires balancing multiple factors:
class RecruitingClassOptimizer:
"""
Optimize recruiting class composition.
Balances:
- Position needs
- Scholarship limits
- Development timelines
- Budget constraints
"""
def __init__(self, scholarship_limit: int = 25):
self.scholarship_limit = scholarship_limit
# Position group targets
self.position_targets = {
'QB': (1, 2),
'RB': (2, 3),
'WR': (3, 5),
'TE': (1, 2),
'OL': (4, 6),
'DL': (4, 5),
'LB': (3, 4),
'DB': (4, 5),
'SPEC': (0, 1)
}
def analyze_roster_needs(self,
current_roster: pd.DataFrame) -> Dict:
"""
Analyze current roster to identify needs.
Parameters:
-----------
current_roster : pd.DataFrame
Current roster with position, year columns
Returns:
--------
dict : Position needs analysis
"""
needs = {}
for position, (min_target, max_target) in self.position_targets.items():
pos_players = current_roster[current_roster['position'] == position]
# Count by class year
by_year = pos_players.groupby('class_year').size()
# Calculate departures after this season
departing = by_year.get('SR', 0) + by_year.get('GR', 0)
# Returning players
returning = len(pos_players) - departing
# Calculate need level
if returning < min_target:
need_level = 'critical'
need_count = max_target - returning
elif returning < max_target:
need_level = 'moderate'
need_count = max_target - returning
else:
need_level = 'low'
need_count = 0  # Already at or above target (departures were netted out above)
needs[position] = {
'current': len(pos_players),
'returning': returning,
'departing': departing,
'need_level': need_level,
'recommended_signees': need_count
}
return needs
def optimize_class(self,
available_recruits: pd.DataFrame,
position_needs: Dict,
budget: float = None) -> pd.DataFrame:
"""
Optimize recruiting class selection.
Uses greedy optimization to maximize value
while meeting position needs.
Parameters:
-----------
available_recruits : pd.DataFrame
Available committed/interested recruits
position_needs : dict
Position needs from analyze_roster_needs
budget : float, optional
NIL/recruiting budget
Returns:
--------
pd.DataFrame : Optimized class
"""
# Greedy heuristic below; an exact ILP (e.g., scipy.optimize.milp) could
# enforce the constraints optimally, at the cost of transparency
selected = []
remaining = available_recruits.copy()
spots_remaining = self.scholarship_limit
# First pass: Fill critical needs
for position, need in position_needs.items():
if need['need_level'] == 'critical':
pos_recruits = remaining[remaining['position'] == position]
pos_recruits = pos_recruits.sort_values('composite_rating', ascending=False)
to_select = min(
need['recommended_signees'],
len(pos_recruits),
spots_remaining
)
for _, recruit in pos_recruits.head(to_select).iterrows():
selected.append(recruit)
spots_remaining -= 1
remaining = remaining[~remaining.index.isin(
pos_recruits.head(to_select).index
)]
# Second pass: Fill moderate needs with best available
for position, need in position_needs.items():
if need['need_level'] == 'moderate' and spots_remaining > 0:
pos_recruits = remaining[remaining['position'] == position]
pos_recruits = pos_recruits.sort_values('composite_rating', ascending=False)
to_select = min(
need['recommended_signees'],
len(pos_recruits),
spots_remaining
)
for _, recruit in pos_recruits.head(to_select).iterrows():
selected.append(recruit)
spots_remaining -= 1
remaining = remaining[~remaining.index.isin(
pos_recruits.head(to_select).index
)]
# Third pass: Best available for remaining spots
remaining = remaining.sort_values('composite_rating', ascending=False)
for _, recruit in remaining.head(spots_remaining).iterrows():
selected.append(recruit)
return pd.DataFrame(selected)
def evaluate_class(self, recruiting_class: pd.DataFrame) -> Dict:
"""
Evaluate a recruiting class.
Returns:
--------
dict : Class evaluation metrics
"""
return {
'size': len(recruiting_class),
'avg_rating': recruiting_class['composite_rating'].mean(),
'total_points': recruiting_class['composite_rating'].sum() * 100,
'five_stars': (recruiting_class['star_rating'] == 5).sum(),
'four_stars': (recruiting_class['star_rating'] == 4).sum(),
'three_stars': (recruiting_class['star_rating'] == 3).sum(),
'by_position': recruiting_class.groupby('position').size().to_dict(),
'top_recruit': recruiting_class.loc[
recruiting_class['composite_rating'].idxmax(), 'name'
] if len(recruiting_class) > 0 else None
}
Recruiting Class Projections
Projecting how a class will develop:
class ClassDevelopmentProjector:
"""
Project development of a recruiting class.
"""
def __init__(self):
# Historical development rates by star rating
self.development_curves = {
5: {
'freshman_impact': 0.30,
'year_2_starter': 0.70,
'year_3_star': 0.45,
'draft_rate': 0.55
},
4: {
'freshman_impact': 0.15,
'year_2_starter': 0.40,
'year_3_star': 0.18,
'draft_rate': 0.22
},
3: {
'freshman_impact': 0.05,
'year_2_starter': 0.20,
'year_3_star': 0.06,
'draft_rate': 0.06
}
}
def project_class_impact(self,
recruiting_class: pd.DataFrame,
years_out: int = 4) -> Dict:
"""
Project class impact over multiple years.
Parameters:
-----------
recruiting_class : pd.DataFrame
Recruiting class data
years_out : int
Years to project
Returns:
--------
dict : Year-by-year impact projections
"""
projections = {}
for year in range(1, years_out + 1):
year_proj = {
'expected_starters': 0,
'expected_contributors': 0,
'expected_all_conference': 0,
'by_position': {}
}
for _, recruit in recruiting_class.iterrows():
star = recruit.get('star_rating', 3)
curves = self.development_curves.get(star, self.development_curves[3])
if year == 1:
year_proj['expected_contributors'] += curves['freshman_impact']
elif year == 2:
year_proj['expected_starters'] += curves['year_2_starter']
elif year >= 3:
year_proj['expected_starters'] += curves['year_2_starter'] * 1.2
year_proj['expected_all_conference'] += curves['year_3_star']
projections[f'year_{year}'] = year_proj
# Draft projections
total_draft_expected = 0
for _, recruit in recruiting_class.iterrows():
star = recruit.get('star_rating', 3)
curves = self.development_curves.get(star, self.development_curves[3])
total_draft_expected += curves['draft_rate']
projections['draft_picks_expected'] = total_draft_expected
return projections
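Year-one projections follow the same expected-value arithmetic: each recruit contributes his freshman-impact probability to the class total. A hypothetical eight-man core, using the illustrative rates from the curves above:

```python
# Illustrative freshman-impact rates from the development curves above
FRESHMAN_IMPACT = {5: 0.30, 4: 0.15, 3: 0.05}

def expected_freshman_contributors(star_ratings):
    """Expected year-1 contributors; unknown ratings default to the 3-star rate."""
    return sum(FRESHMAN_IMPACT.get(s, 0.05) for s in star_ratings)

exp_contributors = expected_freshman_contributors([5, 4, 4, 4, 3, 3, 3, 3])
```

Fewer than one expected freshman contributor from eight signees is the norm, which is why class impact is properly judged over a three-to-four-year window.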
20.5 Transfer Portal Analytics
Transfer Success Prediction
The transfer portal has transformed roster management:
class TransferPortalAnalyzer:
"""
Analyze transfer portal activity and predict success.
"""
def __init__(self):
# Transfer success factors
self.success_factors = {
'same_conference': 0.15,
'step_down': 0.10,
'step_up': -0.15,
'playing_time_at_origin': 0.20,
'years_remaining': 0.10,
'original_rating': 0.25,
'scheme_fit': 0.15
}
def evaluate_transfer(self,
transfer_data: Dict,
origin_school: Dict,
destination_school: Dict) -> Dict:
"""
Evaluate a transfer portal candidate.
Parameters:
-----------
transfer_data : dict
Player's stats and ratings
origin_school : dict
Origin school information
destination_school : dict
Destination school information
Returns:
--------
dict : Transfer evaluation
"""
score = 50 # Base score
adjustments = {}
# Conference change
if origin_school.get('conference') == destination_school.get('conference'):
adj = self.success_factors['same_conference'] * 100
score += adj
adjustments['same_conference'] = adj
# Step up/down in competition
origin_tier = self._get_program_tier(origin_school)
dest_tier = self._get_program_tier(destination_school)
if dest_tier > origin_tier:  # Higher tier number = weaker program (step down)
adj = self.success_factors['step_down'] * 100
score += adj
adjustments['step_down'] = adj
elif dest_tier < origin_tier:  # Step up in competition
adj = self.success_factors['step_up'] * 100
score += adj
adjustments['step_up'] = adj
# Playing time at origin
if transfer_data.get('games_started', 0) > 6:
adj = self.success_factors['playing_time_at_origin'] * 100
score += adj
adjustments['was_starter'] = adj
# Original recruiting rating
original_rating = transfer_data.get('original_composite', 0.85)
rating_adj = (original_rating - 0.85) * 100 * 2
score += rating_adj
adjustments['recruit_rating'] = rating_adj
return {
'transfer_success_score': max(0, min(100, score)),
'adjustments': adjustments,
'recommendation': self._get_recommendation(score),
'risk_level': self._assess_risk(transfer_data, dest_tier)
}
def _get_program_tier(self, school: Dict) -> int:
"""Classify program tier (1-5, 1 being elite)."""
if school.get('conference') in ['SEC', 'Big Ten']:
avg_rank = school.get('avg_ranking', 50)
if avg_rank <= 15:
return 1
elif avg_rank <= 40:
return 2
else:
return 3
elif school.get('conference') in ['Big 12', 'ACC', 'Pac-12']:
return 3
elif school.get('level') == 'FBS':
return 4
else:
return 5
def _get_recommendation(self, score: float) -> str:
"""Generate recommendation based on score."""
if score >= 75:
return "Strong add - High confidence in transfer success"
elif score >= 60:
return "Good fit - Moderate confidence, monitor situation"
elif score >= 45:
return "Risky - Evaluate carefully, some concerns"
else:
return "Pass - Low confidence in successful transfer"
def _assess_risk(self, transfer_data: Dict, dest_tier: int) -> str:
"""Assess transfer risk level."""
risks = []
if transfer_data.get('times_transferred', 0) > 1:
risks.append("Multiple transfers")
if transfer_data.get('years_remaining', 4) <= 1:
risks.append("Limited eligibility")
if transfer_data.get('reason') in ['discipline', 'academic']:
risks.append("Non-football concerns")
if len(risks) == 0:
return "Low"
elif len(risks) == 1:
return "Moderate"
else:
return "High"
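Because lower tier numbers denote stronger programs (1 = elite), a move to a higher tier number is a step down in competition and adds confidence, while a move to a lower number subtracts it. A minimal sketch of that sign convention, using the illustrative success-factor weights from the class above:

```python
# Weights are the illustrative success factors from the text
STEP_DOWN_BONUS = 0.10 * 100   # easier competition tends to help
STEP_UP_PENALTY = -0.15 * 100  # harder competition tends to hurt

def tier_adjustment(origin_tier, dest_tier):
    """Score adjustment (in points) for a change in program tier.
    Lower tier number = stronger program."""
    if dest_tier > origin_tier:
        return STEP_DOWN_BONUS
    if dest_tier < origin_tier:
        return STEP_UP_PENALTY
    return 0.0

down = tier_adjustment(2, 4)  # power-conference contender -> mid-tier FBS
up = tier_adjustment(3, 1)    # mid-tier -> elite
```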
20.6 Recruiting Efficiency Metrics
Measuring Program Efficiency
How well do programs convert recruiting to on-field success?
class RecruitingEfficiencyAnalyzer:
"""
Analyze recruiting efficiency across programs.
"""
def __init__(self):
pass
def calculate_blue_chip_ratio(self,
roster_data: pd.DataFrame) -> float:
"""
Calculate blue chip ratio (4 and 5 star players).
The blue chip ratio is highly correlated with
championship contention.
"""
total = len(roster_data)
if total == 0:
return 0
blue_chips = len(roster_data[
roster_data['original_star_rating'] >= 4
])
return blue_chips / total
def calculate_development_efficiency(self,
program_history: pd.DataFrame) -> Dict:
"""
Calculate how well a program develops talent.
Compares actual outcomes to expected based on recruiting.
"""
# Expected outcomes based on average star rating
expected_starters_per_class = program_history.groupby('class_year').apply(
lambda x: self._expected_starters(x['original_star_rating'])
)
# Actual starters
actual_starters = program_history.groupby('class_year').apply(
lambda x: (x['games_started'] >= 6).sum()
)
development_factor = actual_starters.sum() / expected_starters_per_class.sum()
return {
'development_efficiency': development_factor,
'expected_starters': expected_starters_per_class.sum(),
'actual_starters': actual_starters.sum(),
'outperformed': development_factor > 1.0
}
def _expected_starters(self, ratings: pd.Series) -> float:
"""Calculate expected starters from rating series."""
expectations = {
5: 0.85,
4: 0.65,
3: 0.35,
2: 0.12
}
expected = 0
for rating in ratings:
expected += expectations.get(int(rating), 0.20)
return expected
def calculate_recruiting_roi(self,
program_data: pd.DataFrame,
budget_data: pd.DataFrame = None) -> Dict:
"""
Calculate return on recruiting investment.
"""
# Points (composite scores) per recruit
avg_points = program_data['composite_rating'].mean() * 100
# Wins generated by class (approximate)
classes = program_data.groupby('recruiting_class')
contributions = []
for class_year, players in classes:
class_contribution = {
'class': class_year,
'size': len(players),
'avg_rating': players['composite_rating'].mean(),
'starters_produced': (players['career_starts'] >= 12).sum(),
'all_conference': players['all_conference'].sum(),
'draft_picks': players['drafted'].sum()
}
contributions.append(class_contribution)
contributions_df = pd.DataFrame(contributions)
return {
'avg_class_rating': contributions_df['avg_rating'].mean(),
'avg_starters_per_class': contributions_df['starters_produced'].mean(),
'avg_all_conference_per_class': contributions_df['all_conference'].mean(),
'avg_draft_picks_per_class': contributions_df['draft_picks'].mean(),
'class_details': contributions_df
}
    def rank_programs_by_efficiency(self,
                                    all_programs: pd.DataFrame,
                                    metric: str = 'development') -> pd.DataFrame:
        """
        Rank programs by recruiting efficiency.

        Parameters:
        -----------
        all_programs : pd.DataFrame
            Data for all programs
        metric : str
            Efficiency metric to use

        Returns:
        --------
        pd.DataFrame : Programs ranked by efficiency
        """
        rankings = []

        for program in all_programs['program'].unique():
            program_data = all_programs[all_programs['program'] == program]
            efficiency = self.calculate_development_efficiency(program_data)

            rankings.append({
                'program': program,
                'development_efficiency': efficiency['development_efficiency'],
                'expected_starters': efficiency['expected_starters'],
                'actual_starters': efficiency['actual_starters'],
                'class_count': program_data['recruiting_class'].nunique()
            })

        rankings_df = pd.DataFrame(rankings)
        return rankings_df.sort_values('development_efficiency', ascending=False)
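To see the development-efficiency idea end to end, here is a self-contained toy run on invented data for two programs. The per-star baseline rates repeat the `_expected_starters` table above; the program labels and stat lines are made up for illustration:

```python
import pandas as pd

# Per-star probability of becoming a starter (same baseline as _expected_starters)
EXPECTATIONS = {5: 0.85, 4: 0.65, 3: 0.35, 2: 0.12}

def development_factor(group: pd.DataFrame) -> float:
    """Actual starters (6+ games started) divided by expected starters."""
    expected = sum(EXPECTATIONS.get(int(r), 0.20)
                   for r in group['original_star_rating'])
    actual = (group['games_started'] >= 6).sum()
    return actual / expected

# Invented recruits for two programs
recruits = pd.DataFrame({
    'program':              ['A', 'A', 'A', 'B', 'B', 'B'],
    'original_star_rating': [3,   3,   4,   4,   4,   5],
    'games_started':        [10,  7,   0,   12,  2,   25],
})

ranking = (recruits.groupby('program')[['original_star_rating', 'games_started']]
           .apply(development_factor)
           .sort_values(ascending=False))
print(ranking)
```

Program A lands above 1.0 (two starters against 1.35 expected) while Program B lands below it despite signing better-rated recruits, which is exactly the over/underachievement contrast the efficiency ranking is meant to surface.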
20.7 Practical Application: Complete Recruiting Dashboard
Integrated Recruiting Analytics System
class RecruitingAnalyticsDashboard:
    """
    Complete recruiting analytics dashboard.

    Integrates all recruiting analysis tools.
    """

    def __init__(self, team_name: str):
        self.team = team_name
        self.physical_analyzer = PhysicalProfileAnalyzer()
        self.stats_analyzer = HighSchoolStatsAnalyzer()
        self.class_optimizer = RecruitingClassOptimizer()
        self.transfer_analyzer = TransferPortalAnalyzer()
        self.efficiency_analyzer = RecruitingEfficiencyAnalyzer()

    def generate_prospect_report(self, prospect: Dict) -> str:
        """
        Generate a comprehensive prospect evaluation report.
        """
        # Physical analysis
        physical = self.physical_analyzer.calculate_physical_score(
            prospect.get('measurables', {}),
            prospect['position']
        )

        # Stats analysis
        stats = self.stats_analyzer.calculate_production_score(
            prospect.get('statistics', {}),
            prospect['position']
        )

        # Position fit
        position_fits = self.physical_analyzer.find_position_fit(
            prospect.get('measurables', {})
        )

        # Format scores outside the f-string so missing values don't break formatting
        overall = physical.get('overall_score')
        overall_str = f"{overall:.1f}/100" if overall is not None else "N/A"
        production_str = f"{stats:.1f}/100" if stats else "N/A"

        report = f"""
PROSPECT EVALUATION REPORT
{'=' * 50}

Name: {prospect.get('name', 'Unknown')}
Position: {prospect['position']}
School: {prospect.get('high_school', 'Unknown')}
Rating: {prospect.get('star_rating', 0)}-star ({prospect.get('composite_rating', 0):.4f})

PHYSICAL PROFILE
{'-' * 30}
Overall Physical Score: {overall_str}

Key Measurements:
"""
        for metric, data in physical.get('metric_scores', {}).items():
            if isinstance(data, dict) and 'score' in data:
                report += f"  {metric}: {data.get('value', 'N/A')} (Score: {data['score']:.0f})\n"

        report += f"""
PRODUCTION ANALYSIS
{'-' * 30}
Production Score: {production_str}

POSITION FIT ANALYSIS
{'-' * 30}
"""
        for fit in position_fits[:3]:
            report += f"  {fit['position']}: {fit['fit_score']:.1f}/100\n"

        report += f"""
PROJECTION
{'-' * 30}
Based on historical data for similar prospects:
- Probability of becoming starter: TBD%
- Probability of all-conference: TBD%
- Probability of being drafted: TBD%

RECOMMENDATION
{'-' * 30}
[Evaluation summary and recommendation]
"""
        return report
    def generate_class_summary(self,
                               current_class: pd.DataFrame,
                               current_roster: pd.DataFrame) -> str:
        """
        Generate recruiting class summary report.
        """
        # Analyze needs
        needs = self.class_optimizer.analyze_roster_needs(current_roster)

        # Evaluate class
        class_eval = self.class_optimizer.evaluate_class(current_class)

        summary = f"""
RECRUITING CLASS SUMMARY: {self.team}
{'=' * 50}

CLASS COMPOSITION
{'-' * 30}
Total Commits: {class_eval['size']}
Average Rating: {class_eval['avg_rating']:.4f}
Total Points: {class_eval['total_points']:.1f}

Star Breakdown:
  5-stars: {class_eval['five_stars']}
  4-stars: {class_eval['four_stars']}
  3-stars: {class_eval['three_stars']}

By Position:
"""
        for pos, count in class_eval['by_position'].items():
            need = needs.get(pos, {})
            need_level = need.get('need_level', 'unknown')
            summary += f"  {pos}: {count} commits (Need: {need_level})\n"

        summary += f"""
POSITION NEEDS ANALYSIS
{'-' * 30}
"""
        critical_needs = [p for p, n in needs.items() if n.get('need_level') == 'critical']
        moderate_needs = [p for p, n in needs.items() if n.get('need_level') == 'moderate']

        summary += f"Critical Needs: {', '.join(critical_needs) if critical_needs else 'None'}\n"
        summary += f"Moderate Needs: {', '.join(moderate_needs) if moderate_needs else 'None'}\n"

        return summary
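The per-class aggregation that `calculate_recruiting_roi` builds with an explicit loop can also be expressed as a single `groupby(...).agg(...)` call. This sketch uses invented recruit data purely to show the pattern:

```python
import pandas as pd

# Invented recruits spanning two recruiting classes
recruits = pd.DataFrame({
    'recruiting_class': [2020, 2020, 2020, 2021, 2021],
    'composite_rating': [0.98, 0.89, 0.85, 0.91, 0.87],
    'career_starts':    [30,   14,   3,    20,   0],
    'all_conference':   [1,    0,    0,    1,    0],
    'drafted':          [1,    0,    0,    1,    0],
})

# Named aggregation: one output column per (input column, function) pair
contributions = recruits.groupby('recruiting_class').agg(
    size=('composite_rating', 'size'),
    avg_rating=('composite_rating', 'mean'),
    starters_produced=('career_starts', lambda s: (s >= 12).sum()),
    all_conference=('all_conference', 'sum'),
    draft_picks=('drafted', 'sum'),
)
print(contributions)
```

The vectorized form avoids building the intermediate list of dicts and yields the same per-class table that feeds the averages in the ROI summary.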
Summary
This chapter covered the application of analytics to college football recruiting:
Key Concepts:
1. Recruiting Data Sources: Physical measurables, performance data, contextual information, and evaluation scores each provide unique insights
2. Prospect Evaluation: Combining physical profiles, statistical production, and contextual factors creates comprehensive assessments
3. Rating Systems: Understanding what star ratings mean and their limitations improves decision-making
4. Class Optimization: Balancing position needs, talent, and resources optimizes recruiting outcomes
5. Transfer Portal: The portal requires specialized evaluation considering previous experience and fit
6. Efficiency Metrics: Development efficiency and ROI metrics help programs evaluate their recruiting processes
Best Practices:
- Use multiple data sources rather than relying solely on star ratings
- Account for competition level when evaluating high school performance
- Consider position-specific physical profiles
- Balance immediate needs with long-term roster planning
- Evaluate transfer candidates differently than high school prospects
- Track development efficiency to improve recruiting processes
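One of the practices above, accounting for competition level, can be sketched as a simple multiplier on a production score. The classification labels and weights below are illustrative assumptions, not calibrated values from this chapter's data:

```python
# Hypothetical strength-of-competition multipliers by high school classification
COMPETITION_WEIGHTS = {
    '6A': 1.00,   # largest schools, toughest schedules
    '5A': 0.95,
    '4A': 0.88,
    '3A': 0.80,
}

def adjusted_production(raw_score: float, classification: str) -> float:
    """Scale a 0-100 production score by an assumed competition weight."""
    # Unlisted classifications fall back to a conservative default
    weight = COMPETITION_WEIGHTS.get(classification, 0.75)
    return raw_score * weight

# Two prospects with identical raw production at different competition levels
print(adjusted_production(90.0, '6A'))  # 90.0
print(adjusted_production(90.0, '3A'))  # 72.0
```

Even this crude adjustment separates prospects whose raw stat lines look identical, which is the point of the practice: raw high school production is only comparable after conditioning on who it came against.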
Next Steps: The next chapter explores Win Probability Models, applying these concepts to in-game decision making and real-time analytics.