14 min read

In This Chapter

Introduction
23.1 College Statistics Translation to NBA
23.2 Conference Strength Adjustments
23.3 Physical Measurements and Combine Data
23.4 Draft Position Value Curves
23.5 Bust Probability Modeling
23.6 International Prospect Evaluation
23.7 Age Adjustments for Prospects
23.8 Building a Draft Model from Scratch
23.9 Backtesting Draft Models
23.10 Practical Considerations for Draft Model Deployment
Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 23: Draft Modeling and Prospect Evaluation

Introduction

The NBA Draft represents one of the highest-leverage decisions a franchise can make. A single selection can transform a struggling team into a contender for a decade, while a poor choice can set a franchise back years. Despite drafting being arguably the most important team-building mechanism in professional basketball, it remains one of the most uncertain endeavors in sports analytics.

This chapter provides a comprehensive framework for building, validating, and deploying draft models. We will explore the statistical translation of college performance to the NBA, account for the varying strength of competition across conferences, incorporate physical measurements and combine data, and develop probability models for prospect success and failure. By the end of this chapter, you will have the tools to construct your own draft evaluation system from scratch.

The challenge of draft modeling lies in its fundamental uncertainty. Unlike in-season analytics where we have hundreds of games to establish patterns, draft prospects often have limited sample sizes of relevant performance data. A college player might have only 30-40 games per season, often against vastly different competition levels. International prospects may play in leagues with different rules, pace, and talent levels. High school players (when eligible) have almost no relevant data at all.

Despite these challenges, systematic approaches to draft evaluation consistently outperform pure intuition. Research has shown that statistical models, even simple ones, can identify value that traditional scouting misses and flag risks that the eye test overlooks.

23.1 College Statistics Translation to NBA

The foundation of any draft model is understanding how college statistics translate to NBA performance. Raw college numbers are nearly meaningless without proper context and translation.

23.1.1 The Translation Problem

Consider two hypothetical prospects: - Player A: 22.5 PPG, 8.2 RPG, 3.1 APG in the Big 12 - Player B: 24.8 PPG, 9.5 RPG, 2.8 APG in the Summit League

Which player projects better? Without understanding how statistics translate across different contexts, we cannot answer this question. Player B has better raw numbers, but Player A faced significantly tougher competition.

23.1.2 Per-Possession Translation

The first step in translation is normalizing for pace. College basketball features vastly different tempos across teams and conferences. A player on a team that averages 75 possessions per game has more statistical opportunities than one on a team averaging 65 possessions.

Per-100-possession rates provide the baseline for comparison:

$$\text{Stat per 100 poss} = \frac{\text{Raw Stat}}{\text{Team Possessions}} \times 100$$

Where team possessions can be estimated as:

$$\text{Possessions} \approx \text{FGA} - \text{ORB} + \text{TOV} + 0.44 \times \text{FTA}$$

23.1.3 Usage and Efficiency Relationship

One of the most important concepts in statistical translation is the usage-efficiency tradeoff. Players with higher usage rates (proportion of team possessions used while on court) tend to have lower efficiency. This relationship is approximately linear for most usage ranges:

$$\text{TS\%}_{adjusted} = \text{TS\%}_{observed} + \beta \times (\text{USG\%}_{observed} - \text{USG\%}_{projected})$$

Research suggests $\beta \approx 0.01$ to $0.015$, meaning for every 1% decrease in usage, true shooting percentage increases by approximately 1-1.5 percentage points.

23.1.4 Minute-Load Adjustments

Playing 35 minutes per game in college while maintaining efficiency is different from playing 25 minutes. Fatigue affects performance, and players who can maintain production over heavy minutes demonstrate valuable endurance:

$$\text{Fatigue Factor} = \frac{\text{MPG}}{32} \times \text{TS\%}$$

Players with higher fatigue factors project to handle heavier NBA workloads.

23.1.5 Historical Translation Coefficients

Based on analysis of draft classes from 2005-2020, the following translation coefficients have been established for major statistical categories:

Statistic	Translation Coefficient	Standard Error
Points per 100	0.72	0.08
Rebounds per 100	0.85	0.06
Assists per 100	0.68	0.09
Steals per 100	0.75	0.11
Blocks per 100	0.82	0.10
3PT%	0.91	0.04
FT%	0.98	0.02

These coefficients represent the expected NBA production as a fraction of college production after controlling for pace and competition.

def translate_college_stats(player_stats, coefficients):
    """
    Translate college statistics to projected NBA statistics.

    Parameters:
    -----------
    player_stats : dict
        Dictionary containing per-100-possession college statistics
    coefficients : dict
        Translation coefficients for each statistic

    Returns:
    --------
    dict : Projected NBA statistics
    """
    translated = {}
    for stat, value in player_stats.items():
        if stat in coefficients:
            translated[stat] = value * coefficients[stat]['coefficient']
            translated[f'{stat}_lower'] = value * (coefficients[stat]['coefficient']
                                                    - 1.96 * coefficients[stat]['se'])
            translated[f'{stat}_upper'] = value * (coefficients[stat]['coefficient']
                                                    + 1.96 * coefficients[stat]['se'])
    return translated

23.2 Conference Strength Adjustments

Not all college basketball is created equal. The gap between a power conference and a mid-major can be immense, and failing to account for this leads to systematic errors in prospect evaluation.

23.2.1 Measuring Conference Strength

Conference strength can be measured through several methods:

Average Adjusted Efficiency Margin (AdjEM): The average efficiency margin of all teams in the conference after adjusting for schedule.
NBA Production Rate: The historical rate at which conference players become productive NBA players.
Inter-Conference Game Results: Head-to-head results between conferences in non-conference games and tournaments.

23.2.2 Conference Adjustment Factors

Based on historical data, the following conference adjustment factors apply to per-100-possession statistics (relative to a baseline of 1.0 for the average high-major conference):

Conference	Adjustment Factor
Big Ten	1.08
Big 12	1.06
ACC	1.05
SEC	1.04
Big East	1.03
Pac-12	1.00
American	0.92
Mountain West	0.91
Atlantic 10	0.88
WCC	0.85
Other Mid-Majors	0.75-0.85
Low-Major	0.65-0.75

These factors should be applied multiplicatively to translated statistics:

$$\text{Adjusted Stat} = \text{Translated Stat} \times \text{Conference Factor}$$

23.2.3 Within-Conference Variation

Conference adjustment alone is insufficient. A player's statistics against ranked opponents versus unranked opponents within the same conference can reveal important information:

def calculate_competition_quality_adjustment(
    stats_vs_ranked,
    stats_vs_unranked,
    games_vs_ranked,
    games_vs_unranked
):
    """
    Calculate adjustment based on performance against different competition levels.

    Parameters:
    -----------
    stats_vs_ranked : float
        Statistical production against ranked opponents
    stats_vs_unranked : float
        Statistical production against unranked opponents
    games_vs_ranked : int
        Number of games against ranked opponents
    games_vs_unranked : int
        Number of games against unranked opponents

    Returns:
    --------
    float : Competition quality multiplier
    """
    if games_vs_ranked < 3:
        return 1.0  # Insufficient sample

    # Calculate weighted performance
    total_games = games_vs_ranked + games_vs_unranked
    weighted_ranked = (stats_vs_ranked * games_vs_ranked) / total_games
    weighted_unranked = (stats_vs_unranked * games_vs_unranked) / total_games

    # Players who maintain production vs ranked opponents get bonus
    ratio = stats_vs_ranked / stats_vs_unranked if stats_vs_unranked > 0 else 1.0

    # Adjustment ranges from 0.9 (poor vs quality) to 1.1 (excellent vs quality)
    adjustment = 0.9 + (ratio * 0.2)
    return min(max(adjustment, 0.85), 1.15)

23.2.4 Tournament Performance Adjustment

March Madness provides a unique laboratory for evaluating prospects against diverse competition. Performance in the NCAA tournament, particularly in later rounds, carries additional predictive weight:

First Four/Round of 64: 1.0x weight
Round of 32: 1.05x weight
Sweet 16: 1.10x weight
Elite Eight: 1.15x weight
Final Four/Championship: 1.20x weight

These weights reflect both the quality of competition and the high-pressure environment that mimics playoff basketball.

23.3 Physical Measurements and Combine Data

Physical attributes provide crucial context for statistical production and project how a player's game will translate to the NBA's more athletic environment.

23.3.1 Key Physical Measurements

The NBA Draft Combine measures the following key attributes:

Height Measurements: - Height without shoes - Height with shoes - Wingspan - Standing reach

Athletic Testing: - Standing vertical leap - Maximum vertical leap - Lane agility time - Three-quarter court sprint - Bench press repetitions

23.3.2 Position-Specific Physical Thresholds

Different positions have different physical requirements. The following table shows average measurements for successful NBA players by position:

Position	Height (no shoes)	Wingspan	Standing Reach	Max Vert
PG	6'1"	6'5"	8'1"	38"
SG	6'4"	6'8"	8'4"	37"
SF	6'6"	6'11"	8'7"	35"
PF	6'8"	7'0"	8'10"	33"
C	6'10"	7'3"	9'2"	31"

23.3.3 Length and Reach Ratios

Pure height is less important than length relative to height. The wingspan-to-height ratio is a key predictor of defensive potential:

$$\text{Wingspan Ratio} = \frac{\text{Wingspan}}{\text{Height}}$$

Ratio > 1.06: Elite length, high defensive upside
Ratio 1.03-1.06: Good length
Ratio 1.00-1.03: Average length
Ratio < 1.00: Poor length, defensive concerns

Similarly, standing reach relative to height matters for rim protection and finishing:

$$\text{Reach Ratio} = \frac{\text{Standing Reach}}{\text{Height}}$$

23.3.4 Athletic Composite Scores

Combining athletic measurements into a single composite score allows for easier comparison:

def calculate_athletic_composite(measurements, position):
    """
    Calculate athletic composite score adjusted for position.

    Parameters:
    -----------
    measurements : dict
        Player's combine measurements
    position : str
        Player's primary position

    Returns:
    --------
    float : Athletic composite score (0-100 scale)
    """
    # Position-specific weights
    weights = {
        'PG': {'vert': 0.35, 'agility': 0.40, 'sprint': 0.25},
        'SG': {'vert': 0.35, 'agility': 0.35, 'sprint': 0.30},
        'SF': {'vert': 0.40, 'agility': 0.30, 'sprint': 0.30},
        'PF': {'vert': 0.45, 'agility': 0.30, 'sprint': 0.25},
        'C': {'vert': 0.50, 'agility': 0.35, 'sprint': 0.15}
    }

    # Normalize each measurement to 0-100 scale
    # (Using historical percentiles from combine data)
    vert_score = percentile_score(measurements['max_vert'], 'max_vert', position)
    agility_score = percentile_score(measurements['lane_agility'], 'lane_agility', position)
    sprint_score = percentile_score(measurements['sprint'], 'sprint', position)

    # Calculate weighted composite
    pos_weights = weights[position]
    composite = (
        vert_score * pos_weights['vert'] +
        agility_score * pos_weights['agility'] +
        sprint_score * pos_weights['sprint']
    )

    return composite

23.3.5 Body Mass Index and Frame

A player's current weight relative to their frame affects both their immediate and long-term projection:

$$\text{BMI} = \frac{\text{Weight (lbs)}}{[\text{Height (in)}]^2} \times 703$$

However, BMI alone is insufficient. The frame assessment considers:

Current weight relative to ideal playing weight
Shoulder width and bone structure
Potential to add or lose weight
Body fat percentage (when available)

Players with room to add weight to their frame project differently than players already at their physical ceiling.

23.4 Draft Position Value Curves

Understanding the expected value of each draft position is essential for evaluating trade-ups, trade-downs, and pick valuations.

23.4.1 Historical Value by Pick

Analysis of NBA drafts from 1995-2020 reveals the following approximate relationship between pick number and career value (measured in career Win Shares):

$$\text{Expected WS} = 45.2 \times e^{-0.065 \times \text{Pick}} + 12.8$$

This exponential decay model captures the steep drop-off in value after the top picks.

23.4.2 Pick Value Relative to First Overall

For trade valuation purposes, it's useful to express pick value relative to the first overall pick:

Pick	Relative Value	Pick	Relative Value
1	1.00	16	0.28
2	0.85	17	0.26
3	0.75	18	0.24
4	0.67	19	0.22
5	0.60	20	0.20
6	0.54	25	0.15
7	0.49	30	0.12
8	0.45	40	0.07
9	0.41	50	0.04
10	0.38	60	0.02

23.4.3 Variance by Draft Position

Higher picks not only have higher expected value but also lower variance in outcomes. The standard deviation of career Win Shares by pick range:

Picks 1-5: SD = 18.5 WS
Picks 6-10: SD = 21.2 WS
Picks 11-20: SD = 19.8 WS
Picks 21-30: SD = 16.4 WS
Second Round: SD = 12.1 WS

The higher variance in the 6-10 range reflects the "boom or bust" nature of these selections, where teams often take higher-risk, higher-upside prospects.

23.4.4 Position-Specific Draft Value

Different positions have historically shown different value curves:

def calculate_positional_pick_value(pick_number, position):
    """
    Calculate expected value of a draft pick by position.

    Parameters:
    -----------
    pick_number : int
        Draft position (1-60)
    position : str
        Player position (PG, SG, SF, PF, C)

    Returns:
    --------
    dict : Expected value metrics
    """
    # Base value calculation
    base_value = 45.2 * np.exp(-0.065 * pick_number) + 12.8

    # Position multipliers (historical success rates)
    position_multipliers = {
        'PG': 1.05,  # Guards slightly overperform expectations
        'SG': 0.95,
        'SF': 1.02,
        'PF': 1.08,  # Frontcourt historically safer
        'C': 0.90   # Centers most volatile
    }

    adjusted_value = base_value * position_multipliers[position]

    # Variance by position
    variance_multipliers = {
        'PG': 1.0,
        'SG': 1.15,
        'SF': 1.10,
        'PF': 0.90,
        'C': 1.25
    }

    base_variance = 18.5 if pick_number <= 5 else (
        21.2 if pick_number <= 10 else (
        19.8 if pick_number <= 20 else (
        16.4 if pick_number <= 30 else 12.1)))

    adjusted_variance = base_variance * variance_multipliers[position]

    return {
        'expected_ws': adjusted_value,
        'variance': adjusted_variance,
        'p_allstar': calculate_allstar_probability(pick_number, position),
        'p_rotation': calculate_rotation_probability(pick_number, position),
        'p_bust': calculate_bust_probability(pick_number, position)
    }

23.5 Bust Probability Modeling

One of the most valuable applications of draft analytics is identifying prospects with elevated bust risk. A "bust" can be defined in several ways, but we'll use the practical definition: a player who fails to provide value commensurate with their draft position.

23.5.1 Defining Bust Thresholds

Draft Range	Bust Threshold (Career WS)	Bust Threshold (Peak Season WS)
1-5	< 20	< 5
6-10	< 15	< 4
11-20	< 10	< 3
21-30	< 5	< 2

23.5.2 Bust Predictors

Research has identified several factors that elevate bust probability:

Statistical Red Flags: - Poor free throw percentage (< 70%) for guards - High turnover rate relative to usage - Low steal rate for guards - Low block rate for bigs without elite athleticism - Declining production from freshman to sophomore/junior year

Physical Red Flags: - Below-average wingspan for position - Poor athletic testing numbers - Injury history, particularly to knees or feet - Older age relative to draft class

Contextual Red Flags: - Statistical production dependent on superior teammates - Limited experience against high-level competition - System-dependent production (e.g., specific offensive schemes)

23.5.3 Bust Probability Model

def calculate_bust_probability(prospect_data, draft_position):
    """
    Calculate probability a prospect will bust relative to draft position.

    Parameters:
    -----------
    prospect_data : dict
        Comprehensive prospect data including stats, measurements, context
    draft_position : int
        Expected or actual draft position

    Returns:
    --------
    float : Bust probability (0-1)
    """
    # Base bust rate by draft position
    if draft_position <= 5:
        base_rate = 0.25
    elif draft_position <= 10:
        base_rate = 0.35
    elif draft_position <= 20:
        base_rate = 0.45
    else:
        base_rate = 0.60

    # Calculate risk factors
    risk_multiplier = 1.0

    # Free throw percentage (guards only)
    if prospect_data['position'] in ['PG', 'SG']:
        if prospect_data['ft_pct'] < 0.70:
            risk_multiplier *= 1.3
        elif prospect_data['ft_pct'] < 0.75:
            risk_multiplier *= 1.1

    # Age factor
    age_at_draft = prospect_data['age']
    if age_at_draft > 22:
        risk_multiplier *= 1.25
    elif age_at_draft > 21:
        risk_multiplier *= 1.1
    elif age_at_draft < 19:
        risk_multiplier *= 0.9  # Younger players have more upside

    # Wingspan ratio
    wingspan_ratio = prospect_data['wingspan'] / prospect_data['height']
    if wingspan_ratio < 1.0:
        risk_multiplier *= 1.35
    elif wingspan_ratio < 1.03:
        risk_multiplier *= 1.15

    # Production trend
    if prospect_data.get('production_trend') == 'declining':
        risk_multiplier *= 1.4
    elif prospect_data.get('production_trend') == 'improving':
        risk_multiplier *= 0.85

    # Competition level
    if prospect_data['conference_strength'] < 0.80:
        risk_multiplier *= 1.3

    # Calculate final probability
    bust_prob = base_rate * risk_multiplier

    # Cap probability at reasonable bounds
    return min(max(bust_prob, 0.05), 0.95)

23.5.4 Bust Categories

Not all busts are created equal. Understanding the type of bust risk helps inform evaluation:

Injury Busts: Players with physical red flags who never stay healthy
Skill Busts: Players whose college skills don't translate
Athletic Busts: Players who relied on physical advantages that disappear at the NBA level
Development Busts: Players who fail to improve necessary skills
Contextual Busts: Players whose production was system or teammate dependent

Each category has different predictors and different potential mitigation strategies.

23.6 International Prospect Evaluation

International prospects present unique challenges for draft models. Different leagues, rules, competition levels, and data availability all complicate evaluation.

23.6.1 League Strength Adjustments

International leagues vary significantly in quality. The following hierarchy represents approximate NBA translation rates:

Tier 1 (Strong Translation): - EuroLeague (top teams) - Spanish ACB - Turkish BSL

Tier 2 (Moderate Translation): - Italian Serie A - French Pro A - German BBL - EuroLeague (lower-tier teams)

Tier 3 (Weaker Translation): - Other European leagues - Australian NBL - Chinese CBA

23.6.2 International Statistical Translation

International statistics require different translation factors than college statistics:

League	Points	Rebounds	Assists	Efficiency
EuroLeague	0.82	0.88	0.75	0.85
ACB	0.78	0.85	0.72	0.82
EuroCup	0.70	0.80	0.68	0.75
Other Europe	0.55-0.65	0.70-0.80	0.55-0.65	0.60-0.70

23.6.3 International Context Factors

Several factors affect how international statistics translate:

Game Style Differences: - Shorter three-point line (until recently harmonized) - Different foul rules - FIBA-style play tends to be slower, more structured - Less isolation-heavy offense

Role Considerations: - Young international players often play limited roles on veteran teams - Per-minute production may be more relevant than total production - Performance against other NBA-level talent in EuroLeague is especially predictive

def translate_international_stats(player_stats, league, age, role='rotation'):
    """
    Translate international statistics to projected NBA statistics.

    Parameters:
    -----------
    player_stats : dict
        Per-40-minute statistics from international league
    league : str
        League identifier
    age : float
        Player's age at time of evaluation
    role : str
        Player's role on team ('star', 'rotation', 'bench')

    Returns:
    --------
    dict : Projected NBA statistics with confidence intervals
    """
    # League-specific translation coefficients
    league_coefficients = {
        'EuroLeague': {'pts': 0.82, 'reb': 0.88, 'ast': 0.75, 'ts': 0.92},
        'ACB': {'pts': 0.78, 'reb': 0.85, 'ast': 0.72, 'ts': 0.90},
        'EuroCup': {'pts': 0.70, 'reb': 0.80, 'ast': 0.68, 'ts': 0.88},
        'Other': {'pts': 0.60, 'reb': 0.75, 'ast': 0.60, 'ts': 0.85}
    }

    coefs = league_coefficients.get(league, league_coefficients['Other'])

    # Role adjustment - limited role players project better per-minute
    role_multipliers = {
        'star': 0.95,  # May have inflated stats
        'rotation': 1.00,
        'bench': 1.10  # Per-minute stats likely sustainable
    }

    role_mult = role_multipliers[role]

    # Age adjustment for international players
    # Younger international players have more projection
    if age < 20:
        age_mult = 1.15
    elif age < 21:
        age_mult = 1.08
    elif age < 22:
        age_mult = 1.02
    else:
        age_mult = 0.95

    translated = {}
    for stat, value in player_stats.items():
        if stat in coefs:
            base_translation = value * coefs[stat] * role_mult * age_mult
            translated[stat] = base_translation
            # International projections have wider uncertainty
            translated[f'{stat}_lower'] = base_translation * 0.70
            translated[f'{stat}_upper'] = base_translation * 1.30

    return translated

23.6.4 International Draft Success Patterns

Historical analysis reveals patterns in international draft success:

Players drafted from EuroLeague at ages 19-21 have the highest success rate
International players who produce at young ages against veteran competition are strong bets
Players from basketball-development countries (Spain, Serbia, France, etc.) translate better
International bigs have historically translated better than guards

23.7 Age Adjustments for Prospects

Age is one of the strongest predictors of future development. Younger players at any given level of production have more room to grow.

23.7.1 The Age-Production Framework

The key insight is that what matters is not absolute production, but production relative to age and experience. A 19-year-old averaging 15 PPG in the Big 12 projects better than a 22-year-old averaging 20 PPG.

23.7.2 Age Adjustment Formulas

For college prospects:

$$\text{Age-Adjusted Production} = \text{Raw Production} \times \left(\frac{22}{\text{Age}}\right)^{0.8}$$

This formula gives younger players significant credit for matching older players' production.

23.7.3 Experience Adjustments

Years of experience at the college level also matter:

Class	Experience Multiplier
Freshman	1.25
Sophomore	1.10
Junior	1.00
Senior	0.90
5th Year	0.82

These multipliers reflect the observation that freshmen who produce at high levels have more projection than seniors with similar numbers.

23.7.4 Age-Based Projection Model

def project_age_adjusted_production(
    current_stats,
    current_age,
    years_of_college,
    position
):
    """
    Project future NBA production with age adjustments.

    Parameters:
    -----------
    current_stats : dict
        Current per-100-possession statistics
    current_age : float
        Player's current age
    years_of_college : int
        Number of years in college
    position : str
        Player's position

    Returns:
    --------
    dict : Projected peak NBA statistics
    """
    # Base age adjustment
    age_factor = (22 / current_age) ** 0.8

    # Experience adjustment
    experience_factors = {1: 1.25, 2: 1.10, 3: 1.00, 4: 0.90, 5: 0.82}
    exp_factor = experience_factors.get(years_of_college, 0.82)

    # Position-specific development curves
    # Guards tend to take longer to develop
    position_dev = {
        'PG': {'peak_age': 28, 'dev_rate': 0.08},
        'SG': {'peak_age': 27, 'dev_rate': 0.07},
        'SF': {'peak_age': 27, 'dev_rate': 0.06},
        'PF': {'peak_age': 27, 'dev_rate': 0.06},
        'C': {'peak_age': 26, 'dev_rate': 0.05}
    }

    pos_info = position_dev[position]
    years_to_peak = pos_info['peak_age'] - current_age

    # Project improvement
    projected_improvement = 1 + (pos_info['dev_rate'] * years_to_peak)

    # Calculate projected peak stats
    total_factor = age_factor * exp_factor * projected_improvement

    projected_stats = {}
    for stat, value in current_stats.items():
        projected_stats[f'{stat}_projected'] = value * total_factor
        projected_stats[f'{stat}_floor'] = value * total_factor * 0.75
        projected_stats[f'{stat}_ceiling'] = value * total_factor * 1.35

    return projected_stats

23.7.5 One-and-Done Evaluation

One-and-done prospects (single year of college) present unique challenges:

Limited sample size of college performance
Often played with inferior teammates
May have been underutilized in college systems
Physical development still incomplete

For these prospects, additional weight should be placed on: - High school rankings and recruiting evaluation - Performance at elite camps (McDonald's All-American, etc.) - Athletic testing at the combine - Interviews and character assessment

23.8 Building a Draft Model from Scratch

Now we'll walk through the complete process of building a comprehensive draft model.

23.8.1 Data Collection and Preparation

The first step is assembling a comprehensive dataset:

Required Data: 1. College statistics (per-game and per-100-possession) 2. Conference and team identifiers 3. Physical measurements from combine 4. Age and experience information 5. Historical draft positions and career outcomes

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
import xgboost as xgb

class DraftModel:
    """
    Comprehensive NBA Draft prospect evaluation model.
    """

    def __init__(self):
        self.value_model = None
        self.bust_model = None
        self.allstar_model = None
        self.scaler = StandardScaler()
        self.feature_names = None

    def prepare_features(self, prospect_df):
        """
        Engineer features from raw prospect data.

        Parameters:
        -----------
        prospect_df : pd.DataFrame
            Raw prospect data

        Returns:
        --------
        pd.DataFrame : Engineered features
        """
        features = pd.DataFrame()

        # Per-100 possession statistics (pace-adjusted)
        features['pts_per100'] = prospect_df['points'] / prospect_df['team_poss'] * 100
        features['reb_per100'] = prospect_df['rebounds'] / prospect_df['team_poss'] * 100
        features['ast_per100'] = prospect_df['assists'] / prospect_df['team_poss'] * 100
        features['stl_per100'] = prospect_df['steals'] / prospect_df['team_poss'] * 100
        features['blk_per100'] = prospect_df['blocks'] / prospect_df['team_poss'] * 100
        features['tov_per100'] = prospect_df['turnovers'] / prospect_df['team_poss'] * 100

        # Efficiency metrics
        features['ts_pct'] = prospect_df['points'] / (
            2 * (prospect_df['fga'] + 0.44 * prospect_df['fta'])
        )
        features['efg_pct'] = (
            prospect_df['fgm'] + 0.5 * prospect_df['fg3m']
        ) / prospect_df['fga']
        features['ft_pct'] = prospect_df['ftm'] / prospect_df['fta'].replace(0, 1)
        features['fg3_pct'] = prospect_df['fg3m'] / prospect_df['fg3a'].replace(0, 1)

        # Usage and role metrics
        features['usg_pct'] = prospect_df['usage_rate']
        features['ast_to_tov'] = prospect_df['assists'] / prospect_df['turnovers'].replace(0, 1)
        features['stl_pct'] = prospect_df['steal_rate']
        features['blk_pct'] = prospect_df['block_rate']

        # Physical measurements
        features['height'] = prospect_df['height_no_shoes']
        features['wingspan'] = prospect_df['wingspan']
        features['wingspan_ratio'] = prospect_df['wingspan'] / prospect_df['height_no_shoes']
        features['standing_reach'] = prospect_df['standing_reach']
        features['max_vert'] = prospect_df['max_vertical']
        features['lane_agility'] = prospect_df['lane_agility']
        features['sprint'] = prospect_df['three_quarter_sprint']

        # Age and experience
        features['age'] = prospect_df['age_at_draft']
        features['years_college'] = prospect_df['college_years']
        features['age_adjusted'] = features['age'] - features['years_college']

        # Conference strength
        features['conf_strength'] = prospect_df['conference_strength']

        # Competition-adjusted statistics
        for stat in ['pts_per100', 'reb_per100', 'ast_per100']:
            features[f'{stat}_adj'] = features[stat] * features['conf_strength']

        # Production trends
        if 'prev_year_pts' in prospect_df.columns:
            features['pts_improvement'] = (
                features['pts_per100'] - prospect_df['prev_year_pts_per100']
            ) / prospect_df['prev_year_pts_per100'].replace(0, 1)

        # Composite scores
        features['box_plus_minus'] = prospect_df.get('bpm',
            self._estimate_bpm(features))

        # Age-adjusted production
        features['age_adj_pts'] = features['pts_per100_adj'] * (22 / features['age']) ** 0.8
        features['age_adj_composite'] = (
            features['pts_per100_adj'] * 0.4 +
            features['reb_per100'] * features['conf_strength'] * 0.2 +
            features['ast_per100'] * features['conf_strength'] * 0.2 +
            (features['stl_per100'] + features['blk_per100']) * features['conf_strength'] * 0.2
        ) * (22 / features['age']) ** 0.8

        self.feature_names = features.columns.tolist()
        return features

    def _estimate_bpm(self, features):
        """Estimate Box Plus/Minus from available statistics."""
        # Simplified BPM estimation
        return (
            features['pts_per100'] * 0.05 +
            features['reb_per100'] * 0.1 +
            features['ast_per100'] * 0.15 -
            features['tov_per100'] * 0.1 +
            features['stl_per100'] * 0.2 +
            features['blk_per100'] * 0.15 +
            features['ts_pct'] * 10 - 8
        )

23.8.2 Target Variable Definition

Defining the target variable is crucial. Common approaches include:

Career Win Shares: Total value produced over career
Peak Win Shares: Best single-season production
Categorical Outcomes: All-Star, Starter, Rotation, Bust

def define_target_variables(career_df):
    """
    Define multiple target variables for draft modeling.

    Parameters:
    -----------
    career_df : pd.DataFrame
        Career statistics for historical draft picks

    Returns:
    --------
    pd.DataFrame : Target variables
    """
    targets = pd.DataFrame()

    # Continuous targets
    targets['career_ws'] = career_df['total_win_shares']
    targets['peak_ws'] = career_df['best_season_ws']
    targets['career_vorp'] = career_df['total_vorp']

    # Per-year value (accounts for career length)
    targets['ws_per_year'] = (
        career_df['total_win_shares'] /
        career_df['seasons_played'].replace(0, 1)
    )

    # Categorical targets
    targets['made_allstar'] = (career_df['allstar_selections'] > 0).astype(int)
    targets['made_allnba'] = (career_df['allnba_selections'] > 0).astype(int)

    # Role achievement
    targets['became_starter'] = (
        career_df['seasons_as_starter'] >= 3
    ).astype(int)

    targets['rotation_player'] = (
        career_df['total_minutes'] >= 5000
    ).astype(int)

    # Bust classification (relative to draft position)
    def is_bust(row):
        if row['draft_pick'] <= 5:
            return row['career_ws'] < 20
        elif row['draft_pick'] <= 10:
            return row['career_ws'] < 15
        elif row['draft_pick'] <= 20:
            return row['career_ws'] < 10
        else:
            return row['career_ws'] < 5

    targets['is_bust'] = career_df.apply(is_bust, axis=1).astype(int)

    return targets

23.8.3 Model Training

With features and targets defined, we can train multiple models for different prediction tasks:

def train_draft_models(self, features, targets, test_size=0.2):
    """
    Train ensemble of draft prediction models.

    Parameters:
    -----------
    features : pd.DataFrame
        Engineered feature matrix
    targets : pd.DataFrame
        Target variables
    test_size : float
        Proportion of data for testing

    Returns:
    --------
    dict : Training results and model performance
    """
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        features, targets, test_size=test_size, random_state=42
    )

    # Scale features
    X_train_scaled = self.scaler.fit_transform(X_train)
    X_test_scaled = self.scaler.transform(X_test)

    results = {}

    # Train value prediction model (Gradient Boosting)
    self.value_model = xgb.XGBRegressor(
        n_estimators=200,
        max_depth=5,
        learning_rate=0.05,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=42
    )
    self.value_model.fit(
        X_train_scaled,
        y_train['career_ws'],
        eval_set=[(X_test_scaled, y_test['career_ws'])],
        verbose=False
    )

    # Evaluate value model
    value_pred = self.value_model.predict(X_test_scaled)
    results['value_model'] = {
        'rmse': np.sqrt(np.mean((value_pred - y_test['career_ws'])**2)),
        'r2': 1 - np.sum((value_pred - y_test['career_ws'])**2) /
              np.sum((y_test['career_ws'] - y_test['career_ws'].mean())**2),
        'feature_importance': dict(zip(
            self.feature_names,
            self.value_model.feature_importances_
        ))
    }

    # Train bust probability model (Random Forest Classifier)
    self.bust_model = RandomForestClassifier(
        n_estimators=200,
        max_depth=8,
        min_samples_leaf=10,
        random_state=42
    )
    self.bust_model.fit(X_train_scaled, y_train['is_bust'])

    # Evaluate bust model
    bust_pred_proba = self.bust_model.predict_proba(X_test_scaled)[:, 1]
    bust_pred = (bust_pred_proba > 0.5).astype(int)
    results['bust_model'] = {
        'accuracy': np.mean(bust_pred == y_test['is_bust']),
        'auc': self._calculate_auc(y_test['is_bust'], bust_pred_proba),
        'feature_importance': dict(zip(
            self.feature_names,
            self.bust_model.feature_importances_
        ))
    }

    # Train All-Star probability model
    self.allstar_model = xgb.XGBClassifier(
        n_estimators=150,
        max_depth=4,
        learning_rate=0.05,
        random_state=42
    )
    self.allstar_model.fit(
        X_train_scaled,
        y_train['made_allstar'],
        eval_set=[(X_test_scaled, y_test['made_allstar'])],
        verbose=False
    )

    allstar_pred_proba = self.allstar_model.predict_proba(X_test_scaled)[:, 1]
    results['allstar_model'] = {
        'auc': self._calculate_auc(y_test['made_allstar'], allstar_pred_proba)
    }

    return results

23.8.4 Model Interpretation

Understanding what drives model predictions is as important as the predictions themselves:

def explain_prediction(self, prospect_features):
    """
    Generate interpretable explanation for prospect evaluation.

    Parameters:
    -----------
    prospect_features : pd.Series
        Feature values for single prospect

    Returns:
    --------
    dict : Prediction with explanations
    """
    # Scale features
    features_scaled = self.scaler.transform(
        prospect_features.values.reshape(1, -1)
    )

    # Get predictions
    predicted_ws = self.value_model.predict(features_scaled)[0]
    bust_prob = self.bust_model.predict_proba(features_scaled)[0, 1]
    allstar_prob = self.allstar_model.predict_proba(features_scaled)[0, 1]

    # Get feature contributions using SHAP-like analysis
    # (Simplified version - use actual SHAP for production)
    feature_impacts = {}
    base_pred = self.value_model.predict(
        np.zeros((1, len(self.feature_names)))
    )[0]

    for i, feature in enumerate(self.feature_names):
        # Measure impact of this feature
        modified = features_scaled.copy()
        modified[0, i] = 0
        impact = predicted_ws - self.value_model.predict(modified)[0]
        feature_impacts[feature] = impact

    # Sort by absolute impact
    sorted_impacts = sorted(
        feature_impacts.items(),
        key=lambda x: abs(x[1]),
        reverse=True
    )

    # Generate explanation
    explanation = {
        'predicted_career_ws': predicted_ws,
        'bust_probability': bust_prob,
        'allstar_probability': allstar_prob,
        'top_positive_factors': [
            (f, v) for f, v in sorted_impacts if v > 0
        ][:5],
        'top_negative_factors': [
            (f, v) for f, v in sorted_impacts if v < 0
        ][:5],
        'confidence_interval': (
            predicted_ws - 15,  # Approximate based on model uncertainty
            predicted_ws + 15
        )
    }

    return explanation

23.8.5 Generating a Draft Board

The final step is ranking prospects to generate a draft board:

def generate_draft_board(self, prospects_df, num_picks=60):
    """
    Generate ranked draft board with comprehensive evaluations.

    Parameters:
    -----------
    prospects_df : pd.DataFrame
        Current draft class prospect data
    num_picks : int
        Number of picks to rank

    Returns:
    --------
    pd.DataFrame : Ranked draft board
    """
    # Prepare features
    features = self.prepare_features(prospects_df)
    features_scaled = self.scaler.transform(features)

    # Generate predictions
    draft_board = prospects_df[['name', 'position', 'school', 'age']].copy()
    draft_board['predicted_ws'] = self.value_model.predict(features_scaled)
    draft_board['bust_prob'] = self.bust_model.predict_proba(features_scaled)[:, 1]
    draft_board['allstar_prob'] = self.allstar_model.predict_proba(features_scaled)[:, 1]

    # Calculate composite score
    # Balances expected value with risk
    draft_board['composite_score'] = (
        draft_board['predicted_ws'] * 0.6 +
        (1 - draft_board['bust_prob']) * 30 * 0.25 +
        draft_board['allstar_prob'] * 50 * 0.15
    )

    # Risk-adjusted value
    draft_board['risk_adj_value'] = (
        draft_board['predicted_ws'] * (1 - draft_board['bust_prob'] * 0.5)
    )

    # Rank by composite score
    draft_board = draft_board.sort_values(
        'composite_score',
        ascending=False
    ).reset_index(drop=True)

    draft_board['model_rank'] = range(1, len(draft_board) + 1)

    # Add tier classifications
    def assign_tier(rank):
        if rank <= 5:
            return 'Elite'
        elif rank <= 14:
            return 'Lottery'
        elif rank <= 30:
            return 'First Round'
        elif rank <= 45:
            return 'Early Second'
        else:
            return 'Late Second'

    draft_board['tier'] = draft_board['model_rank'].apply(assign_tier)

    return draft_board.head(num_picks)

23.9 Backtesting Draft Models

A draft model is only as good as its historical performance. Rigorous backtesting is essential for validating and improving models.

23.9.1 Walk-Forward Validation

The proper way to backtest a draft model is walk-forward validation, where we:

Train the model on all data before year Y
Predict outcomes for year Y's draft class
Evaluate predictions against actual outcomes
Move forward to year Y+1 and repeat

def walk_forward_backtest(self, historical_df, start_year=2010, end_year=2020):
    """
    Perform walk-forward validation of draft model.

    Parameters:
    -----------
    historical_df : pd.DataFrame
        Historical draft data with outcomes
    start_year : int
        First year to predict
    end_year : int
        Last year to predict

    Returns:
    --------
    dict : Backtest results by year
    """
    results = {}

    for year in range(start_year, end_year + 1):
        # Training data: all years before current
        train_data = historical_df[historical_df['draft_year'] < year]

        # Test data: current year
        test_data = historical_df[historical_df['draft_year'] == year]

        if len(train_data) < 100 or len(test_data) < 30:
            continue

        # Prepare features and targets
        train_features = self.prepare_features(train_data)
        train_targets = define_target_variables(train_data)

        test_features = self.prepare_features(test_data)
        test_targets = define_target_variables(test_data)

        # Train model on historical data
        self.scaler.fit(train_features)
        train_scaled = self.scaler.transform(train_features)
        test_scaled = self.scaler.transform(test_features)

        # Fit models
        self.value_model.fit(train_scaled, train_targets['career_ws'])
        self.bust_model.fit(train_scaled, train_targets['is_bust'])

        # Generate predictions
        predicted_ws = self.value_model.predict(test_scaled)
        predicted_bust = self.bust_model.predict_proba(test_scaled)[:, 1]

        # Calculate metrics
        actual_ws = test_targets['career_ws']
        actual_bust = test_targets['is_bust']

        results[year] = {
            'n_prospects': len(test_data),
            'ws_rmse': np.sqrt(np.mean((predicted_ws - actual_ws)**2)),
            'ws_correlation': np.corrcoef(predicted_ws, actual_ws)[0, 1],
            'bust_auc': self._calculate_auc(actual_bust, predicted_bust),
            'top10_accuracy': self._evaluate_top_picks(
                test_data, predicted_ws, actual_ws, top_n=10
            ),
            'value_over_baseline': self._calculate_vob(
                test_data, predicted_ws, actual_ws
            )
        }

        # Store detailed predictions for analysis
        results[year]['predictions'] = pd.DataFrame({
            'name': test_data['name'],
            'actual_pick': test_data['draft_pick'],
            'predicted_ws': predicted_ws,
            'actual_ws': actual_ws,
            'predicted_bust': predicted_bust,
            'actual_bust': actual_bust
        })

    return results

23.9.2 Benchmark Comparisons

Models should be compared against meaningful baselines:

Draft Position Baseline: How much better does the model perform than simply selecting by consensus draft position?
Mock Draft Baseline: Compare against aggregated mock drafts
Simple Statistical Models: Compare against basic regression models

def compare_to_baselines(self, backtest_results):
    """
    Compare model performance to baseline methods.

    Parameters:
    -----------
    backtest_results : dict
        Results from walk_forward_backtest

    Returns:
    --------
    pd.DataFrame : Comparison metrics
    """
    comparisons = []

    for year, results in backtest_results.items():
        predictions = results['predictions']

        # Model ranking correlation with actual outcomes
        model_corr = np.corrcoef(
            predictions['predicted_ws'].rank(ascending=False),
            predictions['actual_ws'].rank(ascending=False)
        )[0, 1]

        # Draft position correlation with actual outcomes
        draft_corr = np.corrcoef(
            predictions['actual_pick'],
            predictions['actual_ws'].rank(ascending=False)
        )[0, 1]

        # Value captured in top 10 picks
        predictions_sorted = predictions.sort_values(
            'predicted_ws', ascending=False
        )
        model_top10_value = predictions_sorted.head(10)['actual_ws'].sum()

        # Optimal top 10 value
        optimal_top10_value = predictions.nlargest(10, 'actual_ws')['actual_ws'].sum()

        # Draft order top 10 value
        draft_top10_value = predictions.nsmallest(
            10, 'actual_pick'
        )['actual_ws'].sum()

        comparisons.append({
            'year': year,
            'model_rank_correlation': model_corr,
            'draft_rank_correlation': draft_corr,
            'model_top10_value': model_top10_value,
            'draft_top10_value': draft_top10_value,
            'optimal_top10_value': optimal_top10_value,
            'model_efficiency': model_top10_value / optimal_top10_value,
            'draft_efficiency': draft_top10_value / optimal_top10_value,
            'model_advantage': model_top10_value - draft_top10_value
        })

    return pd.DataFrame(comparisons)

23.9.3 Error Analysis

Understanding where and why models fail is crucial for improvement:

def analyze_prediction_errors(self, backtest_results):
    """
    Analyze systematic errors in draft predictions.

    Parameters:
    -----------
    backtest_results : dict
        Results from walk_forward_backtest

    Returns:
    --------
    dict : Error analysis results
    """
    all_predictions = pd.concat([
        r['predictions'] for r in backtest_results.values()
    ])

    # Calculate prediction errors
    all_predictions['error'] = (
        all_predictions['predicted_ws'] - all_predictions['actual_ws']
    )
    all_predictions['abs_error'] = abs(all_predictions['error'])

    analysis = {}

    # Error by draft position range
    position_bins = [0, 5, 10, 20, 30, 60]
    all_predictions['position_bin'] = pd.cut(
        all_predictions['actual_pick'],
        bins=position_bins,
        labels=['1-5', '6-10', '11-20', '21-30', '31-60']
    )

    analysis['error_by_position'] = all_predictions.groupby('position_bin').agg({
        'error': ['mean', 'std'],
        'abs_error': 'mean'
    })

    # Biggest misses (predicted high, performed low)
    analysis['biggest_overestimates'] = all_predictions.nlargest(
        10, 'error'
    )[['name', 'actual_pick', 'predicted_ws', 'actual_ws', 'error']]

    # Biggest underestimates (predicted low, performed high)
    analysis['biggest_underestimates'] = all_predictions.nsmallest(
        10, 'error'
    )[['name', 'actual_pick', 'predicted_ws', 'actual_ws', 'error']]

    # Bust prediction accuracy
    bust_threshold = all_predictions['predicted_bust'].median()
    high_bust_flagged = all_predictions[
        all_predictions['predicted_bust'] > bust_threshold
    ]
    analysis['bust_flag_accuracy'] = high_bust_flagged['actual_bust'].mean()

    return analysis

23.9.4 Calibration Assessment

A well-calibrated model's probability predictions should match actual frequencies:

def assess_calibration(self, backtest_results, n_bins=10):
    """
    Assess probability calibration of draft models.

    Parameters:
    -----------
    backtest_results : dict
        Results from walk_forward_backtest
    n_bins : int
        Number of probability bins

    Returns:
    --------
    pd.DataFrame : Calibration analysis
    """
    all_predictions = pd.concat([
        r['predictions'] for r in backtest_results.values()
    ])

    # Bin predictions by predicted bust probability
    all_predictions['prob_bin'] = pd.cut(
        all_predictions['predicted_bust'],
        bins=n_bins,
        labels=[f'{i/n_bins:.1f}-{(i+1)/n_bins:.1f}'
                for i in range(n_bins)]
    )

    calibration = all_predictions.groupby('prob_bin').agg({
        'predicted_bust': 'mean',
        'actual_bust': ['mean', 'count']
    })

    calibration.columns = ['avg_predicted', 'actual_rate', 'n_observations']
    calibration['calibration_error'] = (
        calibration['avg_predicted'] - calibration['actual_rate']
    )

    return calibration

23.10 Practical Considerations for Draft Model Deployment

23.10.1 Updating Models in Real-Time

Draft models should be updated throughout the pre-draft process as new information becomes available:

Post-season tournament performance
NBA Combine measurements and results
Private workout feedback
Medical evaluation results
Interview assessments

23.10.2 Incorporating Qualitative Information

Statistical models should be combined with qualitative scouting:

def blend_model_with_scouts(
    model_ranking,
    scout_rankings,
    scout_weights,
    model_weight=0.6
):
    """
    Blend statistical model rankings with scout evaluations.

    Parameters:
    -----------
    model_ranking : pd.DataFrame
        Model-generated draft board
    scout_rankings : dict
        Dictionary of scout name -> ranking DataFrame
    scout_weights : dict
        Weights for each scout based on historical accuracy
    model_weight : float
        Weight for statistical model (0-1)

    Returns:
    --------
    pd.DataFrame : Blended draft board
    """
    # Normalize scout weights
    total_scout_weight = sum(scout_weights.values())
    norm_scout_weights = {
        k: v / total_scout_weight * (1 - model_weight)
        for k, v in scout_weights.items()
    }

    # Create composite ranking
    blended = model_ranking.copy()
    blended['weighted_rank'] = (
        model_ranking['model_rank'] * model_weight
    )

    for scout, ranking in scout_rankings.items():
        weight = norm_scout_weights.get(scout, 0)
        merged = blended.merge(
            ranking[['name', 'scout_rank']],
            on='name',
            how='left'
        )
        blended['weighted_rank'] += merged['scout_rank'].fillna(60) * weight

    # Re-rank based on weighted score
    blended = blended.sort_values('weighted_rank').reset_index(drop=True)
    blended['final_rank'] = range(1, len(blended) + 1)

    return blended

23.10.3 Position-of-Need Adjustments

While pure best-player-available is optimal in theory, practical draft strategy must consider roster construction:

def adjust_for_team_needs(
    draft_board,
    team_roster,
    position_values,
    need_premium=0.15
):
    """
    Adjust draft board based on team positional needs.

    Parameters:
    -----------
    draft_board : pd.DataFrame
        Base draft board rankings
    team_roster : dict
        Current roster with positions and values
    position_values : dict
        Current positional value on roster
    need_premium : float
        Maximum premium for filling a need

    Returns:
    --------
    pd.DataFrame : Need-adjusted draft board
    """
    # Calculate position needs
    ideal_distribution = {'PG': 2, 'SG': 2, 'SF': 2, 'PF': 2, 'C': 2}
    position_gaps = {}

    for pos, ideal in ideal_distribution.items():
        current = len([p for p in team_roster if p['position'] == pos])
        current_value = position_values.get(pos, 0)
        position_gaps[pos] = (ideal - current) / ideal * (1 - current_value / 100)

    # Adjust prospect values
    adjusted = draft_board.copy()
    for idx, row in adjusted.iterrows():
        pos = row['position']
        need_factor = 1 + position_gaps.get(pos, 0) * need_premium
        adjusted.loc[idx, 'team_adj_value'] = (
            row['composite_score'] * need_factor
        )

    adjusted = adjusted.sort_values(
        'team_adj_value',
        ascending=False
    ).reset_index(drop=True)

    adjusted['team_rank'] = range(1, len(adjusted) + 1)

    return adjusted

23.10.4 Trade Value Assessment

Understanding when to trade picks versus selecting:

def evaluate_trade_scenarios(
    draft_board,
    current_picks,
    trade_offers,
    pick_value_curve
):
    """
    Evaluate whether to accept trade offers for picks.

    Parameters:
    -----------
    draft_board : pd.DataFrame
        Model draft board
    current_picks : list
        Team's current pick positions
    trade_offers : list
        List of trade offer dictionaries
    pick_value_curve : callable
        Function mapping pick -> expected value

    Returns:
    --------
    list : Evaluated trade scenarios
    """
    evaluations = []

    for offer in trade_offers:
        # Calculate value of picks being traded away
        giving_value = sum(
            pick_value_curve(pick)
            for pick in offer['giving_picks']
        )

        # Calculate value of picks being received
        receiving_value = sum(
            pick_value_curve(pick)
            for pick in offer['receiving_picks']
        )

        # Add value of any players in the trade
        giving_value += sum(
            player['value']
            for player in offer.get('giving_players', [])
        )
        receiving_value += sum(
            player['value']
            for player in offer.get('receiving_players', [])
        )

        # Calculate specific prospect value if draft board known
        if offer['giving_picks']:
            top_pick_given = min(offer['giving_picks'])
            prospect_at_pick = draft_board.iloc[top_pick_given - 1]
            specific_value_lost = prospect_at_pick['predicted_ws']
        else:
            specific_value_lost = 0

        evaluations.append({
            'offer': offer,
            'giving_value': giving_value,
            'receiving_value': receiving_value,
            'value_delta': receiving_value - giving_value,
            'specific_prospect_lost': prospect_at_pick.get('name', 'N/A'),
            'recommendation': 'Accept' if receiving_value > giving_value else 'Decline'
        })

    return evaluations

Summary

Draft modeling represents one of the most challenging and impactful applications of basketball analytics. By combining statistical translation, physical profiling, and probabilistic modeling, teams can systematically identify value that pure scouting might miss.

Key principles to remember:

Context is everything: Raw statistics without adjustment for pace, competition, and age are nearly meaningless.
Uncertainty is inherent: Even the best models have significant uncertainty. Embrace probability distributions rather than point estimates.
Multiple models for multiple questions: Different target variables (value, bust risk, All-Star probability) may require different modeling approaches.
Validation is essential: Rigorous backtesting reveals both model strengths and systematic biases.
Combine quantitative and qualitative: The best draft processes integrate statistical models with traditional scouting.
Update continuously: Draft evaluation is a dynamic process that should incorporate new information as it becomes available.

The draft is ultimately about identifying which players will become valuable NBA contributors. Statistical models cannot eliminate the uncertainty inherent in projecting young players, but they can systematically improve decision-making by identifying patterns that the human eye might miss.

In the next chapter, we will explore player development modeling, examining how to project improvement curves and identify which aspects of a player's game are most likely to develop at the NBA level.