Win Probability Models

Beginner 10 min read 1 views Nov 27, 2025
# Win Probability in Basketball

Win probability models estimate the likelihood that a team wins given the current game state. They combine score differential, time remaining, possession, and other contextual factors to produce a real-time win expectancy throughout a game.

## Core Components of Win Probability Models

### Key Factors

1. **Score Differential**: Current point margin between the teams
2. **Time Remaining**: Seconds left in the game or quarter
3. **Possession**: Which team has the ball
4. **Game Location**: Home court advantage
5. **Team Strength**: Pre-game ratings or odds
6. **Game Context**: Fouls remaining, timeouts available

### Basic Win Probability Formula

The fundamental win probability can be modeled using logistic regression:

```
P(Win) = 1 / (1 + e^(-z))

where z = β₀ + β₁(ScoreDiff) + β₂(TimeRemaining) + β₃(Possession) + β₄(HomeCourt) + ...
```

### Advanced State Variables

- **Expected Possessions Remaining**: `EPR = TimeRemaining / AvgPossessionLength`
- **Points Per Possession**: Team offensive efficiency
- **Possessions Needed**: `PN = (OpponentScore - YourScore) / YourPPP`
- **Win Probability Leverage**: How much WP changes with each possession

## Building a Win Probability Model

### Python Implementation - Logistic Regression Model

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


class BasketballWinProbability:
    def __init__(self):
        self.model = LogisticRegression(max_iter=1000)
        self.scaler = StandardScaler()

    def prepare_features(self, df):
        """
        Prepare features for win probability model

        Parameters:
        df: DataFrame with columns [score_diff, time_remaining, possession,
            home_team, team_strength_diff]
        """
        # Create derived features (2880 s = 48-minute regulation game)
        df['seconds_remaining'] = df['time_remaining']
        df['score_per_second'] = df['score_diff'] / (2880 - df['seconds_remaining'] + 1)
        # 24 s is the shot clock, used here as a rough possession length
        df['possessions_remaining'] = df['seconds_remaining'] / 24
        df['points_needed_per_poss'] = -df['score_diff'] / (df['possessions_remaining'] + 0.1)

        # Interaction terms
        df['score_time_interaction'] = df['score_diff'] * np.log(df['seconds_remaining'] + 1)
        df['possession_score'] = df['possession'] * df['score_diff']

        # Critical time indicators (cast to int so every feature is numeric)
        df['is_clutch'] = ((df['seconds_remaining'] <= 300) & (df['score_diff'].abs() <= 5)).astype(int)
        df['final_minute'] = (df['seconds_remaining'] <= 60).astype(int)

        feature_cols = [
            'score_diff', 'seconds_remaining', 'possession', 'home_team',
            'team_strength_diff', 'score_per_second', 'possessions_remaining',
            'points_needed_per_poss', 'score_time_interaction',
            'possession_score', 'is_clutch', 'final_minute'
        ]

        return df[feature_cols]

    def train(self, X, y):
        """
        Train the win probability model

        Parameters:
        X: Feature matrix
        y: Binary outcome (1 = home team won, 0 = away team won)
        """
        X_scaled = self.scaler.fit_transform(X)
        self.model.fit(X_scaled, y)

        # Calculate training accuracy
        train_accuracy = self.model.score(X_scaled, y)
        print(f"Training Accuracy: {train_accuracy:.4f}")

        return self

    def predict_win_probability(self, score_diff, time_remaining,
                                possession=0, home_team=1, team_strength_diff=0):
        """
        Predict win probability for a given game state

        Parameters:
        score_diff: Point differential (positive = home team leading)
        time_remaining: Seconds remaining in game
        possession: 1 if home team has possession, 0 otherwise
        home_team: 1 for home team perspective, 0 for away
        team_strength_diff: Pre-game rating difference

        Returns:
        Win probability (0 to 1)
        """
        # Build a one-row frame and reuse prepare_features so that training
        # and prediction always derive the features in the same way
        state = pd.DataFrame([{
            'score_diff': score_diff,
            'time_remaining': time_remaining,
            'possession': possession,
            'home_team': home_team,
            'team_strength_diff': team_strength_diff
        }])
        features = self.prepare_features(state)

        features_scaled = self.scaler.transform(features)
        win_prob = self.model.predict_proba(features_scaled)[0][1]

        return win_prob

    def calculate_leverage(self, score_diff, time_remaining,
                           possession=0, home_team=1, team_strength_diff=0):
        """
        Calculate win probability leverage (how much WP changes with score)
        """
        current_wp = self.predict_win_probability(
            score_diff, time_remaining, possession, home_team, team_strength_diff
        )

        # WP if home team scores 2 points; clamp the clock at zero so the
        # log-based feature never sees a negative time
        wp_if_score = self.predict_win_probability(
            score_diff + 2, max(time_remaining - 24, 0), 1 - possession,
            home_team, team_strength_diff
        )

        leverage = abs(wp_if_score - current_wp)
        return leverage


# Example usage with simulated data
def generate_sample_data(n_samples=10000):
    """Generate simulated basketball game states"""
    np.random.seed(42)

    data = []
    for _ in range(n_samples):
        # Random game state
        time_remaining = np.random.randint(0, 2880)

        # Score differential: narrower range in the last five minutes
        if time_remaining < 300:
            score_diff = np.random.randint(-15, 16)
        else:
            score_diff = np.random.randint(-30, 31)

        possession = np.random.choice([0, 1])
        home_team = 1
        team_strength_diff = np.random.normal(0, 3)

        # Simulate the outcome with a toy logistic rule (illustrative only,
        # not calibrated to real game data)
        base_prob = 1 / (1 + np.exp(-(0.15 * score_diff +
                                      0.001 * time_remaining * np.sign(score_diff))))
        home_won = np.random.random() < base_prob

        data.append({
            'score_diff': score_diff,
            'time_remaining': time_remaining,
            'possession': possession,
            'home_team': home_team,
            'team_strength_diff': team_strength_diff,
            'home_won': int(home_won)
        })

    return pd.DataFrame(data)


# Train model
df = generate_sample_data(10000)
wp_model = BasketballWinProbability()

X = wp_model.prepare_features(df.copy())
y = df['home_won']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
wp_model.train(X_train, y_train)

# Evaluate on test set
X_test_scaled = wp_model.scaler.transform(X_test)
test_accuracy = wp_model.model.score(X_test_scaled, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Example prediction
win_prob = wp_model.predict_win_probability(
    score_diff=5,
    time_remaining=120,
    possession=1,
    home_team=1,
    team_strength_diff=2.5
)
print(f"\nWin Probability (up 5, 2 min left, with ball): {win_prob:.1%}")

# Calculate leverage
leverage = wp_model.calculate_leverage(
    score_diff=2,
    time_remaining=60,
    possession=1
)
print(f"Win Probability Leverage (up 2, 1 min left): {leverage:.1%}")
```

### R Implementation - GAM-based Win Probability

```r
library(mgcv)
library(dplyr)
library(ggplot2)

# Win Probability Model using Generalized Additive Model
build_win_prob_gam <- function(game_data) {
  # Prepare features
  game_data <- game_data %>%
    mutate(
      possessions_remaining = time_remaining / 24,
      score_per_minute = score_diff / ((2880 - time_remaining) / 60 + 0.1),
      critical_time = as.numeric(time_remaining <= 300 & abs(score_diff) <= 5)
    )

  # Build GAM model with smooth terms
  model <- gam(
    home_won ~ s(score_diff, k = 20) +
      s(time_remaining, k = 20) +
      # Interaction: ti() excludes the main effects already in the model
      ti(score_diff, time_remaining) +
      possession +
      home_court +
      team_strength_diff +
      critical_time,
    data = game_data,
    family = binomial(link = "logit")
  )

  return(model)
}

# Predict win probability
predict_win_prob <- function(model, score_diff, time_remaining,
                             possession = 0, home_court = 1,
                             team_strength_diff = 0) {
  possessions_remaining <- time_remaining / 24
  score_per_minute <- score_diff / ((2880 - time_remaining) / 60 + 0.1)
  critical_time <- as.numeric(time_remaining <= 300 & abs(score_diff) <= 5)

  new_data <- data.frame(
    score_diff = score_diff,
    time_remaining = time_remaining,
    possession = possession,
    home_court = home_court,
    team_strength_diff = team_strength_diff,
    possessions_remaining = possessions_remaining,
    score_per_minute = score_per_minute,
    critical_time = critical_time
  )

  win_prob <- predict(model, new_data, type = "response")
  return(win_prob)
}

# Generate win probability chart for a game
generate_wp_chart <- function(play_by_play_data, model) {
  # Calculate win probability for each play (predict() is vectorized,
  # so no row-wise loop is needed)
  pbp_with_wp <- play_by_play_data %>%
    mutate(
      win_prob = as.numeric(predict_win_prob(
        model, score_diff, time_remaining,
        possession, home_court, team_strength_diff
      ))
    )

  # Create visualization
  ggplot(pbp_with_wp, aes(x = 2880 - time_remaining, y = win_prob)) +
    geom_line(linewidth = 1.2, color = "#1f77b4") +
    geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
    scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
    scale_x_continuous(breaks = seq(0, 2880, 720),
                       labels = c("0", "Q1", "Q2", "Q3", "Q4")) +
    labs(
      title = "Live Win Probability",
      x = "Game Time",
      y = "Home Team Win Probability"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      axis.title = element_text(size = 12)
    )
}

# Example usage
set.seed(42)

# Simulate game data
n <- 5000
game_data <- data.frame(
  score_diff = sample(-25:25, n, replace = TRUE),
  time_remaining = sample(0:2880, n, replace = TRUE),
  possession = sample(0:1, n, replace = TRUE),
  home_court = 1,
  team_strength_diff = rnorm(n, 0, 3)
)

# Simulate outcomes with a toy logistic rule (illustrative only)
game_data$home_won <- rbinom(n, 1,
  plogis(0.12 * game_data$score_diff + 0.0005 * game_data$time_remaining))

# Train model
wp_model <- build_win_prob_gam(game_data)

# Predict specific scenario
win_prob <- predict_win_prob(
  wp_model,
  score_diff = 7,
  time_remaining = 180,
  possession = 1,
  team_strength_diff = 3.5
)

cat(sprintf("Win Probability (up 7, 3 min left, with ball): %.1f%%\n", win_prob * 100))
```

## Expected Points Added (EPA) and Win Probability Added (WPA)

### Win Probability Added (WPA)

WPA measures the change in win probability from one play to the next:

```
WPA = WP_after - WP_before
```

### Python Implementation - WPA Calculator

```python
import numpy as np
import pandas as pd


class BasketballWPA:
    def __init__(self, win_prob_model):
        self.wp_model = win_prob_model

    def calculate_wpa(self, play_before, play_after):
        """
        Calculate Win Probability Added for a play

        Parameters:
        play_before: dict with game state before play
        play_after: dict with game state after play

        Returns:
        WPA value (positive = increased win probability)
        """
        wp_before = self.wp_model.predict_win_probability(
            score_diff=play_before['score_diff'],
            time_remaining=play_before['time_remaining'],
            possession=play_before['possession'],
            home_team=play_before.get('home_team', 1),
            team_strength_diff=play_before.get('team_strength_diff', 0)
        )

        wp_after = self.wp_model.predict_win_probability(
            score_diff=play_after['score_diff'],
            time_remaining=play_after['time_remaining'],
            possession=play_after['possession'],
            home_team=play_after.get('home_team', 1),
            team_strength_diff=play_after.get('team_strength_diff', 0)
        )

        wpa = wp_after - wp_before
        return wpa

    def calculate_game_wpa(self, play_by_play_df):
        """
        Calculate WPA for all plays in a game

        Parameters:
        play_by_play_df: DataFrame with play-by-play data

        Returns:
        DataFrame with WPA for each play
        """
        wpa_values = []

        # WPA of play i is the WP change between row i and row i + 1
        for i in range(len(play_by_play_df) - 1):
            play_before = play_by_play_df.iloc[i].to_dict()
            play_after = play_by_play_df.iloc[i + 1].to_dict()

            wpa = self.calculate_wpa(play_before, play_after)
            wpa_values.append(wpa)

        # Last play has no following state, so its WPA is set to 0
        wpa_values.append(0)

        play_by_play_df['WPA'] = wpa_values
        return play_by_play_df

    def identify_clutch_plays(self, play_by_play_df, threshold=0.10):
        """
        Identify clutch plays (plays with high WPA impact)

        Parameters:
        threshold: Minimum absolute WPA to be considered clutch

        Returns:
        DataFrame of clutch plays
        """
        play_by_play_df = self.calculate_game_wpa(play_by_play_df)
        clutch_plays = play_by_play_df[
            play_by_play_df['WPA'].abs() >= threshold
        ].copy()

        clutch_plays = clutch_plays.sort_values('WPA', ascending=False)
        return clutch_plays

    def player_wpa_summary(self, play_by_play_df):
        """
        Calculate total WPA by player
        """
        play_by_play_df = self.calculate_game_wpa(play_by_play_df)

        player_wpa = play_by_play_df.groupby('player').agg({
            'WPA': ['sum', 'mean', 'count'],
            'time_remaining': 'mean'
        }).round(4)

        player_wpa.columns = ['Total_WPA', 'Avg_WPA', 'Plays', 'Avg_Time_Remaining']
        player_wpa = player_wpa.sort_values('Total_WPA', ascending=False)

        return player_wpa


# Example: Calculate WPA for a clutch shot
# Reuse the trained wp_model from the win probability section; a freshly
# constructed BasketballWinProbability() is unfitted and cannot predict
wpa_calculator = BasketballWPA(wp_model)

# Game state before clutch three-pointer
before_play = {
    'score_diff': -3,       # Down 3
    'time_remaining': 15,   # 15 seconds left
    'possession': 1,        # Team has ball
    'home_team': 1,
    'team_strength_diff': 0
}

# Game state after making the three
after_play = {
    'score_diff': 0,        # Tied
    'time_remaining': 12,   # 12 seconds left
    'possession': 0,        # Other team's ball
    'home_team': 1,
    'team_strength_diff': 0
}

wpa = wpa_calculator.calculate_wpa(before_play, after_play)
print(f"WPA of clutch three-pointer: {wpa:+.3f} ({wpa*100:+.1f}%)")

# Create sample play-by-play data
pbp_data = pd.DataFrame({
    'play_id': range(1, 11),
    'player': ['Player A', 'Player B', 'Player A', 'Player C', 'Player B',
               'Player A', 'Player D', 'Player B', 'Player A', 'Player C'],
    'score_diff': [0, 2, 2, 0, 0, -2, -2, -1, 1, 1],
    'time_remaining': [300, 280, 260, 240, 220, 200, 180, 160, 140, 120],
    'possession': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'home_team': [1] * 10,
    'team_strength_diff': [0] * 10
})

# Calculate WPA for all plays
pbp_with_wpa = wpa_calculator.calculate_game_wpa(pbp_data)
print("\nPlay-by-Play with WPA:")
print(pbp_with_wpa[['player', 'score_diff', 'time_remaining', 'WPA']])

# Player WPA summary
player_summary = wpa_calculator.player_wpa_summary(pbp_data)
print("\nPlayer WPA Summary:")
print(player_summary)
```

### Expected Points Added (EPA)

EPA measures the value of a play as the change in expected points:

```
EPA = ExpectedPoints_after - ExpectedPoints_before
```

For a play that ends the possession, the points actually scored take the place of the "after" value, so the implementation in this section uses `EPA = PointsScored - ExpectedPoints_before`.

### Python Implementation - EPA Model

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class BasketballEPA:
    def __init__(self):
        self.model = RandomForestRegressor(n_estimators=100, random_state=42)

    def prepare_features(self, df):
        """Prepare features for expected points model"""
        features = df[['score_diff', 'time_remaining', 'possession_number',
                       'shot_clock', 'court_zone', 'defender_distance']].copy()
        return features

    def train(self, X, y):
        """
        Train expected points model

        Parameters:
        X: Features (game state)
        y: Points scored on possession (0, 1, 2, 3, etc.)
        """
        self.model.fit(X, y)
        return self

    def predict_expected_points(self, game_state):
        """Predict expected points for a possession"""
        expected_points = self.model.predict([game_state])[0]
        return expected_points

    def calculate_epa(self, state_before, state_after, points_scored):
        """
        Calculate EPA for a possession-ending play

        Parameters:
        state_before: Game state at start of possession
        state_after: Game state at end of possession (kept for interface
            symmetry; not used by this possession-level definition)
        points_scored: Actual points scored

        Returns:
        EPA value
        """
        ep_before = self.predict_expected_points(state_before)

        # Realized points minus what the model expected before the play
        epa = points_scored - ep_before
        return epa


# Example usage
epa_model = BasketballEPA()

# Simulate training data (random noise, only so that the example runs)
n_samples = 5000
X_train = np.random.randn(n_samples, 6)  # 6 features
y_train = np.random.choice([0, 1, 2, 3], n_samples, p=[0.3, 0.1, 0.45, 0.15])

epa_model.train(X_train, y_train)

# Calculate EPA for a specific play
state_before = [2, 600, 45, 18, 1, 3]  # score_diff, time, poss#, shot_clock, zone, dist
state_after = [4, 575, 46, 24, 2, 5]
points_scored = 2

epa = epa_model.calculate_epa(state_before, state_after, points_scored)
print(f"EPA for the play: {epa:.3f}")
```

## Clutch Performance Measurement

### Clutch Situations Definition

Clutch situations are typically defined as:

- Last 5 minutes of the 4th quarter or overtime
- Score differential within 5 points

### Python Implementation - Clutch Metrics

```python
import pandas as pd
import numpy as np


class ClutchPerformanceAnalyzer:
    def __init__(self, wpa_calculator=None):
        self.wpa_calculator = wpa_calculator

    def identify_clutch_situations(self, play_by_play_df,
                                   time_threshold=300, score_threshold=5):
        """
        Identify clutch situations in play-by-play data

        Parameters:
        time_threshold: Seconds remaining (default 300 = 5 minutes)
        score_threshold: Max point differential (default 5)
        """
        clutch_df = play_by_play_df[
            (play_by_play_df['time_remaining'] <= time_threshold) &
            (play_by_play_df['score_diff'].abs() <= score_threshold) &
            (play_by_play_df['period'] >= 4)
        ].copy()

        return clutch_df

    def calculate_clutch_stats(self, player_plays_df):
        """
        Calculate clutch performance statistics for a player

        Parameters:
        player_plays_df: DataFrame with player's plays in clutch situations

        Returns:
        Dictionary with clutch statistics
        """
        def pct(made_col, attempted_col):
            # Shooting percentage, or 0 when there were no attempts
            attempts = player_plays_df[attempted_col].sum()
            return player_plays_df[made_col].sum() / attempts if attempts > 0 else 0

        stats = {
            'clutch_plays': len(player_plays_df),
            'clutch_points': player_plays_df['points'].sum(),
            'clutch_fg_pct': pct('fg_made', 'fg_attempted'),
            'clutch_three_pct': pct('three_made', 'three_attempted'),
            'clutch_ft_pct': pct('ft_made', 'ft_attempted'),
        }

        # Add WPA whenever the column is present
        if 'WPA' in player_plays_df.columns:
            stats['total_clutch_wpa'] = player_plays_df['WPA'].sum()
            stats['avg_clutch_wpa'] = player_plays_df['WPA'].mean()
            stats['max_clutch_wpa'] = player_plays_df['WPA'].max()

        return stats

    def clutch_rating(self, player_stats, league_avg_stats):
        """
        Calculate clutch rating comparing player to league average

        Clutch Rating = (Player Clutch Stats / Player Regular Stats) /
                        (League Clutch Stats / League Regular Stats) * 100
        """
        player_clutch_ratio = player_stats['clutch_fg_pct'] / player_stats['regular_fg_pct']
        league_clutch_ratio = league_avg_stats['clutch_fg_pct'] / league_avg_stats['regular_fg_pct']

        clutch_rating = (player_clutch_ratio / league_clutch_ratio) * 100
        return clutch_rating

    def leverage_index(self, play_by_play_df):
        """
        Calculate Leverage Index for each play

        LI measures how impactful a play could be
        """
        if self.wpa_calculator is None:
            raise ValueError("WPA calculator required for leverage index")

        leverage_values = []

        for idx, row in play_by_play_df.iterrows():
            # Calculate potential WPA swing
            current_state = row.to_dict()
            # Clamp the clock at zero so late plays never get a negative time
            next_time = max(current_state['time_remaining'] - 24, 0)

            # Simulate successful outcome
            success_state = current_state.copy()
            success_state['score_diff'] += 2
            success_state['time_remaining'] = next_time
            success_state['possession'] = 1 - success_state['possession']

            # Simulate failed outcome
            fail_state = current_state.copy()
            fail_state['time_remaining'] = next_time
            fail_state['possession'] = 1 - fail_state['possession']

            wpa_success = self.wpa_calculator.calculate_wpa(current_state, success_state)
            wpa_fail = self.wpa_calculator.calculate_wpa(current_state, fail_state)

            # Leverage is the potential WPA swing
            leverage = abs(wpa_success - wpa_fail)
            leverage_values.append(leverage)

        play_by_play_df['leverage_index'] = leverage_values

        # Normalize so the average over the supplied plays equals 1.0
        avg_leverage = np.mean(leverage_values)
        if avg_leverage > 0:
            play_by_play_df['leverage_index'] = play_by_play_df['leverage_index'] / avg_leverage

        return play_by_play_df

    def clutch_wpa_per_48(self, player_clutch_plays):
        """Calculate clutch WPA per 48 minutes"""
        total_wpa = player_clutch_plays['WPA'].sum()
        total_minutes = player_clutch_plays['minutes_played'].sum()

        wpa_per_48 = (total_wpa / total_minutes) * 48 if total_minutes > 0 else 0
        return wpa_per_48


# Example: Analyze clutch performance
clutch_analyzer = ClutchPerformanceAnalyzer()

# Sample play-by-play data
pbp_df = pd.DataFrame({
    'player': ['LeBron James', 'Stephen Curry', 'LeBron James', 'Kevin Durant'],
    'period': [4, 4, 4, 4],
    'time_remaining': [180, 120, 90, 45],
    'score_diff': [-2, 3, 1, -1],
    'points': [2, 3, 0, 2],
    'fg_made': [1, 1, 0, 1],
    'fg_attempted': [1, 1, 1, 1],
    'three_made': [0, 1, 0, 0],
    'three_attempted': [0, 1, 0, 0],
    'ft_made': [0, 0, 0, 0],
    'ft_attempted': [0, 0, 0, 0],
    'possession': [1, 1, 1, 1],
    'minutes_played': [3, 2, 1.5, 0.75],
    'WPA': [0.08, 0.15, -0.05, 0.12]
})

# Identify clutch situations
clutch_plays = clutch_analyzer.identify_clutch_situations(pbp_df)
print("Clutch Plays:")
print(clutch_plays[['player', 'time_remaining', 'score_diff', 'points', 'WPA']])

# Calculate clutch stats for LeBron
lebron_clutch = clutch_plays[clutch_plays['player'] == 'LeBron James']
lebron_stats = clutch_analyzer.calculate_clutch_stats(lebron_clutch)

print("\nLeBron James Clutch Stats:")
for key, value in lebron_stats.items():
    print(f"  {key}: {value:.3f}")
```

### R Implementation - Clutch Performance Analysis

```r
library(dplyr)
library(ggplot2)

# Clutch Performance Analyzer
analyze_clutch_performance <- function(play_by_play,
                                       time_threshold = 300,
                                       score_threshold = 5) {
  # Identify clutch situations
  clutch_data <- play_by_play %>%
    filter(
      time_remaining <= time_threshold,
      abs(score_diff) <= score_threshold,
      period >= 4
    )

  # Calculate clutch statistics by player
  clutch_stats <- clutch_data %>%
    group_by(player) %>%
    summarise(
      clutch_plays = n(),
      clutch_points = sum(points, na.rm = TRUE),
      clutch_fgm = sum(fg_made, na.rm = TRUE),
      clutch_fga = sum(fg_attempted, na.rm = TRUE),
      clutch_fg_pct = clutch_fgm / clutch_fga,
      clutch_3pm = sum(three_made, na.rm = TRUE),
      clutch_3pa = sum(three_attempted, na.rm = TRUE),
      clutch_3p_pct = clutch_3pm / clutch_3pa,
      total_wpa = sum(WPA, na.rm = TRUE),
      avg_wpa = mean(WPA, na.rm = TRUE),
      max_wpa = max(WPA, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    arrange(desc(total_wpa))

  return(clutch_stats)
}

# Calculate Clutch Rating
calculate_clutch_rating <- function(player_data) {
  player_data %>%
    mutate(
      clutch_ratio = clutch_fg_pct / regular_fg_pct,
      clutch_rating = (clutch_ratio / mean(clutch_ratio, na.rm = TRUE)) * 100
    )
}

# Visualize clutch performance
plot_clutch_performance <- function(clutch_stats, top_n = 10) {
  # slice_max() is the current dplyr replacement for top_n()
  top_clutch <- clutch_stats %>%
    slice_max(total_wpa, n = top_n)

  ggplot(top_clutch, aes(x = reorder(player, total_wpa), y = total_wpa)) +
    geom_col(fill = "#d62728", alpha = 0.8) +
    coord_flip() +
    labs(
      title = "Top Clutch Performers by Total WPA",
      x = "Player",
      y = "Total Win Probability Added (Clutch)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 14, face = "bold"),
      axis.text = element_text(size = 10)
    )
}

# Example usage (columns are sampled independently, so individual rows are
# not internally consistent box-score lines)
set.seed(42)
pbp_example <- data.frame(
  player = sample(c("Player A", "Player B", "Player C"), 100, replace = TRUE),
  period = sample(4:5, 100, replace = TRUE),
  time_remaining = sample(0:300, 100, replace = TRUE),
  score_diff = sample(-5:5, 100, replace = TRUE),
  points = sample(c(0, 1, 2, 3), 100, replace = TRUE, prob = c(0.4, 0.1, 0.35, 0.15)),
  fg_made = sample(0:1, 100, replace = TRUE),
  fg_attempted = 1,
  three_made = sample(0:1, 100, replace = TRUE, prob = c(0.7, 0.3)),
  three_attempted = sample(0:1, 100, replace = TRUE, prob = c(0.5, 0.5)),
  WPA = rnorm(100, 0, 0.05)
)

clutch_results <- analyze_clutch_performance(pbp_example)
print(clutch_results)
```

## Live Win Probability Dashboard

### Python Implementation - Real-time WP Tracker

```python
import matplotlib.pyplot as plt
import numpy as np


class LiveWinProbabilityTracker:
    def __init__(self, wp_model):
        self.wp_model = wp_model
        self.game_states = []
        self.win_probs = []

    def update_game_state(self, score_home, score_away, time_remaining,
                          possession, team_strength_diff=0):
        """Update with new game state"""
        score_diff = score_home - score_away

        wp = self.wp_model.predict_win_probability(
            score_diff=score_diff,
            time_remaining=time_remaining,
            possession=possession,
            home_team=1,
            team_strength_diff=team_strength_diff
        )

        self.game_states.append({
            'score_home': score_home,
            'score_away': score_away,
            'time_remaining': time_remaining,
            'possession': possession,
            'elapsed_time': 2880 - time_remaining
        })
        self.win_probs.append(wp)

        return wp

    def plot_live_wp(self):
        """Generate live win probability chart"""
        if not self.game_states:
            return

        elapsed_times = [s['elapsed_time'] for s in self.game_states]

        plt.figure(figsize=(12, 6))
        plt.plot(elapsed_times, self.win_probs, linewidth=2.5, color='#1f77b4')
        plt.fill_between(elapsed_times, self.win_probs, 0.5, alpha=0.3, color='#1f77b4')

        plt.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
        plt.ylim(0, 1)
        plt.xlim(0, 2880)

        # Quarter markers
        for q in [720, 1440, 2160, 2880]:
            plt.axvline(x=q, color='lightgray', linestyle=':', alpha=0.5)

        plt.xlabel('Game Time (seconds)', fontsize=12)
        plt.ylabel('Home Team Win Probability', fontsize=12)
        plt.title('Live Win Probability Chart', fontsize=14, fontweight='bold')

        # Format y-axis as percentage
        plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))

        # Add current score and WP
        current_state = self.game_states[-1]
        current_wp = self.win_probs[-1]

        score_text = f"Score: {current_state['score_home']}-{current_state['score_away']}\n"
        score_text += f"Win Prob: {current_wp:.1%}"

        plt.text(0.02, 0.98, score_text, transform=plt.gca().transAxes,
                 fontsize=11, verticalalignment='top',
                 bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()

    def get_key_moments(self, threshold=0.10):
        """Identify key moments (large WP swings)"""
        key_moments = []

        for i in range(1, len(self.win_probs)):
            wp_change = abs(self.win_probs[i] - self.win_probs[i-1])

            if wp_change >= threshold:
                key_moments.append({
                    'play_num': i,
                    'time': self.game_states[i]['elapsed_time'],
                    'wp_before': self.win_probs[i-1],
                    'wp_after': self.win_probs[i],
                    'wp_change': self.win_probs[i] - self.win_probs[i-1],
                    'score': f"{self.game_states[i]['score_home']}-{self.game_states[i]['score_away']}"
                })

        return sorted(key_moments, key=lambda x: abs(x['wp_change']), reverse=True)


# Example: Simulate a game with live WP tracking
# Reuse the trained wp_model from the win probability section; a freshly
# constructed BasketballWinProbability() is unfitted and cannot predict
tracker = LiveWinProbabilityTracker(wp_model)

# Simulate game progression:
# (home score, away score, seconds remaining, 1 if home has the ball next)
game_events = [
    (0, 0, 2880, 0),      # Start
    (2, 0, 2856, 0),      # Home scores
    (2, 3, 2832, 1),      # Away scores 3
    (5, 3, 2808, 0),      # Home scores 3
    (7, 3, 2784, 0),      # Home scores 2
    (7, 6, 2760, 1),      # Away scores 3
    # ... more events ...
    (98, 95, 120, 1),     # Close game, 2 min left
    (100, 95, 96, 0),     # Home pulls ahead
    (100, 98, 72, 1),     # Away scores 3
    (103, 98, 24, 0),     # Home scores 3
    (103, 100, 8, 1),     # Away scores 2
    (103, 100, 0, 0),     # Final
]

print("Live Win Probability Updates:\n")
for score_h, score_a, time, poss in game_events:
    wp = tracker.update_game_state(score_h, score_a, time, poss)
    time_formatted = f"{time//60}:{time%60:02d}"
    print(f"Time {time_formatted} | Score {score_h}-{score_a} | WP: {wp:.1%}")

# Plot the game
tracker.plot_live_wp()

# Get key moments (times are elapsed game time)
key_moments = tracker.get_key_moments(threshold=0.08)
print("\n\nTop 5 Key Moments:")
for i, moment in enumerate(key_moments[:5], 1):
    print(f"{i}. Elapsed {moment['time']//60}:{moment['time']%60:02d} | "
          f"Score {moment['score']} | WP Change: {moment['wp_change']:+.1%}")
```

## Summary

Win probability models in basketball provide real-time insights into game dynamics by:

1. **Modeling win likelihood** based on score, time, possession, and context
2. **Calculating WPA** to measure play impact on winning chances
3. **Computing EPA** to evaluate possession efficiency
4. **Identifying clutch moments** using leverage and WP swings
5. **Measuring player performance** in high-pressure situations

These metrics enhance game analysis, player evaluation, and strategic decision-making.

### Key Formulas

- **Win Probability**: `P(Win) = 1 / (1 + e^(-z))`, where z is a linear combination of features
- **WPA**: `WP_after - WP_before`
- **EPA**: `Points_scored - ExpectedPoints_before`
- **Leverage Index**: Potential WP swing on a play
- **Clutch Rating**: Player performance ratio vs league average in clutch situations

### Applications

- **Broadcasting**: Live win probability graphics
- **Coaching**: Identifying critical game moments
- **Player Evaluation**: Clutch performance metrics
- **Betting**: In-game probability updates
- **Strategy**: Understanding high-leverage situations
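
The key formulas listed in this summary can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the coefficients in `BETA` are invented for the example rather than fitted to data, the time-remaining term is left out to keep `z` short, and the helper names (`win_probability`, `wpa`, `epa`, `clutch_rating`) are hypothetical, not part of any library.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data)
BETA = {"intercept": 0.0, "score_diff": 0.15, "possession": 0.10, "home_court": 0.20}


def win_probability(score_diff, possession, home_court):
    """Logistic win probability: P(Win) = 1 / (1 + e^(-z))."""
    z = (BETA["intercept"]
         + BETA["score_diff"] * score_diff
         + BETA["possession"] * possession
         + BETA["home_court"] * home_court)
    return 1.0 / (1.0 + math.exp(-z))


def wpa(wp_before, wp_after):
    """Win Probability Added: WP_after - WP_before."""
    return wp_after - wp_before


def epa(points_scored, expected_points_before):
    """Expected Points Added: Points_scored - ExpectedPoints_before."""
    return points_scored - expected_points_before


def clutch_rating(player_clutch, player_regular, league_clutch, league_regular):
    """Player clutch/regular ratio relative to the league ratio, scaled to 100."""
    return (player_clutch / player_regular) / (league_clutch / league_regular) * 100


# Home team trails by 3 with the ball, hits a three, and gives up possession
wp_before = win_probability(score_diff=-3, possession=1, home_court=1)
wp_after = win_probability(score_diff=0, possession=0, home_court=1)

print(f"WP before: {wp_before:.3f}, WP after: {wp_after:.3f}")
print(f"WPA: {wpa(wp_before, wp_after):+.3f}")
print(f"EPA: {epa(3, 1.1):+.2f}")  # 1.1 expected points is an assumed value
print(f"Clutch rating: {clutch_rating(0.48, 0.45, 0.43, 0.46):.1f}")
```

A clutch rating of 100 means the player's shooting changes in clutch time by the same proportion as the league's; values above 100 indicate a smaller drop-off (or a larger improvement) than average.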
