Win Probability Models
Nov 27, 2025
# Win Probability in Basketball
Win probability models estimate the likelihood of a team winning based on current game state. These models incorporate score differential, time remaining, possession, and other contextual factors to provide real-time win expectancy throughout a game.
## Core Components of Win Probability Models
### Key Factors
1. **Score Differential**: Current point margin between the teams
2. **Time Remaining**: Seconds left in the game or quarter
3. **Possession**: Which team has the ball
4. **Game Location**: Home court advantage
5. **Team Strength**: Pre-game ratings or odds
6. **Game Context**: Fouls remaining, timeouts available
### Basic Win Probability Formula
The fundamental win probability can be modeled using logistic regression:
```
P(Win) = 1 / (1 + e^(-z))
where z = β₀ + β₁(ScoreDiff) + β₂(TimeRemaining) + β₃(Possession) + β₄(HomeCourt) + ...
```
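To make the formula concrete, here is a minimal sketch that evaluates it for two game states. The coefficient values are hypothetical and chosen only to illustrate the arithmetic; they are not fitted to any data, and the helper name `win_probability` is introduced just for this example.

```python
import math


def win_probability(score_diff, seconds_remaining, possession, home_court, coefs):
    """Logistic win probability: P(Win) = 1 / (1 + e^(-z))."""
    z = (
        coefs["intercept"]
        + coefs["score_diff"] * score_diff
        + coefs["seconds_remaining"] * seconds_remaining
        + coefs["possession"] * possession
        + coefs["home_court"] * home_court
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (assumption for illustration only)
example_coefs = {
    "intercept": 0.0,
    "score_diff": 0.15,
    "seconds_remaining": 0.0,
    "possession": 0.10,
    "home_court": 0.20,
}

tied = win_probability(0, 600, 0, 0, example_coefs)
up_five = win_probability(5, 600, 1, 1, example_coefs)
print(f"Tied, no other edge: {tied:.3f}")         # 0.500
print(f"Up 5 at home with the ball: {up_five:.3f}")  # 0.741
```

With z = 0 the model returns exactly 50%, and every positive term in z pushes the probability above that.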
### Advanced State Variables
- **Expected Possessions Remaining**: `EPR = TimeRemaining / AvgPossessionLength`
- **Points Per Possession**: Team offensive efficiency
- **Possessions Needed**: `PN = (OpponentScore - YourScore) / YourPPP` (meaningful only for the trailing team)
- **Win Probability Leverage**: How much WP changes with each possession
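The first and third state variables above can be sketched as small helpers. The 24-second average possession length matches the value used in this article's code; the 1.1 points per possession and the function names are assumptions made for illustration, as is the choice to clamp the result at zero for a team that is already ahead.

```python
def expected_possessions_remaining(seconds_remaining, avg_possession_seconds=24.0):
    """EPR = TimeRemaining / AvgPossessionLength."""
    return seconds_remaining / avg_possession_seconds


def possessions_needed(own_score, opponent_score, points_per_possession):
    """PN = (OpponentScore - YourScore) / YourPPP, clamped at 0 when leading."""
    deficit = opponent_score - own_score
    return max(deficit, 0) / points_per_possession


# Down 6 with two minutes left, scoring 1.1 points per possession (assumed)
epr = expected_possessions_remaining(120)
pn = possessions_needed(own_score=95, opponent_score=101, points_per_possession=1.1)
print(f"Expected possessions remaining: {epr:.1f}")
print(f"Possessions needed to erase the deficit: {pn:.2f}")
```

Comparing the two numbers gives a quick feel for how realistic a comeback is: when the possessions needed approach the possessions remaining, the trailing team needs near-perfect offense and stops on defense.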
## Building a Win Probability Model
### Python Implementation - Logistic Regression Model
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


class BasketballWinProbability:
    def __init__(self):
        self.model = LogisticRegression(max_iter=1000)
        self.scaler = StandardScaler()

    def prepare_features(self, df):
        """
        Prepare features for win probability model

        Parameters:
            df: DataFrame with columns [score_diff, time_remaining, possession,
                home_team, team_strength_diff]
        """
        df = df.copy()  # do not mutate the caller's DataFrame

        # Create derived features
        df['seconds_remaining'] = df['time_remaining']
        df['score_per_second'] = df['score_diff'] / (2880 - df['seconds_remaining'] + 1)
        df['possessions_remaining'] = df['seconds_remaining'] / 24  # avg possession
        df['points_needed_per_poss'] = -df['score_diff'] / (df['possessions_remaining'] + 0.1)

        # Interaction terms
        df['score_time_interaction'] = df['score_diff'] * np.log(df['seconds_remaining'] + 1)
        df['possession_score'] = df['possession'] * df['score_diff']

        # Critical time indicators (stored as 0/1 so every feature is numeric)
        df['is_clutch'] = (
            (df['seconds_remaining'] <= 300) & (df['score_diff'].abs() <= 5)
        ).astype(int)
        df['final_minute'] = (df['seconds_remaining'] <= 60).astype(int)

        feature_cols = [
            'score_diff', 'seconds_remaining', 'possession', 'home_team',
            'team_strength_diff', 'score_per_second', 'possessions_remaining',
            'points_needed_per_poss', 'score_time_interaction', 'possession_score',
            'is_clutch', 'final_minute'
        ]
        return df[feature_cols]

    def train(self, X, y):
        """
        Train the win probability model

        Parameters:
            X: Feature matrix
            y: Binary outcome (1 = home team won, 0 = away team won)
        """
        X_scaled = self.scaler.fit_transform(X)
        self.model.fit(X_scaled, y)

        # Calculate training accuracy
        train_accuracy = self.model.score(X_scaled, y)
        print(f"Training Accuracy: {train_accuracy:.4f}")
        return self

    def predict_win_probability(self, score_diff, time_remaining, possession=0,
                                home_team=1, team_strength_diff=0):
        """
        Predict win probability for a given game state

        Parameters:
            score_diff: Point differential (positive = home team leading)
            time_remaining: Seconds remaining in game
            possession: 1 if home team has possession, 0 otherwise
            home_team: 1 for home team perspective, 0 for away
            team_strength_diff: Pre-game rating difference

        Returns:
            Win probability (0 to 1)
        """
        # Build a one-row frame and reuse prepare_features so that training
        # and prediction always compute the features the same way
        state = pd.DataFrame([{
            'score_diff': score_diff,
            'time_remaining': time_remaining,
            'possession': possession,
            'home_team': home_team,
            'team_strength_diff': team_strength_diff,
        }])
        features_scaled = self.scaler.transform(self.prepare_features(state))
        return float(self.model.predict_proba(features_scaled)[0, 1])

    def calculate_leverage(self, score_diff, time_remaining, possession=0,
                           home_team=1, team_strength_diff=0):
        """
        Calculate win probability leverage (how much WP changes with score)
        """
        current_wp = self.predict_win_probability(
            score_diff, time_remaining, possession, home_team, team_strength_diff
        )
        # WP if home team scores 2 points on a 24-second possession
        # (time is clamped at 0 so the log feature stays defined)
        wp_if_score = self.predict_win_probability(
            score_diff + 2, max(time_remaining - 24, 0), 1 - possession,
            home_team, team_strength_diff
        )
        return abs(wp_if_score - current_wp)


# Example usage with simulated data
def generate_sample_data(n_samples=10000):
    """Generate simulated basketball game states"""
    np.random.seed(42)
    data = []
    for _ in range(n_samples):
        # Random game state
        time_remaining = np.random.randint(0, 2880)
        # Score differential: restricted to a narrower range late in the game
        if time_remaining < 300:
            score_diff = np.random.randint(-15, 16)
        else:
            score_diff = np.random.randint(-30, 31)
        possession = np.random.choice([0, 1])
        home_team = 1
        team_strength_diff = np.random.normal(0, 3)

        # Simulate the outcome from a toy logistic model (illustrative only,
        # not calibrated to real games): a lead counts for more as time runs out
        time_weight = np.sqrt(2880 / (time_remaining + 60))
        base_prob = 1 / (1 + np.exp(-0.15 * score_diff * time_weight))
        home_won = np.random.random() < base_prob

        data.append({
            'score_diff': score_diff,
            'time_remaining': time_remaining,
            'possession': possession,
            'home_team': home_team,
            'team_strength_diff': team_strength_diff,
            'home_won': int(home_won)
        })
    return pd.DataFrame(data)


# Train model
df = generate_sample_data(10000)
wp_model = BasketballWinProbability()
X = wp_model.prepare_features(df)
y = df['home_won']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
wp_model.train(X_train, y_train)

# Evaluate on test set
X_test_scaled = wp_model.scaler.transform(X_test)
test_accuracy = wp_model.model.score(X_test_scaled, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Example prediction
win_prob = wp_model.predict_win_probability(
    score_diff=5,
    time_remaining=120,
    possession=1,
    home_team=1,
    team_strength_diff=2.5
)
print(f"\nWin Probability (up 5, 2 min left, with ball): {win_prob:.1%}")

# Calculate leverage
leverage = wp_model.calculate_leverage(
    score_diff=2,
    time_remaining=60,
    possession=1
)
print(f"Win Probability Leverage (up 2, 1 min left): {leverage:.1%}")
```
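Accuracy only checks which side of 50% a prediction falls on, so it says little about whether the probabilities themselves are trustworthy. One common complement is the Brier score, the mean squared difference between predicted probability and outcome. The sketch below uses a handful of made-up predictions purely to show the calculation; the helper name `brier_score` is introduced for this example.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predictions and results (1 = home team won)
predicted = [0.9, 0.7, 0.4, 0.2]
home_won = [1, 1, 0, 0]

model_score = brier_score(predicted, home_won)
coin_flip_score = brier_score([0.5] * len(home_won), home_won)
print(f"Model Brier score: {model_score:.3f}")
print(f"Always-50% Brier score: {coin_flip_score:.3f}")
```

Lower is better. A model that always says 50% scores 0.25, so a win probability model is only informative if it lands clearly below that.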
### R Implementation - GAM-based Win Probability
```r
library(mgcv)
library(dplyr)
library(ggplot2)

# Win Probability Model using Generalized Additive Model
build_win_prob_gam <- function(game_data) {
  # Prepare features
  game_data <- game_data %>%
    mutate(
      possessions_remaining = time_remaining / 24,
      score_per_minute = score_diff / ((2880 - time_remaining) / 60 + 0.1),
      critical_time = as.numeric(time_remaining <= 300 & abs(score_diff) <= 5)
    )

  # Build GAM model with smooth terms
  model <- gam(
    home_won ~
      s(score_diff, k = 20) +
      s(time_remaining, k = 20) +
      # ti() is a tensor-product interaction: it suits variables measured on
      # different scales and excludes the main effects fitted above
      ti(score_diff, time_remaining) +
      possession +
      home_court +
      team_strength_diff +
      critical_time,
    data = game_data,
    family = binomial(link = "logit")
  )
  return(model)
}

# Predict win probability
predict_win_prob <- function(model, score_diff, time_remaining,
                             possession = 0, home_court = 1,
                             team_strength_diff = 0) {
  possessions_remaining <- time_remaining / 24
  score_per_minute <- score_diff / ((2880 - time_remaining) / 60 + 0.1)
  critical_time <- as.numeric(time_remaining <= 300 & abs(score_diff) <= 5)

  new_data <- data.frame(
    score_diff = score_diff,
    time_remaining = time_remaining,
    possession = possession,
    home_court = home_court,
    team_strength_diff = team_strength_diff,
    possessions_remaining = possessions_remaining,
    score_per_minute = score_per_minute,
    critical_time = critical_time
  )
  # as.numeric() drops the names and array attributes predict() attaches
  win_prob <- as.numeric(predict(model, new_data, type = "response"))
  return(win_prob)
}

# Generate win probability chart for a game
generate_wp_chart <- function(play_by_play_data, model) {
  # Calculate win probability for each play
  # (predict() is vectorized, so no rowwise() pass is needed)
  pbp_with_wp <- play_by_play_data %>%
    mutate(
      win_prob = predict_win_prob(
        model, score_diff, time_remaining,
        possession, home_court, team_strength_diff
      )
    )

  # Create visualization
  ggplot(pbp_with_wp, aes(x = 2880 - time_remaining, y = win_prob)) +
    # linewidth replaces size for lines in ggplot2 >= 3.4.0
    geom_line(linewidth = 1.2, color = "#1f77b4") +
    geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
    scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
    scale_x_continuous(breaks = seq(0, 2880, 720),
                       labels = c("0", "Q1", "Q2", "Q3", "Q4")) +
    labs(
      title = "Live Win Probability",
      x = "Game Time",
      y = "Home Team Win Probability"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      axis.title = element_text(size = 12)
    )
}

# Example usage
set.seed(42)

# Simulate game data
n <- 5000
game_data <- data.frame(
  score_diff = sample(-25:25, n, replace = TRUE),
  time_remaining = sample(0:2880, n, replace = TRUE),
  possession = sample(0:1, n, replace = TRUE),
  home_court = 1,
  team_strength_diff = rnorm(n, 0, 3)
)

# Simulate outcomes (toy model for illustration, not calibrated to real games)
game_data$home_won <- rbinom(n, 1,
  plogis(0.12 * game_data$score_diff + 0.0005 * game_data$time_remaining))

# Train model
wp_model <- build_win_prob_gam(game_data)

# Predict specific scenario
win_prob <- predict_win_prob(
  wp_model,
  score_diff = 7,
  time_remaining = 180,
  possession = 1,
  team_strength_diff = 3.5
)
cat(sprintf("Win Probability (up 7, 3 min left, with ball): %.1f%%\n",
            win_prob * 100))
```
## Expected Points Added (EPA) and Win Probability Added (WPA)
### Win Probability Added (WPA)
WPA measures the change in win probability from one play to the next:
```
WPA = WP_after - WP_before
```
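As a small worked example, the sketch below applies the formula to a short sequence of win probabilities. The values are made up for illustration; in practice they would come from a trained model.

```python
# Hypothetical home-team win probabilities before and after four plays
wp_series = [0.50, 0.55, 0.48, 0.62, 0.70]

# WPA of each play = WP after the play minus WP before it
wpa_values = [after - before for before, after in zip(wp_series, wp_series[1:])]

for play_number, wpa in enumerate(wpa_values, start=1):
    print(f"Play {play_number}: WPA = {wpa:+.2f}")

# The per-play values telescope: their sum equals the total change in WP
total_wpa = sum(wpa_values)
print(f"Total WPA: {total_wpa:+.2f}")
```

Because the differences telescope, WPA summed over every play in a game always equals the final win probability minus the starting one, which is what makes it convenient to split credit among plays or players.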
### Python Implementation - WPA Calculator
```python
import numpy as np
import pandas as pd


class BasketballWPA:
    def __init__(self, win_prob_model):
        self.wp_model = win_prob_model

    def calculate_wpa(self, play_before, play_after):
        """
        Calculate Win Probability Added for a play

        Parameters:
            play_before: dict with game state before play
            play_after: dict with game state after play

        Returns:
            WPA value (positive = increased win probability)
        """
        wp_before = self.wp_model.predict_win_probability(
            score_diff=play_before['score_diff'],
            time_remaining=play_before['time_remaining'],
            possession=play_before['possession'],
            home_team=play_before.get('home_team', 1),
            team_strength_diff=play_before.get('team_strength_diff', 0)
        )
        wp_after = self.wp_model.predict_win_probability(
            score_diff=play_after['score_diff'],
            time_remaining=play_after['time_remaining'],
            possession=play_after['possession'],
            home_team=play_after.get('home_team', 1),
            team_strength_diff=play_after.get('team_strength_diff', 0)
        )
        return wp_after - wp_before

    def calculate_game_wpa(self, play_by_play_df):
        """
        Calculate WPA for all plays in a game

        Parameters:
            play_by_play_df: DataFrame with play-by-play data

        Returns:
            Copy of the DataFrame with a WPA column for each play
        """
        result = play_by_play_df.copy()  # leave the caller's data untouched
        wpa_values = []
        for i in range(len(result) - 1):
            play_before = result.iloc[i].to_dict()
            play_after = result.iloc[i + 1].to_dict()
            wpa_values.append(self.calculate_wpa(play_before, play_after))

        # Last play has no following state to compare against
        if len(result) > 0:
            wpa_values.append(0)
        result['WPA'] = wpa_values
        return result

    def identify_clutch_plays(self, play_by_play_df, threshold=0.10):
        """
        Identify clutch plays (plays with high WPA impact)

        Parameters:
            threshold: Minimum absolute WPA to be considered clutch

        Returns:
            DataFrame of clutch plays
        """
        with_wpa = self.calculate_game_wpa(play_by_play_df)
        clutch_plays = with_wpa[with_wpa['WPA'].abs() >= threshold].copy()
        return clutch_plays.sort_values('WPA', ascending=False)

    def player_wpa_summary(self, play_by_play_df):
        """
        Calculate total WPA by player
        """
        with_wpa = self.calculate_game_wpa(play_by_play_df)
        player_wpa = with_wpa.groupby('player').agg({
            'WPA': ['sum', 'mean', 'count'],
            'time_remaining': 'mean'
        }).round(4)
        player_wpa.columns = ['Total_WPA', 'Avg_WPA', 'Plays', 'Avg_Time_Remaining']
        return player_wpa.sort_values('Total_WPA', ascending=False)


# Example: Calculate WPA for a clutch shot
# Assumes BasketballWinProbability and generate_sample_data from the logistic
# regression section are already defined; the model must be trained before use
train_df = generate_sample_data(10000)
wp_model = BasketballWinProbability()
wp_model.train(wp_model.prepare_features(train_df), train_df['home_won'])
wpa_calculator = BasketballWPA(wp_model)

# Game state before clutch three-pointer
before_play = {
    'score_diff': -3,      # Down 3
    'time_remaining': 15,  # 15 seconds left
    'possession': 1,       # Team has ball
    'home_team': 1,
    'team_strength_diff': 0
}

# Game state after making the three
after_play = {
    'score_diff': 0,       # Tied
    'time_remaining': 12,  # 12 seconds left
    'possession': 0,       # Other team's ball
    'home_team': 1,
    'team_strength_diff': 0
}

wpa = wpa_calculator.calculate_wpa(before_play, after_play)
print(f"WPA of clutch three-pointer: {wpa:+.3f} ({wpa*100:+.1f}%)")

# Create sample play-by-play data
pbp_data = pd.DataFrame({
    'play_id': range(1, 11),
    'player': ['Player A', 'Player B', 'Player A', 'Player C', 'Player B',
               'Player A', 'Player D', 'Player B', 'Player A', 'Player C'],
    'score_diff': [0, 2, 2, 0, 0, -2, -2, -1, 1, 1],
    'time_remaining': [300, 280, 260, 240, 220, 200, 180, 160, 140, 120],
    'possession': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'home_team': [1] * 10,
    'team_strength_diff': [0] * 10
})

# Calculate WPA for all plays
pbp_with_wpa = wpa_calculator.calculate_game_wpa(pbp_data)
print("\nPlay-by-Play with WPA:")
print(pbp_with_wpa[['player', 'score_diff', 'time_remaining', 'WPA']])

# Player WPA summary
player_summary = wpa_calculator.player_wpa_summary(pbp_data)
print("\nPlayer WPA Summary:")
print(player_summary)
```
### Expected Points Added (EPA)
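EPA measures the value a play adds in expected points by comparing the expected points of the game state after the play with the state before it. For a play that ends the possession, the expected points afterwards are simply the points actually scored, so possession-level EPA reduces to points scored minus the expected points at the start of the possession: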
```
EPA = ExpectedPoints_after - ExpectedPoints_before
```
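A quick numeric sketch of the possession-level case, where the "after" value is taken to be the points actually scored. The 1.05 expected points per possession is a hypothetical figure for illustration, and `possession_epa` is a helper name introduced here.

```python
def possession_epa(points_scored, expected_points_before):
    """EPA for a possession-ending play: actual points minus expected points."""
    return points_scored - expected_points_before


# Assume the possession was worth 1.05 expected points when it began
expected_before = 1.05

made_three = possession_epa(3, expected_before)
made_two = possession_epa(2, expected_before)
turnover = possession_epa(0, expected_before)

print(f"Made three: {made_three:+.2f}")
print(f"Made two:   {made_two:+.2f}")
print(f"Turnover:   {turnover:+.2f}")
```

An empty possession costs the full expected value, while a made basket is credited only with the points beyond what an average possession from that state would have produced.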
### Python Implementation - EPA Model
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class BasketballEPA:
    def __init__(self):
        self.model = RandomForestRegressor(n_estimators=100, random_state=42)

    def prepare_features(self, df):
        """Prepare features for expected points model"""
        features = df[['score_diff', 'time_remaining', 'possession_number',
                       'shot_clock', 'court_zone', 'defender_distance']].copy()
        return features

    def train(self, X, y):
        """
        Train expected points model

        Parameters:
            X: Features (game state)
            y: Points scored on possession (0, 1, 2, 3, etc.)
        """
        self.model.fit(X, y)
        return self

    def predict_expected_points(self, game_state):
        """Predict expected points for a possession"""
        return self.model.predict([game_state])[0]

    def calculate_epa(self, state_before, state_after, points_scored):
        """
        Calculate EPA for a play that ends the possession

        Parameters:
            state_before: Game state at start of possession
            state_after: Game state at end of possession (kept for interface
                compatibility; not used, because the possession's value after
                the play is the points actually scored)
            points_scored: Actual points scored

        Returns:
            EPA value
        """
        ep_before = self.predict_expected_points(state_before)
        return points_scored - ep_before


# Example usage
epa_model = BasketballEPA()

# Simulate training data (random features, so predictions are illustrative only)
rng = np.random.default_rng(42)
n_samples = 5000
X_train = rng.standard_normal((n_samples, 6))  # 6 features
y_train = rng.choice([0, 1, 2, 3], size=n_samples, p=[0.3, 0.1, 0.45, 0.15])
epa_model.train(X_train, y_train)

# Calculate EPA for a specific play
state_before = [2, 600, 45, 18, 1, 3]  # score_diff, time, poss#, shot_clock, zone, dist
state_after = [4, 575, 46, 24, 2, 5]
points_scored = 2
epa = epa_model.calculate_epa(state_before, state_after, points_scored)
print(f"EPA for the play: {epa:.3f}")
```
## Clutch Performance Measurement
### Clutch Situations Definition
Clutch situations are typically defined as:
- Last 5 minutes of the 4th quarter or overtime
- Score differential within 5 points
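The two conditions above can be written as a single predicate. This is a minimal sketch: the function and parameter names are hypothetical, and it assumes the clock is expressed as seconds left in the current period, with period 4 as the fourth quarter and 5 or higher as overtime.

```python
def is_clutch(period, seconds_left_in_period, score_diff,
              time_threshold=300, score_threshold=5):
    """Return True when a game state meets the clutch definition."""
    late_in_game = period >= 4 and seconds_left_in_period <= time_threshold
    close_score = abs(score_diff) <= score_threshold
    return late_in_game and close_score


print(is_clutch(period=4, seconds_left_in_period=120, score_diff=-3))  # True
print(is_clutch(period=3, seconds_left_in_period=120, score_diff=0))   # False
print(is_clutch(period=4, seconds_left_in_period=60, score_diff=6))    # False
```

Keeping the thresholds as parameters makes it easy to test stricter definitions, such as the final two minutes within three points.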
### Python Implementation - Clutch Metrics
```python
import pandas as pd
import numpy as np


def _safe_pct(made, attempted):
    """Shooting percentage that returns 0 when there are no attempts"""
    return made / attempted if attempted > 0 else 0


class ClutchPerformanceAnalyzer:
    def __init__(self, wpa_calculator=None):
        self.wpa_calculator = wpa_calculator

    def identify_clutch_situations(self, play_by_play_df,
                                   time_threshold=300,
                                   score_threshold=5):
        """
        Identify clutch situations in play-by-play data

        Parameters:
            time_threshold: Seconds remaining (default 300 = 5 minutes)
            score_threshold: Max point differential (default 5)
        """
        clutch_df = play_by_play_df[
            (play_by_play_df['time_remaining'] <= time_threshold) &
            (play_by_play_df['score_diff'].abs() <= score_threshold) &
            (play_by_play_df['period'] >= 4)
        ].copy()
        return clutch_df

    def calculate_clutch_stats(self, player_plays_df):
        """
        Calculate clutch performance statistics for a player

        Parameters:
            player_plays_df: DataFrame with player's plays in clutch situations

        Returns:
            Dictionary with clutch statistics
        """
        stats = {
            'clutch_plays': len(player_plays_df),
            'clutch_points': player_plays_df['points'].sum(),
            'clutch_fg_pct': _safe_pct(player_plays_df['fg_made'].sum(),
                                       player_plays_df['fg_attempted'].sum()),
            'clutch_three_pct': _safe_pct(player_plays_df['three_made'].sum(),
                                          player_plays_df['three_attempted'].sum()),
            'clutch_ft_pct': _safe_pct(player_plays_df['ft_made'].sum(),
                                       player_plays_df['ft_attempted'].sum()),
        }

        # Add WPA whenever the column is present (it may be precomputed)
        if 'WPA' in player_plays_df.columns:
            stats['total_clutch_wpa'] = player_plays_df['WPA'].sum()
            stats['avg_clutch_wpa'] = player_plays_df['WPA'].mean()
            stats['max_clutch_wpa'] = player_plays_df['WPA'].max()
        return stats

    def clutch_rating(self, player_stats, league_avg_stats):
        """
        Calculate clutch rating comparing player to league average

        Clutch Rating = (Player Clutch Stats / Player Regular Stats) /
                        (League Clutch Stats / League Regular Stats) * 100
        """
        player_clutch_ratio = player_stats['clutch_fg_pct'] / player_stats['regular_fg_pct']
        league_clutch_ratio = league_avg_stats['clutch_fg_pct'] / league_avg_stats['regular_fg_pct']
        return (player_clutch_ratio / league_clutch_ratio) * 100

    def leverage_index(self, play_by_play_df):
        """
        Calculate Leverage Index for each play

        LI measures how impactful a play could be
        """
        if self.wpa_calculator is None:
            raise ValueError("WPA calculator required for leverage index")

        result = play_by_play_df.copy()  # leave the caller's data untouched
        leverage_values = []
        for _, row in result.iterrows():
            # Calculate potential WPA swing
            current_state = row.to_dict()
            # Clamp at 0 so the simulated states never have negative time
            next_time = max(current_state['time_remaining'] - 24, 0)

            # Simulate successful outcome
            success_state = current_state.copy()
            success_state['score_diff'] += 2
            success_state['time_remaining'] = next_time
            success_state['possession'] = 1 - success_state['possession']

            # Simulate failed outcome
            fail_state = current_state.copy()
            fail_state['time_remaining'] = next_time
            fail_state['possession'] = 1 - fail_state['possession']

            wpa_success = self.wpa_calculator.calculate_wpa(current_state, success_state)
            wpa_fail = self.wpa_calculator.calculate_wpa(current_state, fail_state)

            # Leverage is the potential WPA swing
            leverage_values.append(abs(wpa_success - wpa_fail))

        # Normalize so the average over the supplied plays is 1.0
        # (pass league-wide plays to get a league-average baseline)
        avg_leverage = np.mean(leverage_values) if leverage_values else 0
        result['leverage_index'] = leverage_values
        if avg_leverage > 0:
            result['leverage_index'] = result['leverage_index'] / avg_leverage
        return result

    def clutch_wpa_per_48(self, player_clutch_plays):
        """Calculate clutch WPA per 48 minutes"""
        total_wpa = player_clutch_plays['WPA'].sum()
        total_minutes = player_clutch_plays['minutes_played'].sum()
        return (total_wpa / total_minutes) * 48 if total_minutes > 0 else 0


# Example: Analyze clutch performance
clutch_analyzer = ClutchPerformanceAnalyzer()

# Sample play-by-play data (values are illustrative, not real statistics)
pbp_df = pd.DataFrame({
    'player': ['LeBron James', 'Stephen Curry', 'LeBron James', 'Kevin Durant'],
    'period': [4, 4, 4, 4],
    'time_remaining': [180, 120, 90, 45],
    'score_diff': [-2, 3, 1, -1],
    'points': [2, 3, 0, 2],
    'fg_made': [1, 1, 0, 1],
    'fg_attempted': [1, 1, 1, 1],
    'three_made': [0, 1, 0, 0],
    'three_attempted': [0, 1, 0, 0],
    'ft_made': [0, 0, 0, 0],
    'ft_attempted': [0, 0, 0, 0],
    'possession': [1, 1, 1, 1],
    'minutes_played': [3, 2, 1.5, 0.75],
    'WPA': [0.08, 0.15, -0.05, 0.12]
})

# Identify clutch situations
clutch_plays = clutch_analyzer.identify_clutch_situations(pbp_df)
print("Clutch Plays:")
print(clutch_plays[['player', 'time_remaining', 'score_diff', 'points', 'WPA']])

# Calculate clutch stats for LeBron
lebron_clutch = clutch_plays[clutch_plays['player'] == 'LeBron James']
lebron_stats = clutch_analyzer.calculate_clutch_stats(lebron_clutch)
print("\nLeBron James Clutch Stats:")
for key, value in lebron_stats.items():
    print(f"  {key}: {value:.3f}")
```
### R Implementation - Clutch Performance Analysis
```r
library(dplyr)
library(ggplot2)

# Clutch Performance Analyzer
analyze_clutch_performance <- function(play_by_play, time_threshold = 300, score_threshold = 5) {
  # Identify clutch situations
  clutch_data <- play_by_play %>%
    filter(
      time_remaining <= time_threshold,
      abs(score_diff) <= score_threshold,
      period >= 4
    )

  # Calculate clutch statistics by player
  clutch_stats <- clutch_data %>%
    group_by(player) %>%
    summarise(
      clutch_plays = n(),
      clutch_points = sum(points, na.rm = TRUE),
      clutch_fgm = sum(fg_made, na.rm = TRUE),
      clutch_fga = sum(fg_attempted, na.rm = TRUE),
      # Report NA rather than NaN when a player has no attempts
      clutch_fg_pct = if_else(clutch_fga > 0, clutch_fgm / clutch_fga, NA_real_),
      clutch_3pm = sum(three_made, na.rm = TRUE),
      clutch_3pa = sum(three_attempted, na.rm = TRUE),
      clutch_3p_pct = if_else(clutch_3pa > 0, clutch_3pm / clutch_3pa, NA_real_),
      total_wpa = sum(WPA, na.rm = TRUE),
      avg_wpa = mean(WPA, na.rm = TRUE),
      max_wpa = max(WPA, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    arrange(desc(total_wpa))
  return(clutch_stats)
}

# Calculate Clutch Rating
calculate_clutch_rating <- function(player_data) {
  player_data %>%
    mutate(
      clutch_ratio = clutch_fg_pct / regular_fg_pct,
      clutch_rating = (clutch_ratio / mean(clutch_ratio, na.rm = TRUE)) * 100
    )
}

# Visualize clutch performance
plot_clutch_performance <- function(clutch_stats, top_n = 10) {
  # slice_max() supersedes dplyr::top_n() and avoids the name clash
  # with the top_n argument
  top_clutch <- clutch_stats %>%
    slice_max(total_wpa, n = top_n)

  ggplot(top_clutch, aes(x = reorder(player, total_wpa), y = total_wpa)) +
    geom_col(fill = "#d62728", alpha = 0.8) +
    coord_flip() +
    labs(
      title = "Top Clutch Performers by Total WPA",
      x = "Player",
      y = "Total Win Probability Added (Clutch)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 14, face = "bold"),
      axis.text = element_text(size = 10)
    )
}

# Example usage
set.seed(42)
n_plays <- 100

# Every simulated play is one field-goal attempt. Makes are drawn only from
# attempts, so shooting percentages stay between 0 and 1.
three_attempted <- rbinom(n_plays, 1, 0.5)
fg_made <- rbinom(n_plays, 1, ifelse(three_attempted == 1, 0.35, 0.50))
three_made <- fg_made * three_attempted

pbp_example <- data.frame(
  player = sample(c("Player A", "Player B", "Player C"), n_plays, replace = TRUE),
  period = sample(4:5, n_plays, replace = TRUE),
  time_remaining = sample(0:300, n_plays, replace = TRUE),
  score_diff = sample(-5:5, n_plays, replace = TRUE),
  points = fg_made * ifelse(three_attempted == 1, 3, 2),
  fg_made = fg_made,
  fg_attempted = 1,
  three_made = three_made,
  three_attempted = three_attempted,
  WPA = rnorm(n_plays, 0, 0.05)
)

clutch_results <- analyze_clutch_performance(pbp_example)
print(clutch_results)
```
## Live Win Probability Dashboard
### Python Implementation - Real-time WP Tracker
```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


class LiveWinProbabilityTracker:
    def __init__(self, wp_model):
        self.wp_model = wp_model
        self.game_states = []
        self.win_probs = []

    def update_game_state(self, score_home, score_away, time_remaining,
                          possession, team_strength_diff=0):
        """Update with new game state"""
        score_diff = score_home - score_away
        wp = self.wp_model.predict_win_probability(
            score_diff=score_diff,
            time_remaining=time_remaining,
            possession=possession,
            home_team=1,
            team_strength_diff=team_strength_diff
        )
        self.game_states.append({
            'score_home': score_home,
            'score_away': score_away,
            'time_remaining': time_remaining,
            'possession': possession,
            'elapsed_time': 2880 - time_remaining
        })
        self.win_probs.append(wp)
        return wp

    def plot_live_wp(self):
        """Generate live win probability chart"""
        if not self.game_states:
            return

        elapsed_times = [s['elapsed_time'] for s in self.game_states]
        plt.figure(figsize=(12, 6))
        plt.plot(elapsed_times, self.win_probs, linewidth=2.5, color='#1f77b4')
        plt.fill_between(elapsed_times, self.win_probs, 0.5,
                         alpha=0.3, color='#1f77b4')
        plt.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
        plt.ylim(0, 1)
        plt.xlim(0, 2880)

        # Quarter markers
        for q in [720, 1440, 2160, 2880]:
            plt.axvline(x=q, color='lightgray', linestyle=':', alpha=0.5)

        plt.xlabel('Game Time (seconds)', fontsize=12)
        plt.ylabel('Home Team Win Probability', fontsize=12)
        plt.title('Live Win Probability Chart', fontsize=14, fontweight='bold')

        # Format y-axis as percentage
        plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y:.0%}'))

        # Add current score and WP
        current_state = self.game_states[-1]
        current_wp = self.win_probs[-1]
        score_text = f"Score: {current_state['score_home']}-{current_state['score_away']}\n"
        score_text += f"Win Prob: {current_wp:.1%}"
        plt.text(0.02, 0.98, score_text,
                 transform=plt.gca().transAxes,
                 fontsize=11, verticalalignment='top',
                 bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()

    def get_key_moments(self, threshold=0.10):
        """Identify key moments (large WP swings)"""
        key_moments = []
        for i in range(1, len(self.win_probs)):
            wp_change = self.win_probs[i] - self.win_probs[i - 1]
            if abs(wp_change) >= threshold:
                state = self.game_states[i]
                key_moments.append({
                    'play_num': i,
                    'time': state['elapsed_time'],
                    'wp_before': self.win_probs[i - 1],
                    'wp_after': self.win_probs[i],
                    'wp_change': wp_change,
                    'score': f"{state['score_home']}-{state['score_away']}"
                })
        return sorted(key_moments, key=lambda x: abs(x['wp_change']), reverse=True)


# Example: Simulate a game with live WP tracking
# Assumes BasketballWinProbability and generate_sample_data from the logistic
# regression section are already defined; the model must be trained before use
train_df = generate_sample_data(10000)
wp_model = BasketballWinProbability()
wp_model.train(wp_model.prepare_features(train_df), train_df['home_won'])
tracker = LiveWinProbabilityTracker(wp_model)

# Simulate game progression: (home score, away score, seconds left, possession)
# possession = 1 when the home team has the ball after the event
game_events = [
    (0, 0, 2880, 0),    # Start
    (2, 0, 2856, 0),    # Home scores
    (2, 3, 2832, 1),    # Away scores 3
    (5, 3, 2808, 0),    # Home scores 3
    (7, 3, 2784, 0),    # Home scores 2
    (7, 6, 2760, 1),    # Away scores 3
    # ... more events ...
    (98, 95, 120, 1),   # Close game, 2 min left
    (100, 95, 96, 0),   # Home pulls ahead
    (100, 98, 72, 1),   # Away scores 3
    (103, 98, 24, 0),   # Home scores 3
    (103, 100, 8, 1),   # Away scores 2
    (103, 100, 0, 0),   # Final
]

print("Live Win Probability Updates:\n")
for score_h, score_a, secs_left, poss in game_events:
    wp = tracker.update_game_state(score_h, score_a, secs_left, poss)
    time_formatted = f"{secs_left // 60}:{secs_left % 60:02d}"
    print(f"Time {time_formatted} | Score {score_h}-{score_a} | WP: {wp:.1%}")

# Plot the game
tracker.plot_live_wp()

# Get key moments
key_moments = tracker.get_key_moments(threshold=0.08)
print("\n\nTop 5 Key Moments:")
for i, moment in enumerate(key_moments[:5], 1):
    print(f"{i}. Time {moment['time'] // 60}:{moment['time'] % 60:02d} | "
          f"Score {moment['score']} | WP Change: {moment['wp_change']:+.1%}")
```
## Summary
Win probability models in basketball provide real-time insights into game dynamics by:
1. **Modeling win likelihood** based on score, time, possession, and context
2. **Calculating WPA** to measure play impact on winning chances
3. **Computing EPA** to evaluate possession efficiency
4. **Identifying clutch moments** using leverage and WP swings
5. **Measuring player performance** in high-pressure situations
These metrics enhance game analysis, player evaluation, and strategic decision-making.
### Key Formulas
- **Win Probability**: `P(Win) = 1 / (1 + e^(-z))` where z is a linear combination of features
- **WPA**: `WP_after - WP_before`
- **EPA**: `Points_scored - ExpectedPoints_before`
- **Leverage Index**: Potential WP swing on a play
- **Clutch Rating**: Player performance ratio vs league average in clutch situations
### Applications
- **Broadcasting**: Live win probability graphics
- **Coaching**: Identifying critical game moments
- **Player Evaluation**: Clutch performance metrics
- **Betting**: In-game probability updates
- **Strategy**: Understanding high-leverage situations