What is Basketball Analytics?
Basketball analytics represents the application of statistical analysis, data science, and mathematical modeling to evaluate player performance, team strategies, and game outcomes in basketball. This field has revolutionized how teams make decisions, from player acquisitions to in-game strategy, transforming basketball from a sport driven primarily by traditional scouting and intuition to one where data-driven insights play a central role.
The Evolution of Basketball Analytics
The roots of basketball analytics can be traced back to the work of statisticians and enthusiasts who sought to quantify player contributions beyond basic box score statistics. In the 1950s, statisticians began tracking field goal percentages and rebounds, but it wasn't until the 1980s and 1990s that more sophisticated metrics emerged.
Dean Oliver, often called the "father of basketball analytics," published his groundbreaking book "Basketball on Paper" in 2004, introducing the Four Factors of Basketball Success: shooting efficiency, turnover rate, rebounding, and free throw rate. This work laid the foundation for modern basketball analysis and inspired a generation of analysts.
The launch of Basketball-Reference.com in 2004 democratized access to historical basketball data, making it possible for fans and analysts to conduct their own research. Around the same time, online communities began developing and popularizing advanced metrics like Player Efficiency Rating (PER) and True Shooting Percentage (TS%), which provided more nuanced views of player performance than traditional statistics.
The Moneyball Effect in the NBA
The publication of Michael Lewis's "Moneyball" in 2003, which chronicled the Oakland Athletics' use of sabermetrics in baseball, sent shockwaves through professional sports. NBA executives and coaches began to wonder: could similar analytical approaches revolutionize basketball?
The answer was a resounding yes. Teams started hiring statisticians, data scientists, and analytics experts to gain competitive advantages. The Boston Celtics were early adopters, using analytics to help construct their 2008 championship team. However, it was the Houston Rockets under General Manager Daryl Morey who would become the poster child for analytics-driven basketball.
Daryl Morey and the Houston Rockets Revolution
When Daryl Morey became the General Manager of the Houston Rockets in 2007, he brought with him an MBA from MIT's Sloan School of Management and a deep belief in data-driven decision making. Morey assembled one of the largest analytics departments in professional sports, employing PhDs in statistics, computer science, and behavioral economics.
The Rockets' analytical approach led to several key insights that would reshape the NBA:
- The Inefficiency of Mid-Range Shots: Analysis showed that mid-range two-point shots were the least efficient shots in basketball. The Rockets dramatically reduced their mid-range attempts, focusing instead on three-pointers and shots at the rim.
- The Value of Three-Point Shooting: By calculating expected points per shot, the Rockets determined that a team shooting 33.3% from three-point range scores the same as a team shooting 50% on two-point shots. This led to an unprecedented emphasis on three-point shooting.
- Pace and Space: Analytics revealed that playing at a faster pace and spreading the floor created more efficient offensive opportunities.
- Player Evaluation: The Rockets used advanced metrics to identify undervalued players who excelled in areas traditional statistics missed.
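The expected-points arithmetic behind the shot-selection insights above can be checked in a few lines. This is a minimal sketch, not the Rockets' actual model: the function names are illustrative, and it ignores fouls, offensive rebounds, and turnovers.

```python
def expected_points(make_probability: float, shot_value: int) -> float:
    """Expected points from one attempt, ignoring fouls and offensive rebounds."""
    return make_probability * shot_value


def break_even_three_pct(two_point_pct: float) -> float:
    """Three-point percentage that yields the same expected points as a two-point percentage."""
    return two_point_pct * 2 / 3


# 50% on twos and 33.3% on threes are both worth one point per shot
print(f"50% on twos:     {expected_points(0.50, 2):.2f} points per shot")
print(f"33.3% on threes: {expected_points(1 / 3, 3):.2f} points per shot")

# A 40% mid-range shooter is matched by anyone making about 26.7% of threes
print(f"Break-even 3P% for 40% on twos: {break_even_three_pct(0.40):.1%}")
```

Because the break-even point sits so far below typical three-point percentages, moving attempts from mid-range to beyond the arc raises expected points even when fewer shots go in.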
Under Morey's leadership, the Rockets became perennial contenders. In the 2017-18 season they won a franchise-record 65 games and came within one game of the NBA Finals. While they never won a championship, their analytical approach fundamentally changed how basketball is played at every level.
Traditional Statistics vs. Advanced Analytics
Traditional basketball statistics are the familiar box score numbers, most of which have been recorded for decades. They include metrics like:
- Points, Rebounds, Assists, Steals, Blocks
- Field Goal Percentage (FG%)
- Free Throw Percentage (FT%)
- Minutes Played
- Plus/Minus
While these statistics provide valuable information, they have significant limitations. For example, traditional field goal percentage treats all made shots equally, whether they're three-pointers or two-pointers. A player shooting 40% from three-point range is actually more efficient than a player shooting 55% on two-pointers, but this isn't reflected in basic FG%.
Advanced analytics address these limitations by providing context and measuring efficiency:
- True Shooting Percentage (TS%): Measures shooting efficiency accounting for field goals, three-pointers, and free throws
- Effective Field Goal Percentage (eFG%): Adjusts for the fact that three-pointers are worth more than two-pointers
- Usage Rate (USG%): Estimates the percentage of team plays used by a player while on the court
- Offensive and Defensive Rating: Estimates points produced/allowed per 100 possessions
- Win Shares: Estimates the number of wins contributed by a player
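Two of the metrics listed above are simple enough to compute by hand. The sketch below uses the eFG% formula that appears later in this guide and the commonly cited Basketball-Reference form of usage rate. The function names and the sample stat lines are made up for illustration.

```python
def effective_fg_pct(fgm: float, fg3m: float, fga: float) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA, returned as a fraction."""
    return (fgm + 0.5 * fg3m) / fga if fga else 0.0


def usage_rate(fga, fta, tov, minutes, team_fga, team_fta, team_tov, team_minutes):
    """USG% estimate: share of team plays a player uses while on the court."""
    player_plays = fga + 0.44 * fta + tov
    team_plays = team_fga + 0.44 * team_fta + team_tov
    if minutes == 0 or team_plays == 0:
        return 0.0
    return 100 * (player_plays * (team_minutes / 5)) / (minutes * team_plays)


# Hypothetical shooters: 8-of-20 all on threes vs. 11-of-20 all on twos
print(f"Three-point shooter: FG% 40.0, eFG% {effective_fg_pct(8, 8, 20):.1%}")
print(f"Two-point shooter:   FG% 55.0, eFG% {effective_fg_pct(11, 0, 20):.1%}")

# Hypothetical single-game line for a high-usage scorer
usg = usage_rate(fga=20, fta=8, tov=3, minutes=36,
                 team_fga=88, team_fta=25, team_tov=14, team_minutes=240)
print(f"Usage rate: {usg:.1f}%")
```

The three-point shooter trails by 15 points of raw FG% but leads in eFG%, which is exactly the gap basic field goal percentage hides.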
How Analytics Changed the Game
The Three-Point Revolution
Perhaps no aspect of basketball has been more profoundly affected by analytics than three-point shooting. In the 2000-01 season, NBA teams averaged 13.7 three-point attempts per game. By the 2023-24 season, that number had skyrocketed to over 35 attempts per game.
This shift wasn't arbitrary—it was driven by data showing that three-point shots, despite being more difficult, offer better expected value than mid-range two-pointers. The Golden State Warriors, led by Stephen Curry, demonstrated the devastating effectiveness of this approach, winning four championships between 2015 and 2022.
Pace and Space
Analytics revealed that faster-paced games create more possessions, and more possessions create more scoring opportunities. Teams also learned that spacing the floor with shooters creates driving lanes and open shots. This led to the decline of traditional "big men" who couldn't shoot or defend in space, and the rise of versatile "stretch" players who could shoot from outside.
Load Management and Rest
Data analysis has shown the relationship between player fatigue and injury risk. This has led to controversial "load management" practices where healthy players sit out games to preserve their bodies for the playoffs. While unpopular with fans, teams argue that analytics support this approach.
Defensive Schemes
Analytics hasn't just changed offense. Teams now use data to identify opponents' most efficient plays and players, designing defensive schemes to limit these threats. The proliferation of "switching" defenses, where defenders swap assignments on screens, is partly an analytical response to offensive strategies.
Current State of NBA Analytics Departments
Today, every NBA team has an analytics department, though they vary significantly in size and influence. Leading organizations employ dozens of analysts with diverse backgrounds:
- Data Scientists: Build models to predict player performance and game outcomes
- Video Analysts: Break down game film and track detailed play-by-play data
- Software Engineers: Develop tools and platforms for data analysis and visualization
- Statisticians: Design and validate metrics that measure player and team performance
- Sports Scientists: Analyze biomechanical and physiological data to optimize performance and prevent injuries
Modern analytics departments don't just crunch numbers—they work closely with coaches, scouts, and front office personnel to integrate data insights into decision-making processes. The best organizations have bridged the gap between "analytics people" and "basketball people," creating cultures where data informs but doesn't dictate decisions.
Key Metrics in Basketball Analytics
Player Efficiency Rating (PER)
Developed by ESPN's John Hollinger, PER is a single-number metric that attempts to boil down all of a player's contributions into one rating. The league average PER is always 15.0, with higher numbers indicating better performance. PER has limitations—it doesn't adequately account for defense and favors high-usage offensive players—but remains widely used.
True Shooting Percentage (TS%)
TS% is calculated as: TS% = Points / (2 × (FGA + 0.44 × FTA))
This metric provides a more accurate picture of shooting efficiency than traditional FG% because it accounts for the added value of three-pointers and the efficiency of free throws. A TS% above 60% is considered excellent in the modern NBA.
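As a worked example of the formula above, consider a hypothetical player who scores 30 points on 20 field goal attempts and 8 free throw attempts. The helper name is illustrative, and the zero-attempt guard is an added convenience rather than part of the metric's definition.

```python
def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), returned as a fraction."""
    scoring_attempts = fga + 0.44 * fta
    if scoring_attempts == 0:
        return 0.0  # no attempts: report zero rather than divide by zero
    return points / (2 * scoring_attempts)


# 30 points on 20 FGA and 8 FTA: 30 / (2 * (20 + 3.52)) = 30 / 47.04
ts = true_shooting_pct(points=30, fga=20, fta=8)
print(f"TS%: {ts:.1%}")
```

The result is just under 64%, which clears the 60% bar described above.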
Box Plus/Minus (BPM)
BPM is a box score-based metric that estimates a player's contribution to the team when they're on the court, measured in points above league average per 100 possessions. It's split into Offensive BPM (OBPM) and Defensive BPM (DBPM), with the sum equaling total BPM.
Value Over Replacement Player (VORP)
VORP converts BPM into an estimate of the player's value over a replacement-level player (worth -2.0 BPM). It accounts for playing time, making it useful for comparing players with different minutes.
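The conversion can be sketched as follows, assuming the commonly cited Basketball-Reference form: the gap between BPM and replacement level, scaled by the player's share of team playing time and by the fraction of an 82-game season played. Using minutes share as a stand-in for possession share is an approximation, and the function name and sample inputs are illustrative.

```python
REPLACEMENT_LEVEL_BPM = -2.0  # replacement level, as stated in the text above
FULL_SEASON_GAMES = 82


def vorp(bpm: float, player_minutes: float, team_minutes: float, team_games: int) -> float:
    """Approximate VORP, using minutes share as a stand-in for possessions share."""
    if team_minutes <= 0:
        return 0.0
    share_of_playing_time = player_minutes / (team_minutes / 5)
    above_replacement = bpm - REPLACEMENT_LEVEL_BPM
    return above_replacement * share_of_playing_time * team_games / FULL_SEASON_GAMES


# Hypothetical players on a team that logged 19,680 total minutes (82 games x 240)
star = vorp(bpm=6.0, player_minutes=2800, team_minutes=19680, team_games=82)
reserve = vorp(bpm=1.0, player_minutes=1200, team_minutes=19680, team_games=82)
print(f"Star VORP:    {star:.2f}")
print(f"Reserve VORP: {reserve:.2f}")
```

A player at exactly replacement level earns zero VORP no matter how many minutes he plays, which is why the metric rewards both quality and playing time.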
Win Shares (WS)
Win Shares estimates the number of wins contributed by a player through offensive and defensive performance. It's calculated using detailed box score data and team performance metrics.
Real Plus-Minus (RPM) and RAPTOR
Unlike box score-based metrics, RPM (developed by ESPN) and RAPTOR (developed by FiveThirtyEight) use play-by-play data to estimate a player's impact. These metrics attempt to isolate a player's effect by comparing what happens when they're on vs. off the court, controlling for teammates and opponents.
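The raw on/off comparison that these metrics start from can be sketched in a few lines. This is not RPM or RAPTOR: it applies no adjustment for teammates or opponents, which is exactly the gap those metrics try to close. The Stint structure and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    """A stretch of play with no substitutions (hypothetical structure)."""
    player_on: bool
    team_points: int
    opp_points: int
    possessions: int


def net_rating(stints: list) -> float:
    """Point differential per 100 possessions over a set of stints."""
    possessions = sum(s.possessions for s in stints)
    if possessions == 0:
        return 0.0
    margin = sum(s.team_points - s.opp_points for s in stints)
    return 100 * margin / possessions


def on_off_differential(stints: list) -> float:
    """Team net rating with the player on the court minus net rating with him off."""
    on = [s for s in stints if s.player_on]
    off = [s for s in stints if not s.player_on]
    return net_rating(on) - net_rating(off)


game = [
    Stint(player_on=True, team_points=30, opp_points=24, possessions=25),
    Stint(player_on=False, team_points=18, opp_points=22, possessions=20),
    Stint(player_on=True, team_points=22, opp_points=20, possessions=20),
]
print(f"On/off differential: {on_off_differential(game):+.1f} per 100 possessions")
```

A large raw differential can simply mean the player shares the floor with strong teammates, so it should be read as a starting point rather than a verdict.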
Offensive and Defensive Ratings
These metrics estimate how many points a team scores (Offensive Rating) or allows (Defensive Rating) per 100 possessions. They can be calculated at both team and individual levels, providing insight into efficiency independent of pace.
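To see why per-100-possession ratings are independent of pace, compare a fast team and a slow team. The sketch below uses the standard box score possession estimate (FGA + 0.44 × FTA − OREB + TOV). Both team lines are invented for illustration.

```python
def estimate_possessions(fga: float, fta: float, oreb: float, tov: float) -> float:
    """Box score possession estimate: FGA + 0.44 * FTA - OREB + TOV."""
    return fga + 0.44 * fta - oreb + tov


def rating_per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions (offensive or defensive, depending on whose points)."""
    return 100 * points / possessions if possessions else 0.0


# Hypothetical single-game lines: one up-tempo team, one slow-paced team
fast_poss = estimate_possessions(fga=95, fta=25, oreb=12, tov=14)
slow_poss = estimate_possessions(fga=80, fta=20, oreb=10, tov=12)

print(f"Fast team: 120 points, {fast_poss:.1f} possessions, "
      f"ORtg {rating_per_100(120, fast_poss):.1f}")
print(f"Slow team: 101 points, {slow_poss:.1f} possessions, "
      f"ORtg {rating_per_100(101, slow_poss):.1f}")
```

The fast team scores 19 more points, yet both offenses come out at roughly 111 points per 100 possessions, so the scoring gap reflects tempo rather than efficiency.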
Impact on Player Evaluation and Game Strategy
Draft and Free Agency Decisions
Analytics has transformed how teams evaluate prospects and free agents. Teams now project college and international players into NBA contexts using statistical models, identifying players whose skills translate better than traditional scouting suggests. This has led to success stories like Nikola Jokić (drafted 41st overall in 2014, now a three-time MVP) and Draymond Green (drafted 35th overall in 2012, a key piece of the Warriors dynasty).
Contract Valuations
Advanced metrics help teams avoid overpaying for players whose traditional statistics overstate their true value. Conversely, they help identify undervalued players who contribute in ways that don't show up in basic box scores.
Lineup Optimization
Coaches use analytics to determine which player combinations work best together. Net rating (point differential per 100 possessions) for specific lineups helps coaches optimize rotations and matchups.
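A lineup net rating table can be built by pooling every stint a five-man unit plays together. In this sketch the stint tuples, the single-letter player labels, and the 50-possession minimum are all assumptions chosen for illustration, since small samples produce noisy ratings.

```python
from collections import defaultdict


def lineup_net_ratings(stints, min_possessions=50):
    """Return (lineup, net rating per 100 possessions) pairs, best first.

    Each stint is (players, team_points, opp_points, possessions).
    Lineups below min_possessions are dropped as too small a sample.
    """
    totals = defaultdict(lambda: [0, 0])  # lineup -> [point margin, possessions]
    for players, team_points, opp_points, possessions in stints:
        key = frozenset(players)  # order of the five names does not matter
        totals[key][0] += team_points - opp_points
        totals[key][1] += possessions

    ratings = {
        lineup: 100 * margin / possessions
        for lineup, (margin, possessions) in totals.items()
        if possessions >= min_possessions
    }
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stints with placeholder player labels
stints = [
    (("A", "B", "C", "D", "E"), 30, 24, 28),
    (("E", "D", "C", "B", "A"), 28, 25, 27),
    (("A", "B", "C", "D", "F"), 50, 55, 52),
    (("A", "B", "C", "E", "F"), 10, 2, 8),  # too few possessions to report
]
for lineup, rating in lineup_net_ratings(stints):
    print(f"{'-'.join(sorted(lineup))}: {rating:+.1f}")
```

The last lineup outscored its opponents 10-2 but is excluded, because eight possessions say very little about how the group would perform over a season.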
Shot Selection
Analytics has completely changed shot selection philosophy. Teams now understand expected points per shot for different locations on the court, leading to the dramatic reduction in mid-range attempts and increase in three-pointers and shots at the rim.
Defensive Strategy
Data analysis helps teams identify opponents' tendencies and weaknesses. Teams use shot charts, play-type data, and player tracking information to design defensive game plans that force opponents into lower-efficiency shots.
The Future of Basketball Analytics
Basketball analytics continues to evolve rapidly. Several emerging trends are shaping the future of the field:
Player Tracking and Computer Vision
NBA arenas are equipped with sophisticated camera systems that track every player's movement 25 times per second. This data enables analysis of spacing, movement patterns, and defensive positioning at unprecedented detail. Machine learning models can now identify play types, predict shot outcomes, and evaluate defensive effectiveness in ways impossible just a few years ago.
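Even basic quantities such as distance covered and average speed fall out of this kind of positional data directly. The sketch below assumes a track is a list of (x, y) court positions in feet, sampled 25 times per second. The data layout and function names are hypothetical, not the format of any real tracking feed.

```python
import math

SAMPLE_RATE_HZ = 25  # samples per second, per the tracking rate described above


def distance_covered(track) -> float:
    """Total distance in feet along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def average_speed(track, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Average speed in feet per second over the whole track."""
    if len(track) < 2:
        return 0.0
    elapsed_seconds = (len(track) - 1) / sample_rate_hz
    return distance_covered(track) / elapsed_seconds


# Hypothetical one-second sprint along the sideline: 0.6 ft between samples
sprint = [(0.6 * i, 0.0) for i in range(26)]
print(f"Distance: {distance_covered(sprint):.1f} ft")
print(f"Average speed: {average_speed(sprint):.1f} ft/s")
```

Real tracking pipelines also smooth the raw positions before differencing, because sample-to-sample jitter would otherwise inflate both distance and speed.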
Wearable Technology
Players wear devices that monitor heart rate, acceleration, jump height, and other biomechanical data. This information helps optimize training, prevent injuries, and manage player workload throughout the season.
Artificial Intelligence and Machine Learning
AI models are being developed to predict game outcomes, optimize lineups, and identify patterns in player performance. Neural networks can process vast amounts of video data to automatically generate scouting reports and identify tendencies.
Integrated Analytics
The future lies in integrating multiple data sources—box scores, tracking data, video analysis, biometric data, and even psychological assessments—to create comprehensive models of player and team performance.
Real-Time Decision Making
Analytics is moving from postgame analysis to real-time decision support. Some teams already receive analytical insights during games, helping coaches make substitution and strategic decisions on the fly.
Working with Basketball Data: Code Examples
Python: Fetching and Analyzing NBA Data
The nba_api library provides easy access to NBA statistics. Here's how to fetch player data and calculate advanced metrics:
# Install required libraries
# pip install nba_api pandas matplotlib seaborn
from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd
# Fetch current season player statistics
player_stats = leaguedashplayerstats.LeagueDashPlayerStats(
    season='2023-24',
    per_mode_detailed='PerGame'
)
# Convert to DataFrame
df = player_stats.get_data_frames()[0]
# Calculate True Shooting Percentage
# TS% = PTS / (2 * (FGA + 0.44 * FTA))
df['TS%'] = df['PTS'] / (2 * (df['FGA'] + 0.44 * df['FTA']))
# Calculate Effective Field Goal Percentage
# eFG% = (FGM + 0.5 * FG3M) / FGA
df['eFG%'] = (df['FGM'] + 0.5 * df['FG3M']) / df['FGA']
# Filter for players with significant minutes
qualified_players = df[df['MIN'] >= 20].copy()
# Display top 10 players by True Shooting Percentage
print("Top 10 Players by True Shooting Percentage (min 20 MPG):")
print(qualified_players.nlargest(10, 'TS%')[['PLAYER_NAME', 'PTS', 'TS%', 'eFG%']])
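# Compare traditional vs advanced stats
# Find players with high PTS but low efficiency (rank 1 = best)
qualified_players['PPG_Rank'] = qualified_players['PTS'].rank(ascending=False)
qualified_players['TS_Rank'] = qualified_players['TS%'].rank(ascending=False)
# A large positive gap means a player ranks far better in scoring than in efficiency
qualified_players['Efficiency_Gap'] = qualified_players['TS_Rank'] - qualified_players['PPG_Rank']
# Restrict to the top 50 scorers so the list really contains high scorers
high_scorers = qualified_players[qualified_players['PPG_Rank'] <= 50]
print("\nHigh Scorers with Lower Efficiency:")
print(high_scorers.nlargest(5, 'Efficiency_Gap')[['PLAYER_NAME', 'PTS', 'FG_PCT', 'TS%']])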
Python: Visualizing the Three-Point Revolution
from nba_api.stats.endpoints import leaguedashteamstats
import pandas as pd
import matplotlib.pyplot as plt
# Fetch per-game team statistics for multiple seasons
seasons = ['2010-11', '2015-16', '2020-21', '2023-24']
three_point_data = []
for season in seasons:
    # The endpoint returns season totals by default, so request per-game values
    team_stats = leaguedashteamstats.LeagueDashTeamStats(
        season=season,
        per_mode_detailed='PerGame'
    )
    df = team_stats.get_data_frames()[0]
    avg_3pa = df['FG3A'].mean()
    avg_3pm = df['FG3M'].mean()
    # League-wide percentage from makes and attempts, not a mean of team percentages
    league_3p_pct = df['FG3M'].sum() / df['FG3A'].sum()
    three_point_data.append({
        'Season': season,
        '3PA_per_game': avg_3pa,
        '3PM_per_game': avg_3pm,
        '3P%': league_3p_pct * 100
    })
# Create visualization
three_pt_df = pd.DataFrame(three_point_data)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
# Plot 3-point attempts over time
ax1.plot(three_pt_df['Season'], three_pt_df['3PA_per_game'],
         marker='o', linewidth=2, markersize=8, color='#1f77b4')
ax1.set_title('Average 3-Point Attempts Per Game', fontsize=14, fontweight='bold')
ax1.set_xlabel('Season', fontsize=12)
ax1.set_ylabel('3-Point Attempts', fontsize=12)
ax1.grid(True, alpha=0.3)
# Plot 3-point percentage over time
ax2.plot(three_pt_df['Season'], three_pt_df['3P%'],
         marker='s', linewidth=2, markersize=8, color='#ff7f0e')
ax2.set_title('Average 3-Point Percentage', fontsize=14, fontweight='bold')
ax2.set_xlabel('Season', fontsize=12)
ax2.set_ylabel('3-Point %', fontsize=12)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('three_point_revolution.png', dpi=300, bbox_inches='tight')
plt.show()
print("\nThree-Point Revolution Data:")
print(three_pt_df.to_string(index=False))
Python: Player Comparison Analysis
from nba_api.stats.endpoints import playercareerstats
from nba_api.stats.static import players
import matplotlib.pyplot as plt
# Find player IDs
all_players = players.get_players()
def get_player_id(name):
    """Return the player ID for an exact, case-insensitive full-name match."""
    matches = [p for p in all_players if p['full_name'].lower() == name.lower()]
    if not matches:
        raise ValueError(f"No player found with name {name!r}")
    return matches[0]['id']
def get_career_averages(player_id):
    """Return career regular-season [PPG, RPG, APG, FG%, TS%]."""
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    # Use the career-totals table: averaging per-season rows would weight a short
    # season the same as a full one and count traded seasons more than once
    totals = career.career_totals_regular_season.get_data_frame().iloc[0]
    return [
        totals['PTS'] / totals['GP'],
        totals['REB'] / totals['GP'],
        totals['AST'] / totals['GP'],
        totals['FGM'] / totals['FGA'] * 100,
        totals['PTS'] / (2 * (totals['FGA'] + 0.44 * totals['FTA'])) * 100
    ]
# Compare two players
player1_name = "LeBron James"
player2_name = "Kevin Durant"
# Fetch career statistics
categories = ['PPG', 'RPG', 'APG', 'FG%', 'TS%']
player1_stats = get_career_averages(get_player_id(player1_name))
player2_stats = get_career_averages(get_player_id(player2_name))
# Create comparison visualization
x = list(range(len(categories)))
width = 0.35
fig, ax = plt.subplots(figsize=(12, 6))
bars1 = ax.bar([i - width/2 for i in x], player1_stats, width, label=player1_name, color='#6f42c1')
bars2 = ax.bar([i + width/2 for i in x], player2_stats, width, label=player2_name, color='#fd7e14')
ax.set_xlabel('Statistics', fontsize=12)
ax.set_ylabel('Value', fontsize=12)
ax.set_title('Career Statistics Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('player_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
# Print the same numbers that appear in the chart
for name, stats in [(player1_name, player1_stats), (player2_name, player2_stats)]:
    print(f"\n{name} Career Averages:")
    print(f"PPG: {stats[0]:.1f}, RPG: {stats[1]:.1f}, APG: {stats[2]:.1f}")
    print(f"FG%: {stats[3]:.1f}%, TS%: {stats[4]:.1f}%")
R: Analyzing NBA Data with hoopR
The hoopR package provides comprehensive access to NBA data in R:
# Install required libraries
# install.packages("hoopR")
# install.packages("tidyverse")
# install.packages("ggplot2")
library(hoopR)
library(tidyverse)
library(ggplot2)
# Load NBA player box scores
nba_player_box <- load_nba_player_box(seasons = 2024)
# Calculate advanced statistics
nba_stats <- nba_player_box %>%
  # Drop rows with no recorded points (assumed here to be players who did not
  # play) so they are not counted as games
  filter(!is.na(points)) %>%
  # Grouping by team as well means a traded player gets one row per team
  group_by(athlete_display_name, team_name) %>%
  summarise(
    games = n(),
    total_points = sum(points, na.rm = TRUE),
    total_fga = sum(field_goals_attempted, na.rm = TRUE),
    total_fgm = sum(field_goals_made, na.rm = TRUE),
    total_3pa = sum(three_point_field_goals_attempted, na.rm = TRUE),
    total_3pm = sum(three_point_field_goals_made, na.rm = TRUE),
    total_fta = sum(free_throws_attempted, na.rm = TRUE),
    total_rebounds = sum(rebounds, na.rm = TRUE),
    total_assists = sum(assists, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  mutate(
    ppg = total_points / games,
    rpg = total_rebounds / games,
    apg = total_assists / games,
    fg_pct = total_fgm / total_fga * 100,
    three_pt_pct = total_3pm / total_3pa * 100,
    ts_pct = total_points / (2 * (total_fga + 0.44 * total_fta)) * 100,
    efg_pct = (total_fgm + 0.5 * total_3pm) / total_fga * 100
  ) %>%
  filter(games >= 20) # Filter for qualified players
# Display top scorers
top_scorers <- nba_stats %>%
  arrange(desc(ppg)) %>%
  select(athlete_display_name, team_name, ppg, ts_pct, efg_pct) %>%
  head(10)
print("Top 10 Scorers (2023-24 Season):")
print(top_scorers)
# Visualize relationship between 3-point attempts and efficiency
ggplot(nba_stats, aes(x = total_3pa / games, y = ts_pct)) +
  geom_point(aes(size = ppg, color = ppg), alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  scale_color_gradient(low = "blue", high = "red") +
  labs(
    title = "3-Point Attempts vs. True Shooting Percentage",
    x = "3-Point Attempts per Game",
    y = "True Shooting %",
    size = "Points per Game",
    color = "Points per Game"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )
ggsave("3pt_vs_efficiency.png", width = 10, height = 6, dpi = 300)
R: Shot Chart Analysis
library(hoopR)
library(tidyverse)
library(ggplot2)
# Load play-by-play data to analyze shot locations
pbp_data <- load_nba_pbp(seasons = 2024)
# Filter for field goal attempts with recorded coordinates
# NOTE: the text patterns and the coordinate origin used below are assumptions
# about the play-by-play feed; inspect a few rows of `text`, `coordinate_x`,
# and `coordinate_y` before trusting the zones
shots <- pbp_data %>%
  filter(shooting_play %in% TRUE) %>%
  filter(!is.na(coordinate_x) & !is.na(coordinate_y)) %>%
  # Free throws are not field goal attempts and would distort the zones
  filter(!str_detect(text, regex("free throw", ignore_case = TRUE))) %>%
  mutate(
    shot_made = scoring_play %in% TRUE,
    shot_type = if_else(
      str_detect(text, regex("three point", ignore_case = TRUE)),
      "Three Point",
      "Two Point"
    ),
    # Treats (0, 0) as the basket; adjust if the coordinates use another origin
    distance = sqrt(coordinate_x^2 + coordinate_y^2)
  )
# Calculate shooting efficiency by zone
shot_zones <- shots %>%
  mutate(
    zone = case_when(
      distance <= 5 ~ "At Rim (0-5 ft)",
      distance <= 10 ~ "Short (5-10 ft)",
      distance <= 16 ~ "Mid-Range (10-16 ft)",
      distance <= 23.75 ~ "Long 2 (16-24 ft)",
      TRUE ~ "Three Point (24+ ft)"
    )
  ) %>%
  group_by(zone, shot_type) %>%
  summarise(
    attempts = n(),
    makes = sum(shot_made),
    .groups = 'drop'
  ) %>%
  # Compute rates after summarising so each zone and shot type yields one row
  mutate(
    fg_pct = makes / attempts * 100,
    points_per_attempt = if_else(shot_type == "Three Point", 3, 2) * makes / attempts
  ) %>%
  arrange(desc(points_per_attempt))
print("Shooting Efficiency by Zone:")
print(shot_zones)
# Visualize points per attempt by zone
ggplot(shot_zones, aes(x = reorder(zone, points_per_attempt), y = points_per_attempt, fill = shot_type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Two Point" = "#1f77b4", "Three Point" = "#ff7f0e")) +
  labs(
    title = "Expected Points per Shot by Court Zone",
    subtitle = "NBA 2023-24 Season",
    x = "Shot Zone",
    y = "Points per Attempt",
    fill = "Shot Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "bottom"
  )
ggsave("points_per_attempt_by_zone.png", width = 10, height = 6, dpi = 300)
Python: Building a Simple Team Net Rating Calculator
from nba_api.stats.endpoints import leaguegamefinder
from nba_api.stats.static import teams
def calculate_net_rating(team_abbreviation, season='2023-24'):
    """
    Estimate a team's offensive, defensive, and net ratings from its game log
    """
    # Look up the team ID from its abbreviation
    team = teams.find_team_by_abbreviation(team_abbreviation)
    if team is None:
        raise ValueError(f"Unknown team abbreviation: {team_abbreviation}")
    # Fetch regular-season games; this endpoint reports PLUS_MINUS for each game
    games = leaguegamefinder.LeagueGameFinder(
        team_id_nullable=team['id'],
        season_nullable=season,
        season_type_nullable='Regular Season'
    )
    df = games.get_data_frames()[0]
    # Estimate possessions: FGA + 0.44 * FTA - OREB + TOV
    total_possessions = (df['FGA'] + 0.44 * df['FTA'] - df['OREB'] + df['TOV']).sum()
    total_points_scored = df['PTS'].sum()
    # Points allowed = points scored minus the scoring margin
    total_points_allowed = total_points_scored - df['PLUS_MINUS'].sum()
    offensive_rating = (total_points_scored / total_possessions) * 100
    # Approximation: opponent possessions are assumed equal to the team's own
    defensive_rating = (total_points_allowed / total_possessions) * 100
    net_rating = offensive_rating - defensive_rating
    wins = (df['WL'] == 'W').sum()
    losses = (df['WL'] == 'L').sum()
    return {
        'Team': team_abbreviation,
        'Offensive Rating': round(offensive_rating, 2),
        'Defensive Rating': round(defensive_rating, 2),
        'Net Rating': round(net_rating, 2),
        'Record': f"{wins}-{losses}"
    }
# Example usage
team_ratings = calculate_net_rating('BOS', '2023-24')
print("Team Ratings:")
for key, value in team_ratings.items():
    print(f"{key}: {value}")
Conclusion
Basketball analytics has evolved from a niche interest to an essential component of modern basketball. The field has transformed how teams evaluate players, design strategies, and make decisions at every level of the organization. From Daryl Morey's Houston Rockets to the Golden State Warriors' dynasty, analytics has proven its value in building competitive teams.
The integration of traditional basketball knowledge with advanced statistical analysis has created a more sophisticated understanding of the game. While analytics will never replace human judgment, scouting, and coaching expertise, it provides invaluable insights that complement traditional evaluation methods.
As technology continues to advance, bringing more detailed tracking data, artificial intelligence, and real-time analysis capabilities, basketball analytics will only become more powerful and influential. The future of basketball will be shaped by teams that can successfully blend data-driven insights with basketball wisdom, creating a competitive advantage that extends from the draft room to the court.
For aspiring analysts, the field offers exciting opportunities to combine a passion for basketball with skills in data science, statistics, and programming. The code examples provided in this guide offer a starting point for exploring NBA data and developing your own analytical insights. Whether you're a fan looking to understand the game better, a student considering a career in sports analytics, or a coach seeking data-driven insights, basketball analytics provides tools to deepen your understanding and appreciation of the sport.