Understanding Basketball Data Types

Beginner | 10 min read | Nov 27, 2025
# Understanding Basketball Data

## Types of NBA Data

### 1. Box Score Data

Box scores provide traditional and advanced statistics for players and teams in each game.

**What's Included:**

- Player statistics (points, rebounds, assists, steals, blocks)
- Shooting percentages (FG%, 3P%, FT%)
- Plus/minus ratings
- Team totals and opponent stats
- Game metadata (date, location, outcome)

**Python Example - Fetching Box Scores:**

```python
from nba_api.stats.endpoints import boxscoretraditionalv2
import pandas as pd

# Get box score for a specific game
game_id = '0022100001'
boxscore = boxscoretraditionalv2.BoxScoreTraditionalV2(game_id=game_id)

# Extract player stats
player_stats = boxscore.get_data_frames()[0]
print(player_stats[['PLAYER_NAME', 'PTS', 'REB', 'AST', 'FG_PCT']])

# Extract team stats
team_stats = boxscore.get_data_frames()[1]
print(team_stats[['TEAM_NAME', 'PTS', 'FG_PCT', 'FG3_PCT']])
```

**R Example - Box Score Analysis:**

```r
library(nbastatR)
library(dplyr)

# Get box scores for a season
box_scores <- game_logs(
  seasons = 2023,
  result_types = "player",
  season_types = "Regular Season"
)

# Analyze player performance
top_scorers <- box_scores %>%
  group_by(namePlayer) %>%
  summarise(
    games = n(),
    avg_pts = mean(pts, na.rm = TRUE),
    avg_reb = mean(treb, na.rm = TRUE),
    avg_ast = mean(ast, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_pts)) %>%
  head(10)

print(top_scorers)
```

### 2. Play-by-Play Data

Play-by-play data captures every event that occurs during a game with timestamps and context.
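To make the idea of an event stream concrete, the sketch below models a handful of events as plain Python records and replays them to rebuild the running score. The `Event` fields, the `running_score` helper, and the sample rows are hypothetical and chosen only for illustration; real feeds use their own column names.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One play-by-play row (hypothetical, simplified fields)."""
    period: int
    clock: str       # time remaining in the period, "MM:SS"
    team: str        # "home" or "away"
    kind: str        # e.g. "made_shot", "missed_shot", "rebound"
    points: int = 0  # points scored on this event


def running_score(events):
    """Replay events in order and return the (home, away) score after each one."""
    home = away = 0
    timeline = []
    for ev in events:
        if ev.team == "home":
            home += ev.points
        else:
            away += ev.points
        timeline.append((home, away))
    return timeline


# Invented sample data, not taken from a real game
sample_events = [
    Event(1, "11:42", "home", "made_shot", 2),
    Event(1, "11:20", "away", "missed_shot"),
    Event(1, "11:18", "home", "rebound"),
    Event(1, "11:05", "home", "made_shot", 3),
    Event(1, "10:47", "away", "made_shot", 2),
]

for ev, (home, away) in zip(sample_events, running_score(sample_events)):
    print(f"Q{ev.period} {ev.clock} {ev.team:<4} {ev.kind:<11} {home}-{away}")
```

Keeping the score as a derived value rather than a stored one mirrors how box score totals can, in principle, be rebuilt from the event log.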
**What's Included:**

- Event types (shot, rebound, turnover, foul, substitution)
- Event timestamps (game clock, period)
- Players involved in each event
- Shot locations and descriptions
- Score after each event

**Python Example - Play-by-Play Analysis:**

```python
from nba_api.stats.endpoints import playbyplayv2
import pandas as pd

# Get play-by-play data
game_id = '0022100001'
pbp = playbyplayv2.PlayByPlayV2(game_id=game_id)
plays = pbp.get_data_frames()[0]

# Filter for made shots (EVENTMSGTYPE 1)
shots_made = plays[plays['EVENTMSGTYPE'] == 1]

# Analyze shot distribution
print(shots_made['HOMEDESCRIPTION'].value_counts())

# Find scoring runs. SCOREMARGIN is text ('TIE' or a signed number) and is
# only filled on scoring events, so convert it to numbers and forward-fill.
margin = pd.to_numeric(plays['SCOREMARGIN'].replace('TIE', '0'), errors='coerce')
plays['SCORE_DIFF'] = margin.ffill().fillna(0)

# A single event moves the margin by only a few points, so compare the
# margin with its value 10 events earlier (the window size is arbitrary)
plays['SCORE_CHANGE'] = plays['SCORE_DIFF'].diff(periods=10)

# Identify momentum swings
momentum_swings = plays[plays['SCORE_CHANGE'].abs() >= 5]
print(momentum_swings[['PCTIMESTRING', 'HOMEDESCRIPTION',
                       'VISITORDESCRIPTION', 'SCORE_CHANGE']])
```

**R Example - Event Sequence Analysis:**

```r
library(nbastatR)
library(dplyr)
library(stringr)  # provides str_detect()

# Get play-by-play data
pbp_data <- play_by_play_v2(game_ids = "0022100001")

# Analyze shot types
shot_analysis <- pbp_data %>%
  filter(str_detect(tolower(descriptionPlayHome), "shot|miss")) %>%
  mutate(
    shot_type = case_when(
      str_detect(descriptionPlayHome, "3PT") ~ "Three Pointer",
      str_detect(descriptionPlayHome, "Layup|Dunk") ~ "At Rim",
      TRUE ~ "Mid Range"
    ),
    made = !str_detect(descriptionPlayHome, "MISS")
  ) %>%
  group_by(shot_type) %>%
  summarise(
    attempts = n(),
    makes = sum(made),
    fg_pct = mean(made) * 100
  )

print(shot_analysis)
```

### 3. Player Tracking Data

Tracking data, originally captured by SportVU cameras (later replaced by other optical tracking systems), records player movements, speeds, and spatial data.
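As a rough illustration of how speed and distance figures can be derived from positional samples, the sketch below works from a short list of hypothetical `(seconds, x_feet, y_feet)` readings. The sample values and the helper names `distance_traveled` and `average_speed_mph` are assumptions made for this example, not part of any NBA feed.

```python
import math

# Hypothetical positional samples: (seconds, x_feet, y_feet)
samples = [
    (0.0, 0.0, 0.0),
    (1.0, 3.0, 4.0),
    (2.0, 6.0, 8.0),
    (3.0, 6.0, 18.0),
]


def distance_traveled(points):
    """Sum the straight-line distance between consecutive samples (feet)."""
    total = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def average_speed_mph(points):
    """Average speed over the whole sample window, in miles per hour."""
    elapsed = points[-1][0] - points[0][0]
    if elapsed <= 0:
        return 0.0
    feet_per_second = distance_traveled(points) / elapsed
    return feet_per_second * 3600 / 5280  # 5280 feet per mile


print(f"Distance: {distance_traveled(samples):.1f} ft")
print(f"Average speed: {average_speed_mph(samples):.2f} mph")
```

Published per-game distance and speed figures are aggregates of this kind of calculation over many more samples per second.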
**What's Included:**

- Player speed and distance traveled
- Touch time and dribbles
- Catch-and-shoot vs pull-up shooting
- Defender distance on shots
- Rebounding tracking (positioning, hustle)

**Python Example - Tracking Data:**

```python
# Assumes your nba_api version exposes this shot-log endpoint and that it
# returns the columns used below; check the library's endpoint list if the
# import fails
from nba_api.stats.endpoints import playerdashptshotlog
import pandas as pd
import matplotlib.pyplot as plt

# Get shot tracking data for a player
player_id = 2544  # LeBron James
shot_log = playerdashptshotlog.PlayerDashPtShotLog(
    player_id=player_id,
    season='2023-24',
    season_type_all_star='Regular Season'
)
shots = shot_log.get_data_frames()[0]

# Analyze shots by defender distance
shots['CLOSE_DEF_DIST_BUCKET'] = pd.cut(
    shots['CLOSE_DEF_DIST'],
    bins=[0, 2, 4, 6, float('inf')],
    labels=['0-2 ft', '2-4 ft', '4-6 ft', '6+ ft'],
    include_lowest=True  # keep shots with a defender distance of exactly 0
)

# FGM is 1 for a make and 0 for a miss, so sum = makes and count = attempts
defense_impact = shots.groupby('CLOSE_DEF_DIST_BUCKET', observed=False).agg(
    makes=('FGM', 'sum'),
    attempts=('FGM', 'count')
)
defense_impact['FG_PCT'] = defense_impact['makes'] / defense_impact['attempts']
print(defense_impact)

# Visualize FG% by defender distance
plt.figure(figsize=(10, 6))
defense_impact['FG_PCT'].plot(kind='bar')
plt.title('FG% by Defender Distance')
plt.xlabel('Defender Distance')
plt.ylabel('Field Goal Percentage')
plt.show()
```

**R Example - Speed and Distance Analysis:**

```r
library(nbastatR)
library(dplyr)  # provides %>% and arrange()
library(ggplot2)

# Get player speed and distance data
speed_data <- players_tracking(
  seasons = 2023,
  measures = "SpeedDistance"
)

# Plot the top players by distance
top_distance <- speed_data %>%
  arrange(desc(distMiles)) %>%
  head(20) %>%
  ggplot(aes(x = reorder(namePlayer, distMiles), y = distMiles)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 20 Players by Distance Traveled",
    x = "Player",
    y = "Miles per Game"
  )

print(top_distance)
```

## Data Schemas

### Box Score Schema

```python
# Typical box score data structure
box_score_schema = {
    'game_id': 'str',            # Unique game identifier (e.g., '0022100001')
    'game_date': 'datetime',     # Date of the game
    'team_id': 'int',            # Team identifier
    'team_abbreviation': 'str',  # Team code (e.g., 'LAL', 'GSW')
    'player_id': 'int',          # Player identifier
    'player_name': 'str',        # Player full name
    'start_position': 'str',     # Starting position or bench
    'minutes': 'float',          # Minutes played
    'fgm': 'int',                # Field goals made
    'fga': 'int',                # Field goals attempted
    'fg_pct': 'float',           # Field goal percentage
    'fg3m': 'int',               # Three-pointers made
    'fg3a': 'int',               # Three-pointers attempted
    'fg3_pct': 'float',          # Three-point percentage
    'ftm': 'int',                # Free throws made
    'fta': 'int',                # Free throws attempted
    'ft_pct': 'float',           # Free throw percentage
    'oreb': 'int',               # Offensive rebounds
    'dreb': 'int',               # Defensive rebounds
    'reb': 'int',                # Total rebounds
    'ast': 'int',                # Assists
    'stl': 'int',                # Steals
    'blk': 'int',                # Blocks
    'tov': 'int',                # Turnovers
    'pf': 'int',                 # Personal fouls
    'pts': 'int',                # Points
    'plus_minus': 'int'          # Plus/minus rating
}
```

### Play-by-Play Schema

```python
# Play-by-play data structure
pbp_schema = {
    'game_id': 'str',                # Game identifier
    'event_num': 'int',              # Sequential event number
    'event_msg_type': 'int',         # Event type code (1=made shot, 2=miss, etc.)
    'event_msg_action_type': 'int',  # Detailed action type
    'period': 'int',                 # Quarter/period number
    'wc_time_string': 'str',         # Wall clock time
    'pc_time_string': 'str',         # Period clock time (MM:SS)
    'home_description': 'str',       # Description for home team event
    'neutral_description': 'str',    # Neutral event description
    'visitor_description': 'str',    # Description for visitor team event
    'score': 'str',                  # Current score (e.g., '45-42')
    'score_margin': 'str',           # Point differential
    'person1_type': 'int',           # Primary person type
    'player1_id': 'int',             # Primary player ID
    'player1_name': 'str',           # Primary player name
    'player1_team_id': 'int',        # Primary player's team
    'person2_type': 'int',           # Secondary person type
    'player2_id': 'int',             # Secondary player ID (assist, fouled, etc.)
    'player2_name': 'str',           # Secondary player name
    'player2_team_id': 'int',        # Secondary player's team
    'person3_type': 'int',           # Tertiary person type
    'player3_id': 'int',             # Tertiary player ID (blocker, etc.)
    'player3_name': 'str',           # Tertiary player name
    'player3_team_id': 'int'         # Tertiary player's team
}

# Event type codes
event_types = {
    1: 'Made Shot',
    2: 'Missed Shot',
    3: 'Free Throw',
    4: 'Rebound',
    5: 'Turnover',
    6: 'Foul',
    7: 'Violation',
    8: 'Substitution',
    9: 'Timeout',
    10: 'Jump Ball',
    12: 'Start Period',
    13: 'End Period'
}
```

### Tracking Data Schema

```python
# Player tracking data structure
tracking_schema = {
    'game_id': 'str',                     # Game identifier
    'player_id': 'int',                   # Player identifier
    'team_id': 'int',                     # Team identifier
    'shot_number': 'int',                 # Shot sequence number
    'period': 'int',                      # Quarter/period
    'game_clock': 'str',                  # Time remaining in period
    'shot_clock': 'float',                # Shot clock time
    'dribbles': 'int',                    # Number of dribbles before shot
    'touch_time': 'float',                # Time ball was in possession (seconds)
    'shot_dist': 'float',                 # Distance from basket (feet)
    'pts_type': 'int',                    # Point value (2 or 3)
    'shot_result': 'str',                 # 'Made' or 'Missed'
    'closest_defender': 'str',            # Name of closest defender
    'closest_defender_player_id': 'int',  # Defender ID
    'close_def_dist': 'float',            # Defender distance (feet)
    'fgm': 'int',                         # Field goal made (0 or 1)
    'pts': 'int',                         # Points scored
    'player_name': 'str',                 # Player name
    'player_last_team_id': 'int'          # Current team ID
}
```

## Player and Game IDs

### Player ID System

The NBA uses unique numeric IDs to identify players across all datasets.
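Because IDs are unique while names are not guaranteed to be, it can help to index player records by ID before joining datasets. The sketch below assumes records shaped as dictionaries with `id`, `full_name`, and `is_active` keys. The two "Sam Example" rows and their IDs are hypothetical, while the LeBron James and Stephen Curry IDs match the ones used elsewhere in this article.

```python
from collections import defaultdict

# Illustrative records; the key names are an assumption about the record shape
players = [
    {"id": 2544, "full_name": "LeBron James", "is_active": True},
    {"id": 201939, "full_name": "Stephen Curry", "is_active": True},
    # Hypothetical entries: two different players who share a name
    {"id": 900001, "full_name": "Sam Example", "is_active": False},
    {"id": 900002, "full_name": "Sam Example", "is_active": True},
]

# ID -> record lookup: each ID maps to exactly one player
by_id = {p["id"]: p for p in players}

# Name -> list of IDs: a name may map to several players
ids_by_name = defaultdict(list)
for p in players:
    ids_by_name[p["full_name"]].append(p["id"])

# Names that would be ambiguous as a join key
ambiguous = {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}

print(by_id[2544]["full_name"])
print(ambiguous)
```

Joining on the numeric ID sidesteps the ambiguity entirely, which is why the merge examples in this article key on ID columns rather than names.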
**Finding Player IDs:**

```python
from nba_api.stats.static import players
import pandas as pd

# Get all active players
all_players = players.get_active_players()
player_df = pd.DataFrame(all_players)

# Search for specific player
lebron = players.find_players_by_full_name('LeBron James')
print(f"LeBron James ID: {lebron[0]['id']}")  # 2544

# Search by partial name
curry_players = [p for p in all_players if 'Curry' in p['full_name']]
print(curry_players)

# Get all players, including historical (inactive) ones
all_time_players = players.get_players()
historical_df = pd.DataFrame(all_time_players)

# Find player by ID
player_id = 2544
player_info = [p for p in all_time_players if p['id'] == player_id]
print(player_info)
```

**R Example - Player Lookup:**

```r
library(nbastatR)
library(dplyr)  # provides %>% and filter()

# Get all players
all_players <- nba_players()

# Search for specific player
lebron <- all_players %>%
  filter(namePlayer == "LeBron James")
print(paste("LeBron James ID:", lebron$idPlayer))

# Search by team
lakers_players <- all_players %>%
  filter(slugTeam == "LAL", isActive == TRUE)
print(lakers_players[c("namePlayer", "idPlayer")])

# Player ID mapping dictionary
player_dict <- setNames(all_players$idPlayer, all_players$namePlayer)
```

### Game ID System

Game IDs follow a specific format: `[season_type][season_year][game_number]`

**Game ID Format:**

- Positions 1-3: Season type (001=preseason, 002=regular season, 003=all-star, 004=playoffs)
- Positions 4-5: Season year (last 2 digits of the year the season starts)
- Positions 6-10: Game number (00001-01230 for regular season)

**Examples:**

- `0022100001` = First regular season game of the 2021-22 season
- `0042200401` = Playoff game from the 2022-23 season

**Finding Game IDs:**

```python
from nba_api.stats.endpoints import leaguegamefinder
import pandas as pd

# Find games for a specific team
game_finder = leaguegamefinder.LeagueGameFinder(
    team_id_nullable='1610612747',  # Lakers
    season_nullable='2023-24',
    season_type_nullable='Regular Season'
)
games = game_finder.get_data_frames()[0]
print(games[['GAME_ID', 'GAME_DATE', 'MATCHUP', 'WL']])

# Find games by date range (parse the dates first so the comparison
# does not depend on the string format)
games['GAME_DATE'] = pd.to_datetime(games['GAME_DATE'])
games_by_date = games[
    (games['GAME_DATE'] >= '2024-01-01') &
    (games['GAME_DATE'] <= '2024-01-31')
]

# Extract game ID components
games['GAME_NUMBER'] = games['GAME_ID'].str[-5:].astype(int)
games['SEASON_TYPE'] = games['GAME_ID'].str[:3]
```

**R Example - Game Finder:**

```r
library(nbastatR)
library(dplyr)

# Get games for a season
games_2023 <- game_logs(
  seasons = 2023,
  result_types = "team"
)

# Filter by team
lakers_games <- games_2023 %>%
  filter(slugTeam == "LAL") %>%
  select(idGame, dateGame, slugMatchup, outcomeGame)

print(head(lakers_games))

# Parse game ID components
lakers_games <- lakers_games %>%
  mutate(
    season_type = substr(idGame, 1, 3),
    season_year = substr(idGame, 4, 5),
    game_number = substr(idGame, 6, 10)
  )
```

### Team ID Reference

```python
# NBA team IDs
team_ids = {
    1610612737: 'ATL',  # Atlanta Hawks
    1610612738: 'BOS',  # Boston Celtics
    1610612751: 'BKN',  # Brooklyn Nets
    1610612766: 'CHA',  # Charlotte Hornets
    1610612741: 'CHI',  # Chicago Bulls
    1610612739: 'CLE',  # Cleveland Cavaliers
    1610612742: 'DAL',  # Dallas Mavericks
    1610612743: 'DEN',  # Denver Nuggets
    1610612765: 'DET',  # Detroit Pistons
    1610612744: 'GSW',  # Golden State Warriors
    1610612745: 'HOU',  # Houston Rockets
    1610612754: 'IND',  # Indiana Pacers
    1610612746: 'LAC',  # LA Clippers
    1610612747: 'LAL',  # Los Angeles Lakers
    1610612763: 'MEM',  # Memphis Grizzlies
    1610612748: 'MIA',  # Miami Heat
    1610612749: 'MIL',  # Milwaukee Bucks
    1610612750: 'MIN',  # Minnesota Timberwolves
    1610612740: 'NOP',  # New Orleans Pelicans
    1610612752: 'NYK',  # New York Knicks
    1610612760: 'OKC',  # Oklahoma City Thunder
    1610612753: 'ORL',  # Orlando Magic
    1610612755: 'PHI',  # Philadelphia 76ers
    1610612756: 'PHX',  # Phoenix Suns
    1610612757: 'POR',  # Portland Trail Blazers
    1610612758: 'SAC',  # Sacramento Kings
    1610612759: 'SAS',  # San Antonio Spurs
    1610612761: 'TOR',  # Toronto Raptors
    1610612762: 'UTA',  # Utah Jazz
    1610612764: 'WAS'   # Washington Wizards
}
```

## Joining Data Sources

### Combining Box Score and Play-by-Play Data

**Python Example:**

```python
from nba_api.stats.endpoints import boxscoretraditionalv2, playbyplayv2
import pandas as pd

def minutes_to_float(value):
    """Convert 'MM:SS' text to decimal minutes (NaN when the player did not play)."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str) and ':' in value:
        mins, secs = value.split(':')
        return float(mins) + float(secs) / 60
    return float('nan')

def combine_box_pbp(game_id):
    # Get box score
    boxscore = boxscoretraditionalv2.BoxScoreTraditionalV2(game_id=game_id)
    player_stats = boxscore.get_data_frames()[0]

    # Get play-by-play
    pbp = playbyplayv2.PlayByPlayV2(game_id=game_id)
    plays = pbp.get_data_frames()[0]

    # Count events per player from play-by-play
    player_events = plays.groupby('PLAYER1_ID').size().reset_index(name='total_events')

    # Merge datasets
    combined = player_stats.merge(
        player_events,
        left_on='PLAYER_ID',
        right_on='PLAYER1_ID',
        how='left'
    )

    # Minutes arrive as 'MM:SS' text, so convert before dividing
    combined['MIN'] = combined['MIN'].apply(minutes_to_float)

    # Calculate events per minute
    combined['events_per_minute'] = combined['total_events'] / combined['MIN']

    return combined[['PLAYER_NAME', 'MIN', 'PTS', 'total_events', 'events_per_minute']]

# Example usage
game_data = combine_box_pbp('0022100001')
print(game_data.sort_values('events_per_minute', ascending=False))
```

**R Example:**

```r
library(nbastatR)
library(dplyr)

combine_game_data <- function(game_id) {
  # Get box score
  box <- box_scores(game_ids = game_id, result_types = "player")

  # Get play-by-play
  pbp <- play_by_play_v2(game_ids = game_id)

  # Count player actions in play-by-play
  player_actions <- pbp %>%
    filter(!is.na(idPlayer1)) %>%
    group_by(idPlayer1) %>%
    summarise(total_actions = n())

  # Join datasets
  combined <- box %>%
    left_join(player_actions, by = c("idPlayer" = "idPlayer1")) %>%
    mutate(actions_per_minute = total_actions / minutes)

  return(combined %>%
    select(namePlayer, minutes, pts, total_actions, actions_per_minute))
}

# Example usage
game_analysis <- combine_game_data("0022100001")
print(game_analysis %>% arrange(desc(actions_per_minute)))
```

### Multi-Game Player Analysis

**Python Example - Season-Long Analysis:**

```python
from nba_api.stats.endpoints import playergamelog, playerdashboardbyyearoveryear
import pandas as pd
import numpy as np

def analyze_player_season(player_id, season='2023-24'):
    # Get game log
    gamelog = playergamelog.PlayerGameLog(
        player_id=player_id,
        season=season
    )
    games = gamelog.get_data_frames()[0]

    # The game log typically lists the most recent game first; reverse it
    # so the rolling averages run forward in time
    games = games.iloc[::-1].reset_index(drop=True)

    # Year-over-year dashboard splits (second frame); fetched for reference
    # and not used in the summary below
    dashboard = playerdashboardbyyearoveryear.PlayerDashboardByYearOverYear(
        player_id=player_id,
        season=season
    )
    by_year = dashboard.get_data_frames()[1]

    # Calculate rolling averages
    games['PTS_MA5'] = games['PTS'].rolling(window=5).mean()
    games['AST_MA5'] = games['AST'].rolling(window=5).mean()
    games['REB_MA5'] = games['REB'].rolling(window=5).mean()

    # Identify hot/cold streaks (games before the first full 5-game window
    # have no average and are counted as 'Cold')
    games['SCORING_TREND'] = np.where(
        games['PTS'] > games['PTS_MA5'],
        'Hot',
        'Cold'
    )

    # Build the season summary
    summary = {
        'player_id': player_id,
        'games_played': len(games),
        'ppg': games['PTS'].mean(),
        'rpg': games['REB'].mean(),
        'apg': games['AST'].mean(),
        # Season FG% is total makes over total attempts, not the mean of
        # per-game percentages
        'fg_pct': games['FGM'].sum() / games['FGA'].sum(),
        'hot_games': (games['SCORING_TREND'] == 'Hot').sum(),
        'cold_games': (games['SCORING_TREND'] == 'Cold').sum()
    }

    return games, summary

# Analyze multiple players
player_ids = [2544, 201939, 201142]  # LeBron, Curry, Durant
results = []

for pid in player_ids:
    games, summary = analyze_player_season(pid)
    results.append(summary)

comparison = pd.DataFrame(results)
print(comparison)
```

**R Example - Multi-Player Comparison:**

```r
library(nbastatR)
library(dplyr)
library(tidyr)
library(purrr)  # provides map_dfr()

analyze_multiple_players <- function(player_ids, season = 2023) {
  # Get game logs for all players
  all_games <- map_dfr(player_ids, function(pid) {
    games <- game_logs(
      seasons = season,
      result_types = "player",
      player_ids = pid
    )
    return(games)
  })

  # Calculate per-game averages
  player_summary <- all_games %>%
    group_by(idPlayer, namePlayer) %>%
    summarise(
      games = n(),
      ppg = mean(pts, na.rm = TRUE),
      rpg = mean(treb, na.rm = TRUE),
      apg = mean(ast, na.rm = TRUE),
      fg_pct = mean(pctFG, na.rm = TRUE),
      fg3_pct = mean(pctFG3, na.rm = TRUE),
      efficiency = mean((pts + treb + ast + stl + blk -
                         (fga - fgm) - (fta - ftm) - tov), na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(ppg))

  return(player_summary)
}

# Compare players
player_comparison <- analyze_multiple_players(c(2544, 201939, 201142))
print(player_comparison)
```

### Joining Shot Tracking with Box Scores

**Python Example:**

```python
# Assumes your nba_api version exposes the shot-log endpoint (see the
# tracking example above)
from nba_api.stats.endpoints import playerdashptshotlog, playergamelog
import pandas as pd

def analyze_shooting_context(player_id, season='2023-24'):
    # Get shot-level data
    shot_log = playerdashptshotlog.PlayerDashPtShotLog(
        player_id=player_id,
        season=season,
        season_type_all_star='Regular Season'
    )
    shots = shot_log.get_data_frames()[0]

    # Aggregate by game. FGM is 0/1 per shot, so count = attempts. The
    # TRACKED_ prefix avoids clashing with box score column names.
    game_shooting = shots.groupby('GAME_ID').agg(
        TRACKED_FGM=('FGM', 'sum'),
        TRACKED_FGA=('FGM', 'count'),
        CLOSE_DEF_DIST=('CLOSE_DEF_DIST', 'mean'),
        SHOT_DIST=('SHOT_DIST', 'mean'),
        DRIBBLES=('DRIBBLES', 'mean'),
        TOUCH_TIME=('TOUCH_TIME', 'mean')
    ).reset_index()

    game_shooting['TRACKED_FG_PCT'] = (
        game_shooting['TRACKED_FGM'] / game_shooting['TRACKED_FGA']
    )

    # Get box scores for context
    gamelog = playergamelog.PlayerGameLog(player_id=player_id, season=season)
    box_scores = gamelog.get_data_frames()[0]

    # The game log may label its key 'Game_ID'; normalise it for the merge
    box_scores = box_scores.rename(columns={'Game_ID': 'GAME_ID'})

    # Merge datasets
    combined = box_scores.merge(
        game_shooting,
        on='GAME_ID',
        how='inner'
    )

    # Analyze relationship between context and performance
    analysis = combined[[
        'GAME_DATE', 'MATCHUP', 'PTS', 'FG_PCT',
        'CLOSE_DEF_DIST', 'SHOT_DIST', 'DRIBBLES', 'TOUCH_TIME'
    ]]

    # Correlation analysis
    correlations = analysis[[
        'PTS', 'FG_PCT', 'CLOSE_DEF_DIST',
        'SHOT_DIST', 'DRIBBLES', 'TOUCH_TIME'
    ]].corr()

    return analysis, correlations

# Example usage
shooting_analysis, correlations = analyze_shooting_context(2544)
print("Shooting Context Correlations:")
print(correlations['FG_PCT'].sort_values(ascending=False))
```

### Creating a Master Dataset

**Python Example - Comprehensive Data Pipeline:**

```python
import time

import pandas as pd
from nba_api.stats.endpoints import (
    leaguegamefinder,
    boxscoretraditionalv2,
    playbyplayv2,
    boxscoreadvancedv2
)


def minutes_to_float(value):
    """Convert 'MM:SS' text to decimal minutes (NaN when the player did not play)."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str) and ':' in value:
        mins, secs = value.split(':')
        return float(mins) + float(secs) / 60
    return float('nan')


class NBADataIntegrator:
    def __init__(self, season='2023-24'):
        self.season = season
        self.games_df = None
        self.players_df = None
        self.master_df = None

    def fetch_all_games(self, team_id=None):
        """Fetch all games for a season"""
        finder = leaguegamefinder.LeagueGameFinder(
            season_nullable=self.season,
            team_id_nullable=team_id
        )
        self.games_df = finder.get_data_frames()[0]
        return self.games_df

    def fetch_game_details(self, game_id):
        """Fetch detailed data for a single game"""
        # Traditional box score
        trad_box = boxscoretraditionalv2.BoxScoreTraditionalV2(game_id=game_id)
        trad_stats = trad_box.get_data_frames()[0]

        # This box score reports turnovers as 'TO' and minutes as 'MM:SS'
        # text; normalise both so the aggregation below works
        trad_stats = trad_stats.rename(columns={'TO': 'TOV'})
        trad_stats['MIN'] = trad_stats['MIN'].apply(minutes_to_float)

        # Advanced box score
        adv_box = boxscoreadvancedv2.BoxScoreAdvancedV2(game_id=game_id)
        adv_stats = adv_box.get_data_frames()[0]

        # Merge traditional and advanced
        combined = trad_stats.merge(
            adv_stats[['PLAYER_ID', 'OFF_RATING', 'DEF_RATING', 'NET_RATING',
                       'AST_PCT', 'REB_PCT', 'TS_PCT', 'USG_PCT']],
            on='PLAYER_ID',
            how='left'
        )

        return combined

    def build_master_dataset(self, game_ids):
        """Build comprehensive dataset from multiple games"""
        all_game_data = []

        for game_id in game_ids:
            try:
                game_data = self.fetch_game_details(game_id)
                all_game_data.append(game_data)
            except Exception as e:
                print(f"Error fetching {game_id}: {e}")
            time.sleep(0.6)  # pause between games to respect rate limits

        if not all_game_data:
            raise ValueError("No game data could be fetched")

        self.master_df = pd.concat(all_game_data, ignore_index=True)
        return self.master_df

    def aggregate_player_stats(self):
        """Aggregate statistics by player"""
        if self.master_df is None:
            raise ValueError("Master dataset not built yet")

        player_agg = self.master_df.groupby('PLAYER_ID').agg({
            'PLAYER_NAME': 'first',
            'TEAM_ID': 'first',
            'TEAM_ABBREVIATION': 'first',
            'MIN': 'sum',
            'PTS': 'sum',
            'FGM': 'sum',
            'FGA': 'sum',
            'FG3M': 'sum',
            'FG3A': 'sum',
            'FTM': 'sum',
            'FTA': 'sum',
            'REB': 'sum',
            'AST': 'sum',
            'STL': 'sum',
            'BLK': 'sum',
            'TOV': 'sum',
            'OFF_RATING': 'mean',
            'DEF_RATING': 'mean',
            'NET_RATING': 'mean',
            'TS_PCT': 'mean',
            'USG_PCT': 'mean'
        }).reset_index()

        # Calculate per-game averages. count() skips NaN, so games in
        # which the player did not play are excluded from games played.
        games_played = self.master_df.groupby('PLAYER_ID')['MIN'].count()
        player_agg['GP'] = player_agg['PLAYER_ID'].map(games_played)
        player_agg['MPG'] = player_agg['MIN'] / player_agg['GP']
        player_agg['PPG'] = player_agg['PTS'] / player_agg['GP']
        player_agg['RPG'] = player_agg['REB'] / player_agg['GP']
        player_agg['APG'] = player_agg['AST'] / player_agg['GP']

        return player_agg

# Usage example
integrator = NBADataIntegrator(season='2023-24')
games = integrator.fetch_all_games(team_id='1610612747')  # Lakers
game_ids = games['GAME_ID'].unique()[:10]  # Limit to 10 games to keep the example fast

master_data = integrator.build_master_dataset(game_ids)
player_stats = integrator.aggregate_player_stats()

print(player_stats.sort_values('PPG', ascending=False))
```

## Best Practices

### 1. Data Quality Checks

```python
def validate_data(df):
    """Validate NBA data quality (expects one row per player per game)"""
    issues = []

    # Check for missing values
    missing = df.isnull().sum()
    if missing.any():
        issues.append(f"Missing values: {missing[missing > 0].to_dict()}")

    # Check for duplicate records
    if df.duplicated().any():
        issues.append(f"Duplicate records: {df.duplicated().sum()}")

    # Check for invalid ranges
    if 'PTS' in df.columns:
        if (df['PTS'] < 0).any() or (df['PTS'] > 100).any():
            issues.append("Invalid point values detected")

    if 'FG_PCT' in df.columns:
        if (df['FG_PCT'] < 0).any() or (df['FG_PCT'] > 1).any():
            issues.append("Invalid FG% values detected")

    return issues

# Example usage: validate the per-game rows, since season totals can
# legitimately exceed the single-game point range checked above
validation_results = validate_data(master_data)
if validation_results:
    print("Data quality issues found:")
    for issue in validation_results:
        print(f"  - {issue}")
else:
    print("Data validation passed!")
```

### 2. Handling API Rate Limits

```python
import time
from functools import wraps

from nba_api.stats.endpoints import boxscoretraditionalv2

def rate_limit(delay=0.6):
    """Decorator to add delay between API calls"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(delay)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(delay=0.6)
def fetch_game_data(game_id):
    """Fetch game data with rate limiting"""
    boxscore = boxscoretraditionalv2.BoxScoreTraditionalV2(game_id=game_id)
    return boxscore.get_data_frames()[0]
```

### 3. Caching Results

```python
import pickle
import os
from datetime import datetime, timedelta

def cache_data(filename, data):
    """Cache data to disk with a timestamp (expiry is checked on load)"""
    cache_file = f"cache/{filename}.pkl"
    os.makedirs('cache', exist_ok=True)

    with open(cache_file, 'wb') as f:
        pickle.dump({'data': data, 'timestamp': datetime.now()}, f)

def load_cached_data(filename, expire_days=1):
    """Load cached data if not expired"""
    cache_file = f"cache/{filename}.pkl"

    if not os.path.exists(cache_file):
        return None

    with open(cache_file, 'rb') as f:
        cached = pickle.load(f)

    age = datetime.now() - cached['timestamp']
    if age > timedelta(days=expire_days):
        return None

    return cached['data']

# Usage
cached = load_cached_data('player_stats_2023')
if cached is None:
    print("Fetching fresh data...")
    # fetch_player_stats() is a placeholder for any of the fetch
    # functions defined earlier in this article
    data = fetch_player_stats()
    cache_data('player_stats_2023', data)
else:
    print("Using cached data")
    data = cached
```

## Summary

Understanding NBA data requires familiarity with:

- Different data types (box scores, play-by-play, tracking)
- Data schemas and structures
- ID systems for players, teams, and games
- Techniques for joining and integrating multiple data sources

With this knowledge, you can build comprehensive basketball analytics pipelines that combine traditional statistics with advanced tracking data.
