Setting Up Your Analytics Environment

Beginner | 10 min read | Nov 27, 2025

Setting Up Your Basketball Analytics Environment

Before diving into NBA and basketball analytics, you need to establish a robust development environment tailored for basketball data analysis. This comprehensive guide walks you through installing and configuring essential tools, libraries, and workflows for both Python and R-based basketball analytics, with specific focus on NBA data sources, basketball-specific packages, and project structures optimized for basketball research.

Unlike baseball analytics, which relies heavily on pybaseball and baseballr, basketball analytics uses a different ecosystem: primarily nba_api for Python and hoopR for R. These packages wrap the stats endpoints behind NBA.com (publicly reachable, but not officially documented), giving access to box score statistics, shot chart data, player tracking metrics from Second Spectrum, and comprehensive historical data. Setting up your environment correctly ensures you can efficiently fetch game data, analyze shot patterns, calculate advanced metrics like True Shooting Percentage and Box Plus/Minus, and create visualizations like shot charts and court diagrams.

Installing Python with Anaconda

Anaconda remains the recommended Python distribution for basketball analytics, bundling scientific computing packages and simplifying package management for data-intensive basketball research.

Download and Install Anaconda

  1. Visit anaconda.com/download
  2. Download the latest Python 3.x version for your operating system (Windows, macOS, or Linux)
  3. Run the installer and follow the installation prompts
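  4. On Windows, leave "Add Anaconda to PATH" unchecked unless you have a specific need for it (the installer itself advises against it); use Anaconda Prompt, or run conda init in your preferred shell, for terminal access
  5. Complete installation (requires approximately 3 GB of disk space)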

Verify Your Installation

Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) and verify:

# Check Python version (should be 3.10 or later)
python --version

# Check conda version
conda --version

# View installed packages
conda list

# Update conda to latest version
conda update conda
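
The same version check can be run from inside Python, which helps when several interpreters are installed and it is unclear which one the terminal picked up. This is a minimal sketch: the helper name meets_minimum is made up for illustration, and the 3.10 floor is simply the one suggested in the commands above.

```python
import sys

MIN_VERSION = (3, 10)  # floor suggested in the verification commands above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
    # The executable path shows which installation is actually running
    print(f"Interpreter path: {sys.executable}")
```

If the interpreter path does not point into your Anaconda installation, the terminal is probably using a different Python.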

Installing R and RStudio

R provides powerful statistical tools for basketball analytics, with excellent packages for data manipulation, visualization, and accessing NBA data through hoopR.

Install R

  1. Visit cran.r-project.org
  2. Download R 4.3 or later for your operating system
  3. Run the installer with default settings
  4. Verify installation: R --version

Install RStudio

  1. Visit posit.co/download/rstudio-desktop/
  2. Download RStudio Desktop (free version)
  3. Install and launch RStudio
  4. Verify R is detected in the Console pane

Setting Up Virtual Environments

Virtual environments isolate project dependencies, preventing conflicts between basketball analytics projects and ensuring reproducibility.

Python Virtual Environment for Basketball Analytics

Create a dedicated environment for basketball projects:

# Create environment named 'basketball' with Python 3.11
conda create -n basketball python=3.11

# Activate the environment
conda activate basketball

# Verify activation
conda env list

# The active environment will have an asterisk (*)

# Deactivate when finished
conda deactivate

Alternative: Standard venv Module

If you prefer Python's built-in venv:

# Create virtual environment
python -m venv basketball_env

# Activate on Windows
basketball_env\Scripts\activate

# Activate on Mac/Linux
source basketball_env/bin/activate

# Upgrade pip
pip install --upgrade pip
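
To confirm from Python which kind of environment is active, a small check along these lines can help. It is a sketch rather than a definitive test: the function name active_environment is hypothetical, and it relies on two conventions, namely that a venv makes sys.prefix differ from sys.base_prefix and that conda activate sets the CONDA_DEFAULT_ENV variable.

```python
import os
import sys


def active_environment():
    """Describe the environment the current interpreter is running in."""
    # Inside a venv, sys.prefix points at the venv and base_prefix at the base install
    in_venv = sys.prefix != sys.base_prefix
    # `conda activate <name>` exports the environment name in this variable
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {"venv": in_venv, "conda_env": conda_env, "prefix": sys.prefix}


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
```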

R Project Environments with renv

The renv package provides reproducible R environments:

# Install renv
install.packages("renv")

# Initialize renv in your project directory
renv::init()

# Install packages (isolated to this project)
install.packages("hoopR")
install.packages("tidyverse")

# Save the state of your project library
renv::snapshot()

# Restore packages on another machine or after updating
renv::restore()

Installing Python Basketball Analytics Packages

With your environment activated, install essential Python packages for NBA and basketball analytics.

Core Package Installation

# Activate your basketball environment first
conda activate basketball

# Install nba_api (primary NBA data library)
pip install nba_api

# Install data manipulation libraries
pip install pandas numpy scipy

# Install visualization libraries
pip install matplotlib seaborn plotly

# Install court plotting library
pip install mplbasketball

# Install additional analysis tools
pip install jupyter scikit-learn statsmodels

# Install database connectors
pip install sqlalchemy psycopg2-binary

# Install optional advanced libraries
pip install beautifulsoup4 requests  # For web scraping
pip install opencv-python  # For video analysis (optional)

Verify Python Installation with nba_api

Test your installation with this verification script:

import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nba_api.stats.static import players, teams
from nba_api.stats.endpoints import leaguedashplayerstats

print(f"Python Version: {sys.version}")
print(f"Pandas Version: {pd.__version__}")
print(f"NumPy Version: {np.__version__}")

# Test nba_api functionality
try:
    # Find LeBron James
    lebron = players.find_players_by_full_name("LeBron James")
    print(f"\nFound player: {lebron[0]['full_name']}")
    print(f"Player ID: {lebron[0]['id']}")

    # Get current season league leaders
    print("\nFetching 2023-24 season stats...")
    stats = leaguedashplayerstats.LeagueDashPlayerStats(
        season='2023-24',
        per_mode_detailed='PerGame',
        season_type_all_star='Regular Season'
    )

    df = stats.get_data_frames()[0]

    # Filter for qualified players (at least 20 games)
    qualified = df[df['GP'] >= 20].copy()

    print(f"\nSuccessfully loaded {len(qualified)} qualified players")
    print(f"Columns available: {len(df.columns)}")

    # Top 5 scorers
    top_scorers = qualified.nlargest(5, 'PTS')[
        ['PLAYER_NAME', 'TEAM_ABBREVIATION', 'GP', 'PTS', 'REB', 'AST']
    ]
    print("\nTop 5 Scorers (2023-24):")
    print(top_scorers.to_string(index=False))

    # Test visualization setup
    plt.figure(figsize=(10, 6))
    sns.set_style("whitegrid")
    plt.close()
    print("\nVisualization libraries configured correctly!")

except Exception as e:
    print(f"Error: {e}")
    print("Note: NBA API may have rate limits or require internet connection")
else:
    # Report success only when every step above completed without an exception
    print("\n✓ All packages installed and working correctly!")

Save Package Requirements

Document your environment for reproducibility:

# Export pip packages to requirements.txt
pip freeze > requirements.txt

# Or export complete conda environment
conda env export > environment.yml

# Recreate environment from requirements.txt
pip install -r requirements.txt

# Recreate conda environment from YAML
conda env create -f environment.yml
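
After recreating an environment, it can be worth checking that the core packages actually landed. The sketch below uses the standard library's importlib.metadata; the helper name find_missing and the package list are illustrative choices, not part of any of the tools above.

```python
from importlib import metadata


def find_missing(distributions):
    """Return the names from `distributions` that are not installed."""
    missing = []
    for name in distributions:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list; adjust to match your own requirements.txt
    core = ["nba_api", "pandas", "numpy", "matplotlib", "seaborn"]
    missing = find_missing(core)
    print("Missing:", ", ".join(missing) if missing else "none")
```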

Installing R Basketball Analytics Packages

R offers powerful basketball analytics capabilities through hoopR and the tidyverse ecosystem.

Core Package Installation

In RStudio or an R console, install these essential packages:

# Install hoopR (main basketball data package)
install.packages("hoopR")

# Install tidyverse (data manipulation and visualization suite)
install.packages("tidyverse")

# Individual tidyverse components (if preferred)
install.packages(c("dplyr", "ggplot2", "tidyr", "readr",
                   "purrr", "stringr", "lubridate"))

# Install additional data science packages
install.packages(c("data.table", "scales", "janitor"))

# Install statistical modeling packages
install.packages(c("lme4", "mgcv", "broom"))

# Install database connectors
install.packages(c("DBI", "RSQLite", "RPostgres"))

# Install interactive visualization packages
install.packages(c("plotly", "shiny", "DT"))

# Install sports visualization packages
install.packages("ggrepel")  # For better label positioning
install.packages("patchwork")  # For combining plots

Verify R Installation with hoopR

Test your R installation with this verification script:

# Load libraries
library(hoopR)
library(tidyverse)

# Print versions
cat("R Version:", R.version.string, "\n")
cat("hoopR Version:", as.character(packageVersion("hoopR")), "\n")
cat("tidyverse Version:", as.character(packageVersion("tidyverse")), "\n")

# Test hoopR functionality
tryCatch({
    # Get NBA teams
    teams <- nba_teams()
    cat("\nSuccessfully loaded", nrow(teams), "NBA teams\n")

    # Display Lakers and Celtics info
    # (column names may differ between hoopR versions; run names(teams) if this select fails)
    lakers_celtics <- teams %>%
        filter(abbreviation %in% c("LAL", "BOS")) %>%
        select(full_name, abbreviation, city)

    cat("\nSample teams:\n")
    print(lakers_celtics)

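    # Get 2023-24 player stats
    # hoopR returns a named list of data frames; the player table is the
    # LeagueDashPlayerStats element, and its columns may arrive as character
    cat("\nFetching 2023-24 player statistics...\n")
    player_stats <- nba_leaguedashplayerstats(
        season = "2023-24",
        per_mode = "PerGame"
    )$LeagueDashPlayerStats %>%
        mutate(across(c(GP, PTS, REB, AST), as.numeric))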

    cat("Successfully loaded", nrow(player_stats), "player records\n")

    # Top 5 scorers
    top_scorers <- player_stats %>%
        filter(GP >= 20) %>%
        arrange(desc(PTS)) %>%
        head(5) %>%
        select(PLAYER_NAME, TEAM_ABBREVIATION, GP, PTS, REB, AST)

    cat("\nTop 5 Scorers:\n")
    print(top_scorers)

}, error = function(e) {
    cat("Error:", e$message, "\n")
    cat("Note: hoopR requires internet connection and may have API limits\n")
})

# Test data manipulation with sample data
sample_data <- tibble(
    player = c("Player A", "Player B", "Player C"),
    points = c(28.5, 26.3, 24.1),
    ts_pct = c(0.625, 0.598, 0.612),
    per = c(28.3, 25.7, 27.1)
)

cat("\nSample basketball data created:\n")
print(sample_data)

cat("\n✓ Verification script finished; review any errors reported above.\n")

Database Setup for Basketball Analytics

Storing basketball data in a database enables efficient querying of large datasets, particularly when working with play-by-play data or multiple seasons of player statistics.

Option 1: SQLite (Recommended for Beginners)

SQLite is a lightweight, file-based database perfect for personal basketball analytics projects:

Python with SQLite for Basketball Data

import sqlite3
import pandas as pd
from nba_api.stats.endpoints import leaguedashplayerstats

# Create/connect to database
conn = sqlite3.connect('basketball_analytics.db')

# Fetch 2023-24 season data
stats = leaguedashplayerstats.LeagueDashPlayerStats(
    season='2023-24',
    per_mode_detailed='PerGame'
)
data = stats.get_data_frames()[0]

# Store in database
data.to_sql('player_stats_2023_24', conn, if_exists='replace', index=False)

# Query the database - Find elite three-point shooters
query = """
    SELECT PLAYER_NAME, TEAM_ABBREVIATION, GP, PTS, FG3M, FG3_PCT
    FROM player_stats_2023_24
    WHERE FG3A >= 5 AND FG3_PCT >= 0.400 AND GP >= 30
    ORDER BY FG3_PCT DESC
"""
result = pd.read_sql_query(query, conn)
print("Elite Three-Point Shooters:")
print(result)

# Advanced query - Players with 50-40-90 potential
query_elite_shooting = """
    SELECT
        PLAYER_NAME,
        TEAM_ABBREVIATION,
        FG_PCT,
        FG3_PCT,
        FT_PCT,
        PTS
    FROM player_stats_2023_24
    WHERE FG_PCT >= 0.500
      AND FG3_PCT >= 0.400
      AND FT_PCT >= 0.900
      AND GP >= 40
    ORDER BY PTS DESC
"""
elite_shooters = pd.read_sql_query(query_elite_shooting, conn)
print("\n50-40-90 Club Candidates:")
print(elite_shooters)

conn.close()
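
The thresholds in the queries above are written directly into the SQL text. When they need to vary, passing them as query parameters is usually safer than formatting strings. The sketch below runs against an in-memory database so it works offline; the table name, the helper elite_shooters, and the three sample rows are made up for illustration and are not real NBA statistics.

```python
import sqlite3

# Made-up rows for illustration only; these are not real NBA statistics.
SAMPLE_ROWS = [
    ("Player A", "AAA", 60, 6.1, 0.412),
    ("Player B", "BBB", 45, 5.4, 0.385),
    ("Player C", "CCC", 25, 7.0, 0.430),
]


def elite_shooters(conn, min_attempts, min_pct, min_games):
    """Return (name, pct) rows that clear all three thresholds."""
    query = """
        SELECT PLAYER_NAME, FG3_PCT
        FROM player_stats
        WHERE FG3A >= ? AND FG3_PCT >= ? AND GP >= ?
        ORDER BY FG3_PCT DESC
    """
    # The ? placeholders are filled by sqlite3, not by string formatting
    return conn.execute(query, (min_attempts, min_pct, min_games)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_stats "
    "(PLAYER_NAME TEXT, TEAM_ABBREVIATION TEXT, GP INTEGER, FG3A REAL, FG3_PCT REAL)"
)
conn.executemany("INSERT INTO player_stats VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)

result = elite_shooters(conn, min_attempts=5, min_pct=0.400, min_games=30)
print(result)  # only Player A clears all three thresholds
conn.close()
```

pandas accepts the same placeholders through the params argument of read_sql_query.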

R with SQLite for Basketball Data

library(DBI)
library(RSQLite)
library(hoopR)
library(dplyr)

# Create/connect to database
con <- dbConnect(RSQLite::SQLite(), "basketball_analytics.db")

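# Fetch and store player stats
# hoopR returns a named list; take the LeagueDashPlayerStats data frame and
# convert the columns used below to numeric so SQL comparisons are not textual
player_stats <- nba_leaguedashplayerstats(
    season = "2023-24",
    per_mode = "PerGame"
)$LeagueDashPlayerStats %>%
    mutate(across(c(GP, PTS, FGA, FG_PCT), as.numeric))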

dbWriteTable(con, "player_stats_2023_24", player_stats, overwrite = TRUE)

# Query the database - High usage, efficient scorers
query <- "
    SELECT PLAYER_NAME, TEAM_ABBREVIATION, GP, PTS, FGA, FG_PCT
    FROM player_stats_2023_24
    WHERE PTS >= 20 AND FG_PCT >= 0.450 AND GP >= 30
    ORDER BY PTS DESC
    LIMIT 10
"
efficient_scorers <- dbGetQuery(con, query)
print(efficient_scorers)

# Store team data
teams <- nba_teams()
dbWriteTable(con, "nba_teams", teams, overwrite = TRUE)

dbDisconnect(con)

Option 2: PostgreSQL (Production-Ready)

PostgreSQL is ideal for larger basketball analytics projects with multiple seasons and complex queries:

Installation

  1. Download from postgresql.org/download
  2. Install with default settings (remember your password!)
  3. Default port: 5432
  4. Create a database named "basketball_analytics"

Python with PostgreSQL

from sqlalchemy import create_engine
import pandas as pd
from nba_api.stats.endpoints import leaguedashplayerstats

# Create connection string
# Format: postgresql://username:password@localhost:5432/database_name
engine = create_engine('postgresql://postgres:yourpassword@localhost:5432/basketball_analytics')

# Fetch multiple seasons of data
seasons = ['2021-22', '2022-23', '2023-24']

for season in seasons:
    print(f"Fetching {season} data...")
    stats = leaguedashplayerstats.LeagueDashPlayerStats(
        season=season,
        per_mode_detailed='PerGame'
    )
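    df = stats.get_data_frames()[0]
    # Lowercase column names: PostgreSQL folds unquoted identifiers to lowercase,
    # so uppercase column names would otherwise need double quotes in every query
    df.columns = df.columns.str.lower()
    df['season'] = season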

    # Store in PostgreSQL
    table_name = f"player_stats_{season.replace('-', '_')}"
    df.to_sql(table_name, engine, if_exists='replace', index=False)
    print(f"Stored {len(df)} records in {table_name}")

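# Query across all three stored seasons (one SELECT per season table)
query = """
    SELECT PLAYER_NAME, SEASON, PTS, REB, AST
    FROM player_stats_2023_24
    WHERE PLAYER_NAME = 'LeBron James'

    UNION ALL

    SELECT PLAYER_NAME, SEASON, PTS, REB, AST
    FROM player_stats_2022_23
    WHERE PLAYER_NAME = 'LeBron James'

    UNION ALL

    SELECT PLAYER_NAME, SEASON, PTS, REB, AST
    FROM player_stats_2021_22
    WHERE PLAYER_NAME = 'LeBron James'

    ORDER BY SEASON DESC
"""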
lebron_progression = pd.read_sql_query(query, engine)
print(lebron_progression)
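
The connection string above embeds the password in source code, which is easy to commit by accident. One common alternative is to read the settings from environment variables. The sketch below assumes the standard libpq variable names (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE); the function name build_postgres_url and the fallback defaults are illustrative.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a postgresql:// URL from environment variables."""
    user = env.get("PGUSER", "postgres")
    password = env.get("PGPASSWORD", "")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "basketball_analytics")

    # Percent-encode credentials so characters like '@' or '/' stay valid in a URL
    credentials = quote_plus(user)
    if password:
        credentials += ":" + quote_plus(password)
    return f"postgresql://{credentials}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Print only the non-secret part of the URL
    print(build_postgres_url().split("@")[-1])
```

The returned string can be passed to create_engine in place of the hard-coded URL.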

R with PostgreSQL

library(DBI)
library(RPostgres)
library(hoopR)

# Connect to PostgreSQL
con <- dbConnect(
    RPostgres::Postgres(),
    dbname = "basketball_analytics",
    host = "localhost",
    port = 5432,
    user = "postgres",
    password = "yourpassword"
)

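# Fetch and store data
# hoopR returns a named list; take the LeagueDashPlayerStats data frame
player_stats <- nba_leaguedashplayerstats(
    season = "2023-24",
    per_mode = "PerGame"
)$LeagueDashPlayerStats

# Lowercase the column names (PostgreSQL folds unquoted identifiers to
# lowercase) and convert the columns aggregated below to numeric
names(player_stats) <- tolower(names(player_stats))
for (col in c("gp", "pts", "fg3_pct")) {
    player_stats[[col]] <- as.numeric(player_stats[[col]])
}

dbWriteTable(con, "player_stats_2023_24", player_stats, overwrite = TRUE)

# Create index for faster queries
dbExecute(con, "CREATE INDEX idx_player_name ON player_stats_2023_24(PLAYER_NAME)")

# Query with grouping and aggregation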
result <- dbGetQuery(con, "
    SELECT
        TEAM_ABBREVIATION,
        COUNT(*) as num_players,
        AVG(PTS) as avg_points,
        AVG(FG3_PCT) as avg_3pt_pct
    FROM player_stats_2023_24
    WHERE GP >= 20
    GROUP BY TEAM_ABBREVIATION
    ORDER BY avg_points DESC
    LIMIT 10
")

print(result)
dbDisconnect(con)

Installing Jupyter Notebooks

Jupyter notebooks are ideal for exploratory basketball analytics, combining code, visualizations, and narrative explanations.

Installation and Setup

# Activate your basketball environment
conda activate basketball

# Install Jupyter Lab (modern interface)
conda install -c conda-forge jupyterlab

# Or install classic Jupyter Notebook
conda install jupyter

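# Optional: install classic-notebook extensions
# (jupyter_contrib_nbextensions targets Notebook 6.x and earlier; it does not
# work with Notebook 7 or JupyterLab, so skip this step on those versions)
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user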

# Launch Jupyter Lab
jupyter lab

# Or launch classic notebook interface
jupyter notebook

Example Jupyter Notebook Structure for Basketball

# Cell 1: Imports and Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nba_api.stats.endpoints import leaguedashplayerstats, shotchartdetail
from nba_api.stats.static import players

# Configure visualization settings
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Cell 2: Load Data
print("Loading 2023-24 NBA player statistics...")
stats = leaguedashplayerstats.LeagueDashPlayerStats(
    season='2023-24',
    per_mode_detailed='PerGame'
)
df = stats.get_data_frames()[0]
df = df[df['GP'] >= 20]  # Filter for qualified players
print(f"Loaded {len(df)} qualified players")

# Cell 3: Data Exploration
print("Top 10 Scorers:")
top_scorers = df.nlargest(10, 'PTS')[['PLAYER_NAME', 'TEAM_ABBREVIATION', 'PTS', 'FG_PCT']]
display(top_scorers)

# Cell 4: Calculate Advanced Metrics
# Effective Field Goal Percentage
df['eFG_PCT'] = (df['FGM'] + 0.5 * df['FG3M']) / df['FGA']

# True Shooting Percentage
df['TS_PCT'] = df['PTS'] / (2 * (df['FGA'] + 0.44 * df['FTA']))

# Cell 5: Visualization - Scoring Efficiency
plt.figure(figsize=(12, 8))
plt.scatter(df['PTS'], df['TS_PCT'], alpha=0.6, s=100)
plt.xlabel('Points Per Game', fontsize=12)
plt.ylabel('True Shooting %', fontsize=12)
plt.title('Scoring Volume vs Efficiency (2023-24 Season)', fontsize=14)
# Reference line: unweighted mean across qualified players (not the league-wide TS%)
plt.axhline(y=df['TS_PCT'].mean(), color='red', linestyle='--',
            label=f'Player Avg: {df["TS_PCT"].mean():.3f}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Cell 6: Statistical Analysis
correlation = df['PTS'].corr(df['TS_PCT'])
print(f"Correlation between PPG and TS%: {correlation:.3f}")
print(f"\nElite scorers (25+ PPG) average TS%: {df[df['PTS'] >= 25]['TS_PCT'].mean():.3f}")
print(f"Mean TS% across qualified players: {df['TS_PCT'].mean():.3f}")
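
To make the two formulas from Cell 4 concrete, here is a dependency-free sketch with one worked example. The stat line is hypothetical (not a real player), and the function names are illustrative.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """eFG% = (FGM + 0.5 * FG3M) / FGA; None when there are no attempts."""
    if fga == 0:
        return None
    return (fgm + 0.5 * fg3m) / fga


def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); None when the denominator is zero."""
    denominator = 2 * (fga + 0.44 * fta)
    if denominator == 0:
        return None
    return pts / denominator


# Hypothetical per-game line: 25.0 PTS on 9.0 FGM / 18.0 FGA, 3.0 FG3M, 5.0 FTA
efg = effective_fg_pct(fgm=9.0, fg3m=3.0, fga=18.0)   # (9 + 1.5) / 18
ts = true_shooting_pct(pts=25.0, fga=18.0, fta=5.0)   # 25 / (2 * 20.2)

print(f"eFG%: {efg:.3f}")  # 0.583
print(f"TS%:  {ts:.3f}")   # 0.619
```

Returning None for zero attempts avoids a division error. The pandas version in Cell 4 produces NaN or inf in that case instead.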

Project Structure for Basketball Analytics

A well-organized project structure is essential for maintainable basketball analytics projects.

Recommended Directory Structure

basketball-analytics-project/
│
├── data/
│   ├── raw/                    # Original data from NBA API
│   │   ├── player_stats/
│   │   ├── shot_charts/
│   │   └── play_by_play/
│   ├── processed/              # Cleaned and transformed data
│   │   ├── aggregated_stats/
│   │   └── player_metrics/
│   └── external/               # Third-party data (Basketball Reference, etc.)
│
├── notebooks/
│   ├── 01-data-collection.ipynb
│   ├── 02-shot-chart-analysis.ipynb
│   ├── 03-player-comparison.ipynb
│   ├── 04-team-performance.ipynb
│   └── 05-predictive-modeling.ipynb
│
├── src/                        # Source code modules
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── nba_api_fetcher.py      # NBA API data collection
│   │   ├── shot_data.py             # Shot chart utilities
│   │   └── play_by_play.py          # Play-by-play processing
│   ├── features/
│   │   ├── __init__.py
│   │   ├── advanced_metrics.py      # TS%, eFG%, PER, etc.
│   │   └── shooting_zones.py        # Shot zone calculations
│   ├── models/
│   │   ├── __init__.py
│   │   ├── player_projections.py
│   │   └── win_probability.py
│   └── visualization/
│       ├── __init__.py
│       ├── shot_charts.py           # Shot chart plotting
│       ├── court_diagrams.py        # Basketball court visualization
│       └── player_dashboards.py
│
├── scripts/                    # Standalone scripts
│   ├── fetch_season_stats.py
│   ├── update_player_data.py
│   └── generate_team_report.py
│
├── tests/                      # Unit tests
│   ├── __init__.py
│   ├── test_data_fetching.py
│   ├── test_metrics.py
│   └── test_shot_charts.py
│
├── reports/                    # Analysis outputs
│   ├── figures/
│   │   ├── shot_charts/
│   │   └── player_comparisons/
│   └── season_analysis_2023_24.md
│
├── config/                     # Configuration files
│   ├── settings.py
│   └── database_config.ini
│
├── docs/                       # Documentation
│   ├── methodology.md
│   └── data_dictionary.md
│
├── .gitignore
├── README.md
├── requirements.txt            # Python dependencies
├── environment.yml             # Conda environment spec
└── setup.py                    # Package installation
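
Creating this tree by hand is tedious, so a short script can scaffold it. The sketch below is one possible approach using pathlib: the function name create_skeleton is made up, and the lists mirror the layout above, covering directories and __init__.py files only.

```python
from pathlib import Path

# Plain directories from the layout above
DIRECTORIES = [
    "data/raw/player_stats", "data/raw/shot_charts", "data/raw/play_by_play",
    "data/processed/aggregated_stats", "data/processed/player_metrics",
    "data/external", "notebooks", "scripts",
    "reports/figures/shot_charts", "reports/figures/player_comparisons",
    "config", "docs",
]

# Directories that also need an __init__.py
PACKAGES = [
    "src", "src/data", "src/features", "src/models", "src/visualization", "tests",
]


def create_skeleton(root, directories=DIRECTORIES, packages=PACKAGES):
    """Create the project tree under `root`; safe to run more than once."""
    root = Path(root)
    for relative in directories:
        (root / relative).mkdir(parents=True, exist_ok=True)
    for relative in packages:
        package_dir = root / relative
        package_dir.mkdir(parents=True, exist_ok=True)
        (package_dir / "__init__.py").touch(exist_ok=True)
    return root
```

Calling create_skeleton("basketball-analytics-project") builds the tree in the current directory. Existing files are left untouched.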

Example setup.py for Basketball Analytics

from setuptools import setup, find_packages

setup(
    name='basketball_analytics',
    version='0.1.0',
    description='NBA and basketball analytics toolkit',
    author='Your Name',
    packages=find_packages(where='src'),
    package_dir={'': 'src'},
    install_requires=[
        'pandas>=2.0.0',
        'numpy>=1.24.0',
        'matplotlib>=3.7.0',
        'seaborn>=0.12.0',
        'nba_api>=1.1.14',
        'scikit-learn>=1.3.0',
        'scipy>=1.11.0',
        'sqlalchemy>=2.0.0',
    ],
    extras_require={
        'dev': [
            'pytest>=7.4.0',
            'jupyter>=1.0.0',
            'black>=23.0.0',
            'flake8>=6.0.0',
        ],
        'viz': [
            'plotly>=5.14.0',
            'mplbasketball>=0.1.0',
        ]
    },
    python_requires='>=3.9',
)

Example config/settings.py

import os
from pathlib import Path

# Project paths
PROJECT_ROOT = Path(__file__).parent.parent
DATA_DIR = PROJECT_ROOT / 'data'
RAW_DATA_DIR = DATA_DIR / 'raw'
PROCESSED_DATA_DIR = DATA_DIR / 'processed'
SHOT_CHART_DIR = RAW_DATA_DIR / 'shot_charts'
REPORTS_DIR = PROJECT_ROOT / 'reports'
FIGURES_DIR = REPORTS_DIR / 'figures'

# Create directories if they don't exist
for directory in [RAW_DATA_DIR, PROCESSED_DATA_DIR, SHOT_CHART_DIR, FIGURES_DIR]:
    directory.mkdir(parents=True, exist_ok=True)

# Database settings
DATABASE_PATH = DATA_DIR / 'basketball_analytics.db'

# API settings
NBA_API_RATE_LIMIT = 0.6  # seconds between requests
CACHE_DIR = DATA_DIR / 'cache'
CACHE_DIR.mkdir(exist_ok=True)

# Analysis settings
CURRENT_SEASON = '2023-24'
MIN_GAMES_PLAYED = 20
MIN_MINUTES_PER_GAME = 15

# Shot chart settings
COURT_WIDTH = 50  # feet
COURT_LENGTH = 94  # feet
THREE_POINT_LINE_DISTANCE = 23.75  # feet (corners: 22 feet)

# Team abbreviations
NBA_TEAMS = [
    'ATL', 'BOS', 'BKN', 'CHA', 'CHI', 'CLE', 'DAL', 'DEN', 'DET', 'GSW',
    'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL', 'MIN', 'NOP', 'NYK',
    'OKC', 'ORL', 'PHI', 'PHX', 'POR', 'SAC', 'SAS', 'TOR', 'UTA', 'WAS'
]
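
The three-point distances in these settings can be put to work when classifying shot locations. The sketch below rests on stated assumptions: coordinates are in feet, measured from the center of the hoop, with x running sideline to sideline and y pointing toward midcourt. Shot chart data is commonly described as reporting LOC_X and LOC_Y in tenths of a foot, so verify the units and convert before using it. The function names are illustrative, and the corner handling is a geometric simplification.

```python
import math

THREE_POINT_ARC_FT = 23.75     # above-the-break distance, from the settings above
THREE_POINT_CORNER_FT = 22.0   # corner distance, from the settings above

# Simplification: treat the straight corner line as ending where it would meet the arc
CORNER_BREAK_Y_FT = math.sqrt(THREE_POINT_ARC_FT**2 - THREE_POINT_CORNER_FT**2)


def shot_distance_ft(x_ft, y_ft):
    """Straight-line distance from the hoop center, in feet."""
    return math.hypot(x_ft, y_ft)


def is_three_point_attempt(x_ft, y_ft):
    """Classify a location as beyond the three-point line (simplified geometry)."""
    if y_ft <= CORNER_BREAK_Y_FT:
        # Corner region: the line runs parallel to the sideline
        return abs(x_ft) > THREE_POINT_CORNER_FT
    # Above the break: the line is an arc around the hoop
    return shot_distance_ft(x_ft, y_ft) > THREE_POINT_ARC_FT


print(is_three_point_attempt(0.0, 24.0))   # straight-on, beyond the arc
print(is_three_point_attempt(22.5, 2.0))   # corner, closer than 23.75 ft but still a three
```

For real analysis, prefer the shot type or zone columns supplied with the data when they are available. This geometric check is mainly useful as a sanity test.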

Example src/data/nba_api_fetcher.py

"""
NBA API data fetching utilities
"""
import pandas as pd
import time
from nba_api.stats.endpoints import (
    leaguedashplayerstats,
    shotchartdetail,
    playercareerstats,
    teamgamelog
)
from nba_api.stats.static import players, teams
from config.settings import NBA_API_RATE_LIMIT, CACHE_DIR, CURRENT_SEASON

class NBADataFetcher:
    """
    Handles NBA API data fetching with rate limiting (caching is not implemented here)
    """

    def __init__(self, rate_limit=NBA_API_RATE_LIMIT):
        self.rate_limit = rate_limit
        self.last_request_time = 0

    def _wait_for_rate_limit(self):
        """Enforce rate limiting between API requests"""
        current_time = time.time()
        time_since_last = current_time - self.last_request_time
        if time_since_last < self.rate_limit:
            time.sleep(self.rate_limit - time_since_last)
        self.last_request_time = time.time()

    def get_player_stats(self, season=CURRENT_SEASON, per_mode='PerGame'):
        """
        Fetch league-wide player statistics

        Parameters:
        -----------
        season : str
            NBA season (e.g., '2023-24')
        per_mode : str
            'PerGame', 'Totals', or 'Per36'

        Returns:
        --------
        DataFrame with player statistics
        """
        self._wait_for_rate_limit()

        stats = leaguedashplayerstats.LeagueDashPlayerStats(
            season=season,
            per_mode_detailed=per_mode
        )

        df = stats.get_data_frames()[0]
        return df

    def get_shot_chart(self, player_name, season=CURRENT_SEASON):
        """
        Fetch shot chart data for a specific player

        Parameters:
        -----------
        player_name : str
            Full player name (e.g., 'Stephen Curry')
        season : str
            NBA season

        Returns:
        --------
        DataFrame with shot locations and outcomes
        """
        # Find player ID
        player_dict = players.find_players_by_full_name(player_name)
        if not player_dict:
            raise ValueError(f"Player not found: {player_name}")

        player_id = player_dict[0]['id']

        self._wait_for_rate_limit()

        shot_data = shotchartdetail.ShotChartDetail(
            team_id=0,
            player_id=player_id,
            season_nullable=season,
            context_measure_simple='FGA'
        )

        shots_df = shot_data.get_data_frames()[0]
        return shots_df

    def get_team_game_log(self, team_abbr, season=CURRENT_SEASON):
        """
        Fetch game log for a team

        Parameters:
        -----------
        team_abbr : str
            Three-letter team abbreviation (e.g., 'LAL')
        season : str
            NBA season

        Returns:
        --------
        DataFrame with game-by-game results
        """
        # Find team ID
        team_dict = teams.find_team_by_abbreviation(team_abbr)
        if not team_dict:
            raise ValueError(f"Team not found: {team_abbr}")

        team_id = team_dict['id']

        self._wait_for_rate_limit()

        gamelog = teamgamelog.TeamGameLog(
            team_id=team_id,
            season=season
        )

        df = gamelog.get_data_frames()[0]
        return df

# Example usage
if __name__ == "__main__":
    fetcher = NBADataFetcher()

    # Get player stats
    print("Fetching player statistics...")
    stats = fetcher.get_player_stats()
    print(f"Loaded {len(stats)} players")

    # Get shot chart
    print("\nFetching shot chart for Stephen Curry...")
    shots = fetcher.get_shot_chart("Stephen Curry")
    print(f"Loaded {len(shots)} shots")

    print("\nData fetching complete!")
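
The fetcher above enforces a delay between requests but does not cache responses. One simple way to add caching is to store each response as a JSON file keyed by the request parameters. The sketch below is an illustration under assumptions: the function name cached_call, the hashing scheme, and the one-day expiry are arbitrary choices, and fetch must return JSON-serializable data.

```python
import hashlib
import json
import time
from pathlib import Path


def cached_call(cache_dir, key_parts, fetch, max_age_seconds=24 * 3600):
    """Return cached JSON for `key_parts`, calling `fetch()` only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Derive a stable file name from the request parameters
    key_text = json.dumps(key_parts, sort_keys=True)
    digest = hashlib.sha256(key_text.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{digest}.json"

    # Serve from disk while the cached file is still fresh
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text(encoding="utf-8"))

    payload = fetch()  # must return JSON-serializable data
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

With pandas, fetch could return df.to_dict(orient="records"), and the cached records can be turned back into a DataFrame with pd.DataFrame(records).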

Example R Project Structure

basketball-analytics-r/
│
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
│
├── R/                          # R function definitions
│   ├── fetch_nba_data.R
│   ├── calculate_metrics.R
│   ├── shot_charts.R
│   └── visualizations.R
│
├── scripts/                    # Analysis scripts
│   ├── 01_collect_season_data.R
│   ├── 02_player_analysis.R
│   └── 03_team_performance.R
│
├── notebooks/                  # R Markdown files
│   ├── season_overview.Rmd
│   └── player_comparison.Rmd
│
├── output/                     # Generated files
│   ├── figures/
│   │   └── shot_charts/
│   └── tables/
│
├── tests/
│   └── testthat/
│
├── renv/                       # R environment
├── renv.lock
├── .Rprofile
├── .gitignore
├── README.md
└── basketball-analytics-r.Rproj

Example R/fetch_nba_data.R

#' Fetch NBA player statistics
#'
#' @param season Character string for season (e.g., "2023-24")
#' @param per_mode Character string: "PerGame", "Totals", or "Per36"
#' @return Tibble with player statistics
fetch_player_stats <- function(season = "2023-24", per_mode = "PerGame") {
    library(hoopR)
    library(dplyr)

    tryCatch({
        # hoopR returns a named list; extract the player data frame
        stats <- nba_leaguedashplayerstats(
            season = season,
            per_mode = per_mode
        )$LeagueDashPlayerStats

        message(sprintf("Fetched stats for %d players in %s season",
                       nrow(stats), season))

        return(stats)

    }, error = function(e) {
        warning(sprintf("Error fetching player stats: %s", e$message))
        return(tibble())
    })
}

#' Calculate True Shooting Percentage
#'
#' @param pts Points scored
#' @param fga Field goal attempts
#' @param fta Free throw attempts
#' @return True shooting percentage
calculate_ts_pct <- function(pts, fga, fta) {
    ts_pct <- pts / (2 * (fga + 0.44 * fta))
    return(ts_pct)
}

#' Calculate Effective Field Goal Percentage
#'
#' @param fgm Field goals made
#' @param fg3m Three-pointers made
#' @param fga Field goal attempts
#' @return Effective field goal percentage
calculate_efg_pct <- function(fgm, fg3m, fga) {
    efg_pct <- (fgm + 0.5 * fg3m) / fga
    return(efg_pct)
}

#' Get team information
#'
#' @return Tibble with NBA team data
get_nba_teams <- function() {
    library(hoopR)

    teams <- nba_teams()
    return(teams)
}

Quick Start Checklist

Python Setup for Basketball Analytics

  • ☐ Install Anaconda Python distribution
  • ☐ Create conda environment named 'basketball'
  • ☐ Install nba_api, pandas, matplotlib, seaborn
  • ☐ Test nba_api by fetching player stats
  • ☐ Install Jupyter Lab or preferred IDE
  • ☐ Set up SQLite or PostgreSQL database
  • ☐ Create requirements.txt file
  • ☐ Test shot chart data fetching

R Setup for Basketball Analytics

  • ☐ Install R (version 4.3 or later)
  • ☐ Install RStudio Desktop
  • ☐ Install hoopR and tidyverse packages
  • ☐ Test hoopR by fetching NBA team data
  • ☐ Initialize renv for project
  • ☐ Configure RStudio preferences
  • ☐ Test database connectivity

Project Setup

  • ☐ Install Git and configure user settings
  • ☐ Create project directory structure
  • ☐ Initialize Git repository
  • ☐ Create .gitignore for data files
  • ☐ Write README.md with project description
  • ☐ Create initial analysis notebook
  • ☐ Set up database schema for basketball data
  • ☐ Test full data pipeline (fetch → process → store → analyze)

Troubleshooting Common Issues

Python and nba_api Issues

Problem | Solution
nba_api returns empty data | Check internet connection; NBA API may be rate limiting. Add delays between requests.
ModuleNotFoundError for nba_api | Ensure conda environment is activated: conda activate basketball
SSL certificate errors | Update certificates: conda update certifi or pip install --upgrade certifi
Jupyter kernel not found | Install ipykernel: conda install ipykernel then python -m ipykernel install --user --name basketball
API timeout errors | Increase timeout in requests or add retry logic with exponential backoff
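
The retry logic with exponential backoff mentioned for timeouts can be sketched in a few lines. This is a generic helper, not part of nba_api: the name retry_with_backoff and its defaults are illustrative, and the sleep function is injectable so the behavior can be tested without waiting.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s, ...
```

A call such as retry_with_backoff(lambda: fetcher.get_player_stats("2023-24")) would wrap the NBADataFetcher from the project structure section. Narrowing exceptions to the timeout and connection errors you actually see avoids retrying genuine bugs.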

R and hoopR Issues

Problem | Solution
hoopR installation fails | Install dependencies first: install.packages(c("dplyr", "httr", "jsonlite"))
API request timeouts | Increase timeout: options(timeout = 300) before making requests
Cannot load tidyverse | Install with dependencies: install.packages("tidyverse", dependencies = TRUE)
RStudio can't find R | Tools → Global Options → R General → Change R version to correct installation
hoopR returns NULL data | Check function parameters; ensure season format is correct (e.g., "2023-24")
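
Season strings are an easy source of silent failures, so it can help to generate and validate them in code. The sketch below assumes the "YYYY-YY" convention used throughout this guide; the helper names are illustrative.

```python
import re

SEASON_PATTERN = re.compile(r"^(\d{4})-(\d{2})$")


def season_string(start_year):
    """Format a season from its starting year, e.g. 2023 -> '2023-24'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def is_valid_season(season):
    """Check the 'YYYY-YY' shape and that the suffix is the following year."""
    match = SEASON_PATTERN.match(season)
    if not match:
        return False
    start_year, suffix = int(match.group(1)), int(match.group(2))
    return suffix == (start_year + 1) % 100


print(season_string(2023))           # 2023-24
print(season_string(1999))           # 1999-00
print(is_valid_season("2023-2024"))  # False: wrong shape
```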

Next Steps

With your basketball analytics environment configured, you're ready to dive into NBA data analysis:

  1. Explore NBA data sources: Fetch player stats, shot charts, and team data using nba_api or hoopR
  2. Calculate advanced metrics: Implement True Shooting %, Effective FG%, Player Efficiency Rating
  3. Create shot charts: Visualize player shooting patterns on basketball court diagrams
  4. Analyze team performance: Examine Four Factors, pace, offensive/defensive ratings
  5. Build predictive models: Use machine learning for player projections and win probability
  6. Study play-by-play data: Analyze lineup effectiveness and in-game momentum shifts

Remember: basketball analytics combines statistics, domain knowledge, and creativity. Start with simple questions—like "What makes an efficient scorer?" or "How do teams defend the three-point line?"—and build complexity as your skills develop. Document your analyses, version control your code, and always validate your metrics against established sources. The NBA's data richness enables endless analytical possibilities!

Key Takeaways

  • Python's nba_api and R's hoopR provide comprehensive access to NBA statistics, shot charts, and tracking data
  • Virtual environments (conda for Python, renv for R) ensure reproducible analyses across different projects
  • Jupyter notebooks excel at exploratory analysis and visualization, while scripts are better for production pipelines
  • SQLite works well for personal projects; PostgreSQL scales for multi-season databases and complex queries
  • Structured project organization with separate directories for raw data, processed data, notebooks, and source code improves maintainability
  • Rate limiting and caching are essential when working with NBA APIs to avoid being blocked
  • Basketball analytics requires domain knowledge—understand the game context behind the numbers
