In This Chapter

Introduction
3.1 Installing Python
3.2 Package Managers: pip and conda
3.3 Essential Libraries for Basketball Analytics
3.4 Jupyter Notebooks and Development Workflow
3.5 Version Control with Git
3.6 Best Practices for Reproducible Analysis
3.7 Virtual Environments and Dependency Management
3.8 Complete Installation Guide
3.9 Working with Basketball-Specific Libraries
3.10 Summary
Key Terms
Chapter 3 Checklist

Chapter 3: Python Environment Setup

Introduction

Welcome to the practical foundation of your basketball analytics journey. In this chapter, we will guide you through setting up a complete Python development environment optimized for sports analytics work. Whether you are analyzing player shooting percentages, building predictive models for game outcomes, or creating visualizations of team performance trends, having a properly configured environment is essential for productive and reproducible analysis.

Python has emerged as the dominant programming language in sports analytics for several compelling reasons. Its readable syntax makes it accessible to analysts who may not have formal computer science training. The extensive ecosystem of data science libraries provides powerful tools for statistical analysis, machine learning, and visualization. Additionally, Python's versatility allows analysts to work across the entire analytics pipeline, from data collection and cleaning to model deployment and reporting.

This chapter assumes no prior experience with Python or programming. We will start from scratch, walking through each installation step with detailed explanations. By the end of this chapter, you will have a fully functional analytics environment ready to tackle the basketball data analysis projects in subsequent chapters.

3.1 Installing Python

3.1.1 Understanding Python Versions

Python comes in different versions, and understanding version numbering is important for compatibility. As of this writing, Python 3.11 and 3.12 are the current stable releases, with Python 3.10 still widely used in production environments. We recommend using Python 3.10 or 3.11 for maximum compatibility with data science libraries.

Python 2 reached its end of life in January 2020 and should not be used for new projects. All code examples in this textbook use Python 3 syntax.
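
Because every example in this textbook assumes Python 3, a script can check the interpreter version before doing any work. The sketch below uses the standard sys.version_info tuple. The MINIMUM_VERSION constant and the is_supported helper are illustrative names, and the 3.10 floor simply mirrors the recommendation above.

```python
import sys

# Assumed floor, mirroring the 3.10+ recommendation in this section
MINIMUM_VERSION = (3, 10)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} is supported")
    else:
        print(f"Python {current} is older than {MINIMUM_VERSION[0]}.{MINIMUM_VERSION[1]}")
```

Comparing tuples rather than strings avoids the trap where "3.9" sorts after "3.10" alphabetically.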

3.1.2 Installation on Windows

Step 1: Download the Python Installer

Visit the official Python website at https://www.python.org/downloads/. The website should automatically detect your operating system and suggest the appropriate download. Click the yellow "Download Python 3.x.x" button to download the installer.

Step 2: Run the Installer

Locate the downloaded file (usually in your Downloads folder) and double-click to run it. The installer window will appear with several options.

Important: Before clicking "Install Now," check the box at the bottom that says "Add Python 3.x to PATH" (labeled "Add python.exe to PATH" in some installer versions). This step is crucial for running Python from the command line.

For a custom installation, click "Customize installation" and ensure the following options are selected:

- pip (Python's package installer)
- py launcher
- for all users (if you have administrator privileges)

Step 3: Verify the Installation

Open Command Prompt (search for "cmd" in the Start menu) and type:

python --version

You should see output like:

Python 3.11.5

Also verify pip is installed:

pip --version

Expected output:

pip 23.2.1 from C:\Users\YourName\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip (python 3.11)

3.1.3 Installation on macOS

Method 1: Using the Official Installer

Visit https://www.python.org/downloads/macos/
Download the macOS installer package
Open the .pkg file and follow the installation wizard
Complete the installation by allowing any necessary permissions

Method 2: Using Homebrew (Recommended)

Homebrew is a package manager for macOS that simplifies software installation. If you do not have Homebrew installed, open Terminal and run:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Once Homebrew is installed, install Python:

brew install python@3.11

Verify the installation:

python3 --version
pip3 --version

3.1.4 Installation on Linux

Most Linux distributions come with Python pre-installed. However, you may want to install a specific version.

Ubuntu/Debian:

sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip

Fedora:

sudo dnf install python3.11 python3-pip

Arch Linux:

sudo pacman -S python python-pip

Verify your installation:

python3 --version
pip3 --version

3.2 Package Managers: pip and conda

3.2.1 Understanding pip

pip (Pip Installs Packages) is Python's default package manager. It allows you to install, upgrade, and remove Python packages from the Python Package Index (PyPI), a repository containing over 400,000 packages.

Basic pip Commands

Installing a package:

pip install package_name

Installing a specific version:

pip install package_name==1.2.3

Upgrading a package:

pip install --upgrade package_name

Uninstalling a package:

pip uninstall package_name

Listing installed packages:

pip list

Showing package information:

pip show package_name

Installing Multiple Packages from a Requirements File

Create a text file named requirements.txt:

pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.3.0
statsmodels==0.14.0
jupyter==1.0.0

Install all packages at once:

pip install -r requirements.txt
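
Because a requirements file is plain text with one requirement per line, it is easy to inspect from Python. As a rough sketch, the hypothetical helper parse_pins below reads only exact name==version pins and skips blank lines and comments. It deliberately ignores the rest of pip's requirement syntax (version ranges, extras, URLs, and options).

```python
def parse_pins(requirements_text):
    """Map package names to exactly pinned versions (name==version lines only)."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # not an exact pin; out of scope for this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


example = """\
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
"""

pins = parse_pins(example)
print(pins["pandas"])  # 2.0.3
```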

3.2.2 Understanding Conda

Conda is an alternative package manager that excels at managing complex dependencies, especially for scientific computing. It can install packages from multiple repositories (channels) and manage non-Python dependencies.

Installing Conda

Conda is available through two distributions:

Anaconda: A full distribution including Conda, Python, and over 1,500 scientific packages. Large download (approximately 3GB) but convenient for beginners.
Miniconda: A minimal installation containing only Conda and Python. Smaller download (approximately 50MB), and you install packages as needed.

For basketball analytics, we recommend Miniconda for more control over your environment.

Installing Miniconda:

Visit https://docs.conda.io/en/latest/miniconda.html
Download the installer for your operating system
Run the installer and follow the prompts
Accept the license agreement and choose installation options

Basic Conda Commands

Creating a new environment:

conda create --name basketball_analytics python=3.11

Activating an environment:

conda activate basketball_analytics

Installing packages:

conda install pandas numpy matplotlib

Installing from conda-forge (a community channel with more packages):

conda install -c conda-forge seaborn

Listing environments:

conda env list

Exporting an environment:

conda env export > environment.yml

Creating an environment from a file:

conda env create -f environment.yml

3.2.3 pip vs Conda: When to Use Each

Feature	pip	Conda
Package source	PyPI	Anaconda repositories, conda-forge
Non-Python dependencies	Limited support	Full support
Environment management	Requires venv or virtualenv	Built-in
Speed	Generally faster	Can be slower for complex dependencies
Disk space	Smaller footprint	Larger footprint

Our Recommendation:

For this textbook, we will primarily use pip with virtual environments (venv). This approach provides a lightweight setup that works well for basketball analytics projects. If you encounter dependency conflicts or need packages with complex system-level dependencies, consider switching to Conda.

3.3 Essential Libraries for Basketball Analytics

3.3.1 pandas: Data Manipulation and Analysis

pandas is the cornerstone library for data analysis in Python. It provides two primary data structures: Series (one-dimensional) and DataFrame (two-dimensional, tabular data). For basketball analytics, you will use pandas to:

Load and save data from various formats (CSV, Excel, JSON, SQL databases)
Clean and preprocess player statistics
Merge datasets (combining player data with team data)
Calculate aggregate statistics (per-game averages, totals)
Filter and sort data based on conditions

Installation:

pip install pandas

Basic Usage Example:

import pandas as pd

# Create a DataFrame with player statistics
player_stats = pd.DataFrame({
    'player': ['LeBron James', 'Stephen Curry', 'Giannis Antetokounmpo'],
    'points_per_game': [25.7, 29.4, 31.1],
    'assists_per_game': [8.3, 6.3, 5.7],
    'rebounds_per_game': [7.5, 6.1, 11.6]
})

# Display the DataFrame
print(player_stats)

# Calculate average points per game
print(f"Average PPG: {player_stats['points_per_game'].mean():.1f}")

# Filter players with more than 28 points per game
high_scorers = player_stats[player_stats['points_per_game'] > 28]
print(high_scorers)
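
Two of the tasks listed at the start of this section, merging datasets and calculating aggregate statistics, can be sketched in a few lines. The players, teams, and numbers below are made-up sample data chosen for illustration, not values from a real dataset.

```python
import pandas as pd

# Illustrative per-game lines for two hypothetical players
game_logs = pd.DataFrame({
    "player": ["Player A", "Player A", "Player B", "Player B"],
    "points": [20, 30, 10, 14],
    "assists": [5, 7, 2, 4],
})

# Illustrative lookup table mapping each player to a team
rosters = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "team": ["Team X", "Team Y"],
})

# Aggregate: per-game averages for each player
averages = game_logs.groupby("player", as_index=False)[["points", "assists"]].mean()

# Merge: attach team information to the averages
summary = averages.merge(rosters, on="player", how="left")
print(summary)
```

A left merge keeps every row of the averages table even if a player is missing from the roster table, which makes gaps in the lookup data visible instead of silently dropping players.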

3.3.2 NumPy: Numerical Computing

NumPy provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on them efficiently. In basketball analytics, NumPy is used for:

Fast numerical computations
Linear algebra operations
Statistical calculations
Random number generation for simulations

Installation:

pip install numpy

Basic Usage Example:

import numpy as np

# Create an array of shot distances (in feet)
shot_distances = np.array([3, 15, 22, 24, 25, 8, 12, 26, 4, 18])

# Calculate statistics
print(f"Mean distance: {np.mean(shot_distances):.1f} feet")
print(f"Median distance: {np.median(shot_distances):.1f} feet")
print(f"Standard deviation: {np.std(shot_distances):.1f} feet")

# Count shots from different zones
three_pointers = np.sum(shot_distances >= 23.75)
mid_range = np.sum((shot_distances >= 10) & (shot_distances < 23.75))
paint = np.sum(shot_distances < 10)

print(f"Three-pointers: {three_pointers}")
print(f"Mid-range: {mid_range}")
print(f"Paint: {paint}")
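
Random number generation for simulations, the last use listed at the start of this section, might look like the following sketch. It simulates free-throw attempts for a hypothetical shooter. The 80% make probability, the game counts, and the seed are assumptions chosen for illustration.

```python
import numpy as np

# A seeded generator makes the simulation repeatable
rng = np.random.default_rng(seed=42)

make_probability = 0.80   # assumed free-throw percentage
attempts_per_game = 10
n_games = 1000

# Each entry is the number of makes in one simulated game
makes = rng.binomial(n=attempts_per_game, p=make_probability, size=n_games)

print(f"Average makes per game: {makes.mean():.2f}")
print(f"Share of perfect games: {np.mean(makes == attempts_per_game):.3f}")
```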

3.3.3 Matplotlib: Basic Visualization

Matplotlib is the foundational plotting library in Python. While it has a learning curve, it provides complete control over every aspect of your visualizations.

Installation:

pip install matplotlib

Basic Usage Example:

import matplotlib.pyplot as plt
import numpy as np

# Sample data: games and points scored
games = np.arange(1, 11)
points = [28, 32, 25, 30, 35, 29, 33, 27, 31, 34]

# Create a line plot
plt.figure(figsize=(10, 6))
plt.plot(games, points, marker='o', linewidth=2, markersize=8)
plt.xlabel('Game Number', fontsize=12)
plt.ylabel('Points Scored', fontsize=12)
plt.title('Player Scoring Over 10-Game Stretch', fontsize=14)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('scoring_trend.png', dpi=300)
plt.show()

3.3.4 Seaborn: Statistical Visualization

Seaborn builds on Matplotlib and provides a high-level interface for creating attractive statistical graphics. It integrates closely with pandas DataFrames.

Installation:

pip install seaborn

Basic Usage Example:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'position': ['PG', 'SG', 'SF', 'PF', 'C'] * 10,
    'points': [18, 22, 20, 17, 15, 21, 25, 19, 16, 14,
               20, 23, 21, 18, 16, 19, 24, 20, 17, 15,
               22, 26, 22, 19, 17, 20, 25, 21, 18, 16,
               17, 21, 19, 16, 14, 23, 27, 23, 20, 18,
               19, 23, 20, 17, 15, 22, 26, 22, 19, 17]
})

# Create a box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='position', y='points', data=df, palette='Set2')
plt.xlabel('Position', fontsize=12)
plt.ylabel('Points per Game', fontsize=12)
plt.title('Scoring Distribution by Position', fontsize=14)
plt.tight_layout()
plt.show()

3.3.5 Scikit-learn: Machine Learning

Scikit-learn provides simple and efficient tools for data mining and machine learning. In basketball analytics, you might use it for:

Predicting game outcomes
Clustering players by playing style
Player similarity analysis
Performance prediction models

Installation:

pip install scikit-learn

Basic Usage Example:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data: minutes played vs points scored
minutes = np.array([25, 30, 35, 28, 32, 38, 22, 27, 33, 36]).reshape(-1, 1)
points = np.array([12, 18, 22, 15, 20, 25, 10, 14, 21, 24])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    minutes, points, test_size=0.2, random_state=42
)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
print(f"Coefficient: {model.coef_[0]:.2f} points per minute")
print(f"R-squared: {model.score(X_test, y_test):.2f}")

3.3.6 Statsmodels: Statistical Modeling

Statsmodels provides classes and functions for estimating statistical models, performing statistical tests, and exploring data. It offers more detailed statistical output compared to scikit-learn.

Installation:

pip install statsmodels

Basic Usage Example:

import statsmodels.api as sm
import numpy as np
import pandas as pd

# Synthetic sample data (seeded for reproducibility)
np.random.seed(42)
data = pd.DataFrame({
    'minutes': np.random.uniform(20, 40, 50),
    'usage_rate': np.random.uniform(15, 35, 50)
})

# Build points as a linear function of both predictors plus random noise
data['points'] = 0.4 * data['minutes'] + 0.3 * data['usage_rate'] + np.random.normal(0, 3, 50)

# Prepare data for regression
X = data[['minutes', 'usage_rate']]
X = sm.add_constant(X)  # Add intercept
y = data['points']

# Fit the model
model = sm.OLS(y, X).fit()

# Print summary statistics
print(model.summary())

3.3.7 Installing All Essential Libraries

To install all the libraries discussed above in one command:

pip install pandas numpy matplotlib seaborn scikit-learn statsmodels jupyter

Or create a requirements.txt file:

pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.3.0
statsmodels>=0.14.0
jupyter>=1.0.0
openpyxl>=3.1.0
xlrd>=2.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0

And install with:

pip install -r requirements.txt

3.4 Jupyter Notebooks and Development Workflow

3.4.1 What Are Jupyter Notebooks?

Jupyter Notebooks are interactive documents that combine code, text, visualizations, and output in a single file. They are ideal for:

Exploratory data analysis
Creating reproducible research
Sharing analysis with stakeholders
Teaching and learning

The name "Jupyter" comes from three programming languages: Julia, Python, and R, although Python is the most commonly used kernel.

3.4.2 Installing Jupyter

Using pip:

pip install jupyter

Using conda:

conda install jupyter

For enhanced functionality, also install JupyterLab:

pip install jupyterlab

3.4.3 Starting Jupyter Notebook

Open your terminal or command prompt and navigate to your project directory:

cd path/to/your/project
jupyter notebook

This command starts the Jupyter server and opens a browser window with the Jupyter interface. You will see a file browser showing the contents of your current directory.

To start JupyterLab instead:

jupyter lab

3.4.4 Creating and Using Notebooks

Creating a New Notebook:

Click "New" in the upper right corner
Select "Python 3" under Notebook

Understanding the Interface:

Cells: Notebooks are composed of cells, which can contain code or markdown text
Code Cells: Execute Python code and display output
Markdown Cells: Display formatted text, headers, lists, and equations
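
On disk, a notebook is a JSON file with an .ipynb extension, and its cells are entries in a list. The sketch below builds a minimal two-cell notebook as a Python dictionary using only the standard library, following the version 4 notebook format. The cell contents are illustrative.

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Shooting analysis\n", "Notes go in markdown cells."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not yet run
            "outputs": [],
            "source": ["made, attempted = 7, 10\n", "print(made / attempted)"],
        },
    ],
}

# Serialize to the text that would be stored in the .ipynb file
notebook_json = json.dumps(notebook, indent=1)

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)  # ['markdown', 'code']
```

Because code cells store their outputs in the same file, notebooks with large outputs produce noisy diffs under version control.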

Essential Keyboard Shortcuts:

Shortcut	Action
Shift + Enter	Run cell and move to next
Ctrl + Enter	Run cell and stay
Esc	Enter command mode
Enter	Enter edit mode
A	Insert cell above (command mode)
B	Insert cell below (command mode)
DD	Delete cell (command mode)
M	Change to markdown cell (command mode)
Y	Change to code cell (command mode)
Ctrl + S	Save notebook

3.4.5 Jupyter Best Practices for Analytics

1. Start with Imports and Configuration

Always begin your notebook with all imports and configuration settings:

# Standard library imports
import os
import sys
from datetime import datetime

# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Configure display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

# Display plots inline
%matplotlib inline

# Autoreload modules (useful during development)
%load_ext autoreload
%autoreload 2

2. Document Your Analysis

Use markdown cells liberally to explain:

- The purpose of each section
- Assumptions you are making
- Interpretation of results
- Data source information

3. Keep Cells Focused

Each cell should do one thing well. This makes debugging easier and improves readability.

4. Clear Output Before Sharing

Before sharing a notebook, restart the kernel and run all cells so that the saved output reflects a clean, top-to-bottom execution (Kernel → Restart & Run All; the exact menu wording varies between Jupyter versions).

5. Use Descriptive Variable Names

# Good
player_shooting_stats = pd.read_csv('shooting_data.csv')
three_point_percentage = made_threes / attempted_threes

# Avoid
df = pd.read_csv('shooting_data.csv')
pct = m / a

3.4.6 Alternative Development Environments

While Jupyter Notebooks are excellent for exploration, you may also want to use:

Visual Studio Code

VS Code is a powerful, free code editor with excellent Python support. Install the Python extension for:

- Syntax highlighting
- Code completion
- Integrated debugging
- Jupyter notebook support within VS Code

PyCharm

PyCharm is a full-featured Python IDE available in free (Community) and paid (Professional) versions. It offers:

- Advanced code refactoring
- Built-in version control
- Database tools
- Professional debugging

Spyder

Spyder is designed specifically for scientific computing. It includes:

- Variable explorer
- Interactive console
- Debugging tools
- Integration with scientific libraries

3.5 Version Control with Git

3.5.1 Why Version Control?

Version control systems track changes to files over time. For basketball analytics projects, version control helps you:

Keep a history of all changes to your code and analysis
Collaborate with teammates without overwriting each other's work
Experiment with new approaches without losing working code
Roll back to previous versions if something breaks
Share your work publicly or privately

3.5.2 Installing Git

Windows:

Download the installer from https://git-scm.com/download/win
Run the installer and follow the prompts
Accept default settings unless you have specific preferences

macOS:

Using Homebrew:

brew install git

Or install Xcode Command Line Tools:

xcode-select --install

Linux:

Ubuntu/Debian:

sudo apt install git

Fedora:

sudo dnf install git

Verify installation:

git --version

3.5.3 Basic Git Configuration

Set your identity (replace with your information):

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Set default branch name:

git config --global init.defaultBranch main

Configure line endings (important for cross-platform work):

# Windows
git config --global core.autocrlf true

# macOS/Linux
git config --global core.autocrlf input

3.5.4 Essential Git Commands

Initializing a Repository:

# Navigate to your project folder
cd path/to/basketball_analytics

# Initialize Git repository
git init

Basic Workflow:

# Check status of your files
git status

# Add files to staging area
git add filename.py
git add .  # Add all files

# Commit changes with a message
git commit -m "Add player statistics analysis script"

# View commit history
git log
git log --oneline  # Compact view

Working with Branches:

# Create a new branch
git branch feature/shooting-analysis

# Switch to a branch
git checkout feature/shooting-analysis

# Create and switch in one command
git checkout -b feature/new-feature

# List all branches
git branch

# Merge a branch into current branch
git merge feature/shooting-analysis

Working with Remote Repositories (GitHub):

# Clone an existing repository
git clone https://github.com/username/repository.git

# Add a remote
git remote add origin https://github.com/username/repository.git

# Push changes to remote
git push origin main

# Pull changes from remote
git pull origin main

3.5.5 Creating a .gitignore File

A .gitignore file specifies which files Git should ignore. Create this file in your project root:

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv/

# Jupyter Notebooks
.ipynb_checkpoints/
*.ipynb_checkpoints/

# Data files (often too large for Git)
data/raw/
*.csv
*.xlsx
*.parquet
# Exception: keep sample data
!data/sample_data.csv

# IDE settings
.idea/
.vscode/
*.swp
*.swo

# OS files
.DS_Store
Thumbs.db

# Sensitive information
.env
secrets.json
credentials/

# Output files
output/
reports/*.pdf
*.log

3.5.6 Git Best Practices for Analytics

Commit Often: Make small, focused commits that do one thing
Write Good Commit Messages: Use present tense, be descriptive
Use Branches: Keep experimental work separate from stable code
Never Commit Secrets: API keys, passwords, credentials
Keep Data Separate: Large data files should not be in Git

Example Commit Messages:

# Good
Add function to calculate player efficiency rating
Fix bug in three-point percentage calculation
Update season statistics through week 12

# Avoid
Fixed stuff
Updates
asdf
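
One common way to follow the "Never Commit Secrets" practice is to read credentials from environment variables at run time instead of writing them into code. The sketch below uses only os.environ. The variable name STATS_API_KEY and the helper get_api_key are hypothetical.

```python
import os


def get_api_key(variable_name="STATS_API_KEY"):
    """Read an API key from the environment, failing loudly if it is missing."""
    api_key = os.environ.get(variable_name)
    if not api_key:
        raise RuntimeError(
            f"Set the {variable_name} environment variable before running this script."
        )
    return api_key


if __name__ == "__main__":
    # Demo value only; in real use, set the variable in your shell,
    # never in a file that is committed to Git
    os.environ.setdefault("STATS_API_KEY", "demo-key-not-real")
    print(f"Loaded a key with {len(get_api_key())} characters")
```

Raising an error when the variable is missing is a deliberate choice: a script that silently continues without credentials tends to fail later in a less obvious place.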

3.6 Best Practices for Reproducible Analysis

3.6.1 What Is Reproducibility?

Reproducibility means that someone else (or future you) can run your analysis and get the same results. This requires:

Clear documentation of data sources
Recorded software versions
Complete code that runs without manual intervention
Documented random seeds for any stochastic processes

3.6.2 Project Structure

Organize your basketball analytics projects with a consistent structure:

basketball_analytics_project/
│
├── data/
│   ├── raw/           # Original, immutable data
│   ├── processed/     # Cleaned, transformed data
│   └── external/      # Data from third-party sources
│
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_data_cleaning.ipynb
│   └── 03_analysis.ipynb
│
├── src/
│   ├── __init__.py
│   ├── data/          # Data loading and processing
│   │   ├── __init__.py
│   │   └── loaders.py
│   ├── features/      # Feature engineering
│   │   ├── __init__.py
│   │   └── build_features.py
│   ├── models/        # Model training and prediction
│   │   ├── __init__.py
│   │   └── train_model.py
│   └── visualization/ # Plotting functions
│       ├── __init__.py
│       └── visualize.py
│
├── tests/
│   └── test_data_loading.py
│
├── output/
│   ├── figures/
│   └── reports/
│
├── requirements.txt
├── environment.yml
├── README.md
├── .gitignore
└── setup.py
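
Creating this layout by hand is tedious, so a short script can scaffold it. The following is a minimal sketch using pathlib. The helper name create_project_skeleton is an assumption, and it creates only the folders plus a few empty placeholder files from the tree above.

```python
import tempfile
from pathlib import Path

DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks",
    "src/data", "src/features", "src/models", "src/visualization",
    "tests",
    "output/figures", "output/reports",
]
PACKAGE_DIRS = ["src", "src/data", "src/features", "src/models", "src/visualization"]
TOP_LEVEL_FILES = ["requirements.txt", "README.md", ".gitignore"]


def create_project_skeleton(root):
    """Create the folder layout under root without overwriting existing files."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for package in PACKAGE_DIRS:
        (root / package / "__init__.py").touch(exist_ok=True)
    for filename in TOP_LEVEL_FILES:
        (root / filename).touch(exist_ok=True)
    return root


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory for demonstration
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_skeleton(Path(tmp) / "basketball_analytics_project")
        print(sorted(p.name for p in project.iterdir()))
```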

3.6.3 Documentation Best Practices

README.md Template:

# Project Name

Brief description of the project.

## Installation

```bash
pip install -r requirements.txt
```

## Data

Describe data sources and how to obtain them.

## Usage

```bash
python src/main.py
```

## Project Organization

Explain the folder structure.

## Contributing

Guidelines for contributors.

## License

License information.

Code Documentation:

Use docstrings for functions:

def calculate_true_shooting_percentage(points, fga, fta):
    """
    Calculate True Shooting Percentage (TS%).

    TS% accounts for the value of three-point field goals and
    free throws in addition to conventional two-point field goals.

    Parameters
    ----------
    points : int or float
        Total points scored
    fga : int
        Field goal attempts
    fta : int
        Free throw attempts

    Returns
    -------
    float
        True Shooting Percentage as a decimal (e.g., 0.580)

    Examples
    --------
    >>> round(calculate_true_shooting_percentage(25, 18, 6), 3)
    0.606

    Notes
    -----
    Formula: TS% = PTS / (2 * (FGA + 0.44 * FTA))
    The 0.44 coefficient is an empirical estimate of the
    proportion of free throws that end a possession.
    """
    if fga == 0 and fta == 0:
        return 0.0

    tsa = fga + 0.44 * fta  # True Shooting Attempts
    return points / (2 * tsa)

3.6.4 Setting Random Seeds

For reproducible results in any analysis involving randomness:

import numpy as np
import random
from sklearn.model_selection import train_test_split

# Set random seeds at the beginning of your script/notebook
RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)

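# Placeholder data so the snippet runs on its own
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)

# Use random_state in scikit-learn functions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_SEED
)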
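
To see why seeding matters, the sketch below runs the same small simulation twice with the same seed and once with a different seed. It uses only the standard library's random.Random class. The function name simulate_shots and the 45% make probability are illustrative assumptions.

```python
import random


def simulate_shots(n_shots, make_probability, seed):
    """Return a list of booleans (True = make) from a seeded generator."""
    rng = random.Random(seed)  # independent generator; global state untouched
    return [rng.random() < make_probability for _ in range(n_shots)]


first_run = simulate_shots(20, 0.45, seed=42)
second_run = simulate_shots(20, 0.45, seed=42)
other_seed = simulate_shots(20, 0.45, seed=7)

print(first_run == second_run)  # True: same seed, same sequence
print(sum(first_run), sum(other_seed))
```

Passing the seed into the function, rather than relying on global state, keeps each simulation reproducible even when other code in the notebook also draws random numbers.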

3.6.5 Recording Package Versions

Always record the exact versions of packages used:

pip freeze > requirements.txt

Or create a more readable requirements file manually:

# Core data science
pandas==2.0.3
numpy==1.24.3
scipy==1.11.2

# Visualization
matplotlib==3.7.2
seaborn==0.12.2

# Machine learning
scikit-learn==1.3.0
statsmodels==0.14.0

# Development
jupyter==1.0.0
pytest==7.4.0

For conda environments:

conda env export --no-builds > environment.yml
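
Versions can also be captured from inside a script or notebook, which is handy for stamping results. A minimal sketch using the standard library's importlib.metadata follows. The helper name snapshot_versions and the package list are illustrative, and packages that are not installed are reported as None rather than raising an error.

```python
import platform
from importlib import metadata


def snapshot_versions(package_names):
    """Return a dict of the Python version plus each package's installed version."""
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in snapshot_versions(["pandas", "numpy", "scikit-learn"]).items():
        print(f"{name}: {version or 'not installed'}")
```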

3.7 Virtual Environments and Dependency Management

3.7.1 Why Use Virtual Environments?

Virtual environments isolate project dependencies, preventing conflicts between projects. Consider this scenario:

Project A requires pandas 1.5.0
Project B requires pandas 2.0.0

Without virtual environments, you cannot satisfy both requirements simultaneously. Virtual environments solve this by creating isolated Python installations for each project.

3.7.2 Using venv (Built-in)

Python 3.3+ includes the venv module for creating virtual environments.

Creating a Virtual Environment:

# Navigate to your project directory
cd basketball_analytics

# Create a virtual environment
python -m venv venv

Activating the Environment:

Windows Command Prompt:

venv\Scripts\activate.bat

Windows PowerShell:

venv\Scripts\Activate.ps1

macOS/Linux:

source venv/bin/activate

When activated, your prompt will show the environment name:

(venv) C:\Users\YourName\basketball_analytics>
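
The prompt prefix is only a visual cue. From inside Python, a script can check whether it is running in a virtual environment by comparing sys.prefix with sys.base_prefix, which differ when an environment created by venv is active. The helper name in_virtual_environment is an illustrative choice.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtual_environment()}")
```

This check targets environments created with venv or virtualenv. Conda environments are organized differently and are not detected by this comparison.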

Installing Packages in the Environment:

pip install pandas numpy matplotlib

Deactivating the Environment:

deactivate

Deleting a Virtual Environment:

Simply delete the venv folder:

# Windows
rmdir /s /q venv

# macOS/Linux
rm -rf venv

3.7.3 Using Conda Environments

Conda environments offer additional features, including the ability to manage non-Python dependencies.

Creating an Environment:

# Create environment with specific Python version
conda create --name basketball_analytics python=3.11

# Create environment from YAML file
conda env create -f environment.yml

Example environment.yml:

name: basketball_analytics
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pandas>=2.0.0
  - numpy>=1.24.0
  - matplotlib>=3.7.0
  - seaborn>=0.12.0
  - scikit-learn>=1.3.0
  - statsmodels>=0.14.0
  - jupyter>=1.0.0
  - pip
  - pip:
    - nba_api>=1.2.0
    - basketball_reference_scraper>=0.1.0

Managing Environments:

# Activate
conda activate basketball_analytics

# List environments
conda env list

# Deactivate
conda deactivate

# Remove environment
conda env remove --name basketball_analytics

# Update all packages
conda update --all

# Export environment
conda env export > environment.yml

3.7.4 Using virtualenv (Third-party)

virtualenv is a third-party tool that provides more features than the built-in venv:

Installation:

pip install virtualenv

Usage:

# Create environment
virtualenv venv

# Create with specific Python version
virtualenv -p python3.11 venv

# Activate (same as venv)
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows

3.7.5 Best Practices for Environment Management

One Environment Per Project: Do not share environments between projects
Include Requirements File: Always provide requirements.txt or environment.yml
Pin Versions: Specify exact versions for production environments
Document Setup: Include setup instructions in your README
Do Not Commit Environment Folders: Add venv/ to .gitignore

Complete Project Setup Script:

Create a setup.sh (macOS/Linux) or setup.bat (Windows) file:

setup.sh:

#!/bin/bash

# Create virtual environment
python -m venv venv

# Activate environment
source venv/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install dependencies
pip install -r requirements.txt

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks (assumes pre-commit is listed in requirements-dev.txt)
pre-commit install

echo "Environment setup complete!"

setup.bat:

@echo off

:: Create virtual environment
python -m venv venv

:: Activate environment
call venv\Scripts\activate.bat

:: Upgrade pip
pip install --upgrade pip

:: Install dependencies
pip install -r requirements.txt

:: Install development dependencies
pip install -r requirements-dev.txt

echo Environment setup complete!

3.8 Complete Installation Guide

3.8.1 Recommended Setup for This Textbook

Follow these steps to set up your environment for all examples in this textbook:

Step 1: Install Python

Download and install Python 3.11 from python.org. Ensure you check "Add Python to PATH" during installation.

Step 2: Create a Project Directory

mkdir basketball_analytics_textbook
cd basketball_analytics_textbook

Step 3: Create a Virtual Environment

python -m venv venv

Step 4: Activate the Environment

Windows:

venv\Scripts\activate

macOS/Linux:

source venv/bin/activate

Step 5: Create requirements.txt

Create a file named requirements.txt with the following content:

# Core data science libraries
pandas>=2.0.0
numpy>=1.24.0
scipy>=1.11.0

# Visualization
matplotlib>=3.7.0
seaborn>=0.12.0
plotly>=5.15.0

# Machine learning and statistics
scikit-learn>=1.3.0
statsmodels>=0.14.0

# Jupyter environment
jupyter>=1.0.0
jupyterlab>=4.0.0
notebook>=7.0.0
ipywidgets>=8.0.0

# Data handling
openpyxl>=3.1.0
xlrd>=2.0.0
pyarrow>=12.0.0

# Web scraping
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0

# Basketball-specific APIs
nba_api>=1.2.0

# Development tools
pytest>=7.4.0
black>=23.7.0
flake8>=6.1.0
isort>=5.12.0
mypy>=1.5.0

# Documentation
sphinx>=7.0.0

Step 6: Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Step 7: Verify Installation

Create a file named verify_installation.py:

"""Verify that all required packages are installed correctly."""

def check_import(package_name, import_name=None):
    """Check if a package can be imported."""
    if import_name is None:
        import_name = package_name
    try:
        module = __import__(import_name)
        version = getattr(module, '__version__', 'unknown')
        print(f"[OK] {package_name}: {version}")
        return True
    except ImportError as e:
        print(f"[FAILED] {package_name}: {e}")
        return False

def main():
    """Run all package checks."""
    print("=" * 50)
    print("Basketball Analytics Environment Check")
    print("=" * 50)
    print()

    packages = [
        ('pandas', 'pandas'),
        ('numpy', 'numpy'),
        ('matplotlib', 'matplotlib'),
        ('seaborn', 'seaborn'),
        ('scikit-learn', 'sklearn'),
        ('statsmodels', 'statsmodels'),
        ('scipy', 'scipy'),
        ('jupyter', 'jupyter'),
        ('requests', 'requests'),
        ('beautifulsoup4', 'bs4'),
    ]

    all_ok = True
    for package_name, import_name in packages:
        if not check_import(package_name, import_name):
            all_ok = False

    print()
    print("=" * 50)
    if all_ok:
        print("All packages installed successfully!")
        print("Your environment is ready for basketball analytics.")
    else:
        print("Some packages failed to import.")
        print("Please review the errors above and reinstall.")
    print("=" * 50)

if __name__ == "__main__":
    main()

Run the verification:

python verify_installation.py
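
The script above confirms that each package imports, but it does not compare installed versions with the minimums listed in requirements.txt. A minimal sketch of such a check, using only the standard library, might look like the following. The helper names (parse_requirement, meets_minimum, check_requirements) are hypothetical, and the comparison is deliberately simplified: it reads only the leading numeric part of each version string and only understands lines of the form name>=version.

```python
"""Compare installed package versions against the minimums in requirements.txt."""
import re
from importlib import metadata
from pathlib import Path


def parse_requirement(line):
    """Return (name, minimum_version) for a 'name>=x.y.z' line, else None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*>=\s*([0-9][0-9.]*)", line)
    return (match.group(1), match.group(2)) if match else None


def version_tuple(version):
    """Turn '2.31.0' into (2, 31, 0); suffixes such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.31' and '2.31.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(path="requirements.txt"):
    """Return {name: (installed_or_None, minimum, ok)} for each 'name>=x' line."""
    results = {}
    for raw in Path(path).read_text().splitlines():
        parsed = parse_requirement(raw)
        if parsed is None:
            continue  # blank line, comment, or an unsupported specifier
        name, minimum = parsed
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and meets_minimum(installed, minimum)
        results[name] = (installed, minimum, ok)
    return results


if __name__ == "__main__":
    if Path("requirements.txt").exists():
        for name, (installed, minimum, ok) in check_requirements().items():
            status = "OK" if ok else "FAILED"
            print(f"[{status}] {name}: installed={installed}, required>={minimum}")
```

Run it from the project folder that contains requirements.txt. For anything beyond simple >= pins, pip's own resolver remains the authority on whether a requirement is satisfied.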

Step 8: Initialize Git

git init

Create a .gitignore file with appropriate entries (see Section 3.5.5).

Step 9: Start Jupyter

jupyter notebook

Or for JupyterLab:

jupyter lab

3.8.2 Troubleshooting Common Installation Issues

Issue: "python" command not found

Solution: Python may not be in your PATH. Try:
- Windows: Use py instead of python
- Reinstall Python and check "Add to PATH"
- Manually add Python to your system PATH
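
To see which launcher names your shell can actually find, a short standard-library sketch such as the one below may help. Run it with whichever command does work on your machine (for example py on Windows or python3 on macOS/Linux). The function name find_python_commands is hypothetical, and the three command names it checks are only the common defaults.

```python
"""Report which Python launcher commands are visible on PATH."""
import shutil
import sys


def find_python_commands(names=("python", "python3", "py")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # sys.executable is the interpreter running this script right now.
    print(f"Running interpreter: {sys.executable}")
    for name, path in find_python_commands().items():
        print(f"{name:8s} -> {path or 'not found on PATH'}")
```

If python shows as not found while py or python3 resolves, the installation itself is probably fine and only the PATH entry is missing.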

Issue: pip install fails with permission error

Solution: Do not use sudo pip install on Linux/macOS. Instead:
- Use a virtual environment (recommended)
- Use pip install --user package_name

Issue: Jupyter kernel not found

Solution: Register the virtual environment as a Jupyter kernel:

pip install ipykernel
python -m ipykernel install --user --name=basketball_analytics
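
After registering, you may want to confirm that the kernel is visible to Jupyter. The sketch below assumes the jupyter_client package (installed alongside Jupyter) is available and returns an empty result if it is not. The function name list_kernels is hypothetical.

```python
"""List the Jupyter kernels that are currently registered."""


def list_kernels():
    """Return {kernel_name: resource_dir}; empty if jupyter_client is missing."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    kernels = list_kernels()
    if not kernels:
        print("No kernels found (or jupyter_client is not installed).")
    for name, location in sorted(kernels.items()):
        print(f"{name}: {location}")
```

The name you passed to --name should appear in the output, usually in lowercase.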

Issue: ModuleNotFoundError after installing package

Solution:
- Ensure you are in the correct virtual environment
- Try reinstalling: pip uninstall package_name && pip install package_name
- Restart your Jupyter kernel
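
A frequent cause of this error is that the package was installed into one interpreter while the code runs in another. The following sketch, which uses only the standard library, prints which interpreter is active and where a given module would be loaded from. The helper names are hypothetical, and the virtual-environment test applies to venv and virtualenv only; for Conda it simply reports the CONDA_DEFAULT_ENV variable that conda activate sets.

```python
"""Show which interpreter is running and where a module would be imported from."""
import importlib.util
import os
import sys


def in_virtual_environment():
    """True inside a venv/virtualenv, where sys.prefix differs from base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def locate_module(name):
    """Return where a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None  # e.g. a dotted name whose parent package is missing
    if spec is None:
        return None
    return spec.origin or "namespace package"


if __name__ == "__main__":
    print(f"Interpreter:         {sys.executable}")
    print(f"Inside venv:         {in_virtual_environment()}")
    print(f"Conda environment:   {os.environ.get('CONDA_DEFAULT_ENV', 'none')}")
    for module_name in ("pandas", "sklearn", "nba_api"):
        print(f"{module_name:20s} {locate_module(module_name) or 'NOT FOUND'}")
```

Running the same lines in a Jupyter cell and in your terminal makes a mismatch obvious: if the interpreter paths differ, the notebook is using a different kernel from the environment you installed into.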

Issue: Conda conflicts when installing packages

Solution:
- Use conda update --all to update existing packages
- Create a fresh environment
- Use --no-deps flag and install dependencies manually

3.9 Working with Basketball-Specific Libraries

3.9.1 nba_api: Official NBA Statistics

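The nba_api package is an unofficial, community-maintained client for the statistics endpoints on NBA.com. It sends live web requests, so the examples below need an internet connection.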

Installation:

pip install nba_api

Basic Usage:

from nba_api.stats.static import players, teams
from nba_api.stats.endpoints import playercareerstats, leagueleaders

# Find a player
player_dict = players.find_players_by_full_name("LeBron James")
print(player_dict)

# Get career statistics using the ID returned by the lookup above
lebron_id = player_dict[0]['id']  # 2544 for LeBron James
career = playercareerstats.PlayerCareerStats(player_id=lebron_id)
career_df = career.get_data_frames()[0]
print(career_df.head())

# Get league leaders
leaders = leagueleaders.LeagueLeaders(season='2023-24')
leaders_df = leaders.get_data_frames()[0]
print(leaders_df.head(10))
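
Because each call above is a live request to NBA.com, it can occasionally time out or be refused. One common way to cope is to retry with a growing pause between attempts. The sketch below is a generic helper, not part of nba_api; the name fetch_with_retry and its default values are assumptions chosen for illustration. The commented usage lines assume the endpoint classes accept a timeout argument in seconds.

```python
"""Retry a flaky network call with a pause that grows after each failure."""
import time


def fetch_with_retry(fetch, retries=3, delay=2.0):
    """Call fetch() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: network errors vary
            last_error = error
            if attempt < retries:
                time.sleep(delay * attempt)  # wait longer after each failure
    raise RuntimeError(f"All {retries} attempts failed") from last_error


# Example usage (requires nba_api and an internet connection):
#
# from nba_api.stats.endpoints import playercareerstats
#
# career = fetch_with_retry(
#     lambda: playercareerstats.PlayerCareerStats(player_id=2544, timeout=30)
# )
# print(career.get_data_frames()[0].head())
```

Wrapping the request in a lambda delays it until fetch_with_retry calls it, so every attempt issues a fresh request. When looping over many players, adding a short time.sleep between successful calls is also a courteous way to avoid hammering the server.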

3.9.2 Other Useful Libraries

Basketball Reference Scraper:

pip install basketball_reference_scraper

from basketball_reference_scraper import players

# Get player stats
stats = players.get_stats('LeBron James', stat_type='PER_GAME',
                          playoffs=False, career=False)
print(stats)

pandas-datareader (for economic data that might correlate with sports business):

pip install pandas-datareader

plotly (for interactive visualizations):

pip install plotly

import plotly.express as px
import pandas as pd

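# Create an interactive shot chart from illustrative sample data
shot_data = pd.DataFrame({
    'x': [1, 5, 10, 15, 20, 23, 25],
    'y': [2, 8, 12, 5, 15, 10, 8],
    'made': [1, 0, 1, 1, 0, 1, 1]
})

# Label outcomes as text so Plotly draws two discrete colors, not a gradient
shot_data['result'] = shot_data['made'].map({1: 'Made', 0: 'Missed'})

fig = px.scatter(shot_data, x='x', y='y', color='result',
                 title='Shot Chart')
fig.show()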

3.10 Summary

In this chapter, we covered the complete setup of a Python environment for basketball analytics:

Python Installation: We walked through installing Python on Windows, macOS, and Linux systems, emphasizing the importance of adding Python to PATH.
Package Managers: We explored pip for installing Python packages and Conda for managing complex dependencies, providing guidance on when to use each.
Essential Libraries: We introduced the core data science libraries (pandas, NumPy, Matplotlib, Seaborn, scikit-learn, statsmodels) with practical examples relevant to basketball analytics.
Jupyter Notebooks: We covered the installation and effective use of Jupyter Notebooks for interactive analysis, including best practices and keyboard shortcuts.
Version Control: We introduced Git for tracking changes, collaborating with others, and maintaining a history of your analysis work.
Reproducibility: We discussed project organization, documentation standards, and techniques for ensuring others can reproduce your results.
Virtual Environments: We explored venv, virtualenv, and Conda environments for isolating project dependencies.
Basketball-Specific Tools: We introduced libraries like nba_api that provide direct access to basketball statistics.

With your environment now configured, you are ready to begin loading and analyzing real basketball data in Chapter 4. Remember that a well-organized, reproducible environment is the foundation of quality analytics work. Take the time to establish good habits now, and they will serve you throughout your analytics career.

Key Terms

Package Manager: Software tool that automates installing, upgrading, and removing software packages
pip: Python's default package manager, installing from PyPI
Conda: Cross-platform package manager popular in data science
Virtual Environment: Isolated Python installation with its own packages
Repository: A storage location for software packages or version-controlled code
Kernel: The computational engine that executes code in Jupyter notebooks
Git: Distributed version control system
Commit: A snapshot of your project at a specific point in time
Branch: An independent line of development in version control
Reproducibility: The ability for others to obtain the same results from your analysis

Chapter 3 Checklist

Before moving to the next chapter, ensure you can:

[ ] Install Python and verify the installation
[ ] Use pip to install and manage packages
[ ] Create and activate virtual environments
[ ] Install and use essential data science libraries
[ ] Create and run Jupyter notebooks
[ ] Initialize a Git repository and make commits
[ ] Create a proper project structure
[ ] Write and use a requirements.txt file
[ ] Document your code with docstrings
[ ] Troubleshoot common installation issues
