In This Chapter
- Introduction
- 3.1 Installing Python
- 3.2 Package Managers: pip and conda
- 3.3 Essential Libraries for Basketball Analytics
- 3.4 Jupyter Notebooks and Development Workflow
- 3.5 Version Control with Git
- 3.6 Best Practices for Reproducible Analysis
- 3.7 Virtual Environments and Dependency Management
- 3.8 Complete Installation Guide
- 3.9 Working with Basketball-Specific Libraries
- 3.10 Summary
- Key Terms
- Chapter 3 Checklist
Chapter 3: Python Environment Setup
Introduction
Welcome to the practical foundation of your basketball analytics journey. In this chapter, we will guide you through setting up a complete Python development environment optimized for sports analytics work. Whether you are analyzing player shooting percentages, building predictive models for game outcomes, or creating visualizations of team performance trends, having a properly configured environment is essential for productive and reproducible analysis.
Python has emerged as the dominant programming language in sports analytics for several compelling reasons. Its readable syntax makes it accessible to analysts who may not have formal computer science training. The extensive ecosystem of data science libraries provides powerful tools for statistical analysis, machine learning, and visualization. Additionally, Python's versatility allows analysts to work across the entire analytics pipeline, from data collection and cleaning to model deployment and reporting.
This chapter assumes no prior experience with Python or programming. We will start from scratch, walking through each installation step with detailed explanations. By the end of this chapter, you will have a fully functional analytics environment ready to tackle the basketball data analysis projects in subsequent chapters.
3.1 Installing Python
3.1.1 Understanding Python Versions
Python comes in different versions, and understanding version numbering is important for compatibility. As of this writing, Python 3.11 and 3.12 are the current stable releases, with Python 3.10 still widely used in production environments. We recommend using Python 3.10 or 3.11 for maximum compatibility with data science libraries.
Python 2 reached its end of life in January 2020 and should not be used for new projects. All code examples in this textbook use Python 3 syntax.
3.1.2 Installation on Windows
Step 1: Download the Python Installer
Visit the official Python website at https://www.python.org/downloads/. The website should automatically detect your operating system and suggest the appropriate download. Click the yellow "Download Python 3.x.x" button to download the installer.
Step 2: Run the Installer
Locate the downloaded file (usually in your Downloads folder) and double-click to run it. The installer window will appear with several options.
Important: Before clicking "Install Now," check the box at the bottom labeled "Add python.exe to PATH" (older installers say "Add Python 3.x to PATH"). This step is crucial for using Python from the command line.
For a custom installation, click "Customize installation" and ensure the following options are selected:
- pip (Python's package installer)
- py launcher
- for all users (if you have administrator privileges)
Step 3: Verify the Installation
Open Command Prompt (search for "cmd" in the Start menu) and type:
python --version
You should see output like:
Python 3.11.5
Also verify pip is installed:
pip --version
Expected output:
pip 23.2.1 from C:\Users\YourName\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip (python 3.11)
3.1.3 Installation on macOS
Method 1: Using the Official Installer
- Visit https://www.python.org/downloads/macos/
- Download the macOS installer package
- Open the .pkg file and follow the installation wizard
- Complete the installation by allowing any necessary permissions
Method 2: Using Homebrew (Recommended)
Homebrew is a package manager for macOS that simplifies software installation. If you do not have Homebrew installed, open Terminal and run:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Once Homebrew is installed, install Python:
brew install python@3.11
Verify the installation:
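python3.11 --version
python3.11 -m pip --version
Homebrew installs this version as python3.11; the unversioned python3 command may point to a different Homebrew Python.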
3.1.4 Installation on Linux
Most Linux distributions come with Python pre-installed. However, you may want to install a specific version.
Ubuntu/Debian:
sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip
Fedora:
sudo dnf install python3.11 python3-pip
Arch Linux:
sudo pacman -S python python-pip
Verify your installation:
python3 --version
pip3 --version
3.2 Package Managers: pip and conda
3.2.1 Understanding pip
pip (Pip Installs Packages) is Python's default package manager. It allows you to install, upgrade, and remove Python packages from the Python Package Index (PyPI), a repository containing over 400,000 packages.
Basic pip Commands
Installing a package:
pip install package_name
Installing a specific version:
pip install package_name==1.2.3
Upgrading a package:
pip install --upgrade package_name
Uninstalling a package:
pip uninstall package_name
Listing installed packages:
pip list
Showing package information:
pip show package_name
Installing Multiple Packages from a Requirements File
Create a text file named requirements.txt:
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.3.0
statsmodels==0.14.0
jupyter==1.0.0
Install all packages at once:
pip install -r requirements.txt
3.2.2 Understanding Conda
Conda is an alternative package manager that excels at managing complex dependencies, especially for scientific computing. It can install packages from multiple repositories (channels) and manage non-Python dependencies.
Installing Conda
Conda is available through two distributions:
- Anaconda: A full distribution including Conda, Python, and over 1,500 scientific packages. Large download (approximately 3GB) but convenient for beginners.
- Miniconda: A minimal installation containing only Conda and Python. Smaller download (approximately 50MB), and you install packages as needed.
If you choose Conda for basketball analytics, we recommend Miniconda for more control over your environment.
Installing Miniconda:
- Visit https://docs.conda.io/en/latest/miniconda.html
- Download the installer for your operating system
- Run the installer and follow the prompts
- Accept the license agreement and choose installation options
Basic Conda Commands
Creating a new environment:
conda create --name basketball_analytics python=3.11
Activating an environment:
conda activate basketball_analytics
Installing packages:
conda install pandas numpy matplotlib
Installing from conda-forge (a community channel with more packages):
conda install -c conda-forge seaborn
Listing environments:
conda env list
Exporting an environment:
conda env export > environment.yml
Creating an environment from a file:
conda env create -f environment.yml
3.2.3 pip vs Conda: When to Use Each
| Feature | pip | Conda |
|---|---|---|
| Package source | PyPI | Anaconda repositories, conda-forge |
| Non-Python dependencies | Limited support | Full support |
| Environment management | Requires venv or virtualenv | Built-in |
| Speed | Generally faster | Can be slower for complex dependencies |
| Disk space | Smaller footprint | Larger footprint |
Our Recommendation:
For this textbook, we will primarily use pip with virtual environments (venv). This approach provides a lightweight setup that works well for basketball analytics projects. If you encounter dependency conflicts or need packages with complex system-level dependencies, consider switching to Conda.
3.3 Essential Libraries for Basketball Analytics
3.3.1 pandas: Data Manipulation and Analysis
pandas is the cornerstone library for data analysis in Python. It provides two primary data structures: Series (one-dimensional) and DataFrame (two-dimensional, tabular data). For basketball analytics, you will use pandas to:
- Load and save data from various formats (CSV, Excel, JSON, SQL databases)
- Clean and preprocess player statistics
- Merge datasets (combining player data with team data)
- Calculate aggregate statistics (per-game averages, totals)
- Filter and sort data based on conditions
Installation:
pip install pandas
Basic Usage Example:
import pandas as pd
# Create a DataFrame with player statistics
player_stats = pd.DataFrame({
'player': ['LeBron James', 'Stephen Curry', 'Giannis Antetokounmpo'],
'points_per_game': [25.7, 29.4, 31.1],
'assists_per_game': [8.3, 6.3, 5.7],
'rebounds_per_game': [7.5, 6.1, 11.6]
})
# Display the DataFrame
print(player_stats)
# Calculate average points per game
print(f"Average PPG: {player_stats['points_per_game'].mean():.1f}")
# Filter players with more than 28 points per game
high_scorers = player_stats[player_stats['points_per_game'] > 28]
print(high_scorers)
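The bullet list above also mentions loading files, merging player data with team data, and computing aggregates. The following is a minimal sketch under stated assumptions. It uses a small hypothetical game-log CSV, read from an in-memory string so the example runs on its own, and a hypothetical team lookup table. The column names and values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical game logs; with a real file you would call pd.read_csv('path/to/file.csv')
csv_text = """player,team,game,points,minutes
Player A,BOS,1,24,34
Player A,BOS,2,30,36
Player B,BOS,1,12,28
Player B,BOS,2,16,30
Player C,DEN,1,27,35
Player C,DEN,2,21,33
"""
game_logs = pd.read_csv(io.StringIO(csv_text))

# Hypothetical team lookup table to attach to each game-log row
teams = pd.DataFrame({
    'team': ['BOS', 'DEN'],
    'conference': ['East', 'West'],
})

# A left merge keeps every game-log row and adds the matching conference
merged = game_logs.merge(teams, on='team', how='left')

# Aggregate to per-player games played, points per game, and total points
per_player = (
    merged.groupby(['player', 'conference'], as_index=False)
    .agg(games=('game', 'count'),
         ppg=('points', 'mean'),
         total_points=('points', 'sum'))
    .sort_values('ppg', ascending=False)
)
print(per_player)
```

With `how='left'`, a player whose team is missing from the lookup table is kept and gets `NaN` for `conference`. It is not silently dropped, so unmatched keys are easy to spot.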
3.3.2 NumPy: Numerical Computing
NumPy provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on them efficiently. In basketball analytics, NumPy is used for:
- Fast numerical computations
- Linear algebra operations
- Statistical calculations
- Random number generation for simulations
Installation:
pip install numpy
Basic Usage Example:
import numpy as np
# Create an array of shot distances (in feet)
shot_distances = np.array([3, 15, 22, 24, 25, 8, 12, 26, 4, 18])
# Calculate summary statistics (np.std uses the population formula, ddof=0, by default)
print(f"Mean distance: {np.mean(shot_distances):.1f} feet")
print(f"Median distance: {np.median(shot_distances):.1f} feet")
print(f"Standard deviation: {np.std(shot_distances):.1f} feet")
# Count shots by zone (simplified: 23.75 ft is the above-the-break three-point distance;
# corner threes at 22 ft and the true paint boundaries are ignored here)
three_pointers = np.sum(shot_distances >= 23.75)
mid_range = np.sum((shot_distances >= 10) & (shot_distances < 23.75))
paint = np.sum(shot_distances < 10)
print(f"Three-pointers: {three_pointers}")
print(f"Mid-range: {mid_range}")
print(f"Paint: {paint}")
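The "random number generation for simulations" use case listed above can be sketched as a small Monte Carlo experiment. The example assumes a hypothetical 80% free-throw shooter whose attempts are independent, which is a simplification. It uses NumPy's Generator API (`np.random.default_rng`) and compares the simulated chance of making both shots of a two-shot trip with the exact value p².

```python
import numpy as np

# Hypothetical shooter: 80% free-throw percentage, independent attempts (an assumption)
FT_PCT = 0.80
N_TRIALS = 100_000

rng = np.random.default_rng(seed=42)  # seeded so the results are reproducible

# Each trial is a two-shot trip to the line; the draw is the number of makes (0, 1, or 2)
makes_per_trip = rng.binomial(n=2, p=FT_PCT, size=N_TRIALS)

# Simulated probability of making both shots vs. the exact value p**2
p_both_sim = np.mean(makes_per_trip == 2)
p_both_exact = FT_PCT ** 2

print(f"Simulated P(2 of 2): {p_both_sim:.3f}")
print(f"Exact P(2 of 2):     {p_both_exact:.3f}")
print(f"Expected points per trip: {makes_per_trip.mean():.2f}")
```

With 100,000 trials, the simulated value should land within about a percentage point of the exact 0.64. The same pattern extends to cases that have no simple closed form, such as late-game scenarios.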
3.3.3 Matplotlib: Basic Visualization
Matplotlib is the foundational plotting library in Python. While it has a learning curve, it provides complete control over every aspect of your visualizations.
Installation:
pip install matplotlib
Basic Usage Example:
import matplotlib.pyplot as plt
import numpy as np
# Sample data: games and points scored
games = np.arange(1, 11)
points = [28, 32, 25, 30, 35, 29, 33, 27, 31, 34]
# Create a line plot
plt.figure(figsize=(10, 6))
plt.plot(games, points, marker='o', linewidth=2, markersize=8)
plt.xlabel('Game Number', fontsize=12)
plt.ylabel('Points Scored', fontsize=12)
plt.title('Player Scoring Over 10-Game Stretch', fontsize=14)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('scoring_trend.png', dpi=300)
plt.show()
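The example above uses pyplot's implicit "current figure" interface. Matplotlib also offers an explicit object-oriented interface: you create the Figure and Axes with `plt.subplots` and call methods on them, which tends to scale better once a figure has several panels. The following minimal sketch reuses the same sample scoring data. The two-panel layout and the output filename are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

games = np.arange(1, 11)
points = np.array([28, 32, 25, 30, 35, 29, 33, 27, 31, 34])

# Explicit Figure and Axes objects instead of pyplot's implicit current axes
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: scoring trend with the 10-game average as a dashed reference line
ax_line.plot(games, points, marker='o', linewidth=2)
ax_line.axhline(points.mean(), color='gray', linestyle='--',
                label=f'Average: {points.mean():.1f}')
ax_line.set(xlabel='Game Number', ylabel='Points Scored', title='Scoring Trend')
ax_line.legend()

# Right panel: distribution of the same scores
ax_hist.hist(points, bins=5, edgecolor='black')
ax_hist.set(xlabel='Points Scored', ylabel='Games', title='Scoring Distribution')

fig.tight_layout()
fig.savefig('scoring_panels.png', dpi=150)
```

In a notebook the figure displays automatically. In a script, call `plt.show()` after saving.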
3.3.4 Seaborn: Statistical Visualization
Seaborn builds on Matplotlib and provides a high-level interface for creating attractive statistical graphics. It integrates closely with pandas DataFrames.
Installation:
pip install seaborn
Basic Usage Example:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
df = pd.DataFrame({
'position': ['PG', 'SG', 'SF', 'PF', 'C'] * 10,
'points': [18, 22, 20, 17, 15, 21, 25, 19, 16, 14,
20, 23, 21, 18, 16, 19, 24, 20, 17, 15,
22, 26, 22, 19, 17, 20, 25, 21, 18, 16,
17, 21, 19, 16, 14, 23, 27, 23, 20, 18,
19, 23, 20, 17, 15, 22, 26, 22, 19, 17]
})
# Create a box plot
plt.figure(figsize=(10, 6))
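# seaborn >= 0.13 warns when palette is passed without hue; there, add hue='position', legend=False
sns.boxplot(x='position', y='points', data=df, palette='Set2')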
plt.xlabel('Position', fontsize=12)
plt.ylabel('Points per Game', fontsize=12)
plt.title('Scoring Distribution by Position', fontsize=14)
plt.tight_layout()
plt.show()
3.3.5 Scikit-learn: Machine Learning
Scikit-learn provides simple and efficient tools for data mining and machine learning. In basketball analytics, you might use it for:
- Predicting game outcomes
- Clustering players by playing style
- Player similarity analysis
- Performance prediction models
Installation:
pip install scikit-learn
Basic Usage Example:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
# Sample data: minutes played vs points scored
minutes = np.array([25, 30, 35, 28, 32, 38, 22, 27, 33, 36]).reshape(-1, 1)
points = np.array([12, 18, 22, 15, 20, 25, 10, 14, 21, 24])
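# Split data into training and testing sets (with 10 samples, the test set has only 2 points,
# so the R-squared below is illustrative rather than a reliable accuracy estimate)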
X_train, X_test, y_train, y_test = train_test_split(
minutes, points, test_size=0.2, random_state=42
)
# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
print(f"Coefficient: {model.coef_[0]:.2f} points per minute")
print(f"R-squared: {model.score(X_test, y_test):.2f}")
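"Clustering players by playing style," listed above, can be sketched with k-means. The example rests on several assumptions:
- A small hypothetical table of per-36-minute rates (the values are made up, not real players).
- Features standardized with `StandardScaler` so that no single stat dominates the distance calculation.
- The number of clusters (k = 2) chosen by hand.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-36-minute rates (illustrative values, not real players)
players = pd.DataFrame({
    'player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'assists': [9.0, 8.5, 7.8, 2.1, 1.8, 2.5],
    'rebounds': [4.0, 4.5, 5.0, 11.5, 12.0, 10.8],
    'threes_attempted': [7.0, 6.5, 8.0, 0.5, 1.0, 0.8],
})

features = players[['assists', 'rebounds', 'threes_attempted']]

# Standardize so each stat contributes on a comparable scale
scaled = StandardScaler().fit_transform(features)

# n_init is set explicitly so results do not depend on the library's default
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
players['cluster'] = kmeans.fit_predict(scaled)

print(players)
```

With real data, k is usually chosen by comparing several values, for example with inertia or silhouette scores, rather than fixed in advance.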
3.3.6 Statsmodels: Statistical Modeling
Statsmodels provides classes and functions for estimating statistical models, performing statistical tests, and exploring data. It offers more detailed statistical output than scikit-learn, including standard errors, p-values, and confidence intervals.
Installation:
pip install statsmodels
Basic Usage Example:
import statsmodels.api as sm
import numpy as np
import pandas as pd
# Sample data
np.random.seed(42)
data = pd.DataFrame({
    'minutes': np.random.uniform(20, 40, 50),
    'usage_rate': np.random.uniform(15, 35, 50),
})
# Simulate points as a linear function of minutes and usage rate, plus random noise
data['points'] = 0.4 * data['minutes'] + 0.3 * data['usage_rate'] + np.random.normal(0, 3, 50)
# Prepare data for regression
X = data[['minutes', 'usage_rate']]
X = sm.add_constant(X) # Add intercept
y = data['points']
# Fit the model
model = sm.OLS(y, X).fit()
# Print summary statistics
print(model.summary())
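statsmodels also provides a formula interface (`statsmodels.formula.api`) that accepts R-style formulas and adds the intercept automatically, so `sm.add_constant` is not needed. The following minimal sketch re-creates similar simulated data so it runs on its own. The true coefficients 0.4 and 0.3 are assumptions built into the simulation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
data = pd.DataFrame({
    'minutes': rng.uniform(20, 40, n),
    'usage_rate': rng.uniform(15, 35, n),
})
# Simulated relationship: the true slopes are 0.4 (minutes) and 0.3 (usage rate)
data['points'] = (0.4 * data['minutes'] + 0.3 * data['usage_rate']
                  + rng.normal(0, 3, n))

# The formula interface adds the intercept automatically
model = smf.ols('points ~ minutes + usage_rate', data=data).fit()

print(model.params)                 # intercept and slopes
print(model.conf_int(alpha=0.05))   # 95% confidence intervals
print(f"R-squared: {model.rsquared:.3f}")
```

The estimated slopes should land close to 0.4 and 0.3. The confidence intervals show how precisely 200 observations pin them down.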
3.3.7 Installing All Essential Libraries
To install all the libraries discussed above in one command:
pip install pandas numpy matplotlib seaborn scikit-learn statsmodels jupyter
Or create a requirements.txt file:
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.3.0
statsmodels>=0.14.0
jupyter>=1.0.0
openpyxl>=3.1.0
xlrd>=2.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0
And install with:
pip install -r requirements.txt
3.4 Jupyter Notebooks and Development Workflow
3.4.1 What Are Jupyter Notebooks?
Jupyter Notebooks are interactive documents that combine code, text, visualizations, and output in a single file. They are ideal for:
- Exploratory data analysis
- Creating reproducible research
- Sharing analysis with stakeholders
- Teaching and learning
The name "Jupyter" comes from three programming languages: Julia, Python, and R, although Python is the most commonly used kernel.
3.4.2 Installing Jupyter
Using pip:
pip install jupyter
Using conda:
conda install jupyter
For enhanced functionality, also install JupyterLab:
pip install jupyterlab
3.4.3 Starting Jupyter Notebook
Open your terminal or command prompt and navigate to your project directory:
cd path/to/your/project
jupyter notebook
This command starts the Jupyter server and opens a browser window with the Jupyter interface. You will see a file browser showing the contents of your current directory.
To start JupyterLab instead:
jupyter lab
3.4.4 Creating and Using Notebooks
Creating a New Notebook:
- Click "New" in the upper right corner
- Select "Python 3" under Notebook
Understanding the Interface:
- Cells: Notebooks are composed of cells, which can contain code or markdown text
- Code Cells: Execute Python code and display output
- Markdown Cells: Display formatted text, headers, lists, and equations
Essential Keyboard Shortcuts:
| Shortcut | Action |
|---|---|
| Shift + Enter | Run cell and move to next |
| Ctrl + Enter | Run cell and stay |
| Esc | Enter command mode |
| Enter | Enter edit mode |
| A | Insert cell above (command mode) |
| B | Insert cell below (command mode) |
| D, D (press D twice) | Delete cell (command mode) |
| M | Change to markdown cell (command mode) |
| Y | Change to code cell (command mode) |
| Ctrl + S | Save notebook |
3.4.5 Jupyter Best Practices for Analytics
1. Start with Imports and Configuration
Always begin your notebook with all imports and configuration settings:
# Standard library imports
import os
import sys
from datetime import datetime
# Data manipulation
import pandas as pd
import numpy as np
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Configure display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)
# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
# Display plots inline
%matplotlib inline
# Autoreload modules (useful during development)
%load_ext autoreload
%autoreload 2
2. Document Your Analysis
Use markdown cells liberally to explain:
- The purpose of each section
- Assumptions you are making
- Interpretation of results
- Data source information
3. Keep Cells Focused
Each cell should do one thing well. This makes debugging easier and improves readability.
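4. Restart and Run All Before Sharing
Before sharing a notebook, restart the kernel and run every cell from top to bottom to confirm that it runs cleanly and reproduces its results (Kernel → Restart & Run All in the classic Notebook; Kernel → Restart Kernel and Run All Cells in JupyterLab).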
5. Use Descriptive Variable Names
# Good
player_shooting_stats = pd.read_csv('shooting_data.csv')
three_point_percentage = made_threes / attempted_threes
# Avoid
df = pd.read_csv('shooting_data.csv')
pct = m / a
3.4.6 Alternative Development Environments
While Jupyter Notebooks are excellent for exploration, you may also want to use:
Visual Studio Code
VS Code is a powerful, free code editor with excellent Python support. Install the Python extension for:
- Syntax highlighting
- Code completion
- Integrated debugging
- Jupyter notebook support within VS Code
PyCharm
PyCharm is a full-featured Python IDE available in free (Community) and paid (Professional) versions. It offers:
- Advanced code refactoring
- Built-in version control
- Database tools
- Professional debugging
Spyder
Spyder is designed specifically for scientific computing. It includes:
- Variable explorer
- Interactive console
- Debugging tools
- Integration with scientific libraries
3.5 Version Control with Git
3.5.1 Why Version Control?
Version control systems track changes to files over time. For basketball analytics projects, version control helps you:
- Keep a history of all changes to your code and analysis
- Collaborate with teammates without overwriting each other's work
- Experiment with new approaches without losing working code
- Roll back to previous versions if something breaks
- Share your work publicly or privately
3.5.2 Installing Git
Windows:
- Download the installer from https://git-scm.com/download/win
- Run the installer and follow the prompts
- Accept default settings unless you have specific preferences
macOS:
Using Homebrew:
brew install git
Or install Xcode Command Line Tools:
xcode-select --install
Linux:
Ubuntu/Debian:
sudo apt install git
Fedora:
sudo dnf install git
Verify installation:
git --version
3.5.3 Basic Git Configuration
Set your identity (replace with your information):
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
Set default branch name:
git config --global init.defaultBranch main
Configure line endings (important for cross-platform work):
# Windows
git config --global core.autocrlf true
# macOS/Linux
git config --global core.autocrlf input
3.5.4 Essential Git Commands
Initializing a Repository:
# Navigate to your project folder
cd path/to/basketball_analytics
# Initialize Git repository
git init
Basic Workflow:
# Check status of your files
git status
# Add files to staging area
git add filename.py
git add . # Add all files
# Commit changes with a message
git commit -m "Add player statistics analysis script"
# View commit history
git log
git log --oneline # Compact view
Working with Branches:
# Create a new branch
git branch feature/shooting-analysis
# Switch to a branch
git checkout feature/shooting-analysis
# Create and switch in one command
git checkout -b feature/new-feature
# List all branches
git branch
# Merge a branch into current branch
git merge feature/shooting-analysis
Working with Remote Repositories (GitHub):
# Clone an existing repository
git clone https://github.com/username/repository.git
# Add a remote
git remote add origin https://github.com/username/repository.git
# Push changes to remote
git push origin main
# Pull changes from remote
git pull origin main
3.5.5 Creating a .gitignore File
A .gitignore file specifies which files Git should ignore. Create this file in your project root:
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv/
# Jupyter Notebooks
.ipynb_checkpoints/
*.ipynb_checkpoints/
# Data files (often too large for Git)
data/raw/
*.csv
*.xlsx
*.parquet
# Exception: keep sample data (Git treats "#" as a comment only at the start of a line)
!data/sample_data.csv
# IDE settings
.idea/
.vscode/
*.swp
*.swo
# OS files
.DS_Store
Thumbs.db
# Sensitive information
.env
secrets.json
credentials/
# Output files
output/
reports/*.pdf
*.log
3.5.6 Git Best Practices for Analytics
- Commit Often: Make small, focused commits that do one thing
- Write Good Commit Messages: Use present tense, be descriptive
- Use Branches: Keep experimental work separate from stable code
- Never Commit Secrets: API keys, passwords, credentials
- Keep Data Separate: Large data files should not be in Git
Example Commit Messages:
# Good
Add function to calculate player efficiency rating
Fix bug in three-point percentage calculation
Update season statistics through week 12
# Avoid
Fixed stuff
Updates
asdf
3.6 Best Practices for Reproducible Analysis
3.6.1 What Is Reproducibility?
Reproducibility means that someone else (or future you) can run your analysis and get the same results. This requires:
- Clear documentation of data sources
- Recorded software versions
- Complete code that runs without manual intervention
- Documented random seeds for any stochastic processes
3.6.2 Project Structure
Organize your basketball analytics projects with a consistent structure:
basketball_analytics_project/
│
├── data/
│ ├── raw/ # Original, immutable data
│ ├── processed/ # Cleaned, transformed data
│ └── external/ # Data from third-party sources
│
├── notebooks/
│ ├── 01_data_exploration.ipynb
│ ├── 02_data_cleaning.ipynb
│ └── 03_analysis.ipynb
│
├── src/
│ ├── __init__.py
│ ├── data/ # Data loading and processing
│ │ ├── __init__.py
│ │ └── loaders.py
│ ├── features/ # Feature engineering
│ │ ├── __init__.py
│ │ └── build_features.py
│ ├── models/ # Model training and prediction
│ │ ├── __init__.py
│ │ └── train_model.py
│ └── visualization/ # Plotting functions
│ ├── __init__.py
│ └── visualize.py
│
├── tests/
│ └── test_data_loading.py
│
├── output/
│ ├── figures/
│ └── reports/
│
├── requirements.txt
├── environment.yml
├── README.md
├── .gitignore
└── setup.py
3.6.3 Documentation Best Practices
README.md Template:
# Project Name
Brief description of the project.
## Installation
```bash
pip install -r requirements.txt
```
## Data
Describe data sources and how to obtain them.
## Usage
```bash
python src/main.py
```
## Project Organization
Explain the folder structure.
## Contributing
Guidelines for contributors.
## License
License information.
Code Documentation:
Use docstrings for functions:
```python
def calculate_true_shooting_percentage(points, fga, fta):
    """
    Calculate True Shooting Percentage (TS%).

    TS% accounts for the value of three-point field goals and
    free throws in addition to conventional two-point field goals.

    Parameters
    ----------
    points : int or float
        Total points scored
    fga : int
        Field goal attempts
    fta : int
        Free throw attempts

    Returns
    -------
    float
        True Shooting Percentage as a decimal (e.g., 0.580)

    Examples
    --------
    >>> round(calculate_true_shooting_percentage(25, 18, 6), 3)
    0.606

    Notes
    -----
    Formula: TS% = PTS / (2 * (FGA + 0.44 * FTA))
    The 0.44 coefficient is an empirical estimate of the
    proportion of free throws that end a possession.
    """
    if fga == 0 and fta == 0:
        return 0.0
    tsa = fga + 0.44 * fta  # True Shooting Attempts
    return points / (2 * tsa)
```
3.6.4 Setting Random Seeds
For reproducible results in any analysis involving randomness:
import numpy as np
import random
from sklearn.model_selection import train_test_split
# Set random seeds at the beginning of your script/notebook
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
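# Example data so the snippet runs on its own (replace with your own features and target)
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)
# Use random_state in scikit-learn functions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_SEED
)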
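NumPy's newer Generator API (`np.random.default_rng`) stores the seed in an explicit object rather than in global state, which makes it easier to see where the randomness in an analysis comes from. The following minimal sketch shows that two generators built from the same seed produce identical results. It applies them to a percentile bootstrap of a player's per-game points. The points values and the helper name `bootstrap_mean_ci` are hypothetical.

```python
import numpy as np

RANDOM_SEED = 42

# Hypothetical per-game points for one player (illustrative values)
points = np.array([28, 32, 25, 30, 35, 29, 33, 27, 31, 34])


def bootstrap_mean_ci(data, n_boot, seed):
    """Percentile bootstrap 95% CI for the mean, driven by a seeded Generator."""
    rng = np.random.default_rng(seed)
    # Resample the games with replacement n_boot times and take each sample's mean
    samples = rng.choice(data, size=(n_boot, data.size), replace=True)
    means = samples.mean(axis=1)
    return np.percentile(means, [2.5, 97.5])


ci_first = bootstrap_mean_ci(points, n_boot=5_000, seed=RANDOM_SEED)
ci_second = bootstrap_mean_ci(points, n_boot=5_000, seed=RANDOM_SEED)

# Same seed, same draws: the two runs agree exactly
print(ci_first, ci_second)
print("Identical:", np.array_equal(ci_first, ci_second))
```

Passing the seed into the function, instead of relying on `np.random.seed` elsewhere in a notebook, keeps the result reproducible even if cells are run out of order.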
3.6.5 Recording Package Versions
Always record the exact versions of packages used:
pip freeze > requirements.txt
Or create a more readable requirements file manually:
# Core data science
pandas==2.0.3
numpy==1.24.3
scipy==1.11.2
# Visualization
matplotlib==3.7.2
seaborn==0.12.2
# Machine learning
scikit-learn==1.3.0
statsmodels==0.14.0
# Development
jupyter==1.0.0
pytest==7.4.0
For conda environments:
conda env export --no-builds > environment.yml
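`pip freeze` records everything installed in the environment. You may also want a notebook or report to log the versions it actually ran with. The standard library's `importlib.metadata` (Python 3.8+) can look these up by distribution name. The following is a minimal sketch. The package list is an assumption to adapt to your project, and `collect_versions` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip (scikit-learn, not the import name sklearn)
PACKAGES = ['pandas', 'numpy', 'scipy', 'matplotlib', 'seaborn',
            'scikit-learn', 'statsmodels']


def collect_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


print(f"Python {platform.python_version()}")
for name, ver in collect_versions(PACKAGES).items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this at the top of a notebook leaves a version record in the saved output, next to the results it produced.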
3.7 Virtual Environments and Dependency Management
3.7.1 Why Use Virtual Environments?
Virtual environments isolate project dependencies, preventing conflicts between projects. Consider this scenario:
- Project A requires pandas 1.5.0
- Project B requires pandas 2.0.0
Without virtual environments, you cannot satisfy both requirements simultaneously. Virtual environments solve this by creating isolated Python installations for each project.
3.7.2 Using venv (Built-in)
Python 3.3+ includes the venv module for creating virtual environments.
Creating a Virtual Environment:
# Navigate to your project directory
cd basketball_analytics
# Create a virtual environment
python -m venv venv
Activating the Environment:
Windows Command Prompt:
venv\Scripts\activate.bat
Windows PowerShell:
venv\Scripts\Activate.ps1
macOS/Linux:
source venv/bin/activate
When activated, your prompt will show the environment name:
(venv) C:\Users\YourName\basketball_analytics>
Installing Packages in the Environment:
pip install pandas numpy matplotlib
Deactivating the Environment:
deactivate
Deleting a Virtual Environment:
Simply delete the venv folder:
# Windows
rmdir /s /q venv
# macOS/Linux
rm -rf venv
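A common cause of "installed but not found" errors is running code outside the environment you installed into. Python itself can report whether a venv is active. Inside a venv (and recent virtualenv releases), `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base installation. The following is a minimal check. It does not detect Conda environments, which are separate installations where the two prefixes are equal; there, inspect `sys.executable` instead. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv environment."""
    # venv sets sys.prefix to the environment; sys.base_prefix stays the base install
    return sys.prefix != sys.base_prefix


print(f"Interpreter:    {sys.executable}")
print(f"Environment:    {sys.prefix}")
print(f"In virtual env: {in_virtualenv()}")
```

Running the same lines in a Jupyter cell shows which interpreter the kernel is using. A mismatch there explains many "ModuleNotFoundError" reports.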
3.7.3 Using Conda Environments
Conda environments offer additional features, including the ability to manage non-Python dependencies.
Creating an Environment:
# Create environment with specific Python version
conda create --name basketball_analytics python=3.11
# Create environment from YAML file
conda env create -f environment.yml
Example environment.yml:
name: basketball_analytics
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pandas>=2.0.0
  - numpy>=1.24.0
  - matplotlib>=3.7.0
  - seaborn>=0.12.0
  - scikit-learn>=1.3.0
  - statsmodels>=0.14.0
  - jupyter>=1.0.0
  - pip
  - pip:
      - nba_api>=1.2.0
      - basketball_reference_scraper>=0.1.0
Managing Environments:
# Activate
conda activate basketball_analytics
# List environments
conda env list
# Deactivate
conda deactivate
# Remove environment
conda env remove --name basketball_analytics
# Update all packages
conda update --all
# Export environment
conda env export > environment.yml
3.7.4 Using virtualenv (Third-party)
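virtualenv is a third-party tool that adds features beyond the built-in venv, such as faster environment creation and the -p option for choosing which Python interpreter the environment uses: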
Installation:
pip install virtualenv
Usage:
# Create environment
virtualenv venv
# Create with specific Python version
virtualenv -p python3.11 venv
# Activate (same as venv)
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
3.7.5 Best Practices for Environment Management
- One Environment Per Project: Do not share environments between projects
- Include Requirements File: Always provide requirements.txt or environment.yml
- Pin Versions: Specify exact versions for production environments
- Document Setup: Include setup instructions in your README
- Do Not Commit Environment Folders: Add venv/ to .gitignore
Complete Project Setup Script:
Create a setup.sh (macOS/Linux) or setup.bat (Windows) file:
setup.sh:
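#!/bin/bash
# Stop at the first command that fails
set -e
# Create virtual environment (python3 is the usual command on macOS/Linux)
python3 -m venv venv
# Activate environment (this lasts only while the script runs)
source venv/bin/activate
# Upgrade pip
python -m pip install --upgrade pip
# Install dependencies
pip install -r requirements.txt
# Install development dependencies (assumes requirements-dev.txt exists and lists pre-commit)
pip install -r requirements-dev.txt
# Install pre-commit hooks (assumes a .pre-commit-config.yaml in the project root)
pre-commit install
echo "Environment setup complete! Run 'source venv/bin/activate' to start working."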
setup.bat:
@echo off
:: Create virtual environment
python -m venv venv
:: Activate environment
call venv\Scripts\activate.bat
:: Upgrade pip
python -m pip install --upgrade pip
:: Install dependencies
pip install -r requirements.txt
:: Install development dependencies
pip install -r requirements-dev.txt
echo Environment setup complete!
3.8 Complete Installation Guide
3.8.1 Recommended Setup for This Textbook
Follow these steps to set up your environment for all examples in this textbook:
Step 1: Install Python
Download and install Python 3.11 from python.org. Ensure you check "Add Python to PATH" during installation.
Step 2: Create a Project Directory
mkdir basketball_analytics_textbook
cd basketball_analytics_textbook
Step 3: Create a Virtual Environment
python -m venv venv
Step 4: Activate the Environment
Windows:
venv\Scripts\activate
macOS/Linux:
source venv/bin/activate
Step 5: Create requirements.txt
Create a file named requirements.txt with the following content:
# Core data science libraries
pandas>=2.0.0
numpy>=1.24.0
scipy>=1.11.0
# Visualization
matplotlib>=3.7.0
seaborn>=0.12.0
plotly>=5.15.0
# Machine learning and statistics
scikit-learn>=1.3.0
statsmodels>=0.14.0
# Jupyter environment
jupyter>=1.0.0
jupyterlab>=4.0.0
notebook>=7.0.0
ipywidgets>=8.0.0
# Data handling
openpyxl>=3.1.0
xlrd>=2.0.0
pyarrow>=12.0.0
# Web scraping
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0
# Basketball-specific APIs
nba_api>=1.2.0
# Development tools
pytest>=7.4.0
black>=23.7.0
flake8>=6.1.0
isort>=5.12.0
mypy>=1.5.0
# Documentation
sphinx>=7.0.0
Step 6: Install Dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt
Step 7: Verify Installation
Create a file named verify_installation.py:
"""Verify that all required packages are installed correctly."""
import importlib


def check_import(package_name, import_name=None):
    """Check if a package can be imported."""
    if import_name is None:
        import_name = package_name
    try:
        module = importlib.import_module(import_name)
        version = getattr(module, '__version__', 'unknown')
        print(f"[OK] {package_name}: {version}")
        return True
    except ImportError as e:
        print(f"[FAILED] {package_name}: {e}")
        return False


def main():
    """Run all package checks."""
    print("=" * 50)
    print("Basketball Analytics Environment Check")
    print("=" * 50)
    print()
    packages = [
        ('pandas', 'pandas'),
        ('numpy', 'numpy'),
        ('matplotlib', 'matplotlib'),
        ('seaborn', 'seaborn'),
        ('scikit-learn', 'sklearn'),
        ('statsmodels', 'statsmodels'),
        ('scipy', 'scipy'),
        ('jupyter', 'jupyter'),
        ('requests', 'requests'),
        ('beautifulsoup4', 'bs4'),
    ]
    all_ok = True
    for package_name, import_name in packages:
        if not check_import(package_name, import_name):
            all_ok = False
    print()
    print("=" * 50)
    if all_ok:
        print("All packages installed successfully!")
        print("Your environment is ready for basketball analytics.")
    else:
        print("Some packages failed to import.")
        print("Please review the errors above and reinstall.")
    print("=" * 50)


if __name__ == "__main__":
    main()
Run the verification:
python verify_installation.py
Step 8: Initialize Git
git init
Create a .gitignore file with appropriate entries (see Section 3.5.5).
Step 9: Start Jupyter
jupyter notebook
Or for JupyterLab:
jupyter lab
3.8.2 Troubleshooting Common Installation Issues
Issue: "python" command not found
Solution: Python may not be in your PATH. Try:
- Windows: Use py instead of python
- macOS/Linux: Use python3 instead of python
- Reinstall Python and check "Add to PATH"
- Manually add Python to your system PATH
Issue: pip install fails with permission error
Solution: Do not use sudo pip install on Linux/macOS. Instead:
- Use a virtual environment (recommended)
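- Use pip install --user package_name (recent Debian and Ubuntu releases, as well as Homebrew's Python, may reject this with an "externally-managed-environment" error; a virtual environment avoids the problem)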
Issue: Jupyter kernel not found
Solution: With your virtual environment activated, register it as a Jupyter kernel:
```bash
pip install ipykernel
python -m ipykernel install --user --name=basketball_analytics
```
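To confirm that the registration worked, you can run `jupyter kernelspec list` from the command line or query the same information from Python. The sketch below assumes the `jupyter_client` package is installed (Jupyter and ipykernel depend on it) and uses its `KernelSpecManager` class. The helper name `registered_kernels` is hypothetical.

```python
"""List the Jupyter kernels registered on this machine (a sketch)."""


def registered_kernels():
    """Return a dict mapping kernel names to their spec directories.

    Returns an empty dict if jupyter_client is not installed.
    """
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}
    return KernelSpecManager().find_kernel_specs()


kernels = registered_kernels()
if "basketball_analytics" in kernels:
    print("Kernel registered at", kernels["basketball_analytics"])
else:
    print("Kernel not found; rerun the ipykernel install command.")
```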
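Issue: ModuleNotFoundError after installing a package
Solution:
- Ensure you are in the correct virtual environment
- Install with python -m pip install package_name so the package goes into the interpreter you are actually running
- Try reinstalling: pip uninstall -y package_name && pip install package_name
- Restart your Jupyter kernel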
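Most ModuleNotFoundError problems happen because a package was installed into one interpreter while your code runs in another. You can run the sketch below in a script or a notebook cell to print which interpreter is active. The `sys.prefix` versus `sys.base_prefix` comparison detects environments created with venv or virtualenv. It does not flag Conda environments, because each of those is a complete Python installation of its own. The function name `environment_report` is hypothetical.

```python
"""Show which Python interpreter is running and whether it is a venv."""
import sys


def environment_report():
    """Collect basic facts about the running interpreter."""
    return {
        # Full path of the python executable actually running this code
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # venv/virtualenv point sys.prefix at the environment directory,
        # while sys.base_prefix still points at the base installation.
        # Conda environments are not detected by this check.
        "in_venv": sys.prefix != sys.base_prefix,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the executable path does not point inside your project environment, fix that first. Activate the environment, or switch the notebook to the correct kernel. Then reinstall the package.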
Issue: Conda conflicts when installing packages
Solution:
- Use conda update --all to update existing packages
- Create a fresh environment
- Use --no-deps flag and install dependencies manually
3.9 Working with Basketball-Specific Libraries
3.9.1 nba_api: Official NBA Statistics
The nba_api package is a community-maintained Python client for the statistics endpoints behind NBA.com.
Installation:
```bash
pip install nba_api
```
Basic Usage:
```python
from nba_api.stats.static import players, teams
from nba_api.stats.endpoints import playercareerstats, leagueleaders

# Find a player (returns a list of matching player dictionaries)
player_dict = players.find_players_by_full_name("LeBron James")
print(player_dict)

# Get career statistics, using the ID from the lookup (2544 for LeBron James)
lebron_id = player_dict[0]['id']
career = playercareerstats.PlayerCareerStats(player_id=lebron_id)
career_df = career.get_data_frames()[0]
print(career_df.head())

# Get league leaders
leaders = leagueleaders.LeagueLeaders(season='2023-24')
leaders_df = leaders.get_data_frames()[0]
print(leaders_df.head(10))
```
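A common next step is to convert the career data into per-game averages. The sketch below assumes that `career_df` holds season totals with `SEASON_ID`, `GP` (games played), and `PTS` columns. It also assumes you pass the rows as plain dictionaries via `career_df.to_dict('records')`. The helper name `points_per_game` is hypothetical.

```python
"""Derive points per game from season-total rows (illustrative sketch)."""


def points_per_game(rows):
    """Return a list of (season, points per game) tuples.

    Assumes each row is a dict with 'SEASON_ID', 'GP', and 'PTS' keys,
    as produced by career_df.to_dict('records'). Seasons with zero games
    played are skipped to avoid dividing by zero.
    """
    result = []
    for row in rows:
        games = row["GP"]
        if games:
            result.append((row["SEASON_ID"], round(row["PTS"] / games, 1)))
    return result
```

For example, call `points_per_game(career_df.to_dict('records'))`. Within pandas, the equivalent is the vectorized division `career_df['PTS'] / career_df['GP']`, which is usually preferable for larger tables.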
3.9.2 Other Useful Libraries
Basketball Reference Scraper:
```bash
pip install basketball_reference_scraper
```
```python
from basketball_reference_scraper import players

# Get per-game regular-season stats
stats = players.get_stats('LeBron James', stat_type='PER_GAME',
                          playoffs=False, career=False)
print(stats)
```
pandas-datareader (for economic data that might correlate with sports business):
```bash
pip install pandas-datareader
```
plotly (for interactive visualizations):
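```bash
pip install plotly
```
```python
import pandas as pd
import plotly.express as px

# Illustrative shot locations (made-up values); 1 = made, 0 = missed
shot_data = pd.DataFrame({
    'x': [1, 5, 10, 15, 20, 23, 25],
    'y': [2, 8, 12, 5, 15, 10, 8],
    'made': [1, 0, 1, 1, 0, 1, 1]
})
# Map 0/1 to labels so Plotly uses discrete colors instead of a color scale
shot_data['result'] = shot_data['made'].map({1: 'Made', 0: 'Missed'})

# Create interactive shot chart
fig = px.scatter(shot_data, x='x', y='y', color='result',
                 title='Shot Chart')
fig.show()
```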
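The `made` column stores 1 for a make and 0 for a miss, so simple shooting summaries follow directly from it. This plain-Python sketch computes makes, attempts, and field goal percentage for the same illustrative shot list used in the chart. The helper name `shot_summary` is hypothetical.

```python
"""Summarize made/missed shots (illustrative sketch)."""


def shot_summary(made_flags):
    """Return (makes, attempts, field goal percentage) for 0/1 results."""
    attempts = len(made_flags)
    made = sum(made_flags)  # each make contributes 1, each miss 0
    pct = made / attempts if attempts else 0.0
    return made, attempts, pct


made, attempts, pct = shot_summary([1, 0, 1, 1, 0, 1, 1])
print(f"{made}/{attempts} FG ({pct:.1%})")  # prints "5/7 FG (71.4%)"
```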
3.10 Summary
In this chapter, we covered the complete setup of a Python environment for basketball analytics:
- Python Installation: We walked through installing Python on Windows, macOS, and Linux systems, emphasizing the importance of adding Python to PATH.
- Package Managers: We explored pip for installing Python packages and Conda for managing complex dependencies, providing guidance on when to use each.
- Essential Libraries: We introduced the core data science libraries (pandas, NumPy, Matplotlib, Seaborn, scikit-learn, statsmodels) with practical examples relevant to basketball analytics.
- Jupyter Notebooks: We covered the installation and effective use of Jupyter Notebooks for interactive analysis, including best practices and keyboard shortcuts.
- Version Control: We introduced Git for tracking changes, collaborating with others, and maintaining a history of your analysis work.
- Reproducibility: We discussed project organization, documentation standards, and techniques for ensuring others can reproduce your results.
- Virtual Environments: We explored venv, virtualenv, and Conda environments for isolating project dependencies.
- Basketball-Specific Tools: We introduced libraries like nba_api that provide direct access to basketball statistics.
With your environment now configured, you are ready to begin loading and analyzing real basketball data in Chapter 4. Remember that a well-organized, reproducible environment is the foundation of quality analytics work. Take the time to establish good habits now, and they will serve you throughout your analytics career.
Key Terms
- Package Manager: Software tool that automates installing, upgrading, and removing software packages
- pip: The standard package installer for Python, which installs packages from the Python Package Index (PyPI)
- Conda: Cross-platform package and environment manager popular in data science
- Virtual Environment: Isolated Python installation with its own packages
- Repository: A storage location for software packages or version-controlled code
- Kernel: The computational engine that executes code in Jupyter notebooks
- Git: Distributed version control system
- Commit: A snapshot of your project at a specific point in time
- Branch: An independent line of development in version control
- Reproducibility: The ability for others to obtain the same results from your analysis
Chapter 3 Checklist
Before moving to the next chapter, ensure you can:
- [ ] Install Python and verify the installation
- [ ] Use pip to install and manage packages
- [ ] Create and activate virtual environments
- [ ] Install and use essential data science libraries
- [ ] Create and run Jupyter notebooks
- [ ] Initialize a Git repository and make commits
- [ ] Create a proper project structure
- [ ] Write and use a requirements.txt file
- [ ] Document your code with docstrings
- [ ] Troubleshoot common installation issues