Chapter 3: Key Takeaways

Summary

Chapter 3 established the technical foundation for basketball analytics by covering Python installation, environment management, essential libraries, and development tools. A properly configured environment is a prerequisite for all the analytical work that follows.


Core Concepts

Python Installation

Key considerations for installation:

Aspect             Recommendation
Version            Python 3.10 or 3.11 (3.12 for latest features)
Source             python.org (Windows), Homebrew (macOS), package manager (Linux)
PATH               Always add Python to system PATH during installation
Multiple Versions  Use pyenv for managing multiple Python versions (pyenv-win on Windows)

Key Insight: Using a recent Python version ensures access to performance improvements, security updates, and modern language features like match statements and improved type hints.
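
The version recommendation above can also be checked from inside Python, which is handy at the top of a notebook or script. The following is a minimal sketch; the helper name meets_minimum and its (3, 10) default are assumptions chosen to match this chapter's recommendation, not part of any library.

```python
import sys


def meets_minimum(version_info=None, minimum=(3, 10)):
    """Return True if the interpreter version is at least `minimum`.

    `meets_minimum` is a hypothetical helper name used for illustration.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "older than recommended"
    print(sys.version.split()[0], status)
```

Passing an explicit tuple, for example meets_minimum((3, 9, 7)), makes the check easy to try without switching interpreters.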


Virtual Environments

Why virtual environments are essential:

  1. Isolation: Project dependencies don't conflict with other projects
  2. Reproducibility: Exact package versions can be specified
  3. Portability: Environment can be recreated on any machine
  4. Safety: System Python remains unmodified

Creating a Virtual Environment:

# Create
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Deactivate
deactivate

Key Insight: Always activate the virtual environment before installing packages or running scripts. The command prompt should show (venv) when activated.
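
Besides looking for (venv) in the prompt, a script can check for itself whether it is running inside a virtual environment. The sketch below relies on the standard library's sys.prefix and sys.base_prefix; the function name is an illustrative choice. Note that it detects environments created with venv, not Conda environments.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; pip would target the base interpreter.")
```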


Package Management with pip

Essential pip commands:

Command                          Purpose
pip install package              Install latest version
pip install package==1.2.3       Install specific version
pip install -r requirements.txt  Install from file
pip freeze > requirements.txt    Export installed packages
pip list                         Show installed packages
pip show package                 Package details
pip uninstall package            Remove package
pip install --upgrade package    Update package
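
The same information that pip list reports is available from Python through the standard library's importlib.metadata. The sketch below builds name==version lines for the active environment; it is only roughly comparable to pip's own output (pip applies extra rules for editable and VCS installs), and the function name is an illustrative choice.

```python
from importlib import metadata


def freeze_lines():
    """Return 'name==version' strings for every installed distribution."""
    lines = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            lines.append(f"{name}=={dist.version}")
    # Drop duplicates from overlapping paths, then sort case-insensitively.
    return sorted(set(lines), key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```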

Version Specifiers:

package==1.2.3    # Exact version
package>=1.2.0    # Minimum version
package>=1.2,<2.0 # Version range
package~=1.2.0    # Compatible release (~= 1.2.0 means >=1.2.0, <1.3.0)

Key Insight: Use version ranges in development but pin exact versions for production deployments.
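
The compatible-release operator in the specifier list above is the least intuitive of the four, so it can help to see the expansion spelled out. The following is a simplified sketch, not pip's implementation: compatible_release_bounds is a hypothetical helper that handles plain numeric versions only and ignores pre-release and epoch rules.

```python
def compatible_release_bounds(version):
    """Expand '~=version' into approximate (lower, upper) bounds.

    Hypothetical helper for illustration; handles plain numeric
    versions such as '1.2.0' only.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("~= requires at least two segments, e.g. 1.2")
    # Drop the last segment, then bump the new last segment by one.
    upper = parts[:-1]
    upper[-1] += 1
    lower_text = ".".join(str(p) for p in parts)
    upper_text = ".".join(str(p) for p in upper)
    return f">={lower_text}", f"<{upper_text}"


print(compatible_release_bounds("1.2.0"))  # ('>=1.2.0', '<1.3')
print(compatible_release_bounds("1.2"))    # ('>=1.2', '<2')
```

The second call shows why the number of segments matters: ~=1.2 allows any 1.x release from 1.2 onward, while ~=1.2.0 allows only 1.2.x releases.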


Conda as an Alternative

When to use Conda vs pip:

Scenario                 Recommendation
Pure Python packages     pip is sufficient
Non-Python dependencies  Use Conda
Scientific computing     Conda often easier
GPU computing (CUDA)     Conda can install the CUDA toolkit; the NVIDIA driver itself is installed system-wide
Corporate environments   May prefer Conda for policy compliance

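Key Insight: You can use pip inside a Conda environment; they are not mutually exclusive. Install Conda packages first and use pip only for what Conda does not provide, because Conda does not track files that pip installs.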
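
When mixing the two tools, it helps to confirm which environment pip will actually install into. The sketch below is one way to detect an active Conda environment from Python, assuming the usual Conda layout (a conda-meta directory at the environment root and the CONDA_DEFAULT_ENV variable set by conda activate); the function name is an illustrative choice.

```python
import os
import sys
from pathlib import Path


def conda_environment_name():
    """Return the active Conda environment name, or None if not in one."""
    # Conda environments contain a 'conda-meta' directory at their root.
    if not (Path(sys.prefix) / "conda-meta").is_dir():
        return None
    # CONDA_DEFAULT_ENV is set by 'conda activate'; fall back to the folder name.
    return os.environ.get("CONDA_DEFAULT_ENV") or Path(sys.prefix).name


if __name__ == "__main__":
    name = conda_environment_name()
    print(f"Conda environment: {name}" if name else "Not running from a Conda environment.")
```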


Essential Libraries for Basketball Analytics

The Data Science Stack:

Library     Purpose                    Import Convention
pandas      Data manipulation          import pandas as pd
numpy       Numerical computing        import numpy as np
matplotlib  Visualization              import matplotlib.pyplot as plt
seaborn     Statistical visualization  import seaborn as sns
scipy       Scientific computing       from scipy import stats

Machine Learning:

Library       Purpose               Key Classes
scikit-learn  ML algorithms         LinearRegression, RandomForestRegressor, RandomForestClassifier
statsmodels   Statistical modeling  OLS, GLM
xgboost       Gradient boosting     XGBClassifier, XGBRegressor

Basketball-Specific:

Library                           Purpose
nba_api                           NBA Stats API access
basketball_reference_web_scraper  BBRef scraping
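
Before starting an analysis, it can save time to confirm that the libraries listed above are installed in the active environment. The sketch below uses importlib.util.find_spec, which locates a top-level module without importing it. The LIBRARIES mapping and the function name are illustrative; note that the pip package name and the import name can differ (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# pip package name -> import name
LIBRARIES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "xgboost": "xgboost",
    "nba_api": "nba_api",
    "basketball_reference_web_scraper": "basketball_reference_web_scraper",
}


def missing_libraries(libraries=LIBRARIES):
    """Return pip package names whose import name cannot be found."""
    return [pkg for pkg, module in libraries.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    print("All libraries found." if not missing else f"Missing: {', '.join(missing)}")
```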

Jupyter Notebooks

Essential keyboard shortcuts:

Mode     Shortcut     Action
Both     Shift+Enter  Run cell, move to next
Both     Ctrl+Enter   Run cell, stay in place
Command  A            Insert cell above
Command  B            Insert cell below
Command  D, D         Delete cell (press D twice)
Command  M            Convert to Markdown
Command  Y            Convert to code
Command  Z            Undo cell operation
Edit     Tab          Code completion
Edit     Shift+Tab    Show docstring

Magic Commands:

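# Comments go on their own line: text after a line magic is passed to the magic as arguments
# Display plots in the notebook
%matplotlib inline
# Auto-reload changed modules
%load_ext autoreload
%autoreload 2
# Time a single expression
%timeit expression
# Time the whole cell (must be the first line of its cell)
%%time
# List variables
%who
# Clear all variables (asks for confirmation; -f skips the prompt)
%reset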

Key Insight: Use JupyterLab instead of classic Notebook for better file management and multiple views.
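
Outside a notebook, the timing that %timeit automates is available through the standard library's timeit module. The sketch below is a rough equivalent, not a reimplementation of the magic: it reports the best average over several repeats, and the function name and default counts are illustrative choices.

```python
import timeit


def best_seconds_per_call(statement, setup="pass", number=1000, repeat=5):
    """Return the best average time per execution, in seconds."""
    # Each repeat runs `statement` `number` times; the minimum total is
    # the one least disturbed by other activity on the machine.
    totals = timeit.repeat(statement, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


if __name__ == "__main__":
    per_call = best_seconds_per_call("sum(points)", setup="points = list(range(100))")
    print(f"{per_call * 1e6:.2f} microseconds per call")
```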


Version Control with Git

Essential Git workflow:

# Initialize repository
git init

# Stage changes
git add filename.py
git add .  # All files (use carefully)

# Commit changes
git commit -m "Add shooting analysis function"

# View status
git status

# View history
git log --oneline

# Create branch
git checkout -b feature/shot-charts

# Merge branch
git checkout main  # or master, depending on your default branch name
git merge feature/shot-charts

What to include in .gitignore:

# Virtual environments
venv/
.venv/

# Python cache
__pycache__/
*.pyc

# Jupyter checkpoints
.ipynb_checkpoints/

# Data files (often too large)
*.csv
*.parquet
data/raw/

# Credentials
.env
secrets.json

# IDE settings
.vscode/
.idea/

Key Insight: Never commit credentials, large data files, or virtual environments to Git.
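
Appending to .gitignore by hand or with echo can leave duplicate lines when a setup script is run twice. The sketch below shows one way to add entries only when they are missing; ensure_gitignore_entries is a hypothetical helper name, and it compares lines literally rather than interpreting Git's pattern rules.

```python
from pathlib import Path


def ensure_gitignore_entries(project_dir, entries):
    """Append any missing entries to <project_dir>/.gitignore.

    Returns the list of entries that were added.
    """
    path = Path(project_dir) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    # dict.fromkeys removes repeated entries while keeping their order.
    added = [e for e in dict.fromkeys(entries) if e not in present]
    if added:
        # Rewrite the file with the original lines followed by the new ones.
        path.write_text("\n".join(existing + added) + "\n")
    return added
```

For example, ensure_gitignore_entries(".", ["venv/", "__pycache__/", ".ipynb_checkpoints/", ".env"]) adds the four entries on the first run and nothing on later runs.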


Checklist for Chapter Completion

Before proceeding to Chapter 4, ensure you can:

  • [ ] Install Python 3.10+ on your operating system
  • [ ] Verify Python installation with python --version
  • [ ] Create and activate a virtual environment
  • [ ] Install packages with pip using requirements.txt
  • [ ] Explain the difference between pip and Conda
  • [ ] Import and use pandas, numpy, and matplotlib
  • [ ] Create and run Jupyter notebooks
  • [ ] Use essential Jupyter keyboard shortcuts
  • [ ] Initialize a Git repository and make commits
  • [ ] Create a .gitignore file with appropriate entries
  • [ ] Write a basic requirements.txt file with version specifiers
  • [ ] Configure matplotlib for different backends

Quick Reference: Project Setup

Complete setup sequence:

# 1. Create project directory
mkdir basketball_analysis
cd basketball_analysis

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# 3. Create requirements.txt
cat > requirements.txt << EOF
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.3.0
nba_api>=1.2.0
jupyter>=1.0.0
EOF

# 4. Install packages
pip install -r requirements.txt

# 5. Initialize Git
git init
echo "venv/" >> .gitignore
echo "__pycache__/" >> .gitignore
echo ".ipynb_checkpoints/" >> .gitignore

# 6. Create directory structure
mkdir -p data notebooks src output

# 7. Start Jupyter
jupyter lab
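
The directory step in the sequence above depends on the shell in use. A cross-platform alternative is to create the folders from Python with pathlib, as in the sketch below; the function name is an illustrative choice and the default folder names mirror step 6.

```python
from pathlib import Path


def create_project_layout(root, subdirs=("data", "notebooks", "src", "output")):
    """Create the project folders under `root`; return the created paths."""
    root = Path(root)
    created = []
    for name in subdirs:
        folder = root / name
        # parents=True creates `root` if needed; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_project_layout("basketball_analysis"):
        print(folder)
```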

Common Pitfalls to Avoid

  1. Forgetting to activate venv - Packages install globally instead of in project
  2. Not pinning versions - Builds become non-reproducible
  3. Committing venv to Git - Repositories become bloated
  4. Storing credentials in code - Security vulnerability
  5. Using Python 2 - End of life, no longer supported
  6. Ignoring deprecation warnings - Code will break in future versions
  7. Not using .gitignore - Sensitive or generated files get committed by accident
  8. Installing with sudo - Modifies the system Python and causes permission issues later
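
For pitfall 4, a common alternative to hard-coding credentials is to read them from environment variables (for example, values loaded from an ignored .env file). The sketch below shows the idea; read_secret and the STATS_API_KEY variable name are hypothetical and used only for illustration.

```python
import os


def read_secret(name, environ=None):
    """Return the value of environment variable `name`.

    Raises RuntimeError with a clear message when it is unset or empty.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# STATS_API_KEY is a hypothetical variable name used for illustration.
# api_key = read_secret("STATS_API_KEY")
```

Failing early with a clear message is usually easier to debug than an authentication error deep inside a data-collection script.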

IDE Configuration Tips

VS Code Recommended Extensions:

  • Python (Microsoft)
  • Pylance (Microsoft)
  • Jupyter (Microsoft)
  • GitLens

VS Code settings.json for Python:

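{
    // On Windows the interpreter is at venv\\Scripts\\python.exe
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "[python]": {
        // Requires the Black Formatter extension (ms-python.black-formatter)
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true
    }
    // Linting: install the Pylint extension (ms-python.pylint). The older
    // "python.formatting.provider" and "python.linting.*" settings are no
    // longer supported by recent versions of the Python extension.
}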

Connections to Other Chapters

  • Chapter 2 (Data Sources): Environment must include nba_api and requests for data collection
  • Chapter 4 (EDA): pandas, matplotlib, seaborn configured here are used extensively
  • Chapter 5 (Descriptive Statistics): scipy and numpy provide statistical functions
  • All Chapters: Jupyter notebooks are the primary development environment

Summary Statement

A well-configured Python environment is the foundation of reproducible analytics. The time invested in proper setup pays dividends through consistent results, easier collaboration, and faster debugging. Master these fundamentals before diving into analytical work.