Chapter 3: Key Takeaways
Summary
Chapter 3 established the technical foundation for basketball analytics by covering Python installation, environment management, essential libraries, and development tools. A properly configured environment is a prerequisite for all the analytical work that follows.
Core Concepts
Python Installation
Key considerations for installation:
| Aspect | Recommendation |
|---|---|
| Version | Python 3.10 or 3.11 (3.12 for latest features) |
| Source | python.org (Windows), Homebrew (macOS), package manager (Linux) |
| PATH | Always add Python to system PATH during installation |
| Multiple Versions | Use pyenv for managing multiple Python versions |
Key Insight: Using a recent Python version ensures access to performance improvements, security updates, and modern language features like match statements and improved type hints.
Virtual Environments
Why virtual environments are essential:
- Isolation: Project dependencies don't conflict with other projects
- Reproducibility: Exact package versions can be specified
- Portability: Environment can be recreated on any machine
- Safety: System Python remains unmodified
Creating a Virtual Environment:
# Create
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate
# Deactivate
deactivate
Key Insight: Always activate the virtual environment before installing packages or running scripts. The command prompt should show (venv) when activated.
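Besides looking for (venv) in the prompt, activation can be checked from Python itself. This minimal sketch relies on the fact that `sys.prefix` differs from `sys.base_prefix` inside an environment created by `venv`; the function name `in_virtual_environment` is a hypothetical choice, and the check does not detect Conda environments.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when running inside a venv-style virtual environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; pip would install globally.")
```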
Package Management with pip
Essential pip commands:
| Command | Purpose |
|---|---|
| pip install package | Install latest version |
| pip install package==1.2.3 | Install specific version |
| pip install -r requirements.txt | Install from file |
| pip freeze > requirements.txt | Export installed packages |
| pip list | Show installed packages |
| pip show package | Package details |
| pip uninstall package | Remove package |
| pip install --upgrade package | Update package |
Version Specifiers:
package==1.2.3 # Exact version
package>=1.2.0 # Minimum version
package>=1.2,<2.0 # Version range
package~=1.2.0 # Compatible release (~= 1.2.0 means >=1.2.0, <1.3.0)
Key Insight: Use version ranges in development but pin exact versions for production deployments.
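To make the compatible-release specifier (~=) concrete, the sketch below expands it into explicit bounds. It is a simplified illustration, not pip's actual resolver: it handles only dotted numeric versions such as 1.2.0, and both function names are hypothetical.

```python
def _parse(version):
    """Split a dotted numeric version string into a list of ints."""
    return [int(part) for part in version.split(".")]


def compatible_release_bounds(version):
    """Expand `~=version` into (lower_bound, upper_bound) strings."""
    parts = _parse(version)
    if len(parts) < 2:
        raise ValueError("~= needs at least two version components")
    upper = parts[:-1]   # drop the last component
    upper[-1] += 1       # bump the new last component
    return f">={version}", "<" + ".".join(str(p) for p in upper)


def satisfies_compatible(candidate, version):
    """Return True if `candidate` falls inside the `~=version` range."""
    cand, base = _parse(candidate), _parse(version)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    width = max(len(cand), len(base))
    cand_padded = cand + [0] * (width - len(cand))   # treat 1.2 as 1.2.0
    base_padded = base + [0] * (width - len(base))
    prefix_len = len(base) - 1
    return (cand_padded >= base_padded
            and cand_padded[:prefix_len] == base[:prefix_len])


if __name__ == "__main__":
    print(compatible_release_bounds("1.2.0"))
    print(satisfies_compatible("1.2.5", "1.2.0"))
    print(satisfies_compatible("1.3.0", "1.2.0"))
```

Note that the upper bound depends on how many components are written: under this rule ~=1.2 allows any 1.x release from 1.2 onward, while ~=1.2.0 allows only 1.2.x releases.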
Conda as an Alternative
When to use Conda vs pip:
| Scenario | Recommendation |
|---|---|
| Pure Python packages | pip is sufficient |
| Non-Python dependencies | Use Conda |
| Scientific computing | Conda often easier |
| GPU computing (CUDA) | Conda can install CUDA toolkit libraries alongside Python packages (the GPU driver itself is still installed separately) |
| Corporate environments | May prefer Conda for policy compliance |
Key Insight: You can use pip inside a Conda environment. They're not mutually exclusive.
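When mixing the two tools, it helps to know which kind of environment a script is running in before calling pip. The sketch below uses a common heuristic, assumed here rather than taken from the chapter: Conda environments contain a `conda-meta` directory under the environment prefix. The function name is hypothetical.

```python
import os
import sys


def is_conda_environment(prefix=None):
    """Heuristic check: Conda environments hold a `conda-meta` directory."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    kind = "Conda" if is_conda_environment() else "non-Conda"
    print(f"Running in a {kind} environment at {sys.prefix}")
```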
Essential Libraries for Basketball Analytics
The Data Science Stack:
| Library | Purpose | Import Convention |
|---|---|---|
| pandas | Data manipulation | import pandas as pd |
| numpy | Numerical computing | import numpy as np |
| matplotlib | Visualization | import matplotlib.pyplot as plt |
| seaborn | Statistical visualization | import seaborn as sns |
| scipy | Scientific computing | from scipy import stats |
Machine Learning:
| Library | Purpose | Key Classes |
|---|---|---|
| scikit-learn | ML algorithms | LinearRegression, RandomForestClassifier, RandomForestRegressor |
| statsmodels | Statistical modeling | OLS, GLM |
| xgboost | Gradient boosting | XGBClassifier, XGBRegressor |
Basketball-Specific:
| Library | Purpose |
|---|---|
| nba_api | NBA Stats API access |
| basketball_reference_web_scraper | BBRef scraping |
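A quick way to confirm that the libraries in these tables are available is to look them up without importing them. The sketch below is one possible approach using only the standard library; the `LIBRARIES` mapping and the `installed_versions` function are hypothetical names. The mapping is needed because the import name and the pip name can differ (scikit-learn is imported as sklearn).

```python
from importlib import metadata, util

# Import name -> name used with `pip install`. Extend as needed.
LIBRARIES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "statsmodels": "statsmodels",
    "xgboost": "xgboost",
    "nba_api": "nba_api",
}


def installed_versions(libraries=LIBRARIES):
    """Return {import_name: version}, with None for missing libraries."""
    report = {}
    for import_name, dist_name in libraries.items():
        if util.find_spec(import_name) is None:
            report[import_name] = None          # not importable
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[import_name] = "unknown"     # importable, no metadata
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```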
Jupyter Notebooks
Essential keyboard shortcuts:
| Mode | Shortcut | Action |
|---|---|---|
| Both | Shift+Enter | Run cell, move to next |
| Both | Ctrl+Enter | Run cell, stay in place |
| Command | A | Insert cell above |
| Command | B | Insert cell below |
| Command | DD | Delete cell |
| Command | M | Convert to Markdown |
| Command | Y | Convert to code |
| Command | Z | Undo cell operation |
| Edit | Tab | Code completion |
| Edit | Shift+Tab | Show docstring |
Magic Commands:
# Display plots in the notebook
%matplotlib inline
# Auto-reload changed modules before running code
%load_ext autoreload
%autoreload 2
# Time a single expression
%timeit expression
# Time the whole cell (must be the first line of the cell)
%%time
# List variables
%who
# Clear all variables (asks for confirmation)
%reset
Key Insight: Use JupyterLab instead of classic Notebook for better file management and multiple views.
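Magic commands only work inside IPython and Jupyter. In a plain script, the %timeit magic listed above can be approximated with the standard library's `timeit` module. The sketch below is one way to do that; `time_statement` and `points_per_game` are hypothetical helpers and the point totals are made-up sample values.

```python
import timeit


def time_statement(func, repeat=5, number=1000):
    """Return the best average time per call, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def points_per_game(points, games):
    """Average points over a number of games."""
    return sum(points) / games


if __name__ == "__main__":
    sample = [28, 31, 19, 24, 35]  # made-up point totals
    best = time_statement(lambda: points_per_game(sample, len(sample)))
    print(f"best of 5 runs: {best * 1e6:.2f} microseconds per call")
```

Taking the minimum over several repeats, rather than the mean, reduces the influence of other processes that happen to run during a measurement.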
Version Control with Git
Essential Git workflow:
# Initialize repository
git init
# Stage changes
git add filename.py
git add . # All files (use carefully)
# Commit changes
git commit -m "Add shooting analysis function"
# View status
git status
# View history
git log --oneline
# Create branch
git checkout -b feature/shot-charts
# Merge branch
git checkout main # or master, depending on your default branch name
git merge feature/shot-charts
What to include in .gitignore:
# Virtual environments
venv/
.venv/
# Python cache
__pycache__/
*.pyc
# Jupyter checkpoints
.ipynb_checkpoints/
# Data files (often too large)
*.csv
*.parquet
data/raw/
# Credentials
.env
secrets.json
# IDE settings
.vscode/
.idea/
Key Insight: Never commit credentials, large data files, or virtual environments to Git.
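As an extra safeguard next to .gitignore, a small script can scan a project for files that should not be committed. The sketch below is a rough illustration under stated assumptions: the file names, skipped directories, and 5 MB size limit are example values, `find_risky_files` is a hypothetical helper, and the script does not replicate Git's own ignore rules.

```python
from pathlib import Path

SENSITIVE_NAMES = {".env", "secrets.json"}
SKIP_DIRS = {".git", "venv", ".venv", "__pycache__"}


def find_risky_files(root, max_bytes=5_000_000):
    """Return sorted relative paths that are sensitive or too large."""
    root = Path(root)
    risky = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip directories and anything inside an excluded folder.
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        if path.name in SENSITIVE_NAMES or path.stat().st_size > max_bytes:
            risky.append(rel.as_posix())
    return sorted(risky)


if __name__ == "__main__":
    for name in find_risky_files("."):
        print(f"Review before committing: {name}")
```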
Checklist for Chapter Completion
Before proceeding to Chapter 4, ensure you can:
- [ ] Install Python 3.10+ on your operating system
- [ ] Verify Python installation with python --version
- [ ] Create and activate a virtual environment
- [ ] Install packages with pip using requirements.txt
- [ ] Explain the difference between pip and Conda
- [ ] Import and use pandas, numpy, and matplotlib
- [ ] Create and run Jupyter notebooks
- [ ] Use essential Jupyter keyboard shortcuts
- [ ] Initialize a Git repository and make commits
- [ ] Create a .gitignore file with appropriate entries
- [ ] Write a basic requirements.txt file with version specifiers
- [ ] Configure matplotlib for different backends
Quick Reference: Project Setup
Complete setup sequence:
# 1. Create project directory
mkdir basketball_analysis
cd basketball_analysis
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# 3. Create requirements.txt
cat > requirements.txt << EOF
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.3.0
nba_api>=1.2.0
jupyter>=1.0.0
jupyterlab>=4.0.0
EOF
# 4. Install packages
pip install -r requirements.txt
# 5. Initialize Git
git init
echo "venv/" >> .gitignore
echo "__pycache__/" >> .gitignore
echo ".ipynb_checkpoints/" >> .gitignore
# 6. Create directory structure
mkdir -p data notebooks src output
# 7. Start Jupyter
jupyter lab
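The folder and .gitignore steps in this sequence rely on Unix shell features. On systems where those commands are unavailable, the same result can be sketched in Python with `pathlib`. The function name `scaffold_project` is hypothetical, and the sketch covers only the directory layout and .gitignore entries from steps 5 and 6; creating the virtual environment and running git init are still done separately.

```python
from pathlib import Path

PROJECT_DIRS = ("data", "notebooks", "src", "output")
GITIGNORE_LINES = ("venv/", "__pycache__/", ".ipynb_checkpoints/")


def scaffold_project(root):
    """Create project folders and .gitignore entries; safe to re-run."""
    root = Path(root)
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)

    gitignore = root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    missing = [line for line in GITIGNORE_LINES
               if line not in text.splitlines()]
    if missing:
        # Make sure new entries start on their own line.
        separator = "" if text == "" or text.endswith("\n") else "\n"
        gitignore.write_text(text + separator + "\n".join(missing) + "\n")
    return root


if __name__ == "__main__":
    scaffold_project("basketball_analysis")
```

Because existing folders and entries are left alone, running the function a second time does not duplicate lines in .gitignore.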
Common Pitfalls to Avoid
- Forgetting to activate venv: packages install globally instead of in the project
- Not pinning versions: builds become non-reproducible
- Committing venv to Git: repositories become bloated
- Storing credentials in code: creates a security vulnerability
- Using Python 2: it reached end of life in January 2020 and is no longer supported
- Ignoring deprecation warnings: code may break in future versions
- Not using .gitignore: sensitive files get committed accidentally
- Installing with sudo: causes permission issues later and can modify the system Python
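For the credentials pitfall in particular, a common alternative to hard-coding secrets is to read them from environment variables at run time. The sketch below shows that pattern in its simplest form; `get_credential` is a hypothetical helper and `STATS_API_KEY` is a made-up variable name used only for illustration.

```python
import os


def get_credential(name, environ=None):
    """Read a secret from environment variables instead of source code."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running."
        )
    return value


if __name__ == "__main__":
    try:
        api_key = get_credential("STATS_API_KEY")  # hypothetical name
        print("Credential loaded.")
    except RuntimeError as error:
        print(error)
```

Failing with an explicit message when the variable is missing is usually easier to debug than passing an empty key to a remote service.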
IDE Configuration Tips
VS Code Recommended Extensions:
- Python (Microsoft)
- Pylance (Microsoft)
- Jupyter (Microsoft)
- GitLens
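VS Code settings.json for Python (recent versions of the Python extension no longer use the python.formatting and python.linting settings; formatting here assumes the separate Black Formatter extension, ms-python.black-formatter, and linting is provided by installing an extension such as Pylint; on Windows the interpreter path ends in venv/Scripts/python.exe):
{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true
    }
}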
Connections to Other Chapters
- Chapter 2 (Data Sources): Environment must include nba_api and requests for data collection
- Chapter 4 (EDA): pandas, matplotlib, seaborn configured here are used extensively
- Chapter 5 (Descriptive Statistics): scipy and numpy provide statistical functions
- All Chapters: Jupyter notebooks are the primary development environment
Summary Statement
A well-configured Python environment is the foundation of reproducible analytics. The time invested in proper setup pays dividends through consistent results, easier collaboration, and faster debugging. Master these fundamentals before diving into analytical work.