Case Study 1: Setting Up a Team Analytics Environment
Overview
Scenario: You've been hired as a junior analyst for the Milwaukee Bucks. Your first task is to set up a standardized analytics environment that the entire analytics team can use. The environment must be reproducible, well-documented, and support the team's diverse analytical workflows.
Duration: 2-3 hours
Difficulty: Beginner to Intermediate
Prerequisites: Basic command line familiarity
Background
The Bucks analytics department currently faces several challenges:
- Team members use different Python versions, causing compatibility issues
- Package versions vary between machines, leading to inconsistent results
- New analysts spend days setting up their environments
- There's no standardized project structure
- Documentation is scattered and outdated
Your task is to create a standardized environment setup that addresses these issues.
Part 1: Requirements Gathering
1.1 Stakeholder Interviews
After meeting with team members, you've identified the following analytical workflows:
Video Analyst (Sarah) - Needs: matplotlib, opencv-python for frame analysis - "I spend hours each week extracting shot clock data from video."
Statistical Analyst (Marcus) - Needs: pandas, scipy, statsmodels for hypothesis testing - "I'm running regression models on player performance data."
Machine Learning Engineer (Priya) - Needs: scikit-learn, xgboost, tensorflow for predictive models - "My models need to be reproducible for auditing."
Data Engineer (James) - Needs: nba_api, requests, sqlalchemy for data pipelines - "I pull data from multiple sources daily."
1.2 Common Requirements Matrix
| Package | Video | Stats | ML | Data | Required |
|---|---|---|---|---|---|
| pandas | Yes | Yes | Yes | Yes | Core |
| numpy | Yes | Yes | Yes | Yes | Core |
| matplotlib | Yes | Yes | Yes | No | Core |
| seaborn | No | Yes | Yes | No | Core |
| scipy | No | Yes | Yes | No | Standard |
| scikit-learn | No | Yes | Yes | No | Standard |
| statsmodels | No | Yes | No | No | Standard |
| nba_api | No | Yes | Yes | Yes | Standard |
| jupyter | Yes | Yes | Yes | Yes | Core |
| opencv-python | Yes | No | No | No | Optional |
| xgboost | No | No | Yes | No | Optional |
| tensorflow | No | No | Yes | No | Optional |
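To keep the tier assignments machine-checkable, the matrix above can also be expressed as data. A minimal sketch (the dict layout and the `packages_for` helper are illustrative, not part of the deliverables):

```python
# Requirements matrix as data: package -> (tier, roles that need it).
# Tiers and role assignments mirror the table above.
MATRIX = {
    "pandas":        ("Core",     {"video", "stats", "ml", "data"}),
    "numpy":         ("Core",     {"video", "stats", "ml", "data"}),
    "matplotlib":    ("Core",     {"video", "stats", "ml"}),
    "seaborn":       ("Core",     {"stats", "ml"}),
    "scipy":         ("Standard", {"stats", "ml"}),
    "scikit-learn":  ("Standard", {"stats", "ml"}),
    "statsmodels":   ("Standard", {"stats"}),
    "nba_api":       ("Standard", {"stats", "ml", "data"}),
    "jupyter":       ("Core",     {"video", "stats", "ml", "data"}),
    "opencv-python": ("Optional", {"video"}),
    "xgboost":       ("Optional", {"ml"}),
    "tensorflow":    ("Optional", {"ml"}),
}

def packages_for(role):
    """Return the sorted list of packages a given role needs."""
    return sorted(pkg for pkg, (_, roles) in MATRIX.items() if role in roles)

print(packages_for("video"))
# → ['jupyter', 'matplotlib', 'numpy', 'opencv-python', 'pandas']
```

Keeping the matrix in one place like this makes it easy to spot when a tier drifts out of sync with what the team actually uses.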
Part 2: Implementation
2.1 Creating the Base Environment
Step 1: Create the Project Directory
# Create the main project directory
mkdir bucks_analytics
cd bucks_analytics
# Create subdirectory structure (brace expansion requires bash or zsh;
# on Windows, use the setup script instead)
mkdir -p {data/{raw,processed,external},notebooks,src/{data,features,models,visualization},tests,output/{figures,reports},docs}
# Create placeholder files
touch src/__init__.py
touch src/data/__init__.py
touch src/features/__init__.py
touch src/models/__init__.py
touch src/visualization/__init__.py
Step 2: Create the Virtual Environment
# Create virtual environment
python -m venv venv
# Activate the environment (Windows)
venv\Scripts\activate
# Activate the environment (macOS/Linux)
source venv/bin/activate
# Upgrade pip ("python -m pip" lets pip replace itself, which matters on Windows)
python -m pip install --upgrade pip
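After activation, it's worth confirming that the interpreter you're running actually belongs to the venv. One way (a small sketch, not part of the official setup) is to compare `sys.prefix` against `sys.base_prefix`:

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If this prints `False` after you thought you activated, the shell is still picking up a system Python, usually because activation happened in a different terminal session.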
2.2 Creating Tiered Requirements Files
requirements-core.txt (Essential for all team members)
# Core Data Science Stack
pandas>=2.0.0,<3.0.0
numpy>=1.24.0,<2.0.0
scipy>=1.11.0,<2.0.0
# Visualization
matplotlib>=3.7.0,<4.0.0
seaborn>=0.12.0,<1.0.0
# Jupyter Environment
jupyter>=1.0.0
jupyterlab>=4.0.0
notebook>=7.0.0
ipywidgets>=8.0.0
# NBA Data Access
nba_api>=1.2.0
# Data Handling
openpyxl>=3.1.0
pyarrow>=12.0.0
requests>=2.31.0
# Utilities
python-dotenv>=1.0.0
tqdm>=4.65.0
requirements-stats.txt (For statistical analysis)
# Include core requirements
-r requirements-core.txt
# Statistical Modeling
statsmodels>=0.14.0
scikit-learn>=1.3.0
pingouin>=0.5.0
# Advanced Statistics
lifelines>=0.27.0 # Survival analysis
requirements-ml.txt (For machine learning workflows)
# Include stats requirements
-r requirements-stats.txt
# Machine Learning
xgboost>=1.7.0
lightgbm>=4.0.0
# Model Evaluation
shap>=0.42.0
# Optional: Deep Learning (commented out by default)
# tensorflow>=2.13.0
# torch>=2.0.0
requirements-dev.txt (For development and testing)
# Include core requirements
-r requirements-core.txt
# Testing
pytest>=7.4.0
pytest-cov>=4.1.0
# Code Quality
black>=23.7.0
flake8>=6.1.0
isort>=5.12.0
mypy>=1.5.0
# Documentation
sphinx>=7.0.0
sphinx-rtd-theme>=1.3.0
# Pre-commit hooks
pre-commit>=3.3.0
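Range specifiers keep installs flexible, but exact reproducibility calls for a lock file as well. Besides running `pip freeze`, the installed versions can be captured from the standard library alone; a sketch (the `requirements-lock.txt` filename is just a suggestion):

```python
from importlib.metadata import distributions

def lock_lines():
    """Return sorted 'name==version' lines for every installed distribution."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )

# Write an exact snapshot that `pip install -r requirements-lock.txt` can replay
with open("requirements-lock.txt", "w") as f:
    f.write("\n".join(lock_lines()) + "\n")
print(f"Locked {len(lock_lines())} packages")
```

Committing a lock file alongside the tiered range files gives the team both: ranges for day-to-day flexibility, exact pins for auditing a specific analysis.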
2.3 Creating the Setup Script
setup_environment.py
#!/usr/bin/env python
"""
Milwaukee Bucks Analytics Environment Setup Script
This script automates the setup of the analytics environment,
including virtual environment creation and package installation.
Usage:
python setup_environment.py [--profile PROFILE]
Profiles:
core - Basic data science stack (default)
stats - Statistical analysis packages
ml - Machine learning packages
dev - Development and testing tools
all - Everything
"""
import subprocess
import sys
import argparse
from pathlib import Path
def run_command(command, description):
"""Execute a shell command and handle errors."""
print(f"\n{'='*60}")
print(f" {description}")
print(f"{'='*60}")
try:
result = subprocess.run(
command,
shell=True,
check=True,
capture_output=True,
text=True
)
if result.stdout:
print(result.stdout)
return True
except subprocess.CalledProcessError as e:
print(f"Error: {e.stderr}")
return False
def check_python_version():
"""Verify Python version meets requirements."""
version = sys.version_info
if version.major < 3 or (version.major == 3 and version.minor < 10):
print(f"Error: Python 3.10+ required. Found {version.major}.{version.minor}")
return False
print(f"Python version: {version.major}.{version.minor}.{version.micro} (OK)")
return True
def create_directories():
"""Create the project directory structure."""
directories = [
'data/raw',
'data/processed',
'data/external',
'notebooks',
'src/data',
'src/features',
'src/models',
'src/visualization',
'tests',
'output/figures',
'output/reports',
'docs'
]
    for dir_path in directories:
        Path(dir_path).mkdir(parents=True, exist_ok=True)
        # Only the src subdirectories need __init__.py to be importable packages
        if dir_path.startswith('src/'):
            (Path(dir_path) / '__init__.py').touch(exist_ok=True)
    Path('src/__init__.py').touch(exist_ok=True)
    print("Directory structure created successfully")
    return True
def create_virtual_environment():
"""Create a new virtual environment."""
venv_path = Path('venv')
if venv_path.exists():
print("Virtual environment already exists")
return True
return run_command(
f"{sys.executable} -m venv venv",
"Creating virtual environment"
)
def get_python_command():
    """Get the venv's Python interpreter path for the current OS."""
    if sys.platform == 'win32':
        return r'venv\Scripts\python'
    return 'venv/bin/python'
def install_requirements(profile):
    """Install requirements based on selected profile."""
    # Run pip as "python -m pip" so pip can upgrade itself, even on Windows
    pip = f"{get_python_command()} -m pip"
    run_command(f"{pip} install --upgrade pip", "Upgrading pip")
# Map profiles to requirements files
profile_map = {
'core': ['requirements-core.txt'],
'stats': ['requirements-stats.txt'],
'ml': ['requirements-ml.txt'],
'dev': ['requirements-dev.txt'],
'all': ['requirements-ml.txt', 'requirements-dev.txt']
}
requirements_files = profile_map.get(profile, ['requirements-core.txt'])
for req_file in requirements_files:
if Path(req_file).exists():
success = run_command(
f"{pip} install -r {req_file}",
f"Installing packages from {req_file}"
)
if not success:
return False
return True
def create_gitignore():
"""Create a comprehensive .gitignore file."""
gitignore_content = """# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual Environment
venv/
.venv/
ENV/
# Jupyter Notebooks
.ipynb_checkpoints/
*.ipynb_checkpoints/
# IDE
.idea/
.vscode/
*.swp
*.swo
*.sublime-*
# OS
.DS_Store
Thumbs.db
*.bak
# Project specific
data/raw/*
data/external/*
!data/raw/.gitkeep
!data/external/.gitkeep
output/*
!output/.gitkeep
*.log
*.csv
*.xlsx
*.parquet
# Credentials
.env
.env.local
secrets.json
credentials/
# Model files (often too large)
*.pkl
*.joblib
*.h5
*.pt
*.pth
"""
with open('.gitignore', 'w') as f:
f.write(gitignore_content)
print(".gitignore created successfully")
return True
def create_readme():
"""Create a README.md file."""
readme_content = """# Milwaukee Bucks Analytics Environment
## Quick Start
1. **Clone the repository**
```bash
git clone <repository-url>
cd bucks_analytics
```
2. **Run the setup script**
```bash
python setup_environment.py --profile stats
```
3. **Activate the environment**
```bash
# Windows
venv\\Scripts\\activate
# macOS/Linux
source venv/bin/activate
```
4. **Start Jupyter**
```bash
jupyter lab
```
## Installation Profiles
- `core`: Basic data science stack
- `stats`: Statistical analysis packages
- `ml`: Machine learning packages
- `dev`: Development and testing tools
- `all`: Everything
## Project Structure
```
bucks_analytics/
├── data/
│   ├── raw/            # Original data (not tracked)
│   ├── processed/      # Cleaned data
│   └── external/       # Third-party data
├── notebooks/          # Jupyter notebooks
├── src/
│   ├── data/           # Data loading utilities
│   ├── features/       # Feature engineering
│   ├── models/         # Model definitions
│   └── visualization/  # Plotting functions
├── tests/              # Unit tests
├── output/
│   ├── figures/        # Generated plots
│   └── reports/        # Analysis reports
└── docs/               # Documentation
```
## Data Sources
- NBA API (via nba_api library)
- Internal databases (credentials required)
- Basketball-Reference (rate-limited scraping)
## Contributing
1. Create a feature branch
2. Make your changes
3. Run tests: `pytest tests/`
4. Submit a pull request
## Questions?
Contact the analytics team at analytics@bucks.com
"""
with open('README.md', 'w') as f:
f.write(readme_content)
print("README.md created successfully")
return True
def verify_installation():
"""Verify that key packages are installed correctly."""
print("\n" + "="*60)
print(" Verifying Installation")
print("="*60 + "\n")
packages = [
('pandas', 'pandas'),
('numpy', 'numpy'),
('matplotlib', 'matplotlib'),
('scikit-learn', 'sklearn'),
('nba_api', 'nba_api'),
('jupyter', 'jupyter'),
]
if sys.platform == 'win32':
python = r'venv\Scripts\python'
else:
python = 'venv/bin/python'
all_ok = True
for package_name, import_name in packages:
try:
result = subprocess.run(
f'{python} -c "import {import_name}; print({import_name}.__version__)"',
shell=True,
capture_output=True,
text=True,
check=True
)
version = result.stdout.strip()
print(f" [OK] {package_name}: {version}")
except subprocess.CalledProcessError:
print(f" [FAILED] {package_name}")
all_ok = False
return all_ok
def main():
"""Main entry point for environment setup."""
parser = argparse.ArgumentParser(
description='Set up the Bucks Analytics environment'
)
parser.add_argument(
'--profile',
choices=['core', 'stats', 'ml', 'dev', 'all'],
default='core',
help='Installation profile (default: core)'
)
args = parser.parse_args()
print("\n" + "="*60)
print(" Milwaukee Bucks Analytics Environment Setup")
print("="*60)
print(f"\nProfile: {args.profile}")
# Run setup steps
steps = [
(check_python_version, "Checking Python version"),
(create_directories, "Creating directory structure"),
(create_virtual_environment, "Creating virtual environment"),
(lambda: install_requirements(args.profile), "Installing packages"),
(create_gitignore, "Creating .gitignore"),
(create_readme, "Creating README.md"),
(verify_installation, "Verifying installation"),
]
for step_func, step_name in steps:
print(f"\n{'='*60}")
print(f" {step_name}")
print(f"{'='*60}")
if not step_func():
print(f"\nSetup failed at: {step_name}")
return 1
print("\n" + "="*60)
print(" Setup Complete!")
print("="*60)
print("\nNext steps:")
print(" 1. Activate the environment:")
if sys.platform == 'win32':
print(" venv\\Scripts\\activate")
else:
print(" source venv/bin/activate")
print(" 2. Start Jupyter Lab:")
print(" jupyter lab")
print(" 3. Open notebooks/01_getting_started.ipynb")
return 0
if __name__ == '__main__':
sys.exit(main())
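The script's `run_command` wraps `subprocess.run` with `check=True` and surfaces stderr on failure rather than crashing. The same pattern in isolation (`run` here is a hypothetical standalone helper; `sys.executable` keeps the demo portable):

```python
import subprocess
import sys

def run(command, description):
    """Run a shell command, reporting stderr on failure instead of raising."""
    try:
        subprocess.run(command, shell=True, check=True,
                       capture_output=True, text=True)
        print(f"{description}: ok")
        return True
    except subprocess.CalledProcessError as exc:
        # check=True turns a nonzero exit code into this exception
        print(f"{description}: failed\n{exc.stderr}")
        return False

# One command that succeeds, one that fails
assert run(f'"{sys.executable}" -c "print(42)"', "sanity check")
assert not run(f'"{sys.executable}" -c "import nonexistent_module"', "expected failure")
```

Returning a boolean rather than raising lets the caller (like `main` above) decide whether a failed step should abort the whole setup.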
Part 3: Documentation and Onboarding
3.1 Creating an Onboarding Notebook
Create notebooks/01_getting_started.ipynb with the following cells:
Cell 1 (Markdown):
# Welcome to Bucks Analytics!
This notebook verifies your environment setup and introduces our analytics workflow.
## Running This Notebook
1. Make sure you've run the setup script
2. Activated your virtual environment
3. Started Jupyter Lab
Let's verify everything is working correctly.
Cell 2 (Code):
# Verify imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nba_api.stats.static import players, teams
print("All imports successful!")
print(f"pandas version: {pd.__version__}")
print(f"numpy version: {np.__version__}")
Cell 3 (Code):
# Test NBA API connection
all_players = players.get_players()
# Note: this filters to all active NBA players, not just the Bucks roster
active_players = [p for p in all_players if p['is_active']]
print(f"Found {len(active_players)} active NBA players")
print("\nSample players:")
for player in active_players[:5]:
    print(f" - {player['full_name']}")
Cell 4 (Code):
# Test visualization
fig, ax = plt.subplots(figsize=(10, 6))
# Sample data
positions = ['PG', 'SG', 'SF', 'PF', 'C']
avg_points = [18.5, 17.2, 15.8, 14.3, 12.1]
ax.bar(positions, avg_points, color='#00471B') # Bucks green
ax.set_xlabel('Position')
ax.set_ylabel('Average Points')
ax.set_title('NBA Average Points by Position (Sample)')
plt.tight_layout()
plt.show()
print("Visualization working correctly!")
Part 4: Testing the Setup
4.1 Creating a Test Script
tests/test_environment.py
"""
Environment verification tests.
Run with: pytest tests/test_environment.py -v
"""
import pytest
def version_tuple(version):
    """Convert a version string like '2.1.3' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split('.') if part.isdigit())
class TestCorePackages:
    """Test that core packages are installed and functional."""
    def test_pandas_import(self):
        import pandas as pd
        assert version_tuple(pd.__version__) >= (2, 0, 0)
    def test_numpy_import(self):
        import numpy as np
        assert version_tuple(np.__version__) >= (1, 24, 0)
    def test_matplotlib_import(self):
        import matplotlib
        assert version_tuple(matplotlib.__version__) >= (3, 7, 0)
    def test_seaborn_import(self):
        import seaborn as sns
        assert version_tuple(sns.__version__) >= (0, 12, 0)
class TestNBAAPI:
"""Test NBA API connectivity."""
def test_nba_api_import(self):
from nba_api.stats.static import players
all_players = players.get_players()
assert len(all_players) > 0
def test_find_player(self):
from nba_api.stats.static import players
giannis = players.find_players_by_full_name("Giannis Antetokounmpo")
assert len(giannis) == 1
assert giannis[0]['id'] == 203507
class TestProjectStructure:
"""Test that project directories exist."""
def test_data_directories(self):
from pathlib import Path
assert Path('data/raw').exists()
assert Path('data/processed').exists()
assert Path('data/external').exists()
def test_src_directories(self):
from pathlib import Path
assert Path('src/data').exists()
assert Path('src/models').exists()
assert Path('src/visualization').exists()
def test_output_directories(self):
from pathlib import Path
assert Path('output/figures').exists()
assert Path('output/reports').exists()
if __name__ == '__main__':
pytest.main([__file__, '-v'])
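A note on version checks: `__version__` strings must be compared as integer tuples, not lexicographically, because string comparison mis-orders multi-digit components ('1.24.0' sorts before '1.3.0' as text). A standalone demonstration:

```python
def version_tuple(version):
    """Parse '2.1.3' into (2, 1, 3), dropping non-numeric parts like 'rc1'."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

# Lexicographic string comparison gets multi-digit components wrong...
assert "1.24.0" < "1.3.0"            # wrong order: 1.24 is the newer release
# ...while tuple comparison orders releases correctly
assert version_tuple("1.24.0") > version_tuple("1.3.0")
assert version_tuple("10.0.0") >= version_tuple("2.0.0")
print("tuple comparison orders releases correctly")
```

For production code, the `packaging.version.parse` helper handles pre-release and dev suffixes more rigorously than this sketch does.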
Part 5: Discussion Questions
Question 1: Version Pinning
Why do the requirements files use version ranges (e.g., >=2.0.0,<3.0.0) instead of exact versions (==2.0.3)? What are the tradeoffs?
Question 2: Profile System
The setup uses different requirement profiles (core, stats, ml, dev). What advantages does this provide over a single requirements.txt file?
Question 3: Virtual Environments
Why is it important that each team member uses a virtual environment rather than installing packages globally?
Question 4: Reproducibility
What additional steps could be taken to ensure that an analysis run today can be exactly reproduced in five years?
Question 5: Security
The .gitignore file excludes credentials and .env files. What other security considerations should an analytics team address?
Deliverables
By completing this case study, you should produce:
- Setup Script: Functional setup_environment.py
- Requirements Files: Tiered requirements files for different use cases
- Project Structure: Complete directory structure with placeholder files
- Documentation: README.md and onboarding notebook
- Tests: Environment verification test suite
Key Takeaways
- Standardization reduces friction - A consistent setup process helps new team members become productive quickly
- Tiered requirements support diverse workflows without bloating everyone's environment
- Virtual environments isolate dependencies and ensure reproducibility
- Documentation is crucial - Good README and onboarding materials save hours of confusion
- Automated setup reduces human error and ensures consistency
This case study demonstrates how proper environment management enables effective team collaboration in basketball analytics.