Chapter 3 Exercises: Python Environment Setup

Introduction

These exercises will help you solidify your understanding of Python environment setup and configuration. Complete them in order, as later exercises build on earlier ones. Each exercise includes a difficulty rating to help you gauge your progress.

Difficulty Levels:

  - Beginner: Essential skills everyone should master
  - Intermediate: Skills for independent work
  - Advanced: Professional-level competencies


Section A: Python Installation and Basics (Exercises 1-8)

Exercise 1: Verify Python Installation

Difficulty: Beginner

Open your command prompt or terminal and complete the following tasks:

  1. Check your Python version using the command line
  2. Check your pip version
  3. Determine the path where Python is installed
  4. List the first 10 packages currently installed in your base environment

Expected Commands:

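python --version                                 # use python3 where python is not found
pip --version
python -c "import sys; print(sys.executable)"
pip list | head -n 12                            # macOS/Linux: 2 header lines + 10 packages
pip list | Select-Object -First 12               # Windows PowerShell equivalent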

Deliverable: Screenshot or text output of all four commands.


Exercise 2: Python Interactive Mode

Difficulty: Beginner

Enter Python's interactive mode and perform the following calculations:

  1. Calculate the field goal percentage for a player who made 432 shots out of 892 attempts
  2. Calculate true shooting attempts, TSA = FGA + 0.44 * FTA, for a player with 756 FGA and 412 FTA
  3. Create a list of the five traditional basketball positions
  4. Create a dictionary with three players and their points per game

Deliverable: A text file with your Python commands and outputs.
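You can use the following sketch to check your answers. It shows the four calculations as you might type them at the >>> prompt. Bare expressions echo automatically in interactive mode; print() is used here so the same lines also run as a script. The position abbreviations stand for point guard, shooting guard, small forward, power forward, and center. The player names and points-per-game values are placeholders, not real data.

```python
# 1. Field goal percentage = made / attempted
fg_pct = 432 / 892
print(round(fg_pct, 3))  # 0.484

# 2. True shooting attempts = FGA + 0.44 * FTA
tsa = 756 + 0.44 * 412
print(round(tsa, 2))  # 937.28

# 3. The five traditional positions
positions = ["PG", "SG", "SF", "PF", "C"]
print(positions)

# 4. Placeholder players and points per game (illustrative values only)
ppg = {"Player A": 27.1, "Player B": 19.4, "Player C": 11.8}
print(ppg)
```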


Exercise 3: pip Fundamentals

Difficulty: Beginner

Without installing anything, use pip commands to answer the following:

  1. What is the latest version of pandas available on PyPI?
  2. What are the dependencies of the seaborn package?
  3. Find all packages in your current environment that contain "data" in their name
  4. Generate a requirements file (my_requirements.txt) from your current environment

Commands to use:

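pip index versions pandas         # experimental command, available in pip 21.2+
pip show seaborn                  # see the "Requires:" line; only works if seaborn is already installed
pip install seaborn --dry-run     # pip 22.2+: resolves dependencies without installing them
pip list | grep -i data           # macOS/Linux; in Windows cmd use: pip list | findstr /i data
pip freeze > my_requirements.txt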

Deliverable: Answers to each question with supporting command output.
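grep is not available in the default Windows shells. For question 3, the standard library's importlib.metadata (Python 3.8+) offers a cross-platform alternative because it can list installed distributions directly. The sketch below assumes you only want to match on package names. The function name packages_matching is illustrative.

```python
from importlib.metadata import distributions


def packages_matching(term):
    """Return sorted 'name==version' strings for installed packages whose name contains term."""
    term = term.lower()
    matches = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and term in name.lower():
            matches.add(f"{name}=={dist.version}")
    return sorted(matches, key=str.lower)


if __name__ == "__main__":
    for line in packages_matching("data"):
        print(line)
```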


Exercise 4: Understanding Package Versions

Difficulty: Beginner

Create a file called version_check.py that:

  1. Imports pandas, numpy, matplotlib, and seaborn
  2. Prints the version of each library
  3. Prints whether each version meets the minimum requirements:
       - pandas >= 2.0.0
       - numpy >= 1.24.0
       - matplotlib >= 3.7.0
       - seaborn >= 0.12.0

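Expected Output (installed versions will vary):

Package Version Check
==============================
pandas: 2.0.3 [PASS]
numpy: 1.24.3 [PASS]
matplotlib: 3.7.2 [PASS]
seaborn: 0.12.2 [PASS]
==============================
All requirements satisfied!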

Exercise 5: Command Line Python Scripts

Difficulty: Intermediate

Create a Python script called quick_stats.py that accepts command-line arguments:

python quick_stats.py --points 28 --fga 22 --fta 8 --rebounds 10 --assists 7

The script should calculate and display:

  - Field goal attempts (as entered)
  - True shooting attempts (FGA + 0.44 * FTA)
  - A simple efficiency rating (points + rebounds + assists)

Hint: Use the argparse module.
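The following is one possible shape for quick_stats.py, sketched with argparse. Keeping the arithmetic in its own function lets you test it without touching the command line. One design choice here is an assumption rather than part of the exercise: every stat defaults to 0 instead of being required.

```python
"""quick_stats.py: summarize one game's box score from command-line arguments."""
import argparse

STATS = ("points", "fga", "fta", "rebounds", "assists")


def compute_stats(points, fga, fta, rebounds, assists):
    """Return the three values the exercise asks for."""
    return {
        "field_goal_attempts": fga,
        "true_shooting_attempts": fga + 0.44 * fta,
        "efficiency": points + rebounds + assists,
    }


def parse_args(argv=None):
    """Parse --points, --fga, --fta, --rebounds, --assists (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="Quick basketball stat calculator")
    for stat in STATS:
        # Design choice: an omitted stat counts as 0 rather than raising an error.
        parser.add_argument(f"--{stat}", type=int, default=0, help=f"{stat} for the game")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    stats = compute_stats(args.points, args.fga, args.fta, args.rebounds, args.assists)
    print(f"Field goal attempts:    {stats['field_goal_attempts']}")
    print(f"True shooting attempts: {stats['true_shooting_attempts']:.2f}")
    print(f"Efficiency rating:      {stats['efficiency']}")


if __name__ == "__main__":
    main()
```

For the example command in the exercise, this sketch reports 25.52 true shooting attempts (22 + 0.44 * 8) and an efficiency rating of 45.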


Exercise 6: pip Advanced Usage

Difficulty: Intermediate

Perform the following pip operations:

  1. Install a specific older version of requests (2.28.0)
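  2. Check which installed packages depend on requests (hint: the Required-by: field of pip show requests)
  3. Upgrade to the latest version
  4. Create a constraints file (e.g., constraints.txt) that pins numpy to version 1.24.3, and apply it with pip install -c constraints.txt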

Deliverable: Document each command and its output.
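You can also answer step 2 from Python by scanning each installed distribution's declared requirements with importlib.metadata. This sketch uses a simplified name parser that splits at the first version, marker, or extras character, not a full PEP 508 parser. It also counts requirements that apply only when an optional extra is installed. The function names are illustrative.

```python
import re
from importlib.metadata import distributions


def normalize(name):
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', case-insensitive."""
    return re.sub(r"[-_.]+", "-", name).lower()


def reverse_dependencies(target):
    """Return sorted names of installed packages that declare a requirement on target."""
    target = normalize(target)
    dependents = set()
    for dist in distributions():
        dist_name = dist.metadata["Name"]
        if not dist_name:
            continue
        for requirement in dist.requires or []:  # requires is None when nothing is declared
            # e.g. "requests[socks]>=2.0; extra == 'http'" -> "requests"
            name = re.split(r"[\s;<>=!~\[(@]", requirement, maxsplit=1)[0]
            if normalize(name) == target:
                dependents.add(dist_name)
                break
    return sorted(dependents, key=str.lower)


if __name__ == "__main__":
    print(reverse_dependencies("requests"))
```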


Exercise 7: Investigating Package Metadata

Difficulty: Intermediate

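Write a Python script that uses importlib.metadata (pkg_resources from setuptools is deprecated) to:

  1. List all installed packages with versions
  2. Find all packages installed after a certain date (importlib.metadata does not record install dates; file modification times are one approximation)
  3. Calculate the total size of installed packages
  4. Identify which packages have updates available (importlib.metadata cannot query PyPI; pip list --outdated reports this)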

Exercise 8: Building a Custom Package Index

Difficulty: Advanced

Research and document how you would:

  1. Create a private PyPI server for your organization
  2. Configure pip to use both public PyPI and your private server
  3. Publish a private package to your server

Deliverable: A written plan with specific tools and configuration files needed.


Section B: Virtual Environments (Exercises 9-15)

Exercise 9: Creating Your First Virtual Environment

Difficulty: Beginner

Create a virtual environment for a basketball analytics project:

  1. Create a new directory called nba_analysis
  2. Create a virtual environment inside it called venv
  3. Activate the environment
  4. Verify that the environment is active
  5. Install pandas and numpy
  6. Deactivate the environment

Document: All commands used and the output at each step.


Exercise 10: Requirements File Management

Difficulty: Beginner

Create three different requirements files for a basketball analytics project:

  1. requirements.txt - Production dependencies with pinned versions
  2. requirements-dev.txt - Development dependencies (pytest, black, flake8)
  3. requirements-docs.txt - Documentation dependencies (sphinx, sphinx-rtd-theme)

Each file should include appropriate version specifiers (==, >=, ~=, etc.)

Deliverable: Three requirements files with at least 5 packages each.
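The compatible-release operator ~= is the least intuitive of these specifiers. The sketch below uses the third-party packaging library (pip install packaging), the same library pip vendors internally, to show which candidate versions each specifier accepts. The version numbers are arbitrary examples.

```python
from packaging.specifiers import SpecifierSet

# Specifier strings as they would appear after a package name in a requirements file
examples = {
    "==2.0.3": ["2.0.3", "2.0.4"],   # exact pin
    ">=1.24": ["1.24.0", "1.23.5"],  # minimum version, no upper bound
    "~=2.0.3": ["2.0.9", "2.1.0"],   # compatible release: >=2.0.3 and ==2.0.*
    "~=3.7": ["3.9.0", "4.0.0"],     # compatible release: >=3.7 and ==3.*
}

for spec, candidates in examples.items():
    allowed = SpecifierSet(spec)
    for candidate in candidates:
        verdict = "allowed" if candidate in allowed else "rejected"
        print(f"{spec:<10} {candidate:<8} {verdict}")
```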


Exercise 11: Environment Recreation

Difficulty: Intermediate

Your colleague has shared their requirements.txt file:

pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.3.0
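jupyter==1.0.0

These pins target Python 3.11 or earlier (numpy 1.24 has no wheels for Python 3.12+), so create the environment with a compatible Python version.

Using this file:

  1. Create a new virtual environment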
  2. Install the requirements
  3. Verify all packages are correctly installed
  4. Export a new requirements file with all dependencies (including transitive ones)
  5. Compare the two requirements files and explain the differences
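For step 5, the following sketch lists added, removed, and changed pins between two requirements files. It only understands name==version lines, which is what pip freeze produces for ordinary installs. It also normalizes names so that python_dateutil and python-dateutil compare equal. The function names are illustrative.

```python
import re


def parse_pins(text):
    """Map normalized package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        # Drop comments and environment markers, e.g. "numpy==1.24.3; python_version<'3.12'"
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[re.sub(r"[-_.]+", "-", name.strip()).lower()] = version.strip()
    return pins


def compare_requirements(original_text, exported_text):
    """Summarize how an exported requirements file differs from the original."""
    original, exported = parse_pins(original_text), parse_pins(exported_text)
    return {
        "added": sorted(exported.keys() - original.keys()),  # typically transitive deps
        "removed": sorted(original.keys() - exported.keys()),
        "changed": sorted(
            name for name in original.keys() & exported.keys()
            if original[name] != exported[name]
        ),
    }
```

To use it, pass both files' contents, for example Path("requirements.txt").read_text() from pathlib, together with the file you exported in step 4. Expect most entries to show up under "added", because pip freeze also lists transitive dependencies.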

Exercise 12: Conda Environment Setup

Difficulty: Intermediate

If you have Conda installed, create an environment using a YAML file:

  1. Write an environment.yml file for basketball analytics
  2. Create the environment from this file
  3. List all packages in the environment
  4. Export the environment to a new YAML file
  5. Compare the original and exported files

Starter YAML:

name: nba_analytics
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pandas>=2.0
  - numpy>=1.24
  - matplotlib>=3.7

Exercise 13: Managing Multiple Environments

Difficulty: Intermediate

Create two separate environments to simulate working on different projects:

  1. Environment legacy_project with:
       - Python 3.9
       - pandas 1.5.0
       - numpy 1.23.0

  2. Environment modern_project with:
       - Python 3.11
       - pandas 2.0.3
       - numpy 1.24.3

Write a script that:

  - Checks which environment is currently active
  - Lists the versions of key packages
  - Warns if you are using outdated packages
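The script could be sketched as follows. It detects conda through the CONDA_DEFAULT_ENV variable that conda activate sets, and it detects a venv by comparing sys.prefix with sys.base_prefix. Two things are assumptions: the "outdated" thresholds reuse the modern_project pins, and the version comparison is a simplified numeric one.

```python
import os
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumption: anything older than the modern_project pins counts as outdated.
MINIMUM_VERSIONS = {"pandas": "2.0.3", "numpy": "1.24.3"}


def active_environment():
    """Describe the environment the running interpreter belongs to."""
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment: {conda_env}"
    if sys.prefix != sys.base_prefix:  # True inside a venv/virtualenv
        return f"virtual environment: {sys.prefix}"
    return "no virtual environment (base interpreter)"


def version_tuple(text):
    """Leading numeric release as a tuple, e.g. '1.26.0rc1' -> (1, 26, 0).
    Simplified; packaging.version.Version handles full PEP 440 ordering."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def report():
    print(f"Python {sys.version.split()[0]} in {active_environment()}")
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"  {package}: not installed")
            continue
        outdated = version_tuple(installed) < version_tuple(minimum)
        print(f"  {package}: {installed}{'  WARNING: outdated' if outdated else ''}")


if __name__ == "__main__":
    report()
```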


Exercise 14: Virtual Environment Troubleshooting

Difficulty: Intermediate

Debug the following scenarios (describe how you would fix each):

  1. You activate your virtual environment but import pandas still fails
  2. Your requirements.txt installs but your code throws version incompatibility errors
  3. Your virtual environment folder was accidentally deleted but you have requirements.txt
  4. pip install hangs when trying to install a package
  5. Two packages in your requirements have conflicting dependency versions
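Scenario 1 usually comes down to which interpreter is actually running. A Jupyter kernel or an IDE, for example, can point at a different Python than your activated shell. The following standard-library diagnostic is one possible starting point. The function name is illustrative.

```python
import importlib.util
import os
import sys


def diagnose_import(module_name="pandas"):
    """Collect facts that explain why `import module_name` might fail."""
    spec = importlib.util.find_spec(module_name)  # None if not importable here
    return {
        "interpreter": sys.executable,
        "inside_venv": sys.prefix != sys.base_prefix,
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),  # set by the activate script
        "module_found": spec is not None,
        "module_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in diagnose_import().items():
        print(f"{key:>16}: {value}")
```

Suppose VIRTUAL_ENV is set but inside_venv is False. That means the shell activated an environment, but the code is running under a different interpreter.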

Exercise 15: Cross-Platform Environment Sharing

Difficulty: Advanced

Research and implement a solution for sharing environments across different operating systems:

  1. Create an environment on your current OS
  2. Generate platform-independent requirements
  3. Document how a colleague on a different OS would recreate the environment
  4. Handle platform-specific packages (if any)

Consider pip-tools, Poetry, or Pipenv as alternatives to a plain requirements.txt.
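For step 4, requirements files support PEP 508 environment markers, so one file can declare a package for a single platform, e.g. pywin32; sys_platform == "win32". The sketch below uses the third-party packaging library to show which lines of such a file apply on the current machine. The function name and example lines are illustrative.

```python
from packaging.requirements import Requirement


def applicable_requirements(lines):
    """Return names from requirements lines whose environment marker matches this machine."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        requirement = Requirement(line)
        # No marker means the requirement applies on every platform.
        if requirement.marker is None or requirement.marker.evaluate():
            names.append(requirement.name)
    return names


requirements = [
    "pandas==2.0.3",
    'pywin32; sys_platform == "win32"',  # Windows only
    'uvloop; sys_platform != "win32"',   # everything except Windows
]
print(applicable_requirements(requirements))
```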


Section C: Jupyter Notebooks (Exercises 16-22)

Exercise 16: Jupyter Notebook Basics

Difficulty: Beginner

Create a Jupyter notebook called basketball_basics.ipynb that:

  1. Has a title in a markdown cell
  2. Imports pandas, numpy, and matplotlib
  3. Creates a simple DataFrame with 5 players and their statistics
  4. Calculates the mean and standard deviation of points per game
  5. Creates a bar chart of points by player
  6. Includes markdown cells explaining each step
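The code cells of such a notebook could look roughly like the sketch below. Markdown cells explaining each step would sit between them. The player names and statistics are placeholders, not real data. numpy is left out because this small example does not need it.

```python
# Cell 1: imports
import matplotlib.pyplot as plt
import pandas as pd

# Cell 2: a small DataFrame (placeholder names and numbers)
players = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "ppg": [25.1, 18.4, 12.0, 9.5, 21.3],
    "rpg": [6.2, 4.1, 9.8, 3.0, 5.5],
})

# Cell 3: summary statistics for points per game
ppg_mean = players["ppg"].mean()
ppg_std = players["ppg"].std()  # pandas defaults to the sample standard deviation (ddof=1)
print(f"Mean PPG: {ppg_mean:.2f}, standard deviation: {ppg_std:.2f}")

# Cell 4: bar chart of points by player
fig, ax = plt.subplots()
ax.bar(players["player"], players["ppg"])
ax.set_xlabel("Player")
ax.set_ylabel("Points per game")
ax.set_title("Points per Game by Player")
plt.show()
```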

Exercise 17: Jupyter Magic Commands

Difficulty: Beginner

Create a notebook that demonstrates the following magic commands:

  1. %timeit - Time a pandas operation
  2. %matplotlib inline - Display plots inline
  3. %%writefile - Write a cell to a Python file
  4. %run - Run an external Python script
  5. %who - List variables in the namespace
  6. %load - Load code from an external file
  7. %history - Show command history

Deliverable: Notebook with examples and explanations of each magic command.


Exercise 18: Notebook Best Practices

Difficulty: Intermediate

Refactor the following poorly structured notebook code into a well-organized notebook:

Original (messy) code:

import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
df.head()
x = df['points'].mean()
print(x)
import matplotlib.pyplot as plt
plt.plot(df['games'], df['points'])
df2 = df[df['points'] > 20]
df2.to_csv('output.csv')

Create a properly structured notebook with:

  - Clear section headers
  - Import statements grouped at the top
  - Explanatory markdown cells
  - Proper variable names
  - Output cells showing results
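One way to organize the code cells of the refactored notebook is sketched below. Each function would live in its own cell under a markdown header, and the last cell calls run_analysis(). Two assumptions apply: the column names come from the original code, and the function names are illustrative. The sketch also writes the output without pandas' row index, which the original to_csv call included.

```python
# Setup cell: all imports together
import matplotlib.pyplot as plt
import pandas as pd

# Configuration cell: file paths and thresholds in one place
INPUT_PATH = "data.csv"
OUTPUT_PATH = "output.csv"
HIGH_SCORING_THRESHOLD = 20  # same cutoff the original code used


def load_player_games(path):
    """Read the raw game log (expects 'games' and 'points' columns)."""
    return pd.read_csv(path)


def average_points(games):
    return games["points"].mean()


def high_scoring_games(games, threshold=HIGH_SCORING_THRESHOLD):
    """Rows where points exceed the threshold."""
    return games[games["points"] > threshold]


def plot_points_over_games(games):
    fig, ax = plt.subplots()
    ax.plot(games["games"], games["points"], marker="o")
    ax.set_xlabel("Game")
    ax.set_ylabel("Points")
    ax.set_title("Points by Game")
    return fig


def run_analysis(input_path=INPUT_PATH, output_path=OUTPUT_PATH):
    """Final cell: reproduce the original workflow end to end."""
    games = load_player_games(input_path)
    print(f"Average points: {average_points(games):.1f}")
    plot_points_over_games(games)
    # index=False avoids writing the row index as an unnamed extra column
    high_scoring_games(games).to_csv(output_path, index=False)
```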


Exercise 19: Interactive Widgets

Difficulty: Intermediate

Create a notebook with interactive widgets that allow users to:

  1. Select a player from a dropdown
  2. Choose a date range with sliders
  3. Toggle between different statistics with radio buttons
  4. Update a plot based on the selections

Use the ipywidgets library:

import ipywidgets as widgets
from IPython.display import display

# Your code here

Exercise 20: Notebook as a Report

Difficulty: Intermediate

Create a professional-looking notebook that serves as an analysis report:

  1. Include a title page with author, date, and abstract
  2. Add a table of contents using markdown
  3. Include executive summary
  4. Create publication-quality figures with proper labels
  5. Add references section
  6. Export to HTML and PDF formats (jupyter nbconvert --to html or --to pdf; PDF export requires a LaTeX installation)

Topic: "Analysis of Three-Point Shooting Trends Over 10 Seasons"


Exercise 21: Converting Notebooks to Scripts

Difficulty: Intermediate

  1. Create a notebook with reusable analysis functions
  2. Convert it to a Python script using jupyter nbconvert
  3. Refactor the script into a proper module with:
       - Docstrings
       - Type hints
       - A main function
       - A command-line interface

Commands:

jupyter nbconvert --to script notebook.ipynb

Exercise 22: Notebook Testing and CI/CD

Difficulty: Advanced

Set up a testing pipeline for Jupyter notebooks:

  1. Install nbval for notebook testing
  2. Create tests that verify notebook cells execute without errors
  3. Create a pytest configuration for notebook testing
  4. Write a GitHub Actions workflow that tests notebooks on push

Section D: Version Control with Git (Exercises 23-30)

Exercise 23: Git Basics

Difficulty: Beginner

Initialize a Git repository for a basketball analytics project:

  1. Create a new directory and initialize Git
  2. Create a .gitignore file appropriate for Python projects
  3. Create a simple Python script
  4. Stage and commit the files
  5. View the commit history
  6. Create a README.md and commit it

Document: Each command and its purpose.


Exercise 24: Branching and Merging

Difficulty: Beginner

Practice branching workflow:

  1. Create a new branch called feature/player-analysis
  2. Make changes to a Python file on this branch
  3. Commit the changes
  4. Switch back to the main branch
  5. Merge the feature branch into main
  6. Delete the feature branch

Exercise 25: Handling Merge Conflicts

Difficulty: Intermediate

Intentionally create and resolve a merge conflict:

  1. Create two branches from main
  2. Modify the same line of the same file in both branches
  3. Merge one branch into main
  4. Attempt to merge the second branch
  5. Resolve the conflict manually
  6. Complete the merge

Document: The conflict markers and how you resolved them.


Exercise 26: Git History and Logs

Difficulty: Intermediate

Use Git log commands to explore a repository's history:

  1. View the last 5 commits in one-line format
  2. View commits by a specific author
  3. View commits that changed a specific file
  4. View commits between two dates
  5. Find a commit that contains a specific word in the message
  6. View a graphical representation of branches

Commands to explore:

git log --oneline -5
git log --author="Name"
git log --follow -- path/to/file.py
git log --since="2023-01-01" --until="2023-12-31"
git log --grep="feature"
git log --graph --all --oneline

Exercise 27: Working with Remote Repositories

Difficulty: Intermediate

Practice working with GitHub:

  1. Create a new repository on GitHub
  2. Add the remote to your local repository
  3. Push your local commits to GitHub
  4. Make changes on GitHub (edit a file in the browser)
  5. Pull the changes to your local repository
  6. Create a feature branch and push it to GitHub

Exercise 28: Git Tags and Releases

Difficulty: Intermediate

Manage project versions with Git tags:

  1. Create an annotated tag for version 1.0.0
  2. Push tags to remote
  3. View all tags
  4. Checkout a specific tag
  5. Create a release on GitHub from a tag

Exercise 29: Git Stash and Recovery

Difficulty: Intermediate

Practice using Git stash for work-in-progress:

  1. Make changes to files without committing
  2. Stash the changes with a descriptive message
  3. Switch to another branch and do some work
  4. Return to original branch and apply the stash
  5. Clear the stash

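Also practice:

  - Recovering deleted files with git checkout -- <file> (or git restore <file> in Git 2.23+)
  - Undoing the last commit with git reset (e.g., git reset --soft HEAD~1 keeps the changes staged)
  - Viewing the reflog (git reflog) to find lost commits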


Exercise 30: Git Hooks and Automation

Difficulty: Advanced

Set up Git hooks for a Python project:

  1. Create a pre-commit hook that runs black (code formatter)
  2. Create a pre-commit hook that runs flake8 (linter)
  3. Create a pre-push hook that runs pytest
  4. Document how hooks work and where they are stored

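Alternative: Use the pre-commit framework, which reads hook definitions from a .pre-commit-config.yaml file and installs them into .git/hooks with pre-commit install: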

pip install pre-commit

Section E: Project Organization and Reproducibility (Exercises 31-37)

Exercise 31: Project Template Creation

Difficulty: Beginner

Create a complete project template for basketball analytics:

  1. Create the full directory structure (data, notebooks, src, tests, output)
  2. Create placeholder files in each directory
  3. Create README.md with project description
  4. Create .gitignore
  5. Create requirements.txt
  6. Initialize Git repository
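You can script steps 1 through 5 with pathlib, as in the sketch below. Three parts are assumptions: the .gitkeep placeholder (a common convention that lets Git track empty folders), the starter file contents, and the function name. Existing files are never overwritten.

```python
from pathlib import Path

# Directory layout from step 1; an empty .gitkeep lets Git track each empty folder.
DIRECTORIES = ["data", "notebooks", "src", "tests", "output"]

# Placeholder contents, to be replaced with real project details.
STARTER_FILES = {
    "README.md": "# Basketball Analytics\n\nProject description goes here.\n",
    ".gitignore": "__pycache__/\n*.py[cod]\nvenv/\n.ipynb_checkpoints/\n.env\n",
    "requirements.txt": "pandas>=2.0\nnumpy>=1.24\nmatplotlib>=3.7\n",
    "src/__init__.py": "",
}


def create_project(root):
    """Create the template under root without overwriting existing files."""
    root = Path(root)
    for name in DIRECTORIES:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / ".gitkeep").touch()
    for relative_path, content in STARTER_FILES.items():
        path = root / relative_path
        if not path.exists():
            path.write_text(content, encoding="utf-8")
    return root
```

After calling create_project("basketball_project"), run git init inside the new folder to complete step 6.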

Exercise 32: Documentation with Docstrings

Difficulty: Intermediate

Write comprehensive docstrings for the following functions:

def calculate_per(pts, reb, ast, stl, blk, fgm, fga, ftm, fta, to, pf, mins):
    # Calculate Player Efficiency Rating
    pass

def calculate_win_shares(player_stats, team_stats, league_stats):
    # Calculate Win Shares
    pass

def predict_game_outcome(home_team_stats, away_team_stats, model):
    # Predict game outcome
    pass

Include:

  - Description
  - Parameters with types
  - Returns with types
  - Examples
  - Notes on formula sources


Exercise 33: Configuration Management

Difficulty: Intermediate

Implement configuration management for an analytics project:

  1. Create a config.py file with project settings
  2. Create a config.yaml file for environment-specific settings
  3. Write a function to load configuration from YAML
  4. Implement environment variable overrides
  5. Handle secrets securely (API keys, database passwords)

Do not commit secrets to version control!
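Steps 3 and 4 can be combined as in the following sketch. Settings are layered as defaults, then config.yaml, then environment variables. Secrets such as the API key are expected to arrive only through the environment. Several parts are hypothetical: the field names, the NBA_ prefix, and the function names. YAML loading needs PyYAML (pip install pyyaml).

```python
import os
from dataclasses import dataclass, fields


@dataclass
class Config:
    """Project settings; field names and defaults are placeholders."""
    data_dir: str = "data"
    output_dir: str = "output"
    random_seed: int = 42
    api_key: str = ""  # never store a real key in code or config.yaml


ENV_PREFIX = "NBA_"  # hypothetical: NBA_API_KEY overrides api_key, etc.


def load_yaml(path):
    """Read config.yaml with PyYAML."""
    import yaml  # imported here so the rest of the module works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}


def build_config(file_settings, environ=None):
    """Precedence: defaults < config.yaml values < environment variables."""
    environ = os.environ if environ is None else environ
    values = {}
    for setting in fields(Config):
        raw = environ.get(ENV_PREFIX + setting.name.upper(), file_settings.get(setting.name))
        if raw is not None:
            # Cast strings such as "7" to int; bool fields would need special handling.
            values[setting.name] = setting.type(raw)
    return Config(**values)
```

A typical call is build_config(load_yaml("config.yaml")). Keep any file that holds real secrets, such as .env, listed in .gitignore.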


Exercise 34: Logging Implementation

Difficulty: Intermediate

Add logging to a basketball analytics script:

  1. Set up logging with different levels (DEBUG, INFO, WARNING, ERROR)
  2. Log to both console and file
  3. Include timestamps and function names in log messages
  4. Create rotating log files (max size, max files)
  5. Log the start and end of major operations

Starter code:

import logging

# Your logging configuration here
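The standard library covers all five requirements, as the sketch below shows. The console receives INFO and above, while a RotatingFileHandler (from logging.handlers) keeps DEBUG and above in size-limited files. The logger name "nba_analysis", the size limits, and load_season_data are illustrative assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(funcName)s | %(message)s"


def setup_logging(log_file="analysis.log", max_bytes=1_000_000, backup_count=5):
    """Console shows INFO and above; the rotating file keeps DEBUG and above."""
    logger = logging.getLogger("nba_analysis")
    if logger.handlers:  # already configured, e.g. a re-run notebook cell
        return logger
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter(LOG_FORMAT)

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    # Rolls over to analysis.log.1, .2, ... once the file reaches max_bytes
    file_handler = RotatingFileHandler(
        log_file, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger


def load_season_data():
    """Hypothetical major operation showing start/end log lines."""
    logger = logging.getLogger("nba_analysis")
    logger.info("Loading season data: start")
    logger.debug("Reading source files")
    logger.info("Loading season data: done")
```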

Exercise 35: Testing Your Analysis Code

Difficulty: Intermediate

Write tests for basketball analytics functions:

  1. Install pytest
  2. Create a tests directory with test files
  3. Write tests for statistical calculation functions
  4. Write tests for data loading functions
  5. Run tests and generate a coverage report

Example functions to test:

def calculate_true_shooting_pct(points, fga, fta):
    if fga + fta == 0:
        return 0.0
    return points / (2 * (fga + 0.44 * fta))

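def calculate_usage_rate(fga, fta, to, team_fga, team_fta, team_to, mins, team_mins):
    # Usage rate (%), Basketball-Reference formula:
    # 100 * (FGA + 0.44*FTA + TO) * (team_mins / 5) / (mins * (team_FGA + 0.44*team_FTA + team_TO))
    player_plays = fga + 0.44 * fta + to
    team_plays = team_fga + 0.44 * team_fta + team_to
    if mins == 0 or team_plays == 0:
        return 0.0
    return 100 * player_plays * (team_mins / 5) / (mins * team_plays)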
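A starting point for tests/test_stats.py might look like the sketch below. It covers calculate_true_shooting_pct with a typical line, an easy-to-check case, and the zero-attempt guard. The function is copied in so the file runs on its own; in your project, import it from your source module instead. For the coverage report, pytest --cov=src requires the pytest-cov plugin.

```python
# tests/test_stats.py
import math


def calculate_true_shooting_pct(points, fga, fta):
    # Copied from the exercise so this file runs standalone.
    if fga + fta == 0:
        return 0.0
    return points / (2 * (fga + 0.44 * fta))


def test_typical_stat_line():
    # 28 points on 22 FGA and 8 FTA: TSA = 22 + 0.44 * 8 = 25.52
    assert math.isclose(calculate_true_shooting_pct(28, 22, 8), 28 / (2 * 25.52))


def test_only_two_point_makes():
    # 10 made two-pointers on 10 attempts and no free throws: 20 / (2 * 10) = 1.0
    assert math.isclose(calculate_true_shooting_pct(20, 10, 0), 1.0)


def test_zero_attempts_returns_zero():
    assert calculate_true_shooting_pct(0, 0, 0) == 0.0
```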

Exercise 36: Reproducibility Checklist

Difficulty: Intermediate

Audit an analysis project for reproducibility. Create a checklist document that verifies:

  1. All dependencies are documented with versions
  2. Random seeds are set and documented
  3. Data sources are documented with access dates
  4. All preprocessing steps are scripted (not manual)
  5. Configuration is separate from code
  6. README includes complete setup instructions
  7. Tests verify key calculations

Apply this checklist to your own project.
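Item 2 on the checklist can be as simple as the sketch below: one documented seed that drives both Python's random module and a NumPy Generator. The seed value, the function name, and the placeholder point totals are assumptions.

```python
import random

import numpy as np

SEED = 2024  # record this value in the README alongside the results


def make_rng(seed=SEED):
    """Seed Python's global RNG and return a seeded NumPy Generator."""
    random.seed(seed)
    return np.random.default_rng(seed)


rng = make_rng()
# Same seed -> same bootstrap resample of these placeholder point totals on every run
resample = rng.choice(np.array([18, 22, 25, 30, 12]), size=5, replace=True)
print(resample)
```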


Exercise 37: Makefile for Automation

Difficulty: Advanced

Create a Makefile that automates common project tasks:

.PHONY: install test clean lint format run

install:
    # Install dependencies

test:
    # Run tests

clean:
    # Clean generated files

lint:
    # Run linting

format:
    # Format code

run:
    # Run main analysis

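Implement each target and document usage in README. Note that make requires recipe lines to be indented with a tab character; indenting them with spaces causes a "missing separator" error.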


Section F: Integration Challenges (Exercises 38-40)

Exercise 38: Complete Environment Setup Challenge

Difficulty: Advanced

Starting from scratch on a new machine (or in a Docker container):

  1. Document the complete process to set up a basketball analytics environment
  2. Install Python, pip, and all necessary tools
  3. Create a virtual environment
  4. Install all required packages
  5. Verify the installation
  6. Run a test analysis script

Time yourself and aim for under 15 minutes.


Exercise 39: Debugging Environment Issues

Difficulty: Advanced

You receive the following error messages. Diagnose and fix each:

  1. ModuleNotFoundError: No module named 'pandas'
  2. ImportError: cannot import name 'RandomForestClassifier' from 'sklearn'
  3. RuntimeError: Python 3.10 is required, but Python 3.8.10 is installed
  4. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  5. jupyter: command not found

Document your debugging process and solutions.


Exercise 40: Full Project Setup

Difficulty: Advanced

Create a complete, production-ready project setup for analyzing NBA player efficiency:

  1. Initialize project with proper structure
  2. Set up virtual environment with all dependencies
  3. Create sample data files
  4. Write analysis code in src/ directory
  5. Create Jupyter notebook that imports from src/
  6. Write comprehensive tests
  7. Set up pre-commit hooks for code quality
  8. Initialize Git with proper .gitignore
  9. Write complete README with setup instructions
  10. Verify a fresh clone can be set up and run successfully

Deliverable: GitHub repository (can be private) with all components.


Answer Key for Selected Exercises

Exercise 4: Version Check Script Solution

"""Check installed package versions against requirements."""

import importlib

from packaging import version  # third-party: pip install packaging

REQUIREMENTS = {
    'pandas': '2.0.0',
    'numpy': '1.24.0',
    'matplotlib': '3.7.0',
    'seaborn': '0.12.0'
}

def check_version(package_name, min_version):
    """Check if installed version meets minimum requirement."""
    try:
        module = importlib.import_module(package_name)
        installed = module.__version__
        meets_requirement = version.parse(installed) >= version.parse(min_version)
        status = "PASS" if meets_requirement else "FAIL"
        return installed, status
    except ImportError:
        return "NOT INSTALLED", "FAIL"

def main():
    print("Package Version Check")
    print("=" * 30)

    all_pass = True
    for package, min_ver in REQUIREMENTS.items():
        installed, status = check_version(package, min_ver)
        print(f"{package}: {installed} [{status}]")
        if status == "FAIL":
            all_pass = False

    print("=" * 30)
    if all_pass:
        print("All requirements satisfied!")
    else:
        print("Some requirements not met.")

if __name__ == "__main__":
    main()

Exercise 9: Virtual Environment Commands

# 1. Create directory
mkdir nba_analysis
cd nba_analysis

# 2. Create virtual environment
python -m venv venv

# 3. Activate (Windows)
venv\Scripts\activate

# 3. Activate (macOS/Linux)
source venv/bin/activate

# 4. Verify
which python   # macOS/Linux: should show a path inside venv
where python   # Windows: the venv path should be listed first
python --version

# 5. Install packages
pip install pandas numpy

# 6. Deactivate
deactivate

Exercise 23: Git Basics Commands

# 1. Create and initialize
mkdir basketball_analytics
cd basketball_analytics
git init -b main   # -b needs Git 2.28+; on older Git, run git branch -M main after the first commit

# 2. Create .gitignore
cat > .gitignore << 'EOF'
__pycache__/
*.py[cod]
venv/
.ipynb_checkpoints/
*.csv
.env
EOF

# 3. Create Python script
cat > analysis.py << 'EOF'
"""Basic basketball analysis script."""
import pandas as pd

def calculate_ppg(total_points, games_played):
    return total_points / games_played
EOF

# 4. Stage and commit
git add .gitignore analysis.py
git commit -m "Initial commit: Add gitignore and analysis script"

# 5. View history
git log --oneline

# 6. Create and commit README
cat > README.md << 'EOF'
# Basketball Analytics

A Python project for analyzing basketball statistics.
EOF

git add README.md
git commit -m "Add README documentation"

Submission Guidelines

For each exercise:

  1. Save your work in a clearly named folder (e.g., exercise_01/)
  2. Include all code files, notebooks, and documentation
  3. Add a brief reflection on what you learned
  4. Note any challenges you encountered and how you solved them

Grading Rubric:

  - Beginner exercises: Completion and correctness
  - Intermediate exercises: Code quality, documentation, and best practices
  - Advanced exercises: Comprehensive solution, professional quality, innovative approaches