Appendix B: Python Setup and Libraries

This appendix provides setup instructions and library references for the code examples in this textbook.


Python Installation

Recommended: Anaconda

  1. Download the installer from https://www.anaconda.com/download
  2. Install with the default settings
  3. The distribution includes Python, Jupyter, and the most common data libraries

Alternative: Standard Python

  1. Download from https://www.python.org/downloads/
  2. Install Python 3.9+
  3. Use pip for library installation
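With either option installed, a quick check confirms the interpreter meets the 3.9+ requirement. This is a minimal sketch; run it with the Python you just installed:

import sys

# The code examples assume Python 3.9 or newer
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"
print(f"Python {sys.version.split()[0]} detected")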

Required Libraries

Core Data Analysis

pip install pandas numpy scipy

Library   Version   Purpose
pandas    1.5+      Data manipulation
numpy     1.24+     Numerical computing
scipy     1.10+     Statistical functions

Visualization

pip install matplotlib seaborn plotly

Library      Version   Purpose
matplotlib   3.7+      Basic plotting
seaborn      0.12+     Statistical plots
plotly       5.15+     Interactive charts

Machine Learning

pip install scikit-learn

Library        Version   Purpose
scikit-learn   1.2+      ML algorithms

NFL Data

pip install nfl-data-py sportsipy

Library       Version   Purpose
nfl-data-py   0.3+      nflfastR Python port
sportsipy     0.6+      Sports Reference data

Web/API

pip install requests beautifulsoup4

Library          Version   Purpose
requests         2.28+     HTTP requests
beautifulsoup4   4.12+     HTML parsing
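Neither library gets a quick-reference section below, so here is a minimal, hypothetical fetch-and-parse sketch (the URL and CSS selector are placeholders; check a site's terms of use before scraping it):

import requests
from bs4 import BeautifulSoup

# Placeholder URL -- substitute a page you are permitted to scrape
url = "https://example.com/standings"
response = requests.get(url, timeout=10)
response.raise_for_status()          # surface HTTP errors early

# Parse the HTML and extract the text of each table row (selector is illustrative)
soup = BeautifulSoup(response.text, "html.parser")
rows = [row.get_text(" ", strip=True) for row in soup.select("table tr")]
print(rows[:5])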

Complete Environment Setup

Using requirements.txt

Create requirements.txt:

pandas>=1.5.0
numpy>=1.24.0
scipy>=1.10.0
matplotlib>=3.7.0
seaborn>=0.12.0
plotly>=5.15.0
scikit-learn>=1.2.0
nfl-data-py>=0.3.0
sportsipy>=0.6.0
requests>=2.28.0
beautifulsoup4>=4.12.0
jupyter>=1.0.0

Install:

pip install -r requirements.txt
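To confirm the installed versions meet the pinned minimums, a short standard-library check can be run afterward (a sketch using importlib.metadata, available since Python 3.8):

from importlib.metadata import version, PackageNotFoundError

# Distribution names exactly as they appear in requirements.txt
packages = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", "plotly",
            "scikit-learn", "nfl-data-py", "sportsipy", "requests",
            "beautifulsoup4", "jupyter"]

for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name} is NOT installed")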

Using Conda Environment

conda create -n nfl-analytics python=3.10
conda activate nfl-analytics
conda install pandas numpy scipy matplotlib seaborn scikit-learn jupyter
pip install nfl-data-py

Jupyter Notebook Setup

Starting Jupyter

jupyter notebook
# or
jupyter lab

Useful Magic Commands

# Display plots inline
%matplotlib inline

# Auto-reload modules
%load_ext autoreload
%autoreload 2

# Show all output (not just last)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Library Quick Reference

Pandas Basics

import pandas as pd

# Read data
df = pd.read_csv('data.csv')

# Basic operations
df.head()           # First 5 rows
df.describe()       # Summary stats
df.info()           # Column types

# Filtering
df[df['column'] > value]
df.query('column > @value')

# Grouping
df.groupby('team').mean(numeric_only=True)
df.groupby(['team', 'year']).agg({'pts': 'sum'})

# Merging
pd.merge(df1, df2, on='key')
df1.join(df2, on='key')    # matches df1['key'] against df2's index

NumPy Basics

import numpy as np

# Array creation
arr = np.array([1, 2, 3])
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))

# Statistics
np.mean(arr)
np.std(arr)
np.percentile(arr, 75)

# Random
np.random.normal(mean, std, size)
np.random.choice(arr, size, replace=True)

Matplotlib Basics

import matplotlib.pyplot as plt

# Basic plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Series')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Title')
plt.legend()
plt.savefig('plot.png', dpi=150)
plt.show()

# Scatter plot
plt.scatter(x, y, c=colors, s=sizes)

# Histogram
plt.hist(data, bins=30, edgecolor='black')

# Bar chart
plt.bar(categories, values)

Seaborn Basics

import seaborn as sns

# Distribution
sns.histplot(df['column'], kde=True)
sns.kdeplot(df['column'])

# Relationships
sns.scatterplot(data=df, x='x', y='y', hue='group')
sns.regplot(data=df, x='x', y='y')

# Categorical
sns.boxplot(data=df, x='category', y='value')
sns.barplot(data=df, x='category', y='value')

# Heatmap
sns.heatmap(correlation_matrix, annot=True)

Scikit-Learn Basics

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Linear regression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)

# Logistic regression
clf = LogisticRegression()
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

NFL Data Access

Using nfl-data-py

import nfl_data_py as nfl

# Load play-by-play
pbp = nfl.import_pbp_data([2022, 2023])

# Load schedules
schedules = nfl.import_schedules([2022, 2023])

# Load rosters
rosters = nfl.import_rosters([2022, 2023])

# Load combine data
combine = nfl.import_combine_data([2020, 2021, 2022])

Common Columns (Play-by-Play)

Column         Description
play_id        Unique play identifier
game_id        Game identifier
posteam        Team with possession
defteam        Defensive team
down           Current down
ydstogo        Yards to go for a first down
yardline_100   Yards from the opponent's end zone
play_type      Type of play (e.g., run, pass, punt)
yards_gained   Yards gained on the play
epa            Expected points added
wp             Win probability
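As a small worked example tying these columns to the loader above, the snippet below ranks offenses by average EPA per play; the run/pass filter is one common convention, not the only reasonable choice:

import nfl_data_py as nfl

pbp = nfl.import_pbp_data([2023])

# Keep ordinary offensive snaps: run and pass plays with a possession team
plays = pbp[pbp["play_type"].isin(["run", "pass"]) & pbp["posteam"].notna()]

# Average EPA per play for each offense, best first
epa_by_offense = (
    plays.groupby("posteam")["epa"]
    .mean()
    .sort_values(ascending=False)
)
print(epa_by_offense.head(10))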

Project Structure

Recommended directory structure:

nfl-analytics-project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01-data-exploration.ipynb
│   └── 02-analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_loading.py
│   ├── analysis.py
│   └── visualization.py
├── tests/
│   └── test_analysis.py
├── requirements.txt
└── README.md
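As an illustration of how the src/ modules might be filled in, here is a hypothetical sketch of data_loading.py; the function name, cache location, and CSV caching choice are assumptions, not a prescribed layout:

# src/data_loading.py (hypothetical sketch)
from pathlib import Path

import nfl_data_py as nfl
import pandas as pd

RAW_DIR = Path("data/raw")


def load_pbp(years, cache=True):
    """Load play-by-play data for the given seasons, caching a CSV copy under data/raw/."""
    RAW_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = RAW_DIR / f"pbp_{min(years)}_{max(years)}.csv"
    if cache and cache_file.exists():
        return pd.read_csv(cache_file, low_memory=False)
    pbp = nfl.import_pbp_data(list(years))
    if cache:
        pbp.to_csv(cache_file, index=False)
    return pbp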

Troubleshooting

Common Issues

Import Errors:

pip install --upgrade <library>

Version Conflicts:

pip install <library>==<version>

Memory Issues:

  - Use the chunksize parameter in pd.read_csv()
  - Filter data early in the pipeline
  - Use appropriate dtypes
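For example, a chunked read that keeps only the needed columns and rows might look like this (the file name and column choices are placeholders):

import pandas as pd

reader = pd.read_csv(
    "data/raw/pbp_2023.csv",          # placeholder path
    usecols=["posteam", "play_type", "epa"],
    dtype={"epa": "float32"},         # smaller dtype than the default float64
    chunksize=100_000,                # yields DataFrames of up to 100k rows
)

# Filter each chunk before concatenating so the full file never sits in memory
passes = pd.concat(chunk[chunk["play_type"] == "pass"] for chunk in reader)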

Plotting Not Showing:

%matplotlib inline
plt.show()

This appendix provides the technical foundation for running code examples. Refer back as needed when setting up your environment.