Exercises: Python for Football Analytics
Build practical programming skills for football data analysis.
Scoring Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)
Part A: Conceptual Understanding ⭐
A.1. Why do we use virtual environments for Python projects? List three benefits.
A.2. Explain the difference between using a Python for loop and using NumPy vectorization for calculations. When is each appropriate?
A.3. What is method chaining in pandas? Write an example showing the same operation with and without chaining.
A.4. Describe the groupby-agg pattern in pandas. What are the key steps?
A.5. What does the SettingWithCopyWarning indicate, and how do you fix it?
A.6. Why must multiple conditions such as df['col'] > value be combined with & and | rather than Python's and/or?
A.7. What information should a good function docstring contain?
A.8. Explain the difference between .transform() and .agg() in groupby operations.
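As a starting point for A.6, the sketch below shows on a small made-up DataFrame how element-wise `&` behaves where `and` fails. The column names mirror common play-by-play fields, but the data and variable names are illustrative assumptions.

```python
import pandas as pd

# Toy data; values are invented for illustration only.
plays = pd.DataFrame({
    "down": [1, 3, 3, 2],
    "epa": [0.4, -0.2, 1.1, 0.0],
})

# Each comparison yields a boolean Series; `&` combines them element-wise.
# The parentheses are required because `&` binds tighter than `==` and `>`.
mask = (plays["down"] == 3) & (plays["epa"] > 0)
third_down_wins = plays[mask]

# Python's `and` asks for one True/False for the whole Series,
# which pandas refuses to guess, so it raises ValueError.
try:
    (plays["down"] == 3) and (plays["epa"] > 0)
    and_error = None
except ValueError as exc:
    and_error = exc
```

Inspecting `and_error` shows the message pandas raises, which is a useful clue when writing up the answer.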
Part B: Pandas Practice ⭐⭐
B.1. Load 2023 play-by-play data and answer:
- How many passing plays had positive EPA?
- What percentage of rushing plays gained 4+ yards?
- Which team had the most plays in the red zone?
B.2. Using groupby, calculate for each team:
- Total offensive plays
- EPA per play
- Pass rate (percentage of plays that were passes)
- Success rate (percentage with EPA > 0)
B.3. Create a new DataFrame showing weekly QB performance:
- Filter to pass plays only
- Group by week and passer_player_name
- Calculate: dropbacks, EPA, completion %, air yards per attempt
- Filter to QBs with 20+ dropbacks per week
B.4. Write a method chain that:
- Filters to third down plays
- Excludes garbage time (score differential > 17 in 4th quarter)
- Groups by team and calculates conversion rate
- Sorts by conversion rate descending
B.5. Merge play-by-play data with roster data to add player positions. How many pass plays targeted tight ends vs wide receivers?
B.6. Use .transform() to add columns showing:
- Each play's EPA compared to team average
- Cumulative EPA within each game
- Rank of each play's EPA within its game
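The three columns in B.6 all rely on group-wise operations that return one value per original row. A minimal sketch on invented data follows; it assumes the rows are already in play order and uses `posteam`, `game_id`, and `epa` as column names. `.cumsum()` and `.rank()` are used here as row-aligned groupby methods alongside `.transform()`.

```python
import pandas as pd

# Toy data; values are invented for illustration only.
plays = pd.DataFrame({
    "game_id": ["g1", "g1", "g1", "g2", "g2"],
    "posteam": ["KC", "KC", "DET", "KC", "KC"],
    "epa": [0.5, -0.1, 0.3, 1.0, 0.2],
})

# .agg() collapses each group to a single row.
team_means = plays.groupby("posteam")["epa"].agg("mean")

# .transform() returns one value per original row, aligned to the index,
# so the result can be assigned straight back as a new column.
plays["epa_vs_team"] = (
    plays["epa"] - plays.groupby("posteam")["epa"].transform("mean")
)

# cumsum() and rank() on a groupby are also row-aligned.
plays["cum_epa"] = plays.groupby("game_id")["epa"].cumsum()
plays["epa_rank"] = plays.groupby("game_id")["epa"].rank(ascending=False)
```

Comparing the shape of `team_means` (one row per team) with the new columns (one row per play) is the core of the `.agg()` versus `.transform()` distinction asked about in A.8.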
Part C: Programming Challenges ⭐⭐-⭐⭐⭐
C.1. Write a Function: Success Rate Calculator ⭐⭐
from typing import Optional, Tuple

import pandas as pd


def success_rate_by_situation(
    pbp: pd.DataFrame,
    down: Optional[int] = None,
    distance_range: Optional[Tuple[int, int]] = None,
    field_zone: Optional[str] = None,
) -> pd.DataFrame:
    """
    Calculate success rate for specific situations.

    Parameters
    ----------
    pbp : pd.DataFrame
        Play-by-play data
    down : int, optional
        Filter to specific down (1, 2, 3, or 4)
    distance_range : tuple, optional
        (min, max) yards to go range
    field_zone : str, optional
        'red_zone', 'midfield', or 'own_territory'

    Returns
    -------
    pd.DataFrame
        Success rate by team for the specified situation
    """
    pass


# Test cases
# success_rate_by_situation(pbp, down=3, distance_range=(5, 10))
# success_rate_by_situation(pbp, field_zone='red_zone')
C.2. Write a Function: Player Comparison ⭐⭐
from typing import Sequence

import pandas as pd


def compare_players(
    pbp: pd.DataFrame,
    player_ids: list,
    # A tuple default avoids sharing one mutable list across calls.
    metrics: Sequence[str] = ('epa', 'yards_gained', 'success'),
) -> pd.DataFrame:
    """
    Compare multiple players on specified metrics.

    Returns a DataFrame with one row per player showing
    their average for each metric.
    """
    pass
C.3. Write a Class: GameAnalyzer ⭐⭐⭐
class GameAnalyzer:
    """
    Analyze a single NFL game.

    Usage:
        analyzer = GameAnalyzer(pbp, game_id='2023_01_DET_KC')
        analyzer.summary()
        analyzer.key_plays(n=5)
        analyzer.team_comparison()
    """

    def __init__(self, pbp: pd.DataFrame, game_id: str):
        pass

    def summary(self) -> dict:
        """Return game summary statistics."""
        pass

    def key_plays(self, n: int = 5) -> pd.DataFrame:
        """Return the n highest-impact plays by EPA."""
        pass

    def team_comparison(self) -> pd.DataFrame:
        """Compare the two teams on key metrics."""
        pass

    def quarter_breakdown(self) -> pd.DataFrame:
        """Show statistics by quarter."""
        pass
C.4. Write a Function: Rolling Metrics ⭐⭐⭐
def calculate_rolling_metrics(
    pbp: pd.DataFrame,
    player_col: str,
    metric_col: str,
    window: int = 100
) -> pd.DataFrame:
    """
    Calculate rolling metrics for players across their plays.

    Returns DataFrame with player, play number, and rolling average.
    Useful for tracking player performance trends.
    """
    pass
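The core technique behind C.4 is a rolling window computed separately within each player's plays. The sketch below illustrates it on invented data; the `player` and `epa` column names, the tiny window, and the use of `min_periods=1` are assumptions made for readability, not requirements of the exercise.

```python
import pandas as pd

# Toy data; assumed to be sorted in play order already.
plays = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B"],
    "epa": [1.0, 0.0, 2.0, 0.5, 1.5],
})

window = 2  # small window so the toy output is easy to check by hand

# Number each player's plays 1, 2, 3, ...
plays["play_number"] = plays.groupby("player").cumcount() + 1

# Rolling mean computed within each player's own plays.
# min_periods=1 yields a value before the window is full; without it the
# first window-1 rows for each player would be NaN.
plays["rolling_epa"] = (
    plays.groupby("player")["epa"]
    .transform(lambda s: s.rolling(window, min_periods=1).mean())
)
```

Whether early plays should report a partial-window average or NaN is a design choice worth stating in the function's docstring.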
C.5. Build a Data Pipeline ⭐⭐⭐
Create a complete pipeline that:
- Loads multiple seasons of data
- Cleans and standardizes fields
- Adds derived columns
- Caches intermediate results
- Outputs player season summaries
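One possible shape for the caching step is sketched below. `load_with_cache` and `fake_loader` are hypothetical names, and pickle is used only because it needs no extra dependencies; a real pipeline might prefer another on-disk format.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def load_with_cache(
    cache_path: Path, loader: Callable[[], pd.DataFrame]
) -> pd.DataFrame:
    """Return the cached DataFrame if present, else build and cache it.

    Hypothetical helper: `loader` is any zero-argument function that
    produces the DataFrame (for example, a download-and-clean step).
    """
    if cache_path.exists():
        return pd.read_pickle(cache_path)
    df = loader()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(cache_path)
    return df


# Demonstration with a stand-in loader that records how often it runs.
calls = []


def fake_loader() -> pd.DataFrame:
    calls.append(1)
    return pd.DataFrame({"season": [2022, 2023], "plays": [10, 12]})


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache" / "pbp.pkl"
    first = load_with_cache(path, fake_loader)   # runs the loader
    second = load_with_cache(path, fake_loader)  # served from disk
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.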
Part D: NumPy and Performance ⭐⭐-⭐⭐⭐
D.1. Vectorization Exercise ⭐⭐
Rewrite this loop-based function using NumPy vectorization:
def calculate_garbage_time_slow(pbp):
    results = []
    for idx, row in pbp.iterrows():
        if (row['game_seconds_remaining'] < 300 and
                abs(row['posteam_score'] - row['defteam_score']) > 17):
            results.append(1)
        else:
            results.append(0)
    return results


# Write a vectorized version that's 100x+ faster
def calculate_garbage_time_fast(pbp):
    pass
D.2. Memory Optimization ⭐⭐
Load play-by-play data and:
- Check initial memory usage
- Identify columns that could use smaller data types
- Apply optimizations
- Report memory savings
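The measure, convert, and re-measure workflow can be sketched on a synthetic frame as follows. The columns and values are invented stand-ins for real play-by-play data, and which columns are safe to convert in a real dataset is something to check yourself.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for play-by-play data; values are invented.
n = 10_000
plays = pd.DataFrame({
    "posteam": np.tile(["KC", "DET", "SF", "BUF"], n // 4),
    "down": np.tile(np.array([1, 2, 3, 4], dtype="int64"), n // 4),
    "epa": np.linspace(-2.0, 2.0, n),
})

# deep=True counts the actual string contents, not just the pointers.
before = plays.memory_usage(deep=True).sum()

optimized = plays.copy()
# Low-cardinality strings store far more compactly as categoricals.
optimized["posteam"] = optimized["posteam"].astype("category")
# downcast="integer" picks the smallest integer type that holds the values.
optimized["down"] = pd.to_numeric(optimized["down"], downcast="integer")
# downcast="float" may convert float64 to float32, trading precision
# for memory; confirm the precision loss is acceptable for your analysis.
optimized["epa"] = pd.to_numeric(optimized["epa"], downcast="float")

after = optimized.memory_usage(deep=True).sum()
savings_pct = 100 * (before - after) / before
```

Printing `optimized.dtypes` next to `plays.dtypes` gives a quick before-and-after view for the report.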
D.3. Performance Comparison ⭐⭐⭐
For each of these tasks, write two versions (slow and fast) and time them:
1. Calculate the mean EPA for each team
2. Add a column indicating if EPA is above team average
3. Calculate week-over-week change in EPA for each player
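A small timing harness makes these comparisons repeatable. The sketch below applies one to the first task using randomly generated data; `time_call`, `mean_epa_slow`, and `mean_epa_fast` are hypothetical names, and actual timings will vary by machine.

```python
import time
from typing import Callable

import numpy as np
import pandas as pd


def time_call(func: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Toy data; values are randomly generated for illustration only.
rng = np.random.default_rng(0)
plays = pd.DataFrame({
    "posteam": rng.choice(["KC", "DET", "SF", "BUF"], size=5_000),
    "epa": rng.normal(size=5_000),
})


def mean_epa_slow(df: pd.DataFrame) -> dict:
    """Row-by-row accumulation with iterrows (deliberately slow)."""
    totals, counts = {}, {}
    for _, row in df.iterrows():
        team = row["posteam"]
        totals[team] = totals.get(team, 0.0) + row["epa"]
        counts[team] = counts.get(team, 0) + 1
    return {team: totals[team] / counts[team] for team in totals}


def mean_epa_fast(df: pd.DataFrame) -> pd.Series:
    """The same calculation as a single groupby."""
    return df.groupby("posteam")["epa"].mean()


# Check that both versions agree before comparing their speed.
slow_result = mean_epa_slow(plays)
fast_result = mean_epa_fast(plays)

slow_seconds = time_call(lambda: mean_epa_slow(plays))
fast_seconds = time_call(lambda: mean_epa_fast(plays))
```

Verifying that the slow and fast versions return the same numbers matters as much as the timing itself, since a fast wrong answer is not an optimization.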
Part E: Project Integration ⭐⭐⭐⭐
E.1. Build a Reusable Analysis Module
Create a Python module (football_utils.py) containing:
- Data loading functions with caching
- Common filters (passes, rushes, garbage time, etc.)
- Standard aggregation functions
- Utility functions for common calculations
Include comprehensive docstrings and type hints.
E.2. Create a Player Analysis Report Generator
Build a function that takes a player ID and season, then generates a complete statistical report including:
- Season summary statistics
- Weekly performance trend
- Comparison to league average
- Situational splits (down, field position, etc.)
Output as a formatted dictionary or markdown string.
E.3. Implement a Testing Suite
Write pytest tests for at least 5 functions from this chapter:
- Test normal cases
- Test edge cases (empty data, missing values)
- Test error handling
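The three kinds of test listed above might look like the following for one small function. `success_rate` here is a hypothetical function under test, written only to give the tests something to exercise; its behavior on empty input (returning NaN) is an assumed design choice.

```python
import pandas as pd
import pytest


def success_rate(pbp: pd.DataFrame) -> float:
    """Hypothetical function under test: share of plays with EPA > 0."""
    if "epa" not in pbp.columns:
        raise KeyError("pbp must contain an 'epa' column")
    valid = pbp["epa"].dropna()
    if valid.empty:
        return float("nan")
    return float((valid > 0).mean())


def test_normal_case():
    pbp = pd.DataFrame({"epa": [0.5, -0.2, 1.0, -1.0]})
    assert success_rate(pbp) == pytest.approx(0.5)


def test_missing_values_are_ignored():
    # Edge case: the NaN row is dropped, leaving one success out of two.
    pbp = pd.DataFrame({"epa": [0.5, None, -0.2]})
    assert success_rate(pbp) == pytest.approx(0.5)


def test_empty_frame_returns_nan():
    # Edge case: no plays at all.
    pbp = pd.DataFrame({"epa": pd.Series([], dtype="float64")})
    assert pd.isna(success_rate(pbp))


def test_missing_column_raises():
    # Error handling: a frame without the required column.
    with pytest.raises(KeyError):
        success_rate(pd.DataFrame({"yards_gained": [3, 7]}))
```

Saved in a file whose name starts with `test_`, these functions are discovered and run by the `pytest` command.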
Solutions
Selected solutions are available in code/exercise-solutions.py