Exercises: Python for Football Analytics

Build practical programming skills for football data analysis.

Scoring Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)


Part A: Conceptual Understanding ⭐

A.1. Why do we use virtual environments for Python projects? List three benefits.

A.2. Explain the difference between a Python for loop and NumPy vectorization for numerical calculations. When is each appropriate?
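As a small illustration of the contrast in A.2, the sketch below flags "successful" plays (EPA > 0) both ways. The EPA values are made up for the example.

```python
import numpy as np

# Toy EPA values for a handful of plays (made-up numbers, for illustration only)
epa = np.array([0.45, -1.20, 0.05, 2.10, -0.30, 0.00])

# Loop version: the Python interpreter visits one element at a time
success_loop = []
for value in epa:
    success_loop.append(1 if value > 0 else 0)

# Vectorized version: a single comparison over the whole array, run in compiled code
success_vec = (epa > 0).astype(int)

print(success_loop)           # [1, 0, 1, 1, 0, 0]
print(success_vec.tolist())   # [1, 0, 1, 1, 0, 0]
```

Both produce the same flags; the vectorized form scales far better on a full season of plays. A loop can still be the clearer choice when each step depends on the previous step's result in a way no NumPy or pandas function expresses, or when the data is tiny.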

A.3. What is method chaining in pandas? Write an example showing the same operation with and without chaining.

A.4. Describe the groupby-agg pattern in pandas. What are the key steps?
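One way to see the split-apply-combine steps in A.4 is pandas' named aggregation on a toy table. The column names (posteam, play_type, epa) follow the nflfastR convention and are an assumption here; the values are invented.

```python
import pandas as pd

# Toy play-by-play rows (invented values, nflfastR-style column names)
pbp = pd.DataFrame({
    "posteam": ["KC", "KC", "KC", "DET", "DET"],
    "play_type": ["pass", "run", "pass", "pass", "run"],
    "epa": [0.8, -0.4, 0.2, -0.6, 0.9],
})

# Split by team -> apply several aggregations -> combine into one row per team
team_summary = (
    pbp.groupby("posteam")
    .agg(
        plays=("epa", "size"),
        epa_per_play=("epa", "mean"),
        pass_rate=("play_type", lambda s: (s == "pass").mean()),
    )
    .reset_index()
)
print(team_summary)
```

Each keyword names an output column and maps it to an (input column, aggregation) pair, which keeps the result's column names explicit.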

A.5. What does the SettingWithCopyWarning indicate, and how do you fix it?
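For A.5, a minimal sketch of the pattern that typically triggers the warning and two common fixes. The DataFrame and the explosive/explosive_pass column names are hypothetical.

```python
import pandas as pd

pbp = pd.DataFrame({
    "play_type": ["pass", "run", "pass"],
    "yards_gained": [25, 3, 8],
})

# Risky pattern (left commented out): assigning into the result of a boolean filter
# can trigger SettingWithCopyWarning, because pandas cannot tell whether `passes`
# is a view of pbp or an independent copy.
#   passes = pbp[pbp["play_type"] == "pass"]
#   passes["explosive"] = passes["yards_gained"] >= 20

# Fix 1: take an explicit copy when you want an independent DataFrame
passes = pbp[pbp["play_type"] == "pass"].copy()
passes["explosive"] = passes["yards_gained"] >= 20

# Fix 2: write back to the original with a single .loc[row_mask, column] call
pbp["explosive_pass"] = False
pbp.loc[(pbp["play_type"] == "pass") & (pbp["yards_gained"] >= 20), "explosive_pass"] = True
```

Fix 1 leaves pbp untouched; Fix 2 deliberately modifies pbp in place. Pick whichever matches your intent.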

A.6. When filtering a DataFrame on several conditions such as df['col'] > value, why must the conditions be combined with & and | rather than Python's and/or keywords?
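The behavior behind A.6 is easy to reproduce on a toy table (column names down and ydstogo follow nflfastR; values are invented): & combines boolean Series element by element, while `and` tries to reduce a whole Series to one True/False and fails.

```python
import pandas as pd

pbp = pd.DataFrame({"down": [1, 3, 3, 4], "ydstogo": [10, 2, 8, 1]})

# Element-wise AND: use & and wrap each comparison in parentheses,
# because & binds more tightly than == and <=
short_third = pbp[(pbp["down"] == 3) & (pbp["ydstogo"] <= 3)]
print(short_third)

# Python's `and` calls bool() on the first Series, which is ambiguous for many values
raised = False
try:
    pbp[(pbp["down"] == 3) and (pbp["ydstogo"] <= 3)]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```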

A.7. What information should a good function docstring contain?

A.8. Explain the difference between .transform() and .agg() in groupby operations.
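The difference asked about in A.8 shows up in the shape of the result. A minimal sketch on invented data (posteam and epa are nflfastR-style names, assumed here):

```python
import pandas as pd

pbp = pd.DataFrame({
    "posteam": ["KC", "KC", "DET", "DET", "DET"],
    "epa": [0.6, -0.2, 0.3, 0.0, -0.6],
})

# .agg() collapses each group to one value: one entry per team
team_mean = pbp.groupby("posteam")["epa"].agg("mean")
print(team_mean.shape)   # (2,)

# .transform() broadcasts each group's result back to every original row,
# so it can be stored as a new column aligned with pbp
pbp["team_mean_epa"] = pbp.groupby("posteam")["epa"].transform("mean")
pbp["epa_vs_team"] = pbp["epa"] - pbp["team_mean_epa"]
print(pbp)
```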


Part B: Pandas Practice ⭐⭐

B.1. Load 2023 play-by-play data and answer:
- How many passing plays had positive EPA?
- What percentage of rushing plays gained 4+ yards?
- Which team had the most plays in the red zone?

B.2. Using groupby, calculate for each team:
- Total offensive plays
- EPA per play
- Pass rate (percentage of plays that were passes)
- Success rate (percentage with EPA > 0)

B.3. Create a new DataFrame showing weekly QB performance:
- Filter to pass plays only
- Group by week and passer_player_name
- Calculate: dropbacks, EPA, completion %, air yards per attempt
- Filter to QBs with 20+ dropbacks per week

B.4. Write a method chain that:
- Filters to third down plays
- Excludes garbage time (score differential > 17 in 4th quarter)
- Groups by team and calculates conversion rate
- Sorts by conversion rate descending
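One possible shape for the B.4 chain, sketched on invented rows. The column names down, qtr, score_differential, posteam, and third_down_converted mirror nflfastR but are assumptions; check them against the data you load.

```python
import pandas as pd

# Toy rows with nflfastR-style column names; the values are invented
pbp = pd.DataFrame({
    "posteam": ["KC", "KC", "KC", "DET", "DET", "DET"],
    "down": [3, 3, 1, 3, 3, 3],
    "qtr": [1, 4, 2, 2, 3, 4],
    "score_differential": [0, 21, 3, -7, 0, -20],
    "third_down_converted": [1, 0, 0, 1, 0, 1],
})

third_down_rates = (
    pbp
    .query("down == 3")
    # garbage time as defined in B.4: 4th quarter with a margin above 17 points
    .loc[lambda d: ~((d["qtr"] == 4) & (d["score_differential"].abs() > 17))]
    .groupby("posteam", as_index=False)
    .agg(
        attempts=("third_down_converted", "size"),
        conversion_rate=("third_down_converted", "mean"),
    )
    .sort_values("conversion_rate", ascending=False)
)
print(third_down_rates)
```

Passing a lambda to .loc lets the filter refer to the intermediate DataFrame without breaking the chain into temporary variables.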

B.5. Merge play-by-play data with roster data to add player positions. How many pass plays targeted tight ends vs wide receivers?
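A sketch of the B.5 merge on toy tables. The join columns receiver_player_id (play-by-play) and gsis_id/position (rosters) follow common nflverse naming but are assumptions here, and the player IDs are invented.

```python
import pandas as pd

# Toy tables; verify the real column names in the data you load
pbp = pd.DataFrame({
    "play_type": ["pass", "pass", "pass", "run", "pass"],
    "receiver_player_id": ["P1", "P2", "P3", None, "P1"],
})
rosters = pd.DataFrame({
    "gsis_id": ["P1", "P2", "P3"],
    "position": ["TE", "WR", "WR"],
})

targets = (
    pbp[pbp["play_type"] == "pass"]
    .merge(
        rosters[["gsis_id", "position"]],
        left_on="receiver_player_id",
        right_on="gsis_id",
        how="left",
        validate="many_to_one",  # raises MergeError if a gsis_id repeats in rosters
    )
)
print(targets["position"].value_counts())
```

Roster tables covering several seasons typically list a player once per season, so filtering to a single season (or de-duplicating) before the merge keeps the validate check from failing and prevents plays from being silently duplicated.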

B.6. Use .transform() to add columns showing:
- Each play's EPA compared to team average
- Cumulative EPA within each game
- Rank of each play's EPA within its game
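For the last two B.6 columns, a minimal sketch on invented data (game_id, play_id, and epa are nflfastR-style names, assumed here). Cumulative values depend on play order, so the frame is sorted first.

```python
import pandas as pd

pbp = pd.DataFrame({
    "game_id": ["G1", "G1", "G1", "G2", "G2"],
    "play_id": [1, 2, 3, 1, 2],
    "epa": [0.5, -1.0, 2.0, 0.3, 0.7],
})

pbp = pbp.sort_values(["game_id", "play_id"])  # cumulative sums need chronological order
by_game = pbp.groupby("game_id")["epa"]

# Running total of EPA within each game, aligned to the original rows
pbp["cum_epa"] = by_game.transform("cumsum")

# Rank 1 = highest-EPA play of its game; ties share the lowest rank
pbp["epa_rank_in_game"] = by_game.transform(
    lambda s: s.rank(ascending=False, method="min")
)
print(pbp)
```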


Part C: Programming Challenges ⭐⭐-⭐⭐⭐

C.1. Write a Function: Success Rate Calculator ⭐⭐

from typing import Optional

import pandas as pd

def success_rate_by_situation(
    pbp: pd.DataFrame,
    down: Optional[int] = None,
    distance_range: Optional[tuple] = None,
    field_zone: Optional[str] = None
) -> pd.DataFrame:
    """
    Calculate success rate for specific situations.

    Parameters
    ----------
    pbp : pd.DataFrame
        Play-by-play data
    down : int, optional
        Filter to specific down (1, 2, 3, or 4)
    distance_range : tuple, optional
        (min, max) yards to go range
    field_zone : str, optional
        'red_zone', 'midfield', or 'own_territory'

    Returns
    -------
    pd.DataFrame
        Success rate by team for the specified situation
    """
    pass

# Test cases
# success_rate_by_situation(pbp, down=3, distance_range=(5, 10))
# success_rate_by_situation(pbp, field_zone='red_zone')

C.2. Write a Function: Player Comparison ⭐⭐

def compare_players(
    pbp: pd.DataFrame,
    player_ids: list,
    metrics: tuple = ('epa', 'yards_gained', 'success')
) -> pd.DataFrame:
    """
    Compare multiple players on specified metrics.

    Returns a DataFrame with one row per player showing
    their average for each metric.
    """
    pass

C.3. Write a Class: GameAnalyzer ⭐⭐⭐

class GameAnalyzer:
    """
    Analyze a single NFL game.

    Usage:
        analyzer = GameAnalyzer(pbp, game_id='2023_01_DET_KC')
        analyzer.summary()
        analyzer.key_plays(n=5)
        analyzer.team_comparison()
    """

    def __init__(self, pbp: pd.DataFrame, game_id: str):
        pass

    def summary(self) -> dict:
        """Return game summary statistics."""
        pass

    def key_plays(self, n: int = 5) -> pd.DataFrame:
        """Return the n highest-impact plays by EPA."""
        pass

    def team_comparison(self) -> pd.DataFrame:
        """Compare the two teams on key metrics."""
        pass

    def quarter_breakdown(self) -> pd.DataFrame:
        """Show statistics by quarter."""
        pass

C.4. Write a Function: Rolling Metrics ⭐⭐⭐

def calculate_rolling_metrics(
    pbp: pd.DataFrame,
    player_col: str,
    metric_col: str,
    window: int = 100
) -> pd.DataFrame:
    """
    Calculate rolling metrics for players across their plays.

    Returns DataFrame with player, play number, and rolling average.
    Useful for tracking player performance trends.
    """
    pass
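One possible sketch of the C.4 stub, assuming pbp is already in chronological order and that a player's first window-1 plays should show NaN rather than a partial average (a design choice; min_periods=1 is the alternative). The rolling_<metric> output column name is illustrative.

```python
import pandas as pd


def calculate_rolling_metrics(
    pbp: pd.DataFrame,
    player_col: str,
    metric_col: str,
    window: int = 100,
) -> pd.DataFrame:
    """Rolling mean of metric_col over each player's last `window` plays.

    Assumes pbp is already sorted chronologically (e.g. by game and play).
    """
    # Keep only plays attributed to a player
    plays = pbp.loc[pbp[player_col].notna(), [player_col, metric_col]].copy()
    # 1-based play counter within each player
    plays["play_number"] = plays.groupby(player_col).cumcount() + 1
    # min_periods=window leaves NaN until a player has a full window of plays
    plays[f"rolling_{metric_col}"] = plays.groupby(player_col)[metric_col].transform(
        lambda s: s.rolling(window, min_periods=window).mean()
    )
    return plays[[player_col, "play_number", f"rolling_{metric_col}"]].reset_index(drop=True)


# Usage on invented data with a short window
sample = pd.DataFrame({
    "passer_player_name": ["A", "A", "B", "A", "B"],
    "epa": [1.0, 3.0, 2.0, 5.0, 4.0],
})
result = calculate_rolling_metrics(sample, "passer_player_name", "epa", window=2)
print(result)
```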

C.5. Build a Data Pipeline ⭐⭐⭐

Create a complete pipeline that:
- Loads multiple seasons of data
- Cleans and standardizes fields
- Adds derived columns
- Caches intermediate results
- Outputs player season summaries


Part D: NumPy and Performance ⭐⭐-⭐⭐⭐

D.1. Vectorization Exercise ⭐⭐

Rewrite this loop-based function using NumPy vectorization:

def calculate_garbage_time_slow(pbp):
    results = []
    for idx, row in pbp.iterrows():
        if (row['game_seconds_remaining'] < 300 and
            abs(row['posteam_score'] - row['defteam_score']) > 17):
            results.append(1)
        else:
            results.append(0)
    return results

# Write a vectorized version that's 100x+ faster
def calculate_garbage_time_fast(pbp):
    pass
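A possible vectorized counterpart to the D.1 loop, assuming the three columns are numeric (float with NaN for missing values, as the loop implicitly assumes). It applies the same rule to whole columns at once and returns a NumPy array rather than a list.

```python
import numpy as np
import pandas as pd


def calculate_garbage_time_fast(pbp: pd.DataFrame) -> np.ndarray:
    # Same rule as the loop: under 5 minutes left and a margin above 17 points
    late = pbp["game_seconds_remaining"].to_numpy() < 300
    margin = np.abs(pbp["posteam_score"].to_numpy() - pbp["defteam_score"].to_numpy())
    # Comparisons involving NaN evaluate to False, matching the loop's else-branch
    return np.where(late & (margin > 17), 1, 0)


# Usage on invented rows
sample = pd.DataFrame({
    "game_seconds_remaining": [1200.0, 250.0, 100.0, np.nan],
    "posteam_score": [7.0, 35.0, 14.0, 40.0],
    "defteam_score": [0.0, 10.0, 10.0, 0.0],
})
print(calculate_garbage_time_fast(sample).tolist())  # [0, 1, 0, 0]
```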

D.2. Memory Optimization ⭐⭐

Load play-by-play data and:
- Check initial memory usage
- Identify columns that could use smaller data types
- Apply optimizations
- Report memory savings
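The D.2 workflow can be rehearsed on a synthetic frame before touching real data. The sketch below uses random stand-in columns (not real play-by-play values) and three common optimizations: categorical strings, integer downcasting, and float32.

```python
import numpy as np
import pandas as pd

# Synthetic frame standing in for play-by-play data (random values)
n = 10_000
pbp = pd.DataFrame({
    "posteam": np.random.default_rng(0).choice(["KC", "DET", "BUF", "PHI"], size=n),
    "down": np.random.default_rng(1).integers(1, 5, size=n).astype("int64"),
    "epa": np.random.default_rng(2).normal(size=n),
})

# deep=True counts the actual string storage, not just the object pointers
before = pbp.memory_usage(deep=True).sum()

optimized = pbp.assign(
    posteam=pbp["posteam"].astype("category"),           # few distinct strings
    down=pd.to_numeric(pbp["down"], downcast="integer"),  # values 1-4 fit in int8
    epa=pbp["epa"].astype("float32"),                     # half the bytes, less precision
)

after = optimized.memory_usage(deep=True).sum()
print(f"{before / 1e6:.2f} MB -> {after / 1e6:.2f} MB ({1 - after / before:.0%} smaller)")
```

Whether float32 precision is acceptable for a metric like EPA is a judgment call; categorical and integer downcasting are usually the safer wins.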

D.3. Performance Comparison ⭐⭐⭐

For each of these tasks, write two versions (slow and fast) and time them:
1. Calculate the mean EPA for each team
2. Add a column indicating if EPA is above team average
3. Calculate week-over-week change in EPA for each player


Part E: Project Integration ⭐⭐⭐⭐

E.1. Build a Reusable Analysis Module

Create a Python module (football_utils.py) containing:
- Data loading functions with caching
- Common filters (passes, rushes, garbage time, etc.)
- Standard aggregation functions
- Utility functions for common calculations

Include comprehensive docstrings and type hints.

E.2. Create a Player Analysis Report Generator

Build a function that takes a player ID and season, then generates a complete statistical report including:
- Season summary statistics
- Weekly performance trend
- Comparison to league average
- Situational splits (down, field position, etc.)

Output as a formatted dictionary or markdown string.

E.3. Implement a Testing Suite

Write pytest tests for at least 5 functions from this chapter:
- Test normal cases
- Test edge cases (empty data, missing values)
- Test error handling


Solutions

Selected solutions are available in code/exercise-solutions.py.