Exercises: Advanced Passing Metrics
Level 1: Conceptual Understanding
Exercise 1.1: EPA Basics
Explain in your own words why a 10-yard completion on 3rd and 5 has a different EPA than a 10-yard completion on 1st and 10.
Exercise 1.2: CPOE Interpretation
A quarterback has a 68% completion percentage and a CPOE of -3.2 percentage points. What does this tell you about:
- The difficulty of his throws?
- His actual accuracy?
- How he compares to an average QB throwing the same passes?
Exercise 1.3: Air Yards Concepts
Two quarterbacks have the same completion percentage (65%):
- QB A has an aDOT of 6.5 yards
- QB B has an aDOT of 10.2 yards

Which quarterback would you expect to have the higher CPOE? Explain your reasoning.
Exercise 1.4: Pressure Impact
Why is it important to analyze quarterback performance separately for clean-pocket and under-pressure situations?
Exercise 1.5: YAC vs Air Yards
A receiver has 800 receiving yards with an average of 10 yards per reception:
- 350 yards came from air yards
- 450 yards came from YAC
What does this distribution suggest about the receiver's role in the offense?
Level 2: Basic Calculations
Exercise 2.1: Calculate EPA
Given the following expected points values, calculate the EPA for each play:
| Play | EP Before | EP After | Result |
|---|---|---|---|
| A | 2.5 | 3.8 | Completion |
| B | 1.2 | -0.5 | Interception |
| C | 0.8 | 7.0 | Touchdown |
| D | 3.2 | 2.1 | Sack |
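EPA is EP After minus EP Before. The minimal Python sketch below lets you check your answers. It assumes, as the table appears to, that every EP After value is already stated from the offense's point of view, so the interception's -0.5 already reflects the change of possession.

```python
def play_epa(ep_before: float, ep_after: float) -> float:
    """Expected points added: EP after the play minus EP before it."""
    return ep_after - ep_before


# (EP before, EP after) for each play in the Exercise 2.1 table
plays = {
    "A": (2.5, 3.8),   # completion
    "B": (1.2, -0.5),  # interception
    "C": (0.8, 7.0),   # touchdown
    "D": (3.2, 2.1),   # sack
}

for name, (before, after) in plays.items():
    print(f"Play {name}: EPA = {play_epa(before, after):+.1f}")
```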
Exercise 2.2: Expected Completions
Using a simplified completion probability model:

Completion Probability = 0.85 - (0.015 × Air Yards)

calculate the expected number of completions for these 5 passes:

| Pass | Air Yards |
|---|---|
| 1 | 5 |
| 2 | 12 |
| 3 | 3 |
| 4 | 20 |
| 5 | 8 |
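You can check the arithmetic with a few lines of Python. This sketch applies the linear formula to each pass and sums the probabilities. The function name is ours and is not part of any library.

```python
def completion_probability(air_yards: float) -> float:
    """Simplified linear model from Exercise 2.2."""
    return 0.85 - 0.015 * air_yards


air_yards = [5, 12, 3, 20, 8]  # passes 1-5 from the table
probs = [completion_probability(ay) for ay in air_yards]
expected_completions = sum(probs)  # expected completions = sum of probabilities
print([round(p, 3) for p in probs], round(expected_completions, 2))
```

A linear model like this is only a teaching device. It falls below zero for throws longer than about 56.7 air yards (0.85 / 0.015), which is one reason Exercise 4.1 moves to logistic regression, whose output always stays between 0 and 1.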
Exercise 2.3: CPOE Calculation
A quarterback throws 40 passes with the following results:
- 28 completions (actual)
- Sum of completion probabilities: 26.4 (expected completions)

Calculate:
- Actual completion percentage
- Expected completion percentage
- CPOE
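Here is the same calculation in Python. CPOE is reported in percentage points, computed as actual minus expected completion percentage.

```python
def cpoe(completions: int, attempts: int, expected_completions: float) -> dict:
    """Actual %, expected %, and their difference (CPOE, in percentage points)."""
    actual_pct = 100 * completions / attempts
    expected_pct = 100 * expected_completions / attempts
    return {
        "actual_comp_pct": actual_pct,
        "expected_comp_pct": expected_pct,
        "cpoe": actual_pct - expected_pct,
    }


result = cpoe(completions=28, attempts=40, expected_completions=26.4)
print({key: round(value, 1) for key, value in result.items()})
```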
Exercise 2.4: Air Yards Share
A quarterback's completions result in:
- Total receiving yards: 3,200
- Completed air yards: 2,100
Calculate the air yards share and interpret what it means.
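This is a short check that assumes air yards share here means completed air yards divided by total receiving yards. The remainder is yards after catch. In receiver analysis, the same phrase often refers to a player's share of his team's air yards, which is a different quantity.

```python
def air_yards_share(completed_air_yards: float, total_receiving_yards: float) -> float:
    """Fraction of receiving yards gained through the air; the rest is YAC."""
    return completed_air_yards / total_receiving_yards


share = air_yards_share(2100, 3200)
yac_share = 1 - share
print(f"Air: {share:.1%}, YAC: {yac_share:.1%}")
```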
Exercise 2.5: Pressure-Adjusted Stats
A quarterback has these splits:
| Situation | Attempts | Completions | Yards |
|---|---|---|---|
| Clean Pocket | 280 | 196 | 2,520 |
| Under Pressure | 95 | 52 | 580 |
Calculate:
- Completion percentage for each situation
- YPA for each situation
- The completion percentage drop under pressure
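This sketch checks the Exercise 2.5 arithmetic. The dictionary keys mirror the table columns.

```python
from typing import Dict

# Exercise 2.5 splits
splits: Dict[str, Dict[str, int]] = {
    "Clean Pocket": {"attempts": 280, "completions": 196, "yards": 2520},
    "Under Pressure": {"attempts": 95, "completions": 52, "yards": 580},
}


def comp_pct(s: Dict[str, int]) -> float:
    """Completion percentage."""
    return 100 * s["completions"] / s["attempts"]


def yards_per_attempt(s: Dict[str, int]) -> float:
    """Yards per attempt (YPA)."""
    return s["yards"] / s["attempts"]


for situation, s in splits.items():
    print(f"{situation}: {comp_pct(s):.1f}% complete, {yards_per_attempt(s):.2f} YPA")

# A difference of two percentages is measured in percentage points
drop = comp_pct(splits["Clean Pocket"]) - comp_pct(splits["Under Pressure"])
print(f"Completion % drop under pressure: {drop:.1f} points")
```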
Level 3: Implementation Exercises
Exercise 3.1: EPA Calculator Class
Implement a Python class that calculates EPA for passing plays:
```python
from typing import Dict, List


class PassingEPACalculator:
    """Calculate EPA for passing plays."""

    def __init__(self):
        # Initialize EP lookup table
        pass

    def get_expected_points(self, yard_line: int, down: int, distance: int) -> float:
        """Return expected points for situation."""
        pass

    def calculate_play_epa(self, play: Dict) -> float:
        """Calculate EPA for a single play."""
        pass

    def aggregate_epa(self, plays: List[Dict]) -> Dict:
        """Calculate aggregate EPA metrics."""
        pass
```
Requirements:
- Handle completions, incompletions, interceptions, and touchdowns
- Return both per-play and aggregate metrics
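Most of Exercise 3.1 is bookkeeping, but turnovers and touchdowns need special handling. The sketch below is one assumed approach, not a reference solution. It makes three assumptions:
- The play keys (`result`, `ep_before`, `ep_after`) are hypothetical.
- `ep_after` is taken from the perspective of whichever team has the ball after the play.
- A touchdown is valued at a flat 7 points, as in the Exercise 2.1 table.

```python
from typing import Dict

# Flat touchdown value, matching the Exercise 2.1 table (ignores PAT / 2-point uncertainty)
TOUCHDOWN_POINTS = 7.0


def epa_for_play(play: Dict) -> float:
    """EPA from the offense's perspective for one passing play.

    Hypothetical keys:
      result    - 'complete', 'incomplete', 'interception', or 'touchdown'
      ep_before - offense's expected points before the snap
      ep_after  - expected points of the team in possession after the play
    """
    result = play["result"]
    if result == "touchdown":
        ep_after = TOUCHDOWN_POINTS
    elif result == "interception":
        # Possession flips: the defense's expected points count against the offense
        ep_after = -play["ep_after"]
    else:
        ep_after = play["ep_after"]
    return ep_after - play["ep_before"]


# An opponent EP of 0.5 after the pick becomes -0.5 for the offense (play B in Exercise 2.1)
print(epa_for_play({"result": "interception", "ep_before": 1.2, "ep_after": 0.5}))
```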
Exercise 3.2: CPOE Analysis
Create a function that analyzes CPOE by depth zone:
```python
from typing import Dict, List

import pandas as pd


def analyze_cpoe_by_depth(passes: List[Dict]) -> pd.DataFrame:
    """
    Analyze CPOE broken down by pass depth.

    Parameters:
    -----------
    passes : list
        List of passes with air_yards, completed, exp_completion_prob

    Returns:
    --------
    pd.DataFrame with columns:
    - depth_zone
    - attempts
    - actual_comp_pct
    - expected_comp_pct
    - cpoe
    """
    pass
```
Depth zones, by air yards, should be: Behind LOS (negative air yards), Short (0-9), Medium (10-19), and Deep (20+).
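Here is one way to handle the binning step. It assumes that air yards of exactly 0 count as Short and that fractional values are cut at 10 and 20. It returns a plain dict rather than the DataFrame the exercise asks for.

```python
from collections import defaultdict
from typing import Dict, List


def depth_zone(air_yards: float) -> str:
    """Assign a pass to one of the four Exercise 3.2 depth zones."""
    if air_yards < 0:
        return "Behind LOS"
    if air_yards < 10:
        return "Short (0-9)"
    if air_yards < 20:
        return "Medium (10-19)"
    return "Deep (20+)"


def cpoe_by_zone(passes: List[Dict]) -> Dict[str, float]:
    """CPOE (percentage points) per depth zone."""
    # zone -> [attempts, actual completions, expected completions]
    totals = defaultdict(lambda: [0, 0, 0.0])
    for p in passes:
        t = totals[depth_zone(p["air_yards"])]
        t[0] += 1
        t[1] += int(p["completed"])
        t[2] += p["exp_completion_prob"]
    return {zone: 100 * (comp - exp) / att for zone, (att, comp, exp) in totals.items()}
```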
Exercise 3.3: Air Yards Metrics
Implement a complete air yards analysis function:
```python
from typing import Dict, List


def calculate_air_yards_metrics(passes: List[Dict]) -> Dict:
    """
    Calculate comprehensive air yards metrics.

    Returns dict with:
    - intended_air_yards (total)
    - completed_air_yards
    - iay_per_attempt
    - cay_per_attempt
    - adot
    - yac_total
    - yac_per_completion
    - air_yards_share
    - depth_distribution (dict with short, medium, deep percentages)
    """
    pass
```
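This partial sketch covers the totals and rate stats; `adot` and `depth_distribution` are left to you. It makes four assumptions:
- Each pass dict carries hypothetical `air_yards`, `completed`, and `yac` keys.
- Receiving yards on a completion equal air yards plus YAC.
- The list is non-empty.
- `air_yards_share` follows the Exercise 2.4 reading: completed air yards over receiving yards.

```python
from typing import Dict, List


def core_air_yards_metrics(passes: List[Dict]) -> Dict[str, float]:
    """Totals and rates for a non-empty list of pass attempts."""
    attempts = len(passes)
    completions = [p for p in passes if p["completed"]]
    iay = sum(p["air_yards"] for p in passes)      # intended air yards, all attempts
    cay = sum(p["air_yards"] for p in completions)  # completed air yards
    yac = sum(p["yac"] for p in completions)        # yards after catch
    receiving_yards = cay + yac                     # assumes yards = air yards + YAC
    return {
        "intended_air_yards": iay,
        "completed_air_yards": cay,
        "iay_per_attempt": iay / attempts,
        "cay_per_attempt": cay / attempts,
        "yac_total": yac,
        "yac_per_completion": yac / len(completions) if completions else 0.0,
        "air_yards_share": cay / receiving_yards if receiving_yards else 0.0,
    }
```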
Exercise 3.4: Pressure Analysis
Build a pressure analysis tool:
```python
from typing import Dict, List


def create_pressure_splits(passes: List[Dict]) -> Dict:
    """
    Create comprehensive pressure analysis.

    Returns:
    - clean_pocket_stats (comp%, ypa, td%, int%)
    - pressure_stats
    - pressure_rate
    - pressure_to_sack_rate
    - adjusted_comp_pct (normalized to the league-average pressure rate)
    """
    pass
```
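The trickiest field is `adjusted_comp_pct`, which the exercise does not define. One plausible reading, offered as an assumption, keeps the QB's own clean-pocket and under-pressure completion rates but weights them by the league-average pressure rate instead of his own. That way a QB behind a poor line isn't penalized simply for facing more pressure. The 30% league rate used below is a hypothetical placeholder, not a real league figure.

```python
def pressure_rate(pressured_dropbacks: int, total_dropbacks: int) -> float:
    """Share of dropbacks on which the QB was pressured."""
    return pressured_dropbacks / total_dropbacks


def pressure_to_sack_rate(sacks_under_pressure: int, pressured_dropbacks: int) -> float:
    """Share of pressured dropbacks that ended in a sack."""
    return sacks_under_pressure / pressured_dropbacks


def pressure_adjusted_comp_pct(clean_pct: float, pressure_pct: float,
                               league_pressure_rate: float) -> float:
    """Re-weight the QB's own split completion rates by the league-average pressure rate."""
    return (1 - league_pressure_rate) * clean_pct + league_pressure_rate * pressure_pct


# Exercise 2.5 splits with a hypothetical 30% league pressure rate
clean = 100 * 196 / 280
pressured = 100 * 52 / 95
print(round(pressure_adjusted_comp_pct(clean, pressured, 0.30), 1))
```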
Exercise 3.5: QB Comparison Tool
Create a function that generates side-by-side QB comparisons:
```python
from typing import Dict, List


def compare_quarterbacks(qb1_passes: List[Dict], qb1_name: str,
                         qb2_passes: List[Dict], qb2_name: str) -> str:
    """
    Generate formatted comparison of two quarterbacks.

    Include:
    - Traditional stats
    - CPOE
    - Air yards metrics
    - Pressure performance
    - Advantages for each QB
    """
    pass
```
Level 4: Advanced Analysis
Exercise 4.1: Build a Completion Probability Model
Using the provided sample data, build a logistic regression model to predict completion probability:
```python
from typing import Dict

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Sample features to include:
# - air_yards
# - under_pressure (0/1)
# - third_down (0/1)
# - seconds_remaining (game time pressure)
# - score_differential


def train_completion_probability_model(training_data: pd.DataFrame) -> LogisticRegression:
    """
    Train a completion probability model.

    Parameters:
    -----------
    training_data : pd.DataFrame
        DataFrame with features and 'completed' target

    Returns:
    --------
    LogisticRegression : Trained model
    """
    pass


def evaluate_model(model, test_data: pd.DataFrame) -> Dict:
    """
    Evaluate model accuracy with:
    - Log loss
    - Calibration (predicted vs actual completion rates by decile)
    - AUC-ROC
    """
    pass
```
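scikit-learn already provides `log_loss` and `roc_auc_score` in `sklearn.metrics`, and `sklearn.calibration.calibration_curve(..., strategy="quantile")` bins predictions into equal-count groups. To show what the first two evaluation items compute, here is a dependency-free sketch. It takes plain lists of outcomes (1 = completion) and predicted probabilities rather than a fitted model.

```python
import math
from typing import List, Tuple


def log_loss(y_true: List[int], y_prob: List[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary outcomes (1 = completion)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def calibration_by_decile(y_true: List[int], y_prob: List[float]) -> List[Tuple[float, float]]:
    """Sort passes by predicted probability, cut into 10 equal-count bins,
    and return (mean predicted, actual completion rate) for each bin."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    bins = []
    for i in range(10):
        chunk = pairs[i * n // 10:(i + 1) * n // 10]
        if chunk:  # skip empty bins when there are fewer than 10 passes
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            actual = sum(y for _, y in chunk) / len(chunk)
            bins.append((mean_pred, actual))
    return bins
```

A well-calibrated model has each bin's mean prediction close to its actual completion rate.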
Exercise 4.2: Situational EPA Analysis
Create an analysis that breaks down EPA by game situations:
```python
from typing import Dict, List

import pandas as pd


def analyze_situational_epa(passes: List[Dict]) -> pd.DataFrame:
    """
    Break down EPA by:
    - Down (1st, 2nd, 3rd, 4th)
    - Field position (own territory, midfield, red zone)
    - Score differential (leading, tied, trailing)
    - Quarter (1-4)

    Return DataFrame with EPA per dropback for each situation.
    """
    pass
```
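Most of the judgment in Exercise 4.2 lives in the bucketing. The cut-offs below are assumptions, not definitions from the exercise:
- Red zone means 20 or fewer yards to the goal.
- "Own territory" means more than 50 yards to the goal.
- "Midfield" covers everything between.

The keys `yards_to_goal`, `score_differential`, and `epa` are hypothetical field names, and the sample dropbacks are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def field_zone(yards_to_goal: int) -> str:
    """Assumed cut-offs: red zone <= 20, midfield 21-50, own territory > 50."""
    if yards_to_goal <= 20:
        return "red zone"
    if yards_to_goal <= 50:
        return "midfield"
    return "own territory"


def score_state(score_differential: int) -> str:
    """Offense's score minus opponent's score, as leading / tied / trailing."""
    if score_differential > 0:
        return "leading"
    if score_differential < 0:
        return "trailing"
    return "tied"


def epa_per_dropback_by(passes: List[Dict], key: Callable[[Dict], str]) -> Dict[str, float]:
    """Mean EPA per dropback within each group produced by `key`."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for p in passes:
        group = key(p)
        sums[group] += p["epa"]
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical dropbacks for illustration
dropbacks = [
    {"yards_to_goal": 75, "score_differential": -7, "epa": 0.4},
    {"yards_to_goal": 12, "score_differential": 0, "epa": -0.2},
    {"yards_to_goal": 8, "score_differential": 3, "epa": 1.1},
]
print(epa_per_dropback_by(dropbacks, lambda p: field_zone(p["yards_to_goal"])))
print(epa_per_dropback_by(dropbacks, lambda p: score_state(p["score_differential"])))
```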
Exercise 4.3: Opponent-Adjusted Metrics
Implement opponent-adjusted passing statistics:
```python
from typing import Dict, List


def calculate_opponent_adjusted_stats(
    passes: List[Dict],
    opponent_pass_def_ranks: Dict[str, int],
    league_avg_stats: Dict,
) -> Dict:
    """
    Adjust QB stats for opponent defensive strength.

    Parameters:
    -----------
    passes : list
        Passes with 'opponent' field
    opponent_pass_def_ranks : dict
        Mapping of team to pass defense rank (1-130)
    league_avg_stats : dict
        League average comp%, ypa, etc.

    Returns:
    --------
    dict with:
    - raw_comp_pct
    - opponent_adjusted_comp_pct
    - adjustment_factor
    - performance_by_opponent_tier
    """
    pass
```
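For the `performance_by_opponent_tier` output, one simple option is to split the 130 pass-defense ranks into thirds, assuming rank 1 is the best defense. The tier boundaries and the `completed` key are assumptions, and the adjustment factor itself is left open.

```python
from collections import defaultdict
from typing import Dict, List


def opponent_tier(rank: int, n_teams: int = 130) -> str:
    """Split pass-defense ranks (1 = best, assumed) into thirds."""
    if rank <= n_teams / 3:
        return "top third"
    if rank <= 2 * n_teams / 3:
        return "middle third"
    return "bottom third"


def comp_pct_by_tier(passes: List[Dict], ranks: Dict[str, int]) -> Dict[str, float]:
    """Completion percentage against each opponent tier."""
    attempts: Dict[str, int] = defaultdict(int)
    completions: Dict[str, int] = defaultdict(int)
    for p in passes:
        tier = opponent_tier(ranks[p["opponent"]])
        attempts[tier] += 1
        completions[tier] += int(p["completed"])
    return {t: 100 * completions[t] / attempts[t] for t in attempts}
```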
Exercise 4.4: EPA Stability Analysis
Analyze how stable EPA is from week to week:
```python
from typing import Dict, List, Tuple


def analyze_epa_stability(weekly_epa: List[Tuple[int, float]]) -> Dict:
    """
    Analyze week-to-week EPA stability.

    Parameters:
    -----------
    weekly_epa : list
        List of (week, epa_per_dropback) tuples

    Returns:
    --------
    dict with:
    - mean_epa
    - std_epa
    - coefficient_of_variation
    - trend (improving/declining/stable)
    - consistency_score
    """
    pass
```
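This sketch covers the descriptive part of Exercise 4.4:
- It uses the sample standard deviation.
- It fits an ordinary least-squares slope of EPA on week number for the trend.
- It treats anything within a hypothetical ±0.01 EPA per week as stable.

`consistency_score` is left to your own design.

```python
import statistics
from typing import Dict, List, Tuple


def epa_stability(weekly_epa: List[Tuple[int, float]],
                  trend_threshold: float = 0.01) -> Dict:
    """Mean, spread, and linear trend of weekly EPA per dropback (needs >= 2 distinct weeks)."""
    weeks = [w for w, _ in weekly_epa]
    epas = [e for _, e in weekly_epa]
    mean_epa = statistics.mean(epas)
    std_epa = statistics.stdev(epas)  # sample standard deviation

    # Ordinary least-squares slope of EPA on week number
    mean_week = statistics.mean(weeks)
    slope = (sum((w - mean_week) * (e - mean_epa) for w, e in weekly_epa)
             / sum((w - mean_week) ** 2 for w in weeks))
    if slope > trend_threshold:
        trend = "improving"
    elif slope < -trend_threshold:
        trend = "declining"
    else:
        trend = "stable"

    # Divide by |mean| because mean EPA can be zero or negative
    cv = std_epa / abs(mean_epa) if mean_epa != 0 else float("inf")
    return {"mean_epa": mean_epa, "std_epa": std_epa,
            "coefficient_of_variation": cv, "trend": trend, "weekly_slope": slope}
```

The absolute value in the coefficient of variation is our own guard. The raw ratio becomes meaningless when mean EPA sits near zero or below it, which is common for below-average quarterbacks.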
Exercise 4.5: Composite QB Score
Design and implement a composite QB scoring system:
```python
from typing import Dict, Optional


class CompositeQBScore:
    """
    Multi-factor QB scoring system.

    Factors (with suggested weights):
    - CPOE (25%)
    - EPA per dropback (25%)
    - Pressure performance (15%)
    - Big play rate (10%)
    - Turnover avoidance (15%)
    - Situational performance (10%)
    """

    def __init__(self, weights: Optional[Dict[str, float]] = None):
        pass

    def calculate_score(self, qb_stats: Dict) -> float:
        """Calculate composite score 0-100."""
        pass

    def get_grade(self, score: float) -> str:
        """Convert score to letter grade."""
        pass

    def explain_score(self, qb_stats: Dict) -> str:
        """Return breakdown of score components."""
        pass
```
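The suggested weights sum to 100%, so the composite can be a plain weighted average once every factor sits on a common 0-100 scale. The key names and the 10-point grade cut-offs below are hypothetical. The real design decision the exercise leaves to you is how to map raw stats (CPOE, EPA, and so on) onto 0-100, for example by percentile rank among qualifying QBs.

```python
from typing import Dict

# Suggested weights from the CompositeQBScore docstring (key names are hypothetical)
DEFAULT_WEIGHTS: Dict[str, float] = {
    "cpoe": 0.25,
    "epa_per_dropback": 0.25,
    "pressure_performance": 0.15,
    "big_play_rate": 0.10,
    "turnover_avoidance": 0.15,
    "situational_performance": 0.10,
}


def weighted_score(component_scores: Dict[str, float],
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of component scores that are each already on a 0-100 scale."""
    total_weight = sum(weights.values())
    return sum(w * component_scores[name] for name, w in weights.items()) / total_weight


def letter_grade(score: float) -> str:
    """Hypothetical 10-point grading scale."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"
```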
Level 5: Research Projects
Exercise 5.1: CPOE Predictive Value
Research question: How well does CPOE predict future quarterback performance?
Tasks:
1. Calculate season-by-season CPOE for at least 20 quarterbacks
2. Analyze year-to-year correlation (is CPOE stable?)
3. Compare CPOE's predictive value to that of traditional completion percentage
4. Write a report with visualizations
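Step 2 comes down to pairing each quarterback's value in one season with his value in the next, then correlating the two lists. This minimal sketch assumes the data arrive as a dict keyed by hypothetical `(qb_name, season)` tuples. Running the same functions on completion percentage gives the comparison for step 3. Python 3.10+ also ships `statistics.correlation`.

```python
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def consecutive_season_pairs(
    metric: Dict[Tuple[str, int], float]
) -> Tuple[List[float], List[float]]:
    """Pair each QB's value in season N with his value in season N + 1."""
    this_year, next_year = [], []
    for (qb, season), value in metric.items():
        following = metric.get((qb, season + 1))
        if following is not None:
            this_year.append(value)
            next_year.append(following)
    return this_year, next_year
```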
Exercise 5.2: Air Yards and Offensive Scheme
Analyze how air yards metrics vary by offensive scheme:
Tasks:
1. Classify teams into offensive scheme categories (air raid, west coast, pro-style, spread, etc.)
2. Calculate average aDOT, YAC, and air yards share by scheme
3. Analyze whether CPOE means the same thing across different schemes
4. Create visualizations showing scheme differences
Exercise 5.3: Pressure Impact Study
Comprehensive study of pressure's impact on passing:
Tasks:
1. Calculate the league-wide completion percentage drop under pressure
2. Identify QBs who are most/least affected by pressure
3. Analyze whether pressure resilience is stable year-to-year
4. Study the relationship between offensive line quality and QB metrics
5. Present findings with statistical significance tests
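For step 5, a two-proportion z-test is one standard way to ask whether a completion-rate drop is larger than chance would explain. The sketch treats every attempt as an independent trial, which is a simplification. It uses the Exercise 2.5 splits as sample input.

```python
import math
from typing import Tuple


def two_proportion_z_test(comp_a: int, att_a: int,
                          comp_b: int, att_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two completion rates."""
    p_a, p_b = comp_a / att_a, comp_b / att_b
    pooled = (comp_a + comp_b) / (att_a + att_b)  # rate under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / att_a + 1 / att_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Clean-pocket vs under-pressure splits from Exercise 2.5
z, p = two_proportion_z_test(196, 280, 52, 95)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```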
Exercise 5.4: Building a Better QB Metric
Design, implement, and validate a new composite QB metric:
Tasks:
1. Identify limitations of existing metrics
2. Propose a new composite metric with theoretical justification
3. Implement the calculation in Python
4. Validate it against future performance
5. Compare it to existing metrics (passer rating, QBR, EPA)
6. Write a paper describing the methodology and results
Exercise 5.5: Game Script and EPA
Analyze how game script affects EPA metrics:
Tasks:
1. Define game script categories (blowout win/loss, close game, comeback, etc.)
2. Calculate EPA per dropback by game script
3. Identify which QBs show the most variance by game script
4. Discuss implications for QB evaluation
5. Propose adjustments for "garbage time" and game context
Bonus Challenges
Challenge A: Real-Time EPA Calculator
Build an interactive tool that calculates EPA in real time:
- User inputs: yard line, down, distance, play result
- Tool outputs: EP before, EP after, EPA
- Visualize the EP curve
Challenge B: Historical CPOE Analysis
If you can access historical play-by-play data:
- Calculate CPOE for top QB seasons historically
- Identify the most accurate seasons ever by CPOE
- Compare to passer rating rankings
Challenge C: Machine Learning CPOE
Build a more sophisticated completion probability model:
- Use gradient boosting or a neural network
- Include more features (weather, stadium, time of day)
- Compare to the simple logistic regression model
- Analyze feature importance
Solutions
Solutions are available in code/exercise-solutions.py