Exercises: Introduction to Football Analytics

These exercises progress from foundational concept checks to challenging applications. Estimated completion time: 3-4 hours.

Scoring Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)

Part A: Conceptual Understanding ⭐

Test your understanding of core concepts. No calculations required.

A.1. In your own words, explain the difference between a statistic and an analytic. Use a specific football example to illustrate.

A.2. A sports radio host claims: "Analytics is ruining football by taking the human element out of the game." Write a two-paragraph response explaining why this characterization misunderstands the role of analytics.

A.3. Classify each of the following questions as descriptive, predictive, or prescriptive analytics:

a) How many touchdowns did the Ravens score last season?
b) What is the probability the Chiefs make the playoffs next year?
c) Should the Cowboys draft a wide receiver or a defensive back with their first pick?
d) Which quarterback had the highest passer rating in Week 10?
e) Will signing Player X improve our win total?
f) What is the optimal play call on third-and-five from the opponent's 40-yard line?

A.4. Explain the "signal and noise" problem in football analytics. Why does this problem make football analysis more difficult than, say, analyzing data from a manufacturing process?

A.5. A colleague argues that "analytics can never evaluate leadership and intangibles, so it's fundamentally incomplete." Do you agree or disagree? What is the appropriate response to this critique?

A.6. Describe three ways that traditional scouting and analytics can complement each other in player evaluation. For each, explain what each approach contributes.

A.7. Why is the question "Is our quarterback good?" poorly defined for analytical purposes? Rewrite it as a well-defined analytical question.

A.8. List the six stages of the football analytics workflow. For each stage, describe one common mistake that analysts make.

Part B: Calculations and Reasoning ⭐⭐

Apply concepts to solve problems. Show your reasoning.

B.1. A quarterback has a 65% completion rate over 400 attempts. Another quarterback has a 68% completion rate over 150 attempts.

a) Which quarterback has the higher observed completion rate?
b) Which quarterback's rate is more reliable as an estimate of their true skill? Why?
c) If we were projecting next year's completion rate, how should we account for the different sample sizes?
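
As a hint for thinking about reliability, the sketch below computes the binomial standard error of an observed rate. It assumes every attempt is an independent trial with the same success probability, which real pass attempts are not. The function name `proportion_standard_error` and the example numbers are illustrative and deliberately differ from those in the exercise.

```python
import math


def proportion_standard_error(rate: float, attempts: int) -> float:
    """Binomial standard error of an observed success rate.

    Assumes independent attempts with a constant success probability,
    which is a simplification for football data.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return math.sqrt(rate * (1.0 - rate) / attempts)


# Hypothetical numbers (not the ones in B.1): same rate, different samples.
for attempts in (100, 400):
    se = proportion_standard_error(0.50, attempts)
    print(f"n={attempts}: rate 0.50 +/- {se:.3f}")
```

Quadrupling the number of attempts halves the standard error, which is one way to quantify "more reliable" in part (b).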

B.2. Consider the following fourth-down scenario:

Fourth-and-2 from the opponent's 35-yard line
If you convert, your expected points are approximately +2.5
If you fail, the opponent's expected points from their 35-yard line are approximately +1.8 (that is, -1.8 from your perspective)
Your probability of converting fourth-and-2 is 60%
If you punt, the opponent takes over at their own 10-yard line and your expected points are approximately -0.5

Calculate the expected points, from your team's perspective, for:
a) Going for it
b) Punting

c) Based on expected value, what should the team do?
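
As a reminder of the general form, the expected value of going for it weights the two possible outcomes by their probabilities. In the expression below, $p$ is the conversion probability and both expected-points terms are taken from your team's perspective, so an opponent's expected points of $+x$ enters as $-x$. The symbol names are illustrative rather than fixed notation.

```latex
\mathrm{EV}_{\text{go}} = p \cdot \mathrm{EP}_{\text{convert}} + (1 - p) \cdot \mathrm{EP}_{\text{fail}},
\qquad
\mathrm{EV}_{\text{punt}} = \mathrm{EP}_{\text{punt}}
```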

B.3. Two metrics for evaluating running backs have the following year-to-year correlations:

Yards per carry: r = 0.35
Success rate: r = 0.52

a) Which metric is more stable from year to year?
b) If a running back had a yards-per-carry of 5.2 (league average 4.3) last year, would you expect his YPC next year to be above or below 5.2? Explain using regression to the mean.
c) How does metric stability affect how we should use these metrics for projection?

B.4. An analyst claims: "Team A allowed only 18 points per game, while Team B allowed 24 points per game. Therefore, Team A's defense is clearly better."

List three reasons why this conclusion might be wrong. For each, explain what additional information you would need to make a valid comparison.

B.5. You're building a model to predict whether a pass will be completed. You have two candidate features:

Air yards (how far downfield the pass travels)
Number of defenders within 5 yards of the receiver at the catch point

For each feature:
a) Explain the intuition for why it might be predictive
b) Describe a limitation or potential issue with using it
c) What additional features might interact with it?

B.6. A team's analytics department claims their fourth-down model improved decisions by 1.5 expected points over the season. The team played 17 games.

a) On average, how much expected value did the model add per game?
b) If typical game margins are around 7 points with a standard deviation of 14 points, how would you characterize the impact of this improvement?
c) What factors might cause the actual impact to differ from the expected impact?

Part C: Programming Challenges ⭐⭐-⭐⭐⭐

Implement solutions in Python. All code should be well-documented and tested.

C.1. Basic Classification ⭐⭐

Write a function classify_analytics_question that takes a question string and returns whether it's likely "descriptive", "predictive", or "prescriptive" based on keyword analysis.

def classify_analytics_question(question: str) -> str:
    """
    Classify a football analytics question by type.

    Parameters
    ----------
    question : str
        A football analytics question in plain text

    Returns
    -------
    str
        One of: "descriptive", "predictive", "prescriptive"

    Examples
    --------
    >>> classify_analytics_question("How many touchdowns did we score?")
    'descriptive'
    >>> classify_analytics_question("Will we make the playoffs?")
    'predictive'
    >>> classify_analytics_question("Should we go for it on fourth down?")
    'prescriptive'
    """
    # Your code here
    pass


# Test cases - your function should pass these
assert classify_analytics_question("What was our third-down conversion rate?") == "descriptive"
assert classify_analytics_question("What will be our record next season?") == "predictive"
assert classify_analytics_question("Should we sign this free agent?") == "prescriptive"

C.2. Workflow Tracker ⭐⭐

Create a class AnalyticsProject that tracks the workflow stages of an analytics project.

class AnalyticsProject:
    """
    Tracks the workflow stages of a football analytics project.

    Attributes
    ----------
    name : str
        Name of the project
    question : str
        The defined question being answered
    stages : dict
        Completion status of each workflow stage

    Methods
    -------
    define_question(question: str)
        Set the project question (completes stage 1)
    complete_stage(stage: str)
        Mark a stage as complete
    get_progress() -> dict
        Return current progress summary
    is_ready_for_next() -> tuple
        Check if ready for next stage
    """

    STAGES = [
        'question_defined',
        'data_gathered',
        'data_cleaned',
        'analysis_complete',
        'results_interpreted',
        'results_communicated'
    ]

    def __init__(self, name: str):
        # Your code here
        pass

    def define_question(self, question: str) -> None:
        # Your code here
        pass

    def complete_stage(self, stage: str) -> None:
        # Your code here
        pass

    def get_progress(self) -> dict:
        # Your code here
        pass

    def is_ready_for_next(self) -> tuple:
        # Your code here
        pass


# Test your implementation
project = AnalyticsProject("QB Evaluation")
project.define_question("Which QB should we target in free agency?")
project.complete_stage("data_gathered")
progress = project.get_progress()
assert progress['question_defined'] is True
assert progress['data_gathered'] is True
assert progress['data_cleaned'] is False

C.3. Expected Value Calculator ⭐⭐⭐

Implement a fourth-down decision calculator using expected value analysis.

from dataclasses import dataclass
from typing import Optional

@dataclass
class FourthDownScenario:
    """Represents a fourth-down situation.

    All expected-points (EP) values are from the offense's perspective.
    """
    yards_to_go: int
    field_position: int  # Yards from own goal line (e.g., 65 = opponent's 35)
    conversion_probability: float
    ep_if_convert: float
    ep_if_fail: float
    ep_if_punt: float
    ep_if_field_goal: Optional[float] = None  # None if out of FG range
    fg_probability: Optional[float] = None  # None if out of FG range


def calculate_fourth_down_ev(scenario: FourthDownScenario) -> dict:
    """
    Calculate expected value for each fourth-down option.

    Parameters
    ----------
    scenario : FourthDownScenario
        The fourth-down situation with all relevant parameters

    Returns
    -------
    dict
        Expected values and recommendation:
        {
            'ev_go': float,
            'ev_punt': float,
            'ev_fg': float or None,
            'recommendation': str,
            'ev_advantage': float
        }

    Examples
    --------
    >>> scenario = FourthDownScenario(
    ...     yards_to_go=2,
    ...     field_position=65,
    ...     conversion_probability=0.60,
    ...     ep_if_convert=2.5,
    ...     ep_if_fail=-1.8,
    ...     ep_if_punt=-0.5
    ... )
    >>> result = calculate_fourth_down_ev(scenario)
    >>> result['recommendation']
    'go'
    """
    # Your code here
    pass


# Test with multiple scenarios
scenarios = [
    # Obvious go-for-it
    FourthDownScenario(1, 75, 0.75, 4.0, -2.0, -0.5),
    # Obvious punt
    FourthDownScenario(10, 25, 0.20, 1.0, -3.0, 0.2),
    # Field goal situation
    FourthDownScenario(3, 70, 0.50, 4.0, -2.0, -0.5, 2.3, 0.85),
]

for i, s in enumerate(scenarios):
    result = calculate_fourth_down_ev(s)
    print(f"Scenario {i+1}: Recommend {result['recommendation']} "
          f"(EV advantage: {result['ev_advantage']:.2f})")
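
The dataclass does not say how a missed field goal should be valued, so any implementation has to pick a convention. The sketch below shows one possible reading, stated as an assumption: `ep_if_field_goal` is the expected points after a made kick, and a miss is valued with a separately supplied `ep_if_miss`. The helper name `field_goal_ev` and its parameters are hypothetical and not part of the exercise's interface.

```python
from typing import Optional


def field_goal_ev(
    fg_probability: Optional[float],
    ep_if_make: Optional[float],
    ep_if_miss: float,
) -> Optional[float]:
    """Expected value of a field-goal attempt, or None if not available.

    Assumption (not stated in the exercise): ep_if_make is the expected
    points after a made kick and ep_if_miss is the expected points after
    a miss, both from the kicking team's perspective.
    """
    if fg_probability is None or ep_if_make is None:
        return None  # Out of field-goal range
    return fg_probability * ep_if_make + (1.0 - fg_probability) * ep_if_miss


# Values from the third test scenario, reusing ep_if_fail (-2.0) for a miss.
print(field_goal_ev(0.85, 2.3, -2.0))
print(field_goal_ev(None, None, -2.0))
```

Reusing `ep_if_fail` for a miss is itself a simplification, since a missed kick and a failed conversion usually leave the ball at different spots. Whichever convention you choose, document it in your docstring.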

C.4. Metric Stability Simulator ⭐⭐⭐

Write a simulation to demonstrate why metric stability matters for projection.

import numpy as np
from typing import Tuple

def simulate_metric_stability(
    true_skill: float,
    sample_size: int,
    noise_level: float,
    n_simulations: int = 10000
) -> Tuple[float, float, float]:
    """
    Simulate observed performance given true skill and noise.

    Parameters
    ----------
    true_skill : float
        The player's true underlying skill level
    sample_size : int
        Number of observations (e.g., pass attempts)
    noise_level : float
        Standard deviation of random noise
    n_simulations : int
        Number of simulations to run

    Returns
    -------
    tuple
        (mean_observed, std_observed, correlation_with_true)

    Examples
    --------
    >>> np.random.seed(42)
    >>> mean_obs, std_obs, corr = simulate_metric_stability(0.65, 400, 0.05)
    >>> 0.64 < mean_obs < 0.66
    True
    """
    # Your code here
    pass


def demonstrate_regression_to_mean(
    league_avg: float,
    observed: float,
    reliability: float
) -> float:
    """
    Calculate regressed estimate using reliability.

    Parameters
    ----------
    league_avg : float
        League average for the metric
    observed : float
        Observed value for the player
    reliability : float
        Year-to-year correlation (0-1)

    Returns
    -------
    float
        Regressed estimate

    Examples
    --------
    >>> round(demonstrate_regression_to_mean(0.65, 0.72, 0.50), 3)
    0.685
    """
    # Regression formula: regressed = mean + reliability * (observed - mean)
    # Your code here
    pass


# Demonstrate with different sample sizes
print("Effect of Sample Size on Metric Reliability:")
for n in [50, 150, 400, 600]:
    np.random.seed(42)
    _, std, _ = simulate_metric_stability(0.65, n, 0.15)
    print(f"  n={n:3d}: Standard deviation of observed = {std:.4f}")
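
The functions above look at one player at a time. To see where a year-to-year correlation comes from, it can help to simulate a whole population over two seasons. The sketch below is a toy model under stated assumptions: each player has a fixed true skill, each season adds independent normal noise, and all parameter values are hypothetical. It uses only the standard library, and `pearson_r` and `simulate_year_to_year_r` are names introduced here.

```python
import math
import random
from typing import List


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_year_to_year_r(
    n_players: int, skill_sd: float, noise_sd: float, seed: int = 0
) -> float:
    """Correlation between two simulated seasons for a player population.

    Each season's observed value is the player's fixed true skill plus
    independent normal noise (a toy model with hypothetical parameters).
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_players)]
    season_1 = [s + rng.gauss(0.0, noise_sd) for s in skills]
    season_2 = [s + rng.gauss(0.0, noise_sd) for s in skills]
    return pearson_r(season_1, season_2)


for noise_sd in (0.5, 1.0, 2.0):
    r = simulate_year_to_year_r(n_players=5000, skill_sd=1.0, noise_sd=noise_sd)
    expected = 1.0 / (1.0 + noise_sd ** 2)  # skill variance / total variance
    print(f"noise_sd={noise_sd}: simulated r={r:.2f}, theoretical r={expected:.2f}")
```

In this model the year-to-year correlation equals the share of observed variance that comes from true skill, which is why a noisier metric calls for heavier regression toward the league average.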

C.5. Portfolio Project Tracker ⭐⭐⭐

Create a system to track the components of a football analytics career portfolio.

from datetime import datetime
from typing import List, Optional

class PortfolioProject:
    """Represents a single portfolio project."""

    def __init__(
        self,
        title: str,
        project_type: str,  # 'code', 'writing', 'visualization', 'competition'
        url: Optional[str] = None,
        description: str = "",
        skills_demonstrated: Optional[List[str]] = None
    ):
        self.title = title
        self.project_type = project_type
        self.url = url
        self.description = description
        self.skills_demonstrated = skills_demonstrated or []
        self.date_added = datetime.now()

    def to_dict(self) -> dict:
        # Your code here
        pass


class AnalyticsPortfolio:
    """
    Manages a football analytics career portfolio.

    Tracks projects across categories and evaluates completeness.
    """

    RECOMMENDED_CATEGORIES = {
        'code': 3,          # GitHub projects
        'writing': 2,       # Blog posts/articles
        'visualization': 2, # Dashboards/visualizations
        'competition': 1    # Big Data Bowl, Kaggle, etc.
    }

    ESSENTIAL_SKILLS = [
        'python',
        'sql',
        'statistics',
        'visualization',
        'machine_learning',
        'communication',
        'football_knowledge'
    ]

    def __init__(self, name: str):
        self.name = name
        self.projects: List[PortfolioProject] = []

    def add_project(self, project: PortfolioProject) -> None:
        # Your code here
        pass

    def get_category_counts(self) -> dict:
        # Your code here
        pass

    def get_skills_coverage(self) -> dict:
        # Your code here
        pass

    def get_gaps(self) -> dict:
        """
        Identify gaps in portfolio coverage.

        Returns
        -------
        dict
            {
                'missing_categories': list,
                'undercovered_categories': list,
                'missing_skills': list
            }
        """
        # Your code here
        pass

    def generate_report(self) -> str:
        """Generate a text report of portfolio status."""
        # Your code here
        pass


# Test your implementation
portfolio = AnalyticsPortfolio("Jane Analyst")
portfolio.add_project(PortfolioProject(
    "EPA Calculator",
    "code",
    "github.com/jane/epa-calc",
    "Python package for calculating EPA",
    ["python", "statistics", "football_knowledge"]
))
portfolio.add_project(PortfolioProject(
    "QB Analysis Blog Post",
    "writing",
    "blog.com/qb-analysis",
    "Analysis of QB performance metrics",
    ["communication", "statistics", "visualization"]
))
print(portfolio.generate_report())
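
For the counting logic, `collections.Counter` from the standard library is one convenient tool. The sketch below works on a plain list of project types rather than on `PortfolioProject` objects, so it is a hint and not a drop-in implementation. The `project_types` list is hypothetical, and `recommended` mirrors the values in `RECOMMENDED_CATEGORIES`.

```python
from collections import Counter

# Hypothetical project types; minimums mirror RECOMMENDED_CATEGORIES.
project_types = ["code", "writing", "code", "visualization"]
recommended = {"code": 3, "writing": 2, "visualization": 2, "competition": 1}

counts = Counter(project_types)

# A Counter returns 0 for keys it has never seen, so no KeyError here.
missing = [cat for cat in recommended if counts[cat] == 0]
undercovered = [
    cat for cat, minimum in recommended.items()
    if 0 < counts[cat] < minimum
]

print("missing:", missing)
print("undercovered:", undercovered)
```

Whether a category with zero projects should also count as undercovered is a design choice. This sketch keeps the two lists disjoint.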

Part D: Analysis & Interpretation ⭐⭐-⭐⭐⭐

Apply concepts to realistic scenarios. Justify your reasoning.

D.1. Case Analysis ⭐⭐

A local sports radio host makes the following claims during a debate about analytics:

"The analytics say you should always go for it on fourth down, which is ridiculous."
"If analytics worked, the nerds in the front office would just simulate the whole season and know who wins the Super Bowl."
"The best scout can evaluate a player in person better than any computer looking at numbers."

For each claim:
a) Identify the misunderstanding or logical flaw
b) Provide a more accurate characterization
c) Suggest evidence or examples that would support the correct view

D.2. Scenario Analysis ⭐⭐⭐

You've been hired as the first analytics hire for an NFL team that has historically been skeptical of analytics. The head coach is open-minded but the scouting department is resistant.

a) What would be your priorities for the first 90 days?
b) How would you build credibility with the scouting department?
c) What kind of project would you propose as a "quick win"?
d) What mistakes should you avoid?

D.3. Decision Framework ⭐⭐⭐

The general manager asks you: "Should we use analytics to make our draft picks?"

This question, as stated, is poorly formed. Write a response (2-3 paragraphs) that:
a) Explains why the question needs refinement
b) Proposes better questions to ask
c) Describes what an integrated analytics-scouting approach might look like
d) Acknowledges limitations and uncertainties

D.4. Ethics Analysis ⭐⭐⭐

Consider the following ethical questions in football analytics:

a) A team discovers through analytics that a player's performance is likely to decline significantly. Should they trade that player without disclosing this analysis? What if they know other teams don't have similar analytical capabilities?

b) An analytics model finds that certain playing styles lead to higher injury rates. If a team uses this to exploit opponent vulnerabilities, knowing it might result in injuries, is that ethical?

c) Analytics might reveal that certain positions (like running back) provide less value than their contracts suggest. Should teams openly share this information, knowing it could affect players' earning potential league-wide?

For each question, present arguments on both sides and state your own position with justification.

Part E: Research & Extension ⭐⭐⭐⭐

Open-ended problems for deeper exploration. These may require outside research.

E.1. Historical Research

Research the history of football analytics at one NFL team (Patriots, Eagles, Ravens, and Browns are good choices due to documented analytical investments).

Your report should include:
a) Timeline of key analytical hires and initiatives
b) Documented analytical successes or failures
c) How the team's approach has evolved
d) Current structure of their analytics department (if known)
e) Lessons for other teams

Minimum 1,500 words with at least 5 cited sources.

E.2. Comparative Analysis

Compare the state of analytics across at least three major professional sports (NFL, MLB, NBA, NHL, soccer, etc.).

Address:
a) Data availability in each sport
b) Adoption of analytics by teams
c) Impact on strategy and player evaluation
d) Remaining frontiers

Create a comparison table and write 2-3 paragraphs of analysis for each sport.

E.3. Future Trends

Write a 1,000-word essay predicting how football analytics will evolve over the next 10 years. Consider:

a) What new data sources might become available?
b) How might machine learning and AI change analysis?
c) What questions that are currently intractable might become answerable?
d) How might the relationship between analytics and traditional football people evolve?
e) What regulatory or competitive changes might affect analytics?

Support your predictions with current trends and research.

Solutions

Selected solutions are available in:
- code/exercise-solutions.py (programming problems)
- appendices/g-answers-to-selected-exercises.md (odd-numbered problems)

Full solutions available to instructors upon request.