A Retrospective" chapter: 29 difficulty: advanced estimated_time: "90 minutes" data_required: false


Case Study 2: Leicester City's 2015--16 Title --- A Retrospective

"5000-to-1 odds are not assigned to events that happen." --- Pre-season bookmaker consensus, August 2015

Executive Summary

Leicester City's 2015--16 Premier League title remains the most statistically improbable achievement in the history of major European football. This case study applies modern analytics techniques --- many of which did not exist in their current form during that season --- to retrospectively analyze how Leicester won the league, what the data revealed at the time, and what lessons the achievement holds for the relationship between analytics and sporting outcomes.

Skills Applied:

  • Expected goals and expected points analysis
  • Defensive efficiency metrics and counter-attacking profiling
  • Statistical modeling of league outcomes (Monte Carlo simulation)
  • Bayesian updating of title probability over the course of a season
  • Critical evaluation of analytical frameworks and their limitations


Background

The Context

In the 2014--15 season, Leicester City narrowly avoided relegation from the Premier League, winning 7 of their final 9 matches under Nigel Pearson to finish 14th. Claudio Ranieri replaced Pearson as manager in July 2015. Leicester's pre-season odds for the 2015--16 title were 5000-to-1, the longest odds ever offered for an eventual champion in any major European football league.

The squad was assembled at a fraction of the cost of their rivals:

| Club | Estimated Squad Cost | Final Position (2015--16) |
|---|---|---|
| Manchester City | ~500M | 4th |
| Manchester United | ~475M | 5th |
| Chelsea | ~450M | 10th |
| Arsenal | ~350M | 2nd |
| Liverpool | ~325M | 8th |
| Tottenham | ~275M | 3rd |
| Leicester City | ~55M | 1st |

Key Personnel

  • Claudio Ranieri (Manager): Experienced Italian coach known for pragmatic, defensively organized teams.
  • Jamie Vardy (Striker): Signed from non-league Fleetwood Town in 2012. Set a Premier League record with goals in 11 consecutive matches.
  • Riyad Mahrez (Winger): Signed from Le Havre in 2014 for ~400,000. Won PFA Player of the Year.
  • N'Golo Kante (Midfielder): Signed from Caen for ~7.6M. Revolutionized the defensive midfielder role.
  • Kasper Schmeichel (Goalkeeper): Son of Manchester United legend Peter Schmeichel. Provided consistent shot-stopping.

Analytical Retrospective

Expected Goals and Expected Points

Applying modern xG models to Leicester's 2015--16 season reveals a nuanced picture:

| Metric | Leicester | League Average | Rank |
|---|---|---|---|
| Goals scored | 68 | 49.0 | 3rd |
| xG (expected goals) | 55.2 | 49.0 | 7th |
| Goals conceded | 36 | 49.0 | 3rd |
| xGA (expected goals against) | 43.8 | 49.0 | 6th |
| Actual points | 81 | 50.0 | 1st |
| Expected points (from xG/xGA) | 62.3 | 50.0 | 5th |
| Points overperformance | +18.7 | 0 | 1st |

The xG data show that Leicester significantly overperformed their expected metrics. An 18.7-point overperformance is extraordinarily rare, in the top 0.5% of all team-seasons in Premier League history.

$$ \text{Overperformance} = \text{Actual Points} - \text{xPoints} = 81 - 62.3 = 18.7 $$
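One common way to turn goal expectations into expected points is to treat goals for and against as independent Poisson variables and sum the outcome probabilities over a grid of scorelines. The sketch below does this with season-average rates. The function names and the `max_goals` cutoff are illustrative assumptions, not the model behind the table, and because it spreads season totals evenly over 38 matches rather than using match-level xG, it should not be expected to reproduce the 62.3 figure exactly.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def match_outcome_probs(xg_for: float, xg_against: float, max_goals: int = 15) -> dict:
    """Win/draw/loss probabilities under independent Poisson scoring.

    max_goals truncates the scoreline grid; the ignored tail is negligible
    for typical football scoring rates.
    """
    p_win = p_draw = p_loss = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, xg_for) * poisson_pmf(conceded, xg_against)
            if scored > conceded:
                p_win += p
            elif scored == conceded:
                p_draw += p
            else:
                p_loss += p
    return {"win": p_win, "draw": p_draw, "loss": p_loss}


def expected_points(xg_for: float, xg_against: float, n_matches: int = 38) -> float:
    """Season expected points: 3 per win and 1 per draw, summed over matches."""
    probs = match_outcome_probs(xg_for, xg_against)
    return n_matches * (3 * probs["win"] + probs["draw"])


# Season totals from the table, spread evenly over 38 matches (an assumption).
leicester_probs = match_outcome_probs(55.2 / 38, 43.8 / 38)
leicester_xpts = expected_points(55.2 / 38, 43.8 / 38)
print(f"Season-average xPoints estimate: {leicester_xpts:.1f}")
print(f"Gap to actual points: {81 - leicester_xpts:.1f}")
```

Using one flat rate for every match ignores opponent strength and home advantage, which is one reason match-by-match expected points figures differ from this kind of back-of-envelope estimate.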

Decomposing the Overperformance

The gap of 18.7 points (81 actual against 62.3 expected) can be attributed to three factors:

1. Finishing quality (approximately +7 points)

Leicester's shot conversion rate was 19.4%, compared to an expected conversion rate (from xG) of approximately 14.5%. This was driven primarily by Vardy and Mahrez, who both finished above their xG by significant margins.

$$ \text{Finishing surplus} = \text{Goals} - \text{xG} = 68 - 55.2 = 12.8 \text{ goals} \approx 7 \text{ points} $$

2. Goalkeeping and defensive overperformance (approximately +6 points)

Schmeichel's save percentage on shots from inside the box was 78%, compared to a league average of 68%. The defensive unit, organized by Ranieri and anchored by Robert Huth and Wes Morgan, consistently outperformed positional expectations.

3. Game-state management (approximately +5 points)

Leicester won an unusually high proportion of close matches (1-0 victories: 8; 2-1 victories: 5). Their record in matches decided by a single goal was 13-5-2, a winning percentage that far exceeded statistical expectation.
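As a quick consistency check, the three approximate components can be summed and compared with the total gap between actual and expected points. The attribution is rough: the components are estimates and may partly overlap (for example, clinical finishing also helps win close matches), so the small remainder should not be over-interpreted.

$$ \underbrace{7}_{\text{finishing}} + \underbrace{6}_{\text{goalkeeping and defense}} + \underbrace{5}_{\text{game state}} = 18 \approx 81 - 62.3 = 18.7 $$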

Counter-Attacking Profile

Modern positional data analysis reveals Leicester's tactical distinctiveness:

```python
import numpy as np
import pandas as pd


def analyze_counterattack_profile(
    team_events: pd.DataFrame,
    league_events: pd.DataFrame
) -> dict:
    """Analyze a team's counter-attacking tendencies vs league average.

    Args:
        team_events: Event data for the team's season.
        league_events: Event data for all teams in the league.

    Returns:
        Dictionary of counter-attacking metrics and the team-to-league
        counter-attack reliance ratio.
    """
    def compute_metrics(events: pd.DataFrame) -> dict:
        shots = events[events["type"] == "shot"]
        # Count only shots that came from counter-attacks, so the metric is
        # the share of shots from counters (not all counter events per shot).
        counter_shots = shots[shots["play_pattern"] == "from_counter"]
        carries = events[events["type"] == "carry"]
        return {
            "counter_attack_pct": len(counter_shots) / max(len(shots), 1),
            # Mean number of events per possession.
            "avg_sequence_length": events.groupby("possession_id").size().mean(),
            # Mean carry speed, if the provider supplies a speed column.
            "direct_speed": (
                carries["speed"].mean() if "speed" in events.columns else np.nan
            ),
        }

    team_metrics = compute_metrics(team_events)
    league_metrics = compute_metrics(league_events)

    return {
        "team": team_metrics,
        "league_avg": league_metrics,
        # Floor the denominator to avoid dividing by zero on sparse data.
        "counter_reliance_ratio": (
            team_metrics["counter_attack_pct"]
            / max(league_metrics["counter_attack_pct"], 0.01)
        ),
    }
```

Leicester's profile was extreme:

  • Possession: 42.6% (19th in the league; only Aston Villa, who were relegated, had less)
  • Counter-attacks leading to shots: 28% of all shots (league average: 14%)
  • Average passing sequence before a shot: 3.8 passes (league average: 7.6 passes)
  • Percentage of goals from fast breaks: 34% (league average: 16%)
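To give a sense of scale, the figures quoted above can be expressed as team-to-league ratios, the same kind of quantity that `counter_reliance_ratio` reports. This is only arithmetic on the stated numbers; the dictionary keys are illustrative names, not fields from any data provider.

```python
# (team value, league average) pairs taken from the profile above.
profile = {
    "counter_shot_share": (0.28, 0.14),
    "fast_break_goal_share": (0.34, 0.16),
    "passes_per_shot_sequence": (3.8, 7.6),
}

# Ratio above 1 means more than the league average, below 1 means less.
ratios = {name: team / league for name, (team, league) in profile.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x league average")
```

On these numbers Leicester relied on counters for shots about twice as much as the average side, and needed about half as many passes to reach a shot.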

The Kante Effect

N'Golo Kante's defensive contribution was extraordinary and is best understood through spatial analysis:

  • Interceptions per 90: 4.7 (1st in the league by a wide margin)
  • Tackles per 90: 4.2 (2nd in the league)
  • Defensive action coverage area: Kante's defensive actions covered approximately 35% more pitch area than the average central midfielder, effectively allowing Leicester to defend with an extra player.

This spatial dominance enabled Leicester's tactical system: by winning the ball in midfield, Kante triggered the counter-attacks that were Leicester's primary attacking weapon.
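The text does not say how the 35% coverage figure was measured. One plausible approach, sketched here purely as an assumption, is to take the area of the convex hull around a player's defensive-action locations and compare it with a reference player's. The function names and the toy coordinates (in metres) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull via Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon using the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def coverage_area(actions: List[Point]) -> float:
    """Area enclosed by the outermost defensive actions."""
    return polygon_area(convex_hull(actions))


# Toy data: interior points do not change the hull.
wide_coverage = [(0, 0), (27, 0), (27, 20), (0, 20), (10, 10)]
reference_coverage = [(0, 0), (20, 0), (20, 20), (0, 20), (5, 5)]
ratio = coverage_area(wide_coverage) / coverage_area(reference_coverage)
print(f"Coverage ratio: {ratio:.2f}")  # 540 / 400 = 1.35
```

A convex hull is sensitive to a handful of outlying actions, so in practice analysts often trim the most extreme points or use a density-based area instead.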

Bayesian Title Probability

We can model how the probability of Leicester winning the title evolved over the season using Bayesian updating:

$$ P(\text{title} \mid \text{data at matchweek } t) \propto P(\text{data at matchweek } t \mid \text{title}) \cdot P(\text{title}) $$

Starting with a prior probability of 0.02% (reflecting the 5000-to-1 odds), the posterior probability updated as follows:

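| Matchweek | Points | Position | Title Probability |
|---|---|---|---|
| 1 | 3 | 5th | 0.03% |
| 10 | 22 | 3rd | 0.8% |
| 15 | 32 | 1st | 4.2% |
| 20 | 44 | 1st | 20.5% |
| 25 | 53 | 1st | 42.0% |
| 30 | 63 | 1st | 74.3% |
| 35 | 73 | 1st | 95.1% |
| 36 | 76 | Champions | 100% |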

Statistical Note: Even at Christmas (matchweek 20), with Leicester 5 points clear at the top, most statistical models gave them only about a 20% chance of winning the title. That is far below the historical base rate (teams leading at Christmas win the title approximately 75% of the time), because Leicester's underlying metrics (xG, xGA) suggested significant regression was likely.
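The proportionality above can be made concrete in odds form, where the posterior odds equal the prior odds multiplied by a likelihood ratio. The sketch below converts the 5000-to-1 price into a prior and asks how much cumulative evidence is needed to reach the matchweek 20 figure in the table. The function names are illustrative, and the conversion ignores the bookmaker's margin.

```python
def implied_probability(odds_against: float) -> float:
    """Convert 'N-to-1 against' odds to a probability (no margin removed)."""
    return 1.0 / (odds_against + 1.0)


def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability given a prior and a likelihood ratio.

    likelihood_ratio = P(data | title) / P(data | no title).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Likelihood ratio needed to move from prior to posterior."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds


prior = implied_probability(5000)                 # about 0.02%
lr_to_mw20 = required_likelihood_ratio(prior, 0.205)
print(f"Prior: {prior:.4%}")
print(f"Likelihood ratio needed by matchweek 20: {lr_to_mw20:.0f}")
```

Moving from the pre-season prior to 20.5% requires evidence roughly 1,300 times more likely under "Leicester win the title" than under the alternative, which illustrates how strongly the prior anchors the early-season estimates.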

Monte Carlo Simulation of the Season

Running 100,000 simulations of the 2015--16 season using pre-season xG projections:

  • Leicester winning the title occurred in 23 out of 100,000 simulations (0.023%), roughly consistent with the bookmaker odds.
  • Even using Leicester's actual first-half-of-season performance as input, only 20.5% of simulations resulted in Leicester winning the title from the midseason point.

A simplified two-team version of the midseason calculation can be written as:

```python
from typing import Optional

import numpy as np


def simulate_season(
    team_xg_per_match: float,
    team_xga_per_match: float,
    n_remaining_matches: int,
    current_points: int,
    rival_points: int,
    rival_xg: float,
    rival_xga: float,
    n_simulations: int = 100000,
    seed: Optional[int] = None,
) -> float:
    """Simulate remaining season to estimate title probability.

    Simplification: only the closest rival is modeled, and each side's
    results are drawn independently from Poisson goal counts.

    Args:
        team_xg_per_match: Team's average xG per match.
        team_xga_per_match: Team's average xGA per match.
        n_remaining_matches: Number of matches left.
        current_points: Team's current points tally.
        rival_points: Closest rival's current points.
        rival_xg: Rival's average xG per match.
        rival_xga: Rival's average xGA per match.
        n_simulations: Number of Monte Carlo iterations.
        seed: Optional random seed for reproducible runs.

    Returns:
        Estimated probability of winning the title.
    """
    rng = np.random.default_rng(seed)
    shape = (n_simulations, n_remaining_matches)

    def simulate_points(xg: float, xga: float) -> np.ndarray:
        # Draw every remaining match for every simulation in one call,
        # which is far faster than sampling one match at a time.
        goals_for = rng.poisson(xg, size=shape)
        goals_against = rng.poisson(xga, size=shape)
        # 3 points for a win, 1 for a draw, 0 for a loss.
        points = 3 * (goals_for > goals_against) + (goals_for == goals_against)
        return points.sum(axis=1)

    team_pts = current_points + simulate_points(
        team_xg_per_match, team_xga_per_match
    )
    rival_pts = rival_points + simulate_points(rival_xg, rival_xga)

    # Finishing level on points counts as not winning (no tie-breakers modeled).
    return float(np.mean(team_pts > rival_pts))
```
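A simulated probability is itself an estimate, so it helps to check how precisely 23 title wins in 100,000 runs pins down the true rate. The sketch below uses the normal approximation to the binomial standard error; the function name is illustrative, and with so few successes the approximation is rough (a Wilson or exact interval would be more careful).

```python
import math
from typing import Tuple


def monte_carlo_interval(
    successes: int, n_simulations: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Point estimate and approximate 95% interval for a simulated rate."""
    p = successes / n_simulations
    # Binomial standard error of the estimated proportion.
    se = math.sqrt(p * (1.0 - p) / n_simulations)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, lower, upper


estimate, low, high = monte_carlo_interval(23, 100_000)
bookmaker_implied = 1.0 / 5001.0  # 5000-to-1 against, margin ignored

print(f"Estimate: {estimate:.4%}  interval: {low:.4%} to {high:.4%}")
print(f"Bookmaker-implied probability inside interval: "
      f"{low <= bookmaker_implied <= high}")
```

The interval runs from roughly 0.014% to 0.032%, so the bookmaker-implied 0.02% sits comfortably inside it, which supports the "roughly consistent" reading above.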

What the Analytics Community Got Wrong

The Regression Fallacy

Throughout the season, many analysts predicted Leicester would regress to the mean. This prediction was statistically sound --- teams that significantly overperform xG almost always regress --- but it missed several crucial factors:

  1. The finishing was not random. Vardy and Mahrez were genuinely elite finishers in that season, not average players enjoying a lucky streak. Context-dependent finishing models (which account for shot type, body position, and defensive pressure) showed a smaller overperformance than basic xG suggested.

  2. The tactical system was self-reinforcing. Leicester's counter-attacking approach generated high-quality chances (shots from fast breaks have inherently higher conversion rates). Basic xG models undervalue these opportunities because they do not fully capture the disorganization of the defending team.

  3. Psychological momentum was real. As the season progressed, Leicester played with increasing confidence while their rivals wilted under pressure. Analytics at the time had no way to model this.

The Sample Size Problem

The 38-match Premier League season is a notoriously small sample. Leicester's 81 points were within the 99% confidence interval of their expected points, albeit at the extreme upper end, which is consistent with an overperformance in the top 0.5% of team-seasons. A single-season overperformance of this magnitude, while rare, is not impossible.

The critical question is whether the overperformance should have been attributed to luck (random variance) or skill (genuine excellence in finishing, defending, and game management). The answer, with hindsight, is both --- Leicester were genuinely excellent at specific aspects of the game, and they benefited from favorable variance at crucial moments.


Lessons for Soccer Analytics

1. Models Describe Distributions, Not Certainties

An xG model that assigned Leicester a 5th-place expected finish was not "wrong" --- it correctly identified that Leicester's performance was above their underlying metrics. But the model's point estimate obscured the range of possible outcomes, which included a title.

2. Tactical Context Matters

Generic xG models that ignore tactical context (counter-attack vs. sustained possession, organized defense vs. transition) will systematically misjudge teams with extreme tactical profiles. Leicester exposed this limitation.

3. The Human Element Resists Quantification

Team cohesion, belief, and the inspirational effect of an unprecedented achievement are real factors that influence sporting outcomes. Analytics can measure their effects (e.g., improved conversion rates under pressure) but cannot predict their emergence.

4. Black Swans Happen

Leicester's title was a "black swan" event --- highly improbable, high impact, and retrospectively explicable. Analytics should be calibrated to acknowledge that improbable events occur, not just in theory but in practice.

5. The Value of Structural Advantages

Leicester's key analytical insight was structural: by signing Kante (~7.6M) and Mahrez (~400,000), they acquired world-class talent at a fraction of market value. This is the ultimate analytical edge: identifying players whose true ability is dramatically underpriced by the market.


Discussion Questions

  1. If you were Leicester's analytics department in January 2016, with the team top of the table, how would you advise the coaching staff? Would you recommend changing anything, or advocate for staying the course despite the statistical likelihood of regression?

  2. The 18.7-point overperformance (81 actual points against 62.3 expected) is extreme but not unique in Premier League history. Research other significant overperformances and compare their causes. Do they share common features?

  3. Modern xG models are more context-aware than those available in 2015--16 (incorporating freeze-frame data, defensive positioning, etc.). Would a modern model have rated Leicester's chances more highly? Estimate by how much.

  4. Leicester's squad was dismantled over the following seasons (Kante to Chelsea, Mahrez to Manchester City). Using player valuation frameworks from Chapter 17, estimate the total value Leicester extracted from their recruitment during this period.

  5. Is the Leicester story an argument for or against the use of analytics in football? Defend your position with specific reference to the analytical findings in this case study.


Connection to Chapter Themes

This case study connects to the full breadth of techniques covered in this textbook:

  • Expected goals and expected points (Chapters 5, 8): Core metrics for evaluating Leicester's performance vs. underlying quality.
  • Monte Carlo simulation (Chapter 22): Modeling the probability distribution of season outcomes.
  • Bayesian inference (Chapter 19): Updating title probabilities as the season progressed.
  • Tactical analysis (Chapters 10, 11, 16): Counter-attacking profiling, formation analysis, pressing metrics.
  • Player valuation and scouting (Chapter 17): Identifying undervalued talent (Kante, Mahrez, Vardy).
  • Communication (Chapter 21): Framing analytical findings for non-technical audiences.