Case Study 1: Using LSTMs to Model Possession Sequences

Background

Traditional expected threat (xT) models assign a value to each location on the pitch independently, estimating the probability that a possession at that location will eventually lead to a goal. While effective as a first approximation, location-based models ignore a critical dimension of attacking play: the sequence of events that brought the ball to its current position. A shot taken at the edge of the penalty area after a 15-pass build-up through midfield carries fundamentally different information than an identical-location shot following a direct long ball. The defensive organization, the positioning of teammates, and the speed of the attack are all encoded in the event sequence --- and all are invisible to a location-only model.

This case study describes how a fictional Premier League club, Ashford United, developed an LSTM-based model to predict possession outcomes from event sequences, achieving meaningfully better calibration and discrimination than their existing grid-based xT model.

The Problem

Ashford United's analytics team identified several weaknesses in their grid-based xT model during the 2022-23 season review:

  1. Counter-attacks were undervalued. Fast transitions that covered 50+ meters in under 8 seconds generated higher-quality chances than the xT model predicted, because the model did not account for the disorganized defensive structure.
  2. Patient build-up was overvalued. Extended possession sequences in the opponent's half often ended in low-quality chances (recycled possession, backward passes) that the xT model scored highly due to territorial position alone.
  3. Pressing triggers were invisible. The team had no way to detect the moment in a defensive sequence when a coordinated press was most likely to win the ball in a dangerous position.

The team hypothesized that a sequence model --- specifically an LSTM --- could address these shortcomings by learning temporal patterns in event data that grid-based models could not capture.

Data Preparation

Event Representation

The team used event data from three seasons of Premier League matches (approximately 1,140 matches). Each possession was represented as a sequence of events:

$$\mathbf{S} = [(\mathbf{e}_1, t_1), (\mathbf{e}_2, t_2), \ldots, (\mathbf{e}_T, t_T)]$$

Each event vector $\mathbf{e}_i$ encoded the following features:

  Feature                          Encoding                                     Dimension
  Event type                       One-hot (pass, carry, dribble, shot, etc.)   12
  x-coordinate (normalized)        Continuous [0, 1]                            1
  y-coordinate (normalized)        Continuous [0, 1]                            1
  End x-coordinate                 Continuous [0, 1]                            1
  End y-coordinate                 Continuous [0, 1]                            1
  Time since possession start      Continuous (seconds)                         1
  Time since previous event        Continuous (seconds)                         1
  Speed of play (distance / time)  Continuous                                   1
  Angle toward goal                Continuous (radians)                         1
  Progressive distance             Continuous                                   1
  Total                                                                         21
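
The 21-dimensional event vector can be assembled in a few lines. The sketch below assumes a hypothetical event record with fields like `"type"`, `"x"`, `"end_x"`, and `"t"` (the schema and the exact 12-type vocabulary are illustrative, not from the source); progressive distance is computed here simply as forward x-displacement, which is one common convention.

```python
import numpy as np

# Illustrative 12-type vocabulary -> 12 one-hot dimensions
EVENT_TYPES = ["pass", "carry", "dribble", "shot", "cross", "clearance",
               "interception", "tackle", "take_on", "foul_won", "recovery",
               "other"]

GOAL_X, GOAL_Y = 1.0, 0.5  # opponent goal in normalized pitch coordinates

def encode_event(ev, possession_start, prev_time):
    """Build the 21-dim feature vector for one event (hypothetical schema)."""
    one_hot = np.zeros(len(EVENT_TYPES), dtype=np.float32)
    one_hot[EVENT_TYPES.index(ev["type"])] = 1.0

    dt = max(ev["t"] - prev_time, 1e-3)           # guard against zero elapsed time
    dist = np.hypot(ev["end_x"] - ev["x"], ev["end_y"] - ev["y"])
    speed = dist / dt                              # speed of play
    angle = np.arctan2(GOAL_Y - ev["y"], GOAL_X - ev["x"])  # angle toward goal
    prog = ev["end_x"] - ev["x"]                   # progressive distance (assumption)

    return np.concatenate([one_hot,
                           [ev["x"], ev["y"], ev["end_x"], ev["end_y"],
                            ev["t"] - possession_start, ev["t"] - prev_time,
                            speed, angle, prog]]).astype(np.float32)
```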

Sequence Processing

Possessions were truncated or padded to a maximum length of 50 events. In practice, 95% of possessions contained fewer than 30 events, so the padding primarily ensured batch processing efficiency.

import numpy as np

MAX_SEQ_LEN = 50
FEATURE_DIM = 21

# Pre-allocate zero-filled arrays; the zeros double as padding
padded = np.zeros((len(sequences), MAX_SEQ_LEN, FEATURE_DIM), dtype=np.float32)
mask = np.zeros((len(sequences), MAX_SEQ_LEN), dtype=np.float32)

# Truncate long sequences; shorter ones keep trailing zeros as padding
for i, seq in enumerate(sequences):
    length = min(len(seq), MAX_SEQ_LEN)
    padded[i, :length] = seq[:length]
    mask[i, :length] = 1.0  # 1 marks real events, 0 marks padding

Target Variable

Each possession was labeled with a binary outcome: whether it resulted in a shot within the next three events after the current timestamp (for intermediate predictions) or at the end of the possession (for terminal predictions). The team chose a three-event lookahead rather than end-of-possession labeling because it captured the dynamic threat of the sequence without requiring extremely long-horizon predictions.
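
The three-event lookahead labeling can be sketched as follows; the function below assumes each possession is already reduced to a list of event-type strings (a simplification of the full feature representation).

```python
def lookahead_labels(event_types, horizon=3):
    """Label each event 1 if a shot occurs within the next `horizon` events
    of the same possession, else 0 (intermediate-prediction targets)."""
    n = len(event_types)
    labels = []
    for i in range(n):
        window = event_types[i + 1 : i + 1 + horizon]
        labels.append(1 if "shot" in window else 0)
    return labels
```

For a possession ending in a shot, every event within three steps of the shot is a positive example, which gives the model dense intermediate supervision rather than a single terminal label.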

Train/Test Split

Data was split temporally: the first two seasons (2020-21 and 2021-22) served as training data, and the 2022-23 season was the held-out test set. This temporal split prevented future information from leaking into training and ensured the model was evaluated against tactical trends it had not seen.
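
A minimal sketch of the split, assuming each possession record carries a `"season"` label (the field name is an assumption):

```python
def temporal_split(possessions):
    """Partition possessions by season: first two seasons train, final season test."""
    train = [p for p in possessions if p["season"] in ("2020-21", "2021-22")]
    test = [p for p in possessions if p["season"] == "2022-23"]
    return train, test
```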

Model Architecture

LSTM Design

The team experimented with several architectures and settled on a two-layer LSTM:

Input: (batch, 50, 21)
LSTM Layer 1: hidden_dim=64, dropout=0.2
LSTM Layer 2: hidden_dim=32, dropout=0.2
Fully Connected: 32 -> 16 (ReLU)
Output: 16 -> 1 (Sigmoid)

The model processed the entire sequence and used the final hidden state $\mathbf{h}_T$ as the representation of the full possession up to that point:

$$\hat{y} = \sigma(\mathbf{W}_o \cdot \text{ReLU}(\mathbf{W}_h \cdot \mathbf{h}_T + \mathbf{b}_h) + \mathbf{b}_o)$$
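
A minimal PyTorch sketch of this architecture, under the assumption that the team used PyTorch (the source does not say). The two LSTM layers are declared separately because `nn.LSTM`'s built-in `dropout` argument only applies between stacked layers of one module; the final hidden state $\mathbf{h}_T$ is taken at each sequence's last real event, not at the padded position 50.

```python
import torch
import torch.nn as nn

class PossessionLSTM(nn.Module):
    """Two-layer LSTM over padded event sequences (sketch of the stated spec)."""
    def __init__(self, feature_dim=21, h1=64, h2=32, p_drop=0.2):
        super().__init__()
        self.lstm1 = nn.LSTM(feature_dim, h1, batch_first=True)
        self.lstm2 = nn.LSTM(h1, h2, batch_first=True)
        self.drop = nn.Dropout(p_drop)
        self.head = nn.Sequential(nn.Linear(h2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x, lengths):
        # x: (batch, 50, 21); lengths: true sequence lengths before padding
        out, _ = self.lstm1(x)
        out, _ = self.lstm2(self.drop(out))
        # Select the hidden state at each sequence's last real event (h_T)
        idx = (lengths - 1).clamp(min=0).view(-1, 1, 1).expand(-1, 1, out.size(-1))
        h_T = out.gather(1, idx).squeeze(1)
        return torch.sigmoid(self.head(self.drop(h_T))).squeeze(-1)
```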

Why LSTM Over Alternatives?

The team evaluated three sequence architectures:

Architecture Test Log Loss Test AUC Inference Time (ms)
Vanilla RNN 0.198 0.782 0.8
LSTM (chosen) 0.172 0.831 1.4
GRU 0.175 0.825 1.2
Transformer (4 heads) 0.169 0.838 3.1

The LSTM was chosen as the best balance of performance and inference speed. The Transformer achieved marginally better metrics but required over twice the inference time, which was a concern for potential real-time deployment. The vanilla RNN performed poorly on long possessions, consistent with its known vanishing gradient limitations.

Training Configuration

  • Optimizer: Adam with learning rate $1 \times 10^{-3}$, weight decay $1 \times 10^{-4}$
  • Batch size: 256 possessions
  • Epochs: 50 with early stopping (patience = 7)
  • Loss function: Binary cross-entropy with class weighting (shots are rare events)
  • Gradient clipping: Max norm of 1.0 to prevent gradient explosion in the LSTM
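
The configuration above translates into a training step roughly as follows. This is a sketch, not the team's code: a simple linear model stands in for the LSTM so the snippet is self-contained, logits are used so that `pos_weight` class weighting applies cleanly, and the positive-class weight of 8.0 is illustrative.

```python
import torch
import torch.nn as nn

# Stand-in model for illustration (the case study uses the two-layer LSTM);
# it returns raw logits so BCEWithLogitsLoss can apply class weighting.
model = nn.Sequential(nn.Flatten(), nn.Linear(50 * 21, 1))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Up-weight the rare positive (shot) class; the 8.0 ratio is an assumption
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([8.0]))

def train_step(x, y):
    """One gradient step with the stated loss, weighting, and clipping."""
    model.train()
    optimizer.zero_grad()
    logits = model(x).squeeze(-1)
    loss = criterion(logits, y)
    loss.backward()
    # Clip gradients to max norm 1.0 to guard against gradient explosion
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```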

Results

Comparison with Grid-Based xT

  Metric              Grid-Based xT   LSTM Model   Improvement
  Log Loss            0.203           0.172        -15.3%
  Brier Score         0.081           0.068        -16.0%
  AUC-ROC             0.789           0.831        +5.3%
  Calibration Error   0.034           0.018        -47.1%

The LSTM model achieved meaningful improvements across all metrics, with the largest gain in calibration --- the model's predicted probabilities more closely matched empirical frequencies.
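
The three probability-quality metrics can be computed directly; the sketch below uses plain NumPy and a binned expected-calibration-error definition, which is one common variant (the case study does not specify which calibration-error formula was used).

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy; clip to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def brier_score(y, p):
    """Mean squared error between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))

def calibration_error(y, p, n_bins=10):
    """Expected calibration error: per-bin |mean prediction - empirical rate|,
    weighted by bin occupancy (one common definition)."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```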

Counter-Attack Detection

The team analyzed possessions classified as counter-attacks (ball won in own half, reaching the opponent's penalty area within 10 seconds). The LSTM model assigned these possessions 23% higher threat values on average than the grid-based model, better reflecting the empirical shot rate from counter-attacks.

Analysis of Learned Representations

By extracting the hidden state $\mathbf{h}_T$ for each possession and applying t-SNE dimensionality reduction, the team discovered that the LSTM had learned to organize possessions into clusters corresponding to recognizable tactical patterns:

  • Cluster A: Slow build-up possessions with lateral passing in the opponent's half
  • Cluster B: Direct counter-attacks through the center
  • Cluster C: Wing play with crosses into the box
  • Cluster D: Set-piece sequences (corner kicks, free kicks)
  • Cluster E: Pressing recoveries leading to immediate shots

These clusters emerged without any explicit tactical labeling --- the LSTM discovered the structure from raw event sequences.
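
The extraction-and-projection step can be sketched as below, assuming scikit-learn for t-SNE; random vectors stand in for the real hidden states, and the perplexity value is a tunable choice rather than a documented one.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the matrix of final hidden states h_T, one row per possession
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 32))  # (n_possessions, hidden_dim=32)

# Project to 2-D for visual cluster inspection
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(hidden_states)
```

In practice the rows of `hidden_states` would come from a forward pass over the test-set possessions, and the 2-D embedding would be colored by possession metadata (speed, territory gained, outcome) to interpret the clusters.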

Gate Analysis

The team examined the LSTM's forget gate activations to understand what the model considered important. Key findings:

  1. Possession resets: When the ball was recycled to the goalkeeper or back to the center-backs, the forget gate activated strongly, effectively resetting the threat context.
  2. Progressive carries: Ball carries that advanced more than 10 meters toward goal triggered high input gate activations, indicating the model learned to recognize progressive ball movement as a key threat indicator.
  3. Final-third entries: The moment the ball entered the final third produced a persistent change in the cell state, suggesting the model maintained awareness of territorial progress throughout the remainder of the possession.
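
Gate activations are not exposed by standard LSTM modules, so this kind of analysis recomputes them from the layer's weights one step at a time. The NumPy sketch below assumes PyTorch-style weight layout, where the input, forget, candidate, and output gate rows are stacked in that order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(x_t, h_prev, W_ih, W_hh, b):
    """Recompute the four LSTM gate activations for one timestep so they
    can be inspected. Assumes rows of W_ih/W_hh are stacked [i, f, g, o]
    (PyTorch convention); returns (input, forget, candidate, output)."""
    H = W_hh.shape[1]
    z = W_ih @ x_t + W_hh @ h_prev + b
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    return sigmoid(i), sigmoid(f), np.tanh(g), sigmoid(o)
```

Running this over every event of a possession yields a forget-gate trace per hidden unit, which is what reveals the "reset" behavior on back-passes to the goalkeeper.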

Deployment and Integration

Production Pipeline

The LSTM model was deployed within Ashford United's existing analytics infrastructure:

  1. Post-match analysis: The model processed all possessions from each match overnight, generating a "sequence-adjusted xT" value for every event chain.
  2. Scouting integration: The same model was applied to scouting data from target leagues, enabling comparison of player contributions in a sequence-aware framework.
  3. Opposition analysis: By running the model on opponent possessions, the team identified the types of sequences most likely to generate threats against their defensive structure.

Coaching Communication

The analytics team developed a simplified "Sequence Threat Score" (STS) for communication with coaching staff:

$$\text{STS} = \frac{\text{LSTM xT} - \text{Grid xT}}{\text{Grid xT}} \times 100$$

A positive STS indicated that the sequence context made the possession more threatening than location alone would suggest (e.g., a fast counter-attack). A negative STS indicated the opposite (e.g., recycled possession in the final third). This relative metric was more intuitive for coaches than raw probability values.
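
The STS formula transcribes directly to code; for example, a counter-attack valued at 0.12 by the LSTM but 0.08 by the grid model scores +50.

```python
def sequence_threat_score(lstm_xt, grid_xt):
    """Percentage uplift of the sequence-aware value over the grid value."""
    return (lstm_xt - grid_xt) / grid_xt * 100.0
```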

Limitations and Future Work

  1. No tracking data integration. The model used event data only. Incorporating tracking data (player positions, velocities) would enable the model to capture defensive organization explicitly rather than inferring it from event patterns.

  2. Fixed-length sequences. The 50-event truncation occasionally lost information from very long possessions. An attention mechanism could solve this by allowing the model to focus on the most relevant events regardless of sequence length.

  3. Single-team perspective. The model did not distinguish between possessions by different teams. A team-conditioned model could learn team-specific attacking styles.

  4. Temporal drift. Tactical trends evolve season to season. The team planned to implement rolling retraining on a six-month cycle to mitigate distribution shift.

Discussion Questions

  1. The team chose to predict "shot within next 3 events" rather than "goal within the possession." What are the tradeoffs of each target variable? How might the choice affect the model's utility for different stakeholders?

  2. The LSTM outperformed the vanilla RNN but fell short of the Transformer. Under what circumstances would the additional cost of the Transformer be justified?

  3. The forget gate analysis revealed that the model "resets" when the ball returns to the goalkeeper. Is this always appropriate? Can you think of tactical scenarios where the pre-reset context should be preserved?

  4. How would you validate that the clusters discovered by t-SNE correspond to genuine tactical patterns rather than artifacts of the dimensionality reduction?

  5. If the team wanted to use this model for real-time decision support during matches, what architectural changes would be needed?

Code Implementation

See code/case-study-code.py for the complete Python implementation of the LSTM possession model, including data preparation, training, evaluation, and gate analysis.