Build-Up Play Analysis

Intermediate 10 min read 459 views Nov 25, 2025

Build-up play analysis examines how teams progress the ball from defensive areas to attacking positions through controlled possession. It focuses on ball progression patterns, passing sequences, tempo variations, and the positional structures that help move the ball upfield. Understanding build-up play helps identify team tactical patterns, player roles in possession, and vulnerabilities in opposition pressing schemes. Modern analytics quantify build-up effectiveness with metrics that capture both the efficiency and the quality of ball progression.

Key Concepts

Analyzing build-up play requires understanding the multi-dimensional aspects of how teams advance the ball:

  • Progressive Passes: Passes that move the ball significantly closer to the opposition goal, typically defined as passes advancing the ball at least 10 yards toward goal or into the penalty area.
  • Build-Up Phases: Sequences of possession categorized by starting zone (defensive third, middle third) and ending in attacking third or shot attempt.
  • Pass Sequence Length: The number of consecutive passes before a possession ends, indicating patience and control in build-up.
  • Progressive Carries: Ball-carrying actions that advance the ball toward the opposition goal by at least 5 yards.
  • Tempo Metrics: Time between passes, possession duration, and speed of ball movement through different zones.
  • Positional Rotations: Player movements and position interchanges during build-up phases that create passing angles.
  • Build-Up Patterns: Recurring passing shapes such as back-to-front vertical passes, triangular combinations, or wide circulation.
  • Press Resistance: Ability to maintain possession and progress the ball when facing defensive pressure.
  • Direct vs. Possession Build-Up: Classification of build-up style based on pass count, field position gains, and time to reach attacking third.

Mathematical Foundation

Progressive Pass Distance (PPD):

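PPD = √[(x₁ - goal_x)² + (y₁ - goal_y)²] - √[(x₂ - goal_x)² + (y₂ - goal_y)²]

Where (x₁, y₁) is the pass start, (x₂, y₂) is the pass end, and a progressive pass has PPD ≥ 10 yards (9.144 meters) toward goal. Positive values mean the ball finished closer to goal. Apply the threshold in the same units as the coordinates.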
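
As a quick check of the sign convention, the sketch below computes PPD for a single pass as distance to goal before the pass minus distance to goal after it. It assumes StatsBomb-style coordinates (a 120 × 80 pitch treated as yards, goal centre at (120, 40)); the function name `progressive_pass_distance` and the sample coordinates are made up for illustration.

```python
import math

GOAL_X, GOAL_Y = 120.0, 40.0  # assumed goal centre on a 120 x 80 pitch


def progressive_pass_distance(start, end):
    """Distance to goal before the pass minus distance to goal after it."""
    dist_before = math.hypot(start[0] - GOAL_X, start[1] - GOAL_Y)
    dist_after = math.hypot(end[0] - GOAL_X, end[1] - GOAL_Y)
    return dist_before - dist_after


# Hypothetical pass straight up the middle, from (60, 40) to (80, 40)
ppd = progressive_pass_distance((60, 40), (80, 40))
is_progressive = ppd >= 10  # 10-yard threshold, same units as the coordinates

# A backward pass yields a negative PPD
backward_ppd = progressive_pass_distance((80, 40), (60, 40))

print(ppd, is_progressive, backward_ppd)
```

Keeping the threshold in the same units as the coordinates avoids mixing yards and meters in one comparison.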

Build-Up Completion Rate:

Build-Up % = (Sequences reaching attacking third / Total build-up sequences) × 100

Possession Value Added (PVA):

PVA = Possession Value(end) - Possession Value(start)

Where possession value is based on pitch location and likelihood of scoring
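
To make the PVA definition concrete, here is a minimal sketch that looks up a possession value by pitch zone and subtracts start from end. The zone values in `ZONE_VALUES` are invented placeholders, not fitted to any data; in practice a possession value model would be estimated from historical events. The function names are also assumptions for illustration.

```python
# Hypothetical possession values for six equal bands along the pitch length,
# ordered from own goal to opposition goal. Placeholder numbers only.
ZONE_VALUES = [0.01, 0.02, 0.04, 0.08, 0.15, 0.30]


def possession_value(x, pitch_length=120.0):
    """Look up the value of the band that contains x."""
    band = int(x / pitch_length * len(ZONE_VALUES))
    band = max(0, min(band, len(ZONE_VALUES) - 1))  # clamp to valid bands
    return ZONE_VALUES[band]


def possession_value_added(start_x, end_x):
    """PVA = Possession Value(end) - Possession Value(start)."""
    return possession_value(end_x) - possession_value(start_x)


# Hypothetical action moving the ball from x=35 to x=85
pva = possession_value_added(35, 85)
print(round(pva, 2))
```

A real model would typically use a two-dimensional grid and account for more than location, but the subtraction step stays the same.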

Build-Up Speed:

Build-Up Speed = Distance progressed (yards) / Time elapsed (seconds)
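
A minimal sketch of the speed calculation, assuming event timestamps are strings in `HH:MM:SS.mmm` form and that both events fall in the same period (timestamps restart each period). The helper names `to_seconds` and `build_up_speed` and the sample values are illustrative assumptions.

```python
def to_seconds(timestamp):
    """Convert an 'HH:MM:SS.mmm' string to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def build_up_speed(start_x, end_x, start_time, end_time):
    """Yards progressed per second; 0.0 when no time has elapsed."""
    elapsed = to_seconds(end_time) - to_seconds(start_time)
    if elapsed <= 0:
        return 0.0
    return (end_x - start_x) / elapsed


# Hypothetical sequence: 54 yards gained in 12 seconds
speed = build_up_speed(30.0, 84.0, "00:12:05.500", "00:12:17.500")
print(speed)
```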

Passing Network Centrality:

Betweenness Centrality(v) = Σ over player pairs (s, t) of (shortest paths from s to t through player v / total shortest paths from s to t)

Identifies key players in build-up structure
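
The definition can be checked by hand on a toy network. The sketch below is a brute-force, unweighted, unnormalized version written with the standard library only; graph libraries such as networkx or igraph offer weighted and normalized variants. The player labels and pass links are hypothetical.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum, over ordered pairs (s, t), of the share of shortest paths passing through each node."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    info = {n: shortest_path_counts(adj, n) for n in nodes}
    scores = {n: 0.0 for n in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in dist_s:
                if v == s or v == t:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path when the two legs add up exactly
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return scores


# Hypothetical directed pass links: passer -> list of recipients
passes = {"CB1": ["CB2", "DM"], "CB2": ["DM"], "DM": ["LW", "RW"]}
scores = betweenness(passes)
print(max(scores, key=scores.get))
```

In this toy network every route from the centre backs to the wingers runs through the defensive midfielder, so that player collects all of the betweenness.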

Progressive Passing Ratio:

PPR = Progressive Passes / Total Passes

Field Tilt (Build-Up Penetration):

Field Tilt = (Average team x-coordinate / Total field length) × 100

Sequence Complexity:

Complexity Index = (Unique players involved × Pass count) / Time duration
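
A short worked example of the complexity index, assuming a sequence is represented as the ordered list of passers plus a duration in seconds. The function name and the sample sequence are hypothetical.

```python
def sequence_complexity(passers, duration_seconds):
    """(Unique players involved x pass count) / duration; 0.0 for non-positive durations."""
    if duration_seconds <= 0:
        return 0.0
    unique_players = len(set(passers))
    pass_count = len(passers)
    return unique_players * pass_count / duration_seconds


# Hypothetical six-pass sequence involving four different players over 18 seconds
passers = ["CB", "DM", "CB", "FB", "AM", "DM"]
complexity = sequence_complexity(passers, 18.0)
print(round(complexity, 3))
```

Because the index divides by time, a quick exchange among many players scores higher than the same exchange played slowly.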

Python Implementation


import pandas as pd
import numpy as np
from statsbombpy import sb
from mplsoccer import Pitch
import matplotlib.pyplot as plt
import networkx as nx

# Load event data
matches = sb.matches(competition_id=2, season_id=44)
# Take a match from the season loaded above so the team name used later is present
match_id = matches['match_id'].iloc[0]
events = sb.events(match_id=match_id)

# Identify progressive passes
def calculate_progressive_passes(events_df):
    """Identify and calculate progressive passes"""
    passes = events_df[events_df['type'] == 'Pass'].copy()

    def is_progressive(row):
        if not isinstance(row.get('location'), list) or not isinstance(row.get('pass_end_location'), list):
            return False

        start_x, start_y = row['location'][0], row['location'][1]
        end_x, end_y = row['pass_end_location'][0], row['pass_end_location'][1]

        # Distance from goal before and after pass
        goal_x, goal_y = 120, 40
        dist_before = np.sqrt((start_x - goal_x)**2 + (start_y - goal_y)**2)
        dist_after = np.sqrt((end_x - goal_x)**2 + (end_y - goal_y)**2)

        # Progressive if moves ball at least 10 yards closer to goal
        progress = dist_before - dist_after
        return progress >= 10  # StatsBomb coordinates are already in yards

    passes['is_progressive'] = passes.apply(is_progressive, axis=1)

    progressive_stats = {
        'total_passes': len(passes),
        'progressive_passes': passes['is_progressive'].sum(),
        'progressive_rate': (passes['is_progressive'].sum() / len(passes) * 100) if len(passes) > 0 else 0
    }

    return passes, progressive_stats

# Analyze build-up sequences
def analyze_buildup_sequences(events_df, team_name):
    """Extract and analyze build-up sequences"""
    team_events = events_df[events_df['team'] == team_name].copy()
    # Timestamps restart each period, so order by period first
    team_events = team_events.sort_values(['period', 'timestamp'])

    sequences = []
    current_sequence = []

    def event_end_x(event):
        """x-coordinate where an event ends (pass/carry end point, else its location)"""
        for key in ('pass_end_location', 'carry_end_location', 'location'):
            loc = event.get(key)
            if isinstance(loc, list):
                return loc[0]
        return 0

    def close_sequence():
        """Store the current sequence if it is long enough, then reset it"""
        if len(current_sequence) >= 3:  # Minimum sequence length
            first_loc = current_sequence[0].get('location')
            sequences.append({
                'sequence_id': len(sequences),
                'events': current_sequence.copy(),
                'passes': len([e for e in current_sequence if e['type'] == 'Pass']),
                'start_x': first_loc[0] if isinstance(first_loc, list) else 0,
                'end_x': event_end_x(current_sequence[-1])
            })
        current_sequence.clear()

    for idx, event in team_events.iterrows():
        # Extend the sequence while on-ball actions start in the defensive/middle third
        if event['type'] in ['Pass', 'Carry', 'Dribble']:
            if isinstance(event.get('location'), list) and event['location'][0] < 80:
                current_sequence.append(event)
            else:
                close_sequence()

        # Break sequence on defensive action or turnover
        if (event['type'] in ['Interception', 'Tackle', 'Clearance'] or
                (event['type'] == 'Pass' and not pd.isna(event.get('pass_outcome')))):
            close_sequence()

    close_sequence()  # Flush a sequence still open after the last event
    return sequences

# Calculate build-up metrics
def calculate_buildup_metrics(sequences):
    """Calculate comprehensive build-up metrics from sequences"""
    if not sequences:
        return {}

    df_sequences = pd.DataFrame(sequences)
    df_sequences['progression'] = df_sequences['end_x'] - df_sequences['start_x']
    df_sequences['reached_attacking_third'] = df_sequences['end_x'] >= 80

    metrics = {
        'total_sequences': len(sequences),
        'avg_passes_per_sequence': df_sequences['passes'].mean(),
        'avg_progression': df_sequences['progression'].mean(),
        'sequences_to_final_third': df_sequences['reached_attacking_third'].sum(),
        'buildup_success_rate': (df_sequences['reached_attacking_third'].sum() / len(sequences) * 100) if len(sequences) > 0 else 0
    }

    return metrics

# Create passing network for build-up
def create_buildup_passing_network(events_df, team_name):
    """Generate passing network focusing on build-up play"""
    team_passes = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'] == 'Pass') &
        (events_df['pass_outcome'].isna())
    ].copy()

    # Filter for build-up passes (starting in defensive 2/3)
    buildup_passes = team_passes[
        team_passes['location'].apply(lambda x: x[0] if isinstance(x, list) else 0) < 80
    ]

    # Create network graph
    G = nx.DiGraph()

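    for _, pass_event in buildup_passes.iterrows():
        passer = pass_event['player']
        receiver = pass_event.get('pass_recipient')

        # Skip passes without a recorded recipient (missing values are NaN)
        if pd.notna(receiver):
            if G.has_edge(passer, receiver):
                G[passer][receiver]['weight'] += 1
            else:
                G.add_edge(passer, receiver, weight=1)

    # networkx treats edge weights as distances in shortest-path measures,
    # so invert pass counts: frequent connections become "short" edges
    for _, _, data in G.edges(data=True):
        data['distance'] = 1 / data['weight']

    # Calculate centrality measures
    if len(G.nodes()) > 0:
        betweenness = nx.betweenness_centrality(G, weight='distance')
        degree_centrality = nx.degree_centrality(G)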

        network_stats = pd.DataFrame({
            'player': list(G.nodes()),
            'betweenness': [betweenness.get(n, 0) for n in G.nodes()],
            'degree_centrality': [degree_centrality.get(n, 0) for n in G.nodes()]
        }).sort_values('betweenness', ascending=False)

        return G, network_stats
    return None, None

# Visualize build-up patterns
def visualize_buildup_flow(events_df, team_name):
    """Create flow map of build-up patterns"""
    pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b', line_color='white')
    fig, ax = pitch.draw(figsize=(14, 10))

    team_passes = events_df[
        (events_df['team'] == team_name) &
        (events_df['type'] == 'Pass') &
        (events_df['pass_outcome'].isna())
    ].copy()

    # Filter build-up passes
    buildup_passes = team_passes[
        team_passes['location'].apply(lambda x: x[0] if isinstance(x, list) else 0) < 80
    ]

    # Plot passes
    for _, pass_row in buildup_passes.iterrows():
        if isinstance(pass_row.get('location'), list) and isinstance(pass_row.get('pass_end_location'), list):
            x_start, y_start = pass_row['location'][0], pass_row['location'][1]
            x_end, y_end = pass_row['pass_end_location'][0], pass_row['pass_end_location'][1]

            pitch.arrows(x_start, y_start, x_end, y_end, ax=ax,
                        width=1, headwidth=3, headlength=3,
                        color='#00d9ff', alpha=0.3)

    fig.set_facecolor('#22312b')  # Match the pitch colour so the white title stays readable
    ax.set_title(f'{team_name} Build-Up Flow Map', fontsize=16, color='white', pad=20)
    plt.tight_layout()
    return fig

# Progressive carrying analysis
def analyze_progressive_carries(events_df):
    """Analyze ball carries that progress play"""
    carries = events_df[events_df['type'] == 'Carry'].copy()

    def is_progressive_carry(row):
        if not isinstance(row.get('location'), list) or not isinstance(row.get('carry_end_location'), list):
            return False

        start_x = row['location'][0]
        end_x = row['carry_end_location'][0]

        # Progressive if moves at least 5 yards forward
        return (end_x - start_x) >= 5  # StatsBomb coordinates are already in yards

    carries['is_progressive'] = carries.apply(is_progressive_carry, axis=1)

    progressive_carries = carries[carries['is_progressive']].groupby('player').agg({
        'id': 'count',
        'carry_end_location': lambda x: np.mean([loc[0] for loc in x if isinstance(loc, list)])
    }).rename(columns={'id': 'progressive_carries', 'carry_end_location': 'avg_end_x'})

    return progressive_carries.sort_values('progressive_carries', ascending=False)

# Build-up tempo analysis
def analyze_buildup_tempo(sequences):
    """Analyze the tempo and speed of build-up sequences"""
    tempo_data = []

    for seq in sequences:
        events = seq['events']
        timestamps = [e['timestamp'] for e in events if pd.notna(e.get('timestamp'))]
        if len(timestamps) >= 2:
            # Elapsed time between the first and last event of the sequence
            # (timestamps look like '00:12:34.567' and restart each period)
            start = pd.to_timedelta(str(timestamps[0]))
            end = pd.to_timedelta(str(timestamps[-1]))
            duration = (end - start).total_seconds()

            tempo_data.append({
                'sequence_id': seq['sequence_id'],
                'passes': seq['passes'],
                'duration_seconds': duration,
                'tempo': seq['passes'] / duration if duration > 0 else 0,
                'progression': seq['end_x'] - seq['start_x']
            })

    return pd.DataFrame(tempo_data)

# Example execution
progressive_passes, prog_stats = calculate_progressive_passes(events)
print(f"Progressive passing: {prog_stats['progressive_rate']:.2f}%")

sequences = analyze_buildup_sequences(events, 'Arsenal')
buildup_metrics = calculate_buildup_metrics(sequences)
print(f"\nBuild-up metrics: {buildup_metrics}")

G, network_stats = create_buildup_passing_network(events, 'Arsenal')
if network_stats is not None:
    print("\nTop build-up players by centrality:")
    print(network_stats.head())

progressive_carries_stats = analyze_progressive_carries(events)
print("\nProgressive carries leaders:")
print(progressive_carries_stats.head())

R Implementation


library(tidyverse)
library(StatsBombR)
library(ggsoccer)
library(igraph)
library(ggraph)

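# Load match data
competitions <- FreeCompetitions()
# Same competition and season IDs as the Python example
matches <- FreeMatches(competitions %>% filter(competition_id == 2, season_id == 44))
# get.matchFree() expects a row of the matches table; allclean() adds the
# location.x / location.y and end-location columns used below
events <- get.matchFree(matches[1, ]) %>% allclean()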

# Calculate progressive passes
calculate_progressive_passes <- function(events_data) {
  passes <- events_data %>%
    filter(type.name == "Pass")

  passes <- passes %>%
    mutate(
      # Distance to goal before pass
      dist_to_goal_before = sqrt((120 - location.x)^2 + (40 - location.y)^2),
      # Distance to goal after pass
      dist_to_goal_after = sqrt((120 - pass.end_location.x)^2 + (40 - pass.end_location.y)^2),
      # Progression distance
      progression = dist_to_goal_before - dist_to_goal_after,
      # Progressive if moves ball 10+ yards toward goal (coordinates are in yards)
      is_progressive = progression >= 10
    )

  list(
    passes_data = passes,
    stats = passes %>%
      summarise(
        total_passes = n(),
        progressive_passes = sum(is_progressive, na.rm = TRUE),
        progressive_rate = mean(is_progressive, na.rm = TRUE) * 100
      )
  )
}

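# Analyze build-up sequences
analyze_buildup_sequences <- function(events_data, team_name) {
  team_events <- events_data %>%
    filter(team.name == team_name) %>%
    arrange(period, timestamp)  # timestamps restart each period

  # Identify possession sequences starting in defensive/middle third
  sequences <- team_events %>%
    mutate(
      # Flag sequence-ending events before filtering to on-ball actions;
      # after the filter, interceptions and tackles would no longer be present
      ends_sequence = type.name %in% c("Interception", "Tackle") |
        !is.na(pass.outcome.name),
      # The ending event closes its own group; the next event opens a new one
      possession_group = cumsum(lag(ends_sequence, default = FALSE))
    ) %>%
    filter(type.name %in% c("Pass", "Carry", "Dribble")) %>%
    group_by(possession_group) %>%
    filter(
      n() >= 3,  # Minimum sequence length
      min(location.x, na.rm = TRUE) < 80  # Starts in defensive 2/3
    ) %>%
    summarise(
      sequence_id = cur_group_id(),
      passes = sum(type.name == "Pass"),
      start_x = first(location.x),
      end_x = last(coalesce(pass.end_location.x, carry.end_location.x, location.x)),
      progression = end_x - start_x,
      reached_final_third = end_x >= 80,
      # Elapsed match clock between first and last event (1-second resolution)
      duration_seconds = last(minute * 60 + second) - first(minute * 60 + second),
      .groups = "drop"
    )

  return(sequences)
}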

# Calculate build-up metrics
calculate_buildup_metrics <- function(sequences_data) {
  sequences_data %>%
    summarise(
      total_sequences = n(),
      avg_passes_per_sequence = mean(passes, na.rm = TRUE),
      avg_progression = mean(progression, na.rm = TRUE),
      sequences_to_final_third = sum(reached_final_third, na.rm = TRUE),
      buildup_success_rate = mean(reached_final_third, na.rm = TRUE) * 100,
      max_sequence_length = max(passes, na.rm = TRUE)
    )
}

# Create passing network for build-up
create_buildup_network <- function(events_data, team_name) {
  buildup_passes <- events_data %>%
    filter(
      team.name == team_name,
      type.name == "Pass",
      is.na(pass.outcome.name),
      location.x < 80  # Build-up zone
    ) %>%
    select(player.name, pass.recipient.name) %>%
    filter(!is.na(pass.recipient.name))

  # Create edge list with weights
  edges <- buildup_passes %>%
    group_by(from = player.name, to = pass.recipient.name) %>%
    summarise(weight = n(), .groups = "drop")

  # Create graph
  g <- graph_from_data_frame(edges, directed = TRUE)

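  # Calculate centrality measures
  # igraph treats weights as distances in shortest-path measures,
  # so invert pass counts: frequent connections become "short" edges
  edge_distance <- 1 / E(g)$weight
  node_metrics <- tibble(
    player = V(g)$name,
    betweenness = betweenness(g, weights = edge_distance),
    degree = degree(g, mode = "all"),
    closeness = closeness(g, weights = edge_distance)
  ) %>%
    arrange(desc(betweenness))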

  list(graph = g, metrics = node_metrics, edges = edges)
}

# Visualize build-up flow
plot_buildup_flow <- function(events_data, team_name) {
  buildup_passes <- events_data %>%
    filter(
      team.name == team_name,
      type.name == "Pass",
      is.na(pass.outcome.name),
      location.x < 80
    )

  ggplot(buildup_passes) +
    annotate_pitch(dimensions = pitch_statsbomb) +
    theme_pitch() +
    geom_segment(
      aes(x = location.x, y = location.y,
          xend = pass.end_location.x, yend = pass.end_location.y),
      arrow = arrow(length = unit(0.15, "cm"), type = "closed"),
      color = "#00d9ff",
      alpha = 0.4,
      linewidth = 0.5
    ) +
    labs(
      title = paste(team_name, "Build-Up Flow Map"),
      subtitle = "Passes originating from defensive/middle third"
    ) +
    theme(
      plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
      plot.subtitle = element_text(hjust = 0.5, size = 11)
    )
}

# Plot passing network
plot_buildup_network <- function(network_data) {
  ggraph(network_data$graph, layout = "fr") +
    geom_edge_link(
      aes(width = weight, alpha = weight),
      arrow = arrow(length = unit(3, "mm"), type = "closed"),
      end_cap = circle(3, "mm"),
      color = "#00d9ff"
    ) +
    geom_node_point(aes(size = betweenness(network_data$graph)),
                   color = "#ff6b6b", alpha = 0.8) +
    geom_node_text(aes(label = name), repel = TRUE, size = 3) +
    scale_edge_width(range = c(0.5, 3)) +
    scale_edge_alpha(range = c(0.3, 0.8)) +
    scale_size_continuous(range = c(3, 10)) +
    theme_graph() +
    labs(title = "Build-Up Passing Network",
         subtitle = "Node size = betweenness centrality, Edge width = pass frequency")
}

# Analyze progressive carries
analyze_progressive_carries <- function(events_data) {
  events_data %>%
    filter(type.name == "Carry") %>%
    mutate(
      carry_distance_forward = carry.end_location.x - location.x,
      is_progressive = carry_distance_forward >= 5  # 5 yards (coordinates are in yards)
    ) %>%
    filter(is_progressive) %>%
    group_by(player.name, team.name) %>%
    summarise(
      progressive_carries = n(),
      avg_carry_distance = mean(carry_distance_forward, na.rm = TRUE),
      total_progression = sum(carry_distance_forward, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(progressive_carries))
}

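# Build-up tempo analysis
analyze_buildup_tempo <- function(sequences_data) {
  # A duration estimated as passes * 2.5 makes tempo a constant 0.4 passes/s,
  # so use a measured duration_seconds column when the sequence table has one
  if (!"duration_seconds" %in% names(sequences_data)) {
    sequences_data <- sequences_data %>%
      mutate(duration_seconds = passes * 2.5)  # Rough fallback estimate
  }

  sequences_data %>%
    filter(duration_seconds > 0) %>%
    mutate(
      tempo = passes / duration_seconds,      # passes per second
      speed = progression / duration_seconds  # yards per second
    ) %>%
    summarise(
      avg_tempo = mean(tempo, na.rm = TRUE),
      avg_speed = mean(speed, na.rm = TRUE),
      fast_buildups = sum(tempo > 0.5, na.rm = TRUE),
      slow_buildups = sum(tempo <= 0.3, na.rm = TRUE)
    )
}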

# Player build-up contribution
player_buildup_contribution <- function(events_data, team_name) {
  events_data %>%
    filter(
      team.name == team_name,
      type.name == "Pass",
      location.x < 80
    ) %>%
    group_by(player.name) %>%
    summarise(
      buildup_passes = n(),
      completed = sum(is.na(pass.outcome.name)),
      completion_rate = mean(is.na(pass.outcome.name)) * 100,
      progressive = sum(
        sqrt((120 - location.x)^2 + (40 - location.y)^2) -
        sqrt((120 - pass.end_location.x)^2 + (40 - pass.end_location.y)^2) >= 10,
        na.rm = TRUE
      ),
      progressive_rate = progressive / buildup_passes * 100,
      .groups = "drop"
    ) %>%
    filter(buildup_passes >= 10) %>%
    arrange(desc(progressive))
}

# Execute analysis
prog_results <- calculate_progressive_passes(events)
print(prog_results$stats)

sequences <- analyze_buildup_sequences(events, "Arsenal")
buildup_metrics <- calculate_buildup_metrics(sequences)
print(buildup_metrics)

network_data <- create_buildup_network(events, "Arsenal")
print(network_data$metrics)

# Visualizations
buildup_flow_plot <- plot_buildup_flow(events, "Arsenal")
print(buildup_flow_plot)

network_plot <- plot_buildup_network(network_data)
print(network_plot)

# Additional analyses
carries_analysis <- analyze_progressive_carries(events)
print(carries_analysis)

tempo_analysis <- analyze_buildup_tempo(sequences)
print(tempo_analysis)

player_contribution <- player_buildup_contribution(events, "Arsenal")
print(player_contribution)

Practical Applications

Tactical Planning and Opposition Analysis: Build-up analysis reveals how teams structure their possession and identifies the patterns opponents use to progress the ball. Coaches can prepare pressing schemes aimed at disrupting specific build-up routes. For example, if analysis shows an opponent builds primarily through their left-sided center back and left fullback, the pressing team can overload that side to force turnovers.
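
One way to surface such a bias is to split build-up pass origins into lateral channels and compare their shares. The sketch below assumes StatsBomb-style coordinates in which the team attacks toward x = 120 and y = 0 is its left touchline; verify this orientation against your data provider. The channel boundaries (equal thirds of the width), function names, and sample passes are assumptions for illustration.

```python
from collections import Counter


def channel(y, pitch_width=80.0):
    """Assign a pass origin to a lateral channel (assumes y=0 is the attacking team's left touchline)."""
    if y < pitch_width / 3:
        return "left"
    if y < 2 * pitch_width / 3:
        return "centre"
    return "right"


def channel_shares(pass_starts, buildup_limit_x=80.0):
    """Share of build-up passes (starting before the attacking third) per channel."""
    counts = Counter(channel(y) for x, y in pass_starts if x < buildup_limit_x)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0)
            for name in ("left", "centre", "right")}


# Hypothetical pass start locations (x, y); the last one starts in the attacking third
pass_starts = [(20, 10), (25, 15), (35, 12), (40, 40), (30, 70), (95, 5)]
shares = channel_shares(pass_starts)
print(shares)
```

A clearly dominant channel is a candidate for a pressing overload, although a single match is a small sample.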

Player Recruitment and Role Definition: Progressive passing and carrying metrics identify players capable of advancing play from deep positions. Center backs with high progressive pass numbers are valued in possession-based systems, while midfielders with strong progressive carrying abilities suit teams wanting to break lines through dribbling. Build-up contribution metrics help determine whether players fit tactical systems.

Training and Development: Build-up metrics inform training exercises designed to improve specific aspects of ball progression. Teams struggling with build-up completion can design possession drills focused on playing through pressing, while teams with low progression rates might work on penetrative passing combinations.

In-Game Adjustments: Live tracking of build-up success rates helps coaches identify when tactical adjustments are needed. If build-up sequences repeatedly break down in specific zones, teams might switch to more direct play or adjust positioning to create better passing angles.

Playing Style Optimization: Comparing build-up metrics across different tactical approaches helps teams refine their playing philosophy. Analysis might reveal that shorter passing sequences generate better attacking positions than longer sequences, leading to strategic adjustments in tempo and directness.
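
A comparison of this kind can be sketched by bucketing sequences by pass count and computing how often each bucket reaches the attacking third. The input mirrors the `passes` and `end_x` fields of the sequence dictionaries built in the Python implementation; the cut-off of four passes, the function name, and the sample sequences are hypothetical.

```python
def success_rate_by_length(sequences, short_max=4, attacking_third_x=80):
    """Percentage of sequences reaching the attacking third, split by pass count."""
    buckets = {"short": [0, 0], "long": [0, 0]}  # [reached, total]
    for seq in sequences:
        key = "short" if seq["passes"] <= short_max else "long"
        buckets[key][1] += 1
        if seq["end_x"] >= attacking_third_x:
            buckets[key][0] += 1
    return {key: (reached / total * 100 if total else 0.0)
            for key, (reached, total) in buckets.items()}


# Hypothetical sequences
sequences = [
    {"passes": 3, "end_x": 90},
    {"passes": 4, "end_x": 60},
    {"passes": 3, "end_x": 82},
    {"passes": 7, "end_x": 85},
    {"passes": 9, "end_x": 50},
    {"passes": 8, "end_x": 40},
]
rates = success_rate_by_length(sequences)
print(rates)
```

With real data, the cut-off and the sample size per bucket deserve care before drawing conclusions about tempo or directness.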

Key Takeaways

  • Progressive passes and carries are better indicators of build-up contribution than total passes because they measure meaningful ball advancement
  • Build-up success should be measured not just by completion rates but by ability to reach dangerous attacking positions
  • Passing network analysis reveals structural patterns and identifies key players who connect different phases of play
  • Build-up tempo varies significantly between teams and situations; faster tempo doesn't necessarily correlate with better outcomes
  • Modern analytics increasingly emphasize "possession value added" which quantifies how actions improve scoring probability rather than just field position
  • Effective build-up play requires balance between patience (maintaining possession) and penetration (advancing toward goal)
  • Teams must adapt build-up strategies based on opposition pressing intensity and defensive structure
  • Player roles in build-up are revealed through centrality metrics, showing who acts as primary ball progressors versus connectors
