

Learning Objectives

  • Understand graph theory fundamentals as applied to passing networks
  • Construct passing networks from event data using Python and NetworkX
  • Calculate and interpret centrality metrics for player importance
  • Visualize passing networks effectively on pitch diagrams
  • Analyze network-level properties to characterize team playing style
  • Apply temporal network analysis to understand tactical changes
  • Use network fingerprints for team style classification and scouting

Chapter 10: Passing Networks and Analysis

Introduction

Passing networks represent one of the most powerful tools in modern soccer analytics, transforming the abstract concept of team play into quantifiable, visual structures. While traditional statistics focus on individual actions—a player's pass completion rate or a team's possession percentage—passing networks capture the relational dynamics that define how teams actually function on the pitch.

At its core, a passing network treats players as nodes and passes between them as edges, creating a graph structure that can be analyzed using the rich mathematical framework of network theory. This approach reveals patterns invisible to conventional statistics: which players serve as the team's creative hub, how isolated certain positions become during matches, whether a team relies on balanced distribution or funnels play through specific channels, and how tactical adjustments reshape team structure throughout a game.

The power of passing networks lies in their dual nature. They are simultaneously intuitive visualizations that coaches and analysts can interpret at a glance, and rigorous mathematical objects amenable to sophisticated quantitative analysis. A network diagram showing thick edges converging on a central midfielder immediately communicates that player's importance; the same network analyzed through centrality algorithms provides precise numerical measures of that importance.

The historical roots of passing network analysis in soccer trace back to the mid-2000s, when researchers began borrowing techniques from social network analysis and applying them to match event data. Early work by Duch, Waitzman, and Amaral (2010) demonstrated that network centrality could predict tournament performance. Since then, the field has expanded dramatically. Today, professional clubs routinely use passing network analysis in their match preparation, recruitment, and post-match review processes. The mathematical tools have matured alongside the data: the explosion of high-quality event data from providers like StatsBomb, Opta, and Wyscout has made network construction more reliable and granular than ever before.

This chapter develops your understanding of passing networks from foundational concepts through advanced applications. We begin with the mathematical foundations of graph theory, ensuring you understand the structures we're building. We then progress through the practical construction of networks from event data, explore the key metrics used to analyze them, examine visualization techniques, and conclude with advanced topics including temporal analysis, community detection, defensive disruption analysis, and player role identification. Throughout, we maintain focus on practical implementation, providing Python code that you can apply directly to real match data.

Intuition: Think of a passing network as a map of relationships. Just as a social network shows who talks to whom in a community, a passing network shows who passes to whom on a team. The thickness of each connection tells you how strong that relationship is, and the position of each person in the web reveals their role in the group's communication structure.

10.1 Foundations of Network Analysis

10.1.1 Graph Theory Basics

A graph $G = (V, E)$ consists of a set of vertices (or nodes) $V$ and a set of edges $E$ connecting pairs of vertices. In passing networks:

  • Nodes represent players (or positions)
  • Edges represent passes between players
  • Edge weights typically represent the number of passes

Graphs can be directed or undirected:

$$\text{Undirected edge: } \{u, v\} \in E$$

$$\text{Directed edge: } (u, v) \in E$$

In soccer, passing networks are inherently directed—a pass from Player A to Player B is distinct from a pass from Player B to Player A. However, for some analyses (particularly visualization), we may aggregate into undirected networks.

The adjacency matrix $A$ provides a mathematical representation of the graph:

$$A_{ij} = \begin{cases} w_{ij} & \text{if edge } (i,j) \text{ exists with weight } w_{ij} \\ 0 & \text{otherwise} \end{cases}$$

For a team of 11 players, this creates an 11x11 matrix where each entry represents the number of passes from player $i$ to player $j$.

Common Pitfall: Beginners often confuse the adjacency matrix of a directed graph with a symmetric matrix. In a directed passing network, $A_{ij} \neq A_{ji}$ in general. The matrix is only symmetric when the network is undirected -- that is, when you aggregate passes in both directions into a single edge weight. Always check whether you are working with the directed or undirected representation before performing matrix operations.
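To see the asymmetry concretely, here is a minimal sketch with a hypothetical three-player network (player names and pass counts are illustrative). `nx.to_numpy_array` builds $A$ in a fixed player order, and adding $A$ to its transpose gives the symmetric, undirected aggregate.

```python
import networkx as nx
import numpy as np

# Hypothetical directed passing network
G = nx.DiGraph()
G.add_edge("CB", "CM", weight=30)   # forward passes
G.add_edge("CM", "CB", weight=10)   # backward passes
G.add_edge("CM", "ST", weight=12)

order = ["CB", "CM", "ST"]
# A[i, j] = passes from order[i] to order[j]; missing edges become 0
A = nx.to_numpy_array(G, nodelist=order, weight="weight")

print(np.array_equal(A, A.T))       # False: the directed matrix is asymmetric
A_undirected = A + A.T              # aggregate both directions into one weight
print(A_undirected[0, 1])           # 40.0 (30 + 10)
print(np.array_equal(A_undirected, A_undirected.T))  # True
```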

10.1.2 Directed vs. Undirected Representations

The choice between directed and undirected representations has significant analytical consequences. Directed networks preserve the full information about passing asymmetries: a center-back may pass forward to a midfielder 30 times, while the midfielder passes backward only 10 times. Collapsing this into an undirected edge with weight 40 loses important tactical information about the team's preferred direction of play.

However, undirected representations are sometimes preferred for computational simplicity and for certain metrics (like the clustering coefficient) that are more naturally defined on undirected graphs. A common compromise is to maintain the directed representation internally and convert to undirected only when a specific algorithm requires it.

For visualization purposes, showing both directions can clutter diagrams. Some analysts use net flow (the difference between passes in each direction) to simplify visual displays while still capturing directionality. Others display only the dominant direction of each pair, drawing an arrow from A to B if A passes to B more often than B passes to A.
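Both simplifications can be sketched in a few lines. The function names are illustrative and not part of NetworkX. One detail worth knowing: NetworkX's own `G.to_undirected()` does not add the weights of reciprocal edges (the new edge takes its attributes from one of the two directions), so summing has to be done explicitly.

```python
import networkx as nx

def to_undirected_summed(G):
    """Collapse a directed passing network into an undirected one by
    summing the passes made in both directions between each pair."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    for u, v, w in G.edges(data="weight", default=0):
        if H.has_edge(u, v):
            H[u][v]["weight"] += w
        else:
            H.add_edge(u, v, weight=w)
    return H

def dominant_direction(G):
    """Keep one arrow per pair, pointing in the dominant passing direction
    and weighted by net flow (passes one way minus passes the other)."""
    D = nx.DiGraph()
    D.add_nodes_from(G.nodes())
    for u, v, w_uv in G.edges(data="weight", default=0):
        w_vu = G[v][u]["weight"] if G.has_edge(v, u) else 0
        if w_uv > w_vu:          # exact ties have no dominant direction
            D.add_edge(u, v, weight=w_uv - w_vu)
    return D

G = nx.DiGraph()
G.add_weighted_edges_from([("CB", "CM", 30), ("CM", "CB", 10), ("CM", "ST", 12)])
print(to_undirected_summed(G)["CB"]["CM"]["weight"])  # 40
print(list(dominant_direction(G).edges(data="weight")))
```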

10.1.3 Network Properties

Several fundamental properties characterize networks:

Density measures how connected the network is relative to the maximum possible connections:

$$D = \frac{|E|}{|V|(|V|-1)}$$

For directed graphs with $n$ nodes, the maximum possible edges is $n(n-1)$. A passing network with high density suggests the team connects all areas of the pitch; low density might indicate reliance on specific passing routes.
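A quick check of the density formula on a hypothetical four-player network. `nx.density` uses $n(n-1)$ as the denominator for directed graphs, counts distinct links and ignores weights, so its value depends heavily on any minimum-pass threshold applied beforehand (see Section 10.2.2).

```python
import networkx as nx

# Hypothetical network with 5 distinct directed passing links
G = nx.DiGraph()
G.add_edges_from([("GK", "CB"), ("CB", "CM"), ("CM", "CB"), ("CM", "ST"), ("ST", "CM")])

n, m = G.number_of_nodes(), G.number_of_edges()
print(m / (n * (n - 1)))   # 5 / 12, about 0.417
print(nx.density(G))       # same value, computed by NetworkX
```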

Average degree measures the typical number of connections per node:

$$\bar{k} = \frac{1}{n}\sum_{i=1}^{n} k_i$$

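where $k_i$ is the degree of node $i$. In passing networks, we distinguish:

  • Out-degree: the number of distinct teammates a player passes to (in the weighted version, often called out-strength, the total number of passes made)
  • In-degree: the number of distinct teammates a player receives from (in the weighted version, the total number of passes received)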
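A small sketch on a hypothetical network shows both flavors in NetworkX. Because every directed edge has exactly one passer and one receiver, the average out-degree and average in-degree are always equal, both $|E|/n$.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("CB", "CM", 30), ("CM", "CB", 10), ("CM", "ST", 12)])
n = G.number_of_nodes()

# Unweighted degrees: distinct teammates passed to / received from
avg_out = sum(k for _, k in G.out_degree()) / n
avg_in = sum(k for _, k in G.in_degree()) / n

# Weighted degrees (strength): total passes made / received
out_strength = dict(G.out_degree(weight="weight"))
in_strength = dict(G.in_degree(weight="weight"))

print(avg_out, avg_in)                        # 1.0 1.0
print(out_strength["CM"], in_strength["CM"])  # 22 30
```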

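Reciprocity measures the proportion of edges that have a corresponding reverse edge. It is computed on the binary adjacency matrix ($A_{ij} = 1$ if player $i$ passed to player $j$ at least once, $0$ otherwise):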

$$r = \frac{\sum_{i \neq j} A_{ij} \cdot A_{ji}}{\sum_{i \neq j} A_{ij}}$$

High reciprocity indicates balanced passing relationships; low reciprocity suggests hierarchical or one-directional passing patterns. In practical terms, high reciprocity often reflects wall passes, short interchanges, and positional play where players are comfortable exchanging the ball in both directions. Low reciprocity may indicate a more vertical structure -- defenders passing forward to midfielders, who pass forward to attackers, with little reversal.
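NetworkX implements the binary definition as `nx.reciprocity`. Because it ignores volume, a pair with 30 passes one way and a single pass back counts as fully reciprocated. The sketch below adds one common weighted variant, $\sum_{i \neq j} \min(w_{ij}, w_{ji}) / \sum_{i \neq j} w_{ij}$; the helper name is illustrative.

```python
import networkx as nx

def weighted_reciprocity(G, weight="weight"):
    """Share of passing volume matched by passes in the reverse direction:
    sum_ij min(w_ij, w_ji) / sum_ij w_ij."""
    edges = list(G.edges(data=weight, default=0))
    total = sum(w for _, _, w in edges)
    matched = sum(
        min(w, G[v][u].get(weight, 0)) if G.has_edge(v, u) else 0
        for u, v, w in edges
    )
    return matched / total if total else 0.0

G = nx.DiGraph()
G.add_weighted_edges_from([("CB", "CM", 30), ("CM", "CB", 10), ("CM", "ST", 12)])

print(nx.reciprocity(G))        # binary: 2 of 3 links are reciprocated
print(weighted_reciprocity(G))  # 20 / 52: only 10 + 10 passes are matched
```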

Diameter measures the longest shortest path between any two nodes in the graph:

$$\text{diameter}(G) = \max_{u,v \in V} d(u, v)$$

A small diameter relative to the number of nodes indicates that the ball can reach any player from any other player through few intermediate passes. Teams that play through a central hub (like a deep-lying playmaker) may have a smaller effective diameter than teams with fragmented passing patterns.
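In practice, `nx.diameter` raises an error on a directed graph that is not strongly connected, which is common in passing networks. The sketch below counts path length in passes (weights ignored) and, as one possible convention rather than a standard, falls back to the largest strongly connected component.

```python
import networkx as nx

def passing_diameter(G):
    """Longest shortest path, counted in passes (edge weights ignored).

    Falls back to the largest strongly connected component when some
    players cannot reach others (e.g. a winger who only receives)."""
    if nx.is_strongly_connected(G):
        return nx.diameter(G)
    largest = max(nx.strongly_connected_components(G), key=len)
    return nx.diameter(G.subgraph(largest))

G = nx.DiGraph()
for u, v in [("GK", "CB"), ("CB", "CM"), ("CM", "ST")]:
    G.add_edge(u, v)
    G.add_edge(v, u)
print(passing_diameter(G))   # 3: GK to ST takes three passes

G.add_edge("CM", "LW")       # LW receives but never passes
print(nx.is_strongly_connected(G), passing_diameter(G))
```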

Intuition: Think of a passing network like a city's road map. Some cities have a grid layout with many equal roads (high density, balanced flow). Others have a few major highways that all traffic must use (low density, high centrality on certain nodes). Similarly, some teams distribute passing equally across all players, while others funnel everything through one or two key playmakers. Network metrics let you quantify these structural differences precisely.

Real-World Application: When Liverpool's Virgil van Dijk plays long diagonal passes to the fullbacks, he effectively reduces the network diameter by creating direct connections that bypass intermediate midfield nodes. This structural shortcut is one reason why long-ball ability in center-backs is increasingly valued in the modern game.

10.1.4 Why Networks for Soccer?

Traditional statistics treat events independently, missing the relational structure of team play. Consider these limitations:

  1. Pass completion rate tells you nothing about pass difficulty or destination
  2. Possession percentage doesn't reveal how possession is distributed
  3. Individual metrics ignore combinatory play

Passing networks address these by capturing:

  • Connectivity patterns: Who passes to whom
  • Flow dynamics: How the ball moves through the team
  • Structural roles: Each player's position in the passing structure
  • Tactical organization: Team shape and preferences

Moreover, passing networks enable questions that are impossible to answer with traditional statistics. How dependent is a team on a single player? What happens when the opposition removes a key connector? Are there "structural holes" -- missing connections that could be exploited? These questions require the relational perspective that network analysis provides.

Real-World Application: FC Barcelona's analytics team under Pep Guardiola used early forms of network analysis to quantify their tiki-taka style. The data showed that Barcelona's network density and clustering coefficient were significantly higher than any other team in La Liga, providing mathematical confirmation of their distinctive passing philosophy. Today, virtually every top club uses network metrics in match preparation.

10.1.5 Network Theory vs. Sequence Analysis

It is worth distinguishing passing networks from passing sequence analysis. A passing network aggregates all passes over a period into a single structure -- you see that Player A passed to Player B 15 times, but you lose the order and timing of those passes. Passing sequence analysis, by contrast, examines the temporal ordering of passes: A to B to C to A to D and so on.

Both approaches have value. Networks excel at revealing structural patterns and identifying roles; sequences excel at capturing tactical patterns like build-up routines and combinatory play. The most complete analyses use both approaches in tandem. For example, you might use network analysis to identify that the left-back-to-left-winger connection is the strongest in the team, then use sequence analysis to examine what happens after that pass -- does the winger cross, cut inside, or play a one-two?
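The sequence side of that example can start very simply. The sketch below assumes a pass table already sorted chronologically within each possession (the rows are hypothetical; the column names mirror the frame built in Section 10.2.1) and looks up the next pass after each left-back-to-left-winger pass.

```python
import pandas as pd

# Hypothetical pass log, in chronological order within each possession
passes = pd.DataFrame({
    "possession":     [1, 1, 1, 2, 2, 3, 3],
    "player":         ["LB", "LW", "CM", "LB", "LW", "LB", "CB"],
    "pass_recipient": ["LW", "CM", "ST", "LW", "LB", "CB", "GK"],
})

# Next pass within the same possession (NaN when the possession ends)
nxt = passes.groupby("possession")[["player", "pass_recipient"]].shift(-1)
passes["next_passer"] = nxt["player"]
passes["next_recipient"] = nxt["pass_recipient"]

# What happened after LB -> LW?
after = passes[(passes["player"] == "LB") & (passes["pass_recipient"] == "LW")]
print(after["next_recipient"].value_counts(dropna=False))
```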

10.2 Constructing Passing Networks

10.2.1 Data Requirements

Building passing networks requires event data with:

  • Player identifiers (passer and receiver)
  • Pass outcome (successful/unsuccessful)
  • Temporal information (minute, possession sequence)
  • Spatial information (start and end coordinates)

StatsBomb and other providers include all necessary fields:

import pandas as pd
import numpy as np
from statsbombpy import sb

# Load match events
events = sb.events(match_id=3788741)

# Filter to successful passes
passes = events[
    (events['type'] == 'Pass') &
    (events['pass_outcome'].isna())  # NaN indicates success
].copy()

# Extract key fields
passes = passes[['player', 'pass_recipient', 'location', 'pass_end_location',
                 'minute', 'team', 'possession']].dropna(subset=['pass_recipient'])

Common Pitfall: Different data providers encode pass success differently. In StatsBomb data, a successful pass has NaN in the pass_outcome column, which is counterintuitive. Opta records success as a binary outcome flag on each event and stores additional pass details as qualifiers. Always verify the encoding before constructing networks, or you may inadvertently include failed passes (inflating edge weights) or exclude successful ones (deflating them).

10.2.2 Weighting, Thresholds, and Directionality

The choices you make when constructing a passing network significantly affect the results. Three decisions are particularly consequential:

Edge weighting. The simplest approach counts raw passes between each pair. But raw counts can be misleading: a pair of center-backs exchanging 25 sideways passes may have the heaviest edge in the network despite minimal tactical significance. Alternative weighting schemes include:

  • xT-weighted passes: Weight each pass by the change in Expected Threat it produces, so a progressive pass into the final third counts more than a lateral pass in the defensive third.
  • Distance-weighted passes: Weight passes by the distance covered, emphasizing long-range distribution.
  • Danger-weighted passes: Weight passes by the xT value of the receiving location, emphasizing passes into dangerous areas.
  • Binary weighting: Set all weights to 1 (unweighted), focusing purely on the existence of connections rather than their frequency.
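These schemes differ only in the value each pass adds to its edge, so one builder with a pluggable value function covers them. This is a minimal sketch: the helper names are illustrative, and `forward_progress_value` is a hypothetical stand-in for xT (a real xT weighting would look up a fitted grid and return xT(end) minus xT(start)). It assumes the team attacks toward increasing x, as in StatsBomb coordinates.

```python
import math
import networkx as nx

def build_weighted_network(passes, value_fn):
    """Sum a per-pass value on each passer -> recipient edge.

    passes   : iterable of dicts with 'player', 'pass_recipient',
               'location' [x, y] and 'pass_end_location' [x, y]
    value_fn : callable (start, end) -> float
    """
    G = nx.DiGraph()
    for p in passes:
        u, v = p["player"], p["pass_recipient"]
        w = value_fn(p["location"], p["pass_end_location"])
        if G.has_edge(u, v):
            G[u][v]["weight"] += w
        else:
            G.add_edge(u, v, weight=w)
    return G

def count_value(start, end):
    return 1.0                         # raw pass counts

def distance_value(start, end):
    return math.dist(start, end)       # distance-weighted

def forward_progress_value(start, end):
    # Hypothetical xT stand-in: pitch units gained toward the opponent goal
    return max(0.0, end[0] - start[0])

def binarize(G):
    """Binary weighting: keep the links, set every weight to 1."""
    H = G.copy()
    for _, _, d in H.edges(data=True):
        d["weight"] = 1
    return H

passes = [
    {"player": "CB", "pass_recipient": "CM", "location": [30, 40], "pass_end_location": [45, 40]},
    {"player": "CB", "pass_recipient": "CM", "location": [30, 40], "pass_end_location": [30, 44]},
    {"player": "CM", "pass_recipient": "CB", "location": [50, 40], "pass_end_location": [35, 40]},
]
G_dist = build_weighted_network(passes, distance_value)
print(G_dist["CB"]["CM"]["weight"])   # 15 + 4 = 19.0
```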

Thresholds. Showing every single pass in a visualization produces an unreadable mess. Most analysts apply a minimum threshold -- typically 3 to 5 passes -- below which edges are not drawn. This threshold should be chosen relative to the total number of passes; a threshold of 3 might be appropriate for a 15-minute segment but too low for a full 90-minute match where a threshold of 5 or more may be necessary.
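A threshold can be applied as a filtering step after the network is built. The sketch keeps every node so all players still appear on the diagram. `scaled_threshold` is a hypothetical rule of thumb (1% of total passes, never below 3), chosen only to illustrate scaling the threshold with volume: it gives 3 for a segment of about 100 passes and 5 for a match of about 500.

```python
import networkx as nx

def threshold_network(G, min_passes):
    """Copy of G keeping only edges with weight >= min_passes.
    All nodes are kept so every player still appears on the diagram."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))
    H.add_edges_from(
        (u, v, d) for u, v, d in G.edges(data=True)
        if d.get("weight", 0) >= min_passes
    )
    return H

def scaled_threshold(G, share=0.01, floor=3):
    """Hypothetical rule of thumb: threshold grows with total passes."""
    return max(floor, round(share * G.size(weight="weight")))

G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 2), ("A", "C", 5), ("C", "B", 8)])
H = threshold_network(G, min_passes=5)
print(sorted(H.edges()))   # [('A', 'C'), ('C', 'B')]
```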

Directionality. As discussed in Section 10.1.2, you must decide whether to preserve direction. For most analytical purposes, preserving direction is advisable. For visualization, you may wish to show only the dominant direction of each pair, or use arrow thickness and color to encode both directions simultaneously.

import networkx as nx

def build_passing_network(passes_df, team_name, weight_type='count'):
    """
    Build a passing network for a specific team.

    Parameters
    ----------
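    passes_df : DataFrame
        Pass events with 'player', 'pass_recipient' and 'team' columns,
        plus 'location' and 'pass_end_location' for distance weighting
    team_name : str
        Team to filter for
    weight_type : str
        'count' for raw pass counts, 'distance' for the total distance
        (in pitch units) of passes between each pair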

    Returns
    -------
    G : networkx.DiGraph
        Directed graph with players as nodes and passes as weighted edges
    """
    # Filter to team
    team_passes = passes_df[passes_df['team'] == team_name]

    # Create directed graph
    G = nx.DiGraph()

    if weight_type == 'count':
        # Count passes between each pair
        pass_counts = team_passes.groupby(['player', 'pass_recipient']).size()
        for (passer, receiver), count in pass_counts.items():
            G.add_edge(passer, receiver, weight=count)

    elif weight_type == 'distance':
        for _, row in team_passes.iterrows():
            start = row['location']
            end = row['pass_end_location']
            if isinstance(start, list) and isinstance(end, list):
                dist = np.sqrt((end[0]-start[0])**2 + (end[1]-start[1])**2)
                passer, receiver = row['player'], row['pass_recipient']
                if G.has_edge(passer, receiver):
                    G[passer][receiver]['weight'] += dist
                else:
                    G.add_edge(passer, receiver, weight=dist)

    return G

Best Practice: When comparing passing networks across matches or teams, normalize edge weights by minutes played or by total team passes. A team that has possession for 60 minutes will naturally have more passes than one with 30 minutes of possession, so raw counts can be misleading for comparative analysis.
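Both normalizations are simple rescalings of edge weights. This is a minimal sketch with an illustrative helper name: `mode="share"` divides by total team passes, and `mode="per90"` scales to 90 minutes given the minutes covered by the network (pass in possession minutes instead if that is the denominator you want).

```python
import networkx as nx

def normalize_network(G, mode="share", minutes=None):
    """Return a copy of G with rescaled edge weights.

    mode='share' : weight / total team passes (weights then sum to 1)
    mode='per90' : weight * 90 / minutes covered by the network
    """
    H = G.copy()
    if mode == "share":
        total = G.size(weight="weight")
        scale = 1.0 / total if total else 0.0
    elif mode == "per90":
        if not minutes or minutes <= 0:
            raise ValueError("minutes must be positive for per90 scaling")
        scale = 90.0 / minutes
    else:
        raise ValueError(f"unknown mode: {mode}")
    for _, _, d in H.edges(data=True):
        d["weight"] = d.get("weight", 0) * scale
    return H

G = nx.DiGraph()
G.add_weighted_edges_from([("CB", "CM", 30), ("CM", "CB", 10)])
print(normalize_network(G)["CB"]["CM"]["weight"])                     # 0.75
print(normalize_network(G, "per90", minutes=45)["CB"]["CM"]["weight"])  # 60.0
```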

10.2.3 Node Positioning

Effective visualization requires meaningful node positions. The most common approaches:

Average Position Method: Place nodes at players' average touch locations:

def calculate_average_positions(events_df, team_name):
    """
    Calculate average positions for all players on a team.

    Parameters
    ----------
    events_df : DataFrame
        All events with 'location' column
    team_name : str
        Team to analyze

    Returns
    -------
    dict
        Player name -> (x, y) average position
    """
    team_events = events_df[
        (events_df['team'] == team_name) &
        (events_df['location'].notna())
    ]

    positions = {}

    for player in team_events['player'].unique():
        if pd.isna(player):
            continue

        player_events = team_events[team_events['player'] == player]

        # Extract coordinates from location lists
        coords = player_events['location'].apply(
            lambda x: x if isinstance(x, list) else [np.nan, np.nan]
        )

        x_coords = [c[0] for c in coords if isinstance(c, list) and len(c) == 2]
        y_coords = [c[1] for c in coords if isinstance(c, list) and len(c) == 2]

        if x_coords and y_coords:
            positions[player] = (np.mean(x_coords), np.mean(y_coords))

    return positions

Formation-Based Positioning: Use tactical formation templates adjusted for actual play patterns. This approach places players at idealized formation positions, then nudges them toward their actual average positions. The result is a cleaner visualization that still reflects actual positioning.
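The chapter does not prescribe how far to "nudge" players. One simple option, sketched below, is linear interpolation between each template slot and the observed average position. The template coordinates (on a 120 x 80 StatsBomb-style pitch) and the `alpha` parameter are hypothetical.

```python
def blend_positions(template, observed, alpha=0.5):
    """Move each player from a formation-template slot toward their
    observed average position.

    template : dict player -> (x, y) idealized formation slot
    observed : dict player -> (x, y) average position
    alpha    : 0 = pure template, 1 = pure observed average
    """
    blended = {}
    for player, (tx, ty) in template.items():
        if player in observed:
            ox, oy = observed[player]
            blended[player] = (tx + alpha * (ox - tx), ty + alpha * (oy - ty))
        else:
            blended[player] = (tx, ty)   # no events: fall back to the template
    return blended

template = {"LB": (30.0, 10.0), "CM": (55.0, 40.0)}   # hypothetical slots
observed = {"LB": (42.0, 14.0), "CM": (50.0, 36.0)}
print(blend_positions(template, observed))  # LB -> (36.0, 12.0), CM -> (52.5, 38.0)
```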

Weighted Centroid: Weight positions by event importance (e.g., give more weight to passes than to defensive actions).

Median Positions: Use median rather than mean coordinates. This is more robust to outliers -- a goalkeeper who occasionally rushes upfield for a corner will still be placed near their own goal rather than being dragged toward midfield.

Common Pitfall: Average positions can be misleading for players who cover large areas of the pitch. A box-to-box midfielder whose average position shows them at the center circle may have spent equal time near both penalty areas. Consider using position heatmaps alongside average positions, or showing the standard deviation of positions as a size indicator on network nodes.
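The weighted centroid, the median, and a spread measure for node size can all be computed from one player's event coordinates. In this sketch the per-event weights are left to the caller (for example, a higher weight for passes than for defensive actions), and the root-mean-square distance from the centre is one possible choice of spread.

```python
import numpy as np

def position_summary(xy, weights=None):
    """Summarize one player's event coordinates.

    xy      : array-like of shape (k, 2) with x, y per event
    weights : optional per-event weights for the weighted centroid
    """
    xy = np.asarray(xy, dtype=float)
    centre = np.average(xy, axis=0, weights=weights)
    return {
        "median": np.median(xy, axis=0),
        "weighted_mean": centre,
        # Root-mean-square distance from the centre: a possible node-size cue
        "spread": float(np.sqrt(np.mean(np.sum((xy - centre) ** 2, axis=1)))),
    }

# Hypothetical goalkeeper touches: three near goal, one corner rush upfield
gk = [[5, 40], [6, 41], [4, 39], [100, 40]]
summary = position_summary(gk)
print(summary["median"])         # median x stays near goal (5.5)
print(summary["weighted_mean"])  # mean x is dragged upfield (28.75)
```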

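Best Practice: When calculating average positions, choose the time window carefully. Event data only records a player's actions while they are on the pitch, but substitutions often shuffle the remaining players into new roles, so a whole-match average can place a player between two different positions. A common convention is to use only events before the first substitution. Also filter to the starting XI if you want a clean network for tactical analysis.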

10.2.4 Handling Substitutions

Substitutions present a challenge: should the substitute inherit the replaced player's node, or be treated separately? Both approaches have merit:

Merge substitutes: Maintains 11 nodes representing positions

def merge_substitute_passes(passes_df, lineup_df):
    """Merge substitutes into their replaced player's position."""
    # Map substitutes to starting players
    position_map = {}
    for _, sub in lineup_df[lineup_df['substitute']].iterrows():
        position_map[sub['player']] = sub['replaced_player']

    # Apply mapping
    passes_df = passes_df.copy()
    passes_df['player'] = passes_df['player'].replace(position_map)
    passes_df['pass_recipient'] = passes_df['pass_recipient'].replace(position_map)

    return passes_df

Keep separate: Preserves individual player analysis but creates larger networks

The choice depends on your analytical goal -- positional analysis favors merging, player evaluation favors separation. In practice, merging is more common for single-match visualizations because it keeps the network clean with exactly 11 nodes.
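With StatsBomb data, the substitute-to-starter mapping can be derived from the event stream instead of a separate lineup table. This sketch assumes statsbombpy-style flattened events in which a 'Substitution' event carries the outgoing player in 'player' and the incoming player in 'substitution_replacement'; the sample rows are hypothetical. It also handles a substitute who is later replaced.

```python
import pandas as pd

def substitution_map(events_df, team_name):
    """Map each substitute to the starting player whose node they inherit."""
    subs = events_df[
        (events_df["type"] == "Substitution") & (events_df["team"] == team_name)
    ].sort_values(["period", "minute", "second"])
    mapping = {}
    for _, s in subs.iterrows():
        off, on = s["player"], s["substitution_replacement"]
        # A substitute who is later replaced hands their slot on
        mapping[on] = mapping.get(off, off)
    return mapping

events = pd.DataFrame({
    "type": ["Pass", "Substitution", "Substitution", "Substitution"],
    "team": ["Home", "Home", "Home", "Away"],
    "player": ["Starter", "Starter", "Sub A", "Away Starter"],
    "substitution_replacement": [None, "Sub A", "Sub B", "Away Sub"],
    "period": [1, 2, 2, 2],
    "minute": [10, 60, 80, 70],
    "second": [0, 0, 0, 0],
})
mapping = substitution_map(events, "Home")
print(mapping)   # {'Sub A': 'Starter', 'Sub B': 'Starter'}
```

The resulting dict can be applied exactly like `position_map` inside `merge_substitute_passes`, e.g. `passes_df['player'].replace(mapping)`.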

A third approach is to build separate networks for different lineup configurations. If a team makes a substitution at minute 60, you can build one network for minutes 0-60 and another for minutes 60-90, each with exactly 11 nodes. This approach captures both the positional structure and the tactical shift caused by the substitution.
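The lineup windows follow directly from the substitution minutes. A minimal sketch (the helper name is illustrative) that produces `(start, end)` tuples in the format used by `build_temporal_networks` in Section 10.2.5; raise `match_end` to keep stoppage-time passes.

```python
def lineup_windows(sub_minutes, match_end=90):
    """Split a match into windows during which the lineup is fixed.

    sub_minutes : minutes at which substitutions occurred
    Returns a list of (start, end) tuples covering 0..match_end.
    """
    # Duplicate minutes (double substitutions) produce a single cut
    cuts = sorted({m for m in sub_minutes if 0 < m < match_end})
    bounds = [0] + cuts + [match_end]
    return list(zip(bounds[:-1], bounds[1:]))

print(lineup_windows([60, 60, 75]))   # [(0, 60), (60, 75), (75, 90)]
```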

Advanced: For multi-match analysis, consider building "aggregate networks" that combine passing data across several matches. This smooths out single-match noise and reveals a team's structural tendencies. However, aggregate networks can mask tactical adaptations made for specific opponents. A useful compromise is to build opponent-type-specific networks (e.g., how does the team connect against high-pressing vs. low-block teams).
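An aggregate network can be built by summing edge weights across per-match networks. This sketch (illustrative helper name) optionally averages per match; for the opponent-type variant, group the match networks by opponent label first and aggregate each group separately. If possession varies a lot between matches, normalizing each network before aggregating keeps one match from dominating.

```python
import networkx as nx

def aggregate_networks(networks, average=False):
    """Sum (or average) edge weights across several match networks."""
    networks = list(networks)
    agg = nx.DiGraph()
    for G in networks:
        agg.add_nodes_from(G.nodes())
        for u, v, w in G.edges(data="weight", default=0):
            if agg.has_edge(u, v):
                agg[u][v]["weight"] += w
            else:
                agg.add_edge(u, v, weight=w)
    if average and networks:
        for _, _, d in agg.edges(data=True):
            d["weight"] /= len(networks)
    return agg

G1 = nx.DiGraph([("A", "B", {"weight": 5})])
G2 = nx.DiGraph([("A", "B", {"weight": 3}), ("B", "C", {"weight": 2})])
print(aggregate_networks([G1, G2])["A"]["B"]["weight"])   # 8
```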

10.2.5 Time-Filtered Networks

Match dynamics change throughout 90 minutes. Creating time-segmented networks reveals these shifts:

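def build_temporal_networks(passes_df, team_name,
                            windows=((0, 45), (45, float('inf')))):
    """
    Build separate networks for different time periods.

    Parameters
    ----------
    passes_df : DataFrame
        Pass events
    team_name : str
        Team to analyze
    windows : sequence of tuples
        (start_minute, end_minute) for each period; start inclusive, end
        exclusive. The open-ended default keeps stoppage-time passes.
        StatsBomb minutes keep counting past 45 in first-half stoppage
        time while the second half restarts at 45, so a split at minute 45
        is approximate; filter on 'period' for exact halves.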

    Returns
    -------
    list of networkx.DiGraph
        One network per time window
    """
    networks = []

    for start, end in windows:
        period_passes = passes_df[
            (passes_df['minute'] >= start) &
            (passes_df['minute'] < end)
        ]
        G = build_passing_network(period_passes, team_name)
        networks.append(G)

    return networks

Intuition: Comparing first-half and second-half networks is one of the most powerful tactical analysis techniques. When a coach makes a half-time tactical change, it often shows up clearly in the network structure -- perhaps a midfielder who was central in the first half becomes peripheral, or a new passing connection emerges that did not exist before. This temporal approach reveals "what changed and when" in a way that aggregate match statistics cannot.
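"What changed" can be summarized programmatically before looking at the diagrams. This sketch (illustrative helper name and toy half-time networks) lists links that appear or disappear above a pass threshold and the change in each player's share of team passes made.

```python
import networkx as nx

def compare_networks(G1, G2, min_passes=1):
    """Structural change from G1 (e.g. first half) to G2 (second half)."""
    def links(G):
        return {(u, v) for u, v, w in G.edges(data="weight", default=0)
                if w >= min_passes}

    def out_share(G):
        total = G.size(weight="weight") or 1.0
        return {p: s / total for p, s in G.out_degree(weight="weight")}

    s1, s2 = out_share(G1), out_share(G2)
    players = set(s1) | set(s2)
    return {
        "new_links": links(G2) - links(G1),
        "lost_links": links(G1) - links(G2),
        "share_change": {p: s2.get(p, 0.0) - s1.get(p, 0.0) for p in players},
    }

first = nx.DiGraph()
first.add_weighted_edges_from([("CB", "CM", 20), ("CM", "ST", 10)])
second = nx.DiGraph()
second.add_weighted_edges_from([("CB", "LW", 15), ("CM", "ST", 10), ("CB", "CM", 2)])

diff = compare_networks(first, second, min_passes=3)
print(diff["new_links"], diff["lost_links"])   # {('CB', 'LW')} {('CB', 'CM')}
```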

10.2.6 Filtering by Game State

Game state -- whether a team is winning, drawing, or losing -- profoundly affects passing patterns. A team leading 2-0 may adopt a more conservative passing structure, while a team trailing will push forward and take more risks. Building separate networks for each game state enables analysis of tactical adaptations:

def build_game_state_networks(events_df, passes_df, team_name):
    """
    Build separate networks for winning, drawing, and losing states.

    Parameters
    ----------
    events_df : DataFrame
        All match events (to identify goals and game state)
    passes_df : DataFrame
        Pass events
    team_name : str
        Team to analyze

    Returns
    -------
    dict
        Game state -> networkx.DiGraph
    """
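    # Goals that change the score: scored shots, plus own goals, which
    # StatsBomb records as 'Own Goal For' events on the benefiting team.
    # Penalty-shootout kicks (period 5) do not change the match score.
    is_goal = (
        ((events_df['type'] == 'Shot') & (events_df['shot_outcome'] == 'Goal')) |
        (events_df['type'] == 'Own Goal For')
    )
    goals = events_df[is_goal & (events_df['period'] < 5)].sort_values('minute')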

    teams = events_df['team'].unique()
    opponent = [t for t in teams if t != team_name][0]

    # Build minute-by-minute game state
    score = {team_name: 0, opponent: 0}
    state_changes = [(0, 'drawing')]

    for _, goal in goals.iterrows():
        score[goal['team']] += 1
        minute = goal['minute']
        if score[team_name] > score[opponent]:
            state_changes.append((minute, 'winning'))
        elif score[team_name] < score[opponent]:
            state_changes.append((minute, 'losing'))
        else:
            state_changes.append((minute, 'drawing'))

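    # Assign state to each pass. Resolution is one minute: passes earlier in
    # the same minute as a goal get the post-goal state, and first-half
    # stoppage-time minutes overlap the start of the second half.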
    state_networks = {'winning': [], 'drawing': [], 'losing': []}
    for _, p in passes_df[passes_df['team'] == team_name].iterrows():
        current_state = state_changes[0][1]
        for change_min, state in state_changes:
            if p['minute'] >= change_min:
                current_state = state
        state_networks[current_state].append(p)

    # Build networks for each state
    result = {}
    for state, state_passes in state_networks.items():
        if state_passes:
            df = pd.DataFrame(state_passes)
            result[state] = build_passing_network(df, team_name)

    return result

Advanced: Research has suggested that network centralization increases when teams are losing -- play becomes more focused through star players as the team tries to find a solution. When leading, centralization tends to decrease as teams distribute play more evenly to control the game. These patterns have been reported across different leagues and levels of play, suggesting a common tactical response to game state.
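Comparing game-state networks needs a network-level centralization number. One simple index, sketched here as a variant of Freeman's degree centralization (not the only possible definition), uses each player's share of passes made: it is 0 when every player makes the same number of passes and 1 when a single player makes all of them.

```python
import networkx as nx

def pass_centralization(G):
    """Centralization of passes made (out-strength shares), in [0, 1]."""
    n = G.number_of_nodes()
    total = G.size(weight="weight")
    if n < 2 or total == 0:
        return 0.0
    shares = [s / total for _, s in G.out_degree(weight="weight")]
    top = max(shares)
    return sum(top - s for s in shares) / (n - 1)

balanced = nx.DiGraph()
balanced.add_weighted_edges_from([("A", "B", 10), ("B", "C", 10), ("C", "A", 10)])
funnel = nx.DiGraph()
funnel.add_weighted_edges_from([("A", "B", 10), ("A", "C", 10)])
print(pass_centralization(balanced), pass_centralization(funnel))   # 0.0 1.0
```

Applied to the output of `build_game_state_networks`, comparing the 'losing' and 'winning' values gives a direct check of the pattern described above on your own matches.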

10.3 Centrality Metrics

Centrality measures identify the most important nodes in a network. Different centrality definitions capture different aspects of importance.

Common Pitfall: Different centrality metrics tell different stories. A player with high degree centrality (many connections) is not necessarily the same as one with high betweenness centrality (sits on many shortest paths). A defensive midfielder might have moderate degree but very high betweenness because all ball circulation passes through them. Always compute multiple centrality measures and interpret them together rather than relying on a single metric.

10.3.1 Degree Centrality

The simplest centrality measure counts connections:

$$C_D(v) = \frac{k_v}{n-1}$$

where $k_v$ is the degree of node $v$ and $n$ is the number of nodes.

For directed networks, we compute separately:

$$C_{D}^{in}(v) = \frac{k_v^{in}}{n-1}, \quad C_{D}^{out}(v) = \frac{k_v^{out}}{n-1}$$

In passing networks:

  • High out-degree: Player passes to many teammates (distributor)
  • High in-degree: Player receives from many teammates (target player)

The ratio of out-degree to in-degree reveals asymmetry in a player's passing role. A player with much higher out-degree than in-degree is a distributor who initiates play but is not often the target of passes. Conversely, a player with high in-degree but moderate out-degree is a focal point who receives the ball frequently, perhaps a target forward or a deep-lying midfielder who collects the ball from defenders.
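NetworkX's `nx.in_degree_centrality` and `nx.out_degree_centrality` implement the unweighted $k_v / (n-1)$ formulas directly. The distributor/receiver asymmetry is easier to read from weighted degrees, so this sketch (hypothetical counts, illustrative helper) computes a weighted out/in ratio: above 1 leans distributor, below 1 leans receiver.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("CB", "DM", 25), ("DM", "CB", 8), ("DM", "LW", 12),
    ("DM", "ST", 6), ("LW", "ST", 4), ("CB", "LW", 3),
])

n = G.number_of_nodes()
in_c = nx.in_degree_centrality(G)   # k_in / (n - 1), unweighted

def out_in_ratio(G, weight="weight"):
    """Weighted out/in ratio per player; inf for players who never receive."""
    ratios = {}
    for v in G:
        out_s = G.out_degree(v, weight=weight)
        in_s = G.in_degree(v, weight=weight)
        ratios[v] = out_s / in_s if in_s else float("inf")
    return ratios

ratios = out_in_ratio(G)
print({p: round(r, 2) for p, r in ratios.items()})
# CB 3.5 (distributor), DM about 1.04 (balanced), LW about 0.27, ST 0.0 (target)
```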

def calculate_degree_centrality(G, weighted=True):
    """
    Calculate in-degree and out-degree centrality.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    weighted : bool
        If True, use weighted degree (sum of edge weights)

    Returns
    -------
    DataFrame
        Centrality scores for each player
    """
    results = []

    for node in G.nodes():
        if weighted:
            in_deg = sum(G[pred][node]['weight'] for pred in G.predecessors(node))
            out_deg = sum(G[node][succ]['weight'] for succ in G.successors(node))
        else:
            in_deg = G.in_degree(node)
            out_deg = G.out_degree(node)

        results.append({
            'player': node,
            'in_degree': in_deg,
            'out_degree': out_deg,
            'total_degree': in_deg + out_deg
        })

    df = pd.DataFrame(results)

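    # Normalize relative to the busiest player (team maximum = 1.0). This
    # differs from the k_v / (n - 1) definition above, which
    # nx.in_degree_centrality and nx.out_degree_centrality implement for
    # unweighted degrees.
    max_deg = df['total_degree'].max()
    df['degree_centrality'] = df['total_degree'] / max_deg if max_deg > 0 else 0.0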

    return df.sort_values('total_degree', ascending=False)

Intuition: Degree centrality answers the question: "How busy is this player in the passing game?" It is the network equivalent of counting total passes involved in. While simple, it captures a meaningful dimension of influence -- a player who touches the ball more has more opportunities to shape the game.

10.3.2 Betweenness Centrality

Betweenness measures how often a node lies on shortest paths between other nodes:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, and $\sigma_{st}(v)$ is the number passing through $v$.

Players with high betweenness serve as critical connectors -- if removed, passing routes between other players become longer or impossible. In soccer terms, betweenness centrality often identifies the "metronome" of the team -- the player through whom the ball must flow for the team to transition between phases of play. Classic examples include Sergio Busquets at Barcelona and Toni Kroos at Real Madrid. These players may not have the highest degree centrality (they are not necessarily the busiest), but they sit at a structural bottleneck that makes them essential for connecting defense to attack.

def calculate_betweenness(G):
    """Calculate betweenness centrality for all nodes."""
    # Convert weights to distances (higher weight = shorter path)
    G_distance = G.copy()
    for u, v, data in G_distance.edges(data=True):
        data['distance'] = 1 / data['weight'] if data['weight'] > 0 else float('inf')

    betweenness = nx.betweenness_centrality(G_distance, weight='distance')

    return pd.DataFrame([
        {'player': player, 'betweenness': score}
        for player, score in betweenness.items()
    ]).sort_values('betweenness', ascending=False)
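A toy example makes the busy-versus-bridge distinction concrete. In this hypothetical network the two center-backs exchange the most passes, but the holding midfielder is the only route from defense to attack, so betweenness (computed, as in `calculate_betweenness`, with distance = 1 / passes) singles them out while the busiest players score zero.

```python
import networkx as nx

G = nx.DiGraph()
pairs = [("CB1", "CB2", 30), ("CB1", "DM", 5), ("CB2", "DM", 5),
         ("DM", "ST", 5), ("DM", "LW", 5), ("LW", "ST", 3)]
for u, v, w in pairs:   # hypothetical counts, same volume both ways
    G.add_edge(u, v, weight=w)
    G.add_edge(v, u, weight=w)

# Frequent links count as "short": distance = 1 / passes
for _, _, d in G.edges(data=True):
    d["distance"] = 1.0 / d["weight"]

bc = nx.betweenness_centrality(G, weight="distance")
busy = dict(G.degree(weight="weight"))   # passes made + received

print(max(bc, key=bc.get))       # DM: the only bridge from defense to attack
print(max(busy, key=busy.get))   # CB1 (tied with CB2): busiest, but zero betweenness
```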

Real-World Application: When opposition coaches aim to "cut off the supply" to a key player, they are essentially trying to reduce that player's betweenness centrality. By pressing the high-betweenness player tightly, they force the team to find alternative routes -- routes that are longer, less practiced, and less efficient. Identifying and neutralizing the opponent's highest-betweenness player is a core element of tactical game plans at the professional level.
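The structural effect of cutting off a player can be approximated by deleting their node and measuring what is left. This is a crude stress test, not a prediction (real teams adapt, for example by going long), and the helper name is illustrative. It reports the share of ordered player pairs that can still reach each other and the average number of passes needed.

```python
import networkx as nx

def removal_impact(G, player):
    """(reachable share, average passes) before and after removing `player`."""
    def summary(H):
        lengths = dict(nx.all_pairs_shortest_path_length(H))
        dists = [d for s, row in lengths.items() for t, d in row.items() if s != t]
        n = H.number_of_nodes()
        pairs = n * (n - 1)
        reach = len(dists) / pairs if pairs else 0.0
        avg = sum(dists) / len(dists) if dists else float("inf")
        return reach, avg

    H = G.copy()
    H.remove_node(player)
    return {"before": summary(G), "after": summary(H)}

G = nx.DiGraph()
for u, v in [("CB1", "CB2"), ("CB1", "DM"), ("CB2", "DM"),
             ("DM", "ST"), ("DM", "LW"), ("LW", "ST")]:
    G.add_edge(u, v)
    G.add_edge(v, u)

print(removal_impact(G, "DM"))    # reachability drops from 1.0 to 4/12
print(removal_impact(G, "CB1"))   # everyone can still reach everyone
```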

10.3.3 Closeness Centrality

Closeness measures how quickly a node can reach all other nodes:

$$C_C(v) = \frac{n-1}{\sum_{u \neq v} d(v, u)}$$

where $d(v, u)$ is the shortest path distance from $v$ to $u$.

Players with high closeness can efficiently distribute the ball to all areas of the pitch. This metric tends to favor central positions -- both spatially (central midfielders are physically closer to all areas) and structurally (players who connect to many others). A center-back in a build-up-oriented team might have high closeness because they can reach attackers in just two or three passes through various midfield routes.

In-closeness vs. out-closeness. For directed networks, we can compute closeness in two directions. Out-closeness measures how quickly a player can deliver the ball to all teammates; in-closeness measures how quickly the ball can reach a player from any teammate. A deep-lying playmaker typically has high out-closeness (they can find anyone quickly), while a target forward may have high in-closeness (the team can get the ball to them through multiple routes).
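In NetworkX, `nx.closeness_centrality` uses incoming distances on a directed graph, so it gives in-closeness directly; out-closeness can be obtained by running it on the reversed graph. This sketch converts weights to distances as 1 / passes (the same convention as the betweenness code) on a hypothetical network where a deep-lying playmaker feeds everyone but is never passed to.

```python
import networkx as nx

def in_out_closeness(G, weight="weight"):
    """In- and out-closeness with distance = 1 / passes."""
    H = G.copy()
    for _, _, d in H.edges(data=True):
        d["distance"] = 1.0 / d[weight]
    in_c = nx.closeness_centrality(H, distance="distance")
    out_c = nx.closeness_centrality(H.reverse(copy=True), distance="distance")
    return in_c, out_c

G = nx.DiGraph()
G.add_weighted_edges_from([("DLP", "LW", 5), ("DLP", "ST", 5),
                           ("DLP", "RW", 5), ("LW", "ST", 5)])
in_c, out_c = in_out_closeness(G)
print(max(out_c, key=out_c.get), in_c["DLP"])   # DLP 0.0
```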

Intuition: If degree centrality asks "how busy is this player?", closeness centrality asks "how accessible is this player?" A player with high closeness is at the heart of the network in a structural sense -- they are never far from the action regardless of where the ball is.

10.3.4 Eigenvector Centrality and PageRank

Eigenvector centrality weights a node's importance by the importance of its neighbors:

$$x_i = \frac{1}{\lambda}\sum_{j \in N(i)} A_{ij} x_j$$

where $\lambda$ is the largest eigenvalue of $A$. This creates a recursive definition: important players pass to and receive from other important players.

The intuition here is that being connected to well-connected players matters more than being connected to peripheral ones. A midfielder who frequently exchanges passes with the team's creative playmaker and prolific striker has higher eigenvector centrality than one who only passes to fullbacks.
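The midfielder comparison can be reproduced on a toy network. On a directed graph, `nx.eigenvector_centrality` scores in-edges and may fail to converge or give uninformative scores when the network is not strongly connected, so this sketch uses an undirected version (pass counts in both directions summed; all names and counts hypothetical). CM1 and CM2 have identical weighted degree, but CM1's partners are the playmaker and striker.

```python
import networkx as nx

H = nx.Graph()
H.add_weighted_edges_from([
    ("PM", "ST", 10), ("CM1", "PM", 6), ("CM1", "ST", 6),
    ("CM2", "LB", 6), ("CM2", "RB", 6), ("LB", "PM", 2),
])

ec = nx.eigenvector_centrality(H, weight="weight", max_iter=1000)

print(H.degree("CM1", weight="weight"), H.degree("CM2", weight="weight"))  # 12 12
print(round(ec["CM1"], 3), round(ec["CM2"], 3))   # CM1 scores clearly higher
```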

PageRank, developed by Google's founders for web search, is a variant that handles directed graphs and adds a damping factor:

$$PR(v) = \frac{1-d}{n} + d \sum_{u \in B(v)} \frac{PR(u)}{L(u)}$$

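where $d$ is the damping factor (typically 0.85), $B(v)$ is the set of nodes linking to $v$, and $L(u)$ is the out-degree of $u$. In the weighted version used for passing networks (and computed by nx.pagerank with weight='weight'), each player shares their score in proportion to pass volume: the term $\frac{PR(u)}{L(u)}$ becomes $PR(u)\,\frac{w_{uv}}{s_u}$, where $w_{uv}$ is the number of passes from $u$ to $v$ and $s_u = \sum_{v} w_{uv}$ is $u$'s total passes made.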

The damping factor has an elegant soccer interpretation: it represents the probability that the ball continues along the network vs. being "reset" (e.g., by a goal kick or throw-in). At $d = 0.85$, there is a 15% chance at each step that the ball effectively teleports to a random player, simulating stoppages and restarts.

def calculate_pagerank(G, damping=0.85):
    """Calculate PageRank centrality for all nodes."""
    pagerank = nx.pagerank(G, alpha=damping, weight='weight')

    return pd.DataFrame([
        {'player': player, 'pagerank': score}
        for player, score in pagerank.items()
    ]).sort_values('pagerank', ascending=False)
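To see what `nx.pagerank` computes, the formula can also be iterated directly. The sketch below is a NumPy-only version of the weighted variant. Each player's score is split among receivers in proportion to pass counts. Players with no outgoing passes (for example, a late substitute who only received) have their score spread evenly across everyone, which is NetworkX's default convention for such "dangling" nodes. The four-player matrix is hypothetical. On the same data, the result should agree with `nx.pagerank` up to the convergence tolerance.

```python
import numpy as np


def pagerank_power(W, d=0.85, tol=1e-12, max_iter=1000):
    """Weighted PageRank by power iteration; W[u, v] = passes from u to v."""
    n = W.shape[0]
    out_passes = W.sum(axis=1)
    # P[u, v]: share of u's passes that go to v (rows of zeros stay zero)
    P = np.divide(W, out_passes[:, None],
                  out=np.zeros(W.shape), where=out_passes[:, None] > 0)
    dangling = out_passes == 0

    pr = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # (1 - d)/n: "reset" to a random player; d: the ball follows a pass.
        # Dangling players' scores are spread uniformly over everyone.
        new = (1 - d) / n + d * (pr @ P + pr[dangling].sum() / n)
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new
    return pr


# Hypothetical: 0 Centre-back, 1 Midfielder, 2 Playmaker, 3 Striker
W = np.array([[0, 10, 5, 0],
              [2, 0, 12, 3],
              [0, 4, 0, 9],
              [0, 0, 0, 0]], dtype=float)   # the striker made no passes
scores = pagerank_power(W)
```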

10.3.5 Interpreting Centrality in Context

Different centrality measures reveal different player roles:

| Centrality Type | High Score Indicates | Example Player Type |
|---|---|---|
| Degree | Volume of involvement | Busy midfielder |
| Betweenness | Critical connector | Holding midfielder |
| Closeness | Efficient distributor | Central defender in build-up |
| PageRank | Connected to key players | Forward linking with creative players |

No single centrality measure tells the complete story, and a comprehensive player profile requires examining all four measures together. Consider a player with high betweenness but low degree. This is someone who does not touch the ball especially often but occupies a structurally critical position: perhaps a holding midfielder with modest pass volume whose passes repeatedly bridge the transition from defense to attack. Conversely, a player with high degree but low betweenness touches the ball often but is not structurally essential. An example is a center-back exchanging many passes with the other center-back and the goalkeeper, few of which lie on critical paths to the attack.

Best Practice: Present centrality metrics as a radar chart for each player, showing all four measures normalized to the team maximum. This immediately reveals the player's network profile without requiring the reader to compare multiple tables.
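Here is a sketch of the data preparation. It assumes the four metrics have already been computed per player; the values below are hypothetical. Each column is divided by the team maximum. Each player's row is then turned into a closed polygon of angles and radii, which can be drawn on a polar Matplotlib axis, for example `ax = plt.subplot(projection='polar')` followed by `ax.plot(angles, radii)`.

```python
import numpy as np
import pandas as pd


def normalize_to_team_max(metrics):
    """{metric: {player: value}} -> DataFrame with each column scaled by its max.

    The team's top player on each measure scores 1.0; an all-zero column
    (e.g. betweenness in a tiny network) is left at zero.
    """
    df = pd.DataFrame(metrics)
    return df / df.max().replace(0, 1)


def radar_polygon(row):
    """Angles and radii for one player's radar chart, closed into a polygon."""
    values = row.to_numpy(dtype=float)
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    # Repeat the first point so the outline joins up when plotted
    return np.append(angles, angles[0]), np.append(values, values[0])


# Hypothetical, already-computed metrics for three players
metrics = {
    'degree': {'CM': 95, 'CB': 80, 'ST': 20},
    'betweenness': {'CM': 0.30, 'CB': 0.10, 'ST': 0.0},
    'closeness': {'CM': 0.9, 'CB': 0.8, 'ST': 0.5},
    'pagerank': {'CM': 0.15, 'CB': 0.12, 'ST': 0.08},
}
profile = normalize_to_team_max(metrics)
angles, radii = radar_polygon(profile.loc['CM'])
```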

10.3.6 Weighted vs. Unweighted Centrality

A subtle but important choice is whether to use weighted or unweighted centrality. Unweighted centrality treats all connections equally: a pair of players linked by a single pass counts the same as a pair linked by 30 passes. Weighted centrality accounts for the volume of interaction.

In practice, weighted centrality is more informative for soccer analysis because the frequency of interaction matters enormously. A midfielder who passes to the left winger once per match has a qualitatively different relationship than one who finds the winger 15 times. However, unweighted centrality can be useful for measuring the breadth of a player's connections independently of volume -- answering "how many different teammates does this player interact with?" rather than "how often?"

The best practice is to compute both weighted and unweighted centrality and examine any discrepancies. A player with high unweighted degree but low weighted degree has many passing connections but none particularly strong -- a generalist. A player with low unweighted degree but high weighted degree has few connections but uses them intensely -- a specialist with preferred passing partners.
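A small helper makes the comparison concrete. For each player, it reports the number of distinct passing links (unweighted degree), the number of passes those links carry (weighted degree), and their ratio. For a directed network, NetworkX's degree counts incoming and outgoing links separately, so a two-way partnership counts as two links. The network is hypothetical.

```python
import networkx as nx


def degree_breadth_vs_volume(G):
    """{player: (n_links, n_passes, passes_per_link)} for a passing DiGraph.

    A high passes_per_link suggests a specialist with preferred partners;
    a low one suggests a generalist spreading the ball thinly.
    """
    links = dict(G.degree())                  # distinct in + out connections
    passes = dict(G.degree(weight='weight'))  # passes made + received
    return {p: (links[p], passes[p], passes[p] / links[p] if links[p] else 0.0)
            for p in G.nodes()}


G = nx.DiGraph()
G.add_weighted_edges_from([
    ('LB', 'LW', 18), ('LW', 'LB', 12),                 # specialist partnership
    ('CM', 'LB', 2), ('CM', 'RB', 2), ('CM', 'ST', 2), ('CM', 'LW', 2),
    ('RB', 'CM', 2), ('ST', 'CM', 2),                   # generalist hub
])
profile = degree_breadth_vs_volume(G)
```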

10.4 Network-Level Metrics

Beyond individual player importance, we measure properties of the entire network.

10.4.1 Clustering Coefficient

The clustering coefficient measures the degree to which nodes cluster together:

$$C_i = \frac{2T_i}{k_i(k_i-1)}$$

where $T_i$ is the number of triangles through node $i$ and $k_i$ is its degree.

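Averaging $C_i$ across all nodes gives the average clustering coefficient, commonly reported as the network's global clustering. A closely related global measure, transitivity, is the fraction of connected triples of players that close into triangles. High clustering in passing networks indicates triangular passing combinations, the hallmark of possession-based styles.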

Triangles in passing networks represent three players who all pass to each other. These triangular structures are the foundation of combinatory play: player A passes to B, B passes to C, and C passes back to A. Barcelona under Guardiola and Spain during their 2008-2012 dominance were characterized by extremely high clustering coefficients -- their tiki-taka style was built on triangles across the pitch.
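The formula is easy to check by hand on a tiny unweighted network of hypothetical players. A centre-back, central midfielder, and fullback who all combine form one triangle. A striker linked only to the midfielder closes none.

```python
import networkx as nx

G = nx.Graph([('CB', 'CM'), ('CM', 'FB'), ('FB', 'CB'), ('CM', 'ST')])

triangles = nx.triangles(G)   # T_i: triangles through each node
degree = dict(G.degree())     # k_i

# C_i = 2 T_i / (k_i (k_i - 1)), taken as 0 when k_i < 2
manual = {v: (2 * triangles[v] / (degree[v] * (degree[v] - 1))
              if degree[v] > 1 else 0.0)
          for v in G}
# CM: k = 3, T = 1 -> 2 / 6 = 1/3; CB and FB: k = 2, T = 1 -> 1; ST: 0
```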

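def calculate_clustering(G):
    """Calculate local and average (global) clustering coefficients."""
    # Collapse to undirected, summing passes in both directions.
    # (G.to_undirected() keeps only one direction's attributes, so the
    # weight of a two-way partnership would be undercounted.)
    G_undirected = nx.Graph()
    G_undirected.add_nodes_from(G.nodes())
    for u, v, data in G.edges(data=True):
        w = data.get('weight', 1)
        if G_undirected.has_edge(u, v):
            G_undirected[u][v]['weight'] += w
        else:
            G_undirected.add_edge(u, v, weight=w)

    local_clustering = nx.clustering(G_undirected, weight='weight')
    global_clustering = nx.average_clustering(G_undirected, weight='weight')

    return {
        'local': local_clustering,
        'global': global_clustering
    }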

Real-World Application: Low clustering coefficients for specific players may indicate tactical isolation. If a winger has a low clustering coefficient, it means their passing partners do not pass much to each other -- the winger is connecting to different parts of the team rather than participating in triangular combinations. This is typical of wide players in direct, counter-attacking systems.

10.4.2 Network Centralization

While centrality measures individual importance, centralization measures how unequally centrality is distributed across the network:

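$$C = \frac{\sum_{i=1}^{n}(C_{\max} - C_i)}{\max \sum_{i=1}^{n}(C_{\max} - C_i)}$$

where $C_{\max}$ is the maximum centrality in the network, and the denominator is the largest value the numerator can take over all networks with $n$ nodes.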

High centralization indicates a hierarchical team structure with play focused through one or two players. Low centralization suggests distributed, balanced passing patterns.

The theoretical maximum centralization (1.0) occurs in a star graph -- one central node connected to all others, with no connections between the peripheral nodes. In soccer terms, this would be a team where every pass goes through a single player. The theoretical minimum (0.0) occurs in a complete graph where every player passes equally to every other player. Real teams fall between these extremes, typically with centralization values between 0.2 and 0.6.
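Here is a worked check of the two extremes. It uses the undirected, unweighted form of the formula for degree centrality, which simplifies the directed networks used elsewhere in this chapter. With raw degrees, the denominator is $(n-1)(n-2)$, the value the numerator takes for a star.

```python
import networkx as nx


def degree_centralization(G):
    """Freeman degree centralization of an undirected graph (raw degrees)."""
    n = G.number_of_nodes()
    if n < 3:
        return 0.0
    degrees = [d for _, d in G.degree()]
    d_max = max(degrees)
    # Star: hub has degree n-1, the n-1 spokes have degree 1 -> (n-1)(n-2)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


star = nx.star_graph(10)          # 11 players: one hub, ten spokes
complete = nx.complete_graph(11)  # every pair of players connected
print(degree_centralization(star))      # 1.0
print(degree_centralization(complete))  # 0.0
```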

def calculate_centralization(G, centrality_type='degree'):
    """
    Calculate Freeman network centralization.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    centrality_type : str
        'degree' or 'betweenness'

    Returns
    -------
    float
        Centralization score (0 to 1)

    Notes
    -----
    Unweighted centralities are used so that the star graph is the maximum
    and the score stays within [0, 1]. Raw weighted degree is unbounded, and
    NetworkX reads 'weight' as a distance in betweenness, which would invert
    the meaning of pass counts.
    """
    measures = {
        'degree': nx.degree_centrality,
        'betweenness': nx.betweenness_centrality,
    }
    if centrality_type not in measures:
        raise ValueError(f"Unknown centrality type: {centrality_type}")
    measure = measures[centrality_type]

    n = G.number_of_nodes()
    if n < 3:
        return 0.0

    def sum_of_differences(graph):
        values = list(measure(graph).values())
        c_max = max(values)
        return sum(c_max - c for c in values)

    # Denominator: the same sum for a star graph with n nodes, the most
    # centralized network possible (passes flow both ways on each spoke)
    star = nx.star_graph(n - 1)
    if G.is_directed():
        star = star.to_directed()

    max_sum = sum_of_differences(star)
    return sum_of_differences(G) / max_sum if max_sum > 0 else 0.0

Intuition: Centralization answers the question: "Does this team depend on one player or spread the ball evenly?" High centralization teams are vulnerable to opposition targeting their key player; low centralization teams are harder to disrupt but may lack a creative catalyst. Neither extreme is inherently better -- the right balance depends on the team's tactical philosophy and personnel.

10.4.3 Network Density and Team Connectivity

Network density, introduced in Section 10.1.3, deserves further discussion in its tactical context. A team with high network density has many active passing connections -- most player pairs exchange passes during the match. A team with low density has fewer active connections, meaning some player pairs rarely or never exchange passes directly.

High density is generally associated with possession-based styles: these teams move the ball through many different channels, creating alternative passing options. Low density may indicate either a direct style (the ball moves forward quickly through few channels) or a structurally fragmented team (certain parts of the team are disconnected).

However, density alone can be misleading. A team that circulates the ball slowly in their own half, touching every player, will have high density but may not be generating attacking threat. Combining density with spatial information -- where those connections occur -- provides a richer picture.

Connectivity is a related but distinct concept. A network is strongly connected if every node can reach every other node through directed edges. If a passing network is not strongly connected, there exist players who cannot deliver the ball to certain teammates through any sequence of passes -- a sign of severe structural disconnection that usually indicates tactical problems.
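For a directed passing network, density is the number of observed passing links divided by the $n(n-1)$ possible ones, and strong connectivity can be checked directly in NetworkX. In the hypothetical network below, the striker receives passes but never makes one, so the network cannot be strongly connected.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ('GK', 'CB', 8), ('CB', 'GK', 5), ('CB', 'CM', 12),
    ('CM', 'CB', 6), ('CM', 'ST', 4),
])

n, m = G.number_of_nodes(), G.number_of_edges()
density = m / (n * (n - 1))      # directed: n(n-1) possible passing links
print(round(density, 3))         # 0.417 (5 of 12 possible links)

print(nx.is_strongly_connected(G))   # False
# Components show where the flow breaks: {GK, CB, CM} and {ST}
components = list(nx.strongly_connected_components(G))
```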

10.4.4 Structural Entropy

Network entropy measures the unpredictability of passing patterns:

$$H = -\sum_{i,j} p_{ij} \log(p_{ij})$$

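where $p_{ij} = w_{ij} / \sum_{k,l} w_{kl}$ is the share of all the team's passes that go from $i$ to $j$. The code below uses base-2 logarithms, so $H$ is measured in bits.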

High entropy indicates varied, unpredictable passing; low entropy suggests repetitive, predictable patterns. From a tactical perspective, entropy captures a team's passing diversity. A team with high entropy is harder for opponents to predict and defend against because their passing patterns are more varied. A team with low entropy tends to use the same passing routes repeatedly, making them vulnerable to opponents who study these patterns.

However, some predictability can be beneficial: well-drilled passing patterns that players execute automatically can be faster and more precise than novel combinations attempted under pressure.

def calculate_pass_entropy(G):
    """Calculate the entropy of the passing distribution."""
    # Get total passes
    total = sum(G[u][v]['weight'] for u, v in G.edges())

    if total == 0:
        return 0

    entropy = 0
    for u, v, data in G.edges(data=True):
        p = data['weight'] / total
        if p > 0:
            entropy -= p * np.log2(p)

    return entropy

Advanced: You can compute entropy at the player level as well -- measuring the diversity of each player's passing distribution. A player who distributes equally to all teammates has high individual entropy (versatile distributor), while one who always passes to the same teammate has low entropy (predictable). Comparing individual entropy values reveals which players add unpredictability to the team's play.
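The player-level version applies the same formula to each player's outgoing passes only. In this sketch, players who make no passes are assigned zero, and the network is hypothetical.

```python
import numpy as np
import networkx as nx


def player_pass_entropy(G):
    """Shannon entropy (bits) of each player's outgoing pass distribution.

    H_i = -sum_j p_ij log2 p_ij with p_ij = w_ij / sum_k w_ik. The maximum,
    log2(number of receivers), is reached when passes are spread evenly.
    """
    entropy = {}
    for player in G.nodes():
        weights = np.array([d['weight'] for _, _, d
                            in G.out_edges(player, data=True)], dtype=float)
        if weights.sum() == 0:
            entropy[player] = 0.0      # no outgoing passes
            continue
        p = weights / weights.sum()
        p = p[p > 0]
        entropy[player] = float(-(p * np.log2(p)).sum())
    return entropy


G = nx.DiGraph()
G.add_weighted_edges_from([('CM', t, 5) for t in ('LB', 'RB', 'LW', 'ST')])
G.add_weighted_edges_from([('CB', 'GK', 20)])
h = player_pass_entropy(G)
# CM spreads passes evenly over 4 teammates -> log2(4) = 2 bits;
# CB always finds the goalkeeper -> 0 bits
```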

10.4.5 Network Intensity

Network intensity captures how frequently players connect:

$$I = \frac{\sum_{i,j} w_{ij}}{n(n-1)}$$

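This is the average number of passes per possible ordered player pair, with pairs that never connect counted as zero. It therefore differs from the mean weight of the edges that actually exist. Higher intensity means more passes per player pair on average. Intensity captures the tempo of a team's passing game: a team that plays many short, quick passes will have higher intensity than one that plays fewer, longer passes over the same time period. Because the formula does not divide by time, intensities are only comparable between networks built from equal time spans.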
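The calculation is a one-liner on a NetworkX graph. In the hypothetical check below, every ordered pair among 11 players exchanges 4 passes, so the intensity must come out at exactly 4.

```python
import networkx as nx


def network_intensity(G):
    """Total passes divided by the n(n-1) possible ordered player pairs."""
    n = G.number_of_nodes()
    if n < 2:
        return 0.0
    total = sum(d['weight'] for _, _, d in G.edges(data=True))
    return total / (n * (n - 1))


G = nx.complete_graph(11, create_using=nx.DiGraph)
nx.set_edge_attributes(G, 4, 'weight')   # 110 links x 4 passes = 440 passes
print(network_intensity(G))              # 4.0
```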

10.5 Visualization Techniques

Effective visualization transforms complex networks into interpretable graphics.

10.5.1 Basic Network Visualization

The fundamental passing network plot places players at their average positions and draws edges weighted by pass frequency:

import matplotlib.pyplot as plt
from mplsoccer import Pitch

def plot_passing_network(G, positions, team_name, ax=None,
                         min_passes=3, node_size_factor=100):
    """
    Create a passing network visualization.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    positions : dict
        Player -> (x, y) positions
    team_name : str
        Team name for title
    ax : matplotlib.axes
        Axis to plot on (created if None)
    min_passes : int
        Minimum passes to show edge
    node_size_factor : float
        Multiplier for node sizes

    Returns
    -------
    matplotlib.axes
    """
    if ax is None:
        pitch = Pitch(pitch_type='statsbomb', pitch_color='#22312b',
                      line_color='white')
        fig, ax = pitch.draw(figsize=(12, 8))

    # Filter to players with positions
    players_with_pos = [p for p in G.nodes() if p in positions]

    # Calculate node sizes based on total passes
    node_passes = {}
    for player in players_with_pos:
        total = (sum(G[player][s]['weight'] for s in G.successors(player)
                     if s in players_with_pos) +
                 sum(G[p][player]['weight'] for p in G.predecessors(player)
                     if p in players_with_pos))
        node_passes[player] = total

    max_passes = max(node_passes.values()) if node_passes else 1

    # Draw edges
    for u, v, data in G.edges(data=True):
        if u not in positions or v not in positions:
            continue
        if data['weight'] < min_passes:
            continue

        x1, y1 = positions[u]
        x2, y2 = positions[v]

        # Edge width proportional to weight
        width = data['weight'] / 5  # Adjust scaling as needed
        alpha = min(0.8, data['weight'] / 20)

        ax.annotate('', xy=(x2, y2), xytext=(x1, y1),
                   arrowprops=dict(arrowstyle='->', color='white',
                                  lw=width, alpha=alpha,
                                  connectionstyle='arc3,rad=0.1'))

    # Draw nodes
    for player in players_with_pos:
        x, y = positions[player]
        size = node_passes.get(player, 10) / max_passes * node_size_factor + 50

        ax.scatter(x, y, s=size, c='#d00027', edgecolors='white',
                  linewidths=2, zorder=10)

        # Add player name (shortened)
        name_parts = player.split()
        short_name = name_parts[-1] if name_parts else player
        ax.annotate(short_name, (x, y), fontsize=8, ha='center',
                   va='bottom', color='white', fontweight='bold')

    ax.set_title(f'{team_name} Passing Network', fontsize=14, color='white')

    return ax

10.5.2 Network Visualization Best Practices

Visualizing passing networks effectively requires balancing information density with readability. Several best practices have emerged from the analytics community:

Node encoding. Node size should encode a meaningful metric, typically total passes (weighted degree) or a role-specific metric like betweenness. Node color can encode a second dimension, such as position group (defenders, midfielders, and attackers in distinct hues from a colorblind-safe palette) or community membership detected by clustering algorithms.

Edge encoding. Edge width should encode pass volume, and edge opacity can reinforce this (thicker, more opaque edges for frequent connections). Use curved edges (connectionstyle='arc3,rad=0.1') for directed networks so that passes in both directions between a pair are visually distinguishable.

Threshold selection. Show only edges above a meaningful threshold. A common rule of thumb is to set the threshold at 10-15% of the maximum edge weight. For a match where the strongest connection is 25 passes, a threshold of 3-4 passes is appropriate.
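One way to turn the rule of thumb into code is to take a fraction of the strongest link and round it. The 12.5% default sits inside the 10-15% range above. The minimum of 2 passes is an extra assumption that keeps one-off passes off the plot in low-volume matches. The result can be passed as `min_passes` to `plot_passing_network`.

```python
import networkx as nx


def edge_threshold(G, fraction=0.125, floor=2):
    """Minimum pass count for drawing an edge: a fraction of the strongest link."""
    weights = [d['weight'] for _, _, d in G.edges(data=True)]
    if not weights:
        return floor
    return max(floor, round(fraction * max(weights)))


G = nx.DiGraph()
G.add_weighted_edges_from([('CM', 'ST', 25), ('CB', 'CM', 14), ('GK', 'CB', 2)])
threshold = edge_threshold(G)   # 0.125 * 25 = 3.125 -> 3 passes
strong = [(u, v) for u, v, d in G.edges(data=True) if d['weight'] >= threshold]
```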

Background. Place the network on a pitch diagram to provide spatial context. Without the pitch, the positions of nodes lack meaning. Use a dark pitch background with light edges for the best visual contrast.

Labels. Use shortened player names (surname only) to avoid clutter. Position labels slightly above or below nodes, not overlapping. For presentations, consider using jersey numbers instead of names.

Color schemes. Avoid using red and green together (colorblind-unfriendly). Use a single color with varying opacity, or a sequential colormap for edge weights.

Best Practice: Always include a legend or annotation explaining what node size and edge thickness represent. A passing network without context is just circles and lines -- the encoding matters as much as the data.

10.5.3 Enhanced Visualizations

Heatmap Matrix: Show passing frequencies between all pairs:

def plot_pass_matrix(G, players_order=None, ax=None):
    """
    Create a heatmap of passing combinations.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    players_order : list
        Order of players (default: alphabetical)
    ax : matplotlib.axes
        Axis to plot on

    Returns
    -------
    matplotlib.axes
    """
    players = players_order or sorted(G.nodes())
    n = len(players)

    # Build matrix
    matrix = np.zeros((n, n))
    for i, p1 in enumerate(players):
        for j, p2 in enumerate(players):
            if G.has_edge(p1, p2):
                matrix[i, j] = G[p1][p2]['weight']

    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Create heatmap
    im = ax.imshow(matrix, cmap='YlOrRd')

    # Labels
    ax.set_xticks(range(n))
    ax.set_yticks(range(n))
    ax.set_xticklabels([p.split()[-1] for p in players], rotation=45, ha='right')
    ax.set_yticklabels([p.split()[-1] for p in players])

    ax.set_xlabel('Pass Recipient')
    ax.set_ylabel('Passer')
    ax.set_title('Passing Matrix')

    plt.colorbar(im, ax=ax, label='Number of Passes')

    return ax

Chord Diagram: Circular layout showing passing flows:

def plot_chord_diagram(G, positions=None, ax=None):
    """
    Create a chord diagram of passing relationships.

    Uses circular layout to show all passing connections
    with edge width proportional to pass volume.
    """
    players = list(G.nodes())
    n = len(players)

    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 10))

    # Circular positions
    angles = np.linspace(0, 2*np.pi, n, endpoint=False)
    radius = 1.0
    pos = {p: (radius * np.cos(a), radius * np.sin(a))
           for p, a in zip(players, angles)}

    # Draw edges
    max_weight = max((d['weight'] for _, _, d in G.edges(data=True)), default=1)

    for u, v, data in G.edges(data=True):
        x1, y1 = pos[u]
        x2, y2 = pos[v]

        # Curved arrow: arc3 bows each edge away from the straight chord
        width = data['weight'] / max_weight * 5
        alpha = data['weight'] / max_weight * 0.6 + 0.2

        ax.annotate('', xy=(x2, y2), xytext=(x1, y1),
                   arrowprops=dict(arrowstyle='->', color='steelblue',
                                  lw=width, alpha=alpha,
                                  connectionstyle='arc3,rad=0.3'))

    # Draw nodes
    for player in players:
        x, y = pos[player]
        ax.scatter(x, y, s=300, c='darkblue', edgecolors='white',
                  linewidths=2, zorder=10)

        # Label outside the circle
        label_x = x * 1.15
        label_y = y * 1.15
        ax.annotate(player.split()[-1], (label_x, label_y),
                   ha='center', va='center', fontsize=9)

    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title('Passing Chord Diagram')

    return ax

10.5.4 Interactive Visualizations

For exploratory analysis, interactive tools allow zooming, filtering, and hovering for details:

import plotly.graph_objects as go

def create_interactive_network(G, positions):
    """
    Create an interactive passing network using Plotly.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    positions : dict
        Player -> (x, y) positions

    Returns
    -------
    plotly.graph_objects.Figure
    """
    # Create edge traces
    edge_traces = []
    for u, v, data in G.edges(data=True):
        if u not in positions or v not in positions:
            continue

        x0, y0 = positions[u]
        x1, y1 = positions[v]

        edge_traces.append(go.Scatter(
            x=[x0, x1, None],
            y=[y0, y1, None],
            mode='lines',
            line=dict(width=data['weight']/3, color='rgba(100,100,100,0.5)'),
            hoverinfo='text',
            text=f"{u} -> {v}: {data['weight']} passes"
        ))

    # Create node trace
    node_x = [positions[p][0] for p in G.nodes() if p in positions]
    node_y = [positions[p][1] for p in G.nodes() if p in positions]
    node_text = [p for p in G.nodes() if p in positions]
    node_size = [sum(G[p][s]['weight'] for s in G.successors(p)
                     if s in positions) + 10
                 for p in G.nodes() if p in positions]

    node_trace = go.Scatter(
        x=node_x, y=node_y,
        mode='markers+text',
        hoverinfo='text',
        text=[p.split()[-1] for p in node_text],
        textposition='top center',
        marker=dict(
            size=node_size,
            color='red',
            line=dict(width=2, color='white')
        )
    )

    # Combine into figure
    fig = go.Figure(data=edge_traces + [node_trace])
    fig.update_layout(
        showlegend=False,
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        title='Interactive Passing Network'
    )

    return fig

10.6 Advanced Network Analysis

10.6.1 Community Detection and Positional Clustering

Community detection algorithms identify clusters of highly interconnected players:

from networkx.algorithms import community

def detect_passing_communities(G, resolution=1.0):
    """
    Detect communities in the passing network.

    Uses the Louvain algorithm to find clusters of
    highly interconnected players.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network
    resolution : float
        Higher values produce more communities

    Returns
    -------
    list of sets
        Each set contains players in a community
    """
    # Convert to undirected, summing passes in both directions.
    # One pass over the directed edges already combines u->v and v->u.
    G_undirected = nx.Graph()
    G_undirected.add_nodes_from(G.nodes())
    for u, v, data in G.edges(data=True):
        if G_undirected.has_edge(u, v):
            G_undirected[u][v]['weight'] += data['weight']
        else:
            G_undirected.add_edge(u, v, weight=data['weight'])

    # Detect communities
    communities = community.louvain_communities(
        G_undirected, weight='weight', resolution=resolution
    )

    return communities

Communities in passing networks often correspond to:

  • Defensive unit vs. offensive unit
  • Left side vs. right side
  • Build-up players vs. finishing players

The resolution parameter in community detection algorithms controls the granularity of the clusters. At low resolution (e.g., 0.5), you might find just two communities: defense and attack. At high resolution (e.g., 2.0), you might find four or five communities: left defense, right defense, central midfield, left attack, right attack. Experimenting with resolution can reveal the hierarchical structure of the team's passing organization.
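A resolution sweep takes only a few lines with NetworkX's `louvain_communities`. Louvain visits nodes in random order, so a `seed` is fixed for reproducibility. The network is hypothetical: two tight flank units joined by one weak link.

```python
import networkx as nx
from networkx.algorithms import community

left, right = ['LB', 'LCM', 'LW'], ['RB', 'RCM', 'RW']
G = nx.Graph()
for group in (left, right):
    for i, u in enumerate(group):
        for v in group[i + 1:]:
            G.add_edge(u, v, weight=10)   # frequent combinations within a flank
G.add_edge('LCM', 'RCM', weight=2)        # the only link between the flanks

partitions = {
    res: community.louvain_communities(G, weight='weight',
                                       resolution=res, seed=1)
    for res in (0.5, 1.0, 2.0)
}
for res, parts in partitions.items():
    print(res, [sorted(p) for p in parts])
```

Comparing the printed partitions across resolutions shows which groups stay together and which split off first. On real match data, that ordering is what reveals the hierarchy.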

Community detection has practical tactical applications. If the detected communities align with traditional positional groups (defense, midfield, attack), it suggests the team plays in a structured, positional way. If communities cut across positional lines -- for example, grouping a left-back with the left winger and left-sided midfielder -- it reveals strong functional connections along flanks that override positional boundaries.

Advanced: Spectral clustering offers an alternative to Louvain community detection. By computing the eigenvectors of the graph Laplacian, spectral methods can identify communities that are not just densely connected internally but also clearly separated from each other. This can be more robust for passing networks where the community boundaries are not sharp.
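The simplest spectral method is a two-way split by the sign of the Fiedler vector, the eigenvector of the second-smallest Laplacian eigenvalue. The sketch below shows only that bipartition, using the unnormalized Laplacian $L = D - W$ on a symmetric pass matrix. Full $k$-way spectral clustering would instead cluster several eigenvectors, for example with k-means. The matrix is hypothetical.

```python
import numpy as np


def fiedler_bipartition(W):
    """Two-way spectral split of a symmetric pass matrix W.

    The Fiedler vector takes opposite signs on the two sides of a weak cut,
    so its sign pattern gives the split.
    """
    L = np.diag(W.sum(axis=1)) - W
    _, vectors = np.linalg.eigh(L)     # eigenvalues returned in ascending order
    return vectors[:, 1] >= 0


# Hypothetical: players 0-2 and 3-5 form two units joined by one weak link
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 10), (0, 2, 10), (1, 2, 10),
                (3, 4, 10), (3, 5, 10), (4, 5, 10), (1, 4, 2)]:
    W[i, j] = W[j, i] = w

groups = fiedler_bipartition(W)
```

The sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which side is labeled `True`.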

10.6.2 Formation Detection from Passing Networks

Passing networks can be used to infer a team's effective formation, which may differ from its nominal formation. By clustering players on their average positions (the node locations in the network diagram), we can reconstruct the shape of the team during play. The sketch below uses only each player's average x-coordinate, grouping outfield players into lines by depth:

from sklearn.cluster import KMeans

def detect_formation_from_network(positions, n_lines=3):
    """
    Detect formation structure from player positions.

    Groups players into defensive, midfield, and attacking lines
    based on their average x-positions.

    Parameters
    ----------
    positions : dict
        Player -> (x, y) average position
    n_lines : int
        Number of lines to detect (e.g., 3 for defense-midfield-attack)

    Returns
    -------
    dict
        Line assignments and formation string
    """
    # Exclude goalkeeper: assumed to be the player with the lowest average x,
    # i.e. the team attacks towards increasing x (as in StatsBomb coordinates)
    players = list(positions.keys())
    x_positions = np.array([positions[p][0] for p in players])

    # Find goalkeeper
    gk_idx = np.argmin(x_positions)
    outfield_players = [p for i, p in enumerate(players) if i != gk_idx]
    outfield_x = np.array([positions[p][0] for p in outfield_players])

    # Cluster into lines using k-means on x-position
    # (explicit n_init: several restarts give more stable line assignments)
    kmeans = KMeans(n_clusters=n_lines, n_init=10, random_state=42)
    line_labels = kmeans.fit_predict(outfield_x.reshape(-1, 1))

    # Sort lines by average x-position (defense to attack)
    line_centers = kmeans.cluster_centers_.flatten()
    line_order = np.argsort(line_centers)

    # Map labels to ordered lines
    label_map = {old: new for new, old in enumerate(line_order)}
    ordered_labels = [label_map[l] for l in line_labels]

    # Count players per line
    line_counts = [0] * n_lines
    for label in ordered_labels:
        line_counts[label] += 1

    formation_string = '-'.join(str(c) for c in line_counts)

    return {
        'formation': formation_string,
        'line_counts': line_counts,
        'player_lines': dict(zip(outfield_players, ordered_labels))
    }

Common Pitfall: Formation detection from average positions can be misleading when players have very fluid roles. A 4-3-3 where one winger frequently cuts inside and the fullback pushes forward might look like a 3-4-3 based on average positions. Always cross-reference detected formations with tactical video review. Formation detection is a useful starting point, not a definitive answer.

10.6.3 Flow Analysis

Beyond static structure, we can analyze how the ball flows through the team:

def analyze_passing_flow(passes_df, team_name, zones=3):
    """
    Analyze passing flow between pitch zones.

    Parameters
    ----------
    passes_df : DataFrame
        Pass events with locations
    team_name : str
        Team to analyze
    zones : int
        Number of horizontal zones (default: 3 for defensive/middle/attacking)

    Returns
    -------
    DataFrame
        Flow matrix between zones
    """
    team_passes = passes_df[passes_df['team'] == team_name].copy()

    # Assign zones based on x-coordinate (StatsBomb pitch length: 120)
    zone_width = 120 / zones

    def get_zone(loc):
        # Locations are [x, y] pairs; anything else is treated as missing (-1)
        if not isinstance(loc, (list, tuple)) or len(loc) < 2:
            return -1
        return max(0, min(int(loc[0] / zone_width), zones - 1))

    team_passes['start_zone'] = team_passes['location'].apply(get_zone)
    team_passes['end_zone'] = team_passes['pass_end_location'].apply(get_zone)

    # Filter valid passes
    valid = team_passes[(team_passes['start_zone'] >= 0) &
                        (team_passes['end_zone'] >= 0)]

    # Create flow matrix; reindex so zones with no passes keep their row/column
    flow = (valid.groupby(['start_zone', 'end_zone']).size()
                 .unstack(fill_value=0)
                 .reindex(index=range(zones), columns=range(zones), fill_value=0))

    # Name zones (generic labels when zones != 3)
    if zones == 3:
        zone_names = ['Defensive', 'Middle', 'Attacking']
    else:
        zone_names = [f'Zone {i + 1}' for i in range(zones)]
    flow.index = zone_names
    flow.columns = zone_names

    return flow

10.6.4 Temporal Network Dynamics

Match state changes team structure. Analyzing temporal dynamics reveals adaptation patterns:

def analyze_network_dynamics(passes_df, team_name, window_size=10):
    """
    Analyze how network properties change over time.

    Parameters
    ----------
    passes_df : DataFrame
        Pass events
    team_name : str
        Team to analyze
    window_size : int
        Minutes per window

    Returns
    -------
    DataFrame
        Network metrics over time
    """
    max_minute = int(passes_df['minute'].max())
    # +1 so that passes in the final minute fall inside the last window
    windows = range(0, max_minute + 1, window_size)

    results = []

    for start in windows:
        end = start + window_size

        # Filter to window
        window_passes = passes_df[
            (passes_df['minute'] >= start) &
            (passes_df['minute'] < end)
        ]

        # Build network
        G = build_passing_network(window_passes, team_name)

        if G.number_of_edges() == 0:
            continue

        # Calculate metrics
        results.append({
            'window_start': start,
            'window_end': end,
            'num_players': G.number_of_nodes(),
            'total_passes': sum(d['weight'] for _, _, d in G.edges(data=True)),
            'unique_connections': G.number_of_edges(),
            'density': nx.density(G),
            'centralization': calculate_centralization(G)
        })

    return pd.DataFrame(results)

Temporal analysis reveals several common patterns in match dynamics:

  • Early match exploration. In the first 10-15 minutes, networks are often sparser as teams feel out the opponent and settle into their patterns.
  • Score-driven shifts. After scoring, the leading team's network often becomes denser (more conservative passing) while the trailing team's centralization increases (more play through star players seeking an equalizer).
  • Fatigue effects. Late in matches, network density may decrease and average pass distance may increase as tired players resort to longer passes rather than short combinations.
  • Tactical substitution effects. When a defensive midfielder is replaced by an attacking midfielder, the team's centrality distribution shifts, and new passing connections emerge while old ones weaken.

Real-World Application: Temporal network analysis can identify the moment a team's tactical shape collapses. If network density drops sharply in a particular 10-minute window, it may indicate a period where the team lost structural cohesion -- often corresponding to conceding goals. Post-match analysis teams at professional clubs use this kind of temporal breakdown routinely.
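Here is a sketch of how such drops could be flagged automatically from the output of `analyze_network_dynamics`. The 0.15 cut-off is an assumption to tune per competition, and a flagged window is a prompt for video review rather than proof of a collapse. The densities below are hypothetical.

```python
import pandas as pd


def flag_density_drops(dynamics, threshold=0.15):
    """Windows whose density fell by more than `threshold` (absolute)
    relative to the previous window.

    `dynamics` needs 'window_start' and 'density' columns.
    """
    df = dynamics.sort_values('window_start').copy()
    df['density_change'] = df['density'].diff()
    return df[df['density_change'] < -threshold]


dynamics = pd.DataFrame({
    'window_start': [0, 10, 20, 30, 40],
    'density': [0.45, 0.47, 0.44, 0.25, 0.30],
})
drops = flag_density_drops(dynamics)   # flags the 30-40 minute window
```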

10.6.5 Defensive Disruption Through Network Analysis

Passing networks are not only useful for analyzing your own team's play -- they can reveal how to disrupt opponents. By studying an opponent's passing network before a match, analysts can identify structural vulnerabilities:

Removing the hub. If the opponent has a single player with dominant betweenness centrality, pressing or man-marking that player can fragment the entire network. The analytical question is: "What happens to network connectivity if we remove this node?" This can be computed directly:

def simulate_player_removal(G, player_to_remove):
    """
    Simulate removing a player from the network and measure impact.

    Parameters
    ----------
    G : networkx.DiGraph
        Original passing network
    player_to_remove : str
        Player to remove

    Returns
    -------
    dict
        Impact metrics
    """
    # Original metrics
    original_density = nx.density(G)
    original_edges = G.number_of_edges()
    original_total = sum(d['weight'] for _, _, d in G.edges(data=True))

    # Remove player
    G_reduced = G.copy()
    G_reduced.remove_node(player_to_remove)

    # New metrics
    new_density = nx.density(G_reduced)
    new_edges = G_reduced.number_of_edges()
    new_total = sum(d['weight'] for _, _, d in G_reduced.edges(data=True))

    return {
        'player_removed': player_to_remove,
        'density_change': new_density - original_density,
        'edges_lost': original_edges - new_edges,
        'passes_lost': original_total - new_total,
        'passes_lost_pct': ((original_total - new_total) / original_total * 100
                            if original_total else 0.0)
    }

Cutting key edges. Rather than neutralizing a single player, the opponent's most critical passing connections can be targeted. An edge with high betweenness centrality that also carries high volume is a structural artery -- blocking it forces the team to find less practiced alternative routes.
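A sketch of ranking such arteries uses edge betweenness multiplied by pass volume; the product score is an illustrative heuristic. NetworkX reads the `weight` argument of `edge_betweenness_centrality` as a length, so pass counts are first converted to a `distance` of `1 / passes`. The network is hypothetical: every route from the back line to the attackers runs through the centre-back to midfielder link.

```python
import networkx as nx


def structural_arteries(G, top=3):
    """Rank passing links by edge betweenness x pass volume."""
    H = G.copy()
    for _, _, d in H.edges(data=True):
        d['distance'] = 1.0 / d['weight']   # frequent links = short paths
    eb = nx.edge_betweenness_centrality(H, weight='distance')
    scores = {e: eb[e] * H.edges[e]['weight'] for e in H.edges()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


G = nx.DiGraph()
G.add_weighted_edges_from([
    ('GK', 'CB', 10), ('CB', 'GK', 4), ('LB', 'CB', 8), ('CB', 'LB', 8),
    ('CB', 'CM', 15), ('CM', 'CB', 5), ('CM', 'ST', 6), ('CM', 'LW', 7),
    ('LW', 'ST', 3),
])
print(structural_arteries(G))
```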

Overloading one side. If community detection reveals the opponent has strong left-side and right-side communities with a weak connection between them, forcing play to the weaker side (by overloading the strong side defensively) can reduce the opponent's effectiveness significantly.

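Advanced: The concept of "network resilience" from complex network theory is directly applicable here. A team whose network remains connected after the removal of any single player is structurally more resilient than one that fragments. The algebraic connectivity (the second-smallest eigenvalue of the graph Laplacian, computed on an undirected version of the passing network) condenses this into a single number: it is zero when the network is disconnected and larger when the network is harder to split into separate groups. Teams with higher algebraic connectivity are therefore harder to fragment by pressing any single player, though how closely this tracks resistance to real pressing schemes is an empirical question.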
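The calculation can be sketched with NumPy alone. This assumes the directed network is symmetrized by adding pass counts in both directions (one reasonable choice among several), builds the weighted Laplacian L = D - W, and takes its second-smallest eigenvalue. It also lists articulation points, the players whose individual removal disconnects the undirected network. `resilience_summary` is a hypothetical helper name.

```python
import networkx as nx
import numpy as np


def resilience_summary(G):
    """Algebraic connectivity and single points of failure for a passing network.

    Returns (algebraic_connectivity, articulation_points).
    """
    nodes = list(G.nodes())
    if len(nodes) < 2:
        return 0.0, []
    A = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
    W = A + A.T                          # symmetric pass-count matrix
    L = np.diag(W.sum(axis=1)) - W       # weighted graph Laplacian
    eigenvalues = np.linalg.eigvalsh(L)  # ascending for symmetric matrices
    algebraic_connectivity = float(eigenvalues[1])
    # Players whose removal splits the (undirected) network into pieces
    cut_players = sorted(nx.articulation_points(G.to_undirected()))
    return algebraic_connectivity, cut_players
```

Because the weighted value scales with pass volume, comparing teams with very different pass totals is fairer after dividing W by its total (or using unit weights).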

10.6.6 Player Role Identification

Network position can identify tactical roles independent of nominal positions:

import networkx as nx
import numpy as np
import pandas as pd


def identify_network_roles(G):
    """
    Identify player roles based on network position.

    Categories (checked in this order, first match wins):
    - Hub: Top-quartile betweenness and above-median weighted degree
    - Connector: Top-quartile betweenness, median-or-lower degree
    - Distributor: Out-degree more than 1.5x in-degree
    - Target: In-degree more than 1.5x out-degree
    - Peripheral: Bottom-quartile weighted degree
    - Balanced: None of the above

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network (edge attribute 'weight' = pass count)

    Returns
    -------
    DataFrame
        Players with their network metrics and roles
    """
    # Weighted degrees count passes
    degree = dict(G.degree(weight='weight'))
    in_degree = dict(G.in_degree(weight='weight'))
    out_degree = dict(G.out_degree(weight='weight'))

    # NetworkX treats the weight in shortest-path measures as a distance,
    # so convert pass counts (assumed >= 1) to distances: busy links are short
    G_dist = G.copy()
    for _, _, d in G_dist.edges(data=True):
        d['distance'] = 1.0 / d['weight']
    betweenness = nx.betweenness_centrality(G_dist, weight='distance')
    pagerank = nx.pagerank(G, weight='weight')

    # Team-relative thresholds, computed once outside the loop
    betweenness_q75 = np.percentile(list(betweenness.values()), 75)
    degree_median = np.median(list(degree.values()))
    degree_q25 = np.percentile(list(degree.values()), 25)

    results = []

    for player in G.nodes():
        metrics = {
            'player': player,
            'degree': degree.get(player, 0),
            'in_degree': in_degree.get(player, 0),
            'out_degree': out_degree.get(player, 0),
            'betweenness': betweenness.get(player, 0),
            'pagerank': pagerank.get(player, 0)
        }

        # Classify role
        if metrics['betweenness'] > betweenness_q75:
            if metrics['degree'] > degree_median:
                role = 'Hub'
            else:
                role = 'Connector'
        elif metrics['out_degree'] > 1.5 * metrics['in_degree']:
            role = 'Distributor'
        elif metrics['in_degree'] > 1.5 * metrics['out_degree']:
            role = 'Target'
        elif metrics['degree'] < degree_q25:
            role = 'Peripheral'
        else:
            role = 'Balanced'

        metrics['role'] = role
        results.append(metrics)

    return pd.DataFrame(results)

The role classification above is heuristic. More sophisticated approaches use clustering algorithms on the centrality vectors: representing each player as a point in a multi-dimensional centrality space and clustering to find natural groupings. This data-driven approach can discover role types that do not fit neatly into predefined categories.
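A minimal sketch of the clustering approach. It uses a hand-written k-means so that the example needs only NumPy (scikit-learn's `KMeans` would be the usual choice in practice). The feature columns and the number of clusters `k` are assumptions the analyst must choose; `cluster_roles` is a hypothetical name. Each metric is standardized first so that raw pass counts do not swamp betweenness or PageRank values, which live on much smaller scales.

```python
import numpy as np


def cluster_roles(features, k, n_iter=100):
    """Group players by their centrality profiles with plain k-means.

    features : (n_players, n_metrics) array-like, e.g. columns for
        weighted degree, betweenness and PageRank.
    Returns an array of cluster labels (0..k-1), one per player.
    """
    X = np.asarray(features, dtype=float)
    # Standardize each metric so no single scale dominates the distance
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

    # Deterministic farthest-point initialisation
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each player to the nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its players (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The features can come straight from the output of `identify_network_roles`, for example `df[['degree', 'betweenness', 'pagerank']].to_numpy()`, and the resulting clusters can then be compared with the heuristic role labels.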

10.7 Tactical Applications

10.7.1 Team Style Classification

Network properties can classify playing styles:

import networkx as nx


def classify_team_style(G, positions=None):
    """
    Classify team playing style based on network properties.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network (edge attribute 'weight' = pass count)
    positions : dict, optional
        Player -> (x, y) average position; needed for the vertical ratio

    Returns
    -------
    dict
        Style classification with supporting metrics
    """
    metrics = {}

    # Network density
    metrics['density'] = nx.density(G)

    # Centralization (how dependent on key players); calculate_centralization
    # is the helper defined earlier in this chapter
    metrics['centralization'] = calculate_centralization(G)

    # Average path length (how directly they play)
    # Convert weights to distances: frequently used links are "short"
    G_dist = G.copy()
    for u, v, d in G_dist.edges(data=True):
        d['distance'] = 1 / d['weight'] if d['weight'] > 0 else float('inf')
    try:
        metrics['avg_path_length'] = nx.average_shortest_path_length(G_dist, weight='distance')
    except nx.NetworkXException:
        # Raised when the network is empty or not strongly connected,
        # i.e. some player pairs have no passing route between them
        metrics['avg_path_length'] = float('inf')

    # Clustering (triangle play). Note: to_undirected() keeps only one
    # direction's weight for reciprocal pairs rather than summing them.
    G_undirected = G.to_undirected()
    metrics['clustering'] = nx.average_clustering(G_undirected, weight='weight')

    # Vertical vs horizontal tendency, based on average positions rather than
    # individual pass vectors. With StatsBomb coordinates x runs along the
    # pitch length, so dx > dy marks a mostly goal-to-goal (vertical) link.
    if positions:
        vertical_passes = 0
        horizontal_passes = 0
        for u, v, d in G.edges(data=True):
            if u in positions and v in positions:
                dx = abs(positions[v][0] - positions[u][0])
                dy = abs(positions[v][1] - positions[u][1])
                if dx > dy:
                    vertical_passes += d['weight']
                else:
                    horizontal_passes += d['weight']

        metrics['vertical_ratio'] = (vertical_passes / (vertical_passes + horizontal_passes)
                                     if (vertical_passes + horizontal_passes) > 0 else 0.5)

    # Classify (heuristic thresholds; rules are checked in order)
    if metrics['density'] > 0.5 and metrics['clustering'] > 0.3:
        style = 'Possession-Based'
    elif metrics['centralization'] > 0.6:
        style = 'Star-Dependent'
    elif metrics.get('vertical_ratio', 0.5) > 0.6:
        style = 'Direct'
    elif metrics['density'] < 0.3:
        style = 'Fragmented'
    else:
        style = 'Balanced'

    return {
        'style': style,
        'metrics': metrics
    }
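The `vertical_ratio` above is computed from players' average positions, so a pass is classed by where the two players usually stand rather than where the ball actually went. If the event data includes pass end coordinates, the ratio can be computed from the passes themselves. A minimal sketch, assuming StatsBomb-style `location` and `pass_end_location` columns holding [x, y] lists (as in statsbombpy event DataFrames); `vertical_ratio_from_passes` is a hypothetical name.

```python
import pandas as pd


def vertical_ratio_from_passes(passes):
    """Share of passes that travel more along the pitch than across it.

    Assumes 'location' and 'pass_end_location' hold [x, y] lists, with x
    along the pitch length (0-120) and y across its width (0-80).
    Returns 0.5 (neutral) when there are no passes, matching the
    fallback used in classify_team_style.
    """
    if len(passes) == 0:
        return 0.5
    start = pd.DataFrame(passes["location"].tolist(), columns=["x", "y"], index=passes.index)
    end = pd.DataFrame(passes["pass_end_location"].tolist(), columns=["x", "y"], index=passes.index)
    dx = (end["x"] - start["x"]).abs()
    dy = (end["y"] - start["y"]).abs()
    return float((dx > dy).mean())
```

For example, passing the completed-pass table built by `PassingNetworkAnalyzer` in Section 10.8 (its `passes` attribute) gives a per-pass version of the metric for one team.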

10.7.2 Comparing Passing Styles Between Teams

Comparing network properties identifies tactical differences between teams within a match or across a season:

def compare_team_networks(G1, G2, team1_name, team2_name):
    """
    Compare passing networks of two teams.

    Parameters
    ----------
    G1, G2 : networkx.DiGraph
        Passing networks for each team
    team1_name, team2_name : str
        Team names

    Returns
    -------
    DataFrame
        Comparative metrics
    """
    metrics = []

    for G, name in [(G1, team1_name), (G2, team2_name)]:
        # Teams with no passing links are skipped and get no row
        if G.number_of_edges() == 0:
            continue

        total_passes = sum(d['weight'] for _, _, d in G.edges(data=True))

        metrics.append({
            'team': name,
            'total_passes': total_passes,
            'unique_connections': G.number_of_edges(),
            'density': nx.density(G),
            'centralization': calculate_centralization(G),
            # Safe division: empty networks were skipped above
            'avg_passes_per_link': total_passes / G.number_of_edges(),
            'clustering': nx.average_clustering(G.to_undirected(), weight='weight')
        })

    return pd.DataFrame(metrics)

For season-level comparisons, you can aggregate passing networks across multiple matches. This requires careful normalization: sum edge weights across matches, then normalize by minutes played. The resulting "average match" network smooths out match-specific variation and reveals the team's underlying passing structure.
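The aggregation step can be sketched as follows. This version assumes one passing network per match (pass counts in `weight`) and the team's minutes for each match, and it expresses every link as passes per 90 minutes; `aggregate_per90` is a hypothetical name.

```python
import networkx as nx


def aggregate_per90(networks, minutes):
    """Combine several match networks into one 'average match' network.

    networks : list of nx.DiGraph with pass counts in the 'weight' attribute
    minutes  : list of minutes played in each match (same order)
    Edge weights in the result are passes per 90 minutes.
    """
    if len(networks) != len(minutes):
        raise ValueError("need one minutes value per network")
    total_minutes = sum(minutes)
    combined = nx.DiGraph()
    # Sum raw pass counts link by link (input graphs are not modified)
    for G in networks:
        for u, v, d in G.edges(data=True):
            if combined.has_edge(u, v):
                combined[u][v]["weight"] += d["weight"]
            else:
                combined.add_edge(u, v, weight=d["weight"])
    # Rescale the season totals to a per-90 rate
    for _, _, d in combined.edges(data=True):
        d["weight"] = d["weight"] / total_minutes * 90
    return combined
```

Dividing by the team's total minutes is the simplest option. A stricter variant divides each link by the minutes the two players actually shared on the pitch, which avoids understating links that involve substitutes or rotated players; that requires lineup data not shown here.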

Real-World Application: Scouting departments use season-averaged passing networks to assess whether a transfer target would fit a team's playing style. A midfielder from a team with high centralization (star-dependent play) may struggle in a team with low centralization (distributed play) because they are accustomed to receiving the ball in specific patterns that the new team does not create. Network compatibility analysis is an emerging area of recruitment analytics.

10.7.3 Identifying Key Connections

Some passing combinations are more important than raw volume suggests:

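import networkx as nx
import numpy as np
import pandas as pd


def identify_key_connections(G, n_top=10):
    """
    Identify the most important passing connections.

    Uses edge betweenness (the fraction of shortest passing routes
    between all player pairs that run through the edge) combined with
    volume.

    Parameters
    ----------
    G : networkx.DiGraph
        Passing network (edge attribute 'weight' = pass count)
    n_top : int
        Number of top connections to return

    Returns
    -------
    DataFrame
        Top connections with metrics
    """
    # Edge betweenness treats the weight as a distance, so convert pass
    # counts (assumed >= 1) to distances: frequently used links are short
    G_dist = G.copy()
    for _, _, d in G_dist.edges(data=True):
        d['distance'] = 1.0 / d['weight']
    edge_betweenness = nx.edge_betweenness_centrality(G_dist, weight='distance')

    connections = []
    for (u, v), betweenness in edge_betweenness.items():
        weight = G[u][v]['weight']
        connections.append({
            'passer': u,
            'receiver': v,
            'passes': weight,
            'betweenness': betweenness,
            'importance': betweenness * np.log1p(weight)  # Combined score
        })

    # Explicit columns keep nlargest working even for an empty network
    df = pd.DataFrame(connections,
                      columns=['passer', 'receiver', 'passes', 'betweenness', 'importance'])
    return df.nlargest(n_top, 'importance')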

10.8 Practical Implementation

10.8.1 Complete Analysis Pipeline

class PassingNetworkAnalyzer:
    """
    Complete passing network analysis pipeline.

    This class provides end-to-end functionality for building,
    analyzing, and visualizing passing networks from event data.
    """

    def __init__(self, events_df, team_name):
        """
        Initialize analyzer with event data.

        Parameters
        ----------
        events_df : DataFrame
            All match events
        team_name : str
            Team to analyze
        """
        self.events_df = events_df
        self.team_name = team_name

        # Filter to successful passes
        self.passes = events_df[
            (events_df['type'] == 'Pass') &
            (events_df['team'] == team_name) &
            (events_df['pass_outcome'].isna())
        ].copy()

        # Build network
        self.G = self._build_network()

        # Calculate positions
        self.positions = self._calculate_positions()

    def _build_network(self):
        """Build the passing network."""
        G = nx.DiGraph()

        pass_counts = self.passes.groupby(
            ['player', 'pass_recipient']
        ).size()

        for (passer, receiver), count in pass_counts.items():
            if pd.notna(passer) and pd.notna(receiver):
                G.add_edge(passer, receiver, weight=count)

        return G

    def _calculate_positions(self):
        """Calculate average positions."""
        team_events = self.events_df[
            (self.events_df['team'] == self.team_name) &
            (self.events_df['location'].notna())
        ]

        positions = {}
        for player in team_events['player'].unique():
            if pd.isna(player):
                continue

            player_events = team_events[team_events['player'] == player]
            locs = player_events['location'].apply(
                lambda x: x if isinstance(x, list) else None
            ).dropna()

            if len(locs) > 0:
                x_avg = np.mean([loc[0] for loc in locs])
                y_avg = np.mean([loc[1] for loc in locs])
                positions[player] = (x_avg, y_avg)

        return positions

    def get_summary_stats(self):
        """Get summary statistics for the network."""
        G = self.G

        return {
            'total_passes': sum(d['weight'] for _, _, d in G.edges(data=True)),
            'num_players': G.number_of_nodes(),
            'unique_combinations': G.number_of_edges(),
            'density': nx.density(G),
            'clustering': nx.average_clustering(G.to_undirected(), weight='weight'),
            'centralization': calculate_centralization(G)
        }

    def get_player_metrics(self):
        """Calculate all centrality metrics for players."""
        G = self.G

        degree = calculate_degree_centrality(G)
        betweenness = calculate_betweenness(G)
        pagerank = calculate_pagerank(G)

        # Merge
        result = degree.merge(betweenness, on='player').merge(pagerank, on='player')

        return result.sort_values('total_degree', ascending=False)

    def get_top_connections(self, n=10):
        """Get top passing combinations."""
        return identify_key_connections(self.G, n_top=n)

    def plot_network(self, ax=None, **kwargs):
        """Create network visualization."""
        return plot_passing_network(
            self.G, self.positions, self.team_name, ax=ax, **kwargs
        )

    def plot_matrix(self, ax=None):
        """Create passing matrix heatmap."""
        return plot_pass_matrix(self.G, ax=ax)

    def compare_to(self, other_analyzer):
        """Compare this network to another team's."""
        return compare_team_networks(
            self.G, other_analyzer.G,
            self.team_name, other_analyzer.team_name
        )

10.8.2 Example Analysis

import matplotlib.pyplot as plt
from statsbombpy import sb


def complete_network_analysis(match_id):
    """
    Perform complete passing network analysis for a match.

    Parameters
    ----------
    match_id : int
        StatsBomb match ID

    Returns
    -------
    dict
        Analysis results for both teams
    """
    # Load data
    events = sb.events(match_id=match_id)

    # Get team names
    teams = events['team'].unique()

    results = {}

    for team in teams:
        analyzer = PassingNetworkAnalyzer(events, team)

        results[team] = {
            'summary': analyzer.get_summary_stats(),
            'player_metrics': analyzer.get_player_metrics(),
            'top_connections': analyzer.get_top_connections(),
            'analyzer': analyzer
        }

    # Create visualizations
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))

    for i, (team, data) in enumerate(results.items()):
        data['analyzer'].plot_network(ax=axes[0, i])
        data['analyzer'].plot_matrix(ax=axes[1, i])

    plt.tight_layout()

    # Print comparison
    if len(results) == 2:
        team_names = list(results.keys())
        comparison = results[team_names[0]]['analyzer'].compare_to(
            results[team_names[1]]['analyzer']
        )
        print("\nTeam Comparison:")
        print(comparison.to_string(index=False))

    return results

10.9 Limitations and Considerations

10.9.1 Passing Networks Do Not Capture Off-Ball Movement

The most significant limitation of passing networks is that they represent only on-ball events. The vast majority of what happens in a soccer match -- player runs, positional rotations, pressing triggers, space creation -- is invisible to passing networks. A player who makes a brilliant decoy run to create space for a teammate generates zero signal in the passing network, even though their contribution is tactically essential.

This limitation means passing networks systematically undervalue off-ball contributors: pressing forwards who win the ball back through movement rather than interceptions, wide players who stretch defenses by holding width, and defenders whose positioning prevents opposition passes from occurring in the first place.

Common Pitfall: Do not mistake absence from the passing network for absence of contribution. Some of the most tactically important players in a match may have low centrality in the passing network because their value comes from what they do without the ball. Always complement passing network analysis with pressing metrics, tracking data (when available), and video review.

10.9.2 Data Limitations

Passing networks have inherent limitations:

  1. Only captures on-ball events: Off-ball movement, pressing, and space creation are invisible
  2. Binary success/failure: Doesn't capture pass quality beyond completion
  3. Static representation of dynamic play: Aggregating 90 minutes loses temporal information
  4. Squad selection effects: Networks reflect who played, not tactical intentions
  5. Opposition effects: Network shape depends partly on opponent behavior

10.9.3 Interpretation Cautions

  • High centrality isn't always good: A player might be central because teammates have no alternatives
  • Network density context: High density might indicate inability to progress, not just possession quality
  • Formation effects: 4-3-3 and 3-5-2 will produce structurally different networks regardless of playing style
  • Score effects: Networks change drastically when protecting leads or chasing games
  • Small sample sizes: A single match produces a network based on perhaps 300-500 passes, which may not be representative of the team's typical structure. Season-averaged networks are more stable but lose match-specific tactical detail.
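One way to see how much a single match's network can be trusted is to resample its passes and watch how much a metric moves. The sketch below bootstraps one simple quantity, the share of a team's completed passes that involve a given player as passer or recipient. The metric, the column names (`player`, `pass_recipient`, as in statsbombpy) and the function name `bootstrap_share` are illustrative assumptions. Resampling individual passes also treats them as independent, which they are not; resampling whole possessions (a block bootstrap) would be more faithful.

```python
import numpy as np
import pandas as pd


def bootstrap_share(passes, player, n_boot=1000, seed=0):
    """Bootstrap interval for a player's share of a team's passes.

    passes : DataFrame with one row per completed pass and 'player' and
        'pass_recipient' columns. A pass counts once if the player made
        or received it.
    Returns (point_estimate, lower_2.5%, upper_97.5%).
    """
    involved = ((passes["player"] == player) | (passes["pass_recipient"] == player)).to_numpy()
    n = len(involved)
    if n == 0:
        raise ValueError("no passes to resample")
    rng = np.random.default_rng(seed)
    # Each row of `samples` is one resample of pass indices, drawn with replacement
    samples = rng.integers(0, n, size=(n_boot, n))
    shares = involved[samples].mean(axis=1)
    low, high = np.percentile(shares, [2.5, 97.5])
    return float(involved.mean()), float(low), float(high)
```

A wide interval signals that differences between players (or between matches) of that size may be noise rather than tactical structure.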

10.9.4 Best Practices

  1. Always consider match context: Score, opponent, home/away all matter
  2. Use multiple metrics: No single measure captures full picture
  3. Compare like with like: Compare teams in similar situations
  4. Temporal analysis: Segment matches to capture dynamics
  5. Validate visually: Always inspect networks alongside statistics
  6. Normalize for comparison: Adjust for possession time, total passes, or match minutes when comparing networks
  7. Combine with other data sources: Use passing networks alongside xG, xT, pressing metrics, and (where available) tracking data for a complete picture
  8. Document your choices: Record the threshold, weighting scheme, and time window used so that analyses are reproducible
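Several of these practices (segmenting by time, normalizing, and documenting choices) can be built into the code itself. The sketch below keeps every analysis choice in one frozen configuration object that can be saved next to the results, and builds one network per time window. `NetworkConfig`, its fields, and `segmented_networks` are hypothetical names; the column names `minute`, `player` and `pass_recipient` follow statsbombpy's event output.

```python
from dataclasses import dataclass

import networkx as nx
import pandas as pd


@dataclass(frozen=True)
class NetworkConfig:
    """Analysis choices recorded alongside the results (hypothetical schema)."""
    min_passes: int = 3          # drop links with fewer completed passes
    window_minutes: int = 15     # length of each time segment
    weighting: str = "count"     # 'count' or 'share' of the window's passes


def segmented_networks(passes, config):
    """Build one passing network per time window.

    passes : DataFrame of completed passes with 'minute', 'player' and
        'pass_recipient' columns.
    Returns {window_start_minute: nx.DiGraph}.
    """
    networks = {}
    window = (passes["minute"] // config.window_minutes) * config.window_minutes
    for start, chunk in passes.groupby(window):
        counts = chunk.groupby(["player", "pass_recipient"]).size()
        total = counts.sum()  # all passes in the window, before thresholding
        G = nx.DiGraph()
        for (passer, receiver), n in counts.items():
            if n < config.min_passes:
                continue
            w = n / total if config.weighting == "share" else int(n)
            G.add_edge(passer, receiver, weight=w)
        networks[int(start)] = G
    return networks
```

Storing `dataclasses.asdict(config)` with each set of results records the threshold, weighting scheme and time window in one place, which makes a later rerun reproducible.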

Best Practice: When presenting passing network analysis to coaches or non-technical stakeholders, lead with the visualization and let the numbers support what the picture shows. A well-designed passing network diagram communicates more intuitively than a table of centrality scores. Reserve the mathematical details for written reports where readers can engage at their own pace.

10.10 Summary

Passing networks transform qualitative notions of team play into quantitative structures amenable to rigorous analysis. This chapter has covered:

  1. Graph theory foundations: Nodes, edges, adjacency matrices, and fundamental properties including density, degree, reciprocity, and diameter
  2. Network construction: Building networks from event data with proper handling of weighting, thresholds, directionality, positions, substitutions, and game state filtering
  3. Centrality metrics: Degree, betweenness, closeness, eigenvector, and PageRank for measuring player importance from different perspectives
  4. Network-level metrics: Density, clustering, centralization, entropy, and intensity for characterizing team structure
  5. Visualization: Static plots, matrices, chord diagrams, interactive tools, and best practices for effective communication
  6. Advanced analysis: Community detection, formation detection, flow analysis, temporal dynamics, defensive disruption, and role identification
  7. Tactical applications: Style classification, inter-team comparison, and key connection identification
  8. Limitations: Off-ball movement invisibility, data constraints, interpretation cautions, and recommended best practices

The mathematical framework of network theory provides a powerful lens for understanding team dynamics, but meaningful analysis requires combining these tools with soccer knowledge and contextual awareness. Networks reveal structure; interpretation requires expertise.

In the next chapter, we will explore possession and territorial control, extending these ideas to understand how teams control space beyond individual passing connections.

References

  1. Grund, T. U. (2012). Network structure and team performance: The case of English Premier League soccer teams. Social Networks, 34(4), 682-690.

  2. Pena, J. L., & Touchette, H. (2012). A network theory analysis of football strategies. arXiv preprint arXiv:1206.6904.

  3. Clemente, F. M., et al. (2015). General network analysis of national soccer teams in FIFA World Cup 2014. International Journal of Performance Analysis in Sport, 15(1), 80-96.

  4. Buldu, J. M., et al. (2019). Using network science to analyse football passing networks: Dynamics, space, time, and the multilayer nature of the game. Frontiers in Psychology, 10, 1900.

  5. Yamamoto, Y., & Yokoyama, K. (2011). Common and unique network dynamics in football games. PloS One, 6(12), e29638.

  6. Duch, J., Waitzman, J. S., & Amaral, L. A. N. (2010). Quantifying the performance of individual players in a team activity. PloS One, 5(6), e10937.

  7. Goncalves, B., et al. (2017). Exploring team passing networks and player movement dynamics in youth association football. PloS One, 12(1), e0171156.

  8. Rein, R., & Memmert, D. (2016). Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science. SpringerPlus, 5(1), 1410.