Chapter 32: Spanning Trees, Minimum Spanning Trees, and Network Design

DataField.Dev

39 min read

> "The cheapest network that still connects everything is almost never the obvious one — but it is always a tree."

Prerequisites

31
29

Learning Objectives

Define a spanning tree of a connected graph and count the edges any spanning tree must contain.
State the minimum spanning tree (MST) problem precisely and recognize it inside real network-design tasks.
Execute Kruskal's algorithm by hand and implement it, using a union-find data structure to detect cycles efficiently.
Execute and implement Prim's algorithm using a priority queue, and choose between Kruskal and Prim for a given graph.
Prove that the greedy MST algorithms are correct using the cut property, and state the complementary cycle property.
Apply MSTs to network-design problems and explain the limits of what an MST does and does not optimize.

In This Chapter

Overview
Learning Paths
32.1 Spanning trees
32.2 The minimum spanning tree problem
32.3 Kruskal's algorithm and union-find
32.4 Prim's algorithm
32.5 Why the greedy MST algorithms are correct: the cut property
32.6 Network design applications
Project Checkpoint: mst_kruskal for the Toolkit
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 32: Spanning Trees, Minimum Spanning Trees, and Network Design

"The cheapest network that still connects everything is almost never the obvious one — but it is always a tree." — a maxim every network engineer eventually rediscovers

Overview

Suppose you are the engineer responsible for wiring a new data center. There are nine server racks, and any pair of racks could be connected by a fiber run — but cable, conduit, and labor cost money, and the cost differs from pair to pair (some racks are close, some are across the room, some need a run through the ceiling). You do not need every pair connected directly. You only need the racks to form one connected network, so that any rack can reach any other, possibly by hopping through intermediate racks. Question: which cables do you lay to connect everything for the least total cost?

This is not a contrived puzzle. It is the literal shape of laying out an electrical grid, a road network between towns, a clustering of data points, a broadcast tree for streaming, and the original 1926 problem that started the field — designing an electricity network for Moravia. In every case the answer has the same structure, and that structure is a tree: a connected, acyclic subgraph that touches every vertex. The cheapest such tree is a minimum spanning tree, and this chapter is about finding it.

Here is why the result is beautiful rather than merely useful. The naive approach — "try every possible set of cables and keep the cheapest connected one" — is hopeless: even a modest graph has astronomically many spanning subgraphs. Yet two strikingly simple greedy strategies — "always grab the cheapest edge that doesn't create a cycle," or "always grow your tree by its cheapest outgoing edge" — find the provably optimal answer every time. Greedy algorithms usually give good-enough approximations; for the MST problem, greed is exactly right. Proving why greed works here is one of the cleanest correctness arguments in all of algorithms, and it rests on a single idea called the cut property. By the end of this chapter you'll be able to defend the optimality of your network in a design review, not just assert it.

In this chapter you will learn to:

Recognize a spanning tree and count its edges without drawing one.
State the MST problem precisely and spot it disguised inside a real engineering task.
Run Kruskal's algorithm by hand, and implement it with a fast union-find structure that answers "are these two vertices already connected?" in nearly constant time.
Run Prim's algorithm by hand, and implement it with a priority queue.
Prove both algorithms correct via the cut property, and use the cycle property to reason about which edges can be safely discarded.
Apply MSTs to network design — and articulate precisely what an MST optimizes and what it does not.

Learning Paths

🏃 Fast Track: If you already know what a spanning tree is, skim §32.1, then focus on §32.3 (Kruskal + union-find) and §32.4 (Prim) for the algorithms, and read the statement of the cut property in §32.5 even if you skip its proof. Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Trace both algorithms by hand on the worked example before reading our traces. Work every 🔄 Check Your Understanding. Read the cut-property proof in §32.5 slowly — it is the intellectual core of the chapter.

🔬 Deep Dive: After the chapter, prove the cycle property in full (we sketch it), prove that a graph's MST is unique when all edge weights are distinct, and read about the inverse-Ackermann analysis of union-find and Borůvka's algorithm in Further Reading.

32.1 Spanning trees

Let's start with the structure before the optimization. Forget costs for a moment and ask the simpler question: given a connected graph, what is the smallest set of its edges that still keeps every vertex reachable from every other?

Consider this small network of five offices, where an edge means "we could run a cable here":

        A
       / \
      B---C
      |   |
      D---E

The graph has vertices $\{A,B,C,D,E\}$ and edges $\{AB, AC, BC, BD, CE, DE\}$ — six edges. It is connected: you can get from any office to any other. But it is more connected than it needs to be. The triangle $A\text{–}B\text{–}C$ contains a cycle, and the square $B\text{–}D\text{–}E\text{–}C$ contains a cycle. A cycle means redundancy: if edge $BC$ fails, you can still get from $B$ to $C$ via $A$. For pure connectivity, redundancy is wasted edges.

If you delete edges to break every cycle while keeping the graph connected, you are left with a tree — and a tree that includes all of the original vertices is called a spanning tree.

🔗 Connection. You met the term spanning tree informally in Chapter 28, where a breadth-first or depth-first traversal of a graph implicitly builds one (the tree of edges actually used to first reach each vertex). And Chapter 31 gave you the full theory of trees: a tree is a connected acyclic graph, and a tree on $n$ vertices has exactly $n-1$ edges. This chapter puts those two ideas together and then adds weights.

Recall the precise definition from Chapter 31 of a tree: a connected, undirected graph with no cycles. We now name the special trees that live inside a larger graph.

Definition (spanning tree). Let $G = (V, E)$ be a connected, undirected graph. A spanning tree of $G$ is a subgraph $T = (V, E_T)$ with $E_T \subseteq E$ that is a tree and includes every vertex of $G$. Equivalently, $T$ is connected, acyclic, and spans $V$.

Three observations follow immediately, and the third is the one you'll use constantly.

A spanning tree exists if and only if $G$ is connected. (If $G$ is disconnected, no single tree can reach all vertices; if $G$ is connected, you can always delete edges from cycles until none remain — and you never have to disconnect the graph to break a cycle, because every edge on a cycle has an alternative route.)
A graph usually has many spanning trees. Our five-office graph has several; we'll count them shortly.
Every spanning tree of a connected graph on $n$ vertices has exactly $n - 1$ edges. This is not a coincidence — it is forced by the definition of a tree, proved in Chapter 31. For our five offices, $n = 5$, so every spanning tree has exactly $4$ edges. We started with $6$ edges; every spanning tree drops exactly $2$ of them (one per independent cycle).

Here is one spanning tree of the office graph (we deleted $BC$ and $DE$):

Four edges $\{AB, AC, BD, CE\}$, every vertex present, no cycles, fully connected. That is a spanning tree.

💡 Intuition: A spanning tree is the "skeleton" of a connected graph — the minimum scaffolding that holds all the vertices together in one piece. Add any one more edge of $G$ to a spanning tree and you create exactly one cycle (because the two endpoints were already connected by a unique tree path). Remove any one edge from a spanning tree and you split it into exactly two pieces. A spanning tree lives right on the knife's edge between "connected" and "acyclic": it is maximally acyclic and minimally connected at the same time.

How many spanning trees? (a worked example)

Let's count the spanning trees of the office graph. A spanning tree must use $4$ of the $6$ edges and be acyclic and connected. Equivalently, we delete $2$ of the $6$ edges and keep what remains, provided the remainder is still connected and acyclic. There are $\binom{6}{2} = 15$ pairs to delete, so this is small enough to reason about — but it is exactly the kind of error-prone hand-work that motivates the algorithms in the rest of the chapter.

The graph is built from three cycles: the triangle $A\text{–}B\text{–}C$, the square $B\text{–}D\text{–}E\text{–}C\text{–}B$, and the outer pentagon $A\text{–}B\text{–}D\text{–}E\text{–}C\text{–}A$. To leave a tree, the two deleted edges must destroy all three cycles without isolating a vertex. Vertex $A$ is held only by $AB$ and $AC$, so we may not delete both of those; likewise we may not delete both edges at $D$ ($BD, DE$) nor both at $E$ ($CE, DE$). Removing those four "would-isolate-a-vertex" pairs — $\{AB,AC\}$, $\{BD,DE\}$, $\{CE,DE\}$, and $\{BD,CE\}$ (which leaves the triangle plus a stranded $\{D,E\}$) — from the $15$ leaves exactly

$$15 - 4 = 11 \text{ spanning trees.}$$

(You can confirm by checking that each of the other $11$ deletion-pairs leaves a connected, acyclic 4-edge graph.) Eleven spanning trees for a five-vertex graph — and they generally differ in total weight, which is what makes the minimum one worth searching for.

🔗 Connection. There is a gorgeous closed-form answer to "how many spanning trees?" called Kirchhoff's matrix-tree theorem: the number equals any cofactor of the graph's Laplacian matrix. For the complete graph $K_n$ it collapses to Cayley's formula, $n^{n-2}$ — so $K_5$ has $5^3 = 125$ spanning trees. We won't prove these (they belong to a course on algebraic graph theory), but they tell you the search space is enormous, which is precisely why we need clever algorithms rather than enumeration.

🔄 Check Your Understanding 1. A connected graph has $12$ vertices and $30$ edges. How many edges does any spanning tree of it have? How many edges get left out? 2. True or false: if you add one edge of $G$ to a spanning tree $T$, the result still has no cycle. 3. Why must $G$ be connected for a spanning tree to exist?

Answers

A spanning tree has $n - 1 = 11$ edges. It leaves out $30 - 11 = 19$ edges. 2. False — adding any non-tree edge creates exactly one cycle, since its two endpoints were already joined by a unique path in the tree. 3. A tree is connected by definition, and a spanning tree includes every vertex; if $G$ itself were disconnected, no connected subgraph could reach all of $G$'s vertices, so no spanning tree could exist.

32.2 The minimum spanning tree problem

Now put the costs back. In the data-center scenario, each potential cable has a price. We model this with a weighted graph — the same object you studied in Chapter 29, where each edge $e$ carries a number $w(e)$, its weight (here, a cost).

🔗 Connection. In Chapter 29, weights were distances and you wanted the shortest path between two vertices (Dijkstra). Here weights are costs and you want the cheapest tree connecting all vertices. These are different problems with different answers — a common beginner mistake is to think the MST contains the shortest path between every pair, and it does not (§32.6 has a clean counterexample). Same graphs, different optimization.

Here is the office graph again, now with cable costs (in thousands of dollars) written on each edge:

Edge weights: $w(AB)=2$, $w(AC)=3$, $w(BC)=1$, $w(BD)=4$, $w(CE)=5$, $w(DE)=6$.

The cost of a spanning tree is just the sum of the weights of its edges. Let's define that.

Definition (weight of a spanning tree). The weight (or cost) of a spanning tree $T$ of a weighted graph is the sum of the weights of its edges: $$w(T) = \sum_{e \in E_T} w(e).$$

Definition (minimum spanning tree). A minimum spanning tree (MST) of a connected weighted graph $G$ is a spanning tree of minimum total weight — that is, a spanning tree $T$ such that $w(T) \le w(T')$ for every spanning tree $T'$ of $G$.

A graph can have more than one MST (several different trees can tie for the minimum weight), so we say "a minimum spanning tree," not "the." But when all edge weights are distinct, the MST is unique — a fact you'll prove in the exercises.

The MST problem. Input: a connected, undirected, weighted graph $G=(V,E,w)$. Output: a spanning tree of $G$ of minimum total weight.

A worked example (by hand, the obvious way)

For the office graph, let's find the MST by reasoning. Every spanning tree has $4$ edges. The cheapest edges are $BC(1), AB(2), AC(3), BD(4), CE(5), DE(6)$ in that order. Intuition says: grab cheap edges, avoid cycles.

Take $BC$ (cost 1). No cycle. Tree so far: $\{BC\}$.
Take $AB$ (cost 2). Connects $A$. No cycle. Tree: $\{BC, AB\}$.
Take $AC$ (cost 3). But $A$ and $C$ are already connected (via $B$). Adding $AC$ would make the cycle $A\text{–}B\text{–}C\text{–}A$. Skip it.
Take $BD$ (cost 4). Connects $D$. No cycle. Tree: $\{BC, AB, BD\}$.
Take $CE$ (cost 5). Connects $E$. No cycle. Tree: $\{BC, AB, BD, CE\}$ — that's $4$ edges, all $5$ vertices. Done; we never even consider $DE$.

Total cost: $1 + 2 + 4 + 5 = 12$ (thousand dollars). That greedy reasoning is Kruskal's algorithm, which we formalize next. The MST is $\{BC, AB, BD, CE\}$:

Notice what greed did and did not do. It took the globally cheapest edge $BC$ first, and it refused the cheap edge $AC$ because that edge was redundant. It never considered the expensive $DE$ at all, because by the time it would, the graph was already spanned. The whole chapter is, in a sense, the question: why does this obviously-too-simple strategy always produce the true optimum? The answer is §32.5.

⚠️ Common Pitfall: "Minimum spanning tree" optimizes the total weight of the tree, not any individual path. The MST does not guarantee the shortest route between two particular vertices, and it does not minimize the maximum number of hops, and it does not balance the load. If your real objective is "minimize the longest path" or "minimize the most expensive single link," that is a different optimization problem with a possibly different answer. Always check that "minimize total edge cost, keep everything connected" is genuinely what you want before reaching for an MST algorithm.

🔄 Check Your Understanding 1. In the office graph, why did the greedy procedure skip edge $AC$ even though it is cheaper than $BD$ and $CE$? 2. What is the weight of the spanning tree $\{AB, AC, BD, CE\}$ (the one from §32.1)? Is it an MST? 3. If two spanning trees of a graph have the same total weight and that weight is minimal, how many MSTs does the graph have (at least)?

Answers

Because by the time $AC$ was considered, $A$ and $C$ were already connected (through $B$), so $AC$ would have closed a cycle and added cost with no connectivity benefit. 2. $w = 2 + 3 + 4 + 5 = 14$. It is not an MST (the MST has weight $12$); it uses the redundant edge $AC$ instead of the cheaper $BC$. 3. At least two — ties mean the MST is not unique. (Distinct weights would force uniqueness.)

32.3 Kruskal's algorithm and union-find

The hand-procedure above generalizes into one of the two classic MST algorithms, published by Joseph Kruskal in 1956.

Kruskal's algorithm (idea). Sort all edges by weight, cheapest first. Walk down the sorted list; add an edge to the growing forest if and only if it does not create a cycle (i.e., its two endpoints are not already connected). Stop when you have $n-1$ edges.

This is greedy in the purest sense: it considers edges in globally increasing cost order and takes each one unless it is redundant. The result is a single tree (a spanning tree) because we add exactly $n-1$ non-cycle-forming edges to a graph that is connected.

🔗 Connection. You first met the word greedy in Chapter 29, describing how Dijkstra's algorithm always expands the closest unvisited vertex and never reconsiders it. A greedy algorithm builds a solution by repeatedly making the choice that looks best right now, never backtracking. For most problems this is a heuristic that can fail; the remarkable fact, proved in §32.5, is that for the MST problem the greedy choice is always part of an optimal solution.

The one tricky part is the cycle test: "are these two endpoints already connected in the forest I've built so far?" Done naively (e.g., a fresh graph search per edge), this is slow — $O(V)$ or worse per edge, and we do it $E$ times. The right tool is a data structure built exactly for this question.

Union-find (the disjoint-set forest)

Definition (union-find / disjoint-set). A union-find (or disjoint-set) data structure maintains a collection of disjoint sets and supports two operations: - find(x) — return a canonical representative of the set containing $x$ (two elements are in the same set iff they have the same representative); - union(x, y) — merge the sets containing $x$ and $y$ into one.

For Kruskal, each set is a connected component of the forest built so far. Two vertices are "already connected" exactly when find(u) == find(v). Adding an edge that joins two different components is a union.

The standard implementation is a disjoint-set forest: each element points to a parent; following parents leads to a root, which is the representative.

Definition (disjoint-set forest). A disjoint-set forest represents each set as a rooted tree of parent pointers. find(x) follows parent pointers from $x$ to the root; union(x,y) makes one root the parent of the other.

Two optimizations make this almost magically fast:

Definition (union by rank). Union by rank (or union by size) always attaches the shorter tree under the taller one's root, keeping trees shallow. ("Rank" is an upper bound on a tree's height; "size" uses the element count — either works.)

Definition (path compression). Path compression is performed during find: after locating the root, repoint every node visited directly to the root, flattening the tree for all future queries.

With both, a sequence of $m$ operations on $n$ elements runs in $O(m\,\alpha(n))$ time, where $\alpha$ is the inverse Ackermann function — effectively a small constant (at most $4$ or $5$ for any input you will ever encounter). For our purposes, treat find and union as essentially constant-time.

💡 Intuition: Think of each set as a club, and find(x) as asking "who is your club president?" Two people are clubmates iff they name the same president. union merges two clubs under one president. Union by rank keeps the chain of "ask your boss who their boss is…" short by putting the smaller club under the bigger club's president. Path compression means: once you've walked the whole chain of command to find the president, everyone you passed now reports to the president directly, so next time the answer is instant.

Kruskal in pseudocode, then a trace

KRUSKAL(G = (V, E, w)):
    forest = empty set of edges
    uf = UnionFind(V)            # every vertex in its own set
    for each edge (u, v) in E sorted by ascending w:
        if uf.find(u) != uf.find(v):     # endpoints in different components?
            forest.add((u, v))
            uf.union(u, v)
            if len(forest) == len(V) - 1:
                break            # spanning tree complete
    return forest

Trace on the office graph. Sorted edges: $BC(1), AB(2), AC(3), BD(4), CE(5), DE(6)$. Start with five singletons $\{A\},\{B\},\{C\},\{D\},\{E\}$.

Step	Edge	$w$	`find(u)`, `find(v)`	Same set?	Action	Components after
1	$BC$	1	$B \neq C$	no	add, union	$\{A\},\{B,C\},\{D\},\{E\}$
2	$AB$	2	$A \neq B$	no	add, union	$\{A,B,C\},\{D\},\{E\}$
3	$AC$	3	$A = C$	yes	skip (cycle)	$\{A,B,C\},\{D\},\{E\}$
4	$BD$	4	$B \neq D$	no	add, union	$\{A,B,C,D\},\{E\}$
5	$CE$	5	$C \neq E$	no	add, union	$\{A,B,C,D,E\}$

After step 5 the forest has $4 = n-1$ edges, so we stop (never examining $DE$). Output: $\{BC, AB, BD, CE\}$, weight $12$ — matching our hand-derivation. Edge $AC$ was correctly rejected the moment find(A) == find(C).

Kruskal in Python

Here is a clean, from-scratch implementation. We represent the graph as a list of weighted edges, which is exactly what Kruskal wants.

def kruskal(n, edges):
    """n = number of vertices (0..n-1); edges = list of (w, u, v).
    Returns (mst_edges, total_weight)."""
    parent = list(range(n))                  # union-find: parent[i]
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression (halving)
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):            # ascending weight
        ru, rv = find(u), find(v)
        if ru != rv:                         # different components?
            parent[ru] = rv                  # union
            mst.append((u, v, w))
            total += w
    return mst, total

# Office graph: A=0,B=1,C=2,D=3,E=4
edges = [(2,0,1),(3,0,2),(1,1,2),(4,1,3),(5,2,4),(6,3,4)]
print(kruskal(5, edges))
# Expected output:
# ([(1, 2, 1), (0, 1, 2), (1, 3, 4), (2, 4, 5)], 12)

Hand-tracing the output: after sorting, the edges are processed as $(1,1,2),(2,0,1),(3,0,2),(4,1,3),(5,2,4),(6,3,4)$. We add $(1,2)$, add $(0,1)$, skip $(0,2)$ [same component], add $(1,3)$, add $(2,4)$ [now 4 edges, but the loop simply finishes since later edges would all be cycles]. The MST edges in insertion order are $(1,2),(0,1),(1,3),(2,4)$ with total $1+2+4+5 = 12$. ✓

Running time. Sorting the edges dominates: $O(E \log E)$, which is $O(E \log V)$ since $E \le V^2$ implies $\log E \le 2\log V$. The union-find work is $O(E\,\alpha(V))$, negligible by comparison. So Kruskal runs in $\boxed{O(E \log V)}$ time. If the edges arrive already sorted (or are sortable in linear time), it drops to nearly $O(E\,\alpha(V))$.

🔄 Check Your Understanding 1. In Kruskal's trace, exactly one edge was skipped. Which one, and what was the value of find for its two endpoints at that moment? 2. Why does Kruskal sort edges in ascending weight rather than descending? 3. What does the union-find structure represent at any point during Kruskal's run?

Answers

Edge $AC$ was skipped; at that moment find(A) == find(C) (both had the same representative because $A,B,C$ were already in one component). 2. Greedy MST construction takes the cheapest safe edges first; ascending order ensures that when we add an edge it is the cheapest one that still connects two components. (Descending order is the reverse-delete algorithm — a different correct method — not Kruskal.) 3. The connected components of the forest built so far: each disjoint set is one component, and two vertices share a set iff a path between them already exists in the chosen edges.

32.4 Prim's algorithm

Kruskal grows a forest — many little trees that eventually merge. Prim's algorithm (Vojtěch Jarník 1930, rediscovered by Robert Prim 1957 and Edsger Dijkstra 1959) takes the opposite approach: grow one tree from a starting vertex, always adding the cheapest edge that reaches a new vertex.

Prim's algorithm (idea). Start with any single vertex as a one-vertex tree. Repeatedly find the cheapest edge that connects a vertex already in the tree to a vertex not yet in the tree, and add that edge (and the new vertex). Repeat until all vertices are in the tree.

If Kruskal is "cheapest safe edge anywhere," Prim is "cheapest edge leaving my current tree." Both are greedy; they just define the local choice differently. Prim looks a lot like Dijkstra — and indeed uses the same supporting structure.

🔗 Connection. Prim's algorithm has nearly the same skeleton as Dijkstra's shortest-path algorithm from Chapter 29: maintain a frontier, repeatedly extract the cheapest frontier edge with a priority queue (a heap, from Chapter 31), and relax. The only substantive change is the key you store. Dijkstra keys a frontier vertex by its total distance from the source; Prim keys it by the weight of the single cheapest edge connecting it to the current tree. That one-line difference turns shortest-paths into minimum-spanning-tree.

Prim in pseudocode, then a trace

PRIM(G = (V, E, w), start):
    in_tree = { start }
    tree_edges = empty set
    while in_tree != V:
        find the minimum-weight edge (u, v) with u in in_tree, v not in in_tree
        tree_edges.add((u, v))
        in_tree.add(v)
    return tree_edges

Trace on the office graph, starting at $A$. At each step we list the edges crossing from the tree to outside, and pick the cheapest.

Step	Tree vertices	Candidate crossing edges (weight)	Cheapest	Add	Tree edges so far
1	$\{A\}$	$AB(2), AC(3)$	$AB(2)$	$B$	$AB$
2	$\{A,B\}$	$AC(3), BC(1), BD(4)$	$BC(1)$	$C$	$AB, BC$
3	$\{A,B,C\}$	$BD(4), CE(5)$	$BD(4)$	$D$	$AB, BC, BD$
4	$\{A,B,C,D\}$	$CE(5), DE(6)$	$CE(5)$	$E$	$AB, BC, BD, CE$

(Edge $AC$ is never chosen — by step 2 both $A$ and $C$ are inside, so $AC$ no longer crosses the boundary; same for $DE$ at the end.) Output: $\{AB, BC, BD, CE\}$, weight $2+1+4+5 = 12$.

Same MST as Kruskal. Both algorithms returned a tree of weight $12$ (here, the same tree, because the weights are distinct so the MST is unique). The order in which edges were added differs — Kruskal took $BC$ first, Prim took $AB$ first — but the final tree is identical. That is the cut property at work, as we'll prove next.

💡 Intuition: Picture Prim's tree as a growing ink blot on the graph. At every moment there is a boundary between "inked" (in the tree) and "blank" (not yet) vertices. Prim always crosses that boundary at its cheapest point, then re-draws the boundary. The boundary is exactly a cut, and "the cheapest edge crossing the current cut is safe" is precisely the theorem that makes both algorithms correct.

Prim in Python (with a heap)

We store the graph as an adjacency list and use Python's heapq as the priority queue.

import heapq

def prim(n, adj, start=0):
    """adj[u] = list of (w, v). Returns (mst_edges, total_weight)."""
    in_tree = [False] * n
    mst, total = [], 0
    pq = [(0, start, -1)]                     # (edge_weight, vertex, parent)
    while pq:
        w, u, parent = heapq.heappop(pq)
        if in_tree[u]:
            continue                          # stale entry; skip
        in_tree[u] = True
        if parent != -1:
            mst.append((parent, u, w))
            total += w
        for ew, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(pq, (ew, v, u))
    return mst, total

# Office graph as adjacency list (A=0,B=1,C=2,D=3,E=4)
adj = {0:[(2,1),(3,2)], 1:[(2,0),(1,2),(4,3)], 2:[(3,0),(1,1),(5,4)],
       3:[(4,1),(6,4)], 4:[(5,2),(6,3)]}
print(prim(5, adj))
# Expected output:
# ([(0, 1, 2), (1, 2, 1), (1, 3, 4), (2, 4, 5)], 12)

Hand-tracing: pop $(0,0,-1)$ → add $0$ (root, no edge); push $A$'s neighbors $(2,1),(3,2)$. Pop $(2,1,0)$ → add edge $(0,1,2)$; push $B$'s neighbors $(1,2),(4,3)$ (skip $0$, in tree). Pop $(1,2,1)$ → add edge $(1,2,1)$; push $C$'s neighbors $(5,4)$ and stale $(3,0)$. Pop $(3,2,0)$ → $2$ already in tree, skip. Pop $(4,3,1)$ → add edge $(1,3,4)$; push $(6,4)$. Pop $(5,4,2)$ → add edge $(2,4,5)$. Remaining pops are all stale. MST edges $(0,1),(1,2),(1,3),(2,4)$, total $2+1+4+5 = 12$. ✓

Running time. With a binary heap, each of the $E$ edges may be pushed once and each pop is $O(\log V)$, giving $O(E \log V)$ — the same asymptotic bound as Kruskal. With a Fibonacci heap the bound improves to $O(E + V\log V)$, which matters only for very dense graphs.

🔄 Check Your Understanding 1. In Prim's trace starting at $A$, why was edge $AC$ never added even though it has weight $3 < 4 = w(BD)$? 2. Kruskal and Prim both produced weight-$12$ trees. Under what condition are they guaranteed to produce the exact same tree (not just equal weight)? 3. In the Python prim, what is the purpose of the if in_tree[u]: continue line?

Answers

After step 2, both endpoints $A$ and $C$ are already inside the tree, so $AC$ no longer crosses the tree/non-tree boundary; Prim only ever considers crossing edges. (When $AC$ did cross, in step 2, edge $BC(1)$ was cheaper.) 2. When all edge weights are distinct, the MST is unique, so any correct MST algorithm returns the same tree. With ties, different algorithms (or tie-breaking orders) may return different MSTs of equal weight. 3. It discards stale heap entries — a vertex can be pushed multiple times with different candidate edge weights, and once it is already in the tree we ignore any later (more expensive) entry for it.

32.5 Why the greedy MST algorithms are correct: the cut property

We have two greedy algorithms that seem to work on examples. In Chapter 6 you internalized the book's second theme: passing tests is not the same as being correct. Greedy algorithms are exactly where this bites — most greedy strategies produce suboptimal answers on some input. So we owe you a proof that Kruskal and Prim are always optimal, not merely usually. The entire argument reduces to one lemma about cuts.

Definition (cut and crossing edge). A cut of a graph $G=(V,E)$ is a partition of its vertices into two non-empty sets, $(S, V \setminus S)$. An edge crosses the cut if it has one endpoint in $S$ and the other in $V \setminus S$; such an edge is a crossing edge. A light edge crossing a cut is a crossing edge of minimum weight among all crossing edges.

A cut is just a way of slicing the vertices into two camps; the crossing edges are the ones that would have to be "cut" to fully separate the camps. Here is the theorem on which everything rests.

Theorem (the cut property). Let $G$ be a connected weighted graph, and let $(S, V \setminus S)$ be any cut of $G$. If an edge $e$ is a light edge crossing this cut (a minimum-weight crossing edge), then $e$ belongs to some minimum spanning tree of $G$. Moreover, if $e$ is the unique minimum-weight crossing edge, then $e$ belongs to every MST.

The strategy first. This is a classic exchange argument, and the shape is worth remembering because it proves the correctness of many greedy algorithms. We don't try to build the MST. Instead we assume we already have an MST $T$ that, perhaps, does not contain our light edge $e$. We then surgically swap $e$ into $T$ and swap out some other crossing edge, and show the swap does not increase the weight. Since $T$ was already minimal, the swapped tree is also an MST — and it contains $e$. Conclusion: some MST contains $e$. The only thing we must verify is that the swap really does yield a valid spanning tree, and this is where the "add an edge to a tree, get exactly one cycle" fact from §32.1 earns its keep.

Proof. Let $e = (u, v)$ be a light edge crossing the cut $(S, V \setminus S)$, with $u \in S$ and $v \in V \setminus S$. Let $T$ be any MST of $G$. If $e \in T$, we are done — $e$ is in this MST. So assume $e \notin T$.

Because $T$ is a spanning tree, it connects $u$ and $v$ by a unique path $p$. Since $u \in S$ and $v \in V\setminus S$, the path $p$ must cross the cut at least once — somewhere along $p$ there is an edge $e' = (x, y)$ with $x \in S$ and $y \in V\setminus S$. So $e'$ is also a crossing edge.

Now perform the exchange. Form $T' = (T \setminus \{e'\}) \cup \{e\}$: remove the crossing tree-edge $e'$ and insert our light edge $e$. We claim $T'$ is a spanning tree:

Adding $e$ to the tree $T$ creates exactly one cycle (the cycle is $e$ together with the path $p$, since $u,v$ were joined by $p$). The edge $e'$ lies on that cycle (it is on $p$). Removing an edge that lies on the unique cycle of $T \cup \{e\}$ breaks the cycle without disconnecting anything. So $T'$ is connected and acyclic — a tree.
$T'$ has the same number of edges as $T$ (we removed one, added one), namely $n-1$, and spans all vertices. So $T'$ is a spanning tree. ✓

Compare weights. Since $e$ is a light (minimum-weight) crossing edge and $e'$ is some crossing edge, $w(e) \le w(e')$. Therefore $$w(T') = w(T) - w(e') + w(e) \le w(T).$$ But $T$ is an MST, so no spanning tree can weigh less than $T$; hence $w(T') = w(T)$, and $T'$ is also an MST. Since $e \in T'$, we have exhibited an MST containing $e$.

For the "moreover": if $e$ is the unique minimum-weight crossing edge, then $w(e) < w(e')$ strictly for every other crossing edge $e'$. If some MST $T$ omitted $e$, the same exchange would give $w(T') = w(T) - w(e') + w(e) < w(T)$, contradicting the minimality of $T$. So every MST must contain $e$. $\blacksquare$

🚪 Threshold Concept. The cut property is the moment greedy algorithms stop being heuristics and become provably optimal. The deep idea is the exchange argument: to prove a greedy choice is safe, assume an optimal solution that disagrees with your choice, then transform it into one that agrees without making it worse. Once you see this pattern, you will recognize it behind the correctness of Dijkstra, Huffman coding (Chapter 31), interval scheduling, and many more. "Why is greedy correct here?" almost always has the answer "because an exchange argument shows the greedy choice never costs anything." This is one of the most transferable proof techniques in all of algorithm design.

The cut property makes both algorithms correct

The cut property is a master key. Watch how each algorithm is just "repeatedly apply the cut property."

Corollary (Prim is correct). At every step of Prim's algorithm, let $S$ be the set of vertices already in the tree. The edge Prim adds is, by construction, the minimum-weight edge crossing the cut $(S, V\setminus S)$ — a light edge. By the cut property, that edge belongs to some MST. Since every edge Prim adds is safe (extendable to an MST) and Prim adds exactly $n-1$ edges forming a spanning tree, the result is an MST. $\blacksquare$

Corollary (Kruskal is correct). When Kruskal adds an edge $e=(u,v)$, it does so because $u$ and $v$ are in different components. Let $S$ be the component containing $u$ (just before adding $e$). No edge Kruskal has accepted so far crosses the cut $(S, V\setminus S)$ — those edges all live inside components. Every edge crossing this cut is still unprocessed, hence has weight $\ge w(e)$ (Kruskal processes edges in ascending order, and any cheaper crossing edge would have been examined earlier and, since it joins two different components, accepted — contradiction). So $e$ is a light edge for this cut, and by the cut property it is safe. Thus every edge Kruskal accepts belongs to some MST, and the $n-1$ accepted edges form an MST. $\blacksquare$

The strategy first (Kruskal corollary), restated plainly. The subtle line is "every crossing edge is still unprocessed." Why? Kruskal goes cheapest-first. Suppose a crossing edge $f$ were cheaper than $e$ and had already been processed. Being a crossing edge, $f$ joins $S$ to the outside — two different components at the time it was seen — so Kruskal would have accepted $f$, which would have merged those components and changed what $S$ is. The only way $S$ is what it is now is that no such cheaper crossing edge was accepted. Hence $e$ is the cheapest edge across this particular cut. That is exactly "light edge."

The complementary tool: the cycle property

The cut property tells you which edges to keep. Its mirror image tells you which edges you may throw away.

Theorem (the cycle property). For any cycle $C$ in a connected weighted graph, if an edge $e$ on $C$ has strictly greater weight than every other edge on $C$ (a unique heaviest edge of the cycle), then $e$ is in no minimum spanning tree.

The strategy first. Same exchange flavor, run in reverse. Suppose some MST $T$ contained the heaviest cycle-edge $e$. Removing $e$ splits $T$ into two pieces; the cycle $C$ must contain another edge bridging those two pieces (because $C$ is a cycle, it reconnects them). Swap that cheaper bridging edge in for $e$ — the result is a spanning tree of strictly smaller weight, contradicting that $T$ was minimum.

Proof. Let $e=(u,v)$ be the unique heaviest edge on cycle $C$, and suppose for contradiction that $e$ lies in some MST $T$. Removing $e$ from $T$ disconnects $T$ into two components, $T_u$ (containing $u$) and $T_v$ (containing $v$). The cycle $C$ runs from $u$ to $v$ along $e$ and also the long way around; that long way starts in $T_u$ and ends in $T_v$, so it must contain at least one edge $f \ne e$ with one endpoint in $T_u$ and the other in $T_v$. Form $T' = (T \setminus \{e\}) \cup \{f\}$: this reconnects the two components into a spanning tree. Since $e$ was the unique heaviest edge of $C$ and $f$ is another edge of $C$, we have $w(f) < w(e)$, so $$w(T') = w(T) - w(e) + w(f) < w(T),$$ contradicting the minimality of $T$. Hence no MST contains $e$. $\blacksquare$

The cycle property is the engine behind the reverse-delete algorithm (sort edges descending; delete the heaviest edge whenever doing so keeps the graph connected) and it explains, at last, why Kruskal was right to skip edge $AC$: $AC$ was the heaviest edge of the triangle $A\text{–}B\text{–}C$, so it belongs to no MST.

🔄 Check Your Understanding 1. State the cut property in one sentence, and name the proof technique it uses. 2. In the office graph, take the cut $S = \{A, B\}$ versus $\{C, D, E\}$. List the crossing edges and identify the light edge. Is that light edge in the MST we found? 3. Use the cycle property to explain in one sentence why edge $DE(6)$ is in no MST of the office graph.

Answers

For any cut, a minimum-weight edge crossing the cut belongs to some MST — proved by an exchange argument. 2. Crossing edges (one endpoint in $\{A,B\}$, the other in $\{C,D,E\}$): $AC(3)$, $BC(1)$, $BD(4)$. The light edge is $BC(1)$. Yes — $BC$ is in our MST. 3. $DE(6)$ is the unique heaviest edge of the cycle $B\text{–}D\text{–}E\text{–}C\text{–}B$ (weights $4,6,5,1$), so by the cycle property it is in no MST.

32.6 Network design applications

MSTs are not a textbook curiosity; they are the backbone of how engineers connect things cheaply. But using them well means knowing both their power and their boundaries.

Where MSTs are exactly the right tool

Physical infrastructure layout. Wiring a building or data center, laying water/gas/power lines between facilities, connecting towns to a utility — whenever the cost is "total length/price of links" and you only need everything connected, the MST is the optimal layout. This is literally the problem Otakar Borůvka solved in 1926 for the Moravian electrical grid, the first MST algorithm.
Clustering (single-linkage). Build the MST of your data points (edge weight = distance), then delete the $k-1$ most expensive MST edges. The graph falls into $k$ connected components — these are the single-linkage clusters. MSTs give you a hierarchy of clusters essentially for free, which is why they appear in unsupervised machine learning and image segmentation.
Approximation for hard problems. The Traveling Salesman Problem (Chapter 30) is NP-hard, but for metric instances you can build an MST, walk it (a depth-first traversal), and shortcut repeats to get a tour at most twice the optimal — a guaranteed 2-approximation built directly on the MST. The MST is the scaffolding under several approximation algorithms for intractable problems.
Broadcast and minimum-cost connectivity in networks. When a message must reach all nodes and you pay per link used, an MST minimizes total link cost.

A worked network-design decision

Imagine a regional ISP must connect six towns with fiber. The survey gives the cost (in $\$10\text{k}$) of laying fiber along each feasible route:

   Town graph (edge = feasible route, label = cost):
   1 --7-- 2
   |  \    |
   5   9   8
   |    \  |
   3 --15- 4 --6-- 6
    \      |     /
     11    3    12
       \   |   /
        \  5  /
         \ | /
           5(node)

(Read the labels as the cost of each route; node 5 connects to 3, 4, and 6.) The ISP wants the cheapest set of routes that connects all six towns. That is an MST. Running Kruskal — sort all routes ascending, add each unless it forms a cycle — selects the cheapest spanning subset. The decision the ISP actually makes is: build exactly the MST edges, skip the rest, and the total is provably the minimum possible fiber budget that still connects every town. No spreadsheet of alternatives can beat it, and the cut property is your proof in the procurement meeting.

The Python tools you built (kruskal, prim) are precisely what an engineer runs on the survey data. The hand-trace is what you do on the whiteboard to convince a skeptical manager.

Where MSTs are the wrong tool (know the boundary)

⚠️ Common Pitfall: the MST is not a shortest-path tree. Consider a triangle with vertices $X, Y, Z$ and weights $w(XY)=1$, $w(YZ)=1$, $w(XZ)=3$. The MST is $\{XY, YZ\}$ (weight $2$), so in the MST the distance from $X$ to $Z$ is $1+1 = 2$. But the direct edge $XZ$ has weight $3$, and the actual shortest path $X\to Z$ in the full graph is $XZ$ length... $3$? No — the shortest path is via $Y$, length $2$. Here they happen to agree. Change it: weights $w(XY)=1$, $w(YZ)=1$, $w(XZ)=1.5$. Now the MST is $\{XY, YZ\}$ (weight $2$, cheaper than any tree using $XZ$), giving $X$-to-$Z$ distance $2$ in the tree — but the true shortest path is the direct edge $XZ$ of length $1.5$. The MST sacrificed a short individual route to minimize the total. If your users care about per-pair latency, build a shortest-path tree (Chapter 29), not an MST. They optimize different things.

Other boundaries worth stating plainly:

An MST minimizes total edge weight, not the maximum edge weight (though, as it happens, an MST does also minimize the maximum edge weight on the path between any two vertices — the "minimax path" property — a nice bonus, proved in the exercises).
An MST provides no redundancy: it is a tree, so any single link failure disconnects the network. Real networks add edges back for fault tolerance, trading cost for resilience. The MST is the minimum-cost baseline, not the most robust design.
MST algorithms assume a connected graph. On a disconnected graph the same greedy methods produce a minimum spanning forest (one MST per component) instead.

🔄 Check Your Understanding 1. You must connect $8$ offices with the least total cabling and you don't care about hop count or redundancy. MST or shortest-path tree? 2. After building an MST for clustering, how do you split the data into exactly $4$ clusters? 3. Give one reason a real network designer would deliberately add edges beyond the MST.

Answers

MST — your objective is minimum total connection cost with everything connected, which is exactly what an MST optimizes. 2. Delete the $4 - 1 = 3$ heaviest edges of the MST; the tree falls into $4$ connected components (single-linkage clusters). 3. Fault tolerance / redundancy — a tree has no alternate routes, so one failed link disconnects part of the network; adding edges (creating cycles) provides backup paths, trading extra cost for resilience.

Project Checkpoint: `mst_kruskal` for the Toolkit

Time to add minimum spanning trees to dmtoolkit's graphs.py module — the same module where Chapters 27–28 built the Graph class and traversals, and Chapter 29 added dijkstra. We implement the canonical function mst_kruskal(g), reusing the union-find trick from §32.3. This is the piece the capstone's network-analyzer track (Chapter 39) calls to find the cheapest backbone of a graph.

We add a tiny UnionFind helper and the mst_kruskal function. Assume the Graph class exposes its weighted edges via g.edges() yielding (u, v, weight) triples and its vertices via g.vertices() (both established in Chapter 27's checkpoint).

# dmtoolkit/graphs.py  (add to the existing module)

class UnionFind:
    """Disjoint-set forest with path compression."""
    def __init__(self, items):
        self.parent = {x: x for x in items}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry
            return True          # a real merge happened
        return False             # already connected (would form a cycle)

def mst_kruskal(g):
    """Return (mst_edges, total_weight) for a connected weighted Graph g.
    mst_edges is a list of (u, v, weight). Uses Kruskal + union-find."""
    uf = UnionFind(g.vertices())
    mst, total = [], 0
    for u, v, w in sorted(g.edges(), key=lambda e: e[2]):  # ascending weight
        if uf.union(u, v):       # only add if it joins two components
            mst.append((u, v, w))
            total += w
    return mst, total

A quick check, using the office graph encoded as edges (the Graph API wraps these):

# Conceptually: g has edges A-B:2, A-C:3, B-C:1, B-D:4, C-E:5, D-E:6
# print(mst_kruskal(g))
# Expected output:
# ([('B', 'C', 1), ('A', 'B', 2), ('B', 'D', 4), ('C', 'E', 5)], 12)

Hand-tracing against §32.3's table: sorted edges put $BC(1)$ first, then $AB(2)$, then $AC(3)$ (rejected — union returns False), then $BD(4)$, then $CE(5)$, then $DE(6)$ (rejected). The accepted edges in order are $BC, AB, BD, CE$ with total $1+2+4+5 = 12$. ✓

Toolkit so far: logic.py (Ch. 1–3), sets.py (Ch. 8), relations.py (Ch. 12–13), combinatorics.py (Ch. 15–17), recurrences.py (Ch. 6, 18–19), probability.py (Ch. 20–21), numbertheory.py and crypto.py (Ch. 22–25), and graphs.py now carrying bfs, dfs, dijkstra, and mst_kruskal. In Chapter 34 you'll add max_flow to complete the graph toolkit; the capstone's Track B (the social-network analyzer) will call mst_kruskal to extract a graph's cheapest connecting backbone.

Summary

A spanning tree of a connected graph $G=(V,E)$ is an acyclic connected subgraph touching every vertex; it always has $\lvert V\rvert - 1$ edges. A minimum spanning tree (MST) is a spanning tree of least total edge weight. Two greedy algorithms find one optimally.

	Kruskal	Prim
Grows	a forest of trees that merge	one tree from a start vertex
Local choice	cheapest edge anywhere that joins two components	cheapest edge leaving the current tree
Key structure	union-find (cycle test)	priority queue / heap
Cycle test / frontier	`find(u) != find(v)`	min crossing edge
Running time	$O(E \log V)$ (sorting dominates)	$O(E \log V)$ (binary heap)
Best when	sparse graphs; edges pre-sortable	dense graphs; adjacency lists

Core definitions.

Term	Meaning
Spanning tree	connected acyclic subgraph spanning all of $V$; has $\lvert V\rvert-1$ edges
Weight of a tree	$w(T)=\sum_{e\in E_T} w(e)$
MST	spanning tree minimizing $w(T)$
Cut $(S,V\setminus S)$	partition of vertices into two non-empty sets
Crossing / light edge	edge with endpoints on both sides / minimum-weight such edge
Union-find	maintains disjoint sets with `find` and `union`, near-constant via path compression + union by rank

The two correctness theorems.

Cut property: a light (min-weight) edge crossing any cut is in some MST; if unique, in every MST. Proved by an exchange argument. This is why both greedy algorithms work — each repeatedly adds a light edge for some cut.
Cycle property: the unique heaviest edge on any cycle is in no MST. This justifies skipping/deleting redundant edges (and the reverse-delete algorithm).

Key facts to carry forward.

Adding any non-tree edge to a spanning tree creates exactly one cycle; removing any tree edge creates exactly two components. (The engine of both exchange proofs.)
Distinct edge weights $\Rightarrow$ the MST is unique.
MST $\ne$ shortest-path tree — they optimize different quantities.
A tree has no redundancy; real networks add edges back for fault tolerance.
On a disconnected graph, the algorithms produce a minimum spanning forest.

Spaced Review

Answer from memory before expanding. These revisit Chapters 31 and 29.

(Ch. 31) A tree on $n$ vertices has how many edges, and why does that immediately tell you how many edges any spanning tree of an $n$-vertex graph has?

(Ch. 31) Prim uses a heap as its priority queue. From Chapter 31, what are the time complexities of a binary heap's insert and extract-min operations, and why does that give Prim its $O(E \log V)$ bound?

(Ch. 29) Prim and Dijkstra share almost the same code. State the one substantive difference in what each stores as a vertex's key in the priority queue.

(Ch. 29) Both Dijkstra and the MST algorithms are greedy. In Chapter 29, what kind of argument (the same family used in this chapter's cut property) establishes that Dijkstra's greedy choice is safe?

(Ch. 31) Huffman coding (Chapter 31) is also a greedy algorithm proved correct by an exchange argument. In one sentence, what is the greedy choice Huffman makes at each step?

Answers

A tree on $n$ vertices has exactly $n-1$ edges; since a spanning tree is a tree on all $n$ vertices of the graph, it has exactly $n-1$ edges regardless of how many edges the graph has. 2. Insert and extract-min are each $O(\log V)$; Prim performs $O(E)$ inserts/updates and $O(V)$ extractions, so the total is $O(E\log V)$ (the $E\log V$ term dominates). 3. Dijkstra keys a frontier vertex by its total distance from the source (sum of edge weights along the best known path); Prim keys it by the weight of the single cheapest edge connecting it to the current tree. 4. An exchange argument (assume an optimal solution disagreeing with the greedy choice, then transform it without increasing cost) — combined with a loop invariant on the set of finalized vertices. 5. Huffman repeatedly merges the two least-frequent nodes into a new subtree.

What's Next

You now have the two most important things you can do with weights on a graph: find the cheapest path between two vertices (Chapter 29) and find the cheapest tree connecting all of them (this chapter). Both were greedy, and both were optimal for the same deep reason — an exchange argument that swaps a greedy choice into any optimal solution without making it worse.

Chapter 33 turns from connecting a graph to labeling it. We'll ask how few colors are needed to color the vertices so that no edge joins two same-colored vertices — the chromatic number — and discover that this innocent question secretly solves exam scheduling, compiler register allocation, and frequency assignment, while also leading to one of the most famous theorems in mathematics, the Four Color Theorem. The greedy theme continues, but this time greed will not always be optimal — and understanding exactly when it fails is the lesson.

Prerequisites

Learning Objectives

In This Chapter

Chapter 32: Spanning Trees, Minimum Spanning Trees, and Network Design

Overview

Learning Paths

32.1 Spanning trees

How many spanning trees? (a worked example)

32.2 The minimum spanning tree problem

A worked example (by hand, the obvious way)

32.3 Kruskal's algorithm and union-find

Union-find (the disjoint-set forest)

Kruskal in pseudocode, then a trace

Kruskal in Python

32.4 Prim's algorithm

Prim in pseudocode, then a trace

Prim in Python (with a heap)

32.5 Why the greedy MST algorithms are correct: the cut property

The cut property makes both algorithms correct

The complementary tool: the cycle property

32.6 Network design applications

Where MSTs are exactly the right tool

A worked network-design decision

Where MSTs are the wrong tool (know the boundary)

Project Checkpoint: mst_kruskal for the Toolkit

Summary

Spaced Review

What's Next

Project Checkpoint: `mst_kruskal` for the Toolkit