Chapter 32 — Key Takeaways

Spanning trees, minimum spanning trees, and network design — the re-grounding card. Reread before an exam.


Core definitions

Term Definition Must-know fact
Spanning tree Connected, acyclic subgraph of connected $G=(V,E)$ that includes every vertex Has exactly $\lvert V\rvert - 1$ edges
Weight of a tree $w(T) = \sum_{e \in E_T} w(e)$ Sum of edge weights
Minimum spanning tree (MST) Spanning tree of minimum total weight $w(T) \le w(T')$ for all spanning trees $T'$
Cut $(S, V\setminus S)$ Partition of $V$ into two non-empty parts The "boundary" being crossed
Crossing edge Edge with one endpoint in $S$, one in $V\setminus S$
Light edge Minimum-weight crossing edge of a cut The edge the cut property protects
Union-find / disjoint-set Maintains disjoint sets; find(x)=representative, union(x,y)=merge Cycle test in Kruskal
Disjoint-set forest Sets as parent-pointer trees; root = representative Substrate for union-find
Path compression In find, repoint visited nodes to root Flattens trees
Union by rank/size Attach shorter/smaller tree under taller/larger root Keeps trees shallow

Two structural identities (the engine of every proof here):

  • Add any non-tree edge to a spanning tree $\Rightarrow$ creates exactly one cycle.
  • Remove any edge from a spanning tree $\Rightarrow$ splits it into exactly two components.

Counting facts (cited, not proved): Cayley's formula — $K_n$ has $n^{n-2}$ spanning trees. General count — any cofactor of the Laplacian (Kirchhoff's matrix-tree theorem).


The two algorithms

Kruskal (1956) Prim (Jarník 1930 / Prim 1957)
Strategy Cheapest edge anywhere that joins two components Cheapest edge leaving the current tree
Grows A forest that merges into one tree One tree from a start vertex
Data structure Union-find (for the cycle test) Priority queue / heap
Test performed find(u) != find(v)? add : skip extract-min crossing edge
Edge ordering Sort all edges ascending once Lazy: pop cheapest frontier edge
Time $O(E \log V)$ — sorting dominates $O(E \log V)$ binary heap; $O(E + V\log V)$ Fibonacci heap
Shines on Sparse graphs; edges pre-sortable Dense graphs; adjacency lists
Cousin algorithm reverse-delete (cycle property) Dijkstra (Ch. 29) — different key

Kruskal skeleton: sort edges ascending → for each, if endpoints in different components, add + union → stop at $n-1$ edges.

Prim skeleton: start = one vertex → repeatedly add min-weight edge crossing tree boundary → stop when all vertices in.


Correctness: two theorems

Theorem Statement Use
Cut property A light (min-weight) edge crossing any cut is in some MST; if uniquely lightest, in every MST Proves Kruskal and Prim correct (each adds a light edge for some cut)
Cycle property The unique heaviest edge on any cycle is in no MST Justifies skipping/deleting redundant edges; powers reverse-delete

Proof technique for both: the exchange argument

Assume an optimal solution that disagrees with the greedy choice; swap the greedy edge in and a worse edge out; show the swap does not increase (cut property) or strictly decreases (cycle property) weight; conclude.

Corollary: distinct edge weights $\Rightarrow$ unique MST (Kruskal and Prim return the same tree).


Decision aid — "which tool do I reach for?"

Goal Use Not
Cheapest set of links connecting all nodes MST (Kruskal/Prim) shortest-path tree
Cheapest route between two nodes Dijkstra (Ch. 29) MST
Split data into $k$ clusters (single-linkage) MST, then delete $k-1$ heaviest edges
2-approx tour for metric TSP MST walk + shortcut (Ch. 30) exact (NP-hard)
Fault-tolerant network MST plus extra edges for redundancy bare MST (a tree = no backup)
Graph may be disconnected minimum spanning forest (one MST/component) MST (assumes connected)
Sparse graph, edges easy to sort Kruskal
Dense graph, adjacency list Prim

Common pitfalls

  • "The MST gives the shortest path between every pair." No — MST minimizes total weight, not per-pair distance. (Triangle $XY{=}1, YZ{=}1, XZ{=}1.5$: MST path $X\to Z$ costs $2$, but the direct edge is $1.5$.)
  • Forgetting the cycle test in Kruskal. Without find(u) != find(v), you build cycles, not a tree.
  • Sorting descending for Kruskal. Kruskal is ascending; descending-with-keep-connected is the reverse-delete algorithm.
  • Confusing MST with min-max objectives. MST minimizes the sum; if you need to minimize the single most expensive link or the longest path, that's a different problem. (Bonus: MST does minimize the max edge on the path between any two vertices — the minimax property.)
  • Assuming the MST is unique. Only guaranteed when all weights are distinct; ties allow multiple MSTs.
  • Treating a tree as fault-tolerant. Any single edge removal disconnects a tree.

Toolkit additions (dmtoolkit/graphs.py)

Added Signature Notes
UnionFind UnionFind(items), .find(x), .union(x,y) path compression; union returns True on a real merge
mst_kruskal(g) mst_kruskal(g) -> (mst_edges, total_weight) canonical; mst_edges = list of (u, v, weight)
def mst_kruskal(g):
    uf = UnionFind(g.vertices())
    mst, total = [], 0
    for u, v, w in sorted(g.edges(), key=lambda e: e[2]):
        if uf.union(u, v):
            mst.append((u, v, w)); total += w
    return mst, total

Builds toward the capstone (Ch. 39 Track B: social-network analyzer extracts a cheapest backbone). Ch. 34 adds max_flow to finish graphs.py.


One-line summary

A spanning tree is the minimal scaffolding ($n-1$ edges) that keeps a graph connected; the MST is the cheapest one; two greedy algorithms (Kruskal via union-find, Prim via a heap) find it in $O(E\log V)$; both are provably optimal because of the cut property, itself proved by an exchange argument.