Chapter 32 — Key Takeaways
Spanning trees, minimum spanning trees, and network design — the re-grounding card. Reread before an exam.
Core definitions
| Term | Definition | Must-know fact |
|---|---|---|
| Spanning tree | Connected, acyclic subgraph of connected $G=(V,E)$ that includes every vertex | Has exactly $\lvert V\rvert - 1$ edges |
| Weight of a tree | $w(T) = \sum_{e \in E_T} w(e)$ | Sum of edge weights |
| Minimum spanning tree (MST) | Spanning tree of minimum total weight | $w(T) \le w(T')$ for all spanning trees $T'$ |
| Cut $(S, V\setminus S)$ | Partition of $V$ into two non-empty parts | The "boundary" being crossed |
| Crossing edge | Edge with one endpoint in $S$, one in $V\setminus S$ | — |
| Light edge | Minimum-weight crossing edge of a cut | The edge the cut property protects |
| Union-find / disjoint-set | Maintains disjoint sets; find(x)=representative, union(x,y)=merge |
Cycle test in Kruskal |
| Disjoint-set forest | Sets as parent-pointer trees; root = representative | Substrate for union-find |
| Path compression | In find, repoint visited nodes to root |
Flattens trees |
| Union by rank/size | Attach shorter/smaller tree under taller/larger root | Keeps trees shallow |
Two structural identities (the engine of every proof here):
- Add any non-tree edge to a spanning tree $\Rightarrow$ creates exactly one cycle.
- Remove any edge from a spanning tree $\Rightarrow$ splits it into exactly two components.
Counting facts (cited, not proved): Cayley's formula — $K_n$ has $n^{n-2}$ spanning trees. General count — any cofactor of the Laplacian (Kirchhoff's matrix-tree theorem).
The two algorithms
| Kruskal (1956) | Prim (Jarník 1930 / Prim 1957) | |
|---|---|---|
| Strategy | Cheapest edge anywhere that joins two components | Cheapest edge leaving the current tree |
| Grows | A forest that merges into one tree | One tree from a start vertex |
| Data structure | Union-find (for the cycle test) | Priority queue / heap |
| Test performed | find(u) != find(v)? add : skip |
extract-min crossing edge |
| Edge ordering | Sort all edges ascending once | Lazy: pop cheapest frontier edge |
| Time | $O(E \log V)$ — sorting dominates | $O(E \log V)$ binary heap; $O(E + V\log V)$ Fibonacci heap |
| Shines on | Sparse graphs; edges pre-sortable | Dense graphs; adjacency lists |
| Cousin algorithm | reverse-delete (cycle property) | Dijkstra (Ch. 29) — different key |
Kruskal skeleton: sort edges ascending → for each, if endpoints in different components, add + union → stop at $n-1$ edges.
Prim skeleton: start = one vertex → repeatedly add min-weight edge crossing tree boundary → stop when all vertices in.
Correctness: two theorems
| Theorem | Statement | Use |
|---|---|---|
| Cut property | A light (min-weight) edge crossing any cut is in some MST; if uniquely lightest, in every MST | Proves Kruskal and Prim correct (each adds a light edge for some cut) |
| Cycle property | The unique heaviest edge on any cycle is in no MST | Justifies skipping/deleting redundant edges; powers reverse-delete |
Proof technique for both: the exchange argument —
Assume an optimal solution that disagrees with the greedy choice; swap the greedy edge in and a worse edge out; show the swap does not increase (cut property) or strictly decreases (cycle property) weight; conclude.
Corollary: distinct edge weights $\Rightarrow$ unique MST (Kruskal and Prim return the same tree).
Decision aid — "which tool do I reach for?"
| Goal | Use | Not |
|---|---|---|
| Cheapest set of links connecting all nodes | MST (Kruskal/Prim) | shortest-path tree |
| Cheapest route between two nodes | Dijkstra (Ch. 29) | MST |
| Split data into $k$ clusters (single-linkage) | MST, then delete $k-1$ heaviest edges | — |
| 2-approx tour for metric TSP | MST walk + shortcut (Ch. 30) | exact (NP-hard) |
| Fault-tolerant network | MST plus extra edges for redundancy | bare MST (a tree = no backup) |
| Graph may be disconnected | minimum spanning forest (one MST/component) | MST (assumes connected) |
| Sparse graph, edges easy to sort | Kruskal | — |
| Dense graph, adjacency list | Prim | — |
Common pitfalls
- ❌ "The MST gives the shortest path between every pair." No — MST minimizes total weight, not per-pair distance. (Triangle $XY{=}1, YZ{=}1, XZ{=}1.5$: MST path $X\to Z$ costs $2$, but the direct edge is $1.5$.)
- ❌ Forgetting the cycle test in Kruskal. Without
find(u) != find(v), you build cycles, not a tree. - ❌ Sorting descending for Kruskal. Kruskal is ascending; descending-with-keep-connected is the reverse-delete algorithm.
- ❌ Confusing MST with min-max objectives. MST minimizes the sum; if you need to minimize the single most expensive link or the longest path, that's a different problem. (Bonus: MST does minimize the max edge on the path between any two vertices — the minimax property.)
- ❌ Assuming the MST is unique. Only guaranteed when all weights are distinct; ties allow multiple MSTs.
- ❌ Treating a tree as fault-tolerant. Any single edge removal disconnects a tree.
Toolkit additions (dmtoolkit/graphs.py)
| Added | Signature | Notes |
|---|---|---|
UnionFind |
UnionFind(items), .find(x), .union(x,y) |
path compression; union returns True on a real merge |
mst_kruskal(g) |
mst_kruskal(g) -> (mst_edges, total_weight) |
canonical; mst_edges = list of (u, v, weight) |
def mst_kruskal(g):
uf = UnionFind(g.vertices())
mst, total = [], 0
for u, v, w in sorted(g.edges(), key=lambda e: e[2]):
if uf.union(u, v):
mst.append((u, v, w)); total += w
return mst, total
Builds toward the capstone (Ch. 39 Track B: social-network analyzer extracts a cheapest backbone). Ch. 34 adds max_flow to finish graphs.py.
One-line summary
A spanning tree is the minimal scaffolding ($n-1$ edges) that keeps a graph connected; the MST is the cheapest one; two greedy algorithms (Kruskal via union-find, Prim via a heap) find it in $O(E\log V)$; both are provably optimal because of the cut property, itself proved by an exchange argument.