Chapter 32 — Key Takeaways

DataField.Dev

Chapter 32 — Key Takeaways

Spanning trees, minimum spanning trees, and network design — the re-grounding card. Reread before an exam.

Core definitions

Term	Definition	Must-know fact
Spanning tree	Connected, acyclic subgraph of connected $G=(V,E)$ that includes every vertex	Has exactly $\lvert V\rvert - 1$ edges
Weight of a tree	$w(T) = \sum_{e \in E_T} w(e)$	Sum of edge weights
Minimum spanning tree (MST)	Spanning tree of minimum total weight	$w(T) \le w(T')$ for all spanning trees $T'$
Cut $(S, V\setminus S)$	Partition of $V$ into two non-empty parts	The "boundary" being crossed
Crossing edge	Edge with one endpoint in $S$, one in $V\setminus S$	—
Light edge	Minimum-weight crossing edge of a cut	The edge the cut property protects
Union-find / disjoint-set	Maintains disjoint sets; `find(x)`=representative, `union(x,y)`=merge	Cycle test in Kruskal
Disjoint-set forest	Sets as parent-pointer trees; root = representative	Substrate for union-find
Path compression	In `find`, repoint visited nodes to root	Flattens trees
Union by rank/size	Attach shorter/smaller tree under taller/larger root	Keeps trees shallow

Two structural identities (the engine of every proof here):

Add any non-tree edge to a spanning tree $\Rightarrow$ creates exactly one cycle.
Remove any edge from a spanning tree $\Rightarrow$ splits it into exactly two components.

Counting facts (cited, not proved): Cayley's formula — $K_n$ has $n^{n-2}$ spanning trees. General count — any cofactor of the Laplacian (Kirchhoff's matrix-tree theorem).

The two algorithms

	Kruskal (1956)	Prim (Jarník 1930 / Prim 1957)
Strategy	Cheapest edge anywhere that joins two components	Cheapest edge leaving the current tree
Grows	A forest that merges into one tree	One tree from a start vertex
Data structure	Union-find (for the cycle test)	Priority queue / heap
Test performed	`find(u) != find(v)`? add : skip	extract-min crossing edge
Edge ordering	Sort all edges ascending once	Lazy: pop cheapest frontier edge
Time	$O(E \log V)$ — sorting dominates	$O(E \log V)$ binary heap; $O(E + V\log V)$ Fibonacci heap
Shines on	Sparse graphs; edges pre-sortable	Dense graphs; adjacency lists
Cousin algorithm	reverse-delete (cycle property)	Dijkstra (Ch. 29) — different key

Kruskal skeleton: sort edges ascending → for each, if endpoints in different components, add + union → stop at $n-1$ edges.

Prim skeleton: start = one vertex → repeatedly add min-weight edge crossing tree boundary → stop when all vertices in.

Correctness: two theorems

Theorem	Statement	Use
Cut property	A light (min-weight) edge crossing any cut is in some MST; if uniquely lightest, in every MST	Proves Kruskal and Prim correct (each adds a light edge for some cut)
Cycle property	The unique heaviest edge on any cycle is in no MST	Justifies skipping/deleting redundant edges; powers reverse-delete

Proof technique for both: the exchange argument —

Assume an optimal solution that disagrees with the greedy choice; swap the greedy edge in and a worse edge out; show the swap does not increase (cut property) or strictly decreases (cycle property) weight; conclude.

Corollary: distinct edge weights $\Rightarrow$ unique MST (Kruskal and Prim return the same tree).

Decision aid — "which tool do I reach for?"

Goal	Use	Not
Cheapest set of links connecting all nodes	MST (Kruskal/Prim)	shortest-path tree
Cheapest route between two nodes	Dijkstra (Ch. 29)	MST
Split data into $k$ clusters (single-linkage)	MST, then delete $k-1$ heaviest edges	—
2-approx tour for metric TSP	MST walk + shortcut (Ch. 30)	exact (NP-hard)
Fault-tolerant network	MST plus extra edges for redundancy	bare MST (a tree = no backup)
Graph may be disconnected	minimum spanning forest (one MST/component)	MST (assumes connected)
Sparse graph, edges easy to sort	Kruskal	—
Dense graph, adjacency list	Prim	—

Common pitfalls

❌ "The MST gives the shortest path between every pair." No — MST minimizes total weight, not per-pair distance. (Triangle $XY{=}1, YZ{=}1, XZ{=}1.5$: MST path $X\to Z$ costs $2$, but the direct edge is $1.5$.)
❌ Forgetting the cycle test in Kruskal. Without find(u) != find(v), you build cycles, not a tree.
❌ Sorting descending for Kruskal. Kruskal is ascending; descending-with-keep-connected is the reverse-delete algorithm.
❌ Confusing MST with min-max objectives. MST minimizes the sum; if you need to minimize the single most expensive link or the longest path, that's a different problem. (Bonus: MST does minimize the max edge on the path between any two vertices — the minimax property.)
❌ Assuming the MST is unique. Only guaranteed when all weights are distinct; ties allow multiple MSTs.
❌ Treating a tree as fault-tolerant. Any single edge removal disconnects a tree.

Toolkit additions (`dmtoolkit/graphs.py`)

Added	Signature	Notes
`UnionFind`	`UnionFind(items)`, `.find(x)`, `.union(x,y)`	path compression; `union` returns `True` on a real merge
`mst_kruskal(g)`	`mst_kruskal(g) -> (mst_edges, total_weight)`	canonical; `mst_edges` = list of `(u, v, weight)`

def mst_kruskal(g):
    uf = UnionFind(g.vertices())
    mst, total = [], 0
    for u, v, w in sorted(g.edges(), key=lambda e: e[2]):
        if uf.union(u, v):
            mst.append((u, v, w)); total += w
    return mst, total

Builds toward the capstone (Ch. 39 Track B: social-network analyzer extracts a cheapest backbone). Ch. 34 adds max_flow to finish graphs.py.

One-line summary

A spanning tree is the minimal scaffolding ($n-1$ edges) that keeps a graph connected; the MST is the cheapest one; two greedy algorithms (Kruskal via union-find, Prim via a heap) find it in $O(E\log V)$; both are provably optimal because of the cut property, itself proved by an exchange argument.

Chapter 32 — Key Takeaways

Core definitions

The two algorithms

Correctness: two theorems

Decision aid — "which tool do I reach for?"

Common pitfalls

Toolkit additions (dmtoolkit/graphs.py)

One-line summary

Toolkit additions (`dmtoolkit/graphs.py`)