Case Study: Building a Fiber-Backbone Planner for a Campus Network

DataField.Dev

Case Study: Building a Fiber-Backbone Planner for a Campus Network

"You can defend a budget with a spreadsheet, or you can defend it with a theorem. Bring the theorem."

Executive Summary

In this build-focused case study you'll construct a small but complete network-design tool, BackbonePlanner, that an engineer could actually point at survey data. It (1) reads the feasible fiber routes between campus buildings with their costs, (2) computes the minimum-cost backbone two ways — Kruskal and Prim — and checks they agree, (3) reports the total budget with a one-line proof of optimality, and (4) goes one step past the textbook by quantifying the cost of fault tolerance: the cheapest single extra link that would make the backbone survive any one cable cut. You'll build each piece incrementally, hand-derive its output, and finish with a tool that turns the chapter's two algorithms and two correctness theorems into a defensible procurement recommendation.

Skills applied

Implementing Kruskal (with union-find) and Prim (with a heap) and cross-validating them.
Modeling a real layout problem as a connected weighted graph and recognizing it as an MST instance.
Using the cut property as the argument (not just the algorithm) behind a budget.
Extending the MST with a redundancy analysis grounded in the "one tree edge = one bridge" fact.

Background

The scenario

A campus IT department must lay a fiber backbone connecting six buildings: the Library (L), Engineering (E), the Gym (G), Admin (A), the Dorms (D), and the Science building (S). A site survey priced every feasible trench (some pairs can't be trenched — there's a road or a protected lawn between them). Costs are in $\$1\text{k}$:

Route	Cost	Route	Cost
L–E	3	E–A	4
L–G	1	G–A	5
L–S	6	G–D	2
E–G	4	A–D	7
		D–S	8

So the feasible graph has $6$ vertices and $9$ edges. Every building must be reachable from every other (it's one network), but the department does not need every pair directly trenched, and it does not (yet) care about hop count. That is exactly the minimum spanning tree problem: cheapest set of trenches that connects all six buildings.

   Feasible routes (label = cost in $1k):
        L --3-- E
        |\      |
        1 6     4
        |  \    |
        G   S   A
        |\      |
        4 2     7
        |  \    |
        E*  D --/        (E* = the E-G route, cost 4)
                \
                 8
                  \
                   S

(The ASCII is only a sketch; the table above is authoritative. Note $D$ connects to $G$ (2), $A$ (7), and $S$ (8); $S$ connects to $L$ (6) and $D$ (8).)

🔗 Connection. This is the literal shape of Otakar Borůvka's original 1926 problem — wiring an electrical grid for the least total cable — recounted in §32.6. A century later the object is fiber instead of copper, but the mathematics is unchanged: the cheapest connecting network is a minimum spanning tree, and the cut property is why no cheaper one exists.

Why this matters

Anyone can sort routes in a spreadsheet and guess a cheap layout. What an engineer with this chapter can do is prove the layout is optimal — "there is no connected set of trenches cheaper than this one, and here is the one-sentence reason" — and then reason precisely about the trade-off when the department asks the inevitable follow-up: "What if a backhoe cuts a cable?" A tree has no backup paths; quantifying the cheapest way to add one is a natural, valuable extension that falls right out of the MST.

Phase 1: Represent the graph and build Kruskal

We'll label buildings $L=0, E=1, G=2, A=3, D=4, S=5$ and store the survey as a list of (w, u, v) edges. First, the Kruskal core with union-find (the same structure as the chapter's mst_kruskal).

NAME = {0:'L', 1:'E', 2:'G', 3:'A', 4:'D', 5:'S'}
edges = [(3,0,1),(1,0,2),(6,0,5),(4,1,2),(4,1,3),
         (5,2,3),(2,2,4),(7,3,4),(8,4,5)]

def kruskal(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):           # ascending cost
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

mst_k, total_k = kruskal(6, edges)
print([(NAME[u], NAME[v], w) for u, v, w in mst_k], total_k)
# Expected output:
# [('L', 'G', 1), ('G', 'D', 2), ('L', 'E', 3), ('E', 'A', 4), ('L', 'S', 6)] 15

Hand-deriving the output. Sort the nine edges ascending: $LG(1), GD(2), LE(3), EG(4), EA(4), GA(5), LS(6), AD(7), DS(8)$. Process them, tracking components:

Step	Edge	$w$	Same component?	Action	Components after
1	$LG$	1	no	add	$\{L,G\}$
2	$GD$	2	no	add	$\{L,G,D\}$
3	$LE$	3	no	add	$\{L,E,G,D\}$
4	$EG$	4	$E=G$? yes	skip (cycle)	$\{L,E,G,D\}$
5	$EA$	4	no	add	$\{L,E,G,D,A\}$
6	$GA$	5	$G=A$? yes	skip (cycle)	$\{L,E,G,D,A\}$
7	$LS$	6	no	add	$\{L,E,G,D,A,S\}$

After step 7 we have $5 = n-1$ edges; stop (never examine $AD$ or $DS$). MST edges $LG, GD, LE, EA, LS$ with total $1+2+3+4+6 = 15$. ✓ The skipped edges $EG(4)$ and $GA(5)$ each would have closed a cycle.

🔄 Check Your Understanding Edge $EG$ has cost $4$, the same as $EA$, yet $EG$ was skipped and $EA$ was added. Why did order not matter for the final weight here?

Answer

When $EG$ was considered, $E$ and $G$ were already in the same component (via $L$), so $EG$ would have formed a cycle — it is rejected regardless of its weight. $EA$ joined a genuinely new building ($A$). Because $EG$ is the unique heaviest edge of the cycle $L\text{–}E\text{–}G\text{–}L$ it is in no MST (cycle property), so skipping it is forced, not arbitrary.

Phase 2: Build Prim and cross-validate

A robust tool computes the answer a second, independent way and checks the two agree — a cheap, powerful guard against a bug in either implementation. We add Prim with a binary heap (heapq), keyed by the weight of the cheapest edge reaching each new vertex.

import heapq

def build_adj(n, edges):
    adj = {u: [] for u in range(n)}
    for w, u, v in edges:
        adj[u].append((w, v)); adj[v].append((w, u))   # undirected
    return adj

def prim(n, adj, start=0):
    in_tree = [False]*n
    mst, total = [], 0
    pq = [(0, start, -1)]                    # (edge_weight, vertex, parent)
    while pq:
        w, u, par = heapq.heappop(pq)
        if in_tree[u]:
            continue                         # stale entry
        in_tree[u] = True
        if par != -1:
            mst.append((par, u, w)); total += w
        for ew, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(pq, (ew, v, u))
    return mst, total

adj = build_adj(6, edges)
mst_p, total_p = prim(6, adj, start=0)
print(total_p, total_p == total_k)
# Expected output:
# 15 True

Hand-deriving the output. Start at $L$. Push $L$'s edges $LG(1), LE(3), LS(6)$. - Pop $LG(1)$ → add $G$; push $G$'s new edges $GE(4), GA(5), GD(2)$. - Pop $GD(2)$ → add $D$; push $D$'s new edges $DA(7), DS(8)$. - Pop $LE(3)$ → add $E$; push $E$'s new edges $EA(4)$ (and stale $EG(4)$). - Pop $EA(4)$ → add $A$; (remaining $A$ edges go to in-tree vertices). - Pop the stale $EG(4)$, $GA(5)$ → both endpoints in tree, skip. - Pop $LS(6)$ → add $S$. All six in tree.

Prim's tree weight is $1+2+3+4+6 = 15$, equal to Kruskal's. Because all edge weights are distinct except the $EA$/$EG$ tie — and the tie is between an MST edge and a never-needed cycle edge — both algorithms return the same tree here. The tool prints 15 True. ✓

💡 Intuition: Kruskal and Prim are the same theorem (the cut property) executed with different bookkeeping. Kruskal grows a forest keyed by global edge order; Prim grows one tree keyed by distance to the current frontier. When they agree, you've checked the cut property twice from two directions — a satisfying, low-cost sanity check to bake into any planner.

Phase 3: Assemble the planner and emit the budget + proof

Now wrap both algorithms in a small class that returns the recommendation a procurement committee needs: the edge list, the total, and the reason it is optimal.

class BackbonePlanner:
    def __init__(self, n, edges, name):
        self.n, self.edges, self.name = n, edges, name
    def plan(self):
        mst_k, total_k = kruskal(self.n, self.edges)
        _, total_p = prim(self.n, build_adj(self.n, self.edges))
        assert total_k == total_p, "Kruskal/Prim disagree -- bug!"
        routes = [(self.name[u], self.name[v], w) for u, v, w in mst_k]
        return {"routes": routes, "budget_k": total_k,
                "num_links": len(mst_k), "skipped": len(self.edges) - len(mst_k)}

planner = BackbonePlanner(6, edges, NAME)
plan = planner.plan()
print(plan["budget_k"], plan["num_links"], plan["skipped"])
# Expected output:
# 15 5 4

Hand-deriving the output. From Phase 1 the budget is $15$ and the MST has $n-1 = 5$ links; the survey had $9$ feasible routes, so $9 - 5 = 4$ were not built. The assert passes because Phase 2 confirmed both totals are $15$. ✓

The committee-ready statement the tool's numbers justify:

"Build these five trenches — L–G, G–D, L–E, E–A, L–S — for \$15k. This is the minimum possible budget that connects all six buildings. The guarantee is the cut property: at every step we added the cheapest trench leaving the already-connected set, and any cheaper connected layout would have to omit one of these and reconnect across the same boundary at equal-or-greater cost. No alternative on the survey can beat \$15k."

That sentence is the whole point of having proved the algorithms correct in §32.5: the recommendation is not "this is the cheapest we found," it is "this is the cheapest that exists."

⚠️ Common Pitfall: The MST minimizes total trenching cost. It does not minimize the number of hops between, say, the Dorms and the Science building (in our tree that path is $D\text{–}G\text{–} L\text{–}S$, three hops), and it provides no redundancy. If the committee actually wants low latency between specific buildings, that's a shortest-path-tree problem (Chapter 29), not an MST. Always re-confirm the objective is "minimum total cost, everything connected" before shipping an MST.

Phase 4: Quantify the cost of fault tolerance

Here the build goes one useful step past the textbook. The backbone is a tree, so — by the §32.1 fact that removing any single tree edge splits it into two components — any one cable cut disconnects the campus. The committee will ask: what is the cheapest single extra trench that makes the network survive any one failure on the backbone?

A graph survives any single edge failure iff it is 2-edge-connected (no bridges). Turning a tree into a 2-edge-connected graph generally needs several added edges, but the committee's budget often allows one. The most valuable single addition is the cheapest non-tree route that creates a cycle covering as many tree edges as possible — every tree edge on that cycle gains a backup path and stops being a single point of failure.

We don't need a heavy algorithm for six buildings; we evaluate each unused survey edge, find the tree path between its endpoints (the edges it would protect), and report the cheapest edge per "protection."

def tree_path_edges(mst_edges, n, u, v):
    """Edges on the unique tree path u->v (each as a frozenset)."""
    adj = {x: [] for x in range(n)}
    for a, b, w in mst_edges:
        adj[a].append(b); adj[b].append(a)
    parent = {u: None}; stack = [u]
    while stack:                              # DFS to find v
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x; stack.append(y)
    path, x = [], v                           # walk back from v to u
    while parent[x] is not None:
        path.append(frozenset((x, parent[x]))); x = parent[x]
    return path

used = {frozenset((u, v)) for u, v, w in mst_k}
options = []
for w, u, v in edges:
    if frozenset((u, v)) not in used:         # a non-tree (redundant) route
        protected = len(tree_path_edges(mst_k, 6, u, v))
        options.append((w, protected, NAME[u], NAME[v]))
best = min(options, key=lambda o: (o[0], -o[1]))   # cheapest, then most protection
print(sorted(options))
print("best add:", best)
# Expected output:
# [(4, 2, 'E', 'G'), (5, 3, 'A', 'G'), (7, 4, 'A', 'D'), (8, 3, 'S', 'D')]
# best add: (4, 2, 'E', 'G')

Hand-deriving the output. The four unused survey routes are $EG(4), GA(5), AD(7), DS(8)$. For each, trace the unique path through the spanning-tree edges $LG, GD, LE, EA, LS$ and count how many tree edges it covers (those are the edges that gain a backup):

$E$–$G$: tree path $E\text{–}L\text{–}G$, covering $LE, LG$ — 2 edges. Cost $4$.
$A$–$G$: tree path $A\text{–}E\text{–}L\text{–}G$, covering $EA, LE, LG$ — 3 edges. Cost $5$.
$A$–$D$: tree path $A\text{–}E\text{–}L\text{–}G\text{–}D$, covering $EA, LE, LG, GD$ — 4 edges. Cost $7$.
$D$–$S$: tree path $D\text{–}G\text{–}L\text{–}S$, covering $GD, LG, LS$ — 3 edges. Cost $8$.

Sorted by (cost, protected), the option list is [(4,2,'E','G'), (5,3,'A','G'), (7,4,'A','D'), (8,3,'S','D')], and the cheapest addition is $EG(4)$ (protection $2$, broken in its favor by lowest cost). ✓ Note how the most protective edge, $AD$ (4 tree edges), is also the second-most expensive — cheapest and most-protective are different objectives, which the next discussion question pursues.

🐛 Find the Error: A teammate "optimizes" tree_path_edges by computing the path in the original survey graph (edges) instead of in mst_k. They argue "the path between $u$ and $v$ is the path between $u$ and $v$ — the graph shouldn't matter." Run their version in your head on the pair $A$–$D$ and explain why it returns the wrong protected count.

Answer

In the original graph, $A$–$D$ is a direct edge (cost $7$), so the shortest/only-considered path from $A$ to $D$ has length $1$ and "protects" $1$ edge — but that edge is $AD$ itself, which is not in the backbone, so it protects zero tree edges. The redundancy analysis must use the spanning tree, because the whole point is "which backbone edges get a backup if I add this trench?" The unique tree path $A\text{–}E\text{–}L\text{–}G\text{–}D$ (4 tree edges) is the right answer; the original-graph path is the wrong abstraction. The bug is using the wrong graph, and it would silently undercount protection for every non-tree edge that happens to be a direct survey route.

The decision the planner supports: adding the single cheapest redundant link, E–G at \$4k, creates the cycle $L\text{–}E\text{–}G\text{–}L$, giving the two trenches $LE$ and $LG$ a backup path. For \$15k + \$4k = \$19k, the two busiest core links survive a cut. To make the entire backbone single-failure-proof would cost more (you'd protect every tree edge), but the planner now lets the committee see the price of each increment of resilience.

🔗 Connection. This redundancy step is where the MST's main limitation (no fault tolerance, from §32.6) becomes an opportunity. The MST is the minimum-cost baseline; real networks add edges back deliberately, trading the cut property's guaranteed minimum for resilience. Quantifying that trade is a standard part of real network design and a natural use of the "one tree edge = one bridge" fact.

Discussion Questions

The planner cross-checks Kruskal against Prim by comparing totals. Is equal total weight enough to conclude they found the same tree? Under what condition on the weights would equal total guarantee identical edge sets? (Tie this to §32.4's uniqueness remark.)
The assert total_k == total_p is a runtime guard, not a proof. What does the chapter give you that the assert cannot — and why do you still want the assert in production code? (Theme: computation and proof are partners.)
Adding $EG$ protects $2$ tree edges; adding $AD$ protects $4$. If the committee could afford exactly one extra link and wanted to maximize the number of protected core links per dollar, which would you recommend, and what does that say about "cheapest" vs. "best value"?
Suppose the survey is later found to have a disconnected feasible graph (a building with no trenchable route to the rest). What does kruskal return, and how would you change the planner's report to be honest about it? (Hint: minimum spanning forest.)
The committee proposes using Dijkstra from the Library instead, "to make everything close to the Library." Explain precisely what tree that produces, how it differs from the MST, and when each is the right call.

Your Turn: Extensions

Option A (full 2-edge-connectivity). Extend the planner to report the cheapest set of extra edges making the whole backbone survive any single cut (no bridges remain). Start by listing every bridge of the augmented graph after each addition; greedily add the cheapest edge covering the most remaining bridges. Compare your greedy total to adding all unused edges.
Option B (capacitated check). Suppose each trench also has a bandwidth capacity. After building the MST, verify whether the tree path between the two highest-traffic buildings has enough capacity; if not, add the cheapest edge that creates an alternate, higher-capacity route. (This blends MST with a flow idea you'll formalize in Chapter 34.)
Option C (sensitivity analysis). For each MST edge, compute by how much its cost could increase before it would drop out of the MST (its "tolerance"). Use the cycle property: an MST edge stays in the MST until its weight exceeds the next-cheapest edge crossing the cut it defines. Print a tolerance per backbone link — useful when survey prices are uncertain.

Key Takeaways

Cross-validate by building twice. Kruskal and Prim are independent implementations of the same theorem; agreement is a cheap, strong correctness guard — but only the cut-property proof certifies the answer for inputs you didn't test.
Ship the theorem, not the spreadsheet. The deliverable is "\$15k is the provable minimum," and the one-sentence justification is the cut property. That is the professional difference the chapter buys you.
The MST is a baseline, redundancy is an add-on. Because a tree edge is a bridge (§32.1), every backbone link is a single point of failure; quantifying the cheapest protective edge per link turns the MST's limitation into a costed design decision.
Hand-derived expected output catches real bugs. The deliberate protected-count error in Phase 4 is exactly the kind of mistake a "looks plausible" guess hides and a careful hand-trace exposes — the §32 discipline in miniature.