Case Study: Single-Linkage Clustering of Sensor Readings with an MST

DataField.Dev

Case Study: Single-Linkage Clustering of Sensor Readings with an MST

"The eye sees clusters; the MST proves where to cut them."

Executive Summary

A minimum spanning tree is not only a way to wire things cheaply — it is also a complete recipe for single-linkage clustering, one of the oldest unsupervised-learning methods. In this analysis-heavy case study you'll take a small set of two-dimensional points (think: temperature/humidity readings from eight environmental sensors), build their MST by hand using Kruskal's algorithm, and then read the cluster structure straight off the tree by deleting its heaviest edges. The whole pipeline is one chapter idea — the MST — applied twice: once to connect the points, once to cut them apart. You'll finish able to explain exactly why "delete the $k-1$ heaviest MST edges" yields $k$ clusters, and where the method's blind spots are.

Skills applied

Modeling a data set as a complete weighted graph (points → vertices, distances → edge weights).
Executing Kruskal's algorithm with union-find by hand and verifying it in code.
Translating "cut the tree" into "partition the data," and proving the cut count.
Diagnosing single-linkage's failure mode (chaining) by reading the MST.

Background

The scenario

A facilities team has eight wireless sensors scattered through a warehouse. Each sensor reports a two-number summary — a normalized (temperature, vibration) pair — and the team suspects the sensors fall into a few natural groups (e.g., "near the loading dock," "by the cold-storage wall," "in the quiet mezzanine"). They have no labels. They want to discover the groups from the numbers alone.

We will use single-linkage clustering: define the distance between two clusters as the smallest distance between a point in one and a point in the other, and repeatedly merge the two closest clusters. The remarkable fact — the reason this case study lives in the MST chapter — is that this entire agglomerative procedure is encoded in the minimum spanning tree of the points. Build the MST once, and every single-linkage clustering (for every $k$) is available by deleting its heaviest edges.

Here are the eight points (sensor IDs $0$ through $7$), chosen with friendly coordinates so every distance we need is exact:

Sensor	$x$	$y$
0	0	0
1	0	3
2	4	0
3	4	3
4	20	0
5	20	3
6	24	0
7	24	3

Eyeballing the coordinates, points $0,1,2,3$ form a tight $4\times 3$ rectangle near the origin, and $4,5,6,7$ form an identical rectangle far to the right (starting at $x=20$). So we expect two clusters of four. Our job is to recover that from the MST — and to see what single-linkage does when we push it.

🔗 Connection. This is the social-graph / data thread at work, but turned on numeric data instead of a follower network: any time "similarity" can be turned into an edge weight, the graph machinery of Part V applies. The single-linkage idea reappears in image segmentation, anomaly detection, and phylogenetics. The same mst_kruskal you added to dmtoolkit in §32's Project Checkpoint is the engine here.

Why this matters

Single-linkage clustering is fast, parameter-light (you don't have to guess the number of clusters in advance — the MST gives you all of them at once), and interpretable. It is also famously prone to chaining: a thin "bridge" of intermediate points can fuse two otherwise-separate blobs. The MST makes this failure visible — the bridge is a single suspicious edge — which is exactly why analysts who understand the MST trust single-linkage more wisely than those who treat it as a black box.

Phase 1: Model the data as a weighted graph

Vertices are the eight sensors. We connect every pair with an edge whose weight is the Euclidean distance between them — a complete graph $K_8$ with $\binom{8}{2}=28$ edges. We don't need all $28$ for the MST, but we list the short ones, since Kruskal will only ever use the cheapest $n-1 = 7$.

Recall the distance $d\big((x_1,y_1),(x_2,y_2)\big) = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$. The relevant small distances (the ones competing to be MST edges) work out cleanly:

Edge	Pair	$\Delta x, \Delta y$	Distance
0–1	$(0,0)$–$(0,3)$	$0, 3$	$3$
2–3	$(4,0)$–$(4,3)$	$0, 3$	$3$
4–5	$(20,0)$–$(20,3)$	$0, 3$	$3$
6–7	$(24,0)$–$(24,3)$	$0, 3$	$3$
0–2	$(0,0)$–$(4,0)$	$4, 0$	$4$
1–3	$(0,3)$–$(4,3)$	$4, 0$	$4$
4–6	$(20,0)$–$(24,0)$	$4, 0$	$4$
5–7	$(20,3)$–$(24,3)$	$4, 0$	$4$
0–3	$(0,0)$–$(4,3)$	$4, 3$	$5$
1–2	$(0,3)$–$(4,0)$	$4, 3$	$5$
2–4	$(4,0)$–$(20,0)$	$16, 0$	$16$
3–5	$(4,3)$–$(20,3)$	$16, 0$	$16$

Every other edge (e.g. $0$–$4$, distance $20$) is longer still. The single cheapest edge bridging the two rectangles is $2$–$4$ (or $3$–$5$) at distance $16$ — far larger than any within-rectangle edge. That gap is the signal we will exploit.

🔄 Check Your Understanding Why is it safe to ignore the $16$ longest edges of $K_8$ when computing the MST by hand?

Answer

Kruskal processes edges cheapest-first and stops after adding $n-1 = 7$ edges. Any edge longer than the $7$th edge actually added is never accepted — it would either come too late (the tree is already complete) or connect two already-joined components (a cycle). As long as we have all the candidates at least as cheap as the most expensive MST edge, ignoring the rest cannot change the answer.

Phase 2: Build the MST by hand (Kruskal + union-find)

Sort the candidate edges ascending. Ties (all the weight-$3$ edges, then all the weight-$4$ edges) can be broken in any order — with ties the MST is not unique, but its weight and its cut structure are what we care about, and those are stable. We process: the four weight-$3$ edges, then the four weight-$4$ edges, then weight-$5$, then the weight-$16$ bridges.

Start with eight singletons $\{0\},\{1\},\dots,\{7\}$. We track components after each step.

Step	Edge	$w$	`find(u)`, `find(v)`	Same?	Action	Components after
1	0–1	3	$0 \ne 1$	no	add	$\{0,1\}$ and six singletons
2	2–3	3	$2 \ne 3$	no	add	$\{0,1\},\{2,3\}$, four singletons
3	4–5	3	$4 \ne 5$	no	add	$\{0,1\},\{2,3\},\{4,5\}$, two singletons
4	6–7	3	$6 \ne 7$	no	add	$\{0,1\},\{2,3\},\{4,5\},\{6,7\}$
5	0–2	4	$0 \ne 2$	no	add	$\{0,1,2,3\},\{4,5\},\{6,7\}$
6	1–3	4	$1 = 3$	yes	skip (cycle)	$\{0,1,2,3\},\{4,5\},\{6,7\}$
7	4–6	4	$4 \ne 6$	no	add	$\{0,1,2,3\},\{4,5,6,7\}$
8	5–7	4	$5 = 7$	yes	skip (cycle)	$\{0,1,2,3\},\{4,5,6,7\}$
9	0–3	5	$0 = 3$	yes	skip (cycle)	$\{0,1,2,3\},\{4,5,6,7\}$
10	1–2	5	$1 = 2$	yes	skip (cycle)	$\{0,1,2,3\},\{4,5,6,7\}$
11	2–4	16	$2 \ne 4$	no	add	$\{0,1,2,3,4,5,6,7\}$

After step 11 the forest has $7 = n-1$ edges, so we stop (never examining $3$–$5$ or any longer edge).

The MST is ${\,0\text{–}1,\ 2\text{–}3,\ 4\text{–}5,\ 6\text{–}7,\ 0\text{–}2,\ 4\text{–}6,\ 2\text{–}4\,}$ with total weight $3+3+3+3+4+4+16 = 36$.

Notice the structure: six cheap edges (four 3's and two 4's — wait, three edges of weight... let's be precise) — the MST used four weight-3 edges, two weight-4 edges, and one weight-16 edge. That lone weight-$16$ edge is the bridge $2$–$4$ between the two rectangles, and it towers over the others. Hold that thought; it is the cut point.

💡 Intuition: Kruskal builds clusters bottom-up exactly the way single-linkage does. Each union is a "merge the two closest groups" step. The order in which Kruskal adds edges is the dendrogram (the merge tree) of single-linkage clustering, and the weight of each added edge is the height at which that merge happens. The expensive bridge edge is the last merge — the two big clusters joining at height $16$.

Phase 3: Verify the MST in code

Before trusting the hand-trace, encode the points and reuse the chapter's kruskal. We compute squared distances as integer edge weights to keep the comparison exact (distance and squared distance order edges identically, since squaring is monotone on non-negatives — a point worth stating, since it lets us avoid floating-point entirely).

from itertools import combinations

pts = {0:(0,0), 1:(0,3), 2:(4,0), 3:(4,3),
       4:(20,0), 5:(20,3), 6:(24,0), 7:(24,3)}

def sq_dist(p, q):
    return (p[0]-q[0])**2 + (p[1]-q[1])**2

# Build edge list (w, u, v) using SQUARED distance (monotone => same MST).
edges = [(sq_dist(pts[u], pts[v]), u, v) for u, v in combinations(pts, 2)]

def kruskal(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

mst, total = kruskal(8, edges)
print(sorted((min(u,v), max(u,v)) for u, v, w in mst))
# Expected output:
# [(0, 1), (0, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7)]

Hand-deriving the expected output. Using squared distances the candidate weights become: within- rectangle vertical edges $3^2=9$ (edges $0$–$1$, $2$–$3$, $4$–$5$, $6$–$7$); horizontal edges $4^2=16$ ($0$–$2$, $1$–$3$, $4$–$6$, $5$–$7$); diagonals $5^2=25$; the bridge $2$–$4$ at $16^2 = 256$. The ascending order is the same as before (9's, then 16's, then 25's, then 256), so Kruskal accepts the same seven edges. Sorted and normalized to (min,max) they are $(0,1),(0,2),(2,3),(2,4),(4,5),(4,6),(6,7)$ — matching the table. The set of edges matches our hand trace (the tie-break order within the 9's and 16's differs harmlessly). ✓

⚠️ Common Pitfall: Using squared distance is fine for finding the MST (it preserves the order of edges, so Kruskal makes the same decisions), but the cluster heights on a dendrogram are then squared too. If you report merge heights to a user, remember to take the square root, or just compute with true distances. The MST edges are identical either way; only the labels differ.

Phase 4: Cut the tree into clusters

Now the second use of the MST. To produce $k$ single-linkage clusters, delete the $k-1$ most expensive MST edges; the tree falls into $k$ connected components, and those components are the clusters.

For $k = 2$: delete the single heaviest MST edge. That is the bridge $2$–$4$ (weight $16$, by far the largest). Removing it splits the tree into:

$\{0,1,2,3\}$ — the near-origin rectangle, and
$\{4,5,6,7\}$ — the far rectangle.

Exactly the two clusters we predicted from the coordinates. The MST found the gap for us: the cluster boundary is wherever the MST's longest edge is.

Here is the cut-and-label step in code, reusing the MST edges (with their weights) from Phase 3:

def single_linkage(mst_edges, n, k):
    """Delete the k-1 heaviest MST edges; return clusters as a list of sets."""
    keep = sorted(mst_edges, key=lambda e: e[2])[: (n - 1) - (k - 1)]  # drop heaviest k-1
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]; x = parent[x]
        return x
    for u, v, w in keep:
        parent[find(u)] = find(v)
    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), set()).add(v)
    return [sorted(c) for c in clusters.values()]

print(sorted(single_linkage(mst, 8, 2)))
# Expected output:
# [[0, 1, 2, 3], [4, 5, 6, 7]]

Hand-deriving the output. With $n=8$ and $k=2$, we keep $(n-1)-(k-1) = 7 - 1 = 6$ edges — all of the MST except its single heaviest, the bridge $2$–$4$. The six kept edges are the within-rectangle ones, which union $\{0,1,2,3\}$ into one set and $\{4,5,6,7\}$ into another. Grouping vertices by their root gives the two clusters $[0,1,2,3]$ and $[4,5,6,7]$. ✓

For $k = 3$: delete the two heaviest MST edges — the bridge $2$–$4$ (weight $16$) and then the next heaviest, one of the weight-$4$ edges, say $0$–$2$. Removing $0$–$2$ further splits the near rectangle into $\{0,1\}$ and $\{2,3\}$. The three clusters become $\{0,1\}, \{2,3\}, \{4,5,6,7\}$. The far rectangle stays whole because its internal edges (weight 3 and 4) are cheaper than $0$–$2$ only by the tie; in a real data set with distinct distances the third cut would land unambiguously on whichever within-cluster edge is globally largest.

🔄 Check Your Understanding To split these eight sensors into $k=4$ clusters, how many MST edges do you delete, and (using the weight-3/weight-4 structure) what are the resulting clusters?

Answer

Delete $k-1 = 3$ heaviest edges: the bridge $2$–$4$ (16), and the two weight-$4$ edges $0$–$2$ and $4$–$6$. That leaves only the four weight-$3$ edges, giving the four clusters ${0,1}, {2,3}, {4,5}, {6,7}$ — the four vertical sensor pairs. (Deleting the $k-1$ heaviest of $n-1$ edges always leaves $(n-1)-(k-1) = n-k$ edges across $n$ vertices, which is exactly $k$ components.)

Phase 5: Why deleting $k-1$ edges gives exactly $k$ clusters (the analysis)

This is the load-bearing claim, and it is a direct consequence of the spanning-tree edge count from §32.1. The MST is a tree on $n$ vertices, so it has exactly $n-1$ edges and is connected (one component). Each edge you delete from a tree increases the number of connected components by exactly one — because §32.1 established that removing any one edge from a spanning tree splits it into exactly two pieces. Starting from $1$ component and deleting $k-1$ edges therefore yields $1 + (k-1) = k$ components. No edge deletion can split a tree into more than two pieces (a tree has no cycles, so each edge is a "bridge" whose removal disconnects exactly its two sides), so the count is exact, not an upper bound.

That settles how many clusters. Which clusters? Deleting the heaviest edges is what makes this single-linkage: the largest MST edge is the longest "nearest-neighbor link" holding two groups together, so cutting it separates the two groups that were joined by their closest pair of points — precisely the single-linkage merge criterion, run in reverse.

🚪 Threshold Concept. One structure, two jobs. The MST simultaneously encodes (a) the cheapest way to connect the points and (b) the entire hierarchy of how to separate them. The connect/separate duality is the same edge-count fact — "a tree on $n$ vertices has $n-1$ edges, and each edge is a bridge" — read in two directions. Once you see that a single combinatorial object answers both the network-design question and the clustering question, you start looking for the tree hiding inside other problems.

Phase 6: The failure mode — chaining (read it off the MST)

Single-linkage's weakness is chaining: a sparse trail of intermediate points can link two clusters that a human would keep apart. The MST shows you exactly when this happens. Suppose we add one rogue sensor, $8$, sitting halfway in the gap at $(12, 0)$. Its distance to point $2$ (at $(4,0)$) is $8$, and to point $4$ (at $(20,0)$) is also $8$ — both cheaper than the old bridge of $16$.

Now Kruskal, after building the two rectangles, will add $2$–$8$ (weight $8$) and $8$–$4$ (weight $8$) instead of the direct $2$–$4$ (weight $16$). The MST's heaviest edge is now only $8$, and it is not a clean separator: cutting it (for $k=2$) splits off either $\{4,5,6,7\}$ or leaves the rogue point dangling with one rectangle, depending on tie-breaks — the crisp two-blob structure is gone, replaced by a "chain." The lesson: a single intermediate point can collapse the gap, and you can see it in the MST as a pair of medium-weight edges where you expected one large one.

🐛 Find the Error: A teammate concludes, "Single-linkage is robust because the MST is unique and optimal." Identify the conflation. Is the MST being optimal the same as the clustering being good?

Answer

The MST being a provably minimum-weight spanning tree (optimal for the connection problem) says nothing about whether single-linkage produces good clusters. Optimality of the MST is about total edge weight; clustering quality is about whether the cuts match meaningful structure. Chaining shows a perfectly optimal MST yielding a poor clustering. "Optimal tree" ≠ "good clusters." The two problems share the MST but have different success criteria.

Discussion Questions

We computed with squared Euclidean distance to stay in integers. Prove that for any monotone increasing function $g$ (here $g(d) = d^2$ on $d \ge 0$), Kruskal returns the same MST edge set on weights $w$ and on weights $g(w)$. Which step of Kruskal does this rely on?
Single-linkage gives all clusterings (every $k$) from one MST. Compare this to a method that must be re-run from scratch for each $k$. What is the cost advantage, in terms of the $O(E \log V)$ MST build plus per-$k$ cutting?
The chaining example used a point exactly between the two rectangles. How far into the gap could the rogue point move before single-linkage stops merging the rectangles through it? State the condition on its distances in terms of the old bridge weight $16$.
Suppose two of the eight original distances had been tied in a way that made the MST non-unique. Argue that the $k=2$ clustering is still unique even though the MST is not. (Hint: the heaviest edge — the bridge — is unique here.)
How would you adapt this pipeline to cluster users in a social graph (the Part V anchor) rather than points in a plane? What plays the role of "distance"?

Your Turn: Extensions

Option A (dendrogram heights). Modify the pipeline to output, for each merge, the height (the true distance, not squared) at which it occurred. Print the merges in order. This is the single-linkage dendrogram — confirm the last merge has height $16$ for the original eight points.
Option B (max-spacing clustering). It is a theorem (Kleinberg–Tardos) that deleting the $k-1$ heaviest MST edges produces the clustering of maximum spacing (largest minimum inter-cluster distance). Test this empirically: for the original points and $k=2$, compute the spacing of the MST clustering and of one alternative 2-clustering you pick by hand, and confirm the MST's is larger.
Option C (robustify). Implement an outlier guard: before clustering, drop any point whose nearest- neighbor distance exceeds a threshold (these are the points that cause chaining). Re-run on the nine-point version with the rogue sensor and show the two clean clusters return.

Key Takeaways

Clustering is an MST in disguise. Build the MST of your points (edge weight = distance), and every single-linkage clustering is available by deleting heaviest edges — no re-running per $k$.
Delete $k-1$ edges, get $k$ clusters — provably. This is the §32.1 fact "each tree-edge removal adds exactly one component," applied $k-1$ times to a connected tree.
The MST exposes the gaps. The cluster boundary is the MST's heaviest edge; chaining shows up as a suspicious pair of medium edges where you expected one large one.
Optimal MST ≠ good clustering. The MST is provably minimum-weight, but that guarantees nothing about cluster quality; understanding the tree is what lets you trust — or distrust — the result.