Case Study: Diagnosing Register Spills in a Compiler's Allocator

DataField.Dev

Case Study: Diagnosing Register Spills in a Compiler's Allocator

"A compiler's register allocator is graph coloring wearing a hard hat."

Executive Summary

A function in your codebase compiles to slow machine code, and the compiler's diagnostics blame register spills — values it could not keep in fast CPU registers and had to push to memory. Your job is not to write an allocator but to understand one: to take the function, build its interference graph, compute the chromatic number, and explain — provably — exactly why the spill was unavoidable on a machine with only so many registers, and what (if anything) would fix it. This is the analysis side of §33.2 (register allocation as coloring) and §33.1 (the clique lower bound as a certificate of register pressure). The payoff is a skill every systems programmer eventually needs: reading a performance problem as a statement about $\chi(G)$ and $\omega(G)$, not as compiler magic.

Skills applied

Translating straight-line code into an interference graph (vertices = values, edges = overlapping live ranges).
Computing live ranges and reading off interference edges by hand.
Using the clique lower bound ($\chi \ge \omega$) as a certificate of minimum register pressure.
Computing $\chi$ exactly on a small graph and explaining a spill as "$\chi >$ register count."
Analyzing how greedy coloring's vertex order changes the register assignment — and when it spills unnecessarily.

Background

The scenario

A graphics routine computes a weighted blend of pixel channels. The compiler lowers it to this straight-line intermediate code (each line defines one value; read() stands for a load from memory or an argument):

1:  r = read()              # red
2:  g = read()              # green
3:  b = read()              # blue
4:  w = r + g               # partial sum 1
5:  x = w + b               # partial sum 2  (r dies after line 4? no -- used line 6)
6:  y = r + x               # r dies after this
7:  z = g + y               # g dies after this
8:  return b + z            # b dies after this; x, w already dead

The target is an embedded CPU with 3 general-purpose registers. The compiler reports it had to spill one value to memory, adding two slow memory operations to a hot loop. The team's first instinct is "the allocator is dumb — surely 8 values fit in 3 registers if it's clever." Your task is to determine, with proof, whether the spill was the allocator's fault or a hard fact of the code.

💡 Intuition: Two values can share one register exactly when they are never "alive" at the same moment — just as two exams can share a time slot exactly when no student takes both. The register count is a budget of colors; the question is whether the conflict structure fits inside that budget. If it does not, no allocator, however clever, can avoid a spill. That impossibility is what we will certify.

Why this matters

Register allocation is one of the highest-impact passes in any optimizing compiler (GCC and LLVM both ship graph-coloring allocators). A spilled value in an inner loop can cost a program a measurable fraction of its runtime. But "the compiler spilled" is not actionable until you know why: was it a weak heuristic that a better vertex order would fix, or is the code's register pressure genuinely above the machine's capacity? Those two diagnoses lead to opposite fixes — tune the allocator vs. restructure the code — and graph coloring is exactly the tool that tells them apart.

Phase 1: Compute the live ranges

A value is live from the line that defines it up to and including its last use. Reading the code:

Value	Defined (line)	Last used (line)	Live on lines
`r`	1	6	1–6
`g`	2	7	2–7
`b`	3	8	3–8
`w`	4	5	4–5
`x`	5	6	5–6
`y`	6	7	6–7
`z`	7	8	7–8

🔄 Check Your Understanding Which values are live simultaneously at line 5 (i.e., their live ranges all include line 5)?

Answer

At line 5, the live values are r (1–6), g (2–7), b (3–8), w (4–5), and x (5–6) — five values alive at once. That count is the register pressure at line 5, and it is the highest in the function. Watch this number; it is about to become a clique.

Phase 2: Build the interference graph

Two values interfere (need different registers) if their live ranges overlap on at least one line. Reading the table, we connect every pair whose live intervals share a line:

r (1–6) interferes with g, b, w, x, y (all overlap 1–6); r does not reach z (7–8).
g (2–7) interferes with r, b, w, x, y, z (overlaps everything that touches 2–7).
b (3–8) interferes with r, g, w, x, y, z (it is alive the longest).
w (4–5) interferes with r, g, b, x (everything alive on lines 4–5).
x (5–6) interferes with r, g, b, w, y (alive on 5–6).
y (6–7) interferes with r, g, b, x, z.
z (7–8) interferes with g, b, y.

Collecting the distinct edges:

$$E = \{rg, rb, rw, rx, ry,\ gb, gw, gx, gy, gz,\ bw, bx, by, bz,\ wx,\ xy,\ yz\}.$$

That is 17 edges on 7 vertices. (Notice the non-edges, which are the opportunities: r–z, w–y, w–z, x–z. Two values joined by a non-edge may share a register.)

🔗 Connection: This is literally the construction of §33.2 — "one vertex per value; an edge between two values that are simultaneously live." We have turned a performance question into a graph. Everything from here is the coloring theory of §33.1.

Phase 3: Find the clique — a lower bound on registers

The fastest certificate of minimum register pressure is a clique: a set of values that are pairwise interfering all need distinct registers (Theorem 33.1, $\chi \ge \omega$). From Phase 1, the five values alive at line 5 — $\{r, g, b, w, x\}$ — are a natural candidate. Check that they are pairwise adjacent:

$$rg,\ rb,\ rw,\ rx \ (\checkmark);\quad gb,\ gw,\ gx\ (\checkmark);\quad bw,\ bx\ (\checkmark);\quad wx\ (\checkmark).$$

All $\binom{5}{2} = 10$ pairs are edges, so $\{r,g,b,w,x\}$ is a clique of size 5. Therefore

$$\omega(G) \ge 5 \quad\Longrightarrow\quad \chi(G) \ge 5.$$

💡 Intuition: The clique is the peak register pressure made visible. Five values alive at the same instant means five registers, full stop — there is no scheduling trick that squeezes five simultaneously-needed values into fewer storage cells. The clique lower bound from Theorem 33.1 is precisely a lower bound on register pressure, and line 5 is where the pressure peaks.

Phase 4: Compute $\chi$ exactly and deliver the verdict

We have $\chi(G) \ge 5$. Is it exactly 5, or higher? Run the exact chromatic_number from the Project Checkpoint in your head on the structure, or argue directly: a proper 5-coloring exists. Assign:

Value	`r`	`g`	`b`	`w`	`x`	`y`	`z`
Register	R1	R2	R3	R4	R5	R4	R1

Check the reused colors against the non-edges: y reuses R4 with w, and w–y is a non-edge (w died at line 5, y born at line 6) — legal. z reuses R1 with r, and r–z is a non-edge — legal. Every edge has differently-colored endpoints (verify a few: xy → R5 vs R4 ✓; yz → R4 vs R1 ✓; gz → R2 vs R1 ✓). So 5 colors suffice, and combined with the clique bound,

$$\chi(G) = 5.$$

The verdict. The machine has 3 registers but the interference graph needs $\chi(G) = 5$. Since $5 > 3$, at least two values must spill, and this is not the allocator's fault — it is a hard property of the code's conflict structure, certified by the size-5 clique. No allocator, no vertex ordering, no cleverness can fit this function into 3 registers without spilling. The clique $\{r,g,b,w,x\}$ is the proof you hand the skeptic.

🔗 Connection: When a real compiler reports a spill, it is reporting $\chi(G) >$ (register count), and the quickest human-checkable certificate of why is exactly a clique larger than the register file. You just produced that certificate by hand.

⚠️ Common Pitfall: A size-5 clique proves $\chi \ge 5$, but a failure to find a large clique does not prove $\chi$ is small. As §33.1 warns, triangle-free graphs (clique number 2) can still need many colors. The clique gives a lower bound; to confirm $\chi = 5$ we also had to exhibit a 5-coloring. Never report the clique size alone as the chromatic number.

Phase 5: Analyze the allocator's order — where heuristics earn their keep

The spill on this graph was unavoidable. But on graphs where $\chi \le$ (registers), whether the allocator spills anyway depends entirely on the vertex order it colors in (§33.6). To see the stakes, analyze a smaller, friendlier function whose interference graph is the crown graph $\text{Crown}(3)$ — bipartite, so $\chi = 2$, meaning two registers suffice. Here is the greedy allocator's behavior under two orders, on a 2-register machine:

adj = {                                    # Crown(3): a_i -- b_j iff i != j
    "a0": ["b1", "b2"], "a1": ["b0", "b2"], "a2": ["b0", "b1"],
    "b0": ["a1", "a2"], "b1": ["a0", "a2"], "b2": ["a0", "a1"],
}

def greedy(order):
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

bad  = ["a0", "b0", "a1", "b1", "a2", "b2"]   # interleaved
good = ["a0", "a1", "a2", "b0", "b1", "b2"]   # all a's, then all b's
print("bad :", 1 + max(greedy(bad ).values()))
print("good:", 1 + max(greedy(good).values()))
# Expected output:
# bad : 3
# good: 2

Hand-trace the bad order to see the trap (this mirrors §33.6): a0→0; b0 (neighbors a1,a2, uncolored) →0; a1 (neighbor b0=0) →1; b1 (neighbor a0=0) →1; a2 (neighbors b0=0,b1=1) →2; b2 (neighbors a0=0,a1=1) →2. Three colors — a spill on a 2-register machine, even though $\chi = 2$.

Now the good order: a0,a1,a2 →0,0,0 (no edges among the $a$'s); then b0 (neighbors a1,a2=0) →1; b1 (neighbors a0,a2=0) →1; b2 (neighbors a0,a1=0) →1. Two colors — no spill.

💡 Intuition: Same graph, same algorithm, opposite outcome. The bad order interleaves the two sides so each pair looks "new" and grabs a fresh color; the good order colors one whole side before the other, so the second side sees only one forbidden color. This is exactly why production allocators invest in orderings like Welsh–Powell (decreasing degree) and smallest-last — and why §33.6 insists greedy's count is only an upper bound on $\chi$, never the answer itself.

🐛 Find the Error: A teammate concludes from the bad trace: "Crown(3) needs 3 registers, so the spill is unavoidable here too." Refute this in one sentence, using the right quantity.

Answer

Wrong quantity: $\chi(\text{Crown}(3)) = 2$ (it is bipartite — color all $a$'s 0 and all $b$'s 1), so two registers suffice; greedy's 3 was a bad-order artifact, not a property of the graph. The fix is a better vertex order, not more registers — the opposite of the Phase 4 case.

Discussion Questions

In Phase 4 we found $\chi(G) = 5$ on a 3-register machine, forcing spills. If you could rewrite the source to shorten one live range — say, recompute r at line 6 instead of keeping it alive from line 1 — how would the interference graph change, and could that lower $\chi$? (This is "rematerialization," a real compiler trick.)
The clique $\{r,g,b,w,x\}$ certified $\chi \ge 5$. Suppose a different function had no clique larger than 3 but still needed 4 registers. What does §33.1's pitfall say is possible, and why can't you rely on clique-hunting alone to compute register pressure?
Phase 5 showed greedy's order matters enormously. Welsh–Powell colors in decreasing-degree order. On $\text{Crown}(3)$ every vertex has degree 2 (a tie) — does Welsh–Powell help here? What does this reveal about the limits of any single ordering heuristic?
Deciding $k$-colorability is NP-complete for $k \ge 3$ (§33.5, Chapter 37). Real allocators run in roughly linear time per attempt. Reconcile these: how can a fast heuristic be acceptable for a problem that is intractable to solve exactly? (Tie your answer to theme four: proof frames the target, computation reaches a usable answer.)

Your Turn: Extensions

Option A (raise the budget). Re-solve Phase 4 for a machine with 4 registers, then 5. At which register count does the spill disappear, and how does $\chi(G) = 5$ predict that exact threshold?
Option B (build the certificate finder). Write max_clique_bruteforce(adj) that returns a largest clique (try all subsets, largest first; $\le 30$ lines). Run it in your head on the Phase 2 graph and confirm it returns a size-5 set. Explain why this is exponential and therefore only practical on the small graphs a single function produces.
Option C (measure the gap). Using greedy_coloring and chromatic_number from the Project Checkpoint, write code that reports, for the Phase 2 graph, both the greedy color count (natural order) and the exact $\chi$, then prints their difference. Hand-derive the expected output. What does a nonzero gap tell a compiler engineer about their chosen vertex order?

Key Takeaways

Register allocation is graph coloring. Values are vertices, simultaneous liveness is an edge, registers are colors, and the minimum registers needed is $\chi(G)$ — the same theorem as exam scheduling, in a hard hat.
A clique certifies a spill. A clique of size $k > $ (registers) is a short, human-checkable proof that no allocator can avoid spilling. It is a lower bound on register pressure (Theorem 33.1).
Lower bound ≠ answer. A clique proves $\chi \ge \omega$; confirming $\chi$ also requires exhibiting a coloring, because clique number can fall short of the chromatic number.
When $\chi \le$ registers, the order decides. Greedy is correct but not optimal; a bad vertex order can spill on a graph that fits, which is why real allocators invest in orderings — exactly the §33.6 lesson, paying rent inside a compiler.