Case Study: Building a Randomized MAX-CUT Solver and Proving It Works

DataField.Dev

Case Study: Building a Randomized MAX-CUT Solver and Proving It Works

"The probabilistic method is one of the most powerful and elegant techniques in combinatorics." — Noga Alon and Joel Spencer, The Probabilistic Method

Executive Summary

In this build-heavy case study you will write a tiny randomized algorithm — for MAX-CUT, the problem of splitting a network's nodes into two groups so as to cut as many edges as possible — and then do something that feels like magic the first time you see it: you will prove a performance guarantee about the algorithm using nothing but linearity of expectation, and you will use the probabilistic method to prove that a good cut always exists. The algorithm itself is almost embarrassingly simple (flip a coin for each node), yet linearity proves it cuts at least half the edges on average, and the expectation form of the probabilistic method turns that average into a guarantee of existence. You will build the solver, build a Monte-Carlo harness to measure its performance, and watch the measured numbers converge to the proven expectation — theme four, in code you wrote.

MAX-CUT is not a toy: it is the canonical example of an NP-hard problem with a clean randomized approximation, it models "split this social network into two camps to maximize cross-camp tension" (or minimize within-camp conflict), and the random-coloring argument here is the same argument as the two-coloring theorem in §20.6. Building the solver makes that abstract proof concrete.

Skills applied

Modeling a graph cut as a random variable on the sample space of random vertex colorings (§20.1, §20.3).
Defining edge indicators and using linearity of expectation to compute the expected cut size (§20.4).
Applying the expectation form of the probabilistic method to prove a good cut exists (§20.6).
Building a Monte-Carlo harness with the Toolkit's simulate/expected_value and confirming the derivation empirically (§20.6, Project Checkpoint).
Distinguishing what the simulation checks from what the proof settles (theme four).

Background

The problem

You are given an undirected graph $G = (V, E)$ — think of it as a social network, with vertices for people and an edge between any two people who interact. A cut is a way of partitioning the vertices into two sets, $A$ and $B$ (color each vertex red or blue). The size of the cut is the number of crossing edges: edges with one endpoint in $A$ and the other in $B$. The MAX-CUT problem asks for the partition that makes the cut as large as possible.

# A small graph as an adjacency set. Vertices 0..4; edges are unordered pairs.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
#   0 --- 1
#   | \   | \
#   |  \  |  4
#   2 --- 3

Finding the exact maximum cut is NP-hard — for a large graph, no known algorithm finds the optimum efficiently. But there is a startlingly simple randomized algorithm with a provable guarantee, and proving that guarantee is a pure application of this chapter.

💡 Intuition: Imagine assigning each person to one of two rooms by flipping a coin. Any given friendship (edge) is "cut" — the two friends end up in different rooms — exactly when their two coins disagree. Two fair coins disagree half the time. So on average, half of all friendships get cut, no matter how the network is wired. That one sentence is the whole performance proof; the rest is making it precise.

Why this matters

MAX-CUT is the textbook entry point to approximation algorithms and randomized algorithms — two pillars of modern CS. The "flip a coin per vertex" method is a $1/2$-approximation: it is guaranteed (in expectation) to cut at least half the edges, and since no cut can exceed all $\lvert E \rvert$ edges, it gets at least half of the optimum. That a one-line random algorithm comes with an ironclad guarantee — and that the guarantee is proved with the linearity you learned three sections ago — is one of the most persuasive arguments in this book for why probability is a tool, not just a topic.

Phase 1: Model the cut as a random variable

Set up the sample space deliberately. The experiment is: color each of the $\lvert V \rvert$ vertices red or blue, independently, by a fair coin flip. An outcome is an assignment of a color to every vertex, so $$\lvert S \rvert = 2^{\lvert V \rvert},$$ and all $2^{\lvert V \rvert}$ colorings are equally likely. The quantity we care about — the number of crossing edges — is a function of the outcome, i.e. a random variable (§20.3): $$X(\text{coloring}) = \text{number of edges whose endpoints get different colors.}$$ $X$ is exactly the cut size of the random coloring. Our goal is $E[X]$, the average cut size over all $2^{\lvert V\rvert}$ colorings, and then a guarantee that some coloring achieves at least that average.

🔄 Check Your Understanding For the 5-vertex graph above, how many outcomes are in the sample space? If we color $\{0,3\}$ red and $\{1,2,4\}$ blue, how many of the 6 edges are crossing?

Answer

$\lvert S \rvert = 2^5 = 32$ equally likely colorings. Crossing edges (endpoints differ): $(0,1)$ ✓, $(0,2)$ ✓, $(1,2)$ ✗ (both blue), $(1,3)$ ✓, $(2,3)$ ✓, $(3,4)$ ✓ — that is 5 of the 6 edges cut.

Phase 2: The expected cut size, by linearity

Computing the distribution of $X$ directly — for how many of the $2^{\lvert V\rvert}$ colorings is the cut size exactly $7$? — is hopeless for any interesting graph. Linearity of expectation makes $E[X]$ a one-liner, in the now-familiar pattern: write $X$ as a sum of indicators, then add the probabilities.

For each edge $e = (u, v)$, define the indicator $$X_e = \mathbf{1}[\text{the endpoints } u \text{ and } v \text{ get different colors}],$$ so that the total cut size is $X = \sum_{e \in E} X_e$. By the indicator lemma (§20.4), $E[X_e] = P(u \text{ and } v \text{ differ in color})$. Each endpoint is red or blue by an independent fair flip; the two flips disagree in exactly 2 of the 4 equally-likely color pairs (RB and BR, not RR or BB), so $$E[X_e] = P(\text{endpoints differ}) = \frac{2}{4} = \frac{1}{2}.$$ Now apply linearity across all $\lvert E\rvert$ edges — and note the edges are not independent (sharing a vertex couples them), yet linearity does not care: $$\boxed{\,E[X] = \sum_{e \in E} E[X_e] = \lvert E\rvert \cdot \frac{1}{2} = \frac{\lvert E\rvert}{2}.\,}$$ A uniformly random coloring cuts half the edges, on average — for every graph, regardless of its structure. For the 5-vertex graph with $\lvert E\rvert = 6$ edges, $E[X] = 3$.

🚪 Threshold Concept: linearity proves an algorithmic guarantee with zero case analysis. Look at what we did not do: we never enumerated colorings, never used the graph's shape, never worried that adjacent edges' indicators are correlated. We wrote the cut as a sum of edge-indicators, replaced each expectation by the probability $1/2$, and added. The graph could be a triangle, a thousand-node social network, or a pathological worst case — $E[X] = \lvert E\rvert/2$ either way. This is the chapter's central technique (§20.4) doing real algorithmic work: a performance guarantee that falls out of one line of probability.

from fractions import Fraction

def expected_cut(num_edges):
    """E[cut size] for a uniformly random 2-coloring = |E| / 2, exactly."""
    return Fraction(num_edges, 2)

print(expected_cut(6))        # the 5-vertex example
print(expected_cut(1000))     # a 1000-edge network
# Expected output:
# 3
# 500

The exact arithmetic restates the theorem: half the edges, every time, in expectation.

Phase 3: From an average to a guarantee (the probabilistic method)

An average of $\lvert E\rvert/2$ is a statement about the mean over all colorings. Does that guarantee some particular coloring achieves at least $\lvert E\rvert/2$ crossing edges? Yes — by the expectation form of the probabilistic method (§20.6): a random variable always attains a value at least its mean on some outcome.

The strategy first. If every coloring had a cut size strictly less than the average $\lvert E\rvert/2$, then the average of all the cut sizes — which is exactly $E[X]$ — would itself have to be strictly less than $\lvert E\rvert/2$. But we computed $E[X] = \lvert E\rvert/2$ exactly. Contradiction. So at least one coloring must reach the average or beat it. We never build that coloring; we only show its nonexistence is impossible.

Theorem (a large cut always exists). Every graph $G = (V, E)$ has a cut of size at least $\lvert E\rvert/2$ — that is, the vertices can be 2-colored so that at least half the edges cross.

Proof. Color the vertices red/blue uniformly at random and let $X$ be the cut size. By Phase 2, $E[X] = \lvert E\rvert/2$. By the expectation form of the probabilistic method, some outcome (some specific coloring) has $X \ge E[X] = \lvert E\rvert/2$. That coloring is a cut of size at least $\lvert E\rvert/2$. $\blacksquare$

Read that again and notice the §20.6 signature: the theorem ("a large cut exists") has nothing to do with randomness, yet the cleanest proof introduces randomness, computes one expectation, and discards the randomness. It is the two-coloring proof of §20.6 in a new costume — there we showed $P(\text{good coloring}) > 0$; here we show $E[\text{cut}] = \lvert E\rvert/2$ forces an outcome at least that good. Both are the probabilistic method.

🧩 Productive Struggle. Before reading on: the proof guarantees a cut of size at least $\lvert E\rvert/2$. Can you find a graph where the maximum cut is exactly $\lvert E\rvert/2$, so the guarantee is tight and cannot be improved by the same argument? (Hint: try the triangle $K_3$. How many of its 3 edges can any 2-coloring cut?) Resolve this before Phase 4.

Phase 4: Build the solver

The proof guarantees a good cut exists, but it does not hand us one. The constructive companion is dead simple: try random colorings and keep the best. Because the expected cut is $\lvert E\rvert/2$, a random coloring is "good" often enough that a handful of tries reliably finds one — and we can even prove a bound on how many tries are needed, which Option B in "Your Turn" pursues.

import random

def cut_size(coloring, edges):
    """Number of edges whose endpoints get different colors under `coloring`."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])

def random_max_cut(vertices, edges, attempts):
    """Try `attempts` random 2-colorings; return the best (coloring, cut size)."""
    best_coloring, best = None, -1
    for _ in range(attempts):
        coloring = {v: random.randint(0, 1) for v in vertices}   # fair coin per vertex
        c = cut_size(coloring, edges)
        if c > best:
            best_coloring, best = coloring, c
    return best_coloring, best

random.seed(0)
V = [0, 1, 2, 3, 4]
E = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
coloring, best = random_max_cut(V, E, 50)
print(best)                      # best cut found in 50 tries (>= |E|/2 = 3)
print(best >= len(E) / 2)        # the guarantee from Phase 3
# Expected output:
# 5
# True

The solver tries 50 random colorings and keeps the best; on this graph the true maximum cut is 5 (the single uncut edge can only be one of a triangle's three mutually-adjacent edges), and 50 tries comfortably find it. The second line confirms the Phase-3 guarantee: the result is at least $\lvert E\rvert/2 = 3$. The build constructs what the probabilistic method promised.

⚠️ Common Pitfall: "expected $\lvert E\rvert/2$" does not mean "always $\lvert E\rvert/2$." A single random coloring can do worse than half (or much better) — the guarantee is about the average and about existence, not about any one run. That is exactly why the solver tries many colorings and keeps the best: it converts "good on average / some good one exists" into "found a good one." Confusing the expectation with a per-run certainty is the same category of error as reading $E[X] = 3.5$ as a prediction for one die roll (§20.4).

Phase 5: Confirm the expectation empirically

We proved $E[X] = \lvert E\rvert/2$. Now check it — theme four — by measuring the average cut size over many random colorings with the Toolkit's simulate/expected_value machinery (Project Checkpoint). The average of single-run cut sizes should converge to $\lvert E\rvert/2 = 3$.

import random
from itertools import product
from dmtoolkit.probability import expected_value

# Exact: average cut over ALL 2^5 = 32 colorings, using expected_value.
space = list(product([0, 1], repeat=5))                  # every coloring of 5 vertices
E = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
cut_of = lambda c: sum(1 for u, v in E if c[u] != c[v])
print(expected_value(cut_of, space))                     # exact E[X]

# Empirical: average cut over many random colorings (law of large numbers).
random.seed(1)
trials = 100_000
avg = sum(cut_of(tuple(random.randint(0, 1) for _ in range(5)))
          for _ in range(trials)) / trials
print(round(avg, 2))
# Expected output:
# 3
# 3.0

The exact expected_value over all 32 colorings returns precisely $3 = \lvert E\rvert/2$, vindicating the linearity proof with a full enumeration. The Monte-Carlo average over 100,000 random colorings also lands at $3.0$ — close, seed-dependent, and never exactly $3$ on a finite sample, which is the whole point of §20.6: the simulation gives fast evidence, the proof gives certainty, and they agree. If a coding bug had crept into cut_size, the empirical average would have drifted away from $3$ and the disagreement would have flagged it instantly.

🔗 Connection. The "keep the best of many random tries" structure here is the seed of a whole design pattern — randomized algorithms with amplification: run a randomized procedure repeatedly to drive the failure probability down. The same idea underlies randomized primality testing (Chapter 23, where each trial is a witness check) and the analysis of randomized quicksort (Chapter 21). The expectation you proved is what makes the amplification provably work, not just empirically lucky.

Discussion Questions

The performance proof in Phase 2 never used a single structural fact about the graph — not its degree sequence, not whether it is connected, nothing. Explain why $E[X] = \lvert E\rvert/2$ holds for every graph, and identify the exact step where linearity let you ignore the (real) dependence between edges that share a vertex.
Phase 3 proved a cut of size $\ge \lvert E\rvert/2$ exists but did not construct one; Phase 4 constructed one by random trials. Describe the relationship between these two facts. Could you have the existence guarantee without any way to construct the object? (Yes — that is precisely the situation for some objects the probabilistic method proves to exist; see §20.6 and Chapter 40.)
The Productive Struggle showed the triangle $K_3$ has maximum cut exactly $\lvert E\rvert/2 \cdot \frac{?}{?}$ — work it out. What does this say about whether the "$\ge \lvert E\rvert/2$" guarantee can be improved to "$\ge c\lvert E\rvert$" for some $c > 1/2$ by this argument alone?
The solver keeps the best of attempts random colorings. Intuitively, how does increasing attempts change the probability of finding a cut of size at least $\lvert E\rvert/2$ versus the expected best cut found? Which one does the Phase-3 theorem speak to directly?
Suppose you weighted the edges (some friendships matter more). Restate the random variable $X$ as a sum of weighted edge-indicators and re-run the Phase-2 argument. What does $E[X]$ become, and does the probabilistic-method guarantee still hold? (This is the weighted MAX-CUT, and linearity handles it with no extra effort.)

Your Turn: Extensions

Option A (weighted cut). Implement weighted_cut(coloring, weighted_edges) where each edge carries a weight $w_e$, and prove by linearity that the expected weighted cut is $\frac{1}{2}\sum_e w_e$. Then prove a cut of at least half the total weight exists. You will find the proof is identical to the unweighted one with $1$ replaced by $w_e$ — the power of writing it as a sum of indicators.
Option B (how many tries?). A single random coloring achieves cut $\ge \lvert E\rvert/2$ with some probability $p > 0$ (the probabilistic method guarantees $p > 0$; a sharper argument gives a concrete bound). Empirically estimate $p$ for the 5-vertex graph by simulation, then argue that the probability the solver fails to find such a cut in $t$ independent tries is at most $(1 - p)^t$, which shrinks geometrically. This is amplification made quantitative.
Option C (greedy vs. random). Build a deterministic greedy MAX-CUT: process vertices in order and place each on the side that cuts more of its already-placed edges. Prove (or argue) that greedy also achieves $\ge \lvert E\rvert/2$, and compare its cut to the random solver's on several graphs. When does determinism win, and when is the simplicity of "flip a coin" worth more?

Key Takeaways

Model the cut as a random variable on the space of random colorings. With $2^{\lvert V\rvert}$ equally-likely colorings, the cut size $X$ is a function of the outcome, and $E[X]$ is the average cut.
Linearity proves the performance guarantee in one line. Writing $X = \sum_e X_e$ with each $E[X_e] = 1/2$ gives $E[X] = \lvert E\rvert/2$ for every graph — no structure, no independence, no case analysis. This is §20.4's central technique doing algorithmic work.
The probabilistic method turns the average into a guarantee. Because some outcome must meet the mean, a cut of size $\ge \lvert E\rvert/2$ always exists — proved without constructing one, the §20.6 signature move.
Build by amplification. "Try many random colorings, keep the best" constructs the cut the proof promised; running it repeatedly drives the failure probability down geometrically — the design pattern behind randomized algorithms throughout the rest of the book.
Prove it, then check it. expected_value over all 32 colorings returned exactly $\lvert E\rvert/2$, and 100,000 random trials hovered at the same value: the proof settles every graph, the simulation guards against bugs, and together they are theme four in working code.