Case Study: Building a Verify/Find Workbench — A Brute-Force SAT Solver and a Reduction Pipeline

DataField.Dev

Case Study: Building a Verify/Find Workbench — A Brute-Force SAT Solver and a Reduction Pipeline

"You do not understand a reduction until you have watched it carry an answer from one problem to another."

Executive Summary

The deepest idea in this chapter is the verify/find gap: checking a solution can be vastly cheaper than finding one (§37.3). In this build-focused case study you will make that gap executable. You'll construct a small "workbench" with three parts: (1) a fast, polynomial-time verifier for SAT; (2) a brute-force finder that searches all assignments, so you can literally watch the cost explode; and (3) a working reduction pipeline that takes a 3-SAT instance, transforms it into a CLIQUE instance using the §37.5 construction, and carries the satisfiability answer across — demonstrating that an algorithm for CLIQUE would solve SAT. Along the way you'll instrument the search to count assignments and confirm the $2^v$ blow-up by hand.

This is the build counterpart to Case Study 1 (which was analysis-and-proof). Here the deliverable is working machinery and the hand-derived outputs that prove you understand it. Crucially — and per the chapter's discipline — we never execute any of it: every output is reasoned out and recorded in an # Expected output: comment, exactly as the book insists.

Skills applied

Implementing a polynomial-time verifier vs. an exponential finder (§37.2–37.3).
Instrumenting and hand-deriving the $2^v$ search-space size (§37.3, Chapter 1 truth tables).
Implementing the $\text{3-SAT} \le_p \text{CLIQUE}$ reduction construction (§37.5).
Watching a reduction transfer an answer from one problem to another, the operational meaning of $\le_p$ (§37.4).
Reading the result through the coping menu (§37.6).

Background

The scenario

You are building a teaching tool — a hypothetical example, but every piece is standard — to help your team's new hires feel the difference between checking and finding. Slides don't do it; people need to see a verifier finish instantly while a finder grinds, and to watch a reduction physically move an answer. The workbench has to be small enough to trace by hand (so it can go in onboarding docs) yet faithful to the real definitions. You'll build it bottom-up: verifier, then finder, then the reduction that ties SAT to CLIQUE.

💡 Intuition: The whole point of NP is the asymmetry between two programs that look superficially similar. The verifier is handed a candidate and says yes/no in one pass. The finder is handed nothing and must conjure a candidate from an exponential space. Putting them side by side, in code you can read, makes the abstract "$\exists$" of the NP definition (§37.3) concrete: the finder is the search over the existential quantifier; the verifier is what's left once a witness is supplied.

Why this matters

Every practitioner eventually phrases a real problem as SAT (hardware verification, dependency resolution, configuration, scheduling) and hands it to a solver. Understanding what the solver is up against — the exponential space it must navigate, and why verifying its answer is trivially cheap — is what lets you reason about when it will be fast and when it won't. And understanding reductions operationally (not just as proof devices) is what lets you recognize that your weird new problem is "secretly SAT" and reuse decades of solver engineering. Building the toy makes the real thing legible.

There is also a concrete payoff for debugging solver-based systems. When a SAT-backed feature is slow, engineers reflexively blame the solver ("it's a bad solver") or their encoding ("my clauses are wrong"). The workbench teaches a third, often-correct diagnosis: the instance is simply near the worst case for an NP-complete problem, and no amount of solver tuning will make the worst case polynomial (that would prove $\text{P} = \text{NP}$). Knowing which of the three explanations applies — bad solver, bad encoding, or genuine hardness — is the difference between an afternoon of profiling and a month of futile optimization. That judgment is exactly what this chapter trains, and the toy is where it is cheapest to learn.

Phase 1: The verifier (the cheap half)

We start with the polynomial-time checker — the witness that SAT is in NP (§37.2). This is the same verify_sat shape as §37.3, lightly generalized so a clause may have any number of literals.

def verify_sat(clauses, assignment):
    """clauses: list of clauses; each clause is a list of (var, is_positive).
    assignment: dict var -> bool. Returns True iff EVERY clause has at least one
    true literal. Runs in time proportional to the total formula size -- polynomial."""
    for clause in clauses:
        if not any(assignment[v] == is_pos for (v, is_pos) in clause):
            return False                       # this clause is unsatisfied -> formula fails
    return True

# (x1 OR NOT x2) AND (x2 OR x3)
f = [[("x1", True), ("x2", False)], [("x2", True), ("x3", True)]]
print(verify_sat(f, {"x1": True,  "x2": True,  "x3": False}))   # both satisfied
print(verify_sat(f, {"x1": False, "x2": True,  "x3": False}))   # clause 1 fails
# Expected output:
# True
# False

Hand-derive the output. First call, $x_1{=}T, x_2{=}T, x_3{=}F$: clause 1 is $x_1 \lor \neg x_2$; literal $x_1$ needs $x_1{=}T$ — true, so clause 1 passes. Clause 2 is $x_2 \lor x_3$; literal $x_2$ needs $x_2{=}T$ — true, so clause 2 passes. All clauses satisfied → True. Second call, $x_1{=}F, x_2{=}T, x_3{=}F$: clause 1 is $x_1 \lor \neg x_2$; literal $x_1$ needs $x_1{=}T$ (false), literal $\neg x_2$ needs $x_2{=}F$ (but $x_2{=}T$, so false) — clause 1 has no true literal → return False immediately. Output is True then False.

The cost is one pass over the formula: $O(\text{total literals})$. That brevity is the entire content of "SAT is in NP." Hold this program in mind as the baseline; the finder below does something categorically more expensive.

🔄 Check Your Understanding The verifier returns False the instant it finds one unsatisfied clause. Does this short-circuit change its asymptotic complexity? Why is the worst case still the relevant one for classifying SAT?

Answer

No — short-circuiting only helps on some inputs; the worst case (a satisfied formula, or one whose only bad clause is last) still scans all clauses, $O(\text{formula size})$, which is polynomial. Classification uses the worst case over all inputs (§37.1), so the verifier is a polynomial-time verifier regardless of the lucky early exits.

Phase 2: The finder (the expensive half) — and watching the blow-up

Now the contrast. A finder must produce a satisfying assignment if one exists, with no certificate to lean on. The honest brute force: enumerate every assignment and verify it. We instrument it to count how many assignments it examines, so the exponential cost is not a claim but a number you can check.

from itertools import product

def find_sat(clauses, variables):
    """Search ALL 2**|variables| assignments for one that satisfies `clauses`.
    Returns (assignment, tried) if found, else (None, tried). Exponential by design."""
    tried = 0
    for bits in product([False, True], repeat=len(variables)):   # all 2**v assignments
        tried += 1
        assignment = dict(zip(variables, bits))
        if verify_sat(clauses, assignment):
            return assignment, tried
    return None, tried

vars3 = ["x1", "x2", "x3"]
g = [[("x1", True), ("x2", False)], [("x2", True), ("x3", True)]]   # satisfiable
print(find_sat(g, vars3))
unsat = [[("x1", True)], [("x1", False)]]                            # x1 AND NOT x1
print(find_sat(unsat, ["x1"]))
# Expected output:
# ({'x1': False, 'x2': False, 'x3': True}, 2)
# (None, 2)

Hand-derive the output. product([False, True], repeat=3) yields assignments in this fixed order: (F,F,F), (F,F,T), (F,T,F), ... — Python's product increments the last coordinate fastest. The first candidate is $x_1{=}F, x_2{=}F, x_3{=}F$ (tried = 1): clause 1 $x_1 \lor \neg x_2$ — $\neg x_2$ is true (since $x_2{=}F$), so clause 1 passes; clause 2 $x_2 \lor x_3$ — both literals false → clause 2 fails, so this assignment is rejected. The second candidate is $x_1{=}F, x_2{=}F, x_3{=}T$ (tried = 2): clause 1 passes via $\neg x_2$ as before; clause 2 $x_2 \lor x_3$ — $x_3{=}T$ makes it pass. Both clauses satisfied → return this assignment with tried = 2. Hence ({'x1': False, 'x2': False, 'x3': True}, 2).

For the unsatisfiable formula $x_1 \land \neg x_1$ over one variable: candidates are $x_1{=}F$ (tried = 1): clause 1 is $(x_1)$, needs $x_1{=}T$ — fails; then $x_1{=}T$ (tried = 2): clause 2 is $(\neg x_1)$, needs $x_1{=}F$ — fails. All $2^1 = 2$ assignments exhausted with no success → (None, 2).

🚪 Threshold Concept: the finder is the existential search. The loop ranges over all $2^v$ assignments — it is literally enumerating the space the $\exists$ in "does there exist a satisfying assignment?" ranges over (§37.3). The verifier inside the loop is cheap; the number of times we must call it is what hurts. This is the verify/find gap in its rawest form: a polynomial check wrapped in an exponential search. P vs. NP asks whether that exponential wrapper can always be removed — whether finding can be made as cheap as checking. Nobody knows.

Now confirm the blow-up is genuinely $2^v$ with a tiny counter you can fully hand-trace.

def count_assignments(v):
    """How many truth assignments exist over v variables? (The finder's worst-case work.)"""
    total = 0
    for _ in product([False, True], repeat=v):
        total += 1
    return total

print([count_assignments(v) for v in range(5)])
# Expected output:
# [1, 2, 4, 8, 16]

Hand-derive the output. product([...], repeat=0) yields exactly one item (the empty tuple), so $v{=}0 \to 1$. For $v{=}1$: (F,), (T,) → 2. For $v{=}2$: the four pairs → 4. For $v{=}3$: 8. For $v{=}4$: 16. The list is [1, 2, 4, 8, 16] — i.e., $2^v$ for $v = 0, 1, 2, 3, 4$. A clean two-line proof (Exercise 37.25): each variable is independently true or false, so by the product rule (Chapter 1 / Chapter 15) there are $2 \times 2 \times \dots \times 2 = 2^v$ assignments. At 60 variables that is $2^{60} \approx 10^{18}$ — more checks than a CPU does in a decade. Verifying one assignment, by contrast, stayed a single pass throughout.

Phase 3: The reduction pipeline — carry an answer from 3-SAT to CLIQUE

A reduction is usually presented as a proof device. Here we run it as a program and watch it move an answer, which is the operational meaning of $A \le_p B$ (§37.4): "I can rephrase any A-question as a B-question." We implement the §37.5 construction $\text{3-SAT} \le_p \text{CLIQUE}$ — the building part, which is the polynomial step — and then reason about how a CLIQUE answer feeds back to a SAT answer.

def three_sat_to_clique(clauses):
    """The 3-SAT -> CLIQUE reduction CONSTRUCTION (section 37.5).
    One vertex per literal-occurrence; m clauses = m columns. Connect two vertices
    iff they are in DIFFERENT clauses AND not contradictory (x_i vs NOT x_i).
    Returns (vertices, edges, k) with k = number of clauses. Builds, does not solve."""
    vertices = [(j, t) for j in range(len(clauses)) for t in range(3)]
    edges = set()
    for (j, t) in vertices:
        for (j2, t2) in vertices:
            if j < j2:                                   # different clauses, no dup pairs
                v1, p1 = clauses[j][t]
                v2, p2 = clauses[j2][t2]
                if not (v1 == v2 and p1 != p2):          # not contradictory
                    edges.add(((j, t), (j2, t2)))
    return vertices, edges, len(clauses)

phi = [[("a", True),  ("b", True),  ("c", True)],        # (a OR b OR c)
       [("a", False), ("b", True),  ("c", False)]]       # (NOT a OR b OR NOT c)
V, E, k = three_sat_to_clique(phi)
print(len(V), len(E), k)
# Expected output:
# 6 7 2

Hand-derive the output. Two clauses give $2 \times 3 = 6$ vertices, so $\lvert V \rvert = 6$. Edges connect the $3 \times 3 = 9$ cross-clause pairs unless contradictory. Column 0's literals are $a, b, c$ (all positive); column 1's are $\neg a, b, \neg c$. Contradictory pairs are same-variable/opposite-sign: $a$ vs. $\neg a$ (forbidden) and $c$ vs. $\neg c$ (forbidden); $b$ vs. $b$ is fine. So $9 - 2 = 7$ edges, and $k = m = 2$. Output 6 7 2, matching §37.5 exactly.

Now the operational payoff. The reduction's theorem (§37.5) is: $\phi$ is satisfiable $\iff$ the graph has a clique of size $k = 2$. So if we had any CLIQUE-solver, we could answer SAT for $\phi$ by: (1) build the graph (the three_sat_to_clique call above, polynomial), (2) ask the solver "is there a 2-clique?", (3) return its yes/no as the answer for $\phi$. To make the back-translation concrete, here is the certificate-checker for the CLIQUE side, so we can verify a clique the same way we verified an assignment:

def verify_clique(vertices, edges, candidate, k):
    """candidate: a set of vertices. Returns True iff it is a clique of size k
    (every pair adjacent). The NP verifier for CLIQUE -- checks, does not find."""
    cand = list(candidate)
    if len(cand) != k:
        return False
    for i in range(len(cand)):                  # all C(k,2) pairs must be edges
        for j in range(i + 1, len(cand)):
            pair = (cand[i], cand[j])
            if pair not in edges and (pair[1], pair[0]) not in edges:
                return False
    return True

# The literal b appears in both clauses: vertices (0,1) and (1,1). They are
# non-contradictory (b vs b) and in different clauses -> adjacent -> a 2-clique.
print(verify_clique(V, E, {(0, 1), (1, 1)}, 2))
# Expected output:
# True

Hand-derive the output. The candidate is $\{(0,1), (1,1)\}$ — the two occurrences of literal $b$. Size is $2 = k$ ✓. The single pair $\{(0,1),(1,1)\}$: are they adjacent? They are in different clauses ($0 \ne 1$) and both are the literal $b$ (positive in both), so non-contradictory — the construction added this edge. So the pair is in edges → True. And a 2-clique here means a consistent choice of one true literal per clause: pick $b$ in clause 0 and $b$ in clause 1, i.e. set $b = \text{True}$, which indeed satisfies $\phi$ (clause 0 via $b$, clause 1 via $b$). The clique is the satisfying assignment, wearing a different costume.

🔗 Connection. This is §37.4's "$A \le_p B$ means a $B$-oracle answers $A$-questions" made tangible: the SAT question about $\phi$ became a CLIQUE question about $(V, E, k)$, and the CLIQUE answer (a 2-clique exists) translated straight back into the SAT answer (satisfiable, via $b = T$). We didn't need a fast CLIQUE solver to see the mechanism — that's exactly what a reduction is: a wiring diagram that would let any future fast solver for one problem power a fast solver for the other.

⚠️ Common Pitfall: the reduction is polynomial; the solving is not. Everything we built here — three_sat_to_clique — runs in polynomial time ($O(m^2)$ to compare clause pairs). What stays hard is answering the CLIQUE question on the constructed graph; that is where the exponential cost moved, not vanished. A reduction relocates difficulty, it doesn't dissolve it (§37.4). If building the graph were the hard part, the reduction wouldn't preserve efficiency and would prove nothing.

Putting numbers on the gap

It is worth tabulating the three programs side by side, because the contrast is the whole point of the workbench. Let a SAT instance have $v$ variables and a formula of total size $L$ (number of literal occurrences), and let a 3-SAT instance have $m$ clauses. Reason out each cost by hand:

Operation	What it does	Worst-case cost	Class it witnesses
`verify_sat(clauses, assignment)`	check one candidate	$O(L)$ — one pass	SAT $\in$ NP (the checker)
`find_sat(clauses, variables)`	search all candidates	$O(2^v \cdot L)$ — $2^v$ checks	brute-force finder (exponential)
`count_assignments(v)`	size of the search space	$\Theta(2^v)$	the source of the blow-up
`three_sat_to_clique(clauses)`	the reduction $f$	$O(m^2)$ — compare clause pairs	$\le_p$ (polynomial translation)
`verify_clique(V, E, S, k)`	check one clique	$O(k^2)$ — all pairs	CLIQUE $\in$ NP (the checker)

Read the table top to bottom and the chapter's architecture appears: two cheap checkers (rows 1 and 5) that witness membership in NP, one cheap translation (row 4) that witnesses $\le_p$, and one expensive search (row 2, sized by row 3) that is the apparently-irreducible cost of finding. P vs. NP is precisely the question of whether row 2's exponential factor can always be eliminated — whether finding can be brought down to the cost of the checkers. Nothing in this workbench (and nothing anyone has built) removes that factor for general SAT; that is the wall, made of code you can read.

💡 Intuition: The reduction row ($O(m^2)$) sits with the cheap operations, not the expensive one — and that is exactly why it is a valid reduction. If translating 3-SAT into CLIQUE were itself exponential, it would smuggle the hardness into the translation and certify nothing. A reduction is only meaningful because it is cheap: it shows the difficulty lives in the problem, not in the act of rephrasing it.

Put the three pieces together and the chapter's practical lesson writes itself. The verifier (Phase 1) is cheap and stays cheap — that is permanent, regardless of P vs. NP. The finder (Phase 2) is exponential — and unless $\text{P} = \text{NP}$, no general exact SAT finder can escape that worst case. The reduction (Phase 3) shows SAT and CLIQUE are the same hardness in two costumes, so neither can be made fast without making the whole NP class fast.

So when a real SAT instance lands on you, the workbench tells you what your options are (§37.6):

Sacrifice	On the SAT workbench	What you keep
general	restrict to a tractable fragment, e.g. 2-SAT (linear-time) or Horn/affine formulas	exact + fast on that fragment
exact	randomized/local-search SAT (WalkSAT-style): flip variables to reduce unsatisfied clauses	fast + general, usually finds a solution if one exists
fast	a real CDCL SAT solver (the industrial descendant of `find_sat`, with learning and pruning)	exact + general, tolerable on structured real instances

💡 Intuition: Your find_sat is the honest-but-hopeless ancestor of a modern SAT solver. The modern solver gives up nothing on paper — it is exact and general — but its worst case is still your exponential loop (it must be, since SAT is NP-complete). What it adds is ferocious pruning that makes the worst case rare on the structured formulas people actually feed it. "NP-complete" never promised your inputs are hard; it only described the worst case over all inputs (§37.6).

Discussion Questions

The finder examined assignments in product order and returned the first satisfying one it hit. Does the order affect correctness? Does it affect the count tried? Construct a formula where reversing the variable order changes tried but not the answer.
verify_sat, verify_clique, and the Project Checkpoint's verify_hamilton_circuit are all the same kind of object. State the common contract ("given an instance and a certificate, decide in polynomial time whether the certificate proves a yes"). Why does building a fast verifier never, by itself, put a problem in P?
In Phase 3 we built the CLIQUE instance and checked a clique, but we never found one with code. Why was that the right scope for demonstrating $\le_p$? What would change about the cost if we added a brute-force clique finder?
The reduction maps 3-SAT to CLIQUE. If someone hands you a fast CLIQUE algorithm, you get a fast 3-SAT algorithm. Does the reverse follow from this reduction — does a fast 3-SAT algorithm give a fast CLIQUE algorithm? (Think about what an additional reduction, in the other direction, would require, and why NP-completeness of both makes them interreducible anyway.)
count_assignments confirmed $2^v$. Suppose a vendor claims their solver "checks a billion assignments per second." For what $v$ does pure brute force exceed one year? Use $2^{v}$ and reason in orders of magnitude. What does this say about relying on raw enumeration?
A teammate proposes "fixing" the finder by parallelizing it across $1000$ machines. By how much does that shrink the feasible $v$ (the largest $v$ you can brute-force in fixed time)? Reason with logs: a $1000\times$ speed-up buys roughly how many extra variables? What does this reveal about why throwing hardware at an exponential problem is a losing game, and why the §37.6 coping menu (not more cores) is the real answer?

Your Turn: Extensions

Option A (instrument the gap). Extend find_sat to also return the number of clause checks performed, then hand-tabulate verifier-cost vs. finder-cost for the satisfiable 3-variable formula above. Present the contrast as a two-row table: "check one assignment" vs. "find an assignment."
Option B (build a 2-SAT special case). 2-SAT is in P. Implement a checker that, given a proposed assignment, verifies a 2-CNF formula (reuse verify_sat — it already handles any clause width), then describe in prose the linear-time finding algorithm for 2-SAT (implication graph + SCCs, Chapter 28), and explain why the same trick fails for 3-SAT. (Do not implement the finder; explain the structural reason 2 works and 3 doesn't.)
Option C (close the loop). Write solve_sat_via_clique(clauses) that (1) calls three_sat_to_clique, (2) given a clique-oracle has_clique(V, E, k), returns the oracle's answer as the satisfiability verdict, and (3) on a yes, explains how to read a satisfying assignment off the returned clique (one true literal per column). Hand-trace it on the $\phi$ above. This is the full reduction, run as a program.

Key Takeaways

Verify is cheap; find is (apparently) dear. Side by side in code, the polynomial verifier and the exponential finder make the §37.3 gap something you can see and count, not just assert.
$2^v$ is not a metaphor. The instrumented counter confirms the search space, and a two-line product argument proves it for all $v$ — computation suggests, the proof settles (theme four).
A reduction is a wiring diagram you can run. Building the CLIQUE instance from a 3-SAT formula and watching the answer cross is the operational meaning of $\le_p$: a future fast solver for one problem would power a fast solver for the other.
Difficulty relocates, it doesn't disappear. The reduction is polynomial; the hardness simply moves into the target problem. That is why NP-complete problems stand or fall together — and why the coping menu, not a magic algorithm, is the practitioner's real toolkit.
Worst case is not your case. The finder's $2^v$ is the worst case over all formulas; real instances have structure, which is why industrial solvers — the engineered descendants of find_sat — win in practice without contradicting NP-completeness. When a SAT-backed feature is slow, the right first question is "bad solver, bad encoding, or genuine worst-case hardness?" — and this workbench is where that judgment is cheapest to build.

Case Study: Building a Verify/Find Workbench — A Brute-Force SAT Solver and a Reduction Pipeline

Executive Summary

Background

The scenario

Why this matters

Phase 1: The verifier (the cheap half)

Phase 2: The finder (the expensive half) — and watching the blow-up

Phase 3: The reduction pipeline — carry an answer from 3-SAT to CLIQUE

Putting numbers on the gap

Phase 4: Read it through the coping menu

Discussion Questions

Your Turn: Extensions

Key Takeaways