Chapter 37: Introduction to Complexity Theory — P, NP, and the Biggest Open Problem in CS

DataField.Dev

50 min read

> "If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in 'creative leaps,' no fundamental gap between solving a problem and recognizing the solution once it's found."

Prerequisites

14
30
36

Learning Objectives

Define a complexity class and place a decision problem in P or NP by analyzing an algorithm or a verifier.
Distinguish solving a problem from verifying a proposed solution, and state P and NP in terms of that distinction.
Explain what a polynomial-time reduction is and use one to transfer hardness from one problem to another.
State the definitions of NP-hard and NP-complete and explain the role of the Cook–Levin theorem.
Recognize the canonical NP-complete problems (SAT, 3-SAT, CLIQUE, Hamilton circuit, TSP) and the reductions linking them.
State the P versus NP question precisely, explain its stakes, and choose a coping strategy when faced with an intractable problem.

In This Chapter

Overview
Learning Paths
37.1 Measuring difficulty: complexity classes
37.2 P and NP
37.3 Verification versus solution: the heart of the matter
37.4 NP-completeness and the Cook–Levin theorem
37.5 Reductions in action and the NP-complete zoo
37.6 Coping with intractability, and the P vs. NP question
Project Checkpoint: a polynomial-time verifier for the Toolkit
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 37: Introduction to Complexity Theory — P, NP, and the Biggest Open Problem in CS

"If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in 'creative leaps,' no fundamental gap between solving a problem and recognizing the solution once it's found." — Scott Aaronson

Overview

Here is a question that will follow you for your entire career as a programmer: some problem lands on your desk, and you cannot find an efficient algorithm for it. Is that your fault, or the problem's?

There are two very different explanations for "I can't make this fast." The first is that you are missing a clever idea — there is a fast algorithm and you simply haven't found it. The second is that no fast algorithm exists, for anyone, ever, and the wall you're hitting is a law of nature rather than a gap in your education. Telling these apart is one of the most valuable instincts in computing, because the right move is completely different in each case. If a fast algorithm exists, you keep looking (or ask a sharper colleague). If none exists, you stop wasting weeks on the impossible and switch to a heuristic, an approximation, or a smaller input.

Complexity theory is the branch of computer science that gives you the vocabulary and the tools to make that call. In Chapter 14 you learned to describe how fast an algorithm runs (Big-O) and even to prove a problem needs a certain amount of work (your first lower bound, $\Omega(n)$ for finding a maximum). In Chapter 30 you ran into a wall: the Traveling Salesman Problem and the Hamilton circuit problem looked easy to state but seemed impossible to solve efficiently, and we labeled them "NP-hard" with a promissory note that said this chapter. In Chapter 36 you saw the deepest wall of all — the halting problem, which no algorithm can solve at any speed. This chapter is about the wall in between: problems computers can solve in principle but apparently cannot solve efficiently. That middle ground is where almost all of practical computing lives, and at its center sits the most famous open problem in computer science — P versus NP — with a million-dollar prize attached and the entire security of the internet riding (indirectly) on the answer.

By the end of this chapter, when a problem resists every fast algorithm you try, you'll be able to look up whether it's a known member of a club of thousands of problems that are all hard or all easy together — and you'll know what that membership means for your next move.

In this chapter, you will learn to:

Sort decision problems into complexity classes, and define P (solvable fast) and NP (checkable fast).
Separate finding a solution from verifying one, and see why that gap is the whole drama.
Use a polynomial-time reduction to prove "if I could solve B, I could solve A" — the core move of the entire theory.
Define NP-hard and NP-complete, and explain why the Cook–Levin theorem (SAT is NP-complete) is the keystone that makes the theory possible.
Recognize the famous NP-complete problems and the web of reductions that ties them together.
State P vs. NP precisely, understand the stakes, and pick a coping strategy when you meet an intractable problem in the wild.

Learning Paths

🏃 Fast Track: Read 37.2 (P and NP) and 37.3 (verification vs. solution) for the core ideas, then 37.6 (P vs. NP and coping) for the practical payoff. Skim the Cook–Levin proof sketch in 37.4 but make sure you can state what the theorem says. Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Work every 🔄 Check Your Understanding. In 37.5, before reading our reductions, try to sketch the CLIQUE-from-3-SAT gadget yourself — wrestling with one reduction is worth more than reading ten.

🔬 Deep Dive: After the chapter, work through a full reduction chain (3-SAT $\le_p$ CLIQUE $\le_p$ VERTEX COVER) with all gadgets, and read about the polynomial hierarchy and the classes co-NP and PSPACE in the further reading. Then read a real proof of the Cook–Levin theorem in Sipser.

37.1 Measuring difficulty: complexity classes

Until now we have measured the difficulty of algorithms. Complexity theory shifts the focus to the difficulty of problems — and to do that cleanly, it makes two simplifying decisions that look restrictive but lose almost nothing.

Decision one: study decision problems. A decision problem is a problem whose answer is always yes or no. "Is this number prime?" is a decision problem. "Is there a route through these cities of cost at most \$500?" is a decision problem. "Find the *cheapest* route," by contrast, is an *optimization* problem — its answer is a tour, not a yes/no. Complexity theory phrases its core results in terms of decision problems because yes/no is the simplest possible output, which makes the theory uniform. This is not a real loss: as you saw in [Chapter 30](../../part-05-graph-theory/chapter-30-euler-hamilton-tsp/index.md), an optimization problem and its decision version "stand or fall together" — if you can answer "is there a tour of cost $\le B$?" quickly for every budget $B$, you can binary-search $B$ to find the optimal cost quickly. So a fast algorithm for the decision version yields a fast algorithm for the optimization version, and a hardness result for the decision version dooms the optimization version too.

💡 Intuition: Think of a decision problem as a giant, possibly infinite, set of "yes" instances. "PRIMES" is the set of all prime numbers; an algorithm "solves" it if, given any number, it correctly decides whether that number is in the set. Recognizing a set of strings — exactly the framing of the recognition problem from Chapter 36 — is what every decision problem is, underneath.

Decision two: measure cost as a function of input size, in the worst case. We carry over Chapter 14 wholesale: input size $n$ is the number of bits (or symbols) needed to write the instance down, and a problem's difficulty is the worst-case running time of the best algorithm that solves it, expressed asymptotically.

With those two decisions, we can define the central object of the chapter.

Definition (complexity class). A complexity class is a set of decision problems that can all be solved within some shared bound on a computational resource (usually time or memory) on a specified model of computation (a Turing machine, from Chapter 36).

A complexity class is, quite literally, a club whose membership requirement is "you can be solved within this budget." The two clubs this chapter cares about are defined by a time budget, and the budget that matters is the tractable/intractable line you met in Chapters 14 and 30: polynomial time.

Why polynomial, and not, say, $O(n^2)$ exactly? Because polynomial time is the one notion of "efficient" that is robust — it doesn't change when you switch reasonable details of the model.

🚪 Threshold Concept: polynomial time is model-independent. Here is an idea that quietly reshapes how you think about computation. If a problem is solvable in polynomial time on one reasonable model of computation — a Turing machine, a real laptop, a Python interpreter, a RAM machine — it is solvable in polynomial time on all of them. The reason is that any of these models can simulate any other with only a polynomial blowup in steps (a Turing machine simulating a RAM machine pays a polynomial overhead, never an exponential one). So "polynomial" is not an artifact of which computer you picked; it is an intrinsic, portable property of the problem. This is the deep reason complexity theory can talk about "P" without specifying your hardware, and it is sometimes called the (strong) Church–Turing thesis — a refinement of the Church–Turing thesis from Chapter 36, which already told you that what is computable doesn't depend on the model. Now we add: what is efficiently computable doesn't either. Once you internalize this, "is there a polynomial-time algorithm?" becomes a question about the problem's nature, not your toolbox.

This robustness is exactly why theorists chose polynomial time as the dividing line, despite its imperfections (an $O(n^{100})$ algorithm is "polynomial" but useless; we'll address that honestly in 37.6). The line is drawn where it is defensible, and almost every naturally occurring polynomial algorithm turns out to have a small exponent anyway.

🔄 Check Your Understanding 1. Why does complexity theory focus on decision (yes/no) problems instead of optimization problems? What is the bridge between them? 2. What does it mean to say polynomial time is "model-independent," and why does that justify choosing it as the definition of "efficient"? 3. Is "sort this list" a decision problem? If not, can you phrase a related decision problem?

Answers

Yes/no is the simplest output, making the theory uniform across all problems. The bridge is that an optimization problem and its decision version ("is there a solution of value $\le B$?") are equivalent in tractability — binary search over the budget $B$ converts a fast decision algorithm into a fast optimization algorithm, and vice versa. 2. It means any problem solvable in polynomial time on one reasonable model is solvable in polynomial time on every other, because the models simulate each other with only polynomial overhead. That makes "polynomial" an intrinsic property of the problem, not the machine — so it is a defensible, portable definition of "efficient." 3. "Sort this list" is not a decision problem (its output is a reordered list, not yes/no). A related decision problem: "Is this list already sorted?" or "Is element $x$ in position $k$ of the sorted order?"

37.2 P and NP

Now the two clubs. The first is the home of every algorithm you'd be happy to ship.

Definition (the class P). P (for polynomial time) is the class of decision problems solvable by an algorithm whose worst-case running time is $O(n^k)$ for some constant $k$, where $n$ is the input size.

P is precisely the set of decision problems Chapter 14 called tractable. Sorting-related decisions, graph connectivity, shortest path (Dijkstra, Chapter 29), whether a graph has an Euler circuit (Chapter 30 — just count odd-degree vertices), and even — a famous and hard-won result — testing whether a number is prime, are all in P. When you say "I have an efficient algorithm," you are claiming the problem is in P.

💡 Intuition: P is the class of problems you can solve by yourself in a reasonable amount of time, where "reasonable" scales gracefully with input size. If a problem is in P, throwing a bigger input at it costs you a constant factor more work, not a multiplicative explosion.

The second club is subtler, and getting it right is the heart of the chapter. NP is not "not polynomial." (That misreading has confused generations of students.) NP stands for nondeterministic polynomial time, and the cleanest way to understand it is through verification.

Consider the decision version of TSP from Chapter 30: "Given cities, costs, and a budget $B$, is there a tour of total cost $\le B$?" Nobody knows how to answer this quickly. But suppose a colleague hands you a specific tour and claims it works. How long does it take you to check their claim? You walk the tour, add up the $n$ edge costs, and compare the sum to $B$ — that's $O(n)$ work. Verifying a proposed solution is trivially fast, even though finding one seems to require searching through $(n-1)!/2$ tours. That gap — easy to check, (apparently) hard to find — is the defining feature of NP.

Definition (the class NP). NP (for nondeterministic polynomial time) is the class of decision problems for which a yes answer can be verified in polynomial time, given a suitable piece of evidence. Formally: a problem is in NP if there is a polynomial-time algorithm $V$ (a verifier) and a polynomial bound, such that for every yes-instance $x$ there exists a string $c$ (a certificate or witness), of length polynomial in $\lvert x \rvert$, with $V(x, c) = \text{yes}$; and for every no-instance, $V(x, c) = \text{no}$ for every candidate $c$.

Unpack that. The certificate is the "proposed solution" — the tour, the satisfying assignment, the actual route. The verifier is the fast checker. The definition says: a problem is in NP exactly when yes-instances have short, quickly-checkable proofs, and no-instances have no convincing proof at all (nothing can fool the verifier).

💡 Intuition: NP is the class of problems whose solutions you can recognize when you see them, even if you can't produce them yourself. You might not be able to compose a brilliant proof, but you can grade one. You might not find the needle, but if someone points to it you can confirm it's a needle. NP is the "I'll know it when I see it" class. The certificate is the pointing finger; the verifier is your confirmation.

Three worked memberships make this concrete.

TSP (decision) is in NP. Certificate: a tour (an ordering of the cities). Verifier: sum the $n$ edge costs along the tour and check the total is $\le B$, and check the tour visits each city once — all $O(n)$. A genuine yes-instance has a good tour (the certificate exists); a no-instance has no tour under budget, so no certificate can pass. ✓

Hamilton circuit is in NP. Certificate: a proposed circuit (an ordering of the vertices). Verifier: check that consecutive vertices are joined by edges and that each vertex appears exactly once — $O(V + E)$. ✓

Graph coloring (decision): "can $G$ be colored with $k$ colors so no edge is monochromatic?" is in NP. Certificate: an assignment of a color to each vertex. Verifier: scan every edge and confirm its two endpoints differ in color — $O(V + E)$. ✓ (You met chromatic number in Chapter 33; here is its complexity home.)

Now, two facts about how P and NP relate, one easy and one the whole point of the chapter.

Easy fact: $\text{P} \subseteq \text{NP}$. Every problem you can solve fast, you can also verify fast — just ignore the certificate and solve it from scratch. If you can compute the answer in polynomial time, you don't even need the hint.

The strategy first. To show P sits inside NP, we take an arbitrary problem in P and build a verifier for it, proving it's in NP. The trick is that a P problem already has a fast solver, so the verifier can simply re-solve the instance and ignore whatever certificate it's handed. The certificate is dead weight; the solver does all the work within the polynomial budget.

Proof that $\text{P} \subseteq \text{NP}$. Let $A$ be any problem in P, solved by a polynomial-time algorithm $M$. Define a verifier $V(x, c)$ that ignores $c$ entirely and simply returns $M(x)$. Then for any yes-instance $x$, every certificate $c$ (say, the empty string) satisfies $V(x, c) = M(x) = \text{yes}$, so a valid short certificate exists. For any no-instance $x$, $V(x, c) = M(x) = \text{no}$ for every $c$, so nothing fools the verifier. Since $M$ runs in polynomial time, so does $V$. Hence $A \in \text{NP}$. As $A$ was arbitrary, $\text{P} \subseteq \text{NP}$. $\blacksquare$

The hard fact (nobody knows): is $\text{P} = \text{NP}$ or $\text{P} \subsetneq \text{NP}$? Are the two clubs actually the same club — is everything checkable also solvable? Or is NP strictly bigger — are there problems whose solutions you can recognize but never efficiently find? This is the P vs. NP problem, and after more than fifty years of effort it is unresolved. We'll devote 37.6 to what hangs on it. For now, hold the picture: P inside NP, and a question mark over whether the inclusion is strict.

⚠️ Common Pitfall: NP does not mean "not polynomial." NP means nondeterministic polynomial — equivalently, "verifiable in polynomial time." Every problem in P is also in NP (we just proved it). So "this problem is in NP" says nothing bad about it; sorting is in NP. The hard problems are the ones in NP that are not known to be in P — and the worst of those are the NP-complete problems of 37.4. Saying "NP problems are the unsolvable ones" is doubly wrong: they're solvable (by exponential search), and many are downright easy.

🔄 Check Your Understanding 1. What does the "N" in NP stand for, and what is the common misreading? 2. Give the certificate and the verifier for the decision problem "does this graph have a clique of size $k$?" (a clique is a set of vertices all pairwise adjacent). 3. Explain in one sentence why $\text{P} \subseteq \text{NP}$.

Answers

"N" stands for nondeterministic (NP = nondeterministic polynomial time, equivalently polynomial-time verifiable). The common misreading is "non-polynomial," which is wrong — P is contained in NP. 2. Certificate: a set $S$ of $k$ vertices. Verifier: check $\lvert S \rvert = k$ and that every pair of vertices in $S$ is joined by an edge — $O(k^2)$, polynomial. A yes-instance has such a set; a no-instance has none, so no certificate passes. 3. Any problem you can solve in polynomial time you can also verify in polynomial time — the verifier just ignores the certificate and re-solves the instance.

37.3 Verification versus solution: the heart of the matter

The P vs. NP question is easy to state once you see it as a question about two human activities you already know are different: solving a problem and checking an answer.

Think about your own experience. Composing a proof from scratch is hard; reading someone's proof and confirming each step is much easier. Finding the factors of a 200-digit number is (apparently) infeasible; checking that two given 100-digit numbers multiply to it is a quick multiplication. Solving a hard Sudoku takes real work; verifying a completed grid is a glance down each row, column, and box. In every case, verification is easier than discovery.

P vs. NP asks whether that asymmetry is real or merely apparent. P is "problems you can solve fast." NP is "problems you can check fast." The question $\text{P} \stackrel{?}{=} \text{NP}$ is exactly:

Is being able to recognize a solution the same as being able to find one?

Almost everyone believes the answer is no — that $\text{P} \ne \text{NP}$, that checking really is fundamentally easier than solving. It matches all of human experience: appreciating a symphony is not composing one, and recognizing a beautiful theorem is not proving it. But believing is not knowing, and no one has been able to prove the asymmetry is genuine. It is conceivable — wildly counterintuitive, but conceivable — that every problem whose solutions we can check, we could also solve fast, if only we found the right algorithm.

🚪 Threshold Concept: the verify/find gap is the deepest open question in computer science. Up to now, "hard problem" has meant "I don't have a fast algorithm." This section reframes hardness as a question about the structure of problems themselves: is there a whole class of problems where finding is provably harder than checking? If $\text{P} = \text{NP}$, then for every problem whose answers can be efficiently verified, there is an efficient way to find those answers — mathematical creativity, in a sense, would be automatable, because finding a proof would be no harder than checking one. If $\text{P} \ne \text{NP}$, then there is a permanent, unbridgeable gap between recognizing and discovering — some things are genuinely easier to appreciate than to create. Either way, the answer reshapes our understanding of what computation, and arguably thought, can do. This is why the epigraph says the world would be "a profoundly different place" if $\text{P} = \text{NP}$.

To make the asymmetry vivid, here is a tiny program that does the checking half for one NP problem — Boolean satisfiability, which we'll formalize in 37.4. The point is how short and fast the verifier is, in stark contrast to the search that finding a satisfying assignment would require.

def verify_sat(clauses, assignment):
    """clauses: list of clauses; each clause is a list of (var, is_positive).
    assignment: dict var -> bool. Returns True iff the assignment satisfies
    EVERY clause (a clause is satisfied if at least one of its literals is True)."""
    for clause in clauses:
        satisfied = any(assignment[v] == is_pos for (v, is_pos) in clause)
        if not satisfied:
            return False          # this clause is unsatisfied -> whole formula fails
    return True

# Formula: (x1 OR NOT x2) AND (x2 OR x3)
f = [[("x1", True), ("x2", False)], [("x2", True), ("x3", True)]]
print(verify_sat(f, {"x1": True, "x2": False, "x3": False}))   # clause 2 fails
print(verify_sat(f, {"x1": True, "x2": True,  "x3": False}))   # both satisfied
# Expected output:
# False
# True

Trace it by hand — reasoning out the answer rather than running it is the discipline this book insists on. For the first call, assignment {x1: True, x2: False, x3: False}: clause 1 is $x_1 \lor \neg x_2$, and $x_1 = \text{True}$ makes the literal $x_1$ true, so clause 1 is satisfied; clause 2 is $x_2 \lor x_3$, but the literal $x_2$ needs $x_2 = \text{True}$ (false here) and the literal $x_3$ needs $x_3 = \text{True}$ (also false), so clause 2 is unsatisfied and the whole formula fails → False. For the second call, {x1: True, x2: True, x3: False}: clause 1 is satisfied by $x_1$ as before, and clause 2 is now satisfied because $x_2 = \text{True}$ makes the literal $x_2$ true → True.

The verifier runs in time proportional to the total size of the formula — fast. Finding a satisfying assignment, by contrast, means searching up to $2^{(\text{number of variables})}$ possible assignments, with no known shortcut. That is the verify/find gap, in code.

💡 Intuition: The certificate (here, the assignment) turns an existential question — "does there exist a satisfying assignment?" — into a checkable claim — "here is one, confirm it." The existential quantifier is what makes finding hard; once a witness is supplied, the quantifier is discharged and only checking remains. This is exactly the $\exists$ from Chapter 2, now wearing a complexity-theoretic cost.

🔄 Check Your Understanding 1. In one sentence, what does the question "$\text{P} = \text{NP}$?" ask about solving versus checking? 2. For factoring, what plays the role of the certificate, and why is checking it fast while finding it is hard? 3. Why does the existential quantifier ("there exists a solution") sit at the core of what makes NP problems hard to solve but easy to verify?

Answers

It asks whether every problem whose solutions can be checked in polynomial time can also be solved in polynomial time — i.e., whether finding is as easy as recognizing. 2. The certificate is a nontrivial factor (or the pair of factors). Checking is one multiplication (polynomial); finding requires searching for a factor among exponentially many candidates, with no known efficient method. 3. "There exists a solution" is the question we must answer; finding requires searching the (exponentially large) space the quantifier ranges over, but once a specific witness is named, the quantifier is satisfied by exhibition and we only have to verify that one witness — a polynomial check.

37.4 NP-completeness and the Cook–Levin theorem

We now have a class, NP, full of "checkable but apparently hard" problems. The breakthrough idea — the one that turned a vague feeling into a science — is that some problems in NP are the hardest in the whole class, in a precise sense: if you could solve any one of them efficiently, you could solve every problem in NP efficiently. These are the NP-complete problems, and understanding them requires one tool we'll build first: the reduction.

Polynomial-time reductions

A reduction is the formalization of "problem A is no harder than problem B." The idea is one you use informally all the time: to solve a new problem, transform it into an old problem you already know how to solve.

Definition (polynomial-time reduction). A polynomial-time (many-one) reduction from decision problem $A$ to decision problem $B$, written $A \le_p B$, is a function $f$ computable in polynomial time that maps instances of $A$ to instances of $B$ such that for every instance $x$, $$x \text{ is a yes-instance of } A \iff f(x) \text{ is a yes-instance of } B.$$ We say "$A$ reduces to $B$" or "$A$ is polynomial-time reducible to $B$."

Read the direction carefully, because students reverse it constantly. $A \le_p B$ means $A$ is no harder than $B$ ($B$ is at least as hard as $A$). Why? Because a fast algorithm for $B$ gives you a fast algorithm for $A$: to decide an instance $x$ of $A$, compute $f(x)$ (polynomial time) and run $B$'s algorithm on it (polynomial time); the answer for $f(x)$ is the answer for $x$. So solving $B$ solves $A$. The hardness flows backward along the arrow: if $A$ is hard, then $B$ must be hard too (otherwise $B$'s fast algorithm would make $A$ fast).

💡 Intuition: $A \le_p B$ is "I can rephrase any A-question as a B-question." If you have a B-oracle, you can answer A-questions by translation. So B is at least as powerful/hard as A. The mnemonic: the symbol $\le$ points the right way — the thing on the left is the easier (or equal) one. Reductions are how you transfer difficulty: prove one problem hard, and every problem that reduces to it... no — every problem that it reduces to inherits the hardness. (Re-read until the direction is automatic; it is the single most error-prone point in the chapter.)

🔗 Connection. You have already seen reductions, informally. In Chapter 30 we noted that decision-TSP and optimization-TSP reduce to each other (binary-search the budget). In Chapter 36, reductions are how undecidability spreads: if the halting problem reduces to your problem, your problem is undecidable too. Complexity-theoretic reductions are the same idea with a resource bound — instead of "computable," the translation must be polynomial-time computable, because we care not just about whether we can transform but whether we can do it cheaply enough to preserve efficiency. Same tool, sharper constraint.

Reductions are transitive, which is what lets us build long chains of hardness.

The strategy first. To show $\le_p$ is transitive, suppose $A \le_p B$ via $f$ and $B \le_p C$ via $g$. The natural candidate reduction from $A$ to $C$ is the composition $g \circ f$. We must check two things: that it preserves yes/no answers (chase the iff through both reductions), and that it runs in polynomial time (the composition of two polynomial maps is polynomial — but we must be slightly careful about how big $f(x)$ can get).

Proof that $A \le_p B$ and $B \le_p C$ imply $A \le_p C$. Let $f$ reduce $A$ to $B$ in time $\le p(n)$ for a polynomial $p$, and $g$ reduce $B$ to $C$ in time $\le q(n)$ for a polynomial $q$. Define $h(x) = g(f(x))$. Correctness: $x$ is a yes-instance of $A$ $\iff$ $f(x)$ is a yes-instance of $B$ (by $f$) $\iff$ $g(f(x))$ is a yes-instance of $C$ (by $g$). So $h$ preserves yes/no answers. Running time: computing $f(x)$ takes $\le p(\lvert x \rvert)$ steps, and crucially the output $f(x)$ has length $\le p(\lvert x \rvert)$ (an algorithm can't write more output than its running time allows). Then computing $g(f(x))$ takes $\le q(\lvert f(x)\rvert) \le q(p(\lvert x \rvert))$ steps. The total, $p(\lvert x \rvert) + q(p(\lvert x \rvert))$, is a polynomial in $\lvert x \rvert$ (a polynomial of a polynomial is a polynomial). Hence $h$ is a polynomial-time reduction, so $A \le_p C$. $\blacksquare$

NP-hard and NP-complete

With reductions in hand, "the hardest problems in NP" becomes precise. We give the formal definitions that Chapter 30 promised — there we could only preview "NP-hard"; here is the real thing.

Definition (NP-hard). A problem $H$ is NP-hard if every problem $A$ in NP reduces to it in polynomial time: $A \le_p H$ for all $A \in \text{NP}$.

An NP-hard problem is at least as hard as everything in NP. Note that $H$ itself need not be in NP — it could be even harder, or not a decision problem at all (the optimization TSP is NP-hard but, being an optimization problem, isn't in NP). NP-hard is a lower bound on difficulty: "this is as hard as the hardest of NP, or harder."

Definition (NP-complete). A problem is NP-complete if it is both in NP and NP-hard.

NP-complete problems are the hardest problems within NP — the ones that are simultaneously checkable-fast (in NP) and universally-hard (NP-hard). They are the crown jewels of the theory, and they have a spectacular property:

The pivotal consequence. If any single NP-complete problem is in P, then $\text{P} = \text{NP}$ (every NP problem becomes polynomial). Conversely, if $\text{P} \ne \text{NP}$, then no NP-complete problem is in P — they are all permanently intractable. The thousands of known NP-complete problems thus stand or fall together: solve one efficiently and you've solved them all; prove one truly hard and you've doomed them all.

The strategy first. Why does one fast NP-complete algorithm collapse the whole class? Suppose an NP-complete problem $C$ has a polynomial algorithm. Take any problem $A \in \text{NP}$. Because $C$ is NP-hard, $A \le_p C$ — there's a polynomial translation from $A$ to $C$. Chain them: translate the A-instance to a C-instance (polynomial), then solve it with $C$'s fast algorithm (polynomial). That's a polynomial algorithm for an arbitrary NP problem, so all of NP collapses into P.

Proof that an NP-complete problem in P forces $\text{P} = \text{NP}$. Suppose $C$ is NP-complete and $C \in \text{P}$, solved by a polynomial-time algorithm $M_C$. Let $A$ be any problem in NP. Since $C$ is NP-hard, there is a polynomial-time reduction $f$ with $x \in A \iff f(x) \in C$. Build an algorithm for $A$: on input $x$, compute $f(x)$ (polynomial time) and return $M_C(f(x))$ (polynomial time, since $\lvert f(x)\rvert$ is polynomial in $\lvert x\rvert$). This correctly decides $A$ and runs in polynomial time, so $A \in \text{P}$. As $A$ was an arbitrary NP problem, $\text{NP} \subseteq \text{P}$; combined with $\text{P} \subseteq \text{NP}$ (37.2), $\text{P} = \text{NP}$. $\blacksquare$

This is beautiful but it has a chicken-and-egg problem. The definition of NP-complete requires showing that every NP problem reduces to your candidate — infinitely many problems, including ones nobody has thought of yet. How could you ever prove a first problem NP-complete from scratch? That is exactly the barrier the Cook–Levin theorem smashed.

The Cook–Levin theorem: the keystone

SAT (the Boolean satisfiability problem). A Boolean formula is satisfiable if there is an assignment of true/false to its variables that makes the whole formula true. SAT is the decision problem: given a Boolean formula, is it satisfiable?

You met satisfiability as far back as Chapter 1 (it was flagged there as "the first hard problem"). SAT is obviously in NP: a satisfying assignment is a certificate, and checking it is the verify_sat function from 37.3 — linear in the formula size. The astonishing fact is that SAT is also NP-hard, and therefore NP-complete.

The Cook–Levin Theorem (1971). SAT is NP-complete. That is, SAT is in NP, and every problem in NP reduces to SAT in polynomial time.

The strategy (this is a sketch, not a full proof). Take any problem $A \in \text{NP}$. By definition it has a polynomial-time verifier $V$ and, on a yes-instance, a polynomial-size certificate $c$ with $V(x, c) = \text{yes}$. The genius of Cook and Levin was to realize that a computation of $V$ is a mechanical, local process — at each step the machine's state depends only on a few nearby cells — and any such bounded computation can be encoded as a giant Boolean formula. The formula has variables describing the machine's configuration at every time step, and clauses asserting "the start is set up correctly," "each step follows the machine's rules from the previous one," and "the machine ends in an accepting state." This formula is satisfiable if and only if there is a certificate $c$ making $V$ accept $x$ — i.e., iff $x$ is a yes-instance of $A$. The formula's size is polynomial in $\lvert x \rvert$ (because $V$ runs in polynomial time, so there are only polynomially many time steps and cells to describe), and it can be written down in polynomial time. That construction is a polynomial-time reduction from $A$ to SAT. Since $A$ was arbitrary, every NP problem reduces to SAT.

We won't carry out the full encoding — it is intricate, and Sipser's textbook gives it in loving detail — but absorb what it accomplishes. Cook and Levin proved a single problem NP-complete directly, by reducing the abstract definition of "any NP computation" into SAT. That one foothold changes everything: to prove a new problem $B$ is NP-complete, you no longer need to wrestle with all of NP. You just (1) show $B \in \text{NP}$, and (2) reduce one already-known NP-complete problem (like SAT) to $B$. By transitivity of $\le_p$, if every NP problem reduces to SAT and SAT reduces to $B$, then every NP problem reduces to $B$ — so $B$ is NP-hard, and with (1), NP-complete.

🚪 Threshold Concept: NP-completeness ties thousands of problems into one knot. Before Cook–Levin, each hard problem was its own private frustration. After it, an avalanche: within a few years, Richard Karp showed 21 classic problems NP-complete by reduction, and today the list runs to thousands across logistics, scheduling, biology, circuit design, and more. The transformative idea is that these are not many independent hard problems — they are one hardness wearing thousands of costumes. A polynomial algorithm for any of them (graph coloring, TSP, SAT, protein folding's discrete models, ...) would instantly solve all of them and prove $\text{P} = \text{NP}$. This is why, when you discover your problem is NP-complete, you can stop feeling personally inadequate: you've hit a wall that the entire field has failed to breach, and that failure is itself strong evidence the wall is real.

🐛 Find the Error. A student writes: "I proved my problem $B$ is NP-complete by reducing $B$ to SAT ($B \le_p \text{SAT}$). Since SAT is NP-complete, $B$ is too." What's wrong?

Answer

The reduction is backwards. $B \le_p \text{SAT}$ means $B$ is no harder than SAT — which only shows $B$ is in NP-ish territory (everything in NP reduces to SAT anyway, so this is nearly free and proves nothing about $B$'s hardness). To prove $B$ is NP-hard, the student must reduce a known NP-complete problem to $B$: $\text{SAT} \le_p B$ (or $\text{3-SAT} \le_p B$, etc.). The direction is the whole point: hardness flows from the known-hard problem into the new one, so the known-hard problem must be the source of the reduction. Plus, they still need to separately show $B \in \text{NP}$. This reversed-direction mistake is the most common error in all of complexity theory.

🔄 Check Your Understanding 1. State the meaning of $A \le_p B$ in plain English, and say which of $A, B$ is "at least as hard." 2. What two things must you prove to establish that a problem $B$ is NP-complete, and in which direction must the reduction go? 3. What did the Cook–Levin theorem make possible that was impossible before it?

Answers

$A \le_p B$ means every instance of $A$ can be transformed in polynomial time into an equivalent instance of $B$ (same yes/no answer), so a fast algorithm for $B$ yields a fast algorithm for $A$. Therefore $B$ is at least as hard as $A$. 2. (i) $B \in \text{NP}$ (give a polynomial-time verifier / short certificate), and (ii) $B$ is NP-hard, proved by reducing a known NP-complete problem to $B$ (e.g., $\text{SAT} \le_p B$) — the known-hard problem is the source. 3. It proved a first problem (SAT) NP-complete directly from the definition of NP, giving a foothold; afterward, any new problem can be proven NP-complete by a single reduction from an already-known NP-complete problem (using transitivity), instead of reasoning about all of NP from scratch.

37.5 Reductions in action and the NP-complete zoo

Cook–Levin gives us SAT. Everything else in the "zoo" of NP-complete problems is reached by a chain of reductions starting from SAT. Let's walk one important reduction in full, then survey the zoo and the web of reductions connecting it.

From SAT to 3-SAT (restricting the form)

It's often easier to reduce from a problem with a tidy, restricted shape. The first link tidies SAT into 3-SAT.

3-SAT. A formula is in conjunctive normal form (CNF) if it is an AND of clauses, each clause an OR of literals (a literal is a variable or its negation). 3-SAT is SAT restricted to CNF formulas with exactly three literals per clause, e.g. $(x_1 \lor \neg x_2 \lor x_3) \land (\neg x_1 \lor x_2 \lor x_4)$.

Remarkably, this restriction doesn't make the problem easier: 3-SAT is still NP-complete (every general SAT formula can be rewritten, in polynomial time, as a 3-CNF formula that is satisfiable iff the original is, by introducing auxiliary variables to break long clauses apart). 3-SAT is the workhorse source for most other reductions, because three literals per clause is just enough structure to build "gadgets" out of. (By contrast, 2-SAT — two literals per clause — is in P, solvable in linear time. The jump from 2 to 3 literals is, astonishingly, the jump from tractable to NP-complete. Hold that for the spaced review.)

From 3-SAT to CLIQUE (a full gadget reduction)

Now the showpiece: a reduction proving the CLIQUE problem NP-hard.

CLIQUE. A clique in a graph is a set of vertices that are all pairwise adjacent (a complete subgraph). CLIQUE is the decision problem: given a graph $G$ and integer $k$, does $G$ contain a clique of size $k$?

CLIQUE is in NP (certificate: the $k$ vertices; verifier: check all $\binom{k}{2}$ pairs are edges). To show it's NP-hard, we reduce 3-SAT to it: $\text{3-SAT} \le_p \text{CLIQUE}$.

The strategy first. Given a 3-CNF formula with $m$ clauses, we must build a graph (and a number $k$) so that the graph has a $k$-clique exactly when the formula is satisfiable. The idea: make one vertex per literal-occurrence, grouped by clause ($m$ groups of 3). Choosing a vertex from a group means "make this literal true to satisfy this clause." We connect two vertices iff they are in different clauses and are not contradictory (we never connect $x_i$ to $\neg x_i$). A clique then corresponds to a consistent choice of one satisfying literal per clause — and we set $k = m$.

The reduction. Let $\phi = C_1 \land C_2 \land \dots \land C_m$ be a 3-CNF formula, with clause $C_j = (\ell_{j,1} \lor \ell_{j,2} \lor \ell_{j,3})$. Construct a graph $G$:

Vertices: one vertex for each literal occurrence, so $3m$ vertices, labeled $(j, t)$ for clause $j$ and position $t \in \{1,2,3\}$. Think of the vertices as $m$ "columns" of 3.
Edges: connect $(j, t)$ to $(j', t')$ if and only if (a) they are in different clauses ($j \ne j'$), and (b) their literals are not contradictory (they are not $x_i$ and $\neg x_i$ for the same variable $x_i$). No edges within a column.

Set $k = m$. We claim: $\phi$ is satisfiable $\iff$ $G$ has a clique of size $m$.

The strategy first (for the iff). A clique of size $m$ must pick one vertex from each of the $m$ columns (no two clique vertices can share a column, since columns have no internal edges, and there are only $m$ columns to supply $m$ pairwise-adjacent vertices). Picking one literal per clause that are all mutually non-contradictory is exactly a consistent way to satisfy every clause. We prove both directions by turning a satisfying assignment into such a selection and back.

Proof of the iff. ($\Rightarrow$) Suppose $\phi$ is satisfiable, via assignment $\alpha$. Every clause $C_j$ has at least one literal made true by $\alpha$; pick one such true literal per clause, giving one vertex $v_j$ per column. Any two chosen vertices $v_j, v_{j'}$ ($j \ne j'$) are in different clauses, and they cannot be contradictory: both are true under the single assignment $\alpha$, and $x_i, \neg x_i$ cannot both be true. So every pair is connected — the $m$ chosen vertices form a clique of size $m$.

($\Leftarrow$) Suppose $G$ has a clique of size $m$. Since there are no edges within a column and the clique has $m$ pairwise-adjacent vertices, the clique contains exactly one vertex from each column (it can't take two from one column, and $m$ vertices across $m$ columns with no column repeated forces one per column). These vertices' literals are pairwise non-contradictory (they're adjacent, so condition (b) held), so we can consistently assign each variable to make all these literals true (set any variable not mentioned arbitrarily). Under this assignment, each clause $C_j$ has its chosen literal true, so every clause is satisfied — $\phi$ is satisfiable.

Finally, the construction is polynomial: $G$ has $3m$ vertices and at most $\binom{3m}{2}$ edges, all computable by examining pairs of literals in time polynomial in the formula's size. Hence $\text{3-SAT} \le_p \text{CLIQUE}$, and since CLIQUE is in NP, CLIQUE is NP-complete. $\blacksquare$

💡 Intuition: The reduction encodes logic as geometry. "Satisfy every clause consistently" becomes "find a fully-connected set, one per column." The cleverness is entirely in the edge rule: forbidding within-column edges forces one-per-clause, and forbidding contradictory edges forces consistency. Every NP-completeness reduction is a gadget like this — a translation of one problem's constraints into another problem's structure. Inventing the gadget is the art; verifying the iff is the craft.

Here's a compact illustration of the construction (not a solver — building the graph is the polynomial part):

def three_sat_to_clique(clauses):
    """clauses: list of 3 literals each; literal = (var, is_positive).
    Returns (vertices, edges, k) for the CLIQUE instance. Building, not solving."""
    vertices = [(j, t) for j in range(len(clauses)) for t in range(3)]
    edges = set()
    for (j, t) in vertices:
        for (j2, t2) in vertices:
            if j < j2:                                   # different clauses, no dup pairs
                v1, p1 = clauses[j][t]
                v2, p2 = clauses[j2][t2]
                if not (v1 == v2 and p1 != p2):          # not contradictory
                    edges.add(((j, t), (j2, t2)))
    return vertices, edges, len(clauses)                 # k = number of clauses

phi = [[("a", True), ("b", True), ("c", True)],
       [("a", False), ("b", True), ("c", False)]]
V, E, k = three_sat_to_clique(phi)
print(len(V), len(E), k)
# Expected output:
# 6 7 2

Hand-check the counts, because the linear-time construction is easy to get subtly wrong. There are $2$ clauses $\times\ 3 = 6$ vertices, so $\lvert V \rvert = 6$. For the edges, every one of the $3 \times 3 = 9$ cross-clause pairs is connected unless contradictory. A pair is contradictory exactly when the same variable appears with opposite signs across the two columns. Column 0's literals are $a, b, c$ (all positive); column 1's are $\neg a, b, \neg c$. Check each variable: $a$ is positive in column 0 and negative in column 1 → forbidden; $b$ is positive in both → fine; $c$ is positive in column 0 and negative in column 1 → forbidden. So there are exactly two forbidden pairs, $\{a, \neg a\}$ and $\{c, \neg c\}$, giving $\lvert E \rvert = 9 - 2 = 7$. And $k = m = 2$. Hence the output 6 7 2.

A clique of size $k = 2$ here picks one true literal from each clause consistently — e.g. vertex $b$ in clause 0 and vertex $b$ in clause 1, which are non-contradictory and adjacent, corresponding to the satisfying assignment $b = \text{True}$.

The zoo and the web of reductions

From SAT and 3-SAT, reductions fan out to a vast family. A few you've already met or will recognize:

Problem	What it asks (decision form)	First shown NP-complete via
SAT	Is this Boolean formula satisfiable?	Cook–Levin (from all of NP)
3-SAT	Is this 3-CNF formula satisfiable?	$\le$ from SAT
CLIQUE	Is there a clique of size $k$?	$\le$ from 3-SAT (above)
VERTEX COVER	Is there a set of $k$ vertices touching every edge?	$\le$ from CLIQUE
INDEPENDENT SET	Is there a set of $k$ pairwise-non-adjacent vertices?	$\le$ from CLIQUE (complement graph)
HAMILTON CIRCUIT	Does the graph have a Hamilton circuit? (Chapter 30)	$\le$ from 3-SAT (via gadgets)
TSP (decision)	Tour of cost $\le B$? (Chapter 30)	$\le$ from Hamilton circuit
SUBSET SUM	Does some subset sum to target $t$?	$\le$ from 3-SAT
GRAPH 3-COLORING	Can $G$ be colored with 3 colors? (Chapter 33)	$\le$ from 3-SAT

🔗 Connection. Notice how this chapter synthesizes the whole book. SAT is the logic of Chapter 1 made computational. CLIQUE, VERTEX COVER, INDEPENDENT SET, Hamilton, and 3-coloring are the graph theory of Part V. SUBSET SUM is counting and number theory. The reductions are the functions of Chapter 9 with a polynomial-time budget. Complexity theory is where logic, graphs, counting, and algorithms all meet — which is exactly why it sits in the synthesis part of the book. The "easy/hard boundary" you first felt with Euler vs. Hamilton in Chapter 30 is now a precise map, and TSP's informal "NP-hard" label from that chapter is now a theorem reached by the chain $\text{3-SAT} \le_p \text{Hamilton} \le_p \text{TSP}$.

💡 Intuition: the reduction web is what makes belief in $\text{P} \ne \text{NP}$ so strong. Thousands of brilliant people, across decades and dozens of fields, have tried to find a fast algorithm for some NP-complete problem — any one would crack them all. Every attempt has failed. That collective failure, spread across problems that look completely unrelated on the surface but are secretly the same problem, is the empirical case that no fast algorithm exists. It's not a proof, but it's why almost every expert bets on $\text{P} \ne \text{NP}$.

🔄 Check Your Understanding 1. To prove a new problem $B$ NP-complete, why is it usually easier to reduce from 3-SAT than from general SAT? 2. In the 3-SAT $\le_p$ CLIQUE reduction, what does choosing one vertex per "column" correspond to, and why are within-column edges forbidden? 3. 2-SAT is in P but 3-SAT is NP-complete. What does this tell you about how sensitive complexity can be to a problem's parameters?

Answers

3-SAT has a rigid, uniform structure (CNF, exactly three literals per clause), which gives reductions a predictable shape to build gadgets against; general SAT formulas can have arbitrary nesting, which is messier to translate. (And 3-SAT is still NP-complete, so reducing from it loses no hardness.) 2. Choosing one vertex per column = choosing one literal to satisfy in each clause. Within-column edges are forbidden so that a clique cannot take two vertices from the same clause — forcing it to spread across all $m$ clauses, i.e., to satisfy every clause. 3. Complexity can be extremely sensitive: a seemingly tiny change (two literals per clause vs. three) flips a problem from polynomial to NP-complete. Small parameter changes can cross the tractability boundary, so you cannot assume a "slightly harder-looking" variant is only slightly harder.

37.6 Coping with intractability, and the P vs. NP question

Suppose you've identified that your problem is NP-complete. What now? And what, exactly, is the great open question hanging over all of this?

Stating P vs. NP precisely

We can now state it with full precision, stripped of metaphor.

The P versus NP problem. Is $\text{P} = \text{NP}$? Equivalently (by the pivotal consequence of 37.4): does any NP-complete problem admit a polynomial-time algorithm? Equivalently: for every problem whose solutions can be verified in polynomial time, can a solution also be found in polynomial time?

This is one of the seven Millennium Prize Problems named by the Clay Mathematics Institute in 2000, each carrying a \$1,000,000 prize. It is widely regarded as the most important open problem in computer science and one of the deepest in all of mathematics. The consensus expectation is $\text{P} \ne \text{NP}$, but there is no proof in either direction — and proving it appears to require fundamentally new techniques, since several natural proof strategies have been shown (themselves by theorems) to be incapable of settling it.

The two worlds. If $\text{P} = \text{NP}$: thousands of currently-intractable problems (logistics, scheduling, chip design, protein structure prediction in its discrete forms, automated theorem proving) become efficiently solvable; much of modern cryptography — which relies on certain problems being hard — would collapse, because breaking a code is an NP problem (the key is a certificate). If $\text{P} \ne \text{NP}$: the intractable stays intractable forever, the hardness that secures cryptography is permanent, and there is a real, provable gap between creativity (finding) and judgment (checking).

🔗 Connection. Recall from Chapter 25 that RSA's security rests on factoring being hard, and Chapter 14 noted factoring is believed hard because the input size is the number of bits. Factoring is in NP (a factor is a certificate), but — interestingly — it is not known to be NP-complete; it may sit in a middle zone between P and NP-complete. So even a proof of $\text{P} \ne \text{NP}$ wouldn't directly guarantee RSA is secure, and a fast factoring algorithm wouldn't by itself prove $\text{P} = \text{NP}$. Complexity theory makes our cryptographic assumptions precise enough to worry about correctly — which is itself a service.

Chapter 30 gave you a "response menu" for the NP-hard TSP. Now we can see it as the general engineering response to any NP-complete problem. When you cannot have a fast, exact, general algorithm (because that would prove $\text{P} = \text{NP}$), you give up exactly one of those three adjectives:

Give up...	Strategy	Example	What you keep
"general"	exploit special structure of your inputs	2-SAT is poly; tree-structured instances; small parameters (FPT)	exact + fast, on a restricted class
"exact"	approximation / heuristics	Christofides ($\le \frac{3}{2}\times$ optimal, metric TSP, Ch. 30); nearest-neighbor; local search	fast + general, near-optimal
"fast"	smarter exponential search	branch-and-bound; Held–Karp DP ($O(n^2 2^n)$, Ch. 30); modern SAT solvers	exact + general, tolerable on real instances

Each row is a real, respectable engineering choice. Notice the last row's quiet triumph: modern SAT solvers. SAT is the canonical NP-complete problem, yet industrial SAT solvers routinely dispatch formulas with millions of variables in hardware verification, software analysis, and scheduling. How? Their worst case is still exponential (SAT is NP-complete; that can't change unless $\text{P} = \text{NP}$), but real-world instances are not worst-case — they have structure the solvers ruthlessly exploit. "NP-complete" is a statement about the worst case over all inputs; it does not forbid solving the inputs you actually have.

⚠️ Common Pitfall: "NP-complete" does not mean "give up." It means "no fast algorithm that is exact and general and works on every input — assuming $\text{P} \ne \text{NP}$." It does not mean your specific instances are unsolvable. The worst case may be astronomically rare for your data. Before despairing over an NP-complete problem, ask: are my inputs small? structured? Would a 99%-good answer do? Is there a great solver (like a SAT or ILP solver) I can phrase my problem for? Often the practical answer is yes. The label is a guide to which tools to reach for, not a tombstone.

💡 Intuition: Discovering your problem is NP-complete is good news in disguise. It tells you to stop hunting for the perfect fast algorithm (which would make you a millionaire and overturn computer science) and start choosing intelligently among approximation, structure-exploitation, and smart search. It converts an open-ended, possibly-hopeless search into a well-understood menu of trade-offs. The most expensive mistake is not knowing a problem is NP-complete and burning months trying to do the impossible.

🔄 Check Your Understanding 1. State the P vs. NP question in terms of an NP-complete problem having (or not having) a polynomial algorithm. 2. SAT is NP-complete, yet SAT solvers handle millions of variables in practice. Reconcile these two facts. 3. You've shown your problem is NP-complete. Name the three coping strategies and which single guarantee each one sacrifices.

Answers

$\text{P} = \text{NP}$ if and only if some (equivalently, every) NP-complete problem has a polynomial-time algorithm; the open question is whether such an algorithm exists. 2. "NP-complete" bounds the worst case over all possible inputs; real-world formulas are typically far from worst-case and carry exploitable structure (and SAT solvers are heavily engineered to exploit it), so they're solved fast in practice even though a pathological worst-case input would still blow up. 3. (i) Give up generality — exploit special structure of your inputs (keep exact + fast on a restricted class); (ii) give up exactness — use approximation/heuristics (keep fast + general, get near-optimal); (iii) give up speed — use smart exponential search like branch-and-bound or Held–Karp (keep exact + general, tolerable on real instances).

Project Checkpoint: a polynomial-time verifier for the Toolkit

This chapter's deepest idea is the verify/find gap — that checking a solution can be vastly easier than finding one. Chapter 30 deliberately stopped the Toolkit's graph algorithms at the tractable side (Euler, not Hamilton), because no efficient Hamilton/TSP solver is known. We honor that here: we will not add an efficient NP-complete solver (that would be a million-dollar bug report). Instead we add what NP is actually about — a polynomial-time verifier — making the chapter's central concept executable.

Add to graphs.py (the module from Chapters 27–34) a verifier for a proposed Hamilton circuit: the certificate-checker that puts the Hamilton circuit problem in NP.

# dmtoolkit/graphs.py  (add to the Chapter 27 Graph module)

def verify_hamilton_circuit(g, tour):
    """Verify (in polynomial time) that `tour` is a Hamilton circuit of g.
    tour: list of vertices, length |V|+1, starting and ending at the same vertex.
    Returns True iff tour visits every vertex exactly once and returns to start.
    This is the NP *verifier* for Hamilton circuit -- it CHECKS, it does not FIND."""
    verts = set(g.vertices())
    n = len(verts)
    if len(tour) != n + 1 or tour[0] != tour[-1]:
        return False                          # must be a closed walk of the right length
    visited = tour[:-1]                        # drop the repeated final vertex
    if set(visited) != verts or len(set(visited)) != n:
        return False                          # must hit every vertex exactly once
    for i in range(n):                         # every consecutive pair must be an edge
        if tour[i + 1] not in g.neighbors(tour[i]):
            return False
    return True

The whole check is $O(V + E)$: a constant number of passes over the tour, each pair an adjacency lookup. That polynomial bound is exactly the witness that Hamilton circuit is in NP. Here's the checkpoint test, on the square graph $1\!-\!2\!-\!3\!-\!4\!-\!1$ (a 4-cycle):

# project-checkpoint.py
g = Graph(directed=False)
for u, v in [(1, 2), (2, 3), (3, 4), (4, 1)]:
    g.add_edge(u, v)
print(verify_hamilton_circuit(g, [1, 2, 3, 4, 1]))   # a real Hamilton circuit
print(verify_hamilton_circuit(g, [1, 2, 4, 3, 1]))   # 2-4 is not an edge
# Expected output:
# True
# False

Hand-trace the first call: tour length is $5 = 4 + 1$ ✓, starts and ends at 1 ✓; visited = [1,2,3,4] hits all four vertices once ✓; consecutive pairs $(1,2),(2,3),(3,4),(4,1)$ are all edges ✓ → True. Second call: same length and vertex checks pass, but the pair $(2,4)$ — is 4 a neighbor of 2? The edges are $1\!-\!2, 2\!-\!3, 3\!-\!4, 4\!-\!1$, so 2's neighbors are $\{1, 3\}$; 4 is not among them → False.

Toolkit so far: logic.py (Ch.1–3), sets.py (Ch.8), relations.py (Ch.12–13), combinatorics.py (Ch.15–17), recurrences.py (Ch.18–19), probability.py (Ch.20–21), numbertheory.py / crypto.py (Ch.22–25), coding.py (Ch.26), complexity.py (Ch.14), and graphs.py (Ch.27–34), now with verify_hamilton_circuit — the NP verifier. The Toolkit deliberately contains verifiers and tractable solvers, never an efficient NP-complete solver, encoding this chapter's lesson in code: we can check a Hamilton circuit in polynomial time, but finding one is exactly the kind of problem P vs. NP is about. In the capstone (Chapter 39), the Sudoku-solver track (Track C) puts this full circle: Sudoku is NP-complete in general, you'll verify solutions in polynomial time, and you'll find them with constraint-propagation search that exploits structure — the coping menu, applied.

Summary

Complexity theory classifies problems by the resources needed to solve them, drawing the line at polynomial time — the robust, model-independent notion of "efficient."

The two central classes (decision problems):

Class	Definition	Plain English	Examples
P	solvable in $O(n^k)$ time	you can find the answer fast	sorting decisions, connectivity, shortest path, PRIMES, Euler circuit, 2-SAT
NP	a yes has a polynomial-size certificate verifiable in polynomial time	you can check a proposed answer fast	SAT, TSP (dec.), Hamilton circuit, CLIQUE, graph $k$-coloring

$\text{P} \subseteq \text{NP}$ (proved): solving fast implies checking fast (ignore the certificate, re-solve).
Whether $\text{P} = \text{NP}$ is open — the central question.
NP = nondeterministic polynomial = polynomial-time verifiable. It does not mean "not polynomial."

Reductions and completeness:

Term	Meaning
$A \le_p B$	polynomial-time map sending yes-instances to yes-instances; $B$ is at least as hard as $A$
NP-hard	every NP problem reduces to it ($A \le_p H$ for all $A \in \text{NP}$); need not be in NP
NP-complete	in NP and NP-hard — the hardest problems within NP

$\le_p$ is transitive (compose the maps), which lets hardness chain.
One NP-complete problem in P $\Rightarrow$ $\text{P} = \text{NP}$ (and all NP-complete problems collapse together).
To prove $B$ NP-complete: (1) show $B \in \text{NP}$; (2) reduce a known NP-complete problem to $B$. Direction matters — the known-hard problem is the source.

The keystone — Cook–Levin (1971): SAT is NP-complete. Proved by encoding any NP verifier's polynomial computation as a Boolean formula satisfiable iff the input is a yes-instance. This gives the first NP-complete problem, after which all others follow by reduction (Karp's 21, then thousands).

The NP-complete zoo (all interreducible): SAT → 3-SAT → CLIQUE, VERTEX COVER, INDEPENDENT SET, Hamilton circuit → TSP (dec.), SUBSET SUM, GRAPH 3-COLORING. (2-SAT, by contrast, is in P — complexity is parameter-sensitive.)

P vs. NP: a Millennium Prize Problem (\$1M). Asks if *finding* is as easy as *checking*. Almost certainly $\text{P} \ne \text{NP}$, but unproven. If equal: intractable problems become easy and cryptography collapses; if unequal: the hardness is permanent.

Coping with NP-completeness (give up exactly one of exact / general / fast):

Sacrifice	Strategy	Keep
general	exploit input structure (FPT, special cases)	exact + fast on a restricted class
exact	approximation / heuristics (Christofides, local search)	fast + general, near-optimal
fast	smart exponential search (branch-and-bound, Held–Karp, SAT solvers)	exact + general, tolerable in practice

"NP-complete" bounds the worst case over all inputs — it is a guide to which tool to reach for, not a verdict that your instances are unsolvable.

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

(Ch. 36) The halting problem is undecidable; an NP-complete problem like SAT is intractable (assuming $\text{P} \ne \text{NP}$). State the difference between "undecidable" and "intractable" in one sentence each.

(Ch. 36) Reductions appear in both Chapter 36 and this chapter. What is the key difference between a reduction used to prove undecidability and a polynomial-time reduction used to prove NP-hardness?

(Ch. 30) In Chapter 30 we called TSP and the Hamilton circuit problem "NP-hard" as a preview. Now give the precise definition of NP-hard, and name the reduction chain (starting from SAT) that establishes TSP's hardness.

(Ch. 14) Chapter 14 distinguished polynomial from exponential time. Why is polynomial (rather than, say, "$O(n^2)$") the right boundary for defining the class P?

(Ch. 14) Chapter 14 proved finding a maximum needs $\ge n - 1$ comparisons (a lower bound). How does the spirit of lower-bound reasoning relate to the (still-open) effort to prove $\text{P} \ne \text{NP}$?

Answers

Undecidable: no algorithm solves it correctly on all inputs, at any speed (a limit on what is computable at all). Intractable: an algorithm exists, but the best one takes super-polynomial (e.g., exponential) time, so it's infeasible at scale (a limit on what is computable efficiently). 2. An undecidability reduction need only be computable (it shows "if I could solve B, I could solve the halting problem," transferring uncomputability). A complexity reduction must additionally be polynomial-time computable, because it must preserve efficiency, not merely solvability — a slow reduction wouldn't show the target is hard to solve fast. 3. NP-hard: $H$ is NP-hard if every problem $A \in \text{NP}$ satisfies $A \le_p H$ (it is at least as hard as everything in NP). The chain: Cook–Levin gives SAT NP-complete; then $\text{SAT} \le_p \text{3-SAT} \le_p \text{Hamilton circuit} \le_p \text{TSP (decision)}$, so TSP is NP-hard. 4. Because polynomial time is model-independent (any reasonable model simulates another with polynomial overhead), so "polynomial" is an intrinsic property of the problem; "$O(n^2)$" exactly would be brittle (it changes under model details and excludes genuinely-efficient $O(n^3)$ algorithms), making the class neither robust nor closed under composition of efficient procedures. 5. Both seek to prove a problem requires a certain amount of work no matter what algorithm is used — an inherent lower bound. $\text{P} \ne \text{NP}$ is exactly the claim that NP-complete problems have super-polynomial lower bounds (no polynomial algorithm exists); proving it is a lower-bound problem of enormous difficulty, the same kind of "argue about all possible algorithms at once" reasoning as the max-finding bound, scaled up immensely.

What's Next

You have now mapped the landscape of computational difficulty: problems we can solve efficiently (P), problems we can at least check efficiently (NP), the hardest of those (NP-complete), and — from Chapter 36 — problems no algorithm can solve at all. That is the theoretical summit of the book. The next chapter descends from theory back toward applied power: Chapter 38 develops coding theory, where the algebraic structures of Chapter 24 and the error detection of Chapter 26 combine into codes that correct errors — the mathematics that lets a scratched CD still play, a QR code survive a coffee stain, and a spacecraft's signal cross billions of miles intact. Then Chapter 39, the capstone, brings the entire Toolkit together: you'll build a complete project — RSA encryption, a social-network analyzer, a Sudoku solver (whose NP-completeness you now understand precisely), or an error-correcting code — synthesizing logic, number theory, graphs, probability, and the complexity awareness you just gained. The wall you learned to recognize here is not the end of the road; it is a map that tells you, for the rest of your career, which roads are worth walking.

Prerequisites

Learning Objectives

In This Chapter

Chapter 37: Introduction to Complexity Theory — P, NP, and the Biggest Open Problem in CS

Overview

Learning Paths

37.1 Measuring difficulty: complexity classes

37.2 P and NP

37.3 Verification versus solution: the heart of the matter

37.4 NP-completeness and the Cook–Levin theorem

Polynomial-time reductions

NP-hard and NP-complete

The Cook–Levin theorem: the keystone

37.5 Reductions in action and the NP-complete zoo

From SAT to 3-SAT (restricting the form)

From 3-SAT to CLIQUE (a full gadget reduction)

The zoo and the web of reductions

37.6 Coping with intractability, and the P vs. NP question

Stating P vs. NP precisely

The coping menu (formalized)

Project Checkpoint: a polynomial-time verifier for the Toolkit

Summary

Spaced Review

What's Next