43 min read

> "It is difficult today to imagine how revolutionary Turing's ideas seemed at the time."

Prerequisites

  • 5
  • 10
  • 35

Learning Objectives

  • Describe the Turing machine model and trace a simple machine's computation on an input.
  • State the Church–Turing thesis and explain what it does and does not claim about computation.
  • Distinguish decidable from undecidable problems, and recognizable from decidable ones.
  • Prove the halting problem is undecidable using a diagonalization-and-contradiction argument.
  • Show a new problem is undecidable by reducing the halting problem to it.
  • Recognize undecidable problems in everyday software-engineering tasks and explain why no tool can solve them perfectly.

Chapter 36: Computability and the Halting Problem

"It is difficult today to imagine how revolutionary Turing's ideas seemed at the time." — Andrew Hodges, Alan Turing: The Enigma

Overview

Every time you write a linter, a static analyzer, a type checker, an optimizing compiler, or a test that asserts "this loop terminates," you are quietly betting that a program can analyze a program. Sometimes that bet pays off — your IDE really does flag unreachable code and unused variables. But you have also met the limits. No tool tells you, in general, whether an arbitrary function will return or spin forever. No analyzer catches every infinite loop. No compiler optimizes every program to its smallest equivalent. These are not gaps that a smarter startup will close next year. They are provably impossible, and this chapter is where we prove it.

This is the payoff Chapter 10 promised. There you learned, by a pure counting argument, that uncomputable problems must exist: programs are countable, problems are uncountable, so almost every problem has no program. That result was haunting but abstract — it named no specific impossible task. This chapter names one. The halting problem — "given a program and an input, will the program eventually stop?" — is a question every working programmer has wished they could answer automatically, and we will prove that no program can answer it for all inputs. The proof is Cantor's diagonal argument from Chapter 10, fired at programs instead of real numbers, finished with proof by contradiction from Chapter 5. Everything you have learned about proofs converges here.

Before we can say "no program solves this," we need a precise, language-independent definition of "program" — otherwise the claim is a claim about Python, or C, or whatever you happen to use. That definition is the Turing machine, the simplest device that captures exactly what any mechanical computer can do. We build it first, justify (via the Church–Turing thesis) that it loses no power, then use it to draw a sharp line between problems a computer can decide, problems it can only recognize, and problems it cannot touch at all.

In this chapter, you will learn to:

  • Read and trace a Turing machine, the formal model of "anything a computer can compute."
  • Use the Church–Turing thesis to reason about computability without committing to a language.
  • Tell apart decidable problems (always answered), recognizable ones (answered when "yes"), and undecidable ones (no general algorithm).
  • Prove the halting problem undecidable, and explain each move of the proof.
  • Spread undecidability to new problems by reduction — the working theorist's main tool.
  • See exactly which everyday software tasks are impossible in general, and why "impossible in general" still leaves room for useful tools.

Learning Paths

🏃 Fast Track: The heart of the chapter is the undecidability of the halting problem (§36.4) and what it means for you (§36.6). Read §36.1 just far enough to accept "Turing machine = formal program," skim §36.2–36.3 for the vocabulary (decidable, undecidable), then study §36.4 and §36.6 closely. Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. The chapter is one long argument: model (§36.1) → "the model is universal" (§36.2) → the categories of problems (§36.3) → the impossibility theorem (§36.4) → spreading it by reduction (§36.5) → consequences (§36.6). Work every 🔄 Check Your Understanding before moving on, and try to reconstruct the halting-problem proof in §36.4 yourself before reading ours.

🔬 Deep Dive: After the chapter, prove Rice's theorem (every nontrivial semantic property of programs is undecidable) from the halting problem, work out the difference between recognizable and co-recognizable, and read ahead to how undecidability becomes intractability — the P vs. NP story of Chapter 37. Exercise set, Part E.


36.1 The Turing machine model

We want to prove something about all possible programs. That ambition forces a question we have so far dodged: what exactly is a program? If we prove "no Python program solves the halting problem," a skeptic answers "then use a more powerful language." We need a model of computation so simple we can reason about it rigorously, yet so powerful that nothing — no language, no future hardware — can compute anything it cannot. In 1936, before any electronic computer existed, Alan Turing designed exactly such a model.

Turing's insight was to strip computation down to what a human clerk does with pencil and paper. A clerk reads a symbol, consults a fixed rule book based on their current "state of mind," writes a symbol, and moves their attention. That is all computation is. Formalize it and you get a machine.

💡 Intuition: Picture an infinitely long paper tape divided into cells, each holding one symbol. A read/write head sits over one cell. The machine has a finite "state of mind." At each step it reads the current cell, and — based only on that symbol and its current state — it writes a symbol, moves the head one cell left or right, and switches to a new state. A finite rule book drives the whole thing. There is no random access, no arithmetic unit, no memory hierarchy. And yet this is the full power of computation.

The formal definition

Definition (Turing machine). A Turing machine is a 7-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})$ where:

  • $Q$ is a finite set of states;
  • $\Sigma$ is the finite input alphabet (the blank symbol $\sqcup$ is not in $\Sigma$);
  • $\Gamma$ is the finite tape alphabet, with $\Sigma \subseteq \Gamma$ and $\sqcup \in \Gamma$;
  • $\delta\colon Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function (the rule book): given a state and the symbol under the head, it returns the next state, the symbol to write, and a direction to move;
  • $q_0 \in Q$ is the start state; $q_{\text{accept}}, q_{\text{reject}} \in Q$ are the accept and reject states (distinct).

The machine starts in $q_0$ with the head on the leftmost cell of the input (the rest of the tape is blank). It repeatedly applies $\delta$. It halts if it ever enters $q_{\text{accept}}$ (and accepts) or $q_{\text{reject}}$ (and rejects). If it never reaches either, it loops forever.

Three features deserve emphasis, because the entire chapter turns on them. First, $Q$, $\Sigma$, $\Gamma$, and $\delta$ are all finite — the rule book has finitely many entries. Second, the tape is unbounded: the machine never runs out of scratch space (this is the only "infinite" part). Third, and most important, a Turing machine has exactly three fates on any input: accept, reject, or run forever. That third possibility — non-halting — is not a bug to be engineered away. It is the crack through which undecidability enters.

🔗 Connection. Compare this to the finite automata of Chapter 35. A DFA also has finite states and reads input symbol by symbol — but it has no writable memory and must halt when the input ends (it can never loop forever). A Turing machine adds a writable, unbounded tape and the ability to move both directions and never stop. That single upgrade — memory you can revisit — jumps from the bottom of the Chomsky hierarchy (regular languages) to the top (all of computation). The pumping lemma showed you what a DFA can't recognize; this chapter shows you what nothing can.

Tracing a machine

Definitions only become real when you run them. Here is a tiny machine that decides whether a string of $a$'s has even length. Its alphabet is $\Sigma = \{a\}$; its states are $Q = \{q_{\text{even}}, q_{\text{odd}}, q_{\text{accept}}, q_{\text{reject}}\}$. The rule book:

State Read Write Move Next state
$q_{\text{even}}$ $a$ $a$ R $q_{\text{odd}}$
$q_{\text{even}}$ $\sqcup$ $\sqcup$ R $q_{\text{accept}}$
$q_{\text{odd}}$ $a$ $a$ R $q_{\text{even}}$
$q_{\text{odd}}$ $\sqcup$ $\sqcup$ R $q_{\text{reject}}$

Start in $q_{\text{even}}$. The machine walks right across the $a$'s, flipping between "even-so-far" and "odd-so-far," and when it falls off the end onto a blank it accepts iff it is in the even state. Let's trace the input aa. We write the configuration as the tape with the head position marked by the current state:

$$ q_{\text{even}}\,aa \;\to\; a\,q_{\text{odd}}\,a \;\to\; aa\,q_{\text{even}}\,\sqcup \;\to\; \text{accept}. $$

Two $a$'s, two flips, back to even, blank reached: accept. Trace a instead and you end in $q_{\text{odd}}$ over the blank, which rejects. The machine is a clerk counting on its fingers, and "even length" is the whole of its mind.

Now the same machine simulated in Python. We are not claiming Python is a Turing machine; we are using Python to make the formal model concrete. The transition function is a dictionary, the tape is a list, and the simulation loop applies $\delta$ until a halting state is reached.

def run_tm(delta, tape, blank="_"):
    state, head, tape = "even", 0, list(tape)
    while state not in ("ACC", "REJ"):
        if head == len(tape):       # extend the tape with a blank as needed
            tape.append(blank)
        symbol = tape[head]
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state

delta = {("even", "a"): ("odd", "a", "R"),  ("even", "_"): ("ACC", "_", "R"),
         ("odd",  "a"): ("even", "a", "R"), ("odd",  "_"): ("REJ", "_", "R")}
print(run_tm(delta, "aa"), run_tm(delta, "aaa"))
# Expected output:
# ACC REJ

aa has even length, so the machine accepts (ACC); aaa is odd, so it rejects (REJ) — exactly the hand trace. Notice the while loop: for this machine it always terminates because the head only moves right and the input is finite. For a general Turing machine, the same loop might never end. That possibility is the heart of §36.4.

⚠️ Common Pitfall: "A Turing machine is too weak to matter." The barebones model looks like a toy next to your laptop. It is not. A Turing machine can simulate any modern CPU (slowly), run any algorithm in this book, and compute anything any programming language can. Its simplicity is a feature for proofs: because $\delta$ is a tiny finite table, we can reason about all Turing machines at once. We trade speed for analyzability — and lose nothing in what is computable.

🔄 Check Your Understanding 1. What are the three possible fates of a Turing machine on a given input, and which one has no counterpart in a DFA (Chapter 35)? 2. Which parts of a Turing machine are required to be finite, and which single part is unbounded? 3. In the even-length machine, trace the empty input (zero $a$'s). Does it accept or reject, and why is that the right answer?

Answers

  1. Accept, reject, or loop forever. The third — non-halting — has no DFA counterpart: a DFA always halts when the finite input is consumed. 2. The states $Q$, both alphabets $\Sigma$ and $\Gamma$, and the transition function $\delta$ are finite; only the tape is unbounded. 3. The machine starts in $q_{\text{even}}$, immediately reads a blank, and goes to $q_{\text{accept}}$ — it accepts. That is correct because zero is an even length.

36.2 The Church–Turing thesis

We have a clean model, but a nagging doubt remains: maybe Turing machines miss something. Maybe there is a problem your laptop can solve that no Turing machine can, and our impossibility theorem will only be about an outdated abstraction. The answer — one of the deepest empirical facts in computer science — is that nothing is missed.

In the same few years (1930s), several people chasing "what does it mean to compute?" arrived at wildly different formal models: Turing's machines, Alonzo Church's lambda calculus, Kurt Gödel and Jacques Herbrand's general recursive functions, and later Emil Post's systems. They look nothing alike — one is a tape device, one is pure symbol substitution, one is a class of number-theoretic functions. And yet, theorem after theorem, each was proved to compute exactly the same set of functions. Every model that anyone has ever proposed as a complete account of mechanical computation has turned out equivalent to the Turing machine. That striking convergence is the evidence for a claim we cannot formally prove but have every reason to believe.

Definition (Church–Turing thesis). The Church–Turing thesis is the claim that every function that can be computed by any mechanical (algorithmic, effective) procedure can be computed by a Turing machine. Equivalently: "computable by an algorithm" and "computable by a Turing machine" describe the same functions.

Read the definition carefully, because its status is unusual. It is not a theorem. You cannot prove it, because "mechanical procedure" is an informal, intuitive notion, and you cannot prove a precise statement equivalent to a vague one. It is a thesis — a claim about the match between an intuition and a formalism, supported overwhelmingly by evidence (every model agrees) but always formally open. It functions in practice like a definition: when we want to argue "no algorithm does $X$," the thesis lets us prove "no Turing machine does $X$" instead, which is a precise mathematical statement we can settle.

🚪 Threshold Concept. The Church–Turing thesis is what lets this chapter be about computation rather than about one machine. Once you accept it, "undecidable by a Turing machine" means "undecidable by any computer, any language, any algorithm, forever" — not just by the toy on the page. Every "no program can…" sentence in this book secretly invokes it. It also reframes programming languages: Python, C, Lisp, Haskell, and a Turing machine are all Turing complete — they compute the same functions, differing only in convenience and speed, never in raw power. The thesis is the reason a result about a 1936 tape device is a result about the cloud server you deploy to tomorrow.

💡 Intuition: Think of it like the conservation of energy in physics. No one has proved energy is always conserved; rather, every experiment ever done is consistent with it, and it has never failed, so we treat it as a law. The Church–Turing thesis has the same character for computation: ninety years of attempts to find a "more powerful" reasonable model have all failed, every one collapsing back to Turing-equivalence. We build on it the way engineers build on conservation laws.

A useful consequence for the rest of the chapter: we can be informal about machines. To argue that something is computable, we no longer have to exhibit a 7-tuple and a transition table — we can just describe an algorithm in plain English or Python and invoke the thesis to claim "therefore a Turing machine does it too." Conversely, to argue something is not computable, it suffices to show no Turing machine does it. This is exactly the freedom Chapter 10 used when it spoke of "programs" without fixing a language; the thesis is what justified that.

🔄 Check Your Understanding 1. Why is the Church–Turing thesis called a thesis and not a theorem? 2. What does it mean for two programming languages to be Turing complete, and what does that say about their relative power? 3. How does the thesis let us prove a statement about "all algorithms" by reasoning only about Turing machines?

Answers

  1. Because it equates a precise notion (Turing-computable) with an informal one (computable by any mechanical procedure); you cannot formally prove a match to an intuitive concept, so it is an evidence-backed claim, not a proof. 2. Both can compute exactly the set of Turing-computable functions; they have equal raw computational power and differ only in convenience, speed, and expressiveness — never in what is computable. 3. The thesis says "computable by an algorithm" = "computable by a Turing machine," so a proof that no Turing machine does $X$ is automatically a proof that no algorithm does $X$.

36.3 Decidable versus undecidable

With a model in hand and the thesis licensing us to talk about "algorithms" freely, we can finally classify problems by whether a computer can solve them. We focus on decision problems — yes/no questions — exactly as Chapter 10 did. Formally, a decision problem is a language: the set of all inputs whose answer is "yes" (Chapter 35's vocabulary). "Is $n$ prime?" becomes the language $\{n : n \text{ is prime}\}$; the problem is to decide membership.

A Turing machine can relate to a language in two importantly different ways, depending on what it does with the "no" instances.

Definition (decidable). A language $L$ is decidable (also recursive) if some Turing machine $M$ halts on every input and accepts exactly the strings in $L$ — it accepts every $x \in L$ and rejects every $x \notin L$. Such an $M$ is a decider for $L$. A decision problem is decidable if its language is.

Definition (recognizable). A language $L$ is recognizable (also recursively enumerable or semi-decidable) if some Turing machine $M$ accepts exactly the strings in $L$ — it accepts every $x \in L$, but on inputs $x \notin L$ it may reject or loop forever. Such an $M$ is a recognizer.

The difference is entirely about the "no" answers. A decider always gives you a definite answer — yes or no, guaranteed to halt. A recognizer is a weaker promise: it says "yes" eventually when the answer is yes, but when the answer is no it might say "no," or it might just run forever, never committing. Every decidable language is recognizable (a decider is a recognizer that happens to always halt), but — and this is the chapter's destination — not every recognizable language is decidable.

💡 Intuition: the search that never ends. Imagine searching an infinite haystack for a needle. A recognizer is "keep searching; shout when you find it." If a needle exists, you'll find it eventually and shout (accept). If no needle exists, you search forever, never shouting — you never get to say "there's definitely no needle." A decider is the stronger ability to also conclude, after finite work, "I have checked enough; there is no needle here" (reject). Recognizing is easy; the hard part is always knowing when to stop and say no.

Definition (undecidable). A language (decision problem) is undecidable if it is not decidable — there is no Turing machine that halts on all inputs and decides it. By the Church–Turing thesis, undecidable means no algorithm in any language can always correctly answer it.

Let's make the categories concrete with examples you already trust.

  • Decidable: "Is $n$ prime?" A trial-division algorithm always halts with a definite yes/no. "Does this DFA accept string $w$?" Simulate the DFA on $w$ — it halts when $w$ ends (Chapter 35). "Is this graph connected?" Run BFS (Chapter 28); it always terminates. Most problems you actually code are decidable; that's why you can code them with a guaranteed answer.
  • Recognizable but (we'll prove) not decidable: "Does program $p$ halt on input $x$?" You can recognize "yes" — just run $p$ on $x$ and accept if it stops. But if $p$ loops, your simulation loops too, and you never get to safely say "no." This asymmetry is the halting problem, and §36.4 proves no decider exists.

Here is a recognizer for halting, in code, to make the asymmetry tangible:

def halts_recognizer(program, x):
    """A RECOGNIZER for 'does program halt on x': accepts (returns True)
    when program halts; loops forever exactly when program loops. It can
    NEVER safely return False. This is the best a program can do."""
    result = program(x)          # run it; if program(x) loops, WE loop here too
    return True                  # reached only if program(x) actually halted

# If we pass a function that halts, the recognizer accepts:
print(halts_recognizer(lambda n: n + 1, 5))
# Expected output:
# True

The function returns True precisely when the simulated program halts — and hangs exactly when the simulated program hangs. It is a perfect recognizer and a useless decider: it never says "no." A decider would have to detect non-halting without falling into it. Section 36.4 proves that detection is impossible.

🔗 Connection. The decidable/recognizable split refines a question you first met around algorithms and complexity (Chapter 14): there we asked how long a halting algorithm runs; here we ask the prior question of whether an algorithm that always halts even exists. Complexity assumes decidability and measures cost; computability asks whether you get to measure cost at all. Chapter 37 will sit on top of both, splitting the decidable problems into the efficiently decidable (P) and the rest.

🔄 Check Your Understanding 1. State the single difference between a decider and a recognizer. 2. Is every decidable language recognizable? Is every recognizable language decidable? 3. Why does the halts_recognizer above qualify as a recognizer but not a decider?

Answers

  1. A decider halts on every input (always gives yes or no); a recognizer accepts the "yes" inputs but may loop forever on "no" inputs. The difference is entirely in behavior on non-members.
  2. Yes, every decidable language is recognizable (a decider is a recognizer that always halts). No — not every recognizable language is decidable; the halting problem (§36.4) is the counterexample. 3. It accepts (returns True) exactly when the program halts — so it correctly handles every "yes" — but on a non-halting program it loops forever instead of returning False, so it never decides the "no" cases.

36.4 The halting problem is undecidable

This is the summit. We will prove that one specific, natural, important problem has no algorithm — not "no efficient algorithm," not "no algorithm yet," but no algorithm, ever, in any language, on any hardware. The problem is one every programmer has wanted solved.

Definition (the halting problem). The halting problem is the decision problem: given (a description of) a program $P$ and an input $x$, determine whether $P$ halts (eventually stops) when run on $x$, as opposed to running forever. As a language, $$\mathrm{HALT} = \{\, \langle P, x\rangle : P \text{ is a program that halts on input } x \,\}.$$

The notation $\langle P, x\rangle$ means "an encoding of the pair (program $P$, input $x$) as a single string" — recall from Chapter 10 that every program is a finite string, so we can feed a program to another program as data. This is the crucial enabling move, and it is exactly what you do whenever you write a function that takes source code as an argument: a compiler, an interpreter, an eval. Programs are data.

Theorem (Turing, 1936). $\mathrm{HALT}$ is undecidable. No Turing machine halts on all inputs and correctly decides, for every program $P$ and input $x$, whether $P$ halts on $x$.

The strategy first. This is proof by contradiction (Chapter 5) powered by diagonalization (Chapter 10) — the same diagonal dodge that built a real number on no list, now building a program whose behavior no halt-decider can predict. The plan, in four moves:

  1. Assume a decider $H$ for the halting problem exists: $H(P, x)$ always halts and returns "halts" or "loops," correctly, for every $P$ and $x$.
  2. Build a paradoxical program $D$ that uses $H$ as a subroutine and is deliberately contrary: $D$ runs $H$ to predict its own behavior, then does the opposite.
  3. Feed $D$ its own description — the diagonal move, "run the thing on itself," exactly as Cantor fed the diagonal back into the list.
  4. Watch the contradiction. $D$ halts if and only if $D$ does not halt — impossible. The only assumption we made was that $H$ exists, so $H$ cannot exist.

Proof. Suppose, for contradiction, that the halting problem is decidable. Then there is a Turing machine — call it $H$ — that decides it: for every program $P$ and input $x$, $H(P, x)$ halts and outputs

$$ H(P, x) = \begin{cases} \text{"halts"} & \text{if } P \text{ halts on input } x,\\ \text{"loops"} & \text{if } P \text{ runs forever on } x.\end{cases} $$

Crucially, $H$ itself always halts (it is a decider), and it does so without actually running $P$ — it somehow analyzes $P$ and decides. Now we use $H$ to construct a new program $D$ (for "diagonal," or "devious"). $D$ takes a single input: a program $P$. Here is what $D$ does:

$D(P)$: run $H(P, P)$ — ask whether $P$ halts when given its own description as input. - If $H$ says $P$ halts on $P$, then $D$ deliberately loops forever. - If $H$ says $P$ loops on $P$, then $D$ deliberately halts (say, returns immediately).

$D$ is a perfectly legitimate program: it is just "call the subroutine $H$, then do the opposite of what it predicts." Since we assumed $H$ exists and always halts, $D$ is well-defined and the call to $H$ inside it always returns. So far, no contradiction — just a contrarian program.

Now the diagonal move. Run $D$ on its own description, i.e. consider $D(D)$. By the definition of $D$ applied to the input $P = D$:

  • Case 1: $D$ halts on input $D$. Then $H(D, D)$ must report "halts" (because $H$ is correct). But by $D$'s own rule, when $H$ reports "halts," $D$ loops forever. So $D$ halts on $D$ and $D$ loops forever on $D$ — a contradiction.
  • Case 2: $D$ loops forever on input $D$. Then $H(D, D)$ must report "loops" (because $H$ is correct). But by $D$'s rule, when $H$ reports "loops," $D$ halts. So $D$ loops forever on $D$ and $D$ halts on $D$ — again a contradiction.

Both cases are impossible. The statement "$D$ halts on $D$" is true if and only if it is false:

$$ D \text{ halts on } D \quad\Longleftrightarrow\quad D \text{ does not halt on } D. $$

A statement equivalent to its own negation is a contradiction. Every step after our one assumption was forced and valid, so the fault lies in the assumption itself. Therefore no decider $H$ exists: the halting problem is undecidable. $\blacksquare$

Sit with what just happened. We did not fail to find a halt-checker; we proved that the very idea of one is self-contradictory. The program $D$ is a logical trap: any would-be oracle $H$, asked about $D$ on itself, is forced into predicting a behavior that $D$ then defies. It is the liar's paradox — "this sentence is false" — compiled into a program. And the engine is precisely Cantor's diagonal: $H$ claims to tabulate the halting behavior of every program on every input; $D$ walks down the diagonal (each program run on itself) and flips the answer, producing a behavior the table cannot contain.

🔗 Connection. Lay this proof beside Chapter 10's proof that the functions $\mathbb{N} \to {0,1}$ are uncountable. There, we listed candidate functions $f_0, f_1, f_2, \dots$ and built $g$ with $g(n) = 1 - f_n(n)$ — flip the diagonal bit. Here, the "list" is all programs $P_0, P_1, P_2, \dots$, the diagonal entry is "$P_n$ run on its own description $P_n$," and $D$ is the flip: $D$ does the opposite of $P_n$'s halting behavior on $P_n$. In Chapter 10 the flip produced a function on no list (so the list was incomplete); here the flip produces a behavior $H$ cannot correctly predict (so $H$ cannot exist). Same diagonal, same flip, same contradiction — Chapter 10's diagonal_escape made executable, now aimed at programs. This is the debt Chapter 10 said it was incurring, paid in full.

🚪 Threshold Concept. Undecidability is real, and it is specific. Before this proof you knew (from Chapter 10's counting) that some problem must be uncomputable; now you can point at one and say "that one — halting — and here is the proof." This reorganizes how you think about software forever. The question "will this code terminate?" is not merely hard, not merely unsolved — it is mathematically impossible to answer in general. Once you internalize this, you stop asking whether a perfect bug-finder, a perfect optimizer, or a perfect "is this code correct?" checker is feasible and start recognizing the whole family as forbidden. Diagonalization, which you met as a curiosity about real numbers, is the master key that locks these doors.

⚠️ Common Pitfall: "But I can just run the program and watch!" The most common objection: "To check if $P$ halts on $x$, run it; if it stops, it halts." That gives you a recognizer (§36.3), not a decider. If $P$ does halt, watching works — you see it stop and answer "halts." But if $P$ loops forever, your watching also goes on forever; you never reach a moment where you can safely answer "loops." And you cannot just "wait long enough," because there is no bound on how long a halting program might take — some halt after $10^{100}$ steps, and no finite wait distinguishes "will halt eventually" from "never halts." The decider must answer without running $P$ to completion, and that is exactly what the proof forbids.

🐛 Find the Error. A student proposes a halt-decider: "Run $P$ on $x$ for one million steps. If it halted, output 'halts'; if not, output 'loops.' This always terminates, so the halting problem is decidable." Where is the flaw?

Answer

The procedure always halts, but it is not correct. A program that halts after, say, 1,000,001 steps gets wrongly labeled "loops." There is no fixed step bound $N$ that works for all programs, because halting times are unbounded — for any $N$ you choose, some program halts at step $N+1$. Bounding the simulation buys termination at the cost of correctness, and a decider must be both. The proof shows you cannot have both; this student traded away the one the proof says you can't keep while keeping the other.

🔄 Check Your Understanding 1. In the proof, what is the single assumption we make for contradiction, and what specifically becomes contradictory? 2. Why is it essential that we feed $D$ its own description (run $D$ on $D$)? Which earlier technique is this "run it on itself" move? 3. Explain in one sentence why "just run $P$ and see if it stops" does not decide the halting problem.

Answers

  1. We assume a decider $H$ for halting exists (always halts, always correct). The contradiction is that the constructed program $D$ halts on input $D$ if and only if it does not — a statement equivalent to its own negation. 2. Feeding $D$ its own description is the diagonal move (Chapter 10): it makes $D$'s prediction be about itself, so $D$ can contradict the prediction. Running a program on its own source is what creates the self-reference the paradox needs. 3. Running $P$ only recognizes halting — it answers "halts" when $P$ stops but loops forever (never answering) when $P$ loops, so it is not a decider.

36.5 Reductions and other undecidable problems

The halting problem is the first undecidable problem, proven from scratch by diagonalization. But it is far from the last. There is a vast catalogue of undecidable problems, and we almost never prove them undecidable from scratch. Instead we use the single most important technique in computability theory: reduction. The idea is the same one you use constantly in programming — "solve a new problem by calling a solver for an old one" — turned into a proof technique.

Definition (reduction). A reduction from problem $A$ to problem $B$ is an algorithm that transforms any instance of $A$ into an instance of $B$ such that the answer is preserved: the $A$- instance is a "yes" exactly when the produced $B$-instance is a "yes." We write $A \le B$, read "$A$ reduces to $B$." Intuitively, it shows $A$ is no harder than $B$: a solver for $B$ would give a solver for $A$.

The direction of the inequality is the thing to get right, and it is the opposite of most people's first guess.

💡 Intuition: which way does difficulty flow? "$A \le B$" means "$B$ is at least as hard as $A$," because if you could solve $B$, you could solve $A$ (transform the $A$-question into a $B$-question and use the $B$-solver). So reductions transfer power down and impossibility up: a solver for the harder problem solves the easier one, and the impossibility of the easier one dooms the harder one.

The contrapositive of that last sentence is the proof technique we want:

The reduction technique (for undecidability). To prove a problem $B$ is undecidable, reduce a known undecidable problem $A$ (such as $\mathrm{HALT}$) to $B$: show $A \le B$. Then a decider for $B$ would yield a decider for $A$ — but $A$ is undecidable, so no decider for $B$ can exist.

🔗 Connection. You will see this exact machinery again in Chapter 37, where reductions prove problems NP-complete instead of undecidable. The logic is identical — "reduce a known-hard problem to the new one, so a solution to the new one would crack the old one" — only the notion of "hard" changes (undecidable here, intractable there). Reduction is the load-bearing idea of both Part VI's theorems; learning it once equips you for both. Polynomial-time reductions in Chapter 37 just add a budget on the transforming algorithm.

A worked reduction: the empty-language problem

Let's prove a second problem undecidable, to see the technique in motion.

The empty-acceptance problem ($\mathrm{EMPTY}$): given a program $P$, does $P$ accept no input at all — is the set of inputs it accepts empty? As a language, ${\langle P\rangle : P \text{ accepts nothing}}$.

Claim. $\mathrm{EMPTY}$ is undecidable.

The strategy first. We reduce $\mathrm{HALT}$ to $\mathrm{EMPTY}$. Given a halting question "does $P$ halt on $x$?", we will manufacture a new program $P'$ that is rigged so that "$P'$ accepts nothing" carries the answer. The trick: build $P'$ to ignore its own input and instead just run $P$ on $x$; if that run halts, $P'$ accepts (everything), otherwise $P'$ accepts (nothing). Then $P'$'s emptiness is $P$'s non-halting, and a decider for $\mathrm{EMPTY}$ would decide $\mathrm{HALT}$.

Proof. Suppose for contradiction that $\mathrm{EMPTY}$ is decidable, via a decider $E$. We build a decider for the halting problem, contradicting §36.4. Given an arbitrary halting instance $\langle P, x\rangle$, construct the following program $P'$ (note: this construction is a finite, mechanical algorithm — it just writes source code, it does not run $P$):

$P'(w)$: ignore the input $w$; run $P$ on $x$; if that run halts, accept.

Trace the two possibilities:

  • If $P$ halts on $x$, then for every input $w$, $P'$ runs $P$ on $x$, sees it halt, and accepts. So $P'$ accepts every input — its language is not empty.
  • If $P$ loops on $x$, then for every input $w$, $P'$ gets stuck running $P$ on $x$ forever and never accepts. So $P'$ accepts no input — its language is empty.

Therefore: $P'$ accepts nothing $\iff$ $P$ loops on $x$. Now run the assumed decider $E$ on $\langle P'\rangle$ and flip its answer: $P$ halts on $x$ exactly when $E$ says $P'$ is not empty. This composite — "build $P'$ from $\langle P, x\rangle$, run $E$, flip" — always halts (building $P'$ is finite, and $E$ is a decider) and correctly decides whether $P$ halts on $x$. That is a decider for $\mathrm{HALT}$, which §36.4 proved impossible. Contradiction. Hence $\mathrm{EMPTY}$ is undecidable. $\blacksquare$

Notice the shape, because every undecidability reduction has it: assume a decider for the new problem, use it (plus a mechanical program-builder) to construct a decider for HALT, invoke §36.4 to get a contradiction. The creative part is always the program-builder $P'$ — the gadget that smuggles the halting question into the new problem's terms.

⚠️ Common Pitfall: reducing the wrong direction. To prove $B$ undecidable you must reduce a known undecidable problem to $B$ ($\mathrm{HALT} \le B$), not $B$ to $\mathrm{HALT}$. Reducing $B \le \mathrm{HALT}$ would only show "$B$ is no harder than $\mathrm{HALT}$" — true of many decidable problems too, so it proves nothing about $B$'s undecidability. Always ask: "if my new problem were decidable, would that let me decide something I know is impossible?" If yes, you have it; if you find yourself using a HALT-decider to solve $B$, you are going the wrong way.

The reach of undecidability

The same machinery topples a long list of problems that sound like things a tool should do. A representative sample (each provable by reduction from $\mathrm{HALT}$):

  • Program equivalence: do two programs compute the same function on all inputs? Undecidable — so no compiler can certify that an optimization preserved behavior in every case.
  • Dead-code detection (in general): will this line of code ever execute on some input? Undecidable.
  • Whether a program ever throws a given exception, accesses a null pointer, or exceeds a memory bound on some input. Undecidable in general.
  • The Post Correspondence Problem (a tiling/matching puzzle on strings), and Hilbert's tenth problem (does a given polynomial equation have an integer solution?) — both proven undecidable, the latter a celebrated 1970 result (Matiyasevich, building on Davis–Putnam–Robinson; Tier 2).

There is even a sweeping generalization, Rice's theorem, which says (roughly) that every nontrivial question about what a program computes — its input/output behavior — is undecidable. Asking about the behavior of arbitrary code is, with almost no exceptions, a fool's errand for a general algorithm. (We leave Rice's theorem to the Deep Dive; it follows from the halting problem by the reduction pattern above.)

💡 Intuition: Why are so many problems undecidable? Because by Chapter 10's counting, the undecidable problems are the overwhelming majority — the decidable ones are a thin countable sliver. The halting problem and reductions just let us name members of that majority. Each reduction is a bridge from the one undecidable problem we proved by hand to another, and the bridges reach almost everywhere.

🔄 Check Your Understanding 1. To prove a new problem $B$ undecidable by reduction, do you reduce $\mathrm{HALT}$ to $B$, or $B$ to $\mathrm{HALT}$? Why? 2. In the $\mathrm{EMPTY}$ reduction, what is the role of the constructed program $P'$, and why is it important that building $P'$ does not require running $P$? 3. What does Rice's theorem say, in one sentence, and why does it make most "program analysis" questions impossible in general?

Answers

  1. Reduce $\mathrm{HALT}$ to $B$ (show $\mathrm{HALT} \le B$): then a decider for $B$ would give a decider for $\mathrm{HALT}$, which is impossible, so $B$ is undecidable. Reducing $B$ to $\mathrm{HALT}$ would prove nothing (many decidable problems also reduce to HALT). 2. $P'$ is a gadget that translates the halting question into an emptiness question: $P'$ accepts nothing exactly when $P$ loops on $x$. Building $P'$ must be a finite, mechanical step (just writing code) — if it required running $P$, the reduction itself might never halt and would not yield a decider. 3. Rice's theorem: every nontrivial property of the function a program computes is undecidable. So any general tool that decides a behavioral property of arbitrary programs (e.g., "does it ever output 0?") cannot exist, because that property is nontrivial and hence undecidable.

36.6 What this means for programmers

It would be easy to read this chapter as a counsel of despair: the central question about software — "does my code do the right thing?" — is mathematically unanswerable in general. But the working lesson is the opposite of despair. Knowing exactly what is impossible tells you to stop chasing it, where to spend your effort instead, and how to read the fine print on every tool you use. Undecidability is not a wall you keep walking into; it is a map of where the walls are.

1. Why your tools are necessarily imperfect — and that's fine. Your static analyzer misses some bugs and flags some non-bugs. Your linter cannot find every infinite loop. Your compiler will not optimize every program optimally. None of this is sloppy engineering; §36.4 and Rice's theorem prove these tools cannot be perfect. So real tools make a principled trade: they are sound but incomplete (every warning is real, but some real bugs are missed) or complete but unsound (every bug is flagged, but some warnings are false). A type checker rejects some programs that would actually have run fine — it errs on the safe side because it must. Understanding this stops you from filing the bug report "why didn't the analyzer catch this?" and starts you asking the right question: "which side does this tool err on, and is that the side I want?"

🔗 Connection. The decidable/undecidable boundary is why engineering involves so many approximations and heuristics. We met the easy-vs-hard boundary inside the decidable problems back in Chapter 30 (Euler paths are easy, Hamilton paths are hard) and will formalize it in Chapter 37 (P vs. NP). This chapter draws the outer boundary: outside it, no algorithm exists at all; just inside it, some problems are decidable but hopelessly slow. Good engineers navigate both borders.

2. How tools cope: bound the problem, accept approximation, or ask the human. Undecidability is always about the general, unbounded case. Restrict the problem and it often becomes decidable again:

  • Bound the resource. "Does $P$ halt within $10^6$ steps?" is decidable — just simulate for $10^6$ steps. Watchdog timers, execution quotas, and gas limits in smart contracts all exploit exactly this: replace the undecidable "halts ever?" with the decidable "halts within budget?"
  • Restrict the language. Total functional languages, primitive-recursive code, and many domain-specific languages are deliberately not Turing complete — they trade away the ability to loop forever for the guarantee that every program halts. SQL (without recursion), regular expressions (Chapter 35), and configuration languages are decidable by design.
  • Be conservative. Type systems, borrow checkers, and termination checkers accept a subset of good programs and reject everything they cannot prove safe. They answer a decidable approximation of the undecidable question, erring on the safe side.
  • Ask a human. Interactive proof assistants (Coq, Lean, Isabelle) sidestep undecidability by letting a person supply the insight a machine cannot derive, then checking the proof — which is decidable.

3. Where you'll actually meet this. When a senior engineer says "we can't statically guarantee this loop terminates, so add a timeout," they are applying §36.4. When a language designer makes the template/macro system intentionally limited, they are dodging Turing completeness on purpose. When a verification tool reports "cannot prove; possible issue" instead of a clean yes/no, that hedge is undecidability showing through. You will not write a halting-problem proof at work — but you will make, and recognize, dozens of decisions shaped by it.

💡 Intuition: Hold one image from this chapter. There is a hard, permanent border around what any algorithm can do, drawn not by our cleverness but by logic itself — the same diagonal logic that Cantor used on the real numbers. Inside the border lies everything you can compute; the halting problem and its undecidable cousins sit just outside, forever. Mature engineering is not the doomed attempt to cross the border. It is the skill of working productively right up against it — bounding, approximating, and asking humans exactly where the impossible begins.

🔄 Check Your Understanding 1. Give two distinct strategies a real tool uses to "cope" with an undecidable question, with an example of each. 2. What does it mean for a static analyzer to be "sound but incomplete," and why does undecidability force a tool to choose between soundness and completeness? 3. "Does this program halt within 10,000 steps?" — is this decidable or undecidable? How does that square with §36.4?

Answers

  1. Any two of: bound the resource (e.g., a watchdog timer turning "halts ever?" into "halts in $N$ steps?"); restrict the language (e.g., recursion-free SQL or regex, which are not Turing complete); be conservative (e.g., a type checker rejecting any program it can't prove safe); ask a human (e.g., a proof assistant checking a human-supplied proof). 2. "Sound but incomplete" means every warning it emits is a real bug, but it misses some real bugs. Undecidability forbids a tool that is both sound and complete for a behavioral property (that would decide the undecidable), so a tool must give up one — typically staying sound and accepting incompleteness. 3. It is decidable — simulate for at most 10,000 steps and answer. This does not contradict §36.4 because the bounded question is a different, easier problem than the unbounded "halts ever?"; undecidability is about the general case with no step limit.

Project Checkpoint: a Turing machine and the halting paradox, for the Toolkit

The recurring theme of this book is that computation and proof are complementary (theme four). This chapter's proofs are about the uncomputable, but two of its mechanisms are concrete enough to run: a Turing machine simulator (the formal model made executable) and a runnable demonstration of the self-reference at the core of §36.4. We add a small, self-contained computability.py to our growing dmtoolkit. As with Chapter 10's cardinality.py, this module is a teaching companion to the chapter — it introduces no signatures that conflict with the frozen Toolkit API, and it makes the two ideas of this chapter touchable.

Create dmtoolkit/computability.py and add a general Turing-machine simulator plus a self-application helper that exhibits the diagonal "run it on itself" move:

def run_tm(delta, tape, start="q0", accept="ACC", reject="REJ", blank="_", max_steps=10_000):
    """Simulate a Turing machine. delta maps (state, symbol) -> (state, write, move in {L,R}).
    Returns 'ACC', 'REJ', or 'LOOP?' if max_steps is hit (we cannot decide halting -- see 36.4)."""
    state, head, tape = start, 0, list(tape)
    for _ in range(max_steps):
        if state in (accept, reject):
            return state
        if head < 0:
            tape.insert(0, blank); head = 0
        if head == len(tape):
            tape.append(blank)
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "LOOP?"      # bounded simulation: 'maybe loops' -- the undecidable gap, made visible

def applied_to_self(program):
    """Run a program on its own description -- the diagonal move behind the halting proof."""
    return program(program)

# The even-length-of-a's machine from 36.1, now via the general simulator:
even = {("q0", "a"): ("q1", "a", "R"), ("q0", "_"): ("ACC", "_", "R"),
        ("q1", "a"): ("q0", "a", "R"), ("q1", "_"): ("REJ", "_", "R")}
print(run_tm(even, "aaaa"), run_tm(even, "aaa"))

def name_of(f):                  # a program we can safely apply to itself
    return f.__name__
print(applied_to_self(name_of)) # name_of(name_of) -> the string "name_of"
# Expected output:
# ACC REJ
# name_of

The first line runs the formal model on aaaa (even → ACC) and aaa (odd → REJ), matching the hand trace in §36.1. The crucial design choice is max_steps: because halting is undecidable, our simulator cannot in general tell "still running" from "loops forever" — so it gives up after a bound and reports LOOP?. That LOOP? is the halting problem staring back at you in code: no honest finite simulator can replace it with a definite "loops." The second line illustrates the self- application trick applied_to_self — feeding a program its own description as data. Here name_of applied to itself halts and returns its own name, "name_of"; the point is only to exhibit the pattern of a program taking itself as input. The proof's $D$ is the weaponized version of exactly this move: $D$ run on $D$ is engineered to contradict $H$'s own prediction, and that is what no consistent $H$ can survive.

Toolkit so far: logic.py (Chapters 1–3), recurrences.py (begun Chapter 6), cardinality.py (Chapter 10), and now computability.py. Notice the arc: Chapter 10's diagonal_escape built a row no table contains; this chapter's applied_to_self + the bounded run_tm show the same diagonal fired at programs, where it becomes the halting paradox. The Toolkit now spans from "test a conjecture on many cases" to "demonstrate the boundary of what code can do at all" — the two ends of theme four.


Summary

This chapter drew the outer boundary of computation: a precise model of "program," a thesis that the model is universal, a classification of problems by solvability, and a proof that one natural problem lies forever outside the solvable.

Core definitions

Term Meaning
Turing machine Finite states + unbounded tape + transition function $\delta$; halts by reaching accept/reject, else loops forever. The formal model of "algorithm."
Church–Turing thesis Claim (not theorem): every mechanically computable function is Turing-computable. Lets "no Turing machine" mean "no algorithm at all."
decidable Some Turing machine halts on all inputs and gives the correct yes/no.
recognizable Some machine accepts the "yes" inputs but may loop forever on "no" inputs.
undecidable Not decidable — no machine halts on all inputs and decides it.
halting problem $\mathrm{HALT} = \{\langle P, x\rangle : P \text{ halts on } x\}$ — undecidable.
reduction ($A \le B$) Algorithm transforming $A$-instances to $B$-instances preserving the answer; "$B$ is at least as hard as $A$."

Key results

Result How it's proved
Three fates of a TM: accept / reject / loop forever From the model; non-halting is the source of undecidability
All reasonable computation models are equivalent Empirical convergence (Turing = lambda calculus = recursive functions = …) → Church–Turing thesis
Decidable $\subseteq$ recognizable, but not equal A decider is a recognizer that always halts; HALT separates them
Halting problem is undecidable Assume decider $H$; build contrarian $D$; run $D$ on $D$; "$D$ halts $\iff$ $D$ doesn't" — diagonalization + contradiction
New problems (EMPTY, equivalence, dead code, PCP, Hilbert's 10th) are undecidable Reduction from HALT: a decider for them would decide HALT
Every nontrivial behavioral property of programs is undecidable Rice's theorem (reduction from HALT)

The proof, in one line. A halt-decider $H$ would let us build $D$ = "do the opposite of what $H$ predicts about me," and then $D$ run on itself halts iff it doesn't — contradiction; so $H$ can't exist. (Cantor's diagonal flip, applied to programs.)

Reduction, used correctly

  • To prove $B$ undecidable: show $\mathrm{HALT} \le B$ (reduce the known-hard problem to the new one). The gadget is a finite program-builder that smuggles the halting question into $B$'s terms.
  • $A \le B$ means "$B$ is at least as hard as $A$" — solvability flows down, impossibility flows up.
  • Wrong direction warning: $B \le \mathrm{HALT}$ proves nothing about $B$'s undecidability.

What it means for code

  • Perfect bug-finders, optimizers, equivalence-checkers, and termination-checkers are impossible, not merely unbuilt.
  • Real tools are sound-but-incomplete or complete-but-unsound — they must give up one.
  • Coping strategies: bound the resource ("halts in $N$ steps?" is decidable), restrict the language (non-Turing-complete by design), be conservative (type checkers), or ask a human (proof assistants).

Toolkit: computability.pyrun_tm (a Turing-machine simulator whose LOOP? makes the undecidable gap visible) and applied_to_self (the self-reference at the heart of the halting proof).


Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

  1. (Ch. 35) A Turing machine is a finite automaton with two crucial additions. Name them, and say which one makes "loop forever" possible (and thus makes undecidability possible).
  2. (Ch. 35) A decision problem can be viewed as a language. Restate "the halting problem is undecidable" in language terms, and say which is the larger class: decidable or recognizable languages.
  3. (Ch. 10) The halting-problem proof reuses Cantor's diagonalization. In Chapter 10's proof that functions $\mathbb{N}\to\{0,1\}$ are uncountable, what was the "diagonal flip," and what plays its role in the halting proof?
  4. (Ch. 10) Chapter 10 proved uncomputable problems exist by counting; this chapter named one. Which cardinality comparison made the existence inevitable?
  5. (Ch. 5) The halting proof is a proof by contradiction. State the one assumption that is contradicted, and the self-negating statement that does the contradicting.

Answers

  1. A writable, unbounded tape (rewritable memory) and the ability to move both directions / not halt. The unbounded memory + no forced halt is what lets a TM run forever; a DFA must halt when its finite input ends, so it can never loop forever — and without "loop forever," there is no gap between decidable and recognizable. 2. As a language, $\mathrm{HALT} = {\langle P,x\rangle : P \text{ halts on } x}$ is recognizable but not decidable. The recognizable languages are the larger class — every decidable language is recognizable, but not conversely (HALT witnesses the gap). 3. In Chapter 10 the flip was $g(n) = 1 - f_n(n)$ — disagree with the $n$-th listed function at input $n$. In the halting proof, the "list" is all programs $P_n$, the diagonal entry is "$P_n$ run on its own description," and the flip is $D$ doing the opposite of $P_n$'s halting behavior on $P_n$. 4. Programs are countable, problems (functions $\mathbb{N}\to\{0,1\}$) are uncountable, and $\aleph_0 < \lvert\text{problems}\rvert$ forces some problem to have no program. 5. The contradicted assumption is "a decider $H$ for the halting problem exists." The self-negating statement is "$D$ halts on input $D$ $\iff$ $D$ does not halt on input $D$."

What's Next

You now know there are problems no algorithm can solve at all. The natural next question sharpens the knife: among the problems algorithms can solve, which can they solve efficiently? A problem can be perfectly decidable and yet, for any input you'd actually care about, require more steps than there are atoms in the universe — decidable in principle, intractable in practice. Chapter 37 (Complexity Theory — P, NP, and Beyond) draws this inner boundary. It splits the decidable problems into P (efficiently solvable) and NP (efficiently checkable), asks the most famous open question in computer science — does $\text{P} = \text{NP}$? — and reuses this chapter's master tool, reduction, to prove problems NP-complete the way we just used it to prove problems undecidable. The halting problem was the limit of what can be computed; NP-completeness is the limit of what can be computed fast. Same proof technique, one boundary inward.