Case Study: Building a Turing-Machine Simulator and Watching the Undecidable Gap Appear

"What I cannot create, I do not understand." — Richard P. Feynman (found on his blackboard, 1988)

Executive Summary

In this build-focused case study you will construct, from scratch, a working Turing-machine simulator — the formal model of §36.1 turned into runnable code — and then use it to build the very boundary the chapter proves. You will write the simulation loop, encode three machines (even length, length divisible by three, and a unary incrementer that writes to the tape), hand-trace their configurations, and verify each against its specification. Then comes the payoff: because a real simulator must decide when to stop simulating, you will see the halting problem appear inside your own code as the "LOOP?" verdict — the place where the simulator honestly admits it cannot tell "still working" from "loops forever." Finally you will build the self-application gadget applied_to_self and trace how the harmless "run a program on itself" move becomes the proof's weapon $D$.

Where Case Study 1 audited a claim, this one builds the artifact. Every line of code here is meant to run in your head; we hand-derive each # Expected output: exactly as the book requires, and never execute anything.

Skills applied

  • Implementing the §36.1 transition-function-driven simulation loop, including tape extension.
  • Encoding decision problems as transition tables and tracing configurations by hand.
  • Building a bounded universal simulator and locating the undecidable gap as the "LOOP?" return.
  • Constructing the self-application mechanism and connecting it to the diagonal program $D$ (§36.4).
  • Verifying a built component against its specification without a single execution.

Background

The scenario

You are adding a small computability.py module to your team's teaching toolkit (the book's dmtoolkit). The goal is pedagogical: give students a runnable Turing machine so the 7-tuple stops being abstract, and let them feel the halting problem by hitting it in code. The module needs three things: a general simulator run_tm, a few example machines, and the applied_to_self helper that exhibits self-reference. The engineering constraint is the interesting one — a Turing tape is unbounded and a machine may never halt, but your simulator runs on a real computer with finite time. How your simulator handles "might run forever" is where the chapter's theorem becomes concrete.

💡 Intuition: A simulator is a recognizer with a stopwatch. Run the machine and, if it reaches an accept/reject state, report it — that part is faithful. But you cannot wait forever to find out it won't halt (§36.4 says no finite procedure can decide that in general). So a practical simulator imposes a step budget and, when the budget is exhausted, returns an honest "I don't know" rather than a dishonest "loops." That "LOOP?" is the undecidable gap, built right into the API.

Why this matters

Writing the simulator does three things at once. It makes the formal model in §36.1 operational, so "transition function" becomes a dictionary you can index. It demonstrates the Church–Turing thesis from the inside: your Python loop is simulating a Turing machine, which can in turn simulate any computation — the two are inter-translatable (§36.2). And it stages the halting problem honestly: the moment you must choose what to return when the budget runs out, you are face to face with §36.4. Build it once and the abstraction is yours for good.

Phase 1: Build the simulation loop

Start from the definition. A Turing machine is $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})$. The only moving parts at run time are the current state, the head position, and the tape. The loop reads the symbol under the head, looks up $\delta$, writes, moves, and switches state — until it lands in a halting state. We add tape extension (the tape is unbounded, so we grow it with blanks on demand) and, deferring the budget to Phase 3, first build the unbounded loop to match §36.1 exactly.

def run_tm_basic(delta, tape, start="q0", accept="ACC", reject="REJ", blank="_"):
    """Faithful (unbounded) TM loop -- mirrors 36.1. May NEVER return (that is the point)."""
    state, head, tape = start, 0, list(tape)
    while state not in (accept, reject):
        if head < 0:                       # head walked off the left end
            tape.insert(0, blank); head = 0
        if head == len(tape):              # head walked off the right end
            tape.append(blank)
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state

even = {("q0", "a"): ("q1", "a", "R"), ("q0", "_"): ("ACC", "_", "R"),
        ("q1", "a"): ("q0", "a", "R"), ("q1", "_"): ("REJ", "_", "R")}
print(run_tm_basic(even, "aa"), run_tm_basic(even, "aaa"))
# Expected output:
# ACC REJ

Let's verify, not just trust. Trace aa by writing each configuration (the chapter's notation: tape with the head's state inserted at the head position):

$$ q_0\,aa \;\to\; a\,q_1\,a \;\to\; aa\,q_0\,\sqcup \;\to\; \text{ACC}. $$

Two a's flip $q_0 \to q_1 \to q_0$; the end-blank in $q_0$ sends us to ACC. For aaa: $q_0 \to q_1 \to q_0 \to q_1$, and the end-blank in $q_1$ sends us to REJ. Output ACC REJ — matching the hand trace and §36.1.

🔄 Check Your Understanding Trace run_tm_basic(even, "") (empty input) by hand. What configuration does it start in, what does it read first, and what does it return?

Answer

It starts as $q_0\,\sqcup$ (head over a blank, since the input is empty). The first read is _ in state q0, whose rule is ("ACC", "_", "R"), so it goes straight to ACC. That is correct: zero is an even length.

Phase 2: Encode and verify two more machines

A simulator is only as convincing as the variety of machines it runs. Build two more, each a transition table, and verify each against its spec by hand. The first counts length mod 3; the second actually writes to the tape (so far our machines only read).

# Machine 1: accept iff the number of a's is divisible by 3. States q0,q1,q2 = len mod 3.
div3 = {("q0", "a"): ("q1", "a", "R"), ("q0", "_"): ("ACC", "_", "R"),
        ("q1", "a"): ("q2", "a", "R"), ("q1", "_"): ("REJ", "_", "R"),
        ("q2", "a"): ("q0", "a", "R"), ("q2", "_"): ("REJ", "_", "R")}
print(run_tm_basic(div3, "aaa"), run_tm_basic(div3, "aaaa"))
# Expected output:
# ACC REJ

Trace aaa: $q_0 \to q_1 \to q_2 \to q_0$, end-blank in $q_0 \Rightarrow$ ACC (3 is divisible by 3). Trace aaaa: $q_0 \to q_1 \to q_2 \to q_0 \to q_1$, end-blank in $q_1 \Rightarrow$ REJ (4 is not). The state index is the running length mod 3, and acceptance checks "mod 3 equals 0" — a clerk counting on three fingers.

Now a machine that modifies the tape: a unary incrementer. Input is a block of 1s representing a number in unary ($n$ ones $=$ the number $n$); the machine appends one more 1, computing $n+1$. It walks right to the first blank and writes a 1 there.

# Machine 2: unary increment. Walk right over 1's to the end-blank; write a 1; accept.
inc = {("q0", "1"): ("q0", "1", "R"),    # scan right across the 1's
       ("q0", "_"): ("q1", "1", "R"),    # at the end blank, WRITE a 1, step right
       ("q1", "_"): ("ACC", "_", "R")}   # land on the new blank -> accept
print(run_tm_basic(inc, "11"))           # 2 -> 3, tape becomes 111
# Expected output:
# ACC

Trace 11 (the number 2), tracking the tape as it changes:

$$ q_0\,11 \;\to\; 1\,q_0\,1 \;\to\; 11\,q_0\,\sqcup \;\xrightarrow{\text{write }1}\; 111\,q_1\,\sqcup \;\to\; \text{ACC}. $$

At the third step the head is on the end-blank in $q_0$; the rule writes 1 (turning 11_ into 111) and moves right; the new cell is blank in $q_1$, which accepts. The tape now holds 111 — the number 3. The machine computed, not just recognized: it transformed its input. This is the feature a DFA lacks (§36.1, the 🔗 Connection) and the reason Turing machines reach the top of the Chomsky hierarchy.

⚠️ Common Pitfall: forgetting tape extension. A beginner's simulator indexes tape[head] after the head walks off the written input and crashes with an IndexError. The two if guards in run_tm_basic (head < 0 and head == len(tape)) implement the unbounded tape by lazily growing it with blanks. Drop them and your "Turing machine" is secretly bounded — a different, weaker model. The unbounded tape is the one infinite ingredient §36.1 insists on.

🔄 Check Your Understanding In the inc machine, what does run_tm_basic(inc, "") (the number 0) return, and what is on the tape when it accepts?

Answer

Start $q_0\,\sqcup$: the head reads _ in q0, so the rule ("q1", "1", "R") writes a 1 and moves right; then _ in q1 $\Rightarrow$ ACC. The tape holds a single 1 — the number 1 — so it computed $0 + 1 = 1$. It returns ACC.

Phase 3: Build in the step budget — and meet the undecidable gap

Every machine above halts, so the unbounded loop was safe. But the whole point of §36.4 is that some machines never halt, and your simulator must run on a finite computer. Add a max_steps budget. The design decision — what to return when the budget is exhausted — is where the halting problem enters your code.

def run_tm(delta, tape, start="q0", accept="ACC", reject="REJ", blank="_", max_steps=10_000):
    """Bounded TM simulator. Returns 'ACC'/'REJ', or 'LOOP?' if the budget is hit.
    'LOOP?' is HONEST: we CANNOT decide halting (36.4), so we never claim 'REJ'/'loops'."""
    state, head, tape = start, 0, list(tape)
    for _ in range(max_steps):
        if state in (accept, reject):
            return state
        if head < 0:
            tape.insert(0, blank); head = 0
        if head == len(tape):
            tape.append(blank)
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "LOOP?"          # budget exhausted: 'maybe loops' -- the undecidable gap, made visible

# A deliberately NON-HALTING machine: it walks right forever, never reaching ACC/REJ.
spin = {("q0", "a"): ("q0", "a", "R"), ("q0", "_"): ("q0", "_", "R")}
print(run_tm(even, "aaaa"), run_tm(even, "aaa"), run_tm(spin, "a", max_steps=50))
# Expected output:
# ACC REJ LOOP?

Trace the three calls. even on aaaa: four flips $q_0\to q_1\to q_0\to q_1\to q_0$, end-blank in $q_0$ $\Rightarrow$ ACC (reached well within the budget). even on aaa: ends in $q_1$ on the blank $\Rightarrow$ REJ. spin on a: the head marches right forever — on the a, then across blank after blank, always in $q_0$, never halting. After 50 steps the budget runs out and the simulator returns "LOOP?".

Sit with that third return value. The simulator cannot print "REJ" or "loops" for spin, because that would be claiming to have decided non-halting — exactly what §36.4 forbids. All it can honestly say is "LOOP?": "I did not see it halt within the budget; I cannot conclude it never will." Raise max_steps to a million and spin still returns "LOOP?"; there is no finite budget that converts the honest "LOOP?" into a justified "loops", because halting times are unbounded — some machines halt at step $N+1$ for any $N$ you pick (the §36.4 Find-the-Error trap, now a fact about your own simulator).

🚪 Threshold Concept. You did not choose to put the halting problem in your code; it appeared unavoidably the moment you made the simulator finite. Any honest simulator of a Turing-complete model has a "LOOP?"-shaped hole, because deciding which inputs would have halted is the halting problem itself. This is the chapter's theorem reaching out of the page and into your editor: the limit is not a missing feature you could add next sprint — it is structural.

Phase 3.5: Programs as data — a simulator that reads a machine off its input

So far run_tm takes delta as a Python object. But §36.4's whole enabling move is that a program is a finite string you can feed to another program (the encoding $\langle P, x\rangle$). To feel that, we build a small step toward a universal simulator: one that reads a machine's transition table as text and runs it. This is the Turing-machine version of an interpreter, and it is what makes "run a program on its own description" (Phase 4) more than a slogan.

We encode a machine as newline-separated rows state,read -> next,write,move, parse them into a delta dictionary, then hand that to run_tm. The encoder and decoder are pure string manipulation — exactly the "programs are data" idea (§36.4).

def encode_tm(delta):
    """Serialize a transition table to text: one rule per line."""
    return "\n".join(f"{s},{r}->{ns},{w},{mv}" for (s, r), (ns, w, mv) in delta.items())

def decode_tm(text):
    """Parse the text back into a delta dict -- the 'read a machine off the tape' step."""
    delta = {}
    for line in text.splitlines():
        left, right = line.split("->")
        s, r = left.split(",")
        ns, w, mv = right.split(",")
        delta[(s, r)] = (ns, w, mv)
    return delta

def universal_run(machine_text, tape, **kw):
    """A mini 'universal' simulator: decode a machine FROM TEXT, then run it (36.2/36.4)."""
    return run_tm(decode_tm(machine_text), tape, **kw)

src = encode_tm(even)                 # 'even' from Phase 1, now as a STRING
print(universal_run(src, "aa"), universal_run(src, "aaa"))
# Expected output:
# ACC REJ

Trace the round trip. encode_tm(even) turns the four-rule table into text such as q0,a->q1,a,R / q0,_->ACC,_,R / q1,a->q0,a,R / q1,_->REJ,_,R (one rule per line). decode_tm parses each line back into the same (state, read) -> (next, write, move) entries, reconstructing the identical delta. universal_run then simulates it: aa $\to$ ACC, aaa $\to$ REJ, matching Phase 1. The machine survived the trip through text unchanged — which is the entire point.

🔗 Connection. universal_run is a baby universal Turing machine (§36.2, §36.4): a single machine that, given an encoding of any machine $M$ and an input $w$, simulates $M$ on $w$. Turing's 1936 paper built exactly this — the ancestor of every interpreter, virtual machine, and eval. It is also the construction that lets the halting problem even be stated: $\mathrm{HALT}$ takes $\langle P, x\rangle$ as input precisely because a universal machine can run $P$ from its description.

💡 Intuition: Notice that universal_run inherits the same "LOOP?" gap as run_tm — feed it the text of spin and it returns "LOOP?" after the budget. Building the universal layer did not add any power to decide halting; it only made "programs as data" tangible. A universal simulator can run any machine, but it still cannot decide whether the machine it is running will halt — that is §36.4, one level up.

⚠️ Common Pitfall: "an interpreter could detect the loop for us." Wrapping a machine in an interpreter (or a VM, or a sandbox) feels like it should grant new visibility into the machine's fate. It does not. The interpreter is itself a program; if it could decide whether the interpreted machine halts, it would be the halting decider §36.4 forbids. More tooling around the execution changes convenience, never the computability of "will it halt?".

Phase 4: Build the self-application gadget and connect it to $D$

The last piece is the mechanism behind the proof: feeding a program its own description. Build the helper, exhibit a safe self-application, then trace how the same shape becomes the contradiction $D$.

def applied_to_self(program):
    """Run a program on its own description -- the diagonal 'run it on itself' move (36.4)."""
    return program(program)

def name_of(f):                 # a one-argument program we can safely apply to itself
    return f.__name__

print(applied_to_self(name_of))     # name_of(name_of) -> the string "name_of"
# Expected output:
# name_of

applied_to_self(name_of) evaluates name_of(name_of). name_of ignores the body of its argument and returns its __name__ attribute, which is the string "name_of". It halts; it is harmless. The pattern — a program taking itself as input — is all we need; here it is benign.

Now build the weaponized version. We cannot supply a real halting decider H (the theorem forbids it), so we assume one and show the self-application makes every verdict wrong — the §36.4 construction, matching the chapter's make_D and example-03.

def make_D(H):
    """Given a (hypothetical) halting decider H, build the contrarian D.
    D(p): ask H whether p halts on p, then DO THE OPPOSITE."""
    def D(p):
        if H(p, p) == "halts":
            while True:        # H predicted 'halts' -> D loops forever
                pass
        return "halted"        # H predicted 'loops' -> D halts
    return D

def trace_D_on_D():
    # Consider applied_to_self(D) = D(D). Try both verdicts H could give for D-on-D:
    for verdict in ("halts", "loops"):
        actual = "loops" if verdict == "halts" else "halts"   # D does the opposite
        print(f"H(D,D)='{verdict}' -> D(D) actually {actual} -> contradicts H")

trace_D_on_D()
# Expected output:
# H(D,D)='halts' -> D(D) actually loops -> contradicts H
# H(D,D)='loops' -> D(D) actually halts -> contradicts H

The two helpers share one shape: applied_to_self runs program(program); D is run on D, i.e. D(D). The difference is the body. name_of reads only its argument's name and returns peacefully. D reads H's prediction about D(D) and then contradicts it — so whatever H would say, D does the opposite, and no correct, always-halting H can survive. Self-application is not the villain; self-application combined with the freedom to negate a prediction about yourself is. That is the liar's paradox compiled into a program, and it is why the decider H you would need to upgrade your simulator's "LOOP?" into a real "loops" cannot exist.

🔗 Connection. Lay this beside Chapter 10's diagonal_escape. There, $g(n) = 1 - f_n(n)$ ran down the diagonal of a proposed list of functions and flipped each bit, producing a function on no list. Here, D runs down the diagonal of "every program applied to itself" and flips the halting behavior, producing a behavior H cannot correctly predict. Same diagonal, same flip — your applied_to_self plus make_D is diagonal_escape aimed at programs (§36.4).

Discussion Questions

  1. Your run_tm returns "LOOP?", never "loops", when the budget is hit. Explain to a teammate why returning "loops" would be a correctness bug (not just a style choice), referencing the bounded versus unbounded distinction in §36.4 and the Find-the-Error callout.
  2. The inc machine writes to the tape; the even and div3 machines only read. Which feature of the Turing-machine definition (§36.1) does writing exercise, and why is it the feature that separates Turing machines from the DFAs of Chapter 35?
  3. Suppose you "improve" run_tm by detecting repeated configurations (same state, head position, and tape) and returning "REJ" when one recurs, reasoning "a repeat means an infinite loop." Does this make run_tm a halting decider? Explain why it catches some non-halting machines but not all, and connect to why no finite trick closes the gap (§36.4).
  4. applied_to_self(name_of) halts but applied_to_self(D) (i.e., D(D)) cannot consistently do either. What single property of D's body, absent from name_of's, creates the paradox? State it in one sentence.
  5. By the Church–Turing thesis (§36.2), your Python simulator and a Turing machine compute the same functions. Use that to argue that the "LOOP?" gap is not an artifact of Python — it would appear in a simulator written in any Turing-complete language. Which sentence of §36.2 makes this airtight?

Your Turn: Extensions

  • Option A (binary incrementer). Build a Turing machine that adds 1 to a binary number written on the tape (handle the carry, including the all-ones case 111 -> 1000). Give its transition table, hand-trace it on 11 (which is 3, $\to$ 100), and verify with run_tm's expected output.
  • Option B (a decider with a guaranteed bound). Encode a machine that always halts within a number of steps you can predict from the input length (e.g., even), and argue that for this machine run_tm will never return "LOOP?" once max_steps exceeds the input length. Contrast with spin.
  • Option C (configuration-cycle detector). Implement the "repeated configuration $\Rightarrow$ reject" idea from Discussion Question 3 as run_tm_cycle. Hand-trace it on spin (does it detect the cycle?) and on a machine that loops by growing the tape forever (does it?). Explain, from your two traces, exactly which non-halting machines this catches and which it misses.
  • Option D (toolkit increment). Package run_tm, applied_to_self, and your three machines as computability.py, matching the chapter's Project Checkpoint. Confirm your signatures match (run_tm(delta, tape, ...), applied_to_self(program)) so the module composes with the rest of dmtoolkit.
  • Option E (round-trip a machine). Using the Phase 3.5 encode_tm/decode_tm, encode the div3 machine to text, hand-write the exact lines encode_tm produces, decode them, and confirm universal_run on the decoded text gives the same ACC/REJ verdicts as Phase 2. Then explain in two sentences why running a machine from its text encoding is the construction that makes the $\langle P, x\rangle$ input of $\mathrm{HALT}$ meaningful (§36.4).

Key Takeaways

  1. A Turing machine is a dictionary plus a loop. The transition function is a (state, symbol) -> (state, write, move) table; simulation reads, writes, moves, and switches state until a halting state is reached. Building it makes the §36.1 model concrete.
  2. Verify built machines by tracing configurations, not by running them. Each machine here was checked against its spec by hand — ACC/REJ derived from the configuration sequence — exactly the book's no-execution discipline.
  3. The unbounded tape is non-negotiable; the step budget is unavoidable. Lazy blank-extension implements the one infinite ingredient; a finite computer forces a max_steps bound on the otherwise-infinite loop.
  4. "LOOP?" is the halting problem, sitting in your code. An honest finite simulator cannot convert "did not halt within the budget" into "loops forever" — that conversion is the undecidable problem itself, and no finite trick (not even cycle detection) closes the gap.
  5. Self-application is harmless until paired with self-negation. applied_to_self(name_of) halts; the proof's D run on D cannot — because D reads a prediction about itself and defies it. That is Cantor's diagonal flip (Chapter 10) compiled into a program, and the reason a halting decider — the thing that would fix your simulator's gap — cannot exist.