Chapter 35: Automata and Formal Languages

DataField.Dev

44 min read

> "A model of computation is not a computer. It is an argument about what computers can do."

Prerequisites

8
12
27

Learning Objectives

Model a recognition problem as a formal language over an alphabet, and reason about languages as sets of strings.
Design a deterministic finite automaton (DFA) for a given language and trace its computation on an input string.
Build a nondeterministic finite automaton (NFA), and convert an NFA to an equivalent DFA by the subset construction.
Translate between regular expressions and finite automata, and explain why they recognize exactly the same class of languages.
Apply the pumping lemma to prove that a specific language is not regular.
Place a language in the Chomsky hierarchy and match each level to the machine that recognizes it.

In This Chapter

Overview
Learning Paths
35.1 Languages and the recognition problem
35.2 Deterministic finite automata
35.3 Nondeterministic finite automata and equivalence
35.4 Regular expressions and finite automata
35.5 The pumping lemma: the limits of regularity
35.6 Beyond regular: the Chomsky hierarchy
Project Checkpoint: a DFA simulator for the Toolkit
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 35: Automata and Formal Languages

"A model of computation is not a computer. It is an argument about what computers can do." — paraphrasing the spirit of Sipser's Introduction to the Theory of Computation

Overview

Every time you write if re.match(pattern, s), validate an email field, or watch a compiler reject your code with a syntax error, a tiny machine is running inside the machine. It reads your string one character at a time, keeps a little bit of state, and at the end says yes, this string is in the language I recognize or no, it is not. That machine is a finite automaton, and this chapter is about what such machines can and — crucially — cannot do.

This is your first real taste of the theory of computation, the branch of discrete math that asks not "how do I compute this?" but "can this be computed, and at what cost?" We begin at the bottom of the ladder, with the simplest useful machine there is: one with a fixed, finite memory. You will be surprised how much it can recognize (every pattern your regex library accepts) and, by the end, exactly where it hits a wall (it cannot count without bound — it cannot even check balanced parentheses). That wall is not an engineering limitation we will fix in the next release. It is a mathematical theorem. Finding walls like this — and proving they are real — is what the rest of Part VI is about.

Why should a working programmer care? Three reasons, concrete and immediate:

Regular expressions are finite automata in disguise. Understanding the machine explains why some patterns are fast and others (catastrophic backtracking) blow up, and why a "regex" that matches nested brackets is impossible in the strict sense.
Lexers and protocol parsers are state machines. The first phase of every compiler, every network protocol implementation, and every input validator is a finite automaton you could draw on a napkin.
This is the on-ramp to the deepest results in CS. The pumping-lemma argument you will learn here is the same shape of argument that proves the halting problem undecidable (Chapter 36) and that separates easy problems from hard ones (Chapter 37). Learn the move here, where it is concrete.

In this chapter you will learn to:

Speak precisely about strings, alphabets, and languages, and recast familiar questions ("is this a valid identifier?") as the language-recognition problem.
Design and trace a deterministic finite automaton for a target language.
Build a nondeterministic finite automaton, and prove the surprising fact that nondeterminism adds no power by converting any NFA into a DFA.
Move freely between regular expressions and automata, knowing they describe the very same class.
Prove a language is beyond finite automata using the pumping lemma.
Locate any language in the Chomsky hierarchy and name the machine that recognizes its level.

Learning Paths

🏃 Fast Track: If you have built state machines before, skim 35.1–35.2, then concentrate on 35.3 (the subset construction is the conceptual core), 35.5 (the pumping lemma — the one proof technique you must own), and the hierarchy table in 35.6. Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Draw every automaton yourself before looking at ours, and trace at least one accepted and one rejected string per machine. Work every 🔄 Check Your Understanding.

🔬 Deep Dive: After the chapter, prove the Myhill–Nerode theorem (a language is regular iff it has finitely many distinguishable prefixes) and use it as a second, sharper tool for proving non-regularity. Pointers in Further Reading; exercises Part E.

35.1 Languages and the recognition problem

Start with a question you have answered a hundred times in code: is this string a valid input? Is "user@host.com" a well-formed email? Is "0b1010" a binary literal? Is "((a+b))" a balanced expression? In each case you are sorting strings into two bins — yes, accept and no, reject — and the set of all "yes" strings is the thing you are really reasoning about. Formal-language theory gives that set a name and studies it as a mathematical object.

We need three definitions, building on the set theory of Chapter 8. They are simple, but precision here pays off for the whole of Part VI.

An alphabet, written $\Sigma$, is any finite, non-empty set of symbols. For binary strings $\Sigma = \{0, 1\}$; for DNA $\Sigma = \{A, C, G, T\}$; for text it might be the 128 ASCII characters. A string (or word) over $\Sigma$ is a finite sequence of symbols from $\Sigma$ — for example 0110 over $\{0,1\}$. The empty string, written $\varepsilon$, is the string of length zero (the analogue of "" in code). The length of a string $w$ is written $\lvert w\rvert$; so $\lvert 0110\rvert = 4$ and $\lvert\varepsilon\rvert = 0$.

We write $\Sigma^{*}$ for the set of all strings over $\Sigma$, including $\varepsilon$. (The notation is deliberate: the star is the same "zero or more" you know from regex, and it is the same operation we will define formally in 35.4.) Even for a two-symbol alphabet, $\Sigma^{*}$ is infinite: $\{0,1\}^{*} = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, \dots\}$.

💡 Intuition: $\Sigma^{*}$ is the universe of every possible input. A program that reads strings is, mathematically, a way of carving $\Sigma^{*}$ into the inputs it accepts and the inputs it rejects. Everything in this chapter is about which carvings are possible with which machines.

Now the central definition.

Definition (language). A language $L$ over an alphabet $\Sigma$ is any subset of $\Sigma^{*}$, i.e. $L \subseteq \Sigma^{*}$. The recognition problem for $L$ is: given a string $w \in \Sigma^{*}$, decide whether $w \in L$.

A language is just a set of strings — nothing more exotic than that. The set of valid email addresses is a language. The set of syntactically correct Python programs is a language. The set of binary strings with an even number of 1s is a language. Studying "what computers can recognize" is studying which languages a given machine can decide membership in.

🔗 Connection. A language is a set (Chapter 8) of strings, and a string is a sequence. The recognition problem is exactly asking whether the set $L$ contains a given element — the membership question $w \in L$ — but asked uniformly, by a single finite procedure that must work for every one of the infinitely many possible $w$. That shift from "is this one element in the set?" to "give me a machine that answers for all elements" is the leap from set theory to computation.

Worked example (describing a language three ways). Consider $\Sigma = \{0,1\}$ and the language

$$L_{\text{even}} = \{\, w \in \{0,1\}^{*} \mid w \text{ contains an even number of } 1\text{s} \,\}.$$

Three legitimate strings in $L_{\text{even}}$: $\varepsilon$ (zero 1s, and zero is even), $0$, and $1010$. Three strings not in it: $1$, $111$, $0001$. We can describe $L_{\text{even}}$ in plain English ("even number of 1s"), in set-builder notation (above), or — as we will spend the rest of the chapter doing — by building a machine that recognizes it. The machine is the constructive answer: it does not merely describe the set, it decides membership mechanically.

Here is a first, brute-force recognizer in Python, to anchor the idea before we abstract it:

def in_L_even(w):
    """True iff w (a string over {'0','1'}) has an even number of '1's."""
    count = 0
    for ch in w:
        if ch == '1':
            count += 1
    return count % 2 == 0

print(in_L_even(""), in_L_even("1010"), in_L_even("111"))
# Expected output:
# True True False

Notice what this function really needs to remember as it scans: not the whole count, but only its parity — whether the number of 1s so far is even or odd. That single bit of memory is the seed of the finite automaton. We are about to make "a machine with a fixed, finite amount of memory" a precise mathematical object.

🔄 Check Your Understanding 1. Is the empty language $\emptyset$ a language? Is the language $\{\varepsilon\}$ the same as $\emptyset$? 2. How many strings of length exactly 3 are there over $\Sigma = \{a, b\}$? How many of length $\le 3$ (counting $\varepsilon$)? 3. Why does in_L_even only need one bit of memory, not a full integer counter?

Answers

Yes — $\emptyset \subseteq \Sigma^{*}$, so the empty set is a perfectly good language (it accepts nothing). No — $\{\varepsilon\}$ contains one string (the empty string), so $\lvert{\varepsilon}\rvert = 1$, whereas $\lvert\emptyset\rvert = 0$. They are different languages. 2. $2^3 = 8$ of length 3. Of length $\le 3$: $1 + 2 + 4 + 8 = 15$ (one of length 0, two of length 1, four of length 2, eight of length 3). 3. Membership depends only on whether the running count of 1s is even or odd; the exact count is irrelevant. Two states — "even so far" and "odd so far" — suffice.

35.2 Deterministic finite automata

The in_L_even function had exactly two meaningful situations: "an even number of 1s so far" and "an odd number so far." Reading a 0 leaves the situation unchanged; reading a 1 flips it. We start in "even" (zero 1s), and we accept if we end in "even." Draw two circles, an arrow between them for each 1, self-loops for each 0, mark the start, mark the accepting circle — and you have just drawn a deterministic finite automaton. Let us define it precisely, then read that drawing back off the definition.

Definition (DFA). A deterministic finite automaton is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ where - $Q$ is a finite set of states, - $\Sigma$ is the input alphabet, - $\delta \colon Q \times \Sigma \to Q$ is the transition function (from a state and an input symbol it gives the unique next state), - $q_0 \in Q$ is the start state, and - $F \subseteq Q$ is the set of accepting (or final) states.

The word deterministic lives entirely in $\delta$: from any state, each input symbol leads to exactly one next state — no choice, no ambiguity. The word finite lives in $Q$: there is a fixed number of states, decided in advance, no matter how long the input. That finiteness is the whole story of this chapter's power and its limits.

How a DFA computes. Place the machine in $q_0$. Read the input string left to right; on each symbol $a$, move from the current state $q$ to $\delta(q, a)$. After the last symbol, look at where you landed: if it is in $F$, the machine accepts; otherwise it rejects. The language recognized by $M$, written $L(M)$, is the set of all strings it accepts.

💡 Intuition: A DFA is a program with one local variable (the current state) that can hold only finitely many values, a single for loop over the input, and no other memory — no stack, no counter, no array that grows with the input. The state is everything the machine remembers about the prefix it has read so far. Designing a DFA is the art of choosing what is worth remembering.

The parity machine, formally. Our $L_{\text{even}}$ recognizer is $M = (Q, \Sigma, \delta, q_0, F)$ with $Q = \{E, O\}$ (Even, Odd), $\Sigma = \{0,1\}$, start state $q_0 = E$, accepting set $F = \{E\}$, and transition function:

$\delta$	on `0`	on `1`
$\to E$ (start, accept)	$E$	$O$
$O$	$O$	$E$

(The little arrow marks the start state; a double-circle or "accept" tag marks accepting states — here, $E$.) Reading 0 never changes the parity, so both states self-loop on 0. Reading 1 flips parity, so the two states swap on 1.

Tracing a computation. Let us run $M$ on 1011, the move that turns an abstract definition into something you can check by hand. We track the state after each symbol:

$$E \xrightarrow{\,1\,} O \xrightarrow{\,0\,} O \xrightarrow{\,1\,} E \xrightarrow{\,1\,} O.$$

The machine ends in $O$, which is not accepting, so 1011 is rejected — correct, since 1011 has three 1s (odd). Run it on 1010 instead and you end in $E$ (accept), correct for two 1s. A trace like this is the DFA analogue of stepping through code in a debugger, and you should be able to produce one for any machine on any input.

Here is the same DFA as a data structure plus a universal driver — the pattern we will reuse and, in the Project Checkpoint, package into the Toolkit:

# DFA for "even number of 1s". States: 'E' (start, accept), 'O'.
delta = {('E', '0'): 'E', ('E', '1'): 'O',
         ('O', '0'): 'O', ('O', '1'): 'E'}
start, accept = 'E', {'E'}

def run_dfa(delta, start, accept, w):
    state = start
    for ch in w:
        state = delta[(state, ch)]   # the unique next state
    return state in accept

print(run_dfa(delta, start, accept, "1010"))
print(run_dfa(delta, start, accept, "1011"))
# Expected output:
# True
# False

The driver run_dfa is the definition of acceptance, transcribed: start in start, apply $\delta$ once per symbol, test membership in accept at the end. Every DFA in this chapter runs on this same five-line engine; only the data (delta, start, accept) changes. That is the payoff of the formal 5-tuple — one universal interpreter for an infinite family of machines.

A second design, to build the skill. Languages over $\{0,1\}$ that "end in 01" are recognized by a DFA that remembers how much of the pattern 01 it has just seen. Three states do it: $S$ (start / "nothing useful yet"), $A$ ("just saw a 0, a possible start"), $B$ ("just saw 01, accept"). The design question is always the same: what minimal fact about the prefix must I carry forward to decide correctly? Here it is "the longest suffix of what I have read that is a prefix of 01."

$\delta$	on `0`	on `1`
$\to S$	$A$	$S$
$A$	$A$	$B$
$B$ (accept)	$A$	$S$

Reading 0 always lands in $A$ (a fresh possible start of 01); reading 1 accepts only if we were in $A$ (we had just seen a 0). Trace 1101: $S \xrightarrow{1} S \xrightarrow{1} S \xrightarrow{0} A \xrightarrow{1} B$ — accept, since `1101` does end in `01`. Trace `010`: $S \xrightarrow{0} A \xrightarrow{1} B \xrightarrow{0} A$ — reject, correct.

⚠️ Common Pitfall: a DFA's $\delta$ must be total. Every state needs a transition for every symbol in $\Sigma$ — that is what "deterministic" and "function" demand. A frequent bug is to draw only the "interesting" arrows and forget what happens on the other symbols. A clean fix for languages defined by a forbidden pattern is an explicit dead state (a "trap"): a non-accepting state that loops to itself on every symbol, which you route to the moment the input becomes hopeless. If your diagram has a state missing an outgoing arrow for some symbol, it is not yet a DFA.

🔄 Check Your Understanding 1. In the 5-tuple, which component changes if you switch a language to its complement (accept exactly the strings the original rejected), assuming the same states and transitions? 2. Trace the "ends in 01" DFA on the input 00 and on $\varepsilon$. Does it accept either? 3. Why can a DFA never need more than $\lvert Q\rvert$ distinct "memories," no matter how long the input?

Answers

Only $F$: swap accepting and non-accepting states, i.e. replace $F$ by $Q \setminus F$. (This is a real construction — finite automata are closed under complement.) 2. On 00: $S \to A \to A$, ends in $A$ — reject. On $\varepsilon$: no symbols are read, the machine stays in the start state $S$, which is non-accepting — reject. Neither ends in 01, both correct. 3. The only thing a DFA remembers is its current state, and there are exactly $\lvert Q\rvert$ of them. However long the input, after each symbol the machine is in one of finitely many states — it cannot accumulate unbounded information.

35.3 Nondeterministic finite automata and equivalence

Sometimes the most natural machine for a pattern wants to "guess." Consider strings over $\{0,1\}$ whose third-from-last symbol is a 1 — i.e. the language $\{w : w$ has a 1 in the third position from the end$\}$. A DFA for this is fiddly: it must remember the last three symbols, needing eight states. But there is a slick description: guess the position that is third-from-last, verify it is a 1, then check that exactly two symbols follow. A machine that may "explore several possibilities at once" captures this directly. That machine is a nondeterministic finite automaton.

Definition (NFA). A nondeterministic finite automaton is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ exactly like a DFA, except the transition function returns a set of states: $$\delta \colon Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q).$$ Two relaxations: (1) from a state on a symbol there may be zero, one, or many next states; (2) an $\varepsilon$-transition lets the machine change state without reading any input.

(The codomain $\mathcal{P}(Q)$ is the power set of $Q$ from Chapter 8 — the set of all subsets — because $\delta$ now hands back a set of possible next states.)

What "acceptance" means for an NFA. An NFA accepts a string $w$ if there exists at least one path of choices, from $q_0$, consuming $w$ (with $\varepsilon$-moves allowed at any time), that ends in an accepting state. Rejection means every possible path fails. Think of it as the machine cloning itself at each choice point and exploring all branches in parallel; it accepts if any clone ends up happy.

💡 Intuition: Nondeterminism is "lucky guessing made rigorous." The machine is permitted to pick the right branch whenever a right branch exists — equivalently, to pursue all branches at once and accept if any succeeds. It is a thinking aid for the designer, not a physical device: real hardware is deterministic. The astonishing payoff, below, is that this guessing power can always be compiled away.

An NFA for "third-from-last symbol is 1." Four states in a line:

$\delta$ (NFA)	on `0`	on `1`
$\to q_0$	$\{q_0\}$	$\{q_0, q_1\}$
$q_1$	$\{q_2\}$	$\{q_2\}$
$q_2$	$\{q_3\}$	$\{q_3\}$
$q_3$ (accept)	$\emptyset$	$\emptyset$

State $q_0$ loops on everything (skipping symbols it does not care about), but on a 1 it also offers a branch to $q_1$ — this is the "guess that this 1 is third-from-last." From $q_1$, any two symbols walk $q_1 \to q_2 \to q_3$. If the guessed 1 really was third-from-last, the machine lands in the accepting $q_3$ exactly at the end of input. Note $q_0$ on 1 returns two states and $q_3$ returns the empty set on everything — both impossible for a DFA, both fine for an NFA.

Trace 1101 (third-from-last symbol is the 1 at index 1, so it should accept). One accepting branch: $q_0 \xrightarrow{1} q_0 \xrightarrow{1} q_1 \xrightarrow{0} q_2 \xrightarrow{1} q_3$. Because one accepting branch exists, the NFA accepts — even though other branches (e.g. staying in $q_0$ the whole time) do not end in $q_3$.

Why nondeterminism adds no power: the subset construction

Here is one of the most important facts in this chapter, and the first genuinely surprising theorem in the book's theory-of-computation arc.

Theorem (subset construction). For every NFA $N$ there is a DFA $M$ with $L(M) = L(N)$. That is, NFAs and DFAs recognize exactly the same class of languages.

🚪 Threshold Concept. Nondeterminism — the power to guess — does not let finite automata recognize a single new language. This is your first encounter with a theme that runs to the top of the field: a more expressive way to describe a computation need not be a more powerful computation. The same question explodes into the deepest open problem in computer science when "finite automaton" becomes "polynomial-time machine": that is the P-versus-NP question (Chapter 37). The proof technique that tames nondeterminism here — track the set of states you could be in — is exactly why we can reason about it at all.

The strategy first. A DFA cannot make choices, but it can bookkeep. The single idea is: run the NFA on all branches simultaneously and, at each step, record the set of states the NFA could be in. That set is a single, well-defined thing — and there are only finitely many subsets of $Q$ (there are $2^{\lvert Q\rvert}$ of them), so "the current set of possible NFA-states" can itself serve as one DFA state. The DFA accepts exactly when that set contains at least one NFA-accepting state.

Proof. Let $N = (Q, \Sigma, \delta_N, q_0, F)$ be an NFA. (For clarity assume no $\varepsilon$-transitions; the general case adds an $\varepsilon$-closure step, defined just below, in every place a set of states appears.) Construct the DFA $M = (Q', \Sigma, \delta_M, q_0', F')$ as follows:

States: $Q' = \mathcal{P}(Q)$ — every DFA state is a subset of $Q$.
Start: $q_0' = \{q_0\}$ — before reading anything, the only possible NFA-state is $q_0$.
Transitions: for a subset $S \subseteq Q$ and a symbol $a \in \Sigma$, define $$\delta_M(S, a) = \bigcup_{q \in S} \delta_N(q, a).$$ In words: from the set of states you could be in, take every state each of them could move to on $a$; their union is the new set of possible states.
Accept: $F' = \{\, S \subseteq Q \mid S \cap F \neq \emptyset \,\}$ — accept iff the set of possible NFA-states includes at least one accepting NFA-state.

It remains to argue $L(M) = L(N)$. The key claim, proved by induction on the length of the input string (the very technique of Chapter 6), is:

Lemma. After $M$ reads a string $w$, its single DFA-state equals exactly the set of NFA-states reachable in $N$ from $q_0$ by reading $w$.

Base case ($\lvert w\rvert = 0$, i.e. $w = \varepsilon$): $M$ is in $q_0' = \{q_0\}$, and the set of $N$-states reachable from $q_0$ on the empty input is $\{q_0\}$. They match.

Inductive step: assume the lemma holds after reading some string $u$, so $M$ is in the state $S = \{$NFA-states reachable from $q_0$ on $u\}$. Read one more symbol $a$. By the definition of $\delta_M$, the machine $M$ moves to $\bigcup_{q \in S}\delta_N(q, a)$ — which is precisely the set of $N$-states reachable on $ua$ (a state is reachable on $ua$ iff it is an $a$-successor of some state reachable on $u$). So the lemma holds after $ua$.

By induction, after reading any $w$ the DFA-state is exactly the set of NFA-states $N$ could be in. Therefore $M$ accepts $w$ $\iff$ that set meets $F$ $\iff$ some NFA-path on $w$ ends in $F$ $\iff$ $N$ accepts $w$. Hence $L(M) = L(N)$. $\blacksquare$

The $\varepsilon$-transition case is handled by the $\varepsilon$-closure of a set $S$: the set of all states reachable from $S$ using $\varepsilon$-moves alone (including the states of $S$ themselves). You apply the closure to the start set and after every symbol-step; the induction goes through unchanged. We will not belabor it, but it is what makes the construction fully general.

⚠️ Common Pitfall: The subset construction can produce up to $2^{\lvert Q\rvert}$ DFA states — an exponential blow-up — and for some languages this is unavoidable. So "NFAs and DFAs are equivalent in power" does not mean "equivalent in size." NFAs can be exponentially smaller. (The third-from-last example: a 4-state NFA, but the minimal DFA needs 8 states; push it to $k$-th-from-last and the gap becomes $k{+}1$ versus $2^k$.) Equivalence of recognizable languages is the theorem; economy of description is a separate, real advantage of nondeterminism.

Here is the subset construction as code, run on the third-from-last NFA. It is the proof made executable — each DFA "state" is a Python frozenset of NFA states:

# NFA: third-from-last symbol is '1'.  delta_N: (state, symbol) -> set of states.
dN = {('q0','0'): {'q0'}, ('q0','1'): {'q0','q1'},
      ('q1','0'): {'q2'}, ('q1','1'): {'q2'},
      ('q2','0'): {'q3'}, ('q2','1'): {'q3'},
      ('q3','0'): set(),  ('q3','1'): set()}
F = {'q3'}

def nfa_accepts(dN, start, F, w):
    current = {start}                       # set of states we could be in
    for ch in w:
        nxt = set()
        for q in current:
            nxt |= dN.get((q, ch), set())   # union of all reachable next states
        current = nxt
    return len(current & F) > 0             # accept iff any current state is final

print(nfa_accepts(dN, 'q0', F, "1101"))     # third-from-last is '1'
print(nfa_accepts(dN, 'q0', F, "0001"))     # third-from-last is '0'
# Expected output:
# True
# False

The variable current is literally a DFA state from the subset construction; the loop body computes $\delta_M(\text{current}, ch)$ by unioning. Determinizing on the fly like this is exactly how a regex engine can simulate an NFA without ever building the (possibly huge) DFA explicitly.

🔄 Check Your Understanding 1. An NFA has 5 states. What is the largest number of states its subset-construction DFA could have? 2. In nfa_accepts, what does current becoming the empty set mid-scan mean about the input? 3. True or false: if an NFA has an accepting and a rejecting path on the same string $w$, the NFA rejects $w$.

Answers

$2^5 = 32$ subsets of a 5-element set (one DFA state per subset; many may be unreachable in practice, but $32$ is the bound). 2. It means every branch has died — no NFA path can continue, so no path can possibly accept; the string will be rejected (the empty set contains no accepting state).

False — an NFA accepts if there exists an accepting path. One accepting path is enough; the failing paths do not matter.

35.4 Regular expressions and finite automata

You already use regular expressions. a*b means "zero or more as, then a b"; (0|1)* means "any binary string." What is the precise object behind that syntax, and what is its exact relationship to the machines of 35.2–35.3? The answer is the cleanest "two languages, one meaning" result in the chapter.

We define regular expressions inductively (a recursive definition in the sense of Chapter 7): a few base cases, plus rules that build bigger expressions from smaller ones.

Definition (regular expression). The regular expressions over an alphabet $\Sigma$, and the language $L(R)$ each one denotes, are defined recursively: - $\emptyset$ is a regex, with $L(\emptyset) = \emptyset$ (matches nothing). - $\varepsilon$ is a regex, with $L(\varepsilon) = \{\varepsilon\}$ (matches only the empty string). - for each $a \in \Sigma$, $a$ is a regex, with $L(a) = \{a\}$ (matches that one symbol).

And if $R$ and $S$ are regular expressions, so are: - $(R \mid S)$ — union/alternation: $L(R \mid S) = L(R) \cup L(S)$. - $(RS)$ — concatenation: $L(RS) = \{xy \mid x \in L(R),\ y \in L(S)\}$. - $(R^{*})$ — Kleene star: $L(R^{*}) = {x_1 x_2 \cdots x_k \mid k \ge 0,\ \text{each } x_i \in L(R)}$ — zero or more copies of strings from $L(R)$ concatenated (the $k = 0$ case yields $\varepsilon$).

That is the entire core syntax. Everything else in a practical regex library — + (one or more, equal to $RR^{*}$), ? (optional, equal to $R \mid \varepsilon$), character classes, . — is convenient shorthand built from these three operations. A language is called regular if it is $L(R)$ for some regular expression $R$.

Worked example. Over $\Sigma = \{0,1\}$, the language $L_{\text{even}}$ ("even number of 1s") is denoted by the regular expression $$R = (0^{*}\,1\,0^{*}\,1)^{*}\,0^{*}.$$ Read it aloud: any number of leading 0s; then repeated pairs of 1s (each 1 padded by arbitrary 0s), ensuring 1s come in twos; then trailing 0s. The string 1010 parses as one pair ($0^{*}=\varepsilon$, $1$, $0^{*}=0$, $1$) followed by $0^{*}=0$. The string 1 cannot be parsed (it has an odd, unmatched 1), so it is correctly excluded. This is the same language the parity DFA of 35.2 recognizes — our first concrete instance of the equivalence we are about to prove in general.

Now the headline theorem, due to Stephen Kleene.

Theorem (Kleene's theorem). A language is regular (denoted by some regular expression) if and only if it is recognized by some finite automaton.

🔗 Connection. This is why your programming language's regex library can be implemented with a finite automaton, and why anything you can build a DFA for, you can also write as a regex. The two formalisms — one declarative (a pattern), one operational (a machine) — describe exactly the same class of languages. When a tool advertises "regular expressions" but also supports backreferences like \1, it has quietly left this class: backreferences can match non-regular languages and are not part of the theory here.

The strategy first. "If and only if" splits into two constructions, each fully algorithmic. (⇐ regex from automaton): given a DFA, rip out states one at a time, relabeling the remaining edges with regular expressions that summarize all the paths through the deleted state, until two states remain and the surviving edge label is the answer (the state-elimination method). (⇒ automaton from regex): recurse on the structure of the regex, gluing together small NFAs with $\varepsilon$-transitions — one tiny gadget per operator (Thompson's construction). We will detail the second direction, because it is the one that explains how a regex compiles into a runnable matcher, and because it puts the $\varepsilon$-transitions of 35.3 to work.

Proof sketch (⇒: every regex has an NFA — Thompson's construction). We build, by recursion on the regex, an NFA with one start state and one accepting state denoting the same language. The base cases are immediate:

$\emptyset$: a start state and a separate accept state, with no transitions (nothing reaches accept).
$\varepsilon$: a start state with an $\varepsilon$-transition straight to the accept state.
$a$: a start state with an $a$-transition to the accept state.

For the inductive step, assume NFAs $N_R$ and $N_S$ have already been built for sub-expressions $R$ and $S$. Combine them with $\varepsilon$-transitions, one gadget per operator:

Union $R \mid S$: a new start state with $\varepsilon$-moves into the starts of both $N_R$ and $N_S$; their two accept states get $\varepsilon$-moves into a new shared accept state. ("Try either branch.")
Concatenation $RS$: wire $N_R$'s accept state to $N_S$'s start state with an $\varepsilon$-move; the combined start is $N_R$'s start, the combined accept is $N_S$'s. ("Finish $R$, then begin $S$.")
Star $R^{*}$: a new start state that $\varepsilon$-moves both to $N_R$'s start (do a copy) and straight to a new accept (do zero copies); add an $\varepsilon$-move from $N_R$'s accept back to its start (loop for more copies). ("Zero or more copies of $R$.")

Each gadget visibly preserves "the NFA accepts exactly $L(\text{sub-expression})$," and the recursion bottoms out at the base cases — a proof by structural induction. The result is an NFA for the whole regex, which the subset construction of 35.3 then turns into a DFA if you want one. Together with the state-elimination direction (which we omit), this proves Kleene's theorem. $\blacksquare$

💡 Intuition: Thompson's construction is the literal pipeline inside many regex engines: regex → NFA → simulate (or determinize). The $\varepsilon$-transitions are the "free" wiring that stitches the per-operator gadgets together. Now you can see why a regex compiles to a machine — the syntax tree of the pattern becomes the wiring diagram of the automaton, operator by operator.

A small, runnable taste — not Thompson's construction in full, but the parity regex's meaning checked against its DFA, illustrating Theme 4 (computation tests what a proof guarantees):

# Both recognize "even number of 1s": the DFA from 35.2 and Python's regex for R.
import re
delta = {('E','0'):'E', ('E','1'):'O', ('O','0'):'O', ('O','1'):'E'}
def dfa_even(w):
    s = 'E'
    for ch in w:
        s = delta[(s, ch)]
    return s == 'E'
pattern = re.compile(r'^(0*10*1)*0*$')        # R = (0*10*1)*0*
tests = ["", "0", "11", "1010", "1", "111"]
print([dfa_even(t) == bool(pattern.match(t)) for t in tests])
# Expected output:
# [True, True, True, True, True, True]

Every entry is True: on each test string the DFA and the regular expression agree, exactly as Kleene's theorem promises. (We are testing the equivalence on six strings, not proving it — the proof above is what settles all infinitely many strings.)

🔄 Check Your Understanding 1. Write a regular expression for "binary strings that start with 1 and end with 0." 2. Which regex operator do the $\varepsilon$-transitions implement most directly in Thompson's construction, and why are they needed at all? 3. Is the language "strings of the form $0^n 1^n$ (equal numbers of 0s then 1s), $n \ge 0$" regular? (Hold this thought — 35.5 answers it.)

Answers

$1\,(0 \mid 1)^{*}\,0$ — a leading 1, any middle, a trailing 0. (Edge note: this requires length $\ge 2$; the single string 10 is included, 1 and 0 are not.) 2. They implement choice and looping — union ("go either way for free") and star ("loop back for free") — and concatenation's "glue." They are needed so each operator's gadget has a clean single start and single accept to wire to the next, without consuming input. 3. No — and proving it is the job of the pumping lemma in the next section. A finite automaton cannot remember an unbounded $n$.

35.5 The pumping lemma: the limits of regularity

We have spent four sections showing how much finite automata can do. Now the wall. Some perfectly simple languages — ones you can describe in a breath — are not regular: no DFA, no NFA, and no regular expression recognizes them. The canonical example is

$$L = \{\, 0^{n} 1^{n} \mid n \ge 0 \,\} = \{\varepsilon, 01, 0011, 000111, \dots\},$$

"some 0s followed by equally many 1s." Intuitively, a finite automaton fails here because it would have to count the 0s to check the 1s match — and with only finitely many states it cannot remember an arbitrarily large count. Checking balanced parentheses is the same problem in disguise, which is why no regex can validate nested brackets. To turn "intuitively it can't count" into a theorem, we need the pumping lemma.

The Pumping Lemma for regular languages. If $L$ is regular, then there is a positive integer $p$ (a pumping length) such that every string $s \in L$ with $\lvert s\rvert \ge p$ can be split as $s = xyz$ satisfying all three conditions: 1. $\lvert y\rvert > 0$ (the middle piece is non-empty), 2. $\lvert xy\rvert \le p$ (the first two pieces fit within the first $p$ symbols), and 3. for every $i \ge 0$, the "pumped" string $x y^{i} z \in L$ (repeating $y$ any number of times — or deleting it, $i = 0$ — stays in the language).

💡 Intuition (why it must be true). Take a DFA for $L$ with $p$ states. Feed it any accepted string $s$ of length $\ge p$. As it reads the first $p$ symbols it visits $p + 1$ states (counting the start), so by the pigeonhole principle (Chapter 17) some state repeats. The chunk of input read between the two visits to that repeated state is a loop in the machine — call it $y$. The machine cannot tell how many times that loop ran, so you may run it zero times, once, twice, … and the machine still ends in the same accepting state. That is conditions (1)–(3) exactly: $y$ is the loop (non-empty), it occurs within the first $p$ symbols (the pigeonhole happened early), and pumping it keeps you in $L$.

So the pumping lemma is just the pigeonhole principle wearing a finite-automaton costume: a long enough accepted string must drive the machine through a repeated state, and the loop between the repeats can be pumped. That single observation is the entire content. The lemma's use, though, is the reverse — as a tool to prove a language is NOT regular — and using it correctly demands care.

The strategy first (proving non-regularity). The lemma is an if regular, then … statement, so we use it by contradiction (Chapter 5). Assume $L$ is regular; then it has some pumping length $p$. We do not know $p$, so we must handle every possible $p$ at once — we choose a specific string $s \in L$ that *depends on $p$* and is at least $p$ long. The lemma promises some valid split $s = xyz$; we get to use conditions (1) and (2) to pin down what $y$ must look like, then exhibit one value of $i$ for which $xy^iz \notin L$, contradicting condition (3). The contradiction means our assumption was false: $L$ is not regular. The crucial discipline: the adversary picks $p$ and the split $x,y,z$; you pick $s$ and the pumping exponent $i$. Win for every legal split and you win.

Theorem. $L = \{0^{n}1^{n} \mid n \ge 0\}$ is not regular.

Proof. Suppose, for contradiction, that $L$ is regular. By the pumping lemma it has a pumping length $p$. Choose the string $$s = 0^{p}1^{p},$$ which is in $L$ and has length $2p \ge p$, so the lemma applies. The lemma gives a split $s = xyz$ with $\lvert y\rvert > 0$ and $\lvert xy\rvert \le p$. Because $\lvert xy\rvert \le p$ and the first $p$ symbols of $s$ are all 0s, both $x$ and $y$ lie entirely within that block of 0s — so $y$ consists of only 0s, say $y = 0^{k}$ with $k \ge 1$ (using $\lvert y\rvert > 0$).

Now pump with $i = 2$: the string $xy^{2}z$ has $k$ extra 0s but the same number of 1s, namely $$xy^{2}z = 0^{p+k}\,1^{p}.$$ Since $k \ge 1$, the count of 0s ($p + k$) exceeds the count of 1s ($p$), so $xy^{2}z \notin L$. But condition (3) of the lemma says $xy^{2}z$ must be in $L$. Contradiction. Therefore $L$ is not regular. $\blacksquare$

⚠️ Common Pitfall: who chooses what. The single most common mistake is letting yourself choose the split $x, y, z$ to make pumping fail. You may not. The lemma only promises that some valid split exists; to derive a contradiction you must defeat every split the adversary could hand you. That is why condition (2), $\lvert xy\rvert \le p$, is gold — in the $0^p1^p$ proof it forces $y$ to be all 0s, eliminating every other possibility at once. Skipping condition (2) and merely asserting "$y$ is some 0s" is the error that sinks most attempts. What you do get to choose freely: the string $s$ (choose it adversarially!) and the exponent $i$.

🐛 Find the Error. A student "proves" that $L_{\text{even}}$ (even number of 1s) is not regular: "Let $p$ be the pumping length, take $s = 1^{2p}$, split it as $x=\varepsilon$, $y = 1$, $z = 1^{2p-1}$; then $xy^2z = 1^{2p+1}$ has an odd number of 1s, so it is not in $L_{\text{even}}$ — contradiction." But $L_{\text{even}}$ is regular (we built its DFA in 35.2!). Where is the error?

Answer

The student chose the split. The pumping lemma does not let you pick $x, y, z$; it says for some valid split, pumping stays in the language. For a regular language the lemma is satisfied — there is a good split (here, $y = 11$, two 1s, whose repetition preserves parity) — and a single bad-looking split proves nothing. The lemma can only be used to show a language is not regular (by defeating all splits of a well-chosen $s$); it can never show a language is not regular by exhibiting one split. The student also implicitly violated the "you defeat every split" discipline. (Subtler still: the lemma is a necessary condition for regularity, not a sufficient one — passing it does not prove regularity either.)

The pumping lemma extends to other non-regular languages with the same recipe: pick $s$ depending on $p$, use condition (2) to corner $y$, pump to break a counting or matching invariant. A quick gallery of non-regular languages (provable this way): $\{ww \mid w \in \{0,1\}^{*}\}$ (a string repeated), $\{0^{n^2} \mid n \ge 0\}$ (perfect-square lengths), and the language of balanced parentheses. All of them require unbounded memory; all of them lie beyond the finite automaton.

# A finite automaton CAN'T recognize 0^n 1^n, but a machine WITH a counter (unbounded memory) can.
# This counter is exactly the "extra power" the pumping lemma proves a DFA lacks.
def is_0n1n(w):
    i = 0
    while i < len(w) and w[i] == '0':   # count leading 0s
        i += 1
    zeros = i
    ones = len(w) - i
    rest_all_ones = all(c == '1' for c in w[i:])
    return rest_all_ones and zeros == ones

print(is_0n1n("000111"), is_0n1n("0011"), is_0n1n("0101"), is_0n1n(""))
# Expected output:
# True True False True

The variable zeros is an unbounded counter — it can grow as large as the input demands. That is precisely the resource a finite automaton does not have, and precisely what the pumping lemma proves it needs. Hand-trace each call to be sure: "000111" has zeros = 3 and w[3:] = "111" (all 1s, ones = 3), so $3 = 3$ → True; "0011" has zeros = 2, w[2:] = "11", $2 = 2$ → True; "0101" has zeros = 1, but w[1:] = "101" is not all 1s → False; "" has zeros = 0, ones = 0, and the empty suffix is vacuously all 1s → True (the $n = 0$ case). Always derive the output by hand — the discipline this whole chapter teaches.

🔄 Check Your Understanding 1. In a pumping-lemma proof, who chooses $p$, who chooses $s$, who chooses the split $x,y,z$, and who chooses $i$? 2. Why is condition $\lvert xy\rvert \le p$ the workhorse when proving $0^n1^n$ non-regular? 3. The language "balanced parentheses" is not regular. Sketch the string $s$ (in terms of $p$) you would pump.

Answers

The adversary (the lemma) chooses $p$ and the split $x,y,z$; you choose $s$ (depending on $p$) and the exponent $i$. 2. Combined with "$s$ starts with $p$ zeros," it forces $y$ to sit entirely inside the 0-block, so $y$ is all 0s — pumping then changes the 0-count without touching the 1-count, breaking the match. 3. Take $s = (^{p}\,)^{p}$ (i.e. ($^p$ followed by )$^p$). Condition (2) forces $y$ to be all (, so $xy^2z$ has more ( than ) and is unbalanced — not in the language.

35.6 Beyond regular: the Chomsky hierarchy

If finite automata cannot recognize $0^n1^n$ or balanced parentheses, what can? The answer is a ladder of ever-more-powerful language classes, each defined by a kind of grammar and matched to a kind of machine. This ladder is the Chomsky hierarchy, named for the linguist Noam Chomsky, and it is the map for the rest of Part VI.

The organizing object is a grammar: a finite set of production rules for generating strings. (Where an automaton recognizes strings, a grammar generates them — two views of the same language, echoing the regex-vs-machine duality of 35.4.) Restricting the form the rules may take gives four nested classes, from most restricted (and least powerful) to least restricted (and most powerful).

Definition (Chomsky hierarchy). Four nested classes of formal languages, each with a grammar restriction and a recognizing machine:

Type	Class	Recognizing machine	Example language	Production-rule form
3	Regular	Finite automaton (DFA/NFA)	$0^{}1^{}$; even # of 1s	$A \to aB$ or $A \to a$ (right-linear)
2	Context-free	Pushdown automaton (a DFA + a stack)	$0^n1^n$; balanced parentheses	$A \to \gamma$ (single nonterminal on the left)
1	Context-sensitive	Linear-bounded automaton	$0^n1^n2^n$; $ww$	$\alpha A \beta \to \alpha \gamma \beta$ (non-shrinking)
0	Recursively enumerable	Turing machine (Chapter 36)	every computable-to-recognize language	$\alpha \to \beta$ (unrestricted)

The containment is strict at every level: $$\text{Regular} \subsetneq \text{Context-free} \subsetneq \text{Context-sensitive} \subsetneq \text{Recursively enumerable}.$$ Each "$\subsetneq$" is proper — there is a language at each level that no machine of the level below can recognize. The pumping lemma you just learned is exactly the tool that proves the first strict containment: $0^n1^n$ is context-free but not regular, witnessing $\text{Regular} \subsetneq \text{Context-free}$. (A more elaborate pumping lemma, for context-free languages, separates the next level: $0^n1^n2^n$ is context-sensitive but not context-free.)

💡 Intuition: each rung adds a kind of memory. A finite automaton has no working memory beyond its state. Add a stack (last-in-first-out) and you get a pushdown automaton, which can match nested structure — exactly enough to count parentheses and recognize $0^n1^n$: push on each 0, pop on each 1, accept iff the stack empties exactly. Add a full read-write tape and you get a Turing machine, the most powerful model of all (Chapter 36). The hierarchy is, at heart, a hierarchy of how much and what kind of memory the machine may use.

🔗 Connection. This ladder is the architecture of a compiler. The lexer (tokenizer) uses regular languages — finite automata — to chop source into tokens like identifiers and numbers. The parser uses context-free grammars — pushdown automata — to recognize nested syntax like balanced braces, if/else nesting, and arithmetic precedence. Programming-language syntax is deliberately designed to be context-free precisely so that efficient pushdown-style parsers exist. The reason your regex can tokenize but cannot parse nested code is the reason regular $\subsetneq$ context-free.

🚪 Threshold Concept. The top of the hierarchy, the recursively enumerable languages, is the realm of the Turing machine — and it is the largest class of languages any machine can recognize at all. Beyond it lie languages no machine can decide: the wall we hit with $0^n1^n$ was the first of a series of walls, and the last one — undecidability — is absolute, no amount of memory or cleverness crosses it. Chapter 36 climbs to that final rung and proves, with a diagonalization argument descended from Cantor (Chapter 10), that some problems cannot be solved by any program. This chapter has been the first step up that ladder.

A compact recognizer for the canonical context-free language, to make "a DFA plus a stack" concrete — the stack is the unbounded memory the pumping lemma proved a DFA lacks:

# Pushdown-style recognizer for balanced parentheses (a context-free, non-regular language).
def balanced(s):
    stack = []                      # the extra memory a DFA does not have
    for ch in s:
        if ch == '(':
            stack.append('(')       # push on open
        elif ch == ')':
            if not stack:
                return False        # close with nothing to match -> reject
            stack.pop()             # pop on close
    return len(stack) == 0          # accept iff stack ends empty

print(balanced("(())"), balanced("(()"), balanced(")("), balanced(""))
# Expected output:
# True False False True

The list stack is exactly the pushdown automaton's stack. "(()" ends with one unmatched ( on the stack → reject; ")(" tries to pop an empty stack → reject; "(())" and "" end empty → accept. With that single LIFO structure we have climbed from Type 3 to Type 2 and recognized a language no finite automaton ever could.

🔄 Check Your Understanding 1. Name the machine that recognizes each class: regular, context-free, recursively enumerable. 2. Which level of the hierarchy is the language of balanced parentheses, and what memory structure does its machine add to a DFA? 3. In a compiler, which phase is "regular" and which is "context-free," and why can't the regular phase do the context-free phase's job?

Answers

Regular ↔ finite automaton (DFA/NFA); context-free ↔ pushdown automaton (DFA + stack); recursively enumerable ↔ Turing machine. 2. Context-free (Type 2); its pushdown automaton adds a stack (LIFO memory) to a finite automaton. 3. Lexing/tokenizing is regular; parsing is context-free. The regular (lexer) phase cannot parse because nested structure (balanced braces, arbitrarily deep nesting) is not a regular language — recognizing it needs the parser's stack, which a finite automaton lacks.

Project Checkpoint: a DFA simulator for the Toolkit

The recurring promise of this book's project is a single composable Python package, dmtoolkit. This chapter contributes the engine of the whole theory-of-computation arc: a tiny, reusable finite automaton simulator. It captures Theme 3 (abstraction is the master tool): one interpreter runs every DFA — the parity machine, the "ends in 01" machine, and any DFA you design in the exercises — because the formal 5-tuple of 35.2 is exactly a data structure.

Create dmtoolkit/automata.py and add a DFA class with two methods: accepts(w) runs one string, and recognize(strings) filters an iterable down to the accepted ones.

class DFA:
    """A deterministic finite automaton M = (Q, Sigma, delta, q0, F).
    delta is a dict mapping (state, symbol) -> state (must be total over Sigma)."""
    def __init__(self, states, alphabet, delta, start, accept):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accept = set(accept)

    def accepts(self, w):
        state = self.start
        for ch in w:                       # one transition per input symbol
            state = self.delta[(state, ch)]
        return state in self.accept        # accept iff we land in F

    def recognize(self, strings):
        """Return the sublist of `strings` that this DFA accepts."""
        return [w for w in strings if self.accepts(w)]


# Build the "even number of 1s" DFA from 35.2 and exercise it.
even_ones = DFA(states={'E', 'O'}, alphabet={'0', '1'},
                delta={('E','0'):'E', ('E','1'):'O', ('O','0'):'O', ('O','1'):'E'},
                start='E', accept={'E'})
print(even_ones.accepts("1010"), even_ones.accepts("111"))
print(even_ones.recognize(["", "1", "11", "101", "1100"]))
# Expected output:
# True False
# ['', '11', '101', '1100']

Hand-trace the recognize call to confirm the list. Run each string through the parity machine ($E$ = even, accept; $O$ = odd): "" stays in $E$ (accept); "1" → $E\xrightarrow{1}O$ (reject); "11" → $E\xrightarrow{1}O\xrightarrow{1}E$ (accept); "101" → $E\xrightarrow{1}O\xrightarrow{0}O \xrightarrow{1}E$ (accept — two 1s, even); `"1100"` → $E\xrightarrow{1}O\xrightarrow{1}E\xrightarrow{0}E \xrightarrow{0}E$ (accept — two 1s, even). So the accepted strings are ['', '11', '101', '1100'], matching the printed output. (For the first line: "1010" ends in $E$ → True; "111" has three 1s and ends in $O$ → False.) As ever in this book, derive the output by hand — that is exactly the trace-don't-trust habit a finite automaton enforces. The same code, with this verified # Expected output:, lives in code/project-checkpoint.py.

Toolkit so far: logic.py (Ch. 1–3), sets.py (Ch. 8), relations.py (Ch. 12–13), combinatorics.py (Ch. 15–17), recurrences.py (Ch. 18–19), probability.py (Ch. 20–21), numbertheory.py / crypto.py (Ch. 22–25), graphs.py (Ch. 27–34), coding.py (Ch. 26, 38), and now the beginnings of automata.py. The DFA class is the foundation the next chapters build on: a Turing machine (Chapter 36) is the same shape of object with an unbounded tape instead of a fixed state set.

Summary

A language is a set of strings over an alphabet; the recognition problem asks, for a given string, whether it is in the language. Finite automata are the simplest machines that solve recognition problems, and this chapter mapped exactly what they can and cannot do.

The machines and their relationships.

Object	Definition (one line)	Recognizes
DFA	$(Q, \Sigma, \delta, q_0, F)$, $\delta\colon Q\times\Sigma \to Q$ (one next state)	the regular languages
NFA	as DFA but $\delta\colon Q\times(\Sigma\cup\{\varepsilon\}) \to \mathcal{P}(Q)$; accept if some path does	the regular languages (same as DFA)
Regular expression	$\emptyset,\varepsilon,a$ + union, concatenation, Kleene star	the regular languages (Kleene's theorem)

The three equivalence/limit results.

Subset construction: every NFA has an equivalent DFA (up to $2^{\lvert Q\rvert}$ states). Nondeterminism adds expressiveness, not power.
Kleene's theorem: regular expressions and finite automata recognize exactly the same languages (the regular languages). Thompson's construction compiles a regex into an NFA, operator by operator.
Pumping lemma: if $L$ is regular with pumping length $p$, every $s \in L$ with $\lvert s\rvert\ge p$ splits as $xyz$ with $\lvert y\rvert>0$, $\lvert xy\rvert\le p$, and $xy^iz\in L$ for all $i\ge 0$. Used (by contradiction) to prove non-regularity, e.g. $0^n1^n$.

Pumping-lemma discipline (who picks what).

Picked by the adversary (lemma)	Picked by you (the prover)
the pumping length $p$	the string $s\in L$ (depending on $p$), $\lvert s\rvert \ge p$
the split $s = xyz$ (any valid one)	the pumping exponent $i$ that breaks it

The Chomsky hierarchy (strict containments, each rung = more memory):

$$\text{Regular (finite automaton)} \subsetneq \text{Context-free (pushdown automaton)} \subsetneq \text{Context-sensitive (LBA)} \subsetneq \text{Recursively enumerable (Turing machine)}.$$

A DFA has only its state. A pushdown automaton adds a stack (recognizes $0^n1^n$, balanced parentheses). A Turing machine adds a full tape (Chapter 36).
In a compiler: lexer = regular, parser = context-free.

Key facts to carry forward. $\mathbb{N}$ includes 0, so $\varepsilon$ (length-0 string) and $n=0$ cases always matter. $\Sigma^{*}$ is infinite even for $\lvert\Sigma\rvert = 2$. A DFA's $\delta$ must be total; use a dead state for "hopeless" inputs. The pumping lemma is necessary but not sufficient for regularity.

Spaced Review

Retrieval strengthens memory. Answer from recall before expanding the details.

(Ch. 27) A DFA's transition diagram is a directed graph with labeled edges. Using the handshaking-lemma habit of counting edges, how many edges (transitions) does a DFA over an alphabet of size $\lvert\Sigma\rvert$ with $\lvert Q\rvert$ states have, and why is the count exact (not just a bound)?

(Ch. 27) Is the transition diagram of a typical DFA a directed or undirected graph, and can it contain self-loops and multi-edges? Tie each answer to a DFA feature.

(Ch. 12) "Two strings are equivalent if they drive the DFA to the same state" defines a relation on $\Sigma^{*}$. Show it is an equivalence relation (reflexive, symmetric, transitive), and say why it has at most $\lvert Q\rvert$ classes.

(Ch. 12) The NFA transition function returns a set of states, i.e. it is a relation between $(Q\times\Sigma)$ and $Q$ rather than a function to $Q$. Which property of a function does an NFA's $\delta$ give up, in the vocabulary of Chapter 12's relations?

Answers

Exactly $\lvert Q\rvert \cdot \lvert\Sigma\rvert$ transitions: $\delta$ is a total function on $Q\times\Sigma$, so every (state, symbol) pair has exactly one outgoing edge — no more (determinism), no fewer (totality). 2. Directed — transitions go from a state to a next state, an ordered pair (Ch. 27's directed graph). It can have self-loops (a symbol that leaves the state unchanged, e.g. 0 on the parity machine's $E$) and multi-edges (two different symbols from $u$ to $v$ are two distinct labeled edges). 3. Reflexive: every string drives the DFA to the same state as itself. Symmetric: if $x$ and $y$ reach the same state, so do $y$ and $x$. Transitive: if $x,y$ reach state $q$ and $y,z$ reach state $q$, then $x,z$ both reach $q$. So it is an equivalence relation (Ch. 12); its classes are indexed by the reachable states, of which there are at most $\lvert Q\rvert$, so at most $\lvert Q\rvert$ classes. (This is the seed of the Myhill–Nerode theorem.) 4. It gives up being well-defined as a function into $Q$ — specifically the "exactly one output" requirement. A DFA's $\delta$ is a function $Q\times\Sigma\to Q$ (one image); an NFA's is a relation that may pair one input with zero, one, or many states (its image is a set), which is why we type it as $\to \mathcal{P}(Q)$.

What's Next

We climbed the bottom of a ladder. Finite automata recognize the regular languages and no more; the pumping lemma drew the first hard line, and the Chomsky hierarchy revealed three more rungs above it, each unlocked by adding memory — a stack, then a full tape. The top rung belongs to the Turing machine, the most powerful model of computation there is. Chapter 36 defines it, states the Church–Turing thesis (that it captures everything mechanically computable), and then delivers the chapter's payoff: a proof that even this maximal machine cannot solve every problem. The halting problem — does this program halt on this input? — is undecidable, provable by the same diagonalization that showed the reals uncountable in Chapter 10 and by the proof-by-contradiction muscle of Chapter 5. The wall we hit at $0^n1^n$ was finite automata's limit; the next wall is computation's limit, and it cannot be crossed by any machine at all.