Chapter 35 — Key Takeaways
Automata and Formal Languages. The one-page card to reread before an exam.
Core definitions (at a glance)
| Term | Symbol | One-line definition |
|---|---|---|
| Alphabet | $\Sigma$ | A finite, non-empty set of symbols (e.g. $\{0,1\}$). |
| String / word | $w$ | A finite sequence of symbols from $\Sigma$; length $\lvert w\rvert$. |
| Empty string | $\varepsilon$ | The length-0 string ($\lvert\varepsilon\rvert = 0$); the "" of math. |
| All strings | $\Sigma^{*}$ | Every string over $\Sigma$, incl. $\varepsilon$. Infinite if $\lvert\Sigma\rvert\ge 1$. |
| Language | $L$ | Any subset $L \subseteq \Sigma^{*}$ — a set of strings. |
| Recognition problem | — | Given $w$, decide $w \in L$? (membership, asked uniformly by one machine) |
| DFA | $(Q,\Sigma,\delta,q_0,F)$ | Finite states; $\delta\colon Q\times\Sigma\to Q$ (one next state). |
| NFA | $(Q,\Sigma,\delta,q_0,F)$ | $\delta\colon Q\times(\Sigma\cup\{\varepsilon\})\to\mathcal{P}(Q)$; accept if some path does. |
| Regular language | — | $= L(R)$ for a regex $R$ $=$ recognized by some finite automaton. |
| Regular expression | $R$ | Built from $\emptyset,\varepsilon,a$ via union $\mid$, concatenation, star $^{*}$. |
$\emptyset$ (no strings) $\neq$ $\{\varepsilon\}$ (one string, the empty one). $\mathbb{N}$ includes 0, so the $\varepsilon$ and $n=0$ cases always matter.
DFA acceptance (the algorithm)
state := q0
for each symbol a in w: state := delta(state, a) # exactly one move
accept iff state in F
- $\delta$ must be total: every (state, symbol) pair needs a transition. Missing arrow ⇒ not a DFA.
- For "forbidden pattern" languages, route hopeless inputs to a dead state (non-accepting, self-loops on all symbols).
- A DFA over $\lvert Q\rvert$ states and $\lvert\Sigma\rvert$ symbols has exactly $\lvert Q\rvert\cdot\lvert\Sigma\rvert$ transitions.
The three headline results
| Result | Statement | Used for |
|---|---|---|
| Subset construction | Every NFA has an equivalent DFA (states = subsets of $Q$, up to $2^{\lvert Q\rvert}$). | NFA ⇒ DFA; proves NFA = DFA in power. |
| Kleene's theorem | Regex and finite automata recognize exactly the regular languages. | Regex ⇔ machine; compile a regex (Thompson's construction). |
| Pumping lemma | Regular $\Rightarrow$ $\exists p$: every $s\in L$, $\lvert s\rvert\ge p$, splits $s=xyz$ with the 3 conditions. | Prove a language is NOT regular (by contradiction). |
Subset construction core formula: $\delta_M(S,a) = \bigcup_{q\in S}\delta_N(q,a)$; start $\{q_0\}$; accept any $S$ with $S\cap F\neq\emptyset$. (Add $\varepsilon$-closure if there are $\varepsilon$-moves.) ⚠️ Equivalent in power, not in size — NFAs can be exponentially smaller.
Thompson's construction (regex ⇒ NFA), one gadget per operator:
| Operator | Gadget |
|---|---|
| $a$ | start $\xrightarrow{a}$ accept |
| $\varepsilon$ | start $\xrightarrow{\varepsilon}$ accept |
| $\emptyset$ | start, accept, no transitions |
| $R\mid S$ | new start $\xrightarrow{\varepsilon}$ both starts; both accepts $\xrightarrow{\varepsilon}$ new accept |
| $RS$ | $R$.accept $\xrightarrow{\varepsilon}$ $S$.start |
| $R^{*}$ | new start $\xrightarrow{\varepsilon}$ {R.start, new accept}; R.accept $\xrightarrow{\varepsilon}$ R.start |
Pumping lemma — the 3 conditions and who-picks-what
Split $s = xyz$ where: 1. $\lvert y\rvert > 0$ (middle non-empty), 2. $\lvert xy\rvert \le p$ (within the first $p$ symbols), and 3. $xy^{i}z \in L$ for all $i \ge 0$.
| Adversary (the lemma) chooses | You (the prover) choose |
|---|---|
| pumping length $p$ | the string $s\in L$, depending on $p$, with $\lvert s\rvert\ge p$ |
| the split $s=xyz$ (any valid one) | the exponent $i$ that breaks membership |
Recipe to prove "L is not regular": assume regular → get $p$ → pick clever $s(p)$ → condition (2) corners $y$ → choose $i$ (often $i=2$ or $i=0$) so $xy^iz\notin L$ → contradicts (3). Done.
Canonical non-regular languages: $\{0^n1^n\}$ ($s=0^p1^p$, $y$=all 0s, pump $i=2$); balanced parentheses ($s=(^p\,)^p$); $\{ww\}$; $\{0^{n^2}\}$. All need unbounded memory.
Why true: a DFA with $p$ states reading $p$ symbols repeats a state (pigeonhole, Ch. 17); the loop between repeats is $y$, and it can be pumped. The lemma is necessary, not sufficient for regularity.
The Chomsky hierarchy (memorize this table)
| Type | Class | Machine | Memory added | Example |
|---|---|---|---|---|
| 3 | Regular | Finite automaton (DFA/NFA) | none (just state) | $0^{*}1^{*}$, even # of 1s |
| 2 | Context-free | Pushdown automaton | a stack (LIFO) | $0^n1^n$, balanced parens |
| 1 | Context-sensitive | Linear-bounded automaton | bounded tape | $0^n1^n2^n$, $ww$ |
| 0 | Recursively enumerable | Turing machine (Ch. 36) | full read/write tape | every machine-recognizable language |
$$\text{Regular} \subsetneq \text{Context-free} \subsetneq \text{Context-sensitive} \subsetneq \text{Recursively enumerable}$$
Each $\subsetneq$ is strict. Pumping lemma proves the first one (regular $\subsetneq$ CF, via $0^n1^n$). Compiler map: lexer = regular; parser = context-free.
"Which tool?" decision aid
| You want to… | Use |
|---|---|
| Recognize a fixed pattern (identifiers, numbers, even/odd counts) | DFA / regex (it's regular) |
| Describe a language declaratively | regular expression |
| Design a machine but choices feel natural ("guess then verify") | NFA, then subset-construct if you need a DFA |
| Match nested / balanced structure (parens, braces, $0^n1^n$) | pushdown automaton (NOT a finite automaton) |
| Prove a language is not regular | pumping lemma (by contradiction) |
| Accept the complement of a DFA language | swap $F \to Q\setminus F$ (same $Q,\delta$) |
Common pitfalls
- ❌ Drawing a DFA with a missing transition for some symbol. ✅ $\delta$ is total; add a dead state.
- ❌ Choosing $x,y,z$ yourself in a pumping proof. ✅ The adversary picks the split; you must beat every legal split (condition (2) is what corners $y$).
- ❌ Using the pumping lemma to "prove" a language is regular, or trusting that passing it ⇒ regular. ✅ It is one-directional (proves non-regularity only) and only necessary.
- ❌ Thinking NFA ⇒ DFA keeps the size small. ✅ Up to $2^{\lvert Q\rvert}$ blow-up is real.
- ❌ Treating regex backreferences (
\1) as "regular." ✅ They escape the regular class.
Toolkit addition (dmtoolkit/automata.py)
class DFA: # M = (Q, Sigma, delta, q0, F)
def accepts(self, w): # run delta once per symbol; test state in F
...
def recognize(self, strings): # filter to the accepted strings
...
# DFA("even # of 1s").accepts("1010") -> True ; .accepts("111") -> False
One interpreter runs every DFA (the 5-tuple is a data structure). Foundation for the Turing machine in Ch. 36 (same shape, unbounded tape instead of finite state set).