Chapter 35 — Key Takeaways

Automata and Formal Languages. The one-page card to reread before an exam.


Core definitions (at a glance)

Term Symbol One-line definition
Alphabet $\Sigma$ A finite, non-empty set of symbols (e.g. $\{0,1\}$).
String / word $w$ A finite sequence of symbols from $\Sigma$; length $\lvert w\rvert$.
Empty string $\varepsilon$ The length-0 string ($\lvert\varepsilon\rvert = 0$); the "" of math.
All strings $\Sigma^{*}$ Every string over $\Sigma$, incl. $\varepsilon$. Infinite if $\lvert\Sigma\rvert\ge 1$.
Language $L$ Any subset $L \subseteq \Sigma^{*}$ — a set of strings.
Recognition problem Given $w$, decide $w \in L$? (membership, asked uniformly by one machine)
DFA $(Q,\Sigma,\delta,q_0,F)$ Finite states; $\delta\colon Q\times\Sigma\to Q$ (one next state).
NFA $(Q,\Sigma,\delta,q_0,F)$ $\delta\colon Q\times(\Sigma\cup\{\varepsilon\})\to\mathcal{P}(Q)$; accept if some path does.
Regular language $= L(R)$ for a regex $R$ $=$ recognized by some finite automaton.
Regular expression $R$ Built from $\emptyset,\varepsilon,a$ via union $\mid$, concatenation, star $^{*}$.

$\emptyset$ (no strings) $\neq$ $\{\varepsilon\}$ (one string, the empty one). $\mathbb{N}$ includes 0, so the $\varepsilon$ and $n=0$ cases always matter.


DFA acceptance (the algorithm)

state := q0
for each symbol a in w:  state := delta(state, a)   # exactly one move
accept iff state in F
  • $\delta$ must be total: every (state, symbol) pair needs a transition. Missing arrow ⇒ not a DFA.
  • For "forbidden pattern" languages, route hopeless inputs to a dead state (non-accepting, self-loops on all symbols).
  • A DFA over $\lvert Q\rvert$ states and $\lvert\Sigma\rvert$ symbols has exactly $\lvert Q\rvert\cdot\lvert\Sigma\rvert$ transitions.

The three headline results

Result Statement Used for
Subset construction Every NFA has an equivalent DFA (states = subsets of $Q$, up to $2^{\lvert Q\rvert}$). NFA ⇒ DFA; proves NFA = DFA in power.
Kleene's theorem Regex and finite automata recognize exactly the regular languages. Regex ⇔ machine; compile a regex (Thompson's construction).
Pumping lemma Regular $\Rightarrow$ $\exists p$: every $s\in L$, $\lvert s\rvert\ge p$, splits $s=xyz$ with the 3 conditions. Prove a language is NOT regular (by contradiction).

Subset construction core formula: $\delta_M(S,a) = \bigcup_{q\in S}\delta_N(q,a)$; start $\{q_0\}$; accept any $S$ with $S\cap F\neq\emptyset$. (Add $\varepsilon$-closure if there are $\varepsilon$-moves.) ⚠️ Equivalent in power, not in size — NFAs can be exponentially smaller.

Thompson's construction (regex ⇒ NFA), one gadget per operator:

Operator Gadget
$a$ start $\xrightarrow{a}$ accept
$\varepsilon$ start $\xrightarrow{\varepsilon}$ accept
$\emptyset$ start, accept, no transitions
$R\mid S$ new start $\xrightarrow{\varepsilon}$ both starts; both accepts $\xrightarrow{\varepsilon}$ new accept
$RS$ $R$.accept $\xrightarrow{\varepsilon}$ $S$.start
$R^{*}$ new start $\xrightarrow{\varepsilon}$ {R.start, new accept}; R.accept $\xrightarrow{\varepsilon}$ R.start

Pumping lemma — the 3 conditions and who-picks-what

Split $s = xyz$ where: 1. $\lvert y\rvert > 0$ (middle non-empty), 2. $\lvert xy\rvert \le p$ (within the first $p$ symbols), and 3. $xy^{i}z \in L$ for all $i \ge 0$.

Adversary (the lemma) chooses You (the prover) choose
pumping length $p$ the string $s\in L$, depending on $p$, with $\lvert s\rvert\ge p$
the split $s=xyz$ (any valid one) the exponent $i$ that breaks membership

Recipe to prove "L is not regular": assume regular → get $p$ → pick clever $s(p)$ → condition (2) corners $y$ → choose $i$ (often $i=2$ or $i=0$) so $xy^iz\notin L$ → contradicts (3). Done.

Canonical non-regular languages: $\{0^n1^n\}$ ($s=0^p1^p$, $y$=all 0s, pump $i=2$); balanced parentheses ($s=(^p\,)^p$); $\{ww\}$; $\{0^{n^2}\}$. All need unbounded memory.

Why true: a DFA with $p$ states reading $p$ symbols repeats a state (pigeonhole, Ch. 17); the loop between repeats is $y$, and it can be pumped. The lemma is necessary, not sufficient for regularity.


The Chomsky hierarchy (memorize this table)

Type Class Machine Memory added Example
3 Regular Finite automaton (DFA/NFA) none (just state) $0^{*}1^{*}$, even # of 1s
2 Context-free Pushdown automaton a stack (LIFO) $0^n1^n$, balanced parens
1 Context-sensitive Linear-bounded automaton bounded tape $0^n1^n2^n$, $ww$
0 Recursively enumerable Turing machine (Ch. 36) full read/write tape every machine-recognizable language

$$\text{Regular} \subsetneq \text{Context-free} \subsetneq \text{Context-sensitive} \subsetneq \text{Recursively enumerable}$$

Each $\subsetneq$ is strict. Pumping lemma proves the first one (regular $\subsetneq$ CF, via $0^n1^n$). Compiler map: lexer = regular; parser = context-free.


"Which tool?" decision aid

You want to… Use
Recognize a fixed pattern (identifiers, numbers, even/odd counts) DFA / regex (it's regular)
Describe a language declaratively regular expression
Design a machine but choices feel natural ("guess then verify") NFA, then subset-construct if you need a DFA
Match nested / balanced structure (parens, braces, $0^n1^n$) pushdown automaton (NOT a finite automaton)
Prove a language is not regular pumping lemma (by contradiction)
Accept the complement of a DFA language swap $F \to Q\setminus F$ (same $Q,\delta$)

Common pitfalls

  • ❌ Drawing a DFA with a missing transition for some symbol. ✅ $\delta$ is total; add a dead state.
  • ❌ Choosing $x,y,z$ yourself in a pumping proof. ✅ The adversary picks the split; you must beat every legal split (condition (2) is what corners $y$).
  • ❌ Using the pumping lemma to "prove" a language is regular, or trusting that passing it ⇒ regular. ✅ It is one-directional (proves non-regularity only) and only necessary.
  • ❌ Thinking NFA ⇒ DFA keeps the size small. ✅ Up to $2^{\lvert Q\rvert}$ blow-up is real.
  • ❌ Treating regex backreferences (\1) as "regular." ✅ They escape the regular class.

Toolkit addition (dmtoolkit/automata.py)

class DFA:                          # M = (Q, Sigma, delta, q0, F)
    def accepts(self, w):           # run delta once per symbol; test state in F
        ...
    def recognize(self, strings):   # filter to the accepted strings
        ...
# DFA("even # of 1s").accepts("1010") -> True ;  .accepts("111") -> False

One interpreter runs every DFA (the 5-tuple is a data structure). Foundation for the Turing machine in Ch. 36 (same shape, unbounded tape instead of finite state set).