Exercises: Automata and Formal Languages
These build from mechanical warm-ups (describe a language, trace a DFA) up to full proofs, code, and
modeling. Difficulty: ⭐ foundational, ⭐⭐ intermediate, ⭐⭐⭐ challenging. Worked solutions to the
daggered (†) and odd-numbered problems are in the appendix answers-to-selected.md — draw every
machine and run every trace yourself before you peek. Throughout, $\Sigma = \{0,1\}$ unless stated
otherwise, and $\varepsilon$ is the empty string.
Part A — Warm-ups (⭐)
35.1 † List every string of length $\le 2$ in $L_{\text{even}} = \{w \mid w$ has an even number of $1$s$\}$. How many strings of length exactly $3$ are in it?
35.2 How many strings of length exactly $4$ are there over $\Sigma = \{a, b, c\}$? How many of length $\le 2$ (counting $\varepsilon$)?
35.3 † For the parity DFA of §35.2 ($Q = \{E, O\}$, start $E$, accept $\{E\}$), trace the
computation on 11010 symbol by symbol, writing the state after each input symbol. Does it accept?
35.4 Write the language recognized by the "ends in 01" DFA of §35.2 in plain English, then give
two strings it accepts and two it rejects (none of length $0$).
35.5 † Give the formal 5-tuple $(Q, \Sigma, \delta, q_0, F)$ for a DFA over $\Sigma = \{0,1\}$ that accepts only the empty string $\varepsilon$ and nothing else. (Hint: a dead state earns its keep.)
35.6 Using the inductive definition of regular expressions from §35.4, give $L(R)$ for each: (a) $R = (0 \mid 1)0$; (b) $R = 1^{*}$; (c) $R = \emptyset^{*}$. (Part (c) is a famous trap — read the star definition carefully.)
Part B — Prove it (⭐⭐)
35.7 † (Scaffolded — fill in the missing step.) Prove that $L = \{0^{n}1^{n} \mid n \ge 0\}$ is not regular. Fill the blanks, citing the correct pumping-lemma condition for each. - Assume for contradiction $L$ is regular, so it has a pumping length $p$. Choose $s = \underline{\hphantom{xxxx}}$, which is in $L$ and has length $\underline{\hphantom{xx}} \ge p$. - The lemma gives a split $s = xyz$ with $\lvert y\rvert > 0$ and $\lvert xy\rvert \le p$. Because $\lvert xy\rvert \le p$ and the first $p$ symbols of $s$ are all $\underline{\hphantom{xx}}$, the piece $y$ consists of only $\underline{\hphantom{xx}}$, say $y = 0^{k}$ with $k \ge 1$. - Pump with $i = \underline{\hphantom{xx}}$: then $xy^{i}z = \underline{\hphantom{xxxxx}}$, which has a different number of $0$s than $1$s, so it is $\underline{\hphantom{xxxx}}$ $L$ — contradicting condition $\underline{\hphantom{xx}}$ of the lemma. Therefore $L$ is not regular. $\blacksquare$
35.8 Prove that the regular languages are closed under complement: if $L = L(M)$ for some DFA $M = (Q, \Sigma, \delta, q_0, F)$, then $\overline{L} = \Sigma^{*} \setminus L$ is also recognized by a DFA. (Hint: §35.2's first Check Your Understanding tells you which of the five components to change.)
35.9 † Prove that $L = \{0^{n}1^{m} \mid n > m \ge 0\}$ (more $0$s than $1$s) is not regular, using the pumping lemma. State clearly which string $s$ you pick and which exponent $i$ breaks it.
35.10 Let $N$ be an NFA with $L(N) = L$. Prove, by induction on the length of the input string, the lemma at the heart of the subset construction: after the determinized machine reads a string $w$, its single DFA-state equals exactly the set of $N$-states reachable from $q_0$ on $w$. (You may assume $N$ has no $\varepsilon$-transitions; state the base case and the inductive step explicitly, mirroring the proof in §35.3.)
35.11 † Prove that $L = \{w w \mid w \in \{0,1\}^{*}\}$ (any string immediately repeated) is not regular. (Hint: choose $s = 0^{p}1\,0^{p}1$ and use $\lvert xy\rvert \le p$ to corner $y$ inside the first block of $0$s.)
35.12 Prove that every finite language is regular. (Hint: you do not need a clever automaton — think about what regular expression a finite set of strings admits, using union and concatenation.)
Part C — Implement it (⭐⭐)
Write Python for each. Do not run it on the grader's machine — hand-trace and include an
# Expected output: comment, matching the book's convention. Reference solutions live in
code/exercise-solutions.py.
35.13 † Using the universal run_dfa(delta, start, accept, w) driver from §35.2, encode a DFA over
$\Sigma = \{0,1\}$ that accepts exactly the strings ending in 00. Provide delta, start,
accept, then print the results on the four test strings ["00", "100", "0", "010"]. State, in a
comment, what each state "remembers."
35.14 Implement nfa_accepts(dN, start, F, w) (the on-the-fly subset simulation from §35.3) and use
it on an NFA for "the second-from-last symbol is a 1." Give dN, then print the result on "10",
"01", and "110". (Three states suffice: guess, then walk one more symbol.)
35.15 † Write a function complement_dfa(delta, states, accept) that, given a DFA's transition dict,
state set, and accepting set, returns the accepting set of the DFA for the complement language (same
states, same transitions). Demonstrate it by complementing the parity DFA and printing whether the new
machine accepts "1" and "11".
35.16 Implement the pushdown-style balanced(s) recognizer from §35.6, but generalize it to two
bracket kinds, () and [], so that "([])" is accepted and "([)]" is rejected. Print results on
["([])", "([)]", "(", ""]. In a comment, explain why a finite automaton (no stack) cannot do this.
Part D — Find the error (⭐⭐)
Each argument below is wrong. State precisely which part fails and why, citing the relevant definition or the pumping-lemma discipline (who picks $s$, $i$ versus who picks $p$ and the split).
35.17 † Claim: "$L_{\text{even}}$ (even number of $1$s) is not regular." "Proof": "Let $p$ be the pumping length and take $s = 1^{2p}$. Split it as $x = \varepsilon$, $y = 1$, $z = 1^{2p-1}$. Then $xy^{2}z = 1^{2p+1}$ has an odd number of $1$s, so $xy^{2}z \notin L_{\text{even}}$ — contradiction. Hence $L_{\text{even}}$ is not regular." Find the flaw. (We built a DFA for $L_{\text{even}}$ in §35.2, so the conclusion is certainly false.)
35.18 Claim: "Every NFA with $n$ states converts to a DFA with at most $n$ states, because each NFA state becomes one DFA state." Identify the error and state the correct bound from the subset construction.
35.19 † Claim: "The transition function $\delta$ of the following DFA is fine: $Q = \{A, B\}$, $\Sigma = \{0,1\}$, with $\delta(A,0) = B$, $\delta(B,1) = A$, $\delta(B,0) = B$, start $A$, accept $\{A\}$." A student insists this is a valid DFA. What requirement from the §35.2 definition is violated, and how would a dead state fix it?
35.20 Claim: "$L = \{0^{n}1^{n} \mid n \ge 0\}$ passes the pumping lemma — take any $s$, split it as $x = \varepsilon$, $y = s$, $z = \varepsilon$, and note $xy^{1}z = s \in L$ — therefore $L$ is regular." Two distinct things are wrong here. Name both. (One is about how the lemma is used; one is about what "passing" the lemma would even prove — recall the lemma is necessary but not sufficient.)
Part E — Model it & Conjecture and test (⭐⭐⭐)
35.21 † (Model it.) A vending machine accepts only nickels (n) and dimes (d) and dispenses a
drink the instant the inserted total reaches exactly 15 cents, rejecting (and refunding) any sequence
that would overshoot 15. Model the set of accepted coin sequences as a language over
$\Sigma = \{n, d\}$, and design a DFA that recognizes it. Give the 5-tuple and trace it on nnn, dn,
and dd. (Hint: states are "cents so far"; include a dead/refund state for overshoot.)
35.22 (Model it.) You are writing the lexer for a toy language whose identifiers are "a letter followed by zero or more letters or digits." Over the abstract alphabet $\Sigma = \{L, D\}$ ($L$ = any letter, $D$ = any digit), (a) write a regular expression for the identifier language, and (b) give a two-state DFA (plus a dead state) recognizing it. Which Chomsky-hierarchy level is this, and why is it the right level for a lexer (§35.6)?
35.23 † (Conjecture and test, then prove.) Let $a_k$ be the number of states in the minimal DFA
recognizing "the $k$-th symbol from the end is a 1," and let $b_k$ be the number of states in the
natural NFA for it. Tabulate $a_k$ and $b_k$ for $k = 1, 2, 3$ (build the machines, or simulate with
nfa_accepts and the subset construction in code), conjecture closed forms for $a_k$ and $b_k$, then
prove the bound $a_k \le 2^{k}$ using the subset construction. What does the gap between $a_k$ and
$b_k$ illustrate about §35.3's "exponential blow-up" pitfall?
35.24 (Conjecture and test, then prove.) Write code that, for each $n$ from $0$ to $8$, tests
whether the string $0^{n}1^{n}$ is accepted by your run_dfa for the parity DFA of §35.2 (it will
accept some and reject others). Tabulate the results, conjecture which $0^{n}1^{n}$ the parity machine
accepts, and prove your conjecture. (This shows a DFA can accept some strings of a non-regular-looking
form without recognizing the whole non-regular language.)
35.25 † (Conjecture and test, then prove.) The reversal of a string $w$, written $w^{R}$, reverses its symbols. Conjecture whether the class of regular languages is closed under reversal — i.e., if $L$ is regular, is $L^{R} = \{w^{R} \mid w \in L\}$ regular? Test your conjecture on two small regular languages in code (build the DFA, generate its strings up to length $4$, reverse them, and check by hand that the reversed set is still describable by a regex). Then prove your conjecture using either Kleene's theorem (reverse a regular expression) or the NFA construction (reverse all arrows, swap start and accept).
Part F — Interleaved & Deep Dive
Mixing techniques from earlier chapters keeps them sharp.
35.26 † (Ch. 27 + 35.) A DFA's transition diagram is a directed graph with labeled edges. For a DFA with $\lvert Q\rvert$ states over an alphabet of size $\lvert\Sigma\rvert$, how many edges (transitions) does the diagram have, and why is the count exact (not just an upper bound)? Tie your answer to the totality of $\delta$.
35.27 (Ch. 17 + 35.) State the pigeonhole principle, then explain in two or three sentences how the pumping lemma is "the pigeonhole principle in a finite-automaton costume." Identify the pigeons, the holes, and the conclusion.
35.28 † (Ch. 12 + 35.) Define a relation on $\Sigma^{*}$ by: $x \sim y$ iff the DFA $M$ (with state set $Q$) drives both $x$ and $y$ to the same state from $q_0$. Prove $\sim$ is an equivalence relation (reflexive, symmetric, transitive), and explain why it has at most $\lvert Q\rvert$ equivalence classes. (This is the seed of the Myhill–Nerode theorem from the Deep Dive learning path.)
35.29 (Ch. 8 + 35.) The subset construction sets the DFA's state set to $Q' = \mathcal{P}(Q)$, the power set of the NFA's states. Using the Chapter 8 fact that $\lvert\mathcal{P}(Q)\rvert = 2^{\lvert Q\rvert}$, explain why an $n$-state NFA can blow up to a $2^{n}$-state DFA but never more. For $n = 4$, how many DFA states is that?
35.30 (Ch. 5 + 35; Deep Dive.) Both the pumping-lemma method (this chapter) and the proof that the reals are uncountable (Chapter 10) and the halting-problem proof (Chapter 36) are proofs by contradiction (Chapter 5). In one paragraph, articulate the shared "shape" of these arguments — assume the thing exists, extract a contradiction — and note how each one identifies a wall that no machine of its class can cross.
35.31 (Deep Dive.) Prove that the regular languages are closed under union: if $L_1 = L(M_1)$ and $L_2 = L(M_2)$ for DFAs $M_1, M_2$, then $L_1 \cup L_2$ is regular. (Hint: the product construction — run both DFAs at once on the same input, with states $Q_1 \times Q_2$; choose the accepting set so the combined machine accepts iff either component does. Compare with intersection, where you would accept iff both do.)
Solutions to † and odd-numbered exercises: see appendices/answers-to-selected.md. For every
non-regularity proof, the rubric rewards: choosing $s$ to depend on $p$ with $\lvert s\rvert \ge p$,
using $\lvert xy\rvert \le p$ to pin down $y$, defeating every legal split (not just one convenient
one), and naming the exact exponent $i$ that produces a string outside the language.