Exercises: Trees

DataField.Dev

Exercises: Trees

These run from mechanical warm-ups through full proofs, code, modeling, and a section that interleaves earlier chapters. Difficulty: ⭐ foundational, ⭐⭐ intermediate, ⭐⭐⭐ challenging. Worked solutions to the starred-with-a-dagger (†) and odd-numbered problems are in the appendix answers-to-selected.md — attempt them before you peek. For every proof, the rubric rewards a clearly stated claim, the right induction variable and base case, explicit use of the hypothesis (the "strip a leaf" or "split at the root" move), and a clean conclusion.

Part A — Warm-ups (⭐)

These exercise the definitions of §§31.1–31.3 directly: counting edges, reading rooted-tree terminology, and hand-running traversals. If any feels slow, that is the signal to re-read the matching section before moving on.

31.1 † A graph has 15 vertices. How many edges must it have to possibly be a tree? If it is a tree, how many edges does it have for certain? Justify with Theorem 31.2. Then answer the same two questions for a forest with 15 vertices and 4 components (use the forest edge count from Exercise 31.7).

31.2 For each graph, say whether it is a tree, a forest (but not a single tree), or neither, and give the one-word reason ("cycle" or "disconnected"): (a) connected, 8 vertices, 7 edges; (b) connected, 8 vertices, 9 edges; (c) acyclic, 8 vertices, 5 edges; (d) acyclic, 8 vertices, 7 edges. For the two that are trees-or-forests, state how many components each has, using the forest edge-count relation $\lvert E\rvert = n - c$.

31.3 † In the rooted tree below (root $a$), list the children of $b$, the parent of $g$, every leaf, the ancestors of $h$, the subtree rooted at $c$, the depth of $h$, and the height of the tree.

31.4 Give the preorder, inorder, and postorder traversals of the binary tree Node(7, Node(3, Node(1), Node(5)), Node(9)). Which of the three reads the values in sorted order, and what does that tell you about whether this particular tree is a binary search tree?

31.5 † Insert the keys $8, 3, 10, 1, 6, 14$ in that order into an empty binary search tree. Draw the result, state its height, and give the inorder traversal. Then say what would happen to the height if you had instead inserted the same keys in sorted order $1, 3, 6, 8, 10, 14$ — and name the §31.4 phenomenon that produces.

31.6 In an array-backed binary heap (root at index 0), what are the array indices of the parent and the two children of the node stored at index 6? Then state the index of the last node in a complete heap that has 10 elements, and identify whether that last node is a left or a right child.

Part B — Prove it (⭐⭐)

These ask for full proofs. Several reuse the chapter's signature move — find a leaf, strip it, apply the hypothesis to the smaller tree (Theorem 31.2) — or its structural-induction cousin (split at the root). State your claim $P(n)$, name the induction variable, and use the hypothesis explicitly.

31.7 † (Scaffolded — fill the missing step.) Prove by induction on $n$ that a forest with $n$ vertices and $c$ connected components has exactly $n - c$ edges. Fill each blank.

Claim $P(n)$: every forest on $n$ vertices with $c$ components has $\underline{\hphantom{xxxx}}$ edges.
Base case ($n = 1$): a single vertex is a forest with $c = \underline{\hphantom{x}}$ component(s) and $\underline{\hphantom{x}}$ edges; $n - c = \underline{\hphantom{xxxx}}$. ✓
Inductive step: assume $P(k)$. Let $F$ be a forest on $k + 1$ vertices. Because $F$ is acyclic with at least one vertex, some component is a tree with $\ge 2$ vertices, so by Lemma 31.3 it has a leaf $v$ (handle the all-isolated-vertices case separately). Delete $v$ and its single edge to get $F'$ on $k$ vertices. Deleting a leaf keeps the number of components $\underline{\hphantom{xxxxxx}}$ (it does not split or merge any component). By the hypothesis $F'$ has $\underline{\hphantom{xxxx}}$ edges; adding $v$'s one edge back gives $F$ exactly $\underline{\hphantom{xxxxxx}}$ edges $= (k+1) - c$. ✓

31.8 Prove that in any tree with $n \ge 2$ vertices, the number of leaves is at least 2. (Hint: take a longest path and look at both of its endpoints; reuse the argument of Lemma 31.3.)

31.9 † Prove that a binary search tree storing $n$ distinct keys, traversed inorder, outputs the keys in strictly increasing order. (Hint: induct on the structure of the tree — the BST invariant says everything in the left subtree is smaller than the root, everything in the right subtree is larger.)

31.10 Prove that a full binary tree (every internal node has exactly two children) with $\ell$ leaves has exactly $\ell - 1$ internal nodes. (You may induct on $\ell$, or strip a deepest pair of sibling leaves and recurse.) Then connect this to Huffman coding: a Huffman tree for $s$ symbols is a full binary tree with $s$ leaves — how many internal nodes does it have, and how does that match the number of merges from Exercise 31.26?

31.11 † Let $T$ be a binary tree of height $h$. Prove that $T$ has at most $2^{h+1} - 1$ vertices, and at most $2^h$ leaves. (Induct on $h$; the root's two subtrees each have height $\le h - 1$.)

31.12 Prove that for any two vertices $u, v$ in a tree there is exactly one path between them, directly from the $n-1$ edge count rather than from acyclicity — i.e., show a connected graph on $n$ vertices with $n - 1$ edges has a unique $u$–$v$ path. (Hint: connectivity gives at least one path; a second path would create a cycle, letting you delete an edge and stay connected on $n$ vertices with $n - 2$ edges, which is impossible.)

Part C — Implement it (⭐⭐)

Write Python for each. Do not run it — hand-trace and include an # Expected output: comment, matching the book's convention. Use the Node class from §31.3 (val, left, right). For each function, note which traversal pattern (preorder / inorder / postorder) your recursion realizes; nearly every tree function here is "do something at the node, recurse on the children" with the order being the only choice.

31.13 † Write a recursive function height(node) returning the height of a binary tree (an empty tree has height $-1$, a single node height $0$). Hand-trace it on Node(1, Node(2, Node(4)), Node(3)) and show the # Expected output:. Why is the empty-tree convention $-1$ (rather than $0$) the one that makes the recurrence 1 + max(height(left), height(right)) come out right for a single node?

31.14 Write count_leaves(node) returning the number of leaves in a binary tree. Hand-trace it on the same tree as 31.13. State the base case(s) carefully — note there are two (the empty tree and a leaf), and explain why collapsing them into one would miscount.

31.15 † Write is_bst(node, lo=float('-inf'), hi=float('inf')) that returns True iff a binary tree satisfies the BST invariant, by passing down an allowed (lo, hi) range. Explain in one sentence why checking only "left child $<$ node $<$ right child" locally at each node is not sufficient, and give a 3-node counterexample tree.

31.16 Write evaluate(node) for an expression tree whose internal nodes hold '+', '-', '*', or '/' and whose leaves hold numbers. State which traversal order your recursion realizes, and hand-trace it on the tree for $(8 - 2) * 5$, writing down each subtree's value before the final operation.

Part D — Find the error (⭐⭐)

Each argument below is wrong. State precisely which part fails and why; a one-line counterexample is ideal. These mirror real mistakes — a dropped hypothesis, a local check mistaken for a global invariant, a best-case cost claimed as the general case. Diagnosing a broken argument is exactly the skill the book's theme two demands: a plausible-looking proof is not a correct one.

31.17 † Claim: "Every connected graph on $n$ vertices with exactly $n - 1$ edges is a tree, and conversely; therefore every graph with exactly $n - 1$ edges is a tree." "Proof": a tree has $n - 1$ edges by Theorem 31.2, and we proved the converse in the §31.1 Check Your Understanding. ∎ — Find the flaw. (Hint: drop the word "connected" and build a small counterexample on 4 vertices.)

31.18 Claim: "In a binary search tree, to check the BST property it suffices to verify, at every node, that its left child's key is less than its key and its right child's key is greater." "Proof": the BST invariant is a statement about each node and its children, so checking all parent–child pairs checks the invariant. ∎ — Find the flaw and give a 3-node tree that passes the local check but is not a BST.

31.19 † Claim: "A plain binary search tree gives $O(\log n)$ search, because each comparison discards a subtree and you cannot discard a subtree more than $\log_2 n$ times." "Proof": every comparison halves the remaining keys. ∎ — Find the flaw, and exhibit an insertion order on $\{1,2,3,4\}$ for which search costs $\Theta(n)$.

31.20 Claim: "Postorder and inorder traversals of a binary tree always determine the tree uniquely." A student verifies it on Node(1, Node(2), Node(3)). — Is the claim true? If so, say why a single example is not a proof; if false, give two distinct trees with the same preorder and the same postorder. (Hint: think about a tree that is a single left chain versus a single right chain.)

Part E — Model it (⭐⭐⭐)

The skill here is translation: take an informal real-world situation and name the precise discrete-math object (which kind of tree?), what its vertices/edges/leaves/height mean, and which chapter operation solves the stated task. A good answer reads like the first paragraph of a design doc — unambiguous about the model before any code.

31.21 † (Model it — file systems.) A Unix-style file system has directories that contain files and other directories; there are no hard links (so nothing is contained twice) and there is a single root /. Model this as a discrete-math object: which kind of tree is it (rooted? binary?), what do vertices and edges represent, what is a leaf, and what real quantity does the height measure? Then state, in terms of your model, what guarantees ls -R (recursive listing) visits every file exactly once, and name the traversal order it most resembles.

31.22 (Model it — autocomplete.) A search box must, after the user types a prefix $p$, list every stored word beginning with $p$. Model the dictionary as a trie. Describe precisely (in tree terms) the two-step operation the autocomplete performs (locating the node for $p$, then collecting the words beneath it), give its running time in terms of $\lvert p\rvert$ and the number $k$ of completions returned, and explain why the total number of stored words does not appear in that cost. Then state which traversal of the subtree below the prefix node you would use to list the completions in alphabetical order, and why (relate it to the inorder/sorted-order idea of §31.4).

31.23 † (Model it — committee org chart.) A company has one CEO; every other employee has exactly one direct manager; there are no management cycles. An "all-hands email" is forwarded by each manager to their direct reports. Model the org as a rooted tree, then answer: (a) which traversal models the email fan-out, (b) what graph quantity equals the worst-case number of forwarding hops before everyone is reached, and (c) using Theorem 31.2, how many manager→report relationships exist if the company has $n$ employees?

Part F — Conjecture and test (⭐⭐⭐)

This is theme four in miniature: use code to generate evidence on many cases, form a conjecture, then settle it with a proof. The code is for exploration only — it never proves anything, it just shows you the pattern worth proving. State your conjecture crisply before you prove it, and note where the evidence could have misled you (a small-case coincidence is not a theorem).

31.24 † (Conjecture and test, then prove.) Define the internal path length of a binary tree as the sum of the depths of all its nodes. Write code that builds the complete binary tree on $n = 1, 3, 7, 15$ nodes (heights $0, 1, 2, 3$) and computes the internal path length of each. Tabulate $n$ versus the path length, conjecture a formula for a perfect tree of height $h$ (so $n = 2^{h+1}-1$), test it on $h = 4$, then prove your formula by induction on $h$. (Hint: a perfect tree of height $h$ is a root plus two perfect subtrees of height $h - 1$, each of whose nodes sits one level deeper.)

31.25 (Conjecture and test.) Insert the $n!$ different permutations of $\{1, 2, \dots, n\}$ into a BST, one permutation per fresh tree, and record the resulting height each time. For $n = 3$ and $n = 4$, write code (hand-traceable) that enumerates the permutations and tabulates how many produce each height. Conjecture which insertion orders give the minimum height and which give the maximum, and connect your maximum-height finding to the "degenerate stick" pitfall of §31.4. (You need not prove the average height — that is a famously hard $\Theta(\log n)$ result; just report the extremes.)

31.26 † (Conjecture and test, then prove.) For Huffman coding, conjecture the relationship between the number of distinct symbols $s$ and the number of merge operations the algorithm performs before one tree remains. Verify by hand-tracing $s = 2, 3, 4, 5$ symbols (any frequencies), then prove your formula. (Hint: each merge reduces the number of trees in the queue by exactly one.)

Part G — Interleaved & Deep Dive

Mixing techniques from earlier chapters keeps them sharp, and the Deep-Dive problems reach toward material the chapter only gestured at (the full tree characterization, the Kraft inequality, spanning trees). The interleaved problems below pull from the handshaking lemma (Ch. 27), structural induction (Ch. 7), BFS (Ch. 28), loop invariants (Ch. 6), priority queues in Dijkstra (Ch. 29), and the floor function (Ch. 9) — each is a chance to confirm that an earlier tool still works in a new setting. If a referenced idea has gone fuzzy, that is exactly the signal to revisit it; spaced retrieval is the point.

31.27 † (Ch. 27 + 31.) State the handshaking lemma (Chapter 27). Use it to prove that every tree on $n \ge 2$ vertices has at least one vertex of degree $\ge 2$ unless $n = 2$ — equivalently, that the average degree of a tree is just under 2. (Compute $\frac{1}{n}\sum_v \deg(v)$ using $\lvert E\rvert = n - 1$.)

31.28 (Ch. 7 + 31.) Give a recursive (structural) definition of a binary tree in the style of Chapter 7 (basis clause + recursive clause + exclusion). Then, using structural induction on that definition, reprove that a binary tree with $n$ nodes has exactly $n - 1$ edges. Contrast which induction variable you used here versus in Theorem 31.2.

31.29 † (Ch. 28 + 31.) A level-order (breadth-first) traversal of a binary tree visits nodes in increasing depth, left to right. Explain why this is exactly BFS (Chapter 28) started at the root, and why on a tree BFS never needs its "already visited" check. For the tree of Exercise 31.4, give the level-order traversal.

31.30 (Ch. 6 + 31.) The BST bst_search loop of §31.4 maintains the invariant "if the key is in the tree, it is in the subtree currently rooted at node." State this loop invariant's three obligations (initialization, maintenance, termination) in the Chapter 6 vocabulary, and discharge each in one sentence, citing the BST property for maintenance. Then explain how the same invariant, plus a bound on the height $h$, gives the $O(h)$ running-time argument — i.e. why the loop runs at most $h + 1$ times.

31.31 † (Ch. 29 + 31.) In Chapter 29, Dijkstra's algorithm used a priority queue as a black box. Now that you know a binary heap implements it in $O(\log n)$ per operation, explain in two or three sentences why Dijkstra on a graph with $V$ vertices and $E$ edges runs in $O((V + E)\log V)$ rather than $\Theta(V^2)$ — i.e., account for where each heap operation occurs (how many extract-min calls, how many decrease-key/insert calls). For which density of graph (sparse $E = O(V)$ versus dense $E = \Theta(V^2)$) is the heap-based version a clear win, and for which does the difference shrink?

31.32 (Deep Dive — Ch. 38 preview.) Prove the Kraft inequality direction for prefix codes: if a binary prefix code assigns codewords of lengths $\ell_1, \ell_2, \dots, \ell_m$ to $m$ symbols, then $\sum_{i=1}^{m} 2^{-\ell_i} \le 1$. (Hint: place the symbols at leaves of a binary tree; the leaf at depth $\ell_i$ "owns" a fraction $2^{-\ell_i}$ of the leaves of a perfect tree of depth $\max_i \ell_i$, and distinct symbols own disjoint fractions.) Then check it numerically on the ABRACADABRA code of Case Study 2 ($A{=}0$, and four 3-bit codes): does $\sum 2^{-\ell_i}$ equal 1 exactly, and what does equality say about whether the tree "wastes" any leaf?

31.33 (Deep Dive — complete the characterization.) Theorem 31.1 lists four equivalent conditions; the chapter proved $1 \Leftrightarrow 2$. Complete the cycle by proving $2 \Rightarrow 3$ (unique paths $\Rightarrow$ minimally connected) and $3 \Rightarrow 4$ (minimally connected $\Rightarrow$ maximally acyclic). Conclude that all four are equivalent. (Hint for $3 \Rightarrow 4$: a minimally connected graph is acyclic — a cycle edge could be removed without disconnecting; then adding any non-edge $\{u,v\}$ together with the existing unique $u$–$v$ path forms a cycle.)

31.34 (Ch. 7 + 31, Deep Dive — reconstruct a tree.) The §31.3 Productive Struggle showed that preorder with inorder determines a binary tree uniquely. Prove this. State the claim as $P(n)$ for trees with $n$ nodes and use strong induction: the first preorder element is the root; locate it in the inorder to split the remaining nodes into the left and right subtrees' node sets; recurse on each side. Where exactly does the argument use that the keys are distinct (so the root can be located unambiguously in the inorder)?

31.35 (Ch. 9 + 31, Deep Dive — heap index arithmetic.) Using the floor function from Chapter 9, prove that the array-heap formulas of §31.5 are mutually consistent: show that for every $i \ge 0$, the parent of the left child $2i+1$ is $i$, and the parent of the right child $2i+2$ is $i$ — i.e. $\lfloor ((2i+1)-1)/2 \rfloor = \lfloor ((2i+2)-1)/2 \rfloor = i$. Then explain in one sentence why the completeness of the heap's tree is exactly what guarantees these indices are gap-free (every array slot $0, 1, 2, \dots$ corresponds to an actual node).

31.36 (Ch. 27 + 31, Deep Dive — spanning trees, a Chapter 32 bridge.) A spanning tree of a connected graph $G$ (Chapter 27 introduced the term; Chapter 32 builds them) is a subgraph that is a tree and includes every vertex of $G$. Prove that every connected graph on $n$ vertices has a spanning tree, and that spanning tree has exactly $n - 1$ edges. (Hint: if $G$ has a cycle, delete one cycle edge — connectivity survives (why?) and the edge count drops by one; repeat until acyclic. Then invoke Theorem 31.2 for the count.) Explain why this means $n - 1$ is the minimum number of edges any connected graph on $n$ vertices can have.

31.37 † (Ch. 7 + 31, Deep Dive — counting nodes by depth.) Prove by induction on the depth $d$ that a perfect binary tree (every internal node has two children and every leaf is at the same depth $h$) has exactly $2^d$ nodes at depth $d$, for each $0 \le d \le h$. Use this to give a second proof that a perfect tree of height $h$ has $2^{h+1} - 1$ nodes total, by summing a geometric series. (Compare with Exercise 31.11, which bounded an arbitrary tree; here you count an exactly-perfect one.)

31.38 (Find the error — Ch. 31.) Claim: "A binary heap stores its keys in sorted order, so the $k$-th smallest key is always at array index $k - 1$." "Proof": a min-heap puts the minimum at the root (index 0), so by extension the keys increase with the index. ∎ — Find the flaw. Give a concrete 4-element min-heap array in which the element at index 1 is not the second-smallest key, and state the correct weak guarantee a heap provides about index order (relate a parent's index to its children's).

Solutions to † and odd-numbered exercises (31.1, 31.3, …, 31.37, plus the daggered evens 31.24 and 31.26): see appendices/answers-to-selected.md. Reference code for the "Implement it" problems (31.13–31.16) is in this chapter's code/exercise-solutions.py. The rubric for each proof rewards a clearly stated $P(n)$, the right induction variable and base case, explicit use of the hypothesis (the "strip a leaf" or "split at the root" move), and a clean conclusion.