Chapter 31: Trees

DataField.Dev

41 min read

> "I think that I shall never see / a poem lovely as a tree."

Prerequisites

27
7

Learning Objectives

State the equivalent characterizations of a tree and prove that a tree on n vertices has exactly n-1 edges.
Use rooted-tree terminology (root, parent, child, leaf, depth, height) precisely and compute these quantities for a given tree.
Traverse a binary tree in preorder, inorder, and postorder, and explain what each traversal is good for.
Implement and reason about a binary search tree, and connect its shape to its O(h) operation cost.
Apply trees in computer science: binary heaps, expression trees, and tries.
Build a Huffman code from symbol frequencies and prove that it is optimal.

In This Chapter

Overview
Learning Paths
31.1 Trees and their properties
31.2 Rooted trees and terminology
31.3 Binary trees and traversals
31.4 Binary search trees
31.5 Tree applications: heaps, expression trees, and tries
31.6 Huffman coding: optimal prefix codes
Project Checkpoint: trees in the dmtoolkit graph module
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 31: Trees

"I think that I shall never see / a poem lovely as a tree." — Joyce Kilmer, "Trees" (1913)

Overview

Open any non-trivial program and you will find a tree. The folders on your disk form one. The HTML of the page you are reading is parsed into one. Your compiler turns 2 * (3 + 4) into one before it emits a single instruction. Git stores your project's history as one. The database index that makes a query return in milliseconds instead of minutes is one. Trees are, by a wide margin, the most useful single structure in computer science — and the reason is that a tree is the precise mathematical shape of hierarchy with no redundancy: a way to connect $n$ things using the absolute minimum number of links, with exactly one path between any two of them.

That last sentence is not marketing copy; it is a theorem, and by the end of section 31.1 you will be able to prove it. A tree is a graph (Chapter 27) that is connected and has no cycles. From those two small words — connected, acyclic — an astonishing amount follows automatically: the edge count is forced to be exactly $n - 1$, every pair of vertices is joined by a unique path, and removing any single edge disconnects the whole thing. Trees are the graphs that are just barely connected, and that fragility is exactly what makes them efficient.

This chapter is where the structural induction you learned in Chapter 7 finally earns its keep. Trees are recursively defined objects — a tree is a root attached to a list of smaller trees — and almost every fact about them is proved by inducting on that structure. If Chapter 7 felt abstract, this is the payoff: you will prove real, useful properties of the data structures you use every day.

In this chapter you will learn to:

Recognize a tree from any of its several equivalent definitions, and prove they agree.
Prove the central counting fact — a tree on $n$ vertices has exactly $n - 1$ edges — by induction.
Speak the language of rooted trees: root, parent, child, leaf, depth, height, subtree.
Traverse binary trees in preorder, inorder, and postorder, and pick the right one for a task.
Build and search a binary search tree, and see why its performance depends on its height.
Implement a binary heap, parse arithmetic with an expression tree, and store strings in a trie.
Construct an optimal prefix code with Huffman's algorithm — and prove that "optimal" is earned.

Learning Paths

🏃 Fast Track: If you already know that a tree has $n-1$ edges and can traverse a binary tree, skim 31.1–31.3 and spend your time on 31.4 (BSTs), 31.5 (heaps, expression trees, tries), and 31.6 (Huffman coding), where the algorithmic substance lives. Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Prove the $n-1$ edges theorem yourself (31.1) before reading ours, hand-trace each traversal in 31.3, and work every 🔄 Check Your Understanding.

🔬 Deep Dive: After the chapter, prove that all four characterizations of a tree in 31.1 are equivalent (we prove two of the implications; you complete the cycle), and prove the Huffman optimality theorem in full, including the exchange-argument lemma. Exercise set, Parts D and E.

31.1 Trees and their properties

Start with a picture instead of a definition. Here are three graphs on six vertices. Which ones "look like" the folder hierarchies and family trees you have an intuition for?

   (A) a tree            (B) NOT a tree          (C) NOT a tree
        a                     a                       a   d
       / \                   / \                     /     \
      b   c                 b   c                   b       e
     / \   \               / \ / \                  |
    d   e   f             d   e   f                 c    (f isolated)
                           \___/   (cycle d-e)

Graph (A) is connected — you can get from any vertex to any other — and it has no cycle: there is no way to start at a vertex, walk along distinct edges, and return without retracing. That is exactly the shape we want. Graph (B) is connected but contains a cycle (the triangle through $d$ and $e$), so it has redundant edges: there are two ways to get between some pair of vertices. Graph (C) has no cycle, but it is disconnected — vertex $f$ floats free, and $\{a,b,c\}$ is severed from $\{d,e\}$.

Recall from Chapter 27 the vocabulary we are leaning on: a path is a walk along edges with no repeated vertex, a cycle is a path of length at least 1 that returns to its start, and a graph is connected if there is a path between every pair of vertices. With those in hand, the definition is short.

Definition (tree). A tree is a connected, acyclic (cycle-free) undirected graph. A graph whose every connected component is a tree — that is, an acyclic graph that need not be connected — is a forest. (So a forest is a disjoint union of trees, and a single tree is a forest with one component.)

Graph (A) is a tree. Graph (C) is a forest (two trees, $\{a,b,c\}$ and $\{d,e\}$, plus the single-vertex tree $\{f\}$ — yes, one isolated vertex is a tree). Graph (B) is neither.

💡 Intuition: A tree is a graph that is connected as cheaply as possible. Connectivity costs edges; cycles are edges you are paying for twice. A tree spends every edge on connectivity and not one edge on redundancy. This is why "minimal connected" and "maximal acyclic" both describe trees, as we are about to see.

Equivalent characterizations

What makes trees beautiful — and what makes them show up everywhere — is that several very different sounding conditions all pin down the same objects. The following theorem lists four; any one of them could have been our definition.

Theorem 31.1 (Characterizations of trees). Let $G = (V, E)$ be an undirected graph with $n = \lvert V\rvert$ vertices. The following are equivalent: 1. $G$ is a tree (connected and acyclic). 2. Between every pair of vertices there is exactly one path. 3. $G$ is connected, and removing any edge disconnects it (it is minimally connected). 4. $G$ is acyclic, and adding any edge between two non-adjacent vertices creates a cycle (it is maximally acyclic).

Moreover, a tree on $n$ vertices has exactly $n - 1$ edges.

We will prove the most-used pieces and leave the rest as a Deep-Dive exercise (the implications form a cycle $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 4 \Rightarrow 1$; proving any cycle of implications establishes all equivalences). Let us first prove $1 \Leftrightarrow 2$, because the "unique path" property is the one you will use most.

The strategy first. "Connected" gives us at least one path between any two vertices; "acyclic" is what forces it to be at most one. The key idea for "at most one": if there were two different paths between the same pair, you could splice them together into a closed walk, and inside any such walk hides a genuine cycle — contradicting acyclicity. For the reverse, two cycle-free directions: a unique path everywhere means connected (at least one path exists), and a cycle would give two paths between some pair (the long way and the short way around), so unique paths means acyclic.

Proof of $1 \Leftrightarrow 2$.

($1 \Rightarrow 2$) Assume $G$ is a tree. Take any two vertices $u, v$. Because $G$ is connected, at least one path from $u$ to $v$ exists. Suppose, for contradiction, that there were two distinct paths $P$ and $Q$ from $u$ to $v$. Walk along $P$ from $u$; since $P \ne Q$, there is a first vertex $x$ after which they diverge, and because both paths end at the common vertex $v$, after diverging they must meet again — let $y$ be the first vertex where they re-meet. The portion of $P$ from $x$ to $y$ and the portion of $Q$ from $x$ to $y$ share only their endpoints $x$ and $y$, and together they form a cycle. This contradicts $G$ being acyclic. Hence the path is unique.

($2 \Rightarrow 1$) Assume there is exactly one path between every pair of vertices. "Exactly one" includes "at least one", so $G$ is connected. If $G$ contained a cycle through vertices $a$ and $b$, then the two arcs of that cycle would be two distinct paths from $a$ to $b$, contradicting uniqueness. So $G$ is acyclic, hence a tree. $\blacksquare$

The counting theorem: $n - 1$ edges

Now the result that drives almost every application — the edge count. This is the theorem that lets you say "a spanning tree of an $n$-node network needs exactly $n-1$ links" without drawing anything.

Theorem 31.2. Every tree with $n \ge 1$ vertices has exactly $n - 1$ edges.

The strategy first. This is a statement about all trees of every size, so we induct on the number of vertices $n$. The base case $n = 1$ is a lone vertex with zero edges — and $1 - 1 = 0$, good. For the inductive step the creative move is the same one that powers every tree induction: find a leaf, remove it, apply the hypothesis to what remains. Removing a leaf and its single edge from a tree leaves a smaller tree (we must check that), so the hypothesis applies and we just add the one edge back. The only thing we must establish first is that every finite tree with at least one edge has a leaf to remove.

We need one lemma before the proof.

Lemma 31.3. Every tree with at least two vertices has at least one leaf (a vertex of degree 1).

The strategy first. Take the longest path you can find in the tree. Its endpoint cannot have any other neighbors — an extra neighbor would either lengthen the path (contradicting "longest") or close a cycle (contradicting "tree"). So the endpoint has degree 1.

Proof of Lemma 31.3. A finite graph has only finitely many paths, so among them there is one of maximum length; call it $v_0, v_1, \dots, v_\ell$ with $\ell \ge 1$ (it has length at least 1 because the tree, being connected with $\ge 2$ vertices, has an edge). Consider the endpoint $v_\ell$. Its only neighbor on the path is $v_{\ell-1}$. Could $v_\ell$ have a neighbor $w$ off the path? If $w$ is not on the path at all, then $v_0, \dots, v_\ell, w$ is a longer path — contradiction. If $w = v_i$ is an earlier vertex of the path, then $v_i, v_{i+1}, \dots, v_\ell, v_i$ is a cycle — contradicting acyclicity. Either way we reach a contradiction, so $v_\ell$ has no neighbor besides $v_{\ell-1}$; its degree is 1. $\blacksquare$

Proof of Theorem 31.2 (induction on $n$). Let $P(n)$ be "every tree with $n$ vertices has exactly $n - 1$ edges."

Base case ($n = 1$): a tree with one vertex has no edges (an edge would need two endpoints), and $1 - 1 = 0$. So $P(1)$ holds.

Inductive step: fix $k \ge 1$ and assume $P(k)$ — every $k$-vertex tree has $k - 1$ edges. Let $T$ be any tree with $k + 1$ vertices. Since $k + 1 \ge 2$, Lemma 31.3 gives a leaf $v$ of degree 1; let $e$ be its unique incident edge. Form $T' = T - v$ by deleting $v$ and $e$. We claim $T'$ is a tree on $k$ vertices:

$T'$ is acyclic: removing a vertex and edge cannot create a cycle, and $T$ had none.
$T'$ is connected: any path in $T$ between two surviving vertices never used $v$ (a degree-1 vertex can only sit at the end of a path, never in the middle), so every such path survives in $T'$.

By the inductive hypothesis, $T'$ has $k - 1$ edges. Adding $v$ and its one edge $e$ back recovers $T$, so $T$ has $(k - 1) + 1 = k$ edges, which is $(k+1) - 1$. Thus $P(k+1)$ holds. By induction, every tree with $n \ge 1$ vertices has exactly $n - 1$ edges. $\blacksquare$

🔗 Connection. "Remove a leaf, recurse" is structural induction (Chapter 7) wearing graph clothing. In Chapter 7 you proved a full binary tree with $i$ internal nodes has $i + 1$ leaves by inducting on the tree's recursive structure; here we induct on vertex count via the same leaf-stripping move. Almost every proof in this chapter has this shape. When a problem says "tree," your reflex should be "what does removing a leaf, or splitting at the root, reduce this to?"

🔄 Check Your Understanding 1. A graph has 10 vertices and 10 edges. Can it be a tree? What if it has 10 vertices and 8 edges? 2. You are told a graph is connected and has exactly $n - 1$ edges. Must it be a tree? (Sketch why.) 3. In Lemma 31.3, where exactly did we use that the graph is acyclic?

Answers

No to both. A tree on 10 vertices has exactly $10 - 1 = 9$ edges. With 10 edges it has too many (a cycle is forced); with 8 it has too few (it must be disconnected). 2. Yes. If it had a cycle, you could delete a cycle edge and stay connected (a fact from characterization 3's flavor), reaching a connected graph on $n$ vertices with $n - 2$ edges — impossible, since a connected graph needs at least $n-1$ edges. So no cycle exists; connected + acyclic = tree. (This is the converse direction; "connected with $n-1$ edges" is itself a fifth characterization.) 3. To rule out the case where the endpoint's off-path neighbor is an earlier path vertex $v_i$ — that would close a cycle, which acyclicity forbids.

31.2 Rooted trees and terminology

The trees of section 31.1 have no preferred vertex; they are just connected acyclic graphs. But the trees you program with almost always have a distinguished starting vertex — the root of the file system, the top of the DOM, the entry point of a parse. Singling out one vertex transforms a plain tree into a rooted tree and unlocks a whole family of directional words.

Definition (rooted tree). A rooted tree is a tree together with a designated vertex called the root. Choosing a root orients every edge "away from the root": for an edge $\{u, v\}$ where $u$ is closer to the root, $u$ is the parent of $v$ and $v$ is a child of $u$. A vertex with no children is a leaf; a non-leaf is an internal node. The ancestors of a vertex are the vertices on the unique path from it up to the root; its descendants are all vertices for which it is an ancestor. The subtree rooted at $v$ is $v$ together with all of $v$'s descendants.

Because (by Theorem 31.1) there is a unique path between any two vertices, "the path up to the root" is unambiguous — every non-root vertex has exactly one parent. That single fact is what makes rooted trees so clean to compute with.

Two numerical measurements come up constantly:

Definition (depth, height). The depth of a vertex is the length (number of edges) of the path from the root to that vertex; the root has depth 0. The height of a rooted tree is the maximum depth of any vertex — equivalently, the length of the longest root-to-leaf path. A single-vertex tree has height 0.

Here is a rooted tree with its depths labeled, rooted at $a$:

            a            depth 0     (root)
          / | \
         b  c  d         depth 1
        /|     |
       e f     g         depth 2     (e, f, g, and c are leaves)
       |
       h                 depth 3     (h is a leaf; tree height = 3)

In this tree: $a$ is the root; $b$, $c$, $d$ are its children; $b$ is the parent of $e$ and $f$; the leaves are $c$, $f$, $g$, $h$ (each has no children); $e$ is an internal node with one child $h$. The ancestors of $h$ are $e$, $b$, $a$. The subtree rooted at $b$ is $\{b, e, f, h\}$. The height is 3 (the path $a \to b \to e \to h$).

⚠️ Common Pitfall: Depth and height are measured in opposite directions and are easy to swap. Depth is measured down from the root (the root is shallowest, depth 0). Height is measured up from the deepest leaf (a leaf has height 0 in its own subtree; the whole tree's height is the root's height). A useful sanity check: the height of the tree equals the maximum depth over all vertices. When an algorithm's cost is "$O(\text{height})$," padding the tree with deep chains is what hurts you.

💡 Intuition: A rooted tree is a recursive data structure, exactly as Chapter 7 described one. A rooted tree is "a root, plus a list of rooted trees (the subtrees hanging off its children)." That recursive view — root + children-that-are-themselves-trees — is why nearly every tree algorithm is three lines: do something at the root, then recurse on each child. Hold onto this; it is the engine of section 31.3.

🔄 Check Your Understanding 1. In the tree above, what is the depth of $g$, and what is the height of the subtree rooted at $b$? 2. True or false: in a rooted tree, a vertex can have many children but only one parent. Why? 3. If a rooted tree has height $h$, what is the largest possible depth of any vertex?

Answers

$g$ has depth 2. The subtree rooted at $b$ is $b \to e \to h$ plus the leaf $f$; its height is 2 (the path $b \to e \to h$). 2. True. The root has no parent; every other vertex has exactly one parent because there is a unique path to the root (Theorem 31.1), and the parent is the first step of that path. Children, by contrast, are unrestricted. 3. Exactly $h$ — height is defined as the maximum depth, so the deepest vertex sits at depth $h$.

31.3 Binary trees and traversals

Most trees in practice cap how many children a node may have. The most important cap is two.

Definition (binary tree). A binary tree is a rooted tree in which every vertex has at most two children, and each child is distinguished as either the left child or the right child. (The left/right distinction matters even when a node has only one child: a left-only node differs from a right-only node.) A binary tree is full if every internal node has exactly two children, and complete if every level is filled except possibly the last, which is filled left to right.

The left/right labeling is what makes binary trees so versatile — it lets us encode meaning in position, which is the whole idea behind the binary search tree (31.4), the binary heap (31.5), and the expression tree (31.5).

A natural Python representation follows the recursive definition directly:

class Node:
    """A binary-tree node: a value plus optional left/right children."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

#       1
#      / \      Build this tree by hand:
#     2   3
#    / \
#   4   5
root = Node(1, Node(2, Node(4), Node(5)), Node(3))

Traversing a binary tree

To do anything with a tree — print it, evaluate it, search it — you must visit its nodes in some order. A traversal is a systematic visit of every node exactly once. For binary trees, three orders dominate, and they differ only in when you visit the root relative to its subtrees.

Definition (preorder, inorder, postorder). For a binary tree with root $r$, left subtree $L$, and right subtree $R$, the three depth-first traversals are defined recursively: - Preorder: visit $r$, then preorder-traverse $L$, then preorder-traverse $R$. (Root first.) - Inorder: inorder-traverse $L$, then visit $r$, then inorder-traverse $R$. (Root between.) - Postorder: postorder-traverse $L$, then postorder-traverse $R$, then visit $r$. (Root last.) An empty subtree contributes nothing. (A fourth order, level-order or breadth-first, visits nodes by increasing depth and is just BFS from Chapter 28 applied to a tree.)

The recursive definitions translate to code with no friction:

def preorder(node):
    if node is None:
        return []
    return [node.val] + preorder(node.left) + preorder(node.right)

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

def postorder(node):
    if node is None:
        return []
    return postorder(node.left) + [node.val] + postorder(node.right)

Let us hand-trace all three on the tree built above (root 1; left subtree rooted at 2 with children 4, 5; right child 3). Take preorder: visit 1, then recurse left into the subtree at 2. There: visit 2, recurse left to leaf 4 (visit 4), recurse right to leaf 5 (visit 5). Back up; now recurse right from the root into 3 (visit 3). Reading the visits in order: $1, 2, 4, 5, 3$.

For inorder on the same tree: recurse left first, all the way down. In the subtree at 2: inorder of its left (leaf 4) gives 4, then visit 2, then inorder of its right (leaf 5) gives 5 — so the left subtree yields $4, 2, 5$. Then visit the root 1. Then the right subtree (leaf 3) yields 3. Result: $4, 2, 5, 1, 3$.

For postorder: children before parents everywhere. Left subtree at 2 yields $4, 5, 2$ (both children, then the node); then the right child yields 3; then the root 1 comes last. Result: $4, 5, 2, 3, 1$.

print(preorder(root))    # [1, 2, 4, 5, 3]
print(inorder(root))     # [4, 2, 5, 1, 3]
print(postorder(root))   # [4, 5, 2, 3, 1]
# Expected output:
# [1, 2, 4, 5, 3]
# [4, 2, 5, 1, 3]
# [4, 5, 2, 3, 1]

Each order is the right tool for a different job, and the rule is memorable:

Traversal	Root visited	Canonical use
Preorder	before children	Copy/serialize a tree; print a directory listing top-down
Inorder	between children	Read a binary search tree's keys in sorted order (31.4)
Postorder	after children	Evaluate an expression tree; safely delete/free a tree

💡 Intuition: Postorder is "bottom-up"; preorder is "top-down." You free a tree in postorder because you must release the children before the parent (releasing the parent first would orphan them). You evaluate 2 * (3 + 4) in postorder because you need both operands' values before you can apply the operator. You serialize in preorder because you want to write the parent (so the reader can create it) before its children. Match the traversal to when the parent's work depends on the children's.

🧩 Productive Struggle: Here are the preorder and inorder traversals of some binary tree: preorder $= [A, B, D, E, C]$, inorder $= [D, B, E, A, C]$. Can you reconstruct the tree? Try before reading on — the key realization is that preorder hands you the root first ($A$), and inorder then tells you everything left of $A$ ($D, B, E$) versus right of $A$ ($C$). Recurse on each side.

Answer

Root is $A$ (first in preorder). In inorder, $D, B, E$ precede $A$ (left subtree) and $C$ follows (right subtree). The next preorder element after $A$ is $B$, so $B$ roots the left subtree; in inorder, $D$ precedes $B$ and $E$ follows, so $B$ has left child $D$ and right child $E$. The right subtree is the single node $C$. Final tree: $A$ with left child $B$ (children $D$, $E$) and right child $C$. Preorder with inorder uniquely determines a binary tree — a fact worth remembering.

🔄 Check Your Understanding 1. Which traversal of a binary search tree produces its keys in sorted order? 2. For the tree Node(1, Node(2), Node(3)), give all three traversals. 3. Why must you use postorder, not preorder, to recursively free every node of a tree in C?

Answers

Inorder. 2. Preorder $[1,2,3]$; inorder $[2,1,3]$; postorder $[2,3,1]$. 3. In postorder you free both children before the parent. With preorder you would free the parent first, but the parent holds the pointers to its children — once it is freed those pointers are dangling and you can no longer reach the children to free them (a memory leak, or undefined behavior if you read the freed pointers).

31.4 Binary search trees

We now spend a binary tree's left/right structure on the single most useful idea in this chapter: an ordering invariant that turns a tree into a searchable dictionary.

Imagine storing a set of integer keys so that you can answer "is 37 present?" quickly. A sorted array lets you binary-search in $O(\log n)$, but inserting a new key costs $O(n)$ because you must shift elements. A linked list inserts in $O(1)$ but searches in $O(n)$. A binary search tree (BST) aims for the best of both: $O(\log n)$ search and $O(\log n)$ insertion — when it stays balanced.

Definition (binary search tree). A binary search tree is a binary tree in which every node's key is greater than all keys in its left subtree and less than all keys in its right subtree. (This is the BST property or BST invariant; we assume distinct keys for simplicity.)

The invariant is local — it constrains each node relative to its two subtrees — but it has a global consequence: an inorder traversal visits the keys in sorted order (left subtree first, which by the invariant holds all the smaller keys, then the node, then the larger keys). That is the cleanest possible proof that the structure really does encode an ordering, and it is a one-line corollary of the definition.

To search for a key, you exploit the invariant to discard half the tree at each step: compare the target to the current node; if equal, done; if smaller, the target can only be in the left subtree; if larger, only in the right. Each comparison drops you one level deeper.

def bst_search(node, key):
    """Return the node holding key, or None. O(h) comparisons."""
    while node is not None:
        if key == node.val:
            return node
        node = node.left if key < node.val else node.right
    return None

Insertion follows the identical search path and, when it falls off the bottom of the tree (reaches a missing child), attaches the new key there — which is exactly the spot the BST property requires.

The strategy first (why search is correct). The loop maintains an invariant (Chapter 6's loop invariant, exactly): if the key is anywhere in the tree, it is in the subtree currently rooted at node. Each iteration preserves this because the BST property guarantees the key, if present, lies on the side we move toward and never on the side we discard. When node becomes None the subtree is empty, so by the invariant the key is absent. When we find it, the invariant says we have the right node.

Worked example. Insert the keys $50, 30, 70, 20, 40, 60$ in that order into an empty BST, then search for $40$.

Insert 50: it becomes the root.
Insert 30: $30 < 50$, go left; left is empty, so 30 becomes 50's left child.
Insert 70: $70 > 50$, go right; right empty, 70 becomes 50's right child.
Insert 20: $20 < 50$ (left) then $20 < 30$ (left); empty, 20 becomes 30's left child.
Insert 40: $40 < 50$ (left) then $40 > 30$ (right); empty, 40 becomes 30's right child.
Insert 60: $60 > 50$ (right) then $60 < 70$ (left); empty, 60 becomes 70's left child.

The resulting tree:

An inorder traversal reads $20, 30, 40, 50, 60, 70$ — sorted, as promised. Searching for 40: compare $40 < 50$ (go left to 30), then $40 > 30$ (go right to 40), match. Two comparisons.

Why height is everything

Each BST operation walks one root-to-node path, so it costs $O(h)$ where $h$ is the height. The whole performance story is therefore about height:

Balanced (height $\approx \log_2 n$): search/insert/delete in $O(\log n)$. This is the win.
Degenerate (height $\approx n$): if you insert already-sorted keys $1, 2, 3, \dots, n$, every new key goes right, and the "tree" becomes a linked list of height $n - 1$. Operations degrade to $O(n)$ — no better than the list we were trying to beat.

   sorted insert 1,2,3,4 →   1
                              \
                               2
                                \
                                 3
                                  \
                                   4   (height 3, a degenerate "stick")

⚠️ Common Pitfall: A plain BST gives no balance guarantee. Its $O(\log n)$ performance assumes the tree stays bushy, which it does on random insertions but emphatically not on sorted or adversarial input — the worst case is the $O(n)$ stick above. Real systems use self-balancing BSTs (AVL trees, red-black trees) that perform rotations to keep $h = O(\log n)$ guaranteed. Python's dict/set sidestep the issue with hashing (Chapter 26), and sortedcontainers and most language standard libraries (e.g., C++ std::map, Java TreeMap) use balanced trees under the hood. The math you need to analyze them is exactly the height reasoning here.

🔄 Check Your Understanding 1. You insert $4, 2, 6, 1, 3, 5, 7$ into an empty BST. What is its height, and what does an inorder traversal print? 2. Why does searching a BST cost $O(h)$ rather than $O(n)$ in general? 3. What property of the insertion order produces a degenerate (stick-shaped) BST?

Answers

Height 2 — 4 is the root, 2 and 6 are its children, and 1,3,5,7 are leaves; this is a perfectly balanced tree of 7 nodes. Inorder prints $1,2,3,4,5,6,7$. 2. Each comparison moves you strictly one level deeper and discards an entire subtree, so the number of comparisons is bounded by the height $h$, not the node count $n$. Only when $h \approx n$ (a degenerate tree) do the two coincide.

Inserting keys in sorted (or reverse-sorted) order: each new key is larger (or smaller) than all existing keys, so it always attaches on the same side, building a single chain.

31.5 Tree applications: heaps, expression trees, and tries

Three more tree-shaped data structures earn their place in every programmer's toolbox. Each spends the tree's structure on a different kind of meaning.

Binary heaps

A priority queue (Chapter 29, where Dijkstra's algorithm needed one) supports two operations: insert an item with a priority, and remove the item of highest priority. The standard implementation is a binary heap.

Definition (heap). A (binary, min-) heap is a complete binary tree satisfying the heap property: every node's key is less than or equal to the keys of its children. Consequently the minimum key sits at the root. (A max-heap reverses the inequality so the maximum is at the root.)

The heap property is weaker than the BST property — it relates each node only to its children, not to a whole subtree's ordering — and that weakness is the point: it is cheap to maintain, requiring only a walk up or down one path, so insert and remove-min each cost $O(\log n)$, while peeking at the minimum is $O(1)$.

🔗 Connection. In Chapter 29 we used a priority queue as a black box to make Dijkstra's algorithm efficient, and promised the implementation here. This is it: the heap is the priority queue. Every time Dijkstra "extracts the closest unfinished vertex," it is doing a heap remove-min in $O(\log n)$ — the difference between an $O((V+E)\log V)$ routing algorithm and a quadratic one.

Because a heap is a complete binary tree, it has a beautiful representation with no pointers at all — a flat array. Store the root at index 0; then for the node at index $i$:

$$\text{left child} = 2i + 1, \qquad \text{right child} = 2i + 2, \qquad \text{parent} = \left\lfloor \frac{i-1}{2} \right\rfloor.$$

The completeness of the tree is exactly what makes this index arithmetic exact and gap-free.

def heap_parent(i):  return (i - 1) // 2
def heap_left(i):    return 2 * i + 1
def heap_right(i):   return 2 * i + 2

# Min-heap stored as an array; verify the heap property for the root's children.
heap = [1, 3, 2, 7, 4, 5]   # 1 at root; children 3,2; then 7,4,5
print(heap[0], heap[heap_left(0)], heap[heap_right(0)])
# Expected output:
# 1 3 2

You can confirm the floor/ceiling arithmetic from Chapter 9 makes the indices line up: the children of index 0 are 1 and 2; the children of index 1 are 3 and 4; the children of index 2 are 5 and (would be) 6. Parent of index 5 is $\lfloor 4/2\rfloor = 2$. Every node lands in a unique slot.

Expression trees

Recall the compiler hook from the overview. An arithmetic expression like 2 * (3 + 4) is naturally a binary tree: each internal node is an operator and each leaf is an operand.

Definition (expression tree). An expression tree for an arithmetic expression is a binary tree whose leaves are operands (numbers or variables) and whose internal nodes are binary operators; the subtree rooted at an operator node represents the result of applying that operator to the values of its two child subtrees.

        *          represents 2 * (3 + 4)
       / \
      2   +
         / \
        3   4

The traversals of section 31.3 now reveal their deeper meaning:

Inorder (with parentheses) reproduces the usual infix notation: $2 * (3 + 4)$.
Postorder produces postfix / Reverse Polish Notation: $2\ 3\ 4\ +\ *$ — the form a stack-based evaluator (and many real virtual machines) consumes directly.
Preorder produces prefix / Polish notation: $*\ 2\ +\ 3\ 4$.

To evaluate the expression, you traverse in postorder: compute both operand subtrees, then apply the operator. This is precisely why postorder is "the evaluation order."

def evaluate(node):
    """Evaluate an expression tree. Leaves hold numbers; internal nodes hold '+','*', etc."""
    if node.left is None and node.right is None:   # a leaf (operand)
        return node.val
    a, b = evaluate(node.left), evaluate(node.right)   # postorder: children first
    return a + b if node.val == '+' else a * b         # then apply the operator

tree = Node('*', Node(2), Node('+', Node(3), Node(4)))
print(evaluate(tree))
# Expected output:
# 14

Hand-tracing: evaluate on the root * first evaluates its left child (leaf 2 → 2) and right child (the + node). Evaluating + evaluates its children (leaves 3 and 4) and returns $3 + 4 = 7$. Back at the root, we apply * to $2$ and $7$, giving $14$. The recursion is the postorder traversal.

Tries

A trie (from retrieval, usually pronounced "try") stores a set of strings as a tree of shared prefixes — the structure behind autocomplete and IP-routing tables.

Definition (trie). A trie is a rooted tree in which each edge is labeled with a character, so that the labels along the path from the root to a node spell a string prefix. Nodes are flagged to mark where a complete stored word ends. Strings that share a prefix share the corresponding path from the root.

        (root)
       /  |
      c   d
      |   |
      a   o
     /|   |
    t r   g       Stored words: "cat", "car", "dog"
    *  *  *       (* marks the end of a complete word)

Looking up a length-$m$ string costs $O(m)$ — independent of how many words the trie holds — because you simply follow $m$ edges. That is the trie's superpower: the cost scales with the query length, not the dictionary size, and shared prefixes are stored once. Autocomplete is then "walk to the node for the typed prefix, and report every word in the subtree below it" — a subtree traversal, exactly the operation 31.3 gave you.

🔄 Check Your Understanding 1. In an array-based heap, what are the array indices of the two children of the node at index 4? 2. Give the postfix (postorder) form of the expression tree for $(8 - 2) * 5$. 3. Why does a trie lookup for a word of length $m$ take $O(m)$ time regardless of the number of words stored?

Answers

Left child $2 \cdot 4 + 1 = 9$; right child $2 \cdot 4 + 2 = 10$. 2. The tree is * with left child -(8, 2) and right child 5; postorder visits children before parents: $8\ 2\ -\ 5\ *$.

A lookup follows exactly one edge per character, so it takes $m$ steps; the branching factor and the total number of stored words affect memory and sibling counts but not the length of the single root-to-node path you walk.

31.6 Huffman coding: optimal prefix codes

We close with a genuine algorithmic gem that ties trees to data compression — and to coding theory, which Chapter 38 develops in full. The problem: encode a stream of symbols in as few bits as possible.

Fixed-length encoding (like ASCII's 8 bits per character) is wasteful when some symbols are far more common than others. English text is mostly e, t, a; the letter z is rare. It would be smarter to give common symbols short codes and rare symbols long codes — a variable-length code. But variable-length codes have a hazard: if e = 0 and t = 01, then the bitstream 01 is ambiguous (is it e then... or t?). We need codes that can be decoded without lookahead.

Definition (prefix code). A prefix code is an assignment of binary codewords to symbols such that no codeword is a prefix of another. Prefix codes are exactly the codes representable as a binary tree: each symbol sits at a leaf, and its codeword is the sequence of left(0)/right(1) turns on the path from the root. Because symbols live only at leaves, no codeword can be a prefix of another, and decoding is unambiguous — walk the tree from the root, emit a symbol on reaching a leaf, return to the root, repeat.

Given symbol frequencies, which prefix code is shortest on average? Define the cost of a code tree as

$$\text{cost} = \sum_{\text{symbols } s} f(s) \cdot d(s),$$

where $f(s)$ is the frequency of symbol $s$ and $d(s)$ is its depth in the tree (the length of its codeword). We want to minimize this weighted sum. Huffman's algorithm builds the optimal tree with a strikingly simple greedy rule.

Definition (Huffman coding). Huffman coding builds an optimal prefix-code tree as follows: 1. Make each symbol a single-node tree, tagged with its frequency. 2. Repeatedly remove the two trees of smallest frequency and merge them under a new internal node whose frequency is the sum of theirs. 3. Stop when one tree remains; it is the Huffman tree. Read each symbol's codeword off its root path.

💡 Intuition: The greedy insight is that the two rarest symbols should sit deepest (longest codewords), because they are used least and so contribute least extra cost per bit of depth. Merging the two smallest frequencies first buries them at the bottom and lets the common symbols float toward the root with short codes. The priority queue from 31.5 is the perfect engine for "repeatedly grab the two smallest" — Huffman is a heap in action.

Worked example. Suppose the frequencies are $A:5,\ B:2,\ C:1,\ D:1$ (think: out of 9 symbols, $A$ appears 5 times, etc.). Build the Huffman tree:

Smallest two are $C{:}1$ and $D{:}1$. Merge into a node $\{C,D\}$ of frequency $2$.
Queue now holds $A{:}5,\ B{:}2,\ \{C,D\}{:}2$. Smallest two are $B{:}2$ and $\{C,D\}{:}2$ (tie broken arbitrarily). Merge into $\{B,C,D\}$ of frequency $4$.
Queue holds $A{:}5,\ \{B,C,D\}{:}4$. Merge into the root of frequency $9$.

The tree (left edge = 0, right edge = 1):

Read off the codewords: $A = 0$ (one bit), $B = 10$, $C = 110$, $D = 111$. The common symbol $A$ got the shortest code; the rare $C, D$ got the longest. The total cost is

$$5(1) + 2(2) + 1(3) + 1(3) = 5 + 4 + 3 + 3 = 15 \text{ bits},$$

versus $9 \times 2 = 18$ bits for a fixed 2-bit code — a real saving, and provably the best any prefix code can do for these frequencies.

Why Huffman is optimal

That "provably the best" is a theorem, and its proof is a beautiful exchange argument — the same proof shape you will meet again for greedy MST algorithms in Chapter 32.

Theorem 31.4 (Huffman optimality). The prefix code produced by Huffman's algorithm minimizes the average codeword length $\sum_s f(s)\, d(s)$ over all prefix codes for the given frequencies.

The strategy first. Two ideas. (Lemma — the greedy choice is safe.) In some optimal tree, the two lowest-frequency symbols are siblings at the deepest level. Why: take any optimal tree; if the two rarest symbols are not the deepest siblings, swap them with whatever is deepest. Swapping a lower-frequency symbol down and a higher-frequency symbol up cannot increase the cost — you moved small weight to a deep spot and large weight to a shallow spot. So an optimal tree with the two rarest as deepest siblings exists; Huffman's first merge is consistent with optimality. (Induction — the rest follows.) After merging the two rarest into one super-symbol, Huffman recursively builds an optimal tree for the smaller problem; an induction on the number of symbols shows the whole tree is optimal. We prove the lemma, the heart of the argument.

Proof of the exchange lemma. Let $T$ be an optimal prefix-code tree, and let $x, y$ be the two symbols of lowest frequency, with $f(x) \le f(y) \le f(s)$ for all other symbols $s$. In an optimal tree there are two deepest leaves that are siblings (if a deepest leaf had no sibling, its parent would have a single child, and promoting that leaf one level up would strictly lower the cost — contradicting optimality). Call those deepest siblings $a$ and $b$, at depth $d_{\max}$.

Now exchange $x$ with $a$ and $y$ with $b$. Consider the cost change from swapping $x$ and $a$ alone. Originally $x$ is at some depth $d(x) \le d_{\max}$ and $a$ is at $d_{\max}$. After the swap, $x$ moves to $d_{\max}$ and $a$ moves to $d(x)$. The change in total cost is

$$\big[f(x)\,d_{\max} + f(a)\,d(x)\big] - \big[f(x)\,d(x) + f(a)\,d_{\max}\big] = \big(f(a) - f(x)\big)\big(d(x) - d_{\max}\big).$$

Since $x$ is a minimum-frequency symbol, $f(a) - f(x) \ge 0$; and since $d_{\max}$ is the maximum depth, $d(x) - d_{\max} \le 0$. The product of a non-negative and a non-positive number is $\le 0$, so the swap does not increase the cost. The same holds for swapping $y$ with $b$. As $T$ was optimal, the cost cannot strictly decrease, so it is unchanged: we now have an optimal tree in which the two rarest symbols $x, y$ are deepest siblings — exactly the pair Huffman merges first. $\blacksquare$

🚪 Threshold Concept. Huffman's correctness is your first full greedy exchange argument, and the pattern is one of the most powerful in all of algorithm design: show that some optimal solution agrees with the greedy choice (by exchanging without loss), then induct. Once you internalize "prove the greedy choice is safe, then recurse," you hold the key to MST correctness (Chapter 32, the cut property), to scheduling algorithms, and to a large swath of optimization. Greedy algorithms are seductive and usually wrong; the exchange argument is how you tell the rare correct ones from the many plausible-but-broken ones — which is theme two of this book, that a passing test is not a proof of correctness.

🔄 Check Your Understanding 1. Why must every symbol in a prefix-code tree be a leaf, never an internal node? 2. For frequencies $A:4, B:3, C:2$, build the Huffman tree and give the codeword lengths. 3. In the exchange lemma, which sign fact forces $(f(a)-f(x))(d(x)-d_{\max}) \le 0$?

Answers

If a symbol sat at an internal node, its codeword would be a prefix of every codeword in the subtree below it (those paths extend it), violating the prefix-free property and making decoding ambiguous. Leaves have no descendants, so no leaf codeword extends another. 2. Merge the two smallest, $C{:}2$ and $B{:}3$, into a node of frequency 5; then merge that with $A{:}4$ for a root of

Codeword lengths: $A = 1$ bit, $B = 2$ bits, $C = 2$ bits. (Cost $4(1)+3(2)+2(2) = 14$.)

$f(a) - f(x) \ge 0$ because $x$ has minimum frequency, and $d(x) - d_{\max} \le 0$ because $d_{\max}$ is the maximum depth; a non-negative times a non-positive is non-positive.

Project Checkpoint: trees in the `dmtoolkit` graph module

The Toolkit's graphs.py module has been growing since Chapter 27. Trees are graphs, so this chapter's increment lives right there. We add a recognizer — is this graph a tree? — and a recursive tree-height function, both of which exercise the chapter's two central facts: a tree is connected and acyclic, and a connected graph on $n$ vertices is a tree if and only if it has exactly $n-1$ edges (Theorem 31.2 plus its converse from the 31.1 Check Your Understanding).

Add to dmtoolkit/graphs.py:

def is_tree(g):
    """Return True iff undirected Graph g is a tree: connected with |V|-1 edges.
    Uses Theorem 31.2 + converse: connected and (edges == vertices - 1) <=> tree."""
    verts = g.vertices()                  # set of vertices (from the Ch. 27 Graph class)
    n = len(verts)
    if n == 0:
        return False                      # convention: the empty graph is not a tree
    edge_count = sum(len(g.neighbors(v)) for v in verts) // 2   # handshaking (Ch. 27)
    if edge_count != n - 1:
        return False                      # wrong edge count: cannot be a tree
    # connectivity check via BFS from any start (bfs is the Ch. 28 toolkit function)
    start = next(iter(verts))
    return len(bfs(g, start)) == n        # reached every vertex?

def tree_height(g, root):
    """Height of tree g rooted at `root`: longest root-to-leaf path length."""
    def h(node, parent):
        children = [w for w in g.neighbors(node) if w != parent]
        return 0 if not children else 1 + max(h(c, node) for c in children)
    return h(root, None)

Trace it on the path graph $1 - 2 - 3 - 4$ (a tree: 4 vertices, 3 edges, connected). is_tree computes edge_count = 3 = 4 - 1, BFS from 1 reaches all 4 vertices, so it returns True. Rooting at vertex 1, tree_height recurses $1 \to 2 \to 3 \to 4$ and returns 3.

# Expected output (for the path graph 1-2-3-4 built with the Ch. 27 Graph class):
# is_tree(g)        -> True
# tree_height(g, 1) -> 3

This increment leans on three earlier pieces of the Toolkit — the Graph class and neighbors (Chapter 27), the handshaking lemma to count edges (Chapter 27), and bfs (Chapter 28) — which is the whole point of building one library across the book: each chapter's code stands on the last. The recognizer feeds directly into Chapter 32, where you will construct spanning trees and minimum spanning trees: is_tree is exactly the validator you will run on a candidate spanning tree to confirm Kruskal's or Prim's output is correct.

Toolkit so far: logic.py (Ch. 1–3), sets.py (Ch. 8), relations.py (Ch. 12–13), combinatorics.py (Ch. 15–17), recurrences.py (Ch. 6, 18–19), probability.py (Ch. 20–21), numbertheory.py and crypto.py (Ch. 22–25), and the steadily growing graphs.py (Ch. 27–31). Next chapter turns is_tree from a checker into a builder.

Summary

A tree is a connected, acyclic undirected graph; a forest is an acyclic graph (a disjoint union of trees). Everything else in the chapter flows from those two words.

Core theorems.

Result	Statement
Characterizations (Thm 31.1)	Tree ⟺ unique path between every pair ⟺ minimally connected ⟺ maximally acyclic
Leaf existence (Lem 31.3)	Every tree with $\ge 2$ vertices has a vertex of degree 1
Edge count (Thm 31.2)	A tree on $n$ vertices has exactly $n - 1$ edges (proof: strip a leaf, induct)
Huffman optimality (Thm 31.4)	Huffman's greedy merge yields a minimum-average-length prefix code

Rooted-tree vocabulary. Root (depth 0); parent/child (edge oriented away from root); leaf (no children) vs. internal node; ancestors/descendants; subtree rooted at $v$; depth (down from root) vs. height (= max depth). A binary tree caps children at two, labeled left/right.

Traversals (differ only in when the root is visited):

Traversal	Order	Use
Preorder	root, $L$, $R$	serialize/copy a tree
Inorder	$L$, root, $R$	read a BST in sorted order; infix expression
Postorder	$L$, $R$, root	evaluate an expression tree; free a tree; postfix (RPN)

Decision aid — which tree structure?

Need	Use	Cost
Ordered dictionary, sorted iteration	binary search tree (balanced)	$O(\log n)$ search/insert
Repeatedly extract min/max priority	binary heap (array-backed)	$O(\log n)$ insert/remove, $O(1)$ peek
Evaluate / represent arithmetic	expression tree	$O(n)$ evaluate (postorder)
Store/query strings by prefix	trie	$O(m)$ per length-$m$ string
Compress by frequency	Huffman tree (prefix code)	optimal average bits/symbol

Key formulas. BST/heap operations cost $O(h)$, the tree height. Array heap indices: left $= 2i+1$, right $= 2i+2$, parent $= \lfloor (i-1)/2 \rfloor$. Huffman cost $= \sum_s f(s)\, d(s)$.

Pitfalls. Confusing depth (from root) with height (from leaves); assuming a plain BST is balanced (sorted input makes it a degenerate $O(n)$ stick — use a self-balancing tree); freeing a tree in preorder (orphans the children — use postorder); placing a symbol at an internal node of a prefix-code tree (breaks prefix-freeness).

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

(Ch. 27) State the handshaking lemma. In the is_tree checkpoint, we counted a tree's edges as $\frac{1}{2}\sum_v \deg(v)$ — why does that formula give the edge count, and what does it evaluate to for any tree on $n$ vertices?

(Ch. 27) A tree is a connected graph with no cycles. Using only Chapter 27's definitions, explain why a tree must be a simple graph (no loops, no multi-edges).

(Ch. 7) The proof that a tree has $n-1$ edges strips a leaf and recurses. Which proof technique from Chapter 7 is this an instance of, and what was the analogous full-binary-tree fact you proved there?

(Ch. 7) Give a recursive (inductive) definition of a binary tree in the style of Chapter 7's recursively defined sets (basis clause + recursive clause).

Answers

The handshaking lemma: $\sum_{v \in V}\deg(v) = 2\lvert E\rvert$, because each edge contributes exactly 2 to the total degree (one to each endpoint). Solving gives $\lvert E\rvert = \frac{1}{2}\sum_v \deg(v)$, and for a tree this equals $n - 1$ (Theorem 31.2). 2. A loop (edge from a vertex to itself) is a cycle of length 1, forbidden by acyclicity; two parallel edges between the same pair $u, v$ form a cycle of length 2 (go over on one, back on the other), also forbidden. So a tree has neither — it is simple. 3. Structural induction (Chapter 7) — inducting on the recursive structure of the tree by reducing to a smaller tree. The analogous Chapter 7 fact: a full binary tree with $i$ internal nodes has $i + 1$ leaves. 4. Basis: a single node is a binary tree (with empty left and right subtrees). Recursive clause: if $L$ and $R$ are binary trees (or empty), then a root node with left subtree $L$ and right subtree $R$ is a binary tree. (Exclusion: nothing else is.)

What's Next

You can now recognize a tree, prove it has $n-1$ edges, and wield rooted/binary trees, BSTs, heaps, expression trees, tries, and Huffman codes. The natural next question turns the edge-count theorem into an optimization problem: given a connected graph that is not a tree — say a network of cities with many possible road segments, each with a cost — which subset of edges forms a tree that touches every vertex most cheaply? That is the minimum spanning tree problem, and it is where the greedy exchange argument you met in Huffman's proof returns in full force as the cut property. Chapter 32 develops Kruskal's and Prim's algorithms, the union-find data structure that makes Kruskal fast, and the correctness proofs that certify these famously simple greedy algorithms actually work — using the is_tree validator you just built to check their output.

Prerequisites

Learning Objectives

In This Chapter

Chapter 31: Trees

Overview

Learning Paths

31.1 Trees and their properties

Equivalent characterizations

The counting theorem: $n - 1$ edges

31.2 Rooted trees and terminology

31.3 Binary trees and traversals

Traversing a binary tree

31.4 Binary search trees

Why height is everything

31.5 Tree applications: heaps, expression trees, and tries

Binary heaps

Expression trees

Tries

31.6 Huffman coding: optimal prefix codes

Why Huffman is optimal

Project Checkpoint: trees in the dmtoolkit graph module

Summary

Spaced Review

What's Next

Project Checkpoint: trees in the `dmtoolkit` graph module