

Learning Objectives

  • Explain the LIFO principle and implement a stack using a Python list
  • Explain the FIFO principle and implement a queue using collections.deque
  • Apply stacks to solve problems like bracket matching, undo systems, and backtracking
  • Apply queues to solve problems like task scheduling and breadth-first traversal
  • Define abstract data type (ADT) and distinguish interface from implementation
  • Describe linked lists, trees, graphs, and hash tables at a conceptual level
  • Choose the appropriate data structure for a given problem using trade-off analysis

Chapter 20: Intro to Data Structures: Stacks, Queues, and Beyond

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." — Linus Torvalds

Chapter Overview

You already know lists, dictionaries, sets, and tuples. You can store data, look things up, iterate, and sort. So why does computer science have an entire subfield dedicated to data structures? Because general-purpose containers are like Swiss Army knives — useful everywhere, ideal nowhere. When a problem has a specific shape, a specialized data structure doesn't just make your code cleaner — it makes certain bugs structurally impossible and certain operations dramatically faster.

This chapter introduces two of the most important specialized data structures in computing: the stack and the queue. You'll build both from scratch, see how they solve real problems, and learn to recognize when a problem is secretly a stack problem or a queue problem. Along the way, you'll encounter one of the most important ideas in computer science — the abstract data type — which separates what a data structure does from how it does it.

We'll close with a conceptual tour of the data structures you'll meet in CS2: linked lists, trees, graphs, and hash tables. You won't implement them here, but understanding why they exist and what problems they solve will make the transition to more advanced courses far smoother.

In this chapter, you will learn to:

  • Implement a stack (LIFO) and use it for backtracking, undo, and bracket matching
  • Implement a queue (FIFO) and use it for task scheduling and processing pipelines
  • Think about data structures as abstract interfaces, not just concrete code
  • Describe when and why linked lists, trees, graphs, and hash tables are used
  • Choose the right data structure for a given problem by analyzing trade-offs

🏃 Fast Track: If you're comfortable with list operations and OOP from Chapters 8 and 14, skim 20.1 and dive into 20.2. Don't skip 20.6 (Abstract Data Types) — it's the threshold concept that changes how you think about all data structures.

🔬 Deep Dive: After this chapter, explore Python's queue module (queue.Queue, queue.LifoQueue, queue.PriorityQueue) for thread-safe implementations used in concurrent programming. The heapq module provides a priority queue via a binary heap — a data structure you'll study in depth in CS2.
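For a quick taste of the heapq module mentioned above, here's a minimal priority-queue sketch; the task names are invented for illustration:

```python
import heapq

# heapq maintains a min-heap inside a plain list: heappop always
# returns the smallest item. (priority, task) tuples make lower
# numbers "more urgent".
tasks: list[tuple[int, str]] = []
heapq.heappush(tasks, (3, "write report"))
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (2, "reply to email"))

while tasks:
    priority, name = heapq.heappop(tasks)
    print(priority, name)
# 1 fix outage
# 2 reply to email
# 3 write report
```

Items come out in priority order regardless of insertion order — a behavior neither a stack nor a queue provides.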


20.1 Beyond Lists and Dicts

Think about the tools in a carpenter's workshop. A hammer works for dozens of tasks, but a carpenter also keeps specialized tools — a miter saw for precise angled cuts, a planer for flattening boards, a clamp for holding pieces in place while glue dries. You could substitute a hammer and brute force for most of these, but the specialized tool produces better results with less effort and fewer mistakes.

Data structures work the same way. Python's list is the hammer — versatile, powerful, your go-to for most problems. But some problems have constraints that a list doesn't enforce:

  • An undo system needs to reverse the most recent action first. A list can do this (just pop() from the end), but nothing prevents you from accidentally removing an item from the middle. A stack makes "last in, first out" the only option.
  • A print queue needs to process documents in the order they arrived. A list can do this too, but list.pop(0) is O(n), which gets slow fast. A queue enforces "first in, first out" and keeps the operation O(1).
  • A social network needs to represent who follows whom — relationships, not just items. No built-in Python type captures this naturally. A graph does.

The key insight is this: constraints are features, not limitations. When a data structure restricts what operations are allowed, it simultaneously prevents an entire category of bugs and communicates intent to anyone reading your code.

🔄 Spaced Review (Chapter 17): In Chapter 17, you learned that algorithm choice affects performance dramatically — O(n) vs. O(n log n) vs. O(n^2). Data structure choice has the same impact. The structure you choose determines which operations are fast, which are slow, and which are impossible. Keep Big-O thinking active as you read this chapter.


20.2 Stacks: Last In, First Out

A stack is a collection where the last item added is the first item removed. This pattern is called LIFO — Last In, First Out.

Think of a stack of plates in a cafeteria. You can only add a plate to the top, and you can only remove the plate on top. You can't pull a plate from the middle without toppling everything above it. That's exactly how a stack data structure works.

The Stack Interface

A stack supports exactly three core operations:

| Operation | Description | Analogy |
|---|---|---|
| push(item) | Add an item to the top | Place a plate on the stack |
| pop() | Remove and return the top item | Take the top plate off |
| peek() | Look at the top item without removing it | Glance at the top plate |

Two helper operations round out the interface:

  • is_empty() — returns True if the stack has no items
  • size() — returns the number of items

That's it. No indexing. No searching. No inserting in the middle. These restrictions are the whole point.

Implementing a Stack

Here's a complete stack implementation using a Python list as the internal storage. The list's append() and pop() methods — both O(1) — map directly to stack push and pop:

class Stack:
    """A Last-In, First-Out (LIFO) data structure."""

    def __init__(self):
        self._items: list = []

    def push(self, item) -> None:
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item. Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        """Return True if the stack contains no items."""
        return len(self._items) == 0

    def size(self) -> int:
        """Return the number of items in the stack."""
        return len(self._items)

    def __repr__(self) -> str:
        return f"Stack({self._items})"

Notice three design choices:

  1. The internal list is named _items with a leading underscore. This is a Python convention from Chapter 14 — it signals "this is an implementation detail, not part of the public interface." Users of the stack should only use push, pop, peek, is_empty, and size.

  2. pop() and peek() raise IndexError when the stack is empty. This is fail-fast design. If someone tries to pop from an empty stack, they made a logic error, and the error message tells them exactly what went wrong.

  3. All operations are O(1). list.append() and list.pop() (without an index) are both amortized O(1). So our stack is efficient by default.

🔄 Spaced Review (Chapter 14): This is an example of encapsulation — hiding internal state behind a public interface. The Stack class wraps a list but only exposes stack operations. Users don't need to know (or care) that a list is used internally. They interact with a concept, not an implementation.

Using the Stack

s = Stack()
s.push("A")
s.push("B")
s.push("C")

print(s.peek())    # C  (top of stack, not removed)
print(s.pop())     # C  (removed)
print(s.pop())     # B  (removed)
print(s.size())    # 1  (only "A" remains)
print(s.pop())     # A  (removed)
print(s.is_empty())  # True

Visualizing the stack after each operation helps build intuition:

push("A"):  | A |          (bottom)
push("B"):  | A | B |
push("C"):  | A | B | C |  (top)
pop():      | A | B |       → returns "C"
pop():      | A |           → returns "B"
pop():      |   |           → returns "A"  (now empty)

The Text Adventure: Room Backtracking

In Crypts of Pythonia, the player explores rooms. When they enter a new room, we push the previous room onto a history stack. When they type "go back," we pop the most recent room. This naturally implements backtracking — the player always returns to the room they just left.

room_history = Stack()
current_room = "Entrance Hall"

def move_to(room_name: str) -> None:
    global current_room
    room_history.push(current_room)
    current_room = room_name
    print(f"You enter the {current_room}.")

def go_back() -> None:
    global current_room
    if room_history.is_empty():
        print("You're at the starting room. Nowhere to go back to.")
        return
    current_room = room_history.pop()
    print(f"You return to the {current_room}.")

# Simulate gameplay
move_to("Library")           # You enter the Library.
move_to("Secret Passage")    # You enter the Secret Passage.
move_to("Dragon's Lair")     # You enter the Dragon's Lair.

go_back()                    # You return to the Secret Passage.
go_back()                    # You return to the Library.
go_back()                    # You return to the Entrance Hall.
go_back()                    # You're at the starting room. Nowhere to go back to.

This is exactly how your web browser's "back" button works. Every page you visit is pushed onto a history stack. Clicking "back" pops the most recent page. The "forward" button uses a second stack — we'll build exactly this pattern in the undo/redo system below.

🔄 Check Your Understanding #1: If you push the values 1, 2, 3, 4 onto a stack and then pop twice, what value is on top? What is the stack's size?

Answer

After pushing 1, 2, 3, 4, the top is 4. Popping twice removes 4 then 3. The top is now 2, and the size is 2 (items 1 and 2 remain).


20.3 Stack Applications

Stacks show up everywhere in computing. Here are three important applications beyond the room backtracking example above.

Application 1: Matching Brackets

Every programmer has encountered the dreaded "unexpected EOF" or "unmatched parenthesis" error. Code editors solve this by using a stack to validate brackets as you type.

The algorithm:

  1. Scan the string left to right.

  2. When you see an opening bracket ((, [, {), push it onto the stack.

  3. When you see a closing bracket (), ], }), pop the stack and check that it matches. If the stack is empty or the brackets don't match, the input is invalid.

  4. After scanning the entire string, the stack should be empty. If it's not, there are unmatched opening brackets.

def check_brackets(text: str) -> tuple[bool, str]:
    """Check whether brackets in text are properly matched."""
    MATCH = {")": "(", "]": "[", "}": "{"}
    stack = Stack()

    for i, ch in enumerate(text):
        if ch in "([{":
            stack.push((ch, i))
        elif ch in ")]}":
            if stack.is_empty():
                return False, f"Unmatched '{ch}' at position {i}"
            top_char, top_pos = stack.pop()
            if top_char != MATCH[ch]:
                return False, (f"Mismatched '{top_char}' at position "
                               f"{top_pos} with '{ch}' at position {i}")

    if not stack.is_empty():
        top_char, top_pos = stack.pop()
        return False, f"Unmatched '{top_char}' at position {top_pos}"

    return True, "All brackets matched!"

# Test cases
print(check_brackets("{[()]}"))        # (True, 'All brackets matched!')
print(check_brackets("print('hi')"))   # (True, 'All brackets matched!')
print(check_brackets("data = [1, 2}")) # (False, "Mismatched '[' at position 7 with '}' at position 12")
print(check_brackets("((())"))         # (False, "Unmatched '(' at position 0")

Why a stack? Because brackets nest. The most recently opened bracket must be the first one closed. That's LIFO.

Application 2: Undo/Redo

An undo system uses two stacks:

  • Undo stack: Every time you perform an action, push a record of it onto the undo stack.
  • Redo stack: When you undo, move that record to the redo stack.
  • Critical rule: Any new action clears the redo stack. You've branched into a new timeline.

class TextEditor:
    """A minimal text editor with undo/redo."""

    def __init__(self):
        self.text: str = ""
        self._undo_stack = Stack()
        self._redo_stack = Stack()

    def type_text(self, new_text: str) -> None:
        self._undo_stack.push(self.text)   # save current state
        self._redo_stack = Stack()          # new action kills redo
        self.text += new_text

    def undo(self) -> None:
        if self._undo_stack.is_empty():
            print("Nothing to undo.")
            return
        self._redo_stack.push(self.text)
        self.text = self._undo_stack.pop()

    def redo(self) -> None:
        if self._redo_stack.is_empty():
            print("Nothing to redo.")
            return
        self._undo_stack.push(self.text)
        self.text = self._redo_stack.pop()

editor = TextEditor()
editor.type_text("Hello")
editor.type_text(", World")
print(editor.text)    # Hello, World

editor.undo()
print(editor.text)    # Hello

editor.undo()
print(editor.text)    # (empty string)

editor.redo()
print(editor.text)    # Hello

This is how Ctrl+Z works in virtually every application you use. The fact that it feels natural — always undoing the most recent thing — is because the underlying data structure is a stack.

Application 3: Expression Evaluation

When Python evaluates 2 + 3 * 4, it doesn't just go left to right. It respects operator precedence: multiplication before addition. Compilers and interpreters use stacks to manage this evaluation. While a full expression parser is beyond this course, the key insight is that the call stack itself — the mechanism Python uses to track function calls — is literally a stack. When function a calls function b which calls function c, the return addresses are pushed onto a stack. When c finishes, Python pops the stack to return to b. When b finishes, it pops to return to a. This is why a deeply recursive function can cause a RecursionError: maximum recursion depth exceeded — the call stack has a finite size (Chapter 18).
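You can watch the call stack at work with a tiny sketch (the function names here are arbitrary):

```python
def a() -> int:
    return b() + 1   # a's frame waits on the stack while b runs

def b() -> int:
    return c() + 1   # b's frame waits while c runs

def c() -> int:
    return 0         # c finishes first: its frame is popped first

print(a())   # 2 — frames pop in LIFO order: c, then b, then a

# The call stack is finite. Recurse without a base case and Python
# raises RecursionError instead of overflowing memory.
def forever(n: int) -> int:
    return forever(n + 1)

try:
    forever(0)
except RecursionError as err:
    print(err)   # e.g. "maximum recursion depth exceeded"
```

The nested calls finish in the reverse of the order they started — LIFO, exactly like pop() undoing the most recent push().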

🧩 Productive Struggle: Design an undo system for a text editor. What data structure would you use? Think about: what happens when you type, delete, undo, redo, then type something new? Write pseudocode before looking at the solution above. The insight that a new action clears the redo stack is the hardest part — and the most important.


20.4 Queues: First In, First Out

A queue is a collection where the first item added is the first item removed. This pattern is called FIFO — First In, First Out.

Think of a line at a coffee shop. The person who arrived first gets served first. New customers join at the back of the line. Nobody cuts in front (in a fair queue, at least). That's exactly how a queue data structure works.

The Queue Interface

| Operation | Description | Analogy |
|---|---|---|
| enqueue(item) | Add an item to the back | Join the end of the line |
| dequeue() | Remove and return the front item | Serve the next customer |
| front() | Look at the front item without removing it | See who's next |

Plus the usual is_empty() and size() helpers.

Why Not Just Use a List?

You can implement a queue with a list — append() to add at the back, pop(0) to remove from the front. But pop(0) is O(n) because Python has to shift every remaining element forward by one position. For a queue of 100,000 items, the first dequeue alone shifts nearly 100,000 elements, and draining the whole queue costs O(n^2) overall. That's terrible.

Python's collections.deque (pronounced "deck," short for "double-ended queue") solves this. It provides O(1) append() and O(1) popleft():

from collections import deque

class Queue:
    """A First-In, First-Out (FIFO) data structure."""

    def __init__(self):
        self._items: deque = deque()

    def enqueue(self, item) -> None:
        """Add an item to the back of the queue."""
        self._items.append(item)

    def dequeue(self):
        """Remove and return the front item."""
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()

    def front(self):
        """Return the front item without removing it."""
        if self.is_empty():
            raise IndexError("front of empty queue")
        return self._items[0]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)

    def __repr__(self) -> str:
        return f"Queue({list(self._items)})"

The performance difference is dramatic:

import time
from collections import deque

# list.pop(0) — O(n) per operation, O(n^2) total
test_list = list(range(100_000))
start = time.perf_counter()
while test_list:
    test_list.pop(0)
list_time = time.perf_counter() - start

# deque.popleft() — O(1) per operation, O(n) total
test_deque = deque(range(100_000))
start = time.perf_counter()
while test_deque:
    test_deque.popleft()
deque_time = time.perf_counter() - start

print(f"list:  {list_time:.3f}s")   # ~0.44s (varies)
print(f"deque: {deque_time:.3f}s")  # ~0.003s (varies)

⚠️ Pitfall: If you see some_list.pop(0) in a loop, that's a code smell. It means "I need a queue but I'm using a list." Replace the list with a deque and use popleft().
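A minimal before-and-after sketch of that fix (the job names are invented):

```python
from collections import deque

# Before: a list abused as a queue — pop(0) is O(n) per call.
jobs_list = ["render", "email", "backup"]
processed = []
while jobs_list:
    processed.append(jobs_list.pop(0))

# After: a deque used as a queue — popleft() is O(1) per call.
jobs = deque(["render", "email", "backup"])
processed_fast = []
while jobs:
    processed_fast.append(jobs.popleft())

print(processed == processed_fast)  # True — identical order, far less work
```

The behavior is identical; only the cost changes. That's why this refactor is always safe.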

Using the Queue

q = Queue()
q.enqueue("Alice")
q.enqueue("Bob")
q.enqueue("Charlie")

print(q.front())     # Alice  (first in line, not removed)
print(q.dequeue())   # Alice  (served, removed)
print(q.dequeue())   # Bob    (served, removed)
print(q.size())      # 1      (only Charlie remains)

Visualizing the queue:

                    (front is on the left, back on the right)

enqueue("Alice"):   [Alice]
enqueue("Bob"):     [Alice, Bob]
enqueue("Charlie"): [Alice, Bob, Charlie]
dequeue():          [Bob, Charlie]        → returns "Alice"
dequeue():          [Charlie]             → returns "Bob"

🔄 Check Your Understanding #2: If you enqueue the values "X", "Y", "Z" and then dequeue once, what is at the front of the queue? What would happen if you dequeued three more times?

Answer

After enqueueing X, Y, Z and dequeuing once (removes X), the front is "Y". Dequeuing once more returns Y (front becomes Z). Dequeuing again returns Z (queue is now empty). A fourth dequeue raises IndexError: dequeue from empty queue.


20.5 Queue Applications

Application 1: Elena's Report Processing Pipeline

Elena Vasquez, our nonprofit data analyst, receives reports from multiple departments throughout the week. Each report needs to be processed in the order it arrives — a natural FIFO scenario:

report_queue = Queue()

# Reports arrive throughout the day
report_queue.enqueue("Q1 Donations Summary")
report_queue.enqueue("Volunteer Hours Log")
report_queue.enqueue("Grant Application Status")
report_queue.enqueue("Monthly Budget Reconciliation")

# Elena processes them in order
while not report_queue.is_empty():
    report = report_queue.dequeue()
    print(f"Processing: {report}")

# Output (always in arrival order):
# Processing: Q1 Donations Summary
# Processing: Volunteer Hours Log
# Processing: Grant Application Status
# Processing: Monthly Budget Reconciliation

Application 2: Task Scheduling

Operating systems use queues to schedule which process runs next. A simplified version: tasks arrive with different processing times, and the system works through them in order.

from collections import deque

task_queue: deque = deque()

# Tasks arrive
tasks = [
    {"name": "Download update", "time_ms": 3000},
    {"name": "Save document", "time_ms": 500},
    {"name": "Send email", "time_ms": 1200},
]

for task in tasks:
    task_queue.append(task)

# Process in FIFO order
total_wait = 0
while task_queue:
    task = task_queue.popleft()
    total_wait += task["time_ms"]
    print(f"Running '{task['name']}' — "
          f"completes at {total_wait}ms")

# Output:
# Running 'Download update' — completes at 3000ms
# Running 'Save document' — completes at 3500ms
# Running 'Send email' — completes at 4700ms

Application 3: Breadth-First Search Preview

In Chapter 18, you learned about recursion and depth-first exploration. Queues enable a fundamentally different traversal pattern: breadth-first search (BFS). Imagine you're exploring a maze. Depth-first search (using a stack or recursion) dives deep down one path before backtracking. Breadth-first search (using a queue) explores all rooms one step away, then all rooms two steps away, then three steps away. BFS guarantees finding the shortest path — and it relies on a queue.

Here's a taste — a BFS that finds the shortest path in a simple grid:

def bfs_shortest_path(grid: list[list[str]], start: tuple[int, int],
                      goal: tuple[int, int]) -> int | None:
    """Find shortest path length in a grid using BFS. '#' = wall."""
    rows, cols = len(grid), len(grid[0])
    queue: deque = deque()
    queue.append((start[0], start[1], 0))  # row, col, distance
    visited: set = {start}

    while queue:
        row, col, dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = row + dr, col + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc, dist + 1))

    return None  # no path found

maze = [
    [".", ".", "#", "."],
    ["#", ".", "#", "."],
    [".", ".", ".", "."],
    [".", "#", "#", "."],
]

result = bfs_shortest_path(maze, (0, 0), (3, 3))
print(f"Shortest path length: {result}")  # Shortest path length: 6

The queue ensures we explore closer cells before farther ones — that's what guarantees shortest-path correctness. If we used a stack instead, we'd get depth-first search, which might find a path but not necessarily the shortest.

🔄 Check Your Understanding #3: A stack uses LIFO and enables depth-first exploration. A queue uses FIFO and enables breadth-first exploration. If you replaced the queue in BFS with a stack, what change in behavior would you observe?

Answer

The algorithm would become depth-first search (DFS). It would still find a path (if one exists), but it would explore one branch deeply before trying others. It would not guarantee finding the shortest path. DFS might return a path of length 20 even if a path of length 6 exists.


20.6 Abstract Data Types

This is the threshold concept of the chapter. Take your time with it.

🚪 Threshold Concept: Abstract Data Types

Before: "A stack is a list with restrictions."

After: "A stack is a concept (LIFO behavior) that can be implemented in many ways."

An abstract data type (ADT) is a specification of what a data structure does — its operations and their behavior — without specifying how it does it internally. It's the difference between a blueprint and a building.

Consider the stack ADT. The specification says:

  • push(item) adds an item to the top
  • pop() removes and returns the top item
  • peek() returns the top item without removing it
  • is_empty() reports whether the stack is empty

That's the entire ADT. It says nothing about lists, arrays, linked lists, or any other implementation technology. Any code that provides these operations with LIFO behavior is a stack, regardless of how it works internally.

Why Does This Matter?

Because it means you can swap implementations without changing any code that uses the data structure. This is the same principle you encountered with polymorphism in Chapter 15 — code against the interface, not the implementation.

Here's a concrete example. We implemented our stack using a Python list:

class ListStack:
    """Stack implemented with a Python list."""

    def __init__(self):
        self._items: list = []

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)

But we could also implement the same ADT using a collections.deque:

from collections import deque

class DequeStack:
    """Stack implemented with a deque."""

    def __init__(self):
        self._items: deque = deque()

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)

Both classes implement the same ADT. Any code that uses push, pop, peek, and is_empty works identically with either one. The user never knows — or needs to know — which implementation is underneath.
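To make that interchangeability concrete, here's a sketch with a hypothetical helper drain() that codes against the stack interface alone. Condensed stand-ins for the two classes are included so the snippet runs on its own:

```python
from collections import deque

# Condensed stand-ins for the chapter's ListStack and DequeStack.
class ListStack:
    def __init__(self): self._items = []
    def push(self, item): self._items.append(item)
    def pop(self): return self._items.pop()
    def is_empty(self): return len(self._items) == 0

class DequeStack:
    def __init__(self): self._items = deque()
    def push(self, item): self._items.append(item)
    def pop(self): return self._items.pop()
    def is_empty(self): return len(self._items) == 0

def drain(stack) -> list:
    """Pop everything off any object honoring the stack ADT.
    Only push/pop/is_empty are used — never the internals."""
    out = []
    while not stack.is_empty():
        out.append(stack.pop())
    return out

for cls in (ListStack, DequeStack):
    s = cls()
    for item in ("A", "B", "C"):
        s.push(item)
    print(cls.__name__, drain(s))   # both print ['C', 'B', 'A']
```

drain() never asks which class it received. Swap the implementation and the calling code doesn't change — that's the ADT at work.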

This separation of interface from implementation is one of the most powerful ideas in computing:

| Concept | What | How |
|---|---|---|
| ADT | "A stack supports push, pop, peek, is_empty" | (unspecified) |
| Data structure | (matches the ADT specification) | "Uses a Python list internally" |
| Another data structure | (same ADT specification) | "Uses a linked list internally" |

In CS2, you'll learn that the same ADT (like "priority queue") can have dramatically different performance depending on the implementation (unsorted list vs. sorted list vs. binary heap). The ADT stays the same; the trade-offs change. Learning to separate these two concerns is what makes you a computer scientist, not just a programmer.

💡 Intuition: Think of a vending machine. The ADT is the interface: insert money, press a button, get a snack. The implementation is whatever mechanism is inside — gears, springs, a robotic arm. As a customer, you don't care how it works. You care that it works. If the manufacturer swaps the internal mechanism, your experience doesn't change — as long as the interface stays the same.


20.7 Linked Lists Conceptually

Every data structure you've used so far — lists, tuples, strings — stores elements in contiguous memory: one right after another, like books on a shelf. This is great for random access (my_list[42] is O(1)) but expensive for insertions in the middle (every element after the insertion point has to shift over).

A linked list takes a fundamentally different approach. Instead of storing elements side by side, each element is a node that contains two things:

  1. The data (the actual value)
  2. A reference (or pointer) to the next node

The nodes can be scattered anywhere in memory. The references stitch them together into a sequence.

Node("A")      Node("B")      Node("C")
┌──────┬───┐   ┌──────┬───┐   ┌──────┬───┐
│ "A"  │ ──┼──→│ "B"  │ ──┼──→│ "C"  │ / │
└──────┴───┘   └──────┴───┘   └──────┴───┘
  data  next     data  next     data  next
                                      (None = end)

In Python, a node might look like this:

class Node:
    """A single node in a linked list."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

# Build a tiny linked list: A → B → C
c = Node("C")
b = Node("B", c)
a = Node("A", b)

# Traverse it
current = a
while current is not None:
    print(current.data, end=" → ")
    current = current.next
print("None")
# Output: A → B → C → None

Why Linked Lists Exist

Linked lists aren't better than Python lists for most tasks. Their power shows up in specific scenarios:

| Operation | Python list (array) | Linked list |
|---|---|---|
| Access by index | O(1) | O(n) — must walk from the start |
| Insert at beginning | O(n) — shift everything | O(1) — just update one reference |
| Insert at end | O(1) amortized | O(1) if you keep a tail reference |
| Insert in middle (given position) | O(n) — shift elements | O(1) — update two references |
| Memory | Contiguous block | Scattered nodes + reference overhead |

The trade-off is clear: linked lists sacrifice random access for fast insertions and deletions at arbitrary positions. In CS2, you'll implement linked lists fully and discover that many advanced data structures (trees, graphs, hash table chains) use linked-list-like node-and-reference patterns internally.
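Even without a full implementation, the O(1) insert-at-front is easy to see with the Node class from this section:

```python
class Node:
    """A single node: data plus a reference to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

# Insert at the front: one reference update, no shifting — O(1)
# no matter how long the chain already is.
head = Node("B", Node("C"))
head = Node("A", head)          # the new node simply points at the old head

# Walk the chain to confirm the order.
current, items = head, []
while current is not None:
    items.append(current.data)
    current = current.next
print(items)   # ['A', 'B', 'C']
```

Compare that with my_list.insert(0, "A"), which must shift every existing element one slot to the right.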

⚠️ Important: You don't need to implement a full linked list for this course. The goal here is to understand the concept — nodes connected by references — because it appears everywhere in advanced data structures.


20.8 Preview: Trees, Graphs, and Hash Tables

This section is a roadmap, not a tutorial. You won't implement these structures here, but knowing they exist — and what problems they solve — gives you a framework for the data structures courses ahead.

Trees

A tree is a hierarchical data structure where each node has a value and zero or more children. The topmost node is the root. Nodes with no children are leaves.

        CEO
       /    \
     CTO    CFO
    /   \      \
  Dev1  Dev2  Accountant

Trees model anything hierarchical: file systems (folders contain subfolders), HTML documents (tags contain other tags), organizational charts, decision trees, and more. A binary tree — where each node has at most two children — is the foundation of binary search trees (BSTs), which provide O(log n) search, insert, and delete.

Graphs

A graph is a set of nodes (also called vertices) connected by edges. Unlike trees, graphs can have cycles (A connects to B connects to C connects back to A), and edges can be one-directional or two-directional.

    Alice ←→ Bob
      ↕       ↕
    Carol ←→ Dave

Graphs model relationships: social networks (who follows whom), maps (cities connected by roads), dependency systems (which tasks must finish before others can start), and the internet itself (web pages linked by hyperlinks). Graph algorithms like BFS, DFS, Dijkstra's shortest path, and minimum spanning trees are among the most important algorithms in computer science.

Hash Tables

A hash table maps keys to values using a hash function — a function that converts a key into an array index. This is exactly how Python's dict works behind the scenes.

# When you write:
phonebook = {"Alice": "555-0101", "Bob": "555-0102"}

# Python internally computes hash("Alice") to find WHERE to store the value.
# That's why dict lookup is O(1) on average — it jumps directly to the slot.

In Chapter 9, you learned to use dictionaries. In CS2, you'll learn how they work: hash functions, collision resolution, load factors, and resizing. Understanding the implementation explains why dictionary keys must be hashable (immutable) and why iteration order isn't guaranteed in older Python versions.
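A quick sketch of that hashability requirement in action:

```python
# Dict keys must be hashable: hash() must work on them, and equal
# keys must hash equally so lookups land in the same slot.
print(hash("Alice") == hash("Alice"))   # True

phonebook: dict = {}
phonebook[("Alice", "home")] = "555-0101"   # tuples are immutable: fine

try:
    phonebook[["Alice", "home"]] = "555-0101"   # lists are mutable...
except TypeError as err:
    print(err)   # unhashable type: 'list'
```

A mutable key could change after being stored, stranding its value in a slot computed from the old hash — which is why Python forbids it outright.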

The Big Picture

| Data Structure | Key Property | Good For |
|---|---|---|
| Stack | LIFO access | Undo, backtracking, parsing |
| Queue | FIFO access | Scheduling, BFS, pipelines |
| Linked list | Fast insert/delete anywhere | Building blocks for trees, graphs |
| Tree | Hierarchical relationships | File systems, search, decision-making |
| Graph | Arbitrary relationships | Networks, maps, dependencies |
| Hash table | O(1) key-value lookup | Dictionaries, caches, deduplication |

These six data structures (plus arrays, which you already know as Python lists) cover the vast majority of data structure needs in practice. Everything else — heaps, tries, B-trees, skip lists, bloom filters — is a specialization of these fundamentals.


20.9 Choosing the Right Data Structure

One of the most important skills in programming is choosing the right data structure for the job. Here's a comprehensive decision guide comparing the structures you know:

| Need | Best Choice | Why |
| --- | --- | --- |
| Ordered collection, access by position | list | O(1) index access, flexible |
| Key-value pairs, fast lookup by key | dict | O(1) average lookup |
| Unique elements, membership testing | set | O(1) average membership check |
| Fixed record of related values | tuple | Immutable, hashable, lightweight |
| LIFO access (undo, backtracking) | Stack | Enforces discipline, clear intent |
| FIFO access (scheduling, pipelines) | Queue (deque) | O(1) at both ends |
| Lots of inserts/deletes in the middle | Linked list | O(1) insert/delete at known position |
| Hierarchical data | Tree | Natural representation of hierarchy |
| Relationships between entities | Graph | Models connections and networks |
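To make one row of the table concrete, here is the "unique elements, membership testing" case in action: a set gives deduplication plus an O(1) average membership check, where a list would scan every element. A small sketch:

```python
# Deduplicate a stream of events while keeping first-seen order.
seen = set()
events = ["login", "click", "login", "purchase", "click"]
unique_events = []
for event in events:
    if event not in seen:       # O(1) average with a set; O(n) with a list
        seen.add(event)
        unique_events.append(event)

print(unique_events)            # ['login', 'click', 'purchase']
```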

Decision Flowchart (Mental Model)

Ask yourself these questions in order:

  1. Do I need key-value pairs? → Use a dict.
  2. Do I need to enforce uniqueness? → Use a set.
  3. Do I need LIFO (last in, first out)? → Use a stack.
  4. Do I need FIFO (first in, first out)? → Use a queue.
  5. Do I need an ordered sequence? → Use a list (or tuple if it shouldn't change).
  6. Am I modeling relationships? → You need a graph (CS2 territory).
  7. Am I modeling hierarchy? → You need a tree (CS2 territory).

💡 Intuition: The right data structure makes the code obvious. If you find yourself writing complicated workarounds — checking if a list is empty before popping, shifting elements manually, using a list as a makeshift queue — you're probably using the wrong data structure. Step back and ask: "What shape does this problem have?"
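The "list as a makeshift queue" workaround is worth seeing concretely. Both snippets below produce the same result, but list.pop(0) shifts every remaining element one slot to the left, while deque.popleft() removes the front element in constant time:

```python
from collections import deque

# Makeshift queue: works, but pop(0) is O(n) on every call.
line = ["a", "b", "c"]
first = line.pop(0)             # every remaining element shifts left

# Proper queue: popleft() is O(1) at the left end.
q = deque(["a", "b", "c"])
first_q = q.popleft()           # no shifting

print(first, first_q)           # a a
```

For a three-element queue the difference is invisible; for a queue of a million tasks, it's the difference between instant and sluggish.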

🐛 Debugging Walkthrough: Empty Stack and Queue Errors

The most common bug with stacks and queues is trying to access an element when the container is empty. Let's trace through a typical scenario:

# Bug: processing without checking for empty
task_stack = Stack()
task_stack.push("Task A")
task_stack.push("Task B")

# Process all tasks... but one too many pops
task_stack.pop()    # Returns "Task B" — fine
task_stack.pop()    # Returns "Task A" — fine
task_stack.pop()    # IndexError: pop from empty stack

The fix: Always guard with is_empty() before pop() or dequeue():

# Correct pattern: check before popping
while not task_stack.is_empty():
    task = task_stack.pop()
    print(f"Processing: {task}")

A similar bug occurs with queues:

q = Queue()
q.enqueue("first")

# Bug: processing assumes more items than exist
for _ in range(5):
    item = q.dequeue()  # Crashes on second iteration

The fix: Same pattern — use is_empty() or size() as a guard:

while not q.is_empty():
    item = q.dequeue()
    print(f"Processed: {item}")

Rule of thumb: Never call pop(), peek(), dequeue(), or front() without first confirming the container is not empty. This applies to Python's built-in collections too — list.pop() on an empty list raises IndexError, and deque.popleft() on an empty deque does the same.
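With Python's built-in containers, an equally idiomatic guard is the truthiness check, since empty containers are falsy. A sketch using a plain list as a stack and a deque as a queue:

```python
from collections import deque

stack = [1, 2, 3]
while stack:                    # an empty list is falsy, so the loop stops safely
    stack.pop()

q = deque(["x", "y"])
while q:                        # the same idiom works for deque
    q.popleft()

print(stack, list(q))           # [] []
```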


20.10 Project Checkpoint: TaskFlow v1.9

It's time to add undo/redo and a task execution queue to TaskFlow.

What's New in v1.9

  1. Undo stack: Every add_task and delete_task operation pushes a record onto an undo stack. The undo command reverses the most recent operation.
  2. Redo stack: Undone operations can be redone. A new add or delete clears the redo stack (you've branched into a new timeline).
  3. Task queue: Tasks can be scheduled for sequential execution. The process_next command completes the next queued task in FIFO order.

Implementation Overview

We need three new components:

  1. Stack and Queue classes — the ones you built in this chapter.
  2. UndoableAction class — a record that stores what was done (add or delete), which task was affected, and where it was in the list (for accurate restoration).
  3. New methods on TaskFlow: undo(), redo(), schedule_task(), process_next().

Key Code

The full implementation is in code/project-checkpoint.py. Here are the critical pieces:

Recording an action:

class UndoableAction:
    def __init__(self, action_type: str, task: Task,
                 position: int | None = None):
        self.action_type = action_type   # "add" or "delete"
        self.task = task
        self.position = position         # where it was (for delete)

Undoing an add (remove the task):

def undo(self) -> bool:
    if self._undo_stack.is_empty():
        print("Nothing to undo.")
        return False
    action = self._undo_stack.pop()
    if action.action_type == "add":
        self.tasks = [t for t in self.tasks
                      if t.id != action.task.id]
    elif action.action_type == "delete":
        # Explicit None check: "or" would wrongly skip a position of 0.
        pos = (action.position if action.position is not None
               else len(self.tasks))
        self.tasks.insert(pos, action.task)
    self._redo_stack.push(action)
    return True
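To see how the undo and redo stacks cooperate, here is a stripped-down, self-contained sketch of the same two-stack pattern, using plain lists as stacks and strings as tasks. This is a hypothetical minimal version for illustration, not the TaskFlow code in code/project-checkpoint.py:

```python
# Minimal sketch of the undo/redo pattern with two stacks.
class MiniTasks:
    def __init__(self):
        self.tasks = []
        self._undo = []   # stack of (action, task, position) records
        self._redo = []

    def add(self, name):
        self.tasks.append(name)
        self._undo.append(("add", name, len(self.tasks) - 1))
        self._redo.clear()               # a new action branches the timeline

    def delete(self, name):
        pos = self.tasks.index(name)
        self.tasks.pop(pos)
        self._undo.append(("delete", name, pos))
        self._redo.clear()

    def undo(self):
        if not self._undo:
            return False
        action, name, pos = self._undo.pop()
        if action == "add":
            self.tasks.remove(name)      # reverse an add: take it out
        else:
            self.tasks.insert(pos, name) # reverse a delete: put it back
        self._redo.append((action, name, pos))
        return True

    def redo(self):
        if not self._redo:
            return False
        action, name, pos = self._redo.pop()
        if action == "add":
            self.tasks.insert(pos, name) # re-apply an add
        else:
            self.tasks.pop(pos)          # re-apply a delete
        self._undo.append((action, name, pos))
        return True

mt = MiniTasks()
mt.add("notes")
mt.add("groceries")
mt.delete("notes")     # tasks: ["groceries"]
mt.undo()              # tasks: ["notes", "groceries"], delete reversed
mt.redo()              # tasks: ["groceries"], delete re-applied
```

Note one simplification: the recorded positions can go stale if many operations interleave; the full TaskFlow version restores by task id rather than trusting raw positions for adds.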

Scheduling and processing tasks:

def schedule_task(self, task_id: int) -> bool:
    for task in self.tasks:
        if task.id == task_id:
            self._execution_queue.enqueue(task)
            return True
    return False

def process_next(self) -> Task | None:
    if self._execution_queue.is_empty():
        return None
    task = self._execution_queue.dequeue()
    task.done = True
    return task

Testing Your Implementation

tf = TaskFlow()
tf.add_task("Write notes", "high", "school")
tf.add_task("Buy groceries", "medium", "personal")
tf.add_task("Review PR", "high", "work")

# Delete and undo
tf.delete_task(2)          # Removes "Buy groceries"
tf.undo()                  # Restores "Buy groceries"

# Schedule and process
tf.schedule_task(1)        # Queue: [Task #1]
tf.schedule_task(3)        # Queue: [Task #1, Task #3]
tf.process_next()          # Completes Task #1 (FIFO)
tf.process_next()          # Completes Task #3

Run code/project-checkpoint.py for a complete demo with expected output for every operation.

🔄 Spaced Review (Chapter 19): In v1.8, you added custom sorting and binary search. Those features still work — undo/redo is a layer on top of the existing task management. Notice how each new feature composes with the existing code rather than replacing it. Good design is additive.


Chapter Summary

This chapter introduced stacks (LIFO), queues (FIFO), and the critical idea of abstract data types — separating what a data structure does from how it's implemented.

Key ideas:

  • A stack supports push, pop, and peek. It's used for undo systems, bracket matching, backtracking, and expression evaluation. Implement it with a Python list using append() and pop() for O(1) operations.
  • A queue supports enqueue, dequeue, and front. It's used for task scheduling, processing pipelines, and breadth-first search. Implement it with collections.deque using append() and popleft() for O(1) operations.
  • An abstract data type defines what operations a data structure supports, not how it implements them. The same ADT can have multiple implementations with different performance characteristics.
  • Linked lists store elements in nodes connected by references, trading random access for fast insertions.
  • Trees, graphs, and hash tables are fundamental data structures you'll study in CS2, each optimized for a different class of problems.
  • Choosing the right data structure is one of the most impactful design decisions you'll make. Match the structure to the problem's shape.

Looking ahead: In Chapter 21, you'll move from data structures to data sources. You'll learn to read from and write to files, consume APIs, and work with CSV and JSON data — connecting your programs to the outside world. The queue pattern from this chapter will reappear when you process records from a file in the order they arrive.


"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important." — Linus Torvalds