> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
Learning Objectives
- Explain the LIFO principle and implement a stack using a Python list
- Explain the FIFO principle and implement a queue using collections.deque
- Apply stacks to solve problems like bracket matching, undo systems, and backtracking
- Apply queues to solve problems like task scheduling and breadth-first traversal
- Define abstract data type (ADT) and distinguish interface from implementation
- Describe linked lists, trees, graphs, and hash tables at a conceptual level
- Choose the appropriate data structure for a given problem using trade-off analysis
In This Chapter
- Chapter Overview
- 20.1 Beyond Lists and Dicts
- 20.2 Stacks: Last In, First Out
- 20.3 Stack Applications
- 20.4 Queues: First In, First Out
- 20.5 Queue Applications
- 20.6 Abstract Data Types
- 20.7 Linked Lists Conceptually
- 20.8 Preview: Trees, Graphs, and Hash Tables
- 20.9 Choosing the Right Data Structure
- 20.10 Project Checkpoint: TaskFlow v1.9
- Chapter Summary
Chapter 20: Intro to Data Structures: Stacks, Queues, and Beyond
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." — Linus Torvalds
Chapter Overview
You already know lists, dictionaries, sets, and tuples. You can store data, look things up, iterate, and sort. So why does computer science have an entire subfield dedicated to data structures? Because general-purpose containers are like Swiss Army knives — useful everywhere, ideal nowhere. When a problem has a specific shape, a specialized data structure doesn't just make your code cleaner — it makes certain bugs structurally impossible and certain operations dramatically faster.
This chapter introduces two of the most important specialized data structures in computing: the stack and the queue. You'll build both from scratch, see how they solve real problems, and learn to recognize when a problem is secretly a stack problem or a queue problem. Along the way, you'll encounter one of the most important ideas in computer science — the abstract data type — which separates what a data structure does from how it does it.
We'll close with a conceptual tour of the data structures you'll meet in CS2: linked lists, trees, graphs, and hash tables. You won't implement them here, but understanding why they exist and what problems they solve will make the transition to more advanced courses far smoother.
In this chapter, you will learn to:
- Implement a stack (LIFO) and use it for backtracking, undo, and bracket matching
- Implement a queue (FIFO) and use it for task scheduling and processing pipelines
- Think about data structures as abstract interfaces, not just concrete code
- Describe when and why linked lists, trees, graphs, and hash tables are used
- Choose the right data structure for a given problem by analyzing trade-offs
🏃 Fast Track: If you're comfortable with list operations and OOP from Chapters 8 and 14, skim 20.1 and dive into 20.2. Don't skip 20.6 (Abstract Data Types) — it's the threshold concept that changes how you think about all data structures.
🔬 Deep Dive: After this chapter, explore Python's `queue` module (`queue.Queue`, `queue.LifoQueue`, `queue.PriorityQueue`) for thread-safe implementations used in concurrent programming. The `heapq` module provides a priority queue via a binary heap — a data structure you'll study in depth in CS2.
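As a quick taste (a sketch using only standard-library calls), all three classes share the same `put`/`get` interface; only the removal order differs:

```python
import queue

# LifoQueue is a thread-safe stack: put() pushes, get() removes the newest item
lifo = queue.LifoQueue()
lifo.put("A")
lifo.put("B")
print(lifo.get())  # B

# Queue is a thread-safe FIFO queue: get() removes the oldest item
fifo = queue.Queue()
fifo.put("A")
fifo.put("B")
print(fifo.get())  # A

# PriorityQueue always removes the smallest item first
pq = queue.PriorityQueue()
for n in (3, 1, 2):
    pq.put(n)
print(pq.get())  # 1
```

The shared interface is the ADT idea from Section 20.6 in miniature: one specification, three removal policies.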
20.1 Beyond Lists and Dicts
Think about the tools in a carpenter's workshop. A hammer works for dozens of tasks, but a carpenter also keeps specialized tools — a miter saw for precise angled cuts, a planer for flattening boards, a clamp for holding pieces in place while glue dries. You could substitute a hammer and brute force for most of these, but the specialized tool produces better results with less effort and fewer mistakes.
Data structures work the same way. Python's list is the hammer — versatile, powerful, your go-to for most problems. But some problems have constraints that a list doesn't enforce:
- An undo system needs to reverse the most recent action first. A list can do this (just `pop()` from the end), but nothing prevents you from accidentally removing an item from the middle. A stack makes "last in, first out" the only option.
- A print queue needs to process documents in the order they arrived. A list can do this too, but `list.pop(0)` is O(n), which gets slow fast. A queue enforces "first in, first out" and keeps the operation O(1).
- A social network needs to represent who follows whom — relationships, not just items. No built-in Python type captures this naturally. A graph does.
The key insight is this: constraints are features, not limitations. When a data structure restricts what operations are allowed, it simultaneously prevents an entire category of bugs and communicates intent to anyone reading your code.
🔄 Spaced Review (Chapter 17): In Chapter 17, you learned that algorithm choice affects performance dramatically — O(n) vs. O(n log n) vs. O(n^2). Data structure choice has the same impact. The structure you choose determines which operations are fast, which are slow, and which are impossible. Keep Big-O thinking active as you read this chapter.
20.2 Stacks: Last In, First Out
A stack is a collection where the last item added is the first item removed. This pattern is called LIFO — Last In, First Out.
Think of a stack of plates in a cafeteria. You can only add a plate to the top, and you can only remove the plate on top. You can't pull a plate from the middle without toppling everything above it. That's exactly how a stack data structure works.
The Stack Interface
A stack supports exactly three core operations:
| Operation | Description | Analogy |
|---|---|---|
| `push(item)` | Add an item to the top | Place a plate on the stack |
| `pop()` | Remove and return the top item | Take the top plate off |
| `peek()` | Look at the top item without removing it | Glance at the top plate |
Two helper operations round out the interface:
- is_empty() — returns True if the stack has no items
- size() — returns the number of items
That's it. No indexing. No searching. No inserting in the middle. These restrictions are the whole point.
Implementing a Stack
Here's a complete stack implementation using a Python list as the internal storage. The list's append() and pop() methods — both O(1) — map directly to stack push and pop:
class Stack:
    """A Last-In, First-Out (LIFO) data structure."""

    def __init__(self):
        self._items: list = []

    def push(self, item) -> None:
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item. Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        """Return True if the stack contains no items."""
        return len(self._items) == 0

    def size(self) -> int:
        """Return the number of items in the stack."""
        return len(self._items)

    def __repr__(self) -> str:
        return f"Stack({self._items})"
Notice three design choices:

1. The internal list is named `_items` with a leading underscore. This is a Python convention from Chapter 14 — it signals "this is an implementation detail, not part of the public interface." Users of the stack should only use `push`, `pop`, `peek`, `is_empty`, and `size`.
2. `pop()` and `peek()` raise `IndexError` when the stack is empty. This is fail-fast design. If someone tries to pop from an empty stack, they made a logic error, and the error message tells them exactly what went wrong.
3. All operations are O(1). `list.append()` and `list.pop()` (without an index) are both amortized O(1). So our stack is efficient by default.
🔄 Spaced Review (Chapter 14): This is an example of encapsulation — hiding internal state behind a public interface. The `Stack` class wraps a list but only exposes stack operations. Users don't need to know (or care) that a list is used internally. They interact with a concept, not an implementation.
Using the Stack
s = Stack()
s.push("A")
s.push("B")
s.push("C")
print(s.peek()) # C (top of stack, not removed)
print(s.pop()) # C (removed)
print(s.pop()) # B (removed)
print(s.size()) # 1 (only "A" remains)
print(s.pop()) # A (removed)
print(s.is_empty()) # True
Visualizing the stack after each operation helps build intuition:
push("A"): | A | (bottom)
push("B"): | A | B |
push("C"): | A | B | C | (top)
pop(): | A | B | → returns "C"
pop(): | A | → returns "B"
pop(): | | → returns "A" (now empty)
The Text Adventure: Room Backtracking
In Crypts of Pythonia, the player explores rooms. When they enter a new room, we push the previous room onto a history stack. When they type "go back," we pop the most recent room. This naturally implements backtracking — the player always returns to the room they just left.
room_history = Stack()
current_room = "Entrance Hall"
def move_to(room_name: str) -> None:
    global current_room
    room_history.push(current_room)
    current_room = room_name
    print(f"You enter the {current_room}.")

def go_back() -> None:
    global current_room
    if room_history.is_empty():
        print("You're at the starting room. Nowhere to go back to.")
        return
    current_room = room_history.pop()
    print(f"You return to the {current_room}.")
# Simulate gameplay
move_to("Library") # You enter the Library.
move_to("Secret Passage") # You enter the Secret Passage.
move_to("Dragon's Lair") # You enter the Dragon's Lair.
go_back() # You return to the Secret Passage.
go_back() # You return to the Library.
go_back() # You return to the Entrance Hall.
go_back() # You're at the starting room. Nowhere to go back to.
This is exactly how your web browser's "back" button works. Every page you visit is pushed onto a history stack. Clicking "back" pops the most recent page. The "forward" button uses a second stack — we'll build exactly this pattern in the undo/redo system below.
🔄 Check Your Understanding #1: If you push the values 1, 2, 3, 4 onto a stack and then pop twice, what value is on top? What is the stack's size?
Answer
After pushing 1, 2, 3, 4, the top is 4. Popping twice removes 4 then 3. The top is now 2, and the size is 2 (items 1 and 2 remain).
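You can verify the trace with a plain list standing in for the stack (`append()` plays push, `pop()` plays pop):

```python
s = []
for value in [1, 2, 3, 4]:
    s.append(value)  # push 1, 2, 3, 4

s.pop()  # removes 4
s.pop()  # removes 3

print(s[-1])   # 2 — the new top
print(len(s))  # 2 — items 1 and 2 remain
```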
20.3 Stack Applications
Stacks show up everywhere in computing. Here are three important applications beyond the room backtracking example above.
Application 1: Matching Brackets
Every programmer has encountered the dreaded "unexpected EOF" or "unmatched parenthesis" error. Code editors solve this by using a stack to validate brackets as you type.
The algorithm:
1. Scan the string left to right.
2. When you see an opening bracket ((, [, {), push it onto the stack.
3. When you see a closing bracket (), ], }), pop the stack and check that it matches. If the stack is empty or the brackets don't match, the input is invalid.
4. After scanning the entire string, the stack should be empty. If it's not, there are unmatched opening brackets.
def check_brackets(text: str) -> tuple[bool, str]:
    """Check whether brackets in text are properly matched."""
    MATCH = {")": "(", "]": "[", "}": "{"}
    stack = Stack()
    for i, ch in enumerate(text):
        if ch in "([{":
            stack.push((ch, i))
        elif ch in ")]}":
            if stack.is_empty():
                return False, f"Unmatched '{ch}' at position {i}"
            top_char, top_pos = stack.pop()
            if top_char != MATCH[ch]:
                return False, (f"Mismatched '{top_char}' at position "
                               f"{top_pos} with '{ch}' at position {i}")
    if not stack.is_empty():
        top_char, top_pos = stack.pop()
        return False, f"Unmatched '{top_char}' at position {top_pos}"
    return True, "All brackets matched!"
# Test cases
print(check_brackets("{[()]}")) # (True, 'All brackets matched!')
print(check_brackets("print('hi')")) # (True, 'All brackets matched!')
print(check_brackets("data = [1, 2}")) # (False, "Mismatched '[' at position 7 with '}' at position 14")
print(check_brackets("((())")) # (False, "Unmatched '(' at position 0")
Why a stack? Because brackets nest. The most recently opened bracket must be the first one closed. That's LIFO.
Application 2: Undo/Redo
An undo system uses two stacks:
- Undo stack: Every time you perform an action, push a record of it onto the undo stack.
- Redo stack: When you undo, move that record to the redo stack.
- Critical rule: Any new action clears the redo stack. You've branched into a new timeline.
class TextEditor:
    """A minimal text editor with undo/redo."""

    def __init__(self):
        self.text: str = ""
        self._undo_stack = Stack()
        self._redo_stack = Stack()

    def type_text(self, new_text: str) -> None:
        self._undo_stack.push(self.text)  # save current state
        self._redo_stack = Stack()        # new action kills redo
        self.text += new_text

    def undo(self) -> None:
        if self._undo_stack.is_empty():
            print("Nothing to undo.")
            return
        self._redo_stack.push(self.text)
        self.text = self._undo_stack.pop()

    def redo(self) -> None:
        if self._redo_stack.is_empty():
            print("Nothing to redo.")
            return
        self._undo_stack.push(self.text)
        self.text = self._redo_stack.pop()
editor = TextEditor()
editor.type_text("Hello")
editor.type_text(", World")
print(editor.text) # Hello, World
editor.undo()
print(editor.text) # Hello
editor.undo()
print(editor.text) # (empty string)
editor.redo()
print(editor.text) # Hello
This is how Ctrl+Z works in virtually every application you use. The fact that it feels natural — always undoing the most recent thing — is because the underlying data structure is a stack.
Application 3: Expression Evaluation
When Python evaluates 2 + 3 * 4, it doesn't just go left to right. It respects operator precedence: multiplication before addition. Compilers and interpreters use stacks to manage this evaluation. While a full expression parser is beyond this course, the key insight is that the call stack itself — the mechanism Python uses to track function calls — is literally a stack. When function a calls function b which calls function c, the return addresses are pushed onto a stack. When c finishes, Python pops the stack to return to b. When b finishes, it pops to return to a. This is why a deeply recursive function can cause a RecursionError: maximum recursion depth exceeded — the call stack has a finite size (Chapter 18).
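While a full infix parser is out of scope, a stack can evaluate postfix notation — where `2 3 4 * +` means `2 + 3 * 4` — in just a few lines. This sketch is for illustration only (the function name and token format are my choices, not part of the chapter's projects):

```python
def eval_postfix(tokens: list[str]) -> float:
    """Evaluate a postfix expression using a stack of operands."""
    stack = []
    for token in tokens:
        if token in "+-*/":
            right = stack.pop()  # operands come off in reverse order
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(float(token))
    return stack.pop()  # the single remaining value is the result

print(eval_postfix("2 3 4 * +".split()))  # 14.0 — the * happens before the +
```

Notice there's no precedence logic at all: the stack's LIFO order does the work. This is why postfix (also called Reverse Polish Notation) was popular in early calculators.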
🧩 Productive Struggle: Design an undo system for a text editor. What data structure would you use? Think about: what happens when you type, delete, undo, redo, then type something new? Write pseudocode before looking at the solution above. The insight that a new action clears the redo stack is the hardest part — and the most important.
20.4 Queues: First In, First Out
A queue is a collection where the first item added is the first item removed. This pattern is called FIFO — First In, First Out.
Think of a line at a coffee shop. The person who arrived first gets served first. New customers join at the back of the line. Nobody cuts in front (in a fair queue, at least). That's exactly how a queue data structure works.
The Queue Interface
| Operation | Description | Analogy |
|---|---|---|
| `enqueue(item)` | Add an item to the back | Join the end of the line |
| `dequeue()` | Remove and return the front item | Serve the next customer |
| `front()` | Look at the front item without removing it | See who's next |
Plus the usual is_empty() and size() helpers.
Why Not Just Use a List?
You can implement a queue with a list — append() to add at the back, pop(0) to remove from the front. But pop(0) is O(n) because Python has to shift every remaining element forward by one position. For a queue with 100,000 items, that's 100,000 shifts on every dequeue. That's terrible.
Python's collections.deque (pronounced "deck," short for "double-ended queue") solves this. It provides O(1) append() and O(1) popleft():
from collections import deque
class Queue:
    """A First-In, First-Out (FIFO) data structure."""

    def __init__(self):
        self._items: deque = deque()

    def enqueue(self, item) -> None:
        """Add an item to the back of the queue."""
        self._items.append(item)

    def dequeue(self):
        """Remove and return the front item."""
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()

    def front(self):
        """Return the front item without removing it."""
        if self.is_empty():
            raise IndexError("front of empty queue")
        return self._items[0]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)

    def __repr__(self) -> str:
        return f"Queue({list(self._items)})"
The performance difference is dramatic:
import time
from collections import deque
# list.pop(0) — O(n) per operation, O(n^2) total
test_list = list(range(100_000))
start = time.perf_counter()
while test_list:
    test_list.pop(0)
list_time = time.perf_counter() - start
# deque.popleft() — O(1) per operation, O(n) total
test_deque = deque(range(100_000))
start = time.perf_counter()
while test_deque:
    test_deque.popleft()
deque_time = time.perf_counter() - start
print(f"list: {list_time:.3f}s") # ~0.44s (varies)
print(f"deque: {deque_time:.3f}s") # ~0.003s (varies)
⚠️ Pitfall: If you see `some_list.pop(0)` in a loop, that's a code smell. It means "I need a queue but I'm using a list." Replace the list with a `deque` and use `popleft()`.
Using the Queue
q = Queue()
q.enqueue("Alice")
q.enqueue("Bob")
q.enqueue("Charlie")
print(q.front()) # Alice (first in line, not removed)
print(q.dequeue()) # Alice (served, removed)
print(q.dequeue()) # Bob (served, removed)
print(q.size()) # 1 (only Charlie remains)
Visualizing the queue:
enqueue("Alice"): [Alice] front → ← back
enqueue("Bob"): [Alice, Bob]
enqueue("Charlie"): [Alice, Bob, Charlie]
dequeue(): [Bob, Charlie] → returns "Alice"
dequeue(): [Charlie] → returns "Bob"
🔄 Check Your Understanding #2: If you enqueue the values "X", "Y", "Z" and then dequeue once, what is at the front of the queue? What would happen if you dequeued three more times?
Answer
After enqueueing X, Y, Z and dequeuing once (removes X), the front is "Y". Dequeuing once more returns Y (front becomes Z). Dequeuing again returns Z (queue is now empty). A fourth dequeue raises `IndexError: dequeue from empty queue`.
20.5 Queue Applications
Application 1: Elena's Report Processing Pipeline
Elena Vasquez, our nonprofit data analyst, receives reports from multiple departments throughout the week. Each report needs to be processed in the order it arrives — a natural FIFO scenario:
report_queue = Queue()
# Reports arrive throughout the day
report_queue.enqueue("Q1 Donations Summary")
report_queue.enqueue("Volunteer Hours Log")
report_queue.enqueue("Grant Application Status")
report_queue.enqueue("Monthly Budget Reconciliation")
# Elena processes them in order
while not report_queue.is_empty():
    report = report_queue.dequeue()
    print(f"Processing: {report}")
# Output (always in arrival order):
# Processing: Q1 Donations Summary
# Processing: Volunteer Hours Log
# Processing: Grant Application Status
# Processing: Monthly Budget Reconciliation
Application 2: Task Scheduling
Operating systems use queues to schedule which process runs next. A simplified version: tasks arrive with different processing times, and the system works through them in order.
from collections import deque
task_queue: deque = deque()
# Tasks arrive
tasks = [
    {"name": "Download update", "time_ms": 3000},
    {"name": "Save document", "time_ms": 500},
    {"name": "Send email", "time_ms": 1200},
]
for task in tasks:
    task_queue.append(task)
# Process in FIFO order
total_wait = 0
while task_queue:
    task = task_queue.popleft()
    total_wait += task["time_ms"]
    print(f"Running '{task['name']}' — "
          f"completes at {total_wait}ms")
# Output:
# Running 'Download update' — completes at 3000ms
# Running 'Save document' — completes at 3500ms
# Running 'Send email' — completes at 4700ms
Application 3: Breadth-First Search Preview
In Chapter 18, you learned about recursion and depth-first exploration. Queues enable a fundamentally different traversal pattern: breadth-first search (BFS). Imagine you're exploring a maze. Depth-first search (using a stack or recursion) dives deep down one path before backtracking. Breadth-first search (using a queue) explores all rooms one step away, then all rooms two steps away, then three steps away. BFS guarantees finding the shortest path — and it relies on a queue.
Here's a taste — a BFS that finds the shortest path in a simple grid:
def bfs_shortest_path(grid: list[list[str]], start: tuple[int, int],
                      goal: tuple[int, int]) -> int | None:
    """Find shortest path length in a grid using BFS. '#' = wall."""
    rows, cols = len(grid), len(grid[0])
    queue: deque = deque()
    queue.append((start[0], start[1], 0))  # row, col, distance
    visited: set = {start}
    while queue:
        row, col, dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nr, nc = row + dr, col + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc, dist + 1))
    return None  # no path found

maze = [
    [".", ".", "#", "."],
    ["#", ".", "#", "."],
    [".", ".", ".", "."],
    [".", "#", "#", "."],
]
result = bfs_shortest_path(maze, (0, 0), (3, 3))
print(f"Shortest path length: {result}") # Shortest path length: 6
The queue ensures we explore closer cells before farther ones — that's what guarantees shortest-path correctness. If we used a stack instead, we'd get depth-first search, which might find a path but not necessarily the shortest.
🔄 Check Your Understanding #3: A stack uses LIFO and enables depth-first exploration. A queue uses FIFO and enables breadth-first exploration. If you replaced the queue in BFS with a stack, what change in behavior would you observe?
Answer
The algorithm would become depth-first search (DFS). It would still find a path (if one exists), but it would explore one branch deeply before trying others. It would not guarantee finding the shortest path. DFS might return a path of length 20 even if a path of length 6 exists.
20.6 Abstract Data Types
This is the threshold concept of the chapter. Take your time with it.
🚪 Threshold Concept: Abstract Data Types
Before: "A stack is a list with restrictions."
After: "A stack is a concept (LIFO behavior) that can be implemented in many ways."
An abstract data type (ADT) is a specification of what a data structure does — its operations and their behavior — without specifying how it does it internally. It's the difference between a blueprint and a building.
Consider the stack ADT. The specification says:
- push(item) adds an item to the top
- pop() removes and returns the top item
- peek() returns the top item without removing it
- is_empty() reports whether the stack is empty
That's the entire ADT. It says nothing about lists, arrays, linked lists, or any other implementation technology. Any code that provides these operations with LIFO behavior is a stack, regardless of how it works internally.
Why Does This Matter?
Because it means you can swap implementations without changing any code that uses the data structure. This is the same principle you encountered with polymorphism in Chapter 15 — code against the interface, not the implementation.
Here's a concrete example. We implemented our stack using a Python list:
class ListStack:
    """Stack implemented with a Python list."""

    def __init__(self):
        self._items: list = []

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)
But we could also implement the same ADT using a collections.deque:
from collections import deque
class DequeStack:
    """Stack implemented with a deque."""

    def __init__(self):
        self._items: deque = deque()

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def size(self) -> int:
        return len(self._items)
Both classes implement the same ADT. Any code that uses push, pop, peek, and is_empty works identically with either one. The user never knows — or needs to know — which implementation is underneath.
This separation of interface from implementation is one of the most powerful ideas in computing:
| Concept | What | How |
|---|---|---|
| ADT | "A stack supports push, pop, peek, is_empty" | (unspecified) |
| Data structure | (matches the ADT specification) | "Uses a Python list internally" |
| Another data structure | (same ADT specification) | "Uses a linked list internally" |
In CS2, you'll learn that the same ADT (like "priority queue") can have dramatically different performance depending on the implementation (unsorted list vs. sorted list vs. binary heap). The ADT stays the same; the trade-offs change. Learning to separate these two concerns is what makes you a computer scientist, not just a programmer.
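As a small preview of that idea, here is one ADT behind two implementations. The class and method names (`add`, `remove_min`) are illustrative choices, not a standard API:

```python
import heapq

class ListPriorityQueue:
    """Priority queue ADT: O(1) add, O(n) remove_min (scans the whole list)."""
    def __init__(self):
        self._items = []

    def add(self, item) -> None:
        self._items.append(item)

    def remove_min(self):
        smallest = min(self._items)      # O(n) scan
        self._items.remove(smallest)     # O(n) removal
        return smallest

class HeapPriorityQueue:
    """Same ADT: O(log n) add and remove_min via a binary heap."""
    def __init__(self):
        self._items = []

    def add(self, item) -> None:
        heapq.heappush(self._items, item)

    def remove_min(self):
        return heapq.heappop(self._items)

# Identical interface, identical behavior, different performance:
for pq in (ListPriorityQueue(), HeapPriorityQueue()):
    for n in (5, 1, 3):
        pq.add(n)
    print(pq.remove_min(), pq.remove_min(), pq.remove_min())  # 1 3 5 both times
```

Any code written against `add` and `remove_min` runs unchanged with either class. Only the running time differs, which is exactly the trade-off analysis CS2 will formalize.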
💡 Intuition: Think of a vending machine. The ADT is the interface: insert money, press a button, get a snack. The implementation is whatever mechanism is inside — gears, springs, a robotic arm. As a customer, you don't care how it works. You care that it works. If the manufacturer swaps the internal mechanism, your experience doesn't change — as long as the interface stays the same.
20.7 Linked Lists Conceptually
Every data structure you've used so far — lists, tuples, strings — stores elements in contiguous memory: one right after another, like books on a shelf. This is great for random access (my_list[42] is O(1)) but expensive for insertions in the middle (every element after the insertion point has to shift over).
A linked list takes a fundamentally different approach. Instead of storing elements side by side, each element is a node that contains two things:
1. The data (the actual value)
2. A reference (or pointer) to the next node
The nodes can be scattered anywhere in memory. The references stitch them together into a sequence.
Node("A") Node("B") Node("C")
┌──────┬───┐ ┌──────┬───┐ ┌──────┬───┐
│ "A" │ ──┼──→│ "B" │ ──┼──→│ "C" │ / │
└──────┴───┘ └──────┴───┘ └──────┴───┘
data next data next data next
(None = end)
In Python, a node might look like this:
class Node:
    """A single node in a linked list."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

# Build a tiny linked list: A → B → C
c = Node("C")
b = Node("B", c)
a = Node("A", b)

# Traverse it
current = a
while current is not None:
    print(current.data, end=" → ")
    current = current.next
print("None")
# Output: A → B → C → None
Why Linked Lists Exist
Linked lists aren't better than Python lists for most tasks. Their power shows up in specific scenarios:
| Operation | Python list (array) | Linked list |
|---|---|---|
| Access by index | O(1) | O(n) — must walk from the start |
| Insert at beginning | O(n) — shift everything | O(1) — just update one reference |
| Insert at end | O(1) amortized | O(1) if you keep a tail reference |
| Insert in middle (given position) | O(n) — shift elements | O(1) — update two references |
| Memory | Contiguous block | Scattered nodes + reference overhead |
The trade-off is clear: linked lists sacrifice random access for fast insertions and deletions at arbitrary positions. In CS2, you'll implement linked lists fully and discover that many advanced data structures (trees, graphs, hash table chains) use linked-list-like node-and-reference patterns internally.
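To make the "insert at beginning" row concrete, here is a sketch of an O(1) prepend using the node-and-reference pattern from above (the `Node` class is repeated so the example runs on its own):

```python
class Node:
    """Same Node as above: data plus a reference to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

def prepend(head, data):
    """Insert at the beginning in O(1): one new node, one reference update."""
    return Node(data, head)  # the new node simply points at the old head

head = None
for letter in ["C", "B", "A"]:  # prepend C, then B, then A
    head = prepend(head, letter)

# Walk the list from the new head
current = head
while current is not None:
    print(current.data, end=" → ")
    current = current.next
print("None")
# Output: A → B → C → None
```

Contrast this with `my_list.insert(0, item)`, which must shift every existing element one slot to the right before making room.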
⚠️ Important: You don't need to implement a full linked list for this course. The goal here is to understand the concept — nodes connected by references — because it appears everywhere in advanced data structures.
20.8 Preview: Trees, Graphs, and Hash Tables
This section is a roadmap, not a tutorial. You won't implement these structures here, but knowing they exist — and what problems they solve — gives you a framework for the data structures courses ahead.
Trees
A tree is a hierarchical data structure where each node has a value and zero or more children. The topmost node is the root. Nodes with no children are leaves.
CEO
/ \
CTO CFO
/ \ \
Dev1 Dev2 Accountant
Trees model anything hierarchical: file systems (folders contain subfolders), HTML documents (tags contain other tags), organizational charts, decision trees, and more. A binary tree — where each node has at most two children — is the foundation of binary search trees (BSTs), which provide O(log n) search, insert, and delete.
Graphs
A graph is a set of nodes (also called vertices) connected by edges. Unlike trees, graphs can have cycles (A connects to B connects to C connects back to A), and edges can be one-directional or two-directional.
Alice ←→ Bob
↕ ↕
Carol ←→ Dave
Graphs model relationships: social networks (who follows whom), maps (cities connected by roads), dependency systems (which tasks must finish before others can start), and the internet itself (web pages linked by hyperlinks). Graph algorithms like BFS, DFS, Dijkstra's shortest path, and minimum spanning trees are among the most important algorithms in computer science.
Hash Tables
A hash table maps keys to values using a hash function — a function that converts a key into an array index. This is exactly how Python's dict works behind the scenes.
# When you write:
phonebook = {"Alice": "555-0101", "Bob": "555-0102"}
# Python internally computes hash("Alice") to find WHERE to store the value.
# That's why dict lookup is O(1) on average — it jumps directly to the slot.
In Chapter 9, you learned to use dictionaries. In CS2, you'll learn how they work: hash functions, collision resolution, load factors, and resizing. Understanding the implementation explains why dictionary keys must be hashable (immutable) and why iteration order isn't guaranteed in older Python versions.
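You can see the hashability requirement directly: immutable keys (strings, tuples) hash consistently, while mutable ones (lists) are rejected before they can corrupt the table:

```python
# The same key always hashes to the same value within a program run,
# so the dict can find the same slot again on lookup:
print(hash("Alice") == hash("Alice"))  # True

# Tuples (immutable) work as dictionary keys; lists (mutable) do not:
locations = {(40.7, -74.0): "New York"}
print(locations[(40.7, -74.0)])  # New York

try:
    bad = {[40.7, -74.0]: "New York"}
except TypeError as e:
    print(e)  # unhashable type: 'list'
```

A mutable key could change after being stored, leaving the value stranded in a slot its new hash no longer points to. Python forbids the situation outright.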
The Big Picture
| Data Structure | Key Property | Good For |
|---|---|---|
| Stack | LIFO access | Undo, backtracking, parsing |
| Queue | FIFO access | Scheduling, BFS, pipelines |
| Linked list | Fast insert/delete anywhere | Building blocks for trees, graphs |
| Tree | Hierarchical relationships | File systems, search, decision-making |
| Graph | Arbitrary relationships | Networks, maps, dependencies |
| Hash table | O(1) key-value lookup | Dictionaries, caches, deduplication |
These six data structures (plus arrays, which you already know as Python lists) cover the vast majority of data structure needs in practice. Everything else — heaps, tries, B-trees, skip lists, bloom filters — is a specialization of these fundamentals.
20.9 Choosing the Right Data Structure
One of the most important skills in programming is choosing the right data structure for the job. Here's a comprehensive decision guide comparing the structures you know:
| Need | Best Choice | Why |
|---|---|---|
| Ordered collection, access by position | `list` | O(1) index access, flexible |
| Key-value pairs, fast lookup by key | `dict` | O(1) average lookup |
| Unique elements, membership testing | `set` | O(1) average `in` check |
| Fixed record of related values | `tuple` | Immutable, hashable, lightweight |
| LIFO access (undo, backtracking) | Stack | Enforces discipline, clear intent |
| FIFO access (scheduling, pipelines) | Queue (`deque`) | O(1) at both ends |
| Lots of inserts/deletes in the middle | Linked list | O(1) insert/delete at known position |
| Hierarchical data | Tree | Natural representation of hierarchy |
| Relationships between entities | Graph | Models connections and networks |
Decision Flowchart (Mental Model)
Ask yourself these questions in order:
1. Do I need key-value pairs? → Use a `dict`.
2. Do I need to enforce uniqueness? → Use a `set`.
3. Do I need LIFO (last in, first out)? → Use a stack.
4. Do I need FIFO (first in, first out)? → Use a queue.
5. Do I need an ordered sequence? → Use a `list` (or `tuple` if it shouldn't change).
6. Am I modeling relationships? → You need a graph (CS2 territory).
7. Am I modeling hierarchy? → You need a tree (CS2 territory).
💡 Intuition: The right data structure makes the code obvious. If you find yourself writing complicated workarounds — checking if a list is empty before popping, shifting elements manually, using a list as a makeshift queue — you're probably using the wrong data structure. Step back and ask: "What shape does this problem have?"
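The makeshift-queue workaround is worth seeing in numbers: `list.pop(0)` must shift every remaining element left, while `deque.popleft()` does not. A quick, informal timing sketch (exact times vary by machine):

```python
import timeit

# A list used as a makeshift queue: pop(0) shifts every remaining
# element left, so each dequeue costs O(n).
list_time = timeit.timeit(
    "q.pop(0)",
    setup="q = list(range(10_000))",
    number=5_000)

# collections.deque is built for this shape: popleft() is O(1).
deque_time = timeit.timeit(
    "q.popleft()",
    setup="from collections import deque; q = deque(range(10_000))",
    number=5_000)

print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")   # typically far smaller
```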
🐛 Debugging Walkthrough: Empty Stack and Queue Errors
The most common bug with stacks and queues is trying to access an element when the container is empty. Let's trace through a typical scenario:
```python
# Bug: processing without checking for empty
task_stack = Stack()
task_stack.push("Task A")
task_stack.push("Task B")

# Process all tasks... but one too many pops
task_stack.pop()   # Returns "Task B" — fine
task_stack.pop()   # Returns "Task A" — fine
task_stack.pop()   # IndexError: pop from empty stack
```
The fix: Always guard with `is_empty()` before `pop()` or `dequeue()`:

```python
# Correct pattern: check before popping
while not task_stack.is_empty():
    task = task_stack.pop()
    print(f"Processing: {task}")
```
A similar bug occurs with queues:
```python
q = Queue()
q.enqueue("first")

# Bug: processing assumes more items than exist
for _ in range(5):
    item = q.dequeue()   # Crashes on second iteration
```
The fix: Same pattern — use `is_empty()` or `size()` as a guard:

```python
while not q.is_empty():
    item = q.dequeue()
    print(f"Processed: {item}")
```
Rule of thumb: Never call `pop()`, `peek()`, `dequeue()`, or `front()` without first confirming the container is not empty. This applies to Python's built-in collections too — `list.pop()` on an empty list raises `IndexError`, and `deque.popleft()` on an empty deque does the same.
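Besides the guard pattern, Python also supports the EAFP style: attempt the operation and catch the `IndexError`. A minimal sketch using the built-in `collections.deque` (either style works; pick one and use it consistently):

```python
from collections import deque

items = deque(["first"])

# LBYL ("look before you leap"): guard before touching the container.
while items:                  # a non-empty deque is truthy
    print(items.popleft())

# EAFP ("easier to ask forgiveness than permission"): catch the error.
try:
    items.popleft()           # empty now: raises IndexError
except IndexError:
    print("Queue is empty; nothing to process.")
```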
20.10 Project Checkpoint: TaskFlow v1.9
It's time to add undo/redo and a task execution queue to TaskFlow.
What's New in v1.9
- Undo stack: Every `add_task` and `delete_task` operation pushes a record onto an undo stack. The `undo` command reverses the most recent operation.
- Redo stack: Undone operations can be redone. A new add or delete clears the redo stack (you've branched into a new timeline).
- Task queue: Tasks can be scheduled for sequential execution. The `process_next` command completes the next queued task in FIFO order.
Implementation Overview
We need three new components:
- `Stack` and `Queue` classes — the ones you built in this chapter.
- `UndoableAction` class — a record that stores what was done (add or delete), which task was affected, and where it was in the list (for accurate restoration).
- New methods on `TaskFlow`: `undo()`, `redo()`, `schedule_task()`, `process_next()`.
Key Code
The full implementation is in `code/project-checkpoint.py`. Here are the critical pieces:
Recording an action:
```python
class UndoableAction:
    def __init__(self, action_type: str, task: Task,
                 position: int | None = None):
        self.action_type = action_type   # "add" or "delete"
        self.task = task
        self.position = position         # where it was (for delete)
```
Undoing an add (remove the task):
```python
def undo(self) -> bool:
    if self._undo_stack.is_empty():
        print("Nothing to undo.")
        return False
    action = self._undo_stack.pop()
    if action.action_type == "add":
        self.tasks = [t for t in self.tasks
                      if t.id != action.task.id]
    elif action.action_type == "delete":
        # Explicit None check: position 0 is falsy, so
        # `action.position or len(self.tasks)` would wrongly
        # restore a front-of-list task to the end.
        pos = (action.position if action.position is not None
               else len(self.tasks))
        self.tasks.insert(pos, action.task)
    self._redo_stack.push(action)
    return True
```
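The checkpoint's method list includes `redo()`, but only `undo()` is shown. Here is one way it could mirror `undo()` — a sketch, not the book's implementation, with plain lists and `(action_type, task)` tuples standing in for the chapter's `Stack` and `UndoableAction` classes so it runs on its own:

```python
class MiniFlow:
    """Stripped-down TaskFlow: just enough state to demo redo()."""

    def __init__(self):
        self.tasks = []
        self._undo_stack = []   # plain lists stand in for Stack here
        self._redo_stack = []

    def redo(self) -> bool:
        """Re-apply the most recently undone action, if any."""
        if not self._redo_stack:
            print("Nothing to redo.")
            return False
        action_type, task = self._redo_stack.pop()
        if action_type == "add":
            self.tasks.append(task)        # redoing an add restores it
        elif action_type == "delete":
            self.tasks = [t for t in self.tasks if t != task]
        self._undo_stack.append((action_type, task))   # redo is undoable again
        return True


flow = MiniFlow()
flow._redo_stack.append(("add", "Write notes"))   # pretend an add was undone
flow.redo()
print(flow.tasks)   # ['Write notes']
```

Note the symmetry: `redo()` pops from the redo stack and pushes onto the undo stack, exactly reversed from `undo()`.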
Scheduling and processing tasks:
```python
def schedule_task(self, task_id: int) -> bool:
    for task in self.tasks:
        if task.id == task_id:
            self._execution_queue.enqueue(task)
            return True
    return False

def process_next(self) -> Task | None:
    if self._execution_queue.is_empty():
        return None
    task = self._execution_queue.dequeue()
    task.done = True
    return task
```
Testing Your Implementation
```python
tf = TaskFlow()
tf.add_task("Write notes", "high", "school")
tf.add_task("Buy groceries", "medium", "personal")
tf.add_task("Review PR", "high", "work")

# Delete and undo
tf.delete_task(2)    # Removes "Buy groceries"
tf.undo()            # Restores "Buy groceries"

# Schedule and process
tf.schedule_task(1)  # Queue: [Task #1]
tf.schedule_task(3)  # Queue: [Task #1, Task #3]
tf.process_next()    # Completes Task #1 (FIFO)
tf.process_next()    # Completes Task #3
```
Run `code/project-checkpoint.py` for a complete demo with expected output for every operation.
🔄 Spaced Review (Chapter 19): In v1.8, you added custom sorting and binary search. Those features still work — undo/redo is a layer on top of the existing task management. Notice how each new feature composes with the existing code rather than replacing it. Good design is additive.
Chapter Summary
This chapter introduced stacks (LIFO), queues (FIFO), and the critical idea of abstract data types — separating what a data structure does from how it's implemented.
Key ideas:
- A stack supports `push`, `pop`, and `peek`. It's used for undo systems, bracket matching, backtracking, and expression evaluation. Implement it with a Python list using `append()` and `pop()` for O(1) operations.
- A queue supports `enqueue`, `dequeue`, and `front`. It's used for task scheduling, processing pipelines, and breadth-first search. Implement it with `collections.deque` using `append()` and `popleft()` for O(1) operations.
- An abstract data type defines what operations a data structure supports, not how it implements them. The same ADT can have multiple implementations with different performance characteristics.
- Linked lists store elements in nodes connected by references, trading random access for fast insertions.
- Trees, graphs, and hash tables are fundamental data structures you'll study in CS2, each optimized for a different class of problems.
- Choosing the right data structure is one of the most impactful design decisions you'll make. Match the structure to the problem's shape.
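Both core disciplines fit in a few lines, which makes a compact recap:

```python
from collections import deque

# Stack (LIFO): a list's append() and pop() both work at the right end.
stack = []
stack.append("a")
stack.append("b")
assert stack.pop() == "b"        # last in, first out

# Queue (FIFO): deque's append() and popleft() work at opposite ends.
queue = deque()
queue.append("a")
queue.append("b")
assert queue.popleft() == "a"    # first in, first out
```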
Looking ahead: In Chapter 21, you'll move from data structures to data sources. You'll learn to read from and write to files, consume APIs, and work with CSV and JSON data — connecting your programs to the outside world. The queue pattern from this chapter will reappear when you process records from a file in the order they arrive.
"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important." — Linus Torvalds