> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
Learning Objectives
- Create and manipulate dictionaries using CRUD operations
- Iterate over dictionary keys, values, and items using practical patterns
- Use dictionary comprehensions and the defaultdict type
- Apply sets for membership testing, deduplication, and set operations
- Choose the right data structure (list vs. dict vs. set vs. tuple) for a given problem
In This Chapter
- Chapter Overview
- 9.1 Beyond Lists: When Order Isn't Enough
- 9.2 Creating and Accessing Dictionaries
- 9.3 Adding, Updating, and Removing Entries
- 9.4 Iterating Over Dictionaries
- 9.5 Dictionary Comprehensions
- 9.6 Nested Dictionaries
- 9.7 Sets: Unique Collections
- 9.8 Set Operations
- 9.9 Choosing the Right Data Structure
- 9.10 defaultdict and Counter from collections
- 9.11 Project Checkpoint: TaskFlow v0.8
- Chapter Summary
Chapter 9: Dictionaries and Sets: Organizing Data
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." — Linus Torvalds
Chapter Overview
Imagine you're building a contacts app. You have a thousand phone numbers. Someone asks: "What's Jordan's number?" If your data is stored in a list, you have to scan every entry until you stumble across Jordan — and if Jordan is the 999th entry, that's 999 checks. If your data is stored in a dictionary, you look up "Jordan" and get the answer immediately. Not faster. Immediately. One step, regardless of whether you have ten contacts or ten million.
That's the power of hash-based data structures, and it's what this chapter is about.
Dictionaries and sets are two of the most frequently used data structures in professional Python code. Dictionaries let you associate keys with values — think of a real-world dictionary where you look up a word (key) and get its definition (value). Sets let you work with collections where uniqueness matters and membership testing needs to be fast. Together, they unlock patterns that would be painfully slow or awkward with lists alone.
If Chapter 8 taught you to think in sequences, this chapter teaches you to think in mappings and memberships. It's a different mental model, and once it clicks, you'll start reaching for dicts and sets constantly.
In this chapter, you will learn to: - Create and manipulate dictionaries using CRUD operations (create, read, update, delete) - Iterate over dictionary keys, values, and items using efficient patterns - Write dictionary comprehensions to transform data - Use sets for deduplication, membership testing, and set operations - Choose the right data structure for any given problem
🏃 Fast Track: If you're comfortable with basic dict syntax, skim 9.1-9.3, slow down at 9.5 (comprehensions) and 9.9 (choosing data structures), and make sure you understand the threshold concept in 9.9.
🔬 Deep Dive: Sections 9.6 (nested dicts) and 9.10 (
defaultdictandCounter) are your springboards — they connect directly to JSON handling in Chapter 10 and OOP data modeling in Chapter 14.
9.1 Beyond Lists: When Order Isn't Enough
Lists are great when your data is naturally sequential — a series of test scores, a playlist of songs, a batch of filenames to process. You access items by position: "Give me the third element." But many real problems don't work that way.
Consider these scenarios:
- You have a student record: name, ID, GPA, major. Which one is
student[2]? Was that GPA or major? (Don't guess. Look it up. Every time.) - You're counting how many times each word appears in a document. With a list, you'd need to search the entire list for every single word.
- You have a game inventory: 3 torches, 1 sword, 5 health potions. With a list, how do you check if the player has a torch? Loop through everything?
In all these cases, what you really want is to look something up by name, not by position. That's exactly what a dictionary gives you.
💡 Intuition: Think of a real dictionary — not the Python kind, the book kind. You don't start at page 1 and read every entry until you find "ephemeral." You use the word itself to jump directly to its definition. Python dictionaries work the same way, but with any kind of key, not just alphabetical words.
🔗 Connection — Spaced Review (Ch 5): In Chapter 5, you learned that
forloops let you iterate over sequences. Dictionaries are iterable too, but with a twist — you can iterate over keys, values, or key-value pairs. The loop patterns you learned still apply; you just have more choices about what you're looping over.
9.2 Creating and Accessing Dictionaries
9.2.1 Dictionary Syntax
A dictionary is an unordered collection of key-value pairs. Each key maps to a value. Keys must be unique within a dictionary, and they must be hashable (we'll explain what that means shortly). Values can be anything.
Here are three ways to create a dictionary:
# Method 1: Curly brace literal (most common)
inventory = {"torch": 3, "sword": 1, "health_potion": 5}
# Method 2: dict() constructor with keyword arguments
inventory = dict(torch=3, sword=1, health_potion=5)
# Method 3: dict() from a list of tuples
inventory = dict([("torch", 3), ("sword", 1), ("health_potion", 5)])
# Empty dictionary
empty = {}
also_empty = dict()
All three methods produce the same result. Method 1 is what you'll use 90% of the time. Method 2 is clean but only works when keys are valid Python identifiers (no spaces, no special characters). Method 3 is useful when you're building a dict from another data source.
9.2.2 Accessing Values
Use square brackets with the key to access a value:
inventory = {"torch": 3, "sword": 1, "health_potion": 5}
print(inventory["torch"]) # 3
print(inventory["health_potion"]) # 5
Output:
3
5
But here's where beginners get burned — accessing a key that doesn't exist raises a KeyError:
inventory = {"torch": 3, "sword": 1, "health_potion": 5}
print(inventory["shield"]) # KeyError: 'shield'
Output:
Traceback (most recent call last):
File "example.py", line 3, in <module>
print(inventory["shield"])
~~~~~~~~~^^^^^^^^^^
KeyError: 'shield'
🐛 Debugging Walkthrough: KeyError
This is the single most common dictionary error. It happens when you try to access a key that isn't in the dict. Common causes:
- Typo in the key name.
inventory["Torch"]is not the same asinventory["torch"]. Keys are case-sensitive.- Assuming a key exists before adding it. If you read data from user input and haven't validated it, the key might not be there.
- Different data types.
inventory[1]andinventory["1"]are different keys. The integer1is not the same as the string"1".Fix pattern: Use
.get()(next section), useinto check first, or use try/except (Chapter 11).
9.2.3 Safe Access with .get()
The .get() method lets you provide a default value when a key is missing:
inventory = {"torch": 3, "sword": 1, "health_potion": 5}
# .get() returns the value if the key exists
print(inventory.get("torch")) # 3
# .get() returns None if the key doesn't exist (no crash)
print(inventory.get("shield")) # None
# .get() with a custom default
print(inventory.get("shield", 0)) # 0
# Compare: this would crash
# print(inventory["shield"]) # KeyError!
Output:
3
None
0
Use .get() when you're not sure a key exists and you want a fallback. Use bracket access (dict[key]) when the key should exist and a missing key means something is wrong — you want the error so you can fix the bug.
✅ Best Practice: Don't use
.get()everywhere "just to be safe." If a key must exist and it's missing, that's a bug you want to catch. Reserve.get()for cases where missing keys are genuinely expected — like counting occurrences or reading optional configuration.
9.2.4 Membership Testing with in
You can check whether a key exists before accessing it:
inventory = {"torch": 3, "sword": 1, "health_potion": 5}
if "torch" in inventory:
print(f"You have {inventory['torch']} torches.")
else:
print("No torches found.")
# Note: `in` checks KEYS, not values
print(3 in inventory) # False — 3 is a value, not a key
Output:
You have 3 torches.
False
The in operator checks keys, not values. This is important. If you need to check whether a value exists, use in inventory.values() — but be aware that searching values is O(n), not O(1).
🔄 Check Your Understanding (try before looking)
What does the following code print?
python config = {"debug": True, "verbose": False, "max_retries": 3} print(config.get("timeout", 30)) print("verbose" in config) print(False in config)
Verify
30 True False"timeout"isn't a key, so.get()returns the default30."verbose"is a key, soinreturnsTrue.Falseis a value, not a key, soinreturnsFalse.
9.3 Adding, Updating, and Removing Entries
Dictionaries are mutable — you can change them after creation.
🔗 Connection — Spaced Review (Ch 8): In Chapter 8, you learned that lists are mutable but tuples are not. Dictionaries are mutable too, and they share the same aliasing behavior — if two variables point to the same dict, changing one affects the other. Keep the name-tag model from Chapter 3 in mind.
9.3.1 Adding and Updating
inventory = {"torch": 3, "sword": 1}
# Add a new key-value pair
inventory["shield"] = 1
print(inventory)
# {'torch': 3, 'sword': 1, 'shield': 1}
# Update an existing value
inventory["torch"] = 5
print(inventory)
# {'torch': 5, 'sword': 1, 'shield': 1}
# update() merges another dict into this one
inventory.update({"health_potion": 3, "torch": 7})
print(inventory)
# {'torch': 7, 'sword': 1, 'shield': 1, 'health_potion': 3}
Output:
{'torch': 3, 'sword': 1, 'shield': 1}
{'torch': 5, 'sword': 1, 'shield': 1}
{'torch': 7, 'sword': 1, 'shield': 1, 'health_potion': 3}
Notice that update() overwrites existing keys (torch went from 5 to 7) and adds new ones (health_potion).
9.3.2 The Merge Operator (Python 3.9+)
Python 3.9 introduced the | operator for merging dictionaries:
defaults = {"color": "blue", "size": "medium", "verbose": False}
user_prefs = {"color": "green", "verbose": True}
# Merge: user_prefs overrides defaults where keys overlap
final = defaults | user_prefs
print(final)
# {'color': 'green', 'size': 'medium', 'verbose': True}
# In-place merge
defaults |= user_prefs
print(defaults)
# {'color': 'green', 'size': 'medium', 'verbose': True}
Output:
{'color': 'green', 'size': 'medium', 'verbose': True}
{'color': 'green', 'size': 'medium', 'verbose': True}
9.3.3 Removing Entries
inventory = {"torch": 3, "sword": 1, "shield": 1, "health_potion": 5}
# del removes a specific key (raises KeyError if missing)
del inventory["shield"]
print(inventory)
# {'torch': 3, 'sword': 1, 'health_potion': 5}
# pop() removes and returns the value (with optional default)
potion_count = inventory.pop("health_potion")
print(f"Removed {potion_count} health potions")
# Removed 5 health potions
# pop() with a default (no KeyError)
gone = inventory.pop("amulet", 0)
print(f"Removed {gone} amulets")
# Removed 0 amulets
# clear() removes everything
inventory.clear()
print(inventory)
# {}
Output:
{'torch': 3, 'sword': 1, 'health_potion': 5}
Removed 5 health potions
Removed 0 amulets
{}
⚠️ Pitfall: Never modify a dictionary's keys while iterating over it. This raises a
RuntimeError. If you need to remove items during iteration, iterate over a copy of the keys:```python
WRONG — will crash
for key in scores: if scores[key] < 50: del scores[key] # RuntimeError!
RIGHT — iterate over a copy
for key in list(scores): if scores[key] < 50: del scores[key] ```
9.4 Iterating Over Dictionaries
This is where dictionaries really show their versatility. You have three views into a dictionary, and each one is useful for different patterns.
9.4.1 Iterating Over Keys
rooms = {
"entrance": "A dim hallway stretches before you.",
"armory": "Weapons line the walls. A rusty sword catches your eye.",
"throne_room": "A golden throne sits atop a dais of black marble.",
}
# Default iteration is over keys
for room in rooms:
print(room)
Output:
entrance
armory
throne_room
9.4.2 Iterating Over Values
rooms = {
"entrance": "A dim hallway stretches before you.",
"armory": "Weapons line the walls. A rusty sword catches your eye.",
"throne_room": "A golden throne sits atop a dais of black marble.",
}
for description in rooms.values():
print(description)
Output:
A dim hallway stretches before you.
Weapons line the walls. A rusty sword catches your eye.
A golden throne sits atop a dais of black marble.
9.4.3 Iterating Over Key-Value Pairs
This is the most common pattern — .items() gives you both the key and the value in each iteration:
rooms = {
"entrance": "A dim hallway stretches before you.",
"armory": "Weapons line the walls. A rusty sword catches your eye.",
"throne_room": "A golden throne sits atop a dais of black marble.",
}
for room_name, description in rooms.items():
print(f"[{room_name.upper()}]")
print(f" {description}")
print()
Output:
[ENTRANCE]
A dim hallway stretches before you.
[ARMORY]
Weapons line the walls. A rusty sword catches your eye.
[THRONE_ROOM]
A golden throne sits atop a dais of black marble.
🔗 Connection — Spaced Review (Ch 5, Ch 8): The tuple unpacking in
for room_name, description in rooms.items()works exactly like unpacking a list of tuples (Chapter 8). Each call to.items()yields a(key, value)tuple, and theforloop unpacks it into two variables.
9.4.4 Practical Pattern: Building a Frequency Table
Here's one of the most important dictionary patterns in all of programming — counting occurrences:
sentence = "the cat sat on the mat the cat"
words = sentence.split()
# Count how many times each word appears
word_counts = {}
for word in words:
if word in word_counts:
word_counts[word] += 1
else:
word_counts[word] = 1
print(word_counts)
Output:
{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}
🔗 Connection — Spaced Review (Ch 7): The
.split()method from Chapter 7 breaks a string into a list of words — exactly what we need to feed into a frequency counter.
There's a more elegant version using .get():
sentence = "the cat sat on the mat the cat"
words = sentence.split()
word_counts = {}
for word in words:
word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)
Output:
{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}
That one-liner — word_counts[word] = word_counts.get(word, 0) + 1 — is a pattern you'll use hundreds of times. It says: "Get the current count (or 0 if this is the first time), add 1, and store the result."
9.4.5 Anchor Example: Dr. Patel Counts Nucleotides
Dr. Anika Patel needs to count the frequency of each nucleotide (A, T, G, C) in a DNA sequence. This is the exact same pattern:
dna_sequence = "ATGCGATCGATCGATCATGCATGC"
nucleotide_counts = {}
for base in dna_sequence:
nucleotide_counts[base] = nucleotide_counts.get(base, 0) + 1
print("Nucleotide frequencies:")
for base, count in sorted(nucleotide_counts.items()):
percentage = (count / len(dna_sequence)) * 100
print(f" {base}: {count:>3} ({percentage:.1f}%)")
# GC content (important in genomics)
gc = nucleotide_counts.get("G", 0) + nucleotide_counts.get("C", 0)
gc_content = gc / len(dna_sequence) * 100
print(f"\nGC Content: {gc_content:.1f}%")
Output:
Nucleotide frequencies:
A: 5 (20.8%)
C: 7 (29.2%)
G: 6 (25.0%)
T: 6 (25.0%)
GC Content: 54.2%
🔄 Check Your Understanding
Given the dictionary
scores = {"Alice": 92, "Bob": 87, "Carol": 95}, write a loop that prints only students who scored above 90 and their scores. Use.items().
Verify
python for name, score in scores.items(): if score > 90: print(f"{name}: {score}")Prints:Alice: 92 Carol: 95
9.5 Dictionary Comprehensions
Just as you can build lists with list comprehensions (if you've read ahead to Chapter 8's advanced material, or you'll see them later), you can build dictionaries with dictionary comprehensions. The syntax is similar but uses curly braces and a colon:
# Basic syntax: {key_expr: value_expr for item in iterable}
# Squares of numbers 1-5
squares = {n: n**2 for n in range(1, 6)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Output:
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
9.5.1 Transforming Data
# Convert temperatures from Celsius to Fahrenheit
celsius_temps = {"Monday": 20, "Tuesday": 22, "Wednesday": 19,
"Thursday": 25, "Friday": 23}
fahrenheit = {day: (temp * 9/5) + 32 for day, temp in celsius_temps.items()}
print(fahrenheit)
Output:
{'Monday': 68.0, 'Tuesday': 71.6, 'Wednesday': 66.2, 'Thursday': 77.0, 'Friday': 73.4}
9.5.2 Filtering with Conditions
scores = {"Alice": 92, "Bob": 67, "Carol": 95, "Dave": 43, "Eve": 88}
# Keep only passing students (score >= 70)
passing = {name: score for name, score in scores.items() if score >= 70}
print(passing)
Output:
{'Alice': 92, 'Carol': 95, 'Eve': 88}
9.5.3 Inverting a Dictionary
Sometimes you need to swap keys and values:
extensions = {"py": "Python", "js": "JavaScript", "rs": "Rust"}
# Invert: language name -> extension
languages = {lang: ext for ext, lang in extensions.items()}
print(languages)
Output:
{'Python': 'py', 'JavaScript': 'js', 'Rust': 'rs'}
⚠️ Pitfall: Inverting only works cleanly when values are unique. If two keys share the same value, one will overwrite the other — whichever comes last wins.
✅ Best Practice: Use dict comprehensions when the transformation is simple and fits on one or two lines. If you need multiple conditions, nested loops, or complex logic, a regular
forloop is more readable. Readability always wins over cleverness.
9.6 Nested Dictionaries
Real-world data is rarely flat. You'll often need dictionaries inside dictionaries to represent structured records.
9.6.1 Student Records
students = {
"Alice": {"id": "A001", "scores": [90, 85, 92], "major": "CS"},
"Bob": {"id": "A002", "scores": [78, 82, 74], "major": "Math"},
"Carol": {"id": "A003", "scores": [95, 98, 91], "major": "CS"},
}
# Access nested data
print(students["Alice"]["major"]) # CS
print(students["Bob"]["scores"][1]) # 82
# Calculate Alice's average
alice_avg = sum(students["Alice"]["scores"]) / len(students["Alice"]["scores"])
print(f"Alice's average: {alice_avg:.1f}") # 89.0
# Find all CS majors
cs_students = [name for name, info in students.items() if info["major"] == "CS"]
print(f"CS majors: {cs_students}")
Output:
CS
82
Alice's average: 89.0
CS majors: ['Alice', 'Carol']
9.6.2 Nested Dicts as Proto-JSON
If you've ever seen data from a web API or a configuration file, you've seen JSON — and JSON maps almost directly to Python dictionaries and lists:
# This is what real-world API data looks like
weather_data = {
"city": "Portland",
"temperature": {"current": 18, "high": 22, "low": 14},
"conditions": "partly cloudy",
"forecast": [
{"day": "Monday", "high": 20, "low": 12},
{"day": "Tuesday", "high": 24, "low": 15},
],
}
# Navigating nested data
print(f"Current temp in {weather_data['city']}: "
f"{weather_data['temperature']['current']}°C")
for day_forecast in weather_data["forecast"]:
print(f" {day_forecast['day']}: {day_forecast['high']}°C / "
f"{day_forecast['low']}°C")
Output:
Current temp in Portland: 18°C
Monday: 20°C / 12°C
Tuesday: 24°C / 15°C
🔗 Connection — Bridge Forward: In Chapter 10, you'll learn to read and write JSON files directly. The
jsonmodule converts between JSON text and Python dicts/lists — which means everything you learn about nested dicts right now transfers directly to working with real-world data files and web APIs.
9.7 Sets: Unique Collections
A set is an unordered collection of unique elements. If dictionaries are about mapping keys to values, sets are about asking one question: is this element in the collection?
9.7.1 Creating Sets
# Set literal (curly braces, no colons)
fruits = {"apple", "banana", "cherry"}
print(fruits) # {'cherry', 'banana', 'apple'} (order may vary)
print(type(fruits)) # <class 'set'>
# From a list (automatically removes duplicates)
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
unique_numbers = set(numbers)
print(unique_numbers) # {1, 2, 3, 4, 5, 6, 9}
# Empty set (NOT {} — that's an empty dict!)
empty_set = set()
print(type(empty_set)) # <class 'set'>
print(type({})) # <class 'dict'> ← surprise!
Output:
{'cherry', 'banana', 'apple'}
<class 'set'>
{1, 2, 3, 4, 5, 6, 9}
<class 'set'>
<class 'dict'>
⚠️ Pitfall:
{}creates an empty dictionary, not an empty set. To create an empty set, you must useset(). This is one of Python's well-known gotchas.
9.7.2 Deduplication
One of the most common uses of sets is removing duplicates from a list:
raw_emails = [
"alice@example.com", "bob@example.com", "alice@example.com",
"carol@example.com", "bob@example.com", "dave@example.com",
]
unique_emails = list(set(raw_emails))
print(f"Before: {len(raw_emails)} emails")
print(f"After: {len(unique_emails)} unique emails")
print(unique_emails)
Output:
Before: 6 emails
After: 4 unique emails
['carol@example.com', 'dave@example.com', 'alice@example.com', 'bob@example.com']
⚠️ Pitfall: Converting to a set loses the original order. If order matters, use
list(dict.fromkeys(raw_emails))instead — this preserves insertion order while removing duplicates (since dict keys are unique and ordered in Python 3.7+).
9.7.3 Fast Membership Testing
Sets shine when you need to check whether an element exists in a large collection. Like dictionaries, sets use hashing for O(1) lookup:
# Simulating: check if words are in a dictionary of valid English words
valid_words = {"apple", "banana", "cherry", "date", "elderberry"}
test_words = ["apple", "xylophone", "cherry", "quartz"]
for word in test_words:
if word in valid_words:
print(f" '{word}' — valid")
else:
print(f" '{word}' — not found")
Output:
'apple' — valid
'xylophone' — not found
'cherry' — valid
'quartz' — not found
9.7.4 Modifying Sets
colors = {"red", "green", "blue"}
# Add a single element
colors.add("yellow")
print(colors) # {'red', 'green', 'blue', 'yellow'}
# Adding a duplicate does nothing (no error)
colors.add("red")
print(colors) # {'red', 'green', 'blue', 'yellow'}
# Remove an element (raises KeyError if missing)
colors.remove("green")
print(colors) # {'red', 'blue', 'yellow'}
# Discard — like remove, but no error if missing
colors.discard("purple") # No error
print(colors) # {'red', 'blue', 'yellow'}
Output:
{'blue', 'green', 'yellow', 'red'}
{'blue', 'green', 'yellow', 'red'}
{'blue', 'yellow', 'red'}
{'blue', 'yellow', 'red'}
9.8 Set Operations
Sets support the mathematical operations you might remember from a math class — union, intersection, difference, and symmetric difference. These are powerful tools for comparing collections.
python_devs = {"Alice", "Bob", "Carol", "Dave"}
java_devs = {"Bob", "Dave", "Eve", "Frank"}
# Union: everyone who knows at least one language
all_devs = python_devs | java_devs # or python_devs.union(java_devs)
print(f"All devs: {all_devs}")
# Intersection: people who know both
both = python_devs & java_devs # or python_devs.intersection(java_devs)
print(f"Know both: {both}")
# Difference: Python devs who don't know Java
python_only = python_devs - java_devs # or python_devs.difference(java_devs)
print(f"Python only: {python_only}")
# Symmetric difference: people who know exactly one language
one_language = python_devs ^ java_devs # or python_devs.symmetric_difference(java_devs)
print(f"Exactly one: {one_language}")
# Subset/superset checks
print(f"\nIs {{'Bob', 'Carol'}} a subset of python_devs? {{'Bob', 'Carol'} <= python_devs}")
Output:
All devs: {'Carol', 'Eve', 'Dave', 'Bob', 'Alice', 'Frank'}
Know both: {'Bob', 'Dave'}
Python only: {'Carol', 'Alice'}
Exactly one: {'Carol', 'Eve', 'Alice', 'Frank'}
Is {'Bob', 'Carol'} a subset of python_devs? True
| Operation | Operator | Method | Returns |
|---|---|---|---|
| Union | a \| b |
a.union(b) |
All elements from both sets |
| Intersection | a & b |
a.intersection(b) |
Elements in both sets |
| Difference | a - b |
a.difference(b) |
Elements in a but not in b |
| Symmetric Difference | a ^ b |
a.symmetric_difference(b) |
Elements in either set, but not both |
| Subset | a <= b |
a.issubset(b) |
True if all of a is in b |
| Superset | a >= b |
a.issuperset(b) |
True if a contains all of b |
🔄 Check Your Understanding
Given:
python enrolled = {"Alice", "Bob", "Carol", "Dave", "Eve"} submitted = {"Alice", "Carol", "Eve", "Frank"}1. Who submitted but isn't enrolled? (Possible cheating alert!) 2. Who is enrolled but didn't submit? (Missing assignments!) 3. How many unique people are involved in total?
Verify
submitted - enrolled→{"Frank"}enrolled - submitted→{"Bob", "Dave"}len(enrolled | submitted)→6
9.9 Choosing the Right Data Structure
This is the most important section of the chapter. Everything before this has been how to use dicts and sets. This section is about when.
🚪 Threshold Concept: Hash-Based O(1) Lookup
Before this clicks: "To find something, I search through everything." You think of data as a list, and looking something up means looping through it.
After this clicks: "With a dictionary, I jump directly to what I need — no searching." You understand that dicts and sets use hashing to convert a key into a memory address, giving you constant-time lookup regardless of the collection size.
Why this matters practically:
```python import time
Create a list and a dict with 1 million entries
big_list = list(range(1_000_000)) big_dict = {i: True for i in range(1_000_000)}
Search for the last element in each
target = 999_999
start = time.perf_counter() found_list = target in big_list list_time = time.perf_counter() - start
start = time.perf_counter() found_dict = target in big_dict dict_time = time.perf_counter() - start
print(f"List search: {list_time:.6f} seconds") print(f"Dict search: {dict_time:.6f} seconds") print(f"Dict is ~{list_time / dict_time:.0f}x faster") ```
Typical output:
List search: 0.012483 seconds Dict search: 0.000001 seconds Dict is ~12483x fasterThat's not a typo. The dictionary lookup is roughly ten thousand times faster for a million elements. And the gap gets wider as the collection grows. A list search is O(n) — it grows linearly with size. A dict/set lookup is O(1) — it stays constant.
The mental shift: Once you internalize this, you'll start asking "Can I use a dict here?" whenever you see code that searches through a list repeatedly. This single insight will make you write dramatically faster programs.
9.9.1 The Decision Guide
| Need | Use | Why |
|---|---|---|
| Ordered sequence, accessed by position | list | scores[3] — the 4th score |
| Fixed collection that shouldn't change | tuple | (lat, lon) — immutable coordinates |
| Key-value mapping, fast lookup by key | dict | contacts["Jordan"] — instant access |
| Unique elements, membership testing | set | "Alice" in enrolled — instant check |
| Count occurrences | dict (or Counter) |
word_counts["the"] — direct access to count |
| Remove duplicates from a list | set | set(emails) — dedup in one step |
| Associate records with an ID | dict | students["A001"] — lookup by student ID |
| Compare two groups | set | team_a & team_b — who's on both? |
| Preserve insertion order with fast lookup | dict | Python 3.7+ dicts maintain insertion order |
| Stack (LIFO) | list | append() and pop() — Chapter 20 |
| Queue (FIFO) | collections.deque | O(1) on both ends — Chapter 20 |
9.9.2 How Hashing Works (Simplified)
When you put a key into a dictionary, Python calls hash() on it to compute an integer. That integer is used (after some math) as an index into an internal array. When you look up the key later, Python computes the same hash and jumps directly to that spot. No searching required.
# hash() converts objects to integers
print(hash("hello")) # e.g., 2486828820262256434 (varies per session)
print(hash(42)) # 42
print(hash((1, 2, 3))) # e.g., 529344067295497451
# Mutable objects can't be hashed — they'd change, invalidating the hash
# hash([1, 2, 3]) # TypeError: unhashable type: 'list'
🐛 Debugging Walkthrough: Unhashable Type
You'll hit this error if you try to use a mutable object as a dictionary key or set element:
```python
This crashes
grid = {} grid[[0, 1]] = "start" # TypeError: unhashable type: 'list'
Fix: use a tuple instead (tuples are immutable, so they're hashable)
grid = {} grid[(0, 1)] = "start" # Works! ```
Rule of thumb: Anything you can use as a dict key or put in a set must be immutable — strings, numbers, tuples (of immutables), frozensets, booleans. Lists, dicts, and sets are mutable, so they can't be keys.
🔗 Connection — Spaced Review (Ch 8): This is why tuples exist! In Chapter 8, you might have wondered: "Why would I ever use a tuple instead of a list?" Here's one major reason — tuples can be dictionary keys and set elements; lists cannot.
9.10 defaultdict and Counter from collections
The standard library's collections module has two tools that make common dictionary patterns even cleaner.
9.10.1 defaultdict
A defaultdict automatically creates a default value when you access a missing key. No more checking if key in dict before every operation:
from collections import defaultdict
# Regular dict: need to check before incrementing
word_counts = {}
for word in "the cat sat on the mat the cat".split():
if word not in word_counts:
word_counts[word] = 0
word_counts[word] += 1
# defaultdict(int): automatically starts at 0
word_counts = defaultdict(int)
for word in "the cat sat on the mat the cat".split():
word_counts[word] += 1 # No check needed!
print(dict(word_counts))
Output:
{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}
You can also use defaultdict(list) to auto-create empty lists:
from collections import defaultdict
# Group students by grade
grades = [("Alice", "A"), ("Bob", "B"), ("Carol", "A"),
("Dave", "C"), ("Eve", "B")]
by_grade = defaultdict(list)
for name, grade in grades:
by_grade[grade].append(name)
for grade, students in sorted(by_grade.items()):
print(f"Grade {grade}: {', '.join(students)}")
Output:
Grade A: Alice, Carol
Grade B: Bob, Eve
Grade C: Dave
9.10.2 Counter
Counter is purpose-built for counting things. It's like a dictionary that counts automatically:
from collections import Counter
# Count characters
letter_counts = Counter("mississippi")
print(letter_counts)
# Counter({'s': 4, 'i': 4, 'p': 2, 'm': 1})
# Most common elements
print(letter_counts.most_common(2))
# [('s', 4), ('i', 4)]
# Count words
words = "the cat sat on the mat the cat".split()
word_counts = Counter(words)
print(word_counts)
# Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})
# Counters support arithmetic!
bag1 = Counter(apple=3, banana=2)
bag2 = Counter(apple=1, banana=4, cherry=2)
print(bag1 + bag2) # Counter({'banana': 6, 'apple': 4, 'cherry': 2})
Output:
Counter({'s': 4, 'i': 4, 'p': 2, 'm': 1})
[('s', 4), ('i', 4)]
Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})
Counter({'banana': 6, 'apple': 4, 'cherry': 2})
✅ Best Practice: For frequency counting, use
Counterin production code. It's clearer, faster, and already handles all the edge cases. Use the manualforloop +.get()pattern when you're learning — but reach forCounterwhen you're writing real programs.
9.11 Project Checkpoint: TaskFlow v0.8
Time to upgrade TaskFlow. In v0.7, tasks were stored as tuples. That worked, but accessing task[2] to get the priority is fragile and unreadable. Let's upgrade to dictionaries.
What's New in v0.8
- Each task is a dictionary with named fields:
name,priority,category,created_at - Filter and display tasks by category
- Show tasks grouped by category
- Task list stored as a list of dictionaries
The Code
Here's the core of TaskFlow v0.8 (full version in code/project-checkpoint.py):
from datetime import datetime
def create_task(name, priority="medium", category="general"):
"""Create a new task as a dictionary."""
return {
"name": name,
"priority": priority,
"category": category,
"created_at": datetime.now().strftime("%Y-%m-%d %H:%M"),
"done": False,
}
def list_tasks(tasks, category_filter=None):
"""Display tasks, optionally filtered by category."""
filtered = tasks
if category_filter:
filtered = [t for t in tasks if t["category"] == category_filter]
if not filtered:
print("No tasks found.")
return
for i, task in enumerate(filtered, 1):
status = "done" if task["done"] else "todo"
print(f" {i}. [{status}] {task['name']} "
f"(priority: {task['priority']}, "
f"category: {task['category']})")
def show_by_category(tasks):
"""Group and display tasks by category."""
categories = {}
for task in tasks:
cat = task["category"]
if cat not in categories:
categories[cat] = []
categories[cat].append(task)
for cat, cat_tasks in sorted(categories.items()):
print(f"\n [{cat.upper()}]")
for task in cat_tasks:
status = "done" if task["done"] else "todo"
print(f" - [{status}] {task['name']} ({task['priority']})")
Try it out:
# Create some tasks
tasks = [
create_task("Write chapter outline", "high", "writing"),
create_task("Buy groceries", "medium", "personal"),
create_task("Review pull request", "high", "work"),
create_task("Draft blog post", "low", "writing"),
create_task("Fix login bug", "high", "work"),
]
print("=== All Tasks ===")
list_tasks(tasks)
print("\n=== Work Tasks Only ===")
list_tasks(tasks, category_filter="work")
print("\n=== Tasks by Category ===")
show_by_category(tasks)
Output:
=== All Tasks ===
1. [todo] Write chapter outline (priority: high, category: writing)
2. [todo] Buy groceries (priority: medium, category: personal)
3. [todo] Review pull request (priority: high, category: work)
4. [todo] Draft blog post (priority: low, category: writing)
5. [todo] Fix login bug (priority: high, category: work)
=== Work Tasks Only ===
1. [todo] Review pull request (priority: high, category: work)
2. [todo] Fix login bug (priority: high, category: work)
=== Tasks by Category ===
[PERSONAL]
- [todo] Buy groceries (medium)
[WORK]
- [todo] Review pull request (high)
- [todo] Fix login bug (high)
[WRITING]
- [todo] Write chapter outline (high)
- [todo] Draft blog post (low)
🧩 Productive Struggle: Before you look at the
show_by_categoryfunction above, try this: given a flat list of task dictionaries, how would you build a dictionary that groups them by category? What would the keys be? What would the values be? Sketch it out on paper or in the REPL before reading the solution.
Notice how much more readable task["name"] is compared to task[2]. That's the power of dictionaries as records — your code becomes self-documenting.
🔗 Connection — Bridge Forward (Ch 10): In the next chapter, you'll save this task list to a JSON file and load it on startup. Because tasks are now dictionaries, they'll convert to JSON naturally — JSON is nested dicts and lists. This is no coincidence: we designed the data structure with persistence in mind.
🔗 Connection — Bridge Forward (Ch 14): In Chapter 14, you'll convert these task dictionaries into proper
Taskobjects using classes. The dictionary is a stepping stone — it gives you named fields, but a class gives you named fields plus methods and validation.
Chapter Summary
Dictionaries and sets are hash-based data structures that transform how you organize and access data. Dictionaries map keys to values with O(1) lookup, making them ideal for representing records, counting frequencies, and building lookup tables. Sets store unique elements and support powerful mathematical operations for comparing collections.
The key insight from this chapter is the threshold concept of hash-based O(1) lookup: when you need to find something in a large collection, a dictionary or set lets you jump directly to it instead of scanning every element. This isn't just a performance optimization — it's a fundamental shift in what problems become practical to solve.
You also learned that choosing the right data structure is a skill in itself. Lists for ordered sequences, tuples for immutable data, dicts for key-value mappings, sets for unique collections — each has its sweet spot, and picking the right one makes your code simpler, faster, and more readable.
Key concepts introduced:
1. Dictionary creation, access (bracket and .get()), and CRUD operations
2. Dictionary iteration patterns (keys, values, items)
3. Dictionary comprehensions for data transformation
4. Sets for uniqueness and membership testing
5. Set operations (union, intersection, difference, symmetric difference)
What's next: In Chapter 10, you'll learn to read and write files — giving your programs the ability to persist data beyond a single execution. Your TaskFlow tasks will survive between sessions.