Chapter 9: Dictionaries and Sets: Organizing Data

Contributors

16 min read

> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

Prerequisites

6
8

Learning Objectives

Create and manipulate dictionaries using CRUD operations
Iterate over dictionary keys, values, and items using practical patterns
Use dictionary comprehensions and the defaultdict type
Apply sets for membership testing, deduplication, and set operations
Choose the right data structure (list vs. dict vs. set vs. tuple) for a given problem

In This Chapter

Chapter Overview
9.1 Beyond Lists: When Order Isn't Enough
9.2 Creating and Accessing Dictionaries
9.3 Adding, Updating, and Removing Entries
9.4 Iterating Over Dictionaries
9.5 Dictionary Comprehensions
9.6 Nested Dictionaries
9.7 Sets: Unique Collections
9.8 Set Operations
9.9 Choosing the Right Data Structure
9.10 defaultdict and Counter from collections
9.11 Project Checkpoint: TaskFlow v0.8
Chapter Summary

Chapter 9: Dictionaries and Sets: Organizing Data

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." — Linus Torvalds

Chapter Overview

Imagine you're building a contacts app. You have a thousand phone numbers. Someone asks: "What's Jordan's number?" If your data is stored in a list, you have to scan every entry until you stumble across Jordan — and if Jordan is the 999th entry, that's 999 checks. If your data is stored in a dictionary, you look up "Jordan" and get the answer immediately. Not faster. Immediately. One step, regardless of whether you have ten contacts or ten million.

That's the power of hash-based data structures, and it's what this chapter is about.

Dictionaries and sets are two of the most frequently used data structures in professional Python code. Dictionaries let you associate keys with values — think of a real-world dictionary where you look up a word (key) and get its definition (value). Sets let you work with collections where uniqueness matters and membership testing needs to be fast. Together, they unlock patterns that would be painfully slow or awkward with lists alone.

If Chapter 8 taught you to think in sequences, this chapter teaches you to think in mappings and memberships. It's a different mental model, and once it clicks, you'll start reaching for dicts and sets constantly.

In this chapter, you will learn to: - Create and manipulate dictionaries using CRUD operations (create, read, update, delete) - Iterate over dictionary keys, values, and items using efficient patterns - Write dictionary comprehensions to transform data - Use sets for deduplication, membership testing, and set operations - Choose the right data structure for any given problem

🏃 Fast Track: If you're comfortable with basic dict syntax, skim 9.1-9.3, slow down at 9.5 (comprehensions) and 9.9 (choosing data structures), and make sure you understand the threshold concept in 9.9.

🔬 Deep Dive: Sections 9.6 (nested dicts) and 9.10 (defaultdict and Counter) are your springboards — they connect directly to JSON handling in Chapter 10 and OOP data modeling in Chapter 14.

9.1 Beyond Lists: When Order Isn't Enough

Lists are great when your data is naturally sequential — a series of test scores, a playlist of songs, a batch of filenames to process. You access items by position: "Give me the third element." But many real problems don't work that way.

Consider these scenarios:

You have a student record: name, ID, GPA, major. Which one is student[2]? Was that GPA or major? (Don't guess. Look it up. Every time.)
You're counting how many times each word appears in a document. With a list, you'd need to search the entire list for every single word.
You have a game inventory: 3 torches, 1 sword, 5 health potions. With a list, how do you check if the player has a torch? Loop through everything?

In all these cases, what you really want is to look something up by name, not by position. That's exactly what a dictionary gives you.

💡 Intuition: Think of a real dictionary — not the Python kind, the book kind. You don't start at page 1 and read every entry until you find "ephemeral." You use the word itself to jump directly to its definition. Python dictionaries work the same way, but with any kind of key, not just alphabetical words.

🔗 Connection — Spaced Review (Ch 5): In Chapter 5, you learned that for loops let you iterate over sequences. Dictionaries are iterable too, but with a twist — you can iterate over keys, values, or key-value pairs. The loop patterns you learned still apply; you just have more choices about what you're looping over.

9.2 Creating and Accessing Dictionaries

9.2.1 Dictionary Syntax

A dictionary is an unordered collection of key-value pairs. Each key maps to a value. Keys must be unique within a dictionary, and they must be hashable (we'll explain what that means shortly). Values can be anything.

Here are three ways to create a dictionary:

# Method 1: Curly brace literal (most common)
inventory = {"torch": 3, "sword": 1, "health_potion": 5}

# Method 2: dict() constructor with keyword arguments
inventory = dict(torch=3, sword=1, health_potion=5)

# Method 3: dict() from a list of tuples
inventory = dict([("torch", 3), ("sword", 1), ("health_potion", 5)])

# Empty dictionary
empty = {}
also_empty = dict()

All three methods produce the same result. Method 1 is what you'll use 90% of the time. Method 2 is clean but only works when keys are valid Python identifiers (no spaces, no special characters). Method 3 is useful when you're building a dict from another data source.

9.2.2 Accessing Values

Use square brackets with the key to access a value:

inventory = {"torch": 3, "sword": 1, "health_potion": 5}

print(inventory["torch"])        # 3
print(inventory["health_potion"])  # 5

Output:

3
5

But here's where beginners get burned — accessing a key that doesn't exist raises a KeyError:

inventory = {"torch": 3, "sword": 1, "health_potion": 5}

print(inventory["shield"])  # KeyError: 'shield'

Output:

Traceback (most recent call last):
  File "example.py", line 3, in <module>
    print(inventory["shield"])
          ~~~~~~~~~^^^^^^^^^^
KeyError: 'shield'

🐛 Debugging Walkthrough: KeyError

This is the single most common dictionary error. It happens when you try to access a key that isn't in the dict. Common causes:

Typo in the key name. inventory["Torch"] is not the same as inventory["torch"]. Keys are case-sensitive.

Assuming a key exists before adding it. If you read data from user input and haven't validated it, the key might not be there.

Different data types. inventory[1] and inventory["1"] are different keys. The integer 1 is not the same as the string "1".

Fix pattern: Use .get() (next section), use in to check first, or use try/except (Chapter 11).

9.2.3 Safe Access with `.get()`

The .get() method lets you provide a default value when a key is missing:

inventory = {"torch": 3, "sword": 1, "health_potion": 5}

# .get() returns the value if the key exists
print(inventory.get("torch"))         # 3

# .get() returns None if the key doesn't exist (no crash)
print(inventory.get("shield"))        # None

# .get() with a custom default
print(inventory.get("shield", 0))     # 0

# Compare: this would crash
# print(inventory["shield"])          # KeyError!

Output:

3
None
0

Use .get() when you're not sure a key exists and you want a fallback. Use bracket access (dict[key]) when the key should exist and a missing key means something is wrong — you want the error so you can fix the bug.

✅ Best Practice: Don't use .get() everywhere "just to be safe." If a key must exist and it's missing, that's a bug you want to catch. Reserve .get() for cases where missing keys are genuinely expected — like counting occurrences or reading optional configuration.

9.2.4 Membership Testing with `in`

You can check whether a key exists before accessing it:

inventory = {"torch": 3, "sword": 1, "health_potion": 5}

if "torch" in inventory:
    print(f"You have {inventory['torch']} torches.")
else:
    print("No torches found.")

# Note: `in` checks KEYS, not values
print(3 in inventory)  # False — 3 is a value, not a key

Output:

You have 3 torches.
False

The in operator checks keys, not values. This is important. If you need to check whether a value exists, use in inventory.values() — but be aware that searching values is O(n), not O(1).

🔄 Check Your Understanding (try before looking)

What does the following code print? python config = {"debug": True, "verbose": False, "max_retries": 3} print(config.get("timeout", 30)) print("verbose" in config) print(False in config)

Verify

30 True False "timeout" isn't a key, so .get() returns the default 30. "verbose" is a key, so in returns True. False is a value, not a key, so in returns False.

9.3 Adding, Updating, and Removing Entries

Dictionaries are mutable — you can change them after creation.

🔗 Connection — Spaced Review (Ch 8): In Chapter 8, you learned that lists are mutable but tuples are not. Dictionaries are mutable too, and they share the same aliasing behavior — if two variables point to the same dict, changing one affects the other. Keep the name-tag model from Chapter 3 in mind.

9.3.1 Adding and Updating

inventory = {"torch": 3, "sword": 1}

# Add a new key-value pair
inventory["shield"] = 1
print(inventory)
# {'torch': 3, 'sword': 1, 'shield': 1}

# Update an existing value
inventory["torch"] = 5
print(inventory)
# {'torch': 5, 'sword': 1, 'shield': 1}

# update() merges another dict into this one
inventory.update({"health_potion": 3, "torch": 7})
print(inventory)
# {'torch': 7, 'sword': 1, 'shield': 1, 'health_potion': 3}

Output:

{'torch': 3, 'sword': 1, 'shield': 1}
{'torch': 5, 'sword': 1, 'shield': 1}
{'torch': 7, 'sword': 1, 'shield': 1, 'health_potion': 3}

Notice that update() overwrites existing keys (torch went from 5 to 7) and adds new ones (health_potion).

9.3.2 The Merge Operator (Python 3.9+)

Python 3.9 introduced the | operator for merging dictionaries:

defaults = {"color": "blue", "size": "medium", "verbose": False}
user_prefs = {"color": "green", "verbose": True}

# Merge: user_prefs overrides defaults where keys overlap
final = defaults | user_prefs
print(final)
# {'color': 'green', 'size': 'medium', 'verbose': True}

# In-place merge
defaults |= user_prefs
print(defaults)
# {'color': 'green', 'size': 'medium', 'verbose': True}

Output:

{'color': 'green', 'size': 'medium', 'verbose': True}
{'color': 'green', 'size': 'medium', 'verbose': True}

9.3.3 Removing Entries

inventory = {"torch": 3, "sword": 1, "shield": 1, "health_potion": 5}

# del removes a specific key (raises KeyError if missing)
del inventory["shield"]
print(inventory)
# {'torch': 3, 'sword': 1, 'health_potion': 5}

# pop() removes and returns the value (with optional default)
potion_count = inventory.pop("health_potion")
print(f"Removed {potion_count} health potions")
# Removed 5 health potions

# pop() with a default (no KeyError)
gone = inventory.pop("amulet", 0)
print(f"Removed {gone} amulets")
# Removed 0 amulets

# clear() removes everything
inventory.clear()
print(inventory)
# {}

Output:

{'torch': 3, 'sword': 1, 'health_potion': 5}
Removed 5 health potions
Removed 0 amulets
{}

⚠️ Pitfall: Never modify a dictionary's keys while iterating over it. This raises a RuntimeError. If you need to remove items during iteration, iterate over a copy of the keys:

```python

WRONG — will crash

for key in scores: if scores[key] < 50: del scores[key] # RuntimeError!

RIGHT — iterate over a copy

for key in list(scores): if scores[key] < 50: del scores[key] ```

9.4 Iterating Over Dictionaries

This is where dictionaries really show their versatility. You have three views into a dictionary, and each one is useful for different patterns.

9.4.1 Iterating Over Keys

rooms = {
    "entrance": "A dim hallway stretches before you.",
    "armory": "Weapons line the walls. A rusty sword catches your eye.",
    "throne_room": "A golden throne sits atop a dais of black marble.",
}

# Default iteration is over keys
for room in rooms:
    print(room)

Output:

entrance
armory
throne_room

9.4.2 Iterating Over Values

rooms = {
    "entrance": "A dim hallway stretches before you.",
    "armory": "Weapons line the walls. A rusty sword catches your eye.",
    "throne_room": "A golden throne sits atop a dais of black marble.",
}

for description in rooms.values():
    print(description)

Output:

A dim hallway stretches before you.
Weapons line the walls. A rusty sword catches your eye.
A golden throne sits atop a dais of black marble.

9.4.3 Iterating Over Key-Value Pairs

This is the most common pattern — .items() gives you both the key and the value in each iteration:

rooms = {
    "entrance": "A dim hallway stretches before you.",
    "armory": "Weapons line the walls. A rusty sword catches your eye.",
    "throne_room": "A golden throne sits atop a dais of black marble.",
}

for room_name, description in rooms.items():
    print(f"[{room_name.upper()}]")
    print(f"  {description}")
    print()

Output:

[ENTRANCE]
  A dim hallway stretches before you.

[ARMORY]
  Weapons line the walls. A rusty sword catches your eye.

[THRONE_ROOM]
  A golden throne sits atop a dais of black marble.

🔗 Connection — Spaced Review (Ch 5, Ch 8): The tuple unpacking in for room_name, description in rooms.items() works exactly like unpacking a list of tuples (Chapter 8). Each call to .items() yields a (key, value) tuple, and the for loop unpacks it into two variables.

9.4.4 Practical Pattern: Building a Frequency Table

Here's one of the most important dictionary patterns in all of programming — counting occurrences:

sentence = "the cat sat on the mat the cat"
words = sentence.split()

# Count how many times each word appears
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

print(word_counts)

Output:

{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}

🔗 Connection — Spaced Review (Ch 7): The .split() method from Chapter 7 breaks a string into a list of words — exactly what we need to feed into a frequency counter.

There's a more elegant version using .get():

sentence = "the cat sat on the mat the cat"
words = sentence.split()

word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)

Output:

{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}

That one-liner — word_counts[word] = word_counts.get(word, 0) + 1 — is a pattern you'll use hundreds of times. It says: "Get the current count (or 0 if this is the first time), add 1, and store the result."

9.4.5 Anchor Example: Dr. Patel Counts Nucleotides

Dr. Anika Patel needs to count the frequency of each nucleotide (A, T, G, C) in a DNA sequence. This is the exact same pattern:

dna_sequence = "ATGCGATCGATCGATCATGCATGC"

nucleotide_counts = {}
for base in dna_sequence:
    nucleotide_counts[base] = nucleotide_counts.get(base, 0) + 1

print("Nucleotide frequencies:")
for base, count in sorted(nucleotide_counts.items()):
    percentage = (count / len(dna_sequence)) * 100
    print(f"  {base}: {count:>3} ({percentage:.1f}%)")

# GC content (important in genomics)
gc = nucleotide_counts.get("G", 0) + nucleotide_counts.get("C", 0)
gc_content = gc / len(dna_sequence) * 100
print(f"\nGC Content: {gc_content:.1f}%")

Output:

Nucleotide frequencies:
  A: 5 (20.8%)
  C: 7 (29.2%)
  G: 6 (25.0%)
  T: 6 (25.0%)

GC Content: 54.2%

🔄 Check Your Understanding

Given the dictionary scores = {"Alice": 92, "Bob": 87, "Carol": 95}, write a loop that prints only students who scored above 90 and their scores. Use .items().

Verify

python for name, score in scores.items(): if score > 90: print(f"{name}: {score}") Prints: Alice: 92 Carol: 95

9.5 Dictionary Comprehensions

Just as you can build lists with list comprehensions (if you've read ahead to Chapter 8's advanced material, or you'll see them later), you can build dictionaries with dictionary comprehensions. The syntax is similar but uses curly braces and a colon:

# Basic syntax: {key_expr: value_expr for item in iterable}

# Squares of numbers 1-5
squares = {n: n**2 for n in range(1, 6)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Output:

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

9.5.1 Transforming Data

# Convert temperatures from Celsius to Fahrenheit
celsius_temps = {"Monday": 20, "Tuesday": 22, "Wednesday": 19,
                 "Thursday": 25, "Friday": 23}

fahrenheit = {day: (temp * 9/5) + 32 for day, temp in celsius_temps.items()}
print(fahrenheit)

Output:

{'Monday': 68.0, 'Tuesday': 71.6, 'Wednesday': 66.2, 'Thursday': 77.0, 'Friday': 73.4}

9.5.2 Filtering with Conditions

scores = {"Alice": 92, "Bob": 67, "Carol": 95, "Dave": 43, "Eve": 88}

# Keep only passing students (score >= 70)
passing = {name: score for name, score in scores.items() if score >= 70}
print(passing)

Output:

{'Alice': 92, 'Carol': 95, 'Eve': 88}

9.5.3 Inverting a Dictionary

Sometimes you need to swap keys and values:

extensions = {"py": "Python", "js": "JavaScript", "rs": "Rust"}

# Invert: language name -> extension
languages = {lang: ext for ext, lang in extensions.items()}
print(languages)

Output:

{'Python': 'py', 'JavaScript': 'js', 'Rust': 'rs'}

⚠️ Pitfall: Inverting only works cleanly when values are unique. If two keys share the same value, one will overwrite the other — whichever comes last wins.

✅ Best Practice: Use dict comprehensions when the transformation is simple and fits on one or two lines. If you need multiple conditions, nested loops, or complex logic, a regular for loop is more readable. Readability always wins over cleverness.

9.6 Nested Dictionaries

Real-world data is rarely flat. You'll often need dictionaries inside dictionaries to represent structured records.

9.6.1 Student Records

students = {
    "Alice": {"id": "A001", "scores": [90, 85, 92], "major": "CS"},
    "Bob": {"id": "A002", "scores": [78, 82, 74], "major": "Math"},
    "Carol": {"id": "A003", "scores": [95, 98, 91], "major": "CS"},
}

# Access nested data
print(students["Alice"]["major"])       # CS
print(students["Bob"]["scores"][1])     # 82

# Calculate Alice's average
alice_avg = sum(students["Alice"]["scores"]) / len(students["Alice"]["scores"])
print(f"Alice's average: {alice_avg:.1f}")  # 89.0

# Find all CS majors
cs_students = [name for name, info in students.items() if info["major"] == "CS"]
print(f"CS majors: {cs_students}")

Output:

CS
82
Alice's average: 89.0
CS majors: ['Alice', 'Carol']

9.6.2 Nested Dicts as Proto-JSON

If you've ever seen data from a web API or a configuration file, you've seen JSON — and JSON maps almost directly to Python dictionaries and lists:

# This is what real-world API data looks like
weather_data = {
    "city": "Portland",
    "temperature": {"current": 18, "high": 22, "low": 14},
    "conditions": "partly cloudy",
    "forecast": [
        {"day": "Monday", "high": 20, "low": 12},
        {"day": "Tuesday", "high": 24, "low": 15},
    ],
}

# Navigating nested data
print(f"Current temp in {weather_data['city']}: "
      f"{weather_data['temperature']['current']}°C")

for day_forecast in weather_data["forecast"]:
    print(f"  {day_forecast['day']}: {day_forecast['high']}°C / "
          f"{day_forecast['low']}°C")

Output:

Current temp in Portland: 18°C
  Monday: 20°C / 12°C
  Tuesday: 24°C / 15°C

🔗 Connection — Bridge Forward: In Chapter 10, you'll learn to read and write JSON files directly. The json module converts between JSON text and Python dicts/lists — which means everything you learn about nested dicts right now transfers directly to working with real-world data files and web APIs.

9.7 Sets: Unique Collections

A set is an unordered collection of unique elements. If dictionaries are about mapping keys to values, sets are about asking one question: is this element in the collection?

9.7.1 Creating Sets

# Set literal (curly braces, no colons)
fruits = {"apple", "banana", "cherry"}
print(fruits)       # {'cherry', 'banana', 'apple'} (order may vary)
print(type(fruits)) # <class 'set'>

# From a list (automatically removes duplicates)
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
unique_numbers = set(numbers)
print(unique_numbers)  # {1, 2, 3, 4, 5, 6, 9}

# Empty set (NOT {} — that's an empty dict!)
empty_set = set()
print(type(empty_set))  # <class 'set'>
print(type({}))         # <class 'dict'>  ← surprise!

Output:

{'cherry', 'banana', 'apple'}
<class 'set'>
{1, 2, 3, 4, 5, 6, 9}
<class 'set'>
<class 'dict'>

⚠️ Pitfall: {} creates an empty dictionary, not an empty set. To create an empty set, you must use set(). This is one of Python's well-known gotchas.

9.7.2 Deduplication

One of the most common uses of sets is removing duplicates from a list:

raw_emails = [
    "alice@example.com", "bob@example.com", "alice@example.com",
    "carol@example.com", "bob@example.com", "dave@example.com",
]

unique_emails = list(set(raw_emails))
print(f"Before: {len(raw_emails)} emails")
print(f"After:  {len(unique_emails)} unique emails")
print(unique_emails)

Output:

Before: 6 emails
After:  4 unique emails
['carol@example.com', 'dave@example.com', 'alice@example.com', 'bob@example.com']

⚠️ Pitfall: Converting to a set loses the original order. If order matters, use list(dict.fromkeys(raw_emails)) instead — this preserves insertion order while removing duplicates (since dict keys are unique and ordered in Python 3.7+).

9.7.3 Fast Membership Testing

Sets shine when you need to check whether an element exists in a large collection. Like dictionaries, sets use hashing for O(1) lookup:

# Simulating: check if words are in a dictionary of valid English words
valid_words = {"apple", "banana", "cherry", "date", "elderberry"}

test_words = ["apple", "xylophone", "cherry", "quartz"]
for word in test_words:
    if word in valid_words:
        print(f"  '{word}' — valid")
    else:
        print(f"  '{word}' — not found")

Output:

  'apple' — valid
  'xylophone' — not found
  'cherry' — valid
  'quartz' — not found

9.7.4 Modifying Sets

colors = {"red", "green", "blue"}

# Add a single element
colors.add("yellow")
print(colors)  # {'red', 'green', 'blue', 'yellow'}

# Adding a duplicate does nothing (no error)
colors.add("red")
print(colors)  # {'red', 'green', 'blue', 'yellow'}

# Remove an element (raises KeyError if missing)
colors.remove("green")
print(colors)  # {'red', 'blue', 'yellow'}

# Discard — like remove, but no error if missing
colors.discard("purple")  # No error
print(colors)  # {'red', 'blue', 'yellow'}

Output:

{'blue', 'green', 'yellow', 'red'}
{'blue', 'green', 'yellow', 'red'}
{'blue', 'yellow', 'red'}
{'blue', 'yellow', 'red'}

9.8 Set Operations

Sets support the mathematical operations you might remember from a math class — union, intersection, difference, and symmetric difference. These are powerful tools for comparing collections.

python_devs = {"Alice", "Bob", "Carol", "Dave"}
java_devs = {"Bob", "Dave", "Eve", "Frank"}

# Union: everyone who knows at least one language
all_devs = python_devs | java_devs  # or python_devs.union(java_devs)
print(f"All devs:        {all_devs}")

# Intersection: people who know both
both = python_devs & java_devs  # or python_devs.intersection(java_devs)
print(f"Know both:       {both}")

# Difference: Python devs who don't know Java
python_only = python_devs - java_devs  # or python_devs.difference(java_devs)
print(f"Python only:     {python_only}")

# Symmetric difference: people who know exactly one language
one_language = python_devs ^ java_devs  # or python_devs.symmetric_difference(java_devs)
print(f"Exactly one:     {one_language}")

# Subset/superset checks
print(f"\nIs {{'Bob', 'Carol'}} a subset of python_devs? {{'Bob', 'Carol'} <= python_devs}")

Output:

All devs:        {'Carol', 'Eve', 'Dave', 'Bob', 'Alice', 'Frank'}
Know both:       {'Bob', 'Dave'}
Python only:     {'Carol', 'Alice'}
Exactly one:     {'Carol', 'Eve', 'Alice', 'Frank'}

Is {'Bob', 'Carol'} a subset of python_devs? True

Operation	Operator	Method	Returns
Union	`a \\| b`	`a.union(b)`	All elements from both sets
Intersection	`a & b`	`a.intersection(b)`	Elements in both sets
Difference	`a - b`	`a.difference(b)`	Elements in `a` but not in `b`
Symmetric Difference	`a ^ b`	`a.symmetric_difference(b)`	Elements in either set, but not both
Subset	`a <= b`	`a.issubset(b)`	`True` if all of `a` is in `b`
Superset	`a >= b`	`a.issuperset(b)`	`True` if `a` contains all of `b`

🔄 Check Your Understanding

Given: python enrolled = {"Alice", "Bob", "Carol", "Dave", "Eve"} submitted = {"Alice", "Carol", "Eve", "Frank"} 1. Who submitted but isn't enrolled? (Possible cheating alert!) 2. Who is enrolled but didn't submit? (Missing assignments!) 3. How many unique people are involved in total?

Verify

submitted - enrolled → {"Frank"}

enrolled - submitted → {"Bob", "Dave"}

len(enrolled | submitted) → 6

9.9 Choosing the Right Data Structure

This is the most important section of the chapter. Everything before this has been how to use dicts and sets. This section is about when.

🚪 Threshold Concept: Hash-Based O(1) Lookup

Before this clicks: "To find something, I search through everything." You think of data as a list, and looking something up means looping through it.

After this clicks: "With a dictionary, I jump directly to what I need — no searching." You understand that dicts and sets use hashing to convert a key into a memory address, giving you constant-time lookup regardless of the collection size.

Why this matters practically:

```python import time

Create a list and a dict with 1 million entries

big_list = list(range(1_000_000)) big_dict = {i: True for i in range(1_000_000)}

Search for the last element in each

target = 999_999

start = time.perf_counter() found_list = target in big_list list_time = time.perf_counter() - start

start = time.perf_counter() found_dict = target in big_dict dict_time = time.perf_counter() - start

print(f"List search: {list_time:.6f} seconds") print(f"Dict search: {dict_time:.6f} seconds") print(f"Dict is ~{list_time / dict_time:.0f}x faster") ```

Typical output: List search: 0.012483 seconds Dict search: 0.000001 seconds Dict is ~12483x faster

That's not a typo. The dictionary lookup is roughly ten thousand times faster for a million elements. And the gap gets wider as the collection grows. A list search is O(n) — it grows linearly with size. A dict/set lookup is O(1) — it stays constant.

The mental shift: Once you internalize this, you'll start asking "Can I use a dict here?" whenever you see code that searches through a list repeatedly. This single insight will make you write dramatically faster programs.

9.9.1 The Decision Guide

Need	Use	Why
Ordered sequence, accessed by position	list	`scores[3]` — the 4th score
Fixed collection that shouldn't change	tuple	`(lat, lon)` — immutable coordinates
Key-value mapping, fast lookup by key	dict	`contacts["Jordan"]` — instant access
Unique elements, membership testing	set	`"Alice" in enrolled` — instant check
Count occurrences	dict (or `Counter`)	`word_counts["the"]` — direct access to count
Remove duplicates from a list	set	`set(emails)` — dedup in one step
Associate records with an ID	dict	`students["A001"]` — lookup by student ID
Compare two groups	set	`team_a & team_b` — who's on both?
Preserve insertion order with fast lookup	dict	Python 3.7+ dicts maintain insertion order
Stack (LIFO)	list	`append()` and `pop()` — Chapter 20
Queue (FIFO)	collections.deque	O(1) on both ends — Chapter 20

9.9.2 How Hashing Works (Simplified)

When you put a key into a dictionary, Python calls hash() on it to compute an integer. That integer is used (after some math) as an index into an internal array. When you look up the key later, Python computes the same hash and jumps directly to that spot. No searching required.

# hash() converts objects to integers
print(hash("hello"))    # e.g., 2486828820262256434 (varies per session)
print(hash(42))         # 42
print(hash((1, 2, 3)))  # e.g., 529344067295497451

# Mutable objects can't be hashed — they'd change, invalidating the hash
# hash([1, 2, 3])  # TypeError: unhashable type: 'list'

🐛 Debugging Walkthrough: Unhashable Type

You'll hit this error if you try to use a mutable object as a dictionary key or set element:

```python

This crashes

grid = {} grid[[0, 1]] = "start" # TypeError: unhashable type: 'list'

Fix: use a tuple instead (tuples are immutable, so they're hashable)

grid = {} grid[(0, 1)] = "start" # Works! ```

Rule of thumb: Anything you can use as a dict key or put in a set must be immutable — strings, numbers, tuples (of immutables), frozensets, booleans. Lists, dicts, and sets are mutable, so they can't be keys.

🔗 Connection — Spaced Review (Ch 8): This is why tuples exist! In Chapter 8, you might have wondered: "Why would I ever use a tuple instead of a list?" Here's one major reason — tuples can be dictionary keys and set elements; lists cannot.

9.10 `defaultdict` and `Counter` from `collections`

The standard library's collections module has two tools that make common dictionary patterns even cleaner.

9.10.1 `defaultdict`

A defaultdict automatically creates a default value when you access a missing key. No more checking if key in dict before every operation:

from collections import defaultdict

# Regular dict: need to check before incrementing
word_counts = {}
for word in "the cat sat on the mat the cat".split():
    if word not in word_counts:
        word_counts[word] = 0
    word_counts[word] += 1

# defaultdict(int): automatically starts at 0
word_counts = defaultdict(int)
for word in "the cat sat on the mat the cat".split():
    word_counts[word] += 1  # No check needed!

print(dict(word_counts))

Output:

{'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}

You can also use defaultdict(list) to auto-create empty lists:

from collections import defaultdict

# Group students by grade
grades = [("Alice", "A"), ("Bob", "B"), ("Carol", "A"),
          ("Dave", "C"), ("Eve", "B")]

by_grade = defaultdict(list)
for name, grade in grades:
    by_grade[grade].append(name)

for grade, students in sorted(by_grade.items()):
    print(f"Grade {grade}: {', '.join(students)}")

Output:

Grade A: Alice, Carol
Grade B: Bob, Eve
Grade C: Dave

9.10.2 `Counter`

Counter is purpose-built for counting things. It's like a dictionary that counts automatically:

from collections import Counter

# Count characters
letter_counts = Counter("mississippi")
print(letter_counts)
# Counter({'s': 4, 'i': 4, 'p': 2, 'm': 1})

# Most common elements
print(letter_counts.most_common(2))
# [('s', 4), ('i', 4)]

# Count words
words = "the cat sat on the mat the cat".split()
word_counts = Counter(words)
print(word_counts)
# Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})

# Counters support arithmetic!
bag1 = Counter(apple=3, banana=2)
bag2 = Counter(apple=1, banana=4, cherry=2)
print(bag1 + bag2)  # Counter({'banana': 6, 'apple': 4, 'cherry': 2})

Output:

Counter({'s': 4, 'i': 4, 'p': 2, 'm': 1})
[('s', 4), ('i', 4)]
Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})
Counter({'banana': 6, 'apple': 4, 'cherry': 2})

✅ Best Practice: For frequency counting, use Counter in production code. It's clearer, faster, and already handles all the edge cases. Use the manual for loop + .get() pattern when you're learning — but reach for Counter when you're writing real programs.

9.11 Project Checkpoint: TaskFlow v0.8

Time to upgrade TaskFlow. In v0.7, tasks were stored as tuples. That worked, but accessing task[2] to get the priority is fragile and unreadable. Let's upgrade to dictionaries.

What's New in v0.8

Each task is a dictionary with named fields: name, priority, category, created_at
Filter and display tasks by category
Show tasks grouped by category
Task list stored as a list of dictionaries

The Code

Here's the core of TaskFlow v0.8 (full version in code/project-checkpoint.py):

from datetime import datetime

def create_task(name, priority="medium", category="general"):
    """Create a new task as a dictionary."""
    return {
        "name": name,
        "priority": priority,
        "category": category,
        "created_at": datetime.now().strftime("%Y-%m-%d %H:%M"),
        "done": False,
    }

def list_tasks(tasks, category_filter=None):
    """Display tasks, optionally filtered by category."""
    filtered = tasks
    if category_filter:
        filtered = [t for t in tasks if t["category"] == category_filter]

    if not filtered:
        print("No tasks found.")
        return

    for i, task in enumerate(filtered, 1):
        status = "done" if task["done"] else "todo"
        print(f"  {i}. [{status}] {task['name']} "
              f"(priority: {task['priority']}, "
              f"category: {task['category']})")

def show_by_category(tasks):
    """Group and display tasks by category."""
    categories = {}
    for task in tasks:
        cat = task["category"]
        if cat not in categories:
            categories[cat] = []
        categories[cat].append(task)

    for cat, cat_tasks in sorted(categories.items()):
        print(f"\n  [{cat.upper()}]")
        for task in cat_tasks:
            status = "done" if task["done"] else "todo"
            print(f"    - [{status}] {task['name']} ({task['priority']})")

Try it out:

# Create some tasks
tasks = [
    create_task("Write chapter outline", "high", "writing"),
    create_task("Buy groceries", "medium", "personal"),
    create_task("Review pull request", "high", "work"),
    create_task("Draft blog post", "low", "writing"),
    create_task("Fix login bug", "high", "work"),
]

print("=== All Tasks ===")
list_tasks(tasks)

print("\n=== Work Tasks Only ===")
list_tasks(tasks, category_filter="work")

print("\n=== Tasks by Category ===")
show_by_category(tasks)

Output:

=== All Tasks ===
  1. [todo] Write chapter outline (priority: high, category: writing)
  2. [todo] Buy groceries (priority: medium, category: personal)
  3. [todo] Review pull request (priority: high, category: work)
  4. [todo] Draft blog post (priority: low, category: writing)
  5. [todo] Fix login bug (priority: high, category: work)

=== Work Tasks Only ===
  1. [todo] Review pull request (priority: high, category: work)
  2. [todo] Fix login bug (priority: high, category: work)

=== Tasks by Category ===

  [PERSONAL]
    - [todo] Buy groceries (medium)

  [WORK]
    - [todo] Review pull request (high)
    - [todo] Fix login bug (high)

  [WRITING]
    - [todo] Write chapter outline (high)
    - [todo] Draft blog post (low)

🧩 Productive Struggle: Before you look at the show_by_category function above, try this: given a flat list of task dictionaries, how would you build a dictionary that groups them by category? What would the keys be? What would the values be? Sketch it out on paper or in the REPL before reading the solution.

Notice how much more readable task["name"] is compared to task[2]. That's the power of dictionaries as records — your code becomes self-documenting.

🔗 Connection — Bridge Forward (Ch 10): In the next chapter, you'll save this task list to a JSON file and load it on startup. Because tasks are now dictionaries, they'll convert to JSON naturally — JSON is nested dicts and lists. This is no coincidence: we designed the data structure with persistence in mind.

🔗 Connection — Bridge Forward (Ch 14): In Chapter 14, you'll convert these task dictionaries into proper Task objects using classes. The dictionary is a stepping stone — it gives you named fields, but a class gives you named fields plus methods and validation.

Chapter Summary

Dictionaries and sets are hash-based data structures that transform how you organize and access data. Dictionaries map keys to values with O(1) lookup, making them ideal for representing records, counting frequencies, and building lookup tables. Sets store unique elements and support powerful mathematical operations for comparing collections.

The key insight from this chapter is the threshold concept of hash-based O(1) lookup: when you need to find something in a large collection, a dictionary or set lets you jump directly to it instead of scanning every element. This isn't just a performance optimization — it's a fundamental shift in what problems become practical to solve.

You also learned that choosing the right data structure is a skill in itself. Lists for ordered sequences, tuples for immutable data, dicts for key-value mappings, sets for unique collections — each has its sweet spot, and picking the right one makes your code simpler, faster, and more readable.

Key concepts introduced: 1. Dictionary creation, access (bracket and .get()), and CRUD operations 2. Dictionary iteration patterns (keys, values, items) 3. Dictionary comprehensions for data transformation 4. Sets for uniqueness and membership testing 5. Set operations (union, intersection, difference, symmetric difference)

What's next: In Chapter 10, you'll learn to read and write files — giving your programs the ability to persist data beyond a single execution. Your TaskFlow tasks will survive between sessions.

Prerequisites

Learning Objectives

In This Chapter

Chapter 9: Dictionaries and Sets: Organizing Data

Chapter Overview

9.1 Beyond Lists: When Order Isn't Enough

9.2 Creating and Accessing Dictionaries

9.2.1 Dictionary Syntax

9.2.2 Accessing Values

9.2.3 Safe Access with .get()

9.2.4 Membership Testing with in

9.3 Adding, Updating, and Removing Entries

9.3.1 Adding and Updating

9.3.2 The Merge Operator (Python 3.9+)

9.3.3 Removing Entries

WRONG — will crash

RIGHT — iterate over a copy

9.4 Iterating Over Dictionaries

9.4.1 Iterating Over Keys

9.4.2 Iterating Over Values

9.4.3 Iterating Over Key-Value Pairs

9.4.4 Practical Pattern: Building a Frequency Table

9.4.5 Anchor Example: Dr. Patel Counts Nucleotides

9.5 Dictionary Comprehensions

9.5.1 Transforming Data

9.5.2 Filtering with Conditions

9.5.3 Inverting a Dictionary

9.6 Nested Dictionaries

9.6.1 Student Records

9.6.2 Nested Dicts as Proto-JSON

9.7 Sets: Unique Collections

9.7.1 Creating Sets

9.7.2 Deduplication

9.7.3 Fast Membership Testing

9.7.4 Modifying Sets

9.8 Set Operations

9.9 Choosing the Right Data Structure

Create a list and a dict with 1 million entries

Search for the last element in each

9.9.1 The Decision Guide

9.9.2 How Hashing Works (Simplified)

This crashes

Fix: use a tuple instead (tuples are immutable, so they're hashable)

9.10 defaultdict and Counter from collections

9.10.1 defaultdict

9.10.2 Counter

9.11 Project Checkpoint: TaskFlow v0.8

What's New in v0.8

The Code

Chapter Summary

9.2.3 Safe Access with `.get()`

9.2.4 Membership Testing with `in`

9.10 `defaultdict` and `Counter` from `collections`

9.10.1 `defaultdict`

9.10.2 `Counter`