Key Takeaways: Working with Data Structures
This is your reference card for Chapter 5. Keep it open while working through Chapter 6 — you will use every concept here when you load your first real dataset.
The Big Idea
Data structures are how you organize the world in code. Before this chapter, you stored individual values in variables. Now you can represent entire records, datasets, and mappings. The shift from "one value" to "structured collections" is the foundation of all data work in Python.
Data Structure Comparison Table
| Structure | Syntax | Ordered? | Mutable? | Duplicates? | Access By | Best For |
|---|---|---|---|---|---|---|
| List | `[1, 2, 3]` | Yes | Yes | Yes | Index `[i]` | Sequences, ordered records, iteration |
| Dictionary | `{"a": 1}` | Insertion order (3.7+) | Yes | Keys: No (values: Yes) | Key `["a"]` | Named fields, lookups, mappings |
| Set | `{1, 2, 3}` | No | Yes | No | N/A (test membership with `in`) | Unique values, membership tests, set math |
| Tuple | `(1, 2, 3)` | Yes | No | Yes | Index `[i]` | Fixed data, dict keys, return values |
Quick decision rule:
- Need to look up by name? Use a dictionary.
- Need an ordered, changeable collection? Use a list.
- Need unique values or fast membership checks? Use a set.
- Need a fixed, unchangeable sequence? Use a tuple.
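To see the table's columns in action, here is a small sketch that exercises each structure once. The values (`scores`, `player`, `point`) are hypothetical and chosen only to show ordering, mutability, and duplicate handling.

```python
# Hypothetical sample values, used only to illustrate each structure's behavior.
scores = [88, 92, 75, 92]               # list: ordered, mutable, duplicates allowed
scores.append(60)                       # lists can grow in place

player = {"name": "Ada", "team": "Red"} # dict: look values up by key
team = player["team"]

unique_scores = set(scores)             # set: duplicates collapse automatically
has_75 = 75 in unique_scores            # fast membership test

point = (3, 4)                          # tuple: fixed once created
try:
    point[0] = 10                       # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Tuples (of hashable items) can be dictionary keys; lists cannot.
distances = {point: 5.0}
```

The `try`/`except` shows immutability directly: Python raises `TypeError` rather than silently ignoring the assignment.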
Essential Methods Reference
List Methods
```python
my_list.append(item)       # Add to end
my_list.insert(i, item)    # Add at position i
my_list.remove(item)       # Remove first occurrence (ValueError if missing)
my_list.pop()              # Remove and return last item
my_list.pop(i)             # Remove and return item at index i
my_list.sort()             # Sort in place (returns None!)
sorted(my_list)            # Return NEW sorted list (original unchanged)
my_list.count(item)        # Count occurrences
my_list.index(item)        # Find first index of item (ValueError if missing)
len(my_list)               # Number of items
my_list.copy()             # Create a shallow copy
```
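The difference between `sort()` and `sorted()` is the one that trips people up most, so here is a short sketch that checks it, along with `pop`, `remove`, and `index`. The temperature values are hypothetical.

```python
# Hypothetical temperature readings, used only for illustration.
temps = [21.5, 19.0, 23.25, 19.0]

ordered = sorted(temps)   # a NEW list; temps itself is untouched
untouched = temps == [21.5, 19.0, 23.25, 19.0]

result = temps.sort()     # sorts temps in place and returns None,
                          # so `temps = temps.sort()` would throw your data away

largest = temps.pop()     # after sorting, the last item is the largest
temps.remove(19.0)        # removes only the FIRST 19.0

try:
    temps.index(100.0)    # .index() raises ValueError for a missing item
    found = True
except ValueError:
    found = False
```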
Dictionary Methods
```python
my_dict["key"]                 # Access (KeyError if missing!)
my_dict.get("key", default)    # Safe access (returns default if missing)
my_dict["key"] = value         # Add or update
del my_dict["key"]             # Delete a key-value pair
my_dict.keys()                 # All keys
my_dict.values()               # All values
my_dict.items()                # All (key, value) pairs
my_dict.update(other_dict)     # Merge another dict into this one
"key" in my_dict               # Check if key exists
len(my_dict)                   # Number of key-value pairs
```
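The sketch below contrasts bracket access with `.get()` and shows how `update()` both overwrites and adds keys. The `patient` record is hypothetical.

```python
# A hypothetical patient record, used only for illustration.
patient = {"id": "P001", "age": 47}

missing = patient.get("weight")          # None: no KeyError for a missing key
weight = patient.get("weight", 0.0)      # or supply your own default

try:
    patient["aeg"]                       # a typo in the key name ...
    raised = False
except KeyError:                         # ... raises KeyError
    raised = True

patient.update({"age": 48, "weight": 71.5})  # overwrites "age", adds "weight"
keys = list(patient.keys())              # insertion order is preserved (3.7+)
```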
Set Operations
```python
set_a | set_b          # Union: items in either
set_a & set_b          # Intersection: items in both
set_a - set_b          # Difference: items in A but not B
set_a ^ set_b          # Symmetric difference: items in exactly one
item in my_set         # Fast membership test
my_set.add(item)       # Add an item
my_set.discard(item)   # Remove (no error if missing)
```
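Here are the four set operators on two small, hypothetical club rosters, plus the `discard` vs `remove` distinction and the empty-set gotcha.

```python
# Hypothetical membership lists.
python_club = {"ana", "ben", "cai"}
stats_club = {"ben", "cai", "dee"}

either = python_club | stats_club       # union
both = python_club & stats_club         # intersection
only_python = python_club - stats_club  # difference
exactly_one = python_club ^ stats_club  # symmetric difference

python_club.discard("zed")              # absent item: no error
try:
    python_club.remove("zed")           # .remove() raises KeyError instead
    removed = True
except KeyError:
    removed = False

empty = set()                           # {} would create an empty dict, not a set
```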
File I/O Patterns
Reading a CSV file
```python
import csv

records = []
with open("data.csv", "r", newline="") as f:  # the csv docs recommend newline=""
    reader = csv.DictReader(f)
    for row in reader:
        record = {
            "name": row["name"],
            "value": float(row["value"]),  # Convert strings!
        }
        records.append(record)
```
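To see why the conversion step matters, this sketch reads CSV text from `io.StringIO` (an in-memory stand-in for `data.csv`) and compares the raw strings with the converted floats. The column names and rows are hypothetical.

```python
import csv
import io

# io.StringIO stands in for open("data.csv", newline=""); the columns
# "name" and "value" and their contents are hypothetical.
sample = io.StringIO("name,value\nalpha,1.5\nbeta,2\n")
raw_rows = list(csv.DictReader(sample))

# Without conversion, + joins the text instead of adding numbers.
joined = raw_rows[0]["value"] + raw_rows[1]["value"]

records = [{"name": row["name"], "value": float(row["value"])} for row in raw_rows]
total = sum(r["value"] for r in records)
```

`joined` ends up as the string `"1.52"`, which is exactly the kind of silent bug the "Convert strings!" comment warns about.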
Writing a CSV file
```python
import csv

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "value"])
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```
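This sketch writes to an in-memory buffer so you can inspect exactly what `DictWriter` produces. Note that the csv module ends each row with `"\r\n"` by default, which is why real files should be opened with `newline=""` (otherwise Windows can end up with doubled line breaks). The records are hypothetical.

```python
import csv
import io

# Hypothetical records to write.
records = [{"name": "alpha", "value": 1.5}, {"name": "beta", "value": 2.0}]

buffer = io.StringIO()   # in-memory stand-in for open("output.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "value"])
writer.writeheader()
for record in records:
    writer.writerow(record)

# A key that is not in fieldnames raises ValueError (default extrasaction="raise").
try:
    writer.writerow({"name": "gamma", "value": 3.0, "unit": "kg"})
    extra_rejected = False
except ValueError:
    extra_rejected = True

text = buffer.getvalue()
```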
Reading a JSON file
```python
import json

with open("data.json", "r") as f:
    data = json.load(f)  # Returns dict or list
```
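Whether `json.load` gives you a dict or a list depends on the file's top-level value. The sketch below uses `io.StringIO` in place of real files; the JSON contents are hypothetical.

```python
import io
import json

# In-memory stand-ins for two hypothetical JSON files.
obj_file = io.StringIO('{"name": "Ada", "scores": [90, 85]}')
list_file = io.StringIO('[{"name": "Ada"}, {"name": "Ben"}]')

as_dict = json.load(obj_file)   # top-level {...} becomes a dict
as_list = json.load(list_file)  # top-level [...] becomes a list of dicts
```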
Writing a JSON file
```python
import json

with open("output.json", "w") as f:
    json.dump(data, f, indent=2)  # indent for readability
```
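JSON has fewer types than Python, so a write-then-read round trip does not always give back exactly what you saved. This sketch uses `json.dumps`/`json.loads` (the string versions of `dump`/`load`) on a hypothetical record to show the conversions.

```python
import json

# Hypothetical record mixing Python types that JSON does not have.
data = {"point": (3, 4), 1: "one", "ok": True, "missing": None}

text = json.dumps(data, indent=2)   # same formatting as json.dump, but to a string
restored = json.loads(text)         # tuple -> list, int key -> str key

try:
    json.dumps({"tags": {"a", "b"}})  # sets are not JSON serializable
    set_ok = True
except TypeError:
    set_ok = False
```

If you need a set in JSON, convert it first, for example with `sorted(my_set)`.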
Comprehension Syntax
List Comprehension
```python
# Basic
[expression for item in iterable]

# With filter
[expression for item in iterable if condition]

# Examples
names = [record["name"] for record in data]
high = [r for r in data if r["score"] >= 80]
```
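A comprehension is shorthand for a loop that builds a list. This sketch writes the filter both ways on hypothetical records and confirms they agree.

```python
# Hypothetical records in the list-of-dicts shape this chapter uses.
data = [
    {"name": "Ada", "score": 91},
    {"name": "Ben", "score": 78},
    {"name": "Cai", "score": 85},
]

# Loop version of the filter
high_loop = []
for r in data:
    if r["score"] >= 80:
        high_loop.append(r)

# Comprehension versions
high = [r for r in data if r["score"] >= 80]
names = [record["name"] for record in data]
high_names = [r["name"] for r in data if r["score"] >= 80]  # filter and transform
```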
Dictionary Comprehension
```python
# Basic
{key_expr: value_expr for item in iterable}

# With filter
{key_expr: value_expr for item in iterable if condition}

# Examples
lookup = {r["name"]: r["score"] for r in data}
passing = {r["name"]: r["score"] for r in data if r["score"] >= 60}
```
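One thing to watch: dictionary keys are unique, so if two records share a key, the later one silently wins. This sketch uses hypothetical records where "Ada" appears twice.

```python
# Hypothetical records; note that "Ada" appears twice.
data = [
    {"name": "Ada", "score": 91},
    {"name": "Ben", "score": 55},
    {"name": "Ada", "score": 64},
]

lookup = {r["name"]: r["score"] for r in data}                       # last "Ada" wins
passing = {r["name"]: r["score"] for r in data if r["score"] >= 60}  # Ben filtered out

# Swapping keys and values with .items()
by_score = {score: name for name, score in lookup.items()}
```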
Common Patterns
Counting with dictionaries
```python
counts = {}
for item in my_list:
    counts[item] = counts.get(item, 0) + 1
```
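The standard library also provides `collections.Counter`, which does the same counting in one call. This sketch is an alternative to the pattern above, not a replacement for learning it; the color list is hypothetical.

```python
from collections import Counter

my_list = ["red", "blue", "red", "green", "red", "blue"]  # hypothetical data

# The manual pattern from above
counts = {}
for item in my_list:
    counts[item] = counts.get(item, 0) + 1

# Standard-library alternative
counter = Counter(my_list)
top = counter.most_common(1)   # [(item, count)] for the most frequent item
```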
Grouping with dictionaries
```python
groups = {}
for record in records:
    key = record["category"]
    if key not in groups:
        groups[key] = []
    groups[key].append(record)
```
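Two shorter ways to write the same grouping are `dict.setdefault` and `collections.defaultdict(list)`; both create the empty list for a new key automatically. The sketch below checks that they produce the same groups on hypothetical records.

```python
from collections import defaultdict

# Hypothetical records with a "category" field.
records = [
    {"name": "Ada", "category": "A"},
    {"name": "Ben", "category": "B"},
    {"name": "Cai", "category": "A"},
]

# setdefault returns the existing list, or inserts [] first
plain = {}
for record in records:
    plain.setdefault(record["category"], []).append(record)

# defaultdict(list) starts every missing key as an empty list
groups = defaultdict(list)
for record in records:
    groups[record["category"]].append(record)

names_by_group = {key: [r["name"] for r in members] for key, members in groups.items()}
```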
The read-process-write pipeline
```python
import csv

# Template: replace the ... placeholders with your own fields and logic.

# 1. Read
records = []
with open("input.csv", "r", newline="") as f:
    for row in csv.DictReader(f):
        records.append({...convert types...})

# 2. Process
for record in records:
    record["new_field"] = compute_something(record)

# 3. Write
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[...])
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```
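Here is one runnable way to fill in that template, assuming a hypothetical input with `name` and `score` columns and a hypothetical rule that a score of 60 or more passes. It works in a temporary directory so it leaves no files behind.

```python
import csv
import os
import tempfile

def run_pipeline(input_path, output_path):
    # 1. Read: convert the "score" column from str to int
    records = []
    with open(input_path, "r", newline="") as f:
        for row in csv.DictReader(f):
            records.append({"name": row["name"], "score": int(row["score"])})

    # 2. Process: add a derived field (hypothetical rule: pass at 60+)
    for record in records:
        record["passed"] = record["score"] >= 60

    # 3. Write
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "score", "passed"])
        writer.writeheader()
        for record in records:
            writer.writerow(record)
    return records

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.csv")
    out_path = os.path.join(tmp, "output.csv")
    with open(in_path, "w", newline="") as f:
        f.write("name,score\nAda,91\nBen,55\n")   # hypothetical input data
    result = run_pipeline(in_path, out_path)
    with open(out_path, "r", newline="") as f:
        written = list(csv.DictReader(f))         # read the output back
```

Reading the output back shows that `True`/`False` come back as the strings `"True"`/`"False"`, so a later pipeline step would need to convert them again.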
Common Mistakes to Avoid
- CSV values are strings. Always convert: `float(row["rate"])`, `int(row["count"])`.
- `my_list.sort()` returns `None`. Use `sorted(my_list)` if you need the result as a value.
- `b = a` for lists creates a reference, not a copy. Use `b = a.copy()` for an independent copy (a shallow one: nested lists inside are still shared).
- `bool("False")` is `True`. Any non-empty string is truthy. Compare explicitly: `value == "True"`.
- `KeyError` from typos. Use `.get()` or check with `"key" in my_dict` before accessing.
- Modifying a list while looping over it can skip items. Build a new list instead.
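Several of these mistakes are easier to believe once you see them happen. This sketch reproduces three of them with small, hypothetical values.

```python
# 1. Assignment shares one list; .copy() makes a separate (shallow) list.
a = [1, 2, 3]
b = a
c = a.copy()
a.append(4)                # b sees this change, c does not

nested = [[1], [2]]
shallow = nested.copy()
nested[0].append(99)       # inner lists are shared, so shallow sees this

# 2. Any non-empty string is truthy, including "False".
flag_text = "False"
as_bool = bool(flag_text)          # True, probably not what you meant
explicit = flag_text == "True"     # False, the intended reading

# 3. Removing items while looping over the same list skips elements.
buggy = [1, 2, 2, 3]
for v in buggy:
    if v == 2:
        buggy.remove(v)            # one 2 survives: the loop skips past it

values = [1, 2, 2, 3]
fixed = [v for v in values if v != 2]   # build a new list instead
```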
What You Should Be Able to Do Now
- [ ] Create and manipulate lists, dictionaries, sets, and tuples
- [ ] Choose the right data structure for a given scenario and explain your reasoning
- [ ] Represent a real-world record (patient, player, country) as a dictionary
- [ ] Represent a dataset as a list of dictionaries
- [ ] Navigate nested data structures (dictionaries within dictionaries, lists within dictionaries)
- [ ] Write list and dictionary comprehensions with optional filtering
- [ ] Read a CSV file with `csv.DictReader` and convert types appropriately
- [ ] Read a JSON file with `json.load`
- [ ] Write data to CSV and JSON files
- [ ] Debug `KeyError`, `IndexError`, and `FileNotFoundError`
- [ ] Explain the difference between mutable and immutable objects
If every item is checked, you are ready for Chapter 6 — your first real data analysis.