Chapter 7 Key Takeaways: Data Structures


The Big Ideas

1. Data structures are a business decision, not just a technical one.

Choosing a list instead of a dict isn't about syntax preference — it reflects a business judgment about how your data will be used. A product catalog is a list because it changes and is ordered. A customer record is a dict because it has named fields you look up by name. A set of active market segments is a set because uniqueness and fast membership testing are what matter, not order.

Before writing any code, ask: What questions will I need to ask of this data? What operations will be fast or slow with each structure? The right structure makes the analysis almost write itself. The wrong structure creates friction at every step.

2. Lists: ordered, mutable, positional.

Core behaviors to internalize:
- Zero-indexed. The first item is at position 0. Negative indices count from the end: list[-1] is the last item of any non-empty list.
- Slicing returns a sub-list: list[start:stop], where start is inclusive and stop is exclusive.
- .append() adds one item to the end. .extend() adds every item from another iterable. These are not interchangeable.
- .sort() modifies the list in place and returns None. sorted() returns a new sorted list and leaves the original unchanged. Prefer sorted() when you want to keep the original order available.
- Use the key= parameter to sort by any field: sorted(catalog, key=lambda item: item["price"]).
- List comprehensions are the idiomatic, concise way to filter and transform: [expr for item in items if condition].

When to reach for a list: Any time you have an ordered collection of items that might change — product catalogs, sales records, queues, ranked results.
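
The list behaviors above can be sketched in a few lines. This is a minimal illustration, not code from the chapter; the catalog contents and the names `tags`, `by_price`, and `affordable_names` are made up for the example.

```python
# Hypothetical catalog; names and prices are illustrative only.
catalog = [
    {"name": "Desk", "price": 240.0},
    {"name": "Lamp", "price": 35.5},
    {"name": "Chair", "price": 120.0},
]

first = catalog[0]["name"]        # zero-indexed: first item
last = catalog[-1]["name"]        # negative index: last item
first_two = catalog[0:2]          # slice: start inclusive, stop exclusive

# append adds ONE item; extend adds each item of an iterable.
tags = ["office"]
tags.append("sale")               # ["office", "sale"]
tags.extend(["new", "featured"])  # ["office", "sale", "new", "featured"]

# sorted() returns a new list; the original order is untouched.
by_price = sorted(catalog, key=lambda item: item["price"])

# List comprehension: filter and transform in one expression.
affordable_names = [item["name"] for item in catalog if item["price"] < 200]
```

Note that `catalog` is still in its original order after the call to `sorted()`, which is why `sorted()` is the safer default when other code depends on that order.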

3. Tuples: ordered, immutable, fixed records.

Core behaviors to internalize:
- Tuples are read-only. They cannot be appended to, sorted in place, or modified. This is a feature: it communicates that the data represents a complete, fixed snapshot.
- Tuple unpacking is powerful: region, product, amount = record unpacks a three-element tuple into three names in one line.
- Named tuples (collections.namedtuple) give you field names without the overhead of a full class. They're ideal for representing rows of data.
- Tuples can be used as dictionary keys as long as every element is itself hashable (strings, numbers, other such tuples); lists cannot.

When to reach for a tuple: Fixed records where position has meaning (quarterly results, coordinates, a row from a CSV), configuration values that shouldn't change, return values from functions that yield multiple pieces of related information.
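
As a rough sketch of the tuple behaviors above, assuming a made-up sales record; the `Sale` type and the `quarterly_target` table are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical sales record: (region, product, amount).
record = ("West", "Desk", 1250.0)
region, product, amount = record   # unpack three fields in one line

# A named tuple adds field names while staying immutable.
Sale = namedtuple("Sale", ["region", "product", "amount"])
sale = Sale(region="West", product="Desk", amount=1250.0)

# Tuples of hashable values can serve as dict keys; lists cannot.
quarterly_target = {("West", "Q1"): 50_000, ("East", "Q1"): 42_000}

# Attempting to modify a tuple raises TypeError.
try:
    record[2] = 999.0
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False
```

A named tuple still supports positional access (`sale[0]`), so it can be dropped into code that expects a plain tuple.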

4. Dictionaries: key-value stores with named fields.

Core behaviors to internalize:
- Use .get(key, default) whenever a key might be absent. Direct bracket access (dict[key]) raises KeyError on missing keys.
- .items() returns (key, value) pairs for iteration. This is the go-to pattern for processing all fields in a record.
- .keys() and .values() give you just the keys or just the values when that's all you need.
- .setdefault(key, default) sets a key if absent and returns its value. It is a compact way to build a dict of lists in a single loop.
- Dict comprehensions build lookup tables in one line: {sku: price for sku, _, price in products}, where each product is a three-item tuple.
- Python 3.9+ supports the merge operator: config = base | override. Earlier versions use {**base, **override}. In both forms, values from override win when a key appears in both.

When to reach for a dict: Any structured record with named fields (customer profiles, product specs, configuration), any aggregation that groups by a label (sales by region), any fast-lookup table (SKU to price, ID to record).
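
The dictionary behaviors above might look like this in practice. This is a sketch under assumptions: the `customer` record, the `sales` rows, and the config keys are invented for the example. The merge uses the `{**base, **override}` form so it also runs on Python versions before 3.9.

```python
# Hypothetical customer record and sales rows, for illustration only.
customer = {"name": "Acme Corp", "tier": "gold"}

# .get() returns a default instead of raising KeyError.
region = customer.get("region", "unknown")

# .items() yields (key, value) pairs.
fields = [f"{key}={value}" for key, value in customer.items()]

# .setdefault() builds a dict of lists in one pass.
sales = [("West", 100), ("East", 250), ("West", 75)]
amounts_by_region = {}
for sale_region, amount in sales:
    amounts_by_region.setdefault(sale_region, []).append(amount)

# Merge: keys in the right-hand dict win.
# On Python 3.9+, base | override gives the same result.
base = {"currency": "USD", "round": 2}
override = {"round": 0}
config = {**base, **override}
```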

5. Sets: unique, unordered, fast membership testing.

Core behaviors to internalize:
- Sets automatically eliminate duplicates. Adding a value that already exists is a no-op.
- Sets have no guaranteed order and no index. You can't access my_set[0].
- Set operations mirror mathematical set theory and map directly onto common business comparisons:
  - | (union): all items from either set
  - & (intersection): items that appear in both
  - - (difference): items in A but not B
  - ^ (symmetric difference): items in one but not both
- in membership testing is O(1) on average for sets, much faster than scanning a list, which is O(n).

When to reach for a set: Deduplication, tracking unique values (customer IDs seen so far, SKUs already processed), comparing two groups (who lapsed? who's new? who's retained?), validating that all required fields are present.
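
A small sketch of the set operations above, applied to the lapsed/new/retained comparison. The customer IDs and the `required_fields` check are hypothetical examples, not data from the chapter.

```python
# Hypothetical customer IDs for two periods.
last_period = {"C001", "C002", "C003"}
this_period = {"C002", "C003", "C004", "C004"}  # the duplicate collapses

new_customers = this_period - last_period       # difference
lapsed_customers = last_period - this_period
retained = last_period & this_period            # intersection
all_seen = last_period | this_period            # union
changed = last_period ^ this_period             # symmetric difference

# Validate that a row contains every required field.
required_fields = {"sku", "price"}
row = {"sku": "A-100", "price": 19.99, "name": "Widget"}
missing = required_fields - set(row)            # empty set means nothing is missing
```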

6. Lists of dicts are the universal business data table.

The most important pattern in this chapter is a list of dictionaries where each dict is a row and each key is a column name. This pattern:

- Mirrors spreadsheet rows and database query results
- Works naturally with Python's built-in tools (sorted, max, sum, comprehensions)
- Is what you get back from CSV readers, REST APIs, JSON files, and most databases

Master these four operations on a list of dicts:
- Filter: [r for r in records if r["region"] == "West"]
- Sort: sorted(records, key=lambda r: r["amount"], reverse=True)
- Aggregate: start with totals = {} and run totals[r["region"]] = totals.get(r["region"], 0) + r["amount"] for each r in records. Use a loop here; a dict comprehension cannot read a running total while it is being built.
- Top-N: sorted(records, key=lambda r: r["amount"], reverse=True)[:n]

These four patterns, combined with sum(), max(), min(), and len(), can answer nearly any question Sandra or any other stakeholder will ask about a dataset.
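
The four operations can be sketched end to end on a tiny table. The rows and field names below are assumptions chosen for illustration.

```python
# Hypothetical sales rows; field names are assumptions for illustration.
records = [
    {"region": "West", "rep": "Ana", "amount": 1200},
    {"region": "East", "rep": "Ben", "amount": 800},
    {"region": "West", "rep": "Cy", "amount": 450},
    {"region": "East", "rep": "Dee", "amount": 1500},
]

# Filter: keep only the West rows.
west = [r for r in records if r["region"] == "West"]

# Sort: largest amount first.
ranked = sorted(records, key=lambda r: r["amount"], reverse=True)

# Aggregate: a loop, because each step reads the running total.
totals = {}
for r in records:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]

# Top-N: slice the ranked list.
n = 2
top_n = ranked[:n]

grand_total = sum(r["amount"] for r in records)
```

Here `totals` ends up as one entry per region, and `top_n` holds the two largest rows.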

7. Deep copy vs. shallow copy is a real bug, not a theoretical one.

When you do list_b = list_a.copy() and your list contains dictionaries, you have two lists pointing to the same dictionary objects. Modifying a dict in list_b modifies the original in list_a.

The rule is simple:
- Flat lists of immutable values (ints, strings, tuples of immutable values): .copy() or list[:] is safe.
- Lists containing dicts, lists, or other mutable objects: use copy.deepcopy() for a fully independent copy.

This matters most when you're creating a backup before modifying data, returning a filtered sub-list that might be modified by the caller, or passing a list to a function that you don't want to alter your original.
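
A minimal sketch of the difference, assuming a made-up price list. The variable names are illustrative.

```python
import copy

# Hypothetical price list; values are illustrative.
original = [{"sku": "A-100", "price": 10.0}, {"sku": "B-200", "price": 20.0}]

shallow = original.copy()        # new outer list, same inner dicts
deep = copy.deepcopy(original)   # new outer list AND new inner dicts

shallow[0]["price"] = 99.0       # also changes original[0]
deep[1]["price"] = 55.0          # original[1] is unaffected

shares_inner = shallow[0] is original[0]     # True: same dict object
independent = deep[1] is not original[1]     # True: separate dict object
```

After this runs, `original[0]["price"]` is 99.0 even though the code never assigned to `original` directly, which is exactly the bug a "backup" made with `.copy()` fails to prevent.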

8. Choosing the right structure is a skill you develop through practice.

The decision guide:

| Need | Structure |
| --- | --- |
| Ordered collection, may change | `list` |
| Fixed record, won't change | `tuple` |
| Named fields, key-based access | `dict` |
| Uniqueness, membership testing, set math | `set` |
| Table of rows | `list[dict]` |
| Fast lookup by unique ID | `dict[str, dict]` |
| Groups of items by category | `dict[str, list]` |

When in doubt, start with a list of dicts. It's the most flexible structure and the easiest to extend as requirements change.
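
One reason starting with a list of dicts is low-risk: the composite structures in the guide can be derived from it when needed. A sketch, assuming hypothetical product rows with `sku`, `category`, and `price` fields:

```python
# Hypothetical product rows (a list[dict] table).
products = [
    {"sku": "A-100", "category": "desks", "price": 240.0},
    {"sku": "B-200", "category": "lamps", "price": 35.5},
    {"sku": "C-300", "category": "desks", "price": 310.0},
]

# dict[str, dict]: fast lookup by unique ID.
by_sku = {p["sku"]: p for p in products}

# dict[str, list]: groups of items by category.
by_category = {}
for p in products:
    by_category.setdefault(p["category"], []).append(p["sku"])

# set: the unique categories present in the table.
categories = {p["category"] for p in products}
```

Note that `by_sku` holds references to the same row dicts as `products`, so an update made through one is visible through the other.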

Patterns Worth Memorizing

# --- Safe dict access ---
value = d.get("key", default_value)

# --- Dict aggregation ---
totals[key] = totals.get(key, 0) + amount

# --- Build dict of lists ---
groups.setdefault(key, []).append(item)

# --- Filter a list of dicts ---
filtered = [r for r in records if r["field"] == value]

# --- Sort a list of dicts by a field ---
ranked = sorted(records, key=lambda r: r["field"], reverse=True)

# --- Top-N from a list of dicts ---
top_n = sorted(records, key=lambda r: r["score"], reverse=True)[:n]

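# --- Deduplicate a list (order is NOT preserved) ---
unique = list(set(original_list))
# --- Deduplicate a list, keeping first-seen order (Python 3.7+) ---
unique_in_order = list(dict.fromkeys(original_list))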

# --- Set operations for business comparison ---
new_this_period = this_period - last_period
lost_this_period = last_period - this_period
retained_this_period = last_period & this_period

# --- Safe deep copy of nested data ---
import copy
safe_backup = copy.deepcopy(original_data)

# --- Dict comprehension for lookup table ---
price_by_sku = {sku: price for sku, _, price in products}

# --- Merge dicts (Python 3.9+) ---
merged = base_config | override_config

Common Mistakes to Avoid

Forgetting that dict[key] crashes on missing keys. Always use .get() unless you are certain the key exists.

Using .copy() on nested data. Shallow copies share inner objects. Use copy.deepcopy() when you need true independence.

Confusing .append() and .extend(). append adds one item; extend unpacks an iterable. list.append([1,2,3]) creates a nested list; list.extend([1,2,3]) adds three items.

Sorting a list of dicts without a key=. Dicts do not support ordering comparisons, so sorted(records) raises TypeError as soon as it has to compare two of them. Always provide key=lambda r: r["field"].
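
A quick way to see the failure and the fix side by side, using two made-up rows:

```python
# Hypothetical rows for illustration.
records = [{"amount": 5}, {"amount": 2}]

# Without key=, Python has to compare the dicts themselves and fails.
try:
    sorted(records)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# With key=, Python compares the extracted values instead.
ranked = sorted(records, key=lambda r: r["amount"])
```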

Relying on dict ordering in Python < 3.7. If you need to support older Python, dicts do not guarantee insertion order. In Python 3.7+, they do.

Creating a set with {}. An empty {} creates an empty dict, not a set. Use set() for an empty set.

Modifying a list while iterating over it. Never remove items from a list inside a for loop that's iterating that same list; the loop skips elements as positions shift. Build a new list instead: [item for item in items if keep_condition(item)].
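
The skipped-element problem is easy to reproduce. A small sketch with made-up order amounts, where the goal is to drop every zero:

```python
# Hypothetical order amounts; the goal is to drop the zeros.
orders = [0, 0, 50, 0, 75]

# Buggy: removing during iteration shifts later items left,
# so the loop steps over some of them.
buggy = orders.copy()
for amount in buggy:
    if amount == 0:
        buggy.remove(amount)

# Safe: build a new list and leave the original alone.
cleaned = [amount for amount in orders if amount != 0]
```

The buggy loop finishes without an error but leaves a zero behind in `buggy`, which is what makes this mistake hard to spot.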

What You're Ready For Next

With data structures mastered, you can now:
- Work with data from files and APIs (Chapter 9), where the data comes back as lists and dicts
- Write reusable functions that accept and return structured data (Chapter 8 builds on this heavily)
- Build complete command-line business tools that accept user input and update in-memory databases
- Understand what's happening under the hood when you use libraries like Pandas, which are built on these same primitives

The patterns from this chapter — list of dicts, dict aggregation, deep copy, set operations — are not "beginner" patterns that you'll outgrow. They are the foundation of every Python data application, from the simplest script to production analytics pipelines.

Continue to Chapter 8: Functions — Writing Reusable Code.