Key Takeaways: Working with Data Structures

Contributors to Introduction to Data Science


This is your reference card for Chapter 5. Keep it open while working through Chapter 6 — you will use every concept here when you load your first real dataset.

The Big Idea

Data structures are how you organize the world in code. Before this chapter, you stored individual values in variables. Now you can represent entire records, datasets, and mappings. The shift from "one value" to "structured collections" is the foundation of all data work in Python.

Data Structure Comparison Table

Structure	Syntax	Ordered?	Mutable?	Duplicates?	Access By	Best For
List	`[1, 2, 3]`	Yes	Yes	Yes	Index `[i]`	Sequences, ordered records, iteration
Dictionary	`{"a": 1}`	Insertion order (3.7+)	Yes	Keys: No; values: Yes	Key `["a"]`	Named fields, lookups, mappings
Set	`{1, 2, 3}` (empty set: `set()`)	No	Yes	No	No indexing; test with `in`	Unique values, membership tests, set math
Tuple	`(1, 2, 3)`	Yes	No	Yes	Index `[i]`	Fixed data, dict keys, return values

Quick decision rule:
- Need to look up by name? Use a dictionary.
- Need an ordered, changeable collection? Use a list.
- Need unique values or fast membership checks? Use a set.
- Need a fixed, unchangeable sequence? Use a tuple.
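As a quick illustration of the decision rule, here is one small record per structure. All names and values are hypothetical, chosen only to show which structure fits which job.

```python
# Hypothetical values chosen only to illustrate each structure.

# Look up by name -> dictionary
patient = {"name": "Ada", "age": 36, "blood_type": "O+"}

# Ordered, changeable collection -> list
daily_temps = [21.5, 23.0, 19.8]
daily_temps.append(22.1)  # lists can grow

# Unique values, fast membership checks -> set
visited = {"Kenya", "Peru", "Kenya"}  # the duplicate "Kenya" is dropped

# Fixed, unchangeable sequence -> tuple
location = (40.7, -74.0)  # (latitude, longitude)

# Tuples are hashable, so they can be dictionary keys; lists cannot
city_at = {location: "New York"}

print(patient["age"])          # 36
print(len(daily_temps))        # 4
print(len(visited))            # 2
print("Peru" in visited)       # True
print(city_at[(40.7, -74.0)])  # New York

try:
    location[0] = 0.0  # tuples do not support item assignment
except TypeError:
    print("tuples cannot be changed in place")
```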

Essential Methods Reference

List Methods

my_list.append(item)      # Add to end
my_list.insert(i, item)   # Add at position i
my_list.remove(item)      # Remove first occurrence (ValueError if missing)
my_list.pop()             # Remove and return last item (IndexError if empty)
my_list.pop(i)            # Remove and return item at index i
my_list.sort()            # Sort in place (returns None!)
sorted(my_list)           # Built-in: return a NEW sorted list (original unchanged)
my_list.count(item)       # Count occurrences
my_list.index(item)       # Find first index of item (ValueError if missing)
len(my_list)              # Number of items
my_list.copy()            # Create a shallow copy
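A short run through the list methods above, using a made-up list of scores. Pay particular attention to the difference between `sort()`, which changes the list and returns `None`, and `sorted()`, which returns a new list.

```python
scores = [88, 72, 95, 72]  # hypothetical scores

scores.append(60)          # [88, 72, 95, 72, 60]
scores.insert(0, 100)      # [100, 88, 72, 95, 72, 60]
scores.remove(72)          # removes only the FIRST 72 -> [100, 88, 95, 72, 60]
last = scores.pop()        # last == 60; scores is now [100, 88, 95, 72]

ranked = sorted(scores)    # new list [72, 88, 95, 100]; scores is unchanged here
result = scores.sort()     # sorts scores in place...
print(result)              # ...and returns None
print(scores)              # [72, 88, 95, 100]

print(scores.count(72))    # 1
print(scores.index(95))    # 2

backup = scores.copy()     # a separate list object
backup.append(0)
print(len(scores), len(backup))  # 4 5
```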

Dictionary Methods

my_dict["key"]            # Access (KeyError if missing!)
my_dict.get("key", default)  # Safe access (returns default if missing)
my_dict["key"] = value    # Add or update
del my_dict["key"]        # Delete a key-value pair
my_dict.keys()            # All keys
my_dict.values()          # All values
my_dict.items()           # All (key, value) pairs
my_dict.update(other_dict)  # Merge another dict into this one
"key" in my_dict          # Check if key exists
len(my_dict)              # Number of key-value pairs
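The dictionary methods above, applied to one hypothetical player record. The sketch shows that `.get()` avoids the `KeyError` that square-bracket access raises on a misspelled key, and that updating an existing key keeps its original position in the insertion order.

```python
player = {"name": "Sam", "team": "Red", "goals": 12}  # hypothetical record

print(player["name"])               # Sam
print(player.get("assists"))        # None, no KeyError
print(player.get("assists", 0))     # 0

player["assists"] = 5                       # add a new key
player["goals"] = 13                        # update an existing key
player.update({"team": "Blue", "age": 24})  # overwrite team, add age
del player["assists"]                       # remove a key-value pair

print("assists" in player)          # False
print(len(player))                  # 4
print(list(player.keys()))          # ['name', 'team', 'goals', 'age']

for key, value in player.items():
    print(key, "->", value)

try:
    player["gaols"]                 # typo -> KeyError
except KeyError as err:
    print("Missing key:", err)      # Missing key: 'gaols'
```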

Set Operations

set_a | set_b             # Union — items in either
set_a & set_b             # Intersection — items in both
set_a - set_b             # Difference — items in A but not B
set_a ^ set_b             # Symmetric difference — items in exactly one
item in my_set            # Fast membership test
my_set.add(item)          # Add an item
my_set.discard(item)      # Remove (no error if missing)
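Here are the set operations on two small, made-up groups of users. Because sets are unordered, the results are wrapped in `sorted()` only to make the printed output predictable.

```python
python_users = {"ana", "ben", "cho", "dev"}  # hypothetical usernames
r_users = {"cho", "dev", "eli"}

print(sorted(python_users | r_users))  # ['ana', 'ben', 'cho', 'dev', 'eli']
print(sorted(python_users & r_users))  # ['cho', 'dev']
print(sorted(python_users - r_users))  # ['ana', 'ben']
print(sorted(python_users ^ r_users))  # ['ana', 'ben', 'eli']

print("eli" in r_users)  # True
r_users.add("fay")
r_users.discard("zed")   # "zed" is not present: no error
print(len(r_users))      # 4

# Converting a list to a set is a quick way to drop duplicates
tags = ["csv", "json", "csv", "xml"]
print(len(set(tags)))    # 3
```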

File I/O Patterns

Reading a CSV file

import csv

records = []
with open("data.csv", "r", newline="") as f:   # newline="" as the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        record = {
            "name": row["name"],
            "value": float(row["value"])   # Convert strings!
        }
        records.append(record)
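To try the reading pattern without a file on disk, `io.StringIO` can stand in for the opened file, because `csv.DictReader` accepts any file-like object. The two-row CSV here is made up. Note that every value arrives as a string until you convert it.

```python
import csv
import io

# A made-up CSV held in memory; io.StringIO behaves like an opened text file.
raw = io.StringIO("name,value\nalpha,1.5\nbeta,2\n")

records = []
for row in csv.DictReader(raw):
    # row["value"] is a str such as "1.5", so convert it explicitly
    records.append({"name": row["name"], "value": float(row["value"])})

print(records)
# [{'name': 'alpha', 'value': 1.5}, {'name': 'beta', 'value': 2.0}]
print(sum(r["value"] for r in records))  # 3.5
```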

Writing a CSV file

import csv

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "value"])
    writer.writeheader()
    for record in records:
        writer.writerow(record)
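A round trip through an in-memory buffer (again `io.StringIO` standing in for a real file) shows what `DictWriter` actually produces. `writerows()` is the `DictWriter` method that writes a whole list at once, with the same effect as the loop above. Reading the output back makes it clear that numbers are stored as text in a CSV.

```python
import csv
import io

records = [{"name": "alpha", "value": 1.5}, {"name": "beta", "value": 2.0}]

buffer = io.StringIO()  # in-memory stand-in for open("output.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "value"])
writer.writeheader()
writer.writerows(records)  # same effect as calling writerow() in a loop

print(buffer.getvalue())

# Reading it back: the numbers come back as strings again
buffer.seek(0)
back = list(csv.DictReader(buffer))
print(back[0]["value"], type(back[0]["value"]))  # 1.5 <class 'str'>
```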

Reading a JSON file

import json

with open("data.json", "r") as f:
    data = json.load(f)     # Returns dict or list

Writing a JSON file

import json

with open("output.json", "w") as f:
    json.dump(data, f, indent=2)   # indent for readability
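`json.dumps()` and `json.loads()` are the string-based versions of `json.dump()` and `json.load()`, which makes them handy for checking what survives a save-and-load cycle. The hypothetical record below shows two common surprises: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

data = {"team": "Red", "scores": (3, 1), "ranks": {1: "Sam"}}  # hypothetical

text = json.dumps(data, indent=2)  # string version of json.dump
restored = json.loads(text)        # string version of json.load

print(restored["scores"])  # [3, 1]: the tuple became a list
print(restored["ranks"])   # {'1': 'Sam'}: the int key became a string
print(restored == data)    # False
```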

Comprehension Syntax

List Comprehension

# Basic
[expression for item in iterable]

# With filter
[expression for item in iterable if condition]

# Examples
names = [record["name"] for record in data]
high = [r for r in data if r["score"] >= 80]
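A list comprehension with a filter is shorthand for a loop that appends conditionally. This sketch, using a hypothetical `data` list, builds the same result both ways to show they match.

```python
data = [  # hypothetical records
    {"name": "Ana", "score": 91},
    {"name": "Ben", "score": 74},
    {"name": "Cho", "score": 85},
]

# Comprehension form
high = [r for r in data if r["score"] >= 80]

# Equivalent explicit loop
high_loop = []
for r in data:
    if r["score"] >= 80:
        high_loop.append(r)

print(high == high_loop)                   # True
print([r["name"] for r in high])           # ['Ana', 'Cho']
print([r["name"].upper() for r in data])   # ['ANA', 'BEN', 'CHO']
```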

Dictionary Comprehension

# Basic
{key_expr: value_expr for item in iterable}

# With filter
{key_expr: value_expr for item in iterable if condition}

# Examples
lookup = {r["name"]: r["score"] for r in data}
passing = {r["name"]: r["score"] for r in data if r["score"] >= 60}
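The two dictionary comprehensions above, run on hypothetical records. The last lines point out one consequence of keys being unique: if two records share a key, the later one silently overwrites the earlier one.

```python
data = [  # hypothetical records
    {"name": "Ana", "score": 91},
    {"name": "Ben", "score": 55},
    {"name": "Cho", "score": 85},
]

lookup = {r["name"]: r["score"] for r in data}
passing = {r["name"]: r["score"] for r in data if r["score"] >= 60}

print(lookup["Ben"])  # 55
print(passing)        # {'Ana': 91, 'Cho': 85}

# Duplicate keys: the last record wins, with no warning
dupes = [{"name": "Ana", "score": 1}, {"name": "Ana", "score": 2}]
print({r["name"]: r["score"] for r in dupes})  # {'Ana': 2}
```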

Common Patterns

Counting with dictionaries

counts = {}
for item in my_list:
    counts[item] = counts.get(item, 0) + 1
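The counting pattern on a small, made-up list. The standard library's `collections.Counter` does the same job, and the sketch compares the two. `Counter` is shown as an alternative, not as part of the pattern above.

```python
from collections import Counter

my_list = ["red", "blue", "red", "green", "red", "blue"]  # hypothetical data

counts = {}
for item in my_list:
    # .get(item, 0) supplies 0 the first time an item is seen
    counts[item] = counts.get(item, 0) + 1

print(counts)                            # {'red': 3, 'blue': 2, 'green': 1}
print(dict(Counter(my_list)) == counts)  # True
print(Counter(my_list).most_common(1))   # [('red', 3)]
```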

Grouping with dictionaries

groups = {}
for record in records:
    key = record["category"]
    if key not in groups:
        groups[key] = []
    groups[key].append(record)
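The grouping pattern on a few hypothetical records, alongside `collections.defaultdict(list)`, a standard-library alternative that creates the empty list automatically the first time a key is used. Both produce the same groups.

```python
from collections import defaultdict

records = [  # hypothetical records
    {"name": "apple", "category": "fruit"},
    {"name": "carrot", "category": "vegetable"},
    {"name": "pear", "category": "fruit"},
]

# The if-not-in pattern
groups = {}
for record in records:
    key = record["category"]
    if key not in groups:
        groups[key] = []
    groups[key].append(record)

# Same result with defaultdict: a missing key starts as an empty list
groups_dd = defaultdict(list)
for record in records:
    groups_dd[record["category"]].append(record)

print({k: [r["name"] for r in v] for k, v in groups.items()})
# {'fruit': ['apple', 'pear'], 'vegetable': ['carrot']}
print(dict(groups_dd) == groups)  # True
```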

The read-process-write pipeline

import csv

# 1. Read (parts written as ... are placeholders to fill in)
records = []
with open("input.csv", "r", newline="") as f:
    for row in csv.DictReader(f):
        records.append({...convert types...})

# 2. Process (compute_something stands for your own function)
for record in records:
    record["new_field"] = compute_something(record)

# 3. Write (fieldnames must include every key, including "new_field")
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[...])
    writer.writeheader()
    for record in records:
        writer.writerow(record)
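Here is one way to fill in the read-process-write template so it runs end to end. This is a sketch under stated assumptions: the `item`, `price`, and `quantity` columns, the derived `total` field, and the sample rows are all hypothetical, and the files go into a temporary folder so the example does not touch your working directory.

```python
import csv
import os
import tempfile

# Hypothetical input, written to a temporary folder so the sketch is self-contained
folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "input.csv")
output_path = os.path.join(folder, "output.csv")
with open(input_path, "w", newline="") as f:
    f.write("item,price,quantity\npen,1.50,4\nbook,12.00,1\n")

# 1. Read: convert the numeric columns from strings
records = []
with open(input_path, "r", newline="") as f:
    for row in csv.DictReader(f):
        records.append({
            "item": row["item"],
            "price": float(row["price"]),
            "quantity": int(row["quantity"]),
        })

# 2. Process: add a derived field (hypothetical "total")
for record in records:
    record["total"] = record["price"] * record["quantity"]

# 3. Write: every key in the records must appear in fieldnames
with open(output_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["item", "price", "quantity", "total"])
    writer.writeheader()
    for record in records:
        writer.writerow(record)

with open(output_path, newline="") as f:
    print(f.read())
```

If a record contains a key that is missing from `fieldnames`, `DictWriter` raises a `ValueError` by default, which is why the derived field has to be added to the list in step 3.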

Common Mistakes to Avoid

CSV values are strings. Always convert: float(row["rate"]), int(row["count"]).
my_list.sort() returns None. Use sorted(my_list) if you need the result as a value.
b = a does not copy a list; it makes b a second name for the same list. Use b = a.copy() for an independent copy, or copy.deepcopy(a) if the list contains nested lists or dictionaries.
bool("False") is True. Any non-empty string is truthy. Compare explicitly: value == "True".
KeyError from typos. Use .get() or check with "key" in dict before accessing.
Modifying a list while looping over it. Removing items mid-loop can make the loop skip elements. Build a new list instead, for example with a list comprehension.
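Each mistake above can be reproduced in a few lines. The sketch below uses small, made-up values; the `buggy` loop is intentionally wrong, to show a skipped element.

```python
import copy

# CSV values are strings: + concatenates until you convert
raw_rate = "3.5"
print(raw_rate + "1", float(raw_rate) + 1)  # 3.51 4.5

# sort() returns None
nums = [3, 1, 2]
print(nums.sort())        # None
print(sorted([3, 1, 2]))  # [1, 2, 3]

# Assignment shares one list; .copy() makes a new one
a = [1, 2]
b = a
b.append(3)
print(a)                  # [1, 2, 3]: a changed too
c = a.copy()
c.append(4)
print(a)                  # [1, 2, 3]: unaffected by c

# .copy() is shallow: inner lists are still shared
nested = [[1], [2]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
shallow[0].append(99)
print(nested[0], deep[0])  # [1, 99] [1]

# Any non-empty string is truthy
print(bool("False"), "False" == "True")  # True False

# Removing while looping skips an element (intentional bug)
buggy = [1, 2, 2, 3]
for v in buggy:
    if v == 2:
        buggy.remove(v)
print(buggy)              # [1, 2, 3]: one 2 survived

# Safer: build a new list
kept = [v for v in [1, 2, 2, 3] if v != 2]
print(kept)               # [1, 3]
```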

What You Should Be Able to Do Now

[ ] Create and manipulate lists, dictionaries, sets, and tuples
[ ] Choose the right data structure for a given scenario and explain your reasoning
[ ] Represent a real-world record (patient, player, country) as a dictionary
[ ] Represent a dataset as a list of dictionaries
[ ] Navigate nested data structures (dictionaries within dictionaries, lists within dictionaries)
[ ] Write list and dictionary comprehensions with optional filtering
[ ] Read a CSV file with csv.DictReader and convert types appropriately
[ ] Read a JSON file with json.load
[ ] Write data to CSV and JSON files
[ ] Debug KeyError, IndexError, and FileNotFoundError
[ ] Explain the difference between mutable and immutable objects

If every item is checked, you are ready for Chapter 6 — your first real data analysis.