> "Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom."
Learning Objectives
- Read, process, and write CSV files with the csv module and DictReader/DictWriter
- Parse and generate JSON data for data interchange
- Make HTTP requests to REST APIs using the requests library
- Process API responses and handle common API errors (rate limits, auth, pagination)
- Combine data from multiple sources into a unified dataset
In This Chapter
- Chapter Overview
- 21.1 Data Is Everywhere
- 21.2 CSV Deep Dive
- 21.3 JSON Deep Dive
- 21.4 What Is an API?
- 21.5 Making HTTP Requests with requests
- 21.6 Working with API Responses
- 21.7 API Authentication and Rate Limits
- 21.8 Combining Data from Multiple Sources
- 21.9 Common Data Processing Patterns
- 21.10 Project Checkpoint: TaskFlow v2.0
- Chapter Summary
- Key Vocabulary
Chapter 21: Working with Data: CSV, JSON, and APIs
"Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom." — Clifford Stoll
Chapter Overview
Open your phone. Check the weather. The app didn't measure the temperature — it asked an API. Check your bank balance. The banking app didn't compute your balance locally — it asked an API. Order food. The delivery app talks to a restaurant API, a payment API, a mapping API, and a notification API, all before the driver even starts moving.
Every modern application is a data consumer. Programs don't live in isolation anymore. They read spreadsheets exported by other systems, parse configuration files, query remote servers for live data, and produce output that other programs will consume. The ability to work with real-world data formats — CSV for tabular data, JSON for structured interchange, and APIs for live remote data — is what separates a script that runs on your laptop from software that participates in the wider world.
In Chapter 10, you learned the fundamentals: opening files, reading and writing text, and got your first taste of CSV and JSON. This chapter goes deeper. You'll handle CSV edge cases that trip up beginners, navigate nested JSON structures that mirror real API responses, and — most importantly — learn to talk to web APIs using Python's requests library. By the end, you'll be pulling live data from the internet and combining it with local files into unified datasets.
In this chapter, you will learn to:
- Read, process, and write CSV files with the csv module, including edge cases like commas in data and quoting
- Parse and generate JSON data for data interchange, including deeply nested structures
- Make HTTP requests to REST APIs using the requests library
- Process API responses and handle common errors (404, 401, 429)
- Combine data from multiple sources into a unified dataset (the data pipeline pattern)
🏃 Fast Track: If you're solid on CSV and JSON from Chapter 10, skim 21.2-21.3 (focus on the edge cases), then slow down at 21.4-21.7 where APIs are introduced.
🔬 Deep Dive: Section 21.8 (combining data sources) and 21.9 (processing patterns) are your springboards — they connect directly to web scraping in Chapter 24 and the capstone data dashboard project.
21.1 Data Is Everywhere
Here's a question that would have seemed absurd thirty years ago: how many different data sources does your phone talk to before you've finished your morning coffee?
Your alarm clock app checks your calendar API. Your weather widget queries a meteorological API. Your email client pulls from an IMAP server. Your news feed aggregates from dozens of content APIs. Your fitness tracker syncs sleep data to a cloud API. Your smart coffee maker — yes, really — might talk to an IoT API.
Before you're even fully awake, your phone has made hundreds of data requests across dozens of formats and protocols.
This is the world your programs live in. And the three technologies that make most of it work are surprisingly simple:
- CSV (Comma-Separated Values): The lingua franca of tabular data. Every spreadsheet program, database, and analytics tool can read and write CSV. When Elena needs to export donor data from her nonprofit's CRM and import it into her Python script, CSV is usually how that happens.
- JSON (JavaScript Object Notation): The dominant format for structured data on the web. When Dr. Patel queries the NCBI database for gene annotations, the response comes back as JSON. When your weather app asks for a forecast, it gets JSON. When a web page loads dynamic content, it's almost certainly JSON behind the scenes.
- APIs (Application Programming Interfaces): The mechanism by which programs talk to other programs over the internet. APIs are not a data format — they're a protocol, a contract that says: "Send me a request that looks like this, and I'll send you a response that looks like that."
🔗 Connection — Spaced Review (Ch 10): In Chapter 10, you learned `open()`, `read()`, `write()`, and got your first exposure to `csv.reader` and `json.load`. Everything in this chapter builds on that foundation. If those concepts feel rusty, revisit Chapter 10, Sections 10.4-10.5, before continuing.
21.2 CSV Deep Dive
You've seen basic CSV reading and writing in Chapter 10. Now let's handle the real world, where data is messy.
21.2.1 Quick Review: csv.reader and csv.writer
A CSV file is just text with values separated by commas, one record per line:
name,department,hours,rate
Elena Vasquez,Programs,42,28.50
Marcus Chen,Development,38,32.00
Reading it with csv.reader:
import csv
with open("employees.csv", "r", newline="") as f:
reader = csv.reader(f)
header = next(reader) # ['name', 'department', 'hours', 'rate']
for row in reader:
name, dept, hours, rate = row
pay = float(hours) * float(rate)
print(f"{name}: ${pay:.2f}")
# Output:
# Elena Vasquez: $1197.00
# Marcus Chen: $1216.00
Writing with csv.writer:
with open("output.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerow(["name", "total_pay"])
writer.writerow(["Elena Vasquez", 1197.00])
writer.writerow(["Marcus Chen", 1216.00])
⚠️ Pitfall: Always pass `newline=""` when opening CSV files. Without it, you can get extra blank lines on Windows. This is a subtle, platform-specific bug that will cost you an hour of debugging if you forget.
21.2.2 DictReader and DictWriter: Column Names, Not Indices
Using row[0], row[1], row[2] is fragile. If someone reorders the columns in the CSV, your code breaks silently — it won't crash, it'll just produce wrong results. DictReader solves this by mapping each row to a dictionary:
import csv
with open("employees.csv", "r", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
# row is a dict: {'name': 'Elena Vasquez', 'department': 'Programs', ...}
pay = float(row["hours"]) * float(row["rate"])
print(f"{row['name']} ({row['department']}): ${pay:.2f}")
# Output:
# Elena Vasquez (Programs): $1197.00
# Marcus Chen (Development): $1216.00
Writing dictionaries back to CSV with DictWriter:
import csv
employees = [
{"name": "Elena Vasquez", "department": "Programs", "total_pay": 1197.00},
{"name": "Marcus Chen", "department": "Development", "total_pay": 1216.00},
]
fieldnames = ["name", "department", "total_pay"]
with open("summary.csv", "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(employees)
✅ Best Practice: Prefer `DictReader`/`DictWriter` over `reader`/`writer` in any code that will be maintained. The small overhead of dictionary lookups is worth the readability and resilience to column reordering.
21.2.3 Edge Cases: Commas, Quotes, and Newlines in Data
What happens when the data itself contains commas?
name,address,city
"O'Brien, James","123 Main St, Apt 4B",Portland
The CSV standard (RFC 4180) handles this with quoting: any field containing a comma, a newline, or a double quote is wrapped in double quotes. If a field contains a double quote, it's escaped by doubling it: "".
import csv
# Data with commas and quotes inside fields
tricky_data = [
["O'Brien, James", '123 Main St, Apt 4B', "Portland"],
["Chen, \"Bobby\"", "456 Oak Ave", "Seattle"],
]
with open("tricky.csv", "w", newline="") as f:
writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["name", "address", "city"])
writer.writerows(tricky_data)
# The file will contain:
# name,address,city
# "O'Brien, James","123 Main St, Apt 4B",Portland
# "Chen, ""Bobby""",456 Oak Ave,Seattle
The csv module handles all of this automatically. That's why you should always use the csv module instead of splitting lines on commas manually.
⚠️ Pitfall: Never parse CSV by doing `line.split(",")`. It breaks on the first field that contains a comma. The `csv` module exists precisely because CSV parsing is harder than it looks.
21.2.4 Other Delimiters: TSV and Custom Separators
Not all "CSV" files use commas. Tab-separated values (TSV) are common in scientific data:
import csv
# Reading a tab-separated file
with open("data.tsv", "r", newline="") as f:
reader = csv.reader(f, delimiter="\t")
for row in reader:
print(row)
# Writing with a pipe delimiter
with open("data.pipe", "w", newline="") as f:
writer = csv.writer(f, delimiter="|")
writer.writerow(["name", "score"])
writer.writerow(["Alice", 95])
🔄 Check Your Understanding 1: You receive a CSV file where some names contain commas (like "Smith, John"). Without using the `csv` module, explain why `line.split(",")` would produce incorrect results. Then explain how `csv.reader` handles this correctly.
21.3 JSON Deep Dive
JSON (JavaScript Object Notation) has become the universal language of the web. Despite its JavaScript origins, JSON maps almost perfectly to Python's data types, which is why working with JSON in Python feels natural.
21.3.1 Quick Review: The Four JSON Functions
Python's json module has four core functions — two for files, two for strings:
| Function | Direction | Source/Destination |
|---|---|---|
| `json.load(f)` | JSON file → Python | Reads from a file object |
| `json.dump(obj, f)` | Python → JSON file | Writes to a file object |
| `json.loads(s)` | JSON string → Python | Parses a string |
| `json.dumps(obj)` | Python → JSON string | Produces a string |
💡 Intuition: The `s` in `loads` and `dumps` stands for "string." `load` reads from a file, `loads` reads from a string. `dump` writes to a file, `dumps` writes to a string. Once you internalize this naming convention, you'll never mix them up.
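A quick round-trip exercises all four functions in a few lines. This sketch uses a temporary file so it's self-contained:

```python
import json
import tempfile

record = {"name": "Elena", "hours": 42, "active": True}

# String round-trip: dumps (Python → string), loads (string → Python)
text = json.dumps(record)
assert json.loads(text) == record

# File round-trip: dump (Python → file), load (file → Python)
with tempfile.TemporaryFile("w+") as f:
    json.dump(record, f)
    f.seek(0)               # rewind to the start before reading back
    assert json.load(f) == record
```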
21.3.2 Nested JSON: Real-World Structures
Chapter 10 showed you simple, flat JSON. Real API responses are almost always nested — dictionaries containing lists containing dictionaries:
import json
# A realistic API response (weather data)
weather_response = {
"location": {
"city": "Portland",
"state": "OR",
"coordinates": {"lat": 45.5152, "lon": -122.6784}
},
"current": {
"temp_f": 58.3,
"condition": "Partly cloudy",
"wind_mph": 12.5,
"humidity": 72
},
"forecast": [
{"day": "Monday", "high": 62, "low": 48, "condition": "Rain"},
{"day": "Tuesday", "high": 65, "low": 50, "condition": "Sunny"},
{"day": "Wednesday", "high": 59, "low": 45, "condition": "Cloudy"}
]
}
# Navigating nested JSON: chain the keys/indices
city = weather_response["location"]["city"]
lat = weather_response["location"]["coordinates"]["lat"]
temp = weather_response["current"]["temp_f"]
tuesday_high = weather_response["forecast"][1]["high"]
print(f"Weather in {city} (lat {lat}):")
print(f" Current: {temp}°F")
print(f" Tuesday high: {tuesday_high}°F")
# Output:
# Weather in Portland (lat 45.5152):
# Current: 58.3°F
# Tuesday high: 65°F
21.3.3 Pretty-Printing and Sorting Keys
When you're exploring a large JSON structure, pretty-printing with indent is essential:
import json
data = {"users": [{"name": "Alice", "scores": [92, 88]}, {"name": "Bob", "scores": [85, 79]}]}
# Compact (hard to read)
print(json.dumps(data))
# {"users": [{"name": "Alice", "scores": [92, 88]}, {"name": "Bob", "scores": [85, 79]}]}
# Pretty (easy to explore)
print(json.dumps(data, indent=2))
# {
# "users": [
# {
# "name": "Alice",
# "scores": [
# 92,
# 88
# ]
# },
# ...
# ]
# }
# Sorted keys (useful for diffing/comparing)
print(json.dumps(data, indent=2, sort_keys=True))
21.3.4 JSON Gotchas
A few things that will bite you if you're not watching:
import json
# 1. Tuples become lists (and don't come back)
data = {"coords": (42.3, -71.1)}
round_tripped = json.loads(json.dumps(data))
print(type(round_tripped["coords"])) # <class 'list'>, NOT tuple
# 2. Dict keys must be strings
bad = {42: "answer", (1, 2): "tuple key"}
# json.dumps(bad) # TypeError: keys must be strings
# 3. Sets are not JSON-serializable
bad2 = {"tags": {"python", "coding"}}
# json.dumps(bad2) # TypeError: Object of type set is not JSON serializable
# Fix: convert sets to lists before serialization
good = {"tags": sorted(list({"python", "coding"}))}
print(json.dumps(good)) # {"tags": ["coding", "python"]}
# 4. Python's None becomes JSON's null
print(json.dumps({"value": None})) # {"value": null}
21.3.5 CSV vs. JSON: When to Use Each
| Criterion | CSV | JSON |
|---|---|---|
| Best for | Flat, tabular data | Hierarchical, nested data |
| Readability | Spreadsheet-friendly | Developer-friendly |
| Data types | Everything is a string | Strings, numbers, booleans, null, arrays, objects |
| File size | Smaller (no key repetition) | Larger (keys repeated per record) |
| Tool support | Excel, Google Sheets, pandas | Web APIs, config files, NoSQL databases |
| Nesting | Not supported | Native |
| Schema | Implicit (header row) | Flexible (can vary per record) |
✅ Best Practice: Use CSV when your data is naturally a table (rows and columns, same fields for every record). Use JSON when your data has nesting, variable structure, or needs to preserve types.
🔄 Check Your Understanding 2: Elena has a list of 500 donors, each with a name, email, donation amount, and date. Dr. Patel has protein data where each protein has a name, a variable-length list of associated genes, and nested structural metadata. Which format (CSV or JSON) is better suited for each dataset, and why?
21.4 What Is an API?
Here's where this chapter takes a leap. CSV and JSON are data formats — containers for information. An API (Application Programming Interface) is a way for programs to talk to each other over a network.
21.4.1 The Restaurant Analogy
Think of an API like ordering at a restaurant:
- You (the client) look at the menu (the API documentation) to see what's available.
- You give your order (a request) to the waiter (the HTTP protocol).
- The waiter takes your order to the kitchen (the server), which prepares the food.
- The waiter brings back your meal (the response).
You never go into the kitchen. You don't know how the food is prepared. You don't need to. You just need to know: what can I order (the endpoints), what format does my order need to be in (the request format), and what will I get back (the response format).
That's exactly how APIs work. Your Python program sends a request to a server, and the server sends back data — usually as JSON.
21.4.2 The Request-Response Cycle
Every API interaction follows the same pattern:
┌─────────────┐ HTTP Request ┌─────────────┐
│ │ ──────────────────────────► │ │
│ CLIENT │ GET /api/weather?city=PDX │ SERVER │
│ (your │ │ (remote │
│ Python │ ◄────────────────────────── │ service) │
│ script) │ HTTP Response │ │
│ │ 200 OK │ │
│ │ {"temp": 58.3, ...} │ │
└─────────────┘ └─────────────┘
Key components:
- URL/Endpoint: The address of the resource you want, like `https://api.weather.com/v1/forecast`.
- HTTP Method: What you want to do. `GET` means "give me data." `POST` means "here's data, process it."
- Headers: Metadata about the request (authentication, content type, etc.).
- Query Parameters: Filters or options appended to the URL: `?city=Portland&units=imperial`.
- Response Status Code: A number that tells you whether the request succeeded (200), the resource wasn't found (404), you're not authorized (401), or the server had an error (500).
- Response Body: The actual data, usually JSON.
21.4.3 REST: The Architecture of the Web
Most APIs you'll encounter follow REST (Representational State Transfer) conventions. REST is not a protocol — it's a set of design principles:
- Resources have URLs (`/users`, `/users/42`, `/users/42/posts`)
- HTTP methods map to operations (GET = read, POST = create, PUT = update, DELETE = remove)
- Interactions are stateless (each request contains all the information the server needs)
- Data is typically exchanged as JSON
You don't need to memorize REST theory. What matters for a Python programmer is this: most APIs accept GET requests to a URL and return JSON. That's 90% of what you'll do.
🚪 Threshold Concept: APIs as Connective Tissue
Before this chapter: "Apps are self-contained programs that do everything themselves."
After this chapter: "Every app I use talks to dozens of APIs — weather, maps, payments, authentication, social media, analytics, machine learning... My Python script can talk to those same APIs."
This shift in understanding is profound. Once you see APIs, you see them everywhere. That weather widget on your phone? An API. Google Maps embedded in a website? An API. "Sign in with Google"? An API. Credit card processing? An API. Every "smart" feature in every app you use is probably powered by an API call to a specialized service.
Your Python scripts can make those same calls. That means you can build programs that access real-time weather data, translate text, analyze sentiment, geocode addresses, send emails, generate images, and do anything that any API on the internet offers. The entire internet becomes your function library.
21.5 Making HTTP Requests with requests
Python's standard library has urllib, but it's verbose and awkward. The third-party requests library is the de facto standard for HTTP in Python — used by virtually every professional Python developer.
21.5.1 Installing requests
pip install requests
Or if you're using a virtual environment (as you should — see Chapter 23):
python -m pip install requests
21.5.2 Your First GET Request
Let's start with a free, public API that requires no authentication — the JSONPlaceholder API, which provides fake data for testing:
import requests
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
print(f"Status code: {response.status_code}")
print(f"Content type: {response.headers['Content-Type']}")
print(f"Response body (first 200 chars): {response.text[:200]}")
# Output:
# Status code: 200
# Content type: application/json; charset=utf-8
# Response body (first 200 chars): {
# "userId": 1,
# "id": 1,
# "title": "sunt aut facere repellat provident occaecati excepturi...",
# "body": "quia et suscipit\nsuscipit..."
# }
21.5.3 The Response Object
The response object returned by requests.get() is rich with information:
import requests
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
# Status code: did the request succeed?
print(response.status_code) # 200
# Was it successful? (True for any 2xx status code)
print(response.ok) # True
# The raw text of the response
print(type(response.text)) # <class 'str'>
# Parse JSON directly (no need for json.loads!)
data = response.json()
print(type(data)) # <class 'dict'>
print(data["title"]) # sunt aut facere repellat...
# Response headers
print(response.headers["Content-Type"]) # application/json; charset=utf-8
💡 Intuition: `response.json()` is just a convenience for `json.loads(response.text)`. It parses the response body as JSON and returns a Python dict (or list). You'll use it constantly.
21.5.4 Common HTTP Status Codes
| Code | Meaning | What to Do |
|---|---|---|
| 200 | OK — success | Process the response data |
| 201 | Created — resource created | Confirm creation succeeded |
| 400 | Bad Request — your request was malformed | Check your parameters |
| 401 | Unauthorized — invalid or missing credentials | Check your API key |
| 403 | Forbidden — you don't have permission | Check your access level |
| 404 | Not Found — the resource doesn't exist | Check the URL |
| 429 | Too Many Requests — rate limited | Slow down, wait, retry |
| 500 | Internal Server Error — the server broke | Not your fault; retry later |
21.5.5 Query Parameters
Most APIs accept parameters to filter or customize the response:
import requests
# Method 1: Parameters in the URL (works but messy)
response = requests.get(
"https://jsonplaceholder.typicode.com/posts?userId=1&_limit=3"
)
# Method 2: params dict (cleaner, auto-encodes special characters)
params = {"userId": 1, "_limit": 3}
response = requests.get(
"https://jsonplaceholder.typicode.com/posts",
params=params
)
print(f"URL that was actually sent: {response.url}")
# https://jsonplaceholder.typicode.com/posts?userId=1&_limit=3
posts = response.json()
print(f"Got {len(posts)} posts")
for post in posts:
print(f" [{post['id']}] {post['title'][:50]}...")
# Output:
# URL that was actually sent: https://jsonplaceholder.typicode.com/posts?userId=1&_limit=3
# Got 3 posts
# [1] sunt aut facere repellat provident occaecati exc...
# [2] qui est esse...
# [3] ea molestias quasi exercitationem repellat qui i...
✅ Best Practice: Always use the `params` dict instead of manually building query strings. It handles URL encoding automatically — spaces, special characters, and Unicode all get encoded correctly.
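You can see what that encoding looks like without making a network request. Under the hood, requests does something similar to the standard library's `urllib.parse.urlencode`:

```python
from urllib.parse import urlencode

# Spaces, non-ASCII characters, and symbols are percent-encoded automatically
params = {"city": "São Paulo", "units": "metric", "q": "rain & wind"}
query = urlencode(params)
print(query)
# city=S%C3%A3o+Paulo&units=metric&q=rain+%26+wind
```

Hand-building this string correctly for every edge case is exactly the kind of tedious, error-prone work the library exists to do for you.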
21.6 Working with API Responses
Getting a response is only step one. The real skill is extracting useful information from it.
21.6.1 Exploring an API Response
When you encounter a new API, the first thing to do is explore what it returns:
import requests
import json
response = requests.get("https://jsonplaceholder.typicode.com/users/1")
user = response.json()
# Pretty-print to understand the structure
print(json.dumps(user, indent=2))
This reveals the structure:
{
"id": 1,
"name": "Leanne Graham",
"username": "Bret",
"email": "Sincere@april.biz",
"address": {
"street": "Kulas Light",
"suite": "Apt. 556",
"city": "Gwenborough",
"zipcode": "92998-3874",
"geo": {
"lat": "-37.3159",
"lng": "81.1496"
}
},
"phone": "1-770-736-8031 x56442",
"website": "hildegard.org",
"company": {
"name": "Romaguera-Crona",
"catchPhrase": "Multi-layered client-server neural-net",
"bs": "harness real-time e-markets"
}
}
Now you can extract exactly what you need:
print(f"Name: {user['name']}")
print(f"City: {user['address']['city']}")
print(f"Company: {user['company']['name']}")
# Output:
# Name: Leanne Graham
# City: Gwenborough
# Company: Romaguera-Crona
21.6.2 Processing Lists of Results
APIs often return lists. Here's how to process them:
import requests
response = requests.get("https://jsonplaceholder.typicode.com/users")
users = response.json()
print(f"Total users: {len(users)}\n")
# Build a summary report
for user in users:
city = user["address"]["city"]
company = user["company"]["name"]
print(f" {user['name']:<25} {city:<20} {company}")
# Output:
# Total users: 10
#
# Leanne Graham Gwenborough Romaguera-Crona
# Ervin Howell Wisokyburgh Deckow-Crist
# Clementine Bauch McKenziehaven Romaguera-Jacobson
# ...
21.6.3 Dr. Patel's Pattern: API to Analysis
Dr. Patel needs protein information from a public bioinformatics API. Here's the pattern she follows — the same pattern you'll use with any API:
import requests
import json
def fetch_protein_info(protein_id: str) -> dict | None:
"""Fetch protein data from the UniProt API.
Returns the protein data as a dict, or None if the request fails.
"""
url = f"https://rest.uniprot.org/uniprotkb/{protein_id}.json"
response = requests.get(url)
if response.ok:
return response.json()
else:
print(f"Error fetching {protein_id}: {response.status_code}")
return None
# Fetch data for human insulin
data = fetch_protein_info("P01308")
if data:
name = data.get("proteinDescription", {}).get("recommendedName", {}).get(
"fullName", {}).get("value", "Unknown")
organism = data.get("organism", {}).get("scientificName", "Unknown")
length = data.get("sequence", {}).get("length", 0)
print(f"Protein: {name}")
print(f"Organism: {organism}")
print(f"Sequence length: {length} amino acids")
# Output:
# Protein: Insulin
# Organism: Homo sapiens
# Sequence length: 110 amino acids
💡 Intuition: Notice the `.get()` chains with default empty dicts: `data.get("organism", {}).get("scientificName", "Unknown")`. This is a defensive pattern for nested JSON. If any key is missing, you get the default value instead of a `KeyError`. You learned `dict.get()` in Chapter 9 — this is where it really pays off.
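You can feel the difference with a stripped-down response that is missing a section entirely:

```python
# A response missing the "organism" section
data = {"sequence": {"length": 110}}

# Direct indexing raises KeyError on the first missing key
try:
    name = data["organism"]["scientificName"]
except KeyError:
    name = None
print(name)  # None

# Chained .get() with {} defaults degrades gracefully instead
organism = data.get("organism", {}).get("scientificName", "Unknown")
length = data.get("sequence", {}).get("length", 0)
print(organism, length)  # Unknown 110
```

The `{}` default in the middle of the chain is what makes this work: a missing intermediate key yields an empty dict, and the next `.get()` on it safely returns its default.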
21.7 API Authentication and Rate Limits
21.7.1 API Keys
Many APIs require authentication. The simplest form is an API key — a unique string that identifies your application:
import requests
API_KEY = "your_api_key_here" # Get from the API provider's website
# Method 1: API key as a query parameter
response = requests.get(
"https://api.example.com/data",
params={"api_key": API_KEY, "query": "Portland"}
)
# Method 2: API key in a header (more secure, more common)
headers = {"Authorization": f"Bearer {API_KEY}"}
response = requests.get(
"https://api.example.com/data",
headers=headers,
params={"query": "Portland"}
)
⚠️ Pitfall: Never hard-code API keys in your source code, especially if the code will be shared or committed to Git. Use environment variables instead:

```python
import os

API_KEY = os.environ.get("WEATHER_API_KEY")
if not API_KEY:
    raise SystemExit("Set the WEATHER_API_KEY environment variable first.")
```
21.7.2 Rate Limiting
APIs limit how many requests you can make in a given time period. This is called rate limiting. If you exceed the limit, you'll get a 429 Too Many Requests response.
import requests
import time
def fetch_with_retry(url: str, max_retries: int = 3, delay: float = 1.0) -> dict | None:
"""Fetch URL with automatic retry on rate limiting."""
for attempt in range(max_retries):
response = requests.get(url)
if response.ok:
return response.json()
elif response.status_code == 429:
wait = delay * (2 ** attempt) # Exponential backoff: 1, 2, 4 sec
print(f"Rate limited. Waiting {wait:.0f} seconds...")
time.sleep(wait)
else:
print(f"Error: {response.status_code}")
return None
print("Max retries exceeded.")
return None
💡 Intuition: Exponential backoff means you wait longer each time you're rate-limited: 1 second, then 2, then 4. This gives the API server time to recover and makes your requests more likely to succeed on retry.
21.7.3 Handling Errors Gracefully
🔗 Connection — Spaced Review (Ch 11): Remember EAFP — Easier to Ask Forgiveness than Permission? That philosophy applies perfectly to API calls. Don't try to predict every error; catch and handle them:
import requests
def safe_api_call(url: str, params: dict | None = None) -> dict | None:
"""Make an API call with comprehensive error handling."""
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status() # Raises exception for 4xx/5xx
return response.json()
except requests.exceptions.Timeout:
print("Request timed out. The server might be slow.")
except requests.exceptions.ConnectionError:
print("Connection failed. Check your internet connection.")
except requests.exceptions.HTTPError as e:
print(f"HTTP error: {e.response.status_code}")
except requests.exceptions.JSONDecodeError:
print("Response was not valid JSON.")
return None
🐛 Debugging Walkthrough: Common API Errors
Symptom: `401 Unauthorized`
Cause: Missing or invalid API key. Double-check you're passing the key correctly — some APIs want it in a header, others as a query parameter.
Fix: Verify the key value, check if it's expired, and confirm the header/parameter name matches the API docs.

Symptom: `404 Not Found`
Cause: The URL is wrong, or the specific resource doesn't exist. Maybe you have `/users/999` but there's no user 999.
Fix: Check the URL against the API documentation. Try the URL in a browser.

Symptom: `429 Too Many Requests`
Cause: You're sending requests too fast. Common when looping over many items.
Fix: Add `time.sleep()` between requests. Implement exponential backoff. Check the API's rate limit documentation — many APIs include a `Retry-After` header telling you exactly how long to wait.
21.8 Combining Data from Multiple Sources
This is where everything comes together. Real-world data work almost always involves merging data from multiple sources.
21.8.1 Elena's Multi-Source Report
Elena's nonprofit needs a weekly report that combines:
1. A CSV file of donation records from the CRM
2. A JSON file of program metrics from the internal dashboard
3. API data from a geocoding service to map donor locations
Here's the pattern — a data pipeline:
import csv
import json
import requests
# ── Step 1: Load CSV data ──────────────────────────────
def load_donations(csv_path: str) -> list[dict]:
"""Load donation records from a CSV file."""
donations = []
with open(csv_path, "r", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
row["amount"] = float(row["amount"])
donations.append(row)
return donations
# ── Step 2: Load JSON data ─────────────────────────────
def load_program_metrics(json_path: str) -> dict:
"""Load program metrics from a JSON file."""
with open(json_path, "r") as f:
return json.load(f)
# ── Step 3: Enrich with API data ───────────────────────
def enrich_with_location(donations: list[dict]) -> list[dict]:
"""Add city/state info based on zip code (mock implementation)."""
# In production, this would call a geocoding API.
# Here we simulate the pattern with a lookup dict.
zip_lookup = {
"97201": {"city": "Portland", "state": "OR"},
"97202": {"city": "Portland", "state": "OR"},
"98101": {"city": "Seattle", "state": "WA"},
"94102": {"city": "San Francisco", "state": "CA"},
}
for donation in donations:
location = zip_lookup.get(donation.get("zip", ""), {})
donation["city"] = location.get("city", "Unknown")
donation["state"] = location.get("state", "Unknown")
return donations
# ── Step 4: Aggregate and report ───────────────────────
def generate_report(donations: list[dict], metrics: dict) -> str:
"""Generate a summary report from combined data."""
total = sum(d["amount"] for d in donations)
by_city: dict[str, float] = {}
for d in donations:
by_city[d["city"]] = by_city.get(d["city"], 0) + d["amount"]
lines = ["=== Weekly Report ===", ""]
lines.append(f"Total donations: ${total:,.2f}")
lines.append(f"Number of donors: {len(donations)}")
lines.append(f"Programs active: {metrics.get('active_programs', 'N/A')}")
lines.append("")
lines.append("Donations by city:")
for city, amount in sorted(by_city.items(), key=lambda x: -x[1]):
lines.append(f" {city:<20} ${amount:>10,.2f}")
return "\n".join(lines)
This is the data pipeline pattern: Load → Clean → Enrich → Aggregate → Output. You'll see it everywhere in data engineering, and it's built from the basic tools you already know.
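Here is the same pattern condensed into a self-contained sketch with inline data, so you can run the whole pipeline in one go (the donation records and zip table are made up for illustration):

```python
import csv
import io

# LOAD: parse CSV text (inline here; normally read from a file)
raw = "name,amount,zip\nAva,50,97201\nBen,120,98101\nCara,75,97201\n"
donations = list(csv.DictReader(io.StringIO(raw)))

# CLEAN: fix types (CSV gives us strings)
for d in donations:
    d["amount"] = float(d["amount"])

# ENRICH: add city from a lookup (stand-in for a geocoding API call)
zip_lookup = {"97201": "Portland", "98101": "Seattle"}
for d in donations:
    d["city"] = zip_lookup.get(d["zip"], "Unknown")

# AGGREGATE: total per city
by_city: dict[str, float] = {}
for d in donations:
    by_city[d["city"]] = by_city.get(d["city"], 0) + d["amount"]

# OUTPUT: a tiny report
for city, total in sorted(by_city.items()):
    print(f"{city}: ${total:.2f}")
# Portland: $125.00
# Seattle: $120.00
```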
21.8.2 The Data Pipeline Pattern
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ LOAD │ → │ CLEAN │ → │ ENRICH │ → │ AGGREGATE│ → │ OUTPUT │
│ │ │ │ │ │ │ │ │ │
│ CSV file │ │ Fix types│ │ Add API │ │ Sum, │ │ Report, │
│ JSON file│ │ Remove │ │ data │ │ count, │ │ CSV, │
│ API call │ │ bad rows │ │ Merge │ │ group │ │ JSON │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
🔗 Connection — Spaced Review (Ch 17): When processing large datasets, Big-O thinking matters. If you're looking up each donation's zip code against a list, that's O(n*m). Using a dictionary lookup (as we did above) makes it O(n). The data structure choices from Chapter 9 and the algorithmic thinking from Chapter 17 become critical when your dataset has 100,000 rows instead of 10.
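The difference is easy to see in code: scanning a list of zip records for every donation versus building a dict once and doing O(1) lookups. The data here is hypothetical; the shape of the code is what matters:

```python
zip_rows = [("97201", "Portland"), ("98101", "Seattle"), ("94102", "San Francisco")]
donations = [{"zip": "98101"}, {"zip": "97201"}, {"zip": "97201"}]

# O(n*m): scan the whole list for every donation
for d in donations:
    d["city_slow"] = next((city for z, city in zip_rows if z == d["zip"]), "Unknown")

# O(n): build the dict once, then each lookup is O(1)
zip_lookup = dict(zip_rows)
for d in donations:
    d["city"] = zip_lookup.get(d["zip"], "Unknown")

print([d["city"] for d in donations])  # ['Seattle', 'Portland', 'Portland']
```

With 10 donations the two are indistinguishable; with 100,000 donations against a large zip table, the list-scan version can be thousands of times slower.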
21.9 Common Data Processing Patterns
Once you can load data from CSV, JSON, and APIs, you'll find yourself using the same processing patterns over and over.
21.9.1 Filtering
Select only the records that meet a condition:
# Filter donations over $100
large_donations = [d for d in donations if d["amount"] > 100]
# Filter API results to a specific category
posts = requests.get("https://jsonplaceholder.typicode.com/posts").json()
user_1_posts = [p for p in posts if p["userId"] == 1]
21.9.2 Transforming
Change the shape or content of each record:
# Transform raw API data into a cleaner format
raw_users = requests.get("https://jsonplaceholder.typicode.com/users").json()
clean_users = [
{
"name": u["name"],
"email": u["email"],
"city": u["address"]["city"],
"company": u["company"]["name"],
}
for u in raw_users
]
21.9.3 Aggregating
Compute summaries across records:
# Count by category
from collections import Counter
categories = [d["category"] for d in donations]
category_counts = Counter(categories)
print(category_counts.most_common(3))
# [('individual', 45), ('corporate', 12), ('foundation', 8)]
# Sum by group
totals_by_state: dict[str, float] = {}
for d in donations:
totals_by_state[d["state"]] = totals_by_state.get(d["state"], 0) + d["amount"]
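The `.get(..., 0)` idiom works fine; `collections.defaultdict` expresses the same grouping without spelling out the default each time. A sketch with hypothetical donation records:

```python
from collections import defaultdict

donations = [  # hypothetical sample records
    {"state": "TX", "amount": 50.0},
    {"state": "CA", "amount": 20.0},
    {"state": "TX", "amount": 30.0},
]

# defaultdict(float) supplies 0.0 automatically for unseen keys
totals_by_state: dict[str, float] = defaultdict(float)
for d in donations:
    totals_by_state[d["state"]] += d["amount"]

print(dict(totals_by_state))  # {'TX': 80.0, 'CA': 20.0}
```

Either form is idiomatic; `defaultdict` pays off when the grouping line is repeated or the default value is something bulkier than zero, like a list.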
21.9.4 Exporting
Write results back to CSV or JSON:
import csv
import json
# Export to CSV
with open("report.csv", "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=["name", "email", "city", "company"])
writer.writeheader()
writer.writerows(clean_users)
# Export to JSON
with open("report.json", "w") as f:
json.dump(clean_users, f, indent=2)
print(f"Exported {len(clean_users)} records to CSV and JSON")
🧩 Productive Struggle: Before reading further, try this: you need weather data for 50 cities across the US. You have the city names in a CSV file. How would you get the weather data? Think about: (1) where would the weather data come from, (2) how would you match it to your cities, and (3) what would the output look like? Sketch out the steps on paper before looking at the project checkpoint.
🔄 Check Your Understanding 3: Describe the five stages of a data pipeline. For Elena's weekly report, identify which specific data source (CSV, JSON, or API) is used at each stage and what processing happens.
21.10 Project Checkpoint: TaskFlow v2.0
TaskFlow v2.0 adds data interchange capabilities — your task manager can now export and import data in both CSV and JSON formats, and it can fetch inspirational quotes from a free public API to display alongside your tasks.
What's New in v2.0
Building on v1.9 (undo/redo stack and task queue from Chapter 20):
- CSV Export/Import: Export your tasks to a CSV file that you can open in Excel or Google Sheets. Import tasks from a CSV file that someone else created.
- JSON Export/Import: Export tasks as JSON for backup or sharing with other programs.
- API Integration: Fetch a random inspirational quote from a free API to display when you start the app or when you list tasks.
Key Design Decisions
- Export functions write files to the same directory as the TaskFlow data file.
- Import functions add to existing tasks (they don't replace them).
- The quote API call is wrapped in a try/except so the app works even without internet.
- CSV export uses DictWriter so the file has clear column headers.
The Code
"""
TaskFlow v2.0 — Data Interchange & API Integration
Introduction to Computer Science with Python — Chapter 21
New in v2.0:
- Export tasks to CSV
- Import tasks from CSV
- Export tasks to JSON (pretty-printed)
- Import tasks from JSON
- Fetch inspirational quote from free API on startup
Building on v1.9 (Chapter 20):
- Undo/redo stack
- Task queue
"""
import csv
import json
import requests
import time
from datetime import datetime
from pathlib import Path
DATA_FILE = Path(__file__).parent / "taskflow_data.json"
EXPORT_DIR = Path(__file__).parent
# ── Persistence (from v0.9) ──────────────────────────────
def load_tasks(path: Path) -> list[dict]:
if not path.exists():
return []
try:
with open(path, "r") as f:
return json.load(f)
except json.JSONDecodeError:
print(" Warning: data file corrupted. Starting fresh.")
return []
def save_tasks(tasks: list[dict], path: Path) -> None:
with open(path, "w") as f:
json.dump(tasks, f, indent=2)
# ── NEW: CSV Export/Import ───────────────────────────────
CSV_FIELDS = ["title", "priority", "category", "done", "created"]
def export_csv(tasks: list[dict]) -> None:
if not tasks:
print(" No tasks to export.")
return
path = EXPORT_DIR / "taskflow_export.csv"
with open(path, "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=CSV_FIELDS,
extrasaction="ignore")
writer.writeheader()
writer.writerows(tasks)
print(f" Exported {len(tasks)} tasks to {path.name}")
def import_csv(tasks: list[dict]) -> None:
path = EXPORT_DIR / "taskflow_export.csv"
if not path.exists():
print(f" No file found: {path.name}")
return
count = 0
with open(path, "r", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
row["done"] = row.get("done", "False").lower() == "true"
tasks.append(row)
count += 1
save_tasks(tasks, DATA_FILE)
print(f" Imported {count} tasks from {path.name}")
# ── NEW: JSON Export/Import ──────────────────────────────
def export_json(tasks: list[dict]) -> None:
if not tasks:
print(" No tasks to export.")
return
path = EXPORT_DIR / "taskflow_export.json"
with open(path, "w") as f:
json.dump(tasks, f, indent=2)
print(f" Exported {len(tasks)} tasks to {path.name}")
def import_json(tasks: list[dict]) -> None:
path = EXPORT_DIR / "taskflow_export.json"
if not path.exists():
print(f" No file found: {path.name}")
return
try:
with open(path, "r") as f:
imported = json.load(f)
tasks.extend(imported)
save_tasks(tasks, DATA_FILE)
print(f" Imported {len(imported)} tasks from {path.name}")
except json.JSONDecodeError:
print(f" Error: {path.name} is not valid JSON.")
# ── NEW: API Integration ─────────────────────────────────
def fetch_quote() -> str | None:
"""Fetch an inspirational quote from a free API."""
try:
response = requests.get(
"https://dummyjson.com/quotes/random",
timeout=5
)
if response.ok:
data = response.json()
return f'"{data["quote"]}" — {data["author"]}'
except (requests.exceptions.RequestException, KeyError):
pass
return None
# ── Task operations (carried forward) ────────────────────
PRIORITIES = {"high": 1, "medium": 2, "low": 3}
def add_task(tasks: list[dict]) -> None:
title = input(" Task title: ").strip()
if not title:
print(" Title cannot be empty.")
return
priority = input(" Priority (high/medium/low) [medium]: ").strip().lower()
if priority not in PRIORITIES:
priority = "medium"
category = input(" Category [general]: ").strip() or "general"
task = {
"title": title,
"priority": priority,
"category": category,
"done": False,
"created": datetime.now().strftime("%Y-%m-%d %H:%M"),
}
tasks.append(task)
save_tasks(tasks, DATA_FILE)
print(f" Added: '{title}' [{priority}, {category}]")
def list_tasks(tasks: list[dict]) -> None:
if not tasks:
print(" No tasks yet.")
return
sorted_tasks = sorted(tasks, key=lambda t: (
PRIORITIES.get(t.get("priority", "medium"), 99),
t.get("created", ""),
))
print(f"\n {'#':<4} {'Done':<6} {'Pri':<8} {'Category':<12} {'Title'}")
print(f" {'—' * 4} {'—' * 5} {'—' * 7} {'—' * 11} {'—' * 20}")
for i, task in enumerate(sorted_tasks, start=1):
check = "[x]" if task.get("done") else "[ ]"
print(f" {i:<4} {check:<6} {task.get('priority', '?'):<8} "
f"{task.get('category', '?'):<12} {task.get('title', '?')}")
done = sum(1 for t in tasks if t.get("done"))
print(f"\n {done}/{len(tasks)} complete")
# ── Main loop ────────────────────────────────────────────
MENU = """
╔══════════════════════════════════╗
║ TaskFlow v2.0 ║
║ CSV · JSON · API Integration ║
╚══════════════════════════════════╝
1) Add task 5) Export CSV
2) List tasks 6) Import CSV
3) Complete task 7) Export JSON
4) Delete task 8) Import JSON
9) Fetch new quote
0) Save & quit
"""
def main() -> None:
print("\n Welcome to TaskFlow v2.0!")
tasks = load_tasks(DATA_FILE)
print(f" {len(tasks)} task(s) loaded.\n")
quote = fetch_quote()
if quote:
print(f" 💡 {quote}\n")
else:
print(" (Could not fetch quote — check internet connection)\n")
while True:
print(MENU)
choice = input(" Choose (0-9): ").strip()
match choice:
case "1":
add_task(tasks)
case "2":
list_tasks(tasks)
case "3":
list_tasks(tasks)
if tasks:
try:
num = int(input("\n Task # to complete: "))
sorted_tasks = sorted(tasks, key=lambda t: (
PRIORITIES.get(t.get("priority", "medium"), 99),
t.get("created", ""),
))
if 1 <= num <= len(sorted_tasks):
sorted_tasks[num - 1]["done"] = True
save_tasks(tasks, DATA_FILE)
print(f" Completed: '{sorted_tasks[num-1]['title']}'")
else:
print(" Invalid task number.")
except ValueError:
print(" Please enter a number.")
case "4":
list_tasks(tasks)
if tasks:
try:
num = int(input("\n Task # to delete: "))
sorted_tasks = sorted(tasks, key=lambda t: (
PRIORITIES.get(t.get("priority", "medium"), 99),
t.get("created", ""),
))
if 1 <= num <= len(sorted_tasks):
target = sorted_tasks[num - 1]
tasks.remove(target)
save_tasks(tasks, DATA_FILE)
print(f" Deleted: '{target['title']}'")
else:
print(" Invalid task number.")
except ValueError:
print(" Please enter a number.")
case "5":
export_csv(tasks)
case "6":
import_csv(tasks)
case "7":
export_json(tasks)
case "8":
import_json(tasks)
case "9":
quote = fetch_quote()
if quote:
print(f"\n 💡 {quote}\n")
else:
print("\n Could not fetch quote.\n")
case "0":
save_tasks(tasks, DATA_FILE)
                print("\n Tasks saved. Goodbye!")
break
case _:
print(" Invalid choice. Enter 0-9.")
if __name__ == "__main__":
main()
What to Notice
- CSV export uses extrasaction="ignore" — if tasks have extra fields (like from the undo stack), the CSV writer silently ignores them instead of crashing.
- CSV import converts the done field from a string ("True"/"False") back to a boolean. CSV files store everything as strings — you always need to convert types when reading.
- The quote API call is wrapped in a try/except — the app works perfectly without internet. API integration should enhance the experience, not gatekeep it.
- JSON export/import is simpler than CSV — JSON preserves types natively, so you don't need type conversion after import.
🔗 Bridge to Chapter 22: In the next chapter, you'll learn regular expressions — powerful pattern matching that lets TaskFlow v2.1 support advanced search queries like "find all tasks containing a date" or "find tasks matching 'bug*fix'." Regex will also let you validate data formats (like email addresses and phone numbers) before importing them.
Chapter Summary
This chapter transformed your Python programs from isolated scripts into connected, data-aware applications. You mastered three essential skills:
CSV mastery: You moved beyond basic reading and writing to handle edge cases — commas in data, quoting, custom delimiters, and the critical newline="" parameter. You learned to prefer DictReader/DictWriter for maintainable code.
JSON fluency: You worked with deeply nested JSON structures like those returned by real APIs, used defensive .get() chains to avoid KeyError crashes, and learned when JSON is the better choice over CSV.
API fundamentals: You made HTTP requests with the requests library, parsed JSON responses, handled errors gracefully, and implemented rate limiting with exponential backoff. Most importantly, you internalized the threshold concept: APIs are the connective tissue of the internet, and your Python scripts can talk to any of them.
The data pipeline pattern — Load, Clean, Enrich, Aggregate, Output — is the backbone of data engineering, and you've now built one from scratch. In Chapter 24, you'll extend this to web scraping, and in the capstone projects, you'll build complete data-driven applications.
Key Vocabulary
| Term | Definition |
|---|---|
| API | Application Programming Interface — a set of rules for programs to communicate with each other |
| API key | A unique string that authenticates your application when making API requests |
| authentication | The process of proving your identity to an API, usually via an API key or token |
| CSV | Comma-Separated Values — a plain-text format for tabular data |
| data pipeline | A sequence of steps that load, clean, transform, and output data |
| DictReader | A csv module class that reads CSV rows as dictionaries keyed by column name |
| DictWriter | A csv module class that writes dictionaries as CSV rows |
| GET | An HTTP method that requests data from a server |
| HTTP | HyperText Transfer Protocol — the protocol for communication on the web |
| JSON | JavaScript Object Notation — a text format for structured data interchange |
| pagination | Splitting large API responses across multiple pages/requests |
| POST | An HTTP method that sends data to a server for processing |
| rate limiting | Restrictions on how many API requests you can make in a given time period |
| requests | A third-party Python library for making HTTP requests |
| response | The data a server sends back after receiving a request |
| REST | Representational State Transfer — architectural principles for web API design |
| status code | A three-digit number indicating the result of an HTTP request (200 = OK, 404 = Not Found, etc.) |