Learning Objectives
- Open, read, and write text files using context managers (the with statement)
- Process files line by line for memory-efficient handling of large files
- Work with file paths using pathlib.Path for cross-platform compatibility
- Read and write structured data in CSV and JSON formats
- Handle common file I/O errors gracefully
In This Chapter
- Chapter Overview
- 10.1 Why File I/O Matters
- 10.2 Opening and Reading Files
- 10.3 Context Managers: The with Statement
- 10.4 Writing and Appending to Files
- 10.5 Processing Files Line by Line
- 10.6 Working with Paths: pathlib
- 10.7 Reading and Writing CSV
- 10.8 Reading and Writing JSON
- 10.9 Common File I/O Errors
- 10.10 Project Checkpoint: TaskFlow v0.9
- Chapter Summary
Chapter 10: File Input and Output: Persistent Data
"The file system is the most democratic database — every programming language, every operating system, every era of computing agrees on its fundamental idea: bytes in a file." — Adapted from Rob Pike
Chapter Overview
Every program you've written so far has a fatal flaw: it forgets everything the moment it stops running. Your grade calculator computed a perfect average, displayed it on screen, and then — poof — the data vanished. Your TaskFlow task list? Gone as soon as you pressed Ctrl+C. All that work, evaporated.
In the real world, this is a non-starter. Your phone's contacts persist when you restart it. Spreadsheets survive power outages. Web applications remember your login across sessions. How? They write data to files or, at scale, to databases, which in turn keep their data in files on disk.
This chapter is about making your programs remember. You'll learn to read data from files, write results to disk, and work with the two most common structured data formats in the industry: CSV and JSON. By the end of this chapter, your programs will outlive their own execution — and that's a fundamental shift in what your code can do.
In this chapter, you will learn to:
- Open files for reading, writing, and appending using open() and context managers
- Process files efficiently, line by line, without loading everything into memory
- Use pathlib.Path for cross-platform file path handling
- Read and write CSV files using the csv module
- Read and write JSON files using the json module
- Diagnose and fix common file I/O errors
🏃 Fast Track: If you're comfortable with basic file reading/writing and want to jump to structured data formats, skim sections 10.1-10.5 and start at section 10.7 (CSV) or 10.8 (JSON).
🔬 Deep Dive: After this chapter, read Case Study 02 for a comparison of data formats used in real-world applications, and the Further Reading for links to the Python documentation on io, csv, and json.
10.1 Why File I/O Matters
🚪 Threshold Concept: Persistence
Until now, every variable you've created lives only as long as your program is running. When the program ends, Python reclaims that memory, and the data is gone forever. Persistence is the idea that programs can outlive their execution by writing data to disk. This is the difference between a calculator and a spreadsheet — the spreadsheet remembers. Once you understand persistence, you start thinking about programs differently: every application becomes a conversation between the running code and the data it stores.
Think about the programs you use every day. Your text editor saves documents. Your music app remembers your playlists. Your web browser keeps your bookmarks across updates, crashes, and new laptop setups. All of these programs read data from files when they start and write data to files when things change.
File I/O (input/output) is the bridge between your program's temporary memory (RAM) and permanent storage (your hard drive or SSD). Here's the mental model:
The File I/O Lifecycle:
Program starts → Open file → Read data into variables → Process data
→ Write results to file → Close file → Program ends
↓
Data survives on disk
↓
Next time program starts → Read saved data → Continue
This lifecycle is at the heart of virtually every useful application. The pattern has three phases:
- Open the file (establish a connection between your program and a file on disk)
- Read or write data through that connection
- Close the file (release the connection so the operating system can clean up)
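The three phases can be sketched in a few lines. This is a minimal illustration rather than part of the chapter's projects: the file name lifecycle_demo.txt and the temporary directory are assumptions, chosen so the example cleans up after itself. The pathlib.Path object used to build the path is covered in section 10.6.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "lifecycle_demo.txt"  # hypothetical file name

    # Phase 1: open the file (here, for writing)
    f = open(demo_path, "w", encoding="utf-8")
    # Phase 2: write data through the connection
    f.write("remember me\n")
    # Phase 3: close the file so the data is flushed to disk
    f.close()

    # A later "run" of the program: open, read, close again
    f = open(demo_path, "r", encoding="utf-8")
    saved = f.read()
    f.close()

print(saved)
```

The explicit close() calls are shown only to make the three phases visible; section 10.3 introduces a safer way to get the same effect.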
🧩 Productive Struggle
Before reading further, think about this: You built a grade calculator in earlier chapters. It computes averages, letter grades, maybe even handles weighted categories. But every time you close the program, all the student data disappears. A student asks: "Can I save my grades and load them next week?" How would you solve this? What would you need to store? Where would you put it? Jot down your ideas before reading on — you'll be surprised how close your intuition gets.
Why Not Just Use a Database?
Fair question. Databases (like SQLite, PostgreSQL, or MongoDB) are powerful tools for storing data, and you'll encounter them in later courses. But they're overkill for many tasks, and they all sit on top of file I/O at the lowest level. Understanding files first means:
- You understand what databases are actually doing under the hood.
- You can work with data formats (CSV, JSON, log files) that don't require a database.
- You can prototype quickly — write to a file now, switch to a database later when your needs grow.
💡 Intuition: File I/O is like paper. Everyone can read it, everyone can write on it, and it doesn't require electricity to store. Databases are like filing cabinets with locks, indexes, and a librarian — more powerful, but sometimes you just need to jot something on a Post-it.
10.2 Opening and Reading Files
The built-in open() function is your gateway to the file system. It creates a file object — a Python object that represents a connection to a file on disk.
The Basics of open()
file = open("myfile.txt", "r") # open for reading
content = file.read() # read the entire file
file.close() # ALWAYS close when done
The first argument is the filename (or path). The second argument is the mode — what you intend to do with the file:
| Mode | Meaning | Creates file if missing? | Overwrites existing? |
|---|---|---|---|
| "r" | Read (default) | No (raises error) | N/A |
| "w" | Write | Yes | Yes (erases all content) |
| "a" | Append | Yes | No (adds to end) |
| "x" | Exclusive create | Yes (but errors if exists) | N/A |
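The difference between "w" and "x" is easiest to see side by side. The sketch below is an illustration under assumptions: notes.txt is a hypothetical file name, and tempfile is used only so the demo runs in a throwaway directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"  # hypothetical file name
    with open(target, "w", encoding="utf-8") as f:
        f.write("important content\n")

    # "x" refuses to touch a file that already exists
    try:
        with open(target, "x", encoding="utf-8") as f:
            f.write("new content\n")
        x_result = "created"
    except FileExistsError:
        x_result = "refused"

    with open(target, "r", encoding="utf-8") as f:
        after_x = f.read()

    # "w" truncates as soon as the file is opened, even with no write() call
    with open(target, "w", encoding="utf-8") as f:
        pass
    with open(target, "r", encoding="utf-8") as f:
        after_w = f.read()

print(x_result)       # refused
print(repr(after_x))  # 'important content\n'
print(repr(after_w))  # ''
```

When overwriting would be a disaster, "x" is the safer choice because it fails loudly instead of erasing data.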
⚠️ Pitfall: The "w" mode is dangerous. If the file exists, opening it in write mode immediately erases all its content, before you've written a single byte. There is no undo. This is the file I/O equivalent of rm: it doesn't ask for confirmation.
Reading Methods
Once you have a file object open in read mode, you have three ways to get data out of it:
read() — everything at once:
with open("zen.txt", "r") as f:
    contents = f.read()      # one big string
    print(len(contents))     # number of characters
This loads the entire file into a single string. Simple and convenient for small files, but dangerous for large ones — a 2 GB log file would try to consume 2 GB of RAM.
readline() — one line at a time:
with open("zen.txt", "r") as f:
    first_line = f.readline()    # "Beautiful is better than ugly.\n"
    second_line = f.readline()   # "Explicit is better than implicit.\n"
Each call reads up to and including the next newline character (\n). When you reach the end of the file, readline() returns an empty string "".
readlines() — all lines as a list:
with open("zen.txt", "r") as f:
    lines = f.readlines()    # ["Beautiful is...\n", "Explicit is...\n", ...]
    print(len(lines))        # number of lines
This gives you a list of strings, one per line, each ending with \n. It loads the entire file into memory, so the same size warning as read() applies.
🔗 Connection (Ch 8: Lists): readlines() returns a list of strings. Every list operation you learned (indexing, slicing, iterating, list comprehensions) works on the result. lines[0] gives you the first line. lines[-1] gives you the last. [line.strip() for line in lines] removes all trailing newlines.
Dr. Patel's FASTA Files
Dr. Anika Patel processes DNA sequence files in a format called FASTA. Each sequence starts with a header line beginning with >, followed by the sequence data on subsequent lines:
# Reading a simplified FASTA file
sequences = {}
current_name = ""
with open("sequences.fasta", "r") as f:
    for line in f:
        line = line.strip()
        if line.startswith(">"):
            current_name = line[1:]  # remove the '>'
            sequences[current_name] = ""
        else:
            sequences[current_name] += line
# Now sequences is a dict: {"Gene_ABC": "ATCGATCG...", ...}
for name, seq in sequences.items():
    print(f"{name}: {len(seq)} nucleotides")
This pattern — reading a file and building a dictionary — shows how file I/O connects directly to the data structures you learned in Chapters 8 and 9.
🔄 Check Your Understanding
- What's the difference between read() and readlines()?
- If you call f.read() twice on the same file object (without reopening), what does the second call return? Why?
- Why does readline() include the \n character at the end?
Verify
- read() returns the entire file as a single string. readlines() returns a list of strings, one per line.
- The second call returns an empty string "". The file object maintains a cursor (position pointer) that advances as you read. After read() reaches the end, the cursor is at the end, so there's nothing left to read.
- Because \n is a character in the file, just like any letter. Python faithfully reports what's there. Use .strip() to remove it if you don't want it.
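The cursor behaviour described in the second answer can be checked directly. This is a small sketch under assumptions: it writes its own one-line zen.txt into a temporary directory, and it uses f.seek(0), a file method not otherwise covered in this chapter, only to show that the cursor can be moved back to the start.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "zen.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write("Beautiful is better than ugly.\n")

    with open(path, "r", encoding="utf-8") as f:
        first = f.read()    # reads everything; the cursor is now at the end
        second = f.read()   # nothing left to read, so this is ""
        f.seek(0)           # move the cursor back to the beginning
        third = f.read()    # the full content again

print(repr(first))
print(repr(second))
print(first == third)
```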
10.3 Context Managers: The with Statement
You may have noticed that every example above uses with open(...) as f:. This is a context manager, and it's the single most important pattern in file I/O.
The Problem: Forgetting to Close
Without with, you must close the file manually:
f = open("data.txt", "r")
content = f.read()
f.close() # easy to forget!
This looks fine, but what if an error occurs between open() and close()? The file stays open, and the operating system keeps holding onto it. Open too many files without closing them and your program (or your entire system) can run out of file handles and crash.
# This is BROKEN — if process() raises an error, f.close() never runs
f = open("data.txt", "r")
content = f.read()
result = process(content) # What if this crashes?
f.close() # This line never executes
The Solution: with Guarantees Cleanup
The with statement guarantees that the file is closed when the indented block ends — whether it ends normally or because of an exception:
with open("data.txt", "r") as f:
    content = f.read()
    result = process(content)  # Even if this crashes...
# ...the file is closed here, guaranteed.
✅ Best Practice: Always use with for file operations. There is rarely a good reason to use open() without with in modern Python. If you see code that opens a file without with, that's a code smell: it's not necessarily broken, but it's fragile.
How with Works (Briefly)
A context manager is any object that defines __enter__ and __exit__ methods. When Python enters the with block, it calls __enter__ (which returns the file object). When the block ends — for any reason — Python calls __exit__ (which closes the file). You don't need to understand these methods yet; you just need to use with.
# What with open(...) as f: actually does (conceptually):
# 1. Call open("data.txt", "r") → returns a file object
# 2. Call file_object.__enter__() → returns self (assigned to f)
# 3. Execute the indented block
# 4. Call file_object.__exit__() → closes the file (always, even on error)
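To see the __enter__/__exit__ sequence in action, here is a toy context manager. The Tracker class is hypothetical and exists only for this illustration; it records each step in a list so the order of events is visible, including the fact that __exit__ runs even when the block raises an error.

```python
class Tracker:
    """A toy context manager (hypothetical, for illustration only)."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is the object that "as t" receives

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # False means: do not swallow the exception


tracker = Tracker()
try:
    with tracker as t:
        t.events.append("body")
        raise ValueError("something went wrong")
except ValueError:
    tracker.events.append("handled")

print(tracker.events)  # ['enter', 'body', 'exit', 'handled']
```

Note that "exit" appears before "handled": the cleanup step runs first, and only then does the exception reach the surrounding try/except.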
🔄 Spaced Review (Ch 6: Functions): The with block brackets a group of file operations, loosely similar to how a function brackets its local variables. Unlike a function, though, with does not create a new scope: the variable f still exists after the block, but the file is closed, so you can't read from or write to it anymore. Think of with as a function that automatically cleans up after itself.
10.4 Writing and Appending to Files
Reading data is only half the story. To achieve persistence, your programs need to write data back to disk.
Write Mode ("w"): Start Fresh
with open("output.txt", "w") as f:
    f.write("Line one\n")
    f.write("Line two\n")
    f.write("Line three\n")
The write() method takes a string and writes it to the file. Unlike print(), it does not add a newline — you must include \n yourself.
# write() vs. print() comparison
with open("comparison.txt", "w") as f:
    f.write("write: no newline added")
    f.write("write: this is on the same line!")
    print("print: newline added automatically", file=f)
    print("print: this is on a new line", file=f)
The file would contain:
write: no newline addedwrite: this is on the same line!print: newline added automatically
print: this is on a new line
⚠️ Pitfall: Opening a file in "w" mode truncates (empties) the file immediately, even if you never call write(). This code destroys the file's content:
with open("important_data.txt", "w") as f:
    pass  # Oops: file is now empty, even though we wrote nothing
Append Mode ("a"): Add to the End
Append mode opens the file for writing but positions the cursor at the end. Existing content is preserved:
# First, create a log file
with open("app.log", "w") as f:
    f.write("=== Application Log ===\n")
# Later, append entries
with open("app.log", "a") as f:
    f.write("2025-01-15 09:00: App started\n")
# Even later, append more
with open("app.log", "a") as f:
    f.write("2025-01-15 09:05: User logged in\n")
The file now contains all three lines. This is the pattern used for log files — you keep appending entries and the history accumulates.
writelines(): Write a List of Strings
The writelines() method writes each string in a list to the file. Like write(), it does not add newlines:
lines = ["Alice: 92\n", "Bob: 87\n", "Carol: 95\n"]
with open("grades.txt", "w") as f:
    f.writelines(lines)
💡 Intuition: Think of write() and writelines() as the opposites of read() and readlines(). The read/write pair works with a single string; the readlines/writelines pair works with a list of strings.
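Because writelines() adds no newlines, a list of plain strings ends up on a single line unless you supply the "\n" yourself. The sketch below shows both outcomes; the names list and the temporary names.txt file are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

names = ["Alice", "Bob", "Carol"]  # note: no trailing "\n" on any item

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "names.txt"

    # Without newlines, everything runs together on one line
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(names)
    with open(path, "r", encoding="utf-8") as f:
        run_together = f.read()

    # Adding "\n" to each string gives one name per line
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(name + "\n" for name in names)
    with open(path, "r", encoding="utf-8") as f:
        one_per_line = f.read()

print(repr(run_together))   # 'AliceBobCarol'
print(repr(one_per_line))   # 'Alice\nBob\nCarol\n'
```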
Grade Calculator: Writing a Report
Let's extend the grade calculator to produce an output file:
def write_grade_report(students: list[dict], filename: str) -> None:
    """Write a formatted grade report to a text file.

    Each student dict has 'name' and 'scores' keys.
    """
    with open(filename, "w") as f:
        f.write("Grade Report\n")
        f.write("=" * 40 + "\n\n")
        for student in students:
            avg = sum(student["scores"]) / len(student["scores"])
            f.write(f"{student['name']:<20} Average: {avg:.1f}\n")
        overall = sum(
            sum(s["scores"]) / len(s["scores"]) for s in students
        ) / len(students)
        f.write(f"\n{'Class average:':<20} {overall:.1f}\n")
    print(f"Report written to {filename}")

# Usage
students = [
    {"name": "Alice", "scores": [92, 88, 95]},
    {"name": "Bob", "scores": [85, 79, 88]},
    {"name": "Carol", "scores": [97, 93, 96]},
]
write_grade_report(students, "report.txt")
Output in report.txt:
Grade Report
========================================

Alice                Average: 91.7
Bob                  Average: 84.0
Carol                Average: 95.3

Class average:       90.3
🔄 Spaced Review (Ch 6: Functions): Notice how write_grade_report() is a well-designed function: it takes data and a filename as parameters, does one job, and uses a context manager internally. The caller doesn't need to worry about file handles or closing. This is the power of combining functions with file I/O.
10.5 Processing Files Line by Line
When you're working with large files — log files, datasets, genome sequences — loading the entire file into memory with read() or readlines() is a bad idea. A 10 GB log file would consume 10 GB of RAM.
The solution is to iterate over the file object directly. Python reads one line at a time, keeping memory usage constant regardless of file size:
# Memory-efficient: only one line is in memory at a time
total = 0
count = 0
with open("huge_dataset.txt", "r") as f:
    for line in f:
        value = float(line.strip())
        total += value
        count += 1
print(f"Average: {total / count:.2f}")
This pattern works for files of any size — 1 KB or 100 GB. Each iteration, Python reads one line from disk, you process it, and then that line's memory can be reclaimed.
Common Line-by-Line Patterns
Counting specific lines:
# Count lines containing the word "ERROR"
error_count = 0
with open("server.log", "r") as f:
    for line in f:
        if "ERROR" in line:
            error_count += 1
print(f"Found {error_count} errors")
Building a list from a file:
# Read a file of names into a list (stripping whitespace)
with open("names.txt", "r") as f:
    names = [line.strip() for line in f if line.strip()]
Processing and writing simultaneously:
# Elena's pattern: read one file, write results to another
with open("raw_data.txt", "r") as infile, \
     open("processed.txt", "w") as outfile:
    for line in infile:
        cleaned = line.strip().upper()
        if cleaned:  # skip blank lines
            outfile.write(cleaned + "\n")
💡 Intuition: You can open multiple files in a single with statement by separating them with commas. A trailing backslash \ lets the statement continue onto the next line. Both files are guaranteed to be closed when the block ends.
Elena's Report: Processing Monthly Data
Elena Vasquez at the nonprofit receives a plain-text report each month where each line contains a donor name and amount separated by a pipe character:
def process_donations(input_file: str, output_file: str) -> None:
    """Read raw donation data, compute stats, write summary."""
    total = 0.0
    count = 0
    largest_donor = ""
    largest_amount = 0.0
    with open(input_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):  # skip blanks/comments
                continue
            name, amount_str = line.split("|")
            amount = float(amount_str.strip())
            total += amount
            count += 1
            if amount > largest_amount:
                largest_amount = amount
                largest_donor = name.strip()
    # Guard against an input file with no donation lines
    average = total / count if count else 0.0
    with open(output_file, "w") as f:
        f.write("Donations Summary\n")
        f.write(f"Total donations: ${total:,.2f}\n")
        f.write(f"Number of donors: {count}\n")
        f.write(f"Average donation: ${average:,.2f}\n")
        f.write(f"Largest donor: {largest_donor} (${largest_amount:,.2f})\n")
    print(f"Summary written to {output_file}")
🔄 Spaced Review (Ch 8: Lists): The list comprehension [line.strip() for line in f if line.strip()] combines file iteration (Chapter 10) with list comprehension filtering (Chapter 8). Every line of the file is stripped of whitespace, and blank lines are excluded, all in one expression.
10.6 Working with Paths: pathlib
So far, we've used simple filenames like "data.txt". That works when the file is in the same directory as your script, but real programs need to work with files in other directories, on different operating systems, and in locations that might not exist yet.
The pathlib module provides the Path class — an object-oriented way to work with file system paths that works correctly on Windows, macOS, and Linux.
Creating Path Objects
from pathlib import Path
# Simple filename
p = Path("data.txt")
# Subdirectory path
p = Path("reports") / "2025" / "january.csv"
print(p) # reports/2025/january.csv (or reports\2025\january.csv on Windows)
# Home directory
home = Path.home()
print(home) # /Users/yourname (macOS) or C:\Users\yourname (Windows)
# Current working directory
cwd = Path.cwd()
print(cwd) # wherever your script is running from
The / operator on Path objects joins path components. This is cleaner and more reliable than string concatenation, because Path handles the right separator for your operating system automatically.
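You can preview both separator styles on any machine with pathlib's "pure" path classes, which manipulate paths without touching the file system. This is a side illustration: PurePosixPath and PureWindowsPath are part of the standard library but are not otherwise used in this chapter.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same components, joined with the same / operator
posix_path = PurePosixPath("reports") / "2025" / "january.csv"
windows_path = PureWindowsPath("reports") / "2025" / "january.csv"

print(posix_path)    # reports/2025/january.csv
print(windows_path)  # reports\2025\january.csv
```

In everyday code you would simply use Path, which picks the right flavor for the operating system the program is running on.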
Useful Path Operations
from pathlib import Path
p = Path("reports") / "2025" / "january.csv"
# Components
print(p.name) # "january.csv" — filename with extension
print(p.stem) # "january" — filename without extension
print(p.suffix) # ".csv" — file extension
print(p.parent) # reports/2025 — directory containing the file
# Checking existence
print(p.exists()) # True or False
print(p.is_file()) # True if it's a file (not a directory)
print(p.is_dir()) # True if it's a directory
# Creating directories
output_dir = Path("output") / "processed"
output_dir.mkdir(parents=True, exist_ok=True)
# parents=True: create intermediate directories if needed
# exist_ok=True: don't error if the directory already exists
Path Objects Work with open()
Path objects work seamlessly with open():
from pathlib import Path
data_dir = Path("data")
report_path = data_dir / "monthly_report.csv"
# Both of these work:
with open(report_path, "r") as f:    # open() accepts Path objects
    content = f.read()
content = report_path.read_text()    # Path has its own read method
Path objects also have convenience methods for quick reads and writes:
from pathlib import Path
p = Path("quick.txt")
# Quick write (no need for open/with)
p.write_text("Hello, pathlib!\n")
# Quick read
content = p.read_text()
print(content) # "Hello, pathlib!\n"
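✅ Best Practice: Use pathlib.Path for all file path manipulation. String concatenation with / or \\ is fragile and platform-dependent. Path("data") / "file.csv" works on every operating system. "data/" + "file.csv" usually works, but hand-assembled strings are easy to get wrong: a missing or doubled separator, or a hard-coded backslash that breaks on macOS and Linux.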
Finding Your Script's Directory
A common pattern is to locate files relative to your script, not relative to where the user happens to run it:
from pathlib import Path
# Directory containing the current script
SCRIPT_DIR = Path(__file__).parent
# Data file in the same directory as the script
data_path = SCRIPT_DIR / "sample-data.csv"
# Data file in a sibling directory
config_path = SCRIPT_DIR.parent / "config" / "settings.json"
This pattern is critical for distributing programs. Without it, your code breaks if someone runs it from a different directory.
🔄 Check Your Understanding
- What does Path("reports") / "q1" / "summary.csv" produce on macOS vs. Windows?
- Why is Path.mkdir(parents=True, exist_ok=True) safer than just Path.mkdir()?
- What does Path(__file__).parent give you, and why is it useful?
Verify
- On macOS/Linux: reports/q1/summary.csv. On Windows: reports\q1\summary.csv. The Path class uses the correct separator automatically.
- parents=True creates intermediate directories (like reports/q1/) that don't exist yet. exist_ok=True avoids a FileExistsError if the directory already exists. Without these flags, either condition raises an exception.
- It gives the directory containing the currently running script. It's useful because it lets you locate data files relative to your code, regardless of where the user runs the script from.
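The two mkdir() flags can be tried out safely in a temporary directory. In this sketch the reports/q1 directory names are taken from the question above, while the outcomes list is just a hypothetical way to record what happened at each step.

```python
import tempfile
from pathlib import Path

outcomes = []

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "reports" / "q1"

    try:
        nested.mkdir()  # the parent "reports" does not exist yet
    except FileNotFoundError:
        outcomes.append("missing parent")

    nested.mkdir(parents=True, exist_ok=True)  # creates reports/ and reports/q1/
    outcomes.append("created")

    try:
        nested.mkdir()  # the directory already exists now
    except FileExistsError:
        outcomes.append("already exists")

    nested.mkdir(parents=True, exist_ok=True)  # safe to call again
    outcomes.append("no error")

print(outcomes)  # ['missing parent', 'created', 'already exists', 'no error']
```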
10.7 Reading and Writing CSV
CSV (Comma-Separated Values) is the most common format for tabular data — anything that looks like a spreadsheet. Every spreadsheet application can export CSV, every data tool can import it, and every programming language has CSV support.
A CSV file is just a text file where each line is a row and values are separated by commas:
name,department,hours_worked,hourly_rate
Elena Vasquez,Programs,42,28.50
Marcus Chen,Development,38,32.00
You could parse this yourself with line.split(","), but don't. Real CSV files have edge cases that will bite you: values containing commas, quoted strings, embedded newlines, different delimiters. The csv module handles all of this correctly.
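A single quoted value is enough to show the problem. The line below is a made-up variation on the sample data in which the name is written "Vasquez, Elena"; io.StringIO is used so the example needs no file on disk.

```python
import csv
import io

# A hypothetical row whose first field contains a comma, so it is quoted
line = '"Vasquez, Elena",Programs,42,28.50'

naive = line.split(",")                       # splits inside the quotes
parsed = next(csv.reader(io.StringIO(line)))  # respects the quotes

print(naive)   # ['"Vasquez', ' Elena"', 'Programs', '42', '28.50']
print(parsed)  # ['Vasquez, Elena', 'Programs', '42', '28.50']
```

The naive split produces five fields instead of four and leaves stray quote characters in the data; the csv module returns the four intended values.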
Reading with csv.reader
import csv
with open("employees.csv", "r", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # grab the header row
    print(f"Columns: {header}")
    for row in reader:
        name, dept, hours, rate = row  # each row is a list of strings
        pay = float(hours) * float(rate)
        print(f" {name}: ${pay:.2f}")
⚠️ Pitfall: Always pass newline="" when opening CSV files. Without it, Python's universal newline handling can interfere with the csv module's own newline parsing, leading to blank rows or corrupted data on some platforms.
Reading with csv.DictReader (Recommended)
DictReader maps each row to a dictionary using the header row as keys:
import csv
with open("employees.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # row is a dict: {"name": "Elena Vasquez", "department": "Programs", ...}
        pay = float(row["hours_worked"]) * float(row["hourly_rate"])
        print(f" {row['name']}: ${pay:.2f}")
DictReader is almost always preferable to csv.reader because:
- Your code is readable: row["name"] vs. row[0]
- Column order doesn't matter — you access by name
- Adding a new column to the CSV doesn't break existing code
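The second and third points can be demonstrated with two versions of the same data. In this sketch the reordered text and its extra notes column are invented for the demo, and compute_pay is a hypothetical helper; the CSV text is read from strings via io.StringIO rather than from files.

```python
import csv
import io

original = (
    "name,department,hours_worked,hourly_rate\n"
    "Elena Vasquez,Programs,42,28.50\n"
)
# Same data, but with the columns shuffled and an extra "notes" column
reordered = (
    "hourly_rate,name,notes,hours_worked,department\n"
    "28.50,Elena Vasquez,part-time,42,Programs\n"
)


def compute_pay(csv_text):
    """Return {name: pay} using column names, not positions."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["name"]: float(row["hours_worked"]) * float(row["hourly_rate"])
        for row in rows
    }


print(compute_pay(original))   # {'Elena Vasquez': 1197.0}
print(compute_pay(reordered))  # {'Elena Vasquez': 1197.0}
```

The same function handles both layouts unchanged, which would not be true of code that unpacks rows by position.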
🔗 Connection (Ch 9: Dictionaries): DictReader turns each row into a dictionary. All the dict operations from Chapter 9 (row["key"], row.get("key", default), iterating with .items()) work exactly as you'd expect.
Writing with csv.writer and csv.DictWriter
import csv
# csv.writer — write lists
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Score", "Grade"])  # header
    writer.writerow(["Alice", 92, "A"])
    writer.writerow(["Bob", 87, "B+"])
# csv.DictWriter — write from dicts (recommended)
with open("output.csv", "w", newline="") as f:
    fieldnames = ["Name", "Score", "Grade"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({"Name": "Alice", "Score": 92, "Grade": "A"})
    writer.writerow({"Name": "Bob", "Score": 87, "Grade": "B+"})
Elena's Report: CSV Pipeline
Elena processes monthly payroll CSVs to produce department summaries. Here's the complete read-process-write pattern:
import csv

def summarize_payroll(input_csv: str, output_csv: str) -> None:
    """Read employee data, compute department totals, write summary."""
    dept_data: dict[str, list[float]] = {}
    # Phase 1: Read and aggregate
    with open(input_csv, "r", newline="") as f:
        for row in csv.DictReader(f):
            dept = row["department"]
            pay = float(row["hours_worked"]) * float(row["hourly_rate"])
            dept_data.setdefault(dept, []).append(pay)
    # Phase 2: Compute and write
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[
            "department", "employees", "total_pay", "avg_pay"
        ])
        writer.writeheader()
        for dept in sorted(dept_data):
            pays = dept_data[dept]
            writer.writerow({
                "department": dept,
                "employees": len(pays),
                "total_pay": f"{sum(pays):.2f}",
                "avg_pay": f"{sum(pays) / len(pays):.2f}",
            })
    print(f"Summary: {len(dept_data)} departments → {output_csv}")

# Usage
summarize_payroll("sample-data.csv", "department_summary.csv")
10.8 Reading and Writing JSON
JSON (JavaScript Object Notation) is the lingua franca of the modern web. APIs return JSON. Configuration files use JSON. Mobile apps store settings in JSON. If you've ever looked at data from a web service, you've seen JSON.
JSON looks almost identical to Python dictionaries and lists:
{
  "name": "Alice Chen",
  "gpa": 3.87,
  "courses": ["CS 101", "MATH 201"],
  "graduated": false,
  "advisor": null
}
Python's json module provides four functions — two for files, two for strings:
| Function | Direction | Works With |
|---|---|---|
| json.dump(obj, file) | Python → JSON file | File objects |
| json.load(file) | JSON file → Python | File objects |
| json.dumps(obj) | Python → JSON string | Strings |
| json.loads(string) | JSON string → Python | Strings |
💡 Intuition: The s in dumps and loads stands for "string." dump writes to a file; dumps writes to a string. load reads from a file; loads reads from a string.
Writing JSON
import json
student = {
    "name": "Alice",
    "scores": [92, 88, 95],
    "graduated": False,
}
# Write to a file (pretty-printed)
with open("student.json", "w") as f:
    json.dump(student, f, indent=2)
# Convert to a string
json_string = json.dumps(student, indent=2)
print(json_string)
Output:
{
  "name": "Alice",
  "scores": [
    92,
    88,
    95
  ],
  "graduated": false
}
Notice that Python False becomes JSON false, and Python None becomes JSON null. The indent=2 parameter produces human-readable output; without it, everything lands on one line.
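Both conversions, and the effect of indent, can be seen with json.dumps alone. The record dictionary here is a made-up example that includes a None value so the null conversion is visible.

```python
import json

record = {"name": "Alice", "graduated": False, "advisor": None}

compact = json.dumps(record)           # everything on one line
pretty = json.dumps(record, indent=2)  # one key per line

print(compact)  # {"name": "Alice", "graduated": false, "advisor": null}
print(pretty)
```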
Reading JSON
import json
# Read from a file
with open("student.json", "r") as f:
    student = json.load(f)
print(student["name"])          # "Alice"
print(student["scores"])        # [92, 88, 95]
print(type(student["scores"]))  # <class 'list'>
# Parse from a string
json_text = '{"city": "Portland", "pop": 652503}'
data = json.loads(json_text)
print(data["city"])             # "Portland"
Python ↔ JSON Type Mapping
| Python | JSON | Notes |
|---|---|---|
| dict | object {} | Keys must be strings in JSON |
| list | array [] | |
| str | string | |
| int, float | number | |
| True / False | true / false | Lowercase in JSON |
| None | null | |
| tuple | array [] | Tuples become lists; the distinction is lost |
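The tuple row is worth verifying with a quick round trip through a JSON string. The point dictionary is a hypothetical example.

```python
import json

point = {"label": "origin", "coords": (0, 0)}  # coords is a tuple

restored = json.loads(json.dumps(point))

print(type(point["coords"]))     # <class 'tuple'>
print(type(restored["coords"]))  # <class 'list'>
print(restored["coords"])        # [0, 0]
```

If your program relies on the value being a tuple, convert it back after loading, for example with tuple(restored["coords"]).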
⚠️ Pitfall: JSON dictionary keys must be strings. If you have integer keys in a Python dict, json.dump converts them to strings. When you json.load the data back, those keys are still strings, not integers. This can cause subtle KeyError bugs:
import json
scores = {1: "Alice", 2: "Bob"}
json_text = json.dumps(scores)   # '{"1": "Alice", "2": "Bob"}'
loaded = json.loads(json_text)
print(loaded[1])     # KeyError! Keys are now "1", "2"
print(loaded["1"])   # "Alice" (works)
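One way to recover integer keys after loading is to rebuild the dictionary with a comprehension. This is a sketch of one possible approach, and it assumes every key really is the text form of an integer; a key such as "n/a" would make int() raise a ValueError.

```python
import json

scores = {1: "Alice", 2: "Bob"}

loaded = json.loads(json.dumps(scores))  # keys come back as "1", "2"

# Convert each key back to an int (assumes all keys are numeric strings)
repaired = {int(key): value for key, value in loaded.items()}

print(loaded)    # {'1': 'Alice', '2': 'Bob'}
print(repaired)  # {1: 'Alice', 2: 'Bob'}
```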
Grade Calculator: JSON Persistence
Here's the grade calculator with save/load functionality:
import json
from pathlib import Path

RECORDS_FILE = Path("student_records.json")

def save_records(records: list[dict]) -> None:
    """Save student records to a JSON file."""
    with open(RECORDS_FILE, "w") as f:
        json.dump(records, f, indent=2)
    print(f"Saved {len(records)} records.")

def load_records() -> list[dict]:
    """Load student records from JSON, or return empty list."""
    if not RECORDS_FILE.exists():
        return []
    with open(RECORDS_FILE, "r") as f:
        return json.load(f)

# Usage: records persist between runs
records = load_records()
records.append({"name": "David", "scores": [76, 82, 80]})
save_records(records)
When to Use CSV vs. JSON
This is one of the most common design decisions in data programming. Here's a comparison:
| Criterion | Plain Text | CSV | JSON |
|---|---|---|---|
| Best for | Logs, notes, config | Tabular data (rows & columns) | Nested/hierarchical data |
| Human readable | Excellent | Good | Good (with indent) |
| Structure | Unstructured | Flat table | Nested objects & arrays |
| Excel compatible | No | Yes | No (without conversion) |
| API standard | No | Rare | Yes (dominant) |
| Python module | Built-in I/O | csv | json |
| Example use case | Server log | Spreadsheet export | Config file, API response |
Rule of thumb:
- If your data looks like a spreadsheet (rows and columns, all the same fields), use CSV.
- If your data is nested, has varying fields per record, or needs to round-trip through a web API, use JSON.
- If your data is just human-readable notes or logs, plain text is fine.
🔄 Check Your Understanding
- What's the difference between json.dump() and json.dumps()?
- You have a list of 10,000 student records, each with the same fields (name, ID, GPA). Would you choose CSV or JSON? Why?
- What happens when you json.dump a Python dict with tuple values?
Verify
- json.dump(obj, file) writes directly to a file object. json.dumps(obj) returns a JSON-formatted string. The "s" stands for "string."
- CSV: the data is tabular (same columns for every record), CSV is more compact than JSON for flat data, and spreadsheet tools can open it directly.
- Tuples are converted to JSON arrays (which become Python lists when loaded back). The tuple-vs-list distinction is lost in the round-trip.
10.9 Common File I/O Errors
File I/O is one of the most error-prone areas in programming. Files might not exist, you might not have permission to access them, the encoding might be wrong, or the disk might be full. Here are the errors you'll encounter most often.
FileNotFoundError: Wrong Path
The most common beginner error. You try to read a file that doesn't exist — usually because the path is wrong:
# This fails if data.txt isn't in the current working directory
with open("data.txt", "r") as f:
    content = f.read()
# FileNotFoundError: [Errno 2] No such file or directory: 'data.txt'
🐛 Debugging Walkthrough: FileNotFoundError
Symptom: FileNotFoundError: [Errno 2] No such file or directory: 'data/results.csv'
Common causes:
1. Typo in the filename. Double-check spelling and extension (.csv vs .CSV).
2. Wrong working directory. Your script assumes the file is in the same folder, but you're running it from a different directory. Fix: use Path(__file__).parent / "data" / "results.csv" instead of a relative path.
3. The file genuinely doesn't exist yet. If your program is supposed to create the file on first run, check for existence first:
```python
from pathlib import Path

path = Path("data") / "results.csv"
if path.exists():
    with open(path, "r") as f:
        data = f.read()
else:
    print(f"File not found: {path}")
    print(f"Current directory: {Path.cwd()}")
    print(f"Files here: {list(Path.cwd().iterdir())}")
```
The Path.cwd() trick is your best debugging friend. When a file can't be found, print the current working directory. The answer is almost always "I thought I was in folder X, but I'm actually in folder Y."
PermissionError: Access Denied
You don't have permission to read or write the file — common on shared servers or when trying to write to system directories:
# This might fail on some systems
with open("/etc/shadow", "r") as f:  # Linux system file
    content = f.read()
# PermissionError: [Errno 13] Permission denied: '/etc/shadow'
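One way to keep a denied file from crashing the program is to catch the exception and report it. The sketch below assumes a helper name, read_if_allowed, that is hypothetical and not part of any library; it handles only PermissionError and lets every other error propagate.

```python
from pathlib import Path


def read_if_allowed(filepath):
    """Return the file's text, or None if access is denied.

    Hypothetical helper for illustration only.
    """
    path = Path(filepath)
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except PermissionError:
        # Report the problem instead of crashing.
        print(f"Permission denied: {path}")
        return None
```

Returning None lets the caller decide what to do next, such as asking the user for a different file.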
Encoding Errors
Text files are stored as bytes, and those bytes must be interpreted using a character encoding. The most common encoding today is UTF-8, but older files might use Latin-1, Windows-1252, or other encodings. If Python guesses wrong, you get garbled text or crashes:
🐛 Debugging Walkthrough: Encoding Errors
Symptom: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 42
What happened: The file isn't UTF-8 encoded. The byte 0xe9 is the letter "é" in Latin-1 encoding. UTF-8 stores that letter as two bytes, so a single 0xe9 followed by ordinary text is not valid UTF-8.
Fix: Specify the correct encoding:
```python
# Try UTF-8 first (most common), fall back to latin-1
try:
    with open("data.txt", "r", encoding="utf-8") as f:
        content = f.read()
except UnicodeDecodeError:
    with open("data.txt", "r", encoding="latin-1") as f:
        content = f.read()
    print("Warning: file is not UTF-8 — read as Latin-1")
```
Prevention: When creating files, always specify encoding="utf-8":
```python
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Caf\u00e9, na\u00efve, r\u00e9sum\u00e9\n")
```
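Both failure modes, the crash and the garbled text, can be reproduced without any files by encoding and decoding strings directly. This is a small illustrative sketch; the sample word is arbitrary.

```python
# The same character is stored as different bytes in different encodings.
utf8_bytes = "é".encode("utf-8")      # two bytes
latin1_bytes = "é".encode("latin-1")  # one byte: 0xe9

# Decoding Latin-1 bytes as UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding UTF-8 bytes as Latin-1 "succeeds" but yields garbled text.
garbled = "café".encode("utf-8").decode("latin-1")

print(utf8_bytes, latin1_bytes, decode_failed, garbled)
```

The second case is the more dangerous one: no exception is raised, so the wrong text can slip silently into your data.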
Defensive File I/O Pattern
Here's a robust pattern that handles the most common errors:
from pathlib import Path

def safe_read(filepath: str | Path) -> str | None:
    """Read a file safely, returning None on failure."""
    path = Path(filepath)
    if not path.exists():
        print(f"File not found: {path}")
        return None
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except PermissionError:
        print(f"Permission denied: {path}")
        return None
    except UnicodeDecodeError:
        print(f"Encoding error — trying latin-1: {path}")
        with open(path, "r", encoding="latin-1") as f:
            return f.read()
📊 Real-World Application: In production systems, file I/O errors are expected, not exceptional. Elena's automated report pipeline includes error handling for every file operation — because in a real nonprofit, someone will eventually rename the monthly CSV, change permissions on the shared drive, or save a file in the wrong encoding. Robust code handles all of these gracefully.
10.10 Project Checkpoint: TaskFlow v0.9
It's time for the biggest upgrade to TaskFlow yet: persistence. After this checkpoint, your tasks will survive between program runs. Close the program, shut down your computer, come back tomorrow — your tasks will still be there.
What's New in v0.9
- Tasks are saved to a JSON file (taskflow_data.json)
- Tasks load automatically when the program starts
- Every change (add, delete, complete) auto-saves immediately
- The program handles missing or corrupted data files gracefully
Implementation
The core persistence logic is two functions:
import json
from pathlib import Path

DATA_FILE = Path(__file__).parent / "taskflow_data.json"

def load_tasks(path: Path) -> list[dict]:
    """Load tasks from a JSON file.

    Returns an empty list if the file doesn't exist or is corrupted.
    """
    if not path.exists():
        print("  No saved tasks found — starting fresh.")
        return []
    try:
        # Specify the encoding explicitly, as recommended in section 10.9
        with open(path, "r", encoding="utf-8") as f:
            tasks = json.load(f)
        print(f"  Loaded {len(tasks)} task(s) from {path.name}")
        return tasks
    except json.JSONDecodeError:
        print(f"  Warning: {path.name} is corrupted. Starting fresh.")
        return []

def save_tasks(tasks: list[dict], path: Path) -> None:
    """Save tasks to a JSON file with pretty formatting."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=2)
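To see the three cases the loader has to cover (no file yet, a valid file, a corrupted file), the sketch below exercises compact copies of the two functions in a temporary directory. The copies omit the status messages and pass encoding="utf-8" explicitly; the sample task and the deliberately broken file contents are assumptions made for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def load_tasks(path: Path) -> list[dict]:
    """Compact copy of the loader (status messages omitted)."""
    if not path.exists():
        return []
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError:
        return []


def save_tasks(tasks: list[dict], path: Path) -> None:
    """Compact copy of the saver."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "taskflow_data.json"

    # Case 1: no file exists yet.
    first_run = load_tasks(data_file)

    # Case 2: save one task, then load it back.
    save_tasks([{"title": "Read Chapter 10", "done": False}], data_file)
    second_run = load_tasks(data_file)

    # Case 3: overwrite the file with invalid JSON.
    data_file.write_text("{not valid json", encoding="utf-8")
    after_corruption = load_tasks(data_file)

print(first_run, second_run, after_corruption)
```

Using a temporary directory keeps the experiment away from your real taskflow_data.json.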
Then integrate auto-save into every operation:
from datetime import datetime  # needed for the "created" timestamp

def add_task(tasks: list[dict]) -> None:
    """Add a new task and auto-save."""
    title = input("  Task title: ").strip()
    if not title:
        print("  Title cannot be empty.")
        return
    priority = input("  Priority (high/medium/low) [medium]: ").strip().lower()
    if priority not in ("high", "medium", "low"):
        priority = "medium"
    category = input("  Category [general]: ").strip() or "general"
    task = {
        "title": title,
        "priority": priority,
        "category": category,
        "done": False,
        "created": datetime.now().strftime("%Y-%m-%d %H:%M"),
    }
    tasks.append(task)
    save_tasks(tasks, DATA_FILE)  # <-- auto-save after every change
    print(f"  Added: '{title}'")
Why JSON (and Not CSV)?
This is a design decision. We chose JSON over CSV for TaskFlow because:
- Tasks have nested structure. A task might eventually have sub-tasks or tags (a list within a dict). JSON handles nesting naturally; CSV doesn't.
- Fields vary. Not every task needs every field. JSON handles missing fields gracefully; CSV requires every row to have the same columns.
- Human-readable. With indent=2, the JSON file is easy to inspect and debug.
- Round-trip fidelity. Booleans stay booleans, numbers stay numbers. In CSV, everything is a string that you'd need to convert back.
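The round-trip point is easy to check by sending one task through both formats. In this sketch io.StringIO stands in for a file on disk, and the estimate_minutes field is a hypothetical addition used only to show what happens to a number.

```python
import csv
import io
import json

# Sample task; "estimate_minutes" is a made-up field for illustration.
task = {"title": "Buy groceries", "done": True, "estimate_minutes": 30}

# Round-trip through CSV.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=list(task))
writer.writeheader()
writer.writerow(task)
csv_buffer.seek(0)
from_csv = next(csv.DictReader(csv_buffer))

# Round-trip through JSON.
from_json = json.loads(json.dumps(task))

print(from_csv)   # every value comes back as a string
print(from_json)  # the boolean and the integer keep their types
```

With CSV, the program would have to convert "True" and "30" back by hand after every load.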
What the JSON File Looks Like
[
  {
    "title": "Read Chapter 10",
    "priority": "high",
    "category": "homework",
    "done": false,
    "created": "2025-01-15 09:30"
  },
  {
    "title": "Buy groceries",
    "priority": "medium",
    "category": "personal",
    "done": true,
    "created": "2025-01-14 18:00"
  }
]
Try It Yourself
- Run the TaskFlow v0.9 script from code/project-checkpoint.py.
- Add three tasks with different priorities and categories.
- Close the program (option 7).
- Reopen the program. Your tasks should still be there.
- Open taskflow_data.json in a text editor and inspect it.
- Try deleting taskflow_data.json. The program should start fresh without crashing.
🚪 Threshold Concept Callback: This is persistence in action. Your TaskFlow program now has memory — it outlives its own execution. Every professional application you use (email, social media, games, banking) is built on this same principle: read state from disk, let the user modify it, write state back to disk. The details get more sophisticated (databases, cloud storage, distributed systems), but the core idea is exactly what you just built.
What's Next
In Chapter 11, we'll add robust error handling to every part of TaskFlow. Right now, if the user enters non-numeric input where a number is expected, the program crashes. Chapter 11 fixes that with try/except blocks — making TaskFlow bulletproof.
Chapter Summary
This chapter introduced the fundamental skill of file I/O — making programs that persist data beyond a single execution.
Key concepts:
- open() creates a file object; the mode ("r", "w", "a") determines what you can do with it.
- Context managers (with) guarantee files are closed, even when errors occur.
- Process large files line by line to keep memory usage constant.
- pathlib.Path provides cross-platform path handling — always prefer it over string concatenation.
- The csv module handles tabular data with reader/DictReader and writer/DictWriter.
- The json module handles nested/hierarchical data with dump/load (files) and dumps/loads (strings).
- Common errors (FileNotFoundError, PermissionError, encoding issues) are expected in production code and should be handled gracefully.
New terms introduced: file object, open(), context manager, with, read mode, write mode, append mode, pathlib, Path, CSV, JSON, json module, csv module, encoding, newline, readline(), readlines()
Looking ahead: Chapter 11 introduces error handling with try/except — the Python philosophy of EAFP (Easier to Ask Forgiveness than Permission). You'll learn to catch specific exceptions, write your own error messages, and make your programs resilient to bad input, missing files, and unexpected conditions. The file I/O error patterns from section 10.9 are just the beginning.