Chapter 9 Exercises: File I/O — Reading and Writing Business Data
These exercises are organized into five tiers of increasing complexity. Complete each tier before advancing. All exercises use the characters, scenarios, and file formats introduced in Chapter 9.
Tier 1: Foundations (Exercises 1–4)
Core mechanics — file modes, context managers, reading and writing text
Exercise 1: Your First File, Written and Read Back
Scenario: You are writing a utility to generate a simple business memo as a text file.
Task:
Write a Python script that does the following in order:
- Creates a folder called `output/` if it does not already exist (use `pathlib`)
- Writes a text file called `output/business_memo.txt` with at least 8 lines of content. Include:
  - A "To:" line
  - A "From:" line
  - A "Date:" line (use today's date — you can hardcode it for this exercise)
  - A blank line
  - At least four lines of memo body text about a fictional business topic
- Reads the file back using `.read()` and prints the entire contents to the console
- Prints the file's size in bytes using `pathlib`'s `.stat()` method
Constraints:
- Use a `with` statement for both the write and the read
- Specify `encoding="utf-8"` explicitly
- Use `pathlib.Path` to construct the file path (no string concatenation)
Expected output: The memo text printed to the console, followed by a line like:
File size: 312 bytes
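One possible shape for this script (the memo text, names, and topic below are placeholder content, not the expected answer):

```python
from pathlib import Path

out_dir = Path("output")
out_dir.mkdir(exist_ok=True)            # create the folder only if missing
memo_path = out_dir / "business_memo.txt"

memo_lines = [
    "To: All Staff",
    "From: Operations",
    "Date: 2024-04-01",
    "",
    "Subject: Quarterly supply ordering",
    "Please submit supply requests by Friday.",
    "Orders placed after Friday ship next quarter.",
    "Contact operations with any questions.",
]

# Write with an explicit encoding, inside a context manager.
with open(memo_path, "w", encoding="utf-8") as f:
    f.write("\n".join(memo_lines) + "\n")

# Read the whole file back in one call.
with open(memo_path, "r", encoding="utf-8") as f:
    contents = f.read()

print(contents)
print(f"File size: {memo_path.stat().st_size} bytes")
```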
Exercise 2: The Line Counter
Scenario: Priya needs a quick utility to count lines, words, and characters in any text report file.
Task:
Write a function called count_file_contents(file_path) that:
- Takes a `pathlib.Path` as its argument
- Opens the file and reads it line by line using direct iteration (not `.readlines()`)
- Counts and returns a dictionary with three keys:
  - `"lines"` — total number of lines
  - `"words"` — total number of words (split on whitespace)
  - `"characters"` — total number of characters (including spaces and newlines)
- Handles a missing file gracefully: if the file does not exist, print an error message and return `{"lines": 0, "words": 0, "characters": 0}`
Then write a main block that:
- Creates a sample text file with at least 10 lines
- Calls your function on it
- Prints the result in a formatted way
Stretch goal: Add a "non_blank_lines" key that counts only lines with at least one non-whitespace character.
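One way to structure the function — the exact error-message wording is yours to choose:

```python
from pathlib import Path

def count_file_contents(file_path):
    """Count lines, words, and characters in a text file.

    Returns zeroed counts (after printing a warning) if the file is missing.
    """
    counts = {"lines": 0, "words": 0, "characters": 0}
    if not file_path.exists():
        print(f"Error: {file_path} does not exist")
        return counts
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:                      # direct iteration, not .readlines()
            counts["lines"] += 1
            counts["words"] += len(line.split())
            counts["characters"] += len(line)
    return counts
```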
Exercise 3: The Append Log
Scenario: Marcus Webb wants a simple log file that records each time a script runs.
Task:
Write a function called log_script_run(script_name, status, message) that:
- Appends one line to `logs/run_history.log`
- Creates the `logs/` directory if it does not exist
- Writes each line in this format: `2024-04-01T08:52:14 [SUCCESS] weekly_consolidation.py Processed 847 records`
- Uses `datetime.now().isoformat(timespec="seconds")` for the timestamp
Then simulate three separate script runs by calling log_script_run() three times with different arguments (including at least one "ERROR" status call). After the three calls, read the log file back and print its contents to verify all three entries are present.
Key concept to practice: Each call to log_script_run() should open the file, write one line, and close it — not hold the file open across all three calls.
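A starting-point sketch; `LOG_PATH` is assumed to be relative to the working directory:

```python
from datetime import datetime
from pathlib import Path

LOG_PATH = Path("logs") / "run_history.log"

def log_script_run(script_name, status, message):
    """Append one timestamped line, opening and closing the file on every call."""
    LOG_PATH.parent.mkdir(exist_ok=True)    # create logs/ if it does not exist
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(f"{stamp} [{status}] {script_name} {message}\n")
```

Because the file is opened in `"a"` mode inside the function, each call is an independent open-write-close cycle, which is exactly the pattern the exercise asks you to practice.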
Exercise 4: File Existence Checker
Scenario: Before Priya runs her weekly consolidation, she wants to verify all four regional report files are present.
Task:
Write a function called check_required_files(file_paths) that:
- Takes a list of `pathlib.Path` objects
- Checks whether each file exists using `.exists()`
- Returns a dictionary with two keys:
  - `"found"` — list of `Path` objects for files that exist
  - `"missing"` — list of `Path` objects for files that do not exist
- Prints a clear status line for each file: either `[FOUND] filename.csv` or `[MISSING] filename.csv`
Create a list of four fictitious regional CSV paths (you only need to actually create two of them on disk — leave the other two missing). Call your function and print the summary counts.
Stretch goal: Also check file size for files that exist, and flag any that are suspiciously small (under 100 bytes) as "[EMPTY?]".
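The core of the function might look like this (the final summary counts from the task are left to you):

```python
from pathlib import Path

def check_required_files(file_paths):
    """Report which of the given paths exist on disk."""
    result = {"found": [], "missing": []}
    for path in file_paths:
        if path.exists():
            print(f"[FOUND] {path.name}")
            result["found"].append(path)
        else:
            print(f"[MISSING] {path.name}")
            result["missing"].append(path)
    return result
```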
Tier 2: CSV Fundamentals (Exercises 5–8)
Reading and writing CSV files with the csv module
Exercise 5: Client Contact List
Scenario: Maya needs to maintain a CSV of her client contacts.
Task:
- Create a list of at least 6 client contact dictionaries, each with these keys: `client_name`, `contact_person`, `email`, `phone`, `city`, `industry`
- Write them to `data/clients.csv` using `csv.DictWriter`:
  - Specify `fieldnames` explicitly
  - Call `writer.writeheader()` before writing rows
- Read the file back using `csv.DictReader` and print each row in this format: `Hartwell & Sons | Jennifer Walsh | jennifer@hartwell.com | Boston`
- Count and print the number of clients in each industry
Key concept: Verify that the column order in the output CSV matches your fieldnames list, regardless of the order keys appear in your dictionaries.
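The write-then-read round trip can be sketched as follows; the two sample clients are placeholders (the task asks for at least six), and the industry count is left to you:

```python
import csv
from pathlib import Path

clients = [
    {"client_name": "Hartwell & Sons", "contact_person": "Jennifer Walsh",
     "email": "jennifer@hartwell.com", "phone": "555-0101",
     "city": "Boston", "industry": "Finance"},
    {"client_name": "Acme Corp", "contact_person": "Dana Ortiz",
     "email": "dana@acme.example", "phone": "555-0102",
     "city": "Chicago", "industry": "Manufacturing"},
]

FIELDNAMES = ["client_name", "contact_person", "email", "phone", "city", "industry"]
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
csv_path = data_dir / "clients.csv"

# newline="" prevents blank lines between rows on Windows.
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(clients)

with open(csv_path, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    print(f"{row['client_name']} | {row['contact_person']} | {row['email']} | {row['city']}")
```

Note that `DictWriter` emits columns in `fieldnames` order, which is how the column-order guarantee in the key concept above is achieved.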
Exercise 6: Type Conversion and Validation
Scenario: Priya receives a CSV of expense reports. Some rows have bad data that must be filtered out.
Task:
Write a function called load_and_validate_expenses(file_path) that:
- Reads a CSV with columns: `expense_id`, `employee_name`, `department`, `amount`, `category`, `date`
- For each row, attempts to convert `amount` to a `float`
- Skips any row where:
  - `amount` cannot be converted (print a warning with the row number)
  - `amount` is negative (print a warning)
  - `employee_name` is blank
- Returns the list of valid records with `amount` as a float
Create a sample CSV file with at least 10 rows, including 2–3 intentionally invalid rows. Call your function and print:
- How many rows were loaded
- How many were skipped and why
- The total amount of all valid expense records
Stretch goal: Write the valid records to a new `output/cleaned_expenses.csv` file.
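A skeleton for the validation loop — the warning wording and your sample data are your own choices:

```python
import csv
from pathlib import Path

def load_and_validate_expenses(file_path):
    """Return rows with a non-blank name and a valid, non-negative float amount."""
    valid = []
    with open(file_path, "r", encoding="utf-8", newline="") as f:
        # start=2 because row 1 of the file is the header
        for row_num, row in enumerate(csv.DictReader(f), start=2):
            if not row["employee_name"].strip():
                print(f"Row {row_num}: blank employee_name, skipping")
                continue
            try:
                amount = float(row["amount"])
            except ValueError:
                print(f"Row {row_num}: bad amount {row['amount']!r}, skipping")
                continue
            if amount < 0:
                print(f"Row {row_num}: negative amount, skipping")
                continue
            row["amount"] = amount
            valid.append(row)
    return valid
```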
Exercise 7: Filtered Report Writer
Scenario: Sandra wants a CSV containing only the sales reps who hit their quota this quarter.
Task:
Using the SAMPLE_ROWS data from csv_handler.py (or create your own similar dataset), write a complete script that:
- Reads the full sales data CSV into a list of dicts using `csv.DictReader`
- Converts `revenue` and `quota` to floats
- Calculates `quota_attainment_pct` for each rep as `(revenue / quota) * 100`
- Writes a new CSV called `output/quota_achievers.csv` containing only reps who achieved >= 100% of quota
- Sorts the output by `quota_attainment_pct` descending

The output CSV must include these columns: `rep_name`, `region`, `product_line`, `revenue`, `quota`, `quota_attainment_pct`.
Print to the console: how many reps made quota out of the total.
Exercise 8: Appending New Records
Scenario: At the end of each week, Maya adds new time entries to her project log.
Task:
Write a function called append_time_entry(csv_path, entry_dict) that:
- Reads the existing CSV to get the current fieldnames (from the header row)
- Opens the file in append mode (`"a"`) using `csv.DictWriter`
- Writes the new entry without writing the header again
- Validates that the entry contains all required fields before appending — if a required field is missing, raise a `ValueError` with a clear message
Create a sample CSV with 3 rows and valid headers. Then call append_time_entry() three times with different entries. Read the file back at the end and confirm it has 6 rows (3 original + 3 appended).
Important: Remember that appending CSV rows requires `newline=""` just like writing.
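One possible shape for the function, assuming the CSV always starts with a header row:

```python
import csv
from pathlib import Path

def append_time_entry(csv_path, entry_dict):
    """Append one row, reusing the existing header for field order."""
    # Read only the header to discover the required fieldnames.
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        fieldnames = csv.DictReader(f).fieldnames
    missing = [name for name in fieldnames if name not in entry_dict]
    if missing:
        raise ValueError(f"Entry is missing required fields: {missing}")
    # Append mode + newline="" so no header (and no blank lines) are added.
    with open(csv_path, "a", encoding="utf-8", newline="") as f:
        csv.DictWriter(f, fieldnames=fieldnames).writerow(entry_dict)
```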
Tier 3: JSON and Pathlib (Exercises 9–12)
Working with JSON configuration, pathlib operations, and directory processing
Exercise 9: Configuration File System
Scenario: Priya wants the consolidation script to read its settings from a JSON config file instead of hardcoding values.
Task:
- Create a JSON config file at `config/consolidation_settings.json` with at least these keys:

  ```json
  {
    "input_directory": "data/regional_reports",
    "output_directory": "output/consolidated",
    "required_columns": ["rep_id", "rep_name", "region", "revenue", "quota"],
    "over_quota_threshold_pct": 110,
    "report_title": "Q1 Consolidated Sales"
  }
  ```

- Write a function `load_config(config_path)` that reads this JSON and returns the parsed dict. If the file is missing, it should raise a `FileNotFoundError` with a helpful message.
- Write a function `save_config(config_path, config_dict)` that writes an updated config back to the file with `indent=2`
- Demonstrate: load the config, change `over_quota_threshold_pct` to 105, save it back, reload it, and confirm the change persisted
Stretch goal: Add a "last_run" key that gets updated with the current timestamp each time the config is loaded. This creates a simple "last accessed" audit trail.
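The two functions might be sketched like this (the exact error-message wording is an assumption):

```python
import json
from pathlib import Path

def load_config(config_path):
    """Read a JSON config file, failing loudly if it is missing."""
    if not config_path.exists():
        raise FileNotFoundError(
            f"Config file {config_path} not found -- create it before running."
        )
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_config(config_path, config_dict):
    """Write the config back out with readable indentation."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config_dict, f, indent=2)
```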
Exercise 10: Directory Scanner
Scenario: Marcus needs a script that scans a folder and produces a report of all files found, organized by extension.
Task:
Write a function called scan_directory(directory_path) that:
- Takes a `pathlib.Path` as input (raises `FileNotFoundError` if it does not exist)
- Uses `.iterdir()` to examine every item in the directory (non-recursive)
- Returns a dictionary where:
  - Keys are file extensions (e.g., `".csv"`, `".json"`, `".txt"`)
  - Values are lists of tuples: `(filename, size_in_bytes)`
  - Files with no extension go under the key `"(no extension)"`
  - Directories are counted separately under a `"(directories)"` key
- Prints a formatted summary grouped by extension
Create a test directory with at least 6 files of mixed types. Call your function and print the result.
Stretch goal: Accept an optional `pattern` argument (e.g., `"*.csv"`) and use `.glob()` instead of `.iterdir()` when a pattern is provided.
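A sketch of the scanning logic (printing the formatted summary is left to you). Here directories are recorded with a placeholder size of 0, which is one of several reasonable choices:

```python
from pathlib import Path

def scan_directory(directory_path):
    """Group a directory's entries by extension (non-recursive)."""
    if not directory_path.exists():
        raise FileNotFoundError(f"No such directory: {directory_path}")
    report = {}
    for item in directory_path.iterdir():
        if item.is_dir():
            # Size is not meaningful for directories; record 0 as a placeholder.
            report.setdefault("(directories)", []).append((item.name, 0))
            continue
        key = item.suffix if item.suffix else "(no extension)"
        report.setdefault(key, []).append((item.name, item.stat().st_size))
    return report
```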
Exercise 11: Bulk File Renamer
Scenario: Priya receives regional report files named inconsistently: North Q1.csv, south_report_q1.csv, EAST-Q1-SALES.csv. She wants to normalize them all to north_q1.csv style.
Task:
Write a function called normalize_filename(original_name) that:
- Converts the filename to lowercase
- Replaces all spaces and hyphens with underscores
- Removes any characters that are not letters, digits, underscores, or dots
Then write a function called batch_rename_files(directory_path, dry_run=True) that:
- Scans all CSV files in the directory
- Computes the normalized name for each
- If `dry_run=True`, prints what would be renamed without actually doing it
- If `dry_run=False`, performs the rename using `pathlib`'s `.rename()` method and logs each rename
Create 5 test files with messy names, run in dry-run mode first, then run for real, and confirm the results.
Key constraint: Before renaming, check whether the normalized name would collide with an existing file. If so, skip that file and print a warning.
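The two functions could be sketched like this; the allowed-character filter is written without regular expressions, on the assumption that only material from earlier chapters is available:

```python
from pathlib import Path

def normalize_filename(original_name):
    """Lowercase, underscore-separate, and strip disallowed characters."""
    name = original_name.lower().replace(" ", "_").replace("-", "_")
    allowed = "abcdefghijklmnopqrstuvwxyz0123456789_."
    return "".join(ch for ch in name if ch in allowed)

def batch_rename_files(directory_path, dry_run=True):
    """Rename every CSV to its normalized name, skipping collisions."""
    for path in sorted(directory_path.glob("*.csv")):
        new_name = normalize_filename(path.name)
        if new_name == path.name:
            continue                          # already normalized
        target = path.with_name(new_name)
        if target.exists():
            print(f"Warning: {new_name} already exists, skipping {path.name}")
            continue
        if dry_run:
            print(f"Would rename {path.name} -> {new_name}")
        else:
            path.rename(target)
            print(f"Renamed {path.name} -> {new_name}")
```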
Exercise 12: JSON Earnings Loader
Scenario: Maya wants to compare her earnings across multiple weeks by reading a folder of JSON summary files and producing a trend report.
Task:
- Create at least 3 JSON earnings summary files in `data/earnings_history/`, named `earnings_2024_week_10.json`, `earnings_2024_week_11.json`, and `earnings_2024_week_12.json`. Each should have at least: `week`, `total_hours`, `total_earnings`, `active_projects_count`.
- Write a function `load_earnings_history(directory_path)` that:
  - Uses `.glob("earnings_*.json")` to find all matching files
  - Loads each JSON file
  - Returns a list of dicts sorted by `week`
- Write a function `print_earnings_trend(history)` that prints a simple week-by-week trend table showing earnings and hours, plus the change from the previous week
Stretch goal: Calculate and print the average earnings per week and flag any week that is more than 20% above or below the average.
Tier 4: Integration Challenges (Exercises 13–15)
Combining file I/O with business logic in complete mini-programs
Exercise 13: Priya's Region Consolidator (Reduced Version)
Scenario: Replicate the core of the Case Study 1 consolidation, building it yourself from scratch.
Task:
Build a complete script (without looking at the case study code) that:
- Creates four sample regional CSV files in `data/test_regions/`, each with 5 rows and columns: `region`, `rep_name`, `revenue`, `quota`
- Reads all four files using `pathlib`'s `Path.glob("*.csv")`
- Validates that each file has the required columns — skip and log any that fail validation
- Combines all valid records into one list
- Writes the combined list to `output/combined_regions.csv` using `csv.DictWriter`
- Writes a JSON metadata file with: `total_files`, `total_records`, `files_processed` (a list), `generated_at`
Quality criteria:
- Uses context managers for all file operations
- Handles a missing directory or empty directory gracefully
- The combined CSV has rows sorted by `region` then `rep_name`
- All numeric fields in the output CSV are formatted consistently
Exercise 14: Maya's Weekly Time Report
Scenario: Maya wants to generate a weekly earnings report from her project log.
Task:
Using the project log CSV format from maya_project_log.py, write a complete script that:
- Loads the project log (create a sample file with at least 8 projects, mix of statuses)
- Calculates for each project: `actual_earnings`, `projected_earnings`, `hours_over_under`
- Groups projects by status and calculates subtotals for each group
- Writes a formatted weekly report to `output/weekly_report_YYYY_MM_DD.txt` (use today's date in the filename) — this should be a human-readable text file, not CSV
- The text report should include: a header with the date, a table of active projects, subtotals per status group, and a total earnings line
- Also writes the raw data to `output/weekly_report_data.csv` for use in other tools
The text report should look something like:
```
MAYA REYES — WEEKLY EARNINGS REPORT
Generated: 2024-04-01
=====================================

ACTIVE PROJECTS (7)
--------------------
Hartwell & Sons / Financial Dashboard    38.5h / 40.0h est.    $6,737.50
...

SUBTOTALS
  Active    : $28,525.00  (7 projects)
  Completed : $19,162.50  (2 projects)
...

TOTAL EARNINGS TO DATE: $51,362.50
```
Exercise 15: The Master Batch Processor
Scenario: Acme Corp receives CSV files from multiple vendors throughout the week. Each file has different columns, but all share vendor_id, invoice_number, amount, and date. Marcus needs a script that processes them all and writes one unified invoice register.
Task:
- Create 4 vendor CSV files in `data/vendor_invoices/`. Each should have the required columns plus 2–3 vendor-specific extra columns.
- Write a `process_vendor_file(file_path)` function that:
  - Reads the file
  - Extracts only the four required columns
  - Converts `amount` to float and `date` to `YYYY-MM-DD` format (assume dates may come in as `MM/DD/YYYY`)
  - Returns valid records and a list of any skipped rows with reasons
- Write a main function that:
  - Processes all files using `.rglob("*.csv")`
  - Collects all valid records across all files
  - Writes the unified register to `output/invoice_register.csv`, sorted by `date` then `vendor_id`
  - Writes a processing log to `logs/vendor_processing.log` using append mode, with one entry per file processed
  - Writes a summary JSON at `output/invoice_summary.json` with total count and total amount
Quality criteria:
- Uses `extrasaction="ignore"` on all `DictWriter` instances
- The date normalization handles both `MM/DD/YYYY` and `YYYY-MM-DD` gracefully
- Any file that fails completely (unreadable, missing required columns) is logged and skipped, but does not stop the rest of processing
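The date normalization is the subtle part of this exercise. One way to sketch it with `datetime.strptime` (the helper name `normalize_date` is an assumption, not part of the task's required API):

```python
from datetime import datetime

def normalize_date(raw):
    """Return YYYY-MM-DD, accepting either MM/DD/YYYY or YYYY-MM-DD input.

    Raises ValueError for anything else, so callers can skip the row.
    """
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue                      # try the next accepted format
    raise ValueError(f"Unrecognized date format: {raw!r}")
```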
Tier 5: Stretch Challenges (Exercises 16–17)
Open-ended problems for learners ready to go beyond the chapter
Exercise 16: File Watcher Simulation
Scenario: Design a "polling" file processor that simulates what a real-time file watcher would do.
Task:
Create a script that:
- Maintains a JSON "state file" at `state/processed_files.json` tracking which files have already been processed (stored as a dict mapping filename to timestamp)
- On each run, scans a `data/incoming/` directory for new CSV files
data/incoming/directory for new CSV files - Processes only files that are NOT in the state file (new files since the last run)
- After processing each new file (just read it and count its rows), adds it to the state file with the current timestamp
- Writes a summary of new vs. already-seen files to the console
Simulate three "runs" by manually dropping new files into data/incoming/ between runs. Confirm that the second run only processes files added since the first run.
This exercise teaches: idempotent processing, state management with JSON, and the foundation of real ETL (Extract, Transform, Load) pipelines.
Exercise 17: CSV Diff Tool
Scenario: Priya wants to know what changed between this week's consolidated report and last week's.
Task:
Write a csv_diff(old_path, new_path, key_column) function that:
- Reads both CSV files into lists of dicts
- Uses the `key_column` value (e.g., `"rep_id"`) to match rows between the two files
- Identifies and returns:
  - `"added"`: rows in `new_path` but not in `old_path` (by key)
  - `"removed"`: rows in `old_path` but not in `new_path` (by key)
  - `"changed"`: rows present in both but with different values in at least one column (report which columns changed)
  - `"unchanged"`: rows identical in both
- Writes a diff report to `output/csv_diff_report.txt` in a human-readable format
Create two versions of a sales CSV (old and new) where some reps' revenue changed, one rep was added, and one was removed. Call your function and verify the output correctly identifies all four categories.
This is a genuine professional tool — variations of this are used in data pipeline validation everywhere.
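The matching logic is easiest if you load each file into a dict keyed on `key_column`. A sketch of that core (writing the text report is omitted, and the shape of the `"changed"` entries is one design choice among several):

```python
import csv

def csv_diff(old_path, new_path, key_column):
    """Compare two CSVs keyed on key_column; return added/removed/changed/unchanged."""
    def load(path):
        with open(path, "r", encoding="utf-8", newline="") as f:
            return {row[key_column]: row for row in csv.DictReader(f)}

    old_rows, new_rows = load(old_path), load(new_path)
    diff = {"added": [], "removed": [], "changed": [], "unchanged": []}
    for key, new_row in new_rows.items():
        if key not in old_rows:
            diff["added"].append(new_row)
        elif old_rows[key] != new_row:
            changed_cols = [c for c in new_row if old_rows[key].get(c) != new_row[c]]
            diff["changed"].append({"key": key, "columns": changed_cols})
        else:
            diff["unchanged"].append(new_row)
    for key, old_row in old_rows.items():
        if key not in new_rows:
            diff["removed"].append(old_row)
    return diff
```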
Answer Notes
Exercise answers and worked solutions are provided in the instructor supplement. For self-study learners: the case studies and code files in this chapter contain all the patterns needed to complete these exercises. If you find yourself looking up a specific method, that is a sign you are engaging correctly with the material — not a sign you are doing it wrong.
Recommended time estimates per tier:
- Tier 1: 15–20 minutes per exercise
- Tier 2: 25–35 minutes per exercise
- Tier 3: 35–45 minutes per exercise
- Tier 4: 60–90 minutes per exercise
- Tier 5: 90–120 minutes per exercise