Chapter 16 Exercises: Excel and CSV Integration
Work through these exercises in order. Each tier builds on skills from the previous. All code should follow PEP 8. Use openpyxl for formatting, pandas for data manipulation.
Tier 1 — Foundational (Do These First)
Exercise 16-1: Your First openpyxl Workbook
Create an Excel workbook with a single sheet titled "Sales Data" using openpyxl. Write the following data starting at cell A1, with the first row as headers:
| Product | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Laptops | 142000 | 158000 | 171000 | 195000 |
| Monitors | 67000 | 73000 | 81000 | 94000 |
| Keyboards | 23000 | 28000 | 31000 | 36000 |
Requirements:
- Header row: bold font, dark blue background, white text
- Data rows: normal font
- Columns B through E: currency format ("$"#,##0)
- Column A: minimum width 15
Save as product_sales.xlsx. Open the file in Excel to verify it looks correct.
Exercise 16-2: Reading Multiple Sheets
Create a Python script that reads the file you created in Exercise 16-1 using both methods:
1. pd.read_excel("product_sales.xlsx") — print the resulting DataFrame
2. openpyxl.load_workbook("product_sales.xlsx") — print each cell value individually using ws.iter_rows()
For each method, print the shape of the data (rows × columns). Write a comment explaining one thing you can do with openpyxl that you cannot do with pandas' read_excel.
Exercise 16-3: CSV Encoding Exercise
Download or create a CSV file with a line containing a non-ASCII character (e.g., a client name like "Müller GmbH" or "Café Renée"). Save the file in three different encodings using Python:
df.to_csv("data_utf8.csv", encoding="utf-8", index=False)
df.to_csv("data_utf8bom.csv", encoding="utf-8-sig", index=False)
df.to_csv("data_latin1.csv", encoding="latin-1", index=False)
Then attempt to read each file back:
- pd.read_csv("data_utf8.csv") — does it work?
- pd.read_csv("data_utf8bom.csv") — does the column name have a \ufeff prefix?
- pd.read_csv("data_latin1.csv", encoding="utf-8") — does it raise a UnicodeDecodeError?
Write a brief explanation of what utf-8-sig is and when you would use it.
Exercise 16-4: Multi-Sheet Write with pandas ExcelWriter
Generate a DataFrame for each of the four Acme Corp regions (use numpy with a fixed seed to generate 50 rows per region with columns: order_id, revenue, gross_margin_pct). Write all four DataFrames to a single Excel workbook using pd.ExcelWriter, one sheet per region.
Add a fifth sheet called "Combined" that contains all rows from all four regions combined, with an additional column "region" indicating which region each row came from.
Save as acme_multi_sheet.xlsx.
Exercise 16-5: Format a pandas-Written File
Write summary_df.to_excel("summary.xlsx", index=False) for a small (5-row) summary DataFrame. Then load the resulting workbook with openpyxl and apply the following post-processing:
- Make row 1 (the header) bold
- Set the background color of row 1 to
"1F4E79"(dark blue) - Set the font color of row 1 to white
- Set all revenue columns to currency format
- Set all column widths to 16
Save the modified workbook as summary_formatted.xlsx. This exercise practices the pandas-write-then-openpyxl-format workflow.
Tier 2 — Developing Skills
Exercise 16-6: Alternating Row Colors
Create a function apply_alternating_rows(ws, start_row: int, color_a: str, color_b: str) that applies alternating background colors to all rows from start_row to ws.max_row. Color A applies to odd-indexed rows, Color B to even-indexed.
Test it on a worksheet with 20 data rows using colors "EBF5FB" (light blue) and "FFFFFF" (white). The function must work regardless of how many columns are in the sheet — use ws.max_column to determine the range.
Exercise 16-7: Merged Title Cell and Subtitle
Write a function add_report_header(ws, title: str, subtitle: str, num_columns: int) that:
1. Merges A1 through the column specified by num_columns in row 1
2. Writes the title in bold, 14pt, dark navy blue font
3. Merges the same range in row 2 and writes the subtitle in 10pt, gray, italic font
4. Sets row 1 height to 28 and row 2 height to 16
5. Returns the row number of the first available data row (3)
Test the function on a worksheet with 6 columns.
Exercise 16-8: Freeze Panes and Auto-Filter
Build a worksheet with 100 rows of generated data (5 columns: Date, Region, Rep, Product, Revenue). Apply:
1. Freeze panes so row 1 stays visible while scrolling
2. Excel auto-filter on the header row using ws.auto_filter.ref = ws.dimensions
3. Dollar formatting on the Revenue column
4. Sort the data by Revenue descending before writing (use pandas)
The auto-filter allows users to filter the data directly in Excel without Python. Verify by opening the file — the dropdown arrows should appear in the header cells.
Exercise 16-9: openpyxl Chart
Generate a 6-row DataFrame with category and revenue columns (6 product categories, one revenue total per category). Write it to Excel and use openpyxl to add an embedded BarChart:
- Chart title: "Revenue by Product Category"
- Y-axis title: "Revenue ($)"
- Number format on Y-axis:
"$#,##0" - Chart placed at cell
D2 - Chart style 10
- Chart width = 18cm, height = 12cm
Save as category_chart.xlsx. The chart should update automatically if you change the data in cells.
Exercise 16-10: Reading a Specific Range with openpyxl
Given a workbook where the data table starts at B3 (not A1) and there are decorative cells above and to the left:
- Use
load_workbook("complicated_layout.xlsx")to open it (create this file first) - Use
ws["B3:F20"]to read only the data range - Convert the range to a pandas DataFrame manually using list comprehension:
data = [[cell.value for cell in row] for row in ws["B3:F20"]]
headers = data[0]
df = pd.DataFrame(data[1:], columns=headers)
- Print the DataFrame's shape and first 5 rows
This pattern handles the common case where ERP exports have multi-row decorative headers that pd.read_excel's skiprows can't easily handle.
Tier 3 — Applied Business
Exercise 16-11: The Complete "Pandas to Formatted Excel" Pipeline
You have a CSV file of customer orders. Write a complete pipeline that:
- Reads
orders.csv(generate it: 200 rows, columns:order_id,customer,region,product,revenue,cost,order_date) - Adds computed columns:
gross_profit = revenue - cost,margin_pct = gross_profit / revenue - Groups by
regionandproductto create a pivot table of total revenue - Writes the pivot table to a formatted Excel workbook with:
- Bold header row with dark blue background
- Revenue values formatted as
"$"#,##0- Alternating row shading - Column widths set appropriately - A "Grand Total" row at the bottom showing column sums - Adds a second sheet with the raw order-level data (unformatted is acceptable for this sheet)
Save as orders_report.xlsx.
Exercise 16-12: Multi-Client Report Generator
Write a script that:
1. Takes a DataFrame with columns: client_name, project_name, hours, rate, amount
2. Groups by client_name
3. Generates one Excel workbook per client (not one sheet per client — one entire file per client)
4. Each workbook has a single sheet with:
- Client name in a large merged title cell
- Project line items in a formatted table
- Subtotal at the bottom
- Payment terms and due date (computed as 30 days from today)
5. Saves each file as {ClientName}_Statement.xlsx
Use at least 4 clients with 2-4 projects each. This exercise mirrors Maya's invoice generator but lets you build it from scratch.
Exercise 16-13: CSV Delimiter Detection
Write a function detect_and_read_csv(filepath: str) -> pd.DataFrame that:
1. Reads the first line of the file to detect the delimiter
2. Tests for comma, semicolon, tab, and pipe (|) delimiters
3. Counts the number of tokens in the first line for each delimiter
4. Chooses the delimiter that produces the most tokens (most likely the correct one)
5. Reads the full file with the detected delimiter
6. Prints which delimiter was detected
Test with at least three CSV files using different delimiters.
Exercise 16-14: Excel Workbook Health Check
Write a script that takes an Excel file path and produces a "health report":
Workbook: acme_report.xlsx
Sheets: 5 (Summary, Northeast, Southeast, Midwest, West)
Sheet: Summary
Dimensions: A1:F25
Non-empty cells: 142
Merged cells: 3
Has formulas: Yes
Sheet: Northeast
Dimensions: A1:H185
Non-empty cells: 1478
Merged cells: 1
Has formulas: No
Potential issues:
- Column D has 14 empty cells in the data range
- Row 89 appears to be blank (possible data gap)
Use openpyxl to inspect: ws.merged_cells, ws.max_row, ws.max_column, and iterate cells to count non-empty values and find formula strings (they start with =).
Exercise 16-15: Combining and Splitting Workbooks
Part A — Combine: Write a script that takes a directory of Excel files (each with one sheet) and combines them into a single workbook, one sheet per source file. The sheet name should be derived from the source filename (without extension). Print a log of which files were combined.
Part B — Split: Write the reverse: take a multi-sheet workbook and save each sheet as a separate Excel file, named SheetName.xlsx, in an output directory.
Test Part A with at least 4 source files. Test Part B by splitting the result of Part A back apart and verifying the row counts match.
Tier 4 — Challenge
Exercise 16-16: Formatted Excel Report with Conditional Formatting
openpyxl supports Excel's conditional formatting rules. Apply the following to a revenue column:
from openpyxl.formatting.rule import ColorScaleRule
rule = ColorScaleRule(
start_type="percentile", start_value=10, start_color="F8696B", # Red
mid_type="percentile", mid_value=50, mid_color="FFEB84", # Yellow
end_type="percentile", end_value=90, end_color="63BE7B", # Green
)
ws.conditional_formatting.add("B2:B101", rule)
Apply this rule to a 100-row dataset and save the workbook. The cells should shade from red (lowest values) through yellow (median) to green (highest values) — Excel's standard heat scale. Open the file in Excel to verify the shading appears correctly.
Exercise 16-17: xlwings Live Excel Integration
(Requires Excel installed and xlwings: pip install xlwings)
Create a Python script that: 1. Opens an existing Excel file with a data table 2. Reads the contents of a specific named range (or cell range) 3. Performs a calculation in Python (e.g., adjusts all values by a percentage) 4. Writes the results back to Excel in a different column 5. Saves the file
This demonstrates the xlwings workflow for situations where Excel formulas are insufficient or where you want to use Python libraries (scipy, numpy) for calculations that Excel can't perform.
Exercise 16-18: The Automated Month-End Report Package
Build a complete "report package" generator that produces a zip archive containing:
1. An Excel workbook with 4 regional sheets and a summary sheet
2. Four CSV exports (one per region) of the raw data
3. A text file (run_log.txt) with: timestamp, row counts per region, total revenue, any warnings encountered during processing
Use Python's zipfile module to package everything:
import zipfile
with zipfile.ZipFile("monthly_package_202311.zip", "w") as zf:
zf.write("acme_report.xlsx")
for region in regions:
zf.write(f"acme_{region}_data.csv")
zf.write("run_log.txt")
The recipient should be able to extract the zip and find everything they need.
Tier 5 — Mastery
Exercise 16-19: A Reusable Excel Formatting Library
Write a Python module excel_styles.py that provides:
class ReportStyle:
"""
Encapsulates a complete visual style for business Excel reports.
Instantiate once; reuse across all sheets in a workbook.
"""
def __init__(self, primary_color: str, accent_color: str, font_name: str):
...
def apply_header(self, ws, row: int, columns: list[str]) -> None:
"""Writes and formats a header row."""
...
def apply_data_row(self, ws, row: int, values: list, formats: dict) -> None:
"""Writes and formats a data row."""
...
def apply_total_row(self, ws, row: int, values: list, formats: dict) -> None:
"""Writes and formats a total/summary row."""
...
def apply_title_block(self, ws, title: str, subtitle: str) -> int:
"""Applies the title block and returns the first data row."""
...
def autofit_columns(self, ws) -> None:
"""Auto-sizes all columns based on content."""
...
The class should accept a primary_color and accent_color so the style can be brand-customized. Demonstrate it with two different styles: "Acme Corp" (dark blue) and "Maya Reyes Consulting" (teal).
Exercise 16-20: End-to-End Invoice Automation with Maya's Data
Using the maya_invoice_generator.py script from the chapter as a starting point (or building your own from scratch), extend the system with the following enhancements:
-
Invoice number persistence: Read the last used invoice number from a
invoice_counter.jsonfile and increment it. If the file doesn't exist, start at 1000. -
PDF export: After generating each Excel invoice, use the
subprocessmodule to open the file in LibreOffice or Microsoft Excel headless and export it as PDF. (This is platform-specific — document the platform requirement clearly.) -
Summary CSV: After generating all invoices, write a
invoices_summary_YYYYMM.csvwith columns: invoice_number, client_company, total_hours, subtotal, due_date. This is Maya's record of what she billed and when. -
Duplicate detection: Before generating an invoice, check whether a file with the same name already exists in the output directory. If it does, prompt the user to confirm overwrite or skip.
Answer Guidance (Selected Exercises)
Exercise 16-3 (key points):
- utf-8-sig writes a BOM (byte order mark, \ufeff) at the start of the file. Excel uses this to auto-detect UTF-8 encoding when opening CSV files directly. Without the BOM, Excel may interpret the file as the system default encoding, producing garbled characters.
- When reading utf-8-sig files with encoding="utf-8" (no sig), the first column name gets a \ufeff prefix. Use encoding="utf-8-sig" to read and write consistently.
Exercise 16-10 (key points):
- ws["B3:F20"] returns a tuple of row tuples. Each inner tuple is a sequence of Cell objects. Access .value on each cell.
- The first element of data (index 0) is the header row. Subsequent elements are data rows.
- This pattern is more reliable than skiprows when the layout is irregular.
Exercise 16-13 (key points):
- Read just the first line: with open(filepath) as f: first_line = f.readline()
- Count tokens: len(first_line.split(",")), len(first_line.split(";")), etc.
- Edge case: a file with quoted values containing the delimiter will count incorrectly. The csv.Sniffer class handles this more robustly: csv.Sniffer().sniff(first_line).delimiter.
Exercise 16-16 (key points):
- Conditional formatting in openpyxl applies Excel's native formatting rules — the logic runs in Excel, not in Python. Python only writes the rule specification.
- The ColorScaleRule start_value, mid_value, end_value with type="percentile" sets the color scale relative to the data range, so it adapts automatically when values change.