Chapter 16 Exercises: Excel and CSV Integration

Work through these exercises in order. Each tier builds on skills from the previous. All code should follow PEP 8. Use openpyxl for formatting, pandas for data manipulation.

Tier 1 — Foundational (Do These First)

Exercise 16-1: Your First openpyxl Workbook

Create an Excel workbook with a single sheet titled "Sales Data" using openpyxl. Write the following data starting at cell A1, with the first row as headers:

Product	Q1	Q2	Q3	Q4
Laptops	142000	158000	171000	195000
Monitors	67000	73000	81000	94000
Keyboards	23000	28000	31000	36000

Requirements: - Header row: bold font, dark blue background, white text - Data rows: normal font - Columns B through E: currency format ("$"#,##0) - Column A: minimum width 15

Save as product_sales.xlsx. Open the file in Excel to verify it looks correct.

Exercise 16-2: Reading Multiple Sheets

Create a Python script that reads the file you created in Exercise 16-1 using both methods: 1. pd.read_excel("product_sales.xlsx") — print the resulting DataFrame 2. openpyxl.load_workbook("product_sales.xlsx") — print each cell value individually using ws.iter_rows()

For each method, print the shape of the data (rows × columns). Write a comment explaining one thing you can do with openpyxl that you cannot do with pandas' read_excel.

Exercise 16-3: CSV Encoding Exercise

Download or create a CSV file with a line containing a non-ASCII character (e.g., a client name like "Müller GmbH" or "Café Renée"). Save the file in three different encodings using Python:

df.to_csv("data_utf8.csv", encoding="utf-8", index=False)
df.to_csv("data_utf8bom.csv", encoding="utf-8-sig", index=False)
df.to_csv("data_latin1.csv", encoding="latin-1", index=False)

Then attempt to read each file back: - pd.read_csv("data_utf8.csv") — does it work? - pd.read_csv("data_utf8bom.csv") — does the column name have a \ufeff prefix? - pd.read_csv("data_latin1.csv", encoding="utf-8") — does it raise a UnicodeDecodeError?

Write a brief explanation of what utf-8-sig is and when you would use it.

Exercise 16-4: Multi-Sheet Write with pandas ExcelWriter

Generate a DataFrame for each of the four Acme Corp regions (use numpy with a fixed seed to generate 50 rows per region with columns: order_id, revenue, gross_margin_pct). Write all four DataFrames to a single Excel workbook using pd.ExcelWriter, one sheet per region.

Add a fifth sheet called "Combined" that contains all rows from all four regions combined, with an additional column "region" indicating which region each row came from.

Save as acme_multi_sheet.xlsx.

Exercise 16-5: Format a pandas-Written File

Write summary_df.to_excel("summary.xlsx", index=False) for a small (5-row) summary DataFrame. Then load the resulting workbook with openpyxl and apply the following post-processing:

Make row 1 (the header) bold
Set the background color of row 1 to "1F4E79" (dark blue)
Set the font color of row 1 to white
Set all revenue columns to currency format
Set all column widths to 16

Save the modified workbook as summary_formatted.xlsx. This exercise practices the pandas-write-then-openpyxl-format workflow.

Tier 2 — Developing Skills

Exercise 16-6: Alternating Row Colors

Create a function apply_alternating_rows(ws, start_row: int, color_a: str, color_b: str) that applies alternating background colors to all rows from start_row to ws.max_row. Color A applies to odd-indexed rows, Color B to even-indexed.

Test it on a worksheet with 20 data rows using colors "EBF5FB" (light blue) and "FFFFFF" (white). The function must work regardless of how many columns are in the sheet — use ws.max_column to determine the range.

Exercise 16-7: Merged Title Cell and Subtitle

Write a function add_report_header(ws, title: str, subtitle: str, num_columns: int) that: 1. Merges A1 through the column specified by num_columns in row 1 2. Writes the title in bold, 14pt, dark navy blue font 3. Merges the same range in row 2 and writes the subtitle in 10pt, gray, italic font 4. Sets row 1 height to 28 and row 2 height to 16 5. Returns the row number of the first available data row (3)

Test the function on a worksheet with 6 columns.

Exercise 16-8: Freeze Panes and Auto-Filter

Build a worksheet with 100 rows of generated data (5 columns: Date, Region, Rep, Product, Revenue). Apply: 1. Freeze panes so row 1 stays visible while scrolling 2. Excel auto-filter on the header row using ws.auto_filter.ref = ws.dimensions 3. Dollar formatting on the Revenue column 4. Sort the data by Revenue descending before writing (use pandas)

The auto-filter allows users to filter the data directly in Excel without Python. Verify by opening the file — the dropdown arrows should appear in the header cells.

Exercise 16-9: openpyxl Chart

Generate a 6-row DataFrame with category and revenue columns (6 product categories, one revenue total per category). Write it to Excel and use openpyxl to add an embedded BarChart:

Chart title: "Revenue by Product Category"
Y-axis title: "Revenue ($)"
Number format on Y-axis: "$#,##0"
Chart placed at cell D2
Chart style 10
Chart width = 18cm, height = 12cm

Save as category_chart.xlsx. The chart should update automatically if you change the data in cells.

Exercise 16-10: Reading a Specific Range with openpyxl

Given a workbook where the data table starts at B3 (not A1) and there are decorative cells above and to the left:

Use load_workbook("complicated_layout.xlsx") to open it (create this file first)
Use ws["B3:F20"] to read only the data range
Convert the range to a pandas DataFrame manually using list comprehension:

data = [[cell.value for cell in row] for row in ws["B3:F20"]]
headers = data[0]
df = pd.DataFrame(data[1:], columns=headers)

Print the DataFrame's shape and first 5 rows

This pattern handles the common case where ERP exports have multi-row decorative headers that pd.read_excel's skiprows can't easily handle.

Tier 3 — Applied Business

Exercise 16-11: The Complete "Pandas to Formatted Excel" Pipeline

You have a CSV file of customer orders. Write a complete pipeline that:

Reads orders.csv (generate it: 200 rows, columns: order_id, customer, region, product, revenue, cost, order_date)
Adds computed columns: gross_profit = revenue - cost, margin_pct = gross_profit / revenue
Groups by region and product to create a pivot table of total revenue
Writes the pivot table to a formatted Excel workbook with: - Bold header row with dark blue background - Revenue values formatted as "$"#,##0 - Alternating row shading - Column widths set appropriately - A "Grand Total" row at the bottom showing column sums
Adds a second sheet with the raw order-level data (unformatted is acceptable for this sheet)

Save as orders_report.xlsx.

Exercise 16-12: Multi-Client Report Generator

Write a script that: 1. Takes a DataFrame with columns: client_name, project_name, hours, rate, amount 2. Groups by client_name 3. Generates one Excel workbook per client (not one sheet per client — one entire file per client) 4. Each workbook has a single sheet with: - Client name in a large merged title cell - Project line items in a formatted table - Subtotal at the bottom - Payment terms and due date (computed as 30 days from today) 5. Saves each file as {ClientName}_Statement.xlsx

Use at least 4 clients with 2-4 projects each. This exercise mirrors Maya's invoice generator but lets you build it from scratch.

Exercise 16-13: CSV Delimiter Detection

Write a function detect_and_read_csv(filepath: str) -> pd.DataFrame that: 1. Reads the first line of the file to detect the delimiter 2. Tests for comma, semicolon, tab, and pipe (|) delimiters 3. Counts the number of tokens in the first line for each delimiter 4. Chooses the delimiter that produces the most tokens (most likely the correct one) 5. Reads the full file with the detected delimiter 6. Prints which delimiter was detected

Test with at least three CSV files using different delimiters.

Exercise 16-14: Excel Workbook Health Check

Write a script that takes an Excel file path and produces a "health report":

Workbook: acme_report.xlsx
Sheets: 5 (Summary, Northeast, Southeast, Midwest, West)

Sheet: Summary
  Dimensions: A1:F25
  Non-empty cells: 142
  Merged cells: 3
  Has formulas: Yes

Sheet: Northeast
  Dimensions: A1:H185
  Non-empty cells: 1478
  Merged cells: 1
  Has formulas: No
  Potential issues:
    - Column D has 14 empty cells in the data range
    - Row 89 appears to be blank (possible data gap)

Use openpyxl to inspect: ws.merged_cells, ws.max_row, ws.max_column, and iterate cells to count non-empty values and find formula strings (they start with =).

Exercise 16-15: Combining and Splitting Workbooks

Part A — Combine: Write a script that takes a directory of Excel files (each with one sheet) and combines them into a single workbook, one sheet per source file. The sheet name should be derived from the source filename (without extension). Print a log of which files were combined.

Part B — Split: Write the reverse: take a multi-sheet workbook and save each sheet as a separate Excel file, named SheetName.xlsx, in an output directory.

Test Part A with at least 4 source files. Test Part B by splitting the result of Part A back apart and verifying the row counts match.

Tier 4 — Challenge

Exercise 16-16: Formatted Excel Report with Conditional Formatting

openpyxl supports Excel's conditional formatting rules. Apply the following to a revenue column:

from openpyxl.formatting.rule import ColorScaleRule

rule = ColorScaleRule(
    start_type="percentile", start_value=10, start_color="F8696B",  # Red
    mid_type="percentile", mid_value=50, mid_color="FFEB84",        # Yellow
    end_type="percentile", end_value=90, end_color="63BE7B",        # Green
)
ws.conditional_formatting.add("B2:B101", rule)

Apply this rule to a 100-row dataset and save the workbook. The cells should shade from red (lowest values) through yellow (median) to green (highest values) — Excel's standard heat scale. Open the file in Excel to verify the shading appears correctly.

Exercise 16-17: xlwings Live Excel Integration

(Requires Excel installed and xlwings: pip install xlwings)

Create a Python script that: 1. Opens an existing Excel file with a data table 2. Reads the contents of a specific named range (or cell range) 3. Performs a calculation in Python (e.g., adjusts all values by a percentage) 4. Writes the results back to Excel in a different column 5. Saves the file

This demonstrates the xlwings workflow for situations where Excel formulas are insufficient or where you want to use Python libraries (scipy, numpy) for calculations that Excel can't perform.

Exercise 16-18: The Automated Month-End Report Package

Build a complete "report package" generator that produces a zip archive containing: 1. An Excel workbook with 4 regional sheets and a summary sheet 2. Four CSV exports (one per region) of the raw data 3. A text file (run_log.txt) with: timestamp, row counts per region, total revenue, any warnings encountered during processing

Use Python's zipfile module to package everything:

import zipfile

with zipfile.ZipFile("monthly_package_202311.zip", "w") as zf:
    zf.write("acme_report.xlsx")
    for region in regions:
        zf.write(f"acme_{region}_data.csv")
    zf.write("run_log.txt")

The recipient should be able to extract the zip and find everything they need.

Tier 5 — Mastery

Exercise 16-19: A Reusable Excel Formatting Library

Write a Python module excel_styles.py that provides:

class ReportStyle:
    """
    Encapsulates a complete visual style for business Excel reports.
    Instantiate once; reuse across all sheets in a workbook.
    """
    def __init__(self, primary_color: str, accent_color: str, font_name: str):
        ...

    def apply_header(self, ws, row: int, columns: list[str]) -> None:
        """Writes and formats a header row."""
        ...

    def apply_data_row(self, ws, row: int, values: list, formats: dict) -> None:
        """Writes and formats a data row."""
        ...

    def apply_total_row(self, ws, row: int, values: list, formats: dict) -> None:
        """Writes and formats a total/summary row."""
        ...

    def apply_title_block(self, ws, title: str, subtitle: str) -> int:
        """Applies the title block and returns the first data row."""
        ...

    def autofit_columns(self, ws) -> None:
        """Auto-sizes all columns based on content."""
        ...

The class should accept a primary_color and accent_color so the style can be brand-customized. Demonstrate it with two different styles: "Acme Corp" (dark blue) and "Maya Reyes Consulting" (teal).

Exercise 16-20: End-to-End Invoice Automation with Maya's Data

Using the maya_invoice_generator.py script from the chapter as a starting point (or building your own from scratch), extend the system with the following enhancements:

Invoice number persistence: Read the last used invoice number from a invoice_counter.json file and increment it. If the file doesn't exist, start at 1000.
PDF export: After generating each Excel invoice, use the subprocess module to open the file in LibreOffice or Microsoft Excel headless and export it as PDF. (This is platform-specific — document the platform requirement clearly.)
Summary CSV: After generating all invoices, write a invoices_summary_YYYYMM.csv with columns: invoice_number, client_company, total_hours, subtotal, due_date. This is Maya's record of what she billed and when.
Duplicate detection: Before generating an invoice, check whether a file with the same name already exists in the output directory. If it does, prompt the user to confirm overwrite or skip.

Answer Guidance (Selected Exercises)

Exercise 16-3 (key points): - utf-8-sig writes a BOM (byte order mark, \ufeff) at the start of the file. Excel uses this to auto-detect UTF-8 encoding when opening CSV files directly. Without the BOM, Excel may interpret the file as the system default encoding, producing garbled characters. - When reading utf-8-sig files with encoding="utf-8" (no sig), the first column name gets a \ufeff prefix. Use encoding="utf-8-sig" to read and write consistently.

Exercise 16-10 (key points): - ws["B3:F20"] returns a tuple of row tuples. Each inner tuple is a sequence of Cell objects. Access .value on each cell. - The first element of data (index 0) is the header row. Subsequent elements are data rows. - This pattern is more reliable than skiprows when the layout is irregular.

Exercise 16-13 (key points): - Read just the first line: with open(filepath) as f: first_line = f.readline() - Count tokens: len(first_line.split(",")), len(first_line.split(";")), etc. - Edge case: a file with quoted values containing the delimiter will count incorrectly. The csv.Sniffer class handles this more robustly: csv.Sniffer().sniff(first_line).delimiter.

Exercise 16-16 (key points): - Conditional formatting in openpyxl applies Excel's native formatting rules — the logic runs in Excel, not in Python. Python only writes the rule specification. - The ColorScaleRule start_value, mid_value, end_value with type="percentile" sets the color scale relative to the data range, so it adapts automatically when values change.