Case Study 16-1: Priya Generates the Weekly Acme Report as a Formatted Excel Workbook

Background

Every Monday morning, Marcus Webb runs through the same ritual. He logs into the ERP system, exports four CSV files — one per region — waits for each download to complete, opens Excel, creates a new workbook, pastes each file into a separate sheet, manually renames the sheets, formats the headers, adjusts column widths, sets the number formats on the revenue columns, and updates the summary sheet by typing new numbers into the formulas by hand.

This takes one hour and forty minutes. He has been doing it every Monday for four years.

Last Monday he broke a formula in the Midwest sheet by accidentally pasting over a cell. He didn't notice until Thursday when Sandra asked why Midwest numbers were 17% below the Q3 figures from the previous week. He spent thirty minutes tracking down the error.

Priya had watched this process from her desk. After finishing Chapter 16's openpyxl section, she told Marcus she thought she could automate it. He was politely skeptical.

"Can it put the headers in blue?" he asked.

"Yes."

"Can it set the revenue columns to dollar format?"

"Yes."

"Can it add the summary sheet that adds everything up?"

"Yes."

"And can it do this without me touching anything?"

"You run one Python script. That's it."

Marcus leaned back in his chair. "Show me."

Step 1: Understanding the Data

Priya started by talking to Marcus about the data before writing a line of code. She asked to see a sample of the four CSV files he exported each week.

The CSVs came from the ERP with consistent column names:

order_id, order_date, region, rep_name, product_category,
revenue, order_quantity, unit_cost, gross_margin

The files for the previous week were named acme_northeast_w47.csv, acme_southeast_w47.csv, acme_midwest_w47.csv, and acme_west_w47.csv. Marcus admitted the week number suffix was manual — he typed it each time.

Priya's first improvement: instead of four separate files, she would write the script to read from a single combined weekly export. She'd work with Marcus later on getting IT to generate one combined file. For the prototype, she'd read all four and combine them.

Step 2: The Data Layer

import pandas as pd
from pathlib import Path
from datetime import datetime

def load_weekly_data(data_dir: str, week_suffix: str) -> pd.DataFrame:
    """
    Loads and combines four regional CSV files into one DataFrame.

    Args:
        data_dir: Directory containing the regional CSV files
        week_suffix: Week identifier (e.g., 'w47')

    Returns:
        Combined DataFrame with a 'region' column added.
    """
    regions = ["northeast", "southeast", "midwest", "west"]
    frames = []

    for region in regions:
        filepath = Path(data_dir) / f"acme_{region}_{week_suffix}.csv"
        if not filepath.exists():
            print(f"Warning: {filepath} not found — skipping")
            continue

        region_df = pd.read_csv(filepath, parse_dates=["order_date"])
        region_df["region"] = region.title()
        frames.append(region_df)

    if not frames:
        raise FileNotFoundError(f"No data files found for week {week_suffix}")

    combined = pd.concat(frames, ignore_index=True)
    print(f"Loaded {len(combined):,} rows from {len(frames)} regional files.")
    return combined

She also wrote a function to aggregate the data by region — the summary numbers Marcus calculated manually:

def build_regional_summary(weekly_df: pd.DataFrame) -> pd.DataFrame:
    """
    Aggregates weekly order data to regional summary metrics.

    Returns DataFrame with: region, total_revenue, total_orders,
    avg_order_value, total_units, avg_gross_margin.
    """
    summary = (
        weekly_df
        .groupby("region")
        .agg(
            total_revenue=("revenue", "sum"),
            total_orders=("order_id", "count"),
            avg_order_value=("revenue", "mean"),
            total_units=("order_quantity", "sum"),
            avg_gross_margin=("gross_margin", "mean"),
        )
        .reset_index()
        .sort_values("total_revenue", ascending=False)
    )
    return summary

She added a detail aggregation by product category within each region — Marcus had always included this on each regional sheet, though he populated it manually from a separate pivot:

def build_category_breakdown(regional_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregates revenue and order count by product category."""
    return (
        regional_df
        .groupby("product_category")
        .agg(
            revenue=("revenue", "sum"),
            orders=("order_id", "count"),
            avg_margin=("gross_margin", "mean"),
        )
        .reset_index()
        .sort_values("revenue", ascending=False)
    )

Step 3: Building the Excel Workbook

Priya decided on the sheet structure:

Summary — regional totals, side-by-side comparison
Northeast — order detail + category breakdown
Southeast — same
Midwest — same
West — same

She started with the style constants, defined once at the top of the file:

from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter

# Colors — Acme brand palette
COLOR_NAVY = "1F4E79"
COLOR_LIGHT_BLUE = "D6E4F0"
COLOR_ALT_ROW = "EBF5FB"
COLOR_WHITE = "FFFFFF"
COLOR_GRAY = "BFBFBF"

# Style objects
HEADER_FONT = Font(name="Calibri", bold=True, size=11, color=COLOR_WHITE)
HEADER_FILL = PatternFill(fill_type="solid", fgColor=COLOR_NAVY)
HEADER_BORDER = Border(
    bottom=Side(style="medium", color=COLOR_NAVY)
)
DATA_FONT = Font(name="Calibri", size=10)
TOTAL_FONT = Font(name="Calibri", bold=True, size=10)
TOTAL_FILL = PatternFill(fill_type="solid", fgColor=COLOR_LIGHT_BLUE)
ALT_FILL = PatternFill(fill_type="solid", fgColor=COLOR_ALT_ROW)

ALIGN_CENTER = Alignment(horizontal="center", vertical="center")
ALIGN_RIGHT = Alignment(horizontal="right", vertical="center")
ALIGN_LEFT = Alignment(horizontal="left", vertical="center")

The regional sheet builder:

from openpyxl import Workbook

def write_region_sheet(wb: Workbook, region_name: str, region_df: pd.DataFrame) -> None:
    """
    Creates a worksheet for a single region with order detail
    and a category breakdown table below.
    """
    ws = wb.create_sheet(title=region_name)

    # ---- Title block ----
    ws.merge_cells("A1:H1")
    ws["A1"] = f"Acme Corp — {region_name} Region — Weekly Report"
    ws["A1"].font = Font(name="Calibri", bold=True, size=14, color=COLOR_NAVY)
    ws["A1"].alignment = ALIGN_LEFT
    ws.row_dimensions[1].height = 28

    ws.merge_cells("A2:H2")
    report_date = datetime.now().strftime("Generated %A, %B %d, %Y at %I:%M %p")
    ws["A2"] = report_date
    ws["A2"].font = Font(name="Calibri", size=9, italic=True, color="888888")
    ws.row_dimensions[2].height = 14

    # ---- Order detail section ----
    order_headers = [
        "Order ID", "Order Date", "Rep Name", "Product Category",
        "Revenue", "Qty", "Unit Cost", "Gross Margin %",
    ]

    for col_idx, header in enumerate(order_headers, start=1):
        cell = ws.cell(row=4, column=col_idx)
        cell.value = header
        cell.font = HEADER_FONT
        cell.fill = HEADER_FILL
        cell.alignment = ALIGN_CENTER
    ws.row_dimensions[4].height = 18

    # Freeze rows 1-4
    ws.freeze_panes = "A5"

    # Order detail rows
    order_cols = [
        "order_id", "order_date", "rep_name", "product_category",
        "revenue", "order_quantity", "unit_cost", "gross_margin",
    ]
    number_fmts = {
        5: '"$"#,##0.00',    # Revenue
        7: '"$"#,##0.00',    # Unit cost
        8: '0.0%',           # Gross margin
    }

    for row_idx, (_, order_row) in enumerate(region_df.iterrows(), start=5):
        alternate = (row_idx % 2 == 0)
        for col_idx, col_name in enumerate(order_cols, start=1):
            cell = ws.cell(row=row_idx, column=col_idx)
            cell.value = order_row[col_name]
            cell.font = DATA_FONT
            cell.alignment = (
                ALIGN_RIGHT if col_name in ("revenue", "order_quantity",
                                             "unit_cost", "gross_margin")
                else ALIGN_LEFT
            )
            if alternate:
                cell.fill = ALT_FILL
            if col_idx in number_fmts:
                cell.number_format = number_fmts[col_idx]

    # Totals row
    total_row = 5 + len(region_df)
    ws.cell(row=total_row, column=1).value = "TOTAL"
    ws.cell(row=total_row, column=5).value = region_df["revenue"].sum()
    ws.cell(row=total_row, column=6).value = region_df["order_quantity"].sum()
    ws.cell(row=total_row, column=8).value = region_df["gross_margin"].mean()

    for col_idx in range(1, 9):
        cell = ws.cell(row=total_row, column=col_idx)
        cell.font = TOTAL_FONT
        cell.fill = TOTAL_FILL

    ws.cell(row=total_row, column=5).number_format = '"$"#,##0.00'
    ws.cell(row=total_row, column=8).number_format = '0.0%'

    # ---- Category breakdown section ----
    breakdown_df = build_category_breakdown(region_df)
    breakdown_start = total_row + 3

    ws.cell(row=breakdown_start - 1, column=1).value = "Revenue by Product Category"
    ws.cell(row=breakdown_start - 1, column=1).font = Font(
        name="Calibri", bold=True, size=10, color="595959"
    )

    cat_headers = ["Category", "Revenue", "Orders", "Avg Margin %"]
    for col_idx, header in enumerate(cat_headers, start=1):
        cell = ws.cell(row=breakdown_start, column=col_idx)
        cell.value = header
        cell.font = HEADER_FONT
        cell.fill = HEADER_FILL
        cell.alignment = ALIGN_CENTER

    for row_idx, (_, cat_row) in enumerate(breakdown_df.iterrows(),
                                            start=breakdown_start + 1):
        ws.cell(row=row_idx, column=1).value = cat_row["product_category"]
        ws.cell(row=row_idx, column=2).value = cat_row["revenue"]
        ws.cell(row=row_idx, column=2).number_format = '"$"#,##0'
        ws.cell(row=row_idx, column=3).value = cat_row["orders"]
        ws.cell(row=row_idx, column=3).number_format = "#,##0"
        ws.cell(row=row_idx, column=4).value = cat_row["avg_margin"]
        ws.cell(row=row_idx, column=4).number_format = '0.0%'
        for col_idx in range(1, 5):
            ws.cell(row=row_idx, column=col_idx).font = DATA_FONT

    # Auto-fit columns
    for col in ws.columns:
        max_len = max(
            len(str(cell.value)) if cell.value else 0
            for cell in col
        )
        ws.column_dimensions[col[0].column_letter].width = max(12, min(max_len + 4, 40))

Step 4: The Summary Sheet

The summary sheet was the part Marcus spent the most time on manually — pulling totals from each regional sheet into one view. Priya built it from the already-computed regional_summary DataFrame:

def write_summary_sheet(
    wb: Workbook,
    summary_df: pd.DataFrame,
    week_label: str,
    report_date: str,
) -> None:
    """Writes the Summary sheet with the regional comparison table."""
    ws = wb.active
    ws.title = "Summary"

    # Title
    ws.merge_cells("A1:F1")
    ws["A1"] = f"Acme Corp — Weekly Sales Summary — {week_label}"
    ws["A1"].font = Font(name="Calibri", bold=True, size=16, color=COLOR_NAVY)
    ws["A1"].alignment = Alignment(horizontal="left", vertical="center")
    ws.row_dimensions[1].height = 30

    ws.merge_cells("A2:F2")
    ws["A2"] = report_date
    ws["A2"].font = Font(name="Calibri", size=9, italic=True, color="888888")
    ws.row_dimensions[2].height = 14

    # Header row
    summary_headers = [
        "Region", "Total Revenue", "Total Orders",
        "Avg Order Value", "Total Units", "Avg Gross Margin %",
    ]
    for col_idx, header in enumerate(summary_headers, start=1):
        cell = ws.cell(row=4, column=col_idx)
        cell.value = header
        cell.font = HEADER_FONT
        cell.fill = HEADER_FILL
        cell.alignment = ALIGN_CENTER
    ws.row_dimensions[4].height = 20

    # Data rows
    cols = [
        "region", "total_revenue", "total_orders",
        "avg_order_value", "total_units", "avg_gross_margin",
    ]
    fmts = {
        2: '"$"#,##0',        # Total revenue
        3: "#,##0",           # Total orders
        4: '"$"#,##0.00',     # Avg order value
        5: "#,##0",           # Total units
        6: "0.0%",            # Avg gross margin
    }

    for row_idx, (_, row_data) in enumerate(summary_df.iterrows(), start=5):
        for col_idx, col_name in enumerate(cols, start=1):
            cell = ws.cell(row=row_idx, column=col_idx)
            cell.value = row_data[col_name]
            cell.font = DATA_FONT
            cell.alignment = ALIGN_RIGHT if col_idx > 1 else ALIGN_LEFT
            if row_idx % 2 == 0:
                cell.fill = ALT_FILL
            if col_idx in fmts:
                cell.number_format = fmts[col_idx]

    # Grand total row
    grand_row = 5 + len(summary_df)
    ws.cell(row=grand_row, column=1).value = "Grand Total"
    ws.cell(row=grand_row, column=2).value = summary_df["total_revenue"].sum()
    ws.cell(row=grand_row, column=2).number_format = '"$"#,##0'
    ws.cell(row=grand_row, column=3).value = summary_df["total_orders"].sum()
    ws.cell(row=grand_row, column=3).number_format = "#,##0"
    ws.cell(row=grand_row, column=5).value = summary_df["total_units"].sum()
    ws.cell(row=grand_row, column=5).number_format = "#,##0"

    for col_idx in range(1, 7):
        cell = ws.cell(row=grand_row, column=col_idx)
        cell.font = TOTAL_FONT
        cell.fill = TOTAL_FILL

    # Auto-fit columns
    for col in ws.columns:
        max_len = max(
            len(str(cell.value)) if cell.value else 0
            for cell in col
        )
        ws.column_dimensions[col[0].column_letter].width = max(14, min(max_len + 4, 40))

    ws.freeze_panes = "A5"

Step 5: Assembling and Running

from openpyxl import Workbook
from datetime import datetime

def generate_weekly_report(data_dir: str, week_suffix: str) -> str:
    """
    Full pipeline: load data → build aggregations → write Excel workbook.
    Returns the path to the saved file.
    """
    # Load data
    weekly_df = load_weekly_data(data_dir, week_suffix)
    summary_df = build_regional_summary(weekly_df)

    # Build workbook
    wb = Workbook()
    report_date = datetime.now().strftime("Generated %A, %B %d, %Y at %I:%M %p")
    week_label = week_suffix.upper()

    # Summary sheet first (uses wb.active)
    write_summary_sheet(wb, summary_df, week_label, report_date)

    # Regional sheets
    for region in ["Northeast", "Southeast", "Midwest", "West"]:
        region_df = weekly_df[weekly_df["region"] == region].copy()
        if region_df.empty:
            print(f"Warning: No data for {region} this week.")
            continue
        write_region_sheet(wb, region, region_df)

    # Save
    output_filename = f"acme_weekly_report_{week_suffix}.xlsx"
    wb.save(output_filename)
    return output_filename

# Run it
output_path = generate_weekly_report(data_dir=".", week_suffix="w48")
print(f"Report saved: {output_path}")

The Demo

Priya showed Marcus the result on a Monday morning. She ran the script while he watched — total time from running the command to opening the Excel file: fourteen seconds.

He opened the file. There was the Summary tab. Blue headers, alternating gray rows, dollar-formatted revenue column, right-aligned numbers. He clicked the Northeast tab. The order detail table was there, with all the columns formatted exactly as he had been formatting them manually for four years.

He scrolled down to the category breakdown.

"How did you know I always do the category breakdown?" he asked.

"You mentioned it when I asked about the structure. You said you do it for every region."

"I did mention that." He scrolled back up. "The date in the header — is that today's date?"

"It's the date and time the script ran."

"So if someone asks when the report was generated ..."

"It's right there in the header."

Marcus closed the file and looked at his screen for a moment.

"You know what I also do on Monday mornings? I email this to fourteen people. Can Python do that too?"

"Chapter 19," Priya said.

He nodded slowly. "What do I do for an hour and forty minutes next Monday?"

"I don't know. What would you like to do?"

What Made the Difference

The technical work in this case study is straightforward: pandas for aggregation, openpyxl for formatting. The more interesting question is why the output was immediately trusted and accepted.

Several design choices contributed:

Familiar structure. The Excel output matched Marcus's existing format closely. The same columns in the same order, the same summary structure. A completely different layout might have been objectively better, but familiarity accelerated adoption.

The auto-generated timestamp. Marcus had always worried about stale reports circulating after the source data updated. The generated-by timestamp made the recency of the data visible in the document itself, addressing a concern he had never explicitly articulated.

The same file name pattern. The output file was named acme_weekly_report_w48.xlsx — same pattern as the files Marcus had been creating manually (acme_northwest_w47.csv). He could drop the new file into his existing folder structure without reorganizing anything.

Small decisions that respected the existing workflow made the difference between a tool that gets used and a tool that gets admired once and abandoned.

This case study demonstrates: pd.concat for combining regional files, groupby + agg for summary calculations, openpyxl.Workbook with multiple sheets, PatternFill, Font, Alignment, number_format, freeze_panes, auto-fit column width calculation, and the importance of matching existing workflow conventions.