Case Study 24-1: Priya Migrates Acme's Weekly Reports to the Cloud

Characters:
- Priya Okonkwo — Junior Analyst, Acme Corp
- Sandra Chen — VP of Sales, Acme Corp
- Marcus Webb — IT Manager, Acme Corp

Business Problem: Acme's weekly regional sales reports are generated as Excel files on Priya's laptop and distributed via email attachment. Sandra cannot access them from her phone while traveling, and the version problem — multiple people forwarding older versions of the file — is creating confusion about which numbers are authoritative.


The Situation

Every Monday morning, Priya runs a Python script that pulls data from Acme's sales database, aggregates revenue and unit counts by region (North, South, East, West), and saves an Excel report to her desktop. She emails it to Sandra and four regional managers.

The system has three problems. First, Sandra frequently travels and cannot open Excel files on her iPhone. Second, when Sandra forwards the attachment, the regional managers sometimes end up with different versions. Third, Priya is the single point of failure — if she is sick on Monday, no one gets the report.

After a Monday morning incident where Sandra needed numbers for a board presentation and could not open Priya's attachment, Marcus Webb pulled Priya aside. "I've been saying we should put reports in S3 for six months. This is the week we do it."


The Approach

Priya spent Tuesday afternoon planning the migration. Her existing script already produced a clean Excel file. The change was an extension, not a rewrite: after generating the file locally, upload it to S3, generate a presigned URL, and email the URL instead of the attachment.

She had three conversations before writing a line of code:

Conversation 1 — Marcus Webb on IAM permissions: "I need an IAM user that can write to one S3 bucket," Priya said. "Nothing else."

Marcus nodded. "Principle of least privilege. Write down the exact actions you need — PutObject, GetObject, ListBucket — and the specific bucket ARN. I'll review it before approving the user creation."

Priya wrote the policy request in a shared document that evening. Marcus approved it the next morning with one note: "Don't forget to add DeleteObject if you ever need to clean up old reports, but leave it out for now if you don't need it."

Conversation 2 — Sandra on access requirements: "How long do you need the link to stay valid?" Priya asked.

"48 hours is fine. By Wednesday, last Monday's numbers are stale anyway."

"Do you want a password on it?"

"No — if someone gets the email, they get the report. The data isn't classified. Just don't make it a permanent link."

Conversation 3 — Priya's own planning: She mapped out the exact .env variables she would need and wrote a .env.example file before touching any code. She verified her .gitignore excluded .env and the credentials directory. Only then did she start writing functions.
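Priya's .env.example might have looked like the following sketch. The AWS variable names are boto3's standard credential environment variables, and the last three match the script's os.environ lookups; the email addresses and region are illustrative placeholders, not values from the case study.

```
# .env.example — copy to .env and fill in real values. Never commit .env.
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_DEFAULT_REGION=us-east-1
ACME_S3_BUCKET=acme-corp-reports
SANDRA_EMAIL=sandra.chen@example.com
REPORT_RECIPIENTS=north.mgr@example.com,south.mgr@example.com,east.mgr@example.com,west.mgr@example.com
```

Committing the .example file (with placeholders only) documents the contract for the next analyst, while .gitignore keeps the real .env out of version control.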


The Implementation

Priya's modified pipeline, simplified for clarity:

"""
acme_weekly_report.py — Weekly sales report with S3 delivery.
Run every Monday morning. Uploads report to S3, emails Sandra a secure link.
"""

import os
import pandas as pd
from datetime import datetime
from dotenv import load_dotenv

load_dotenv()

ACME_S3_BUCKET = os.environ.get("ACME_S3_BUCKET")
SANDRA_EMAIL = os.environ.get("SANDRA_EMAIL")
REPORT_RECIPIENTS = os.environ.get("REPORT_RECIPIENTS", "").split(",")


def generate_regional_summary(db_path: str) -> pd.DataFrame:
    """
    Query the sales database and produce a regional summary DataFrame.

    Args:
        db_path: Path to acme_inventory.db or connection string.

    Returns:
        pd.DataFrame: One row per region with revenue and unit totals.
    """
    from sqlalchemy import create_engine, text

    engine = create_engine(f"sqlite:///{db_path}")
    with engine.connect() as conn:
        result = conn.execute(
            text("""
                SELECT
                    region,
                    SUM(revenue) AS total_revenue,
                    SUM(units_sold) AS total_units,
                    COUNT(sale_id) AS transaction_count,
                    AVG(revenue) AS avg_order_value
                FROM sales
                WHERE sale_date >= date('now', '-7 days')  -- trailing 7 days for the weekly report
                GROUP BY region
                ORDER BY total_revenue DESC
            """)
        )
        df = pd.DataFrame(result.fetchall(), columns=list(result.keys()))

    df["total_revenue"] = df["total_revenue"].round(2)
    df["avg_order_value"] = df["avg_order_value"].round(2)
    return df


def save_report_to_excel(df: pd.DataFrame, filename: str) -> None:
    """Save the summary DataFrame to a formatted Excel file."""
    with pd.ExcelWriter(filename, engine="openpyxl") as writer:
        df.to_excel(writer, index=False, sheet_name="Regional Summary")
    print(f"Report saved: {filename}")


def run_pipeline(db_path: str = "data/acme_inventory.db") -> None:
    """
    Execute the full report pipeline: generate, upload, notify.

    Args:
        db_path: Path to the Acme sales database.
    """
    if not ACME_S3_BUCKET:
        raise ValueError("ACME_S3_BUCKET not set in environment.")
    if not SANDRA_EMAIL:
        raise ValueError("SANDRA_EMAIL not set in environment.")

    now = datetime.now()
    week_label = now.strftime("Week-%W-%Y")
    local_filename = f"acme_regional_summary_{week_label}.xlsx"

    print(f"\nGenerating Acme weekly report: {week_label}")
    print("-" * 50)

    # Step 1: Generate
    summary_df = generate_regional_summary(db_path)
    save_report_to_excel(summary_df, local_filename)

    # Step 2: Upload to S3
    from cloud_storage_demo import upload_file_to_s3, generate_presigned_url
    s3_key = f"reports/weekly-sales/{now.year}/W{now.strftime('%W')}/regional-summary.xlsx"
    upload_file_to_s3(local_filename, ACME_S3_BUCKET, s3_key)

    # Step 3: Presigned URL (48 hours)
    share_url = generate_presigned_url(ACME_S3_BUCKET, s3_key, expiration_hours=48)

    # Step 4: Build summary bullets
    top_region = summary_df.loc[summary_df["total_revenue"].idxmax(), "region"]
    total_revenue = summary_df["total_revenue"].sum()
    total_transactions = summary_df["transaction_count"].sum()

    summary_bullets = [
        f"Total revenue this week: ${total_revenue:,.2f}",
        f"Best performing region: {top_region}",
        f"Total transactions: {total_transactions:,}",
    ]

    # Step 5: Notify all recipients
    from notification import send_report_notification
    all_recipients = [SANDRA_EMAIL] + [r for r in REPORT_RECIPIENTS if r.strip()]

    for recipient in all_recipients:
        send_report_notification(
            recipient_email=recipient.strip(),
            report_name=f"Weekly Sales Report — {week_label}",
            presigned_url=share_url,
            summary_lines=summary_bullets,
            expiration_hours=48,
        )

    print("\nPipeline complete.")
    print(f"Report: {s3_key}")
    print(f"Share URL (valid 48h): {share_url[:80]}...")


if __name__ == "__main__":
    run_pipeline()

What Marcus Reviewed

Before Priya deployed, Marcus Webb sat with her for twenty minutes and reviewed three things:

The IAM policy. He confirmed it granted only s3:PutObject, s3:GetObject, and s3:ListBucket on arn:aws:s3:::acme-corp-reports and arn:aws:s3:::acme-corp-reports/*. "Nothing else. If this key is ever leaked, the attacker gets read/write access to one S3 bucket that contains report files. They don't get the database, they don't get other AWS services."
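Based on the actions and ARNs named above, the policy Marcus reviewed might look like the following sketch (the Sid values are illustrative). Note that s3:ListBucket attaches to the bucket ARN, while the object actions attach to the /* object ARN:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListReportsBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::acme-corp-reports"
    },
    {
      "Sid": "ReadWriteReportObjects",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::acme-corp-reports/*"
    }
  ]
}
```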

The .env file and .gitignore. "Show me your .gitignore," Marcus said. Priya opened the file. .env was on line 1. "Good. And show me your git log to confirm .env was never committed." It had not been.

The presigned URL expiration. "48 hours. Sandra confirmed that works for her use case. After 48 hours the link dies and there's no way to access the old file through that URL."

Marcus approved the deployment. "This is how I want every automated script at Acme to handle cloud resources."


The Outcome

Sandra received the first notification email at 7:03 AM on Monday. She tapped the "View Report" link from her phone in the airport, and the file loaded in her browser as a clean table. She forwarded the email to the CFO — who also clicked the link from their phone and saw the same current data.

By the following Monday, all four regional managers had adjusted to checking the email link instead of opening attachments. No one called Priya asking "which version is right?"

Three weeks later, Marcus configured the script to run as a scheduled task on a server in Acme's office. Priya's laptop no longer needed to be on for the report to run. Marcus, who had been openly skeptical about the Python initiative when Sandra hired Priya, wrote in Priya's quarterly review: "The cloud reporting pipeline is the most impactful automation project this team has shipped."
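The case study does not specify how the scheduled task was configured; on a Linux server, one common approach would be a crontab entry along these lines (the paths, time, and log file are hypothetical):

```
# min hour day month weekday — run the report pipeline every Monday at 6:30 AM
30 6 * * 1  cd /opt/acme-reports && /usr/bin/python3 acme_weekly_report.py >> pipeline.log 2>&1
```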


Key Decisions and Trade-offs

Why presigned URLs instead of public objects? The reports contain regional revenue data that Acme does not want publicly accessible. Presigned URLs provide temporary access with automatic expiration and no ongoing security exposure.

Why 48 hours instead of 1 week? Sandra's feedback: "If I haven't looked at it by Wednesday, the data is stale anyway." Shorter expirations reduce the window during which a forwarded email could give unauthorized access.

Why email link instead of shared S3 folder access? Marcus did not want to manage AWS credentials for the five non-technical recipients. A presigned URL requires nothing from the recipient.

Why keep the local file after uploading? Priya keeps a local copy in an archived/ folder for two weeks as a backup. Marcus agreed this was reasonable belt-and-suspenders practice.


Discussion Questions

  1. Priya keeps the script's IAM policy to three permissions. What additional risk would s3:DeleteObject permission add, and under what circumstances would it be appropriate to add it?

  2. If Priya wanted the script to run even when the Acme office server was down, what would be the next architectural step?

  3. Sandra currently receives one email with one link. If she wanted to share different sections of the report with different managers, what modification would be most elegant?

  4. Marcus is planning to onboard a second analyst. How should credentials be managed so that the second analyst can run the same pipeline without Priya sharing her .env file directly?