Chapter 36 Key Takeaways: Automated Report Generation

The Core Idea

Every organization has recurring reports that someone builds by hand every month. Automating those reports eliminates labor cost, eliminates errors, and produces more consistent output than any manual process. Once built, a report pipeline requires essentially zero ongoing maintenance and runs forever.


What You Learned

The Business Case Is Clear

Manual report creation costs more than it appears: direct labor time, manager review time, error correction, and the opportunity cost of what the analyst could have built instead. A single monthly report that takes two hours costs over $1,400/year in labor at typical analyst rates. The automation investment pays back in months, not years.

The less obvious benefit is consistency and trust. Automated reports look the same every month, draw from the same data source, and produce identical numbers in every section. Stakeholders can stop double-checking and start using the reports as a reliable primary source.

Jinja2 Is the Foundation

The Jinja2 templating engine converts Python data into HTML using three delimiter types:

  • {{ variable }} — Output values, expressions, filtered data
  • {% statement %} — Control flow: for loops, if statements, block definitions
  • {# comment #} — Template comments that do not appear in output

Custom filters (registered via env.filters['name'] = function) extend Jinja2 with domain-specific formatting like currency, percentage, and compact number notation.

Template inheritance ({% extends %} and {% block %}) solves the consistency problem: a base template defines headers, footers, navigation, and CSS once; every report type inherits those elements automatically. One change to the base template updates every report that extends it.

Charts Must Be Self-Contained

Charts embedded in HTML reports and PDFs must use base64 encoding, not external file references. The three-step pattern:

  1. Generate the matplotlib figure normally
  2. Save it to an io.BytesIO() buffer rather than a file
  3. Base64-encode the buffer contents and embed in the <img src> attribute

This produces HTML that works in any browser, converts to PDF correctly, and can be sent by email without broken images.

The Six-Stage Pipeline

Every report pipeline follows the same structure regardless of report type or audience:

Load Data → Calculate Metrics → Generate Charts → Render Template → Save Outputs → Deliver

Implementing this as a function or class separates concerns cleanly: loading is separate from calculation, calculation is separate from presentation, presentation is separate from delivery. Each stage can be tested independently.

weasyprint Handles HTML to PDF

A single function call converts any HTML string to a professional PDF. The CSS @page rule handles print-specific formatting: page size, margins, running headers, and automatic page numbering. Elements like metric cards and charts should be marked page-break-inside: avoid to prevent them from being split across pages.

Scheduling Closes the Loop

A report pipeline that a human must trigger manually is only half-automated. Scheduling — via the schedule library, cron, or Windows Task Scheduler — removes the last human step. The pipeline runs at a configured time, generates the report, delivers it, and logs the result. Nobody has to remember; it just happens.

For production reliability: keep a guard check inside the script (verify it is actually the 1st of the month), implement error handling that sends alerts to the analyst rather than bad reports to stakeholders, and log every run with timestamps and output file paths.


Design Principles

Fetch once, render multiple times. Query the data source once. Compute metrics once. Render into HTML, PDF, and Excel from the same data dictionary. This ensures all output formats show identical numbers and eliminates the risk of recalculation inconsistencies.

Templates are code. Store templates in version control alongside the Python pipeline. Commit template changes with descriptive messages. Document the expected context variables. Test templates with known data before deploying changes.

Errors should alert the analyst, not reach stakeholders. Wrap the full pipeline in a try/except block. On failure, send an alert email to the report owner with the full traceback. Do not send a partial or incorrect report. Recipients should see either the complete correct report or nothing.

Separate configuration from code. Distribution lists, report schedules, and company names belong in configuration files or environment variables, not hardcoded in the Python script. This makes changes safe without touching code.


Practical Checklist

Before deploying a report pipeline to production:

  • [ ] Test the complete pipeline on sample data that mirrors the production data structure
  • [ ] Verify the PDF output looks correct in at least two PDF viewers
  • [ ] Confirm the email sends and the PDF attachment opens correctly
  • [ ] Test the error path: what happens if the data source is unavailable?
  • [ ] Set up logging so you can verify the pipeline ran correctly after the fact
  • [ ] Store the configuration (recipients, schedule, template directory) in a separate config file
  • [ ] Run the report for a prior month and verify all numbers match your manual calculations
  • [ ] Commit the template and pipeline code to version control

Tool Reference

Task Tool Notes
HTML templating Jinja2 Industry standard; powers Flask, Django, and more
HTML to PDF weasyprint Best CSS support for print; handles @page rules
Programmatic PDF reportlab More control, more code; good for complex layouts
Charts for embedding matplotlib + io.BytesIO Save to buffer, encode to base64
Scheduling (simple) schedule library Runs inside Python; needs persistent process
Scheduling (production) cron / Task Scheduler OS-level; restarts automatically
Excel reports openpyxl Full .xlsx support, charts, formatting

One-Sentence Summary

Automated report generation is the highest-leverage Python project most analysts can build: it returns time indefinitely, eliminates an entire class of errors, and makes every report more consistent and trustworthy than any manual process ever could be.

Core Concepts at a Glance

Concept What It Is When to Use It
Jinja2 templating A template engine that renders HTML with variable data Any HTML/text report where the structure is consistent but data changes
WeasyPrint HTML+CSS to PDF converter When you need print-ready PDFs without a separate PDF design tool
openpyxl Read/write Excel .xlsx files in Python When your audience works primarily in Excel
base64 chart embedding Convert matplotlib charts to data URIs embedded in HTML Self-contained HTML reports with no external image dependencies
Template inheritance Base template + child templates that override specific sections Multiple report types sharing the same layout
Parameterized reports Same template, different date ranges or filters Monthly/quarterly/annual reports from one codebase

Jinja2 Quick Reference

Delimiter Summary

{{ expression }}   — Output a value (variables, filters, expressions)
{% statement %}    — Control flow (if, for, set, extends, block, include)
{# comment #}      — Template comment — NOT rendered in output

Common Patterns

{# Variable output #}
{{ company_name }}
{{ revenue | currency }}

{# Loop #}
{% for item in items %}
    {{ loop.index }}. {{ item.name }}
{% endfor %}

{# Conditional #}
{% if value >= target %}
    <span class="positive">On track</span>
{% elif value >= target * 0.9 %}
    <span class="warning">Close</span>
{% else %}
    <span class="negative">Below target</span>
{% endif %}

{# Template inheritance #}
{% extends "base.html" %}
{% block content %}
    Your content here
{% endblock %}

Registering Custom Filters

env = Environment(loader=FileSystemLoader("templates"))
env.filters["currency"] = lambda v: f"${v:,.2f}"
env.filters["pct"] = lambda v: f"{v:.1f}%"
env.filters["integer"] = lambda v: f"{int(v):,}"

Report Architecture Checklist

Use this checklist when designing a new automated report:

  • [ ] Data source confirmed — Where does the data come from? Database, API, CSV?
  • [ ] Aggregation point identified — Does Python aggregate, or does the query return pre-aggregated data?
  • [ ] Template structure planned — Executive summary first, then KPIs, then charts, then tables
  • [ ] Parameterization designed — What changes between runs? (Date range, region, recipient list?)
  • [ ] Output formats decided — HTML only? PDF? Excel? All three?
  • [ ] Error handling in place — What happens if the data source is unavailable?
  • [ ] Delivery method chosen — Email? File share? Web portal?
  • [ ] Schedule set — When does this run? First of the month? Daily? On demand?
  • [ ] Logging added — Is there a record of every run with success/failure status?
  • [ ] Recipients configurable — Is the distribution list in a config file, not hardcoded?

Chart-to-Base64 Pattern

import base64
import io
import matplotlib
matplotlib.use("Agg")  # Must be before importing pyplot
import matplotlib.pyplot as plt

def fig_to_base64(fig: plt.Figure) -> str:
    """Convert a matplotlib figure to a base64 data URI."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight",
                facecolor="white")
    buf.seek(0)
    encoded = base64.b64encode(buf.read()).decode("utf-8")
    plt.close(fig)  # Always close to free memory
    return f"data:image/png;base64,{encoded}"

# In your HTML template:
# <img src="{{ chart_data_uri }}" alt="Revenue chart">

Key note: Call matplotlib.use("Agg") before import matplotlib.pyplot as plt. On a server or scheduled job, matplotlib will fail without a display backend unless you specify Agg.


openpyxl Style Reference

from openpyxl.styles import Font, PatternFill, Alignment, Border, Side

# Dark header row
header_fill = PatternFill(start_color="2C3E50", end_color="2C3E50",
                          fill_type="solid")
header_font = Font(color="FFFFFF", bold=True, size=11)

# Alternating row color
alt_fill = PatternFill(start_color="EBF5FB", end_color="EBF5FB",
                       fill_type="solid")

# Number formats
cell.number_format = '"$"#,##0.00'    # Currency: $1,234.56
cell.number_format = "#,##0"          # Integer with commas: 1,234
cell.number_format = "0.0%"           # Percentage: 45.2%
cell.number_format = "+0.0%;-0.0%"   # Signed percentage: +12.3% or -4.1%

# Auto-size columns
from openpyxl.utils import get_column_letter
for col_cells in ws.columns:
    max_len = max(len(str(c.value or "")) for c in col_cells)
    ws.column_dimensions[get_column_letter(col_cells[0].column)].width = max_len + 2

Executive Summary Generation Pattern

The executive summary should be auto-generated from data, not written manually. Structure each bullet as: what happened + context + implication if needed.

def generate_executive_summary(data: dict) -> list[str]:
    """Return a list of HTML bullet point strings."""
    bullets = []

    # Pattern: compare to target
    variance_pct = (data["actual"] - data["target"]) / data["target"] * 100
    if variance_pct >= 0:
        bullets.append(
            f"<strong>{data['metric_name']}</strong> of "
            f"${data['actual']:,.0f} exceeded target by {variance_pct:.1f}%."
        )
    else:
        bullets.append(
            f"<strong>{data['metric_name']}</strong> of "
            f"${data['actual']:,.0f} missed target by {abs(variance_pct):.1f}%."
        )

    # Pattern: identify best performer
    best = max(data["items"], key=lambda x: x["value"])
    bullets.append(
        f"Top performer: {best['name']} with "
        f"${best['value']:,.0f} ({best['yoy_change']:+.1f}% YoY)."
    )

    return bullets

PDF Generation with WeasyPrint

import weasyprint

# From string
html_string = render_template(...)
weasyprint.HTML(string=html_string).write_pdf("report.pdf")

# From file
weasyprint.HTML(filename="report.html").write_pdf("report.pdf")

# Essential CSS for print-ready output
css = """
@page {
    size: A4;
    margin: 20mm 15mm 25mm 15mm;
    @bottom-right {
        content: "Page " counter(page) " of " counter(pages);
        font-size: 9pt;
    }
}
.section-break { page-break-before: always; }
tr { page-break-inside: avoid; }
"""
weasyprint.HTML(string=html_string).write_pdf(
    "report.pdf",
    stylesheets=[weasyprint.CSS(string=css)]
)

Common Mistakes to Avoid

Mistake 1: Importing pyplot before setting backend

# WRONG — will fail on servers
import matplotlib.pyplot as plt
matplotlib.use("Agg")

# CORRECT — set backend first
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

Mistake 2: Hardcoding recipient lists

# WRONG
recipients = ["priya@acmecorp.com", "sandra@acmecorp.com"]

# CORRECT — load from config
import json
config = json.loads(Path("report_config.json").read_text())
recipients = config["monthly_review_recipients"]

Mistake 3: Not closing matplotlib figures

# WRONG — memory leak in loops
fig, ax = plt.subplots()
# ... add data ...
return fig_to_base64(fig)

# CORRECT — close after converting
result = fig_to_base64(fig)
plt.close(fig)  # Free the memory
return result

Mistake 4: Fetching raw rows instead of aggregating in SQL

# WRONG — fetches 500k rows
rows = db.execute("SELECT * FROM sales WHERE month = %s", month)
total = sum(row["revenue"] for row in rows)

# CORRECT — aggregate at the database
total = db.execute(
    "SELECT SUM(revenue) FROM sales WHERE month = %s", month
).fetchone()[0]

Mistake 5: Using | safe without sanitization

{# DANGEROUS if executive_summary comes from untrusted user input #}
{{ executive_summary | safe }}

{# Safe: only use | safe for content you generated in your own Python code #}

| safe tells Jinja2 not to escape HTML characters. Only use it when the content is HTML you generated yourself, never on user-provided text.


Report Delivery Checklist

Before sending automated reports to stakeholders:

  • [ ] Review on first run — Manually review the first automated output before trusting the schedule
  • [ ] Check PDF page breaks — Open the PDF and verify sections don't split awkwardly across pages
  • [ ] Verify numbers against source — Cross-check three or four key metrics against the source data
  • [ ] Test with one recipient first — Send to yourself before sending to the distribution list
  • [ ] Check email rendering — The HTML email body should render correctly in Gmail, Outlook, and mobile
  • [ ] Confirm attachment opens — Open the attached PDF to verify it's not corrupted
  • [ ] Review the subject line — Make sure the month/year in the subject line matches the report content
  • [ ] Set up error alerts — Any exception during generation should notify you, not silently fail