Chapter 36 Key Takeaways: Automated Report Generation
The Core Idea
Every organization has recurring reports that someone builds by hand every month. Automating those reports eliminates labor cost, eliminates errors, and produces more consistent output than any manual process. Once built, a report pipeline requires essentially zero ongoing maintenance and runs forever.
What You Learned
The Business Case Is Clear
Manual report creation costs more than it appears: direct labor time, manager review time, error correction, and the opportunity cost of what the analyst could have built instead. A single monthly report that takes two hours costs over $1,400/year in labor at typical analyst rates. The automation investment pays back in months, not years.
The less obvious benefit is consistency and trust. Automated reports look the same every month, draw from the same data source, and produce identical numbers in every section. Stakeholders can stop double-checking and start using the reports as a reliable primary source.
Jinja2 Is the Foundation
The Jinja2 templating engine converts Python data into HTML using three delimiter types:
{{ variable }}— Output values, expressions, filtered data{% statement %}— Control flow: for loops, if statements, block definitions{# comment #}— Template comments that do not appear in output
Custom filters (registered via env.filters['name'] = function) extend Jinja2 with domain-specific formatting like currency, percentage, and compact number notation.
Template inheritance ({% extends %} and {% block %}) solves the consistency problem: a base template defines headers, footers, navigation, and CSS once; every report type inherits those elements automatically. One change to the base template updates every report that extends it.
Charts Must Be Self-Contained
Charts embedded in HTML reports and PDFs must use base64 encoding, not external file references. The three-step pattern:
- Generate the matplotlib figure normally
- Save it to an
io.BytesIO()buffer rather than a file - Base64-encode the buffer contents and embed in the
<img src>attribute
This produces HTML that works in any browser, converts to PDF correctly, and can be sent by email without broken images.
The Six-Stage Pipeline
Every report pipeline follows the same structure regardless of report type or audience:
Load Data → Calculate Metrics → Generate Charts → Render Template → Save Outputs → Deliver
Implementing this as a function or class separates concerns cleanly: loading is separate from calculation, calculation is separate from presentation, presentation is separate from delivery. Each stage can be tested independently.
weasyprint Handles HTML to PDF
A single function call converts any HTML string to a professional PDF. The CSS @page rule handles print-specific formatting: page size, margins, running headers, and automatic page numbering. Elements like metric cards and charts should be marked page-break-inside: avoid to prevent them from being split across pages.
Scheduling Closes the Loop
A report pipeline that a human must trigger manually is only half-automated. Scheduling — via the schedule library, cron, or Windows Task Scheduler — removes the last human step. The pipeline runs at a configured time, generates the report, delivers it, and logs the result. Nobody has to remember; it just happens.
For production reliability: keep a guard check inside the script (verify it is actually the 1st of the month), implement error handling that sends alerts to the analyst rather than bad reports to stakeholders, and log every run with timestamps and output file paths.
Design Principles
Fetch once, render multiple times. Query the data source once. Compute metrics once. Render into HTML, PDF, and Excel from the same data dictionary. This ensures all output formats show identical numbers and eliminates the risk of recalculation inconsistencies.
Templates are code. Store templates in version control alongside the Python pipeline. Commit template changes with descriptive messages. Document the expected context variables. Test templates with known data before deploying changes.
Errors should alert the analyst, not reach stakeholders. Wrap the full pipeline in a try/except block. On failure, send an alert email to the report owner with the full traceback. Do not send a partial or incorrect report. Recipients should see either the complete correct report or nothing.
Separate configuration from code. Distribution lists, report schedules, and company names belong in configuration files or environment variables, not hardcoded in the Python script. This makes changes safe without touching code.
Practical Checklist
Before deploying a report pipeline to production:
- [ ] Test the complete pipeline on sample data that mirrors the production data structure
- [ ] Verify the PDF output looks correct in at least two PDF viewers
- [ ] Confirm the email sends and the PDF attachment opens correctly
- [ ] Test the error path: what happens if the data source is unavailable?
- [ ] Set up logging so you can verify the pipeline ran correctly after the fact
- [ ] Store the configuration (recipients, schedule, template directory) in a separate config file
- [ ] Run the report for a prior month and verify all numbers match your manual calculations
- [ ] Commit the template and pipeline code to version control
Tool Reference
| Task | Tool | Notes |
|---|---|---|
| HTML templating | Jinja2 | Industry standard; powers Flask, Django, and more |
| HTML to PDF | weasyprint | Best CSS support for print; handles @page rules |
| Programmatic PDF | reportlab | More control, more code; good for complex layouts |
| Charts for embedding | matplotlib + io.BytesIO | Save to buffer, encode to base64 |
| Scheduling (simple) | schedule library | Runs inside Python; needs persistent process |
| Scheduling (production) | cron / Task Scheduler | OS-level; restarts automatically |
| Excel reports | openpyxl | Full .xlsx support, charts, formatting |
One-Sentence Summary
Automated report generation is the highest-leverage Python project most analysts can build: it returns time indefinitely, eliminates an entire class of errors, and makes every report more consistent and trustworthy than any manual process ever could be.
Core Concepts at a Glance
| Concept | What It Is | When to Use It |
|---|---|---|
| Jinja2 templating | A template engine that renders HTML with variable data | Any HTML/text report where the structure is consistent but data changes |
| WeasyPrint | HTML+CSS to PDF converter | When you need print-ready PDFs without a separate PDF design tool |
| openpyxl | Read/write Excel .xlsx files in Python |
When your audience works primarily in Excel |
| base64 chart embedding | Convert matplotlib charts to data URIs embedded in HTML | Self-contained HTML reports with no external image dependencies |
| Template inheritance | Base template + child templates that override specific sections | Multiple report types sharing the same layout |
| Parameterized reports | Same template, different date ranges or filters | Monthly/quarterly/annual reports from one codebase |
Jinja2 Quick Reference
Delimiter Summary
{{ expression }} — Output a value (variables, filters, expressions)
{% statement %} — Control flow (if, for, set, extends, block, include)
{# comment #} — Template comment — NOT rendered in output
Common Patterns
{# Variable output #}
{{ company_name }}
{{ revenue | currency }}
{# Loop #}
{% for item in items %}
{{ loop.index }}. {{ item.name }}
{% endfor %}
{# Conditional #}
{% if value >= target %}
<span class="positive">On track</span>
{% elif value >= target * 0.9 %}
<span class="warning">Close</span>
{% else %}
<span class="negative">Below target</span>
{% endif %}
{# Template inheritance #}
{% extends "base.html" %}
{% block content %}
Your content here
{% endblock %}
Registering Custom Filters
env = Environment(loader=FileSystemLoader("templates"))
env.filters["currency"] = lambda v: f"${v:,.2f}"
env.filters["pct"] = lambda v: f"{v:.1f}%"
env.filters["integer"] = lambda v: f"{int(v):,}"
Report Architecture Checklist
Use this checklist when designing a new automated report:
- [ ] Data source confirmed — Where does the data come from? Database, API, CSV?
- [ ] Aggregation point identified — Does Python aggregate, or does the query return pre-aggregated data?
- [ ] Template structure planned — Executive summary first, then KPIs, then charts, then tables
- [ ] Parameterization designed — What changes between runs? (Date range, region, recipient list?)
- [ ] Output formats decided — HTML only? PDF? Excel? All three?
- [ ] Error handling in place — What happens if the data source is unavailable?
- [ ] Delivery method chosen — Email? File share? Web portal?
- [ ] Schedule set — When does this run? First of the month? Daily? On demand?
- [ ] Logging added — Is there a record of every run with success/failure status?
- [ ] Recipients configurable — Is the distribution list in a config file, not hardcoded?
Chart-to-Base64 Pattern
import base64
import io
import matplotlib
matplotlib.use("Agg") # Must be before importing pyplot
import matplotlib.pyplot as plt
def fig_to_base64(fig: plt.Figure) -> str:
"""Convert a matplotlib figure to a base64 data URI."""
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150, bbox_inches="tight",
facecolor="white")
buf.seek(0)
encoded = base64.b64encode(buf.read()).decode("utf-8")
plt.close(fig) # Always close to free memory
return f"data:image/png;base64,{encoded}"
# In your HTML template:
# <img src="{{ chart_data_uri }}" alt="Revenue chart">
Key note: Call matplotlib.use("Agg") before import matplotlib.pyplot as plt. On a server or scheduled job, matplotlib will fail without a display backend unless you specify Agg.
openpyxl Style Reference
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
# Dark header row
header_fill = PatternFill(start_color="2C3E50", end_color="2C3E50",
fill_type="solid")
header_font = Font(color="FFFFFF", bold=True, size=11)
# Alternating row color
alt_fill = PatternFill(start_color="EBF5FB", end_color="EBF5FB",
fill_type="solid")
# Number formats
cell.number_format = '"$"#,##0.00' # Currency: $1,234.56
cell.number_format = "#,##0" # Integer with commas: 1,234
cell.number_format = "0.0%" # Percentage: 45.2%
cell.number_format = "+0.0%;-0.0%" # Signed percentage: +12.3% or -4.1%
# Auto-size columns
from openpyxl.utils import get_column_letter
for col_cells in ws.columns:
max_len = max(len(str(c.value or "")) for c in col_cells)
ws.column_dimensions[get_column_letter(col_cells[0].column)].width = max_len + 2
Executive Summary Generation Pattern
The executive summary should be auto-generated from data, not written manually. Structure each bullet as: what happened + context + implication if needed.
def generate_executive_summary(data: dict) -> list[str]:
"""Return a list of HTML bullet point strings."""
bullets = []
# Pattern: compare to target
variance_pct = (data["actual"] - data["target"]) / data["target"] * 100
if variance_pct >= 0:
bullets.append(
f"<strong>{data['metric_name']}</strong> of "
f"${data['actual']:,.0f} exceeded target by {variance_pct:.1f}%."
)
else:
bullets.append(
f"<strong>{data['metric_name']}</strong> of "
f"${data['actual']:,.0f} missed target by {abs(variance_pct):.1f}%."
)
# Pattern: identify best performer
best = max(data["items"], key=lambda x: x["value"])
bullets.append(
f"Top performer: {best['name']} with "
f"${best['value']:,.0f} ({best['yoy_change']:+.1f}% YoY)."
)
return bullets
PDF Generation with WeasyPrint
import weasyprint
# From string
html_string = render_template(...)
weasyprint.HTML(string=html_string).write_pdf("report.pdf")
# From file
weasyprint.HTML(filename="report.html").write_pdf("report.pdf")
# Essential CSS for print-ready output
css = """
@page {
size: A4;
margin: 20mm 15mm 25mm 15mm;
@bottom-right {
content: "Page " counter(page) " of " counter(pages);
font-size: 9pt;
}
}
.section-break { page-break-before: always; }
tr { page-break-inside: avoid; }
"""
weasyprint.HTML(string=html_string).write_pdf(
"report.pdf",
stylesheets=[weasyprint.CSS(string=css)]
)
Common Mistakes to Avoid
Mistake 1: Importing pyplot before setting backend
# WRONG — will fail on servers
import matplotlib.pyplot as plt
matplotlib.use("Agg")
# CORRECT — set backend first
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
Mistake 2: Hardcoding recipient lists
# WRONG
recipients = ["priya@acmecorp.com", "sandra@acmecorp.com"]
# CORRECT — load from config
import json
config = json.loads(Path("report_config.json").read_text())
recipients = config["monthly_review_recipients"]
Mistake 3: Not closing matplotlib figures
# WRONG — memory leak in loops
fig, ax = plt.subplots()
# ... add data ...
return fig_to_base64(fig)
# CORRECT — close after converting
result = fig_to_base64(fig)
plt.close(fig) # Free the memory
return result
Mistake 4: Fetching raw rows instead of aggregating in SQL
# WRONG — fetches 500k rows
rows = db.execute("SELECT * FROM sales WHERE month = %s", month)
total = sum(row["revenue"] for row in rows)
# CORRECT — aggregate at the database
total = db.execute(
"SELECT SUM(revenue) FROM sales WHERE month = %s", month
).fetchone()[0]
Mistake 5: Using | safe without sanitization
{# DANGEROUS if executive_summary comes from untrusted user input #}
{{ executive_summary | safe }}
{# Safe: only use | safe for content you generated in your own Python code #}
| safe tells Jinja2 not to escape HTML characters. Only use it when the content is HTML you generated yourself, never on user-provided text.
Report Delivery Checklist
Before sending automated reports to stakeholders:
- [ ] Review on first run — Manually review the first automated output before trusting the schedule
- [ ] Check PDF page breaks — Open the PDF and verify sections don't split awkwardly across pages
- [ ] Verify numbers against source — Cross-check three or four key metrics against the source data
- [ ] Test with one recipient first — Send to yourself before sending to the distribution list
- [ ] Check email rendering — The HTML email body should render correctly in Gmail, Outlook, and mobile
- [ ] Confirm attachment opens — Open the attached PDF to verify it's not corrupted
- [ ] Review the subject line — Make sure the month/year in the subject line matches the report content
- [ ] Set up error alerts — Any exception during generation should notify you, not silently fail