Exercises: Automated Reporting

DataField.Dev

Install: pip install fpdf2 reportlab python-pptx jinja2 openpyxl weasyprint matplotlib.

Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name three reasons reports still matter despite the availability of dashboards.

Guidance

(1) Push vs. pull: reports arrive in inboxes without user action. (2) Audit trail: reports are snapshots that can be cited and archived. (3) Narrative structure: the author controls the story. (4) Compliance: many industries require formal document deliverables. (5) Distribution: a PDF opens on virtually any device and can be forwarded, printed, or archived without access to the dashboard tool.

A.2 ★☆☆ | Recall

What is the BytesIO pattern and why is it used for chart generation in reports?

Guidance

`io.BytesIO` creates an in-memory buffer that acts like a file. Use `fig.savefig(buf, format="png")` to write the chart into the buffer, then `buf.getvalue()` returns the bytes. This avoids writing temporary files to disk, which simplifies deployment and cleanup.
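A small standard-library sketch of the file-like behavior described above. The byte string is a stand-in for what `fig.savefig(buf, format="png")` would write. It shows why `getvalue()` needs no rewind while file-style reads do, which matters when the buffer itself is handed to a library, as with `pdf.image(buf)` in B.3.

```python
import io

buf = io.BytesIO()
buf.write(b"\x89PNG fake image bytes")  # stand-in for fig.savefig(buf, format="png")

# The write left the position at the end, so a file-style read returns nothing yet.
after_write = buf.read()

# getvalue() ignores the current position and returns the whole buffer.
everything = buf.getvalue()

# Libraries that read from a file object need the buffer rewound first.
buf.seek(0)
after_seek = buf.read()
```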

A.3 ★★☆ | Understand

Compare FPDF2 and ReportLab. When should you use each?

Guidance

**FPDF2**: simple API, few dependencies, good for straightforward reports with text, tables, and images. **ReportLab**: more powerful, supports complex layouts (flowables, templates, custom pages), better for advanced typography and specialized documents. Start with FPDF2; upgrade to ReportLab when you hit its limits.

A.4 ★★☆ | Understand

How do you embed an inline image in an HTML email?

Guidance

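Use CID (Content-ID) references. In the HTML, point the image at a Content-ID: `<img src="cid:chart1">`. Attach the image as a MIME part with the matching header: `img.add_header("Content-ID", "<chart1>")` (the angle brackets belong in the header, not in the `src`). Put the HTML and image parts in a `multipart/related` message so email clients render the image inline without fetching an external URL.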

A.5 ★★☆ | Analyze

Why is it critical to validate outputs before sending a scheduled report?

Guidance

A scheduled report that runs successfully but contains wrong or empty data is worse than no report — users receive it, trust it, and make decisions based on incorrect information. Pre-send validation (non-empty DataFrame, expected column counts, sanity checks on metrics) catches upstream data issues before they propagate to the report output.
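A minimal sketch of pre-send validation. To stay dependency-free it takes a list of row dictionaries instead of a DataFrame, and the `checks` mapping (column name to a predicate plus a description) is a hypothetical convention. With pandas, the same tests map onto `df.empty`, `df.columns`, and vectorized comparisons.

```python
def validate_report_data(rows, required_columns, checks=None):
    """Raise ValueError if the data is not fit to publish; return None otherwise."""
    # 1. Non-empty: an upstream failure often shows up as zero rows
    if not rows:
        raise ValueError("no rows to report on")

    # 2. Expected columns (assumes every row has the same keys as the first)
    missing = [col for col in required_columns if col not in rows[0]]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # 3. Sanity checks on metric values
    for column, (predicate, description) in (checks or {}).items():
        failing = [i for i, row in enumerate(rows) if not predicate(row[column])]
        if failing:
            raise ValueError(
                f"{column}: {len(failing)} row(s) fail check '{description}' "
                f"(first at index {failing[0]})"
            )


rows = [
    {"region": "North", "revenue": 1200.0},
    {"region": "South", "revenue": 950.0},
]
validate_report_data(
    rows,
    required_columns=["region", "revenue"],
    checks={"revenue": (lambda v: v >= 0, "revenue >= 0")},
)  # passes silently
```

Call it immediately before building the report, so a failure stops the run instead of producing a plausible-looking but wrong document.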

A.6 ★★★ | Evaluate

A team has a weekly report pipeline that has worked for months but starts producing empty reports due to an upstream database schema change. How could this have been detected sooner?

Guidance

(1) Output validation that raises an exception on empty data. (2) Monitoring that alerts when the report file size drops below a threshold. (3) Cron monitoring that alerts on non-zero exit codes. (4) Schema validation in the data loading step (fail fast if expected columns are missing). (5) Unit tests that run against a sample dataset and verify the report builder produces expected output.
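Points (2) and (3) can be combined in a small post-build check: if the output file is missing or smaller than a threshold, the script exits non-zero so cron monitoring registers a failure. The 10,000-byte default is an assumption; calibrate it from the size of known-good reports.

```python
import os
import sys


def check_output_file(path, min_bytes):
    """Return the file size, or raise if the report is missing or suspiciously small."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"report not written: {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        # An "empty" PDF still has headers, so compare against a realistic floor, not 0
        raise ValueError(f"{path} is {size} bytes, below the {min_bytes}-byte threshold")
    return size


def main(path, min_bytes=10_000):
    """Exit-code wrapper: 0 on success, 1 on failure, for cron or a job monitor."""
    try:
        size = check_output_file(path, min_bytes)
    except (OSError, ValueError) as exc:
        print(f"report check failed: {exc}", file=sys.stderr)
        return 1
    print(f"report ok: {size} bytes")
    return 0


# In the scheduled script: sys.exit(main("/path/to/weekly_report.pdf"))
```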

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Save a matplotlib figure to a BytesIO buffer and read the bytes.

Guidance

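import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; scheduled jobs usually have no display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")  # PNG goes into memory, not disk
buf.seek(0)                   # rewind so file-style readers (e.g. pdf.image) start at byte 0
chart_bytes = buf.getvalue()  # full PNG bytes regardless of position
plt.close(fig)                # free the figure; matters when looping over many charts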

B.2 ★☆☆ | Apply

Create a minimal PDF with FPDF2 containing a title and a paragraph.

Guidance

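from fpdf import FPDF
from fpdf.enums import XPos, YPos

pdf = FPDF()
pdf.add_page()
pdf.set_font("helvetica", "B", 18)
# new_x/new_y replace the deprecated ln=True: move to the start of the next line
pdf.cell(0, 10, "Report Title", new_x=XPos.LMARGIN, new_y=YPos.NEXT, align="C")
pdf.set_font("helvetica", "", 12)
pdf.multi_cell(0, 6, "This is the summary paragraph.")  # wraps long text across lines
pdf.output("report.pdf")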

B.3 ★★☆ | Apply

Add a chart image to an FPDF2 PDF.

Guidance

pdf.image(buf, x=20, w=170)

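Where `buf` is the rewound `BytesIO` object from B.1; FPDF2 accepts file-like objects as well as file paths. `w=170` sets the width to 170 mm (FPDF's default unit), and the height is scaled to keep the image's aspect ratio.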

B.4 ★★☆ | Apply

Create a ReportLab document with a title, paragraph, and table.

Guidance

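from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
data = [["A", "B"], ["1", "2"], ["3", "4"]]  # first row is the header
table = Table(data)
# Without a style the table has no borders: add a grid and a shaded header row
table.setStyle(TableStyle([
    ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
    ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
]))
story = [  # flowables, laid out top to bottom
    Paragraph("Title", styles["Title"]),
    Spacer(1, 12),  # 12-point vertical gap
    Paragraph("Intro paragraph.", styles["BodyText"]),
    table,
]
doc.build(story)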

B.5 ★★☆ | Apply

Create a PowerPoint slide with a title and embedded chart using python-pptx.

Guidance

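from pptx import Presentation
from pptx.util import Inches

prs = Presentation()  # default template: 10 x 7.5 inch slides
slide = prs.slides.add_slide(prs.slide_layouts[5])  # layout 5 is "Title Only" in the default template
slide.shapes.title.text = "Chart Title"
# add_picture also accepts a file-like object, so the BytesIO buffer from B.1 works in place of "chart.png"
slide.shapes.add_picture("chart.png", Inches(1), Inches(2), width=Inches(8))
prs.save("report.pptx")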

B.6 ★★☆ | Apply

Write a Jinja2 template for a report and render it with sample data.

Guidance

from jinja2 import Template
t = Template("<h1>{{ title }}</h1><p>Value: {{ value }}</p>")
html = t.render(title="Report", value=42)
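A slightly fuller sketch for an HTML report: an `Environment` with `autoescape=True` so values rendered into HTML are escaped, a loop over table rows, and Jinja2's built-in `sum` filter for a total. The row data is made up for illustration.

```python
from jinja2 import Environment

# autoescape=True escapes <, >, & in rendered values; a bare Template() does not by default
env = Environment(autoescape=True)
template = env.from_string(
    """<h1>{{ title }}</h1>
<table>
  <tr><th>Region</th><th>Revenue</th></tr>
  {% for row in rows %}
  <tr><td>{{ row.region }}</td><td>{{ "{:,.0f}".format(row.revenue) }}</td></tr>
  {% endfor %}
</table>
<p>Total: {{ "{:,.0f}".format(rows | sum(attribute="revenue")) }}</p>"""
)

html = template.render(
    title="Weekly <Sales> Report",  # the angle brackets will be escaped
    rows=[
        {"region": "North", "revenue": 1200.0},
        {"region": "South", "revenue": 950.0},
    ],
)
```

Keeping the template in a separate `.html` file and loading it with a `FileSystemLoader` lets non-programmers edit the layout without touching the pipeline code.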

B.7 ★★★ | Apply

Send an HTML email with an inline chart image using smtplib.

Guidance

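import os
import smtplib
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# "related" ties the HTML body and the images it references into one unit
msg = MIMEMultipart("related")
msg["Subject"] = "Report"
msg["From"] = "from@example.com"
msg["To"] = "to@example.com"
# cid:chart1 points at the part whose Content-ID header is <chart1>
msg.attach(MIMEText('<img src="cid:chart1">', "html"))

img = MIMEImage(chart_bytes, _subtype="png")  # chart_bytes from B.1
img.add_header("Content-ID", "<chart1>")
img.add_header("Content-Disposition", "inline")
msg.attach(img)

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()  # upgrade to TLS before sending credentials
    # Read credentials from the environment instead of hard-coding them (variable names are placeholders)
    server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
    server.send_message(msg)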
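On Python 3.6 and later, the `email.message.EmailMessage` API can build the same structure with less manual MIME assembly, and `email.utils.make_msgid()` generates a unique Content-ID instead of a hard-coded one. This sketch follows the pattern in the standard library's email examples and only builds the message; it can be sent with `server.send_message(msg)` exactly as in the code above. The placeholder bytes stand in for the PNG from B.1, and `build_report_email` is a hypothetical helper name.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_report_email(chart_png, sender, recipient, subject="Weekly Report"):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient

    # Plain-text part for clients that do not render HTML
    msg.set_content("This week's report chart is embedded in the HTML version.")

    # make_msgid returns "<unique@domain>"; the src attribute uses it without the angle brackets
    cid = make_msgid(domain="example.com")
    msg.add_alternative(
        f'<h1>{subject}</h1><img src="cid:{cid[1:-1]}" alt="Weekly chart">',
        subtype="html",
    )

    # Attach the image to the HTML part, which turns that part into multipart/related
    html_part = msg.get_payload()[1]
    html_part.add_related(chart_png, maintype="image", subtype="png", cid=cid)
    return msg


msg = build_report_email(b"\x89PNG placeholder", "from@example.com", "to@example.com")
```

The resulting tree is `multipart/alternative` (plain text, then `multipart/related` holding the HTML and the image), which degrades gracefully in text-only clients.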

B.8 ★★☆ | Apply

Write a cron entry that runs a Python script every Monday at 9 AM.

Guidance

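0 9 * * 1 /usr/bin/python3 /path/to/generate_report.py >> /path/to/report.log 2>&1

The five fields are minute (0), hour (9), day of month (*), month (*), and day of week (1 = Monday). Cron runs jobs with a minimal environment, so use absolute paths; the `>> ... 2>&1` redirect appends both output and errors to a log file so failures leave a trace. The time is normally interpreted in the server's local time zone.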

B.9 ★★☆ | Apply

Add page numbers to an FPDF2 PDF by subclassing FPDF and overriding footer.

Guidance

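from fpdf import FPDF

class MyPDF(FPDF):
    def footer(self):  # called automatically at the bottom of every page
        self.set_y(-15)  # 15 mm from the bottom edge
        self.set_font("helvetica", "", 8)
        self.cell(0, 10, f"Page {self.page_no()}", align="C")

pdf = MyPDF()
pdf.add_page()
pdf.output("report.pdf")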

B.10 ★★★ | Create

Build a complete parameterized report pipeline that takes a date range, generates 3 charts, assembles a PDF with a summary, and saves it to disk.

Guidance

Follow the Section 31.10 example as a template. Function signature: `generate_report(start_date, end_date, output_path)`. Steps: load data, compute metrics, build charts to BytesIO, build PDF with title/summary/charts, save.

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

You have a daily report that takes 5 minutes to generate and sometimes fails silently. Diagnose and fix.

Guidance

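(1) Wrap the job in `try/except` and send a notification (for example, email or chat) on failure. (2) Log to a centralized system so failures leave a searchable trace. (3) Add cron monitoring with a heartbeat service such as Healthchecks.io, which alerts when an expected ping does not arrive and so also catches jobs that never ran. (4) Profile the 5-minute run to find optimization opportunities; data loading is often the bottleneck. (5) Add output validation to catch stale or empty data.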
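A minimal sketch of points (1) to (3), plus wall-clock timing to support (4). `notify` is any callable you supply (email, chat webhook); the heartbeat URL is a placeholder in the style of Healthchecks.io, which also accepts a `/fail` suffix to report an explicit failure. A failed ping is logged rather than raised so monitoring problems never mask the job's own result.

```python
import logging
import time
import urllib.request

logger = logging.getLogger("reports")


def ping(url):
    """Heartbeat to a cron monitor; a failed ping is logged, never fatal."""
    try:
        urllib.request.urlopen(url, timeout=10)
    except OSError as exc:
        logger.warning("heartbeat to %s failed: %s", url, exc)


def run_monitored(job, notify, healthcheck_url=None):
    """Run job(); log timing, notify on failure, and return an exit code."""
    start = time.perf_counter()
    try:
        job()
    except Exception as exc:
        elapsed = time.perf_counter() - start
        logger.exception("report job failed after %.1fs", elapsed)
        notify(f"Report job failed after {elapsed:.1f}s: {exc!r}")
        if healthcheck_url:
            ping(healthcheck_url + "/fail")  # explicit failure signal
        return 1
    elapsed = time.perf_counter() - start
    logger.info("report job finished in %.1fs", elapsed)  # track run time across days
    if healthcheck_url:
        ping(healthcheck_url)
    return 0
```

In the cron script this might read `sys.exit(run_monitored(build_daily_report, send_alert, "https://hc-ping.com/<check-uuid>"))`, where `build_daily_report` and `send_alert` are hypothetical names for your own functions.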

C.2 ★★★ | Evaluate

When would you choose to use LLMs to generate narrative content in a report, and when would you avoid it?

Guidance

**Use LLMs**: for large-volume personalized reports, multilingual summaries, customer-facing content where writing cost dominates. **Avoid LLMs**: for high-stakes reports where accuracy is paramount, for small-volume reports where the cost exceeds the savings, for privacy-sensitive data that cannot be sent to external APIs. Always validate LLM outputs against the underlying data.
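One concrete way to "validate LLM outputs against the underlying data" is to extract every number from the generated narrative and flag those that match no computed metric. This is a minimal sketch under stated assumptions: metrics live in a flat dictionary, a 1% relative tolerance counts as a match, and numbers that legitimately appear without being metrics (years, counts) would need to be added to the dictionary or whitelisted.

```python
import math
import re

# Integers, thousands separators, and decimals: 42, 125,000, 12.5, -3.2
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def unsupported_numbers(narrative, metrics, rel_tol=0.01):
    """Return numbers in the narrative that match no computed metric within rel_tol."""
    values = [float(m.replace(",", "")) for m in NUMBER.findall(narrative)]
    return [
        v for v in values
        if not any(math.isclose(v, m, rel_tol=rel_tol) for m in metrics.values())
    ]


metrics = {"revenue": 125_000.0, "growth_pct": 12.5}
ok = unsupported_numbers("Revenue reached 125,000, up 12.5% week over week.", metrics)
bad = unsupported_numbers("Revenue reached 131,000, up 12.5%.", metrics)
```

Here `ok` is empty and `bad` is `[131000.0]`; a non-empty result should block the send or route the report to a human reviewer.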

C.3 ★★★ | Create

Build a parameterized pipeline that generates a separate PDF for each of 10 customers, all with personalized charts and tables.

Guidance

Write `generate_customer_report(customer_id, output_path)`. In a loop, call it with each customer ID. Optionally parallelize with `multiprocessing.Pool`. Each report is customized but produced by the same function.
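A minimal sketch of the loop, with two labeled assumptions: `CUSTOMERS` is hypothetical stand-in data, and the function writes HTML rather than PDF so the example runs with only the standard library (swap in an FPDF2 or ReportLab builder from Part B). Each call is wrapped in `try/except` so one bad customer is recorded instead of aborting the whole batch.

```python
import html
from pathlib import Path

# Hypothetical per-customer data; a real pipeline would query a database.
CUSTOMERS = {
    f"C{n:03d}": {"name": f"Customer {n}", "orders": [n, n + 2, n * 3]}
    for n in range(1, 11)
}


def generate_customer_report(customer_id, output_path):
    """Render one customer's report (HTML stands in for the PDF step)."""
    data = CUSTOMERS[customer_id]  # KeyError for an unknown ID
    rows = "".join(
        f"<tr><td>{i + 1}</td><td>{v}</td></tr>" for i, v in enumerate(data["orders"])
    )
    page = (
        f"<h1>Report for {html.escape(data['name'])}</h1>"
        f"<p>Total orders: {sum(data['orders'])}</p>"
        f"<table><tr><th>Period</th><th>Orders</th></tr>{rows}</table>"
    )
    Path(output_path).write_text(page, encoding="utf-8")


def generate_all(customer_ids, out_dir):
    """Generate every report; return {customer_id: error} for the ones that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for cid in customer_ids:
        try:
            generate_customer_report(cid, out_dir / f"report_{cid}.html")
        except Exception as exc:  # one bad customer must not stop the batch
            failures[cid] = repr(exc)
    return failures
```

To parallelize, `multiprocessing.Pool().starmap(generate_customer_report, [(cid, path), ...])` applies the same function across processes; the function must be defined at module level so it can be pickled, and the error handling moves into the worker.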

C.4 ★★★ | Evaluate

The chapter says "reports are often underestimated because they feel old-fashioned compared to dashboards." Do you agree? In your experience, which stakeholders prefer each?

Guidance

Subjective but typical patterns: executives prefer reports (push, digestible, no clicking needed); analysts prefer dashboards (pull, explorable, real-time); regulators require reports (audit trail). The "old-fashioned" reputation often reflects young analysts' preferences rather than user needs.

Chapter 32 covers theming and branding — building a visual identity that applies across all the output formats in this part.