Quiz: Automated Reporting
Part I: Multiple Choice (10 questions)
Q1. Which Python library is the simplest option for generating PDFs?
A) ReportLab B) FPDF2 C) WeasyPrint D) pdfkit
Answer
**B.** FPDF2 has a simple API and few dependencies. ReportLab is more capable but more complex; WeasyPrint and pdfkit are HTML-to-PDF converters.
Q2. What is `io.BytesIO` used for in report generation?
A) Reading CSV files B) An in-memory buffer for chart images C) Network sockets D) Database connections
Answer
**B.** `BytesIO` acts as an in-memory file. You save a matplotlib figure to it and then read the bytes for embedding in PDFs, emails, or slides, with no temporary files required.
Q3. Which library generates PowerPoint files?
A) python-pptx B) pptxgen C) reportlab-ppt D) pptlib
Answer
**A.** `python-pptx` (`pip install python-pptx`) is the standard third-party library for reading and writing `.pptx` files.
Q4. What is CID in the context of HTML emails?
A) Content-ID — a reference for inline attached images B) Common Image Data C) Client ID D) Chart Identifier
Answer
**A.** Content-ID lets an HTML email reference inline image attachments via a `cid:` URL in the `<img>` tag's `src` attribute.
Q5. Which syntax does Jinja2 use for variable substitution?
A) ${variable}
B) {{ variable }}
C) <%= variable %>
D) #{variable}
Answer
**B.** Jinja2 uses `{{ variable }}` for expressions and `{% statement %}` for control flow (`if`, `for`).
Q6. Which cron entry runs a script every Monday at 9 AM?
A) 0 9 * * 1 /path/to/script.py
B) 0 9 1 * * /path/to/script.py
C) 9 0 * * 1 /path/to/script.py
D) 0 9 * 1 * /path/to/script.py
Answer
**A.** Cron syntax is `minute hour day-of-month month day-of-week`, followed by the command. Day-of-week 1 is Monday, so `0 9 * * 1` means "at 9:00, on any day of the month, in any month, when the day is Monday."
Q7. What does `buf.seek(0)` do after `fig.savefig(buf)`?
A) Clears the buffer B) Rewinds the buffer to position 0 for reading C) Writes the buffer to disk D) Returns the buffer size
Answer
**B.** Writing to a `BytesIO` advances the position. `seek(0)` rewinds to the start so that subsequent reads return the full content.
Q8. Which report format is best for stakeholders who want to do further analysis?
A) PDF B) PNG C) Excel/CSV D) PowerPoint
Answer
**C.** Excel and CSV let the stakeholder sort, filter, pivot, and extend the data. PDFs are read-only. Use Excel/CSV when the audience will analyze the data further, and PDF when the conclusions are the main deliverable.
Q9. What is WeasyPrint used for?
A) Generating SVG charts B) Converting HTML to PDF C) Sending emails D) Compressing images
Answer
**B.** WeasyPrint is a Python library that converts HTML+CSS to PDF. It supports modern CSS (flexbox, grid) and is a good alternative to direct PDF generation when you want rich styling.
Q10. Why should scheduled reports include error handling with notifications?
A) To comply with regulations B) To prevent silent failures that go unnoticed C) To reduce server load D) To encrypt the data
Answer
**B.** A scheduled report that fails silently produces no report, and users don't notice until they ask why they haven't received one. Explicit error handling with notifications (email, Slack, PagerDuty) catches failures immediately.
Part II: Short Answer (10 questions)
Q11. Write code to save a matplotlib figure to a BytesIO buffer.
Answer
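```python
import io

buf = io.BytesIO()  # in-memory file; `fig` is an existing matplotlib Figure
fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
buf.seek(0)  # rewind so the next read starts at the first byte
```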
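To see why the final `seek(0)` matters, here is a small sketch that uses only the standard library. The bytes written are a placeholder standing in for what `fig.savefig` would produce; they are not a real PNG.

```python
import io

buf = io.BytesIO()
# Placeholder bytes standing in for the output of fig.savefig(buf, format="png").
buf.write(b"\x89PNG example bytes")

# After writing, the position sits at the end of the buffer.
position_after_write = buf.tell()

# Reading now returns nothing, because there is no data after the position.
unread = buf.read()

# Rewind, then read: this returns everything that was written.
buf.seek(0)
image_bytes = buf.read()
```

If you only need the raw bytes, `buf.getvalue()` returns the whole buffer regardless of the current position.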
Q12. Explain why a report pipeline should validate its outputs before sending.
Answer
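The basic checks this answer lists can be sketched as a single function. This is an illustration under assumptions, not the chapter's code: `validate_before_send` and its parameters are hypothetical names, and the caller is assumed to pass in the column names and row count (for a DataFrame, something like `list(df.columns)` and `len(df)`).

```python
def validate_before_send(columns, row_count, metrics, required_columns, metric_bounds):
    """Return a list of problems; an empty list means the report may be sent."""
    problems = []

    # Check 1: the query returned at least one row.
    if row_count == 0:
        problems.append("result is empty")

    # Check 2: every expected column is present.
    missing = sorted(set(required_columns) - set(columns))
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")

    # Check 3: key metrics fall inside a plausible range (bounds are inclusive).
    for name, (low, high) in metric_bounds.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            problems.append(f"metric {name}={value!r} outside [{low}, {high}]")

    return problems
```

A pipeline would send the report only when the returned list is empty, and otherwise raise or notify with the collected problems.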
Pre-send validation catches upstream data issues (empty DataFrames, missing columns, unexpected values) that would produce wrong or misleading reports. Users trust scheduled reports, so wrong data is worse than no data. Basic checks: non-empty result, expected column names present, sanity checks on key metrics.
Q13. Write the skeleton of a parameterized report function.
Answer
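```python
def generate_report(start_date, end_date, output_path):
    # load_data, compute_metrics, build_charts, and build_pdf are
    # pipeline-specific helpers defined elsewhere.
    df = load_data(start_date, end_date)
    metrics = compute_metrics(df)
    charts = build_charts(df)
    build_pdf(output_path, metrics, charts)
    return output_path
```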
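A function with this signature can be driven in a loop to produce one report per period. The sketch below assumes a hypothetical helper, `bulk_generate`, and a file-naming scheme chosen only for illustration. It takes `generate_report` as an argument so that it runs with any function of the same shape.

```python
from datetime import date
from pathlib import Path


def bulk_generate(generate_report, periods, output_dir):
    """Call generate_report once per (start_date, end_date) pair."""
    output_dir = Path(output_dir)
    outputs = []
    for start_date, end_date in periods:
        # Hypothetical naming scheme: one file per month, e.g. report_2024-01.pdf
        output_path = output_dir / f"report_{start_date:%Y-%m}.pdf"
        outputs.append(generate_report(start_date, end_date, output_path))
    return outputs


# Usage with a stand-in report function that only returns the path it was given.
periods = [
    (date(2024, 1, 1), date(2024, 1, 31)),
    (date(2024, 2, 1), date(2024, 2, 29)),
]
paths = bulk_generate(lambda start, end, path: path, periods, "reports")
```

The same loop works for per-customer or per-region reports if the configuration tuples carry those values instead of dates.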
Q14. What are the main Python libraries covered in this chapter and what does each do?
Answer
**FPDF2** and **ReportLab** generate PDFs. **python-pptx** generates PowerPoint slides. **smtplib + email.mime** send HTML emails. **Jinja2** renders HTML templates. **openpyxl** writes Excel files. **WeasyPrint** converts HTML to PDF.
Q15. How do you embed an inline image in an HTML email?
Answer
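One way to sketch this with the standard library is shown below. It uses `email.message.EmailMessage` rather than the older `email.mime` classes, which is a substitution made for brevity. The function name `build_inline_image_email` and the `report.example` domain are hypothetical, and the message is built but not sent.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_inline_image_email(png_bytes, sender, recipient, subject):
    """Build (but do not send) an HTML email with one inline PNG."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("This report is best viewed in an HTML-capable mail client.")

    # make_msgid returns an ID wrapped in angle brackets.
    # The domain is a placeholder chosen for this sketch.
    cid = make_msgid(domain="report.example")

    # The HTML refers to the image by its Content-ID without the angle brackets.
    html = f'<html><body><p>Weekly chart:</p><img src="cid:{cid[1:-1]}"></body></html>'
    msg.add_alternative(html, subtype="html")

    # Attach the image to the HTML part as a related, inline resource.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, maintype="image", subtype="png", cid=cid)
    return msg, cid
```

The returned message can then be handed to `smtplib.SMTP.send_message`. The `png_bytes` argument would typically come from a `BytesIO` buffer that a chart was saved into.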
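Use a CID reference in the HTML: the `<img>` tag's `src` points to a `cid:` URL, and the image is attached to the message with a matching `Content-ID` header.
Q16. Describe the try/except pattern for scheduled report error handling.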
Answer
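As one possible shape for this pattern, the sketch below wraps a job in a small helper. The names `run_with_alerts`, `job`, and `notify` are hypothetical; `notify` stands in for whatever sends the email, Slack, or PagerDuty message.

```python
import logging
import traceback

logger = logging.getLogger("report_job")


def run_with_alerts(job, notify):
    """Run job(); on failure, log the traceback, notify, and re-raise."""
    try:
        result = job()
    except Exception:
        tb = traceback.format_exc()
        logger.error("Report job failed:\n%s", tb)
        notify(f"Report job failed:\n{tb}")
        # Re-raise so the failure is not swallowed by this wrapper.
        raise
    logger.info("Report job succeeded: %s", result)
    return result
```

Because the exception is re-raised, a script that calls this at top level and does not catch it exits with a non-zero status when the job fails.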
Wrap the main logic in `try/except`. On success, log and optionally notify. On failure, log the traceback, notify a monitoring system (email, Slack, PagerDuty), and re-raise so the process exits with a non-zero status. Cron wrappers and job monitors can then detect the non-zero exit and alert on it.
Q17. When should you use HTML-to-PDF (WeasyPrint) instead of direct PDF generation (FPDF2/ReportLab)?
Answer
When the report has rich styling, complex layouts, or design input from non-Python developers. HTML/CSS has richer layout features than FPDF2/ReportLab, and designers can mock up HTML/CSS before Python integration. The trade-off is slightly slower rendering and potential font-embedding issues.
Q18. What is bulk generation and why does it benefit from parameterization?
Answer
Bulk generation produces many reports from the same pipeline, for example one per customer, one per region, or one per month. Parameterization lets the same function handle all variations by taking configuration arguments. Without parameterization, you'd write one script per report variant; with it, you write one script and loop over configurations.
Q19. Describe three security considerations for production report pipelines.
Answer
Any three of the following: (1) **Credentials**: use environment variables or secrets management, not hardcoded passwords. (2) **Data access**: limit the database credentials to read-only and to specific tables. (3) **Output retention**: don't keep reports longer than necessary. (4) **Encryption**: password-protect sensitive PDFs; use encrypted email for external delivery. (5) **Audit logging**: record who received what and when.
Q20. The chapter argues that reports complement dashboards rather than competing with them. Explain.
Answer
Dashboards are pull-based (users visit a URL) and interactive (users explore). Reports are push-based (arrive in inboxes) and authorial (tell a specific story). Different stakeholders and decision contexts favor different formats: executives often prefer reports, analysts often prefer dashboards, and regulators require reports. A mature data team produces both and knows when to use each. The tools in this chapter handle the report side; Chapters 29-30 handle the dashboard side.
Scoring Rubric
| Score | Level | Meaning |
|---|---|---|
| 18–20 | Mastery | You can build production report pipelines covering PDFs, emails, slides, and scheduling. |
| 14–17 | Proficient | You know the main libraries; review error handling and scheduling. |
| 10–13 | Developing | You grasp the basics; re-read Sections 31.3-31.9 and work all Part B exercises. |
| < 10 | Review | Re-read the full chapter. |
Chapter 32 moves to theming and branding — building a visual identity that applies across dashboards, reports, and standalone charts.