Exercises: Capstone — The Complete Data Story

These exercises are the capstone project deliverables. Each one is a substantial piece of work, not a quick problem. Allow 2-4 weeks for independent completion.

Part A: Climate Capstone Deliverables (10 items)

A.1 ★★★ | Create

Write a one-page project brief: the question, the audience, the data, and the planned deliverables. This is Step 1 of the workflow from Chapter 33.

Guidance

One page, covering: (1) the specific question ("How has global warming accelerated, and what does the multi-variable evidence show?"), (2) the audience (a general science-literate reader), (3) the data (climate dataset: temperature, CO2, sea level, year, era), (4) the deliverables (6 static figures, 1 dashboard, 1 PDF report, 1 slide deck). This document guides every subsequent step.

A.2 ★★★ | Create

Produce an exploratory Jupyter notebook that loads the climate data, assesses quality, and generates 10+ quick charts. Document what you found.

Guidance

Use pandas for assessment (`df.info()`, `df.describe()`, `df.isnull().sum()`). Quick charts with matplotlib/seaborn: histograms of each variable, time series of each variable, scatter of CO2 vs. temperature, correlation heatmap, box plots by era. Annotate findings: "temperature distribution is right-skewed in the modern era", "CO2 has a strong linear relationship with temperature", etc.
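A minimal sketch of the assessment step, assuming a tidy table with one row per year. The column names (`temperature_anomaly`, `co2_ppm`) and the synthetic data are placeholders, not the book's actual dataset; swap in `pd.read_csv(...)` on your own file.

```python
import numpy as np
import pandas as pd


def assess_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise dtype, missing values, and distinct values per column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isnull().sum(),
            "pct_missing": (df.isnull().mean() * 100).round(1),
            "n_unique": df.nunique(),
        }
    )


# Synthetic stand-in for the climate data (column names are assumptions).
rng = np.random.default_rng(0)
years = np.arange(1900, 2020)
df = pd.DataFrame(
    {
        "year": years,
        "temperature_anomaly": np.linspace(-0.3, 1.0, years.size)
        + rng.normal(0, 0.1, years.size),
        "co2_ppm": np.linspace(295, 410, years.size),
    }
)
df.loc[[3, 17], "co2_ppm"] = np.nan  # inject two gaps so the report has something to catch

report = assess_quality(df)
print(report)

# Quick look at the pairwise relationship before drawing the scatter plot.
print(df[["temperature_anomaly", "co2_ppm"]].corr().round(2))
```

Writing the quality table once as a function makes it easy to rerun after each cleaning step and to paste the before/after versions into the notebook as documented findings.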

A.3 ★★★ | Create

Produce 6 publication-quality static figures, each answering a different question about the climate data. Apply the brand from Chapter 32.

Guidance

Suggested 6 figures: (1) Full temperature time series with 10-year rolling mean and annotations. (2) CO2 vs. temperature scatter with regression. (3) Monthly heatmap of temperature anomalies. (4) Multi-panel small multiples (temperature, CO2, sea level). (5) Distributional comparison (violin plots by era). (6) Pair plot of all variables colored by era. Each figure must have an action title, source attribution, brand colors, and a caption.
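As an illustration of figure (1), the sketch below draws an annual series with a 10-year rolling mean, an action title, one annotation, and a source line. The data is synthetic, the colors are placeholders rather than the Chapter 32 brand, and the title wording is only an example of the action-title pattern.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic annual anomalies standing in for the real data.
rng = np.random.default_rng(1)
years = np.arange(1900, 2021)
anomaly = pd.Series(
    np.linspace(-0.3, 1.0, years.size) + rng.normal(0, 0.12, years.size),
    index=years,
)
rolling = anomaly.rolling(window=10, center=True).mean()  # 10-year rolling mean

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(anomaly.index, anomaly, color="0.7", linewidth=1, label="Annual anomaly")
ax.plot(rolling.index, rolling, color="#C0392B", linewidth=2.5, label="10-year rolling mean")
ax.axhline(0, color="0.4", linewidth=0.8)

# Action title: state the finding, not just the variable name.
ax.set_title(
    "Illustrative: warming is clearest in the smoothed series",
    loc="left",
    fontweight="bold",
)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
ax.legend(frameon=False)

# Annotate the last year that has a complete rolling-mean value.
last_year = rolling.dropna().index[-1]
ax.annotate(
    f"{rolling.loc[last_year]:+.2f} °C",
    xy=(last_year, rolling.loc[last_year]),
    xytext=(-60, 15),
    textcoords="offset points",
    arrowprops={"arrowstyle": "->"},
)

# Source attribution in the bottom-left corner of the figure.
fig.text(0.01, 0.01, "Source: synthetic data for illustration", fontsize=8, color="0.4")
fig.savefig("figure_01.png", dpi=300, bbox_inches="tight")
```

A centered window leaves the first and last few years without a smoothed value, which is worth mentioning in the caption so readers do not read the gap as missing data.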

A.4 ★★★ | Create

Build a Streamlit dashboard for the climate dataset with sidebar filters, multiple chart tabs, KPI metrics, and a download button.

Guidance

Structure: sidebar with date range slider, variable selectbox, and smoothing slider. Main area with tabs: "Time Series" (Plotly line chart), "Relationships" (scatter), "Distributions" (violin/box), "Raw Data" (st.dataframe). KPI cards at the top: mean anomaly, max anomaly, current CO2. Download button for filtered data.
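One possible way to organize `app.py` is to keep the filtering and KPI logic in plain pandas functions and leave the Streamlit calls in a thin layout function. The sketch below makes several assumptions: the column names are placeholders, `st.line_chart` stands in for the Plotly chart to keep the example short, and only two of the four tabs are shown.

```python
import pandas as pd


def filter_and_smooth(df, year_range, variable, window):
    """Return the selected years with a rolling mean of the chosen variable."""
    start, end = year_range
    out = df.loc[df["year"].between(start, end), ["year", variable]].copy()
    out["smoothed"] = out[variable].rolling(window, min_periods=1).mean()
    return out


def compute_kpis(df):
    """KPI values for the cards at the top of the page."""
    latest = df.sort_values("year").iloc[-1]
    return {
        "Mean anomaly": float(df["temperature_anomaly"].mean()),
        "Max anomaly": float(df["temperature_anomaly"].max()),
        "Current CO2": float(latest["co2_ppm"]),
    }


def render(df):
    """Streamlit layout. Call this at the bottom of app.py."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    lo, hi = int(df["year"].min()), int(df["year"].max())
    year_range = st.sidebar.slider("Year range", lo, hi, (lo, hi))
    variable = st.sidebar.selectbox("Variable", ["temperature_anomaly", "co2_ppm"])
    window = st.sidebar.slider("Smoothing window (years)", 1, 20, 5)

    view = filter_and_smooth(df, year_range, variable, window)
    in_range = df[df["year"].between(*year_range)]
    for col, (label, value) in zip(st.columns(3), compute_kpis(in_range).items()):
        col.metric(label, f"{value:.2f}")

    ts_tab, raw_tab = st.tabs(["Time Series", "Raw Data"])
    with ts_tab:
        st.line_chart(view.set_index("year"))
    with raw_tab:
        st.dataframe(view)
    st.download_button(
        "Download filtered data",
        view.to_csv(index=False),
        file_name="filtered.csv",
        mime="text/csv",
    )


# Tiny demo of the pure functions (values are made up).
demo = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2003],
        "temperature_anomaly": [0.4, 0.5, 0.6, 0.9],
        "co2_ppm": [369.0, 371.0, 373.0, 375.0],
    }
)
view = filter_and_smooth(demo, (2001, 2003), "temperature_anomaly", window=2)
kpis = compute_kpis(demo)
```

To run it as a dashboard, load your data in `app.py`, end the file with a `render(df)` call, and start it with `streamlit run dashboard/app.py`. Keeping the helpers free of Streamlit calls means they can be reused by the PDF report and checked in a notebook.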

A.5 ★★★ | Create

Build an automated PDF report pipeline that generates a monthly climate report with 4 charts, a summary, and a data table.

Guidance

Use FPDF2 or ReportLab. Parameterize by date range. Include: title page, summary paragraph (computed from metrics), 4 charts (time series, scatter, heatmap, bar by era), data table of monthly means, source attribution, page numbers. Save as PDF. Verify by opening the PDF.
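The "summary paragraph (computed from metrics)" and the date-range parameter can be prototyped before any PDF library is involved. The sketch below uses only the standard library; the function name, the `(date, anomaly)` row format, and the sample values are assumptions made for illustration.

```python
from datetime import date
from statistics import mean


def build_summary(rows, start, end):
    """Compose the summary paragraph for rows dated within [start, end].

    `rows` is a list of (date, anomaly_celsius) pairs, one per month.
    """
    selected = [(d, v) for d, v in rows if start <= d <= end]
    if not selected:
        return f"No observations between {start:%B %Y} and {end:%B %Y}."
    values = [v for _, v in selected]
    peak_date, peak_value = max(selected, key=lambda pair: pair[1])
    return (
        f"Between {start:%B %Y} and {end:%B %Y}, the mean temperature anomaly was "
        f"{mean(values):+.2f} °C across {len(values)} monthly observations. "
        f"The largest anomaly, {peak_value:+.2f} °C, occurred in {peak_date:%B %Y}."
    )


# Made-up monthly values; the April row falls outside the requested range.
rows = [
    (date(2020, 1, 1), 0.90),
    (date(2020, 2, 1), 1.10),
    (date(2020, 3, 1), 1.00),
    (date(2020, 4, 1), 0.80),
]
summary = build_summary(rows, date(2020, 1, 1), date(2020, 3, 31))
print(summary)
```

Because every number in the paragraph is derived from the filtered rows, rerunning the pipeline for a different date range updates the text and the charts together, and the string can be handed to whichever PDF library you choose.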

A.6 ★★★ | Create

Create a 10-slide presentation (python-pptx) telling the climate data story following Chapter 9's narrative structure.

Guidance

Suggested slides: (1) Title. (2) The question. (3) Context: what the data covers. (4) The trend: temperature over time. (5) The driver: CO2 relationship. (6) The evidence: multi-variable correlations. (7) The regional view: map or regional comparison, if your data includes a regional breakdown. (8) The seasonal view: heatmap or cycle plot. (9) The dashboard: screenshot or summary of the Streamlit app. (10) Conclusion and next steps. Each slide has one chart and one key message. Use python-pptx with the brand template.
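The ten-slide outline can be kept as data and turned into a deck in one loop. In the sketch below, the slide titles come from the outline above, while the mapping of slides to figure files is a guess based on the A.3 figure list, and the output filename is hypothetical. The python-pptx import sits inside the function so the plan itself can be inspected without the library installed.

```python
from pathlib import Path

# (slide title, chart file or None). The chart assignments are assumptions.
SLIDE_PLAN = [
    ("Title", None),
    ("The question", None),
    ("Context: what the data covers", None),
    ("The trend: temperature over time", "figures/figure_01.png"),
    ("The driver: CO2 relationship", "figures/figure_02.png"),
    ("The evidence: multi-variable correlations", "figures/figure_06.png"),
    ("The regional view", None),
    ("The seasonal view", "figures/figure_03.png"),
    ("The dashboard", None),
    ("Conclusion and next steps", None),
]


def build_deck(plan, out_path="climate_story.pptx"):
    """Create one slide per plan entry and save the deck."""
    from pptx import Presentation  # python-pptx, imported lazily
    from pptx.util import Inches

    prs = Presentation()  # pass a .pptx path here to start from a brand template
    title_only = prs.slide_layouts[5]  # "Title Only" in the default template
    for title, chart in plan:
        slide = prs.slides.add_slide(title_only)
        slide.shapes.title.text = title
        # Skip charts that have not been generated yet instead of failing.
        if chart and Path(chart).exists():
            slide.shapes.add_picture(chart, Inches(0.75), Inches(1.5), width=Inches(8.5))
    prs.save(out_path)
    return out_path
```

Calling `build_deck(SLIDE_PLAN)` after the figures exist produces the deck. The layout index refers to python-pptx's built-in default template, so check it again if you load your own brand template.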

A.7 ★★★ | Evaluate

Apply the Master Critique Rubric from Chapter 33 to each of your 6 static figures. Document your scores and any issues found.

Guidance

Use the 25-point rubric: data integrity (4), encoding (4), design (4), accessibility (3), ethics (3), narrative (4), brand (3). Score each figure. Note items that score below expectations. Fix critical issues; document non-critical ones.
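Scoring six figures by hand is error-prone, so a small helper can total the points and flag weak categories. The category maxima below are taken from the rubric above; the 75% flagging threshold and the example scores are assumptions, since the rubric leaves "below expectations" for you to define.

```python
# Maximum points per category, as listed in the 25-point rubric.
RUBRIC_MAX = {
    "data integrity": 4,
    "encoding": 4,
    "design": 4,
    "accessibility": 3,
    "ethics": 3,
    "narrative": 4,
    "brand": 3,
}


def score_figure(scores, flag_below=0.75):
    """Return (total, flagged categories) for one figure.

    A category is flagged when it earns less than `flag_below` of its maximum.
    """
    if set(scores) != set(RUBRIC_MAX):
        raise ValueError("scores must cover exactly the rubric categories")
    for category, points in scores.items():
        if not 0 <= points <= RUBRIC_MAX[category]:
            raise ValueError(f"{category}: {points} is outside 0..{RUBRIC_MAX[category]}")
    total = sum(scores.values())
    flagged = [c for c in RUBRIC_MAX if scores[c] / RUBRIC_MAX[c] < flag_below]
    return total, flagged


# Example scores for one figure (made up).
figure_1 = {
    "data integrity": 4,
    "encoding": 3,
    "design": 4,
    "accessibility": 1,
    "ethics": 3,
    "narrative": 2,
    "brand": 3,
}
total, flagged = score_figure(figure_1)
print(f"{total}/{sum(RUBRIC_MAX.values())}", flagged)
```

The flagged list is a starting point for the "fix critical issues; document non-critical ones" step. Which flagged items count as critical remains a judgment call.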

A.8 ★★★ | Apply

Apply consistent branding across all outputs: the 6 figures, the dashboard, the PDF report, and the slide deck should all use the same color palette, fonts, and title style.

Guidance

Use the brand module from [Chapter 32](../../part-07-dashboards-production/chapter-32-theming-branding-style-guides/index.md): a shared `.mplstyle` file, a Plotly template, and helper functions. Import at the top of every script. Verify consistency by placing a figure from each output side by side.
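A shared brand module might look like the following sketch. It is not the Chapter 32 module: the palette, font, and function names are placeholders, and `rcParams.update` is used here as an in-code stand-in for loading a `.mplstyle` file with `plt.style.use(...)`.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the module imports cleanly anywhere
import matplotlib.pyplot as plt
from cycler import cycler

# Placeholder brand values; replace with your own style guide.
PALETTE = ["#1B4F72", "#C0392B", "#117A65", "#B9770E"]
FONT_FAMILY = "DejaVu Sans"


def apply_brand():
    """Apply the brand to every matplotlib figure created afterwards."""
    plt.rcParams.update(
        {
            "axes.prop_cycle": cycler(color=PALETTE),
            "font.family": FONT_FAMILY,
            "axes.titlelocation": "left",
            "axes.titleweight": "bold",
            "axes.spines.top": False,
            "axes.spines.right": False,
        }
    )


def plotly_template():
    """Layout settings mirroring the matplotlib brand, as a plain dict."""
    return {
        "layout": {
            "colorway": PALETTE,
            "font": {"family": FONT_FAMILY},
            "title": {"x": 0.0, "xanchor": "left"},
        }
    }


apply_brand()
fig, ax = plt.subplots()
line_a, = ax.plot([0, 1], [0, 1])
line_b, = ax.plot([0, 1], [1, 0])
```

In a Plotly script, the dict can be passed as `fig.update_layout(template=plotly_template())`. Defining the palette once and deriving both the matplotlib and Plotly settings from it is what keeps the figures, dashboard, report, and slides visually aligned.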

A.9 ★★★ | Evaluate

Write a 1-page reflection on the capstone process: what worked, what was hardest, what you would change, what surprised you.

Guidance

Honest self-assessment. Common themes: "data assessment was more valuable than I expected", "the critique step caught issues I would have shipped", "branding made everything look more professional with minimal effort", "the hardest part was the PDF report because of font issues." The reflection is as important as the deliverables.

A.10 ★★★ | Create

Archive the complete project: all source code in a git repository, with a README documenting the structure, requirements, and how to reproduce each output.

Guidance

climate-capstone/
  README.md
  requirements.txt
  data/climate.csv
  notebooks/exploration.ipynb
  figures/figure_01.png ... figure_06.png
  dashboard/app.py
  report/generate_report.py
  slides/generate_slides.py
  brand/brand.py
  brand/climate_observatory.mplstyle

The README should explain how to reproduce every output. At minimum: run `pip install -r requirements.txt`, then run each script.

Part B: Independent Capstone (6 items)

B.1 ★★★ | Create

Choose a dataset from a different domain. Suggested options: (1) NYC taxi trips (subset), (2) World Bank development indicators, (3) US election results by county, (4) Spotify top tracks, (5) Stack Overflow developer survey, (6) your own dataset from work or research.

Guidance

Choose a dataset you find interesting and that has enough variables for multi-chart exploration. The dataset should have at least 1,000 rows and, unless it is your own, be freely available.

B.2 ★★★ | Create

Write a project brief for the independent capstone: question, audience, data, deliverables.

Guidance

Same structure as A.1 but for the new dataset. The question should be specific and testable.

B.3 ★★★ | Create

Produce an exploratory notebook for the independent dataset.

Guidance

Same structure as A.2. Load, assess, explore with quick charts, document findings.

B.4 ★★★ | Create

Produce 4+ publication-quality static figures for the independent dataset, with branding.

Guidance

Fewer than the climate capstone (4 instead of 6) because this is independent work. Still requires action titles, source attribution, brand application, and captions.

B.5 ★★★ | Create

Build either a Streamlit dashboard or an automated PDF report for the independent dataset (your choice).

Guidance

Choose the output format that best fits the audience. A dashboard is better for exploration; a report is better for delivery. Whichever you choose, apply the brand.

B.6 ★★★ | Evaluate

Apply the critique rubric to the independent capstone, write a reflection, and archive the project.

Guidance

Same rubric and reflection as A.7 and A.9. Archive the code in a repository with a README.

Part C: Meta-Reflection (4 items)

C.1 ★★★ | Evaluate

Compare your first matplotlib chart (from Chapter 10 or 11) to one of your capstone figures. What has changed? What skills did you develop?

Guidance

Pull up your earliest chart from the book's exercises and place it next to a capstone figure. The difference should be dramatic: default styling vs. branded, generic title vs. action title, no annotations vs. rich annotations. List the specific improvements and trace each to the chapter that taught it.

C.2 ★★★ | Evaluate

Which chapter was the most valuable to you personally, and why?

Guidance

This is subjective. Common answers: [Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md) (action titles), [Chapter 12](../../part-03-matplotlib/chapter-12-customization-mastery/index.md) (customization), [Chapter 20](../../part-05-interactive/chapter-20-plotly-express/index.md) (Plotly for interactivity), [Chapter 33](../../part-07-dashboards-production/chapter-33-visualization-workflow/index.md) (workflow). The answer depends on your background and goals. The exercise is to identify what you value most so you can invest in that area going forward.

C.3 ★★★ | Evaluate

What skill gap do you still have after finishing this book? What is your plan to address it?

Guidance

Honest self-assessment. Common gaps: D3/JavaScript for custom web viz, advanced statistical methods, specific domain knowledge, design skills beyond what a style sheet can provide. Plan: read a specific book, take a specific course, build a specific project. The exercise closes the loop from "what I learned" to "what I will learn next."

C.4 ★★★ | Create

Build a personal visualization portfolio: 5-10 of your best charts from this book, with a brief description of each and what it demonstrates.

Guidance

Pick your best work across the book. For each chart: a thumbnail, the question it answers, the technique it demonstrates, and the chapter it came from. Publish as a GitHub Pages site, a personal website section, or a PDF portfolio. This is a professional asset for job interviews and client pitches.

With the capstone complete, you have a full portfolio of visualization work spanning every technique in the book. Chapter 35 is the Visualization Gallery — a permanent reference of 50 chart types with code for each.