Case Study 2: Real-World Data Stories — What Professional Capstones Look Like
The capstone project in this chapter is a teaching exercise. But the skills it exercises are the same ones that professionals use to produce real-world data stories every day. This case study looks at three examples of professional end-to-end data stories — one from journalism, one from science, and one from business — and examines what makes each effective. These are the kinds of projects you should aspire to after finishing this book.
Example 1: The Pudding's "Film Dialogue" Project
In 2016, the data journalism outlet The Pudding published "Film Dialogue" — an interactive scrollytelling analysis of gendered dialogue in 2,000 screenplays. The project combined data engineering (scraping and parsing thousands of screenplays, linking lines to characters, classifying character gender), statistical analysis (computing dialogue shares by gender, genre, and decade), and rich interactive visualization (scrollytelling with synchronized charts, interactive filters, and explorable data).
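To make the statistical-analysis step concrete, here is a minimal Python sketch of computing dialogue shares by gender for each film. It is not The Pudding's actual code: the record layout, the function name `dialogue_share_by_gender`, and every film, character, and word count below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (film, character, gender, word_count).
# All values are invented; they are not taken from the real dataset.
LINES = [
    ("Film A", "Ana", "female", 1200),
    ("Film A", "Ben", "male", 2800),
    ("Film B", "Cat", "female", 3000),
    ("Film B", "Dan", "male", 1000),
]


def dialogue_share_by_gender(records):
    """Return {film: {gender: share_of_words}}; shares sum to 1 per film."""
    totals = defaultdict(lambda: defaultdict(int))
    for film, _character, gender, words in records:
        totals[film][gender] += words

    shares = {}
    for film, by_gender in totals.items():
        film_total = sum(by_gender.values())
        shares[film] = {g: w / film_total for g, w in by_gender.items()}
    return shares


shares = dialogue_share_by_gender(LINES)
for film, by_gender in shares.items():
    print(film, {g: round(s, 2) for g, s in by_gender.items()})
```

The same grouping extends to genre or decade by adding those fields to each record and to the grouping key.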
The project had multiple outputs: the main interactive article (web, D3.js), a companion dataset (open-sourced on GitHub), a behind-the-scenes blog post documenting the methodology, and social media graphics for promotion. The visual style was consistent across all outputs — The Pudding's distinctive bold typography, dark backgrounds, and clean chart style.
What makes this a "capstone-level" project:
- A specific question: "How does dialogue distribution differ by gender in Hollywood films, and how has this changed over time?"
- Substantial data work: thousands of screenplays processed, characters linked to gender, quality issues handled.
- Multiple visualization types: bar charts, scatter plots, slope charts, small multiples, and custom interactive elements.
- Multi-format output: interactive web article, open data, blog, social graphics.
- Consistent brand: The Pudding's visual identity throughout.
- Narrative structure: the article follows a clear story arc from question to evidence to conclusion.
- Self-critique: the behind-the-scenes post discusses limitations and caveats.
The Pudding team is small: each project is typically the work of one to three people over several weeks. The project demonstrates that a small team with strong skills can produce work that competes with major newsrooms.
Example 2: The IPCC AR6 Summary for Policymakers
The IPCC's AR6 Summary for Policymakers (SPM), published in 2021, is a 42-page document summarizing thousands of pages of climate science for policymakers. The SPM contains about 20 figures, each produced by climate scientists and refined through extensive review and negotiation. The figures cover global temperature projections, emission scenarios, sea level rise, extreme weather attribution, and mitigation pathways.
The SPM figures are among the most carefully produced data visualizations in climate science. Each figure:
- Answers a specific policy-relevant question: "What are the projected temperature outcomes under different emission scenarios?"
- Integrates data from multiple sources: climate models, observational records, statistical analyses.
- Has been through multiple rounds of review: expert review, government review, and a plenary approval session.
- Follows a consistent visual style: standardized colors for emission scenarios (SSPs), consistent typography, panel labels.
- Is accompanied by a detailed caption: describing every element, the data sources, and the uncertainty ranges.
- Is reproducible: the underlying data and code are published alongside the report.
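The consistent scenario styling in the list above can be sketched as one shared lookup that every figure script imports, so a given scenario is always drawn in the same color. This is an illustration only: the hex values are placeholders, not the IPCC's official palette, and `SCENARIO_COLORS` and `color_for` are hypothetical names.

```python
# Placeholder palette: these hex codes are NOT the official IPCC colors.
SCENARIO_COLORS = {
    "SSP1-1.9": "#2a9d8f",
    "SSP1-2.6": "#264653",
    "SSP2-4.5": "#e9c46a",
    "SSP3-7.0": "#f4a261",
    "SSP5-8.5": "#e76f51",
}


def color_for(scenario: str) -> str:
    """Return the shared color for a scenario, failing loudly on unknown names."""
    if scenario not in SCENARIO_COLORS:
        raise ValueError(f"No color defined for scenario {scenario!r}")
    return SCENARIO_COLORS[scenario]


print(color_for("SSP2-4.5"))
```

Raising an error on an unknown name, rather than falling back to a default color, means a typo in a scenario label cannot quietly produce an inconsistent figure.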
The SPM is the scientific equivalent of a capstone: a multi-format, multi-audience, rigorously reviewed project that integrates years of work into a coherent deliverable. The production timeline is measured in years, not weeks, but the principles are the same: clear questions, careful data work, disciplined visualization, consistent branding, and structured critique.
For students, the SPM is worth studying as an aspirational reference. You will not produce something this polished in a course, but the production values — the consistent styling, the thoughtful uncertainty communication, the meticulous captions — are standards to work toward.
Example 3: A Corporate Quarterly Business Review
The third example is less glamorous but more common: a corporate quarterly business review (QBR). At many companies, the data team produces a QBR presentation each quarter: a slide deck with 15-20 slides showing revenue trends, customer metrics, product adoption, market comparisons, and forecasts. The QBR is presented to the executive team and archived for reference.
A well-produced QBR has:
- A clear structure: executive summary (1-2 slides), financial overview (3-4 slides), customer metrics (3-4 slides), product metrics (3-4 slides), competitive landscape (1-2 slides), forecast and risks (2-3 slides), appendix (as needed).
- Consistent branding: company colors, fonts, and logo on every slide.
- Action titles on every chart: "Revenue grew 12% YoY, driven by Enterprise tier" rather than "Quarterly revenue."
- Source attribution: data source and date cutoff on every chart.
- Error bars or confidence ranges on forecasts: never a single line without uncertainty.
- A companion dashboard: a Streamlit or Looker dashboard that stakeholders can explore between QBR sessions.
- An automated pipeline: the QBR is generated from a parameterized script that pulls data from the warehouse, builds the charts, and assembles the deck. The quarterly manual effort is editorial (writing commentary, choosing emphasis), not technical (building charts).
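The parameterized-pipeline idea, together with the action-title and source-attribution items above, can be sketched in a few lines of Python. Everything here is an assumption for illustration: `QBRParams`, `load_revenue`, `action_title`, and `build_slide` are hypothetical names, the revenue numbers are invented, and `load_revenue` stands in for a real warehouse query. A real pipeline would also render charts and assemble the deck.

```python
from dataclasses import dataclass


@dataclass
class QBRParams:
    quarter: str      # e.g. "2024-Q3" (hypothetical)
    data_cutoff: str  # ISO date of the last included data point
    source: str       # human-readable data source name


def load_revenue(params: QBRParams) -> dict:
    """Stand-in for a warehouse query; returns invented numbers."""
    return {"current": 11.2, "year_ago": 10.0, "top_segment": "Enterprise"}


def action_title(rev: dict) -> str:
    """Turn the numbers into a takeaway headline instead of a generic label."""
    growth = (rev["current"] / rev["year_ago"] - 1) * 100
    direction = "grew" if growth >= 0 else "fell"
    return (
        f"Revenue {direction} {abs(growth):.0f}% YoY, "
        f"driven by {rev['top_segment']} tier"
    )


def build_slide(params: QBRParams) -> dict:
    """Assemble one slide's text fields from the quarter's parameters."""
    rev = load_revenue(params)
    return {
        "quarter": params.quarter,
        "title": action_title(rev),
        "footer": f"Source: {params.source}; data through {params.data_cutoff}",
    }


slide = build_slide(QBRParams("2024-Q3", "2024-09-30", "finance warehouse"))
print(slide["title"])
print(slide["footer"])
```

Rerunning the script with next quarter's parameters regenerates the same slide structure. Only the commentary needs a human.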
The QBR is the business equivalent of the capstone: a recurring, multi-format, branded project that integrates data, visualization, narrative, and delivery. Many data teams spend more time on QBRs than on any other single output. Building the QBR pipeline is one of the highest-impact things a data team can do.
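The checklist item about never showing a forecast as a single line can also be illustrated briefly. The sketch below uses a deliberately naive method, chosen only for brevity: it projects the last value forward by the mean historical growth rate and widens it by one standard deviation of past growth. This is not a statistically rigorous prediction interval, the function name `naive_forecast_range` is hypothetical, and the revenue series is invented.

```python
from statistics import mean, stdev


def naive_forecast_range(history):
    """Return low/mid/high for the next period from past period-over-period growth.

    Naive illustration: mid = last * (1 + mean growth),
    low/high = mid shifted by one standard deviation of past growth rates.
    Needs at least three observations (two growth rates) for stdev.
    """
    growth = [b / a - 1 for a, b in zip(history, history[1:])]
    g, s = mean(growth), stdev(growth)
    last = history[-1]
    return {
        "low": last * (1 + g - s),
        "mid": last * (1 + g),
        "high": last * (1 + g + s),
    }


revenue = [10.0, 10.5, 10.8, 11.2]  # invented quarterly revenue
forecast = naive_forecast_range(revenue)
print({k: round(v, 2) for k, v in forecast.items()})
```

On a chart, the low and high values would be drawn as a shaded band around the mid line, so the audience sees the range at a glance.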
What These Examples Share
Despite their different domains (journalism, science, business), these three examples share common features:
1. A clear question driving the work. The Pudding asks about gendered dialogue. The IPCC asks about climate projections. The QBR asks about business performance. Each project starts with a question, and every chart serves that question.
2. Substantial data work behind the visualizations. The charts are the tip of the iceberg. Most of the effort goes into data collection, cleaning, analysis, and validation. The visualization is the last step, not the whole project.
3. Multiple output formats. Each project produces several kinds of output for different audiences. The Pudding has a web article, open data, a blog post, and social graphics. The IPCC has the SPM, the full report, and a data archive. The QBR has slides, a dashboard, and data tables.
4. Consistent visual identity. Each project uses a brand system: The Pudding's bold style, the IPCC's standardized scenario colors, the company's corporate brand. Consistency makes the project feel cohesive and professional.
5. Critique and review. The Pudding gets editorial review. The IPCC gets expert and government review. The QBR gets executive feedback. No project ships without someone else looking at it.
6. Narrative structure. Each project tells a story: setup (context), development (evidence), conclusion (takeaway). The structure is not accidental; it is designed.
7. Reproducibility. The Pudding open-sources its data and code. The IPCC publishes data and methods. The QBR has a parameterized pipeline. Each project can be regenerated from its source materials.
These seven features are exactly what the capstone chapter asks you to produce. The capstone is not merely an academic exercise; it is practice for the real work these professionals do every day.
The Gap Between Capstone and Professional Work
The capstone is a close approximation of real-world work, but there are some differences:
Time: the capstone takes 2-4 weeks. Professional projects range from several weeks (a Pudding story) to years (the IPCC SPM took ~6 years).
Team: the capstone is solo. Professional projects usually involve a team, from a handful of generalists to groups of specialists (data engineers, designers, developers, editors).
Data access: the capstone uses a clean, pre-curated dataset. Professional projects involve messy data from multiple sources with quality issues.
Stakeholder management: the capstone has no real stakeholders. Professional projects involve negotiating with editors, executives, reviewers, and clients about what to include and how to present it.
Technical scale: the capstone is modest in data size and computational complexity. Professional projects may involve billions of rows, distributed computing, and custom infrastructure.
These differences are real, but the core skills — the 8-step workflow, the critique rubric, the brand system, the multi-format output — are the same. The capstone teaches the skills at a manageable scale; professional work applies them at a larger one. The gap is one of scale and context, not of kind.
What to Do After the Capstone
The capstone is a beginning, not an end. After completing it, consider:
Build a portfolio. Your capstone and independent project are portfolio pieces. Publish them on a personal website or GitHub. Link to them in your resume.
Seek feedback. Show your work to people outside the course: colleagues, mentors, online communities (Data Visualization Society, Reddit r/dataisbeautiful, Twitter/X data viz community). External feedback builds skills that self-critique cannot.
Take on a real project. Apply the skills to a real-world question: a work project, a volunteer project, a personal interest. The transition from exercises to real work is where the skills solidify.
Specialize. You now have broad skills across many tools and techniques. Consider specializing in one area: scientific visualization, data journalism, dashboard development, or another niche. Depth in one area is often more marketable than breadth across all of them.
Keep learning. The field evolves constantly. New libraries appear, new techniques are developed, new best practices emerge. Read, practice, and stay current. The Further Reading sections of this book point to ongoing resources.
Discussion Questions
- On The Pudding example. The Film Dialogue project used D3.js, not Plotly or matplotlib. Does the tool matter, or is the workflow transferable across tools?
- On the IPCC example. The SPM figures are among the most carefully produced scientific visualizations in the world. What makes them trustworthy beyond just the styling?
- On the QBR example. Many data teams spend most of their time on recurring reports. Is this a good use of their skills, or should automation handle more of it?
- On the gap. The capstone is simpler than professional work in several ways. Which simplification is the most significant, and how would you prepare for the professional version?
- On specialization. The chapter suggests specializing after the capstone. Which area of data visualization are you most drawn to, and why?
- On your own capstone. After seeing these professional examples, what would you change about your capstone project?
Professional data stories — from The Pudding, the IPCC, and corporate QBRs — share the same features the capstone asks you to develop: clear questions, substantial data work, multiple outputs, consistent branding, narrative structure, critique, and reproducibility. The capstone is practice for this real work. The skills you have developed in 34 chapters are the same skills these professionals use. The difference is scale and context, not kind. What you build next is up to you.