Chapter 31 Exercises: Communicating Results: Reports, Presentations, and the Art of the Data Story

Contributors to Introduction to Data Science

How to use these exercises: Part A focuses on conceptual understanding — can you distinguish findings from insights and identify audience needs? Part B is applied — you will rewrite, restructure, and critique real communications. Part C involves creating communication artifacts. Part D pushes you toward synthesis and critical evaluation.

Difficulty key: ⭐ Foundational | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Extension

Part A: Conceptual Understanding ⭐

Exercise 31.1 — Findings vs. insights

For each of the following findings, rewrite it as an insight by adding context, interpretation, and a suggested action.

1. "Customer churn increased 8% in Q3."
2. "The average response time for support tickets is 4.2 hours."
3. "Women are 35% less likely to click on our job ads than men."
4. "The model has an accuracy of 91%."
5. "Sales of Product A peak in November."

Guidance

1. "Q3 churn spiked 8%, driven almost entirely by customers acquired through the June discount campaign. This suggests that deep discounts attract price-sensitive customers with low loyalty. We recommend adding an engagement checkpoint at 30 days post-signup to improve retention."
2. "Support ticket response time averages 4.2 hours, more than double our 2-hour SLA target. The bottleneck is ticket routing: 60% of tickets are initially assigned to the wrong team. Implementing automated categorization could cut response time by an estimated 40%."
3. "Women are 35% less likely to click our job ads, suggesting the ad copy or imagery may be discouraging female applicants. An A/B test with revised language could determine whether the gap is addressable through messaging changes."
4. "The model correctly classifies 91% of cases overall, but it catches only 67% of cases in the minority class. For a fraud detection system, this means one in three fraudulent transactions would be missed. We need to retrain with oversampling or adjust the decision threshold."
5. "Product A sales peak in November, likely driven by holiday shopping. Pre-positioning inventory in October and increasing ad spend in the first two weeks of November could capture an estimated 15% more revenue."

The key pattern: every insight adds *why it matters* and *what to do about it*.

Exercise 31.2 — Audience identification ⭐

For each scenario below, identify the primary audience type (technical, managerial, or public/executive), and describe how you would adjust your communication.

1. You are presenting a churn prediction model to the VP of Customer Success.
2. You are writing a blog post about interesting patterns you found in public transit data.
3. You are sharing your analysis notebook with a teammate who will extend your work.
4. You are briefing the school board on student performance trends.
5. You are presenting at a data science meetup about a novel feature engineering technique.

Guidance

1. **Managerial.** Focus on which customers are at risk and what interventions to take. Show the top predictive features in business terms ("customers who haven't logged in for 30 days"), not technical terms ("feature importance scores from the gradient boosting model"). Include a cost-benefit analysis of intervention.
2. **Public.** Use no jargon. Lead with the most surprising or relatable finding. Use simple, well-annotated visualizations. Tell a story, perhaps about how transit patterns reflect the rhythm of the city.
3. **Technical.** Include full methodology, code, data sources, and assumptions. Document edge cases and decisions. Your teammate needs to understand *how* you did things, not just *what* you found.
4. **Public/executive.** The school board may have diverse backgrounds. Use plain language, focus on what the trends mean for students and families, and provide clear recommendations. Avoid statistical terminology.
5. **Technical.** This audience wants methodology details, benchmarks, and reproducibility information. Show code, compare approaches, and discuss tradeoffs.

Exercise 31.3 — The Pyramid Principle ⭐

Rearrange the following sentences into Pyramid Principle order (conclusion first, then evidence, then details):

"We analyzed three years of sales data across 200 stores."
"Stores that adopted the new checkout system saw 12% higher sales."
"We recommend rolling out the new checkout system to all locations."
"The effect was strongest in high-traffic stores (18% increase) and weaker in low-traffic stores (5%)."
"The analysis controlled for store size, location, and seasonal trends."
"Customer survey data suggests faster checkout is the primary driver."

Guidance

**Level 1 (Conclusion):**

- "We recommend rolling out the new checkout system to all locations."

**Level 2 (Evidence):**

- "Stores that adopted the new checkout system saw 12% higher sales."
- "The effect was strongest in high-traffic stores (18% increase) and weaker in low-traffic stores (5%)."
- "Customer survey data suggests faster checkout is the primary driver."

**Level 3 (Details):**

- "We analyzed three years of sales data across 200 stores."
- "The analysis controlled for store size, location, and seasonal trends."

Exercise 31.4 — Narrative arc identification ⭐

Read the following data story and label each paragraph as Situation, Complication, Resolution, or Call to Action:

"In 2019, Springfield County achieved its highest-ever childhood vaccination rate: 92%, well above the 85% herd immunity threshold. []

The COVID-19 pandemic disrupted routine health services, and by 2022, the rate had fallen to 76%. Three schools reported outbreaks of preventable diseases. []

Our analysis of clinic-level data reveals that the decline was concentrated in the southern part of the county, where two of three clinics closed during the pandemic and never reopened. The northern district, where clinics remained open, maintained an 89% rate. []

We recommend prioritizing the reopening of at least one clinic in the southern district, with a specific focus on catch-up vaccination drives for children aged 2-5. []"

Guidance

1. **Situation:** establishes the baseline (things were good in 2019).
2. **Complication:** introduces the problem (the pandemic caused a decline; outbreaks occurred).
3. **Resolution:** presents the analysis finding (the decline was geographically concentrated; clinic closures explain it).
4. **Call to Action:** recommends a specific next step.

Exercise 31.5 — Spotting communication mistakes ⭐

Identify the communication mistake in each example:

1. A 45-slide presentation where every slide shows a different chart, with no narrative thread connecting them.
2. "The logistic regression yielded an AUC of 0.847, which, when compared against the baseline model's AUC of 0.723, represents a statistically significant improvement (DeLong test, p = 0.0012)." (Presented to a marketing team.)
3. A chart showing revenue growth with a y-axis starting at $9.8 million instead of $0, making a 2% increase look like a 40% increase.
4. A report that presents six findings supporting the proposed strategy and omits two findings that complicate it.
5. "Revenue will be $14,782,341.67 next quarter."

Guidance

1. **Data dump.** Too many charts with no narrative. Cut to the 8-10 slides that support a single clear message; move the rest to an appendix.
2. **Jargon without translation.** A marketing team does not know what AUC, logistic regression, or a DeLong test are. Translate: "Our new model is significantly better at telling likely buyers from non-buyers. Shown one customer who bought and one who did not, it picks out the buyer about 85% of the time, up from 72% with the old approach." (AUC measures how well the model ranks cases; it is not the share of purchasers correctly identified, so the translation should not claim that.)
3. **Misleading axis.** Truncating the y-axis creates a visual impression that does not match the actual change. Either start at zero or prominently label the scale to prevent misinterpretation.
4. **Cherry-picking.** Omitting contradictory findings is ethically problematic. Report all findings and discuss why the contradictory ones do not invalidate the conclusion (or acknowledge that they weaken it).
5. **False precision.** Forecasting to the cent implies a level of accuracy that no forecast can achieve. Round to "approximately $14.8 million" and provide a range.
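The distortion in item 3 can be put into numbers. The following is a minimal sketch rather than part of the original exercise: the function name `apparent_growth` and the two revenue figures are hypothetical, chosen so that a true 2% rise is drawn as a bar roughly 40% taller when the axis starts at $9.8 million.

```python
def apparent_growth(old, new, axis_min=0.0):
    """Relative growth in drawn bar height when the axis starts at axis_min."""
    if old <= axis_min:
        raise ValueError("old value must lie above the axis minimum")
    return (new - axis_min) / (old - axis_min) - 1


# Hypothetical revenue figures in $ millions (not given in the exercise).
old_revenue = 10.3158
new_revenue = old_revenue * 1.02  # a true 2% increase

true_growth = apparent_growth(old_revenue, new_revenue)        # zero baseline
drawn_growth = apparent_growth(old_revenue, new_revenue, 9.8)  # truncated axis

print(f"Growth with a zero baseline:   {true_growth:.1%}")
print(f"Growth as drawn from $9.8M up: {drawn_growth:.1%}")
```

Comparing the two numbers is a quick honesty check before publishing any chart with a non-zero baseline.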

Part B: Applied Communication ⭐⭐

Exercise 31.6 — Rewriting for a different audience ⭐⭐

The following paragraph was written for a data science team. Rewrite it for (a) a product manager and (b) a newspaper article.

"We trained a random forest classifier on 18 months of user activity data (n=45,000) to predict 30-day churn. Feature importance analysis revealed that session_frequency (importance=0.31) and days_since_last_login (importance=0.27) were the top predictors. The model achieved an AUC of 0.89 and F1 of 0.82 on the held-out test set. Cross-validation confirmed stability across folds (std=0.02)."

Guidance

**(a) For a product manager:** "We built a model that predicts which users will stop using the product within the next 30 days. The biggest warning signs are how often someone uses the app and how long it has been since their last visit. In testing on users the model had never seen, it ranked a user who went on to churn above one who stayed about 89% of the time, which is accurate enough to reach out to at-risk users before they leave. We recommend integrating this into the CRM so the customer success team gets automated alerts."

**(b) For a newspaper article:** "The company developed a system that can identify customers likely to cancel their subscriptions, based on patterns in how often and how recently they use the service. According to the team, when the system compares a customer who later cancels with one who stays, it picks the right one about 9 times out of 10, giving the company a chance to intervene, perhaps with a special offer or a check-in call, before the customer decides to leave."

Note how the technical details (random forest, AUC, F1, cross-validation) disappear entirely for the non-technical audiences, replaced by plain-language equivalents and business/human context. The plain-language version must still be faithful to the metric: an F1 of 0.82 does not mean the model catches 82% of churners, because F1 blends precision and recall into a single score.

Exercise 31.7 — Writing assertive slide titles ⭐⭐

Convert each topic-based title into an assertion-evidence title:

1. "Q3 Revenue Results"
2. "Customer Satisfaction Survey"
3. "Website Traffic Analysis"
4. "Employee Turnover Data"
5. "Model Performance Comparison"

Guidance

1. "Q3 revenue exceeded target by 7%, driven by the new enterprise tier"
2. "Customer satisfaction dropped 15 points after the app redesign"
3. "Organic search traffic doubled after the SEO overhaul, while paid traffic declined"
4. "Engineering turnover is 3x the company average, concentrated in mid-level roles"
5. "The gradient boosting model outperforms logistic regression on every metric"

Each title now tells the reader what to conclude before they even look at the evidence below.

Exercise 31.8 — Annotating a chart ⭐⭐

You have a line chart showing monthly website traffic for the past 24 months. Traffic was steady at around 50,000 visits/month for the first 12 months, then jumped to 80,000 after a marketing campaign launched in month 13, then gradually declined to 60,000 by month 24.

Write out:

1. An assertive title for this chart
2. Three annotations you would place directly on the chart
3. A one-sentence takeaway you would put below the chart

Guidance

1. **Title:** "The marketing campaign lifted traffic 60%, but two-thirds of the gain faded within a year"
2. **Annotations:**
   - At month 13: "Campaign launched →" with an arrow pointing to the spike
   - At the peak (month 13-14): "Peak: 80,000 visits/month"
   - At month 24: "Current: 60,000, still 20% above pre-campaign baseline"
3. **Takeaway:** "The campaign left traffic 20% above its pre-campaign level, but keeping the full effect may require ongoing investment rather than a one-time push."
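The chart described in this exercise can be sketched in matplotlib as follows. This is one possible rendering, not a reference solution: the exercise gives only the start, peak, and end values, so the straight-line decline from 80,000 to 60,000 is an assumption, and the annotation positions are hand-picked for this data. The percentages in the title and labels are computed from the data rather than typed in, which keeps the assertion and the evidence in agreement.

```python
import matplotlib.pyplot as plt
import numpy as np

# Data reconstructed from the exercise description. The shape of the
# decline (a straight line from 80k to 60k) is an assumption.
months = np.arange(1, 25)
visits = np.concatenate([np.full(12, 50_000.0),
                         np.linspace(80_000, 60_000, 12)])

baseline, peak, current = visits[0], visits.max(), visits[-1]
lift_at_peak = peak / baseline - 1   # relative gain at the peak
lift_now = current / baseline - 1    # relative gain remaining at month 24

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, visits, color='#2c3e50', linewidth=2)

# Annotation 1: the campaign launch, with an arrow to the jump
ax.annotate('Campaign launched', xy=(13, 80_000), xytext=(5, 72_000),
            arrowprops=dict(arrowstyle='->', color='#2c3e50'))
# Annotation 2: the peak value
ax.text(13.3, 82_000, f'Peak: {peak:,.0f} visits/month')
# Annotation 3: where traffic stands today relative to the baseline
ax.text(24, 57_000,
        f'Current: {current:,.0f}\n({lift_now:.0%} above baseline)',
        ha='right', va='top')

# Assertive title built from the computed values
ax.set_title(f'Campaign lifted traffic {lift_at_peak:.0%}, '
             f'but only a {lift_now:.0%} gain remains a year later',
             fontsize=13, fontweight='bold', loc='left')
ax.set_xlabel('Month')
ax.set_ylabel('Visits per month')
ax.set_ylim(0, 90_000)  # zero baseline keeps the change in proportion
ax.spines[['top', 'right']].set_visible(False)
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` to render the figure.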

Exercise 31.9 — Writing an executive summary ⭐⭐

You analyzed employee satisfaction survey data and found:

- Overall satisfaction dropped from 7.8/10 to 6.2/10 over two years
- The decline was concentrated in the engineering department (7.9 → 4.8)
- The top complaint in engineering was "lack of career growth opportunities"
- Departments with mentorship programs maintained satisfaction above 7.0
- Engineering has no mentorship program

Write a 150-200 word executive summary for the Chief People Officer.

Guidance

**Executive Summary: Employee Satisfaction Trends and Recommended Intervention**

Employee satisfaction has declined from 7.8 to 6.2 over the past two years, but this decline is not company-wide. It is concentrated in Engineering, where satisfaction plummeted from 7.9 to 4.8, the lowest of any department.

The primary driver is a perceived lack of career growth opportunities. In the most recent survey, 73% of engineering respondents cited "unclear promotion path" or "no mentorship" as their top concern. Notably, departments that have established mentorship programs (Sales, Product, Design) maintained satisfaction scores above 7.0 throughout the same period.

**Recommendation:** Launch a mentorship program in Engineering within the next quarter, pairing senior engineers with mid-level staff. Based on outcomes in other departments, we expect this could improve Engineering satisfaction by 1.5-2.0 points within six months.

**Risk if unaddressed:** Engineering turnover has already increased 40% year-over-year. At current rates, we will lose an estimated 15 engineers in the next two quarters, at a replacement cost of roughly $2.25 million.

This sample adds illustrative details that are not in the exercise prompt (the 73% figure, the department names, the turnover rate, and the cost estimate). In your own summary, use only numbers you can source.

Exercise 31.10 — Narrative arc construction ⭐⭐

Using the following facts, construct a data story with clear Situation, Complication, Resolution, and Call to Action. Write 4-6 sentences for each section.

Facts:

- A city's public library system serves 200,000 cardholders
- Digital book checkouts have increased 300% since 2019
- Physical book checkouts have declined 45% in the same period
- Three branch libraries are at risk of closure due to budget cuts
- Analysis shows that branches in low-income neighborhoods have the highest physical checkout rates
- Digital checkout rates are lowest in these same neighborhoods (limited internet access)

Guidance

**Situation:** The Metro City library system has long been a cornerstone of community access to knowledge, serving 200,000 active cardholders across 12 branches. In recent years, the system has invested heavily in digital infrastructure, and digital checkouts have grown an impressive 300% since 2019. This digital expansion was celebrated as a success story: evidence that the library was modernizing to meet changing reader habits.

**Complication:** However, the growth in digital lending has masked a troubling pattern. Physical checkouts have declined 45% system-wide, prompting the city budget office to propose closing three branch libraries to cut costs. But our analysis reveals that the decline is not uniform. The three branches proposed for closure, all located in low-income neighborhoods, actually have the highest physical checkout rates in the system. These same neighborhoods have the lowest rates of digital checkout, likely because residents have limited home internet access.

**Resolution:** The data tells a clear story: closing these branches would not affect communities that have already shifted to digital. It would disproportionately impact the communities that depend most on physical library access, the very communities the library system was built to serve. Our analysis estimates that 12,000 active users in low-income areas would lose their primary point of library access.

**Call to Action:** We recommend that the city preserve all three branches in low-income neighborhoods and instead consolidate two branches in high-income areas where digital checkout has effectively replaced physical visits. This approach saves a comparable amount while protecting access for the city's most vulnerable residents.

Exercise 31.11 — Dashboard design critique ⭐⭐

A colleague shows you a dashboard with the following elements on a single screen:

- 12 different charts (bar, line, pie, scatter, area, donut, treemap, bubble, stacked bar, waterfall, radar, and gauge)
- No filters or interactive elements
- Each chart uses a different color scheme
- Charts have topic-only titles like "Sales" and "Revenue" and no annotations
- There is no clear visual hierarchy: all charts are the same size
- Data source and date are not shown

List at least six specific improvements you would recommend, explaining why each matters.

Guidance

1. **Reduce to 4-6 charts.** Twelve charts overwhelm the viewer and prevent any single chart from receiving adequate attention. Identify the 4-6 most important metrics and focus on those. Move the rest to sub-pages or remove them.
2. **Establish visual hierarchy.** Make the most important metric the largest element, placed at the top left. Supporting charts should be smaller and arranged below. This guides the viewer's eye to what matters most.
3. **Use a consistent color scheme.** Different color schemes for each chart force the viewer to re-learn the visual encoding twelve times. Choose one palette and use it consistently: the same color for the same dimension across all charts.
4. **Add assertive titles.** "Sales" tells the viewer nothing. "Sales grew 12% in Q3, led by the enterprise segment" tells a story. Every chart title should state the insight.
5. **Add annotations to key charts.** Label important data points, add reference lines for targets/benchmarks, and mark significant events. Without annotations, viewers must interpret each chart from scratch.
6. **Add filters.** Interactive filters (date range, region, product) let different users answer different questions. A dashboard without filters forces a one-size-fits-all view.
7. **Include data source and last-updated date.** Without these, viewers cannot assess data freshness or trustworthiness. A prominent "Data as of: March 15, 2026" label is essential.
8. **Remove the pie, donut, and radar charts.** These chart types are difficult to read accurately. Replace pies with horizontal bar charts and radar charts with grouped bar charts.

Exercise 31.12 — Translating numbers ⭐⭐

Rewrite each technical statement in plain language that a non-technical stakeholder would understand:

1. "The model's precision is 0.78 and recall is 0.65."
2. "We used a 70/30 train-test split with 5-fold cross-validation."
3. "The p-value was 0.003, so we reject the null hypothesis."
4. "The R-squared is 0.42."
5. "We applied L2 regularization with lambda=0.01 to prevent overfitting."

Guidance

1. "When the model flags a transaction as fraudulent, it's correct about 78% of the time. However, it only catches 65% of actual fraud, so about a third of fraudulent transactions slip through."
2. "We trained the model on 70% of our historical data and tested it on the remaining 30%, which it had never seen before. While building the model, we also checked it five times on different slices of the training data to make sure the results were consistent, not a fluke."
3. "The difference we observed is statistically significant: if there were really no difference, we would see a result this large less than 0.3% of the time. In practical terms, this is very unlikely to be noise." (Avoid saying "there is a 0.3% chance the result is due to chance"; a p-value is the probability of the data given no effect, not the probability that there is no effect.)
4. "Our model explains about 42% of the variation in the outcome. That means it captures a meaningful portion of what's going on, but there are clearly other factors at play that we haven't included."
5. "We used a technique that prevents the model from over-memorizing the training data, so it performs better on new, unseen data. Think of it as keeping the model from overthinking."
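The translation in item 1 follows directly from the definitions of precision and recall, which a short helper can make concrete. This is an illustrative sketch: `describe_classifier` is a hypothetical name, and the counts are invented so that they reproduce the 0.78 and 0.65 in the exercise.

```python
def describe_classifier(true_pos, false_pos, false_neg):
    """Return precision, recall, and a plain-language sentence about both."""
    precision = true_pos / (true_pos + false_pos)  # of flagged cases, share correct
    recall = true_pos / (true_pos + false_neg)     # of real cases, share caught
    sentence = (
        f"When the model flags a case, it is right about {precision:.0%} "
        f"of the time. It catches {recall:.0%} of the real cases "
        f"and misses {1 - recall:.0%}."
    )
    return precision, recall, sentence


# Hypothetical counts: 78 correct flags, 22 false alarms, 42 missed cases.
precision, recall, sentence = describe_classifier(78, 22, 42)
print(sentence)
```

Stating the miss rate alongside the catch rate is a deliberate choice: stakeholders tend to remember the cost of what slips through.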

Part C: Creating Communication Artifacts ⭐⭐⭐

Exercise 31.13 — Building a narrative notebook ⭐⭐⭐

Using any dataset you have worked with in this course (or the vaccination project data), create a Jupyter notebook that follows the narrative notebook principles from Section 31.6. Your notebook should:

Open with a Markdown cell explaining the analysis question and data source
Include at least 5 Markdown cells that narrate the analysis (not just code comments)
Produce at least 3 annotated visualizations with assertive titles
End with a conclusions cell that summarizes findings and limitations
Be readable top-to-bottom by someone who has never seen the data

Evaluation criteria: Could a classmate read this notebook without you present and understand what you found and why it matters?

Guidance

Structure your notebook like this:

1. **Title and Introduction** (Markdown): What question are you answering? What data are you using? Why does it matter?
2. **Setup** (Code): Import libraries cleanly. No unnecessary imports.
3. **Data Loading and Preview** (Markdown + Code): Explain what the data contains. Show `.head()` and `.shape`.
4. **Data Preparation** (Markdown + Code): Explain any cleaning or filtering, with rationale.
5. **Analysis 1** (Markdown + Code + Markdown): State the question, show the analysis and visualization, then interpret the result.
6. **Analysis 2** (same pattern): A second analytical question.
7. **Analysis 3** (same pattern): A third question, building on the previous findings.
8. **Conclusions** (Markdown): 3-5 bullet points summarizing what you found, what you recommend, and what limitations exist.

The key test: read every Markdown cell in sequence, skipping the code. Does a coherent story emerge? If yes, you have a narrative notebook. If it reads like "now we do this... now we do that..." you have a lab notebook, and it needs revision.
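The "read only the Markdown cells" test can be automated, since an `.ipynb` file is JSON with a top-level `cells` list. The sketch below uses only the standard library; the function names and the tiny in-memory notebook are hypothetical, made up for illustration.

```python
import json


def markdown_story(notebook):
    """Return the Markdown cells of a notebook dict, in order, as one string."""
    parts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        parts.append(source.strip())
    return "\n\n".join(parts)


def markdown_story_from_file(path):
    """Load an .ipynb file from disk and return its Markdown-only story."""
    with open(path, encoding="utf-8") as handle:
        return markdown_story(json.load(handle))


# A tiny in-memory notebook standing in for a real file.
demo = {"cells": [
    {"cell_type": "markdown",
     "source": ["# Why did vaccination rates fall?\n",
                "Data: county clinic records."]},
    {"cell_type": "code", "source": ["df.head()"]},
    {"cell_type": "markdown",
     "source": "Rates fell most where clinics closed."},
]}
story = markdown_story(demo)
print(story)
```

For a real notebook, `markdown_story_from_file("analysis.ipynb")` (a placeholder filename) would print the narration alone, which makes gaps in the story easy to spot.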

Exercise 31.14 — Designing a slide deck outline ⭐⭐⭐

Create a 10-slide outline for a presentation about your vaccination analysis project, targeted at a county health board (non-technical audience). For each slide, write:

The assertion-evidence title (a full sentence)
What visual evidence would appear on the slide (chart type, key data shown)
What you would say verbally (2-3 sentences of speaker notes)

Guidance

Example for the first three slides:

**Slide 1: Title Slide**

- Title: "Protecting Our Children: Data-Driven Recommendations for Improving Vaccination Rates in [County]"
- Visual: Clean title with county logo, presenter name, date
- Speaker notes: "Good morning. I'm here today to share what our analysis of vaccination data tells us about where we are, where we're heading, and what we can do about it."

**Slide 2: "Vaccination rates in our county were above the state average until 2020"**

- Visual: Line chart showing county rate vs. state average, 2015-2023, with annotation at 2020
- Speaker notes: "For years, our county outperformed the state average. We were consistently above 85%, the threshold for herd immunity. But as you can see, that changed in 2020."

**Slide 3: "Since 2020, our rate has dropped 11 points, about twice the state decline"**

- Visual: Two-bar comparison showing county decline (-11) vs. state decline (-5), with the gap highlighted
- Speaker notes: "Our decline wasn't just part of a statewide trend. We fell roughly twice as fast as the state average. Something specific is happening in our county that we need to understand."

Continue this pattern for slides 4-10, building toward a recommendation.

Exercise 31.15 — Annotated visualization ⭐⭐⭐

Using matplotlib, create a bar chart comparing a metric across 5-8 categories. Apply the following communication principles:

Use an assertive title that states the insight
Add a reference line showing a benchmark or target
Annotate the highest and lowest bars with their values
Add a callout text box highlighting the key takeaway
Remove chartjunk (unnecessary gridlines, borders, decorations)

import matplotlib.pyplot as plt
import numpy as np

# Example: Vaccination rates by district
districts = ['North', 'East', 'South', 'West', 'Central',
             'Northeast', 'Southeast']
rates = [88, 82, 67, 79, 91, 75, 71]
target = 85

fig, ax = plt.subplots(figsize=(10, 6))

colors = ['#2ecc71' if r >= target else '#e74c3c' for r in rates]
bars = ax.barh(districts, rates, color=colors)

# Reference line
ax.axvline(x=target, color='#2c3e50', linestyle='--', linewidth=1.5)
ax.text(target + 0.5, 6.5, f'Target: {target}%', fontsize=10,
        color='#2c3e50')

# Annotate highest and lowest
max_idx = np.argmax(rates)
min_idx = np.argmin(rates)
ax.text(rates[max_idx] + 1, max_idx, f'{rates[max_idx]}%',
        va='center', fontweight='bold', color='#2ecc71')
ax.text(rates[min_idx] + 1, min_idx, f'{rates[min_idx]}%',
        va='center', fontweight='bold', color='#e74c3c')

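# Callout text box highlighting the key takeaway
ax.text(72, 2, 'South is 18 points below target:\nthe largest gap in the county',
        va='center', fontsize=10,
        bbox=dict(boxstyle='round', facecolor='white', edgecolor='#e74c3c'))

# Assertive title (five of the seven rates above are below the 85% target)
ax.set_title('Five of seven districts fall below the 85% vaccination target,\n'
             'with South district at a critical 67%',
             fontsize=13, fontweight='bold', loc='left')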

ax.set_xlabel('Vaccination Rate (%)')
ax.set_xlim(50, 100)
ax.spines[['top', 'right']].set_visible(False)

plt.tight_layout()
plt.show()

Customize this for your own data and insight.

Guidance

The key elements that make this chart communicate effectively:

- The **title** tells the viewer exactly what to conclude
- The **reference line** provides context (what is the target?)
- The **color coding** creates instant visual categorization (above/below target)
- The **annotations** on key bars draw attention to the most important values
- The **removed spines** reduce visual clutter

Your version should follow these same principles with your own data and insight.

Part D: Synthesis and Critical Evaluation ⭐⭐⭐–⭐⭐⭐⭐

Exercise 31.16 — Communication audit ⭐⭐⭐

Find a real data visualization in a news article, blog post, or social media post. Using the communication checklist from Section 31.12, evaluate it on:

Does it have a clear insight (not just a finding)?
Is it appropriate for its audience?
Are axes labeled and honest?
Are key data points annotated?
Is uncertainty communicated?
Could it be misinterpreted? How?

Write a 200-word evaluation.

Guidance

Look for visualizations in publications like the New York Times, The Economist, FiveThirtyEight, or Our World in Data. These tend to be high quality, which makes for richer evaluation. Also look for less polished examples on social media or business blogs. Your evaluation should address each of the six criteria specifically. For example: "The chart uses an assertive title ('Homelessness has doubled since 2019'), which clearly communicates the insight. However, the y-axis is truncated, starting at 40,000 instead of 0, which exaggerates the visual impression of the change. The chart would benefit from either a zero-baseline or an annotation acknowledging the truncation."

Exercise 31.17 — Three versions of one story ⭐⭐⭐

Take a single analytical finding from your project work and write three different communications:

A two-sentence Slack message to a data science colleague
A one-paragraph email to a product manager
A three-paragraph executive brief for the CEO

Each version should communicate the same core finding but adapt language, detail, and framing for the audience.

Guidance

Example for the finding "Our recommendation engine's click-through rate dropped after the latest model update":

**Slack to colleague:** "Heads up: CTR on the reco engine dropped 8% after last week's model push. Looks like the new embeddings are over-indexing on popularity. Can you pull the A/B test results so we can compare segment-level performance?"

**Email to product manager:** "The recommendation engine update we deployed last Tuesday has resulted in an 8% decrease in click-through rate. Our initial analysis suggests the new model is recommending more popular items at the expense of personalization. We're investigating whether this affects all user segments equally and expect to have a clear diagnosis by Thursday. We may recommend a rollback if the issue isn't isolated to a specific segment."

**Brief for CEO:** "Our recommendation engine, which drives approximately 30% of product page visits, experienced a performance decline following a recent update. Click-through rates dropped 8%, which we estimate translates to roughly $120K in weekly lost revenue if sustained. The data science team has identified the likely cause and is testing a fix. We expect to resolve the issue within the week. No customer-facing communication is needed at this time, but we recommend a more rigorous pre-deployment testing protocol going forward to prevent similar issues."

Notice how each version adjusts: the colleague gets technical details and a specific ask; the manager gets timeline and implications; the CEO gets business impact and assurance.

Exercise 31.18 — Ethical communication dilemma ⭐⭐⭐

You have analyzed employee productivity data and found that remote workers are 15% less productive than in-office workers (measured by tickets resolved per day). However, you also noticed that remote workers handle more complex tickets on average, and when you control for ticket complexity, the gap disappears entirely.

The CEO has asked you to present "the remote work productivity numbers" at next week's board meeting. She has not asked about ticket complexity.

1. What are the ethical considerations in how you present this data?
2. Write two versions of a slide title: one that is technically accurate but misleading, and one that is honest and complete.
3. What would you say if the CEO asked you to present only the headline number?

Guidance

1. **Ethical considerations:** Presenting the raw 15% gap without the complexity adjustment would create a misleading impression that could lead to policy decisions (like ending remote work) based on incomplete analysis. This is a classic case of confounding, closely related to Simpson's paradox: ticket complexity is linked to both work location and tickets resolved per day, so the aggregated data tells a different story than the disaggregated data. The disaggregated view is more truthful.
2. **Misleading title:** "Remote workers resolve 15% fewer tickets per day than in-office workers"

   **Honest title:** "Remote and in-office workers are equally productive when accounting for ticket complexity; remote workers handle harder tickets"
3. **What to say:** "I understand you'd like a clean number, and I can certainly present the headline figure. But I'd be doing the board a disservice if I didn't share the full picture. The raw number creates a misleading impression because it doesn't account for the type of work each group is doing. I'd recommend presenting both the raw and adjusted figures. Together they tell a more interesting and actionable story: our remote workers are gravitating toward complex work, which might be worth encouraging."
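A small numeric sketch shows how a difference in ticket mix alone can produce the raw gap. Every number below is hypothetical, chosen so the raw gap comes out at the exercise's 15%: both groups resolve simple and complex tickets at identical rates, and only the share of working days spent on each ticket type differs. Comparing both groups on a common mix (direct standardization) is one simple way to adjust; the exercise does not say which method the analyst used.

```python
# Tickets resolved per day by ticket type (hypothetical; same for both groups).
RATES = {
    "in_office": {"simple": 10.0, "complex": 5.0},
    "remote": {"simple": 10.0, "complex": 5.0},
}
# Share of working days spent on each ticket type (hypothetical).
MIX = {
    "in_office": {"simple": 0.80, "complex": 0.20},
    "remote": {"simple": 0.53, "complex": 0.47},
}


def average_rate(group, mix):
    """Average tickets per day for a group under a given ticket mix."""
    return sum(share * RATES[group][kind] for kind, share in mix.items())


def relative_gap(remote, in_office):
    """Remote productivity relative to in-office (negative means lower)."""
    return remote / in_office - 1


# Raw comparison: each group is measured on its own ticket mix.
raw_gap = relative_gap(average_rate("remote", MIX["remote"]),
                       average_rate("in_office", MIX["in_office"]))

# Adjusted comparison: both groups are measured on the same (averaged) mix.
common_mix = {kind: (MIX["in_office"][kind] + MIX["remote"][kind]) / 2
              for kind in ("simple", "complex")}
adjusted_gap = relative_gap(average_rate("remote", common_mix),
                            average_rate("in_office", common_mix))

print(f"Raw gap:      {raw_gap:.1%}")
print(f"Adjusted gap: {adjusted_gap:.1%}")
```

Showing both numbers side by side is the honest version of the slide: the headline figure is reported, and its explanation is reported with it.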

Exercise 31.19 — From notebook to narrative ⭐⭐⭐

Take an existing analysis notebook from a previous chapter's exercise (or create a simple one). It likely reads as a "lab notebook" — cells run in order, but the narrative is missing.

Transform it into a narrative notebook by:

1. Adding a title and introduction cell
2. Adding Markdown narration before and after each analysis step
3. Cleaning up all code (removing dead code, improving variable names)
4. Ensuring all visualizations have assertive titles and annotations
5. Adding a conclusions cell

Compare the two versions. Which would a colleague rather read?

Guidance

Common transformations needed:

- Replace `df2 = df[df['year'] > 2019]` with a named variable like `pandemic_period = data[data['year'] > 2019]`, preceded by a Markdown cell explaining why you're filtering to this period.
- Replace chart titles like "Figure 1" with assertive titles like "Vaccination rates dropped sharply in 2020 and have not recovered."
- Remove cells that produce output you don't discuss (orphaned DataFrames, debugging prints).
- Add interpretation after key visualizations: "This chart reveals that..."
- End with clear conclusions rather than letting the notebook trail off.

Exercise 31.20 — The anti-dashboard ⭐⭐⭐⭐

Design a dashboard that is intentionally terrible — one that violates as many dashboard design principles as possible while remaining technically "accurate." Then, for each violation, explain what is wrong and how to fix it.

Include at least 8 deliberate violations (from Section 31.7 and your own judgment).

Guidance

Example violations:

1. **3D pie chart** whose perspective distorts slice sizes and whose rounded labels add to 103% → Use a horizontal bar chart
2. **Dual y-axes** with vastly different scales creating a false visual correlation → Use separate charts
3. **Rainbow color scheme** with no semantic meaning → Use a sequential or categorical palette consistently
4. **No chart titles** or axis labels → Add assertive titles and clear labels
5. **All charts are the same size** with no visual hierarchy → Make the key metric largest
6. **No date stamp** → Add "Data as of [date]"
7. **Inconsistent time periods** (one chart shows YTD, another shows last 12 months, without labeling) → Align time periods or clearly label each
8. **Decorative images** (clip art of money bags next to revenue) → Remove all decorative elements
9. **Too many decimal places** ($1,234,567.89) → Round to meaningful precision ($1.2M)
10. **No interactivity** despite multiple user segments needing different views → Add filters

This exercise builds critical evaluation skills by making you articulate *why* each violation is a problem.
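The fix for violation 9 (rounding to meaningful precision) is easy to apply in code before a number ever reaches a chart label. The helper below is a minimal sketch; `format_compact_dollars` is a hypothetical name, and one decimal place is an assumed default that may be too coarse or too fine for some audiences.

```python
def format_compact_dollars(amount):
    """Format a dollar amount with one decimal and a K/M/B suffix."""
    sign = "-" if amount < 0 else ""
    magnitude = abs(amount)
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if magnitude >= threshold:
            return f"{sign}${magnitude / threshold:.1f}{suffix}"
    # Amounts under $1,000 are shown as whole dollars.
    return f"{sign}${magnitude:.0f}"


print(format_compact_dollars(1_234_567.89))
print(format_compact_dollars(14_782_341.67))
```

Values just below a threshold (for example 999,950) round up awkwardly with this simple approach, so a production dashboard would want a more careful rule.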

Exercise 31.21 — Stakeholder simulation ⭐⭐⭐⭐

Work with a partner (or imagine one). One person plays the data scientist presenting vaccination analysis findings. The other plays a skeptical county commissioner who:

- Does not understand statistics
- Is worried about budget cuts
- Has heard "data people always have an agenda"
- Needs to vote on clinic funding next month

The data scientist must present three key findings and a recommendation in five minutes. The commissioner asks challenging questions like "Why should I trust this data?" and "Can you guarantee this will work?"

Write out both sides of a realistic dialogue (at least 10 exchanges). Focus on how the data scientist navigates skepticism, translates technical concepts, and maintains honesty about uncertainty.

Guidance

Key techniques the data scientist should model:

- **Acknowledge skepticism:** "That's a fair concern, and I'm glad you raised it."
- **Use analogies:** "Think of it like comparing gas mileage: you can't compare a truck to a sedan without adjusting for vehicle type."
- **Be honest about uncertainty:** "I can't guarantee it will work. What I can tell you is that the evidence from similar counties is promising, and a pilot program would let us test it at lower risk."
- **Connect to their concerns:** "I understand the budget is tight. That's exactly why I'm recommending the more cost-effective approach."
- **Avoid defensiveness:** When accused of having an agenda, respond with "My job is to follow the data wherever it leads. In this case, it leads to a recommendation, but I've also shown you the limitations."

Exercise 31.22 — Cross-format communication plan ⭐⭐⭐⭐

You have completed a major analysis on student performance in an urban school district. Your findings show that class size has a small effect on test scores, but teacher experience has a large effect. Small classes with inexperienced teachers perform worse than large classes with experienced teachers.

Create a communication plan that includes:

1. A one-sentence headline for each of three audiences (school board, teachers' union, parents)
2. The key chart you would show each audience (describe it)
3. The call to action for each audience
4. One thing you would not say to each audience, and why

Guidance

The three audiences have different interests and sensitivities:

**School board:** Interested in resource allocation.

- Headline: "Investing in teacher development yields 3x the improvement of reducing class size."
- Chart: Cost-effectiveness comparison.
- Call to action: Redirect class-size-reduction budget to teacher mentorship programs.
- Do not say: "Class size reduction was a waste of money." They approved that spending and will become defensive.

**Teachers' union:** Interested in working conditions and professional respect.

- Headline: "Experienced teachers are the strongest predictor of student success, more than any other factor."
- Chart: Student performance by teacher experience level.
- Call to action: Support the proposed mentorship program that pairs experienced and early-career teachers.
- Do not say: "Inexperienced teachers are the problem." Frame it as an investment in professional development, not a criticism.

**Parents:** Interested in their children's outcomes.

- Headline: "Your child's teacher matters more than class size. Here's what the district is doing about it."
- Chart: Simple comparison of outcomes in classes with supported vs. unsupported teachers.
- Call to action: Support the district's new teacher development initiative.
- Do not say: technical details about the statistical model. Parents want to know what it means for their kids, not how you measured it.

Exercise 31.23 — Presentation rehearsal and feedback ⭐⭐⭐

Prepare a 5-minute presentation of any analysis from this course. Record yourself (audio or video). Then watch/listen and evaluate yourself on:

Did you lead with the key finding, or build up to it?
Did you use jargon that your audience would not understand?
Did you read from slides or speak naturally?
Did your slides support your words, or duplicate them?
Were there any "um," "uh," or filler words that could be replaced by pauses?
Did you end with a clear call to action?

Write a one-page self-evaluation with specific plans to improve.

Guidance

Most people discover the following when they watch themselves present for the first time:

- They said "um" far more than they realized (solution: embrace silence instead)
- They read their slides verbatim (solution: use less text on slides)
- They spent too long on methodology and rushed through findings (solution: restructure with the Pyramid Principle)
- They ended with "so yeah, that's it" instead of a clear conclusion (solution: script your closing sentence)

This exercise is uncomfortable but transformative. Presentation skills improve fastest with self-observation and deliberate practice.

Exercise 31.24 — Communication failure post-mortem ⭐⭐⭐⭐

Research a real-world case where poor data communication led to a bad decision or public misunderstanding. (Examples: the Challenger disaster, the 2016 election forecast misunderstanding, early COVID-19 data visualization problems.)

Write a 300-word analysis that addresses:

1. What was the communication failure?
2. Who was the audience, and what did they need?
3. What could the data communicators have done differently?
4. What general lesson does this teach about data communication?

Guidance

The Challenger disaster is a well-documented case. Engineers at Morton Thiokol knew that O-ring failures were more likely at low temperatures, but their data presentation to NASA decision-makers failed to make the case compellingly. The key chart showed O-ring damage incidents but did not clearly show the relationship to temperature. Edward Tufte later argued that a simple scatter plot of temperature vs. O-ring damage would have made the danger obvious. The lesson: the right visualization, with clear annotation, could have saved lives.

For COVID-19: early logarithmic scale charts confused the public (who were not used to reading log scales), and projections with wide uncertainty bands were reported as point predictions, leading to both panic and complacency. The lesson: know your audience's graph literacy and communicate uncertainty in accessible ways.

Exercise 31.25 — The one-page portfolio piece ⭐⭐⭐⭐

Take your best analysis from this course and create a one-page visual summary — a single page that someone could read in 2 minutes and understand your question, approach, key finding, and recommendation. Include:

A compelling title
A one-paragraph introduction (3-4 sentences)
One or two well-annotated charts
Three bullet-point key findings
A clear recommendation
A data source note

This is the format used for policy briefs, conference posters (in miniature), and portfolio pieces. It forces you to distill an entire analysis to its essence.

Guidance

Think of this as an "infographic" without the clip art: a dense but readable one-page summary. Layout tips:

- Title takes 10% of the page
- Introduction takes 15%
- Charts take 40%
- Key findings take 20%
- Recommendation and source take 15%

Every word must earn its place. If you can remove a sentence without losing meaning, remove it. If a chart does not directly support your key findings, it does not belong on this page. This exercise is excellent preparation for building a portfolio ([Chapter 34](../chapter-34-building-portfolio/index.md)).
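The layout shares above can be turned into concrete band heights before you start designing. This is a rough sketch, assuming a US Letter page in portrait orientation (11 inches tall) with margins ignored. `LAYOUT_SHARES`, `PAGE_HEIGHT_INCHES`, and `band_heights` are hypothetical names introduced for illustration.

```python
# Layout shares from the guidance, as percentages of page height.
LAYOUT_SHARES = {
    "title": 10,
    "introduction": 15,
    "charts": 40,
    "key_findings": 20,
    "recommendation_and_source": 15,
}

PAGE_HEIGHT_INCHES = 11.0  # Assumption: US Letter, portrait, margins ignored.


def band_heights(shares: dict[str, int], page_height: float) -> dict[str, float]:
    """Convert percentage shares into heights; reject budgets that are not 100%."""
    total = sum(shares.values())
    if total != 100:
        raise ValueError(f"Layout shares add to {total}%, expected 100%")
    return {name: page_height * pct / 100 for name, pct in shares.items()}


heights = band_heights(LAYOUT_SHARES, PAGE_HEIGHT_INCHES)
for name, inches in heights.items():
    print(f"{name}: {inches:.2f} in")
```

If you happen to build the page in matplotlib, the same percentages could be passed as `height_ratios` to a `GridSpec`. That is one option among many, not something the chapter prescribes.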

Reflection

After completing these exercises, consider: Which type of data communication feels most natural to you? Writing? Presenting? Visualizing? Designing dashboards? And which feels most difficult? The communication skills that feel hardest are usually the ones worth practicing most, because they represent your biggest growth opportunity as a data scientist.

Remember: the last mile of data science is communication. An analysis that nobody understands might as well not exist.