Key Takeaways: Communicating with Data: Telling Stories with Numbers
One-Sentence Summary
Effective data communication requires honest visualizations that follow Tufte's principles (maximize data-ink, minimize chartjunk), audience-appropriate writing that includes both statistical significance and effect sizes, structured reports with Introduction-Methods-Results-Discussion-Limitations, and reproducible analysis practices — because the most rigorous analysis in the world is worthless if your audience doesn't understand it.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Data-ink ratio | Share of a graphic's total ink that is used to display data | Guides you to remove everything that doesn't serve the data |
| Chartjunk | Visual elements that don't convey information (3D effects, gradients, decorations) | Reduces clarity and distracts from the data |
| Small multiples | Series of similarly designed charts for comparison across groups | Leverages human pattern-detection across consistently formatted panels |
| Data storytelling | Translating statistical findings into narrative with context, implications, and recommendations | Bridges the gap between analysis and action |
| Reproducible analysis | Analysis that someone else can reconstruct and arrive at the same results | Ensures scientific integrity and professional credibility |
Tufte's Principles
| Principle | Application |
|---|---|
| Maximize data-ink ratio | Remove unnecessary gridlines, borders, backgrounds, and decorations |
| Eliminate chartjunk | No 3D effects, gradient fills, decorative icons, or drop shadows |
| Use small multiples | Compare groups with side-by-side panels sharing the same axes |
| Show the data | Display individual observations, not just summaries |
| Encourage comparison | Use shared axes and consistent design |
| Serve a clear purpose | Every chart should answer a specific question |
| Integrate text and data | Annotate key features; titles should state findings |
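
The small-multiples principle can be sketched in matplotlib as follows. This is a minimal illustration, not a prescribed template: the region names, values, and axis label are hypothetical placeholders, and the title is a stand-in for a finding you would state yourself.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly values for three regions (placeholder data)
months = [1, 2, 3, 4, 5, 6]
regions = {
    "North": [12, 14, 13, 17, 19, 22],
    "South": [9, 9, 11, 10, 12, 13],
    "West": [15, 13, 14, 16, 15, 18],
}

# One panel per group; sharex/sharey keep the scales identical across panels
fig, axes = plt.subplots(1, len(regions), figsize=(9, 3),
                         sharex=True, sharey=True)

for ax, (name, values) in zip(axes, regions.items()):
    ax.plot(months, values, color="steelblue", marker="o")  # markers show each observation
    ax.set_title(name, fontsize=11)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

axes[0].set_ylabel("Units sold")
fig.suptitle("Placeholder title: state the finding here")
fig.tight_layout()
```

Because the panels share both axes, a taller line in one panel really does mean a larger value, which is what makes the comparison honest. Calling `fig.savefig("small_multiples.png", dpi=300)` afterwards would write the figure to disk.
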
Misleading Techniques Checklist
| Technique | Problem | Fix |
|---|---|---|
| Truncated axis | Small differences look enormous | Start bar chart y-axis at 0; label breaks |
| Cherry-picked time window | Controls narrative through selective framing | Show longest reasonable time frame; justify window |
| Dual y-axes | Any two variables can be made to look correlated | Use small multiples instead |
| 3D effects | Distorts proportions through perspective | Use flat 2D charts |
| Too many pie slices | Comparison becomes impossible | Switch to sorted bar chart |
| Area/volume distortion | Non-linear scaling exaggerates differences | Scale by area, not diameter; prefer bars |
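
To see why a truncated axis makes small differences look enormous, compare the ratio of drawn bar heights with the ratio of the underlying values. The sketch below uses a hypothetical helper, `visual_ratio`, and made-up numbers.

```python
def visual_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("axis_min must be below both values")
    return (b - axis_min) / (a - axis_min)


# Hypothetical values: 50 vs. 52 is a 4% difference in the data
before, after = 50.0, 52.0

true_ratio = visual_ratio(before, after)                      # axis starts at 0
truncated_ratio = visual_ratio(before, after, axis_min=48.0)  # axis starts at 48

print(f"Axis from 0:  second bar is {true_ratio:.2f}x as tall")
print(f"Axis from 48: second bar is {truncated_ratio:.2f}x as tall")
```

With the axis starting at 48, the second bar is drawn twice as tall as the first even though the values differ by only 4%.
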
Writing Statistical Results
Template Sentences
Confidence Interval:
Technical: "The 95% CI for mean [variable] was ([lower], [upper])."
Plain: "We estimate the average [variable] is between [lower] and [upper]."
t-Test:
Technical: "t([df]) = [value], p = [value], d = [value], 95% CI: ([lower], [upper])."
Plain: "[Group 1] scored [higher/lower] than [Group 2] by about [difference]. This difference is [unlikely] to be due to chance, and the effect is [small/medium/large]."
Regression:
Technical: "b = [slope], p = [value], $R^2$ = [value]."
Plain: "For every additional [unit of x], [y] tends to [change] by about [slope]. The model explains [R² × 100]% of the variation."
The "So What?" Checklist
Every result needs:

1. The finding: what happened
2. The magnitude: how big (effect size)
3. The uncertainty: how confident (CI)
4. The implication: so what? (recommendation)
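
The first three checklist items can be computed; the fourth has to be written by the analyst. Here is a rough sketch under stated assumptions: the scores are hypothetical, the effect-size labels follow Cohen's conventional benchmarks (0.2, 0.5, 0.8), and the interval uses a normal approximation, which is slightly too narrow for small samples (a t-based interval would be the usual choice there).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(g1, g2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(pooled_var)


def label_effect(d):
    """Label |d| with Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def diff_ci(g1, g2, level=0.95):
    """Approximate CI for the difference in means (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(stdev(g1) ** 2 / len(g1) + stdev(g2) ** 2 / len(g2))
    diff = mean(g1) - mean(g2)
    return diff - z * se, diff + z * se


# Hypothetical scores for two groups (placeholder data)
treatment = [78, 85, 82, 90, 74, 88, 81, 86]
control = [72, 80, 75, 83, 70, 79, 77, 74]

d = cohens_d(treatment, control)
low, high = diff_ci(treatment, control)
diff = mean(treatment) - mean(control)

print(f"Finding: the treatment group scored {diff:.1f} points higher on average.")
print(f"Magnitude: d = {d:.2f} ({label_effect(d)} effect).")
print(f"Uncertainty: 95% CI for the difference is ({low:.1f}, {high:.1f}).")
print("Implication: [state the recommendation that follows from the result]")
```
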
Report Structure
The Five Sections
| Section | Purpose | Key Content |
|---|---|---|
| Introduction | The "Why" | Research question, context, hypothesis |
| Methods | The "How" | Data source, sample size, analysis methods, cleaning decisions |
| Results | The "What" | Findings with test statistics, effect sizes, CIs, and visualizations |
| Discussion | The "So What" | Interpretation, practical significance, alternative explanations |
| Limitations | The "But" | Sampling, measurement, confounding, generalizability |
Executive Summary Template
- What did we study? (One sentence)
- What did we find? (One or two sentences)
- Why does it matter? (One sentence)
- What should we do? (One sentence)
Presenting Uncertainty Honestly
| Tool | When to Use |
|---|---|
| Error bars | Bar charts comparing group means |
| Confidence bands | Regression lines and trend lines |
| Hedging language | Text descriptions of findings |
| Exact p-values | Reports (not just "p < .05") |
| Confidence intervals | Always, alongside point estimates |
| Effect sizes | Always, alongside p-values |
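
Reporting exact p-values is easy to automate. The helper below is a hypothetical sketch, not a standard library function. It follows the convention used in this summary of dropping the leading zero, and it switches to "p < .001" for very small values, where three decimals would otherwise round to zero.

```python
def format_p(p: float) -> str:
    """Format a p-value to three decimals with no leading zero; floor at .001."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    # f"{p:.3f}" gives e.g. "0.032"; stripping the leading "0" gives ".032"
    return "p = " + f"{p:.3f}".lstrip("0")


for value in (0.032, 0.06, 0.0004):
    print(format_p(value))
```
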
Hedging Language Guide
| Evidence Strength | Language |
|---|---|
| Very strong (p < .001, large effect) | "The data clearly shows..." |
| Good (p < .05, medium effect) | "The data suggests..." |
| Suggestive (p = .05–.10) | "There are hints, but further data is needed..." |
| No evidence (p > .10) | "We found no evidence that..." (NOT "We proved no effect") |
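
As a rough illustration, the guide above can be encoded as a lookup on the p-value. This sketch is a deliberate simplification: the function name `hedge` is hypothetical, and it ignores effect size, which the table treats as part of evidence strength. Treat the output as a starting phrase to edit, not a mechanical rule.

```python
def hedge(p: float) -> str:
    """Return an opening phrase matched to the p-value bands in the guide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "The data clearly shows"
    if p < 0.05:
        return "The data suggests"
    if p <= 0.10:
        return "There are hints, but further data is needed, that"
    return "We found no evidence that"


print(hedge(0.03) + " the campaign increased sales.")
```
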
Accessibility Principles
| Principle | Implementation |
|---|---|
| Don't rely on color alone | Use shapes, patterns, AND colors |
| Use colorblind-friendly palettes | Viridis, cividis, or Wong's palette |
| Test your charts | View in grayscale |
| Use direct labels | Label data series on the chart, not just in legends |
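
The grayscale test can be approximated in code before you ever render a chart. The sketch below converts hex colors to an approximate gray level using the Rec. 601 luma weights and flags pairs that would be hard to tell apart. The hex values are the commonly published ones for Wong's palette; the `min_gap` threshold of 25 is an arbitrary assumption, not an established standard.

```python
from itertools import combinations

# Wong's colorblind-friendly palette (commonly published hex values)
WONG = {
    "black": "#000000", "orange": "#E69F00", "sky blue": "#56B4E9",
    "bluish green": "#009E73", "yellow": "#F0E442", "blue": "#0072B2",
    "vermillion": "#D55E00", "reddish purple": "#CC79A7",
}


def gray_level(hex_color: str) -> float:
    """Approximate grayscale brightness (0-255) using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_similar(colors: dict, min_gap: float = 25.0) -> list:
    """Return pairs of color names whose gray levels differ by less than min_gap."""
    return [(a, b)
            for (a, ca), (b, cb) in combinations(colors.items(), 2)
            if abs(gray_level(ca) - gray_level(cb)) < min_gap]


for pair in too_similar(WONG):
    print("Hard to distinguish in grayscale:", pair)
```

Even a colorblind-friendly palette can contain pairs that look alike in grayscale, which is why the table pairs color with shapes, patterns, and direct labels.
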
Reproducibility Checklist
| Element | What to Do |
|---|---|
| Raw data | Save the original, unmodified dataset |
| Cleaning log | Document every step (deletions, transformations, imputations) |
| Code | Write all analysis in scripts or notebooks — no manual editing |
| Random seeds | Set a fixed seed for any simulation, e.g. np.random.seed(42) or a seeded generator from np.random.default_rng(42) |
| Library versions | Record version numbers of all packages |
| Comments | Explain why you made each decision |
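
Two of the checklist items, random seeds and library versions, can be handled at the top of every script. The following is a minimal sketch using only the standard library. The function name `environment_record`, the seed value, and the package list are illustrative assumptions. It seeds Python's `random` module; with NumPy the counterpart would be `np.random.seed(SEED)`.

```python
import json
import platform
import random
from importlib.metadata import PackageNotFoundError, version

SEED = 42  # arbitrary value; what matters is that it is fixed and recorded


def environment_record(packages):
    """Collect the Python version, the seed, and installed package versions."""
    record = {"python": platform.python_version(), "seed": SEED, "packages": {}}
    for name in packages:
        try:
            record["packages"][name] = version(name)
        except PackageNotFoundError:
            record["packages"][name] = "not installed"
    return record


random.seed(SEED)  # every run now draws the same pseudo-random numbers
sample = [random.random() for _ in range(3)]

print(json.dumps(environment_record(["numpy", "matplotlib", "seaborn"]), indent=2))
```

Saving the printed record alongside the results gives a reader what they need to rebuild the same environment.
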
Key Python Code
Professional Chart Template
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data: replace with your own categories and values
categories = ['Before campaign', 'After campaign']
values = [100, 112]

# Set global style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.size': 11,
    'axes.titlesize': 14,
    'figure.dpi': 150,
    'savefig.dpi': 300,
})

fig, ax = plt.subplots(figsize=(8, 5))

# Plot data
ax.bar(categories, values, color='steelblue', edgecolor='none')

# Clean styling: hide top/right spines, mute the rest, keep only faint y gridlines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.grid(False, axis='x')
ax.grid(axis='y', alpha=0.3)

# Descriptive title (states the finding)
ax.set_title('Monthly Sales Increased 12% After Campaign',
             color='#333333')
ax.set_ylabel('Sales ($K)', color='#555555')

# Data labels, centered just above each bar
for bar, val in zip(ax.patches, values):
    ax.text(bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 1, f'${val}K',
            ha='center', fontsize=10)

plt.tight_layout()
plt.savefig('chart.png', dpi=300, bbox_inches='tight')
```
Annotation Example
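```python
import matplotlib.pyplot as plt

# Placeholder data and positions: replace with your own
x = [1, 2, 3, 4, 5]
y = [210, 260, 380, 300, 240]
x_val, y_val = 3, 380          # data point to highlight
x_offset, y_offset = 3.6, 395  # where the label text sits (data coordinates)
target, x_pos = 250, 1.0       # reference-line value and x position of its label

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, color='steelblue', marker='o')

# Add annotation to highlight key data point
ax.annotate('Peak: 380 visits',
            xy=(x_val, y_val),
            xytext=(x_offset, y_offset),
            arrowprops=dict(arrowstyle='->',
                            color='#E74C3C'),
            fontsize=10, color='#E74C3C',
            fontweight='bold')

# Add reference line with a label just below it
ax.axhline(y=target, color='gray', linestyle='--',
           linewidth=1, alpha=0.5)
ax.text(x_pos, target - 3, f'Target: {target} visits',
        fontsize=9, color='gray', va='top')
```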
Excel Chart Formatting
| Task | How |
|---|---|
| Remove legend (one series) | Click legend → Delete |
| Lighten gridlines | Format → Color: light gray |
| Remove chart border | Format → No border |
| Descriptive title | Replace "Chart Title" with finding |
| Start y-axis at 0 | Format Axis → Minimum = 0 |
| Single color | Format bars → one muted color |
| Add data labels | Right-click → Add Data Labels |
Common Mistakes
| Mistake | Correction |
|---|---|
| Reporting only p-values | Always include effect sizes and CIs |
| "We proved the treatment works" | "The data provides strong evidence of an effect" |
| Charts without titles or with generic titles | State the finding in the title |
| No limitations section | Every analysis has limitations — state them |
| Manual data editing without documentation | Script all analysis steps for reproducibility |
| Red-green color coding | Use colorblind-friendly palettes |
| "p = .06 means no effect" | "Evidence was suggestive but didn't reach conventional significance" |
| Same writing style for all audiences | Adapt detail and language to the audience |
Connections
| Connection | Details |
|---|---|
| Ch.5 (Graph types) | Graph types from Ch.5 are now polished with design principles and accessibility |
| Ch.7 (Reproducibility) | Cleaning logs from Ch.7 become the Methods section of your report |
| Ch.13 (p-values) | p-value communication is one of the hardest challenges — now you have templates |
| Ch.17 (Effect sizes) | Always report effect sizes alongside p-values — the rule from Ch.17 becomes a reporting standard |
| Ch.20 (Decomposing variability) | $R^2$ is one of the most intuitive numbers to communicate: "explains X% of the variation" |
| Ch.22 (Regression) | Regression results need careful communication — slope interpretation and $R^2$ |
| Ch.26 (Critical consumer) | The misleading techniques you learned to avoid as a producer, you'll learn to detect as a consumer |
| Ch.27 (Ethical data practice) | Honest visualization is ethical practice — the line between design and deception |