
Chapter 25: Communicating with Data: Telling Stories with Numbers

"The greatest value of a picture is when it forces us to notice what we never expected to see." — John Tukey

Chapter Overview

Here's a secret that nobody tells you in statistics class: the analysis is the easy part.

I know — after twenty-four chapters of probability distributions, hypothesis tests, regression models, and p-values, that statement probably feels like a slap in the face. But think about it. You've built an impressive toolkit. You can clean data, visualize distributions, test hypotheses, build regression models, and interpret confidence intervals. You can do things that most people — including most professionals — cannot.

And none of it matters if you can't communicate what you found.

The most elegant regression model in the world is worthless if your audience doesn't understand your graph. The most important finding in your dataset will be ignored if you bury it in jargon. The most careful analysis will be misused if you present it in a way that misleads — even unintentionally.

This chapter is about the craft of communication. Less math, more design. Less calculation, more writing. Less "how do I compute this?" and more "how do I make sure people understand what I computed?"

We'll learn from Edward Tufte, the legendary data visualization expert whose principles have guided chart design for decades. We'll study common tricks — truncated axes, cherry-picked time windows, 3D effects, and other techniques that can make honest data tell dishonest stories. We'll practice writing statistical results for audiences who have never taken a statistics course (which is most of your future audience). And we'll structure a complete data analysis report, from introduction to limitations.

Here's why this matters right now: Maya Chen is preparing a public health brief for the city council. Alex Rivera is building a dashboard for StreamVibe executives. Sam Okafor is presenting analytics findings to the Riverside Raptors coaching staff. Professor James Washington is communicating algorithmic bias findings to policymakers. Each of them has done rigorous analysis. Each of them could torpedo their own work with a bad graph, a confusing sentence, or a misleading visual.

Don't let that happen to you.

In this chapter, you will learn to:

  • Design clear, honest, and effective statistical visualizations
  • Write statistical results for non-technical audiences
  • Avoid common misleading graph techniques (truncated axes, cherry-picked scales)
  • Structure a data analysis report with reproducibility in mind
  • Present statistical findings in oral and written formats

Fast Track: If you're already comfortable with basic data visualization and want to focus on communication, skim Sections 25.1–25.3, then jump to Section 25.7 (writing statistical results). Complete quiz questions 1, 8, and 15 to verify.

Deep Dive: After this chapter, work through Case Study 1 (Maya's public health brief) for a complete before-and-after revision exercise, then Case Study 2 (James's policy memo on algorithmic bias) for practice communicating sensitive statistical findings to decision-makers. Both include full Python code.


25.1 A Puzzle Before We Start (Productive Struggle)

Before we define any principles, look at these two versions of the same data.

Two Graphs, One Dataset

A company reports quarterly revenue. Both graphs below show the exact same data — revenue from Q1 2023 through Q4 2024.

Graph A: The y-axis starts at $0 and goes to $120 million. The bars show revenue of $95M, $97M, $98M, $96M, $99M, $101M, $103M, $102M. The title reads: "Quarterly Revenue: Steady Performance."

Graph B: The y-axis starts at $93 million and goes to $105 million. The bars show the same values. The title reads: "Revenue Surges — Growth Accelerating!"

(a) Which graph makes it look like revenue is exploding? Which makes it look flat?

(b) Which graph is lying? Or are both telling the truth?

(c) If you were presenting to investors who want to see growth, which would you choose? If you were presenting to regulators who want accuracy, which would you choose? What does that choice say about you?

Take 3 minutes. The answer to question (c) is the entire lesson of this chapter.

Here's the uncomfortable truth: both graphs are technically accurate. Every bar corresponds to the correct revenue figure. No number has been fabricated. But Graph B, by truncating the y-axis, creates a visual impression that's wildly different from the underlying reality. A roughly 7% increase over two years looks like a tripling.

This is why communication matters. The same data, presented two different ways, can lead to completely different decisions. And the person making the chart — that's you — has an ethical responsibility to present data honestly.

Let's learn how to do that.


25.2 Tufte's Principles: The Foundation of Good Data Visualization

Edward Tufte is a statistician, artist, and professor emeritus at Yale who wrote what many consider the most beautiful book about data: The Visual Display of Quantitative Information (1983). If this textbook has a patron saint of Chapter 25, it's Tufte.

Tufte's core argument is disarmingly simple: good data visualization maximizes the data and minimizes everything else. Every pixel on your chart should either show data or help the reader understand the data. Anything else is clutter.

The Data-Ink Ratio

Tufte's most famous concept is the data-ink ratio:

$$\text{Data-ink ratio} = \frac{\text{Ink used to display data}}{\text{Total ink used in the graphic}}$$

The goal is to push this ratio as close to 1.0 as possible. That means:

  • Remove unnecessary gridlines
  • Eliminate decorative borders and boxes
  • Simplify legends when the context is obvious
  • Remove background shading that doesn't encode information
  • Eliminate redundant labels

This doesn't mean every chart should be a stripped-down skeleton. It means every element should earn its place. If you can remove something and the chart is equally clear, remove it.

Key Insight: The data-ink ratio isn't about making ugly, minimalist charts. It's about making charts where the data is the star, not the decoration. Think of it like writing: good prose doesn't use big words to sound impressive — it uses the right words to be clear.
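That checklist can be captured in a small reusable helper. The sketch below is illustrative (the `tidy_axes` name and its specific defaults are my own, not Tufte's): it strips the most common non-data ink from a matplotlib Axes.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt

def tidy_axes(ax):
    """Apply the data-ink checklist: remove ink that doesn't show data."""
    ax.spines['top'].set_visible(False)           # no decorative box around the plot
    ax.spines['right'].set_visible(False)
    ax.set_facecolor('white')                     # no background shading
    ax.grid(False)                                # clear all gridlines...
    ax.grid(axis='y', alpha=0.3, linewidth=0.5)   # ...keep only light y-guides
    if ax.get_legend() is not None and len(ax.containers) <= 1:
        ax.get_legend().remove()                  # one series: the title is context enough
    return ax

fig, ax = plt.subplots()
ax.bar(['North', 'South', 'East', 'West'], [42, 38, 55, 47], color='steelblue')
tidy_axes(ax)
```

Run it on any chart you've drafted; if the chart is equally clear afterward, the removed elements were chartjunk.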

Chartjunk

Chartjunk is Tufte's term for visual elements that don't convey information. It includes:

| Chartjunk Element | Why It's a Problem |
| --- | --- |
| 3D effects on bar charts | Distorts visual comparison; bars in the back look shorter |
| Gradient fills on bars | Color changes don't encode data; they distract |
| Decorative clip art | Draws the eye away from the actual data |
| Excessive gridlines | Creates a visual cage around the data |
| Background images or textures | Competes with data for the viewer's attention |
| Exploding pie chart slices | Distorts the angular comparison that pie charts rely on |
| Drop shadows | Adds visual noise without information |

Here's the before-and-after:

Before/After: Removing Chartjunk

BEFORE (cluttered):

  • 3D bar chart with gradient fills
  • Dark background with grid lines at every 5 units
  • Decorative icons above each bar
  • Drop shadows on every element
  • Legend in a thick bordered box
  • Title in fancy font: "🔥 AMAZING SALES DATA 🔥"

AFTER (clean):

  • Flat 2D bar chart with a single muted color
  • Light gray gridlines on the y-axis only, at meaningful intervals
  • Data labels on each bar (so the reader doesn't need to trace to the axis)
  • No drop shadows
  • Legend removed (only one data series — the title is enough context)
  • Title in clean font: "Monthly Sales by Region, 2024"

What changed: The data didn't change at all. But the clean version lets the reader focus on what matters — which region sold the most — instead of admiring your graphic design skills.

import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# BEFORE AND AFTER: CHARTJUNK REMOVAL
# ============================================================

regions = ['North', 'South', 'East', 'West']
sales = [42, 38, 55, 47]

# --- BEFORE: Cluttered chart ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Cluttered version
ax1 = axes[0]
bars1 = ax1.bar(regions, sales, color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'],
                edgecolor='black', linewidth=2)
ax1.set_facecolor('#F0F0F0')
ax1.grid(True, which='both', linewidth=1.5, color='gray', alpha=0.7)
ax1.set_title('AMAZING SALES DATA', fontsize=16,
              fontweight='bold', fontstyle='italic',
              color='darkred')
ax1.set_ylabel('Sales (thousands)', fontsize=12, fontweight='bold')
ax1.set_xlabel('Region', fontsize=12, fontweight='bold')
for spine in ax1.spines.values():
    spine.set_linewidth(3)
    spine.set_color('black')
ax1.legend(['Q4 Sales'], loc='upper right',
           fancybox=True, shadow=True, fontsize=11,
           edgecolor='black')
ax1.annotate('WOW!', xy=(2, 55), fontsize=14, color='red',
             fontweight='bold', ha='center')
ax1.set_ylim(0, 70)

# --- AFTER: Clean chart ---
ax2 = axes[1]
bars2 = ax2.bar(regions, sales, color='steelblue', edgecolor='none',
                width=0.6)
ax2.set_title('Monthly Sales by Region, Q4 2024',
              fontsize=13, fontweight='normal', color='#333333',
              pad=15)
ax2.set_ylabel('Sales (thousands)', fontsize=11, color='#555555')
ax2.set_xlabel('')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.spines['left'].set_color('#CCCCCC')
ax2.spines['bottom'].set_color('#CCCCCC')
ax2.grid(axis='y', alpha=0.3, color='gray', linewidth=0.5)
ax2.set_ylim(0, 70)

# Add data labels on bars
for bar, val in zip(bars2, sales):
    ax2.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
             f'${val}K', ha='center', va='bottom',
             fontsize=10, color='#333333')

ax2.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('before_after_chartjunk.png', dpi=150, bbox_inches='tight')
plt.show()

print("Left: Chartjunk overload — multicolored bars, heavy gridlines,")
print("      thick borders, unnecessary legend, distracting annotation.")
print("Right: Clean design — single color, minimal gridlines, data labels,")
print("       no unnecessary elements. The DATA is the star.")

Small Multiples

Tufte's second major contribution is the concept of small multiples: a series of small, similarly designed charts that allow comparison across categories or time periods.

Instead of cramming five lines onto one busy chart, you create five small, simple charts side by side — each showing one category, all sharing the same axes so comparisons are straightforward.

import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# SMALL MULTIPLES: COMPARING TRENDS ACROSS GROUPS
# ============================================================

np.random.seed(42)

# Simulated monthly data for four regions
months = np.arange(1, 13)
month_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
regions = ['North', 'South', 'East', 'West']
base_sales = [40, 35, 50, 45]

fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharey=True)
fig.suptitle('Monthly Sales Trends by Region, 2024',
             fontsize=13, y=1.02, color='#333333')

for i, (ax, region, base) in enumerate(zip(axes, regions, base_sales)):
    # Simulated seasonal pattern
    seasonal = base + 5 * np.sin(2 * np.pi * months / 12) + \
               np.random.normal(0, 3, 12)

    ax.plot(months, seasonal, color='steelblue', linewidth=2)
    ax.fill_between(months, seasonal, alpha=0.15, color='steelblue')
    ax.set_title(region, fontsize=11, fontweight='bold', color='#333333')
    ax.set_xlim(1, 12)
    ax.set_xticks([1, 4, 7, 10])
    ax.set_xticklabels(['Jan', 'Apr', 'Jul', 'Oct'], fontsize=8)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_color('#CCCCCC')
    ax.spines['bottom'].set_color('#CCCCCC')
    ax.tick_params(colors='#555555', labelsize=8)
    ax.grid(axis='y', alpha=0.3, linewidth=0.5)

axes[0].set_ylabel('Sales ($K)', fontsize=10, color='#555555')

plt.tight_layout()
plt.savefig('small_multiples.png', dpi=150, bbox_inches='tight')
plt.show()

print("Small multiples: same scale, same design, easy comparison.")
print("The East region clearly has higher baseline sales.")
print("All regions show a similar seasonal pattern (summer peak).")

Key Insight: Small multiples work because they leverage the most powerful pattern-detection tool in the universe: the human visual system. When charts share the same axes and design, your eye can instantly spot which panel is different — without needing to decode a complex legend or untangle overlapping lines.

Tufte's Other Principles

| Principle | What It Means | In Practice |
| --- | --- | --- |
| Show the data | Let the viewer see the actual data points, not just summaries | Use scatterplots instead of just regression lines; show individual observations, not just means |
| Encourage comparison | The most interesting insights come from comparisons | Use shared axes, small multiples, side-by-side layouts |
| Serve a clear purpose | Every chart should answer a specific question | Write the question the chart answers before you make it |
| Integrate text and data | Labels, annotations, and titles should work with the visual | Annotate key data points; use descriptive titles that state the finding |
| Avoid distortion | Visual representation should be proportional to data values | Use consistent scales; don't use area or volume to represent one-dimensional quantities |

25.3 Misleading Graph Techniques

Now let's look at the dark side. These techniques are used every day in news media, corporate presentations, political campaigns, and — yes — sometimes in academic papers. Learning to spot them makes you a better consumer and producer of data.

Technique 1: Truncated Axes

We saw this in the opening puzzle. By starting the y-axis at a value other than zero (for bar charts), small differences look enormous.

Before/After: Truncated Axis

BEFORE (misleading): Approval ratings for a politician: 48%, 47%, 46%, 45%. Y-axis starts at 44%. The graph shows what appears to be a steep decline — it looks like support is collapsing.

AFTER (honest): Same data. Y-axis starts at 0%. The bars are nearly identical in height. The trend is real — there is a small decline — but the visual accurately conveys its magnitude.

The rule: For bar charts, the y-axis should almost always start at zero. For line charts, some truncation is acceptable (because you're comparing changes, not amounts), but you should clearly label the axis and consider adding a break indicator (⫽) to signal the truncation.

import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# MISLEADING VS. HONEST: TRUNCATED AXIS
# ============================================================

quarters = ['Q1', 'Q2', 'Q3', 'Q4']
approval = [48, 47, 46, 45]

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# --- MISLEADING: Truncated axis ---
ax1 = axes[0]
bars1 = ax1.bar(quarters, approval, color='#E74C3C', edgecolor='none',
                width=0.5)
ax1.set_ylim(44, 49)
ax1.set_title('Approval Rating in Freefall!',
              fontsize=13, fontweight='bold', color='#C0392B')
ax1.set_ylabel('Approval (%)', fontsize=11)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.grid(axis='y', alpha=0.3)
ax1.text(0.5, 0.02, 'Y-axis starts at 44% — misleading!',
         transform=ax1.transAxes, fontsize=9, color='red',
         ha='center', style='italic')

# --- HONEST: Full axis ---
ax2 = axes[1]
bars2 = ax2.bar(quarters, approval, color='steelblue', edgecolor='none',
                width=0.5)
ax2.set_ylim(0, 55)
ax2.set_title('Approval Rating: Small Decline Over 2024',
              fontsize=13, color='#333333')
ax2.set_ylabel('Approval (%)', fontsize=11)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.grid(axis='y', alpha=0.3)

# Add data labels
for bar, val in zip(bars2, approval):
    ax2.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.8,
             f'{val}%', ha='center', fontsize=10, color='#333333')

plt.tight_layout()
plt.savefig('truncated_axis.png', dpi=150, bbox_inches='tight')
plt.show()

print("Left: Y-axis starts at 44% — a 3-point drop looks like a collapse.")
print("Right: Y-axis starts at 0% — the decline is visible but properly scaled.")
print("Both show the SAME data. The visual impression is completely different.")

Technique 2: Cherry-Picked Time Windows

Choose the right start date and almost any trend can look like whatever you want.

The stock market crashed in March 2020 and recovered spectacularly by late 2020. If you show a graph starting March 2020, it looks like explosive growth. If you show a graph starting January 2020, it looks like a crash followed by a recovery. If you show the full decade, it looks like a temporary blip in a long upward trend. Same market. Same data. Three completely different stories.

The rule: Always ask "why does this time window start and end where it does?" If you can't give a good reason, you might be cherry-picking. Show the longest reasonable time frame, or at minimum acknowledge what happened before and after your window.
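To make the framing effect concrete, here's a numeric sketch. The index values are illustrative round numbers matching the 2020 pattern described above, not exact market data:

```python
# Illustrative index values (not exact market data)
prices = {
    'Jan 2020': 3230,   # pre-crash level
    'Mar 2020': 2240,   # crash low
    'Dec 2020': 3760,   # post-recovery level
}

def pct_change(start, end):
    return 100 * (end - start) / start

from_march = pct_change(prices['Mar 2020'], prices['Dec 2020'])
from_january = pct_change(prices['Jan 2020'], prices['Dec 2020'])

print(f"Window starting at the March low: {from_march:+.0f}%  (explosive growth!)")
print(f"Window starting in January:       {from_january:+.0f}%  (crash, then a modest net gain)")
```

Same ending value, same data. Moving the start date two months changes the headline number by a factor of four.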

Technique 3: Dual Y-Axes

Dual-axis charts plot two variables on the same graph with different y-axes (one on the left, one on the right). They're popular in business dashboards and almost always misleading.

The problem: by adjusting the scales of the two axes, you can make any two variables appear correlated — or uncorrelated. You can make the lines cross wherever you want. You control the visual relationship by choosing the axis ranges, not by letting the data speak.

The rule: Avoid dual-axis charts. If you need to compare two variables with different units, use small multiples (two separate charts, side by side, sharing the x-axis). If you absolutely must use dual axes, label both axes prominently and don't draw conclusions about correlation from the visual overlap.
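Here's a sketch of why the visual relationship is entirely under the chart-maker's control. For any two series, you can solve for right-axis limits that force the second line to start and end at exactly the same visual height as the first. All numbers below are invented for illustration:

```python
import numpy as np

revenue = np.array([95, 97, 98, 96, 99, 101, 103, 102])   # $M, left axis
temps = np.array([61, 55, 48, 52, 58, 66, 71, 68])        # degrees F, right axis (unrelated)

lo_L, hi_L = 90, 105                                       # chosen left-axis limits

def visual_pos(y, lo, hi):
    """Fraction of the axis height at which a value is drawn (0 = bottom, 1 = top)."""
    return (y - lo) / (hi - lo)

# Solve for right-axis limits so temps' endpoints land at the same
# visual heights as revenue's endpoints: two equations, two unknowns.
p0 = visual_pos(revenue[0], lo_L, hi_L)
p1 = visual_pos(revenue[-1], lo_L, hi_L)
span = (temps[-1] - temps[0]) / (p1 - p0)
lo_R = temps[0] - p0 * span
hi_R = lo_R + span

print(f"Right-axis limits that make the lines 'track': [{lo_R:.1f}, {hi_R:.1f}]")
```

No relationship between the variables was needed; the apparent correlation is manufactured by the axis choice.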

Technique 4: 3D Charts

Three-dimensional effects on 2D data are never helpful and almost always harmful. A 3D bar chart makes bars in the "back" appear shorter than bars in the "front," even when they represent the same value. A 3D pie chart distorts angles — slices in the "front" of the pie look larger because of the perspective projection.

The rule: Never use 3D effects for 2D data. If your data has three genuine dimensions (e.g., x, y, and z coordinates), 3D visualization is appropriate. Otherwise, 3D is chartjunk.

Technique 5: Pie Charts with Too Many Slices

Pie charts have exactly one strength: showing parts of a whole when there are a small number of categories (ideally 2–4). They fail spectacularly when there are too many slices, when slices are similar in size, or when you need to compare across multiple pie charts.

Before/After: Pie Chart Improvement

BEFORE: A pie chart showing market share for 12 companies. Seven slices are between 5% and 8%. The labels overlap. You can't tell which company is bigger.

AFTER: A horizontal bar chart showing the same data, sorted from largest to smallest. Now the comparison is instant — your eye reads the bar lengths and immediately sees the ranking.

The rule: If your pie chart has more than 5 slices, switch to a bar chart. Actually, consider switching to a bar chart regardless — Cleveland and McGill's 1984 research showed that humans judge bar lengths more accurately than pie angles.

import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# PIE CHART VS. BAR CHART: WHEN PIES FAIL
# ============================================================

companies = ['Alpha Corp', 'Beta Inc', 'Gamma LLC', 'Delta Co',
             'Epsilon Ltd', 'Zeta Group', 'Eta Systems', 'Theta Tech',
             'Iota Media', 'Kappa Net', 'Lambda AI', 'Other']
shares = [15, 12, 11, 9, 8, 8, 7, 7, 6, 6, 5, 6]

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# --- PIE CHART: Hard to read ---
ax1 = axes[0]
colors = plt.cm.Set3(np.linspace(0, 1, len(companies)))
ax1.pie(shares, labels=companies, autopct='%1.0f%%', colors=colors,
        startangle=90, textprops={'fontsize': 8})
ax1.set_title('Market Share (Pie Chart)\nCan you tell which is bigger?',
              fontsize=12, color='#333333')

# --- BAR CHART: Easy to read ---
ax2 = axes[1]
# Sort by share
sorted_idx = np.argsort(shares)
sorted_companies = [companies[i] for i in sorted_idx]
sorted_shares = [shares[i] for i in sorted_idx]

bars = ax2.barh(sorted_companies, sorted_shares, color='steelblue',
                edgecolor='none', height=0.6)
ax2.set_xlabel('Market Share (%)', fontsize=11, color='#555555')
ax2.set_title('Market Share (Bar Chart)\nInstant comparison',
              fontsize=12, color='#333333')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.spines['left'].set_color('#CCCCCC')
ax2.spines['bottom'].set_color('#CCCCCC')

# Add value labels
for bar, val in zip(bars, sorted_shares):
    ax2.text(bar.get_width() + 0.3, bar.get_y() + bar.get_height() / 2,
             f'{val}%', va='center', fontsize=9, color='#333333')

ax2.set_xlim(0, 20)
ax2.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('pie_vs_bar.png', dpi=150, bbox_inches='tight')
plt.show()

print("Left: 12-slice pie chart. Try comparing Epsilon (8%) to Zeta (8%).")
print("Right: Sorted bar chart. The ranking and magnitudes are immediately clear.")

Technique 6: Area and Volume Distortion

If you double a number, and you represent it by doubling the radius of a circle, the visual area increases by a factor of four ($\pi r^2$). If you use 3D objects, doubling the linear dimension increases the volume by a factor of eight ($r^3$). This means the visual impression of the larger number is dramatically exaggerated.

The rule: When using icons or shapes to represent quantities, scale the area, not the diameter. Better yet, use bars — they scale linearly and avoid this problem entirely.
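The arithmetic is worth working through once. Suppose one value is exactly double another, and you scale an icon's radius by the value (the specific numbers are arbitrary):

```python
import math

small, big = 10, 20   # the data: one value is double the other

# WRONG: scale the radius by the value -> area grows with the square
r_small, r_big = small, big
wrong_area_ratio = (math.pi * r_big**2) / (math.pi * r_small**2)

# RIGHT: scale the area by the value -> radius grows with the square root
r_big_correct = r_small * math.sqrt(big / small)
right_area_ratio = (math.pi * r_big_correct**2) / (math.pi * r_small**2)

print(f"Radius-scaled icon looks {wrong_area_ratio:.0f}x bigger (data ratio: 2x)")
print(f"Area-scaled icon looks {right_area_ratio:.0f}x bigger (data ratio: 2x)")
```

The radius-scaled icon covers four times the ink for twice the value; scaling by the square root of the value keeps the visual area proportional to the data.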

Summary: The Misleading Techniques Checklist

| Technique | What It Does | How to Fix It |
| --- | --- | --- |
| Truncated axis | Makes small differences look enormous | Start bar chart y-axis at 0; label breaks clearly for line charts |
| Cherry-picked time window | Controls the narrative by selective framing | Show the longest reasonable time frame; justify your window choice |
| Dual y-axes | Allows arbitrary visual correlation between any two variables | Use small multiples instead; or clearly label both axes |
| 3D effects | Distorts visual proportions through perspective | Use flat 2D charts for 2D data |
| Too many pie slices | Makes comparison impossible | Switch to a sorted bar chart |
| Area/volume distortion | Exaggerates differences through non-linear scaling | Scale by area, not diameter; prefer bars |

25.4 Spaced Review 1: Graph Types and When to Use Them (from Ch. 5)

Way back in Chapter 5, you learned the fundamental graph types: histograms for distributions, bar charts for categories, scatterplots for relationships, box plots for comparing groups, and time series plots for trends.

Now it's time to polish those graphs. Here's the quick review of which graph to choose, and then we'll add the design principles from Sections 25.2–25.3.

| Data Situation | Best Graph Type | Now Add These Design Principles |
| --- | --- | --- |
| One numerical variable (distribution) | Histogram | Meaningful bin widths; clear axis labels; descriptive title stating the finding |
| One categorical variable | Bar chart (or horizontal bar for many categories) | Y-axis starts at 0; sort bars by frequency if categories have no natural order |
| Two numerical variables (relationship) | Scatterplot | Add a regression line only if the relationship is linear; label axes with units |
| One numerical variable across groups | Box plot or side-by-side histograms | Shared axis scale; annotate medians if needed; use small multiples for many groups |
| Trend over time | Line chart | Consistent time intervals; don't truncate the y-axis without a clear signal |
| Parts of a whole | Pie chart (2–4 slices) or stacked bar | If more than 5 categories, switch to bar chart |

Connection to Chapter 5: In Chapter 5, you learned what each graph shows. Now you're learning how to make each graph effective. The first is about statistical thinking — choosing the right tool. The second is about communication — making the tool work for your audience.

Test yourself (retrieval practice): Without looking at the table above, try to answer: What type of graph would you use to compare the distribution of exam scores across four different class sections? What design principles from this chapter would you apply?

Check your answer: Side-by-side box plots (or small multiples of histograms). Design principles: share the same y-axis across all groups, use a clean color scheme, label medians, remove chartjunk, and write a title that states the comparison: "Exam Score Distributions Across Four Sections, Fall 2024."
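As a sketch of that answer in code (the section names and simulated scores are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt

np.random.seed(0)
sections = ['Section 1', 'Section 2', 'Section 3', 'Section 4']
scores = [np.clip(np.random.normal(mu, 8, 30), 0, 100)
          for mu in [72, 78, 75, 81]]          # simulated exam scores

fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(scores)
ax.set_xticklabels(sections)                   # shared y-axis comes free: one Axes
ax.set_ylabel('Exam Score')
ax.set_title('Exam Score Distributions Across Four Sections, Fall 2024')
ax.spines['top'].set_visible(False)            # remove chartjunk
ax.spines['right'].set_visible(False)
ax.grid(axis='y', alpha=0.3)
```

Because all four groups share one Axes, the scale is identical by construction, and the title states the comparison instead of just naming the variables.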

25.5 Designing for Accessibility

Before we move on to writing, we need to address something that Tufte's 1983 book didn't cover: accessibility. Approximately 8% of men and 0.5% of women have some form of color vision deficiency. If your chart relies entirely on color to distinguish categories, a meaningful portion of your audience won't be able to read it.

Color Accessibility Principles

| Principle | What to Do | What to Avoid |
| --- | --- | --- |
| Don't rely on color alone | Use shapes, patterns, or labels in addition to color | Red-green color coding as the only distinguisher |
| Use colorblind-friendly palettes | Viridis, cividis, or manually selected palettes | Default rainbow or jet colormaps |
| Test your charts | View in grayscale or use a colorblind simulator | Assuming your monitor represents all viewers |
| Use direct labels | Label data series directly instead of using legends | Legends that require matching small color swatches to data |
import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# ACCESSIBLE COLOR PALETTES
# ============================================================

categories = ['Group A', 'Group B', 'Group C', 'Group D']
values = [35, 42, 28, 51]

fig, axes = plt.subplots(1, 3, figsize=(16, 4))

# --- Bad: Red-green palette ---
ax1 = axes[0]
bad_colors = ['#FF0000', '#00FF00', '#0000FF', '#FF00FF']
ax1.bar(categories, values, color=bad_colors, edgecolor='none')
ax1.set_title('Problematic:\nRed-Green Colors', fontsize=11, color='#333333')
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.set_ylim(0, 60)

# --- Good: Colorblind-friendly palette ---
ax2 = axes[1]
# Wong's colorblind-friendly palette (commonly recommended)
good_colors = ['#0072B2', '#E69F00', '#009E73', '#CC79A7']
ax2.bar(categories, values, color=good_colors, edgecolor='none')
ax2.set_title('Better:\nColorblind-Friendly Palette', fontsize=11,
              color='#333333')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.set_ylim(0, 60)

# --- Best: Color + pattern + labels ---
ax3 = axes[2]
hatches = ['///', '...', 'xxx', '\\\\\\']
bars = ax3.bar(categories, values, color=good_colors,
               edgecolor='#333333', linewidth=0.5)
for bar, hatch in zip(bars, hatches):
    bar.set_hatch(hatch)
for bar, val in zip(bars, values):
    ax3.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
             str(val), ha='center', fontsize=10, color='#333333')
ax3.set_title('Best:\nColor + Pattern + Labels', fontsize=11,
              color='#333333')
ax3.spines['top'].set_visible(False)
ax3.spines['right'].set_visible(False)
ax3.set_ylim(0, 60)

for ax in axes:
    ax.grid(axis='y', alpha=0.2)
    ax.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('accessibility.png', dpi=150, bbox_inches='tight')
plt.show()

print("Left: Red and green are indistinguishable for ~8% of men.")
print("Center: Colorblind-friendly palette (Wong 2011).")
print("Right: Color + hatching patterns + data labels — works for everyone.")

Key Insight: Designing for accessibility doesn't just help people with color vision deficiency. It makes your charts better for everyone — they print well in black and white, they work on projector screens with poor contrast, and they're clearer in low-light conditions. Universal design is good design.


25.6 The Annotation Layer: Making Your Charts Talk

A chart without annotations is like a painting without a title — the viewer might admire it, but they're not sure what they're supposed to take away. Annotations are text labels, arrows, and callouts that guide the viewer's attention to the most important features of your visualization.

What to Annotate

| Feature | When to Annotate It | How |
| --- | --- | --- |
| Key data point | When one value is the main finding | Arrow pointing to the data point with a text label |
| Threshold or benchmark | When there's a meaningful reference line | Horizontal/vertical dashed line with a label |
| Trend change | When the pattern shifts at a specific point | Vertical line at the change point with explanation |
| Outlier | When an unusual observation deserves attention | Callout box explaining why it's unusual |
| Comparison | When the viewer should compare two specific items | Bracket or arrow connecting the two items |
import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# ANNOTATION EXAMPLE: MAYA'S ER WAIT TIME TREND
# ============================================================

np.random.seed(42)
months = np.arange(1, 25)
# Label every month so that taking every 3rd entry matches the tick positions
month_labels = ['Jan \'23', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
                'Jan \'24', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Simulated ER wait times (minutes)
wait_times = 145 + 3 * months + np.random.normal(0, 8, 24)
# Intervention at month 13
wait_times[12:] = wait_times[12:] - 35 + np.random.normal(0, 5, 12)

fig, ax = plt.subplots(figsize=(12, 5))

ax.plot(months, wait_times, color='steelblue', linewidth=2,
        marker='o', markersize=4, markerfacecolor='steelblue')

# Annotation 1: Peak wait time
peak_idx = np.argmax(wait_times[:12])
ax.annotate(f'Peak: {wait_times[peak_idx]:.0f} min',
            xy=(months[peak_idx], wait_times[peak_idx]),
            xytext=(months[peak_idx] + 2, wait_times[peak_idx] + 15),
            arrowprops=dict(arrowstyle='->', color='#E74C3C', lw=1.5),
            fontsize=10, color='#E74C3C', fontweight='bold')

# Annotation 2: Intervention line
ax.axvline(x=12.5, color='#2ECC71', linestyle='--', linewidth=1.5,
           alpha=0.7)
ax.text(12.8, max(wait_times) + 5, 'New triage\nprotocol\nimplemented',
        fontsize=9, color='#2ECC71', fontweight='bold', va='bottom')

# Annotation 3: Post-intervention average
post_mean = np.mean(wait_times[12:])
ax.axhline(y=post_mean, xmin=0.52, xmax=0.98, color='#E67E22',
           linestyle=':', linewidth=1.5, alpha=0.7)
ax.text(24.3, post_mean, f'Post avg:\n{post_mean:.0f} min',
        fontsize=9, color='#E67E22', va='center')

# Annotation 4: Target
ax.axhline(y=120, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.text(1, 117, 'Target: 120 min', fontsize=9, color='gray',
        va='top')

ax.set_title('ER Wait Times: Effect of New Triage Protocol',
             fontsize=13, color='#333333', pad=15)
ax.set_ylabel('Average Wait Time (minutes)', fontsize=11,
              color='#555555')
ax.set_xlabel('Month', fontsize=11, color='#555555')
ax.set_xticks(months[::3])
ax.set_xticklabels([month_labels[i] for i in range(0, 24, 3)],
                   fontsize=9)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.grid(axis='y', alpha=0.2)
ax.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('annotated_chart.png', dpi=150, bbox_inches='tight')
plt.show()

print("This chart tells a complete story without any additional text:")
print("1. ER wait times were rising steadily")
print("2. A new triage protocol was introduced in January 2024")
print("3. Wait times dropped significantly after the intervention")
print("4. The hospital hasn't yet reached its 120-minute target")
print("\nEvery annotation serves the story. Nothing is decorative.")

Notice how the annotations in the chart above turn raw data into a narrative. Without the annotations, the reader sees a line that goes up and then drops. With annotations, the reader understands: wait times were climbing, an intervention happened, results improved, but haven't met the target yet. That's a complete story in one graph.

Key Insight: A well-annotated chart can replace an entire paragraph of text. When you find yourself writing "As shown in Figure 3, we can see that..." — stop. If you need to describe what the chart shows, the chart isn't doing its job. Add annotations until the chart speaks for itself, then your text can focus on interpretation rather than description.


25.7 Writing Statistical Results: Translating Numbers into Words

This is where most statistics students — and even many professionals — struggle. You've done the analysis. You have a p-value, a confidence interval, an effect size, and maybe an $R^2$. Now you have to explain what it all means to someone who has never taken a statistics course.

The Two Audiences

Every statistical result will eventually be read by two different audiences, and you need to write for both:

| | Technical Audience | Non-Technical Audience |
|---|---|---|
| Who | Other statisticians, data scientists, peer reviewers | Executives, policymakers, journalists, general public |
| What they want | Methods, assumptions, exact numbers, reproducibility | The bottom line: what does this mean and what should we do? |
| What they know | Statistical jargon, methods, assumptions | Basic numeracy, intuitive sense of "big" vs. "small" |
| Your writing style | Precise, formula-heavy, assumption-checking | Plain language, analogies, visualizations |
| Example | "A two-sample Welch's t-test yielded t(247.3) = 2.53, p = .012, d = 0.32" | "Users who saw the new algorithm spent about 4.5 more minutes per session — a small but real improvement" |

Template Sentences for Common Tests

Here's a practical toolkit — fill-in-the-blank sentences for translating statistical results into clear prose. For each test, I've written a technical version and a plain-language version.

Confidence Interval:

Technical: "The 95% confidence interval for the mean [variable] was ([lower], [upper]), suggesting that the true population mean plausibly falls within this range."

Plain: "We estimate that the average [variable] is between [lower] and [upper]. We're fairly confident in this range, though the true value could be somewhat different."

Two-Sample t-Test:

Technical: "A two-sample t-test comparing [group 1] and [group 2] on [variable] yielded t([df]) = [value], p = [value], 95% CI for the difference: ([lower], [upper]), Cohen's d = [value]."

Plain: "The [group 1] group scored [higher/lower] than the [group 2] group by about [difference] points. This difference is [unlikely/somewhat unlikely/plausible] to be due to chance alone, and it represents a [small/medium/large] effect."

Chi-Square Test:

Technical: "$\chi^2$([df], N = [n]) = [value], p = [value], Cramér's V = [value], indicating a [small/medium/large] association between [variable 1] and [variable 2]."

Plain: "We found a [statistically significant] relationship between [variable 1] and [variable 2]. In practical terms, knowing someone's [variable 1] helps predict their [variable 2] — but only [a little/moderately/a lot]."

Regression:

Technical: "Simple linear regression showed that [x] significantly predicted [y], b = [slope], t([df]) = [value], p = [value], $R^2$ = [value]. For each one-unit increase in [x], [y] increased by [slope] units."

Plain: "[X] and [y] are clearly related: for every additional [unit of x], [y] tends to [increase/decrease] by about [slope] [units]. The model explains about [R² × 100]% of the variation in [y] — meaningful, but there are other factors at play."

ANOVA:

Technical: "A one-way ANOVA revealed significant differences among the [k] groups, F([df₁], [df₂]) = [value], p = [value], $\eta^2$ = [value]. Post-hoc Tukey's HSD tests showed that [group A] differed significantly from [group B] (p = [value])."

Plain: "The [k] groups are not all the same. In particular, [group A] had notably [higher/lower] scores than [group B]. The grouping explained about [η² × 100]% of the overall variation in scores."
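These templates can even be filled in programmatically. Here's a sketch for the two-sample t-test case — the helper name, the thresholds for the small/medium/large labels, and the averaged-variance version of Cohen's d are my choices for illustration, not a standard API:

```python
import numpy as np
from scipy import stats

def describe_t_test(group1, group2, label1, label2, variable):
    """Fill the technical and plain-language t-test templates (illustrative)."""
    res = stats.ttest_ind(group1, group2, equal_var=False)  # Welch's t-test
    diff = np.mean(group1) - np.mean(group2)
    # Cohen's d using the average of the two sample variances (one common choice)
    pooled_sd = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
    d = diff / pooled_sd
    technical = (f"A two-sample Welch's t-test comparing {label1} and {label2} "
                 f"on {variable} yielded t = {res.statistic:.2f}, "
                 f"p = {res.pvalue:.3f}, Cohen's d = {d:.2f}.")
    direction = "higher" if diff > 0 else "lower"
    size = "small" if abs(d) < 0.5 else ("medium" if abs(d) < 0.8 else "large")
    plain = (f"The {label1} group scored {direction} than the {label2} group "
             f"by about {abs(diff):.1f} points, a {size} effect.")
    return technical, plain

# Simulated example data
rng = np.random.default_rng(42)
treatment = rng.normal(56, 10, 200)
control = rng.normal(52, 10, 200)
technical, plain = describe_t_test(treatment, control, "treatment", "control",
                                   "engagement score")
print(technical)
print(plain)
```

A helper like this keeps technical and plain-language reports consistent with each other — both are generated from the same numbers, so they can't drift apart.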

The "So What?" Test

After writing any statistical result, ask yourself: "If my reader stopped reading right here, would they know what to do next?"

If the answer is no, you haven't finished writing. Every result needs:

  1. The finding (what happened)
  2. The magnitude (how big)
  3. The uncertainty (how confident)
  4. The implication (so what?)

Here's an example of each level, building from insufficient to complete:

Level 1 (insufficient): "The result was statistically significant (p < .05)."

Level 2 (better): "Users spent significantly more time on the platform with the new algorithm (t(498) = 2.53, p = .012)."

Level 3 (good): "Users spent an average of 4.5 more minutes per session with the new algorithm (t(498) = 2.53, p = .012, d = 0.32, 95% CI: 1.0 to 8.0 minutes)."

Level 4 (excellent): "Users spent an average of 4.5 more minutes per session with the new algorithm — a statistically significant difference (p = .012) representing a small-to-medium effect (d = 0.32). At StreamVibe's scale of 12 million daily users, even this modest improvement translates to approximately 54 million additional minutes of engagement per day. The 95% confidence interval (1.0 to 8.0 minutes) suggests the true improvement could be as small as 1 minute or as large as 8 minutes per session."


25.8 Spaced Review 2: Interpreting p-Values for Non-Statisticians (from Ch. 13)

One of the hardest communication challenges in all of statistics is explaining p-values to non-technical audiences. Let's review what we learned in Chapter 13 and translate it into communication language.

What a p-value is: The probability of seeing data as extreme as what we observed, if there were really no effect.

What a p-value is NOT:

  • It is NOT the probability that the null hypothesis is true
  • It is NOT the probability that the result happened by chance
  • It is NOT the probability that you'll get the same result if you repeat the study
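The definition is easier to internalize with a quick simulation. The sketch below (sample sizes and the "observed" t-value are illustrative) generates thousands of experiments in which the null is true by construction and counts how often a result as extreme as a hypothetical observed t = 2.0 appears:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n = 10_000, 50
observed_t = 2.0   # suppose our real study produced t = 2.0
extreme = 0

for _ in range(n_sims):
    # Two groups drawn from the SAME distribution: the null is true here
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    t = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    if abs(t) >= observed_t:
        extreme += 1

frac = extreme / n_sims
print(f"With no true effect, |t| >= {observed_t} occurred in "
      f"{frac:.1%} of experiments")
```

The fraction printed is (approximately) the two-sided p-value: how often pure noise produces a result that extreme.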

How to communicate p-values:

| Don't Say | Do Say |
|---|---|
| "There's only a 3% chance this was due to chance" | "If there were truly no effect, we'd see a result this extreme only about 3% of the time" |
| "We proved that the treatment works" | "The data provides strong evidence that the treatment has an effect" |
| "The result was highly significant (p = .001)" | "The evidence against no effect is very strong (p = .001), and the effect size was [small/medium/large]" |
| "The p-value was .06, so the treatment doesn't work" | "The evidence was suggestive but didn't reach the conventional threshold for statistical significance (p = .06). The estimated effect was [size], which may warrant further investigation with a larger sample" |
| "p < .05 so it's real" | "The result is statistically significant, meaning it's unlikely to be just noise. Whether it's large enough to matter in practice depends on [context]" |

Test yourself (retrieval practice): A colleague tells a client: "Our A/B test showed p = 0.03, so there's a 97% chance the new design is better." What's wrong with this statement? How would you rephrase it?

Check your answer: The statement confuses the p-value with the probability that the alternative hypothesis is true. A p-value of 0.03 means: "If the two designs were truly identical, we'd see a difference this large only about 3% of the time." It does NOT mean there's a 97% chance the new design is better. A better phrasing: "Our A/B test found strong evidence that the new design performs differently (p = 0.03). The new design produced [X]% higher conversions, which represents a [small/medium/large] improvement."

25.9 Presenting Uncertainty Honestly

One of the most important — and most frequently skipped — aspects of data communication is presenting uncertainty. In a world of bold headlines and confident predictions, admitting "we're not entirely sure" feels uncomfortable. But honest uncertainty is what separates rigorous analysis from guesswork.

Showing Uncertainty in Visualizations

There are several ways to show uncertainty visually:

1. Error bars (confidence intervals on bar charts):

import matplotlib.pyplot as plt
import numpy as np

# ============================================================
# SHOWING UNCERTAINTY: ERROR BARS AND CONFIDENCE BANDS
# ============================================================

groups = ['Control', 'Treatment A', 'Treatment B', 'Treatment C']
means = [52, 58, 61, 57]
ci_lower = [48, 53, 55, 51]
ci_upper = [56, 63, 67, 63]
errors = [[m - lo for m, lo in zip(means, ci_lower)],
          [hi - m for m, hi in zip(means, ci_upper)]]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# --- Without uncertainty ---
ax1 = axes[0]
ax1.bar(groups, means, color='steelblue', edgecolor='none', width=0.5)
ax1.set_title('Treatment Comparison\n(Without Uncertainty)',
              fontsize=12, color='#333333')
ax1.set_ylabel('Mean Score', fontsize=11, color='#555555')
ax1.set_ylim(0, 75)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.grid(axis='y', alpha=0.2)

# --- With uncertainty (95% CIs) ---
ax2 = axes[1]
ax2.bar(groups, means, color='steelblue', edgecolor='none', width=0.5,
        yerr=errors, capsize=5, error_kw={'color': '#333333', 'linewidth': 1.5})
ax2.set_title('Treatment Comparison\n(With 95% Confidence Intervals)',
              fontsize=12, color='#333333')
ax2.set_ylabel('Mean Score', fontsize=11, color='#555555')
ax2.set_ylim(0, 75)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.grid(axis='y', alpha=0.2)

# Add annotation about overlap
ax2.annotate('CIs overlap: difference\nmay not be significant',
             xy=(2.5, 60), fontsize=9, color='#E74C3C',
             ha='center', style='italic')

for ax in axes:
    ax.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('uncertainty.png', dpi=150, bbox_inches='tight')
plt.show()

print("Left: Without CIs, Treatment B looks clearly best.")
print("Right: With CIs, the picture is much less clear —")
print("       Treatment B's interval overlaps with Treatment A and C.")
print("       Honest visualization prevents premature conclusions.")

2. Confidence bands on regression lines:

When you show a regression line, always include the confidence band (the shaded region showing the range of plausible regression lines). Seaborn's regplot() does this by default — don't turn it off.
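If you're not using seaborn, the band is straightforward to compute yourself from the standard error of the mean response. A minimal sketch on simulated data (all variable names are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60)
y = 2 + 0.8 * x + rng.normal(0, 2, 60)   # simulated data for illustration

# Fit simple linear regression by least squares
n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error

# 95% confidence band for the mean response across a grid of x values
grid = np.linspace(x.min(), x.max(), 100)
se_mean = s * np.sqrt(1 / n + (grid - x.mean()) ** 2 / sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
lower = (b0 + b1 * grid) - t_crit * se_mean
upper = (b0 + b1 * grid) + t_crit * se_mean

# The band is narrowest at the mean of x and widens toward the edges
print(f"Half-width at center: {t_crit * se_mean.min():.2f}; "
      f"at edges: {t_crit * se_mean.max():.2f}")
```

Plotting `lower` and `upper` with `ax.fill_between(grid, lower, upper, alpha=0.2)` reproduces the shaded band seaborn draws. Note the shape: the band flares out at the edges, which is itself an honest message — predictions far from the bulk of the data are less certain.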

3. Hedging language in text:

| Confidence Level | Appropriate Language |
|---|---|
| Very strong evidence (p < .001, large effect) | "The data clearly shows..." / "There is strong evidence that..." |
| Good evidence (p < .05, medium effect) | "The data suggests..." / "We found evidence that..." |
| Suggestive but inconclusive (p = .05–.10) | "There are hints that..." / "The trend suggests, but further data is needed..." |
| No evidence (p > .10) | "We found no evidence that..." (NOT "We proved there is no effect") |
| Small effect size | "While statistically significant, the practical impact is modest" |
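The table's p-value rows translate directly into a small lookup helper. This is a sketch — the function name is mine, and real writing should also weigh effect size and study quality, not just the p-value:

```python
def hedged_phrase(p_value):
    """Map a p-value to hedging language, following the table above (a sketch;
    effect size and context should also inform the final wording)."""
    if p_value < 0.001:
        return "The data clearly shows..."
    if p_value < 0.05:
        return "The data suggests..."
    if p_value <= 0.10:
        return "The trend suggests, but further data is needed..."
    return "We found no evidence that..."

print(hedged_phrase(0.012))   # → The data suggests...
print(hedged_phrase(0.23))    # → We found no evidence that...
```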

Key Insight: "We found no evidence that X causes Y" is very different from "We proved that X does not cause Y." The first is humble and accurate. The second claims certainty that your data cannot support. Statistics can reject claims; it almost never proves them.


25.10 Spaced Review 3: Effect Sizes — Always Include Them (from Ch. 17)

Chapter 17 taught us that statistical significance (the p-value) answers "is there an effect?" while effect sizes answer "how big is it?" In any report you write, you need both.

Here's why: with a large enough sample, any effect — no matter how tiny — will be statistically significant. A drug that lowers blood pressure by 0.1 mmHg might achieve p < .001 with 100,000 patients, but no doctor would prescribe it. The effect size tells you whether the finding matters.

Quick reference for common effect sizes:

| Measure | Small | Medium | Large | Used For |
|---|---|---|---|---|
| Cohen's d | 0.2 | 0.5 | 0.8 | Comparing two group means |
| $r$ (correlation) | 0.1 | 0.3 | 0.5 | Linear relationships |
| $R^2$ | 0.01 | 0.09 | 0.25 | Variance explained (regression) |
| $\eta^2$ | 0.01 | 0.06 | 0.14 | Variance explained (ANOVA) |
| Cramér's V | 0.10 | 0.30 | 0.50 | Categorical associations |
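The cutoffs in the table translate directly into a lookup helper. The function itself and the "negligible" label for values below the small cutoff are my additions for illustration:

```python
def label_effect(measure, value):
    """Map an effect-size value to the conventional small/medium/large label.
    Cutoffs follow the table above; 'negligible' (below the small cutoff)
    is an added convenience label."""
    thresholds = {
        'd':         (0.2, 0.5, 0.8),
        'r':         (0.1, 0.3, 0.5),
        'r2':        (0.01, 0.09, 0.25),
        'eta2':      (0.01, 0.06, 0.14),
        'cramers_v': (0.10, 0.30, 0.50),
    }
    small, medium, large = thresholds[measure]
    v = abs(value)
    if v < small:
        return 'negligible'
    if v < medium:
        return 'small'
    if v < large:
        return 'medium'
    return 'large'

print(label_effect('d', 0.32))    # → small  (the StreamVibe example)
print(label_effect('r2', 0.30))   # → large
```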

In your reports, always include:

  1. The p-value (is the effect real?)
  2. The effect size (how big?)
  3. The confidence interval (how precise is our estimate?)
  4. A plain-language interpretation of practical significance

Connection to Chapter 17: Chapter 17's threshold concept was the distinction between statistical and practical significance. In this chapter, we operationalize that distinction: every time you write a result, include both. No exceptions.

Test yourself (retrieval practice): You find that a new tutoring program improves test scores by 2 points (p = .003, d = 0.15). Write a one-sentence summary that honestly communicates both the statistical significance and the practical significance.

Check your answer: "The tutoring program produced a statistically significant improvement in test scores (p = .003), but the effect was small (d = 0.15, approximately 2 points on a 100-point scale) — real but unlikely to change letter grades for most students."

25.11 Structuring a Data Analysis Report

Whether you're writing a class assignment, a professional memo, a blog post, or a journal article, every data analysis report follows the same basic structure. The details vary, but the skeleton doesn't.

The Five Sections

1. Introduction (The "Why")

What question are you trying to answer? Why does it matter? What has been done before?

  • State the research question or business problem clearly
  • Provide context — why should the reader care?
  • Preview your approach (briefly)
  • End with a thesis or hypothesis

Template: "We investigated whether [variable X] is associated with [variable Y] in [population], using data from [source]. This question matters because [context]. Based on prior research / domain knowledge, we hypothesized that [hypothesis]."

2. Methods (The "How")

How did you collect or obtain the data? What analyses did you run? Why those analyses?

  • Describe the data source, sample size, and key variables
  • Explain your analysis methods and why you chose them
  • Note any data cleaning or transformations
  • State the significance level (usually $\alpha = 0.05$)
  • Include enough detail that someone could replicate your analysis

Template: "We used data from [source], which contains [n] observations and [k] variables collected between [dates]. The primary outcome variable was [Y], measured as [description]. The primary predictor was [X], measured as [description]. We conducted [analysis type] to test whether [X] was associated with [Y], controlling for [covariates if applicable]. Data cleaning included [brief description]. All analyses were conducted in Python [version] using [libraries]."

3. Results (The "What")

What did you find? Present results clearly and completely.

  • Lead with the most important finding
  • Include both statistical significance AND effect size
  • Present confidence intervals
  • Use visualizations to support (not replace) the text
  • Report exact p-values (p = .017, not just p < .05)
  • Include sample sizes

Template: "[Variable X] was significantly [positively/negatively] associated with [variable Y] (test statistic = [value], p = [value], 95% CI: [range], effect size = [value]). Figure [n] shows [description of the visualization]."

4. Discussion (The "So What")

What do the results mean? How do they connect to the bigger picture?

  • Interpret findings in context
  • Compare to prior research or expectations
  • Discuss practical significance (not just statistical significance)
  • Address alternative explanations
  • Discuss what you can't conclude (especially causation from observational data)

5. Limitations (The "But")

What could be wrong? What should the reader be cautious about?

  • Sampling limitations (who was included/excluded?)
  • Measurement limitations (how accurate are the variables?)
  • Confounding (what variables weren't controlled?)
  • Generalizability (does this apply to other populations?)
  • Statistical limitations (multiple comparisons, low power, etc.)

Key Insight: Including a strong limitations section does not weaken your report — it strengthens it. It shows you understand your own analysis deeply enough to know where it could go wrong. Reviewers, editors, and executives trust analysts who acknowledge limitations more than those who pretend their analysis is bulletproof.

The Executive Summary

For business and policy reports, add an executive summary at the very beginning — a half-page (or less) summary of the entire report. Assume the executive will read only this section.

The executive summary should answer four questions:

  1. What did we study? (One sentence)
  2. What did we find? (One or two sentences)
  3. Why does it matter? (One sentence)
  4. What should we do? (One sentence — the recommendation)

Example (Alex): "We tested whether StreamVibe's new recommendation algorithm increases watch time compared to the existing algorithm. Users in the new-algorithm group watched an average of 4.5 more minutes per session (p = .012, 95% CI: 1.0 to 8.0 minutes). At 12 million daily active users, this translates to approximately 54 million additional engagement minutes per day. We recommend a full rollout of the new algorithm, with continued A/B testing to monitor long-term retention effects."


25.12 Writing for Different Audiences: The Same Result, Two Ways

Let's practice the most important skill in this chapter: translating results for different audiences. We'll take one analysis — Sam's comparison of Daria's shooting before and after a new training regimen — and write it two ways.

The Data

Daria's three-point shooting percentage before the training regimen was 31% (historical average). After the training regimen, she made 25 out of 65 three-point attempts (38.5%).

From earlier chapters, we know:

  • One-proportion z-test: z = 1.30, p = .097
  • 95% CI for her true shooting percentage: (26.7%, 50.3%)
  • Effect size (Cohen's h): 0.166 (small)
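As a sanity check before writing either version, the test and interval are quick to reproduce (this sketch uses a Wald interval for the CI; the printed values match the chapter up to rounding):

```python
import numpy as np
from scipy import stats

made, attempts, p0 = 25, 65, 0.31
p_hat = made / attempts                               # 0.385

# One-proportion z-test, one-sided (H_a: p > 0.31)
se_null = np.sqrt(p0 * (1 - p0) / attempts)
z = (p_hat - p0) / se_null
p_value = 1 - stats.norm.cdf(z)

# 95% Wald confidence interval for the true shooting percentage
se_hat = np.sqrt(p_hat * (1 - p_hat) / attempts)
lo, hi = p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat

print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")    # z = 1.30, p = 0.097
print(f"95% CI: ({lo:.1%}, {hi:.1%})")
```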

Version 1: For the Sports Analytics Conference Paper

"A one-proportion z-test was conducted to determine whether Daria Williams's three-point shooting percentage improved following the implementation of a targeted training protocol. The null hypothesis was $H_0: p = 0.31$, with a one-sided alternative $H_a: p > 0.31$. Williams converted 25 of 65 three-point attempts ($\hat{p} = 0.385$) during the post-training observation period. The test yielded $z = 1.30$, $p = .097$, failing to reach conventional significance at $\alpha = .05$. The 95% confidence interval for the true post-training percentage was (0.267, 0.503), which includes the historical baseline of 0.31. Cohen's $h = 0.166$ indicates a small effect. While the point estimate suggests improvement, the wide confidence interval and insufficient power (estimated at 24% for this sample size) preclude a definitive conclusion. A larger observation window of approximately 240 attempts would be needed to achieve 80% power to detect an effect of this magnitude."

Version 2: For the Riverside Raptors Coaching Staff

"Here's what we know so far: since starting the new training routine, Daria has been shooting 38.5% from three — up from her career average of 31%. That's encouraging, but we need to be honest: with only 65 shots in the data, we can't be confident this improvement is real and not just a hot streak.

Statistically, the improvement is suggestive but not conclusive. Her true shooting percentage could be anywhere from 27% to 50% — that's a wide range. To be more certain, we'd need to track about 240 more three-point attempts, roughly 30 to 35 more games.

My recommendation: keep the training routine. The data is trending in the right direction, and there's no downside to continuing. But let's not redesign the offense around an improvement that isn't confirmed yet. I'll update you when we have enough data to draw a firm conclusion."

Notice the differences:

  • Version 1 includes test statistics, p-values, and formulas
  • Version 2 uses no jargon — "hot streak" instead of "sampling variability"
  • Version 1 reports Cohen's h; Version 2 translates it to "suggestive but not conclusive"
  • Version 2 includes a recommendation; Version 1 does not (that's not expected in an academic paper)
  • Both are honest about uncertainty


25.13 Ethical Analysis: When Data Visualization Becomes Manipulation

Ethical Analysis Block

Every design choice in data visualization is also an ethical choice. Truncating an axis, choosing a color scheme, selecting a time window, writing a title — each decision shapes how the viewer perceives reality. At some point, "designing for impact" becomes "designing to deceive."

Where is the line?

Consider these scenarios:

  1. A pharmaceutical company presents a bar chart showing their drug's effectiveness. The y-axis starts at 90% instead of 0%, making a 3% improvement look like the bars triple. The footnote says "(axis does not start at zero)." Is this honest?

  2. A politician's campaign shows a graph of unemployment during their opponent's term. The graph starts the month unemployment peaked and ends the month it was highest. The data is accurate, but the time window was selected to maximize the appearance of a bad trend. Is this honest?

  3. A data journalist creates an interactive map of crime rates. She uses red to represent high-crime areas, which happen to correlate with neighborhoods that are predominantly Black and Latino. The map is factually accurate but could reinforce racial stereotypes. Is this responsible?

  4. A tech company's annual report shows user growth as a line chart with a steep upward slope. They don't mention that the y-axis is logarithmic — on a linear scale, growth has actually flattened. Is this misleading?

Discussion questions:

  • Is there a difference between technically accurate and honestly presented?
  • Who bears responsibility when viewers misinterpret a visualization — the creator or the viewer?
  • Should data visualization be held to the same ethical standards as journalism? As advertising?
  • The American Statistical Association's Ethical Guidelines (2022) state that statisticians should "present their findings and interpretations honestly and objectively." What does "objectively" mean when every visualization requires subjective design choices?

The bottom line: You can't make a visualization without making choices, and every choice has the potential to mislead. The ethical data communicator doesn't avoid choices — that's impossible. Instead, they make choices that serve the reader's understanding rather than the presenter's agenda. When in doubt, ask: "Would I change this design if I wanted the reader to reach a different conclusion?" If yes, you're shaping the narrative. Make sure you're shaping it honestly.


25.14 Our Characters in Action

Let's see how our four anchor characters handle the communication challenges of this chapter.

Maya: Writing a Public Health Brief for City Council

Maya has completed her analysis of ER visit rates and poverty across 25 communities (from Chapter 22). Now she needs to present her findings to the city council — elected officials who may have taken one statistics course decades ago.

Her challenge: the regression shows a strong correlation between poverty and ER visits ($r = 0.96$, $R^2 = 0.92$). But as she discovered in Chapter 22, the relationship is partly driven by lurking variables: uninsured rates and primary care physician availability. If the council sees "poverty causes ER overcrowding" and responds by funding anti-poverty programs, they might miss the faster, more direct interventions — expanding Medicaid enrollment and recruiting primary care physicians.

Maya's approach:

  1. Executive summary: Lead with the finding and the recommendation, not the methodology
  2. One key visualization: A scatterplot of poverty rate vs. ER visits with annotations showing which communities have high/low physician access
  3. Plain language: "Poverty is correlated with ER overcrowding, but expanding insurance access and primary care would likely reduce ER visits more quickly than general anti-poverty programs"
  4. Honest uncertainty: "These are associations, not proof of causation. A pilot program in 3-4 communities would help us test these relationships before committing to a county-wide policy"

Alex: Creating a Dashboard for StreamVibe Executives

Alex needs to present the A/B test results (from Chapter 16) in a format that StreamVibe executives — people who make decisions about millions of users — will actually use. An executive dashboard isn't a research paper. It's a decision-support tool.

Alex's approach:

  1. One-page dashboard with three key metrics: engagement lift, retention impact, revenue projection
  2. Green/yellow/red status indicators (not p-values): green = statistically significant and practically meaningful; yellow = significant but small effect; red = no significant difference
  3. Confidence ranges, not point estimates: "Expected revenue impact: $2.1M to $6.8M annually (95% CI)" — because executives understand ranges
  4. Action-oriented title: "New Algorithm: Recommend Full Rollout" — the dashboard recommends, backed by data
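A traffic-light rule like Alex's can be sketched in a few lines. The cutoffs here (α = .05 and Cohen's d ≥ 0.5 for "practically meaningful") are illustrative assumptions, not a standard or anything StreamVibe actually uses:

```python
def dashboard_status(p_value, effect_size, alpha=0.05, practical_cutoff=0.5):
    """Hypothetical traffic-light rule for an executive dashboard.
    The cutoffs (alpha and the practical-effect threshold) are
    illustrative assumptions."""
    if p_value >= alpha:
        return "red"      # no significant difference
    if abs(effect_size) < practical_cutoff:
        return "yellow"   # significant but small effect
    return "green"        # significant and practically meaningful

# The A/B test from earlier in the chapter: p = .012, d = 0.32
print(dashboard_status(0.012, 0.32))   # → yellow
```

Making the rule explicit in code is itself a communication win: anyone who questions a status light can read exactly how it was assigned.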

Sam: Presenting Analytics Findings to the Coaching Staff

Sam is presenting Daria's shooting analysis to coaches who think in terms of wins, losses, and play-calling — not p-values and effect sizes.

Sam's approach:

  1. No statistical jargon: "The numbers are encouraging but not conclusive" instead of "we failed to reject the null hypothesis"
  2. Visual: A simple before/after bar chart with error bars, labeled "Before Training: 31%" and "After Training: 38.5% (but could be anywhere from 27% to 50%)"
  3. Actionable recommendation: "Continue the training program. We'll need about 240 more attempts — roughly 35 games — before we can be confident the improvement is real"
  4. The honest hedge: "I don't want to redesign the playbook based on 65 shots. Let's watch the data and decide after the All-Star break"

James: Communicating Algorithmic Bias Findings to Policymakers

James has the hardest communication challenge. His regression analysis (from Chapter 22) shows that the predictive policing algorithm's risk scores predict recidivism with $R^2 = 0.85$ overall — but only $R^2 = 0.73$ for Black defendants compared to $R^2 = 0.91$ for white defendants. At a risk score of 7, the actual recidivism rate for white defendants is 51%, but only 38% for Black defendants. The same score means very different things for different racial groups.

James needs to communicate this to policymakers who will use his findings to decide whether to continue, modify, or discontinue the algorithm.

James's approach:

  1. Lead with the equity finding, not the technical accuracy: "The algorithm works, but it doesn't work equally well for everyone"
  2. Use a comparison the audience understands: "Imagine a thermometer that reads 5 degrees too high for one group of patients and 5 degrees too low for another. The average might be accurate, but the individual readings aren't fair"
  3. Separate the statistical finding from the policy recommendation: "The data shows a disparity. Whether that disparity is acceptable is a policy decision, not a statistical one"
  4. Provide options, not ultimatums: "Option A: recalibrate the algorithm with race-specific thresholds. Option B: supplement scores with human review for scores near the threshold. Option C: discontinue algorithmic scoring and return to judge discretion. Each option has tradeoffs I can quantify"


25.15 Reproducibility: Why Your Analysis Should Be Replicable

Here's a scenario that happens far too often: a data analyst presents impressive findings to their organization. Decisions are made. Months later, someone asks, "Can you re-run that analysis with updated data?" The analyst opens their files and discovers... they can't reconstruct their own work. Which version of the dataset did they use? Why did they exclude those 47 rows? What was the random seed? Which library version produced that chart?

Reproducible analysis means that someone else — or your future self — can take your code, your data, and your documentation and arrive at the exact same results.

The Reproducibility Checklist

| Element | What to Do | Why |
|---|---|---|
| Raw data | Save the original, unmodified dataset | So you can start from scratch if needed |
| Cleaning log | Document every cleaning step (deletions, transformations, imputations) | So others understand (and can question) your choices |
| Code | Write all analysis in a script or notebook — no manual spreadsheet editing | So the analysis is deterministic and repeatable |
| Random seeds | Set np.random.seed() for any simulation or sampling | So bootstrap CIs, permutation tests, etc. produce identical results |
| Library versions | Record version numbers of all packages | So results don't change when libraries are updated |
| Comments | Explain why you made each analytical decision, not just what you did | So your reasoning is transparent |
# ============================================================
# REPRODUCIBILITY TEMPLATE: ANALYSIS SETUP
# ============================================================

# Environment documentation
import sys
import numpy as np
import pandas as pd
import matplotlib
import scipy
import statsmodels

print("=" * 60)
print("ANALYSIS ENVIRONMENT")
print("=" * 60)
print(f"Python version:      {sys.version}")
print(f"NumPy version:       {np.__version__}")
print(f"pandas version:      {pd.__version__}")
print(f"matplotlib version:  {matplotlib.__version__}")
print(f"scipy version:       {scipy.__version__}")
print(f"statsmodels version: {statsmodels.__version__}")
print(f"Analysis date:       2026-03-15")
print(f"Analyst:             [Your name]")
print(f"Random seed:         42")
print("=" * 60)

# Set random seed for reproducibility
np.random.seed(42)

# Data loading with explicit path and description
# data = pd.read_csv('path/to/data.csv')
# print(f"Rows: {len(data)}, Columns: {len(data.columns)}")
# print(f"Date range: {data['date'].min()} to {data['date'].max()}")

Key Insight: Reproducibility isn't just about scientific integrity — it's about professional self-preservation. Six months from now, when your boss asks you to re-run the analysis with Q4 data added, you'll either thank yourself for writing clean, documented code or curse yourself for doing the analysis through a series of unreproducible manual steps.
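To see why the random-seed row matters in practice, here's a minimal demonstration on hypothetical data: with the same seed, a bootstrap confidence interval is reproduced exactly, no matter when or where it's re-run.

```python
import numpy as np

def bootstrap_ci(data, seed, n_boot=2000):
    """95% bootstrap CI for the mean, with an explicit seed."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(data, size=len(data), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [2.5, 97.5])

data = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 5.9, 3.3, 4.8])  # hypothetical sample
ci_first = bootstrap_ci(data, seed=42)
ci_rerun = bootstrap_ci(data, seed=42)   # six months later, same seed
print(np.allclose(ci_first, ci_rerun))   # → True
```

Without the seed, every re-run would produce slightly different interval endpoints — harmless statistically, but confusing when your report says (3.9, 5.6) and the re-run says (3.8, 5.7).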

Connection to Chapter 7: In Chapter 7, we discussed cleaning logs and reproducibility as part of data wrangling. This chapter elevates reproducibility from a data cleaning habit to a full analysis principle. Your cleaning log (Ch. 7) becomes the data section of your methods (Ch. 25).


25.16 Threshold Concept Review: Decomposing Variability (from Ch. 20)

Spaced Review — Threshold Concept

In Chapter 20, you learned the threshold concept of decomposing variability: the total variation in your data can be split into explained variation (between groups) and unexplained variation (within groups). In Chapter 22, you saw the same idea applied to regression: $SS_{\text{Total}} = SS_{\text{Regression}} + SS_{\text{Residual}}$, with $R^2$ measuring the proportion explained.

Why does this matter for communication? Because when you present regression results, $R^2$ is one of the most intuitive numbers to communicate:

  • "$R^2 = 0.72$ means our model explains about 72% of the variation in [outcome]. The remaining 28% is due to other factors we didn't measure."

That sentence works for any audience. An executive understands "72% explained." A policymaker understands "28% due to other factors." A fellow statistician appreciates the precision.

Variability decomposition is also the natural way to present uncertainty: "Our model accounts for most of the pattern, but about [100 × (1 − R²)]% remains unexplained — and that's where surprises can hide."
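As a sketch of this translation step (the function name and the example outcome label are my own, not from the chapter's analysis), the plain-language sentence can be generated directly from a fitted model's $R^2$:

```python
def describe_r_squared(r_squared, outcome="the outcome"):
    """Turn an R-squared value into the plain-language sentence above."""
    explained = round(100 * r_squared)   # percent of variation explained
    unexplained = 100 - explained        # percent left for other factors
    return (f"Our model explains about {explained}% of the variation in "
            f"{outcome}. The remaining {unexplained}% is due to other "
            f"factors we didn't measure.")

print(describe_r_squared(0.72, outcome="exam scores"))
```

A tiny helper like this keeps the wording consistent every time you report a model, which matters when a report contains several of them.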


25.17 Threshold Concept Review: Holding Other Variables Constant (from Ch. 23)

Spaced Review — Threshold Concept

Chapter 23 introduced the threshold concept of holding other variables constant — the idea that a regression coefficient tells you the effect of one predictor while controlling for all the others.

Why does this matter for communication? Because one of the most common misinterpretations of regression is ignoring the "all else equal" clause. When you write "for each additional year of education, income increases by $5,200," you must add "controlling for age, gender, and work experience" — otherwise, the reader will think education alone drives the difference.

Template for communicating multiple regression coefficients:

"After controlling for [list of other variables], each additional [unit of predictor] was associated with a [direction] change of [coefficient] [units] in [outcome] (p = [value], 95% CI: [range])."

The phrase "after controlling for" is your best friend in multiple regression communication. It's intuitive enough for non-technical audiences and precise enough for technical ones.
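To make the template concrete, here is a minimal sketch that fills it in code (the function name, the control variables, and all the numbers are illustrative placeholders, not results from a real analysis):

```python
def controlled_coefficient_sentence(controls, predictor_unit, coef,
                                    outcome_unit, outcome,
                                    p_value, ci_low, ci_high):
    """Fill in the 'after controlling for' template from this section."""
    direction = "an increase" if coef > 0 else "a decrease"
    return (f"After controlling for {', '.join(controls)}, each additional "
            f"{predictor_unit} was associated with {direction} of "
            f"{abs(coef):,.0f} {outcome_unit} in {outcome} "
            f"(p = {p_value:.3f}, 95% CI: {ci_low:,.0f} to {ci_high:,.0f}).")

# Hypothetical numbers, for illustration only
print(controlled_coefficient_sentence(
    ["age", "gender", "work experience"],
    "year of education", 5200, "dollars", "income",
    0.001, 3900, 6500))
```

Writing the sentence as a function forces you to supply every piece the template demands: the controls, the effect size, the p-value, and the interval.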


25.18 Python: Polishing Your Visualizations

Throughout this course, you've been creating charts that get the job done. Now let's make them professional-quality. This section covers the matplotlib and seaborn techniques that separate student work from presentation-ready output.

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import seaborn as sns

# ============================================================
# PROFESSIONAL CHART TEMPLATE
# A complete, polished visualization you can adapt
# ============================================================

# Set global style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.family': 'sans-serif',
    'font.size': 11,
    'axes.titlesize': 14,
    'axes.labelsize': 12,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'figure.dpi': 150,
    'savefig.dpi': 300,
    'savefig.bbox': 'tight'
})

# --- Example: Professional scatterplot with regression ---
np.random.seed(42)
x = np.random.normal(50, 15, 100)
y = 0.8 * x + np.random.normal(0, 10, 100) + 20

fig, ax = plt.subplots(figsize=(8, 6))

# Main scatter
ax.scatter(x, y, color='steelblue', alpha=0.6, edgecolors='navy',
           s=50, linewidth=0.5, label='Observed data')

# Regression line
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
x_line = np.linspace(x.min(), x.max(), 100)
y_line = intercept + slope * x_line
ax.plot(x_line, y_line, color='#E74C3C', linewidth=2,
        label=f'Regression line (R² = {r_value**2:.2f})')

# Approximate 95% prediction band (constant width, based on residual SD;
# a true prediction interval widens away from the mean of x)
y_pred = intercept + slope * x
residuals = y - y_pred
se_resid = np.std(residuals)
ax.fill_between(x_line, y_line - 1.96 * se_resid,
                y_line + 1.96 * se_resid,
                color='#E74C3C', alpha=0.1, label='95% prediction interval')

# Titles and labels
ax.set_title('Study Hours and Exam Scores\nPositive Association with'
             ' Meaningful Scatter',
             fontweight='normal', color='#333333', pad=15)
ax.set_xlabel('Study Hours per Week', color='#555555')
ax.set_ylabel('Exam Score', color='#555555')

# Clean styling
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.legend(frameon=True, framealpha=0.9, edgecolor='#CCCCCC',
          fontsize=10, loc='upper left')
ax.tick_params(colors='#555555')

# Source and note
ax.text(0.99, -0.12, 'Source: Simulated data for illustration',
        transform=ax.transAxes, fontsize=8, color='gray',
        ha='right', va='top')

plt.tight_layout()
plt.savefig('professional_scatter.png', dpi=300, bbox_inches='tight')
plt.show()

print(f"Regression: ŷ = {intercept:.1f} + {slope:.2f}x")
print(f"R² = {r_value**2:.3f}")
print(f"p-value = {p_value:.4f}")
print("\nProfessional touches applied:")
print("  1. Descriptive two-line title (finding, not just variables)")
print("  2. Clean spines (no top/right border)")
print("  3. Light gridlines that don't compete with data")
print("  4. Alpha transparency on points to show density")
print("  5. Confidence band showing uncertainty")
print("  6. Source citation at bottom")
print("  7. Legend with R² value")
print("  8. Muted color palette (steelblue + red accent)")

Key Polishing Techniques

| Technique | Code | Effect |
|---|---|---|
| Remove top/right spines | `ax.spines['top'].set_visible(False)` | Cleaner, less boxy appearance |
| Soften remaining spines | `ax.spines['left'].set_color('#CCCCCC')` | Less visual noise |
| Add transparency | `alpha=0.6` in `scatter()` | Shows overlapping points |
| Use descriptive titles | State the finding, not just the variables | Guides interpretation |
| Add source notes | `ax.text()` below the chart | Supports credibility |
| Set consistent DPI | `plt.rcParams['figure.dpi'] = 150` | Sharp output on all screens |
| Save at high resolution | `plt.savefig('file.png', dpi=300)` | Print-quality output |
| Use `tight_layout()` | `plt.tight_layout()` | Prevents label clipping |
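The spine-and-tick rows in this table recur in almost every chart, so it can help to wrap them in a small helper. This is a sketch: the function name `polish_axes` is my own, and the default colors match the template earlier in this section.

```python
import matplotlib.pyplot as plt

def polish_axes(ax, spine_color='#CCCCCC', tick_color='#555555'):
    """Apply the recurring polishing techniques from the table to one Axes."""
    ax.spines['top'].set_visible(False)        # remove top spine
    ax.spines['right'].set_visible(False)      # remove right spine
    ax.spines['left'].set_color(spine_color)   # soften remaining spines
    ax.spines['bottom'].set_color(spine_color)
    ax.tick_params(colors=tick_color)          # muted tick labels
    return ax

fig, ax = plt.subplots(figsize=(8, 6))
polish_axes(ax)
```

Calling `polish_axes(ax)` after each plot keeps styling consistent across a report without repeating the same five lines.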

Seaborn Shortcuts for Publication-Ready Charts

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ============================================================
# SEABORN: PUBLICATION-READY CHARTS
# ============================================================

np.random.seed(42)

# Sample data
df = pd.DataFrame({
    'group': np.repeat(['Control', 'Treatment'], 50),
    'score': np.concatenate([
        np.random.normal(70, 12, 50),
        np.random.normal(78, 11, 50)
    ])
})

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# 1. Box plot with individual points
ax1 = axes[0]
# hue= with legend=False keeps the palette without a warning (seaborn >= 0.13)
sns.boxplot(data=df, x='group', y='score', ax=ax1,
            hue='group', palette=['steelblue', '#E74C3C'], legend=False,
            fliersize=0, width=0.4)
sns.stripplot(data=df, x='group', y='score', ax=ax1,
              color='#333333', alpha=0.4, size=4, jitter=True)
ax1.set_title('Box Plot with Data Points', fontsize=12,
              color='#333333')
ax1.set_ylabel('Score', color='#555555')
ax1.set_xlabel('')

# 2. Violin plot
ax2 = axes[1]
sns.violinplot(data=df, x='group', y='score', ax=ax2,
               hue='group', palette=['steelblue', '#E74C3C'], legend=False,
               inner='quartile', cut=0)
ax2.set_title('Violin Plot\n(Shows Full Distribution)', fontsize=12,
              color='#333333')
ax2.set_ylabel('Score', color='#555555')
ax2.set_xlabel('')

# 3. Bar plot with CI
ax3 = axes[2]
sns.barplot(data=df, x='group', y='score', ax=ax3,
            hue='group', palette=['steelblue', '#E74C3C'], legend=False,
            capsize=0.1, err_kws={'linewidth': 1.5},
            errorbar=('ci', 95))  # errorbar= replaces the removed ci= argument
ax3.set_title('Bar Plot with 95% CI', fontsize=12,
              color='#333333')
ax3.set_ylabel('Score', color='#555555')
ax3.set_xlabel('')
ax3.set_ylim(0, 100)

for ax in axes:
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.tick_params(colors='#555555')

plt.tight_layout()
plt.savefig('seaborn_options.png', dpi=150, bbox_inches='tight')
plt.show()

print("Three ways to show the same comparison:")
print("  1. Box plot + strip: shows summary AND individual data")
print("  2. Violin: shows the full distribution shape")
print("  3. Bar + CI: shows mean and uncertainty (but hides distribution)")
print("\nTufte would prefer option 1 — it maximizes data display.")

25.19 Excel: Chart Formatting Best Practices

Not everyone uses Python. Excel is the most widely used data tool in the world, and its default chart settings leave, let's say, room for improvement. Here's how to make Excel charts that Tufte wouldn't cringe at.

Step-by-Step: Cleaning Up an Excel Chart

| Step | What to Do | How |
|---|---|---|
| 1 | Remove the legend (if only one data series) | Click legend → Delete |
| 2 | Remove gridlines (or make them lighter) | Click gridlines → Format → Color: light gray, weight: thin |
| 3 | Remove the chart border | Click chart area → Format → No border |
| 4 | Add a descriptive title | Click the title → Replace "Chart Title" with your finding |
| 5 | Label axes clearly | Add axis labels with units (e.g., "Revenue ($ millions)") |
| 6 | Start the y-axis at zero (for bar charts) | Right-click y-axis → Format Axis → Minimum = 0 |
| 7 | Use a single color (unless color encodes data) | Click all bars → Format → Fill: one muted color |
| 8 | Add data labels | Right-click bars → Add Data Labels |
| 9 | Remove 3D effects | Change chart type to 2D equivalent |
| 10 | Remove unnecessary elements | Delete text boxes, arrows, shapes that don't show data |

Excel Color Recommendations

| Purpose | Recommended Colors | Avoid |
|---|---|---|
| Single series | Steel blue (#4472C4) or dark gray (#595959) | Bright red, neon green, orange |
| Two groups | Steel blue + coral red (#E74C3C) | Red + green (colorblind issue) |
| Sequential data | Light-to-dark gradient in one hue | Rainbow (jet) colors |
| Categories | Office's built-in "Color Blind Safe" palette | Default rainbow |
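If you also produce charts in Python, the hex values recommended above can live in one place and be reused everywhere. A minimal sketch (the dictionary keys are my own labels):

```python
# Color recommendations from the table, as reusable constants
PALETTE = {
    'single': '#4472C4',   # steel blue: default for a single series
    'neutral': '#595959',  # dark gray alternative
    'accent': '#E74C3C',   # coral red: second group or highlight
}

# Example: colors for a two-group comparison chart
two_group_colors = [PALETTE['single'], PALETTE['accent']]
print(two_group_colors)
```

Centralizing colors this way keeps Excel and Python output visually consistent within one report.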

Quick Wins in Excel

  • Conditional formatting in tables can replace many charts entirely — a well-formatted data table with color scales is often clearer than a chart for small datasets
  • Sparklines (tiny charts inside cells) are excellent for showing trends alongside data
  • PivotChart + PivotTable combinations allow interactive exploration that static charts can't match

25.20 Presenting Statistical Findings: Oral Communication

At some point, you'll stand in front of an audience and present your data. Whether it's a classroom presentation, a team meeting, a board briefing, or a conference talk, oral presentation adds a layer of challenge that written reports don't have: your audience can't re-read a confusing sentence or flip back to check a definition.

The Rules of Statistical Presentations

| Rule | Why | How |
|---|---|---|
| Lead with the punchline | Your audience may stop listening at any moment | Slide 1: the finding and the recommendation |
| One idea per slide | Complex slides are ignored | Each slide answers one question |
| Minimize text | The audience reads or listens — they can't do both | Use visuals, not bullet points |
| Speak the uncertainty | Oral presentations tempt overconfidence | "We're fairly confident, but..." |
| Anticipate "So what?" | Every audience member is thinking it | Build toward the implication |
| Prepare for "How do you know?" | Someone will challenge your method | Have backup slides with methodology |
| Don't read p-values aloud | "p equals point zero one two" is meaningless to most audiences | "The evidence is strong" or "the difference is statistically significant" |

Slide Design for Data

| Element | Do | Don't |
|---|---|---|
| Charts | One chart per slide, annotated | Multiple charts crammed together |
| Titles | "Sales Increased 12% After Campaign" (finding) | "Sales Data" (label) |
| Animation | Build charts piece by piece to guide attention | Spinning transitions |
| Font size | Minimum 24pt for text, 18pt for labels | Anything below 16pt |
| Source | Small text at bottom of chart | No attribution |

25.21 Progressive Project: Draft Your Report

Progressive Project Checkpoint — Chapter 25

It's time to start turning your portfolio notebook into a report. You've been analyzing your chosen dataset for twenty-four chapters. Now draft three key sections.

Task: Draft the Introduction, Methods, and Results sections of your Data Detective Portfolio report.

Step 1: Introduction (aim for ~300 words)

Use this template:

```

Introduction

Research Question

[State your question clearly in one sentence]

Background

[Why does this question matter? Who cares about the answer? Provide 2-3 sentences of context.]

Dataset Overview

[What dataset are you using? How was it collected? How many observations and variables?]

Hypothesis

[Based on your exploratory analysis (Ch. 5-6), what do you expect to find? State it clearly.]
```

Step 2: Methods (aim for ~250 words)

Include:

  • Data source, sample size, date range
  • Key variables (response and explanatory)
  • Data cleaning summary (reference your Ch. 7 cleaning log)
  • Statistical methods used and why you chose them
  • Software and library versions

Step 3: Results (aim for ~400 words)

Include:

  • At least one polished visualization (applying Tufte's principles)
  • Descriptive statistics with interpretation
  • At least one formal inference result (CI, hypothesis test, or regression) with:
    ◦ Test statistic and p-value
    ◦ Effect size
    ◦ Confidence interval
    ◦ Plain-language interpretation

Step 4: Self-Assessment

After writing, check your draft against this rubric:

| Criterion | Yes/No |
|---|---|
| Is my research question clearly stated? | |
| Did I include both statistical significance AND effect size? | |
| Are my visualizations free of chartjunk? | |
| Would a non-statistician understand my results section? | |
| Did I include uncertainty (CIs, hedging language)? | |
| Is my analysis reproducible (random seeds, library versions)? | |
| Did I cite my data source? | |
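One way to keep this rubric actionable inside your notebook is to track it as a checklist in code. The criterion strings below paraphrase the rubric, and the True/False values are placeholders for your own honest answers:

```python
# Self-assessment rubric as a checklist (fill in the booleans honestly)
rubric = {
    "Research question clearly stated": True,
    "Significance AND effect size reported": True,
    "Visualizations free of chartjunk": True,
    "Results readable by a non-statistician": False,
    "Uncertainty included (CIs, hedging)": True,
    "Reproducible (seeds, versions)": True,
    "Data source cited": True,
}

for criterion, done in rubric.items():
    print(f"[{'x' if done else ' '}] {criterion}")
print(f"\n{sum(rubric.values())}/{len(rubric)} criteria met")
```

Any unchecked box tells you exactly which section of the draft to revise before submission.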

Deliverable: Add these three sections to your Jupyter notebook under a new heading: "Report Draft: Introduction, Methods, and Results."


25.22 Chapter Summary

What We Learned

This chapter was different from every other chapter in this book. Instead of learning a new statistical technique, you learned how to communicate all the techniques you already know. Here's the arc:

  1. Design principles matter. Tufte's data-ink ratio, chartjunk elimination, and small multiples give you a framework for creating visualizations that serve the data, not the designer.

  2. Misleading techniques are everywhere. Truncated axes, cherry-picked time windows, dual y-axes, 3D effects, and overloaded pie charts can make honest data tell dishonest stories. Now you can spot them and avoid creating them.

  3. Write for your audience. The same result needs different translations for different readers. Technical audiences want precision and reproducibility. Non-technical audiences want the bottom line and a recommendation. Good communicators can do both.

  4. Present uncertainty honestly. Error bars on charts, hedging language in text, and confidence intervals in reports are not signs of weakness — they're signs of integrity.

  5. Structure your reports. Introduction, Methods, Results, Discussion, Limitations. Every good report follows this skeleton, whether it's a two-page memo or a fifty-page thesis.

  6. Reproducibility is non-negotiable. Document your data, your code, your decisions, and your software versions. Your future self will thank you.

  7. Accessibility is design. Color-blind-friendly palettes, direct labels instead of legends, and patterns in addition to colors make your work accessible to everyone.

  8. Communication is the superpower. You can be the most technically skilled analyst in the room, but if you can't explain what you found, your analysis might as well not exist.

Connections to What's Next

In Chapter 26, you'll encounter statistics from the other side — as a consumer rather than a producer. You'll learn to critically evaluate claims made by AI systems, news articles, advertisements, and social media posts. The communication skills you built in this chapter will become detection skills: you'll be able to spot when someone else is using the misleading techniques we catalogued here.

What's Next: Chapter 26 asks the question that ties this entire course together: "Now that you know how statistics works, how do you evaluate the statistical claims that surround you every day?" You'll learn to be a critical consumer of data in the age of AI — spotting bad statistics, evaluating algorithmic claims, and asking the right questions when someone says "the data shows..."


"The goal is to turn data into information, and information into insight." — Carly Fiorina