Case Study 1: Visualizing Vaccination Trends — Elena's First Dashboard

Contributors to Introduction to Data Science

Tier 3 — Illustrative/Composite Example: This case study follows Elena, the public health analyst introduced in Chapter 1, as she builds her first set of matplotlib charts using vaccination data. Elena is a composite character, and the specific data values, chart designs, and discovery process are constructed for pedagogical purposes. However, the types of analyses she performs and the visualization challenges she encounters reflect genuine practices in public health data analysis.

The Setting

Elena has been working with the WHO vaccination dataset since Chapter 7. She's loaded it, cleaned it, reshaped it, and computed summary statistics. She has pages of numbers in her notebook — means and medians, group-by aggregations, counts and percentages. Her DataFrame is pristine.

But she has a problem: no one wants to read her DataFrame.

She's preparing for a meeting with her supervisor, Dr. Okafor, who oversees a regional public health initiative. Dr. Okafor doesn't want a spreadsheet. She wants a story. She wants to see — literally see — which regions are falling behind, whether things are getting better or worse, and where the resources should go next.

Elena opens a fresh Jupyter notebook cell, types import matplotlib.pyplot as plt, and stares at the blinking cursor. She has the data. She has the chart plans she sketched in Chapter 14. Now she needs to turn those plans into actual figures.

This is her first real dashboard: a set of four charts that tell the story of global vaccination coverage.

Chart 1: The Regional Comparison

Elena's first chart plan reads:

Question:     How do vaccination rates compare across WHO regions?
Chart type:   Bar chart
x-axis:       WHO region
y-axis:       Mean vaccination rate (%)
Audience:     Dr. Okafor — explanatory

She starts simple:

import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "SE Asia",
           "Europe", "E Med", "W Pacific"]
rates = [48, 79, 82, 91, 73, 88]

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color="steelblue")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()

It works. Six bars, clearly different heights, Africa visibly lower than the rest. But it's not ready for Dr. Okafor. Elena remembers the Chapter 14 lesson: explanatory charts need a finding in the title, not just a topic. She also wants to highlight Africa, add a reference line at the global average, and clean up the design.

fig, ax = plt.subplots(figsize=(9, 5))

colors = ["darkorange" if r == "Africa" else "steelblue"
          for r in regions]
ax.bar(regions, rates, color=colors)

global_avg = sum(rates) / len(rates)
ax.axhline(y=global_avg, color="gray", linestyle="--",
           alpha=0.6, label=f"Global Avg ({global_avg:.0f}%)")

ax.set_title("Africa's Vaccination Rate Trails the Global Average by 29 Points",
             fontsize=13, fontweight="bold")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()

The difference is dramatic. The orange bar draws the eye immediately. The reference line provides context. The title states the finding. This is a chart that tells a story.

What Elena learned: A default bar chart is fine for exploration, but for communication, the details matter — highlighting, reference lines, and descriptive titles transform data into a narrative.
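A title that states a number is only as good as the arithmetic behind it. The short sketch below recomputes the gap from the same illustrative `regions` and `rates` lists used for Chart 1 and builds the title string from the result, so the headline cannot drift away from the data. The variable names `africa_rate`, `gap`, and `title` are introduced here for illustration. Note that `global_avg` is an unweighted mean of the six regional values, not a population-weighted global rate.

```python
regions = ["Africa", "Americas", "SE Asia",
           "Europe", "E Med", "W Pacific"]
rates = [48, 79, 82, 91, 73, 88]

# Unweighted mean of the six regional rates (no population weighting).
global_avg = sum(rates) / len(rates)

# Gap between the highlighted region and that mean, in percentage points.
africa_rate = rates[regions.index("Africa")]
gap = global_avg - africa_rate

# Build the finding-first title from the computed value.
title = ("Africa's Vaccination Rate Trails the Global Average "
         f"by {gap:.0f} Points")

print(f"{global_avg:.1f} {gap:.1f}")  # 76.8 28.8
print(title)
```

Passing `title` to `ax.set_title` instead of a hand-typed string is one way to keep the chart and the numbers in agreement if the data changes.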

Chart 2: The Trend Over Time

Elena's second chart plan:

Question:     How have vaccination rates changed from 2015 to 2023?
Chart type:   Line chart (multiple series)
Color:        One line per selected country
Audience:     Dr. Okafor — explanatory

She chooses three countries that represent different trajectories — a success story, a decline and recovery, and a persistent struggle:

years = list(range(2015, 2024))
rwanda = [93, 95, 97, 97, 98, 93, 96, 98, 99]
brazil = [84, 79, 76, 72, 73, 68, 70, 77, 80]
chad = [36, 38, 40, 39, 41, 35, 37, 42, 45]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, rwanda, label="Rwanda", color="seagreen",
        linewidth=2.5, marker="o", markersize=5)
ax.plot(years, brazil, label="Brazil", color="steelblue",
        linewidth=2.5, marker="s", markersize=5)
ax.plot(years, chad, label="Chad", color="tomato",
        linewidth=2.5, marker="^", markersize=5)

ax.axhline(y=80, color="gray", linestyle=":", alpha=0.4)
ax.text(2023.3, 80.5, "80% target", fontsize=9, color="gray")

ax.set_title("Rwanda Leads; Brazil Recovers; Chad Still Struggles\n"
             "Vaccination Rates 2015-2023", fontsize=13)
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.legend(frameon=False, fontsize=11)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(True, alpha=0.15)
fig.tight_layout()
plt.show()

Three lines, three stories. Rwanda's green line hugs the top — consistently above 90%, a vaccination success story. Brazil's blue line dips from 2015 to 2020 and then climbs back — a decline and recovery that raises questions about what happened. Chad's red line stays at the bottom — never above 45%, a persistent gap.

The 80% reference line adds context: it's a WHO target. Rwanda comfortably exceeds it. Brazil just reached it in 2023. Chad, at 45% in 2023, is only slightly more than halfway there.

Elena notices something interesting that she hadn't seen in her tables: Brazil's decline started before COVID-19, not during it. The 2020 pandemic gets blamed for many things, but Brazil's rate had already fallen from 84% in 2015 to 72% in 2018. This is a finding that deserves investigation, and she never would have noticed it without plotting the data.

What Elena learned: Line charts with multiple series reveal comparative trajectories that tables hide. The unexpected finding about Brazil's pre-pandemic decline illustrates why Tukey said visualization "forces us to notice what we never expected to see."
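Once the chart has raised the question, the timing of Brazil's decline can be checked numerically. The sketch below computes year-over-year changes from the same illustrative `years` and `brazil` lists, then finds the first year with a drop and the lowest point in the series. The names `changes`, `first_drop`, `low_rate`, and `low_year` are introduced here for illustration.

```python
years = list(range(2015, 2024))
brazil = [84, 79, 76, 72, 73, 68, 70, 77, 80]

# Year-over-year change in percentage points, keyed by the later year.
changes = {year: curr - prev
           for year, prev, curr in zip(years[1:], brazil, brazil[1:])}

# First year in which the rate was lower than the year before.
first_drop = next(year for year, delta in changes.items() if delta < 0)

# Lowest rate in the series and the year it occurred.
low_rate, low_year = min(zip(brazil, years))

print(changes)
print(first_drop, low_year, low_rate)  # 2016 2020 68
```

With these values, the rate falls in 2016, 2017, and 2018, ticks up by one point in 2019, and reaches its low of 68% in 2020, which is consistent with a decline that predates the pandemic.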

Chart 3: The Distribution

Elena's third chart plan:

Question:     What does the distribution of country-level vaccination rates look like?
Chart type:   Histogram
Audience:     Exploratory (for Elena's own understanding)

This one is for Elena herself — she wants to understand the shape of the data before making any claims about it.

import random
random.seed(42)
# Simulated country-level data (150 countries)
all_rates = [random.gauss(72, 18) for _ in range(150)]
all_rates = [max(5, min(99, r)) for r in all_rates]

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(all_rates, bins=20, color="steelblue",
        edgecolor="white", alpha=0.8)
ax.axvline(x=sum(all_rates)/len(all_rates), color="tomato",
           linestyle="--", label="Mean")
ax.set_xlabel("Vaccination Rate (%)")
ax.set_ylabel("Number of Countries")
ax.set_title("Distribution of Country-Level Vaccination Rates")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()

The histogram reveals something tables couldn't: the distribution is roughly bell-shaped but with a left tail — a cluster of countries with very low rates. The mean line sits near 72%, but the spread is wide. Some countries are at 95%+; others are below 30%.

Elena decides this chart isn't ready for Dr. Okafor — it's exploratory, and the simulated data doesn't have the real-world nuance she'd need. But it guides her next analysis: she wants to look more closely at the countries in that left tail. Who are they? What do they have in common?

What Elena learned: Histograms answer "what does the data look like?" in a way that means and medians cannot. The left tail — a cluster of low-performing countries — is the kind of structural feature that summary statistics obscure.
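A natural follow-up to the histogram is to put numbers on the tail. The sketch below compares the mean and median and counts how many values fall under a cutoff, using the same simulated data as Chart 3. The helper name `tail_summary` and the 50% cutoff are assumptions made for illustration, not thresholds taken from the dataset.

```python
import random
import statistics


def tail_summary(rates, threshold=50):
    """Return the mean, median, and count/share of values below `threshold`."""
    below = [r for r in rates if r < threshold]
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "n_below": len(below),
        "share_below": len(below) / len(rates),
    }


# Same simulated country-level data as Chart 3 (150 values, clamped to 5-99).
random.seed(42)
all_rates = [max(5, min(99, random.gauss(72, 18))) for _ in range(150)]

summary = tail_summary(all_rates, threshold=50)  # 50 is an arbitrary cutoff
print(summary)
```

A mean that sits below the median, together with a nonzero `n_below`, would point to the same left tail the histogram suggests. With real data, the list of countries under the cutoff is the starting point for Elena's next question.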

Chart 4: The Small Multiples

Elena's most ambitious chart — and the one she's most nervous about — is a set of small multiples showing the distribution of vaccination rates by income group:

import random
random.seed(42)

income_groups = {
    "Low Income": [random.gauss(45, 12) for _ in range(35)],
    "Lower Middle": [random.gauss(65, 10) for _ in range(40)],
    "Upper Middle": [random.gauss(80, 8) for _ in range(40)],
    "High Income": [random.gauss(92, 5) for _ in range(35)],
}

# Clamp all values to 5-99 range
for key in income_groups:
    income_groups[key] = [max(5, min(99, v))
                          for v in income_groups[key]]

fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)

for ax, (group, data) in zip(axes, income_groups.items()):
    ax.hist(data, bins=12, color="steelblue",
            edgecolor="white", range=(0, 100))
    ax.set_title(group, fontsize=11, fontweight="bold")
    ax.set_xlabel("Rate (%)")
    ax.axvline(x=sum(data)/len(data), color="tomato",
               linestyle="--", alpha=0.7)

axes[0].set_ylabel("Number of Countries")
fig.suptitle("Vaccination Rate Distributions by Income Group:\n"
             "The Rich-Poor Divide Is Clear", fontsize=14)
fig.tight_layout()
plt.show()

Four panels, shared y-axis, shared x-axis range (0-100). The pattern is stark: the Low Income histogram is centered around 45% with wide spread. The High Income histogram is packed tightly around 92%. The middle groups fall in between, neatly ordered.

This is the chart that Elena knows will matter most in her meeting. It doesn't just show that rich countries have higher vaccination rates; it shows that the Low Income and High Income distributions barely overlap. The best-performing low-income countries don't even reach the average of high-income countries. That's not a gap; it's a chasm.

What Elena learned: Small multiples with shared axes are one of the most powerful comparison tools in data visualization. The faceted histograms reveal distributional differences that a simple bar chart of averages would flatten.
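The claim that two panels barely overlap can be checked directly rather than judged by eye. The sketch below compares the best value in one group with the mean of another and reports what share of the second group sits at or below that best value. The helper `compare_groups` and the `specs` dictionary are names introduced here for illustration. The simulated data is generated in the same order and with the same seed as in Chart 4.

```python
import random
import statistics


def compare_groups(low, high):
    """Return (max of `low`, mean of `high`, share of `high` <= max of `low`)."""
    best_low = max(low)
    mean_high = statistics.mean(high)
    overlap = sum(v <= best_low for v in high) / len(high)
    return best_low, mean_high, overlap


# (mean, standard deviation, number of countries) for each simulated group.
specs = {
    "Low Income": (45, 12, 35),
    "Lower Middle": (65, 10, 40),
    "Upper Middle": (80, 8, 40),
    "High Income": (92, 5, 35),
}

random.seed(42)
income_groups = {
    name: [max(5, min(99, random.gauss(mu, sigma))) for _ in range(n)]
    for name, (mu, sigma, n) in specs.items()
}

best_low, mean_high, overlap = compare_groups(
    income_groups["Low Income"], income_groups["High Income"])
print(f"best low-income rate: {best_low:.1f}")
print(f"high-income mean:     {mean_high:.1f}")
print(f"share of high-income values at or below that: {overlap:.0%}")
```

Running the same comparison on adjacent groups, such as Low Income and Lower Middle, would be expected to show considerably more overlap, since their simulated means are only 20 points apart.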

The Meeting

Elena saves all four charts as PNG files at 300 DPI by calling savefig on each figure. For the small multiples chart, the call is:

fig.savefig("income_distributions.png", dpi=300, bbox_inches="tight")
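Because each chart in this case study reuses the name `fig`, a single `savefig` call writes only the most recently created figure. One way to save all four at once is to keep each figure under its own name and loop over them. The sketch below assumes that approach. The helper `save_figures` and the figure names in the usage note are hypothetical and do not appear in Elena's notebook.

```python
from pathlib import Path


def save_figures(figures, out_dir=".", dpi=300):
    """Save each {name: Figure} entry as <out_dir>/<name>.png; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = out_dir / f"{name}.png"
        # Same settings as the single call above: 300 DPI, trimmed whitespace.
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths
```

If the four figures were stored as, say, `fig_bar`, `fig_lines`, `fig_hist`, and `fig_income` (assumed names), the call would look like `save_figures({"regional_comparison": fig_bar, "trends": fig_lines, "distribution": fig_hist, "income_distributions": fig_income})`.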

In the meeting, Dr. Okafor spends about five seconds on the first chart (the regional bar chart) — she already knew Africa was behind. She spends thirty seconds on the line chart — the Brazil finding is new to her, and she asks Elena to investigate the pre-pandemic decline further.

But the small multiples chart stops her. She studies it for a full minute, then says: "Can you add Rwanda to the Low Income panel? I want to show that being low-income doesn't have to mean low vaccination. Rwanda is proof."

Elena makes a note. She'll add an annotation in the next version. This is the cycle of data visualization: build, present, get feedback, refine.
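One possible shape for that next version is sketched below, using matplotlib's `ax.annotate` to point at a single country's rate on a histogram panel. This is a sketch under assumptions rather than Elena's actual revision: the helper `mark_country` is hypothetical, the 99% figure is Rwanda's illustrative 2023 value from Chart 2, and the label height `text_y=3` is a placeholder that would need adjusting to the real bar heights.

```python
import random

import matplotlib.pyplot as plt


def mark_country(ax, name, rate, text_y):
    """Label one country's rate on a histogram panel with an arrow to the x-axis."""
    return ax.annotate(
        f"{name} ({rate}%)",
        xy=(rate, 0),            # arrow tip: the country's rate on the x-axis
        xytext=(rate, text_y),   # label position, in data coordinates
        ha="right", fontsize=9, color="seagreen",
        arrowprops=dict(arrowstyle="->", color="seagreen"),
    )


# Simulated Low Income panel, generated as in Chart 4 (first 35 draws).
random.seed(42)
low_income = [max(5, min(99, random.gauss(45, 12))) for _ in range(35)]

fig, ax = plt.subplots(figsize=(5, 4))
ax.hist(low_income, bins=12, color="steelblue",
        edgecolor="white", range=(0, 100))
ax.set_title("Low Income", fontsize=11, fontweight="bold")
ax.set_xlabel("Rate (%)")
ax.set_ylabel("Number of Countries")

# Rwanda's illustrative 2023 rate from Chart 2; text_y is a placeholder height.
note = mark_country(ax, "Rwanda", 99, text_y=3)
fig.tight_layout()
```

In the four-panel figure, the same helper could be called on `axes[0]`. Whether an arrow, a vertical line, or a differently colored bar works best is a design choice that depends on how crowded the panel is.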

Technical Takeaways

Start with a chart plan. Elena's plans from Chapter 14 guided every chart. She never sat in front of matplotlib wondering "what should I plot?"
Build iteratively. Each chart started as a bare-bones default and was progressively refined. Don't try to write the final version on the first attempt.
Exploratory and explanatory are different. The histogram (Chart 3) was for Elena's own understanding, so she didn't polish it. The bar chart, line chart, and small multiples were for Dr. Okafor, so they got finding-first titles, reference lines, and clean design.
Small multiples are powerful. The faceted histograms (Chart 4) communicated more than any single chart could. Shared axes were essential for honest comparison.
Visualization reveals what tables hide. Brazil's pre-pandemic decline, the left-tail cluster of low-performing countries, and the barely overlapping low-income and high-income distributions were not visible in summary statistics alone.
Save at the right resolution. 300 DPI for print, bbox_inches="tight" to trim whitespace. These small details determine whether your chart looks professional or amateurish in a report.

Discussion Questions

Elena chose three specific countries for her line chart (Rwanda, Brazil, Chad). How might the story change if she chose different countries? Is there a risk of cherry-picking? How would you defend her choices?
The income group histogram has four categories. What would happen if Elena used a single overlapping histogram instead of small multiples? When are overlapping histograms acceptable, and when do they become unreadable?
Dr. Okafor's request to add Rwanda to the Low Income panel is interesting — it would highlight an exception to the overall pattern. How would you implement this annotation in matplotlib? What would it add to the chart's message?

End of Case Study 1