Case Study 2: Tracking the Pandemic Timeline — Vaccination Rollout Analysis
Tier 3 — Illustrative/Composite Example: This case study uses simulated vaccination data loosely inspired by the structure of datasets published by Our World in Data and national health ministries during the COVID-19 pandemic. All country-specific numbers, dates, and vaccination counts are fictional and simplified for pedagogical purposes. The analytical techniques — rolling averages, resampling, cross-country comparison, and milestone tracking — reflect methods widely used in public health data reporting during 2020-2023.
The Setting
Elena, our public health analyst, has been tasked with a critical project: analyze the COVID-19 vaccination rollout across three countries to answer questions that policymakers need answered before their next funding meeting.
The questions are specific and time-sensitive:
- How quickly did each country ramp up its vaccination program?
- When did each country administer enough doses to cover 50% of its population?
- Did the vaccination pace slow down over time, and if so, when?
- How do 7-day rolling averages compare across countries?
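Elena has daily vaccination data for three countries covering 230 days (January 15 to September 1, 2023), a little under eight months. Each row records the country, the date, the number of new doses administered that day, and the country's population.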
The Data
import pandas as pd
import numpy as np
df = pd.read_csv("vaccination_rollout.csv")
print(df.head(10))
print(f"\nShape: {df.shape}")
print(f"Countries: {df['country'].unique()}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
   country        date  daily_doses  population
0  Aldoria  2023-01-15         3200    12000000
1  Aldoria  2023-01-16         4100    12000000
2  Aldoria  2023-01-17         3800    12000000
3  Aldoria  2023-01-18         5200    12000000
4  Aldoria  2023-01-19         4900    12000000
5  Aldoria  2023-01-20         2100    12000000
6  Aldoria  2023-01-21         1800    12000000
7  Aldoria  2023-01-22         5500    12000000
8  Aldoria  2023-01-23         5200    12000000
9  Aldoria  2023-01-24         4800    12000000

Shape: (690, 4)
Countries: ['Aldoria' 'Brevara' 'Caliston']
Date range: 2023-01-15 to 2023-09-01
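The file vaccination_rollout.csv is not distributed with this case study, so readers who want to run the steps may need a stand-in. The following is a minimal sketch of a synthetic generator. The function name make_rollout and every parameter in it (peak sizes, peak days, the 35% weekend dip, the noise level) are invented for illustration, so it will not reproduce the specific numbers shown in this case study; it only matches the column layout, date range, and populations.

```python
import numpy as np
import pandas as pd

def make_rollout(seed: int = 0) -> pd.DataFrame:
    """Build a synthetic stand-in for vaccination_rollout.csv (hypothetical values)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2023-01-15", "2023-09-01", freq="D")
    t = np.arange(len(dates))
    # (population, peak daily doses, day index of the peak): peaks are invented
    params = {
        "Aldoria": (12_000_000, 28_000, 150),
        "Brevara": (9_000_000, 38_000, 120),
        "Caliston": (12_000_000, 16_000, 150),
    }
    frames = []
    for country, (population, peak, peak_day) in params.items():
        # Bell-shaped ramp-up and decline around the peak day
        trend = peak * np.exp(-((t - peak_day) / 80.0) ** 2)
        # Fewer doses on Saturdays and Sundays
        weekend = np.where(dates.dayofweek >= 5, 0.65, 1.0)
        noise = rng.normal(1.0, 0.05, size=len(dates))
        doses = np.clip(trend * weekend * noise, 0, None).round().astype(int)
        frames.append(pd.DataFrame({
            "country": country,
            "date": dates.strftime("%Y-%m-%d"),  # strings, as read_csv would give
            "daily_doses": doses,
            "population": population,
        }))
    return pd.concat(frames, ignore_index=True)

demo = make_rollout()
print(demo.shape)  # (690, 4): 3 countries x 230 days
```

Assigning `df = make_rollout()` in place of the `pd.read_csv` call lets the remaining steps run end to end on the synthetic data.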
Step 1: Parse and Validate Dates
Elena's first move is always the same: parse the dates and verify them.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype) # datetime64[ns]
# Check for any gaps in the date sequence for each country
for country in df["country"].unique():
    country_data = df[df["country"] == country]
    expected = pd.date_range(
        country_data["date"].min(),
        country_data["date"].max(),
        freq="D")
    actual = country_data["date"]
    missing = expected.difference(actual)
    print(f"{country}: {len(actual)} days, "
          f"{len(missing)} gaps")
Aldoria: 230 days, 0 gaps
Brevara: 230 days, 0 gaps
Caliston: 230 days, 0 gaps
No missing dates. That's unusually clean — in real pandemic data, reporting gaps are common, especially on weekends and holidays. Elena notes this but proceeds.
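When gaps do exist, one common approach is to reindex onto the full daily range so that unreported days appear as explicit NaN rows instead of silently vanishing. The sketch below uses a small hypothetical series with one missing day; the values are illustrative only.

```python
import pandas as pd

# Hypothetical series with January 17 missing
s = pd.Series(
    [3200, 4100, 5200],
    index=pd.to_datetime(["2023-01-15", "2023-01-16", "2023-01-18"]),
    name="daily_doses",
)

full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
missing = full_range.difference(s.index)

# Reindexing inserts NaN for the unreported day
filled = s.reindex(full_range)

print(list(missing.strftime("%Y-%m-%d")))  # ['2023-01-17']
print(filled.isna().sum())                 # 1
```

Whether to leave the NaN, interpolate, or spread a later catch-up report across the gap is an analytical decision that depends on how the source reports its data.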
Step 2: Compute Cumulative Doses and Per-Capita Rates
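To answer "when did each country reach 50%?", Elena needs cumulative dose counts. The dataset records doses rather than people, so doses per 100 residents serves as a proxy for coverage; it overstates the share of people vaccinated wherever individuals receive more than one dose: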
# Sort by country and date
df = df.sort_values(["country", "date"])
# Cumulative doses within each country
df["cumulative_doses"] = df.groupby("country")["daily_doses"].cumsum()
# Per-capita metrics
df["doses_per_100"] = (
    df["cumulative_doses"] / df["population"] * 100
).round(2)
# Check progress at the end
latest = df.groupby("country").last()
print(latest[["cumulative_doses", "doses_per_100"]])
          cumulative_doses  doses_per_100
country
Aldoria            4876200          40.64
Brevara            6543800          72.71
Caliston           3234500          26.95
After eight months, Brevara has administered 72.7 doses per 100 people, Aldoria 40.6, and Caliston just 27.0. The gap is substantial.
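The grouping in the cumulative sum matters. A small sketch with hypothetical numbers shows what goes wrong if cumsum() is applied to the whole column instead of within each country:

```python
import pandas as pd

# Hypothetical two-country frame, already sorted by country and date
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "daily_doses": [10, 20, 30, 5, 5, 5],
    "population": [200, 200, 200, 50, 50, 50],
})

# A plain cumsum runs straight across the country boundary
toy["global_cumsum"] = toy["daily_doses"].cumsum()
# The grouped version restarts for each country
toy["cumulative_doses"] = toy.groupby("country")["daily_doses"].cumsum()
toy["doses_per_100"] = toy["cumulative_doses"] / toy["population"] * 100

print(toy[["country", "global_cumsum", "cumulative_doses", "doses_per_100"]])
```

In the ungrouped column, country B appears to start at 65 doses because it inherits A's total; the grouped column correctly starts B at 5.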
Step 3: The 7-Day Rolling Average
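This is the metric that defined pandemic reporting. News outlets routinely reported the "7-day rolling average" of cases, deaths, and vaccinations, because a 7-day window always contains exactly one of each weekday and so cancels out the weekly reporting cycle. Elena computes it for each country, using min_periods=1 so that the first six days show an average over the days available so far: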
df["rolling_7d"] = (df
    .groupby("country")["daily_doses"]
    .transform(lambda x: x.rolling(7, min_periods=1).mean())
    .round(0))
# Compare the 7-day average at different points in time
checkpoints = ["2023-02-15", "2023-04-15",
               "2023-06-15", "2023-08-15"]
for date_str in checkpoints:
    date = pd.Timestamp(date_str)
    snapshot = df[df["date"] == date][
        ["country", "daily_doses", "rolling_7d"]]
    print(f"\n--- {date.strftime('%B %d, %Y')} ---")
    print(snapshot.to_string(index=False))
--- February 15, 2023 ---
 country  daily_doses  rolling_7d
 Aldoria         8200      7543.0
 Brevara        15200     14286.0
Caliston         5100      4671.0

--- April 15, 2023 ---
 country  daily_doses  rolling_7d
 Aldoria        24500     22857.0
 Brevara        38200     36429.0
Caliston        14800     13571.0

--- June 15, 2023 ---
 country  daily_doses  rolling_7d
 Aldoria        28100     26714.0
 Brevara        32500     31143.0
Caliston        16200     15286.0

--- August 15, 2023 ---
 country  daily_doses  rolling_7d
 Aldoria        18500     17571.0
 Brevara        21200     20143.0
Caliston        12100     11429.0
The rolling averages tell a clear story:
- All three countries ramped up rapidly from January to April
- Brevara peaked first and started declining by June
- Aldoria peaked around June and is now declining
- Caliston had the slowest start and the most modest peak
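To see why a 7-day window is the natural choice, consider a hypothetical series with a pure weekly cycle and no trend at all. The sketch below (values invented for illustration) shows that every full 7-day window averages to the same number, and it also shows what min_periods changes at the start of the series:

```python
import pandas as pd

dates = pd.date_range("2023-01-16", periods=21, freq="D")  # starts on a Monday
# Hypothetical pattern: 1,000 doses on weekdays, 300 on weekends
doses = pd.Series(
    [300 if d.dayofweek >= 5 else 1000 for d in dates], index=dates
)

rolling_partial = doses.rolling(7, min_periods=1).mean()
rolling_full = doses.rolling(7).mean()  # min_periods defaults to the window size

# Every full window holds 5 weekdays and 2 weekend days: (5000 + 600) / 7 = 800
print(rolling_full.dropna().round(1).unique())
# Without min_periods=1 the first six days are NaN
print(rolling_full.isna().sum())
# With min_periods=1 the first value is just that single day
print(rolling_partial.iloc[0])
```

The trade-off is that early values computed with min_periods=1 are averages over fewer than seven days, so they still carry day-of-week effects.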
Step 4: Finding the 50% Milestone
When did each country administer enough doses to cover 50% of its population?
for country in df["country"].unique():
    country_df = df[df["country"] == country]
    # Find first day where doses_per_100 >= 50
    milestone = country_df[country_df["doses_per_100"] >= 50]
    if len(milestone) > 0:
        first_day = milestone.iloc[0]
        days_to_milestone = (
            first_day["date"] - country_df["date"].min()
        ).days
        print(f"{country}: Reached 50% on "
              f"{first_day['date'].strftime('%B %d, %Y')} "
              f"({days_to_milestone} days after start)")
    else:
        # How far along are they?
        latest = country_df.iloc[-1]
        print(f"{country}: Has not reached 50% "
              f"(currently at {latest['doses_per_100']:.1f}%)")
Aldoria: Has not reached 50% (currently at 40.6%)
Brevara: Reached 50% on June 28, 2023 (164 days after start)
Caliston: Has not reached 50% (currently at 27.0%)
Only Brevara has crossed the 50% threshold. Aldoria is approaching it, and Caliston is far behind. This is the kind of concrete finding that drives policy decisions: Caliston may need more resources, supply chain support, or public outreach.
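The same milestone lookup can be written without an explicit loop. This is one possible alternative, not the approach used above, sketched on a hypothetical miniature frame in which only country A crosses the threshold:

```python
import pandas as pd

# Hypothetical miniature of the Step 2 output
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "doses_per_100": [30.0, 50.0, 65.0, 10.0, 20.0, 30.0],
})

start = toy.groupby("country")["date"].min()
# Earliest date at or above the threshold; countries that never reach it drop out
reached = toy[toy["doses_per_100"] >= 50].groupby("country")["date"].min()

# Aligning on the country index leaves NaT where the milestone was not reached
milestones = pd.DataFrame({"start": start, "reached_50": reached})
milestones["days_to_50"] = (
    milestones["reached_50"] - milestones["start"]
).dt.days
print(milestones)
```

Country B shows NaT and NaN, which plays the role of the "has not reached 50%" branch in the loop.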
Step 5: Detecting the Slowdown
Elena suspects that vaccination rates peaked and then declined. She quantifies this:
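# Monthly average daily vaccinations
# (drop the lone September 1 row so a one-day "month" is not averaged)
through_august = df[df["date"] < "2023-09-01"]
monthly = (through_august
    .groupby(["country", through_august["date"].dt.to_period("M")])
    ["daily_doses"]
    .mean()
    .round(0)
    .unstack(level=0))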
print(monthly)
country  Aldoria  Brevara  Caliston
2023-01     4876     9234      3567
2023-02     8123    15678      5234
2023-03    15234    28456      9876
2023-04    22567    36789     13456
2023-05    26789    38234     15678
2023-06    27890    33456     16234
2023-07    23456    27890     14567
2023-08    18234    21234     11890
The peak months are clearly visible:
- Aldoria peaked in June (~27,900/day)
- Brevara peaked in May (~38,200/day)
- Caliston peaked in June (~16,200/day)
Elena computes the month-over-month change to find exactly when the slowdown began:
mom_change = monthly.pct_change() * 100
# Flag months where the change turned negative
for country in monthly.columns:
    negative_months = mom_change[country][
        mom_change[country] < 0]
    if len(negative_months) > 0:
        first_decline = negative_months.index[0]
        print(f"{country}: First decline in "
              f"{first_decline} "
              f"({negative_months.iloc[0]:.1f}%)")
Aldoria: First decline in 2023-07 (-15.9%)
Brevara: First decline in 2023-06 (-12.5%)
Caliston: First decline in 2023-07 (-10.3%)
All three countries experienced their first month-over-month decline in June or July — roughly five to six months into the campaign.
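The monthly averages can also be produced with resample() instead of grouping on dt.to_period("M"). The sketch below compares the two on a hypothetical four-day frame that straddles a month boundary. It uses the month-start alias "MS", partly because the spelling of the month-end alias differs across pandas versions; that choice is an assumption of this sketch, not something the case study prescribes.

```python
import pandas as pd

dates = pd.date_range("2023-01-30", "2023-02-02", freq="D")
# Hypothetical values for two countries
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "date": list(dates) * 2,
    "daily_doses": [10, 20, 30, 50, 1, 3, 5, 7],
})

# Approach used in Step 5: group on a monthly Period
via_period = (toy
    .groupby(["country", toy["date"].dt.to_period("M")])["daily_doses"]
    .mean()
    .unstack(level=0))

# Alternative: resample within each country on a DatetimeIndex
via_resample = (toy
    .set_index("date")
    .groupby("country")["daily_doses"]
    .resample("MS")  # bins labeled by month start
    .mean()
    .unstack(level=0))

print(via_period)
print(via_resample)
```

The numbers agree; the difference is the index. The to_period version is labeled with monthly periods such as 2023-01, while the resample version is labeled with timestamps such as 2023-02-01.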
Step 6: Weekday vs. Weekend Patterns
Elena investigates whether vaccination sites reduce operations on weekends:
df["is_weekend"] = df["date"].dt.dayofweek >= 5
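weekend_analysis = (df
    .groupby(["country", "is_weekend"])
    ["daily_doses"]
    .mean()
    .unstack())
weekend_analysis.columns = ["Weekday Avg", "Weekend Avg"]
weekend_analysis["Weekend Drop %"] = (
    (1 - weekend_analysis["Weekend Avg"] /
     weekend_analysis["Weekday Avg"]) * 100
).round(1)
# Round only the two averages; a blanket .round(0) would also
# round the drop percentage to a whole number
weekend_analysis = (weekend_analysis
    .round({"Weekday Avg": 0, "Weekend Avg": 0})
    .astype({"Weekday Avg": int, "Weekend Avg": int}))
print(weekend_analysis)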
          Weekday Avg  Weekend Avg  Weekend Drop %
country
Aldoria         22456        14567            35.1
Brevara         30123        20456            32.1
Caliston        13456         8234            38.8
Vaccination rates drop 32-39% on weekends across all three countries. Elena flags this for the policymakers: extending weekend vaccination hours could significantly accelerate the rollout.
To quantify the impact:
# What if weekends had weekday-level vaccinations?
for country in df["country"].unique():
    c_data = df[df["country"] == country]
    actual_total = c_data["daily_doses"].sum()
    weekend_days = c_data["is_weekend"].sum()
    weekend_actual = c_data[c_data["is_weekend"]]["daily_doses"].sum()
    weekday_avg = c_data[~c_data["is_weekend"]]["daily_doses"].mean()
    # If weekends had weekday-level dosing
    potential_gain = (weekday_avg * weekend_days) - weekend_actual
    pct_gain = (potential_gain / actual_total) * 100
    print(f"{country}: {potential_gain:,.0f} additional doses "
          f"possible ({pct_gain:.1f}% increase)")
Aldoria: 521,340 additional doses possible (10.7% increase)
Brevara: 645,890 additional doses possible (9.9% increase)
Caliston: 343,210 additional doses possible (10.6% increase)
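That is roughly a 10% increase in total doses, just from matching weekend operations to weekday levels. It is an upper bound, since it assumes weekend demand and staffing could fully match weekdays, but it is the kind of finding that can change policy.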
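The counterfactual arithmetic is easy to verify by hand on a small case. The sketch below uses two hypothetical weeks with a flat 1,000 doses on weekdays and 600 on weekends (a 40% drop) and repeats the same calculation. It also checks the result against a closed form that follows from the definitions: if w is the share of days that are weekend days and d is the fractional weekend drop, the relative gain is w*d / (1 - w*d).

```python
import pandas as pd

dates = pd.date_range("2023-01-16", periods=14, freq="D")  # two weeks from a Monday
toy = pd.DataFrame({"date": dates})
toy["is_weekend"] = toy["date"].dt.dayofweek >= 5
# Hypothetical flat pattern: 1,000 on weekdays, 600 on weekends
toy["daily_doses"] = toy["is_weekend"].map({True: 600, False: 1000})

actual_total = toy["daily_doses"].sum()                            # 10*1000 + 4*600
weekend_days = toy["is_weekend"].sum()                             # 4
weekend_actual = toy.loc[toy["is_weekend"], "daily_doses"].sum()   # 2400
weekday_avg = toy.loc[~toy["is_weekend"], "daily_doses"].mean()    # 1000.0

potential_gain = weekday_avg * weekend_days - weekend_actual       # 1600.0
pct_gain = potential_gain / actual_total * 100

# Closed form with w = 4/14 and d = 0.4
w, d = weekend_days / len(toy), 0.4
closed_form = w * d / (1 - w * d) * 100

print(f"{potential_gain:,.0f} extra doses, {pct_gain:.1f}% increase")
print(f"closed form: {closed_form:.1f}%")
```

Both lines report 12.9%, which confirms that the loop's formula measures the gain relative to the actual total rather than to the counterfactual total.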
Step 7: Rolling 7-Day Average Comparison
For the final deliverable, Elena creates a cross-country comparison of 7-day rolling averages, normalized per 100,000 population for fair comparison:
# Compute per-100K daily rate
df["daily_per_100k"] = (
    df["daily_doses"] / df["population"] * 100000
).round(1)
# Compute 7-day rolling of the per-100K rate
df["rolling_per_100k"] = (df
    .groupby("country")["daily_per_100k"]
    .transform(lambda x: x.rolling(7, min_periods=1).mean())
    .round(1))
# Sample mid-month, on the 15th (January 15 is the first day of data)
for month in range(1, 9):
    date = pd.Timestamp(f"2023-{month:02d}-15")
    snapshot = df[df["date"] == date][
        ["country", "rolling_per_100k"]]
    snapshot = snapshot.set_index("country")
    print(f"{date.strftime('%b')}: "
          f"Aldoria={snapshot.loc['Aldoria','rolling_per_100k']}, "
          f"Brevara={snapshot.loc['Brevara','rolling_per_100k']}, "
          f"Caliston={snapshot.loc['Caliston','rolling_per_100k']}")
Jan: Aldoria=26.7, Brevara=102.4, Caliston=29.5
Feb: Aldoria=62.9, Brevara=158.7, Caliston=38.9
Mar: Aldoria=126.9, Brevara=316.2, Caliston=82.3
Apr: Aldoria=190.5, Brevara=405.4, Caliston=113.1
May: Aldoria=223.2, Brevara=424.9, Caliston=130.6
Jun: Aldoria=222.6, Brevara=345.8, Caliston=127.4
Jul: Aldoria=195.5, Brevara=310.2, Caliston=121.4
Aug: Aldoria=146.4, Brevara=223.7, Caliston=95.2
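Normalizing per 100,000 population reveals that Brevara isn't just administering more doses in absolute terms; it is vaccinating at a dramatically higher per-capita rate. At the mid-May snapshot, the highest of the eight for every country, Brevara was administering 425 doses per 100,000 people per day, compared to Aldoria's 223 and Caliston's 131. (The January 15 figures come from the first day of data, so they reflect a single day rather than a full 7-day window.)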
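Mid-month snapshots can miss the true maximum, which may fall on any day. One way to locate the actual peak of the rolling rate for each country is idxmax() within groups, sketched here on a hypothetical four-day excerpt (values invented for illustration):

```python
import pandas as pd

# Hypothetical excerpt of the rolling per-100K column
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "date": list(pd.date_range("2023-05-01", periods=4, freq="D")) * 2,
    "rolling_per_100k": [200.0, 223.0, 221.0, 210.0,
                         400.0, 410.0, 425.0, 405.0],
})

# idxmax returns the row label of each country's maximum
peak_rows = toy.loc[toy.groupby("country")["rolling_per_100k"].idxmax()]
print(peak_rows.to_string(index=False))
```

Applied to the full frame, the same two lines would report the exact peak date and rate for each country instead of the nearest mid-month value.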
The Deliverable
Elena assembles her findings into a summary table for the policymakers:
summary = pd.DataFrame({
    "Country": ["Aldoria", "Brevara", "Caliston"],
    "Population": ["12M", "9M", "12M"],
    "Total Doses": ["4.88M", "6.54M", "3.23M"],
    "Coverage (%)": [40.6, 72.7, 27.0],
    "50% Milestone": ["Not yet", "June 28", "Not yet"],
    "Peak Month": ["June", "May", "June"],
    "Peak Rate (per 100K/day)": [223, 425, 131],
    "Weekend Drop (%)": [35.1, 32.1, 38.8],
    "Current Trajectory": ["Declining", "Declining", "Declining"]
})
print(summary.to_string(index=False))
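The table above is typed in by hand, which risks drifting out of sync if the data is refreshed. As a hedged alternative, several of its columns could be derived directly from the working frame with named aggregation. The sketch below runs on a hypothetical miniature frame; the output column names are illustrative choices, not part of the original deliverable.

```python
import pandas as pd

# Hypothetical miniature of the enriched frame from Steps 2 and 7
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "daily_doses": [100, 300, 200, 50, 150, 250],
    "population": [1000] * 3 + [500] * 3,
    "rolling_per_100k": [10.0, 20.0, 20.0, 10.0, 20.0, 30.0],
})

# One row per country, each column computed from the data
auto_summary = toy.groupby("country").agg(
    population=("population", "first"),
    total_doses=("daily_doses", "sum"),
    peak_rate_per_100k=("rolling_per_100k", "max"),
)
auto_summary["doses_per_100"] = (
    auto_summary["total_doses"] / auto_summary["population"] * 100
).round(1)
print(auto_summary)
```

Text columns such as the milestone date or the trajectory label would still need their own logic, but the numeric columns then update automatically with the data.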
The Takeaway
This case study demonstrates why time series analysis is at the heart of pandemic response. The techniques Elena used — cumulative sums, 7-day rolling averages, per-capita normalization, month-over-month comparisons, weekend/weekday breakdowns — are the same techniques used by epidemiologists, public health officials, and data journalists worldwide.
Key analytical insights:
- Rolling averages reveal trends that daily data hides. The daily dose count swings sharply from day to day, largely because of the weekly cycle. The 7-day rolling average smooths that cycle out and shows the underlying trajectory.
- Per-capita rates enable fair comparison. Brevara administered about a third more total doses than Aldoria (6.54M versus 4.88M), yet its doses per 100 people are nearly 80% higher (72.7 versus 40.6) because its population is smaller. Absolute numbers mislead when populations differ.
- Milestone tracking makes progress tangible. Saying "Brevara has administered 6.5 million doses" is less actionable than "Brevara reached the 50% milestone on June 28, 164 days after starting."
- Operational patterns affect outcomes. The 32-39% weekend drop translates to roughly a 10% loss in total throughput. That's a policy lever, not just a data point.
- Multiple time scales separate signal from noise. By looking at daily, weekly, monthly, and cumulative views, Elena can answer questions at every time scale, from "what happened today?" to "what's the trend over the full campaign?"
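Every one of these techniques came from Chapter 11: pd.to_datetime(), the .dt accessor, set_index(), rolling(), cumsum(), and groupby with time components. Chapter 11's resample() does not appear in the code above, but it offers an alternative to the to_period() grouping used for the monthly averages in Step 5. The tools are simple. The insights they produce are profound.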