Case Study 2: Tracking the Pandemic Timeline — Vaccination Rollout Analysis

Contributors to Introduction to Data Science

Tier 3 — Illustrative/Composite Example: This case study uses simulated vaccination data loosely inspired by the structure of datasets published by Our World in Data and national health ministries during the COVID-19 pandemic. All country-specific numbers, dates, and vaccination counts are fictional and simplified for pedagogical purposes. The analytical techniques — rolling averages, resampling, cross-country comparison, and milestone tracking — reflect methods widely used in public health data reporting during 2020-2023.

The Setting

Elena, our public health analyst, has been tasked with a critical project: analyze the COVID-19 vaccination rollout across three countries to answer questions that policymakers need answered before their next funding meeting.

The questions are specific and time-sensitive:

How quickly did each country ramp up its vaccination program?
When did each country reach 50% of its population with at least one dose?
Did the vaccination pace slow down over time, and if so, when?
How do 7-day rolling averages compare across countries?

Elena has daily vaccination data for three countries over eight months. The data includes the number of new doses administered each day.

The Data

import pandas as pd
import numpy as np

df = pd.read_csv("vaccination_rollout.csv")
print(df.head(10))
print(f"\nShape: {df.shape}")
print(f"Countries: {df['country'].unique()}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")

      country        date  daily_doses  population
0   Aldoria  2023-01-15         3200    12000000
1   Aldoria  2023-01-16         4100    12000000
2   Aldoria  2023-01-17         3800    12000000
3   Aldoria  2023-01-18         5200    12000000
4   Aldoria  2023-01-19         4900    12000000
5   Aldoria  2023-01-20         2100    12000000
6   Aldoria  2023-01-21         1800    12000000
7   Aldoria  2023-01-22         5500    12000000
8   Aldoria  2023-01-23         5200    12000000
9   Aldoria  2023-01-24         4800    12000000

Shape: (690, 4)
Countries: ['Aldoria' 'Brevara' 'Caliston']
Date range: 2023-01-15 to 2023-09-01
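The file vaccination_rollout.csv is not reproduced in this case study. To run the steps that follow, a stand-in with the same four columns can be generated. The sketch below is an assumption throughout: the function name make_rollout, the bell-shaped curve, the peak values, and the weekend factor are invented for illustration, and the result will not reproduce the numbers printed here. Only the column names, the date range, and the populations come from the text.

```python
import numpy as np
import pandas as pd

# Populations as stated in this case study.
POPULATIONS = {"Aldoria": 12_000_000, "Brevara": 9_000_000, "Caliston": 12_000_000}

# Hypothetical peak daily doses, invented for this sketch only.
PEAKS = {"Aldoria": 28_000, "Brevara": 38_000, "Caliston": 16_000}


def make_rollout(seed=0):
    """Build a synthetic frame with the case study's schema (not its numbers)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2023-01-15", "2023-09-01", freq="D")
    t = np.arange(len(dates))
    frames = []
    for country, population in POPULATIONS.items():
        # Bell-shaped ramp-up and decline centred near day 130 (an assumption).
        curve = PEAKS[country] * np.exp(-((t - 130) / 70) ** 2)
        # Fewer doses on Saturdays and Sundays (dayofweek 5 and 6).
        weekend_factor = np.where(dates.dayofweek >= 5, 0.65, 1.0)
        noise = rng.normal(1.0, 0.05, len(dates))
        doses = np.maximum(curve * weekend_factor * noise, 0).round().astype(int)
        frames.append(pd.DataFrame({
            "country": country,
            "date": dates.strftime("%Y-%m-%d"),  # strings, as read_csv would give
            "daily_doses": doses,
            "population": population,
        }))
    return pd.concat(frames, ignore_index=True)


df_synthetic = make_rollout()
print(df_synthetic.head())
```

Writing the result out with df_synthetic.to_csv("vaccination_rollout.csv", index=False) would let the pd.read_csv call above run unchanged.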

Step 1: Parse and Validate Dates

Elena's first move is always the same: parse the dates and verify them.

df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)  # datetime64[ns]

# Check for any gaps in the date sequence for each country
for country in df["country"].unique():
    country_data = df[df["country"] == country]
    expected = pd.date_range(
        country_data["date"].min(),
        country_data["date"].max(),
        freq="D")
    actual = country_data["date"]
    missing = expected.difference(actual)
    print(f"{country}: {len(actual)} days, "
          f"{len(missing)} gaps")

Aldoria: 230 days, 0 gaps
Brevara: 230 days, 0 gaps
Caliston: 230 days, 0 gaps

No missing dates. That's unusually clean — in real pandemic data, reporting gaps are common, especially on weekends and holidays. Elena notes this but proceeds.
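Real reporting data rarely passes this check. One common way to make gaps explicit is to reindex a country's rows onto a complete daily calendar, so that missing days appear as NaN rows instead of silently vanishing. The sketch below uses a tiny hypothetical series (the frame name reports and its values are invented); leaving the gap as NaN rather than filling it is one option among several.

```python
import pandas as pd

# Hypothetical reports with 2023-01-17 missing.
reports = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-01-16", "2023-01-18"]),
    "daily_doses": [3200, 4100, 5200],
})

# Build the full calendar and reindex onto it.
full_range = pd.date_range(reports["date"].min(), reports["date"].max(), freq="D")
complete = reports.set_index("date").reindex(full_range)

# Missing days now show up as NaN and can be counted or listed.
missing_days = complete.index[complete["daily_doses"].isna()]
print(complete)
print(list(missing_days.strftime("%Y-%m-%d")))
```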

Step 2: Compute Cumulative Doses and Per-Capita Rates

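To answer "when did each country reach 50%?", Elena needs cumulative dose counts. Because the data records doses rather than unique people, she uses cumulative doses per 100 residents as a proxy for coverage; this overstates the share of people vaccinated wherever individuals have received more than one dose: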

# Sort by country and date
df = df.sort_values(["country", "date"])

# Cumulative doses within each country
df["cumulative_doses"] = df.groupby("country")["daily_doses"].cumsum()

# Per-capita metrics
df["doses_per_100"] = (
    df["cumulative_doses"] / df["population"] * 100
).round(2)

# Check progress at the end
latest = df.groupby("country").last()
print(latest[["cumulative_doses", "doses_per_100"]])

           cumulative_doses  doses_per_100
country
Aldoria             4876200          40.64
Brevara             6543800          72.71
Caliston            3234500          26.95

After eight months, Brevara has administered 72.7 doses per 100 people, Aldoria 40.6, and Caliston just 27.0. The gap is substantial.
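The key detail in this step is that cumsum() is applied within each country group, so the running total restarts at each country's first row instead of carrying over from the previous country. A minimal sketch with a hypothetical two-country frame (the names A and B and all values are invented) makes that visible:

```python
import pandas as pd

# Hypothetical mini-dataset: two countries, three days each.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "daily_doses": [10, 20, 30, 5, 5, 5],
    "population": [1000, 1000, 1000, 500, 500, 500],
})

# The running total restarts at the first row of each country.
toy["cumulative_doses"] = toy.groupby("country")["daily_doses"].cumsum()

# Cumulative doses expressed per 100 residents.
toy["doses_per_100"] = toy["cumulative_doses"] / toy["population"] * 100
print(toy)
```

Without the groupby, country B's total would start from A's final value of 60, which is why the sort and group steps come before the cumulative sum.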

Step 3: The 7-Day Rolling Average

This is the metric that defined pandemic reporting. Every news outlet reported the "7-day rolling average" of cases, deaths, and vaccinations. Elena computes it for each country:

df["rolling_7d"] = (df
    .groupby("country")["daily_doses"]
    .transform(lambda x: x.rolling(7, min_periods=1).mean())
    .round(0))

# Compare the 7-day average at different points in time
checkpoints = ["2023-02-15", "2023-04-15",
               "2023-06-15", "2023-08-15"]

for date_str in checkpoints:
    date = pd.Timestamp(date_str)
    snapshot = df[df["date"] == date][
        ["country", "daily_doses", "rolling_7d"]]
    print(f"\n--- {date.strftime('%B %d, %Y')} ---")
    print(snapshot.to_string(index=False))

--- February 15, 2023 ---
  country  daily_doses  rolling_7d
  Aldoria         8200      7543.0
  Brevara        15200     14286.0
 Caliston         5100      4671.0

--- April 15, 2023 ---
  country  daily_doses  rolling_7d
  Aldoria        24500     22857.0
  Brevara        38200     36429.0
 Caliston        14800     13571.0

--- June 15, 2023 ---
  country  daily_doses  rolling_7d
  Aldoria        28100     26714.0
  Brevara        32500     31143.0
 Caliston        16200     15286.0

--- August 15, 2023 ---
  country  daily_doses  rolling_7d
  Aldoria        18500     17571.0
  Brevara        21200     20143.0
 Caliston        12100     11429.0

The rolling averages tell a clear story:

- All three countries ramped up rapidly from January to April
- Brevara peaked first and started declining by June
- Aldoria peaked around June and is now declining
- Caliston had the slowest start and the most modest peak
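One subtlety in the rolling computation is min_periods=1: it lets the first six days of each country report an average over fewer than seven observations. The sketch below applies both settings to the ten Aldoria rows printed in "The Data", so the difference is visible on numbers already shown in this case study.

```python
import pandas as pd

# The ten Aldoria rows printed in "The Data".
doses = pd.Series(
    [3200, 4100, 3800, 5200, 4900, 2100, 1800, 5500, 5200, 4800],
    index=pd.date_range("2023-01-15", periods=10, freq="D"),
)

# min_periods=1 averages whatever is available, so early values use fewer than 7 days.
partial_ok = doses.rolling(7, min_periods=1).mean()

# The default (min_periods equal to the window size) yields NaN until 7 days exist.
full_only = doses.rolling(7).mean()

print(pd.DataFrame({
    "daily": doses,
    "min_periods=1": partial_ok.round(0),
    "full window": full_only.round(0),
}))
```

From the seventh day onward the two columns agree. Which setting is preferable depends on whether a partial early average is more useful than a blank; the case study chooses the former.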

Step 4: Finding the 50% Milestone

When did each country administer enough doses to cover 50% of its population?

for country in df["country"].unique():
    country_df = df[df["country"] == country]

    # Find first day where doses_per_100 >= 50
    milestone = country_df[country_df["doses_per_100"] >= 50]

    if len(milestone) > 0:
        first_day = milestone.iloc[0]
        days_to_milestone = (
            first_day["date"] - country_df["date"].min()
        ).days
        print(f"{country}: Reached 50% on "
              f"{first_day['date'].strftime('%B %d, %Y')} "
              f"({days_to_milestone} days after start)")
    else:
        # How far along are they?
        latest = country_df.iloc[-1]
        print(f"{country}: Has not reached 50% "
              f"(currently at {latest['doses_per_100']:.1f}%)")

Aldoria: Has not reached 50% (currently at 40.6%)
Brevara: Reached 50% on June 28, 2023 (164 days after start)
Caliston: Has not reached 50% (currently at 27.0%)

Only Brevara has crossed the 50% threshold. Aldoria is approaching it, and Caliston is far behind. This is the kind of concrete finding that drives policy decisions: Caliston may need more resources, supply chain support, or public outreach.
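The milestone loop can also be written without explicit iteration. The following is one possible sketch, not the approach used in this case study; the frame toy and its values are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative coverage for two countries (invented values).
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "date": list(pd.date_range("2023-01-01", periods=4, freq="D")) * 2,
    "doses_per_100": [20.0, 40.0, 55.0, 70.0, 10.0, 20.0, 30.0, 40.0],
})

# Keep rows at or above the threshold, then take the earliest date per country.
reached = toy[toy["doses_per_100"] >= 50].groupby("country")["date"].min()

# Countries that never reach the threshold are absent from `reached`;
# reindexing restores them with NaT (not-a-time).
milestones = reached.reindex(toy["country"].unique())
print(milestones)
```

A NaT entry plays the role of the loop's else branch: it marks a country that has not yet reached the threshold.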

Step 5: Detecting the Slowdown

Elena suspects that vaccination rates peaked and then declined. She quantifies this:

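# Monthly average daily vaccinations.
# September contains a single day (2023-09-01), so it is excluded.
full_months = df[df["date"] < "2023-09-01"]
monthly = (full_months
    .groupby(["country", full_months["date"].dt.to_period("M")])
    ["daily_doses"]
    .mean()
    .round(0)
    .unstack(level=0))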

print(monthly)

country     Aldoria  Brevara  Caliston
2023-01      4876     9234      3567
2023-02      8123    15678      5234
2023-03     15234    28456      9876
2023-04     22567    36789     13456
2023-05     26789    38234     15678
2023-06     27890    33456     16234
2023-07     23456    27890     14567
2023-08     18234    21234     11890

The peak months are clearly visible:

- Aldoria peaked in June (~27,900/day)
- Brevara peaked in May (~38,200/day)
- Caliston peaked in June (~16,200/day)
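The monthly table above groups on the calendar month of each date. The same aggregation can be expressed with resample() on a DatetimeIndex. The sketch below compares the two on a small hypothetical series (the frame toy and its values are invented) and is meant only to show that they agree.

```python
import pandas as pd

# Hypothetical daily series straddling a month boundary (values invented).
dates = pd.date_range("2023-01-30", "2023-02-02", freq="D")
toy = pd.DataFrame({"date": dates, "daily_doses": [100, 200, 300, 500]})

# Approach used in this step: group by the calendar month of each date.
by_period = toy.groupby(toy["date"].dt.to_period("M"))["daily_doses"].mean()

# Alternative: resample on a DatetimeIndex. "MS" labels each bin by its month start.
by_resample = toy.set_index("date")["daily_doses"].resample("MS").mean()

print(by_period)
print(by_resample)
```

The values match; only the index differs (monthly periods versus month-start timestamps). For several countries, combining groupby("country") with resample should give the per-country equivalent, though that variant is not shown here.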

Elena computes the month-over-month change to find exactly when the slowdown began:

mom_change = monthly.pct_change() * 100

# Flag months where the change turned negative
for country in monthly.columns:
    negative_months = mom_change[country][
        mom_change[country] < 0]
    if len(negative_months) > 0:
        first_decline = negative_months.index[0]
        print(f"{country}: First decline in "
              f"{first_decline} "
              f"({negative_months.iloc[0]:.1f}%)")

Aldoria: First decline in 2023-07 (-15.9%)
Brevara: First decline in 2023-06 (-12.5%)
Caliston: First decline in 2023-07 (-10.3%)

All three countries experienced their first month-over-month decline in June or July — roughly five to six months into the campaign.
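The month-over-month figures can be checked by hand. pct_change() computes (current - previous) / previous for each row, so the first row is always NaN. The sketch below reruns the calculation on Brevara's column from the monthly table.

```python
import pandas as pd

# Brevara's monthly averages, copied from the monthly table in this step.
brevara = pd.Series(
    [9234, 15678, 28456, 36789, 38234, 33456, 27890, 21234],
    index=pd.period_range("2023-01", periods=8, freq="M"),
)

# Percentage change relative to the previous month.
change = brevara.pct_change() * 100

# The first negative entry marks the first month-over-month decline.
first_decline = change[change < 0].index[0]
print(first_decline, round(change.loc[first_decline], 1))
```

For June this is (33456 - 38234) / 38234, which is about -12.5%, matching the printed result.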

Step 6: Weekday vs. Weekend Patterns

Elena investigates whether vaccination sites reduce operations on weekends:

df["is_weekend"] = df["date"].dt.dayofweek >= 5

weekend_analysis = (df
    .groupby(["country", "is_weekend"])
    ["daily_doses"]
    .mean()
    .unstack())
weekend_analysis.columns = ["Weekday Avg", "Weekend Avg"]
weekend_analysis["Weekend Drop %"] = (
    (1 - weekend_analysis["Weekend Avg"] /
     weekend_analysis["Weekday Avg"]) * 100
).round(1)

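# Round only the two averages; rounding the whole frame would also
# round the "Weekend Drop %" column to whole numbers.
print(weekend_analysis.round({"Weekday Avg": 0, "Weekend Avg": 0}))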

          Weekday Avg  Weekend Avg  Weekend Drop %
country
Aldoria        22456        14567           35.1
Brevara        30123        20456           32.1
Caliston       13456         8234           38.8

Vaccination rates drop 32-39% on weekends across all three countries. Elena flags this for the policymakers: extending weekend vaccination hours could significantly accelerate the rollout.
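The is_weekend flag depends on a pandas convention: dayofweek numbers Monday as 0 through Sunday as 6, so a value of 5 or more means Saturday or Sunday. A quick check on the first seven dates of the dataset confirms which days are flagged. The rule assumes a Saturday-Sunday weekend; a country with a different working week would need a different condition.

```python
import pandas as pd

# The first seven calendar dates in the dataset.
week = pd.Series(pd.date_range("2023-01-15", periods=7, freq="D"))

# Monday is 0 and Sunday is 6, so >= 5 selects Saturday and Sunday.
check = pd.DataFrame({
    "date": week,
    "day_name": week.dt.day_name(),
    "dayofweek": week.dt.dayofweek,
    "is_weekend": week.dt.dayofweek >= 5,
})
print(check.to_string(index=False))
```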

To quantify the impact:

# What if weekends had weekday-level vaccinations?
for country in df["country"].unique():
    c_data = df[df["country"] == country]
    actual_total = c_data["daily_doses"].sum()
    weekend_days = c_data["is_weekend"].sum()
    weekend_actual = c_data[c_data["is_weekend"]]["daily_doses"].sum()
    weekday_avg = c_data[~c_data["is_weekend"]]["daily_doses"].mean()

    # If weekends had weekday-level dosing
    potential_gain = (weekday_avg * weekend_days) - weekend_actual
    pct_gain = (potential_gain / actual_total) * 100

    print(f"{country}: {potential_gain:,.0f} additional doses "
          f"possible ({pct_gain:.1f}% increase)")

Aldoria: 521,340 additional doses possible (10.7% increase)
Brevara: 645,890 additional doses possible (9.9% increase)
Caliston: 343,210 additional doses possible (10.6% increase)

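That is roughly a 10% increase in total doses from matching weekend operations to weekday levels. The estimate assumes weekend demand would fully match weekday demand, so it is best read as a ceiling, but it is the kind of finding that can change policy.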

Step 7: Rolling 7-Day Average Comparison

For the final deliverable, Elena creates a cross-country comparison of 7-day rolling averages, normalized per 100,000 population for fair comparison:

# Compute per-100K daily rate
df["daily_per_100k"] = (
    df["daily_doses"] / df["population"] * 100000
).round(1)

# Compute 7-day rolling of the per-100K rate
df["rolling_per_100k"] = (df
    .groupby("country")["daily_per_100k"]
    .transform(lambda x: x.rolling(7, min_periods=1).mean())
    .round(1))

# Sample on the 15th of each month
for month in range(1, 9):
    date = pd.Timestamp(f"2023-{month:02d}-15")
    snapshot = df[df["date"] == date][
        ["country", "rolling_per_100k"]]
    snapshot = snapshot.set_index("country")
    print(f"{date.strftime('%b')}: "
          f"Aldoria={snapshot.loc['Aldoria','rolling_per_100k']}, "
          f"Brevara={snapshot.loc['Brevara','rolling_per_100k']}, "
          f"Caliston={snapshot.loc['Caliston','rolling_per_100k']}")

Jan: Aldoria=38.2, Brevara=102.4, Caliston=29.5
Feb: Aldoria=62.9, Brevara=158.7, Caliston=38.9
Mar: Aldoria=126.9, Brevara=316.2, Caliston=82.3
Apr: Aldoria=190.5, Brevara=405.4, Caliston=113.1
May: Aldoria=223.2, Brevara=424.9, Caliston=130.6
Jun: Aldoria=222.6, Brevara=345.8, Caliston=127.4
Jul: Aldoria=195.5, Brevara=310.2, Caliston=121.4
Aug: Aldoria=146.4, Brevara=223.7, Caliston=95.2

Normalizing per 100,000 population reveals that Brevara isn't just vaccinating more people in absolute terms — it's vaccinating at a dramatically higher per-capita rate. At its peak, Brevara was administering 425 doses per 100,000 people per day, compared to Aldoria's 223 and Caliston's 131.
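The normalization itself is a single division, and it can be cross-checked against earlier output. The sketch below takes the February 15 rolling averages from Step 3 and the populations listed in the deliverable summary (12M, 9M, 12M), then converts them to doses per 100,000 residents. Because each country's population is constant, averaging first and normalizing second should agree with the order used in Step 7 up to rounding.

```python
import pandas as pd

# 7-day averages on February 15 (from Step 3) and populations (from the summary).
feb15 = pd.DataFrame({
    "rolling_7d": [7543.0, 14286.0, 4671.0],
    "population": [12_000_000, 9_000_000, 12_000_000],
}, index=["Aldoria", "Brevara", "Caliston"])

# Doses per 100,000 residents per day.
feb15["per_100k"] = (feb15["rolling_7d"] / feb15["population"] * 100_000).round(1)
print(feb15["per_100k"])
```

The results, 62.9, 158.7, and 38.9, match the Feb row printed above.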

The Deliverable

Elena assembles her findings into a summary table for the policymakers:

summary = pd.DataFrame({
    "Country": ["Aldoria", "Brevara", "Caliston"],
    "Population": ["12M", "9M", "12M"],
    "Total Doses": ["4.88M", "6.54M", "3.23M"],
    "Coverage (%)": [40.6, 72.7, 27.0],
    "50% Milestone": ["Not yet", "June 28", "Not yet"],
    "Peak Month": ["June", "May", "June"],
    "Peak Rate (per 100K/day)": [223, 425, 131],
    "Weekend Drop (%)": [35.1, 32.1, 38.8],
    "Current Trajectory": ["Declining", "Declining", "Declining"]
})
print(summary.to_string(index=False))

The Takeaway

This case study demonstrates why time series analysis is at the heart of pandemic response. The techniques Elena used — cumulative sums, 7-day rolling averages, per-capita normalization, month-over-month comparisons, weekend/weekday breakdowns — are the same techniques used by epidemiologists, public health officials, and data journalists worldwide.

Key analytical insights:

Rolling averages reveal trends that daily data hides. The daily dose count bounces wildly from day to day. The 7-day rolling average shows the actual trajectory.
Per-capita rates enable fair comparison. Brevara has the smallest population of the three (9 million versus 12 million) yet administered the most doses, so its per-capita lead is even larger than its lead in absolute totals. Absolute numbers mislead when populations differ.
Milestone tracking makes progress tangible. Saying "Brevara has administered 6.5 million doses" is less actionable than "Brevara reached 50% coverage on June 28, 164 days after starting."
Operational patterns affect outcomes. The 32-39% weekend drop translates to a 10% loss in total throughput. That's a policy lever, not just a data point.
Multiple time scales separate signal from noise. By looking at daily, weekly, monthly, and cumulative views, Elena can answer questions at every time scale, from "what happened today?" to "what's the trend over the full campaign?"

Every one of these techniques came from Chapter 11: pd.to_datetime(), the .dt accessor, set_index(), rolling(), cumsum(), and groupby with time components (the monthly averages could equally be computed with resample()). The tools are simple. The insights they produce are profound.