Exercises: Time Series Visualization

DataField.Dev


These exercises assume `import pandas as pd`, `import matplotlib.pyplot as plt`, `import matplotlib.dates as mdates`, and `import numpy as np`, and they use pandas's `DatetimeIndex` throughout.

Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name three matplotlib.dates locators and describe when each is appropriate.

Guidance

`YearLocator` — one tick per year (optionally every N years via `base=N`). Best for multi-year or multi-decade charts. `MonthLocator` — one tick per month (or specific months via `bymonth`). Best for charts spanning months to a few years. `DayLocator` — one tick per day (or specific days via `bymonthday`). Best for charts spanning weeks. Other options include `HourLocator`, `MinuteLocator`, `WeekdayLocator`, and `AutoDateLocator`.
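A minimal sketch of the three locators on synthetic random-walk data. The date ranges, tick positions, and format strings below are illustrative assumptions, not recommendations from the chapter; each panel pairs a locator with a `DateFormatter` suited to its span:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Three hypothetical daily series covering very different spans
spans = {
    "multi-year": pd.date_range("2010-01-01", "2024-12-31", freq="D"),
    "months": pd.date_range("2023-01-01", "2024-06-30", freq="D"),
    "weeks": pd.date_range("2024-03-01", "2024-04-15", freq="D"),
}

fig, axes = plt.subplots(3, 1, figsize=(10, 7))
for ax, (name, dates) in zip(axes, spans.items()):
    ax.plot(dates, np.random.randn(len(dates)).cumsum())
    ax.set_title(name)

# Multi-year: one tick every 2 years
axes[0].xaxis.set_major_locator(mdates.YearLocator(base=2))
axes[0].xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Months to a few years: ticks only in January, April, July, and October
axes[1].xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Weeks: ticks on the 1st, 8th, 15th, and 22nd of each month
axes[2].xaxis.set_major_locator(mdates.DayLocator(bymonthday=[1, 8, 15, 22]))
axes[2].xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))

fig.tight_layout()
```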

A.2 ★☆☆ | Recall

What are the three components of a classical seasonal decomposition?

Guidance

**Trend** (the long-term direction, smooth), **seasonal** (the repeating cycle within each period), and **residual** (what is left after removing trend and seasonal — the noise). The additive model: `observed = trend + seasonal + residual`. The multiplicative model: `observed = trend × seasonal × residual`.

A.3 ★★☆ | Understand

Explain the difference between a simple moving average and an exponential moving average.

Guidance

A **simple moving average (SMA)** gives equal weight to every observation in the window. A 30-day SMA is the mean of the last 30 days, each weighted 1/30. An **exponential moving average (EMA)** gives more weight to recent observations than to older ones, with the weights decaying exponentially. An EMA therefore reacts faster to a recent change than an SMA of comparable length, although no old observation ever drops out of it completely. On most data the two look similar, but the EMA usually tracks turning points with less lag. `df.rolling(window=30).mean()` is an SMA; `df.ewm(span=30).mean()` is an EMA.
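One way to see the difference in responsiveness is to feed both averages a step change. The sketch below uses a hypothetical series that jumps from 0 to 1 and counts how many days each average needs to reach 0.5:

```python
import numpy as np
import pandas as pd

# Hypothetical step: 0 for 100 days, then 1 for 100 days
step = pd.Series(np.r_[np.zeros(100), np.ones(100)],
                 index=pd.date_range("2024-01-01", periods=200, freq="D"))

sma = step.rolling(window=30).mean()
ema = step.ewm(span=30).mean()   # span=30 means alpha = 2 / (30 + 1)

# Days after the step until each average first reaches half the new level
step_day = step.index[100]
sma_days = (sma[sma >= 0.5].index[0] - step_day).days
ema_days = (ema[ema >= 0.5].index[0] - step_day).days
print(sma_days, ema_days)  # 14 10
```

The SMA needs 14 days to reach 0.5 but sits exactly at 1 from day 29 onward; the EMA crosses 0.5 after 10 days yet only approaches 1 asymptotically. That is the trade-off between the two averages.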

A.4 ★★☆ | Understand

When should you use a calendar heatmap instead of a line chart?

Guidance

Use a calendar heatmap when you want to see **weekly cycles**, **seasonal patterns**, or **individual day outliers** in daily data. A line chart of daily data across multiple years is often cluttered; the calendar heatmap compresses the same information into a compact, scannable layout. Line charts are better when the question is about **trend** or **magnitude** rather than the calendar pattern.

A.5 ★★☆ | Analyze

Describe what "banking to 45 degrees" means and why it matters for time series charts.

Guidance

Banking to 45 degrees is the perceptual principle (from William Cleveland's 1988 research) that line charts are most readable when the absolute slopes of the line segments average out to roughly 45 degrees. At 45 degrees, the reader can compare the slopes of adjacent segments most accurately; when segments are much flatter or much steeper, slope comparisons become harder. For time series, this means choosing the aspect ratio so that the interesting slopes appear at roughly 45 degrees on average: usually a wide chart for long time series, and a more square one for short, volatile series.
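Cleveland's median-absolute-slope method turns the principle into a formula. A sketch, with a hypothetical helper named `bank_to_45`: a segment with data slope dy/dx appears on screen with slope (dy/dx) · (H/W) · (x range / y range) for an axes box of height H and width W, so setting the median absolute on-screen slope to 1 gives the height-to-width ratio directly. Dropping perfectly flat segments is an extra assumption made here to avoid dividing by zero:

```python
import matplotlib.pyplot as plt
import numpy as np

def bank_to_45(x, y):
    """Height/width ratio that puts the median absolute segment slope at 45 degrees.

    On screen, a segment with data slope dy/dx appears with slope
    (dy/dx) * (H / W) * (x_range / y_range). Setting the median absolute
    on-screen slope to 1 and solving for H / W gives the return value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.abs(np.diff(y) / np.diff(x))
    slopes = slopes[slopes > 0]          # ignore perfectly flat segments
    x_range = x.max() - x.min()
    y_range = y.max() - y.min()
    return y_range / (x_range * np.median(slopes))

# Hypothetical series: a 60-step cycle on a gentle upward trend
x = np.arange(240)
y = 5 * np.sin(2 * np.pi * x / 60) + 0.02 * x
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_box_aspect(bank_to_45(x, y))     # height/width of the axes box, not the figure
```

Setting the axes box aspect rather than `figsize` keeps the ratio independent of titles, labels, and margins.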

A.6 ★★★ | Evaluate

A colleague sends you a chart of "daily website traffic over 3 years" with a 365-day rolling mean as the only line. What do you suggest?

Guidance

Several issues. (1) A 365-day rolling mean smooths out annual seasonality entirely, which may be part of the story. Suggest also showing the raw data or a 7- or 30-day rolling mean to preserve finer patterns. (2) Without the raw data visible, specific events (outages, viral posts) disappear. Consider a two-layer chart with the raw data in light gray and the smoothed line on top. (3) A trailing 365-day window has no value for the first 364 days, because there is not yet a full year of history, so the line starts almost a year late. Disclose this, or use a shorter window for the early data.
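Point (3) is easy to check in pandas. A sketch on hypothetical Poisson-distributed traffic counts; `min_periods` is one option (not mentioned in the guidance) for drawing the early part of the line, provided the chart discloses that early values average fewer days:

```python
import numpy as np
import pandas as pd

# Hypothetical daily visit counts for three years
idx = pd.date_range("2022-01-01", "2024-12-31", freq="D")
traffic = pd.Series(np.random.default_rng(1).poisson(1000, len(idx)), index=idx)

# Trailing 365-day mean: by default a full window is required
yearly = traffic.rolling(window=365).mean()
print(yearly.isna().sum())          # 364

# Option: start once 30 days are available (early values average fewer days)
yearly_partial = traffic.rolling(window=365, min_periods=30).mean()
print(yearly_partial.isna().sum())  # 29
```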

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Create a time series DataFrame with a DatetimeIndex and plot it with a formatted year axis.

Guidance

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = pd.date_range("2015-01-01", "2024-12-31", freq="D")
df = pd.DataFrame({"value": np.random.randn(len(dates)).cumsum()}, index=dates)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
plt.show()
```

B.2 ★☆☆ | Apply

Add a 30-day rolling mean to the chart from B.1, along with the raw data shown in light gray.

Guidance

```python
df["ma30"] = df["value"].rolling(window=30).mean()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"], color="lightgray", linewidth=0.5, label="Daily")
ax.plot(df.index, df["ma30"], color="steelblue", linewidth=2, label="30-day MA")
ax.legend()
```

B.3 ★★☆ | Apply

Perform a seasonal decomposition using statsmodels and plot the four panels (observed, trend, seasonal, residual).

Guidance

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Create a series with clear seasonality
dates = pd.date_range("2015-01-01", "2024-12-31", freq="D")
season = 5 * np.sin(2 * np.pi * np.arange(len(dates)) / 365)
trend = np.arange(len(dates)) * 0.01
noise = np.random.randn(len(dates))
df = pd.DataFrame({"value": trend + season + noise}, index=dates)

result = seasonal_decompose(df["value"], model="additive", period=365)

fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)
axes[0].plot(result.observed); axes[0].set_ylabel("Observed")
axes[1].plot(result.trend); axes[1].set_ylabel("Trend")
axes[2].plot(result.seasonal); axes[2].set_ylabel("Seasonal")
axes[3].plot(result.resid); axes[3].set_ylabel("Residual")
plt.tight_layout()
```

B.4 ★★☆ | Apply

Add a vertical line with an annotation at a specific date ("Event X on 2020-03-15") to a time series chart.

Guidance

```python
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"])
ax.axvline(pd.Timestamp("2020-03-15"), color="red", linestyle="--", alpha=0.7)
ax.annotate("Event X", xy=(pd.Timestamp("2020-03-15"), ax.get_ylim()[1]),
            xytext=(10, -15), textcoords="offset points", color="red", fontsize=9)
```
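Because `ax.get_ylim()[1]` is read only once, the label drifts away from the top if the y-limits change later (for example, after more data is plotted). A sketch of a reusable alternative, assuming a hypothetical helper `annotate_events` that takes a `{label: date}` dictionary: it positions each label with matplotlib's blended `ax.get_xaxis_transform()`, so x is a date and y is a fraction of the axes height:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def annotate_events(ax, events, color="red"):
    """Draw a dashed vertical line and a label for each {label: date} entry.

    The text uses ax.get_xaxis_transform(): x is in data (date) units and
    y is in axes units (0 = bottom, 1 = top), so the labels stay at the
    top even if the y-limits change later.
    """
    for label, date in events.items():
        when = pd.Timestamp(date)
        ax.axvline(when, color=color, linestyle="--", alpha=0.7)
        ax.text(when, 0.98, f" {label}", transform=ax.get_xaxis_transform(),
                color=color, fontsize=9, va="top", ha="left")

# Hypothetical data and event dates
idx = pd.date_range("2019-01-01", "2021-12-31", freq="D")
values = np.random.default_rng(2).normal(size=len(idx)).cumsum()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(idx, values)
annotate_events(ax, {"Event X": "2020-03-15", "Event Y": "2021-01-10"})
```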

B.5 ★★☆ | Apply

Highlight anomalies (points more than 2 standard deviations from the mean) as red scatter markers on top of the line chart.

Guidance

```python
mean = df["value"].mean()
std = df["value"].std()
anomalies = df[(df["value"] - mean).abs() > 2 * std]

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"], color="steelblue")
ax.scatter(anomalies.index, anomalies["value"], color="red", s=40, zorder=5,
           label="Anomaly")
ax.legend()
```
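A global mean and standard deviation work for a roughly stationary series, but on a trending series (such as the random walk from B.1) the trend inflates the standard deviation, so the global rule can miss real spikes and flag the ends of the series instead. A sketch of one common alternative, a trailing rolling z-score; the helper name, window, and threshold are assumptions:

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(s, window=60, threshold=2.0):
    """Return the points of `s` lying more than `threshold` rolling standard
    deviations away from the mean of the previous `window` points."""
    baseline = s.shift(1).rolling(window)   # shift(1): a spike cannot inflate its own baseline
    z = (s - baseline.mean()) / baseline.std()
    return s[z.abs() > threshold]           # NaN z-scores (first `window` points) compare False

# Hypothetical data: upward trend + noise, with one injected spike
rng = np.random.default_rng(3)
idx = pd.date_range("2020-01-01", periods=730, freq="D")
s = pd.Series(0.05 * np.arange(730) + rng.normal(0, 1, 730), index=idx)
s.iloc[400] += 15

flagged = rolling_zscore_anomalies(s)
global_flags = s[(s - s.mean()).abs() > 2 * s.std()]
print(idx[400] in flagged.index)       # True
print(idx[400] in global_flags.index)  # False: the trend inflates the global std
```

With normally distributed noise, a 2-standard-deviation rule also flags roughly 1 ordinary point in 20 by chance; a threshold of 3 is a common choice when that is too noisy.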

B.6 ★★☆ | Apply

Use pd.DataFrame.resample to convert daily data to monthly averages and plot the result.

Guidance

```python
# "ME" = month-end frequency (pandas >= 2.2; older versions spell it "M")
df_monthly = df["value"].resample("ME").mean()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df_monthly.index, df_monthly, marker="o")
ax.set_title("Monthly Mean")
```

For start-of-month anchoring, use `"MS"` instead of `"ME"`.

B.7 ★★★ | Apply

Build an interactive Plotly time series with a range slider and unified hover mode.

Guidance

```python
import plotly.express as px

fig = px.line(df, x=df.index, y="value", title="Time Series")
fig.update_layout(
    xaxis_rangeslider_visible=True,
    hovermode="x unified",
)
fig.show()
```

B.8 ★★☆ | Apply

Create a small-multiples chart of monthly mean values across several years, one panel per year.

Guidance

```python
import seaborn as sns

# Name the index so reset_index() produces a predictable "date" column
df_monthly = df[["value"]].resample("MS").mean().rename_axis("date").reset_index()
df_monthly["year"] = df_monthly["date"].dt.year
df_monthly["month"] = df_monthly["date"].dt.month

g = sns.FacetGrid(df_monthly, col="year", col_wrap=3, height=2.5, aspect=1.2)
g.map(plt.plot, "month", "value")
g.set(xticks=range(1, 13))
```

B.9 ★★★ | Apply

Build a calendar heatmap of daily data using calplot (or manually with matplotlib if calplot is unavailable).

Guidance

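```python
# With calplot (third-party package):
import calplot
calplot.calplot(df["value"], cmap="YlOrRd")

# Manual matplotlib version (sketch). Restrict to one year: pivoting week
# numbers across all years would average different years together.
cal = df.loc["2024-01-01":"2024-12-31", "value"].to_frame()
first_weekday = cal.index[0].dayofweek                          # Monday = 0
cal["week"] = (cal.index.dayofyear - 1 + first_weekday) // 7    # column 0, 1, 2, ...
cal["weekday"] = cal.index.dayofweek                            # row 0 (Mon) to 6 (Sun)
grid = cal.pivot_table(values="value", index="weekday", columns="week")

fig, ax = plt.subplots(figsize=(16, 3))
im = ax.imshow(grid, cmap="YlOrRd", aspect="auto")
ax.set_yticks(range(7))
ax.set_yticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
fig.colorbar(im, ax=ax)
```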

B.10 ★★★ | Create

Build a forecast visualization: historical line in black, forecast line in red, 80% confidence band shaded.

Guidance

```python
# Stand-in forecast: the actual 2023+ values with a fixed ±2 band as a placeholder.
# A real chart would plot a model's point forecast and its 80% prediction interval.
hist = df[df.index < "2023-01-01"]
forecast = df[df.index >= "2023-01-01"]
forecast_mean = forecast["value"]
forecast_lower = forecast_mean - 2
forecast_upper = forecast_mean + 2

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(hist.index, hist["value"], color="black", label="Historical")
ax.plot(forecast.index, forecast_mean, color="red", label="Forecast")
ax.fill_between(forecast.index, forecast_lower, forecast_upper,
                color="red", alpha=0.2, label="80% interval (placeholder)")
ax.axvline(pd.Timestamp("2023-01-01"), color="gray", linestyle="--")
ax.legend()
```
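The stand-in above uses the actual values and a fixed ±2 band, so the shading is not a real 80% interval. A sketch of a genuine one, using a naive (random-walk) forecast chosen here purely for illustration: every future value equals the last observation, and if daily changes are normal with standard deviation σ, the central 80% interval h days ahead is ±1.2816·σ·√h, so it widens with the horizon:

```python
from statistics import NormalDist

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical history: a daily random walk
rng = np.random.default_rng(4)
hist_idx = pd.date_range("2020-01-01", "2022-12-31", freq="D")
hist = pd.Series(rng.normal(0, 1, len(hist_idx)).cumsum(), index=hist_idx)

# Naive forecast: every future value equals the last observation
horizon = 90
fc_idx = pd.date_range(hist.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D")
forecast_mean = pd.Series(hist.iloc[-1], index=fc_idx)

sigma = hist.diff().std()                  # std of one-step changes
h = np.arange(1, horizon + 1)              # steps ahead
z = NormalDist().inv_cdf(0.90)             # about 1.2816 for a central 80% interval
half_width = z * sigma * np.sqrt(h)        # random-walk variance grows linearly with h
lower, upper = forecast_mean - half_width, forecast_mean + half_width

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(hist.index, hist, color="black", label="Historical")
ax.plot(fc_idx, forecast_mean, color="red", label="Forecast (naive)")
ax.fill_between(fc_idx, lower, upper, color="red", alpha=0.2, label="80% prediction interval")
ax.axvline(fc_idx[0], color="gray", linestyle="--")
ax.legend()
```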

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

Take the climate temperature dataset (150 years of records). Build a 4-panel figure: (a) full series with 10-year rolling mean, (b) STL decomposition, (c) cycle plot of monthly means, (d) calendar heatmap of the last 20 years. Panels (b) and (c) need at least monthly data and panel (d) needs daily data, so annual averages alone are not enough. Describe what each panel reveals that the others do not.

Guidance

Panel (a) shows the long-term trend clearly through the rolling mean, while the raw data gives context. Panel (b) separates the trend, seasonal, and residual components, making the size of the seasonal variation explicit. Panel (c) reveals which months have warmed fastest; in many climate records, winter months have warmed faster than summer months. Panel (d) shows specific years and days that were unusually hot or cold, revealing individual events the other panels hide. Together the panels cover trend, seasonality, month-by-month change, and individual events; no single panel answers all four questions.

C.2 ★★★ | Evaluate

You are asked to visualize the number of deaths per day from COVID-19 in a country, across 2020-2023. Which techniques from this chapter apply, and why?

Guidance

(1) **Rolling mean** to smooth reporting noise: fewer deaths were reported on weekends, which produced an artificial weekly cycle, so a 7-day window is the natural choice. (2) **Annotations** for lockdowns, vaccine rollouts, and variant emergence. (3) **Log scale** for the exponential growth phases. (4) **Range slider** in Plotly if the chart is interactive, so readers can zoom into specific waves. (5) **Faceting by region** if the country has sub-national variation. (6) **Forecast visualization** if the chart includes a projection. Avoid dual axes (deaths + cases), non-zero baselines for area charts, and aggressive smoothing that hides individual wave peaks.
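Why 7 days specifically: every 7-day window contains each weekday exactly once, so a weekday reporting pattern cancels out. A sketch on hypothetical data (the wave shape and the weekday factors are assumptions chosen to average to 1):

```python
import numpy as np
import pandas as pd

# Hypothetical daily deaths: a smooth wave distorted by a weekday reporting pattern
idx = pd.date_range("2021-01-04", periods=140, freq="D")        # 2021-01-04 is a Monday
wave = 200 + 150 * np.sin(np.linspace(0, np.pi, len(idx)))       # "true" daily deaths
factor = np.array([1.15, 1.15, 1.10, 1.10, 1.00, 0.80, 0.70])    # Mon..Sun, averages to 1
reported = pd.Series(wave * factor[np.asarray(idx.dayofweek)], index=idx)

# Each 7-day window holds every weekday once, so the reporting pattern
# cancels; center=True keeps the wave's peak at the right date.
smoothed = reported.rolling(window=7, center=True).mean()

print(np.nanmax(np.abs(smoothed.to_numpy() - wave)) < 2)   # True: within 2 deaths of the wave
print(np.max(np.abs(reported.to_numpy() - wave)) > 50)     # True: the raw data swings far more
```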

C.3 ★★★ | Create

Build a sparkline-style chart inline with a short text summary: "Revenue ↗ $1.2M (up 15%) [sparkline]".

Guidance

Create the figure with `figsize=(1.5, 0.3)`, remove the axes and spines, and place the chart next to the text, either in a larger figure or in an HTML document. The sparkline function from Section 25.8 is reusable: call it with your data and embed the result.
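Section 25.8's function is not reproduced here. The following is an independent, hypothetical `sparkline` helper that renders to PNG bytes, so the image can be embedded inline in HTML right after the summary text; the revenue figures are made up for illustration:

```python
import base64
import io

import matplotlib.pyplot as plt
import numpy as np

def sparkline(values, figsize=(1.5, 0.3), color="black", dpi=200):
    """Render a tiny axis-free line chart and return it as PNG bytes (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(values, color=color, linewidth=1)
    ax.plot(len(values) - 1, values[-1], "o", color="red", markersize=2)  # highlight latest value
    ax.set_axis_off()                                       # no ticks, labels, or spines
    fig.subplots_adjust(left=0, right=1, top=1, bottom=0)   # let the line fill the figure
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi, transparent=True)
    plt.close(fig)
    return buf.getvalue()

# Hypothetical monthly revenue (in $M), embedded inline in an HTML snippet
revenue = [1.04, 1.02, 1.07, 1.05, 1.10, 1.08, 1.13, 1.12, 1.16, 1.15, 1.18, 1.20]
img = base64.b64encode(sparkline(revenue)).decode("ascii")
html = f'Revenue ↗ $1.2M (up 15%) <img src="data:image/png;base64,{img}" style="height:1em">'
```

Setting the image height to `1em` in the HTML makes the sparkline scale with the surrounding text.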

C.4 ★★★ | Evaluate

The chapter argues that time series charts often need multiple visualizations at different scales. When is this overkill? Can you think of scenarios where one chart is enough?

Guidance

One chart is enough when: (1) the audience has a single specific question ("how did sales do this quarter?"), (2) the time span is short (days or weeks, not years), (3) there is no seasonality to disentangle, (4) there are no anomalies worth highlighting. In these cases, a single well-designed line chart does the job. The multi-panel approach is for exploratory or comprehensive analysis where the analyst wants to understand the series fully. For a simple operational dashboard, a single chart is usually better than four.

These exercises cover the main time series visualization techniques. Chapter 26 introduces text and NLP visualization.