Key Takeaways: Working with Dates, Times, and Time Series Data

This is your reference card for Chapter 11. Keep it handy whenever you're working with dates, time series, or any data that changes over time.


The Date Handling Workflow

Every time you encounter a date column, follow this sequence:

1. CHECK THE DTYPE          df["date"].dtype
    |                       If "object" -> it's a string, needs parsing.
    v
2. PARSE WITH FORMAT        pd.to_datetime(df["date"], format="%Y-%m-%d")
    |                       Always specify format when possible.
    v
3. HANDLE ERRORS            errors="coerce" converts bad values to NaT.
    |                       Check .isna().sum() for unparseable entries.
    v
4. SET AS INDEX             df = df.set_index("date")
    |                       Unlocks time-based slicing and resampling.
    v
5. VERIFY COMPLETENESS      Check for gaps with date_range().difference().
    |                       Fill gaps if needed (ffill, interpolate).
    v
6. ANALYZE                  resample(), rolling(), groupby with .dt

Date Format Codes Reference

Code Meaning Example
%Y 4-digit year 2023
%y 2-digit year 23
%m Month (01-12) 03
%d Day (01-31) 15
%B Full month name March
%b Abbreviated month Mar
%H Hour, 24-hour (00-23) 14
%I Hour, 12-hour (01-12) 02
%M Minute (00-59) 30
%S Second (00-59) 45
%p AM/PM PM
%A Full weekday name Wednesday
%a Abbreviated weekday Wed

Common format strings:

Date String Format String
"2023-03-15" "%Y-%m-%d"
"03/15/2023" "%m/%d/%Y"
"15/03/2023" "%d/%m/%Y"
"15-Mar-2023" "%d-%b-%Y"
"March 15, 2023" "%B %d, %Y"
"2023-03-15 14:30:00" "%Y-%m-%d %H:%M:%S"

The .dt Accessor

Access datetime components from a column (like .str for strings):

df["date"].dt.year          # 2023
df["date"].dt.month         # 3
df["date"].dt.day           # 15
df["date"].dt.hour          # 14
df["date"].dt.dayofweek     # 2 (Mon=0, Sun=6)
df["date"].dt.day_name()    # "Wednesday"
df["date"].dt.month_name()  # "March"
df["date"].dt.quarter       # 1
df["date"].dt.dayofyear     # 74
df["date"].dt.is_leap_year  # False
df["date"].dt.date          # date only (no time)
df["date"].dt.time          # time only (no date)

Formatting output:

df["date"].dt.strftime("%B %Y")  # "March 2023"

Date Arithmetic

Operation Code Result Type
Subtract two dates date2 - date1 Timedelta
Get days from Timedelta td.days int
Add days date + pd.Timedelta(days=7) Timestamp
Add calendar months date + pd.DateOffset(months=1) Timestamp
Add business days date + pd.offsets.BDay(5) Timestamp

Key distinction: - Timedelta(days=30) = exactly 30 days (calendar-unaware) - DateOffset(months=1) = one calendar month (handles varying month lengths)


Resampling Quick Reference

Frequency aliases:

Alias Meaning
"D" Calendar day
"B" Business day (Mon-Fri)
"W" Weekly (default: Sunday)
"W-MON" Weekly, anchored to Monday
"MS" Month start
"ME" Month end
"QS" Quarter start
"QE" Quarter end
"YS" Year start
"YE" Year end
"h" Hourly
"min" Minutely

Common resampling patterns:

# Daily -> Weekly total
series.resample("W").sum()

# Daily -> Monthly average
series.resample("ME").mean()

# Hourly -> Daily max
series.resample("D").max()

# Multiple aggregations at once
series.resample("ME").agg(["mean", "min", "max", "std"])

Choosing the right aggregation:

If you want... Use... Example
Total over the period .sum() Total monthly revenue
Typical value .mean() Average daily temperature
Extreme value .max() / .min() Hottest day of the month
Number of observations .count() Days with data this month
Last value of the period .last() Month-end stock price

Rolling Windows Guide

# Basic 7-day rolling average
series.rolling(7).mean()

# Start computing before full window is available
series.rolling(7, min_periods=1).mean()

# Centered window (for historical analysis)
series.rolling(7, center=True).mean()

# Other rolling statistics
series.rolling(7).std()     # volatility
series.rolling(7).min()     # local minimum
series.rolling(7).max()     # local maximum
series.rolling(7).median()  # robust average
series.rolling(7).sum()     # running total over window

Window size guidelines:

Window Best for
3-day Very responsive, still noisy
7-day Standard for daily data (removes day-of-week effect)
14-day Smooths out biweekly patterns
30-day Monthly trend, very smooth
90-day Quarterly trend

Expanding Windows

# Cumulative mean (running average)
series.expanding().mean()

# Cumulative sum (running total)
series.expanding().sum()

# Cumulative max (running high)
series.expanding().max()

Rolling vs. Expanding: - Rolling = fixed window, moves through data (recent trend) - Expanding = window grows from start to current point (all-time statistic)


Common Patterns Cheat Sheet

# Parse dates during CSV loading
df = pd.read_csv("file.csv", parse_dates=["date_col"])

# Filter to a date range
mask = (df["date"] >= "2023-01-01") & (df["date"] < "2023-04-01")
q1 = df[mask]

# Time-based slicing with DatetimeIndex
df_indexed = df.set_index("date")
df_indexed["2023-03"]          # all of March
df_indexed["2023-01":"2023-03"] # Jan through March

# Group by day of week
df.groupby(df["date"].dt.day_name())["value"].mean()

# Weekend vs. weekday
df["is_weekend"] = df["date"].dt.dayofweek >= 5
df.groupby("is_weekend")["value"].mean()

# Find missing dates
full_range = pd.date_range(df["date"].min(), df["date"].max())
missing = full_range.difference(df["date"])

# Generate a date sequence
dates = pd.date_range("2023-01-01", periods=365, freq="D")

# Shift for lag comparison
df["yesterday"] = df["value"].shift(1)
df["last_week"] = df["value"].shift(7)
df["change"] = df["value"] - df["value"].shift(1)
df["pct_change"] = df["value"].pct_change() * 100

Common Pitfalls and Fixes

Pitfall Symptom Fix
Dates still strings dtype: object pd.to_datetime()
Day-month ambiguity "01/02" parsed wrong Specify format= or dayfirst=True
Unparseable entries ValueError on bad dates Use errors="coerce"
NaT in calculations Unexpected missing results Check for NaT with .isna()
Wrong week start Weekly totals don't align Use "W-MON" instead of "W"
NaN at start of rolling First N-1 values missing Use min_periods=1
Timezone confusion Times don't match Store in UTC, convert for display

Key Vocabulary

Term Definition
Timestamp A single point in time (pandas' datetime type)
DatetimeIndex An index of Timestamps enabling time-based operations
Timedelta A duration (difference between two Timestamps)
DateOffset A calendar-aware time offset (e.g., "one month")
NaT "Not a Time" — the datetime equivalent of NaN
Resample Change the frequency of time series data (daily to weekly)
Rolling window A sliding window that computes statistics over N recent points
Expanding window A growing window from the start to the current point
strftime Format a datetime as a string (string-format-time)
strptime Parse a string into a datetime (string-parse-time)