Key Takeaways: Working with Dates, Times, and Time Series Data

Contributors to Introduction to Data Science

Key Takeaways: Working with Dates, Times, and Time Series Data

This is your reference card for Chapter 11. Keep it handy whenever you're working with dates, time series, or any data that changes over time.

The Date Handling Workflow

Every time you encounter a date column, follow this sequence:

1. CHECK THE DTYPE          df["date"].dtype
    |                       If "object" -> it's a string, needs parsing.
    v
2. PARSE WITH FORMAT        pd.to_datetime(df["date"], format="%Y-%m-%d")
    |                       Always specify format when possible.
    v
3. HANDLE ERRORS            errors="coerce" converts bad values to NaT.
    |                       Check .isna().sum() for unparseable entries.
    v
4. SET AS INDEX             df = df.set_index("date")
    |                       Unlocks time-based slicing and resampling.
    v
5. VERIFY COMPLETENESS      Check for gaps with date_range().difference().
    |                       Fill gaps if needed (ffill, interpolate).
    v
6. ANALYZE                  resample(), rolling(), groupby with .dt

Date Format Codes Reference

Code	Meaning	Example
`%Y`	4-digit year	2023
`%y`	2-digit year	23
`%m`	Month (01-12)	03
`%d`	Day (01-31)	15
`%B`	Full month name	March
`%b`	Abbreviated month	Mar
`%H`	Hour, 24-hour (00-23)	14
`%I`	Hour, 12-hour (01-12)	02
`%M`	Minute (00-59)	30
`%S`	Second (00-59)	45
`%p`	AM/PM	PM
`%A`	Full weekday name	Wednesday
`%a`	Abbreviated weekday	Wed

Common format strings:

Date String	Format String
`"2023-03-15"`	`"%Y-%m-%d"`
`"03/15/2023"`	`"%m/%d/%Y"`
`"15/03/2023"`	`"%d/%m/%Y"`
`"15-Mar-2023"`	`"%d-%b-%Y"`
`"March 15, 2023"`	`"%B %d, %Y"`
`"2023-03-15 14:30:00"`	`"%Y-%m-%d %H:%M:%S"`

The `.dt` Accessor

Access datetime components from a column (like .str for strings):

df["date"].dt.year          # 2023
df["date"].dt.month         # 3
df["date"].dt.day           # 15
df["date"].dt.hour          # 14
df["date"].dt.dayofweek     # 2 (Mon=0, Sun=6)
df["date"].dt.day_name()    # "Wednesday"
df["date"].dt.month_name()  # "March"
df["date"].dt.quarter       # 1
df["date"].dt.dayofyear     # 74
df["date"].dt.is_leap_year  # False
df["date"].dt.date          # date only (no time)
df["date"].dt.time          # time only (no date)

Formatting output:

df["date"].dt.strftime("%B %Y")  # "March 2023"

Date Arithmetic

Operation	Code	Result Type
Subtract two dates	`date2 - date1`	Timedelta
Get days from Timedelta	`td.days`	int
Add days	`date + pd.Timedelta(days=7)`	Timestamp
Add calendar months	`date + pd.DateOffset(months=1)`	Timestamp
Add business days	`date + pd.offsets.BDay(5)`	Timestamp

Key distinction: - Timedelta(days=30) = exactly 30 days (calendar-unaware) - DateOffset(months=1) = one calendar month (handles varying month lengths)

Resampling Quick Reference

Frequency aliases:

Alias	Meaning
`"D"`	Calendar day
`"B"`	Business day (Mon-Fri)
`"W"`	Weekly (default: Sunday)
`"W-MON"`	Weekly, anchored to Monday
`"MS"`	Month start
`"ME"`	Month end
`"QS"`	Quarter start
`"QE"`	Quarter end
`"YS"`	Year start
`"YE"`	Year end
`"h"`	Hourly
`"min"`	Minutely

Common resampling patterns:

# Daily -> Weekly total
series.resample("W").sum()

# Daily -> Monthly average
series.resample("ME").mean()

# Hourly -> Daily max
series.resample("D").max()

# Multiple aggregations at once
series.resample("ME").agg(["mean", "min", "max", "std"])

Choosing the right aggregation:

If you want...	Use...	Example
Total over the period	`.sum()`	Total monthly revenue
Typical value	`.mean()`	Average daily temperature
Extreme value	`.max()` / `.min()`	Hottest day of the month
Number of observations	`.count()`	Days with data this month
Last value of the period	`.last()`	Month-end stock price

Rolling Windows Guide

# Basic 7-day rolling average
series.rolling(7).mean()

# Start computing before full window is available
series.rolling(7, min_periods=1).mean()

# Centered window (for historical analysis)
series.rolling(7, center=True).mean()

# Other rolling statistics
series.rolling(7).std()     # volatility
series.rolling(7).min()     # local minimum
series.rolling(7).max()     # local maximum
series.rolling(7).median()  # robust average
series.rolling(7).sum()     # running total over window

Window size guidelines:

Window	Best for
3-day	Very responsive, still noisy
7-day	Standard for daily data (removes day-of-week effect)
14-day	Smooths out biweekly patterns
30-day	Monthly trend, very smooth
90-day	Quarterly trend

Expanding Windows

# Cumulative mean (running average)
series.expanding().mean()

# Cumulative sum (running total)
series.expanding().sum()

# Cumulative max (running high)
series.expanding().max()

Rolling vs. Expanding: - Rolling = fixed window, moves through data (recent trend) - Expanding = window grows from start to current point (all-time statistic)

Common Patterns Cheat Sheet

# Parse dates during CSV loading
df = pd.read_csv("file.csv", parse_dates=["date_col"])

# Filter to a date range
mask = (df["date"] >= "2023-01-01") & (df["date"] < "2023-04-01")
q1 = df[mask]

# Time-based slicing with DatetimeIndex
df_indexed = df.set_index("date")
df_indexed["2023-03"]          # all of March
df_indexed["2023-01":"2023-03"] # Jan through March

# Group by day of week
df.groupby(df["date"].dt.day_name())["value"].mean()

# Weekend vs. weekday
df["is_weekend"] = df["date"].dt.dayofweek >= 5
df.groupby("is_weekend")["value"].mean()

# Find missing dates
full_range = pd.date_range(df["date"].min(), df["date"].max())
missing = full_range.difference(df["date"])

# Generate a date sequence
dates = pd.date_range("2023-01-01", periods=365, freq="D")

# Shift for lag comparison
df["yesterday"] = df["value"].shift(1)
df["last_week"] = df["value"].shift(7)
df["change"] = df["value"] - df["value"].shift(1)
df["pct_change"] = df["value"].pct_change() * 100

Common Pitfalls and Fixes

Pitfall	Symptom	Fix
Dates still strings	`dtype: object`	`pd.to_datetime()`
Day-month ambiguity	"01/02" parsed wrong	Specify `format=` or `dayfirst=True`
Unparseable entries	`ValueError` on bad dates	Use `errors="coerce"`
NaT in calculations	Unexpected missing results	Check for NaT with `.isna()`
Wrong week start	Weekly totals don't align	Use `"W-MON"` instead of `"W"`
NaN at start of rolling	First N-1 values missing	Use `min_periods=1`
Timezone confusion	Times don't match	Store in UTC, convert for display

Key Vocabulary

Term	Definition
Timestamp	A single point in time (pandas' datetime type)
DatetimeIndex	An index of Timestamps enabling time-based operations
Timedelta	A duration (difference between two Timestamps)
DateOffset	A calendar-aware time offset (e.g., "one month")
NaT	"Not a Time" — the datetime equivalent of NaN
Resample	Change the frequency of time series data (daily to weekly)
Rolling window	A sliding window that computes statistics over N recent points
Expanding window	A growing window from the start to the current point
strftime	Format a datetime as a string (string-format-time)
strptime	Parse a string into a datetime (string-parse-time)

Key Takeaways: Working with Dates, Times, and Time Series Data

The Date Handling Workflow

Date Format Codes Reference

The .dt Accessor

Date Arithmetic

Resampling Quick Reference

Rolling Windows Guide

Expanding Windows

Common Patterns Cheat Sheet

Common Pitfalls and Fixes

Key Vocabulary

The `.dt` Accessor