Key Takeaways: Working with Dates, Times, and Time Series Data
This is your reference card for Chapter 11. Keep it handy whenever you're working with dates, time series, or any data that changes over time.
The Date Handling Workflow
Every time you encounter a date column, follow this sequence:
1. CHECK THE DTYPE df["date"].dtype
| If "object" -> it's a string, needs parsing.
v
2. PARSE WITH FORMAT pd.to_datetime(df["date"], format="%Y-%m-%d")
| Always specify format when possible.
v
3. HANDLE ERRORS errors="coerce" converts bad values to NaT.
| Check .isna().sum() for unparseable entries.
v
4. SET AS INDEX df = df.set_index("date")
| Unlocks time-based slicing and resampling.
v
5. VERIFY COMPLETENESS Check for gaps with date_range().difference().
| Fill gaps if needed (ffill, interpolate).
v
6. ANALYZE resample(), rolling(), groupby with .dt
Date Format Codes Reference
| Code | Meaning | Example |
|---|---|---|
%Y |
4-digit year | 2023 |
%y |
2-digit year | 23 |
%m |
Month (01-12) | 03 |
%d |
Day (01-31) | 15 |
%B |
Full month name | March |
%b |
Abbreviated month | Mar |
%H |
Hour, 24-hour (00-23) | 14 |
%I |
Hour, 12-hour (01-12) | 02 |
%M |
Minute (00-59) | 30 |
%S |
Second (00-59) | 45 |
%p |
AM/PM | PM |
%A |
Full weekday name | Wednesday |
%a |
Abbreviated weekday | Wed |
Common format strings:
| Date String | Format String |
|---|---|
"2023-03-15" |
"%Y-%m-%d" |
"03/15/2023" |
"%m/%d/%Y" |
"15/03/2023" |
"%d/%m/%Y" |
"15-Mar-2023" |
"%d-%b-%Y" |
"March 15, 2023" |
"%B %d, %Y" |
"2023-03-15 14:30:00" |
"%Y-%m-%d %H:%M:%S" |
The .dt Accessor
Access datetime components from a column (like .str for strings):
df["date"].dt.year # 2023
df["date"].dt.month # 3
df["date"].dt.day # 15
df["date"].dt.hour # 14
df["date"].dt.dayofweek # 2 (Mon=0, Sun=6)
df["date"].dt.day_name() # "Wednesday"
df["date"].dt.month_name() # "March"
df["date"].dt.quarter # 1
df["date"].dt.dayofyear # 74
df["date"].dt.is_leap_year # False
df["date"].dt.date # date only (no time)
df["date"].dt.time # time only (no date)
Formatting output:
df["date"].dt.strftime("%B %Y") # "March 2023"
Date Arithmetic
| Operation | Code | Result Type |
|---|---|---|
| Subtract two dates | date2 - date1 |
Timedelta |
| Get days from Timedelta | td.days |
int |
| Add days | date + pd.Timedelta(days=7) |
Timestamp |
| Add calendar months | date + pd.DateOffset(months=1) |
Timestamp |
| Add business days | date + pd.offsets.BDay(5) |
Timestamp |
Key distinction:
- Timedelta(days=30) = exactly 30 days (calendar-unaware)
- DateOffset(months=1) = one calendar month (handles varying month lengths)
Resampling Quick Reference
Frequency aliases:
| Alias | Meaning |
|---|---|
"D" |
Calendar day |
"B" |
Business day (Mon-Fri) |
"W" |
Weekly (default: Sunday) |
"W-MON" |
Weekly, anchored to Monday |
"MS" |
Month start |
"ME" |
Month end |
"QS" |
Quarter start |
"QE" |
Quarter end |
"YS" |
Year start |
"YE" |
Year end |
"h" |
Hourly |
"min" |
Minutely |
Common resampling patterns:
# Daily -> Weekly total
series.resample("W").sum()
# Daily -> Monthly average
series.resample("ME").mean()
# Hourly -> Daily max
series.resample("D").max()
# Multiple aggregations at once
series.resample("ME").agg(["mean", "min", "max", "std"])
Choosing the right aggregation:
| If you want... | Use... | Example |
|---|---|---|
| Total over the period | .sum() |
Total monthly revenue |
| Typical value | .mean() |
Average daily temperature |
| Extreme value | .max() / .min() |
Hottest day of the month |
| Number of observations | .count() |
Days with data this month |
| Last value of the period | .last() |
Month-end stock price |
Rolling Windows Guide
# Basic 7-day rolling average
series.rolling(7).mean()
# Start computing before full window is available
series.rolling(7, min_periods=1).mean()
# Centered window (for historical analysis)
series.rolling(7, center=True).mean()
# Other rolling statistics
series.rolling(7).std() # volatility
series.rolling(7).min() # local minimum
series.rolling(7).max() # local maximum
series.rolling(7).median() # robust average
series.rolling(7).sum() # running total over window
Window size guidelines:
| Window | Best for |
|---|---|
| 3-day | Very responsive, still noisy |
| 7-day | Standard for daily data (removes day-of-week effect) |
| 14-day | Smooths out biweekly patterns |
| 30-day | Monthly trend, very smooth |
| 90-day | Quarterly trend |
Expanding Windows
# Cumulative mean (running average)
series.expanding().mean()
# Cumulative sum (running total)
series.expanding().sum()
# Cumulative max (running high)
series.expanding().max()
Rolling vs. Expanding: - Rolling = fixed window, moves through data (recent trend) - Expanding = window grows from start to current point (all-time statistic)
Common Patterns Cheat Sheet
# Parse dates during CSV loading
df = pd.read_csv("file.csv", parse_dates=["date_col"])
# Filter to a date range
mask = (df["date"] >= "2023-01-01") & (df["date"] < "2023-04-01")
q1 = df[mask]
# Time-based slicing with DatetimeIndex
df_indexed = df.set_index("date")
df_indexed["2023-03"] # all of March
df_indexed["2023-01":"2023-03"] # Jan through March
# Group by day of week
df.groupby(df["date"].dt.day_name())["value"].mean()
# Weekend vs. weekday
df["is_weekend"] = df["date"].dt.dayofweek >= 5
df.groupby("is_weekend")["value"].mean()
# Find missing dates
full_range = pd.date_range(df["date"].min(), df["date"].max())
missing = full_range.difference(df["date"])
# Generate a date sequence
dates = pd.date_range("2023-01-01", periods=365, freq="D")
# Shift for lag comparison
df["yesterday"] = df["value"].shift(1)
df["last_week"] = df["value"].shift(7)
df["change"] = df["value"] - df["value"].shift(1)
df["pct_change"] = df["value"].pct_change() * 100
Common Pitfalls and Fixes
| Pitfall | Symptom | Fix |
|---|---|---|
| Dates still strings | dtype: object |
pd.to_datetime() |
| Day-month ambiguity | "01/02" parsed wrong | Specify format= or dayfirst=True |
| Unparseable entries | ValueError on bad dates |
Use errors="coerce" |
| NaT in calculations | Unexpected missing results | Check for NaT with .isna() |
| Wrong week start | Weekly totals don't align | Use "W-MON" instead of "W" |
| NaN at start of rolling | First N-1 values missing | Use min_periods=1 |
| Timezone confusion | Times don't match | Store in UTC, convert for display |
Key Vocabulary
| Term | Definition |
|---|---|
| Timestamp | A single point in time (pandas' datetime type) |
| DatetimeIndex | An index of Timestamps enabling time-based operations |
| Timedelta | A duration (difference between two Timestamps) |
| DateOffset | A calendar-aware time offset (e.g., "one month") |
| NaT | "Not a Time" — the datetime equivalent of NaN |
| Resample | Change the frequency of time series data (daily to weekly) |
| Rolling window | A sliding window that computes statistics over N recent points |
| Expanding window | A growing window from the start to the current point |
| strftime | Format a datetime as a string (string-format-time) |
| strptime | Parse a string into a datetime (string-parse-time) |