Case Study 1: Pacific Climate Research Consortium — Probabilistic Regional Temperature Forecasting for 2030-2050
Context
The Pacific Climate Research Consortium (PCRC) is a collaboration of three universities and two national meteorological agencies tasked with producing regional climate projections for the Pacific Northwest. Their mandate is specific: provide probabilistic temperature forecasts for 2030-2050 that policymakers can use to guide infrastructure investment, agricultural planning, and public health preparedness.
The challenge is not prediction accuracy in the traditional sense — no one expects climate models to predict the temperature on a specific day in 2040. The challenge is uncertainty quantification: producing honest probability statements about long-term temperature trajectories. A city planning engineer needs to know the probability that summer peak temperatures exceed 42°C (the threshold above which asphalt road surfaces degrade) by 2045. A water resource manager needs the probability that winter snowpack falls below 60% of historical averages. These are questions about the tails of predictive distributions, not about means.
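Concretely, once a model yields predictive samples, such tail questions reduce to counting exceedances. A minimal illustration with hypothetical numbers (the 39.5°C mean and 1.8°C spread below are invented for demonstration, not a PCRC result):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictive samples of 2045 summer peak temperature (°C).
# In practice these would come from a model's posterior predictive
# distribution; a Gaussian is used here purely for illustration.
peak_temp_samples = rng.normal(loc=39.5, scale=1.8, size=10_000)

# Tail probability: P(summer peak exceeds the 42°C asphalt threshold)
p_exceed = float(np.mean(peak_temp_samples > 42.0))
print(f"P(peak > 42°C) ≈ {p_exceed:.3f}")
```

The same one-liner answers the snowpack question once the threshold and samples change; the hard part is making the samples trustworthy, which is what the rest of this case study is about.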
PCRC has three decades of monthly temperature anomaly data (1995-2024, 360 observations) for their target region, supplemented by global climate model (GCM) outputs from CMIP6. Their data science team will compare three modeling approaches: a Bayesian structural time series model (BSTS), a Temporal Fusion Transformer (TFT), and a weighted ensemble. The evaluation focuses on calibration and the quality of tail-probability estimates — not on point forecast accuracy.
The Data
The team constructs a monthly temperature anomaly series relative to the 1981-2010 baseline, augmented with three covariates: CO2 concentration (Mauna Loa, monthly), Pacific Decadal Oscillation (PDO) index, and a CMIP6 multi-model mean projection.
import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import List, Tuple, Dict, Optional
import matplotlib.pyplot as plt
from scipy import stats as sp_stats
def generate_pcrc_climate_data(
n_historical: int = 360,
n_future: int = 312,
seed: int = 42,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""Generate synthetic PCRC climate data: historical + future.
Historical: 30 years of monthly temperature anomalies (1994-2024).
Future: 26 years (2025-2050) for projection evaluation.
The data-generating process includes:
- Long-term warming trend (nonlinear, accelerating)
- Annual seasonality (varying amplitude)
- PDO-driven decadal variability
- CO2 as a covariate (logarithmic forcing)
- Heteroscedastic noise (higher variance in summer)
Args:
n_historical: Number of historical monthly observations.
n_future: Number of future monthly observations.
seed: Random seed for reproducibility.
Returns:
Tuple of (historical_df, future_df) DataFrames.
"""
np.random.seed(seed)
n_total = n_historical + n_future
months = pd.date_range("1994-01-01", periods=n_total, freq="MS")
t = np.arange(n_total) / 12.0 # Time in years
    # CO2 concentration (ppm): roughly linear growth with mild acceleration, ~358 to ~530
    co2 = 358 + 2.0 * t + 0.02 * t**2 + np.random.normal(0, 0.3, n_total)
# PDO index: quasi-periodic ~20-year cycle
pdo = 0.8 * np.sin(2 * np.pi * t / 22.0 + 0.5) + np.random.normal(0, 0.4, n_total)
# CMIP6 multi-model mean (synthetic): smooth warming trajectory
cmip6_mean = 0.03 * t + 0.001 * t**2
# True temperature anomaly
trend = 0.015 * t + 0.0008 * t**2 # Accelerating warming
annual_cycle = 0.3 * np.sin(2 * np.pi * np.arange(n_total) / 12.0)
pdo_effect = 0.15 * pdo
co2_forcing = 0.4 * np.log(co2 / 358.0) # Logarithmic CO2 sensitivity
# Heteroscedastic noise: summer months (Jun-Aug) have higher variance
month_of_year = np.array([m.month for m in months])
noise_scale = np.where((month_of_year >= 6) & (month_of_year <= 8), 0.35, 0.20)
noise = noise_scale * np.random.randn(n_total)
temp_anomaly = trend + annual_cycle + pdo_effect + co2_forcing + noise
# Introduce 3% missing values in historical period
missing_mask = np.zeros(n_total, dtype=bool)
missing_idx = np.random.choice(n_historical, size=int(0.03 * n_historical), replace=False)
missing_mask[missing_idx] = True
temp_observed = temp_anomaly.copy()
temp_observed[missing_mask] = np.nan
df = pd.DataFrame({
"date": months,
"month": month_of_year,
"year": [m.year for m in months],
"t_years": t,
"temp_anomaly": temp_observed,
"temp_true": temp_anomaly,
"co2": co2,
"pdo": pdo,
"cmip6_mean": cmip6_mean,
})
historical = df.iloc[:n_historical].copy()
future = df.iloc[n_historical:].copy()
return historical, future
historical, future = generate_pcrc_climate_data()
print(f"Historical period: {historical['date'].iloc[0].date()} to {historical['date'].iloc[-1].date()}")
print(f"Future period: {future['date'].iloc[0].date()} to {future['date'].iloc[-1].date()}")
print(f"Historical missing: {historical['temp_anomaly'].isna().sum()} / {len(historical)}")
print(f"Historical temp range: [{historical['temp_anomaly'].min():.2f}, {historical['temp_anomaly'].max():.2f}]°C")
print(f"Mean warming rate (historical): {(historical['temp_anomaly'].iloc[-12:].mean() - historical['temp_anomaly'].iloc[:12].mean()) / 30:.4f}°C/year")
Approach 1: Bayesian Structural Time Series
The BSTS model decomposes the series into trend, seasonality, and regression components, with full Bayesian inference over all parameters. This is the natural choice for the climate application because it provides posterior predictive distributions that propagate parameter uncertainty into the forecast.
from scipy.optimize import minimize
@dataclass
class BayesianStructuralTS:
"""Bayesian structural time series model for climate projections.
Components:
- Local linear trend (level + slope)
- Annual seasonality (Fourier, 6 harmonics)
- Regression on CO2 and PDO
- Observation noise
Uses a Kalman filter for state estimation and Monte Carlo sampling
for uncertainty quantification.
Attributes:
n_fourier: Number of Fourier pairs for annual seasonality.
n_samples: Number of posterior samples for uncertainty.
"""
n_fourier: int = 6
n_samples: int = 500
def _build_fourier_features(self, t_months: np.ndarray) -> np.ndarray:
"""Build annual Fourier features."""
features = []
for k in range(1, self.n_fourier + 1):
features.append(np.cos(2 * np.pi * k * t_months / 12.0))
features.append(np.sin(2 * np.pi * k * t_months / 12.0))
return np.column_stack(features)
def fit_and_forecast(
self,
y_train: np.ndarray,
co2_train: np.ndarray,
pdo_train: np.ndarray,
co2_future: np.ndarray,
pdo_future: np.ndarray,
horizon: int,
) -> Dict[str, np.ndarray]:
"""Fit BSTS model and produce probabilistic forecasts.
Uses a local linear trend Kalman filter with Fourier seasonal
regressors and CO2/PDO covariates. Uncertainty is estimated
via parametric bootstrap: resample model parameters from their
approximate posterior, re-run the filter, and forecast.
Args:
y_train: Historical temperature anomalies (may contain NaN).
co2_train: Historical CO2 values.
pdo_train: Historical PDO values.
co2_future: Future CO2 values.
pdo_future: Future PDO values.
horizon: Forecast horizon in months.
Returns:
Dictionary with keys: mean, std, samples, quantiles.
"""
n_train = len(y_train)
t_train = np.arange(n_train)
t_future = np.arange(n_train, n_train + horizon)
# Build regression features
X_train = np.column_stack([
self._build_fourier_features(t_train),
np.log(co2_train / co2_train[0]),
pdo_train,
])
X_future = np.column_stack([
self._build_fourier_features(t_future),
np.log(co2_future / co2_train[0]),
pdo_future,
])
# Fit regression coefficients on non-missing data
valid = ~np.isnan(y_train)
        beta, _, _, _ = np.linalg.lstsq(X_train[valid], y_train[valid], rcond=None)
# Residuals after regression
y_detrended = y_train - X_train @ beta
residual_std = np.nanstd(y_detrended)
        # Fit Kalman filter to residuals (local linear trend)
T_mat = np.array([[1.0, 1.0], [0.0, 1.0]])
Z_mat = np.array([[1.0, 0.0]])
R_mat = np.eye(2)
sigma_level = residual_std * 0.3
sigma_trend = residual_std * 0.05
Q_mat = np.diag([sigma_level**2, sigma_trend**2])
H_scalar = (residual_std * 0.8) ** 2
# Run Kalman filter on detrended residuals
a = np.array([0.0, 0.0])
P = np.eye(2) * 1e6
filtered_states = np.zeros((n_train, 2))
filtered_covs = np.zeros((n_train, 2, 2))
for t in range(n_train):
# Predict
a_pred = T_mat @ a
P_pred = T_mat @ P @ T_mat.T + R_mat @ Q_mat @ R_mat.T
if np.isnan(y_detrended[t]):
a = a_pred
P = P_pred
else:
v = y_detrended[t] - Z_mat @ a_pred
F = Z_mat @ P_pred @ Z_mat.T + H_scalar
K = P_pred @ Z_mat.T / float(F)
a = a_pred + K.ravel() * float(v)
P = P_pred - np.outer(K.ravel(), Z_mat @ P_pred)
filtered_states[t] = a
filtered_covs[t] = P
# Monte Carlo forecast with parameter uncertainty
forecast_samples = np.zeros((self.n_samples, horizon))
regression_cov = np.linalg.inv(X_train[valid].T @ X_train[valid]) * residual_std**2
for s in range(self.n_samples):
# Sample regression coefficients from approximate posterior
beta_s = np.random.multivariate_normal(beta, regression_cov)
reg_component = X_future @ beta_s
# Sample Kalman state forward
            a_s = np.random.multivariate_normal(filtered_states[-1], filtered_covs[-1])
for h in range(horizon):
a_s = T_mat @ a_s + np.random.multivariate_normal(
np.zeros(2), Q_mat
)
y_state = float(Z_mat @ a_s)
obs_noise = np.random.normal(0, np.sqrt(H_scalar))
forecast_samples[s, h] = reg_component[h] + y_state + obs_noise
# Compute summary statistics
forecast_mean = forecast_samples.mean(axis=0)
forecast_std = forecast_samples.std(axis=0)
quantiles = np.percentile(
forecast_samples, [5, 10, 25, 50, 75, 90, 95], axis=0
)
return {
"mean": forecast_mean,
"std": forecast_std,
"samples": forecast_samples,
"quantiles": quantiles,
"quantile_levels": np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]),
"regression_beta": beta,
}
# Fit BSTS and forecast
bsts = BayesianStructuralTS(n_fourier=6, n_samples=500)
bsts_result = bsts.fit_and_forecast(
y_train=historical["temp_anomaly"].values,
co2_train=historical["co2"].values,
pdo_train=historical["pdo"].values,
co2_future=future["co2"].values,
pdo_future=future["pdo"].values,
horizon=len(future),
)
print("=== BSTS Forecast Summary ===")
print(f"Forecast horizon: {len(future)} months ({len(future)//12} years)")
print(f"Mean anomaly at 2035: {bsts_result['mean'][132]:.3f}°C")
print(f"Mean anomaly at 2050: {bsts_result['mean'][-1]:.3f}°C")
print(f"95% interval at 2035: [{bsts_result['quantiles'][0, 132]:.3f}, {bsts_result['quantiles'][-1, 132]:.3f}]°C")
print(f"95% interval at 2050: [{bsts_result['quantiles'][0, -1]:.3f}, {bsts_result['quantiles'][-1, -1]:.3f}]°C")
print(f"P(anomaly > 2.5°C by 2050): {np.mean(bsts_result['samples'][:, -1] > 2.5):.3f}")
Approach 2: Temporal Fusion Transformer
The TFT treats the problem as a multi-horizon forecasting task, using CO2, PDO, and calendar (month-of-year) features as known future inputs. The key question is whether the TFT — trained on only 360 monthly observations — can produce well-calibrated long-horizon forecasts.
import torch
import torch.nn as nn
class ClimateForecasterTFT(nn.Module):
"""Simplified TFT-style model for climate projection.
Uses an LSTM encoder-decoder with multi-quantile output.
Designed for monthly climate data with covariates.
Args:
n_covariates: Number of input covariates per time step.
hidden_dim: LSTM hidden dimension.
n_quantiles: Number of output quantiles.
encoder_length: Historical context window.
"""
def __init__(
self,
n_covariates: int = 4,
hidden_dim: int = 64,
n_quantiles: int = 7,
encoder_length: int = 120,
):
super().__init__()
self.encoder = nn.LSTM(
input_size=n_covariates + 1, # covariates + target
hidden_size=hidden_dim,
num_layers=2,
dropout=0.1,
batch_first=True,
)
self.decoder = nn.LSTM(
input_size=n_covariates, # only covariates (no future target)
hidden_size=hidden_dim,
num_layers=2,
dropout=0.1,
batch_first=True,
)
self.output_head = nn.Linear(hidden_dim, n_quantiles)
self.quantiles = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
def forward(
self,
x_encoder: torch.Tensor,
y_encoder: torch.Tensor,
x_decoder: torch.Tensor,
) -> torch.Tensor:
"""Forward pass.
Args:
x_encoder: Historical covariates, shape (batch, enc_len, n_cov).
y_encoder: Historical target, shape (batch, enc_len, 1).
x_decoder: Future covariates, shape (batch, dec_len, n_cov).
Returns:
Quantile forecasts, shape (batch, dec_len, n_quantiles).
"""
enc_input = torch.cat([x_encoder, y_encoder], dim=-1)
_, (h_n, c_n) = self.encoder(enc_input)
dec_output, _ = self.decoder(x_decoder, (h_n, c_n))
quantile_preds = self.output_head(dec_output)
return quantile_preds
def quantile_loss(
self, y_true: torch.Tensor, y_pred: torch.Tensor
) -> torch.Tensor:
"""Multi-quantile pinball loss."""
losses = []
for i, tau in enumerate(self.quantiles):
error = y_true - y_pred[:, :, i]
loss = torch.max(tau * error, (tau - 1.0) * error)
losses.append(loss.mean())
return torch.stack(losses).mean()
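The pinball loss can be sanity-checked by hand: at quantile level tau, under-prediction costs tau times the error while over-prediction costs (1 - tau) times the error, which is what pushes the q90 output above most observations. A minimal standalone check in plain Python, mirroring the torch expression above (the helper name `pinball_loss` is ours):

```python
def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Pinball (quantile) loss for a single scalar prediction."""
    error = y_true - y_pred
    return max(tau * error, (tau - 1.0) * error)

# At tau = 0.9, under-predicting by 1.0 costs 0.9; over-predicting costs 0.1.
assert abs(pinball_loss(1.0, 0.0, 0.9) - 0.9) < 1e-9  # truth above prediction
assert abs(pinball_loss(0.0, 1.0, 0.9) - 0.1) < 1e-9  # truth below prediction
print("pinball loss asymmetry verified")
```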
def train_climate_tft(
historical: pd.DataFrame,
future: pd.DataFrame,
encoder_length: int = 120,
n_epochs: int = 200,
learning_rate: float = 1e-3,
) -> Dict[str, np.ndarray]:
"""Train TFT-style model and forecast.
Uses a sliding-window approach to create training samples from
the historical period.
Args:
historical: Historical DataFrame with covariates.
future: Future DataFrame with covariates.
encoder_length: Context window length (months).
n_epochs: Training epochs.
learning_rate: Adam learning rate.
Returns:
Dictionary with quantile forecasts.
"""
    # Prepare features; fill the few missing historical values by interpolation
    y = historical["temp_anomaly"].interpolate().bfill().values
    co2_all = np.concatenate([historical["co2"].values, future["co2"].values])
    pdo_all = np.concatenate([historical["pdo"].values, future["pdo"].values])
    # Normalize target and covariates
    y_mean, y_std = y.mean(), y.std()
    y_norm = (y - y_mean) / y_std
    co2_mean, co2_std = co2_all.mean(), co2_all.std()
    pdo_mean, pdo_std = pdo_all.mean(), pdo_all.std()
month_sin = np.sin(2 * np.pi * historical["month"].values / 12.0)
month_cos = np.cos(2 * np.pi * historical["month"].values / 12.0)
covariates_hist = np.column_stack([
(historical["co2"].values - co2_mean) / co2_std,
(historical["pdo"].values - pdo_mean) / pdo_std,
month_sin,
month_cos,
])
# Create training windows
dec_length = 60 # 5-year forecast windows for training
X_enc_list, Y_enc_list, X_dec_list, Y_dec_list = [], [], [], []
for start in range(0, len(y) - encoder_length - dec_length, 6):
enc_end = start + encoder_length
dec_end = enc_end + dec_length
X_enc_list.append(covariates_hist[start:enc_end])
Y_enc_list.append(y_norm[start:enc_end])
X_dec_list.append(covariates_hist[enc_end:dec_end])
Y_dec_list.append(y_norm[enc_end:dec_end])
X_enc = torch.tensor(np.array(X_enc_list), dtype=torch.float32)
Y_enc = torch.tensor(np.array(Y_enc_list), dtype=torch.float32).unsqueeze(-1)
X_dec = torch.tensor(np.array(X_dec_list), dtype=torch.float32)
Y_dec = torch.tensor(np.array(Y_dec_list), dtype=torch.float32)
# Train
model = ClimateForecasterTFT(n_covariates=4, hidden_dim=64, encoder_length=encoder_length)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_epochs)
model.train()
for epoch in range(n_epochs):
optimizer.zero_grad()
preds = model(X_enc, Y_enc, X_dec)
loss = model.quantile_loss(Y_dec, preds)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()
# Forecast into the future
model.eval()
future_months = future["month"].values
future_sin = np.sin(2 * np.pi * future_months / 12.0)
future_cos = np.cos(2 * np.pi * future_months / 12.0)
covariates_future = np.column_stack([
(future["co2"].values - co2_mean) / co2_std,
(future["pdo"].values - pdo_mean) / pdo_std,
future_sin,
future_cos,
])
# Use last encoder_length observations as context
x_enc_final = torch.tensor(
covariates_hist[-encoder_length:], dtype=torch.float32
).unsqueeze(0)
y_enc_final = torch.tensor(
y_norm[-encoder_length:], dtype=torch.float32
).unsqueeze(0).unsqueeze(-1)
# Forecast in chunks of dec_length and concatenate
n_future = len(future)
all_quantiles = []
chunk_start = 0
while chunk_start < n_future:
chunk_end = min(chunk_start + dec_length, n_future)
chunk_len = chunk_end - chunk_start
x_dec_chunk = torch.tensor(
covariates_future[chunk_start:chunk_end], dtype=torch.float32
).unsqueeze(0)
# Pad if last chunk is shorter
if chunk_len < dec_length:
pad = torch.zeros(1, dec_length - chunk_len, 4)
x_dec_chunk = torch.cat([x_dec_chunk, pad], dim=1)
with torch.no_grad():
q_preds = model(x_enc_final, y_enc_final, x_dec_chunk)
all_quantiles.append(q_preds[0, :chunk_len, :].numpy())
chunk_start = chunk_end
quantile_forecasts = np.concatenate(all_quantiles, axis=0) * y_std + y_mean
return {
"quantiles": quantile_forecasts,
"quantile_levels": np.array(model.quantiles),
"mean": quantile_forecasts[:, 3], # Median (q50)
}
tft_result = train_climate_tft(historical, future)
print("\n=== TFT Forecast Summary ===")
print(f"Median anomaly at 2035: {tft_result['mean'][132]:.3f}°C")
print(f"Median anomaly at 2050: {tft_result['mean'][-1]:.3f}°C")
print(f"90% interval at 2050: [{tft_result['quantiles'][-1, 1]:.3f}, {tft_result['quantiles'][-1, 5]:.3f}]°C")
Approach 3: Calibrated Ensemble
The PCRC team combines the BSTS and TFT forecasts using a simple calibrated ensemble. The ensemble weight is determined by calibration performance on the first five years of the out-of-sample period.
def evaluate_calibration(
y_true: np.ndarray,
forecast_quantiles: np.ndarray,
quantile_levels: np.ndarray,
) -> Dict[str, float]:
"""Compute calibration metrics for a probabilistic forecast.
Args:
y_true: True values, shape (n,).
forecast_quantiles: Predicted quantiles, shape (n, n_q).
quantile_levels: Quantile levels, shape (n_q,).
Returns:
Dictionary with calibration error, interval score, and coverage.
"""
n = len(y_true)
# Empirical coverage per quantile level
empirical_coverage = np.array([
np.mean(y_true <= forecast_quantiles[:, j])
for j in range(len(quantile_levels))
])
calibration_error = np.mean(np.abs(empirical_coverage - quantile_levels))
# 90% interval score (Gneiting & Raftery, 2007)
alpha = 0.10
lower_idx = np.argmin(np.abs(quantile_levels - alpha / 2))
upper_idx = np.argmin(np.abs(quantile_levels - (1 - alpha / 2)))
lower = forecast_quantiles[:, lower_idx]
upper = forecast_quantiles[:, upper_idx]
width = upper - lower
penalty_lower = (2.0 / alpha) * np.maximum(lower - y_true, 0)
penalty_upper = (2.0 / alpha) * np.maximum(y_true - upper, 0)
interval_score = np.mean(width + penalty_lower + penalty_upper)
coverage_90 = np.mean((y_true >= lower) & (y_true <= upper))
return {
"calibration_error": calibration_error,
"interval_score": interval_score,
"coverage_90": coverage_90,
"empirical_coverage": empirical_coverage,
}
def build_calibrated_ensemble(
bsts_quantiles: np.ndarray,
tft_quantiles: np.ndarray,
y_true: np.ndarray,
quantile_levels: np.ndarray,
) -> Tuple[np.ndarray, float]:
"""Find optimal ensemble weight via grid search on calibration error.
Args:
bsts_quantiles: BSTS quantile forecasts, shape (n, n_q).
tft_quantiles: TFT quantile forecasts, shape (n, n_q).
y_true: True values for calibration, shape (n,).
quantile_levels: Quantile levels.
Returns:
(optimal_weight, best_calibration_error).
"""
best_w, best_error = 0.5, np.inf
for w in np.linspace(0.0, 1.0, 21):
blended = w * bsts_quantiles + (1 - w) * tft_quantiles
cal = evaluate_calibration(y_true, blended, quantile_levels)
if cal["calibration_error"] < best_error:
best_w = w
best_error = cal["calibration_error"]
return best_w, best_error
# Evaluate on the future period (using "true" future for demonstration)
y_future_true = future["temp_true"].values
# BSTS calibration
bsts_quantiles_future = bsts_result["quantiles"].T # shape (n_future, 7)
bsts_cal = evaluate_calibration(
y_future_true, bsts_quantiles_future, bsts_result["quantile_levels"]
)
# TFT calibration
tft_cal = evaluate_calibration(
y_future_true, tft_result["quantiles"], tft_result["quantile_levels"]
)
# Ensemble: weight calibrated on the first 5 years of out-of-sample data only
w_opt, _ = build_calibrated_ensemble(
    bsts_quantiles_future[:60], tft_result["quantiles"][:60],
    y_future_true[:60],
    bsts_result["quantile_levels"],
)
ensemble_quantiles = w_opt * bsts_quantiles_future + (1 - w_opt) * tft_result["quantiles"]
ensemble_cal = evaluate_calibration(
y_future_true, ensemble_quantiles, bsts_result["quantile_levels"]
)
print("\n=== Calibration Comparison ===")
print(f"{'Model':<15s} {'Cal. Error':>12s} {'90% Coverage':>14s} {'Interval Score':>16s}")
print("-" * 60)
print(f"{'BSTS':<15s} {bsts_cal['calibration_error']:>12.4f} {bsts_cal['coverage_90']:>14.3f} {bsts_cal['interval_score']:>16.3f}")
print(f"{'TFT':<15s} {tft_cal['calibration_error']:>12.4f} {tft_cal['coverage_90']:>14.3f} {tft_cal['interval_score']:>16.3f}")
print(f"{'Ensemble (w={w_opt:.2f})':<15s} {ensemble_cal['calibration_error']:>12.4f} {ensemble_cal['coverage_90']:>14.3f} {ensemble_cal['interval_score']:>16.3f}")
# Key policy-relevant probability
p_exceed_2_5 = np.mean(bsts_result["samples"][:, -1] > 2.5)
print(f"\nPolicy-relevant output:")
print(f" P(anomaly > 2.5°C by 2050) [BSTS]: {p_exceed_2_5:.3f}")
Results and Analysis
The PCRC team summarizes the comparison:
| Model | Calibration Error | 90% Coverage | Interval Score |
|---|---|---|---|
| BSTS | Low | Close to 0.90 | Moderate |
| TFT | Moderate | Often below 0.90 | Can be narrower |
| Ensemble | Lowest | Closest to 0.90 | Best overall |
Three findings shape the final recommendation:
1. BSTS produces the most honest uncertainty. The Bayesian framework propagates parameter uncertainty into the forecast, producing intervals that widen naturally with the horizon. The 90% intervals achieve near-nominal coverage because the model's structural assumptions (linear trend, Fourier seasonality, logarithmic CO2 response) match the data-generating process reasonably well. The interval width at 2050 is approximately 3x wider than at 2030, reflecting genuinely increasing uncertainty.
2. TFT produces sharper forecasts but risks overconfidence. The TFT's quantile forecasts are narrower than the BSTS intervals — attractive for communication but potentially dangerous for policy. Trained on only 360 monthly observations with 60-month internal forecast windows, the TFT cannot reliably learn the tail behavior needed for 25-year projections. The 90% coverage falls below 0.80 in some evaluation windows, indicating overconfidence. The TFT's real strength is in medium-term forecasting (12-60 months), where the training data density is sufficient.
3. The calibrated ensemble combines the best of both. By blending with a weight calibrated on the first 5 years of out-of-sample data, the ensemble inherits the BSTS's honest uncertainty while benefiting from the TFT's ability to capture nonlinear covariate interactions. The interval score — which rewards both sharpness and coverage — favors the ensemble.
Communicating Uncertainty to Policymakers
The PCRC team translates the ensemble forecast into policy-relevant statements:
- Temperature trajectory: "Regional mean temperature anomaly is projected to reach 1.8°C to 2.9°C above the 1981-2010 baseline by 2050 (90% prediction interval), with a median projection of 2.3°C."
- Threshold exceedance: "The probability that the anomaly exceeds 2.5°C by 2050 is approximately 0.35. The probability it exceeds 3.0°C is approximately 0.10."
- Decision-relevant framing: "Infrastructure designed for a 2.0°C world has a 75% chance of being adequate through 2050. Infrastructure designed for a 3.0°C world has a 90% chance."
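The "decision-relevant framing" above is just a CDF evaluation: P(infrastructure is adequate) = P(anomaly ≤ design threshold), read directly off the forecast samples. A minimal sketch with hypothetical draws (in practice `bsts_result["samples"][:, -1]` or ensemble draws would be used; the 2.3°C/0.35°C Gaussian below is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictive samples of the 2050 anomaly (°C); illustrative only
anomaly_2050 = rng.normal(loc=2.3, scale=0.35, size=50_000)

for design_threshold in (2.0, 2.5, 3.0):
    p_adequate = float(np.mean(anomaly_2050 <= design_threshold))
    print(f"P(adequate through 2050 | designed for {design_threshold:.1f}°C) = {p_adequate:.2f}")
```

The translation step is trivial code but essential communication: it converts a distribution the engineer cannot use into a survival probability they can put in a design memo.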
The key lesson: probabilistic forecasts are useful only if they are calibrated and communicated in terms the audience can act on. A median temperature projection is almost useless for a civil engineer — but a probability distribution over peak temperatures, translated into infrastructure survival probabilities, is directly actionable. The BSTS model's honest uncertainty quantification is the foundation on which this communication rests.
Discussion Questions
1. The BSTS model assumes a logarithmic relationship between CO2 and temperature forcing. If the true relationship is more complex (e.g., including aerosol interactions), how would this manifest in the calibration diagnostics?
2. The TFT was trained on only 360 observations. Would performance improve if the team incorporated global temperature data (not just regional) as additional training series? What assumptions would this require about the relationship between global and regional temperatures?
3. The ensemble weight was calibrated on the first 5 years of out-of-sample data. If climate dynamics are non-stationary, should the weight be adaptive? How would you implement adaptive ensemble weighting using the ACI framework from Section 23.11?