Case Study 1: The Weather Forecaster's Dilemma — Simple vs. Complex Models
Tier 1 — Verified Concepts: This case study explores well-established principles of forecasting using weather prediction as a lens. The historical progression of weather forecasting models is documented in meteorological literature. The specific data examples are constructed for pedagogical purposes, but the patterns they illustrate — the relationship between model complexity and forecast accuracy, and the persistence model baseline — are standard in the forecasting literature.
A Farmer, a Physicist, and a Supercomputer Walk Into a Weather Station
In 1854, a British naval officer named Robert FitzRoy — the same FitzRoy who had captained HMS Beagle during Darwin's famous voyage — was appointed to head Britain's new Meteorological Department. By 1861 he was issuing what he called "weather forecasts," a term he coined. He was ridiculed. The Times of London called his predictions unreliable. Scientists of the day argued that weather was too complex to predict.
FitzRoy's approach was simple: he gathered barometric pressure readings from telegraph stations across Britain, noticed that storms tended to follow dropping pressure, and issued warnings when the pattern appeared. His model was primitive — essentially "if pressure drops fast, expect bad weather." It wasn't right all the time. But it was useful enough to save lives at sea.
This is a story about model complexity, and it perfectly illustrates the tension at the heart of Chapter 25: the tradeoff between simplicity and accuracy, between understanding and prediction, between a model that's wrong in understandable ways and one that's wrong in mysterious ways.
The Simplest Weather Model: Persistence
The simplest weather model in the world is called the persistence model: predict that tomorrow's weather will be the same as today's. If it's sunny today, predict sun tomorrow. If it's 72 degrees today, predict 72 degrees tomorrow.
This sounds laughably simple. But it's actually a surprisingly strong baseline.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Simulated daily temperatures for a city (one year)
np.random.seed(42)
days = np.arange(365)
# Base temperature: seasonal cycle
base_temp = 55 + 25 * np.sin(2 * np.pi * (days - 80) / 365)
# Daily variation (autocorrelated — today's weather
# is similar to yesterday's)
noise = np.zeros(365)
noise[0] = np.random.normal(0, 5)
for i in range(1, 365):
    noise[i] = 0.7 * noise[i-1] + np.random.normal(0, 3)
temperature = base_temp + noise
# The persistence model: tomorrow = today
persistence_pred = temperature[:-1]  # today's temp, used as tomorrow's forecast
actual_tomorrow = temperature[1:]    # what tomorrow actually turned out to be
# How good is it?
persistence_mae = np.abs(actual_tomorrow - persistence_pred).mean()
print(f"Persistence model MAE: {persistence_mae:.1f} degrees")
When you run this, you'll find that the persistence model is off by only a few degrees on average. Not bad for "just predict today's temperature again."
This is why baselines matter. Any sophisticated weather model must beat the persistence model, or it's adding complexity for no benefit.
A Slightly Better Model: Climatology
The next step up is the climatology model: predict that tomorrow's temperature will be the historical average for that date. If January 15th has averaged 35 degrees over the past 30 years, predict 35 degrees.
# Climatology baseline: historical average for each day
climatology_pred = base_temp[1:]  # seasonal average (known exactly in this simulation)
climatology_mae = np.abs(actual_tomorrow - climatology_pred).mean()
print(f"Climatology model MAE: {climatology_mae:.1f} degrees")
print(f"Persistence model MAE: {persistence_mae:.1f} degrees")
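In this simulation the seasonal curve base_temp is known exactly, so the code above can use it directly. With real data, climatology has to be estimated from history. A minimal sketch of how that might look with pandas (the 30-year series here is simulated purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulate 30 years of daily temperatures with the same seasonal shape
dates = pd.date_range("1990-01-01", periods=30 * 365, freq="D")
doy = dates.dayofyear.to_numpy()
temps = 55 + 25 * np.sin(2 * np.pi * (doy - 80) / 365) + rng.normal(0, 4, len(dates))
history = pd.Series(temps, index=dates)

# Climatology: the historical average temperature for each day of year
climatology = history.groupby(history.index.dayofyear).mean()

# Forecast any future date by looking up its day-of-year average
print(f"Climatology for Jan 15: {climatology.loc[15]:.1f} degrees")
```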
Interestingly, the persistence model often beats the climatology model for short-term forecasts (1-2 days), while climatology is better for long-range forecasts (weeks ahead). Why? Because tomorrow's weather is highly correlated with today's weather (persistence captures this), but next month's weather is more related to the seasonal average (climatology captures this).
This is your first lesson in model selection: the best model depends on the task. There is no universally best model.
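You can check this crossover directly on the simulated series. The sketch below re-creates the data so it stands alone (it uses NumPy's newer default_rng, so the exact numbers will differ slightly from the earlier cells):

```python
import numpy as np

rng = np.random.default_rng(42)
days = np.arange(365)
base_temp = 55 + 25 * np.sin(2 * np.pi * (days - 80) / 365)
noise = np.zeros(365)
noise[0] = rng.normal(0, 5)
for i in range(1, 365):
    noise[i] = 0.7 * noise[i - 1] + rng.normal(0, 3)
temperature = base_temp + noise

# Compare persistence vs climatology at several forecast horizons
results = {}
for k in [1, 3, 7, 14, 30]:
    actual = temperature[k:]
    persistence = temperature[:-k]   # the temperature k days earlier
    climatology = base_temp[k:]      # the seasonal average for the target day
    results[k] = (np.abs(actual - persistence).mean(),
                  np.abs(actual - climatology).mean())
    print(f"{k:2d}-day horizon: persistence MAE {results[k][0]:4.1f}, "
          f"climatology MAE {results[k][1]:4.1f}")
```

Persistence should win at the shortest horizon and lose badly a month out, while climatology's error stays roughly flat regardless of horizon.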
The First Scientific Models: Linear Relationships
FitzRoy and his successors noticed that weather variables are related to each other. Pressure drops before storms. Wind speed increases with pressure gradients. Temperature depends on cloud cover, latitude, and season.
These observations led to the first quantitative weather models — essentially linear relationships:
# A simple linear model: predict tomorrow's temperature
# from today's temperature and today's pressure change
np.random.seed(42)
# Simulated pressure changes: in this toy world, pressure falls
# ahead of a coming temperature change (a leading indicator)
pressure_change = -0.3 * np.diff(temperature) + np.random.normal(0, 2, 364)
# Features: today's temp + pressure change
today_temp = temperature[:-1]
# Simple linear prediction
# tomorrow = a * today + b * pressure_change + c
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
X = np.column_stack([today_temp, pressure_change])
y = actual_tomorrow
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
linear_model = LinearRegression().fit(X_train, y_train)
linear_pred = linear_model.predict(X_test)
linear_mae = np.abs(y_test - linear_pred).mean()
print(f"Linear model MAE: {linear_mae:.1f} degrees")
This simple two-feature linear model typically beats both the persistence and climatology baselines. It's using real information — today's temperature and how pressure is changing — to make a better prediction.
The Modern Approach: Massive Complexity
Today's weather models are among the most complex computational systems ever built. The Global Forecast System (GFS) divides the atmosphere into a three-dimensional grid with millions of cells, solves fluid dynamics equations for each cell, and runs on some of the world's most powerful supercomputers. It ingests observations of temperature, pressure, humidity, wind speed, and more, measured at many vertical levels across the globe.
These models are staggeringly good for short-term forecasts. A 3-day forecast today is as accurate as a 1-day forecast was in 1980. But they're not perfect, and they exhibit an interesting pattern that relates directly to our chapter's themes.
The Accuracy Curve: Complexity vs. Forecast Horizon
Here's the key insight. If you plot forecast accuracy against forecast horizon (how far ahead you're predicting), you see something revealing:
# Simulated accuracy by model complexity and forecast horizon
horizons = [1, 2, 3, 5, 7, 10, 14]
# Simple model (persistence): great at day 1, useless by day 7
simple_mae = [3.5, 6.0, 8.5, 12.0, 14.0, 15.0, 15.5]
# Complex model (physics-based): better at all horizons
# but the improvement narrows at longer horizons
complex_mae = [2.0, 3.5, 5.0, 8.0, 11.0, 13.5, 15.0]
# Climatology: constant (doesn't depend on horizon)
climate_mae = [10.0] * 7
plt.figure(figsize=(10, 6))
plt.plot(horizons, simple_mae, 'o-', label='Persistence (simple)')
plt.plot(horizons, complex_mae, 's-', label='Physics model (complex)')
plt.plot(horizons, climate_mae, '^--', label='Climatology (baseline)')
plt.xlabel('Forecast Horizon (days)')
plt.ylabel('Mean Absolute Error (degrees)')
plt.title('Model Complexity vs. Forecast Horizon')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Three patterns emerge:
- At 1 day: The complex model is better than the simple one, but not by a huge margin. The simple persistence model is already pretty good.
- At 7-14 days: All models converge toward climatology. No matter how complex your model, you can't reliably predict the weather two weeks out. The atmosphere is chaotic — tiny errors compound exponentially.
- The sweet spot: The complex model adds the most value in the 2-5 day range, where it significantly outperforms the simple model but hasn't yet been defeated by chaos.
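The claim that tiny errors compound exponentially can be demonstrated with any chaotic system. A toy illustration using the logistic map (a textbook chaotic system, not a weather model):

```python
import numpy as np

def logistic_trajectory(x0, steps):
    """Iterate the chaotic logistic map x -> 4x(1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(4 * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

# Two trajectories whose starting points differ by one part in a billion
a = logistic_trajectory(0.400000000, 50)
b = logistic_trajectory(0.400000001, 50)

for step in [0, 10, 25, 50]:
    print(f"step {step:2d}: difference = {abs(a[step] - b[step]):.2e}")
```

The gap roughly doubles each step until it is as large as the signal itself, which is the analogue of a weather forecast degrading to no better than climatology.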
Connecting to the Bias-Variance Tradeoff
This weather example beautifully illustrates the bias-variance tradeoff:
The persistence model has high bias (it assumes tomorrow equals today, ignoring real atmospheric dynamics) but low variance (its predictions are very stable — they never go haywire). It's Student B from our chapter, who learns general patterns but misses specifics.
An overfit weather model — one that tries to predict based on dozens of local, short-lived atmospheric features — might have low bias (it captures real phenomena) but high variance (its predictions are unstable, sensitive to small measurement errors). On days when its inputs are accurate, it's brilliant. On days when a sensor is slightly off, it predicts a blizzard in July.
The best operational weather models balance bias and variance through careful physics-based constraints. They don't just fit data — they encode the laws of physics, which act as a regularizing force. The physics prevents the model from making physically impossible predictions (variance control) while still being flexible enough to capture complex weather patterns (bias control).
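The tradeoff can be seen in miniature by fitting polynomials of different degrees to a sparsely sampled simulated seasonal curve. This is a constructed illustration, not an operational technique; the degrees and sample size are chosen to exaggerate the effect:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
days = np.arange(365)
true_curve = 55 + 25 * np.sin(2 * np.pi * (days - 80) / 365)
temps = true_curve + rng.normal(0, 4, 365)

# Train on a sparse sample of 40 days, test on the remaining 325
train = np.zeros(365, dtype=bool)
train[rng.choice(365, size=40, replace=False)] = True

results = {}
for degree in [1, 4, 25]:
    fit = Polynomial.fit(days[train], temps[train], degree)
    pred = fit(days)
    train_mae = np.abs(pred[train] - temps[train]).mean()
    test_mae = np.abs(pred[~train] - temps[~train]).mean()
    results[degree] = (train_mae, test_mae)
    print(f"degree {degree:2d}: train MAE {train_mae:6.1f}, test MAE {test_mae:6.1f}")
```

Degree 1 underfits: high bias, so both errors are large. Degree 25 fits the training days almost perfectly but swings wildly between them: low bias, high variance. Degree 4 sits in between, playing the role that physics constraints play in an operational model.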
The Forecaster's Confession
Professional weather forecasters will tell you something surprising: for many everyday decisions, the persistence model is good enough. Should you bring an umbrella today? Look outside. Is it raining? Then yes, probably bring one. That's the persistence model in action.
The sophisticated models earn their keep in specific situations:
- Severe weather warnings: You can't use persistence to predict a hurricane that's still forming
- Multi-day planning: When you need to know conditions 3-5 days out
- Aviation and shipping: Where precise forecasts save lives and millions of dollars
The lesson for data science is profound: the right model depends on the decision you're making. A simple model might be perfectly adequate for low-stakes decisions, while a complex model is necessary for high-stakes ones. The added accuracy of the complex model has to justify its added cost (computational resources, interpretation difficulty, maintenance burden).
What This Teaches Us About Modeling
- Always start with a baseline. The persistence model is the weather forecaster's baseline, and it's surprisingly hard to beat. If your sophisticated model can't outperform "predict the same as yesterday," you've wasted effort.
- Complexity has diminishing returns. Going from persistence to a simple linear model is a big improvement. Going from a simple linear model to a physics-based supercomputer model is another improvement, but the marginal gains are smaller. And beyond about 10 days, no amount of complexity helps.
- The irreducible error is real. Some things are genuinely unpredictable. The weather 30 days from now cannot be accurately predicted — not because our models are bad, but because the atmosphere is chaotic. Recognizing the limit of predictability is as important as pushing toward it.
- Model choice depends on the decision. Don't build a supercomputer model when the persistence model will do. But don't use the persistence model when lives depend on 3-day severe weather warnings.
- Simple models help you understand; complex models help you predict. The persistence model teaches you about autocorrelation in weather. The linear model teaches you about the relationship between pressure and temperature. The physics model makes the best predictions but is a black box to most users. These are different kinds of value.
Discussion Questions
1. FitzRoy was mocked for his "weather forecasts" because scientists believed weather was too complex to predict. What modern prediction tasks do people believe are too complex? Are they right?
2. The persistence model works well because weather is autocorrelated — today's weather is similar to yesterday's. What other domains have strong autocorrelation? Where would the persistence baseline fail?
3. Weather models encode the laws of physics as constraints. What kinds of "domain knowledge constraints" might you build into a model predicting vaccination rates?
4. At what forecast horizon do the simple and complex models converge in accuracy? What does this tell you about the value of complexity?
Key Takeaways from This Case Study
- The best model depends on the task, the decision, and the time horizon
- Baselines (even absurdly simple ones) set the standard that any useful model must beat
- Complexity has diminishing returns and eventually hits the wall of irreducible noise
- Domain knowledge (like physics) can constrain models to prevent overfitting
- The bias-variance tradeoff shows up everywhere — even in weather forecasting