Case Study 2: Measuring Climate Model Disagreement with Information Theory

Context

The Pacific Climate Research Consortium (PCRC) faces a communication challenge. They have access to projections from 12 CMIP6 global climate models, each providing probability distributions over future regional temperature anomalies for the Pacific Northwest under the SSP2-4.5 emissions scenario (a "middle of the road" pathway). The projections span from 2050 to 2100 and are discretized into temperature bins.

The problem: these models do not agree. For 2050, the models cluster reasonably tightly around a 1.5-2.5 degree Celsius warming range. For 2100, the spread is enormous — from 1.8 to 5.2 degrees above the 1980-2010 baseline. When a state legislator asks "how much will it warm by 2100?", the consortium needs to communicate not just a central estimate but the nature and magnitude of disagreement among the models.

Information theory provides the right language. Entropy quantifies total uncertainty. The decomposition into aleatoric and epistemic components separates irreducible climate variability from model disagreement. KL divergence identifies which models are "outliers" relative to the ensemble consensus.
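Concretely, with $p_m$ the distribution from model $m$ among $M$ models and $\bar{p} = \frac{1}{M}\sum_{m} p_m$ the ensemble-mean distribution, the decomposition used throughout this case study is:

```latex
\underbrace{H(\bar{p})}_{\text{total}}
  \;=\; \underbrace{\frac{1}{M}\sum_{m=1}^{M} H(p_m)}_{\text{aleatoric}}
  \;+\; \underbrace{\frac{1}{M}\sum_{m=1}^{M} D_{\mathrm{KL}}\!\left(p_m \,\middle\|\, \bar{p}\right)}_{\text{epistemic}}
```

The epistemic term is the average KL divergence of the models from their mean (a generalized Jensen-Shannon divergence), which is why the same KL quantity doubles as the outlier diagnostic.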

The Data

Each of the 12 CMIP6 models produces a probability distribution over temperature anomaly bins. We work with two time horizons: 2050 and 2100.

import numpy as np
from typing import Dict, List, Tuple

def generate_cmip6_ensemble(
    n_models: int = 12,
    n_bins: int = 25,
    temp_range: Tuple[float, float] = (-1.0, 7.0),
    seed: int = 42
) -> Dict:
    """Generate synthetic CMIP6-like ensemble distributions.

    Each model's distribution is a discretized Gaussian with varying
    mean (climate sensitivity) and spread (internal variability).

    For 2050: models are in closer agreement (1.5-2.5C range).
    For 2100: models diverge substantially (1.8-5.2C range).

    Returns:
        Dict with '2050' and '2100' keys mapping to arrays of shape
        (n_models, n_bins), plus bin centers, bin edges, model names,
        and the sampled distribution means for each horizon.
    """
    rng = np.random.RandomState(seed)
    bin_edges = np.linspace(temp_range[0], temp_range[1], n_bins + 1)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    bin_width = bin_edges[1] - bin_edges[0]

    def discretize_gaussian(mu: float, sigma: float) -> np.ndarray:
        """Discretize a Gaussian into probability bins."""
        from scipy.stats import norm
        probs = np.array([
            norm.cdf(bin_edges[i+1], mu, sigma) - norm.cdf(bin_edges[i], mu, sigma)
            for i in range(n_bins)
        ])
        probs = np.maximum(probs, 1e-10)  # Avoid exact zeros
        return probs / probs.sum()

    # 2050 projections: moderate spread
    means_2050 = rng.uniform(1.5, 2.5, n_models)
    sigmas_2050 = rng.uniform(0.5, 1.0, n_models)
    models_2050 = np.array([
        discretize_gaussian(m, s) for m, s in zip(means_2050, sigmas_2050)
    ])

    # 2100 projections: wide spread reflecting climate sensitivity uncertainty
    means_2100 = rng.uniform(1.8, 5.2, n_models)
    sigmas_2100 = rng.uniform(0.6, 1.5, n_models)
    models_2100 = np.array([
        discretize_gaussian(m, s) for m, s in zip(means_2100, sigmas_2100)
    ])

    model_names = [
        "CESM2", "GFDL-ESM4", "EC-Earth3", "MPI-ESM1-2-HR",
        "CNRM-ESM2-1", "UKESM1-0-LL", "NorESM2-LM", "ACCESS-ESM1-5",
        "INM-CM5-0", "MIROC6", "IPSL-CM6A-LR", "CanESM5"
    ]

    return {
        "2050": models_2050,
        "2100": models_2100,
        "bin_centers": bin_centers,
        "bin_edges": bin_edges,
        "model_names": model_names,
        "means_2050": means_2050,
        "means_2100": means_2100,
    }

ensemble = generate_cmip6_ensemble()
print(f"Models: {len(ensemble['model_names'])}")
print(f"Temperature bins: {len(ensemble['bin_centers'])} "
      f"({ensemble['bin_edges'][0]:.1f} to {ensemble['bin_edges'][-1]:.1f} C)")
Models: 12
Temperature bins: 25 (-1.0 to 7.0 C)
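As a sanity check on distributions built this way, each model's expected warming can be read off its distribution as the probability-weighted average of the bin centers. A standalone sketch with a two-model toy ensemble (the means of 2.0 and 4.5 degrees C are assumed illustrative values; the arrays in the `ensemble` dict above behave identically):

```python
import numpy as np
from scipy.stats import norm

# Same binning as the case study: 25 bins over -1..7 C
bin_edges = np.linspace(-1.0, 7.0, 26)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

def discretize_gaussian(mu: float, sigma: float) -> np.ndarray:
    """Bin a Gaussian over the temperature range and renormalize."""
    probs = norm.cdf(bin_edges[1:], mu, sigma) - norm.cdf(bin_edges[:-1], mu, sigma)
    return probs / probs.sum()

models = np.array([discretize_gaussian(2.0, 0.7), discretize_gaussian(4.5, 1.2)])

# Expected warming per model: sum over bins of p(bin) * bin_center
expected = models @ bin_centers
print(expected)  # close to the input means, up to truncation at the bin range
```

The small downward shift for the warmer model comes from truncating its upper tail at the 7 C bin edge.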

Uncertainty Decomposition

We decompose the total predictive uncertainty into aleatoric (within-model, irreducible climate variability) and epistemic (between-model, reflecting structural disagreement) components.

def entropy(p: np.ndarray) -> float:
    """Entropy of a discrete distribution in nats."""
    p = np.asarray(p, dtype=np.float64)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]))


def uncertainty_decomposition(
    model_distributions: np.ndarray,
    model_names: List[str]
) -> Dict:
    """Full uncertainty decomposition of a climate model ensemble.

    Returns total, aleatoric, and epistemic uncertainty, plus
    per-model entropy and KL divergence from ensemble mean.
    """
    n_models = model_distributions.shape[0]
    ensemble_mean = model_distributions.mean(axis=0)

    # Total uncertainty
    total = entropy(ensemble_mean)

    # Per-model entropy
    model_entropies = np.array([entropy(model_distributions[m])
                                 for m in range(n_models)])

    # Aleatoric (average within-model uncertainty)
    aleatoric = model_entropies.mean()

    # Epistemic (between-model disagreement)
    epistemic = total - aleatoric

    # KL divergence from each model to the ensemble mean.
    # For the ensembles built above, q > 0 wherever p > 0 (each model
    # distribution is floored at 1e-10), so the ratio is well-defined.
    kl_to_ensemble = np.zeros(n_models)
    for m in range(n_models):
        p = model_distributions[m]
        q = ensemble_mean
        mask = p > 0
        kl_to_ensemble[m] = np.sum(p[mask] * np.log(p[mask] / q[mask]))

    return {
        "total_nats": total,
        "aleatoric_nats": aleatoric,
        "epistemic_nats": epistemic,
        "epistemic_fraction": epistemic / total if total > 0 else 0,
        "model_entropies": dict(zip(model_names, model_entropies)),
        "kl_to_ensemble": dict(zip(model_names, kl_to_ensemble)),
    }

# Analyze both time horizons
for horizon in ["2050", "2100"]:
    result = uncertainty_decomposition(
        ensemble[horizon], ensemble["model_names"]
    )

    print(f"\n{'='*60}")
    print(f"  Uncertainty Decomposition: {horizon} Projections")
    print(f"{'='*60}")
    print(f"  Total uncertainty:     {result['total_nats']:.4f} nats "
          f"({result['total_nats'] / np.log(2):.4f} bits)")
    print(f"  Aleatoric (within):    {result['aleatoric_nats']:.4f} nats")
    print(f"  Epistemic (between):   {result['epistemic_nats']:.4f} nats "
          f"({result['epistemic_fraction']:.1%} of total)")
    print()

    # Per-model analysis
    print(f"  {'Model':<20} {'H (nats)':>10} {'KL from ensemble':>18}")
    print(f"  {'-'*48}")

    # Sort by KL divergence (outliers first)
    sorted_models = sorted(result['kl_to_ensemble'].items(),
                            key=lambda x: x[1], reverse=True)
    for name, kl in sorted_models:
        h = result['model_entropies'][name]
        marker = "  <-- outlier" if kl > 0.15 else ""
        print(f"  {name:<20} {h:>10.4f} {kl:>18.4f}{marker}")
============================================================
  Uncertainty Decomposition: 2050 Projections
============================================================
  Total uncertainty:     1.9125 nats (2.7592 bits)
  Aleatoric (within):    1.8524 nats
  Epistemic (between):   0.0601 nats (3.1% of total)

  Model                  H (nats)   KL from ensemble
  ------------------------------------------------
  INM-CM5-0                1.6614             0.1098
  ACCESS-ESM1-5            1.9817             0.0947
  MIROC6                   1.7089             0.0870
  CanESM5                  1.9497             0.0662
  MPI-ESM1-2-HR            1.6872             0.0577
  NorESM2-LM               1.9698             0.0571
  EC-Earth3                1.8803             0.0372
  CNRM-ESM2-1              1.9539             0.0309
  CESM2                    1.7752             0.0265
  UKESM1-0-LL              2.0005             0.0262
  GFDL-ESM4                1.8853             0.0212
  IPSL-CM6A-LR             1.7754             0.0195

============================================================
  Uncertainty Decomposition: 2100 Projections
============================================================
  Total uncertainty:     2.5374 nats (3.6605 bits)
  Aleatoric (within):    1.9476 nats
  Epistemic (between):   0.5898 nats (23.2% of total)

  Model                  H (nats)   KL from ensemble
  ------------------------------------------------
  CanESM5                  2.1095             0.5832  <-- outlier
  INM-CM5-0                1.6423             0.4978  <-- outlier
  IPSL-CM6A-LR             1.7619             0.3698  <-- outlier
  ACCESS-ESM1-5            2.1476             0.2543  <-- outlier
  MIROC6                   1.8098             0.2216  <-- outlier
  UKESM1-0-LL              1.7851             0.1754  <-- outlier
  CESM2                    1.8372             0.1479
  CNRM-ESM2-1              2.1120             0.1318
  MPI-ESM1-2-HR            1.8723             0.1284
  EC-Earth3                2.1002             0.1138
  GFDL-ESM4                2.0816             0.0905
  NorESM2-LM               2.1120             0.0769

Key Findings

2050 projections: Only 3.1% of total uncertainty is epistemic. The models largely agree — most uncertainty comes from internal climate variability (weather noise, ocean-atmosphere coupling). The ensemble prediction is robust in the sense that no individual model is far from the consensus.

2100 projections: 23.2% of total uncertainty is epistemic — a nearly eightfold increase. Six models are flagged as "outliers" (KL divergence > 0.15 nats from the ensemble mean). The models fundamentally disagree about the long-term climate response, reflecting genuine scientific uncertainty about climate sensitivity, carbon cycle feedbacks, and ice sheet dynamics.

Communicating to Policymakers

This decomposition has direct policy implications. Consider a specific policy question: "What is the probability of exceeding 4 degrees C warming by 2100?"

def probability_exceeding_threshold(
    model_distributions: np.ndarray,
    bin_centers: np.ndarray,
    threshold: float,
    model_names: List[str]
) -> Dict:
    """Compute P(T > threshold) for each model and the ensemble.

    Args:
        model_distributions: Shape (n_models, n_bins).
        bin_centers: Center of each temperature bin.
        threshold: Temperature threshold in degrees C.
        model_names: Names of the models.

    Returns:
        Dict with per-model and ensemble exceedance probabilities.
    """
    # A bin counts as exceeding if its center lies above the threshold,
    # so threshold placement is resolved at bin-center precision
    above_threshold = bin_centers > threshold
    per_model = {
        name: model_distributions[i, above_threshold].sum()
        for i, name in enumerate(model_names)
    }
    ensemble_mean_dist = model_distributions.mean(axis=0)
    ensemble_prob = ensemble_mean_dist[above_threshold].sum()

    return {
        "per_model": per_model,
        "ensemble": ensemble_prob,
        "model_range": (min(per_model.values()), max(per_model.values())),
    }

result_4C = probability_exceeding_threshold(
    ensemble["2100"], ensemble["bin_centers"], 4.0, ensemble["model_names"]
)

print("Policy Question: P(warming > 4C by 2100)?")
print("=" * 50)
print(f"  Ensemble mean estimate: {result_4C['ensemble']:.1%}")
print(f"  Range across models:    {result_4C['model_range'][0]:.1%} "
      f"to {result_4C['model_range'][1]:.1%}")
print()
print("  Per-model breakdown:")
sorted_probs = sorted(result_4C["per_model"].items(),
                       key=lambda x: x[1], reverse=True)
for name, prob in sorted_probs:
    bar = "#" * int(prob * 40)
    print(f"    {name:<20} {prob:>6.1%}  {bar}")
Policy Question: P(warming > 4C by 2100)?
==================================================
  Ensemble mean estimate: 26.3%
  Range across models:    1.3% to 68.5%

  Per-model breakdown:
    CanESM5               68.5%  ###########################
    IPSL-CM6A-LR          43.0%  #################
    ACCESS-ESM1-5         38.3%  ###############
    UKESM1-0-LL           30.2%  ############
    MIROC6                23.2%  #########
    CESM2                 19.2%  #######
    CNRM-ESM2-1           18.7%  #######
    EC-Earth3             17.8%  #######
    NorESM2-LM            16.3%  ######
    MPI-ESM1-2-HR         14.1%  #####
    GFDL-ESM4             12.1%  ####
    INM-CM5-0              1.3%

The ensemble mean says 26.3%, but the range is enormous: from 1.3% (INM-CM5-0, a low-sensitivity model) to 68.5% (CanESM5, a high-sensitivity model). The KL divergence analysis already flagged these as outliers — now we see why. The policy-relevant answer depends critically on whether you trust the high-sensitivity models.
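A quick way to quantify how much any single model moves the headline number is a leave-one-out recomputation. Because the ensemble distribution is the arithmetic mean of the model distributions, the ensemble exceedance probability is just the mean of the per-model probabilities, so the check needs nothing beyond the table above (a sketch using the rounded table values, so the numbers are approximate):

```python
import numpy as np

# Per-model P(T > 4C), rounded values from the table above
probs = {
    "CanESM5": 0.685, "IPSL-CM6A-LR": 0.430, "ACCESS-ESM1-5": 0.383,
    "UKESM1-0-LL": 0.302, "MIROC6": 0.232, "CESM2": 0.192,
    "CNRM-ESM2-1": 0.187, "EC-Earth3": 0.178, "NorESM2-LM": 0.163,
    "MPI-ESM1-2-HR": 0.141, "GFDL-ESM4": 0.121, "INM-CM5-0": 0.013,
}
p = np.array(list(probs.values()))
full = p.mean()  # ensemble exceedance = mean of per-model exceedances

for name, v in probs.items():
    loo = (p.sum() - v) / (len(p) - 1)  # ensemble estimate with this model dropped
    print(f"{name:<15} dropped -> {loo:.1%} (shift {loo - full:+.1%})")
```

Dropping CanESM5 alone moves the headline estimate by roughly four percentage points, which makes the dependence on the high-sensitivity models explicit.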

Information-Theoretic Recommendations for PCRC

The analysis yields three actionable recommendations for the consortium:

1. Report epistemic fraction alongside central estimates. Saying "the ensemble mean projects 2.8 degrees C of warming" without noting that 23% of the total uncertainty is epistemic overstates how settled that estimate is. The epistemic fraction tells policymakers how much of the uncertainty reflects genuine scientific disagreement versus irreducible variability.

2. Flag models with high KL divergence from the ensemble mean. Models like CanESM5 (KL = 0.58 nats from ensemble) and INM-CM5-0 (KL = 0.50 nats) are statistical outliers. This does not mean they are wrong — CanESM5's high climate sensitivity may be physically justified. But their outsized influence on the ensemble mean should be transparently reported. Some ensemble weighting schemes down-weight models with high KL from observations.

3. For threshold questions, report the full distribution of model answers. The single number "26.3% probability of exceeding 4 degrees C" obscures a 67-percentage-point spread. A more honest presentation: "Seven of the 12 models estimate a 12-23% probability. Four models estimate 30-69%. One low-sensitivity model estimates about 1%. The range reflects unresolved questions about climate sensitivity."
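The weighting idea in recommendation 2 can be sketched in a few lines. This is an illustrative softmax-style scheme, not a published method: the exponential form, the temperature parameter `tau`, and the use of KL-to-ensemble-mean (real schemes typically measure KL from observations) are all assumptions.

```python
import numpy as np

def kl_weights(kl_values, tau: float = 0.25) -> np.ndarray:
    """Down-weight models with large KL from a reference distribution.

    tau controls how aggressively high-KL models are discounted;
    0.25 nats is an arbitrary illustrative choice.
    """
    w = np.exp(-np.asarray(kl_values, dtype=float) / tau)
    return w / w.sum()

# KL-to-ensemble values (nats) from the 2100 table, CanESM5 first
kl = np.array([0.5832, 0.4978, 0.3698, 0.2543, 0.2216, 0.1754,
               0.1479, 0.1318, 0.1284, 0.1138, 0.0905, 0.0769])
w = kl_weights(kl)
print(w.round(3))  # equal weighting would be 1/12 ~ 0.083 for every model
```

Under this toy scheme CanESM5's weight falls well below 1/12 while the near-consensus models rise above it; reweighting the model distributions by `w` before averaging would pull the ensemble mean toward the consensus.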

Research Insight: Partitioning projection uncertainty into internal variability and model disagreement is standard practice in the climate modeling community (Hawkins and Sutton, 2009), though it is usually done with variances rather than entropies. The entropy-based aleatoric-epistemic decomposition used here maps directly to the mutual information $I(\text{model index}; \text{prediction})$, which measures how much knowing which model generated a prediction reduces uncertainty about the prediction itself. When this MI is high, model choice matters, a signal that the science is not settled. When it is low, the models effectively agree, and the remaining uncertainty is inherent to the climate system.
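The mutual-information identity is easy to verify numerically. With a uniform prior over models, the joint distribution is $P(m, t) = \frac{1}{M} p_m(t)$, and $I(M;T)$ computed from that joint equals the epistemic term exactly. A standalone check on a small random ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)
models = rng.dirichlet(np.ones(10), size=4)  # 4 models, 10 outcome bins
mean = models.mean(axis=0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

epistemic = entropy(mean) - np.mean([entropy(p) for p in models])

# Mutual information from the joint P(m, t) = (1/M) * p_m(t)
joint = models / models.shape[0]
p_m = joint.sum(axis=1)  # marginal over models: uniform 1/M
p_t = joint.sum(axis=0)  # marginal over outcomes: the ensemble mean
mi = np.sum(joint * np.log(joint / (p_m[:, None] * p_t[None, :])))

print(np.isclose(epistemic, mi))  # True
```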

Common Misconception: A common policy error is interpreting the ensemble mean as a "consensus" prediction. The ensemble mean is the arithmetic average of the model distributions; it is not itself a physically realizable climate trajectory. When the ensemble is bimodal (some models predicting 2C, others 5C), temperatures near the 3.5C average may receive low probability from every individual model. Entropy analysis reveals this: high epistemic uncertainty is a signal that the ensemble mean may be a poor summary.
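The bimodal failure mode is easy to demonstrate with assumed toy numbers: half the models centered near 2C and half near 5C, with a sigma of 0.4 and the same binning as above. The bin nearest the 3.5C ensemble-mean temperature carries only a small fraction of the probability either mode does:

```python
import numpy as np
from scipy.stats import norm

bin_edges = np.linspace(-1.0, 7.0, 26)
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

def disc(mu: float, sigma: float = 0.4) -> np.ndarray:
    """Bin a Gaussian over the temperature range and renormalize."""
    p = norm.cdf(bin_edges[1:], mu, sigma) - norm.cdf(bin_edges[:-1], mu, sigma)
    return p / p.sum()

# Bimodal toy ensemble: two low-sensitivity and two high-sensitivity models
models = np.array([disc(2.0), disc(2.1), disc(4.9), disc(5.0)])
mean = models.mean(axis=0)

mid = np.argmin(np.abs(centers - 3.5))  # bin closest to the ensemble-mean temperature
print(f"P(bin nearest 3.5C) = {mean[mid]:.4f}")
print(f"P(largest bin)      = {mean.max():.4f}")
```

The ensemble-mean distribution puts over a hundred times more mass on the modes than on the temperature it nominally "predicts".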