Case Study 2: Detecting Market Regimes in Election Markets
Overview
Election markets are among the longest-running and most closely watched prediction markets. A presidential election market might trade for over a year, during which it passes through qualitatively different phases: calm periods where the price drifts slowly, volatile periods around debates and major events, and a final convergence phase as election day approaches.
In this case study, we apply regime detection techniques to a synthetic but realistic election market time series. We will:
- Generate a realistic 365-day election market with embedded regimes.
- Apply Hidden Markov Model (HMM) regime detection.
- Apply change-point detection using the PELT algorithm.
- Correlate detected regimes with simulated events.
- Characterize the statistical properties of each regime.
Step 1: Generating a Realistic Election Market
We construct a synthetic 365-day election market that mimics the behavior of a real presidential election market. The market passes through four distinct phases:
- Phase 1 (Days 1-100): Stable uncertainty. The market trades in a narrow range around 0.45-0.55 with low volatility. No major information events.
- Phase 2 (Days 101-180): Gradual trend. Improving polls push the price from 0.50 to 0.60. Moderate volatility.
- Phase 3 (Days 181-300): High volatility. The debate season introduces large price swings. Several discrete jumps of 5-10 points. Volatility doubles.
- Phase 4 (Days 301-365): Final convergence. Polling stabilizes, and the market converges to 0.65 as election day approaches. Volatility decreases.
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

np.random.seed(2024)

n_days = 365
dates = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(n_days)]

# Define regimes
prices = np.zeros(n_days)
volumes = np.zeros(n_days)
true_regimes = np.zeros(n_days, dtype=int)
prices[0] = 0.50

for t in range(1, n_days):
    if t < 100:
        # Phase 1: Stable uncertainty
        true_regimes[t] = 0
        drift = 0.0001
        vol = 0.008
        base_vol = 50000
    elif t < 180:
        # Phase 2: Gradual trend
        true_regimes[t] = 1
        drift = 0.0012
        vol = 0.012
        base_vol = 80000
    elif t < 300:
        # Phase 3: High volatility (debates, events)
        true_regimes[t] = 2
        drift = 0.0002
        vol = 0.025
        base_vol = 150000
    else:
        # Phase 4: Convergence
        true_regimes[t] = 3
        drift = 0.0005
        vol = 0.010
        base_vol = 200000

    change = drift + np.random.randn() * vol
    prices[t] = np.clip(prices[t-1] + change, 0.01, 0.99)
    volumes[t] = max(1000, base_vol * np.random.lognormal(0, 0.5))

# Inject specific events
events = {
    130: ('Major poll release', 0.03),
    185: ('First debate', -0.06),
    210: ('VP debate', 0.04),
    240: ('October surprise', -0.08),
    260: ('Second debate', 0.05),
    280: ('Late poll shift', 0.03),
    340: ('Final polls', 0.02),
}

for day, (name, impact) in events.items():
    if day < n_days:
        prices[day] = np.clip(prices[day] + impact, 0.01, 0.99)
        # Adjust subsequent prices so the event has a lasting effect
        for t in range(day + 1, n_days):
            prices[t] = np.clip(prices[t] + impact, 0.01, 0.99)
        volumes[day] *= 5  # Volume spike at events

# Create DataFrame
df = pd.DataFrame({
    'date': dates,
    'price': prices,
    'volume': volumes,
    'true_regime': true_regimes,
    'price_change': np.concatenate([[0], np.diff(prices)])
})
Step 2: Initial EDA of the Election Market
Price Time Series
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Price plot
colors = ['#2196F3', '#4CAF50', '#FF5722', '#9C27B0']
regime_names = ['Stable', 'Trending', 'Volatile', 'Convergence']
for regime in range(4):
    mask = df['true_regime'] == regime
    axes[0].scatter(df[mask]['date'], df[mask]['price'],
                    c=colors[regime], s=3, label=regime_names[regime])
axes[0].set_ylabel('Price (Probability)')
axes[0].set_ylim(0, 1)
axes[0].set_title('Election Market Price with True Regimes')
axes[0].legend(loc='upper left')

# Mark events
for day, (name, _) in events.items():
    axes[0].axvline(x=dates[day], color='gray', alpha=0.3, linestyle='--')
    axes[0].annotate(name, xy=(dates[day], 0.95), fontsize=7,
                     rotation=45, ha='right')

# Volume plot
axes[1].bar(df['date'], df['volume'], width=1, color='orange', alpha=0.7)
axes[1].set_ylabel('Volume')
axes[1].set_title('Daily Volume')

# Price changes
axes[2].bar(df['date'], df['price_change'], width=1,
            color=np.where(df['price_change'] >= 0, 'green', 'red'), alpha=0.7)
axes[2].set_ylabel('Price Change')
axes[2].set_title('Daily Price Changes')
axes[2].set_xlabel('Date')

plt.tight_layout()
plt.savefig('election_market_overview.png', dpi=150)
plt.show()
Summary Statistics by True Regime
regime_stats = df.groupby('true_regime').agg(
    mean_price=('price', 'mean'),
    std_price_change=('price_change', 'std'),
    mean_volume=('volume', 'mean'),
    max_abs_change=('price_change', lambda x: x.abs().max()),
    n_days=('date', 'count'),
).rename(index={0: 'Stable', 1: 'Trending', 2: 'Volatile', 3: 'Convergence'})

print("=== Regime Statistics (True Regimes) ===")
print(regime_stats.round(4).to_string())
Key observations from the true regime statistics:
- The Volatile regime (Phase 3) has roughly 2-3x the price change standard deviation of the Stable regime.
- Volume increases monotonically across regimes, with the Convergence phase showing the highest average volume (reflecting increased interest as election day approaches).
- The largest single-day moves occur during the Volatile regime, coinciding with the injected debate events.
Step 3: HMM Regime Detection
We now fit 2-state and 3-state Gaussian HMMs to the daily price changes, pretending we do not know the true regimes.
from hmmlearn import hmm

# Prepare data: daily price changes (drop the placeholder first value)
price_changes = df['price_change'].values[1:].reshape(-1, 1)

# Fit 2-state HMM
model_2 = hmm.GaussianHMM(n_components=2, covariance_type='full',
                          n_iter=200, random_state=42)
model_2.fit(price_changes)
states_2 = model_2.predict(price_changes)

# Fit 3-state HMM
model_3 = hmm.GaussianHMM(n_components=3, covariance_type='full',
                          n_iter=200, random_state=42)
model_3.fit(price_changes)
states_3 = model_3.predict(price_changes)

# Compute BIC for model selection
def compute_bic(model, data):
    # Free parameters: transitions k(k-1), initial distribution k-1,
    # means k, variances k  ->  k^2 + 2k - 1
    n_params = model.n_components ** 2 + 2 * model.n_components - 1
    log_likelihood = model.score(data)  # total log-likelihood of the data
    n = len(data)
    return -2 * log_likelihood + n_params * np.log(n)

bic_2 = compute_bic(model_2, price_changes)
bic_3 = compute_bic(model_3, price_changes)
print(f"BIC (2 states): {bic_2:.1f}")
print(f"BIC (3 states): {bic_3:.1f}")
print(f"Selected model: {'2 states' if bic_2 < bic_3 else '3 states'}")
Visualizing HMM Results
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# 2-state HMM
for state in range(2):
    mask = states_2 == state
    axes[0].scatter(df['date'].values[1:][mask], df['price'].values[1:][mask],
                    s=3, label=f'State {state}')
axes[0].set_ylabel('Price')
axes[0].set_title('2-State HMM Regime Detection')
axes[0].legend()
axes[0].set_ylim(0, 1)

# 3-state HMM
for state in range(3):
    mask = states_3 == state
    axes[1].scatter(df['date'].values[1:][mask], df['price'].values[1:][mask],
                    s=3, label=f'State {state}')
axes[1].set_ylabel('Price')
axes[1].set_title('3-State HMM Regime Detection')
axes[1].legend()
axes[1].set_ylim(0, 1)

plt.tight_layout()
plt.savefig('hmm_regimes.png', dpi=150)
plt.show()
HMM Regime Characterization
print("\n=== 2-State HMM Parameters ===")
for i in range(2):
print(f"State {i}: mean={model_2.means_[i][0]:.5f}, "
f"std={np.sqrt(model_2.covars_[i][0][0]):.5f}")
print(f"\nTransition matrix:\n{model_2.transmat_.round(3)}")
print("\n=== 3-State HMM Parameters ===")
for i in range(3):
print(f"State {i}: mean={model_3.means_[i][0]:.5f}, "
f"std={np.sqrt(model_3.covars_[i][0][0]):.5f}")
print(f"\nTransition matrix:\n{model_3.transmat_.round(3)}")
Analysis:
The 2-state HMM typically identifies a "calm" state (low variance) and a "volatile" state (high variance). This corresponds roughly to the distinction between Phases 1/2/4 (lower volatility) and Phase 3 (higher volatility) in the true regime structure.
The 3-state HMM can potentially distinguish between the stable, trending, and volatile regimes, though the trending and convergence phases may be merged since they have similar volatility levels but different drift rates.
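Because the data here is synthetic, we can check how well the detected states line up with the true regimes. The short sketch below is an addition to the case study code and assumes scikit-learn is available; it cross-tabulates the 3-state assignments against the true labels and reports an adjusted Rand index for both models.

# Compare detected states with the known true regimes (possible only because
# the data is synthetic). Rows: true regime, columns: 3-state HMM state.
from sklearn.metrics import adjusted_rand_score

true_aligned = df['true_regime'].values[1:]  # align with price_changes (day 1 onward)

print(pd.crosstab(pd.Series(true_aligned, name='true_regime'),
                  pd.Series(states_3, name='hmm_state_3')))
print(f"ARI, 2-state HMM vs. true regimes: {adjusted_rand_score(true_aligned, states_2):.3f}")
print(f"ARI, 3-state HMM vs. true regimes: {adjusted_rand_score(true_aligned, states_3):.3f}")

State labels are arbitrary, so the cross-tabulation is needed to see which state absorbs which phase; the adjusted Rand index summarizes the overall agreement independently of labeling.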
Step 4: Change-Point Detection with PELT
import ruptures

# PELT with RBF kernel on the price-change series
algo = ruptures.Pelt(model="rbf").fit(price_changes)

# Try different penalty values
for pen in [1, 3, 5, 10, 20]:
    cps = algo.predict(pen=pen)
    print(f"Penalty={pen}: {len(cps)-1} change points at {cps[:-1]}")

# Use the selected penalty
penalty_value = 5  # Tuning parameter
change_points = algo.predict(pen=penalty_value)
# price_changes starts at day 1, so shift indices by one to express them as day numbers
change_point_days = [cp + 1 for cp in change_points[:-1]]

print(f"\nDetected change points (penalty={penalty_value}): {change_point_days}")
print(f"True change points: [100, 180, 300]")
Visualizing Change Points
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(df['date'], df['price'], linewidth=0.8, color='steelblue')

# True change points
for cp in [100, 180, 300]:
    ax.axvline(x=dates[cp], color='green', linestyle='--', alpha=0.7,
               label='True CP' if cp == 100 else '')

# Detected change points
for cp in change_point_days:
    if cp < len(dates):
        ax.axvline(x=dates[cp], color='red', linestyle='-', alpha=0.7,
                   label='Detected CP' if cp == change_point_days[0] else '')

ax.set_ylabel('Price')
ax.set_title('Change-Point Detection (PELT)')
ax.legend()
ax.set_ylim(0, 1)

plt.tight_layout()
plt.savefig('change_points.png', dpi=150)
plt.show()
Analysis:
The PELT algorithm identifies structural breaks in the statistical properties of the price change series. With an appropriate penalty parameter, it should detect change points near the true transition points (days 100, 180, and 300). However:
- The transition between Phase 1 (stable) and Phase 2 (trending) may be difficult to detect because the volatility change is moderate.
- The transition into Phase 3 (volatile) is usually well-detected due to the sharp increase in volatility.
- Individual events (debates, October surprise) may be detected as separate change points, which is technically correct but may over-segment the regimes.
The choice of penalty parameter is critical: too low a penalty produces many spurious change points; too high a penalty misses genuine transitions.
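One way to explore this sensitivity is to sweep the penalty and watch how the number of detected change points responds; a plateau suggests a range of penalties that give a stable segmentation. The sketch below is an illustrative addition, and the penalty grid is an arbitrary choice.

# Sweep the PELT penalty and count the detected change points at each value.
# A flat region in the resulting curve indicates a stable segmentation.
penalties = np.linspace(0.5, 30, 40)
n_cps = [len(algo.predict(pen=p)) - 1 for p in penalties]

plt.figure(figsize=(8, 3))
plt.plot(penalties, n_cps, marker='o', markersize=3)
plt.xlabel('Penalty')
plt.ylabel('Number of change points')
plt.title('PELT penalty sweep')
plt.tight_layout()
plt.show()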
Step 5: Correlating Regimes with Events
# For each detected regime, list the events that occurred within it
print("=== Events by Detected Regime ===")

# Using 2-state HMM
df_with_states = df.copy()
df_with_states['hmm_state_2'] = np.concatenate([[0], states_2])

for day, (name, impact) in events.items():
    state = df_with_states.loc[day, 'hmm_state_2']
    print(f"Day {day}: {name} (impact: {impact:+.2f}) -> HMM State: {state}")

# Event frequency by regime
for state in range(2):
    mask = df_with_states['hmm_state_2'] == state
    days_in_state = mask.sum()
    events_in_state = sum(1 for d in events if df_with_states.loc[d, 'hmm_state_2'] == state)
    print(f"\nState {state}: {days_in_state} days, {events_in_state} events, "
          f"event rate: {events_in_state/days_in_state:.4f} events/day")
Analysis:
Major events cluster within the volatile regime (the HMM state with higher variance). This is expected: the events cause the high volatility that the HMM detects. The relationship is not purely instantaneous, however: the volatile regime can persist for a few days after an event as the market digests the information.
The event rate (events per day) is higher in the volatile regime, confirming the intuitive relationship between information arrival and market volatility.
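To put a number on that persistence, the short sketch below (an addition, reusing `df_with_states` and the 2-state HMM from above) counts how many consecutive days the higher-variance state lasts starting from each event day.

# Identify the higher-variance state of the 2-state HMM, then count how long
# it persists, in consecutive days, starting from each event day.
volatile_state = int(np.argmax([model_2.covars_[i][0][0] for i in range(2)]))
hmm_states_full = df_with_states['hmm_state_2'].values

for day, (name, _) in events.items():
    run = 0
    while day + run < n_days and hmm_states_full[day + run] == volatile_state:
        run += 1
    print(f"Day {day} ({name}): volatile state persisted {run} consecutive days")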
Step 6: Detailed Regime Characterization
# Characterize each HMM-detected regime
print("=== Detailed Regime Characterization (2-State HMM) ===\n")

for state in range(2):
    mask = np.concatenate([[False], states_2 == state])
    state_data = df[mask]

    print(f"--- Regime {state} ---")
    print(f" Days in regime: {len(state_data)}")
    print(f" Price range: [{state_data['price'].min():.3f}, {state_data['price'].max():.3f}]")
    print(f" Mean price: {state_data['price'].mean():.3f}")
    print(f" Mean price change: {state_data['price_change'].mean():.5f}")
    print(f" Std price change: {state_data['price_change'].std():.5f}")
    print(f" Mean volume: {state_data['volume'].mean():,.0f}")
    print(f" Skewness of changes: {state_data['price_change'].skew():.3f}")
    print(f" Kurtosis of changes: {state_data['price_change'].kurtosis():.3f}")

    # Autocorrelation
    changes = state_data['price_change'].values
    if len(changes) > 10:
        acf_1 = pd.Series(changes).autocorr(lag=1)
        print(f" Autocorrelation (lag 1): {acf_1:.3f}")

    # Regime persistence
    state_durations = []
    current_dur = 0
    all_states = np.concatenate([[0], states_2])
    for s in all_states:
        if s == state:
            current_dur += 1
        else:
            if current_dur > 0:
                state_durations.append(current_dur)
                current_dur = 0
    if current_dur > 0:
        state_durations.append(current_dur)

    if state_durations:
        print(f" Number of episodes: {len(state_durations)}")
        print(f" Mean episode duration: {np.mean(state_durations):.1f} days")
        print(f" Max episode duration: {max(state_durations)} days")
    print()
Key characterization findings:
Calm regime:
- Lower standard deviation of price changes (approximately 0.008-0.012).
- Moderate volume.
- Near-zero autocorrelation (consistent with efficiency within the regime).
- Longer average episode duration (the market spends extended periods in calm mode).

Volatile regime:
- Higher standard deviation (approximately 0.020-0.030); a quick variance check is sketched after this list.
- Higher volume.
- Possible positive autocorrelation at lag 1 (suggesting information is being processed over multiple days).
- Shorter, more fragmented episodes (interspersed with brief calm periods).
- Higher kurtosis (extreme moves within the volatile regime are themselves unevenly distributed).
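As a quick check that the variance difference between the two detected regimes is statistically meaningful, the sketch below (an addition, using SciPy's Levene test) compares the price-change variances across the two HMM states.

# Levene's test for equal variances of price changes across the two HMM states.
from scipy import stats

changes_state0 = df['price_change'].values[1:][states_2 == 0]
changes_state1 = df['price_change'].values[1:][states_2 == 1]
stat, pval = stats.levene(changes_state0, changes_state1)
print(f"Levene test for equal variances: statistic={stat:.2f}, p-value={pval:.4g}")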
Step 7: Trading Implications of Regime Detection
Understanding regimes has direct trading implications:
In the calm regime:
- Market-making strategies are more profitable (spreads can be tighter because the risk of adverse selection is lower).
- Trend-following strategies are unprofitable (no persistent trend to follow).
- The primary risk is being caught off-guard by a regime transition.

In the volatile regime:
- Market-making is riskier (wider spreads are needed to compensate for information asymmetry).
- Trend-following may be profitable during the initial reaction to events.
- Risk management is critical; position sizes should be reduced (a simple sizing sketch follows this list).

At regime transitions:
- The transition from calm to volatile often corresponds to a specific event. Traders who anticipate which events will trigger volatility can pre-position.
- The transition from volatile back to calm represents a stabilization opportunity.
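To illustrate how regime information can feed into position sizing, the sketch below scales a hypothetical position by the ratio of a target daily volatility to the estimated volatility of the current HMM state, capped at full size. The target volatility, base position, and the rule itself are assumptions for illustration, not recommendations from the case study.

# Hypothetical volatility-targeting rule: scale position size by
# target_vol / regime_vol, capped at 1.0 (illustrative numbers only).
target_vol = 0.010       # assumed acceptable daily volatility
base_position = 1000.0   # assumed full position size (contracts)

state_vols = np.sqrt([model_2.covars_[i][0][0] for i in range(2)])

sizing = df.iloc[1:].copy()
sizing['hmm_state'] = states_2
sizing['scale'] = np.minimum(1.0, target_vol / state_vols[sizing['hmm_state'].values])
sizing['position'] = base_position * sizing['scale']

print(sizing.groupby('hmm_state')[['scale', 'position']].mean().round(3))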
Conclusions
- Regime detection successfully identifies distinct market phases in election markets. Both HMM and change-point methods can detect the major transitions between calm and volatile periods.
- The 2-state model (calm/volatile) captures the primary dynamic. While additional states can capture finer distinctions (trending vs. stable, convergence vs. early trading), the marginal gain is modest and the models become harder to interpret.
- Events cluster in the volatile regime, confirming the intuitive relationship between information arrival and market dynamics.
- Regime characterization provides actionable information for trading strategy selection, risk management, and position sizing.
- The choice of detection method matters. HMM provides smooth probabilistic regime assignments and models transitions explicitly. Change-point detection identifies specific breakpoints but does not model the generative process. Using both methods together provides the most robust picture.
- Sensitivity to tuning parameters (penalty in PELT, number of states in HMM) means that regime detection should be treated as a tool for understanding, not as a definitive classification. The boundaries between regimes are inherently fuzzy.