Case Study 2: Time Series Forecasting of Market Prices
Overview
In this case study, we apply ARIMA and GARCH models to prediction market price data to forecast short-term price movements. We evaluate forecast accuracy using walk-forward validation, compare model-based forecasts to naive benchmarks, and demonstrate how volatility forecasts from GARCH improve position sizing and risk management.
Background
Prediction market prices evolve over time as new information arrives and participants update their beliefs. While the efficient market hypothesis suggests that prices should follow a random walk (making future prices unpredictable), several empirical patterns have been documented in prediction markets:
- Momentum: Price changes in one direction tend to be followed by further changes in the same direction, especially over short horizons.
- Mean reversion: Extreme prices tend to revert toward moderate levels, especially when driven by temporary liquidity shocks.
- Volatility clustering: Periods of high volatility (large price swings) tend to persist, as do periods of low volatility.
- Calendar effects: Trading activity and price dynamics differ between weekdays and weekends, and around key information events.
Time series models can capture these patterns and produce short-term forecasts that are useful for market-making, mean-reversion trading, and risk management.
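As a quick illustration of how these patterns show up in data, the lag-1 autocorrelation of daily price changes gives a first-pass read: positive values suggest momentum, negative values suggest mean reversion, and autocorrelation in absolute changes proxies volatility clustering. A minimal sketch, assuming a pandas Series of daily prices called prices:
import pandas as pd

def pattern_snapshot(prices: pd.Series) -> dict:
    """First-pass checks for momentum, mean reversion, and vol clustering."""
    changes = prices.diff().dropna()
    return {
        # Positive lag-1 autocorrelation of changes suggests momentum;
        # negative suggests mean reversion.
        'change_autocorr_lag1': changes.autocorr(lag=1),
        # Autocorrelation of absolute changes proxies volatility clustering.
        'abs_change_autocorr_lag1': changes.abs().autocorr(lag=1),
    }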
Data Description
We work with daily prediction market prices for a binary contract over a 365-day period leading to resolution. The data includes:
- date: Trading date
- price: Closing price (implied probability, 0 to 1)
- volume: Number of contracts traded
- high: Highest price during the day
- low: Lowest price during the day
- n_trades: Number of individual trades
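With real data, the same schema would typically be loaded from a file rather than simulated; a minimal loading sketch (the file name market_prices.csv is a placeholder, not part of the case study):
import pandas as pd

# Hypothetical file with the columns listed above.
market_data = pd.read_csv('market_prices.csv', parse_dates=['date'],
                          index_col='date').sort_index()
assert market_data['price'].between(0, 1).all(), "prices must be probabilities"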
Step 1: Data Generation and Exploration
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
import warnings
warnings.filterwarnings('ignore')
# Generate realistic prediction market price data
np.random.seed(123)
n_days = 365
# True probability: starts at 0.5, evolves as a random walk with drift
true_prob = np.zeros(n_days)
true_prob[0] = 0.5
for t in range(1, n_days):
    # Slight drift toward resolution
    drift = 0.0005 * np.sign(true_prob[t-1] - 0.5)
    shock = np.random.normal(0, 0.015)
    # Occasional large information shocks
    if np.random.random() < 0.03:
        shock += np.random.normal(0, 0.05)
    true_prob[t] = true_prob[t-1] + drift + shock
    true_prob[t] = np.clip(true_prob[t], 0.02, 0.98)
# Market price: true probability with market microstructure noise
# Volatility increases near expiry
time_to_expiry = np.arange(n_days, 0, -1)
noise_scale = 0.01 + 0.005 * np.exp(-time_to_expiry / 30)
market_noise = np.random.normal(0, noise_scale)
price = true_prob + market_noise
price = np.clip(price, 0.02, 0.98)
# Volume: higher when price changes are large, lower on weekends
base_volume = 1000 + 500 * np.abs(np.diff(np.concatenate([[0.5], price]))) * 100
day_of_week = np.arange(n_days) % 7
weekend_factor = np.where((day_of_week == 5) | (day_of_week == 6), 0.3, 1.0)
volume = (base_volume * weekend_factor * np.random.lognormal(0, 0.3, n_days)).astype(int)
# High/Low prices
daily_range = np.abs(np.random.normal(0.02, 0.008, n_days))
high = price + daily_range / 2
low = price - daily_range / 2
high = np.clip(high, 0.02, 0.99)
low = np.clip(low, 0.01, 0.98)
# Create DataFrame
dates = pd.date_range(start='2024-01-01', periods=n_days, freq='D')
market_data = pd.DataFrame({
    'date': dates,
    'price': price,
    'volume': volume,
    'high': high,
    'low': low,
    'true_prob': true_prob,
    'days_to_expiry': time_to_expiry
}).set_index('date')
print("Dataset Summary:")
print(market_data.describe().round(4))
# Exploratory plots
fig, axes = plt.subplots(3, 1, figsize=(14, 10))
axes[0].plot(market_data.index, market_data['price'], label='Market Price')
axes[0].plot(market_data.index, market_data['true_prob'], '--', alpha=0.5,
             label='True Probability')
axes[0].set_title('Prediction Market Price Over Time')
axes[0].set_ylabel('Price')
axes[0].legend()
axes[1].bar(market_data.index, market_data['volume'], width=1, alpha=0.7)
axes[1].set_title('Trading Volume')
axes[1].set_ylabel('Contracts')
# Price changes
price_changes = market_data['price'].diff().dropna()
axes[2].plot(market_data.index[1:], price_changes)
axes[2].axhline(y=0, color='black', linewidth=0.5)
axes[2].set_title('Daily Price Changes')
axes[2].set_ylabel('Change')
plt.tight_layout()
plt.savefig('market_data_exploration.png', dpi=150)
plt.show()
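Before modeling, it is worth confirming the calendar effect described in the background. A quick, illustrative check of mean volume by day of week on the synthetic data (dates start on a Monday, so dayofweek 5 and 6 are the weekend):
# Mean volume by weekday; the synthetic weekend factor should show up
# as markedly lower volume on Saturday and Sunday.
weekday_volume = market_data['volume'].groupby(market_data.index.dayofweek).mean()
weekday_volume.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(weekday_volume.round(0))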
Step 2: Stationarity Analysis
# Test stationarity of price levels and returns
print("=== Stationarity Tests ===\n")
# Price levels
adf_level = adfuller(market_data['price'], autolag='AIC')
print(f"Price Levels:")
print(f" ADF Statistic: {adf_level[0]:.4f}")
print(f" p-value: {adf_level[1]:.4f}")
print(f" Conclusion: {'Stationary' if adf_level[1] < 0.05 else 'Non-stationary'}")
# Price changes
price_changes = market_data['price'].diff().dropna()
adf_diff = adfuller(price_changes, autolag='AIC')
print(f"\nPrice Changes (First Difference):")
print(f" ADF Statistic: {adf_diff[0]:.4f}")
print(f" p-value: {adf_diff[1]:.4f}")
print(f" Conclusion: {'Stationary' if adf_diff[1] < 0.05 else 'Non-stationary'}")
# Log-odds transform
log_odds = np.log(market_data['price'] / (1 - market_data['price']))
log_odds_changes = log_odds.diff().dropna()
adf_logodds = adfuller(log_odds_changes, autolag='AIC')
print(f"\nLog-Odds Changes:")
print(f" ADF Statistic: {adf_logodds[0]:.4f}")
print(f" p-value: {adf_logodds[1]:.4f}")
print(f" Conclusion: {'Stationary' if adf_logodds[1] < 0.05 else 'Non-stationary'}")
# ACF and PACF of price changes
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_acf(price_changes, ax=axes[0], lags=30, title='ACF of Price Changes')
plot_pacf(price_changes, ax=axes[1], lags=30, title='PACF of Price Changes')
plt.tight_layout()
plt.savefig('acf_pacf_analysis.png', dpi=150)
plt.show()
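The ADF test takes a unit root as its null hypothesis; the KPSS test reverses the null (stationarity), so agreement between the two gives a stronger conclusion. A complementary sketch using the same series:
from statsmodels.tsa.stattools import kpss

# KPSS null hypothesis: the series IS stationary (opposite of ADF).
for name, series in [('Price levels', market_data['price']),
                     ('Price changes', price_changes)]:
    stat, pval, _, _ = kpss(series, regression='c', nlags='auto')
    print(f"{name}: KPSS stat={stat:.4f}, p-value={pval:.4f} "
          f"({'stationary' if pval > 0.05 else 'non-stationary'} at 5%)")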
Step 3: ARIMA Model Fitting
# Grid search for optimal ARIMA order
print("=== ARIMA Model Selection ===\n")
prices_array = market_data['price'].values
best_aic = np.inf
best_order = None
results_table = []
for p in range(5):
    for d in [1]:  # We know d=1 from the stationarity analysis
        for q in range(5):
            if p == 0 and q == 0:
                continue
            try:
                model = ARIMA(prices_array, order=(p, d, q))
                fitted = model.fit()
                results_table.append({
                    'order': f'({p},{d},{q})',
                    'aic': fitted.aic,
                    'bic': fitted.bic,
                    'n_params': p + q + 1  # +1 for the innovation variance
                })
                if fitted.aic < best_aic:
                    best_aic = fitted.aic
                    best_order = (p, d, q)
            except Exception:
                continue
results_df = pd.DataFrame(results_table).sort_values('aic')
print("Top 5 models by AIC:")
print(results_df.head(10).to_string(index=False))
print(f"\nBest model: ARIMA{best_order}")
# Fit best model
best_model = ARIMA(prices_array, order=best_order).fit()
print(f"\n{best_model.summary()}")
# Residual diagnostics
residuals = best_model.resid
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes[0, 0].plot(residuals)
axes[0, 0].set_title('Residuals Over Time')
axes[0, 0].axhline(y=0, color='black', linewidth=0.5)
axes[0, 1].hist(residuals, bins=30, density=True, alpha=0.7)
axes[0, 1].set_title('Residual Distribution')
plot_acf(residuals, ax=axes[1, 0], lags=20, title='Residual ACF')
# QQ plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Residual QQ Plot')
plt.tight_layout()
plt.savefig('arima_diagnostics.png', dpi=150)
plt.show()
# Ljung-Box test on residuals
lb_test = acorr_ljungbox(residuals, lags=[5, 10, 15, 20], return_df=True)
print("\nLjung-Box Test (residual autocorrelation):")
print(lb_test.to_string())
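Point forecasts alone understate uncertainty; statsmodels also provides interval forecasts via get_forecast. A short sketch for the fitted model above:
# 5-day-ahead point forecasts with 95% intervals from the fitted ARIMA.
fc = best_model.get_forecast(steps=5)
fc_mean = fc.predicted_mean
fc_ci = fc.conf_int(alpha=0.05)  # columns: lower, upper
for i in range(5):
    # Note: intervals are not constrained to [0, 1]; clip or work in
    # log-odds space in practice (see Limitations).
    print(f"h={i+1}: {fc_mean[i]:.4f} "
          f"[{fc_ci[i, 0]:.4f}, {fc_ci[i, 1]:.4f}]")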
Step 4: GARCH Volatility Model
from arch import arch_model
# Fit GARCH on log-odds returns
log_odds_series = np.log(market_data['price'] / (1 - market_data['price']))
returns = log_odds_series.diff().dropna() * 100 # Scale for numerical stability
# GARCH(1,1)
garch11 = arch_model(returns, vol='Garch', p=1, q=1,
                     mean='AR', lags=1, dist='t')
garch_result = garch11.fit(disp='off')
print("=== GARCH(1,1) Results ===")
print(garch_result.summary())
# Extract conditional volatility
cond_vol = garch_result.conditional_volatility
# Compare realized vs conditional volatility
realized_vol = returns.rolling(20).std()
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
axes[0].plot(returns.index, returns.values, alpha=0.5, label='Returns')
axes[0].set_title('Log-Odds Returns')
axes[0].set_ylabel('Return (%)')
axes[0].legend()
axes[1].plot(cond_vol.index, cond_vol.values, label='GARCH Conditional Vol',
             linewidth=2)
axes[1].plot(realized_vol.index, realized_vol.values, '--', alpha=0.5,
             label='20-day Realized Vol')
axes[1].set_title('Volatility: GARCH vs Realized')
axes[1].set_ylabel('Volatility')
axes[1].legend()
plt.tight_layout()
plt.savefig('garch_volatility.png', dpi=150)
plt.show()
# Identify high and low volatility regimes
vol_median = cond_vol.median()
high_vol_periods = cond_vol > vol_median * 1.5
low_vol_periods = cond_vol < vol_median * 0.5
print(f"\nVolatility Statistics:")
print(f" Median conditional vol: {vol_median:.4f}")
print(f" High vol periods: {high_vol_periods.sum()} days "
f"({high_vol_periods.mean():.1%})")
print(f" Low vol periods: {low_vol_periods.sum()} days "
f"({low_vol_periods.mean():.1%})")
Step 5: Walk-Forward Forecast Evaluation
# Walk-forward evaluation of ARIMA forecasts
print("=== Walk-Forward Forecast Evaluation ===\n")
min_train_size = 100
forecast_horizon = 1 # 1-step-ahead
wf_results = {
    'date': [],
    'actual': [],
    'arima_forecast': [],
    'naive_forecast': [],  # Random walk: forecast = last price
    'ma5_forecast': [],    # 5-day moving average forecast
}
for t in range(min_train_size, n_days):
    train_prices = prices_array[:t]
    actual = prices_array[t]
    # ARIMA forecast
    try:
        arima = ARIMA(train_prices, order=best_order)
        arima_fit = arima.fit()
        arima_fc = arima_fit.forecast(steps=forecast_horizon)[0]
        arima_fc = np.clip(arima_fc, 0.01, 0.99)
    except Exception:
        arima_fc = train_prices[-1]  # Fallback to naive
    # Naive forecast (random walk)
    naive_fc = train_prices[-1]
    # Moving average forecast
    ma5_fc = np.mean(train_prices[-5:])
    wf_results['date'].append(dates[t])
    wf_results['actual'].append(actual)
    wf_results['arima_forecast'].append(arima_fc)
    wf_results['naive_forecast'].append(naive_fc)
    wf_results['ma5_forecast'].append(ma5_fc)
wf_df = pd.DataFrame(wf_results).set_index('date')
# Compute error metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
for method in ['arima_forecast', 'naive_forecast', 'ma5_forecast']:
    rmse = np.sqrt(mean_squared_error(wf_df['actual'], wf_df[method]))
    mae = mean_absolute_error(wf_df['actual'], wf_df[method])
    # Directional accuracy
    actual_direction = np.sign(wf_df['actual'].diff().dropna())
    forecast_direction = np.sign(
        (wf_df[method] - wf_df['actual'].shift(1)).dropna()
    )
    # Align indices
    common_idx = actual_direction.index.intersection(forecast_direction.index)
    dir_accuracy = (
        actual_direction.loc[common_idx] == forecast_direction.loc[common_idx]
    ).mean()
    print(f"{method}:")
    print(f"  RMSE: {rmse:.6f}")
    print(f"  MAE: {mae:.6f}")
    print(f"  Directional Accuracy: {dir_accuracy:.3f}")
    print()
# Plot forecasts vs actuals
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
# Full period
axes[0].plot(wf_df.index, wf_df['actual'], 'k-', label='Actual', linewidth=1)
axes[0].plot(wf_df.index, wf_df['arima_forecast'], 'b-', alpha=0.7,
             label='ARIMA')
axes[0].plot(wf_df.index, wf_df['naive_forecast'], 'r--', alpha=0.5,
             label='Naive (RW)')
axes[0].set_title('Walk-Forward Forecasts vs Actual Prices')
axes[0].legend()
# Forecast errors
arima_errors = wf_df['actual'] - wf_df['arima_forecast']
naive_errors = wf_df['actual'] - wf_df['naive_forecast']
axes[1].plot(wf_df.index, arima_errors.cumsum(), 'b-', label='ARIMA Cumulative Error')
axes[1].plot(wf_df.index, naive_errors.cumsum(), 'r--', label='Naive Cumulative Error')
axes[1].axhline(y=0, color='black', linewidth=0.5)
axes[1].set_title('Cumulative Forecast Errors')
axes[1].legend()
plt.tight_layout()
plt.savefig('forecast_evaluation.png', dpi=150)
plt.show()
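Refitting the full ARIMA at every step is the main cost of walk-forward validation. statsmodels results objects support append, which extends the sample and reuses the existing parameter estimates unless refit=True; a sketch of a cheaper loop, assuming occasional refits are acceptable (refit_every=20 is an illustrative choice, not tuned):
# Faster walk-forward: extend the fitted model with new observations and
# only re-estimate parameters every `refit_every` steps.
refit_every = 20
fit = ARIMA(prices_array[:min_train_size], order=best_order).fit()
fast_fcs = []
for t in range(min_train_size, n_days):
    fast_fcs.append(np.clip(fit.forecast(steps=1)[0], 0.01, 0.99))
    # Append the newly observed price; refit parameters periodically.
    refit = ((t - min_train_size + 1) % refit_every == 0)
    fit = fit.append(prices_array[t:t+1], refit=refit)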
Step 6: Multi-Horizon Forecast Analysis
# Evaluate forecasts at multiple horizons
horizons = [1, 3, 5, 10, 20]
horizon_results = []
for h in horizons:
    errors = []
    for t in range(min_train_size, n_days - h):
        train_prices = prices_array[:t]
        actual = prices_array[t + h - 1]  # h-step-ahead actual
        try:
            arima = ARIMA(train_prices, order=best_order)
            arima_fit = arima.fit()
            fc = arima_fit.forecast(steps=h)
            forecast = np.clip(fc[-1], 0.01, 0.99)
        except Exception:
            forecast = train_prices[-1]
        errors.append((actual - forecast) ** 2)
    rmse = np.sqrt(np.mean(errors))
    naive_errors = []
    for t in range(min_train_size, n_days - h):
        actual = prices_array[t + h - 1]
        naive = prices_array[t - 1]
        naive_errors.append((actual - naive) ** 2)
    naive_rmse = np.sqrt(np.mean(naive_errors))
    horizon_results.append({
        'horizon': h,
        'arima_rmse': rmse,
        'naive_rmse': naive_rmse,
        'improvement': (naive_rmse - rmse) / naive_rmse * 100
    })
horizon_df = pd.DataFrame(horizon_results)
print("\n=== Multi-Horizon Forecast Accuracy ===")
print(horizon_df.to_string(index=False))
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(horizon_df['horizon'], horizon_df['arima_rmse'], 'bo-', label='ARIMA')
ax.plot(horizon_df['horizon'], horizon_df['naive_rmse'], 'rs--', label='Naive')
ax.set_xlabel('Forecast Horizon (days)')
ax.set_ylabel('RMSE')
ax.set_title('Forecast Accuracy by Horizon')
ax.legend()
plt.tight_layout()
plt.savefig('multi_horizon_accuracy.png', dpi=150)
plt.show()
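The RMSE gaps in the table above may or may not be statistically meaningful. A Diebold-Mariano style test on the loss differential gives a rough answer; the sketch below uses the 1-step-ahead walk-forward errors from Step 5 with a plain t-test, which is a reasonable approximation at horizon 1 (for h > 1 the differentials are autocorrelated and a HAC variance is needed):
from scipy import stats

# Diebold-Mariano-style comparison at horizon 1: squared-error differential.
e_arima = (wf_df['actual'] - wf_df['arima_forecast']) ** 2
e_naive = (wf_df['actual'] - wf_df['naive_forecast']) ** 2
d = e_arima - e_naive  # negative mean favors ARIMA
t_stat, p_val = stats.ttest_1samp(d, 0.0)
print(f"Mean loss differential: {d.mean():.2e}, t={t_stat:.2f}, p={p_val:.3f}")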
Step 7: Volatility-Adjusted Trading Strategy
# Use GARCH volatility to adjust position sizes
print("=== Volatility-Adjusted Trading Strategy ===\n")
# Generate ARIMA signals: buy/sell based on forecast vs current price
signals = pd.DataFrame(index=wf_df.index)
signals['price'] = wf_df['actual']
signals['forecast'] = wf_df['arima_forecast']
signals['edge'] = signals['forecast'] - signals['price'].shift(1)
# Get conditional volatility for the test period (align indices)
signals['cond_vol'] = cond_vol.reindex(signals.index)
signals['cond_vol'] = signals['cond_vol'].ffill()
# Position sizing: edge / volatility (similar to Sharpe-based sizing)
vol_median_test = signals['cond_vol'].median()
signals['vol_scale'] = vol_median_test / (signals['cond_vol'] + 0.001)
signals['vol_scale'] = signals['vol_scale'].clip(0.2, 3.0)
# Base position from edge
signals['base_position'] = np.sign(signals['edge']) * np.clip(
    np.abs(signals['edge']) * 20, 0, 1
)
# Volatility-adjusted position
signals['adj_position'] = signals['base_position'] * signals['vol_scale']
# Returns
price_returns = signals['price'].diff()
# Strategy returns
signals['base_return'] = signals['base_position'].shift(1) * price_returns
signals['adj_return'] = signals['adj_position'].shift(1) * price_returns
# Transaction costs (proportional to position change)
tc = 0.001
signals['base_tc'] = np.abs(signals['base_position'].diff()) * tc
signals['adj_tc'] = np.abs(signals['adj_position'].diff()) * tc
signals['base_net_return'] = signals['base_return'] - signals['base_tc']
signals['adj_net_return'] = signals['adj_return'] - signals['adj_tc']
# Performance comparison
for strategy, col in [('Base', 'base_net_return'), ('Vol-Adjusted', 'adj_net_return')]:
    rets = signals[col].dropna()
    total_return = rets.sum()
    sharpe = rets.mean() / (rets.std() + 1e-10) * np.sqrt(252)
    cum_rets = rets.cumsum()
    max_dd = (cum_rets - cum_rets.cummax()).min()
    print(f"{strategy} Strategy:")
    print(f"  Total Return: {total_return:.4f}")
    print(f"  Sharpe Ratio: {sharpe:.3f}")
    print(f"  Max Drawdown: {max_dd:.4f}")
    print(f"  Mean Daily Return: {rets.mean():.6f}")
    print()
# Equity curves
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(signals.index, signals['base_net_return'].cumsum(), label='Base Strategy')
ax.plot(signals.index, signals['adj_net_return'].cumsum(),
        label='Vol-Adjusted Strategy')
ax.axhline(y=0, color='black', linewidth=0.5)
ax.set_title('Cumulative Returns: Base vs Volatility-Adjusted Strategy')
ax.set_ylabel('Cumulative Return')
ax.legend()
plt.tight_layout()
plt.savefig('strategy_comparison.png', dpi=150)
plt.show()
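Sharpe ratios estimated from a single test window of roughly 265 days are noisy. A simple block bootstrap gives a rough confidence band; a sketch (block length 10 and 2000 resamples are illustrative choices, not tuned):
# Block bootstrap of the vol-adjusted strategy's annualized Sharpe ratio.
rng = np.random.default_rng(0)
rets = signals['adj_net_return'].dropna().values
block, n_boot = 10, 2000
n_blocks = int(np.ceil(len(rets) / block))
sharpes = []
for _ in range(n_boot):
    starts = rng.integers(0, len(rets) - block, size=n_blocks)
    sample = np.concatenate([rets[s:s+block] for s in starts])[:len(rets)]
    sharpes.append(sample.mean() / (sample.std() + 1e-10) * np.sqrt(252))
lo, hi = np.percentile(sharpes, [2.5, 97.5])
print(f"Vol-adjusted Sharpe 95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")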
Key Findings
- Stationarity: Raw prediction market prices are non-stationary (as expected), but first differences and log-odds returns are stationary, justifying the use of ARIMA with $d=1$.
- ARIMA model selection: The grid search over $(p, d, q)$ typically selects a low-order model (e.g., ARIMA(1,1,1) or ARIMA(2,1,1)), consistent with the weak autocorrelation structure of prediction market returns.
- Forecast accuracy: ARIMA provides a modest improvement over the naive random-walk baseline at short horizons (1-3 days), but the improvement diminishes rapidly at longer horizons. This is consistent with markets being approximately efficient but not perfectly so.
- Volatility clustering: GARCH successfully captures the time-varying volatility of prediction market returns. The conditional volatility estimate is useful for risk management and position sizing.
- Volatility-adjusted trading: Scaling positions inversely with GARCH volatility improves the risk-adjusted returns of the ARIMA-based trading strategy. The volatility-adjusted strategy has a higher Sharpe ratio and a lower maximum drawdown than the base strategy.
- Practical implications: Time series models are most useful as short-term tactical tools: for timing entries and exits, managing position sizes, and detecting volatility regime changes. They should be combined with fundamental models (like the logistic regression from Case Study 1) for strategic positioning.
Limitations
- Bounded prices: ARIMA can forecast outside [0, 1]. The log-odds transform partially addresses this; see the sketch after this list.
- Event risk: Time series models cannot anticipate the timing or magnitude of news events.
- Non-linearity: The linear ARIMA structure may miss important non-linear dynamics.
- Parameter instability: ARIMA parameters estimated on historical data may not be stable going forward, especially as the contract approaches expiry.
- Computational cost: Walk-forward validation with frequent refitting is computationally intensive.
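A sketch of the log-odds workaround flagged above: fit the ARIMA in log-odds space and map forecasts back through the logistic function, which keeps every forecast strictly inside (0, 1). For simplicity this reuses best_order from the price-scale search; in practice the order selection should be repeated on the transformed series.
# Fit in log-odds space, forecast, and map back through the logistic function.
log_odds = np.log(prices_array / (1 - prices_array))
lo_model = ARIMA(log_odds, order=best_order).fit()
lo_fc = lo_model.forecast(steps=5)
price_fc = 1 / (1 + np.exp(-lo_fc))  # guaranteed to lie in (0, 1)
print(np.round(price_fc, 4))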
Code Reference
The complete implementation is available in code/case-study-code.py.