Case Study 2: Did the Market Know? Analyzing Information Incorporation Speed

Overview

One of the strongest empirical claims for prediction markets is that they incorporate new information faster than alternative forecasting methods. When a major news event occurs—a debate performance, an economic report, a scandal—prediction market prices should adjust rapidly to reflect the new reality.

In this case study, we build synthetic datasets that realistically model how prediction market prices respond to news events. We then apply event-study methodology from financial economics to measure the speed and completeness of information incorporation. This approach mirrors the methods used by researchers studying real prediction markets but gives us the advantage of knowing the "true" information content of each event.

By the end of this case study, you will be able to:

  1. Generate realistic synthetic prediction market data with embedded news events
  2. Conduct event-study analysis to measure information incorporation speed
  3. Distinguish between efficient and inefficient price responses
  4. Compare different market structures in their information processing ability
  5. Identify anomalies that suggest market inefficiency

Background: The Event Study Method

The event study methodology, pioneered by Fama, Fisher, Jensen, and Roll (1969) in the context of stock splits, is the workhorse tool for testing market efficiency. The approach is straightforward:

  1. Define the event and its timing
  2. Measure the "normal" return pattern (what prices would do without the event)
  3. Compute the "abnormal" return (actual return minus normal return) around the event
  4. Test whether abnormal returns are statistically significant

In an efficient market:

  • The full price adjustment happens at the event time (no pre-event or post-event drift)
  • Abnormal returns before the event are zero (no information leakage)
  • Abnormal returns after the event are zero (no delayed response)
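
To make steps 2-4 concrete, here is a minimal sketch of the core computation, assuming the simplest "constant-mean" model of normal returns (the function name and window sizes are illustrative, not part of the analysis below):

import numpy as np
from scipy import stats

def car_around_event(returns, event_idx, est_window=100, event_window=20):
    """Constant-mean event study: CAR and a t-test around one event."""
    # Step 2: estimate the "normal" return from a pre-event estimation window
    est = returns[max(0, event_idx - est_window - event_window):
                  event_idx - event_window]
    normal = est.mean() if len(est) > 0 else 0.0

    # Step 3: abnormal returns in the event window, cumulated into a CAR
    window = returns[event_idx - event_window:event_idx + event_window + 1]
    abnormal = window - normal
    car = np.cumsum(abnormal)

    # Step 4: test whether the abnormal returns differ from zero
    t_stat, p_value = stats.ttest_1samp(abnormal, 0)
    return car, t_stat, p_value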


Part 1: Generating Synthetic Data

1.1 Realistic Price Path Generator

We create a synthetic data generator that produces price paths with known properties: a true underlying probability that evolves over time, news events that cause discrete jumps, and market microstructure noise.

"""
Case Study 2: Information Incorporation Speed Analysis
Chapter 11: Information Aggregation Theory

Generate synthetic prediction market data with known information events,
then analyze how quickly and completely the market incorporates information.
"""

import numpy as np
from scipy import stats


class SyntheticMarketGenerator:
    """
    Generate realistic synthetic prediction market price data
    with known information events.
    """

    def __init__(self, seed=42):
        self.rng = np.random.RandomState(seed)

    def generate_election_market(self, n_days=180, events=None):
        """
        Generate a synthetic election prediction market.

        Simulates n_days trading days (default 180) leading up to an election with:
        - Underlying true probability that evolves via random walk
        - Discrete news events that shift the true probability
        - Market microstructure noise
        - Varying liquidity (thinner early, deeper later)

        Parameters
        ----------
        n_days : int
            Number of trading days.
        events : list of dict, optional
            Predefined events. If None, generates random events.

        Returns
        -------
        dict with complete simulation data
        """
        # Ticks per day (e.g., one observation per 15 minutes for 8 hours)
        ticks_per_day = 32
        n_ticks = n_days * ticks_per_day

        # Define events if not provided
        if events is None:
            events = self._generate_election_events(n_days)

        # True probability path (what an omniscient observer would know)
        true_prob = np.zeros(n_ticks)
        true_prob[0] = 0.50  # start at coin flip

        # Slow-moving fundamentals (random walk)
        for t in range(1, n_ticks):
            # Small drift toward a "fundamental" value
            fundamental_drift = 0.0001 * (0.55 - true_prob[t-1])
            random_innovation = self.rng.normal(0, 0.001)
            true_prob[t] = np.clip(
                true_prob[t-1] + fundamental_drift + random_innovation,
                0.05, 0.95
            )

        # Apply event shocks to true probability
        event_ticks = []
        for event in events:
            day = event['day']
            tick = day * ticks_per_day + ticks_per_day // 2  # mid-day
            if tick < n_ticks:
                shock = event['shock']
                true_prob[tick:] += shock
                true_prob = np.clip(true_prob, 0.05, 0.95)
                event_ticks.append(tick)
                event['tick'] = tick

        # Market price: true probability + noise
        # Noise decreases over time (market becomes more liquid)
        noise_scale = np.linspace(0.03, 0.01, n_ticks)
        market_noise = self.rng.normal(0, 1, n_ticks) * noise_scale

        # Add mean reversion toward true probability
        market_price = np.zeros(n_ticks)
        market_price[0] = 0.50

        for t in range(1, n_ticks):
            # Market price adjusts toward true probability
            adjustment_speed = 0.15  # how quickly market learns
            price_adjustment = adjustment_speed * (true_prob[t] - market_price[t-1])
            microstructure_noise = market_noise[t]

            market_price[t] = np.clip(
                market_price[t-1] + price_adjustment + microstructure_noise,
                0.01, 0.99
            )

        # Compute derived quantities
        returns = np.diff(market_price)
        timestamps = np.arange(n_ticks) / ticks_per_day  # in days

        return {
            'true_probability': true_prob,
            'market_price': market_price,
            'returns': returns,
            'timestamps': timestamps,
            'events': events,
            'event_ticks': event_ticks,
            'n_ticks': n_ticks,
            'ticks_per_day': ticks_per_day,
            'n_days': n_days,
            'noise_scale': noise_scale
        }

    def _generate_election_events(self, n_days):
        """Generate realistic election events."""
        events = [
            {
                'day': 20,
                'name': 'Major poll release showing tight race',
                'shock': 0.03,
                'type': 'public_information'
            },
            {
                'day': 45,
                'name': 'First debate: candidate A performs well',
                'shock': 0.08,
                'type': 'public_information'
            },
            {
                'day': 60,
                'name': 'Economic report: unemployment drops',
                'shock': 0.04,
                'type': 'public_information'
            },
            {
                'day': 80,
                'name': 'Minor scandal affecting candidate B',
                'shock': 0.06,
                'type': 'public_information'
            },
            {
                'day': 95,
                'name': 'Second debate: candidate B recovers',
                'shock': -0.05,
                'type': 'public_information'
            },
            {
                'day': 110,
                'name': 'Major endorsement for candidate A',
                'shock': 0.03,
                'type': 'public_information'
            },
            {
                'day': 130,
                'name': 'Leaked internal polling data',
                'shock': 0.05,
                'type': 'leaked_information'
            },
            {
                'day': 150,
                'name': 'October surprise: major scandal',
                'shock': -0.10,
                'type': 'public_information'
            },
            {
                'day': 165,
                'name': 'Final debate: mixed reviews',
                'shock': 0.02,
                'type': 'public_information'
            },
            {
                'day': 175,
                'name': 'Final poll aggregate released',
                'shock': 0.01,
                'type': 'public_information'
            }
        ]
        return events

    def generate_inefficient_market(self, n_days=180, delay_ticks=10,
                                     overreaction_factor=1.0):
        """
        Generate market data with known inefficiencies.

        Parameters
        ----------
        n_days : int
            Number of trading days.
        delay_ticks : int
            How many ticks the market takes to fully incorporate information.
        overreaction_factor : float
            If > 1, market initially overreacts; if < 1, underreacts.
        """
        efficient_data = self.generate_election_market(n_days)

        # Create inefficient version by adding delayed response
        efficient_price = efficient_data['market_price'].copy()
        inefficient_price = np.zeros_like(efficient_price)
        inefficient_price[0] = efficient_price[0]

        for t in range(1, len(efficient_price)):
            # Delayed adjustment: only partially incorporate each tick's info
            target = efficient_price[t]
            current = inefficient_price[t-1]

            # Slow adjustment
            adjustment_rate = 1.0 / delay_ticks
            noise = self.rng.normal(0, 0.005)

            # Check if we're near an event (for overreaction)
            near_event = any(
                abs(t - et) < 3 for et in efficient_data['event_ticks']
            )

            if near_event and overreaction_factor != 1.0:
                # Overreact to events
                diff = target - current
                inefficient_price[t] = np.clip(
                    current + diff * overreaction_factor + noise,
                    0.01, 0.99
                )
            else:
                inefficient_price[t] = np.clip(
                    current + (target - current) * adjustment_rate + noise,
                    0.01, 0.99
                )

        inefficient_data = efficient_data.copy()
        inefficient_data['market_price'] = inefficient_price
        inefficient_data['returns'] = np.diff(inefficient_price)
        inefficient_data['inefficiency_type'] = {
            'delay_ticks': delay_ticks,
            'overreaction_factor': overreaction_factor
        }

        return inefficient_data
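
As a quick sanity check on the generator (a sketch, assuming the classes above are in scope; the seed is arbitrary), we can confirm that each event leaves a visible jump in the true probability path:

gen = SyntheticMarketGenerator(seed=7)
data = gen.generate_election_market()

for event in data['events']:
    t = event['tick']  # set by generate_election_market
    jump = data['true_probability'][t] - data['true_probability'][t - 1]
    print(f"{event['name'][:40]:<40} shock={event['shock']:+.2f} "
          f"observed jump={jump:+.3f}")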

Part 2: Event Study Analysis

2.1 Event Study Framework

class EventStudyAnalyzer:
    """
    Conduct event study analysis on prediction market data.
    """

    def __init__(self, market_data):
        self.data = market_data
        self.prices = market_data['market_price']
        self.returns = market_data['returns']
        self.events = market_data['events']
        self.tpd = market_data['ticks_per_day']

    def analyze_event(self, event, pre_window=48, post_window=48):
        """
        Analyze a single event: measure pre-event, event, and post-event
        price behavior.

        Parameters
        ----------
        event : dict
            Event specification with 'tick' field.
        pre_window : int
            Number of ticks before event to analyze.
        post_window : int
            Number of ticks after event to analyze.

        Returns
        -------
        dict with event analysis results
        """
        t = event['tick']

        if t - pre_window < 0 or t + post_window >= len(self.prices):
            return None

        # Extract windows
        pre_prices = self.prices[t - pre_window:t]
        post_prices = self.prices[t:t + post_window + 1]
        event_price_before = self.prices[t - 1]
        event_price_after = self.prices[t + 1] if t + 1 < len(self.prices) else self.prices[t]

        # Pre-event returns
        pre_returns = np.diff(pre_prices)
        # Post-event returns (excluding the event tick itself)
        post_returns = np.diff(post_prices[1:]) if len(post_prices) > 2 else np.array([])

        # Event return
        event_return = self.prices[t] - self.prices[t - 1]

        # Cumulative abnormal returns (CAR)
        # Estimate "normal" return as mean return in estimation window
        estimation_window = self.returns[max(0, t - pre_window * 3):t - pre_window]
        normal_return = estimation_window.mean() if len(estimation_window) > 0 else 0

        # Pre-event CAR
        pre_abnormal = pre_returns - normal_return
        pre_car = np.cumsum(pre_abnormal)

        # Post-event CAR
        post_abnormal = post_returns - normal_return if len(post_returns) > 0 else np.array([])
        post_car = np.cumsum(post_abnormal) if len(post_abnormal) > 0 else np.array([])

        # Statistical tests
        # Test 1: Are pre-event returns different from zero? (information leakage)
        pre_t_stat, pre_p_value = (
            stats.ttest_1samp(pre_returns, 0)
            if len(pre_returns) > 1 else (0, 1)
        )

        # Test 2: Are post-event returns different from zero? (delayed response)
        post_t_stat, post_p_value = (
            stats.ttest_1samp(post_returns, 0)
            if len(post_returns) > 1 else (0, 1)
        )

        # Information incorporation speed
        # How many ticks until 90% of the total price adjustment is complete?
        if len(post_prices) > 1:
            total_adjustment = post_prices[-1] - event_price_before
            if abs(total_adjustment) > 0.001:
                cumulative_adjustment = post_prices - event_price_before
                threshold = 0.9 * total_adjustment
                incorporation_speed = next(
                    (i for i, ca in enumerate(cumulative_adjustment)
                     if (total_adjustment > 0 and ca >= threshold) or
                        (total_adjustment < 0 and ca <= threshold)),
                    len(cumulative_adjustment)
                )
            else:
                incorporation_speed = 0
        else:
            incorporation_speed = 0

        return {
            'event_name': event['name'],
            'event_type': event.get('type', 'unknown'),
            'event_tick': t,
            'event_day': event['day'],
            'shock_magnitude': event['shock'],
            'price_before': event_price_before,
            'price_after': event_price_after,
            'event_return': event_return,
            'pre_event_car': pre_car[-1] if len(pre_car) > 0 else 0,
            'post_event_car': post_car[-1] if len(post_car) > 0 else 0,
            'pre_event_p_value': pre_p_value,
            'post_event_p_value': post_p_value,
            'information_leakage': pre_p_value < 0.05,
            'delayed_response': post_p_value < 0.05,
            'incorporation_speed_ticks': incorporation_speed,
            'incorporation_speed_hours': incorporation_speed * (8.0 / self.tpd),
            'efficient_response': pre_p_value > 0.05 and post_p_value > 0.05
        }

    def analyze_all_events(self):
        """Analyze all events in the dataset."""
        results = []
        for event in self.events:
            if 'tick' not in event:
                continue
            analysis = self.analyze_event(event)
            if analysis is not None:
                results.append(analysis)
        return results

    def compute_efficiency_metrics(self, event_results):
        """
        Compute overall efficiency metrics across all events.
        """
        if not event_results:
            return {}

        n_events = len(event_results)
        n_efficient = sum(1 for r in event_results if r['efficient_response'])
        n_leakage = sum(1 for r in event_results if r['information_leakage'])
        n_delayed = sum(1 for r in event_results if r['delayed_response'])

        avg_speed = np.mean([
            r['incorporation_speed_hours'] for r in event_results
        ])

        # Efficiency ratio: actual event return / expected shock
        efficiency_ratios = []
        for r in event_results:
            if abs(r['shock_magnitude']) > 0.001:
                ratio = abs(r['event_return']) / abs(r['shock_magnitude'])
                efficiency_ratios.append(ratio)

        return {
            'n_events': n_events,
            'n_efficient_responses': n_efficient,
            'pct_efficient': n_efficient / n_events,
            'n_information_leakage': n_leakage,
            'n_delayed_response': n_delayed,
            'avg_incorporation_speed_hours': avg_speed,
            'mean_efficiency_ratio': np.mean(efficiency_ratios) if efficiency_ratios else 0,
            'median_efficiency_ratio': np.median(efficiency_ratios) if efficiency_ratios else 0
        }

    def weak_form_efficiency_test(self):
        """Test weak-form efficiency using autocorrelation."""
        returns = self.returns
        n = len(returns)

        # Autocorrelation at various lags
        max_lag = min(20, n // 4)
        autocorrelations = {}

        for lag in range(1, max_lag + 1):
            x = returns[:n - lag]
            y = returns[lag:]
            corr, p_val = stats.pearsonr(x, y)
            autocorrelations[lag] = {
                'correlation': corr,
                'p_value': p_val,
                'significant': p_val < 0.05
            }

        # Ljung-Box statistic
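        # Q = n(n+2) * sum_{k=1}^{K} rho_k^2 / (n - k); under the null of no
        # autocorrelation, Q is asymptotically chi-squared with K = max_lag
        # degrees of freedom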
        acs = [autocorrelations[k]['correlation'] for k in range(1, max_lag + 1)]
        q_stat = n * (n + 2) * sum(
            ac**2 / (n - k) for k, ac in enumerate(acs, 1)
        )
        lb_p_value = 1 - stats.chi2.cdf(q_stat, max_lag)

        return {
            'autocorrelations': autocorrelations,
            'ljung_box_statistic': q_stat,
            'ljung_box_p_value': lb_p_value,
            'weak_form_efficient': lb_p_value > 0.05,
            'n_significant_lags': sum(
                1 for v in autocorrelations.values() if v['significant']
            )
        }

Part 3: Running the Analysis

3.1 Efficient Market Analysis

# Generate and analyze an efficient market
generator = SyntheticMarketGenerator(seed=42)
efficient_data = generator.generate_election_market()

analyzer = EventStudyAnalyzer(efficient_data)
event_results = analyzer.analyze_all_events()
efficiency_metrics = analyzer.compute_efficiency_metrics(event_results)
weak_form = analyzer.weak_form_efficiency_test()

print("=" * 70)
print("EFFICIENT MARKET ANALYSIS")
print("=" * 70)

print("\n--- Event-by-Event Analysis ---\n")
print(f"{'Event':<45} {'Return':>8} {'Speed(h)':>9} {'Efficient':>10}")
print("-" * 75)
for r in event_results:
    print(f"{r['event_name'][:44]:<45} {r['event_return']:>+8.4f} "
          f"{r['incorporation_speed_hours']:>8.1f}  "
          f"{'Yes' if r['efficient_response'] else 'No':>9}")

print(f"\n--- Overall Efficiency Metrics ---")
print(f"  Events analyzed:         {efficiency_metrics['n_events']}")
print(f"  Efficient responses:     {efficiency_metrics['n_efficient_responses']} "
      f"({efficiency_metrics['pct_efficient']:.0%})")
print(f"  Information leakage:     {efficiency_metrics['n_information_leakage']}")
print(f"  Delayed responses:       {efficiency_metrics['n_delayed_response']}")
print(f"  Avg incorporation speed: {efficiency_metrics['avg_incorporation_speed_hours']:.1f} hours")
print(f"  Mean efficiency ratio:   {efficiency_metrics['mean_efficiency_ratio']:.2f}")

print(f"\n--- Weak-Form Efficiency Test ---")
print(f"  Ljung-Box statistic:     {weak_form['ljung_box_statistic']:.2f}")
print(f"  p-value:                 {weak_form['ljung_box_p_value']:.4f}")
print(f"  Weak-form efficient:     {weak_form['weak_form_efficient']}")
print(f"  Significant AC lags:     {weak_form['n_significant_lags']}")

3.2 Inefficient Market Comparison

# Generate and analyze an inefficient market (delayed response)
inefficient_data = generator.generate_inefficient_market(
    delay_ticks=15,
    overreaction_factor=1.3
)

analyzer_ineff = EventStudyAnalyzer(inefficient_data)
event_results_ineff = analyzer_ineff.analyze_all_events()
efficiency_metrics_ineff = analyzer_ineff.compute_efficiency_metrics(event_results_ineff)
weak_form_ineff = analyzer_ineff.weak_form_efficiency_test()

print("\n" + "=" * 70)
print("INEFFICIENT MARKET ANALYSIS (Delayed + Overreaction)")
print("=" * 70)

print("\n--- Event-by-Event Analysis ---\n")
print(f"{'Event':<45} {'Return':>8} {'Speed(h)':>9} {'Efficient':>10}")
print("-" * 75)
for r in event_results_ineff:
    print(f"{r['event_name'][:44]:<45} {r['event_return']:>+8.4f} "
          f"{r['incorporation_speed_hours']:>8.1f}  "
          f"{'Yes' if r['efficient_response'] else 'No':>9}")

print(f"\n--- Comparison: Efficient vs. Inefficient ---\n")
print(f"{'Metric':<35} {'Efficient':>12} {'Inefficient':>12}")
print("-" * 60)
metrics_to_compare = [
    ('Pct efficient responses', 'pct_efficient'),
    ('Avg speed (hours)', 'avg_incorporation_speed_hours'),
    ('Information leakage events', 'n_information_leakage'),
    ('Delayed response events', 'n_delayed_response'),
    ('Mean efficiency ratio', 'mean_efficiency_ratio')
]
for label, key in metrics_to_compare:
    val_eff = efficiency_metrics.get(key, 'N/A')
    val_ineff = efficiency_metrics_ineff.get(key, 'N/A')
    if isinstance(val_eff, float):
        print(f"  {label:<33} {val_eff:>12.3f} {val_ineff:>12.3f}")
    else:
        print(f"  {label:<33} {val_eff:>12} {val_ineff:>12}")

Part 4: Deep Dive — Information Leakage Detection

4.1 Detecting Pre-Event Price Movements

One of the most interesting questions is whether prediction markets show evidence of "information leakage" — prices moving in the direction of upcoming news before the news is publicly released. This could indicate insider trading, or it could reflect the market correctly anticipating public information.

def detect_information_leakage(market_data, pre_window_hours=24):
    """
    Analyze whether prices systematically move in the correct direction
    before events, suggesting information leakage or anticipation.
    """
    prices = market_data['market_price']
    tpd = market_data['ticks_per_day']
    pre_window_ticks = int(pre_window_hours * tpd / 8)  # 8-hour trading day

    leakage_analysis = []

    for event in market_data['events']:
        if 'tick' not in event:
            continue

        t = event['tick']
        if t - pre_window_ticks < 0:
            continue

        # Pre-event price drift, measured up to the tick before the event
        # so the event-tick response itself is excluded
        pre_drift = prices[t - 1] - prices[t - pre_window_ticks]
        event_direction = np.sign(event['shock'])

        # Does the pre-event drift anticipate the shock?
        anticipates = np.sign(pre_drift) == event_direction and abs(pre_drift) > 0.005
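        # (the 0.005 floor filters out drift too small to distinguish from noise)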

        # Measure pre-drift as fraction of total shock
        if abs(event['shock']) > 0.001:
            leakage_fraction = pre_drift / event['shock']
        else:
            leakage_fraction = 0

        leakage_analysis.append({
            'event': event['name'],
            'event_type': event.get('type', 'unknown'),
            'shock': event['shock'],
            'pre_drift': pre_drift,
            'anticipates': anticipates,
            'leakage_fraction': leakage_fraction
        })

    # Summary statistics
    n_events = len(leakage_analysis)
    n_anticipating = sum(1 for la in leakage_analysis if la['anticipates'])

    # Under the null (no leakage), correct-direction drift occurs at most ~50%
    # of the time by chance; the magnitude filter in `anticipates` makes this
    # one-sided binomial test conservative
    if n_events > 0:
        binom_p = 1 - stats.binom.cdf(n_anticipating - 1, n_events, 0.5)
    else:
        binom_p = 1.0

    print("=== Information Leakage Analysis ===\n")
    print(f"{'Event':<45} {'Type':<15} {'Pre-Drift':>10} {'Anticipated':>12}")
    print("-" * 85)
    for la in leakage_analysis:
        print(f"{la['event'][:44]:<45} {la['event_type']:<15} "
              f"{la['pre_drift']:>+10.4f} {'YES' if la['anticipates'] else 'no':>11}")

    print(f"\n  Events anticipating shock: {n_anticipating}/{n_events}")
    print(f"  Binomial test p-value:     {binom_p:.4f}")
    print(f"  Evidence of leakage:       {'Yes' if binom_p < 0.05 else 'No'}")

    # Special attention to "leaked_information" events
    leaked = [la for la in leakage_analysis if la['event_type'] == 'leaked_information']
    if leaked:
        print(f"\n  Events classified as 'leaked_information':")
        for la in leaked:
            print(f"    {la['event']}: pre-drift = {la['pre_drift']:+.4f}, "
                  f"leakage fraction = {la['leakage_fraction']:.1%}")

    return leakage_analysis

leakage_results = detect_information_leakage(efficient_data)

Part 5: Comparing Markets to Alternative Forecasts

5.1 Simulated Poll Data

To contextualize market efficiency, we compare market prices to simulated polls that update more slowly:

def generate_poll_comparison(market_data, poll_delay_days=3, poll_noise=0.05,
                             seed=0):
    """
    Generate synthetic poll data that updates more slowly than the market.

    Parameters
    ----------
    market_data : dict
        Market data from SyntheticMarketGenerator.
    poll_delay_days : int
        How many days polls lag behind the true probability.
    poll_noise : float
        Standard deviation of polling noise.
    seed : int
        Seed for the polling-noise RNG, so results are reproducible.
    """
    true_prob = market_data['true_probability']
    market_price = market_data['market_price']
    tpd = market_data['ticks_per_day']
    n_ticks = len(true_prob)

    # Polls update every few days with a delay
    poll_values = np.full(n_ticks, 0.50)
    delay_ticks = poll_delay_days * tpd
    rng = np.random.RandomState(seed)  # local RNG keeps poll noise reproducible

    for t in range(n_ticks):
        # Poll reflects true probability from `delay_ticks` ago
        reference_t = max(0, t - delay_ticks)
        poll_values[t] = true_prob[reference_t] + rng.normal(0, poll_noise)

    poll_values = np.clip(poll_values, 0.01, 0.99)

    # Smooth polls (they are released periodically, not continuously)
    poll_release_interval = 3 * tpd  # every 3 days
    smoothed_polls = np.full(n_ticks, 0.50)
    last_release = poll_values[0]

    for t in range(n_ticks):
        if t % poll_release_interval == 0:
            last_release = poll_values[t]
        smoothed_polls[t] = last_release

    # Compare accuracy
    market_errors = np.abs(market_price - true_prob)
    poll_errors = np.abs(smoothed_polls - true_prob)

    # Error around events specifically
    event_market_errors = []
    event_poll_errors = []

    for event in market_data['events']:
        if 'tick' not in event:
            continue
        t = event['tick']
        # Look at errors 1 day after event
        check_t = min(t + tpd, n_ticks - 1)
        event_market_errors.append(abs(market_price[check_t] - true_prob[check_t]))
        event_poll_errors.append(abs(smoothed_polls[check_t] - true_prob[check_t]))

    print("=== Market vs. Poll Accuracy Comparison ===\n")
    print(f"{'Metric':<40} {'Market':>10} {'Polls':>10}")
    print("-" * 62)
    print(f"{'Overall MAE':<40} {market_errors.mean():>10.4f} "
          f"{poll_errors.mean():>10.4f}")
    print(f"{'MAE (last 30 days)':<40} "
          f"{market_errors[-30*tpd:].mean():>10.4f} "
          f"{poll_errors[-30*tpd:].mean():>10.4f}")
    print(f"{'Post-event MAE (1 day after)':<40} "
          f"{np.mean(event_market_errors):>10.4f} "
          f"{np.mean(event_poll_errors):>10.4f}")
    print(f"{'Market more accurate (% of ticks)':<40} "
          f"{(market_errors < poll_errors).mean():>10.1%}")

    return {
        'market_errors': market_errors,
        'poll_errors': poll_errors,
        'event_market_errors': event_market_errors,
        'event_poll_errors': event_poll_errors,
        'smoothed_polls': smoothed_polls
    }

comparison = generate_poll_comparison(efficient_data)

Part 6: Summary and Discussion

6.1 Key Findings

This case study demonstrates several important properties of information incorporation in prediction markets:

  1. Speed. Efficient markets incorporate information within hours (or faster). Our synthetic efficient market shows rapid price adjustment to news events, with most of the adjustment happening within a few ticks of the event.

  2. Completeness. The efficiency ratio (actual price change / expected shock) measures whether the market fully incorporates information. Ratios near 1.0 indicate complete incorporation; ratios below 1.0 indicate underreaction. For example, an event with an expected shock of 0.08 that moves the price by only 0.06 at the event tick has a ratio of 0.75, a 25% underreaction.

  3. Leakage detection. The event study framework can detect pre-event price movements that suggest information leakage or anticipation. The binomial test provides a simple statistical framework for this.

  4. Comparison advantage. Markets outperform polls most dramatically in the period immediately after major events, when the market has already adjusted but polls have not yet been conducted or released.

6.2 Limitations

Several limitations of this analysis should be noted:

  • Synthetic data. Our data is generated by a model, not observed from real markets. The efficiency properties are partially determined by our modeling choices.
  • Known event timing. In real markets, the exact timing of information arrival is often uncertain, making event studies harder to conduct.
  • Single market. We analyze a single simulated market rather than a cross-section of markets. Real-world analysis would aggregate across many events and markets.

6.3 Discussion Questions

  1. How would the results change if we added a "manipulation event"—a large trade designed to move the price away from the true probability? Would the event study detect the manipulation attempt? (A starting sketch follows these questions.)

  2. In real prediction markets (like PredictIt or Polymarket), what practical challenges would you face in conducting this type of event study? Consider data availability, event identification, and confounding factors.

  3. Our synthetic polls update every 3 days with a 3-day delay. Is this a fair representation of modern polling? How would "now-casting" models (which update daily) change the comparison?

  4. The "October surprise" event has the largest shock (-0.10). How does market behavior around this extreme event differ from behavior around smaller events? Is there evidence of overreaction?

  5. If you were advising a policy-maker on whether to trust prediction market prices for a specific decision, what evidence from this type of analysis would you cite? What caveats would you mention?
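
For question 1, a starting point (a sketch only; the event values below are illustrative) is to pass a custom event list through the existing events parameter of generate_election_market. Note that the generator applies shocks to the true probability itself, so modeling pure price manipulation, where the price moves but the true probability does not, would require extending the generator:

manip_events = [
    {'day': 90, 'name': 'Suspected manipulation attempt',
     'shock': 0.07, 'type': 'manipulation'},
]
gen = SyntheticMarketGenerator(seed=7)
manip_data = gen.generate_election_market(events=manip_events)

analyzer_manip = EventStudyAnalyzer(manip_data)
for r in analyzer_manip.analyze_all_events():
    print(f"{r['event_name']}: efficient={r['efficient_response']}, "
          f"speed={r['incorporation_speed_hours']:.1f}h")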