Case Study 1: Tracking the 2024 US Presidential Election on Polymarket

Executive Summary

The 2024 US presidential election generated the highest-volume political prediction market in history. Polymarket, a blockchain-based prediction platform, hosted contracts on the presidential race that accumulated over \$1 billion in cumulative trading volume. In this case study you will explore a synthetic but realistic dataset modeled on the actual Polymarket data, analyze how prices evolved over the campaign, compare market signals to polling averages, and identify volume spikes around key events.

By the end of this case study you will be able to:

Load and explore prediction-market time-series data in Python.
Calculate implied probabilities from raw contract prices.
Visualize price trajectories and trading volume.
Compare prediction-market signals with polling data.
Identify and interpret the market's reaction to major news events.

Background

Polymarket

Polymarket is a decentralized prediction market platform built on the Polygon blockchain. Users deposit cryptocurrency (USDC) and trade binary-outcome contracts that resolve based on real-world events. Polymarket gained mainstream attention in 2024 as its election markets attracted both retail traders and institutional-scale participants.

Key features of Polymarket:

Contract structure: Binary (Yes/No), paying \$1.00 USDC if the event occurs.
Fee model: Polymarket charged no trading fees on these markets in 2024; the main trading cost was the bid-ask spread paid to liquidity providers.
Resolution: Contracts are resolved by UMA's optimistic oracle, a decentralized dispute-resolution mechanism.
Accessibility: US persons have been restricted from trading since a 2022 CFTC settlement, though enforcement is limited.
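
To make the contract structure concrete, the payoff arithmetic for a binary Yes share can be sketched as follows. The function names and the example trade are illustrative assumptions rather than Polymarket API calls, and fees and slippage are ignored.

```python
def yes_contract_pnl(price: float, shares: float, event_occurs: bool) -> float:
    """Profit or loss (in USDC) from buying `shares` Yes contracts at `price`.

    Each Yes share pays 1.00 USDC if the event occurs and nothing otherwise.
    Fees and slippage are ignored (a simplifying assumption).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    payout = shares * (1.0 if event_occurs else 0.0)
    cost = shares * price
    return payout - cost


def expected_profit_per_share(price: float, believed_prob: float) -> float:
    """Expected profit per Yes share for a trader who believes `believed_prob`."""
    # Win (1 - price) with probability p, lose `price` with probability (1 - p).
    # This simplifies to p - price, so the break-even belief equals the price.
    return believed_prob * (1.0 - price) - (1.0 - believed_prob) * price


# Hypothetical trade: 100 Yes shares bought at 0.54 USDC each
win = yes_contract_pnl(0.54, 100, event_occurs=True)
loss = yes_contract_pnl(0.54, 100, event_occurs=False)
print(f"Profit if the contract resolves Yes: {win:+.2f} USDC")
print(f"Loss if the contract resolves No:    {loss:+.2f} USDC")
print(f"Expected profit per share at a 60% belief: "
      f"{expected_profit_per_share(0.54, 0.60):+.2f} USDC")
```

Because the break-even belief equals the price, a Yes price of 0.54 is commonly read as a 54% implied probability, before any adjustment for overround.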

The 2024 Presidential Race

The 2024 US presidential election featured several dramatic turns:

Date	Event
March 2024	Both presumptive nominees effectively locked up their party nominations
June 27, 2024	First presidential debate; widespread concern about incumbent performance
July 21, 2024	President Biden announces withdrawal from the race
July 22, 2024	Vice President Harris emerges as Democratic frontrunner
August 19–22, 2024	Democratic National Convention
September 10, 2024	Harris-Trump debate
October 2024	Late-campaign period with intensifying poll and market divergence
November 5, 2024	Election Day

Data Dictionary

The dataset used in this case study contains daily observations from June 1 to November 6, 2024.

Column	Type	Description
`date`	datetime	Trading day
`dem_yes_price`	float	Closing price of the "Democratic nominee wins" Yes contract
`rep_yes_price`	float	Closing price of the "Republican nominee wins" Yes contract
`dem_volume`	int	Daily trading volume for the Democratic contract (in USD)
`rep_volume`	int	Daily trading volume for the Republican contract (in USD)
`total_volume`	int	Sum of `dem_volume` and `rep_volume` (in USD)
`polling_avg_dem`	float	National polling average for the Democratic nominee (%)
`polling_avg_rep`	float	National polling average for the Republican nominee (%)
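
A data dictionary like this one can be checked mechanically before any analysis. The sketch below is one possible set of checks; the helper name `validate_election_frame` and the two-row sample frame are hypothetical and exist only for illustration.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "date", "dem_yes_price", "rep_yes_price", "dem_volume",
    "rep_volume", "total_volume", "polling_avg_dem", "polling_avg_rep",
]


def validate_election_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if not df["date"].is_monotonic_increasing:
        problems.append("dates are not sorted in ascending order")
    for col in ("dem_yes_price", "rep_yes_price"):
        # A Yes share pays at most 1.00, so prices must lie in [0, 1]
        if not df[col].between(0, 1).all():
            problems.append(f"{col} has values outside [0, 1]")
    if not (df["dem_volume"] + df["rep_volume"] == df["total_volume"]).all():
        problems.append("total_volume != dem_volume + rep_volume")
    return problems


# Hypothetical two-row sample that follows the data dictionary
sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    "dem_yes_price": [0.48, 0.47],
    "rep_yes_price": [0.54, 0.55],
    "dem_volume": [400_000, 450_000],
    "rep_volume": [600_000, 500_000],
    "total_volume": [1_000_000, 950_000],
    "polling_avg_dem": [44.0, 43.8],
    "polling_avg_rep": [52.0, 52.3],
})
print(validate_election_frame(sample))
```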

Phase 1: Data Exploration

1.1 Generating the Synthetic Dataset

Since real Polymarket data requires API access and may change, we generate a synthetic dataset that captures the key dynamics of the actual 2024 race.

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def generate_election_data(seed: int = 2024) -> pd.DataFrame:
    """
    Generate synthetic 2024 presidential election prediction-market data.

    The data is modeled on the actual dynamics of the Polymarket presidential
    market, with key events reflected as price shocks.

    Parameters
    ----------
    seed : int
        Random seed for reproducibility.

    Returns
    -------
    pd.DataFrame
        Daily market data from June 1 to November 6, 2024.
    """
    rng = np.random.default_rng(seed)

    start_date = datetime(2024, 6, 1)
    end_date = datetime(2024, 11, 6)
    dates = pd.date_range(start_date, end_date, freq="D")
    n_days = len(dates)

    # Start with Republican leading slightly
    rep_price = 0.54
    rep_prices = [rep_price]

    # Key event impacts: day index counted from June 1 (index 0),
    # mapped to (label, price shock applied to rep_price)
    events = {
        26: ("Biden debate stumble", +0.08),       # June 27
        50: ("Biden withdraws", -0.06),            # July 21
        51: ("Harris enters race", -0.04),         # July 22
        79: ("Democratic Convention", -0.03),      # Aug 19
        101: ("Harris-Trump debate", -0.04),       # Sept 10
        130: ("Early October shift", +0.06),       # Oct 9
        145: ("Late October tightening", +0.03),   # Oct 24
    }

    for i in range(1, n_days):
        # Apply the scheduled shock if an event falls on this day index
        shock = events[i][1] if i in events else 0.0
        noise = rng.normal(0, 0.012)
        rep_price = np.clip(rep_prices[-1] + noise + shock, 0.15, 0.85)
        rep_prices.append(round(float(rep_price), 4))

    rep_prices = np.array(rep_prices)
    dem_prices = np.clip(1.02 - rep_prices + rng.normal(0, 0.005, n_days), 0.15, 0.85)
    dem_prices = np.round(dem_prices, 4)

    # Volume: higher around events, generally increasing toward election
    base_volume = np.linspace(500_000, 5_000_000, n_days)
    volume_noise = rng.integers(100_000, 1_000_000, size=n_days)
    event_volume_boost = np.zeros(n_days)
    for event_day in events:
        if event_day < n_days:
            for offset in range(-1, 3):
                idx = event_day + offset
                if 0 <= idx < n_days:
                    event_volume_boost[idx] += rng.integers(2_000_000, 8_000_000)

    total_volume = (base_volume + volume_noise + event_volume_boost).astype(int)
    dem_volume = (total_volume * rng.uniform(0.4, 0.6, n_days)).astype(int)
    rep_volume = total_volume - dem_volume

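    # Polling: smoother and slower to react than the market, with a level offset.
    # A trailing 7-day mean uses only past prices, so polls lag the market and
    # the first and last days are not distorted by zero-padding at the edges.
    polling_rep = (
        pd.Series(rep_prices * 100).rolling(7, min_periods=1).mean().to_numpy()
        + rng.normal(0, 1.0, n_days)
        - 2.0
    )
    polling_dem = 100 - polling_rep + rng.normal(0, 0.5, n_days) - 4.0
    polling_rep = np.clip(polling_rep, 35, 55)
    polling_dem = np.clip(polling_dem, 35, 55)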

    return pd.DataFrame({
        "date": dates,
        "dem_yes_price": dem_prices,
        "rep_yes_price": rep_prices,
        "dem_volume": dem_volume,
        "rep_volume": rep_volume,
        "total_volume": total_volume,
        "polling_avg_dem": np.round(polling_dem, 1),
        "polling_avg_rep": np.round(polling_rep, 1),
    })


df = generate_election_data()
print(f"Dataset shape: {df.shape}")
print(f"Date range: {df['date'].min().date()} to {df['date'].max().date()}")
print("\nFirst 5 rows:")
print(df.head().to_string(index=False))
print("\nLast 5 rows:")
print(df.tail().to_string(index=False))
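
Hard-coded day indices are easy to get wrong by one. A small helper that derives each index from a calendar date, counting June 1 as index 0, makes the event schedule easier to verify. The helper name `day_index` is an assumption introduced for this sketch.

```python
from datetime import date

START_DATE = date(2024, 6, 1)  # first observation in the dataset (index 0)


def day_index(event_date: date, start: date = START_DATE) -> int:
    """Number of days between `start` and `event_date` (0 for the start date)."""
    return (event_date - start).days


key_dates = {
    "First presidential debate": date(2024, 6, 27),
    "Biden withdraws": date(2024, 7, 21),
    "Democratic Convention opens": date(2024, 8, 19),
    "Harris-Trump debate": date(2024, 9, 10),
    "Election Day": date(2024, 11, 5),
}

for name, when in key_dates.items():
    print(f"{when.isoformat()}  index {day_index(when):>3}  {name}")
```

Building the event dictionary from dates in this way, rather than typing the indices by hand, keeps the comments and the indices from drifting apart.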

1.2 Summary Statistics

def print_summary_statistics(df: pd.DataFrame) -> None:
    """Print key summary statistics for the election dataset."""

    print("=" * 60)
    print("SUMMARY STATISTICS")
    print("=" * 60)

    print("\n--- Price Statistics ---")
    for col in ["dem_yes_price", "rep_yes_price"]:
        label = "Democratic" if "dem" in col else "Republican"
        print(f"\n{label} Yes Price:")
        print(f"  Mean:   ${df[col].mean():.4f}")
        print(f"  Median: ${df[col].median():.4f}")
        print(f"  Min:    ${df[col].min():.4f}")
        print(f"  Max:    ${df[col].max():.4f}")
        print(f"  Std:    ${df[col].std():.4f}")

    print("\n--- Volume Statistics ---")
    print(f"  Total cumulative volume: ${df['total_volume'].sum():,.0f}")
    print(f"  Mean daily volume:       ${df['total_volume'].mean():,.0f}")
    print(f"  Max daily volume:        ${df['total_volume'].max():,.0f}")
    print(f"  Min daily volume:        ${df['total_volume'].min():,.0f}")

    print("\n--- Overround ---")
    overround = df["dem_yes_price"] + df["rep_yes_price"] - 1
    print(f"  Mean overround:   {overround.mean():.2%}")
    print(f"  Max overround:    {overround.max():.2%}")
    print(f"  Min overround:    {overround.min():.2%}")


print_summary_statistics(df)
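
The overround reported above is the amount by which the two Yes prices sum to more than 1. One common way to turn a pair of prices into probabilities that sum to 1 is proportional normalization, sketched below. The function name and the example prices are illustrative assumptions, and other normalization schemes exist.

```python
def implied_probabilities(dem_price: float, rep_price: float) -> tuple[float, float, float]:
    """Return (dem_prob, rep_prob, overround) using proportional normalization."""
    price_sum = dem_price + rep_price
    if price_sum <= 0:
        raise ValueError("prices must sum to a positive number")
    # Dividing by the price sum rescales both sides so they add up to 1
    return dem_price / price_sum, rep_price / price_sum, price_sum - 1.0


# Hypothetical closing prices for one day
dem_prob, rep_prob, overround = implied_probabilities(0.48, 0.54)
print(f"Overround:            {overround:.2%}")
print(f"Democratic implied p: {dem_prob:.2%}")
print(f"Republican implied p: {rep_prob:.2%}")
```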

1.3 Daily Price Changes

def analyze_price_changes(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate daily price changes and identify the largest moves.

    Returns
    -------
    pd.DataFrame
        Top 10 absolute daily price changes for the Republican contract.
    """
    df = df.copy()
    df["rep_change"] = df["rep_yes_price"].diff()
    df["dem_change"] = df["dem_yes_price"].diff()
    df["rep_abs_change"] = df["rep_change"].abs()

    top_moves = (
        df.nlargest(10, "rep_abs_change")[["date", "rep_yes_price", "rep_change", "total_volume"]]
        .reset_index(drop=True)
    )
    top_moves["date"] = top_moves["date"].dt.strftime("%Y-%m-%d")
    return top_moves


top_moves = analyze_price_changes(df)
print("\nTop 10 Largest Daily Price Moves (Republican Contract):")
print(top_moves.to_string(index=False))

Phase 2: Comparing Market Prices to Polling Averages

2.1 Price vs. Polls Visualization

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def plot_price_vs_polls(df: pd.DataFrame) -> None:
    """
    Create a dual-panel chart comparing market prices to polling averages.

    Top panel: Market-implied probabilities (from contract prices).
    Bottom panel: Polling averages.
    """
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

    # --- Top panel: Market prices ---
    # Normalize by the price sum so the two implied probabilities add to 100%
    price_sum = df["dem_yes_price"] + df["rep_yes_price"]
    dem_prob = df["dem_yes_price"] / price_sum
    rep_prob = df["rep_yes_price"] / price_sum

    ax1.plot(df["date"], rep_prob * 100, color="red", linewidth=1.5, label="Republican (market)")
    ax1.plot(df["date"], dem_prob * 100, color="blue", linewidth=1.5, label="Democratic (market)")
    ax1.axhline(50, color="gray", linestyle="--", alpha=0.5)
    ax1.set_ylabel("Implied Probability (%)", fontsize=12)
    ax1.set_title("Prediction Market: Implied Win Probabilities", fontsize=14)
    ax1.legend(loc="upper left")
    ax1.set_ylim(20, 80)

    # Annotate key events
    events_annotate = [
        ("2024-06-27", "Debate", 75),
        ("2024-07-21", "Biden\nwithdraws", 30),
        ("2024-09-10", "Harris-Trump\ndebate", 30),
    ]
    for date_str, label, y_pos in events_annotate:
        event_date = pd.Timestamp(date_str)
        ax1.axvline(event_date, color="green", linestyle=":", alpha=0.5)
        ax1.text(event_date, y_pos, label, fontsize=8, ha="center",
                 bbox=dict(boxstyle="round,pad=0.3", facecolor="lightyellow", alpha=0.8))

    # --- Bottom panel: Polling averages ---
    ax2.plot(df["date"], df["polling_avg_rep"], color="red", linewidth=1.5, label="Republican (polls)")
    ax2.plot(df["date"], df["polling_avg_dem"], color="blue", linewidth=1.5, label="Democratic (polls)")
    ax2.axhline(50, color="gray", linestyle="--", alpha=0.5)
    ax2.set_ylabel("Polling Average (%)", fontsize=12)
    ax2.set_xlabel("Date", fontsize=12)
    ax2.set_title("National Polling Averages", fontsize=14)
    ax2.legend(loc="upper left")
    ax2.set_ylim(35, 55)

    ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
    ax2.xaxis.set_major_locator(mdates.MonthLocator())

    fig.tight_layout()
    plt.savefig("price_vs_polls.png", dpi=150, bbox_inches="tight")
    plt.show()


plot_price_vs_polls(df)

2.2 Quantifying Market-Poll Divergence

def compute_divergence(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute the daily divergence between market-implied probability and
    polling average for the Republican nominee.

    Returns
    -------
    pd.DataFrame
        Original data plus divergence columns.
    """
    df = df.copy()
    # Normalize by the price sum (1 + overround) to remove the overround
    price_sum = df["dem_yes_price"] + df["rep_yes_price"]
    df["rep_market_prob"] = (df["rep_yes_price"] / price_sum) * 100
    df["divergence"] = df["rep_market_prob"] - df["polling_avg_rep"]

    print("--- Market-Poll Divergence (Republican) ---")
    print(f"  Mean divergence:     {df['divergence'].mean():+.1f} pp")
    print(f"  Max divergence:      {df['divergence'].max():+.1f} pp")
    print(f"  Min divergence:      {df['divergence'].min():+.1f} pp")
    print(f"  Std of divergence:   {df['divergence'].std():.1f} pp")
    print(f"  Correlation (r):     {df['rep_market_prob'].corr(df['polling_avg_rep']):.3f}")

    return df


df = compute_divergence(df)

2.3 Interpretation

The divergence analysis typically reveals several patterns:

Markets react faster than polls. After major events (Biden's withdrawal, the debate), the market moves within hours while polling averages take days to adjust because new polls must be fielded and weighted.
Markets can diverge persistently. In October 2024, the actual Polymarket data showed the Republican nominee priced roughly 5–10 percentage points above the win probabilities implied by poll-based forecasts. This sparked debate about whether the market was pricing in effects that polls miss, such as "shy voter" bias, or whether a few large traders were distorting the price.
Polls and markets measure different things. National polls estimate vote share (whom respondents say they will vote for), while markets price win probability (who will win the Electoral College). The two can diverge legitimately: a candidate polling at 51% nationally could have a win probability well above or below 51%, depending on how that support is distributed across states.
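
The claim that markets lead polls can be examined with a simple lead-lag check: correlate the market series with the poll series shifted by 0, 1, 2, ... days and see which shift fits best. The sketch below uses a toy random walk and a poll series that is, by construction, the market delayed by three days. The function name and the toy data are assumptions made for illustration. On real data, where both series trend, it is usually safer to difference them first.

```python
import numpy as np


def lagged_correlations(market, polls, max_lag: int = 10) -> dict[int, float]:
    """Correlation between market[t] and polls[t + k] for k = 0..max_lag."""
    market = np.asarray(market, dtype=float)
    polls = np.asarray(polls, dtype=float)
    n = len(market)
    result = {}
    for k in range(max_lag + 1):
        # Pair each market value with the poll value k days later
        result[k] = float(np.corrcoef(market[: n - k], polls[k:])[0, 1])
    return result


# Toy data: polls copy the market with a 3-day delay
rng = np.random.default_rng(0)
market = 50 + np.cumsum(rng.normal(0, 1, 120))
true_lag = 3
polls = np.concatenate([np.full(true_lag, market[0]), market[:-true_lag]])

corrs = lagged_correlations(market, polls, max_lag=10)
best_lag = max(corrs, key=corrs.get)
print(f"Best-fitting lag: {best_lag} days (r = {corrs[best_lag]:.3f})")
```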

Phase 3: Analyzing Volume Spikes Around Key Events

3.1 Volume Spike Detection

def detect_volume_spikes(
    df: pd.DataFrame,
    window: int = 7,
    threshold: float = 2.0,
) -> pd.DataFrame:
    """
    Detect days where trading volume exceeds a threshold multiple of the
    rolling average.

    Parameters
    ----------
    df : pd.DataFrame
        Market data with 'total_volume' and 'date' columns.
    window : int
        Rolling window size (days) for baseline volume.
    threshold : float
        Multiple of rolling average to flag as a spike.

    Returns
    -------
    pd.DataFrame
        Rows from df where volume exceeded the threshold.
    """
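    df = df.copy()
    # Baseline excludes the current day (shift by one) so that a spike does not
    # inflate the average it is compared against. The first row has no baseline.
    df["rolling_avg_volume"] = (
        df["total_volume"].shift(1).rolling(window, min_periods=1).mean()
    )
    df["volume_ratio"] = df["total_volume"] / df["rolling_avg_volume"]
    spikes = df[df["volume_ratio"] >= threshold].copy()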
    spikes["date_str"] = spikes["date"].dt.strftime("%Y-%m-%d")

    print(f"Found {len(spikes)} volume spike days (>= {threshold}x rolling average):")
    for _, row in spikes.iterrows():
        print(
            f"  {row['date_str']}  |  Volume: ${row['total_volume']:>12,}  |  "
            f"Ratio: {row['volume_ratio']:.1f}x  |  "
            f"Rep price: ${row['rep_yes_price']:.2f}"
        )

    return spikes


spikes = detect_volume_spikes(df, window=7, threshold=2.0)
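
One design choice in spike detection is whether the rolling baseline includes the day being tested. The sketch below, on made-up volumes, shows how much the choice changes the ratio for the same busy day. The numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Made-up volumes: seven quiet days followed by one busy day
volume = pd.Series([100, 100, 100, 100, 100, 100, 100, 300], dtype=float)
window = 7

# Baseline that includes the current day in its own average
inclusive = volume.rolling(window, min_periods=1).mean()
# Baseline built only from the days before the current one
exclusive = volume.shift(1).rolling(window, min_periods=1).mean()

ratio_inclusive = volume / inclusive
ratio_exclusive = volume / exclusive

print(f"Last-day ratio, baseline includes today: {ratio_inclusive.iloc[-1]:.2f}x")
print(f"Last-day ratio, baseline excludes today: {ratio_exclusive.iloc[-1]:.2f}x")
```

The busy day is three times the quiet level, but the inclusive baseline reports a ratio of only about 2.33 because the spike raises its own average. A fixed threshold therefore flags fewer days when the baseline includes the current day.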

3.2 Event-Window Analysis

def event_window_analysis(
    df: pd.DataFrame,
    event_date: str,
    event_name: str,
    window_before: int = 3,
    window_after: int = 3,
) -> None:
    """
    Analyze price and volume in a window around a specific event.

    Parameters
    ----------
    df : pd.DataFrame
        Market data.
    event_date : str
        Date of the event (YYYY-MM-DD).
    event_name : str
        Human-readable event name.
    window_before : int
        Days before the event to include.
    window_after : int
        Days after the event to include.
    """
    event_ts = pd.Timestamp(event_date)
    start = event_ts - timedelta(days=window_before)
    end = event_ts + timedelta(days=window_after)
    window_df = df[(df["date"] >= start) & (df["date"] <= end)].copy()

    if window_df.empty:
        print(f"No data found around {event_date}.")
        return

    pre_event = window_df[window_df["date"] < event_ts]
    post_event = window_df[window_df["date"] >= event_ts]

    pre_price = pre_event["rep_yes_price"].iloc[-1] if len(pre_event) > 0 else None
    post_price = post_event["rep_yes_price"].iloc[0] if len(post_event) > 0 else None

    print(f"\n{'=' * 60}")
    print(f"EVENT WINDOW: {event_name} ({event_date})")
    print(f"{'=' * 60}")

    if pre_price is not None and post_price is not None:
        change = post_price - pre_price
        print(f"  Price before event: ${pre_price:.4f}")
        print(f"  Price on event day: ${post_price:.4f}")
        print(f"  Price change:       ${change:+.4f} ({change * 100:+.1f} pp)")

    pre_vol = pre_event["total_volume"].mean() if len(pre_event) > 0 else 0
    post_vol = post_event["total_volume"].mean() if len(post_event) > 0 else 0
    print(f"  Avg volume before:  ${pre_vol:,.0f}")
    print(f"  Avg volume after:   ${post_vol:,.0f}")
    if pre_vol > 0:
        print(f"  Volume multiplier:  {post_vol / pre_vol:.1f}x")

    print(f"\n  Day-by-day detail:")
    for _, row in window_df.iterrows():
        marker = " <-- EVENT" if row["date"] == event_ts else ""
        print(
            f"    {row['date'].strftime('%Y-%m-%d')}  |  "
            f"Rep: ${row['rep_yes_price']:.4f}  |  "
            f"Vol: ${row['total_volume']:>12,}{marker}"
        )


# Analyze key events
event_window_analysis(df, "2024-06-27", "First Presidential Debate")
event_window_analysis(df, "2024-07-21", "Biden Withdraws from Race")
event_window_analysis(df, "2024-09-10", "Harris-Trump Debate")

3.3 Volume-Price Relationship

def plot_volume_price_scatter(df: pd.DataFrame) -> None:
    """
    Scatter plot of daily volume vs. absolute daily price change.
    """
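    df = df.copy()
    df["abs_price_change"] = df["rep_yes_price"].diff().abs()
    # Drop only the first row, whose price change is undefined
    df = df.dropna(subset=["abs_price_change"])
    # Color points by days elapsed since the first plotted observation
    days_elapsed = (df["date"] - df["date"].min()).dt.days

    fig, ax = plt.subplots(figsize=(10, 6))
    scatter = ax.scatter(
        df["abs_price_change"],
        df["total_volume"],
        alpha=0.5,
        c=days_elapsed,
        cmap="viridis",
        edgecolors="gray",
        linewidth=0.5,
    )
    fig.colorbar(scatter, ax=ax, label="Days since first observation")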
    ax.set_xlabel("Absolute Daily Price Change ($)", fontsize=12)
    ax.set_ylabel("Total Daily Volume ($)", fontsize=12)
    ax.set_title("Volume vs. Price Change: Do Big Moves Come with Big Volume?", fontsize=14)

    # Fit and plot trend line
    z = np.polyfit(df["abs_price_change"], df["total_volume"], 1)
    p = np.poly1d(z)
    x_line = np.linspace(0, df["abs_price_change"].max(), 100)
    ax.plot(x_line, p(x_line), "--", color="red", alpha=0.7, label="Linear trend")

    corr = df["abs_price_change"].corr(df["total_volume"])
    ax.text(
        0.95, 0.95,
        f"r = {corr:.3f}",
        transform=ax.transAxes,
        ha="right", va="top",
        fontsize=12,
        bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.8),
    )

    ax.legend()
    fig.tight_layout()
    plt.savefig("volume_vs_price_change.png", dpi=150, bbox_inches="tight")
    plt.show()


plot_volume_price_scatter(df)
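
A handful of event days with very large volume can dominate a Pearson correlation, so a rank-based (Spearman) correlation is a reasonable robustness check. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks. The helper name and the six data points are hypothetical.

```python
import pandas as pd


def pearson_and_spearman(x, y) -> tuple[float, float]:
    """Return (Pearson r, Spearman rho) for two equal-length sequences."""
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    pearson = x.corr(y)
    # Spearman's rho is the Pearson correlation of the rank-transformed data
    spearman = x.rank().corr(y.rank())
    return float(pearson), float(spearman)


# Hypothetical data: five ordinary days plus one extreme event day
abs_change = [0.002, 0.005, 0.008, 0.011, 0.015, 0.080]
volume = [1.0e6, 1.2e6, 1.1e6, 1.6e6, 1.5e6, 9.0e6]

pearson, spearman = pearson_and_spearman(abs_change, volume)
print(f"Pearson r:    {pearson:.3f}")
print(f"Spearman rho: {spearman:.3f}")
```

If the two coefficients differ substantially, the linear trend line is probably being driven by a few extreme days rather than by the bulk of the data.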

Findings and Recommendations

Key Findings

Market prices reacted to major events within a single trading day, while polling averages took several days to adjust. In the synthetic dataset this lag is built in by construction (the polling series is a smoothed version of the market series), so it illustrates, rather than confirms, the speed advantage reported for real prediction markets.
Volume spiked to roughly 2–5x its recent baseline around key events, consistent with news driving both informed and speculative trading. In the synthetic data the size of each spike is random, so the exact multiples and the ranking of events depend on the seed.
Persistent market-poll divergence in October 2024 (market more bullish on the Republican nominee than polls) was a real and much-discussed phenomenon. Possible explanations include whale effects, the market pricing Electoral College dynamics that national polls miss, or the market incorporating information not captured by polls.
The overround was consistently small (typically 1–3%). In the synthetic data this is a modeling choice; in a real market, an overround this small would suggest a competitive market-making environment.
Volume and price changes were positively correlated (r typically 0.3–0.5, depending on the seed), consistent with information-driven trading: big news causes both large price moves and high volume.

Recommendations for Analysts

Use market prices as a complement to polls, not a replacement. When they diverge, investigate why rather than assuming one is wrong.
Pay attention to volume. A price move on low volume is less informative than one on high volume.
Watch for whale effects in crypto-native markets. Large wallets can move prices, especially in the short term.
Normalize for overround before interpreting prices as probabilities.

Discussion Questions

The Polymarket presidential market showed a persistent divergence from polling averages in October 2024. What are three possible explanations? How would you test each one?
Should regulators be concerned that a small number of large traders can influence the price in a prediction market? What safeguards could platforms implement?
The market reacted within hours to Biden's withdrawal. Does this speed advantage have practical value for decision-makers, or is it merely interesting?
How would you design a study to test whether Polymarket's election prices were well-calibrated across many elections? What data would you need?
Prediction markets on elections have been criticized for potentially influencing voter behavior (e.g., discouraging turnout for the "losing" side). Is this a legitimate concern? How would you investigate it empirically?
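
As a starting point for the calibration question, the sketch below computes a Brier score and a simple binned calibration table. The function names and the four example forecasts are hypothetical. A real study would need resolved prices from many independent markets, which this case study does not provide.

```python
import numpy as np


def brier_score(probs, outcomes) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))


def calibration_table(probs, outcomes, n_bins: int = 5) -> list[tuple]:
    """Rows of (bin_low, bin_high, mean_forecast, observed_rate, count)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin ids from 0 to n_bins - 1
    bin_ids = np.digitize(probs, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            rows.append((
                float(edges[b]), float(edges[b + 1]),
                float(probs[mask].mean()), float(outcomes[mask].mean()),
                int(mask.sum()),
            ))
    return rows


# Hypothetical forecasts and resolved outcomes (1 = event occurred)
forecasts = [0.1, 0.1, 0.9, 0.9]
resolved = [0, 0, 1, 1]

print(f"Brier score: {brier_score(forecasts, resolved):.3f}")
for low, high, mean_p, rate, count in calibration_table(forecasts, resolved):
    print(f"  [{low:.1f}, {high:.1f}]  forecast {mean_p:.2f}  observed {rate:.2f}  n={count}")
```

In a well-calibrated market, the observed rate in each bin is close to the mean forecast for that bin. A lower Brier score is better, and an uninformative forecast of 0.5 on every event scores 0.25.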

Mini-Project Extension

Build a Real-Time Election Dashboard

Using the functions developed in this case study, build a Python script or Jupyter notebook that:

Generates (or fetches) daily election market data.
Computes implied probabilities and overround.
Computes the market-poll divergence.
Detects volume spikes automatically.
Produces a single multi-panel dashboard figure with:
- Panel 1: Price/probability time series
- Panel 2: Polling averages
- Panel 3: Market-poll divergence
- Panel 4: Daily volume with spike markers
Saves the dashboard as a high-resolution PNG.

Bonus: Add a command-line argument that lets the user specify an event date, so that the script automatically zooms into a ±7-day window around that event.
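
One possible starting point for the bonus task is sketched below using `argparse` from the standard library. The option names `--event-date` and `--days`, the helper `zoom_window`, and the date-only stand-in frame are assumptions made for this sketch. In the full dashboard the frame would come from the data-generation step.

```python
import argparse

import pandas as pd


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options; pass a list to `argv` for testing."""
    parser = argparse.ArgumentParser(description="Zoom into an event window.")
    parser.add_argument("--event-date", default="2024-07-21",
                        help="Event date in YYYY-MM-DD format")
    parser.add_argument("--days", type=int, default=7,
                        help="Half-width of the window in days")
    return parser.parse_args(argv)


def zoom_window(df: pd.DataFrame, event_date: str, days: int = 7) -> pd.DataFrame:
    """Return the rows within `days` days on either side of `event_date`."""
    event_ts = pd.Timestamp(event_date)
    half_width = pd.Timedelta(days=days)
    # Series.between includes both endpoints by default
    mask = df["date"].between(event_ts - half_width, event_ts + half_width)
    return df.loc[mask]


# Stand-in frame containing only the date column
toy = pd.DataFrame({"date": pd.date_range("2024-06-01", "2024-11-06", freq="D")})

# Equivalent to running: python dashboard.py --event-date 2024-09-10
args = parse_args(["--event-date", "2024-09-10"])
window = zoom_window(toy, args.event_date, args.days)
print(f"{len(window)} rows from {window['date'].min().date()} "
      f"to {window['date'].max().date()}")
```

Passing the argument list explicitly, as in the example, keeps the parser testable. Calling `parse_args()` with no argument reads the real command line instead.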

End of Case Study 1