Case Study 1: Tracking the 2024 US Presidential Election on Polymarket
Executive Summary
The 2024 US presidential election generated the highest-volume political prediction market in history. Polymarket, a blockchain-based prediction platform, hosted contracts on the presidential race that accumulated over \$1 billion in cumulative trading volume. In this case study you will explore a synthetic but realistic dataset modeled on the actual Polymarket data, analyze how prices evolved over the campaign, compare market signals to polling averages, and identify volume spikes around key events.
By the end of this case study you will be able to:
- Load and explore prediction-market time-series data in Python.
- Calculate implied probabilities from raw contract prices.
- Visualize price trajectories and trading volume.
- Compare prediction-market signals with polling data.
- Identify and interpret the market's reaction to major news events.
Background
Polymarket
Polymarket is a decentralized prediction market platform built on the Polygon blockchain. Users deposit cryptocurrency (USDC) and trade binary-outcome contracts that resolve based on real-world events. Polymarket gained mainstream attention in 2024 as its election markets attracted both retail traders and institutional-scale participants.
Key features of Polymarket:
- Contract structure: Binary (Yes/No), paying \$1.00 USDC if the event occurs.
- Fee model: Polymarket charged no trading fees on these markets in 2024; the bid-ask spread accrues to liquidity providers rather than to the platform.
- Resolution: Contracts are resolved by UMA's optimistic oracle, a decentralized dispute-resolution mechanism.
- Accessibility: US persons have been restricted from trading since a 2022 CFTC settlement, though enforcement is limited.
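
The binary payoff described in the contract-structure bullet can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names `yes_contract_pnl` and `expected_pnl` and the example price of 0.54 are assumptions, and the calculation ignores spread and transaction costs.

```python
def yes_contract_pnl(price: float, shares: float, event_occurs: bool) -> float:
    """Profit or loss (in USDC) from buying `shares` Yes contracts at `price`.

    Assumes each contract pays 1.00 USDC if the event occurs and 0 otherwise,
    and ignores spread and transaction costs.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    payout = 1.0 if event_occurs else 0.0
    return shares * (payout - price)


def expected_pnl(price: float, shares: float, believed_prob: float) -> float:
    """Expected profit for a trader who believes the event has `believed_prob`."""
    return shares * (believed_prob - price)


win = yes_contract_pnl(0.54, 100, event_occurs=True)
loss = yes_contract_pnl(0.54, 100, event_occurs=False)
print(f"If the event occurs:  {win:+.2f} USDC")
print(f"If it does not occur: {loss:+.2f} USDC")
print(f"Expected P&L at a 60% belief: {expected_pnl(0.54, 100, 0.60):+.2f} USDC")
```

The expected profit is zero exactly when the trader's belief equals the price, which is why a Yes price is commonly read as an implied probability.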
The 2024 Presidential Race
The 2024 US presidential election featured several dramatic turns:
| Date | Event |
|---|---|
| March 2024 | Both presumptive nominees effectively locked up their party nominations |
| June 27, 2024 | First presidential debate; widespread concern about incumbent performance |
| July 21, 2024 | President Biden announces withdrawal from the race |
| July 22, 2024 | Vice President Harris emerges as Democratic frontrunner |
| August 19–22, 2024 | Democratic National Convention |
| September 10, 2024 | Harris-Trump debate |
| October 2024 | Late-campaign period with intensifying poll and market divergence |
| November 5, 2024 | Election Day |
Data Dictionary
The dataset used in this case study contains daily observations from June 1 to November 6, 2024.
| Column | Type | Description |
|---|---|---|
| date | datetime | Trading day |
| dem_yes_price | float | Closing price of the "Democratic nominee wins" Yes contract |
| rep_yes_price | float | Closing price of the "Republican nominee wins" Yes contract |
| dem_volume | int | Daily trading volume for the Democratic contract (in USD) |
| rep_volume | int | Daily trading volume for the Republican contract (in USD) |
| total_volume | int | Sum of dem_volume and rep_volume |
| polling_avg_dem | float | National polling average for the Democratic nominee (%) |
| polling_avg_rep | float | National polling average for the Republican nominee (%) |
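
A data dictionary like this one can double as a lightweight schema check before analysis begins. The following is a minimal sketch, not part of the original workflow: the helper `validate_schema`, the `EXPECTED_KINDS` mapping, and the two-row `sample` frame are all hypothetical names introduced for illustration.

```python
import pandas as pd

# Expected NumPy dtype "kind" per column, taken from the data dictionary:
# 'M' = datetime, 'f' = float, 'i' = signed integer.
EXPECTED_KINDS = {
    "date": "M",
    "dem_yes_price": "f",
    "rep_yes_price": "f",
    "dem_volume": "i",
    "rep_volume": "i",
    "total_volume": "i",
    "polling_avg_dem": "f",
    "polling_avg_rep": "f",
}


def validate_schema(frame: pd.DataFrame) -> list[str]:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for col, kind in EXPECTED_KINDS.items():
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
        elif frame[col].dtype.kind != kind:
            problems.append(
                f"{col}: expected kind '{kind}', got '{frame[col].dtype.kind}'"
            )
    if not problems:
        # Cross-column checks only make sense once every column is present.
        if not (frame["total_volume"] == frame["dem_volume"] + frame["rep_volume"]).all():
            problems.append("total_volume != dem_volume + rep_volume")
        for col in ("dem_yes_price", "rep_yes_price"):
            if not frame[col].between(0, 1).all():
                problems.append(f"{col}: price outside [0, 1]")
    return problems


sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    "dem_yes_price": [0.48, 0.47],
    "rep_yes_price": [0.54, 0.55],
    "dem_volume": [400_000, 500_000],
    "rep_volume": [600_000, 700_000],
    "total_volume": [1_000_000, 1_200_000],
    "polling_avg_dem": [44.0, 44.2],
    "polling_avg_rep": [46.0, 45.9],
})
print(validate_schema(sample) or "schema OK")
```

The same function can be called on the generated dataset in Phase 1 to confirm that it matches the dictionary.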
Phase 1: Data Exploration
1.1 Generating the Synthetic Dataset
Since real Polymarket data requires API access and may change, we generate a synthetic dataset that captures the key dynamics of the actual 2024 race.
```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta


def generate_election_data(seed: int = 2024) -> pd.DataFrame:
    """
    Generate synthetic 2024 presidential election prediction-market data.

    The data is modeled on the actual dynamics of the Polymarket presidential
    market, with key events reflected as price shocks.

    Parameters
    ----------
    seed : int
        Random seed for reproducibility.

    Returns
    -------
    pd.DataFrame
        Daily market data from June 1 to November 6, 2024.
    """
    rng = np.random.default_rng(seed)
    start_date = datetime(2024, 6, 1)
    end_date = datetime(2024, 11, 6)
    dates = pd.date_range(start_date, end_date, freq="D")
    n_days = len(dates)

    # Start with Republican leading slightly
    rep_price = 0.54
    rep_prices = [rep_price]

    # Key event impacts (day index with June 1 = 0, price shock to rep_price)
    events = {
        26: ("Biden debate stumble", +0.08),      # June 27
        50: ("Biden withdraws", -0.06),           # July 21
        51: ("Harris enters race", -0.04),        # July 22
        79: ("Democratic Convention", -0.03),     # Aug 19
        101: ("Harris-Trump debate", -0.04),      # Sept 10
        130: ("Early October shift", +0.06),      # Oct 9
        145: ("Late October tightening", +0.03),  # Oct 24
    }

    # Random walk with a one-off shock on each event day
    for day_idx in range(1, n_days):
        shock = events[day_idx][1] if day_idx in events else 0.0
        noise = rng.normal(0, 0.012)
        rep_price = np.clip(rep_prices[-1] + noise + shock, 0.15, 0.85)
        rep_prices.append(round(float(rep_price), 4))

    rep_prices = np.array(rep_prices)
    # The 1.02 target builds in an overround of roughly 2%
    dem_prices = np.clip(1.02 - rep_prices + rng.normal(0, 0.005, n_days), 0.15, 0.85)
    dem_prices = np.round(dem_prices, 4)

    # Volume: higher around events, generally increasing toward election
    base_volume = np.linspace(500_000, 5_000_000, n_days)
    volume_noise = rng.integers(100_000, 1_000_000, size=n_days)
    event_volume_boost = np.zeros(n_days)
    for event_day in events:
        if event_day < n_days:
            # Boost from the day before the event through two days after
            for offset in range(-1, 3):
                idx = event_day + offset
                if 0 <= idx < n_days:
                    event_volume_boost[idx] += rng.integers(2_000_000, 8_000_000)

    total_volume = (base_volume + volume_noise + event_volume_boost).astype(int)
    dem_volume = (total_volume * rng.uniform(0.4, 0.6, n_days)).astype(int)
    rep_volume = total_volume - dem_volume

    # Polling: smoother, slower to react, slightly different from market.
    # A trailing 7-day mean lags the market and avoids the zero-padding edge
    # artifacts that np.convolve(..., mode="same") produces at both ends.
    smoothed = pd.Series(rep_prices * 100).rolling(7, min_periods=1).mean().to_numpy()
    polling_rep = smoothed + rng.normal(0, 1.0, n_days) - 2.0
    polling_dem = 100 - polling_rep + rng.normal(0, 0.5, n_days) - 4.0
    polling_rep = np.clip(polling_rep, 35, 55)
    polling_dem = np.clip(polling_dem, 35, 55)

    return pd.DataFrame({
        "date": dates,
        "dem_yes_price": dem_prices,
        "rep_yes_price": rep_prices,
        "dem_volume": dem_volume,
        "rep_volume": rep_volume,
        "total_volume": total_volume,
        "polling_avg_dem": np.round(polling_dem, 1),
        "polling_avg_rep": np.round(polling_rep, 1),
    })


df = generate_election_data()
print(f"Dataset shape: {df.shape}")
print(f"Date range: {df['date'].min().date()} to {df['date'].max().date()}")
print("\nFirst 5 rows:")
print(df.head().to_string(index=False))
print("\nLast 5 rows:")
print(df.tail().to_string(index=False))
```
1.2 Summary Statistics
```python
def print_summary_statistics(df: pd.DataFrame) -> None:
    """Print key summary statistics for the election dataset."""
    print("=" * 60)
    print("SUMMARY STATISTICS")
    print("=" * 60)

    print("\n--- Price Statistics ---")
    for col in ["dem_yes_price", "rep_yes_price"]:
        label = "Democratic" if "dem" in col else "Republican"
        print(f"\n{label} Yes Price:")
        print(f"  Mean:   ${df[col].mean():.4f}")
        print(f"  Median: ${df[col].median():.4f}")
        print(f"  Min:    ${df[col].min():.4f}")
        print(f"  Max:    ${df[col].max():.4f}")
        print(f"  Std:    ${df[col].std():.4f}")

    print("\n--- Volume Statistics ---")
    print(f"  Total cumulative volume: ${df['total_volume'].sum():,.0f}")
    print(f"  Mean daily volume:       ${df['total_volume'].mean():,.0f}")
    print(f"  Max daily volume:        ${df['total_volume'].max():,.0f}")
    print(f"  Min daily volume:        ${df['total_volume'].min():,.0f}")

    # Overround: amount by which the two Yes prices sum to more than $1
    print("\n--- Overround ---")
    overround = df["dem_yes_price"] + df["rep_yes_price"] - 1
    print(f"  Mean overround: {overround.mean():.2%}")
    print(f"  Max overround:  {overround.max():.2%}")
    print(f"  Min overround:  {overround.min():.2%}")


print_summary_statistics(df)
```
1.3 Daily Price Changes
```python
def analyze_price_changes(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate daily price changes and identify the largest moves.

    Returns
    -------
    pd.DataFrame
        Top 10 absolute daily price changes for the Republican contract.
    """
    df = df.copy()
    df["rep_change"] = df["rep_yes_price"].diff()
    df["dem_change"] = df["dem_yes_price"].diff()
    df["rep_abs_change"] = df["rep_change"].abs()

    top_moves = (
        df.nlargest(10, "rep_abs_change")[["date", "rep_yes_price", "rep_change", "total_volume"]]
        .reset_index(drop=True)
    )
    top_moves["date"] = top_moves["date"].dt.strftime("%Y-%m-%d")
    return top_moves


top_moves = analyze_price_changes(df)
print("\nTop 10 Largest Daily Price Moves (Republican Contract):")
print(top_moves.to_string(index=False))
```
Phase 2: Comparing Market Prices to Polling Averages
2.1 Price vs. Polls Visualization
```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def plot_price_vs_polls(df: pd.DataFrame) -> None:
    """
    Create a dual-panel chart comparing market prices to polling averages.

    Top panel: Market-implied probabilities (from contract prices).
    Bottom panel: Polling averages.
    """
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

    # --- Top panel: Market prices ---
    # Divide by the sum of both Yes prices to remove the overround
    price_sum = df["dem_yes_price"] + df["rep_yes_price"]
    dem_prob = df["dem_yes_price"] / price_sum
    rep_prob = df["rep_yes_price"] / price_sum

    ax1.plot(df["date"], rep_prob * 100, color="red", linewidth=1.5, label="Republican (market)")
    ax1.plot(df["date"], dem_prob * 100, color="blue", linewidth=1.5, label="Democratic (market)")
    ax1.axhline(50, color="gray", linestyle="--", alpha=0.5)
    ax1.set_ylabel("Implied Probability (%)", fontsize=12)
    ax1.set_title("Prediction Market: Implied Win Probabilities", fontsize=14)
    ax1.legend(loc="upper left")
    ax1.set_ylim(20, 80)

    # Annotate key events: (date, label, vertical position of the label)
    events_annotate = [
        ("2024-06-27", "Debate", 75),
        ("2024-07-21", "Biden\nwithdraws", 30),
        ("2024-09-10", "Harris-Trump\ndebate", 30),
    ]
    for date_str, label, y_pos in events_annotate:
        event_date = pd.Timestamp(date_str)
        ax1.axvline(event_date, color="green", linestyle=":", alpha=0.5)
        ax1.text(event_date, y_pos, label, fontsize=8, ha="center",
                 bbox=dict(boxstyle="round,pad=0.3", facecolor="lightyellow", alpha=0.8))

    # --- Bottom panel: Polling averages ---
    ax2.plot(df["date"], df["polling_avg_rep"], color="red", linewidth=1.5, label="Republican (polls)")
    ax2.plot(df["date"], df["polling_avg_dem"], color="blue", linewidth=1.5, label="Democratic (polls)")
    ax2.axhline(50, color="gray", linestyle="--", alpha=0.5)
    ax2.set_ylabel("Polling Average (%)", fontsize=12)
    ax2.set_xlabel("Date", fontsize=12)
    ax2.set_title("National Polling Averages", fontsize=14)
    ax2.legend(loc="upper left")
    ax2.set_ylim(35, 55)

    ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
    ax2.xaxis.set_major_locator(mdates.MonthLocator())

    fig.tight_layout()
    plt.savefig("price_vs_polls.png", dpi=150, bbox_inches="tight")
    plt.show()


plot_price_vs_polls(df)
```
2.2 Quantifying Market-Poll Divergence
```python
def compute_divergence(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute the daily divergence between market-implied probability and
    polling average for the Republican nominee.

    Returns
    -------
    pd.DataFrame
        Original data plus divergence columns.
    """
    df = df.copy()
    # Normalize by the sum of both Yes prices to remove the overround
    price_sum = df["dem_yes_price"] + df["rep_yes_price"]
    df["rep_market_prob"] = (df["rep_yes_price"] / price_sum) * 100
    df["divergence"] = df["rep_market_prob"] - df["polling_avg_rep"]

    print("--- Market-Poll Divergence (Republican) ---")
    print(f"  Mean divergence:   {df['divergence'].mean():+.1f} pp")
    print(f"  Max divergence:    {df['divergence'].max():+.1f} pp")
    print(f"  Min divergence:    {df['divergence'].min():+.1f} pp")
    print(f"  Std of divergence: {df['divergence'].std():.1f} pp")
    print(f"  Correlation (r):   {df['rep_market_prob'].corr(df['polling_avg_rep']):.3f}")
    return df


df = compute_divergence(df)
```
2.3 Interpretation
The divergence analysis typically reveals several patterns:
- Markets react faster than polls. After major events (Biden's withdrawal, the debate), the market moves within hours while polling averages take days to adjust because new polls must be fielded and weighted.
- Markets can diverge persistently. In October 2024, the actual Polymarket data showed the Republican nominee priced 5–10 percentage points higher than polling averages suggested. This sparked debate about whether the market was incorporating "shy voter" effects, or whether large traders were biasing the price.
- Polls and markets measure different things. Polls measure vote share (who will people vote for), while markets measure win probability (who will win the Electoral College). These can diverge legitimately due to the electoral-college structure.
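
The claim that polls trail the market can be checked by asking at which lag the polling series correlates best with earlier market prices. The sketch below is one simple way to do this, under stated assumptions: `estimate_poll_lag` is a hypothetical helper, the toy series is constructed so that polls are exactly the market delayed by four days, and correlating levels of trending series tends to overstate the strength of the relationship, so the result is a rough diagnostic rather than a formal test.

```python
import numpy as np
import pandas as pd


def estimate_poll_lag(
    market: pd.Series, polls: pd.Series, max_lag: int = 14
) -> tuple[int, float]:
    """Return (lag, correlation) for the lag in days, from 0 to `max_lag`,
    at which today's polls correlate best with the market `lag` days ago."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # shift(lag) aligns the market value from `lag` days ago with today
        corr = market.shift(lag).corr(polls)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Toy data: a random-walk "market" and "polls" that repeat it 4 days later
toy_rng = np.random.default_rng(0)
toy_market = pd.Series(0.5 + toy_rng.normal(0, 0.012, 200).cumsum())
toy_polls = toy_market.shift(4)

lag, corr = estimate_poll_lag(toy_market, toy_polls)
print(f"Best-fitting lag: {lag} days (r = {corr:.3f})")
```

On the case-study data, the same helper could be called with `df["rep_market_prob"]` and `df["polling_avg_rep"]` after `compute_divergence` has been run.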
Phase 3: Analyzing Volume Spikes Around Key Events
3.1 Volume Spike Detection
```python
def detect_volume_spikes(
    df: pd.DataFrame,
    window: int = 7,
    threshold: float = 2.0,
) -> pd.DataFrame:
    """
    Detect days where trading volume exceeds a threshold multiple of the
    rolling average.

    Parameters
    ----------
    df : pd.DataFrame
        Market data with 'total_volume' and 'date' columns.
    window : int
        Rolling window size (days) for baseline volume.
    threshold : float
        Multiple of rolling average to flag as a spike.

    Returns
    -------
    pd.DataFrame
        Rows from df where volume exceeded the threshold.
    """
    df = df.copy()
    # Trailing mean that includes the current day
    df["rolling_avg_volume"] = df["total_volume"].rolling(window, min_periods=1).mean()
    df["volume_ratio"] = df["total_volume"] / df["rolling_avg_volume"]

    spikes = df[df["volume_ratio"] >= threshold].copy()
    spikes["date_str"] = spikes["date"].dt.strftime("%Y-%m-%d")

    print(f"Found {len(spikes)} volume spike days (>= {threshold}x rolling average):")
    for _, row in spikes.iterrows():
        print(
            f"  {row['date_str']} | Volume: ${int(row['total_volume']):>12,} | "
            f"Ratio: {row['volume_ratio']:.1f}x | "
            f"Rep price: ${row['rep_yes_price']:.2f}"
        )
    return spikes


spikes = detect_volume_spikes(df, window=7, threshold=2.0)
```
3.2 Event-Window Analysis
```python
def event_window_analysis(
    df: pd.DataFrame,
    event_date: str,
    event_name: str,
    window_before: int = 3,
    window_after: int = 3,
) -> None:
    """
    Analyze price and volume in a window around a specific event.

    Parameters
    ----------
    df : pd.DataFrame
        Market data.
    event_date : str
        Date of the event (YYYY-MM-DD).
    event_name : str
        Human-readable event name.
    window_before : int
        Days before the event to include.
    window_after : int
        Days after the event to include.
    """
    event_ts = pd.Timestamp(event_date)
    start = event_ts - timedelta(days=window_before)
    end = event_ts + timedelta(days=window_after)

    window_df = df[(df["date"] >= start) & (df["date"] <= end)].copy()
    if window_df.empty:
        print(f"No data found around {event_date}.")
        return

    # The event day itself counts as part of the post-event period
    pre_event = window_df[window_df["date"] < event_ts]
    post_event = window_df[window_df["date"] >= event_ts]

    pre_price = pre_event["rep_yes_price"].iloc[-1] if len(pre_event) > 0 else None
    post_price = post_event["rep_yes_price"].iloc[0] if len(post_event) > 0 else None

    print(f"\n{'=' * 60}")
    print(f"EVENT WINDOW: {event_name} ({event_date})")
    print(f"{'=' * 60}")

    if pre_price is not None and post_price is not None:
        change = post_price - pre_price
        print(f"  Price before event: ${pre_price:.4f}")
        print(f"  Price on event day: ${post_price:.4f}")
        print(f"  Price change:       ${change:+.4f} ({change * 100:+.1f} pp)")

    pre_vol = pre_event["total_volume"].mean() if len(pre_event) > 0 else 0
    post_vol = post_event["total_volume"].mean() if len(post_event) > 0 else 0
    print(f"  Avg volume before:  ${pre_vol:,.0f}")
    print(f"  Avg volume after:   ${post_vol:,.0f}")
    if pre_vol > 0:
        print(f"  Volume multiplier:  {post_vol / pre_vol:.1f}x")

    print("\n  Day-by-day detail:")
    for _, row in window_df.iterrows():
        marker = " <-- EVENT" if row["date"] == event_ts else ""
        print(
            f"    {row['date'].strftime('%Y-%m-%d')} | "
            f"Rep: ${row['rep_yes_price']:.4f} | "
            f"Vol: ${int(row['total_volume']):>12,}{marker}"
        )


# Analyze key events
event_window_analysis(df, "2024-06-27", "First Presidential Debate")
event_window_analysis(df, "2024-07-21", "Biden Withdraws from Race")
event_window_analysis(df, "2024-09-10", "Harris-Trump Debate")
```
3.3 Volume-Price Relationship
```python
def plot_volume_price_scatter(df: pd.DataFrame) -> None:
    """
    Scatter plot of daily volume vs. absolute daily price change.
    """
    df = df.copy()
    df["abs_price_change"] = df["rep_yes_price"].diff().abs()
    # Only the first row lacks a price change; drop it without touching other columns
    df = df.dropna(subset=["abs_price_change"])

    # Color points by elapsed days (casting datetimes with astype(int) is not portable)
    days_elapsed = (df["date"] - df["date"].min()).dt.days

    fig, ax = plt.subplots(figsize=(10, 6))
    scatter = ax.scatter(
        df["abs_price_change"],
        df["total_volume"],
        alpha=0.5,
        c=days_elapsed,
        cmap="viridis",
        edgecolors="gray",
        linewidth=0.5,
    )
    fig.colorbar(scatter, ax=ax, label="Days since start of sample")
    ax.set_xlabel("Absolute Daily Price Change ($)", fontsize=12)
    ax.set_ylabel("Total Daily Volume ($)", fontsize=12)
    ax.set_title("Volume vs. Price Change: Do Big Moves Come with Big Volume?", fontsize=14)

    # Fit and plot trend line
    z = np.polyfit(df["abs_price_change"], df["total_volume"], 1)
    p = np.poly1d(z)
    x_line = np.linspace(0, df["abs_price_change"].max(), 100)
    ax.plot(x_line, p(x_line), "--", color="red", alpha=0.7, label="Linear trend")

    corr = df["abs_price_change"].corr(df["total_volume"])
    ax.text(
        0.95, 0.95,
        f"r = {corr:.3f}",
        transform=ax.transAxes,
        ha="right", va="top",
        fontsize=12,
        bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.8),
    )
    ax.legend()
    fig.tight_layout()
    plt.savefig("volume_vs_price_change.png", dpi=150, bbox_inches="tight")
    plt.show()


plot_volume_price_scatter(df)
```
Findings and Recommendations
Key Findings
- Market prices reacted to major events within a single trading day, while polling averages took 3–7 days to adjust. This is consistent with the speed advantage usually attributed to prediction markets, although in this synthetic dataset the polling lag is built in by construction.
- Volume spiked 2–5x around key events, indicating that news drives both information and speculative trading. The largest volume spikes coincided with Biden's withdrawal and the September debate.
- Persistent market-poll divergence in October 2024 (market more bullish on the Republican nominee than polls) was a real and much-discussed phenomenon. Possible explanations include whale effects, the market pricing Electoral College dynamics that national polls miss, or the market incorporating information not captured by polls.
- The overround was consistently small (typically 1–3%), suggesting a competitive market-making environment.
- Volume and price changes were positively correlated (r typically 0.3–0.5), consistent with information-driven trading: big news causes both large price moves and high volume.
Recommendations for Analysts
- Use market prices as a complement to polls, not a replacement. When they diverge, investigate why rather than assuming one is wrong.
- Pay attention to volume. A price move on low volume is less informative than one on high volume.
- Watch for whale effects in crypto-native markets. Large wallets can move prices, especially in the short term.
- Normalize for overround before interpreting prices as probabilities.
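
The last recommendation amounts to one line of arithmetic. The sketch below uses proportional normalization, the same approach as the plotting and divergence code in this case study; the helper name `normalize_prices` and the example prices are assumptions made for illustration, and other normalization schemes exist.

```python
def normalize_prices(dem_price, rep_price):
    """Rescale two Yes prices so that they sum to 1.

    Works on plain floats or on pandas Series (element-wise).
    """
    total = dem_price + rep_price
    return dem_price / total, rep_price / total


# Example: raw prices that sum to 1.02, i.e. a 2% overround
dem_prob, rep_prob = normalize_prices(0.48, 0.54)
print(f"Raw prices:            dem=0.4800, rep=0.5400 (sum 1.0200)")
print(f"Implied probabilities: dem={dem_prob:.4f}, rep={rep_prob:.4f}")
```

Proportional scaling spreads the overround across both contracts in proportion to their prices, which is a reasonable default when the overround is only a few percent.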
Discussion Questions
- The Polymarket presidential market showed a persistent divergence from polling averages in October 2024. What are three possible explanations? How would you test each one?
- Should regulators be concerned that a small number of large traders can influence the price in a prediction market? What safeguards could platforms implement?
- The market reacted within hours to Biden's withdrawal. Does this speed advantage have practical value for decision-makers, or is it merely interesting?
- How would you design a study to test whether Polymarket's election prices were well-calibrated across many elections? What data would you need?
- Prediction markets on elections have been criticized for potentially influencing voter behavior (e.g., discouraging turnout for the "losing" side). Is this a legitimate concern? How would you investigate it empirically?
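
As a starting point for the calibration question, the sketch below shows the two standard ingredients of such a study: a Brier score and a binned comparison of forecast probability against realized frequency. It is a sketch under stated assumptions, not a study design: the helper names are hypothetical, and the inputs are simulated forecasts that are well calibrated by construction rather than real market data.

```python
import numpy as np
import pandas as pd


def brier_score(probs, outcomes) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))


def calibration_table(probs, outcomes, n_bins: int = 5) -> pd.DataFrame:
    """Compare the mean forecast with the realized frequency in each bin."""
    data = pd.DataFrame({"prob": probs, "outcome": outcomes})
    edges = np.linspace(0, 1, n_bins + 1)
    data["bin"] = pd.cut(data["prob"], bins=edges, include_lowest=True)
    return (
        data.groupby("bin", observed=True)
        .agg(
            mean_forecast=("prob", "mean"),
            realized_freq=("outcome", "mean"),
            n=("outcome", "size"),
        )
        .reset_index()
    )


# Simulated, well-calibrated forecasts: each event occurs with its stated probability
sim_rng = np.random.default_rng(0)
sim_probs = sim_rng.uniform(0, 1, 5000)
sim_outcomes = (sim_rng.random(5000) < sim_probs).astype(int)

table = calibration_table(sim_probs, sim_outcomes)
print(table.to_string(index=False))
print(f"Brier score: {brier_score(sim_probs, sim_outcomes):.4f}")
```

A real study would replace the simulated inputs with one price per resolved market, taken at a fixed horizon before resolution, across many independent events.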
Mini-Project Extension
Build a Real-Time Election Dashboard
Using the functions developed in this case study, build a Python script or Jupyter notebook that:
- Generates (or fetches) daily election market data.
- Computes implied probabilities and overround.
- Computes the market-poll divergence.
- Detects volume spikes automatically.
- Produces a single multi-panel dashboard figure with:
  - Panel 1: Price/probability time series
  - Panel 2: Polling averages
  - Panel 3: Market-poll divergence
  - Panel 4: Daily volume with spike markers
- Saves the dashboard as a high-resolution PNG.
Bonus: Add a command-line argument that lets the user specify the event date, and have the script automatically zoom into a ±7-day window around that event.
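
One possible starting point for the bonus task is sketched below using the standard-library `argparse` module. The flag names `--event-date` and `--window-days`, the helper `event_window`, and the `toy` frame are hypothetical choices made for illustration; the dashboard plotting itself is left out.

```python
import argparse

import pandas as pd


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the (hypothetical) dashboard command-line options."""
    parser = argparse.ArgumentParser(
        description="Zoom the election dashboard into an event window."
    )
    parser.add_argument(
        "--event-date",
        type=pd.Timestamp,  # argparse reports an error if the date cannot be parsed
        default=None,
        help="Event date (YYYY-MM-DD); omit to show the full sample.",
    )
    parser.add_argument(
        "--window-days",
        type=int,
        default=7,
        help="Days to show on each side of the event (default: 7).",
    )
    return parser.parse_args(argv)


def event_window(frame: pd.DataFrame, event_date, window_days: int = 7) -> pd.DataFrame:
    """Return the rows within +/- window_days of event_date (inclusive)."""
    if event_date is None:
        return frame
    half_width = pd.Timedelta(days=window_days)
    mask = frame["date"].between(event_date - half_width, event_date + half_width)
    return frame.loc[mask]


# Demonstration with an explicit argument list instead of sys.argv
toy = pd.DataFrame({"date": pd.date_range("2024-06-01", "2024-11-06", freq="D")})
args = parse_args(["--event-date", "2024-07-21"])
window = event_window(toy, args.event_date, args.window_days)
print(f"{len(window)} days from {window['date'].min().date()} to {window['date'].max().date()}")
```

In a script, calling `parse_args()` with no arguments reads from the command line, and the filtered frame can be passed to the dashboard plotting functions in place of the full dataset.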
End of Case Study 1