Case Study 1: Building a Profitable Market Making Strategy Across 50 Markets

Background

Aurora Capital is a quantitative trading firm that decides to build a systematic market-making operation on Polymarket, one of the largest prediction market platforms. Their goal: provide liquidity across 50 simultaneous markets—spanning politics, sports, economics, and entertainment—while generating consistent risk-adjusted returns.

The team consists of three people: a quantitative researcher (Dr. Lena Park), a software engineer (Marcus Chen), and a risk manager (Sarah Okafor). They have $50,000 in working capital and a mandate to achieve a Sharpe ratio above 1.5 within six months.

This case study follows their journey from system design through live operation, analyzing the strategies that worked, the pitfalls they encountered, and the quantitative results across their first 90 days of operation.


Phase 1: Market Selection (Week 1–2)

The Universe

Polymarket hosted over 300 active markets at the time. Aurora's first task was selecting which 50 markets to provide liquidity in. Dr. Park developed a scoring model:

$$ \text{Score}_i = w_1 \cdot V_i + w_2 \cdot \frac{1}{s_i} + w_3 \cdot (1 - \alpha_i) + w_4 \cdot T_i + w_5 \cdot C_i $$

where:

  - $V_i$ = normalized daily volume (higher is better)
  - $s_i$ = current bid-ask spread (tighter existing spread means more competition, hence the inverse)
  - $\alpha_i$ = estimated adverse selection (lower is better)
  - $T_i$ = time to resolution in days (longer is better)
  - $C_i$ = clarity of resolution criteria (subjective 0–1 score)
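The scoring model can be sketched in a few lines of Python. The weights, the `Market` field names, and the three example markets below are hypothetical, since the case study does not report them; the weights are also assumed to absorb the different scales of the five terms.

```python
from dataclasses import dataclass

@dataclass
class Market:
    name: str
    volume: float        # V_i: normalized daily volume in [0, 1]
    spread: float        # s_i: current bid-ask spread in price units
    adverse_sel: float   # alpha_i: estimated adverse selection in [0, 1]
    days_to_res: float   # T_i: time to resolution in days
    clarity: float       # C_i: clarity of resolution criteria in [0, 1]

# Hypothetical weights (w1..w5), not Aurora's actual values.
# The sign and size of w2 decide how strongly tight spreads are rewarded.
WEIGHTS = (0.30, 0.01, 0.30, 0.002, 0.20)

def score(m: Market, w=WEIGHTS) -> float:
    """Score_i = w1*V + w2*(1/s) + w3*(1 - alpha) + w4*T + w5*C."""
    w1, w2, w3, w4, w5 = w
    return (w1 * m.volume + w2 / m.spread + w3 * (1 - m.adverse_sel)
            + w4 * m.days_to_res + w5 * m.clarity)

def select_markets(markets, n=50):
    """Return the n highest-scoring markets, best first."""
    return sorted(markets, key=score, reverse=True)[:n]

# Illustrative universe of three made-up markets
universe = [
    Market("sports_final", 0.9, 0.04, 0.10, 30, 0.9),
    Market("award_show", 0.2, 0.12, 0.40, 45, 0.6),
    Market("election_state", 0.7, 0.06, 0.25, 120, 0.8),
]
top2 = select_markets(universe, n=2)
```

With these made-up inputs the low-volume, high-adverse-selection market is the one that drops out; in practice the ranking is only as good as the weights and the $\alpha_i$ estimates behind it.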

After scoring all markets, they selected 50 with the following distribution:

| Category | Count | Avg Time to Resolution | Avg Spread |
|---|---|---|---|
| US Politics | 15 | 120 days | 0.06 |
| International Politics | 8 | 90 days | 0.09 |
| Sports | 12 | 30 days | 0.04 |
| Economics (Fed, GDP) | 8 | 60 days | 0.07 |
| Entertainment / Culture | 7 | 45 days | 0.12 |

Correlation Analysis

Dr. Park estimated pairwise correlations between markets using historical price comovements and domain knowledge. Key correlation clusters:

  1. US Election cluster (15 markets): Average pairwise correlation $\rho \approx 0.45$.
  2. Federal Reserve cluster (4 markets): $\rho \approx 0.60$.
  3. Sports (12 markets): $\rho \approx 0.05$ (nearly independent).
  4. Entertainment (7 markets): $\rho \approx 0.10$.

The high correlation within the US Election cluster meant that Aurora could not treat these markets independently for risk management.
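A short calculation illustrates why. For $n$ equal-sized positions, each with volatility $\sigma$ and a common pairwise correlation $\rho$, the volatility of the combined position is $\sigma \sqrt{n + n(n-1)\rho}$. The sketch below applies this to the cluster figures above; equal position sizes and equal per-market volatility are simplifying assumptions, not something the case study states.

```python
import math

def cluster_vol(n: int, rho: float, sigma: float = 1.0) -> float:
    """Volatility of the sum of n equal-sized positions, each with volatility
    sigma and a common pairwise correlation rho."""
    return sigma * math.sqrt(n + n * (n - 1) * rho)

def effective_independent_bets(n: int, rho: float) -> float:
    """Number of independent positions giving the same diversification benefit."""
    return n / (1 + (n - 1) * rho)

election = cluster_vol(15, 0.45)       # US Election cluster
independent = cluster_vol(15, 0.0)     # the same 15 markets if uncorrelated
ratio = election / independent         # about 2.7
election_bets = effective_independent_bets(15, 0.45)   # about 2.1
sports_bets = effective_independent_bets(12, 0.05)     # about 7.7
```

Under these assumptions the 15 election markets carry roughly 2.7 times the risk of 15 independent markets and diversify about as well as two independent bets, while the 12 sports markets behave like nearly eight.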


Phase 2: System Architecture (Week 2–4)

Marcus built the following system:

┌──────────────────────────────────────────────┐
│               Aurora Market Maker             │
│                                               │
│  ┌─────────┐  ┌──────────┐  ┌─────────────┐  │
│  │ Data     │  │ Fair     │  │ Quote       │  │
│  │ Ingestion│──│ Value    │──│ Engine      │  │
│  │ Module   │  │ Engine   │  │             │  │
│  └─────────┘  └──────────┘  └──────┬──────┘  │
│                                     │         │
│  ┌─────────┐  ┌──────────┐  ┌──────▼──────┐  │
│  │ Risk    │──│ Inventory │──│ Order       │  │
│  │ Manager │  │ Manager   │  │ Manager     │  │
│  └─────────┘  └──────────┘  └──────┬──────┘  │
│                                     │         │
│  ┌─────────────────────────────────▼──────┐  │
│  │         Polymarket API Client          │  │
│  └────────────────────────────────────────┘  │
│                                               │
│  ┌────────────────────────────────────────┐  │
│  │      Analytics & Reporting Dashboard   │  │
│  └────────────────────────────────────────┘  │
└──────────────────────────────────────────────┘

Key Design Decisions:

  1. Fair Value Engine: Used a Bayesian model combining the order book mid-price (50% weight), a proprietary signal based on poll aggregation for political markets (30% weight), and order flow imbalance (20% weight).

  2. Quote Engine: Avellaneda-Stoikov framework with per-market $\gamma$ tuned based on historical volatility.

  3. Inventory Manager: Nonlinear skewing with $\beta = 1.5$ and per-market position limits.

  4. Risk Manager: Portfolio-level risk with correlation-adjusted limits. Maximum portfolio VaR of $5,000 (10% of capital).

  5. Update Frequency: Quotes refreshed every 30 seconds for high-volume markets and every 2 minutes for low-volume markets, to stay within API rate limits.
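The first two design decisions can be sketched as follows. This is a minimal illustration, not Marcus's implementation: a fixed-weight average stands in for the Bayesian model, the renormalization of weights for markets without a poll signal is an assumption, the 0.01–0.99 price clamp is an assumption, and the quote function uses the textbook Avellaneda-Stoikov formulas with a hypothetical order-arrival decay parameter `k`.

```python
import math
from typing import Optional

def fair_value(mid: float, flow_signal: float,
               poll_signal: Optional[float] = None) -> float:
    """Blend of 50% mid-price, 30% poll signal, 20% order-flow signal.
    Without a poll signal (non-political markets) the remaining weights are
    renormalized; that fallback is an assumption."""
    parts = [(0.5, mid), (0.2, flow_signal)]
    if poll_signal is not None:
        parts.append((0.3, poll_signal))
    total_w = sum(w for w, _ in parts)
    fv = sum(w * x for w, x in parts) / total_w
    return min(max(fv, 0.01), 0.99)   # assumed tradable price range

def as_quotes(fv, inventory, gamma, sigma, horizon, k):
    """Textbook Avellaneda-Stoikov quotes around a fair value.
    reservation = fv - q * gamma * sigma^2 * horizon
    spread      = gamma * sigma^2 * horizon + (2 / gamma) * ln(1 + gamma / k)"""
    reservation = fv - inventory * gamma * sigma ** 2 * horizon
    spread = (gamma * sigma ** 2 * horizon
              + (2.0 / gamma) * math.log(1.0 + gamma / k))
    bid = max(reservation - spread / 2, 0.01)
    ask = min(reservation + spread / 2, 0.99)
    return bid, ask

# Hypothetical political market: mid 0.60, order-flow signal 0.62, poll signal 0.55
fv = fair_value(0.60, 0.62, poll_signal=0.55)          # 0.589
# Long 20 contracts, so both quotes shift down to encourage selling
bid, ask = as_quotes(fv, inventory=20, gamma=0.05, sigma=0.02, horizon=30, k=50)
```

With a long inventory the reservation price sits below the fair value, so the quotes are skewed toward shedding the position.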


Phase 3: Parameter Calibration (Week 3–4)

Spread Calibration

Dr. Park ran a backtesting framework on historical Polymarket data to calibrate base spreads:

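# Pseudocode for spread calibration.
# markets, historical_data, estimated_alpha, simulate_market_making, and
# compute_max_drawdown are assumed to be defined elsewhere.
import numpy as np

results = {}
optimal_spread = {}
for market in markets:
    results[market] = {}
    # Candidate base spreads from 0.02 up to (but not including) 0.15;
    # rounding avoids floating-point noise in the dictionary keys
    for spread in np.round(np.arange(0.02, 0.15, 0.005), 3):
        spread = float(spread)
        simulated_pnl = simulate_market_making(
            market_data=historical_data[market],
            base_spread=spread,
            alpha_estimate=estimated_alpha[market],
            gamma=0.05,
            max_inventory=100
        )
        pnl_std = np.std(simulated_pnl)
        results[market][spread] = {
            'mean_pnl': np.mean(simulated_pnl),
            # Per-period (not annualized) Sharpe; guard against zero variance
            'sharpe': np.mean(simulated_pnl) / pnl_std if pnl_std > 0 else 0.0,
            'max_drawdown': compute_max_drawdown(simulated_pnl)
        }
    # Keep the spread with the highest Sharpe ratio for this market
    optimal_spread[market] = max(results[market], key=lambda s: results[market][s]['sharpe'])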

Results showed:

  - Sports markets: Optimal spread 0.03–0.05 (low adverse selection, high volume)
  - Political markets: Optimal spread 0.05–0.08 (moderate adverse selection)
  - Entertainment: Optimal spread 0.08–0.12 (high adverse selection, low volume)

Risk Aversion Calibration

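The risk aversion parameter $\gamma$ was calibrated per market so that steady-state inventory would peak at approximately 50% of the position limit:

$$ \gamma_i = \frac{0.5 \cdot s_i}{Q_{\max,i} \cdot \sigma_i^2 \cdot T_i} $$

where $s_i$ is the spread and $T_i$ the time to resolution, as in the scoring model, $Q_{\max,i}$ is the position limit, and $\sigma_i$ is the market's price volatility.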
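As a worked example of the calibration formula, the sketch below evaluates $\gamma_i$ for two made-up markets. All input values are hypothetical, and volatility is assumed to be measured per day so that it matches $T_i$ in days.

```python
def calibrate_gamma(spread: float, q_max: float, sigma: float,
                    horizon_days: float) -> float:
    """gamma_i = 0.5 * s_i / (Q_max_i * sigma_i^2 * T_i)."""
    return 0.5 * spread / (q_max * sigma ** 2 * horizon_days)

# Hypothetical sports market: spread 0.04, limit 100 contracts,
# daily price volatility 0.02, 30 days to resolution
gamma_sports = calibrate_gamma(0.04, 100, 0.02, 30)      # about 0.0167

# Hypothetical political market: spread 0.06, same limit and volatility,
# 120 days to resolution
gamma_politics = calibrate_gamma(0.06, 100, 0.02, 120)   # 0.00625
```

Because $T_i$ sits in the denominator, a longer time to resolution produces a smaller $\gamma_i$, all else equal.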


Phase 4: Live Trading — First 30 Days

Performance Overview

| Metric | Week 1–2 | Week 3–4 |
|---|---|---|
| Markets Quoted | 50 | 50 |
| Average Uptime | 92% | 97% |
| Total Trades | 2,847 | 4,112 |
| Gross Revenue (Spread) | $1,245 | $1,892 |
| Adverse Selection Loss | -$687 | -$823 |
| Net P&L | +$412 | +$834 |
| Max Drawdown | -$310 | -$425 |
| Sharpe Ratio (annualized) | 1.2 | 1.8 |

Key Observations from Month 1

Observation 1: Sports markets were the most profitable per unit of capital. Sports markets had high volume, low adverse selection, and short resolution times. The team earned 45% of their total P&L from 24% of their markets.

Observation 2: Political markets had episodic adverse selection. During quiet periods, political market making was moderately profitable. But following debate performances and major polls, VPIN spiked above 0.7 and the team suffered concentrated losses. These bursts of adverse selection accounted for 60% of total adverse selection losses.

Observation 3: The US Election cluster was problematic. Because of high correlations, inventory accumulated in the same direction across multiple election markets simultaneously. On one day, a polling surprise caused the team to accumulate +300 contracts across 8 election markets within 15 minutes, creating a portfolio-level exposure far larger than intended.

Observation 4: Entertainment markets had wide spreads but few fills. With base spreads of 0.08–0.12, fill rates were only 2–3%, generating minimal revenue.
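Observation 2 refers to VPIN (volume-synchronized probability of informed trading). A simplified version can be sketched as below: trades are grouped into equal-volume buckets, and VPIN is the average absolute buy/sell imbalance per bucket divided by the bucket volume. The sketch assumes each trade arrives already labeled as a buy or a sell and uses integer contract sizes; the bucket size and window length are hypothetical.

```python
from collections import deque

class VPIN:
    def __init__(self, bucket_volume: int, n_buckets: int):
        self.bucket_volume = bucket_volume
        # |buy - sell| for each of the most recent completed buckets
        self.imbalances = deque(maxlen=n_buckets)
        self.buy = 0
        self.sell = 0

    def add_trade(self, size: int, is_buy: bool) -> None:
        """Add a trade, splitting it across buckets when it overflows one."""
        remaining = size
        while remaining > 0:
            room = self.bucket_volume - (self.buy + self.sell)
            fill = min(room, remaining)
            if is_buy:
                self.buy += fill
            else:
                self.sell += fill
            remaining -= fill
            if self.buy + self.sell >= self.bucket_volume:
                # Bucket complete: record its imbalance and start a new one
                self.imbalances.append(abs(self.buy - self.sell))
                self.buy = self.sell = 0

    def value(self) -> float:
        """Average imbalance over completed buckets, as a fraction of bucket volume."""
        if not self.imbalances:
            return 0.0
        return sum(self.imbalances) / (len(self.imbalances) * self.bucket_volume)

vpin = VPIN(bucket_volume=100, n_buckets=5)
for size, is_buy in [(80, True), (20, False), (90, True), (10, False)]:
    vpin.add_trade(size, is_buy)
current = vpin.value()   # (60 + 80) / (2 * 100) = 0.7
```

Perfectly balanced flow gives a value of 0 and purely one-sided flow gives 1, so readings above 0.7 indicate heavily one-directional trading.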

Corrective Actions

  1. Cluster limits: Introduced a hard limit of 200 total contracts across the US Election cluster.
  2. Event calendar: Built a calendar of known information events (debates, data releases, earnings) and pre-widened spreads by 50% in the 2 hours surrounding each event.
  3. Market pruning: Dropped 3 entertainment markets with near-zero fill rates and reallocated capital.
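The first two corrective actions amount to simple pre-trade checks, sketched below. The sketch makes two assumptions the case study leaves open: the 200-contract limit is applied to the cluster's net signed position, and "the 2 hours surrounding each event" is read as one hour on either side. Function and constant names are illustrative.

```python
from datetime import datetime, timedelta

CLUSTER_LIMIT = 200                 # max contracts across the US Election cluster
EVENT_WINDOW = timedelta(hours=1)   # assumption: 2-hour window = 1 hour each side
EVENT_WIDENING = 1.5                # pre-widen spreads by 50%

def allowed_order_size(cluster_positions: dict, side: int, size: int,
                       limit: int = CLUSTER_LIMIT) -> int:
    """Cap an order so the cluster's net position stays within +/- limit.
    side is +1 for a buy and -1 for a sell."""
    net = sum(cluster_positions.values())
    headroom = limit - side * net    # contracts still allowed in this direction
    return max(0, min(size, headroom))

def event_adjusted_spread(base_spread: float, now: datetime, events: list) -> float:
    """Widen the spread when `now` is within EVENT_WINDOW of a scheduled event."""
    near_event = any(abs(now - t) <= EVENT_WINDOW for t in events)
    return base_spread * EVENT_WIDENING if near_event else base_spread

# Hypothetical state: already long 180 contracts across three election markets
positions = {"market_a": 90, "market_b": 60, "market_c": 30}
buy_size = allowed_order_size(positions, side=+1, size=50)    # capped at 20
sell_size = allowed_order_size(positions, side=-1, size=50)   # 50, reduces risk

debate = datetime(2024, 9, 10, 21, 0)   # made-up event time
quiet = event_adjusted_spread(0.06, datetime(2024, 9, 10, 15, 0), [debate])
busy = event_adjusted_spread(0.06, datetime(2024, 9, 10, 20, 30), [debate])
```

Orders that reduce the cluster's net exposure pass through uncapped, which lets the bot keep unwinding even when the limit has been reached.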

Phase 5: Optimization — Days 31–60

Adverse Selection Regime Detection

Dr. Park implemented a rolling adverse selection detector using VPIN and toxicity metrics:

class RegimeDetector:
    """Classify adverse selection (AS) as 'low_as', 'normal', or 'high_as'."""

    def __init__(self, high_threshold=0.6, low_threshold=0.35):
        self.high_threshold = high_threshold
        self.low_threshold = low_threshold
        self.current_regime = 'normal'

    def update(self, vpin: float, toxicity: float) -> str:
        # Blend VPIN (60%) with toxicity scaled by a 0.03 reference level (40%);
        # negative toxicity is floored at zero
        score = 0.6 * vpin + 0.4 * max(toxicity / 0.03, 0)

        if score > self.high_threshold:
            self.current_regime = 'high_as'
        elif score < self.low_threshold:
            self.current_regime = 'low_as'
        elif self.low_threshold <= score <= self.high_threshold:
            # Score lies between the two thresholds (inclusive)
            self.current_regime = 'normal'

        return self.current_regime

When the detector identified the "high_as" regime, the bot automatically:

  - Doubled the base spread
  - Halved the order size
  - Tightened position limits by 50%
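These three adjustments can be expressed as a small pure function over the quoting parameters. The `QuoteParams` container and its field names are hypothetical; since the case study does not describe any adjustment for the other regimes, they are left unchanged here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class QuoteParams:
    base_spread: float
    order_size: int
    position_limit: int

def apply_regime(params: QuoteParams, regime: str) -> QuoteParams:
    """Return the parameters to quote with under the given regime."""
    if regime == 'high_as':
        return replace(
            params,
            base_spread=params.base_spread * 2,           # double the base spread
            order_size=max(1, params.order_size // 2),    # halve the order size
            position_limit=params.position_limit // 2,    # tighten limits by 50%
        )
    # 'normal' and 'low_as': no adjustment specified, so keep the base parameters
    return params

base = QuoteParams(base_spread=0.06, order_size=20, position_limit=100)
stressed = apply_regime(base, 'high_as')
```

Returning a new object instead of mutating the base parameters means quotes revert automatically once the detector leaves the "high_as" regime.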

Cross-Market Signal Integration

For the Federal Reserve cluster, the team found that when the "Fed raises rates in March" market moved, related markets (inflation, unemployment) were slow to adjust. They built a cross-market signal:

$$ \hat{p}_{\text{inflation}} = \hat{p}_{\text{inflation, current}} + \beta \cdot (\Delta p_{\text{fed rate}} - \hat{\Delta} p_{\text{fed rate}}) $$

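where $\Delta p_{\text{fed rate}}$ is the observed move in the Fed market, $\hat{\Delta} p_{\text{fed rate}}$ is the expected move, and the sensitivity $\beta$ (unrelated to the inventory-skew parameter $\beta$ used by the Inventory Manager) was estimated from historical data. This allowed the team to shift their fair value estimate in related markets before the order book adjusted, earning an extra 0.01–0.02 per contract on these informed quote adjustments.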
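A minimal sketch of this signal follows. The case study does not say how the sensitivity was estimated, so the sketch assumes a least-squares regression through the origin of past inflation-market moves on past Fed-market surprises; the historical numbers are invented for illustration.

```python
def estimate_beta(fed_surprises, inflation_moves) -> float:
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    num = sum(x * y for x, y in zip(fed_surprises, inflation_moves))
    den = sum(x * x for x in fed_surprises)
    return num / den if den > 0 else 0.0

def adjusted_fair_value(p_current: float, fed_move: float,
                        expected_fed_move: float, beta: float) -> float:
    """p_hat = p_current + beta * (observed Fed move - expected Fed move),
    clamped to the valid probability range."""
    shifted = p_current + beta * (fed_move - expected_fed_move)
    return min(max(shifted, 0.0), 1.0)

# Invented history: Fed-market surprises and the inflation market's later moves
fed_surprises = [0.04, -0.02, 0.06, -0.03]
inflation_moves = [0.02, -0.01, 0.03, -0.015]
beta = estimate_beta(fed_surprises, inflation_moves)    # 0.5

# Fed market jumps 0.05 when only 0.01 was expected
p_hat = adjusted_fair_value(0.40, fed_move=0.05, expected_fed_move=0.01, beta=beta)
```

In this example the inflation market's fair value moves from 0.40 to 0.42 before its own order book reacts.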

Results — Days 31–60

| Metric | Days 31–60 | Change vs. Weeks 3–4 |
|---|---|---|
| Markets Quoted | 47 | -3 |
| Average Uptime | 98% | +1 pt |
| Total Trades | 5,234 | +27% |
| Gross Revenue (Spread) | $2,856 | +51% |
| Adverse Selection Loss | -$943 | +15% in dollars; 33% of gross revenue, down from 43% |
| Net P&L | +$1,613 | +93% |
| Max Drawdown | -$380 | -11% |
| Sharpe Ratio (annualized) | 2.4 | +33% |

Phase 6: Scaling and Mature Operation — Days 61–90

Market Resolution Management

Twelve markets resolved during this period. Key learnings:

  1. Early exit: For markets within 3 days of resolution, the team cut position limits to 25% of their normal level and doubled spreads. This protected against binary payoff risk.

  2. Resolution P&L: Of 12 resolved markets, 7 were profitable (net positive P&L including resolution), 3 were near break-even, and 2 had significant losses. The two losing markets were both in the US Election cluster where the team had been unable to fully unwind inventory before resolution.

  3. New market entry: As markets resolved, the team rotated into new markets, maintaining a portfolio of ~47 active markets.
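The early-exit rule in item 1 can be sketched as a parameter adjustment keyed on days to resolution. The function name and the example inputs are illustrative; the 3-day window, the 25% limit, and the doubled spread come from the text.

```python
def resolution_adjusted(base_spread: float, position_limit: int,
                        days_to_resolution: float, window_days: float = 3):
    """Return (spread, position_limit) to use given the time left to resolution."""
    if days_to_resolution <= window_days:
        # Inside the window: double the spread and cut the limit to 25%
        return base_spread * 2.0, int(position_limit * 0.25)
    return base_spread, position_limit

far = resolution_adjusted(0.05, 100, days_to_resolution=10)   # (0.05, 100)
near = resolution_adjusted(0.05, 100, days_to_resolution=2)   # (0.1, 25)
```

A step change like this is the simplest form; a gradual ramp over the final days would be a natural refinement but is not described in the case study.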

Final 90-Day Results

| Metric | 90-Day Total |
|---|---|
| Total Trades | 14,891 |
| Gross Revenue (Spread Capture) | $6,987 |
| Adverse Selection Loss | -$2,891 |
| Inventory Marking Losses | -$412 |
| Resolution P&L | +$234 |
| Platform Fees | -$892 |
| Net P&L | +$3,026 |
| Return on Capital | 6.05% (24.2% annualized) |
| Max Drawdown | -$1,100 (2.2% of capital) |
| Sharpe Ratio (annualized) | 2.1 |
| Win Rate (daily) | 68% |
| Uptime | 97.2% |
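The headline figures in this table can be reconciled with a few lines of arithmetic. The only assumption is the annualization convention: the reported 24.2% matches simple (non-compounded) scaling by four 90-day periods.

```python
# 90-day P&L components in dollars, taken from the table above
components = {
    "gross_revenue": 6987,
    "adverse_selection": -2891,
    "inventory_marking": -412,
    "resolution_pnl": 234,
    "platform_fees": -892,
}
capital = 50_000

net_pnl = sum(components.values())             # 3026
return_on_capital = net_pnl / capital          # about 6.05%
annualized = return_on_capital * 360 / 90      # about 24.2%, simple scaling
fee_share = 892 / 6987                         # about 12.8% of gross revenue
drawdown_share = 1100 / capital                # 2.2% of capital
```

Adverse selection is the largest single cost, consuming about 41% of gross spread revenue, with platform fees a distant second.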

P&L Attribution by Category

| Category | Gross Revenue | AS Loss | Net P&L | Capital Share |
|---|---|---|---|---|
| Sports | $2,450 | -$612 | +$1,500 | 20% |
| US Politics | $2,100 | -$1,230 | +$480 | 35% |
| Intl Politics | $890 | -$380 | +$310 | 15% |
| Economics | $1,050 | -$420 | +$480 | 20% |
| Entertainment | $497 | -$249 | +$156 | 10% |
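Dividing each category's net P&L by the capital allocated to it makes the differences in capital efficiency explicit. The sketch assumes the capital shares apply to the full $50,000 and stayed constant over the 90 days, which the case study does not confirm.

```python
capital = 50_000

# name: (net P&L in dollars, capital share), taken from the table above
categories = {
    "Sports": (1500, 0.20),
    "US Politics": (480, 0.35),
    "Intl Politics": (310, 0.15),
    "Economics": (480, 0.20),
    "Entertainment": (156, 0.10),
}

# 90-day return on the capital allocated to each category
return_on_allocated = {
    name: pnl / (share * capital) for name, (pnl, share) in categories.items()
}
ranked = sorted(return_on_allocated, key=return_on_allocated.get, reverse=True)
```

On this measure sports returned 15% on its allocated capital over the period, while US politics returned about 2.7%, the lowest of the five categories.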

Lessons Learned

1. Diversification is the Single Most Important Factor

The sports portfolio alone had a Sharpe of 1.4, while the full portfolio had a Sharpe of 2.1. The team attributed the improvement to diversification across largely uncorrelated categories.

2. Adverse Selection is Episodic, Not Constant

Steady-state adverse selection was manageable. The dangerous periods were the sudden spikes around information events. A regime detection system was essential for survival.

3. Correlation Management Cannot Be an Afterthought

The US Election cluster nearly caused a catastrophic drawdown in week 3. Treating correlated markets independently for risk purposes is a recipe for blow-ups.

4. Market Selection Matters More Than Strategy Sophistication

Dropping 3 unprofitable markets and selecting better replacements had a larger impact than any strategy improvement. Not every market deserves a market maker.

5. Fees Eat Spread

Platform fees consumed 12.8% of gross revenue. Negotiating better fee tiers (which required hitting volume thresholds) was a critical business objective.

6. Resolution Management is a Distinct Skill

The transition from "market making" to "position unwinding" as markets approach resolution requires a different mindset and different parameters. The team learned to start unwinding 5 days before resolution.


Discussion Questions

  1. Aurora allocated 35% of capital to US political markets but earned only 16% of net P&L from that category. Should they reduce political market exposure? What factors beyond raw P&L should they consider?

  2. The cross-market signal for the Federal Reserve cluster improved P&L by approximately $200 over the 90 days. Is this worth the complexity and risk of signal-based quoting?

  3. If Aurora wanted to scale to 200 markets, what infrastructure and risk management changes would be necessary?

  4. The Sharpe ratio of 2.1 is strong but depends on the 90-day sample. How would you estimate the true long-run Sharpe ratio, and what confidence interval would you place on it?

  5. A competitor enters the market and narrows spreads by 30% in sports markets. How should Aurora respond?