Case Study 1: Building a Profitable Market Making Strategy Across 50 Markets
Background
Aurora Capital is a quantitative trading firm that decided to build a systematic market-making operation on Polymarket, one of the largest prediction market platforms. Their goal: provide liquidity across 50 simultaneous markets (spanning politics, sports, economics, and entertainment) while generating consistent risk-adjusted returns.
The team consists of three people: a quantitative researcher (Dr. Lena Park), a software engineer (Marcus Chen), and a risk manager (Sarah Okafor). They have $50,000 in working capital and a mandate to achieve a Sharpe ratio above 1.5 within six months.
This case study follows their journey from system design through live operation, analyzing the strategies that worked, the pitfalls they encountered, and the quantitative results across their first 90 days of operation.
Phase 1: Market Selection (Week 1–2)
The Universe
Polymarket hosted over 300 active markets at the time. Aurora's first task was selecting which 50 markets to provide liquidity in. Dr. Park developed a scoring model:
$$ \text{Score}_i = w_1 \cdot V_i + w_2 \cdot \frac{1}{s_i} + w_3 \cdot (1 - \alpha_i) + w_4 \cdot T_i + w_5 \cdot C_i $$
where:
- $V_i$ = normalized daily volume (higher is better)
- $s_i$ = current bid-ask spread (tighter existing spread means more competition, hence the inverse)
- $\alpha_i$ = estimated adverse selection (lower is better)
- $T_i$ = time to resolution in days (longer is better)
- $C_i$ = clarity of resolution criteria (subjective 0–1 score)
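As a rough illustration of how the scoring formula could be evaluated, the sketch below computes the weighted sum term by term and ranks candidates. The weights, the rescaling of $T_i$ to a 0–1 range, the class and function names, and the two example markets are all hypothetical; the case study does not report Aurora's actual values. Note that with a positive $w_2$ the $1/s_i$ term rewards tighter markets, so the sign and size of that weight is a modeling choice.

```python
from dataclasses import dataclass


@dataclass
class MarketFeatures:
    volume: float       # V_i: normalized daily volume, 0-1
    spread: float       # s_i: current bid-ask spread, in price units
    adverse_sel: float  # alpha_i: estimated adverse selection, 0-1
    horizon: float      # T_i: time to resolution, rescaled to 0-1 (assumption)
    clarity: float      # C_i: clarity of resolution criteria, 0-1


# Hypothetical weights w_1..w_5; not taken from the case study.
WEIGHTS = (0.30, 0.01, 0.30, 0.20, 0.20)


def score(m: MarketFeatures, w=WEIGHTS) -> float:
    """Weighted sum from the scoring formula, term by term."""
    w1, w2, w3, w4, w5 = w
    return (w1 * m.volume + w2 / m.spread + w3 * (1 - m.adverse_sel)
            + w4 * m.horizon + w5 * m.clarity)


def select_top(markets: dict, n: int) -> list:
    """Return the names of the n highest-scoring markets."""
    return sorted(markets, key=lambda name: score(markets[name]), reverse=True)[:n]


# Made-up feature values, for illustration only.
candidates = {
    "sports_example": MarketFeatures(0.9, 0.04, 0.2, 0.25, 0.9),
    "entertainment_example": MarketFeatures(0.2, 0.12, 0.6, 0.375, 0.5),
}
print(select_top(candidates, 1))
```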
After scoring all markets, they selected 50 with the following distribution:
| Category | Count | Avg Time to Resolution | Avg Spread |
|---|---|---|---|
| US Politics | 15 | 120 days | 0.06 |
| International Politics | 8 | 90 days | 0.09 |
| Sports | 12 | 30 days | 0.04 |
| Economics (Fed, GDP) | 8 | 60 days | 0.07 |
| Entertainment / Culture | 7 | 45 days | 0.12 |
Correlation Analysis
Dr. Park estimated pairwise correlations between markets using historical price comovements and domain knowledge. Key correlation clusters:
- US Election cluster (15 markets): Average pairwise correlation $\rho \approx 0.45$.
- Federal Reserve cluster (4 markets): $\rho \approx 0.60$.
- Sports (12 markets): $\rho \approx 0.05$ (nearly independent).
- Entertainment (7 markets): $\rho \approx 0.10$.
The high correlation within the US Election cluster meant that Aurora could not treat these markets independently for risk management.
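One way to see why the cluster cannot be treated as 15 separate bets is to compute an "effective number of independent positions" under a simplifying assumption of equal position sizes and a single uniform pairwise correlation. The function names below are illustrative, and the uniform-correlation model is an approximation of the clusters described above, not Aurora's risk model.

```python
import math


def effective_bets(n: int, rho: float) -> float:
    """Number of independent equal positions whose average has the same variance."""
    return n / (1 + (n - 1) * rho)


def cluster_std(n: int, rho: float, sigma: float = 1.0) -> float:
    """Std. dev. of the summed P&L of n equal positions with pairwise correlation rho."""
    return sigma * math.sqrt(n * (1 + (n - 1) * rho))


print(round(effective_bets(15, 0.45), 2))  # US Election cluster
print(round(effective_bets(12, 0.05), 2))  # Sports
print(round(cluster_std(15, 0.45) / cluster_std(15, 0.0), 2))  # risk vs. independent case
```

Under these assumptions the 15 election markets diversify about as well as two independent positions, while the 12 sports markets behave like roughly eight.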
Phase 2: System Architecture (Week 2–4)
Marcus built the following system:
┌──────────────────────────────────────────────┐
│             Aurora Market Maker              │
│                                              │
│  ┌─────────┐  ┌──────────┐  ┌─────────────┐  │
│  │  Data   │  │   Fair   │  │    Quote    │  │
│  │Ingestion│──│  Value   │──│   Engine    │  │
│  │ Module  │  │  Engine  │  │             │  │
│  └─────────┘  └──────────┘  └──────┬──────┘  │
│                                    │         │
│  ┌─────────┐  ┌──────────┐  ┌──────▼──────┐  │
│  │  Risk   │──│Inventory │──│    Order    │  │
│  │ Manager │  │ Manager  │  │   Manager   │  │
│  └─────────┘  └──────────┘  └──────┬──────┘  │
│                                    │         │
│  ┌─────────────────────────────────▼──────┐  │
│  │         Polymarket API Client          │  │
│  └────────────────────────────────────────┘  │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │    Analytics & Reporting Dashboard     │  │
│  └────────────────────────────────────────┘  │
└──────────────────────────────────────────────┘
Key Design Decisions:
- Fair Value Engine: Used a Bayesian model combining the order book mid-price (50% weight), a proprietary signal based on poll aggregation for political markets (30% weight), and order flow imbalance (20% weight).
- Quote Engine: Avellaneda-Stoikov framework with per-market $\gamma$ tuned based on historical volatility.
- Inventory Manager: Nonlinear skewing with $\beta = 1.5$ and per-market position limits.
- Risk Manager: Portfolio-level risk with correlation-adjusted limits. Maximum portfolio VaR of $5,000 (10% of capital).
- Update Frequency: Quotes refreshed every 30 seconds for high-volume markets and every 2 minutes for low-volume markets, to stay within API rate limits.
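For readers unfamiliar with the Avellaneda-Stoikov framework named in the Quote Engine decision, the sketch below shows the textbook reservation-price and optimal-spread formulas. It is not Aurora's implementation: the order-arrival parameter `kappa`, the clamping of quotes to the 0.01–0.99 price range, and all example values are assumptions, and the nonlinear $\beta = 1.5$ skew is not reproduced here.

```python
import math


def as_quotes(fair_value, inventory, gamma, sigma, horizon, kappa):
    """Textbook Avellaneda-Stoikov bid/ask around an inventory-adjusted price.

    fair_value: fair probability estimate (0-1)
    inventory:  signed position in contracts (positive = long)
    gamma:      risk aversion
    sigma:      price volatility per unit of time
    horizon:    remaining time, in the same time unit as sigma
    kappa:      order-arrival decay parameter (assumed, not from the case study)
    """
    # Reservation price shifts against the current inventory.
    reservation = fair_value - inventory * gamma * sigma ** 2 * horizon
    # Total spread: inventory-risk term plus liquidity term.
    spread = gamma * sigma ** 2 * horizon + (2.0 / gamma) * math.log(1.0 + gamma / kappa)
    bid = reservation - spread / 2
    ask = reservation + spread / 2
    # Keep quotes inside the valid price range of a binary contract.
    clamp = lambda p: min(0.99, max(0.01, p))
    return clamp(bid), clamp(ask)


flat_bid, flat_ask = as_quotes(0.55, 0, gamma=0.05, sigma=0.02, horizon=30, kappa=50)
long_bid, long_ask = as_quotes(0.55, 40, gamma=0.05, sigma=0.02, horizon=30, kappa=50)
print(flat_bid, flat_ask)
print(long_bid, long_ask)
```

With zero inventory the quotes sit symmetrically around the fair value; a long position lowers both quotes, which makes selling more likely than buying.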
Phase 3: Parameter Calibration (Week 3–4)
Spread Calibration
Dr. Park ran a backtesting framework on historical Polymarket data to calibrate base spreads:
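# Pseudocode for spread calibration. `markets`, `historical_data`,
# `estimated_alpha`, `simulate_market_making`, and `compute_max_drawdown`
# are assumed to be defined elsewhere.
import numpy as np
from collections import defaultdict

results = defaultdict(dict)
optimal_spread = {}
for market in markets:
    for spread in np.arange(0.02, 0.15, 0.005):
        spread = round(float(spread), 3)  # avoid float noise in dictionary keys
        simulated_pnl = simulate_market_making(
            market_data=historical_data[market],
            base_spread=spread,
            alpha_estimate=estimated_alpha[market],
            gamma=0.05,
            max_inventory=100,
        )
        std = np.std(simulated_pnl)
        results[market][spread] = {
            'mean_pnl': np.mean(simulated_pnl),
            # Per-period Sharpe, not annualized; guard against zero variance.
            'sharpe': np.mean(simulated_pnl) / std if std > 0 else 0.0,
            'max_drawdown': compute_max_drawdown(simulated_pnl),
        }
    # Keep the candidate spread with the highest Sharpe for this market.
    optimal_spread[market] = max(results[market], key=lambda s: results[market][s]['sharpe'])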
Results showed:
- Sports markets: Optimal spread 0.03–0.05 (low adverse selection, high volume)
- Political markets: Optimal spread 0.05–0.08 (moderate adverse selection)
- Entertainment: Optimal spread 0.08–0.12 (high adverse selection, low volume)
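The calibration pseudocode calls a `compute_max_drawdown` helper that the case study does not show. A minimal sketch of one possible implementation follows, assuming the input is a sequence of per-period P&L values and that equity starts at zero. It returns the drawdown as a non-negative number, whereas the results tables in this case study report drawdowns with a negative sign.

```python
from itertools import accumulate


def compute_max_drawdown(pnl):
    """Largest peak-to-trough drop in cumulative P&L, as a non-negative number."""
    peak = 0.0   # highest cumulative P&L seen so far (starting equity = 0)
    worst = 0.0  # largest drop from any earlier peak
    for equity in accumulate(pnl):
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


daily_pnl = [10, -4, -3, 8, -2]  # made-up values for illustration
print(compute_max_drawdown(daily_pnl))
```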
Risk Aversion Calibration
The risk aversion parameter $\gamma$ was calibrated per market to target a maximum inventory of approximately 50% of the position limit at steady state:
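$$ \gamma_i = \frac{0.5 \cdot s_i}{Q_{\max,i} \cdot \sigma_i^2 \cdot T_i} $$
where $s_i$ is the market's spread, $Q_{\max,i}$ is its position limit, $\sigma_i$ is its historical price volatility, and $T_i$ is the time to resolution.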
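A small worked example of the $\gamma_i$ formula, with made-up inputs: the spread, position limit, volatility, and horizon below are illustrative, and volatility is assumed to be expressed per day so that it matches a horizon in days.

```python
def calibrate_gamma(spread: float, q_max: float, sigma: float, horizon: float) -> float:
    """gamma_i = 0.5 * s_i / (Q_max_i * sigma_i^2 * T_i)."""
    return 0.5 * spread / (q_max * sigma ** 2 * horizon)


# Illustrative inputs: 0.04 spread, 100-contract limit, 2% daily vol, 30 days out.
gamma = calibrate_gamma(spread=0.04, q_max=100, sigma=0.02, horizon=30)
print(gamma)

# By construction, the product gamma * Q_max * sigma^2 * T equals half the spread.
shift_at_limit = gamma * 100 * 0.02 ** 2 * 30
print(shift_at_limit)
```

Read alongside the Avellaneda-Stoikov reservation price, this suggests that a position at the full limit would shift the reservation price by half the spread, though that interpretation is an inference rather than something the case study states.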
Phase 4: Live Trading — First 30 Days
Performance Overview
| Metric | Week 1–2 | Week 3–4 |
|---|---|---|
| Markets Quoted | 50 | 50 |
| Average Uptime | 92% | 97% |
| Total Trades | 2,847 | 4,112 |
| Gross Revenue (Spread) | $1,245 | $1,892 |
| Adverse Selection Loss | -$687 | -$823 |
| Net P&L | +$412 | +$834 |
| Max Drawdown | -$310 | -$425 |
| Sharpe Ratio (annualized) | 1.2 | 1.8 |
Key Observations from Month 1
Observation 1: Sports markets were the most profitable per unit of capital. Sports markets had high volume, low adverse selection, and short resolution times. The team earned 45% of their total P&L from 24% of their markets.
Observation 2: Political markets had episodic adverse selection. During quiet periods, political market making was moderately profitable. But following debate performances and major polls, VPIN spiked above 0.7 and the team suffered concentrated losses. These bursts of adverse selection accounted for 60% of total adverse selection losses.
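For reference, VPIN (volume-synchronized probability of informed trading) is usually computed as the average absolute buy/sell imbalance over recent equal-volume buckets. The sketch below assumes trades have already been classified as buys or sells and grouped into buckets upstream; the function name, the window length, and the sample data are illustrative, not Aurora's settings.

```python
def vpin(buckets, window=50):
    """Order-flow imbalance over the most recent volume buckets.

    buckets: list of (buy_volume, sell_volume) pairs, one per volume bucket.
    Returns a value between 0 (balanced flow) and 1 (entirely one-sided flow).
    """
    recent = buckets[-window:]
    total = sum(buy + sell for buy, sell in recent)
    if total == 0:
        return 0.0
    return sum(abs(buy - sell) for buy, sell in recent) / total


quiet = [(55, 45), (48, 52), (50, 50)]   # made-up balanced flow
event = [(90, 10), (85, 15), (95, 5)]    # made-up one-sided flow
print(vpin(quiet), vpin(event))
```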
Observation 3: The US Election cluster was problematic. Because of high correlations, inventory accumulated in the same direction across multiple election markets simultaneously. On one day, a polling surprise caused the team to accumulate +300 contracts across 8 election markets within 15 minutes, creating a portfolio-level exposure far larger than intended.
Observation 4: Entertainment markets had wide spreads but few fills. With base spreads of 0.08–0.12, fill rates were only 2–3%, generating minimal revenue.
Corrective Actions
- Cluster limits: Introduced a hard limit of 200 total contracts across the US Election cluster.
- Event calendar: Built a calendar of known information events (debates, data releases, earnings) and pre-widened spreads by 50% in the 2 hours surrounding each event.
- Market pruning: Dropped 3 entertainment markets with near-zero fill rates and reallocated capital.
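The first two corrective actions can be sketched as two small checks. Several details are assumptions made for illustration: the cluster limit is applied to net signed exposure, "the 2 hours surrounding each event" is read as one hour on either side, and the function names and event time are hypothetical.

```python
from datetime import datetime, timedelta

CLUSTER_LIMIT = 200                  # contracts across the US Election cluster
EVENT_WINDOW = timedelta(hours=1)    # assumption: one hour on each side of the event
EVENT_WIDENING = 1.5                 # spreads pre-widened by 50%


def spread_multiplier(now: datetime, events: list) -> float:
    """Return 1.5 inside any event window, otherwise 1.0."""
    near_event = any(abs(now - event) <= EVENT_WINDOW for event in events)
    return EVENT_WIDENING if near_event else 1.0


def clip_to_cluster_limit(net_exposure: int, order_qty: int,
                          limit: int = CLUSTER_LIMIT) -> int:
    """Shrink a signed order so the cluster's net exposure stays within +/- limit."""
    if order_qty > 0:
        return max(0, min(order_qty, limit - net_exposure))
    return min(0, max(order_qty, -limit - net_exposure))


debate = datetime(2024, 9, 10, 21, 0)  # hypothetical event time
print(spread_multiplier(datetime(2024, 9, 10, 20, 30), [debate]))
print(clip_to_cluster_limit(net_exposure=180, order_qty=50))
```

Orders that reduce exposure pass through unchanged, so the limit never blocks the bot from unwinding.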
Phase 5: Optimization — Days 31–60
Adverse Selection Regime Detection
Dr. Park implemented a rolling adverse selection detector using VPIN and toxicity metrics:
class RegimeDetector:
    """Classify the adverse selection (AS) regime from VPIN and a toxicity metric."""

    def __init__(self, high_threshold=0.6, low_threshold=0.35):
        self.high_threshold = high_threshold
        self.low_threshold = low_threshold
        self.current_regime = 'normal'

    def update(self, vpin: float, toxicity: float) -> str:
        # Blend VPIN with toxicity scaled by a 0.03 reference level;
        # negative toxicity is floored at zero.
        score = 0.6 * vpin + 0.4 * max(toxicity / 0.03, 0)
        if score > self.high_threshold:
            self.current_regime = 'high_as'
        elif score < self.low_threshold:
            self.current_regime = 'low_as'
        elif self.low_threshold <= score <= self.high_threshold:
            self.current_regime = 'normal'
        return self.current_regime
When the detector identified the "high_as" regime, the bot automatically:
- Doubled the base spread
- Halved the order size
- Tightened position limits by 50%
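A minimal sketch of how those three adjustments might be applied to a market's quoting parameters. The `QuoteParams` container and its field names are hypothetical. The adjustments are computed from the baseline parameters on each call, so repeated "high_as" readings do not compound.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QuoteParams:
    base_spread: float
    order_size: int
    position_limit: int


def apply_regime(baseline: QuoteParams, regime: str) -> QuoteParams:
    """Return adjusted parameters for the current regime; baseline is never mutated."""
    if regime != "high_as":
        return baseline
    return replace(
        baseline,
        base_spread=baseline.base_spread * 2,          # double the base spread
        order_size=max(1, baseline.order_size // 2),   # halve the order size
        position_limit=baseline.position_limit // 2,   # tighten limits by 50%
    )


normal = QuoteParams(base_spread=0.05, order_size=20, position_limit=100)
print(apply_regime(normal, "high_as"))
```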
Cross-Market Signal Integration
For the Federal Reserve cluster, the team found that when the "Fed raises rates in March" market moved, related markets (inflation, unemployment) were slow to adjust. They built a cross-market signal:
$$ \hat{p}_{\text{inflation}} = \hat{p}_{\text{inflation, current}} + \beta \cdot (\Delta p_{\text{fed rate}} - \hat{\Delta} p_{\text{fed rate}}) $$
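where $\Delta p_{\text{fed rate}}$ is the observed move in the Fed market, $\hat{\Delta} p_{\text{fed rate}}$ is the expected move, and $\beta$ is a sensitivity estimated from historical data (distinct from the inventory-skew parameter $\beta = 1.5$). This allowed the team to shift their fair value estimate in related markets before the order book adjusted, earning an extra 1–2 cents per contract on these informed quote adjustments.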
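The adjustment can be written as a one-line function. The sensitivity value, the example prices, and the clamping of the result to the 0.01–0.99 range are illustrative assumptions, not figures from Aurora's system.

```python
def adjusted_fair_value(p_current: float, beta: float,
                        observed_move: float, expected_move: float = 0.0) -> float:
    """Shift a related market's fair value by beta times the surprise in the lead market."""
    surprise = observed_move - expected_move
    shifted = p_current + beta * surprise
    # Keep the estimate inside the valid price range of a binary contract.
    return min(0.99, max(0.01, shifted))


# Hypothetical: the Fed market jumps 6 cents when 1 cent was expected, beta = 0.5.
p_inflation = adjusted_fair_value(0.40, beta=0.5, observed_move=0.06, expected_move=0.01)
print(round(p_inflation, 3))
```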
Results — Days 31–60
| Metric | Days 31–60 | Change vs. Weeks 3–4 |
|---|---|---|
| Markets Quoted | 47 | -3 |
| Average Uptime | 98% | +1 pt |
| Total Trades | 5,234 | +27% |
| Gross Revenue (Spread) | $2,856 | +51% |
| Adverse Selection Loss | -$943 | +15% in dollars; 33% of gross revenue, down from 43% |
| Net P&L | +$1,613 | +93% |
| Max Drawdown | -$380 | -11% |
| Sharpe Ratio (annualized) | 2.4 | +33% |
Phase 6: Scaling and Mature Operation — Days 61–90
Market Resolution Management
Twelve markets resolved during this period. Key learnings:
- Early exit: For markets approaching resolution within 3 days, the team reduced position limits to 25% and widened spreads by 100%. This protected against binary payoff risk.
- Resolution P&L: Of 12 resolved markets, 7 were profitable (net positive P&L including resolution), 3 were near break-even, and 2 had significant losses. The two losing markets were both in the US Election cluster where the team had been unable to fully unwind inventory before resolution.
- New market entry: As markets resolved, the team rotated into new markets, maintaining a portfolio of ~47 active markets.
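The early-exit rule above can be expressed as a small parameter adjustment. The function name and example values are hypothetical, and the rule is shown as a hard switch at the 3-day mark; a production system might phase the change in gradually.

```python
def resolution_adjusted(base_spread: float, position_limit: int,
                        days_to_resolution: float, window_days: float = 3.0):
    """Return (spread, position_limit) after applying the early-exit rule."""
    if days_to_resolution <= window_days:
        # Inside the window: spreads widened by 100%, limits cut to 25%.
        return base_spread * 2.0, int(position_limit * 0.25)
    return base_spread, position_limit


print(resolution_adjusted(0.05, 100, days_to_resolution=2))
print(resolution_adjusted(0.05, 100, days_to_resolution=10))
```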
Final 90-Day Results
| Metric | 90-Day Total |
|---|---|
| Total Trades | 14,891 |
| Gross Revenue (Spread Capture) | $6,987 |
| Adverse Selection Loss | -$2,891 |
| Inventory Marking Losses | -$412 |
| Resolution P&L | +$234 |
| Platform Fees | -$892 |
| Net P&L | +$3,026 |
| Return on Capital | 6.05% (24.2% annualized) |
| Max Drawdown | -$1,100 (2.2% of capital) |
| Sharpe Ratio (annualized) | 2.1 |
| Win Rate (daily) | 68% |
| Uptime | 97.2% |
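The headline figures in the table can be reconciled from its components. The short check below reproduces the net P&L, the return on the $50,000 of working capital, and the fee share of gross revenue. The annualized figure matches a simple multiplication by four 90-day periods, which is an inference about how it was computed.

```python
CAPITAL = 50_000

components = {
    "gross_revenue": 6_987,
    "adverse_selection": -2_891,
    "inventory_marking": -412,
    "resolution_pnl": 234,
    "platform_fees": -892,
}

net_pnl = sum(components.values())
return_on_capital = net_pnl / CAPITAL
annualized = return_on_capital * 4  # simple scaling of one 90-day period (assumption)
fee_share = -components["platform_fees"] / components["gross_revenue"]

print(net_pnl)
print(f"{return_on_capital:.2%}  {annualized:.1%}  {fee_share:.1%}")
```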
P&L Attribution by Category
| Category | Gross Revenue | AS Loss | Net P&L | Capital Share |
|---|---|---|---|---|
| Sports | $2,450 | -$612 | +$1,500 | 20% |
| US Politics | $2,100 | -$1,230 | +$480 | 35% |
| Intl Politics | $890 | -$380 | +$310 | 15% |
| Economics | $1,050 | -$420 | +$480 | 20% |
| Entertainment | $497 | -$249 | +$156 | 10% |
Lessons Learned
1. Diversification is the Single Most Important Factor
The sports portfolio alone had a Sharpe of 1.4. The full portfolio had a Sharpe of 2.1. The improvement came entirely from diversification across uncorrelated categories.
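A standard result helps put these two numbers in context: for mutually uncorrelated return streams combined at optimal weights, squared Sharpe ratios add. The sketch below uses that idealized model (zero correlation and optimal weighting, neither of which the case study verifies) to back out what the non-sports markets would need to contribute. The function names are illustrative.

```python
import math


def combined_sharpe(sharpes):
    """Sharpe of an optimally weighted mix of mutually uncorrelated return streams."""
    return math.sqrt(sum(s * s for s in sharpes))


def implied_rest_sharpe(total: float, part: float) -> float:
    """Sharpe the remaining streams must deliver, under the same assumptions."""
    return math.sqrt(total ** 2 - part ** 2)


rest = implied_rest_sharpe(total=2.1, part=1.4)
print(round(rest, 2))
print(round(combined_sharpe([1.4, rest]), 2))
```

Under these assumptions, the remaining categories would need a combined Sharpe of roughly 1.6 to lift the portfolio from 1.4 to 2.1.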
2. Adverse Selection is Episodic, Not Constant
Steady-state adverse selection was manageable. The dangerous periods were the sudden spikes around information events. A regime detection system was essential for survival.
3. Correlation Management Cannot Be an Afterthought
The US Election cluster nearly caused a catastrophic drawdown in week 3. Treating correlated markets independently for risk purposes is a recipe for blow-ups.
4. Market Selection Matters More Than Strategy Sophistication
Dropping 3 unprofitable markets and selecting better replacements had a larger impact than any strategy improvement. Not every market deserves a market maker.
5. Fees Eat Spread
Platform fees consumed 12.8% of gross revenue. Negotiating better fee tiers (which required hitting volume thresholds) was a critical business objective.
6. Resolution Management is a Distinct Skill
The transition from "market making" to "position unwinding" as markets approach resolution requires a different mindset and different parameters. The team learned to start unwinding 5 days before resolution.
Discussion Questions
- Aurora allocated 35% of capital to US political markets but earned only 16% of net P&L from that category. Should they reduce political market exposure? What factors beyond raw P&L should they consider?
- The cross-market signal for the Federal Reserve cluster improved P&L by approximately $200 over the 90 days. Is this worth the complexity and risk of signal-based quoting?
- If Aurora wanted to scale to 200 markets, what infrastructure and risk management changes would be necessary?
- The Sharpe ratio of 2.1 is strong but depends on the 90-day sample. How would you estimate the true long-run Sharpe ratio, and what confidence interval would you place on it?
- A competitor enters the market and narrows spreads by 30% in sports markets. How should Aurora respond?