Case Study 1: Building a Combinatorial Market for Sports Parlays
Overview
Sports parlays, bets that combine the outcomes of multiple games, are among the most popular products in sports betting. A parlay pays out only if every leg wins. For example, "Team A wins Game 1 AND Team B wins Game 2 AND Team C wins Game 3" is a three-leg parlay. Sportsbooks traditionally price parlays by assuming independence between games and multiplying the individual odds. But games are often correlated: weather affects multiple outdoor games, a star player's injury news can shift odds across related markets, and conference dynamics create structural correlations.
In this case study, we design and implement a combinatorial prediction market for sports parlays that properly accounts for correlations between games. We handle the combinatorial explosion using a partition-based approach with approximate cross-partition pricing, and we simulate a full trading session to evaluate the system's performance.
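To make the cost of the independence assumption concrete, the following minimal sketch compares a two-leg parlay priced by multiplication with one priced from a joint probability that includes correlation. The function names and the correlation value rho = 0.2 are illustrative assumptions, not figures from the case study.

```python
import math

def independent_parlay(probs):
    """Price a parlay by multiplying leg probabilities (independence)."""
    return math.prod(probs)

def correlated_two_leg(p1, p2, rho):
    """Joint probability of two binary legs with Pearson correlation rho.

    Uses P(A and B) = p1*p2 + rho * sqrt(p1(1-p1) * p2(1-p2)).
    """
    cov = rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    joint = p1 * p2 + cov
    # The joint probability must respect the Frechet bounds.
    lower = max(0.0, p1 + p2 - 1.0)
    upper = min(p1, p2)
    if not lower <= joint <= upper:
        raise ValueError("rho is infeasible for these marginals")
    return joint

p1, p2 = 0.55, 0.60          # two home-win probabilities
indep = independent_parlay([p1, p2])
corr = correlated_two_leg(p1, p2, rho=0.2)   # rho is an assumed value
print(f"independent: {indep:.4f}, correlated: {corr:.4f}")
```

With positive correlation the true parlay probability is higher than the product of the legs, so a book that multiplies odds underprices the parlay.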
The Problem
A sportsbook offers markets on 12 NFL games in a single Sunday slate. Bettors want to place parlay bets on subsets of these games. The goals are:
- Correct pricing: Parlay prices should reflect actual correlations (e.g., weather, conference dynamics) rather than assuming independence.
- Scalability: The system must handle 12 games (2^12 = 4,096 states) in real time, and the design should extend to larger slates.
- Fair returns: The market maker's expected loss should be bounded and predictable.
- Trader experience: Bettors should be able to easily construct and price parlays.
Data Model
We model 12 games as binary events (Home Team Wins or Away Team Wins). We define three correlation groups based on domain knowledge:
- Group 1 (Weather): Games 1-4 are all played in the same region with shared weather conditions. If weather is bad, home teams in dome stadiums have an advantage.
- Group 2 (Conference): Games 5-8 involve teams from the same conference, so outcomes may be correlated through division standings.
- Group 3 (Independent): Games 9-12 have no obvious structural correlation.
Architecture
We use a hybrid architecture:
- Within-group: Full combinatorial LMSR. Each group has at most 4 events (2^4 = 16 states), which is trivially manageable.
- Cross-group: Independence assumption for pricing, with a correlation correction factor estimated from historical data and trading patterns.
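As a rough illustration of the within-group component, the sketch below runs a standard LMSR over the 16 joint states of a four-game group. The liquidity value b = 100 and the trade size are assumptions chosen for the example; the case study's own code may structure this differently.

```python
import itertools
import math

def lmsr_cost(q, b):
    """LMSR cost C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(q)
    return m + b * math.log(sum(math.exp((x - m) / b) for x in q))

def lmsr_prices(q, b):
    """State prices: softmax of q / b."""
    m = max(q)
    w = [math.exp((x - m) / b) for x in q]
    total = sum(w)
    return [x / total for x in w]

# 4 games -> 16 joint states; each state is a tuple of 0/1 (1 = home win).
states = list(itertools.product([0, 1], repeat=4))
b = 100.0                      # assumed liquidity parameter
q = [0.0] * len(states)        # uniform start: every state priced at 1/16

# Buying 50 shares of "game 0 home win" adds 50 shares to every
# state in which game 0 is a home win.
new_q = [x + 50.0 if s[0] == 1 else x for x, s in zip(q, states)]
trade_cost = lmsr_cost(new_q, b) - lmsr_cost(q, b)

# Marginal price of the event = sum of prices of consistent states.
p_home = sum(p for p, s in zip(lmsr_prices(new_q, b), states) if s[0] == 1)
print(f"cost: {trade_cost:.2f}, new P(game 0 home win): {p_home:.4f}")
```

A bet on any combination of games inside the group is handled the same way: add shares to every state consistent with the bet, and charge the difference in the cost function.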
Implementation
The full implementation is in code/case-study-code.py. Here we walk through the key components.
Step 1: Define the Game Structure
games = {
    'weather_group': [
        {'id': 'G1', 'home': 'Bears', 'away': 'Lions', 'prior': 0.55},
        {'id': 'G2', 'home': 'Packers', 'away': 'Vikings', 'prior': 0.60},
        {'id': 'G3', 'home': 'Colts', 'away': 'Texans', 'prior': 0.48},
        {'id': 'G4', 'home': 'Browns', 'away': 'Bengals', 'prior': 0.42},
    ],
    'conference_group': [
        {'id': 'G5', 'home': 'Chiefs', 'away': 'Raiders', 'prior': 0.72},
        {'id': 'G6', 'home': 'Broncos', 'away': 'Chargers', 'prior': 0.45},
        {'id': 'G7', 'home': 'Bills', 'away': 'Dolphins', 'prior': 0.65},
        {'id': 'G8', 'home': 'Jets', 'away': 'Patriots', 'prior': 0.52},
    ],
    'independent_group': [
        {'id': 'G9', 'home': 'Cowboys', 'away': 'Eagles', 'prior': 0.50},
        {'id': 'G10', 'home': '49ers', 'away': 'Seahawks', 'prior': 0.68},
        {'id': 'G11', 'home': 'Ravens', 'away': 'Steelers', 'prior': 0.58},
        {'id': 'G12', 'home': 'Bucs', 'away': 'Saints', 'prior': 0.53},
    ],
}
Step 2: Initialize the Partition Market
Each group gets its own LMSR sub-market. We set initial share vectors to reflect the prior probabilities (derived from historical win rates and betting lines).
The initialization procedure adjusts the LMSR starting shares so that initial prices match the given priors rather than being uniform:
# For each game in a group, buy shares to move the price
# from 0.5 (uniform prior) to the specified prior
for game in group:
    target_p = game['prior']
    # shares needed: b * ln(target_p / (1 - target_p)) in the
    # binary case, distributed across states where this game
    # is won by the home team
    ...
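One way to realize this initialization, sketched here under the assumption that games within a group start out independent, is to set each state's shares to b times the log of the product of its per-game prior probabilities. Home-win states then carry b * ln(p / (1 - p)) more shares than the matching away-win states, in line with the comment above. The helper names and b = 100 are hypothetical.

```python
import itertools
import math

def initial_shares(priors, b):
    """Return (states, q) so that LMSR marginals equal the given priors.

    Assumes games in the group are independent at initialization.
    """
    states = list(itertools.product([0, 1], repeat=len(priors)))
    q = []
    for s in states:
        log_p = sum(math.log(p if won else 1.0 - p)
                    for p, won in zip(priors, s))
        q.append(b * log_p)
    return states, q

def lmsr_prices(q, b):
    """State prices: softmax of q / b."""
    m = max(q)
    w = [math.exp((x - m) / b) for x in q]
    total = sum(w)
    return [x / total for x in w]

priors = [0.55, 0.60, 0.48, 0.42]   # weather group priors from Step 1
b = 100.0                            # assumed liquidity parameter
states, q = initial_shares(priors, b)
prices = lmsr_prices(q, b)

# Recover each game's marginal home-win price from the state prices.
marginals = [sum(p for p, s in zip(prices, states) if s[i] == 1)
             for i in range(len(priors))]
print([round(m, 4) for m in marginals])
```

A group-level prior with built-in correlation would need a different starting vector; this sketch only matches the marginals.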
Step 3: Parlay Pricing
When a bettor requests a parlay price, the system:
- Identifies which groups the parlay spans.
- For each group, computes the joint probability of the relevant game outcomes within that group using exact LMSR prices.
- Multiplies the per-group probabilities together across groups (independence assumption).
- Applies a correlation correction factor based on historical data.
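For a 3-game parlay with one leg in each of the three groups, where each factor is the marginal probability read from that group's sub-market:
    P(G1 home win AND G5 home win AND G9 home win) =
        P_weather(G1 home win) *
        P_conference(G5 home win) *
        P_independent(G9 home win) *
        correction_factor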
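The pricing steps above can be sketched as follows. For brevity the per-group state prices are built directly from the Step 1 priors rather than from a live LMSR, and the correction factor of 1.05 is a placeholder assumption; all function names are hypothetical.

```python
import itertools
import math

def product_prices(priors):
    """Toy state prices for one group: product of per-game priors."""
    states = itertools.product([0, 1], repeat=len(priors))
    return {s: math.prod(p if won else 1.0 - p for p, won in zip(priors, s))
            for s in states}

def group_joint_probability(state_prices, legs):
    """Sum the prices of all states consistent with the requested legs.

    legs maps a game's index within the group to 1 (home) or 0 (away).
    """
    return sum(p for state, p in state_prices.items()
               if all(state[i] == outcome for i, outcome in legs.items()))

def parlay_price(group_prices, parlay, correction_factor=1.0):
    """Multiply per-group joint probabilities, then apply the correction."""
    price = correction_factor
    for group, legs in parlay.items():
        price *= group_joint_probability(group_prices[group], legs)
    return min(max(price, 0.0), 1.0)   # keep the result a valid probability

group_prices = {
    "weather_group": product_prices([0.55, 0.60, 0.48, 0.42]),
    "conference_group": product_prices([0.72, 0.45, 0.65, 0.52]),
    "independent_group": product_prices([0.50, 0.68, 0.58, 0.53]),
}
# G1, G5, and G9 are the first game (index 0) in their groups.
parlay = {"weather_group": {0: 1},
          "conference_group": {0: 1},
          "independent_group": {0: 1}}

base = parlay_price(group_prices, parlay)
adjusted = parlay_price(group_prices, parlay, correction_factor=1.05)
print(f"independent: {base:.4f}, corrected: {adjusted:.4f}")
```

Legs that fall in the same group go into the same legs dictionary, so their joint probability comes from the group's state prices rather than from multiplication.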
Step 4: Trading Simulation
We simulate 500 trades from various bettor profiles:
- Sharp bettors (10% of population): Have accurate private information about game outcomes. They tend to buy underpriced parlays and sell overpriced ones.
- Casual bettors (70%): Place parlays based on team loyalty and simple heuristics. They tend to overbet favorites and popular teams.
- Correlation traders (20%): Specifically look for mispriced correlations. They buy parlays that the market underprices due to the independence assumption.
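A simulation loop along these lines needs a way to draw bettor profiles in the stated proportions and to turn a belief into a trade. The sketch below is one possible setup; the decision threshold of 0.02 and the helper names are assumptions, and the belief models for each profile are left out.

```python
import random
from collections import Counter

# Population shares taken from the profile descriptions above.
PROFILE_WEIGHTS = {"sharp": 0.10, "casual": 0.70, "correlation": 0.20}

def sample_profiles(n_trades, seed=0):
    """Draw one bettor profile per trade, weighted by population share."""
    rng = random.Random(seed)
    names = list(PROFILE_WEIGHTS)
    weights = list(PROFILE_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n_trades)

def trade_direction(belief, market_price, threshold=0.02):
    """Buy if the parlay looks underpriced, sell if overpriced, else pass."""
    if belief > market_price + threshold:
        return "buy"
    if belief < market_price - threshold:
        return "sell"
    return "pass"

profiles = sample_profiles(500)
counts = Counter(profiles)
print(counts)
print(trade_direction(belief=0.30, market_price=0.25))
```

Seeding the generator keeps a run reproducible, which makes it easier to compare results with and without correlation traders.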
Step 5: Results Analysis
After 500 trades, we analyze:
- Price accuracy: How close are the market's parlay prices to the "true" joint probabilities (computed from a known data-generating process)?
- Market maker P&L: How much does the market maker lose, and is it within the theoretical bound?
- Information aggregation: Do sharp traders successfully move prices toward truth?
- Correlation discovery: Does the market learn about cross-group correlations through trading?
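The price-accuracy metric can be computed as a mean absolute error between market parlay prices and the true joint probabilities. The sketch below shows the calculation; the input numbers are made up for illustration and are not simulation output.

```python
def mean_absolute_error(market_prices, true_probs):
    """Average absolute gap between market prices and true probabilities."""
    if not market_prices or len(market_prices) != len(true_probs):
        raise ValueError("inputs must be non-empty and the same length")
    gaps = [abs(m - t) for m, t in zip(market_prices, true_probs)]
    return sum(gaps) / len(gaps)

# Made-up example values for three parlays (not results from the study).
market = [0.33, 0.20, 0.41]
truth = [0.35, 0.19, 0.45]
mae = mean_absolute_error(market, truth)
print(f"MAE: {mae * 100:.2f} percentage points")
```

Computing the same metric at checkpoints during the session shows whether trading is moving prices toward the truth.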
Key Findings
Finding 1: Within-Group Pricing Is Accurate
The partition-based approach provides exact pricing within groups. For 2-game parlays within a single group, the mean absolute pricing error is less than 1% after 200 trades.
Finding 2: Cross-Group Independence Assumption Introduces Bias
For parlays spanning multiple groups, the independence assumption introduces a systematic bias of 2-5 percentage points. Weather-correlated games are particularly mispriced: on rainy days, the probability of home teams winning (in games played outdoors) drops simultaneously, creating positive correlation that the market initially ignores.
Finding 3: Correlation Traders Reduce Bias
After correlation traders enter the market, cross-group pricing errors decrease by approximately 50%. The LMSR's within-group prices adjust to partially compensate for the missing cross-group correlations.
Finding 4: Market Maker Loss Is Bounded
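The market maker's total loss across 500 trades is $3,847, within the theoretical worst-case bound of $4,159. The bound is b * n * ln(2), where n = 12 is the number of games: each of the three 16-state sub-markets has a worst-case loss of b * ln(16) = 4b * ln(2), and the figure corresponds to b = 500 if all three groups share the same b. This is consistent with the LMSR-C's bounded loss guarantee in a practical setting.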
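The bound can be reproduced with a short calculation. The value b = 500 is inferred from the quoted figure and is an assumption here; note also that b * ln(N) is the worst case for a market that starts at uniform prices, and a market initialized at non-uniform prior prices has a worst case of b * ln(1 / p), where p is the initial price of the state that occurs.

```python
import math

def lmsr_worst_case_loss(b, n_states):
    """Worst-case LMSR loss b * ln(N) for N states and a uniform start."""
    return b * math.log(n_states)

b = 500.0                  # assumed: implied by the $4,159 figure
games_per_group = [4, 4, 4]

# Each group is its own sub-market with 2^k joint states.
bound = sum(lmsr_worst_case_loss(b, 2 ** k) for k in games_per_group)
realized_loss = 3847.0     # loss reported in Finding 4

print(f"bound: ${bound:,.0f}, realized share: {realized_loss / bound:.1%}")
```

For comparison, a single 4,096-state market with the same b has the same bound, b * ln(4096) = 12b * ln(2), so partitioning does not change the worst case when b is shared.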
Finding 5: Three-Leg Parlays Are Most Popular but Four- and Five-Leg Parlays Are Most Profitable for the Market Maker
Casual bettors overwhelmingly prefer 3-leg parlays. However, 4-leg and 5-leg parlays have the largest pricing errors (due to compounding independence assumptions), making them the most profitable for the market maker.
Lessons Learned
- Partition markets are a practical starting point. Even without perfect cross-group pricing, within-group accuracy provides significant value.
- Correlation traders are essential. The market needs traders who specifically look for and exploit correlation mispricings. Without them, the independence assumption persists indefinitely.
- The LMSR liquidity parameter b must be tuned per group. Groups with more volatile games need higher b to avoid excessive price swings.
- User interface matters. Bettors in our simulation were more likely to place parlays when the interface showed both the independent price and the market-adjusted price, building trust.
- The 4,096-state full combinatorial approach would have been feasible for 12 games. In hindsight, with modern hardware, full enumeration for 12 events is practical. The partition approach shines when scaling to larger slates (e.g., 30+ games in a busy Saturday of college football).
Extensions
- Dynamic correlation estimation: Use a Bayesian approach to update cross-group correlation estimates as trades reveal information.
- Weather integration: Feed real-time weather data into the correlation correction factor.
- Live markets: Extend to in-game markets where correlations evolve as games progress.
- Larger slates: Scale to 20-30 games using the approximate methods from Section 30.5.