> "The real world doesn't present us with isolated questions. Events are entangled, correlated, and conditional. A prediction market that ignores these connections leaves money on the table --- and information in the dark."
In This Chapter
- 30.1 The Combinatorial Challenge
- 30.2 Full Combinatorial Markets
- 30.3 The Exponential Blowup Problem
- 30.4 Partition-Based Markets
- 30.5 Approximation Approaches
- 30.6 LMSR Extensions for Combinatorial Markets
- 30.7 Bundle Trading
- 30.8 Conditional Prediction Markets
- 30.9 Real-World Combinatorial Markets
- 30.10 Computational Trade-offs
- 30.11 Advanced: Market Design for Causal Inference
- 30.12 Chapter Summary
- What's Next
- Key Formulas Reference
Chapter 30: Combinatorial Prediction Markets
"The real world doesn't present us with isolated questions. Events are entangled, correlated, and conditional. A prediction market that ignores these connections leaves money on the table --- and information in the dark."
In every chapter so far, we have largely treated prediction markets as dealing with one question at a time: "Will candidate X win the election?" or "Will the project ship on time?" But the questions that matter most in decision-making are almost never isolated. A CEO wants to know: "If we launch Product A AND the Fed raises rates, what happens to our Q4 revenue?" A policy analyst asks: "If a carbon tax passes AND a recession hits, what is the probability unemployment exceeds 7%?" These compound, conditional, and combinatorial questions are where prediction markets can unlock their deepest value --- and where their most formidable computational challenges arise.
This chapter takes you deep into combinatorial prediction markets: markets that allow traders to express beliefs about combinations of events, conditional relationships, and joint probabilities. We will see why the naive approach collapses under the weight of exponential state spaces, explore clever approximations that make the problem tractable, and build working implementations of each approach in Python. By the end, you will understand both the theoretical foundations and the practical engineering trade-offs that define this frontier of market design.
30.1 The Combinatorial Challenge
30.1.1 Why Combinations Matter
Consider a simple prediction market with two binary questions:
- A: "Will the Federal Reserve raise interest rates in March?"
- B: "Will US GDP growth exceed 3% in Q2?"
A standard market setup would create two separate markets, each with its own probability. But a trader who believes these events are strongly correlated --- perhaps she thinks a rate hike will suppress growth --- has no way to express that belief in isolated markets. She cannot say, "I think P(A and B) is much lower than P(A) * P(B)." The correlation information is lost.
Combinatorial prediction markets solve this by allowing traders to bet on any Boolean combination of the underlying events:
- "A AND B" (rate hike AND strong growth)
- "A AND NOT B" (rate hike AND weak growth)
- "NOT A AND B" (no hike AND strong growth)
- "NOT A AND NOT B" (no hike AND weak growth)
With prices for all four states, the market implicitly encodes the full joint distribution over A and B, including their correlation.
30.1.2 The Joint Distribution Perspective
For n binary events, the full joint distribution requires specifying a probability for each of the 2^n possible outcomes (subject to the constraint that they sum to 1). Each "state of the world" is a complete assignment of TRUE or FALSE to every event.
For two events A and B, the four states and their probabilities might be:
| State | A | B | Probability |
|---|---|---|---|
| s1 | T | T | 0.15 |
| s2 | T | F | 0.35 |
| s3 | F | T | 0.20 |
| s4 | F | F | 0.30 |
From this joint distribution, we can compute anything:
- P(A) = P(s1) + P(s2) = 0.50
- P(B) = P(s1) + P(s3) = 0.35
- P(A and B) = P(s1) = 0.15
- P(B | A) = P(s1) / P(A) = 0.30
- P(A or B) = P(s1) + P(s2) + P(s3) = 0.70
A combinatorial market that prices all 2^n states is, in effect, aggregating the crowd's beliefs about the entire joint distribution.
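As a quick sanity check, here is a minimal sketch that encodes the table above as a Python dictionary and recovers each of those quantities by summing over states:

```python
# Joint distribution from the table above: (A, B) -> probability.
joint = {
    (True, True): 0.15,
    (True, False): 0.35,
    (False, True): 0.20,
    (False, False): 0.30,
}

def prob(condition):
    """Sum the probabilities of all states satisfying a condition."""
    return sum(p for state, p in joint.items() if condition(state))

p_a = prob(lambda s: s[0])                    # P(A) = 0.50
p_b = prob(lambda s: s[1])                    # P(B) = 0.35
p_a_and_b = prob(lambda s: s[0] and s[1])     # P(A and B) = 0.15
p_b_given_a = p_a_and_b / p_a                 # P(B | A) = 0.30
p_a_or_b = prob(lambda s: s[0] or s[1])       # P(A or B) = 0.70
```

The same pattern --- sum the joint over the states that satisfy the query --- is exactly what the market makers later in this chapter do, just at much larger scale.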
30.1.3 The Combinatorial Explosion
Here is the fundamental problem: the number of states grows exponentially.
| Number of Events (n) | Number of States (2^n) |
|---|---|
| 2 | 4 |
| 5 | 32 |
| 10 | 1,024 |
| 20 | 1,048,576 |
| 30 | 1,073,741,824 |
| 50 | ~1.13 x 10^15 |
With just 20 binary events, there are over a million states. With 50, the number exceeds a quadrillion. No market maker can maintain a cost function over a quadrillion states. No trader can browse a quadrillion contracts. This is the combinatorial explosion, and it is the central challenge of this chapter.
30.1.4 The Design Spectrum
The approaches to combinatorial markets fall along a spectrum from exact to approximate:
- Full combinatorial (Section 30.2): Enumerate all 2^n states. Exact but infeasible for large n.
- Partition-based (Section 30.4): Decompose into independent sub-problems. Exact for independent groups.
- Approximation (Section 30.5): Use sampling, belief propagation, or mean-field methods. Approximate but scalable.
- Lazy evaluation (Section 30.6): Only instantiate states that traders actually care about.
- Bundle trading (Section 30.7): Let traders express compound bets without full state enumeration.
- Conditional markets (Section 30.8): A special case that conditions on specific events.
Each approach makes different trade-offs between expressiveness, computational cost, and accuracy. Let us explore them one by one.
30.2 Full Combinatorial Markets
30.2.1 Complete State Space Enumeration
The conceptually simplest approach to a combinatorial market is to enumerate every possible state of the world and run a standard automated market maker (like the LMSR from Chapter 8) over the full state space.
Recall the LMSR cost function for a set of outcomes:
$$C(\mathbf{q}) = b \ln\left(\sum_{i=1}^{N} e^{q_i / b}\right)$$
where $\mathbf{q} = (q_1, q_2, \ldots, q_N)$ is the vector of outstanding shares for each outcome and $b$ is the liquidity parameter.
The price (instantaneous probability) for outcome $i$ is:
$$p_i = \frac{e^{q_i / b}}{\sum_{j=1}^{N} e^{q_j / b}}$$
For a full combinatorial market with n binary events, $N = 2^n$. Each outcome corresponds to one complete state of the world --- a specific TRUE/FALSE assignment for every event.
30.2.2 Pricing Any Boolean Expression
The power of the full combinatorial market is that any Boolean expression over the events can be priced by summing the probabilities of the states where that expression is true.
For example, with events A, B, C:
- P(A) = sum of probabilities over all states where A is true (4 out of 8 states)
- P(A AND B) = sum over states where both A and B are true (2 out of 8 states)
- P(A OR (B AND NOT C)) = sum over the appropriate states
This gives us enormous expressive power. Traders can bet on anything.
30.2.3 Implementation: Full Combinatorial LMSR
Let us build a complete implementation for small n. The following Python code implements a full combinatorial LMSR:
import numpy as np
from itertools import product
class FullCombinatorialLMSR:
"""
Full combinatorial LMSR for n binary events.
Each state is a tuple of n booleans.
Feasible for n <= ~15.
"""
def __init__(self, event_names, b=100.0):
self.event_names = event_names
self.n = len(event_names)
self.num_states = 2 ** self.n
self.b = b
# Generate all states: each is a tuple of bools
self.states = list(product([False, True], repeat=self.n))
# Map state tuples to indices
self.state_to_idx = {s: i for i, s in enumerate(self.states)}
# Outstanding shares for each state
self.shares = np.zeros(self.num_states)
self.trade_history = []
def cost(self, shares=None):
"""LMSR cost function."""
if shares is None:
shares = self.shares
return self.b * np.log(np.sum(np.exp(shares / self.b)))
def state_prices(self):
"""Current price (probability) for each state."""
exp_q = np.exp(self.shares / self.b)
return exp_q / np.sum(exp_q)
def event_probability(self, event_idx):
"""Marginal probability that event_idx is True."""
prices = self.state_prices()
return sum(
prices[i] for i, s in enumerate(self.states)
if s[event_idx]
)
def boolean_price(self, condition_fn):
"""
Price any Boolean expression over events.
condition_fn takes a state tuple, returns True/False.
"""
prices = self.state_prices()
return sum(
prices[i] for i, s in enumerate(self.states)
if condition_fn(s)
)
def conditional_probability(self, target_fn, given_fn):
"""
P(target | given) = P(target AND given) / P(given).
Both are Boolean functions over states.
"""
p_given = self.boolean_price(given_fn)
if p_given < 1e-15:
return float('nan')
p_both = self.boolean_price(
lambda s: target_fn(s) and given_fn(s)
)
return p_both / p_given
def buy_boolean(self, condition_fn, num_shares=1.0):
"""
Buy shares of a Boolean expression.
This adds shares to every state where the condition is true.
Returns cost of the trade.
"""
old_cost = self.cost()
delta = np.array([
num_shares if condition_fn(s) else 0.0
for s in self.states
])
self.shares += delta
new_cost = self.cost()
cost = new_cost - old_cost
self.trade_history.append({
'type': 'buy_boolean',
'num_shares': num_shares,
'cost': cost,
'matching_states': int(np.sum(delta > 0))
})
return cost
def sell_boolean(self, condition_fn, num_shares=1.0):
"""Sell shares of a Boolean expression."""
return -self.buy_boolean(condition_fn, -num_shares)
def joint_distribution(self):
"""Return the full joint distribution as a dict."""
prices = self.state_prices()
dist = {}
for i, s in enumerate(self.states):
label = tuple(
(self.event_names[j], s[j])
for j in range(self.n)
)
dist[label] = prices[i]
return dist
def correlation(self, event_i, event_j):
"""
Correlation between two events implied by market prices.
corr = (P(A and B) - P(A)*P(B)) / sqrt(P(A)(1-P(A))*P(B)(1-P(B)))
"""
p_a = self.event_probability(event_i)
p_b = self.event_probability(event_j)
p_ab = self.boolean_price(
lambda s: s[event_i] and s[event_j]
)
cov = p_ab - p_a * p_b
denom = np.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
if denom < 1e-15:
return float('nan')
return cov / denom
def summary(self):
"""Print market summary."""
print(f"Combinatorial LMSR: {self.n} events, "
f"{self.num_states} states, b={self.b}")
print(f"\nMarginal probabilities:")
for i, name in enumerate(self.event_names):
print(f" {name}: {self.event_probability(i):.4f}")
print(f"\nPairwise correlations:")
for i in range(self.n):
for j in range(i+1, self.n):
corr = self.correlation(i, j)
print(f" {self.event_names[i]} <-> "
f"{self.event_names[j]}: {corr:.4f}")
# Example usage
if __name__ == "__main__":
events = ["RateHike", "GDPHigh", "StocksUp"]
market = FullCombinatorialLMSR(events, b=50.0)
# Initially all states are equiprobable
market.summary()
# Trader believes rate hike is likely
cost = market.buy_boolean(lambda s: s[0], num_shares=20)
print(f"\nBought 'RateHike' for ${cost:.2f}")
# Trader believes rate hike AND stocks down are correlated
cost = market.buy_boolean(
lambda s: s[0] and not s[2],
num_shares=15
)
print(f"Bought 'RateHike AND NOT StocksUp' for ${cost:.2f}")
market.summary()
30.2.4 Why Full Combinatorial is Impractical for Large n
The implementation above works beautifully for 3 events (8 states) or even 10 events (1,024 states). But consider the resource requirements:
- Memory: Storing the share vector for 2^n states requires 2^n floating-point numbers. At 8 bytes per double, 20 events need 8 MB, 25 events need 256 MB, and 30 events need 8 GB --- just for one array.
- Computation: Every trade requires computing the cost function, which sums over all 2^n states. Every price query for a Boolean expression requires iterating over all states.
- Liquidity: The LMSR's liquidity parameter b must be set relative to the number of states. With more states, each individual state gets less liquidity, making prices more volatile.
In practice, full combinatorial LMSR is feasible for up to about 15-20 events, depending on available hardware and latency requirements. Beyond that, we need smarter approaches.
30.3 The Exponential Blowup Problem
30.3.1 Understanding the Scale
To appreciate why 2^n growth is so devastating, let us think about it concretely. Suppose you want a combinatorial market on the outcomes of 30 events --- say, whether each of 30 different countries will meet their emissions targets in 2030.
The full state space has 2^30 = 1,073,741,824 states --- over a billion. Storing the share vector alone requires 8 GB of memory. Computing a single price update requires iterating over a billion entries. At a generous estimate of 1 nanosecond per operation, that is about 1 second per price update --- far too slow for a responsive market.
For 50 events, the numbers become absurd: 2^50 states would require about 8 petabytes of storage. No amount of hardware makes this tractable.
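The arithmetic behind these figures is easy to reproduce. Here is a back-of-the-envelope sketch, assuming 8 bytes per state and roughly one nanosecond per state visited (both assumptions, chosen to match the estimates above):

```python
def resource_estimate(n_events, bytes_per_state=8, ns_per_state=1.0):
    """Rough memory footprint and time for one full pass over 2^n states."""
    num_states = 2 ** n_events
    memory_gib = num_states * bytes_per_state / 2**30
    seconds_per_pass = num_states * ns_per_state * 1e-9
    return num_states, memory_gib, seconds_per_pass

for n in (20, 30, 50):
    states, gib, secs = resource_estimate(n)
    print(f"n={n}: {states:,} states, {gib:,.3f} GiB, {secs:,.3f} s per pass")
```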
30.3.2 The Dimensionality Curse in Market Terms
The combinatorial explosion has specific consequences for market design:
- Thin liquidity per state: If the market maker's total subsidy is $S$ and there are $2^n$ states, the effective liquidity per state is roughly $S / 2^n$. For large n, each state has negligible liquidity, meaning prices move wildly with small trades.
- Trader cognitive load: No human trader can meaningfully evaluate a billion states. Even sophisticated algorithmic traders struggle with state spaces this large.
- Price staleness: With 2^n states but far fewer active traders, most state prices will never be directly traded and will remain at their priors. The information aggregation benefit of the market is lost.
- Arbitrage complexity: Checking for and exploiting arbitrage opportunities (such as prices that violate the laws of probability) becomes computationally expensive in its own right.
30.3.3 When Full Combinatorial Is Feasible
Despite the explosion, there are real scenarios where full enumeration works:
- Small n (up to ~12-15): Election forecasts combining a handful of swing states, multi-event sports outcomes, or a few related corporate events.
- Hardware acceleration: GPU-based implementations can handle somewhat larger state spaces by parallelizing the exp and sum operations.
- Sparse trading: If traders only ever trade on a small fraction of states, lazy evaluation (Section 30.6) can defer computation.
- Offline analysis: If real-time pricing is not required, batch computation over large state spaces becomes feasible.
The key insight is that we rarely need the full state space. Most traders care about marginal probabilities, a few conditional probabilities, and perhaps some specific conjunctions. This opens the door to approximation.
30.4 Partition-Based Markets
30.4.1 Exploiting Independence
Many events in a combinatorial market are approximately independent. The outcome of the French election probably has little to do with whether a hurricane hits Florida. If we can identify groups of events that are independent of each other, we can decompose the full joint distribution into separate, smaller distributions.
If events {A1, A2, A3} are independent of events {B1, B2}, then:
$$P(A_1, A_2, A_3, B_1, B_2) = P(A_1, A_2, A_3) \times P(B_1, B_2)$$
Instead of one market with 2^5 = 32 states, we have two markets: one with 2^3 = 8 states and one with 2^2 = 4 states, for a total of 12 states. The savings grow dramatically with n.
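A short sketch quantifies the savings for any proposed grouping; the partition sizes below are illustrative:

```python
def partition_savings(partition_sizes):
    """Compare total states of independent sub-markets to the full joint."""
    n = sum(partition_sizes)
    partitioned = sum(2 ** k for k in partition_sizes)
    full = 2 ** n
    return partitioned, full, full / partitioned

# The example above: {A1, A2, A3} independent of {B1, B2}.
print(partition_savings([3, 2]))    # (12, 32, ~2.7x)

# 20 events split into five independent groups of four.
print(partition_savings([4] * 5))   # (80, 1048576, ~13107x)
```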
30.4.2 Tree-Structured Markets
A more general structure allows events to form a tree of dependencies. At the root, we have a "parent" event. Each branch leads to "child" events whose probabilities are conditioned on the parent's outcome.
For example:
- Root: "Will there be a recession?" (2 states)
- If recession: "Will unemployment exceed 8%?" (2 states)
- If no recession: "Will unemployment exceed 8%?" (2 states, different priors)
- If recession: "Will the incumbent lose re-election?" (2 states)
- If no recession: "Will the incumbent lose re-election?" (2 states)
This tree structure captures the key dependencies without requiring full enumeration. The total number of parameters is proportional to the number of edges in the tree, not 2^n.
30.4.3 Conditional Markets as a Special Case
A conditional market is a partition market with a specific structure: one event is designated as the "condition," and other markets exist separately for each value of the condition. This is the "if X happens, what is the probability of Y?" structure.
Formally, a conditional market for P(Y | X) maintains:
- A market for X
- A market for Y given X = True
- A market for Y given X = False
The joint probability is recovered as:
$$P(X=x, Y=y) = P(Y=y | X=x) \times P(X=x)$$
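In code, recovering the joint from the three sub-markets is a direct application of that identity; the prices below are illustrative placeholders rather than outputs of a real market:

```python
# Illustrative prices read off the three sub-markets.
p_x = 0.40                            # market for X
p_y_given_x = {True: 0.70,            # market for Y given X = True
               False: 0.25}           # market for Y given X = False

def joint(x, y):
    """P(X = x, Y = y) recovered via the chain rule."""
    p_condition = p_x if x else 1 - p_x
    p_outcome = p_y_given_x[x] if y else 1 - p_y_given_x[x]
    return p_condition * p_outcome

# The four recovered joint probabilities sum to 1, as they must.
assert abs(sum(joint(x, y) for x in (True, False)
               for y in (True, False)) - 1.0) < 1e-12
```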
30.4.4 Implementation: Partition Market
import numpy as np
from itertools import product
class PartitionMarket:
"""
A combinatorial market decomposed into independent partitions.
Each partition is a group of events assumed independent of
events in other partitions.
"""
def __init__(self, partitions, b=100.0):
"""
partitions: list of lists of event names
e.g., [["A", "B"], ["C", "D", "E"]]
means A,B may be correlated with each other but are
independent of C,D,E.
"""
self.partitions = partitions
self.b = b
# All event names
self.all_events = []
self.event_to_partition = {}
self.event_to_local_idx = {}
for p_idx, partition in enumerate(partitions):
for local_idx, event in enumerate(partition):
self.all_events.append(event)
self.event_to_partition[event] = p_idx
self.event_to_local_idx[event] = local_idx
self.n = len(self.all_events)
# Create sub-markets for each partition
self.sub_markets = []
for partition in partitions:
n_p = len(partition)
num_states_p = 2 ** n_p
states_p = list(product([False, True], repeat=n_p))
self.sub_markets.append({
'events': partition,
'n': n_p,
'num_states': num_states_p,
'states': states_p,
'shares': np.zeros(num_states_p)
})
def _sub_cost(self, sub_market, shares=None):
if shares is None:
shares = sub_market['shares']
return self.b * np.log(np.sum(np.exp(shares / self.b)))
def _sub_prices(self, sub_market):
exp_q = np.exp(sub_market['shares'] / self.b)
return exp_q / np.sum(exp_q)
def event_probability(self, event_name):
"""Marginal probability of a single event."""
p_idx = self.event_to_partition[event_name]
l_idx = self.event_to_local_idx[event_name]
sub = self.sub_markets[p_idx]
prices = self._sub_prices(sub)
return sum(
prices[i] for i, s in enumerate(sub['states'])
if s[l_idx]
)
def joint_probability_within_partition(self, event_values):
"""
Compute joint probability for events within the same partition.
event_values: dict mapping event name -> bool
"""
# Group by partition
partitions_involved = {}
for event, value in event_values.items():
p_idx = self.event_to_partition[event]
if p_idx not in partitions_involved:
partitions_involved[p_idx] = {}
l_idx = self.event_to_local_idx[event]
partitions_involved[p_idx][l_idx] = value
# Multiply probabilities across independent partitions
prob = 1.0
for p_idx, constraints in partitions_involved.items():
sub = self.sub_markets[p_idx]
prices = self._sub_prices(sub)
p_sub = sum(
prices[i] for i, s in enumerate(sub['states'])
if all(s[l] == v for l, v in constraints.items())
)
prob *= p_sub
return prob
def buy_event(self, event_name, num_shares=1.0):
"""Buy shares of a single event being True."""
p_idx = self.event_to_partition[event_name]
l_idx = self.event_to_local_idx[event_name]
sub = self.sub_markets[p_idx]
old_cost = self._sub_cost(sub)
delta = np.array([
num_shares if s[l_idx] else 0.0
for s in sub['states']
])
sub['shares'] += delta
new_cost = self._sub_cost(sub)
return new_cost - old_cost
def buy_conjunction_same_partition(self, event_values, num_shares=1.0):
"""
Buy shares of a conjunction of events in the same partition.
event_values: dict of event_name -> bool
"""
# Verify all in same partition
p_indices = set(
self.event_to_partition[e] for e in event_values
)
if len(p_indices) > 1:
raise ValueError(
"Events span multiple partitions. "
"Use buy_cross_partition instead."
)
p_idx = p_indices.pop()
sub = self.sub_markets[p_idx]
old_cost = self._sub_cost(sub)
constraints = {
self.event_to_local_idx[e]: v
for e, v in event_values.items()
}
delta = np.array([
num_shares if all(s[l] == v for l, v in constraints.items())
else 0.0
for s in sub['states']
])
sub['shares'] += delta
new_cost = self._sub_cost(sub)
return new_cost - old_cost
def total_states(self):
"""Total states across all partitions (compare to 2^n)."""
return sum(sub['num_states'] for sub in self.sub_markets)
def full_states_comparison(self):
"""Compare partition states to full combinatorial states."""
partition_total = self.total_states()
full_total = 2 ** self.n
savings = 1 - partition_total / full_total
return {
'partition_states': partition_total,
'full_states': full_total,
'savings_ratio': savings,
'speedup': full_total / partition_total
}
def summary(self):
"""Print market summary."""
comp = self.full_states_comparison()
print(f"Partition Market: {self.n} events in "
f"{len(self.partitions)} partitions")
print(f" Partition states: {comp['partition_states']} "
f"(vs {comp['full_states']} full)")
print(f" Savings: {comp['savings_ratio']:.2%}, "
f"Speedup: {comp['speedup']:.1f}x")
print(f"\nPartitions:")
for i, p in enumerate(self.partitions):
sub = self.sub_markets[i]
print(f" [{i}] {p} ({sub['num_states']} states)")
print(f"\nMarginal probabilities:")
for event in self.all_events:
print(f" {event}: {self.event_probability(event):.4f}")
# Example
if __name__ == "__main__":
# 8 events in 3 partitions: much smaller than 2^8 = 256 states
market = PartitionMarket(
partitions=[
["RateHike", "GDPHigh", "StocksUp"], # 8 states
["HurricaneFL", "OilSpike"], # 4 states
["ElectionX", "ElectionY", "ElectionZ"] # 8 states
],
b=50.0
)
market.summary()
# Total: 20 states instead of 256 --- 12.8x speedup
# Trade on correlated events within a partition
cost = market.buy_conjunction_same_partition(
{"RateHike": True, "StocksUp": False},
num_shares=10
)
print(f"\nBought 'RateHike AND NOT StocksUp': ${cost:.2f}")
market.summary()
30.4.5 Limitations of Partitioning
The partition approach has a critical limitation: it cannot capture correlations between events in different partitions. If a rate hike (partition 1) is actually correlated with oil prices (partition 2), the partition market will miss this dependency entirely.
This creates a design tension. Lumping correlated events into the same partition improves accuracy but increases the state space for that partition. The optimal partition structure balances these concerns.
In practice, domain knowledge is essential. A market designer who understands the dependencies between events can create partitions that capture the most important correlations while keeping each partition small enough for exact computation.
30.5 Approximation Approaches
When events are too numerous for full enumeration and too entangled for clean partitioning, we turn to approximation. This section surveys the major approaches.
30.5.1 Sampling-Based Pricing
The simplest approximation is Monte Carlo sampling. Instead of summing over all 2^n states to compute a price, we sample states and estimate the sum.
For an LMSR with share vector $\mathbf{q}$, the probability of state $s$ is:
$$p(s) = \frac{e^{q_s / b}}{\sum_{s'} e^{q_{s'} / b}} = \frac{e^{q_s / b}}{Z}$$
where $Z$ is the partition function. Computing $Z$ exactly requires summing over all states, but we can estimate it by importance sampling.
The probability of a Boolean expression $\phi$ is:
$$P(\phi) = \sum_{s: \phi(s)=\text{true}} p(s)$$
We can estimate this with self-normalized importance sampling: draw states from a tractable proposal distribution and weight each sample by its unnormalized target probability:
$$\hat{P}(\phi) = \frac{\sum_{m=1}^{M} w_m \, \mathbb{1}[\phi(s_m)]}{\sum_{m=1}^{M} w_m}, \qquad w_m = \frac{e^{q_{s_m} / b}}{q(s_m)}$$
where $s_m \sim q(\cdot)$ is sampled from the proposal distribution $q$. Because the weights appear in both numerator and denominator, the unknown partition function $Z$ cancels.
A natural proposal is the product of independent marginals, which is easy to sample from but may be inaccurate when events are highly correlated.
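Here is a minimal, self-contained sketch of the estimator, using a toy unnormalized target and an independent product proposal as stand-ins for the market's share vector and marginals (both are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                  # number of binary events

def log_p_tilde(state):
    """Toy unnormalized log-probability: favor states where events 0 and 1 agree."""
    return 2.0 if state[0] == state[1] else 0.0

marginals = np.full(n, 0.5)            # independent proposal q(.)

def estimate(condition_fn, num_samples=20_000):
    """Self-normalized importance sampling estimate of P(condition)."""
    states = rng.random((num_samples, n)) < marginals            # draw from q
    log_q = np.where(states, np.log(marginals), np.log(1 - marginals)).sum(axis=1)
    log_w = np.array([log_p_tilde(s) for s in states]) - log_q   # log weights
    w = np.exp(log_w - log_w.max())                              # stabilized
    hits = np.array([condition_fn(s) for s in states], dtype=float)
    return float((w * hits).sum() / w.sum())

# P(event 0 AND event 1); the exact value for this toy target is
# e^2 / (2e^2 + 2), roughly 0.44.
print(estimate(lambda s: bool(s[0] and s[1])))
```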
30.5.2 Bayesian Network Approximation
A Bayesian network represents the joint distribution as a directed acyclic graph (DAG) where each node's probability depends only on its parents. This can dramatically reduce the number of parameters.
For example, with 10 events, a full joint distribution has 2^10 - 1 = 1,023 parameters. But if each event has at most 2 parents in the Bayesian network, the total parameters are at most 10 * 2^2 = 40.
The market can maintain a Bayesian network structure and update the conditional probability tables (CPTs) based on trades. The key challenge is determining the network structure (which events depend on which) and performing efficient inference (computing marginals and conditionals).
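A minimal sketch with a hypothetical three-node network (RateHike as the sole parent of both StocksUp and GDPHigh, with made-up CPT entries) shows how a joint probability is assembled from the factored representation and how few parameters it needs:

```python
# Conditional probability tables (illustrative numbers).
p_rate_hike = 0.6                          # P(RateHike = True)
p_stocks_up = {True: 0.30, False: 0.55}    # P(StocksUp = True | RateHike)
p_gdp_high = {True: 0.25, False: 0.45}     # P(GDPHigh = True | RateHike)

def joint(rate_hike, stocks_up, gdp_high):
    """P(RateHike, StocksUp, GDPHigh) from the factored network."""
    p = p_rate_hike if rate_hike else 1 - p_rate_hike
    p *= p_stocks_up[rate_hike] if stocks_up else 1 - p_stocks_up[rate_hike]
    p *= p_gdp_high[rate_hike] if gdp_high else 1 - p_gdp_high[rate_hike]
    return p

# 5 parameters instead of 2^3 - 1 = 7; the gap widens rapidly with n.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
assert abs(total - 1.0) < 1e-12
```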
30.5.3 Mean-Field Approximation
The mean-field approximation assumes all events are independent and finds the product distribution that best approximates the true joint distribution (in the KL-divergence sense).
$$q^*(x) = \prod_{i=1}^{n} q_i(x_i)$$
$$q^* = \arg\min_q \text{KL}(q \| p)$$
This is solved iteratively: for each event $i$, update $q_i$ while holding all other $q_j$ fixed. The update rule is:
$$q_i(x_i) \propto \exp\left(\mathbb{E}_{q_{-i}}[\ln p(x_i, x_{-i})]\right)$$
Mean-field is fast and scalable but completely ignores correlations. It works best when events are only weakly correlated.
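For a pairwise log-linear model over binary events (an assumed form, used here purely for illustration), the update collapses to a logistic fixed-point iteration over the marginals:

```python
import numpy as np

# Model: log p(x) = sum_i theta_i x_i + sum_{i<j} w_ij x_i x_j + const,
# with x_i in {0, 1}. The parameters below are illustrative.
theta = np.array([0.2, -0.5, 0.1])
w = np.array([[0.0, 1.5, 0.0],
              [1.5, 0.0, -1.0],
              [0.0, -1.0, 0.0]])

def mean_field(theta, w, iterations=50):
    """Coordinate-ascent mean-field: mu_i = sigmoid(theta_i + sum_j w_ij mu_j)."""
    mu = np.full(len(theta), 0.5)
    for _ in range(iterations):
        for i in range(len(theta)):
            mu[i] = 1.0 / (1.0 + np.exp(-(theta[i] + w[i] @ mu)))
    return mu

print(mean_field(theta, w))   # approximate marginals P(x_i = 1)
```

The fixed point captures how strongly each event leans True, but the product form by construction reports zero correlation between events.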
30.5.4 Belief Propagation
Belief propagation (also known as the sum-product algorithm) computes exact marginal probabilities on tree-structured graphical models and provides good approximations on graphs with loops ("loopy belief propagation").
The idea is to pass messages between neighboring nodes in the graph. Each message represents one node's "belief" about another node's probability distribution. After enough message passes, the beliefs converge to approximate marginal probabilities.
For a combinatorial market, the events are nodes, and edges represent dependencies (as discovered from trading patterns or specified by the market designer). Belief propagation can then estimate marginal and pairwise joint probabilities efficiently.
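The sketch below runs sum-product on a small chain of binary events, where the algorithm is exact; the unary and pairwise potentials are illustrative rather than derived from any market:

```python
import numpy as np

# Chain x1 - x2 - x3 of binary variables with unary and pairwise potentials.
unary = [np.array([1.0, 2.0]),        # phi_i(x_i) for x_i in {False, True}
         np.array([1.0, 1.0]),
         np.array([3.0, 1.0])]
pair = [np.array([[2.0, 1.0],         # psi_i(x_i, x_{i+1}): favors agreement
                  [1.0, 2.0]])] * 2

def chain_marginals(unary, pair):
    """Sum-product (forward-backward) marginals on a chain; exact for trees."""
    n = len(unary)
    fwd = [np.ones(2) for _ in range(n)]
    bwd = [np.ones(2) for _ in range(n)]
    for i in range(1, n):                       # forward message pass
        fwd[i] = (unary[i - 1] * fwd[i - 1]) @ pair[i - 1]
    for i in range(n - 2, -1, -1):              # backward message pass
        bwd[i] = pair[i] @ (unary[i + 1] * bwd[i + 1])
    beliefs = [unary[i] * fwd[i] * bwd[i] for i in range(n)]
    return [belief / belief.sum() for belief in beliefs]

for i, m in enumerate(chain_marginals(unary, pair)):
    print(f"P(x{i + 1} = True) = {m[1]:.4f}")
```

On graphs with loops, the same message updates are simply iterated until they (hopefully) converge, which is the "loopy" variant mentioned above.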
30.5.5 Implementation: Approximate Combinatorial Market
import numpy as np
from itertools import product
import random
class ApproximateCombinatorialMarket:
"""
Approximate combinatorial market using sampling and
mean-field approximation.
"""
def __init__(self, event_names, b=100.0, num_samples=10000):
self.event_names = event_names
self.n = len(event_names)
self.b = b
self.num_samples = num_samples
# Instead of full share vector, maintain a sparse
# representation: only track states that have been traded.
self.share_deltas = {} # state_tuple -> share_delta
# Mean-field parameters: independent marginal probabilities
self.marginals = np.full(self.n, 0.5)
# Pairwise correlations (for better sampling)
self.pairwise = np.zeros((self.n, self.n))
self.trade_count = 0
def _state_to_tuple(self, state):
return tuple(state)
def _get_share(self, state_tuple):
"""Get share count for a state (0 if never traded)."""
return self.share_deltas.get(state_tuple, 0.0)
def _log_unnormalized_prob(self, state_tuple):
"""Log unnormalized probability of a state."""
return self._get_share(state_tuple) / self.b
def _sample_proposal(self, num_samples):
"""Sample from the mean-field proposal distribution."""
samples = []
for _ in range(num_samples):
state = tuple(
random.random() < self.marginals[i]
for i in range(self.n)
)
samples.append(state)
return samples
def _proposal_prob(self, state_tuple):
"""Probability of a state under the proposal."""
prob = 1.0
for i, val in enumerate(state_tuple):
if val:
prob *= self.marginals[i]
else:
prob *= (1 - self.marginals[i])
return prob
def estimate_boolean_price(self, condition_fn, num_samples=None):
"""
Estimate price of a Boolean expression using
importance sampling.
"""
if num_samples is None:
num_samples = self.num_samples
samples = self._sample_proposal(num_samples)
# Importance weights
log_weights = []
condition_values = []
for state in samples:
log_w = (self._log_unnormalized_prob(state)
- np.log(max(self._proposal_prob(state), 1e-300)))
log_weights.append(log_w)
condition_values.append(1.0 if condition_fn(state) else 0.0)
# Normalize weights (log-sum-exp trick for stability)
log_weights = np.array(log_weights)
max_log_w = np.max(log_weights)
weights = np.exp(log_weights - max_log_w)
# Weighted estimate
numerator = np.sum(weights * np.array(condition_values))
denominator = np.sum(weights)
if denominator < 1e-300:
return 0.5 # Fall back to prior
return numerator / denominator
def buy_boolean(self, condition_fn, num_shares=1.0,
affected_events=None):
"""
Buy shares of a Boolean expression.
For efficiency, if affected_events is specified, only
enumerate states over those events and add shares
to matching states.
"""
if affected_events is not None:
# Only enumerate states for affected events
affected_idx = [
self.event_names.index(e) for e in affected_events
]
n_affected = len(affected_idx)
if n_affected > 20:
raise ValueError(
f"Too many affected events ({n_affected}). "
"Use approximate trading."
)
# Enumerate affected event combinations
for combo in product([False, True], repeat=n_affected):
# Create a template state
# For unaffected events, we need to decide values
# Here we create partial states and distribute shares
for bg_combo in product(
[False, True],
repeat=self.n - n_affected
):
state = [False] * self.n
unaffected_idx = [
i for i in range(self.n)
if i not in affected_idx
]
for k, idx in enumerate(affected_idx):
state[idx] = combo[k]
for k, idx in enumerate(unaffected_idx):
state[idx] = bg_combo[k]
state_tuple = tuple(state)
if condition_fn(state_tuple):
self.share_deltas[state_tuple] = (
self.share_deltas.get(state_tuple, 0.0)
+ num_shares
)
else:
# Sample-based approximate trading
samples = self._sample_proposal(self.num_samples)
matching = [s for s in samples if condition_fn(s)]
if matching:
share_per = num_shares / len(matching) * len(samples)
for state in matching:
self.share_deltas[state] = (
self.share_deltas.get(state, 0.0) + share_per
)
# Update mean-field marginals
self._update_marginals()
self.trade_count += 1
def _update_marginals(self, iterations=5):
"""Update mean-field marginal estimates."""
if not self.share_deltas:
return
for _ in range(iterations):
for i in range(self.n):
# Estimate P(event_i = True) using current
# approximation
samples_true = self._sample_proposal(
self.num_samples // 10
)
log_w_true = []
log_w_false = []
for state in samples_true:
log_w = self._log_unnormalized_prob(state)
if state[i]:
log_w_true.append(log_w)
else:
log_w_false.append(log_w)
if log_w_true and log_w_false:
max_t = max(log_w_true)
max_f = max(log_w_false)
sum_t = np.sum(np.exp(
np.array(log_w_true) - max_t
)) * np.exp(max_t)
sum_f = np.sum(np.exp(
np.array(log_w_false) - max_f
)) * np.exp(max_f)
total = sum_t + sum_f
if total > 0:
self.marginals[i] = np.clip(
sum_t / total, 0.01, 0.99
)
def event_probability(self, event_idx):
"""Estimated marginal probability of event."""
return self.marginals[event_idx]
def summary(self):
"""Print market summary."""
print(f"Approximate Combinatorial Market: "
f"{self.n} events")
print(f" Active states: {len(self.share_deltas)} "
f"(of {2**self.n} total)")
print(f" Trades: {self.trade_count}")
print(f"\nEstimated marginal probabilities:")
for i, name in enumerate(self.event_names):
print(f" {name}: {self.marginals[i]:.4f}")
30.5.6 Choosing an Approximation Method
The choice of approximation depends on the problem structure:
| Method | Best When | Complexity | Accuracy |
|---|---|---|---|
| Sampling | General case, moderate n | O(M * n) per query | Improves with M samples |
| Bayesian network | Known causal structure | O(n * 2^k) for max k parents | Exact if structure correct |
| Mean-field | Weak correlations | O(n * iterations) | Poor for strong correlations |
| Belief propagation | Sparse dependencies | O(n * 2^k * iterations) | Good for trees, approximate for loops |
| Partition | Clear independence groups | O(sum of 2^k_i) | Exact within partitions |
In practice, hybrid approaches work best. A market might use partitions for clearly independent groups, belief propagation for sparse dependencies within groups, and sampling as a fallback for arbitrary queries.
30.6 LMSR Extensions for Combinatorial Markets
30.6.1 Hanson's Combinatorial LMSR
Robin Hanson, who invented the LMSR (see Chapter 8), also proposed an extension for combinatorial markets. The key insight is that the LMSR cost function structure naturally extends to exponential state spaces --- if we are clever about computation.
The combinatorial LMSR (LMSR-C) maintains the same mathematical form:
$$C(\mathbf{q}) = b \ln\left(\sum_{s \in \mathcal{S}} e^{q_s / b}\right)$$
but avoids explicit enumeration of $\mathcal{S}$ by using the structure of trades. When a trader buys shares of a Boolean expression $\phi$, this adds shares to all states satisfying $\phi$. The cost function update can sometimes be computed without enumerating all states.
30.6.2 The Factored Representation
If shares have only been added via "simple" Boolean expressions (individual events, small conjunctions), the share vector $\mathbf{q}$ can be represented in a factored form. Suppose the only trades so far have been:
- Buy $\delta_1$ shares of event A
- Buy $\delta_2$ shares of event B
- Buy $\delta_3$ shares of "A AND B"
Then the share count for any state $s$ is:
$$q_s = \delta_1 \cdot \mathbb{1}[A \in s] + \delta_2 \cdot \mathbb{1}[B \in s] + \delta_3 \cdot \mathbb{1}[A \in s \wedge B \in s]$$
The partition function becomes:
$$Z = \sum_{s} \exp\left(\frac{\delta_1 \cdot \mathbb{1}[A \in s] + \delta_2 \cdot \mathbb{1}[B \in s] + \delta_3 \cdot \mathbb{1}[A \in s \wedge B \in s]}{b}\right)$$
For binary events, this can be computed by enumerating just the 4 states of (A, B), regardless of how many other events exist. The other events factor out because they contribute equally to every term.
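The following sketch checks this numerically for a hypothetical five-event market in which only A (index 0) and B (index 1) have been traded: the partition function computed over all 2^n states equals the four-state sum scaled by 2^(n-2):

```python
import numpy as np
from itertools import product

n, b = 5, 50.0
d1, d2, d3 = 30.0, 20.0, 10.0        # shares of A, B, and "A AND B"

def shares(state):
    """q_s implied by the three trades; only events 0 and 1 matter."""
    return d1 * state[0] + d2 * state[1] + d3 * state[0] * state[1]

# Full enumeration over all 2^n states.
z_full = sum(np.exp(shares(s) / b) for s in product([0, 1], repeat=n))

# Enumeration over just the four (A, B) combinations, scaled by 2^(n-2).
z_factored = 2 ** (n - 2) * sum(
    np.exp(shares((a, bb) + (0,) * (n - 2)) / b)
    for a, bb in product([0, 1], repeat=2)
)
assert np.isclose(z_full, z_factored)
```

Because the factor $2^{n-2}$ multiplies every term equally, it cancels in price ratios, which is exactly what the lazy-evaluation variant below exploits.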
30.6.3 Lazy Evaluation
Lazy evaluation extends this idea: only instantiate the states that are "relevant" to past trades. If nobody has ever traded on event C, then event C does not affect relative prices and can be ignored until someone trades on it.
This is implemented by tracking which events have been "activated" by trades and only enumerating combinations of activated events. The partition function for unactivated events contributes a constant factor that cancels out in price calculations.
30.6.4 Cost Function Properties
The LMSR-C preserves the key properties of the standard LMSR:
- Information incorporation: Prices reflect trader beliefs.
- Bounded loss: The market maker's maximum loss is bounded.
- No arbitrage: Prices form a valid probability distribution.
- Path independence: The cost of a sequence of trades does not depend on the order.
However, the bounded loss property deserves special attention. For the standard LMSR, the maximum loss is $b \ln N$ where $N$ is the number of outcomes. For LMSR-C, $N = 2^n$, so the maximum loss is $b \cdot n \cdot \ln 2$. This grows linearly in $n$, which is much more manageable than the state space itself.
30.6.5 Implementation: LMSR-C
import numpy as np
from itertools import product
from collections import defaultdict
class LMSR_C:
"""
Combinatorial LMSR with lazy evaluation.
Only tracks events that have been traded.
"""
def __init__(self, event_names, b=100.0):
self.event_names = event_names
self.n = len(event_names)
self.b = b
# Trade deltas: list of (condition_mask, delta)
# condition_mask maps event_index -> required_value
self.trade_deltas = []
# Activated events: events that appear in any trade
self.activated = set()
# Cache
self._cache_valid = False
self._cached_prices = None
def _share_for_state(self, state_tuple):
"""
Compute total shares for a given state by summing
all trade deltas that match.
"""
total = 0.0
for condition, delta in self.trade_deltas:
match = all(
state_tuple[idx] == val
for idx, val in condition.items()
)
if match:
total += delta
return total
def _compute_over_activated(self):
"""
Enumerate only states of activated events.
Returns dict of activated_state -> share_count.
"""
if not self.activated:
return {(): 0.0}
activated_list = sorted(self.activated)
n_act = len(activated_list)
results = {}
for combo in product([False, True], repeat=n_act):
# Map to full state (only activated positions matter)
state = {}
for i, idx in enumerate(activated_list):
state[idx] = combo[i]
# Compute shares
total = 0.0
for condition, delta in self.trade_deltas:
match = all(
state.get(idx, None) == val
for idx, val in condition.items()
if idx in self.activated
)
if match:
total += delta
results[combo] = total
return results, activated_list
def _partition_function(self):
"""Compute the partition function over activated events."""
if not self.activated:
return 1.0
state_shares, _ = self._compute_over_activated()
# Multiply by 2^(n - n_activated) for unactivated events
# But this cancels in price ratios, so we can ignore it
# for relative prices
values = np.array(list(state_shares.values()))
return np.sum(np.exp(values / self.b))
def state_prices_activated(self):
"""
Prices for each combination of activated events.
"""
if not self.activated:
return {(): 1.0}
state_shares, activated_list = self._compute_over_activated()
values = np.array(list(state_shares.values()))
keys = list(state_shares.keys())
exp_v = np.exp(values / self.b)
Z = np.sum(exp_v)
prices = exp_v / Z
return {keys[i]: prices[i] for i in range(len(keys))}, activated_list
def event_probability(self, event_name):
"""Marginal probability of an event."""
event_idx = self.event_names.index(event_name)
if event_idx not in self.activated:
return 0.5 # Unactivated events have uniform prior
prices, activated_list = self.state_prices_activated()
pos_in_activated = activated_list.index(event_idx)
prob = sum(
p for state, p in prices.items()
if state[pos_in_activated]
)
return prob
def cost_function(self):
"""Current value of the cost function."""
if not self.activated:
return self.b * np.log(1.0)
state_shares, _ = self._compute_over_activated()
values = np.array(list(state_shares.values()))
max_v = np.max(values / self.b)
log_Z = max_v + np.log(
np.sum(np.exp(values / self.b - max_v))
)
return self.b * log_Z
def buy(self, condition, num_shares=1.0):
"""
Buy shares of a condition.
condition: dict mapping event_name -> required_value
e.g., {"RateHike": True, "GDPHigh": False}
"""
# Convert to index-based condition
idx_condition = {}
for event_name, value in condition.items():
idx = self.event_names.index(event_name)
idx_condition[idx] = value
self.activated.add(idx)
old_cost = self.cost_function()
self.trade_deltas.append((idx_condition, num_shares))
self._cache_valid = False
new_cost = self.cost_function()
return new_cost - old_cost
def sell(self, condition, num_shares=1.0):
"""Sell shares."""
return self.buy(condition, -num_shares)
def conditional_probability(self, target, given):
"""
P(target | given).
target: dict of event_name -> bool
given: dict of event_name -> bool
"""
if not self.activated:
return 0.5
prices, activated_list = self.state_prices_activated()
def matches(state, constraints):
for event_name, value in constraints.items():
idx = self.event_names.index(event_name)
if idx not in self.activated:
continue
pos = activated_list.index(idx)
if state[pos] != value:
return False
return True
p_given = sum(
p for state, p in prices.items()
if matches(state, given)
)
p_both = sum(
p for state, p in prices.items()
if matches(state, {**given, **target})
)
if p_given < 1e-15:
return float('nan')
return p_both / p_given
def summary(self):
"""Print summary."""
n_act = len(self.activated)
print(f"LMSR-C: {self.n} events, "
f"{n_act} activated, "
f"{2**n_act} active states "
f"(of {2**self.n} total)")
print(f"Trades: {len(self.trade_deltas)}")
print(f"\nMarginal probabilities:")
for name in self.event_names:
prob = self.event_probability(name)
marker = "" if self.event_names.index(name) in self.activated else " (prior)"
print(f" {name}: {prob:.4f}{marker}")
def max_loss(self):
"""Maximum possible loss for the market maker."""
n_act = len(self.activated)
return self.b * n_act * np.log(2)
# Example
if __name__ == "__main__":
events = [
"RateHike", "GDPHigh", "StocksUp",
"OilSpike", "ElectionX", "InflationHigh",
"HousingCrash", "TechBoom"
]
market = LMSR_C(events, b=100.0)
market.summary()
# Only trade on a few events --- lazy evaluation kicks in
cost = market.buy({"RateHike": True}, num_shares=30)
print(f"\nBought RateHike=True: ${cost:.2f}")
cost = market.buy(
{"RateHike": True, "StocksUp": False},
num_shares=20
)
print(f"Bought RateHike=T AND StocksUp=F: ${cost:.2f}")
cost = market.buy({"GDPHigh": True}, num_shares=15)
print(f"Bought GDPHigh=True: ${cost:.2f}")
market.summary()
# Conditional probability
p = market.conditional_probability(
{"StocksUp": True},
{"RateHike": True}
)
print(f"\nP(StocksUp | RateHike) = {p:.4f}")
print(f"\nMax market maker loss: ${market.max_loss():.2f}")
30.6.6 Bounded Loss Analysis
A critical property for any automated market maker is bounded loss: the market operator needs to know the maximum amount they can lose.
For the standard LMSR with N outcomes, the maximum loss is:
$$L_{\max} = b \ln N$$
For the combinatorial LMSR with n binary events:
$$L_{\max} = b \ln(2^n) = b \cdot n \cdot \ln 2 \approx 0.693 \cdot b \cdot n$$
This is a remarkable result: the maximum loss grows only linearly in the number of events, even though the state space grows exponentially. This makes the LMSR-C economically viable even for moderately large n, provided we solve the computational challenges.
For the lazy evaluation variant, the bound is even tighter: the maximum loss depends only on the number of activated events, not the total number of events.
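A quick numerical check with an illustrative liquidity parameter of b = 100 makes the contrast concrete:

```python
import numpy as np

b = 100.0
for n in (10, 20, 50):
    max_loss = b * n * np.log(2)            # bound over all n events
    print(f"n={n}: worst-case loss ${max_loss:,.0f} despite {2**n:,} states")

# With lazy evaluation, only activated events count toward the bound.
activated = 3
print(f"3 activated events: worst-case loss ${b * activated * np.log(2):,.0f}")
```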
30.7 Bundle Trading
30.7.1 The Bundle Concept
A bundle is a collection of positions across multiple contracts that is traded as a single atomic transaction. Instead of a trader making separate bets on individual events, they submit a bundle order specifying their desired positions on multiple outcomes simultaneously.
For example, a bundle order might say: "I want to buy 10 shares of A, sell 5 shares of B, and buy 3 shares of (A AND B), all at a total cost not exceeding $15."
Bundle trading is powerful because it lets traders express complex beliefs about the relationships between events without needing to execute multiple separate trades (which could be front-run or only partially filled).
30.7.2 Bundle Order Types
Several types of bundle orders arise naturally:
- Portfolio bundles: Buy/sell a specified quantity of each contract in the bundle. The bundle executes atomically or not at all.
- Spread bundles: Express a view on the difference between two probabilities. For example, "Buy A, sell B" is a bet that P(A) > P(B).
- Conditional bundles: "If A resolves True, buy B." These only activate if a condition is met.
- Parlay bundles: Buy a conjunction: "I want to buy (A AND B AND C)." Equivalent to buying shares of the specific state where all three are true.
- Hedging bundles: Combinations designed to reduce risk. "Buy A, sell (A AND NOT B)" hedges A against the scenario where B fails.
30.7.3 Atomic Execution
A critical property of bundle trading is atomicity: the entire bundle executes or none of it does. This prevents the scenario where a trader gets half of a spread trade executed (buying A) but not the other half (selling B), leaving them with unintended exposure.
In a centralized market with an automated market maker, atomicity is straightforward: the market maker computes the total cost of the bundle and executes it in a single state update. In a decentralized market, atomicity requires smart contract mechanisms (see Chapter 28 for blockchain-based markets).
30.7.4 Implementation: Bundle Order System
import numpy as np
from itertools import product
class BundleOrderMarket:
"""
Market supporting bundle orders across multiple events.
Uses LMSR for pricing individual events and computes
bundle costs as the sum of individual position costs.
"""
def __init__(self, event_names, b=100.0):
self.event_names = event_names
self.n = len(event_names)
self.b = b
# Each event has its own share count
# (for Yes and No outcomes)
self.shares = {
name: {'yes': 0.0, 'no': 0.0}
for name in event_names
}
# Conjunction tracking for correlated bets
# Full combinatorial for small subsets
self.conjunction_shares = {} # frozenset -> shares
self.order_book = []
self.executed_orders = []
def _lmsr_cost(self, q_yes, q_no):
"""LMSR cost for a binary market."""
return self.b * np.log(
np.exp(q_yes / self.b) + np.exp(q_no / self.b)
)
def _lmsr_price(self, q_yes, q_no):
"""Price of Yes in a binary LMSR."""
e_y = np.exp(q_yes / self.b)
e_n = np.exp(q_no / self.b)
return e_y / (e_y + e_n)
def event_price(self, event_name):
"""Current price of an event being True."""
s = self.shares[event_name]
return self._lmsr_price(s['yes'], s['no'])
def submit_bundle(self, positions, max_cost=None):
"""
Submit a bundle order.
positions: list of dicts with keys:
- event: event name
- side: 'yes' or 'no'
- quantity: number of shares (positive = buy)
max_cost: maximum total cost (None = no limit)
Returns: dict with execution details or None if rejected.
"""
# Compute total cost
total_cost = 0.0
position_costs = []
for pos in positions:
event = pos['event']
side = pos['side']
quantity = pos['quantity']
s = self.shares[event]
old_cost = self._lmsr_cost(s['yes'], s['no'])
new_yes = s['yes'] + (quantity if side == 'yes' else 0)
new_no = s['no'] + (quantity if side == 'no' else 0)
new_cost = self._lmsr_cost(new_yes, new_no)
pos_cost = new_cost - old_cost
total_cost += pos_cost
position_costs.append(pos_cost)
# Check max cost constraint
if max_cost is not None and total_cost > max_cost:
return {
'executed': False,
'reason': f'Cost {total_cost:.2f} exceeds '
f'max {max_cost:.2f}',
'estimated_cost': total_cost
}
# Execute atomically
for pos in positions:
event = pos['event']
side = pos['side']
quantity = pos['quantity']
self.shares[event][side] += quantity
result = {
'executed': True,
'total_cost': total_cost,
'position_costs': position_costs,
'positions': positions
}
self.executed_orders.append(result)
return result
def submit_parlay(self, events_true, num_shares=1.0,
max_cost=None):
"""
Submit a parlay bet: all specified events are True.
This is a conjunction bet.
"""
positions = [
{'event': e, 'side': 'yes', 'quantity': num_shares}
for e in events_true
]
return self.submit_bundle(positions, max_cost)
def submit_spread(self, long_event, short_event,
num_shares=1.0, max_cost=None):
"""
Submit a spread bet: long one event, short another.
Profit if long_event is more likely than short_event.
"""
positions = [
{'event': long_event, 'side': 'yes',
'quantity': num_shares},
{'event': short_event, 'side': 'no',
'quantity': num_shares}
]
return self.submit_bundle(positions, max_cost)
def submit_hedge(self, primary_event, hedge_event,
primary_shares=1.0, hedge_ratio=0.5):
"""
Submit a hedged position: go long on primary,
partially hedge with another event.
"""
positions = [
{'event': primary_event, 'side': 'yes',
'quantity': primary_shares},
{'event': hedge_event, 'side': 'no',
'quantity': primary_shares * hedge_ratio}
]
return self.submit_bundle(positions)
def portfolio_value(self, positions_held):
"""
Estimate portfolio value given current prices.
positions_held: dict of (event, side) -> quantity
"""
value = 0.0
for (event, side), qty in positions_held.items():
price = self.event_price(event)
if side == 'yes':
value += qty * price
else:
value += qty * (1 - price)
return value
def summary(self):
"""Print market summary."""
print(f"Bundle Order Market: {self.n} events")
print(f"Executed orders: {len(self.executed_orders)}")
print(f"\nEvent prices:")
for name in self.event_names:
price = self.event_price(name)
s = self.shares[name]
print(f" {name}: {price:.4f} "
f"(yes={s['yes']:.1f}, no={s['no']:.1f})")
# Example
if __name__ == "__main__":
events = ["TeamA_Wins", "TeamB_Wins", "TeamC_Wins",
"TotalPoints_Over"]
market = BundleOrderMarket(events, b=50.0)
# Parlay bet: TeamA AND TeamB both win
result = market.submit_parlay(
["TeamA_Wins", "TeamB_Wins"],
num_shares=10
)
print(f"Parlay: {result}")
# Spread bet: TeamA more likely than TeamC
result = market.submit_spread(
"TeamA_Wins", "TeamC_Wins",
num_shares=15
)
print(f"Spread: {result}")
# Hedged position
result = market.submit_hedge(
"TeamA_Wins", "TotalPoints_Over",
primary_shares=20, hedge_ratio=0.3
)
print(f"Hedge: {result}")
market.summary()
30.7.5 Cross-Market Arbitrage with Bundles
Bundle trading creates interesting arbitrage opportunities (and challenges). Consider three markets:
- Market A: P(Rain) = 0.60
- Market B: P(Outdoor Event Cancelled) = 0.50
- Market C: P(Rain AND Event Cancelled) = 0.55
These prices are logically inconsistent: a conjunction can never be more probable than either of its conjuncts, so P(Rain AND Event Cancelled) cannot exceed min(P(Rain), P(Event Cancelled)). Here the conjunction's price of 0.55 exceeds P(Outdoor Event Cancelled) = 0.50. An arbitrageur can sell the conjunction in Market C and buy the cancellation contract in Market B to lock in a profit from the inconsistency.
Bundle orders make such arbitrage strategies easy to express and execute atomically. Without bundles, the arbitrageur would need to execute multiple trades separately, risking partial execution and adverse price movements between trades.
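A sketch of the consistency check and of the resulting atomic order, using the hypothetical prices above and the position format from the BundleOrderMarket listing (the event names here are placeholders), might look like this:

```python
def conjunction_arbitrage(p_rain, p_cancel, p_conjunction, num_shares=10.0):
    """
    If the conjunction is priced above either conjunct, return a bundle that
    shorts the conjunction and buys the cheaper individual event; otherwise
    return None. Prices and event names are illustrative.
    """
    cheaper = min(("Rain", p_rain), ("EventCancelled", p_cancel),
                  key=lambda item: item[1])
    if p_conjunction <= cheaper[1]:
        return None                  # prices respect P(A and B) <= min(P(A), P(B))
    return [
        {'event': 'Rain_AND_EventCancelled', 'side': 'no',
         'quantity': num_shares},    # sell the overpriced conjunction
        {'event': cheaper[0], 'side': 'yes',
         'quantity': num_shares},    # buy the cheaper conjunct
    ]

print(conjunction_arbitrage(0.60, 0.50, 0.55))
```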
30.8 Conditional Prediction Markets
30.8.1 The Conditional Market Framework
Conditional prediction markets answer questions of the form: "IF event X occurs, what is the probability of event Y?" These are fundamentally different from simple conjunction markets because they condition on one event and price another.
The mathematical framework is straightforward:
$$P(Y | X) = \frac{P(X \text{ and } Y)}{P(X)}$$
But the market design is subtle. A conditional market creates two "sub-markets" for Y:
- One that is active only if X resolves True
- One that is active only if X resolves False
If a trader buys a conditional share "Y given X," they receive:
- $1 if both X and Y are true
- $0 if X is true but Y is false
- A refund of their purchase price if X is false
The last point is crucial: conditional shares are voided (refunded) if the condition event does not occur. This means the trader is only betting on their belief about P(Y | X), not taking any risk on X itself.
30.8.2 Conditional Token Framework
The most practical implementation of conditional markets uses conditional tokens, popularized by Gnosis in the Ethereum ecosystem (but applicable to any market structure).
The idea is:
1. Start with a "collateral token" (e.g., $1).
2. Split it into conditional tokens for each outcome of the condition event. If the condition is binary (X or NOT X), you get two tokens.
3. Each conditional token can then be further split on the outcome event Y, yielding "X=True AND Y=True", "X=True AND Y=False", "X=False AND Y=True", and "X=False AND Y=False".
4. A market maker provides pricing for each conditional token.
This "split" operation is what creates the conditional structure. The conditional tokens for "Y given X" are the pair: - "X=True AND Y=True" - "X=True AND Y=False"
Their relative prices give P(Y | X).
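Reading P(Y | X) off that pair takes only a couple of lines; the token prices below are placeholders:

```python
# Illustrative prices for the two conditional tokens (in collateral units).
price_xT_yT = 0.24    # "X=True AND Y=True"
price_xT_yF = 0.16    # "X=True AND Y=False"

# Their sum prices "X=True"; the ratio gives the conditional probability.
p_x = price_xT_yT + price_xT_yF          # P(X) = 0.40
p_y_given_x = price_xT_yT / p_x          # P(Y | X) = 0.60
print(p_x, p_y_given_x)
```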
30.8.3 Why Conditional Markets Are Powerful
Conditional markets are particularly powerful for several reasons:
- Causal reasoning: "If we implement policy P, what will happen to outcome O?" This is the foundation of decision markets (Chapter 31).
- Scenario analysis: "If the economy enters recession, what is the probability of the incumbent winning re-election?" Decision-makers can explore scenarios systematically.
- Information revelation: Conditional markets reveal how traders think events are related. If P(Y | X) is very different from P(Y | NOT X), the market has revealed that traders believe X causally or correlationally affects Y.
- Reduced complexity: Instead of pricing the full joint distribution over many events, conditional markets focus attention on the most decision-relevant questions.
30.8.4 Implementation: Conditional Prediction Market
import numpy as np
class ConditionalMarket:
"""
Conditional prediction market: price Y given X.
Implements the conditional token framework.
"""
def __init__(self, condition_name, outcome_name, b=100.0):
self.condition_name = condition_name
self.outcome_name = outcome_name
self.b = b
# Four fundamental states
# (condition=T, outcome=T), (condition=T, outcome=F),
# (condition=F, outcome=T), (condition=F, outcome=F)
self.shares = np.zeros(4)
# Index: 0=TT, 1=TF, 2=FT, 3=FF
self.trade_history = []
def _cost(self, shares=None):
if shares is None:
shares = self.shares
return self.b * np.log(np.sum(np.exp(shares / self.b)))
def _prices(self, shares=None):
if shares is None:
shares = self.shares
exp_s = np.exp(shares / self.b)
return exp_s / np.sum(exp_s)
def state_prices(self):
"""Prices for all four states."""
p = self._prices()
return {
(True, True): p[0],
(True, False): p[1],
(False, True): p[2],
(False, False): p[3]
}
def condition_probability(self):
"""P(condition = True)."""
p = self._prices()
return p[0] + p[1]
def outcome_probability(self):
"""P(outcome = True) (marginal)."""
p = self._prices()
return p[0] + p[2]
def conditional_probability_given_true(self):
"""P(outcome = True | condition = True)."""
p = self._prices()
p_cond = p[0] + p[1]
if p_cond < 1e-15:
return float('nan')
return p[0] / p_cond
def conditional_probability_given_false(self):
"""P(outcome = True | condition = False)."""
p = self._prices()
p_not_cond = p[2] + p[3]
if p_not_cond < 1e-15:
return float('nan')
return p[2] / p_not_cond
def buy_conditional(self, condition_value, outcome_value,
num_shares=1.0):
"""
Buy shares of a specific conditional outcome.
Returns cost of the trade.
"""
idx = self._state_index(condition_value, outcome_value)
old_cost = self._cost()
self.shares[idx] += num_shares
new_cost = self._cost()
cost = new_cost - old_cost
self.trade_history.append({
'condition': condition_value,
'outcome': outcome_value,
'shares': num_shares,
'cost': cost
})
return cost
def buy_outcome_given_condition(self, outcome_value,
condition_value=True,
num_shares=1.0):
"""
Buy conditional shares: bet on outcome given condition.
If condition doesn't occur, shares are voided (refunded).
This is equivalent to buying the conjunction and being
refunded if condition fails.
"""
return self.buy_conditional(
condition_value, outcome_value, num_shares
)
def buy_condition(self, value=True, num_shares=1.0):
"""Buy shares on the condition event (both outcomes)."""
old_cost = self._cost()
if value:
self.shares[0] += num_shares # TT
self.shares[1] += num_shares # TF
else:
self.shares[2] += num_shares # FT
self.shares[3] += num_shares # FF
new_cost = self._cost()
return new_cost - old_cost
def buy_outcome(self, value=True, num_shares=1.0):
"""Buy shares on the outcome event (both conditions)."""
old_cost = self._cost()
if value:
self.shares[0] += num_shares # TT
self.shares[2] += num_shares # FT
else:
self.shares[1] += num_shares # TF
self.shares[3] += num_shares # FF
new_cost = self._cost()
return new_cost - old_cost
def _state_index(self, condition, outcome):
if condition and outcome:
return 0
elif condition and not outcome:
return 1
elif not condition and outcome:
return 2
else:
return 3
def causal_effect_estimate(self):
"""
Estimate causal effect:
P(outcome | condition=T) - P(outcome | condition=F).
Positive means condition increases outcome probability.
"""
p_y_given_x = self.conditional_probability_given_true()
p_y_given_not_x = self.conditional_probability_given_false()
return p_y_given_x - p_y_given_not_x
def summary(self):
"""Print market summary."""
print(f"Conditional Market: "
f"P({self.outcome_name} | {self.condition_name})")
print(f"\nState prices:")
sp = self.state_prices()
for state, price in sp.items():
c_str = "T" if state[0] else "F"
o_str = "T" if state[1] else "F"
print(f" {self.condition_name}={c_str}, "
f"{self.outcome_name}={o_str}: {price:.4f}")
print(f"\nMarginals:")
print(f" P({self.condition_name}) = "
f"{self.condition_probability():.4f}")
print(f" P({self.outcome_name}) = "
f"{self.outcome_probability():.4f}")
print(f"\nConditionals:")
print(f" P({self.outcome_name} | "
f"{self.condition_name}=T) = "
f"{self.conditional_probability_given_true():.4f}")
print(f" P({self.outcome_name} | "
f"{self.condition_name}=F) = "
f"{self.conditional_probability_given_false():.4f}")
effect = self.causal_effect_estimate()
direction = "increases" if effect > 0 else "decreases"
print(f"\nEstimated causal effect: {effect:+.4f} "
f"({self.condition_name} {direction} "
f"{self.outcome_name})")
class MultiConditionalMarket:
"""
Multiple conditional markets sharing a common condition.
E.g., "If carbon tax passes, what happens to GDP, employment,
and emissions?"
"""
def __init__(self, condition_name, outcome_names, b=100.0):
self.condition_name = condition_name
self.outcome_names = outcome_names
self.markets = {
name: ConditionalMarket(condition_name, name, b)
for name in outcome_names
}
def buy_conditional(self, outcome_name, outcome_value,
condition_value=True, num_shares=1.0):
"""Buy conditional shares in a specific outcome market."""
return self.markets[outcome_name].buy_outcome_given_condition(
outcome_value, condition_value, num_shares
)
def scenario_analysis(self):
"""
Print scenario analysis: what does the market predict
under each condition scenario?
"""
print(f"Scenario Analysis: {self.condition_name}")
print(f"{'='*60}")
for scenario_name, cond_val in [
("IF TRUE", True), ("IF FALSE", False)
]:
print(f"\n Scenario: {self.condition_name} = "
f"{scenario_name}")
print(f" {'-'*40}")
for name in self.outcome_names:
m = self.markets[name]
if cond_val:
p = m.conditional_probability_given_true()
else:
p = m.conditional_probability_given_false()
print(f" P({name}) = {p:.4f}")
print(f"\n Causal Effects:")
print(f" {'-'*40}")
for name in self.outcome_names:
effect = self.markets[name].causal_effect_estimate()
print(f" {name}: {effect:+.4f}")
# Example
if __name__ == "__main__":
# Single conditional market
market = ConditionalMarket("CarbonTax", "GDPGrowth", b=50.0)
# Traders express beliefs
# Trader 1: If carbon tax, GDP growth unlikely
market.buy_conditional(True, False, num_shares=20)
# Trader 2: If no carbon tax, GDP growth more likely
market.buy_conditional(False, True, num_shares=15)
# Trader 3: Carbon tax likely to pass
market.buy_condition(True, num_shares=10)
market.summary()
print("\n" + "="*60)
# Multi-conditional market
multi = MultiConditionalMarket(
"CarbonTax",
["GDPGrowth", "Employment", "Emissions", "Innovation"],
b=50.0
)
# Various traders express beliefs
multi.buy_conditional("GDPGrowth", False, True, 15)
multi.buy_conditional("Employment", False, True, 10)
multi.buy_conditional("Emissions", False, True, 25)
multi.buy_conditional("Innovation", True, True, 20)
multi.buy_conditional("GDPGrowth", True, False, 10)
multi.buy_conditional("Emissions", True, False, 5)
multi.scenario_analysis()
30.8.5 The Conditional Independence Problem
One subtlety of conditional markets is that running several of them over overlapping events can create apparent inconsistencies.
Suppose we have three events: X, Y, Z. We create conditional markets for:
- P(Y | X)
- P(Z | X)
- P(Y | Z)
These three markets can arrive at prices that are mutually inconsistent --- for example, they might imply that P(Y, Z | X) is negative. This happens because each conditional market operates independently and does not enforce the constraints of the joint distribution.
Solutions include:
- Coupled market makers that enforce joint consistency
- Arbitrage mechanisms that allow traders to exploit inconsistencies
- Periodic consistency checks that adjust prices when they drift too far from feasibility
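The "periodic consistency check" idea in the last bullet can be made concrete with a small feasibility test: collect the pairwise state prices from the three conditional markets and ask whether any joint distribution over (X, Y, Z) reproduces all of them. The sketch below is my own illustration (the function name and example numbers are invented, and it assumes SciPy is available); it phrases the question as a linear program.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def pairwise_markets_consistent(xy, xz, yz):
    """Return True if some joint over (X, Y, Z) reproduces all three
    pairwise state-price dictionaries, False otherwise."""
    states = list(itertools.product([True, False], repeat=3))  # (x, y, z)
    A_eq, b_eq = [], []
    # One equality constraint per priced pairwise state: the joint
    # probabilities of the matching (x, y, z) states must sum to that price.
    for prices, pick in [(xy, lambda s: (s[0], s[1])),
                         (xz, lambda s: (s[0], s[2])),
                         (yz, lambda s: (s[1], s[2]))]:
        for pair, price in prices.items():
            A_eq.append([1.0 if pick(s) == pair else 0.0 for s in states])
            b_eq.append(price)
    A_eq.append([1.0] * len(states))  # probabilities must sum to 1
    b_eq.append(1.0)
    res = linprog(c=np.zeros(len(states)), A_eq=np.array(A_eq),
                  b_eq=np.array(b_eq), bounds=[(0.0, 1.0)] * len(states),
                  method="highs")
    return res.success


# Consistent: every pairwise price vector comes from independent fair coins.
fair = {(a, c): 0.25 for a in (True, False) for c in (True, False)}
print(pairwise_markets_consistent(fair, fair, fair))      # True

# Inconsistent: the first two markets say Y and Z each agree with X 80% of
# the time, while the third says Y and Z never agree, which is impossible.
agree = {(True, True): 0.4, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.4}
never = {(True, True): 0.0, (True, False): 0.5,
         (False, True): 0.5, (False, False): 0.0}
print(pairwise_markets_consistent(agree, agree, never))   # False
```

When the check fails, the market operator can either let arbitrageurs close the gap or nudge the offending prices back toward the feasible set.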
30.9 Real-World Combinatorial Markets
30.9.1 Polymarket's Related Markets
Polymarket, one of the largest prediction market platforms as of 2024-2025, handles combinatorial questions through a pragmatic approach: related but separate markets. For example, during the 2024 US presidential election, Polymarket offered:
- Individual state-level markets (will candidate X win state Y?)
- Overall election outcome markets
- Markets on margin of victory
While these are not formally combinatorial markets in the academic sense, they achieve some of the same goals. Traders who believe certain states are correlated can express this by trading across multiple markets. The platform does not enforce joint consistency between markets, but arbitrageurs tend to keep prices roughly consistent.
The limitation is that direct questions about combinations ("What is the probability of winning state A AND state B AND state C?") cannot be directly traded. Traders must assemble their own multi-market positions.
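The arithmetic behind "assemble their own multi-market positions" is worth making explicit: given only the marginal prices from two separate markets, the conjunction price is not determined, only bounded. The hypothetical helper below computes those Frechet bounds; it is an illustration, not anything the platform provides.

```python
def conjunction_bounds(p_a, p_b):
    """Range of P(A and B) consistent with two separately traded marginals."""
    lower = max(0.0, p_a + p_b - 1.0)   # the events overlap as little as possible
    upper = min(p_a, p_b)               # one event is nested inside the other
    return lower, upper


# Two state-level markets price the same candidate at 0.55 and 0.60:
# the untraded conjunction could be anywhere from 0.15 to 0.55.
print(conjunction_bounds(0.55, 0.60))
```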
30.9.2 Gnosis Conditional Tokens
Gnosis developed the Conditional Token Framework (CTF) on Ethereum, which implements a formal version of the conditional market structure described in Section 30.8. Key features include:
- Token splitting and merging: Users can split a collateral token into conditional tokens and merge them back. This creates the conditional market structure.
- Deep nesting: Conditions can be nested. "If A, then if B, what is C?" is represented as tokens split on A, then split again on B.
- Composability: Conditional tokens are standard ERC-1155 tokens and can be used in any DeFi protocol.
- AMM integration: Combined with Gnosis's automated market makers (specifically their custom market maker contracts), conditional tokens enable on-chain combinatorial markets.
The CTF has been used for political prediction markets, tournament predictions, and even governance decisions in decentralized organizations.
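A rough off-chain mental model of splitting and merging is sketched below. This is only a Python bookkeeping sketch with invented labels, not the CTF contract interface: splitting collateral on condition A yields an "A" and a "not A" position, and splitting the "A" position again on B yields the nested "A & B" and "A & not B" positions described above.

```python
from collections import defaultdict


class ToyConditionalTokens:
    """Off-chain bookkeeping sketch of split/merge on binary conditions."""

    def __init__(self, collateral=0.0):
        self.balances = defaultdict(float)
        self.balances["$"] = collateral          # raw collateral

    def _children(self, position, condition):
        prefix = "" if position == "$" else position + " & "
        return [prefix + condition, prefix + "not " + condition]

    def split(self, position, condition, amount):
        """Turn `amount` of `position` into its two refinements on `condition`."""
        assert self.balances[position] >= amount, "insufficient balance"
        self.balances[position] -= amount
        for child in self._children(position, condition):
            self.balances[child] += amount

    def merge(self, position, condition, amount):
        """Inverse of split: recombine the refinements into the parent."""
        for child in self._children(position, condition):
            assert self.balances[child] >= amount, "insufficient balance"
            self.balances[child] -= amount
        self.balances[position] += amount


# Nested split: positions of the form "if CarbonTax, then if Recession, ...".
wallet = ToyConditionalTokens(collateral=100.0)
wallet.split("$", "CarbonTax", 100.0)
wallet.split("CarbonTax", "Recession", 40.0)
print(dict(wallet.balances))
```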
30.9.3 Academic Implementations
Several academic projects have pushed the boundaries of combinatorial markets:
- LMSR-based combinatorial markets: Chen and Pennock (2007) showed how to implement LMSR over exponential state spaces when trades are restricted to certain structures.
- Combinatorial betting markets with permutation structures: Useful for predicting rankings (e.g., "Who will finish 1st, 2nd, 3rd?") where the state space is n! rather than 2^n (see the small sketch after this list).
- Bayesian market makers: Dimitrov and Sami (2008) connected combinatorial markets to Bayesian network inference, showing that certain market structures are equivalent to belief propagation.
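To make the n! point concrete, here is a minimal sketch (my own illustration, not the construction from the papers above): enumerate the 3! orderings of three competitors, run a plain LMSR over those six states, and read off a marginal such as "A finishes first" by summing state prices.

```python
import itertools

import numpy as np

competitors = ["A", "B", "C"]
states = list(itertools.permutations(competitors))   # 3! = 6 orderings
b = 10.0
q = np.zeros(len(states))                            # shares held per ordering


def lmsr_prices(q, b):
    z = np.exp((q - q.max()) / b)                    # numerically stable softmax
    return z / z.sum()


# A trader buys 5 shares of every ordering in which A finishes first.
for i, s in enumerate(states):
    if s[0] == "A":
        q[i] += 5.0

p = lmsr_prices(q, b)
p_a_first = sum(p[i] for i, s in enumerate(states) if s[0] == "A")
print(f"P(A finishes first) = {p_a_first:.3f}")      # above the uniform 1/3
```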
30.9.4 Corporate Experiments
Several large corporations have experimented with internal combinatorial prediction markets:
- Google: Used internal prediction markets for project timelines, combining questions about feature completion with resource allocation outcomes.
- Microsoft: Internal markets combined product launch timing with competitive response predictions.
- Intel: Used structured prediction markets for technology forecasting, combining questions about manufacturing yields with market demand.
These corporate experiments generally use simplified combinatorial structures (partitions or conditional markets) rather than full combinatorial enumeration, reflecting the practical trade-offs discussed throughout this chapter.
30.10 Computational Trade-offs
30.10.1 Accuracy vs. Speed
Every approximation method trades accuracy for speed. The question for a market designer is: how much accuracy is needed?
Consider the sources of error in a prediction market:
1. Approximation error: The gap between the approximate and exact price.
2. Trading noise: Prices fluctuate due to random trading activity.
3. Information error: Traders themselves may have inaccurate beliefs.
4. Model error: The underlying model (e.g., binary outcomes) may not match reality.
If trading noise alone causes prices to fluctuate by 5%, then an approximation error of 1% is negligible. The approximation does not need to be more accurate than the noise floor of the market.
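As a back-of-the-envelope check, assuming the error sources are independent and combine in quadrature (an idealization, not a property of any particular market):

```python
import math

trading_noise = 0.05     # typical price fluctuation from trading activity
approx_error = 0.01      # worst-case pricing approximation error
total = math.sqrt(trading_noise ** 2 + approx_error ** 2)
print(f"total error ≈ {total:.4f}")   # ≈ 0.051, barely above the 0.050 noise floor
```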
30.10.2 Latency Requirements
Market responsiveness affects trader behavior:
- < 100ms: Feels instant. Required for active trading.
- 100ms-1s: Noticeable but acceptable for casual traders.
- 1s-10s: Acceptable for infrequent updates or batch auctions.
- > 10s: Only suitable for offline analysis.
Full combinatorial LMSR with 15 events (32,768 states) can compute prices in under 1ms on modern hardware. With 20 events (about 1 million states), it takes around 10-50ms. Beyond 20 events, approximation is needed for sub-second responses.
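These figures are easy to sanity-check with a rough micro-benchmark like the one below; the timings it prints vary with hardware and are not a rigorous measurement.

```python
import time

import numpy as np


def lmsr_prices(q, b=100.0):
    z = np.exp((q - q.max()) / b)          # subtract the max for stability
    return z / z.sum()


rng = np.random.default_rng(0)
for n_events in (15, 20):
    q = rng.normal(scale=10.0, size=2 ** n_events)
    start = time.perf_counter()
    lmsr_prices(q)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{n_events} events ({2 ** n_events:,} states): {elapsed_ms:.2f} ms")
```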
30.10.3 Memory Constraints
The full state vector for n events requires (at 8 bytes per state):
- 15 events: 256 KB
- 20 events: 8 MB
- 25 events: 256 MB
- 30 events: 8 GB
Lazy evaluation reduces memory to O(k) where k is the number of distinct states that have been traded, which is typically much smaller than 2^n.
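One way to implement that idea is to store shares in a dictionary keyed by state index. Every untouched state carries zero shares, so it contributes exactly e^0 = 1 to the LMSR sum, and all untouched states can be folded into a single closed-form term. A minimal sketch (class and method names are my own, not a reference implementation):

```python
import math


class LazyLMSR:
    """LMSR over 2^n states with O(k) memory, k = number of traded states."""

    def __init__(self, n_events, b=100.0):
        self.n_states = 2 ** n_events
        self.b = b
        self.shares = {}                       # state index -> shares (sparse)

    def cost(self):
        traded = sum(math.exp(q / self.b) for q in self.shares.values())
        untouched = self.n_states - len(self.shares)   # each contributes e^0 = 1
        return self.b * math.log(traded + untouched)

    def price(self, state):
        q = self.shares.get(state, 0.0)
        traded = sum(math.exp(s / self.b) for s in self.shares.values())
        untouched = self.n_states - len(self.shares)
        return math.exp(q / self.b) / (traded + untouched)

    def buy(self, state, num_shares):
        old = self.cost()
        self.shares[state] = self.shares.get(state, 0.0) + num_shares
        return self.cost() - old


market = LazyLMSR(n_events=30, b=10.0)         # 2^30 states, O(1) memory so far
print(f"cost of trade: {market.buy(state=123_456_789, num_shares=250.0):.2f}")
print(f"new state price: {market.price(123_456_789):.4f}")
```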
30.10.4 When Approximation Is Good Enough
In practice, approximation is "good enough" when:
- The approximation error is smaller than the bid-ask spread. If the market maker charges a 2% spread, a 0.5% approximation error is invisible to traders.
- Relative rankings are preserved. Even if absolute probabilities are slightly off, a market that correctly identifies which events are more or less likely still serves its information-aggregation purpose.
- Conditional relationships are directionally correct. If the market correctly identifies that P(Y | X) > P(Y | NOT X), the exact values can be approximate and still useful for decision-making.
- Arbitrage opportunities remain detectable. If the approximation preserves enough structure for arbitrageurs to detect and correct inconsistencies, the market will self-correct over time.
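The first two criteria translate directly into automated checks. A small sketch follows; the threshold is a stand-in for half the market maker's quoted spread, and the numbers are illustrative.

```python
import numpy as np


def good_enough(exact, approx, half_spread=0.01):
    """Check: error inside the spread, and outcome ranking preserved."""
    exact, approx = np.asarray(exact), np.asarray(approx)
    within_spread = float(np.max(np.abs(exact - approx))) < half_spread
    same_ranking = np.array_equal(np.argsort(exact), np.argsort(approx))
    return within_spread, same_ranking


exact_prices = [0.42, 0.31, 0.17, 0.10]
approx_prices = [0.425, 0.305, 0.17, 0.10]
print(good_enough(exact_prices, approx_prices))   # (True, True)
```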
30.10.5 Hybrid Approaches in Practice
The most practical systems use hybrid approaches:
- Exact computation for the most actively traded event combinations (typically a small subset of the full state space).
- Partition-based decomposition for clearly independent event groups.
- Sampling-based approximation for rare queries about unusual combinations.
- Caching of recently computed prices to avoid redundant computation.
This hybrid approach mirrors how the human brain handles combinatorial reasoning: we think precisely about a few key scenarios and use rough heuristics for everything else.
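A concrete miniature of this dispatch logic is sketched below. Everything in it (the hot-event pair, the joint, the stand-in sampler) is invented for illustration; it is not a production design.

```python
from functools import lru_cache

import numpy as np

rng = np.random.default_rng(0)

# Exact joint over the two hottest events (A, B), ordered TT, TF, FT, FF.
HOT_JOINT_AB = np.array([0.30, 0.25, 0.20, 0.25])


@lru_cache(maxsize=4096)
def sampled_conjunction_price(events):
    """Cached Monte Carlo stand-in for cold queries: each cold event is
    treated as an independent fair coin purely to keep the sketch runnable."""
    draws = rng.random((20_000, len(events))) < 0.5
    return float(draws.all(axis=1).mean())


def conjunction_price(events):
    events = tuple(sorted(events))
    if events == ("A", "B"):
        return float(HOT_JOINT_AB[0])          # exact: enumerate the hot joint
    return sampled_conjunction_price(events)   # rare query: sample, then cache


print(conjunction_price({"A", "B"}))           # 0.30, exact
print(conjunction_price({"C", "D", "E"}))      # ~0.125, sampled and cached
```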
30.11 Advanced: Market Design for Causal Inference
30.11.1 From Correlation to Causation
One of the most ambitious applications of combinatorial prediction markets is causal inference: using market prices to identify causal relationships between events.
A standard prediction market reveals correlations: if P(Y | X) > P(Y | NOT X), we know that traders believe X and Y are positively associated. But does X cause Y, does Y cause X, or is there a common cause Z?
Combinatorial markets can help distinguish these cases by providing prices for multiple conditional distributions simultaneously. Consider three events: X, Y, Z. If we observe:
- P(Y | X) is high
- P(Y | X, Z) is similar to P(Y | NOT X, Z) (controlling for Z)
Then the association between X and Y may be "explained away" by the common cause Z. This is analogous to conditional independence testing in causal inference.
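A small numeric check of the "explained away" pattern, using a made-up joint in which Z is a common cause of X and Y, and Y is conditionally independent of X given Z:

```python
import itertools


def common_cause_joint(p_z=0.5, p_x_given_z=(0.8, 0.2), p_y_given_z=(0.9, 0.1)):
    """Joint over (x, y, z) when Z is a common cause of X and Y."""
    joint = {}
    for x, y, z in itertools.product([True, False], repeat=3):
        pz = p_z if z else 1 - p_z
        px = p_x_given_z[0] if z else p_x_given_z[1]
        px = px if x else 1 - px
        py = p_y_given_z[0] if z else p_y_given_z[1]
        py = py if y else 1 - py
        joint[(x, y, z)] = pz * px * py
    return joint


def cond(joint, target, given):
    """P(target | given), with target/given as dicts like {'y': True}."""
    idx = {"x": 0, "y": 1, "z": 2}

    def match(s, d):
        return all(s[idx[k]] == v for k, v in d.items())

    num = sum(p for s, p in joint.items() if match(s, {**given, **target}))
    den = sum(p for s, p in joint.items() if match(s, given))
    return num / den


joint = common_cause_joint()
print(cond(joint, {"y": True}, {"x": True}),             # 0.74: strong raw association
      cond(joint, {"y": True}, {"x": False}))            # 0.26
print(cond(joint, {"y": True}, {"x": True, "z": True}),  # 0.90: the gap vanishes
      cond(joint, {"y": True}, {"x": False, "z": True})) # 0.90  once Z is held fixed
```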
30.11.2 Decision Markets and Causal Structure
The connection between combinatorial markets and causal inference is deepest in the context of decision markets (which we will explore in depth in Chapter 31).
A decision market asks: "If we take action A, what will happen to outcome O?" This is inherently a causal question --- we want to know the effect of an intervention, not just a correlation.
The conditional market P(O | A) gives us the probability of O in worlds where A happens, but this might include worlds where A happens for various reasons (some of which independently affect O). A true causal estimate would want:
$$P(O | \text{do}(A))$$
where "do(A)" means we intervene to make A happen, breaking any causal arrows into A.
Hanson (2013) proposed that decision markets with the right structure can estimate do-calculus quantities, making prediction markets a tool for causal reasoning. The key insight is that if the market mechanism itself randomizes whether action A is taken (based on the market price), then the observed conditional probability becomes a causal estimate.
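A toy simulation of that insight (all parameters invented): an unobserved factor U makes the action more likely and independently raises the outcome, so the naive conditional contrast overstates the true +0.10 effect, while randomizing the action recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
TRUE_EFFECT = 0.10                       # taking the action raises P(O) by 10 points


def contrast(randomize_action):
    """Estimate P(O | A) - P(O | not A) from one simulated world."""
    u = rng.random(N) < 0.5              # hidden confounder
    if randomize_action:
        a = rng.random(N) < 0.5          # action assigned by coin flip
    else:
        a = rng.random(N) < np.where(u, 0.8, 0.2)   # U drives the action
    p_o = 0.2 + 0.4 * u + TRUE_EFFECT * a           # U also drives the outcome
    o = rng.random(N) < p_o
    return o[a].mean() - o[~a].mean()


print(f"observational contrast: {contrast(False):+.3f}")   # about +0.34, confounded
print(f"randomized contrast:    {contrast(True):+.3f}")    # about +0.10, the do(A) effect
```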
30.11.3 Limitations of Market-Based Causal Inference
Several limitations apply:
- Trader sophistication: Traders must understand the conditional structure and trade accordingly. If traders conflate correlation with causation, market prices will too.
- Manipulation incentives: In a decision market, traders may try to manipulate the price to influence the decision.
- Confounding: Without proper randomization, market prices reflect conditional probabilities, not causal effects.
- Sample size: Each market is effectively a single data point. Statistical inference requires multiple markets or repeated observations.
Despite these limitations, combinatorial markets remain one of the most promising approaches to crowdsourced causal reasoning.
30.12 Chapter Summary
This chapter has taken us through the theory, implementation, and practical challenges of combinatorial prediction markets. Let us recap the key ideas:
The Combinatorial Challenge: Real-world questions are interconnected. Markets that can price combinations, conjunctions, and conditional relationships provide richer information than isolated markets. But the exponential growth of state spaces (2^n for n binary events) makes naive approaches infeasible.
Full Combinatorial Markets work for small n (up to about 15-20 events) by enumerating all states and running a standard LMSR. They provide exact pricing for any Boolean expression.
Partition-Based Markets exploit independence between groups of events to decompose the full state space into smaller, manageable pieces. The savings can be dramatic (from 2^n to sum of 2^{k_i}).
Approximation Methods --- sampling, Bayesian networks, mean-field, and belief propagation --- make it possible to work with larger state spaces at the cost of some accuracy. The right method depends on the correlation structure and accuracy requirements.
LMSR Extensions (LMSR-C) maintain the desirable properties of the standard LMSR (bounded loss, no arbitrage, path independence) while using lazy evaluation to avoid instantiating the full state space. The maximum loss grows only linearly in n, not exponentially.
Bundle Trading enables traders to express complex multi-event beliefs in atomic transactions, preventing partial execution risks and enabling sophisticated strategies like spreads and hedges.
Conditional Markets are the most practically important special case, allowing questions of the form "if X, what happens to Y?" They connect to the conditional token framework used in real-world platforms.
Computational Trade-offs involve balancing accuracy, speed, memory, and user experience. Hybrid approaches that combine exact computation for common queries with approximation for rare ones are the most practical.
Causal Inference represents the frontier of combinatorial market design, connecting to decision markets and the do-calculus.
What's Next
In Chapter 31: Decision Markets, we will build directly on the conditional market framework introduced here to explore one of the most exciting applications of prediction markets: using them to make better decisions. Decision markets combine conditional predictions ("If we do X, what happens?") with a decision rule that selects the best action based on market prices. This creates fascinating incentive challenges and opens the door to "futarchy" --- governance by prediction markets.
The combinatorial machinery of this chapter is essential preparation. Decision markets are, at their core, conditional prediction markets where the condition is a decision and the outcome is a measure of success. Understanding how to efficiently price, approximate, and trade conditional predictions is the foundation on which decision markets are built.
Key Formulas Reference
| Formula | Description |
|---|---|
| $2^n$ | Number of states for n binary events |
| $C(\mathbf{q}) = b \ln \sum_i e^{q_i/b}$ | LMSR cost function |
| $p_i = e^{q_i/b} / \sum_j e^{q_j/b}$ | LMSR state price |
| $P(\phi) = \sum_{s:\phi(s)} p_s$ | Boolean expression price |
| $P(Y|X) = P(X,Y) / P(X)$ | Conditional probability |
| $L_{\max} = b \cdot n \cdot \ln 2$ | Maximum loss for LMSR-C |
Chapter 30 is part of Part V: Market Design & Mechanism Engineering in "Learning Prediction Markets --- From Concepts to Strategies."