> "A prediction market is only as good as its question. Ask the wrong question, or ask it the wrong way, and even the most liquid, well-incentivized market will produce garbage." --- Robin Hanson

In This Chapter

28.1 The Art and Science of Market Design
28.2 Question Design Principles
28.3 Resolution Criteria and Sources
28.4 Common Wording Pitfalls
28.5 Outcome Space Design
28.6 Market Lifecycle Management
28.7 Liquidity Seeding and Subsidization
28.8 Incentive Design
28.9 Quality Metrics for Market Design
28.10 Case Studies in Market Design
28.11 Advanced: Automated Market Creation
28.12 Chapter Summary
What's Next

Chapter 28: Principles of Prediction Market Design

Prediction markets aggregate beliefs into prices that serve as probability estimates. But this elegant mechanism depends critically on a foundation that is easy to overlook: the design of the market itself. A poorly designed market does not merely produce slightly degraded forecasts --- it can produce systematically biased estimates, discourage participation, invite manipulation, and erode trust in the entire platform. This chapter provides an exhaustive treatment of how to design prediction markets that actually work, from the wording of individual questions through resolution criteria, outcome space design, lifecycle management, liquidity seeding, incentive engineering, and quality measurement.

28.1 The Art and Science of Market Design

28.1.1 Why Design Matters

Consider two markets created on the same platform on the same day:

Market A: "Will AI be dangerous?"
- Closes: December 31, 2030
- Resolution: Market creator decides

Market B: "Will a frontier AI system (as defined by the EU AI Act, Article 6) cause direct financial losses exceeding $1 billion USD (adjusted to 2024 dollars using CPI-U) to a single organization, as reported in SEC filings or equivalent regulatory disclosures, before January 1, 2031?"
- Resolution source: SEC EDGAR filings, EU regulatory disclosures
- Backup resolution: If primary sources are unavailable, resolution defers to a 3-person arbitration panel selected from platform-approved arbitrators

Market A will attract confused trading, generate unresolvable disputes, and produce a price that means essentially nothing. Market B, while more verbose, produces a tradable claim with a clear probability interpretation. Every trader knows exactly what they are betting on, exactly how resolution will be determined, and exactly what evidence matters.

The difference between these two markets is not merely aesthetic --- it is the difference between an information aggregation mechanism that works and one that does not.

28.1.2 The Market Designer's Objectives

A market designer must balance multiple objectives simultaneously:

Accuracy. The primary purpose of a prediction market is to produce well-calibrated probability estimates. Every design decision should be evaluated against this criterion first. Accuracy requires that traders can clearly understand what they are betting on, that resolution is deterministic and trustworthy, and that the market attracts informed participants.

Participation. A market with zero traders produces no information. The question must be interesting enough to attract attention, clear enough that potential traders are not deterred by ambiguity risk, and structured so that diverse participants --- experts, generalists, and casual traders --- can all contribute.

Liquidity. Even with willing participants, thin markets produce noisy prices. Design choices affect liquidity directly: a market with five possible outcomes will have thinner liquidity per outcome than a binary market. The designer must consider how to concentrate trading activity where it produces the most information.

Fairness. Markets must not systematically advantage one group of traders over another through information asymmetries created by the market design itself (as opposed to legitimate informational advantages). Resolution criteria that depend on a market creator's subjective judgment, for example, create an inherent unfairness.

Efficiency. Platform resources are finite. Automated market creation, standardized resolution criteria, and reusable templates reduce the marginal cost of adding new markets and allow platforms to scale.

28.1.3 The Design Space

Market design encompasses decisions at multiple levels:

Level	Decisions	Examples
Question	Topic, framing, scope	What event? What time frame?
Outcome space	Binary, categorical, scalar	Yes/No? Which candidate? What value?
Resolution	Criteria, sources, edge cases	What data? Who decides? What if ambiguous?
Mechanics	Trading mechanism, fees, limits	AMM? Order book? What spread?
Lifecycle	Creation, seeding, closure, settlement	When to open? When to close?
Incentives	Subsidies, rewards, reputation	How much to seed? What bonuses?

This chapter addresses each of these levels systematically. We begin with the most fundamental: how to write the question itself.

28.2 Question Design Principles

28.2.1 The SMART Framework for Markets

The SMART framework, originally developed for goal-setting, adapts remarkably well to prediction market question design:

Specific: The question identifies a precise event, entity, or measurement
Measurable: The outcome can be determined by objective, quantifiable criteria
Achievable (Assessable): The question is resolvable --- the answer can actually be determined
Relevant: The question addresses something people care about and have information on
Time-bound: There is a clear deadline for resolution

Let us examine each dimension in detail.

28.2.2 Specificity

A specific question leaves no room for reasonable disagreement about what is being asked.

Bad: "Will the economy improve?" - Which economy? The US? Global? By what measure? Improve compared to what baseline? Over what period?

Better: "Will US real GDP growth (annualized, as reported in the BEA's advance estimate) exceed 2.0% for Q3 2026?"

Specificity requires identifying:
1. The subject: Who or what is this about? (US real GDP)
2. The predicate: What must happen? (exceed 2.0%)
3. The measurement: How is this determined? (BEA advance estimate, annualized)
4. The baseline: Compared to what? (implicit: quarter-over-quarter, annualized)
5. The time frame: When must this occur? (Q3 2026)

28.2.3 Measurability

A measurable question can be resolved by reference to objective data.

Bad: "Will GPT-5 be impressive?" - "Impressive" is subjective and unmeasurable.

Better: "Will GPT-5 (or its marketed successor to GPT-4) score above 90% on the MMLU benchmark, as reported in OpenAI's technical report or by independent evaluation published in a peer-reviewed venue, by December 31, 2026?"

Measurability requires:
- A specific metric or observable event
- A defined threshold or criterion
- An identified data source
- Agreement on what constitutes a valid measurement

28.2.4 Assessability (Resolvability)

Some questions, while specific and measurable in principle, cannot actually be resolved in practice.

Problematic: "Will China's actual (not reported) GDP growth exceed 4% in 2026?" - China's actual GDP is not directly observable; we only have official statistics.

Better: "Will China's officially reported GDP growth (National Bureau of Statistics of China) exceed 4% for calendar year 2026?"

The designer must ask: "At resolution time, will there be an unambiguous data source that all reasonable parties agree answers this question?" If not, the question needs revision.

28.2.5 Relevance

A question must attract informed traders. Relevance has two dimensions:

Interest relevance: Do people care about the answer? Markets on obscure topics will attract no trading volume and produce no useful signal. "Will the Svalbard Global Seed Vault receive a new deposit in Q3 2026?" might be perfectly well-designed but attract zero interest.

Information relevance: Do potential traders have useful private information to contribute? Markets are valuable precisely because they aggregate dispersed information. A question like "Will a fair coin land heads?" is perfectly specific and measurable but produces no information because no trader has an edge.

28.2.6 Time-Boundedness

Every market must have a clear resolution date or a clear resolution trigger.

Date-based: "Will X happen before January 1, 2027?"

Trigger-based: "Will X happen before Y happens?" (Note: this requires a backstop date in case Y never happens.)

Bad: "Will humans colonize Mars?" - No time bound. This market could remain open indefinitely.

Better: "Will at least 100 humans live continuously on Mars for at least 365 consecutive days, beginning before January 1, 2060?"

28.2.7 Examples: Good vs. Bad Questions

Bad Question	Problems	Improved Version
"Will there be a recession?"	No country, no definition, no time frame	"Will the NBER Business Cycle Dating Committee declare that a US recession began in 2026, with their declaration occurring before Dec 31, 2027?"
"Will Tesla stock go up?"	No magnitude, no time frame, no starting reference	"Will TSLA close above $300 on the Nasdaq on December 31, 2026?"
"Will the war in Ukraine end?"	"End" is undefined, no time frame	"Will a formal ceasefire agreement between Russia and Ukraine, acknowledged by both governments, be in effect on June 30, 2026?"
"Will AI replace programmers?"	Vague subject, vague predicate, no time frame	"Will BLS-reported employment in SOC code 15-1252 (Software Developers) decline by more than 10% from 2025 to 2028, as reported in the OEWS (formerly OES) survey?"
"Will Bitcoin moon?"	Informal language, no definition, no time frame	"Will the daily closing price of BTC/USD on Coinbase exceed $200,000 at any point before January 1, 2027?"

28.2.8 The Tension Between Precision and Participation

There is an inherent tension in question design. More precise questions are easier to resolve but harder to understand and may attract fewer participants. The ideal question finds a sweet spot: precise enough that resolution is unambiguous, but simple enough that a non-expert can understand what they are betting on.

One effective strategy is to have a clear, simple title with detailed resolution criteria in the description:

Title: "Will the US enter a recession in 2026?" Resolution criteria (in description): "Resolves YES if the National Bureau of Economic Research (NBER) Business Cycle Dating Committee officially declares that a recession began during any month of calendar year 2026. The declaration itself may occur after 2026. This market remains open until the earlier of: (a) an official NBER declaration covering 2026, or (b) December 31, 2028. If no declaration has been made by December 31, 2028, this market resolves NO."

28.3 Resolution Criteria and Sources

28.3.1 The Importance of Unambiguous Resolution

Resolution is the moment of truth for a prediction market. If traders do not trust that resolution will be fair and deterministic, the market price ceases to be a meaningful probability estimate. Rational traders will demand a risk premium for "resolution ambiguity risk," distorting prices away from true beliefs.

The ideal resolution criterion has these properties:
1. Deterministic: Given the same world-state, any reasonable person applying the criteria reaches the same resolution
2. Observable: The relevant information is publicly available
3. Timely: Resolution occurs within a reasonable time after the event
4. Authoritative: The data source is trusted by all parties
5. Robust: Edge cases are addressed in advance

28.3.2 Resolution Source Hierarchy

Not all data sources are equally reliable. A useful hierarchy:

Tier 1: Official government statistics and regulatory filings
- BLS employment data, BEA GDP estimates, SEC filings, election results certified by official bodies
- Highly reliable, rarely disputed, but may be revised (use "advance estimate" or "final estimate" specification)

Tier 2: Major institutional data
- WHO disease reports, IMF economic data, major academic studies
- Generally reliable but may have methodological disputes

Tier 3: Reputable media organizations
- Reuters, AP, major newspapers
- Good for event-based resolution but subject to correction and retraction

Tier 4: Domain-specific specialized sources
- Box Office Mojo for film revenue, Transfermarkt for sports transfers, GitHub for software releases
- Reliable within domain but may have coverage gaps

Tier 5: Platform or community determination
- Market creator judgment, community vote, arbitration panel
- Use only as last resort; subject to bias and manipulation

28.3.3 Handling Conflicting Sources

When authoritative sources disagree, or when the primary source is unavailable, the resolution criteria must specify precedence:

Resolution source priority:
1. Official BLS Employment Situation Summary (primary)
2. If BLS data is unavailable: Federal Reserve Economic Data (FRED) mirror
3. If both are unavailable: Market resolves N/A
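The priority list above can be sketched as a simple fallback lookup. This is a minimal illustration, not the chapter's implementation: the source keys and the `first_available` helper are hypothetical names, and a missing reading is represented as `None`.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical source keys, ordered from highest to lowest priority.
SOURCE_PRIORITY = ("bls_employment_situation", "fred_mirror")


def first_available(
    readings: Mapping[str, Optional[float]],
    priority: Sequence[str] = SOURCE_PRIORITY,
) -> Optional[float]:
    """Return the value from the highest-priority source that has data.

    Returns None when every source is unavailable, which the caller
    should treat as an N/A resolution.
    """
    for source in priority:
        value = readings.get(source)
        if value is not None:
            return value
    return None


# Primary source missing, so the FRED mirror value is used.
print(first_available({"bls_employment_situation": None, "fred_mirror": 4.1}))
# Both sources missing, so the market would resolve N/A.
print(first_available({}))
```

Keeping the priority order in data rather than in nested conditionals makes it easy to publish the same list in the market description.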

28.3.4 Backup Resolution and N/A

Markets must account for scenarios where resolution is impossible:

N/A Resolution (also called "void" or "annulled"): The market is cancelled, and all traders receive their money back (or shares are valued at their purchase price). This should be used when:
- The question becomes meaningless (e.g., "Will X become president?" when X dies)
- Resolution sources are permanently unavailable
- The question was found to contain a fundamental error

Delayed Resolution: Sometimes the answer is knowable but not yet available. The criteria should specify a maximum wait time:

If the BEA has not released the advance estimate for Q3 2026 GDP by
January 31, 2027, this market resolves N/A.

28.3.5 Edge Cases Checklist

Every market should be stress-tested against these edge cases:

What if the subject ceases to exist? (Company goes bankrupt, country dissolves, person dies)
What if the metric is discontinued? (Data source stops publishing)
What if the metric is redefined? (BLS changes methodology)
What if there is a partial outcome? (Half of the condition is met)
What if the event happens but is later reversed? (Record set, then disqualified)
What if the event happens outside the specified time frame by a technicality? (Announced on Dec 31, effective Jan 1)
What if there is reasonable disagreement about whether the criterion is met? (Does a "ceasefire" include an informal cessation of hostilities?)
What if the market itself influences the outcome? (Self-referential markets)

28.3.6 A Resolution Rules Engine (Python)

We can formalize resolution logic as a rules engine. See code/example-01-resolution-engine.py for the full implementation. The core idea:

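class ResolutionCriterion:
    """A single criterion that must be evaluated for market resolution."""

    def __init__(self, description, source, check_fn, backup_source=None):
        self.description = description      # human-readable statement of the criterion
        self.source = source                # key of the primary data source
        self.check_fn = check_fn            # maps source data to True, False, or None
        self.backup_source = backup_source  # optional key of a fallback source

    def _check(self, data, source):
        """Apply check_fn to one source; a missing source is indeterminate."""
        value = data.get(source)
        # Never pass None to check_fn, so it cannot crash on missing data.
        return None if value is None else self.check_fn(value)

    def evaluate(self, data):
        """Return True, False, or None (indeterminate)."""
        result = self._check(data, self.source)
        if result is None and self.backup_source is not None:
            # Fall back only when the primary source gave no answer.
            result = self._check(data, self.backup_source)
        return result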

The resolution engine chains multiple criteria and handles edge cases, conflicts, and N/A conditions programmatically. This is particularly valuable for platforms operating hundreds or thousands of markets simultaneously.
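One way the chaining might look is sketched below. It is an assumption-laden illustration rather than the contents of the referenced example file: criteria are plain callables, they are combined with AND semantics, and the function names and data keys (`resolve_market`, `completed_orbit`, `recovered_intact`) are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Each criterion maps the collected data to True, False, or None (indeterminate).
Criterion = Callable[[dict], Optional[bool]]


def resolve_market(criteria: Iterable[Criterion], data: dict) -> str:
    """Combine criteria with AND semantics into "YES", "NO", or "N/A"."""
    results = [criterion(data) for criterion in criteria]
    if not results:
        return "N/A"      # nothing to evaluate
    if any(result is False for result in results):
        return "NO"       # one failed criterion is enough to resolve NO
    if any(result is None for result in results):
        return "N/A"      # nothing failed, but something could not be checked
    return "YES"          # every criterion was affirmatively satisfied


def completed_orbit(data: dict) -> Optional[bool]:
    return data.get("completed_orbit")


def recovered_intact(data: dict) -> Optional[bool]:
    return data.get("recovered_intact")


rules = [completed_orbit, recovered_intact]
print(resolve_market(rules, {"completed_orbit": True, "recovered_intact": True}))
print(resolve_market(rules, {"completed_orbit": True}))
print(resolve_market(rules, {"completed_orbit": False}))
```

The ordering of the checks is a design choice: a definite failure outranks missing data, so a market is voided only when no criterion has already settled it.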

28.4 Common Wording Pitfalls

28.4.1 The Ambiguous "Or"

Natural language "or" is notoriously ambiguous between inclusive and exclusive interpretations.

Problematic: "Will Apple or Google release an AR headset in 2026?"

Does this mean:
- (a) Apple releases one, OR Google releases one, OR both? (inclusive or)
- (b) Exactly one of them releases one? (exclusive or)

Fix: Be explicit: "Will at least one of Apple or Google release an AR headset in 2026?" (inclusive) or create separate markets for each.
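The two readings differ in exactly one case, which a short truth table makes visible. The helper names below are illustrative only.

```python
from itertools import product


def inclusive_or(a: bool, b: bool) -> bool:
    """YES if at least one of the two events happens."""
    return a or b


def exclusive_or(a: bool, b: bool) -> bool:
    """YES only if exactly one of the two events happens."""
    return a != b


# Enumerate every combination of (Apple releases, Google releases).
for apple, google in product([False, True], repeat=2):
    print(apple, google, inclusive_or(apple, google), exclusive_or(apple, google))
```

Only the row where both companies release a headset resolves differently, and that is precisely the scenario traders would dispute.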

28.4.2 Scope Creep

Scope creep occurs when the meaning of key terms expands or contracts over time.

Problematic: "Will a self-driving car be available for purchase in 2026?"

What counts as "self-driving"? Level 3? Level 4? Level 5? What counts as "available for purchase"? In one city? Nationwide? In any country? What counts as a "car"? Does a self-driving shuttle count?

Fix: Pin down every term: "Will a Level 4 (SAE J3016) autonomous passenger vehicle be offered for retail sale to individual consumers (not fleet-only) in at least one US state, with deliveries beginning before January 1, 2027, as reported by the manufacturer or NHTSA?"

28.4.3 Moving Goalposts

Moving goalposts occur when the criteria for resolution can be reinterpreted after trading has occurred.

Problematic: "Will SpaceX's Starship be successful?" - After a test flight that reaches space but does not achieve orbital velocity, is that "successful"? SpaceX might call it successful; critics might not.

Fix: Define success with objective criteria before trading begins: "Will a SpaceX Starship vehicle complete a full orbital trajectory (apogee > 200 km, at least one complete orbit of Earth) and be recovered intact (landed on a landing pad or drone ship, or caught by the launch tower) before January 1, 2027?"

28.4.4 Self-Referential Markets

A self-referential market is one where the market's own price or existence influences the outcome.

Example: "Will this market's price be above 50% on the close date?"

This market has no external anchor; it has two self-fulfilling equilibria. If traders believe it will resolve YES, they buy, pushing the price above 50% and making it resolve YES. If they believe it will resolve NO, they sell, pushing it below 50% and making it resolve NO. Both outcomes are self-consistent, so the price is indeterminate and reflects only guesses about what other traders will do.

While self-referential markets are occasionally created as curiosities, they violate the fundamental purpose of prediction markets: to aggregate beliefs about external events.

28.4.5 Insider-Triggerable Markets

An insider-triggerable market is one where a small group of people can directly cause the outcome and profit from their foreknowledge.

Problematic: "Will Company X announce a product before Q3 2026?" - Company X employees know the announcement schedule and can trade on this with certainty.

This is not necessarily a fatal flaw: many legitimate markets have insiders (for example, some employees know a company's earnings before they are announced). The question is whether the market is primarily measuring insider knowledge or aggregating dispersed public information. Markets that can be triggered by a single person's decision are particularly problematic.

Mitigation strategies:
- Exclude known insiders from trading
- Focus on outcomes that no single party controls
- Ensure the market captures genuine uncertainty (even insiders are uncertain about election outcomes)

28.4.6 Gaming Through Semantic Interpretation

Sophisticated traders may exploit ambiguities in wording to create "sure thing" bets.

Example: "Will a human visit Mars before 2040?" - A trader could argue that "visit" includes a flyby without landing, or that "Mars" includes Phobos and Deimos.

Example: "Will unemployment fall below 4%?" - Fall below 4% at any point? As a monthly average? As an annual average? Seasonally adjusted?

Fix: For every key term, ask: "Is there any reasonable alternative interpretation that would change the resolution?" If yes, clarify.

28.4.7 Temporal Ambiguities

Time-related wording is a frequent source of disputes.

Problematic: "Will X happen in 2026?" - Does this mean "announced in 2026," "effective in 2026," "completed in 2026," or "begun in 2026"?

Problematic: "Will X happen by March?" - Does "by March" mean "before March 1," "before March 31," or "by the end of March"?

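Fix: Use explicit dates and specify the relevant time zone: "before 11:59 PM ET on March 31, 2026" or "at any point from January 1, 2026 00:00 UTC up to, but not including, January 1, 2027 00:00 UTC." Stating the end of the window as an excluded boundary avoids arguments about the final minute of the period.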
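A resolution window is easiest to check when it is stored as a pair of timezone-aware UTC timestamps with an excluded upper bound. The sketch below assumes a calendar-year 2026 window; the constant and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Half-open window [start, end): all of calendar year 2026 in UTC.
WINDOW_START = datetime(2026, 1, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2027, 1, 1, tzinfo=timezone.utc)


def in_window(event_time: datetime) -> bool:
    """Return True if a timezone-aware event time falls inside the window."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # Aware datetimes compare by absolute instant, whatever their offset.
    return WINDOW_START <= event_time < WINDOW_END


# An announcement at 8 PM on Dec 31 at a fixed UTC-5 offset is already
# Jan 1, 2027 in UTC, so it falls outside a UTC-defined window.
utc_minus_5 = timezone(timedelta(hours=-5))
late_event = datetime(2026, 12, 31, 20, 0, tzinfo=utc_minus_5)
print(in_window(late_event))
print(in_window(datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)))
```

Rejecting naive timestamps outright is deliberate: silently assuming a zone is how "announced on Dec 31, effective Jan 1" disputes arise.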

28.4.8 Negation and Double Negation

Negative framing creates confusion.

Problematic: "Will the Fed NOT raise rates in 2026?" - Resolves YES if nothing happens? This is counterintuitive.

Better: "Will the Federal Reserve maintain or lower the federal funds rate target throughout all of 2026?" or simply use the positive framing and let the price reflect the probability of the negative outcome (a price of 0.20 on "Will the Fed raise rates?" implies 0.80 probability of not raising rates).

28.4.9 Summary of Pitfalls and Fixes

Pitfall	Example	Fix
Ambiguous "or"	"A or B will happen"	"At least one of A or B"
Scope creep	"self-driving car"	Pin each term to a standard
Moving goalposts	"successful mission"	Define success criteria objectively
Self-referential	"This market's price..."	Avoid; use external events
Insider-triggerable	"Will CEO announce..."	Focus on uncontrollable outcomes
Semantic gaming	"visit Mars"	Define every key term
Temporal ambiguity	"in 2026"	Use explicit UTC timestamps
Negation	"Will NOT happen"	Use positive framing

28.5 Outcome Space Design

28.5.1 Binary Markets

The simplest and most common market structure. A binary market has exactly two outcomes: YES and NO.

Advantages:
- Maximum liquidity concentration (all trading in one market)
- Easy to understand
- Simple pricing: price = probability of YES
- Well-suited for AMMs

Best for: Questions with natural yes/no answers, threshold questions ("Will X exceed Y?"), event occurrence questions ("Will X happen?")

28.5.2 Multi-Outcome (Categorical) Markets

A market with three or more mutually exclusive outcomes.

Example: "Who will win the 2028 presidential election?" - Outcomes: [Democrat, Republican, Other/Independent, No election held]

Advantages:
- Captures richer information than multiple binary markets
- Prices across outcomes must sum to 1 (arbitrage-enforced)
- Reveals relative probabilities

Disadvantages:
- Liquidity is split across outcomes
- More complex pricing and trading
- Harder for casual participants to understand

Design rules for multi-outcome markets:
1. Outcomes must be mutually exclusive (no overlap)
2. Outcomes must be exhaustive (cover all possibilities)
3. Include an "Other" or "None of the above" outcome as a catch-all
4. Limit the number of outcomes (5-10 is usually the practical maximum for liquid markets)

28.5.3 Scalar (Continuous) Markets

A market that estimates a numerical value rather than a discrete outcome.

Example: "What will US GDP growth be in Q3 2026?" - Traders buy and sell a contract whose value at resolution equals the realized GDP growth percentage.

Advantages:
- Yields a direct point estimate: with a linear payoff, the contract price reflects the market's expected value of the outcome (recovering the full distribution requires brackets or several contracts)
- No need to choose thresholds

Disadvantages:
- More complex to implement and understand
- Requires careful specification of the payoff function
- Extreme outcomes create unbounded risk unless the range is capped

Design considerations for scalar markets:
- Range: Set minimum and maximum values that are extremely unlikely to be breached (e.g., GDP growth between -10% and +15%)
- Resolution value: What exact number will be used? (BEA advance estimate, not revised)
- Payoff function: Linear between bounds? Piecewise?
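A capped linear payoff is the simplest choice of payoff function. The sketch below assumes a contract that pays between 0 and 1 per share and uses the -10% to +15% range from the text; the function names are hypothetical.

```python
def scalar_payoff(value: float, lower: float, upper: float) -> float:
    """Payout per long share, in [0, 1], linear between the bounds."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    clamped = min(max(value, lower), upper)  # cap outcomes outside the range
    return (clamped - lower) / (upper - lower)


def implied_value(price: float, lower: float, upper: float) -> float:
    """Invert the linear payoff: the outcome value implied by a share price."""
    return lower + price * (upper - lower)


# GDP growth of 2.5% on a [-10%, +15%] range sits exactly halfway.
print(scalar_payoff(2.5, -10.0, 15.0))
# A realized value beyond the cap pays the maximum and no more.
print(scalar_payoff(20.0, -10.0, 15.0))
# A share price of 0.5 implies an expected growth rate of 2.5%.
print(implied_value(0.5, -10.0, 15.0))
```

Clamping is what bounds the risk on both sides: no realized value can push the payout outside the 0 to 1 range.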

28.5.4 Bracket Design for Continuous Outcomes

An alternative to scalar markets is to discretize a continuous outcome into brackets:

Example: "What will the unemployment rate be in December 2026?"
- Below 3.0%
- 3.0% to 3.4%
- 3.5% to 3.9%
- 4.0% to 4.4%
- 4.5% to 4.9%
- 5.0% to 5.4%
- 5.5% or above

Bracket design principles:
1. Equal probability brackets are often better than equal-width brackets. If the base rate for unemployment is around 4%, having five brackets between 3% and 5% is more informative than having equal 1-point brackets from 0% to 10%.
2. Overlap-free boundaries must use half-open intervals: [3.0%, 3.5%) means 3.0 is included, 3.5 is not.
3. Endpoint precision must match the resolution source's precision. If unemployment is reported to one decimal place, brackets should use one decimal place.
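Half-open brackets map cleanly onto a sorted list of boundaries, so assigning a reported value to its bracket reduces to a binary search. This sketch uses the unemployment brackets from the example; `BOUNDARIES`, `LABELS`, and `bracket_for` are hypothetical names.

```python
from bisect import bisect_right

# Interior boundaries in percent; brackets are half-open: [3.0, 3.5), [3.5, 4.0), ...
BOUNDARIES = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
LABELS = [
    "Below 3.0%",
    "3.0% to 3.4%",
    "3.5% to 3.9%",
    "4.0% to 4.4%",
    "4.5% to 4.9%",
    "5.0% to 5.4%",
    "5.5% or above",
]


def bracket_for(rate: float) -> str:
    """Return the bracket label for a rate reported to one decimal place."""
    rate = round(rate, 1)  # match the resolution source's precision
    # bisect_right counts the boundaries <= rate, which is the bracket index.
    return LABELS[bisect_right(BOUNDARIES, rate)]


print(bracket_for(3.0))
print(bracket_for(3.4))
print(bracket_for(3.5))
print(bracket_for(5.5))
```

Using `bisect_right` puts a value that equals a boundary into the upper bracket, which matches the "lower end included, upper end excluded" convention.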

28.5.5 Mutually Exclusive and Exhaustive Validation

For multi-outcome and bracket markets, validating that outcomes are mutually exclusive and exhaustive is critical. A formal approach:

Let $\Omega$ be the sample space of all possible outcomes. Let $O_1, O_2, \ldots, O_n$ be the defined outcomes.

Exhaustive: $\bigcup_{i=1}^{n} O_i = \Omega$ --- every possible outcome falls into at least one category.

Mutually exclusive: $O_i \cap O_j = \emptyset$ for all $i \neq j$ --- no outcome falls into more than one category.

For bracket markets with continuous outcomes, this becomes:

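Write each bracket as a half-open interval $O_i = [a_i, b_i)$, ordered so that $a_1 < a_2 < \ldots < a_n$. Then:

$$\text{Exhaustive: } a_1 = \text{lower\_bound}, \quad b_n = \text{upper\_bound}, \quad b_i \geq a_{i+1} \text{ for all } i < n \text{ (no gaps)}$$

$$\text{Mutually exclusive: } b_i \leq a_{i+1} \text{ for all } i < n \text{ (no overlaps)}$$

Both conditions hold together exactly when $b_i = a_{i+1}$ for every adjacent pair, with the same open/closed convention used at every boundary.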

See code/example-02-outcome-validator.py for an implementation that checks these properties.
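For readers without the example file at hand, a compact validator for bracket markets might look like the following. It is a sketch under stated assumptions, not the referenced implementation: brackets are half-open `(low, high)` pairs, and `validate_brackets` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [low, high)


def validate_brackets(brackets: List[Interval], lower: float, upper: float) -> List[str]:
    """Return a list of problems; an empty list means the brackets are valid."""
    if not brackets:
        return ["no brackets defined"]
    problems = []
    ordered = sorted(brackets)
    for low, high in ordered:
        if high <= low:
            problems.append(f"empty or inverted bracket [{low}, {high})")
    if ordered[0][0] != lower:
        problems.append("first bracket does not start at the lower bound")
    if ordered[-1][1] != upper:
        problems.append("last bracket does not end at the upper bound")
    # Adjacent brackets must meet exactly: no gap and no overlap.
    for (_, high), (next_low, _) in zip(ordered, ordered[1:]):
        if high < next_low:
            problems.append(f"gap between {high} and {next_low}")
        elif high > next_low:
            problems.append(f"overlap between {next_low} and {high}")
    return problems


inf = float("inf")
print(validate_brackets([(-inf, 3.0), (3.0, 4.0), (4.0, inf)], -inf, inf))
print(validate_brackets([(0, 3), (4, 10)], 0, 10))
print(validate_brackets([(0, 4), (3, 10)], 0, 10))
```

Returning a list of problems rather than a single boolean lets a market-creation form show every defect at once.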

28.5.6 Choosing the Right Structure

Question Type	Recommended Structure	Example
Will X happen?	Binary	"Will SpaceX land on Mars before 2030?"
Who/which X?	Multi-outcome	"Which party wins the UK general election?"
How much/many?	Scalar or bracket	"What will inflation be in 2026?"
When will X happen?	Bracket (time ranges)	"When will fusion reach net energy gain?"
Ranking	Series of binary or conditional	"Will A finish ahead of B?"

28.6 Market Lifecycle Management

28.6.1 The Market Lifecycle

Every prediction market passes through a sequence of phases:

[Creation] --> [Seeding] --> [Active Trading] --> [Approaching Resolution]
    --> [Resolution] --> [Settlement] --> [Archived]

Each phase has distinct design considerations.
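The lifecycle can be modeled as a small state machine so that a market never skips a phase. The sketch below assumes the strictly linear sequence shown in the diagram; a real platform may need extra transitions (for example, returning from a disputed resolution), and the names used here are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    CREATION = "creation"
    SEEDING = "seeding"
    ACTIVE_TRADING = "active_trading"
    APPROACHING_RESOLUTION = "approaching_resolution"
    RESOLUTION = "resolution"
    SETTLEMENT = "settlement"
    ARCHIVED = "archived"


# Enum members iterate in definition order, which is the lifecycle order.
ORDER = list(Phase)


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current` in the linear lifecycle."""
    index = ORDER.index(current)
    if index == len(ORDER) - 1:
        raise ValueError("an archived market has no further phase")
    return ORDER[index + 1]


phase = Phase.CREATION
while phase is not Phase.ARCHIVED:
    phase = next_phase(phase)
    print(phase.value)
```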

28.6.2 Creation Phase

When to create a market: Timing matters. A market created too early may languish without interest. A market created too late misses the period of maximum uncertainty and information aggregation.

Optimal creation timing depends on:
- Event predictability curve: Create markets when there is meaningful uncertainty. A market on who will win the Super Bowl is most valuable early in the NFL season, not the day before the game.
- Information availability: Create markets when there is enough public information for traders to form initial estimates, but before the outcome is highly predictable.
- Public interest curve: Align market creation with periods of public attention (news cycles, seasonal events).

Pre-creation checklist:
1. Is the question specific, measurable, assessable, relevant, and time-bound?
2. Are resolution criteria unambiguous?
3. Has the edge case checklist been reviewed?
4. Is the outcome space properly designed?
5. Are resolution sources identified and currently functional?
6. Is there a backup resolution mechanism?
7. Has the question been reviewed by someone other than the creator?
8. Is the resolution timeline realistic?

28.6.3 Seeding Phase

New markets start with zero liquidity and zero information. The seeding phase establishes initial conditions:

Initial liquidity provision: Using an AMM (Chapter 9), the market creator or platform provides initial liquidity by funding the AMM with tokens.
Initial price setting: The initial price should reflect a reasonable prior. Setting a 50/50 price is not always appropriate: if the base rate for the event is 5%, starting at 50% hands the first traders a near-certain profit for selling YES, and that profit is paid out of the subsidy.
Early trader incentives: Bonus rewards, reduced fees, or reputation points for early traders help bootstrap participation.

28.6.4 Active Trading Phase

During active trading, the market designer's role shifts to monitoring:

Liquidity adequacy: Are spreads too wide? Is depth too thin? Consider adding more liquidity.
Price responsiveness: Is the price updating in response to relevant news? If not, the market may lack informed traders.
Manipulation detection: Are there suspicious patterns? Wash trading? Large positions established before insider-triggerable events?
Resolution criteria validity: Has anything changed that affects the resolution criteria? (Source discontinued, metric redefined, etc.)

28.6.5 Approaching Resolution

As the resolution date nears, several things change:

Price convergence: Prices should converge toward 0 or 1 (for binary markets) as uncertainty resolves. If they do not, it may indicate resolution ambiguity.
Liquidity withdrawal: Market makers may withdraw liquidity as adverse-selection risk rises, since traders arriving close to resolution are more likely to already know the outcome. The platform may need to maintain AMM liquidity.
Last-minute trading: A spike in trading volume just before close may indicate informed trading or manipulation.

28.6.6 Resolution Phase

Resolution should be:
- Prompt: Executed as soon as the resolution criteria are satisfied
- Transparent: The resolution rationale and data sources should be published
- Appealable: There should be a brief window for disputes before settlement is final

Resolution process:
1. Resolution trigger occurs (date reached, event observed, data published)
2. Platform identifies the resolution data
3. Preliminary resolution is posted
4. Dispute window opens (e.g., 48 hours)
5. If no disputes (or disputes are resolved), resolution is finalized
6. Settlement occurs

28.6.7 Settlement Phase

Settlement is the final distribution of funds based on the resolution:

Winning positions are paid out
Losing positions are zeroed
N/A resolutions return funds at cost basis
Platform fees are collected
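The four settlement rules can be sketched for a binary market as follows. Several details are assumptions made for illustration: each winning share pays 1.0, the platform fee is charged on profit only, N/A refunds carry no fee, and the `Position` and `settle` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Position:
    trader: str
    side: str          # "YES" or "NO"
    shares: float      # each winning share pays 1.0
    cost_basis: float  # total amount paid for the shares


def settle(position: Position, resolution: str, fee_rate: float = 0.0) -> float:
    """Return the payout for one position given "YES", "NO", or "N/A"."""
    if resolution not in ("YES", "NO", "N/A"):
        raise ValueError(f"unknown resolution: {resolution}")
    if resolution == "N/A":
        return position.cost_basis            # refund at cost basis, no fee
    if position.side != resolution:
        return 0.0                            # losing positions are zeroed
    gross = position.shares * 1.0
    profit = max(0.0, gross - position.cost_basis)
    return gross - fee_rate * profit          # fee on profit only (assumption)


alice = Position(trader="alice", side="YES", shares=100.0, cost_basis=60.0)
for outcome in ("YES", "NO", "N/A"):
    print(outcome, settle(alice, outcome, fee_rate=0.02))
```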

28.6.8 Timing Considerations

Consideration	Recommendation
Market duration	Match to natural information cycles (quarterly for economic data, 2-4 years for elections)
Trading hours	24/7 for global markets; consider timezone effects
Resolution delay	Minimize, but allow for data publication lag
Dispute window	24-72 hours is standard
Settlement speed	Immediate after dispute window closes

28.7 Liquidity Seeding and Subsidization

28.7.1 The Cold Start Problem

New markets face a chicken-and-egg problem: traders do not want to trade in illiquid markets, but markets need traders to become liquid. This is the cold start problem, and subsidization is the primary solution.

28.7.2 AMM Seeding

The most common approach to liquidity seeding is to fund an Automated Market Maker (see Chapter 9 for AMM mechanics). The platform or market creator deposits initial funds into the AMM, which then provides liquidity to all traders.

Key decisions:
- Initial funding amount: More funding means tighter spreads and more liquidity, but higher cost. A typical range is $100-$10,000 per market depending on expected interest.
- Initial price (prior): The AMM's initial price should reflect the best available prior estimate. For a binary market, this is the initial probability estimate.
- AMM type: Logarithmic Market Scoring Rule (LMSR) is most common, but constant-product (Uniswap-style) AMMs are also used.
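For a binary LMSR market, funding and prior interact through the liquidity parameter, conventionally written $b$. The sketch below uses the standard LMSR price formula and shows one way to start the market at a chosen prior; the function names are hypothetical, and the initial quantities are only an offset in the cost function, so their negative sign carries no meaning.

```python
import math
from typing import Tuple


def lmsr_initial_quantities(prior_yes: float, b: float) -> Tuple[float, float]:
    """Share quantities (q_yes, q_no) whose LMSR price equals the prior."""
    if not 0.0 < prior_yes < 1.0:
        raise ValueError("prior_yes must be strictly between 0 and 1")
    return b * math.log(prior_yes), b * math.log(1.0 - prior_yes)


def lmsr_price_yes(q_yes: float, q_no: float, b: float) -> float:
    """Instantaneous YES price: exp(q_yes/b) / (exp(q_yes/b) + exp(q_no/b))."""
    m = max(q_yes, q_no)  # subtract the max for numerical stability
    e_yes = math.exp((q_yes - m) / b)
    e_no = math.exp((q_no - m) / b)
    return e_yes / (e_yes + e_no)


def lmsr_worst_case_loss(prior_yes: float, b: float) -> float:
    """Maximum subsidy at risk: b * ln(1 / smallest prior probability)."""
    return b * math.log(1.0 / min(prior_yes, 1.0 - prior_yes))


b = 100.0
q_yes, q_no = lmsr_initial_quantities(0.05, b)
print(lmsr_price_yes(q_yes, q_no, b))   # starts at the 5% prior
print(lmsr_worst_case_loss(0.05, b))    # subsidy at risk with a 5% prior
print(lmsr_worst_case_loss(0.50, b))    # subsidy at risk at 50/50 (b * ln 2)
```

Note the trade-off this exposes: for the same $b$, a confident prior puts more subsidy at risk if the unlikely outcome occurs, so the funding amount and the prior should be chosen together.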

28.7.3 Subsidy Budget Allocation

Platforms with limited budgets must allocate subsidies strategically across markets.

The value of subsidizing a market depends on:
- Expected information value: How valuable is an accurate forecast for this question?
- Expected participation: How many traders will the market attract?
- Marginal liquidity impact: How much does each dollar of subsidy improve market quality?

A simple model for subsidy allocation:

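$$V_i(s_i) = \alpha \cdot \text{InfoValue}_i + \beta \cdot \text{ExpParticipation}_i + \gamma \cdot \text{MarginalImpact}_i$$

Here $\alpha$, $\beta$, and $\gamma$ are weights chosen by the platform, and $V_i$ depends on the subsidy $s_i$ because expected participation and marginal impact both change with the amount of liquidity provided.

Allocate the budget $B$ across $n$ markets to maximize $\sum_{i=1}^{n} V_i(s_i)$ subject to $\sum_{i=1}^{n} s_i \leq B$, where $s_i \geq 0$ is the subsidy for market $i$.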
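Solving the optimization requires knowing how each market's value responds to subsidy, which is rarely available in advance. A rough heuristic, shown here as a sketch and not as a solution to the problem above, is to score each market with the weighted sum and split the budget in proportion to the scores. All names, weights, and input numbers below are hypothetical.

```python
from typing import Dict


def market_score(info_value: float, participation: float, marginal_impact: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted-sum priority score for one market."""
    return alpha * info_value + beta * participation + gamma * marginal_impact


def allocate_proportionally(budget: float, scores: Dict[str, float]) -> Dict[str, float]:
    """Split the budget across markets in proportion to their scores."""
    if any(score < 0 for score in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total == 0:
        return {market: 0.0 for market in scores}
    return {market: budget * score / total for market, score in scores.items()}


scores = {
    "gdp_q3": market_score(info_value=2.0, participation=0.5, marginal_impact=0.5),
    "seed_vault": market_score(info_value=0.5, participation=0.1, marginal_impact=0.4),
}
print(allocate_proportionally(1000.0, scores))
```

Proportional splitting always spends the whole budget and ignores diminishing returns, so it is best read as a starting point to be adjusted with observed data.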

28.7.4 Declining Subsidies

Initial subsidies can be gradually withdrawn as organic liquidity develops:

$$s(t) = s_0 \cdot e^{-\lambda t}$$

where $s_0$ is the initial subsidy, $\lambda$ is the decay rate, and $t$ is time since market creation. This provides strong initial liquidity that gracefully transitions to organic market-making.

Alternatively, the subsidy can be tied to organic activity, shrinking linearly as organic volume accumulates and reaching zero once cumulative organic volume exceeds $s_0 / k$:

$$s(t) = \max\left(0, s_0 - k \cdot V_{\text{organic}}(t)\right)$$

where $V_{\text{organic}}(t)$ is the cumulative organic trading volume and $k$ is a scaling factor.
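Both schedules are easy to compare side by side. The following is a minimal sketch of the two formulas; the function names and the helper that converts a half-life into $\lambda$ are conveniences introduced here, and the numbers are hypothetical.

```python
import math


def exponential_subsidy(s0, decay_rate, t):
    """Time-based schedule: s(t) = s0 * exp(-lambda * t)."""
    return s0 * math.exp(-decay_rate * t)


def volume_linked_subsidy(s0, k, organic_volume):
    """Volume-based schedule: s = max(0, s0 - k * V_organic)."""
    return max(0.0, s0 - k * organic_volume)


def decay_rate_for_half_life(half_life):
    """Lambda such that the subsidy halves every `half_life` time units."""
    return math.log(2) / half_life


# Hypothetical market: $1,000 initial subsidy, 30-day half-life
lam = decay_rate_for_half_life(30.0)
for day in (0, 30, 60, 90):
    print(day, round(exponential_subsidy(1000.0, lam, day), 2))

# With k = 0.1 the volume-linked subsidy is fully withdrawn at $10,000 of volume
for volume in (0, 4000, 10000, 15000):
    print(volume, round(volume_linked_subsidy(1000.0, 0.1, volume), 2))
```

The time-based schedule withdraws support whether or not traders have arrived, while the volume-linked schedule keeps the subsidy in place for as long as a market stays quiet.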

28.7.5 Cost-Effectiveness Analysis

The cost-effectiveness of subsidization can be measured as:

$$\text{CE} = \frac{\Delta \text{Accuracy}}{s}$$

where $\Delta \text{Accuracy}$ is the improvement in forecast accuracy attributable to the subsidy and $s$ is the subsidy amount.

Empirical studies (Atanasov et al., 2017) have found that even modest subsidies ($50-$500 per market) can significantly improve forecast accuracy by attracting 2-5x more participants.

See code/example-03-market-quality.py for a subsidy calculator implementation.

28.7.6 Alternative Seeding Strategies

Beyond AMM subsidization:

Designated market makers (DMMs): Pay specific traders to maintain quotes on both sides. Common in financial markets, less common in prediction markets.
Community seeding: Allow community members to collectively seed liquidity and share returns.
Cross-subsidization: Use revenue from high-volume markets to subsidize low-volume but high-value markets.
Loss-leader markets: Create highly popular markets (elections, sports) at a loss to attract users who then trade in other markets.

28.8 Incentive Design

28.8.1 Participation Incentives

Getting people to trade in prediction markets requires overcoming several barriers:
- Time cost: Understanding the question and forming a view takes effort
- Capital cost: Traders must risk money
- Uncertainty aversion: Many people dislike betting on uncertain outcomes
- Regulatory perception: Some view prediction markets as gambling

Participation incentives address these barriers:

Incentive	Mechanism	Effectiveness
Sign-up bonus	Free tokens for new users	High for initial adoption, low for retention
Trading rewards	Tokens or points per trade	Moderate; can incentivize churning
Accuracy rewards	Bonus for correct predictions	High; aligns with market purpose
Referral programs	Rewards for bringing new traders	Moderate; depends on network effects
Content creation	Rewards for creating good markets	High for platforms with user-created markets

28.8.2 Accuracy Incentives

Accuracy incentives reward traders for making correct predictions, beyond the profit from their trades:

Brier score leaderboards: Rank traders by their Brier score (which reflects both calibration and discrimination) across multiple markets
Accuracy bonuses: Bonus payouts for predictions within a certain range of the outcome
Reputation points: Non-monetary score that reflects prediction accuracy over time

The Brier score for a set of $n$ binary predictions:

$$BS = \frac{1}{n} \sum_{i=1}^{n} (f_i - o_i)^2$$

where $f_i$ is the predicted probability and $o_i \in \{0, 1\}$ is the outcome. Lower is better (0 is perfect).
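As a worked illustration, the sketch below computes the Brier score and uses it to order a small leaderboard. The trader names and forecasts are made up, and `rank_traders` is a hypothetical helper rather than part of any platform.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def rank_traders(history):
    """Sort traders by Brier score, best (lowest) first.

    `history` maps a trader name to a list of (forecast, outcome) pairs.
    """
    scored = []
    for name, pairs in history.items():
        forecasts = [f for f, _ in pairs]
        outcomes = [o for _, o in pairs]
        scored.append((name, brier_score(forecasts, outcomes)))
    return sorted(scored, key=lambda item: item[1])


# Made-up forecasts on three resolved binary markets (outcomes: 1, 0, 1)
history = {
    "confident_and_right": [(0.9, 1), (0.2, 0), (0.8, 1)],
    "always_fifty_fifty": [(0.5, 1), (0.5, 0), (0.5, 1)],
    "confident_and_wrong": [(0.1, 1), (0.9, 0), (0.3, 1)],
}
for name, score in rank_traders(history):
    print(f"{name}: {score:.3f}")
```

A forecaster who always says 50% scores exactly 0.25, which makes 0.25 a useful reference point: a leaderboard entry above it is doing worse than not forecasting at all.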

28.8.3 Liquidity Incentives

Market makers who provide liquidity bear adverse selection risk (informed traders trade against them). Platforms can compensate them through:

Fee rebates: Market makers pay lower fees or receive a portion of taker fees
Liquidity mining: Token rewards proportional to liquidity provided
Spread guarantees: Platform guarantees that market makers can withdraw within a certain spread

28.8.4 Reputation Systems

Reputation systems serve multiple purposes in prediction markets:

Signal quality: High-reputation traders' bets carry more information
Governance: Reputation-weighted voting for dispute resolution
Access: Certain markets or features are unlocked at higher reputation levels
Social proof: Leaderboards create aspirational goals

A reputation score might be calculated as:

$$R = w_1 \cdot \text{Accuracy} + w_2 \cdot \text{Volume} + w_3 \cdot \text{Consistency} + w_4 \cdot \text{Tenure}$$

where each component is normalized to [0, 1] and weights sum to 1.
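A minimal sketch of this weighted sum follows. The default weights are placeholders chosen only so that they sum to 1, and how each component is normalized (for example, how raw volume maps to [0, 1]) is left to the platform and not shown here.

```python
COMPONENTS = ("accuracy", "volume", "consistency", "tenure")

# Placeholder weights (assumption): accuracy dominates, tenure matters least
DEFAULT_WEIGHTS = {"accuracy": 0.5, "volume": 0.2, "consistency": 0.2, "tenure": 0.1}


def reputation_score(components, weights=None):
    """Weighted sum R of components that are already normalized to [0, 1]."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if abs(sum(weights[name] for name in COMPONENTS) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name in COMPONENTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(weights[name] * components[name] for name in COMPONENTS)


# Hypothetical trader: accurate, moderately active, fairly new
trader = {"accuracy": 0.8, "volume": 0.5, "consistency": 0.6, "tenure": 0.2}
print(round(reputation_score(trader), 2))
```

Because the components lie in [0, 1] and the weights sum to 1, the score itself stays in [0, 1], which keeps it comparable across traders and over time.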

28.8.5 Tournament Structures

Prediction tournaments (in the style of the IARPA ACE program or the Good Judgment Project) add competitive structure:

Fixed time periods: Monthly, quarterly, or annual tournaments
Standardized question sets: All participants forecast the same questions
Prize pools: Top forecasters win monetary or reputational prizes
Elimination rounds: Progressive tournaments where only top performers advance

Tournaments are particularly effective for:
- Identifying superforecasters
- Comparing forecasting methodologies
- Generating public interest and media coverage

28.8.6 Gamification Considerations

Gamification elements (badges, streaks, levels) can increase engagement but carry risks:

Benefits:
- Increased frequency of participation
- Broader appeal to casual users
- Clear progression path for new users

Risks:
- Encouraging quantity over quality (trading for streaks rather than information)
- Trivializing serious forecasting
- Creating addictive patterns that may attract regulatory scrutiny

The key principle: gamification should reward accurate forecasting and thoughtful participation, not just volume.

28.9 Quality Metrics for Market Design

28.9.1 Why Measure Market Quality?

Market quality metrics serve three purposes:
1. Evaluation: Which markets are performing well? Which need intervention?
2. Comparison: How does market A compare to market B? How does platform X compare to platform Y?
3. Optimization: What design changes improve market quality?

28.9.2 Participation Metrics

Unique traders: The number of distinct individuals who have traded in the market. More traders generally means better information aggregation.

Trading volume: Total value of trades executed. Higher volume indicates stronger interest and more information being incorporated.

Trader diversity: Do traders come from diverse backgrounds and information sources? A market dominated by a single trader or a homogeneous group may not aggregate diverse information.

$$\text{Herfindahl Index} = \sum_{i=1}^{n} s_i^2$$

where $s_i$ is trader $i$'s share of total volume. Lower values of the Herfindahl Index (HHI) indicate more diverse participation. A market where one trader accounts for 90% of volume ($HHI \geq 0.81$, and $0.82$ if a single second trader holds the remaining 10%) is very different from one where 20 traders each account for 5% ($HHI = 0.05$).
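The two cases in the text can be checked directly. The sketch below computes the index from raw per-trader volumes; the inverse of the index, often read as an "effective number" of equally sized traders, is added here as a convenience and is not part of the chapter's metric set.

```python
def herfindahl_index(volumes):
    """Sum of squared volume shares; ranges from 1/n (even split) to 1."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum((v / total) ** 2 for v in volumes)


def effective_traders(volumes):
    """Inverse HHI: the number of equal-sized traders with the same index."""
    return 1.0 / herfindahl_index(volumes)


concentrated = [90, 10]   # one trader with 90%, a second with 10%
diverse = [5] * 20        # twenty traders with 5% each

print(round(herfindahl_index(concentrated), 4), round(effective_traders(concentrated), 2))
print(round(herfindahl_index(diverse), 4), round(effective_traders(diverse), 2))
```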

28.9.3 Liquidity Metrics

Bid-ask spread: The difference between the best buy and sell prices. Tighter spreads mean lower transaction costs and more liquid markets.

$$\text{Spread} = \frac{p_{\text{ask}} - p_{\text{bid}}}{p_{\text{mid}}} \times 100\%$$

Depth: How much can be traded at or near the current price without significantly moving it? Measured as the dollar value available within X% of the midpoint.

Slippage: The difference between the expected price and the execution price for a given order size.

$$\text{Slippage}(q) = \left|\frac{p_{\text{executed}}(q) - p_{\text{mid}}}{p_{\text{mid}}}\right|$$
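These three liquidity metrics can all be read off an order book snapshot. The sketch below assumes a book represented as a list of `(price, size)` levels, which is a simplification chosen for illustration. For an AMM-based market the same quantities would come from the pricing function instead.

```python
def relative_spread(bid, ask):
    """Bid-ask spread as a percentage of the midpoint."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid * 100.0


def depth_within(levels, mid, pct):
    """Dollar value of resting orders priced within pct% of the midpoint."""
    band = mid * pct / 100.0
    return sum(price * size for price, size in levels if abs(price - mid) <= band)


def buy_slippage(asks, mid, quantity):
    """Relative gap between the average fill price and the midpoint
    for a market buy of `quantity` shares that walks up the ask side."""
    remaining = quantity
    cost = 0.0
    for price, size in sorted(asks):  # cheapest asks fill first
        fill = min(remaining, size)
        cost += fill * price
        remaining -= fill
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds available depth")
    return abs(cost / quantity - mid) / mid


# Hypothetical binary market quoted 0.48 / 0.52
bid, ask = 0.48, 0.52
mid = (bid + ask) / 2.0
asks = [(0.52, 100), (0.55, 200)]

print("spread %:", round(relative_spread(bid, ask), 2))
print("ask depth within 5%:", round(depth_within(asks, mid, 5.0), 2))
print("slippage for 200 shares:", round(buy_slippage(asks, mid, 200), 4))
```

Because the spread and slippage are both divided by the midpoint, the same absolute gap looks much larger for a contract trading near 0.05 than for one near 0.50, which is worth keeping in mind when comparing markets at very different price levels.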

28.9.4 Accuracy and Calibration Metrics

The ultimate test of a prediction market is whether its prices are good probability estimates.

Calibration: Among all markets whose final price was $p$, what fraction actually resolved YES? Perfect calibration means $\mathbb{E}[\text{outcome} | \text{price} = p] = p$.

Brier score: As defined in Section 28.8.2. Can be applied to individual markets or aggregated across a platform.

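Log score: $-\sum_{i} [o_i \ln(f_i) + (1-o_i) \ln(1-f_i)]$. With this sign convention, lower is better. It penalizes confident but wrong forecasts (probabilities near 0 or 1 on the wrong side) far more heavily than the Brier score does.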

Resolution time accuracy: How accurate was the market at different time horizons before resolution? A well-functioning market should become more accurate as resolution approaches.
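Calibration and the log score can both be computed from a list of final prices and resolved outcomes. In the sketch below, the equal-width binning and the small clipping constant `eps` (which keeps a forecast of exactly 0 or 1 from producing an infinite penalty) are implementation assumptions, not part of the definitions above.

```python
import math


def calibration_table(prices, outcomes, n_bins=10):
    """Group markets by final price and compare each bin's mean price with
    the fraction that resolved YES. Returns (mean_price, yes_rate, count)
    for every non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(prices, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, o))
    rows = []
    for members in bins:
        if members:
            mean_price = sum(p for p, _ in members) / len(members)
            yes_rate = sum(o for _, o in members) / len(members)
            rows.append((mean_price, yes_rate, len(members)))
    return rows


def log_score(forecasts, outcomes, eps=1e-12):
    """Summed negative log-likelihood of the outcomes; lower is better."""
    total = 0.0
    for f, o in zip(forecasts, outcomes):
        f = min(max(f, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= o * math.log(f) + (1 - o) * math.log(1.0 - f)
    return total


# Made-up final prices and resolutions for eight markets
prices = [0.05, 0.10, 0.30, 0.35, 0.60, 0.65, 0.90, 0.95]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]

for mean_price, yes_rate, count in calibration_table(prices, outcomes, n_bins=4):
    print(f"mean price {mean_price:.2f} -> resolved YES {yes_rate:.2f} (n={count})")
print("log score:", round(log_score(prices, outcomes), 3))
```

With only a handful of markets per bin, the observed YES rate is very noisy, so calibration tables are most informative when aggregated across many markets on a platform.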

28.9.5 Resolution Quality Metrics

Dispute rate: What fraction of markets have their resolution disputed? High dispute rates indicate ambiguous resolution criteria.

Resolution time: How long between the resolution trigger and final settlement? Faster is better.

N/A rate: What fraction of markets resolve as N/A? High N/A rates indicate poor question design.

28.9.6 Composite Quality Score

A composite score combining multiple metrics:

$$Q = \sum_{j} w_j \cdot \tilde{m}_j$$

where $\tilde{m}_j$ is the normalized (0-1) value of metric $j$ and $w_j$ is its weight. A reasonable default weighting:

Metric	Weight	Rationale
Accuracy (1 - Brier score)	0.30	Accuracy is primary
Participation (log unique traders)	0.20	Information aggregation needs traders
Liquidity (1/spread)	0.20	Tradability matters
Low dispute rate (1 - dispute rate)	0.15	Resolution quality
Low N/A rate (1 - N/A rate)	0.15	Question design quality
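A compact sketch of the weighted sum with these default weights is shown below. The table does not say how the log of unique traders or the inverse spread are mapped to the 0-1 range, so the normalizations here (a `trader_cap` at which participation saturates, and a `best_spread_pct` treated as full marks for liquidity) are assumptions made for illustration.

```python
import math

DEFAULT_WEIGHTS = {
    "accuracy": 0.30,
    "participation": 0.20,
    "liquidity": 0.20,
    "low_disputes": 0.15,
    "low_na": 0.15,
}


def clamp01(x):
    """Restrict a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def composite_quality(brier, unique_traders, spread_pct, dispute_rate, na_rate,
                      trader_cap=1000, best_spread_pct=1.0, weights=None):
    """Weighted sum Q of normalized metrics; returns a value in [0, 1]."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    metrics = {
        "accuracy": clamp01(1.0 - brier),
        # Assumed normalization: log scale, saturating at trader_cap traders
        "participation": clamp01(math.log1p(unique_traders) / math.log1p(trader_cap)),
        # Assumed normalization: best_spread_pct or tighter scores 1.0
        "liquidity": clamp01(best_spread_pct / spread_pct) if spread_pct > 0 else 1.0,
        "low_disputes": clamp01(1.0 - dispute_rate),
        "low_na": clamp01(1.0 - na_rate),
    }
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical platform-level numbers
q = composite_quality(brier=0.18, unique_traders=120, spread_pct=4.0,
                      dispute_rate=0.03, na_rate=0.05)
print(round(q, 3))
```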

See code/example-03-market-quality.py for a full implementation.

28.10 Case Studies in Market Design

28.10.1 Polymarket: Best Practices in Binary Market Design

Polymarket, one of the largest prediction market platforms, has developed sophisticated market design practices through iteration and experience.

What they do well:
- Standardized resolution sources: Most markets reference specific, named data sources (BLS reports, official election results, etc.)
- Clear resolution criteria: Resolution criteria are written in the market description with explicit conditions
- Edge case handling: Markets typically address major edge cases (what if the data is revised? what if the source is unavailable?)
- Appropriate time frames: Markets are created with resolution dates that match the natural information cycle

Example of strong Polymarket design:

Title: "Will US CPI exceed 3% for December 2025?"
Resolution: "This market resolves to Yes if the Bureau of Labor Statistics (BLS) Consumer Price Index for All Urban Consumers (CPI-U) 12-month percentage change exceeds 3.0% for the month of December 2025, as reported in the CPI news release. The relevant figure is the 'all items' 12-month percent change, not seasonally adjusted."

This market is specific (CPI-U, all items, not seasonally adjusted), measurable (exceeds 3.0%), assessable (BLS publishes this monthly), relevant (inflation is a major concern), and time-bound (December 2025).

28.10.2 Metaculus: Question Design Excellence

Metaculus specializes in long-range forecasting and has developed perhaps the most rigorous question design process in the prediction market space.

Key features:
- Community review: Questions go through a review process before opening
- Fine print: Extensive "fine print" sections address edge cases
- Resolution council: A council of community members can adjudicate disputes
- Question series: Related questions are grouped into series for systematic tracking
- Scoring rules: Uses a log-scoring rule that strongly incentivizes calibration

Metaculus question design template:
1. Question title: Clear, concise summary
2. Background: Context and motivation
3. Resolution criteria: Precise conditions for each possible resolution
4. Fine print: Edge cases, ambiguities, and exceptions
5. Resolution source: Primary and backup data sources
6. Close date: When forecasting ends
7. Resolution date: When the answer will be determined

28.10.3 Manifold Markets: Community Creation Model

Manifold Markets allows any user to create markets, which creates a natural experiment in market design.

What works:
- Rapid market creation: Anyone can create a market on any topic
- Diverse topics: Markets cover everything from geopolitics to personal bets
- Social features: Comments, reactions, and sharing increase engagement
- Play money: Lower stakes encourage experimentation

Challenges:
- Quality variance: User-created markets vary enormously in quality
- Resolution trust: Market creators resolve their own markets, creating potential conflicts
- Ambiguity: Many markets are poorly worded and generate resolution disputes
- Liquidity fragmentation: Thousands of markets with tiny volumes

Lessons for market design:
1. Lowering barriers to market creation massively increases quantity but reduces average quality
2. Community moderation can partially compensate for quality issues
3. Template-based market creation significantly improves quality
4. Reputation systems help identify reliable market creators

28.10.4 Lessons from Failures

The "Brexit" problem: Some early Brexit markets did not clearly specify what constituted "Brexit." Did it mean the referendum passing? Article 50 being triggered? The UK actually leaving the EU? The transition period ending? Markets with different implicit definitions showed different prices, causing confusion.

Lesson: Even seemingly simple questions ("Will X happen?") can have multiple reasonable interpretations of the triggering event.

The "COVID lab leak" problem: Markets on whether COVID-19 originated from a lab leak faced challenges because the resolution criterion (a definitive scientific or intelligence community consensus) may never be achievable. Markets that specified "will the scientific consensus as of 2025 support lab leak origin" were more tractable but still difficult because "scientific consensus" is subjective.

Lesson: Some questions are fundamentally difficult to resolve. Consider whether a clear resolution is achievable before creating the market.

The "Twitter/X" naming problem: Markets created about "Twitter" before the company rebranded to "X" faced ambiguity. Did "Twitter" refer to the company (now X Corp), the platform (now x.com), or the brand name (Twitter)?

Lesson: Entity references should be robust to name changes. Use legal entity names or identifiers where possible (e.g., "the company currently known as X Corp, formerly Twitter, Inc.").

28.11 Advanced: Automated Market Creation

28.11.1 The Scaling Challenge

Manually designing prediction markets is time-consuming and requires expertise. As the value of prediction markets becomes clearer, there is increasing demand for thousands or millions of markets covering diverse topics. Manual creation cannot scale to meet this demand.

28.11.2 Template-Based Generation

The simplest form of automated market creation uses templates:

Template: "Will {metric} {direction} {threshold} for {period}?"
Parameters:
  - metric: [US CPI, US GDP growth, unemployment rate, ...]
  - direction: [exceed, fall below]
  - threshold: [varies by metric]
  - period: [Q1 2026, Q2 2026, ..., December 2026, ...]
Resolution: "{source} {metric_name} for {period}"

This approach can generate hundreds of markets from a single template while maintaining consistent quality.
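To make the expansion concrete, here is a minimal sketch that fills the template for every combination of parameters. The metric catalogue, thresholds, and periods are placeholder values invented for the example, and `generate_markets` is a hypothetical helper. A production system would also validate each generated question before publishing it.

```python
from itertools import product

TITLE_TEMPLATE = "Will {metric} {direction} {threshold} for {period}?"
RESOLUTION_TEMPLATE = "{source} {metric_name} for {period}"

# Placeholder catalogue (assumption): thresholds are illustrative only
METRICS = {
    "US CPI": {
        "source": "BLS",
        "metric_name": "CPI-U 12-month percent change",
        "thresholds": ["3.0%", "3.5%"],
    },
    "the US unemployment rate": {
        "source": "BLS",
        "metric_name": "unemployment rate",
        "thresholds": ["4.5%"],
    },
}
DIRECTIONS = ["exceed", "fall below"]
PERIODS = ["March 2026", "June 2026"]


def generate_markets(metrics, directions, periods):
    """Expand the templates into one market per parameter combination."""
    markets = []
    for metric, spec in metrics.items():
        for threshold, direction, period in product(
                spec["thresholds"], directions, periods):
            markets.append({
                "title": TITLE_TEMPLATE.format(
                    metric=metric, direction=direction,
                    threshold=threshold, period=period),
                "resolution": RESOLUTION_TEMPLATE.format(
                    source=spec["source"],
                    metric_name=spec["metric_name"], period=period),
            })
    return markets


markets = generate_markets(METRICS, DIRECTIONS, PERIODS)
print(len(markets), "markets generated")
print(markets[0]["title"])
print(markets[0]["resolution"])
```

Keeping the resolution source and official metric name in the catalogue, rather than in free text, is what lets every generated market inherit the same resolution wording.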

28.11.3 News-Driven Market Generation

More sophisticated systems can generate markets from news events:

Monitor news feeds for events with uncertain outcomes
Extract entities and events using NLP
Generate candidate questions based on extracted information
Validate questions against quality criteria (specificity, measurability, etc.)
Select resolution sources from a database of approved sources
Review and publish (human-in-the-loop or fully automated)

28.11.4 LLM-Assisted Question Writing

Large Language Models can assist in question design:

Draft generation: Given a topic, generate candidate questions
Ambiguity detection: Identify potential ambiguities in human-written questions
Edge case generation: Enumerate edge cases that the designer may not have considered
Resolution criteria writing: Given a question, draft resolution criteria
Quality scoring: Evaluate questions against the SMART framework

However, LLMs should not be the final authority on question design. They can miss subtle ambiguities, may not understand platform-specific conventions, and their suggestions should always be reviewed by experienced human designers.

28.11.5 Auto-Resolution

For markets based on structured data sources (economic statistics, sports results, stock prices), resolution can be fully automated:

Data source integration: Connect to APIs for BLS, BEA, sports data providers, etc.
Resolution condition evaluation: Programmatically check whether criteria are met
Confidence checking: Verify data consistency (same result from multiple sources)
Dispute window: Automated resolution still goes through a human-reviewable dispute window
Settlement: Automatic fund distribution after dispute window closes
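The condition-evaluation and confidence-checking steps might look like the sketch below. It deliberately takes already-fetched values as input instead of calling any real data API, and every name in it (`Criterion`, `propose_resolution`, the source labels) is hypothetical. The result is only a proposed resolution, since the steps above still route it through a dispute window before settlement.

```python
import operator
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    YES = "yes"
    NO = "no"
    NEEDS_REVIEW = "needs_review"


@dataclass(frozen=True)
class Criterion:
    comparator: str   # "exceeds" or "falls_below"
    threshold: float


COMPARATORS = {"exceeds": operator.gt, "falls_below": operator.lt}


def propose_resolution(criterion, readings, min_sources=2, tolerance=1e-9):
    """Evaluate a threshold criterion against values from several sources.

    `readings` maps a source label to its reported value, or None when the
    source was unavailable. Anything short of agreement between at least
    `min_sources` sources is escalated to human review.
    """
    values = [v for v in readings.values() if v is not None]
    if len(values) < min_sources:
        return Resolution.NEEDS_REVIEW
    if max(values) - min(values) > tolerance:  # sources disagree
        return Resolution.NEEDS_REVIEW
    compare = COMPARATORS[criterion.comparator]
    return Resolution.YES if compare(values[0], criterion.threshold) else Resolution.NO


criterion = Criterion(comparator="exceeds", threshold=3.0)

# Made-up readings for illustration
print(propose_resolution(criterion, {"primary": 3.1, "mirror": 3.1}))
print(propose_resolution(criterion, {"primary": 3.0, "mirror": 3.0}))
print(propose_resolution(criterion, {"primary": 3.1, "mirror": 2.9}))
print(propose_resolution(criterion, {"primary": 3.1, "mirror": None}))
```

Note that "exceeds" is implemented as a strict comparison, so a reading of exactly 3.0 resolves NO. Whether the boundary counts is precisely the kind of edge case the resolution criteria should state explicitly.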

See code/example-01-resolution-engine.py for an auto-resolution prototype.

28.11.6 Challenges and Risks

Automated market creation introduces risks:

Quality control: Automated systems may create low-quality or nonsensical markets
Spam: Without controls, the platform could be flooded with worthless markets
Unintended consequences: Automated markets might touch on sensitive topics (assassinations, self-harm, etc.)
Resolution failures: Automated resolution may not handle edge cases correctly
Liability: Who is responsible when an automated market is poorly designed?

Mitigation strategies include human-in-the-loop review, content filtering, rate limiting, and graduated automation (start with templates, progress to more autonomous generation as confidence grows).

28.12 Chapter Summary

This chapter has covered the full spectrum of prediction market design, from the wording of individual questions through the mechanics of resolution, the structure of outcome spaces, the management of market lifecycles, and the measurement of market quality.

Core principles:

Clarity above all. Every market question must be specific, measurable, assessable, relevant, and time-bound. Ambiguity is the enemy of information aggregation.
Resolution is a contract. Resolution criteria define a contract between the platform and its traders. This contract must be explicit, objective, and robust to edge cases.
Design for the trader. Every design decision should consider how it affects the trader's experience: Can they understand the question? Can they trust the resolution? Can they trade efficiently?
Liquidity requires investment. Markets do not become liquid by themselves. Subsidization, market making, and incentive design are essential for bootstrapping and maintaining liquidity.
Measure and iterate. Market quality should be systematically measured and design should be continuously improved based on data.
Automate carefully. Automation can scale market creation but must be balanced with quality control and human oversight.

Key formulas and concepts:

Concept	Formula/Framework
Question quality	SMART framework
Outcome space validity	Mutually exclusive and exhaustive
Subsidy decay	$s(t) = s_0 \cdot e^{-\lambda t}$
Brier score	$BS = \frac{1}{n}\sum(f_i - o_i)^2$
Market quality	$Q = \sum w_j \cdot \tilde{m}_j$
Trader diversity	Herfindahl Index $= \sum s_i^2$
Spread	$\frac{p_{\text{ask}} - p_{\text{bid}}}{p_{\text{mid}}}$

What's Next

In Chapter 29: Mechanism Design for Information Elicitation, we dive deeper into the theoretical foundations of mechanism design as applied to prediction markets. While this chapter focused on the practical craft of designing individual markets, Chapter 29 explores the formal theory of incentive-compatible mechanisms --- scoring rules, proper elicitation, peer prediction, and the design of mechanisms that truthfully extract beliefs from participants even without observable ground truth. We will prove key theorems about strictly proper scoring rules and explore frontiers like wagering mechanisms for subjective beliefs.

Chapter 28 is part of Part V: Market Design & Mechanism Engineering. For the complete table of contents, see the book overview.