Case Study 1: Google's Internal Prediction Markets — A Decade of Corporate Forecasting
Overview
Google has operated one of the longest-running and most carefully studied internal prediction markets in corporate history. Beginning in 2005, Google's internal prediction markets provided forecasts on product launches, user adoption metrics, operational targets, and corporate strategy questions. This case study draws on published research by Bo Cowgill and Eric Zitzewitz, internal accounts, and industry analyses to provide a comprehensive picture of what worked, what failed, and what lessons generalize to other organizations.
Background
The Motivation
Google's culture of data-driven decision-making made it a natural candidate for internal prediction markets. By 2005, the company had grown to several thousand employees, creating the classic large-organization problem: critical information was distributed across hundreds of teams, and hierarchical reporting channels were too slow and filtered to aggregate it effectively.
Specific forecasting needs included:
- Product launch dates. Google's rapid product development cycle generated frequent launches, and accurately predicting launch timing was critical for marketing, partnerships, and resource allocation.
- User adoption. Would a new feature reach its target user count? Would a product achieve product-market fit?
- Revenue forecasts. Would specific product lines hit revenue targets?
- Competitive intelligence. When would competitors launch rival products?
- Operational metrics. Server capacity needs, data center expansion timelines, hiring targets.
Market Design
Google's prediction markets used a play-money system with prizes for top performers. This design choice was driven by several factors:
- Regulatory simplicity. Real-money prediction markets would have required compliance with federal and state gambling and securities laws, creating legal overhead that Google wanted to avoid.
- Broad participation. Play money eliminated the financial barrier to entry, encouraging participation from employees across all levels and functions.
- Cultural fit. Google's engineering culture valued intellectual competition. Leaderboards and bragging rights provided meaningful incentives.
Market mechanism. The markets used a continuous double auction with a market-maker bot that provided initial liquidity. Contracts were binary (YES/NO) paying 100 "Goobles" (Google's play-money currency) if the event occurred.
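The basic mechanics are easy to sketch in code. In the sketch below, the 100-Gooble payout and the 50 anchor come from the description above; the bot's actual quoting rule was never documented publicly, so its fixed two-sided quotes are an assumption:

```python
# Minimal sketch of a binary Gooble contract in a continuous double auction.
# Price priority only (no time priority) to keep the sketch short.
import heapq

PAYOUT = 100  # Goobles paid per YES share if the event resolves YES

class BinaryMarket:
    def __init__(self):
        self._bids = []   # max-heap via negated prices: (-price, trader, qty)
        self._asks = []   # min-heap: (price, trader, qty)
        self.trades = []  # (price, buyer, seller, qty)

    def submit(self, side, trader, price, qty):
        """Cross against the opposite book, then rest any unfilled quantity."""
        if side == "buy":
            while qty and self._asks and self._asks[0][0] <= price:
                ask, seller, aq = heapq.heappop(self._asks)
                fill = min(qty, aq)
                self.trades.append((ask, trader, seller, fill))
                qty -= fill
                if aq > fill:
                    heapq.heappush(self._asks, (ask, seller, aq - fill))
            if qty:
                heapq.heappush(self._bids, (-price, trader, qty))
        else:
            while qty and self._bids and -self._bids[0][0] >= price:
                nbid, buyer, bq = heapq.heappop(self._bids)
                fill = min(qty, bq)
                self.trades.append((-nbid, buyer, trader, fill))
                qty -= fill
                if bq > fill:
                    heapq.heappush(self._bids, (nbid, buyer, bq - fill))
            if qty:
                heapq.heappush(self._asks, (price, trader, qty))

    def last_price(self):
        return self.trades[-1][0] if self.trades else None

market = BinaryMarket()
# Market-maker bot seeds two-sided liquidity around the 50-Gooble anchor.
market.submit("buy", "mm-bot", 48, 100)
market.submit("sell", "mm-bot", 52, 100)
# A trader who believes the event is ~70% likely lifts the bot's offer.
market.submit("buy", "engineer", 70, 60)
print(market.last_price())  # -> 52, the bot's resting ask
```

Note that with resting market-maker quotes, early prices hover near 50 until traders push them away, which is exactly the anchoring effect discussed under Biases below.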
Incentive structure. Top-performing traders received quarterly prizes, typically Google merchandise, small gift cards, or recognition awards. The total prize pool was modest — perhaps a few thousand dollars per quarter — but the prestige of being a top forecaster was valued in Google's competitive culture.
Key Findings
Accuracy
The published research (Cowgill, 2009; Cowgill & Zitzewitz, 2015) documents several accuracy metrics:
Calibration. Markets were well-calibrated overall: events priced at 80% happened approximately 80% of the time. The calibration was not perfect — there was a slight tendency toward overconfidence at extreme probabilities — but it compared favorably to internal forecasts produced by project managers and analysts.
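Concretely, a calibration check buckets market prices and compares each bucket's realized frequency against its price. A minimal sketch with made-up data (not Google's):

```python
# Minimal calibration check: bucket market prices and compare each bucket's
# realized frequency against its price. The data below is illustrative only.
from collections import defaultdict

def calibration_table(prices, outcomes, decimals=1):
    """Group (price, outcome) pairs by rounded price; return hit rate and count."""
    bins = defaultdict(list)
    for p, o in zip(prices, outcomes):
        bins[round(p, decimals)].append(o)
    return {b: (sum(v) / len(v), len(v)) for b, v in sorted(bins.items())}

# A well-calibrated market puts ~80% of its 0.8-priced events in the YES column.
prices   = [0.82, 0.79, 0.81, 0.78, 0.80, 0.31, 0.28, 0.30]
resolved = [1,    1,    1,    1,    0,    0,    1,    0]
for price_bin, (freq, n) in calibration_table(prices, resolved).items():
    print(f"priced ~{price_bin:.1f}: happened {freq:.0%} of the time (n={n})")
```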
Relative accuracy. In head-to-head comparisons, prediction market forecasts outperformed official internal forecasts in the majority of cases, particularly for:
- Product launch dates (where markets detected slippage earlier)
- User adoption milestones (where markets were less susceptible to optimism bias)
- Cross-functional questions requiring diverse information
Information speed. Markets incorporated new information faster than hierarchical reporting. When a product team encountered a delay, market prices typically reflected the delay days or even weeks before the official project status update.
Biases
Despite generally good calibration, the research identified several systematic biases:
Optimism about Google. Employees tended to be optimistic about Google's prospects overall. Markets on Google's stock price performance, revenue growth, and competitive position were systematically optimistic relative to reality.
Optimism about own projects. Engineers who traded on their own projects' milestones showed a marked optimism bias. A developer whose progress data implied roughly a 60% chance of on-time delivery might nonetheless price it at 80%.
Tenure effect. Newer employees showed a stronger optimism bias than long-tenured ones. A plausible explanation is that recent hires, having just chosen to join Google, were especially enthusiastic about the company and its products.
Anchoring on initial prices. The market-maker bot set initial prices at 50%. This created an anchoring effect: for events that were clearly more or less likely than 50%, prices moved slowly away from the anchor, especially in the first few days of trading.
Participation Patterns
Who traded. Engineers were overrepresented relative to other functions (marketing, sales, operations, HR). This reflected both Google's engineering-heavy workforce composition and the natural affinity of engineers for quantitative prediction.
Trading frequency. A small fraction of participants (approximately 10-15%) accounted for the majority of trading activity. This "power law" of participation is consistent with patterns in external prediction markets.
Cross-functional trading. The most valuable trades came from participants who traded outside their area of expertise, bringing diverse information to bear. For example, a salesperson trading on a product launch date might incorporate information from customer conversations that the engineering team did not have.
Event-driven spikes. Trading activity increased sharply around relevant events: product reviews, all-hands meetings, news articles about competitors, and internal milestone announcements.
Organizational Impact
Information surfacing. The markets' most valuable function was surfacing "bad news" that might not travel well through normal reporting channels. When a project was in trouble, market prices dropped before official acknowledgment, giving leadership an early warning signal.
Decision influence. The degree to which market prices actually influenced decisions varied. In some cases, a low market price on a product's success prompted leadership to investigate and intervene. In other cases, the market signal was noted but not acted upon.
Cultural effects. The markets contributed to Google's culture of intellectual honesty and data-driven reasoning. Participants reported that the act of making quantitative predictions encouraged more rigorous thinking about uncertainty.
Challenges and Limitations
Thin Markets
Many markets attracted too few traders to generate meaningful prices. Markets on obscure internal metrics or distant future events were particularly illiquid. The minimum viable participation level appeared to be approximately 20-30 active traders per market.
Gaming and Manipulation
While outright manipulation was rare (the play-money stakes made it not worth the effort), several forms of gaming were observed:
- Strategic trading. Some participants used the market strategically — for example, buying "YES" on their own project to signal confidence to colleagues.
- Inside information. Project leads occasionally traded on decisions they had already made, profiting from predetermined outcomes. Because the stakes were play money this broke no law, but it undermined market integrity.
- Reputation management. Some traders avoided markets where they had inside information, not because of ethical concerns, but because they feared being identified as "the person who knew."
Organizational Resistance
Despite positive results, the prediction markets faced recurring organizational resistance:
- Managerial discomfort. Some managers were uncomfortable with the idea that a market could "know" more about their project's status than they did.
- Fear of commitment. Publicly expressed predictions (visible to colleagues) created accountability that some employees preferred to avoid.
- "Not my job." Without strong executive sponsorship during certain periods, participation waned as employees prioritized other work.
Sustainability
Google's prediction markets have undergone multiple iterations, expansions, and contractions over the years. The consistent challenge is maintaining engagement:
- New market launches generate excitement and high participation.
- Over time, participation declines as the novelty wears off.
- Refreshing the market with new questions and incentives is necessary but labor-intensive.
- Leadership changes can cause the market to lose its internal champion.
Quantitative Analysis
Brier Score Comparison
Using published data and estimates, we can compare the accuracy of Google's prediction markets to alternative forecasting methods:
| Method | Estimated Brier Score | Notes |
|---|---|---|
| Google prediction market | 0.18-0.22 | Varies by question type |
| Project manager forecasts | 0.24-0.30 | Consistently more optimistic |
| Historical base rates | 0.25 | Baseline comparator |
| Simple regression models | 0.20-0.25 | Using internal data |
| Constant 50% prediction | 0.25 | Theoretical baseline |
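For reference, the Brier score used in this table is the mean squared error between the forecast probability and the binary outcome, so lower is better, and an uninformative constant 50% forecast scores exactly 0.25. A quick sketch with illustrative numbers (not Google's):

```python
# Brier score: mean of (forecast - outcome)^2 over forecast/outcome pairs,
# with outcomes coded as 0 or 1. The data below is illustrative only.

def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and binary outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes     = [1, 0, 1, 1, 0]
market_px    = [0.75, 0.30, 0.80, 0.65, 0.20]
pm_forecasts = [0.90, 0.60, 0.95, 0.85, 0.50]  # consistently more optimistic
print(brier_score(market_px, outcomes))     # ~0.071
print(brier_score(pm_forecasts, outcomes))  # ~0.129: optimism is penalized
```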
The prediction market's advantage was most pronounced for:
- Questions with high information dispersion (many people hold different pieces of the answer)
- Questions where official forecasts were subject to political pressure
- Questions with clearly verifiable outcomes
Participation and Accuracy Correlation
Analysis of individual markets showed a strong positive correlation between participation level and forecast accuracy:
- Markets with 50+ active traders: Brier score ~0.17
- Markets with 20-50 active traders: Brier score ~0.21
- Markets with fewer than 20 active traders: Brier score ~0.28
This correlation suggests that adequate participation is a first-order driver of prediction market quality, at least as important in Google's data as the details of the market mechanism or incentive structure.
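A toy aggregation model shows why. Under the illustrative assumptions below (the averaging price rule and the noise level are assumptions, not Google's data), each trader observes the true probability plus independent noise, so adding traders averages the noise away:

```python
# Toy model of why participation improves accuracy: each trader sees the
# true probability plus independent noise, and the market price is
# approximated as the mean trader belief. All parameters are illustrative.
import random

def simulated_brier(n_traders, n_markets=2000, noise=0.25, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_markets):
        p_true = rng.random()
        beliefs = [min(1.0, max(0.0, rng.gauss(p_true, noise)))
                   for _ in range(n_traders)]
        price = sum(beliefs) / n_traders  # crude stand-in for price discovery
        outcome = 1 if rng.random() < p_true else 0
        total += (price - outcome) ** 2
    return total / n_markets

for n in (5, 20, 50, 200):
    print(n, "traders ->", round(simulated_brier(n), 3))
# Scores fall as traders are added, with diminishing returns: independent
# noise averages out roughly as 1/sqrt(n), echoing the tiers reported above.
```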
Lessons for Other Organizations
Lesson 1: Start Small, Demonstrate Value
Google began with a small number of clearly resolvable questions and expanded based on demonstrated accuracy. Organizations should resist the temptation to launch with dozens of markets — focus on 5-10 questions where the market can clearly outperform existing forecasts.
Lesson 2: Play Money Works (for Corporate Markets)
The play-money design was sufficient to generate accurate forecasts in a corporate setting. The competitive culture at Google provided adequate incentives without real money. However, organizations with less competitive cultures may need stronger incentives.
Lesson 3: Executive Sponsorship Is Essential
During periods with strong executive champions, the prediction markets thrived. During periods without them, participation declined. A prediction market needs a senior leader who regularly references market prices in decision-making.
Lesson 4: Cross-Functional Questions Add the Most Value
The highest-value markets were those that aggregated information from multiple departments. Markets on questions that only one team could answer (like an internal engineering deadline known only to that team) added less value.
Lesson 5: Design for Honest Bad News
The market's greatest organizational contribution was surfacing bad news early. Platform design should protect anonymity (so employees are not punished for revealing problems through trading) and focus on questions where early warning matters.
Lesson 6: Beware of Optimism Bias
Google's markets, like other corporate prediction markets studied since, tended to overestimate positive outcomes. Consider calibration adjustments or debiasing training for participants.
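One simple adjustment, sketched below, shifts prices by a constant in log-odds space. The shift value is hypothetical; in practice it would be fit to past market resolutions:

```python
# Debias a price by a constant shift in log-odds space. The shift value is
# hypothetical and would in practice be estimated from past resolutions.
import math

def debias(price, shift=-0.3):
    """Apply a log-odds shift; a negative shift corrects for optimism."""
    logit = math.log(price / (1.0 - price))
    return 1.0 / (1.0 + math.exp(-(logit + shift)))

print(round(debias(0.80), 2))  # an optimistic 0.80 becomes ~0.75
```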
Discussion Questions
- Why might play-money prediction markets work well at Google but potentially fail at a company with a different culture?
- How should Google handle the case of a project lead who trades on their own project based on information they have not yet shared through official channels?
- If Google's markets showed a product launch had only a 30% chance of meeting its deadline, should leadership intervene, and how?
- How do you balance transparency (everyone can see market prices) with the risk that pessimistic prices become self-fulfilling prophecies for employee morale?
- Could Google's prediction market model be adapted for distributed, remote-first organizations?
Computational Exercise
The chapter's code directory includes a simulation (code/case-study-code.py) that replicates the key dynamics of a Google-style corporate prediction market. The simulation includes:
- Heterogeneous participants with different information quality and biases
- Trading dynamics with information arrival and price discovery
- Comparison to analyst forecasts with optimism bias
- Analysis of how participation levels affect accuracy
Experiment with the simulation parameters to identify the minimum participation level needed for the market to outperform analyst forecasts.
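Before opening the full simulation, you can prototype its core comparison in a few lines. This sketch extends the toy aggregation model above with a better-informed but optimism-biased analyst; parameter names are illustrative and need not match those in code/case-study-code.py:

```python
# Condensed prototype of the exercise's comparison. The analyst is modeled
# as individually better informed (less noise) but optimistic; all
# parameter values are assumptions for illustration.
import random

def run_comparison(n_traders, trader_noise=0.3, analyst_noise=0.1,
                   optimism=0.15, n_markets=2000, seed=1):
    """Return (market Brier, analyst Brier) under this sketch's assumptions."""
    rng = random.Random(seed)
    clip = lambda x: min(1.0, max(0.0, x))
    mkt_err = analyst_err = 0.0
    for _ in range(n_markets):
        p_true = rng.random()
        price = sum(clip(rng.gauss(p_true, trader_noise))
                    for _ in range(n_traders)) / n_traders
        analyst = clip(rng.gauss(p_true + optimism, analyst_noise))
        outcome = 1 if rng.random() < p_true else 0
        mkt_err += (price - outcome) ** 2
        analyst_err += (analyst - outcome) ** 2
    return mkt_err / n_markets, analyst_err / n_markets

# Sweep participation to find where the market overtakes the biased analyst.
for n in (1, 2, 3, 5, 10, 20, 50):
    m, a = run_comparison(n)
    flag = "  <- market wins" if m < a else ""
    print(f"{n:3d} traders: market {m:.3f} vs analyst {a:.3f}{flag}")
```

With these settings the market typically overtakes the analyst at a handful of traders; vary the noise and bias parameters to see how the crossover point moves.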