Case Study 2: Corporate Prediction Markets at TechCorp

Executive Summary

TechCorp, a fictionalized midsize-to-large technology company (12,000 employees, $4B annual revenue), launched an internal prediction market in Q1 2024 to improve forecasting for product launches, feature adoption, and quarterly revenue. Over 18 months, 2,400 employees traded in 85 markets. This case study examines the data from TechCorp's internal markets, compares market predictions to actual outcomes and to official management forecasts, and explores the organizational dynamics that emerged.

By the end of this case study you will be able to:

  1. Understand how corporate prediction markets differ from public ones.
  2. Analyze internal market accuracy using calibration and error metrics.
  3. Compare market predictions with management forecasts.
  4. Identify common organizational challenges and design lessons.

Background

Why TechCorp Created an Internal Prediction Market

TechCorp's Chief Strategy Officer (CSO) noticed three problems with the company's existing forecasting process:

  1. Optimism bias: Product managers consistently predicted earlier launch dates and higher adoption than reality delivered. Only 30% of launches hit their original target dates.
  2. Information silos: Engineers knew about technical risks that managers did not surface. Sales teams had demand signals that product teams never received.
  3. Forecasting by hierarchy: In meetings, senior executives' opinions dominated, regardless of who actually had the best information.

After reading about Google's and HP's corporate prediction-market experiments, the CSO commissioned a 6-month pilot using a custom-built internal platform called ForecastHub.

ForecastHub Design

Feature          Detail
Currency         Play money ("ForecastCoins"); 1,000-coin starting balance
Incentive        Quarterly prizes for top-performing traders (gift cards, extra PTO)
Market types     Binary (Yes/No), scalar (range)
Market creation  Any employee can propose a market; a committee approves
Trading          Continuous double auction (limit order book)
Resolution       Determined by objective data sources (launch dashboards, revenue reports)
Anonymity        Trades anonymous; only platform admins can see identities
Participants     Open to all employees except C-suite (to avoid chilling effects)
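
The trading row above deserves a concrete illustration. ForecastHub's actual matching engine is not documented in this case study, so the following is a minimal sketch of a continuous double auction with a limit order book, using hypothetical Order and OrderBook classes: an incoming buy matches the lowest-priced resting sell at or below its limit, any unfilled remainder rests in the book, and trades execute at the resting order's price.

from dataclasses import dataclass

@dataclass
class Order:
    trader_id: int
    side: str      # "buy" or "sell"
    price: float   # limit price in [0, 1], i.e., an implied probability
    qty: int       # number of contracts

class OrderBook:
    """Minimal continuous double auction for a single binary market."""

    def __init__(self):
        self.bids: list[Order] = []  # resting buy orders
        self.asks: list[Order] = []  # resting sell orders

    def submit(self, order: Order) -> list[tuple[float, int]]:
        """Match an incoming limit order; return (price, qty) fills."""
        fills = []
        book = self.asks if order.side == "buy" else self.bids
        # An incoming buy crosses asks priced at or below its limit (and vice versa)
        crosses = ((lambda o: o.price <= order.price) if order.side == "buy"
                   else (lambda o: o.price >= order.price))
        # Price priority: best-priced resting order first
        book.sort(key=lambda o: o.price, reverse=(order.side == "sell"))
        while order.qty > 0 and book and crosses(book[0]):
            best = book[0]
            traded = min(order.qty, best.qty)
            fills.append((best.price, traded))  # trades execute at the resting price
            best.qty -= traded
            order.qty -= traded
            if best.qty == 0:
                book.pop(0)
        if order.qty > 0:  # any unfilled remainder rests in the book
            (self.bids if order.side == "buy" else self.asks).append(order)
        return fills


book = OrderBook()
book.submit(Order(trader_id=1, side="sell", price=0.40, qty=100))
print(book.submit(Order(trader_id=2, side="buy", price=0.45, qty=60)))  # [(0.4, 60)]

The last two lines show a resting sell at 0.40 partially filled by a buy limited at 0.45; the trade happens at the resting price, not the incoming limit.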

Data Dictionary

The synthetic dataset contains records for 85 resolved markets over 18 months.

Markets Table

Column          Type      Description
market_id       int       Unique market identifier
question        str       Market question text
category        str       Category: "launch_date", "adoption", "revenue", "other"
market_type     str       "binary" or "scalar"
open_date       datetime  Date the market opened for trading
close_date      datetime  Date trading closed (resolution date)
final_price     float     Last trading price before resolution
actual_outcome  float     Actual outcome (1/0 for binary, numerical for scalar)
mgmt_forecast   float     Official management forecast (probability or number)
num_traders     int       Number of unique traders
total_volume    int       Total coins traded
category_label  str       Human-readable category
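
Before running any analysis, it can be worth verifying that a loaded dataset actually matches this dictionary. The helper below is a hypothetical convenience (not part of ForecastHub) that checks column presence, coarse dtypes, and the 0/1 convention for binary outcomes against the DataFrame generated in Phase 1.

import pandas as pd

EXPECTED_COLUMNS = {
    "market_id": "integer", "question": "object", "category": "object",
    "market_type": "object", "open_date": "datetime", "close_date": "datetime",
    "final_price": "float", "actual_outcome": "float", "mgmt_forecast": "float",
    "num_traders": "integer", "total_volume": "integer", "category_label": "object",
}

def validate_markets_schema(df: pd.DataFrame) -> None:
    """Raise AssertionError if df deviates from the data dictionary above."""
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    assert not missing, f"Missing columns: {missing}"
    for col, kind in EXPECTED_COLUMNS.items():
        if kind == "integer":
            assert pd.api.types.is_integer_dtype(df[col]), f"{col} should be int"
        elif kind == "float":
            assert pd.api.types.is_float_dtype(df[col]), f"{col} should be float"
        elif kind == "datetime":
            assert pd.api.types.is_datetime64_any_dtype(df[col]), f"{col} should be datetime"
    # Binary outcomes must be exactly 0 or 1
    binary = df[df["market_type"] == "binary"]
    assert binary["actual_outcome"].isin([0.0, 1.0]).all(), "binary outcomes must be 0/1"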

Phase 1: Generating and Exploring the Data

1.1 Synthetic Data Generation

import numpy as np
import pandas as pd

def generate_techcorp_data(seed: int = 42) -> pd.DataFrame:
    """
    Generate synthetic corporate prediction-market data for TechCorp.

    Models 85 resolved markets across four categories with realistic
    patterns: management overconfidence, market accuracy, varying liquidity.

    Parameters
    ----------
    seed : int
        Random seed for reproducibility.

    Returns
    -------
    pd.DataFrame
        One row per resolved market.
    """
    rng = np.random.default_rng(seed)
    n_markets = 85

    categories = ["launch_date", "adoption", "revenue", "other"]
    category_labels = {
        "launch_date": "Product Launch Date",
        "adoption": "Feature Adoption",
        "revenue": "Quarterly Revenue",
        "other": "Miscellaneous",
    }
    category_weights = [0.30, 0.25, 0.25, 0.20]
    market_categories = rng.choice(categories, size=n_markets, p=category_weights)

    # Market type: launch_date and adoption are binary; revenue is scalar; other is mixed
    market_types = []
    for cat in market_categories:
        if cat in ("launch_date", "adoption"):
            market_types.append("binary")
        elif cat == "revenue":
            market_types.append("scalar")
        else:
            market_types.append(rng.choice(["binary", "scalar"]))

    # Generate "true" probabilities / values
    true_probs = []
    for cat, mtype in zip(market_categories, market_types):
        if mtype == "binary":
            if cat == "launch_date":
                # True probability of on-time launch (often low)
                true_probs.append(rng.beta(3, 5))  # mean ~0.375
            elif cat == "adoption":
                true_probs.append(rng.beta(4, 4))  # mean ~0.50
            else:
                true_probs.append(rng.beta(5, 5))  # mean ~0.50
        else:
            # For scalar, true_prob represents a normalized value (0-1 range)
            true_probs.append(rng.beta(5, 3))  # mean ~0.625

    true_probs = np.array(true_probs)

    # Market final prices: well-calibrated with some noise
    # Markets are accurate on average but noisy for individual markets
    market_noise = rng.normal(0, 0.08, n_markets)
    final_prices = np.clip(true_probs + market_noise, 0.05, 0.95)

    # Management forecasts: systematically biased upward (optimism bias)
    mgmt_bias = np.where(
        np.array(market_categories) == "launch_date",
        rng.uniform(0.10, 0.25, n_markets),   # Mgmt thinks launches will be on time
        rng.uniform(0.02, 0.12, n_markets),    # General optimism
    )
    mgmt_forecasts = np.clip(true_probs + mgmt_bias, 0.10, 0.95)

    # Actual outcomes: determined by true probabilities
    actual_outcomes = []
    for tp, mtype in zip(true_probs, market_types):
        if mtype == "binary":
            actual_outcomes.append(float(rng.random() < tp))
        else:
            # Scalar: actual is true_prob + small noise, scaled to a revenue range
            actual_outcomes.append(round(tp + rng.normal(0, 0.05), 4))

    # Number of traders and volume
    num_traders = rng.integers(15, 350, size=n_markets)
    total_volume = (num_traders * rng.integers(50, 300, size=n_markets)).astype(int)

    # Dates
    open_dates = pd.date_range("2024-01-15", periods=n_markets, freq="6D")
    durations = rng.integers(14, 120, size=n_markets)
    close_dates = open_dates + pd.to_timedelta(durations, unit="D")

    # Questions
    questions = []
    q_templates = {
        "launch_date": [
            "Will Project {name} launch by {date}?",
            "Will the {name} feature ship on schedule?",
            "Will {name} v2.0 be released in {quarter}?",
        ],
        "adoption": [
            "Will {name} exceed {n}K daily active users by {date}?",
            "Will {pct}% of enterprise clients adopt {name} by {quarter}?",
            "Will the {name} feature have >50% weekly retention?",
        ],
        "revenue": [
            "What will {division} revenue be in {quarter}? (range: $X-$YM)",
            "What will {name} product ARR be at end of {quarter}?",
            "Will total company revenue exceed ${n}M in {quarter}?",
        ],
        "other": [
            "Will the {name} partnership close by {date}?",
            "Will we hire {n}+ engineers in {quarter}?",
            "Will the {name} office renovation complete on time?",
        ],
    }
    project_names = [
        "Atlas", "Beacon", "Compass", "Delta", "Echo", "Falcon", "Griffin",
        "Horizon", "Ion", "Jupiter", "Keystone", "Lighthouse", "Mercury",
        "Nexus", "Orbit", "Pulse", "Quartz", "Relay", "Sigma", "Titan",
    ]
    for i, cat in enumerate(market_categories):
        template = rng.choice(q_templates[cat])
        question = template.format(
            name=project_names[i % len(project_names)],
            date="Q" + str(rng.integers(1, 5)) + " 2024",
            quarter="Q" + str(rng.integers(1, 5)) + " 2024",
            n=rng.integers(10, 500),
            pct=rng.integers(20, 80),
            division=rng.choice(["Cloud", "Enterprise", "Consumer", "Platform"]),
        )
        questions.append(question)

    return pd.DataFrame({
        "market_id": range(1, n_markets + 1),
        "question": questions,
        "category": market_categories,
        "market_type": market_types,
        "open_date": open_dates[:n_markets],
        "close_date": close_dates[:n_markets],
        "final_price": np.round(final_prices, 4),
        "actual_outcome": actual_outcomes,
        "mgmt_forecast": np.round(mgmt_forecasts, 4),
        "num_traders": num_traders,
        "total_volume": total_volume,
        "category_label": [category_labels[c] for c in market_categories],
    })


df = generate_techcorp_data()
print(f"Total markets: {len(df)}")
print(f"Categories: {df['category'].value_counts().to_dict()}")
print(f"Market types: {df['market_type'].value_counts().to_dict()}")
print(f"\nSample markets:")
print(df[["market_id", "question", "final_price", "mgmt_forecast", "actual_outcome"]].head(10).to_string(index=False))

1.2 Overview Statistics

def techcorp_overview(df: pd.DataFrame) -> None:
    """Print overview statistics for TechCorp prediction markets."""

    print("=" * 65)
    print("TECHCORP FORECASTHUB — OVERVIEW")
    print("=" * 65)

    print(f"\n  Total markets resolved:    {len(df)}")
    print(f"  Total unique traders:      ~{df['num_traders'].sum() // 10 * 10:,} (trade-level)")
    print(f"  Mean traders per market:   {df['num_traders'].mean():.0f}")
    print(f"  Total volume (coins):      {df['total_volume'].sum():,}")

    print("\n  Markets by category:")
    for cat, count in df["category_label"].value_counts().items():
        print(f"    {cat:30s} {count:3d}")

    binary = df[df["market_type"] == "binary"]
    print(f"\n  Binary markets:  {len(binary)}")
    print(f"  Scalar markets:  {len(df) - len(binary)}")

    # Binary accuracy: was the higher-probability outcome correct?
    binary_correct = (
        ((binary["final_price"] > 0.5) & (binary["actual_outcome"] == 1.0)) |
        ((binary["final_price"] <= 0.5) & (binary["actual_outcome"] == 0.0))
    ).sum()
    print(f"\n  Binary market direction accuracy: {binary_correct}/{len(binary)} "
          f"({binary_correct / len(binary):.0%})")


techcorp_overview(df)

Phase 2: Accuracy Analysis — Market vs. Management

2.1 Calibration Analysis (Binary Markets)

import matplotlib.pyplot as plt

def calibration_analysis(df: pd.DataFrame) -> None:
    """
    Assess calibration of market prices and management forecasts for
    binary markets.  A well-calibrated forecaster has events priced at X%
    occurring approximately X% of the time.
    """
    binary = df[df["market_type"] == "binary"].copy()

    n_bins = 5
    bins = np.linspace(0, 1, n_bins + 1)

    def compute_calibration(prices, outcomes):
        """Return (bin midpoint, observed frequency, count) for each non-empty bin."""
        midpoints, observed, counts = [], [], []
        for i in range(n_bins):
            if i == n_bins - 1:
                # Include the upper edge so a forecast of exactly 1.0 is not dropped
                mask = (prices >= bins[i]) & (prices <= bins[i + 1])
            else:
                mask = (prices >= bins[i]) & (prices < bins[i + 1])
            n = mask.sum()
            if n > 0:
                midpoints.append((bins[i] + bins[i + 1]) / 2)
                observed.append(outcomes[mask].mean())
                counts.append(n)
        return midpoints, observed, counts

    market_mid, market_obs, market_n = compute_calibration(
        binary["final_price"].values,
        binary["actual_outcome"].values,
    )
    mgmt_mid, mgmt_obs, mgmt_n = compute_calibration(
        binary["mgmt_forecast"].values,
        binary["actual_outcome"].values,
    )

    # Plot
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.plot([0, 1], [0, 1], "k--", alpha=0.5, label="Perfect calibration")
    ax.scatter(market_mid, market_obs, s=100, color="steelblue", zorder=5,
               label="Market prices")
    ax.scatter(mgmt_mid, mgmt_obs, s=100, color="coral", zorder=5, marker="s",
               label="Management forecasts")

    for x, y, n in zip(market_mid, market_obs, market_n):
        ax.annotate(f"n={n}", (x, y), textcoords="offset points",
                    xytext=(8, -8), fontsize=9, color="steelblue")
    for x, y, n in zip(mgmt_mid, mgmt_obs, mgmt_n):
        ax.annotate(f"n={n}", (x, y), textcoords="offset points",
                    xytext=(8, 8), fontsize=9, color="coral")

    ax.set_xlabel("Predicted Probability", fontsize=12)
    ax.set_ylabel("Observed Frequency", fontsize=12)
    ax.set_title("Calibration Plot: Market vs. Management\n(Binary Markets Only)", fontsize=14)
    ax.set_xlim(-0.05, 1.05)
    ax.set_ylim(-0.05, 1.05)
    ax.legend(fontsize=11)
    ax.set_aspect("equal")
    fig.tight_layout()
    plt.savefig("calibration_market_vs_mgmt.png", dpi=150, bbox_inches="tight")
    plt.show()

    # Brier scores
    market_brier = ((binary["final_price"] - binary["actual_outcome"]) ** 2).mean()
    mgmt_brier = ((binary["mgmt_forecast"] - binary["actual_outcome"]) ** 2).mean()
    print(f"\nBrier Scores (lower is better):")
    print(f"  Market:     {market_brier:.4f}")
    print(f"  Management: {mgmt_brier:.4f}")
    improvement = (mgmt_brier - market_brier) / mgmt_brier * 100
    print(f"  Market improvement: {improvement:.1f}%")


calibration_analysis(df)
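
The aggregate Brier score blends two different virtues: calibration and discrimination. The sketch below, a supplement to the original analysis, computes the standard Murphy decomposition (Brier = reliability - resolution + uncertainty) over the same five probability bins; because it bins forecasts, the identity holds only approximately.

def brier_decomposition(probs: np.ndarray, outcomes: np.ndarray, n_bins: int = 5):
    """
    Murphy decomposition of the Brier score: BS = reliability - resolution + uncertainty.
    Lower reliability = better calibrated; higher resolution = more discriminating.
    """
    bins = np.linspace(0, 1, n_bins + 1)
    base_rate = outcomes.mean()
    reliability = resolution = 0.0
    n_total = len(probs)
    for i in range(n_bins):
        if i == n_bins - 1:
            mask = (probs >= bins[i]) & (probs <= bins[i + 1])
        else:
            mask = (probs >= bins[i]) & (probs < bins[i + 1])
        n = mask.sum()
        if n == 0:
            continue
        f_k = probs[mask].mean()     # mean forecast in bin
        o_k = outcomes[mask].mean()  # observed frequency in bin
        reliability += n * (f_k - o_k) ** 2 / n_total
        resolution += n * (o_k - base_rate) ** 2 / n_total
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


binary = df[df["market_type"] == "binary"]
for name, col in [("Market", "final_price"), ("Management", "mgmt_forecast")]:
    rel, res, unc = brier_decomposition(binary[col].values, binary["actual_outcome"].values)
    print(f"{name:>10}: reliability={rel:.4f}  resolution={res:.4f}  "
          f"uncertainty={unc:.4f}  (BS ≈ {rel - res + unc:.4f})")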

2.2 Error Comparison by Category

def error_by_category(df: pd.DataFrame) -> None:
    """
    Compare mean absolute error of market vs. management by category.
    """
    df = df.copy()
    df["market_error"] = (df["final_price"] - df["actual_outcome"]).abs()
    df["mgmt_error"] = (df["mgmt_forecast"] - df["actual_outcome"]).abs()

    print("\n" + "=" * 65)
    print("MEAN ABSOLUTE ERROR BY CATEGORY")
    print("=" * 65)
    print(f"{'Category':<25} {'Market MAE':>12} {'Mgmt MAE':>12} {'Winner':>10}")
    print("-" * 65)

    for cat in df["category_label"].unique():
        subset = df[df["category_label"] == cat]
        m_err = subset["market_error"].mean()
        g_err = subset["mgmt_error"].mean()
        winner = "Market" if m_err < g_err else "Mgmt"
        print(f"{cat:<25} {m_err:>12.4f} {g_err:>12.4f} {winner:>10}")

    overall_m = df["market_error"].mean()
    overall_g = df["mgmt_error"].mean()
    print("-" * 65)
    print(f"{'OVERALL':<25} {overall_m:>12.4f} {overall_g:>12.4f} "
          f"{'Market' if overall_m < overall_g else 'Mgmt':>10}")

    # Bar chart
    categories = df["category_label"].unique()
    market_maes = [df[df["category_label"] == c]["market_error"].mean() for c in categories]
    mgmt_maes = [df[df["category_label"] == c]["mgmt_error"].mean() for c in categories]

    x = np.arange(len(categories))
    width = 0.35

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(x - width / 2, market_maes, width, label="Market", color="steelblue")
    ax.bar(x + width / 2, mgmt_maes, width, label="Management", color="coral")
    ax.set_xticks(x)
    ax.set_xticklabels(categories, rotation=15, ha="right")
    ax.set_ylabel("Mean Absolute Error", fontsize=12)
    ax.set_title("Forecasting Accuracy: Market vs. Management by Category", fontsize=14)
    ax.legend()
    fig.tight_layout()
    plt.savefig("error_by_category.png", dpi=150, bbox_inches="tight")
    plt.show()


error_by_category(df)
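
With only 85 markets, MAE differences can be noise. Before declaring a winner, it helps to attach uncertainty; the paired-bootstrap sketch below (an addition, not part of the original analysis) resamples markets with replacement and reports a 95% interval for management MAE minus market MAE.

def bootstrap_mae_difference(df: pd.DataFrame, n_boot: int = 10_000, seed: int = 0) -> None:
    """
    Paired bootstrap over markets for the difference in mean absolute error
    (management minus market). Positive values favor the market.
    """
    rng = np.random.default_rng(seed)
    market_err = (df["final_price"] - df["actual_outcome"]).abs().values
    mgmt_err = (df["mgmt_forecast"] - df["actual_outcome"]).abs().values
    n = len(df)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample markets with replacement
        diffs[b] = mgmt_err[idx].mean() - market_err[idx].mean()
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    print(f"Mgmt MAE - Market MAE: {mgmt_err.mean() - market_err.mean():+.4f} "
          f"(95% bootstrap CI: [{lo:+.4f}, {hi:+.4f}])")


bootstrap_mae_difference(df)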

2.3 Management Bias Analysis

def bias_analysis(df: pd.DataFrame) -> None:
    """
    Analyze systematic bias in management forecasts relative to actual outcomes.
    """
    df = df.copy()
    df["mgmt_bias"] = df["mgmt_forecast"] - df["actual_outcome"]
    df["market_bias"] = df["final_price"] - df["actual_outcome"]

    print("\n" + "=" * 65)
    print("BIAS ANALYSIS (positive = overconfident)")
    print("=" * 65)

    for cat in df["category_label"].unique():
        subset = df[df["category_label"] == cat]
        m_bias = subset["market_bias"].mean()
        g_bias = subset["mgmt_bias"].mean()
        print(f"\n{cat}:")
        print(f"  Market mean bias:     {m_bias:+.4f}")
        print(f"  Management mean bias: {g_bias:+.4f}")

    # Focus on launch dates
    launches = df[df["category"] == "launch_date"]
    if len(launches) > 0:
        print(f"\n--- LAUNCH DATE DEEP DIVE ---")
        print(f"  Markets that resolved 'Yes' (on-time): "
              f"{int(launches['actual_outcome'].sum())}/{len(launches)}")
        print(f"  Avg management forecast (P on-time):   "
              f"{launches['mgmt_forecast'].mean():.0%}")
        print(f"  Avg market price (P on-time):           "
              f"{launches['final_price'].mean():.0%}")
        print(f"  Actual on-time rate:                    "
              f"{launches['actual_outcome'].mean():.0%}")


bias_analysis(df)
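
Mean bias can hide how often each forecaster is actually closer to the truth market by market. A simple supplement is an exact sign test, sketched below using only the standard library; it assumes markets are independent, which is optimistic for markets about related projects.

from math import comb

def sign_test_market_vs_mgmt(df: pd.DataFrame) -> None:
    """Exact two-sided sign test: is the market closer to the outcome more often than chance?"""
    market_err = (df["final_price"] - df["actual_outcome"]).abs()
    mgmt_err = (df["mgmt_forecast"] - df["actual_outcome"]).abs()
    wins = int((market_err < mgmt_err).sum())
    ties = int((market_err == mgmt_err).sum())
    n = len(df) - ties  # drop ties, as is standard for the sign test
    k = max(wins, n - wins)
    # Two-sided p-value: 2 * P(X >= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    print(f"Market closer in {wins}/{n} non-tied markets (two-sided sign test p = {p_value:.4f})")


sign_test_market_vs_mgmt(df)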

Phase 3: Organizational Dynamics

3.1 Participation Patterns

def participation_analysis(df: pd.DataFrame) -> None:
    """Analyze participation patterns across categories and over time."""

    print("\n" + "=" * 65)
    print("PARTICIPATION ANALYSIS")
    print("=" * 65)

    for cat in df["category_label"].unique():
        subset = df[df["category_label"] == cat]
        print(f"\n{cat}:")
        print(f"  Mean traders/market:  {subset['num_traders'].mean():.0f}")
        print(f"  Mean volume/market:   {subset['total_volume'].mean():,.0f} coins")
        print(f"  Max traders:          {subset['num_traders'].max()}")

    # Participation over time
    df_sorted = df.sort_values("open_date")
    df_sorted["month"] = df_sorted["open_date"].dt.to_period("M")
    monthly = df_sorted.groupby("month").agg(
        markets=("market_id", "count"),
        mean_traders=("num_traders", "mean"),
        total_vol=("total_volume", "sum"),
    )
    print(f"\nMonthly participation trend:")
    print(monthly.to_string())


participation_analysis(df)

3.2 Trader-Count Effect on Accuracy

def trader_count_vs_accuracy(df: pd.DataFrame) -> None:
    """
    Analyze whether markets with more traders produce more accurate prices.
    """
    df = df.copy()
    df["market_error"] = (df["final_price"] - df["actual_outcome"]).abs()

    # Split into terciles by trader count
    df["trader_tercile"] = pd.qcut(df["num_traders"], 3, labels=["Low", "Medium", "High"])

    print("\n" + "=" * 65)
    print("ACCURACY BY TRADER COUNT")
    print("=" * 65)

    for tercile in ["Low", "Medium", "High"]:
        subset = df[df["trader_tercile"] == tercile]
        print(f"\n{tercile} trader count (n={len(subset)}):")
        print(f"  Trader range:   {subset['num_traders'].min()}–{subset['num_traders'].max()}")
        print(f"  Market MAE:     {subset['market_error'].mean():.4f}")
        print(f"  Mgmt MAE:       {(subset['mgmt_forecast'] - subset['actual_outcome']).abs().mean():.4f}")

    # Scatter
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(df["num_traders"], df["market_error"], alpha=0.5, color="steelblue")
    z = np.polyfit(df["num_traders"], df["market_error"], 1)
    p = np.poly1d(z)
    x_line = np.linspace(df["num_traders"].min(), df["num_traders"].max(), 100)
    ax.plot(x_line, p(x_line), "--", color="red", alpha=0.7)
    corr = df["num_traders"].corr(df["market_error"])
    ax.set_xlabel("Number of Traders", fontsize=12)
    ax.set_ylabel("Market Absolute Error", fontsize=12)
    ax.set_title(f"Does More Participation Improve Accuracy? (r={corr:.3f})", fontsize=14)
    fig.tight_layout()
    plt.savefig("traders_vs_accuracy.png", dpi=150, bbox_inches="tight")
    plt.show()


trader_count_vs_accuracy(df)
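
One caveat: the linear fit above can be dominated by a handful of very large markets. A quick rank-based robustness check (supplementary to the original analysis) is to recompute the correlation on ranks:

# Rank-based robustness check for the trader-count effect (supplementary)
df_check = df.copy()
df_check["market_error"] = (df_check["final_price"] - df_check["actual_outcome"]).abs()
spearman = df_check["num_traders"].corr(df_check["market_error"], method="spearman")
print(f"Spearman rank correlation (traders vs. error): {spearman:.3f}")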

Discussion: Organizational Lessons

What Worked

  1. The market surfaced hidden pessimism about launch dates. While management consistently forecast a 60–70% on-time probability, the market hovered around 35–45%, and the actual on-time rate was closer to the market's estimate. Engineers and QA staff, who knew about technical debt and test failures, traded accordingly.

  2. Anonymity was essential. In post-pilot surveys, 78% of traders said they would not have expressed their true beliefs in a meeting. The anonymous market gave them a safe channel to share pessimistic forecasts without career risk.

  3. Revenue markets added value. The market's revenue forecasts had lower error than management's, particularly for volatile quarters. Sales representatives with on-the-ground pipeline knowledge were active traders.

What Did Not Work

  1. Participation inequality. A small group of "power traders" (the top 5% by volume) accounted for over 40% of trading volume. While their forecasts were generally accurate, this concentration raised concerns about market representativeness.

  2. Gaming attempts. In two markets, participants allegedly coordinated to push prices in a direction that would influence management decisions (e.g., making a product cancellation look more likely in order to trigger additional resources). The platform team detected this through unusual volume patterns and voided the markets; a simple volume-anomaly sketch follows this list.

  3. Management resistance. Some middle managers viewed the markets as undermining their authority. When the market predicted that a pet project would miss its deadline (correctly), the responsible VP complained to the CSO that the market was "demoralizing the team."

  4. Low liquidity in niche markets. Markets on narrow topics (e.g., "Will the Austin office renovation complete on time?") attracted fewer than 20 traders and produced unreliable prices.
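
The volume-based detection mentioned in item 2 can be roughly illustrated. The source does not describe the platform team's actual logic; the sketch below simply flags markets whose coins-per-trader ratio is a robust z-score outlier, which would be a starting point rather than a complete surveillance system.

def flag_volume_anomalies(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """
    Flag markets with unusually high volume per trader using a robust z-score
    (median and MAD). Illustrative only; real surveillance would examine
    trade-level data, timing, and trader clustering.
    """
    vol_per_trader = df["total_volume"] / df["num_traders"]
    median = vol_per_trader.median()
    mad = (vol_per_trader - median).abs().median()
    robust_z = 0.6745 * (vol_per_trader - median) / mad  # 0.6745 ≈ MAD-to-sigma factor
    return df.loc[robust_z > threshold,
                  ["market_id", "question", "num_traders", "total_volume"]]


print(flag_volume_anomalies(df))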

Design Lessons

Lesson                       Recommendation
Anonymity is critical        Never reveal trader identities to managers
Incentives matter            Play-money markets need supplemental incentives (prizes, recognition)
Minimum liquidity threshold  Do not publish market prices as forecasts until at least 30 unique traders have participated
Topic selection              Focus on questions where information is dispersed across the organization
Gaming defense               Monitor for coordinated trading; establish a market-integrity committee
Management buy-in            Frame markets as complementary to existing processes, not replacements
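
The minimum-liquidity lesson is straightforward to operationalize. A minimal sketch, assuming the Phase 1 DataFrame and the 30-trader cutoff from the table:

MIN_TRADERS = 30  # illustrative threshold from the design lessons

publishable = df[df["num_traders"] >= MIN_TRADERS]
suppressed = df[df["num_traders"] < MIN_TRADERS]
print(f"Publishable as forecasts: {len(publishable)}/{len(df)} markets")
print(f"Suppressed (thin): {suppressed['market_id'].tolist()}")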

Discussion Questions

  1. TechCorp's market was more accurate than management for launch dates but the advantage was smaller for revenue forecasting. Why might this be? Consider the type of information each group has access to.

  2. Should TechCorp have used real money instead of play money? What are the pros and cons in a corporate setting?

  3. Two markets were voided due to suspected gaming. How should the platform team distinguish between legitimate trading (an informed employee betting heavily) and manipulation (a group coordinating to influence a decision)?

  4. If you were advising TechCorp's CSO, would you recommend expanding the program company-wide? What changes would you make to the design?

  5. A VP complained that the market "demoralized the team" by predicting a missed deadline. How should the CSO respond? Is there a way to get the forecasting benefit without the morale cost?

  6. How would you design a controlled experiment to rigorously test whether TechCorp's prediction markets improve decision-making (not just forecasting accuracy)?


Mini-Project Extension

Design Your Own Corporate Prediction Market

Write a 2-page proposal for implementing a prediction market at a real or fictional organization of your choice. Your proposal should include:

  1. Three specific markets you would launch, with full question text and resolution criteria.
  2. Participant design: Who participates? How are they incentivized? Is trading anonymous?
  3. Risk mitigation: How will you address gaming, low participation, and management resistance?
  4. Success metrics: How will you know after 6 months whether the program is working?
  5. Python prototype: A script that simulates 6 months of trading data for your proposed markets and produces a dashboard showing accuracy, participation, and calibration.

End of Case Study 2