Case Study 2: Behavioral Audit of Your Trading History

Overview

This case study provides a comprehensive framework for analyzing a personal trading history for behavioral biases. Rather than analyzing external market data, you will turn the lens inward — examining your own trades for patterns that reveal systematic errors in judgment. The goal is to identify which biases are costing you the most money and build a personalized debiasing plan.

We use a synthetic trading history to demonstrate the methodology, but the framework is designed to be applied to your real trading data.

Part 1: Preparing Your Trading Data

The first step is to export and organize your trading history. At minimum, you need the following fields for each trade:

Field            Description                                           Example
---------------  ----------------------------------------------------  ------------------------------------
date             Date and time of trade                                2024-03-15 14:30
contract_id      Unique identifier                                     polymarket_2024_election
contract_name    Human-readable name                                   "Biden wins 2024 election"
action           Buy or sell                                           BUY
price            Execution price                                       0.55
quantity         Number of shares                                      50
your_estimate    Your probability estimate at time of trade            0.65
confidence       Self-rated confidence (1-10)                          7
reasoning        Brief note on why you made the trade                  "Economic indicators favor incumbent"
outcome          1 if event occurred, 0 if not (filled at resolution)  1
resolution_date  When the contract resolved                            2024-11-05

If you have not been recording all these fields, that is itself a finding — a well-disciplined trader keeps detailed records of their reasoning and confidence for exactly this kind of analysis.
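If your platform exports CSV, a minimal loading sketch might look like the following. The column names match the table above; adjust them to whatever your export actually produces.

import csv
from datetime import datetime

def load_trades(path):
    """Load a trade-log CSV into the list-of-dicts format used throughout this case study."""
    trades = []
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            trades.append({
                'date': datetime.strptime(row['date'], '%Y-%m-%d %H:%M'),
                'contract_id': row['contract_id'],
                'action': row['action'].upper(),
                'price': float(row['price']),
                'quantity': int(row['quantity']),
                'your_estimate': float(row['your_estimate']),
                'confidence': int(row['confidence']),
                'outcome': int(row['outcome']),
            })
    return trades

The rest of the analysis assumes a list of dicts in this shape.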

import numpy as np
from datetime import datetime, timedelta

# Generate synthetic trading history for demonstration
np.random.seed(123)

n_trades = 300
contracts = [f"contract_{i:03d}" for i in range(80)]

trade_history = []
base_date = datetime(2024, 1, 1)

for i in range(n_trades):
    # Simulate a trade
    contract = np.random.choice(contracts)
    true_prob = np.random.beta(2, 2)

    # Trader's estimate has systematic biases:
    # - Overconfident (estimates too extreme)
    # - Anchored to round numbers
    # - Confirmation bias (doesn't update enough)
    estimate = true_prob + np.random.normal(0, 0.12)
    estimate = np.clip(estimate, 0.05, 0.95)

    # Push toward round numbers (anchoring)
    round_targets = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]
    nearest_round = min(round_targets, key=lambda x: abs(x - estimate))
    if abs(estimate - nearest_round) < 0.05:
        estimate = estimate * 0.6 + nearest_round * 0.4

    # Market price (somewhat efficient but noisy)
    market_price = true_prob + np.random.normal(0, 0.08)
    market_price = np.clip(market_price, 0.02, 0.98)

    # Trader buys if they think true prob > market price
    if estimate > market_price:
        action = 'BUY'
    else:
        action = 'SELL'

    # Confidence is inflated and uninformative:
    # it runs higher when the actual edge is smaller (an inverse relationship)
    edge = abs(estimate - market_price)
    confidence = min(10, max(1, int(7 - edge * 10 + np.random.normal(0, 1.5))))

    # Outcome
    outcome = 1 if np.random.uniform() < true_prob else 0

    # Position size (overconfident traders bet too much)
    position_size = max(5, int(confidence * 8 + np.random.normal(0, 10)))

    trade_date = base_date + timedelta(days=np.random.randint(0, 365))

    trade_history.append({
        'date': trade_date,
        'contract_id': contract,
        'action': action,
        'price': round(market_price, 3),
        'quantity': position_size,
        'your_estimate': round(estimate, 3),
        'confidence': confidence,
        'outcome': outcome,
        'true_prob': round(true_prob, 3),  # In real life you don't know this
    })

print(f"Generated {len(trade_history)} trades across {len(set(t['contract_id'] for t in trade_history))} contracts")

Part 2: Calibration Analysis

The first and most important test: are your probability estimates well-calibrated?

def calibration_analysis(trades, n_bins=10):
    """Analyze calibration of probability estimates."""
    estimates = np.array([t['your_estimate'] for t in trades])
    outcomes = np.array([t['outcome'] for t in trades])

    bin_edges = np.linspace(0, 1, n_bins + 1)

    print("CALIBRATION ANALYSIS")
    print("=" * 70)
    print(f"{'Bin':>12} {'N':>6} {'Avg Estimate':>14} {'Actual Rate':>14} {'Error':>10} {'Status':>12}")
    print("-" * 70)

    total_error = 0
    total_bins = 0

    for i in range(n_bins):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        if i == n_bins - 1:
            mask = (estimates >= lo) & (estimates <= hi)
        else:
            mask = (estimates >= lo) & (estimates < hi)

        n = mask.sum()
        if n < 5:
            continue

        avg_est = estimates[mask].mean()
        actual = outcomes[mask].mean()
        error = actual - avg_est

        # "Overconf" = estimates more extreme than reality (too high above
        # 0.5, too low below it); "Underconf" is the reverse.
        if abs(error) < 0.05:
            status = "OK"
        elif (avg_est >= 0.5 and error < 0) or (avg_est < 0.5 and error > 0):
            status = "Overconf"
        else:
            status = "Underconf"

        print(f"  [{lo:.1f}-{hi:.1f}] {n:6d} {avg_est:14.3f} {actual:14.3f} {error:10.3f} {status:>12}")

        total_error += abs(error)
        total_bins += 1

    mean_cal_error = total_error / max(total_bins, 1)
    print(f"\nMean Absolute Calibration Error: {mean_cal_error:.4f}")

    # Brier score
    brier = np.mean((estimates - outcomes) ** 2)
    # Reference Brier score (using base rate)
    base_rate = outcomes.mean()
    brier_ref = np.mean((base_rate - outcomes) ** 2)
    brier_skill = 1 - brier / brier_ref

    print(f"Brier Score: {brier:.4f}")
    print(f"Reference Brier Score: {brier_ref:.4f}")
    print(f"Brier Skill Score: {brier_skill:.4f}")

    if mean_cal_error > 0.08:
        print("\nVERDICT: SIGNIFICANT MISCALIBRATION DETECTED")
        print("Your probability estimates are systematically off.")
    elif mean_cal_error > 0.04:
        print("\nVERDICT: MODERATE MISCALIBRATION")
        print("Your estimates are reasonably calibrated but could improve.")
    else:
        print("\nVERDICT: WELL CALIBRATED")
        print("Your probability estimates closely match observed frequencies.")

    return mean_cal_error, brier

cal_error, brier = calibration_analysis(trade_history)
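A visual check often reveals miscalibration faster than the table. Here is a short reliability-curve sketch; it assumes matplotlib is installed, which the rest of this case study does not require:

import matplotlib.pyplot as plt

estimates = np.array([t['your_estimate'] for t in trade_history])
outcomes = np.array([t['outcome'] for t in trade_history])

bin_edges = np.linspace(0, 1, 11)
centers, rates = [], []
for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
    mask = (estimates >= lo) & (estimates < hi)
    if mask.sum() >= 5:   # skip sparse bins, as in calibration_analysis
        centers.append(estimates[mask].mean())
        rates.append(outcomes[mask].mean())

plt.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
plt.plot(centers, rates, 'o-', label='Your trades')
plt.xlabel('Your estimate')
plt.ylabel('Actual frequency')
plt.legend()
plt.show()

Points below the diagonal on the right half (and above it on the left half) indicate estimates that are too extreme.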

Part 3: Overconfidence Detection

def overconfidence_analysis(trades):
    """Detect overconfidence in trading behavior."""
    print("\nOVERCONFIDENCE ANALYSIS")
    print("=" * 70)

    estimates = np.array([t['your_estimate'] for t in trades])
    prices = np.array([t['price'] for t in trades])
    outcomes = np.array([t['outcome'] for t in trades])
    confidence = np.array([t['confidence'] for t in trades])
    sizes = np.array([t['quantity'] for t in trades])
    actions = [t['action'] for t in trades]

    # 1. Estimated edge vs actual edge
    estimated_edges = np.abs(estimates - prices)
    actual_pnl = np.array([
        (outcomes[i] - prices[i]) if actions[i] == 'BUY' else (prices[i] - outcomes[i])
        for i in range(len(trades))
    ])

    print(f"Average estimated edge: {estimated_edges.mean():.4f}")
    print(f"Average actual PnL per trade: {actual_pnl.mean():.4f}")
    print(f"Edge overestimation: {estimated_edges.mean() - max(actual_pnl.mean(), 0):.4f}")

    # 2. Confidence vs accuracy
    print(f"\nConfidence vs Accuracy:")
    for conf_level in range(1, 11):
        mask = confidence == conf_level
        if mask.sum() < 5:
            continue
        win_rate = (actual_pnl[mask] > 0).mean()
        avg_pnl = actual_pnl[mask].mean()
        avg_size = sizes[mask].mean()
        print(f"  Confidence {conf_level:2d}: Win rate {win_rate:.2%}, "
              f"Avg PnL {avg_pnl:+.4f}, Avg size {avg_size:.0f}, N={mask.sum()}")

    # 3. Position sizing relative to edge
    high_conf = confidence >= 7
    low_conf = confidence <= 4

    if high_conf.sum() > 0 and low_conf.sum() > 0:
        hc_win_rate = (actual_pnl[high_conf] > 0).mean()
        lc_win_rate = (actual_pnl[low_conf] > 0).mean()
        hc_avg_size = sizes[high_conf].mean()
        lc_avg_size = sizes[low_conf].mean()

        print(f"\nHigh confidence (>=7): Win rate {hc_win_rate:.2%}, Avg size {hc_avg_size:.0f}")
        print(f"Low confidence (<=4):  Win rate {lc_win_rate:.2%}, Avg size {lc_avg_size:.0f}")

        if hc_win_rate < lc_win_rate + 0.05 and hc_avg_size > lc_avg_size * 1.2:
            print("WARNING: You bet larger when confident, but confidence doesn't predict accuracy!")
            print("This is classic overconfidence — your self-assessed confidence is not informative.")

    # 4. Overall overconfidence score
    # Expected accuracy implied by your own estimates: your subjective
    # win probability is the estimate for BUYs and its complement for SELLs.
    is_buy = np.array([a == 'BUY' for a in actions])
    expected_accuracy = np.mean(np.where(is_buy, estimates, 1 - estimates))
    actual_accuracy = (actual_pnl > 0).mean()
    overconfidence_gap = expected_accuracy - actual_accuracy

    print(f"\nExpected accuracy (from your estimates): {expected_accuracy:.2%}")
    print(f"Actual accuracy: {actual_accuracy:.2%}")
    print(f"Overconfidence gap: {overconfidence_gap:.2%}")

    if overconfidence_gap > 0.10:
        print("VERDICT: SEVERELY OVERCONFIDENT")
    elif overconfidence_gap > 0.05:
        print("VERDICT: MODERATELY OVERCONFIDENT")
    elif overconfidence_gap > 0:
        print("VERDICT: SLIGHTLY OVERCONFIDENT")
    else:
        print("VERDICT: NOT OVERCONFIDENT (may be underconfident)")

    return overconfidence_gap

oc_gap = overconfidence_analysis(trade_history)
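A single number that summarizes the confidence-vs-accuracy table is the rank correlation between confidence and realized PnL. A numpy-only sketch follows; note that it breaks ties arbitrarily, whereas scipy.stats.spearmanr handles them properly if you have scipy available:

def rank_correlation(x, y):
    """Spearman-style rank correlation using numpy only (arbitrary tie-breaking)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

confidence = np.array([t['confidence'] for t in trade_history])
pnl = np.array([
    t['outcome'] - t['price'] if t['action'] == 'BUY' else t['price'] - t['outcome']
    for t in trade_history
])

rho = rank_correlation(confidence, pnl)
print(f"Rank correlation (confidence vs PnL): {rho:.3f}")
# Near zero or negative: your self-rated confidence carries no information.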

Part 4: Anchoring Detection

def anchoring_analysis(trades):
    """Detect anchoring to round numbers and market prices."""
    print("\nANCHORING ANALYSIS")
    print("=" * 70)

    estimates = np.array([t['your_estimate'] for t in trades])
    prices = np.array([t['price'] for t in trades])

    # 1. Round number anchoring
    round_numbers = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]

    distances_to_round = []
    for est in estimates:
        min_dist = min(abs(est - rn) for rn in round_numbers)
        distances_to_round.append(min_dist)

    distances_to_round = np.array(distances_to_round)

    # Compare to uniform distribution of distances
    # Under uniform distribution, expected distance to nearest round number
    # from the set above would be about 0.025
    expected_mean_distance = 0.025
    actual_mean_distance = distances_to_round.mean()

    print(f"Average distance of estimates to nearest round number: {actual_mean_distance:.4f}")
    print(f"Expected distance if unanchored: ~{expected_mean_distance:.4f}")

    if actual_mean_distance < expected_mean_distance * 0.7:
        print("DETECTED: Strong round-number anchoring")
        print("Your estimates cluster around round numbers more than expected.")
    elif actual_mean_distance < expected_mean_distance * 0.9:
        print("DETECTED: Mild round-number anchoring")
    else:
        print("No significant round-number anchoring detected.")

    # 2. Estimate distribution around key points
    print(f"\nEstimate frequency near round numbers:")
    for rn in [0.25, 0.50, 0.75]:
        near = np.abs(estimates - rn) < 0.03
        print(f"  Within 3% of {rn:.0%}: {near.sum()} trades ({near.mean():.1%} of all trades)")

    # 3. Anchoring to market price
    # If anchored, estimates should be pulled toward market price
    # The residual (estimate - true_prob) should correlate with (price - true_prob)
    if 'true_prob' in trades[0]:
        true_probs = np.array([t['true_prob'] for t in trades])
        estimate_error = estimates - true_probs
        price_error = prices - true_probs
        anchor_corr = np.corrcoef(estimate_error, price_error)[0, 1]
        print(f"\nCorrelation between estimate error and price error: {anchor_corr:.4f}")
        if anchor_corr > 0.3:
            print("DETECTED: Your estimates are anchored to market prices.")
            print("Your errors track the market's errors — you're not independent.")
        else:
            print("No significant market-price anchoring detected.")

anchoring_analysis(trade_history)
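The ~0.025 benchmark above is an analytic approximation. You can verify it, and get a rough p-value for your own clustering, with a quick Monte Carlo sketch; the uniform null on [0.05, 0.95] matches the clipping used in the synthetic data and is itself an assumption:

def simulated_mean_distance(n, targets, n_sims=2000, seed=0):
    """Mean distance to nearest round number for n uniform estimates on [0.05, 0.95]."""
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets)
    means = np.empty(n_sims)
    for s in range(n_sims):
        sims = rng.uniform(0.05, 0.95, size=n)
        means[s] = np.abs(sims[:, None] - targets[None, :]).min(axis=1).mean()
    return means

round_numbers = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]
estimates = np.array([t['your_estimate'] for t in trade_history])
observed = np.abs(estimates[:, None] - np.asarray(round_numbers)[None, :]).min(axis=1).mean()

null = simulated_mean_distance(len(estimates), round_numbers)
p_value = (null <= observed).mean()   # anchoring pulls the observed distance DOWN
print(f"Observed: {observed:.4f}, null mean: {null.mean():.4f}, p ~= {p_value:.3f}")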

Part 5: Disposition Effect Analysis

def disposition_analysis(trades):
    """Detect the disposition effect in trading behavior."""
    print("\nDISPOSITION EFFECT ANALYSIS")
    print("=" * 70)

    # Group trades by contract
    contract_trades = {}
    for t in sorted(trades, key=lambda x: x['date']):
        cid = t['contract_id']
        if cid not in contract_trades:
            contract_trades[cid] = []
        contract_trades[cid].append(t)

    # For contracts with multiple trades, analyze hold times and
    # whether gains were realized faster than losses
    gain_hold_times = []
    loss_hold_times = []
    gain_counts = 0
    loss_counts = 0

    for cid, ctrades in contract_trades.items():
        if len(ctrades) < 2:
            continue

        # Find buy-sell pairs
        buys = [t for t in ctrades if t['action'] == 'BUY']
        sells = [t for t in ctrades if t['action'] == 'SELL']

        for buy in buys:
            for sell in sells:
                if sell['date'] > buy['date']:
                    hold_time = (sell['date'] - buy['date']).days
                    pnl = sell['price'] - buy['price']

                    if pnl > 0:
                        gain_hold_times.append(hold_time)
                        gain_counts += 1
                    elif pnl < 0:
                        loss_hold_times.append(hold_time)
                        loss_counts += 1
                    break  # Match first sell after buy

    # Also analyze based on outcome
    outcomes = np.array([t['outcome'] for t in trades])
    prices = np.array([t['price'] for t in trades])
    actions = [t['action'] for t in trades]

    winning_trade_sizes = []
    losing_trade_sizes = []

    for i, t in enumerate(trades):
        if t['action'] == 'BUY':
            pnl = outcomes[i] - prices[i]
        else:
            pnl = prices[i] - outcomes[i]

        if pnl > 0:
            winning_trade_sizes.append(t['quantity'])
        else:
            losing_trade_sizes.append(t['quantity'])

    avg_win_size = np.mean(winning_trade_sizes) if winning_trade_sizes else 0
    avg_loss_size = np.mean(losing_trade_sizes) if losing_trade_sizes else 0

    print(f"Average position size (winning trades): {avg_win_size:.1f}")
    print(f"Average position size (losing trades): {avg_loss_size:.1f}")

    if avg_loss_size > avg_win_size * 1.1:
        print("WARNING: You hold larger positions in losing trades.")
        print("This suggests reluctance to close losing positions (disposition effect).")

    if gain_hold_times and loss_hold_times:
        avg_gain_hold = np.mean(gain_hold_times)
        avg_loss_hold = np.mean(loss_hold_times)
        print(f"\nAverage hold time (gains): {avg_gain_hold:.1f} days")
        print(f"Average hold time (losses): {avg_loss_hold:.1f} days")

        if avg_loss_hold > avg_gain_hold * 1.2:
            print("DETECTED: Disposition effect — you hold losers longer than winners.")
        else:
            print("No clear disposition effect detected in hold times.")
    else:
        print("\nInsufficient matched buy-sell pairs for hold time analysis.")

    # Loss aversion estimate
    if winning_trade_sizes and losing_trade_sizes:
        realized_wins = np.array(winning_trade_sizes)
        realized_losses = np.array(losing_trade_sizes)
        ratio = realized_losses.mean() / max(realized_wins.mean(), 1)
        print(f"\nImplied loss aversion ratio: {ratio:.2f}")
        print(f"(Prospect theory predicts ~2.25; ratio > 1.5 suggests loss aversion)")

disposition_analysis(trade_history)
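The buy-sell pairing above is deliberately simple: each buy is matched to the first later sell, and the same sell can be matched to multiple buys. For real data, a FIFO lot-matching sketch like the one below is more careful; it assumes every SELL closes previously bought shares on the same contract, which is only approximately true on platforms where selling can also open a short position:

from collections import deque

def fifo_round_trips(ctrades):
    """Match each SELL against open BUY lots in FIFO order; returns one record per matched lot."""
    open_lots = deque()   # each lot: [buy_date, buy_price, remaining_shares]
    trips = []
    for t in sorted(ctrades, key=lambda x: x['date']):
        if t['action'] == 'BUY':
            open_lots.append([t['date'], t['price'], t['quantity']])
            continue
        qty = t['quantity']
        while qty > 0 and open_lots:
            lot = open_lots[0]
            matched = min(qty, lot[2])
            trips.append({'days': (t['date'] - lot[0]).days,
                          'pnl_per_share': t['price'] - lot[1],
                          'shares': matched})
            lot[2] -= matched
            qty -= matched
            if lot[2] == 0:
                open_lots.popleft()
    return trips

# Example: hold times for gains vs losses on one contract's trades
# gains = [r['days'] for r in trips if r['pnl_per_share'] > 0]
# losses = [r['days'] for r in trips if r['pnl_per_share'] < 0]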

Part 6: Confirmation Bias Detection

def confirmation_bias_analysis(trades):
    """Detect signs of confirmation bias."""
    print("\nCONFIRMATION BIAS ANALYSIS")
    print("=" * 70)

    # Proxy: Do you double down on losing positions?
    # Confirmation bias predicts you'll add to positions
    # rather than cutting them when the market moves against you.

    contract_actions = {}
    for t in sorted(trades, key=lambda x: x['date']):
        cid = t['contract_id']
        if cid not in contract_actions:
            contract_actions[cid] = []
        contract_actions[cid].append(t)

    doubling_down_count = 0
    cutting_losses_count = 0
    adding_to_winners_count = 0
    taking_profits_count = 0

    for cid, ctrades in contract_actions.items():
        if len(ctrades) < 2:
            continue

        for i in range(1, len(ctrades)):
            prev = ctrades[i - 1]
            curr = ctrades[i]

            if prev['action'] == 'BUY':
                price_moved = curr['price'] - prev['price']
                if price_moved < -0.03:  # Market moved against
                    if curr['action'] == 'BUY':
                        doubling_down_count += 1
                    elif curr['action'] == 'SELL':
                        cutting_losses_count += 1
                elif price_moved > 0.03:  # Market moved in favor
                    if curr['action'] == 'SELL':
                        taking_profits_count += 1
                    elif curr['action'] == 'BUY':
                        adding_to_winners_count += 1

    total_adverse = doubling_down_count + cutting_losses_count
    total_favorable = taking_profits_count + adding_to_winners_count

    print(f"When market moves against your position:")
    print(f"  Doubled down: {doubling_down_count}")
    print(f"  Cut losses:   {cutting_losses_count}")
    if total_adverse > 0:
        dd_rate = doubling_down_count / total_adverse
        print(f"  Doubling-down rate: {dd_rate:.1%}")
        if dd_rate > 0.6:
            print("  WARNING: High doubling-down rate suggests confirmation bias.")
            print("  You may be interpreting adverse price moves as the market being wrong,")
            print("  rather than considering that you might be wrong.")

    print(f"\nWhen market moves in your favor:")
    print(f"  Took profits:     {taking_profits_count}")
    print(f"  Added to winner:  {adding_to_winners_count}")

    # Also check how far the trader's estimates sit from market prices;
    # persistently large divergence can signal a refusal to update.
    estimates = np.array([t['your_estimate'] for t in trades])
    prices = np.array([t['price'] for t in trades])

    # Measure how much estimates diverge from market prices
    divergence = np.abs(estimates - prices)
    print(f"\nAverage divergence from market price: {divergence.mean():.4f}")
    print(f"Median divergence from market price: {np.median(divergence):.4f}")

    if divergence.mean() > 0.10:
        print("Your estimates frequently disagree with the market by large amounts.")
        print("This could indicate strong private information OR confirmation bias.")
        print("Check your accuracy for high-divergence trades:")

        high_div_mask = divergence > 0.10
        if high_div_mask.any():
            outcomes = np.array([t['outcome'] for t in trades])
            actions_arr = np.array([t['action'] for t in trades])

            high_div_pnl = []
            for i in np.where(high_div_mask)[0]:
                if actions_arr[i] == 'BUY':
                    high_div_pnl.append(outcomes[i] - prices[i])
                else:
                    high_div_pnl.append(prices[i] - outcomes[i])

            high_div_pnl = np.array(high_div_pnl)
            print(f"  Win rate for high-divergence trades: {(high_div_pnl > 0).mean():.2%}")
            print(f"  Average PnL for high-divergence trades: {high_div_pnl.mean():.4f}")

            if (high_div_pnl > 0).mean() < 0.50:
                print("  VERDICT: Your high-conviction disagreements with the market are unprofitable.")
                print("  This strongly suggests confirmation bias rather than superior information.")

confirmation_bias_analysis(trade_history)
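Another proxy for under-updating: on contracts you traded more than once, did your estimate move toward the previous market price between trades? A rational updater who treats the market as evidence should shrink the gap at least some of the time; the 0.02 threshold and the rough 50% baseline below are illustrative assumptions, not calibrated constants:

def update_responsiveness(trades):
    """Fraction of repeat trades where the estimate moved toward the prior market price."""
    by_contract = {}
    for t in sorted(trades, key=lambda x: x['date']):
        by_contract.setdefault(t['contract_id'], []).append(t)

    toward, total = 0, 0
    for ctrades in by_contract.values():
        for prev, curr in zip(ctrades, ctrades[1:]):
            gap_before = prev['your_estimate'] - prev['price']
            gap_after = curr['your_estimate'] - prev['price']
            if abs(gap_before) > 0.02:   # only count meaningful disagreements
                total += 1
                toward += abs(gap_after) < abs(gap_before)
    if total:
        print(f"Estimate moved toward the market in {toward}/{total} repeat trades ({toward / total:.1%})")
        print("Rates well below ~50% suggest you rarely update toward disconfirming prices.")

update_responsiveness(trade_history)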

Part 7: Calculating the Cost of Biases

def cost_of_biases(trades):
    """Estimate the financial cost of each detected bias."""
    print("\nCOST OF BIASES ANALYSIS")
    print("=" * 70)

    outcomes = np.array([t['outcome'] for t in trades])
    prices = np.array([t['price'] for t in trades])
    estimates = np.array([t['your_estimate'] for t in trades])
    sizes = np.array([t['quantity'] for t in trades])
    confidence = np.array([t['confidence'] for t in trades])
    actions = np.array([t['action'] for t in trades])

    # Actual PnL
    pnl = np.array([
        (outcomes[i] - prices[i]) * sizes[i] if actions[i] == 'BUY'
        else (prices[i] - outcomes[i]) * sizes[i]
        for i in range(len(trades))
    ])

    total_pnl = pnl.sum()
    print(f"Total actual PnL: ${total_pnl:.2f}")

    # Cost of overconfidence: excess position sizing.
    # If positions had been sized in proportion to realized edge rather
    # than perceived edge, what would PnL have been? (Realized
    # |outcome - price| is only a noisy per-trade proxy for true edge.)
    actual_edges = np.array([
        abs(outcomes[i] - prices[i]) for i in range(len(trades))
    ])
    perceived_edges = np.abs(estimates - prices)

    edge_ratio = np.where(perceived_edges > 0,
                          np.minimum(actual_edges / perceived_edges, 3), 1)
    optimal_sizes = sizes * edge_ratio
    optimal_pnl = np.array([
        (outcomes[i] - prices[i]) * optimal_sizes[i] if actions[i] == 'BUY'
        else (prices[i] - outcomes[i]) * optimal_sizes[i]
        for i in range(len(trades))
    ])

    overconfidence_cost = total_pnl - optimal_pnl.sum()
    print(f"Estimated cost of overconfidence (sizing): ${abs(overconfidence_cost):.2f}")

    # Cost of miscalibration: wrong direction trades
    wrong_direction = np.array([
        (estimates[i] > prices[i] and outcomes[i] < prices[i]) or
        (estimates[i] < prices[i] and outcomes[i] > prices[i])
        for i in range(len(trades))
    ])
    miscalibration_cost = np.abs(pnl[wrong_direction]).sum()
    print(f"Estimated cost of miscalibration (wrong direction): ${miscalibration_cost:.2f}")

    # Cost of high-confidence failures
    high_conf_wrong = (confidence >= 7) & (pnl < 0)
    high_conf_loss = np.abs(pnl[high_conf_wrong]).sum()
    print(f"Cost of high-confidence losing trades: ${high_conf_loss:.2f}")

    print(f"\n{'Bias':>25} {'Estimated Cost':>15} {'% of Total Loss':>15}")
    print("-" * 60)
    total_loss = np.abs(pnl[pnl < 0]).sum()
    print(f"{'Overconfidence':>25} ${abs(overconfidence_cost):>14.2f} {abs(overconfidence_cost)/max(total_loss,1):>14.1%}")
    print(f"{'Miscalibration':>25} ${miscalibration_cost:>14.2f} {miscalibration_cost/max(total_loss,1):>14.1%}")
    print(f"{'High-conf failures':>25} ${high_conf_loss:>14.2f} {high_conf_loss/max(total_loss,1):>14.1%}")
    print(f"{'Total losses':>25} ${total_loss:>14.2f}")

cost_of_biases(trade_history)
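These cost figures are point estimates from a single sample of trades. A bootstrap sketch gives a rough sense of the uncertainty around them; here it is applied to mean PnL per trade, and the same resampling pattern works for any of the cost metrics above:

def bootstrap_mean_pnl(trades, n_boot=5000, seed=0):
    """Bootstrap a 95% confidence interval for mean PnL per trade."""
    rng = np.random.default_rng(seed)
    pnl = np.array([
        (t['outcome'] - t['price']) * t['quantity'] if t['action'] == 'BUY'
        else (t['price'] - t['outcome']) * t['quantity']
        for t in trades
    ])
    boots = np.array([rng.choice(pnl, size=len(pnl), replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"Mean PnL per trade: {pnl.mean():+.2f}  (95% bootstrap CI: {lo:+.2f} to {hi:+.2f})")

bootstrap_mean_pnl(trade_history)

If the interval comfortably includes zero, be wary of over-interpreting any single bias-cost figure.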

Part 8: Building Your Debiasing Plan

Based on the analysis above, here is a framework for creating a personalized debiasing plan.

Step 1: Rank Your Biases

Order the detected biases by their estimated financial cost:

  1. Most costly bias: Address this first with the most intensive debiasing technique.
  2. Second most costly: Address with a moderate intervention.
  3. Third most costly: Monitor and address as time permits.

Step 2: Select Debiasing Interventions

For each detected bias, pair a recommended intervention with a concrete implementation:

  • Overconfidence: calibration training plus smaller position sizes (take calibration quizzes weekly; cap max position at 2% of bankroll).
  • Anchoring (round numbers): force non-round estimates (before estimating, write "my estimate will probably NOT be a round number").
  • Anchoring (market price): estimate before looking at the price (write down your probability estimate before opening the market).
  • Confirmation bias: seek opposing views (read one analysis opposing your view before every trade).
  • Disposition effect: pre-commit to exit rules (set stop-loss and take-profit levels at entry; do not modify them).
  • Herding: use a contrarian checklist (for every trade that follows the crowd, write three reasons the crowd could be wrong).
  • Recency bias: consult base rates (before every trade, look up the historical base rate for the event type).
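To make the overconfidence intervention's sizing cap concrete, here is a minimal position-sizing sketch for the long side of a binary contract. The half-Kelly fraction and the 2% cap are illustrative parameters, not recommendations:

def position_size(bankroll, estimate, price, cap=0.02, kelly_fraction=0.5):
    """Fractional-Kelly stake for buying a binary contract at `price`, capped at `cap` of bankroll."""
    if estimate <= price:
        return 0.0                        # no edge on the long side
    b = (1 - price) / price               # net profit per dollar staked if the share pays out
    kelly = estimate - (1 - estimate) / b # full-Kelly fraction of bankroll
    stake = max(0.0, kelly * kelly_fraction) * bankroll
    return min(stake, cap * bankroll)

# Example: bankroll $1,000, you estimate 60%, market price 0.50
print(f"Stake: ${position_size(1000, 0.60, 0.50):.2f}")   # half-Kelly says $100; cap limits it to $20

Selling (buying NO) is symmetric with price replaced by 1 - price and the estimate by its complement. The point of the cap is precisely that a miscalibrated estimate feeds an oversized Kelly stake.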

Step 3: Implement Tracking

Create a simple spreadsheet or use the code from code/example-03-debiasing-tools.py to track the following (a minimal logging sketch appears after the list):

  • Weekly calibration scores
  • Monthly bias audit results
  • Trade-by-trade pre-trade checklist completion
  • Rolling P&L by bias category
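As a starting point, here is a small pre-trade checklist logger; the checklist items and the CSV layout are illustrative and should be adapted to your own interventions:

import csv
from datetime import datetime

CHECKLIST = [
    "Wrote estimate before looking at the market price",
    "Read one opposing analysis",
    "Checked the historical base rate",
    "Set exit rules (stop-loss / take-profit)",
]

def log_checklist(path, contract_id, answers):
    """Append one pre-trade checklist row (timestamp, contract, 0/1 per item) to a CSV log."""
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now().isoformat(timespec='seconds'), contract_id] +
                        [int(bool(a)) for a in answers])

# Example: log_checklist('checklist_log.csv', 'contract_042', [True, True, False, True])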

Step 4: Review and Iterate

Schedule monthly reviews where you re-run the full behavioral audit on your recent trades. Track whether your bias metrics are improving over time. Adjust your debiasing interventions based on what is working and what is not.
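A small helper makes the monthly re-run mechanical: filter the log to a trailing window and reuse the analyses from earlier parts. The 90-day window and 30-trade minimum below are arbitrary defaults:

from datetime import timedelta

def recent_trades(trades, days=90, as_of=None):
    """Return only trades from the trailing `days`-day window ending at `as_of`."""
    as_of = as_of or max(t['date'] for t in trades)
    cutoff = as_of - timedelta(days=days)
    return [t for t in trades if t['date'] >= cutoff]

recent = recent_trades(trade_history, days=90)
if len(recent) >= 30:   # skip noisy verdicts on tiny samples
    calibration_analysis(recent)
    overconfidence_analysis(recent)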

Conclusions

A behavioral audit of your trading history is one of the most valuable exercises a prediction market trader can undertake. By systematically measuring your biases and their costs, you transform vague self-improvement goals into concrete, measurable objectives.

The key findings from a typical audit include:

  1. Most traders are overconfident — their estimates are more extreme than their accuracy warrants, and their confidence ratings do not predict accuracy.
  2. Round-number anchoring is nearly universal — estimates cluster near 25%, 50%, and 75% far more than they should.
  3. The disposition effect is common — losing positions are held longer and at larger sizes than winning positions.
  4. Confirmation bias is invisible — traders rarely notice it in themselves, which is why quantitative analysis is essential.

The goal is not to eliminate all biases (that is probably impossible) but to reduce their magnitude and their financial cost. Even a 20% reduction in the cost of biases can meaningfully improve your long-term returns.

The full code for this case study is available in code/case-study-code.py.