Case Study 2: Behavioral Audit of Your Trading History
Overview
This case study provides a comprehensive framework for analyzing a personal trading history for behavioral biases. Rather than analyzing external market data, you will turn the lens inward — examining your own trades for patterns that reveal systematic errors in judgment. The goal is to identify which biases are costing you the most money and build a personalized debiasing plan.
We use a synthetic trading history to demonstrate the methodology, but the framework is designed to be applied to your real trading data.
Part 1: Preparing Your Trading Data
The first step is to export and organize your trading history. At minimum, you need the following fields for each trade:
| Field | Description | Example |
|---|---|---|
| date | Date and time of trade | 2025-03-15 14:30 |
| contract_id | Unique identifier | polymarket_2024_election |
| contract_name | Human-readable name | "Biden wins 2024 election" |
| action | Buy or sell | BUY |
| price | Execution price | 0.55 |
| quantity | Number of shares | 50 |
| your_estimate | Your probability estimate at time of trade | 0.65 |
| confidence | Self-rated confidence (1-10) | 7 |
| reasoning | Brief note on why you made the trade | "Economic indicators favor incumbent" |
| outcome | 1 if event occurred, 0 if not (filled after resolution) | 1 |
| resolution_date | When the contract resolved | 2024-11-05 |
If you have not been recording all these fields, that is itself a finding — a well-disciplined trader keeps detailed records of their reasoning and confidence for exactly this kind of analysis.
import numpy as np
import csv
from datetime import datetime, timedelta
# Generate synthetic trading history for demonstration
np.random.seed(123)
n_trades = 300
contracts = [f"contract_{i:03d}" for i in range(80)]
trade_history = []
base_date = datetime(2024, 1, 1)
for i in range(n_trades):
# Simulate a trade
contract = np.random.choice(contracts)
true_prob = np.random.beta(2, 2)
# Trader's estimate has systematic biases:
# - Overconfident (estimates too extreme)
# - Anchored to round numbers
# - Confirmation bias (doesn't update enough)
estimate = true_prob + np.random.normal(0, 0.12)
estimate = np.clip(estimate, 0.05, 0.95)
# Push toward round numbers (anchoring)
round_targets = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]
nearest_round = min(round_targets, key=lambda x: abs(x - estimate))
if abs(estimate - nearest_round) < 0.05:
estimate = estimate * 0.6 + nearest_round * 0.4
# Market price (somewhat efficient but noisy)
market_price = true_prob + np.random.normal(0, 0.08)
market_price = np.clip(market_price, 0.02, 0.98)
# Trader buys if they think true prob > market price
if estimate > market_price:
action = 'BUY'
else:
action = 'SELL'
# Self-rated confidence is poorly calibrated: it runs inversely to the
# actual edge, so confidence carries little information about accuracy
edge = abs(estimate - market_price)
confidence = min(10, max(1, int(7 - edge * 10 + np.random.normal(0, 1.5))))
# Outcome
outcome = 1 if np.random.uniform() < true_prob else 0
# Position size (overconfident traders bet too much)
position_size = max(5, int(confidence * 8 + np.random.normal(0, 10)))
trade_date = base_date + timedelta(days=np.random.randint(0, 365))
trade_history.append({
'date': trade_date,
'contract_id': contract,
'action': action,
'price': round(market_price, 3),
'quantity': position_size,
'your_estimate': round(estimate, 3),
'confidence': confidence,
'outcome': outcome,
'true_prob': round(true_prob, 3), # In real life you don't know this
})
print(f"Generated {len(trade_history)} trades across {len(set(t['contract_id'] for t in trade_history))} contracts")
Part 2: Calibration Analysis
The first and most important test: are your probability estimates well-calibrated?
def calibration_analysis(trades, n_bins=10):
"""Analyze calibration of probability estimates."""
estimates = np.array([t['your_estimate'] for t in trades])
outcomes = np.array([t['outcome'] for t in trades])
bin_edges = np.linspace(0, 1, n_bins + 1)
print("CALIBRATION ANALYSIS")
print("=" * 70)
print(f"{'Bin':>12} {'N':>6} {'Avg Estimate':>14} {'Actual Rate':>14} {'Error':>10} {'Status':>12}")
print("-" * 70)
total_error = 0
total_bins = 0
for i in range(n_bins):
lo, hi = bin_edges[i], bin_edges[i + 1]
if i == n_bins - 1:
mask = (estimates >= lo) & (estimates <= hi)
else:
mask = (estimates >= lo) & (estimates < hi)
n = mask.sum()
if n < 5:
continue
avg_est = estimates[mask].mean()
actual = outcomes[mask].mean()
error = actual - avg_est
status = "OK" if abs(error) < 0.05 else ("Overconf" if error < 0 else "Underconf")
print(f" [{lo:.1f}-{hi:.1f}] {n:6d} {avg_est:14.3f} {actual:14.3f} {error:10.3f} {status:>12}")
total_error += abs(error)
total_bins += 1
mean_cal_error = total_error / max(total_bins, 1)
print(f"\nMean Absolute Calibration Error: {mean_cal_error:.4f}")
# Brier score
brier = np.mean((estimates - outcomes) ** 2)
# Reference Brier score (using base rate)
base_rate = outcomes.mean()
brier_ref = np.mean((base_rate - outcomes) ** 2)
brier_skill = 1 - brier / brier_ref
print(f"Brier Score: {brier:.4f}")
print(f"Reference Brier Score: {brier_ref:.4f}")
print(f"Brier Skill Score: {brier_skill:.4f}")
if mean_cal_error > 0.08:
print("\nVERDICT: SIGNIFICANT MISCALIBRATION DETECTED")
print("Your probability estimates are systematically off.")
elif mean_cal_error > 0.04:
print("\nVERDICT: MODERATE MISCALIBRATION")
print("Your estimates are reasonably calibrated but could improve.")
else:
print("\nVERDICT: WELL CALIBRATED")
print("Your probability estimates closely match observed frequencies.")
return mean_cal_error, brier
cal_error, brier = calibration_analysis(trade_history)
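The calibration table is easier to absorb as a reliability diagram. The following optional sketch assumes matplotlib is installed (it is not used elsewhere in this case study) and mirrors the binning logic of calibration_analysis.

import matplotlib.pyplot as plt

def plot_calibration(trades, n_bins=10):
    """Plot observed frequency against average estimate for each bin."""
    estimates = np.array([t['your_estimate'] for t in trades])
    outcomes = np.array([t['outcome'] for t in trades])
    bin_edges = np.linspace(0, 1, n_bins + 1)
    xs, ys = [], []
    for i in range(n_bins):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        if i == n_bins - 1:
            mask = (estimates >= lo) & (estimates <= hi)
        else:
            mask = (estimates >= lo) & (estimates < hi)
        if mask.sum() >= 5:
            xs.append(estimates[mask].mean())
            ys.append(outcomes[mask].mean())
    plt.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
    plt.plot(xs, ys, 'o-', label='Your estimates')
    plt.xlabel('Average estimate')
    plt.ylabel('Observed frequency')
    plt.title('Calibration curve')
    plt.legend()
    plt.show()

# plot_calibration(trade_history)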
Part 3: Overconfidence Detection
def overconfidence_analysis(trades):
"""Detect overconfidence in trading behavior."""
print("\nOVERCONFIDENCE ANALYSIS")
print("=" * 70)
estimates = np.array([t['your_estimate'] for t in trades])
prices = np.array([t['price'] for t in trades])
outcomes = np.array([t['outcome'] for t in trades])
confidence = np.array([t['confidence'] for t in trades])
sizes = np.array([t['quantity'] for t in trades])
actions = [t['action'] for t in trades]
# 1. Estimated edge vs actual edge
estimated_edges = np.abs(estimates - prices)
actual_pnl = np.array([
(outcomes[i] - prices[i]) if actions[i] == 'BUY' else (prices[i] - outcomes[i])
for i in range(len(trades))
])
print(f"Average estimated edge: {estimated_edges.mean():.4f}")
print(f"Average actual PnL per trade: {actual_pnl.mean():.4f}")
print(f"Edge overestimation: {estimated_edges.mean() - max(actual_pnl.mean(), 0):.4f}")
# 2. Confidence vs accuracy
print(f"\nConfidence vs Accuracy:")
for conf_level in range(1, 11):
mask = confidence == conf_level
if mask.sum() < 5:
continue
win_rate = (actual_pnl[mask] > 0).mean()
avg_pnl = actual_pnl[mask].mean()
avg_size = sizes[mask].mean()
print(f" Confidence {conf_level:2d}: Win rate {win_rate:.2%}, "
f"Avg PnL {avg_pnl:+.4f}, Avg size {avg_size:.0f}, N={mask.sum()}")
# 3. Position sizing relative to edge
high_conf = confidence >= 7
low_conf = confidence <= 4
if high_conf.sum() > 0 and low_conf.sum() > 0:
hc_win_rate = (actual_pnl[high_conf] > 0).mean()
lc_win_rate = (actual_pnl[low_conf] > 0).mean()
hc_avg_size = sizes[high_conf].mean()
lc_avg_size = sizes[low_conf].mean()
print(f"\nHigh confidence (>=7): Win rate {hc_win_rate:.2%}, Avg size {hc_avg_size:.0f}")
print(f"Low confidence (<=4): Win rate {lc_win_rate:.2%}, Avg size {lc_avg_size:.0f}")
if hc_win_rate < lc_win_rate + 0.05 and hc_avg_size > lc_avg_size * 1.2:
print("WARNING: You bet larger when confident, but confidence doesn't predict accuracy!")
print("This is classic overconfidence — your self-assessed confidence is not informative.")
# 4. Overall overconfidence score
# Your own estimated probability that each trade wins:
# a BUY wins if the event occurs, a SELL wins if it does not
expected_accuracy = np.mean([
estimates[i] if actions[i] == 'BUY' else 1 - estimates[i]
for i in range(len(trades))
])
actual_accuracy = (actual_pnl > 0).mean()
overconfidence_gap = expected_accuracy - actual_accuracy
print(f"\nExpected accuracy (from your estimates): {expected_accuracy:.2%}")
print(f"Actual accuracy: {actual_accuracy:.2%}")
print(f"Overconfidence gap: {overconfidence_gap:.2%}")
if overconfidence_gap > 0.10:
print("VERDICT: SEVERELY OVERCONFIDENT")
elif overconfidence_gap > 0.05:
print("VERDICT: MODERATELY OVERCONFIDENT")
elif overconfidence_gap > 0:
print("VERDICT: SLIGHTLY OVERCONFIDENT")
else:
print("VERDICT: NOT OVERCONFIDENT (may be underconfident)")
return overconfidence_gap
oc_gap = overconfidence_analysis(trade_history)
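A quick supplementary check, not part of overconfidence_analysis itself, is to correlate self-rated confidence with per-trade results. Correlations near zero or negative reinforce the verdict that confidence carries little information. A minimal sketch:

def confidence_signal_check(trades):
    """Correlate self-rated confidence with per-trade results (a quick add-on check)."""
    prices = np.array([t['price'] for t in trades])
    outcomes = np.array([t['outcome'] for t in trades])
    confidence = np.array([t['confidence'] for t in trades], dtype=float)
    pnl = np.array([
        (outcomes[i] - prices[i]) if t['action'] == 'BUY' else (prices[i] - outcomes[i])
        for i, t in enumerate(trades)
    ])
    print(f"Correlation(confidence, per-share PnL): {np.corrcoef(confidence, pnl)[0, 1]:+.3f}")
    print(f"Correlation(confidence, win indicator): {np.corrcoef(confidence, (pnl > 0).astype(float))[0, 1]:+.3f}")
    print("Values near zero mean confidence carries little information about results.")

confidence_signal_check(trade_history)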
Part 4: Anchoring Detection
def anchoring_analysis(trades):
"""Detect anchoring to round numbers and market prices."""
print("\nANCHORING ANALYSIS")
print("=" * 70)
estimates = np.array([t['your_estimate'] for t in trades])
prices = np.array([t['price'] for t in trades])
# 1. Round number anchoring
round_numbers = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]
distances_to_round = []
for est in estimates:
min_dist = min(abs(est - rn) for rn in round_numbers)
distances_to_round.append(min_dist)
distances_to_round = np.array(distances_to_round)
# Baseline: if estimates were spread uniformly over [0.05, 0.95] with no
# pull toward round numbers, the expected distance to the nearest round
# number in the set above works out to roughly 0.022
expected_mean_distance = 0.022
actual_mean_distance = distances_to_round.mean()
print(f"Average distance of estimates to nearest round number: {actual_mean_distance:.4f}")
print(f"Expected distance if unanchored: ~{expected_mean_distance:.4f}")
if actual_mean_distance < expected_mean_distance * 0.7:
print("DETECTED: Strong round-number anchoring")
print("Your estimates cluster around round numbers more than expected.")
elif actual_mean_distance < expected_mean_distance * 0.9:
print("DETECTED: Mild round-number anchoring")
else:
print("No significant round-number anchoring detected.")
# 2. Estimate distribution around key points
print(f"\nEstimate frequency near round numbers:")
for rn in [0.25, 0.50, 0.75]:
near = np.abs(estimates - rn) < 0.03
print(f" Within 3% of {rn:.0%}: {near.sum()} trades ({near.mean():.1%} of all trades)")
# 3. Anchoring to market price
# If anchored, estimates should be pulled toward market price
# The residual (estimate - true_prob) should correlate with (price - true_prob)
if 'true_prob' in trades[0]:
true_probs = np.array([t['true_prob'] for t in trades])
estimate_error = estimates - true_probs
price_error = prices - true_probs
anchor_corr = np.corrcoef(estimate_error, price_error)[0, 1]
print(f"\nCorrelation between estimate error and price error: {anchor_corr:.4f}")
if anchor_corr > 0.3:
print("DETECTED: Your estimates are anchored to market prices.")
print("Your errors track the market's errors — you're not independent.")
else:
print("No significant market-price anchoring detected.")
anchoring_analysis(trade_history)
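The unanchored baseline used above can be sanity-checked by simulation. The sketch below assumes estimates spread uniformly over [0.05, 0.95], which is only a rough stand-in for any real estimate distribution, and should land near the baseline used in anchoring_analysis.

def simulate_unanchored_baseline(n=100_000, seed=0):
    """Monte Carlo estimate of the mean distance to the nearest round number
    for estimates with no round-number pull (assumed uniform on [0.05, 0.95])."""
    rng = np.random.default_rng(seed)
    round_numbers = np.array([0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9])
    samples = rng.uniform(0.05, 0.95, size=n)
    dists = np.abs(samples[:, None] - round_numbers[None, :]).min(axis=1)
    return dists.mean()

print(f"Simulated unanchored mean distance: {simulate_unanchored_baseline():.4f}")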
Part 5: Disposition Effect Analysis
def disposition_analysis(trades):
"""Detect the disposition effect in trading behavior."""
print("\nDISPOSITION EFFECT ANALYSIS")
print("=" * 70)
# Group trades by contract
contract_trades = {}
for t in sorted(trades, key=lambda x: x['date']):
cid = t['contract_id']
if cid not in contract_trades:
contract_trades[cid] = []
contract_trades[cid].append(t)
# For contracts with multiple trades, analyze hold times and
# whether gains were realized faster than losses
gain_hold_times = []
loss_hold_times = []
gain_counts = 0
loss_counts = 0
for cid, ctrades in contract_trades.items():
if len(ctrades) < 2:
continue
# Find buy-sell pairs
buys = [t for t in ctrades if t['action'] == 'BUY']
sells = [t for t in ctrades if t['action'] == 'SELL']
for buy in buys:
for sell in sells:
if sell['date'] > buy['date']:
hold_time = (sell['date'] - buy['date']).days
pnl = sell['price'] - buy['price']
if pnl > 0:
gain_hold_times.append(hold_time)
gain_counts += 1
elif pnl < 0:
loss_hold_times.append(hold_time)
loss_counts += 1
break # Match first sell after buy
# Also analyze based on outcome
outcomes = np.array([t['outcome'] for t in trades])
prices = np.array([t['price'] for t in trades])
actions = [t['action'] for t in trades]
winning_trade_sizes = []
losing_trade_sizes = []
for i, t in enumerate(trades):
if t['action'] == 'BUY':
pnl = outcomes[i] - prices[i]
else:
pnl = prices[i] - outcomes[i]
if pnl > 0:
winning_trade_sizes.append(t['quantity'])
else:
losing_trade_sizes.append(t['quantity'])
avg_win_size = np.mean(winning_trade_sizes) if winning_trade_sizes else 0
avg_loss_size = np.mean(losing_trade_sizes) if losing_trade_sizes else 0
print(f"Average position size (winning trades): {avg_win_size:.1f}")
print(f"Average position size (losing trades): {avg_loss_size:.1f}")
if avg_loss_size > avg_win_size * 1.1:
print("WARNING: You hold larger positions in losing trades.")
print("This suggests reluctance to close losing positions (disposition effect).")
if gain_hold_times and loss_hold_times:
avg_gain_hold = np.mean(gain_hold_times)
avg_loss_hold = np.mean(loss_hold_times)
print(f"\nAverage hold time (gains): {avg_gain_hold:.1f} days")
print(f"Average hold time (losses): {avg_loss_hold:.1f} days")
if avg_loss_hold > avg_gain_hold * 1.2:
print("DETECTED: Disposition effect — you hold losers longer than winners.")
else:
print("No clear disposition effect detected in hold times.")
else:
print("\nInsufficient matched buy-sell pairs for hold time analysis.")
# Loss aversion estimate
if winning_trade_sizes and losing_trade_sizes:
realized_wins = np.array(winning_trade_sizes)
realized_losses = np.array(losing_trade_sizes)
ratio = realized_losses.mean() / max(realized_wins.mean(), 1)
print(f"\nLoss/win position-size ratio (a rough loss-aversion proxy): {ratio:.2f}")
print(f"(Prospect theory's loss-aversion coefficient is roughly 2.25; a ratio above about 1.5 suggests loss-averse sizing)")
disposition_analysis(trade_history)
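Hold-time gaps based on a few dozen matched pairs can easily be noise. For a rough significance check, a label-shuffling permutation test needs only numpy; the helper below is generic and is not wired into disposition_analysis, so the example call with gain and loss hold times is hypothetical.

def permutation_pvalue(sample_a, sample_b, n_perm=10_000, seed=0):
    """Rough two-sided p-value for a difference in means via label shuffling."""
    rng = np.random.default_rng(seed)
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Example (hypothetical hold-time samples, in days):
# p = permutation_pvalue(loss_hold_times, gain_hold_times)
# print(f"p-value for hold-time difference: {p:.3f}")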
Part 6: Confirmation Bias Detection
def confirmation_bias_analysis(trades):
"""Detect signs of confirmation bias."""
print("\nCONFIRMATION BIAS ANALYSIS")
print("=" * 70)
# Proxy: Do you double down on losing positions?
# Confirmation bias predicts you'll add to positions
# rather than cutting them when the market moves against you.
contract_actions = {}
for t in sorted(trades, key=lambda x: x['date']):
cid = t['contract_id']
if cid not in contract_actions:
contract_actions[cid] = []
contract_actions[cid].append(t)
doubling_down_count = 0
cutting_losses_count = 0
adding_to_winners_count = 0
taking_profits_count = 0
for cid, ctrades in contract_actions.items():
if len(ctrades) < 2:
continue
for i in range(1, len(ctrades)):
prev = ctrades[i - 1]
curr = ctrades[i]
if prev['action'] == 'BUY':
price_moved = curr['price'] - prev['price']
if price_moved < -0.03: # Market moved against
if curr['action'] == 'BUY':
doubling_down_count += 1
elif curr['action'] == 'SELL':
cutting_losses_count += 1
elif price_moved > 0.03: # Market moved in favor
if curr['action'] == 'SELL':
taking_profits_count += 1
elif curr['action'] == 'BUY':
adding_to_winners_count += 1
total_adverse = doubling_down_count + cutting_losses_count
total_favorable = taking_profits_count + adding_to_winners_count
print(f"When market moves against your position:")
print(f" Doubled down: {doubling_down_count}")
print(f" Cut losses: {cutting_losses_count}")
if total_adverse > 0:
dd_rate = doubling_down_count / total_adverse
print(f" Doubling-down rate: {dd_rate:.1%}")
if dd_rate > 0.6:
print(" WARNING: High doubling-down rate suggests confirmation bias.")
print(" You may be interpreting adverse price moves as the market being wrong,")
print(" rather than considering that you might be wrong.")
print(f"\nWhen market moves in your favor:")
print(f" Took profits: {taking_profits_count}")
print(f" Added to winner: {adding_to_winners_count}")
# Also check how far your estimates diverge from the market price,
# and whether your large disagreements with the market actually pay off
estimates = np.array([t['your_estimate'] for t in trades])
prices = np.array([t['price'] for t in trades])
# Measure how much estimates diverge from market prices
divergence = np.abs(estimates - prices)
print(f"\nAverage divergence from market price: {divergence.mean():.4f}")
print(f"Median divergence from market price: {np.median(divergence):.4f}")
if divergence.mean() > 0.10:
print("Your estimates frequently disagree with the market by large amounts.")
print("This could indicate strong private information OR confirmation bias.")
print("Check your accuracy for high-divergence trades:")
high_div_mask = divergence > 0.10
if high_div_mask.any():
outcomes = np.array([t['outcome'] for t in trades])
actions_arr = np.array([t['action'] for t in trades])
high_div_pnl = []
for i in np.where(high_div_mask)[0]:
if actions_arr[i] == 'BUY':
high_div_pnl.append(outcomes[i] - prices[i])
else:
high_div_pnl.append(prices[i] - outcomes[i])
high_div_pnl = np.array(high_div_pnl)
print(f" Win rate for high-divergence trades: {(high_div_pnl > 0).mean():.2%}")
print(f" Average PnL for high-divergence trades: {high_div_pnl.mean():.4f}")
if (high_div_pnl > 0).mean() < 0.50:
print(" VERDICT: Your high-conviction disagreements with the market are unprofitable.")
print(" This strongly suggests confirmation bias rather than superior information.")
confirmation_bias_analysis(trade_history)
Part 7: Calculating the Cost of Biases
def cost_of_biases(trades):
"""Estimate the financial cost of each detected bias."""
print("\nCOST OF BIASES ANALYSIS")
print("=" * 70)
outcomes = np.array([t['outcome'] for t in trades])
prices = np.array([t['price'] for t in trades])
estimates = np.array([t['your_estimate'] for t in trades])
sizes = np.array([t['quantity'] for t in trades])
confidence = np.array([t['confidence'] for t in trades])
actions = np.array([t['action'] for t in trades])
# Actual PnL
pnl = np.array([
(outcomes[i] - prices[i]) * sizes[i] if actions[i] == 'BUY'
else (prices[i] - outcomes[i]) * sizes[i]
for i in range(len(trades))
])
total_pnl = pnl.sum()
print(f"Total actual PnL: ${total_pnl:.2f}")
# Cost of overconfidence: excess position sizing
# If the trader sized positions proportional to actual edge rather
# than perceived edge, what would PnL be?
actual_edges = np.array([
abs(outcomes[i] - prices[i]) for i in range(len(trades))
])
perceived_edges = np.abs(estimates - prices)
# Avoid dividing by zero when the estimate exactly equals the market price
edge_ratio = np.ones_like(perceived_edges)
np.divide(actual_edges, perceived_edges, out=edge_ratio, where=perceived_edges > 0)
edge_ratio = np.minimum(edge_ratio, 3)
optimal_sizes = sizes * edge_ratio
optimal_pnl = np.array([
(outcomes[i] - prices[i]) * optimal_sizes[i] if actions[i] == 'BUY'
else (prices[i] - outcomes[i]) * optimal_sizes[i]
for i in range(len(trades))
])
overconfidence_cost = total_pnl - optimal_pnl.sum()
print(f"Estimated cost of overconfidence (sizing): ${abs(overconfidence_cost):.2f}")
# Cost of miscalibration: wrong direction trades
wrong_direction = np.array([
(estimates[i] > prices[i] and outcomes[i] < prices[i]) or
(estimates[i] < prices[i] and outcomes[i] > prices[i])
for i in range(len(trades))
])
miscalibration_cost = np.abs(pnl[wrong_direction]).sum()
print(f"Estimated cost of miscalibration (wrong direction): ${miscalibration_cost:.2f}")
# Cost of high-confidence failures
high_conf_wrong = (confidence >= 7) & (pnl < 0)
high_conf_loss = np.abs(pnl[high_conf_wrong]).sum()
print(f"Cost of high-confidence losing trades: ${high_conf_loss:.2f}")
print(f"\n{'Bias':>25} {'Estimated Cost':>15} {'% of Total Loss':>15}")
print("-" * 60)
total_loss = np.abs(pnl[pnl < 0]).sum()
print(f"{'Overconfidence':>25} ${abs(overconfidence_cost):>14.2f} {abs(overconfidence_cost)/max(total_loss,1):>14.1%}")
print(f"{'Miscalibration':>25} ${miscalibration_cost:>14.2f} {miscalibration_cost/max(total_loss,1):>14.1%}")
print(f"{'High-conf failures':>25} ${high_conf_loss:>14.2f} {high_conf_loss/max(total_loss,1):>14.1%}")
print(f"{'Total losses':>25} ${total_loss:>14.2f}")
cost_of_biases(trade_history)
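One more counterfactual worth running alongside cost_of_biases is flat sizing: re-price every trade at a constant size to separate how much of your PnL comes from trade selection versus position sizing. A short sketch (the flat size of 50 shares is arbitrary):

def flat_size_counterfactual(trades, flat_size=50):
    """Compare actual PnL with a counterfactual where every trade has the same size."""
    actual = 0.0
    flat = 0.0
    for t in trades:
        per_share = (t['outcome'] - t['price']) if t['action'] == 'BUY' else (t['price'] - t['outcome'])
        actual += per_share * t['quantity']
        flat += per_share * flat_size
    print(f"Actual PnL:    ${actual:.2f}")
    print(f"Flat-size PnL: ${flat:.2f}")
    print("If flat sizing beats your actual sizing, your position sizes are adding noise or bias.")

flat_size_counterfactual(trade_history)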
Part 8: Building Your Debiasing Plan
Based on the analysis above, here is a framework for creating a personalized debiasing plan.
Step 1: Rank Your Biases
Order the detected biases by their estimated financial cost:
- Most costly bias: Address this first with the most intensive debiasing technique.
- Second most costly: Address with a moderate intervention.
- Third most costly: Monitor and address as time permits.
Step 2: Select Debiasing Interventions
| Bias Detected | Recommended Intervention | Implementation |
|---|---|---|
| Overconfidence | Calibration training + smaller position sizes | Take calibration quizzes weekly; cap max position at 2% of bankroll |
| Anchoring (round numbers) | Force non-round estimates | Before estimating, write "my estimate will probably NOT be a round number" |
| Anchoring (market price) | Estimate before looking at price | Write your probability estimate before opening the market |
| Confirmation bias | Seek opposing views | Read one analysis opposing your view before every trade |
| Disposition effect | Pre-commit to exit rules | Set stop-losses and take-profit levels at entry; do not modify them |
| Herding | Contrarian checklist | For every trade that follows the crowd, write three reasons the crowd could be wrong |
| Recency bias | Base rate reference | Before every trade, look up the historical base rate for the event type |
Step 3: Implement Tracking
Create a simple spreadsheet or use the code from code/example-03-debiasing-tools.py (a minimal logging sketch also follows this list) to track:
- Weekly calibration scores
- Monthly bias audit results
- Trade-by-trade pre-trade checklist completion
- Rolling P&L by bias category
Step 4: Review and Iterate
Schedule monthly reviews where you re-run the full behavioral audit on your recent trades. Track whether your bias metrics are improving over time. Adjust your debiasing interventions based on what is working and what is not.
Conclusions
A behavioral audit of your trading history is one of the most valuable exercises a prediction market trader can undertake. By systematically measuring your biases and their costs, you transform vague self-improvement goals into concrete, measurable objectives.
The key findings from a typical audit include:
- Most traders are overconfident — their estimates are more extreme than their accuracy warrants, and their confidence ratings do not predict accuracy.
- Round-number anchoring is nearly universal — estimates cluster near 25%, 50%, and 75% far more than they should.
- The disposition effect is common — losing positions are held longer and at larger sizes than winning positions.
- Confirmation bias is invisible — traders rarely notice it in themselves, which is why quantitative analysis is essential.
The goal is not to eliminate all biases (that is probably impossible) but to reduce their magnitude and their financial cost. Even a 20% reduction in the cost of biases can meaningfully improve your long-term returns.
The full code for this case study is available in code/case-study-code.py.