Chapter 24 Key Takeaways

Core Principle

Text is the raw material of belief. Prediction markets move when traders read text, update beliefs, and trade. A system that can extract signal from text faster and more accurately than a human reader holds a structural edge.

The Big Ideas

1. Text Is a Leading Indicator in Prediction Markets

The causal chain is: event occurs, text is published, traders read, beliefs update, prices move. The latency between publication and price adjustment creates a tradeable window. This window is larger in prediction markets than in equities because analyst coverage is thinner and liquidity is lower.

2. Preprocessing Must Match the Downstream Model

Classical models (TF-IDF + logistic regression) benefit from aggressive preprocessing: lowercasing, stopword removal, and stemming or lemmatization. Transformer models should receive minimally processed text -- they were trained on text with full syntax, capitalization, and stopwords, and removing these degrades performance.
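
A minimal sketch of the two paths, assuming NLTK for stopwords and stemming (the helper names are illustrative, not from this chapter):

import re
from nltk.corpus import stopwords   # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess_for_tfidf(text):
    # Aggressive path: lowercase, keep alphabetic tokens, drop stopwords, stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

def preprocess_for_transformer(text):
    # Minimal path: the model's own tokenizer expects raw, cased text.
    return text.strip()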

3. TF-IDF Remains a Strong Baseline for Structured Text Tasks

Term Frequency-Inverse Document Frequency converts text into a sparse numerical matrix where each dimension corresponds to a token or n-gram. When paired with logistic regression or SVM, TF-IDF achieves surprisingly competitive results on many classification tasks, especially with limited training data:

$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \log\!\Bigl(\frac{N}{\text{DF}(t)}\Bigr)$$
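A sketch of the baseline pairing; `texts`, `labels`, and `new_texts` are placeholder data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# texts: list of documents; labels: 0/1 outcomes (placeholders).
model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
probs = model.predict_proba(new_texts)[:, 1]  # P(class 1) for each document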

4. VADER and TextBlob Provide Instant Sentiment -- but with Limits

Lexicon-based sentiment tools require zero training data and score text in microseconds. VADER handles social media conventions (capitalization, punctuation, emojis) and common negations. TextBlob provides both polarity and subjectivity. Neither captures context well: shallow negation heuristics catch "not great" but miss constructions like "failed to deliver a great result", and neither understands domain-specific language.
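
A quick look at TextBlob's two outputs -- a sketch only; exact scores depend on the library version:

from textblob import TextBlob

for text in ["great result", "not a great result"]:
    s = TextBlob(text).sentiment
    # polarity is in [-1, 1]; subjectivity is in [0, 1]
    print(text, s.polarity, s.subjectivity)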

5. Transformers Understand Context -- That Is Their Superpower

BERT, RoBERTa, and their descendants process text as contextualized embeddings where the meaning of each word depends on its surroundings. "The Fed raised rates" and "Interest rates went up" produce similar embeddings despite little lexical overlap. This contextual understanding is critical for prediction market text, where subtle phrasing carries enormous weight.
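
One way to see this, sketched with the sentence-transformers library (the model choice is an assumption, not from this chapter):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
emb = model.encode(["The Fed raised rates.", "Interest rates went up."])
print(util.cos_sim(emb[0], emb[1]))  # high similarity despite different wording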

6. Fine-Tuning Is the Practical Frontier

Pre-trained transformer models are general-purpose. Fine-tuning on a few hundred labeled prediction market examples specializes them dramatically. Key fine-tuning decisions (a training sketch follows the table):

| Decision | Recommendation |
| --- | --- |
| Base model | DistilBERT for speed; RoBERTa for accuracy |
| Learning rate | 2e-5 is a safe starting point |
| Epochs | 3-5 (more risks overfitting small datasets) |
| Max sequence length | 128 for headlines; 512 for full articles |
| Minimum labeled data | ~200 examples for meaningful improvement |
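
A minimal fine-tuning sketch with the HuggingFace Trainer, using the settings above; `train_texts` and `train_labels` are placeholders for your labeled examples:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# train_texts / train_labels: placeholder lists of ~200+ labeled examples.
ds = Dataset.from_dict({"text": train_texts, "label": train_labels})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                 padding="max_length", max_length=128))

args = TrainingArguments(output_dir="ft-out", learning_rate=2e-5,
                         num_train_epochs=3, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds).train()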

7. News Impact Is Measurable Through Event Studies

The event study methodology quantifies how specific news items move prediction market prices. Define a pre-event baseline, measure the post-event price change, and subtract the expected change to isolate the abnormal impact:

$$\text{Abnormal Change}_t = \Delta P_t - \mathbb{E}[\Delta P_t]$$
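A sketch of this computation; the 24-period baseline window and the integer-index convention are assumptions:

import pandas as pd

def abnormal_change(prices: pd.Series, event_idx: int, baseline: int = 24) -> float:
    # Expected change = mean one-step change over the pre-event baseline window.
    changes = prices.diff()
    expected = changes.iloc[event_idx - baseline:event_idx].mean()
    actual = changes.iloc[event_idx + 1]  # first post-event price change
    return actual - expected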

8. Sentiment Features Must Be Aggregated Carefully

Raw article-level sentiment scores must be aggregated into tradeable time-series features. Three methods, each with different properties (a sketch follows the list):

  • Simple moving average: Equal weighting of recent articles; smooths noise but lags.
  • Exponential moving average: Recency-weighted; responsive to shifts but noisy.
  • Volume-weighted: High-volume days count more; captures information intensity.
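
A sketch of all three, assuming a per-article DataFrame `df` with `timestamp` and `sentiment` columns; the 7-day window and alpha are illustrative:

import pandas as pd

# df is a placeholder: one row per article, with 'timestamp' and 'sentiment'.
daily = df.groupby(df["timestamp"].dt.date).agg(
    sentiment=("sentiment", "mean"), n_articles=("sentiment", "size"))

sma = daily["sentiment"].rolling(7).mean()      # equal-weighted; smooth but laggy
ema = daily["sentiment"].ewm(alpha=0.3).mean()  # recency-weighted; responsive
vw = ((daily["sentiment"] * daily["n_articles"]).rolling(7).sum()
      / daily["n_articles"].rolling(7).sum())   # weighted by article volume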

9. LLMs as Direct Forecasters Are Promising but Not Dominant

Large language models can generate probability estimates when prompted with structured analysis frameworks (base rate reasoning, reference classes, devil's advocate). Current evidence suggests they are competitive with prediction market prices on base-rate-rich questions but lag on questions requiring current, rapidly changing information.
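
One way to structure such a prompt -- illustrative wording only; the model call itself is omitted:

PROMPT = """Question: {question}

1. Reference class: what similar past events does this belong to?
2. Base rate: what fraction of those events resolved YES?
3. Adjustments: what current evidence shifts the estimate up or down?
4. Devil's advocate: what is the strongest case against your estimate?

Answer with a single probability between 0 and 1."""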

10. Real-Time NLP Requires Robust Engineering

A production NLP pipeline must handle RSS feeds, API rate limits, deduplication, error recovery, and alert generation. The analytics are secondary to the engineering. A system that processes 90% of articles reliably beats one that processes 100% of articles intermittently.
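
A minimal sketch of the polling loop, using feedparser; the feed URLs and the `handle_article` hook are placeholders:

import hashlib
import time

import feedparser

seen = set()

def poll(feed_urls, interval=60):
    while True:
        for url in feed_urls:
            try:
                for entry in feedparser.parse(url).entries:
                    key = hashlib.sha256(entry.get("title", "").encode()).hexdigest()
                    if key in seen:
                        continue  # deduplicate across feeds and polls
                    seen.add(key)
                    handle_article(entry)  # placeholder: score, store, alert
            except Exception:
                continue  # one bad feed must not stop the loop
        time.sleep(interval)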

Key Code Patterns

# VADER sentiment scoring
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores(text)  # returns dict with compound, pos, neg, neu

# HuggingFace transformer inference
from transformers import pipeline
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("The candidate surged in polls after the debate.")  # list of {label, score} dicts

# TF-IDF feature extraction
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(documents)  # sparse (n_docs x n_features) matrix

Key Formulas

| Formula | Purpose |
| --- | --- |
| $\text{TF-IDF}(t,d) = \text{TF}(t,d) \times \log(N / \text{DF}(t))$ | Feature weighting for text classification |
| $\text{Cosine Similarity} = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert}$ | Document similarity comparison |
| $\text{EMA}_t = \alpha \cdot s_t + (1-\alpha) \cdot \text{EMA}_{t-1}$ | Recency-weighted sentiment aggregation |
| $\text{Surprise} = 1 - \cos(\mathbf{v}_{\text{new}}, \bar{\mathbf{v}}_{\text{recent}})$ | News novelty measurement via TF-IDF distance |
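
A sketch of the surprise computation; `recent_texts` and `new_text` are placeholders:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# recent_texts: list of recent articles; new_text: the incoming article.
vec = TfidfVectorizer()
X = vec.fit_transform(recent_texts + [new_text])
centroid = np.asarray(X[:-1].mean(axis=0))           # mean TF-IDF vector of recent news
surprise = 1 - cosine_similarity(X[-1], centroid)[0, 0]  # the Surprise formula above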

Decision Framework

| Question | Recommendation |
| --- | --- |
| Need instant sentiment, no training data? | VADER (social media) or TextBlob (general) |
| Have 200+ labeled examples? | Fine-tune DistilBERT or RoBERTa |
| Need document classification, small data? | TF-IDF + logistic regression |
| Need contextual understanding? | Pre-trained transformer embeddings |
| Need domain-specific sentiment? | Build a custom lexicon or fine-tune |
| Real-time or batch? | Real-time: VADER or cached transformer; batch: full transformer |
| What aggregation for trading features? | EMA for responsiveness; volume-weighted for robustness |

The One-Sentence Summary

Extract sentiment from text with lexicon tools for speed and transformers for accuracy, then aggregate article-level scores into time-series features with proper temporal alignment before feeding them into your prediction market trading models.