Chapter 24 Key Takeaways
Core Principle
Text is the raw material of belief. Prediction markets move when traders read text, update beliefs, and trade. A system that can extract signal from text faster and more accurately than a human reader holds a structural edge.
The Big Ideas
1. Text Is a Leading Indicator in Prediction Markets
The causal chain is: event occurs, text is published, traders read, beliefs update, prices move. The latency between publication and price adjustment creates a tradeable window. This window is larger in prediction markets than in equities because analyst coverage is thinner and liquidity is lower.
2. Preprocessing Must Match the Downstream Model
Classical models (TF-IDF + logistic regression) benefit from aggressive preprocessing: lowercasing, stopword removal, and stemming or lemmatization. Transformer models should receive minimally processed text -- they were trained on full syntax, capitalization, and stopwords, and removing these degrades performance.
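As a concrete illustration, here is a minimal sketch of the two preprocessing paths, assuming NLTK's English stopword list and Porter stemmer (assumptions; any equivalent tools work):

```python
# Minimal sketch: aggressive preprocessing for classical models only.
# Assumes NLTK's stopword corpus is downloaded (nltk.download("stopwords")).
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess_for_tfidf(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in STOPWORDS)

def preprocess_for_transformer(text: str) -> str:
    """Transformers expect raw text; pass it through unchanged."""
    return text
```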
3. TF-IDF Remains a Strong Baseline for Structured Text Tasks
Term Frequency-Inverse Document Frequency converts text into a sparse numerical matrix where each dimension corresponds to a token or n-gram. When paired with logistic regression or SVM, TF-IDF achieves surprisingly competitive results on many classification tasks, especially with limited training data:
$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \log\!\Bigl(\frac{N}{\text{DF}(t)}\Bigr)$$
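A toy implementation of this formula makes the weighting concrete (note that scikit-learn's TfidfVectorizer, shown later, uses a smoothed, normalized variant, so its exact values differ):

```python
# Direct implementation of the formula above on a toy corpus.
import math

docs = [["rates", "rose", "today"],
        ["rates", "fell"],
        ["election", "polls", "moved"]]
N = len(docs)

def tf(term, doc):
    return doc.count(term) / len(doc)

def df(term):
    return sum(term in doc for doc in docs)

def tfidf(term, doc):
    return tf(term, doc) * math.log(N / df(term))

# "rates" appears in 2 of 3 docs, so its IDF is log(3/2) ~ 0.405;
# "election" appears in 1, so its IDF is log(3) ~ 1.099 -- rarer terms weigh more.
print(round(tfidf("rates", docs[0]), 3))      # 0.135
print(round(tfidf("election", docs[2]), 3))   # 0.366
```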
4. VADER and TextBlob Provide Instant Sentiment -- but with Limits
Lexicon-based sentiment tools require zero training data and run in microseconds. VADER handles social media conventions (capitalization, punctuation, emojis). TextBlob provides both polarity and subjectivity. Neither captures context well: their built-in negation heuristics catch simple cases like "not great," but sarcasm, hedged phrasing, and longer-range negation slip through, and neither understands domain-specific language.
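A quick sketch running one headline through both tools; the specific scores are illustrative, since general-purpose lexicons often have no entry for domain terms like "hawkish":

```python
# Same headline through both lexicon tools.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

headline = "The Fed's hawkish tone is not great news for this contract."

vader_scores = SentimentIntensityAnalyzer().polarity_scores(headline)
blob = TextBlob(headline)

print(vader_scores["compound"])        # single summary score in [-1, 1]
print(blob.sentiment.polarity)         # polarity in [-1, 1]
print(blob.sentiment.subjectivity)     # subjectivity in [0, 1]
```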
5. Transformers Understand Context -- That Is Their Superpower
BERT, RoBERTa, and their descendants process text as contextualized embeddings where the meaning of each word depends on its surroundings. "The Fed raised rates" and "Interest rates went up" produce similar embeddings despite minimal lexical overlap. This contextual understanding is critical for prediction market text, where subtle phrasing carries enormous weight.
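A sketch of that comparison, using the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint (both assumptions; any contextual encoder works):

```python
# Compare two differently-worded statements with a sentence encoder.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
a, b = model.encode(["The Fed raised rates.", "Interest rates went up."])

# Cosine similarity (see Key Formulas); contextual encoders score these
# paraphrases as far more similar than bag-of-words overlap would suggest.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cosine), 3))
```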
6. Fine-Tuning Is the Practical Frontier
Pre-trained transformer models are general. Fine-tuning on a few hundred labeled prediction market examples specializes them dramatically. Key fine-tuning decisions (a training sketch follows the table):
| Decision | Recommendation |
|---|---|
| Base model | DistilBERT for speed; RoBERTa for accuracy |
| Learning rate | 2e-5 is a safe starting point |
| Epochs | 3-5 (more risks overfitting small datasets) |
| Max sequence length | 128 for headlines; 512 for full articles |
| Minimum labeled data | ~200 examples for meaningful improvement |
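A minimal sketch wiring the table's settings into the HuggingFace Trainer; the two-example dataset here is a placeholder for your labeled corpus:

```python
# Fine-tuning DistilBERT with the table's hyperparameters.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

texts = ["Polls tightened after the debate.", "Turnout projections collapsed."]
labels = [1, 0]   # placeholders -- real use needs ~200+ labeled examples

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                 padding="max_length", max_length=128))

args = TrainingArguments(output_dir="ft-out", learning_rate=2e-5,
                         num_train_epochs=3, per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=ds).train()
```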
7. News Impact Is Measurable Through Event Studies
The event study methodology quantifies how specific news items move prediction market prices. Define a pre-event baseline, measure the post-event price change, and subtract the expected change to isolate the abnormal impact:
$$\text{Abnormal Change}_t = \Delta P_t - \mathbb{E}[\Delta P_t]$$
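A sketch of the computation, assuming a pandas Series of contract prices indexed by timestamp; the window lengths and the drift-based expectation are illustrative choices, not prescriptions:

```python
import pandas as pd

def abnormal_change(prices: pd.Series, event_time: pd.Timestamp,
                    baseline: str = "2h", horizon: str = "30min") -> float:
    """Post-event price change minus the drift expected from the baseline."""
    pre = prices.loc[event_time - pd.Timedelta(baseline):event_time]
    post = prices.loc[event_time:event_time + pd.Timedelta(horizon)]
    expected = pre.diff().mean() * (len(post) - 1)   # naive per-step drift
    actual = post.iloc[-1] - post.iloc[0]
    return actual - expected
```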
8. Sentiment Features Must Be Aggregated Carefully
Raw article-level sentiments must be aggregated into tradeable time-series features. Three methods, each with different properties (sketched in code after the list):
- Simple moving average: Equal weighting of recent articles; smooths noise but lags.
- Exponential moving average: Recency-weighted; responsive to shifts but noisy.
- Volume-weighted: High-volume days count more; captures information intensity.
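A pandas sketch of all three aggregations, assuming a daily DataFrame with "sentiment" and "n_articles" columns (the column names and window lengths are assumptions):

```python
import pandas as pd

def aggregate_sentiment(daily: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=daily.index)
    out["sma_5d"] = daily["sentiment"].rolling(5).mean()      # smooth but lagging
    out["ema"] = daily["sentiment"].ewm(alpha=0.3).mean()     # recency-weighted
    weighted = daily["sentiment"] * daily["n_articles"]
    out["vol_weighted_5d"] = (weighted.rolling(5).sum()
                              / daily["n_articles"].rolling(5).sum())  # intensity-aware
    return out
```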
9. LLMs as Direct Forecasters Are Promising but Not Dominant
Large language models can generate probability estimates when prompted with structured analysis frameworks (base rate reasoning, reference classes, devil's advocate). Current evidence suggests they are competitive with prediction market prices on base-rate-rich questions but lag on questions requiring current, rapidly changing information.
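A sketch of a structured prompt assembling those frameworks; the wording is illustrative, and sending it to an LLM API and parsing the returned probability is left to your client of choice:

```python
def build_forecast_prompt(question: str, context: str) -> str:
    """Assemble a structured forecasting prompt (illustrative template)."""
    return (
        f"Question: {question}\n"
        f"Context: {context}\n\n"
        "1. Identify a reference class and state its base rate.\n"
        "2. List evidence that adjusts the base rate up or down.\n"
        "3. Devil's advocate: argue for the opposite conclusion.\n"
        "4. Final answer: a single probability between 0 and 1."
    )
```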
10. Real-Time NLP Requires Robust Engineering
A production NLP pipeline must handle RSS feeds, API rate limits, deduplication, error recovery, and alert generation. The analytics are secondary to the engineering. A system that processes 90% of articles reliably beats one that processes 100% of articles intermittently.
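A deliberately small sketch of that engineering shape: poll, deduplicate, and recover from errors without crashing. feedparser is a real package; the feed URL and downstream scoring hook are placeholders:

```python
import hashlib
import time

import feedparser

FEEDS = ["https://example.com/rss"]   # placeholder feed URL
seen: set[str] = set()

def poll_once(score_article) -> None:
    for url in FEEDS:
        try:
            for entry in feedparser.parse(url).entries:
                title = entry.get("title", "")
                key = hashlib.sha256(title.encode()).hexdigest()
                if key in seen:            # deduplication
                    continue
                seen.add(key)
                score_article(title)       # downstream NLP hook
        except Exception as exc:
            print(f"feed error, continuing: {exc}")  # recover, don't crash

while True:
    poll_once(print)
    time.sleep(60)   # crude politeness delay between polls
```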
Key Code Patterns
```python
# VADER sentiment scoring
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
text = "Markets rallied on the surprise announcement."
scores = analyzer.polarity_scores(text)  # dict with compound, pos, neg, neu
```

```python
# HuggingFace transformer inference
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("The candidate surged in polls after the debate.")
```

```python
# TF-IDF feature extraction
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["Rates rose today.", "Polls moved after the debate."]
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(documents)  # sparse document-term matrix
```
Key Formulas
| Formula | Purpose |
|---|---|
| $\text{TF-IDF}(t,d) = \text{TF}(t,d) \times \log(N / \text{DF}(t))$ | Feature weighting for text classification |
| $\text{Cosine Similarity} = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert}$ | Document similarity comparison |
| $\text{EMA}_t = \alpha \cdot s_t + (1-\alpha) \cdot \text{EMA}_{t-1}$ | Recency-weighted sentiment aggregation |
| $\text{Surprise} = 1 - \cos(\mathbf{v}_{\text{new}}, \bar{\mathbf{v}}_{\text{recent}})$ | News novelty measurement via TF-IDF distance |
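The surprise formula in the last row translates directly into a few lines of scikit-learn; the toy articles here are placeholders:

```python
# News novelty: 1 minus cosine similarity between a new article's TF-IDF
# vector and the centroid of recent articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

recent = ["Polls steady ahead of the vote.", "Candidates trade barbs in debate."]
new_article = "Shock ruling disqualifies the front-runner."

vec = TfidfVectorizer().fit(recent + [new_article])
recent_centroid = np.asarray(vec.transform(recent).mean(axis=0))
new_vec = vec.transform([new_article]).toarray()

surprise = 1 - cosine_similarity(new_vec, recent_centroid)[0, 0]
print(round(surprise, 3))   # closer to 1 = more novel
```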
Decision Framework
| Question | Recommendation |
|---|---|
| Need instant sentiment, no training data? | VADER (social media) or TextBlob (general) |
| Have 200+ labeled examples? | Fine-tune DistilBERT or RoBERTa |
| Need document classification, small data? | TF-IDF + logistic regression |
| Need contextual understanding? | Pre-trained transformer embeddings |
| Need domain-specific sentiment? | Build custom lexicon or fine-tune |
| Real-time or batch? | Real-time: VADER/cached transformer; Batch: full transformer |
| What aggregation for trading features? | EMA for responsiveness; volume-weighted for robustness |
The One-Sentence Summary
Extract sentiment from text using lexicon tools for speed, transformers for accuracy, and always aggregate into time-series features with proper temporal alignment before feeding into your prediction market trading models.