Key Takeaways: Text on Screen
Core Principle
Half your audience is watching with sound off. Text on screen transforms invisible audio content into visible, dual-coded information that boosts retention by 15-25%, serves accessibility, and captures viewers who would otherwise scroll past without ever hearing your message.
Why Text Overlays Work
The silent scroll problem: 50-85% of initial video views happen with sound off.
Dual coding theory (Paivio, 1971): Same information through two channels (visual text + auditory speech) creates: 1. Two memory traces (more durable than one) 2. Improved comprehension (text and audio serve as mutual backups) 3. Reduced processing effort (frees cognitive resources for engagement)
Retention improvement by content type:
| Content Type | Without Text | With Text | Improvement |
|---|---|---|---|
| Educational | 42% | 56% | +33% |
| Commentary | 38% | 48% | +26% |
| Process/how-to | 44% | 57% | +30% |
| Storytelling | 51% | 58% | +14% |
| Comedy | 55% | 60% | +9% |
Five Rules of Video Typography
| Rule | Key Points |
|---|---|
| 1. Readability first | Min 5% frame height; display time = words ÷ 3 seconds; 5-8 words/line max |
| 2. Contrast always | White text + black outline = safest. Or: shadow, background bar, dedicated zone |
| 3. Font = tone | Sans-serif (modern), Bold (urgent), Serif (authority), Handwritten (personal) |
| 4. Placement respects frame | Top = hooks/headlines; Center = emphasis (rare); Bottom = captions/CTAs |
| 5. Consistency = brand | Same fonts, colors, placement, animation across all videos |
The two-font rule: Maximum two fonts per video (one headline, one body).
Captioning
Who uses captions: 80% are NOT deaf or hard of hearing — they're sound-off viewers, commuters, students, international audiences.
Caption quality standards:
| Element | Good | Bad |
|---|---|---|
| Accuracy | Word-for-word or close | Garbled auto-generated errors |
| Timing | Synchronized (±0.5 sec) | Delayed or early |
| Readability | 1-2 lines, clear font | Paragraphs, tiny font |
| Placement | Consistent, avoids UI | Jumping, hidden by buttons |
| Completeness | All speech captured | Missing segments |
Auto-generated → Manual correction is best practice. Auto-captions average ~85% accuracy; manual correction brings it to ~99%.
The Subtitle Style
Definition: Text IS the narration — no voiceover. Visual content + text overlays.
| Works Well For | Doesn't Work For |
|---|---|
| Cooking/process | Complex analysis |
| ASMR/aesthetic | Emotional vulnerability |
| Get ready with me | Rapid dialogue/reaction |
| Day in my life | Information requiring vocal nuance |
Advantages: Sound-off optimized, preserves process sounds, personality through writing, reduced production barrier.
Text Animation Hierarchy
| Tier | Animation | Use For | Example |
|---|---|---|---|
| Tier 1: Headline | Pop/bounce/scale | Punchlines, key facts | "I can't believe this worked" |
| Tier 2: Body | Fade in / soft slide | Narration, context | "so I tried something different" |
| Tier 3: Label | Static (none) | Names, timestamps | "Day 3" |
Principle: If everything is emphasized, nothing is emphasized.
Five Text Hook Formats
| Hook Type | Mechanism | Example |
|---|---|---|
| Question | Curiosity gap (Ch. 5) | "Why does your brain want you to fail?" |
| Statement | Schema violation (Ch. 6) | "Everything you know about productivity is wrong." |
| Preview | Value proposition (Ch. 16) | "I tried every viral recipe so you don't have to." |
| Dialogue | Relatability (Ch. 14) | "her: 'try working out' / me: [lying on couch]" |
| List | Concrete promise | "3 things I wish I knew at 15" |
Best text hooks work on two levels: Sound off = text delivers the hook. Sound on = text + voice create dual-coded hook.
Text vs. Voice (from Case Study 2)
| Metric | Text Wins | Voice Wins |
|---|---|---|
| Save rate | ✓ (reference-friendly) | |
| Comment rate | ✓ (parasocial response) | |
| Simple content | ✓ (efficient) | |
| Complex content | ✓ (pacing, emphasis help) | |
| Emotional content | ✓ (authenticity through voice) | |
| Sound-off viewers | ✓ | |
| Sound-on viewers | ✓ | |
| Combined | Best overall for most content types |
Format by content type:
| Content | Primary Format |
|---|---|
| Simple recipe/process | Text-primary |
| Complex tutorial | Combined (equal) |
| Emotional/personal | Voice-primary |
| Reference/informational | Text-primary |
| Personality/entertainment | Voice-primary |
Quick Text Overlay Checklist
Before publishing: - [ ] Can content be understood with sound off? - [ ] Is text large enough to read on a phone? - [ ] Is contrast high enough for any background? - [ ] Are captions accurate and synchronized? - [ ] Does text placement avoid platform UI elements? - [ ] Is there an animation hierarchy (not everything equal)? - [ ] Does the opening frame include a text hook? - [ ] Is font/color/placement consistent with brand?
One-Sentence Chapter Summary
Add text to every video because half your audience watches muted, use high-contrast readable typography with consistent branding, caption for accessibility and engagement, choose text-primary or voice-primary based on content complexity and emotional intent, and design text hooks that stop the scroll even when your voice can't be heard.