Key Takeaways: Text on Screen

Core Principle

Half your audience is watching with sound off. Text on screen transforms invisible audio content into visible, dual-coded information that boosts retention by 15-25%, serves accessibility, and captures viewers who would otherwise scroll past without ever hearing your message.


Why Text Overlays Work

The silent scroll problem: 50-85% of initial video views happen with sound off.

Dual coding theory (Paivio, 1971), as applied to video: presenting the same information through two channels (visual text + auditory speech) creates:
1. Two memory traces (more durable than one)
2. Improved comprehension (text and audio serve as mutual backups)
3. Reduced processing effort (frees cognitive resources for engagement)

Retention improvement by content type:

Content type | Retention without text | Retention with text | Relative improvement
Educational | 42% | 56% | +33%
Commentary | 38% | 48% | +26%
Process/how-to | 44% | 57% | +30%
Storytelling | 51% | 58% | +14%
Comedy | 55% | 60% | +9%
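The improvement column is a relative change, not a percentage-point gain. Educational content goes from 42% to 56%, a 14-point rise, which is (56 - 42) ÷ 42 ≈ 33% relative to the baseline. The minimal Python sketch below recomputes the column from the two retention columns. The function names are illustrative and do not come from any library.

```python
# Retention figures from the table above: (without text, with text), in percent.
RETENTION = {
    "Educational": (42, 56),
    "Commentary": (38, 48),
    "Process/how-to": (44, 57),
    "Storytelling": (51, 58),
    "Comedy": (55, 60),
}


def relative_improvement(without: float, with_text: float) -> float:
    """Relative change in percent: (with - without) / without * 100."""
    return (with_text - without) / without * 100


def point_gain(without: float, with_text: float) -> float:
    """Absolute change in percentage points."""
    return with_text - without


if __name__ == "__main__":
    for content, (before, after) in RETENTION.items():
        print(f"{content:<16} +{point_gain(before, after):.0f} pts  "
              f"+{relative_improvement(before, after):.0f}%")
```

Rounded to whole numbers, the relative figures reproduce the table (33, 26, 30, 14, 9). The point gains (14, 10, 13, 7, 5) are smaller.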

Five Rules of Video Typography

Rule | Key points
1. Readability first | Text at least 5% of frame height; keep text on screen for at least (word count ÷ 3) seconds; 5-8 words per line max
2. Contrast always | White text + black outline is safest. Alternatives: shadow, background bar, dedicated text zone
3. Font = tone | Sans-serif (modern), bold (urgent), serif (authority), handwritten (personal)
4. Placement respects frame | Top = hooks/headlines; center = emphasis (rare); bottom = captions/CTAs
5. Consistency = brand | Same fonts, colors, placement, and animation across all videos
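Rule 1 comes down to simple arithmetic. At about three words per second, a 10-word hook needs roughly 3.3 seconds on screen. On a 1080x1920 vertical frame, 5% of the frame height is 96 pixels. The minimal Python sketch below uses hypothetical helper names. It makes two simplifying assumptions: 8 words is the line cap, and lines are split by word count rather than by rendered width.

```python
import math

WORDS_PER_SECOND = 3          # reading pace implied by "display time = words / 3"
MIN_TEXT_HEIGHT_PERCENT = 5   # text at least 5% of frame height
MAX_WORDS_PER_LINE = 8        # upper end of the 5-8 words/line guideline


def min_display_seconds(text: str) -> float:
    """Minimum time an overlay should stay on screen, in seconds."""
    return len(text.split()) / WORDS_PER_SECOND


def min_text_height_px(frame_height_px: int) -> int:
    """Smallest text height in pixels that satisfies the 5% rule (rounded up)."""
    return math.ceil(frame_height_px * MIN_TEXT_HEIGHT_PERCENT / 100)


def wrap_words(text: str, max_words: int = MAX_WORDS_PER_LINE) -> list[str]:
    """Break text into lines of at most max_words words (ignores character width)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


if __name__ == "__main__":
    hook = "I tried every viral recipe so you don't have to."
    print(wrap_words(hook))
    print(f"{min_display_seconds(hook):.1f} s on screen")   # 10 words -> about 3.3 s
    print(f"{min_text_height_px(1920)} px tall on a 1080x1920 frame")
```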

The two-font rule: Maximum two fonts per video (one headline, one body).


Captioning

Who uses captions: about 80% of caption users are NOT deaf or hard of hearing. They are sound-off viewers, commuters, students, and international audiences.

Caption quality standards:

Element | Good | Bad
Accuracy | Word-for-word or close | Garbled auto-generated errors
Timing | Synchronized (±0.5 sec) | Delayed or early
Readability | 1-2 lines, clear font | Paragraphs, tiny font
Placement | Consistent, avoids UI | Jumping, hidden by buttons
Completeness | All speech captured | Missing segments
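The timing and readability rows can be checked mechanically if you know when each line of speech starts, for example from a transcript with word timestamps. How you obtain those times is outside this sketch. The minimal Python sketch below uses a hypothetical `Cue` type and `cue_problems` function. It does not check accuracy, placement, or completeness.

```python
from dataclasses import dataclass

MAX_OFFSET_S = 0.5   # "Synchronized (±0.5 sec)" from the table above
MAX_LINES = 2        # "1-2 lines"


@dataclass
class Cue:
    start_s: float   # when the caption appears, in seconds
    text: str        # caption text, lines separated by "\n"


def cue_problems(cue: Cue, speech_start_s: float) -> list[str]:
    """Return the timing/readability problems for one caption cue (empty if none)."""
    problems = []
    offset = cue.start_s - speech_start_s
    if offset > MAX_OFFSET_S:
        problems.append(f"delayed by {offset:.2f}s")
    elif offset < -MAX_OFFSET_S:
        problems.append(f"early by {-offset:.2f}s")
    if len(cue.text.splitlines()) > MAX_LINES:
        problems.append("more than 2 lines")
    if not cue.text.strip():
        problems.append("empty caption")
    return problems


if __name__ == "__main__":
    late = Cue(start_s=12.9, text="I can't believe\nthis worked")
    print(cue_problems(late, speech_start_s=12.2))   # ['delayed by 0.70s']
```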

Best practice: auto-generate captions, then correct them manually. Auto-captions average ~85% accuracy; manual correction brings that to ~99%.
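Caption accuracy is commonly reported at the word level as 1 minus the word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the caption into the true transcript, divided by the number of words in the transcript. The minimal Python sketch below assumes the ~85% and ~99% figures are word-level accuracy and computes word-level edit distance. Lowercasing and whitespace tokenization (punctuation is not stripped) are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion (word missing from caption)
                          curr[j - 1] + 1,     # insertion (extra word in caption)
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at 0 (WER can exceed 1 with many insertions)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    truth = "everything you know about productivity is wrong"
    auto = "everything you no about productivity is wrong"
    print(f"{word_accuracy(truth, auto):.0%}")
```

A single misheard word in a 7-word line already drops accuracy to about 86%. This shows why uncorrected auto-captions read as error-prone.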


The Subtitle Style

Definition: Text IS the narration — no voiceover. Visual content + text overlays.

Works well for | Doesn't work for
Cooking/process | Complex analysis
ASMR/aesthetic | Emotional vulnerability
Get ready with me | Rapid dialogue/reaction
Day in my life | Information requiring vocal nuance

Advantages: Sound-off optimized, preserves process sounds, personality through writing, reduced production barrier.


Text Animation Hierarchy

Tier | Animation | Use for | Example
Tier 1: Headline | Pop/bounce/scale | Punchlines, key facts | "I can't believe this worked"
Tier 2: Body | Fade in / soft slide | Narration, context | "so I tried something different"
Tier 3: Label | Static (none) | Names, timestamps | "Day 3"

Principle: If everything is emphasized, nothing is emphasized.
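The hierarchy is essentially a lookup from an overlay's role to its tier and animation, plus a guard against over-emphasis. In the minimal Python sketch below, the role names and the 25% cap on headline-tier overlays are hypothetical, since the table gives no numeric limit.

```python
from collections import Counter

# Role -> (tier, animation), following the hierarchy table above.
ANIMATION_FOR_ROLE = {
    "headline": (1, "pop/bounce/scale"),
    "body": (2, "fade in / soft slide"),
    "label": (3, "static"),
}

# Hypothetical cap: the principle "if everything is emphasized, nothing is"
# comes with no number in the source.
MAX_HEADLINE_SHARE = 0.25


def assign_animations(overlays: list[tuple[str, str]]) -> list[tuple[str, int, str]]:
    """Map (text, role) pairs to (text, tier, animation) triples."""
    return [(text, *ANIMATION_FOR_ROLE[role]) for text, role in overlays]


def too_much_emphasis(overlays: list[tuple[str, str]]) -> bool:
    """True if headline-tier overlays exceed MAX_HEADLINE_SHARE of all overlays."""
    if not overlays:
        return False
    counts = Counter(role for _, role in overlays)
    return counts["headline"] / len(overlays) > MAX_HEADLINE_SHARE
```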


Five Text Hook Formats

Hook type | Mechanism | Example
Question | Curiosity gap (Ch. 5) | "Why does your brain want you to fail?"
Statement | Schema violation (Ch. 6) | "Everything you know about productivity is wrong."
Preview | Value proposition (Ch. 16) | "I tried every viral recipe so you don't have to."
Dialogue | Relatability (Ch. 14) | "her: 'try working out' / me: [lying on couch]"
List | Concrete promise | "3 things I wish I knew at 15"

The best text hooks work on two levels: with sound off, the text delivers the hook; with sound on, text and voice combine into a dual-coded hook.


Text vs. Voice (from Case Study 2)

Metric or context | Text wins | Voice wins
Save rate | ✓ (reference-friendly) |
Comment rate | | ✓ (parasocial response)
Simple content | ✓ (efficient) |
Complex content | | ✓ (pacing and emphasis help)
Emotional content | | ✓ (authenticity through voice)
Sound-off viewers | ✓ |
Sound-on viewers | |
Combined | Best overall for most content types

Format by content type:

Content | Primary format
Simple recipe/process | Text-primary
Complex tutorial | Combined (equal)
Emotional/personal | Voice-primary
Reference/informational | Text-primary
Personality/entertainment | Voice-primary
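The format table works as a lookup with a fallback. The minimal Python sketch below makes two assumptions that are not rules from the source. It normalizes the content-type string, and it defaults unlisted types to the combined format, because the text-vs-voice comparison rates combined text and voice best overall for most content types.

```python
# Primary format per content type, from the table above.
PRIMARY_FORMAT = {
    "simple recipe/process": "Text-primary",
    "complex tutorial": "Combined (equal)",
    "emotional/personal": "Voice-primary",
    "reference/informational": "Text-primary",
    "personality/entertainment": "Voice-primary",
}


def choose_format(content_type: str) -> str:
    """Look up the primary format; unlisted types fall back to combined text + voice."""
    return PRIMARY_FORMAT.get(content_type.strip().lower(), "Combined (equal)")
```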

Quick Text Overlay Checklist

Before publishing:
- [ ] Can content be understood with sound off?
- [ ] Is text large enough to read on a phone?
- [ ] Is contrast high enough for any background?
- [ ] Are captions accurate and synchronized?
- [ ] Does text placement avoid platform UI elements?
- [ ] Is there an animation hierarchy (not everything equal)?
- [ ] Does the opening frame include a text hook?
- [ ] Is font/color/placement consistent with brand?


One-Sentence Chapter Summary

Add text to every video because half your audience watches muted, use high-contrast readable typography with consistent branding, caption for accessibility and engagement, choose text-primary or voice-primary based on content complexity and emotional intent, and design text hooks that stop the scroll even when your voice can't be heard.