Chapter 9 Key Takeaways: The Voice as Instrument


Core Concepts

1. The Source-Filter Model

The human voice operates as a two-stage acoustic system:

  • Source: The glottis produces a harmonic-rich buzz whose spectrum rolls off at ~−12 dB/octave
  • Filter: The vocal tract selects which harmonics are amplified through formant resonances
  • Output = Source × Filter (multiplicative in amplitude; additive in dB)
  • This model explains why vowels are identifiable across speakers of different sizes and pitches
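The "multiplicative in amplitude, additive in dB" relationship can be sketched numerically. The following is a minimal illustration, not a voice model: the bell-shaped `formant_gain_db` curve and the example formant (700 Hz center, 100 Hz bandwidth, 20 dB peak) are hypothetical values chosen only to make the arithmetic visible.

```python
import math

def source_level_db(harmonic_number, rolloff_db_per_octave=-12.0):
    """Level of the n-th harmonic relative to the fundamental, in dB."""
    octaves_above_fundamental = math.log2(harmonic_number)
    return rolloff_db_per_octave * octaves_above_fundamental

def formant_gain_db(freq_hz, center_hz, bandwidth_hz, peak_db):
    """Hypothetical bell-shaped resonance (assumption, not from the chapter)."""
    x = (freq_hz - center_hz) / (bandwidth_hz / 2)
    return peak_db / (1.0 + x * x)

def db_to_amplitude(level_db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10 ** (level_db / 20)

def output_spectrum_db(f0_hz, n_harmonics, formants):
    """Output level of each harmonic: source dB plus filter dB."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        freq = n * f0_hz
        filt = sum(formant_gain_db(freq, c, bw, pk) for c, bw, pk in formants)
        spectrum.append((freq, source_level_db(n) + filt))
    return spectrum

# Illustrative run: 100 Hz fundamental, one made-up formant at 700 Hz.
for freq, level in output_spectrum_db(100.0, 10, [(700.0, 100.0, 20.0)]):
    print(f"{freq:6.0f} Hz  {level:7.2f} dB")
```

In this sketch the harmonic nearest the formant center stands out above its neighbors even though the source spectrum falls steadily, which is the sense in which the filter "selects" harmonics.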

2. Formants Define Vowels

  • F1 correlates with tongue height (low tongue → high F1)
  • F2 correlates with tongue frontness (front tongue → high F2)
  • The F1-F2 vowel space maps onto tongue position — and this relationship is universal across human languages
  • Three-vowel systems universally select /a/, /i/, /u/ — the three corners of maximum acoustic distance

3. The Mucosal Wave

  • Vocal fold vibration is not a simple string vibration — it is a rolling surface wave that propagates across the fold cover
  • The mucosal wave is the source of vocal fold vibratory efficiency
  • Disruption of the mucosal wave (by nodules, edema, etc.) is the primary mechanism of vocal pathology and hoarseness

4. The Singer's Formant

  • Operatic singers cluster F3, F4, and F5 in the 2800–3200 Hz range
  • This provides a 15–20 dB enhancement in the orchestra's natural spectral gap
  • Mechanism: lowered larynx + narrowed epilaryngeal tube + widened pharynx
  • Result: the voice "cuts through" the orchestra by exploiting a frequency band where the orchestra doesn't compete

5. Registers and the Passaggio

  • Chest voice: Full fold mass vibration, complete closure, harmonically rich
  • Falsetto: Stretched, thin folds, edge-only contact, purer/breathier sound
  • Mixed voice: Intermediate co-activation of cricothyroid and vocalis muscles
  • The passaggio (transition zone) is the primary technical challenge of classical vocal training

6. Overtone Singing

  • Khoomei/throat singing exploits the source-filter model by creating an extremely narrow-bandwidth formant
  • A single selected harmonic of the drone becomes perceptually audible as a distinct melodic pitch
  • The "scale" of overtone singing is the harmonic series — determined by physics, not cultural convention
  • Traditions in Tuva, Mongolia, and Tibet independently developed this technique
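To see why the overtone "scale" is fixed by physics, one can list the harmonics of a drone and compare each with the nearest equal-tempered semitone. This is a small sketch; the 110 Hz drone and the choice of harmonics 6 through 12 are illustrative assumptions, not values given in the chapter.

```python
import math

def harmonic_series(drone_hz, first=6, last=12):
    """Return (harmonic number, frequency) pairs for a drone."""
    return [(n, n * drone_hz) for n in range(first, last + 1)]

def cents_from_equal_temperament(freq_hz, reference_hz):
    """Offset in cents from the nearest 12-tone equal-tempered pitch,
    measured relative to reference_hz."""
    cents = 1200 * math.log2(freq_hz / reference_hz)
    nearest_semitone = 100 * round(cents / 100)
    return cents - nearest_semitone

drone = 110.0  # hypothetical drone pitch
for n, freq in harmonic_series(drone):
    offset = cents_from_equal_temperament(freq, drone)
    print(f"harmonic {n:2d}: {freq:7.1f} Hz, {offset:+6.1f} cents from equal temperament")
```

Octave harmonics such as the 8th land exactly on a tempered pitch, while the 7th and 11th fall noticeably flat of the nearest semitone, so the available melody notes follow the harmonic series rather than a keyboard tuning.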

7. Vibrato

  • Classical vibrato: rate 5–7 Hz, depth ±50 cents
  • Listeners perceive the average of the modulated frequency (not peaks or troughs)
  • Too slow (<3 Hz): heard as wobble; too fast (>8 Hz): heard as tremor; 5–7 Hz: perceived as single pitch with added richness
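The vibrato figures above can be turned into frequencies with a short calculation. The sketch below assumes a sinusoidal modulation in cents around a 440 Hz center (both assumptions made for illustration) and checks that the arithmetic mean of the swept frequency stays very close to the center pitch.

```python
import math

def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval in cents."""
    return 2 ** (cents / 1200)

def vibrato_frequency(t, center_hz, rate_hz=6.0, depth_cents=50.0):
    """Instantaneous frequency at time t, assuming sinusoidal modulation."""
    offset_cents = depth_cents * math.sin(2 * math.pi * rate_hz * t)
    return center_hz * cents_to_ratio(offset_cents)

def mean_frequency(center_hz, rate_hz=6.0, depth_cents=50.0, samples=1000):
    """Arithmetic mean of the frequency over one vibrato cycle."""
    period = 1.0 / rate_hz
    total = sum(
        vibrato_frequency(i * period / samples, center_hz, rate_hz, depth_cents)
        for i in range(samples)
    )
    return total / samples

center = 440.0  # hypothetical sung pitch
print("peak:  ", round(vibrato_frequency(1 / 24, center), 2), "Hz")
print("trough:", round(vibrato_frequency(3 / 24, center), 2), "Hz")
print("mean:  ", round(mean_frequency(center), 2), "Hz")
```

With ±50 cents of depth the frequency swings over roughly 25 Hz at this pitch, yet the cycle average sits within a fraction of a hertz of 440 Hz.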

8. The Descended Larynx

  • The human larynx sits lower in the throat than that of any other primate
  • This creates a long pharyngeal cavity that enables the full F1-F2 vowel space
  • Trade-off: The crossed food/air pathway creates choking risk unique to humans
  • The descended larynx is both the anatomical foundation of speech and of music

9. Choral Acoustics

  • Multiple singers produce incoherent addition: amplitude grows as √N (not N)
  • 60 singers → ~18 dB increase over one singer (not the ~35.6 dB that coherent, in-phase addition would give)
  • Blend emerges from: spectral smoothing + vibrato averaging + distributed formant frequencies
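The choir arithmetic follows directly from the √N rule: if amplitude grows as √N, the level gain is 20 log₁₀(√N) = 10 log₁₀(N). A minimal check, with the coherent case included only for comparison:

```python
import math

def incoherent_gain_db(n_sources):
    """Level gain when N equal sources add with random phase (amplitude ~ sqrt(N))."""
    return 10 * math.log10(n_sources)

def coherent_gain_db(n_sources):
    """Level gain if N equal sources added perfectly in phase (amplitude ~ N)."""
    return 20 * math.log10(n_sources)

for n in (2, 10, 60):
    print(f"{n:3d} singers: incoherent {incoherent_gain_db(n):5.2f} dB, "
          f"coherent {coherent_gain_db(n):5.2f} dB")
```

Doubling the choir adds only about 3 dB under incoherent addition, which is why large choirs are fuller rather than dramatically louder.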

10. Voice and Language

  • Cross-linguistic phoneme inventories reflect acoustic optimization within vocal tract physics
  • Languages select maximally distinct sounds — vowels separated in F1-F2 space
  • Tone languages use fundamental frequency (F0) for lexical meaning, adding a layer of acoustic information

Key Equations and Values

Concept                                  Value/Formula
Source roll-off                          ~−12 dB/octave
Singer's formant range                   2800–3200 Hz
Singer's formant enhancement             15–20 dB
Classical vibrato rate                   5–7 Hz
Classical vibrato depth                  ±50 cents
Vocal tract length (adult)               ~17 cm
Fundamental resonance of 17 cm tube      ~500 Hz
Incoherent loudness for N sources        ΔdB = 10 log₁₀(N)
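The ~500 Hz figure for a 17 cm tube is consistent with the standard quarter-wavelength model of a tube closed at one end (the glottis) and open at the other (the lips), where resonances fall at odd multiples of c/4L. The sketch below assumes that model and a speed of sound of 343 m/s (room-temperature air); neither is stated explicitly in the table.

```python
def tube_resonances(length_m, speed_of_sound=343.0, count=3):
    """Resonances of a uniform tube closed at one end: f_k = (2k - 1) * c / (4L)."""
    return [(2 * k - 1) * speed_of_sound / (4 * length_m)
            for k in range(1, count + 1)]

# 17 cm adult vocal tract, treated as a uniform tube (a simplification).
for k, freq in enumerate(tube_resonances(0.17), start=1):
    print(f"resonance {k}: {freq:7.1f} Hz")
```

Under these assumptions the first three resonances come out near 500, 1500, and 2500 Hz; a real vocal tract is not a uniform tube, so actual formants shift away from these values as the tongue and lips move.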

Big Picture Connections

  • The voice exemplifies the Reductionism vs. Emergence theme: it reduces to a simple source-filter model, but the emergent acoustic behavior of trained voices is extraordinarily complex
  • The comparison between the singer's formant and a particle accelerator cavity illustrates universal structures: resonance physics is the same whether the cavity is a soprano's throat or a superconducting niobium shell
  • Overtone singing demonstrates how constraint generates creativity: the harmonic series is a rigid physical constraint that has enabled an entire musical genre
  • The cross-cultural comparison of vocal traditions (opera, khoomei, Indian classical, Tibetan chant) demonstrates universal vs. cultural: the same physical system (the human voice) is deployed in radically different ways by different cultures for different aesthetic ends

Bridge to Chapter 10

The source-filter model that governs the voice has a direct electronic analog:

  • Source → Oscillator (VCO): generates periodic signals with harmonic content
  • Filter → Filter (VCF): shapes the spectrum by boosting or attenuating frequency regions
  • Amplitude control → Amplifier (VCA): controls overall volume over time
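The oscillator, filter, and amplifier chain can be sketched in the frequency domain the same way as the voice. This is an illustrative toy, not a description of any particular synthesizer: it assumes an ideal sawtooth oscillator (harmonic n at amplitude 1/n), a textbook second-order resonant low-pass response, and a constant amplifier gain in place of a time-varying envelope.

```python
import math

def sawtooth_harmonics(f0_hz, n_harmonics):
    """VCO stand-in: ideal sawtooth, harmonic n has amplitude 1/n."""
    return [(n * f0_hz, 1.0 / n) for n in range(1, n_harmonics + 1)]

def resonant_lowpass_gain(freq_hz, cutoff_hz, q):
    """VCF stand-in: magnitude of a second-order resonant low-pass filter."""
    r = freq_hz / cutoff_hz
    return 1.0 / math.sqrt((1 - r * r) ** 2 + (r / q) ** 2)

def synth_spectrum(f0_hz, n_harmonics, cutoff_hz, q, vca_gain):
    """Source amplitude times filter gain times amplifier gain, per harmonic."""
    return [
        (freq, amp * resonant_lowpass_gain(freq, cutoff_hz, q) * vca_gain)
        for freq, amp in sawtooth_harmonics(f0_hz, n_harmonics)
    ]

# Hypothetical patch: 100 Hz sawtooth, filter resonance at 1 kHz with Q = 5.
for freq, amp in synth_spectrum(100.0, 12, 1000.0, 5.0, 0.5):
    print(f"{freq:6.0f} Hz  amplitude {amp:.3f}")
```

The resonance peak near the cutoff plays the same role as a formant: it boosts whichever harmonics of the source fall close to it.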

Synthesizers are, at one level, electronic implementations of the source-filter model. Chapter 10 explores this connection and discovers, in Aiko Tanaka's synthesizer patch, that the resonant filter used to sculpt electronic sound is governed by exactly the same differential equation as the quantum harmonic oscillator.