Case Study 9.2: Tuvan Throat Singing — One Throat, Two Notes

The Impossibility That Isn't

The first time most Western listeners encounter Tuvan throat singing, the experience is one of productive disbelief. A single human figure — one set of lungs, one larynx, one vocal tract — produces two simultaneous, clearly distinct pitches: a deep, steady drone below, and above it, a pure, flute-like melody that seems to float in the air entirely independently of the person producing it. The melody rises and falls, moving through what sounds like a pentatonic or diatonic scale, while the drone holds steady. Surely, the uninitiated listener thinks, there must be two performers, or some electronic manipulation, or some hidden instrument.

There isn't. The physics that makes this possible is not exotic or mysterious — it is a precise and deliberate application of the source-filter model, pushed to an extreme that Western vocal tradition has never systematically explored. One throat, trained with remarkable precision, separates its own acoustic signal into its component frequencies and makes one of those frequencies sing independently.

Cultural and Geographic Context

Tuva is a republic of the Russian Federation situated in the geographic center of Asia, bordering Mongolia to the south. The Tuvan people are traditionally nomadic herders of the Central Asian steppe — a landscape of vast grasslands, mountains, rivers, and sky. The cultural tradition of khoomei (throat singing) is intimately linked to this landscape; singers describe the practice as a form of communication with nature, an imitation of wind, water, mountains, and animals. The word khoomei means, roughly, "pharynx" or "throat" in the Tuvan language.

Khoomei is not one technique but a family of related styles, each with a distinct aesthetic character and a somewhat different acoustic approach:

Khoomei (narrow sense): A moderate, medium-pitched style, melodic and flowing, considered the foundational and most widely practiced style. The overtone is gentle and the drone is mid-range.
Sygyt: The most dramatic and technically demanding style — a pure, penetrating whistling or flute-like overtone melody in the upper register, above a clear drone. The Tuvan word means "whistle." This is the style most frequently heard by international audiences and documented in ethnomusicological recordings.
Kargyraa: A very deep, growling, powerful style in which the fundamental is pushed an octave below normal bass range, sometimes into subharmonic territory, with a dark, buzzing drone from which a rough overtone emerges. Of the styles listed here, kargyraa most strongly evokes a "deep earth" quality.
Borbangnadyr: A rolling, bubbling style imitating flowing water, characterized by rapid formant modulation that produces rapidly trilling overtones.
Ezengileer: A rhythmically complex style associated with horseback riding, with overtones rhythmically accented to mimic the motion of a horse.

The ensemble Huun-Huur-Tu — formed in 1992 and consisting of four musicians from Tuva — brought this tradition to international attention, performing at major concert halls worldwide and making recordings that introduced khoomei to a global audience. Their instrumentation typically combines throat singing with the igil (a horsehair fiddle), doshpuluur (a lute), and percussion instruments, creating a rich blend that demonstrates the integration of khoomei into a full musical system rather than presenting it as a curiosity.

The Physics of Two Notes from One Throat

The Source: A Low, Stable Drone

The foundation of khoomei is the drone — a sustained, stable fundamental pitch. For sygyt (the high-overtone style), the drone is typically in the range of F2 (87 Hz) to C3 (130 Hz) in male singers. This low pitch is essential: the harmonic series of a low fundamental contains many harmonics in the human hearing range, giving the singer a rich palette of potential overtone frequencies to select from.
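A minimal sketch of why a low drone gives the singer more material to work with: it counts how many harmonics of a given fundamental fall inside a fixed frequency band. The band limits (700 to 1,300 Hz) and the comparison pitch G4 (392 Hz) are illustrative assumptions, not values from any measurement.

```python
def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Return the harmonic numbers n for which n * f0 lies within [low, high]."""
    max_n = int(high_hz // f0_hz)
    return [n for n in range(1, max_n + 1) if low_hz <= n * f0_hz <= high_hz]

# Hypothetical band chosen only for illustration.
LOW_HZ, HIGH_HZ = 700.0, 1300.0

for label, f0 in [("G2 drone", 98.0), ("G4 (two octaves up)", 392.0)]:
    selected = harmonics_in_band(f0, LOW_HZ, HIGH_HZ)
    print(f"{label}: {len(selected)} harmonics in band -> {selected}")
```

Under these assumptions the low drone places several selectable harmonics in the band, while the same band contains only a couple of harmonics of the higher pitch.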

The drone must be steady, sustainable, and harmonically rich. Tuvan singers develop remarkable control over breath management and glottal tension to maintain a stable, consistent drone for extended periods. The glottal waveform for khoomei tends to be strongly periodic, with a prominent closed phase; this is what generates the strong, clear harmonic series above the drone from which the formant can select.

The Filter: An Extreme, Narrow Formant

This is where the acoustic "magic" happens. The singer shapes the vocal tract — primarily using the tongue, lips, and jaw — to create a formant of extraordinary narrowness tuned precisely to one harmonic of the drone.

In normal speech, formants typically have bandwidths of 100–200 Hz; an unusually narrow speech formant might be 80 Hz wide. In sygyt, expert singers achieve formant bandwidths of 40–60 Hz, and in some documented cases even narrower. This extreme narrowness translates to a very high Q factor (the resonance quality factor: center frequency divided by bandwidth), meaning the resonance is extremely selective: it boosts the targeted harmonic far more than its neighbors.
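The link between bandwidth and Q can be made concrete with a short calculation. This sketch uses the standard definition Q = center frequency / half-power bandwidth; the 1,000 Hz center frequency is a hypothetical value chosen for illustration, and the bandwidths are taken from the ranges quoted above.

```python
def q_factor(center_hz, bandwidth_hz):
    """Quality factor: center frequency divided by half-power (-3 dB) bandwidth."""
    return center_hz / bandwidth_hz

CENTER_HZ = 1000.0  # hypothetical formant center, for illustration only

cases = [("typical speech", 150.0), ("narrow speech", 80.0), ("sygyt", 50.0)]
for label, bandwidth in cases:
    q = q_factor(CENTER_HZ, bandwidth)
    print(f"{label:>14}: bandwidth {bandwidth:5.1f} Hz -> Q = {q:.1f}")
```

At a fixed center frequency, halving the bandwidth doubles Q, which is why the step from speech-like to sygyt-like bandwidths makes the resonance so much more selective.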

The physical configuration that achieves this narrow formant involves:
- Tongue blade raised high toward the hard palate, creating a narrow constriction with a small oral cavity in front of it
- Lips protruded into a rounded, narrow aperture, further adjusting the resonance frequency and sharpening the peak
- Pharynx kept relatively open and relaxed, providing the main body of the vocal tract below the narrow tongue constriction

This creates a two-chamber configuration: a small front cavity (in front of the tongue constriction) and a larger back cavity (the pharynx). The small front cavity has a resonant frequency determined by its volume and the size of the constriction — and this resonant frequency can be tuned precisely by adjusting tongue position and lip rounding. This is exactly the acoustic structure of a Helmholtz resonator embedded in a tube — a classic acoustic configuration for producing sharp, selective resonances.
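As a rough sketch of the Helmholtz idea, the textbook formula for an ideal resonator is f = (c / 2π) · sqrt(A / (V · L)), where A is the neck area, L the neck length, and V the cavity volume. The code below follows the paragraph's framing, treating the front cavity as the volume and the constriction as the neck. Every dimension is a hypothetical placeholder, not a measurement of a singer's vocal tract, and end corrections and the rest of the tract are ignored.

```python
import math

def helmholtz_hz(neck_area_m2, cavity_volume_m3, neck_length_m, sound_speed=350.0):
    """Ideal Helmholtz resonance: (c / 2*pi) * sqrt(A / (V * L)).

    sound_speed defaults to roughly 350 m/s, an approximate value for warm, humid air.
    """
    return (sound_speed / (2 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m)
    )

# Hypothetical geometry, chosen only to land in a plausible frequency range.
NECK_AREA_M2 = 2e-5    # 0.2 cm^2 constriction
NECK_LENGTH_M = 0.01   # 1 cm constriction length

for volume_cm3 in (10.0, 7.5, 5.0):
    freq = helmholtz_hz(NECK_AREA_M2, volume_cm3 * 1e-6, NECK_LENGTH_M)
    print(f"front cavity {volume_cm3:4.1f} cm^3 -> {freq:6.0f} Hz")
```

The point of the sketch is the trend rather than the numbers: in this idealized model, shrinking the cavity raises the resonance, which is the kind of continuous tuning the tongue and lips provide.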

When this narrow formant is tuned to the 8th harmonic of a G2 drone (98 Hz × 8 = 784 Hz), that harmonic is amplified to audible prominence as a separate pitch. The melody is then created by sliding the formant from one harmonic to an adjacent one — from the 8th to the 9th, 10th, 11th, etc. — by changing tongue position and lip configuration. This motion is rapid, smooth, and produces the characteristic flowing melodic quality of sygyt.
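To illustrate how bandwidth controls selectivity, the following sketch models the formant as a single Lorentzian resonance (a common approximation near the peak of a high-Q resonator, assumed here rather than taken from the source) centered on the 8th harmonic of a 98 Hz drone. It reports how far the neighboring harmonics sit below the peak for a speech-like and a sygyt-like bandwidth.

```python
import math

def relative_gain_db(freq_hz, center_hz, bandwidth_hz):
    """Gain in dB relative to the peak of a single Lorentzian resonance.

    bandwidth_hz is the full width at the half-power (-3 dB) points.
    """
    detune = 2.0 * (freq_hz - center_hz) / bandwidth_hz
    return -10.0 * math.log10(1.0 + detune ** 2)

DRONE_HZ = 98.0           # G2 drone, as in the text
TARGET_HARMONIC = 8       # formant tuned to 784 Hz
center = TARGET_HARMONIC * DRONE_HZ

for bandwidth in (150.0, 50.0):  # speech-like vs. sygyt-like bandwidth
    print(f"bandwidth {bandwidth:.0f} Hz")
    for n in (7, 8, 9):
        gain = relative_gain_db(n * DRONE_HZ, center, bandwidth)
        print(f"  harmonic {n}: {gain:6.1f} dB relative to peak")
```

In this simplified model the adjacent harmonics fall roughly 4 dB below the peak at 150 Hz bandwidth and roughly 12 dB below it at 50 Hz, which suggests why only a very narrow formant lets one harmonic stand out as a separate pitch.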

The Harmonic Series as Musical Scale

The harmonics of a drone at 98 Hz (G2) are:
- 1st: 98 Hz (G2), the drone
- 2nd: 196 Hz (G3)
- 3rd: 294 Hz (D4)
- 4th: 392 Hz (G4)
- 5th: 490 Hz (B4 slightly flat)
- 6th: 588 Hz (D5)
- 7th: 686 Hz (between E5 and F5, about a third of a semitone below F5; the "blue" seventh)
- 8th: 784 Hz (G5)
- 9th: 882 Hz (A5)
- 10th: 980 Hz (B5 slightly flat)
- 11th: 1078 Hz (between C6 and C#6; the "neutral" eleventh)
- 12th: 1176 Hz (D6)
- 13th–16th: progressively dense above...
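The list above can be checked with a short script that computes each harmonic of the drone and finds the nearest equal-tempered note, along with the deviation in cents. This is a sketch that assumes A4 = 440 Hz tuning and sharps-only note spelling; the function name is illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) for the nearest equal-tempered pitch."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)  # fractional MIDI note number
    nearest = round(midi)
    cents = 100 * (midi - nearest)  # positive means sharp of the named note
    name = f"{NOTE_NAMES[nearest % 12]}{nearest // 12 - 1}"
    return name, cents

DRONE_HZ = 98.0  # G2, as in the text

for n in range(1, 13):
    freq = n * DRONE_HZ
    name, cents = nearest_note(freq)
    print(f"{n:>2}: {freq:7.1f} Hz  nearest {name:<3} {cents:+6.1f} cents")
```

Harmonics that are octaves of the drone land almost exactly on equal-tempered pitches, while the 7th and 11th show the largest deviations, the 11th sitting close to a quarter tone away from its nearest note.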

The melody available to the overtone singer is therefore determined by the structure of the harmonic series: not by an independently designed scale, but by the physics of vibrating air. The resulting melodic possibilities include some intervals that closely match Western equal temperament (the octave, fifth, and fourth, with the major third about 14 cents flat) and others that do not (the 7th harmonic, a "flat seventh"; the 11th harmonic, a "neutral fourth"). This gives overtone singing a characteristic modal quality that is immediately recognizable and that does not map perfectly onto any standard Western scale.

This is a profound observation: the "scale" of Tuvan melody in khoomei is determined by the harmonic series — by physics, not culture. Culture enters in the choice of drone pitch, the specific harmonics emphasized as melodically significant, the rhythmic patterns used, and the aesthetic values applied to different timbres. But the raw material is given by the physics of vibrating strings (or, in this case, vibrating folds).

What Khoomei Reveals About the Human Voice

Khoomei demonstrates several things that are not obvious from conventional Western vocal practice:

1. The vocal tract is a tunable frequency selector. The source-filter model is not merely a theoretical framework; it is a physical reality that can be made perceptually direct. By sharpening the formant enough, the singer makes the filter audible as melody.

2. The harmonic series is musically universal. Tuvan singers, operating in a tradition entirely independent of Western music theory, arrived at a musical practice in which the harmonic series is the explicit source of pitch. This convergence suggests that the harmonic series is not a cultural choice but a physical fact that independently shapes musical practice across traditions.

3. The human voice has not been fully explored by any single tradition. Western classical singing, despite its extraordinary development, emphasizes one region of vocal possibility (high pressure, complete glottal closure, trained singer's formant). Khoomei emphasizes another (stable drone, extreme formant selectivity, harmonic overtone melody). Neither is "the voice" — both are applications of the same physical system to different aesthetic ends.

4. Constraint generates creativity. The harmonic series is a rigid physical constraint — you cannot move between arbitrary pitches in overtone singing; you can only select from the available harmonics. But within this constraint, Tuvan singers have developed a rich, nuanced, melodically sophisticated musical language. The constraint shapes the music without determining it.

Huun-Huur-Tu and Global Khoomei

When Huun-Huur-Tu performed at Carnegie Hall in 1994, the concert was reportedly met with extended silence before the audience began to applaud: not indifference, but the kind of stunned pause that follows an experience outside normal categories. The group went on to collaborate with musicians ranging from Frank Zappa's circle to the Kronos Quartet and Ry Cooder, demonstrating that khoomei's acoustic basis (harmonics, drones, resonance) connects directly with Western musical concepts despite the surface cultural differences.

Their vocalist and igil player Kaigal-ool Khovalyg is considered one of the great masters of sygyt. Recordings of his sygyt show formant bandwidths consistent with the extreme narrowness described above; the physics confirms the perception. In interviews, Khovalyg has described the practice as a form of listening to nature, of training the voice to imitate the sounds of water, wind, and mountains. Whether or not one accepts the spiritual framing, the acoustic result is a demonstration that the physical world's harmonic content, present in the sounds of wind, strings, and water, can be reproduced and manipulated by the human voice.

Discussion Questions

1. Overtone singing makes the source-filter model directly audible. Design an active learning exercise (for a classroom with internet access) that allows students to compare overtone singing recordings to their spectrograms, identifying the formant and the harmonics in both the audio and the visual representation.

2. The harmonic series "scale" available to an overtone singer depends on the fundamental frequency of the drone. If a singer has a drone at C2 (65 Hz) vs. G2 (98 Hz), how does the available melodic range differ? Calculate the first 12 harmonics for each drone and compare their musical interval structure.

3. Overtone singing traditions in Tibet, Mongolia, and Tuva have developed independently (with some cultural contact). They share the acoustic strategy of formant-selected overtones but differ in specific technique and aesthetic. What does this parallel development suggest about the relationship between universal acoustic constraints and cultural specificity? Apply the textbook theme of "universal structures vs. cultural specificity" to this case.

4. Could overtone singing be taught in a Western choral tradition? What would need to change about standard vocal pedagogy to accommodate overtone singing? What physical properties of the Western operatic voice (particularly the singer's formant strategy) might conflict with the requirements of overtone singing, and how might those conflicts be resolved?