Chapter 9 Quiz: The Voice as Instrument

Each question includes the correct answer hidden in a <details> block. Try answering before revealing.


Question 1

In the source-filter model of the voice, which of the following correctly describes the "source"?

A) The vocal tract resonances that shape the output spectrum
B) The periodic puff-train of acoustic energy produced by vibrating vocal folds
C) The air pressure in the lungs that drives phonation
D) The listener's perception of the fundamental frequency

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The "source" in the source-filter model is the glottal pulse train: the sequence of acoustic pressure puffs generated by the opening and closing of the vocal folds. It has a rich harmonic spectrum (harmonics at integer multiples of f₀) that rolls off at about −12 dB/octave. The "filter" is the vocal tract, which is what (A) describes. Lung pressure (C) drives the source but is not itself the source. Listener perception (D) is an output, not a component of the production model.

</details>
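
<details>
<summary>Show Worked Example</summary>

As a rough numerical sketch of the source spectrum described in the answer, the snippet below lists the first few harmonics of a glottal source and their idealized levels under a −12 dB/octave roll-off. The function name `source_harmonics` and the 110 Hz fundamental are illustrative assumptions, and a real glottal spectrum only approximates a constant slope.

```python
import math

def source_harmonics(f0, count, rolloff_db_per_octave=-12.0):
    """Return (frequency_hz, level_db) pairs for the first `count` harmonics.

    Levels are relative to the fundamental (0 dB) and follow an idealized
    constant roll-off per octave, which simplifies a real glottal source.
    """
    harmonics = []
    for n in range(1, count + 1):
        freq = n * f0                     # harmonics sit at integer multiples of f0
        octaves_above_f0 = math.log2(n)   # harmonic n lies log2(n) octaves above f0
        level = rolloff_db_per_octave * octaves_above_f0
        harmonics.append((freq, level))
    return harmonics

if __name__ == "__main__":
    # Hypothetical 110 Hz fundamental, chosen only for illustration.
    for freq, level in source_harmonics(110.0, 8):
        print(f"{freq:7.1f} Hz  {level:6.1f} dB")
```

Under this idealization the second harmonic sits 12 dB below the fundamental and the fourth sits 24 dB below it, since each doubling of frequency is one octave.

</details>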

Question 2

A singer produces the vowel /i/ (as in "beat"). Which of the following best describes the tongue position that creates this vowel?

A) Low tongue body, back of the mouth: large pharyngeal constriction
B) High tongue body, front of the mouth: narrow oral constriction
C) Mid-height tongue body, center of the mouth: balanced resonance
D) High tongue body, back of the mouth: velar constriction

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The /i/ vowel is produced with a high, front tongue position. This creates a narrow constriction near the front of the oral cavity. In formant terms, this produces low F1 (high tongue body → low F1) and high F2 (front constriction → high F2). The /a/ vowel corresponds to (A), with low tongue and back placement giving high F1 and lower F2.

</details>

Question 3

The "singer's formant" refers to:

A) The first formant (F1) that all singers develop to project their voice
B) A clustering of F3, F4, and F5 in the 2800–3200 Hz range that enhances vocal projection
C) The formant that distinguishes a trained singing voice from a speaking voice at all frequencies
D) An isolated peak at exactly 3000 Hz that is present in all professional singers

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The singer's formant is a cluster of the third, fourth, and fifth formants (F3, F4, F5) that bunch together in the 2800–3200 Hz range in trained operatic singers. This is achieved through a combination of lowered larynx, widened pharynx, and narrowed epilaryngeal tube. It provides 15–20 dB of enhancement in the frequency range where orchestral instruments produce relatively little energy, allowing the voice to project over the orchestra.

</details>

Question 4

What is the mucosal wave, and why is it important?

A) A resonance wave that travels through the trachea and amplifies low frequencies
B) A rolling surface wave that travels across the vocal fold cover during vibration, enabling efficient phonation
C) The pressure wave in the vocal tract that creates formant peaks
D) A neural signal from the brain that coordinates vocal fold closure timing

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The mucosal wave is a propagating surface deformation that travels across the vocal fold cover (the soft, viscoelastic mucosa) from the lower edge to the upper edge during each vibratory cycle. It results from the loose coupling between the stiff vocal fold body and the soft cover layer. The mucosal wave makes vocal fold vibration highly efficient at converting airflow into acoustic energy. Its absence or disruption (due to nodules, edema, or stiffness) is a key indicator of vocal pathology.

</details>

Question 5

In a vowel space plot (F1 on the y-axis, F2 on the x-axis, with lower numbers at top and right), where would you find the vowel /u/ (as in "boot")?

A) Top left (low F1, high F2)
B) Bottom right (high F1, low F2)
C) Top right (low F1, low F2)
D) Bottom left (high F1, high F2)

<details>
<summary>Show Answer</summary>

**Correct Answer: C**

The /u/ vowel has low F1 (because the tongue body is high) and low F2 (because the tongue body and lip rounding create a back, round configuration). In a standard vowel space plot where lower F1 is at the top and lower F2 is at the right, /u/ plots in the top right. The /a/ vowel sits at the bottom of the plot (high F1, with F2 between those of /i/ and /u/). The /i/ vowel is top left (low F1, high F2).

</details>

Question 6

Chest voice and falsetto (head voice) differ primarily because:

A) Chest voice uses the lungs more directly; falsetto bypasses normal breathing
B) In chest voice, the full fold mass vibrates with complete closure; in falsetto, the folds are stretched thin with incomplete or edge-only closure
C) Chest voice has a lower fundamental frequency limit than falsetto
D) Falsetto uses more subglottal pressure than chest voice

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The key distinction is the mode of vocal fold vibration. In chest voice (modal voice), the full thickness and mass of the folds participates in vibration, with complete glottal closure during each cycle, producing a harmonically rich sound. In falsetto, the folds are longitudinally stretched and thinned by the cricothyroid muscle, only the edges vibrate or make contact, and closure is incomplete, producing a lighter, breathier, more spectrally pure sound. (D) is backwards: falsetto typically requires *less* subglottal pressure than chest voice at similar pitches.

</details>

Question 7

Tuvan throat singers (khoomei) produce a second audible pitch above their drone. Acoustically, this second pitch is:

A) A separately vibrating part of the vocal fold system (e.g., a second pair of folds)
B) An electronically added pitch through a hidden device
C) A single harmonic of the drone that has been selectively amplified by a very sharp, narrow formant in the vocal tract
D) The result of two different fundamental frequencies produced simultaneously by different parts of the larynx

<details>
<summary>Show Answer</summary>

**Correct Answer: C**

Overtone singing works entirely within the source-filter model. The source (glottal vibration) produces a full harmonic series above the drone fundamental. The filter (vocal tract) is shaped to create an extremely sharp (narrow-bandwidth) formant tuned to one specific harmonic. This formant amplifies that harmonic dramatically (by 20 dB or more) above all others, making it audible as a distinct melodic pitch. The drone and the "melody" are produced by the same glottis; only the filter (vocal tract shape) changes.

</details>

Question 8

Vibrato in classical singing has a typical rate of approximately:

A) 1–2 Hz
B) 5–7 Hz
C) 12–15 Hz
D) 20–30 Hz

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Classical singing vibrato typically oscillates at 5–7 Hz. This rate is psychoacoustically significant: below about 3 Hz, the oscillation is perceived as an unstable "wobble"; above about 8 Hz, it begins to sound like a pathological tremor. At 5–7 Hz, the auditory system integrates the modulated frequency into a single perceived pitch with added richness. This rate corresponds to natural oscillation frequencies in vocal neuromuscular control systems.

</details>
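
<details>
<summary>Show Worked Example</summary>

To make "oscillates at 5–7 Hz" concrete, the sketch below models vibrato as a sinusoidal pitch modulation and reports the instantaneous frequency over one cycle. Only the rate range comes from the answer above; the 440 Hz note, the ±50 cent extent, and the function name `vibrato_frequency` are assumptions chosen for illustration.

```python
import math

def vibrato_frequency(t, f0=440.0, rate_hz=6.0, extent_cents=50.0):
    """Instantaneous frequency (Hz) at time t for a tone with sinusoidal vibrato.

    rate_hz is how many pitch oscillations occur per second;
    extent_cents is the peak deviation from f0 (assumed value, not from the text).
    """
    cents = extent_cents * math.sin(2.0 * math.pi * rate_hz * t)
    return f0 * 2.0 ** (cents / 1200.0)   # 1200 cents per octave

if __name__ == "__main__":
    # Eight steps of 1/48 s span exactly one cycle of a 6 Hz vibrato.
    for step in range(9):
        t = step / 48.0
        print(f"t = {t:.4f} s   f = {vibrato_frequency(t):.1f} Hz")
```

Changing `rate_hz` to 2 or 12 moves the modulation into the "wobble" and "tremor" regions the answer describes, while the pitch extent stays the same.

</details>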

Question 9

Why did the human larynx descend during evolution, and what was the primary cost of this change?

A) The larynx descended to lower the voice pitch; the cost was increased effort in swallowing
B) The larynx descended to create a longer pharyngeal resonating cavity, enabling a richer vowel space; the cost was that food and air pathways now cross, creating choking risk
C) The larynx descended to allow bipedal posture; the cost was reduced oxygen intake capacity
D) The larynx descended to accommodate a larger brain; there was no significant acoustic benefit

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The descended larynx in humans creates a long pharynx above the vocal folds, which is the primary acoustic resonating cavity that enables the full F1-F2 vowel space of human speech. The cost is that the food and air pathways cross in the pharynx (food going back and down, air going forward and down), so swallowing requires precise muscular coordination to avoid aspiration. This makes choking a far greater risk for humans than for other primates with high larynxes.

</details>

Question 10

Which three vowels are found in virtually all 3-vowel language systems in the world?

A) /a/, /e/, /o/
B) /i/, /e/, /a/
C) /a/, /i/, /u/
D) /o/, /u/, /a/

<details>
<summary>Show Answer</summary>

**Correct Answer: C**

The vowels /a/ (low, central-back), /i/ (high, front), and /u/ (high, back-round) represent the three corners of the vowel space triangle: they are maximally acoustically distinct from each other in F1-F2 space. Any 3-vowel language system selecting for maximum perceptual distinctiveness converges on these three. This is not a cultural coincidence but an acoustic inevitability grounded in the physics of the vocal tract and the geometry of the vowel space.

</details>

Question 11

What is the primary acoustic difference between a formant (as in normal speech) and the formant used in overtone singing?

A) Overtone singing formants are at lower frequencies than speech formants
B) Overtone singing uses a much narrower-bandwidth (higher Q) formant, allowing a single harmonic to be emphasized with much greater selectivity
C) Speech formants are created by the vocal folds; overtone singing formants are created by the lips
D) There is no acoustic difference; overtone singing is simply louder speech

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

The critical difference is formant bandwidth (or equivalently, Q factor). Normal speech formants have bandwidths of 100–200 Hz, broad enough to simultaneously boost several harmonics. Overtone singing formants can be as narrow as 50 Hz or less, achieved through extreme tongue articulation and lip protrusion. This narrow bandwidth gives the formant a high Q factor, allowing it to boost a single harmonic dramatically above all others, making it perceptually distinct as a separate pitch.

</details>
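
<details>
<summary>Show Worked Example</summary>

The link between bandwidth, Q, and selectivity can be sketched numerically. The snippet below treats a formant as a single idealized second-order resonance (a simplifying assumption, since a real vocal tract has several interacting resonances) and compares how strongly a 150 Hz-wide and a 50 Hz-wide formant suppress the harmonic next to the one they are tuned to. The 150 Hz drone, the choice of the 10th harmonic, and the function names are hypothetical.

```python
import math

def q_factor(center_hz, bandwidth_hz):
    """Q factor of a resonance: center frequency divided by its -3 dB bandwidth."""
    return center_hz / bandwidth_hz

def resonance_gain_db(freq_hz, center_hz, bandwidth_hz):
    """Gain in dB, relative to the peak, of an idealized second-order resonance."""
    q = q_factor(center_hz, bandwidth_hz)
    detune = freq_hz / center_hz - center_hz / freq_hz
    return -10.0 * math.log10(1.0 + (q * detune) ** 2)

if __name__ == "__main__":
    f0 = 150.0            # hypothetical drone fundamental
    center = 10 * f0      # formant tuned to the 10th harmonic (1500 Hz)
    for bandwidth in (150.0, 50.0):
        neighbor = resonance_gain_db(11 * f0, center, bandwidth)
        print(f"bandwidth {bandwidth:5.1f} Hz: Q = {q_factor(center, bandwidth):4.1f}, "
              f"11th harmonic at {neighbor:6.1f} dB relative to the 10th")
```

In this toy model the narrower formant pushes the neighboring harmonic much further below the tuned one, which is the selectivity the answer attributes to a high Q factor.

</details>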

Question 12

The "spectral gap" in orchestral sound (at approximately 2800–3200 Hz) is important for vocal projection because:

A) The orchestra is louder in this range, and singers exploit the resonance
B) This range is where the singer's formant operates, and the relatively low orchestral energy here allows the singer's boosted harmonics to stand out
C) Human hearing is least sensitive in this range, so it requires less energy to be heard
D) Consonants (not vowels) are most important in this frequency range, giving speech clarity

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Orchestral instruments collectively produce less acoustic energy in the 2800–3200 Hz range than below 2000 Hz; this is the "spectral gap." Trained operatic singers develop the singer's formant cluster (F3, F4, and F5 bunched in this range), which boosts the voice by 15–20 dB exactly there. Because the orchestra contributes little competing energy in the gap, the singer's enhanced harmonics remain clearly audible even when the orchestra's total power exceeds the singer's. The singer doesn't outshout the orchestra; they exploit a frequency gap.

</details>

Question 13

In the context of choral singing, "incoherent addition" means:

A) The singers are out of tune, creating dissonance
B) Multiple sound sources with random phase relationships produce total intensity proportional to the number of sources (not the square of the number)
C) Choral singing is always quieter than expected because of phase cancellation
D) Individual voices cannot be distinguished in a large choir

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Incoherent addition describes how multiple uncorrelated sound sources (with random phase relationships) combine. Total intensity grows in proportion to the number of sources N, so pressure amplitude grows as √N. For 60 singers, each contributing power P, the total power is 60P and the amplitude is √60 ≈ 7.75 times that of one singer. This gives an SPL increase of 10 log₁₀(60) ≈ 17.8 dB, or about 18 dB. Coherent addition (all sources perfectly in phase) would give a 60× amplitude increase, or 20 log₁₀(60) ≈ 35.6 dB, which is much more but practically impossible.

</details>
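
<details>
<summary>Show Worked Example</summary>

The decibel figures for 60 singers can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every singer contributes equal power; the function names are illustrative.

```python
import math

def incoherent_gain_db(n_sources):
    """SPL gain when n equal, uncorrelated sources add (intensities sum)."""
    return 10.0 * math.log10(n_sources)

def coherent_gain_db(n_sources):
    """SPL gain when n equal sources add perfectly in phase (amplitudes sum)."""
    return 20.0 * math.log10(n_sources)

if __name__ == "__main__":
    n = 60
    print(f"amplitude ratio (incoherent): {math.sqrt(n):.2f}")
    print(f"incoherent gain: {incoherent_gain_db(n):.1f} dB")
    print(f"coherent gain:   {coherent_gain_db(n):.1f} dB")
```

Because the incoherent gain grows as 10 log₁₀(N), doubling the size of a choir adds only about 3 dB.

</details>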

Question 14

The source-filter model predicts that the same vowel should be recognizable across speakers of very different voice sizes (male, female, child) because:

A) Absolute formant frequencies are identical across speakers
B) The formant frequency ratios (relative positions in the vowel space) are similar across speakers, even if absolute values differ
C) All speakers produce the same fundamental frequency for any given vowel
D) The vocal tract length is the same in all humans regardless of age or sex

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Vowel identity is not carried by absolute formant frequencies but by the pattern of formant frequencies relative to each other, and relative to that speaker's overall vocal tract resonance scale. A child's vocal tract is shorter than an adult male's, so all formant frequencies are higher, but the *ratios* between formants for any given vowel are similar. The auditory system performs "speaker normalization": it adjusts for the overall formant scale of a speaker and interprets the pattern accordingly. This is why we effortlessly understand speech across speakers of very different sizes.

</details>
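
<details>
<summary>Show Worked Example</summary>

One way to see why ratios survive a change in vocal tract size is the uniform-tube idealization, which is a common textbook model of a neutral vowel rather than something stated in this quiz. A tube closed at the glottis and open at the lips resonates at odd multiples of c/(4L), so shortening the tube raises every formant by the same factor. The tract lengths (17.5 cm and 12 cm) and the 350 m/s speed of sound below are assumed round values.

```python
def tube_formants(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

def ratios_to_first(formants):
    """Each formant divided by the lowest one."""
    first = formants[0]
    return [f / first for f in formants]

if __name__ == "__main__":
    # Assumed tract lengths: roughly adult-sized and child-sized.
    for label, length in (("adult, 17.5 cm", 0.175), ("child, 12.0 cm", 0.12)):
        formants = tube_formants(length)
        hz = ", ".join(f"{f:.0f}" for f in formants)
        ratio = " : ".join(f"{r:.1f}" for r in ratios_to_first(formants))
        print(f"{label}: {hz} Hz  (ratios {ratio})")
```

Real vowels depart from this model because the tract is not uniform and does not scale perfectly with body size, which is why the answer says the ratios are similar rather than identical.

</details>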

Question 15

Vocal nodules impair voice quality primarily by:

A) Lowering the fundamental frequency below usable range
B) Disrupting the mucosal wave and preventing complete glottal closure, causing breathiness and rough timbre
C) Reducing subglottal air pressure available for phonation
D) Blocking the nasal passage and reducing nasal resonance

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Nodules are fibrotic (hard, callus-like) thickenings that form at the midpoint of the vocal folds due to repeated collision trauma. Because they are stiff, they disrupt the smooth mucosal wave propagation. Because they project into the glottis, they prevent the folds from achieving complete closure, creating an air leak (breathy quality) and an irregular vibratory pattern (rough/hoarse quality). The clinical markers are increased jitter (pitch irregularity) and shimmer (amplitude irregularity) in the acoustic signal.

</details>

Question 16

"Formant tuning" in singing refers to:

A) Adjusting the pitch of the sung note to match the room's resonant frequencies
B) Modifying vowel quality to align vocal tract formants with harmonics of the sung pitch, maximizing acoustic efficiency
C) Training the vocal folds to produce stronger harmonics at certain frequencies
D) Using electronic equipment to adjust formant frequencies in real time during performance

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Formant tuning is the (often unconscious) modification of vowel quality (that is, vocal tract shape) to align a formant frequency with one of the harmonics of the sung pitch. When a formant frequency coincides with a harmonic, that harmonic receives maximum resonant amplification. At high pitches, where harmonics are widely spaced, singers must modify vowels more dramatically to achieve formant-harmonic alignment, which is why vowels on high notes in singing often sound different from the same vowel in speech.

</details>
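
<details>
<summary>Show Worked Example</summary>

The effect of harmonic spacing on formant tuning can be illustrated with a small calculation. The sketch below finds the harmonic closest to a given formant and reports how far the formant would have to move to meet it. The 500 Hz formant and the two sung pitches (220 Hz and 880 Hz) are hypothetical values, and `nearest_harmonic` is an illustrative helper.

```python
def nearest_harmonic(f0, formant_hz):
    """Return (harmonic_number, harmonic_hz, offset_hz) for the harmonic closest to a formant.

    offset_hz is harmonic minus formant: positive means the formant would have
    to rise to meet the harmonic, negative means it would have to fall.
    """
    n = max(1, round(formant_hz / f0))   # harmonic numbers start at 1 (the fundamental)
    harmonic_hz = n * f0
    return n, harmonic_hz, harmonic_hz - formant_hz

if __name__ == "__main__":
    formant = 500.0   # hypothetical first-formant frequency
    for f0 in (220.0, 880.0):
        n, harmonic, offset = nearest_harmonic(f0, formant)
        print(f"f0 = {f0:5.1f} Hz: nearest harmonic is #{n} at {harmonic:6.1f} Hz, "
              f"offset {offset:+6.1f} Hz")
```

At the lower pitch a harmonic lies within 60 Hz of the formant, while at the higher pitch the closest harmonic is the fundamental itself, 380 Hz away, so a much larger vowel modification is needed.

</details>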

Question 17

In comparing the soprano voice to a particle accelerator's radiofrequency cavity, the most precise acoustic/physical analogy is:

A) Both use electromagnetic energy to produce sound
B) Both use resonance states to selectively enhance energy at specific frequencies, allowing efficient energy transfer to a driven system
C) Both are loudest at very low frequencies
D) Both systems are tuned by external computers rather than internal mechanisms

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Both the soprano's vocal tract and an SRF (superconducting radiofrequency) accelerator cavity work by maintaining resonance states: specific frequencies at which the system preferentially stores and amplifies energy. The soprano tunes her formants (resonance states of the vocal tract cavity) to match harmonics of her voice, amplifying those harmonics. The accelerator tunes its cavity resonances to match the frequency of the proton beam, amplifying the accelerating electromagnetic field. In both cases, resonance enables efficient energy transfer that would be impossible without it.

</details>

Question 18

Why does a large choir of 80 singers sound "smoother" and more blended than a single voice, even when both produce the same pitch?

A) The choir uses electronic reverb to smooth the individual voices
B) Spectral smoothing from averaging multiple slightly different spectra, vibrato averaging across singers, and distributed formant frequencies combine to create a more homogeneous timbral output
C) The choir rehearses so much that all voices become acoustically identical
D) Large choirs always use vowel formants in the same frequency range as electronic instruments

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Choral blend emerges from three acoustic processes:

1. Spectral smoothing: individual voices have irregular harmonic amplitudes; averaging 80 such spectra smooths out the irregularities.
2. Vibrato averaging: each singer's vibrato has a slightly different phase and rate, so the combined signal shows smeared harmonic peaks rather than distinctly wobbling ones.
3. Formant distribution: small differences in vocal tract size mean 80 singers have slightly different formant frequencies, and their combined output has a broader, smoother formant structure than any individual.

</details>

Question 19

The "mixed voice" in singing refers to:

A) A blend of the singer's voice with an electronic synthesizer
B) A coordination of chest-voice (vocalis) and head-voice (cricothyroid) muscle activation, allowing full-voiced singing at higher pitches than pure chest voice
C) Singing with two vocalists in unison to create a mixed timbral output
D) A style of singing that mixes operatic and popular techniques in alternation

<details>
<summary>Show Answer</summary>

**Correct Answer: B**

Mixed voice (voix mixte, mix) is a vocal production mode in which both the vocalis muscle (which creates the thick, chest-mode configuration) and the cricothyroid muscle (which creates the thin, head-mode configuration) are co-activated. The result is a voice that has the power and harmonic richness of chest voice but can reach the higher pitches of head voice. The folds are in an intermediate configuration: shorter and thicker than pure head voice, longer and thinner than pure chest voice. Training the mixed voice is central to most classical and musical theater pedagogical systems.

</details>

Question 20

Which statement about the relationship between physics and culture in singing is best supported by the material in this chapter?

A) Physics determines everything about singing; cultural variations are superficial
B) Culture determines everything about singing; physical constraints are irrelevant
C) Physical constraints (acoustic physics, vocal anatomy) create the space of possibilities; culture makes choices within that space, sometimes exploiting different regions of the possibility space for aesthetic purposes
D) Physics and culture are completely independent and never interact in the realm of singing

<details>
<summary>Show Answer</summary>

**Correct Answer: C**

The chapter consistently illustrates that physical constraints (the source-filter model, the formant frequencies, the register mechanics, the choking risk of the descended larynx) define the *space of possibilities* for vocal production. Different cultural traditions explore different regions of this space: Western opera develops the singer's formant for large-hall projection; Tuvan throat singing exploits formant selectivity for overtone melody; Tibetan chant pushes the lower limits of modal vibration. None of these choices is acoustically "wrong"; they are different cultural solutions to different aesthetic problems within the same physical system.

</details>