Chapter 5 Exercises: Psychoacoustics — The Physics Inside Your Head
Part A: Conceptual Understanding (5 problems)
A.1 The "hard problem of consciousness" as described in Section 5.1 holds that no physical description of a stimulus, however complete, fully explains the subjective experience of perceiving it. Psychoacoustics, however, has mapped detailed relationships between physical stimuli and perceptual responses. Do you think psychoacoustics has made progress toward solving the hard problem, or is it working on a different problem entirely? In your answer, clearly distinguish between: (a) explaining what someone perceives, (b) explaining why they perceive it that way, and (c) explaining what it is like for them to perceive it.
A.2 Explain why the statement "I heard a 100 Hz tone" might be physically false while being perceptually true. Under what circumstances does this happen? Give at least two real-world musical examples where the fundamental frequency of a sound is physically absent but clearly perceived. What does this tell us about the relationship between physical sound and musical experience?
A.3 A sound engineer is preparing a live concert mix for an outdoor festival where the audience will be listening at approximately 80 dB SPL at their position. During the final soundcheck, the engineer notices that the bass guitar seems to be overpowering everything at this level. But at the engineer's normal monitoring level (about 85 dB SPL), the mix sounds balanced. Explain this discrepancy using equal-loudness contours. How does the shape of the equal-loudness contour change between an 80 dB and an 85 dB listening level, and how does this affect the perceived balance between bass and midrange frequencies?
A.4 You are listening to a complex orchestral recording with many instruments. You want to follow the second violin melody, which is quieter than the first violin melody playing simultaneously. Describe at least four distinct mechanisms by which the second violin might be masked by other instruments. For each masking mechanism, suggest one thing the composer, arranger, or engineer could do to make the second violin line more audible.
A.5 The temporal theory of pitch perception (phase-locking) works well below ~4–5 kHz but fails at higher frequencies. The place theory of pitch perception (tonotopic map) works well at high frequencies but has insufficient frequency resolution for fine pitch discrimination at low frequencies. Yet humans can sing along with and recognize melodies played by piccolo (range: 600–4,500 Hz, with significant harmonic content above 5 kHz) with no difficulty. How does the auditory system achieve this across the transition region between the two mechanisms? What would you predict about pitch discrimination ability right in the transition zone versus well below or above it?
Part B: Auditory Illusions and Perceptual Phenomena (5 problems)
B.1 The Shepard Tone Illusion The Shepard tone creates the illusion of a pitch that continuously rises (or falls) without ever actually getting higher (or lower). The physical mechanism involves a set of sine tones separated by octaves, with a bell-shaped amplitude envelope that emphasizes the middle octaves. In the rising version, all components move upward together: as the pitch seems to rise, the highest components fade out and new components fade in at the bottom, so that after one octave of ascent the spectrum is back where it started, creating a loop. (a) Which feature of pitch perception does this illusion exploit: place coding, temporal coding, or octave equivalence? Explain. (b) If you listened to a Shepard tone for a very long time (say, 10 minutes of continuous ascent), what should you hear at the end? What does this tell you about pitch perception? (c) How could you create a Shepard rhythm (an illusion of a rhythm that seems to continuously accelerate without getting faster)? What aspect of temporal perception would it exploit?
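The construction described in B.1 can be sketched in a few lines. This is a minimal illustration, not the chapter's own recipe: the function name `shepard_components`, the 8-octave range, the 27.5 Hz base frequency, and the raised-cosine envelope are all assumptions chosen for clarity. It lists the frequency and amplitude of each octave-spaced component at a given semitone step of a rising scale.

```python
import math

def shepard_components(step, n_octaves=8, base_hz=27.5, steps_per_octave=12):
    """Return (frequency_hz, amplitude) pairs for one step of a rising Shepard scale.

    All defaults (8 octaves, 27.5 Hz base, raised-cosine envelope) are
    illustrative assumptions, not parameters taken from the chapter.
    """
    # Position within the current octave, as a fraction from 0 up to (but not including) 1.
    shift = (step % steps_per_octave) / steps_per_octave
    components = []
    for k in range(n_octaves):
        position = k + shift                 # octaves above base_hz
        freq = base_hz * 2.0 ** position
        # Bell-shaped envelope over log-frequency: zero at both edges of
        # the range, maximum in the middle octaves.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * position / n_octaves))
        components.append((freq, amp))
    return components

if __name__ == "__main__":
    for step in (0, 6, 11, 12):
        top = shepard_components(step)[-1]
        bottom = shepard_components(step)[0]
        print(step, "bottom amp %.3f" % bottom[1], "top amp %.3f" % top[1])
```

In this sketch, step 12 produces exactly the same component list as step 0, which is the loop the problem describes: every component has moved up one octave, the topmost one has faded to silence at the upper edge, and a new one has entered from silence at the lower edge.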
B.2 The Octave Illusion Diana Deutsch discovered this illusion: two alternating tones, a perfect octave apart (e.g., 400 Hz and 800 Hz), alternate between ears. What most listeners hear is not an octave jumping back and forth between ears — instead, most right-handed listeners hear a high tone in the right ear and a low tone in the left ear, even though each ear is actually receiving both tones alternately. (a) What does this illusion reveal about how the auditory system processes binaural information? (b) Would you expect left-handed listeners to show a different pattern? Why or why not? (c) What does this tell us about the relationship between physical sound and perceptual experience? (d) Design an experiment to test whether this illusion is learned (through lateralization habits) or innate.
B.3 The Cocktail Party Effect: Experimental Design Design a rigorous experiment to measure the "cocktail party effect," the ability to follow a target speaker among multiple competing voices. Your experiment should:
- Specify the stimulus materials (how many competing voices, what content, what spatial arrangement)
- Specify the dependent variable (how you measure success at following the target)
- Specify at least three independent variables you would manipulate (e.g., number of competing speakers, level difference between target and maskers, spatial separation, familiarity of target voice)
- Address potential confounds (factors other than your independent variables that could affect performance)
- Describe how you would analyze the data to draw conclusions about the relative contribution of each variable
B.4 Critical Band Listening You have two tuning forks producing tones at 1000 Hz and 1030 Hz. The critical bandwidth at 1000 Hz is approximately 130 Hz. (a) Are these two tones within the same critical band? What perceptual consequence does this have? (b) Now imagine listening to the same two tones in a very reverberant room with 2 seconds of RT60. How might the room acoustics affect your perception of the beating between these tones? Consider: what does reverberation do to the waveform envelope? (c) In a musical context, a violinist playing in tune with another violinist will produce beats of ~0–2 Hz (nearly identical pitches). A violinist slightly out of tune produces beats at 5–15 Hz. A violinist wildly out of tune produces beats at 25–50 Hz, or no beats at all (above the critical band). Describe the perceptual character of each of these three cases. What do listeners typically describe as "in tune" vs. "out of tune" vs. "dissonant"?
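As a starting point for B.4, the arithmetic can be checked with a short sketch. The helper names are hypothetical, and the bandwidth formula used here is the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation, which is an assumption brought in for comparison; the chapter may define critical bandwidth differently.

```python
def beat_frequency(f1_hz, f2_hz):
    """Beat rate of two simultaneous tones: the absolute frequency difference."""
    return abs(f1_hz - f2_hz)

def erb_hz(center_hz):
    """Glasberg & Moore ERB approximation (assumed stand-in for critical bandwidth)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def within_band(f1_hz, f2_hz, bandwidth_hz):
    """True if the two tones are closer together than the given bandwidth."""
    return abs(f1_hz - f2_hz) < bandwidth_hz

if __name__ == "__main__":
    print("beat rate (Hz):", beat_frequency(1000.0, 1030.0))
    print("ERB at 1 kHz (Hz): %.1f" % erb_hz(1000.0))
    # Using the ~130 Hz figure quoted in the problem statement:
    print("same band:", within_band(1000.0, 1030.0, 130.0))
```

The ERB estimate at 1000 Hz comes out near 133 Hz, close to the approximately 130 Hz quoted in the problem, so either figure leads to the same answer for part (a).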
B.5 The Missing Fundamental and Technology A recording engineer is mastering a track featuring a bass guitar playing low notes (E string fundamental: ~41 Hz). The track will be released in three formats: (i) hi-fi streaming (full frequency range), (ii) standard MP3 (frequency range: ~20 Hz to 20 kHz but with psychoacoustic compression), and (iii) telephone audio (300 Hz to 3,400 Hz). (a) In each format, which harmonics of the 41 Hz fundamental are likely to be present? (b) In which formats will the missing fundamental effect come into play, and what pitch will listeners perceive for the bass note? (c) If the bass player plays a note at E1 (41 Hz) and another at A1 (55 Hz), will the perceived pitch relationship (a perfect fourth) be preserved in each format? Why or why not? (d) What does this tell us about the robustness of pitch perception to bandwidth limitations?
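For part (a) of B.5, a small helper can list which harmonic numbers fall inside a given passband. This is a sketch under stated assumptions: `harmonics_in_band` is a hypothetical name, the passband is treated as an ideal brick-wall filter, and 41.2 Hz is used as the equal-tempered value of E1 (the problem rounds it to 41 Hz). It says nothing about how much energy a real bass guitar has in each harmonic.

```python
import math

def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Harmonic numbers n such that low_hz <= n * f0_hz <= high_hz.

    Assumes an idealized brick-wall passband; real channels roll off gradually.
    """
    first = max(1, math.ceil(low_hz / f0_hz))
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))

if __name__ == "__main__":
    for name, f0 in (("E1", 41.2), ("A1", 55.0)):
        full = harmonics_in_band(f0, 20.0, 20000.0)
        phone = harmonics_in_band(f0, 300.0, 3400.0)
        print(name, "full range starts at harmonic", full[0])
        print(name, "telephone band passes harmonics", phone[0], "to", phone[-1])
    # The spacing between adjacent surviving harmonics is still f0,
    # so the ratio of the two spacings is still close to 4:3.
    print("spacing ratio A1/E1: %.3f" % (55.0 / 41.2))
```

In the telephone band, the lowest surviving harmonic of E1 is the 8th (about 330 Hz) and of A1 the 6th (330 Hz), yet the spacing between neighboring harmonics remains 41.2 Hz and 55 Hz respectively. In practice the very high harmonic numbers carry little energy, so the upper end of each list is a theoretical limit only.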
Part C: Musical Applications (5 problems)
C.1 Consonance, Dissonance, and Cultural Context Listen carefully (in a quiet space, if possible) to recordings of the following: (a) A Bach chorale (well-resolved, slow-moving harmonies) (b) A Schoenberg atonal work (e.g., from Op. 11 piano pieces) (c) Balinese gamelan music (d) Jazz chord voicings with added 9ths, 11ths, and 13ths
For each, describe your personal perception of consonance and dissonance. Then consider: are your reactions psychoacoustic (based on roughness and critical bands) or cultural (based on familiarity and training)? How could you design an experiment to determine the relative contribution of each factor for a group of listeners with different musical backgrounds?
C.2 Auditory Streaming and Counterpoint Bach's Cello Suites are written for a solo instrument yet create the illusion of multiple simultaneous voices (typically two or three). This is achieved through melodic writing that rapidly alternates between pitches in different registers. (a) What auditory streaming principle does this technique exploit? (b) What are the psychoacoustic conditions that make this illusion succeed? (Consider: pitch register separation, note duration, tempo, and timbre.) (c) Would the same technique work as effectively on a harpsichord? On a flute? On a tuba? Explain your reasoning with reference to psychoacoustic principles. (d) Describe a specific piece of music (from any tradition) where you believe auditory streaming is being deliberately exploited, and explain the perceptual mechanism.
C.3 Loudness and Live Sound Engineering A live sound engineer is mixing a rock concert in a 5,000-seat arena. The target listening level for the audience is approximately 100 dB SPL. (a) At 100 dB SPL, how does the equal-loudness contour shape compare to the 60 dB SPL curve? What happens to the perceived balance between bass, midrange, and treble at high listening levels? (b) The engineer notices that the kick drum (centered around 80–100 Hz) seems to be losing definition in the mix at high volumes. Using your knowledge of equal-loudness contours, explain why this might be counterintuitive — shouldn't the bass be perceived as louder at high SPL? (c) The opening act performs at a much lower level (about 80 dB SPL). The engineer maintains the same EQ settings. How will the tonal balance sound different to the audience during the opening act vs. the headliner, and what should the engineer do to compensate?
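A rough numerical feel for how much less sensitive the ear is to bass at low levels can be had from the standard A-weighting curve. This is only a stand-in: A-weighting is a single fixed curve loosely modeled on a low-level equal-loudness contour, it is not taken from this chapter, and because it does not change with level it cannot by itself answer C.3. The sketch simply shows the size of the low-level bass deficit that the contours progressively flatten out as level rises.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (standard analytic form).

    Used here as an assumed, level-independent proxy for low-level
    hearing sensitivity; it is not an equal-loudness contour.
    """
    f2 = freq_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00

if __name__ == "__main__":
    for f in (50.0, 90.0, 100.0, 1000.0, 3000.0):
        print("%6.0f Hz: %+6.1f dB" % (f, a_weighting_db(f)))
```

Around the kick drum region (80 to 100 Hz) the curve sits roughly 19 to 23 dB below its 1 kHz value. At concert levels the true contours are much flatter than this, which is the level dependence the problem asks you to reason about.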
C.4 Timbre and Spectral Masking Two instruments, a flute and an oboe, are playing the same pitch simultaneously in an orchestral texture. (a) Explain why these two instruments sound different from each other even at the same fundamental frequency and the same overall loudness. (b) In a dense orchestral tutti (full orchestra playing together), which instrument do you think would be more audible and why? Consider the spectral characteristics of each instrument, the masking properties of other orchestral instruments, and the frequency ranges where the Fletcher-Munson curves show greatest ear sensitivity. (c) Composers often write important melodic lines for the oboe (rather than the flute) when they need the line to project through a thick orchestral texture. Based on your psychoacoustic analysis, why might this be a wise scoring choice? (d) How does this relate to the concept of "orchestral balance" — is balance about equal physical loudness or equal perceived loudness?
C.5 Rhythm, Temporal Resolution, and Musical Style Human temporal resolution, the ability to perceive rapid events as distinct, has limits (see Section 5.10). Different musical styles exploit these limits in very different ways. (a) Fast bluegrass banjo playing can reach 300+ notes per minute (5+ notes per second). Is this within or near the boundary of individual note perception? What happens perceptually when note rates exceed about 600–800 per minute? (b) In hip-hop and electronic music, producers sometimes use very rapid subdivisions (finer than the 16th note at high tempos) that approach 10 ms in duration. At what point do listeners stop perceiving these as distinct rhythmic events and start hearing them as timbre (texture) rather than rhythm? (c) The "groove" of a rhythmic performance is partly a matter of tiny timing deviations: individual notes arriving slightly early or slightly late relative to the mathematical grid. How small can these deviations be and still be perceptible? How large can they be before they are heard as "out of time" rather than expressive? What does this suggest about the temporal sensitivity of the rhythmic perception system? (d) Design a psychoacoustic experiment to determine the just-noticeable difference (JND) for rhythmic timing at a tempo of 120 beats per minute.
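The note rates quoted in C.5 are easier to compare with perceptual limits once they are converted to time between onsets. The following sketch does that conversion; the function names are hypothetical and the choice of four notes per beat for 16th notes assumes a quarter-note beat.

```python
def inter_onset_interval_ms(notes_per_minute):
    """Time between successive note onsets, in milliseconds."""
    return 60_000.0 / notes_per_minute

def subdivision_ms(bpm, notes_per_beat):
    """Duration of one subdivision at a given tempo, in milliseconds."""
    return 60_000.0 / (bpm * notes_per_beat)

if __name__ == "__main__":
    for rate in (300, 600, 800):
        print(rate, "notes/min ->", inter_onset_interval_ms(rate), "ms between onsets")
    # 16th notes at 120 BPM, assuming a quarter-note beat (4 per beat)
    print("16ths at 120 BPM ->", subdivision_ms(120, 4), "ms")
```

At 300 notes per minute the onsets are 200 ms apart, and the 600 to 800 range corresponds to 100 ms down to 75 ms. A 16th note at 120 BPM lasts 125 ms, which gives a reference grid for the timing deviations in parts (c) and (d).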
Part D: Designing Psychoacoustic Experiments (5 problems)
D.1 Testing the Missing Fundamental Across Age Groups Design a study to test whether the missing fundamental effect is equally strong in young children (age 5–7), adolescents, and adults. Your design should:
- Describe the stimuli (which frequencies, what combinations, what fundamental frequency)
- Describe the task (how you measure whether the subject hears the implied fundamental)
- Explain how you would adapt the task for children who may not be able to give precise verbal responses about pitch
- Predict the pattern of results you would expect if the missing fundamental mechanism is: (a) innate and present from birth, (b) learned through musical experience, (c) partially innate and partially developed through experience
D.2 Cross-Cultural Consonance Testing Following up on the Tsimane research discussed in Section 5.9, design a study to test consonance/dissonance perception in a population with a musical background very different from Western tonality (e.g., listeners trained in Indian classical music, Arabic maqam, or gamelan). Your design should:
- Specify the stimuli (which intervals, in which tuning system)
- Control for the effects of acoustic roughness (which should be culture-independent) vs. musical familiarity (which is culture-specific)
- Include a control group of Western-trained listeners
- Predict specific differences you would expect between groups, based on the theories discussed in this chapter
- Describe how you would interpret results that show both similarities (supporting universality) and differences (supporting cultural construction)
D.3 Measuring the Cocktail Party Effect in Music The cocktail party effect was originally studied with speech, but musicians experience an analogous challenge when trying to hear their own part within a complex ensemble texture. Design a study measuring "musical cocktail party" performance:
- Participants listen to a recorded orchestral excerpt and are asked to follow a specific instrument
- What factors would you vary (number of instruments in texture, dynamic level of target instrument, harmonic relationship between target and maskers)?
- How would you measure successful "tracking" of the target instrument?
- Would you expect professional musicians to outperform non-musicians? Would you expect musicians who play the target instrument to outperform those who play different instruments?
- Design a specific comparison that would tell you whether the advantage (if any) of musical training is perceptual or cognitive
D.4 HRTF Individualization Head-related transfer functions (HRTFs) differ from person to person based on individual ear geometry. Some commercial 3D audio systems use "generic" HRTFs (measured on an artificial head), while others offer personalized HRTFs measured from the individual user's ears. Design an experiment to test whether personalized HRTFs produce meaningfully better spatial audio than generic HRTFs. Consider:
- What aspects of spatial perception would you measure (elevation discrimination, front-back discrimination, distance perception, externalization, i.e., whether sounds feel outside vs. inside the head)?
- How would you counterbalance the order of personalized vs. generic HRTF conditions?
- What would constitute "meaningful" improvement: statistical significance, or a difference large enough to affect real-world usage?
- Are there participant characteristics (e.g., previous 3D audio experience, ear geometry extremeness) that might moderate the effect of HRTF personalization?
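One simple way to handle the counterbalancing question in D.4 is to randomize participants and then alternate the two condition orders, so that half hear the personalized HRTF first and half hear the generic one first. The sketch below illustrates that idea; the function name `assign_orders`, the condition labels, and the fixed seed are hypothetical choices, and other schemes (for example, blocked randomization within subgroups) may suit a real study better.

```python
import random

def assign_orders(participant_ids, conditions=("personalized", "generic"), seed=0):
    """Map each participant to a condition order, balanced across the two orders.

    Participants are shuffled with a seeded generator (for reproducibility),
    then assigned alternately to AB and BA orders.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    orders = {}
    for index, pid in enumerate(ids):
        if index % 2 == 0:
            orders[pid] = tuple(conditions)
        else:
            orders[pid] = tuple(reversed(conditions))
    return orders

if __name__ == "__main__":
    for pid, order in sorted(assign_orders(range(8)).items()):
        print(pid, "->", " then ".join(order))
```

With an even number of participants, exactly half receive each order, which lets order effects (practice, fatigue, adaptation to the first HRTF set) be separated from the effect of personalization in the analysis.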
D.5 The Loudness-Pitch Interaction There is a well-documented but subtle interaction between loudness and pitch perception: for many listeners, very loud low-frequency tones seem to drop slightly in pitch, while very loud high-frequency tones seem to rise slightly in pitch. This is called the "loudness-pitch interaction" or the Stevens effect.
- Design an experiment to measure this effect at three frequency ranges (low: 200 Hz; middle: 1000 Hz; high: 5000 Hz)
- Describe the method of adjustment procedure you would use: the listener adjusts the frequency of a variable tone to match the pitch of a reference tone at different levels
- What control condition would you need to rule out the possibility that any observed pitch shift is due to changes in basilar membrane response at different levels?
- What musical implications would this effect have, if it is large enough to be noticeable in real listening situations?
Part E: Synthesis and Applications (5 problems)
E.1 Psychoacoustics and the Spotify Spectral Dataset The Spotify Spectral Dataset (introduced in Chapter 1) contains audio features including "energy," "loudness," "acousticness," and "speechiness" for 10,000 tracks across 12 genres. These features are derived from audio signal analysis. (a) The "energy" feature is described as a measure of "intensity and activity" in the audio. How might this feature be related to the equal-loudness contours we discussed? Would two tracks with the same measured RMS level necessarily have the same Spotify "energy" rating? Why or why not? (b) The "acousticness" feature measures "whether the track is acoustic." Using your knowledge of psychoacoustics, what spectral or temporal characteristics would you expect to distinguish acoustic instruments from electronic/synthesized ones? What masking or streaming properties might contribute to the "acoustic" quality of a sound? (c) Propose two additional audio features that Spotify does not currently include, but that would be predictable from psychoacoustic principles and useful for music recommendation. Describe how each feature would be calculated and what musical dimension it would capture.
E.2 Psychoacoustics and Accessibility Approximately 15% of the population has some degree of hearing loss. Using the psychoacoustic principles in this chapter, analyze how different types of hearing loss affect the musical experience: (a) High-frequency hearing loss (the most common type, often age-related or noise-induced): Which aspects of music perception are most affected? Consider: pitch of high instruments, timbre differentiation, spatial hearing, masking patterns. (b) Moderate flat hearing loss (reduced sensitivity at all frequencies): How would the equal-loudness contour effectively shift for this listener? What would the perceived tonal balance of an orchestral recording sound like? (c) Propose two specific design features for concert halls or live sound systems that would improve the musical experience for listeners with mild hearing loss, based on psychoacoustic principles (not simply "turn it up louder").
E.3 Theme 1 (Reductionism vs. Emergence) in Psychoacoustics Section 5.14 discussed whether psychoacoustics is "just more physics." Taking the specific example of auditory scene analysis (how the brain separates a complex acoustic mixture into distinct sound streams): (a) Describe the phenomenon at the physical level (pressure waves, basilar membrane response, auditory nerve firing) (b) Describe the same phenomenon at the perceptual/cognitive level (auditory streaming, grouping principles) (c) Is the perceptual-level description fully derivable from the physical-level description? What would you need to add? (d) Identify one specific aspect of auditory scene analysis that seems irreducibly perceptual — where the physical description alone seems insufficient to explain what occurs (e) Does this analysis support a reductionist, emergentist, or dualist (separate physical and mental) account of hearing? Defend your position.
E.4 Universal vs. Cultural: Music and Emotion Section 5.13 previewed the question of whether musical emotion is universal or culturally constructed. Using psychoacoustic principles as your starting point: (a) Identify three specific musical parameters (e.g., tempo, mode, loudness) and describe the psychoacoustic mechanism by which each might produce a specific emotional response (e.g., arousal, sadness, tension) (b) For each mechanism you identified, consider: is the emotional response predicted by the psychoacoustic mechanism universal (the same across all cultures), or is it modulated by cultural learning? (c) Describe a specific research finding or musical example that challenges the idea that musical emotion is simply a product of psychoacoustic primitives (d) Propose a model that incorporates both psychoacoustic (universal) and cultural (specific) contributions to musical emotion perception
E.5 Listening Journal: Psychoacoustic Observation Over the course of five days, keep a "psychoacoustic listening journal." Each day, select one musical listening experience and record:
- What you were listening to and in what environment
- One observation about auditory streaming (which voices/instruments could you follow separately? which merged into a texture?)
- One observation about masking (what was hard to hear? why, based on the level or frequency of competing sounds?)
- One observation about spatial perception (did the recording have a sense of space? from which directions did sounds appear to come?)
- One moment where your perception surprised you, where what you heard was not what the physics would simply predict
Write a 600-word synthesis at the end of the five days, discussing what repeated psychoacoustic listening attention taught you about your own auditory perception and about the music you listened to.