Chapter 5 Key Takeaways: Psychoacoustics — The Physics Inside Your Head

The Core Ideas

1. Perception Is Not Passive Reception — It Is Active Construction

The single most important insight in this chapter: the auditory brain does not simply record the physical properties of sound waves and play them back. It actively processes, interprets, predicts, and constructs perceptual representations that go beyond — and sometimes actively diverge from — the physical stimulus. What you "hear" is as much a product of your brain's activity as of the physical sound in the room. The Shepard tone illusion (an endlessly ascending scale that never actually rises), the missing fundamental (hearing a pitch that is physically absent), and auditory streaming (following one voice in a crowd) all demonstrate this constructive activity.
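The Shepard illusion mentioned above is easy to synthesize. A minimal sketch (all parameter values here are illustrative, not taken from the chapter): each tone is a stack of octave-spaced sine partials whose amplitudes follow a fixed bell curve in log frequency, so stepping the base frequency up a semitone changes pitch class without changing the overall register.

```python
import numpy as np

def shepard_tone(base_freq, sr=44100, dur=0.5, n_octaves=8,
                 center=500.0, sigma=1.0):
    """One Shepard tone: octave-spaced partials under a fixed
    log-frequency amplitude envelope, so register stays ambiguous.
    (Illustrative sketch; envelope center and width are assumptions.)"""
    t = np.arange(int(sr * dur)) / sr
    tone = np.zeros_like(t)
    for k in range(n_octaves):
        f = base_freq * 2 ** k
        if f >= sr / 2:                      # stay below Nyquist
            break
        # Gaussian envelope, measured in octaves from `center`
        amp = np.exp(-0.5 * (np.log2(f / center) / sigma) ** 2)
        tone += amp * np.sin(2 * np.pi * f * t)
    return tone / np.max(np.abs(tone))       # normalize to +/-1
```

Playing `shepard_tone(27.5 * 2 ** (i / 12))` for i = 0, 1, 2, … and looping i modulo 12 produces the endlessly ascending scale: each step sounds higher than the last, yet after twelve steps the stimulus is identical to where it began.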

2. Loudness Perception Is Frequency-Dependent

The equal-loudness contours (Fletcher-Munson curves) show that the ear is far more sensitive to sounds in the 2–5 kHz range than to very low or very high frequencies. A 100 Hz tone requires about 15–20 dB more physical intensity than a 1,000 Hz tone to sound equally loud. This means:

- Orchestral balance depends on listening level
- Bass frequencies require dramatically more amplifier power to achieve the same perceived loudness as midrange
- Audio engineers must account for equal-loudness effects when mixing at different playback levels
- The ear canal's resonance (~3,400 Hz) partly explains the heightened sensitivity in the 3–4 kHz region
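A widely used engineering stand-in for the low-level equal-loudness contours is the A-weighting curve (IEC 61672), which is 0 dB at 1 kHz by construction. Evaluating it at 100 Hz gives roughly -19 dB, consistent with the 15–20 dB figure above:

```python
import math

def a_weight_db(f):
    """IEC 61672 A-weighting in dB: a rough inverse of the 40-phon
    equal-loudness contour, normalized to 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0
```

Note that A-weighting approximates only the quiet-listening contour; at high playback levels the contours flatten, which is exactly why a mix balanced loud sounds bass-light when played quietly.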

3. The Cochlea Performs a Mechanical Fourier Transform

The basilar membrane maps frequency to position along its length — high frequencies near the base, low frequencies near the apex. This creates a tonotopic (frequency-to-place) representation of sound, parallel to the mathematical operation of Fourier analysis. The hair cells at each position signal the amplitude of their corresponding frequency component. Critical bands — the frequency ranges processed by single auditory filters — represent the cochlea's frequency resolution limit and determine when spectral components interact (within a band) versus are processed independently (across bands).
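A common quantitative model of the critical band is the equivalent rectangular bandwidth (ERB) of the auditory filter (Glasberg and Moore's formula). The `same_critical_band` heuristic below is my own illustrative shorthand, not a standard definition:

```python
def erb_hz(f_hz):
    """Glasberg & Moore ERB of the auditory filter centered at f_hz,
    a standard model of critical bandwidth."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def same_critical_band(f1, f2):
    """Rough test: do two partials fall within one auditory filter
    centered midway between them? (illustrative heuristic)"""
    center = 0.5 * (f1 + f2)
    return abs(f1 - f2) < erb_hz(center)
```

At 1 kHz the ERB is about 133 Hz: a semitone pair such as 440 Hz and 466 Hz lands inside one band and interacts, while 440 Hz and 660 Hz (a fifth apart) are resolved by separate filters.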

4. Masking Is a Fundamental Limit of Hearing and a Tool for Audio Compression

Simultaneous masking (louder sounds suppressing softer nearby-frequency sounds), forward masking (a loud sound suppressing subsequent softer sounds for up to 200 ms), and backward masking (a loud sound retroactively suppressing a preceding softer sound) together define the perceptual limits of hearing at any given moment. Audio compression codecs (MP3, AAC) exploit masking to discard inaudible components, achieving 10:1 or greater compression ratios with minimal perceptual quality loss.
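The codec idea can be caricatured in a few lines: keep only spectral bins that rise to within some margin of the loudest bin nearby, and discard the rest as masked. Real codecs compute masking thresholds per critical band from a psychoacoustic model; the neighborhood width and 30 dB margin here are arbitrary illustrative values:

```python
import numpy as np

def toy_mask_keep(mag_db, half_width=8, margin_db=30.0):
    """Toy simultaneous-masking rule: keep a bin only if it is within
    margin_db of the loudest bin in a +/- half_width neighborhood.
    (Sketch only; real codecs use per-critical-band masking models.)"""
    keep = np.zeros(len(mag_db), dtype=bool)
    for i in range(len(mag_db)):
        lo = max(0, i - half_width)
        hi = min(len(mag_db), i + half_width + 1)
        keep[i] = mag_db[i] >= mag_db[lo:hi].max() - margin_db
    return keep
```

In a test spectrum, a -40 dB component three bins away from a 0 dB tone is discarded (masked), while an equally weak component far from any masker is kept: the bit savings come entirely from components the ear could not have heard anyway.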

5. Pitch Involves Both Place Coding and Temporal Coding

No single mechanism explains pitch perception across the full audible range. Temporal coding (phase-locking of auditory nerve fibers to the period of low-frequency sounds) provides fine pitch discrimination below ~4 kHz. Place coding (tonotopic position on the basilar membrane) dominates above ~4–5 kHz. Both mechanisms contribute in the intermediate range, and the brain integrates information from both for a unified pitch percept.
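A cartoon of the temporal mechanism: treat each positive-going zero crossing of the waveform as a "spike" and read the pitch off the mean interspike interval. This is a drastic simplification of real phase-locking, but it shows how timing alone, with no frequency-to-place map, can carry pitch:

```python
import numpy as np

def isi_pitch_hz(freq, sr=44100, dur=0.1):
    """Temporal-coding caricature: a fiber 'fires' at each positive-going
    zero crossing; the mean interspike interval encodes the period.
    (Only meaningful where phase-locking holds, below ~4 kHz.)"""
    t = np.arange(int(sr * dur)) / sr
    x = np.sin(2 * np.pi * freq * t)
    crossings = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    isis = np.diff(crossings) / sr        # intervals in seconds
    return 1.0 / isis.mean()
```

The estimate degrades exactly where physiology says it should: above a few kilohertz, nerve fibers can no longer synchronize to individual cycles, and place coding must take over.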

6. The Missing Fundamental Demonstrates That Pitch Is a Construction

Listeners clearly perceive a pitch even when the fundamental frequency of a complex tone is physically absent from the stimulus. The auditory system extracts the common periodicity of the present harmonics and constructs a pitch corresponding to the implied fundamental. This demonstrates that pitch is not a simple readout of a specific frequency in the sound; it is a perceptual inference from the pattern of available harmonics. The missing fundamental explains why small speakers can convey bass, why telephones convey voice pitch, and why orchestral bass is perceived even when bass frequencies attenuate with distance.
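One standard model of this inference is periodicity detection via autocorrelation. The sketch below builds a complex of harmonics 2–5 of 200 Hz, with no energy at 200 Hz itself, and recovers a pitch near 200 Hz anyway; the search limits are illustrative choices:

```python
import numpy as np

def acf_pitch_hz(x, sr, fmin=50.0, fmax=1000.0):
    """Estimate pitch from the autocorrelation peak: one simple model
    of virtual-pitch (missing-fundamental) extraction."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(r[lo:hi])
    return sr / lag

sr = 44100
t = np.arange(sr // 10) / sr   # 0.1 s of signal
# harmonics 2-5 of 200 Hz: the 200 Hz fundamental is physically absent
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(2, 6))
```

`acf_pitch_hz(x, sr)` returns approximately 200 Hz: the waveform repeats every 5 ms even though no 200 Hz component exists, and periodicity, not spectral content, sets the pitch.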

7. Auditory Scene Analysis Is the Perceptual Basis of Musical Texture

The brain's ability to separate simultaneous sounds into distinct perceptual streams — auditory scene analysis — is what allows listeners to follow a specific voice or instrument in a complex acoustic environment. It is the perceptual foundation of musical polyphony. The auditory system uses harmonicity, common onset/offset, continuity, spatial location, and timbral similarity as grouping cues. Composers from Bach to Mahler have intuitively — and sometimes explicitly — written music that exploits these grouping principles.

8. Consonance Has a Psychoacoustic Grounding, But Its Musical Role Is Cultural

Consonance/dissonance has a physical basis in roughness: when spectral components fall within the same critical band and produce beat rates in the 25–40 Hz range, the result is perceived as rough and unpleasant. Intervals that avoid this (octave, perfect fifth, etc.) are perceptually smooth (consonant). But the cultural hierarchy of consonance/dissonance — which intervals are "beautiful," which are "expressive," which are "harsh" — varies significantly across musical traditions. The physiology constrains; the culture interprets and extends.
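The roughness criterion can be turned into a crude two-partial test using the chapter's numbers. Full models (e.g. Plomp and Levelt's) use a graded roughness curve rather than this hard threshold, so treat it as a sketch:

```python
def erb_hz(f_hz):
    """Glasberg & Moore ERB: a standard critical-bandwidth model."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def pair_roughness(f1, f2, beat_lo=25.0, beat_hi=40.0):
    """Crude roughness test for two partials, using the chapter's
    figures: rough when they share a critical band AND beat in the
    25-40 Hz range. (Hard threshold is a simplification.)"""
    beat = abs(f1 - f2)
    in_band = beat < erb_hz(0.5 * (f1 + f2))
    return in_band and beat_lo <= beat <= beat_hi
```

A minor second (440 vs. 466 Hz) beats at ~26 Hz inside one critical band and registers as rough; a fifth (440 vs. 660 Hz) and an octave (440 vs. 880 Hz) do not. Note that for complex tones the calculation must be run over every pair of partials, which is why interval quality depends on timbre as well as fundamental ratio.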

9. Spatial Hearing Relies on Complementary Binaural Cues

The auditory system uses interaural time differences (ITD, dominant below ~1.5 kHz) and interaural level differences (ILD, dominant above ~1.5 kHz) for horizontal localization, supplemented by head-related transfer function (HRTF) spectral cues for elevation and front-back disambiguation. Together these mechanisms allow humans to localize sound sources to within a few degrees. HRTFs vary from person to person, which explains why generic spatial audio (rendered with a single, nonindividualized HRTF) sounds less convincing for some listeners than for others.
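The ITD cue can be estimated from head geometry with Woodworth's classic spherical-head formula; the ~8.75 cm head radius below is a conventional assumed value, not a measurement from the chapter:

```python
import math

def woodworth_itd_s(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth spherical-head model: ITD = (r/c) * (theta + sin theta),
    where theta is the source azimuth from straight ahead.
    (Assumed head radius ~8.75 cm; speed of sound 343 m/s.)"""
    th = math.radians(azimuth_deg)
    return (head_radius_m / c) * (th + math.sin(th))
```

At 90 degrees azimuth the model gives roughly 0.66 ms, the commonly cited maximum ITD; at 0 degrees it gives zero, which is why front-back confusions arise and must be resolved by HRTF spectral cues or head movement.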

10. The Hard Problem Remains

Even after all the psychoacoustics we've studied, the "hard problem" of consciousness — why there is subjective experience at all, why physical neural processing is accompanied by felt experience — is not solved. Psychoacoustics has made enormous progress mapping the relationship between physical stimuli and perceptual responses. But the gap between "here is what the neurons are doing" and "here is what it is like to hear this music" is not bridged by any current physical or computational account. Holding this gap honestly open is the intellectually responsible position.


Essential Vocabulary

| Term | Definition |
| --- | --- |
| Qualia | The subjective, experiential qualities of perception ("what it is like" to hear) |
| Equal-loudness contours | Curves showing which frequency-intensity combinations produce equal perceived loudness |
| Phon | Unit of loudness level; a tone at N phons sounds as loud as a 1 kHz tone at N dB SPL |
| Critical band | Frequency range processed by a single cochlear auditory filter |
| Simultaneous masking | A loud sound making a softer nearby-frequency sound inaudible |
| Forward masking | A loud sound suppressing perception of softer sounds occurring up to ~200 ms later |
| Backward masking | A loud sound suppressing perception of a softer sound occurring up to ~20 ms earlier |
| Basilar membrane | Structure within the cochlea that maps frequency to position; performs mechanical Fourier analysis |
| Tonotopy | Frequency-to-position organization of the auditory system |
| Place theory | Theory that pitch is determined by position of maximum basilar membrane excitation |
| Temporal theory | Theory that pitch is determined by the timing pattern of auditory nerve firing |
| Phase-locking | Auditory nerve fibers firing in synchrony with the period of a sound wave |
| Missing fundamental | Perceiving the pitch of a tone whose fundamental frequency is physically absent |
| Residue pitch | Synonym for missing fundamental / virtual pitch |
| Auditory scene analysis | The brain's process of separating a complex acoustic mixture into distinct source streams |
| Auditory streaming | Perceptual organization of sound into distinct sequential and simultaneous streams |
| Consonance | Perceptual smoothness of simultaneous tones; related to low roughness between partials |
| Dissonance | Perceptual roughness of simultaneous tones; related to beating within critical bands |
| Interaural time difference (ITD) | Difference in arrival time of a sound at the two ears; used for horizontal localization |
| Interaural level difference (ILD) | Difference in sound level at the two ears due to head shadowing |
| HRTF | Head-Related Transfer Function; the direction-dependent filtering of sound by the pinna, head, and torso |
| Haas effect | Perceptual fusion of reflections within ~80 ms; precedence effect |

Recurring Themes in This Chapter

Theme 1 (Reductionism vs. Emergence): This chapter is where the reductionism-vs-emergence debate becomes most vivid. Physics can describe the cochlea's mechanics. Neuroscience can map the auditory cortex's activity. But pitch perception — the sense that this note is higher than that one — seems to emerge at the system level in ways not straightforwardly predictable from the components. And the experience of musical beauty may require accounts at a level of description that physics alone does not supply.

Theme 2 (Universal structures vs. cultural specificity): Equal-loudness contours, critical bands, and the missing fundamental appear to be universal features of human hearing (present across populations, cultures, and individuals with normal hearing). But the musical use of these features — which intervals are consonant, how dissonance is deployed, what emotional responses music evokes — varies significantly across cultures. The universal physical-perceptual apparatus is a canvas; cultural tradition is the painting.


Bridge to Part II

With Chapters 4 and 5, Part I (Sound and Vibration: The Physical Foundation) is complete. We've built from basic wave physics (Chapters 1–3) through the physics of acoustic space (Chapter 4) to the biology and psychology of hearing (Chapter 5).

Part II begins the investigation of the musical system itself — the organized structures of pitch, scale, harmony, and rhythm that human cultures have developed to exploit the physics and perception we've studied. Chapter 6 begins with the simplest musical question: why do notes exist? Why do humans organize the continuous frequency spectrum into discrete pitches, and which pitches do different cultures choose? The answers will draw on everything in Part I — wave physics, harmonic series, room acoustics, and psychoacoustics — and will reveal that musical systems are at once deeply rooted in physical and perceptual universals and powerfully shaped by cultural choice.