Chapter 5 Key Takeaways: Psychoacoustics — The Physics Inside Your Head
The Core Ideas
1. Perception Is Not Passive Reception — It Is Active Construction
The single most important insight in this chapter: the auditory brain does not simply record the physical properties of sound waves and play them back. It actively processes, interprets, predicts, and constructs perceptual representations that go beyond — and sometimes actively diverge from — the physical stimulus. What you "hear" is as much a product of your brain's activity as of the physical sound in the room. The Shepard tone illusion (an endlessly ascending scale that never actually rises), the missing fundamental (hearing a pitch that is physically absent), and auditory streaming (following one voice in a crowd) all demonstrate this constructive activity.
2. Loudness Perception Is Frequency-Dependent
The equal-loudness contours (Fletcher-Munson curves) show that the ear is far more sensitive to sounds in the 2–5 kHz range than to very low or very high frequencies. A 100 Hz tone requires about 15–20 dB more physical intensity than a 1,000 Hz tone to sound equally loud. This means:
- Orchestral balance depends on listening level
- Bass frequencies require dramatically more amplifier power to achieve the same perceived loudness as midrange
- Audio engineers must account for equal-loudness effects when mixing at different playback levels
- The ear canal's resonance (~3,400 Hz) partly explains the heightened sensitivity in the 3–4 kHz region
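The 15–20 dB figure is more dramatic than it sounds, because the decibel scale is logarithmic in power. A quick sketch of the arithmetic (the conversion formula is standard; the 20 dB input is the chapter's figure):

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (db / 10)

# If a 100 Hz tone needs ~20 dB more intensity than a 1 kHz tone to
# sound equally loud, the amplifier must deliver 100x the power:
extra_db = 20.0
ratio = db_to_power_ratio(extra_db)
print(f"{extra_db:.0f} dB more = {ratio:.0f}x the acoustic power")
```

This is why "bass frequencies require dramatically more amplifier power" above is not an exaggeration: every 10 dB of extra required level is a factor of ten in power.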
3. The Cochlea Performs a Mechanical Fourier Transform
The basilar membrane maps frequency to position along its length — high frequencies near the base, low frequencies near the apex. This creates a tonotopic (frequency-to-place) representation of sound, analogous to the mathematical operation of Fourier analysis. The hair cells at each position signal the amplitude of their corresponding frequency component. Critical bands — the frequency ranges processed by single auditory filters — represent the cochlea's frequency resolution limit and determine when spectral components interact (within a band) and when they are processed independently (across bands).
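The critical-band idea can be made quantitative. One widely used approximation from the psychoacoustics literature (the Glasberg–Moore equivalent rectangular bandwidth, or ERB, formula — not stated explicitly in this chapter) estimates the bandwidth of the auditory filter at any center frequency:

```python
def erb_hz(f_hz: float) -> float:
    """Glasberg-Moore approximation of the equivalent rectangular
    bandwidth (ERB) of the auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)

# Filter bandwidth grows with center frequency:
for f in (100, 1000, 4000):
    print(f"{f} Hz -> ERB ~ {erb_hz(f):.0f} Hz")
```

Two partials closer together than roughly one ERB fall within a single auditory filter and interact; partials farther apart are resolved separately.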
4. Masking Is a Fundamental Limit of Hearing and a Tool for Audio Compression
Simultaneous masking (louder sounds suppressing softer nearby-frequency sounds), forward masking (a loud sound suppressing subsequent softer sounds for up to 200 ms), and backward masking (a loud sound retroactively suppressing a preceding softer sound) together define the perceptual limits of hearing at any given moment. Audio compression codecs (MP3, AAC) exploit masking to discard inaudible components, achieving 10:1 or greater compression ratios with minimal perceptual quality loss.
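To illustrate how a codec can exploit simultaneous masking, here is a deliberately crude sketch. Real encoders (MP3, AAC) use full psychoacoustic models derived from data like the masking curves in this chapter; the 18 dB threshold and 160 Hz neighborhood below are arbitrary illustrative placeholders, not actual codec parameters:

```python
def audible_components(components, masking_drop_db=18.0, bandwidth_hz=160.0):
    """Toy simultaneous-masking filter: drop any (freq, level) component
    that is at least masking_drop_db quieter than another component
    within bandwidth_hz of it. Illustrative only."""
    keep = []
    for f, level in components:
        masked = any(
            other_level - level >= masking_drop_db
            and abs(other_f - f) <= bandwidth_hz
            for other_f, other_level in components
            if (other_f, other_level) != (f, level)
        )
        if not masked:
            keep.append((f, level))
    return keep

spectrum = [(1000, 80), (1100, 50), (3000, 55)]  # (Hz, dB SPL)
# The soft 1100 Hz component sits next to a loud 1000 Hz tone and is
# discarded; the distant 3000 Hz component survives.
print(audible_components(spectrum))
```

Discarding components the ear could never hear is the core idea behind the 10:1 compression ratios mentioned above: bits are spent only on what is perceptually present.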
5. Pitch Involves Both Place Coding and Temporal Coding
No single mechanism explains pitch perception across the full audible range. Temporal coding (phase-locking of auditory nerve fibers to the period of low-frequency sounds) provides fine pitch discrimination below ~4 kHz. Place coding (tonotopic position on the basilar membrane) dominates above ~4–5 kHz. Both mechanisms contribute in the intermediate range, and the brain integrates information from both for a unified pitch percept.
6. The Missing Fundamental Demonstrates That Pitch Is a Construction
Listeners clearly perceive a pitch even when the fundamental frequency of a complex tone is physically absent from the stimulus. The auditory system extracts the common periodicity of the present harmonics and constructs a pitch corresponding to the implied fundamental. This demonstrates that pitch is not a simple readout of a specific frequency in the sound; it is a perceptual inference from the pattern of available harmonics. The missing fundamental explains why small speakers can convey bass, why telephones convey voice pitch, and why orchestral bass is perceived even when bass frequencies attenuate with distance.
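For harmonics at exact integer multiples of a fundamental, the "common periodicity" the auditory system extracts corresponds to the greatest common divisor of the harmonic frequencies. A minimal sketch of that inference (an idealization — real pitch extraction is tolerant of mistuning and noise):

```python
from functools import reduce
from math import gcd

def implied_fundamental(harmonics_hz):
    """Infer the missing fundamental of integer-valued harmonic
    frequencies as their greatest common divisor."""
    return reduce(gcd, harmonics_hz)

# A small speaker reproducing only 400, 600, and 800 Hz still
# evokes a 200 Hz pitch:
print(implied_fundamental([400, 600, 800]))  # -> 200
```

Note that 200 Hz itself appears nowhere in the stimulus — the percept is constructed from the spacing of the harmonics that are present.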
7. Auditory Scene Analysis Is the Perceptual Basis of Musical Texture
The brain's ability to separate simultaneous sounds into distinct perceptual streams — auditory scene analysis — is what allows listeners to follow a specific voice or instrument in a complex acoustic environment. It is the perceptual foundation of musical polyphony. The auditory system uses harmonicity, common onset/offset, continuity, spatial location, and timbral similarity as grouping cues. Composers from Bach to Mahler have intuitively — and sometimes explicitly — written music that exploits these grouping principles.
8. Consonance Has a Psychoacoustic Grounding, But Its Musical Role Is Cultural
Consonance/dissonance has a physical basis in roughness: when spectral components fall within the same critical band and produce beat rates in the 25–40 Hz range, the result is perceived as rough and unpleasant. Intervals that avoid this (octave, perfect fifth, etc.) are perceptually smooth (consonant). But the cultural hierarchy of consonance/dissonance — which intervals are "beautiful," which are "expressive," which are "harsh" — varies significantly across musical traditions. The physiology constrains; the culture interprets and extends.
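The roughness criterion above can be sketched directly: the beat rate of two simultaneous partials is simply the difference of their frequencies, and the chapter identifies 25–40 Hz beating as the rough region. (A fuller model would also require the two partials to share a critical band, which this sketch omits.)

```python
def beat_rate(f1_hz: float, f2_hz: float) -> float:
    """Beat (amplitude-fluctuation) rate of two simultaneous partials."""
    return abs(f1_hz - f2_hz)

def is_rough(f1_hz: float, f2_hz: float) -> bool:
    """Crude roughness test using the chapter's 25-40 Hz beat-rate band."""
    return 25.0 <= beat_rate(f1_hz, f2_hz) <= 40.0

print(is_rough(440, 470))  # 30 Hz beating -> rough
print(is_rough(440, 880))  # octave, 440 Hz apart -> smooth
```

This is the physiological half of the story; whether a given degree of roughness is heard as "harsh" or "expressive" is, as the paragraph notes, a cultural matter.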
9. Spatial Hearing Relies on Complementary Binaural Cues
The auditory system uses interaural time differences (ITD, dominant below ~1.5 kHz) and interaural level differences (ILD, dominant above ~1.5 kHz) for horizontal localization, supplemented by head-related transfer function (HRTF) spectral cues for elevation and front-back disambiguation. Together these mechanisms allow humans to localize sound sources to within a few degrees. HRTFs are individually variable, which explains why generic spatial audio (using one person's measured HRTF) sounds less convincing to some listeners than to others.
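The time differences involved are tiny. A common spherical-head approximation (Woodworth's formula — a standard textbook model, not derived in this chapter) gives the ITD as a function of source azimuth, with the head radius and speed of sound below as assumed typical values:

```python
import math

def woodworth_itd(azimuth_deg: float,
                  head_radius_m: float = 0.0875,
                  c: float = 343.0) -> float:
    """Woodworth spherical-head approximation of the interaural time
    difference (seconds) for a source at the given azimuth
    (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# Maximum ITD, for a source directly to one side:
print(f"{woodworth_itd(90) * 1e6:.0f} microseconds")
```

The maximum value comes out around 650–660 microseconds — and the binaural system resolves differences far smaller than that, which is what makes few-degree localization possible.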
10. The Hard Problem Remains
Even after all the psychoacoustics we've studied, the "hard problem" of consciousness — why there is subjective experience at all, why physical neural processing is accompanied by felt experience — is not solved. Psychoacoustics has made enormous progress mapping the relationship between physical stimuli and perceptual responses. But the gap between "here is what the neurons are doing" and "here is what it is like to hear this music" is not bridged by any current physical or computational account. Holding this gap honestly open is the intellectually responsible position.
Essential Vocabulary
| Term | Definition |
|---|---|
| Qualia | The subjective, experiential qualities of perception ("what it is like" to hear) |
| Equal-loudness contours | Curves showing which frequency-intensity combinations produce equal perceived loudness |
| Phon | Unit of loudness level; a tone at any frequency has a loudness level of N phons if it sounds as loud as an N dB SPL tone at 1 kHz |
| Critical band | Frequency range processed by a single cochlear auditory filter |
| Simultaneous masking | A loud sound making a softer nearby-frequency sound inaudible |
| Forward masking | A loud sound suppressing perception of softer sounds occurring up to ~200 ms later |
| Backward masking | A loud sound suppressing perception of a softer sound occurring up to ~20 ms earlier |
| Basilar membrane | Structure within the cochlea that maps frequency to position; performs mechanical Fourier analysis |
| Tonotopy | Frequency-to-position organization of the auditory system |
| Place theory | Theory that pitch is determined by position of maximum basilar membrane excitation |
| Temporal theory | Theory that pitch is determined by the timing pattern of auditory nerve firing |
| Phase-locking | Auditory nerve fibers firing in synchrony with the period of a sound wave |
| Missing fundamental | Perceiving the pitch of a tone whose fundamental frequency is physically absent |
| Residue pitch | Synonym for missing fundamental / virtual pitch |
| Auditory scene analysis | The brain's process of separating a complex acoustic mixture into distinct source streams |
| Auditory streaming | Perceptual organization of sound into distinct sequential and simultaneous streams |
| Consonance | Perceptual smoothness of simultaneous tones; related to low roughness between partials |
| Dissonance | Perceptual roughness of simultaneous tones; related to beating within critical bands |
| Interaural time difference (ITD) | Difference in arrival time of a sound at the two ears; used for horizontal localization |
| Interaural level difference (ILD) | Difference in sound level at the two ears due to head shadowing |
| HRTF | Head-Related Transfer Function; the direction-dependent filtering of sound by the pinna, head, and torso |
| Haas effect | Perceptual fusion of reflections within ~80 ms; precedence effect |
Recurring Themes in This Chapter
Theme 1 (Reductionism vs. Emergence): This chapter is where the reductionism-vs-emergence debate becomes most vivid. Physics can describe the cochlea's mechanics. Neuroscience can map the auditory cortex's activity. But pitch perception — the sense that this note is higher than that one — seems to emerge at the system level in ways not straightforwardly predictable from the components. And the experience of musical beauty may require accounts at a level of description that physics alone does not supply.
Theme 2 (Universal structures vs. cultural specificity): Equal-loudness contours, critical bands, and the missing fundamental appear to be universal features of human hearing (present across populations, cultures, and individuals with normal hearing). But the musical use of these features — which intervals are consonant, how dissonance is deployed, what emotional responses music evokes — varies significantly across cultures. The universal physical-perceptual apparatus is a canvas; cultural tradition is the painting.
Bridge to Part II
With Chapters 4 and 5, Part I (Sound and Vibration: The Physical Foundation) is complete. We've built from basic wave physics (Chapters 1–3) through the physics of acoustic space (Chapter 4) to the biology and psychology of hearing (Chapter 5).
Part II begins the investigation of the musical system itself — the organized structures of pitch, scale, harmony, and rhythm that human cultures have developed to exploit the physics and perception we've studied. Chapter 6 begins with the simplest musical question: why do notes exist? Why do humans organize the continuous frequency spectrum into discrete pitches, and which pitches do different cultures choose? The answers will draw on everything in Part I — wave physics, harmonic series, room acoustics, and psychoacoustics — and will reveal that musical systems are at once deeply rooted in physical and perceptual universals and powerfully shaped by cultural choice.