
Appendix D: Key Studies Summary — Psychoacoustics & Music Cognition

This appendix provides structured summaries of thirty landmark studies in psychoacoustics and music cognition that are referenced throughout The Physics of Music & the Music of Physics. Each entry follows a consistent format: full citation, background on the scientific problem, methods employed, key findings, limitations, textbook chapter references, and a statement of broader significance. Together, these studies trace the intellectual arc of a field that sits at the intersection of physics, neuroscience, psychology, and musicology — a field that remains one of the most productive frontiers in cognitive science.

Readers are encouraged to use this appendix as both a reference guide and a reading list. The studies span more than a century of inquiry, from the early psychophysical experiments of Fletcher and Munson to the neuroimaging work of Salimpoor and colleagues. Wherever possible, original sources are identified so that interested students can pursue primary reading.


SECTION I: PSYCHOACOUSTICS CLASSICS

Study 1

Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5(2), 82–108.

Background: Before Fletcher and Munson, engineers and musicians had no rigorous way to compare the perceived loudness of sounds at different frequencies. The telephone industry, where Fletcher worked at Bell Laboratories, needed to understand how to allocate bandwidth efficiently — which parts of the audible spectrum mattered most to human perception. The underlying question was deceptively simple: if a 1000 Hz tone at 60 decibels sounds a certain loudness, what sound pressure level must a 100 Hz tone reach to sound equally loud?

Methods: Fletcher and Munson recruited a panel of listeners with normal hearing and presented them with pairs of tones: a reference tone at 1000 Hz set to various sound pressure levels, and a comparison tone at a test frequency. Listeners adjusted the comparison tone until it appeared equally loud. Experiments were conducted across a wide frequency range (20 Hz to 15,000 Hz) and across a wide range of reference levels, yielding a family of equal-loudness curves that map out the sensitivity of the human ear across the audible spectrum.

Key Findings: Human loudness perception is profoundly non-linear across frequency. The ear is most sensitive in the 2–5 kHz range (near the first resonance of the ear canal and where the cochlea responds most efficiently), and dramatically less sensitive at low and very high frequencies. At low sound levels, this non-linearity is especially pronounced: a bass tone at 20 Hz must be roughly 70 dB louder than a 1 kHz tone to sound equally loud at quiet listening levels. This is why bass frequencies seem to "disappear" at low volume on home audio systems and why many amplifiers incorporate a "loudness" compensation circuit. The unit of loudness — the "phon" — was defined by this work, and the resulting curves were later standardized by the International Organization for Standardization.
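
The decibel arithmetic behind these contours is worth making explicit. The short Python sketch below shows the standard dB SPL computation and the phon convention; the matching levels in the final comment are hypothetical illustrations, not values from Fletcher and Munson's data.

```python
import math

# The phon convention: a tone has a loudness level of L phons when listeners
# judge it equally loud as a 1000 Hz reference tone presented at L dB SPL.

P_REF = 20e-6  # reference pressure: 20 micropascals, near the threshold of hearing

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)

print(spl_db(0.02))  # an RMS pressure of 0.02 Pa corresponds to 60.0 dB SPL

# Hypothetical matching example (not data from the paper): if a 100 Hz tone
# must reach 78 dB SPL to match a 1000 Hz tone at 60 dB SPL, both tones lie
# on the 60-phon contour, and the bass tone needs 18 dB more level.
```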

Limitations: The original curves relied on a relatively small listener panel under monaural headphone listening conditions, which differ from free-field (loudspeaker) listening. Later revisions (ISO 226:2003) updated the curves substantially using larger, more diverse samples. Individual variation in ear-canal resonance and cochlear sensitivity also limits the universality of any single set of curves.

Textbook Chapters: Ch. 3 (Hearing the Physical World), Ch. 8 (Dynamics and Decibels), Ch. 14 (The Frequency Domain and the Ear).

Why It Matters: The equal-loudness contours remain the foundation of audio engineering, hearing conservation standards, and the design of musical instruments. They reveal that "loudness" is not a physical property of sound but a perceptual construction shaped by the ear's evolved priorities.


Study 2

von Békésy, G. (1960). Experiments in Hearing. New York: McGraw-Hill. (Nobel Prize in Physiology or Medicine, 1961.)

Background: From the mid-nineteenth century until Békésy's measurements, the mechanism by which the inner ear separates sounds by frequency was a matter of passionate debate. Helmholtz had proposed a "resonance theory" in which discrete fibers along the basilar membrane act like the strings of a piano, each tuned to a specific frequency. The question of whether this picture was even approximately correct could only be answered by direct mechanical measurement of the living basilar membrane — an extraordinarily difficult experimental problem given the membrane's location deep within the temporal bone.

Methods: Békésy worked with fresh human and animal cochleae, painstakingly dissecting temporal bones under a microscope and coating the basilar membrane with silver particles to make its motion visible under stroboscopic illumination. He drove the cochlea with pure tones at various frequencies and measured the resulting pattern of membrane displacement using optical and later capacitive techniques. His experiments spanned decades and were compiled in the 1960 book, which collects both his methods and his theoretical synthesis.

Key Findings: Békésy demonstrated that sound causes a traveling wave along the basilar membrane, moving from the stiff, narrow base (near the stapes) toward the wide, floppy apex. Each frequency produces a wave that reaches a maximum amplitude at a specific location: high frequencies peak near the base, low frequencies near the apex. This tonotopic organization — the continuous frequency map along the basilar membrane — is the physical foundation of pitch perception. Békésy's work also revealed that the basilar membrane responses he measured were surprisingly broad, a finding that later work would show reflects the passive, damped mechanics of post-mortem tissue (living cochleae exhibit much sharper tuning via active outer hair cell amplification).
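
The tonotopic map that Békésy established is often summarized with the Greenwood place-frequency function, a later empirical fit (Greenwood, 1990) rather than a formula from Békésy's own book. A minimal sketch:

```python
# Greenwood place-frequency function for the human cochlea (Greenwood, 1990):
#   f(x) = 165.4 * (10**(2.1 * x) - 0.88)
# where x is the fractional distance along the basilar membrane measured
# from the apex (x = 0) to the base (x = 1).

def greenwood_hz(x):
    """Characteristic frequency (Hz) at fractional position x from the apex."""
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f} -> {greenwood_hz(x):8.0f} Hz")
# Low frequencies map to the apex, high frequencies to the base: the
# frequency-to-place mapping that the traveling-wave measurements revealed.
```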

Limitations: Because his measurements were made on cadaveric or physiologically compromised tissue, Békésy systematically underestimated the sharpness of frequency tuning. The active cochlear amplifier — driven by prestin-powered outer hair cells — was not discovered until decades later and explains the much narrower tuning seen in healthy ears.

Textbook Chapters: Ch. 4 (The Mechanical Ear), Ch. 5 (Frequency, Pitch, and the Cochlea), Ch. 14 (The Frequency Domain and the Ear).

Why It Matters: Békésy's work established that the cochlea is, among other things, a physical spectrum analyzer — a mechanical Fourier transformer that maps frequency to place. This insight unifies the physics of vibration with the neuroscience of hearing at the most fundamental level.


Study 3

Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38(4), 548–560.

Background: Since Helmholtz, theorists had proposed that musical consonance — the sense of stability or pleasantness between two simultaneously sounded pitches — was related to the absence of beating between their harmonics. But this explanation had remained largely qualitative. Plomp and Levelt sought a rigorous psychophysical formulation by asking: what is the relationship between the frequency separation of two pure tones and the listener's sense of consonance or dissonance?

Methods: Listeners judged the consonance/dissonance of pairs of pure tones across a range of frequency separations, from unison to a full octave above. Critically, the experiment used pure (sinusoidal) tones rather than complex tones with harmonics, isolating the role of the critical band from the interference of overtone series. The experiment was repeated at several center frequencies. Listeners rated each interval on a scale from "most consonant" to "most dissonant."

Key Findings: Consonance ratings were not uniform across the interval. Roughness rose from near zero at unison as the tones began to beat, reached a maximum when the tones were separated by roughly one quarter of the critical bandwidth — the point of maximum beating roughness — and then declined, smoothing to a plateau of consonance once the separation exceeded the critical bandwidth. This result tied musical consonance directly to the physics of cochlear filtering: intervals sound consonant when their partials fall in different critical bands and do not excite the same region of the basilar membrane simultaneously. Musical intervals like the fifth and octave are consonant precisely because their harmonic series avoid critical-band overlap.
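
A convenient way to visualize this curve is Sethares's (1993) analytic fit to the Plomp–Levelt data. The constants below are Sethares's fitted values, not figures reported in the 1965 paper:

```python
import math

def dissonance(f1, f2, a1=1.0, a2=1.0):
    """Sensory dissonance of two pure tones (Sethares's fit to Plomp-Levelt)."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_lo + 19.0)  # scales the curve to the local critical band
    d = f_hi - f_lo
    return a1 * a2 * (math.exp(-3.5 * s * d) - math.exp(-5.75 * s * d))

# Roughness is zero at unison, peaks at a separation of roughly a quarter
# of a critical bandwidth, then decays toward a consonant plateau.
for semitones in (0, 1, 2, 4, 7, 12):
    f2 = 440.0 * 2 ** (semitones / 12)
    print(f"{semitones:2d} semitones above A4: {dissonance(440.0, f2):.3f}")
```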

Limitations: The experiment used pure tones, and real musical tones are complex. The translation from pure-tone results to the consonance of musical intervals depends on additional assumptions about how partial-by-partial interactions sum to an overall sensation.

Textbook Chapters: Ch. 6 (Consonance, Dissonance, and the Physics of Harmony), Ch. 7 (Intervals and the Harmonic Series), Ch. 15 (Timbre and Spectrum).

Why It Matters: Plomp and Levelt provided the first rigorously psychophysical account of consonance, replacing vague appeals to "simplicity of ratios" with a mechanistic account rooted in cochlear processing. The paper is still routinely cited and remains foundational to computational models of harmony.


Study 4

Zwicker, E., & Fastl, H. (1990). Psychoacoustics: Facts and Models. Berlin: Springer-Verlag. (Multiple editions through 2007.)

Background: By the mid-twentieth century, the concept of the auditory filter — a frequency-selective channel corresponding to a region of the basilar membrane — had become central to hearing science. But a coherent synthesis of auditory filter shapes, masking patterns, loudness models, and their engineering applications remained scattered across dozens of papers. Zwicker spent decades systematically measuring and modeling auditory filters, culminating in the textbook that bears both his name and that of his collaborator Hugo Fastl.

Methods: The book synthesizes Zwicker's experimental work, which relied primarily on masking experiments: presenting a target tone in the presence of a masker noise of varying bandwidth and measuring the threshold of detection. By manipulating the masker bandwidth, researchers can infer the shape and width of the auditory filter at each center frequency. Zwicker formalized the concept of the "critical band" — the filter bandwidth beyond which adding more masking energy no longer raises threshold — and introduced the "Bark scale," a frequency scale where one unit (one Bark) corresponds roughly to one critical bandwidth.

Key Findings: The human auditory system contains approximately 24 critical bands spanning the audible range from 20 Hz to 16 kHz. Below about 500 Hz, critical bands are approximately 100 Hz wide; above 500 Hz, they scale proportionally to center frequency (about 20% of center frequency). The Bark scale provides a perceptually uniform frequency axis. Zwicker also developed a loudness model — the "Zwicker loudness" model, later standardized as ISO 532B — that predicts perceived loudness from the distribution of energy across critical bands.
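
For readers who want the Bark scale in computable form, the widely used Zwicker–Terhardt (1980) analytic approximation is sketched below; it approximates the published critical-band table rather than expressing an exact psychophysical law:

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt 1980 approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

for f in (100, 500, 1000, 4000, 16000):
    print(f"{f:6d} Hz -> {hz_to_bark(f):5.2f} Bark")
# The audible range spans roughly 24 Bark, matching the ~24 critical
# bands described above; 1000 Hz falls near 8.5 Bark.
```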

Limitations: Critical band widths estimated by masking experiments do not precisely match filter widths estimated by other methods (notched-noise masking, auditory brainstem responses). The Bark scale is a useful approximation rather than a precise psychophysical law. Active cochlear processing complicates the mapping between filter bandwidths and membrane mechanics.

Textbook Chapters: Ch. 4 (The Mechanical Ear), Ch. 5 (Frequency, Pitch, and the Cochlea), Ch. 14 (The Frequency Domain and the Ear).

Why It Matters: The critical band framework is the workhorse of audio engineering, underlying perceptual audio codecs (MP3, AAC), hearing aid design, and psychoacoustic models of timbre and consonance. It provides the bridge between the physics of cochlear mechanics and the psychology of hearing.


Study 5

Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.

Background: In everyday listening, the auditory system performs a remarkable feat: it separates the sonic world into discrete streams — the voice of a friend, the traffic outside, the hum of an air conditioner — despite the fact that the eardrums receive only a single time-varying pressure waveform that is the sum of all these sources. Bregman called this problem "auditory scene analysis" and spent over two decades investigating the perceptual principles that allow the brain to solve it, using carefully designed auditory demonstrations and controlled psychoacoustic experiments.

Methods: Bregman's book summarizes hundreds of experiments, many of his own devising, involving alternating tone sequences, simultaneous complex tones, and carefully crafted "auditory ambiguity" displays. A typical paradigm: listeners hear a sequence of tones alternating between two frequencies (ABA–ABA–) at various rates and frequency separations. Depending on these parameters, listeners hear either a single galloping stream or two separate streams — one high, one low. Bregman documented the conditions under which streams split or fuse and articulated the Gestalt-like principles underlying these segregation effects.
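
The ABA paradigm is easy to reconstruct in code, as the sketch below shows; the durations and frequencies are illustrative choices, not Bregman's exact parameters:

```python
import numpy as np

SR = 44100                # sample rate (Hz)
TONE_DUR = 0.1            # 100 ms per tone
F_A, F_B = 500.0, 800.0   # wider A-B separations and faster rates favor two streams

def tone(freq, dur=TONE_DUR, sr=SR):
    t = np.arange(int(sr * dur)) / sr
    # 10 ms linear ramps at onset and offset avoid audible clicks
    env = np.minimum(1.0, np.minimum(t, dur - t) / 0.01)
    return env * np.sin(2 * np.pi * freq * t)

silence = np.zeros(int(SR * TONE_DUR))
triplet = np.concatenate([tone(F_A), tone(F_B), tone(F_A), silence])
sequence = np.tile(triplet, 10)  # ABA- ABA- ... heard as one galloping
                                 # stream or as two separate streams
```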

Key Findings: Auditory streaming is governed by principles analogous to Gestalt principles in vision: proximity (tones close in frequency or time tend to group), similarity (similar timbres stream together), continuity (a tone continuing through noise tends to be perceived as uninterrupted), and common fate (components that change together belong together). Bregman demonstrated that these principles reflect evolved strategies for solving the "cocktail party problem" — separating concurrent sound sources in natural environments. The principles also explain perceptual illusions in music, such as the way Bach's solo violin partitas create the illusion of multiple voices through a single melodic line.

Limitations: The laboratory paradigms Bregman used are simpler than real acoustic scenes, and the mapping from experimental results to everyday listening involves extrapolation. The neural mechanisms underlying stream segregation were not well characterized at the time of publication, though subsequent neuroimaging work has identified brainstem and cortical correlates.

Textbook Chapters: Ch. 9 (Auditory Streaming and Musical Counterpoint), Ch. 11 (Rhythm, Meter, and Temporal Grouping), Ch. 20 (Polyphony and the Physics of Multiple Voices).

Why It Matters: Bregman's framework revealed that musical listening is not passive reception but active perceptual construction — the auditory brain is always solving an inverse problem, inferring sources from signals. This insight bridges the physics of sound with the neuroscience and psychology of musical experience.


SECTION II: MUSIC PERCEPTION

Study 6

Deutsch, D. (1974). An auditory illusion. Nature, 251, 307–309.

Background: The octave illusion is one of the most striking demonstrations that auditory perception is not a simple readout of the physical stimulus. Diana Deutsch discovered it while investigating how the brain integrates information from the two ears. The question at stake was whether the auditory system treats the two ears as independent channels or as components of a unified perceptual system that assigns sounds to locations in space.

Methods: Listeners heard a continuous dichotic sequence over headphones: one ear received a high tone (800 Hz) while the other ear simultaneously received a low tone (400 Hz) an octave below, with the two tones swapping ears on each alternation. The physical stimulus is symmetric and ambiguous. Listeners reported what they heard.
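
The stimulus is straightforward to reconstruct. A minimal sketch follows, assuming 250 ms tones with no gaps; treat the exact timing as illustrative rather than a faithful reproduction of Deutsch's apparatus:

```python
import numpy as np

SR = 44100
DUR = 0.25  # 250 ms per tone

def tone(freq):
    t = np.arange(int(SR * DUR)) / SR
    return np.sin(2 * np.pi * freq * t)

hi, lo = tone(800.0), tone(400.0)
# Each ear always receives a tone; the two ears are an octave apart,
# and the high/low assignment swaps on every alternation.
left = np.concatenate([hi, lo] * 20)
right = np.concatenate([lo, hi] * 20)
stereo = np.stack([left, right], axis=1)  # (samples, 2) for headphone playback
```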

Key Findings: Most right-handed listeners reported hearing a single tone that alternated in pitch — high, low, high, low — with the high tones localized to the right ear and the low tones to the left. The high tone seemed to be in the right ear regardless of which ear actually received it. Left-handers showed more variable patterns. This "octave illusion" demonstrates that the brain resolves ambiguous binaural information according to a dominant-ear bias and that perceived pitch and perceived location can diverge dramatically from the physical signal. The finding revealed that auditory localization and pitch perception interact in complex ways.

Limitations: The illusion is highly dependent on individual differences in handedness and, presumably, hemispheric dominance. The precise neural locus of the illusion — whether it arises from binaural brainstem processing or higher cortical mechanisms — was not determined in this study.

Textbook Chapters: Ch. 10 (Auditory Illusions and Perceptual Construction), Ch. 22 (Pitch and the Brain).

Why It Matters: The octave illusion demonstrates that what we hear is not what is physically present but what the brain constructs — a theme that runs throughout the textbook's treatment of music perception.


Study 7

Shepard, R. N. (1964). Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36(12), 2346–2353.

Background: Pitch is commonly described as a linear dimension: higher or lower. But music theorists have long noted that pitches separated by an octave share a peculiar quality — they seem to be "the same note" in a different register. Shepard was intrigued by whether this circular quality of pitch could be made perceptually dominant, producing a sequence that endlessly ascends (or descends) without ever arriving anywhere — a sonic analog of Escher's impossible staircase.

Methods: Shepard synthesized tones consisting of ten simultaneous sinusoidal partials separated by octaves, with their amplitudes shaped by a fixed bell-curve spectral envelope. By holding the envelope fixed and shifting the frequencies of the underlying partials upward by a semitone at each step, he created a sequence of tones that appear to rise in pitch while cycling through the chromas and returning to the starting configuration after twelve steps. The experiment tested listeners' judgments of which tone in a pair was higher.
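
The synthesis recipe can be sketched directly; the envelope center and width below are illustrative assumptions, not Shepard's published values:

```python
import numpy as np

SR, DUR = 44100, 0.5

def shepard_tone(pitch_class, n_partials=10, f0=20.0):
    """One Shepard tone: octave-spaced partials under a fixed spectral envelope."""
    t = np.arange(int(SR * DUR)) / SR
    out = np.zeros_like(t)
    for k in range(n_partials):
        f = f0 * 2.0 ** (k + pitch_class / 12.0)  # octave-spaced partials
        # Fixed bell curve over log-frequency: partials fade in at the bottom
        # and out at the top, hiding the octave wrap-around.
        amp = np.exp(-0.5 * ((np.log2(f) - np.log2(440.0)) / 1.5) ** 2)
        out += amp * np.sin(2 * np.pi * f * t)
    return out / np.max(np.abs(out))

# Stepping the pitch class 0, 1, ..., 11, 0, 1, ... sounds endlessly ascending.
scale = np.concatenate([shepard_tone(pc) for pc in list(range(12)) * 3])
```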

Key Findings: Listeners judged a "Shepard tone" ascending sequence as continuously rising, even though after twelve steps the sequence had returned to its starting point. The tones are perceived as higher or lower based on their chroma (position within the octave), not on their physical frequency relationship. This demonstrates that pitch has two quasi-independent dimensions: height (the linear dimension) and chroma (the circular dimension). Musical scales exploit the chroma dimension; the Shepard tone isolates it.

Limitations: The effect depends critically on the spectral envelope remaining fixed (masking the absence of any overall spectral shift). Listeners who attend analytically to individual partials can sometimes hear the trick. The effect also depends on cultural familiarity with the chromatic scale.

Textbook Chapters: Ch. 5 (Frequency, Pitch, and the Cochlea), Ch. 10 (Auditory Illusions and Perceptual Construction), Ch. 21 (Scales, Tuning, and the Structure of Musical Space).

Why It Matters: Shepard tones demonstrated that pitch is not one-dimensional, inspiring decades of computational and neural modeling of pitch as a helix or torus — with profound implications for understanding why the octave is musically special.


Study 8

Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press.

Background: A central puzzle in music psychology is why music evokes emotion. One influential answer is that musical emotions arise from the brain's predictions about what comes next and the pleasures or tensions of having those predictions confirmed or violated. Huron's book synthesizes two decades of his own research with findings from behavioral ecology, evolutionary psychology, and neuroscience to propose a comprehensive theory of musical expectation — the ITPRA framework.

Methods: The book presents a series of interrelated experiments on melodic expectation, including studies in which listeners rate the "goodness" of melodic continuations, studies using reaction-time methods to assess priming of expected versus unexpected continuations, cross-cultural comparisons of melodic expectations, and analysis of statistical regularities in musical corpora. Key experiments use Narmour's implication-realization model as a foil and baseline.

Key Findings: Huron proposes that the brain's response to an anticipated event unfolds in five stages: Imagination (anticipatory feelings), Tension (physiological arousal during anticipation), Prediction (a fast, pre-conscious assessment of whether the outcome matched expectation), Reaction (an immediate affective response), and Appraisal (a slower cognitive evaluation). Musical pleasure arises partly from the satisfaction of accurate prediction and partly from the controlled violation of expectation — the "sweet" anticipation of the title. Huron shows that musical styles have evolved to exploit and manipulate these expectation mechanisms, and that the pleasures of music are tied to the same biological reward systems that reinforce learning.

Limitations: The ITPRA framework is a theoretical synthesis, and direct experimental tests of all five stages are incomplete. Distinguishing prediction from reaction in rapid musical sequences requires methodology that was only beginning to mature at time of publication. Cross-cultural generalizability of melodic expectation norms was not fully established.

Textbook Chapters: Ch. 12 (Musical Expectation and Tension), Ch. 26 (Music and Emotion), Ch. 32 (Musical Form and the Brain).

Why It Matters: Huron's framework provides the most comprehensive account of musical emotion grounded in evolutionary and cognitive science, offering a bridge between the abstract physics of sound and the deeply felt human experience of music.


Study 9

Sloboda, J. A. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music, 19(2), 110–120.

Background: By the late 1980s, researchers had documented that music reliably evokes emotional responses, but systematic analysis of which structural features of music trigger which physical manifestations of emotion was limited. Sloboda sought to identify the specific musical structures — melodic, harmonic, rhythmic — associated with distinct physiological responses such as tears, shivers, and "rushes of excitement."

Methods: A large sample of music listeners (n = 83) completed a retrospective questionnaire identifying pieces of music that had produced specific physical responses: tears, shivers (frisson), racing heart, laughter, lump in throat. Participants then identified the specific moments in those pieces that triggered each response. Musical analysis of the identified passages was then conducted to determine common structural features.

Key Findings: Each physical response was associated with a distinct set of structural features. Tears were associated with melodic appoggiaturas (leaning notes that resolve downward), harmonic sequences descending through the circle of fifths, and passages of unexpected harmonic richness. Shivers (frisson) were associated with sudden changes in harmony, register, or texture — particularly unexpected entries of new voices or sudden dynamic shifts. Racing heart was associated with syncopation and rhythmic acceleration. These results suggest that specific acoustic and structural features reliably map onto specific emotional-physiological pathways, even across diverse musical genres.

Limitations: The retrospective questionnaire method depends on participants' accurate recollection of which musical moment produced the response, which may be imprecise. The sample was self-selected music enthusiasts, limiting generalizability to less engaged listeners. No physiological measurements were taken; all reports were subjective.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 28 (Tension, Release, and Musical Drama), Ch. 36 (The Neuroscience of Musical Beauty).

Why It Matters: Sloboda's work provided the first systematic empirical mapping from musical structure to emotional-physiological response, anchoring the study of musical emotion in concrete, analyzable features rather than vague impressions.


Study 10

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6(7), 688–691.

Background: Cognitive neuroscientists have long debated whether the brain contains specialized, encapsulated modules for specific cognitive functions, or whether cognition arises from more general-purpose processing networks. Music offers a particularly interesting test case, because musical abilities seem to be selectively impaired by brain injury in some patients and selectively preserved in others (e.g., in patients with profound language deficits). Peretz and Coltheart synthesized neuropsychological case studies to propose a modular architecture for music processing.

Methods: The paper reviews neuropsychological case studies of patients with selective musical deficits — including cases of acquired amusia (loss of music perception following brain injury without loss of other auditory or cognitive abilities) — and cases of preserved musical ability despite severe language or general cognitive impairment. The review draws on Peretz's own patient series as well as published cases in the literature, using the modularity criteria of selective impairment and selective sparing.

Key Findings: Peretz and Coltheart propose a modular architecture with distinct processing components for tonal organization (scale analysis, contour analysis), temporal organization (rhythm, meter), and emotional response to music. These components are doubly dissociable: some patients lose tonal processing while retaining temporal processing, and vice versa. The emotional response module appears to be partially separable from the perceptual modules. This architecture suggests that music processing is neurologically specialized — that it is not merely a by-product of general auditory or linguistic processing.

Limitations: Neuropsychological cases reveal what can be selectively damaged but do not definitively establish that intact processing is modular. Double dissociations in brain-damaged patients are subject to multiple interpretations. The paper is explicitly theoretical and proposes a model rather than reporting new experiments.

Textbook Chapters: Ch. 23 (The Architecture of Musical Cognition), Ch. 25 (Melody and Harmony in the Brain), Ch. 33 (Musical Aptitude and Musical Development).

Why It Matters: The modular model of music processing has driven two decades of neuroimaging research and continues to frame debates about the evolutionary origins of music and its relationship to language.


SECTION III: MUSIC AND EMOTION

Study 11

Goldstein, A. (1980). Thrills in response to music and other stimuli. Physiological Psychology, 8(1), 126–129.

Background: Before Goldstein, the subjective experience of "thrills" or chills in response to music was largely anecdotal — mentioned in letters and memoirs of composers and listeners but never systematically studied. Goldstein recognized that these thrills represented a measurable physiological response to aesthetic experience and that their pharmacological manipulation could reveal something about the neurobiological substrate of musical emotion. The key question: are musical thrills mediated by the opioid system?

Methods: In a within-subjects design, Goldstein administered either naloxone (an opioid antagonist that blocks endorphin receptors) or a placebo to participants before they listened to music they had previously identified as thrilling. Participants rated the frequency and intensity of thrills during listening sessions under each drug condition. The logic was that if thrills are mediated by endorphins, blocking opioid receptors should reduce them.

Key Findings: Naloxone significantly reduced the frequency of thrills in approximately half of participants — those who reported frequent, intense thrills. In the other participants, naloxone had no significant effect. This suggests that at least a subpopulation of "thrill-prone" listeners experience musically-induced chills via an endorphinergic mechanism. It was the first experimental evidence that music engages reward neurochemistry — prefiguring the dopamine findings of Salimpoor et al. thirty years later.

Limitations: The sample was very small (ten participants), and the effect was found in only a subset of them. Naloxone is not perfectly selective for any single neurochemical pathway. The study predated modern neuroimaging and could not localize the effect anatomically.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 36 (The Neuroscience of Musical Beauty).

Why It Matters: Goldstein's paper established frisson as a legitimate object of scientific inquiry and opened the investigation of music's relationship to the brain's reward systems — a research program that would eventually encompass dopamine, opioids, and the default mode network.


Study 12

Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience, 14(2), 257–264.

Background: Dopamine is the neurotransmitter most closely associated with reward, motivation, and the anticipation of pleasure. It had been extensively studied in the context of food, sex, and addictive substances. Salimpoor and colleagues asked whether music — an abstract, non-biological stimulus — could engage the dopamine system in the same way, and whether the neurochemistry of anticipating a musical climax might differ from the neurochemistry of experiencing one.

Methods: Participants who reliably experienced chills (frisson) to music were selected. In two separate experiments, they listened to intensely pleasurable music while undergoing (1) positron emission tomography (PET) with a radioligand selective for dopamine D2 receptors, and (2) functional MRI. Skin conductance response and subjective ratings of chills were recorded concurrently. PET allowed measurement of actual dopamine release in specific brain regions; fMRI allowed measurement of blood-oxygen-level-dependent (BOLD) signal correlates.

Key Findings: Dopamine release occurred in two anatomically distinct regions at different times. During the anticipatory phase — the moments of building tension before a musical climax — dopamine release was greatest in the caudate nucleus, a region associated with anticipation of reward. During the experience of the climax itself, dopamine release shifted to the nucleus accumbens, a region associated with the hedonic experience of reward. This dissociation shows that music engages the full dopamine reward cycle — desire followed by consummation — not merely a general state of arousal. Skin conductance (an objective measure of physiological arousal) correlated directly with subjective ratings of chills and with dopamine release.

Limitations: Only participants who reliably experience chills were studied, so the findings may not generalize to listeners who find music pleasurable without chills. The selection of music was individual — each participant chose their own most emotionally powerful pieces — making cross-participant comparison difficult. The radioligand used (raclopride) measures displacement by endogenous dopamine indirectly.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 36 (The Neuroscience of Musical Beauty), Ch. 38 (Music, Reward, and Addiction).

Why It Matters: This study definitively established that music engages the dopaminergic reward system and provided a neurochemical mechanism for the most intense forms of musical pleasure, placing music alongside food and sex as a biological reward stimulus.


Study 13

Juslin, P. N., & Sloboda, J. A. (Eds.) (2001). Music and Emotion: Theory and Research. Oxford: Oxford University Press. (BRECVEMA framework developed by Juslin, 2013, in subsequent publications.)

Background: By 2001, the study of music and emotion was flourishing but fragmented. Different researchers emphasized different mechanisms — conditioning, cognitive appraisal, contagion — without a unifying framework. Juslin's theoretical contribution, developed across multiple publications and synthesized in the BRECVEMA framework, proposed that music evokes emotion through multiple distinct psychological mechanisms operating simultaneously.

Methods: The BRECVEMA framework is a theoretical synthesis drawing on experimental psychology, psychophysiology, neuroscience, and music analysis. Individual mechanisms were investigated through separate lines of experimental evidence: e.g., brainstem reflexes were studied using psychophysical methods; episodic memory was studied using autobiographical recall paradigms; rhythmic entrainment was studied using synchronization tasks.

Key Findings: BRECVEMA identifies eight mechanisms through which music evokes emotion: Brainstem Reflex (rapid, pre-attentive responses to acoustic features like loud sudden sounds), Rhythmic Entrainment (synchronization of body rhythms to musical pulse), Evaluative Conditioning (associations formed through co-occurrence with emotional events), Contagion (perceiving and mirroring the emotional expression of music), Visual Imagery (music evoking mental images with emotional content), Episodic Memory (music triggering autobiographical memories), Musical Expectancy (emotions arising from confirmation or violation of melodic and harmonic expectations), and Aesthetic Judgment (evaluative emotions arising from appreciation of musical craft). Each mechanism has distinct neural substrates, developmental trajectories, and cross-cultural profiles.

Limitations: The BRECVEMA taxonomy is post-hoc and descriptive rather than derived from a single theoretical framework. Distinguishing between mechanisms in real-time listening is methodologically difficult. The relative weight of each mechanism in a given listening episode cannot easily be quantified.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 27 (Why Music Moves Us), Ch. 37 (Music in Everyday Life).

Why It Matters: BRECVEMA is the most comprehensive taxonomy of music-emotion mechanisms available and provides a roadmap for integrating disparate lines of research into a coherent picture of how music emotionally affects listeners.


Study 14

Huron, D., & Margulis, E. H. (2010). Musical expectancy and thrills. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of Music and Emotion (pp. 575–604). Oxford: Oxford University Press.

Background: If musical emotion arises partly from expectation and its violations, what specific expectation-related events trigger the most intense physical responses — the chills and thrills reported by music listeners? Huron and Margulis synthesized theoretical and empirical work to propose a detailed account of how expectation violations produce frisson, drawing on Huron's ITPRA framework and Sloboda's structural analysis.

Methods: The chapter synthesizes experimental studies, corpus analyses, and theoretical arguments. Key empirical anchors include Sloboda's structural analysis of emotion-evoking passages, reaction-time studies of melodic expectation, and analysis of physiological (skin conductance, heart rate) responses to music during controlled listening experiments. The chapter develops formal predictions about which types of expectation violation should be most frisson-inducing.

Key Findings: Frisson is most reliably triggered by events that are both unexpected and large in magnitude — sudden entries of a new voice or instrument, unexpected harmonic shifts to remote keys, abrupt changes in register or dynamics, or moments where a long-anticipated event finally arrives. The intensity of the chills depends on the tension accumulated during anticipation — consistent with Huron's "Imagination" and "Tension" stages of ITPRA. The chapter also notes that frisson can be triggered by exceptionally beautiful confirmation of expectations, suggesting that the system responds to "super-optimal" outcomes, not just surprises.

Limitations: The chapter is a theoretical synthesis, and many of its specific predictions await direct experimental testing. The measurement of "expectation magnitude" in real music remains methodologically challenging.

Textbook Chapters: Ch. 12 (Musical Expectation and Tension), Ch. 26 (Music and Emotion), Ch. 36 (The Neuroscience of Musical Beauty).

Why It Matters: Huron and Margulis's synthesis connects the abstract mathematics of information theory (prediction and surprise) to the visceral physical experience of musical chills, illustrating the depth of the connection between physics-like predictive processing and musical emotion.


Study 15

Eerola, T., & Vuoskoski, J. K. (2011). A comparison of the discrete and dimensional models of emotion in music. Psychology of Music, 39(1), 18–49.

Background: Two major traditions compete in the scientific study of emotion: the "discrete" tradition, which holds that emotions fall into a small number of distinct categories (joy, sadness, fear, anger), and the "dimensional" tradition, which represents emotions as points in a continuous multidimensional space (most commonly defined by valence and arousal). Music provides an unusually tractable domain for comparing these models, since emotional ratings of music can be collected efficiently and reliably. Eerola and Vuoskoski subjected a large set of film music excerpts to systematic comparison of four competing models.

Methods: A large sample of participants (N = 116) rated 360 film music excerpts on a battery of emotional scales including discrete emotion categories and dimensional ratings (valence, arousal, tension). Statistical modeling compared the fit and predictive validity of four models: basic emotion categories, dimensional (circumplex), Tellegen-Watson-Clark's two-dimensional model, and Geneva Emotional Music Scales (GEMS). The film music corpus was chosen for its diversity and its well-established emotional associations.

Key Findings: No single model won decisively across all criteria. Discrete models predicted specific emotional responses (e.g., ratings of "scary" or "joyful") more accurately than dimensional models. Dimensional models provided a better description of the overall structure of emotional responses across the corpus. A hybrid approach — using dimensional models to organize the space and discrete emotion categories to label regions — provided the best overall account. The GEMS (Geneva Emotional Music Scales) — designed specifically for music with categories like "wonder," "tenderness," "nostalgia" — outperformed general-purpose discrete emotion models, suggesting that music evokes some emotions not well captured by standard taxonomies.

Limitations: The corpus was film music, which has been deliberately composed to evoke emotion in specific contexts; results may not generalize to absolute music. Ratings reflect perceived emotion as much as felt emotion — listeners may recognize that a piece is sad without feeling sad themselves.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 27 (Why Music Moves Us).

Why It Matters: The study highlights that music occupies a unique emotional space — one that partially overlaps with, but is not fully captured by, general models of human emotion — suggesting that musical experience involves emotion categories shaped by the acoustic and structural properties of music itself.


SECTION IV: MUSIC UNIVERSALS

Study 16

Mehr, S. A., Singh, M., Knox, D., Ketter, D. M., Pickens-Jones, D., Atwood, S., ... & Glowacki, L. (2019). Universality and diversity in human song. Science, 366(6468), eaax0868.

Background: The question of whether music is a human universal — present in all known societies with shared structural features — had been debated for decades without rigorous empirical resolution. Ethnomusicologists and evolutionary theorists held opposing views, and existing cross-cultural studies were limited to convenience samples of Western and a handful of non-Western societies. The Natural History of Song project sought to resolve this debate using a systematic, pre-registered study of music across sixty societies.

Methods: The research team compiled a corpus of ethnographic recordings from sixty societies spanning all inhabited continents, sampled systematically from the Human Relations Area Files. In two experiments, online listeners (from the US, India, and other countries) rated recordings for perceived function (dance, lullaby, healing, love) and for specific musical features. A third study had expert ethnomusicologists code musical features from the recordings. The project was pre-registered and used large participant samples (n > 5,000 in the online studies).

Key Findings: Across all sixty societies, music was found to be a universal human behavior. Listeners — even those unfamiliar with a given musical tradition — could reliably identify the behavioral context of recordings (whether a song was a dance song, a lullaby, a healing song, or a love song) at above-chance rates, suggesting that certain acoustic features reliably signal social context across cultures. Songs associated with infant care, dance, love, and healing were found in all sixty societies. Within each functional category, songs showed convergent acoustic features (e.g., lullabies are slower, softer, and have narrower pitch range than dance songs) across unrelated cultures, providing evidence for universal structural features shaped by function.

Limitations: The corpus represents ethnographic recordings, which may over-represent performative contexts and under-represent everyday musical practices. Online rating experiments by non-specialist listeners may miss subtle within-tradition emotional distinctions. The study demonstrates universality at the level of functional categories but cannot resolve debates about universality at the level of specific scales, intervals, or tuning systems.

Textbook Chapters: Ch. 1 (Why Music? The Evolutionary Question), Ch. 29 (Music Across Cultures), Ch. 30 (Is Music Universal?).

Why It Matters: The Natural History of Song study provides the strongest empirical case to date for musical universals — and thus for the hypothesis that music is a biological adaptation rather than a purely cultural invention — while simultaneously documenting the striking diversity of musical practices across human societies.


Study 17

Savage, P. E., Brown, S., Sakai, E., & Currie, T. E. (2015). Statistical universals reveal the structures and functions of human music. Proceedings of the National Academy of Sciences, 112(29), 8987–8992.

Background: Rather than asking whether music is present in all cultures (which is well established), Savage and colleagues asked whether specific musical features — particular scale structures, rhythmic patterns, or social contexts — are statistically overrepresented across cultures relative to chance. The goal was to identify the structural building blocks of a universal musical grammar.

Methods: The team analyzed a sample of 304 recordings drawn from sixty world cultures using the Cantometrics coding system, which rates recordings on over forty variables including melodic range, rhythmic regularity, group performance style, and social context. Statistical tests identified features that appeared across cultures at rates significantly above or below chance, and phylogenetic comparative methods tested whether the patterns reflect common descent or convergent evolution.

Key Findings: Several features were found to be statistical universals: music is predominantly performed in groups rather than by individuals; it occurs in social contexts (not in isolation); it is metrically regular (even in cultures with complex polyrhythmic styles, an underlying pulse can be detected); melodic intervals tend to cluster near simple frequency ratios. The association of music with group activity was particularly robust, supporting the "social bonding" theory of music's evolutionary origins. Features that vary widely across cultures (specific scale systems, tonal languages, tuning) appear to be culturally specific rather than biologically grounded.

Limitations: The Cantometrics coding system, though widely used, was developed in the 1960s and reflects some of the theoretical assumptions of that era. Coding reliability can be an issue for complex musical traditions. The sample, though large for this type of study, is not fully representative of all world musical traditions.

Textbook Chapters: Ch. 29 (Music Across Cultures), Ch. 30 (Is Music Universal?), Ch. 1 (Why Music? The Evolutionary Question).

Why It Matters: By distinguishing statistical universals from absolute universals, Savage et al. provided a nuanced empirical framework for the nature-nurture debate in music — showing that some features of music are universal because they are functional, while others are culturally constructed.


Study 18

Nettl, B. (1983). The Study of Ethnomusicology: Twenty-Nine Issues and Concepts. Urbana: University of Illinois Press.

Background: Bruno Nettl was widely regarded as the dean of North American ethnomusicology, and this book represents his mature synthesis of the field's central problems and debates. Written as a guide for graduate students and advanced undergraduates, it addresses fundamental questions about what music is, how it can be studied across cultures, and what, if anything, all music has in common. It serves as a corrective to the bias toward Western art music that has historically dominated music theory and psychology.

Methods: The book is theoretical and argumentative rather than a report of empirical experiments. Nettl draws on his fieldwork in Iran, Native North American music, and Central European traditions to illustrate his arguments. Each chapter addresses a specific issue or concept in ethnomusicology, from notation and transcription to the ethics of fieldwork and the concept of musical change. The relevant chapters for this textbook involve his arguments about musical universals and the diversity of musical cultures.

Key Findings: Nettl argues that while music is present in all known human societies, the specific features that Western observers tend to treat as universal (harmony, the major-minor system, regular meter, notation) are in fact culture-specific. At the same time, he identifies several features that do appear across cultures: music is always distinguished from speech; it is always embedded in social contexts and used for social purposes; it involves some form of organized pitch and time; and it is always learned, not innate. Nettl is cautious about evolutionary claims and emphasizes that cross-cultural comparison must be grounded in deep familiarity with individual traditions.

Limitations: As a theoretical argument drawing on selected examples, the book does not offer systematic quantitative evidence for or against universals. Nettl's skepticism about evolutionary approaches reflects the disciplinary norms of anthropology more than a comprehensive engagement with the biological literature.

Textbook Chapters: Ch. 29 (Music Across Cultures), Ch. 30 (Is Music Universal?), Ch. 2 (What Is Music?).

Why It Matters: Nettl's work is essential reading for anyone who wants to think rigorously about what music is and whether any of its properties are universal — providing a humanistic and ethnographic counterweight to the more experimentally inclined perspectives in the rest of this appendix.


Study 19

Blacking, J. (1973). How Musical Is Man? Seattle: University of Washington Press.

Background: John Blacking was a British ethnomusicologist and anthropologist who spent years studying the Venda people of South Africa. His 1973 book — based on his Jessie and John Danz Lectures — is one of the most provocative short books ever written about the nature of music. Its central argument: musical ability is a species-wide human capacity, not the specialized gift of a talented few, and the Western division of people into "musical" and "unmusical" reflects social ideology rather than biological reality.

Methods: Blacking's argument draws primarily on his ethnographic fieldwork with the Venda, among whom every member of the community participates in sophisticated musical performance. He contrasts this with Western societies in which most adults believe themselves to be unmusical and abstain from musical participation. He also engages critically with the psychoacoustic and psychological literature of his time and with evolutionary accounts of music's origins.

Key Findings: Blacking argues that music is a fundamental human competence, comparable to language: all humans are "musical" in the sense of being capable of musical participation, just as all humans are linguistic in the sense of being capable of language acquisition. What appears as musical talent is largely the result of early musical enculturation and practice, not innate ability limited to a few. He further argues that music's primary social function — and the key to understanding its human universality — is as a form of organized movement and social ritual that coordinates communal experience. The Western concert tradition, with its sharp division between passive audience and active performer, represents an impoverishment of music's social function.

Limitations: Blacking's argument is partly ideological — he was writing against what he saw as the elitism of Western musical culture — and sometimes overstates the uniformity of Venda musical participation. The claim that all humans are equally musical has been contested by subsequent research on musical aptitude and absolute pitch.

Textbook Chapters: Ch. 29 (Music Across Cultures), Ch. 30 (Is Music Universal?), Ch. 33 (Musical Aptitude and Musical Development).

Why It Matters: Blacking's argument anticipates modern research showing that musical ability is more widely distributed than Western cultural assumptions suggest, and his emphasis on music as embodied social practice remains a vital corrective to purely cognitive or acoustic accounts of musical experience.


Study 20

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A. D., & Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology, 19(7), 573–576.

Background: A fundamental question in the study of music and emotion is whether the emotional communication of music is culturally specific — learned through exposure to a particular musical tradition — or whether there are acoustic features of emotional music that are recognized cross-culturally by listeners without exposure to the relevant tradition. Fritz et al. tested this by studying the Mafa — an isolated farming community in the remote mountains of Cameroon with no prior exposure to Western music — using Western musical stimuli.

Methods: Mafa participants and matched Western control participants (university students in Leipzig, Germany) listened to brief excerpts of Western music specifically composed to convey happiness, sadness, or fear. Mafa participants had had no prior exposure to Western music or Western electronic media. Participants identified the intended emotion from a three-way forced choice and rated the pleasantness of each excerpt.

Key Findings: Mafa listeners identified the emotional content of the Western music excerpts at above-chance rates for all three emotion categories (happy, sad, fearful), despite having no prior exposure to Western music. Recognition accuracy was lower than for Western participants but was significantly above chance. Pleasantness ratings also showed some consistency across groups, though Western participants rated happy music as more pleasant and fearful music as less pleasant than Mafa participants did, suggesting that some affective responses to music are learned while the basic emotional recognition is partially universal. The results implicate universal acoustic cues — tempo, mode (major vs. minor), rhythmic regularity, pitch height — in emotional communication.

Limitations: The three-choice forced-selection paradigm may overestimate recognition by limiting response options. The stimuli were composed music rather than ecologically valid excerpts, and the specific acoustic features driving cross-cultural recognition could not be fully disentangled from each other in this design.

Textbook Chapters: Ch. 26 (Music and Emotion), Ch. 29 (Music Across Cultures), Ch. 30 (Is Music Universal?).

Why It Matters: The Mafa study is one of the strongest pieces of evidence for a universal basis to musical emotional communication, suggesting that at least some acoustic features of emotional expression are recognized across radically different cultural contexts.


SECTION V: ABSOLUTE PITCH

Study 21

Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin, 113(2), 345–361.

Background: Absolute pitch (AP) — the ability to identify or produce a musical note without reference to an external standard — has fascinated musicians, scientists, and laypeople for centuries. The ability is rare in the general Western population (estimates range from 1 in 1,500 to 1 in 10,000) and appears to be closely associated with early musical training, but its genetic and developmental bases were unclear. Takeuchi and Hulse's comprehensive review synthesized the available empirical literature.

Methods: Literature review synthesizing studies of AP prevalence across populations, developmental studies of AP acquisition, behavioral studies of AP accuracy and reliability, and neuropsychological and genetic reports. Studies from both Western and East Asian populations were examined, along with studies of AP in people with Williams syndrome, autism, and other neurodevelopmental conditions.

Key Findings: AP appears to require both genetic predisposition and early musical training during a sensitive period (roughly the first six years of life). The ability is more prevalent among individuals who began musical training before age six and among those from tonal-language-speaking backgrounds. AP is not a single homogeneous ability: some AP possessors have precise pitch memory (accurate to within a few cents), while others have approximate AP (accurate to within a semitone). AP possessors show different patterns of brain activation during pitch tasks compared to non-AP individuals, consistent with a difference in processing strategy (labeling vs. relative comparison). AP prevalence is substantially higher in East Asian populations than in Western populations.

Limitations: Prevalence estimates are methodologically heterogeneous, with different studies using different criteria for what counts as AP. Selection biases in studied populations (musicians, conservatory students) make population prevalence hard to estimate. The genetics of AP had not been directly studied at the time of the review.

Textbook Chapters: Ch. 22 (Pitch and the Brain), Ch. 33 (Musical Aptitude and Musical Development), Ch. 34 (The Development of Musical Expertise).

Why It Matters: Absolute pitch is the most extensively studied example of a musical ability with both genetic and developmental components, making it a key test case for the nature-nurture debate in musical cognition.


Study 22

Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and Chinese conservatory students: Prevalence differences and evidence for a speech-link hypothesis. Journal of the Acoustical Society of America, 119(2), 719–722.

Background: Earlier research had suggested that AP was more prevalent among speakers of tonal languages (such as Mandarin or Cantonese, in which word meaning is determined by pitch contour) than among speakers of non-tonal languages (like English). Deutsch and colleagues designed a systematic study to test this hypothesis directly, comparing AP prevalence between matched samples of American and Chinese conservatory students who began training at similar ages.

Methods: Students at conservatories in the United States and China completed a standardized AP test: listening to 36 piano tones and naming each without reference to a standard. Participants reported their language backgrounds, age of onset of musical training, and other relevant variables. Prevalence of AP (defined as correct identification of at least 85% of tones) was compared across groups.
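
A back-of-envelope binomial check (an illustration, not an analysis from the paper) shows how conservative the 85% criterion is relative to guessing among the twelve note names:

```python
from math import comb

# Chance of reaching the criterion by guessing: p = 1/12 per tone, and
# 85% of 36 tones means at least 31 correct.
p, n, k_min = 1 / 12, 36, 31
p_chance = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
print(p_chance)  # on the order of 1e-28: the criterion cleanly excludes guessing
```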

Key Findings: AP prevalence was dramatically higher among Chinese-speaking students than among English-speaking students matched for age of musical training onset. Among students who began training before age five, AP prevalence was approximately 50% in the Chinese-speaking group compared to approximately 14% in the English-speaking group. The difference was not fully explained by age of training onset or intensity of training, supporting the hypothesis that speaking a tonal language during the critical period facilitates the development of AP by maintaining the salience of absolute pitch information in auditory memory.

Limitations: The study is correlational and cannot establish causation — tonal language speaking may be a proxy for other cultural differences in musical enculturation. The sample is confined to conservatory students, who are not representative of the general population. The definition of AP (85% correct) is stringent and excludes partial AP possessors.

Textbook Chapters: Ch. 22 (Pitch and the Brain), Ch. 29 (Music Across Cultures), Ch. 33 (Musical Aptitude and Musical Development).

Why It Matters: The language-AP link is one of the most striking examples of cross-domain effects in auditory cognition, suggesting that the perceptual categories established by language can shape musical perception at a fundamental level.


Study 23

Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics, 56(4), 414–423.

Background: Existing research on absolute pitch had focused on trained musicians with exceptional AP ability. Levitin asked a more radical question: do people who lack AP by conventional criteria — indeed, most people — nonetheless retain precise absolute pitch information in long-term memory for familiar melodies? The methodology was ingenious: rather than testing laboratory-learned tones, he used the pitch memory inherent in familiar popular music.

Methods: Participants (college students, most without formal musical training) were asked to sing or hum familiar pop songs from memory, beginning with the first note of the song and without any external reference. Recordings were analyzed for the starting pitch of each production, and these pitches were compared to the corresponding pitches in the original commercial recordings. Accuracy was measured to the nearest semitone.
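
The semitone metric is a logarithmic function of the frequency ratio between the produced and original pitches. For reference, if an imprecise singer's starting pitch were uniformly distributed across the octave, it would land within one semitone of the target (3 of the 12 semitone bins) on only about 25% of trials. A minimal sketch of the distance computation follows; the function name is ours, not Levitin's:

```python
import math

def semitone_distance(f_produced, f_original):
    """Signed distance in semitones: 12 * log2(f_produced / f_original)."""
    return 12.0 * math.log2(f_produced / f_original)

# A sung starting note of 440 Hz against an original recorded at 415 Hz:
print(round(semitone_distance(440.0, 415.0), 2))   # ~1.01 semitones sharp
```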

Key Findings: Participants produced starting pitches within one semitone of the original recording's pitch on approximately two-thirds of trials — far above the chance rate expected if pitch memory were imprecise. Furthermore, participants were more accurate for pieces they knew very well and listened to frequently, suggesting that long-term memory for familiar music preserves pitch-specific information. This "implicit absolute pitch" does not involve the rapid, automatic labeling seen in true AP possessors but shows that precise pitch information is encoded in long-term memory for familiar music by most people.

Limitations: The study used production (singing) rather than recognition, which introduces motor and vocal range limitations. Participants may have been influenced by the range comfortable for their voice rather than purely by memory. The one-semitone criterion for accuracy is relatively coarse.

Textbook Chapters: Ch. 22 (Pitch and the Brain), Ch. 24 (Memory for Music), Ch. 33 (Musical Aptitude and Musical Development).

Why It Matters: Levitin's study challenged the assumption that precise pitch memory is rare, suggesting instead that most people retain rich, pitch-specific memories for familiar music — with implications for how musical memory is organized in the brain and how music encodes personal and emotional meaning.


SECTION VI: MUSIC AND THE BRAIN

Study 24

Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural substrates. Science, 340(6129), 1468–1473.

Background: By 2013, a decade of neuroimaging research had produced a rich picture of how the brain processes music — which regions analyze pitch and rhythm, which regions integrate musical syntax, and which regions respond to musical emotion. Zatorre and Salimpoor wrote a synthesis review for Science that attempted to draw these threads together into a unified account of how acoustic processing transforms into the experience of musical pleasure.

Methods: The review synthesizes fMRI, PET, and neuropsychological studies of music processing, drawing heavily on the authors' own previous work (including the Salimpoor 2011 dopamine study) as well as the broader neuroimaging literature. The paper integrates evidence from both lesion studies and functional imaging to make claims about the causal role of specific regions.

Key Findings: Musical pleasure involves a cascade of processing: auditory cortex analyzes acoustic features; superior temporal regions integrate melodic and harmonic patterns; frontal lobe regions predict musical continuations; the mesolimbic dopamine system translates predictions and prediction errors into hedonic responses. The key insight is that musical pleasure is fundamentally predictive — the emotional impact of music arises from the temporal interplay between expectation (driven by frontal-subcortical connections) and acoustic reality (represented in auditory cortex). The nucleus accumbens is the site where prediction meets reward, and its connectivity to auditory cortex is proposed to be a key individual-difference variable in musical pleasure.

Limitations: fMRI provides correlational evidence and cannot establish causality. The focus on chills-inducing music may not represent the full range of musical pleasure. Individual differences in the structural and functional connectivity between auditory and reward areas have since become a major research focus but were only beginning to be characterized at publication.

Textbook Chapters: Ch. 36 (The Neuroscience of Musical Beauty), Ch. 38 (Music, Reward, and Addiction), Ch. 12 (Musical Expectation and Tension).

Why It Matters: Zatorre and Salimpoor's synthesis provides the most coherent neuroscientific account of musical pleasure, connecting the physics of acoustic processing to the neurobiology of reward through the central mechanism of predictive processing.


Study 25

Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences, 14(3), 131–137.

Background: Koelsch has been among the most productive investigators of the neural basis of musical emotion, using event-related potential (ERP) and fMRI techniques to study how the brain processes musical syntax and semantics and how these processes connect to emotional responses. This review paper synthesizes his research program and situates it within the broader landscape of affective neuroscience.

Methods: Review of ERP and fMRI studies of music processing, focusing on studies that used harmonic manipulations (unexpected chords, out-of-key progressions) to probe syntactic and emotional processing. The review draws on the ERAN (Early Right Anterior Negativity) component, a cortical ERP response to harmonically unexpected chords, as a key index of musical syntax processing and traces its connections to emotional processing regions.

Key Findings: Music activates a broad network of brain regions associated with emotion, including the amygdala, hippocampus, ventral striatum, and anterior insula. Harmonically unexpected chords — those that violate tonal syntax — activate the amygdala and generate physiological arousal responses, even in musically untrained listeners. This suggests that the brain's threat-detection system (amygdala) responds to acoustic events that violate learned regularities. Koelsch proposes that music activates seven distinct emotion-triggering mechanisms at different levels of the neural hierarchy, from brainstem (processing loudness and rhythm) through limbic system (memory and emotional associations) to cortex (syntactic and semantic processing).

Limitations: Manipulating harmonic expectation in ERP paradigms requires musical stimuli that differ in tightly controlled ways, which can make them sound artificial. The mapping from brain activation patterns to specific subjective emotional states requires careful validation. Koelsch's model is more mechanistic than the BRECVEMA framework and may oversimplify the diversity of emotion-induction mechanisms.

Textbook Chapters: Ch. 25 (Melody and Harmony in the Brain), Ch. 26 (Music and Emotion), Ch. 36 (The Neuroscience of Musical Beauty).

Why It Matters: Koelsch's work demonstrates that the emotional impact of music is not confined to high-level cognitive processing but engages subcortical emotion systems at multiple levels — including systems that evolved for threat detection, suggesting that music taps into very ancient neural architecture.


Study 26

Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience, 29(10), 3019–3025.

Background: Adult musicians show structural brain differences from non-musicians in regions associated with auditory, motor, and multimodal processing — but whether these differences are caused by musical training or reflect pre-existing characteristics of people who become musicians had been difficult to determine. Hyde and colleagues conducted the first longitudinal study of brain structure in children beginning musical training, enabling a direct test of causality.

Methods: Children aged six years were recruited and assessed for musical aptitude at baseline. Fifteen children then received fifteen months of keyboard instruction (one lesson per week plus daily home practice); sixteen children served as a control group and received no instrumental training. All children underwent structural MRI at baseline and at fifteen-month follow-up. Cortical thickness and gray matter volume were measured in regions of interest identified from prior adult musician studies.

Key Findings: Children who received musical training showed significantly greater structural changes in two brain regions compared to controls: the primary motor cortex (controlling hand movements) and primary auditory cortex (processing sounds). Furthermore, the degree of structural change correlated with the amount of practice and with improvements in musical ability. These findings provide causal evidence that musical training shapes brain structure — not merely that musicians' brains are pre-wired differently. The changes were detectable after only fifteen months of training, suggesting that structural plasticity in response to musical training is robust and rapid in young children.

Limitations: The training group and control group differed in the amount of structured activity they received, so some of the differences could reflect general effects of structured learning rather than music-specific effects. The training period was fifteen months — long enough to see effects, but too short to see the full developmental trajectory. Longer-term follow-up data were not available.

Textbook Chapters: Ch. 33 (Musical Aptitude and Musical Development), Ch. 34 (The Development of Musical Expertise), Ch. 39 (Music and Education).

Why It Matters: Hyde et al.'s longitudinal study is the strongest direct evidence that musical training causally reshapes the brain, with implications for music education policy and for understanding how practice shapes neural architecture more generally.


Study 27

Levitin, D. J., & Menon, V. (2003). Musical structure is processed in "language" areas of the brain: A possible role for Brodmann area 47 in temporal coherence. NeuroImage, 20(4), 2142–2152.

Background: An ongoing debate in cognitive neuroscience concerns the extent to which music and language share neural resources. Language is processed largely in the left hemisphere, especially Broca's area (Brodmann area 44/45) and Wernicke's area. Music, by contrast, tends to engage right hemisphere and bilateral regions. But both music and language involve hierarchical structure — phrases, clauses, and sections organized into larger wholes. Levitin and Menon investigated whether a specific region — Brodmann area 47, associated with syntactic processing in language — is also recruited for processing musical structure.

Methods: Participants (non-musicians) listened to original pop and classical music pieces while undergoing fMRI. The stimuli were manipulated to create five conditions: forward original music, scrambled chords (random temporal ordering of chords), scrambled beats (random reordering of individual beats), reversed music, and silence. Comparing responses to these conditions allowed isolation of brain responses to temporal coherence, tonal structure, and rhythm separately.
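
For readers who want a feel for the scrambling manipulation, the sketch below shuffles fixed-length segments of an audio array. It is a generic stand-in only: the study's stimuli were scrambled at musically meaningful boundaries (chords, beats) rather than at an arbitrary fixed duration, and the function and parameter names here are illustrative, not taken from the paper.

```python
import numpy as np

def scramble_segments(audio, fs, seg_dur=0.35, seed=0):
    """Cut a mono signal into fixed-length segments and shuffle their order,
    destroying long-range temporal coherence while preserving local content."""
    seg_len = int(seg_dur * fs)                  # samples per segment
    n_segs = len(audio) // seg_len
    order = np.random.default_rng(seed).permutation(n_segs)
    segments = [audio[i * seg_len:(i + 1) * seg_len] for i in order]
    return np.concatenate(segments)
```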

Key Findings: Brodmann area 47 (in the inferior frontal gyrus, overlapping with part of Broca's area) was significantly more active for organized musical structure than for temporally scrambled controls, even for untrained listeners. This region showed a left-hemisphere bias. The findings suggest that the hierarchical syntactic processing previously associated with language extends to musical syntax, consistent with shared neural resources for structured sequence processing. Tonal (harmonic) structure and temporal coherence engaged partially overlapping but distinguishable regions.

Limitations: The scrambling procedure, while useful for isolating structure, produces stimuli that differ from the original on many dimensions simultaneously. fMRI BOLD signal is an indirect measure of neural activity. The conclusion that music and language "share" Brodmann area 47 depends on the interpretation of overlapping activation, which could reflect a general structural processing mechanism rather than truly shared modules.

Textbook Chapters: Ch. 23 (The Architecture of Musical Cognition), Ch. 25 (Melody and Harmony in the Brain), Ch. 32 (Musical Form and the Brain).

Why It Matters: The evidence for overlapping neural substrates for musical and linguistic syntax provides a key argument in the ongoing debate about whether music and language co-evolved and whether they share developmental origins.


SECTION VII: PHYSICS OF MUSIC

Study 28

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 258(5533), 317–318.

Background: In many physical systems — electronic circuits, biological rhythms, geological records — fluctuations show a characteristic "1/f" or "pink noise" power spectrum, in which the spectral power at each frequency is inversely proportional to the frequency. This pattern lies between white noise (equal power at all frequencies) and Brownian (random-walk) noise (power falling as 1/f²). Voss and Clarke asked whether musical sequences — specifically, the way pitch and loudness fluctuate over time in a piece of music — also exhibit 1/f statistics, which would suggest that music sits at a special point in the space of temporal complexity.

Methods: Voss and Clarke analyzed recordings of multiple musical genres (classical, jazz, folk) by measuring the power spectrum of fluctuations in pitch (instantaneous frequency) and amplitude (loudness) over time. They also analyzed speech recordings. The power spectra of these fluctuations were computed and compared to the predictions of white noise, 1/f noise, and Brownian noise models.
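
The analysis reduces to estimating the slope of the power spectrum on log-log axes: white noise has slope near 0, 1/f noise near -1, and Brownian noise near -2. A minimal modern sketch of that estimate (using Welch's method, not Voss and Clarke's original analog instrumentation):

```python
import numpy as np
from scipy.signal import welch

def spectral_slope(x, fs=1.0):
    """Fit the log-log slope of a signal's power spectral density.
    White noise ~ 0, pink (1/f) noise ~ -1, Brownian (1/f^2) noise ~ -2."""
    f, pxx = welch(x, fs=fs, nperseg=4096)
    keep = f > 0                       # drop the DC bin before taking logs
    slope, _ = np.polyfit(np.log10(f[keep]), np.log10(pxx[keep]), 1)
    return slope

# Sanity check: integrating white noise yields a Brownian (1/f^2) signal.
rng = np.random.default_rng(0)
white = rng.standard_normal(2**16)
print(spectral_slope(white))             # near 0
print(spectral_slope(np.cumsum(white)))  # near -2
```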

Key Findings: The fluctuations in both pitch and loudness in music exhibited 1/f power spectra across the genres examined — neither the randomness of white noise nor the sluggish drift of Brownian motion, but the intermediate, scale-invariant pattern of 1/f noise. Speech showed a similar pattern. The authors noted that 1/f noise represents a kind of "optimal" complexity: sufficient unpredictability to be interesting, with enough long-range correlation to feel structured. This suggests that composers — consciously or not — produce music with a characteristic statistical signature that may be related to the listener's experience of balance between predictability and surprise.

Limitations: The study analyzed only a small number of recordings. The method of extracting "pitch" from complex musical recordings is technically approximate. The claim that 1/f music is "optimal" for listeners was suggestive rather than experimentally demonstrated. Subsequent studies have found significant genre and cultural variation in spectral slope.

Textbook Chapters: Ch. 17 (Chaos, Complexity, and Musical Structure), Ch. 31 (The Statistics of Music).

Why It Matters: The Voss and Clarke study opened the investigation of music as a physical signal with measurable statistical properties — an approach that has since grown into the field of music information retrieval and the computational study of musical style.


Study 29

Plomp, R. (1964). The ear as a frequency analyzer. Journal of the Acoustical Society of America, 36(9), 1628–1636.

Background: Ohm's acoustic law — formulated by Georg Simon Ohm in 1843, and later refined by Helmholtz — proposed that the ear analyzes sound into its sinusoidal frequency components, essentially performing a real-time Fourier transform. But the precision of this analysis, and particularly whether the ear could resolve the individual harmonics of a complex tone at low harmonic numbers, had not been rigorously established. Plomp's experiments provided the definitive psychoacoustic test of the ear's frequency-analytic capability.

Methods: Listeners were presented with complex tones consisting of several harmonics, and tested on whether they could hear out individual harmonics as separate pitches. The experiment varied the harmonic number (first, second, third, etc.) and the frequency separation between adjacent harmonics relative to the critical bandwidth. Thresholds for the resolution of individual harmonics were measured as a function of these variables.

Key Findings: The ear can resolve individual harmonics as separate perceptual entities when they are separated by more than a critical bandwidth. For a 200 Hz fundamental, this means that approximately the first six to eight harmonics can be individually resolved (heard out separately), while higher harmonics merge into an unresolved "buzz." This boundary — the transition from resolved to unresolved harmonics — is a key determinant of timbre perception and of virtual pitch (the perception of the fundamental from its harmonics). The finding confirmed that the ear does function as a limited-precision spectrum analyzer, with frequency resolution governed by critical bandwidth.
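
The resolved-harmonic boundary can be approximated with a simple rule of thumb: harmonic n of fundamental f0 is resolved roughly when the spacing between adjacent harmonics (which equals f0) exceeds the critical bandwidth at n times f0. The sketch below uses the later Glasberg and Moore (1990) equivalent-rectangular-bandwidth formula as a stand-in for the critical-band estimates of Plomp's era; the exact cutoff depends on the criterion chosen.

```python
def erb_hz(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz,
    used here as a modern proxy for the critical bandwidth."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def resolved_harmonics(f0, n_max=20):
    """Harmonics whose spacing (f0) exceeds the critical bandwidth at n * f0."""
    return [n for n in range(1, n_max + 1) if f0 > erb_hz(n * f0)]

print(resolved_harmonics(200.0))   # [1, 2, ..., 8]: roughly the first eight
```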

Limitations: The experiment used highly trained listeners able to perform analytic listening — attending to individual partials within a complex tone. Naive listeners show lower resolution performance, raising questions about how the resolved-harmonic limit applies to everyday listening. The critical bandwidth estimates of the 1960s were later refined.

Textbook Chapters: Ch. 4 (The Mechanical Ear), Ch. 14 (The Frequency Domain and the Ear), Ch. 15 (Timbre and Spectrum).

Why It Matters: Plomp's frequency-analyzer paper provides the psychoacoustic foundation for understanding why the harmonic series is perceptually special, why timbre differs from pitch, and why the first few harmonics dominate the perception of musical intervals.


Study 30

Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, 55(5), 1061–1069.

Background: The perception of pitch from complex tones had long been attributed to the physical presence of the fundamental frequency component. But psychoacoustic experiments in the mid-twentieth century had established that the pitch of the fundamental could be heard even when that frequency was entirely absent from the signal — the "missing fundamental" phenomenon. Terhardt sought to develop a rigorous theoretical account of "virtual pitch" — the pitch heard from a set of harmonics in the absence of the fundamental — and to connect this account to consonance and harmony.

Methods: Terhardt conducted psychoacoustic experiments measuring pitch-matching responses to complex tones with and without the fundamental, across a range of harmonic compositions and fundamental frequencies. He developed a mathematical model of virtual pitch that predicts perceived pitch from the pattern of harmonic components, based on what he called "subharmonic coincidence" — a template-matching process in which the auditory system identifies the fundamental frequency whose harmonic series best matches the incoming spectral pattern.
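
A toy version of subharmonic coincidence conveys the idea. This is an illustrative sketch, not Terhardt's published algorithm (which weights components by masking, spectral dominance, and other factors): each candidate fundamental is scored by how many observed partials fall near its integer multiples, with low-numbered harmonics weighted more heavily.

```python
import numpy as np

def virtual_pitch_estimate(partials, f0_candidates, tol=0.01):
    """Pick the candidate fundamental whose harmonic series best matches
    the observed partials (toy subharmonic-coincidence scoring)."""
    partials = np.asarray(partials, dtype=float)
    best_f0, best_score = None, 0.0
    for f0 in f0_candidates:
        n = np.maximum(np.round(partials / f0), 1.0)  # nearest harmonic number
        mismatch = np.abs(partials - n * f0) / partials
        score = np.sum((mismatch < tol) / n)          # favor low harmonics
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Missing-fundamental demo: harmonics 3-6 of 200 Hz, with no 200 Hz energy.
partials = [600.0, 800.0, 1000.0, 1200.0]
print(virtual_pitch_estimate(partials, np.arange(50.0, 500.0, 5.0)))  # 200.0
```

The highest-scoring candidate is the absent fundamental, which is exactly the inference Terhardt's template model attributes to the auditory system.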

Key Findings: Virtual pitch — the perceived pitch of a complex tone in the absence of its fundamental — is veridical with the pitch of the fundamental when the remaining harmonics are low-numbered (resolved). As harmonic number increases and harmonics become unresolved, virtual pitch becomes weaker and less precise. Terhardt proposed that the auditory system uses a "virtual pitch processor" that applies learned templates of harmonic relationships to infer the fundamental from its harmonics — a process that would explain both the missing fundamental phenomenon and the special perceptual status of simple integer-ratio relationships between tones. This learning-based model anticipated later computational accounts of pitch and has influenced cochlear implant design.

Limitations: The template model requires that listeners have learned harmonic templates from environmental exposure — it is therefore a developmental and experiential model, not a purely peripheral one. The boundary between virtual pitch (a cognitive inference) and spectral pitch (a peripheral readout) is not always sharp in practice.

Textbook Chapters: Ch. 5 (Frequency, Pitch, and the Cochlea), Ch. 7 (Intervals and the Harmonic Series), Ch. 14 (The Frequency Domain and the Ear), Ch. 15 (Timbre and Spectrum).

Why It Matters: Terhardt's virtual pitch theory explains one of the most remarkable facts in psychoacoustics — that the pitch we hear from a musical instrument is not physically present as a sine wave in the air but is constructed by the brain from its harmonics. This insight dissolves the apparent puzzle of the "missing fundamental" and grounds our understanding of pitch in the physics of the harmonic series rather than in any single frequency component.


Closing Note

The thirty studies summarized in this appendix represent only a fraction of the literature that underlies the arguments of this textbook. Each entry points toward a body of work far richer than any summary can capture. Students who find themselves gripped by a particular question — why music makes us cry, how the cochlea works, whether music is truly universal — are encouraged to follow the citations into the primary literature, where the full texture of scientific argument, replication, controversy, and refinement can be appreciated. The history of psychoacoustics and music cognition is one of the most interesting intellectual stories of the past century, and it is still being written.