Case Study 5.1: The Shepard Tone — An Infinite Auditory Staircase
Overview
Imagine hearing a scale that ascends forever. Each note is clearly higher than the last; the progression is unambiguous — up, up, up. Yet minutes later, the scale has returned exactly to where it began, without ever going back down. This is the Shepard tone: an auditory impossibility made perceptually real, an infinite staircase for the ear that reveals, as few phenomena do, the constructive nature of pitch perception and the profound gap between physical stimulus and perceptual experience.
The Shepard tone was described by cognitive scientist Roger Shepard in 1964. In the six decades since, it has become one of the most studied and most exploited phenomena in psychoacoustics — a tool for researchers mapping the boundaries of pitch perception, and a compositional and film-scoring technique used to create unresolvable tension and forward momentum without end.
Roger Shepard and the Discovery
Roger Shepard (1929–2022) was one of the towering figures of 20th-century cognitive science, best known for his work on mental imagery and the geometry of psychological spaces. His 1964 paper in the Journal of the Acoustical Society of America, "Circularity in Judgments of Relative Pitch," described a striking demonstration of the non-absolute nature of pitch perception.
Shepard had been thinking about the structure of pitch: specifically, whether pitch was a purely linear quantity (a number on a scale from low to high) or whether it had a more complex structure. He knew that octave equivalence — the perceptual similarity of notes an octave apart — suggested that pitch had at least two dimensions: a continuous "height" dimension (high-low) and a cyclic "chroma" dimension (C, D, E, F, G, A, B, C...). His question was: could these two dimensions be made to work against each other?
The answer was yes — and the Shepard tone was the demonstration.
Physical Construction: The Infinite Staircase
A Shepard tone is built from a set of sine tones spaced an octave apart, all sounding simultaneously. The key ingredient is a fixed, bell-shaped amplitude envelope laid over the frequency axis on a logarithmic scale, so that every octave takes up the same width. Components in the middle octaves are loudest, while the highest and lowest components are faded nearly to inaudibility.
To create the illusion of ascending pitch, every component is shifted up by a semitone while the envelope stays where it is. Components near the top of the ensemble move further into the envelope's quiet upper tail and become quieter. Components near the bottom move up out of the quiet lower tail and become louder. The net effect is that the chroma of the tone changes (it moves from, say, C to C-sharp), while the overall distribution of energy across the spectrum stays essentially the same, because the bottom components fade in as the top components fade out.
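The shift-and-fade mechanism can be sketched in a few lines. This is a minimal illustration and not Shepard's own procedure. It assumes the following:

- eight components starting at C2 (about 65.4 Hz);
- a Gaussian envelope on a log-frequency axis, centred in the middle of that eight-octave span (which puts the peak on C6);
- an envelope width of one octave.

The names `components` and `envelope_gain` and all the constants are hypothetical. Each component's position is measured in octaves above the bottom of the span. Taking the semitone offset modulo one octave makes a component that leaves the top re-enter at the bottom.

```python
import math

# Hypothetical parameters chosen for illustration, not taken from Shepard (1964).
SPAN_OCTAVES = 8       # eight octave-spaced components
BOTTOM_HZ = 65.406     # approximate equal-tempered frequency of C2
WIDTH_OCTAVES = 1.0    # standard deviation of the bell-shaped envelope, in octaves


def envelope_gain(position_octaves: float) -> float:
    """Fixed Gaussian gain on a log-frequency axis, peaking mid-span."""
    centre = SPAN_OCTAVES / 2
    return math.exp(-0.5 * ((position_octaves - centre) / WIDTH_OCTAVES) ** 2)


def components(step: int) -> list[tuple[float, float]]:
    """(frequency_hz, gain) pairs after shifting the whole set up by `step` semitones.

    The offset wraps every 12 steps: when it returns to zero, the component that
    drifted to the top of the span is effectively replaced by one at the bottom.
    """
    offset = (step % 12) / 12  # chroma offset in octaves, always in [0, 1)
    pairs = []
    for k in range(SPAN_OCTAVES):
        position = k + offset  # octaves above BOTTOM_HZ, always in [0, SPAN_OCTAVES)
        pairs.append((BOTTOM_HZ * 2 ** position, envelope_gain(position)))
    return pairs
```

The envelope never moves, so `components(12)` returns exactly the same list as `components(0)`. Meanwhile, the loudest component of `components(1)` sits one semitone above the loudest component of `components(0)`.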
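The cycle completes after twelve semitone steps, one full octave, and returns to the original chroma (C again) in the same overall register. Each component now sits where its upper octave neighbor started. The top component, which has faded to inaudibility, is effectively replaced by an equally inaudible new component at the bottom. The listener has heard twelve upward steps but is back exactly where they started in frequency space. The staircase is infinite. Shepard's original stimuli moved in discrete semitone steps; the composer Jean-Claude Risset later created a continuously gliding version, often called the Shepard–Risset glissando.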
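To hear the staircase, the construction can be rendered to audio with only the Python standard library. The sketch below is illustrative and rests on these assumptions:

- a 44.1 kHz sample rate, high enough that even the top, nearly silent component stays below the Nyquist frequency;
- 0.25-second steps;
- a hypothetical Gaussian envelope centred four octaves above C2 and one octave wide.

`shepard_step` and `write_scale` are names invented here. The output is the discrete-step version of the illusion, not a continuous glide.

```python
import math
import struct
import wave

# Hypothetical rendering parameters, chosen for illustration.
SAMPLE_RATE = 44100     # Hz; Nyquist (22.05 kHz) is above the highest component (~15.8 kHz)
STEP_SECONDS = 0.25     # duration of each semitone step
SPAN_OCTAVES = 8        # components from C2 upward
BOTTOM_HZ = 65.406      # approximate frequency of C2
WIDTH_OCTAVES = 1.0     # envelope standard deviation, in octaves
FADE_SECONDS = 0.01     # short fade in/out so consecutive steps do not click


def gain(position: float) -> float:
    """Fixed Gaussian envelope on a log-frequency axis, peaking mid-span."""
    return math.exp(-0.5 * ((position - SPAN_OCTAVES / 2) / WIDTH_OCTAVES) ** 2)


def shepard_step(step: int) -> list[float]:
    """Samples in [-1, 1] for the tone `step` semitones above the starting chroma."""
    offset = (step % 12) / 12
    partials = [(BOTTOM_HZ * 2 ** (k + offset), gain(k + offset)) for k in range(SPAN_OCTAVES)]
    total = sum(g for _, g in partials)  # dividing by this keeps the mix within [-1, 1]
    n = int(SAMPLE_RATE * STEP_SECONDS)
    ramp = int(FADE_SECONDS * SAMPLE_RATE)
    samples = []
    for i in range(n):
        fade = min(1.0, i / ramp, (n - 1 - i) / ramp)  # linear fade at both ends
        t = i / SAMPLE_RATE
        value = sum(g * math.sin(2 * math.pi * f * t) for f, g in partials)
        samples.append(fade * value / total)
    return samples


def write_scale(path: str, n_steps: int = 24) -> None:
    """Write `n_steps` ascending steps (two full cycles by default) as 16-bit mono WAV."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(SAMPLE_RATE)
        for step in range(n_steps):
            pcm = [int(0.8 * 32767 * s) for s in shepard_step(step)]
            out.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Calling `write_scale("shepard_scale.wav")` writes about six seconds of audio (24 steps of 0.25 s), covering two full cycles. Played on a loop, it should keep climbing without an audible reset.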
📊 The Shepard Tone: Physical Structure
Imagine 8 sine tones, spaced one octave apart: C2, C3, C4, C5, C6, C7, C8, C9.
- A bell-shaped amplitude envelope makes C5 and C6 loudest, C4 and C7 somewhat quieter, C3 and C8 very quiet, and C2 and C9 barely audible.
- When all tones shift upward by one semitone, they become C#2, C#3, C#4, C#5, C#6, C#7, C#8, C#9.
- The envelope stays fixed: C#5 and C#6 are still the loudest, and C#2 and C#9 are still barely audible.
- The tones in the middle, the ones the listener pays most attention to, have clearly moved up. The barely audible top and bottom have changed too, but the ear largely ignores them.
- After 12 semitone steps, the tones are C3, C4, C5, C6, C7, C8, C9, C10: every component has moved up exactly one octave. C10 sits even deeper in the envelope's silent upper tail than C9 did, and C2 was barely audible to begin with. Dropping C10 and restoring C2 therefore recreates the starting tone exactly: same chroma and same apparent register, because the envelope, not the individual components, sets the apparent register.
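The levels in this example can be checked with a short calculation. The sketch assumes equal temperament with A4 = 440 Hz (MIDI note 69). It also assumes an illustrative envelope: a Gaussian on the log-frequency axis, peaking at F#5 (midway between C5 and C6), with a standard deviation of one octave.

Under those assumptions the levels come out as follows, matching the loud-to-barely-audible pattern described above:

- C5 and C6: about -1 dB
- C4 and C7: about -10 dB
- C3 and C8: about -27 dB
- C2 and C9: about -53 dB

The helper names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
PEAK_MIDI = 78          # F#5, midway between C5 (72) and C6 (84); an assumed envelope centre
WIDTH_OCTAVES = 1.0     # assumed envelope standard deviation


def midi_to_hz(midi: int) -> float:
    """Equal-tempered frequency with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def note_name(midi: int) -> str:
    """Scientific pitch notation, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def level_db(midi: int) -> float:
    """Envelope level in dB relative to the peak, for a component at `midi`."""
    octaves_from_peak = math.log2(midi_to_hz(midi) / midi_to_hz(PEAK_MIDI))
    amplitude = math.exp(-0.5 * (octaves_from_peak / WIDTH_OCTAVES) ** 2)
    return 20 * math.log10(amplitude)


def chord(lowest_midi: int, n_tones: int = 8) -> list[tuple[str, float]]:
    """Octave-spaced components starting at `lowest_midi`, with levels rounded to 0.1 dB."""
    return [
        (note_name(m), round(level_db(m), 1))
        for m in range(lowest_midi, lowest_midi + 12 * n_tones, 12)
    ]
```

The three calls map onto the example like this:

- `chord(36)` lists C2 through C9.
- `chord(37)` lists C#2 through C#9; C#5 and C#6 remain the two loudest.
- `chord(48)` lists C3 through C10, with C10 near -88 dB. That is why it can be swapped for a silent C2 without an audible change.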
What the Shepard Tone Reveals About Pitch Perception
Octave Equivalence
The most fundamental principle the Shepard tone relies on is octave equivalence: the perceptual similarity of tones separated by octave intervals. C4 (middle C) and C5 are perceived as related, "the same note, only higher." This is so deeply ingrained in musical perception that most trained listeners regard it as obvious. The Shepard tone shows that octave equivalence is not just a musical convention but a perceptual phenomenon with real consequences. The illusion works only because listeners extract the chroma C as an identity shared by C2, C3, C4, C5, and all the other C's, independent of their specific pitch height.
The Two-Dimensionality of Pitch
The Shepard tone reveals that pitch is not a simple one-dimensional quantity. It has (at minimum) two dimensions:
- Chroma (pitch class): the cyclic dimension, meaning which note of the chromatic scale (C, C#, D, D#, E, F, F#, G, G#, A, A#, B). Chroma is what makes C4 and C5 "the same note."
- Height (pitch register): the linear dimension, meaning the overall high-low location. C5 is "higher" than C4.
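The two dimensions can be made concrete by splitting a frequency into two parts: a linear height (octaves above a reference) and a cyclic chroma (the pitch class, which wraps around every octave). The sketch assumes equal temperament with C4 at about 261.63 Hz and rounds to the nearest semitone. `chroma_and_height` is a hypothetical helper, not a standard function.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
C4_HZ = 440.0 * 2 ** (-9 / 12)  # middle C in equal temperament, about 261.63 Hz


def chroma_and_height(freq_hz: float) -> tuple[str, float]:
    """Return (chroma, height) for a frequency.

    height: linear dimension, in octaves above (or below) C4.
    chroma: cyclic dimension, the nearest equal-tempered pitch class.
    """
    height = math.log2(freq_hz / C4_HZ)
    nearest_semitone = round(12 * height)              # semitones from C4
    return NOTE_NAMES[nearest_semitone % 12], height   # % 12 wraps every octave
```

For example, `chroma_and_height(261.63)` and `chroma_and_height(523.25)` both report chroma "C", with heights of about 0 and about 1 octave: the same note, only higher.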
The Shepard tone manipulates chroma (which the listener perceives as the direction of pitch motion) while holding height approximately constant. Because the envelope is fixed, the overall distribution of energy across the spectrum barely changes from one step to the next, and neither does the apparent register. The result shows that the two dimensions are separable, which is a deep claim about the structure of auditory pitch space.
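One rough way to see why height stays put is to compute an amplitude-weighted average of the components' positions on the log-frequency axis. This average is a crude proxy for apparent register, assumed here for illustration; perceived height is not literally this number. With a fixed envelope the average barely moves over two full cycles. Giving every component equal weight instead makes it climb by almost an octave and then drop back each cycle. Parameters and names are hypothetical.

```python
import math

SPAN_OCTAVES = 8      # octave-spaced components, positions measured from the lowest
WIDTH_OCTAVES = 1.0   # assumed envelope standard deviation


def gain(position: float) -> float:
    """Fixed Gaussian envelope centred in the middle of the span."""
    return math.exp(-0.5 * ((position - SPAN_OCTAVES / 2) / WIDTH_OCTAVES) ** 2)


def log_centroid(step: int, use_envelope: bool = True) -> float:
    """Weighted mean position, in octaves above the bottom of the span."""
    offset = (step % 12) / 12  # components shifted up by `step` semitones, wrapping each octave
    positions = [k + offset for k in range(SPAN_OCTAVES)]
    weights = [gain(p) if use_envelope else 1.0 for p in positions]
    return sum(p * w for p, w in zip(positions, weights)) / sum(weights)


with_envelope = [log_centroid(s) for s in range(24)]
without_envelope = [log_centroid(s, use_envelope=False) for s in range(24)]
```

The spread of `with_envelope` (max minus min) stays well under a hundredth of an octave. By contrast, `without_envelope` rises from 3.5 to about 4.42 octaves and then falls back at the start of each cycle.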
Perceptual Ambiguity and Filling-In
The Shepard tone also demonstrates that pitch perception involves resolving ambiguity. Because every component has octave neighbors, a step from C to C-sharp could in principle be heard as a one-semitone rise or an eleven-semitone fall. Shepard found that listeners tend to hear the direction of the shorter path around the chroma circle, so the step is heard as a rise. When the two tones are half an octave apart, neither path is shorter and the judgment tends to become ambiguous. Meanwhile, the very quiet components at the extremes carry little weight, and the auditory system focuses on the most salient (loudest) components in the middle register. This is not simply passive reception of a physical signal; it is active construction of a perceptual interpretation.
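The proximity principle Shepard reported can be caricatured as a rule: compare two chromas and predict the direction of the shorter path around the twelve-step chroma circle. This is a toy model, not a perceptual model from the paper. It ignores loudness and context, and it ignores the individual differences listeners show when two tones are exactly half an octave (a tritone) apart; it simply labels that case ambiguous. Pitch classes are numbered 0 (C) to 11 (B). `judged_direction` is a hypothetical name.

```python
def judged_direction(from_pc: int, to_pc: int) -> str:
    """Predict the heard direction between two Shepard tones with pitch classes 0-11.

    The upward distance around the chroma circle is (to - from) mod 12; the
    downward distance is 12 minus that. The shorter path wins.
    """
    up = (to_pc - from_pc) % 12
    if up == 0:
        return "same"
    if up == 6:
        return "ambiguous"  # tritone: both paths are six semitones long
    return "up" if up < 6 else "down"


# A twelve-step ascending cycle: every step, including B (11) -> C (0), is predicted "up".
cycle = [judged_direction(pc, (pc + 1) % 12) for pc in range(12)]
```

The wrap from B back to C is still predicted as "up". This is the infinite staircase in miniature: every local judgment says "higher," even though the sequence is circular.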
Modern Applications in Film and Music
Christopher Nolan's Dunkirk (2017)
The most celebrated recent use of the Shepard tone illusion is Hans Zimmer's score for Christopher Nolan's Dunkirk. The film depicts the 1940 evacuation of Allied troops, most of them British, from Dunkirk on the northern French coast. Nolan and Zimmer built much of the score around a Shepard-tone figure: a ticking, seemingly ever-ascending texture that sustains a sense of urgency and upward tension through much of the film's 106-minute runtime.
The effect is psychologically potent. Many viewers describe a constant low-grade anxiety and forward pressure, the sense that something is always escalating, always building. Because the figure keeps rising without ever arriving anywhere, the tension is never released. This mirrors the experience of the film's subjects: soldiers waiting to be rescued, never certain whether rescue will come.
Zimmer layered the Shepard tone with other elements (orchestra, sound effects, dialogue) but its structural role is inescapable. Dunkirk's score is an extended experiment in sustained psychoacoustic tension — a demonstration that a perceptual illusion, carefully deployed, can become a structural compositional tool.
Hans Zimmer's Batman Theme
Zimmer and James Newton Howard co-composed the scores for Batman Begins (2005) and The Dark Knight (2008); Zimmer scored The Dark Knight Rises (2012) on his own. The trilogy's low, rising Batman material has been described as Shepard-tone-inspired: it is built to feel perpetually heavy and escalating. It is not a pure Shepard tone. Still, layering octave-spaced statements of a figure while shifting which register is emphasized can produce a similar effect: music that feels as if it is always becoming more threatening, without resolution.
Earlier Musical Uses
The Shepard tone predates its film applications. Composer James Tenney used the principle in his electronic composition "For Ann (rising)" (1969), one of the earliest musical uses of the phenomenon. György Ligeti evoked quasi-Shepard effects in some of his works, such as the piano étude "L'escalier du diable" (1993). In rock music, passages such as the rising orchestral crescendos of the Beatles' "A Day in the Life" (1967) create a related sense of unresolved ascent, although they are not Shepard tones in the technical sense.
The Shepard Tone and the Stimulus-Perception Gap
The Shepard tone is, in a sense, a controlled demonstration of something profound about all perception: what you experience is not determined solely by what is physically present in the stimulus. The physical stimulus, a set of overlapping sinusoidal tones under a fixed amplitude envelope, does not "contain" the endlessly ascending scale that listeners perceive. That ascending scale is constructed by the auditory system, which:
- tracks the change in chroma (heard as upward motion);
- resolves ambiguity (by attending to the salient middle register);
- discounts conflicting information (the barely audible extremes).
This is what cognitive scientists mean when they say perception is generative or constructive: the brain generates a perceptual representation that goes beyond, and sometimes actively diverges from, the physical stimulus. The Shepard tone is not a trick or an error. It is a window into the normal operation of the auditory system, which routinely does exactly this: it constructs the most coherent perceptual interpretation of the available physical evidence.
For music, this is both exciting and slightly vertiginous. If pitch perception is a construction — if what we hear as "a C" is a perceptual inference rather than a direct readout of physical frequency — then the experience of music is even more deeply a product of mind than of physics. The physics matters enormously, but it does not determine the experience. The experience is made, not found.
Discussion Questions
- The Shepard tone exploits octave equivalence (the perceptual similarity of tones an octave apart) to create a circular pitch space. Do all musical cultures that use octave-based scales experience this illusion in the same way? Would you expect a listener raised in a tradition that does not systematically use octaves (if any such tradition exists) to experience the Shepard tone differently? What does this suggest about whether octave equivalence is physiological (universal) or cultural (learned)?
- The Shepard tone is constructed from sinusoidal (pure tone) components. How would the effect change if you replaced the sine tones with complex tones, each with a full harmonic series? Would the illusion be stronger, weaker, or different in character? Explain your reasoning using the psychoacoustic concepts from this chapter.
- Hans Zimmer used the Shepard tone in Dunkirk to create sustained, unresolvable tension. Are there ethical questions about using psychoacoustic techniques to manipulate emotional states in film audiences? Does the audience's lack of awareness that they are experiencing a psychoacoustic illusion affect the ethical status of the technique? How does this compare to other film techniques (lighting, editing, camera angle) that also influence emotional experience non-consciously?
- The Shepard tone reveals that pitch is not simply "how high a frequency is" but a complex, multidimensional perceptual construction. If pitch is a construction, what other aspects of musical experience might be constructions, that is, perceptual interpretations that are not straightforwardly read off from the physical stimulus? Give two examples and explain the psychoacoustic mechanism behind each.
- Roger Shepard described the tone as a demonstration of "circularity in judgments of relative pitch," meaning that pitch judgments (higher vs. lower) are relative and contextual, not absolute. Is there a parallel in other domains of human judgment? Can you identify a visual illusion that works on the same principle (exploiting the relative or contextual nature of a perceptual judgment)? What does the existence of such illusions tell us about the reliability of human perception as a guide to physical reality?