Chapter 22 Exercises: The Uncertainty Principle & Musical Timbre — Time-Frequency Trade-offs

These exercises are organized into five sections (A–E), progressing from conceptual foundations through time-frequency analysis, the Gabor atom, and audio-engineering applications to philosophical examination.


Section A: Conceptual Foundations

A1. The chapter distinguishes between two incorrect popular descriptions of the Heisenberg uncertainty principle: (1) "It means you disturb what you measure" and (2) "It means we don't know both position and momentum, but they have definite values." Explain why both descriptions are wrong. What is the correct statement? In what physical sense is a state with definite position AND definite momentum not just unknown but physically impossible?

A2. State the Heisenberg uncertainty principle and the Gabor limit, which says that for any audio signal Δf·Δt ≥ 1/(4π). Show that they have the same mathematical form. Identify the physical quantities that play analogous roles in each. Explain why the constant on the right-hand side differs (ħ/2 vs. 1/(4π)) and what this difference reflects.

A3. A staccato quarter note played on a piano at 120 BPM has a duration of approximately 0.15 seconds. A legato quarter note at the same tempo might last the full 0.5 seconds. Using the Gabor limit, calculate the minimum frequency spread (Δf) for each note. What does this imply about the pitch purity of staccato vs. legato playing? Can this difference be heard? Why or why not?
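Hint: a minimal sketch for checking the arithmetic, assuming the note's full duration can stand in for Δt (strictly, Δt is an RMS time spread, which is shorter than the full duration). The function name `min_frequency_spread` is illustrative and does not come from the chapter.

```python
import math

GABOR_CONSTANT = 1.0 / (4.0 * math.pi)  # lower bound on delta_f * delta_t


def min_frequency_spread(delta_t_s: float) -> float:
    """Smallest delta_f (Hz) the Gabor limit allows for a time spread delta_t_s (s)."""
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return GABOR_CONSTANT / delta_t_s


# Assumption: the full note duration is used as delta_t.
staccato_hz = min_frequency_spread(0.15)
legato_hz = min_frequency_spread(0.5)
print(f"staccato: {staccato_hz:.3f} Hz, legato: {legato_hz:.3f} Hz")
```

The same helper can be reused for any exercise that asks for a minimum Δf from a given duration; only the interpretation of the result changes.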

A4. A bass singer sustaining the note E₂ (82.4 Hz) for three seconds has a very narrow frequency bandwidth: his pitch is stable and pure. A timpanist strikes a drumhead, producing a sound lasting about 0.3 seconds with a dominant "pitch" around 120 Hz. Which sound has better-defined pitch, and which has better time-definition? Compute approximate minimum frequency spreads using the Gabor limit and compare them to the frequency difference between E₂ and the next semitone (F₂, 87.3 Hz), about 4.9 Hz. Is the timpani's minimum frequency spread small enough for a listener to tell adjacent semitones, such as E and F, apart?

A5. Explain why the human auditory system cannot simultaneously have perfect time resolution and perfect frequency resolution. Describe specifically how the cochlea allocates the uncertainty budget across different frequency ranges (compare high-frequency processing at the base to low-frequency processing at the apex). What does this suggest about how the brain organizes rhythmic vs. harmonic information?


Section B: Spectrograms and Time-Frequency Analysis

B1. An audio engineer is analyzing a recording with an FFT window of 512 samples at a sample rate of 44,100 Hz. Calculate: (a) the time duration of the analysis window, (b) the frequency resolution of the analysis, (c) the minimum Δf·Δt product achievable with this window. How does this compare to the Gabor minimum of 1/(4π) ≈ 0.08? Why is the actual product always larger than the Gabor minimum?
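Hint: a rough sketch of the quantities involved, assuming "frequency resolution" means the FFT bin spacing and "Δt" means the full window length. Neither is an RMS spread, so the product computed here is not directly the Δf·Δt of the Gabor limit; it only shows the scale of the gap.

```python
import math

sample_rate_hz = 44_100
window_samples = 512

window_duration_s = window_samples / sample_rate_hz  # part (a)
bin_spacing_hz = sample_rate_hz / window_samples     # part (b)

# Window length times bin spacing is exactly 1, whatever the window size.
product = window_duration_s * bin_spacing_hz
gabor_minimum = 1.0 / (4.0 * math.pi)

print(f"window duration: {window_duration_s * 1e3:.2f} ms")
print(f"bin spacing:     {bin_spacing_hz:.2f} Hz")
print(f"product:         {product:.3f}")
print(f"ratio to Gabor minimum: {product / gabor_minimum:.2f}")
```

Changing `window_samples` moves the duration and the bin spacing in opposite directions while leaving their product unchanged, which is the trade-off the exercise is probing.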

B2. A spectrogram of a cello note shows a bright vertical line (sharp onset) that gradually fans out into clear horizontal harmonic lines. Sketch what this spectrogram looks like and explain, in terms of the time-frequency uncertainty, why the onset is broad in frequency but the sustained note is narrow. What windowing choice would you make to best visualize (a) the onset, and (b) the sustained note?

B3. Explain the concept of "spectral leakage" in FFT analysis. When you analyze a finite-duration tone with an FFT, why does the spectrum show energy at frequencies other than the fundamental? How does this relate to the Gabor limit? How do window functions (Hann, Hamming, Blackman) address spectral leakage, and what is the trade-off they make?

B4. Two notes are played simultaneously: A₄ at 440 Hz and A#₄ at 466 Hz (a semitone apart). What FFT window length (in samples at 44,100 Hz) would you need to distinguish these two notes in a spectrogram? Express your answer both in samples and in milliseconds. What time resolution do you sacrifice to achieve this frequency resolution?
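Hint: one possible way to set up the calculation, assuming two tones count as "distinguished" once the FFT bin spacing is no larger than their separation. This is a lenient criterion: a tapered window such as Hann widens each peak, so a practical answer would be longer. The helper names are illustrative.

```python
import math


def min_window_samples(separation_hz: float, sample_rate_hz: float) -> int:
    """Fewest samples whose bin spacing (fs / N) does not exceed separation_hz."""
    return math.ceil(sample_rate_hz / separation_hz)


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (FFT sizes are often chosen this way)."""
    return 1 << (n - 1).bit_length()


sample_rate_hz = 44_100
separation_hz = 466.0 - 440.0  # frequencies as given in the exercise

n_min = min_window_samples(separation_hz, sample_rate_hz)
n_fft = next_power_of_two(n_min)

print(f"minimum window: {n_min} samples = {1e3 * n_min / sample_rate_hz:.1f} ms")
print(f"next power of two: {n_fft} samples = {1e3 * n_fft / sample_rate_hz:.1f} ms")
```

The window duration in milliseconds is also the time resolution given up: events closer together than this fall inside the same analysis frame.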

B5. Wavelets are described as providing "multi-resolution analysis." Explain what this means in concrete terms: what resolution does a wavelet analysis give at 100 Hz versus 1000 Hz versus 8000 Hz? Compare this to what a standard STFT gives. Why is the wavelet approach better suited to music analysis than a fixed-window STFT, and where does the wavelet approach still fail to beat the Gabor limit?


Section C: The Gabor Atom and Minimum Uncertainty

C1. A Gabor atom is described as achieving minimum time-frequency uncertainty. Describe its mathematical form: g(t) = A·exp(−(t−t₀)²/(2σ²))·cos(2πf₀t). Identify each parameter and explain its physical meaning. What is the time spread σ_t of this atom? What is the frequency spread σ_f? Show that their product is 1/(4π).
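Hint: a numerical check is not a proof, but it can confirm the target before you attempt the derivation. The sketch below makes several assumptions: it uses the complex form of the atom (Gaussian envelope times exp(2πi·f₀t)) so the spectrum has a single lobe, it measures σ_t and σ_f as RMS widths of |g(t)|² and |G(f)|², and the values of σ, f₀, and the grids are arbitrary choices.

```python
import cmath
import math

SIGMA = 0.010  # Gaussian width parameter sigma in seconds (arbitrary choice)
F0 = 440.0     # centre frequency in Hz
T0 = 0.0       # centre time in seconds


def atom(t: float) -> complex:
    """Complex Gabor atom: Gaussian envelope times exp(2*pi*i*F0*t)."""
    envelope = math.exp(-((t - T0) ** 2) / (2.0 * SIGMA ** 2))
    return envelope * cmath.exp(2j * math.pi * F0 * t)


def rms_width(xs, weights):
    """Standard deviation of xs under the (unnormalised) density `weights`."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return math.sqrt(var)


# Time grid covering +/- 6 sigma around T0.
dt = SIGMA / 50.0
times = [T0 + (k - 300) * dt for k in range(601)]
samples = [atom(t) for t in times]

# Frequency grid of F0 +/- 80 Hz; spectrum by a direct Riemann-sum Fourier integral.
freqs = [F0 + (k - 160) * 0.5 for k in range(321)]
spectrum = [
    sum(g * cmath.exp(-2j * math.pi * f * t) for g, t in zip(samples, times)) * dt
    for f in freqs
]

sigma_t = rms_width(times, [abs(g) ** 2 for g in samples])
sigma_f = rms_width(freqs, [abs(G) ** 2 for G in spectrum])
product = sigma_t * sigma_f

print(f"sigma_t = {sigma_t:.6f} s, sigma_f = {sigma_f:.4f} Hz")
print(f"product = {product:.6f}, 1/(4*pi) = {1.0 / (4.0 * math.pi):.6f}")
```

Note that the measured σ_t comes out as σ/√2 rather than σ, because the width is taken from the energy density |g(t)|² and not from the envelope itself. Keeping track of that factor is the main pitfall in the analytic derivation.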

C2. In quantum mechanics, the coherent state of a quantum harmonic oscillator is the state that saturates the Heisenberg bound (Δx·Δp = ħ/2). The coherent state is a Gaussian wave packet. How does this relate to the Gabor atom? Identify the specific mathematical parallel: what plays the role of x (position) in the acoustic domain? What plays the role of p (momentum)? What is the acoustic analog of ħ?

C3. A musician tries to produce a "perfect" tone — one with both perfectly precise pitch and perfectly precise attack time. Explain why this is physically impossible using the Gabor limit. Is the constraint a practical limitation (better instruments could overcome it) or a fundamental one? How does this affect what we can mean by "perfectly in tune and perfectly in time" in ensemble playing?

C4. Compare a Gabor atom with σ_t = 5 ms (centered at 440 Hz) to a Gabor atom with σ_t = 50 ms (also at 440 Hz). For each: (a) calculate the theoretical frequency spread σ_f, (b) describe what the sound would be like perceptually (a brief tap vs. a soft bell-tone), (c) calculate the uncertainty product and compare to the Gabor minimum.

C5. The chapter mentions that laser photons are in "coherent states" — the quantum analog of Gabor atoms. What physical property of laser light corresponds to the "minimum uncertainty" of the Gabor atom? How does this explain why lasers are so useful for precision measurements (like in LIGO, described in Chapter 23's case study)?


Section D: Applications in Music and Audio Engineering

D1. A compressor plugin is set to have an "attack time" of 1 ms (how quickly it responds to signal increases). Given the Gabor limit, what is the minimum frequency bandwidth that this 1 ms response window analyzes? Can a compressor with 1 ms attack accurately respond to the level of a specific frequency band (say, just the bass frequencies below 200 Hz)? What trade-off does the engineer face?

D2. A singer is performing with a pitch correction plugin (like Auto-Tune). The plugin must (a) detect the current pitch and (b) apply correction quickly to avoid noticeable artifacts. If the pitch detector uses a 20 ms analysis window, what is its frequency resolution? Can it distinguish between notes that are 50 cents apart (a quarter-tone)? How long would the analysis window need to be to distinguish notes one semitone apart at 440 Hz?
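Hint: a small sketch for converting between cents and hertz and for relating window length to resolution. It assumes the rule of thumb that a window of length T resolves frequencies about 1/T apart; real pitch trackers often locate a single isolated tone more finely than this, so treat the numbers as a Fourier-resolution estimate only. The function names are illustrative.

```python
def interval_hz(f0_hz: float, cents: float) -> float:
    """Frequency gap in Hz between f0_hz and the pitch `cents` above it."""
    return f0_hz * (2.0 ** (cents / 1200.0) - 1.0)


def window_resolution_hz(window_s: float) -> float:
    """Rule-of-thumb frequency resolution of a window window_s seconds long."""
    return 1.0 / window_s


def min_window_s(separation_hz: float) -> float:
    """Rule-of-thumb window length needed to resolve separation_hz."""
    return 1.0 / separation_hz


resolution_hz = window_resolution_hz(0.020)
print(f"20 ms window resolves about {resolution_hz:.0f} Hz")

for cents in (25, 50, 100):
    gap_hz = interval_hz(440.0, cents)
    verdict = "resolved" if gap_hz >= resolution_hz else "not resolved"
    needed_ms = 1e3 * min_window_s(gap_hz)
    print(f"{cents:>3} cents = {gap_hz:5.2f} Hz: {verdict}; needs ~{needed_ms:.0f} ms")
```

Because an interval in cents is a frequency ratio, the same interval spans fewer hertz at lower pitches, so the required window grows as the singer moves down in register.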

D3. In a recording of a full orchestra, an engineer wants to apply a narrow notch filter to remove a 60 Hz hum from the recording. If the notch filter is 2 Hz wide (removing 59–61 Hz), how long will the filter's impulse response be? What does this mean for how long the ringing artifact will last in the processed audio? How does this relate to the Gabor limit?
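Hint: an order-of-magnitude sketch, assuming the notch is a second-order (biquad) filter whose −3 dB bandwidth is the stated 2 Hz. For such a filter the ringing envelope decays as exp(−π·B·t); other filter designs will ring differently, so the numbers below are illustrative rather than definitive.

```python
import math

bandwidth_hz = 2.0  # notch width from the exercise (59-61 Hz)

# Assumption: second-order notch, envelope ~ exp(-pi * B * t).
time_constant_s = 1.0 / (math.pi * bandwidth_hz)

# Time for the ringing to fall by 60 dB (a factor of 1000 in amplitude).
t60_s = time_constant_s * math.log(1000.0)

# Gabor lower bound on the RMS time spread for this bandwidth.
gabor_min_dt_s = 1.0 / (4.0 * math.pi * bandwidth_hz)

print(f"time constant:    {time_constant_s:.3f} s")
print(f"60 dB decay time: {t60_s:.2f} s")
print(f"Gabor minimum dt: {gabor_min_dt_s:.3f} s")
```

Halving `bandwidth_hz` doubles every duration printed, which is the trade-off to discuss: a narrower notch removes less of the music but rings for longer.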

D4. MP3 audio compression uses psychoacoustic masking: sounds below the ear's masking threshold can be discarded. One type of masking is "temporal masking," in which a loud sound masks nearby quieter sounds in time, for up to roughly 20 ms before it (pre-masking) and roughly 100–200 ms after it (post-masking). Another is "frequency masking," in which a loud sound masks quieter sounds at nearby frequencies. Explain how both types of masking relate to the time-frequency uncertainty. Why is the uncertainty principle relevant to understanding what the ear can and cannot hear?

D5. The "phase vocoder" algorithm manipulates audio by separating the time and frequency dimensions. It can slow down audio without changing pitch (time-stretch) or change pitch without altering tempo (pitch-shift). Describe conceptually how this works: what does the phase vocoder analyze, and what does it preserve/change? Does the phase vocoder "violate" the Gabor limit, or is it consistent with it? What artifacts appear in phase-vocoded audio, and why?


Section E: Philosophy and Depth

E1. The chapter claims: "The Heisenberg uncertainty principle is a theorem of Fourier analysis applied to quantum wave functions. The same Fourier analysis applies to audio signals, giving an identical theorem — the Gabor uncertainty principle. This is not an analogy: it is the same mathematical proof." Evaluate this claim carefully. In what precise sense are the two theorems "the same"? In what sense might they differ despite having the same mathematical form? What would it mean for them to be "the same theorem"?

E2. The debate question asks: "Does the mathematical identity of Heisenberg and Gabor uncertainty mean physics and music are 'the same,' or just that they both use Fourier analysis?" Construct an argument for each side. Consider: what would it mean for them to be "the same"? Is Fourier analysis a physical theory or a mathematical tool? Can a mathematical tool be physically significant?

E3. The thought experiment asks: what would music sound like without the uncertainty principle? But there is a more fundamental question: could music exist without waves? Sound is a wave phenomenon; the uncertainty principle is a property of waves. If there were no waves — no mechanical vibration propagating through a medium — could there be music at all? What does this suggest about the relationship between music and physics?

E4. Some audio engineers and physicists argue that the Gabor limit is not a "fundamental" constraint but an artifact of how we define "duration" and "bandwidth" using RMS measures. If you defined these measures differently, you could get different lower bounds. Evaluate this argument. Is the Gabor limit really fundamental, or is it a consequence of definitions? Does the same issue arise with the Heisenberg uncertainty principle?

E5. The Wigner distribution is described as the "exact time-frequency representation" that requires no windowing compromise — but it can be negative. Explain why negativity is a problem for interpreting the Wigner distribution as a probability distribution. What does it mean physically for a "probability" to be negative? Connect this to the quantum mechanical Wigner function and its negativity as a signature of quantum coherence. What does the existence of negative-valued time-frequency distributions suggest about the nature of waves and uncertainty?