Chapter 32 Quiz: Digital Audio — Sampling, Quantization & the Nyquist Theorem
20 questions. Click the arrow to reveal each answer.
Question 1. State the Nyquist-Shannon sampling theorem in a single sentence. What is the "Nyquist frequency"?
Show Answer
**Theorem:** A continuous signal whose highest frequency component is f_max can be perfectly reconstructed from samples taken at any rate greater than 2 × f_max. **Nyquist frequency:** For a system operating at sample rate f_s, the Nyquist frequency is f_N = f_s / 2. This is the highest frequency that can be correctly captured. The Nyquist frequency of CD audio (44,100 Hz sampling) is 22,050 Hz.

Question 2. Why is "2×" the magic number in the Nyquist theorem? What would happen if you sampled at exactly 2× the signal frequency — not above, but exactly at?
Show Answer
**Why 2×:** A sine wave has two distinct phases per cycle — the rising half and the falling half. To uniquely determine that a sine wave is oscillating (rather than being a constant), you need at minimum one sample per phase, i.e., two samples per cycle, i.e., 2× the signal frequency. **Exactly at 2× (the Nyquist rate):** Sampling at exactly 2× the signal frequency is technically insufficient. The theorem requires *greater than* 2× (strictly). At exactly 2×, you might sample the zero-crossings of a sine wave, getting all zeros and concluding the signal is silent. More generally, the exact relationship between the sample phase and signal phase at exactly the Nyquist rate is undefined — the reconstruction is not guaranteed. In practice, sampling must exceed the Nyquist rate by at least a small margin, which is why CD's 44,100 Hz rate slightly exceeds the 40,000 Hz minimum (2 × 20,000 Hz) for human hearing.

Question 3. What is aliasing, and what causes it?
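The zero-crossing failure described in the answer to Question 2 can be checked numerically. A minimal Python sketch (the 1,000 Hz test tone and the helper name are illustrative, not from the chapter):

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at the given rate; return the raw sample values."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

# Sampling a 1,000 Hz sine at exactly 2x (2,000 Hz) lands every sample on a
# zero-crossing: sin(pi * n) == 0 for every integer n, so the tone "vanishes".
at_nyquist = sample_sine(1000, 2000, 8)

# Sampling slightly above 2x (here 2,205 Hz) recovers a visibly nonzero signal.
above_nyquist = sample_sine(1000, 2205, 8)

print(all(abs(s) < 1e-9 for s in at_nyquist))    # True: all samples are ~0
print(any(abs(s) > 0.5 for s in above_nyquist))  # True: oscillation is visible
```

This is the worst-case phase; a different sampling phase at exactly 2× would capture some amplitude, but amplitude and phase would still be ambiguous, which is why the theorem demands strictly greater than 2×.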
Show Answer
**Aliasing** is the phenomenon by which a frequency component above the Nyquist limit, when sampled without proper anti-aliasing filtering, appears in the digital representation as a false lower-frequency signal. **Cause:** When a signal at frequency f is sampled at rate f_s, and f > f_s/2, the sampling process cannot distinguish f from its "mirror image" (alias) frequency: f_alias = |f − n × f_s| (for the integer n that brings this closest to zero). The samples of the original high frequency are identical to the samples that would have been produced by the lower alias frequency. The two frequencies are *indistinguishable* in the digital domain — the sampling process has created a false signal.

Question 4. A 30,000 Hz tone is sampled at 44,100 Hz without an anti-aliasing filter. What frequency will appear in the digital recording?
Show Answer
Using f_alias = |f − n × f_s|:

- n = round(30,000 / 44,100) = round(0.68) = 1
- f_alias = |30,000 − 1 × 44,100| = |30,000 − 44,100| = 14,100 Hz

A 14,100 Hz tone will appear in the digital recording. This tone was NOT in the original signal — it is a phantom created by aliasing. It is clearly audible (well below the roughly 16,000 Hz upper limit many adults can hear) and is inharmonically related to any other content in the recording.

Question 5. What is an anti-aliasing filter, and why must it be placed before the ADC rather than after?
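The alias calculation in the answer to Question 4 generalizes to a small helper; a minimal Python sketch (the function name is my own, chosen for illustration):

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency that a tone at f_hz appears as when sampled at
    sample_rate_hz with no anti-aliasing filter: |f - n * f_s| for the
    integer n that brings the result closest to zero."""
    n = round(f_hz / sample_rate_hz)
    return abs(f_hz - n * sample_rate_hz)

print(alias_frequency(30_000, 44_100))  # 14100 -- the phantom tone from Question 4
print(alias_frequency(20_000, 44_100))  # 20000 -- below Nyquist, passes unchanged
print(alias_frequency(25_000, 44_100))  # 19100 -- folds back below 22,050 Hz
```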
Show Answer
An **anti-aliasing filter** is a low-pass filter that removes all frequency content above the Nyquist frequency (f_s/2) from the audio signal before it reaches the Analog-to-Digital Converter. **Why before the ADC, not after:** Once aliasing has occurred — once a high-frequency signal has been sampled and the alias has folded into the audio band — the alias is indistinguishable from genuine audio content at that frequency. There is no way to remove the alias in post-processing, because the digital data contains no information about whether a 14,100 Hz signal in the recording is a genuine 14,100 Hz signal or an aliased version of a 30,000 Hz signal. The information about the original frequency is destroyed by the sampling process. Therefore, the aliasing must be prevented *before* sampling by filtering out the problematic frequencies.

Question 6. Why does a steep anti-aliasing filter introduce phase distortion in the audio band?
Show Answer
Any analog filter that sharply cuts off near a specific frequency (a "steep" filter) is, by the mathematics of filter design, also a filter with significant **phase shift** near its cutoff frequency. This is not an accident of design — it is a mathematical consequence of the Kramers-Kronig relations: a filter cannot have a sharp amplitude rolloff without also having significant phase shift in the same frequency region. For a 44.1 kHz ADC, the anti-aliasing filter must cut off between 20,000 Hz (pass) and 22,050 Hz (block) — a very narrow transition band. This narrow transition band requires a very steep (high-order) filter, which introduces measurable phase shift in the 16,000–20,000 Hz range. This phase shift means that different high frequencies are slightly delayed relative to one another in the output — a form of timing distortion. Whether this is audible is debated, but it is measurable. Oversampling (working at a much higher internal sample rate) allows a gentler anti-aliasing filter with a much wider transition band, reducing this phase distortion problem.

Question 7. How many quantization levels does 16-bit audio have? What is the approximate dynamic range?
Show Answer
**Quantization levels:** 2^16 = 65,536 distinct amplitude values.

**Dynamic range:** Using the formula DR = 6.02 × B + 1.76 dB:

DR = 6.02 × 16 + 1.76 = 96.32 + 1.76 ≈ **98 dB**

This means the ratio of the loudest possible signal to the quantization noise floor is approximately 98 dB — far exceeding the dynamic range of any analog recording technology of the pre-digital era (professional tape: approximately 70 dB).

Question 8. Why does each additional bit of bit depth add approximately 6 dB of dynamic range?
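The dynamic-range formula from the answer to Question 7 is easy to tabulate; a minimal Python sketch:

```python
def dynamic_range_db(bits):
    """Theoretical SNR of an ideal B-bit quantizer driven by a
    full-scale sine wave: 6.02 * B + 1.76 dB."""
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 2))
# 8-bit ~49.92 dB, 16-bit ~98.08 dB, 24-bit ~146.24 dB
```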
Show Answer
Each additional bit doubles the number of available quantization levels (2^B grows by a factor of 2 for each additional bit). Doubling the number of levels halves the step size between levels — the quantization error amplitude is halved. In decibels: 20 × log₁₀(2) = 20 × 0.301 = **6.02 dB**. So doubling the number of quantization levels (adding 1 bit) reduces quantization noise by 6.02 dB, which is equivalent to increasing the dynamic range by 6.02 dB. This is the physics behind the "6 dB per bit" rule of thumb.

Question 9. What is quantization noise, and why is it particularly audible at very low signal levels?
Show Answer
**Quantization noise** is the error introduced when an audio sample is rounded to the nearest available quantization level. The error ranges from −½ LSB to +½ LSB (half of one least-significant-bit step size). **Why audible at low levels:** At high signal levels (near full scale), the signal spans thousands or millions of quantization levels per cycle. The quantization error (a tiny fraction of the signal amplitude) is negligible relative to the signal. But at very low signal levels — say, a quiet fade that is only 1–2 quantization levels above the noise floor — the signal uses only a few discrete steps. The quantization error is no longer negligible: it is a substantial fraction of the signal amplitude. Worse, the error is **correlated with the signal** (predictably related to where the signal is on the quantization staircase), producing a granular, buzzing distortion rather than smooth audio. This correlated distortion is much more audible than an equivalent amount of random noise.

Question 10. What is dithering, and why does it improve the quality of audio at low signal levels despite adding noise?
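The signal-correlated nature of undithered quantization error, described in the answer to Question 9, can be seen directly: quantize a quiet sine that spans only a few levels and observe that the error repeats exactly with the signal's period. A minimal Python sketch (amplitudes in units of one LSB; the specific numbers are illustrative):

```python
import math

def quantize(x, step=1.0):
    """Round x to the nearest multiple of the quantization step."""
    return round(x / step) * step

# A very quiet sine: peak of 1.4 LSB, so it spans only ~3 quantization levels.
signal = [1.4 * math.sin(2 * math.pi * n / 50) for n in range(200)]
error = [quantize(s) - s for s in signal]

# The error repeats with the signal's 50-sample period: it is deterministic
# and signal-correlated -- heard as granular distortion, not smooth noise.
print(all(abs(a - b) < 1e-9 for a, b in zip(error[:50], error[50:100])))  # True
```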
Show Answer
**Dithering** is the addition of a small amount of random noise to an audio signal before quantization. **Why it improves quality:** Without dithering, quantization error is correlated with the signal — it follows the signal deterministically, producing granular distortion. Adding random dither noise *randomizes* the quantization error: instead of tracking the signal deterministically, the error becomes statistically independent of the signal. The brain finds random noise less objectionable than correlated distortion of the same power level. With dithered quantization, the ear perceives a uniform noise floor below the signal, rather than signal-correlated granularity. This allows quiet signals that are below the quantization level to still be "perceived" through their statistical effect on the noise — the listener can hear signal energy that is smaller than one quantization step on average. The tradeoff: a small, constant noise floor in exchange for elimination of distortion.

Question 11. What is "noise-shaped dithering," and why is it perceptually superior to flat (white) dither noise?
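The statistical survival described in the answer to Question 10 can be demonstrated: a level smaller than one quantization step is destroyed by plain rounding but is preserved, on average, when TPDF dither is added first. A minimal Python sketch (units of one LSB; the specific 0.3 LSB level is illustrative):

```python
import random

def quantize(x, step=1.0):
    """Round x to the nearest multiple of the quantization step."""
    return round(x / step) * step

def dithered_quantize(x, step=1.0, rng=random):
    """Add triangular (TPDF) dither of +/-1 LSB peak before rounding."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return quantize(x + dither, step)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_level = 0.3         # a signal smaller than one quantization step

# Undithered: always rounds to 0 -- the quiet signal simply vanishes.
print(quantize(true_level))  # 0.0

# Dithered: individual samples are noisy, but their average converges on the
# true sub-LSB level -- the information survives as a statistical property.
avg = sum(dithered_quantize(true_level, rng=rng) for _ in range(100_000)) / 100_000
print(abs(avg - true_level) < 0.02)  # True
```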
Show Answer
**Noise-shaped dithering** is a technique that concentrates the dither noise energy in frequency bands where human hearing is least sensitive — typically above 15,000 Hz and at very low frequencies — and removes noise from the midrange (1,000–5,000 Hz) where the ear is most sensitive. **Why perceptually superior:** The human auditory system has a non-flat sensitivity curve (the Fletcher-Munson or equal-loudness contour). The ear is far more sensitive to midrange frequencies than to high or low frequencies. If dither noise is spectrally shaped to concentrate in regions of low auditory sensitivity, the same amount of total noise power (required to decorrelate quantization error) becomes much less audible. Noise-shaped dithering applied to 16-bit audio can achieve a perceptual dynamic range exceeding 120 dB — significantly better than the theoretical 96 dB limit of undithered 16-bit audio.

Question 12. Why was 44,100 Hz chosen as the CD sampling rate? What is the historical reason?
Show Answer
The 44,100 Hz figure was derived from the need to store digital audio on **consumer video cassette tape** (VCR), which was the most practical high-density data storage medium available in the late 1970s when the CD standard was being developed. Audio was encoded within video signals by storing a fixed number of audio samples per usable video line. Both PAL (25 fps / 50 fields per second, 625 lines) and NTSC (approximately 30 fps / 60 fields per second, 525 lines) could carry 3 samples per usable line: NTSC yields 3 samples × 245 usable lines × 60 fields/s = 44,100 samples/s, and PAL yields 3 × 294 × 50 = 44,100. This compatibility requirement led to the 44,100 Hz figure. The practical audio requirement was that the sample rate exceed 40,000 Hz (twice the nominal 20,000 Hz upper limit of human hearing). 44,100 Hz satisfies this with a comfortable margin and happened to be compatible with both major video standards.

Question 13. What is "high-resolution audio" (e.g., 96 kHz/24-bit), and what are the main physical arguments for and against it providing better quality than 44.1 kHz/16-bit?
Show Answer
**High-resolution audio** uses higher sample rates (88.2, 96, 176.4, or 192 kHz) and/or higher bit depths (24 or 32 bits) than the CD standard.

**Physical arguments FOR:** (1) Higher sample rates can capture genuine musical content above 20 kHz (cymbal shimmer, bow noise) that may interact with the audible range. (2) Higher sample rates allow gentler anti-aliasing filters with wider transition bands, reducing phase distortion in the upper audio band. (3) 24-bit recording provides headroom for gain staging errors during tracking and mixing. (4) If any ultrasonic content is present, intermodulation with the playback chain could affect the audible range.

**Physical arguments AGAINST:** (1) Human hearing is generally accepted to extend no higher than 20,000 Hz — there is no known mechanism for perceiving 40,000 Hz content. (2) Multiple double-blind listening tests (including the Meyer-Moran 2007 study) have found no statistically reliable preference for hi-res over 16/44.1 audio. (3) The additional data is roughly 3× larger for 24/96 (24 × 96,000 vs. 16 × 44,100 bits per second per channel) without documented perceptual benefit for delivery formats.

Question 14. What does the Spotify Spectral Dataset analysis in Section 32.9 reveal about the practical importance of sample rate for different music genres?
Show Answer
The analysis reveals that the practical importance of sample rate varies substantially by genre:

**Genres where higher sample rates matter most:** Orchestral classical, acoustic jazz, acoustic folk — music with natural high-frequency acoustic content from cymbals, bowed strings near the bridge, woodwind overtones. These sources can have significant energy at 22–28 kHz that is captured at 96 kHz but eliminated by the 44.1 kHz anti-aliasing filter.

**Genres where higher sample rates matter least:** Electronic music, hip-hop, and heavily processed pop — music produced digitally or with heavy signal processing, where high-frequency content is often already band-limited by the production tools, or where spectral content above 20 kHz is absent or very low level.

**Implication:** For 70–80% of commercial music in the dataset, 44.1 kHz is practically equivalent to 96 kHz for the end listener. High-resolution sampling has clear value during the recording and mastering process (where it provides headroom and reduced filter distortion), but may not matter for most genres at the final delivery stage.

Question 15. Explain why a DAC cannot simply connect the digital sample values directly to an audio output — why is a reconstruction filter required?
Show Answer
When a DAC converts samples to analog, the simplest implementation holds each sample value constant until the next sample arrives — producing a "staircase" waveform. This staircase does contain the correct audio information, but it also contains **high-frequency spectral images**: copies of the audio spectrum centered at the sample rate and its multiples (44,100 Hz, 88,200 Hz, 132,300 Hz, etc.). These images are artifacts of the discrete-to-continuous conversion process, not part of the original audio. They are at ultrasonic frequencies and would not be heard directly, but they could intermodulate with the audio band in subsequent electronics (amplifiers, speaker drivers), creating audible distortion products. They could also interfere with electronic components not designed to handle ultrasonic signals. The **reconstruction (low-pass) filter** removes these images by blocking everything above the Nyquist frequency (22,050 Hz for 44.1 kHz systems), leaving only the desired audio-band signal.

Question 16. What is the "Gibbs phenomenon" in the context of ideal digital reconstruction filters, and why does it matter for audio quality?
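The image frequencies described in the answer to Question 15 are easy to enumerate; a minimal Python sketch (the helper name is illustrative):

```python
def image_frequencies(f_hz, sample_rate_hz, n_images=2):
    """Spectral images left by sample-and-hold reconstruction: copies of
    the tone at k*fs - f and k*fs + f for k = 1..n_images."""
    images = []
    for k in range(1, n_images + 1):
        images.append(k * sample_rate_hz - f_hz)
        images.append(k * sample_rate_hz + f_hz)
    return images

# A 1 kHz tone reconstructed at 44.1 kHz leaves ultrasonic images that the
# reconstruction filter must remove:
print(image_frequencies(1_000, 44_100))  # [43100, 45100, 87200, 89200]
```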
Show Answer
The **Gibbs phenomenon** is an overshoot/oscillation that occurs in the time-domain output of a system with a perfectly sharp (ideal brick-wall) frequency cutoff. When a sharp transient (like a drum attack) is passed through an ideal low-pass filter, the output shows oscillations both *before* the transient (pre-ringing) and *after* it (post-ringing), at the cutoff frequency. **Why it matters for audio:** The pre-ringing is particularly controversial. In physical acoustics, effects always come *after* their causes (causality). A reconstruction filter that produces audible oscillation *before* a drum attack would be physically unnatural. Some engineers and audiophiles argue this pre-ringing is audible and objectionable for sharp transients like snare drums. However, for ideal brick-wall filters, the ringing frequency is the Nyquist frequency (22,050 Hz for CD audio) — inaudible to most adults. The pre-ringing is at ultrasonic frequencies, suggesting it should not be perceivable. Whether there is an audible effect through intermodulation or subtle time-domain cues remains a genuine point of debate in high-end audio design.

Question 17. What is "oversampling" in modern ADC design, and how does it help address the anti-aliasing filter problem?
Show Answer
**Oversampling** is the practice of sampling the audio signal at a rate many times higher than the final desired sample rate (e.g., sampling at 2.8 MHz internally to produce 44.1 kHz audio — an oversampling ratio of approximately 64×). **How it helps:** The anti-aliasing filter must cut off frequencies above the Nyquist frequency. At 44.1 kHz, the Nyquist is 22.05 kHz — dangerously close to 20 kHz (human hearing limit). The filter must transition from 0 dB (pass) to −80 dB (block) in just 2 kHz — a very steep filter that introduces phase distortion in the audible band. At 2.8 MHz, the Nyquist is 1.4 MHz. The anti-aliasing filter now only needs to block frequencies above 1.4 MHz while passing 20 kHz. The transition band is huge (1.4 MHz vs. 2 kHz), allowing a very gentle filter that introduces negligible phase distortion in the audio band. A digital filter then decimates the 2.8 MHz stream down to 44.1 kHz with perfect brick-wall characteristics (digital filters can be made arbitrarily steep without phase problems).

Question 18. What is latency in digital audio, and why does it matter for live performance?
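The filter-design relief described in the answer to Question 17 comes down to transition-band width; a minimal Python sketch (assuming a 20 kHz passband edge and a 64× oversampled rate of exactly 64 × 44,100 = 2,822,400 Hz):

```python
def transition_band_hz(sample_rate_hz, passband_edge_hz=20_000):
    """Width of the anti-aliasing filter's transition band: from the top
    of the audio passband up to the Nyquist frequency (fs / 2)."""
    return sample_rate_hz / 2 - passband_edge_hz

print(transition_band_hz(44_100))     # 2050.0    -- needs a very steep filter
print(transition_band_hz(2_822_400))  # 1391200.0 -- a gentle filter suffices
```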
Show Answer
**Latency** is the time delay between an audio event entering the digital system (e.g., a singer's voice at the microphone) and being heard through the output (e.g., monitor headphones). In a digital system, audio must be processed in blocks of samples (buffer size). The buffer must be filled before processing begins and before audio can be output. Round-trip latency = input buffer time + processing time + output buffer time. **Why it matters for live performance:** When a musician monitors themselves through a digital system with significant latency, they hear their own performance with a delay. Delays above approximately 10–15 milliseconds begin to interfere with performance — musicians tend to "chase" the delayed signal, slowing their tempo or making rhythmic errors. Delays above 30–40 ms become severely disruptive, similar to the delayed-auditory-feedback experiments that induce stutter-like speech disruption in normal speakers. Professional audio interfaces achieve round-trip latencies of 3–5 ms using small buffer sizes (64–256 samples) and dedicated low-latency drivers. Consumer audio hardware with large buffers may have latencies of 50–100 ms or more — unacceptable for live monitoring.

Question 19. The CD's 44.1 kHz/16-bit standard was established in the late 1970s. If you were designing a new digital audio standard today, would you choose the same specifications? What would you change and why?
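The buffer arithmetic in the answer to Question 18 is a one-liner; a minimal Python sketch:

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Delay contributed by filling one audio buffer, in milliseconds.
    Round-trip latency stacks at least one input and one output buffer."""
    return 1000.0 * buffer_samples / sample_rate_hz

print(round(buffer_latency_ms(64, 44_100), 2))    # 1.45  -- pro low-latency buffer
print(round(buffer_latency_ms(4096, 44_100), 2))  # 92.88 -- large consumer buffer
```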
Show Answer
This is a discussion question; well-reasoned answers may vary. A strong response would address:

**What to keep:**

- 44.1 kHz (or 48 kHz for film/broadcast) sample rate: The Nyquist frequency of 22,050 Hz is adequate for human hearing (nominally 20,000 Hz). Higher sample rates provide diminishing perceptual returns for playback.
- 16-bit minimum: Adequate dynamic range for playback in any real listening environment (typical rooms have ambient noise floors of roughly 30–40 dB SPL, so the usable range between room noise and peak listening level fits comfortably within 16-bit's ≈96 dB).

**What to change:**

- **Higher delivery resolution:** 24-bit for playback (the additional dynamic range has little cost in a streaming environment and provides headroom for quiet passages near the quantization limit).
- **Higher recording/production standard:** 96 kHz/24-bit or 88.2 kHz/24-bit for production, even if final delivery is at 44.1 kHz. The wider anti-aliasing filter transition band at higher sample rates reduces phase distortion during production.
- **Mandatory dithering:** The standard should specify required dither when converting from production to delivery bit depth.

Question 20. Why is the Nyquist-Shannon theorem considered one of the most important mathematical results of the 20th century, beyond just its application to audio?