Chapter 32 Exercises: Digital Audio — Sampling, Quantization & the Nyquist Theorem

Part A: Nyquist Theorem and Sampling

A1. State the Nyquist-Shannon sampling theorem in your own words without using mathematical notation. Then state it using a formula. What are the two conditions required for perfect reconstruction? Give a concrete example: if you want to sample audio with content up to 18,000 Hz, what is the minimum sampling rate required?

A2. A digital audio system samples at 32,000 Hz. (a) What is the Nyquist frequency? (b) Can this system accurately capture a 15,000 Hz cymbal shimmer? (c) Can it capture a 17,000 Hz flute harmonic? (d) What will happen to an 18,000 Hz component that enters the ADC without being filtered out first? Calculate the alias frequency.

A3. Video game audio from the 1980s often used sample rates of 8,000 Hz or lower. (a) What is the Nyquist frequency for 8,000 Hz sampling? (b) What is the highest vocal formant frequency that can be captured? (c) The "telephone voice" quality of old game audio is partly attributable to this bandwidth limitation. Explain specifically which aspects of voice quality are lost above 4,000 Hz. (d) What sample rate would be needed to capture voice quality comparable to modern telephone (4,000 Hz bandwidth)?

A4. A film sound system uses a sample rate of 48,000 Hz rather than the CD-standard 44,100 Hz. (a) What are the Nyquist frequencies for each? (b) What additional frequency content can the 48 kHz system capture that the 44.1 kHz system cannot? (c) Film sound is often mixed for theatrical speakers that may not reproduce above 20,000 Hz anyway. Is there a practical reason to prefer 48 kHz for film audio? Consider the anti-aliasing filter steepness required for each rate.

A5. Shannon's reconstruction formula states that the original signal can be recovered from its samples using a sum of sinc functions. (a) What is the sinc function, and what does it look like in the time domain? (b) At what time values does sinc(t) equal zero? (c) If sample values are x[0] = 0.5 and x[1] = 0.8, and the sampling period is T_s, sketch (don't calculate — just describe the shape) the reconstructed continuous signal between these two samples using sinc interpolation. Why does the reconstructed value between samples not follow a straight line from x[0] to x[1]?
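A minimal numeric sketch of the sinc interpolation in A5(c), using plain Python (the helper names `sinc` and `reconstruct` are ours, not the chapter's). It shows that the reconstruction passes exactly through each sample yet does not follow a straight line between them:

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1.
    Zero at every nonzero integer, so each sample's contribution
    vanishes at every other sample instant."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t, Ts=1.0):
    """Shannon reconstruction: x(t) = sum over n of x[n] * sinc((t - n*Ts)/Ts)."""
    return sum(x_n * sinc((t - n * Ts) / Ts) for n, x_n in enumerate(samples))

samples = [0.5, 0.8]
print(round(reconstruct(samples, 0.0), 4))  # 0.5  -- hits each sample exactly
print(round(reconstruct(samples, 1.0), 4))  # 0.8
print(round(reconstruct(samples, 0.5), 4))  # 0.8276, not the linear midpoint 0.65
```

With only two samples, the tails of all the other (zero) samples are absent, so this illustrates the shape of sinc interpolation rather than an exact reconstruction.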


Part B: Aliasing

B1. A 25,000 Hz tone is sampled at 44,100 Hz without an anti-aliasing filter. (a) What is the alias frequency? Show your calculation using the formula f_alias = |f − n × f_s| for the appropriate integer n. (b) Is this alias audible to a typical adult? (c) Now a 35,000 Hz tone is sampled at the same rate. What is its alias frequency? (d) A 22,500 Hz tone is sampled at 44,100 Hz. What is its alias? Why is this particularly dangerous?
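The folds in B1 can be checked numerically. `folded_alias` is a small helper of ours, equivalent to f_alias = |f − n × f_s| with n chosen so the result lands in the baseband:

```python
def folded_alias(f, fs):
    """Fold an input frequency into the baseband [0, fs/2]."""
    f = f % fs            # sampled spectra repeat every fs...
    if f > fs / 2:
        f = fs - f        # ...and reflect around the Nyquist frequency
    return f

print(folded_alias(25_000, 44_100))   # 19100 Hz
print(folded_alias(35_000, 44_100))   # 9100 Hz
print(folded_alias(22_500, 44_100))   # 21600 Hz -- just below Nyquist
```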

B2. The wagon-wheel effect in film is a visual analog of audio aliasing. A wagon wheel has 12 spokes and rotates at 25 revolutions per second, filmed at 24 frames per second. (a) What is the "frequency" of the wheel rotation in terms of spoke passages per second (12 spokes × 25 revolutions = 300 spoke-passages/second)? (b) What is the "Nyquist frequency" of 24-frame-per-second film for this motion (in spoke-passages/second)? (c) What "aliased" rotation rate will the wheel appear to have on film? Will it appear to go forward or backward?

B3. An anti-aliasing filter for a 44,100 Hz ADC must attenuate frequencies above 22,050 Hz before sampling. (a) The filter achieves −3 dB at 20,000 Hz and −80 dB at 22,050 Hz. Sketch the filter's frequency response and label the passband, transition band, and stopband. (b) What is the filter's roll-off rate in dB per octave? (c) Is there any frequency content above 22,050 Hz that "leaks through" a real (non-ideal) anti-aliasing filter? What does this mean for practical digital audio?

B4. Digital audio workstations often work at 88.2 or 96 kHz internally, even for projects that will be delivered at 44.1 or 48 kHz. (a) What aliasing risk does 88.2 kHz internal processing avoid that 44.1 kHz processing would encounter? (b) Many digital effects (reverb, EQ, distortion) can generate frequency content above the input signal's bandwidth. How does working at 88.2 kHz help avoid aliasing artifacts from these effects? (c) When the project is downsampled (decimated) from 88.2 kHz to 44.1 kHz for delivery, what process must occur to avoid introducing aliasing at that step?

B5. A musician is recording a synthesizer that produces a 15,000 Hz sawtooth wave (the fundamental) with harmonics at 30,000, 45,000, 60,000 Hz, etc. The ADC runs at 44,100 Hz. (a) The 15,000 Hz fundamental is safely below Nyquist. Is it captured correctly? (b) The 30,000 Hz harmonic is above Nyquist. What is its alias frequency? (c) The 45,000 Hz harmonic is above the sample rate itself. What is its alias? Show your calculation. (d) Why can a band-limited digital sawtooth oscillator avoid this problem, while an analog synthesizer's sawtooth fed into the ADC cannot?
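The harmonic series in B5 can be folded into baseband with a short loop (a sketch of ours, assuming the harmonics listed in the problem):

```python
fs = 44_100
nyquist = fs / 2

for k in range(1, 5):
    f = 15_000 * k              # harmonics of the 15 kHz sawtooth
    folded = f % fs             # sampled spectra repeat every fs...
    if folded > nyquist:
        folded = fs - folded    # ...and reflect around Nyquist
    print(f"{f:>6} Hz -> appears at {folded} Hz")
```

Note how the 45,000 Hz harmonic lands at a low in-band frequency, which is exactly why unfiltered analog sawtooths sound gritty when sampled.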


Part C: Quantization and Bit Depth

C1. (a) How many distinct amplitude levels can a 16-bit digital audio system represent? (b) What is the theoretical dynamic range in dB? Use the formula DR = 6.02 × B + 1.76 dB. (c) How many bits would be needed to achieve 120 dB of dynamic range? (d) A 32-bit floating-point audio format achieves approximately 1,528 dB of theoretical dynamic range. Is there any practical use for dynamic range beyond 120 dB in a listening context?
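Parts (a)–(c) of C1 can be sanity-checked in a few lines, using the DR formula given in the problem (the function name is ours):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of a B-bit quantizer: DR = 6.02*B + 1.76 dB."""
    return 6.02 * bits + 1.76

print(2 ** 16)                            # distinct 16-bit levels: 65536
print(round(dynamic_range_db(16), 2))     # 98.08 dB
# Invert the formula for part (c): B = (DR - 1.76) / 6.02, rounded up
bits_needed = math.ceil((120 - 1.76) / 6.02)
print(bits_needed)                        # 20 bits for 120 dB
```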

C2. A recording engineer sets the input level too low: the peak signal is at −24 dBFS (24 dB below full scale) rather than the target −6 dBFS. (a) For a 16-bit system, how many quantization levels are actually being used by this quiet signal? (b) What is the effective dynamic range of this recording, given that the signal uses only the bottom portion of the 16-bit range? (c) If the recording is later boosted by 18 dB in post-production to reach the correct level, will the boosted recording sound as good as if it had been recorded at the correct level? Why?
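A sketch of the level-counting in C2(a) and (b), converting −24 dBFS to a linear amplitude ratio (the variable names are ours):

```python
import math

levels_total = 2 ** 16                   # all 16-bit levels
peak_ratio = 10 ** (-24 / 20)            # -24 dBFS as a linear amplitude ratio
levels_used = levels_total * peak_ratio
print(round(levels_used))                # ~4135 levels actually exercised
print(round(math.log2(levels_used), 1))  # ~12.0 -- roughly a 12-bit recording
```

This matches the rule of thumb that every 6 dB of unused headroom costs about one bit of resolution.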

C3. Explain the concept of quantization noise using the following steps: (a) Define "quantization error" precisely. (b) Explain why, for a complex audio signal, quantization error behaves statistically like random noise. (c) Explain why quantization error for a very simple signal (a pure sine wave at very low level) does NOT behave like random noise — what does it sound like, and why? (d) What does dithering do to change this behavior?

C4. A 24-bit audio recording is being prepared for CD release at 16-bit. The engineer needs to reduce the bit depth from 24 to 16 bits. (a) Simply rounding the 24-bit samples to the nearest 16-bit value: what noise level is introduced? (b) The engineer adds 16-bit dither before rounding. How does this change the character of the quantization error? (c) The engineer then applies noise-shaped dithering that concentrates quantization error above 15,000 Hz. Why would this be preferable to flat dither noise? What assumption about the listener's hearing does this rely on?

C5. The CD standard (16-bit, 44,100 Hz) stores approximately 5 MB per minute per channel of audio (44,100 samples × 2 bytes × 60 seconds ÷ 1,048,576 ≈ 5.05 MB). (a) Calculate the actual storage for one minute of stereo CD audio. (b) A 24-bit/96 kHz "high-resolution" file is how many times larger than the same recording at 16-bit/44.1 kHz? Calculate the exact factor. (c) A 16-bit/44.1 kHz FLAC lossless compressed file is typically 50-60% the size of the uncompressed WAV. How does FLAC achieve this compression without losing any audio information? What principle of information theory does it exploit?
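The storage arithmetic in C5 can be checked directly (a sketch; the helper name is ours):

```python
MB = 1_048_576   # bytes per MB (binary)

def pcm_mb_per_minute(fs, bytes_per_sample, channels):
    """Uncompressed PCM storage for one minute of audio, in MB."""
    return fs * bytes_per_sample * channels * 60 / MB

cd_stereo = pcm_mb_per_minute(44_100, 2, 2)      # 16-bit = 2 bytes/sample
hires_stereo = pcm_mb_per_minute(96_000, 3, 2)   # 24-bit = 3 bytes/sample
print(round(cd_stereo, 2))                       # ~10.09 MB per stereo minute
print(round(hires_stereo / cd_stereo, 3))        # ~3.265x larger
```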


Part D: Analog-to-Digital and Digital-to-Analog Conversion

D1. Explain the difference between an analog signal and a digital signal in terms of (a) the number of distinct amplitude values possible at any instant, (b) the continuity of the time axis, and (c) how information is encoded. Then explain why the phrase "the digital version is just an approximation of the analog original" is technically incorrect when the Nyquist condition is met.

D2. A sigma-delta ADC samples at 4 MHz (4,000,000 Hz) using 1-bit quantization, then digitally filters and decimates to produce a 48,000 Hz, 24-bit output. (a) What is the Nyquist frequency of the internal 4 MHz sampling stage? (b) What oversampling ratio is the converter using (4 MHz / 48 kHz)? (c) Why is a 1-bit quantizer adequate when combined with this extreme oversampling? What property of the quantization noise spectrum does the sigma-delta feedback loop create? (d) What is the advantage of needing only a gentle anti-aliasing filter at 4 MHz rather than a steep filter at 48 kHz?
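Parts (a) and (b) of D2 are one-line calculations; a quick numeric check:

```python
internal_fs = 4_000_000   # sigma-delta modulator rate (Hz)
output_fs = 48_000        # decimated output rate (Hz)

print(internal_fs / 2)                     # internal Nyquist: 2,000,000 Hz
print(round(internal_fs / output_fs, 1))   # oversampling ratio: ~83.3x
```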

D3. The reconstruction filter in a DAC must remove "images" of the audio signal centered at multiples of the sample rate. (a) If a DAC outputs at 44,100 Hz, at what frequencies do images appear? List the first three. (b) Why must these images be removed before the signal reaches the loudspeaker? (c) The reconstruction filter introduces "pre-ringing" in the time domain for sharp transients. Explain the Gibbs phenomenon and why it produces this effect. (d) Some DAC designers use "minimum-phase" reconstruction filters to reduce pre-ringing at the cost of more post-ringing. What is the trade-off?

D4. Latency in digital audio systems results from the need to process audio in blocks. (a) If a DAW uses a buffer size of 256 samples at 44,100 Hz, what is the buffer's duration in milliseconds? (b) What total round-trip latency (input buffer + processing + output buffer) would this produce? (c) Why does a musician performing live with effects (guitar amp simulation, for example) need latency below approximately 10-15 ms? What perceptual effect does latency above this threshold produce? (d) How does an audio interface with hardware DSP effects bypass the computer's buffer latency for monitoring?
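A lower-bound sketch of the latency arithmetic in D4, assuming one buffer each way (real interfaces add converter and driver overhead on top):

```python
def buffer_ms(samples, fs):
    """One buffer's duration in milliseconds."""
    return 1000 * samples / fs

one_way = buffer_ms(256, 44_100)
print(round(one_way, 2))          # ~5.8 ms per 256-sample buffer
# Round trip is at least one input plus one output buffer:
print(round(2 * one_way, 2))      # ~11.61 ms minimum
```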

D5. Dithering adds noise to improve perceived quality. This seems paradoxical. Explain the physics and psychoacoustics of dithering using these steps: (a) What specific problem does undithered quantization create for very quiet signals (below approximately −60 dBFS for 16-bit)? (b) How does adding dither noise before quantization change the statistical relationship between the quantization error and the audio signal? (c) Why does the ear find random noise less objectionable than correlated distortion, even at the same power level? (d) After adding dither and quantizing, the signal is slightly noisier. When is this noise floor trade-off worthwhile?


Part E: High-Resolution Audio and Critical Analysis

E1. A manufacturer claims their 32-bit/384 kHz audio format is "the ultimate in digital audio fidelity." Evaluate this claim from first principles: (a) What is the Nyquist frequency of 384 kHz sampling, and what does this mean for the required anti-aliasing filter? (b) What dynamic range does 32-bit integer audio theoretically provide? Is any music — or any listening environment — capable of using this dynamic range? (c) What would be the file size of one minute of stereo 32-bit/384 kHz audio, in megabytes? (d) If the manufacturer's 32-bit DAC is only capable of 130 dB of actual dynamic range (due to circuit noise), how many effective bits does the converter actually achieve? Use the DR formula.
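Parts (c) and (d) of E1 reduce to arithmetic; a sketch using the DR formula from C1 (the function name is ours):

```python
def effective_bits(dr_db):
    """Invert DR = 6.02*B + 1.76 to estimate effective number of bits."""
    return (dr_db - 1.76) / 6.02

# A "32-bit" DAC limited by circuit noise to 130 dB of real dynamic range:
print(round(effective_bits(130), 1))       # ~21.3 effective bits

# Storage for one minute of stereo 32-bit/384 kHz audio (4 bytes/sample):
mb_per_minute = 384_000 * 4 * 2 * 60 / 1_048_576
print(round(mb_per_minute, 1))             # ~175.8 MB
```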

E2. The Spotify Spectral Dataset analysis in Section 32.9 found that the majority of commercial music tracks have negligible content above 20,000 Hz. (a) For these tracks, is there a measurable benefit to delivering them at 96 kHz rather than 44.1 kHz? Consider both the frequency content argument and the filter quality argument. (b) For orchestral recordings that do have content at 22-28 kHz (cymbal shimmer, string bow noise), is this content musically significant? Design an argument both for and against preserving it. (c) Under what listening conditions — what playback system, what listening environment — could any difference between 44.1 kHz and 96 kHz audio theoretically be detectable?

E3. The Meyer and Moran 2007 study found no statistically significant preference for high-resolution audio in double-blind tests. However, the AES meta-analysis (Reiss 2016) found a small but statistically significant advantage. (a) Explain what "statistically significant" means in this context. Can a result be statistically significant but not practically meaningful? (b) What methodological issues might cause double-blind tests to underestimate real-world preference differences? (c) Conversely, what methodological issues in unblinded audiophile listening might exaggerate perceived differences? (d) Design an improved listening test that addresses at least two methodological concerns. What controls would you include?

E4. Consider three scenarios and evaluate the importance of sample rate and bit depth in each: (a) A pop song destined for Spotify streaming at 320 kbps MP3. The song will be heard on earbuds during commuting, often with ambient noise. (b) A classical piano recording destined for high-resolution download and playback on a $10,000 audiophile system in a quiet room. (c) A forensic audio analysis of a crime scene recording, where the investigator needs to identify barely audible speech behind other sounds. For each scenario, recommend a sample rate and bit depth, and justify your choice.

E5. Vinyl records have experienced a major commercial renaissance. From a digital audio perspective: (a) A vinyl record can contain frequency content above 22,050 Hz (the CD Nyquist). Does this frequency content survive the entire vinyl production and playback chain (mastering, cutting lathe, pressing, stylus, phono cartridge, phono preamplifier)? Research and discuss each stage's bandwidth. (b) Vinyl also has lower dynamic range (approximately 60-70 dB) than 16-bit CD (96 dB). Paradoxically, many audiophiles prefer vinyl. Using the concepts of quantization noise, distortion character, and mastering practices discussed in this chapter, construct a physics-based argument for why the vinyl listening experience might be preferred despite its technical limitations.