Chapter 2 Quiz
Closed book, honest conditions, no peeking at the chapter. Answer everything, then open the details blocks — the explanations are written to teach, not only to grade. Scoring table at the bottom.
Section 1: Multiple Choice (1 point each)
Q1. A single audio sample records:
- (a) A short slice of sound, like a tiny audio clip
- (b) The wave's amplitude at one instant, stored as a number
- (c) The frequency present at one instant
- (d) One complete cycle of the waveform
Answer
**(b).** A sample is one measurement: the signal's instantaneous amplitude, written down as a number. It contains no frequency information by itself; frequency emerges from the *pattern* across many samples. Thinking of samples as "tiny slices of sound" is the film-frames trap that leads straight to the stairstep misconception.
Q2. The Nyquist frequency of a 44.1 kHz session is:
- (a) 44,100 Hz
- (b) 88,200 Hz
- (c) 22,050 Hz
- (d) 20,000 Hz
Answer
**(c).** The Nyquist frequency is half the sample rate, the highest frequency the system can capture honestly: 44,100 ÷ 2 = 22,050 Hz. Answer (d) is the textbook ceiling of human *hearing*, which is why 22,050 works: it clears the ear's range with elbow room for the anti-aliasing filter.
Q3. A 32 kHz tone enters a 48 kHz converter with no anti-aliasing filter. The recording contains:
- (a) Nothing: the tone is above Nyquist and is discarded
- (b) A 32 kHz tone, captured with reduced accuracy
- (c) A false tone at 16 kHz
- (d) A false tone at 24 kHz
Answer
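Here is a worked check as a small Python sketch. It models an ideal sampler with no anti-aliasing filter and folds a pure tone into the captured band. The helpers `nyquist` and `alias_frequency` are illustrative names, not any library's API.

```python
def nyquist(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency an ideal sampler (no anti-aliasing filter) records for a pure tone.

    Sampling makes the spectrum repeat every sample_rate_hz, and for real
    signals each copy is mirrored. So: reduce the tone into [0, fs), then
    reflect anything above Nyquist back below it.
    """
    f = tone_hz % sample_rate_hz
    return sample_rate_hz - f if f > nyquist(sample_rate_hz) else f


# Q3: a 32 kHz tone into a 48 kHz converter lands at 16 kHz.
print(alias_frequency(32_000, 48_000))  # 16000

# Sweep the input upward past Nyquist and watch where each tone lands.
for tone in (20_000, 26_000, 32_000, 38_000):
    print(tone, "->", alias_frequency(tone, 48_000))
```

Look at the sweep. As the input rises from 26 kHz to 38 kHz, the recorded alias falls from 22 kHz to 10 kHz. That is why aliases seem to move the wrong way.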
**(c).** Above-Nyquist content folds back around the Nyquist frequency (24 kHz here). A tone 8 kHz above it lands 8 kHz below it: 48,000 − 32,000 = 16,000 Hz. The result is a phantom, fully audible and harmonically unrelated to the source. This is why the anti-aliasing filter must act *before* conversion: once recorded, the 16 kHz impostor is indistinguishable from a real 16 kHz tone.
Q4. Increasing bit depth from 16 to 24 bits primarily:
- (a) Extends the captured frequency range
- (b) Lowers the quantization noise floor by roughly 48 dB
- (c) Makes loud signals sound smoother and more detailed
- (d) Doubles the file's sample rate
Answer
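The "about 6 dB per bit" rule falls straight out of the arithmetic. Each extra bit doubles the number of levels, and 20·log10(2) ≈ 6.02 dB. Below is a minimal Python sketch. The function name is illustrative. It computes the bare ratio of full scale to one step and leaves out the extra ~1.76 dB usually quoted for a full-scale sine.

```python
import math


def quantization_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, "bits:", round(quantization_range_db(bits), 1), "dB")  # 96.3, then 144.5

# Eight extra bits = 20*log10(2**8), whatever the starting depth.
gain = quantization_range_db(24) - quantization_range_db(16)
print("16 -> 24 bits buys", round(gain, 1), "dB")  # 48.2
```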
**(b).** Each bit is worth about 6 dB of dynamic range; eight extra bits ≈ 48 dB more space between full scale and the format's hiss (≈96 dB → ≈144 dB theoretical). Frequency range is sample rate's job, not bit depth's. And a signal well above the floor is captured equally well at either depth: bit depth moves the *floor*, not the "detail."
Q5. 0 dBFS represents:
- (a) Silence
- (b) The threshold of human hearing
- (c) A comfortable average mixing level
- (d) The largest value the digital format can record: the absolute ceiling
Answer
**(d).** Full Scale is every bit maxed out. The scale runs *downward* from there: all real audio lives at negative dBFS values, and digital silence is negative infinity. (Answer (b) describes 0 dB **SPL**, the other ruler, anchored at the bottom. Keeping the two rulers apart is the literacy this chapter is after.)
Q6. Digital clipping sounds harsh because:
- (a) The converter adds random noise during overload
- (b) Flattened peaks push the waveform toward square-wave shapes, spraying strong odd harmonics (which can also alias)
- (c) The sample rate momentarily drops
- (d) Loudness itself is inherently harsh
Answer
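This Python sketch shows the odd-harmonic spray numerically. It hard-clips a sine driven about 6 dB past full scale and measures each harmonic with a single-bin DFT. The tone, sample rate and drive level are arbitrary illustrative choices. Using ten whole cycles keeps every harmonic on an exact DFT bin, so nothing smears.

```python
import math

FS = 48_000    # sample rate (Hz), illustrative
F0 = 1_000     # test tone (Hz): exactly 48 samples per cycle
N = 480        # ten whole cycles, so every harmonic lands on an exact DFT bin
DRIVE = 2.0    # peak amplitude; full scale is 1.0, so about 6 dB over


def hard_clip(x: float) -> float:
    """What an overloaded converter does: flatten anything beyond full scale."""
    return max(-1.0, min(1.0, x))


def harmonic_amplitude(signal, harmonic: int) -> float:
    """Amplitude of the given harmonic of F0, via a single-bin DFT."""
    k = harmonic * F0 * N // FS  # DFT bin index for this harmonic
    re = sum(s * math.cos(2 * math.pi * k * n / N) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * n / N) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / N


clean = [DRIVE * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]
clipped = [hard_clip(s) for s in clean]

fundamental = harmonic_amplitude(clipped, 1)
for h in range(2, 6):
    ratio = harmonic_amplitude(clipped, h) / fundamental
    level = 20 * math.log10(ratio) if ratio > 1e-12 else float("-inf")
    print(f"harmonic {h}: {level:7.1f} dB re fundamental")
```

The even harmonics come out at -inf or vanishingly small. The 3rd and 5th land roughly 13 dB and 27 dB below the fundamental. Symmetric clipping of a sine produces odd harmonics only.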
**(b).** The guillotined flat tops are a shape change, and shape is harmonic recipe ([Chapter 1](../chapter-01-what-is-sound/index.md)): square-ish waves mean strong odd harmonics smeared up the spectrum. Generated harmonics that exceed Nyquist fold back down as inharmonic aliases, so two failure modes stack. Nothing random about it; it's grimly deterministic.
Q7. Dither is:
- (a) Noise added before reducing bit depth, to convert correlated rounding distortion into benign hiss
- (b) A filter that removes frequencies above Nyquist
- (c) A loudness-boosting stage used in mastering
- (d) An error-correction code that restores clipped peaks
Answer
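Here is a minimal Python sketch of what dither does. It works in units of the target format's smallest step (one LSB). A fade tail that has decayed to 0.4 LSB is requantized twice: once with plain rounding, and once with TPDF dither (the difference of two uniform random values). TPDF is a common choice assumed here; the chapter does not specify a dither type. The helper names are illustrative.

```python
import math
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

FS = 48_000
# A fade tail that has decayed to 0.4 of one step (LSB) of the target format.
signal = [0.4 * math.sin(2 * math.pi * 440 * n / FS) for n in range(FS)]


def requantize(samples, dither: bool):
    """Round each sample (in LSB units) to the nearest integer step."""
    out = []
    for x in samples:
        if dither:
            # TPDF dither: difference of two uniform draws, triangular on (-1, 1) LSB
            x += random.random() - random.random()
        out.append(round(x))
    return out


def correlation(a, b) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


for use_dither in (False, True):
    out = requantize(signal, use_dither)
    error = [q - s for q, s in zip(out, signal)]
    print("dither" if use_dither else "plain ", "error/signal correlation:",
          round(correlation(error, signal), 3))
```

Without dither, the tail rounds to silence. The error is then the signal itself, inverted (correlation -1.0), which is distortion that tracks the music. With dither, the error's correlation with the signal is close to zero, so it behaves like plain hiss, and the faint tail still survives on average inside it.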
**(a).** When 24-bit becomes 16-bit, plain truncation creates error that *tracks the music*: audible grunge on fades and tails. A whisper of random noise added first scrambles those errors into steady, signal-independent hiss far below audibility. It is applied once, at the final reduction to 16-bit. It cannot fix clipping, and it has nothing to do with loudness.
Q8. The most defensible everyday reason to run a session above 48 kHz is:
- (a) Listeners can hear ultrasonic frequencies on good speakers
- (b) Capturing material you intend to slow down drastically for sound design
- (c) Streaming platforms require 96 kHz uploads
- (d) Higher rates lower the noise floor
Answer
**(b).** Slow a 96/192 kHz capture down an octave or two and recorded ultrasonic content descends into audibility as real material: the audio version of high-frame-rate slow motion. (a) is false for human ears; (c) is false, since platforms accept standard rates and transcode regardless; (d) confuses sample rate with bit depth. The other honest case, alias-free distortion processing, is mostly handled by plugins that oversample internally.
Q9. Halving your buffer size will generally:
- (a) Lower monitoring latency and increase the risk of clicks and dropouts
- (b) Lower latency with no cost
- (c) Raise latency but improve stability
- (d) Change the sound of your plugins
Answer
**(a).** The tradeoff in its purest form: smaller batches mean the signal waits less (lower latency), but the CPU must meet deadlines twice as often (fragility, crackles under load). That's why the workflow is "track small, mix big": no single setting wins both jobs.
Q10. A vocalist complains they hear themselves "doubled" in their headphones while tracking through the DAW at a 1024-sample buffer. The cause is:
- (a) The microphone's polar pattern
- (b) Their voice arriving instantly via bone conduction and again ~20+ ms later through the monitoring path
- (c) Quantization distortion
- (d) The headphone cable picking up interference
Answer
**(b).** A 1024-sample buffer at 48 kHz is ~21 ms each way (1024 ÷ 48,000 ≈ 21.3 ms) before converter and driver overhead. That is far past the ~10 ms region where performers feel delay. The singer hears their own body's instant conduction *plus* the late electronic copy: a flam. Fixes: drop the buffer for tracking, lighten the session, or use the interface's direct monitoring path.
Q11. Which formats restore the original audio bit-for-bit after compression?
- (a) MP3 and AAC
- (b) FLAC and ALAC
- (c) Ogg Vorbis and Opus
- (d) None: all compression loses something
Answer
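Lossless coding can be illustrated with a toy order-1 predictor in Python. It stores the first sample, then each sample's difference from the previous one, and rebuilds the original with a running sum. This is only loosely in the spirit of FLAC, which also uses higher-order predictors and Rice-codes the residual. The function names and test tone are illustrative.

```python
import math


def encode(samples):
    """Order-1 prediction: keep the first sample, then sample-to-sample differences."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]


def decode(residuals):
    """Undo the prediction with a running sum; pure integer math, so nothing is lost."""
    out = []
    total = 0
    for r in residuals:
        total += r
        out.append(total)
    return out


# A 16-bit-style 440 Hz tone at 48 kHz, as integers (illustrative test signal).
pcm = [round(20_000 * math.sin(2 * math.pi * 440 * n / 48_000)) for n in range(4_800)]
residual = encode(pcm)

assert decode(residual) == pcm  # bit-for-bit identical
print("largest sample:  ", max(abs(s) for s in pcm))
print("largest residual:", max(abs(r) for r in residual))
```

The residuals are much smaller numbers than the samples, so they need fewer bits to store. Because every step is integer arithmetic, decoding returns exactly the original list.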
**(b).** FLAC and ALAC are *lossless*: zip-logic for audio, roughly half the size, mathematically guaranteed identical on decode. Everything in (a) and (c) is lossy: perceptual approximation, with discarded detail gone forever regardless of bitrate.
Q12. A lossy codec decides what to discard based on:
- (a) Removing the quietest 50% of samples
- (b) A model of human hearing, spending bits where you'll notice and starving detail that louder nearby content hides
- (c) Removing all frequencies above 16 kHz
- (d) Reducing bit depth to 8 bits
Answer
**(b).** Lossy encoding is applied psychoacoustics. It bets, region by region and moment by moment, on what a human will actually perceive, exploiting masking (louder sounds hiding quieter neighbors). When the bet wins, you get transparency at a fraction of the size; when it loses, you get swishy cymbals, papery applause and smeared tails.
Q13. The 16-bit format's quantization noise floor sits approximately:
- (a) 66 dB below full scale
- (b) 96 dB below full scale
- (c) 144 dB below full scale
- (d) 20 dB below full scale
Answer
**(b).** ~6 dB per bit × 16 ≈ 96 dB. Play peaks at a loud 96 dB SPL and that floor sits near 0 dB SPL: the threshold of hearing, far beneath any real room's ambient noise. This is the arithmetic behind the chapter title's "(usually) enough."
Q14. This book defaults to 48 kHz rather than 44.1 kHz because:
- (a) 48 kHz sounds audibly clearer
- (b) 44.1 kHz can't capture the full range of hearing
- (c) 48 kHz is the video world's native rate, easing every future collision with picture, at trivial extra cost
- (d) Streaming platforms reject 44.1 kHz files
Answer
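The ~9% figure is just the ratio of the two rates, because uncompressed PCM size scales linearly with sample rate. Here is a small Python sketch for a stereo 24-bit recording. That format is an assumed example, file headers are ignored, and the function name is illustrative.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM data for one minute (headers ignored): rate x bytes/sample x channels x 60 s."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60


size_441 = pcm_bytes_per_minute(44_100, 24, 2)
size_48 = pcm_bytes_per_minute(48_000, 24, 2)
print(f"44.1 kHz: {size_441 / 1e6:.1f} MB/min")                # 15.9 MB/min
print(f"48 kHz:   {size_48 / 1e6:.1f} MB/min")                 # 17.3 MB/min
print(f"extra disk at 48 kHz: {size_48 / size_441 - 1:.1%}")   # 8.8%
```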
**(c).** Pragmatism, not physics. Both rates clear human hearing completely (so (a) and (b) are false), and platforms accept both (so (d) is false). 48 kHz buys compatibility with film/TV/YouTube pipelines and a little extra filter elbow room for ~9% more disk (48,000 ÷ 44,100 ≈ 1.09). The audible difference between the two is the easiest blind test you'll ever fail.
Q15. The "digital audio is stairsteps" picture is wrong because:
- (a) Modern screens draw curves, not steps
- (b) The reconstruction filter outputs the single smooth band-limited wave the samples define; steps never exist in the signal path
- (c) Stairsteps are only a problem below 16-bit
- (d) The steps are real but too small to hear
Answer
**(b).** This is the chapter's threshold concept. For a band-limited signal, the samples specify exactly one wave. Reconstruction rebuilds *that* wave, smooth and stepless, which you can verify on an analog oscilloscope. The steps you see are a drawing convenience. Note that (d) is the seductive wrong answer: it concedes the false premise. The steps aren't inaudibly small; they're *absent*.
Section 2: True/False with Justification (2 points: 1 for the call, 1 for the why)
Q16. Recording at 24-bit captures a wider frequency range than 16-bit.
Answer
**False.** Frequency range is set by sample rate (Nyquist = half of it); bit depth sets the distance to the noise floor. A 16-bit and a 24-bit recording at the same sample rate capture the identical band. The 24-bit one carries its hiss ~48 dB deeper, which is a dynamic-range story, not a frequency story.
Q17. A 21 kHz tone is captured accurately by a 44.1 kHz session.
Answer
**True.** 21,000 Hz sits below the 22,050 Hz Nyquist ceiling, so it lies inside the honestly captured band and, per the sampling theorem, is captured completely, not approximately. Whether any adult in the room can *hear* it is a separate question (most can't), but the math captures it without complaint.
Q18. If a take clipped at the converter during recording, pulling the clip's gain down afterward restores the audio.
Answer
**False.** Clipping at capture means the over-ceiling values were never measured. The file contains flat tops, and lowering them yields *quieter flat tops*. The information doesn't exist anywhere. Repair tools can synthesize plausible guesses over brief overs, but that's reconstruction, not recovery. Prevention (sane tracking levels) is the only real cure.
Q19. Dither should be applied on every bounce, at every bit depth, as a best practice.
Answer
**False.** Dither belongs at exactly one moment: the final reduction to a lower bit depth (in practice, the 16-bit delivery bounce). Bounces that stay at 24-bit don't need it, and stacking dither across repeated bounces only accumulates noise. Once, at the end, via the checkbox; then stop thinking about it.
Q20. At the same buffer size in samples, a 96 kHz session has lower buffer latency than a 48 kHz one.
Answer
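Buffer time is buffer length divided by sample rate. This Python sketch covers a single buffer only. Real round-trip latency adds the input and output buffers plus converter and driver overhead, and those vary by interface. The function name is illustrative.

```python
def buffer_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time one buffer of samples takes to play out, in milliseconds."""
    return 1000 * buffer_samples / sample_rate_hz


for rate in (48_000, 96_000):
    print(f"128 samples @ {rate} Hz: {buffer_ms(128, rate):.1f} ms")  # 2.7, then 1.3

# The same formula gives Q10's tracking problem: one 1024-sample buffer at 48 kHz.
print(f"1024 samples @ 48000 Hz: {buffer_ms(1024, 48_000):.1f} ms")  # 21.3
```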
**True**, with a catch worth the second point. A 128-sample buffer lasts 2.7 ms at 48 kHz (128 ÷ 48,000) but only 1.3 ms at 96 kHz (128 ÷ 96,000), since samples tick by twice as fast. The catch: the CPU also works roughly twice as hard at 96 kHz, which often forces a *larger* buffer and hands the dividend straight back. True in arithmetic, usually a wash in practice.
Section 3: Short Answer (3 points each)
Q21. A skeptic shows you a zoomed-in DAW waveform: "See? Steps. Digital is an approximation." In two or three sentences, give the honest correction.
Answer
Strong answers include:
(1) the steps are how the *screen* draws between dots, not what the converter outputs;
(2) for a band-limited signal the samples define exactly one smooth wave, and the reconstruction filter rebuilds precisely that wave, so the output, viewed on analog test gear, is stepless;
(3) therefore higher rates don't "smooth the steps"; they only widen the captured band beyond hearing.
Bonus credit for noting that early digital's harshness was a converter-hardware problem, not a sampling-math problem.
Q22. Why must the anti-aliasing filter act before analog-to-digital conversion, and what specifically goes wrong if it's absent?
Answer
Aliasing happens at the instant of measurement. Content above Nyquist gets recorded *as* false lower frequencies, folded mirror-style back into the band. Once written, an alias is bit-for-bit indistinguishable from a legitimate tone at the same frequency, so no later filter can target it. Without the filter, ultrasonic content (synth harmonics, mic transients, interference) lands inside the music as inharmonic garbage, permanently fused to the recording. It also moves the wrong way: as an ultrasonic partial rises toward the sample rate, its alias falls.
Q23. Describe the two-tier export doctrine, including where dither enters and why you never encode an MP3 from another MP3.
Answer
Tier one: bounce a single archive master (WAV/AIFF, 24-bit, at session rate, peaks safely below 0 dBFS). Skip dither here, since the bit depth is unchanged. Tier two: derive every deliverable from that master. Dither exactly once, when going down to 16-bit. Encode lossy formats fresh from the lossless file, with about a dB of peak headroom for encoder overshoot. Encoding lossy from lossy runs a perceptual model's bets again on material that is already degraded, so artifacts compound generation by generation. That's the "underwater cymbals" sound of a copy of a copy.
Q24. Your bandmate tracks everything with peaks at -1 dBFS "to use all the bits." Give the two-part correction: why the practice is dangerous, and why the supposed benefit is imaginary at 24-bit.
Answer
Danger: -1 dBFS leaves no margin for the performance to grow. One excited downbeat and the take clips at capture, which is permanent. Imaginary benefit: at 24-bit the noise floor sits so far down (~144 dB theoretical, 115+ dB on real converters) that peaks at -18 to -10 dBFS still float the signal enormously above any format hiss. Recording hotter adds zero audible fidelity. Headroom is cheap insurance bought with bits you'd never have heard anyway. ([Chapter 21](../../part-05-mixing-foundations/chapter-21-gain-staging/index.md) turns this into a full workflow.)
Section 4: Applied Scenario (8 points)
Q25. Your friend's indie band booked a weekend to track their EP in your home setup: drums, bass, two guitars, and a singer with a habit of getting louder when the take is going well. The EP will go to streaming; the label-ish person involved mutters about "maybe a vinyl run" and "definitely some video content." Specify your plan — session sample rate and bit depth, tracking levels, buffer strategy for tracking vs. mixing, and the export/delivery chain when the songs are done — justifying each choice against its tradeoffs. Then name the one setting in this plan that no listener will ever hear, and the one mistake in this domain that every listener would hear.
Answer
A strong plan:
- **Session format: 48 kHz / 24-bit.** 48 kHz because "definitely some video content" settles the rate debate on compatibility grounds (vinyl and streaming are happy either way). 24-bit because a weekend of unrepeatable performances deserves the deep noise floor that makes conservative levels free.
- **Tracking levels:** peaks around -18 to -10 dBFS on every source, with extra margin on the singer *because the brief says they get louder when the take is going well*. Never clip a take you love: the floor is 100+ dB down, and the ceiling is one excited chorus away.
- **Buffers:** 64–128 samples while tracking, because performers monitoring through the system need single-digit-ish ms. Keep the session lean, freeze tracks carrying non-critical plugins, and lean on direct monitoring if the machine complains. Move to 512–1024 for editing and mixing, when latency stops mattering and plugin counts grow.
- **Exports:** one archive master per song (24-bit WAV at 48 kHz, headroom intact, no dither). Derive every deliverable from it:
  - a lossless upload to the distributor (the platforms transcode regardless);
  - dithered 16-bit WAVs, only where a deliverable demands them;
  - MP3s for the band group chat, encoded fresh with ~1 dB of peak headroom.
  Archive the masters in two places, because one copy is a decision to lose it.
Full credit requires both closers. The setting nobody will hear is the *sample rate*: 44.1 vs 48 vs 96 is a workflow choice, not an audible one. The mistake everyone would hear is *clipping at capture*, the only error in this chapter with no undo.
Scoring
| Section | Points available |
|---|---|
| Multiple choice (Q1–Q15) | 15 |
| True/False + justification (Q16–Q20) | 10 |
| Short answer (Q21–Q24) | 12 |
| Applied scenario (Q25) | 8 |
| Total | 45 |

| Score | Verdict |
|---|---|
| 40–45 | Threshold crossed — you can referee the forum wars now. On to Chapter 3. |
| 32–39 | Solid. Revisit whichever section bled points — usually it's aliasing arithmetic or the dither rules. |
| 22–31 | The concepts are half-landed. Re-read the threshold block and the bit-depth section, then redo Part C of the exercises — the DAW makes these ideas physical. |
| < 22 | No shame: this chapter unteaches a lifetime of marketing. Watch the Xiph video in Further Reading, redo the chapter with the Fast Track route, retake in two days. |
Spaced-review note: expect this chapter's ideas to resurface in later quizzes — dBFS and headroom especially. They're load-bearing.