Chapter 2 Exercises

Concepts you can recite are worth a little. Concepts you've tested with your own ears and your own DAW are worth everything β€” that's the difference between knowing the Nyquist frequency and never again losing an evening to a forum thread. Work Part A on paper, Part B with good headphones and honest skepticism, Part C inside your actual project from this chapter's checkpoint, and Parts D and M when you want to find out whether the ideas connected.

Difficulty: β˜… = warm-up Β· β˜…β˜… = solid work Β· β˜…β˜…β˜… = stretch. Times are honest estimates for doing the thing properly, not skimming it.

Selected answers (marked β—†) appear in the back matter under Answers to Selected Exercises.


Part A: Conceptual Understanding

A1. The analogy autopsy β—† (β˜…, 10 min). Explain sampling to a non-musician friend using the film-frames analogy β€” then, in a second paragraph, explain precisely where the analogy fails. Your second paragraph should use the phrase "band-limited" correctly and should account for why a movie genuinely loses the bullet between frames while a converter genuinely loses nothing below Nyquist.

A2. Fold-over arithmetic β—† (β˜…, 10 min). Your session runs at 48 kHz. For each rogue frequency below, state whether it's captured honestly or aliased, and if aliased, calculate where it lands: (a) 18,000 Hz; (b) 27,500 Hz; (c) 31,000 Hz; (d) 45,000 Hz. Then answer the kicker: which of your four results would be the most audible contamination for a typical adult listener, and why does "most audible" not mean "largest number"?

A3. The bit-depth rebuttal (β˜…β˜…, 15 min). A bandmate insists: "24-bit recordings capture more detail in the sound β€” finer resolution means a more accurate waveform, like more megapixels." Write a 150-word reply that (a) concedes what's technically true in their statement, (b) corrects what's wrong using the phrase "noise floor," and (c) ends by agreeing that you should still track at 24-bit β€” for the right reason.

A4. Two rulers β—† (β˜…, 10 min). A file's loudest peak hits -6 dBFS. Your friend asks, "So how loud is that in dB SPL?" Explain why the question has no answer as asked, what third piece of information would make it answerable, and which direction each scale is anchored from. Bonus: name one real-world confusion that arises when someone treats the two scales as interchangeable.

A5. Dither, in your own words (β˜…β˜…, 10 min). Without looking back at the chapter: when is dither applied, what problem does it solve, what does it cost, and what's the one-sentence rule for when not to apply it? Then check yourself against the chapter's "one honest paragraph" and note anything you got wrong β€” the gaps are the studying.

A6. The insurance argument β—† (β˜…β˜…, 10 min). Your drummer records at 16-bit "because CDs are 16-bit and they sound fine." Using the numbers from the chapter (β‰ˆ6 dB per bit, β‰ˆ96 dB vs β‰ˆ144 dB theoretical), explain why delivery at 16-bit and tracking at 16-bit are entirely different bets. Your answer must include what happens to a take recorded with peaks at -20 dBFS when it later gets boosted, compressed, and summed with 29 other tracks.

A7. Latency ledger (β˜…β˜…, 15 min). Compute the one-way buffer time for: (a) 64 samples at 48 kHz; (b) 256 samples at 48 kHz; (c) 256 samples at 96 kHz. Then list at least three other contributors to real-world round-trip latency that mean your answer to (b) understates what a vocalist actually feels in their headphones. Finally: using the ~1 foot per millisecond rule, express (b)'s round trip (double it, roughly) as a distance β€” would a guitarist standing that far from their amp complain?

A8. Format triage (β˜…, 10 min). For each scenario, choose WAV, FLAC, or MP3 β€” and defend it in one sentence: (a) the master you're archiving for ten years; (b) the rough mix you're texting the band from a parking lot; (c) the individual track files you're sending to the person mixing your song; (d) the live-show recording a fan wants; (e) the a cappella you're about to chop into a beat.


Part B: 🎧 Listening Lab

Use the best headphones you own, a quiet room, and modest volume. The goal of every exercise here is the same: replace received opinions with your own verified perceptions. Write what you hear in the listening journal you started in Chapter 1 β€” band vocabulary, not vibes.

B1. The cymbal safari β—† (β˜…β˜…, 25 min). Reference: Steely Dan β€” "Aja" (the Steve Gadd drum sections are a cymbal showcase recorded with obsessive care). Alternate: any jazz recording with prominent ride cymbal, or Daft Punk β€” "Get Lucky" for endless crisp hi-hats. Procedure: encode yourself a 128 kbps MP3 of a lossless source (rip a CD, buy a FLAC/WAV download, or bounce from your DAW β€” not a stream, which is already lossy). Listen to 30 seconds of cymbal-heavy material three times: lossless first, MP3 second, then lossless again. Journal: what specifically changes in the 8–16 kHz region? Common findings: "watery," "swishy," "the decay flutters," "the air gets gray." If you hear no difference, lower the bitrate to 96 or 64 kbps and find the threshold where you do. That threshold is data about your ears and your playback chain β€” write it down.

B2. The mother of MP3 (β˜…β˜…, 20 min). Reference: Suzanne Vega β€” "Tom's Diner" (the a cappella original β€” the track Karlheinz Brandenburg used to torture early MP3 encoders into adequacy). Alternate: any solo, dry, unaccompanied voice. Make a 64 kbps encode and compare against lossless. Solo voice is where human ears have maximum expertise β€” list three specific things the codec damages (sibilance texture? the room around the voice? the consonants?). Then reflect: why would a lone dry voice be harder for a perceptual codec than a full band? (Hint: what's missing that codecs normally hide behind?)

B3. The applause test β—† (β˜…β˜…, 15 min). Reference: Donny Hathaway β€” Live (1972) β€” the crowd on "You've Got a Friend" is famously part of the record's magic. Alternate: any live album's applause, or crowd noise from a concert film. Compare lossless vs 128 kbps on the applause alone. Applause β€” thousands of uncorrelated transients β€” is codec kryptonite. Describe the failure in your own words. (Listeners report "papery," "rustling plastic," "a rainstorm becomes a hiss.")

B4. The honesty test: 44.1 vs 48 (β˜…β˜…β˜…, 30 min). Bounce the same mix (any session, even a loop) twice from your DAW: once at 44.1 kHz, once at 48 kHz, both 24-bit WAV. Level-match them, label them A and B, shuffle the files (or have a friend rename them), and try to identify which is which across ten trials. Score honestly. Most people β€” including most professionals β€” score around chance, and that is the lesson: you've now personally verified that the sample-rate war is fought over something inaudible, which frees you to spend your attention where it pays. If you reliably beat chance, document your playback chain and method; you've either found a converter quirk, a resampling artifact, or a flawed test β€” finding out which is a better education than the exercise itself.

B5. The noise-floor hunt (β˜…β˜…, 20 min). Find the quietest musical moment you can: the final fade of a ballad, the silence before a hidden track, a solo-piano decay β€” Bill Evans Trio β€” "Peace Piece" works beautifully (alternate: any sparse piano or ambient track). At a safe but generous volume, listen into the fade. What you'll hear under the music is almost never 16-bit quantization (that's ~96 dB down) β€” it's room tone from the studio, analog tape hiss on older recordings, preamp noise. Journal what you find on three different recordings from three different decades. Lesson: the format's floor is almost never the actual floor β€” the recording chain's is.

B6. The double-encode hunt (β˜…β˜…, 15 min). Pick one familiar, hi-hat-busy track β€” Michael Jackson β€” "Billie Jean" is ideal (alternate: anything with a steady, exposed hi-hat pattern). Listen three ways at matched volume: (1) wired headphones from a lossless or high-quality source; (2) the same source over Bluetooth to a speaker or earbuds; (3) the track inside a video someone uploaded to a social platform, over that same Bluetooth link. Each hop adds a lossy encode on top of the last. Journal what survives each hop and what erodes β€” most listeners find the hi-hat texture and the stereo air degrade first while the bass line sails through untouched. The takeaway is practical, not snobbish: this chain is how most people will hear your music, which is an argument for clean masters with headroom (each re-encode amplifies existing damage), not an argument for despair.

B7. Degradation as aesthetic (β˜…, 15 min). Stream any lo-fi hip-hop playlist (or reference 9th Wonder- or J Dilla-school boom-bap built on vintage samplers). The wobble, crunch, and grit you're hearing includes deliberate bit-depth and sample-rate reduction β€” the SP-1200 sound the chapter described. Identify, in two or three tracks: where is the degradation (whole mix, or one element)? Does it read as "broken" or as "warm" β€” and what does that tell you about the difference between an accident and a decision? There are no rules, only consequences; here, the consequence is the genre.


Part C: πŸŽ›οΈ DAW Lab

Do these inside the project you created in this chapter's checkpoint. Every DAW names these controls slightly differently; Appendix E translates.

C1. The settings audit β—† (β˜…, 10 min). Open your project settings and record, in your Notes/ folder: current sample rate, bit depth, buffer size, and where each control lives (menu path). Confirm the project matches your checkpoint decision (48/24 or your documented alternative). Then find your DAW's bounce/export dialog and screenshot or note where sample rate, bit depth, file format, and dither hide. Four seconds of future panic, deleted.

C2. The export ladder β—† (β˜…β˜…, 25 min). Take any 30-second piece of music you've made (or a drum loop with cymbals). Export four versions: 24-bit WAV, 320 kbps MP3, 128 kbps MP3, 64 kbps MP3. Import all four into a fresh session on separate tracks, line them up, match their levels by ear, and loop-audition with blind switching (close your eyes, mash solo buttons, guess, check). Journal: at which rung does the ladder become audible to you, on your gear? That rung is your personal floor for demo distribution β€” and your proof that 320 is not the crime the internet says it is.

C3. Measure your latency (β˜…β˜…, 20 min). Set your buffer to 1024. Create a click track. Play it through speakers and record it back through your mic onto a new track (or loop an output cable back to an input if you have one). Zoom way in and measure the offset, in milliseconds, between the original click and the recorded one β€” your DAW's ruler can display time in ms or samples (samples Γ· sample rate Γ— 1000 = ms). Repeat at 128. Journal both numbers. You've just measured your actual round-trip latency, which no spec sheet would have told you β€” and you now know what your DAW's latency compensation is quietly correcting for.

C4. The deliberate clip β—† (β˜…β˜…, 20 min). Take a drum loop. Raise its clip gain (or a utility/gain plugin) until the channel slams +6 over and render it that way to a new file at 24-bit fixed point (bounce it β€” DAWs differ in where the clip actually bakes in; the render is the point). Zoom into the rendered waveform: find the sheared-flat tops. Listen at low volume, A/B against the original level-matched. Describe the damage in Chapter 1 vocabulary: where did the new energy appear in the spectrum, and what happened to the transients? Keep the clipped file β€” when you reach the saturation chapter much later, you'll repeat this test against gentler ways of doing the same violence.

C5. The floor ladder (β˜…β˜…, 20 min). Arm a track with your mic plugged in and gain set normally β€” then record 30 seconds of nothing (sit still, room quiet) at 24-bit. Normalize the recording upward until you can clearly hear the floor, noting how many dB of boost that took. Listen: what IS your floor? (Hiss = preamp/interface; rumble and fridge = your room; both = welcome to recording.) Now do the math from the chapter: would a 16-bit format floor (-96 dBFS) have been above or below what you actually recorded? For nearly everyone, the honest answer reframes the entire bit-depth debate: your room is your floor.

C6. Aliasing safari (β˜…β˜…β˜…, 25 min). Create a sine wave around 5 kHz (test oscillator plugin, or a synth with all harmonics off). Put a distortion or saturation plugin on it β€” the cheapest, crunchiest one you own β€” and put a spectrum analyzer after it. Sweep the sine slowly upward toward 10–15 kHz and watch the analyzer: legitimate harmonics march upward in formation; aliases appear as partials sliding downward against the sweep β€” wrong-way traffic. If your plugin has an oversampling/HQ switch, flip it and watch the wrong-way traffic thin out or vanish. You've now seen the failure the chapter described, and you know what that HQ switch is for.

C7. The dither audition (β˜…β˜…β˜…, 20 min). Take a track with a long reverb or piano decay. Bounce its final two seconds twice at 16-bit: once with dither off (truncation), once with dither on. Import both, boost each bounce massively (+30 dB or more β€” careful with your monitor level), and audition the very tail of the decay. Truncation: a faint crackling grit that follows the music. Dither: smooth hiss. At normal volume you may hear no difference at all β€” which is the honest second half of the lesson: dither matters, at the margins, on fades, and it costs one checkbox. Tick it at final 16-bit bounces and move on forever.


Part D: Synthesis & Critical Thinking

D1. The forum reply (β˜…β˜…, 25 min). Someone posts: "Recording at 44.1k throws away musical information that 192k preserves. Pros use high sample rates for a reason. Do it right or don't bother." Write a 250-word reply that is technically airtight and socially generous: correct the stairstep assumption, cite Nyquist honestly, concede the two genuine high-rate use cases, and close with what actually deserves the poster's anxiety. No sneering β€” the goal is the reply you wish someone had written for Aisha.

D2. The archive dilemma β—† (β˜…β˜…β˜…, 20 min). Your grandmother, a retired jazz singer, agrees to record three songs with you β€” once, next Sunday, at age 91. Choose your session settings and your complete file/backup strategy, and justify every choice. Note the trap: this chapter argued high rates are usually folklore β€” so is 96/24 defensible here, or hypocrisy? (Consider: unknown future uses, stretch/restoration tools, the asymmetry between storage costs and regret. "Usually" is doing real work in this chapter's title.)

D3. Design-a-codec (β˜…β˜…β˜…, 25 min). You must shrink an audio file to one-tenth its size, and you're only allowed to discard information (no clever lossless math). Using what you know about hearing from Chapter 1 and this chapter β€” frequency limits, loud-hides-quiet, where ears are expert vs naive β€” write a priority list: the first five things you'd throw away, in order, and the last thing you'd touch. Then compare your list against what MP3 actually does (further reading has the details). How close did you get?

D4. The 1982 time machine (β˜…β˜…, 15 min). You can send one paragraph to a mastering engineer in 1982 who believes "digital sounds harsh and always will β€” the math loses the music between the samples." Using the threshold concept and the converter sidebar, write the paragraph: what's actually causing what they hear, what will fix it, and what should they stop worrying about? History will grade your answer kindly β€” make sure it's the right history.


Part M: Mixed Practice

Interleaving Chapter 1 and Chapter 2 β€” because retrieval across chapters is where the knowledge starts becoming yours.

M1. Octaves meet Nyquist β—† (β˜…, 10 min). A4 = 440 Hz. (a) What frequency is three octaves above it? (b) How many more whole octaves above that result fit beneath a 44.1 kHz session's Nyquist ceiling? (c) In one sentence: why does "ten octaves of hearing" (Chapter 1) pair so neatly with "44.1 kHz is enough" (this chapter)?

M2. The square wave's revenge (β˜…β˜…, 15 min). Chapter 1: a square wave at frequency f contains odd harmonics at 3f, 5f, 7f… A synth plays a 5 kHz square wave in a 44.1 kHz session with no oversampling. (a) List the first four harmonic frequencies. (b) Which of them exceed Nyquist? (c) Calculate where the first offender folds to. (d) Connect to practice: why do harmonically rich, high-pitched sources expose cheap plugins fastest?

M3. Two rulers, one disaster (β˜…β˜…, 10 min). A podcaster records a guest with peaks at -1 dBFS "to be safe," monitors at a quiet 55 dB SPL, and hears no problem. The guest laughs hard once, mid-episode. Narrate, using both rulers correctly, what happens in the file and why the quiet monitoring level hid it. What two-part habit (one from each ruler's world) prevents the rerun?

M4. Transients under fire (β˜…β˜…, 10 min). Chapter 1 called transients the identity of a sound's attack. List the two distinct ways this chapter showed transients getting damaged β€” one from recording too hot, one from lossy encoding β€” and describe how each failure sounds different from the other, in band vocabulary.

M5. Describe the damage (β˜…β˜…, 15 min). Play your C2 export ladder's worst MP3 (64 kbps) and describe its failures using only Chapter 1's frequency-band vocabulary and timbre terms β€” no codec jargon allowed. ("The air band turns granular and swirly; sibilants lose their edge; the low end survives almost untouched…") Why does the damage concentrate where it does β€” and what does that tell you about where the codec believes your ears are weakest?


β—† = selected answer in the back matter. Done with these? Take the quiz β€” closed-book, honest conditions β€” then get back to the project: your session exists now, and Chapter 3 is going to wire it to the world.