Case Study 1: "bad guy" Under the Five-Pass Method — What Trained Ears Extract
This case study models the chapter's centerpiece skill: the complete five-pass reference listen, written out in full. Read it with the track playing — pause at each pass, listen for yourself first, then read the notes. Where the walkthrough cites production facts, they're documented; everything else is description of what's audible in the released master, which is the whole point: the record is the evidence, and it's available to anyone with ears.
Why This Track
Billie Eilish's "bad guy" is close to a perfect first deconstruction target, for three reasons.
First, the backstory is the book's premise wearing a trophy. The song — and most of the album it anchors, When We All Fall Asleep, Where Do We Go? (2019) — was written and produced by Eilish's brother Finneas and recorded largely in his small bedroom studio in the family's Highland Park home in Los Angeles. No big studio, no expensive live rooms. At the 2020 Grammys, the album took Album of the Year and "bad guy" took Record of the Year — an award that honors the recording: production and engineering included. A bedroom production beat the industry's flagship studios at the industry's own awards. The knowledge gap, not the gear gap, is the story.
Second, it's sparse. Famously, deliberately sparse. Dense mixes hide their decisions under forty layers; "bad guy" makes a handful of elements carry everything, which means a beginner can actually isolate and name each one. You can hear the architecture because there are no curtains.
Third, almost every one of this chapter's concepts shows up in it somewhere: deliberate empty bands, a masking-proof arrangement, depth built from dryness, width spent like money, and dynamics managed by subtraction. It's the chapter with a beat.
Setup, per the method: known song (you've heard it; the planet has), your consistent playback system, moderate volume, journal open. The track runs about three and a quarter minutes at roughly 135 BPM, in G minor. Five plays. One question each.
Pass 1 — Balance: What's Loudest?
Press play and rank what's prominent. Within the first verse, the hierarchy declares itself:
- The vocal. Highest on the totem — but not by shouting. It's a whisper that's enormous, sitting closer to your ear than anything else in the track. Hold that thought for the space pass.
- The bass synth. Nearly co-equal with the vocal, and carrying the entire harmonic story — there's no chord pad, no guitar, no keys behind it for most of the song. The bassline is the music bed.
- The kick. Four-on-the-floor through the verses, tucked slightly under the bass — present as a pulse, not a punch-in-the-chest. It reads as the floor the bass stands on.
- The finger snaps. The backbeat. Clearly quieter than vocal/bass/kick, but never lost — and notice why they don't get lost: nothing else lives in their frequency neighborhood. That's the frequency pass's story arriving early, which happens constantly once your ears wake up — the passes are lenses, not walls.
- Everything else — the staccato synth stabs in the pre-chorus, the "duh" lead synth in the post-chorus hook, occasional vocal harmonies — enters, takes focus while present, and leaves.
The rookie expectation is that a chart-topping pop record is full. The balance pass on this one finds four or five elements at any given moment, ranked with daylight between them. Write the hierarchy down: vocal ≥ bass > kick > snaps > visitors.
Pass 2 — Frequency: Who Owns Each Band?
Walk the six bands, bottom to top. This is the pass where "bad guy" teaches the most.
- Sub (below 60 Hz): Owned — emphatically — by the bass synth's fundamentals, with the kick's lowest weight underneath. On a real playback system this band is loaded; on a phone speaker it vanishes (run the case study's companion exercise, B10, and you'll hear the missing fundamental do its inference trick: the bassline's notes survive, the weight doesn't).
- Lows (60–200 Hz): Kick knock and the bass synth's body. Still busy, still controlled — kick and bass interlock rather than wrestle (how mixes achieve that interlock is a Part V and Part VI story; for now, hear that it's possible).
- Low-mids (200–500 Hz): Here's the secret. Nearly empty. Listen hard at the mud zone and find... air. No strummed guitars depositing body, no piano left hand, no thick pad. The single most congested band in amateur mixes is, in this Grammy-winning mix, a quiet hallway. This is what "expensive-sounding" frequently is: not added sheen, but absent mud. Emptiness as a decision.
- Mids (500 Hz–2 kHz): The vocal's vowels live here with almost no competition, plus the upper harmonics of the bass — which is why the bassline remains a melody on small speakers. This band is why the song survives a phone.
- High-mids (2–6 kHz): The snaps' crack, the vocal's consonants, the staccato pre-chorus synth's edge. Sparse occupancy again — so each of those elements cuts through at low level. (Finneas has said in interviews that the pre-chorus's staccato sound was built from a recording of an Australian pedestrian-crossing signal — the kind of found-sound pragmatism bedroom production rewards. The story goes that the sample was pitched into the track's key; what's audible is a clicky, in-key stab living happily in a band nobody else wanted.)
- Air (6 kHz+): The whisper's breath and sibilance, snap transients' sparkle, and not much else. The track doesn't hiss with excitement; it leaves the top open so the vocal's breathiness reads as intimacy.
Band-map summary for the journal: sub/lows = full and governed; low-mids = deliberately vacant; mids = vocal's private property; high-mids/air = small, sharp, uncontested tenants. Six bands, almost zero contested real estate. Masking has nowhere to happen because the arrangement refused to schedule the fights.
Pass 3 — Space: How Far Away Is Everything?
Depth ranking, nearest to farthest:
- The vocal — closer than close. This is the pass where "bad guy" is radical. The lead vocal is nearly bone dry: no audible reverb tail, full transient detail on every consonant, breath sounds intact, louder than its neighbors. All four proximity cues from the chapter — loud, bright, dry, sharp — pinned at maximum. The result is a voice inside your personal space, which is the record's entire attitude. Intimacy here isn't a lyric; it's an engineering decision you can name cue by cue.
- Bass and kick — close behind, dry, physical.
- Snaps — a touch of short ambience; they sit a step back from the vocal rather than beside it.
- The synth stabs and harmonies — set slightly deeper when they appear, which keeps the foreground uncluttered.
Count the distinct spaces: essentially one very dry foreground with a shallow second shelf — until the moments the production wants you to notice space, when a vocal phrase suddenly blooms with a tail and then dries up again. Space used as an event, not an atmosphere. After the cathedral-soaked mixes of an earlier pop era, the dryness reads as confrontational, modern, expensive.
Pass 4 — Width: Where Is Everything, Left to Right?
Headphones on. The power alley is textbook: vocal, bass, kick dead center — the mono spine. Now the spending pattern:
- Verses: narrow. Almost defiantly mono-ish, with the snaps and small details placed off-center to keep the picture alive.
- Pre-chorus: the staccato stabs start playing with the sides — call-and-answer placement, left then right.
- The post-chorus "duh" hook: the detuned synth lead widens the world. After a verse of center-channel claustrophobia, the hook feels panoramic — not because it's extremely wide by absolute standards, but because the verse spent nothing and the hook spends a lot. Width is relative, and this mix banks it deliberately so the chorus can cash it.
Pointing test results: vocal — dead ahead, no ambiguity. Snaps — off-center, easy to point at. Hook synth — your hands drift outward and stop pointing at all, which is the answer.
Pass 5 — Dynamics: What Moves?
Micro first: transients are alive. The snaps crack, the kick's pulse has a clean front edge, the vocal's consonants tick — nothing sounds smoothed flat, because (the arrangement again) nothing competes for the moments those transients occupy.
Macro: this track manages energy almost entirely by density and subtraction, not volume:
- Verse → pre-chorus: elements leave; the staccato stabs and vocal take over a sparser stage. Tension by removal.
- The bar before the hook: the song's most famous trick — a near-total dropout. "I'm the bad guy..." hangs in dry, mocking silence, and then "duh" — the hook lands on an emptied stage. The drop hits hard because the production dug a hole for it. Contrast is the engine; the dropout is the chapter's masking and balance lessons inverted into a weapon.
- The outro: the song halves itself — tempo feel drops into a darker, slower half-time section, vocals pitched and processed downward, energy deliberately reduced to end on menace instead of a final chorus. A pop song that refuses the obvious big finish, and the restraint is the statement.
Loudest moment versus quietest: the gap is wide and used — the loud moments are loud because the quiet ones are honored.
The Synthesis: The Difference List a Trained Ear Writes
Five passes, one page of notes. Here's what trained ears extract — the transferable decisions, each one checkable against your own work tonight:
- Few elements, ranked with daylight. Four or five things at once, hierarchy unmistakable. Check: can you rank your own track's top five in one listen — and would a stranger's ranking match yours?
- One owner per band. The bass owns the bottom, the vocal owns the mids, small sharp sounds own the top. Nothing contested, so nothing masked, so nothing needed to win a loudness arms race. Check: walk your six bands — name each band's owner. Hesitation = pileup.
- The low-mids are a choice. The mud zone is empty because nothing muddy was invited. The cleanest EQ move is the part you didn't add — arrangement as the first mix. Check: what's living at 200–500 Hz in your track, and did any of it earn the space?
- Proximity is a production value. Dry + bright + loud + detailed = a vocal in the listener's personal space. Intimacy is buildable from nameable cues. Check: rank your elements front-to-back. Do you have a foreground, or a flat wall?
- Width and dynamics are budgets. Spend nothing in the verse so the hook's width and the pre-hook silence land like events. Check: does anything in your track change enough to feel like an event?
None of these five require a single dollar. Every one requires ears that can hear the difference — which is the chapter, and the book, in one sentence.
What to Do With This
Run exercise B1 if you haven't: your own five-pass on this track, then compare against this walkthrough. The gaps between your notes and these aren't failures — they're a map of which passes need reps. Then point the same method at your three references from the Project Checkpoint, where the answers won't be printed on the next page. They never will be again. That's the skill.