Case Study 2: Anatomy of a Password-Dump Disaster — When Weak Hashing Meets a Breach

DataField.Dev

"The breach was bad. The way the passwords were stored is what turned it into a decade-long problem for millions of people." — Incident retrospective, "Folio" social platform (composite, constructed)

Executive Summary

The previous case study built encryption correctly, from the defender's seat. This one does the opposite: it dissects a failure, analytically, the way a responder dissects a breach after the fact — because the fastest way to learn what "use crypto correctly" means is to study what happens when an organization does not. "Folio" is a composite social-media platform, assembled from the well-documented pattern of several real megabreaches, whose password database of roughly 120 million accounts was stolen and later leaked. The intrusion itself was ordinary; what made it a generational disaster was the cryptographic malpractice in how the passwords were stored — choices made years earlier by people who believed "hashing the passwords" was enough. You will reconstruct why the stored hashes fell so fast, quantify the difference the right choices would have made, and trace the downstream damage that good password storage would have contained. This is an analysis-heavy case (contrast Case Study 1's design focus). The organization and all specifics are constructed for teaching (Tier 3); the pattern mirrors real public incidents, which are cited generically rather than by name.

Skills applied: reading a credential-storage scheme for weakness; explaining rainbow tables and offline brute force; quantifying the effect of salting and slow hashing; assessing breach severity from the storage method; tracing credential-stuffing damage across systems; extracting the §4.7 lessons from a real failure pattern.

Background

Folio launched as a fast-growing startup, where "ship it" beat "secure it" and where, like many products of its era, password storage was an afterthought handled by whoever set up the user table. The original choice was the one that feels responsible to a non-specialist and is in fact dangerous: store MD5(password), with no salt. The reasoning, reconstructed from the retrospective, was exactly the misconception §4.4 warns about: "we don't store plaintext passwords, we hash them, so even if the database leaks the passwords are safe." The premise is true (Folio never stored plaintext) and the conclusion is wrong, because a fast, unsalted hash can be reversed by lookup or guessing at almost no cost.

Years later, an attacker gained access through an unremarkable path (the post-mortem points to a compromised employee credential and lax internal segmentation — the kinds of failures the rest of this book addresses) and exfiltrated the user table: usernames, email addresses, and the unsalted MD5 password hashes for about 120 million accounts. For a while nobody outside knew. Then the database appeared for sale, and then for free, on breach-trading forums. At that point the storage decision made years earlier determined the blast radius — and it was catastrophic.

This case is not about how the attacker got in (that is Chapters 2, 16, and 24). It is about the question a defender must be able to answer the instant a credential dump surfaces: given how these were stored, how bad is this, and how fast?

The Analysis

Phase 1 — Reading the storage scheme for weakness

The first thing a responder does with a leaked password table is identify the storage scheme, because that single fact sets the entire severity. Folio's hashes looked like this (illustrative, clearly fake values):

   user            password_hash (32 hex chars -> MD5)
   ───────────     ────────────────────────────────────
   a_rivera        5f4dcc3b5aa765d61d8327deb882cf99      <- classic MD5 of "password"
   b_chen          5f4dcc3b5aa765d61d8327deb882cf99      <- IDENTICAL -> same password!
   c_okoye         e10adc3949ba59abbe56e057f20f883e      <- MD5 of "123456"
   d_park          25d55ad283aa400af464c76d713c07ad      <- MD5 of "12345678"

Three damning observations, each tied directly to a chapter concept:

The hashes are MD5 — fast and broken (§4.4). MD5 was designed for speed, and a commodity GPU can compute billions of MD5 hashes per second.
There is no salt. Identical passwords produce identical hashes — visible right there in the data: a_rivera and b_chen share a hash, so they share a password, and an attacker learns that before cracking anything. Worse, unsalted hashes are vulnerable to precomputation.
The hashes are directly recognizable. 5f4dcc3b... is the MD5 of password. A responder does not even need to "crack" it; it is a known value in every public rainbow table and lookup site. The plaintext is recoverable by lookup, instantly, for any common password.
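The triage described above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the rows are the fake sample values from the table, and the helper names (`looks_like_raw_md5`, `shared_hashes`) are hypothetical. A 32-hex-character value is only consistent with MD5; it is a heuristic, not proof.

```python
import hashlib
from collections import defaultdict

# Fake rows copied from the illustrative sample table.
rows = [
    ("a_rivera", "5f4dcc3b5aa765d61d8327deb882cf99"),
    ("b_chen",   "5f4dcc3b5aa765d61d8327deb882cf99"),
    ("c_okoye",  "e10adc3949ba59abbe56e057f20f883e"),
    ("d_park",   "25d55ad283aa400af464c76d713c07ad"),
]

def looks_like_raw_md5(value: str) -> bool:
    """True if the value is 32 hex characters (consistent with MD5, not proof)."""
    return len(value) == 32 and all(c in "0123456789abcdef" for c in value.lower())

def shared_hashes(rows):
    """Group users by hash; a group larger than one implies a shared password and no salt."""
    groups = defaultdict(list)
    for user, digest in rows:
        groups[digest].append(user)
    return {digest: users for digest, users in groups.items() if len(users) > 1}

all_md5_shaped = all(looks_like_raw_md5(digest) for _, digest in rows)
duplicates = shared_hashes(rows)
known = hashlib.md5(b"password").hexdigest()  # the well-known digest of "password"

print("all values MD5-shaped:", all_md5_shaped)
print("users sharing a hash:", duplicates)
print("shared hash is MD5('password'):", known in duplicates)
```

Duplicate digests are the quickest tell: with a per-user salt, two users with the same password would not share a stored value.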

🛡️ Defender's Lens: The severity of a credential breach is set by the storage scheme, and you can assess it in seconds: plaintext or unsalted fast hash (MD5/SHA-1) → assume effectively every password is recoverable, treat as total credential compromise; salted slow hash (bcrypt/Argon2 at a real work factor) → strong passwords are likely safe, weak ones fall slowly, you have time. Folio is the worst case. The first responder to see 5f4dcc3b... in the dump knew, before any analysis, that this was a "force a global reset and brace for stuffing" event, not a "monitor and assess" one.

Phase 2 — Why the hashes fell so fast: rainbow tables and offline brute force

To make the failure concrete, reconstruct the two attacks §4.4 named, against Folio's specific scheme.

Precomputation / rainbow tables. Because the hash is unsalted and deterministic, MD5("password") is always 5f4dcc3b..., for every user, on every site, forever. So an attacker precomputes (or simply downloads) a giant table mapping common passwords → their MD5 hashes, once, and then looks up every stolen hash. There is no per-user work. For the tens of millions of Folio users who chose a common password, the plaintext was recovered by table lookup essentially the moment the dump was available — no "cracking" in any meaningful sense.
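A minimal sketch of the lookup idea, assuming a tiny hypothetical wordlist (`COMMON_PASSWORDS`) and the fake digests from the sample table. Strictly speaking this is a plain precomputed lookup table; a true rainbow table stores hash chains to trade computation for space, but the effect against an unsalted hash is the same.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Unsalted MD5, as Folio stored it (shown for analysis, not for use)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical tiny wordlist; real tables cover billions of candidates.
COMMON_PASSWORDS = ["password", "123456", "12345678", "qwerty", "letmein"]

# Built once, then reusable against every unsalted MD5 dump.
lookup = {md5_hex(p): p for p in COMMON_PASSWORDS}

stolen = {
    "a_rivera": "5f4dcc3b5aa765d61d8327deb882cf99",
    "c_okoye": "e10adc3949ba59abbe56e057f20f883e",
    "d_park": "25d55ad283aa400af464c76d713c07ad",
    "e_unknown": "0" * 32,  # made-up digest that is not in the table
}

# One dictionary lookup per stolen hash; no per-user hashing work at all.
recovered = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(recovered)
```

The cost of building `lookup` is paid once and amortized over every user on every site that stores unsalted MD5, which is why the common passwords fall the moment a dump appears.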

Offline brute force / dictionary attack. For passwords not already in a table, the attacker runs an offline guessing attack: take a wordlist (real-world password dumps, dictionary words, common patterns with substitutions like P@ssw0rd), hash each guess with MD5, and compare. Crucially this is offline — the attacker is not hitting Folio's login page where rate-limiting or lockout could intervene; they have the hashes on their own hardware and can guess as fast as the hardware allows. With MD5 at billions of guesses per second on a single GPU rig, the math is brutal:

   OFFLINE GUESSING SPEED (order-of-magnitude, illustrative)

   storage scheme          guesses/sec (1 GPU)     time to try 10 billion guesses against one hash
   ─────────────────       ───────────────────     ───────────────────────────────────────────────
   unsalted MD5            ~ tens of billions      under a second
   unsalted SHA-256        ~ billions              seconds
   bcrypt (work factor)    ~ tens of thousands     days
   Argon2 (memory-hard)    ~ thousands (+ memory)  months

Figure CS4.3 — The same wordlist against different storage schemes. The numbers are illustrative orders of magnitude, but the ratios are real and decisive: moving from unsalted MD5 to a deliberately slow hash cuts the guess rate by a factor of about a million, and a per-user salt forces the attacker to repeat that work for every account instead of testing each guess against the whole dump at once. Across 120 million separately salted accounts, "under a second" becomes far longer than centuries. Folio chose the top row.
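The arithmetic behind a comparison like this can be checked directly. The sketch below assumes round-number guess rates (the `RATES` values are illustrative assumptions, not benchmarks) and computes how long 10 billion guesses take against a single hash, then how the total grows when per-user salts force the work to be repeated for each of 120 million accounts.

```python
GUESSES = 10_000_000_000   # 10 billion candidate passwords
ACCOUNTS = 120_000_000     # size of the Folio dump

# Assumed order-of-magnitude guess rates for one GPU (illustrative only).
RATES = {
    "unsalted MD5": 20e9,
    "unsalted SHA-256": 2e9,
    "bcrypt": 2e4,
    "Argon2": 2e3,
}

def seconds_to_try(guesses: float, rate: float) -> float:
    """Time to hash every guess once at the given rate."""
    return guesses / rate

def human(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    for unit, size in (("years", 31_557_600), ("days", 86_400),
                       ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

for scheme, rate in RATES.items():
    per_hash = seconds_to_try(GUESSES, rate)
    print(f"{scheme:18s} one hash: {human(per_hash)}")

# Unsalted: each guess is hashed once and compared against every stolen hash.
# Salted: the whole wordlist must be re-hashed for every account.
salted_bcrypt_total = seconds_to_try(GUESSES, RATES["bcrypt"]) * ACCOUNTS
print("salted bcrypt, whole dump:", human(salted_bcrypt_total))
```

Under these assumed rates, the slow hash moves a single account from sub-second to days, and the salt is what multiplies that cost by the number of accounts.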

The result: the overwhelming majority of Folio's 120 million passwords were recovered, the common ones instantly by lookup and the long tail within days of focused guessing. No new cryptanalysis was needed; nothing was "hacked" in the cryptographic sense. The hashes simply offered almost no resistance, exactly as §4.4 predicts for a fast, unsalted hash.

Phase 3 — The counterfactual: what correct storage would have changed

The most useful analytical move is the counterfactual — what if Folio had stored passwords correctly? — because it isolates the value of each defensive idea.

Add a per-user salt. Salting would not slow down a single targeted guess, but it would have destroyed the precomputation attack entirely. Rainbow tables become useless: the attacker cannot reuse one precomputed table across users, because each user's hash incorporates a unique salt, so the same password yields different hashes per user. The instant lookups of Phase 2 — the recovery of tens of millions of accounts in moments — simply do not happen. The attacker is forced into per-user brute force.
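To isolate what the salt alone contributes, the sketch below hashes the same password under two random salts. The `salted_md5` helper is hypothetical and deliberately keeps the fast hash, so it is not a recommended scheme; it only shows why a precomputed table stops matching.

```python
import hashlib
import os

def salted_md5(password: str, salt: bytes) -> str:
    # Salt + fast hash: shown ONLY to isolate the effect of the salt.
    # Still far too fast for real password storage.
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

# Two users, same password, each with their own random 16-byte salt.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = salted_md5("password", salt_a)
hash_b = salted_md5("password", salt_b)

# The value every public lookup table contains for "password".
unsalted = hashlib.md5(b"password").hexdigest()

print("users share a stored hash:", hash_a == hash_b)
print("table lookup would match: ", unsalted in (hash_a, hash_b))
```

The salt is stored next to the hash and is not secret. Its job is to make every account a separate guessing problem, not to hide anything from the attacker.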

Use a deliberately slow password hash (bcrypt, or memory-hard Argon2) at a real work factor. Salting forces per-user guessing; the slow algorithm then makes that guessing infeasible. Replace MD5's billions-per-second with bcrypt's tens-of-thousands-per-second (or Argon2's even harsher, memory-bound rate), and the same wordlist that recovered Folio's passwords in days would take years to centuries, long enough that the passwords would be changed, expired, and irrelevant before more than the weakest handful fell.
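A minimal sketch of salted, slow storage using only the Python standard library. It swaps in scrypt (available as `hashlib.scrypt`) because bcrypt and Argon2 need third-party packages; the idea is the same. The cost parameters in `SCRYPT_PARAMS` and the function names are assumptions for illustration, and real values should follow current guidance and be tuned to the deployment's hardware.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt for each user."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            dklen=32, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                               dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

# Usage: store salt and digest in the user table, never the password.
salt, stored = hash_password("Summer2019!")
print(verify_password("Summer2019!", salt, stored))
print(verify_password("summer2019", salt, stored))
```

Each verification now costs real time and memory. That is negligible for one legitimate login and ruinous for an attacker who must repeat it for every guess against every account.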

   COUNTERFACTUAL SEVERITY for Folio's 120M accounts

   actual (unsalted MD5):     ~all passwords recovered (common ones instantly)  -> CATASTROPHE
   +salt only:                no instant lookups; per-user brute force required -> still bad, slower
   +salt +bcrypt/Argon2:      only the weakest passwords fall, slowly           -> CONTAINED

🚪 Threshold Concept: A breach is not a single event with a fixed severity — its severity is set in advance by decisions made long before the attacker arrives. Folio's catastrophe was authored years earlier, in a five-minute choice to call MD5() instead of a password-hashing library. This is Theme 4 (defense in depth assumes each layer fails) turned inward: you assume the database will be stolen, and you design the storage so that theft is survivable. The defender's leverage over a future breach is largest before it happens, in choices exactly this mundane.

Phase 4 — The blast radius: credential stuffing and password reuse

Folio's damage did not stay at Folio, and understanding why is essential to grasping why password storage matters beyond one company. People reuse passwords. A person whose recovered@example.com / Summer2019! fell out of the Folio dump very likely used the same pair on their email, their bank, and their employer's VPN. Attackers know this, so the moment a dump like Folio's circulates, automated credential stuffing begins: the attacker takes the recovered email/password pairs and replays them, at scale, against hundreds of other services, logging in wherever the victim reused the password.

   THE BLAST RADIUS OF ONE BAD HASH DECISION

   Folio breach (unsalted MD5) ──► 120M email+password pairs recovered
                                          │
                                          ▼
        ┌──────────────── credential stuffing (automated replay) ───────────────┐
        ▼                    ▼                     ▼                     ▼
   victims' email      online banking        corporate VPNs        other social
   accounts            (account takeover)    (initial access!)     accounts
        │                    │                     │                     │
        └─ account takeover, fraud, and a foothold into *other* organizations ─┘

Figure CS4.4 — Weak password storage at one company becomes initial access at many others through password reuse and credential stuffing. The hash decision's consequences are not contained to the breached organization.

This is why, for a defending organization like Meridian, other companies' breaches are a direct threat. The recovered Folio passwords feed the credential-stuffing botnets that hammer Meridian's online-banking login (a threat we met in Chapter 1's risk register and will defend against directly in Chapter 16, with breached-password checking and phishing-resistant MFA). One organization's cryptographic malpractice becomes the entire internet's attack inventory.

🔗 Connection: The defenses that contain this blast radius are the subject of Chapter 16 (authentication): checking new passwords against known-breached lists (so a stuffed password is rejected), and — decisively — phishing-resistant MFA, which means a stolen password alone (from Folio or anywhere) is not enough to log in. Recall Chapter 1: the loan officer's password was captured and the attacker still failed, because a second factor stood in the way. Folio shows why that design matters at internet scale: passwords leak constantly, so a system that relies on the password alone is relying on a secret that is, somewhere, already for sale.
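The breached-password check mentioned above can be sketched as a set lookup at registration or password-change time. The corpus here (`BREACHED_SHA1`) is a hypothetical four-entry stand-in for a large breach list, and the function names are assumptions. SHA-1 serves only as a lookup key for comparing against the list, not as a storage scheme.

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Lookup key only: this digest is compared against a breach list, never stored per user."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

# Hypothetical local corpus; a real deployment would load a large breach list.
BREACHED_SHA1 = {sha1_hex(p) for p in
                 ("password", "123456", "12345678", "Summer2019!")}

def is_breached(candidate: str, corpus=BREACHED_SHA1) -> bool:
    """True if the candidate password appears in the known-breached corpus."""
    return sha1_hex(candidate) in corpus

def accept_new_password(candidate: str) -> bool:
    """Reject any new password that is already circulating in dumps."""
    return not is_breached(candidate)

print(accept_new_password("Summer2019!"))
print(accept_new_password("correct horse battery staple 42"))
```

This check narrows the stuffing window but does not close it: a password can leak after it was accepted, which is why the second factor remains the decisive control.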

Phase 5 — Extracting the §4.7 lessons

Map Folio cleanly onto the failure catalog from §4.7, because that is how you convert a war story into something that changes your own systems:

   §4.7 failure                     How Folio committed it                       The correct practice
   ──────────────────────────────   ──────────────────────────────────────────   ───────────────────────────────────────────────────
   Weak/deprecated algorithm        Used MD5, broken and fast                    SHA-2/3 for integrity; never MD5/SHA-1 for security
   (Password-specific) wrong tool   Used a fast general hash for passwords       Use a deliberately slow password hash: Argon2 or scrypt (memory-hard), or bcrypt
   Missing defense                  No salt → precomputation worked              Per-user random salt, always
   "We hashed it, we're fine"       Believed hashing alone protected passwords   Hashing ≠ safe storage; salted + slow is the bar
   Blast radius ignored             Did not anticipate reuse/stuffing            Assume reuse; add breach-checking + MFA (Ch.16)

Notice, one last time, the through-line of this whole chapter: the math never broke. MD5 did exactly what MD5 does. There was no cryptographic breakthrough, no broken cipher, no nation-state mathematics. There was a fast, unsalted hash where a slow, salted one belonged — a usage failure, the kind §4.7 promises accounts for nearly every real crypto disaster. And the fix required no cryptographer: a defender who knew the one rule ("salt, and be slow on purpose") would have prevented the entire generational mess.

🔄 Check Your Understanding: Folio could have migrated its stored hashes to bcrypt without ever seeing users' plaintext passwords — a common remediation. Sketch how: when a hash is stored as fast and unsalted, you cannot recover the password, so how would you upgrade the storage for existing users over time without forcing everyone to reset at once? (Hint: you can hash the existing stored value with a better scheme, and/or upgrade lazily at each user's next successful login. What are the trade-offs of each?)

Discussion Questions

Folio's fatal decision was made in five minutes by someone who believed they were doing the responsible thing ("we hash passwords"). How should an organization prevent well-intentioned crypto mistakes like this? What role do an encryption standard (Case Study 1) and code review play?
Salting and slow hashing each defeat a different attack. Explain which defeats which, and why you need both rather than either alone. Could a slow hash without a salt ever be acceptable?
The case argues that "a breach's severity is set in advance." Do you agree? Name another security decision (besides password storage) whose value is determined long before any incident.
Credential stuffing means Folio's breach harmed Meridian, a completely unrelated company. What does this imply about whether password-only authentication can ever be considered safe in 2026, regardless of how your organization stores passwords?
A vendor tells you their product "securely hashes all passwords with SHA-256." Is that reassuring? What exactly would you ask next before believing the passwords are well protected?

Your Turn

Find (or construct) a description of a real, public credential breach and analyze it the way this case analyzed Folio: (1) identify the storage scheme and state, in one line, how bad it makes the incident and why; (2) explain which of rainbow tables or offline brute force the scheme was vulnerable to; (3) write the counterfactual — what one change to the storage would have most reduced the damage, and roughly how much; and (4) trace one blast-radius consequence beyond the breached company (whom does it harm next, and through what mechanism?). Conclude with a single sentence in the form: "The math did not break; what broke was ______." If you cannot complete that sentence with a usage/operations failure, re-examine the case — genuine algorithm breaks are rare, and you may have missed the real cause.

Key Takeaways

A credential breach's severity is set by the storage scheme, assessable in seconds: plaintext or unsalted fast hash (MD5/SHA-1) → assume total compromise; salted slow hash (Argon2/bcrypt) → largely contained.
Fast, unsalted hashing is barely better than plaintext against a real attacker: rainbow tables recover common passwords by lookup, and offline brute force at billions/second recovers the rest.
Salting defeats precomputation; a deliberately slow algorithm (bcrypt, or memory-hard Argon2/scrypt) defeats brute force. You need both, and the right pairing turns "seconds" into "centuries."
A breach's damage is authored in advance by mundane choices; the defender's leverage is largest before the incident, in decisions as small as which function to call (Theme 4, turned inward).
Weak storage at one company becomes initial access at many others via password reuse and credential stuffing — which is why password-only authentication is unsafe regardless of your own hygiene, and why breach-checking and phishing-resistant MFA (Chapter 16) are the real containment.
The recurring truth of this chapter: the math did not break — the usage did. Almost every real crypto disaster is implementation, configuration, or operation (§4.7), and almost every one was preventable by a defender who knew the rule.