Case Study: Auditing a Password Policy by Sizing Its Keyspace

DataField.Dev

Case Study: Auditing a Password Policy by Sizing Its Keyspace

"The enemy knows the system." — Shannon's maxim (Kerckhoffs's principle, restated)

Executive Summary

A security review lands on your desk: the company's authentication service stores hashed passwords, and leadership wants a one-page answer to a deceptively simple question — "Is our password policy strong enough to survive a leaked database?" You will answer it the way a working engineer does: not with opinion, but by counting. You will model the policy's keyspace as a sequence of independent choices, size it with the product rule, account for "complexity requirements" with the subtraction principle and inclusion-exclusion, and convert each count into a worst-case brute-force time. The entire analysis is the four counting rules of this chapter applied to one high-stakes number — and that number, not any hand-waving, is the security argument.

This is an analysis-heavy case study: there is almost no new code, just careful modeling and arithmetic you can do on an envelope. The companion build-heavy study (case-study-02.md) writes a keyspace calculator; here, the skill is reading a policy and turning it into a defensible count.

Skills applied

Modeling a keyspace as a Cartesian product of per-position choices (the product rule, §15.2).
Counting "at least one X" via the subtraction principle — complement counting (§15.6).
Handling two simultaneous complexity rules with inclusion-exclusion (§15.4).
Counting a range of lengths with the sum rule (§15.3).
Converting a keyspace size into a worst-case attack time and a feasibility verdict.

Background

The scenario

The authentication team uses a slow password hash (so each guess is expensive), but assume the worst: the database leaks, and an offline attacker can test guesses at a fixed rate. Two rates frame the whole analysis:

A modest attacker: $10^6$ guesses per second (a single machine against a slow hash).
A serious attacker: $10^{12}$ guesses per second (a GPU/ASIC farm against a fast or poorly configured hash).

Leadership is evaluating three candidate policies:

Policy A. Exactly 8 characters, lowercase letters only (26 symbols).
Policy B. Exactly 10 characters, all 95 printable ASCII characters, no complexity rules.
Policy C. Length 10 to 12, all 95 printable ASCII, and "must contain at least one digit and at least one uppercase letter."

A policy is "strong enough" if even the serious attacker needs, say, more than a few decades in the worst case. (The threshold is a business decision; counting gives the number the decision rests on.)

💡 Intuition: A keyspace is just a finite set — the set of all strings the policy permits. Its size is a cardinality, and every tool from §15.1–§15.6 is a tool for cardinalities. "How strong is the policy?" reduces to "how big is the set, and how long to walk through it?" That reduction is the whole game.

Why this matters

The number you compute here is the difference between a policy you can sign off on and one that exposes the company. The exact same reasoning — how large is the space, and how long to search it? — scales up to cryptographic keys in Part IV: a 56-bit key fell in public in the 1990s, while a 256-bit key ($2^{256} \approx 10^{77}$) is beyond any conceivable brute force. Sizing a keyspace is the first quantitative instrument of security, and you are building the habit on passwords before you apply it to RSA in Chapter 25.

🔗 Connection. The keyspace arithmetic in this study is the direct ancestor of the RSA security argument in Chapters 22-25. There the "keyspace" is the set of possible private keys, and "brute force is infeasible" becomes "factoring a 600-digit number is infeasible." Same sentence, larger numbers.

Phase 1: Size Policy A and translate it to time

Model first. A Policy-A password is a sequence of 8 positions, each an independent choice among 26 lowercase letters. The choice at position 5 does not constrain position 6, so by the product rule (§15.2) the keyspace size is $$\lvert K_A \rvert = 26^8 = 208{,}827{,}064{,}576 \approx 2.09 \times 10^{11}.$$

Now convert to worst-case brute-force time — the time to try every key, which is the relevant figure because the true password could be the last one tried. Time $\approx$ (keyspace size) ÷ (guesses per second):

Attacker	Rate (guesses/s)	Worst-case time	In human terms
Modest	$10^6$	$2.09 \times 10^{11} / 10^6 = 2.09 \times 10^5$ s	about 2.4 days
Serious	$10^{12}$	$2.09 \times 10^{11} / 10^{12} = 0.21$ s	a fraction of a second

The verdict writes itself. Policy A falls to a modest attacker over a weekend and to a serious attacker instantly. Eight lowercase letters is not a password policy; it is a speed bump. Notice we reached that conclusion without listing a single password — the count is the argument.

🔄 Check Your Understanding Recompute Policy A's keyspace if the length were raised from 8 to 10 (still lowercase only). By what factor does the keyspace grow, and is that enough to stop the serious attacker?

Answer

$26^{10} = 26^8 \times 26^2 = 208{,}827{,}064{,}576 \times 676 \approx 1.41 \times 10^{14}$. The keyspace grows by a factor of $26^2 = 676$. Against the serious attacker ($10^{12}$/s) the worst-case time becomes $\approx 141$ s — under three minutes. Two extra lowercase characters help, but nowhere near enough; the alphabet, not just the length, has to grow.

Phase 2: Size Policy B and see the effect of a bigger alphabet

Policy B keeps the length modest (10) but widens the alphabet to all 95 printable ASCII characters. Same model — a sequence of 10 independent per-position choices — so the product rule again: $$\lvert K_B \rvert = 95^{10} \approx 5.99 \times 10^{19}.$$

A useful sanity check is to think in "bits of entropy": $\log_2(95^{10}) = 10 \log_2 95 \approx 10 \times 6.57 = 65.7$ bits. (Each printable ASCII character carries about $\log_2 95 \approx 6.57$ bits; ten of them give roughly 66 bits.) Translating to time:

Attacker	Rate	Worst-case time	In human terms
Modest	$10^6$	$5.99 \times 10^{13}$ s	about 1.9 million years
Serious	$10^{12}$	$5.99 \times 10^{7}$ s	about 1.9 years

Against the modest attacker, Policy B is wildly overkill. Against the serious attacker it holds for roughly two years in the worst case — comfortable, but not the "decades" margin leadership wanted. The single change from Policy A — 26 symbols to 95, and 8 positions to 10 — moved the verdict from "broken in a second" to "good for ~2 years against a farm." That swing is the product rule's leverage: the keyspace size depends on the alphabet size raised to the length, so widening the base is enormously powerful.

💡 Intuition: Length adds to the exponent; alphabet size is the base. Both help, but they help differently. Going from $26^{10}$ to $95^{10}$ multiplies the keyspace by $(95/26)^{10} \approx 3.65^{10} \approx 2.4 \times 10^{5}$ — a quarter-million-fold jump from the alphabet alone. This is why "allow symbols" is such a high-leverage policy lever.

Phase 3: Policy C — complexity rules with the subtraction principle

Policy C adds two real-world wrinkles: a range of lengths (10 to 12) and complexity requirements ("at least one digit and at least one uppercase letter"). Each wrinkle is exactly one of this chapter's rules.

Step 3a: handle the length range with the sum rule

A password has exactly one length, so the cases "length 10," "length 11," and "length 12" are mutually exclusive. By the sum rule (§15.3), the unconstrained count over the range is a sum of products: $$95^{10} + 95^{11} + 95^{12}.$$ But Policy C also imposes complexity rules, so we must subtract the passwords that violate them — and the cleanest way to do that is per length, then sum. Define, for a fixed length $L$:

$U_L = 95^L$: all length-$L$ strings (the universe for that length).
$D_L$: length-$L$ strings with no digit (the "bad: missing a digit" set).
$V_L$: length-$L$ strings with no uppercase letter (the "bad: missing uppercase" set).

The printable ASCII set of 95 characters breaks down as 10 digits, 26 uppercase, 26 lowercase, and 33 symbols/space. So:

Characters that are not digits: $95 - 10 = 85$. Hence $\lvert D_L \rvert = 85^L$.
Characters that are not uppercase: $95 - 26 = 69$. Hence $\lvert V_L \rvert = 69^L$.

Step 3b: combine the two "bad" sets with inclusion-exclusion

A password is bad (rejected by the complexity rule) if it is missing a digit or missing an uppercase letter — that is, if it lies in $D_L \cup V_L$. These two sets overlap (a string can be missing both), so we cannot just add them; we need inclusion-exclusion (§15.4): $$\lvert D_L \cup V_L \rvert = \lvert D_L \rvert + \lvert V_L \rvert - \lvert D_L \cap V_L \rvert.$$ The intersection $D_L \cap V_L$ is the strings with neither a digit nor an uppercase letter, i.e. built only from the $95 - 10 - 26 = 59$ remaining characters (lowercase + symbols + space): $\lvert D_L \cap V_L \rvert = 59^L$. Therefore the count of good (policy-compliant) length-$L$ passwords is, by the subtraction principle, $$\text{good}_L = \lvert U_L \rvert - \lvert D_L \cup V_L \rvert = 95^L - \big(85^L + 69^L - 59^L\big).$$

Step 3c: sum over the three lengths

By the sum rule, the full Policy-C keyspace is $$\lvert K_C \rvert = \sum_{L=10}^{12} \Big(95^L - 85^L - 69^L + 59^L\Big).$$

Let us pin down the dominant length, $L = 12$, by hand-magnitude (the others are smaller and only nudge the total):

$95^{12} \approx 5.404 \times 10^{23}$.
$85^{12} \approx 1.424 \times 10^{23}$.
$69^{12} \approx 1.156 \times 10^{22}$.
$59^{12} \approx 1.367 \times 10^{21}$.
$\text{good}_{12} \approx 5.404\times10^{23} - 1.424\times10^{23} - 0.1156\times10^{23} + 0.01367\times10^{23} \approx 3.86 \times 10^{23}.$

Adding the (smaller) $L=10$ and $L=11$ contributions, the total keyspace is on the order of $\lvert K_C \rvert \approx 3.9 \times 10^{23}$ — call it "a few times $10^{23}$." Even against the serious attacker: $$\text{worst-case time} \approx \frac{3.9 \times 10^{23}}{10^{12}} = 3.9 \times 10^{11}\ \text{s} \approx 12{,}000\ \text{years}.$$ That clears the "decades" bar by a wide margin. Policy C is strong enough.

⚠️ Common Pitfall: the complexity rule shrinks the keyspace. It is tempting to assume that "require a digit and an uppercase" makes the space bigger — it does not. Every constraint removes permitted passwords, so $\lvert K_C \rvert < 95^{10} + 95^{11} + 95^{12}$. The reason to require complexity is not to enlarge the keyspace; it is to forbid the weak, guessable passwords (all lowercase, dictionary words) that attackers try first. Counting tells you the worst case; it does not capture that real attackers do not search uniformly. Keep both facts in mind: the count bounds the brute-force time, and the complexity rule defends against smarter-than-brute-force attacks.

🔄 Check Your Understanding In Step 3b, why did we subtract $\lvert D_L \cap V_L \rvert = 59^L$ rather than treat $D_L$ and $V_L$ as disjoint and just add them?

Answer

Because a single password can be missing both a digit and an uppercase letter at once — for example, an all-lowercase string belongs to $D_L$ and $V_L$. The sets overlap, so the plain sum rule would double-count those strings. Inclusion-exclusion subtracts the overlap ($59^L$ strings using only the 59 lowercase/symbol characters) exactly once to correct it.

Phase 4: A small calculator to confirm the structure

The arithmetic above is the analysis; a few lines of code just let you reproduce any single number and check the hand-estimates. The function structure is the chapter: a product rule for the universe, an inclusion-exclusion for the bad set, a subtraction for the good count, and a sum over the lengths. We print one exactly hand-verifiable number (Policy A's $26^8$, computed in §15.2) and, to confirm the inclusion-exclusion logic itself, run it on a deliberately tiny alphabet whose answer we can check by listing.

def keyspace(alphabet_size, length):
    """Product rule: number of strings of `length` over `alphabet_size` symbols."""
    return alphabet_size ** length

def good_count_one_length(L, total, no_d, no_u, neither):
    """One length: total - (no_d + no_u - neither), i.e. universe - |D ∪ V|."""
    bad = keyspace(no_d, L) + keyspace(no_u, L) - keyspace(neither, L)  # inclusion-exclusion
    return keyspace(total, L) - bad                                      # subtraction principle

print(keyspace(26, 8))                              # Policy A, exact (see §15.2)
# Tiny check: length-2 strings over a 4-symbol alphabet {one digit, one upper, two other},
# requiring >=1 digit and >=1 upper.  total=4, no_digit=3, no_upper=3, neither=2.
print(good_count_one_length(2, total=4, no_d=3, no_u=3, neither=2))
# Expected output:
# 208827064576
# 2

The tiny check is small enough to verify by listing. Over the 4-symbol alphabet $\{d, U, x, y\}$ ($d$ = the digit, $U$ = the uppercase, $x, y$ = the two "other" characters), the length-2 strings that contain at least one digit and at least one uppercase are exactly dU and Ud — 2 of them — and the formula returns $4^2 - (3^2 + 3^2 - 2^2) = 16 - (9 + 9 - 4) = 16 - 14 = 2$. The logic is sound, so the same formula applied to the real numbers ($\text{total}=95$, $\text{no\_d}=85$, $\text{no\_u}=69$, $\text{neither}=59$) gives the magnitudes we estimated in Phase 3 ($\lvert K_C \rvert \approx 3.9 \times 10^{23}$). The code did not create the analysis; the modeling did. It only confirmed the structure on a case we can see by eye.

🐛 Find the Error: A teammate computes Policy C's "bad" count as no_digit + no_upper (forgetting the - neither term) and concludes the keyspace is smaller than it really is — so they report the policy as weaker than it is. Which counting rule did they violate, and does their error make the security verdict too optimistic or too pessimistic?

Answer

They violated inclusion-exclusion: by not adding back the overlap $59^L$, they over-counted the bad set (counted the all-lowercase-type strings twice), so they subtracted too much from the universe and got a keyspace that is too small. A too-small keyspace makes brute force look faster than it is, so their verdict is too pessimistic (they would under-rate a policy that is actually fine). The fix is the - neither term. Erring pessimistically is safer than optimistically, but it is still wrong.

Discussion Questions

The worst-case time assumes the attacker may have to try every key. The average case is half the keyspace. Why do security analysts almost always quote the worst case (or, equivalently, treat "half" as a rounding detail) rather than the average? When would the average matter?
Policy C's keyspace ($\sim 3.9 \times 10^{23}$) is enormous, yet real breaches still succeed against such policies. Given that the count is correct, what assumption in "brute-force time = keyspace ÷ rate" fails against real attackers, and how do password dictionaries exploit it?
We computed $\log_2(95^{10}) \approx 66$ bits for Policy B. Express Policy A and Policy C (use the $L=12$ portion) in approximate bits of entropy. Why is the "bits" scale more convenient than raw counts when comparing policies?
Suppose the hash is deliberately slowed so each guess takes 0.1 seconds (a "work factor"). Recompute the serious attacker's effective rate and Policy A's worst-case time. How does an expensive hash change which policies are "strong enough" — and why is that a different lever from the keyspace size?
In Step 3c we summed three length-cases. If the policy instead allowed any length from 10 to 128, would you still write out all 119 terms? What single term dominates the sum, and why does that let you approximate the whole keyspace by its largest length alone?

Your Turn: Extensions

Option A (a third complexity rule). Add "must contain at least one symbol" to Policy C, so a password is bad if it misses a digit or misses an uppercase or misses a symbol. Set up the count with three-set inclusion-exclusion (singles − pairs + triple) for a single length $L$, work out each of the seven terms as a power (e.g., "no digit and no uppercase" uses how many characters?), and write the formula. This is the natural three-set generalization of Phase 3.
Option B (entropy comparison table). Compute the approximate bits of entropy for Policies A, B, and C (use $\log_2$ of each keyspace) and build a small table mapping "bits" to "worst-case years at $10^{12}$/s." Identify the bit threshold below which a policy is unsafe against the serious attacker.
Option C (the average-case factor). Reproduce Phases 1-2 but report average-case brute-force time (half the keyspace). Confirm that the verdicts (broken / overkill / strong) are unchanged, and write one sentence explaining why a factor of 2 essentially never flips a feasibility conclusion.

Key Takeaways

A keyspace is a finite set; its size is a product. Model the policy as independent per-position choices and the product rule sizes it in one line — no enumeration, ever.
Complexity rules are counting operations. "At least one X" is the subtraction principle (universe − complement); "missing X or missing Y" is inclusion-exclusion; "a range of lengths" is the sum rule. Policy C used all three at once.
The count is the security argument. Worst-case brute-force time ≈ keyspace ÷ guess rate; that single number, not intuition, decides "strong enough."
A constraint shrinks the keyspace but blocks weak passwords. Counting bounds the brute-force worst case; it does not capture dictionary attacks, which is why complexity rules earn their keep despite reducing the raw count.
The same reasoning scales to cryptography. "How big is the space, how long to search it?" is the sentence behind both an 8-character password and a 2048-bit RSA key (Chapters 22-25). You have built the habit on the small case.