Case Study: Auditing a "Textbook RSA" Voting Service

DataField.Dev

Case Study: Auditing a "Textbook RSA" Voting Service

"The enemy knows the system." — Claude Shannon's restatement of Kerckhoffs's principle

Executive Summary

A startup ships an internal tool, BallotBox, that lets employees submit confidential single-choice votes — a satisfaction rating from 1 to 5, or a yes/no on a policy. The engineers, proud of having read this very chapter, encrypt each vote with RSA they wrote themselves: ciphertext is $c = m^{e} \bmod n$, public key published on the company wiki, private key on the tally server. The math is correct — decryption provably inverts encryption (§25.5). And yet, as a security auditor, you will read every employee's "confidential" vote without ever touching the private key and without factoring $n$.

This case study is the §25.6 lesson made painfully concrete: correctness is not security, and a large unfactorable modulus does not save you if the structure leaks. You will mount two attacks the chapter warns about — the deterministic-encryption attack and the small-message / small-exponent attack — confirm each by hand, and then specify the fixes. Nothing here breaks RSA's underlying trapdoor; everything here breaks the protocol built carelessly on top of it.

Skills applied

Reading a deployed cryptosystem under Kerckhoffs's principle (§25.1): assume the attacker has the code and the public key.
Recognizing that textbook RSA is deterministic and turning that into a chosen-plaintext confirmation attack (§25.6).
Exploiting a small message space and a small public exponent with an integer cube root (§25.6).
Reconstructing the RSA round trip by hand with fast modular exponentiation (§25.4, Chapter 23).
Separating a correctness property (a theorem, §25.5) from a security property (a protocol claim, §25.6).

Background

The scenario

BallotBox works like this. The tally server runs RSA key generation once (§25.4) and publishes the public key $(n, e)$ on the wiki. To vote, an employee's browser encodes the choice as a small integer $m$, computes $c = m^{e} \bmod n$, and POSTs $c$. The server stores the list of ciphertexts; after the poll closes it decrypts each with the private exponent $d$ and tallies.

The encoding is the obvious one:

Choice	Encoded $m$
rating "1"	1
rating "2"	2
rating "3"	3
rating "4"	4
rating "5"	5
"NO"	0
"YES"	1

The published key, taken straight from this chapter's worked example (§25.4), is $$(n, e) = (3233, 17), \qquad \text{with private } d = 2753 \text{ on the server only.}$$ You, the auditor, can see the wiki (so you know $n$ and $e$) and you can sniff the network traffic (so you see each posted ciphertext $c$). You do not have $d$, and the engineers correctly point out that factoring 2048-bit moduli is infeasible — they just used a tiny one here for the demo, but the structure of their system would be identical at full size.

🔗 Connection: Everything you are about to do honors Kerckhoffs's principle from §25.1: the only thing you are not given is the private key. The attack must succeed anyway. If it does, the system is broken in the precise sense the chapter means — security cannot rest on the attacker's ignorance of the design.

Why this matters

"We used RSA" is one of the most common false-comfort statements in software security. RSA's core equation is sound; deployments fail in the wrapper — no padding, predictable messages, reused moduli, tiny exponents. This audit is a microcosm of real CVEs: the cryptographic primitive is fine, and the system is wide open. Learning to see the gap between "the math checks out" and "the system is secure" is the whole point of §25.6.

Phase 1: Confirm the system is correct (so we know that is not the problem)

Before attacking, verify the engineers' claim that decryption inverts encryption — so that when the system falls, no one can wave it away as "a math bug." Take a sample vote, $m = 4$ (rating "4"), and run the full round trip with the published key. Encryption is $c = 4^{17} \bmod 3233$.

Use fast modular exponentiation (§25.4 / Chapter 23). Since $17 = 16 + 1 = (10001)_2$, we need four squarings and one multiply:

$$4^{2} = 16,\quad 4^{4} = 256,\quad 4^{8} \equiv 876,\quad 4^{16} \equiv 1155 \pmod{3233}.$$

The reductions, each kept below $n$ as the §25.4 pitfall demands: - $4^{8} = 256^{2} = 65536$; $65536 = 20\cdot 3233 + 876$, so $4^{8} \equiv 876$. - $4^{16} = 876^{2} = 767376$; $767376 = 237\cdot 3233 + 1155$ (since $237\cdot 3233 = 766221$), so $4^{16} \equiv 1155$.

Then $4^{17} = 4^{16}\cdot 4 \equiv 1155\cdot 4 = 4620 = 3233 + 1387 \equiv 1387 \pmod{3233}$. So a rating-"4" vote travels as the ciphertext $c = 1387$.

def mod_pow(b, e, n):
    """Fast modular exponentiation, b^e mod n (Chapter 23). Equivalent to pow(b, e, n)."""
    result = 1
    b = b % n
    while e > 0:
        if e & 1:
            result = (result * b) % n
        b = (b * b) % n
        e >>= 1
    return result

n, e, d = 3233, 17, 2753          # published key + server's private exponent
c = mod_pow(4, e, n)              # encrypt the vote m = 4
back = mod_pow(c, d, n)           # server decrypts with d
print(c, back)
# Expected output:
# 1387 4

The decryption returns $4$ — exactly the input — because $ed \equiv 1 \pmod{\phi(n)}$ and the correctness theorem of §25.5 guarantees $(m^{e})^{d}\equiv m$. (The server's $d = 2753$ satisfies $17\cdot 2753 = 46801 = 15\cdot 3120 + 1$, the check the chapter ran in §25.4.) The system is correct. Now we break it anyway.

🔄 Check Your Understanding We verified correctness on $m = 4$. Does the §25.5 theorem promise the round trip also works for the "NO" vote $m = 0$? Compute $0^{17} \bmod 3233$ and $0^{2753} \bmod 3233$ in your head.

Answer

Yes — the general-case proof of §25.5 covers every $m$ with $0 \le m < n$, including $m = 0$. And trivially $0^{17} = 0$ and $0^{2753} = 0$, so the round trip returns $0$. (This is also a foreshadowing of the attack: a "NO" vote always produces the ciphertext $0$, the most recognizable value possible.)

Phase 2: The deterministic-encryption attack

Here is the flaw, and it is exactly the first bullet of §25.6: textbook RSA is deterministic. The public key is public, the encoding is public, and there are only seven possible plaintexts. So you, the attacker, simply encrypt all of them yourself and build a lookup table from ciphertext back to vote. No private key. No factoring. Just the public key the engineers handed you.

Encrypt each candidate $m \in \{0,1,2,3,4,5\}$ with the public key (note "YES" and rating "1" both encode to $m=1$, so six distinct plaintexts). We already have $m=4 \to 1387$. Compute the rest the same way ($e = 17 = (10001)_2$: square four times, multiply once):

$m = 0$: $0^{17} \equiv 0$.
$m = 1$: $1^{17} \equiv 1$.
$m = 2$: $2^{2}=4,\ 2^{4}=16,\ 2^{8}=256,\ 2^{16}=65536\equiv 876$; then $2^{17}\equiv 876\cdot 2 = 1752$. (Check: $2^{17} = 131072 = 40\cdot 3233 + 1752$. ✓)
$m = 3$: $3^{8} = 6561 \equiv 95$; $3^{16} = 95^{2} = 9025 \equiv 2559$ (since $9025 = 2\cdot 3233 + 2559$); $3^{17} \equiv 2559\cdot 3 = 7677 \equiv 1211$ (since $7677 = 2\cdot 3233 + 1211$).
$m = 4$: $1387$ (Phase 1).
$m = 5$: $5^{8} = 390625 \equiv 2665$ (since $390625 = 120\cdot 3233 + 2665$); $5^{16} = 2665^{2} = 7102225 \equiv 2557$ (since $7102225 = 2196\cdot 3233 + 2557$); $5^{17} \equiv 2557\cdot 5 = 12785 \equiv 3086$ (since $12785 = 3\cdot 3233 + 3086$).

That yields the attacker's rainbow table for this poll:

Vote (encoded $m$)	Ciphertext $c = m^{17}\bmod 3233$
NO (0)	0
YES / rating 1 (1)	1
rating 2 (2)	1752
rating 3 (3)	1211
rating 4 (4)	1387
rating 5 (5)	3086

n, e = 3233, 17                       # ONLY the public key — no d, no p, no q
table = {mod_pow(m, e, n): m for m in range(6)}   # encrypt every possible vote
sniffed = [1387, 3086, 0, 1211, 1752, 1387]       # ciphertexts seen on the wire
votes = [table[c] for c in sniffed]               # invert via the public table
print(votes)
# Expected output:
# [4, 5, 0, 3, 2, 4]

Every "confidential" ballot is now in plaintext. The two employees who voted rating "4" produced the identical ciphertext $1387$ — so you also learn which voters agreed, a leak beyond the votes themselves. The attack cost six encryptions with the public key.

⚠️ Common Pitfall: The engineers protest, "but $n = 3233$ is just for the demo; at 2048 bits this table would be impossible to precompute!" It would not. The table's size is the size of the message space, not the modulus. There are six possible votes whether $n$ has 4 digits or 617 digits; the attacker encrypts six values and is done. A giant unfactorable $n$ defends against factoring, not against a small, guessable plaintext set. This is the §25.6 point that "a big key space does not imply security if the structure leaks" — and a deterministic map from six inputs is about as leaky as structure gets.

💡 Intuition: Determinism turns encryption into a public function the attacker can also evaluate. Because $\text{Enc}(m) = m^{e}\bmod n$ uses only public data, the attacker computes $\text{Enc}$ on every plausible $m$ and matches. Secrecy required that identical plaintexts produce different ciphertexts — which is exactly what randomized padding (OAEP) provides, and what textbook RSA omits.

Phase 3: The small-message / small-exponent attack

Suppose a different team fixes the "only six votes" problem by letting employees submit a short numeric comment code (say a 1–3 digit reference number) and, for speed, picks the small public exponent $e = 3$. They reason: the message space is now larger, so the Phase 2 table attack is impractical. But they have walked straight into the second §25.6 pitfall.

For this variant the key (a fresh, legal one — note $e=3$ was illegal for $n=3233$ because $3 \mid \phi(3233)$, so the team must regenerate) is $$p = 11,\ q = 23,\ n = 253,\ \phi(n) = 10\cdot 22 = 220,\ e = 3,\ d = 147,$$ since $3\cdot 147 = 441 = 2\cdot 220 + 1$ gives $ed \equiv 1 \pmod{220}$. The flaw: whenever the comment code $m$ is small enough that $m^{3} < n$, the modular reduction never fires, so $c = m^{3}$ as an ordinary integer — and the attacker just takes a cube root.

How small is "small enough"? We need $m^{3} < 253$. Since $6^{3} = 216 < 253$ but $7^{3} = 343 > 253$, every code $m \in \{0,1,\dots,6\}$ encrypts with no wraparound. Take a sniffed ciphertext $c = 216$:

$$m = \sqrt[3]{216} = 6.$$

No private key, no factoring — an ordinary integer cube root recovers the plaintext.

c = 216                       # a sniffed ciphertext, e = 3, n = 253
m = round(c ** (1.0 / 3))     # plain integer cube root: no n, no d needed
print(m, m ** 3 == c)         # verify the recovered message really cubes to c
# Expected output:
# 6 True

The recovered $m = 6$ satisfies $6^{3} = 216 = c$ exactly, confirming the message wrapped around the modulus not at all. For codes $m \ge 7$ the cube wraps and this naive root fails — but small, common codes (a single digit, a leading-zero reference) leak immediately.

🔄 Check Your Understanding With $e = 3$ and $n = 253$, why does the cube-root attack fail for $m = 10$? Compute $10^{3} \bmod 253$ and show the wraparound.

Answer

$10^{3} = 1000$, and $1000 = 3\cdot 253 + 241$, so $c = 241$, not $1000$. The reduction fired (three times), so $\sqrt[3]{241} \approx 6.2$ is meaningless — it is not $10$. The attack only works while $m^{e} < n$; padding to make $m$ large (and randomized) is the §25.6 fix that defeats it.

Phase 4: Diagnose — correctness vs. security

Step back and name precisely what failed, using the chapter's own distinction. Two claims were in play:

A correctness claim (a theorem, §25.5): for the generated keys, $(m^{e})^{d}\equiv m \pmod n$ for every message. We verified this in Phase 1; it never broke. It is a theorem, proved with Euler and CRT.
A security claim (a protocol property, §25.6): an attacker holding $(n,e)$ and the ciphertext cannot recover $m$. This is what BallotBox needed and never had. It is not implied by claim 1.

The engineers proved (1) and assumed it gave them (2). It does not. The security of textbook RSA reduces to factoring only against an attacker who must recover $d$ — and neither attack here ever wanted $d$. Phase 2 exploited a tiny message space plus determinism; Phase 3 exploited a small message plus a small exponent. Both are §25.6 pitfalls, and both live entirely in the wrapper around the sound core.

🔗 Connection: This is the same lesson as the substitution cipher in §25.1. There, an enormous key space ($26!$) did not save the cipher because letter frequencies (structure) leaked. Here, an unfactorable modulus does not save the system because determinism and small messages (structure) leak. "Large key space" and "secure" are different claims at every scale of the subject.

Discussion Questions

Phase 2's table had six rows because there were six possible votes. Suppose BallotBox instead collected a 4-digit PIN ($m \in \{0,\dots,9999\}$). Is the deterministic attack still feasible? How many encryptions would the attacker precompute, and does the size of $n$ change that number?
The "NO" vote always encrypts to $0$ and the "YES"/rating-1 vote always to $1$, regardless of the key. Why? Which step of RSA encryption makes $m = 0$ and $m = 1$ fixed points, and why is this an especially blunt instance of the determinism problem?
Real RSA uses randomized padding (OAEP) so that encrypting the same $m$ twice yields different ciphertexts. Explain, in terms of Phase 2, exactly why this defeats the rainbow-table attack — what can the attacker no longer precompute?
In Phase 3, the team thought a larger message space fixed their problem. It addressed Phase 2 but opened Phase 3. Argue that the real fix is not "bigger messages" or "bigger $n$" but randomized padding plus a standard exponent, and connect this to the §25.6 claim that security is a property of the whole protocol, not the core equation.
The audit never tried to factor $n$. Under what circumstances would an attacker need to factor $n$, and which §25.6 theorem describes the consequence of succeeding?

Your Turn: Extensions

Option A (the YES/NO leak). Build the Phase-2 table for a pure yes/no poll (only $m\in\{0,1\}$) under any public key you like, and explain why such a poll is unconditionally broken by textbook RSA — the table has two rows you can write without any computation at all. Tie this to Discussion Q2.
Option B (defeating the table). Sketch a "poor man's padding": before encrypting, the client appends a few random low-order digits to $m$ (e.g., $m' = 100\cdot m + r$ for random $r\in{0,\dots, 99}$, with the server stripping $r$ after decryption). Show with a small hand example that the same vote now produces different ciphertexts, and state one reason this homemade scheme is still not OAEP (what guarantee does it lack?).
Option C (the cube-root boundary). For $e = 3$ and a modulus $n$ of your choice, derive the exact threshold $T$ such that messages $m < T$ leak to a cube root while $m \ge T$ do not (it is $T = \lceil n^{1/3}\rceil$). Verify it on $n = 253$ (you should recover $T = 7$) and on one larger $n$ you pick. Explain why padding that forces $m$ to be a full-width integer kills this attack.
Option D (toward the capstone). BallotBox's real fix needs padding and an integrity check so votes cannot be silently altered. Note which Chapter 39 capstone track (Track A) builds padding and a signature mode on top of the RSA core you implemented in §25.4, and list the two new behaviors a production version would add to crypto.py.

Key Takeaways

Correctness is a theorem; security is a separate claim. §25.5 guarantees decryption inverts encryption; it says nothing about whether an eavesdropper can read the message. BallotBox had the first and lacked the second.
Textbook RSA is deterministic, and determinism leaks. A public, deterministic encryption function over a small or guessable message space is invertible by the attacker with the public key — no private key, no factoring. Randomized padding (OAEP) is what makes encryption non-deterministic.
Small messages plus small exponents leak to arithmetic, not cryptanalysis. With $e=3$ and $m^{e}
A big modulus defends against factoring, not against bad structure. The number of encryptions in the rainbow table equals the size of the message space, independent of $\lvert n\rvert$. This is the §25.1 substitution-cipher lesson reincarnated: large key space $\ne$ secure when structure leaks.
Audit the protocol, not just the primitive. Padding, randomness, exponent choice, key management, and message encoding are where deployed RSA actually breaks — exactly the pitfalls §25.6 enumerates.