Chapter 2: Cryptographic Foundations: Hashing, Digital Signatures, and Public-Key Cryptography

34 min read

In Chapter 1, we described a blockchain as a chain of blocks linked by cryptographic hashes, where transactions are authorized by digital signatures and verified through Merkle trees. We used those terms as if they were self-evident. They are not...

Learning Objectives

Implement SHA-256 hashing in Python and demonstrate its three critical properties: determinism, avalanche effect, and preimage resistance
Explain public-key cryptography using the mathematical intuition of trapdoor functions and elliptic curves
Generate ECDSA key pairs, sign messages, and verify signatures using Python
Build a Merkle tree from scratch and generate/verify inclusion proofs
Connect each cryptographic primitive to its specific role in blockchain systems

In This Chapter

Why Cryptography Is the Foundation, Not the Decoration
2.1 Hash Functions: The Digital Fingerprint
2.2 Public-Key Cryptography: The Lock That Only You Can Open
2.3 Digital Signatures: Proving Ownership Without Revealing Secrets
2.4 Merkle Trees: Efficient Verification at Scale
2.5 Putting It All Together: How These Primitives Combine in a Blockchain
2.6 Advanced Sidebar: Post-Quantum Cryptography and Blockchain
2.7 A Warning: Never Roll Your Own Cryptography
2.8 Chapter Summary
Key Vocabulary Quick Reference

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 2: Cryptographic Foundations: Hashing, Digital Signatures, and Public-Key Cryptography

Why Cryptography Is the Foundation, Not the Decoration

In Chapter 1, we described a blockchain as a chain of blocks linked by cryptographic hashes, where transactions are authorized by digital signatures and verified through Merkle trees. We used those terms as if they were self-evident. They are not. If you walked away from Chapter 1 thinking "I get the general idea," that is precisely the level of understanding this chapter is designed to destroy and rebuild on solid ground.

Here is the uncomfortable truth: if you cannot explain how a hash function works, what a digital signature proves, and why a Merkle tree enables efficient verification, you do not understand blockchain. You understand a metaphor about blockchain. Metaphors are useful for cocktail parties. They are useless for building systems, auditing protocols, or evaluating whether a cryptocurrency project is technically sound or an elaborate confidence game.

This chapter builds the cryptographic toolkit from the ground up. We will implement every primitive in Python. Not because you need to write your own cryptographic library — you absolutely should not do that in production — but because running the code and seeing the math work is the difference between "I believe hashing is one-way" and "I understand why hashing is one-way."

We will cover four primitives:

Hash functions (specifically SHA-256) — the digital fingerprints that link blocks, secure passwords, and enable proof-of-work mining.
Public-key cryptography (specifically Elliptic Curve Cryptography) — the mathematical framework that lets you have a public identity without revealing your private secrets.
Digital signatures (specifically ECDSA) — the mechanism that proves you authorized a transaction without revealing your private key.
Merkle trees — the data structure that lets you verify a single transaction belongs to a block without downloading the entire block.

By the end of this chapter, you will have written working Python code for all four. These become the building blocks for the blockchain we construct in Chapter 6.

Let us begin with the simplest and most important primitive.

2.1 Hash Functions: The Digital Fingerprint

2.1.1 What a Hash Function Does

A hash function is a mathematical function that takes an input of any size and produces a fixed-size output. That output is called a hash, a digest, or colloquially, a fingerprint.

Consider the SHA-256 hash function, which produces a 256-bit (32-byte) output, typically displayed as 64 hexadecimal characters:

import hashlib

message = "Hello, blockchain!"
hash_output = hashlib.sha256(message.encode()).hexdigest()
print(hash_output)
# Output: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

That 64-character string is the SHA-256 hash of "Hello, blockchain!" It is always the same for that input. It is always 64 hexadecimal characters long regardless of whether the input is a single character or the entire text of War and Peace. And given only the hash output, there is no known way to recover the original input.

These three properties are not accidental features. They are the entire point.

2.1.2 The Three Critical Properties of Cryptographic Hash Functions

A function qualifies as a cryptographic hash function only if it satisfies three properties. Not two. Not "mostly." All three, or it is not suitable for security applications.

Property 1: Preimage Resistance (One-Way)

Given a hash output h, it is computationally infeasible to find any input m such that hash(m) = h.

This is the "one-way" property. You can compute the hash from the input, but you cannot compute the input from the hash. "Computationally infeasible" means that the best known attack requires approximately 2^256 operations for SHA-256 — a number so large that every computer on Earth running until the heat death of the universe would not complete the search.

Let us demonstrate this in Python. We will hash a secret message, then try a brute-force search to find it:

import hashlib
import time

# The "secret" — in reality this would be unknown
secret = "blockchain"
target_hash = hashlib.sha256(secret.encode()).hexdigest()
print(f"Target hash: {target_hash}")

# Try to find the secret by brute force (limited dictionary attack)
wordlist = ["bitcoin", "ethereum", "crypto", "mining", "ledger",
            "blockchain", "token", "wallet", "hash", "node"]

start = time.time()
for word in wordlist:
    if hashlib.sha256(word.encode()).hexdigest() == target_hash:
        print(f"Found it: '{word}' in {time.time() - start:.6f} seconds")
        break

This only works because our "secret" was in the wordlist. If the input is random, or even a long passphrase not in any dictionary, brute force is hopeless.

💡 Key Insight: Preimage resistance is what makes proof-of-work mining possible. Miners must find an input that produces a hash below a target value. Because hashing is one-way, the only strategy is trial and error — which requires real computational work. That is the "work" in proof-of-work.

Property 2: Second Preimage Resistance (Weak Collision Resistance)

Given an input m1 and its hash hash(m1), it is computationally infeasible to find a different input m2 (where m2 != m1) such that hash(m1) = hash(m2).

This property protects data integrity. If someone sends you a document and its hash, no adversary can create a different document that produces the same hash. In blockchain, this means a transaction, once hashed and included in a block, cannot be replaced with a different transaction that has the same hash.

Property 3: Collision Resistance (Strong Collision Resistance)

It is computationally infeasible to find any two different inputs m1 and m2 such that hash(m1) = hash(m2).

This is a stronger property than second preimage resistance. Here the attacker gets to choose both inputs. Even with that freedom, finding a collision should require approximately 2^128 operations for SHA-256 (due to the birthday paradox — more on this below).

⚠️ Common Misconception: "Collisions are impossible." No. SHA-256 maps an infinite input space to 2^256 possible outputs. Collisions must exist — the pigeonhole principle guarantees it. The point is that finding one is computationally infeasible. There is a universe of difference between "exists" and "findable."

2.1.3 The Avalanche Effect

Change one bit of the input, and approximately half the output bits change. This is called the avalanche effect, and it is what makes hash functions behave like random number generators (without actually being random — they are deterministic).

Let us see this in Python:

import hashlib

def show_avalanche(msg1, msg2):
    hash1 = hashlib.sha256(msg1.encode()).hexdigest()
    hash2 = hashlib.sha256(msg2.encode()).hexdigest()

    # Convert hex to binary and count differing bits
    bin1 = bin(int(hash1, 16))[2:].zfill(256)
    bin2 = bin(int(hash2, 16))[2:].zfill(256)
    diff_bits = sum(b1 != b2 for b1, b2 in zip(bin1, bin2))

    print(f"Input 1: '{msg1}'")
    print(f"Hash 1:  {hash1}")
    print(f"Input 2: '{msg2}'")
    print(f"Hash 2:  {hash2}")
    print(f"Bits changed: {diff_bits}/256 ({diff_bits/256*100:.1f}%)")
    print()

show_avalanche("Hello", "Hellp")      # One character changed
show_avalanche("Block 1", "Block 2")   # One digit changed
show_avalanche("a", "b")               # Minimal change

Running this code, you will find that each pair differs by approximately 128 bits (50%) — regardless of how small the change in the input. The hashes look completely unrelated. There is no pattern, no gradient, no way to infer "this hash is close to that hash because the inputs were similar."

📊 By the Numbers: For SHA-256, each output bit has approximately a 50% probability of flipping when any input bit changes. Across 256 output bits, the expected number of changed bits is 128. This is not a coincidence — it is a design requirement. The SHA-256 specification was engineered to maximize diffusion, ensuring that every input bit influences every output bit.

2.1.4 SHA-256: A Conceptual Overview

SHA-256 (Secure Hash Algorithm, 256-bit) was designed by the National Security Agency and published by NIST in 2001 as part of the SHA-2 family. Bitcoin's Satoshi Nakamoto chose it in 2008, and it remains the hash function used by Bitcoin, many other blockchains, and countless security applications.

The algorithm works in stages:

Padding: The input message is padded to a multiple of 512 bits. The padding includes the original message length, which prevents length-extension attacks.
Parsing: The padded message is split into 512-bit blocks.
Initialization: Eight 32-bit working variables are initialized to specific constant values (derived from the square roots of the first eight primes).
Compression: Each 512-bit block is processed through 64 rounds of bitwise operations, additions, and rotations. Each round mixes the input data with the current state using operations that are easy to compute forward but impossible to reverse.
Output: The final values of the eight working variables are concatenated to produce the 256-bit hash.

The critical insight is step 4: the compression function is lossy. Each 512-bit block of input is compressed into a 256-bit state update. Information is destroyed in this process, which is exactly why the function is one-way. You cannot reconstruct 512 bits of information from 256 bits of output — the other 256 bits are gone.

We will not implement SHA-256 from scratch (that would require a full chapter on its own and would teach you more about bit manipulation than about blockchain). Instead, we use Python's hashlib, which wraps the highly optimized OpenSSL implementation.

2.1.5 Hash Functions in Blockchain: Where They Appear

Hash functions serve at least four distinct roles in blockchain systems:

Role	How It Works	Example
Block linking	Each block contains the hash of the previous block's header	Bitcoin block header includes `prev_block_hash`
Proof-of-work	Miners search for a nonce that makes the block hash below a target	Bitcoin mining adjusts difficulty every 2,016 blocks
Transaction IDs	Each transaction is identified by its hash	Bitcoin txid is the double-SHA-256 of the serialized transaction
Merkle roots	Transactions are hashed into a Merkle tree; the root goes in the block header	Enables SPV (Simplified Payment Verification) for light clients

We will build all four of these uses across Chapters 2-6. For now, let us implement the basic hashing operations:

import hashlib
import json

def hash_data(data: str) -> str:
    """Hash a string using SHA-256 and return the hex digest."""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

def hash_block_header(index, previous_hash, timestamp, merkle_root, nonce):
    """Hash a block header — we'll use this structure in Chapter 6."""
    header = f"{index}{previous_hash}{timestamp}{merkle_root}{nonce}"
    return hash_data(header)

# Double hashing (as Bitcoin does it)
def double_sha256(data: str) -> str:
    """Bitcoin uses SHA-256(SHA-256(x)) for most hashing operations."""
    first_hash = hashlib.sha256(data.encode('utf-8')).digest()
    return hashlib.sha256(first_hash).hexdigest()

# Demonstration
print("Single SHA-256:", hash_data("Hello, blockchain!"))
print("Double SHA-256:", double_sha256("Hello, blockchain!"))

🧪 Try It Yourself: Modify the hash_data function to accept a file path and hash the file contents. Then hash two different files and verify that even files differing by a single byte produce completely different hashes.

2.1.6 The Birthday Paradox and Collision Bounds

How hard is it to find a collision? Intuitively, you might think you need to try 2^256 inputs. In fact, the birthday paradox tells us the expected number is closer to 2^128.

The birthday paradox is named after a classic probability puzzle: in a room of just 23 people, there is a greater than 50% chance that two share a birthday. This is surprising because there are 365 possible birthdays, and 23 is far less than 365. The key insight is that you are comparing every pair of people, and the number of pairs grows quadratically: 23 people produce 23 x 22 / 2 = 253 pairs.

For hash functions, the same principle applies. If you hash 2^128 random inputs using SHA-256, you have a roughly 50% chance of finding two that collide. While 2^128 is still astronomically large (approximately 3.4 x 10^38 operations), it is the square root of 2^256, which is a significant reduction.

This is why cryptographers quote the "security level" of SHA-256 as 128 bits for collision resistance, not 256 bits. For blockchain applications, 128 bits of collision resistance is considered more than sufficient for the foreseeable future.

import math

def birthday_bound(hash_bits):
    """Calculate the approximate number of hashes needed for a 50% collision chance."""
    return math.sqrt(2 ** hash_bits * math.log(2))

print(f"SHA-256 birthday bound: ~2^{math.log2(birthday_bound(256)):.1f}")
print(f"For comparison, SHA-1 (160-bit): ~2^{math.log2(birthday_bound(160)):.1f}")
print(f"SHA-1 was broken in practice by Google in 2017 (SHAttered attack)")

🔗 Connection: In 2017, Google's SHAttered project produced the first practical SHA-1 collision, requiring 2^63.1 operations — far fewer than the birthday bound of 2^80 due to structural weaknesses in SHA-1. This is why SHA-256 replaced SHA-1 in security-critical applications. Blockchain was wise to use SHA-256 from the start.

2.2 Public-Key Cryptography: The Lock That Only You Can Open

2.2.1 The Key Insight: Asymmetric Encryption

Before 1976, all cryptography was symmetric: both parties shared the same secret key. If Alice wanted to send Bob a secret message, they first had to agree on a key — which required either meeting in person or using a trusted courier. This is the key distribution problem, and it stymied cryptography for thousands of years.

In 1976, Whitfield Diffie and Martin Hellman published "New Directions in Cryptography," proposing a radical idea: what if encryption and decryption used different keys? One key encrypts; the other decrypts. You can publish one key (the public key) to the entire world while keeping the other (the private key) secret.

This is asymmetric or public-key cryptography, and it is the foundation of all digital identity on the internet — and all ownership on blockchains.

The core requirement is a trapdoor function: a mathematical operation that is easy to compute in one direction but infeasible to reverse unless you know a secret. Two families of trapdoor functions dominate cryptography:

Integer factorization (RSA): multiplying two large primes is easy; factoring their product is hard.
Elliptic curve discrete logarithm (ECC): multiplying a point on an elliptic curve by a scalar is easy; finding the scalar given the original and resulting points is hard.

Blockchain overwhelmingly uses ECC, and specifically the Elliptic Curve Digital Signature Algorithm (ECDSA) with the secp256k1 curve. We will focus on ECC, but first, a brief detour through RSA for intuition.

2.2.2 RSA Intuition: Why Factoring Is Hard

RSA, published in 1977 by Rivest, Shamir, and Adleman, relies on a simple observation:

Pick two large prime numbers p and q.
Compute n = p * q.
Given n, finding p and q is extremely hard if both primes are large (2048+ bits).

Multiplication is fast. Factoring is slow. This asymmetry is the trapdoor.

import time

# Easy direction: multiplication
p = 104729
q = 104743
start = time.time()
n = p * q
mult_time = time.time() - start
print(f"Multiplication: {p} x {q} = {n} (took {mult_time:.8f} seconds)")

# Hard direction: factoring (brute force for small numbers)
start = time.time()
for i in range(2, int(n**0.5) + 1):
    if n % i == 0:
        found_p, found_q = i, n // i
        break
factor_time = time.time() - start
print(f"Factoring: {n} = {found_p} x {found_q} (took {factor_time:.8f} seconds)")
print(f"Factoring was {factor_time/max(mult_time, 1e-10):.0f}x slower")

For numbers this small, factoring is fast. But the difficulty grows exponentially with the number of digits. RSA-2048 (a 2048-bit number — roughly 617 decimal digits) is believed to be unfactorable with current technology.

⚠️ Common Misconception: "RSA has been broken." It has not. Specific key sizes (512-bit, 768-bit) have been factored, and the general recommendation has increased from 1024 to 2048 bits. RSA-2048 remains secure. However, blockchains do not use RSA because ECC provides equivalent security with much smaller keys.

2.2.3 Elliptic Curve Cryptography: Why Blockchain Chose ECC

An elliptic curve is defined by the equation:

y^2 = x^3 + ax + b

For the secp256k1 curve used by Bitcoin and Ethereum, the specific parameters are:

y^2 = x^3 + 7  (over a finite field of prime order p)

where a = 0, b = 7, and p is a specific 256-bit prime.

Over the real numbers, this curve looks like a smooth, continuous curve in the x-y plane. But cryptography does not operate over real numbers. Instead, we work over a finite field — the integers modulo a large prime p. In this finite field, the curve becomes a scattered collection of discrete points that nonetheless obey the same algebraic rules as the smooth curve. This is important: the algebra works the same way, but the geometry loses its visual continuity, which is part of why the discrete logarithm problem is so hard.

2.2.4 Point Addition: The Geometric Foundation

The power of elliptic curves comes from a special operation called point addition. Given two points P and Q on the curve, we can define their "sum" P + Q as follows:

Draw a straight line through P and Q.
This line intersects the curve at exactly one other point (by the algebraic properties of cubic curves).
Reflect that intersection point across the x-axis. The result is P + Q.

When P = Q (you are adding a point to itself, called point doubling), you use the tangent line at P instead of the line through two distinct points.

These rules may seem arbitrary, but they satisfy all the properties of a mathematical group: they are associative, commutative, have an identity element (the "point at infinity"), and every element has an inverse. This group structure is what makes elliptic curve cryptography possible.

Point multiplication — the operation we actually need — is simply repeated point addition:

k * G = G + G + G + ... + G  (k times)

In practice, this is computed efficiently using the double-and-add algorithm, which requires only about log2(k) point additions instead of k. For a 256-bit scalar, this means approximately 256 point additions — fast on any modern computer.

The trapdoor is this: given G and P = k * G, finding k is the Elliptic Curve Discrete Logarithm Problem (ECDLP). Over a finite field, the points do not follow a visually traceable path. After point multiplication, the result appears to be a random point on the curve with no obvious geometric relationship to the starting point. For the 256-bit curves used in blockchain, solving ECDLP requires approximately 2^128 operations — about the same as breaking SHA-256's collision resistance.

💡 Key Insight: The double-and-add algorithm is what makes point multiplication fast. To compute 200 * G, you do not add G to itself 200 times. Instead, you compute G, 2G, 4G, 8G, ... (by repeated doubling), then combine: 200 * G = 128G + 64G + 8G. This requires only 8 doublings and 3 additions. The forward operation is fast (O(log k)), but reversing it — finding k given the result — has no known shortcut. This asymmetry between forward computation (fast) and reverse computation (infeasible) is the mathematical core of all public-key cryptography in blockchain.

Why does blockchain prefer ECC over RSA?

Property	RSA-3072	ECC-256 (secp256k1)
Security level	128 bits	128 bits
Private key size	3072 bits	256 bits
Public key size	3072 bits	512 bits (uncompressed)
Signature size	3072 bits	512 bits
Key generation speed	Slow (prime finding)	Fast (random number)

ECC achieves the same security with keys that are 12 times smaller. In a system where every transaction includes a public key and a signature, and millions of transactions are stored forever, size matters enormously.

2.2.4 Key Generation: From Random Number to Blockchain Address

In blockchain, your private key is simply a random 256-bit integer. Your public key is the result of multiplying the generator point G by your private key. Your address is a hash of your public key.

Private Key (256-bit random number)
    |
    v  [Elliptic curve point multiplication: P = k * G]
Public Key (point on secp256k1 curve)
    |
    v  [SHA-256 + RIPEMD-160 hashing + encoding]
Blockchain Address (e.g., Bitcoin address starting with "1")

Let us implement this in Python:

from ecdsa import SECP256k1, SigningKey
import hashlib

# Step 1: Generate a private key (random 256-bit integer)
private_key = SigningKey.generate(curve=SECP256k1)
private_key_hex = private_key.to_string().hex()
print(f"Private key: {private_key_hex}")
print(f"Private key length: {len(private_key_hex) * 4} bits")

# Step 2: Derive the public key (point multiplication)
public_key = private_key.get_verifying_key()
public_key_hex = public_key.to_string().hex()
print(f"\nPublic key (uncompressed): {public_key_hex}")
print(f"Public key length: {len(public_key_hex) * 4} bits")

# Step 3: Derive a simplified address (SHA-256 of public key)
# Note: Real Bitcoin addresses use SHA-256 + RIPEMD-160 + Base58Check
address = hashlib.sha256(bytes.fromhex(public_key_hex)).hexdigest()[:40]
print(f"\nSimplified address: {address}")

The critical security property: you can freely share your public key and address. No one can derive your private key from them. This is the ECDLP guarantee.

💡 Key Insight: Your private key is literally a random number. There is nothing magical about it. The security comes from the mathematical difficulty of reversing point multiplication, not from any special property of the number itself. This is why protecting your private key is so important — if someone learns your random number, they become you on the blockchain.

2.2.6 Compressed vs. Uncompressed Public Keys

A point on the secp256k1 curve has two coordinates: x and y, each 256 bits. The uncompressed public key stores both, for a total of 512 bits (64 bytes). But since the point lies on the curve y^2 = x^3 + 7, if you know x, there are only two possible values of y (one positive, one negative in the finite field). A compressed public key stores only the x coordinate plus a single bit indicating which y to use — a total of 257 bits (33 bytes).

Bitcoin originally used uncompressed keys but later added support for compressed keys. Most modern wallets use compressed keys exclusively, saving 31 bytes per public key per transaction. Over millions of transactions stored permanently on the blockchain, this adds up to significant space savings.

from ecdsa import SECP256k1, SigningKey

private_key = SigningKey.generate(curve=SECP256k1)
public_key = private_key.get_verifying_key()

# Uncompressed: 04 || x || y (65 bytes total with prefix)
uncompressed = b'\x04' + public_key.to_string()
print(f"Uncompressed: {uncompressed.hex()}")
print(f"  Length: {len(uncompressed)} bytes")

# The x-coordinate alone (first 32 bytes of to_string())
x_coord = public_key.to_string()[:32]
print(f"X-coordinate: {x_coord.hex()}")
print(f"  Length: {len(x_coord)} bytes")
print(f"  Savings: {len(uncompressed) - len(x_coord) - 1} bytes per key")

📊 By the Numbers: Bitcoin has processed over 900 million transactions as of 2025. If each transaction saves 31 bytes by using compressed keys, the total savings exceed 27 GB of blockchain storage. For a distributed system where every full node stores the complete chain, this is a meaningful optimization.

2.2.7 The Keyspace: Why Collisions Are Effectively Impossible

A 256-bit private key means there are 2^256 possible keys. To appreciate this number:

2^256 is approximately 1.16 x 10^77.
The estimated number of atoms in the observable universe is approximately 10^80.
The number of possible private keys is roughly 1/1000th the number of atoms in the universe.

The probability that two people independently generate the same private key is so small that it rounds to zero for all practical purposes. If every person on Earth generated one billion keys per second for a billion years, the probability of any collision would still be negligible.

import math

keyspace = 2**256
earth_population = 8e9
keys_per_second = 1e9
seconds_per_year = 365.25 * 24 * 3600
years = 1e9

total_keys_generated = earth_population * keys_per_second * seconds_per_year * years
collision_probability = total_keys_generated**2 / (2 * keyspace)

print(f"Keyspace: 2^256 = {keyspace:.2e}")
print(f"Total keys generated: {total_keys_generated:.2e}")
print(f"Collision probability: {collision_probability:.2e}")
print(f"That's a 1 in {1/collision_probability:.2e} chance")

📊 By the Numbers: The probability above works out to approximately 1 in 10^25 — after generating 2.5 x 10^26 keys. For context, you are far more likely to be struck by lightning while being bitten by a shark while winning the lottery. Private key collision is not a realistic threat vector.

2.3 Digital Signatures: Proving Ownership Without Revealing Secrets

2.3.1 What Digital Signatures Prove

A digital signature provides three guarantees:

Authentication: The message was signed by the holder of a specific private key.
Integrity: The message has not been altered since it was signed.
Non-repudiation: The signer cannot later deny having signed the message.

In blockchain, digital signatures serve a specific purpose: when you send cryptocurrency, your signature proves that you, the holder of the private key associated with the sending address, authorized the transaction. No one else could have produced that signature. And anyone can verify it using only your public key — your private key never leaves your device.

This is the fundamental mechanism of ownership in blockchain. You "own" Bitcoin not because a bank says so, but because you hold the private key needed to sign transactions moving those coins.

2.3.2 The ECDSA Signing Algorithm

The Elliptic Curve Digital Signature Algorithm (ECDSA) uses the secp256k1 curve. Here is the process, explained step by step:

Signing (creating a signature):

Compute the hash of the message: z = SHA-256(message).
Generate a random number k (the nonce — number used once). This must be truly random and unique for each signature.
Compute the point R = k * G on the elliptic curve.
Take the x-coordinate of R and call it r.
Compute s = k^(-1) * (z + r * private_key) mod n, where n is the order of the curve.
The signature is the pair (r, s).

Verification (checking a signature):

Given the message, the signature (r, s), and the signer's public key P:

Compute the hash of the message: z = SHA-256(message).
Compute u1 = z * s^(-1) mod n.
Compute u2 = r * s^(-1) mod n.
Compute the point R' = u1 * G + u2 * P.
If the x-coordinate of R' equals r, the signature is valid.

Why does this work? Because the math is constructed so that R' equals the original R only if the signature was produced by the holder of the private key corresponding to P. The proof relies on the algebraic properties of elliptic curves, and we encourage the mathematically curious reader to work through the derivation in the exercises.

⚠️ Common Misconception: "The private key is embedded somewhere in the signature." It is not. The signature is computed using the private key, but the private key cannot be extracted from the signature (assuming the nonce k is random and unique — a critical assumption we will return to in Case Study 1).

2.3.3 The Nonce: The Most Dangerous Number in Cryptography

Step 2 above says "generate a random number k." This innocuous instruction hides a catastrophic vulnerability: if you ever reuse a nonce, or if your nonce is predictable, your private key can be computed from two signatures.

Here is why. Given two signatures (r1, s1) and (r2, s2) that used the same nonce k:

Since k is the same, r1 = r2 (they are both the x-coordinate of k * G).
From the two s values: s1 - s2 = k^(-1) * (z1 - z2) mod n.
Therefore: k = (z1 - z2) * (s1 - s2)^(-1) mod n.
Once you have k, you can compute the private key: private_key = (s1 * k - z1) * r1^(-1) mod n.

Your private key. Extracted. From two signatures with a reused nonce.

This is not a theoretical attack. In 2013, a weak random number generator in the Android SecureRandom class caused many Bitcoin wallets to reuse ECDSA nonces. Attackers extracted private keys and stole Bitcoin. We explore this incident in depth in Case Study 1.

2.3.4 Python Implementation: Sign and Verify

from ecdsa import SECP256k1, SigningKey, BadSignatureError
import hashlib

# Generate a key pair
private_key = SigningKey.generate(curve=SECP256k1)
public_key = private_key.get_verifying_key()

# The message to sign (in blockchain, this would be a transaction)
message = "Send 1.5 BTC from Alice to Bob"
message_bytes = message.encode('utf-8')

# Sign the message
# The ecdsa library handles nonce generation securely (RFC 6979)
signature = private_key.sign(message_bytes, hashfunc=hashlib.sha256)
print(f"Message: {message}")
print(f"Signature: {signature.hex()}")
print(f"Signature length: {len(signature)} bytes")

# Verify the signature
try:
    public_key.verify(signature, message_bytes, hashfunc=hashlib.sha256)
    print("\nSignature is VALID - the message was signed by the holder of this key")
except BadSignatureError:
    print("\nSignature is INVALID")

# Tamper with the message and try to verify
tampered_message = "Send 100 BTC from Alice to Bob"
try:
    public_key.verify(signature, tampered_message.encode('utf-8'),
                      hashfunc=hashlib.sha256)
    print("Tampered signature is VALID (this should never happen!)")
except BadSignatureError:
    print("Tampered message REJECTED - integrity check failed")

# Try to verify with a different public key
wrong_key = SigningKey.generate(curve=SECP256k1).get_verifying_key()
try:
    wrong_key.verify(signature, message_bytes, hashfunc=hashlib.sha256)
    print("Wrong key verification PASSED (this should never happen!)")
except BadSignatureError:
    print("Wrong key REJECTED - authentication check failed")

Run this code. You will see that the signature verifies correctly with the original message and correct public key, but fails if either the message is tampered with or the wrong public key is used. This demonstrates both integrity and authentication.

💡 Key Insight: Notice that the ecdsa library uses RFC 6979 for nonce generation — a deterministic scheme that derives the nonce from the private key and message hash. This eliminates the possibility of nonce reuse entirely, addressing the catastrophic vulnerability we discussed. Always use a well-tested library, never roll your own nonce generation.

2.3.5 How Signatures Authorize Blockchain Transactions

In a blockchain transaction, the process works as follows:

Alice constructs a transaction: "Transfer 1.5 BTC from address abc123 to address def456."
Alice signs the transaction data with her private key, producing a signature.
Alice broadcasts the transaction and signature to the network.
Every node verifies the signature using Alice's public key (derived from her address).
If the signature is valid, the transaction is accepted. If not, it is rejected.

At no point does Alice reveal her private key. The network verifies that only the holder of the private key corresponding to address abc123 could have produced that signature. This is the mechanism of ownership and authorization in blockchain — pure mathematics, no trusted third party.

import json
from ecdsa import SECP256k1, SigningKey
import hashlib

def create_signed_transaction(private_key, sender, recipient, amount):
    """Create and sign a transaction — a preview of Chapter 6."""
    tx = {
        "sender": sender,
        "recipient": recipient,
        "amount": amount
    }
    tx_data = json.dumps(tx, sort_keys=True).encode('utf-8')
    signature = private_key.sign(tx_data, hashfunc=hashlib.sha256)
    return tx, signature

def verify_transaction(public_key, tx, signature):
    """Verify a signed transaction."""
    tx_data = json.dumps(tx, sort_keys=True).encode('utf-8')
    try:
        public_key.verify(signature, tx_data, hashfunc=hashlib.sha256)
        return True
    except:
        return False

# Alice creates and signs a transaction
alice_private = SigningKey.generate(curve=SECP256k1)
alice_public = alice_private.get_verifying_key()
alice_address = hashlib.sha256(alice_public.to_string()).hexdigest()[:40]

tx, sig = create_signed_transaction(
    alice_private,
    sender=alice_address,
    recipient="b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0",
    amount=1.5
)

print(f"Transaction: {json.dumps(tx, indent=2)}")
print(f"Signature valid: {verify_transaction(alice_public, tx, sig)}")

# Eve tries to forge a transaction from Alice
eve_private = SigningKey.generate(curve=SECP256k1)
_, forged_sig = create_signed_transaction(
    eve_private,
    sender=alice_address,  # Eve tries to spend Alice's coins
    recipient="eveveveveveveveveveveveveveveveveveveveveve",
    amount=100.0
)
print(f"Forged transaction valid: {verify_transaction(alice_public, tx, forged_sig)}")

2.4 Merkle Trees: Efficient Verification at Scale

2.4.1 The Problem: Verifying Membership in a Large Dataset

Suppose a block contains 2,000 transactions. You want to verify that a specific transaction is included in the block. The naive approach: download all 2,000 transactions, hash each one, and check if yours is among them. This is O(n) in both time and space.

For a full node that stores the entire blockchain, this is fine. But what about a mobile phone running a lightweight wallet? Downloading every transaction in every block is infeasible. We need a data structure that allows verification of membership in a large set using only O(log n) data.

Enter the Merkle tree.

2.4.2 Building a Merkle Tree Step by Step

A Merkle tree (named after Ralph Merkle, who patented the concept in 1979) is a binary tree where:

Each leaf node is the hash of a data element (e.g., a transaction).
Each internal node is the hash of its two children concatenated together.
The root (the single node at the top) is called the Merkle root.

Let us build one with four transactions:

Transactions: [TX_A, TX_B, TX_C, TX_D]

Step 1: Hash each transaction (leaf nodes)
H_A = SHA-256(TX_A)
H_B = SHA-256(TX_B)
H_C = SHA-256(TX_C)
H_D = SHA-256(TX_D)

Step 2: Hash pairs to create internal nodes
H_AB = SHA-256(H_A + H_B)
H_CD = SHA-256(H_C + H_D)

Step 3: Hash the pair to create the root
Merkle_Root = SHA-256(H_AB + H_CD)

Visually (reading top to bottom):

              Merkle Root
             /           \
          H_AB           H_CD
         /    \         /    \
       H_A   H_B     H_C   H_D
        |      |       |      |
      TX_A   TX_B   TX_C   TX_D

The Merkle root is a single 256-bit hash that represents all four transactions. If any transaction changes, the root changes.

2.4.3 Merkle Proofs: O(log n) Verification

Here is the power of Merkle trees. Suppose you want to prove that TX_C is included in the tree. You do not need all four transactions. You need:

TX_C itself (to compute H_C).
H_D (the sibling of H_C, needed to compute H_CD).
H_AB (the sibling of H_CD, needed to compute the root).

With these three pieces of data, you can recompute the Merkle root and check whether it matches the one stored in the block header:

H_C = SHA-256(TX_C)                    # You compute this
H_CD = SHA-256(H_C + H_D)              # You compute this using the provided H_D
Root = SHA-256(H_AB + H_CD)            # You compute this using the provided H_AB
Does Root match the block header's Merkle root?  # YES → TX_C is in the block

For n transactions, the proof requires only log2(n) hashes. For a block with 2,000 transactions, that is about 11 hashes instead of 2,000.

📊 By the Numbers: A Bitcoin block can contain up to about 2,700 transactions (with SegWit). A Merkle proof for any single transaction requires at most ceil(log2(2700)) = 12 hashes, which is 12 x 32 bytes = 384 bytes. Compare this to downloading all 2,700 transactions (hundreds of kilobytes to megabytes). This is why light wallets can verify transactions without downloading the full blockchain.

2.4.4 Python Implementation: Full Merkle Tree

Let us build a complete Merkle tree with proof generation and verification:

import hashlib
from typing import List, Optional, Tuple

def sha256(data: str) -> str:
    """Compute SHA-256 hash of a string."""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

class MerkleTree:
    def __init__(self, data_list: List[str]):
        """Build a Merkle tree from a list of data strings."""
        if not data_list:
            raise ValueError("Cannot create Merkle tree from empty list")

        # Hash all leaf data
        self.leaves = [sha256(item) for item in data_list]
        self.data_list = data_list

        # Build the tree bottom-up
        self.tree = self._build_tree(self.leaves)
        self.root = self.tree[-1][0]  # The root is the single element at the top

    def _build_tree(self, leaves: List[str]) -> List[List[str]]:
        """Build tree levels from leaves to root."""
        tree = [leaves[:]]  # First level is the leaf hashes

        current_level = leaves[:]
        while len(current_level) > 1:
            next_level = []
            # If odd number of nodes, duplicate the last one
            if len(current_level) % 2 == 1:
                current_level.append(current_level[-1])

            for i in range(0, len(current_level), 2):
                combined = current_level[i] + current_level[i + 1]
                parent_hash = sha256(combined)
                next_level.append(parent_hash)

            tree.append(next_level)
            current_level = next_level

        return tree

    def get_proof(self, index: int) -> List[Tuple[str, str]]:
        """
        Generate a Merkle proof for the element at the given index.
        Returns a list of (hash, direction) tuples where direction is
        'left' or 'right', indicating the sibling's position.
        """
        if index < 0 or index >= len(self.leaves):
            raise IndexError(f"Index {index} out of range")

        proof = []
        idx = index

        for level in range(len(self.tree) - 1):  # Exclude root level
            current_level = self.tree[level]

            # Handle odd-length levels (last element duplicated)
            if idx == len(current_level) - 1 and len(current_level) % 2 == 1:
                sibling_idx = idx  # Duplicate of itself
            elif idx % 2 == 0:
                sibling_idx = idx + 1
            else:
                sibling_idx = idx - 1

            # Determine if sibling is to the left or right
            if sibling_idx < idx:
                proof.append((current_level[sibling_idx], 'left'))
            else:
                proof.append((current_level[sibling_idx], 'right'))

            idx = idx // 2  # Move to parent index

        return proof

    @staticmethod
    def verify_proof(data: str, proof: List[Tuple[str, str]],
                     root: str) -> bool:
        """Verify a Merkle proof for a given piece of data."""
        current_hash = sha256(data)

        for sibling_hash, direction in proof:
            if direction == 'left':
                combined = sibling_hash + current_hash
            else:
                combined = current_hash + sibling_hash
            current_hash = sha256(combined)

        return current_hash == root

# Demonstration
transactions = [
    "Alice sends 1 BTC to Bob",
    "Bob sends 0.5 BTC to Charlie",
    "Charlie sends 0.3 BTC to Dave",
    "Dave sends 0.1 BTC to Eve",
    "Eve sends 2 BTC to Frank",
    "Frank sends 0.7 BTC to Grace",
    "Grace sends 1.2 BTC to Henry",
    "Henry sends 0.4 BTC to Ivy"
]

tree = MerkleTree(transactions)
print(f"Merkle root: {tree.root}")
print(f"Number of transactions: {len(transactions)}")
print(f"Tree levels: {len(tree.tree)}")

# Generate and verify a proof for the 3rd transaction
proof = tree.get_proof(2)
print(f"\nProof for transaction '{transactions[2]}':")
for i, (h, d) in enumerate(proof):
    print(f"  Level {i}: {h[:16]}... ({d})")

is_valid = MerkleTree.verify_proof(transactions[2], proof, tree.root)
print(f"\nProof valid: {is_valid}")

# Try to verify a fake transaction
is_valid_fake = MerkleTree.verify_proof(
    "Charlie sends 999 BTC to Charlie",
    proof, tree.root
)
print(f"Fake transaction proof valid: {is_valid_fake}")

This implementation demonstrates every operation a blockchain node needs: constructing the tree from a set of transactions, generating a proof for any specific transaction, and verifying a proof against the Merkle root.

2.4.5 Handling Odd Numbers of Leaves

What happens when the number of transactions is odd? Bitcoin's approach is to duplicate the last leaf hash and use it as the missing sibling. Our implementation above handles this case:

# Odd number of transactions
odd_transactions = ["TX_A", "TX_B", "TX_C", "TX_D", "TX_E"]
odd_tree = MerkleTree(odd_transactions)
print(f"5 transactions -> Merkle root: {odd_tree.root[:16]}...")
print(f"Tree structure: {[len(level) for level in odd_tree.tree]}")
# Output: [5, 3, 2, 1] — note how 5 becomes 3 (E is paired with itself)

⚠️ Common Misconception: "Duplicating the last leaf is a hack." It is a design choice with trade-offs. Some implementations (including Ethereum's Merkle Patricia Trie) handle this differently. Bitcoin's approach is simple and well-understood, but it does create a subtle vulnerability where a tree with [A, B, C, C] has the same root as a tree with [A, B, C]. Bitcoin addresses this at the protocol level.

2.4.6 Why Bitcoin and Ethereum Use Merkle Trees

The Merkle root goes into the block header. This has three powerful consequences:

Compact block summaries: The block header is 80 bytes (in Bitcoin), regardless of how many transactions the block contains. The Merkle root is the commitment to all transactions.
Efficient verification (SPV): A light client can verify that a transaction is in a block by requesting a Merkle proof from a full node. The proof is small (a few hundred bytes) and can be verified using only the block header.
Tamper evidence: If any transaction in the block is modified, the Merkle root changes, which changes the block header hash, which breaks the chain link to the next block. A single bit change anywhere propagates to the root.

🔗 Connection: In Chapter 4, when we study the Bitcoin protocol, we will see how SPV clients use Merkle proofs to verify transactions with minimal bandwidth and storage. In Chapter 7, we will see how Ethereum extends the Merkle tree concept into Merkle Patricia Tries for state verification.

2.5 Putting It All Together: How These Primitives Combine in a Blockchain

We have now built four cryptographic tools. Let us see how they work together in a blockchain system:

2.5.1 The Anatomy of a Block (Cryptographic View)

Block Header:
+----------------------------------+
| Previous Block Hash (SHA-256)    |  <-- Hash function (chain linking)
| Merkle Root (SHA-256)            |  <-- Merkle tree (transaction commitment)
| Timestamp                        |
| Difficulty Target                |
| Nonce                            |  <-- Hash function (proof-of-work)
+----------------------------------+

Block Body:
+----------------------------------+
| Transaction 1                    |
|   - Signed by sender (ECDSA)    |  <-- Digital signature (authorization)
| Transaction 2                    |
|   - Signed by sender (ECDSA)    |
| ...                              |
| Transaction n                    |
|   - Signed by sender (ECDSA)    |
+----------------------------------+

Every primitive serves a specific role:

SHA-256 hashing creates the chain: each block's header includes the hash of the previous block's header. Changing any block invalidates all subsequent blocks.
ECDSA digital signatures authorize transactions: each transaction includes a signature proving the sender owns the coins being spent. Nodes verify these signatures before accepting transactions.
Merkle trees summarize transactions: all transactions in a block are organized into a Merkle tree, and the root hash is stored in the block header. This enables efficient verification.
SHA-256 again secures the chain through proof-of-work: miners search for a nonce that, combined with the rest of the block header, produces a hash below the difficulty target.

2.5.2 The Security Guarantees: What Exactly Is Protected

Before tracing a transaction, let us be precise about what each primitive guarantees:

Hash functions guarantee immutability. If block 500 is linked to block 499 by including block 499's hash in block 500's header, then any modification to block 499 changes its hash, which invalidates block 500's reference, which changes block 500's hash, which invalidates block 501's reference, and so on. To tamper with block 499, an attacker must recompute every subsequent block — and do it faster than the rest of the network adds new blocks. This is the core security guarantee of proof-of-work, and it relies entirely on the preimage resistance of SHA-256.

Digital signatures guarantee authorization. A transaction spending coins from address X is valid only if it carries a signature that verifies against the public key corresponding to address X. No private key, no signature. No signature, no spending. This guarantee holds even if every other node on the network is malicious — the math does not care about trust.

Merkle trees guarantee verifiability. A light client can confirm that a specific transaction is included in a specific block by verifying a Merkle proof against the block header's Merkle root. This proof is compact (O(log n) hashes) and self-contained — the client does not need to trust the full node providing the proof, because the proof is cryptographically verifiable.

Together, these three guarantees create a system where: nobody can alter history (hashing), nobody can spend coins they do not own (signatures), and anybody can verify any transaction efficiently (Merkle trees). No single guarantee is sufficient by itself. All three are necessary.

2.5.3 A Complete Flow: Sending a Transaction

Let us trace a transaction from creation to confirmation:

Alice decides to send 1 BTC to Bob. She constructs a transaction referencing her unspent output, Bob's address, and the amount.
Alice signs the transaction using her private key and ECDSA. The signature is attached to the transaction.
Alice broadcasts the transaction to the network. Nodes receive it and verify the signature using Alice's public key. If valid, the transaction enters the mempool (the queue of unconfirmed transactions).
A miner selects transactions from the mempool and organizes them into a Merkle tree. The Merkle root goes into the block header.
The miner performs proof-of-work, repeatedly changing the nonce and hashing the block header until the hash is below the difficulty target.
The miner broadcasts the new block. Other nodes verify: (a) each transaction signature is valid, (b) the Merkle root matches the transactions, (c) the block hash meets the difficulty target, (d) the previous block hash matches.
The block is accepted and Alice's transaction is confirmed. Bob can now reference this output in future transactions.

Every step involves cryptographic primitives we implemented in this chapter. Nothing is taken on faith.

2.5.4 Preview: Building the Full Blockchain in Chapter 6

In Chapter 6, we will combine all these primitives into a working blockchain implementation. The code will look like this (a preview, not the final version):

# Preview — the full implementation comes in Chapter 6
class Block:
    def __init__(self, transactions, previous_hash):
        self.transactions = transactions
        self.previous_hash = previous_hash
        self.merkle_root = MerkleTree(transactions).root  # Chapter 2
        self.nonce = 0
        self.hash = self.compute_hash()

    def compute_hash(self):
        # SHA-256 of block header — Chapter 2
        header = f"{self.previous_hash}{self.merkle_root}{self.nonce}"
        return hashlib.sha256(header.encode()).hexdigest()

class Transaction:
    def __init__(self, sender_public_key, recipient, amount, signature):
        self.sender = sender_public_key
        self.recipient = recipient
        self.amount = amount
        self.signature = signature  # ECDSA signature — Chapter 2

    def verify(self):
        # Verify ECDSA signature — Chapter 2
        pass

The cryptographic primitives from this chapter are the molecular building blocks. Chapters 3-5 add the remaining components (consensus, networking, state management), and Chapter 6 assembles the complete system.

2.6 Advanced Sidebar: Post-Quantum Cryptography and Blockchain

2.6.1 The Quantum Threat

Everything we have built in this chapter relies on the computational difficulty of two mathematical problems:

ECDLP (for digital signatures): given points P and G, find k such that P = k * G.
Preimage resistance (for hash functions): given h, find m such that SHA-256(m) = h.

In 1994, Peter Shor published a quantum algorithm that solves both the integer factorization problem (breaking RSA) and the discrete logarithm problem (breaking ECDSA) in polynomial time. A sufficiently large quantum computer running Shor's algorithm would break every digital signature on every blockchain.

📊 By the Numbers: Breaking the secp256k1 curve used by Bitcoin and Ethereum would require a quantum computer with approximately 2,500 logical qubits. As of 2025, the largest quantum computers have around 1,000-1,200 physical qubits, and each logical qubit requires thousands of physical qubits for error correction. The timeline for a cryptographically relevant quantum computer is debated, but most experts estimate 10-30 years.

2.6.2 The Hash Function Question

Shor's algorithm does not threaten hash functions directly. However, Grover's algorithm (1996) provides a quadratic speedup for brute-force search, effectively halving the security level of hash functions. SHA-256 would go from 256-bit security to 128-bit security against a quantum attacker — which is still considered sufficient. This is one reason SHA-256 was chosen over shorter hash functions.

2.6.3 Post-Quantum Signature Schemes

The cryptographic community has been developing post-quantum signature schemes that resist quantum attacks. In 2024, NIST finalized standards for three post-quantum algorithms:

ML-DSA (CRYSTALS-Dilithium): Based on lattice problems. Fast signing and verification, moderate signature sizes.
SLH-DSA (SPHINCS+): Based on hash functions. Very conservative security assumptions but larger signatures.
FN-DSA (FALCON): Based on lattice problems. Smaller signatures but more complex implementation.

Several blockchain projects are researching or implementing post-quantum transitions:

Ethereum has a long-term research track for post-quantum signatures.
QRL (Quantum Resistant Ledger) was designed from the ground up with post-quantum signatures (XMSS, a hash-based scheme).
IOTA has implemented post-quantum capabilities.

The transition will not be instant. It requires changing address formats, transaction structures, and signature verification across the entire network. But the blockchain community has time — the quantum threat is not immediate, and the solutions are maturing.

💡 Key Insight: The cryptographic agility of a blockchain — its ability to upgrade cryptographic primitives — is an underappreciated design consideration. Blockchains that hard-code specific algorithms will have a harder time transitioning than those with modular cryptographic layers. When evaluating blockchain projects, ask: "What is their post-quantum migration plan?"

2.7 A Warning: Never Roll Your Own Cryptography

This chapter has shown you the inner workings of SHA-256, ECDSA, and Merkle trees. You have implemented them in Python. You understand how they work. And now we must deliver the most important practical lesson of this chapter: never use your own implementations in production.

Cryptographic security depends not just on the mathematics being correct, but on the implementation handling every edge case, every timing vulnerability, every memory allocation, and every error condition correctly. A single bug can be catastrophic, as the Android nonce reuse vulnerability demonstrated.

Here are three categories of implementation vulnerabilities that no amount of mathematical understanding protects you from:

Timing attacks. If your signature verification takes a different amount of time depending on where a forged signature first differs from the expected value, an attacker can measure the timing and deduce the signature byte by byte. Production cryptographic libraries use constant-time comparison functions that take exactly the same amount of time regardless of the input. Our Python implementations do not.

Side-channel attacks. The power consumption, electromagnetic emissions, and even acoustic noise produced by a computer during cryptographic operations can leak information about the private key. Hardware security modules (HSMs) and hardened software libraries include countermeasures against these attacks. Our implementations include none.

Memory safety. Private keys should be stored in protected memory, zeroed after use, and never swapped to disk. Python's garbage collector makes these guarantees effectively impossible. Production systems use libraries written in C or Rust with explicit memory management.

For blockchain development, use these libraries: - libsecp256k1 (C): Bitcoin Core's ECDSA library. Highly optimized, constant-time, audited. - OpenSSL (C): The standard TLS/SSL library. SHA-256 and many other primitives. - python-ecdsa (Python): Suitable for learning and prototyping (as we use it here), but not for production wallets. - cryptography (Python): A more robust Python library that wraps OpenSSL.

The code in this chapter is for understanding. The code in your wallet should come from battle-tested, peer-reviewed, professionally audited libraries.

2.8 Chapter Summary

This chapter built the cryptographic toolkit that underlies all blockchain technology:

Hash Functions (SHA-256) - A hash function maps arbitrary input to fixed-size output (256 bits for SHA-256). - Three critical properties: preimage resistance (one-way), second preimage resistance, and collision resistance. - The avalanche effect ensures that small input changes produce large, unpredictable output changes. - In blockchain: block linking, proof-of-work, transaction IDs, and Merkle roots.

Public-Key Cryptography (Elliptic Curves) - Asymmetric cryptography uses a key pair: private key (secret) and public key (shared). - Blockchain uses the secp256k1 elliptic curve because it offers 128-bit security with small keys. - Your private key is a random 256-bit number. Your public key is derived via point multiplication. Your address is a hash of your public key. - The security rests on the Elliptic Curve Discrete Logarithm Problem (ECDLP).

Digital Signatures (ECDSA) - Signatures provide authentication, integrity, and non-repudiation. - ECDSA signs a message hash using the private key and a random nonce. - Verification uses only the public key — the private key is never revealed. - Nonce reuse is catastrophic: it allows private key extraction. - In blockchain: transaction authorization. You prove ownership without revealing secrets.

Merkle Trees - A binary tree where each leaf is a data hash and each internal node is the hash of its children. - The Merkle root summarizes all data in a single hash. - Merkle proofs enable O(log n) verification of membership. - In blockchain: efficient transaction verification, enabling lightweight clients.

The Big Picture - These four primitives combine to create the trustless, decentralized system described in Chapter 1. - Hashing creates immutable links. Signatures authorize transactions. Merkle trees enable efficient verification. - Understanding these primitives is not optional — they are the mechanism, not the metaphor.

Bridge to Chapter 3

In Chapter 3, we move from cryptographic primitives to the consensus problem: how do thousands of nodes, who do not trust each other, agree on a single version of the truth? We will study the Byzantine Generals Problem, understand why it was considered unsolvable for decades, and see how Proof of Work provides a probabilistic solution. The hash functions and digital signatures from this chapter are prerequisites — consensus mechanisms use them extensively.

Key Vocabulary Quick Reference

Term	Definition
Hash function	A function that maps input of any size to a fixed-size output
SHA-256	A 256-bit cryptographic hash function used by Bitcoin
Preimage resistance	Given a hash, it is infeasible to find the input
Collision resistance	It is infeasible to find two inputs with the same hash
Avalanche effect	Small input changes cause large, unpredictable output changes
Public-key cryptography	Encryption using a key pair (public + private)
Private key	A secret number used to sign transactions
Public key	A number derived from the private key, shared publicly
Elliptic curve	A mathematical curve used for efficient public-key cryptography
secp256k1	The specific elliptic curve used by Bitcoin and Ethereum
ECDSA	Elliptic Curve Digital Signature Algorithm
Digital signature	A mathematical proof of message authenticity
Nonce	A number used once — critical for ECDSA security
Trapdoor function	Easy to compute forward, infeasible to reverse without a secret
Merkle tree	A binary hash tree for efficient data verification
Merkle root	The single hash at the top of a Merkle tree
Merkle proof	A set of hashes proving membership in a Merkle tree
Birthday paradox	The probability of collisions is higher than intuition suggests