Learning Objectives
- Trace EVM execution of simple bytecode sequences through the stack, memory, and storage
- Explain the cost difference between memory operations, storage operations, and computation, and why SSTORE is among the most expensive opcodes
- Decode compiled Solidity bytecode back to human-readable opcodes
- Describe the ABI encoding scheme and how external calls to smart contracts are formatted
- Identify the EVM's security boundaries and explain how the sandbox model protects the network
In This Chapter
- 12.1 The Machine Under the Machine
- 12.2 EVM Architecture Overview
- 12.3 The Stack, Memory, and Storage
- 12.4 EVM Opcodes: The Instruction Set
- 12.5 Gas Costs: Why Some Operations Are Expensive
- 12.6 From Solidity to Bytecode
- 12.7 Contract Deployment
- 12.8 Contract Interaction
- 12.9 ABI Encoding
- 12.10 The EVM as a Sandbox
- 12.11 EVM Alternatives and Competitors
- 12.12 Summary and Bridge to Chapter 13
- Key Terms Glossary
Chapter 12: The Ethereum Virtual Machine: How Smart Contracts Execute
12.1 The Machine Under the Machine
When you interact with a decentralized exchange, mint an NFT, or deposit funds into a lending protocol, the experience feels like using any other web application. Buttons get clicked. Transactions confirm. Balances update. But beneath that familiar surface, something profoundly different is happening. Your transaction is not being processed by a server in a data center. It is being executed, independently, by tens of thousands of computers scattered across the planet — and every single one of them must arrive at the exact same result.
The mechanism that makes this possible is the Ethereum Virtual Machine, or EVM. A conventional virtual machine runs a program once on a single computer. The EVM is instead massively replicated: every Ethereum node runs its own copy. Every smart contract ever deployed on Ethereum, from the simplest token to the most complex DeFi protocol, is just a sequence of EVM bytecode instructions stored on-chain. When a transaction triggers a contract, every validating node independently loads that bytecode, executes it against the current state, and agrees on the outcome.
This chapter takes you inside the machine. We will examine the EVM at the level of individual opcodes — the primitive instructions that the processor executes one by one. We will trace how Solidity code compiles down to bytecode, how that bytecode runs on a stack-based architecture, how the gas metering system prices each operation, and how contracts deploy and communicate with one another. By the end, you will be able to read raw bytecode, understand why certain operations are expensive, and appreciate both the power and the deliberate limitations of the EVM's sandboxed execution model.
This is the most technical chapter so far. It is also one of the most important. Every vulnerability we will study in later chapters — reentrancy, integer overflow, storage collision, gas griefing — originates at the EVM level. Understanding the machine is not optional for anyone who wants to write, audit, or reason about smart contracts. The abstractions Solidity provides are useful, but they are leaky. The EVM is what actually runs.
💡 Why This Matters: You do not need to memorize every opcode. But you need to understand how the EVM's stack, memory, and storage work, how gas metering shapes economic incentives, and how the bytecode execution model creates both the security guarantees and the attack surfaces that define smart contract development.
12.2 EVM Architecture Overview
12.2.1 A Stack-Based Virtual Machine
The EVM is a stack-based virtual machine. This places it in the same architectural family as the Java Virtual Machine (JVM), the .NET Common Language Runtime (CLR), and Python's CPython bytecode interpreter — but with critical differences that reflect its unique requirements as a decentralized computation engine.
In a stack machine, most operations take their operands from, and push their results onto, a stack — a last-in, first-out (LIFO) data structure. There are no general-purpose registers as you would find in x86 or ARM processors. If you want to add two numbers, you push both onto the stack, execute the ADD opcode, and the result replaces the two operands at the top of the stack.
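The push/pop discipline is easiest to see in a toy interpreter. Below is a minimal Python sketch of a stack machine that understands only three real EVM opcodes: PUSH1 (0x60), ADD (0x01), and MUL (0x02). It ignores gas, memory, and everything else. The function name `run` is hypothetical, and this illustrates the stack model rather than how a client is built.

```python
# A toy stack machine that understands three real EVM opcodes.
# Gas metering, memory, and every other opcode are deliberately omitted.
WORD = 2**256  # stack items are 256-bit words; arithmetic wraps modulo 2**256

def run(code: bytes) -> list[int]:
    """Execute PUSH1/ADD/MUL bytecode and return the final stack (top is last)."""
    stack: list[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:              # PUSH1: the next byte is a literal operand
            stack.append(code[pc + 1])
            pc += 2
            continue
        if op == 0x01:              # ADD: pop two operands, push their sum
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif op == 0x02:            # MUL: pop two operands, push their product
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x} at pc={pc}")
        pc += 1
    return stack

# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL  ->  (2 + 3) * 4
print(run(bytes.fromhex("6002600301600402")))  # [20]
```

Each opcode consumes its operands from the top of the list and leaves a single result behind. No instruction ever names a register.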
Why a stack architecture instead of a register architecture? Several reasons converge:
- Simplicity of implementation. Stack machines require no register allocation algorithm. Every node implementer, whether writing a client in Go, Rust, Java, or Python, can implement the same behavior without worrying about register mapping differences.
- Determinism. The stack model eliminates an entire class of ambiguity. There is exactly one stack. Operations affect it in precisely defined ways. Two implementations of the EVM, given the same bytecode and the same initial state, must produce identical stack states after every instruction.
- Compact bytecode. Stack operations need no operand fields specifying register numbers. This keeps bytecode small, which matters when every byte of contract code is stored permanently on-chain and paid for in gas.
- Security analysis. The sequential, deterministic nature of stack operations makes formal verification and static analysis more tractable than for register-based architectures with complex instruction pipelines.
12.2.2 The 256-Bit Word Size
Every item on the EVM stack is exactly 256 bits (32 bytes) wide. This is an unusual choice — most processors use 32-bit or 64-bit words — and it was a deliberate decision by Ethereum's designers.
The primary reason is cryptographic alignment. The Keccak-256 hash function, which Ethereum uses pervasively for address derivation, Merkle trees, and ABI encoding, produces 256-bit outputs. The secp256k1 elliptic curve used for digital signatures operates on 256-bit scalars. By making the native word size 256 bits, the EVM can handle hashes, addresses (160-bit, zero-extended to 256-bit), and large integers in single stack operations without requiring multi-word arithmetic libraries.
This design choice has consequences. Arithmetic on small numbers — adding two uint8 values, for instance — still operates on full 256-bit words internally. The EVM has no native support for smaller data types at the opcode level. Solidity's uint8, uint16, and other sub-256-bit types are implemented through masking and shifting at the compiler level, which adds a small gas overhead. This is why Solidity developers sometimes use uint256 even when smaller types would suffice logically.
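The masking the compiler performs can be sketched in a few lines of Python. This simplified model assumes an unchecked (wrapping) uint8 addition. The helper name `add_uint8` is hypothetical. Solidity 0.8's default checked arithmetic would revert on overflow instead of wrapping.

```python
MASK_256 = 2**256 - 1  # every EVM word is 256 bits
MASK_8 = 0xFF          # a uint8 occupies only the lowest byte of the word

def add_uint8(a: int, b: int) -> int:
    """Model an unchecked uint8 addition on 256-bit words.

    ADD works on the full word, so the result needs an extra AND to be
    truncated back to 8 bits; that extra step is the overhead described above.
    """
    full = (a + b) & MASK_256  # ADD: wraps at 2**256, not at 2**8
    return full & MASK_8       # AND 0xff: truncate to the uint8 range

print(add_uint8(200, 100))  # 300 does not fit in 8 bits; prints 44
```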
12.2.3 Harvard Architecture: Separate Code and Data
The EVM follows a modified Harvard architecture, meaning the code being executed and the data being operated on are stored in separate address spaces. The contract's bytecode is immutable once deployed — there is no self-modifying code. The EVM reads instructions from the code segment and operates on data in three distinct regions: the stack, memory, and storage.
This separation is another security feature. A smart contract cannot rewrite its own code. It cannot jump to an arbitrary memory address and begin executing data as instructions. The program counter can only advance through the code segment, and JUMP/JUMPI instructions can only target bytes that are marked as JUMPDEST opcodes. This eliminates an entire category of exploits that plague conventional software — buffer overflow attacks that redirect execution to attacker-controlled data simply do not work in the EVM.
12.2.4 The Execution Context
When the EVM begins executing a transaction that calls a smart contract, it creates an execution context (sometimes called a "machine state") containing:
| Component | Description |
|---|---|
| Program Counter (PC) | Points to the current opcode in the bytecode |
| Stack | Up to 1,024 items, each 256 bits |
| Memory | Byte-addressable, expandable, initialized to zero |
| Storage | 256-bit key to 256-bit value mapping, persistent |
| Gas Available | Remaining gas for this execution |
| Calldata | The input data sent with this call |
| Call Value | The amount of ETH sent with the call |
| Caller | The address that initiated this call |
| Code | The bytecode being executed |
Each of these components plays a specific role in execution, and their interactions define what smart contracts can and cannot do.
12.3 The Stack, Memory, and Storage
The EVM's three data locations — stack, memory, and storage — form a hierarchy that is fundamental to understanding both contract behavior and gas costs. They differ in lifetime, capacity, and price, and choosing the wrong one is a common source of bugs and inefficiency.
12.3.1 The Stack
The stack is the EVM's primary workspace. It holds up to 1,024 items, each 256 bits wide. Most opcodes operate exclusively on the stack: they pop operands from the top, perform computation, and push results back.
The stack is free in the sense that pushing and popping values costs only the gas of the opcode performing the operation. An ADD instruction costs 3 gas regardless of the values being added — the stack manipulation is included in the opcode's price.
However, the stack has important limitations:
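- Maximum depth of 1,024. If execution would push a 1,025th item, the current call frame halts with an exceptional error, reverting its changes and consuming all of its remaining gas. This matters for deep recursion, because internal function calls keep return addresses, arguments, and local variables on the stack.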
- Only the top 16 items are directly accessible. The DUP1 through DUP16 opcodes can duplicate items at positions 1-16 from the top. SWAP1 through SWAP16 can swap the top item with items at positions 2-17. Reaching deeper items requires shuffling.
- The stack does not persist. Once execution ends, the stack vanishes. It exists only during the current call frame.
The 16-item access limit is one of the most common frustrations Solidity developers encounter. When a function has many local variables, the compiler must juggle them on a stack where only 16 positions are reachable. The dreaded "Stack too deep" error occurs when the compiler cannot arrange variables to fit within this constraint.
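A small Python model of DUPn and SWAPn shows why depth is the binding constraint. This is a sketch, not compiler output. The names `dup` and `swap` are hypothetical, and the stack is a list whose last element is the top. No instruction can name an item deeper than position 16 (position 17 for SWAP).

```python
def dup(stack: list[int], n: int) -> None:
    """DUPn: copy the n-th item from the top (1-based) onto the top."""
    if not 1 <= n <= 16:
        raise ValueError("no DUP opcode exists for this depth")
    if n > len(stack):
        raise IndexError("stack underflow")
    stack.append(stack[-n])

def swap(stack: list[int], n: int) -> None:
    """SWAPn: exchange the top item with the item at position n + 1."""
    if not 1 <= n <= 16:
        raise ValueError("no SWAP opcode exists for this depth")
    if n + 1 > len(stack):
        raise IndexError("stack underflow")
    stack[-1], stack[-(n + 1)] = stack[-(n + 1)], stack[-1]

# Seventeen local values: the deepest one (position 17) cannot be reached by DUP.
locals_on_stack = list(range(17))   # 0 is deepest, 16 is on top
dup(locals_on_stack, 16)            # copies the value 1, which sits 16th from the top
print(locals_on_stack[-1])          # 1
```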
12.3.2 Memory
Memory is a byte-addressable, linear, expandable array initialized to all zeros at the start of each call. It is used for intermediate data that does not fit on the stack — function arguments being prepared for an external call, return data, hash inputs, and dynamic types like strings and arrays.
Memory is volatile: like the stack, it disappears when the current call ends. If Contract A calls Contract B, Contract B gets its own fresh memory space. When Contract B returns, its memory is discarded.
The gas cost of memory follows a critical rule: memory expansion cost is quadratic. Specifically, the total cost attributed to a memory of n words (32-byte chunks) is:
memory_cost = 3 * n + floor(n^2 / 512)
Each expansion is charged the difference between this cost at the new size and at the old size. For small amounts of memory, the cost is approximately linear (3 gas per word). As memory grows, the quadratic term dominates. Expanding memory to 1 KB costs roughly 100 gas (98, to be exact). Expanding to 1 MB costs about 2.2 million gas. Expanding to 10 MB would cost about 210 million gas, far more than any block's gas limit allows.
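The figures above can be checked directly. This minimal Python sketch assumes the formula quoted above (3 gas per word plus a quadratic term). The function names are hypothetical. Because an expansion is charged only the difference between the cost after and before growing, `expansion_cost` subtracts the two totals.

```python
def words(num_bytes: int) -> int:
    """Number of 32-byte words needed to hold num_bytes (rounded up)."""
    return (num_bytes + 31) // 32

def memory_cost(num_bytes: int) -> int:
    """Total gas attributed to a memory of num_bytes: 3n + floor(n^2 / 512)."""
    n = words(num_bytes)
    return 3 * n + n * n // 512

def expansion_cost(old_bytes: int, new_bytes: int) -> int:
    """Gas charged when memory grows from old_bytes to new_bytes."""
    return max(0, memory_cost(new_bytes) - memory_cost(old_bytes))

for label, size in [("1 KB", 1024), ("1 MB", 1024**2), ("10 MB", 10 * 1024**2)]:
    print(f"{label:>5}: {memory_cost(size):,} gas")
#  1 KB: 98 gas
#  1 MB: 2,195,456 gas
# 10 MB: 210,698,240 gas
```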
This quadratic pricing is deliberate. It allows contracts to use small amounts of memory cheaply while making it economically impossible to consume the gigabytes of RAM that could slow down or crash nodes. It is a defense against denial-of-service attacks masquerading as legitimate transactions.
Key memory opcodes:
| Opcode | Gas | Description |
|---|---|---|
| MLOAD | 3* | Load 32 bytes from memory at a given offset |
| MSTORE | 3* | Store 32 bytes to memory at a given offset |
| MSTORE8 | 3* | Store 1 byte to memory |
| MSIZE | 2 | Return the size of active memory in bytes |
*Plus any memory expansion cost if accessing beyond current memory size.
12.3.3 Storage
Storage is the EVM's persistent data layer. It is a mapping from 256-bit keys to 256-bit values, and it is the only data location that survives after a transaction completes. When you set a variable in a Solidity contract and that variable retains its value between transactions, it lives in storage.
Storage is also, by far, the most expensive data location:
| Operation | Gas Cost | Context |
|---|---|---|
| SSTORE (new slot, zero to non-zero) | 20,000 | Writing a non-zero value to an empty slot (plus 2,100 if the slot is cold) |
| SSTORE (update, non-zero to non-zero) | 5,000 | Modifying an existing value (includes the 2,100 cold surcharge; 2,900 if the slot is already warm) |
| SSTORE (clear, non-zero to zero) | 5,000 (4,800 refunded) | Clearing a slot (net ~200 after refund, subject to the refund cap) |
| SLOAD (cold) | 2,100 | First read of a slot in a transaction |
| SLOAD (warm) | 100 | Subsequent reads of the same slot |
Compare these to ADD (3 gas) or MLOAD (3 gas). Writing a single new storage slot costs nearly 7,000 times more than adding two numbers. This is not arbitrary — it reflects the real cost to the network:
- Storage is permanent. Every byte written to storage must be maintained by every full node, forever (or until explicitly deleted). A single SSTORE operation adds data that tens of thousands of nodes must store on disk indefinitely.
- Storage requires disk I/O. The stack and memory live in RAM during execution. Storage requires reading from and writing to a persistent database (typically LevelDB or similar). Disk operations are orders of magnitude slower than memory operations.
- Storage affects the state trie. Every storage modification updates the contract's storage trie. That in turn updates the account state trie, which in turn updates the world state root in the block header. This cascade of Merkle hash recomputations is computationally expensive.
The cold/warm distinction, introduced in EIP-2929 (Berlin upgrade, April 2021), reflects the reality that the first access to a storage slot requires a disk read, while subsequent accesses within the same transaction can be served from a cache. Before this change, SLOAD always cost 800 gas — which was too cheap for the first access and too expensive for subsequent ones.
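In effect, the cold/warm rule is a per-transaction set of already-touched slots. Below is a minimal Python sketch that assumes only the SLOAD prices from the table above. The class name `AccessTracker` and the contract label are hypothetical. Real clients also track accessed addresses and roll the set back when a call frame reverts.

```python
COLD_SLOAD_COST = 2100  # first access to a slot in this transaction (EIP-2929)
WARM_ACCESS_COST = 100  # any later access to the same slot

class AccessTracker:
    """Per-transaction record of which (contract, slot) pairs are warm."""

    def __init__(self) -> None:
        self.accessed: set[tuple[str, int]] = set()

    def sload_cost(self, contract: str, slot: int) -> int:
        key = (contract, slot)
        if key in self.accessed:
            return WARM_ACCESS_COST
        self.accessed.add(key)  # the first touch warms the slot for the rest of the tx
        return COLD_SLOAD_COST

tracker = AccessTracker()
costs = [tracker.sload_cost("0xToken", 0) for _ in range(3)]  # "0xToken" is a placeholder
print(costs, sum(costs))  # [2100, 100, 100] 2300
```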
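📊 By the Numbers: A typical ERC-20 token transfer reads and then modifies two storage slots (sender balance and receiver balance) and emits one event. The cold surcharge is paid only once per slot. Each read-then-write therefore costs about 5,000 gas: a 2,100-gas cold SLOAD followed by a 2,900-gas warm SSTORE. The storage operations alone consume roughly 10,000 gas, about a fifth of a standard token transfer's ~52,000 total gas cost. If the receiver has never held the token, its slot goes from zero to non-zero, and that single write costs 20,000 gas instead.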
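Under the post-Berlin rules, whichever access touches a slot first pays the cold surcharge. An SSTORE that follows an SLOAD of the same slot therefore pays only the warm price: 2,900 gas to modify a non-zero value, or 20,000 to fill an empty slot. The sketch below encodes that simplified model for a slot not yet written in the current transaction. The function name and the balances are hypothetical. Refunds and the full EIP-2200 rules for slots written more than once are omitted.

```python
COLD_SLOAD_COST = 2100
WARM_ACCESS_COST = 100
SSTORE_SET_GAS = 20_000   # zero -> non-zero
SSTORE_RESET_GAS = 2_900  # non-zero -> different value, slot already warm (5,000 - 2,100)

def read_then_write_cost(old: int, new: int) -> int:
    """Gas for a cold SLOAD followed by an SSTORE to the same, not-yet-written slot."""
    cost = COLD_SLOAD_COST            # the SLOAD pays the cold surcharge and warms the slot
    if new == old:
        cost += WARM_ACCESS_COST      # no-op write
    elif old == 0:
        cost += SSTORE_SET_GAS        # creating a new non-zero entry
    else:
        cost += SSTORE_RESET_GAS      # modifying (or clearing) an existing value
    return cost

# Hypothetical transfer of 40 tokens: sender 100 -> 60, receiver 50 -> 90.
existing_receiver = read_then_write_cost(100, 60) + read_then_write_cost(50, 90)
# Same transfer to an address that has never held the token: receiver 0 -> 40.
new_receiver = read_then_write_cost(100, 60) + read_then_write_cost(0, 40)
print(existing_receiver, new_receiver)  # 10000 27100
```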
12.3.4 A Comparison
| Property | Stack | Memory | Storage |
|---|---|---|---|
| Persistence | Call frame only | Call frame only | Permanent |
| Capacity | 1,024 items | ~limited by gas | 2^256 slots (theoretical) |
| Access pattern | Top 16 items | Random access (by offset) | Random access (by key) |
| Gas cost | Included in opcodes | 3 per word + quadratic expansion | 100–22,100 per operation (warm read to cold new-slot write) |
| Typical use | Computation | Intermediate data, ABI encoding | State variables |
Understanding this hierarchy is not academic: it is the single most important factor in gas optimization. Developers who minimize storage writes and keep intermediate values on the stack or in memory produce dramatically cheaper contracts.
12.4 EVM Opcodes: The Instruction Set
The EVM's instruction set consists of approximately 140 opcodes, each identified by a single byte (values 0x00 through 0xFF, with many slots unused or reserved). This section categorizes the most important opcodes and explains their roles.
12.4.1 Arithmetic and Comparison
These opcodes perform integer arithmetic on 256-bit unsigned integers. All arithmetic is modular: results wrap around on overflow, and there is no hardware exception to catch it. This is why OpenZeppelin's SafeMath library and, from Solidity 0.8.0 onward, the compiler's built-in overflow checks were critical.
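The difference between raw wrapping arithmetic and a checked addition can be modeled in Python. This is a sketch. The names `evm_add`, `evm_sub`, and `checked_add` are hypothetical. `checked_add` mirrors the idea behind SafeMath and Solidity 0.8's checks (revert when the true sum does not fit), not the exact bytecode the compiler emits.

```python
UINT256_MAX = 2**256 - 1

def evm_add(a: int, b: int) -> int:
    """ADD: the result silently wraps modulo 2**256."""
    return (a + b) % 2**256

def evm_sub(a: int, b: int) -> int:
    """SUB: an underflow wraps around to a huge number."""
    return (a - b) % 2**256

def checked_add(a: int, b: int) -> int:
    """Raise instead of wrapping, as checked arithmetic reverts."""
    result = evm_add(a, b)
    if result < a:  # a wrapped sum is always smaller than either operand
        raise OverflowError("arithmetic overflow: transaction would revert")
    return result

print(evm_add(UINT256_MAX, 1))          # 0
print(evm_sub(0, 1) == UINT256_MAX)     # True
```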
| Opcode | Hex | Gas | Description |
|---|---|---|---|
| ADD | 0x01 | 3 | Addition |
| MUL | 0x02 | 5 | Multiplication |
| SUB | 0x03 | 3 | Subtraction |
| DIV | 0x04 | 5 | Integer division |
| MOD | 0x06 | 5 | Modulo |
| ADDMOD | 0x08 | 8 | (a + b) mod c in one step |
| MULMOD | 0x09 | 8 | (a * b) mod c in one step |
| EXP | 0x0A | 10* | Exponentiation (*plus 50 per byte of exponent) |
| LT | 0x10 | 3 | Less than |
| GT | 0x11 | 3 | Greater than |
| EQ | 0x14 | 3 | Equality |
| ISZERO | 0x15 | 3 | Check if top of stack is zero |
Notice the gas costs. Basic arithmetic is cheap — 3 to 5 gas per operation. EXP is notably more expensive because its cost scales with the size of the exponent, reflecting the actual computation required for large-exponent modular exponentiation.
ADDMOD and MULMOD exist specifically for cryptographic operations, where computing (a * b) mod p on 256-bit numbers risks intermediate overflow. These opcodes compute the intermediate sum or product at full precision before reducing it. As a result, the contract never has to build 512-bit arithmetic out of 256-bit words.
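A short Python comparison shows why the dedicated opcodes matter. Computing the same expression with MUL followed by MOD loses the high bits to wraparound first. The sketch assumes the EVM convention that a zero modulus yields zero. The helper names are hypothetical.

```python
WORD = 2**256
MAX = WORD - 1

def mulmod(a: int, b: int, n: int) -> int:
    """MULMOD: (a * b) % n with a full-precision intermediate; 0 if n == 0."""
    return 0 if n == 0 else (a * b) % n

def addmod(a: int, b: int, n: int) -> int:
    """ADDMOD: (a + b) % n with a full-precision intermediate; 0 if n == 0."""
    return 0 if n == 0 else (a + b) % n

def naive_mulmod(a: int, b: int, n: int) -> int:
    """MUL followed by MOD: the product is truncated to 256 bits first."""
    return 0 if n == 0 else ((a * b) % WORD) % n

print(mulmod(MAX, 2, 10), naive_mulmod(MAX, 2, 10))  # 0 4
print(addmod(MAX, 1, 10), ((MAX + 1) % WORD) % 10)    # 6 0
```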
12.4.2 Bitwise and Shift Operations
| Opcode | Gas | Description |
|---|---|---|
| AND | 3 | Bitwise AND |
| OR | 3 | Bitwise OR |
| XOR | 3 | Bitwise XOR |
| NOT | 3 | Bitwise NOT |
| SHL | 3 | Shift left (added in Constantinople) |
| SHR | 3 | Shift right (added in Constantinople) |
| SAR | 3 | Arithmetic shift right (signed) |
The shift operations (SHL, SHR, SAR) were added in the Constantinople upgrade (EIP-145, February 2019). Before their introduction, bit shifting required combining MUL/DIV with powers of 2 and EXP, which was significantly more expensive. Their addition was one of many examples of the EVM instruction set evolving to improve efficiency.
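The three shift opcodes differ only in how they treat the sign bit and out-of-range shift amounts. Below is a minimal Python sketch of their semantics as specified in EIP-145, with hypothetical function names. The shift amount is the first argument, matching the order in which the opcodes pop it from the stack.

```python
WORD = 2**256
MASK = WORD - 1

def shl(shift: int, value: int) -> int:
    """SHL: logical left shift; bits pushed past bit 255 are discarded."""
    return 0 if shift >= 256 else (value << shift) & MASK

def shr(shift: int, value: int) -> int:
    """SHR: logical right shift; zeros fill in from the left."""
    return 0 if shift >= 256 else value >> shift

def sar(shift: int, value: int) -> int:
    """SAR: arithmetic right shift; the sign bit (bit 255) fills in from the left."""
    signed = value - WORD if value >> 255 else value   # reinterpret as two's complement
    return (signed >> min(shift, 256)) & MASK          # Python's >> on negatives is arithmetic

minus_four = (-4) & MASK                   # two's-complement encoding of -4
print(shl(1, 21), shr(1, 42))              # 42 21
print(sar(1, minus_four) == (-2) & MASK)   # True: -4 >> 1 is -2
print(shr(1, minus_four) == 2**255 - 2)    # True: a logical shift ignores the sign
```

Before Constantinople, `shl(k, x)` had to be written as a MUL by 2^k, with the power computed by EXP. That is the same value at a much higher gas cost.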
12.4.3 Cryptographic Operations
| Opcode | Hex | Gas | Description |
|---|---|---|---|
| SHA3 | 0x20 | 30 + 6/word | Keccak-256 hash |
The SHA3 opcode is misnamed — it actually computes Keccak-256, not the NIST SHA-3 standard (which uses different padding). The naming reflects the fact that Ethereum was designed before NIST finalized the SHA-3 standard. The distinction is important: Keccak-256 and SHA-3-256 produce different outputs for the same input.
The gas cost scales with the size of the input being hashed: 30 gas base plus 6 gas for each 32-byte word. Hashing 32 bytes costs 36 gas. Hashing 1 KB costs about 222 gas.
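The padding difference is easy to demonstrate. Python's standard library includes NIST SHA3-256 (`hashlib.sha3_256`) but not Ethereum's Keccak-256. The sketch below is a compact, unoptimized Keccak-f[1600] sponge written for illustration. The only thing that changes between the two hashes is the domain-padding byte: 0x01 for original Keccak, 0x06 for FIPS 202 SHA-3. All function names are hypothetical, and an audited library is the right tool for anything real. The last line computes the selectors used by the dispatcher in Section 12.6.3.

```python
import hashlib

# Round constants for the 24 rounds of Keccak-f[1600] (iota step).
ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & (2**64 - 1)

def keccak_f1600(state: bytearray) -> bytearray:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    for rc in ROUND_CONSTANTS:
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: break symmetry with the round constant
        lanes[0][0] ^= rc
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def _sponge_256(data: bytes, pad_byte: int) -> bytes:
    """256-bit sponge: rate 136 bytes, capacity 64 bytes, 32-byte output."""
    rate = 136
    state = bytearray(200)
    full = len(data) - len(data) % rate
    for start in range(0, full, rate):          # absorb every full block
        for i in range(rate):
            state[i] ^= data[start + i]
        state = keccak_f1600(state)
    tail = data[full:]                          # final partial block (may be empty)
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= pad_byte   # domain/padding byte: the only Keccak vs SHA-3 difference
    state[rate - 1] ^= 0x80        # closing bit of the pad10*1 rule
    return bytes(keccak_f1600(state)[:32])

def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256, what the SHA3 opcode computes."""
    return _sponge_256(data, 0x01)

def nist_sha3_256(data: bytes) -> bytes:
    """FIPS 202 SHA3-256: same permutation, different padding byte."""
    return _sponge_256(data, 0x06)

def selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), as used by the function dispatcher."""
    return keccak256(signature.encode()).hex()[:8]

assert nist_sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()  # sanity check
print(keccak256(b"").hex())  # c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
print(hashlib.sha3_256(b"").hexdigest() == keccak256(b"").hex())  # False
print(selector("set(uint256)"), selector("get()"))  # 60fe47b1 6d4ce63c
```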
12.4.4 Environment and Block Information
These opcodes allow contracts to inspect their execution environment:
| Opcode | Gas | Description |
|---|---|---|
| ADDRESS | 2 | Address of the currently executing contract |
| BALANCE | 2,600 (cold) / 100 (warm) | ETH balance of an address |
| ORIGIN | 2 | Address of the original transaction sender (tx.origin) |
| CALLER | 2 | Address of the immediate caller (msg.sender) |
| CALLVALUE | 2 | ETH sent with this call (msg.value) |
| CALLDATALOAD | 3 | Load 32 bytes of input data |
| CALLDATASIZE | 2 | Size of input data in bytes |
| GASPRICE | 2 | Gas price of the transaction |
| BLOCKHASH | 20 | Hash of a recent block (last 256 blocks only) |
| TIMESTAMP | 2 | Current block timestamp |
| NUMBER | 2 | Current block number |
| CHAINID | 2 | Chain ID (EIP-1344, Istanbul) |
Two opcodes deserve special attention:
ORIGIN vs. CALLER. ORIGIN (Solidity's tx.origin) returns the externally owned account (EOA) that initiated the entire transaction chain. CALLER (Solidity's msg.sender) returns the address that directly called the current contract. In a chain A -> B -> C, contract C sees CALLER = B and ORIGIN = A. Using ORIGIN for authentication is a well-known vulnerability. Suppose a malicious contract tricks a user into calling it. It can then call the target contract, which still sees the user's address as tx.origin and grants access.
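The A -> B -> C example can be modeled in a few lines of Python to show why an ORIGIN-based check fails. Everything here is hypothetical: the addresses, the `Frame` type, and the two guard functions. The model tracks only which address each opcode would report. It does not execute real contracts.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    origin: str   # ORIGIN / tx.origin: the EOA that signed the transaction
    caller: str   # CALLER / msg.sender: whoever made this particular call
    address: str  # ADDRESS: the contract currently executing

def call_chain(eoa: str, *contracts: str) -> list[Frame]:
    """Return the frame each contract sees in the chain eoa -> c1 -> c2 -> ..."""
    frames, caller = [], eoa
    for contract in contracts:
        frames.append(Frame(origin=eoa, caller=caller, address=contract))
        caller = contract  # the next contract is called by this one
    return frames

OWNER = "0xAlice"

def origin_guard(frame: Frame) -> bool:  # vulnerable: require(tx.origin == owner)
    return frame.origin == OWNER

def sender_guard(frame: Frame) -> bool:  # safer: require(msg.sender == owner)
    return frame.caller == OWNER

# Alice is lured into calling a malicious contract, which then calls her wallet.
wallet_frame = call_chain(OWNER, "0xMalicious", "0xWallet")[-1]
print(wallet_frame.caller, wallet_frame.origin)                # 0xMalicious 0xAlice
print(origin_guard(wallet_frame), sender_guard(wallet_frame))  # True False
```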
BLOCKHASH limitations. BLOCKHASH can only access the hashes of the most recent 256 blocks. For older blocks, and for the current block, it returns zero. It is also unsuitable as a source of randomness for two reasons. Past block hashes are public by the time a transaction executes. And whoever produces a block (a miner before the Merge, a validator after it) can influence its hash. The PREVRANDAO opcode (introduced in the Merge, replacing DIFFICULTY) provides randomness from the beacon chain's RANDAO, though it is still subject to single-slot lookahead by validators.
12.4.5 Stack Manipulation
| Opcode | Gas | Description |
|---|---|---|
| POP | 2 | Remove top item |
| PUSH0 | 2 | Push the constant 0 (EIP-3855, Shanghai) |
| PUSH1-PUSH32 | 3 | Push 1-32 bytes onto stack |
| DUP1-DUP16 | 3 | Duplicate item at position 1-16 |
| SWAP1-SWAP16 | 3 | Swap top item with position 2-17 |
The PUSH opcodes are the only way to introduce literal values into the EVM. PUSH1 pushes a single byte (0x00-0xFF). PUSH32 pushes a full 32-byte (256-bit) value. When you see a number in a Solidity contract — like uint256 x = 42 — the compiled bytecode contains a PUSH1 0x2A (42 in hex) instruction.
12.4.6 Memory, Storage, and Flow Control
| Opcode | Gas | Description |
|---|---|---|
| MLOAD | 3* | Load from memory |
| MSTORE | 3* | Store to memory |
| SLOAD | 2,100 (cold) / 100 (warm) | Load from storage |
| SSTORE | 20,000 / 5,000 / various | Store to storage |
| JUMP | 8 | Unconditional jump |
| JUMPI | 10 | Conditional jump |
| JUMPDEST | 1 | Mark valid jump destination |
| PC | 2 | Program counter value |
JUMP and JUMPI can only target offsets that hold a JUMPDEST opcode. A 0x5B byte that merely appears inside the data of a PUSH instruction does not count. This prevents jumping into the middle of a PUSH, which would reinterpret data bytes as opcodes, and it makes control flow analysis possible.
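Clients typically enforce this rule by scanning the code once, skipping over PUSH data, and recording which offsets hold a genuine JUMPDEST. Below is a minimal sketch of that analysis. The function name is hypothetical. PUSH1 through PUSH32 are opcodes 0x60 to 0x7F, and each is followed by 1 to 32 data bytes.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F

def valid_jump_destinations(code: bytes) -> set[int]:
    """Offsets of real JUMPDEST opcodes, ignoring 0x5b bytes inside PUSH data."""
    valid: set[int] = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            valid.add(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the immediate data bytes
        pc += 1
    return valid

# PUSH1 0x5b (data, not an opcode), JUMPDEST at 2, PUSH2 0x5b5b (data), JUMPDEST at 6
code = bytes.fromhex("605b5b615b5b5b")
print(sorted(valid_jump_destinations(code)))  # [2, 6]
```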
12.4.7 System Operations
| Opcode | Gas | Description |
|---|---|---|
| CREATE | 32,000 | Deploy a new contract |
| CREATE2 | 32,000 | Deploy with deterministic address |
| CALL | 2,600 (cold) + value transfer costs | Call another contract |
| STATICCALL | 2,600 (cold) | Read-only call (no state changes allowed) |
| DELEGATECALL | 2,600 (cold) | Call using caller's storage context |
| RETURN | 0 | Return data and stop execution |
| REVERT | 0 | Revert changes and return data |
| SELFDESTRUCT | 5,000* | Destroy contract (*deprecated, behavior changed post-Dencun) |
| LOG0-LOG4 | 375+ | Emit event logs |
These system operations are the most powerful opcodes in the EVM. CALL, DELEGATECALL, and STATICCALL enable the composability that defines DeFi. CREATE and CREATE2 allow contracts to deploy other contracts. We examine several of these in detail in subsequent sections.
⚠️ SELFDESTRUCT Deprecation: EIP-6780, implemented in the Dencun upgrade (March 2024), changed SELFDESTRUCT so that it only destroys the contract if called in the same transaction that created it. In all other cases, it sends the contract's ETH balance but does not remove the code or storage. This change was necessary for Verkle tree migration. Contracts relying on SELFDESTRUCT for cleanup should be redesigned.
12.5 Gas Costs: Why Some Operations Are Expensive
Gas is not just a fee — it is a pricing signal that reflects the real computational, storage, and bandwidth costs that each operation imposes on every node in the network. Understanding why specific gas costs were chosen reveals the economic architecture of the EVM.
12.5.1 The Gas Cost Hierarchy
Arranging common operations by gas cost reveals a clear hierarchy:
| Operation | Gas | Why |
|---|---|---|
| ADD, SUB, LT, GT | 3 | Pure CPU computation on values already on the stack |
| MUL, DIV, MOD | 5 | Slightly more complex 256-bit arithmetic |
| SHA3 (32 bytes) | 36 | Cryptographic hash computation |
| SLOAD (warm) | 100 | Cache hit — data already in memory from earlier access |
| BALANCE (warm) | 100 | Same — cached account data |
| SLOAD (cold) | 2,100 | Disk read — traversing the storage trie on disk |
| CALL (cold) | 2,600 | Disk read for target account + execution overhead |
| SSTORE (update) | 5,000 | Disk write — modifying a trie node + state root update |
| SSTORE (new slot) | 20,000 | Disk write — creating a new trie entry |
| CREATE | 32,000 | Full contract deployment — code storage + address derivation |
The pattern is unmistakable: operations that touch disk are expensive, operations that stay in CPU/RAM are cheap. This is not coincidental. It mirrors the actual cost hierarchy of computer hardware:
- CPU operation: nanoseconds
- RAM access: ~100 nanoseconds
- SSD read: ~100 microseconds (1,000x slower than RAM)
- SSD write: ~1 millisecond (10,000x slower than RAM)
The EVM's gas costs approximate this hardware reality. And because storage writes must be propagated to every node and stored permanently, the multiplier is even larger than raw hardware costs suggest.
12.5.2 Why SSTORE Is 20,000 Gas
The cost of writing to a new storage slot — 20,000 gas — deserves detailed analysis because it is the single most impactful cost in smart contract economics.
When a contract executes SSTORE to write a non-zero value to a previously zero slot:
- The storage trie must be updated. Each contract has its own storage trie (a modified Merkle Patricia trie). Inserting a new key requires creating new trie nodes and recomputing hashes up to the root.
- The account state must be updated. The new storage root hash replaces the old one in the contract's account entry, which itself is a node in the world state trie.
- The world state root must be recomputed. The block header contains the state root. Every state change triggers a cascade of Merkle hash recomputations.
- Every full node must perform this work. There is no delegation or sharding (yet) for state storage. Every full node independently computes and stores the update.
- The data persists indefinitely. Unlike a memory write that vanishes when execution ends, a storage write creates a permanent obligation for the entire network.
At a gas price of 30 gwei and ETH at $3,000, writing one storage slot costs roughly $1.80. This may seem expensive for storing 32 bytes — but consider that those 32 bytes are being stored permanently on tens of thousands of computers simultaneously, with cryptographic integrity guarantees. By that measure, it is extraordinarily cheap.
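The dollar figure follows from two conversions: gas times gas price gives gwei, and 10^9 gwei make one ETH. Below is a small Python sketch using the same assumed prices (30 gwei, $3,000 per ETH). The function name is hypothetical. `Decimal` avoids floating-point rounding in the result.

```python
from decimal import Decimal

GWEI_PER_ETH = Decimal(10) ** 9

def gas_to_usd(gas: int, gas_price_gwei: Decimal, eth_usd: Decimal) -> Decimal:
    """Convert a gas amount to dollars at a given gas price and ETH price, rounded to cents."""
    eth = Decimal(gas) * gas_price_gwei / GWEI_PER_ETH
    return (eth * eth_usd).quantize(Decimal("0.01"))

print(gas_to_usd(20_000, Decimal(30), Decimal(3_000)))  # 1.80
```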
12.5.3 Gas Refunds and Incentive Alignment
The EVM includes a gas refund mechanism for operations that reduce state size. Clearing a storage slot (setting a non-zero value to zero) grants a refund of 4,800 gas, reducing the net cost from 5,000 to just 200 gas.
The economic logic is clear: the network wants to incentivize state cleanup. Every cleared storage slot reduces the burden on all nodes. The refund partially reimburses users for "giving back" storage space.
However, the refund mechanism has been subject to abuse. Before EIP-3529 (London upgrade, August 2021), refunds were larger and could offset up to 50% of a transaction's total gas consumption. Projects like GasToken exploited this by writing to storage when gas was cheap and clearing storage when gas was expensive, effectively arbitraging gas price fluctuations. EIP-3529 reduced refunds and capped them at 20% of transaction gas, closing this exploit.
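The effect of the cap is easy to compute. The refund actually paid out is the smaller of the accumulated refund counter and a fraction of the gas the transaction used. Below is a minimal sketch assuming the two caps described above: one half before London, one fifth after EIP-3529. The function names and sample numbers are hypothetical.

```python
def effective_refund(gas_used: int, refund_counter: int, london: bool = True) -> int:
    """Refund actually credited at the end of a transaction."""
    cap = gas_used // 5 if london else gas_used // 2
    return min(refund_counter, cap)

def net_gas(gas_used: int, refund_counter: int, london: bool = True) -> int:
    """Gas the sender ultimately pays for after the capped refund."""
    return gas_used - effective_refund(gas_used, refund_counter, london)

# A hypothetical transaction that used 100,000 gas and accumulated 60,000 gas of refunds.
print(net_gas(100_000, 60_000, london=False))  # 50000
print(net_gas(100_000, 60_000, london=True))   # 80000
```

The lower cap is what broke the GasToken model. Most of the stored-up refund can no longer be redeemed in a single transaction.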
12.5.4 Access Lists and EIP-2929
The cold/warm distinction, introduced in EIP-2929, fundamentally changed how gas costs work for state access. Before this EIP, every SLOAD cost 800 gas and every external CALL cost 700 gas, regardless of whether the slot or address had been accessed before.
This flat pricing was problematic:
- First access was underpriced. The first read of a storage slot requires traversing the on-disk trie — an expensive operation priced at only 800 gas. This created DoS vectors where attackers could force nodes to perform thousands of expensive disk lookups at below-cost prices.
- Repeated access was overpriced. After the first read, the data is in memory. Charging 800 gas for a cache hit was excessive and made gas optimization difficult.
EIP-2929 fixed this by introducing the concept of "accessed" addresses and storage keys. The first access to a slot costs 2,100 gas (cold). Subsequent accesses cost 100 gas (warm). This better reflects actual node costs. Importantly, it also mitigates the class of state-access DoS attack seen in the 2016 "Shanghai" attacks, which EIP-150 had first addressed by raising the price of state-reading opcodes.
Access lists (EIP-2930) allow transactions to pre-declare which addresses and storage keys they will access, paying the cold cost upfront at a discount. This is useful for contracts that access many storage slots — prepaying via an access list can be cheaper than paying cold costs as they are encountered.
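Whether an access list pays off is a simple comparison. EIP-2930 charges 2,400 gas per listed address and 1,900 per listed storage key. Listed items then count as warm, so their first access costs 100 instead of the cold price (2,600 for an account, 2,100 for a slot). Below is a rough sketch under those assumptions, with hypothetical function names. Addresses the protocol already treats as warm, such as the transaction's sender and recipient, gain nothing from being listed.

```python
COLD_ACCOUNT_ACCESS = 2600
COLD_SLOAD = 2100
WARM_ACCESS = 100
ACCESS_LIST_ADDRESS_COST = 2400      # EIP-2930, per listed address
ACCESS_LIST_STORAGE_KEY_COST = 1900  # EIP-2930, per listed storage key

def first_access_cost(addresses: int, slots: int, use_access_list: bool) -> int:
    """Gas for the first touch of `addresses` external accounts and `slots` storage slots."""
    if use_access_list:
        upfront = addresses * ACCESS_LIST_ADDRESS_COST + slots * ACCESS_LIST_STORAGE_KEY_COST
        return upfront + (addresses + slots) * WARM_ACCESS
    return addresses * COLD_ACCOUNT_ACCESS + slots * COLD_SLOAD

# One external contract and ten of its slots, none of them already warm.
without = first_access_cost(1, 10, use_access_list=False)
with_list = first_access_cost(1, 10, use_access_list=True)
print(without, with_list, without - with_list)  # 23600 22500 1100
```

The saving works out to 100 gas per listed item. The benefit grows with the number of slots, but it is modest per slot.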
🔗 Connection to Chapter 11: The gas mechanism we discussed in Chapter 11 as Ethereum's anti-spam defense is implemented at the EVM level through these opcode costs. Every opcode's gas price is a policy decision that balances computational cost, network security, and developer experience.
12.6 From Solidity to Bytecode
12.6.1 The Compilation Pipeline
When you write a smart contract in Solidity and compile it, the output is not a single blob of bytecode. The Solidity compiler (solc) produces two distinct artifacts:
- Creation bytecode (also called constructor bytecode or init code): Executed once during deployment. Contains the constructor logic plus instructions that return the runtime bytecode, which the EVM then stores at the new contract's address.
- Runtime bytecode: The permanent bytecode stored at the contract's address. Executed every time the contract is called.
The relationship between them is subtle. The creation bytecode is a program that, when executed, returns the runtime bytecode. Think of it as an installer: you run it once, it sets up the contract, and the result of its execution — the runtime code — is what gets stored on-chain.
Let us examine this with a minimal Solidity contract:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;
contract SimpleStorage {
    uint256 private storedValue;
    constructor(uint256 _initialValue) {
        storedValue = _initialValue;
    }
    function get() public view returns (uint256) {
        return storedValue;
    }
    function set(uint256 _value) public {
        storedValue = _value;
    }
}
Compiling this contract produces creation bytecode that, at a high level, performs:
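- Execute the constructor: Read _initialValue from the ABI-encoded constructor arguments, which are appended to the end of the creation bytecode. Calldata is empty during deployment, so the compiled code copies these arguments into memory with CODECOPY. It then executes SSTORE to save the value to storage slot 0.
- Copy the runtime bytecode from the code segment into memory using CODECOPY.
- Return the runtime bytecode via RETURN, which tells the EVM "store this as the contract's code."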
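The installer idea is visible in a hand-written creation prefix. The sketch below builds a minimal init code that uses CODECOPY to copy the runtime bytes appended after it into memory, then returns them. This is a common hand-assembled pattern, not what solc emits: solc also runs the constructor and the CALLVALUE check. The function name and the sample runtime code are hypothetical.

```python
def make_init_code(runtime: bytes) -> bytes:
    """Prefix `runtime` with a 12-byte creation program that returns it."""
    if len(runtime) > 0xFFFF:
        raise ValueError("runtime too large for a PUSH2 length")
    size = len(runtime).to_bytes(2, "big")
    prefix = (
        b"\x61" + size   # PUSH2 size      stack: [size]
        + b"\x80"        # DUP1            stack: [size, size]
        + b"\x60\x0c"    # PUSH1 0x0c      code offset of the runtime (= prefix length 12)
        + b"\x60\x00"    # PUSH1 0x00      destination offset in memory
        + b"\x39"        # CODECOPY        memory[0:size] = code[12:12 + size]
        + b"\x60\x00"    # PUSH1 0x00      memory offset to return from
        + b"\xf3"        # RETURN          memory[0:size] becomes the new contract's code
    )
    assert len(prefix) == 12
    return prefix + runtime

# Runtime code: PUSH1 0x2a, PUSH1 0x00, MSTORE, PUSH1 0x20, PUSH1 0x00, RETURN (returns 42)
runtime = bytes.fromhex("602a60005260206000f3")
print(make_init_code(runtime).hex())  # 61000a80600c6000396000f3602a60005260206000f3
```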
The runtime bytecode, in turn, contains:
- A function dispatcher: Reads the first 4 bytes of calldata (the function selector), compares them against known selectors, and jumps to the appropriate function.
- Function bodies: The compiled logic for get() and set().
- Metadata hash: The Solidity compiler appends a CBOR-encoded hash of the source code metadata (compiler version, optimization settings, etc.) at the end of the bytecode.
12.6.2 Compilation in Practice
The creation bytecode for our SimpleStorage contract (compiled with solc 0.8.20, optimizer enabled at 200 runs) looks approximately like this:
6080604052348015600e575f80fd5b5060405161...
Decoding the first several bytes:
60 80 PUSH1 0x80 ; Push 128 onto the stack
60 40 PUSH1 0x40 ; Push 64
52 MSTORE ; Store 128 at memory position 64
; (this initializes the "free memory pointer")
34 CALLVALUE ; Push msg.value onto stack
80 DUP1 ; Duplicate it
15 ISZERO ; Check if value is zero
60 0e PUSH1 0x0e ; Push jump destination
57 JUMPI ; Jump if value is zero (non-payable check)
5f PUSH0 ; Push 0 (for REVERT)
80 DUP1
fd REVERT ; Revert if ETH was sent
5b JUMPDEST ; Landing pad for the jump
The very first thing the bytecode does is initialize the free memory pointer. By convention, memory position 0x40 holds the address of the next available memory byte. It starts at 0x80 because Solidity reserves the first 128 bytes:
- 0x00-0x3F: scratch space for hashing
- 0x40-0x5F: the free memory pointer itself
- 0x60-0x7F: the zero slot

This is a Solidity convention, not an EVM requirement, but it is present in virtually all Solidity-compiled contracts.
The next block checks whether ETH was sent with the deployment. Since the constructor is not marked payable, any ETH sent should cause a revert. This is a safety check the compiler inserts automatically.
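Decoding bytecode by hand is mechanical, so a short Python disassembler is a natural companion to the listing above. This minimal sketch uses a deliberately incomplete opcode table covering only the opcodes that appear in this chapter's listings. It is a stand-in, not the chapter's code/bytecode_decoder.py script. Unknown bytes are printed as UNKNOWN placeholders.

```python
# Partial opcode table: enough for the listings in this chapter.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV", 0x10: "LT",
    0x14: "EQ", 0x15: "ISZERO", 0x1C: "SHR", 0x34: "CALLVALUE", 0x35: "CALLDATALOAD",
    0x36: "CALLDATASIZE", 0x39: "CODECOPY", 0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x5F: "PUSH0", 0x80: "DUP1", 0x81: "DUP2", 0x90: "SWAP1", 0xF3: "RETURN", 0xFD: "REVERT",
}

def disassemble(code: bytes) -> list[str]:
    """Turn raw bytecode into 'offset: MNEMONIC [operand]' lines."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32 carry inline data
            width = op - 0x5F
            operand = code[pc + 1:pc + 1 + width]
            lines.append(f"{pc:04x}: PUSH{width} 0x{operand.hex()}")
            pc += 1 + width
            continue
        lines.append(f"{pc:04x}: {OPCODES.get(op, f'UNKNOWN(0x{op:02x})')}")
        pc += 1
    return lines

for line in disassemble(bytes.fromhex("6080604052348015600e575f80fd5b50")):
    print(line)
```

The output reproduces the hand-decoded listing, ending with the JUMPDEST at offset 000e and the POP that discards the duplicated CALLVALUE.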
12.6.3 The Function Dispatcher
The runtime bytecode begins with a dispatcher that routes incoming calls. The compiled dispatcher for our SimpleStorage contract works like this (pseudocode):
load first 4 bytes of calldata ; function selector
if selector == 0x60fe47b1: ; keccak256("set(uint256)")[:4]
jump to set_function
if selector == 0x6d4ce63c: ; keccak256("get()")[:4]
jump to get_function
revert ; unknown function selector
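In bytecode, the core of the dispatcher looks like this. The listing is simplified: solc first checks that CALLDATASIZE is at least 4, and when targeting Shanghai it emits PUSH0 instead of PUSH1 0x00.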
5f PUSH0 ; Push 0 (solc 0.8.20 targets Shanghai, so it emits PUSH0 rather than PUSH1 0x00)
35 CALLDATALOAD ; Load first 32 bytes of calldata
60 e0 PUSH1 0xe0
1c SHR ; Shift right 224 bits (keep top 4 bytes)
80 DUP1
63 60fe47b1 PUSH4 0x60fe47b1 ; Function selector for set(uint256)
14 EQ ; Compare
60 xx PUSH1 [jump dest]
57 JUMPI ; Jump if match
80 DUP1
63 6d4ce63c PUSH4 0x6d4ce63c ; Function selector for get()
14 EQ
60 xx PUSH1 [jump dest]
57 JUMPI
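This pattern (load the selector, compare it against each known value, jump on a match) appears in every compiled Solidity contract. Real compiler output also checks first that calldata is at least 4 bytes long, and when no selector matches, execution falls through to the receive or fallback function if one exists, or to a revert otherwise. For contracts with many functions, the compiler may split the sorted selectors into a binary search instead of a linear scan to reduce gas costs.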
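To make the routing logic concrete, here is a Python sketch that mirrors the dispatcher's steps for the SimpleStorage selectors given above. It is an illustration, not compiler output: the storage dict, the use of slot 0 for the stored value, and the hypothetical Revert exception standing in for the REVERT opcode are all assumptions of this sketch.

```python
# Python sketch of the dispatcher's routing logic (not compiler output).
SET_SELECTOR = 0x60FE47B1  # keccak256("set(uint256)")[:4]
GET_SELECTOR = 0x6D4CE63C  # keccak256("get()")[:4]


class Revert(Exception):
    """Stands in for the REVERT opcode in this sketch."""


def dispatch(calldata: bytes, storage: dict) -> bytes:
    if len(calldata) < 4:
        raise Revert("calldata shorter than a selector")
    # CALLDATALOAD reads a 32-byte word, zero-filling past the end of calldata
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    selector = word >> 224  # same as SHR by 0xe0: keep only the top 4 bytes
    if selector == SET_SELECTOR:
        if len(calldata) < 36:
            raise Revert("missing uint256 argument")
        storage[0] = int.from_bytes(calldata[4:36], "big")  # assumed slot 0
        return b""
    if selector == GET_SELECTOR:
        return storage.get(0, 0).to_bytes(32, "big")  # ABI-encoded uint256
    raise Revert("unknown function selector")


store: dict = {}
dispatch(bytes.fromhex("60fe47b1") + (42).to_bytes(32, "big"), store)
print(int.from_bytes(dispatch(bytes.fromhex("6d4ce63c"), store), "big"))  # 42
```

The shift by 224 bits is the same trick the bytecode uses: a 32-byte word shifted right by 28 bytes leaves only the 4-byte selector.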
12.6.4 Compiler Optimizations
The Solidity compiler's optimizer performs several transformations to reduce gas costs:
- Constant folding: Expressions involving only constants are computed at compile time.
- Dead code elimination: Unreachable code is removed.
- Common subexpression elimination: Repeated computations are performed once and the result reused.
- Peephole optimization: Short sequences of opcodes are replaced with equivalent but cheaper sequences.
- Storage packing: Multiple variables smaller than 256 bits are packed into a single storage slot.
The optimizer's "runs" parameter (typically 200 or 10000) controls the trade-off between deployment cost and execution cost. A higher run count optimizes for cheaper function calls at the expense of larger (more expensive to deploy) bytecode. A lower run count produces smaller bytecode that costs more to call.
💡 Practical Insight: The code/SimpleStorage.sol file in this chapter's code directory provides the complete contract. Compile it with solc --optimize --bin --asm SimpleStorage.sol to see the actual bytecode and assembly output. The code/bytecode_decoder.py script can decode the resulting bytecode back to human-readable opcodes.
12.7 Contract Deployment
12.7.1 The CREATE Opcode
When a contract deploys another contract, it uses the CREATE opcode. A transaction from an EOA with no to address goes through the same creation procedure without executing the opcode itself (see Section 12.7.3). The process works as follows:
- The deployer provides the creation bytecode and a value (ETH to send). CREATE has no gas argument: it forwards all but one 64th of the caller's remaining gas to the creation code.
- The EVM computes the new contract's address as: address = keccak256(rlp([sender, nonce]))[12:]
- A new account is created at that address with an empty code hash.
- The creation bytecode is executed in the new account's context.
- If execution succeeds, the data returned by RETURN is stored as the contract's runtime bytecode.
- If execution reverts or runs out of gas, the contract creation fails and no account is created.
The address derivation formula means that the address of a CREATE-deployed contract depends on the sender's address and their current nonce. If you deploy a contract, its address is deterministic — but only if you know the exact nonce at deployment time. This makes pre-computing addresses difficult in situations involving multiple pending transactions.
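The CREATE formula is easy to reproduce. The Python sketch below builds the RLP-encoded [sender, nonce] preimage, handling only the short RLP forms that this two-item list ever needs and assuming a 64-bit nonce. The final Keccak-256 step is deliberately left out: Python's hashlib.sha3_256 implements standardized SHA3-256, which pads its input differently from the original Keccak-256 that Ethereum uses, so it would produce wrong addresses. The deployer address is a hypothetical placeholder.

```python
def rlp_bytes(b: bytes) -> bytes:
    """RLP-encode a byte string; only the short forms needed here are supported."""
    if len(b) == 1 and b[0] < 0x80:
        return b  # a single small byte encodes as itself
    if len(b) > 55:
        raise ValueError("long RLP strings are out of scope for this sketch")
    return bytes([0x80 + len(b)]) + b  # 0x80 + length, then the bytes


def create_preimage(sender: bytes, nonce: int) -> bytes:
    """rlp([sender, nonce]); the CREATE address is keccak256(this)[12:] (hash omitted)."""
    if len(sender) != 20:
        raise ValueError("sender must be a 20-byte address")
    if not 0 <= nonce < 2**64:
        raise ValueError("this sketch assumes a 64-bit nonce")
    # RLP integers are minimal big-endian byte strings, so nonce 0 is the empty string
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    payload = rlp_bytes(sender) + rlp_bytes(nonce_bytes)
    return bytes([0xC0 + len(payload)]) + payload  # short list: 0xc0 + payload length


deployer = bytes.fromhex("11" * 20)  # hypothetical deployer address
for n in (0, 1, 0x80):
    print(n, create_preimage(deployer, n).hex())
```

The output shows a familiar pattern: 0xd6 0x94, then the address, then a single nonce byte for small nonces (0x80 encodes nonce 0), with a one-byte-longer 0xd7 prefix once the nonce reaches 0x80.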
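The nonce-based formula also allows an address to be reused in a roundabout way. Before the Dencun upgrade restricted SELFDESTRUCT, a deployer contract that had itself been created with CREATE2 could be destroyed and recreated at the same address, which reset its nonce. If the child it had deployed was destroyed as well, the recreated deployer's next CREATE would land on the child's old address, possibly with entirely different code. This technique has been exploited in real attacks.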
12.7.2 CREATE2: Deterministic Deployment
The CREATE2 opcode (EIP-1014, Constantinople upgrade) computes the contract address differently:
address = keccak256(0xFF ++ sender ++ salt ++ keccak256(init_code))[12:]
Here, salt is an arbitrary 32-byte value chosen by the deployer, and init_code is the creation bytecode. The address depends on the deployer, the salt, and the exact bytecode — but not on any nonce. This makes the address fully deterministic and predictable before deployment.
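The CREATE2 preimage is simpler still: a fixed 85-byte layout with no RLP and no nonce. The Python sketch below assembles it; the final hash must be a real Keccak-256 (not hashlib.sha3_256, which pads differently), so it is omitted, and the factory address, salt, and init-code hash are hypothetical placeholders.

```python
def create2_preimage(sender: bytes, salt: bytes, init_code_hash: bytes) -> bytes:
    """0xff ++ sender ++ salt ++ keccak256(init_code), per EIP-1014.
    The CREATE2 address is keccak256(this)[12:]; the hash itself is omitted here."""
    if len(sender) != 20:
        raise ValueError("sender must be a 20-byte address")
    if len(salt) != 32 or len(init_code_hash) != 32:
        raise ValueError("salt and init-code hash must each be 32 bytes")
    return b"\xff" + sender + salt + init_code_hash


factory = bytes.fromhex("22" * 20)  # hypothetical factory address
salt = (7).to_bytes(32, "big")      # any 32-byte value the deployer picks
code_hash = bytes(32)               # placeholder standing in for keccak256(init_code)
preimage = create2_preimage(factory, salt, code_hash)
print(len(preimage), preimage[:1].hex())  # 85 ff
```

Every input is known before deployment, which is what makes the address predictable. EIP-1014 chose the leading 0xff byte so these preimages cannot collide with CREATE's RLP-encoded ones, which start with a short-list prefix such as 0xd6; an RLP encoding would have to be enormously long to begin with 0xff.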
CREATE2 enables several important patterns:
- Counterfactual deployment: You can compute where a contract will be deployed and interact with that address (e.g., sending it ETH) before the contract actually exists. This is fundamental to state channels and account abstraction.
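- Factory patterns: A factory contract can deploy child contracts at predictable addresses. Anyone who knows the factory address, the salt, and the init-code hash can recompute a child's address, so other contracts can recognize a legitimate child without consulting a registry or verifying the deployment separately.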
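- Redeployment at the same address: Before Dencun, if a CREATE2-deployed contract was destroyed, the same init code could be redeployed at the same address using the same salt. Because the address commits only to the init code, not to the runtime code it returns, init code that fetches its runtime code from elsewhere could install different code at that address. (This created the "metamorphic contract" attack vector, discussed in Case Study 2.)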
12.7.3 The Deployment Transaction
When deploying a contract via an EOA transaction (rather than from another contract), the process has a slightly different entry point:
- The user sends a transaction with to = null (empty recipient) and data = creation bytecode.
- The EVM treats this as a contract creation, computing the address from the sender's address and nonce.
- The creation bytecode is executed.
- The returned data becomes the contract's runtime code.
- A deployment fee of 200 gas per byte of runtime code is charged. This is separate from the execution costs of running the constructor.
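The 200 gas per byte fee means that larger contracts are more expensive to deploy. A contract at the maximum runtime size allowed by EIP-170 (24,576 bytes, i.e. 24 KiB) pays 24,576 × 200 = 4,915,200 gas for the code deposit alone, before any constructor logic runs. The limit was introduced in the Spurious Dragon hard fork (November 2016), which followed the denial-of-service attacks of autumn 2016 often called the Shanghai attacks. Its rationale was that calling a contract costs a fixed amount of gas but forces every node to load the contract's entire code, so unbounded code size would let a cheap call trigger arbitrarily large work.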
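The code-deposit arithmetic can be sketched directly. This models only the 200-gas-per-byte deposit charge; the transaction's intrinsic cost, the constructor's execution, and any other creation charges are not included.

```python
CODE_DEPOSIT_GAS_PER_BYTE = 200  # charged per byte of returned runtime code
MAX_RUNTIME_CODE_SIZE = 0x6000   # EIP-170 limit: 24,576 bytes (24 KiB)


def code_deposit_gas(runtime_size: int) -> int:
    """Code-deposit charge only; intrinsic and constructor costs are not included."""
    if runtime_size > MAX_RUNTIME_CODE_SIZE:
        raise ValueError("runtime code exceeds the EIP-170 limit; creation would fail")
    return runtime_size * CODE_DEPOSIT_GAS_PER_BYTE


print(code_deposit_gas(MAX_RUNTIME_CODE_SIZE))  # 4915200
```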
12.8 Contract Interaction
Smart contracts rarely operate in isolation. The power of Ethereum's ecosystem comes from composability — the ability for contracts to call other contracts, building complex financial primitives from simple components. The EVM provides four call opcodes for this purpose, each with distinct security properties.
12.8.1 CALL
The most common call opcode. CALL executes code at another address with a new execution context:
- The called contract gets its own stack (empty), memory (empty), and execution context.
- The called contract operates on its own storage.
- The caller specifies gas, value (ETH to send), input data, and a memory region for return data.
- CALL pushes 1 on the caller's stack on success, 0 on failure.
In Solidity, this maps to patterns like otherContract.someFunction(args) or the low-level address.call(data).
Critically, CALL does not propagate reverts automatically at the EVM level. If the called contract reverts, CALL returns 0 (failure) but does not revert the calling contract. Solidity's high-level call syntax adds an automatic revert check, but low-level address.call() does not — failing to check the return value is a common vulnerability.
12.8.2 STATICCALL
Introduced in EIP-214 (Byzantium upgrade), STATICCALL is identical to CALL except that it prohibits state modifications. If the called code, or any call it makes in turn, attempts a state-changing operation such as SSTORE, CREATE, LOG, SELFDESTRUCT, or a CALL that transfers ETH, the frame that attempts it halts with an exception and its changes are discarded.
This maps to Solidity's view and pure state mutability specifiers. When you call a view or pure function on another contract, the compiler generates a STATICCALL instruction, ensuring that the called contract cannot sneakily modify state.
STATICCALL is a critical security primitive. Before its introduction, calling a view function on an untrusted contract was risky — there was no EVM-level enforcement that the function would not modify state.
12.8.3 DELEGATECALL
DELEGATECALL is the most subtle and dangerous call opcode. It executes the called contract's code but in the context of the calling contract:
- msg.sender remains the original caller (not the calling contract).
- msg.value remains the original value.
- Storage operations read and write the calling contract's storage, not the called contract's storage.
In other words, DELEGATECALL says: "Run that other contract's code, but pretend it is running inside me."
This is the mechanism behind proxy patterns and upgradeable contracts. A proxy contract holds the state (storage) and delegates all logic to an implementation contract via DELEGATECALL. To upgrade, you deploy a new implementation and point the proxy at it. The state remains unchanged because it lives in the proxy's storage.
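The difference between CALL and DELEGATECALL comes down to which context the callee's code runs in. The following toy Python model (hypothetical classes, not how any real client is structured) represents a context as address(this), msg.sender, and the storage that SSTORE would touch, then shows a proxy delegatecalling into an implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Context:
    this: str      # address(this): whose account the code is acting as
    sender: str    # msg.sender
    storage: dict  # the storage that SSTORE/SLOAD would touch


@dataclass
class Account:
    address: str
    storage: dict = field(default_factory=dict)
    code: Optional[Callable[[Context], None]] = None


def call(caller: Account, target: Account) -> None:
    # CALL: target's code, target's storage, msg.sender = the caller
    target.code(Context(this=target.address, sender=caller.address, storage=target.storage))


def delegatecall(ctx: Context, target: Account) -> None:
    # DELEGATECALL: target's code, but the current address, sender, and storage
    target.code(Context(this=ctx.this, sender=ctx.sender, storage=ctx.storage))


def set_owner(ctx: Context) -> None:
    ctx.storage[0] = ctx.sender  # hypothetical logic: store msg.sender in slot 0


impl = Account("0xImpl", code=set_owner)
proxy = Account("0xProxy")
user = Account("0xUser")

# The user calls the proxy, which forwards the call with DELEGATECALL.
delegatecall(Context(this=proxy.address, sender=user.address, storage=proxy.storage), impl)
print(proxy.storage, impl.storage)  # {0: '0xUser'} {}

call(user, impl)  # calling the implementation directly writes to its own storage
print(impl.storage)  # {0: '0xUser'}
```

The implementation's code never touches its own storage during the delegated call, which is exactly why a proxy can swap implementations while keeping its state.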
However, DELEGATECALL is extraordinarily dangerous if misused. Because the called code operates on the caller's storage, any bug or malice in the called code can corrupt the caller's state, and the caller depends on the called code continuing to exist. The Parity multisig wallet freeze (November 2017) illustrates the second risk: the shared library contract that the wallets delegatecalled into had been left uninitialized, so a user was able to take ownership of it and trigger its self-destruct function. Once the library was gone, every wallet that relied on it was permanently bricked, locking approximately $150 million in ETH.
12.8.4 CALLCODE (Deprecated)
CALLCODE is an older opcode that behaves similarly to DELEGATECALL but sets msg.sender to the calling contract's address rather than preserving the original sender. It has been effectively superseded by DELEGATECALL and should not be used in new contracts. It remains in the EVM for backward compatibility.
12.8.5 The Call Stack Limit
The EVM imposes a call stack depth limit of 1,024. Each CALL, DELEGATECALL, STATICCALL, or CALLCODE (as well as CREATE and CREATE2) increments the call depth by one. If a call is attempted at depth 1,024, it fails (returns 0) without reverting the outer call.
Before EIP-150 (Tangerine Whistle, October 2016), this limit was exploitable. An attacker could engineer a situation where the call depth was exactly 1,023, then trigger a critical external call in the target contract that would silently fail due to the depth limit. The target contract, if it did not check the return value, would proceed as if the call succeeded.
EIP-150 mitigated this by changing the gas forwarding rule so that at most 63/64 of remaining gas is forwarded to a child call. This means the gas required to reach depth 1,024 grows exponentially — making call stack depth attacks impractical. However, the 1,024 limit still exists and can affect legitimate contracts with very deep call chains.
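The effect of the 63/64 rule is easy to simulate. The toy Python model below assumes a hypothetical 30,000,000-gas budget and a fixed 100-gas cost per nested call (about the price of a CALL to an already-warm address; real frames spend more on surrounding opcodes), and it ignores value-transfer stipends.

```python
def max_call_depth(start_gas: int, overhead_per_call: int, limit: int = 1024) -> int:
    """Depth reached when each frame pays a fixed overhead and then forwards
    all but one 64th of its remaining gas to the next call (EIP-150 rule)."""
    gas, depth = start_gas, 0
    while depth < limit:
        gas -= overhead_per_call  # cost of issuing the nested call
        if gas <= 0:
            break                 # cannot afford another level
        gas -= gas // 64          # the caller keeps 1/64, the child gets the rest
        depth += 1
    return depth


print(max_call_depth(30_000_000, 100))
```

Under these assumptions the chain runs out of gas after a few hundred levels, well short of 1,024. Without the 1/64 rule, the same 100-gas-per-level chain would need only about 102,400 gas to reach the limit.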
12.8.6 Return Data Handling
When a called contract returns or reverts, the data it provides is accessible via:
- RETURNDATASIZE: Returns the size of the most recent call's return data.
- RETURNDATACOPY: Copies return data into memory.
These opcodes (EIP-211, Byzantium) replaced the earlier pattern of preallocating memory for return data. Before their introduction, the caller had to guess the size of the return data, which was error-prone.
Solidity uses these opcodes automatically when handling function return values and revert messages. The try/catch syntax in Solidity 0.6+ is compiled to CALL + RETURNDATASIZE + RETURNDATACOPY, catching revert data for error handling.
12.9 ABI Encoding
The Application Binary Interface (ABI) is the standard encoding scheme for all external communication with smart contracts — function calls, return values, event logs, and error messages. It defines how high-level types (uint256, address, bytes, string, arrays, structs) are serialized into the raw bytes that the EVM processes.
12.9.1 Function Selectors
Every external function in a Solidity contract is identified by a 4-byte function selector, computed as the first 4 bytes of the Keccak-256 hash of the function's signature:
selector = keccak256("transfer(address,uint256)")[:4]
= keccak256("transfer(address,uint256)") → 0xa9059cbb...
→ selector = 0xa9059cbb
When you call token.transfer(to, amount), the transaction's calldata begins with 0xa9059cbb followed by the ABI-encoded parameters.
The function signature uses canonical type names: uint256 not uint, address not address payable, no parameter names, no spaces. Getting this wrong produces a different selector, causing the call to hit the fallback function (or revert).
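Computing a selector requires Keccak-256. Python's hashlib.sha3_256 cannot be used here, because standardized SHA3-256 pads its input differently from the original Keccak that Ethereum adopted, so the sketch below includes a compact, unoptimized pure-Python Keccak-256 (in practice you would use a vetted library). It also shows why canonical signatures matter: spelling uint256 as uint changes the selector.

```python
# Compact pure-Python Keccak-256 (original Keccak padding, as used by Ethereum).
MASK64 = (1 << 64) - 1
ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rho rotation offsets, generated by the walk given in the Keccak specification.
ROTATIONS = [0] * 25
_x, _y = 1, 0
for _t in range(24):
    ROTATIONS[_x + 5 * _y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _rotl(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_f(a: list[int]) -> None:
    """Keccak-f[1600] permutation; lane (x, y) is stored at index x + 5*y."""
    for rc in ROUND_CONSTANTS:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        for i in range(25):  # theta
            a[i] ^= d[i % 5]
        b = [0] * 25  # rho and pi
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], ROTATIONS[x + 5 * y])
        for y in range(5):  # chi
            for x in range(5):
                a[x + 5 * y] = b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
        a[0] ^= rc  # iota


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes absorbed per permutation for a 256-bit output
    padded = bytearray(data) + b"\x01"  # Keccak padding starts with 0x01 (SHA-3 uses 0x06)
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def selector(signature: str) -> str:
    return "0x" + keccak256(signature.encode("ascii"))[:4].hex()


print(selector("transfer(address,uint256)"))  # 0xa9059cbb
print(selector("transfer(address,uint)"))     # non-canonical: a different selector
```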
With only 4 bytes (2^32 possible selectors), collisions are possible: two different function signatures can produce the same selector. The Solidity compiler rejects such clashes within a single contract, and accidental collisions between unrelated functions are unlikely. However, a space of 2^32 values is small enough to search by brute force, and some attacks have relied on deliberately crafted colliding selectors. Foundry's cast sig, which prints the selector for a given signature, is a convenient way to check selectors yourself.
12.9.2 Parameter Encoding
ABI encoding divides types into two categories:
Static types have a fixed size:
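- uint<N> (e.g., uint8, uint256): 32 bytes, left-padded with zeros
- int<N> (e.g., int8, int256): 32 bytes, sign-extended (negative values are left-padded with 0xff bytes)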
- address: 20 bytes, left-padded to 32 bytes
- bool: 32 bytes (0 or 1, left-padded)
- bytes1 through bytes32: right-padded to 32 bytes
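The padding rules differ by type, and getting the direction wrong is a common hand-encoding mistake. This Python sketch encodes a single static value into its 32-byte slot; it supports only the types listed above and, for brevity, does not check that a value fits in a narrower type such as uint8.

```python
def encode_static(value, abi_type: str) -> bytes:
    """Encode one static ABI value into its 32-byte slot (common types only)."""
    if abi_type.startswith("uint"):
        return value.to_bytes(32, "big")               # zero left-padding
    if abi_type.startswith("int"):
        return value.to_bytes(32, "big", signed=True)  # sign extension: -1 -> 0xff..ff
    if abi_type == "address":
        raw = bytes.fromhex(value.removeprefix("0x"))
        return raw.rjust(32, b"\x00")                  # 20 bytes, left-padded
    if abi_type == "bool":
        return int(bool(value)).to_bytes(32, "big")    # 0 or 1, left-padded
    if abi_type.startswith("bytes") and len(value) <= 32:
        return value.ljust(32, b"\x00")                # bytesN: right-padded
    raise ValueError(f"unsupported type or value for {abi_type}")


print(encode_static(-1, "int8").hex())
print(encode_static(b"\xab\xcd", "bytes2").hex())
```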
Dynamic types have variable size:
- bytes: length-prefixed byte array
- string: length-prefixed UTF-8 data
- T[]: length-prefixed array of type T
- Tuples containing any dynamic type
For static types, the encoding is straightforward concatenation. For dynamic types, the encoding uses an offset/data pattern:
- In the parameter's position (the "head"), encode an offset, measured in bytes from the start of the parameter block (just after the selector), pointing to where the data begins.
- At that offset, encode the length followed by the actual data.
Here is a concrete example. Calling function foo(uint256 a, string memory b, uint256 c) with a=42, b="hello", c=7:
0x00: xxxxxxxx // function selector (4 bytes)
0x04: 000000000000000000000000000000000000000000000000000000000000002a // a = 42
0x24: 0000000000000000000000000000000000000000000000000000000000000060 // offset to b (96 bytes from start of params)
0x44: 0000000000000000000000000000000000000000000000000000000000000007 // c = 7
0x64: 0000000000000000000000000000000000000000000000000000000000000005 // length of b = 5
0x84: 68656c6c6f000000000000000000000000000000000000000000000000000000 // "hello" padded to 32 bytes
Notice the structure: static parameters are encoded in place, the string parameter is replaced by an offset (0x60 = 96), and the actual string data appears at that offset with a length prefix.
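The head/tail layout can be reproduced with a small encoder. This Python sketch handles only uint256 and string arguments (an assumption to keep it short; real encoders cover every ABI type and nested tuples) and omits the selector, so its output corresponds to the parameter block starting at offset 0x04 in the listing above.

```python
def word(n: int) -> bytes:
    return n.to_bytes(32, "big")  # one 32-byte ABI word, left-padded


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    # length word, then the data right-padded to a multiple of 32 bytes
    return word(len(data)) + data + b"\x00" * (-len(data) % 32)


def abi_encode(*args) -> bytes:
    """Encode uint256 and string arguments (only) using the head/tail layout."""
    heads, tails = [], []
    head_size = 32 * len(args)  # each argument occupies one head word
    for arg in args:
        if isinstance(arg, str):
            # dynamic: the head holds the offset of this argument's tail
            heads.append(word(head_size + sum(len(t) for t in tails)))
            tails.append(encode_string(arg))
        elif isinstance(arg, int):
            heads.append(word(arg))  # static: encoded in place
        else:
            raise TypeError("this sketch supports only uint256 and string")
    return b"".join(heads) + b"".join(tails)


encoded = abi_encode(42, "hello", 7)
for i in range(0, len(encoded), 32):
    print(f"0x{i + 4:02x}: {encoded[i:i + 32].hex()}")  # +4 skips the selector
```

Offsets are computed from the running size of the tails, so a second string argument would point past the first string's length word and padded data.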
12.9.3 Event Log Encoding
Events use a variant of ABI encoding. The first topic (topic[0]) is the Keccak-256 hash of the event signature (unless the event is anonymous). Indexed parameters become additional topics (up to 3). Non-indexed parameters are ABI-encoded in the log's data field.
event Transfer(address indexed from, address indexed to, uint256 value);
When this event fires:
- topic[0] = keccak256("Transfer(address,address,uint256)")
- topic[1] = from address (32 bytes)
- topic[2] = to address (32 bytes)
- data = ABI-encoded uint256 value
Indexed parameters enable efficient log filtering (you can search for all transfers to a specific address by filtering on topic[2]). However, indexing dynamic types (strings, bytes) stores only their hash, not the actual value.
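The mapping from event parameters to topics and data can be sketched as follows. The Python below hard-codes the well-known topic[0] for the ERC-20 Transfer event rather than hashing it, uses hypothetical addresses, and represents a log as a plain dict; the filter function imitates what a node or indexer does when matching on topic[2].

```python
TRANSFER_TOPIC0 = bytes.fromhex(
    "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
)  # keccak256("Transfer(address,address,uint256)")


def address_topic(addr: str) -> bytes:
    raw = bytes.fromhex(addr.removeprefix("0x"))
    if len(raw) != 20:
        raise ValueError("expected a 20-byte address")
    return raw.rjust(32, b"\x00")  # indexed address: left-padded to 32 bytes


def transfer_log(frm: str, to: str, value: int) -> dict:
    return {
        "topics": [TRANSFER_TOPIC0, address_topic(frm), address_topic(to)],
        "data": value.to_bytes(32, "big"),  # the non-indexed uint256, ABI-encoded
    }


def transfers_to(logs: list, recipient: str) -> list:
    # Match on topics only, without decoding the data field
    want = address_topic(recipient)
    return [log for log in logs
            if log["topics"][0] == TRANSFER_TOPIC0 and log["topics"][2] == want]


alice, bob, carol = "0x" + "aa" * 20, "0x" + "bb" * 20, "0x" + "cc" * 20  # hypothetical
logs = [transfer_log(alice, bob, 5), transfer_log(bob, carol, 3), transfer_log(carol, bob, 1)]
print(len(transfers_to(logs, bob)))  # 2
```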
12.9.4 Error Encoding
Since Solidity 0.8.4, custom errors use the same 4-byte selector scheme as functions:
error InsufficientBalance(uint256 available, uint256 required);
When this error is triggered, the revert data is:
selector(InsufficientBalance(uint256,uint256)) ++ abi_encode(available, required)
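This is significantly more gas-efficient than the older require(condition, "string message") pattern, which reverts with Error(string) data (selector 0x08c379a0) containing the entire ABI-encoded message, and which must also store that string in the contract's bytecode.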
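A quick way to see the difference is to compare the revert data each style produces. The sketch below encodes the legacy Error(string) payload and a custom error whose arguments are all uint256. The custom error's selector is a hypothetical four-byte placeholder (the real one is the first 4 bytes of keccak256("InsufficientBalance(uint256,uint256)")), which does not affect the length comparison.

```python
ERROR_STRING_SELECTOR = bytes.fromhex("08c379a0")  # keccak256("Error(string)")[:4]


def encode_error_string(message: str) -> bytes:
    """Revert data from require(cond, "message"): Error(string)."""
    data = message.encode("utf-8")
    padded = data + b"\x00" * (-len(data) % 32)
    offset = (32).to_bytes(32, "big")  # string data starts right after the one head word
    return ERROR_STRING_SELECTOR + offset + len(data).to_bytes(32, "big") + padded


def encode_custom_error(selector: bytes, *uints: int) -> bytes:
    """Revert data for a custom error whose parameters are all uint256."""
    return selector + b"".join(n.to_bytes(32, "big") for n in uints)


PLACEHOLDER_SELECTOR = bytes(4)  # hypothetical stand-in for InsufficientBalance's selector
legacy = encode_error_string("Insufficient balance")
custom = encode_custom_error(PLACEHOLDER_SELECTOR, 10, 25)
print(len(legacy), len(custom))  # 100 68
```

With a 20-character message, the legacy form is 100 bytes of revert data while the custom error is 68 bytes and carries two useful numbers.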
💡 Hands-On: The code/abi_encoder.py script demonstrates ABI encoding and decoding for various types. Try encoding a function call manually and verifying it matches what the script produces.
12.10 The EVM as a Sandbox
The EVM is, by design, a sandboxed execution environment. This sandboxing is not an afterthought — it is the fundamental architectural principle that makes trustless smart contract execution possible.
12.10.1 What the Sandbox Prevents
A smart contract running in the EVM cannot:
- Access the filesystem. There are no opcodes for file I/O. A contract cannot read or write files on the nodes that execute it.
- Make network requests. There is no CONNECT, HTTP, or SOCKET opcode. A contract cannot fetch a web page, call an API, or communicate with the outside world.
- Access the system clock directly. The only time-related information available is the block timestamp (TIMESTAMP), which is fixed once per block by the block proposer rather than read from any node's clock; on proof-of-stake Ethereum it follows the 12-second slot schedule. There is no CLOCK or TIME opcode with millisecond precision.
- Generate randomness. There is no RANDOM opcode with true randomness (PREVRANDAO provides pseudo-randomness from the beacon chain, but it can be influenced by validators within narrow bounds).
- Access other contracts' storage directly. A contract can only access its own storage. To read another contract's state, it must call that contract and have it return the data.
- Execute indefinitely. The gas limit ensures that every execution terminates. There are no infinite loops — at worst, a loop runs until gas runs out and the transaction reverts.
12.10.2 Why These Limitations Are Features
Each limitation serves a specific purpose:
No I/O ensures determinism. If a contract could make HTTP requests, different nodes might get different responses (server down, network partition, different cached data). The resulting state divergence would break consensus. By forbidding all external I/O, the EVM guarantees that given the same state and the same transaction, every node on the planet produces the exact same result.
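No direct time access prevents manipulation. A high-precision clock would differ between nodes, so the EVM exposes only the block's TIMESTAMP, which every node sees identically. Under proof of work, miners could shift that value by several seconds (roughly 15 seconds was the usual rule of thumb for tolerated drift); on proof-of-stake Ethereum it is pinned to the slot schedule. Either way, it changes only once per block, which is why time-sensitive contracts (e.g., auction deadlines) should treat on-chain time as coarse, block-granular information rather than a precise clock.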
No randomness prevents exploitation. True randomness requires an external source. Any randomness generated within the EVM would be deterministic (and thus predictable by miners/validators who can preview execution). This is why on-chain randomness solutions like Chainlink VRF exist — they provide verifiably random numbers through an oracle mechanism.
Gas-limited execution prevents DoS. Without gas, a malicious contract could contain an infinite loop, and every node attempting to process it would hang forever. Gas ensures that execution always terminates and that users pay proportionally for the resources they consume.
12.10.3 The Oracle Problem
The sandbox's inability to access external data creates the fundamental oracle problem: smart contracts need real-world data (asset prices, weather conditions, sports scores, election results) but cannot fetch it themselves.
Oracles — services that publish external data on-chain — bridge this gap. The design of secure oracle systems is one of the most important challenges in blockchain engineering, and it is a direct consequence of the EVM's sandboxed architecture. We examine oracles in detail in Chapter 22.
12.10.4 Determinism in Practice
The sandbox guarantees are not just theoretical. They are enforced at the opcode level:
- The instruction set contains no nondeterministic opcodes.
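- JUMP/JUMPI may only target a JUMPDEST byte that is a real instruction (not part of a PUSH instruction's immediate data), so execution can never jump into data embedded in the code.
- Integer overflow wraps deterministically modulo 2^256 (unlike signed overflow in C, which is undefined behavior).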
- Division by zero returns zero rather than raising an exception (EVM convention, though Solidity's compiler adds checks).
- Out-of-gas conditions are deterministic: the gas counter decrements identically on every node.
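These arithmetic rules are simple enough to state as code. The Python functions below model the unsigned ADD, SUB, DIV, and MOD opcodes on 256-bit words (signed variants such as SDIV are omitted). Solidity 0.8+ adds checks on top by default: overflow reverts unless the code is in an unchecked block, and division by zero panics unless it happens in inline assembly.

```python
WORD = 2**256  # EVM words are 256-bit unsigned integers


def evm_add(a: int, b: int) -> int:
    return (a + b) % WORD  # overflow wraps modulo 2**256


def evm_sub(a: int, b: int) -> int:
    return (a - b) % WORD  # underflow wraps too: 0 - 1 == 2**256 - 1


def evm_div(a: int, b: int) -> int:
    return 0 if b == 0 else a // b  # DIV by zero yields 0, no exception


def evm_mod(a: int, b: int) -> int:
    return 0 if b == 0 else a % b  # MOD by zero also yields 0


print(evm_add(WORD - 1, 1), evm_div(10, 0))  # 0 0
```

Because every node applies exactly these rules, even "error" cases like overflow and division by zero produce the same word everywhere.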
This determinism extends to error handling. When a contract reverts, the state changes within that call frame are rolled back identically on every node. When a CALL fails, it returns 0 on every node. There is no "sometimes it fails" behavior.
The result is that the EVM, despite being executed by tens of thousands of independent computers running different client implementations (Geth in Go, Nethermind in C#, Besu in Java, Reth in Rust), produces identical results. This is not a trivial achievement — it requires extraordinary precision in the specification and testing of every opcode.
12.11 EVM Alternatives and Competitors
While the EVM dominates the smart contract ecosystem, several alternative virtual machine architectures have emerged, each addressing perceived limitations.
12.11.1 WebAssembly-Based VMs (eWASM, Stylus)
WebAssembly (Wasm) is a binary instruction format designed as a compilation target for C, C++, Rust, and other languages. Several blockchain projects have adopted or proposed Wasm-based VMs:
- Arbitrum Stylus (testnet in 2023, live on Arbitrum One since 2024) allows developers to write smart contracts in Rust, C, or C++ that compile to Wasm and execute alongside EVM contracts. Stylus contracts can call, and be called by, EVM contracts.
- Polkadot's ink! framework compiles Rust contracts to Wasm for execution on Substrate-based chains.
- CosmWasm is the Cosmos ecosystem's Wasm-based contract framework.
The advantages of Wasm over the EVM include:
- Near-native performance: Wasm is designed for efficient execution and can be JIT-compiled.
- Mature toolchains: Languages targeting Wasm (Rust, C++) have decades of compiler optimization work.
- Compact binary format: Wasm's encoding is designed to be compact, although the size of a real contract depends heavily on the source language and the libraries it links in.
- Broader type system: Wasm supports 32-bit and 64-bit integers natively, avoiding the overhead of 256-bit arithmetic for non-cryptographic operations.
12.11.2 Solana's Sealevel and SVM
Solana uses the Sealevel runtime, which executes programs compiled to a Solana-specific variant of eBPF (extended Berkeley Packet Filter) bytecode. The Solana Virtual Machine (SVM) differs from the EVM in fundamental ways:
- Account model: Solana separates code and state into different accounts. Programs are stateless; they operate on data accounts passed as arguments.
- Parallelism: Because transactions declare which accounts they access, Sealevel can execute non-conflicting transactions in parallel — something the EVM's serial execution model cannot do.
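- Rent: Solana requires every account to hold a minimum balance proportional to the size of its data, recoverable when the account is closed (historically, accounts below that threshold were charged periodic rent). Ethereum instead charges a one-time SSTORE cost.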
12.11.3 Move VM (Aptos, Sui)
The Move language, originally developed at Facebook/Meta for the Libra/Diem project, introduces a novel resource-oriented programming model. Resources in Move are linear types — they cannot be copied or implicitly discarded, only moved between owners. This makes certain classes of bugs (double-spending, lost tokens) impossible by construction.
The Move VM is used by Aptos and Sui, which have gained traction as high-performance alternatives to Ethereum.
12.11.4 The EVM's Moat
Despite these alternatives, the EVM maintains its dominance for several reasons:
- Network effects: The vast majority of smart contract developers, auditing tools, and security expertise target the EVM.
- Multi-chain adoption: The EVM is not just Ethereum. Polygon, BNB Chain, Avalanche, Arbitrum, Optimism, Base, and dozens of other chains run EVM-compatible execution layers.
- Tooling maturity: Hardhat, Foundry, Remix, Slither, Mythril, and hundreds of other tools form an ecosystem that would take years to replicate.
- Liquidity: The vast majority of DeFi liquidity exists on EVM-compatible chains.
Understanding the EVM is therefore not just about understanding Ethereum — it is about understanding the execution model that underlies the majority of the decentralized computing ecosystem.
12.12 Summary and Bridge to Chapter 13
This chapter has taken you inside the machine that powers smart contracts. The key takeaways:
Architecture. The EVM is a stack-based virtual machine with a 256-bit word size, no registers, and a modified Harvard architecture that separates code from data. Every node in the network runs its own copy of the EVM and must arrive at identical results.
Data locations. The stack is fast and cheap but limited to 1,024 items, with only the top 16 directly accessible. Memory is byte-addressable and expandable but volatile, with quadratic expansion costs. Storage is a persistent key-value mapping that survives between transactions, and it is expensive because it creates a permanent obligation for every node.
Opcodes and gas. The EVM's ~140 opcodes are each priced according to the real resources they consume. Pure computation (ADD, MUL) costs 3-5 gas. Memory operations cost 3 gas plus expansion fees. Storage writes cost 5,000-20,000 gas. This pricing reflects the hardware cost hierarchy (CPU < RAM < Disk) and the network cost of permanent state storage.
Compilation. Solidity contracts compile to two phases of bytecode: creation code (runs once, executes the constructor, returns the runtime code) and runtime code (stored on-chain, contains the function dispatcher and logic). The function dispatcher uses 4-byte selectors derived from Keccak-256 hashes of function signatures.
Contract interaction. CALL, STATICCALL, and DELEGATECALL enable composability but with different security properties. DELEGATECALL executes foreign code in the caller's context — powerful for proxy patterns, dangerous if misused. The call stack is limited to 1,024 frames.
ABI encoding. The ABI standard defines how function calls, return values, events, and errors are serialized. Static types are encoded in place; dynamic types use an offset/data pattern. Function selectors are 4-byte Keccak-256 prefixes.
The sandbox. The EVM's inability to perform I/O, access files, generate randomness, or execute indefinitely is not a limitation — it is the foundation of trustless execution. Determinism enables consensus. The oracle problem is a direct consequence of this design.
In Chapter 13, we shift from the machine to the language. Having seen what the EVM actually does at the bytecode level, you are now equipped to understand Solidity not as an abstract programming language but as a human-readable layer over the opcode sequences we have studied here. Every Solidity construct — variables, functions, modifiers, inheritance — compiles down to the opcodes, stack operations, and storage patterns you now understand. That understanding will make you a dramatically better smart contract developer.
Key Terms Glossary
| Term | Definition |
|---|---|
| EVM | Ethereum Virtual Machine — the sandboxed, stack-based virtual machine that executes smart contract bytecode |
| Opcode | A single-byte instruction in the EVM's instruction set (e.g., ADD, SSTORE, CALL) |
| Stack | LIFO data structure holding up to 1,024 256-bit items; the EVM's primary workspace |
| Memory | Volatile, byte-addressable, expandable data region with quadratic expansion costs |
| Storage | Persistent key-value mapping (256-bit to 256-bit) that survives between transactions |
| Bytecode | The compiled machine code of a smart contract, consisting of sequential opcodes |
| Program Counter | Tracks the current position in the bytecode during execution |
| Gas | Unit measuring computational effort; each opcode has a defined gas cost |
| SSTORE | Opcode that writes to persistent storage (20,000 gas for new slot) |
| SLOAD | Opcode that reads from persistent storage (2,100 gas cold, 100 warm) |
| CALL | Opcode that invokes another contract with a new execution context |
| DELEGATECALL | Opcode that executes another contract's code in the caller's storage context |
| ABI | Application Binary Interface — the standard encoding scheme for smart contract interactions |
| Function Selector | First 4 bytes of keccak256(function signature), used to route calls to functions |
| Calldata | The input data sent with a transaction or call, read-only within the EVM |
| Returndata | Data returned by the most recent CALL, accessible via RETURNDATASIZE/RETURNDATACOPY |
| Contract Creation | Process of deploying bytecode via CREATE or CREATE2, producing a new contract account |
| Runtime Bytecode | The permanent bytecode stored at a contract's address, executed on every call |
| Constructor Bytecode | One-time bytecode (also called creation code or init code) that runs during deployment, executes the constructor, and returns the runtime bytecode |