Case Study 39-1: A Minimal JIT Compiler — Generating x86-64 Machine Code at Runtime
Introduction
A just-in-time (JIT) compiler generates machine code at runtime and then executes it. This sounds exotic, but it is what JavaScript engines (V8, SpiderMonkey), the Java HotSpot JVM, PyPy (an alternative Python implementation), and LLVM's ORC JIT do routinely while running programs. Understanding a minimal JIT at the byte level demystifies all of these.
This case study builds a JIT compiler in C, starting with "return 42" and progressing to a simple arithmetic expression evaluator. Every machine code byte is written manually and explained. By the end, the mechanism of mmap → write bytes → mprotect → call that we flagged as suspicious in Chapter 34's malware analysis is fully understood as a legitimate and essential programming technique.
Phase 1: The Infrastructure
First, the three functions that all JIT compilers need:
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
/* Allocate writable memory for code generation */
void *jit_alloc(size_t size) {
void *mem = mmap(NULL, size,
PROT_READ | PROT_WRITE, /* writable for code emission */
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
if (mem == MAP_FAILED) {
perror("mmap");
exit(1);
}
return mem;
}
/* Make the region executable (and remove write permission) */
void jit_seal(void *mem, size_t size) {
if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
perror("mprotect");
exit(1);
}
}
/* Free JIT-allocated memory */
void jit_free(void *mem, size_t size) {
munmap(mem, size);
}
The write-then-execute pattern is the JIT lifecycle: allocate RW, write bytes, make RX, call.
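One detail the lifecycle hides: the kernel tracks protections per page, so a 16-byte request to mmap or mprotect actually covers a whole page. The following is a minimal sketch of making that rounding explicit, assuming a POSIX system; jit_page_round is a hypothetical helper name, not part of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round a requested code size up to a whole
   number of pages, the granularity at which mmap and mprotect work. */
size_t jit_page_round(size_t size) {
    long page = sysconf(_SC_PAGESIZE);   /* commonly 4096 on x86-64 Linux */
    assert(page > 0);
    size_t p = (size_t)page;
    return (size + p - 1) / p * p;       /* 0 stays 0; mmap rejects that */
}
```

Passing the rounded size to jit_alloc, jit_seal, and jit_free would make the real footprint of each JIT function visible in the code rather than leaving it implicit.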
Phase 2: "Return 42" — The Minimal JIT Function
The simplest possible function: no arguments, returns the integer 42.
In x86-64:
mov eax, 42 ; B8 2A 00 00 00
ret ; C3
In C, writing these bytes:
typedef int (*jit_int_fn)(void);
jit_int_fn jit_compile_return42(void) {
uint8_t *code = jit_alloc(16);
/* mov eax, 42 */
code[0] = 0xB8; /* opcode: MOV EAX, imm32 */
code[1] = 42; /* immediate: 42 (0x2A) */
code[2] = 0;
code[3] = 0;
code[4] = 0;
/* ret */
code[5] = 0xC3;
jit_seal(code, 16);
return (jit_int_fn)code;
}
int main(void) {
jit_int_fn f = jit_compile_return42();
printf("f() = %d\n", f()); /* prints: f() = 42 */
jit_free((void *)f, 16); /* explicit cast: function pointer back to void * */
return 0;
}
The CPU executes our 6 bytes as real instructions. It has no idea that we wrote them at runtime rather than at compile time. Machine code is machine code.
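When hand-encoding instructions, it helps to print the buffer and compare it with an assembler or disassembler listing. Below is a small sketch of such a debugging aid; jit_hex is a hypothetical helper introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging helper: format `len` code bytes as
   space-separated uppercase hex into `out` (capacity `cap`).
   Returns the number of characters written, excluding the NUL. */
size_t jit_hex(const uint8_t *code, size_t len, char *out, size_t cap) {
    size_t used = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* conservatively require room for "XX", a space, and the NUL */
        assert(used + 4 <= cap);
        used += (size_t)snprintf(out + used, cap - used, "%02X",
                                 (unsigned)code[i]);
        if (i + 1 < len) {
            out[used++] = ' ';
            out[used] = '\0';
        }
    }
    return used;
}
```

Applied to the six bytes written by jit_compile_return42, this produces the text B8 2A 00 00 00 C3, which matches the assembly listing at the start of this phase.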
Phase 3: "Increment Argument" — Using Parameters
A function that takes one argument and returns it + 1:
; int inc(int x)
; x arrives in EDI (System V ABI: first integer argument)
lea eax, [rdi + 1] ; 8D 47 01
ret ; C3
A note on the encoding: in 64-bit mode, memory operands use 64-bit addressing by default, so the bytes 8D 47 01 decode as lea eax, [rdi + 1], not lea eax, [edi + 1]. Writing [edi + 1] would require the address-size override prefix (67 8D 47 01), and adding REX.W (48 8D 47 01) would instead widen the destination to RAX. The plain form is already correct for an int: LEA only performs address arithmetic, the low 32 bits of rdi + 1 equal edi + 1, and writing EAX zero-extends into RAX. There are therefore two reasonable encodings:
- lea eax, [rdi + 1] (one instruction, 3 bytes)
- mov eax, edi followed by add eax, 1 (two instructions, 5 bytes)
We use the second form below because it introduces the MOV and ADD encodings that later phases reuse:
mov eax, edi ; 89 F8 (move edi → eax)
add eax, 1 ; 83 C0 01 (add 1, sign-extended byte)
ret ; C3
typedef int (*jit_inc_fn)(int);
jit_inc_fn jit_compile_inc(void) {
uint8_t *code = jit_alloc(16);
int pos = 0;
/* mov eax, edi: 89 F8 */
code[pos++] = 0x89;
code[pos++] = 0xF8;
/* add eax, 1: 83 C0 01 */
code[pos++] = 0x83;
code[pos++] = 0xC0;
code[pos++] = 0x01;
/* ret: C3 */
code[pos++] = 0xC3;
jit_seal(code, 16);
return (jit_inc_fn)code;
}
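The bytes 0xF8 and 0xC0 in the listing above are ModRM bytes: two bits of addressing mode, three bits for the reg field, and three bits for the rm field. Here is a sketch of how they are packed; the modrm function and the R_ register constants are illustrative names, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* x86 register numbers as used in the ModRM reg and rm fields */
enum { R_EAX = 0, R_ECX = 1, R_EDX = 2, R_EBX = 3,
       R_ESP = 4, R_EBP = 5, R_ESI = 6, R_EDI = 7 };

/* Pack a ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits).
   mod = 3 (binary 11) selects register-direct operands. */
uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

For mov eax, edi (opcode 89), the reg field holds the source and the rm field the destination, so modrm(3, R_EDI, R_EAX) yields 0xF8. For add eax, 1 (opcode 83), the reg field is the opcode extension 0 that selects ADD, so modrm(3, 0, R_EAX) yields 0xC0.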
Phase 4: Code Buffer with Emit Functions
Writing raw bytes directly is error-prone. A JIT compiler uses emit functions:
typedef struct {
uint8_t *code;
size_t pos;
size_t capacity;
} JitBuf;
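void emit_byte(JitBuf *buf, uint8_t b) {
if (buf->pos >= buf->capacity) { /* never write past the mapped region */
fprintf(stderr, "JIT buffer overflow\n");
exit(1);
}
buf->code[buf->pos++] = b;
}
void emit_imm32(JitBuf *buf, int32_t imm) {
/* little-endian 4-byte immediate; shift as unsigned to avoid
implementation-defined right shifts of negative values */
uint32_t u = (uint32_t)imm;
emit_byte(buf, u & 0xFF);
emit_byte(buf, (u >> 8) & 0xFF);
emit_byte(buf, (u >> 16) & 0xFF);
emit_byte(buf, (u >> 24) & 0xFF);
}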
/* mov eax, imm32 */
void emit_mov_eax_imm(JitBuf *buf, int32_t imm) {
emit_byte(buf, 0xB8); /* MOV EAX, imm32 */
emit_imm32(buf, imm);
}
/* add eax, imm8 (sign-extended) */
void emit_add_eax_imm8(JitBuf *buf, int8_t imm) {
emit_byte(buf, 0x83); /* ADD r/m32, imm8 */
emit_byte(buf, 0xC0); /* ModRM: EAX */
emit_byte(buf, (uint8_t)imm);
}
/* imul eax, edi (EAX = EAX * EDI) */
void emit_imul_eax_edi(JitBuf *buf) {
emit_byte(buf, 0x0F); /* 2-byte opcode prefix */
emit_byte(buf, 0xAF); /* IMUL r32, r/m32 */
emit_byte(buf, 0xC7); /* ModRM: dst=EAX, src=EDI */
}
/* ret */
void emit_ret(JitBuf *buf) {
emit_byte(buf, 0xC3);
}
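The emit_add_eax_imm8 helper above covers only immediates that fit in a signed byte. One possible extension, sketched here, picks the short form when it fits and falls back to the ADD EAX, imm32 encoding (opcode 05) otherwise. The name emit_add_eax_imm is hypothetical, and the buffer type is repeated so the sketch stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *code;
    size_t pos;
    size_t capacity;
} JitBuf;

void emit_byte(JitBuf *buf, uint8_t b) {
    assert(buf->pos < buf->capacity);     /* refuse to overrun the buffer */
    buf->code[buf->pos++] = b;
}

/* little-endian 4-byte immediate */
void emit_imm32(JitBuf *buf, int32_t imm) {
    uint32_t u = (uint32_t)imm;
    for (int shift = 0; shift < 32; shift += 8)
        emit_byte(buf, (uint8_t)(u >> shift));
}

/* Hypothetical helper: add eax, imm using the shortest encoding */
void emit_add_eax_imm(JitBuf *buf, int32_t imm) {
    if (imm >= -128 && imm <= 127) {
        emit_byte(buf, 0x83);             /* ADD r/m32, imm8 (sign-extended) */
        emit_byte(buf, 0xC0);             /* ModRM: register-direct, EAX */
        emit_byte(buf, (uint8_t)(int8_t)imm);
    } else {
        emit_byte(buf, 0x05);             /* ADD EAX, imm32 */
        emit_imm32(buf, imm);
    }
}
```

With this helper, adding 1 emits 83 C0 01, adding -1 emits 83 C0 FF, and adding 1000 emits 05 E8 03 00 00. Choosing among several valid encodings of one instruction is a routine part of real assemblers and JITs.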
With these primitives, generating a function becomes readable:
/* Compile: int times_n(int x, int n) { return x * n; } */
typedef int (*times_fn)(int x, int n);
times_fn jit_compile_multiply(void) {
uint8_t *mem = jit_alloc(64);
JitBuf buf = { .code = mem, .pos = 0, .capacity = 64 };
/* mov eax, edi — move first arg (x) to eax */
emit_byte(&buf, 0x89);
emit_byte(&buf, 0xF8);
/* imul eax, esi — multiply by second arg (n) */
emit_byte(&buf, 0x0F);
emit_byte(&buf, 0xAF);
emit_byte(&buf, 0xC6); /* ModRM: EAX, ESI */
emit_ret(&buf);
jit_seal(mem, 64);
return (times_fn)mem;
}
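The multiply example still spells out its MOV and IMUL bytes by hand. A natural next step, sketched below under the assumption that only the eight legacy 32-bit registers are needed (so no REX prefix), is to parameterize the emitters by register. Reg32, emit_mov_r32_r32, emit_imul_r32_r32, and emit_times_n are illustrative names. The sketch writes into any caller-supplied buffer, so the bytes can be inspected without executing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *code; size_t pos; size_t capacity; } JitBuf;

/* Register numbers for the eight legacy 32-bit registers */
typedef enum { R_EAX = 0, R_ECX = 1, R_EDX = 2, R_EBX = 3,
               R_ESP = 4, R_EBP = 5, R_ESI = 6, R_EDI = 7 } Reg32;

void emit_byte(JitBuf *buf, uint8_t b) {
    assert(buf->pos < buf->capacity);
    buf->code[buf->pos++] = b;
}

/* mov dst, src (opcode 89 /r: reg field = source, rm field = destination) */
void emit_mov_r32_r32(JitBuf *buf, Reg32 dst, Reg32 src) {
    emit_byte(buf, 0x89);
    emit_byte(buf, (uint8_t)(0xC0 | ((unsigned)src << 3) | (unsigned)dst));
}

/* imul dst, src (opcode 0F AF /r: reg field = destination, rm field = source) */
void emit_imul_r32_r32(JitBuf *buf, Reg32 dst, Reg32 src) {
    emit_byte(buf, 0x0F);
    emit_byte(buf, 0xAF);
    emit_byte(buf, (uint8_t)(0xC0 | ((unsigned)dst << 3) | (unsigned)src));
}

void emit_ret(JitBuf *buf) { emit_byte(buf, 0xC3); }

/* Emit the body of times_n(x, n); returns the number of bytes written */
size_t emit_times_n(JitBuf *buf) {
    emit_mov_r32_r32(buf, R_EAX, R_EDI);    /* eax = x (first argument) */
    emit_imul_r32_r32(buf, R_EAX, R_ESI);   /* eax *= n (second argument) */
    emit_ret(buf);
    return buf->pos;
}
```

The output is the same six bytes as jit_compile_multiply (89 F8 0F AF C6 C3). Note that the two opcodes place source and destination in opposite ModRM fields, which is exactly the kind of detail a named helper keeps out of the call site.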
Phase 5: Expression Evaluator
The goal: compile a simple arithmetic expression like (x + 5) * 3 to x86-64 machine code at runtime. We support integer constants, one variable (x), addition, and multiplication.
A tree-walking code generator:
typedef enum { CONST, VAR, ADD, MUL } NodeType;
typedef struct Node {
NodeType type;
int value; /* for CONST */
struct Node *left, *right; /* for ADD, MUL */
} Node;
/* Compile a node's result into EAX.
The variable x is in EDI (first argument). */
void emit_node(JitBuf *buf, Node *node) {
if (node->type == CONST) {
/* mov eax, constant */
emit_mov_eax_imm(buf, node->value);
} else if (node->type == VAR) {
/* mov eax, edi — variable x */
emit_byte(buf, 0x89);
emit_byte(buf, 0xF8);
} else if (node->type == ADD) {
/* Evaluate left → EAX */
emit_node(buf, node->left);
/* push rax (save left result; 64-bit mode has no 32-bit push) */
emit_byte(buf, 0x50);
/* Evaluate right → EAX */
emit_node(buf, node->right);
/* pop rcx (restore left result into ECX) */
emit_byte(buf, 0x59);
/* add eax, ecx */
emit_byte(buf, 0x01);
emit_byte(buf, 0xC8);
} else if (node->type == MUL) {
/* Same pattern but with imul */
emit_node(buf, node->left);
emit_byte(buf, 0x50); /* push rax */
emit_node(buf, node->right);
emit_byte(buf, 0x59); /* pop rcx */
/* imul eax, ecx */
emit_byte(buf, 0x0F);
emit_byte(buf, 0xAF);
emit_byte(buf, 0xC1);
}
}
typedef int (*expr_fn)(int);
expr_fn jit_compile_expr(Node *expr) {
uint8_t *mem = jit_alloc(256);
JitBuf buf = { .code = mem, .pos = 0, .capacity = 256 };
emit_node(&buf, expr);
emit_ret(&buf);
jit_seal(mem, 256);
return (expr_fn)mem;
}
Usage:
/* Build AST for (x + 5) * 3 */
Node n5 = { CONST, 5, NULL, NULL };
Node nx = { VAR, 0, NULL, NULL };
Node nadd = { ADD, 0, &nx, &n5 };
Node n3 = { CONST, 3, NULL, NULL };
Node nmul = { MUL, 0, &nadd, &n3 };
expr_fn f = jit_compile_expr(&nmul);
printf("f(4) = %d\n", f(4)); /* (4 + 5) * 3 = 27 */
printf("f(10) = %d\n", f(10)); /* (10 + 5) * 3 = 45 */
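A convenient way to check a JIT is to run the same tree through a plain interpreter and compare results. The following is a sketch of such a reference evaluator; eval_node is a hypothetical helper, and the Node definition is repeated so the block stands alone. It computes in unsigned arithmetic so that overflow wraps the way the 32-bit add and imul instructions do, rather than invoking undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONST, VAR, ADD, MUL } NodeType;
typedef struct Node {
    NodeType type;
    int value;                  /* for CONST */
    struct Node *left, *right;  /* for ADD, MUL */
} Node;

/* Reference interpreter: evaluates the tree directly for a given x */
int eval_node(const Node *node, int x) {
    switch (node->type) {
    case CONST:
        return node->value;
    case VAR:
        return x;
    case ADD:
        return (int)((unsigned)eval_node(node->left, x) +
                     (unsigned)eval_node(node->right, x));
    case MUL:
        return (int)((unsigned)eval_node(node->left, x) *
                     (unsigned)eval_node(node->right, x));
    }
    assert(!"unknown node type");
    return 0;
}
```

For the tree built above, eval_node(&nmul, 4) gives 27 and eval_node(&nmul, 10) gives 45, the same values the compiled function returns. Comparing the two over many inputs is a cheap test harness for any change to the code generator.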
The Generated Machine Code
For (x + 5) * 3:
; Compiled to:
mov eax, edi ; load x (for VAR node in ADD.left)
push rax ; save x (opcode 50 is a 64-bit push in long mode)
mov eax, 5 ; load 5 (CONST)
pop rcx ; restore x
add eax, ecx ; x + 5
push rax ; save x+5
mov eax, 3 ; load 3
pop rcx ; restore x+5
imul eax, ecx ; (x+5) * 3
ret
This is exactly what a compiler does, but made explicit. The JIT "compiles" the AST to machine code by recursively visiting nodes and emitting instruction bytes.
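Because every node type emits a fixed number of bytes, the size of the generated code can be computed from the tree before any memory is allocated. A sketch follows, assuming the byte counts of the emit_node scheme above; node_code_size and expr_code_size are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONST, VAR, ADD, MUL } NodeType;
typedef struct Node {
    NodeType type;
    int value;
    struct Node *left, *right;
} Node;

/* Bytes that emit_node would produce for this subtree */
size_t node_code_size(const Node *node) {
    switch (node->type) {
    case CONST:
        return 5;   /* B8 imm32 */
    case VAR:
        return 2;   /* 89 F8 */
    case ADD:       /* left, push (1), right, pop (1), add 01 C8 (2) */
        return node_code_size(node->left) + 1 +
               node_code_size(node->right) + 1 + 2;
    case MUL:       /* left, push (1), right, pop (1), imul 0F AF C1 (3) */
        return node_code_size(node->left) + 1 +
               node_code_size(node->right) + 1 + 3;
    }
    assert(!"unknown node type");
    return 0;
}

/* Whole function: expression body plus the trailing ret (C3) */
size_t expr_code_size(const Node *expr) {
    return node_code_size(expr) + 1;
}
```

For (x + 5) * 3 this gives 22 bytes, matching the listing. Sizing the allocation this way would let jit_compile_expr handle trees whose code exceeds its fixed 256-byte buffer.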
What This Demonstrates
- Machine code is just bytes: there is nothing magical about compiled code. It is sequences of bytes that the CPU interprets as instructions according to its encoding rules.
- Calling convention is crucial: the JIT knows to read x from EDI because that is where the System V ABI puts the first integer argument. JIT code must follow the same conventions as AOT-compiled code.
- The write-then-execute pattern is fundamental: every JIT compiler from V8 to LLVM ORC uses some variant of this pattern. Understanding why it exists (writable and executable simultaneously violates W^X) demystifies both JIT compilers and the malware flag from Chapter 34.
- Optimizing JITs are more complex but similar in principle: production JITs add register allocation (avoid push/pop), constant folding (evaluate 5 * 3 = 15 at JIT time), inlining, and loop optimization. The core mechanism (emit bytes, seal, call) is the same.
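Of the optimizations listed, constant folding is the easiest to try on the AST from Phase 5. Below is a minimal sketch of a folding pass that could run before code generation; fold_constants is a hypothetical name, and production JITs typically perform this on an intermediate representation rather than directly on the syntax tree.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONST, VAR, ADD, MUL } NodeType;
typedef struct Node {
    NodeType type;
    int value;
    struct Node *left, *right;
} Node;

/* Fold constant subtrees in place. Returns the same node pointer. */
Node *fold_constants(Node *node) {
    if (node->type != ADD && node->type != MUL)
        return node;                       /* leaves are already minimal */
    fold_constants(node->left);
    fold_constants(node->right);
    if (node->left->type == CONST && node->right->type == CONST) {
        unsigned a = (unsigned)node->left->value;
        unsigned b = (unsigned)node->right->value;
        /* unsigned arithmetic wraps like the 32-bit add/imul would */
        node->value = (int)(node->type == ADD ? a + b : a * b);
        node->type = CONST;
        node->left = node->right = NULL;
    }
    return node;
}
```

After folding, a subtree such as 5 * 3 becomes a single CONST node, so the generator emits one mov eax, 15 instead of two loads, a push, a pop, and an imul.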
This is a minimal JIT. LLVM ORC, V8's TurboFan, and Java HotSpot apply the same pattern at a scale and level of sophistication several orders of magnitude greater.