From the CPU's perspective, memory is a flat array of bytes addressed by a 64-bit integer. That's it. There is no type system, no boundary checking, no distinction between code and data (the hardware will execute data as code if you tell it to). The...
In This Chapter
Chapter 4: Memory
The Machine's View of Memory
From the CPU's perspective, memory is a flat array of bytes addressed by a 64-bit integer. That's it. There is no type system, no boundary checking, no distinction between code and data (the hardware will execute data as code if you tell it to). The entire virtual address space — 2^64 bytes in principle, 2^48 bytes in practice on current CPUs — is a single array of bytes.
The structure we see in a running program — the stack, the heap, the code section, the libraries — is imposed by the operating system and the linker, not by hardware. The hardware's job is to translate virtual addresses to physical addresses (via the MMU) and to enforce access permissions (via page protection bits). Everything else is software convention.
This chapter explains the memory model: how the address space is organized, what alignment means and why it matters, how the stack works at the hardware level, and how the NASM assembler declares data. By the end of this chapter, you should be able to look at a memory diagram of a running process and understand every region.
The 64-Bit Virtual Address Space
Every process on a 64-bit operating system runs in its own private virtual address space. The CPU presents the process with the illusion of having exclusive access to the entire 64-bit address space. Physical memory (DRAM) is much smaller — a typical server has 64GB to 1TB — but the virtual address space is 2^64 bytes = 16 exabytes.
In practice, current x86-64 CPUs implement only 48-bit virtual addresses. This is a hardware limitation: addresses must be "canonical" — bits 63:48 must all equal bit 47. Addresses in the range 0x0000000000000000 to 0x00007FFFFFFFFFFF (bits 47:0, with bit 47 = 0) are user space. Addresses in the range 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF (bit 47 = 1, upper bits = 1) are kernel space.
Virtual Address Space Layout (x86-64 Linux, typical)
0xFFFFFFFFFFFFFFFF ┬────────────────────────────────────────────────┐
│ │
│ KERNEL SPACE (~128 TB) │
│ (kernel text, data, page tables, vmalloc, │
│ direct physical memory map, etc.) │
│ │
0xFFFF800000000000 ┴────────────────────────────────────────────────┘
Non-canonical hole: 0x0001000000000000 to 0xFFFF7FFFFFFFFFFF
(accessing this causes #GP fault — General Protection exception)
0x00007FFFFFFFFFFF ┬────────────────────────────────────────────────┐
│ Stack (grows downward) │
│ ↓ │
│ ... │
│ │
│ Memory-mapped files, shared libraries │
│ (mmap region, starting near 0x7F0000000000) │
│ │
│ ... │
│ ↑ │
│ Heap (grows upward from program break) │
│ │
├────────────────────────────────────────────────┤
│ .bss (uninitialized data) │
├────────────────────────────────────────────────┤
│ .data (initialized data) │
├────────────────────────────────────────────────┤
│ .rodata (read-only data: string literals) │
├────────────────────────────────────────────────┤
│ .text (executable code) │
├────────────────────────────────────────────────┤
│ ELF headers │
0x0000000000400000 ┴────────────────────────────────────────────────┘
0x0000000000000000: NULL pointer (unmapped — any access faults)
This layout is for a non-PIE (Position-Independent Executable) program. PIE executables (the default in modern GCC) are loaded at a random address by ASLR (Address Space Layout Randomization), which we cover in Chapter 11.
Process Memory Segments
.text: Executable Code
The .text section contains machine code — the bytes that the CPU executes. It is mapped as read-execute: you can read the bytes (which is how function pointers work) and execute them, but you cannot write to this region. Writing to .text causes a segfault (kernel: page fault with write to read-only page).
The .text section begins at the ELF entry point (_start) and is laid out by the linker in the order functions appear in the source files.
section .text ; switch to the .text section
global _start ; declare _start as a global symbol
_start:
; code here goes into .text
mov rax, 1
; ...
.rodata: Read-Only Data
String literals, compile-time constants, and jump tables typically end up in .rodata. This section is mapped read-only: no writes, no execution.
section .rodata
greeting db "Hello", 0 ; null-terminated string
pi_approx dd 3.14159265 ; 32-bit float constant
error_codes dd 0, 1, 2, 3, 4 ; lookup table
.data: Initialized Data
Global and static variables with initial values go in .data. This section is read-write, not executable.
section .data
counter dq 0 ; 64-bit counter, initialized to 0
flag db 1 ; byte flag, initialized to 1
buffer db "default", 0 ; modifiable string (copies of it!)
.bss: Uninitialized Data
BSS stands for "Block Started by Symbol" (a historical name from early assemblers). Variables in .bss are zero-initialized but not stored in the file — the kernel/loader zeros the region when the program starts. This saves disk space: a 1MB zero-initialized buffer in .data would require 1MB in the binary file; in .bss, it requires only an entry in the section header indicating the size.
section .bss
large_buf resb 65536 ; 64KB zero-initialized buffer (not stored in ELF!)
counter resq 1 ; one 64-bit counter, initialized to 0
matrix resd 100 ; 100 32-bit integers, all zero
The resb, resw, resd, resq directives (reserve byte, word, doubleword, quadword) declare BSS space.
Heap: Dynamic Allocation
The heap is the region used by malloc(), new, and other dynamic allocators. It starts after the BSS section at the program break (a pointer maintained by the OS) and grows toward higher addresses.
In assembly, you access the heap through the mmap system call (preferred) or the legacy brk/sbrk calls:
; Allocate 4096 bytes using mmap (anonymous memory):
mov rax, 9 ; sys_mmap
mov rdi, 0 ; addr = 0 (OS chooses address)
mov rsi, 4096 ; length = 4096 bytes
mov rdx, 3 ; prot = PROT_READ | PROT_WRITE
mov r10, 0x22 ; flags = MAP_PRIVATE | MAP_ANONYMOUS
mov r8, -1 ; fd = -1 (anonymous mapping)
mov r9, 0 ; offset = 0
syscall
; rax now contains the address of the allocated memory (or error if < 0)
We cover dynamic memory allocation in full detail in Chapter 25.
Stack: Function Frames and Local Variables
The stack is the most important memory region for understanding assembly code. It grows toward lower addresses: each PUSH operation decrements RSP and stores a value; each POP operation loads a value and increments RSP.
High addresses
┌──────────────────────────┐ 0x7fffffffe020 ← initial RSP (set by kernel)
│ argc, argv, envp │ (command-line args and environment)
├──────────────────────────┤
│ caller's saved rbp │
├──────────────────────────┤
│ caller's local vars │
├──────────────────────────┤
│ return address │ ← pushed by CALL instruction
├──────────────────────────┤
│ current frame's │
│ saved rbp │ ← pushed in prologue (push rbp)
├──────────────────────────┤ ← RBP (after "mov rbp, rsp")
│ local variable 1 │ [rbp - 8]
│ local variable 2 │ [rbp - 16]
│ ... │
├──────────────────────────┤
│ (optional padding) │ for 16-byte alignment
├──────────────────────────┤ ← RSP (current top of stack)
│ (unallocated) │
└──────────────────────────┘ Low addresses
The stack has a maximum size (typically 8MB on Linux, set by ulimit -s). Stack overflow (exceeding this limit) produces a segfault, typically as a stack pointer going below the mapped region.
Memory Alignment
Alignment refers to the requirement that a value's address is a multiple of its size. An aligned 8-byte read must be from an address divisible by 8. An aligned 4-byte read from an address divisible by 4.
Why Alignment Matters
Performance: The CPU reads memory in cache lines (typically 64 bytes). An aligned 8-byte read always falls within a single cache line. A misaligned 8-byte read that straddles a cache line boundary requires two cache line reads — potentially 2x slower.
Correctness: Some SIMD instructions (especially older SSE instructions like MOVAPS) require 16-byte aligned operands and will fault on misaligned addresses. Newer instructions (MOVUPS) allow unaligned access but may be slower.
Atomicity: On x86-64, aligned loads and stores up to 8 bytes are guaranteed to be atomic (a concurrent reader will see either the old value or the new value, never a mix). Misaligned loads and stores lose this guarantee.
Alignment Requirements
| Data Type | Size | Required alignment |
|---|---|---|
char / byte |
1 | 1 (any address) |
short / word |
2 | 2 |
int / dword |
4 | 4 |
long / qword |
8 | 8 |
double |
8 | 8 |
long double (x87) |
10/16 | 16 |
__m128 (SSE) |
16 | 16 |
__m256 (AVX) |
32 | 32 |
__m512 (AVX-512) |
64 | 64 |
| Cache line | 64 | 64 |
Ensuring Alignment in NASM
section .data
align 8 ; pad to 8-byte boundary before next declaration
qword_val dq 0x123456789ABCDEF0 ; now guaranteed 8-byte aligned
align 16
xmm_val dq 0x1234567890ABCDEF, 0xFEDCBA9876543210 ; 128-bit, 16-byte aligned
section .bss
align 32
avx_buffer resb 256 ; 32-byte aligned for AVX operations
The linker also aligns sections by default. The .text section is typically page-aligned (4096 bytes). Individual symbols within a section may not be aligned unless you insert align directives.
Stack Alignment
The System V AMD64 ABI requires RSP to be 16-byte aligned before a CALL instruction. At function entry (immediately after CALL pushes the 8-byte return address), RSP is offset by 8 from 16-byte alignment. The standard function prologue:
_start: ; RSP is 16-byte aligned here (at program entry)
call some_function ; pushes 8-byte return address; RSP is now 8-byte aligned
; at entry to some_function
some_function: ; RSP ≡ 8 (mod 16) here
push rbp ; RSP -= 8; now RSP ≡ 0 (mod 16) ✓
mov rbp, rsp
sub rsp, 32 ; allocate local variables (must be multiple of 16)
; ... function body ...
mov rsp, rbp ; restore stack
pop rbp
ret
If you forget to maintain alignment and call a function that uses SSE instructions (which is most modern library code), you'll get a segfault on the misaligned MOVAPS or similar instruction.
Little-Endian Memory Layout
x86-64 stores multi-byte values in memory with the least-significant byte at the lowest address. This is little-endian byte order.
Given:
section .data
value dq 0x0102030405060708 ; 64-bit value
Memory layout (addresses shown are relative to the start of value):
Offset +0: 0x08 ← least significant byte
Offset +1: 0x07
Offset +2: 0x06
Offset +3: 0x05
Offset +4: 0x04
Offset +5: 0x03
Offset +6: 0x02
Offset +7: 0x01 ← most significant byte
When you read this back with mov rax, [value], you get 0x0102030405060708 — the correct value. The byte reversal is invisible to arithmetic operations. It only becomes visible when you examine memory byte-by-byte (in GDB's x/8xb output) or when exchanging data with big-endian systems.
Pointers: Just Numbers in Registers
In assembly, a "pointer" is simply a 64-bit integer that happens to contain an address. There is no pointer type, no null check, no bounds check. Writing a pointer to a register makes that register a pointer. Dereferencing a pointer means loading from or storing to the address it contains.
; C equivalent: long *p = &array[0]; *p = 42;
lea rdi, [array] ; rdi = address of array[0] (a "pointer")
mov QWORD [rdi], 42 ; *rdi = 42 (dereference and store)
; C equivalent: long x = *(p + 3); (pointer arithmetic, qword = 8 bytes per element)
mov rax, QWORD [rdi + 3*8] ; rax = array[3]
; C equivalent: p++; (advance pointer to next element)
add rdi, 8 ; add 8 (size of long) to pointer
The LEA (Load Effective Address) instruction computes an address and stores it in a register without accessing memory:
lea rax, [rbx + rcx*8 + 16] ; rax = rbx + rcx*8 + 16 (NO MEMORY ACCESS)
mov rax, QWORD [rbx + rcx*8 + 16] ; rax = memory at address rbx+rcx*8+16 (MEMORY ACCESS)
LEA is also used as a cheap arithmetic instruction:
lea rax, [rbx + rbx*2] ; rax = rbx * 3 (without using IMUL)
lea rax, [rbx + 1] ; rax = rbx + 1 (without modifying flags, unlike ADD)
PUSH and POP Mechanics
PUSH and POP are the primary instructions for managing the stack. Their behavior is precisely defined:
push rax ; Equivalent to: sub rsp, 8 ; mov [rsp], rax
pop rbx ; Equivalent to: mov rbx, [rsp] ; add rsp, 8
Stack grows downward: PUSH decrements RSP before storing. This means RSP always points to the top (lowest) occupied stack slot, not to the next free slot.
Before PUSH rax: After PUSH rax:
┌──────────┐ 0x100 ┌──────────┐ 0x100
│ (free) │ │ (free) │
├──────────┤ 0x0F8 ← RSP ├──────────┤ 0x0F8
│ ... │ │ value of │
└──────────┘ │ RAX │
├──────────┤ 0x0F0 ← RSP (new)
└──────────┘
PUSH and POP only operate on 64-bit values in 64-bit mode (you can override to 16-bit with a prefix, but not to 32-bit). There is no push eax in 64-bit mode; use push rax.
NASM Data Declarations in Detail
Initialized Data (in .data or .rodata)
section .data
; Bytes
single_byte db 42 ; one byte: 0x2A
char_val db 'A' ; one byte: 0x41 (ASCII 'A')
string db "Hello", 0 ; 6 bytes: H, e, l, l, o, \0
neg_byte db -1 ; one byte: 0xFF (two's complement)
; Words (16-bit)
word_val dw 0x1234 ; two bytes: 0x34, 0x12 (little-endian!)
neg_word dw -1 ; two bytes: 0xFF, 0xFF
; Doublewords (32-bit)
dword_val dd 0x12345678 ; four bytes
float_val dd 3.14159 ; 32-bit IEEE 754 float
; Quadwords (64-bit)
qword_val dq 0x123456789ABCDEF0 ; eight bytes
double_val dq 3.14159265358979 ; 64-bit IEEE 754 double
; Multiple values
array dd 1, 2, 3, 4, 5 ; five 32-bit integers
mixed db 0x01, 0x02, "AB" ; bytes from different sources
; The $ operator (current position) and $$ (section start)
message db "Hello, World!"
msg_len equ $ - message ; compile-time constant: length of message
section_offset equ $ - $$ ; bytes from start of section
; TIMES: repeated initialization
zeros times 10 db 0 ; 10 zero bytes
pattern times 4 dw 0xAAAA ; four 16-bit 0xAAAA values
BSS (Uninitialized)
section .bss
; Reserve N units:
byte_buf resb 1024 ; 1024 bytes
word_buf resw 512 ; 512 words = 1024 bytes
dword_buf resd 256 ; 256 dwords = 1024 bytes
qword_buf resq 128 ; 128 qwords = 1024 bytes
EQU: Compile-Time Constants
; EQU defines a symbol that is replaced at assembly time (like #define in C)
MAX_SIZE equ 1024
STDOUT equ 1
SYS_WRITE equ 1
SYS_EXIT equ 60
; Usage:
mov rax, SYS_WRITE ; assembles as: mov rax, 1
mov rdi, STDOUT ; assembles as: mov rdi, 1
mov rdx, MAX_SIZE ; assembles as: mov rdx, 1024
EQU constants are not allocated in memory. They exist only in the assembler's symbol table and are substituted during assembly.
Addressing Modes Preview
Memory operands in NASM use brackets. The general form is:
[base + index*scale + displacement]
Where:
- base: any 64-bit register
- index: any 64-bit register except RSP, multiplied by scale
- scale: 1, 2, 4, or 8 (for arrays of 1, 2, 4, or 8-byte elements)
- displacement: a signed 32-bit constant (sign-extended to 64 bits)
Examples:
mov rax, [rbp - 8] ; base+disp: local variable
mov rax, [rdi + rcx*8] ; base+index*scale: array element
mov rax, [rdi + rcx*8 + 16] ; base+index*scale+disp: struct field in array
mov rax, [0x400000] ; absolute address (rare in PIE code)
mov rax, [rel label] ; RIP-relative: address = RIP + signed 32-bit offset
The size of the memory operation (byte, word, dword, qword) is determined by the register size or by an explicit size override:
mov rax, [rdi] ; 64-bit load (rax is 64-bit)
mov eax, [rdi] ; 32-bit load (eax is 32-bit; also zeroes upper 32 of rax)
mov ax, [rdi] ; 16-bit load (ax is 16-bit; does NOT zero upper 48 bits!)
mov al, [rdi] ; 8-bit load
mov QWORD [rdi], rax ; 64-bit store (size must match register or be explicit)
mov DWORD [rdi], 42 ; 32-bit store of immediate (size must be explicit for imm)
Full coverage of addressing modes is in Chapter 8.
Reading Your Process's Memory Layout
On Linux, every process can read its own memory map from /proc/self/maps. Here's a NASM program that does exactly that:
; readmaps.asm — read and print /proc/self/maps
; Shows the actual memory layout of the running process
; Build: nasm -f elf64 readmaps.asm -o readmaps.o && ld readmaps.o -o readmaps
section .data
proc_path db "/proc/self/maps", 0
section .bss
fd resq 1 ; file descriptor
buf resb 4096 ; read buffer (one page)
section .text
global _start
_start:
; Open /proc/self/maps (O_RDONLY = 0)
mov rax, 2 ; sys_open
lea rdi, [rel proc_path]
xor esi, esi ; O_RDONLY = 0
xor edx, edx ; mode = 0 (ignored for O_RDONLY)
syscall
mov [rel fd], rax ; save file descriptor
test rax, rax ; fd < 0 means error
js .error
.read_loop:
; Read up to 4096 bytes
mov rax, 0 ; sys_read
mov rdi, [rel fd]
lea rsi, [rel buf]
mov rdx, 4096
syscall
test rax, rax ; 0 bytes = EOF; negative = error
jle .done
; Write what we read to stdout
mov rdx, rax ; bytes to write = bytes just read
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
lea rsi, [rel buf]
syscall
jmp .read_loop
.done:
; Close the file
mov rax, 3 ; sys_close
mov rdi, [rel fd]
syscall
; Exit
mov rax, 60
xor rdi, rdi
syscall
.error:
; Exit with error code 1
mov rax, 60
mov rdi, 1
syscall
Sample output (actual addresses vary with ASLR):
55f4a1234000-55f4a1235000 r-xp 00000000 08:02 1234567 /path/to/readmaps
55f4a1435000-55f4a1436000 r--p 00001000 08:02 1234567 /path/to/readmaps
55f4a1436000-55f4a1437000 rw-p 00002000 08:02 1234567 /path/to/readmaps
7f8b23456000-7f8b23600000 r-xp 00000000 08:02 987654 /lib/x86_64-linux-gnu/libc.so.6
7f8b23800000-7f8b23801000 r--p 001aa000 08:02 987654 /lib/x86_64-linux-gnu/libc.so.6
7f8b23801000-7f8b23803000 rw-p 001ab000 08:02 987654 /lib/x86_64-linux-gnu/libc.so.6
7f8b23a00000-7f8b23a21000 r-xp 00000000 08:02 111111 /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffd12345000-7ffd12366000 rw-p 00000000 00:00 0 [stack]
7ffd12390000-7ffd12394000 r--p 00000000 00:00 0 [vvar]
7ffd12394000-7ffd12396000 r-xp 00000000 00:00 0 [vdso]
Each line shows: address range, permissions (r=read, w=write, x=execute, p=private/cow), file offset, device, inode, and the mapped file or region name.
The permissions tell you exactly which regions the chapter described:
- The first r-xp entry is .text (read-execute, private): the code
- The r--p entry is .rodata (read-only, private): string literals
- The rw-p entry is .data/.bss (read-write, private): variables
- The [stack] region is rw-p but does not have the execute bit (NX protection)
- Note: libc.so.6 appears with its own r-xp, r--p, and rw-p segments
🔐 Security Note: The lack of the
x(execute) permission on the stack is the No-eXecute (NX) or DEP (Data Execution Prevention) protection. If you try to jump to the stack and execute code there, the CPU will raise a protection fault. This mitigation prevents classic shellcode injection attacks. We cover this in Chapter 11 (buffer overflows) and Chapter 35 (modern exploitation mitigations).
Summary
Memory in x86-64 is a flat virtual address space with enforced permission regions. The operating system maps the ELF binary's segments into memory, sets up a stack, and begins execution. The programmer's view consists of four main static regions (.text, .rodata, .data, .bss), the heap (dynamically allocated), and the stack (function frames).
Alignment matters for performance and correctness. The stack must stay 16-byte aligned. SIMD operations have their own alignment requirements. Data in memory is stored little-endian.
NASM's data declaration directives (db, dw, dd, dq, resb, resw, resd, resq) give you precise control over what bytes appear in memory at what positions.
The /proc/self/maps interface on Linux lets you inspect the memory layout of any running process. This is your window into the virtual address space.
🔄 Check Your Understanding: A function has the following local variable declarations. What is the minimum stack space needed (in bytes) to hold them all, with correct alignment for each? -
int32_t a;(4 bytes, requires 4-byte alignment) -int64_t b;(8 bytes, requires 8-byte alignment) -char c;(1 byte, any alignment) -double d;(8 bytes, requires 8-byte alignment)
Answer
Naive packing: 4 + 8 + 1 + 8 = 21 bytes. But alignment padding is required.Typical compiler layout (largest first to minimize padding): -
bat [rbp-8]: 8 bytes, 8-byte aligned ✓ -dat [rbp-16]: 8 bytes, 8-byte aligned ✓ -aat [rbp-20]: 4 bytes, 4-byte aligned ✓ -cat [rbp-21]: 1 byte, any alignment ✓ - Padding to restore 16-byte stack alignment: 3 bytes - Total:sub rsp, 24(24 bytes, which is 16-byte aligned + the 8-byte return address padding = RSP stays aligned)The compiler decides the exact layout, but the total stack allocation for a function's local variables must be a multiple of 16 bytes (after accounting for the saved RBP on the stack). Here, 24 bytes is the minimum that satisfies all alignment requirements and keeps RSP 16-byte aligned.