Chapter 4: Memory

Open Assembly Language Project


In This Chapter

The Machine's View of Memory
The 64-Bit Virtual Address Space
Process Memory Segments
Memory Alignment
Little-Endian Memory Layout
Pointers: Just Numbers in Registers
PUSH and POP Mechanics
NASM Data Declarations in Detail
Addressing Modes Preview
Reading Your Process's Memory Layout
Summary


The Machine's View of Memory

From the CPU's perspective, memory is a flat array of bytes addressed by a 64-bit integer. That's it. There is no type system, no bounds checking, and no inherent distinction between code and data (the hardware will execute data as code if you tell it to and the page permissions allow it). The entire virtual address space (2^64 bytes in principle, 2^48 bytes in practice on most current CPUs) is a single array of bytes.

The structure we see in a running program — the stack, the heap, the code section, the libraries — is imposed by the operating system and the linker, not by hardware. The hardware's job is to translate virtual addresses to physical addresses (via the MMU) and to enforce access permissions (via page protection bits). Everything else is software convention.

This chapter explains the memory model: how the address space is organized, what alignment means and why it matters, how the stack works at the hardware level, and how the NASM assembler declares data. By the end of this chapter, you should be able to look at a memory diagram of a running process and understand every region.

The 64-Bit Virtual Address Space

Every process on a 64-bit operating system runs in its own private virtual address space. The CPU presents the process with the illusion of having exclusive access to the entire 64-bit address space. Physical memory (DRAM) is much smaller — a typical server has 64GB to 1TB — but the virtual address space is 2^64 bytes = 16 exabytes.

In practice, most current x86-64 CPUs implement 48-bit virtual addresses (CPUs with 5-level paging extend this to 57 bits). The hardware requires every address to be "canonical": bits 63:48 must all equal bit 47. Addresses in the range 0x0000000000000000 to 0x00007FFFFFFFFFFF (bit 47 = 0, upper bits = 0) are user space. Addresses in the range 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF (bit 47 = 1, upper bits = 1) are kernel space. Each half spans 2^47 bytes = 128 TB.
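The canonical-address rule can be checked with two shifts. The following is a minimal sketch, not part of the original chapter: is_canonical is a hypothetical helper name, and the shift count of 16 assumes 48-bit addressing (a CPU using 57-bit addresses would need a count of 7). The program exits with status 0 if all three checks behave as expected.

```nasm
; canon.asm: test whether addresses are canonical under 48-bit addressing
; Build: nasm -f elf64 canon.asm -o canon.o && ld canon.o -o canon

section .text
    global _start

; is_canonical (hypothetical helper): rdi = address
; Returns eax = 1 if bits 63:48 all equal bit 47, else 0.
is_canonical:
    mov   rax, rdi
    shl   rax, 16           ; discard bits 63:48
    sar   rax, 16           ; refill them with copies of bit 47
    cmp   rax, rdi          ; unchanged means the address was canonical
    sete  al
    movzx eax, al
    ret

_start:
    mov   rdi, 0x00007FFFFFFFFFFF   ; highest user-space address
    call  is_canonical
    cmp   eax, 1
    jne   .fail

    mov   rdi, 0xFFFF800000000000   ; lowest kernel-space address
    call  is_canonical
    cmp   eax, 1
    jne   .fail

    mov   rdi, 0x0000800000000000   ; bit 47 set, upper bits clear: non-canonical
    call  is_canonical
    test  eax, eax
    jnz   .fail

    xor   edi, edi          ; exit status 0: all checks passed
    jmp   .exit
.fail:
    mov   edi, 1            ; exit status 1: a check failed
.exit:
    mov   eax, 60           ; sys_exit
    syscall
```

Run it and inspect the status with echo $? in the shell.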

Virtual Address Space Layout (x86-64 Linux, typical)

0xFFFFFFFFFFFFFFFF ┬────────────────────────────────────────────────┐
                   │                                                 │
                   │         KERNEL SPACE (~128 TB)                 │
                   │  (kernel text, data, page tables, vmalloc,     │
                   │   direct physical memory map, etc.)            │
                   │                                                 │
0xFFFF800000000000 ┴────────────────────────────────────────────────┘

       Non-canonical hole: 0x0000800000000000 to 0xFFFF7FFFFFFFFFFF
       (accessing this causes #GP fault — General Protection exception)

0x00007FFFFFFFFFFF ┬────────────────────────────────────────────────┐
                   │  Stack (grows downward)                        │
                   │  ↓                                             │
                   │  ...                                           │
                   │                                                 │
                   │  Memory-mapped files, shared libraries         │
                   │  (mmap region, starting near 0x7F0000000000)  │
                   │                                                 │
                   │  ...                                           │
                   │  ↑                                             │
                   │  Heap (grows upward from program break)        │
                   │                                                 │
                   ├────────────────────────────────────────────────┤
                   │  .bss (uninitialized data)                     │
                   ├────────────────────────────────────────────────┤
                   │  .data (initialized data)                      │
                   ├────────────────────────────────────────────────┤
                   │  .rodata (read-only data: string literals)     │
                   ├────────────────────────────────────────────────┤
                   │  .text (executable code)                       │
                   ├────────────────────────────────────────────────┤
                   │  ELF headers                                   │
0x0000000000400000 ┴────────────────────────────────────────────────┘

0x0000000000000000: NULL pointer (unmapped — any access faults)

This layout is for a non-PIE program (PIE stands for Position-Independent Executable). PIE executables, which GCC produces by default on most current Linux distributions, are loaded at a randomized base address by ASLR (Address Space Layout Randomization), which we cover in Chapter 11.

Process Memory Segments

.text: Executable Code

The .text section contains machine code: the bytes that the CPU executes. It is mapped read-execute: you can read the bytes (which is how a debugger or disassembler inspects code) and execute them, but you cannot write to this region. Writing to .text causes a segfault (the kernel sees a page fault caused by a write to a read-only page).

The ELF entry point (_start by default) is an address inside .text, not necessarily its first byte. The linker builds .text by concatenating the .text contributions of the object files in the order they appear on its command line; within one object file, code appears in source order.

section .text      ; switch to the .text section
global _start      ; declare _start as a global symbol

_start:
    ; code here goes into .text
    mov rax, 1
    ; ...

.rodata: Read-Only Data

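String literals, compile-time constants, and jump tables typically end up in .rodata. With current GNU toolchains this section is mapped read-only and non-executable: no writes, no execution. Older linkers placed it in the same read-execute segment as .text.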

section .rodata
    greeting    db "Hello", 0           ; null-terminated string
    pi_approx   dd 3.14159265           ; 32-bit float constant
    error_codes dd 0, 1, 2, 3, 4       ; lookup table

.data: Initialized Data

Global and static variables with initial values go in .data. This section is read-write, not executable.

section .data
    counter     dq 0                    ; 64-bit counter, initialized to 0
    flag        db 1                    ; byte flag, initialized to 1
    buffer      db "default", 0        ; modifiable string (writable, unlike .rodata)

.bss: Uninitialized Data

BSS stands for "Block Started by Symbol" (a historical name from early assemblers). Variables in .bss are zero-initialized but not stored in the file — the kernel/loader zeros the region when the program starts. This saves disk space: a 1MB zero-initialized buffer in .data would require 1MB in the binary file; in .bss, it requires only an entry in the section header indicating the size.

section .bss
    large_buf   resb 65536     ; 64KB zero-initialized buffer (not stored in ELF!)
    counter     resq 1         ; one 64-bit counter, initialized to 0
    matrix      resd 100       ; 100 32-bit integers, all zero

The resb, resw, resd, resq directives (reserve byte, word, doubleword, quadword) declare BSS space.

Heap: Dynamic Allocation

The heap is the region used by malloc(), new, and other dynamic allocators. It starts after the BSS section at the program break (a pointer maintained by the OS) and grows toward higher addresses.

In assembly, you obtain heap memory through the mmap system call (preferred) or the legacy brk system call (sbrk is a C library wrapper around brk, not a system call of its own):

; Allocate 4096 bytes using mmap (anonymous memory):
mov  rax, 9              ; sys_mmap
mov  rdi, 0              ; addr = 0 (OS chooses address)
mov  rsi, 4096           ; length = 4096 bytes
mov  rdx, 3              ; prot = PROT_READ | PROT_WRITE
mov  r10, 0x22           ; flags = MAP_PRIVATE | MAP_ANONYMOUS
mov  r8, -1              ; fd = -1 (anonymous mapping)
mov  r9, 0               ; offset = 0
syscall
; rax now contains the address of the allocated memory (or error if < 0)

We cover dynamic memory allocation in full detail in Chapter 25.
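As a complement to the fragment above, here is a minimal sketch of a complete allocate/use/release cycle. It is an illustration rather than the chapter's own code. It assumes the Linux x86-64 system call numbers 9 (mmap), 11 (munmap), and 60 (exit), and it uses the kernel's convention that a failed system call returns a value in the range -4095 to -1 (a negated errno).

```nasm
; mmapdemo.asm: allocate one page, write to it, read it back, release it
; Build: nasm -f elf64 mmapdemo.asm -o mmapdemo.o && ld mmapdemo.o -o mmapdemo

section .text
    global _start

_start:
    mov   eax, 9            ; sys_mmap
    xor   edi, edi          ; addr = 0 (kernel chooses the address)
    mov   esi, 4096         ; length = one page
    mov   edx, 3            ; prot = PROT_READ | PROT_WRITE
    mov   r10d, 0x22        ; flags = MAP_PRIVATE | MAP_ANONYMOUS
    mov   r8, -1            ; fd = -1 (anonymous mapping)
    xor   r9d, r9d          ; offset = 0
    syscall

    cmp   rax, -4095        ; failures come back as -errno, in -4095..-1
    jae   .fail             ; the unsigned compare catches exactly that range

    mov   rbx, rax          ; keep the mapping's address (syscall preserves rbx)
    mov   qword [rbx], 42   ; the page is writable...
    cmp   qword [rbx], 42   ; ...and readable
    jne   .fail

    mov   eax, 11           ; sys_munmap
    mov   rdi, rbx          ; addr = start of the mapping
    mov   esi, 4096         ; length = same size we mapped
    syscall
    test  rax, rax          ; munmap returns 0 on success
    jnz   .fail

    xor   edi, edi          ; exit status 0: everything worked
    jmp   .exit
.fail:
    mov   edi, 1            ; exit status 1: a step failed
.exit:
    mov   eax, 60           ; sys_exit
    syscall
```

The 32-bit register writes (mov eax, 9 and similar) zero the upper halves of the full registers, so they load the same values as the 64-bit forms with shorter encodings.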

Stack: Function Frames and Local Variables

The stack is the most important memory region for understanding assembly code. It grows toward lower addresses: each PUSH operation decrements RSP and stores a value; each POP operation loads a value and increments RSP.

High addresses
┌──────────────────────────┐ 0x7fffffffe020  ← initial RSP (set by kernel)
│  argc, argv, envp        │  (command-line args and environment)
├──────────────────────────┤
│  caller's saved rbp      │
├──────────────────────────┤
│  caller's local vars     │
├──────────────────────────┤
│  return address          │  ← pushed by CALL instruction
├──────────────────────────┤
│  current frame's         │
│  saved rbp               │  ← pushed in prologue (push rbp)
├──────────────────────────┤  ← RBP (after "mov rbp, rsp")
│  local variable 1        │  [rbp - 8]
│  local variable 2        │  [rbp - 16]
│  ...                     │
├──────────────────────────┤
│  (optional padding)      │  for 16-byte alignment
├──────────────────────────┤  ← RSP (current top of stack)
│  (unallocated)           │
└──────────────────────────┘  Low addresses

The stack has a maximum size (typically 8MB on Linux, set by ulimit -s). Exceeding this limit (stack overflow) produces a segfault: the access lands below the lowest address the kernel is willing to map for the stack.

Memory Alignment

Alignment is the requirement that a value's address be a multiple of some power of two, usually the value's size. An aligned 8-byte read uses an address divisible by 8; an aligned 4-byte read uses an address divisible by 4.
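Because alignments are powers of two, both testing and fixing alignment reduce to bit masks. The sketch below illustrates this under stated assumptions: is_aligned8 and align_up16 are hypothetical helper names, not routines from this book. The program exits with status 0 if every check passes.

```nasm
; aligncheck.asm: test and round up alignment with bit masks
; Build: nasm -f elf64 aligncheck.asm -o aligncheck.o && ld aligncheck.o -o aligncheck

section .text
    global _start

; is_aligned8 (hypothetical helper): rdi = address
; Returns eax = 1 if the address is a multiple of 8, else 0.
is_aligned8:
    xor   eax, eax
    test  rdi, 7            ; a multiple of 8 has its low three bits clear
    setz  al
    ret

; align_up16 (hypothetical helper): rdi = value
; Returns rax = value rounded up to the next multiple of 16.
align_up16:
    lea   rax, [rdi + 15]   ; step past the next boundary unless already on one
    and   rax, -16          ; clear the low four bits
    ret

_start:
    mov   edi, 0x1008       ; divisible by 8
    call  is_aligned8
    cmp   eax, 1
    jne   .fail

    mov   edi, 0x100C       ; divisible by 4 but not by 8
    call  is_aligned8
    test  eax, eax
    jnz   .fail

    mov   edi, 21           ; 21 rounds up to 32
    call  align_up16
    cmp   rax, 32
    jne   .fail

    mov   edi, 32           ; already aligned: stays 32
    call  align_up16
    cmp   rax, 32
    jne   .fail

    xor   edi, edi          ; exit status 0: all checks passed
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, 60           ; sys_exit
    syscall
```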

Why Alignment Matters

Performance: The CPU reads memory in cache lines (typically 64 bytes). An aligned 8-byte read always falls within a single cache line. A misaligned 8-byte read that straddles a cache line boundary requires two cache line reads — potentially 2x slower.

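Correctness: Some SIMD instructions (such as the aligned SSE moves MOVAPS and MOVDQA) require 16-byte aligned memory operands and fault on misaligned addresses. Their unaligned counterparts (MOVUPS, MOVDQU) accept any address; they were noticeably slower on older CPUs, while on recent CPUs the penalty mostly appears when an access crosses a cache line.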

Atomicity: On x86-64, aligned loads and stores up to 8 bytes are guaranteed to be atomic (a concurrent reader will see either the old value or the new value, never a mix). Misaligned loads and stores lose this guarantee.

Alignment Requirements

Data type	Size (bytes)	Required alignment (bytes)
`char` / `byte`	1	1 (any address)
`short` / `word`	2	2
`int` / `dword`	4	4
`long` / `qword`	8	8
`double`	8	8
`long double` (x87)	10/16	16
`__m128` (SSE)	16	16
`__m256` (AVX)	32	32
`__m512` (AVX-512)	64	64
Cache line	64	64

Ensuring Alignment in NASM

section .data
    align 8            ; pad to 8-byte boundary before next declaration
    qword_val  dq 0x123456789ABCDEF0    ; now guaranteed 8-byte aligned

    align 16
    xmm_val    dq 0x1234567890ABCDEF, 0xFEDCBA9876543210  ; 128-bit, 16-byte aligned

section .bss
    alignb 32               ; in .bss use alignb: it pads by reserving space, not by emitting bytes
    avx_buffer  resb 256    ; 32-byte aligned for AVX operations

The linker also aligns sections and segments. Loadable segments are page-aligned (4096 bytes), so the start of the code segment is too; individual sections carry their own, usually smaller, alignment. Symbols within a section are not aligned unless you insert align directives.

Stack Alignment

The System V AMD64 ABI requires RSP to be 16-byte aligned before a CALL instruction. At function entry (immediately after CALL pushes the 8-byte return address), RSP is offset by 8 from 16-byte alignment. The standard function prologue:

_start:                     ; RSP is 16-byte aligned here (at program entry)
    call  some_function     ; pushes 8-byte return address; RSP is now 8-byte aligned
                            ; at entry to some_function

some_function:              ; RSP ≡ 8 (mod 16) here
    push  rbp               ; RSP -= 8; now RSP ≡ 0 (mod 16) ✓
    mov   rbp, rsp
    sub   rsp, 32           ; allocate local variables (must be multiple of 16)
    ; ... function body ...
    mov   rsp, rbp          ; restore stack
    pop   rbp
    ret

If you forget to maintain alignment and call a function that uses aligned SSE instructions on the stack (common in compiled library code), the program can segfault on a MOVAPS or similar instruction.
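The alignment states described above can be observed directly. The following sketch is an illustration under assumptions: probe is a hypothetical function name, and the program relies on the ABI guarantee that RSP is 16-byte aligned at process entry. It checks RSP at function entry and again after the prologue, then performs an aligned store that would fault if the frame were misaligned. Exit status 0 means both checks passed.

```nasm
; stackalign.asm: observe stack alignment at entry and after the prologue
; Build: nasm -f elf64 stackalign.asm -o stackalign.o && ld stackalign.o -o stackalign

section .text
    global _start

; probe (hypothetical): returns eax = 0 if RSP has the expected alignment
; both at entry and after the prologue, 1 otherwise.
probe:
    mov   rax, rsp
    and   eax, 15
    cmp   eax, 8            ; at entry: RSP ≡ 8 (mod 16)
    jne   .bad
    push  rbp
    mov   rbp, rsp
    sub   rsp, 16           ; one 16-byte local slot
    test  rsp, 15           ; after the prologue: RSP ≡ 0 (mod 16)
    jnz   .bad_frame
    xorps  xmm0, xmm0
    movaps [rsp], xmm0      ; aligned store: faults if RSP is not 16-byte aligned
    xor   eax, eax          ; report success
    leave                   ; same as: mov rsp, rbp ; pop rbp
    ret
.bad_frame:
    leave
.bad:
    mov   eax, 1
    ret

_start:                     ; RSP is 16-byte aligned at process entry
    call  probe
    mov   edi, eax          ; exit status = probe's result
    mov   eax, 60           ; sys_exit
    syscall
```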

Little-Endian Memory Layout

x86-64 stores multi-byte values in memory with the least-significant byte at the lowest address. This is little-endian byte order.

Given:

section .data
    value  dq 0x0102030405060708    ; 64-bit value

Memory layout (addresses shown are relative to the start of value):

Offset +0: 0x08    ← least significant byte
Offset +1: 0x07
Offset +2: 0x06
Offset +3: 0x05
Offset +4: 0x04
Offset +5: 0x03
Offset +6: 0x02
Offset +7: 0x01    ← most significant byte

When you read this back with mov rax, [value], you get 0x0102030405060708 — the correct value. The byte reversal is invisible to arithmetic operations. It only becomes visible when you examine memory byte-by-byte (in GDB's x/8xb output) or when exchanging data with big-endian systems.
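The byte order can also be confirmed from within a program by loading the same storage at different widths. This is a minimal sketch for illustration, reusing the value declaration from above. It exits with status 0 if each load sees the bytes that the layout table predicts.

```nasm
; endian.asm: read one 64-bit value back at several widths
; Build: nasm -f elf64 endian.asm -o endian.o && ld endian.o -o endian

section .data
    value  dq 0x0102030405060708

section .text
    global _start

_start:
    movzx eax, byte [rel value]        ; offset +0: least significant byte
    cmp   eax, 0x08
    jne   .fail

    movzx eax, byte [rel value + 7]    ; offset +7: most significant byte
    cmp   eax, 0x01
    jne   .fail

    mov   eax, dword [rel value]       ; offsets +0..+3: the low half
    cmp   eax, 0x05060708
    jne   .fail

    mov   rax, [rel value]
    bswap rax                          ; reverse the byte order (big-endian view)
    mov   rdx, 0x0807060504030201
    cmp   rax, rdx
    jne   .fail

    xor   edi, edi                     ; exit status 0: all checks passed
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, 60                      ; sys_exit
    syscall
```

BSWAP is the usual way to convert a register between little-endian and big-endian representations, for example when handling network byte order.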

Pointers: Just Numbers in Registers

In assembly, a "pointer" is simply a 64-bit integer that happens to contain an address. There is no pointer type, no null check, no bounds check. Writing a pointer to a register makes that register a pointer. Dereferencing a pointer means loading from or storing to the address it contains.

; C equivalent: long *p = &array[0]; *p = 42;
lea  rdi, [rel array]   ; rdi = address of array[0] (a "pointer"), computed RIP-relative
mov  QWORD [rdi], 42    ; *rdi = 42 (dereference and store)

; C equivalent: long x = *(p + 3); (pointer arithmetic, qword = 8 bytes per element)
mov  rax, QWORD [rdi + 3*8]   ; rax = array[3]

; C equivalent: p++;  (advance pointer to next element)
add  rdi, 8             ; add 8 (size of long) to pointer

The LEA (Load Effective Address) instruction computes an address and stores it in a register without accessing memory:

lea  rax, [rbx + rcx*8 + 16]  ; rax = rbx + rcx*8 + 16 (NO MEMORY ACCESS)
mov  rax, QWORD [rbx + rcx*8 + 16]  ; rax = memory at address rbx+rcx*8+16 (MEMORY ACCESS)

LEA is also used as a cheap arithmetic instruction:

lea  rax, [rbx + rbx*2]   ; rax = rbx * 3 (without using IMUL)
lea  rax, [rbx + 1]        ; rax = rbx + 1 (without modifying flags, unlike ADD)
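Putting these pieces together, the sketch below walks an array with a pointer, the way a C loop over long values would. It is illustrative only: numbers and COUNT are hypothetical names introduced for this example. The program exits with status 0 if the sum comes out as 100.

```nasm
; ptrwalk.asm: sum an array by advancing a pointer
; Build: nasm -f elf64 ptrwalk.asm -o ptrwalk.o && ld ptrwalk.o -o ptrwalk

section .data
    numbers dq 10, 20, 30, 40
    COUNT   equ ($ - numbers) / 8      ; element count, computed by the assembler

section .text
    global _start

_start:
    lea   rsi, [rel numbers]           ; rsi = pointer to the first element
    lea   rdi, [rsi + COUNT*8]         ; rdi = pointer one past the last element
    xor   eax, eax                     ; running sum = 0
.next:
    add   rax, [rsi]                   ; sum += *p
    add   rsi, 8                       ; p++ (one qword forward)
    cmp   rsi, rdi
    jb    .next                        ; continue while p < end

    cmp   rax, 100                     ; 10 + 20 + 30 + 40
    jne   .fail
    xor   edi, edi                     ; exit status 0: sum is correct
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, 60                      ; sys_exit
    syscall
```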

PUSH and POP Mechanics

PUSH and POP are the primary instructions for managing the stack. Their behavior is precisely defined:

push rax    ; Same effect as: sub rsp, 8 ; mov [rsp], rax  (but PUSH leaves the flags unchanged)
pop  rbx    ; Same effect as: mov rbx, [rsp] ; add rsp, 8  (but POP leaves the flags unchanged)

Stack grows downward: PUSH decrements RSP before storing. This means RSP always points to the top (lowest) occupied stack slot, not to the next free slot.

Before PUSH rax:              After PUSH rax:
┌──────────┐ 0x100            ┌──────────┐ 0x100
│ (in use) │                  │ (in use) │
├──────────┤ 0x0F8 ← RSP      ├──────────┤ 0x0F8
│  (free)  │                  │ value of │
└──────────┘                  │   RAX    │
                              ├──────────┤ 0x0F0 ← RSP (new)
                              └──────────┘

PUSH and POP only operate on 64-bit values in 64-bit mode (you can override to 16-bit with a prefix, but not to 32-bit). There is no push eax in 64-bit mode; use push rax.
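These rules are easy to verify by watching RSP. The following minimal sketch is an illustration, not code from the original chapter. It pushes two values, checks where RSP ends up and what sits at the top of the stack, then pops them in reverse order. Exit status 0 means every observation matched.

```nasm
; pushpop.asm: observe what PUSH and POP do to RSP and to memory
; Build: nasm -f elf64 pushpop.asm -o pushpop.o && ld pushpop.o -o pushpop

section .text
    global _start

_start:
    mov   rbx, rsp              ; remember the starting RSP
    mov   eax, 0x1111
    mov   ecx, 0x2222

    push  rax                   ; RSP -= 8, then store rax at [rsp]
    lea   rdx, [rbx - 8]
    cmp   rsp, rdx              ; RSP moved down by exactly 8
    jne   .fail
    cmp   qword [rsp], 0x1111   ; RSP points at the occupied slot
    jne   .fail

    push  rcx                   ; second value goes 8 bytes lower
    pop   rax                   ; last in, first out: rax = 0x2222
    pop   rcx                   ; rcx = 0x1111 (the two registers are swapped)

    cmp   rsp, rbx              ; two pushes and two pops: RSP is back
    jne   .fail
    cmp   rax, 0x2222
    jne   .fail
    cmp   rcx, 0x1111
    jne   .fail

    xor   edi, edi              ; exit status 0: all checks passed
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, 60               ; sys_exit
    syscall
```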

NASM Data Declarations in Detail

Initialized Data (in .data or .rodata)

section .data
    ; Bytes
    single_byte     db 42               ; one byte: 0x2A
    char_val        db 'A'              ; one byte: 0x41 (ASCII 'A')
    string          db "Hello", 0       ; 6 bytes: H, e, l, l, o, \0
    neg_byte        db -1               ; one byte: 0xFF (two's complement)

    ; Words (16-bit)
    word_val        dw 0x1234           ; two bytes: 0x34, 0x12 (little-endian!)
    neg_word        dw -1               ; two bytes: 0xFF, 0xFF

    ; Doublewords (32-bit)
    dword_val       dd 0x12345678       ; four bytes
    float_val       dd 3.14159          ; 32-bit IEEE 754 float

    ; Quadwords (64-bit)
    qword_val       dq 0x123456789ABCDEF0  ; eight bytes
    double_val      dq 3.14159265358979    ; 64-bit IEEE 754 double

    ; Multiple values
    array           dd 1, 2, 3, 4, 5   ; five 32-bit integers
    mixed           db 0x01, 0x02, "AB"  ; bytes from different sources

    ; The $ operator (current position) and $$ (section start)
    message         db "Hello, World!"
    msg_len         equ $ - message    ; compile-time constant: length of message
    section_offset  equ $ - $$         ; bytes from start of section

    ; TIMES: repeated initialization
    zeros           times 10 db 0      ; 10 zero bytes
    pattern         times 4  dw 0xAAAA ; four 16-bit 0xAAAA values

BSS (Uninitialized)

section .bss
    ; Reserve N units:
    byte_buf    resb 1024   ; 1024 bytes
    word_buf    resw 512    ; 512 words = 1024 bytes
    dword_buf   resd 256    ; 256 dwords = 1024 bytes
    qword_buf   resq 128    ; 128 qwords = 1024 bytes

EQU: Compile-Time Constants

; EQU defines a symbol that is replaced at assembly time (like #define in C)
MAX_SIZE     equ 1024
STDOUT       equ 1
SYS_WRITE    equ 1
SYS_EXIT     equ 60

; Usage:
    mov  rax, SYS_WRITE     ; assembles as: mov rax, 1
    mov  rdi, STDOUT        ; assembles as: mov rdi, 1
    mov  rdx, MAX_SIZE      ; assembles as: mov rdx, 1024

EQU constants are not allocated in memory. They exist only in the assembler's symbol table and are substituted during assembly.
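To show the declarations and constants working together, here is a short sketch of a complete program that prints a string whose length the assembler computes with $. It is illustrative: MSG_LEN is a name chosen for this example, and the program assumes standard output is open. It exits with status 0 if the kernel reports that all bytes were written.

```nasm
; hello.asm: db, $ and equ combined in a complete program
; Build: nasm -f elf64 hello.asm -o hello.o && ld hello.o -o hello

SYS_WRITE   equ 1
SYS_EXIT    equ 60
STDOUT      equ 1

section .rodata
    message     db "Hello, World!", 10     ; 13 characters plus a newline
    MSG_LEN     equ $ - message            ; 14, computed at assembly time

section .text
    global _start

_start:
    mov   eax, SYS_WRITE        ; assembles as: mov eax, 1
    mov   edi, STDOUT           ; assembles as: mov edi, 1
    lea   rsi, [rel message]    ; address of the string
    mov   edx, MSG_LEN          ; assembles as: mov edx, 14
    syscall

    cmp   rax, MSG_LEN          ; write returns the number of bytes written
    jne   .fail
    xor   edi, edi              ; exit status 0
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, SYS_EXIT
    syscall
```

Because MSG_LEN is derived from the data itself, editing the string never leaves a stale length behind.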

Addressing Modes Preview

Memory operands in NASM use brackets. The general form is:

[base + index*scale + displacement]

Where:
- base: any 64-bit register
- index: any 64-bit register except RSP, multiplied by scale
- scale: 1, 2, 4, or 8 (for arrays of 1, 2, 4, or 8-byte elements)
- displacement: a signed 32-bit constant (sign-extended to 64 bits)

Examples:

mov  rax, [rbp - 8]           ; base+disp: local variable
mov  rax, [rdi + rcx*8]       ; base+index*scale: array element
mov  rax, [rdi + rcx*8 + 16]  ; base+index*scale+disp: struct field in array
mov  rax, [0x400000]          ; absolute address (rare in PIE code)
mov  rax, [rel label]         ; RIP-relative: address = RIP + signed 32-bit offset

The size of the memory operation (byte, word, dword, qword) is determined by the register size or by an explicit size override:

mov  rax, [rdi]          ; 64-bit load (rax is 64-bit)
mov  eax, [rdi]          ; 32-bit load (eax is 32-bit; also zeroes upper 32 of rax)
mov  ax,  [rdi]          ; 16-bit load (ax is 16-bit; does NOT zero upper 48 bits!)
mov  al,  [rdi]          ; 8-bit load
mov  QWORD [rdi], rax    ; 64-bit store (size must match register or be explicit)
mov  DWORD [rdi], 42     ; 32-bit store of immediate (size must be explicit for imm)

Full coverage of addressing modes is in Chapter 8.
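Until then, the sketch below exercises the two most common forms on real data. It is an illustration under assumptions: scores and records are hypothetical tables, and each record is laid out as an 8-byte id, a 4-byte score, and 4 bytes of padding (16 bytes in total). Since the scale field cannot encode 16, the record index is shifted before it is used. Exit status 0 means every load returned the expected element.

```nasm
; addrmodes.asm: index an array of dwords and an array of 16-byte records
; Build: nasm -f elf64 addrmodes.asm -o addrmodes.o && ld addrmodes.o -o addrmodes

section .data
    scores  dd 70, 80, 90, 100      ; 4-byte elements

    records dq 1                    ; record 0: id
            dd 11, 0                ;           score, padding
            dq 2                    ; record 1: id
            dd 22, 0                ;           score, padding
            dq 3                    ; record 2: id
            dd 33, 0                ;           score, padding

section .text
    global _start

_start:
    lea   rdi, [rel scores]
    mov   ecx, 2                    ; index 2
    mov   eax, [rdi + rcx*4]        ; base + index*scale: scores[2]
    cmp   eax, 90
    jne   .fail

    lea   rdi, [rel records]
    mov   ecx, 1                    ; index 1
    shl   rcx, 4                    ; index * 16 (scale stops at 8, so shift)
    mov   eax, [rdi + rcx + 8]      ; base + index + disp: records[1].score
    cmp   eax, 22
    jne   .fail
    mov   rax, [rdi + rcx]          ; base + index: records[1].id
    cmp   rax, 2
    jne   .fail

    xor   edi, edi                  ; exit status 0: all checks passed
    jmp   .exit
.fail:
    mov   edi, 1
.exit:
    mov   eax, 60                   ; sys_exit
    syscall
```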

Reading Your Process's Memory Layout

On Linux, every process can read its own memory map from /proc/self/maps. Here's a NASM program that does exactly that:

; readmaps.asm — read and print /proc/self/maps
; Shows the actual memory layout of the running process
; Build: nasm -f elf64 readmaps.asm -o readmaps.o && ld readmaps.o -o readmaps

section .data
    proc_path   db "/proc/self/maps", 0

section .bss
    fd          resq 1          ; file descriptor
    buf         resb 4096       ; read buffer (one page)

section .text
    global _start

_start:
    ; Open /proc/self/maps (O_RDONLY = 0)
    mov  rax, 2                 ; sys_open
    lea  rdi, [rel proc_path]
    xor  esi, esi               ; O_RDONLY = 0
    xor  edx, edx               ; mode = 0 (ignored for O_RDONLY)
    syscall
    mov  [rel fd], rax          ; save file descriptor

    test rax, rax               ; fd < 0 means error
    js   .error

.read_loop:
    ; Read up to 4096 bytes
    mov  rax, 0                 ; sys_read
    mov  rdi, [rel fd]
    lea  rsi, [rel buf]
    mov  rdx, 4096
    syscall

    test rax, rax               ; 0 bytes = EOF; negative = error
    jle  .done

    ; Write what we read to stdout
    mov  rdx, rax               ; bytes to write = bytes just read
    mov  rax, 1                 ; sys_write
    mov  rdi, 1                 ; stdout
    lea  rsi, [rel buf]
    syscall

    jmp  .read_loop

.done:
    ; Close the file
    mov  rax, 3                 ; sys_close
    mov  rdi, [rel fd]
    syscall

    ; Exit
    mov  rax, 60
    xor  rdi, rdi
    syscall

.error:
    ; Exit with error code 1
    mov  rax, 60
    mov  rdi, 1
    syscall

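Sample output (illustrative; it comes from a dynamically linked PIE build, so it includes libc and ld-linux mappings. The statically linked binary produced by the ld command above loads near 0x400000 and shows no shared-library lines. Actual addresses vary with ASLR):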

55f4a1234000-55f4a1235000 r-xp 00000000 08:02 1234567  /path/to/readmaps
55f4a1435000-55f4a1436000 r--p 00001000 08:02 1234567  /path/to/readmaps
55f4a1436000-55f4a1437000 rw-p 00002000 08:02 1234567  /path/to/readmaps
7f8b23456000-7f8b23600000 r-xp 00000000 08:02 987654   /lib/x86_64-linux-gnu/libc.so.6
7f8b23800000-7f8b23801000 r--p 001aa000 08:02 987654   /lib/x86_64-linux-gnu/libc.so.6
7f8b23801000-7f8b23803000 rw-p 001ab000 08:02 987654   /lib/x86_64-linux-gnu/libc.so.6
7f8b23a00000-7f8b23a21000 r-xp 00000000 08:02 111111   /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffd12345000-7ffd12366000 rw-p 00000000 00:00 0        [stack]
7ffd12390000-7ffd12394000 r--p 00000000 00:00 0        [vvar]
7ffd12394000-7ffd12396000 r-xp 00000000 00:00 0        [vdso]

Each line shows: address range, permissions (r=read, w=write, x=execute, p=private/cow), file offset, device, inode, and the mapped file or region name.

The permissions line up with the regions this chapter described:
- The first r-xp entry holds .text (read-execute, private): the code
- The r--p entry holds .rodata (read-only, private): string literals
- The rw-p entry holds .data and .bss (read-write, private): variables
- The [stack] region is rw-p and lacks the execute bit (NX protection)
- libc.so.6 appears with its own r-xp, r--p, and rw-p segments

🔐 Security Note: The lack of the x (execute) permission on the stack is the No-eXecute (NX) or DEP (Data Execution Prevention) protection. If you try to jump to the stack and execute code there, the CPU raises a page fault, which the kernel delivers to the process as a segfault (SIGSEGV). This mitigation prevents classic shellcode injection attacks. We cover this in Chapter 11 (buffer overflows) and Chapter 35 (modern exploitation mitigations).

Summary

Memory in x86-64 is a flat virtual address space with enforced permission regions. The operating system maps the ELF binary's segments into memory, sets up a stack, and begins execution. The programmer's view consists of four main static regions (.text, .rodata, .data, .bss), the heap (dynamically allocated), and the stack (function frames).

Alignment matters for performance and correctness. The stack must stay 16-byte aligned. SIMD operations have their own alignment requirements. Data in memory is stored little-endian.

NASM's data declaration directives (db, dw, dd, dq, resb, resw, resd, resq) give you precise control over what bytes appear in memory at what positions.

The /proc/self/maps interface on Linux lets you inspect the memory layout of any running process. This is your window into the virtual address space.

🔄 Check Your Understanding: A function has the following local variable declarations. What is the minimum stack space needed (in bytes) to hold them all, with correct alignment for each?
- int32_t a; (4 bytes, requires 4-byte alignment)
- int64_t b; (8 bytes, requires 8-byte alignment)
- char c; (1 byte, any alignment)
- double d; (8 bytes, requires 8-byte alignment)

Answer
Naive packing: 4 + 8 + 1 + 8 = 21 bytes. Whether padding is needed between the variables depends on their order, and the total must also respect stack alignment.

Typical compiler layout (largest first to minimize padding):
- b at [rbp-8]: 8 bytes, 8-byte aligned ✓
- d at [rbp-16]: 8 bytes, 8-byte aligned ✓
- a at [rbp-20]: 4 bytes, 4-byte aligned ✓
- c at [rbp-21]: 1 byte, any alignment ✓
- Variables: 21 bytes with no internal padding
- Padding to reach a multiple of 16: 11 bytes
- Total: sub rsp, 32

The compiler decides the exact layout. With the standard prologue from this chapter, RSP is 16-byte aligned right after push rbp, so the amount subtracted from RSP must be a multiple of 16 if the function calls anything. The variables need 21 bytes, so the smallest such allocation is 32 bytes. Using sub rsp, 24 would hold the variables but leave RSP at 8 (mod 16), breaking the alignment rule at the next CALL.