Chapter 34: Reverse Engineering

Reading Assembly Without the Source Code

Every binary you have ever run was once assembly language. The compiler translated source code to assembly, the assembler translated assembly to object code, the linker combined object files into an executable. That process is lossy — variable names, comments, type information, and high-level structure are stripped away — but it is not irreversible. The assembly is still there. Reverse engineering is the discipline of reading it.

Reverse engineering (RE) is not about circumventing security or breaking software. It is a fundamental skill for understanding software systems at the level that matters: the instruction level. Malware analysts reverse engineer malicious binaries to understand what they do. Vulnerability researchers reverse engineer software to find security flaws. Interoperability engineers reverse engineer protocols and file formats when documentation is unavailable. CTF competitors reverse engineer challenge programs to find hidden flags. And ordinary programmers reverse engineer their own compiled output to understand what the compiler did and why.

This chapter builds the RE skills you need for all of those purposes.

🔐 Security Note: Reverse engineering is legal in most jurisdictions for security research, interoperability, and personal use. Check local laws and terms of service. Everything in this chapter is framed for understanding — malware analysis, vulnerability research, security auditing — not for unauthorized access to systems.

The Reverse Engineering Toolkit

The RE ecosystem has several tools, each with different strengths. You will use all of them.

objdump: Always Available

objdump ships with GNU binutils and is present on virtually every Linux system. It is the blunt instrument of RE: useful for quick inspection, but limited in analysis capability.

# Disassemble all code sections
objdump -d binary

# Disassemble all sections (including data, interpreted as code)
objdump -D binary

# Show symbol table
objdump -t binary

# Show dynamic symbol table (imported functions)
objdump -T binary

# Use Intel syntax (much more readable)
objdump -M intel -d binary

# Show section headers
objdump -h binary

# Show all: headers, symbols, disassembly
objdump -x -d binary

Intel syntax vs. AT&T syntax deserves explanation. AT&T syntax (the GCC/binutils default) writes movq %rax, %rbx and means "move rax to rbx." Intel syntax writes mov rbx, rax and means the same thing: destination first, then source. AT&T syntax also uses % for registers and $ for immediates, and appends size suffixes (b/w/l/q) to mnemonics. Intel syntax is cleaner and is what Ghidra, IDA, and most RE tools use. Always pass -M intel to objdump.

GDB: Dynamic Analysis

GDB lets you run the program and examine its live state. This is invaluable when static analysis is insufficient — when code paths depend on external input, when data structures are only meaningful at runtime, when you need to understand behavior rather than just structure.

We covered GDB extensively in Chapter 6 and throughout the book. In the RE context, GDB's most valuable capabilities are: setting breakpoints in unknown code, examining memory at runtime, stepping through stripped binaries, and scripting analysis with Python.

Ghidra: Free NSA-Developed Decompiler

Ghidra (pronounced "JEE-druh") was developed by the NSA's Research Directorate and released as open source in 2019. It is now one of the most capable free RE tools available. Its key advantage over objdump is its decompiler: it not only disassembles binary code into assembly, but produces a reconstructed C-like representation.

Ghidra understands calling conventions, data types, and control flow. When you load a binary, it auto-analyzes the code and produces both a disassembly view and a decompiled view. You can rename variables and functions, define data types, and annotate your analysis. Cross-references show you every call site for a function and every place a data item is accessed.

The Ghidra workflow:

  1. Launch Ghidra and create a new project
  2. Import your binary (File → Import File)
  3. Double-click the binary to open the Code Browser
  4. Answer "Yes" to auto-analyze
  5. Use the Symbol Tree to navigate functions
  6. The Listing view shows disassembly; the Decompiler panel shows C
  7. Press L to rename a symbol; press Ctrl+L in the Decompiler (or T on data in the Listing) to change a type

IDA Free: Industry Standard

IDA Pro is the industry-standard commercial RE tool used by professional malware analysts and security researchers worldwide. IDA Free is a limited free version that handles 64-bit ELF and PE binaries without the commercial features. Its Hex-Rays decompiler is the gold standard; the full decompiler requires a commercial license, although recent IDA Free releases include a cloud-based decompiler for x86/x64.

IDA Free is worth knowing because much RE documentation, tutorials, and conference talks use IDA screenshots and IDAPython scripts. The interface concepts transfer directly.

radare2: Open-Source Power Tool

radare2 (r2) is a powerful open-source RE framework with a steep learning curve. Its command-line interface rewards investment: once you know it, r2 is fast for scripted analysis, binary patching, and automated RE tasks. Its associated tools include rabin2 (binary information), rasm2 (assembler/disassembler), and rafind2 (pattern searching).

# Open a binary in radare2
r2 binary

# Analyze all functions
[0x00401000]> aaa

# List functions
[0x00401000]> afl

# Go to main
[0x00401000]> s main

# Disassemble current function
[0x00401000]> pdf

# Print cross-references to a function
[0x00401000]> axt sym.vulnerable_function

Binary Ninja: Modern Commercial Platform

Binary Ninja (binja) is a commercial RE platform with a strong Python API, a clean UI, and an intermediate language (BNIL) that enables automated analysis. Its free cloud version handles small binaries at binary.ninja/free. Its strength is programmability: if you need to write custom analysis passes (find all uses of strcpy, identify all functions that access a specific memory location), Binary Ninja's API makes this tractable.

pwndbg and peda: GDB Extensions

pwndbg and peda are GDB plugins that extend it with RE and exploit-development focused features:

  • Colored, formatted register display after every step
  • Automatic disassembly context
  • Heap inspection (heap, bins, chunks)
  • ROP gadget finding
  • Pattern generation and offset calculation
  • Stack visualization with canary detection

Install pwndbg (the more actively maintained of the two):

git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh

💡 Mental Model: Think of the RE toolkit in layers. objdump and readelf give you raw information. GDB gives you dynamic execution. Ghidra/IDA give you high-level analysis. pwndbg extends GDB for security work. r2 and Binary Ninja are for when you need programmatic analysis. You will use all of these; knowing which to reach for in which situation is a skill that develops with practice.

Static Analysis with objdump

Static analysis means examining the binary without running it. Start here for any unknown binary — it is safe (you are not executing anything), fast, and gives you an overview before you commit to dynamic analysis.

Reading the Symbol Table

objdump -t binary | head -40
binary:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 crt1.o
0000000000000000 l    df *ABS*  0000000000000000 crtstuff.c
0000000000000000 l    d  .init  0000000000000000 .init
...
0000000000401800 g     F .text  0000000000000000 _start
0000000000401830 g     F .text  0000000000000068 main
...

The columns: address, flags (l=local, g=global, d=debug, F=function, O=object), section, size, name.
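To make the column layout concrete, here is a minimal Python sketch that splits one line of objdump -t output into those fields. It assumes the 64-bit layout shown above (a 16-hex-digit address, a 7-character flag field, then section, size, and name); spacing can vary slightly between binutils versions, so treat the regular expression as a starting point rather than a complete parser.

```python
import re
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    address: int
    flags: str    # flag letters with padding removed, e.g. "gF"
    section: str
    size: int
    name: str


# address, space, 7-character flag field, space, section, size, name
_SYMBOL_LINE = re.compile(r"^([0-9a-f]{16}) (.{7}) (\S+)\s+([0-9a-f]{16})\s+(.*)$")


def parse_symbol_line(line: str) -> Optional[Symbol]:
    """Parse one objdump -t line; return None for headers and other text."""
    m = _SYMBOL_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    addr, flags, section, size, name = m.groups()
    return Symbol(int(addr, 16), flags.replace(" ", ""), section, int(size, 16), name.strip())


def functions(lines):
    """Keep only symbols whose flags include F (function)."""
    return [s for s in map(parse_symbol_line, lines) if s is not None and "F" in s.flags]


print(parse_symbol_line("0000000000401830 g     F .text  0000000000000068 main"))
```

Saving the output with objdump -t binary > symbols.txt and calling functions(open("symbols.txt")) gives a quick list of named functions with their sizes.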

For stripped binaries (most production software), the symbol table is absent. The dynamic symbol table (-T) may still show imported library functions.

Finding Functions Without Symbols

When symbols are stripped, you find functions by:

  1. Entry points: objdump -f binary shows the entry point address. This is _start.
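  2. Function prologues: Search for the standard prologue pattern: 55 48 89 e5 (push rbp; mov rbp, rsp) or the variation 48 83 ec XX (sub rsp, N). On binaries built with Intel CET support (-fcf-protection), functions often begin with endbr64 (f3 0f 1e fa), which is itself a useful marker.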
  3. CALL instructions: Every CALL target is a potential function start. Follow them.
  4. String cross-references: Find interesting strings in the binary, then find code that references them.

# Find candidate strings and their file offsets (hex)
strings -t x -n 8 binary | grep -i "password\|error\|fail"

# Show all strings in the binary
strings -n 8 binary | head -30

# Find where a string address appears in code (manual cross-ref)
# First find the string's address: objdump -s -j .rodata binary
# Then search the disassembly for that address; objdump prints the
# computed target of RIP-relative operands as a trailing "# address" comment
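The prologue search from the list above is easy to script. This Python sketch assumes you have extracted the raw bytes of .text (for example with objcopy -O binary --only-section=.text binary text.bin) and know the section's address from readelf -S; it reports every offset where push rbp; mov rbp, rsp or endbr64 (f3 0f 1e fa, emitted at function entry by CET-enabled builds) occurs. Expect false positives, since the same bytes can appear inside other instructions or in embedded data, so treat each hit as a candidate to confirm in a disassembler.

```python
PROLOGUE_PATTERNS = {
    "push rbp; mov rbp, rsp": bytes.fromhex("554889e5"),
    "endbr64": bytes.fromhex("f30f1efa"),
}


def find_prologues(code: bytes, base_addr: int = 0) -> list[tuple[int, str]]:
    """Return (virtual_address, pattern_name) for every occurrence of a known prologue."""
    hits = []
    for name, pattern in PROLOGUE_PATTERNS.items():
        start = 0
        while (pos := code.find(pattern, start)) != -1:
            hits.append((base_addr + pos, name))
            start = pos + 1          # keep scanning after this match
    return sorted(hits)


# Usage sketch: text.bin produced by objcopy, 0x401000 taken from readelf -S
# with open("text.bin", "rb") as f:
#     for addr, name in find_prologues(f.read(), base_addr=0x401000):
#         print(f"{addr:#x}  {name}")
```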

Reading a Disassembly

Here is a real annotated disassembly of a simple utility function. Understanding this kind of output is the core RE skill:

0000000000401196 <check_password>:
  401196:  55                      push   rbp
  401197:  48 89 e5                mov    rbp, rsp
  40119a:  48 83 ec 10             sub    rsp, 0x10          ; allocate 16 bytes
  40119e:  48 89 7d f8             mov    QWORD PTR [rbp-0x8],rdi  ; save arg1 (ptr)
  4011a2:  48 8b 45 f8             mov    rax, QWORD PTR [rbp-0x8] ; load ptr
  4011a6:  48 89 c6                mov    rsi, rax           ; arg2 = ptr (input)
  4011a9:  48 8d 05 54 0e 00 00    lea    rax, [rip+0xe54]   ; rip+0xe54 = .rodata
  4011b0:  48 89 c7                mov    rdi, rax           ; arg1 = "s3cr3t" (known good)
  4011b3:  e8 88 fe ff ff          call   401040 <strcmp@plt> ; call strcmp
  4011b8:  85 c0                   test   eax, eax           ; check return value
  4011ba:  75 07                   jne    4011c3 <check_password+0x2d>  ; not equal → fail
  4011bc:  b8 01 00 00 00          mov    eax, 0x1           ; return 1 (success)
  4011c1:  eb 05                   jmp    4011c8 <check_password+0x32>
  4011c3:  b8 00 00 00 00          mov    eax, 0x0           ; return 0 (failure)
  4011c8:  c9                      leave
  4011c9:  c3                      ret

What we can read from this without any symbols:

  • It takes one argument (a pointer in RDI).
  • It calls strcmp with that argument and a string from .rodata.
  • It returns 1 if they are equal, 0 otherwise.
  • We can locate the comparison string from the lea at 0x4011a9. RIP-relative addresses are computed from the address of the next instruction: the lea is 7 bytes long, so RIP = 0x4011b0 and the target is 0x4011b0 + 0xe54 = 0x402004. Look there in the binary for the literal string.
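The same calculation can be done straight from the instruction bytes. This is a minimal Python sketch, not a general x86 decoder: it only understands a REX.W lea with a [rip+disp32] operand (48 8d or 4c 8d, then a ModRM byte with mod=00 and rm=101) and a 5-byte call rel32 (e8). In both cases the displacement is a signed little-endian 32-bit value added to the address of the next instruction.

```python
import struct


def next_insn_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """RIP-relative operands are relative to the *next* instruction's address."""
    return insn_addr + insn_len + disp


def decode_lea_rip(insn_addr: int, raw: bytes) -> int:
    """Target of `lea reg, [rip+disp32]` encoded as REX.W 8D ModRM disp32."""
    if len(raw) != 7 or raw[0] not in (0x48, 0x4C) or raw[1] != 0x8D or (raw[2] & 0xC7) != 0x05:
        raise ValueError("not a REX.W lea reg, [rip+disp32]")
    (disp,) = struct.unpack("<i", raw[3:7])   # signed little-endian displacement
    return next_insn_target(insn_addr, len(raw), disp)


def decode_call_rel32(insn_addr: int, raw: bytes) -> int:
    """Target of `call rel32` (opcode E8)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not a call rel32")
    (rel,) = struct.unpack("<i", raw[1:5])
    return next_insn_target(insn_addr, len(raw), rel)


# Bytes copied from the check_password listing above
print(hex(decode_lea_rip(0x4011A9, bytes.fromhex("488d05540e0000"))))   # 0x402004
print(hex(decode_call_rel32(0x4011B3, bytes.fromhex("e888feffff"))))     # 0x401040
```

The call decode reproduces objdump's own annotation (401040 <strcmp@plt>), which is a handy way to sanity-check the arithmetic.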

Static Analysis with Ghidra

Ghidra transforms the above from a skilled manual exercise to an automated workflow.

Auto-Analysis

When you open a binary in Ghidra and accept the auto-analysis, it:

  1. Finds all function entry points (by pattern matching prologues and following calls)
  2. Analyzes each function for data types and control flow
  3. Detects the calling convention and maps arguments
  4. Identifies library calls through PLT stubs
  5. Reconstructs local variable names as local_X, where X is the stack offset
  6. Produces a decompiled C-like output

The decompiled view of check_password above would appear as:

bool check_password(char *param_1)
{
    int iVar1;
    iVar1 = strcmp("s3cr3t", param_1);
    return iVar1 == 0;
}

Not perfect — the argument names are generic, the type is approximated — but it identifies the logic immediately.

Renaming and Annotation

Press L to rename a symbol. Start with the most obvious functions (anything calling strcmp, printf, exit is meaningful) and work outward. As you rename functions, Ghidra updates all call sites automatically.

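Press Ctrl+L in the Decompiler panel to retype a variable (in the Listing view, T sets the data type of the item under the cursor). If you know a pointer is char * or a struct pointer, setting the type propagates through the decompiled view and clarifies the code.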

Cross-References

Press Ctrl+Shift+F on a symbol to list every reference to it (X is the equivalent shortcut in IDA). This is the most powerful navigation feature: if you find a suspicious function, the reference list shows you every call site. If you find a suspicious string, it shows you the code that uses it.

In the Code Browser's Listing view, labels carry XREF[N]: annotations listing the N locations that reference them; each entry is a clickable link.

Data Type Recovery

Ghidra's type recovery is imperfect but useful. It recognizes:

  • Standard C library structs (FILE *, struct stat, etc.)
  • Common patterns for linked lists and arrays
  • Vtable pointers (in C++ code)

When you encounter an unknown struct (accessed as offsets from a pointer like [rdi+8], [rdi+16]), create a new struct type in the Data Type Manager and assign fields to match the observed offsets. Ghidra will then show field names in the decompiled view.
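To see why observed offsets look the way they do, you can model a candidate struct with Python's ctypes module, which applies the platform's C layout rules. The struct below is hypothetical: a guess for code that reads a 4-byte value at [rdi], a pointer at [rdi+8], and another pointer at [rdi+16]. The 4 bytes of padding after the int are why the second field starts at 8 rather than 4. This sketch assumes a 64-bit (LP64) platform such as x86-64 Linux.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical reconstruction of a linked-list node seen only through offsets."""


# Assigned after the class exists so that `next` can point to Node itself
Node._fields_ = [
    ("id", ctypes.c_int),             # [rdi]     4-byte int (cmp DWORD PTR ...)
    ("name", ctypes.c_char_p),        # [rdi+8]   pointer, after 4 bytes of padding
    ("next", ctypes.POINTER(Node)),   # [rdi+16]  pointer to another Node
]

for field_name, _ in Node._fields_:
    print(field_name, getattr(Node, field_name).offset)
print("sizeof", ctypes.sizeof(Node))
```

If the printed offsets match the ones you see in the disassembly, the same field list is a good first draft for the struct you create in Ghidra's Data Type Manager.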

Dynamic Analysis with GDB

Static analysis tells you structure. Dynamic analysis tells you behavior. For anything that depends on runtime state, you need GDB.

Running Stripped Binaries

Without symbols, set breakpoints by address:

(gdb) break *0x401196
(gdb) run
Breakpoint 1, 0x0000000000401196 in ?? ()
(gdb) layout asm
(gdb) stepi

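Find function boundaries by following CALL instructions from _start. In a glibc binary, _start calls __libc_start_main, which in turn calls main. The address of main is the first argument to __libc_start_main, so at that call it is in RDI. Because a stripped binary has no _start symbol, break on the entry point address instead (info files or readelf -h shows it):

(gdb) info files              # look for "Entry point: 0x..." (this is _start)
(gdb) break *<entry point>
(gdb) run
(gdb) stepi                   # several times, watching for the call to __libc_start_main
(gdb) info registers rdi      # at that call, RDI holds the address of main()
(gdb) break *<address in rdi>
(gdb) continue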

info functions and info variables

For binaries with partial symbols (debug symbols stripped but dynamic symbols present):

(gdb) info functions         # all known function names and addresses
(gdb) info variables         # all known global variable names
(gdb) info functions check   # functions matching "check"

Memory Inspection at Breakpoints

(gdb) x/20xb $rsp            # 20 bytes at RSP in hex
(gdb) x/8gx $rsp             # 8 64-bit words at RSP
(gdb) x/s $rdi               # string at RDI
(gdb) x/i $rip               # instruction at RIP
(gdb) x/20i $rip             # next 20 instructions

GDB Python Scripting

GDB has a full Python API for automated analysis:

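# Save as break_and_print.py
# Load with: gdb -x break_and_print.py binary

import gdb

MASK64 = 0xFFFFFFFFFFFFFFFF


class TraceFunction(gdb.Breakpoint):
    def stop(self):
        # Convert gdb.Value to a Python int before applying int format specs;
        # the mask shows negative (signed) register values as unsigned.
        rdi = int(gdb.parse_and_eval("$rdi")) & MASK64
        rsi = int(gdb.parse_and_eval("$rsi")) & MASK64
        print(f"[+] Function called: RDI={rdi:#x}, RSI={rsi:#x}")
        return False  # don't stop, continue execution


TraceFunction("*0x401196")
gdb.execute("run arg1 arg2")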

This approach is powerful for tracing all calls to a function, logging arguments across many executions, and automating repetitive analysis tasks.

A GDB Session: Finding a Bug in a Stripped Binary

Here is a realistic GDB session finding a bug in a stripped binary (no source, no symbols):

$ gdb ./mystery_binary
(gdb) set disassembly-flavor intel
(gdb) info files             # "Entry point: 0x401080" is _start; a stripped binary has no _start symbol
(gdb) break *0x401080
(gdb) run AAAA
Breakpoint 1, 0x0000000000401080 in ?? ()

(gdb) # Find main: it's the 1st arg to __libc_start_main, loaded into RDI
(gdb) disassemble $rip, $rip+80
   0x401080: endbr64
   0x401084: xor    ebp, ebp
   0x401086: mov    r9, rdx
   0x401089: pop    rsi
   0x40108a: mov    rdx, rsp
   0x40108d: and    rsp, 0xfffffffffffffff0
   0x401091: push   rax
   0x401092: push   rsp
   0x401093: lea    r8, [rip+0x2a6]    ; 0x401340
   0x40109a: lea    rcx, [rip+0x22f]   ; 0x4012d0
   0x4010a1: lea    rdi, [rip+0x2b8]   ; 0x401360
   0x4010a8: call   0x401060 <__libc_start_main@plt>

(gdb) # main is at [rip+0x2b8] = 0x4010a1+7+0x2b8 = 0x401360
(gdb) break *0x401360
(gdb) continue

Breakpoint 2, 0x0000000000401360 in ?? ()
(gdb) disassemble $rip, $rip+200
   0x401360: push   rbp
   0x401361: mov    rbp, rsp
   0x401364: sub    rsp, 0x50          ; 80 bytes of local vars
   0x401368: mov    DWORD PTR [rbp-0x44], edi   ; argc
   0x40136b: mov    QWORD PTR [rbp-0x50], rsi   ; argv
   ...
   0x40137f: lea    rax, [rbp-0x40]    ; local buffer (64 bytes)
   0x401383: mov    rdx, QWORD PTR [rbp-0x50]
   0x401387: mov    rsi, QWORD PTR [rdx+0x8]    ; argv[1]
   0x40138b: mov    rdi, rax
   0x40138e: call   0x401040 <strcpy@plt>   ; strcpy(buf, argv[1]) -- vulnerable!

(gdb) # strcpy with argv[1] into a 64-byte buffer: classic overflow
(gdb) # Let's confirm: run with 80 A's
(gdb) run $(python3 -c "print('A'*80)")
Program received signal SIGSEGV, Segmentation fault.

The analysis reveals: a strcpy call from argv[1] into a stack buffer of 64 bytes — a textbook stack buffer overflow (covered in depth in Chapter 35).

Recognizing Compiler Patterns

Compiler output follows predictable patterns. Learning these patterns is what transforms raw disassembly from noise into structured information.

Function Prologue and Epilogue

Modern GCC/Clang with frame pointer:

push    rbp
mov     rbp, rsp
sub     rsp, N          ; allocate local variable space
; ... function body ...
leave                   ; mov rsp, rbp; pop rbp
ret

Without frame pointer (-fomit-frame-pointer):

sub     rsp, N          ; allocate locals + alignment
; ... function body ...
add     rsp, N
ret

With -O2 or higher, the prologue/epilogue may be minimal or absent for leaf functions (functions that call no other functions).

If-Else Patterns

if (x > 0) { A(); } else { B(); }

Compiles to:

cmp     edi, 0          ; compare x to 0
jle     .else           ; if x <= 0, jump to else
call    A
jmp     .end
.else:
call    B
.end:

Recognize: a CMP followed by a conditional jump. The if-body sits between the conditional jump and an unconditional JMP to the end; the else-body starts at the conditional jump's target. The condition in the source is the opposite of the jump condition (if the source says "if x > 0 do A", the assembly jumps past A when x <= 0).

For Loop

for (int i = 0; i < 10; i++) { body(); }

Compiles to (with -O0):

mov     DWORD PTR [rbp-4], 0    ; i = 0
jmp     .check
.body:
call    body
add     DWORD PTR [rbp-4], 1    ; i++
.check:
cmp     DWORD PTR [rbp-4], 9    ; i < 10
jle     .body

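With optimization, the loop counter may live in a register and the check may move to the bottom. Because the body calls a function, the compiler keeps the counter in a callee-saved register such as EBX (pushed and popped in the prologue and epilogue); a caller-saved register like ECX would not survive the call:

xor     ebx, ebx        ; i = 0
.body:
call    body
inc     ebx             ; i++
cmp     ebx, 10
jl      .body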

Switch/Case Jump Tables

For switches with dense, contiguous case values, compilers generate a jump table — an array of addresses:

switch (n) {
    case 0: return A();
    case 1: return B();
    case 2: return C();
}
cmp     edi, 2
ja      .default        ; n > 2: fall through to default
lea     rax, [rip+.jumptable]
mov     rax, QWORD PTR [rax + rdi*8]  ; load address from table
jmp     rax             ; indirect jump through table
.jumptable:
    .quad case_0
    .quad case_1
    .quad case_2

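The signature: a bounds check (cmp + ja) followed by an array load and an indirect jump. The table itself usually sits in .rodata as an array of code addresses; in position-independent code, GCC instead stores 32-bit offsets relative to the start of the table and adds the table's address back before the jmp.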
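Once you have found the table's address and the bound from the cmp, recovering the case targets is mechanical. The following Python sketch works on raw table bytes that you are assumed to have dumped yourself, and handles two common layouts: 64-bit absolute addresses, as in the listing above, and the position-independent form GCC emits, where each entry is a signed 32-bit offset from the table's own address. dispatch mirrors the cmp/ja bounds check; because ja is an unsigned comparison, a negative n also goes to the default case. All addresses in the example are made up.

```python
import struct


def absolute_targets(table: bytes, count: int) -> list[int]:
    """Entries are 64-bit little-endian code addresses (.quad case_N)."""
    return list(struct.unpack_from(f"<{count}Q", table))


def relative_targets(table: bytes, table_addr: int, count: int) -> list[int]:
    """Entries are signed 32-bit offsets added to the table's own address."""
    return [table_addr + off for off in struct.unpack_from(f"<{count}i", table)]


def dispatch(n: int, targets: list[int], default: int) -> int:
    """Model of `cmp edi, 2; ja .default; jmp [table + rdi*8]`."""
    index = n & 0xFFFFFFFF          # ja compares unsigned: -1 becomes 0xFFFFFFFF
    if index > len(targets) - 1:
        return default
    return targets[index]


table = struct.pack("<3Q", 0x401200, 0x401210, 0x401220)   # made-up case addresses
targets = absolute_targets(table, 3)
print([hex(t) for t in targets])
print(hex(dispatch(-1, targets, default=0x401230)))
```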

Virtual Dispatch (C++)

obj->virtual_method();
mov     rax, QWORD PTR [rdi]     ; load vtable pointer (first field)
mov     rax, QWORD PTR [rax+N]   ; load function pointer at vtable offset N
call    rax                       ; indirect call through pointer

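The signature: two loads followed by an indirect CALL, with the object pointer (this) in RDI. The vtable lives in read-only data (.rodata, or .data.rel.ro in position-independent binaries). If you can identify the vtable, you can recover the class hierarchy.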

String Literals and Global Variables

String literals appear in .rodata and are accessed via RIP-relative addressing:

lea     rdi, [rip+0x1234]       ; address of string in .rodata

To find the string: current_address + instruction_size + 0x1234. Ghidra resolves this automatically; with objdump, calculate manually or run objdump -s -j .rodata binary to see the section contents.
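Once you have computed the address, reading the literal is a matter of indexing into the section. This Python sketch assumes you have the raw bytes of .rodata (for example from objcopy -O binary --only-section=.rodata binary rodata.bin) and the section's virtual address from readelf -S; it reads up to the terminating NUL. The section contents in the example are made up.

```python
def read_cstring(section: bytes, section_addr: int, addr: int) -> str:
    """Return the NUL-terminated string at virtual address `addr` inside a section dump."""
    offset = addr - section_addr
    if not 0 <= offset < len(section):
        raise ValueError(f"{addr:#x} is outside this section")
    end = section.find(b"\x00", offset)
    if end == -1:                     # unterminated: read to the end of the dump
        end = len(section)
    return section[offset:end].decode("ascii", errors="replace")


# A made-up .rodata dump loaded at 0x402000: 4 bytes of other data, then two strings
rodata = b"\x01\x00\x02\x00" + b"s3cr3t\x00" + b"Access denied\x00"
print(read_cstring(rodata, 0x402000, 0x402004))
print(read_cstring(rodata, 0x402000, 0x40200B))
```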

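Global variables: usually also RIP-relative. Code in shared libraries (and some position-independent executables) instead reaches many globals indirectly through the GOT (Global Offset Table): it first loads the variable's address from a GOT slot, then dereferences that address.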

Reconstructing C from Assembly

Given a function's disassembly, systematically reconstruct the C:

Step 1: Identify Parameters

Count how many of RDI, RSI, RDX, RCX, R8, R9 are read before being written. Each read-before-write register is a parameter. Look at what they are used for (string operations → char *, arithmetic → integer).
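Step 1 can be sketched in Python over a hand-transcribed instruction list, where each instruction is an assumed (registers_read, registers_written) tuple rather than any tool's output. One refinement over simple counting: if a function reads RSI but never touches RDI, it still takes two arguments (the first is just unused), so the sketch reports the position of the highest argument register read before it is written. It only handles straight-line code; with branches you would need to check every path.

```python
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Map 32-bit spellings to their 64-bit parent (partial list, enough for this sketch)
PARENT = {"edi": "rdi", "esi": "rsi", "edx": "rdx", "ecx": "rcx", "r8d": "r8", "r9d": "r9"}


def count_params(insns) -> int:
    """insns: straight-line list of (regs_read, regs_written) tuples in program order."""
    written = set()
    highest = 0
    for reads, writes in insns:
        for reg in reads:                       # an instruction reads before it writes
            reg = PARENT.get(reg, reg)
            if reg in ARG_REGS and reg not in written:
                highest = max(highest, ARG_REGS.index(reg) + 1)
        written.update(PARENT.get(r, r) for r in writes)
    return highest


# check_password from earlier, transcribed by hand as (reads, writes)
CALLER_SAVED = ("rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11")
check_password = [
    (("rbp", "rsp"), ("rsp",)),      # push rbp
    (("rsp",), ("rbp",)),            # mov rbp, rsp
    (("rsp",), ("rsp",)),            # sub rsp, 0x10
    (("rdi", "rbp"), ()),            # mov [rbp-0x8], rdi   <- rdi read first
    (("rbp",), ("rax",)),            # mov rax, [rbp-0x8]
    (("rax",), ("rsi",)),            # mov rsi, rax
    ((), ("rax",)),                  # lea rax, [rip+0xe54]
    (("rax",), ("rdi",)),            # mov rdi, rax
    (("rdi", "rsi"), CALLER_SAVED),  # call strcmp: reads its args, clobbers caller-saved regs
]
print(count_params(check_password))
```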

Step 2: Identify Local Variables

Stack offsets like [rbp-8], [rbp-16], [rbp-24] are local variables. Their use pattern tells you the type: if they are passed to printf %s they are char *, if they are compared with cmp dword they are int.

Step 3: Reconstruct Control Flow

Build a control flow graph (CFG): each basic block is a node, each conditional jump creates two edges. Loops appear as back edges (jumps to earlier addresses). If-else appears as diamond patterns.
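The leader rule behind basic blocks is short enough to sketch. This Python version works on a hand-made list of (address, mnemonic, branch_target) tuples, an assumed input format rather than any tool's output: a new block starts at the first instruction, at every jump target, and right after every jump or ret; a back edge is any jump whose target is at or before its own address.

```python
def basic_blocks(insns):
    """insns: (addr, mnemonic, target) tuples sorted by address; target is None unless a jump."""
    leaders = {insns[0][0]}
    for i, (addr, mnem, target) in enumerate(insns):
        if mnem.startswith("j") or mnem == "ret":
            if target is not None:
                leaders.add(target)                # a jump target starts a block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])       # so does the instruction after a jump
    blocks, current = [], []
    for insn in insns:
        if insn[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(insn)
    blocks.append(current)
    return blocks


def back_edges(insns):
    """Jumps to an earlier (or the same) address: the signature of a loop."""
    return [(addr, target) for addr, mnem, target in insns
            if mnem.startswith("j") and target is not None and target <= addr]


# The -O0 for loop from earlier, with small made-up addresses
loop = [
    (0x00, "mov", None),   # i = 0
    (0x01, "jmp", 0x04),   # jmp .check
    (0x02, "call", None),  # .body: call body
    (0x03, "add", None),   # i++
    (0x04, "cmp", None),   # .check: i < 10 ?
    (0x05, "jle", 0x02),   # jle .body
]
print([[hex(a) for a, _, _ in block] for block in basic_blocks(loop)])
print(back_edges(loop))
```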

Step 4: Identify the Return Value

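What is in RAX (or XMM0 for floating-point values) at each ret? That is the return value; whether the code writes EAX or the full RAX hints at a 32-bit or 64-bit (often pointer) return type.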

Step 5: Fill in the Structure

With parameters, locals, control flow, and return value identified, write the equivalent C.

Working Without Symbols: Practical Techniques

String Cross-References as Entry Points

Interesting strings lead to interesting functions. A string like "authentication failed" or "root" or "/etc/passwd" will be in .rodata. Find the code that references it, and you have found a relevant function. This is why malware analysts search for strings first.
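The strings utility is simple enough to re-implement when you want the offsets inside your own scripts. This Python sketch approximates GNU strings' behavior for plain ASCII: runs of printable characters (space through ~, plus tab) at least min_len bytes long, reported with their file offsets. The sample bytes are made up.

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> list[tuple[int, str]]:
    """Return (file_offset, text) for each printable-ASCII run of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


blob = b"\x00\x01ELF\x00authentication failed\x00\xffroot\x00/etc/passwd\x00"
for offset, text in extract_strings(blob):
    print(f"{offset:#06x}  {text}")
```

Run against open("binary", "rb").read() and filtered for words such as "fail" or "passwd", this gives file offsets; convert them to virtual addresses with the section table from readelf -S before searching the disassembly.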

Constant Values Identify Algorithms

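Magic constants identify algorithms:

  • 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476: MD5 initial state (SHA-1 starts from the same four words plus 0xC3D2E1F0)
  • 0x6A09E667, 0xBB67AE85: SHA-256 initial state
  • 0x9E3779B9: golden-ratio constant used in Fibonacci hashing (and as the TEA/XTEA delta)
  • 0x61626364: four ASCII characters ("abcd"), likely a string comparison; x86 is little-endian, so a compare against the bytes "abcd" loaded as one dword appears as 0x64636261
  • 0x5A4D: the "MZ" DOS/PE file signature (bytes 4D 5A read as a little-endian word)

When you see unexplained constants being loaded and used in computation, search for them online. This is how analysts identify unlabeled or lightly obfuscated cryptographic code.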
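Searching for such constants is easy to automate. This Python sketch scans raw bytes for the little-endian encodings of a few well-known constants (x86 stores 32-bit immediates in little-endian order, so 0x67452301 appears as 01 23 45 67). The table is deliberately small; extend it as needed, and remember that a hit is a lead to investigate, not proof.

```python
import struct

KNOWN_CONSTANTS = {
    0x67452301: "MD5/SHA-1 initial state",
    0xEFCDAB89: "MD5/SHA-1 initial state",
    0x98BADCFE: "MD5/SHA-1 initial state",
    0x10325476: "MD5/SHA-1 initial state",
    0xC3D2E1F0: "SHA-1 initial state",
    0x6A09E667: "SHA-256 initial state",
    0xBB67AE85: "SHA-256 initial state",
    0x9E3779B9: "golden-ratio constant (Fibonacci hashing, TEA/XTEA)",
}


def scan_constants(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, label) for each little-endian occurrence of a known constant."""
    hits = []
    for value, label in KNOWN_CONSTANTS.items():
        needle = struct.pack("<I", value)
        start = 0
        while (pos := data.find(needle, start)) != -1:
            hits.append((pos, label))
            start = pos + 1
    return sorted(hits)


# mov DWORD PTR [rdi], 0x67452301 ; mov DWORD PTR [rdi+4], 0xefcdab89
code = bytes.fromhex("c70701234567" "c7470489abcdef")
print(scan_constants(code))
```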

Function Size and Call Patterns

  • Very small functions (< 10 instructions): accessors, wrappers, trivial computations
  • Functions calling malloc and free: memory management
  • Functions with many comparisons and branches: parsers or validators
  • Functions with loops over memory: string processing, copy, search
  • Functions calling socket, connect, send, recv: network code

CTF-Style Reverse Engineering

CTF (Capture The Flag) competitions include "rev" (reverse engineering) challenges where you must reverse engineer a program to find a hidden "flag" — typically a string like FLAG{s0m3_s3cr3t}.

The general approach:

  1. Run the binary. What does it do? Does it ask for input?
  2. Examine strings: strings binary | grep FLAG
  3. Disassemble and find the input validation routine
  4. Understand the validation logic
  5. Either run it with the correct input, patch the binary to skip validation, or derive the correct input analytically

Keygen Algorithms

A common pattern: the program applies a transformation to your input and compares the result to a known value (or applies a transformation to a serial number and validates the checksum). To produce valid input:

  1. Reverse the transformation algorithm
  2. If it is a hash, try brute force (feasible when the input space is small or the hash is weak)
  3. If it is a simple reversible computation, invert it mathematically
  4. If it is a lookup table, invert the table
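As an illustration of step 3, suppose the disassembly showed each input byte being XORed with a constant and then having its position added before the comparison. The constant 0x5A and the whole transformation are hypothetical, invented for this sketch, but the method is general: undo each operation in reverse order using its inverse (subtraction for addition, XOR for XOR), working modulo 256 because the code operates on bytes.

```python
KEY = 0x5A  # hypothetical constant "recovered" from the disassembly


def transform(data: bytes) -> bytes:
    """What the hypothetical checker does to each input byte before comparing."""
    return bytes(((b ^ KEY) + i) & 0xFF for i, b in enumerate(data))


def invert(expected: bytes) -> bytes:
    """Undo the steps in reverse order: subtract the index, then XOR with KEY again."""
    return bytes(((t - i) & 0xFF) ^ KEY for i, t in enumerate(expected))


# In a real crackme, `expected` would be the bytes the binary compares against.
expected = transform(b"FLAG{s0m3_s3cr3t}")
print(invert(expected))
```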

Obfuscation Techniques

Junk instructions: meaningless instructions that do not affect program logic (NOPs, XOR reg, reg before setting it again). Identify them by their lack of effect on any register or memory the program later uses.

Opaque predicates: conditionals that always take the same branch (e.g., test rax, rax; jz never_taken_label where rax is provably never zero). Used to confuse static analysis.
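A classic textbook opaque predicate (not taken from any particular binary) is x*(x+1) % 2 == 0: the product of two consecutive integers is always even, so the branch always goes the same way even though a naive analyzer sees a data-dependent condition. The Python sketch below checks that the property also holds under 32-bit wraparound, which is what matters in compiled code; reducing modulo 2^32 never changes the lowest bit.

```python
def opaque_predicate(x: int) -> bool:
    """Always True: x and x+1 are consecutive, so one of them is even."""
    product = (x * (x + 1)) & 0xFFFFFFFF   # emulate 32-bit unsigned wraparound
    return (product & 1) == 0


# A sample range plus a wraparound case; the parity argument covers the rest.
assert all(opaque_predicate(x) for x in range(-100_000, 100_000))
assert opaque_predicate(0xFFFFFFFF)
```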

Self-modifying code: decrypts itself at runtime. Recognize by: executable memory being written, then jumped to. Requires dynamic analysis to see the decrypted form.

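Packed executables: the binary is compressed/encrypted; a stub at the entry point unpacks it into memory and jumps to it. For UPX, upx -d restores the original binary. Otherwise, analyze the stub in GDB: set a breakpoint on the final jump (e.g., jmp rax) that transfers control to the unpacked code, then save the unpacked region with GDB's dump memory command.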
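Two quick heuristics help spot packing before you open a debugger. Compressed or encrypted data has Shannon entropy close to 8 bits per byte, while ordinary code and text sit noticeably lower; and UPX-packed files normally still contain UPX's "UPX!" marker (and, in PE files, section names such as UPX0/UPX1). The Python sketch below implements both. The marker list is an assumption worth checking against the UPX version you meet, and modified packers strip these markers deliberately, so a negative result proves nothing.

```python
import math
from collections import Counter

UPX_MARKERS = (b"UPX!", b"UPX0", b"UPX1")


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for uniformly spread bytes."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def has_upx_marker(data: bytes) -> bool:
    return any(marker in data for marker in UPX_MARKERS)


print(shannon_entropy(b"\x00" * 4096))
print(shannon_entropy(bytes(range(256)) * 16))
```

Computing the entropy per section (or per 4 KiB window) rather than over the whole file makes a packed payload stand out against the small, low-entropy stub.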

Register Trace: Following a Crackme

Let us trace the register state through a simple validation function:

Input: "test123"
Correct answer: "s3cr3t"

Instruction               RAX   RDI        RSI          RFLAGS (key)
mov rdi, <"s3cr3t">       -     0x402004   -            -
mov rsi, <"test123">      -     0x402004   0x7ffd1234   -
call strcmp               -     0x402004   0x7ffd1234   -
(after strcmp returns)    -1    -          -            -
test eax, eax             -1    -          -            ZF=0, SF=1
jne fail                  -1    -          -            ZF=0 → taken
mov eax, 0                0     -          -            -
ret                       0     -          -            -

strcmp returned a negative value (shown as -1; C only guarantees the sign) because 's' sorts before 't'. The result is nonzero, so test left ZF clear, the jne to fail was taken, and the function returns 0.
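A small Python model makes the sign behavior explicit. c_strcmp below returns the difference of the first mismatching bytes, which is one valid implementation (the C standard only promises the sign), and check_password mirrors the test eax, eax / jne pair: any nonzero result clears ZF and takes the failure path.

```python
def c_strcmp(a: bytes, b: bytes) -> int:
    """Compare like C strcmp on NUL-terminated strings; returns the first byte difference."""
    for x, y in zip(a + b"\x00", b + b"\x00"):
        if x != y or x == 0:      # mismatch, or both strings ended together
            return x - y
    return 0


def check_password(user_input: bytes) -> int:
    result = c_strcmp(b"s3cr3t", user_input) & 0xFFFFFFFF   # value left in EAX
    zf = result == 0                                         # test eax, eax
    if not zf:                                               # jne fail
        return 0
    return 1


print(c_strcmp(b"s3cr3t", b"test123"))   # -1: 's' (0x73) sorts before 't' (0x74)
print(check_password(b"test123"), check_password(b"s3cr3t"))
```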

Complete Annotated Disassembly: A Real Utility

Here is a complete annotated disassembly of a simplified cat implementation, demonstrating a real RE analysis:

; cat-simple: reads argv[1] and writes to stdout
; No symbols, stripped binary

0000000000401196 <???>:     ; we name this "read_file"
  401196: 55                    push rbp
  401197: 48 89 e5              mov rbp, rsp
  40119a: 48 81 ec 20 10 00 00  sub rsp, 0x1020        ; 4128 bytes: 4096-byte buffer + locals
  4011a1: 48 89 7d e8           mov [rbp-0x18], rdi    ; save arg1 (filename)

  ; Open the file
  4011a5: 48 8b 45 e8           mov rax, [rbp-0x18]    ; filename
  4011a9: be 00 00 00 00        mov esi, 0x0           ; O_RDONLY = 0
  4011ae: 48 89 c7              mov rdi, rax
  4011b1: e8 xx xx xx xx        call open@plt          ; fd = open(filename, O_RDONLY)
  4011b6: 89 45 fc              mov [rbp-0x4], eax     ; save fd (int, 4 bytes)

  ; Check for error
  4011b9: 83 7d fc 00           cmp DWORD [rbp-0x4], 0
  4011bd: 79 11                 jns .read_loop         ; fd >= 0: success
  ; Error path: write error message and return -1
  ...

.read_loop:
  ; read(fd, buf, 4096)
  4011d0: 44 8b 45 fc           mov r8d, [rbp-0x4]     ; fd
  4011d4: 48 8d 85 e0 ef ff ff  lea rax, [rbp-0x1020]  ; buf = rbp-0x1020 (4096 bytes)
  4011db: ba 00 10 00 00        mov edx, 0x1000        ; count = 4096
  4011e0: 48 89 c6              mov rsi, rax           ; buf
  4011e3: 44 89 c7              mov edi, r8d           ; fd
  4011e6: e8 xx xx xx xx        call read@plt
  4011eb: 48 89 45 f0           mov [rbp-0x10], rax    ; save nread (ssize_t, 8 bytes)

  ; Check for EOF or error
  4011ef: 48 83 7d f0 00        cmp QWORD [rbp-0x10], 0
  4011f4: 7e 1a                 jle .done              ; nread <= 0: done

  ; write(1, buf, nread)
  4011f6: 48 8b 55 f0           mov rdx, [rbp-0x10]    ; nread
  4011fa: 48 8d 85 e0 ef ff ff  lea rax, [rbp-0x1020]  ; buf
  401201: 48 89 c6              mov rsi, rax
  401204: bf 01 00 00 00        mov edi, 0x1           ; stdout = 1
  401209: e8 xx xx xx xx        call write@plt
  40120e: eb c0                 jmp .read_loop         ; loop

.done:
  401210: 8b 45 fc              mov eax, [rbp-0x4]     ; fd
  401213: 89 c7                 mov edi, eax
  401215: e8 xx xx xx xx        call close@plt
  40121a: b8 00 00 00 00        mov eax, 0             ; return 0
  40121f: c9                    leave
  401220: c3                    ret

Reconstructed C:

#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // read, write, close, ssize_t

int read_file(const char *filename) {
    char buf[4096];   // at rbp-0x1020 (exact stack layout varies by compiler)
    int fd;
    ssize_t nread;

    fd = open(filename, O_RDONLY);
    if (fd < 0) { /* error message elided */ return -1; }

    while ((nread = read(fd, buf, 4096)) > 0) {
        write(1, buf, nread);
    }

    close(fd);
    return 0;
}

The reconstruction is straightforward once you know the library function signatures and can identify stack variables.

🛠️ Lab Exercise: Download a simple open-source utility (e.g., GNU wc), compile it, strip the resulting binary (strip wc), and reverse engineer the word-counting logic. Start with strings to find interesting output, work backward to the counting function, and reconstruct the loop logic.

Practical RE Workflow

When encountering an unknown binary:

  1. Gather metadata: file binary (what type?), readelf -h binary (ELF headers), strings -n 8 binary | sort -u (interesting strings)
  2. Check imports: objdump -T binary or readelf -d binary (what libraries? what functions?). Avoid ldd on untrusted binaries, since it may execute them.
  3. Identify entry points: readelf -l binary (entry point), follow from _start to main
  4. High-level structure: In Ghidra, look at the function list. Identify main. Follow calls outward.
  5. Focus on interesting functions: functions that use cryptographic constants, call network functions, perform file I/O, or compare strings
  6. Dynamic confirmation: When you have a hypothesis about behavior, confirm it with GDB. Set a breakpoint, run, inspect state.
  7. Document as you go: Rename symbols in Ghidra as you understand them. Future-you will be grateful.
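Parts of step 1 are easy to script. This Python sketch reads the fixed ELF header fields that file and readelf -h report (magic, class, byte order, type, machine, entry point). It deliberately handles only 64-bit little-endian ELF; note that type DYN covers both shared libraries and position-independent executables, whose entry point is relative to the load address.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
EM_X86_64 = 62


def elf_summary(header: bytes) -> dict:
    """Summarize the first bytes of a 64-bit little-endian ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:        # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # e_type, e_machine, e_version, e_entry follow the 16-byte e_ident array
    e_type, e_machine, _version, e_entry = struct.unpack_from("<HHIQ", header, 16)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "x86_64": e_machine == EM_X86_64,
        "entry": e_entry,
    }


# Usage: with open("binary", "rb") as f: print(elf_summary(f.read(64)))
```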

🔄 Check Your Understanding:

  1. What does objdump -M intel -d binary produce, and why prefer Intel syntax?
  2. In a stripped binary, how do you find the address of main()?
  3. What assembly pattern indicates a switch statement with dense cases?
  4. A function contains the constant 0x67452301. What algorithm does this suggest?
  5. What is the difference between static and dynamic analysis? When would you use each?

Summary

Reverse engineering is the practice of reading assembly without source code. The toolkit (objdump for quick inspection, GDB for dynamic analysis, Ghidra for high-level understanding) covers the full range from raw bytes to reconstructed C. Compiler patterns (prologues, if-else, loops, switch tables, virtual dispatch) are predictable and learnable. Working without symbols uses string cross-references, constant values, function size, and call patterns to identify functionality. The skills built in this chapter form the foundation for vulnerability research (Chapters 35-37) and security-focused work throughout a systems programming career.