Chapter 34 Key Takeaways: Reverse Engineering

Open Assembly Language Project

Reverse engineering means reading assembly without source code. It is a fundamental skill for malware analysis, vulnerability research, interoperability, CTF competitions, and understanding black-box software. It is generally legal for these purposes, and it is essential for defensive security work.
The RE toolkit has distinct roles: objdump for quick inspection, GDB for dynamic analysis, Ghidra for automated decompilation and cross-references, IDA Free for industry-standard static analysis, radare2 for scripted analysis, pwndbg/peda for security-focused GDB enhancement. Use the right tool for each task.
Prefer Intel syntax (objdump -M intel, or set disassembly-flavor intel in GDB): destination-first, no % register prefixes, no size suffixes. AT&T syntax is the historical default of the GNU tools; Intel syntax is what IDA, Ghidra, and most security publications use.
Static analysis is safe; start there: examine strings, imports (dynamic symbol table), section headers, and disassembly without executing the binary. Dynamic analysis with GDB confirms hypotheses from static analysis.
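As a minimal sketch of that first static pass, the Python below reads the fixed-size ELF64 file header with the standard struct module and never executes the target. The function name parse_elf64_header and the dictionary keys are illustrative choices, not part of any tool, and the sketch assumes a little-endian 64-bit ELF file.

```python
import struct

# Layout of the 64-byte ELF64 file header (little-endian).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    """Return a few header fields from a little-endian ELF64 image."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short for an ELF64 header")
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # e_ident[4] is the class (2 = 64-bit), e_ident[5] the data encoding (1 = LSB).
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian files")
    return {
        "type": fields[1],     # e_type: 2 = ET_EXEC, 3 = ET_DYN (PIE or shared object)
        "machine": fields[2],  # e_machine: 62 = EM_X86_64
        "entry": fields[4],    # e_entry: virtual address of _start
        "shnum": fields[12],   # e_shnum: number of section headers
    }
```

Calling parse_elf64_header on the first 64 bytes of a file gives the entry point to start disassembling from, before reaching for readelf or objdump for the full picture.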
Compiler patterns are predictable and learnable: function prologues (push rbp; mov rbp, rsp; sub rsp, N), if-else (CMP + conditional jump), for/while loops (counter + conditional jump at bottom), switch/case (bounds check + indirect JMP through jump table), virtual dispatch (two loads + indirect CALL). Recognizing these transforms noise into structure.
Find functions without symbols by: following CALL instructions from _start, finding the first argument to __libc_start_main (which is main, passed in RDI on x86-64 Linux), searching for function prologue byte sequences, and using string cross-references to interesting functions.
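The prologue search can be sketched in a few lines of Python. The helper find_prologues is a hypothetical name; it assumes you have already extracted the raw bytes of the code section and know its load address. Both encodings of mov rbp, rsp are checked, and the results are candidates only: optimized code often omits the frame pointer, and the byte pattern can also occur inside data or in the middle of another instruction.

```python
# Encodings of "push rbp; mov rbp, rsp" on x86-64.
# The mov can be assembled as 48 89 E5 or 48 8B EC.
PROLOGUES = (b"\x55\x48\x89\xe5", b"\x55\x48\x8b\xec")


def find_prologues(code: bytes, base: int = 0) -> list[int]:
    """Return sorted addresses where a frame-pointer prologue begins."""
    hits = []
    for pattern in PROLOGUES:
        start = 0
        while (idx := code.find(pattern, start)) != -1:
            hits.append(base + idx)
            start = idx + 1
    return sorted(hits)
```

Each returned address is a starting point to confirm in a disassembler, not proof of a function boundary.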
String cross-references guide RE analysis: interesting strings (error messages, format strings, paths) are in .rodata and lead directly to the functions that use them. In malware, strings like /proc/self/maps or network-related strings are critical indicators.
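A rough Python equivalent of the strings utility shows how little is needed to get started. The names extract_strings, flag_indicators, and the INDICATORS tuple are illustrative; apart from /proc/self/maps, the indicator list is an assumption chosen for the example and would be tuned per investigation.

```python
import re

# Illustrative watch list; adjust to the sample under analysis.
INDICATORS = ("/proc/self/maps", "/bin/sh", "http://")


def extract_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def flag_indicators(strings: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Keep only the strings that contain one of the watch-list substrings."""
    return [(off, s) for off, s in strings if any(k in s for k in INDICATORS)]
```

The offsets are file offsets; mapping them to virtual addresses and following the cross-references is the step a tool such as Ghidra automates.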
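Magic constants identify algorithms and file formats: 0x67452301 is the first initial-state word of MD5 (SHA-1 uses the same value), 0x6A09E667 is SHA-256, 0x9E3779B9 is the golden-ratio constant of Fibonacci hashing (also the TEA delta), 0x5A4D is the PE/DOS "MZ" signature, and 0x7F454C46 is the ELF magic read as big-endian bytes. Identifying these in unknown code reveals the algorithm without understanding every instruction.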
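A constant scan is easy to sketch in Python. The table below holds only the three algorithm constants named above, and find_magic is a hypothetical helper; it searches for the little-endian encoding because that is how x86 stores 32-bit immediates and data words.

```python
import struct

MAGIC = {
    0x67452301: "MD5 / SHA-1 initial state word",
    0x6A09E667: "SHA-256 initial hash value",
    0x9E3779B9: "golden-ratio constant (Fibonacci hashing, TEA delta)",
}


def find_magic(data: bytes) -> list[tuple[int, str]]:
    """Return sorted (offset, label) pairs for known 32-bit constants in data."""
    hits = []
    for value, label in MAGIC.items():
        needle = struct.pack("<I", value)  # little-endian, as stored on x86
        start = 0
        while (idx := data.find(needle, start)) != -1:
            hits.append((idx, label))
            start = idx + 1
    return sorted(hits)
```

A hit is a strong hint rather than proof: check that the neighboring constants of the suspected algorithm appear nearby before labeling the function.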
Ghidra's decompiler accelerates analysis by producing C-like pseudocode from assembly, resolving RIP-relative addresses, and maintaining cross-references. Rename symbols as you understand them — it propagates to all call sites and makes subsequent analysis faster.
GDB Python scripting enables automated analysis: set breakpoints that log arguments without stopping execution, trace all calls to a function across thousands of iterations, automate repetitive analysis steps. This is essential for practical RE work.
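A logging breakpoint can be sketched with GDB's Python API by subclassing gdb.Breakpoint and returning False from stop. The class name LogCall and the helper format_call are illustrative, and the register list assumes the System V AMD64 calling convention with integer or pointer arguments. The import guard only exists so the file can also be loaded outside GDB.

```python
try:
    import gdb  # available only inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def format_call(name: str, args: list[int]) -> str:
    """Render one logged call as name(0x..., 0x...)."""
    return f"{name}({', '.join(hex(a) for a in args)})"


if gdb is not None:

    class LogCall(gdb.Breakpoint):
        """Breakpoint that prints integer arguments and lets the program continue."""

        def __init__(self, spec: str, nargs: int = 2):
            super().__init__(spec)
            self.label = spec
            self.nargs = nargs

        def stop(self) -> bool:
            # System V AMD64 passes the first six integer arguments in these registers.
            regs = ("$rdi", "$rsi", "$rdx", "$rcx", "$r8", "$r9")[: self.nargs]
            # Mask to 64 bits so negative register values print as unsigned hex.
            args = [int(gdb.parse_and_eval(r)) & 0xFFFFFFFFFFFFFFFF for r in regs]
            gdb.write(format_call(self.label, args) + "\n")
            return False  # False tells GDB not to halt at this breakpoint
```

Inside GDB, load the file with the source command and then run something like python LogCall("check_key", nargs=1), where check_key is a hypothetical symbol; in a stripped binary, pass an address in the form "*0x401136" instead.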
NOP sleds, mmap/mprotect sequences, and position-independent code are shellcode indicators: seeing writable memory made executable at runtime (mprotect with PROT_EXEC) is a key malware behavioral indicator, regardless of what the shellcode contains.
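The writable-then-executable pattern can be checked mechanically once the mmap and mprotect calls have been logged. The sketch below assumes the Linux values of the PROT_* flags and a hypothetical log format of (address, length, protection) tuples; as a simplification it tracks regions by start address only and ignores partial overlaps.

```python
# Linux values from <sys/mman.h>.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def describe_prot(prot: int) -> str:
    """Render a protection bitmask as a string such as 'RW' or 'RX'."""
    flags = ((PROT_READ, "R"), (PROT_WRITE, "W"), (PROT_EXEC, "X"))
    return "".join(name for bit, name in flags if prot & bit) or "NONE"


def suspicious_transitions(calls) -> list[tuple[int, str]]:
    """Flag regions that are writable and executable, or writable first and executable later.

    calls: iterable of (addr, length, prot) tuples in the order they occurred.
    """
    writable = set()
    flagged = []
    for addr, _length, prot in calls:
        if prot & PROT_EXEC and (prot & PROT_WRITE or addr in writable):
            flagged.append((addr, describe_prot(prot)))
        if prot & PROT_WRITE:
            writable.add(addr)
    return flagged
```

Legitimate JIT compilers produce the same sequence, so a flagged region is a prompt to dump and inspect the memory rather than a verdict.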
Reconstructing C from assembly is systematic: identify parameters (registers read before write), local variables (stack offsets and their types), control flow (CFG from conditional jumps), and return value (RAX at each RET). With these four elements, the C reconstruction is mechanical.
CTF-style RE follows a consistent workflow: run the binary to understand behavior, examine strings for hints, find the validation function via string cross-references, understand the comparison logic, reverse the transformation analytically or algebraically. No source code needed.
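The last step of that workflow, reversing the transformation, often reduces to undoing each operation in the opposite order. The example below is entirely hypothetical: it assumes the validation function XORs each input byte with 0x5A and then adds the byte's index modulo 256 before comparing against a stored table.

```python
KEY = 0x5A  # hypothetical XOR key read from the disassembly


def transform(data: bytes) -> bytes:
    """Forward transformation as recovered from the (hypothetical) check function."""
    return bytes(((b ^ KEY) + i) & 0xFF for i, b in enumerate(data))


def invert(expected: bytes) -> bytes:
    """Undo the steps in reverse order: subtract the index, then XOR with the key."""
    return bytes(((t - i) & 0xFF) ^ KEY for i, t in enumerate(expected))
```

Applying invert to the comparison table extracted from .rodata would yield the accepted input. Keeping transform alongside it gives a quick round-trip check that the recovered logic matches the binary.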
Stripped binaries are the norm in production software: learn to work without symbols. The dynamic symbol table (imported functions) still reveals what capabilities the binary uses, even when internal names are stripped.
Dynamic analysis confirms static hypotheses: when you believe you understand a function statically, set a GDB breakpoint and observe the actual values. Assembly is unambiguous, but your reading of it may be wrong. Confirmation is cheap and saves embarrassment.