Case Study 3.2: CPUID — Querying the CPU About Itself
A complete NASM program that reads the processor brand string and checks for SSE4.2, AES-NI, AVX, AVX2, AVX-512F, and the SHA extensions
Overview
The CPUID instruction is one of x86-64's most unusual features: it's a self-reporting mechanism that lets software query the processor's capabilities, vendor, and version. Before using AES-NI for encryption, AVX2 for vectorized loops, or any other optional extension, you should check CPUID to confirm the current CPU supports it. This case study builds a complete NASM program that enumerates CPU features and demonstrates how runtime capability detection is implemented in real programs.
How CPUID Works
CPUID takes a "leaf" value in EAX (and sometimes a "sub-leaf" in ECX) and returns information in EAX, EBX, ECX, and EDX. The leaf value determines what information is returned.
Key CPUID leaves:
| EAX (leaf) | ECX (sub-leaf) | Information returned |
|---|---|---|
| 0 | — | Maximum basic leaf in EAX; 12-byte vendor string in EBX, EDX, ECX (in that order) |
| 1 | — | Processor version info; feature flags |
| 4 | 0,1,2,... | Cache topology |
| 7 | 0 | Extended feature flags (AVX2, SHA, etc.) |
| 7 | 1 | More extended features (AVX-VNNI, SHA512, etc.) |
| 0x80000000 | — | Maximum extended leaf |
| 0x80000002-4 | — | Processor brand string (48 chars) |
| 0x80000008 | — | Virtual/physical address size |
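Important: CPUID is a serializing instruction. All earlier instructions must complete before it executes, and no later instruction begins until it finishes. That makes it expensive (on the order of a hundred or more cycles on bare metal, and considerably more inside a virtual machine, where it typically traps to the hypervisor). Query it once at startup and cache the results rather than executing it in a hot path.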
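To make the leaf mechanism concrete, here is a minimal sketch that reads leaf 0 and prints the vendor string. It is a separate illustration, not part of the case-study program; the file name and the vendor_buf label are assumptions made for this example. Note the register order: EBX, then EDX, then ECX.

```nasm
; vendor_sketch.asm (hypothetical file name): print the CPUID vendor string
; Build: nasm -f elf64 vendor_sketch.asm -o vendor_sketch.o && ld vendor_sketch.o -o vendor_sketch
section .bss
vendor_buf  resb 13             ; 12 characters + newline

section .text
global _start
_start:
    xor eax, eax                ; leaf 0
    cpuid                       ; EAX = max basic leaf; EBX, EDX, ECX = vendor
    lea rdi, [rel vendor_buf]
    mov [rdi], ebx              ; characters 0-3
    mov [rdi + 4], edx          ; characters 4-7 (EDX comes before ECX)
    mov [rdi + 8], ecx          ; characters 8-11
    mov byte [rdi + 12], 10     ; trailing newline

    mov eax, 1                  ; sys_write
    mov rsi, rdi                ; buffer (copy before RDI is reused)
    mov edi, 1                  ; stdout
    mov edx, 13                 ; length
    syscall

    mov eax, 60                 ; sys_exit
    xor edi, edi
    syscall
```

On Intel processors this prints GenuineIntel, and on AMD processors AuthenticAMD. Because the code runs directly in _start and never returns to a caller, it does not need to preserve RBX.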
The Complete Program
; cpuid_demo.asm
; Demonstrates CPUID: reads brand string, checks SSE4.2/AES-NI/AVX/AVX2/AVX-512F/SHA
; Build: nasm -f elf64 cpuid_demo.asm -o cpuid_demo.o && ld cpuid_demo.o -o cpuid_demo
section .data
; Labels and messages
brand_label db "CPU: ", 0
brand_label_len equ $ - brand_label - 1 ; exclude null
newline db 10
feature_labels:
.sse42 db "SSE4.2 : ", 0
.aesni db "AES-NI : ", 0
.avx db "AVX : ", 0
.avx2 db "AVX2 : ", 0
.avx512f db "AVX-512F : ", 0
.sha db "SHA Ext. : ", 0
msg_yes db "YES", 10
msg_yes_len equ $ - msg_yes
msg_no db "NO ", 10
msg_no_len equ $ - msg_no
section .bss
; 48 bytes for brand string + null terminator
brand_string resb 49
section .text
global _start
; ============================================================
; print_cstr: print a null-terminated string
; Args: rdi = pointer to string
; Clobbers: rax, rcx, rdx, rsi, r11 (rdi is preserved; syscall overwrites rcx and r11)
; ============================================================
print_cstr:
push rdi
; calculate length by scanning for null
xor ecx, ecx
.len_loop:
cmp BYTE [rdi + rcx], 0
je .len_done
inc ecx
jmp .len_loop
.len_done:
; rdi = string, rcx = length
mov rsi, rdi ; buffer
mov rdx, rcx ; length
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
syscall
pop rdi
ret
; ============================================================
; print_yes_no: print YES or NO followed by newline
; Args: rdi = 1 for yes, 0 for no
; ============================================================
print_yes_no:
test rdi, rdi
jz .no
mov rax, 1
mov rdi, 1
lea rsi, [rel msg_yes]
mov rdx, msg_yes_len
syscall
ret
.no:
mov rax, 1
mov rdi, 1
lea rsi, [rel msg_no]
mov rdx, msg_no_len
syscall
ret
; ============================================================
; get_brand_string: fill brand_string buffer with CPU brand
; Uses CPUID leaves 0x80000002, 0x80000003, 0x80000004
; Each leaf returns 16 bytes (EAX, EBX, ECX, EDX = 4×4 bytes)
; ============================================================
get_brand_string:
push rbx ; CPUID overwrites RBX, which is callee-saved
; First, check if extended CPUID leaves are available
mov eax, 0x80000000
cpuid
cmp eax, 0x80000004 ; check max extended leaf
jb .no_brand ; if < 0x80000004, brand string not available
lea rdi, [rel brand_string]
; Leaf 0x80000002: first 16 bytes
mov eax, 0x80000002
cpuid
mov [rdi], eax ; bytes 0-3
mov [rdi + 4], ebx ; bytes 4-7
mov [rdi + 8], ecx ; bytes 8-11
mov [rdi + 12], edx ; bytes 12-15
; Leaf 0x80000003: second 16 bytes
mov eax, 0x80000003
cpuid
mov [rdi + 16], eax
mov [rdi + 20], ebx
mov [rdi + 24], ecx
mov [rdi + 28], edx
; Leaf 0x80000004: third 16 bytes
mov eax, 0x80000004
cpuid
mov [rdi + 32], eax
mov [rdi + 36], ebx
mov [rdi + 40], ecx
mov [rdi + 44], edx
; The CPU normally null-terminates the string within the 48 bytes;
; byte 48 of the buffer is zero anyway because .bss is zero-initialized
pop rbx
ret
.no_brand:
; Fill with "Unknown CPU"
lea rdi, [rel brand_string]
mov DWORD [rdi], 0x6E6B6E55 ; "Unkn" (little-endian)
mov DWORD [rdi + 4], 0x206E776F ; "own "
mov DWORD [rdi + 8], 0x00555043 ; "CPU\0"
pop rbx
ret
; ============================================================
; check_features: check and print CPU feature flags
; Uses CPUID leaf 1 (ECX) for SSE4.2, AES-NI, AVX
; Uses CPUID leaf 7, sub-leaf 0 (EBX) for AVX2, AVX-512F, SHA
; ============================================================
check_features:
push r12 ; callee-saved: will hold leaf-1 ECX
push r13 ; callee-saved: will hold leaf-7 EBX
push rbx ; callee-saved: CPUID clobbers EBX
; === CPUID Leaf 1 ===
mov eax, 1
cpuid
; EAX = version info, EBX = brand/clflush/apicid, ECX = feature flags
mov r12d, ecx ; save ECX (leaf-1 feature flags)
; EDX also has feature flags (older ones: SSE, SSE2, etc.)
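; === CPUID Leaf 7, Sub-leaf 0 ===
xor r13d, r13d ; default: report no leaf-7 features
xor eax, eax ; leaf 0: EAX = maximum basic leaf
cpuid
cmp eax, 7
jb .leaf7_done ; leaf 7 is not implemented on this CPU
mov eax, 7
xor ecx, ecx ; sub-leaf 0
cpuid
mov r13d, ebx ; save EBX (leaf-7 feature flags)
.leaf7_done: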
; === Print SSE4.2 (leaf 1, ECX bit 20) ===
lea rdi, [rel feature_labels.sse42]
call print_cstr
xor rdi, rdi
test r12, (1 << 20) ; bit 20 of leaf-1 ECX = SSE4.2
setnz dil ; set to 1 if bit is set, 0 otherwise
call print_yes_no
; === Print AES-NI (leaf 1, ECX bit 25) ===
lea rdi, [rel feature_labels.aesni]
call print_cstr
xor rdi, rdi
test r12, (1 << 25) ; bit 25 = AES-NI
setnz dil
call print_yes_no
; === Print AVX (leaf 1, ECX bit 28) ===
lea rdi, [rel feature_labels.avx]
call print_cstr
xor rdi, rdi
test r12, (1 << 28) ; bit 28 = AVX
setnz dil
call print_yes_no
; === Print AVX2 (leaf 7, EBX bit 5) ===
lea rdi, [rel feature_labels.avx2]
call print_cstr
xor rdi, rdi
test r13, (1 << 5) ; bit 5 = AVX2
setnz dil
call print_yes_no
; === Print AVX-512F (leaf 7, EBX bit 16) ===
lea rdi, [rel feature_labels.avx512f]
call print_cstr
xor rdi, rdi
test r13, (1 << 16) ; bit 16 = AVX-512 Foundation
setnz dil
call print_yes_no
; === Print SHA Extensions (leaf 7, EBX bit 29) ===
lea rdi, [rel feature_labels.sha]
call print_cstr
xor rdi, rdi
test r13, (1 << 29) ; bit 29 = SHA extensions
setnz dil
call print_yes_no
pop rbx
pop r13
pop r12
ret
; ============================================================
; _start: main entry point
; ============================================================
_start:
; Get and print brand string
call get_brand_string
; Print "CPU: " label
mov rax, 1
mov rdi, 1
lea rsi, [rel brand_label]
mov rdx, brand_label_len
syscall
; Print the brand string
lea rdi, [rel brand_string]
call print_cstr
; Print newline
mov rax, 1
mov rdi, 1
lea rsi, [rel newline]
mov rdx, 1
syscall
; Check and print features
call check_features
; Exit
mov rax, 60
xor rdi, rdi
syscall
Sample Output
Running on an Intel Core i9-12900K:
CPU: 12th Gen Intel(R) Core(TM) i9-12900K
SSE4.2 : YES
AES-NI : YES
AVX : YES
AVX2 : YES
AVX-512F : NO
SHA Ext. : YES
Running on an AMD Ryzen 9 7950X (Zen 4):
CPU: AMD Ryzen 9 7950X 16-Core Processor
SSE4.2 : YES
AES-NI : YES
AVX : YES
AVX2 : YES
AVX-512F : YES
SHA Ext. : YES
Key Technical Notes
Why CPUID Clobbers EBX
The CPUID instruction writes to all four of EAX, EBX, ECX, and EDX, whatever leaf you request. Under the System V AMD64 calling convention, RBX is callee-saved, so any function that executes CPUID must save RBX first and restore it before returning:
push rbx ; save callee-saved register
mov eax, 7
xor ecx, ecx
cpuid
mov r13d, ebx ; save the CPUID result before restoring rbx
pop rbx ; restore callee-saved register
Copy any EBX result you need into another register (R13D here) before restoring RBX and before executing CPUID again, since every execution overwrites all four registers.
The SETNZ Idiom
test r12, (1 << 25) ; sets ZF if bit 25 is 0; clears ZF if bit 25 is 1
setnz dil ; set DIL to 1 if ZF=0 (bit was set), 0 if ZF=1 (bit was clear)
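SETNZ (Set if Not Zero) writes 1 or 0 to a byte register based on the Zero Flag. This branchless pattern converts a flag bit into a 0/1 value without a conditional jump. Because SETNZ writes only the low 8 bits of its destination, the program zeroes RDI first, and the XOR must come before TEST because XOR itself modifies the flags.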
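An equivalent way to write the same conversion uses BT, which copies the selected bit into the Carry Flag, followed by SETC. The fragment below is a sketch of that alternative, not a change to the program; it assumes R12D already holds the leaf-1 ECX value and reuses the print_yes_no helper from the listing.

```nasm
; Alternative to TEST/SETNZ (sketch; assumes r12d holds leaf-1 ECX)
    xor edi, edi        ; clear RDI first: SETcc writes only the low byte
    bt  r12d, 25        ; CF = bit 25 of r12d (AES-NI)
    setc dil            ; DIL = 1 if the bit was set, 0 otherwise
    call print_yes_no   ; helper defined in the program above
```

BT takes a bit index rather than a mask, which some readers find easier to check against the CPUID documentation. TEST with a mask is more flexible when several bits must be checked at once.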
The Brand String Encoding
The 48-byte brand string is spread across three CPUID leaves, 16 bytes per leaf, in the register order EAX, EBX, ECX, EDX. Within each register, the lowest-order byte holds the earliest character.
For example, if a brand string starts with "Inte" (ASCII: 49 6E 74 65):
- EAX at leaf 0x80000002 = 0x65746E49 (little-endian: 49 6E 74 65 → "Inte")
When you write EAX to memory with mov [buffer], eax, the little-endian store places 0x49 at offset 0, 0x6E at offset 1, and so on, which spells "Inte" in the buffer. Because the lowest-order byte of each register holds the earliest character, storing the registers in the order EAX, EBX, ECX, EDX reproduces the string with no byte swapping.
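You can confirm the byte layout without CPUID at all. In the following sketch, a constant stands in for the value EAX would hold; the buf label and file name are assumptions for illustration only.

```nasm
; layout_sketch.asm (hypothetical file name): show how a register lands in memory
; Build: nasm -f elf64 layout_sketch.asm -o layout_sketch.o && ld layout_sketch.o -o layout_sketch
section .bss
buf resb 5                      ; 4 characters + newline

section .text
global _start
_start:
    mov eax, 0x65746E49         ; stand-in for EAX from leaf 0x80000002
    lea rsi, [rel buf]
    mov [rsi], eax              ; memory now holds 49 6E 74 65
    mov byte [rsi + 4], 10      ; newline

    mov eax, 1                  ; sys_write
    mov edi, 1                  ; stdout
    mov edx, 5                  ; length (RSI already points at buf)
    syscall

    mov eax, 60                 ; sys_exit
    xor edi, edi
    syscall
```

The program prints Inte followed by a newline, matching the byte sequence described above.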
Practical Applications
Runtime Dispatch in Real Code
Production code uses CPUID at startup to choose the fastest available implementation:
; At program startup:
call detect_features
; Later, when calling a performance-critical function:
cmp BYTE [has_avx2], 1
je use_avx2_memcpy
cmp BYTE [has_sse4_2], 1
je use_sse42_memcpy
jmp use_scalar_memcpy
Feature-based selection of this kind appears in many performance-sensitive libraries: glibc chooses among its memcpy, strlen, and strcmp variants based on CPUID, and OpenSSL does the same for AES.
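The dispatch fragment above calls detect_features without showing it. The following is one possible sketch, reusing the has_sse4_2 and has_avx2 names from that fragment; everything else is an assumption of this example. It also adds a step the case-study program omits: before reporting AVX2 as usable, it checks that the operating system has enabled YMM state saving (the OSXSAVE bit plus XCR0 read through XGETBV), since the CPUID feature bit alone describes only the hardware.

```nasm
; detect_features sketch: fills has_sse4_2 and has_avx2 (0 or 1 each)
section .bss
has_sse4_2  resb 1
has_avx2    resb 1              ; stays 0 unless every check below passes

section .text
detect_features:
    push rbx                    ; CPUID overwrites RBX, which is callee-saved

    mov eax, 1
    cpuid
    bt  ecx, 20                 ; SSE4.2
    setc byte [rel has_sse4_2]

    ; AVX2 needs OS support: leaf-1 ECX bit 27 (OSXSAVE) and bit 28 (AVX)
    and ecx, (1 << 27) | (1 << 28)
    cmp ecx, (1 << 27) | (1 << 28)
    jne .done

    ; XCR0 bits 1 (SSE state) and 2 (AVX state) must both be enabled
    xor ecx, ecx                ; select XCR0
    xgetbv                      ; result in EDX:EAX
    and eax, 0b110
    cmp eax, 0b110
    jne .done

    ; Leaf 7 must exist before its flags mean anything
    xor eax, eax
    cpuid
    cmp eax, 7
    jb  .done
    mov eax, 7
    xor ecx, ecx                ; sub-leaf 0
    cpuid
    bt  ebx, 5                  ; AVX2
    setc byte [rel has_avx2]
.done:
    pop rbx
    ret
```

XGETBV is executed only after OSXSAVE has been confirmed, because the instruction faults when that bit is clear. AES-NI and SSE4.2 operate on XMM registers, whose state every x86-64 operating system already saves, so they need no equivalent check.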
IFUNC (Indirect Function)
On Linux, the GNU toolchain supports STT_GNU_IFUNC, an ELF symbol type for indirect functions. When the dynamic linker binds such a symbol, it runs a resolver function and uses whatever address the resolver returns. This is how glibc's memcpy automatically uses AVX-512 on systems that support it and falls back to SSE2 on older CPUs. Once the symbol is resolved, each call is an ordinary indirect call with no per-call feature test.
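IFUNC itself is set up by the toolchain and the dynamic linker, but the underlying idea can be approximated by hand with a function pointer that is resolved once. The sketch below is an analogue for illustration, not glibc's mechanism. The names memcpy_impl, init_dispatch, and do_copy are hypothetical, and the three use_*_memcpy routines and the two has_* flags are assumed to be defined elsewhere, as in the dispatch fragment earlier in this section.

```nasm
; Hand-rolled analogue of IFUNC (a sketch under the assumptions above)
extern use_scalar_memcpy, use_sse42_memcpy, use_avx2_memcpy
extern has_sse4_2, has_avx2     ; byte flags set by feature detection

section .bss
memcpy_impl resq 1              ; address of the chosen routine

section .text
init_dispatch:                  ; call once, after feature detection
    lea rax, [rel use_scalar_memcpy]   ; default choice
    lea rdx, [rel use_sse42_memcpy]
    cmp byte [rel has_sse4_2], 1
    cmove rax, rdx              ; upgrade to SSE4.2 if available
    lea rdx, [rel use_avx2_memcpy]
    cmp byte [rel has_avx2], 1
    cmove rax, rdx              ; upgrade to AVX2 if available
    mov [rel memcpy_impl], rax
    ret

do_copy:                        ; hypothetical call site
    jmp qword [rel memcpy_impl] ; tail-jump: arguments pass through unchanged
```

After init_dispatch runs, every call through do_copy costs one indirect jump. The feature flags are consulted once, not on every call.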
What to Build Next
The CPUID check for AES-NI support is the first step in the XOR→AES-NI encryption tool introduced in Chapter 13. Before writing any AES encryption code, you check:
; Is AES-NI available?
mov eax, 1
cpuid
test ecx, (1 << 25) ; AES-NI bit
jz .no_aesni ; fall back to software AES
If jz is not taken, you can safely use the AESENC, AESENCLAST, AESDEC, AESDECLAST, and AESKEYGENASSIST instructions that perform AES rounds in hardware.
This case study gives you the complete infrastructure for that detection.