Appendix K: ASCII, Encoding, and Number Reference
ASCII Table (0–127)
Control Characters (0–31)
| Dec | Hex | Symbol | Name |
| --- | --- | --- | --- |
| 0 | 0x00 | NUL | Null (string terminator in C) |
| 1 | 0x01 | SOH | Start of Heading |
| 2 | 0x02 | STX | Start of Text |
| 3 | 0x03 | ETX | End of Text |
| 4 | 0x04 | EOT | End of Transmission |
| 5 | 0x05 | ENQ | Enquiry |
| 6 | 0x06 | ACK | Acknowledge |
| 7 | 0x07 | BEL | Bell (audible alert) |
| 8 | 0x08 | BS | Backspace |
| 9 | 0x09 | HT | Horizontal Tab |
| 10 | 0x0A | LF | Line Feed (Unix newline \n) |
| 11 | 0x0B | VT | Vertical Tab |
| 12 | 0x0C | FF | Form Feed |
| 13 | 0x0D | CR | Carriage Return (\r; Windows line endings are \r\n) |
| 14 | 0x0E | SO | Shift Out |
| 15 | 0x0F | SI | Shift In |
| 16 | 0x10 | DLE | Data Link Escape |
| 17 | 0x11 | DC1 | Device Control 1 (XON) |
| 18 | 0x12 | DC2 | Device Control 2 |
| 19 | 0x13 | DC3 | Device Control 3 (XOFF) |
| 20 | 0x14 | DC4 | Device Control 4 |
| 21 | 0x15 | NAK | Negative Acknowledge |
| 22 | 0x16 | SYN | Synchronous Idle |
| 23 | 0x17 | ETB | End of Transmission Block |
| 24 | 0x18 | CAN | Cancel |
| 25 | 0x19 | EM | End of Medium |
| 26 | 0x1A | SUB | Substitute |
| 27 | 0x1B | ESC | Escape (terminal escape sequences) |
| 28 | 0x1C | FS | File Separator |
| 29 | 0x1D | GS | Group Separator |
| 30 | 0x1E | RS | Record Separator |
| 31 | 0x1F | US | Unit Separator |
Printable Characters (32–126) and DEL (127)
| Dec | Hex | Char | | Dec | Hex | Char | | Dec | Hex | Char | | Dec | Hex | Char |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 32 | 0x20 | (space) | | 56 | 0x38 | 8 | | 80 | 0x50 | P | | 104 | 0x68 | h |
| 33 | 0x21 | ! | | 57 | 0x39 | 9 | | 81 | 0x51 | Q | | 105 | 0x69 | i |
| 34 | 0x22 | " | | 58 | 0x3A | : | | 82 | 0x52 | R | | 106 | 0x6A | j |
| 35 | 0x23 | # | | 59 | 0x3B | ; | | 83 | 0x53 | S | | 107 | 0x6B | k |
| 36 | 0x24 | $ | | 60 | 0x3C | < | | 84 | 0x54 | T | | 108 | 0x6C | l |
| 37 | 0x25 | % | | 61 | 0x3D | = | | 85 | 0x55 | U | | 109 | 0x6D | m |
| 38 | 0x26 | & | | 62 | 0x3E | > | | 86 | 0x56 | V | | 110 | 0x6E | n |
| 39 | 0x27 | ' | | 63 | 0x3F | ? | | 87 | 0x57 | W | | 111 | 0x6F | o |
| 40 | 0x28 | ( | | 64 | 0x40 | @ | | 88 | 0x58 | X | | 112 | 0x70 | p |
| 41 | 0x29 | ) | | 65 | 0x41 | A | | 89 | 0x59 | Y | | 113 | 0x71 | q |
| 42 | 0x2A | * | | 66 | 0x42 | B | | 90 | 0x5A | Z | | 114 | 0x72 | r |
| 43 | 0x2B | + | | 67 | 0x43 | C | | 91 | 0x5B | [ | | 115 | 0x73 | s |
| 44 | 0x2C | , | | 68 | 0x44 | D | | 92 | 0x5C | \ | | 116 | 0x74 | t |
| 45 | 0x2D | - | | 69 | 0x45 | E | | 93 | 0x5D | ] | | 117 | 0x75 | u |
| 46 | 0x2E | . | | 70 | 0x46 | F | | 94 | 0x5E | ^ | | 118 | 0x76 | v |
| 47 | 0x2F | / | | 71 | 0x47 | G | | 95 | 0x5F | _ | | 119 | 0x77 | w |
| 48 | 0x30 | 0 | | 72 | 0x48 | H | | 96 | 0x60 | ` | | 120 | 0x78 | x |
| 49 | 0x31 | 1 | | 73 | 0x49 | I | | 97 | 0x61 | a | | 121 | 0x79 | y |
| 50 | 0x32 | 2 | | 74 | 0x4A | J | | 98 | 0x62 | b | | 122 | 0x7A | z |
| 51 | 0x33 | 3 | | 75 | 0x4B | K | | 99 | 0x63 | c | | 123 | 0x7B | { |
| 52 | 0x34 | 4 | | 76 | 0x4C | L | | 100 | 0x64 | d | | 124 | 0x7C | \| |
| 53 | 0x35 | 5 | | 77 | 0x4D | M | | 101 | 0x65 | e | | 125 | 0x7D | } |
| 54 | 0x36 | 6 | | 78 | 0x4E | N | | 102 | 0x66 | f | | 126 | 0x7E | ~ |
| 55 | 0x37 | 7 | | 79 | 0x4F | O | | 103 | 0x67 | g | | 127 | 0x7F | DEL |
Key ASCII Ranges (useful for validation code)
| Range | Values | Characters |
| --- | --- | --- |
| Digits | 0x30–0x39 | 0–9 |
| Uppercase | 0x41–0x5A | A–Z |
| Lowercase | 0x61–0x7A | a–z |
| Uppercase → lowercase | add 0x20 | A (0x41) → a (0x61) |
| Lowercase → uppercase | subtract 0x20 | or clear bit 5: AND 0xDF |
| Digit → value | subtract 0x30 | '7' (0x37) - 0x30 = 7 |
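The range checks and conversions in the table map directly onto code. The following is a minimal Python sketch rather than a library API: the function names are made up for illustration, and each function takes a single byte value (an integer 0–255).

```python
def is_ascii_digit(b: int) -> bool:
    """True if b lies in 0x30..0x39 ('0'..'9')."""
    return 0x30 <= b <= 0x39


def to_lower_ascii(b: int) -> int:
    """Set bit 5 (same as adding 0x20) for A-Z; other bytes are unchanged."""
    return b | 0x20 if 0x41 <= b <= 0x5A else b


def to_upper_ascii(b: int) -> int:
    """Clear bit 5 (AND 0xDF) for a-z; other bytes are unchanged."""
    return b & 0xDF if 0x61 <= b <= 0x7A else b


def digit_value(b: int) -> int:
    """Convert an ASCII digit byte to its numeric value by subtracting 0x30."""
    if not is_ascii_digit(b):
        raise ValueError(f"not an ASCII digit: {b:#04x}")
    return b - 0x30
```

The range guard matters: applying the bit trick to a byte outside A–Z or a–z would silently change punctuation (for example, `[` and `{` differ only in bit 5).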
UTF-8 Encoding
UTF-8 is the dominant text encoding on the internet and in Linux systems. It is a variable-width encoding of Unicode code points.
Encoding Rules
| Code point range | Byte sequence |
| --- | --- |
| U+0000 to U+007F (ASCII) | 0xxxxxxx (1 byte) |
| U+0080 to U+07FF | 110xxxxx 10xxxxxx (2 bytes) |
| U+0800 to U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx (3 bytes) |
| U+10000 to U+10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (4 bytes) |
The role of each byte is identified by its high bits:
- 0xxxxxxx: single-byte (ASCII range)
- 110xxxxx: start of 2-byte sequence
- 1110xxxx: start of 3-byte sequence
- 11110xxx: start of 4-byte sequence
- 10xxxxxx: continuation byte
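The bit patterns in this list can be checked with a mask and a compare. Below is a minimal Python sketch; the function name `utf8_byte_kind` and the returned labels are illustrative assumptions. It tests only the high-bit pattern, so it does not reject bytes that match a pattern but never occur in well-formed UTF-8 (such as 0xC0, 0xC1, or 0xF5 and above).

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte by its high bits (pattern check only)."""
    if b & 0x80 == 0x00:   # 0xxxxxxx
        return "ascii"
    if b & 0xC0 == 0x80:   # 10xxxxxx
        return "continuation"
    if b & 0xE0 == 0xC0:   # 110xxxxx
        return "lead2"
    if b & 0xF0 == 0xE0:   # 1110xxxx
        return "lead3"
    if b & 0xF8 == 0xF0:   # 11110xxx
        return "lead4"
    return "invalid"       # 11111xxx never appears in UTF-8


# Classify the three bytes of the euro sign (E2 82 AC).
kinds = [utf8_byte_kind(b) for b in bytes.fromhex("E282AC")]
```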
Examples
| Character | Code Point | UTF-8 Bytes |
| --- | --- | --- |
| A | U+0041 | 41 |
| € | U+20AC | E2 82 AC |
| © | U+00A9 | C2 A9 |
| 你 | U+4F60 | E4 BD A0 |
| 😀 | U+1F600 | F0 9F 98 80 |
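To see where these bytes come from, the encoding rules can be applied by hand. The following Python sketch uses a hypothetical helper, `encode_utf8`, written for illustration; in real code `chr(cp).encode("utf-8")` does the same job. It also rejects the surrogate range U+D800 to U+DFFF, which is not encodable in UTF-8.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by shifting and masking."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check every row of the table against Python's built-in encoder.
for cp in (0x41, 0x20AC, 0xA9, 0x4F60, 0x1F600):
    assert encode_utf8(cp) == chr(cp).encode("utf-8")
```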
Assembly: Testing for ASCII vs. Multi-byte UTF-8
; Test if a byte is ASCII (single-byte UTF-8):
test al, 0x80 ; bit 7 clear means ASCII
jz .is_ascii
; Test if a byte is a UTF-8 continuation byte:
and al, 0xC0 ; keep the top two bits (overwrites AL; copy the byte first if it is still needed)
cmp al, 0x80 ; 10xxxxxx = continuation
je .is_continuation
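The continuation-byte test is enough to count characters: every code point contributes exactly one byte that is not a continuation byte. A short Python sketch of that idea follows. The function name `count_code_points` is illustrative, and the input is assumed to be well-formed UTF-8.

```python
def count_code_points(data: bytes) -> int:
    """Count code points by skipping 10xxxxxx continuation bytes.

    Assumes `data` is well-formed UTF-8; no validation is performed.
    """
    return sum(1 for b in data if b & 0xC0 != 0x80)


sample = "A€😀".encode("utf-8")   # 1 + 3 + 4 = 8 bytes
assert len(sample) == 8
assert count_code_points(sample) == 3
```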
Number Base Conversion Reference
Powers of 2
| Power | Value | Hex | Notes |
| --- | --- | --- | --- |
| 2^0 | 1 | 0x1 | |
| 2^1 | 2 | 0x2 | |
| 2^2 | 4 | 0x4 | |
| 2^3 | 8 | 0x8 | |
| 2^4 | 16 | 0x10 | Values per hex digit |
| 2^5 | 32 | 0x20 | |
| 2^6 | 64 | 0x40 | |
| 2^7 | 128 | 0x80 | High bit of byte |
| 2^8 | 256 | 0x100 | Values per byte (0xFF + 1) |
| 2^9 | 512 | 0x200 | |
| 2^10 | 1,024 | 0x400 | 1 KiB |
| 2^11 | 2,048 | 0x800 | |
| 2^12 | 4,096 | 0x1000 | 1 page (4 KiB) |
| 2^13 | 8,192 | 0x2000 | |
| 2^14 | 16,384 | 0x4000 | |
| 2^15 | 32,768 | 0x8000 | High bit of 16-bit |
| 2^16 | 65,536 | 0x10000 | 64 KiB |
| 2^20 | 1,048,576 | 0x100000 | 1 MiB |
| 2^21 | 2,097,152 | 0x200000 | 2 MiB page |
| 2^30 | 1,073,741,824 | 0x40000000 | 1 GiB |
| 2^31 | 2,147,483,648 | 0x80000000 | High bit of 32-bit |
| 2^32 | 4,294,967,296 | 0x100000000 | 4 GiB |
| 2^40 | 1,099,511,627,776 | 0x10000000000 | 1 TiB |
| 2^48 | 281,474,976,710,656 | 0x1000000000000 | Size of a 48-bit virtual address space (256 TiB) |
| 2^63 | 9,223,372,036,854,775,808 | 0x8000000000000000 | High bit of 64-bit |
Hexadecimal Quick Reference
| Hex | Binary | Dec | | Hex | Binary | Dec |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0000 | 0 | | 8 | 1000 | 8 |
| 1 | 0001 | 1 | | 9 | 1001 | 9 |
| 2 | 0010 | 2 | | A | 1010 | 10 |
| 3 | 0011 | 3 | | B | 1011 | 11 |
| 4 | 0100 | 4 | | C | 1100 | 12 |
| 5 | 0101 | 5 | | D | 1101 | 13 |
| 6 | 0110 | 6 | | E | 1110 | 14 |
| 7 | 0111 | 7 | | F | 1111 | 15 |
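Because each hex digit corresponds to exactly four bits, hex-to-binary conversion is a digit-by-digit lookup. A small Python sketch, using an illustrative helper name `hex_to_binary`:

```python
def hex_to_binary(hex_digits: str) -> str:
    """Expand each hex digit into its 4-bit group, separated by spaces."""
    return " ".join(format(int(d, 16), "04b") for d in hex_digits)


print(hex_to_binary("4F60"))  # 0100 1111 0110 0000
```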
Two's Complement Quick Reference
For an N-bit signed integer:
- Range: -2^(N-1) to +2^(N-1) - 1
- Negative number: invert all bits, add 1
| N | Min (signed) | Max (signed) | Max (unsigned) |
| --- | --- | --- | --- |
| 8 | -128 (0x80) | 127 (0x7F) | 255 (0xFF) |
| 16 | -32,768 (0x8000) | 32,767 (0x7FFF) | 65,535 (0xFFFF) |
| 32 | -2,147,483,648 (0x80000000) | 2,147,483,647 (0x7FFFFFFF) | 4,294,967,295 (0xFFFFFFFF) |
| 64 | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807 | 18,446,744,073,709,551,615 |
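Python integers are unbounded, so reinterpreting a raw N-bit pattern as signed (or the reverse) has to be done explicitly. The sketch below shows one way; the helper names `to_signed`, `to_unsigned`, and `negate` are illustrative, not from any library.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: subtract 2^bits
        return value - (1 << bits)
    return value


def to_unsigned(value: int, bits: int) -> int:
    """Return the N-bit pattern that represents value."""
    return value & ((1 << bits) - 1)


def negate(value: int, bits: int) -> int:
    """Two's complement negation: invert all bits, add 1, wrap to N bits."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


assert to_signed(0x80, 8) == -128
assert to_unsigned(-1, 32) == 0xFFFFFFFF
assert negate(1, 8) == 0xFF            # -1 as an 8-bit pattern
```

Note that `negate(0x80, 8)` returns 0x80 again: the minimum value has no positive counterpart in N bits.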
Common Small Negatives in Hex (32-bit and 64-bit)
| Decimal | 32-bit Hex | 64-bit Hex |
| --- | --- | --- |
| -1 | 0xFFFFFFFF | 0xFFFFFFFFFFFFFFFF |
| -2 | 0xFFFFFFFE | 0xFFFFFFFFFFFFFFFE |
| -4 | 0xFFFFFFFC | 0xFFFFFFFFFFFFFFFC |
| -8 | 0xFFFFFFF8 | 0xFFFFFFFFFFFFFFF8 |
| -16 | 0xFFFFFFF0 | 0xFFFFFFFFFFFFFFF0 |
| -128 | 0xFFFFFF80 | 0xFFFFFFFFFFFFFF80 |
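When a Linux system call returns a value in the range 0xFFFFFFFFFFFFF001 to 0xFFFFFFFFFFFFFFFF (-4095 to -1), it is a negated errno: 0xFFFFFFFFFFFFFFF4, for example, is -12 = -ENOMEM. A value like 0xFFFFFFFFFFFFF000 is -4096 = -0x1000, which falls just outside that error range, so it is not an error code. Values outside the range are ordinary results, such as an address returned by mmap.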
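On Linux, raw system call return values from -4095 to -1 are negated errno values, so a register dump can be decoded mechanically. The following is a sketch under that assumption: the constant `MAX_ERRNO` and the function `decode_syscall_return` are illustrative names, and the symbolic name comes from Python's standard `errno.errorcode` table for the platform the script runs on.

```python
import errno

MAX_ERRNO = 4095  # assumption: Linux reserves -4095..-1 for error returns


def decode_syscall_return(raw: int):
    """Classify a raw 64-bit syscall return value.

    Returns ("error", name, code) for a negated errno,
    or ("ok", raw) for any other value.
    """
    raw &= 0xFFFFFFFFFFFFFFFF
    signed = raw - (1 << 64) if raw >> 63 else raw   # two's complement view
    if -MAX_ERRNO <= signed <= -1:
        code = -signed
        return ("error", errno.errorcode.get(code, "unknown"), code)
    return ("ok", raw)


result = decode_syscall_return(0xFFFFFFFFFFFFFFF4)   # -12
assert result[0] == "error" and result[2] == errno.ENOMEM
assert decode_syscall_return(0xFFFFFFFFFFFFF000)[0] == "ok"   # -4096: outside the range
```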
Byte Order (Endianness)
Little-Endian (x86-64, ARM64, RISC-V default)
The least significant byte is stored at the lowest address. The value 0x0000000000401160 in memory (as stored by x86-64):
Address:  0x7fff0000  0x7fff0001  0x7fff0002  0x7fff0003  0x7fff0004  0x7fff0005  0x7fff0006  0x7fff0007
Value:    0x60        0x11        0x40        0x00        0x00        0x00        0x00        0x00
Reading left to right: 60 11 40 00 00 00 00 00 is 0x0000000000401160 in little-endian.
Big-Endian (network byte order, some MIPS/SPARC configurations)
The most significant byte is stored at the lowest address. The same value 0x0000000000401160 big-endian:
Address: 0x7fff0000 0x7fff0001 ... 0x7fff0007
Value: 0x00 0x00 ... 0x60
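Both layouts can be reproduced with Python's built-in `int.to_bytes`, which makes it easy to check a memory dump by hand. A minimal sketch (the variable names are arbitrary; `bytes.hex` with a separator needs Python 3.8 or later):

```python
value = 0x0000000000401160

little = value.to_bytes(8, "little")   # least significant byte first
big = value.to_bytes(8, "big")         # most significant byte first

print(little.hex(" "))  # 60 11 40 00 00 00 00 00
print(big.hex(" "))     # 00 00 00 00 00 40 11 60

# Reading the bytes back with the matching order recovers the value.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```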
Conversion in Assembly (x86-64)
; Swap bytes of RAX (convert between endianness):
bswap rax ; reverse byte order of 64-bit register
bswap eax ; reverse byte order of 32-bit register (zeroes the upper 32 bits of RAX)
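To predict what `bswap` will produce without running it, the instruction can be modeled in Python by writing the value out in one byte order and reading it back in the other. The helper names `bswap64` and `bswap32` below are illustrative only.

```python
def bswap64(x: int) -> int:
    """Model of `bswap rax`: reverse the 8 bytes of a 64-bit value."""
    return int.from_bytes((x & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little"), "big")


def bswap32(x: int) -> int:
    """Model of `bswap eax`: reverse the low 4 bytes; the result is
    zero-extended, matching the cleared upper half of RAX."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


assert bswap64(0x401160) == 0x6011400000000000
assert bswap32(0x401160) == 0x60114000
assert bswap64(bswap64(0x401160)) == 0x401160   # swapping twice is the identity
```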
Python Packing Conventions
import struct
# Little-endian (x86-64, ARM64):
struct.pack('<Q', 0x401160) # b'\x60\x11\x40\x00\x00\x00\x00\x00'
struct.pack('<I', 0x401160) # b'\x60\x11\x40\x00'
# Big-endian (network):
struct.pack('>Q', 0x401160) # b'\x00\x00\x00\x00\x00\x40\x11\x60'
# pwntools (third-party package; install with: pip install pwntools):
from pwn import p64, p32, u64, u32
p64(0x401160) # pack as little-endian 8 bytes
u64(b'\x60\x11\x40\x00' + b'\x00' * 4) # unpack 8 bytes back to 0x401160
IEEE 754 Floating-Point Quick Reference
Single Precision (32-bit, float)
| Field | Bits | Description |
| --- | --- | --- |
| Sign | 1 (bit 31) | 0 = positive, 1 = negative |
| Exponent | 8 (bits 30-23) | Biased by 127 |
| Mantissa | 23 (bits 22-0) | Fractional part (implicit leading 1) |
Special values:
- 0x7F800000 = +Infinity
- 0xFF800000 = -Infinity
- 0x7FC00000 = NaN (quiet)
- 0x00000000 = +0.0
- 0x3F800000 = 1.0
- 0x40000000 = 2.0
- 0x3F000000 = 0.5
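These bit patterns can be taken apart with shifts and masks that follow the field layout above. The Python sketch below uses the standard `struct` module to reinterpret the bits as a float; the function name `decode_float32` is an illustrative choice.

```python
import struct


def decode_float32(bits: int):
    """Split a 32-bit pattern into (sign, biased exponent, mantissa, value)."""
    bits &= 0xFFFFFFFF
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30-23, biased by 127
    mantissa = bits & 0x7FFFFF         # bits 22-0
    value = struct.unpack("<f", struct.pack("<I", bits))[0]
    return sign, exponent, mantissa, value


assert decode_float32(0x3F800000) == (0, 127, 0, 1.0)   # exponent 127 - 127 = 0
assert decode_float32(0x40000000) == (0, 128, 0, 2.0)   # exponent 128 - 127 = 1
assert decode_float32(0x3F000000) == (0, 126, 0, 0.5)   # exponent 126 - 127 = -1
```

An exponent field of 255 marks the special values: a zero mantissa means infinity, and a nonzero mantissa means NaN.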
Double Precision (64-bit, double)
| Field | Bits | Description |
| --- | --- | --- |
| Sign | 1 (bit 63) | 0 = positive, 1 = negative |
| Exponent | 11 (bits 62-52) | Biased by 1023 |
| Mantissa | 52 (bits 51-0) | Fractional part (implicit leading 1) |
Special values:
- 0x7FF0000000000000 = +Infinity
- 0x3FF0000000000000 = 1.0
- 0x4000000000000000 = 2.0
- 0x3FE0000000000000 = 0.5
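The same reinterpretation works for doubles by packing and unpacking 8 bytes with `struct`. The helper names `bits_to_double` and `double_to_bits` are illustrative; the sketch simply confirms the values listed above.

```python
import struct


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits & 0xFFFFFFFFFFFFFFFF))[0]


def double_to_bits(x: float) -> int:
    """Return the 64-bit integer pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


assert bits_to_double(0x3FF0000000000000) == 1.0
assert double_to_bits(2.0) == 0x4000000000000000
assert double_to_bits(0.5) == 0x3FE0000000000000
assert (double_to_bits(1.0) >> 52) & 0x7FF == 1023   # biased exponent of 1.0
```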
Assembly: Examining Floats
; Store the bit pattern of 1.0f in memory, then load it as a float:
; (this uses the red zone below RSP, which the System V ABI permits in user-space leaf code)
mov DWORD [rsp - 4], 0x3F800000 ; 1.0f as raw bits
movss xmm0, DWORD [rsp - 4] ; load as float
; Inspect the value and its raw bit pattern in GDB:
; (gdb) p/f $xmm0.v4_float[0] shows as 1.0
; (gdb) p/x $xmm0.v4_int32[0] shows as 0x3f800000