Appendix C: ASCII and Character Set Reference
This appendix provides a complete ASCII table, an overview of extended character sets and code pages, an introduction to Unicode and UTF-8 as they apply to Free Pascal, and a reference for Pascal's built-in character classification and conversion functions.
C.1 The ASCII Table (0-127)
ASCII (American Standard Code for Information Interchange) defines 128 characters using 7 bits. The first 32 codes (0-31) and code 127 are control characters; the rest are printable.
Control Characters (0-31 and 127)
| Dec | Hex | Char | Description | Pascal Escape |
|---|---|---|---|---|
| 0 | 00 | NUL | Null | #0 |
| 1 | 01 | SOH | Start of Heading | #1 |
| 2 | 02 | STX | Start of Text | #2 |
| 3 | 03 | ETX | End of Text | #3 |
| 4 | 04 | EOT | End of Transmission | #4 |
| 5 | 05 | ENQ | Enquiry | #5 |
| 6 | 06 | ACK | Acknowledge | #6 |
| 7 | 07 | BEL | Bell (beep) | #7 |
| 8 | 08 | BS | Backspace | #8 |
| 9 | 09 | HT | Horizontal Tab | #9 |
| 10 | 0A | LF | Line Feed | #10 |
| 11 | 0B | VT | Vertical Tab | #11 |
| 12 | 0C | FF | Form Feed | #12 |
| 13 | 0D | CR | Carriage Return | #13 |
| 14 | 0E | SO | Shift Out | #14 |
| 15 | 0F | SI | Shift In | #15 |
| 16 | 10 | DLE | Data Link Escape | #16 |
| 17 | 11 | DC1 | Device Control 1 (XON) | #17 |
| 18 | 12 | DC2 | Device Control 2 | #18 |
| 19 | 13 | DC3 | Device Control 3 (XOFF) | #19 |
| 20 | 14 | DC4 | Device Control 4 | #20 |
| 21 | 15 | NAK | Negative Acknowledge | #21 |
| 22 | 16 | SYN | Synchronous Idle | #22 |
| 23 | 17 | ETB | End of Transmission Block | #23 |
| 24 | 18 | CAN | Cancel | #24 |
| 25 | 19 | EM | End of Medium | #25 |
| 26 | 1A | SUB | Substitute (EOF on Windows) | #26 |
| 27 | 1B | ESC | Escape | #27 |
| 28 | 1C | FS | File Separator | #28 |
| 29 | 1D | GS | Group Separator | #29 |
| 30 | 1E | RS | Record Separator | #30 |
| 31 | 1F | US | Unit Separator | #31 |
| 127 | 7F | DEL | Delete | #127 |
The most important control characters for everyday programming are:
- LF (10): Line feed. The newline character on Unix/Linux/macOS.
- CR (13): Carriage return. Together, CR+LF (13+10) form the newline on Windows.
- HT (9): Horizontal tab.
- NUL (0): Null character. Terminates PChar strings.
- BEL (7): Produces an audible beep on many terminals.
- ESC (27): Used to begin ANSI escape sequences for terminal colors and cursor control.
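Because the newline convention differs between platforms, text that crosses systems often needs its line endings normalized. The following is a minimal sketch, not a library routine: `NormalizeToLF` is a hypothetical helper name, and it assumes the input fits comfortably in memory as a single string.

```pascal
program NormalizeEol;
{$mode objfpc}{$H+}

{ Hypothetical helper: converts CR+LF pairs and lone CR characters to LF. }
function NormalizeToLF(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] = #13 then
    begin
      Result := Result + #10;
      { Skip the LF of a CR+LF pair so it is not emitted twice }
      if (I < Length(S)) and (S[I + 1] = #10) then
        Inc(I);
    end
    else
      Result := Result + S[I];
    Inc(I);
  end;
end;

begin
  { Windows-style input: two lines, each terminated by CR+LF }
  WriteLn(Length(NormalizeToLF('one'#13#10'two'#13#10)));  { 8 }
end.
```

Appending one character at a time keeps the sketch short; for large inputs a preallocated buffer would likely be faster.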
Printable Characters (32-126)
| Dec | Hex | Char | Dec | Hex | Char | Dec | Hex | Char |
|---|---|---|---|---|---|---|---|---|
| 32 | 20 | (space) | 64 | 40 | @ | 96 | 60 | ` |
| 33 | 21 | ! | 65 | 41 | A | 97 | 61 | a |
| 34 | 22 | " | 66 | 42 | B | 98 | 62 | b |
| 35 | 23 | # | 67 | 43 | C | 99 | 63 | c |
| 36 | 24 | $ | 68 | 44 | D | 100 | 64 | d |
| 37 | 25 | % | 69 | 45 | E | 101 | 65 | e |
| 38 | 26 | & | 70 | 46 | F | 102 | 66 | f |
| 39 | 27 | ' | 71 | 47 | G | 103 | 67 | g |
| 40 | 28 | ( | 72 | 48 | H | 104 | 68 | h |
| 41 | 29 | ) | 73 | 49 | I | 105 | 69 | i |
| 42 | 2A | * | 74 | 4A | J | 106 | 6A | j |
| 43 | 2B | + | 75 | 4B | K | 107 | 6B | k |
| 44 | 2C | , | 76 | 4C | L | 108 | 6C | l |
| 45 | 2D | - | 77 | 4D | M | 109 | 6D | m |
| 46 | 2E | . | 78 | 4E | N | 110 | 6E | n |
| 47 | 2F | / | 79 | 4F | O | 111 | 6F | o |
| 48 | 30 | 0 | 80 | 50 | P | 112 | 70 | p |
| 49 | 31 | 1 | 81 | 51 | Q | 113 | 71 | q |
| 50 | 32 | 2 | 82 | 52 | R | 114 | 72 | r |
| 51 | 33 | 3 | 83 | 53 | S | 115 | 73 | s |
| 52 | 34 | 4 | 84 | 54 | T | 116 | 74 | t |
| 53 | 35 | 5 | 85 | 55 | U | 117 | 75 | u |
| 54 | 36 | 6 | 86 | 56 | V | 118 | 76 | v |
| 55 | 37 | 7 | 87 | 57 | W | 119 | 77 | w |
| 56 | 38 | 8 | 88 | 58 | X | 120 | 78 | x |
| 57 | 39 | 9 | 89 | 59 | Y | 121 | 79 | y |
| 58 | 3A | : | 90 | 5A | Z | 122 | 7A | z |
| 59 | 3B | ; | 91 | 5B | [ | 123 | 7B | { |
| 60 | 3C | < | 92 | 5C | \ | 124 | 7C | \| |
| 61 | 3D | = | 93 | 5D | ] | 125 | 7D | } |
| 62 | 3E | > | 94 | 5E | ^ | 126 | 7E | ~ |
| 63 | 3F | ? | 95 | 5F | _ | | | |
Key Ranges to Remember
| Range (Dec) | Range (Hex) | Characters |
|---|---|---|
| 48-57 | 30-39 | Digits 0-9 |
| 65-90 | 41-5A | Uppercase A-Z |
| 97-122 | 61-7A | Lowercase a-z |
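For ASCII letters, each lowercase code is exactly 32 higher than its uppercase counterpart, and that difference is bit 5 (value $20). You can therefore convert case by adding or subtracting 32, or by setting or clearing bit 5. Both shortcuts are valid only when the character is already known to be a letter in A-Z or a-z: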
UpperChar := Chr(Ord(LowerChar) - 32); { 'a' -> 'A' }
LowerChar := Chr(Ord(UpperChar) + 32); { 'A' -> 'a' }
{ Or using bitwise operations: }
UpperChar := Chr(Ord(Ch) and $DF); { Clear bit 5 }
LowerChar := Chr(Ord(Ch) or $20); { Set bit 5 }
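The arithmetic above silently corrupts characters that are not letters, so in practice the conversion is usually guarded by a range check. The sketch below shows one way to do that; `ToUpperAscii` and `ToLowerAscii` are hypothetical names chosen for illustration, not RTL functions.

```pascal
program SafeCase;
{$mode objfpc}{$H+}

{ Hypothetical helpers: apply the bit-5 trick only to ASCII letters. }
function ToUpperAscii(Ch: Char): Char;
begin
  if Ch in ['a'..'z'] then
    Result := Chr(Ord(Ch) and $DF)   { clear bit 5 }
  else
    Result := Ch;                    { leave everything else untouched }
end;

function ToLowerAscii(Ch: Char): Char;
begin
  if Ch in ['A'..'Z'] then
    Result := Chr(Ord(Ch) or $20)    { set bit 5 }
  else
    Result := Ch;
end;

begin
  WriteLn(ToUpperAscii('a'));  { A }
  WriteLn(ToLowerAscii('Q'));  { q }
  WriteLn(ToUpperAscii('{'));  { unchanged; the unguarded mask would yield '[' }
end.
```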
Using ASCII Values in Pascal
var
  Ch: Char;
  DigitValue: Integer;
begin
  Ch := #65;          { 'A' via decimal ASCII code }
  Ch := #$41;         { 'A' via hex ASCII code }
  Ch := Chr(65);      { 'A' via Chr function }
  WriteLn(Ord('A'));  { 65: get the ASCII code of a character }
  WriteLn(Ord('0'));  { 48 }
  { Common idiom: convert a digit character to its numeric value }
  Ch := '7';
  DigitValue := Ord(Ch) - Ord('0');  { '7' -> 7 }
  WriteLn(DigitValue);               { 7 }
end.
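The digit idiom extends naturally to whole numbers: multiply the running total by 10 and add the value of each digit. The following is a sketch under stated assumptions; `TryParseDecimal` is a hypothetical helper, it accepts only unsigned input, and it does not check for overflow.

```pascal
program ParseDigits;
{$mode objfpc}{$H+}

{ Hypothetical helper: parses an unsigned decimal string.
  Returns False if S is empty or contains a non-digit.
  Overflow is not checked in this sketch. }
function TryParseDecimal(const S: string; out Value: Integer): Boolean;
var
  I: Integer;
begin
  Value := 0;
  Result := S <> '';
  for I := 1 to Length(S) do
  begin
    if not (S[I] in ['0'..'9']) then
      Exit(False);
    Value := Value * 10 + (Ord(S[I]) - Ord('0'));
  end;
end;

var
  N: Integer;
begin
  if TryParseDecimal('2024', N) then
    WriteLn(N);                        { 2024 }
  WriteLn(TryParseDecimal('12a', N));  { FALSE }
end.
```

For production code, the `Val` procedure or `StrToIntDef` from SysUtils would normally be preferred; the sketch only makes the underlying arithmetic visible.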
C.2 Extended ASCII and Code Pages
Standard ASCII only defines 128 characters, which is sufficient for English text but inadequate for other languages. The values 128-255 are undefined by the ASCII standard itself. Various code pages assign different characters to this range.
Common Code Pages
| Code Page | Name | Region / Use |
|---|---|---|
| 437 | OEM United States | Original IBM PC. Box-drawing characters, accented letters. |
| 850 | OEM Multilingual Latin I | Western European languages on DOS. |
| 1250 | Windows Central European | Polish, Czech, Hungarian, etc. |
| 1251 | Windows Cyrillic | Russian, Ukrainian, Bulgarian, etc. |
| 1252 | Windows Western | Western European languages on Windows. Extends ISO 8859-1 with printable characters in the 128-159 range. |
| 28591 | ISO 8859-1 (Latin-1) | Western European. Standard on older Unix systems. |
| 28592 | ISO 8859-2 (Latin-2) | Central European. |
| 65001 | UTF-8 | Unicode encoded as variable-width bytes. The modern standard. |
The problem with code pages is that a byte value like 200 means different characters in different code pages. A file written with code page 1251 (Cyrillic) looks like gibberish when read with code page 1252 (Western). This is why Unicode was created.
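To make the ambiguity concrete, the sketch below interprets byte 200 ($C8) under two code pages. It is illustrative only: both functions are hypothetical and cover just the upper range where each mapping is a simple offset (in Windows-1251, bytes $C0-$FF are the Cyrillic letters U+0410-U+044F; in Windows-1252, bytes $A0-$FF match ISO 8859-1). They are not complete code page tables.

```pascal
program SameByte;
{$mode objfpc}{$H+}

{ Windows-1251: bytes $C0..$FF are the Cyrillic letters U+0410..U+044F. }
function CP1251ToCodePoint(B: Byte): LongInt;
begin
  if B >= $C0 then
    Result := $0410 + (B - $C0)
  else
    Result := 0;  { not handled in this sketch }
end;

{ Windows-1252: bytes $A0..$FF match ISO 8859-1, i.e. U+00A0..U+00FF. }
function CP1252ToCodePoint(B: Byte): LongInt;
begin
  if B >= $A0 then
    Result := B
  else
    Result := 0;  { not handled in this sketch }
end;

begin
  { The same byte, read under each code page }
  WriteLn('CP1251: U+', HexStr(CP1251ToCodePoint(200), 4));  { U+0418 }
  WriteLn('CP1252: U+', HexStr(CP1252ToCodePoint(200), 4));  { U+00C8 }
end.
```

U+0418 is the Cyrillic capital letter I, while U+00C8 is Latin capital E with grave, which is why text decoded with the wrong code page turns into unrelated characters.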
Code Pages in Free Pascal
Free Pascal's AnsiString type is code-page aware in FPC 3.0+. Each AnsiString carries a code page identifier. You can set the system default code page:
uses
SysUtils;
begin
DefaultSystemCodePage := CP_UTF8; { 65001 }
end;
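When you assign between AnsiString variables whose declared code pages differ, FPC converts the contents automatically. On Unix-like systems the conversion relies on an installed widestring manager, which is typically provided by adding the cwstring unit to the program's uses clause.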
C.3 Unicode Basics and UTF-8 in Free Pascal
What Is Unicode?
Unicode is a universal character set that assigns a unique number (called a code point) to every character in every writing system, plus symbols, emoji, and technical characters. As of Unicode 15.1, there are over 149,000 assigned characters.
Code points are written in the form U+XXXX:
- U+0041 = Latin Capital Letter A
- U+00E9 = Latin Small Letter E with Acute (é)
- U+4E16 = CJK character (Chinese/Japanese/Korean)
- U+1F600 = Grinning Face emoji
UTF-8 Encoding
Unicode code points must be encoded into bytes for storage and transmission. UTF-8 is the dominant encoding today. It is a variable-width encoding:
| Byte Count | Code Point Range | Byte Pattern |
|---|---|---|
| 1 byte | U+0000 - U+007F | 0xxxxxxx |
| 2 bytes | U+0080 - U+07FF | 110xxxxx 10xxxxxx |
| 3 bytes | U+0800 - U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
| 4 bytes | U+10000 - U+10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
Key properties of UTF-8:
- All ASCII characters (0-127) are single bytes, identical to ASCII. Existing ASCII text is valid UTF-8.
- Non-ASCII characters use 2 to 4 bytes.
- Length(S) on a UTF-8 AnsiString returns the number of bytes, not the number of characters.
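The byte patterns in the table translate directly into shifts and masks. The following sketch encodes a single code point by hand; `EncodeCodePoint` is a hypothetical helper written for illustration, and it deliberately omits validation (surrogates and values above U+10FFFF are not rejected).

```pascal
program EncodeDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: encodes one code point as raw UTF-8 bytes.
  No validation is performed in this sketch. }
function EncodeCodePoint(CP: LongWord): RawByteString;
begin
  if CP <= $7F then
    Result := Chr(Byte(CP))
  else if CP <= $7FF then
    Result := Chr(Byte($C0 or (CP shr 6))) +
              Chr(Byte($80 or (CP and $3F)))
  else if CP <= $FFFF then
    Result := Chr(Byte($E0 or (CP shr 12))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)))
  else
    Result := Chr(Byte($F0 or (CP shr 18))) +
              Chr(Byte($80 or ((CP shr 12) and $3F))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)));
end;

{ Prints the bytes of S as two-digit hex values. }
procedure DumpBytes(const S: RawByteString);
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    Write(HexStr(Ord(S[I]), 2), ' ');
  WriteLn;
end;

begin
  DumpBytes(EncodeCodePoint($0041));   { 41 }
  DumpBytes(EncodeCodePoint($00E9));   { C3 A9 }
  DumpBytes(EncodeCodePoint($4E16));   { E4 B8 96 }
  DumpBytes(EncodeCodePoint($1F600));  { F0 9F 98 80 }
end.
```

The results are printed as hex rather than as text so that the output does not depend on the console's encoding.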
UTF-8 in Free Pascal
Free Pascal has strong UTF-8 support, especially through the LazUTF8 unit in Lazarus:
uses
  LazUTF8;
var
  S: String;
begin
  S := 'Hello, world!';
  WriteLn(Length(S));      { 13: byte count }
  WriteLn(UTF8Length(S));  { 13: character count (same for ASCII) }
  S := 'Café';             { the source file must be saved as UTF-8 }
  WriteLn(Length(S));      { 5: 'C','a','f' = 3 bytes, 'é' = 2 bytes }
  WriteLn(UTF8Length(S));  { 4: four visible characters }
end.
UnicodeString
Free Pascal also provides the UnicodeString type (UTF-16 encoded, similar to Delphi's String type in modern Delphi versions):
var
U: UnicodeString;
begin
U := 'Hello';
WriteLn(Length(U)); { 5 — number of UTF-16 code units }
end;
For most Free Pascal and Lazarus projects, AnsiString with UTF-8 encoding (the default in Lazarus) is the recommended approach. The Lazarus LCL is built around UTF-8 strings.
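One consequence of UTF-16 is worth keeping in mind: Length on a UnicodeString counts 16-bit code units, and a code point above U+FFFF occupies two of them (a surrogate pair). The sketch below computes that pair by hand; `ToSurrogates` is a hypothetical helper and assumes its input lies in the range U+10000 to U+10FFFF.

```pascal
program SurrogatePair;
{$mode objfpc}{$H+}

{ Hypothetical helper: splits a code point above U+FFFF into its UTF-16
  high and low surrogate code units. Assumes $10000 <= CP <= $10FFFF. }
procedure ToSurrogates(CP: LongInt; out HighUnit, LowUnit: Word);
var
  V: LongInt;
begin
  V := CP - $10000;                  { 20 significant bits remain }
  HighUnit := $D800 + (V shr 10);    { top 10 bits }
  LowUnit  := $DC00 + (V and $3FF);  { bottom 10 bits }
end;

var
  H, L: Word;
begin
  ToSurrogates($1F600, H, L);                { Grinning Face emoji }
  WriteLn(HexStr(H, 4), ' ', HexStr(L, 4));  { D83D DE00 }
end.
```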
Practical Tips
- Set the default code page to UTF-8 early in your program:
  DefaultSystemCodePage := CP_UTF8;
- Use UTF8Length, UTF8Copy, UTF8Pos (from LazUTF8) instead of Length, Copy, Pos when working with non-ASCII text.
- Save source files as UTF-8. Most modern editors do this by default. In Lazarus, go to Source > File Encoding and select UTF-8.
- Be careful with Char indexing. S[i] accesses the i-th byte, not the i-th character. For UTF-8 strings containing non-ASCII characters, iterate using UTF8CharStart or similar routines.
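To show what stepping by character rather than by byte involves, here is a minimal sketch that walks a UTF-8 string from lead byte to lead byte without using LazUTF8. `SequenceLength` and `CountCodePoints` are hypothetical helpers; the sketch assumes well-formed input and treats a stray continuation byte as a one-byte sequence so that the scan always advances.

```pascal
program Utf8Walk;
{$mode objfpc}{$H+}

{ Hypothetical helper: length in bytes of the sequence that starts with
  lead byte B, judged from its high bits. }
function SequenceLength(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;  { stray continuation byte: step over it }
end;

{ Counts code points by jumping from one lead byte to the next. }
function CountCodePoints(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Inc(I, SequenceLength(Ord(S[I])));
    Inc(Result);
  end;
end;

var
  S: RawByteString;
begin
  S := 'Caf'#$C3#$A9;           { 'Caf' plus the two UTF-8 bytes of U+00E9 }
  WriteLn(Length(S));           { 5 bytes }
  WriteLn(CountCodePoints(S));  { 4 code points }
end.
```

The string is built from explicit byte escapes so the result does not depend on how the source file itself is encoded.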
C.4 Character Classification Functions
Free Pascal provides functions for testing and converting characters. The Unicode-aware classification functions are class functions of TCharacter in the Character unit and take a UnicodeChar; the basic conversion functions are in the System and SysUtils units.
Classification Functions (Character unit)
| Function | Returns True When |
|---|---|
| TCharacter.IsLetter(Ch) | Ch is a letter (Unicode-aware) |
| TCharacter.IsDigit(Ch) | Ch is a decimal digit |
| TCharacter.IsLetterOrDigit(Ch) | Ch is a letter or digit |
| TCharacter.IsWhiteSpace(Ch) | Ch is space, tab, CR, LF, etc. |
| TCharacter.IsUpper(Ch) | Ch is an uppercase letter |
| TCharacter.IsLower(Ch) | Ch is a lowercase letter |
| TCharacter.IsPunctuation(Ch) | Ch is a punctuation character |
| TCharacter.IsControl(Ch) | Ch is a control character (such as 0-31 and 127) |
Manual Classification (Works Without SysUtils)
function IsAlpha(Ch: Char): Boolean;
begin
Result := Ch in ['A'..'Z', 'a'..'z'];
end;
function IsNumeric(Ch: Char): Boolean;
begin
Result := Ch in ['0'..'9'];
end;
function IsAlphaNumeric(Ch: Char): Boolean;
begin
Result := Ch in ['A'..'Z', 'a'..'z', '0'..'9'];
end;
function IsHexDigit(Ch: Char): Boolean;
begin
Result := Ch in ['0'..'9', 'A'..'F', 'a'..'f'];
end;
function IsSpace(Ch: Char): Boolean;
begin
Result := Ch in [' ', #9, #10, #13];
end;
function IsPrintable(Ch: Char): Boolean;
begin
Result := (Ord(Ch) >= 32) and (Ord(Ch) <= 126);
end;
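Set-based tests like these combine easily into larger checks. As a sketch, the hypothetical function `IsAsciiIdentifier` below accepts a string only if it starts with a letter or underscore and continues with letters, digits, or underscores; the exact rule is an assumption chosen for illustration.

```pascal
program IdentCheck;
{$mode objfpc}{$H+}

{ Hypothetical helper: True if S looks like an ASCII identifier, i.e. a
  letter or underscore followed by letters, digits, or underscores. }
function IsAsciiIdentifier(const S: string): Boolean;
var
  I: Integer;
begin
  { Short-circuit evaluation keeps S[1] from being read on an empty string }
  Result := (S <> '') and (S[1] in ['A'..'Z', 'a'..'z', '_']);
  if not Result then
    Exit;
  for I := 2 to Length(S) do
    if not (S[I] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) then
      Exit(False);
end;

begin
  WriteLn(IsAsciiIdentifier('line_count2'));  { TRUE }
  WriteLn(IsAsciiIdentifier('2fast'));        { FALSE }
  WriteLn(IsAsciiIdentifier(''));             { FALSE }
end.
```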
Conversion Functions
| Function | Description | Example |
|---|---|---|
| UpCase(Ch) | Uppercase a Char (ASCII only) | UpCase('a') = 'A' |
| LowerCase(Ch) | Lowercase a Char (ASCII only; System unit) | LowerCase('A') = 'a' |
| UpperCase(S) | Uppercase a String (SysUtils; ASCII letters only) | UpperCase('hello') = 'HELLO' |
| LowerCase(S) | Lowercase a String (SysUtils; ASCII letters only) | LowerCase('HELLO') = 'hello' |
| Ord(Ch) | Character to integer code | Ord('A') = 65 |
| Chr(N) | Integer code to character | Chr(65) = 'A' |
Locale-Aware Conversions
For non-ASCII characters, use the UTF8UpperCase and UTF8LowerCase functions from the LazUTF8 unit:
uses
LazUTF8;
var
S: String;
begin
S := UTF8UpperCase('café'); { 'CAFÉ': handles the accented é correctly }
S := UTF8LowerCase('STRASSE'); { 'strasse' — handles German characters }
end;
C.5 Common Character Constants in Pascal
For readability, you can define named constants for frequently used control characters:
const
NUL = #0;
TAB = #9;
LF = #10;
CR = #13;
SPACE = #32;
ESC = #27;
DEL = #127;
{ Line endings }
CRLF = #13#10; { Windows line ending }
{$IFDEF UNIX}
LineEnd = #10;
{$ELSE}
LineEnd = #13#10;
{$ENDIF}
Free Pascal provides the LineEnding constant in the System unit, which automatically resolves to the correct line-ending sequence for the target platform. Prefer LineEnding over hardcoded values for portable code.
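As a small illustration of preferring LineEnding, the sketch below joins several strings into one multi-line string. `JoinWithLineEnding` is a hypothetical helper, not part of the RTL.

```pascal
program JoinLines;
{$mode objfpc}{$H+}

{ Hypothetical helper: joins the items with the platform line ending. }
function JoinWithLineEnding(const Items: array of string): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Items) to High(Items) do
  begin
    { Separator goes between items, not after the last one }
    if I > Low(Items) then
      Result := Result + LineEnding;
    Result := Result + Items[I];
  end;
end;

begin
  WriteLn(JoinWithLineEnding(['first', 'second', 'third']));
end.
```

Because the separator comes from LineEnding, the same source produces LF-separated text on Unix-like targets and CR+LF-separated text on Windows.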
This appendix serves as a reference whenever you need to look up a character code, understand encoding behavior, or classify characters in your Pascal programs.