Appendix C: ASCII and Character Set Reference

This appendix provides a complete ASCII table, an overview of extended character sets and code pages, an introduction to Unicode and UTF-8 as they apply to Free Pascal, and a reference for Pascal's built-in character classification and conversion functions.


C.1 The ASCII Table (0-127)

ASCII (American Standard Code for Information Interchange) defines 128 characters using 7 bits. The first 32 codes (0-31) and code 127 are control characters; the rest are printable.

Control Characters (0-31 and 127)

Dec Hex Char Description Pascal Escape
0 00 NUL Null #0
1 01 SOH Start of Heading #1
2 02 STX Start of Text #2
3 03 ETX End of Text #3
4 04 EOT End of Transmission #4
5 05 ENQ Enquiry #5
6 06 ACK Acknowledge #6
7 07 BEL Bell (beep) #7
8 08 BS Backspace #8
9 09 HT Horizontal Tab #9
10 0A LF Line Feed #10
11 0B VT Vertical Tab #11
12 0C FF Form Feed #12
13 0D CR Carriage Return #13
14 0E SO Shift Out #14
15 0F SI Shift In #15
16 10 DLE Data Link Escape #16
17 11 DC1 Device Control 1 (XON) #17
18 12 DC2 Device Control 2 #18
19 13 DC3 Device Control 3 (XOFF) #19
20 14 DC4 Device Control 4 #20
21 15 NAK Negative Acknowledge #21
22 16 SYN Synchronous Idle #22
23 17 ETB End of Transmission Block #23
24 18 CAN Cancel #24
25 19 EM End of Medium #25
26 1A SUB Substitute (EOF on Windows) #26
27 1B ESC Escape #27
28 1C FS File Separator #28
29 1D GS Group Separator #29
30 1E RS Record Separator #30
31 1F US Unit Separator #31
127 7F DEL Delete #127

The most important control characters for everyday programming are:

  • LF (10): Line feed. The newline character on Unix/Linux/macOS.
  • CR (13): Carriage return. The two-byte sequence CR followed by LF (#13#10) is the newline on Windows.
  • HT (9): Horizontal tab.
  • NUL (0): Null character. Terminates PChar strings.
  • BEL (7): Produces an audible beep on many terminals.
  • ESC (27): Used to begin ANSI escape sequences for terminal colors and cursor control.
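As a small illustration of the ESC entry above, the sketch below wraps text in an ANSI SGR color sequence (ESC, '[', a numeric code, 'm') and then resets attributes with code 0. It assumes a terminal that interprets ANSI escapes, as most Unix terminals and recent Windows consoles do; the helper name Colorize is hypothetical.

```pascal
program AnsiColorDemo;

{$mode objfpc}{$H+}

const
  ESC = #27;  { starts every ANSI escape sequence }

{ Hypothetical helper: wrap S in an SGR color sequence.
  ColorCode 31 = red, 32 = green; code 0 resets all attributes. }
function Colorize(const S: string; ColorCode: Integer): string;
var
  CodeStr: string;
begin
  Str(ColorCode, CodeStr);  { integer to decimal text, System unit only }
  Result := ESC + '[' + CodeStr + 'm' + S + ESC + '[0m';
end;

begin
  WriteLn(Colorize('Error', 31));
  WriteLn(Colorize('OK', 32));
  { BEL (#7) asks the terminal to beep; many terminals ignore it }
  Write(#7);
end.
```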

Printable Characters (32-126)

Dec Hex Char Dec Hex Char Dec Hex Char
32 20 (space) 64 40 @ 96 60 `
33 21 ! 65 41 A 97 61 a
34 22 " 66 42 B 98 62 b
35 23 # 67 43 C 99 63 c
36 24 $ 68 44 D 100 64 d
37 25 % 69 45 E 101 65 e
38 26 & 70 46 F 102 66 f
39 27 ' 71 47 G 103 67 g
40 28 ( 72 48 H 104 68 h
41 29 ) 73 49 I 105 69 i
42 2A * 74 4A J 106 6A j
43 2B + 75 4B K 107 6B k
44 2C , 76 4C L 108 6C l
45 2D - 77 4D M 109 6D m
46 2E . 78 4E N 110 6E n
47 2F / 79 4F O 111 6F o
48 30 0 80 50 P 112 70 p
49 31 1 81 51 Q 113 71 q
50 32 2 82 52 R 114 72 r
51 33 3 83 53 S 115 73 s
52 34 4 84 54 T 116 74 t
53 35 5 85 55 U 117 75 u
54 36 6 86 56 V 118 76 v
55 37 7 87 57 W 119 77 w
56 38 8 88 58 X 120 78 x
57 39 9 89 59 Y 121 79 y
58 3A : 90 5A Z 122 7A z
59 3B ; 91 5B [ 123 7B {
60 3C < 92 5C \ 124 7C |
61 3D = 93 5D ] 125 7D }
62 3E > 94 5E ^ 126 7E ~
63 3F ? 95 5F _

Key Ranges to Remember

Range (Dec) Range (Hex) Characters
48-57 30-39 Digits 0-9
65-90 41-5A Uppercase A-Z
97-122 61-7A Lowercase a-z

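The difference between an uppercase letter and its lowercase counterpart is always 32 ($20), which is exactly bit 5. You can therefore convert case by subtracting or adding 32, or by clearing or setting bit 5. These shortcuts are only valid when the character is already known to be an ASCII letter of the opposite case: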

UpperChar := Chr(Ord(LowerChar) - 32);   { 'a' -> 'A' }
LowerChar := Chr(Ord(UpperChar) + 32);   { 'A' -> 'a' }

{ Or using bitwise operations: }
UpperChar := Chr(Ord(Ch) and $DF);       { Clear bit 5 }
LowerChar := Chr(Ord(Ch) or $20);        { Set bit 5 }
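Applied blindly, the bit tricks corrupt non-letters: Ord('@') or $20 turns '@' (64) into the backquote (96). A minimal sketch of guarded versions follows; the function names are hypothetical, and the xor variant is the literal "toggle bit 5" operation.

```pascal
program AsciiCase;

{$mode objfpc}{$H+}

{ Convert only 'a'..'z'; every other character is returned unchanged.
  Clearing bit 5 ($20) turns a lowercase ASCII letter into uppercase. }
function AsciiUpper(Ch: Char): Char;
begin
  if Ch in ['a'..'z'] then
    Result := Chr(Ord(Ch) and $DF)
  else
    Result := Ch;
end;

{ Convert only 'A'..'Z'; setting bit 5 gives the lowercase letter. }
function AsciiLower(Ch: Char): Char;
begin
  if Ch in ['A'..'Z'] then
    Result := Chr(Ord(Ch) or $20)
  else
    Result := Ch;
end;

{ Swap case with xor, which flips bit 5 in either direction. }
function AsciiToggleCase(Ch: Char): Char;
begin
  if Ch in ['A'..'Z', 'a'..'z'] then
    Result := Chr(Ord(Ch) xor $20)
  else
    Result := Ch;
end;

begin
  WriteLn(AsciiUpper('q'), AsciiLower('Q'), AsciiToggleCase('x'));  { QqX }
  WriteLn(AsciiUpper('@'));  { @ (unchanged: not a letter) }
end.
```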

Using ASCII Values in Pascal

program AsciiValues;

var
  Ch: Char;
  DigitValue: Integer;
begin
  Ch := #65;              { 'A' via decimal ASCII code }
  Ch := #$41;             { 'A' via hex ASCII code }
  Ch := Chr(65);          { 'A' via Chr function }
  WriteLn(Ch);            { A }

  WriteLn(Ord('A'));      { 65: the ASCII code of a character }
  WriteLn(Ord('0'));      { 48 }

  { Common idiom: convert a digit character to its numeric value }
  Ch := '7';
  DigitValue := Ord(Ch) - Ord('0');   { '7' -> 7 }
  WriteLn(DigitValue);                { 7 }
end.
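The digit idiom extends to whole numbers: multiply the running value by 10 and add the next digit. A minimal sketch with a hypothetical TryDigitsToInt helper that ignores signs and overflow; production code would normally call Val or SysUtils.TryStrToInt instead.

```pascal
program DigitsDemo;

{$mode objfpc}{$H+}

{ Hypothetical helper: parse a string of ASCII digits into an Integer
  using the Ord(Ch) - Ord('0') idiom. Returns False on any non-digit
  or on an empty string. Overflow is not checked in this sketch. }
function TryDigitsToInt(const S: string; out Value: Integer): Boolean;
var
  I: Integer;
begin
  Value := 0;
  Result := Length(S) > 0;
  for I := 1 to Length(S) do
  begin
    if not (S[I] in ['0'..'9']) then
      Exit(False);
    { Shift the previous digits one decimal place and add the new one }
    Value := Value * 10 + (Ord(S[I]) - Ord('0'));
  end;
end;

var
  N: Integer;
begin
  if TryDigitsToInt('2024', N) then
    WriteLn(N + 1);   { 2025 }
  if not TryDigitsToInt('12a', N) then
    WriteLn('not a number');
end.
```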

C.2 Extended ASCII and Code Pages

Standard ASCII only defines 128 characters, which is sufficient for English text but inadequate for other languages. The values 128-255 are undefined by the ASCII standard itself. Various code pages assign different characters to this range.

Common Code Pages

Code Page Name Region / Use
437 OEM United States Original IBM PC. Box-drawing characters, accented letters.
850 OEM Multilingual Latin I Western European languages on DOS.
1250 Windows Central European Polish, Czech, Hungarian, etc.
1251 Windows Cyrillic Russian, Ukrainian, Bulgarian, etc.
1252 Windows Western Western European languages on Windows. Superset of the printable characters of ISO 8859-1.
28591 ISO 8859-1 (Latin-1) Western European. Standard on older Unix systems.
28592 ISO 8859-2 (Latin-2) Central European.
65001 UTF-8 Unicode encoded as variable-width bytes. The modern standard.

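The problem with code pages is that the same byte value means different characters in different code pages. Byte 200 ($C8), for example, is 'И' in code page 1251 (Cyrillic) but 'È' in code page 1252 (Western), so a file written with code page 1251 turns into gibberish when read with code page 1252. Unicode was created to remove this ambiguity by giving every character a single number.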

Code Pages in Free Pascal

Free Pascal's AnsiString type is code-page aware in FPC 3.0+. Each AnsiString carries a code page identifier. You can set the system default code page:

program UseUtf8;

uses
  SysUtils;

begin
  DefaultSystemCodePage := CP_UTF8;   { 65001 }
end.

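When you assign a string to a variable whose declared code page differs from the string's own code page, FPC converts the data automatically. On Unix-like systems these conversions rely on the cwstring unit, which should appear in the program's uses clause.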
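A minimal sketch of that automatic conversion, assuming FPC 3.0 or later: a one-byte string declared with code page 1252 holds byte $C8 ('È'), and assigning it to a UTF8String yields the two-byte UTF-8 encoding of U+00C8. The type name CP1252String is hypothetical; cwstring is listed on Unix so the conversion has a backend.

```pascal
program CodePageDemo;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  { conversion backend on Unix }
  SysUtils;

type
  { Hypothetical name: an AnsiString whose declared code page is Windows-1252 }
  CP1252String = type AnsiString(1252);

var
  Latin: CP1252String;
  Utf8: UTF8String;
  I: Integer;
begin
  SetLength(Latin, 1);
  Latin[1] := #$C8;          { byte 200: 'È' in code page 1252 }
  Utf8 := Latin;             { declared code pages differ, so FPC converts }

  WriteLn(StringCodePage(Latin));  { 1252 }
  WriteLn(StringCodePage(Utf8));   { 65001 }
  WriteLn(Length(Utf8));           { 2: U+00C8 needs two bytes in UTF-8 }
  for I := 1 to Length(Utf8) do
    Write(IntToHex(Ord(Utf8[I]), 2), ' ');   { C3 88 }
  WriteLn;
end.
```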


C.3 Unicode Basics and UTF-8 in Free Pascal

What Is Unicode?

Unicode is a universal character set that assigns a unique number (called a code point) to every character in every writing system, plus symbols, emoji, and technical characters. As of Unicode 15.1, there are over 149,000 assigned characters.

Code points are written in the form U+XXXX:

  • U+0041 = Latin Capital Letter A
  • U+00E9 = Latin Small Letter E with Acute (é)
  • U+4E16 = 世, a CJK ideograph (Chinese/Japanese/Korean) meaning "world"
  • U+1F600 = Grinning Face emoji

UTF-8 Encoding

Unicode code points must be encoded into bytes for storage and transmission. UTF-8 is the dominant encoding today. It is a variable-width encoding:

Byte Count Code Point Range Byte Pattern
1 byte U+0000 - U+007F 0xxxxxxx
2 bytes U+0080 - U+07FF 110xxxxx 10xxxxxx
3 bytes U+0800 - U+FFFF 1110xxxx 10xxxxxx 10xxxxxx
4 bytes U+10000 - U+10FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
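The byte-pattern table translates directly into code. The sketch below is a hypothetical EncodeUtf8 helper (not an RTL routine) that fills the x bits of each pattern from the code point; it does not reject the surrogate range U+D800..U+DFFF, which a real encoder must.

```pascal
program Utf8Encode;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: encode one Unicode code point as UTF-8 bytes,
  following the byte-pattern table. Surrogates are not rejected here. }
function EncodeUtf8(CodePoint: LongWord): RawByteString;
begin
  if CodePoint <= $7F then
    Result := Chr(CodePoint)                          { 0xxxxxxx }
  else if CodePoint <= $7FF then
    Result := Chr($C0 or (CodePoint shr 6)) +         { 110xxxxx }
              Chr($80 or (CodePoint and $3F))         { 10xxxxxx }
  else if CodePoint <= $FFFF then
    Result := Chr($E0 or (CodePoint shr 12)) +        { 1110xxxx }
              Chr($80 or ((CodePoint shr 6) and $3F)) +
              Chr($80 or (CodePoint and $3F))
  else if CodePoint <= $10FFFF then
    Result := Chr($F0 or (CodePoint shr 18)) +        { 11110xxx }
              Chr($80 or ((CodePoint shr 12) and $3F)) +
              Chr($80 or ((CodePoint shr 6) and $3F)) +
              Chr($80 or (CodePoint and $3F))
  else
    raise ERangeError.CreateFmt('Not a Unicode code point: %x', [CodePoint]);
end;

{ Show the bytes of a string as space-separated hex pairs }
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 2) + ' ';
  Result := Trim(Result);
end;

begin
  WriteLn(HexBytes(EncodeUtf8($41)));     { 41 }
  WriteLn(HexBytes(EncodeUtf8($E9)));     { C3 A9 }
  WriteLn(HexBytes(EncodeUtf8($4E16)));   { E4 B8 96 }
  WriteLn(HexBytes(EncodeUtf8($1F600)));  { F0 9F 98 80 }
end.
```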

Key properties of UTF-8:

  • All ASCII characters (0-127) are single bytes, identical to ASCII. Existing ASCII text is valid UTF-8.
  • Non-ASCII characters use 2 to 4 bytes.
  • Length(S) on a UTF-8 AnsiString returns the number of bytes, not the number of characters.
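Because every UTF-8 code point begins with exactly one byte that is not of the form 10xxxxxx, counting code points only requires skipping continuation bytes. A minimal sketch, assuming valid UTF-8 input; Utf8CodePointCount is a hypothetical name, and in Lazarus code UTF8Length does the same job.

```pascal
program CountCodePoints;

{$mode objfpc}{$H+}

{ Hypothetical helper: count code points in a UTF-8 byte string.
  Every byte that is NOT a continuation byte (10xxxxxx) starts a
  new code point. Assumes the input is valid UTF-8. }
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: RawByteString;
begin
  S := 'Caf' + Chr($C3) + Chr($A9);  { "Café" spelled out as UTF-8 bytes }
  WriteLn(Length(S));                { 5 bytes }
  WriteLn(Utf8CodePointCount(S));    { 4 code points }
end.
```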

UTF-8 in Free Pascal

Free Pascal's code-page-aware AnsiString stores UTF-8 data directly, and the LazUTF8 unit (part of the LazUtils package that ships with Lazarus) adds UTF-8-aware string routines:

program Utf8Lengths;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  S: String;
begin
  S := 'Hello, world!';
  WriteLn(Length(S));           { 13: byte count }
  WriteLn(UTF8Length(S));       { 13: character count (same for ASCII) }

  S := 'Café';                  { é is U+00E9; source file saved as UTF-8 }
  WriteLn(Length(S));           { 5: 'C','a','f' = 3 bytes + é = 2 bytes }
  WriteLn(UTF8Length(S));       { 4: four visible characters }
end.

UnicodeString

Free Pascal also provides the UnicodeString type (UTF-16 encoded, equivalent to the String type in Delphi 2009 and later):

program Utf16Length;

{$mode objfpc}{$H+}

var
  U: UnicodeString;
begin
  U := 'Hello';
  WriteLn(Length(U));           { 5: number of UTF-16 code units }
end.
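UTF-16 has its own variable-width rule: code points up to U+FFFF take one code unit, while higher ones, such as the U+1F600 emoji, are stored as a surrogate pair, so Length counts 2 for them. A sketch with a hypothetical CodePointToUtf16 helper that performs the split by hand:

```pascal
program Utf16Surrogates;

{$mode objfpc}{$H+}

{ Hypothetical helper: build a UTF-16 string for any code point.
  Code points above U+FFFF are split into a high surrogate
  ($D800..$DBFF) and a low surrogate ($DC00..$DFFF). No input validation. }
function CodePointToUtf16(CodePoint: LongWord): UnicodeString;
var
  V: LongWord;
begin
  if CodePoint <= $FFFF then
    Result := UnicodeString(WideChar(CodePoint))
  else
  begin
    V := CodePoint - $10000;                                  { 20 bits remain }
    Result := UnicodeString(WideChar($D800 + (V shr 10))) +   { top 10 bits }
              WideChar($DC00 + (V and $3FF));                 { bottom 10 bits }
  end;
end;

var
  U: UnicodeString;
begin
  U := CodePointToUtf16($1F600);          { Grinning Face emoji }
  WriteLn(Length(U));                     { 2: two UTF-16 code units }
  WriteLn(HexStr(Ord(U[1]), 4), ' ', HexStr(Ord(U[2]), 4));  { D83D DE00 }
  WriteLn(Length(CodePointToUtf16($E9))); { 1 }
end.
```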

For most Free Pascal and Lazarus projects, AnsiString with UTF-8 encoding (the default in Lazarus) is the recommended approach. The Lazarus LCL is built around UTF-8 strings.

Practical Tips

  1. Set the default code page to UTF-8 early in your program: DefaultSystemCodePage := CP_UTF8;

  2. Use UTF8Length, UTF8Copy, UTF8Pos (from LazUTF8) instead of Length, Copy, Pos when working with non-ASCII text.

  3. Save source files as UTF-8. Most modern editors do this by default. In Lazarus, go to Source > File Encoding and select UTF-8.

  4. Be careful with Char indexing. S[i] accesses the i-th byte, not the i-th character. For UTF-8 strings containing non-ASCII characters, iterate using UTF8CharStart or similar routines.
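A minimal sketch of code-point iteration that needs no extra units: read the sequence length from the lead byte, collect the payload bits, then jump to the next lead byte. SeqLength and DescribeCodePoints are hypothetical helpers that assume valid UTF-8; in Lazarus code the LazUTF8 routines are the usual choice.

```pascal
program Utf8Iterate;

{$mode objfpc}{$H+}

{ Length of the UTF-8 sequence that starts with lead byte B, read off the
  byte-pattern table: 0xxxxxxx = 1, 110xxxxx = 2, 1110xxxx = 3,
  11110xxx = 4. Invalid lead bytes count as 1 so the loop always advances. }
function SeqLength(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

{ Hypothetical helper: decode each code point of a UTF-8 string and
  list it in U+XXXX notation. Assumes the input is valid UTF-8. }
function DescribeCodePoints(const S: RawByteString): string;
var
  I, N, K, Digits: Integer;
  CP: LongInt;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    N := SeqLength(Ord(S[I]));
    { Keep only the payload bits of the lead byte }
    case N of
      1: CP := Ord(S[I]);
      2: CP := Ord(S[I]) and $1F;
      3: CP := Ord(S[I]) and $0F;
    else
      CP := Ord(S[I]) and $07;
    end;
    { Append the 6 payload bits of each continuation byte }
    for K := 1 to N - 1 do
      CP := (CP shl 6) or (Ord(S[I + K]) and $3F);
    { U+ notation uses at least four hex digits }
    if CP > $FFFFF then Digits := 6
    else if CP > $FFFF then Digits := 5
    else Digits := 4;
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + 'U+' + HexStr(CP, Digits);
    Inc(I, N);  { jump to the first byte of the next code point }
  end;
end;

begin
  WriteLn(DescribeCodePoints('Caf' + Chr($C3) + Chr($A9)));
  { U+0043 U+0061 U+0066 U+00E9 }
end.
```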


C.4 Character Classification Functions

Free Pascal provides functions for testing and converting characters. The Unicode-aware classification functions live in the Character unit, the string case-conversion functions in SysUtils, and basic routines such as UpCase, Ord, and Chr work directly on Char values from the System unit.

Classification Functions (Character unit)

Function Returns True When
IsLetter(Ch) Ch is a letter (by Unicode category)
IsDigit(Ch) Ch is a decimal digit
IsLetterOrDigit(Ch) Ch is a letter or digit
IsWhiteSpace(Ch) Ch is space, tab, CR, LF, etc.
IsUpper(Ch) Ch is an uppercase letter
IsLower(Ch) Ch is a lowercase letter
IsPunctuation(Ch) Ch is a punctuation character
IsControl(Ch) Ch is a control character (0-31, 127-159)

These functions take a UnicodeChar; a Char argument is converted implicitly.
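A short sketch of the Character unit in use. It assumes the unit's standalone functions listed in the table, each accepting a UnicodeChar; WideChar($00E9) passes 'é' without depending on the source file's encoding.

```pascal
program CharacterUnitDemo;

{$mode objfpc}{$H+}

uses
  Character;   { Unicode-aware IsLetter, IsDigit, IsUpper, ... }

begin
  WriteLn(IsLetter('A'));              { TRUE }
  WriteLn(IsLetter(WideChar($00E9)));  { TRUE: é is a letter outside ASCII }
  WriteLn(IsDigit('7'));               { TRUE }
  WriteLn(IsUpper('a'));               { FALSE }
  WriteLn(IsPunctuation('!'));         { TRUE }
end.
```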

Manual Classification (No Extra Units Needed)

function IsAlpha(Ch: Char): Boolean;
begin
  Result := Ch in ['A'..'Z', 'a'..'z'];
end;

function IsNumeric(Ch: Char): Boolean;
begin
  Result := Ch in ['0'..'9'];
end;

function IsAlphaNumeric(Ch: Char): Boolean;
begin
  Result := Ch in ['A'..'Z', 'a'..'z', '0'..'9'];
end;

function IsHexDigit(Ch: Char): Boolean;
begin
  Result := Ch in ['0'..'9', 'A'..'F', 'a'..'f'];
end;

function IsSpace(Ch: Char): Boolean;
begin
  Result := Ch in [' ', #9, #10, #13];
end;

function IsPrintable(Ch: Char): Boolean;
begin
  Result := (Ord(Ch) >= 32) and (Ord(Ch) <= 126);
end;
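The same set-membership tests scale up to classifying a whole string. The sketch below (hypothetical TCharClass type and ClassifyChar function) tallies letters, digits, whitespace, and everything else; any byte above 127 lands in ccOther, which is correct only for ASCII input.

```pascal
program CharClassCount;

{$mode objfpc}{$H+}

type
  TCharClass = (ccLetter, ccDigit, ccSpace, ccOther);

{ Classify an ASCII character with set membership tests.
  Non-ASCII bytes fall into ccOther. }
function ClassifyChar(Ch: Char): TCharClass;
begin
  if Ch in ['A'..'Z', 'a'..'z'] then
    Result := ccLetter
  else if Ch in ['0'..'9'] then
    Result := ccDigit
  else if Ch in [' ', #9, #10, #13] then
    Result := ccSpace
  else
    Result := ccOther;
end;

var
  Counts: array[TCharClass] of Integer;
  C: TCharClass;
  I: Integer;
  Sample: string;
begin
  Sample := 'Room 101, floor 3';
  for C := Low(TCharClass) to High(TCharClass) do
    Counts[C] := 0;
  for I := 1 to Length(Sample) do
    Inc(Counts[ClassifyChar(Sample[I])]);
  WriteLn('letters=', Counts[ccLetter], ' digits=', Counts[ccDigit],
          ' spaces=', Counts[ccSpace], ' other=', Counts[ccOther]);
  { letters=9 digits=4 spaces=3 other=1 }
end.
```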

Conversion Functions

Function Description Example
UpCase(Ch) Uppercase a Char (ASCII only) UpCase('a') = 'A'
LowerCase(Ch) Lowercase a Char (needs SysUtils) LowerCase('A') = 'a'
UpperCase(S) Uppercase a String (SysUtils, ASCII only) UpperCase('hello') = 'HELLO'
LowerCase(S) Lowercase a String (SysUtils, ASCII only) LowerCase('HELLO') = 'hello'
Ord(Ch) Character to integer code Ord('A') = 65
Chr(N) Integer code to character Chr(65) = 'A'

Locale-Aware Conversions

For non-ASCII characters, use the UTF8UpperCase and UTF8LowerCase functions from the LazUTF8 unit:

program Utf8Case;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  S: String;
begin
  S := UTF8UpperCase('café');     { 'CAFÉ': handles the accented é correctly }
  S := UTF8LowerCase('ÜBER');     { 'über': handles German umlauts }
  WriteLn(S);
end.

C.5 Common Character Constants in Pascal

For readability, you can define named constants for frequently used control characters:

const
  NUL   = #0;
  TAB   = #9;
  LF    = #10;
  CR    = #13;
  SPACE = #32;
  ESC   = #27;
  DEL   = #127;

  { Line endings }
  CRLF  = #13#10;      { Windows line ending }
  {$IFDEF UNIX}
  LineEnd = #10;
  {$ELSE}
  LineEnd = #13#10;
  {$ENDIF}

Free Pascal provides the LineEnding constant in the System unit, which automatically resolves to the correct line-ending sequence for the target platform. Prefer LineEnding over hardcoded values for portable code.
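LineEnding covers output, but input files may arrive with any convention, so a common companion is a normalizer. The sketch below is a hypothetical NormalizeToLF helper that maps CR+LF and lone CR to LF; it builds the result character by character, which is fine for modest inputs.

```pascal
program LineEndings;

{$mode objfpc}{$H+}

{ Hypothetical helper: convert CR+LF (Windows) and lone CR
  (classic Mac OS) to LF, leaving existing LFs untouched. }
function NormalizeToLF(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] = #13 then
    begin
      Result := Result + #10;
      { Skip the LF of a CR+LF pair so it is not emitted twice }
      if (I < Length(S)) and (S[I + 1] = #10) then
        Inc(I);
    end
    else
      Result := Result + S[I];
    Inc(I);
  end;
end;

var
  Msg: string;
begin
  { LineEnding is #10 on Unix and #13#10 on Windows }
  Msg := 'first' + LineEnding + 'second';
  WriteLn(Length(Msg));                  { 12 on Unix, 13 on Windows }
  WriteLn(Length(NormalizeToLF(Msg)));   { 12 on every platform }
end.
```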


This appendix serves as a reference whenever you need to look up a character code, understand encoding behavior, or classify characters in your Pascal programs.