Appendix C: ASCII and Character Set Reference

This appendix provides a complete ASCII table, an overview of extended character sets and code pages, an introduction to Unicode and UTF-8 as they apply to Free Pascal, and a reference for Pascal's built-in character classification and conversion functions.

C.1 The ASCII Table (0-127)

ASCII (American Standard Code for Information Interchange) defines 128 characters using 7 bits. The first 32 codes (0-31) and code 127 are control characters; the rest are printable.

Control Characters (0-31 and 127)

Dec	Hex	Char	Description	Pascal Escape
0	00	NUL	Null	`#0`
1	01	SOH	Start of Heading	`#1`
2	02	STX	Start of Text	`#2`
3	03	ETX	End of Text	`#3`
4	04	EOT	End of Transmission	`#4`
5	05	ENQ	Enquiry	`#5`
6	06	ACK	Acknowledge	`#6`
7	07	BEL	Bell (beep)	`#7`
8	08	BS	Backspace	`#8`
9	09	HT	Horizontal Tab	`#9`
10	0A	LF	Line Feed	`#10`
11	0B	VT	Vertical Tab	`#11`
12	0C	FF	Form Feed	`#12`
13	0D	CR	Carriage Return	`#13`
14	0E	SO	Shift Out	`#14`
15	0F	SI	Shift In	`#15`
16	10	DLE	Data Link Escape	`#16`
17	11	DC1	Device Control 1 (XON)	`#17`
18	12	DC2	Device Control 2	`#18`
19	13	DC3	Device Control 3 (XOFF)	`#19`
20	14	DC4	Device Control 4	`#20`
21	15	NAK	Negative Acknowledge	`#21`
22	16	SYN	Synchronous Idle	`#22`
23	17	ETB	End of Transmission Block	`#23`
24	18	CAN	Cancel	`#24`
25	19	EM	End of Medium	`#25`
26	1A	SUB	Substitute (EOF on Windows)	`#26`
27	1B	ESC	Escape	`#27`
28	1C	FS	File Separator	`#28`
29	1D	GS	Group Separator	`#29`
30	1E	RS	Record Separator	`#30`
31	1F	US	Unit Separator	`#31`
127	7F	DEL	Delete	`#127`

The most important control characters for everyday programming are:

LF (10): Line feed. The newline character on Unix/Linux/macOS.
CR (13): Carriage return. Together, CR+LF (13+10) form the newline on Windows.
HT (9): Horizontal tab.
NUL (0): Null character. Terminates PChar strings.
BEL (7): Produces an audible beep on many terminals.
ESC (27): Used to begin ANSI escape sequences for terminal colors and cursor control.
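
As a small illustration of the list above, the following sketch writes a tab-separated table and one colored line. The constant names `ESC` and `TAB` are chosen here for readability, and the colored output assumes a terminal that interprets ANSI escape sequences.

```pascal
program ControlChars;
{$mode objfpc}{$H+}

const
  ESC = #27;
  TAB = #9;

begin
  // TAB separates the columns; most terminals advance to the next tab stop
  WriteLn('Name', TAB, 'Code');
  WriteLn('LF', TAB, 10);
  WriteLn('CR', TAB, 13);

  // ESC begins an ANSI sequence: ESC[32m selects green, ESC[0m resets.
  // On a terminal without ANSI support the raw sequence is printed instead.
  WriteLn(ESC + '[32m' + 'green text' + ESC + '[0m');
end.
```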

Printable Characters (32-126)

Dec	Hex	Char	Dec	Hex	Char	Dec	Hex	Char
32	20	(space)	64	40	`@`	96	60	`` ` ``
33	21	`!`	65	41	`A`	97	61	`a`
34	22	`"`	66	42	`B`	98	62	`b`
35	23	`#`	67	43	`C`	99	63	`c`
36	24	`$`	68	44	`D`	100	64	`d`
37	25	`%`	69	45	`E`	101	65	`e`
38	26	`&`	70	46	`F`	102	66	`f`
39	27	`'`	71	47	`G`	103	67	`g`
40	28	`(`	72	48	`H`	104	68	`h`
41	29	`)`	73	49	`I`	105	69	`i`
42	2A	`*`	74	4A	`J`	106	6A	`j`
43	2B	`+`	75	4B	`K`	107	6B	`k`
44	2C	`,`	76	4C	`L`	108	6C	`l`
45	2D	`-`	77	4D	`M`	109	6D	`m`
46	2E	`.`	78	4E	`N`	110	6E	`n`
47	2F	`/`	79	4F	`O`	111	6F	`o`
48	30	`0`	80	50	`P`	112	70	`p`
49	31	`1`	81	51	`Q`	113	71	`q`
50	32	`2`	82	52	`R`	114	72	`r`
51	33	`3`	83	53	`S`	115	73	`s`
52	34	`4`	84	54	`T`	116	74	`t`
53	35	`5`	85	55	`U`	117	75	`u`
54	36	`6`	86	56	`V`	118	76	`v`
55	37	`7`	87	57	`W`	119	77	`w`
56	38	`8`	88	58	`X`	120	78	`x`
57	39	`9`	89	59	`Y`	121	79	`y`
58	3A	`:`	90	5A	`Z`	122	7A	`z`
59	3B	`;`	91	5B	`[`	123	7B	`{`
60	3C	`<`	92	5C	`\`	124	7C	`|`
61	3D	`=`	93	5D	`]`	125	7D	`}`
62	3E	`>`	94	5E	`^`	126	7E	`~`
63	3F	`?`	95	5F	`_`
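
The table above can be regenerated with a short loop, which is a handy check when a printed copy is not at hand. This is a minimal sketch; the eight-per-row layout is an arbitrary choice.

```pascal
program PrintableAscii;
{$mode objfpc}{$H+}
uses
  SysUtils;   // IntToHex

var
  Code: Integer;
begin
  for Code := 32 to 126 do
  begin
    // decimal code, two-digit hex code, then the character itself
    Write(Code:4, ' ', IntToHex(Code, 2), ' ', Chr(Code), '   ');
    if (Code - 31) mod 8 = 0 then
      WriteLn;   // start a new row after every eighth entry
  end;
  WriteLn;
end.
```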

Key Ranges to Remember

Range (Dec)	Range (Hex)	Characters
48-57	30-39	Digits `0`-`9`
65-90	41-5A	Uppercase `A`-`Z`
97-122	61-7A	Lowercase `a`-`z`

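Each lowercase letter's code is exactly 32 ($20) higher than its uppercase counterpart, so the two differ only in bit 5. You can convert case by adding or subtracting 32, or by setting or clearing that bit. Both forms assume the character is already known to be a letter: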

UpperChar := Chr(Ord(LowerChar) - 32);   { 'a' -> 'A' }
LowerChar := Chr(Ord(UpperChar) + 32);   { 'A' -> 'a' }

{ Or using bitwise operations: }
UpperChar := Chr(Ord(Ch) and $DF);       { Clear bit 5 }
LowerChar := Chr(Ord(Ch) or $20);        { Set bit 5 }
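
The bit operations change non-letters too, so in practice they are usually wrapped in a range check. The sketch below shows one way to do that; `AsciiUpper` and `AsciiLower` are hypothetical helper names, not RTL functions.

```pascal
program SafeCase;
{$mode objfpc}{$H+}

function AsciiUpper(Ch: Char): Char;
begin
  if Ch in ['a'..'z'] then
    Result := Chr(Ord(Ch) and $DF)   // clear bit 5
  else
    Result := Ch;                    // leave everything else alone
end;

function AsciiLower(Ch: Char): Char;
begin
  if Ch in ['A'..'Z'] then
    Result := Chr(Ord(Ch) or $20)    // set bit 5
  else
    Result := Ch;
end;

begin
  WriteLn(AsciiUpper('q'), AsciiLower('Q'));   // Qq
  WriteLn(AsciiUpper('7'), AsciiLower('@'));   // 7@ (unchanged)

  // Without the guard, a non-letter is silently corrupted:
  WriteLn(Chr(Ord('{') and $DF));              // prints [
end.
```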

Using ASCII Values in Pascal

var
  Ch: Char;
  DigitValue: Integer;
begin
  Ch := #65;              { 'A' via decimal ASCII code }
  Ch := #$41;             { 'A' via hex ASCII code }
  Ch := Chr(65);          { 'A' via Chr function }

  WriteLn(Ord('A'));      { 65: get ASCII code of a character }
  WriteLn(Ord('0'));      { 48 }

  { Common idiom: convert digit character to its numeric value }
  Ch := '7';
  DigitValue := Ord(Ch) - Ord('0');   { '7' -> 7 }
  WriteLn(DigitValue);    { 7 }
end.
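
The digit idiom extends naturally to whole numbers. The following is a minimal sketch of a decimal parser built on it; `ParseDecimal` is a hypothetical helper, and it deliberately omits sign handling and overflow checks.

```pascal
program DigitParse;
{$mode objfpc}{$H+}

// Converts a string of ASCII digits to an integer.
// Returns False for an empty string or any non-digit character.
function ParseDecimal(const S: string; out Value: Integer): Boolean;
var
  i: Integer;
begin
  Value := 0;
  Result := Length(S) > 0;
  for i := 1 to Length(S) do
  begin
    if not (S[i] in ['0'..'9']) then
      Exit(False);
    Value := Value * 10 + (Ord(S[i]) - Ord('0'));
  end;
end;

var
  N: Integer;
begin
  if ParseDecimal('1975', N) then
    WriteLn(N);                       // 1975
  WriteLn(ParseDecimal('19x5', N));   // FALSE
end.
```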

C.2 Extended ASCII and Code Pages

Standard ASCII only defines 128 characters, which is sufficient for English text but inadequate for other languages. The values 128-255 are undefined by the ASCII standard itself. Various code pages assign different characters to this range.

Common Code Pages

Code Page	Name	Region / Use
437	OEM United States	Original IBM PC. Box-drawing characters, accented letters.
850	OEM Multilingual Latin I	Western European languages on DOS.
1250	Windows Central European	Polish, Czech, Hungarian, etc.
1251	Windows Cyrillic	Russian, Ukrainian, Bulgarian, etc.
1252	Windows Western	Western European languages on Windows. Matches ISO 8859-1 except that bytes 128-159 hold printable characters instead of control codes.
28591	ISO 8859-1 (Latin-1)	Western European. Standard on older Unix systems.
28592	ISO 8859-2 (Latin-2)	Central European.
65001	UTF-8	Unicode encoded as variable-width bytes. The modern standard.

The problem with code pages is that the same byte value means different characters in different code pages. Byte 200, for example, is И in code page 1251 (Cyrillic) but È in code page 1252 (Western), so a file written with one looks like gibberish when read with the other. This is why Unicode was created.
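
The ambiguity can be made concrete by mapping one byte to a Unicode code point under each code page. The sketch below covers only bytes $C0..$FF, where both mappings happen to be simple offsets; the two function names are hypothetical, and a real program would use the RTL's conversion support rather than hand-written tables.

```pascal
program CodePageAmbiguity;
{$mode objfpc}{$H+}
uses
  SysUtils;   // IntToHex

// Valid only for B in $C0..$FF: the Cyrillic letters U+0410..U+044F
function Cp1251ToCodePoint(B: Byte): Integer;
begin
  Result := $0410 + (B - $C0);
end;

// Valid only for B in $C0..$FF: identical to U+00C0..U+00FF
function Cp1252ToCodePoint(B: Byte): Integer;
begin
  Result := B;
end;

begin
  WriteLn('Byte 200 as CP1251: U+', IntToHex(Cp1251ToCodePoint(200), 4));   // U+0418
  WriteLn('Byte 200 as CP1252: U+', IntToHex(Cp1252ToCodePoint(200), 4));   // U+00C8
end.
```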

Code Pages in Free Pascal

Free Pascal's AnsiString type is code-page aware in FPC 3.0+. Each AnsiString carries a code page identifier. You can set the system default code page:

{ DefaultSystemCodePage and CP_UTF8 are both declared in the System unit,
  so no uses clause is required. }
begin
  DefaultSystemCodePage := CP_UTF8;   { 65001 }
end.

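When you assign between AnsiString variables whose declared code pages differ, FPC converts the contents automatically. RawByteString is the exception: it accepts any code page without conversion. On Unix-like systems the conversion relies on an installed widestring manager, such as the one provided by the cwstring unit.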
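
A minimal sketch of such a conversion follows. The type name `TCp1252String` is made up for this example, and the result assumes FPC 3.0 or later with a working widestring manager (hence `cwstring` on Unix). Under those assumptions the UTF-8 string should report 2 bytes and the Windows-1252 copy 1 byte, but treat that as an expectation rather than a guarantee.

```pascal
program CodePageStrings;
{$mode objfpc}{$H+}
{$IFDEF UNIX}
uses
  cwstring;   // widestring manager used for code page conversion on Unix
{$ENDIF}

type
  TCp1252String = type AnsiString(1252);   // hypothetical name for this sketch

var
  U: UTF8String;
  W: TCp1252String;
begin
  // Store U+00E9 as explicit UTF-8 bytes so the demo does not depend
  // on how the source file itself is encoded.
  SetLength(U, 2);
  U[1] := #$C3;
  U[2] := #$A9;

  W := U;   // the compiler inserts a UTF-8 to Windows-1252 conversion here

  WriteLn('UTF-8 bytes:  ', Length(U), ', code page ', StringCodePage(U));
  WriteLn('CP1252 bytes: ', Length(W), ', code page ', StringCodePage(W));
end.
```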

C.3 Unicode Basics and UTF-8 in Free Pascal

What Is Unicode?

Unicode is a universal character set that assigns a unique number (called a code point) to every character in every writing system, plus symbols, emoji, and technical characters. As of Unicode 15.1, there are over 149,000 assigned characters.

Code points are written in the form U+XXXX:

U+0041 = Latin Capital Letter A
U+00E9 = Latin Small Letter E with Acute (é)
U+4E16 = CJK character (Chinese/Japanese/Korean)
U+1F600 = Grinning Face emoji

UTF-8 Encoding

Unicode code points must be encoded into bytes for storage and transmission. UTF-8 is the dominant encoding today. It is a variable-width encoding:

Byte Count	Code Point Range	Byte Pattern
1 byte	U+0000 - U+007F	`0xxxxxxx`
2 bytes	U+0080 - U+07FF	`110xxxxx 10xxxxxx`
3 bytes	U+0800 - U+FFFF	`1110xxxx 10xxxxxx 10xxxxxx`
4 bytes	U+10000 - U+10FFFF	`11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`
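
The byte patterns in the table can be applied mechanically. The sketch below encodes a single code point by hand; `EncodeCodePoint` and `TByteSeq` are hypothetical names, and the function assumes a valid code point (at most U+10FFFF and not a surrogate) without checking.

```pascal
program EncodeDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;   // IntToHex

type
  TByteSeq = array of Byte;

function EncodeCodePoint(CP: Integer): TByteSeq;
begin
  if CP <= $7F then
  begin
    SetLength(Result, 1);
    Result[0] := Byte(CP);                          // 0xxxxxxx
  end
  else if CP <= $7FF then
  begin
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (CP shr 6));           // 110xxxxx
    Result[1] := Byte($80 or (CP and $3F));         // 10xxxxxx
  end
  else if CP <= $FFFF then
  begin
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (CP shr 12));          // 1110xxxx
    Result[1] := Byte($80 or ((CP shr 6) and $3F));
    Result[2] := Byte($80 or (CP and $3F));
  end
  else
  begin
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (CP shr 18));          // 11110xxx
    Result[1] := Byte($80 or ((CP shr 12) and $3F));
    Result[2] := Byte($80 or ((CP shr 6) and $3F));
    Result[3] := Byte($80 or (CP and $3F));
  end;
end;

procedure Show(CP: Integer);
var
  B: Byte;
begin
  Write('U+', IntToHex(CP, 4), ' ->');
  for B in EncodeCodePoint(CP) do
    Write(' ', IntToHex(B, 2));
  WriteLn;
end;

begin
  Show($41);      // 41
  Show($E9);      // C3 A9
  Show($4E16);    // E4 B8 96
  Show($1F600);   // F0 9F 98 80
end.
```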

Key properties of UTF-8:

All ASCII characters (0-127) are single bytes, identical to ASCII. Existing ASCII text is valid UTF-8.
Non-ASCII characters use 2 to 4 bytes.
Length(S) on a UTF-8 AnsiString returns the number of bytes, not the number of characters.
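
Because every continuation byte has the form `10xxxxxx`, the number of code points is simply the number of bytes that are not continuation bytes. The sketch below uses that fact. `Utf8CodePointCount` and `FromBytes` are hypothetical helpers, and the input is assumed to be well-formed UTF-8.

```pascal
program CountCodePoints;
{$mode objfpc}{$H+}

// Builds a string from raw byte values so the demo does not depend
// on the encoding of the source file.
function FromBytes(const B: array of Byte): RawByteString;
var
  i: Integer;
begin
  SetLength(Result, Length(B));
  for i := 0 to High(B) do
    Result[i + 1] := Chr(B[i]);
end;

// Counts every byte that is not a continuation byte (10xxxxxx).
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: RawByteString;
begin
  S := FromBytes([$43, $61, $66, $C3, $A9]);   // "Café" in UTF-8
  WriteLn(Length(S));                          // 5 bytes
  WriteLn(Utf8CodePointCount(S));              // 4 code points
end.
```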

UTF-8 in Free Pascal

Free Pascal has strong UTF-8 support, especially through the LazUTF8 unit in Lazarus:

uses
  LazUTF8;

var
  S: String;
begin
  S := 'Hello, world!';
  WriteLn(Length(S));           { 13: byte count }
  WriteLn(UTF8Length(S));       { 13: character count (same for ASCII) }

  S := 'Café';                  { Stored as UTF-8; é is U+00E9 }
  WriteLn(Length(S));           { 5: 'C','a','f' = 3 bytes + é = 2 bytes }
  WriteLn(UTF8Length(S));       { 4: four visible characters }
end;

UnicodeString

Free Pascal also provides the UnicodeString type (UTF-16 encoded, similar to Delphi's String type in modern Delphi versions):

var
  U: UnicodeString;
begin
  U := 'Hello';
  WriteLn(Length(U));           { 5 — number of UTF-16 code units }
end;
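
Note that `Length` on a UnicodeString counts UTF-16 code units, so characters outside the Basic Multilingual Plane count twice. The sketch below builds U+1F600 from its surrogate pair to show this; `Utf16CodePointCount` is a hypothetical helper that assumes well-formed UTF-16.

```pascal
program Utf16Units;
{$mode objfpc}{$H+}

// Counts code points by skipping low (trailing) surrogates $DC00..$DFFF.
function Utf16CodePointCount(const U: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(U) do
    if (Ord(U[i]) < $DC00) or (Ord(U[i]) > $DFFF) then
      Inc(Result);
end;

var
  U: UnicodeString;
begin
  // U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00
  SetLength(U, 2);
  U[1] := WideChar($D83D);
  U[2] := WideChar($DE00);

  WriteLn(Length(U));                // 2 code units
  WriteLn(Utf16CodePointCount(U));   // 1 code point
end.
```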

For most Free Pascal and Lazarus projects, AnsiString with UTF-8 encoding (the default in Lazarus) is the recommended approach. The Lazarus LCL is built around UTF-8 strings.

Practical Tips

Set the default code page to UTF-8 early in your program: `DefaultSystemCodePage := CP_UTF8;`
Use UTF8Length, UTF8Copy, UTF8Pos (from LazUTF8) instead of Length, Copy, Pos when working with non-ASCII text.
Save source files as UTF-8. Most modern editors do this by default. In Lazarus, right-click in the source editor and choose File Settings > Encoding > UTF-8.
Be careful with Char indexing. S[i] accesses the i-th byte, not the i-th character. For UTF-8 strings containing non-ASCII characters, iterate using UTF8CharStart or similar routines.
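
To show what iterating by character rather than by byte involves, the sketch below walks a UTF-8 string one code point at a time without relying on LazUTF8. `NextCodePoint` and `FromBytes` are hypothetical helpers. The decoder assumes well-formed input and performs no validation, so production code should prefer the library routines.

```pascal
program DecodeUtf8;
{$mode objfpc}{$H+}
uses
  SysUtils;   // IntToHex

function FromBytes(const B: array of Byte): RawByteString;
var
  i: Integer;
begin
  SetLength(Result, Length(B));
  for i := 0 to High(B) do
    Result[i + 1] := Chr(B[i]);
end;

// Decodes the code point that starts at byte index i, then advances i past it.
function NextCodePoint(const S: RawByteString; var i: Integer): Integer;
var
  Lead, Extra: Integer;
begin
  Lead := Ord(S[i]);
  if Lead < $80 then
  begin
    Result := Lead;          Extra := 0;   // 0xxxxxxx
  end
  else if Lead < $E0 then
  begin
    Result := Lead and $1F;  Extra := 1;   // 110xxxxx
  end
  else if Lead < $F0 then
  begin
    Result := Lead and $0F;  Extra := 2;   // 1110xxxx
  end
  else
  begin
    Result := Lead and $07;  Extra := 3;   // 11110xxx
  end;
  Inc(i);
  while Extra > 0 do
  begin
    // each continuation byte contributes its low six bits
    Result := (Result shl 6) or (Ord(S[i]) and $3F);
    Inc(i);
    Dec(Extra);
  end;
end;

var
  S: RawByteString;
  i: Integer;
begin
  S := FromBytes([$43, $61, $66, $C3, $A9]);   // "Café" in UTF-8
  i := 1;
  while i <= Length(S) do
    WriteLn('U+', IntToHex(NextCodePoint(S, i), 4));
  // prints U+0043, U+0061, U+0066, U+00E9 on separate lines
end.
```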

C.4 Character Classification Functions

Free Pascal provides functions for testing and converting characters. The Unicode-aware classification functions are class methods of TCharacter in the Character unit; simple conversions such as UpCase, Ord, and Chr are in the System unit, and string case conversion is in SysUtils.

Classification Functions (Character unit)

Function	Returns `True` When
`TCharacter.IsLetter(Ch)`	Ch is a letter (Unicode-aware)
`TCharacter.IsDigit(Ch)`	Ch is a decimal digit
`TCharacter.IsLetterOrDigit(Ch)`	Ch is a letter or digit
`TCharacter.IsWhiteSpace(Ch)`	Ch is space, tab, CR, LF, etc.
`TCharacter.IsUpper(Ch)`	Ch is an uppercase letter
`TCharacter.IsLower(Ch)`	Ch is a lowercase letter
`TCharacter.IsPunctuation(Ch)`	Ch is a punctuation character
`TCharacter.IsControl(Ch)`	Ch is a control character (0-31, 127-159)
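
The sketch below shows one way these tests might be called in Free Pascal, assuming they are reached as class methods of `TCharacter` in the Character unit and take a `UnicodeChar` argument. `Describe` is a hypothetical helper. No expected output is listed because the results come from the Unicode data shipped with your RTL version.

```pascal
program ClassifyDemo;
{$mode objfpc}{$H+}
uses
  Character;   // TCharacter class methods

procedure Describe(Ch: UnicodeChar);
begin
  WriteLn('U+', HexStr(Ord(Ch), 4),
    ' letter=', TCharacter.IsLetter(Ch),
    ' digit=', TCharacter.IsDigit(Ch),
    ' upper=', TCharacter.IsUpper(Ch),
    ' space=', TCharacter.IsWhiteSpace(Ch));
end;

begin
  Describe('A');
  Describe('7');
  Describe(' ');
  Describe(UnicodeChar($00E9));   // é, a non-ASCII letter
end.
```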

Manual Classification (Works Without SysUtils)

function IsAlpha(Ch: Char): Boolean;
begin
  Result := Ch in ['A'..'Z', 'a'..'z'];
end;

function IsNumeric(Ch: Char): Boolean;
begin
  Result := Ch in ['0'..'9'];
end;

function IsAlphaNumeric(Ch: Char): Boolean;
begin
  Result := Ch in ['A'..'Z', 'a'..'z', '0'..'9'];
end;

function IsHexDigit(Ch: Char): Boolean;
begin
  Result := Ch in ['0'..'9', 'A'..'F', 'a'..'f'];
end;

function IsSpace(Ch: Char): Boolean;
begin
  Result := Ch in [' ', #9, #10, #13];
end;

function IsPrintable(Ch: Char): Boolean;
begin
  Result := (Ord(Ch) >= 32) and (Ord(Ch) <= 126);
end;
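
The same set tests work inline. As a usage sketch, the program below tallies the character classes in a sample string; the string and the variable names are arbitrary.

```pascal
program CountClasses;
{$mode objfpc}{$H+}

var
  S: string;
  Ch: Char;
  Letters, Digits, Spaces: Integer;
begin
  S := 'Room 101, Floor 3';
  Letters := 0;
  Digits := 0;
  Spaces := 0;

  for Ch in S do
    if Ch in ['A'..'Z', 'a'..'z'] then
      Inc(Letters)
    else if Ch in ['0'..'9'] then
      Inc(Digits)
    else if Ch in [' ', #9, #10, #13] then
      Inc(Spaces);

  // prints: 9 letters, 4 digits, 3 spaces
  WriteLn(Letters, ' letters, ', Digits, ' digits, ', Spaces, ' spaces');
end.
```

These tests are ASCII-only: a byte that is part of a multi-byte UTF-8 character falls into none of the three classes.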

Conversion Functions

Function	Description	Example
`UpCase(Ch)`	Uppercase a Char (ASCII only)	`UpCase('a')` = `'A'`
`LowerCase(Ch)`	Lowercase a Char (ASCII only; in FPC's System unit)	`LowerCase('A')` = `'a'`
`UpperCase(S)`	Uppercase a String (SysUtils; ASCII only)	`UpperCase('hello')` = `'HELLO'`
`LowerCase(S)`	Lowercase a String (SysUtils; ASCII only)	`LowerCase('HELLO')` = `'hello'`
`Ord(Ch)`	Character to integer code	`Ord('A')` = 65
`Chr(N)`	Integer code to character	`Chr(65)` = `'A'`
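
Combining `UpCase`, `Ord`, and `Chr` is enough to convert hexadecimal digits in both directions. The following is a minimal sketch; `HexValue` and `HexChar` are hypothetical helpers.

```pascal
program HexDigits;
{$mode objfpc}{$H+}

// Returns 0..15 for a hex digit, or -1 for any other character.
function HexValue(Ch: Char): Integer;
begin
  Ch := UpCase(Ch);   // fold 'a'..'f' onto 'A'..'F'
  if Ch in ['0'..'9'] then
    Result := Ord(Ch) - Ord('0')
  else if Ch in ['A'..'F'] then
    Result := Ord(Ch) - Ord('A') + 10
  else
    Result := -1;
end;

// Inverse direction: 0..15 to an uppercase hex digit. Assumes N is in range.
function HexChar(N: Integer): Char;
begin
  if N < 10 then
    Result := Chr(Ord('0') + N)
  else
    Result := Chr(Ord('A') + N - 10);
end;

begin
  WriteLn(HexValue('f'));   // 15
  WriteLn(HexValue('7'));   // 7
  WriteLn(HexValue('g'));   // -1
  WriteLn(HexChar(11));     // B
end.
```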

Locale-Aware Conversions

For non-ASCII characters, use the UTF8UpperCase and UTF8LowerCase functions from the LazUTF8 unit:

uses
  LazUTF8;

var
  S: String;
begin
  S := UTF8UpperCase('café');     { 'CAFÉ': handles the accented é correctly }
  S := UTF8LowerCase('STRASSE');   { 'strasse' — handles German characters }
end;

C.5 Common Character Constants in Pascal

For readability, you can define named constants for frequently used control characters:

const
  NUL   = #0;
  TAB   = #9;
  LF    = #10;
  CR    = #13;
  SPACE = #32;
  ESC   = #27;
  DEL   = #127;

  { Line endings }
  CRLF  = #13#10;      { Windows line ending }
  {$IFDEF UNIX}
  LineEnd = #10;
  {$ELSE}
  LineEnd = #13#10;
  {$ENDIF}

Free Pascal provides the LineEnding constant in the System unit, which automatically resolves to the correct line-ending sequence for the target platform. Prefer LineEnding over hardcoded values for portable code.
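
A short usage sketch of `LineEnding` follows. The variable names are arbitrary, and the reported length depends on the platform the program is compiled for.

```pascal
program LineEndingDemo;
{$mode objfpc}{$H+}

var
  Report, Eol: string;
begin
  // Build multi-line text without hardcoding #10 or #13#10
  Report := 'First line' + LineEnding + 'Second line' + LineEnding;
  Write(Report);

  Eol := LineEnding;
  WriteLn('LineEnding is ', Length(Eol), ' byte(s) on this platform');
end.
```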

This appendix serves as a reference whenever you need to look up a character code, understand encoding behavior, or classify characters in your Pascal programs.