In This Chapter
- 10.1 Strings in Pascal: A Brief History
- 10.2 String Declarations and Basics
- 10.3 Standard String Functions
- 10.4 String Processing Patterns
- 10.5 String Conversion
- 10.6 Character Operations
- 10.7 Building a Text Parser for Crypts of Pascalia
- 10.8 Regular-Expression-Style Matching
- 10.9 Project Checkpoint: PennyWise CSV Import
- 10.10 Chapter Summary
Chapter 10: Strings — Text Processing in Pascal
"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information and it is the most common of all data types." — Alan J. Perlis, Epigrams on Programming (1982)
Every interesting program eventually needs to work with text. Whether you are parsing a command that a player types into a game, reading a line of comma-separated values from a file, or formatting a number for display on screen, you are working with strings. In Chapter 9, we explored arrays — ordered collections of elements all sharing a single type. A string, at its heart, is an array of characters, but it carries so much extra behavior and historical baggage that it deserves — and gets — an entire chapter of its own.
By the end of this chapter you will be able to:
- Distinguish between ShortString, AnsiString, and other string types in Free Pascal.
- Use standard string functions (Length, Copy, Pos, Insert, Delete, Concat, Trim, and friends).
- Parse and process text data: splitting, searching, and replacing.
- Implement basic string algorithms (reverse, palindrome check, word counting).
- Handle string-to-number and number-to-string conversions cleanly.
We will build toward two concrete milestones: a command parser for our ongoing Crypts of Pascalia text adventure, and a CSV import procedure for the PennyWise personal-finance application. Both of these require you to treat text not as something to display, but as something to dissect.
10.1 Strings in Pascal: A Brief History
Pascal was born in 1970. At that time, most languages treated strings as fixed-length arrays of characters, and Niklaus Wirth's original Pascal was no exception — the language did not even include a built-in string type in its earliest definition. Programmers had to declare something like packed array[1..80] of Char and manage lengths manually.
Borland Turbo Pascal (1983) changed everything for practical Pascal programmers by introducing a length-prefixed string type, known today as ShortString. In its default form it occupies a fixed 256-byte block: the first byte holds the current length (0–255), and bytes 1–255 hold the characters. This was fast, simple, and sufficient for the era of 80-column terminals.
As programs grew, 255 characters stopped being enough. Delphi introduced AnsiString — a reference-counted, heap-allocated, dynamically sized string that could hold up to 2 GB of text. Free Pascal adopted this model and added UnicodeString for full UTF-16 support.
Here is a summary of the string types you will encounter in Free Pascal 3.2+:
| Type | Max Length | Storage | Encoding | Index Base |
|---|---|---|---|---|
| ShortString | 255 chars | Stack (256 bytes) | System codepage | 1 |
| String[N] | N chars (1–255) | Stack (N+1 bytes) | System codepage | 1 |
| AnsiString | ~2 GB | Heap, ref-counted | System codepage | 1 |
| UnicodeString | ~1 GB chars | Heap, ref-counted | UTF-16 | 1 |
| WideString | ~1 GB chars | Heap, COM-compatible | UTF-16 | 1 |
| PChar | Up to the first #0 (null-terminated) | Pointer | System codepage | 0 |
💡 Practical Guidance. For nearly all work in this textbook, the plain String type is all you need. In Free Pascal's default mode (and in {$mode objfpc}) it maps to ShortString, but under {$H+} (which {$mode delphi} switches on automatically) it maps to AnsiString. We will use {$H+} in most examples so that String means AnsiString, giving us dynamic length without worrying about the 255-character limit.
Why does this history matter? Because you will encounter old code, old tutorials, and old habits. Understanding that ShortString stores its length in byte zero (accessible as Ord(s[0])) while AnsiString stores metadata in a hidden header before the first character explains many of the quirks you will see in Pascal string code. Theme 2 — discipline transfers — applies here: the mental model of "a string is a length-prefixed array of characters" transfers from Pascal to virtually every other language, even if the implementation details differ.
How Other Languages Handle Strings
It is worth pausing to see how Pascal's approach compares to other languages, because this comparison deepens your understanding of why Pascal strings work the way they do:
- C uses null-terminated strings: a plain array of characters with a zero byte ('\0') marking the end. Finding the length of a C string therefore requires scanning every character, an O(n) operation. Pascal's length-prefixed approach makes Length an O(1) operation: instant, regardless of string size.
- Java and C# use immutable string objects. Once created, a Java String cannot be modified; any "modification" creates a new string. Pascal's AnsiString is mutable but uses copy-on-write: assignment shares the buffer, and a copy is made only when you modify a string that is currently shared. An unshared string is simply modified in place.
- Python strings are also immutable sequences of Unicode characters. Python's + concatenation creates a new string each time, much like naive Pascal concatenation. Python programmers learn to use ''.join(list) for efficiency; Pascal programmers learn to use SetLength and index assignment.
The mental model you build here — "a string is a sequence of characters with known length, supporting indexing, searching, and extraction" — is universal. The specific functions differ (Copy in Pascal, substring in Java, slicing in Python), but the concepts are identical.
ShortString Internals
A ShortString (which is String[255]) or any String[N] with N between 1 and 255 is laid out in memory as follows:
Byte 0: Current length (0..N)
Byte 1: First character
Byte 2: Second character
...
Byte N: Last possible character
You can read the length either with Length(s) or, in classic Turbo Pascal style, with Ord(s[0]). We strongly recommend Length(s) — it works for every string type, not just ShortString.
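To see this layout for yourself, the short program below is a minimal sketch (the program and type names are ours) that reads the length byte, checks the fixed storage size of a String[20], and uses the old Turbo Pascal trick of writing byte 0 directly to shorten a string. That trick only applies to ShortString; it is meaningless for AnsiString, whose length lives in a hidden header.

```pascal
{$mode objfpc}{$H+}
program ShortStringLayout;

type
  TName = String[20];   // ShortString with capacity 20: 1 length byte + 20 chars

var
  s: TName;
begin
  s := 'Pascal';
  WriteLn(Length(s));       // 6
  WriteLn(Ord(s[0]));       // 6: byte 0 holds the current length
  WriteLn(SizeOf(TName));   // 21: storage is fixed, whatever the current length
  s[0] := Chr(3);           // classic trick: overwrite the length byte directly
  WriteLn(s);               // Pas
end.
```

In new code, prefer SetLength(s, 3) over writing s[0]; it expresses the same intent and also works for AnsiString.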
AnsiString Internals (Conceptual)
An AnsiString variable is a pointer. When the string is empty, the pointer is nil. When the string has content, the pointer refers to the first character of a heap-allocated buffer that is preceded by a hidden header containing:
- A reference count (how many variables point to this buffer).
- The length in characters.
- The codepage identifier.
When you assign one AnsiString to another (t := s), Free Pascal does not copy the characters. It increments the reference count. Only when you modify one of the copies does the compiler generate a copy-on-write — allocating a new buffer and copying the data. This is efficient and invisible to you in everyday coding, but it explains why AnsiString is so much faster than you might expect for passing large strings to procedures.
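Copy-on-write can be observed by comparing the underlying buffer pointers. In this sketch, SameBuffer is a hypothetical helper, and casting an AnsiString to Pointer is used purely to peek at where its characters live; it is a diagnostic trick, not something to rely on in application code.

```pascal
{$mode objfpc}{$H+}
program CopyOnWriteDemo;

{ Hypothetical helper: True when both strings point at the same buffer.
  const parameters are passed without touching the reference count. }
function SameBuffer(const a, b: String): Boolean;
begin
  Result := Pointer(a) = Pointer(b);
end;

var
  s, t: String;
begin
  s := 'Hello';
  t := s;                       // no characters copied: t shares s's buffer
  WriteLn(SameBuffer(s, t));    // TRUE
  t[1] := 'J';                  // first write to t triggers copy-on-write
  WriteLn(SameBuffer(s, t));    // FALSE
  WriteLn(s, ' ', t);           // Hello Jello
end.
```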
⚠️ Warning. Because AnsiString is reference-counted, you should never use pointer arithmetic or Move on its internal buffer unless you truly know what you are doing. Let the compiler manage the memory.
10.2 String Declarations and Basics
Let us begin writing code. Consider the following declarations:
{$mode objfpc}{$H+}
var
greeting : String; // AnsiString (because {$H+})
short : String[40]; // ShortString, max 40 chars
initial : Char; // single character
Assignment and Concatenation
Strings are assigned with := and concatenated with the + operator or the Concat function:
greeting := 'Hello';
greeting := greeting + ', world!'; // Now 'Hello, world!'
greeting := Concat('Hello', ', ', 'world!'); // Same result
Single characters and strings are largely interchangeable on the right side of an assignment. You can concatenate a Char onto a String:
initial := '!';
greeting := 'Hello' + initial; // 'Hello!'
Indexing
Strings in Pascal are 1-indexed. The first character is s[1], the second is s[2], and so on:
greeting := 'Hello';
WriteLn(greeting[1]); // 'H'
WriteLn(greeting[5]); // 'o'
You can also assign to individual characters:
greeting[1] := 'J';
WriteLn(greeting); // 'Jello'
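⚠️ Warning. Indexing outside 1..Length(s) is always a bug. With range checking enabled ({$R+}), Free Pascal raises a run-time error for an out-of-range AnsiString index; for a ShortString it checks only against the declared capacity, so reading past the current length silently returns leftover bytes. With range checking off, you may read garbage or corrupt memory. Always check bounds.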
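One defensive pattern is to route risky indexing through a small accessor. SafeCharAt below is a hypothetical helper, sketched under the assumption that a fallback character is an acceptable answer for an out-of-range index; in other programs, raising an error may be the better choice.

```pascal
{$mode objfpc}{$H+}
program SafeIndexDemo;

{ Hypothetical helper: returns s[i] when i is a valid index,
  otherwise the supplied fallback character. }
function SafeCharAt(const s: String; i: Integer; fallback: Char): Char;
begin
  if (i >= 1) and (i <= Length(s)) then
    Result := s[i]
  else
    Result := fallback;
end;

begin
  WriteLn(SafeCharAt('Hello', 1, '?'));   // H
  WriteLn(SafeCharAt('Hello', 6, '?'));   // ?  (one past the end)
  WriteLn(SafeCharAt('', 1, '?'));        // ?  (empty string)
end.
```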
Comparison
Strings can be compared with the standard relational operators: =, <>, <, >, <=, >=. Comparison is lexicographic — character by character, using the ordinal values of the characters:
if 'apple' < 'banana' then
WriteLn('apple comes first'); // True: 'a' (97) < 'b' (98)
if 'Apple' < 'apple' then
WriteLn('uppercase first'); // True: 'A' (65) < 'a' (97)
This means comparison is case-sensitive by default. If you need case-insensitive comparison, convert both strings to the same case first:
if LowerCase(s1) = LowerCase(s2) then
WriteLn('Equal (case-insensitive)');
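SysUtils also provides comparisons that avoid building lowercase copies. The sketch below uses SameText (a case-insensitive equality test), CompareText (case-insensitive, returning a negative number, zero, or a positive number), and CompareStr (its case-sensitive counterpart).

```pascal
{$mode objfpc}{$H+}
program CaseInsensitiveCompare;

uses
  SysUtils;

begin
  WriteLn(SameText('Pascal', 'PASCAL'));        // TRUE
  WriteLn(CompareText('apple', 'Banana') < 0);  // TRUE: ignoring case, 'a' < 'b'
  WriteLn(CompareStr('apple', 'Banana') < 0);   // FALSE: case-sensitive, 'B' (66) < 'a' (97)
end.
```

The sign convention of CompareText and CompareStr makes them convenient as sort comparisons.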
Empty Strings and Length
An empty string has length zero:
var s: String;
begin
s := '';
WriteLn(Length(s)); // 0
if s = '' then
WriteLn('Empty!');
end.
String Length vs. Capacity
A subtle but important distinction: for ShortString, the capacity (maximum number of characters the variable can hold) is fixed at declaration time, but the length (number of characters currently stored) can vary from 0 to that maximum. For AnsiString, the capacity grows automatically as needed — you never need to worry about it.
var
short: String[10]; { Capacity = 10, Length starts at 0 }
long : String; { AnsiString: capacity grows as needed }
begin
short := 'Hi'; { Length = 2, capacity still 10 }
short := 'Hello World'; { Silently truncated to 'Hello Worl' (10 chars) }
WriteLn(short); { 'Hello Worl': data lost, and no run-time error! }
long := 'Hello World'; { Length = 11, capacity grows to fit }
WriteLn(long); { 'Hello World' — no truncation }
end.
⚠️ Warning. ShortString silently truncates strings that exceed its declared maximum. This is one of the most insidious sources of bugs in Pascal programs that use ShortString, and another reason we prefer AnsiString via {$H+}: it never truncates.
Iterating Through a String
Since strings are indexed sequences, you can process them character by character using a for loop. This pattern is fundamental to nearly everything else in this chapter:
var
s: String;
i: Integer;
vowelCount: Integer;
begin
s := 'Hello, World!';
vowelCount := 0;
for i := 1 to Length(s) do
begin
case UpCase(s[i]) of
'A', 'E', 'I', 'O', 'U':
Inc(vowelCount);
end;
end;
WriteLn('Vowels found: ', vowelCount); { 3 }
end.
Notice the use of UpCase (a function that converts a single Char to uppercase) combined with a case statement. This is cleaner than writing a long chain of if comparisons.
10.3 Standard String Functions
Free Pascal provides a rich set of built-in string routines. Let us walk through the ones you will use most often. We will build a quick-reference table first, then explore each with examples.
| Function | Signature (simplified) | Purpose |
|---|---|---|
| Length | Length(s): Integer | Returns the number of characters in s |
| Copy | Copy(s, start, count): String | Extracts a substring |
| Pos | Pos(substr, s): Integer | Finds first occurrence of substr in s (0 if not found) |
| Insert | Insert(source, var target, index) | Inserts source into target at position index |
| Delete | Delete(var s, start, count) | Removes count characters from s starting at start |
| Concat | Concat(s1, s2, ...): String | Concatenates multiple strings |
| UpperCase | UpperCase(s): String | Converts to uppercase (in SysUtils) |
| LowerCase | LowerCase(s): String | Converts to lowercase (in SysUtils) |
| Trim | Trim(s): String | Removes leading and trailing whitespace (in SysUtils) |
| TrimLeft | TrimLeft(s): String | Removes leading whitespace (in SysUtils) |
| TrimRight | TrimRight(s): String | Removes trailing whitespace (in SysUtils) |
| StringReplace | StringReplace(s, old, new, flags): String | Replaces occurrences (in SysUtils) |
| StringOfChar | StringOfChar(c, n): String | Creates a string of n copies of character c |
Length
Length returns the number of characters currently stored in the string — not the maximum capacity:
var s: String;
begin
s := 'Free Pascal';
WriteLn(Length(s)); // 11
end.
Copy
Copy(s, start, count) extracts a substring. It returns count characters beginning at position start. If start + count exceeds the length, it returns whatever characters remain without error:
var s, sub: String;
begin
s := 'Hello, world!';
sub := Copy(s, 8, 5); // 'world'
WriteLn(sub);
sub := Copy(s, 8, 100); // 'world!' — no error, just takes what's there
WriteLn(sub);
end.
Copy never modifies the original string. It is a pure function.
Pos
Pos(substr, s) searches for substr within s and returns the position of its first character. If not found, it returns 0:
var idx: Integer;
begin
idx := Pos('world', 'Hello, world!');
WriteLn(idx); // 8
idx := Pos('xyz', 'Hello, world!');
WriteLn(idx); // 0
end.
💡 Tip. Pos finds only the first occurrence. To find all occurrences, you need a loop; we will build one shortly.
Insert and Delete
Insert and Delete are procedures, not functions — they modify the target string variable in place:
var s: String;
begin
s := 'Hello world!';
Insert('beautiful ', s, 7); // s = 'Hello beautiful world!'
WriteLn(s);
Delete(s, 7, 10); // s = 'Hello world!'
WriteLn(s);
end.
Insert(source, target, index) shifts characters to the right and inserts source at position index. Delete(s, start, count) removes count characters starting at start, shifting subsequent characters to the left.
UpperCase, LowerCase, Trim
These are straightforward:
WriteLn(UpperCase('hello')); // 'HELLO'
WriteLn(LowerCase('HELLO')); // 'hello'
WriteLn(Trim(' hello ')); // 'hello'
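Trim removes both leading and trailing whitespace and control characters (any character with an ordinal value of 32 or less, which covers spaces, tabs, carriage returns, and line feeds). TrimLeft and TrimRight handle one side only. All of these routines come from SysUtils, so remember uses SysUtils;.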
StringReplace (from SysUtils)
For find-and-replace operations, StringReplace is invaluable:
uses SysUtils;
var s: String;
begin
s := 'The cat sat on the cat mat';
s := StringReplace(s, 'cat', 'dog', [rfReplaceAll]);
WriteLn(s); // 'The dog sat on the dog mat'
end.
The flags set can include:
- rfReplaceAll — replace all occurrences (default replaces only the first).
- rfIgnoreCase — case-insensitive matching.
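The two flags can be combined. This short sketch contrasts an empty flag set, which replaces only the first exact-case match, with [rfReplaceAll, rfIgnoreCase]:

```pascal
{$mode objfpc}{$H+}
program ReplaceFlagsDemo;

uses
  SysUtils;

var
  s: String;
begin
  s := 'Cat, cat, CAT';
  // No flags: only the first exact-case match ('cat') is replaced
  WriteLn(StringReplace(s, 'cat', 'dog', []));                           // Cat, dog, CAT
  // Both flags: every match, regardless of case
  WriteLn(StringReplace(s, 'cat', 'dog', [rfReplaceAll, rfIgnoreCase])); // dog, dog, dog
end.
```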
Combining Functions: Finding All Occurrences
Here is a pattern you will use often — finding every occurrence of a substring:
procedure FindAllOccurrences(const haystack, needle: String);
var
startPos, found: Integer;
begin
startPos := 1;
repeat
found := Pos(needle, Copy(haystack, startPos, Length(haystack)));
if found > 0 then
begin
WriteLn('Found at position ', startPos + found - 1);
startPos := startPos + found; // Move past this occurrence
end;
until found = 0;
end;
This combines Pos and Copy to scan through the entire string. Notice the arithmetic: Pos returns a position relative to the copy, so we add startPos - 1 to get the position in the original string.
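Copying the tail of the string on every iteration is wasteful for long text. One alternative, sketched below, uses PosEx from the StrUtils unit, which takes a starting offset so no copies are needed. FindAllPositions and TPositions are our own illustrative names; like the procedure above, this version reports overlapping matches.

```pascal
{$mode objfpc}{$H+}
program FindAllDemo;

uses
  StrUtils;

type
  TPositions = array of Integer;

{ Returns the 1-based start position of every occurrence of needle
  in haystack, including overlapping ones. }
function FindAllPositions(const haystack, needle: String): TPositions;
var
  found, n: Integer;
begin
  Result := nil;
  n := 0;
  if needle = '' then Exit;
  found := PosEx(needle, haystack, 1);
  while found > 0 do
  begin
    SetLength(Result, n + 1);
    Result[n] := found;
    Inc(n);
    found := PosEx(needle, haystack, found + 1);  // resume one past this match
  end;
end;

var
  positions: TPositions;
  i: Integer;
begin
  positions := FindAllPositions('banana', 'ana');
  for i := 0 to High(positions) do
    WriteLn('Found at position ', positions[i]);  // 2, then 4 (overlapping)
end.
```

Growing the array by one element per match keeps the sketch simple; for many matches, doubling the capacity would reduce reallocations.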
Common Mistakes with String Functions
Before moving on, let us address three mistakes that trip up nearly every beginner:
Mistake 1: Confusing parameter order in Pos. The signature is Pos(needle, haystack) — the substring you are searching for comes first, the string you are searching in comes second. Many students write Pos(s, 'hello') when they mean Pos('hello', s). If you come from other languages where the order is reversed (like Python's s.find('hello')), this will catch you.
Mistake 2: Forgetting that Insert and Delete are procedures. Writing result := Insert('x', s, 3) will not compile — Insert does not return a value. It modifies s directly. If you need the result without modifying the original, make a copy first:
temp := s;
Insert('x', temp, 3);
{ Now temp is modified, s is unchanged }
Mistake 3: Off-by-one errors with Copy after Pos. When you find a delimiter at position p and want to extract everything after it, the start position for Copy is p + Length(delimiter), not p + 1 (unless the delimiter is a single character). This mistake is especially common when the delimiter is a multi-character string like ', ' or ': '.
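A small helper makes the arithmetic explicit. ValueAfter is a hypothetical name used only for illustration; the key line is the Copy start position, p + Length(delim).

```pascal
{$mode objfpc}{$H+}
program ValueAfterDemo;

{ Hypothetical helper: returns the text after the first occurrence
  of delim in s, or '' if delim does not occur. }
function ValueAfter(const s, delim: String): String;
var
  p: Integer;
begin
  p := Pos(delim, s);
  if p = 0 then
    Exit('');
  // Skip the whole delimiter, not just one character
  Result := Copy(s, p + Length(delim), Length(s));
end;

begin
  WriteLn(ValueAfter('Name: Alice', ': '));  // Alice
  WriteLn(ValueAfter('Name: Alice', ':'));   // prints " Alice" (leading space kept)
end.
```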
10.4 String Processing Patterns
With the fundamental functions in hand, we can tackle real-world string processing tasks. In this section, we build three essential patterns: splitting a string by a delimiter, extracting words, and parsing structured text.
Pattern 1: Splitting a String by a Delimiter
One of the most common tasks in programming is splitting a string into parts at some delimiter. Classic Pascal has no built-in Split function (Free Pascal 3.2's SysUtils does offer a Split method through its type helper for strings, but whether you can use it depends on your mode settings), so we will build one:
procedure SplitString(const s, delimiter: String;
var parts: array of String;
var count: Integer);
var
remaining: String;
delimPos: Integer;
begin
count := 0;
remaining := s;
repeat
delimPos := Pos(delimiter, remaining);
if delimPos > 0 then
begin
if count <= High(parts) then
begin
parts[count] := Copy(remaining, 1, delimPos - 1);
Inc(count);
end;
remaining := Copy(remaining, delimPos + Length(delimiter),
Length(remaining));
end
else
begin
{ Last segment (or only segment if no delimiter found) }
if count <= High(parts) then
begin
parts[count] := remaining;
Inc(count);
end;
end;
until delimPos = 0;
end;
This procedure takes a string s, a delimiter, an open array parts to fill in, and a count to report how many parts were found. It repeatedly finds the delimiter, extracts the text before it, and advances through the string.
Example usage:
var
fields: array[0..9] of String;
n, i: Integer;
begin
SplitString('apple,banana,cherry', ',', fields, n);
for i := 0 to n - 1 do
WriteLn(fields[i]);
// Output:
// apple
// banana
// cherry
end.
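Because SplitString fills a fixed-size array, fields beyond its capacity are silently dropped. A minimal sketch of an alternative, assuming a dynamic array is acceptable to the caller, grows the result as it goes. SplitToArray and TStrArray are hypothetical names; consecutive delimiters produce empty fields, just as in SplitString.

```pascal
{$mode objfpc}{$H+}
program SplitDynamicDemo;

type
  TStrArray = array of String;

{ Splits s at every occurrence of delimiter, growing the result
  as needed instead of filling a fixed-size array. }
function SplitToArray(const s, delimiter: String): TStrArray;
var
  remaining: String;
  delimPos, n: Integer;
begin
  Result := nil;
  n := 0;
  remaining := s;
  repeat
    if delimiter = '' then
      delimPos := 0             // an empty delimiter never splits
    else
      delimPos := Pos(delimiter, remaining);
    SetLength(Result, n + 1);
    if delimPos > 0 then
    begin
      Result[n] := Copy(remaining, 1, delimPos - 1);
      Delete(remaining, 1, delimPos + Length(delimiter) - 1);
    end
    else
      Result[n] := remaining;   // last (or only) segment
    Inc(n);
  until delimPos = 0;
end;

var
  parts: TStrArray;
  i: Integer;
begin
  parts := SplitToArray('apple,,cherry', ',');
  for i := 0 to High(parts) do
    WriteLn('[', parts[i], ']');   // [apple], then [], then [cherry]
end.
```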
📊 Theme 5 (Algorithms + Data Structures = Programs). This split procedure is a clear example of Theme 5 in action. The algorithm (repeated search-and-extract) operates on the data structure (a string, which is itself an array of characters) to produce a new data structure (an array of substrings). Neither the algorithm nor the data structure alone solves the problem — it is their combination that makes the program work.
Pattern 2: Word Extraction
Extracting individual words from text is slightly different from splitting by a single delimiter, because words can be separated by any amount of whitespace — spaces, tabs, or multiple spaces in a row:
function ExtractWord(const s: String; wordIndex: Integer): String;
{ Returns the wordIndex-th word (1-based) of s, or '' if there is none }
const
  Whitespace = [' ', #9, #10, #13];   { separators: space, tab, LF, CR }
var
  i, currentWord: Integer;
  inWord: Boolean;
begin
  Result := '';
  currentWord := 0;
  inWord := False;
  for i := 1 to Length(s) do
  begin
    if not (s[i] in Whitespace) then
    begin
      if not inWord then
      begin
        inWord := True;       { transition: whitespace -> word }
        Inc(currentWord);
      end;
      if currentWord = wordIndex then
        Result := Result + s[i];
    end
    else
    begin
      if inWord and (currentWord = wordIndex) then
        Exit;                 { We have the complete word }
      inWord := False;        { transition: word -> whitespace }
    end;
  end;
end;
This function uses a state machine with a Boolean flag inWord that tracks whether we are currently inside a word or in whitespace. This is a fundamental pattern in text parsing — you will see it again and again.
Pattern 3: Counting Words
Building on the state machine approach:
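function CountWords(const s: String): Integer;
const
  Whitespace = [' ', #9, #10, #13];   { separators: space, tab, LF, CR }
var
  i: Integer;
  inWord: Boolean;
begin
  Result := 0;
  inWord := False;
  for i := 1 to Length(s) do
  begin
    if not (s[i] in Whitespace) then
    begin
      if not inWord then
      begin
        inWord := True;       { entering a new word }
        Inc(Result);
      end;
    end
    else
      inWord := False;        { leaving a word, or still between words }
  end;
end;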
Pattern 4: String Reversal
Reversing a string is a classic exercise that reinforces indexing:
function ReverseString(const s: String): String;
var
i: Integer;
begin
Result := '';
for i := Length(s) downto 1 do
Result := Result + s[i];
end;
⚠️ Performance Note. Concatenating one character at a time in a loop (Result := Result + s[i]) is simple but inefficient for very long strings, because each concatenation may allocate a new buffer. For production code on large strings, pre-allocate with SetLength(Result, Length(s)) and assign characters by index. For the string sizes we work with in this textbook (up to a few thousand characters), the simple approach is fine.
Pattern 5: Palindrome Check
A palindrome reads the same forwards and backwards. We can check this efficiently without creating a reversed copy:
function IsPalindrome(const s: String): Boolean;
var
left, right: Integer;
begin
left := 1;
right := Length(s);
while left < right do
begin
if LowerCase(s[left]) <> LowerCase(s[right]) then
Exit(False);
Inc(left);
Dec(right);
end;
Result := True;
end;
This uses a two-pointer technique — left starts at the beginning, right at the end, and they walk toward each other comparing characters. The function exits as soon as a mismatch is found.
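Real-world palindromes such as 'A man, a plan, a canal: Panama' contain spaces and punctuation. A hedged extension of the two-pointer idea, assuming only ASCII letters should count, skips everything else; IsLetterChar and IsPalindromeLettersOnly are illustrative names.

```pascal
{$mode objfpc}{$H+}
program PalindromeDemo;

function IsLetterChar(c: Char): Boolean;
begin
  Result := c in ['A'..'Z', 'a'..'z'];
end;

{ Two-pointer palindrome check that skips anything that is not
  an ASCII letter and ignores case. }
function IsPalindromeLettersOnly(const s: String): Boolean;
var
  left, right: Integer;
begin
  left := 1;
  right := Length(s);
  while left < right do
  begin
    if not IsLetterChar(s[left]) then
      Inc(left)                      // skip punctuation/space on the left
    else if not IsLetterChar(s[right]) then
      Dec(right)                     // skip punctuation/space on the right
    else
    begin
      if UpCase(s[left]) <> UpCase(s[right]) then
        Exit(False);
      Inc(left);
      Dec(right);
    end;
  end;
  Result := True;
end;

begin
  WriteLn(IsPalindromeLettersOnly('A man, a plan, a canal: Panama'));  // TRUE
  WriteLn(IsPalindromeLettersOnly('Hello'));                           // FALSE
end.
```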
Pattern 6: Efficient String Reversal with SetLength
We mentioned earlier that concatenating one character at a time is inefficient. Here is the production-quality version of string reversal that pre-allocates:
function ReverseStringFast(const s: String): String;
var
i, len: Integer;
begin
len := Length(s);
SetLength(Result, len);
for i := 1 to len do
Result[len - i + 1] := s[i];
end;
SetLength(Result, len) allocates a string of exactly len characters in one operation. We then fill in the characters by index, which involves no further allocation, just memory writes. For a string of 10,000 characters, the naive version may perform up to 10,000 reallocations (each of which may copy everything built so far), while this version performs one; the gap widens quickly as strings grow.
💡 When does performance matter? For strings under a few hundred characters (the vast majority of strings in typical programs), the naive approach is fine; the overhead is negligible. For processing large files or running string operations in tight loops, the SetLength-and-index approach is essential. Learn both: the simple version for clarity, the efficient version for when you need speed.
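The two-pointer walk from IsPalindrome can also reverse a string in place by swapping characters, with no second buffer at all. ReverseInPlace and Reversed are hypothetical names for this sketch; Reversed relies on copy-on-write, so reversing the copy never disturbs the caller's string.

```pascal
{$mode objfpc}{$H+}
program ReverseInPlaceDemo;

{ Reverses s in place by swapping characters from both ends
  toward the middle. }
procedure ReverseInPlace(var s: String);
var
  left, right: Integer;
  tmp: Char;
begin
  left := 1;
  right := Length(s);
  while left < right do
  begin
    tmp := s[left];
    s[left] := s[right];
    s[right] := tmp;
    Inc(left);
    Dec(right);
  end;
end;

{ Convenience wrapper: reverses a copy, leaving the argument untouched. }
function Reversed(const s: String): String;
begin
  Result := s;              // shares the buffer until the first write
  ReverseInPlace(Result);
end;

var
  phrase: String;
begin
  phrase := 'stressed';
  ReverseInPlace(phrase);
  WriteLn(phrase);          // desserts
end.
```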
Putting the Patterns Together: A Text Statistics Function
Let us combine several patterns into a practical function that computes basic statistics about a block of text:
procedure TextStatistics(const text: String);
var
  i: Integer;
  charCount, wordCount, sentenceCount, lineCount: Integer;
  inWord: Boolean;
begin
  charCount := Length(text);
  wordCount := 0;
  sentenceCount := 0;
  if text = '' then
    lineCount := 0
  else
    lineCount := 1;             { Non-empty text has at least one line }
  inWord := False;
  for i := 1 to Length(text) do
  begin
    { Count words using the state machine }
    if not (text[i] in [' ', #9, #10, #13]) then
    begin
      if not inWord then
      begin
        inWord := True;
        Inc(wordCount);
      end;
    end
    else
      inWord := False;
    { Count sentences (rough: period, exclamation mark, question mark) }
    if text[i] in ['.', '!', '?'] then
      Inc(sentenceCount);
    { Count lines: each line feed starts a new line }
    if text[i] = #10 then
      Inc(lineCount);
  end;
  WriteLn('Characters: ', charCount);
  WriteLn('Words:      ', wordCount);
  WriteLn('Sentences:  ', sentenceCount);
  WriteLn('Lines:      ', lineCount);
  if sentenceCount > 0 then
    WriteLn('Avg words/sentence: ', wordCount / sentenceCount:0:1);
end;
This function processes the text in a single pass through the string — it visits each character exactly once. This is efficient and demonstrates how the state machine pattern scales to handle multiple counting tasks simultaneously.
10.5 String Conversion
Programs constantly need to convert between strings and numbers. A financial application reads '42.99' from a CSV file and needs the number 42.99 for arithmetic. A game displays 'Score: 1500' by converting the integer 1500 to text. Pascal provides several mechanisms for these conversions.
Str and Val (Classic Pascal)
The classic approach uses Str and Val:
var
s: String;
n: Integer;
code: Integer;
begin
{ Number to string }
Str(42, s);
WriteLn(s); // '42'
Str(3.14159:8:4, s); // Formatted: 8 total width, 4 decimal places
WriteLn(s); // '  3.1416' (padded with two spaces to width 8)
{ String to number }
Val('123', n, code);
if code = 0 then
WriteLn('Parsed: ', n) // 123
else
WriteLn('Error at position ', code);
Val('12x3', n, code);
WriteLn('Error at position ', code); // 3 (where 'x' was found)
end.
Val(s, variable, code) is particularly useful because it reports where the conversion failed via the code parameter. A code of 0 means success; any other value is the position of the first invalid character.
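Val's error position is easy to wrap in a reusable function. The sketch below is one possible shape, with TryParseInt and IntErrorPos as hypothetical helper names:

```pascal
{$mode objfpc}{$H+}
program ValReportDemo;

{ Hypothetical helper: tries to parse s as an integer. On success it
  returns True and stores the number in value; on failure it returns
  False and errPos holds the position of the first bad character. }
function TryParseInt(const s: String; out value, errPos: Integer): Boolean;
begin
  Val(s, value, errPos);
  Result := errPos = 0;
end;

{ Hypothetical helper: position of the first bad character, or 0 if s parses. }
function IntErrorPos(const s: String): Integer;
var
  value: Integer;
begin
  TryParseInt(s, value, Result);
end;

var
  n, errPos: Integer;
begin
  if TryParseInt('2024', n, errPos) then
    WriteLn('Parsed ', n);                   // Parsed 2024
  if not TryParseInt('20x4', n, errPos) then
    WriteLn('Bad character at ', errPos);    // Bad character at 3
end.
```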
SysUtils Conversion Functions
The SysUtils unit provides more convenient (though less informative on error) functions:
uses SysUtils;
var
s: String;
n: Integer;
x: Double;
begin
{ Integer conversions }
s := IntToStr(42); // '42'
n := StrToInt('42'); // 42
n := StrToIntDef('abc', 0); // 0 (default on failure)
{ Float conversions }
s := FloatToStr(3.14); // '3.14'
x := StrToFloat('2.718'); // 2.718
x := StrToFloatDef('bad', 0.0); // 0.0
{ Formatted output }
s := Format('Name: %s, Age: %d, GPA: %.2f',
['Alice', 20, 3.85]);
WriteLn(s); // 'Name: Alice, Age: 20, GPA: 3.85'
end.
💡 Which should you use? For simple conversions where you trust the input (or want an exception on bad data), use StrToInt and StrToFloat. For parsing user input or file data where bad values are possible, use Val (for precise error location) or StrToIntDef/StrToFloatDef (for default-on-failure behavior). In the PennyWise project, where we parse CSV files that might contain malformed data, we will use Val so we can report exactly where parsing failed.
The Format Function
Format deserves special attention. It works like printf in C, using format specifiers:
| Specifier | Type | Example |
|---|---|---|
| %s | String | Format('%s', ['hello']) → 'hello' |
| %d | Integer | Format('%d', [42]) → '42' |
| %f | Float | Format('%.2f', [3.14]) → '3.14' |
| %e | Scientific | Format('%.4e', [1234.0]) → '1.234E+003' |
| %x | Hexadecimal | Format('%x', [255]) → 'FF' |
| %% | Literal % | Format('100%%', []) → '100%' |
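Width and precision are specified between % and the letter: %10d means at least 10 characters wide (right-aligned), %-10s means left-aligned in 10 characters, and %8.2f means at least 8 characters in total with 2 digits after the decimal point. The width is a minimum: a longer value is printed in full, never truncated.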
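Brackets around the output make padding visible. In this sketch, the comments show the expected text, assuming a period is the decimal separator for the %f line:

```pascal
{$mode objfpc}{$H+}
program FormatWidthDemo;

uses
  SysUtils;

begin
  WriteLn(Format('[%5d]', [42]));          // [   42]  right-aligned in 5
  WriteLn(Format('[%-5d]', [42]));         // [42   ]  left-aligned in 5
  WriteLn(Format('[%-8s]', ['Pascal']));   // [Pascal  ]
  WriteLn(Format('[%8.2f]', [3.14159]));   // [    3.14]
  WriteLn(Format('[%2d]', [12345]));       // [12345]  width is only a minimum
end.
```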
Practical Example: Formatted Receipt
Let us put Format to work by generating a formatted receipt — a task that combines string conversion, alignment, and string construction:
procedure PrintReceipt;
{ Requires uses SysUtils for Format }
const
  items: array[0..2] of String = ('Widget', 'Gadget', 'Sprocket');
  prices: array[0..2] of Real = (4.99, 19.95, 2.50);
  quantities: array[0..2] of Integer = (3, 1, 10);
var
  i: Integer;
  lineTotal, grandTotal: Real;
begin
WriteLn(Format('%-20s %5s %8s %10s', ['Item', 'Qty', 'Price', 'Total']));
WriteLn(StringOfChar('=', 46));
grandTotal := 0;
for i := 0 to 2 do
begin
lineTotal := prices[i] * quantities[i];
grandTotal := grandTotal + lineTotal;
WriteLn(Format('%-20s %5d %8.2f %10.2f',
[items[i], quantities[i], prices[i], lineTotal]));
end;
WriteLn(StringOfChar('-', 46));
WriteLn(Format('%35s %10.2f', ['TOTAL:', grandTotal]));
end;
Output:
Item                   Qty    Price      Total
==============================================
Widget                   3     4.99      14.97
Gadget                   1    19.95      19.95
Sprocket                10     2.50      25.00
----------------------------------------------
                             TOTAL:      59.92
This kind of formatted output is essential for the PennyWise project — financial data must be neatly aligned and properly formatted for readability.
A Note on Locale and Decimal Separators
When using StrToFloat and FloatToStr, be aware that the decimal separator depends on the system locale. In the United States, it is a period (3.14); in France and Germany, it is a comma (3,14). If your program reads data files that always use a period, you may need to set DefaultFormatSettings.DecimalSeparator := '.' at the beginning of your program to avoid conversion errors on European systems. This is a common source of bugs in programs that process international data.
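An alternative to changing the global DefaultFormatSettings is to pass a private TFormatSettings record to the conversion function, so other code keeps using the system locale. StrToFloatDot is a hypothetical helper in this sketch; it relies on the TryStrToFloat overload that accepts a TFormatSettings argument.

```pascal
{$mode objfpc}{$H+}
program LocaleSafeParse;

uses
  SysUtils;

{ Hypothetical helper: parses s with '.' as the decimal separator,
  regardless of the system locale, returning fallback on failure.
  The global DefaultFormatSettings is left untouched. }
function StrToFloatDot(const s: String; fallback: Double): Double;
var
  fs: TFormatSettings;
begin
  fs := DefaultFormatSettings;   // start from a copy of the current settings
  fs.DecimalSeparator := '.';    // override only what we need
  fs.ThousandSeparator := ',';   // avoid a clash if the locale used '.' here
  if not TryStrToFloat(s, Result, fs) then
    Result := fallback;
end;

begin
  WriteLn(StrToFloatDot('42.99', 0):0:2);   // 42.99
  WriteLn(StrToFloatDot('abc', -1):0:2);    // -1.00
end.
```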
10.6 Character Operations
Since strings are sequences of characters, understanding the Char type is essential. A Char in Free Pascal is a single byte representing one character in the system codepage (typically ASCII-compatible for values 0–127).
Ord and Chr
Ord converts a character to its numeric value (ordinal). Chr converts a number back to a character:
WriteLn(Ord('A')); // 65
WriteLn(Ord('a')); // 97
WriteLn(Ord('0')); // 48
WriteLn(Chr(65)); // 'A'
WriteLn(Chr(10)); // Line feed (newline)
Character Classification
You can classify characters with simple comparisons or with set membership (the in operator):
function IsLetter(c: Char): Boolean;
begin
  Result := ((c >= 'A') and (c <= 'Z')) or
            ((c >= 'a') and (c <= 'z'));
end;
function IsDigit(c: Char): Boolean;
begin
Result := (c >= '0') and (c <= '9');
end;
function IsWhitespace(c: Char): Boolean;
begin
Result := c in [' ', #9, #10, #13]; // space, tab, LF, CR
end;
Or use the Character unit in modern Free Pascal for Unicode-aware classification.
Character Arithmetic
Because characters are ordinal values, you can perform arithmetic on them:
var c: Char;
begin
c := 'A';
c := Chr(Ord(c) + 1); // 'B'
WriteLn(c);
{ Convert digit character to integer }
WriteLn(Ord('7') - Ord('0')); // 7
{ Convert uppercase to lowercase }
c := 'G';
c := Chr(Ord(c) + 32); // 'g' (only works for A-Z!)
WriteLn(c);
end.
This character arithmetic is the foundation for the Caesar cipher we will build in Case Study 1. It also demonstrates Theme 2 — the concept that characters are just numbers in a particular encoding transfers to every programming language.
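As a preview of the idea (not necessarily how Case Study 1 structures it), here is a sketch of shifting a single letter with wrap-around. ShiftLetter is a hypothetical name; the double mod keeps the result inside the alphabet even for negative shifts, because Pascal's mod takes the sign of its left operand.

```pascal
{$mode objfpc}{$H+}
program ShiftLetterDemo;

{ Hypothetical helper: shifts an ASCII letter by n places, wrapping
  around the alphabet and preserving case; other characters are
  returned unchanged. }
function ShiftLetter(c: Char; n: Integer): Char;
var
  base: Integer;
begin
  if c in ['A'..'Z'] then
    base := Ord('A')
  else if c in ['a'..'z'] then
    base := Ord('a')
  else
    Exit(c);
  // ((offset + n) mod 26 + 26) mod 26 is always in 0..25
  Result := Chr(base + ((Ord(c) - base + n) mod 26 + 26) mod 26);
end;

begin
  WriteLn(ShiftLetter('A', 1));    // B
  WriteLn(ShiftLetter('z', 1));    // a  (wraps around)
  WriteLn(ShiftLetter('c', -3));   // z
  WriteLn(ShiftLetter('!', 5));    // !  (not a letter)
end.
```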
Building Character-Level Functions
Here is a function that strips all non-alphabetic characters from a string, which is useful for preparing text for analysis:
function StripNonAlpha(const s: String): String;
var
i: Integer;
begin
Result := '';
for i := 1 to Length(s) do
if ((s[i] >= 'A') and (s[i] <= 'Z')) or
((s[i] >= 'a') and (s[i] <= 'z')) then
Result := Result + s[i];
end;
And a function that counts occurrences of a specific character:
function CountChar(const s: String; c: Char): Integer;
var
i: Integer;
begin
Result := 0;
for i := 1 to Length(s) do
if s[i] = c then
Inc(Result);
end;
The ASCII Table: A Mental Map
You do not need to memorize the entire ASCII table, but knowing the structure helps you write character-level code confidently:
| Range | Characters | Notes |
|---|---|---|
| 0–31 | Control characters | #9 = Tab, #10 = LF, #13 = CR |
| 32 | Space | The first printable character |
| 48–57 | '0' through '9' | Digits are contiguous and in order |
| 65–90 | 'A' through 'Z' | Uppercase letters, contiguous and in order |
| 97–122 | 'a' through 'z' | Lowercase letters, contiguous and in order |
The fact that digits, uppercase letters, and lowercase letters each occupy contiguous ranges is what makes character arithmetic work. The gap between 'A' (65) and 'a' (97) is always 32, so converting case is as simple as adding or subtracting 32 — though using UpCase or LowerCase is always more readable and more correct (it handles edge cases you might forget).
Practical Example: Validating Input
Character operations are essential for input validation. Here is a function that checks whether a string is a valid identifier (starts with a letter or underscore, and contains only letters, digits, and underscores):
function IsValidIdentifier(const s: String): Boolean;
var
i: Integer;
begin
Result := False;
if Length(s) = 0 then Exit;
{ First character must be a letter or underscore }
if not ((s[1] >= 'A') and (s[1] <= 'Z') or
(s[1] >= 'a') and (s[1] <= 'z') or
(s[1] = '_')) then
Exit;
{ Remaining characters: letters, digits, or underscores }
for i := 2 to Length(s) do
begin
if not ((s[i] >= 'A') and (s[i] <= 'Z') or
(s[i] >= 'a') and (s[i] <= 'z') or
(s[i] >= '0') and (s[i] <= '9') or
(s[i] = '_')) then
Exit;
end;
Result := True;
end;
This function validates identifier names — useful in the Crypts of Pascalia parser (validating command names), in PennyWise (validating category names), and in any program that accepts structured input from users.
10.7 Building a Text Parser for Crypts of Pascalia
Now we bring everything together in our anchor example. Crypts of Pascalia is a text adventure where the player types commands like:
- go north
- take sword
- look around
- use key on door
- inventory
- examine rusty shield
Our parser needs to:
- Read a line of input.
- Normalize it (trim whitespace, convert to lowercase).
- Extract the verb (first word) and the object (everything after the verb).
- Handle multi-word commands like `use key on door` by identifying the verb, the direct object, and a preposition plus indirect object.
Step 1: Normalization
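function NormalizeInput(const raw: String): String;
{ Requires SysUtils for Trim, LowerCase, and StringReplace }
begin
Result := LowerCase(Trim(raw));
{ Collapse runs of spaces: keep replacing each double space with a
  single space until no double space remains }
while Pos('  ', Result) > 0 do
Result := StringReplace(Result, '  ', ' ', [rfReplaceAll]);
end;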
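Each `StringReplace` call rescans the whole string, and long runs of spaces need several passes. A single-pass alternative copies characters while skipping any space that directly follows another space. This is a sketch; `CollapseSpaces` is a name we made up, not a SysUtils routine.

```pascal
program CollapseDemo;
{$mode objfpc}{$H+}

{ Copy s character by character, dropping any space whose
  predecessor is also a space. One pass over the input. }
function CollapseSpaces(const s: String): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    if not ((s[i] = ' ') and (i > 1) and (s[i - 1] = ' ')) then
      Result := Result + s[i];
end;

begin
  WriteLn('[', CollapseSpaces('use   key  on door'), ']');  // [use key on door]
end.
```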
Step 2: Command Record
We define a record to hold the parsed command:
type
TCommand = record
Verb : String;
DirectObj : String;
Preposition: String;
IndirectObj: String;
IsValid : Boolean;
end;
Step 3: The Parser
function ParseCommand(const raw: String): TCommand;
var
normalized: String;
spacePos, prepPos: Integer;
rest: String;
const
Prepositions: array[0..4] of String =
('on', 'with', 'at', 'to', 'from');
var
i: Integer;
begin
Result.Verb := '';
Result.DirectObj := '';
Result.Preposition := '';
Result.IndirectObj := '';
Result.IsValid := False;
normalized := NormalizeInput(raw);
if normalized = '' then Exit;
{ Extract verb (first word) }
spacePos := Pos(' ', normalized);
if spacePos = 0 then
begin
{ Single-word command like "inventory" or "quit" }
Result.Verb := normalized;
Result.IsValid := True;
Exit;
end;
Result.Verb := Copy(normalized, 1, spacePos - 1);
rest := Copy(normalized, spacePos + 1, Length(normalized));
{ Look for a preposition in the rest of the command }
for i := Low(Prepositions) to High(Prepositions) do
begin
prepPos := Pos(' ' + Prepositions[i] + ' ', ' ' + rest + ' ');
if prepPos > 0 then
begin
Result.DirectObj := Trim(Copy(rest, 1, prepPos - 1));
Result.Preposition := Prepositions[i];
Result.IndirectObj := Trim(Copy(rest,
prepPos + Length(Prepositions[i]) + 1, Length(rest)));
Result.IsValid := True;
Exit;
end;
end;
{ No preposition found — everything after verb is the direct object }
Result.DirectObj := rest;
Result.IsValid := True;
end;
Step 4: Dispatching Commands
With parsed commands in hand, we can dispatch to game logic:
procedure ExecuteCommand(const cmd: TCommand);
begin
if not cmd.IsValid then
begin
WriteLn('I don''t understand that command.');
Exit;
end;
case cmd.Verb of
'go', 'walk', 'move':
WriteLn('You move ', cmd.DirectObj, '.');
'take', 'get', 'grab':
WriteLn('You pick up the ', cmd.DirectObj, '.');
'look', 'examine':
{ Plain "look" and "look around" both describe the room }
if (cmd.DirectObj = '') or (cmd.DirectObj = 'around') then
WriteLn('You look around the room.')
else
WriteLn('You examine the ', cmd.DirectObj, '.');
'use':
if cmd.IndirectObj <> '' then
WriteLn('You use the ', cmd.DirectObj, ' ',
cmd.Preposition, ' the ', cmd.IndirectObj, '.')
else
WriteLn('You use the ', cmd.DirectObj, '.');
'inventory', 'inv', 'i':
WriteLn('You check your inventory.');
'quit', 'exit':
WriteLn('Farewell, adventurer!');
else
WriteLn('I don''t know how to "', cmd.Verb, '".');
end;
end;
🔗 Connection to earlier chapters. This parser uses records (Chapter 9 introduced arrays; we preview records here, and they get full treatment in Chapter 11). It uses procedures with parameters (Chapter 8), loops and conditionals (Chapters 5–6), and many of the string functions from this chapter. This is Theme 5 in full force: algorithms (parsing), data structures (strings, records, arrays), and their combination yield a program.
Running the Parser
Here is a sample interaction:
> use key on door
You use the key on the door.
> take rusty sword
You pick up the rusty sword.
> go north
You move north.
> look
You look around the room.
> xyzzy
I don't know how to "xyzzy".
The parser is deliberately simple — it does not handle ambiguity, synonyms for objects, or articles like "the." A production text adventure would need significantly more sophistication. But this parser demonstrates the core technique: normalize, tokenize, classify, dispatch. This four-step pattern appears in everything from command-line tools to compilers, and mastering it in Pascal gives you a template that transfers to any language.
Why Four Steps?
Let us examine why each step in the normalize-tokenize-classify-dispatch pattern is necessary, and what goes wrong if you skip one:
- Normalize. Without normalization, "GO NORTH", " go north ", and "Go North" are three different inputs that should all do the same thing. If you skip normalization, you need to handle every possible capitalization and whitespace variation in your dispatch logic, which leads to an exponential explosion of cases.
- Tokenize. Without tokenization, you would need to search the raw string for every possible command. Pattern matching on raw text is fragile: `Pos('go', input)` would also match "take gold" (because "gold" contains "go"). Tokenizing into words eliminates this problem because you compare whole words, not substrings.
- Classify. Without classification, your dispatch code must understand the grammar of every possible sentence. By identifying the verb, objects, and prepositions as distinct roles, you separate grammar from action: the dispatch code only needs to know what to do with a verb and its objects, not how to parse English.
- Dispatch. Without a clean dispatch mechanism (like our `case` statement on the verb), adding new commands requires modifying complex conditional logic. With dispatch, adding a new command is as simple as adding a new branch.
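The tokenize point is easy to demonstrate. In this sketch, `FirstWord` is an illustrative helper, similar in spirit to the verb extraction in `ParseCommand`. A substring search for "go" finds a false match inside "take gold", while comparing the first whole word does not.

```pascal
program WholeWordDemo;
{$mode objfpc}{$H+}

{ Return the text before the first space, or the whole string
  when there is no space. }
function FirstWord(const s: String): String;
var
  spacePos: Integer;
begin
  spacePos := Pos(' ', s);
  if spacePos = 0 then
    Result := s
  else
    Result := Copy(s, 1, spacePos - 1);
end;

const
  Cmd = 'take gold';
begin
  { Substring search finds 'go' inside 'gold': a false positive }
  WriteLn(Pos('go', Cmd) > 0);      // TRUE
  { Whole-word comparison of the first token: no false positive }
  WriteLn(FirstWord(Cmd) = 'go');   // FALSE
end.
```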
📊 Theme 2 (Discipline Transfers). This four-step pattern is not just for text adventures. Web servers normalize URLs, tokenize path segments, classify the HTTP method, and dispatch to handler functions. Compilers normalize source code (remove comments), tokenize into lexemes, classify tokens, and dispatch to the parser. Database engines normalize queries, tokenize SQL, classify clauses, and dispatch to the query planner. The pattern is universal because it addresses a universal problem: transforming unstructured text into structured action.
10.8 Regular-Expression-Style Matching
Pascal does not include a built-in regular expression engine (though Free Pascal has the RegExpr unit available). For many tasks, however, you can build simple pattern matching with the string functions you already know. Let us build a function that matches strings against patterns containing wildcards:
- `*` matches any sequence of characters (including the empty sequence).
- `?` matches exactly one character.
function WildcardMatch(const pattern, text: String): Boolean;
var
pIdx, tIdx: Integer;
pLen, tLen: Integer;
starPIdx, starTIdx: Integer;
begin
pIdx := 1;
tIdx := 1;
pLen := Length(pattern);
tLen := Length(text);
starPIdx := 0; { 0 means no '*' has been seen yet }
starTIdx := 0;
while tIdx <= tLen do
begin
{ Test for '*' first, so that a '*' in the pattern is never
  treated as a literal match for a '*' in the text }
if (pIdx <= pLen) and (pattern[pIdx] = '*') then
begin
{ Star: remember position, try matching zero characters }
starPIdx := pIdx;
starTIdx := tIdx;
Inc(pIdx);
end
else if (pIdx <= pLen) and ((pattern[pIdx] = '?') or
(pattern[pIdx] = text[tIdx])) then
begin
{ Exact match or ? wildcard }
Inc(pIdx);
Inc(tIdx);
end
else if starPIdx > 0 then
begin
{ Backtrack: the last star matches one more character }
pIdx := starPIdx + 1;
Inc(starTIdx);
tIdx := starTIdx;
end
else
Exit(False);
end;
{ Skip trailing stars in pattern }
while (pIdx <= pLen) and (pattern[pIdx] = '*') do
Inc(pIdx);
Result := pIdx > pLen;
end;
Example usage:
WriteLn(WildcardMatch('*.txt', 'readme.txt')); // TRUE
WriteLn(WildcardMatch('data_???.csv', 'data_001.csv')); // TRUE
WriteLn(WildcardMatch('hello*world', 'hello beautiful world')); // TRUE
WriteLn(WildcardMatch('test', 'testing')); // FALSE
This algorithm uses backtracking. Each * first tries to match zero characters. If later matching fails, the algorithm returns to the most recent * and lets it consume one more character, then tries again. This is a simplified version of the kind of matching used in Unix shell globbing, which also supports bracket expressions such as [a-z].
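For comparison, the same semantics can be written as a direct recursive definition. This is a different technique from the iterative loop above. It is easy to check against the rules for * and ?, but it can take exponential time on patterns with many stars, which is one reason to prefer the iterative version. The names `MatchFrom` and `WildcardMatchRec` are illustrative.

```pascal
program WildcardRecursive;
{$mode objfpc}{$H+}

{ Does pattern[p..] match text[t..]? }
function MatchFrom(const pattern, text: String; p, t: Integer): Boolean;
begin
  if p > Length(pattern) then
    Exit(t > Length(text));   { pattern used up: text must be too }
  if pattern[p] = '*' then
    { '*' either matches nothing (skip it), or swallows one text
      character and stays active for the next one }
    Exit(MatchFrom(pattern, text, p + 1, t) or
         ((t <= Length(text)) and MatchFrom(pattern, text, p, t + 1)));
  if t > Length(text) then
    Exit(False);              { text used up, but pattern is not }
  if (pattern[p] = '?') or (pattern[p] = text[t]) then
    Exit(MatchFrom(pattern, text, p + 1, t + 1));
  Result := False;
end;

function WildcardMatchRec(const pattern, text: String): Boolean;
begin
  Result := MatchFrom(pattern, text, 1, 1);
end;

begin
  WriteLn(WildcardMatchRec('*.txt', 'readme.txt'));         // TRUE
  WriteLn(WildcardMatchRec('data_???.csv', 'data_01.csv'));  // FALSE
end.
```

Because both functions implement the same rules, the recursive one can serve as a reference when you test the iterative `WildcardMatch` on small inputs.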
A Simpler Alternative: StartsWith, EndsWith, Contains
For many practical needs, you do not need full wildcard matching. These three utility functions handle the most common cases:
function StartsWith(const s, prefix: String): Boolean;
begin
Result := Copy(s, 1, Length(prefix)) = prefix;
end;
function EndsWith(const s, suffix: String): Boolean;
begin
if Length(suffix) > Length(s) then
Result := False
else
Result := Copy(s, Length(s) - Length(suffix) + 1,
Length(suffix)) = suffix;
end;
function Contains(const s, substr: String): Boolean;
begin
Result := Pos(substr, s) > 0;
end;
These are simple, readable, and cover a surprising number of real-world matching needs — checking file extensions, finding keywords, validating prefixes and suffixes.
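Free Pascal's StrUtils unit already provides Delphi-compatible routines along these lines. The sketch below assumes a Free Pascal installation where StrUtils is available; check the StrUtils documentation for your compiler version. Note the argument order, which differs between functions.

```pascal
program StrUtilsMatching;
{$mode objfpc}{$H+}

uses
  StrUtils;

begin
  { The Starts/Ends functions take the substring first... }
  WriteLn(AnsiStartsStr('//', '// a comment'));           // TRUE
  WriteLn(AnsiEndsStr('.pas', 'main.pas'));               // TRUE
  { ...but AnsiContainsStr takes the text first }
  WriteLn(AnsiContainsStr('fatal error here', 'error'));  // TRUE
  { The *Text variants ignore case }
  WriteLn(AnsiStartsText('GO', 'go north'));              // TRUE
end.
```

Writing the helpers yourself, as above, is still a good exercise, and it avoids the argument-order trap by letting you choose one consistent convention.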
When to Use Each Approach
| Need | Best Tool |
|---|---|
| Check file extension | EndsWith(filename, '.pas') |
| Check if string contains a keyword | Contains(text, 'error') or Pos('error', text) > 0 |
| Check string prefix | StartsWith(line, '//') |
| Match simple patterns with wildcards | WildcardMatch('data_*.csv', filename) |
| Complex patterns (email, phone number, etc.) | Use RegExpr unit or hand-written state machine |
For the Crypts of Pascalia text adventure, StartsWith is useful for accepting directional abbreviations: StartsWith('north', cmd.DirectObj) is true when the player types "n", "no", or "nor". Guard against an empty object, since every string starts with the empty string. Contains is useful for searching item descriptions. The wildcard matcher might be used for a game save-file feature that lets the player load files matching 'save_*.dat'.
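A minimal sketch of abbreviation matching follows. The direction list and the name `ResolveDirection` are assumptions for illustration; the game's real set of exits may differ.

```pascal
program DirectionAbbrev;
{$mode objfpc}{$H+}

const
  { Hypothetical list of exits; no name is a prefix of another }
  Directions: array[0..5] of String =
    ('north', 'south', 'east', 'west', 'up', 'down');

function StartsWith(const s, prefix: String): Boolean;
begin
  Result := Copy(s, 1, Length(prefix)) = prefix;
end;

{ Return the first full direction name that begins with the player's
  (possibly abbreviated) input, or '' if none does. }
function ResolveDirection(const typed: String): String;
var
  i: Integer;
begin
  Result := '';
  if typed = '' then Exit;  { '' is a prefix of every string }
  for i := Low(Directions) to High(Directions) do
    if StartsWith(Directions[i], typed) then
      Exit(Directions[i]);
end;

begin
  WriteLn(ResolveDirection('n'));            // north
  WriteLn(ResolveDirection('we'));           // west
  WriteLn('[', ResolveDirection('x'), ']');  // []
end.
```

If you later add exits such as "northeast", "n" would become ambiguous; returning the first match is the simplest policy, but a real game might ask the player to be more specific.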
10.9 Project Checkpoint: PennyWise CSV Import
It is time to apply everything we have learned to the PennyWise personal-finance application. PennyWise needs to import bank transaction data from CSV files. A typical CSV line looks like:
2025-03-15,Groceries,-42.99,Whole Foods Market
2025-03-14,Income,2500.00,Paycheck - ACME Corp
2025-03-13,Dining,-18.50,"Joe's Pizza, Pasta & More"
Note the challenges:
- Fields are separated by commas.
- Some fields may be quoted (containing commas within the quotes).
- Amounts can be negative (expenses) or positive (income).
- The amount strings must be converted to numbers. The dates can stay as strings for now, because YYYY-MM-DD dates already sort chronologically when compared as text.
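To see why quoted fields matter, consider splitting on every comma without regard to quotes. This sketch uses `NaiveFieldCount`, a throwaway name for illustration, to report how many fields such a split would produce for two of the sample lines.

```pascal
program NaiveSplit;
{$mode objfpc}{$H+}

{ Count fields by treating every comma as a separator and ignoring
  quotes entirely. This is the naive approach that fails on
  quoted fields. }
function NaiveFieldCount(const line: String): Integer;
var
  i: Integer;
begin
  Result := 1;
  for i := 1 to Length(line) do
    if line[i] = ',' then
      Inc(Result);
end;

begin
  WriteLn(NaiveFieldCount('2025-03-15,Groceries,-42.99,Whole Foods Market'));           // 4
  WriteLn(NaiveFieldCount('2025-03-13,Dining,-18.50,"Joe''s Pizza, Pasta & More"'));    // 5
end.
```

The quoted line yields five fields instead of four, so a naive split would cut the description in half. The parser below avoids this by tracking whether it is inside quotes.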
Data Structure
type
TTransaction = record
TransDate : String;
Category : String;
Amount : Real;
Description: String;
end;
TTransactionArray = array[0..999] of TTransaction;
CSV Line Parser (Handling Quoted Fields)
function ParseCSVField(const line: String; var pos: Integer): String;
var
inQuotes: Boolean;
begin
Result := '';
if pos > Length(line) then Exit;
inQuotes := False;
if line[pos] = '"' then
begin
inQuotes := True;
Inc(pos); { Skip opening quote }
end;
while pos <= Length(line) do
begin
if inQuotes then
begin
if line[pos] = '"' then
begin
{ Check for escaped quote ("") }
if (pos < Length(line)) and (line[pos + 1] = '"') then
begin
Result := Result + '"';
Inc(pos, 2);
end
else
begin
Inc(pos); { Skip closing quote }
{ Skip comma after closing quote }
if (pos <= Length(line)) and (line[pos] = ',') then
Inc(pos);
Exit;
end;
end
else
begin
Result := Result + line[pos];
Inc(pos);
end;
end
else
begin
if line[pos] = ',' then
begin
Inc(pos); { Skip comma }
Exit;
end
else
begin
Result := Result + line[pos];
Inc(pos);
end;
end;
end;
end;
Import Procedure
function ImportCSV(const filename: String;
var transactions: TTransactionArray;
var count: Integer): Boolean;
var
f: TextFile;
line: String;
pos, code: Integer;
lineNo: Integer; { physical line number, for warning messages }
amountStr: String;
begin
Result := False;
count := 0;
lineNo := 0;
AssignFile(f, filename);
{$I-}
Reset(f);
{$I+}
if IOResult <> 0 then
begin
WriteLn('Error: Cannot open file ', filename);
Exit;
end;
{ Stop at end of file or when the fixed-size array is full }
while not Eof(f) and (count <= High(transactions)) do
begin
ReadLn(f, line);
Inc(lineNo);
line := Trim(line);
if line = '' then Continue; { skip blank lines }
pos := 1;
transactions[count].TransDate := ParseCSVField(line, pos);
transactions[count].Category := ParseCSVField(line, pos);
amountStr := ParseCSVField(line, pos);
transactions[count].Description := ParseCSVField(line, pos);
Val(amountStr, transactions[count].Amount, code);
if code <> 0 then
begin
{ count is not incremented, so the next line reuses this slot }
WriteLn('Warning: Invalid amount "', amountStr,
'" on line ', lineNo, ', skipping.');
Continue;
end;
Inc(count);
end;
CloseFile(f);
Result := True;
WriteLn('Imported ', count, ' transactions from ', filename);
end;
Category Search by Substring
Once transactions are imported, a user might want to find all transactions matching a category keyword:
procedure SearchByCategory(const transactions: TTransactionArray;
count: Integer;
const keyword: String);
var
i: Integer;
total: Real;
matches: Integer;
lowerKeyword: String;
begin
total := 0;
matches := 0;
lowerKeyword := LowerCase(keyword);
WriteLn;
WriteLn('--- Transactions matching "', keyword, '" ---');
WriteLn(Format('%-12s %-15s %10s %s',
['Date', 'Category', 'Amount', 'Description']));
WriteLn(StringOfChar('-', 60));
for i := 0 to count - 1 do
begin
if Pos(lowerKeyword, LowerCase(transactions[i].Category)) > 0 then
begin
WriteLn(Format('%-12s %-15s %10.2f %s',
[transactions[i].TransDate,
transactions[i].Category,
transactions[i].Amount,
transactions[i].Description]));
total := total + transactions[i].Amount;
Inc(matches);
end;
end;
WriteLn(StringOfChar('-', 60));
WriteLn(Format('Total: %d transactions, $%.2f', [matches, total]));
end;
This checkpoint demonstrates several key skills working together:
- String splitting (the `ParseCSVField` function uses a state machine for quoted fields).
- String-to-number conversion (`Val` with error checking).
- Substring search (`Pos` with `LowerCase` for case-insensitive matching).
- Formatted output (`Format` with width specifiers for tabular display).
✅ Checkpoint Verification. After implementing this code, test with a small CSV file containing at least one quoted field with a comma inside it, one negative amount, and one blank line. Your program should handle all three correctly.
What We Built and Why It Matters
Let us step back and appreciate what we have accomplished in this checkpoint. We started with a raw text file — just bytes on disk — and through a sequence of string operations, we extracted structured, typed data that we can compute with. This is the essence of data parsing, and it is a skill you will use in nearly every non-trivial program.
Together, ParseCSVField, ImportCSV, and SearchByCategory demonstrate five concepts from this chapter:
- Character-by-character iteration (Section 10.6): the `while pos <= Length(line)` loop in `ParseCSVField`.
- State machine logic (Section 10.4): the `inQuotes` flag tracks whether we are inside a quoted field.
- String-to-number conversion (Section 10.5): `Val` converts the amount field in `ImportCSV`.
- Substring search (Section 10.3): `Pos` enables case-insensitive category matching in `SearchByCategory`.
- Formatted output (Section 10.5): `Format` produces the aligned table display.
This is Theme 5 in its purest form: algorithms (parsing, searching, formatting) operating on data structures (strings, records, arrays) to produce a working program. None of these operations is complicated in isolation. Their power comes from composition — combining simple operations into a pipeline that transforms raw text into meaningful information.
10.10 Chapter Summary
Strings are among the most versatile and frequently used data structures in programming, and this chapter covered a vast amount of ground. Let us consolidate the key ideas.
String Types. Free Pascal offers multiple string types — ShortString (stack-allocated, 255-character max), AnsiString (heap-allocated, dynamically sized, reference-counted), UnicodeString (UTF-16), and PChar (null-terminated, C-compatible). Using {$H+} makes the plain String type behave as AnsiString, which is the most practical choice for most programs. Remember that ShortString silently truncates strings that exceed its declared capacity — a dangerous behavior that AnsiString avoids entirely.
Basic Operations. Strings support concatenation with +, comparison with relational operators (lexicographic, case-sensitive), and indexing starting at 1. Individual characters can be read and written via bracket notation.
Standard Functions. Pascal provides a rich toolkit: Length for size, Copy for substrings, Pos for searching, Insert and Delete for in-place modification, UpperCase/LowerCase for case conversion, Trim for whitespace removal, and StringReplace for find-and-replace.
Processing Patterns. We built fundamental patterns — splitting strings by delimiters, extracting words with state machines, reversing strings, checking palindromes, and finding all occurrences of a substring. These patterns recur across every domain of programming.
Conversions. The classic Str/Val procedures and the SysUtils functions (IntToStr, StrToInt, FloatToStr, StrToFloat, Format) bridge the gap between textual and numeric data. Val is particularly valuable because it reports the exact position of conversion errors.
Character Operations. Characters are ordinal values. Ord and Chr convert between characters and their numeric codes. Character arithmetic enables everything from case conversion to encryption.
Text Parsing. Our Crypts of Pascalia command parser demonstrated the normalize-tokenize-classify-dispatch pattern. This four-step approach applies to command-line tools, data file parsers, and even simple compilers.
Pattern Matching. We built a wildcard matching function supporting * and ?, along with simpler StartsWith, EndsWith, and Contains helpers.
CSV Import. The PennyWise project checkpoint combined all these skills into a practical CSV parser that handles quoted fields, validates numeric data, and supports case-insensitive category search. This checkpoint demonstrated how string processing serves as the bridge between raw external data and the typed, structured internal representations that programs need to do useful work.
Spaced Review
From Chapter 8: Explain what happens on the call stack when procedure A calls procedure B. When procedure A calls B, the runtime pushes a new stack frame onto the call stack. This frame contains B's local variables, parameters, and the return address (the instruction in A to resume after B finishes). When B completes, its frame is popped, restoring A's context. If B calls another procedure C, a third frame is pushed on top of B's frame — the stack grows and shrinks in strict last-in-first-out order.
From Chapter 6: Write a WHILE loop that reads strings until the user enters 'quit':
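program ReadUntilQuit;
{$mode objfpc}{$H+}
uses
SysUtils; { for Trim and LowerCase }
var
userText: String;
begin
Write('Enter text (quit to stop): ');
ReadLn(userText);
{ Normalize before comparing, so " QUIT " also ends the loop }
while LowerCase(Trim(userText)) <> 'quit' do
begin
WriteLn('You entered: ', userText);
Write('Enter text (quit to stop): ');
ReadLn(userText);
end;
WriteLn('Goodbye!');
end.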
Looking Ahead
In Chapter 11, we will explore records — custom data structures that group related fields of different types under a single name. You have already seen a preview in this chapter with TCommand and TTransaction. Records are the Pascal mechanism for modeling real-world entities: a student with a name, ID, and GPA; a transaction with a date, amount, and description; an inventory item with a name, weight, and magical properties. Combined with the string processing skills from this chapter, records will let us build truly structured programs.