Chapter 10: Strings — Text Processing in Pascal

In This Chapter

10.1 Strings in Pascal: A Brief History
10.2 String Declarations and Basics
10.3 Standard String Functions
10.4 String Processing Patterns
10.5 String Conversion
10.6 Character Operations
10.7 Building a Text Parser for Crypts of Pascalia
10.8 Regular-Expression-Style Matching
10.9 Project Checkpoint: PennyWise CSV Import
10.10 Chapter Summary

"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information and it is the most common of all data types." — Alan J. Perlis, Epigrams on Programming (1982)

Every interesting program eventually needs to work with text. Whether you are parsing a command that a player types into a game, reading a line of comma-separated values from a file, or formatting a number for display on screen, you are working with strings. In Chapter 9, we explored arrays — ordered collections of elements all sharing a single type. A string, at its heart, is an array of characters, but it carries so much extra behavior and historical baggage that it deserves — and gets — an entire chapter of its own.

By the end of this chapter you will be able to:

Distinguish between ShortString, AnsiString, and other string types in Free Pascal.
Use standard string functions (Length, Copy, Pos, Insert, Delete, Concat, Trim, and friends).
Parse and process text data — splitting, searching, and replacing.
Implement basic string algorithms (reverse, palindrome check, word counting).
Handle string-to-number and number-to-string conversions cleanly.

We will build toward two concrete milestones: a command parser for our ongoing Crypts of Pascalia text adventure, and a CSV import procedure for the PennyWise personal-finance application. Both of these require you to treat text not as something to display, but as something to dissect.

10.1 Strings in Pascal: A Brief History

Pascal was born in 1970. At that time, most languages treated strings as fixed-length arrays of characters, and Niklaus Wirth's original Pascal was no exception — the language did not even include a built-in string type in its earliest definition. Programmers had to declare something like packed array[1..80] of Char and manage lengths manually.

Borland's Turbo Pascal (1983) changed everything for practical Pascal programmers by popularizing a built-in, length-prefixed string type, known today as ShortString: a sequence of characters stored in a fixed 256-byte block. The first byte held the current length (0–255), and bytes 1–255 held the characters. This was fast, simple, and sufficient for the era of 80-column terminals.

As programs grew, 255 characters stopped being enough. Delphi introduced AnsiString — a reference-counted, heap-allocated, dynamically sized string that could hold up to 2 GB of text. Free Pascal adopted this model and added UnicodeString for full UTF-16 support.

Here is a summary of the string types you will encounter in Free Pascal 3.2+:

Type	Max Length	Storage	Encoding	Index Base
`ShortString`	255 chars	Inline in the variable (256 bytes)	System codepage	1
`String[N]`	N chars (1–255)	Inline in the variable (N+1 bytes)	System codepage	1
`AnsiString`	~2 GB	Heap, ref-counted	System codepage	1
`UnicodeString`	~1 GB chars	Heap, ref-counted	UTF-16	1
`WideString`	~1 GB chars	Heap, COM-compatible	UTF-16	1
`PChar`	Null-terminated	Pointer	System codepage	0

💡 Practical Guidance. For nearly all work in this textbook, the plain String type is all you need. In Free Pascal's default mode it maps to ShortString, but under {$mode Delphi} or {$H+} it maps to AnsiString. We will use {$H+} in most examples so that String means AnsiString, giving us dynamic length without worrying about the 255-character limit.

Why does this history matter? Because you will encounter old code, old tutorials, and old habits. Understanding that ShortString stores its length in byte zero (accessible as Ord(s[0])) while AnsiString stores metadata in a hidden header before the first character explains many of the quirks you will see in Pascal string code. Theme 2 — discipline transfers — applies here: the mental model of "a string is a length-prefixed array of characters" transfers from Pascal to virtually every other language, even if the implementation details differ.

How Other Languages Handle Strings

It is worth pausing to see how Pascal's approach compares to other languages, because this comparison deepens your understanding of why Pascal strings work the way they do:

C uses null-terminated strings: a plain array of characters with a zero byte ('\0') marking the end. This means finding the length of a C string requires scanning every character — an O(n) operation. Pascal's length-prefixed approach makes Length an O(1) operation — instant, regardless of string size.
Java and C# use immutable string objects. Once created, a Java String cannot be modified; any "modification" creates a new string. Pascal's AnsiString takes a different route: assignment shares the buffer, and copy-on-write makes a private copy only when you modify a string that is still shared. You get cheap assignment, as with immutable strings, while keeping the ability to change characters in place.
Python strings are also immutable sequences of Unicode characters. Python's + concatenation creates a new string each time, much like naive Pascal concatenation. Python programmers learn to use ''.join(list) for efficiency; Pascal programmers learn to use SetLength and index assignment.

The mental model you build here — "a string is a sequence of characters with known length, supporting indexing, searching, and extraction" — is universal. The specific functions differ (Copy in Pascal, substring in Java, slicing in Python), but the concepts are identical.

ShortString Internals

A ShortString (or equivalently String[N] for some N between 1 and 255) is laid out in memory as follows:

Byte 0:  Current length (0..N)
Byte 1:  First character
Byte 2:  Second character
...
Byte N:  Last possible character

You can read the length either with Length(s) or, in classic Turbo Pascal style, with Ord(s[0]). We strongly recommend Length(s) — it works for every string type, not just ShortString.
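
The layout above can be checked with a few lines of code. This is a minimal sketch, assuming Free Pascal with range checking left at its default (off); the variable name s is only for illustration.

```pascal
program ShortStringLayout;
{$mode objfpc}{$H+}

var
  s: String[10];   // explicit maximum length, so this is a ShortString even under {$H+}
begin
  s := 'Pascal';
  WriteLn(Length(s));    // 6
  WriteLn(Ord(s[0]));    // 6, the same value read directly from byte 0
  WriteLn(SizeOf(s));    // 11: one length byte plus 10 character slots
end.
```

Note that SizeOf reports the declared capacity plus the length byte, not the number of characters currently stored.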

AnsiString Internals (Conceptual)

An AnsiString variable is a pointer. When the string is empty, the pointer is nil. When the string has content, the pointer refers to the first character of a heap-allocated buffer that is preceded by a hidden header containing:

A reference count (how many variables point to this buffer).
The length in characters.
The codepage identifier.

When you assign one AnsiString to another (t := s), Free Pascal does not copy the characters. It increments the reference count. Only when you modify one of the copies does the compiler generate a copy-on-write — allocating a new buffer and copying the data. This is efficient and invisible to you in everyday coding, but it explains why AnsiString is so much faster than you might expect for passing large strings to procedures.
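
The sharing described above can be observed by comparing the two variables as raw pointers. The sketch below is for inspection only and assumes Free Pascal with {$H+}; it reads the pointer values but never writes through them.

```pascal
program CopyOnWriteDemo;
{$mode objfpc}{$H+}

var
  s, t: String;
begin
  s := 'Hello';
  t := s;                              // no characters are copied here
  WriteLn(Pointer(s) = Pointer(t));    // TRUE: both variables refer to one buffer

  t[1] := 'J';                         // the first write triggers the copy
  WriteLn(Pointer(s) = Pointer(t));    // FALSE: t now has its own buffer
  WriteLn(s);                          // Hello
  WriteLn(t);                          // Jello
end.
```

The original string s is untouched by the write to t, which is exactly what you would expect from value semantics.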

⚠️ Warning. Because AnsiString is reference-counted, you should never use pointer arithmetic or Move on its internal buffer unless you truly know what you are doing. Let the compiler manage the memory.

10.2 String Declarations and Basics

Let us begin writing code. Consider the following declarations:

{$mode objfpc}{$H+}
var
  greeting : String;          // AnsiString (because {$H+})
  short    : String[40];      // ShortString, max 40 chars
  initial  : Char;            // single character

Assignment and Concatenation

Strings are assigned with := and concatenated with the + operator or the Concat function:

greeting := 'Hello';
greeting := greeting + ', world!';   // Now 'Hello, world!'
greeting := Concat('Hello', ', ', 'world!');  // Same result

Single characters and strings are largely interchangeable on the right side of an assignment. You can concatenate a Char onto a String:

initial := '!';
greeting := 'Hello' + initial;  // 'Hello!'

Indexing

Strings in Pascal are 1-indexed. The first character is s[1], the second is s[2], and so on:

greeting := 'Hello';
WriteLn(greeting[1]);  // 'H'
WriteLn(greeting[5]);  // 'o'

You can also assign to individual characters:

greeting[1] := 'J';
WriteLn(greeting);  // 'Jello'

⚠️ Warning. Indexing beyond Length(s) is an error. With range checking enabled ({$R+}), it raises a run-time error; with range checking off (the default), it may silently read garbage. Always check bounds.

Comparison

Strings can be compared with the standard relational operators: =, <>, <, >, <=, >=. Comparison is lexicographic — character by character, using the ordinal values of the characters:

if 'apple' < 'banana' then
  WriteLn('apple comes first');  // True: 'a' (97) < 'b' (98)

if 'Apple' < 'apple' then
  WriteLn('uppercase first');    // True: 'A' (65) < 'a' (97)

This means comparison is case-sensitive by default. If you need case-insensitive comparison, convert both strings to the same case first:

if LowerCase(s1) = LowerCase(s2) then
  WriteLn('Equal (case-insensitive)');
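
As an alternative to converting both sides yourself, the SysUtils unit offers SameText and CompareText, which ignore case when comparing. The short sketch below contrasts them with the plain < operator; the variable names are illustrative.

```pascal
program CaseInsensitiveCompare;
{$mode objfpc}{$H+}
uses SysUtils;

var
  a, b: String;
begin
  a := 'apple';
  b := 'Banana';
  WriteLn(SameText('Pascal', 'PASCAL'));   // TRUE: equal when case is ignored
  WriteLn(CompareText(a, b) < 0);          // TRUE: 'apple' sorts before 'Banana' when case is ignored
  WriteLn(a < b);                          // FALSE: 'a' (97) is greater than 'B' (66)
end.
```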

Empty Strings and Length

An empty string has length zero:

var s: String;
begin
  s := '';
  WriteLn(Length(s));  // 0
  if s = '' then
    WriteLn('Empty!');
end.

String Length vs. Capacity

A subtle but important distinction: for ShortString, the capacity (maximum number of characters the variable can hold) is fixed at declaration time, but the length (number of characters currently stored) can vary from 0 to that maximum. For AnsiString, the capacity grows automatically as needed — you never need to worry about it.

var
  short: String[10];   { Capacity = 10, Length starts at 0 }
  long : String;        { AnsiString: capacity is unlimited }
begin
  short := 'Hi';         { Length = 2, capacity still 10 }
  short := 'Hello World'; { Silently truncated to 'Hello Worl' (10 chars) }
  WriteLn(short);         { 'Hello Worl' — data loss with no warning! }

  long := 'Hello World';  { Length = 11, capacity grows to fit }
  WriteLn(long);           { 'Hello World' — no truncation }
end.

⚠️ Warning. ShortString silently truncates strings that exceed its declared maximum. This is one of the most insidious sources of bugs in Pascal programs that use ShortString. This is another reason we prefer AnsiString via {$H+} — it never truncates.

Iterating Through a String

Since strings are indexed sequences, you can process them character by character using a for loop. This pattern is fundamental to nearly everything else in this chapter:

var
  s: String;
  i: Integer;
  vowelCount: Integer;
begin
  s := 'Hello, World!';
  vowelCount := 0;

  for i := 1 to Length(s) do
  begin
    case UpCase(s[i]) of
      'A', 'E', 'I', 'O', 'U':
        Inc(vowelCount);
    end;
  end;

  WriteLn('Vowels found: ', vowelCount);  { 3 }
end.

Notice the use of UpCase (a function that converts a single Char to uppercase) combined with a case statement. This is cleaner than writing a long chain of if comparisons.

10.3 Standard String Functions

Free Pascal provides a rich set of built-in string routines. Let us walk through the ones you will use most often. We will build a quick-reference table first, then explore each with examples.

Function	Signature (simplified)	Purpose
`Length`	`Length(s): Integer`	Returns the number of characters in `s`
`Copy`	`Copy(s, start, count): String`	Extracts a substring
`Pos`	`Pos(substr, s): Integer`	Finds first occurrence of `substr` in `s` (0 if not found)
`Insert`	`Insert(source, var target, index)`	Inserts `source` into `target` at position `index`
`Delete`	`Delete(var s, start, count)`	Removes `count` characters from `s` starting at `start`
`Concat`	`Concat(s1, s2, ...): String`	Concatenates multiple strings
`UpperCase`	`UpperCase(s): String`	Converts to uppercase
`LowerCase`	`LowerCase(s): String`	Converts to lowercase
`Trim`	`Trim(s): String`	Removes leading and trailing whitespace
`TrimLeft`	`TrimLeft(s): String`	Removes leading whitespace
`TrimRight`	`TrimRight(s): String`	Removes trailing whitespace
`StringReplace`	`StringReplace(s, old, new, flags): String`	Replaces occurrences (in `SysUtils`)
`StringOfChar`	`StringOfChar(c, n): String`	Creates a string of `n` copies of character `c`

Length

Length returns the number of characters currently stored in the string — not the maximum capacity:

var s: String;
begin
  s := 'Free Pascal';
  WriteLn(Length(s));  // 11
end.

Copy

Copy(s, start, count) extracts a substring. It returns count characters beginning at position start. If start + count exceeds the length, it returns whatever characters remain without error:

var s, sub: String;
begin
  s := 'Hello, world!';
  sub := Copy(s, 8, 5);   // 'world'
  WriteLn(sub);

  sub := Copy(s, 8, 100);  // 'world!' — no error, just takes what's there
  WriteLn(sub);
end.

Copy never modifies the original string. It is a pure function.

Pos

Pos(substr, s) searches for substr within s and returns the position of its first character. If not found, it returns 0:

var idx: Integer;
begin
  idx := Pos('world', 'Hello, world!');
  WriteLn(idx);  // 8

  idx := Pos('xyz', 'Hello, world!');
  WriteLn(idx);  // 0
end.

💡 Tip. Pos finds only the first occurrence. To find all occurrences, you need a loop — we will build one shortly.

Insert and Delete

Insert and Delete are procedures, not functions — they modify the target string variable in place:

var s: String;
begin
  s := 'Hello world!';
  Insert('beautiful ', s, 7);   // s = 'Hello beautiful world!'
  WriteLn(s);

  Delete(s, 7, 10);             // s = 'Hello world!'
  WriteLn(s);
end.

Insert(source, target, index) shifts characters to the right and inserts source at position index. Delete(s, start, count) removes count characters starting at start, shifting subsequent characters to the left.

UpperCase, LowerCase, Trim

These are straightforward:

WriteLn(UpperCase('hello'));   // 'HELLO'
WriteLn(LowerCase('HELLO'));   // 'hello'
WriteLn(Trim('  hello  '));    // 'hello'

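Trim removes both leading and trailing whitespace (spaces, tabs, carriage returns, line feeds). TrimLeft and TrimRight handle one side only. UpperCase and the Trim family are declared in the SysUtils unit, so add uses SysUtils before calling them.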

StringReplace (from SysUtils)

For find-and-replace operations, StringReplace is invaluable:

uses SysUtils;

var s: String;
begin
  s := 'The cat sat on the cat mat';
  s := StringReplace(s, 'cat', 'dog', [rfReplaceAll]);
  WriteLn(s);  // 'The dog sat on the dog mat'
end.

The flags set can include:

- rfReplaceAll: replace all occurrences (the default replaces only the first).
- rfIgnoreCase: case-insensitive matching.

Combining Functions: Finding All Occurrences

Here is a pattern you will use often — finding every occurrence of a substring:

procedure FindAllOccurrences(const haystack, needle: String);
var
  startPos, found: Integer;
begin
  startPos := 1;
  repeat
    found := Pos(needle, Copy(haystack, startPos, Length(haystack)));
    if found > 0 then
    begin
      WriteLn('Found at position ', startPos + found - 1);
      startPos := startPos + found;  // Step one character past the match start (overlapping matches are found too)
    end;
  until found = 0;
end;

This combines Pos and Copy to scan through the entire string. Notice the arithmetic: Pos returns a position relative to the copy, so we add startPos - 1 to get the position in the original string.
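
If copying the tail of the string on every iteration is a concern, the StrUtils unit provides PosEx, which accepts a starting offset and returns positions relative to the original string. The following is a sketch of that variant, not a replacement for the version above; the name PrintOccurrences is made up for this example, and unlike the loop above it resumes after the whole match, so overlapping matches are skipped.

```pascal
program FindAllWithPosEx;
{$mode objfpc}{$H+}
uses StrUtils;

// Hypothetical helper: prints the position of each non-overlapping match.
procedure PrintOccurrences(const haystack, needle: String);
var
  found: SizeInt;
begin
  if needle = '' then
    Exit;                              // nothing sensible to search for
  found := PosEx(needle, haystack, 1);
  while found > 0 do
  begin
    WriteLn('Found at position ', found);
    // Resume the search just after the end of this match.
    found := PosEx(needle, haystack, found + Length(needle));
  end;
end;

begin
  PrintOccurrences('banana', 'an');
  // Found at position 2
  // Found at position 4
end.
```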

Common Mistakes with String Functions

Before moving on, let us address three mistakes that trip up nearly every beginner:

Mistake 1: Confusing parameter order in Pos. The signature is Pos(needle, haystack) — the substring you are searching for comes first, the string you are searching in comes second. Many students write Pos(s, 'hello') when they mean Pos('hello', s). If you come from other languages where the order is reversed (like Python's s.find('hello')), this will catch you.

Mistake 2: Forgetting that Insert and Delete are procedures. Writing result := Insert('x', s, 3) will not compile — Insert does not return a value. It modifies s directly. If you need the result without modifying the original, make a copy first:

temp := s;
Insert('x', temp, 3);
{ Now temp is modified, s is unchanged }

Mistake 3: Off-by-one errors with Copy after Pos. When you find a delimiter at position p and want to extract everything after it, the start position for Copy is p + Length(delimiter), not p + 1 (unless the delimiter is a single character). This mistake is especially common when the delimiter is a multi-character string like ', ' or ': '.
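
To make Mistake 3 concrete, here is a small sketch that splits a key/value line at a two-character delimiter. The sample line and the names Delim, key, and value are illustrative only.

```pascal
program SplitAtDelimiter;
{$mode objfpc}{$H+}

const
  Delim = ': ';
var
  line, key, value: String;
  p: Integer;
begin
  line := 'name: Ada';
  p := Pos(Delim, line);               // 5
  if p > 0 then
  begin
    key   := Copy(line, 1, p - 1);
    // Skip the whole delimiter, not just one character.
    value := Copy(line, p + Length(Delim), Length(line));
    WriteLn(key);     // name
    WriteLn(value);   // Ada
  end;
end.
```

Using p + 1 instead would leave the leading space in value, a bug that is easy to miss when printing to the screen.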

10.4 String Processing Patterns

With the fundamental functions in hand, we can tackle real-world string processing tasks. In this section, we build three essential patterns: splitting a string by a delimiter, extracting words, and parsing structured text.

Pattern 1: Splitting a String by a Delimiter

One of the most common tasks in programming is splitting a string into parts at some delimiter. Pascal does not have a built-in Split function (though Free Pascal's SysUtils has string.Split in newer modes), so we will build one:

procedure SplitString(const s, delimiter: String;
                      var parts: array of String;
                      var count: Integer);
var
  remaining: String;
  delimPos: Integer;
begin
  count := 0;
  remaining := s;

  repeat
    delimPos := Pos(delimiter, remaining);
    if delimPos > 0 then
    begin
      if count <= High(parts) then
      begin
        parts[count] := Copy(remaining, 1, delimPos - 1);
        Inc(count);
      end;
      remaining := Copy(remaining, delimPos + Length(delimiter),
                         Length(remaining));
    end
    else
    begin
      { Last segment (or only segment if no delimiter found) }
      if count <= High(parts) then
      begin
        parts[count] := remaining;
        Inc(count);
      end;
    end;
  until delimPos = 0;
end;

This procedure takes a string s, a delimiter, an open array parts to fill in, and a count to report how many parts were found. It repeatedly finds the delimiter, extracts the text before it, and advances through the string.

Example usage:

var
  fields: array[0..9] of String;
  n, i: Integer;
begin
  SplitString('apple,banana,cherry', ',', fields, n);
  for i := 0 to n - 1 do
    WriteLn(fields[i]);
  // Output:
  // apple
  // banana
  // cherry
end.

📊 Theme 5 (Algorithms + Data Structures = Programs). This split procedure is a clear example of Theme 5 in action. The algorithm (repeated search-and-extract) operates on the data structure (a string, which is itself an array of characters) to produce a new data structure (an array of substrings). Neither the algorithm nor the data structure alone solves the problem — it is their combination that makes the program work.

Pattern 2: Word Extraction

Extracting individual words from text is slightly different from splitting by a single delimiter, because words can be separated by any amount of whitespace — spaces, tabs, or multiple spaces in a row:

function ExtractWord(const s: String; wordIndex: Integer): String;
var
  i, currentWord: Integer;
  inWord: Boolean;
begin
  Result := '';
  currentWord := 0;
  inWord := False;

  for i := 1 to Length(s) do
  begin
    if not (s[i] in [' ', #9]) then  // spaces and tabs both separate words
    begin
      if not inWord then
      begin
        inWord := True;
        Inc(currentWord);
      end;
      if currentWord = wordIndex then
        Result := Result + s[i];
    end
    else
    begin
      if inWord and (currentWord = wordIndex) then
        Exit;  // We have the complete word
      inWord := False;
    end;
  end;
end;

This function uses a state machine with a Boolean flag inWord that tracks whether we are currently inside a word or in whitespace. This is a fundamental pattern in text parsing — you will see it again and again.

Pattern 3: Counting Words

Building on the state machine approach:

function CountWords(const s: String): Integer;
var
  i: Integer;
  inWord: Boolean;
begin
  Result := 0;
  inWord := False;
  for i := 1 to Length(s) do
  begin
    if not (s[i] in [' ', #9]) then  // spaces and tabs both separate words
    begin
      if not inWord then
      begin
        inWord := True;
        Inc(Result);
      end;
    end
    else
      inWord := False;
  end;
end;

Pattern 4: String Reversal

Reversing a string is a classic exercise that reinforces indexing:

function ReverseString(const s: String): String;
var
  i: Integer;
begin
  Result := '';
  for i := Length(s) downto 1 do
    Result := Result + s[i];
end;

⚠️ Performance Note. Concatenating one character at a time in a loop (Result := Result + s[i]) is simple but inefficient for very long strings, because each concatenation may allocate a new buffer. For production code on large strings, pre-allocate with SetLength(Result, Length(s)) and assign characters by index. For the string sizes we work with in this textbook (up to a few thousand characters), the simple approach is fine.

Pattern 5: Palindrome Check

A palindrome reads the same forwards and backwards. We can check this efficiently without creating a reversed copy:

function IsPalindrome(const s: String): Boolean;
var
  left, right: Integer;
begin
  left := 1;
  right := Length(s);
  while left < right do
  begin
    if LowerCase(s[left]) <> LowerCase(s[right]) then
      Exit(False);
    Inc(left);
    Dec(right);
  end;
  Result := True;
end;

This uses a two-pointer technique — left starts at the beginning, right at the end, and they walk toward each other comparing characters. The function exits as soon as a mismatch is found.
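
A short driver shows how the two-pointer check behaves on a few inputs, including edge cases. The function body is the one from this section, repeated so the program compiles on its own; the test strings are illustrative.

```pascal
program PalindromeDemo;
{$mode objfpc}{$H+}

function IsPalindrome(const s: String): Boolean;
var
  left, right: Integer;
begin
  left := 1;
  right := Length(s);
  while left < right do
  begin
    if LowerCase(s[left]) <> LowerCase(s[right]) then
      Exit(False);
    Inc(left);
    Dec(right);
  end;
  Result := True;
end;

begin
  WriteLn(IsPalindrome('Level'));      // TRUE: case is ignored
  WriteLn(IsPalindrome('Pascal'));     // FALSE
  WriteLn(IsPalindrome(''));           // TRUE: the loop never runs
  WriteLn(IsPalindrome('taco cat'));   // FALSE: the space is compared like any other character
end.
```

The last case suggests a natural refinement: strip spaces and punctuation first if you want phrase-level palindromes to pass.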

Pattern 6: Efficient String Reversal with SetLength

We mentioned earlier that concatenating one character at a time is inefficient. Here is the production-quality version of string reversal that pre-allocates:

function ReverseStringFast(const s: String): String;
var
  i, len: Integer;
begin
  len := Length(s);
  SetLength(Result, len);
  for i := 1 to len do
    Result[len - i + 1] := s[i];
end;

SetLength(Result, len) allocates a string of exactly len characters in one operation. We then fill in the characters by index, which involves no further allocation, just memory writes. For a string of 10,000 characters, the naive version may resize (and possibly copy) the buffer up to 10,000 times, while this version allocates once. The actual speedup depends on the memory manager, so measure before relying on a specific figure.

💡 When does performance matter? For strings under a few hundred characters (the vast majority of strings in typical programs), the naive approach is fine — the overhead is unmeasurable. For processing large files or running string operations in tight loops, the SetLength-and-index approach is essential. Learn both: the simple version for clarity, the efficient version for when you need speed.

Putting the Patterns Together: A Text Statistics Function

Let us combine several patterns into a practical function that computes basic statistics about a block of text:

procedure TextStatistics(const text: String);
var
  i: Integer;
  charCount, wordCount, sentenceCount, lineCount: Integer;
  inWord: Boolean;
begin
  charCount := Length(text);
  wordCount := 0;
  sentenceCount := 0;
  lineCount := 1;  { At least one line if text is non-empty }
  inWord := False;

  for i := 1 to Length(text) do
  begin
    { Count words using state machine }
    if (text[i] <> ' ') and (text[i] <> #10) and (text[i] <> #13) then
    begin
      if not inWord then
      begin
        inWord := True;
        Inc(wordCount);
      end;
    end
    else
      inWord := False;

    { Count sentences (rough: period, exclamation, question mark) }
    if text[i] in ['.', '!', '?'] then
      Inc(sentenceCount);

    { Count lines }
    if text[i] = #10 then
      Inc(lineCount);
  end;

  WriteLn('Characters: ', charCount);
  WriteLn('Words:      ', wordCount);
  WriteLn('Sentences:  ', sentenceCount);
  WriteLn('Lines:      ', lineCount);
  if sentenceCount > 0 then
    WriteLn('Avg words/sentence: ', wordCount / sentenceCount:0:1);
end;

This function processes the text in a single pass through the string — it visits each character exactly once. This is efficient and demonstrates how the state machine pattern scales to handle multiple counting tasks simultaneously.

10.5 String Conversion

Programs constantly need to convert between strings and numbers. A financial application reads '42.99' from a CSV file and needs the number 42.99 for arithmetic. A game displays 'Score: 1500' by converting the integer 1500 to text. Pascal provides several mechanisms for these conversions.

Str and Val (Classic Pascal)

The classic approach uses Str and Val:

var
  s: String;
  n: Integer;
  code: Integer;
begin
  { Number to string }
  Str(42, s);
  WriteLn(s);  // '42'

  Str(3.14159:8:4, s);  // Formatted: 8 total width, 4 decimal places
  WriteLn(s);  // '  3.1416'

  { String to number }
  Val('123', n, code);
  if code = 0 then
    WriteLn('Parsed: ', n)  // 123
  else
    WriteLn('Error at position ', code);

  Val('12x3', n, code);
  WriteLn('Error at position ', code);  // 3 (where 'x' was found)
end.

Val(s, variable, code) is particularly useful because it reports where the conversion failed via the code parameter. A code of 0 means success; any other value is the position of the first invalid character.
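
One way to package this check is a small Boolean wrapper around Val. The helper below, TryParseInt, is a hypothetical name chosen for this sketch (SysUtils ships a similar TryStrToInt); it discards the error position and reports only success or failure.

```pascal
program SafeIntParse;
{$mode objfpc}{$H+}

// Hypothetical helper: returns True and fills value only on a clean parse.
function TryParseInt(const s: String; out value: Integer): Boolean;
var
  code: Integer;
begin
  Val(s, value, code);
  Result := code = 0;
  if not Result then
    value := 0;          // leave a predictable value on failure
end;

var
  n: Integer;
  ok: Boolean;
begin
  ok := TryParseInt('123', n);
  WriteLn(ok, ' ', n);     // TRUE 123
  ok := TryParseInt('12x3', n);
  WriteLn(ok, ' ', n);     // FALSE 0
  ok := TryParseInt('', n);
  WriteLn(ok, ' ', n);     // FALSE 0
end.
```

If you need to tell the user where the input went wrong, call Val directly and keep the code value instead.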

SysUtils Conversion Functions

The SysUtils unit provides more convenient (though less informative on error) functions:

uses SysUtils;

var
  s: String;
  n: Integer;
  x: Double;
begin
  { Integer conversions }
  s := IntToStr(42);        // '42'
  n := StrToInt('42');      // 42
  n := StrToIntDef('abc', 0);  // 0 (default on failure)

  { Float conversions }
  s := FloatToStr(3.14);   // '3.14'
  x := StrToFloat('2.718');  // 2.718
  x := StrToFloatDef('bad', 0.0);  // 0.0

  { Formatted output }
  s := Format('Name: %s, Age: %d, GPA: %.2f',
              ['Alice', 20, 3.85]);
  WriteLn(s);  // 'Name: Alice, Age: 20, GPA: 3.85'
end.

💡 Which should you use? For simple conversions where you trust the input (or want an exception on bad data), use StrToInt and StrToFloat. For parsing user input or file data where bad values are possible, use Val (for precise error location) or StrToIntDef/StrToFloatDef (for default-on-failure behavior). In the PennyWise project, where we parse CSV files that might contain malformed data, we will use Val so we can report exactly where parsing failed.

The Format Function

Format deserves special attention. It works like printf in C, using format specifiers:

Specifier	Type	Example
`%s`	String	`Format('%s', ['hello'])` → `'hello'`
`%d`	Integer	`Format('%d', [42])` → `'42'`
`%f`	Float	`Format('%.2f', [3.14])` → `'3.14'`
`%e`	Scientific	`Format('%.4e', [1234.0])` → `'1.234E+003'`
`%x`	Hexadecimal	`Format('%x', [255])` → `'FF'`
`%%`	Literal `%`	`Format('100%%', [])` → `'100%'`

Width and precision are specified between % and the letter: %10d means at least 10 characters wide (right-aligned), %-10s means left-aligned in 10 characters, %8.2f means 8 total characters with 2 decimal places.
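
The square brackets in the sketch below make the padding visible. It assumes the default format settings use a period as the decimal separator; on a system configured otherwise, the last line may show a comma instead.

```pascal
program FormatWidths;
{$mode objfpc}{$H+}
uses SysUtils;

begin
  WriteLn(Format('[%10d]',  [42]));        // [        42]
  WriteLn(Format('[%-10s]', ['hi']));      // [hi        ]
  WriteLn(Format('[%8.2f]', [3.14159]));   // [    3.14]
end.
```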

Practical Example: Formatted Receipt

Let us put Format to work by generating a formatted receipt — a task that combines string conversion, alignment, and string construction:

procedure PrintReceipt;
var
  items: array[0..2] of String = ('Widget', 'Gadget', 'Sprocket');
  prices: array[0..2] of Real = (4.99, 19.95, 2.50);
  quantities: array[0..2] of Integer = (3, 1, 10);
  i: Integer;
  lineTotal, grandTotal: Real;
begin
  WriteLn(Format('%-20s %5s %8s %10s', ['Item', 'Qty', 'Price', 'Total']));
  WriteLn(StringOfChar('=', 46));
  grandTotal := 0;

  for i := 0 to 2 do
  begin
    lineTotal := prices[i] * quantities[i];
    grandTotal := grandTotal + lineTotal;
    WriteLn(Format('%-20s %5d %8.2f %10.2f',
                   [items[i], quantities[i], prices[i], lineTotal]));
  end;

  WriteLn(StringOfChar('-', 46));
  WriteLn(Format('%35s %10.2f', ['TOTAL:', grandTotal]));
end;

Output:

Item                   Qty    Price      Total
==============================================
Widget                   3     4.99      14.97
Gadget                   1    19.95      19.95
Sprocket                10     2.50      25.00
----------------------------------------------
                             TOTAL:      59.92

This kind of formatted output is essential for the PennyWise project — financial data must be neatly aligned and properly formatted for readability.

A Note on Locale and Decimal Separators

When using StrToFloat and FloatToStr, be aware that the decimal separator depends on the system locale. In the United States, it is a period (3.14); in France and Germany, it is a comma (3,14). If your program reads data files that always use a period, you may need to set DefaultFormatSettings.DecimalSeparator := '.' at the beginning of your program to avoid conversion errors on European systems. This is a common source of bugs in programs that process international data.
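
If you would rather not change the global setting, StrToFloat also accepts a TFormatSettings argument. The sketch below copies the defaults into a local record and overrides only the decimal separator; the variable name fs is illustrative.

```pascal
program LocaleSafeParse;
{$mode objfpc}{$H+}
uses SysUtils;

var
  fs: TFormatSettings;
  x: Double;
begin
  fs := DefaultFormatSettings;     // start from the current settings
  fs.DecimalSeparator := '.';      // the data file always uses a period
  x := StrToFloat('42.99', fs);    // parsed with the local copy only
  WriteLn(x:0:2);                  // 42.99
end.
```

This keeps the rest of the program, including anything shown to the user, on the system's own locale.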

10.6 Character Operations

Since strings are sequences of characters, understanding the Char type is essential. A Char in Free Pascal is a single byte representing one character in the system codepage (typically ASCII-compatible for values 0–127).

Ord and Chr

Ord converts a character to its numeric value (ordinal). Chr converts a number back to a character:

WriteLn(Ord('A'));    // 65
WriteLn(Ord('a'));    // 97
WriteLn(Ord('0'));    // 48
WriteLn(Chr(65));     // 'A'
WriteLn(Chr(10));     // Line feed (newline)

Character Classification

You can classify characters using comparisons or functions from SysUtils:

function IsLetter(c: Char): Boolean;
begin
  Result := (c >= 'A') and (c <= 'Z') or
            (c >= 'a') and (c <= 'z');
end;

function IsDigit(c: Char): Boolean;
begin
  Result := (c >= '0') and (c <= '9');
end;

function IsWhitespace(c: Char): Boolean;
begin
  Result := c in [' ', #9, #10, #13];  // space, tab, LF, CR
end;

Or use the Character unit in modern Free Pascal for Unicode-aware classification.

Character Arithmetic

Because characters are ordinal values, you can perform arithmetic on them:

var c: Char;
begin
  c := 'A';
  c := Chr(Ord(c) + 1);   // 'B'
  WriteLn(c);

  { Convert digit character to integer }
  WriteLn(Ord('7') - Ord('0'));  // 7

  { Convert uppercase to lowercase }
  c := 'G';
  c := Chr(Ord(c) + 32);  // 'g' (only works for A-Z!)
  WriteLn(c);
end.

This character arithmetic is the foundation for the Caesar cipher we will build in Case Study 1. It also demonstrates Theme 2 — the concept that characters are just numbers in a particular encoding transfers to every programming language.
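
As a preview of that idea, here is a minimal sketch of shifting a single uppercase letter with wraparound. ShiftUpper is a hypothetical helper written for this example, not the implementation used in the case study, and it assumes a non-negative shift.

```pascal
program ShiftLetterDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: moves an uppercase letter forward by shift positions
// (shift >= 0), wrapping from 'Z' back to 'A'. Other characters are returned unchanged.
function ShiftUpper(c: Char; shift: Integer): Char;
begin
  if (c >= 'A') and (c <= 'Z') then
    Result := Chr(Ord('A') + (Ord(c) - Ord('A') + shift) mod 26)
  else
    Result := c;
end;

begin
  WriteLn(ShiftUpper('A', 3));   // D
  WriteLn(ShiftUpper('Z', 1));   // A
  WriteLn(ShiftUpper('!', 5));   // !
end.
```

Subtracting Ord('A') first maps the letter to the range 0..25, which is what makes the mod 26 wraparound work.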

Building Character-Level Functions

Here is a function that strips all non-alphabetic characters from a string, which is useful for preparing text for analysis:

function StripNonAlpha(const s: String): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    if ((s[i] >= 'A') and (s[i] <= 'Z')) or
       ((s[i] >= 'a') and (s[i] <= 'z')) then
      Result := Result + s[i];
end;

And a function that counts occurrences of a specific character:

function CountChar(const s: String; c: Char): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if s[i] = c then
      Inc(Result);
end;

The ASCII Table: A Mental Map

You do not need to memorize the entire ASCII table, but knowing the structure helps you write character-level code confidently:

Range	Characters	Notes
0–31	Control characters	`#9` = Tab, `#10` = LF, `#13` = CR
32	Space	The first printable character
48–57	`'0'` through `'9'`	Digits are contiguous and in order
65–90	`'A'` through `'Z'`	Uppercase letters, contiguous and in order
97–122	`'a'` through `'z'`	Lowercase letters, contiguous and in order

The fact that digits, uppercase letters, and lowercase letters each occupy contiguous ranges is what makes character arithmetic work. The gap between 'A' (65) and 'a' (97) is always 32, so converting case is as simple as adding or subtracting 32 — though using UpCase or LowerCase is always more readable and more correct (it handles edge cases you might forget).

Practical Example: Validating Input

Character operations are essential for input validation. Here is a function that checks whether a string is a valid identifier (starts with a letter or underscore, and contains only letters, digits, and underscores):

function IsValidIdentifier(const s: String): Boolean;
var
  i: Integer;
begin
  Result := False;
  if Length(s) = 0 then Exit;

  { First character must be a letter or underscore }
  if not ((s[1] >= 'A') and (s[1] <= 'Z') or
          (s[1] >= 'a') and (s[1] <= 'z') or
          (s[1] = '_')) then
    Exit;

  { Remaining characters: letters, digits, or underscores }
  for i := 2 to Length(s) do
  begin
    if not ((s[i] >= 'A') and (s[i] <= 'Z') or
            (s[i] >= 'a') and (s[i] <= 'z') or
            (s[i] >= '0') and (s[i] <= '9') or
            (s[i] = '_')) then
      Exit;
  end;

  Result := True;
end;

This function validates identifier names — useful in the Crypts of Pascalia parser (validating command names), in PennyWise (validating category names), and in any program that accepts structured input from users.
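
The same rule can be written more compactly with Pascal's set membership operator. The following is one possible variant, not a replacement for the version above; the names Letters, IdentChars, and IsValidIdentifierSet are our own.

```pascal
program IdentifierDemo;
{$mode objfpc}{$H+}

const
  { Characters allowed in first position, and in any later position }
  Letters    = ['A'..'Z', 'a'..'z', '_'];
  IdentChars = Letters + ['0'..'9'];

{ Hypothetical set-based variant of the identifier check. }
function IsValidIdentifierSet(const s: String): Boolean;
var
  i: Integer;
begin
  Result := False;
  { Short-circuit evaluation keeps s[1] from being read on an empty string }
  if (Length(s) = 0) or not (s[1] in Letters) then Exit;
  for i := 2 to Length(s) do
    if not (s[i] in IdentChars) then Exit;
  Result := True;
end;

begin
  WriteLn(IsValidIdentifierSet('gold_coin'));     // TRUE
  WriteLn(IsValidIdentifierSet('_tmp1'));         // TRUE
  WriteLn(IsValidIdentifierSet('2fast'));         // FALSE
  WriteLn(IsValidIdentifierSet('rusty shield'));  // FALSE
  WriteLn(IsValidIdentifierSet(''));              // FALSE
end.
```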

10.7 Building a Text Parser for Crypts of Pascalia

Now we bring everything together in our anchor example. Crypts of Pascalia is a text adventure where the player types commands like:

go north
take sword
look around
use key on door
inventory
examine rusty shield

Our parser needs to:

Read a line of input.
Normalize it (trim whitespace, convert to lowercase).
Extract the verb (first word) and the object (everything after the verb).
Handle multi-word commands like use key on door by identifying the verb, the direct object, and a preposition + indirect object.

Step 1: Normalization

function NormalizeInput(const raw: String): String;
begin
  Result := LowerCase(Trim(raw));
  { Collapse multiple spaces to single spaces }
  while Pos('  ', Result) > 0 do
    Result := StringReplace(Result, '  ', ' ', [rfReplaceAll]);
end;
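
A short test program shows what normalization does to messy input. It repeats NormalizeInput so that it compiles on its own; the brackets in the output make leading and trailing spaces visible.

```pascal
program NormalizeDemo;
{$mode objfpc}{$H+}
uses SysUtils;  { Trim, LowerCase, StringReplace }

function NormalizeInput(const raw: String): String;
begin
  Result := LowerCase(Trim(raw));
  { One rfReplaceAll pass turns four spaces into two, so we repeat
    until no double space remains }
  while Pos('  ', Result) > 0 do
    Result := StringReplace(Result, '  ', ' ', [rfReplaceAll]);
end;

begin
  WriteLn('[', NormalizeInput('  GO    North  '), ']');      // [go north]
  WriteLn('[', NormalizeInput('Take   Rusty  SWORD'), ']');  // [take rusty sword]
  WriteLn('[', NormalizeInput('     '), ']');                // []
end.
```

Note that this version collapses only spaces. A tab in the middle of a command would survive, which is acceptable for keyboard input but worth remembering if the same function is reused for file data.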

Step 2: Command Record

We define a record to hold the parsed command:

type
  TCommand = record
    Verb       : String;
    DirectObj  : String;
    Preposition: String;
    IndirectObj: String;
    IsValid    : Boolean;
  end;

Step 3: The Parser

function ParseCommand(const raw: String): TCommand;
const
  Prepositions: array[0..4] of String =
    ('on', 'with', 'at', 'to', 'from');
var
  normalized, rest: String;
  spacePos, prepPos, i: Integer;
begin
  Result.Verb := '';
  Result.DirectObj := '';
  Result.Preposition := '';
  Result.IndirectObj := '';
  Result.IsValid := False;

  normalized := NormalizeInput(raw);
  if normalized = '' then Exit;

  { Extract verb (first word) }
  spacePos := Pos(' ', normalized);
  if spacePos = 0 then
  begin
    { Single-word command like "inventory" or "quit" }
    Result.Verb := normalized;
    Result.IsValid := True;
    Exit;
  end;

  Result.Verb := Copy(normalized, 1, spacePos - 1);
  rest := Copy(normalized, spacePos + 1, Length(normalized));

  { Look for a preposition in the rest of the command }
  for i := Low(Prepositions) to High(Prepositions) do
  begin
    prepPos := Pos(' ' + Prepositions[i] + ' ', ' ' + rest + ' ');
    if prepPos > 0 then
    begin
      Result.DirectObj := Trim(Copy(rest, 1, prepPos - 1));
      Result.Preposition := Prepositions[i];
      Result.IndirectObj := Trim(Copy(rest,
        prepPos + Length(Prepositions[i]) + 1, Length(rest)));
      Result.IsValid := True;
      Exit;
    end;
  end;

  { No preposition found — everything after verb is the direct object }
  Result.DirectObj := rest;
  Result.IsValid := True;
end;
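
The preposition search relies on a small trick: padding both the word and the text with spaces so that only whole words match. The sketch below isolates that trick in a hypothetical helper, FindWholeWord, to show why the position it returns can be used directly as an index into the unpadded string.

```pascal
program WholeWordDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: returns the 1-based position of target in s when it
  appears as a whole space-delimited word, or 0 otherwise. }
function FindWholeWord(const target, s: String): Integer;
begin
  { Padding both sides lets a word at the very start or end still match.
    The match begins at the space before the word in the padded string,
    and the one-character pad makes that index equal to the word's own
    position in the unpadded string. }
  Result := Pos(' ' + target + ' ', ' ' + s + ' ');
end;

begin
  WriteLn(FindWholeWord('on', 'key on door'));   // 5
  WriteLn(FindWholeWord('on', 'bronze coin'));   // 0
  WriteLn(FindWholeWord('at', 'at'));            // 1
end.
```

One limitation of the parser's loop is worth knowing: it tries prepositions in array order, not in the order they appear in the command, so a command containing two prepositions is split at whichever comes first in the Prepositions array.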

Step 4: Dispatching Commands

With parsed commands in hand, we can dispatch to game logic:

procedure ExecuteCommand(const cmd: TCommand);
begin
  if not cmd.IsValid then
  begin
    WriteLn('I don''t understand that command.');
    Exit;
  end;

  case cmd.Verb of
    'go', 'walk', 'move':
      WriteLn('You move ', cmd.DirectObj, '.');
    'take', 'get', 'grab':
      WriteLn('You pick up the ', cmd.DirectObj, '.');
    'look', 'examine':
      if cmd.DirectObj = '' then
        WriteLn('You look around the room.')
      else
        WriteLn('You examine the ', cmd.DirectObj, '.');
    'use':
      if cmd.IndirectObj <> '' then
        WriteLn('You use the ', cmd.DirectObj, ' ',
                cmd.Preposition, ' the ', cmd.IndirectObj, '.')
      else
        WriteLn('You use the ', cmd.DirectObj, '.');
    'inventory', 'inv', 'i':
      WriteLn('You check your inventory.');
    'quit', 'exit':
      WriteLn('Farewell, adventurer!');
  else
    WriteLn('I don''t know how to "', cmd.Verb, '".');
  end;
end;

🔗 Connection to earlier chapters. This parser uses a record to hold the parsed command (records are previewed here and get full treatment in Chapter 11), an array of prepositions (Chapter 9), procedures with parameters (Chapter 8), loops and conditionals (Chapters 5–6), and of course the string functions from this chapter. This is Theme 5 in full force: algorithms (parsing), data structures (strings, records, arrays), and their combination yield a program.

Running the Parser

Here is a sample interaction:

> use key on door
You use the key on the door.
> take rusty sword
You pick up the rusty sword.
> go north
You move north.
> look
You look around the room.
> xyzzy
I don't know how to "xyzzy".

The parser is deliberately simple — it does not handle ambiguity, synonyms for objects, or articles like "the." A production text adventure would need significantly more sophistication. But this parser demonstrates the core technique: normalize, tokenize, classify, dispatch. This four-step pattern appears in everything from command-line tools to compilers, and mastering it in Pascal gives you a template that transfers to any language.

Why Four Steps?

Let us examine why each step in the normalize-tokenize-classify-dispatch pattern is necessary, and what goes wrong if you skip one:

Normalize. Without normalization, "GO NORTH", " go north ", and "Go North" are three different inputs that should all do the same thing. If you skip normalization, you need to handle every possible capitalization and whitespace variation in your dispatch logic — an exponential explosion of cases.
Tokenize. Without tokenization, you would need to search the raw string for every possible command. Pattern matching on raw text is fragile: Pos('go', input) would match "take gold" (because "gold" contains "go"). Tokenizing into words eliminates this problem because you compare whole words, not substrings.
Classify. Without classification, your dispatch code must understand the grammar of every possible sentence. By identifying the verb, objects, and prepositions as distinct roles, you separate grammar from action — the dispatch code only needs to know what to do with a verb and its objects, not how to parse English.
Dispatch. Without a clean dispatch mechanism (like our case statement on the verb), adding new commands requires modifying complex conditional logic. With dispatch, adding a new command is as simple as adding a new branch.
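
The tokenization point is easy to verify with a few lines of code. In this sketch, FirstWord is a hypothetical helper that assumes its input has already been normalized.

```pascal
program TokenizeDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: returns the first space-delimited word of s,
  assuming s is already trimmed and uses single spaces. }
function FirstWord(const s: String): String;
var
  spacePos: Integer;
begin
  spacePos := Pos(' ', s);
  if spacePos = 0 then
    Result := s
  else
    Result := Copy(s, 1, spacePos - 1);
end;

const
  Command = 'take gold';
begin
  { Substring search finds "go" inside "gold" }
  WriteLn(Pos('go', Command) > 0);          // TRUE
  { Whole-word comparison is not fooled }
  WriteLn(FirstWord(Command) = 'go');       // FALSE
  WriteLn(FirstWord('go north') = 'go');    // TRUE
end.
```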

📊 Theme 2 (Discipline Transfers). This four-step pattern is not just for text adventures. Web servers normalize URLs, tokenize path segments, classify the HTTP method, and dispatch to handler functions. Compilers normalize source code (remove comments), tokenize into lexemes, classify tokens, and dispatch to the parser. Database engines normalize queries, tokenize SQL, classify clauses, and dispatch to the query planner. The pattern is universal because it addresses a universal problem: transforming unstructured text into structured action.

10.8 Regular-Expression-Style Matching

Pascal does not include a built-in regular expression engine (though Free Pascal has the RegExpr unit available). For many tasks, however, you can build simple pattern matching with the string functions you already know. Let us build a function that matches strings against patterns containing wildcards:

* matches any sequence of characters (including empty).
? matches exactly one character.

function WildcardMatch(const pattern, text: String): Boolean;
var
  pIdx, tIdx: Integer;
  pLen, tLen: Integer;
  starPIdx, starTIdx: Integer;
begin
  pIdx := 1;
  tIdx := 1;
  pLen := Length(pattern);
  tLen := Length(text);
  starPIdx := 0;
  starTIdx := 0;

  while tIdx <= tLen do
  begin
    { Test for * first so that a literal '*' in the text is never
      mistaken for an exact match against the wildcard }
    if (pIdx <= pLen) and (pattern[pIdx] = '*') then
    begin
      { Star: remember position, try matching zero characters }
      starPIdx := pIdx;
      starTIdx := tIdx;
      Inc(pIdx);
    end
    else if (pIdx <= pLen) and ((pattern[pIdx] = '?') or
        (pattern[pIdx] = text[tIdx])) then
    begin
      { Exact match or ? wildcard }
      Inc(pIdx);
      Inc(tIdx);
    end
    else if starPIdx > 0 then
    begin
      { Backtrack: star matches one more character }
      pIdx := starPIdx + 1;
      Inc(starTIdx);
      tIdx := starTIdx;
    end
    else
      Exit(False);
  end;

  { Skip trailing stars in pattern }
  while (pIdx <= pLen) and (pattern[pIdx] = '*') do
    Inc(pIdx);

  Result := pIdx > pLen;
end;

Example usage:

WriteLn(WildcardMatch('*.txt', 'readme.txt'));     // True
WriteLn(WildcardMatch('data_???.csv', 'data_001.csv'));  // True
WriteLn(WildcardMatch('hello*world', 'hello beautiful world'));  // True
WriteLn(WildcardMatch('test', 'testing'));          // False

This algorithm uses backtracking rather than greedy matching. Each * initially tries to match zero characters. If a later comparison fails, the matcher returns to the most recent * and has it consume one more character, then tries again. Only the most recent * ever needs to be revisited, which is why two saved indices are enough. This is a simplified version of the kind of matching used in Unix shell globbing.
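
For comparison, the same matching rules can be written recursively, which makes the "try zero characters, then one more" idea explicit. This is a sketch for study rather than production use: MatchFrom and WildcardMatchRec are hypothetical names, the code relies on Free Pascal's default short-circuit Boolean evaluation, and patterns with many stars can make it very slow.

```pascal
program RecursiveWildcard;
{$mode objfpc}{$H+}

{ Hypothetical helper: does pattern[pIdx..] match text[tIdx..]? }
function MatchFrom(const pattern, text: String;
                   pIdx, tIdx: Integer): Boolean;
begin
  { Pattern exhausted: succeed only if the text is exhausted too }
  if pIdx > Length(pattern) then
    Exit(tIdx > Length(text));

  if pattern[pIdx] = '*' then
    { Either the star matches nothing, or it absorbs one more character }
    Exit(MatchFrom(pattern, text, pIdx + 1, tIdx) or
         ((tIdx <= Length(text)) and
          MatchFrom(pattern, text, pIdx, tIdx + 1)));

  { Ordinary character or ?: consume one character from each side }
  Result := (tIdx <= Length(text)) and
            ((pattern[pIdx] = '?') or (pattern[pIdx] = text[tIdx])) and
            MatchFrom(pattern, text, pIdx + 1, tIdx + 1);
end;

function WildcardMatchRec(const pattern, text: String): Boolean;
begin
  Result := MatchFrom(pattern, text, 1, 1);
end;

begin
  WriteLn(WildcardMatchRec('*.txt', 'readme.txt'));                  // TRUE
  WriteLn(WildcardMatchRec('data_???.csv', 'data_001.csv'));         // TRUE
  WriteLn(WildcardMatchRec('hello*world', 'hello beautiful world')); // TRUE
  WriteLn(WildcardMatchRec('test', 'testing'));                      // FALSE
end.
```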

A Simpler Alternative: StartsWith, EndsWith, Contains

For many practical needs, you do not need full wildcard matching. These three utility functions handle the most common cases:

function StartsWith(const s, prefix: String): Boolean;
begin
  Result := Copy(s, 1, Length(prefix)) = prefix;
end;

function EndsWith(const s, suffix: String): Boolean;
begin
  if Length(suffix) > Length(s) then
    Result := False
  else
    Result := Copy(s, Length(s) - Length(suffix) + 1,
                   Length(suffix)) = suffix;
end;

function Contains(const s, substr: String): Boolean;
begin
  Result := Pos(substr, s) > 0;
end;

These are simple, readable, and cover a surprising number of real-world matching needs — checking file extensions, finding keywords, validating prefixes and suffixes.
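
All three helpers compare case-sensitively, which matters for file extensions. A thin wrapper handles that; EndsWithIgnoreCase is a hypothetical name, and EndsWith is repeated here only so the example compiles on its own.

```pascal
program ExtensionDemo;
{$mode objfpc}{$H+}
uses SysUtils;  { LowerCase }

function EndsWith(const s, suffix: String): Boolean;
begin
  if Length(suffix) > Length(s) then
    Result := False
  else
    Result := Copy(s, Length(s) - Length(suffix) + 1,
                   Length(suffix)) = suffix;
end;

{ Hypothetical helper: case-insensitive suffix test built on EndsWith. }
function EndsWithIgnoreCase(const s, suffix: String): Boolean;
begin
  Result := EndsWith(LowerCase(s), LowerCase(suffix));
end;

begin
  WriteLn(EndsWith('REPORT.PAS', '.pas'));            // FALSE
  WriteLn(EndsWithIgnoreCase('REPORT.PAS', '.pas'));  // TRUE
  WriteLn(EndsWithIgnoreCase('pas', '.pas'));         // FALSE
end.
```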

When to Use Each Approach

Need	Best Tool
Check file extension	`EndsWith(filename, '.pas')`
Check if string contains a keyword	`Contains(text, 'error')` or `Pos('error', text) > 0`
Check string prefix	`StartsWith(line, '//')`
Match simple patterns with wildcards	`WildcardMatch('data_*.csv', filename)`
Complex patterns (email, phone number, etc.)	Use `RegExpr` unit or hand-written state machine

For the Crypts of Pascalia text adventure, StartsWith is useful for accepting directional abbreviations: StartsWith('north', cmd.DirectObj) is true when the player types "n", "nor", or "north" (check for an empty object first, since every string starts with the empty string). Contains is useful for searching item descriptions. The wildcard matcher might be used for a game save-file feature that lets the player load files matching 'save_*.dat'.

10.9 Project Checkpoint: PennyWise CSV Import

It is time to apply everything we have learned to the PennyWise personal-finance application. PennyWise needs to import bank transaction data from CSV files. A typical CSV line looks like:

2025-03-15,Groceries,-42.99,Whole Foods Market
2025-03-14,Income,2500.00,Paycheck - ACME Corp
2025-03-13,Dining,-18.50,"Joe's Pizza, Pasta & More"

Note the challenges:

Fields are separated by commas.
Some fields may be quoted (containing commas within the quotes).
Amounts can be negative (expenses) or positive (income).
We need to convert date strings and amount strings to usable data.

Data Structure

type
  TTransaction = record
    TransDate  : String;
    Category   : String;
    Amount     : Real;
    Description: String;
  end;

  TTransactionArray = array[0..999] of TTransaction;

CSV Line Parser (Handling Quoted Fields)

function ParseCSVField(const line: String; var pos: Integer): String;
var
  inQuotes: Boolean;
begin
  Result := '';
  if pos > Length(line) then Exit;

  inQuotes := False;
  if line[pos] = '"' then
  begin
    inQuotes := True;
    Inc(pos);  { Skip opening quote }
  end;

  while pos <= Length(line) do
  begin
    if inQuotes then
    begin
      if line[pos] = '"' then
      begin
        { Check for escaped quote ("") }
        if (pos < Length(line)) and (line[pos + 1] = '"') then
        begin
          Result := Result + '"';
          Inc(pos, 2);
        end
        else
        begin
          Inc(pos);  { Skip closing quote }
          { Skip comma after closing quote }
          if (pos <= Length(line)) and (line[pos] = ',') then
            Inc(pos);
          Exit;
        end;
      end
      else
      begin
        Result := Result + line[pos];
        Inc(pos);
      end;
    end
    else
    begin
      if line[pos] = ',' then
      begin
        Inc(pos);  { Skip comma }
        Exit;
      end
      else
      begin
        Result := Result + line[pos];
        Inc(pos);
      end;
    end;
  end;
end;
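
To experiment with the quoting rules in isolation, it can help to split a whole line at once. The sketch below is a compact variant of the same state machine, not the function above: TFieldList and SplitCSVLine are hypothetical names, and unlike ParseCSVField it treats a quote anywhere in a field as toggling quoted mode.

```pascal
program SplitCSVDemo;
{$mode objfpc}{$H+}

type
  TFieldList = array of String;

{ Hypothetical helper: splits one CSV line into fields, honoring double
  quotes and the "" escape inside quoted fields. }
function SplitCSVLine(const line: String): TFieldList;
var
  i, n: Integer;
  inQuotes: Boolean;
  field: String;
begin
  Result := nil;
  n := 0;
  field := '';
  inQuotes := False;
  i := 1;
  while i <= Length(line) do
  begin
    if inQuotes then
    begin
      if line[i] <> '"' then
        field := field + line[i]
      else if (i < Length(line)) and (line[i + 1] = '"') then
      begin
        field := field + '"';   { escaped quote: keep one, skip the other }
        Inc(i);
      end
      else
        inQuotes := False;      { closing quote }
    end
    else if line[i] = '"' then
      inQuotes := True          { opening quote }
    else if line[i] = ',' then
    begin
      { Comma outside quotes ends the current field }
      SetLength(Result, n + 1);
      Result[n] := field;
      Inc(n);
      field := '';
    end
    else
      field := field + line[i];
    Inc(i);
  end;
  { The last field has no trailing comma }
  SetLength(Result, n + 1);
  Result[n] := field;
end;

var
  fields: TFieldList;
  k: Integer;
begin
  fields := SplitCSVLine(
    '2025-03-13,Dining,-18.50,"Joe''s Pizza, Pasta & More"');
  for k := 0 to High(fields) do
    WriteLn(k, ': ', fields[k]);
  { Prints four fields; field 3 is Joe's Pizza, Pasta & More }
end.
```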

Import Procedure

function ImportCSV(const filename: String;
                   var transactions: TTransactionArray;
                   var count: Integer): Boolean;
var
  f: TextFile;
  line: String;
  pos, code: Integer;
  amountStr: String;
begin
  Result := False;
  count := 0;

  AssignFile(f, filename);
  {$I-}
  Reset(f);
  {$I+}
  if IOResult <> 0 then
  begin
    WriteLn('Error: Cannot open file ', filename);
    Exit;
  end;

  while not Eof(f) and (count <= High(transactions)) do
  begin
    ReadLn(f, line);
    line := Trim(line);
    if line = '' then Continue;

    pos := 1;
    transactions[count].TransDate   := ParseCSVField(line, pos);
    transactions[count].Category    := ParseCSVField(line, pos);
    amountStr                       := ParseCSVField(line, pos);
    transactions[count].Description := ParseCSVField(line, pos);

    Val(amountStr, transactions[count].Amount, code);
    if code <> 0 then
    begin
      WriteLn('Warning: Invalid amount "', amountStr,
              '", skipping this line.');
      Continue;
    end;

    Inc(count);
  end;

  CloseFile(f);
  Result := True;
  WriteLn('Imported ', count, ' transactions from ', filename);
end;
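
The amount check in ImportCSV hinges on the error code that Val returns. A minimal sketch of that check in isolation follows; TryParseAmount is a hypothetical wrapper, and the Trim call is our own addition to tolerate stray spaces around the number.

```pascal
program ValDemo;
{$mode objfpc}{$H+}
uses SysUtils;  { Trim }

{ Hypothetical helper: wraps Val and reports success as a Boolean. }
function TryParseAmount(const s: String; out amount: Real): Boolean;
var
  code: Integer;
begin
  { Val sets code to 0 on success, or to the position of the first
    character it could not convert }
  Val(Trim(s), amount, code);
  Result := code = 0;
end;

var
  amount: Real;
begin
  if TryParseAmount(' -42.99 ', amount) then
    WriteLn(amount:0:2);                       // -42.99
  WriteLn(TryParseAmount('12a.50', amount));   // FALSE
end.
```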

Category Search by Substring

Once transactions are imported, a user might want to find all transactions matching a category keyword:

procedure SearchByCategory(const transactions: TTransactionArray;
                           count: Integer;
                           const keyword: String);
var
  i: Integer;
  total: Real;
  matches: Integer;
  lowerKeyword: String;
begin
  total := 0;
  matches := 0;
  lowerKeyword := LowerCase(keyword);

  WriteLn;
  WriteLn('--- Transactions matching "', keyword, '" ---');
  WriteLn(Format('%-12s %-15s %10s  %s',
                 ['Date', 'Category', 'Amount', 'Description']));
  WriteLn(StringOfChar('-', 60));

  for i := 0 to count - 1 do
  begin
    if Pos(lowerKeyword, LowerCase(transactions[i].Category)) > 0 then
    begin
      WriteLn(Format('%-12s %-15s %10.2f  %s',
                     [transactions[i].TransDate,
                      transactions[i].Category,
                      transactions[i].Amount,
                      transactions[i].Description]));
      total := total + transactions[i].Amount;
      Inc(matches);
    end;
  end;

  WriteLn(StringOfChar('-', 60));
  WriteLn(Format('Total: %d transactions, $%.2f', [matches, total]));
end;

This checkpoint demonstrates several key skills working together:

String splitting (the ParseCSVField function uses a state machine for quoted fields).
String-to-number conversion (Val with error checking).
Substring search (Pos with LowerCase for case-insensitive matching).
Formatted output (Format with width specifiers for tabular display).

✅ Checkpoint Verification. After implementing this code, test with a small CSV file containing at least one quoted field with a comma inside it, one negative amount, and one blank line. Your program should handle all three correctly.
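
One way to produce such a test file is to generate it from a short program. This is only a sketch; the file name pennywise_test.csv is a hypothetical choice, and the rows reuse the sample data shown earlier in this section.

```pascal
program MakeTestCSV;
{$mode objfpc}{$H+}

const
  TestFile = 'pennywise_test.csv';   { hypothetical file name }
var
  f: TextFile;
begin
  AssignFile(f, TestFile);
  Rewrite(f);   { create the file, overwriting any existing copy }
  { A negative amount }
  WriteLn(f, '2025-03-15,Groceries,-42.99,Whole Foods Market');
  { A blank line, which the importer should skip }
  WriteLn(f);
  { A quoted field containing a comma; '' is an escaped apostrophe }
  WriteLn(f, '2025-03-13,Dining,-18.50,"Joe''s Pizza, Pasta & More"');
  WriteLn(f, '2025-03-14,Income,2500.00,Paycheck - ACME Corp');
  CloseFile(f);
  WriteLn('Wrote ', TestFile);
end.
```

Importing this file should report three transactions, with the comma inside the Dining description preserved.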

What We Built and Why It Matters

Let us step back and appreciate what we have accomplished in this checkpoint. We started with a raw text file — just bytes on disk — and through a sequence of string operations, we extracted structured, typed data that we can compute with. This is the essence of data parsing, and it is a skill you will use in nearly every non-trivial program.

Between them, ParseCSVField, ImportCSV, and SearchByCategory demonstrate five concepts from this chapter working together:

Character-by-character iteration (Section 10.6): the while pos <= Length(line) loop in ParseCSVField.
State machine logic (Section 10.4): the inQuotes flag tracks whether we are inside a quoted field.
String-to-number conversion (Section 10.5): Val converts the amount field in ImportCSV.
Substring search (Section 10.3): Pos, combined with LowerCase, enables case-insensitive category matching in SearchByCategory.
Formatted output (Section 10.5): Format produces the aligned table display.

This is Theme 5 in its purest form: algorithms (parsing, searching, formatting) operating on data structures (strings, records, arrays) to produce a working program. None of these operations is complicated in isolation. Their power comes from composition — combining simple operations into a pipeline that transforms raw text into meaningful information.

10.10 Chapter Summary

Strings are among the most versatile and frequently used data structures in programming, and this chapter covered a vast amount of ground. Let us consolidate the key ideas.

String Types. Free Pascal offers multiple string types: ShortString (fixed-size and stored inline rather than on the heap, 255-character max), AnsiString (heap-allocated, dynamically sized, reference-counted), UnicodeString (UTF-16), and PChar (null-terminated, C-compatible). Using {$H+} makes the plain String type behave as AnsiString, which is the most practical choice for most programs. Remember that ShortString silently truncates strings that exceed its declared capacity, a dangerous behavior that AnsiString avoids entirely.

Basic Operations. Strings support concatenation with +, comparison with relational operators (lexicographic, case-sensitive), and indexing starting at 1. Individual characters can be read and written via bracket notation.

Standard Functions. Pascal provides a rich toolkit: Length for size, Copy for substrings, Pos for searching, Insert and Delete for in-place modification, UpperCase/LowerCase for case conversion, Trim for whitespace removal, and StringReplace for find-and-replace.

Processing Patterns. We built fundamental patterns — splitting strings by delimiters, extracting words with state machines, reversing strings, checking palindromes, and finding all occurrences of a substring. These patterns recur across every domain of programming.

Conversions. The classic Str/Val procedures and the SysUtils functions (IntToStr, StrToInt, FloatToStr, StrToFloat, Format) bridge the gap between textual and numeric data. Val is particularly valuable because it reports the exact position of conversion errors.

Character Operations. Characters are ordinal values. Ord and Chr convert between characters and their numeric codes. Character arithmetic enables everything from case conversion to encryption.

Text Parsing. Our Crypts of Pascalia command parser demonstrated the normalize-tokenize-classify-dispatch pattern. This four-step approach applies to command-line tools, data file parsers, and even simple compilers.

Pattern Matching. We built a wildcard matching function supporting * and ?, along with simpler StartsWith, EndsWith, and Contains helpers.

CSV Import. The PennyWise project checkpoint combined all these skills into a practical CSV parser that handles quoted fields, validates numeric data, and supports case-insensitive category search. This checkpoint demonstrated how string processing serves as the bridge between raw external data and the typed, structured internal representations that programs need to do useful work.

Spaced Review

From Chapter 8: Explain what happens on the call stack when procedure A calls procedure B.

When procedure A calls B, the runtime pushes a new stack frame onto the call stack. This frame contains B's local variables, parameters, and the return address (the instruction in A to resume after B finishes). When B completes, its frame is popped, restoring A's context. If B calls another procedure C, a third frame is pushed on top of B's frame; the stack grows and shrinks in strict last-in-first-out order.

From Chapter 6: Write a WHILE loop that reads strings until the user enters 'quit':

var input: String;
begin
  Write('Enter text (quit to stop): ');
  ReadLn(input);
  while LowerCase(Trim(input)) <> 'quit' do
  begin
    WriteLn('You entered: ', input);
    Write('Enter text (quit to stop): ');
    ReadLn(input);
  end;
  WriteLn('Goodbye!');
end.

Looking Ahead

In Chapter 11, we will explore records — custom data structures that group related fields of different types under a single name. You have already seen a preview in this chapter with TCommand and TTransaction. Records are the Pascal mechanism for modeling real-world entities: a student with a name, ID, and GPA; a transaction with a date, amount, and description; an inventory item with a name, weight, and magical properties. Combined with the string processing skills from this chapter, records will let us build truly structured programs.