Chapter 10 Key Takeaways

Core Concepts

  1. Pascal has multiple string types. ShortString is a fixed-size value type stored inline (on the stack for local variables) with a 255-character limit; AnsiString is heap-allocated, dynamically sized, and reference-counted. With {$H+} in effect, the plain string keyword means AnsiString. For most programs, AnsiString is the practical choice.

  2. Strings are 1-indexed. The first character of a string s is s[1], not s[0]. This holds for both ShortString and AnsiString (in a ShortString, s[0] holds the length byte, not a character) and differs from C-family languages.

  3. String comparison is lexicographic and case-sensitive. 'Zebra' < 'apple' is True because uppercase letters have lower ordinal values than lowercase. Use LowerCase or UpperCase before comparison for case-insensitive matching.

  4. Know your functions vs. procedures. Copy, UpperCase, LowerCase, and Trim are functions that return new strings, and Pos is a function that returns an integer position. None of them change their arguments. Insert and Delete are procedures that modify the string variable in place.

  5. Pos returns 0 on failure. This is the standard "not found" signal in Pascal string searching. Always check for 0 before using the result as an index.

  6. Val gives you error position; StrToInt/StrToFloat give you convenience. Val reports the index of the first invalid character through its third parameter (0 means success), so use it when parsing untrusted data (files, user input) where you need to report precisely what went wrong. Use StrToIntDef/StrToFloatDef when you want a default value on failure.

  7. Characters are ordinal values. Ord and Chr bridge the gap between characters and integers. Character arithmetic (Ord(c) - Ord('A')) is the foundation for case conversion, digit parsing, and simple ciphers.

  8. State machines are the reliable way to parse text. Whether you are parsing commands, CSV lines, or any structured text, tracking your current state (inWord, inQuotes, etc.) and transitioning based on each character produces robust, readable parsers.

  9. The normalize-tokenize-classify-dispatch pattern applies to almost any command processor. Normalize the input (trim, lowercase, collapse spaces), tokenize it (split into parts), classify the parts (verb, object, preposition), and dispatch to the appropriate handler.

  10. CSV is harder than "split on commas." Quoted fields, escaped quotes, empty fields, and malformed rows all require careful handling. A state machine parser handles these edge cases cleanly.
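
The state-machine idea in items 8 and 10 can be sketched as follows. This is a minimal illustration, assuming Free Pascal with {$H+} and a line that contains no embedded line breaks. The names TFieldList and SplitCsvLine are hypothetical, chosen for this sketch, and an unterminated quote is treated leniently rather than reported as an error.

```pascal
program CsvSketch;
{$mode objfpc}{$H+}

type
  TFieldList = array of string;   // hypothetical name for this sketch

// Splits one CSV line into fields using a single piece of state (inQuotes).
// Handles quoted fields, doubled quotes inside quoted fields, and empty
// fields. An unterminated quote simply ends the field at end of line.
function SplitCsvLine(const line: string): TFieldList;
var
  i, count: Integer;
  inQuotes, atEnd: Boolean;
  field: string;
begin
  Result := nil;
  count := 0;
  field := '';
  inQuotes := False;
  i := 1;
  // One extra iteration past the last character flushes the final field.
  while i <= Length(line) + 1 do
  begin
    atEnd := i > Length(line);
    if inQuotes and not atEnd then
    begin
      if line[i] <> '"' then
        field := field + line[i]
      else if (i < Length(line)) and (line[i + 1] = '"') then
      begin
        field := field + '"';     // doubled quote means one literal quote
        Inc(i);                   // skip the second quote of the pair
      end
      else
        inQuotes := False;        // closing quote
    end
    else if atEnd or (line[i] = ',') then
    begin
      // Field boundary: store the accumulated field and start a new one.
      Inc(count);
      SetLength(Result, count);
      Result[count - 1] := field;
      field := '';
    end
    else if line[i] = '"' then
      inQuotes := True            // opening quote
    else
      field := field + line[i];
    Inc(i);
  end;
end;

var
  fields: TFieldList;
  k: Integer;
begin
  fields := SplitCsvLine('42,"Smith, Jane","said ""hi""",,end');
  for k := 0 to High(fields) do
    WriteLn(k + 1, ': [', fields[k], ']');
end.
```

For the sample line, the five fields are 42, Smith, Jane (with its comma preserved), said "hi", an empty field, and end. A plain split on commas would have produced six pieces and kept the quote characters.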

Common Pitfalls

  • Off-by-one errors with Copy and Pos. When combining the two (e.g., finding all occurrences), keep careful track of whether positions are relative to the original string or to a substring.
  • Forgetting that Insert and Delete modify in place. They do not return a new string — they change the variable you pass to them.
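  • Assuming that string comparison is case-insensitive. It is not. Always explicitly convert both strings to a common case first.
  • Building long strings by repeated concatenation in a loop. This works, but each concatenation may reallocate and copy the string, which can be slow for very long strings. For performance-critical code where the final length is known, call SetLength once and assign characters by index.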
  • Using StrToInt on untrusted input without error handling. It raises an exception on invalid input. Use Val or StrToIntDef instead.
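
Several of these pitfalls are easier to see in code. The sketch below assumes Free Pascal with {$H+} and the SysUtils unit; PrintAllPositions is a hypothetical helper written for this illustration. It shows how to convert a substring-relative Pos result back to a position in the original string, how Val and StrToIntDef handle bad input, and how to fill a preallocated string by index.

```pascal
program PitfallSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Prints every 1-based position of sub within s. Each search runs on the
// remaining tail of s, so the tail-relative result must be shifted by the
// number of characters already skipped.
procedure PrintAllPositions(const sub, s: string);
var
  offset, rel: Integer;
begin
  if sub = '' then
    Exit;
  offset := 0;                     // characters of s already skipped
  repeat
    rel := Pos(sub, Copy(s, offset + 1, Length(s) - offset));
    if rel > 0 then
    begin
      WriteLn('found at ', offset + rel);   // absolute position in s
      offset := offset + rel;               // resume after the start of this hit
    end;
  until rel = 0;
end;

var
  n, code, i: Integer;
  s: string;
begin
  PrintAllPositions('an', 'banana');        // found at 2, found at 4

  // Val reports where parsing failed instead of raising an exception.
  Val('12x4', n, code);
  if code <> 0 then
    WriteLn('invalid character at position ', code);   // position 3

  // StrToIntDef substitutes a default value on invalid input.
  WriteLn(StrToIntDef('abc', -1));          // -1

  // Preallocate once, then assign by index instead of concatenating.
  SetLength(s, 5);
  for i := 1 to 5 do
    s[i] := Chr(Ord('a') + i - 1);
  WriteLn(s);                               // abcde
end.
```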

Threshold Concept

Strings are both arrays and abstractions. At the lowest level, a string is just an array of characters, each stored as a byte in ShortString and AnsiString. But the string functions (Pos, Copy, Insert, Delete, etc.) let you think at a higher level: searching for patterns, extracting substrings, transforming text. The ability to shift between these levels of abstraction, seeing the characters when you need to and seeing the meaning when you need to, is a skill that transfers to every programming language and every text-processing task you will ever encounter.
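
As a small illustration of the two levels, the sketch below does the same uppercase conversion twice: once by indexing characters and using Ord/Chr arithmetic, and once with the library function. It then edits the string through Pos, Delete, and Insert. It assumes Free Pascal with {$H+} and plain ASCII text; ManualUpper is a hypothetical name used only here.

```pascal
program TwoLevelsSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Array-level view: walk the characters by index and shift lowercase ASCII
// letters into the uppercase range with Ord/Chr arithmetic.
function ManualUpper(const s: string): string;
var
  i: Integer;
begin
  Result := s;
  for i := 1 to Length(Result) do
    if (Result[i] >= 'a') and (Result[i] <= 'z') then
      Result[i] := Chr(Ord(Result[i]) - Ord('a') + Ord('A'));
end;

var
  s: string;
  p: Integer;
begin
  s := 'hello world';
  WriteLn(s[1]);                  // h (the first character is at index 1)
  WriteLn(ManualUpper(s));        // HELLO WORLD, built character by character
  WriteLn(UpperCase(s));          // HELLO WORLD, the abstraction-level view

  // Abstraction-level editing: find a word, then replace it in place.
  p := Pos('world', s);
  if p > 0 then                   // Pos returns 0 when nothing is found
  begin
    Delete(s, p, Length('world'));   // modifies s directly
    Insert('Pascal', s, p);          // also modifies s directly
  end;
  WriteLn(s);                     // hello Pascal
end.
```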