Learning Objectives

  • Read and write INI configuration files using TINIFile
  • Parse and generate JSON data using the fpjson unit
  • Read and write XML documents using the DOM parser
  • Parse and generate CSV files with proper handling of edge cases
  • Design custom binary file formats with versioning
  • Implement serialization patterns for converting objects to and from file formats
  • Add JSON, XML, and CSV export/import capabilities to PennyWise

Chapter 34: File Formats and Serialization: INI, JSON, XML, CSV, and Custom Binary Formats

"The nice thing about standards is that you have so many to choose from." — Andrew S. Tanenbaum


34.1 Beyond Typed Files: Modern Data Formats

In Chapter 13, we learned to save and load data using Pascal's typed files — file of TExpense, file of TStudent. That was an excellent starting point. Typed files are fast, simple, and perfectly matched to Pascal's record system. But they have a fundamental limitation: they are opaque. Open a typed file in a text editor and you see gibberish. Open it in a spreadsheet and you see nothing useful. Try to read it from a Python script, a JavaScript application, or a web service, and you are out of luck. Typed files are Pascal talking to Pascal. The rest of the world does not speak that language.

Modern software lives in an ecosystem. Your Pascal program needs to read configuration files written by system administrators. It needs to export data that spreadsheets can open. It needs to exchange information with web services that speak JSON. It needs to parse bank statements that arrive as CSV files. It needs to generate reports that other applications can consume.

This chapter teaches you to speak the common languages of data interchange: INI for configuration, JSON for structured data, XML for document-oriented data, and CSV for tabular data. We will also revisit binary formats for situations where performance matters more than interoperability. By the end of this chapter, PennyWise will be able to export its data in any of these formats and import bank transaction data from CSV files — making it a genuine participant in the broader software ecosystem.

💡 Intuition: Data Formats as Languages. Think of each file format as a human language. Pascal's typed files are like a private family dialect — perfectly efficient for internal communication but unintelligible to outsiders. JSON is like English — widely understood, flexible, good enough for most purposes. XML is like legal English — verbose but precise, with strict rules about structure. CSV is like a simple numbered list — minimal structure but universally readable. INI is like a labeled filing system — simple, flat, perfect for key-value configuration.

When to Use What

Before diving into specifics, here is a decision guide:

| Format | Best For | Strengths | Weaknesses |
|--------|----------|-----------|------------|
| INI | Configuration files | Simple, human-readable, widely supported | Flat structure only (no nesting) |
| JSON | Data exchange, APIs, config | Structured, widely supported, lightweight | No comments (officially), no date type |
| XML | Documents, complex schemas | Self-describing, schema validation, namespaces | Verbose, complex parser |
| CSV | Tabular data, spreadsheets | Universal, tiny overhead, human-readable | No types, no nesting, quoting edge cases |
| Binary | Performance-critical, large data | Fastest read/write, compact | Not human-readable, version-fragile |

34.2 INI Files

INI files are the simplest structured file format: plain-text files organized into sections, each containing key-value pairs. They have been used for configuration since the earliest days of Windows, and they remain popular because they are almost trivially easy to read and write, both by humans and by programs.

The appeal of INI files is their simplicity. A non-programmer — a system administrator, a power user, even a curious end user — can open an INI file in Notepad, understand its structure immediately, and make changes. Try doing that with an XML file full of angle brackets and closing tags, or a JSON file where a misplaced comma causes a parse error. INI files are forgiving, human-readable, and sufficient for a surprising number of configuration needs.

Free Pascal's IniFiles unit provides the TINIFile class, which handles all the parsing, caching, and writing automatically. You never need to parse an INI file manually.

INI File Structure

[General]
AppName=PennyWise
Version=3.4
Language=English

[Database]
Filename=pennywise.db
AutoSave=true
AutoSaveInterval=30

[Display]
Currency=$
DateFormat=yyyy-mm-dd
ShowCents=true
ItemsPerPage=25

[Sync]
ServerURL=https://api.pennywise.example.com
Enabled=false
Interval=300

The format is self-explanatory: sections in square brackets, key=value pairs beneath each section. Comments start with ; or # (depending on the implementation).
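For example, a commented version of the [Database] section might look like this (a sketch; semicolon comments are the most widely supported variant):

; Database settings for PennyWise
[Database]
Filename=pennywise.db
; Save automatically? (true/false)
AutoSave=true
; Interval between automatic saves, in seconds
AutoSaveInterval=30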

A Complete TINIFile Example

Here is a complete program that demonstrates reading, writing, enumerating sections, and managing a PennyWise configuration file:

program INIDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, IniFiles;

var
  INI: TINIFile;
  AppName, DbFile, DateFmt: string;
  AutoSave: Boolean;
  Interval, ItemsPerPage: Integer;
  Sections: TStringList;
  Keys: TStringList;
  I, J: Integer;
begin
  INI := TINIFile.Create('pennywise.ini');
  try
    { Reading values with defaults }
    AppName      := INI.ReadString('General', 'AppName', 'PennyWise');
    DbFile       := INI.ReadString('Database', 'Filename', 'data.db');
    AutoSave     := INI.ReadBool('Database', 'AutoSave', True);
    Interval     := INI.ReadInteger('Database', 'AutoSaveInterval', 60);
    DateFmt      := INI.ReadString('Display', 'DateFormat', 'dd/mm/yyyy');
    ItemsPerPage := INI.ReadInteger('Display', 'ItemsPerPage', 20);

    WriteLn('Application: ', AppName);
    WriteLn('Database: ', DbFile);
    WriteLn('Auto-save: ', AutoSave, ' every ', Interval, 's');
    WriteLn('Date format: ', DateFmt);
    WriteLn('Items per page: ', ItemsPerPage);

    { Writing values }
    INI.WriteString('General', 'LastOpened', DateTimeToStr(Now));
    INI.WriteInteger('Display', 'ItemsPerPage', 30);
    INI.WriteBool('Sync', 'Enabled', True);

    { Enumerating all sections and keys }
    WriteLn;
    WriteLn('--- Full INI Contents ---');
    Sections := TStringList.Create;
    Keys := TStringList.Create;
    try
      INI.ReadSections(Sections);
      for I := 0 to Sections.Count - 1 do
      begin
        WriteLn('[', Sections[I], ']');
        INI.ReadSection(Sections[I], Keys);
        for J := 0 to Keys.Count - 1 do
          WriteLn('  ', Keys[J], '=',
            INI.ReadString(Sections[I], Keys[J], ''));
      end;
    finally
      Keys.Free;
      Sections.Free;
    end;

    { Force write to disk }
    INI.UpdateFile;
  finally
    INI.Free;
  end;
end.

Key TINIFile Methods

| Method | Purpose |
|--------|---------|
| ReadString(Section, Key, Default) | Read a string value, returning Default if not found |
| ReadInteger(Section, Key, Default) | Read an integer value |
| ReadBool(Section, Key, Default) | Read a boolean value |
| ReadFloat(Section, Key, Default) | Read a floating-point value |
| WriteString(Section, Key, Value) | Write a string value |
| WriteInteger(Section, Key, Value) | Write an integer value |
| WriteBool(Section, Key, Value) | Write a boolean value |
| UpdateFile | Force pending writes to disk |
| SectionExists(Section) | Check if a section exists |
| ValueExists(Section, Key) | Check if a key exists in a section |
| ReadSection(Section, Strings) | Read all keys in a section into a TStringList |
| ReadSections(Strings) | Read all section names into a TStringList |
| EraseSection(Section) | Delete an entire section |
| DeleteKey(Section, Key) | Delete a single key |

Every Read method takes a default value as its third parameter. This is a critical design feature: if the INI file is missing, corrupt, or lacks a particular key, the program gets a sensible default rather than crashing. Always provide meaningful defaults.

⚠️ Common Mistake: Forgetting UpdateFile. TINIFile buffers writes in memory. Changes are not written to disk until you call UpdateFile or the object is freed. If your program crashes between writing values and freeing the object, the changes are lost. Call UpdateFile after important writes.
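The existence checks and deletion methods from the table above follow the same pattern. Here is a brief sketch (the 'LegacyToken' key and 'Obsolete' section are hypothetical names, used only for illustration):

if INI.SectionExists('Sync') then
begin
  { Delete a single key, but only if it is actually present }
  if INI.ValueExists('Sync', 'LegacyToken') then
    INI.DeleteKey('Sync', 'LegacyToken');
end
else
  WriteLn('No [Sync] section found; defaults will be used.');

{ Remove an entire section in one call }
INI.EraseSection('Obsolete');
INI.UpdateFile;  { Persist the deletions to disk }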

A PennyWise Configuration Manager

In practice, you wrap TINIFile in a configuration class that provides typed, validated access to all settings:

type
  TPennyWiseConfig = class
  private
    FINI: TINIFile;
    FFilename: string;
  public
    constructor Create(const AFilename: string);
    destructor Destroy; override;
    { General settings }
    function GetLanguage: string;
    procedure SetLanguage(const Value: string);
    { Database settings }
    function GetDBPath: string;
    function GetAutoSaveInterval: Integer;
    function GetAutoSaveEnabled: Boolean;
    { Display settings }
    function GetCurrencySymbol: string;
    function GetDateFormat: string;
    function GetItemsPerPage: Integer;
    { Sync settings }
    function GetSyncURL: string;
    function GetSyncEnabled: Boolean;
    function GetSyncInterval: Integer;
    procedure Save;
  end;

constructor TPennyWiseConfig.Create(const AFilename: string);
begin
  inherited Create;
  FFilename := AFilename;
  FINI := TINIFile.Create(FFilename);
end;

destructor TPennyWiseConfig.Destroy;
begin
  FINI.UpdateFile;
  FINI.Free;
  inherited;
end;

function TPennyWiseConfig.GetLanguage: string;
begin
  Result := FINI.ReadString('General', 'Language', 'English');
end;

function TPennyWiseConfig.GetDBPath: string;
begin
  Result := FINI.ReadString('Database', 'Filename', 'pennywise.db');
end;

function TPennyWiseConfig.GetAutoSaveInterval: Integer;
begin
  Result := FINI.ReadInteger('Database', 'AutoSaveInterval', 60);
  if Result < 5 then Result := 5;      { Minimum 5 seconds }
  if Result > 3600 then Result := 3600; { Maximum 1 hour }
end;

function TPennyWiseConfig.GetAutoSaveEnabled: Boolean;
begin
  Result := FINI.ReadBool('Database', 'AutoSave', True);
end;

function TPennyWiseConfig.GetCurrencySymbol: string;
begin
  Result := FINI.ReadString('Display', 'Currency', '$');
end;

function TPennyWiseConfig.GetDateFormat: string;
begin
  Result := FINI.ReadString('Display', 'DateFormat', 'yyyy-mm-dd');
end;

function TPennyWiseConfig.GetItemsPerPage: Integer;
begin
  Result := FINI.ReadInteger('Display', 'ItemsPerPage', 25);
  if Result < 5 then Result := 5;
  if Result > 200 then Result := 200;
end;

function TPennyWiseConfig.GetSyncURL: string;
begin
  Result := FINI.ReadString('Sync', 'ServerURL', '');
end;

function TPennyWiseConfig.GetSyncEnabled: Boolean;
begin
  Result := FINI.ReadBool('Sync', 'Enabled', False);
end;

function TPennyWiseConfig.GetSyncInterval: Integer;
begin
  Result := FINI.ReadInteger('Sync', 'Interval', 300);
end;

procedure TPennyWiseConfig.Save;
begin
  FINI.UpdateFile;
end;

Notice the validation in GetAutoSaveInterval and GetItemsPerPage — the configuration manager enforces sensible bounds even if the INI file contains unreasonable values. This defensive programming prevents a hand-edited configuration file from crashing the application.

Limitations of INI Files

INI files are flat — they do not support nesting. You cannot have a section within a section, or an array of values under a single key. If you need hierarchical configuration, use JSON. If you need only key-value pairs organized into groups, INI files are ideal.
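A common convention works around the lack of arrays by using numbered keys plus a count, though this is a workaround rather than real nesting. A sketch (the section and file names are illustrative):

[RecentFiles]
Count=3
File1=march_budget.csv
File2=february_budget.csv
File3=taxes_2025.json

The reading side loops from 1 to Count, calling ReadString('RecentFiles', 'File' + IntToStr(I), '') for each entry. If you find yourself building deep structures this way, that is a sign to switch to JSON.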


34.3 JSON

JSON (JavaScript Object Notation) has become the dominant data interchange format on the internet. It is used by virtually every REST API, many configuration systems (VS Code settings, npm package.json, tsconfig.json), and countless data pipelines. Despite its name, JSON is language-independent — every major programming language can read and write it, including Pascal.

JSON was created by Douglas Crockford in the early 2000s as a lightweight alternative to XML for data exchange. It draws its syntax from JavaScript object literals, but it has become the universal data interchange format, used far beyond the web. Its success comes from three properties: it is simple (only six data types), it is human-readable (much cleaner than XML), and it is universally supported (every language has a JSON library).

Free Pascal provides two units for JSON processing: fpjson (the data model: objects, arrays, values) and jsonparser (the parser that converts JSON text into fpjson objects). Together, they give you complete JSON support — reading, writing, building, navigating, and formatting.

JSON Structure

{
  "name": "Rosa Martinelli",
  "age": 28,
  "freelancer": true,
  "skills": ["graphic design", "illustration", "branding"],
  "address": {
    "city": "Portland",
    "state": "OR"
  },
  "expenses": [
    {"description": "Groceries", "amount": 85.50, "category": "food"},
    {"description": "Bus pass", "amount": 45.00, "category": "transport"}
  ]
}

JSON has six data types: string, number, boolean (true/false), null, object (key-value pairs in {}), and array (ordered list in []). Objects and arrays can be nested to any depth.

Reading JSON with fpjson

Free Pascal includes the fpjson and jsonparser units for working with JSON:

program ReadJSON;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, fpjson, jsonparser;

var
  JSONStr: string;
  Data: TJSONData;
  Obj: TJSONObject;
  Expenses: TJSONArray;
  Expense: TJSONObject;
  I: Integer;
begin
  { Read JSON from a string }
  JSONStr := '{"name":"Rosa","expenses":[' +
    '{"desc":"Groceries","amount":85.50},' +
    '{"desc":"Bus pass","amount":45.00}]}';

  Data := GetJSON(JSONStr);
  try
    Obj := Data as TJSONObject;

    WriteLn('Name: ', Obj.Get('name', ''));

    Expenses := Obj.Arrays['expenses'];
    WriteLn('Expenses: ', Expenses.Count);

    for I := 0 to Expenses.Count - 1 do
    begin
      Expense := Expenses.Objects[I];
      WriteLn('  ', Expense.Get('desc', ''),
              ': $', Expense.Get('amount', 0.0):0:2);
    end;
  finally
    Data.Free;
  end;
end.

Reading JSON from a File

function LoadJSONFromFile(const Filename: string): TJSONData;
var
  F: TFileStream;
  Parser: TJSONParser;
begin
  F := TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite);
  try
    Parser := TJSONParser.Create(F);
    try
      Result := Parser.Parse;
    finally
      Parser.Free;
    end;
  finally
    F.Free;
  end;
end;

Writing JSON

Building JSON programmatically uses TJSONObject and TJSONArray:

program WriteJSON;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, fpjson;

var
  Root: TJSONObject;
  Expenses: TJSONArray;
  Expense: TJSONObject;
begin
  Root := TJSONObject.Create;
  try
    Root.Add('name', 'Rosa Martinelli');
    Root.Add('version', '3.4');
    Root.Add('generated', DateTimeToStr(Now));

    Expenses := TJSONArray.Create;

    Expense := TJSONObject.Create;
    Expense.Add('description', 'Groceries');
    Expense.Add('amount', 85.50);
    Expense.Add('category', 'food');
    Expense.Add('date', FormatDateTime('yyyy-mm-dd', Now));
    Expenses.Add(Expense);

    Expense := TJSONObject.Create;
    Expense.Add('description', 'Bus pass');
    Expense.Add('amount', 45.00);
    Expense.Add('category', 'transport');
    Expense.Add('date', FormatDateTime('yyyy-mm-dd', Now));
    Expenses.Add(Expense);

    Root.Add('expenses', Expenses);

    { Write formatted JSON to console }
    WriteLn(Root.FormatJSON);

    { Write to file }
    with TStringList.Create do
    try
      Text := Root.FormatJSON;
      SaveToFile('expenses.json');
    finally
      Free;
    end;
  finally
    Root.Free;  { Frees all child objects too }
  end;
end.

The FormatJSON method produces nicely indented output. For compact output (no whitespace), use AsJSON.
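The difference is easy to see on a small object (a sketch; the exact whitespace each method emits can vary between FPC versions):

var
  O: TJSONObject;
begin
  { The array-of-const constructor builds key-value pairs in one call }
  O := TJSONObject.Create(['name', 'Rosa', 'age', 28]);
  try
    WriteLn(O.AsJSON);      // compact single-line form, suitable for transmission
    WriteLn(O.FormatJSON);  // the same data, indented for human readers
  finally
    O.Free;
  end;
end;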

Converting TExpense to and from JSON

A critical pattern for PennyWise is converting between the domain model (TExpense records) and JSON objects. Here is the complete round-trip serialization:

function ExpenseToJSON(const E: TExpense): TJSONObject;
begin
  Result := TJSONObject.Create;
  Result.Add('id', E.ID);
  Result.Add('description', E.Description);
  Result.Add('amount', Double(E.Amount));
  Result.Add('category', CategoryToStr(E.Category));
  Result.Add('date', FormatDateTime('yyyy-mm-dd', E.ExpenseDate));
  Result.Add('recurring', E.IsRecurring);
end;

function JSONToExpense(Obj: TJSONObject): TExpense;
begin
  Result.ID := Obj.Get('id', 0);
  Result.Description := Obj.Get('description', '');
  Result.Amount := Obj.Get('amount', 0.0);
  Result.Category := StrToCategory(Obj.Get('category', 'other'));
  { The date was written as yyyy-mm-dd, so parse it with the same pattern.
    StrToDate would use the locale's short date format and fail on other
    machines. ScanDateTime lives in the DateUtils unit. }
  Result.ExpenseDate := ScanDateTime('yyyy-mm-dd',
    Obj.Get('date', FormatDateTime('yyyy-mm-dd', Now)));
  Result.IsRecurring := Obj.Get('recurring', False);
end;

{ Save an entire array of expenses to a JSON file }
procedure SaveExpensesToJSON(const Expenses: TExpenseArray;
  const Filename: string);
var
  Root: TJSONObject;
  Arr: TJSONArray;
  I: Integer;
  SL: TStringList;
begin
  Root := TJSONObject.Create;
  try
    Root.Add('format', 'PennyWise Expenses');
    Root.Add('version', 1);
    Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
    Root.Add('count', Length(Expenses));

    Arr := TJSONArray.Create;
    for I := 0 to High(Expenses) do
      Arr.Add(ExpenseToJSON(Expenses[I]));
    Root.Add('expenses', Arr);

    SL := TStringList.Create;
    try
      SL.Text := Root.FormatJSON;
      SL.SaveToFile(Filename);
    finally
      SL.Free;
    end;
  finally
    Root.Free;
  end;
end;

{ Load expenses from a JSON file }
function LoadExpensesFromJSON(const Filename: string): TExpenseArray;
var
  Data: TJSONData;
  Root: TJSONObject;
  Arr: TJSONArray;
  I: Integer;
begin
  Data := LoadJSONFromFile(Filename);
  try
    Root := Data as TJSONObject;
    Arr := Root.Arrays['expenses'];
    SetLength(Result, Arr.Count);
    for I := 0 to Arr.Count - 1 do
      Result[I] := JSONToExpense(Arr.Objects[I]);
  finally
    Data.Free;
  end;
end;

Key fpjson Classes

| Class | Purpose |
|-------|---------|
| TJSONData | Base class for all JSON values |
| TJSONObject | A JSON object ({key: value, ...}) |
| TJSONArray | A JSON array ([value, ...]) |
| TJSONString | A JSON string |
| TJSONIntegerNumber | A JSON integer |
| TJSONFloatNumber | A JSON float |
| TJSONBoolean | A JSON boolean |
| TJSONNull | JSON null |
| TJSONParser | Parses JSON text into objects |

💡 Memory Management Rule: When you add a TJSONObject or TJSONArray as a child of another object/array, the parent takes ownership. Freeing the root object frees all children. Do not free child objects separately — this will cause a double-free crash.
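The ownership rule in practice, as a brief sketch:

var
  Root: TJSONObject;
  Arr: TJSONArray;
  Item: TJSONObject;
begin
  Root := TJSONObject.Create;
  Arr := TJSONArray.Create;
  Item := TJSONObject.Create;

  Item.Add('amount', 12.50);
  Arr.Add(Item);              { Arr now owns Item }
  Root.Add('expenses', Arr);  { Root now owns Arr, and through it, Item }

  Root.Free;  { Correct: frees Root, Arr, and Item in one call }
  { Wrong: calling Arr.Free or Item.Free here would double-free and crash }
end;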


34.4 XML

XML (Extensible Markup Language) is a more formal, more verbose, and more powerful data format than JSON. While JSON has largely replaced XML for web APIs, XML remains dominant in enterprise software, document formats (DOCX, ODT, SVG are all XML-based), configuration systems (Maven pom.xml, .NET .csproj), data feeds (RSS, Atom), and inter-system communication protocols (SOAP, XMPP).

Free Pascal includes the DOM (Document Object Model) unit for working with XML. The DOM approach loads the entire XML document into memory as a tree of nodes, which you can then navigate and manipulate. For large XML files that do not fit in memory, Free Pascal also supports SAX-style (event-based) parsing, but DOM is simpler and sufficient for most applications.

XML Structure

<?xml version="1.0" encoding="UTF-8"?>
<pennywise version="3.4">
  <user name="Rosa Martinelli" />
  <expenses>
    <expense id="1">
      <description>Groceries</description>
      <amount currency="USD">85.50</amount>
      <category>food</category>
      <date>2026-03-15</date>
    </expense>
    <expense id="2">
      <description>Bus pass</description>
      <amount currency="USD">45.00</amount>
      <category>transport</category>
      <date>2026-03-15</date>
    </expense>
  </expenses>
</pennywise>

XML has elements (tags), attributes, text content, and a strict hierarchy. Every opening tag must have a matching closing tag (or be self-closing).

Reading XML with the DOM Parser

Free Pascal provides the DOM and XMLRead units:

program ReadXML;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, DOM, XMLRead;

var
  Doc: TXMLDocument;
  Root, ExpensesNode, ExpenseNode, ChildNode: TDOMNode;
  I: Integer;
begin
  ReadXMLFile(Doc, 'expenses.xml');
  try
    Root := Doc.DocumentElement;  { <pennywise> }
    WriteLn('Root element: ', Root.NodeName);
    WriteLn('Version: ', TDOMElement(Root).GetAttribute('version'));

    { Find <expenses> node }
    ExpensesNode := Root.FindNode('expenses');
    if ExpensesNode <> nil then
    begin
      ExpenseNode := ExpensesNode.FirstChild;
      while ExpenseNode <> nil do
      begin
        if ExpenseNode.NodeName = 'expense' then
        begin
          Write('Expense #', TDOMElement(ExpenseNode).GetAttribute('id'), ': ');

          ChildNode := ExpenseNode.FindNode('description');
          if ChildNode <> nil then
            Write(ChildNode.TextContent);

          ChildNode := ExpenseNode.FindNode('amount');
          if ChildNode <> nil then
            Write(' - $', ChildNode.TextContent);

          WriteLn;
        end;
        ExpenseNode := ExpenseNode.NextSibling;
      end;
    end;
  finally
    Doc.Free;
  end;
end.

Writing XML

program WriteXML;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, DOM, XMLWrite;

var
  Doc: TXMLDocument;
  Root, ExpensesNode, ExpenseNode, ChildNode: TDOMElement;
begin
  Doc := TXMLDocument.Create;
  try
    { Create root element }
    Root := Doc.CreateElement('pennywise');
    Root.SetAttribute('version', '3.4');
    Doc.AppendChild(Root);

    { Create user element }
    ChildNode := Doc.CreateElement('user');
    ChildNode.SetAttribute('name', 'Rosa Martinelli');
    Root.AppendChild(ChildNode);

    { Create expenses container }
    ExpensesNode := Doc.CreateElement('expenses');
    Root.AppendChild(ExpensesNode);

    { Add an expense }
    ExpenseNode := Doc.CreateElement('expense');
    ExpenseNode.SetAttribute('id', '1');

    ChildNode := Doc.CreateElement('description');
    ChildNode.TextContent := 'Groceries';
    ExpenseNode.AppendChild(ChildNode);

    ChildNode := Doc.CreateElement('amount');
    ChildNode.SetAttribute('currency', 'USD');
    ChildNode.TextContent := '85.50';
    ExpenseNode.AppendChild(ChildNode);

    ChildNode := Doc.CreateElement('category');
    ChildNode.TextContent := 'food';
    ExpenseNode.AppendChild(ChildNode);

    ChildNode := Doc.CreateElement('date');
    ChildNode.TextContent := FormatDateTime('yyyy-mm-dd', Now);
    ExpenseNode.AppendChild(ChildNode);

    ExpensesNode.AppendChild(ExpenseNode);

    { Write to file }
    WriteXMLFile(Doc, 'expenses.xml');
    WriteLn('XML file written successfully.');
  finally
    Doc.Free;
  end;
end.

DOM Navigation: A Complete Reference

The DOM API works with a tree of nodes. Understanding the navigation methods is essential for extracting data from complex XML documents:

| Method/Property | Purpose |
|-----------------|---------|
| Doc.DocumentElement | The root element of the document |
| Node.FindNode('name') | Find a direct child by tag name |
| Node.FirstChild | First child node |
| Node.LastChild | Last child node |
| Node.NextSibling | Next sibling at the same level |
| Node.PreviousSibling | Previous sibling at the same level |
| Node.ParentNode | Parent node |
| Node.NodeName | Tag name of the element |
| Node.TextContent | Text content of the element and all descendants |
| Node.HasChildNodes | Whether the node has children |
| Node.ChildNodes | A TDOMNodeList of all children |
| TDOMElement(Node).GetAttribute('name') | Get an attribute value |
| TDOMElement(Node).SetAttribute('name', 'value') | Set an attribute |

A common pattern for processing all elements of a specific type is the sibling-walking loop:

procedure ProcessAllExpenses(ExpensesNode: TDOMNode);
var
  Node: TDOMNode;
  Desc, Amount, Category: string;
begin
  Node := ExpensesNode.FirstChild;
  while Node <> nil do
  begin
    if (Node.NodeType = ELEMENT_NODE) and (Node.NodeName = 'expense') then
    begin
      Desc := '';
      Amount := '';
      Category := '';

      if Node.FindNode('description') <> nil then
        Desc := Node.FindNode('description').TextContent;
      if Node.FindNode('amount') <> nil then
        Amount := Node.FindNode('amount').TextContent;
      if Node.FindNode('category') <> nil then
        Category := Node.FindNode('category').TextContent;

      WriteLn(Format('  %s: $%s (%s)', [Desc, Amount, Category]));
    end;
    Node := Node.NextSibling;
  end;
end;

Note the check for Node.NodeType = ELEMENT_NODE. This is important because the DOM tree includes text nodes (whitespace between elements) as children. Without this check, you would try to process whitespace nodes as expense elements and get empty results.

XML Helper Functions

In practice, you write helper functions to simplify DOM access:

function GetChildText(Parent: TDOMNode; const ChildName: string;
  const Default: string = ''): string;
var
  Child: TDOMNode;
begin
  Child := Parent.FindNode(ChildName);
  if (Child <> nil) and (Child.TextContent <> '') then
    Result := Child.TextContent
  else
    Result := Default;
end;

function GetAttr(Element: TDOMNode; const AttrName: string;
  const Default: string = ''): string;
begin
  if Element is TDOMElement then
    Result := TDOMElement(Element).GetAttribute(AttrName)
  else
    Result := Default;
  if Result = '' then
    Result := Default;
end;

procedure AddChildText(Doc: TXMLDocument; Parent: TDOMElement;
  const ChildName, Text: string);
var
  Child: TDOMElement;
begin
  Child := Doc.CreateElement(ChildName);
  Child.TextContent := Text;
  Parent.AppendChild(Child);
end;

With these helpers, the expense reading code becomes much more concise:

{ Instead of repeating FindNode + nil check + TextContent: }
Desc := GetChildText(ExpenseNode, 'description');
Amount := GetChildText(ExpenseNode, 'amount', '0.00');
Category := GetChildText(ExpenseNode, 'category', 'other');
ID := GetAttr(ExpenseNode, 'id', '0');

📊 JSON vs. XML: When to Choose. Choose JSON when: you need lightweight data exchange, your consumers are web applications, or simplicity is important. Choose XML when: you need schema validation, document-oriented data, attributes on elements, or when integrating with enterprise systems that require XML. In practice, most new projects use JSON; most legacy integration requires XML.


34.5 CSV

CSV (Comma-Separated Values) is the simplest tabular data format. Every spreadsheet program can read and write CSV. Every programming language has CSV support. Bank statements, data exports, and bulk imports commonly use CSV. If you need to move tabular data between applications, CSV is the lowest common denominator — the format that everything understands.

CSV has no formal standard (RFC 4180 comes close but is not universally followed), which is both its strength and its weakness. The strength: any program can produce CSV by simply writing comma-separated values, one row per line. The weakness: different programs interpret "CSV" differently, leading to an endless parade of edge cases.

CSV Structure

Date,Description,Amount,Category
2026-03-15,Groceries,85.50,food
2026-03-15,Bus pass,45.00,transport
2026-03-14,Movie tickets,24.00,entertainment
2026-03-12,Electricity,95.30,utilities

Simple, right? Except CSV has edge cases that trip up almost every naive parser:

  1. Commas in values: "Portland, OR" — values containing commas must be quoted
  2. Quotes in values: "She said ""hello""" — literal quotes are escaped by doubling
  3. Newlines in values: A quoted field can span multiple lines
  4. Leading/trailing spaces: Are they part of the value or whitespace to trim?
  5. Empty fields: a,,c — the second field is empty, not missing
  6. Different delimiters: Some "CSV" files use semicolons, tabs, or pipes
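A single line can combine several of these cases. The data row below has quoted commas (case 1), an escaped quote (case 2), and an empty field (case 5):

Name,City,Nickname,Note
"Martinelli, Rosa","Portland, OR",,"She said ""hello"""

A naive split-on-comma parser would report six fields for the data row instead of the correct four: Martinelli, Rosa / Portland, OR / an empty nickname / She said "hello".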

Writing CSV

Writing CSV is straightforward if you handle quoting:

function CSVQuote(const S: string): string;
var
  NeedsQuoting: Boolean;
begin
  NeedsQuoting := (Pos(',', S) > 0) or (Pos('"', S) > 0) or
                  (Pos(#10, S) > 0) or (Pos(#13, S) > 0);
  if NeedsQuoting then
    Result := '"' + StringReplace(S, '"', '""', [rfReplaceAll]) + '"'
  else
    Result := S;
end;

procedure WriteCSVLine(var F: TextFile; const Fields: array of string);
var
  I: Integer;
begin
  for I := 0 to High(Fields) do
  begin
    if I > 0 then Write(F, ',');
    Write(F, CSVQuote(Fields[I]));
  end;
  WriteLn(F);
end;

procedure ExportExpensesToCSV(const Expenses: TExpenseArray; const Filename: string);
var
  F: TextFile;
  I: Integer;
begin
  AssignFile(F, Filename);
  Rewrite(F);
  try
    { Header row }
    WriteCSVLine(F, ['Date', 'Description', 'Amount', 'Category', 'Recurring']);

    { Data rows }
    for I := 0 to High(Expenses) do
      WriteCSVLine(F, [
        FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate),
        Expenses[I].Description,
        Format('%.2f', [Double(Expenses[I].Amount)]),
        CategoryToStr(Expenses[I].Category),
        BoolToStr(Expenses[I].IsRecurring, 'yes', 'no')
      ]);
  finally
    CloseFile(F);
  end;
end;

Reading CSV

Parsing CSV correctly requires handling quoted fields. Here is a complete, robust parser:

function ParseCSVLine(const Line: string): TStringArray;
var
  Fields: TStringArray;
  FieldCount: Integer;
  I: Integer;
  InQuote: Boolean;
  Current: string;

  procedure AddField;
  begin
    if FieldCount >= Length(Fields) then
      SetLength(Fields, FieldCount + 16);
    Fields[FieldCount] := Current;
    Inc(FieldCount);
    Current := '';
  end;

begin
  SetLength(Fields, 16);
  FieldCount := 0;
  Current := '';
  InQuote := False;
  I := 1;

  while I <= Length(Line) do
  begin
    if InQuote then
    begin
      if (Line[I] = '"') then
      begin
        if (I < Length(Line)) and (Line[I + 1] = '"') then
        begin
          Current := Current + '"';  { Escaped quote }
          Inc(I);  { Skip the second quote }
        end
        else
          InQuote := False;  { End of quoted field }
      end
      else
        Current := Current + Line[I];
    end
    else
    begin
      case Line[I] of
        ',': AddField;
        '"': InQuote := True;
      else
        Current := Current + Line[I];
      end;
    end;
    Inc(I);
  end;
  AddField;  { Add the last field }

  SetLength(Fields, FieldCount);
  Result := Fields;
end;

CSV Edge Cases: A Comprehensive Test

Here is a test that exercises every CSV edge case:

procedure TestCSVParser;
var
  Fields: TStringArray;
begin
  { Basic fields }
  Fields := ParseCSVLine('a,b,c');
  Assert(Length(Fields) = 3);
  Assert(Fields[0] = 'a');

  { Quoted field with comma }
  Fields := ParseCSVLine('name,"Portland, OR",zip');
  Assert(Fields[1] = 'Portland, OR');

  { Quoted field with embedded quote }
  Fields := ParseCSVLine('title,"She said ""hello""",end');
  Assert(Fields[1] = 'She said "hello"');

  { Empty fields }
  Fields := ParseCSVLine('a,,c');
  Assert(Length(Fields) = 3);
  Assert(Fields[1] = '');

  { Leading/trailing empty }
  Fields := ParseCSVLine(',b,');
  Assert(Length(Fields) = 3);
  Assert(Fields[0] = '');
  Assert(Fields[2] = '');

  WriteLn('All CSV parser tests passed.');
end;

A Configurable CSV Parser

Since real-world CSV files vary so widely, a professional CSV parser must support configurable delimiters. Here is an enhanced version that handles different delimiters and multiline quoted fields:

type
  TCSVReader = class
  private
    FDelimiter: Char;
    FQuoteChar: Char;
    FLines: TStringList;
    FCurrentLine: Integer;
  public
    constructor Create(const Filename: string; ADelimiter: Char = ',';
      AQuoteChar: Char = '"');
    destructor Destroy; override;
    function ReadRow: TStringArray;
    function EOF: Boolean;
    procedure Reset;
    property Delimiter: Char read FDelimiter write FDelimiter;
    property CurrentLine: Integer read FCurrentLine;
  end;

constructor TCSVReader.Create(const Filename: string;
  ADelimiter: Char; AQuoteChar: Char);
begin
  inherited Create;
  FDelimiter := ADelimiter;
  FQuoteChar := AQuoteChar;
  FCurrentLine := 0;
  FLines := TStringList.Create;
  FLines.LoadFromFile(Filename);
end;

destructor TCSVReader.Destroy;
begin
  FLines.Free;
  inherited;
end;

function TCSVReader.ReadRow: TStringArray;
var
  CombinedLine: string;
  C: Char;
  InQuote: Boolean;
begin
  if EOF then
  begin
    SetLength(Result, 0);
    Exit;
  end;

  { Handle multiline quoted fields by combining lines }
  CombinedLine := FLines[FCurrentLine];
  Inc(FCurrentLine);

  { Count quotes; an odd count means a quoted field spans multiple lines }
  InQuote := False;
  for C in CombinedLine do
    if C = FQuoteChar then InQuote := not InQuote;

  while InQuote and (FCurrentLine < FLines.Count) do
  begin
    CombinedLine := CombinedLine + #10 + FLines[FCurrentLine];
    Inc(FCurrentLine);
    InQuote := False;
    for C in CombinedLine do
      if C = FQuoteChar then InQuote := not InQuote;
  end;

  Result := ParseCSVLineWithDelimiter(CombinedLine, FDelimiter, FQuoteChar);
end;

function TCSVReader.EOF: Boolean;
begin
  Result := FCurrentLine >= FLines.Count;
end;

procedure TCSVReader.Reset;
begin
  FCurrentLine := 0;
end;

This TCSVReader class handles the three most problematic CSV edge cases: configurable delimiters (commas, semicolons, tabs), multiline fields (where a quoted value contains newline characters), and escaped quotes (doubled quote characters inside quoted fields). It reads the file once into a TStringList and then parses row by row, combining physical lines when a quoted field spans multiple lines.
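As a usage sketch (the filename and delimiter are illustrative), reading a semicolon-delimited export row by row looks like this:

```pascal
var
  Reader: TCSVReader;
  Row: TStringArray;
begin
  Reader := TCSVReader.Create('statement.csv', ';');  { hypothetical file }
  try
    while not Reader.EOF do
    begin
      Row := Reader.ReadRow;
      if Length(Row) > 0 then
        WriteLn('Read a row with ', Length(Row), ' fields');
    end;
  finally
    Reader.Free;
  end;
end;
```

Because ReadRow consumes as many physical lines as the row needs, the loop works unchanged whether or not the file contains multiline quoted fields.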

Importing Bank CSV Files

Bank CSV imports deserve special attention because they are one of the most common real-world data integration tasks, and they are also one of the most frustrating. The core problem is that there is no universal bank CSV standard. Every bank formats its exports differently, and even the same bank may change its format without notice.

Here are some of the variations you will encounter in the wild:

Delimiter variations: Most banks use commas, but European banks often use semicolons (because commas are used as decimal separators in many European locales). Some banks use tabs. A few use pipes.

Date format variations: US banks typically use MM/DD/YYYY. European banks use DD/MM/YYYY or DD.MM.YYYY. ISO-compliant banks use YYYY-MM-DD. Some include timestamps; most do not.

Amount format variations: US banks use a period as the decimal separator (85.50). European banks use a comma (85,50). Some banks use a period as a thousands separator (1.234,50). Some banks put debits and credits in separate columns. Others use a single amount column with negative numbers for debits. Some banks use parentheses for negative numbers: (85.50).

Header variations: Most banks include a header row. Some include multiple header lines (bank name, account number, date range, then the column headers). Some include no headers at all.

Encoding variations: Most modern exports are UTF-8. Some older systems use Latin-1 or Windows-1252. A few still produce ASCII-only output, replacing special characters.
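To make these variations concrete, here are two invented rows describing the same transaction, in the style of a US export and a European export respectively (both are illustrative, not taken from any real bank):

```
Date,Description,Amount
03/15/2024,"Grocery store, downtown",-85.50

Datum;Beschreibung;Betrag
15.03.2024;"Grocery store, downtown";-85,50
```

Note that delimiter, date format, decimal separator, and even the header language all differ, yet both rows carry identical information.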

The lesson is clear: a bank CSV importer must be configurable. Hard-coding any assumption about the format will break the first time a user tries to import from a different bank.

⚠️ The Bank CSV Problem Every bank formats its CSV exports differently. Some use semicolons instead of commas. Some put the date in MM/DD/YYYY format, others in DD.MM.YYYY. Some include a header row, others do not. Some quote every field, others quote nothing. When importing bank CSV files, you almost always need a configuration step where the user maps columns and specifies the date format. PennyWise handles this in the project checkpoint.


34.6 Custom Binary Formats

Sometimes neither JSON nor CSV is appropriate. If you are storing millions of records, or if read/write speed is critical, or if file size matters, a custom binary format can be orders of magnitude more efficient than text-based formats.

Designing a Binary Format with a Versioning Header

A well-designed binary format has these properties:

  1. A magic number — the first few bytes identify the file type
  2. A version field — so the reader knows which format version to expect
  3. A header — metadata about the file contents
  4. A body — the actual data in a defined layout
  5. A checksum (optional) — to detect corruption

Here is a header and record layout embodying these properties:

type
  TPennyWiseHeader = packed record
    Magic: array[0..3] of Char;  { 'PWDB' }
    Version: Word;                { Format version: 1, 2, ... }
    RecordCount: LongWord;        { Number of expense records }
    CreatedDate: Int64;            { TDateTime as Int64 bits }
    Reserved: array[0..15] of Byte; { For future use }
  end;

  TPennyWiseRecordV1 = packed record
    ID: LongWord;
    DescLen: Word;
    { Description string follows (variable length) }
    Amount: Int64;          { Currency stored as cents (integer) }
    Category: Byte;         { TExpenseCategory ordinal }
    ExpenseDate: Int64;     { TDateTime as Int64 bits }
    Flags: Byte;            { Bit 0: IsRecurring }
  end;

  { Version 2 adds a Notes field }
  TPennyWiseRecordV2 = packed record
    ID: LongWord;
    DescLen: Word;
    { Description string follows (variable length) }
    Amount: Int64;
    Category: Byte;
    ExpenseDate: Int64;
    Flags: Byte;
    NotesLen: Word;          { NEW in v2 }
    { Notes string follows (variable length) }
  end;

The packed record Keyword

Notice packed record instead of plain record. Without packed, the compiler may insert padding bytes between fields for alignment optimization. In a file format, we need exact control over the byte layout, so packed eliminates all padding.
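A quick experiment makes the difference visible. This sketch (type names are ours) compares the sizes of a plain and a packed record with the same fields:

```pascal
program PackedDemo;

{$mode objfpc}{$H+}
{$assertions on}

type
  TPlain = record
    Flag: Byte;
    Value: Int64;
  end;

  TPacked = packed record
    Flag: Byte;
    Value: Int64;
  end;

begin
  { TPacked is always exactly 9 bytes: 1 (Byte) + 8 (Int64).
    TPlain is usually larger (often 16) because the compiler may
    insert padding so that Value starts on an aligned boundary. }
  WriteLn('Plain:  ', SizeOf(TPlain));
  WriteLn('Packed: ', SizeOf(TPacked));
  Assert(SizeOf(TPacked) = 9);
end.
```

The exact size of the unpacked record depends on the platform's alignment rules, which is precisely why an on-disk format must use packed.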

Why Store Currency as Cents?

Notice that the amount is stored as an Int64 representing cents rather than as a floating-point number. This is critical: floating-point numbers cannot represent all decimal values exactly. The value 0.10 (ten cents) cannot be represented precisely in IEEE 754 binary floating-point. Over thousands of transactions, these tiny errors accumulate into real discrepancies. By storing amounts as integers (pennies/cents), every value is exact. The conversion is simple: store Round(Amount * 100); to display, write Cents div 100, a decimal point, and Abs(Cents) mod 100 zero-padded to two digits.

This is the same technique used by banks, payment processors, and accounting software worldwide. When you design binary formats for financial data, never use floating-point for currency. Use integer cents, or use Pascal's Currency type (which is internally a 64-bit integer with an implied four decimal places).
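The display conversion deserves care around zero-padding and negative values. A minimal sketch (the helper name CentsToStr is ours):

```pascal
program CentsDemo;

{$mode objfpc}{$H+}
{$assertions on}

uses
  SysUtils;

{ Convert integer cents to a display string. Working on Abs(Cents) and
  prepending the sign avoids the "-0.05 prints as 0.05" trap for
  amounts between -99 and -1 cents. }
function CentsToStr(Cents: Int64): string;
begin
  Result := Format('%d.%.2d', [Abs(Cents) div 100, Abs(Cents) mod 100]);
  if Cents < 0 then
    Result := '-' + Result;
end;

begin
  WriteLn(CentsToStr(8550));    { 85.50 }
  WriteLn(CentsToStr(5));       { 0.05  }
  WriteLn(CentsToStr(-5));      { -0.05 }
  Assert(CentsToStr(8550) = '85.50');
  Assert(CentsToStr(5) = '0.05');
  Assert(CentsToStr(-5) = '-0.05');
end.
```

The %.2d format specifier pads the cents part to two digits, so five cents prints as 0.05 rather than 0.5.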

Versioning Strategy

The version field is critical. When you need to add new fields or change the format:

  1. Increment the version number
  2. Write the new format
  3. In the reader, check the version and branch accordingly:
procedure LoadBinaryExpenses(const Filename: string);
var
  F: file;
  Header: TPennyWiseHeader;
begin
  AssignFile(F, Filename);
  Reset(F, 1);
  try
    BlockRead(F, Header, SizeOf(Header));

    if (Header.Magic[0] <> 'P') or (Header.Magic[1] <> 'W') or
       (Header.Magic[2] <> 'D') or (Header.Magic[3] <> 'B') then
      raise Exception.Create('Not a PennyWise data file');

    case Header.Version of
      1: LoadVersion1(F, Header.RecordCount);
      2: LoadVersion2(F, Header.RecordCount);
    else
      raise Exception.CreateFmt('Unsupported format version: %d', [Header.Version]);
    end;
  finally
    CloseFile(F);
  end;
end;

Writing the Binary Format

Here is the complete write procedure for version 2:

procedure SaveBinaryExpenses(const Expenses: TExpenseArray;
  const Filename: string);
var
  F: file;
  Header: TPennyWiseHeader;
  DescBytes: TBytes;
  NotesBytes: TBytes;
  Len: Word;
  AmountCents: Int64;
  DateBits: Int64;
  Cat: Byte;
  Flags: Byte;
  CreatedAt: TDateTime;
  I: Integer;
begin
  AssignFile(F, Filename);
  Rewrite(F, 1);
  try
    { Write header }
    Header.Magic[0] := 'P';
    Header.Magic[1] := 'W';
    Header.Magic[2] := 'D';
    Header.Magic[3] := 'B';
    Header.Version := 2;
    Header.RecordCount := Length(Expenses);
    CreatedAt := Now;  { Move needs a variable, not a function result }
    Move(CreatedAt, Header.CreatedDate, SizeOf(Int64));
    FillChar(Header.Reserved, SizeOf(Header.Reserved), 0);
    BlockWrite(F, Header, SizeOf(Header));

    { Write records }
    for I := 0 to High(Expenses) do
    begin
      BlockWrite(F, Expenses[I].ID, SizeOf(LongWord));

      DescBytes := TEncoding.UTF8.GetBytes(Expenses[I].Description);
      Len := Length(DescBytes);
      BlockWrite(F, Len, SizeOf(Word));
      if Len > 0 then
        BlockWrite(F, DescBytes[0], Len);

      { Currency stored as integer cents, matching the record layout }
      AmountCents := Round(Expenses[I].Amount * 100);
      BlockWrite(F, AmountCents, SizeOf(Int64));

      Cat := Ord(Expenses[I].Category);
      BlockWrite(F, Cat, 1);

      Move(Expenses[I].ExpenseDate, DateBits, SizeOf(Int64));
      BlockWrite(F, DateBits, SizeOf(Int64));

      Flags := 0;
      if Expenses[I].IsRecurring then Flags := Flags or 1;
      BlockWrite(F, Flags, 1);

      { Notes (empty for now) }
      Len := 0;
      BlockWrite(F, Len, SizeOf(Word));
    end;
  finally
    CloseFile(F);
  end;
end;

Performance Comparison

For 100,000 expense records, typical file sizes and read times:

Format              File Size   Write Time   Read Time
JSON (formatted)    12.4 MB     890 ms       1,240 ms
JSON (compact)      8.1 MB      650 ms       980 ms
XML                 15.7 MB     1,100 ms     1,800 ms
CSV                 5.2 MB      320 ms       480 ms
Binary (packed)     3.1 MB      45 ms        38 ms

Binary is 10-30x faster and 2-5x smaller. But it cannot be opened in a text editor, cannot be read by other applications without a custom parser, and is fragile across format changes. The choice depends on your priorities.

💡 Design Principle: Use Text Formats for Interchange, Binary for Internal Storage A common architecture uses binary files for the application's internal data store (fast, compact) and text formats (JSON, CSV) for import/export. This gives you the best of both worlds: performance for daily use, interoperability when sharing data with other applications.

Endianness and Portability

One pitfall of binary formats is endianness — the order in which bytes are stored for multi-byte values. Intel and AMD CPUs are little-endian (least significant byte first). Some ARM configurations are big-endian (most significant byte first). If you write a binary file on a little-endian machine and read it on a big-endian machine, multi-byte values (integers, floats) will be garbled.

For files that will only be read on the same architecture (most desktop applications), this is not a concern. For files that might be transferred between architectures, you must define a byte order and convert as needed. The convention is to use network byte order (big-endian), with explicit conversion functions:

function SwapEndian16(Value: Word): Word;
begin
  Result := (Value shr 8) or (Value shl 8);
end;

function SwapEndian32(Value: LongWord): LongWord;
begin
  Result := ((Value and $FF) shl 24) or
            ((Value and $FF00) shl 8) or
            ((Value and $FF0000) shr 8) or
            ((Value and $FF000000) shr 24);
end;

Free Pascal also provides built-in SwapEndian functions in the System unit for this purpose. For PennyWise, which runs on standard desktop platforms (all little-endian), endianness is not a concern. But it is important to know about for cross-platform binary format design.
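A sketch of how the built-in helpers make this portable (variable names are ours): NtoBE converts native byte order to big-endian before writing, and BEtoN converts back after reading. On a little-endian machine both swap bytes; on a big-endian machine both are no-ops, so the same source code produces identical files everywhere.

```pascal
program EndianDemo;

{$mode objfpc}{$H+}
{$assertions on}

var
  HostValue, FileValue: LongWord;
begin
  HostValue := $11223344;

  { Convert to big-endian ("network order") before writing to a file,
    and back to native order after reading. }
  FileValue := NtoBE(HostValue);
  Assert(BEtoN(FileValue) = HostValue);

  WriteLn('Round trip OK');
end.
```

Apply the conversion to every multi-byte field (header counts, amounts, dates) at the moment it crosses the file boundary, never to in-memory values you are still computing with.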

Choosing Between Binary and Text Formats: A Decision Framework

Criterion            Binary                               Text (JSON/CSV)
File size            Smallest                             2-5x larger
Read/write speed     Fastest (10-30x)                     Slower (parsing overhead)
Human-readable       No                                   Yes
Cross-application    No (custom parser needed)            Yes (universal)
Version evolution    Fragile (must handle old versions)   Flexible (ignore unknown fields)
Debugging            Hex editor required                  Text editor
Corruption recovery  Difficult                            Easier (partial data readable)

Use binary for: internal databases, cache files, performance-critical data, large datasets (>1M records). Use text for: configuration files, data exchange, reports, anything a human might need to read.


34.7 Serialization Patterns

Serialization is the process of converting an in-memory object (a TExpense record, a TBudget array, a complex object graph) into a format that can be stored in a file, transmitted over a network, or persisted in a database. Deserialization is the reverse: reconstructing the in-memory object from the stored format.

Every time PennyWise saves expenses to JSON, it is serializing. Every time it loads a bank CSV, it is deserializing. These operations seem simple when you have one format and one data type, but they become complex when you have multiple formats (JSON, XML, CSV, binary) and multiple data types (expenses, budgets, categories, user preferences). Without a systematic approach, you end up with a tangled web of format-specific code.

The Converter Pattern

One clean approach is to define format-specific converters that each implement the same interface:

type
  IExpenseSerializer = interface
    procedure SaveExpenses(const Expenses: TExpenseArray;
      const Filename: string);
    function LoadExpenses(const Filename: string): TExpenseArray;
  end;

  TJSONExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
    procedure SaveExpenses(const Expenses: TExpenseArray;
      const Filename: string);
    function LoadExpenses(const Filename: string): TExpenseArray;
  end;

  TCSVExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
    procedure SaveExpenses(const Expenses: TExpenseArray;
      const Filename: string);
    function LoadExpenses(const Filename: string): TExpenseArray;
  end;

  TXMLExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
    procedure SaveExpenses(const Expenses: TExpenseArray;
      const Filename: string);
    function LoadExpenses(const Filename: string): TExpenseArray;
  end;

Now the rest of your application works with IExpenseSerializer without knowing or caring whether the format is JSON, CSV, or XML:

procedure ExportData(Serializer: IExpenseSerializer;
  const Expenses: TExpenseArray; const Filename: string);
begin
  WriteLn('Exporting ', Length(Expenses), ' expenses to ', Filename);
  Serializer.SaveExpenses(Expenses, Filename);
  WriteLn('Export complete.');
end;

This is the same dependency inversion principle we applied to unit design in Chapter 33, now applied to file format handling.
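A small factory can choose the serializer from the filename, so calling code never names a concrete class. This is a sketch; the function SerializerForFile is our own invention, and it relies on the serializer classes declared above:

```pascal
{ Pick a serializer based on the file extension (hypothetical helper) }
function SerializerForFile(const Filename: string): IExpenseSerializer;
begin
  case LowerCase(ExtractFileExt(Filename)) of
    '.json': Result := TJSONExpenseSerializer.Create;
    '.csv':  Result := TCSVExpenseSerializer.Create;
    '.xml':  Result := TXMLExpenseSerializer.Create;
  else
    raise Exception.CreateFmt('No serializer for "%s" files',
      [ExtractFileExt(Filename)]);
  end;
end;
```

With this in place, `ExportData(SerializerForFile(Filename), Expenses, Filename)` dispatches to the right format, and adding a new format means adding one class and one case branch.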

Implementing TJSONExpenseSerializer

Here is the complete implementation for the JSON serializer, demonstrating the converter pattern in practice:

type
  TJSONExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
  private
    function ExpenseToObj(const E: TExpense): TJSONObject;
    function ObjToExpense(Obj: TJSONObject): TExpense;
  public
    procedure SaveExpenses(const Expenses: TExpenseArray;
      const Filename: string);
    function LoadExpenses(const Filename: string): TExpenseArray;
  end;

function TJSONExpenseSerializer.ExpenseToObj(const E: TExpense): TJSONObject;
begin
  Result := TJSONObject.Create;
  Result.Add('id', E.ID);
  Result.Add('description', E.Description);
  Result.Add('amount', Double(E.Amount));
  Result.Add('category', CategoryToStr(E.Category));
  Result.Add('date', FormatDateTime('yyyy-mm-dd', E.ExpenseDate));
  Result.Add('recurring', E.IsRecurring);
end;

function TJSONExpenseSerializer.ObjToExpense(Obj: TJSONObject): TExpense;
begin
  Result.ID := Obj.Get('id', 0);
  Result.Description := Obj.Get('description', '');
  Result.Amount := SafeGetAmount(Obj, 'amount');
  Result.Category := StrToCategory(Obj.Get('category', 'other'));
  Result.ExpenseDate := SafeGetDate(Obj, 'date');
  Result.IsRecurring := Obj.Get('recurring', False);
end;

procedure TJSONExpenseSerializer.SaveExpenses(const Expenses: TExpenseArray;
  const Filename: string);
var
  Root: TJSONObject;
  Arr: TJSONArray;
  I: Integer;
  SL: TStringList;
begin
  Root := TJSONObject.Create;
  try
    Root.Add('format', 'PennyWise Expenses');
    Root.Add('version', 1);
    Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
    Root.Add('count', Length(Expenses));

    Arr := TJSONArray.Create;
    for I := 0 to High(Expenses) do
      Arr.Add(ExpenseToObj(Expenses[I]));
    Root.Add('expenses', Arr);

    SL := TStringList.Create;
    try
      SL.Text := Root.FormatJSON;
      SL.SaveToFile(Filename);
    finally
      SL.Free;
    end;
  finally
    Root.Free;
  end;
end;

function TJSONExpenseSerializer.LoadExpenses(
  const Filename: string): TExpenseArray;
var
  Data: TJSONData;
  Root: TJSONObject;
  Arr: TJSONArray;
  I: Integer;
begin
  SetLength(Result, 0);
  if not FileExists(Filename) then Exit;

  Data := LoadJSONFromFile(Filename);
  try
    if not (Data is TJSONObject) then Exit;
    Root := Data as TJSONObject;

    { Check format version for forward compatibility }
    if Root.Get('version', 0) > 1 then
      WriteLn('Warning: file was created by a newer version of PennyWise');

    Arr := Root.Arrays['expenses'];
    SetLength(Result, Arr.Count);
    for I := 0 to Arr.Count - 1 do
      Result[I] := ObjToExpense(Arr.Objects[I]);
  finally
    Data.Free;
  end;
end;

The serializer checks the file's version field before loading. If the file was created by a newer version of PennyWise, it warns the user but still attempts to load — because ObjToExpense uses defaults for missing fields, it will gracefully handle any fields that the current version does not recognize.

Handling Format Differences Gracefully

Real-world data is messy. A robust deserializer must handle:

  • Missing fields: Use defaults. Do not crash if a field is absent.
  • Extra fields: Ignore them. Forward compatibility means accepting data from newer versions.
  • Type mismatches: An amount might be a string "85.50" instead of a number 85.50. Handle both.
  • Date format variations: Try multiple date formats before giving up.
  • Encoding issues: UTF-8 BOM, different line endings (CR, LF, CRLF).
function SafeGetAmount(Obj: TJSONObject; const Key: string): Currency;
var
  Data: TJSONData;
begin
  Data := Obj.Find(Key);
  if Data = nil then
    Exit(0);
  case Data.JSONType of
    jtNumber: Result := Data.AsFloat;
    jtString:
      if not TryStrToCurr(Data.AsString, Result) then
        Result := 0;
  else
    Result := 0;
  end;
end;

function SafeGetDate(Obj: TJSONObject; const Key: string): TDateTime;
var
  S: string;
  Formats: array[0..3] of string;
  I: Integer;
  Settings: TFormatSettings;
begin
  S := Obj.Get(Key, '');
  if S = '' then
    Exit(Now);

  Formats[0] := 'yyyy-mm-dd';
  Formats[1] := 'mm/dd/yyyy';
  Formats[2] := 'dd.mm.yyyy';
  Formats[3] := 'dd/mm/yyyy';

  Settings := DefaultFormatSettings;

  for I := 0 to High(Formats) do
  begin
    Settings.ShortDateFormat := Formats[I];
    if TryStrToDate(S, Result, Settings) then
      Exit;
  end;

  Result := Now;  { All formats failed — use current date }
end;

34.8 Project Checkpoint: PennyWise Export/Import

PennyWise has been storing its data in a format that only PennyWise can read. This is like writing a journal in a private cipher — useful for you, but useless for anyone (or anything) else. In this checkpoint, we give PennyWise the ability to speak the world's data languages, transforming it from a closed system into an interoperable one.

We will add three capabilities:

  1. Export expenses to JSON, XML, and CSV
  2. Import bank transactions from CSV
  3. Configuration via INI file

The Export System

We create a FinanceExport unit (extending the modular architecture from Chapter 33):

unit FinanceExport;

{$mode objfpc}{$H+}

interface

uses
  SysUtils, Classes, FinanceCore;

procedure ExportToJSON(const Expenses: TExpenseArray; const Filename: string);
procedure ExportToCSV(const Expenses: TExpenseArray; const Filename: string);
procedure ExportToXML(const Expenses: TExpenseArray; const Filename: string);

implementation

uses
  fpjson, DOM, XMLWrite;

{ --- JSON Export --- }

procedure ExportToJSON(const Expenses: TExpenseArray; const Filename: string);
var
  Root: TJSONObject;
  Arr: TJSONArray;
  Exp: TJSONObject;
  I: Integer;
  SL: TStringList;
begin
  Root := TJSONObject.Create;
  try
    Root.Add('format', 'PennyWise Export');
    Root.Add('version', '3.4');
    Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
    Root.Add('count', Length(Expenses));

    Arr := TJSONArray.Create;
    for I := 0 to High(Expenses) do
    begin
      Exp := TJSONObject.Create;
      Exp.Add('id', Expenses[I].ID);
      Exp.Add('description', Expenses[I].Description);
      Exp.Add('amount', Double(Expenses[I].Amount));
      Exp.Add('category', CategoryToStr(Expenses[I].Category));
      Exp.Add('date', FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate));
      Exp.Add('recurring', Expenses[I].IsRecurring);
      Arr.Add(Exp);
    end;
    Root.Add('expenses', Arr);

    SL := TStringList.Create;
    try
      SL.Text := Root.FormatJSON;
      SL.SaveToFile(Filename);
    finally
      SL.Free;
    end;
  finally
    Root.Free;
  end;
end;

{ --- CSV Export --- }

function QuoteCSV(const S: string): string;
begin
  if (Pos(',', S) > 0) or (Pos('"', S) > 0) or
     (Pos(#10, S) > 0) or (Pos(#13, S) > 0) then
    Result := '"' + StringReplace(S, '"', '""', [rfReplaceAll]) + '"'
  else
    Result := S;
end;

procedure ExportToCSV(const Expenses: TExpenseArray; const Filename: string);
var
  F: TextFile;
  I: Integer;
begin
  AssignFile(F, Filename);
  Rewrite(F);
  try
    WriteLn(F, 'ID,Date,Description,Amount,Category,Recurring');
    for I := 0 to High(Expenses) do
      WriteLn(F,
        Expenses[I].ID, ',',
        FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate), ',',
        QuoteCSV(Expenses[I].Description), ',',
        Format('%.2f', [Double(Expenses[I].Amount)]), ',',
        QuoteCSV(CategoryToStr(Expenses[I].Category)), ',',
        BoolToStr(Expenses[I].IsRecurring, 'yes', 'no')
      );
  finally
    CloseFile(F);
  end;
end;

{ --- XML Export --- }

procedure ExportToXML(const Expenses: TExpenseArray; const Filename: string);
var
  Doc: TXMLDocument;
  Root, ExpensesNode, ExpNode, Child: TDOMElement;
  I: Integer;
begin
  Doc := TXMLDocument.Create;
  try
    Root := Doc.CreateElement('pennywise');
    Root.SetAttribute('version', '3.4');
    Root.SetAttribute('exported', FormatDateTime('yyyy-mm-dd', Now));
    Doc.AppendChild(Root);

    ExpensesNode := Doc.CreateElement('expenses');
    ExpensesNode.SetAttribute('count', IntToStr(Length(Expenses)));
    Root.AppendChild(ExpensesNode);

    for I := 0 to High(Expenses) do
    begin
      ExpNode := Doc.CreateElement('expense');
      ExpNode.SetAttribute('id', IntToStr(Expenses[I].ID));

      Child := Doc.CreateElement('description');
      Child.TextContent := Expenses[I].Description;
      ExpNode.AppendChild(Child);

      Child := Doc.CreateElement('amount');
      Child.TextContent := Format('%.2f', [Double(Expenses[I].Amount)]);
      ExpNode.AppendChild(Child);

      Child := Doc.CreateElement('category');
      Child.TextContent := CategoryToStr(Expenses[I].Category);
      ExpNode.AppendChild(Child);

      Child := Doc.CreateElement('date');
      Child.TextContent := FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate);
      ExpNode.AppendChild(Child);

      ExpensesNode.AppendChild(ExpNode);
    end;

    WriteXMLFile(Doc, Filename);
  finally
    Doc.Free;
  end;
end;

end.

The Bank CSV Importer

Importing bank CSV is more complex because every bank's format is different. Our importer is configurable:

type
  TCSVImportConfig = record
    Delimiter: Char;
    HasHeader: Boolean;
    DateColumn: Integer;
    DescColumn: Integer;
    AmountColumn: Integer;
    DateFormat: string;
    SkipLines: Integer;
    DecimalSeparator: Char;
    NegativeInParens: Boolean;
  end;

function ImportFromBankCSV(const Filename: string;
  const Config: TCSVImportConfig): TExpenseArray;

Here is the implementation:

function ImportFromBankCSV(const Filename: string;
  const Config: TCSVImportConfig): TExpenseArray;
var
  F: TextFile;
  Line: string;
  Fields: TStringArray;
  LineNum, Count, P: Integer;
  E: TExpense;
  AmountStr: string;
  DateSettings: TFormatSettings;
begin
  SetLength(Result, 0);
  Count := 0;
  LineNum := 0;

  DateSettings := DefaultFormatSettings;
  DateSettings.ShortDateFormat := Config.DateFormat;
  { Derive the separator from the format string: the first non-letter
    character ('/' in 'mm/dd/yyyy', '-' in 'yyyy-mm-dd', '.' in 'dd.mm.yyyy') }
  for P := 1 to Length(Config.DateFormat) do
    if not (Config.DateFormat[P] in ['a'..'z', 'A'..'Z']) then
    begin
      DateSettings.DateSeparator := Config.DateFormat[P];
      Break;
    end;

  AssignFile(F, Filename);
  Reset(F);
  try
    while not System.EOF(F) do
    begin
      ReadLn(F, Line);
      Inc(LineNum);

      { Skip initial lines (bank headers, etc.) }
      if LineNum <= Config.SkipLines then Continue;

      { Skip column header row }
      if Config.HasHeader and (LineNum = Config.SkipLines + 1) then Continue;

      if Trim(Line) = '' then Continue;

      { Honour the configured delimiter (comma, semicolon, tab, ...) }
      Fields := ParseCSVLineWithDelimiter(Line, Config.Delimiter, '"');

      if (Length(Fields) <= Config.DateColumn) or
         (Length(Fields) <= Config.DescColumn) or
         (Length(Fields) <= Config.AmountColumn) then
        Continue;  { Skip malformed rows }

      E.ID := Count + 1;
      E.Description := Trim(Fields[Config.DescColumn]);

      { Parse amount with locale awareness }
      AmountStr := Trim(Fields[Config.AmountColumn]);
      if Config.NegativeInParens and (Pos('(', AmountStr) > 0) then
      begin
        AmountStr := StringReplace(AmountStr, '(', '-', []);
        AmountStr := StringReplace(AmountStr, ')', '', []);
      end;
      if Config.DecimalSeparator <> '.' then
        AmountStr := StringReplace(AmountStr, Config.DecimalSeparator, '.', []);
      AmountStr := StringReplace(AmountStr, ',', '', [rfReplaceAll]);
      if not TryStrToCurr(AmountStr, E.Amount) then
        Continue;

      { Parse date }
      if not TryStrToDate(Fields[Config.DateColumn], E.ExpenseDate,
                          DateSettings) then
        Continue;

      E.Category := ecOther;  { User classifies later }
      E.IsRecurring := False;

      Inc(Count);
      SetLength(Result, Count);
      Result[Count - 1] := E;
    end;
  finally
    CloseFile(F);
  end;

  WriteLn(Format('Imported %d transactions from %s', [Count, Filename]));
end;

Rosa downloads her bank statement, tells PennyWise which columns contain the date, description, and amount, and the importer maps each row to a TExpense. Unknown categories are marked as ecOther for manual classification later.
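For the kind of semicolon-delimited European export described earlier, a configuration might look like this (every field value here is illustrative, chosen for a hypothetical bank):

```pascal
var
  Config: TCSVImportConfig;
  Expenses: TExpenseArray;
begin
  Config.Delimiter := ';';
  Config.HasHeader := True;
  Config.SkipLines := 2;             { two bank-info lines before the header }
  Config.DateColumn := 0;
  Config.DescColumn := 1;
  Config.AmountColumn := 2;
  Config.DateFormat := 'dd.mm.yyyy';
  Config.DecimalSeparator := ',';
  Config.NegativeInParens := False;

  Expenses := ImportFromBankCSV('statement.csv', Config);
end;
```

Once a user has worked out these values for their bank, the configuration can be saved (for instance in the INI file) and reused for every subsequent import.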

What Tomas Discovered

Tomas downloads his bank's CSV file and opens it in PennyWise. His bank uses semicolons instead of commas (common in European exports), puts amounts with a comma as the decimal separator ("85,50" instead of "85.50"), and includes a two-line header that is not a column row.

"This is exactly why we made the importer configurable," Rosa says, updating the import settings. The CSV parser handles all of these variations because it was designed for them.

But then Tomas notices something else. Three of his bank transactions have descriptions containing semicolons — "Payment; ref 12345" — and the parser treats the semicolons as delimiters, splitting the description across two columns. The fix is to configure the parser to expect quoted fields: his bank wraps descriptions containing semicolons in double quotes.

"Every bank CSV is a little different," Tomas sighs.

"That is why we do not hard-code anything," Rosa replies. "The configuration record handles all the variation. Once you set it up for your bank, it works forever — or until the bank changes its format."

This kind of real-world messiness is exactly what Chapter 34 prepares you for. Data interchange is rarely clean. The formats themselves are simple — JSON is six types, CSV is rows and columns, XML is tags and text. The complexity lives in the variations, edge cases, and inconsistencies of real-world data producers. A good developer designs for the messy reality, not the clean specification.

When to Use Each Format: A Decision Flowchart

Making the right format choice at the start of a project saves significant effort later. Here is a practical decision process:

  1. Is the data tabular (rows and columns)? Yes → CSV (or TSV). No → continue.
  2. Is the data simple key-value configuration? Yes → INI. No → continue.
  3. Does it need schema validation? Yes → XML (with XSD). No → continue.
  4. Does it need to be human-editable? Yes → JSON (or YAML, but Pascal has no built-in YAML support). No → continue.
  5. Is performance critical (millions of records)? Yes → Custom binary. No → JSON.

In practice, JSON covers the majority of use cases. It is the safe default when you do not have a specific reason to choose something else. PennyWise uses INI for configuration, JSON for data exchange with the web dashboard, CSV for bank imports, and binary (potentially) for its internal database.


34.9 Summary

This chapter covered the major structured data formats and how to work with them in Free Pascal.

INI files are perfect for simple key-value configuration. The TINIFile class in the IniFiles unit handles reading, writing, sections, and defaults. The ReadSections and ReadSection methods let you enumerate all configuration data programmatically.

JSON is the lingua franca of modern data interchange. Free Pascal's fpjson and jsonparser units provide TJSONObject, TJSONArray, and TJSONParser for building and parsing JSON documents. The FormatJSON method produces human-readable output; AsJSON produces compact output. Memory management follows parent-child ownership: freeing the root frees all children. The ExpenseToJSON and JSONToExpense converter functions demonstrate the round-trip serialization pattern that PennyWise uses for data exchange.

XML is more verbose but more expressive than JSON, with support for attributes, namespaces, and schema validation. The DOM and XMLRead/XMLWrite units provide a DOM-based API for building and traversing XML documents.

CSV is the simplest tabular format and the universal language of spreadsheets and bulk data. Correct CSV parsing must handle quoted fields, escaped quotes, and newlines within fields. Writing CSV requires quoting fields that contain delimiters. Bank CSV imports require configurable parsing to handle the many format variations in the wild.

Custom binary formats offer the best performance but the worst interoperability. Design them with a magic number, version field, and header. Use packed record to control byte layout. Support format versioning to handle evolution. Store currency as integer cents, never as floating-point.

Serialization patterns — particularly the converter pattern with a common interface — decouple your domain logic from specific file formats. This makes it easy to add new formats without modifying existing code. Robust deserializers handle missing fields, type mismatches, and date format variations gracefully.

PennyWise gained three export formats (JSON, XML, CSV), a configurable bank CSV importer, and an INI-based configuration file. The modular architecture from Chapter 33 kept these additions clean: the new FinanceExport unit depends on FinanceCore but nothing else, and the main application code did not change.

Data formats are the bridges between applications. The program that can only talk to itself is an island. The program that speaks JSON, XML, and CSV is a citizen of the software world.