Chapter 34: File Formats and Serialization: JSON, XML, INI, and Custom Binary Formats
> "The nice thing about standards is that you have so many to choose from." — Andrew S. Tanenbaum
Learning Objectives
- Read and write INI configuration files using TINIFile
- Parse and generate JSON data using the fpjson unit
- Read and write XML documents using the DOM parser
- Parse and generate CSV files with proper handling of edge cases
- Design custom binary file formats with versioning
- Implement serialization patterns for converting objects to and from file formats
- Add JSON, XML, and CSV export/import capabilities to PennyWise
34.1 Beyond Typed Files: Modern Data Formats
In Chapter 13, we learned to save and load data using Pascal's typed files — file of TExpense, file of TStudent. That was an excellent starting point. Typed files are fast, simple, and perfectly matched to Pascal's record system. But they have a fundamental limitation: they are opaque. Open a typed file in a text editor and you see gibberish. Open it in a spreadsheet and you see nothing useful. Try to read it from a Python script, a JavaScript application, or a web service, and you are out of luck. Typed files are Pascal talking to Pascal. The rest of the world does not speak that language.
Modern software lives in an ecosystem. Your Pascal program needs to read configuration files written by system administrators. It needs to export data that spreadsheets can open. It needs to exchange information with web services that speak JSON. It needs to parse bank statements that arrive as CSV files. It needs to generate reports that other applications can consume.
This chapter teaches you to speak the common languages of data interchange: INI for configuration, JSON for structured data, XML for document-oriented data, and CSV for tabular data. We will also revisit binary formats for situations where performance matters more than interoperability. By the end of this chapter, PennyWise will be able to export its data in any of these formats and import bank transaction data from CSV files — making it a genuine participant in the broader software ecosystem.
💡 Intuition: Data Formats as Languages Think of each file format as a human language. Pascal's typed files are like a private family dialect — perfectly efficient for internal communication but unintelligible to outsiders. JSON is like English — widely understood, flexible, good enough for most purposes. XML is like legal English — verbose but precise, with strict rules about structure. CSV is like a simple numbered list — minimal structure but universally readable. INI is like a labeled filing system — simple, flat, perfect for key-value configuration.
When to Use What
Before diving into specifics, here is a decision guide:
| Format | Best For | Strengths | Weaknesses |
|---|---|---|---|
| INI | Configuration files | Simple, human-readable, widely supported | Flat structure only (no nesting) |
| JSON | Data exchange, APIs, config | Structured, widely supported, lightweight | No comments (officially), no date type |
| XML | Documents, complex schemas | Self-describing, schema validation, namespaces | Verbose, complex parser |
| CSV | Tabular data, spreadsheets | Universal, tiny overhead, human-readable | No types, no nesting, quoting edge cases |
| Binary | Performance-critical, large data | Fastest read/write, compact | Not human-readable, version-fragile |
34.2 INI Files
INI files are the simplest structured file format: plain-text files organized into sections, each containing key-value pairs. They have been used for configuration since the earliest days of Windows, and they remain popular because they are almost trivially easy to read and write, both by humans and by programs.
The appeal of INI files is their simplicity. A non-programmer — a system administrator, a power user, even a curious end user — can open an INI file in Notepad, understand its structure immediately, and make changes. Try doing that with an XML file full of angle brackets and closing tags, or a JSON file where a misplaced comma causes a parse error. INI files are forgiving, human-readable, and sufficient for a surprising number of configuration needs.
Free Pascal's IniFiles unit provides the TINIFile class, which handles all the parsing, caching, and writing automatically. You never need to parse an INI file manually.
INI File Structure
[General]
AppName=PennyWise
Version=3.4
Language=English
[Database]
Filename=pennywise.db
AutoSave=true
AutoSaveInterval=30
[Display]
Currency=$
DateFormat=yyyy-mm-dd
ShowCents=true
ItemsPerPage=25
[Sync]
ServerURL=https://api.pennywise.example.com
Enabled=false
Interval=300
The format is self-explanatory: sections in square brackets, key=value pairs beneath each section. Comments start with ; or # (depending on the implementation).
A Complete TINIFile Example
Here is a complete program that demonstrates reading, writing, enumerating sections, and managing a PennyWise configuration file:
program INIDemo;
{$mode objfpc}{$H+}
uses
SysUtils, Classes, IniFiles;
var
INI: TINIFile;
AppName, DbFile, DateFmt: string;
AutoSave: Boolean;
Interval, ItemsPerPage: Integer;
Sections: TStringList;
Keys: TStringList;
I, J: Integer;
begin
INI := TINIFile.Create('pennywise.ini');
try
{ Reading values with defaults }
AppName := INI.ReadString('General', 'AppName', 'PennyWise');
DbFile := INI.ReadString('Database', 'Filename', 'data.db');
AutoSave := INI.ReadBool('Database', 'AutoSave', True);
Interval := INI.ReadInteger('Database', 'AutoSaveInterval', 60);
DateFmt := INI.ReadString('Display', 'DateFormat', 'dd/mm/yyyy');
ItemsPerPage := INI.ReadInteger('Display', 'ItemsPerPage', 20);
WriteLn('Application: ', AppName);
WriteLn('Database: ', DbFile);
WriteLn('Auto-save: ', AutoSave, ' every ', Interval, 's');
WriteLn('Date format: ', DateFmt);
WriteLn('Items per page: ', ItemsPerPage);
{ Writing values }
INI.WriteString('General', 'LastOpened', DateTimeToStr(Now));
INI.WriteInteger('Display', 'ItemsPerPage', 30);
INI.WriteBool('Sync', 'Enabled', True);
{ Enumerating all sections and keys }
WriteLn;
WriteLn('--- Full INI Contents ---');
Sections := TStringList.Create;
Keys := TStringList.Create;
try
INI.ReadSections(Sections);
for I := 0 to Sections.Count - 1 do
begin
WriteLn('[', Sections[I], ']');
INI.ReadSection(Sections[I], Keys);
for J := 0 to Keys.Count - 1 do
WriteLn(' ', Keys[J], '=',
INI.ReadString(Sections[I], Keys[J], ''));
end;
finally
Keys.Free;
Sections.Free;
end;
{ Force write to disk }
INI.UpdateFile;
finally
INI.Free;
end;
end.
Key TINIFile Methods
| Method | Purpose |
|---|---|
| ReadString(Section, Key, Default) | Read a string value, returning Default if not found |
| ReadInteger(Section, Key, Default) | Read an integer value |
| ReadBool(Section, Key, Default) | Read a boolean value |
| ReadFloat(Section, Key, Default) | Read a floating-point value |
| WriteString(Section, Key, Value) | Write a string value |
| WriteInteger(Section, Key, Value) | Write an integer value |
| WriteBool(Section, Key, Value) | Write a boolean value |
| UpdateFile | Force pending writes to disk |
| SectionExists(Section) | Check if a section exists |
| ValueExists(Section, Key) | Check if a key exists in a section |
| ReadSection(Section, Strings) | Read all keys in a section into a TStringList |
| ReadSections(Strings) | Read all section names into a TStringList |
| EraseSection(Section) | Delete an entire section |
| DeleteKey(Section, Key) | Delete a single key |
Every Read method takes a default value as its third parameter. This is a critical design feature: if the INI file is missing, corrupt, or lacks a particular key, the program gets a sensible default rather than crashing. Always provide meaningful defaults.
⚠️ Common Mistake: Forgetting UpdateFile
TINIFile buffers writes in memory. Changes are not written to disk until you call UpdateFile or the object is freed. If your program crashes between writing values and freeing the object, the changes are lost. Call UpdateFile after important writes.
A PennyWise Configuration Manager
In practice, you wrap TINIFile in a configuration class that provides typed, validated access to all settings:
type
TPennyWiseConfig = class
private
FINI: TINIFile;
FFilename: string;
public
constructor Create(const AFilename: string);
destructor Destroy; override;
{ General settings }
function GetLanguage: string;
procedure SetLanguage(const Value: string);
{ Database settings }
function GetDBPath: string;
function GetAutoSaveInterval: Integer;
function GetAutoSaveEnabled: Boolean;
{ Display settings }
function GetCurrencySymbol: string;
function GetDateFormat: string;
function GetItemsPerPage: Integer;
{ Sync settings }
function GetSyncURL: string;
function GetSyncEnabled: Boolean;
function GetSyncInterval: Integer;
procedure Save;
end;
constructor TPennyWiseConfig.Create(const AFilename: string);
begin
inherited Create;
FFilename := AFilename;
FINI := TINIFile.Create(FFilename);
end;
destructor TPennyWiseConfig.Destroy;
begin
FINI.UpdateFile;
FINI.Free;
inherited;
end;
function TPennyWiseConfig.GetLanguage: string;
begin
Result := FINI.ReadString('General', 'Language', 'English');
end;
function TPennyWiseConfig.GetDBPath: string;
begin
Result := FINI.ReadString('Database', 'Filename', 'pennywise.db');
end;
function TPennyWiseConfig.GetAutoSaveInterval: Integer;
begin
Result := FINI.ReadInteger('Database', 'AutoSaveInterval', 60);
if Result < 5 then Result := 5; { Minimum 5 seconds }
if Result > 3600 then Result := 3600; { Maximum 1 hour }
end;
function TPennyWiseConfig.GetAutoSaveEnabled: Boolean;
begin
Result := FINI.ReadBool('Database', 'AutoSave', True);
end;
function TPennyWiseConfig.GetCurrencySymbol: string;
begin
Result := FINI.ReadString('Display', 'Currency', '$');
end;
function TPennyWiseConfig.GetDateFormat: string;
begin
Result := FINI.ReadString('Display', 'DateFormat', 'yyyy-mm-dd');
end;
function TPennyWiseConfig.GetItemsPerPage: Integer;
begin
Result := FINI.ReadInteger('Display', 'ItemsPerPage', 25);
if Result < 5 then Result := 5;
if Result > 200 then Result := 200;
end;
function TPennyWiseConfig.GetSyncURL: string;
begin
Result := FINI.ReadString('Sync', 'ServerURL', '');
end;
function TPennyWiseConfig.GetSyncEnabled: Boolean;
begin
Result := FINI.ReadBool('Sync', 'Enabled', False);
end;
function TPennyWiseConfig.GetSyncInterval: Integer;
begin
Result := FINI.ReadInteger('Sync', 'Interval', 300);
end;
procedure TPennyWiseConfig.Save;
begin
FINI.UpdateFile;
end;
Notice the validation in GetAutoSaveInterval and GetItemsPerPage — the configuration manager enforces sensible bounds even if the INI file contains unreasonable values. This defensive programming prevents a hand-edited configuration file from crashing the application.
Limitations of INI Files
INI files are flat — they do not support nesting. You cannot have a section within a section, or an array of values under a single key. If you need hierarchical configuration, use JSON. If you need only key-value pairs organized into groups, INI files are ideal.
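Even so, a flat format can fake a simple list. A common workaround, sketched here under the assumption of the TINIFile API shown above, is to number the keys within a section and record a count (the section and key names are illustrative, not a standard):

```pascal
{ Store a string list under an INI section using numbered keys.
  Section name 'RecentFiles' and key names 'Count'/'FileN' are
  conventions chosen for this sketch, not part of IniFiles. }
procedure SaveRecentFiles(INI: TINIFile; Files: TStrings);
var
  I: Integer;
begin
  INI.EraseSection('RecentFiles');  { drop stale numbered keys }
  INI.WriteInteger('RecentFiles', 'Count', Files.Count);
  for I := 0 to Files.Count - 1 do
    INI.WriteString('RecentFiles', 'File' + IntToStr(I + 1), Files[I]);
end;

procedure LoadRecentFiles(INI: TINIFile; Files: TStrings);
var
  I, Count: Integer;
begin
  Files.Clear;
  Count := INI.ReadInteger('RecentFiles', 'Count', 0);
  for I := 1 to Count do
    Files.Add(INI.ReadString('RecentFiles', 'File' + IntToStr(I), ''));
end;
```

The EraseSection call before saving matters: without it, shortening the list would leave orphaned FileN keys from the previous, longer list.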
34.3 JSON
JSON (JavaScript Object Notation) has become the dominant data interchange format on the internet. It is used by virtually every REST API, many configuration systems (VS Code settings, npm package.json, tsconfig.json), and countless data pipelines. Despite its name, JSON is language-independent — every major programming language can read and write it, including Pascal.
JSON was created by Douglas Crockford in the early 2000s as a lightweight alternative to XML for data exchange. It draws its syntax from JavaScript object literals, but it has become the universal data interchange format, used far beyond the web. Its success comes from three properties: it is simple (only six data types), it is human-readable (much cleaner than XML), and it is universally supported (every language has a JSON library).
Free Pascal provides two units for JSON processing: fpjson (the data model: objects, arrays, values) and jsonparser (the parser that converts JSON text into fpjson objects). Together, they give you complete JSON support — reading, writing, building, navigating, and formatting.
JSON Structure
{
"name": "Rosa Martinelli",
"age": 28,
"freelancer": true,
"skills": ["graphic design", "illustration", "branding"],
"address": {
"city": "Portland",
"state": "OR"
},
"expenses": [
{"description": "Groceries", "amount": 85.50, "category": "food"},
{"description": "Bus pass", "amount": 45.00, "category": "transport"}
]
}
JSON has six data types: string, number, boolean (true/false), null, object (key-value pairs in {}), and array (ordered list in []). Objects and arrays can be nested to any depth.
Reading JSON with fpjson
Free Pascal includes the fpjson and jsonparser units for working with JSON:
program ReadJSON;
{$mode objfpc}{$H+}
uses
SysUtils, Classes, fpjson, jsonparser;
var
JSONStr: string;
Data: TJSONData;
Obj: TJSONObject;
Expenses: TJSONArray;
Expense: TJSONObject;
I: Integer;
begin
{ Read JSON from a string }
JSONStr := '{"name":"Rosa","expenses":[' +
'{"desc":"Groceries","amount":85.50},' +
'{"desc":"Bus pass","amount":45.00}]}';
Data := GetJSON(JSONStr);
try
Obj := Data as TJSONObject;
WriteLn('Name: ', Obj.Get('name', ''));
Expenses := Obj.Arrays['expenses'];
WriteLn('Expenses: ', Expenses.Count);
for I := 0 to Expenses.Count - 1 do
begin
Expense := Expenses.Objects[I];
WriteLn(' ', Expense.Get('desc', ''),
': $', Expense.Get('amount', 0.0):0:2);
end;
finally
Data.Free;
end;
end.
Reading JSON from a File
function LoadJSONFromFile(const Filename: string): TJSONData;
var
F: TFileStream;
Parser: TJSONParser;
begin
F := TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite);
try
Parser := TJSONParser.Create(F);
try
Result := Parser.Parse;
finally
Parser.Free;
end;
finally
F.Free;
end;
end;
Writing JSON
Building JSON programmatically uses TJSONObject and TJSONArray:
program WriteJSON;
{$mode objfpc}{$H+}
uses
SysUtils, Classes, fpjson;
var
Root: TJSONObject;
Expenses: TJSONArray;
Expense: TJSONObject;
begin
Root := TJSONObject.Create;
try
Root.Add('name', 'Rosa Martinelli');
Root.Add('version', '3.4');
Root.Add('generated', DateTimeToStr(Now));
Expenses := TJSONArray.Create;
Expense := TJSONObject.Create;
Expense.Add('description', 'Groceries');
Expense.Add('amount', 85.50);
Expense.Add('category', 'food');
Expense.Add('date', FormatDateTime('yyyy-mm-dd', Now));
Expenses.Add(Expense);
Expense := TJSONObject.Create;
Expense.Add('description', 'Bus pass');
Expense.Add('amount', 45.00);
Expense.Add('category', 'transport');
Expense.Add('date', FormatDateTime('yyyy-mm-dd', Now));
Expenses.Add(Expense);
Root.Add('expenses', Expenses);
{ Write formatted JSON to console }
WriteLn(Root.FormatJSON);
{ Write to file }
with TStringList.Create do
try
Text := Root.FormatJSON;
SaveToFile('expenses.json');
finally
Free;
end;
finally
Root.Free; { Frees all child objects too }
end;
end.
The FormatJSON method produces nicely indented output. For compact output (no whitespace), use AsJSON.
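A minimal side-by-side of the two output modes (the exact whitespace fpjson emits may vary between versions, but AsJSON stays on one line while FormatJSON spreads across several):

```pascal
program JSONOutputDemo;
{$mode objfpc}{$H+}
uses
  fpjson;
var
  Obj: TJSONObject;
begin
  { The array-of-const constructor builds key/value pairs in one call }
  Obj := TJSONObject.Create(['name', 'Rosa', 'age', 28]);
  try
    WriteLn(Obj.AsJSON);      { compact, single line }
    WriteLn(Obj.FormatJSON);  { indented, multi-line }
  finally
    Obj.Free;
  end;
end.
```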
Converting TExpense to and from JSON
A critical pattern for PennyWise is converting between the domain model (TExpense records) and JSON objects. Here is the complete round-trip serialization:
function ExpenseToJSON(const E: TExpense): TJSONObject;
begin
Result := TJSONObject.Create;
Result.Add('id', E.ID);
Result.Add('description', E.Description);
Result.Add('amount', Double(E.Amount));
Result.Add('category', CategoryToStr(E.Category));
Result.Add('date', FormatDateTime('yyyy-mm-dd', E.ExpenseDate));
Result.Add('recurring', E.IsRecurring);
end;
function JSONToExpense(Obj: TJSONObject): TExpense;
begin
Result.ID := Obj.Get('id', 0);
Result.Description := Obj.Get('description', '');
Result.Amount := Obj.Get('amount', 0.0);
Result.Category := StrToCategory(Obj.Get('category', 'other'));
Result.ExpenseDate := StrToDate(Obj.Get('date', DateToStr(Now)));
Result.IsRecurring := Obj.Get('recurring', False);
end;
{ Save an entire array of expenses to a JSON file }
procedure SaveExpensesToJSON(const Expenses: TExpenseArray;
const Filename: string);
var
Root: TJSONObject;
Arr: TJSONArray;
I: Integer;
SL: TStringList;
begin
Root := TJSONObject.Create;
try
Root.Add('format', 'PennyWise Expenses');
Root.Add('version', 1);
Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
Root.Add('count', Length(Expenses));
Arr := TJSONArray.Create;
for I := 0 to High(Expenses) do
Arr.Add(ExpenseToJSON(Expenses[I]));
Root.Add('expenses', Arr);
SL := TStringList.Create;
try
SL.Text := Root.FormatJSON;
SL.SaveToFile(Filename);
finally
SL.Free;
end;
finally
Root.Free;
end;
end;
{ Load expenses from a JSON file }
function LoadExpensesFromJSON(const Filename: string): TExpenseArray;
var
Data: TJSONData;
Root: TJSONObject;
Arr: TJSONArray;
I: Integer;
begin
Data := LoadJSONFromFile(Filename);
try
Root := Data as TJSONObject;
Arr := Root.Arrays['expenses'];
SetLength(Result, Arr.Count);
for I := 0 to Arr.Count - 1 do
Result[I] := JSONToExpense(Arr.Objects[I]);
finally
Data.Free;
end;
end;
Key fpjson Classes
| Class | Purpose |
|---|---|
| TJSONData | Base class for all JSON values |
| TJSONObject | A JSON object ({key: value, ...}) |
| TJSONArray | A JSON array ([value, ...]) |
| TJSONString | A JSON string |
| TJSONIntegerNumber | A JSON integer |
| TJSONFloatNumber | A JSON float |
| TJSONBoolean | A JSON boolean |
| TJSONNull | JSON null |
| TJSONParser | Parses JSON text into objects |
💡 Memory Management Rule
When you add a TJSONObject or TJSONArray as a child of another object/array, the parent takes ownership. Freeing the root object frees all children. Do not free child objects separately — this will cause a double-free crash.
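The ownership rule in miniature, as a compilable sketch:

```pascal
program OwnershipDemo;
{$mode objfpc}{$H+}
uses
  fpjson;
var
  Root: TJSONObject;
  Arr: TJSONArray;
begin
  Root := TJSONObject.Create;
  Arr := TJSONArray.Create;
  Arr.Add(1);
  Arr.Add(2);
  Root.Add('items', Arr);  { Root now owns Arr }
  WriteLn(Root.AsJSON);
  Root.Free;  { frees Arr too; calling Arr.Free afterwards would double-free }
end.
```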
34.4 XML
XML (Extensible Markup Language) is a more formal, more verbose, and more powerful data format than JSON. While JSON has largely replaced XML for web APIs, XML remains dominant in enterprise software, document formats (DOCX, ODT, SVG are all XML-based), configuration systems (Maven pom.xml, .NET .csproj), data feeds (RSS, Atom), and inter-system communication protocols (SOAP, XMPP).
Free Pascal includes the DOM (Document Object Model) unit for working with XML. The DOM approach loads the entire XML document into memory as a tree of nodes, which you can then navigate and manipulate. For large XML files that do not fit in memory, Free Pascal also supports SAX-style (event-based) parsing, but DOM is simpler and sufficient for most applications.
XML Structure
<?xml version="1.0" encoding="UTF-8"?>
<pennywise version="3.4">
<user name="Rosa Martinelli" />
<expenses>
<expense id="1">
<description>Groceries</description>
<amount currency="USD">85.50</amount>
<category>food</category>
<date>2026-03-15</date>
</expense>
<expense id="2">
<description>Bus pass</description>
<amount currency="USD">45.00</amount>
<category>transport</category>
<date>2026-03-15</date>
</expense>
</expenses>
</pennywise>
XML has elements (tags), attributes, text content, and a strict hierarchy. Every opening tag must have a matching closing tag (or be self-closing).
Reading XML with the DOM Parser
Free Pascal provides the DOM and XMLRead units:
program ReadXML;
{$mode objfpc}{$H+}
uses
SysUtils, Classes, DOM, XMLRead;
var
Doc: TXMLDocument;
Root, ExpensesNode, ExpenseNode, ChildNode: TDOMNode;
I: Integer;
begin
ReadXMLFile(Doc, 'expenses.xml');
try
Root := Doc.DocumentElement; { <pennywise> }
WriteLn('Root element: ', Root.NodeName);
WriteLn('Version: ', TDOMElement(Root).GetAttribute('version'));
{ Find <expenses> node }
ExpensesNode := Root.FindNode('expenses');
if ExpensesNode <> nil then
begin
ExpenseNode := ExpensesNode.FirstChild;
while ExpenseNode <> nil do
begin
if ExpenseNode.NodeName = 'expense' then
begin
Write('Expense #', TDOMElement(ExpenseNode).GetAttribute('id'), ': ');
ChildNode := ExpenseNode.FindNode('description');
if ChildNode <> nil then
Write(ChildNode.TextContent);
ChildNode := ExpenseNode.FindNode('amount');
if ChildNode <> nil then
Write(' - $', ChildNode.TextContent);
WriteLn;
end;
ExpenseNode := ExpenseNode.NextSibling;
end;
end;
finally
Doc.Free;
end;
end.
Writing XML
program WriteXML;
{$mode objfpc}{$H+}
uses
SysUtils, Classes, DOM, XMLWrite;
var
Doc: TXMLDocument;
Root, ExpensesNode, ExpenseNode, ChildNode: TDOMElement;
begin
Doc := TXMLDocument.Create;
try
{ Create root element }
Root := Doc.CreateElement('pennywise');
Root.SetAttribute('version', '3.4');
Doc.AppendChild(Root);
{ Create user element }
ChildNode := Doc.CreateElement('user');
ChildNode.SetAttribute('name', 'Rosa Martinelli');
Root.AppendChild(ChildNode);
{ Create expenses container }
ExpensesNode := Doc.CreateElement('expenses');
Root.AppendChild(ExpensesNode);
{ Add an expense }
ExpenseNode := Doc.CreateElement('expense');
ExpenseNode.SetAttribute('id', '1');
ChildNode := Doc.CreateElement('description');
ChildNode.TextContent := 'Groceries';
ExpenseNode.AppendChild(ChildNode);
ChildNode := Doc.CreateElement('amount');
ChildNode.SetAttribute('currency', 'USD');
ChildNode.TextContent := '85.50';
ExpenseNode.AppendChild(ChildNode);
ChildNode := Doc.CreateElement('category');
ChildNode.TextContent := 'food';
ExpenseNode.AppendChild(ChildNode);
ChildNode := Doc.CreateElement('date');
ChildNode.TextContent := FormatDateTime('yyyy-mm-dd', Now);
ExpenseNode.AppendChild(ChildNode);
ExpensesNode.AppendChild(ExpenseNode);
{ Write to file }
WriteXMLFile(Doc, 'expenses.xml');
WriteLn('XML file written successfully.');
finally
Doc.Free;
end;
end.
DOM Navigation: A Complete Reference
The DOM API works with a tree of nodes. Understanding the navigation methods is essential for extracting data from complex XML documents:
| Method/Property | Purpose |
|---|---|
| Doc.DocumentElement | The root element of the document |
| Node.FindNode('name') | Find a direct child by tag name |
| Node.FirstChild | First child node |
| Node.LastChild | Last child node |
| Node.NextSibling | Next sibling at the same level |
| Node.PreviousSibling | Previous sibling at the same level |
| Node.ParentNode | Parent node |
| Node.NodeName | Tag name of the element |
| Node.TextContent | Text content of the element and all descendants |
| Node.HasChildNodes | Whether the node has children |
| Node.ChildNodes | A TDOMNodeList of all children |
| TDOMElement(Node).GetAttribute('name') | Get an attribute value |
| TDOMElement(Node).SetAttribute('name', 'value') | Set an attribute |
A common pattern for processing all elements of a specific type is the sibling-walking loop:
procedure ProcessAllExpenses(ExpensesNode: TDOMNode);
var
Node: TDOMNode;
Desc, Amount, Category: string;
begin
Node := ExpensesNode.FirstChild;
while Node <> nil do
begin
if (Node.NodeType = ELEMENT_NODE) and (Node.NodeName = 'expense') then
begin
Desc := '';
Amount := '';
Category := '';
if Node.FindNode('description') <> nil then
Desc := Node.FindNode('description').TextContent;
if Node.FindNode('amount') <> nil then
Amount := Node.FindNode('amount').TextContent;
if Node.FindNode('category') <> nil then
Category := Node.FindNode('category').TextContent;
WriteLn(Format(' %s: $%s (%s)', [Desc, Amount, Category]));
end;
Node := Node.NextSibling;
end;
end;
Note the check for Node.NodeType = ELEMENT_NODE. This is important because the DOM tree includes text nodes (whitespace between elements) as children. Without this check, you would try to process whitespace nodes as expense elements and get empty results.
XML Helper Functions
In practice, you write helper functions to simplify DOM access:
function GetChildText(Parent: TDOMNode; const ChildName: string;
const Default: string = ''): string;
var
Child: TDOMNode;
begin
Child := Parent.FindNode(ChildName);
if (Child <> nil) and (Child.TextContent <> '') then
Result := Child.TextContent
else
Result := Default;
end;
function GetAttr(Element: TDOMNode; const AttrName: string;
const Default: string = ''): string;
begin
if Element is TDOMElement then
Result := TDOMElement(Element).GetAttribute(AttrName)
else
Result := Default;
if Result = '' then
Result := Default;
end;
procedure AddChildText(Doc: TXMLDocument; Parent: TDOMElement;
const ChildName, Text: string);
var
Child: TDOMElement;
begin
Child := Doc.CreateElement(ChildName);
Child.TextContent := Text;
Parent.AppendChild(Child);
end;
With these helpers, the expense reading code becomes much more concise:
{ Instead of repeating FindNode + nil check + TextContent: }
Desc := GetChildText(ExpenseNode, 'description');
Amount := GetChildText(ExpenseNode, 'amount', '0.00');
Category := GetChildText(ExpenseNode, 'category', 'other');
ID := GetAttr(ExpenseNode, 'id', '0');
📊 JSON vs. XML: When to Choose Choose JSON when: you need lightweight data exchange, your consumers are web applications, or simplicity is important. Choose XML when: you need schema validation, document-oriented data, attributes on elements, or when integrating with enterprise systems that require XML. In practice, most new projects use JSON; most legacy integration requires XML.
34.5 CSV
CSV (Comma-Separated Values) is the simplest tabular data format. Every spreadsheet program can read and write CSV. Every programming language has CSV support. Bank statements, data exports, and bulk imports commonly use CSV. If you need to move tabular data between applications, CSV is the lowest common denominator — the format that everything understands.
CSV has no formal standard (RFC 4180 comes close but is not universally followed), which is both its strength and its weakness. The strength: any program can produce CSV by simply writing comma-separated values, one row per line. The weakness: different programs interpret "CSV" differently, leading to an endless parade of edge cases.
CSV Structure
Date,Description,Amount,Category
2026-03-15,Groceries,85.50,food
2026-03-15,Bus pass,45.00,transport
2026-03-14,Movie tickets,24.00,entertainment
2026-03-12,Electricity,95.30,utilities
Simple, right? Except CSV has edge cases that trip up almost every naive parser:
- Commas in values: "Portland, OR" — values containing commas must be quoted
- Quotes in values: "She said ""hello""" — literal quotes are escaped by doubling
- Newlines in values: a quoted field can span multiple lines
- Leading/trailing spaces: are they part of the value or whitespace to trim?
- Empty fields: a,,c — the second field is empty, not missing
- Different delimiters: some "CSV" files use semicolons, tabs, or pipes
Writing CSV
Writing CSV is straightforward if you handle quoting:
function CSVQuote(const S: string): string;
var
NeedsQuoting: Boolean;
begin
NeedsQuoting := (Pos(',', S) > 0) or (Pos('"', S) > 0) or
(Pos(#10, S) > 0) or (Pos(#13, S) > 0);
if NeedsQuoting then
Result := '"' + StringReplace(S, '"', '""', [rfReplaceAll]) + '"'
else
Result := S;
end;
procedure WriteCSVLine(var F: TextFile; const Fields: array of string);
var
I: Integer;
begin
for I := 0 to High(Fields) do
begin
if I > 0 then Write(F, ',');
Write(F, CSVQuote(Fields[I]));
end;
WriteLn(F);
end;
procedure ExportExpensesToCSV(const Expenses: TExpenseArray; const Filename: string);
var
F: TextFile;
I: Integer;
begin
AssignFile(F, Filename);
Rewrite(F);
try
{ Header row }
WriteCSVLine(F, ['Date', 'Description', 'Amount', 'Category', 'Recurring']);
{ Data rows }
for I := 0 to High(Expenses) do
WriteCSVLine(F, [
FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate),
Expenses[I].Description,
Format('%.2f', [Double(Expenses[I].Amount)]),
CategoryToStr(Expenses[I].Category),
BoolToStr(Expenses[I].IsRecurring, 'yes', 'no')
]);
finally
CloseFile(F);
end;
end;
Reading CSV
Parsing CSV correctly requires handling quoted fields. Here is a complete, robust parser:
function ParseCSVLine(const Line: string): TStringArray;
var
Fields: TStringArray;
FieldCount: Integer;
I: Integer;
InQuote: Boolean;
Current: string;
procedure AddField;
begin
if FieldCount >= Length(Fields) then
SetLength(Fields, FieldCount + 16);
Fields[FieldCount] := Current;
Inc(FieldCount);
Current := '';
end;
begin
SetLength(Fields, 16);
FieldCount := 0;
Current := '';
InQuote := False;
I := 1;
while I <= Length(Line) do
begin
if InQuote then
begin
if (Line[I] = '"') then
begin
if (I < Length(Line)) and (Line[I + 1] = '"') then
begin
Current := Current + '"'; { Escaped quote }
Inc(I); { Skip the second quote }
end
else
InQuote := False; { End of quoted field }
end
else
Current := Current + Line[I];
end
else
begin
case Line[I] of
',': AddField;
'"': InQuote := True;
else
Current := Current + Line[I];
end;
end;
Inc(I);
end;
AddField; { Add the last field }
SetLength(Fields, FieldCount);
Result := Fields;
end;
CSV Edge Cases: A Comprehensive Test
Here is a test that exercises every CSV edge case:
procedure TestCSVParser;
var
Fields: TStringArray;
begin
{ Basic fields }
Fields := ParseCSVLine('a,b,c');
Assert(Length(Fields) = 3);
Assert(Fields[0] = 'a');
{ Quoted field with comma }
Fields := ParseCSVLine('name,"Portland, OR",zip');
Assert(Fields[1] = 'Portland, OR');
{ Quoted field with embedded quote }
Fields := ParseCSVLine('title,"She said ""hello""",end');
Assert(Fields[1] = 'She said "hello"');
{ Empty fields }
Fields := ParseCSVLine('a,,c');
Assert(Length(Fields) = 3);
Assert(Fields[1] = '');
{ Leading/trailing empty }
Fields := ParseCSVLine(',b,');
Assert(Length(Fields) = 3);
Assert(Fields[0] = '');
Assert(Fields[2] = '');
WriteLn('All CSV parser tests passed.');
end;
A Configurable CSV Parser
Since real-world CSV files vary so widely, a professional CSV parser must support configurable delimiters. Here is an enhanced version that handles different delimiters and multiline quoted fields:
type
TCSVReader = class
private
FDelimiter: Char;
FQuoteChar: Char;
FLines: TStringList;
FCurrentLine: Integer;
public
constructor Create(const Filename: string; ADelimiter: Char = ',';
AQuoteChar: Char = '"');
destructor Destroy; override;
function ReadRow: TStringArray;
function EOF: Boolean;
procedure Reset;
property Delimiter: Char read FDelimiter write FDelimiter;
property CurrentLine: Integer read FCurrentLine;
end;
constructor TCSVReader.Create(const Filename: string;
ADelimiter: Char; AQuoteChar: Char);
begin
inherited Create;
FDelimiter := ADelimiter;
FQuoteChar := AQuoteChar;
FCurrentLine := 0;
FLines := TStringList.Create;
FLines.LoadFromFile(Filename);
end;
destructor TCSVReader.Destroy;
begin
FLines.Free;
inherited;
end;
function TCSVReader.ReadRow: TStringArray;
var
  CombinedLine: string;
  InQuote: Boolean;
  C: Char;
begin
  if EOF then
  begin
    SetLength(Result, 0);
    Exit;
  end;
  { Handle multiline quoted fields by combining physical lines }
  CombinedLine := FLines[FCurrentLine];
  Inc(FCurrentLine);
  { Count quote characters — an odd total means a quoted field
    continues on the next physical line }
  InQuote := False;
  for C in CombinedLine do
    if C = FQuoteChar then InQuote := not InQuote;
  while InQuote and (FCurrentLine < FLines.Count) do
  begin
    CombinedLine := CombinedLine + #10 + FLines[FCurrentLine];
    Inc(FCurrentLine);
    InQuote := False;
    for C in CombinedLine do
      if C = FQuoteChar then InQuote := not InQuote;
  end;
  { ParseCSVLineWithDelimiter is a variant of the ParseCSVLine
    function shown earlier, generalized to take the delimiter and
    quote character as parameters }
  Result := ParseCSVLineWithDelimiter(CombinedLine, FDelimiter, FQuoteChar);
end;
function TCSVReader.EOF: Boolean;
begin
Result := FCurrentLine >= FLines.Count;
end;
procedure TCSVReader.Reset;
begin
FCurrentLine := 0;
end;
This TCSVReader class handles the three most problematic CSV edge cases: configurable delimiters (commas, semicolons, tabs), multiline fields (where a quoted value contains newline characters), and escaped quotes (doubled quote characters inside quoted fields). It reads the file once into a TStringList and then parses row by row, combining physical lines when a quoted field spans multiple lines.
Importing Bank CSV Files
Bank CSV imports deserve special attention because they are one of the most common real-world data integration tasks, and they are also one of the most frustrating. The core problem is that there is no universal bank CSV standard. Every bank formats its exports differently, and even the same bank may change its format without notice.
Here are some of the variations you will encounter in the wild:
Delimiter variations: Most banks use commas, but European banks often use semicolons (because commas are used as decimal separators in many European locales). Some banks use tabs. A few use pipes.
Date format variations: US banks typically use MM/DD/YYYY. European banks use DD/MM/YYYY or DD.MM.YYYY. ISO-compliant banks use YYYY-MM-DD. Some include timestamps; most do not.
Amount format variations: US banks use a period as the decimal separator (85.50). European banks use a comma (85,50). Some banks use a period as a thousands separator (1.234,50). Some banks put debits and credits in separate columns. Others use a single amount column with negative numbers for debits. Some banks use parentheses for negative numbers: (85.50).
Header variations: Most banks include a header row. Some include multiple header lines (bank name, account number, date range, then the column headers). Some include no headers at all.
Encoding variations: Most modern exports are UTF-8. Some older systems use Latin-1 or Windows-1252. A few still produce ASCII-only output, replacing special characters.
The lesson is clear: a bank CSV importer must be configurable. Hard-coding any assumption about the format will break the first time a user tries to import from a different bank.
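To make the variation concrete, here are three invented rows, each describing the same transaction as a US bank, a German bank, and an ISO-style export might render it:

```
Date,Description,Amount
03/15/2024,COFFEE SHOP,-4.50            <- US: MM/DD/YYYY, period decimal

Buchungstag;Verwendungszweck;Betrag
15.03.2024;COFFEE SHOP;-4,50            <- German: DD.MM.YYYY, semicolons, comma decimal

date,description,debit,credit
2024-03-15,COFFEE SHOP,4.50,            <- ISO date, separate debit/credit columns
```

Three formats, one transaction. A parser hard-coded for any one of them silently mangles the other two.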
⚠️ The Bank CSV Problem
Every bank formats its CSV exports differently. Some use semicolons instead of commas. Some put the date in MM/DD/YYYY format, others in DD.MM.YYYY. Some include a header row, others do not. Some quote every field, others quote nothing. When importing bank CSV files, you almost always need a configuration step where the user maps columns and specifies the date format. PennyWise handles this in the project checkpoint.
34.6 Custom Binary Formats
Sometimes neither JSON nor CSV is appropriate. If you are storing millions of records, or if read/write speed is critical, or if file size matters, a custom binary format can be orders of magnitude more efficient than text-based formats.
Designing a Binary Format with a Versioning Header
A well-designed binary format has these properties:
- A magic number — the first few bytes identify the file type
- A version field — so the reader knows which format version to expect
- A header — metadata about the file contents
- A body — the actual data in a defined layout
- A checksum (optional) — to detect corruption
type
TPennyWiseHeader = packed record
Magic: array[0..3] of Char; { 'PWDB' }
Version: Word; { Format version: 1, 2, ... }
RecordCount: LongWord; { Number of expense records }
CreatedDate: Int64; { TDateTime as Int64 bits }
Reserved: array[0..15] of Byte; { For future use }
end;
TPennyWiseRecordV1 = packed record
ID: LongWord;
DescLen: Word;
{ Description string follows (variable length) }
Amount: Int64; { Currency stored as cents (integer) }
Category: Byte; { TExpenseCategory ordinal }
ExpenseDate: Int64; { TDateTime as Int64 bits }
Flags: Byte; { Bit 0: IsRecurring }
end;
{ Version 2 adds a Notes field }
TPennyWiseRecordV2 = packed record
ID: LongWord;
DescLen: Word;
{ Description string follows (variable length) }
Amount: Int64;
Category: Byte;
ExpenseDate: Int64;
Flags: Byte;
NotesLen: Word; { NEW in v2 }
{ Notes string follows (variable length) }
end;
The packed record Keyword
Notice packed record instead of plain record. Without packed, the compiler may insert padding bytes between fields for alignment optimization. In a file format, we need exact control over the byte layout, so packed eliminates all padding.
Why Store Currency as Cents?
Notice that the amount is stored as Int64 representing cents rather than as a floating-point number. This is critical: floating-point numbers cannot represent all decimal values exactly. The value 0.10 (ten cents) has no exact representation in IEEE 754 binary floating-point. Over thousands of transactions, these tiny errors accumulate into real discrepancies. By storing amounts as integers (cents), every value is exact. The conversion is straightforward: store Round(Amount * 100); to display, use Format('%d.%.2d', [Cents div 100, Abs(Cents) mod 100]) so single-digit cent values are zero-padded (1050 cents prints as 10.50, not 10.5).
This is the same technique used by banks, payment processors, and accounting software worldwide. When you design binary formats for financial data, never use floating-point for currency. Use integer cents, or use Pascal's Currency type (internally a 64-bit integer with four implied decimal places).
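You can watch the floating-point drift happen. This small program adds ten cents 10,000 times both ways; the integer sum is exactly 1000.00, while the Double sum typically is not:

```pascal
program CentsDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  FloatSum: Double;
  CentSum: Int64;
  I: Integer;
begin
  FloatSum := 0.0;
  CentSum := 0;
  for I := 1 to 10000 do
  begin
    FloatSum := FloatSum + 0.10; { each addition carries a tiny binary error }
    CentSum := CentSum + 10;     { integer arithmetic is exact }
  end;
  WriteLn('Float sum equals 1000.0? ', FloatSum = 1000.0); { typically FALSE }
  WriteLn('Cent sum: ', Format('%d.%.2d', [CentSum div 100, CentSum mod 100]));
end.
```

The comparison on the second-to-last line is the whole argument in one expression: ten thousand exact-looking additions of 0.10 do not sum to an exact 1000.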
Versioning Strategy
The version field is critical. When you need to add new fields or change the format:
- Increment the version number
- Write the new format
- In the reader, check the version and branch accordingly:
procedure LoadBinaryExpenses(const Filename: string);
var
F: file;
Header: TPennyWiseHeader;
begin
AssignFile(F, Filename);
Reset(F, 1);
try
BlockRead(F, Header, SizeOf(Header));
if (Header.Magic[0] <> 'P') or (Header.Magic[1] <> 'W') or
(Header.Magic[2] <> 'D') or (Header.Magic[3] <> 'B') then
raise Exception.Create('Not a PennyWise data file');
case Header.Version of
1: LoadVersion1(F, Header.RecordCount);
2: LoadVersion2(F, Header.RecordCount);
else
raise Exception.CreateFmt('Unsupported format version: %d', [Header.Version]);
end;
finally
CloseFile(F);
end;
end;
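The version-specific readers follow the write layout in reverse. Here is a sketch of LoadVersion2; the AddExpense helper that appends to the in-memory list is hypothetical, and version 1 is identical minus the Notes field at the end:

```pascal
procedure LoadVersion2(var F: file; Count: LongWord);
var
  I: LongWord;
  E: TExpense;
  Len: Word;
  Buf: TBytes;
  AmountCents, DateBits: Int64;
  Cat, Flags: Byte;
begin
  for I := 1 to Count do
  begin
    BlockRead(F, E.ID, SizeOf(LongWord));
    BlockRead(F, Len, SizeOf(Word));       { description length }
    SetLength(Buf, Len);
    if Len > 0 then
      BlockRead(F, Buf[0], Len);
    E.Description := TEncoding.UTF8.GetString(Buf);
    BlockRead(F, AmountCents, SizeOf(Int64));
    E.Amount := AmountCents / 100;         { integer cents back to Currency }
    BlockRead(F, Cat, 1);
    E.Category := TExpenseCategory(Cat);
    BlockRead(F, DateBits, SizeOf(Int64));
    Move(DateBits, E.ExpenseDate, SizeOf(Int64)); { bit-copy back to TDateTime }
    BlockRead(F, Flags, 1);
    E.IsRecurring := (Flags and 1) <> 0;
    BlockRead(F, Len, SizeOf(Word));       { NEW in v2: notes length }
    SetLength(Buf, Len);
    if Len > 0 then
      BlockRead(F, Buf[0], Len);           { read past (or store) the notes }
    AddExpense(E);                         { hypothetical append helper }
  end;
end;
```

Notice how the variable-length strings force a length-prefix read before each string body; this is why the record types only declare DescLen and NotesLen, with a comment marking where the bytes actually live.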
Writing the Binary Format
Here is the complete write procedure for version 2:
procedure SaveBinaryExpenses(const Expenses: TExpenseArray;
const Filename: string);
var
F: file;
Header: TPennyWiseHeader;
DescBytes: TBytes;
Len: Word;
AmountCents: Int64;
DateBits: Int64;
Cat: Byte;
Flags: Byte;
CreatedDT: TDateTime;
I: Integer;
begin
AssignFile(F, Filename);
Rewrite(F, 1);
try
{ Write header }
Header.Magic[0] := 'P';
Header.Magic[1] := 'W';
Header.Magic[2] := 'D';
Header.Magic[3] := 'B';
Header.Version := 2;
Header.RecordCount := Length(Expenses);
CreatedDT := Now; { Move requires a variable, not a function result }
Move(CreatedDT, Header.CreatedDate, SizeOf(Int64));
FillChar(Header.Reserved, SizeOf(Header.Reserved), 0);
BlockWrite(F, Header, SizeOf(Header));
{ Write records }
for I := 0 to High(Expenses) do
begin
BlockWrite(F, Expenses[I].ID, SizeOf(LongWord));
DescBytes := TEncoding.UTF8.GetBytes(Expenses[I].Description);
Len := Length(DescBytes);
BlockWrite(F, Len, SizeOf(Word));
if Len > 0 then
BlockWrite(F, DescBytes[0], Len);
AmountCents := Round(Expenses[I].Amount * 100); { Currency to integer cents }
BlockWrite(F, AmountCents, SizeOf(Int64));
Cat := Ord(Expenses[I].Category);
BlockWrite(F, Cat, 1);
Move(Expenses[I].ExpenseDate, DateBits, SizeOf(Int64));
BlockWrite(F, DateBits, SizeOf(Int64));
Flags := 0;
if Expenses[I].IsRecurring then Flags := Flags or 1;
BlockWrite(F, Flags, 1);
{ Notes (empty for now) }
Len := 0;
BlockWrite(F, Len, SizeOf(Word));
end;
finally
CloseFile(F);
end;
end;
Performance Comparison
For 100,000 expense records, typical file sizes and read times:
| Format | File Size | Write Time | Read Time |
|---|---|---|---|
| JSON (formatted) | 12.4 MB | 890 ms | 1,240 ms |
| JSON (compact) | 8.1 MB | 650 ms | 980 ms |
| XML | 15.7 MB | 1,100 ms | 1,800 ms |
| CSV | 5.2 MB | 320 ms | 480 ms |
| Binary (packed) | 3.1 MB | 45 ms | 38 ms |
Binary is 10-30x faster and 2-5x smaller. But it cannot be opened in a text editor, cannot be read by other applications without a custom parser, and is fragile across format changes. The choice depends on your priorities.
💡 Design Principle: Use Text Formats for Interchange, Binary for Internal Storage A common architecture uses binary files for the application's internal data store (fast, compact) and text formats (JSON, CSV) for import/export. This gives you the best of both worlds: performance for daily use, interoperability when sharing data with other applications.
Endianness and Portability
One pitfall of binary formats is endianness — the order in which bytes are stored for multi-byte values. Intel and AMD CPUs are little-endian (least significant byte first). Some ARM configurations are big-endian (most significant byte first). If you write a binary file on a little-endian machine and read it on a big-endian machine, multi-byte values (integers, floats) will be garbled.
For files that will only be read on the same architecture (most desktop applications), this is not a concern. For files that might be transferred between architectures, you must define a byte order and convert as needed. The convention is to use network byte order (big-endian), with explicit conversion functions:
function SwapEndian16(Value: Word): Word;
begin
Result := (Value shr 8) or (Value shl 8);
end;
function SwapEndian32(Value: LongWord): LongWord;
begin
Result := ((Value and $FF) shl 24) or
((Value and $FF00) shl 8) or
((Value and $FF0000) shr 8) or
((Value and $FF000000) shr 24);
end;
Free Pascal also provides built-in SwapEndian functions in the System unit for this purpose. For PennyWise, which runs on standard desktop platforms (all little-endian), endianness is not a concern. But it is important to know about for cross-platform binary format design.
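Alongside SwapEndian, FPC's System unit provides NtoBE/BEtoN ("native to big-endian" and back), which compile to no-ops on big-endian machines. A sketch of the portable route, applied to the header fields from SaveBinaryExpenses just before writing:

```pascal
{ Sketch: put multi-byte header fields in network byte order before
  writing; the reader applies BEtoN after BlockRead to reverse it }
Header.Version := NtoBE(Header.Version);         { Word overload }
Header.RecordCount := NtoBE(Header.RecordCount); { LongWord overload }
Header.CreatedDate := NtoBE(Header.CreatedDate); { Int64 overload }
BlockWrite(F, Header, SizeOf(Header));
```

The single-byte fields (Category, Flags, the magic characters) need no conversion, which is one quiet advantage of designing formats around bytes where possible.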
Choosing Between Binary and Text Formats: A Decision Framework
| Criterion | Binary | Text (JSON/CSV) |
|---|---|---|
| File size | Smallest | 2-5x larger |
| Read/write speed | Fastest (10-30x) | Slower (parsing overhead) |
| Human-readable | No | Yes |
| Cross-application | No (custom parser needed) | Yes (universal) |
| Version evolution | Fragile (must handle old versions) | Flexible (ignore unknown fields) |
| Debugging | Hex editor required | Text editor |
| Corruption recovery | Difficult | Easier (partial data readable) |
Use binary for: internal databases, cache files, performance-critical data, large datasets (>1M records). Use text for: configuration files, data exchange, reports, anything a human might need to read.
34.7 Serialization Patterns
Serialization is the process of converting an in-memory object (a TExpense record, a TBudget array, a complex object graph) into a format that can be stored in a file, transmitted over a network, or persisted in a database. Deserialization is the reverse: reconstructing the in-memory object from the stored format.
Every time PennyWise saves expenses to JSON, it is serializing. Every time it loads a bank CSV, it is deserializing. These operations seem simple when you have one format and one data type, but they become complex when you have multiple formats (JSON, XML, CSV, binary) and multiple data types (expenses, budgets, categories, user preferences). Without a systematic approach, you end up with a tangled web of format-specific code.
The Converter Pattern
One clean approach is to define format-specific converters that each implement the same interface:
type
IExpenseSerializer = interface
procedure SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
function LoadExpenses(const Filename: string): TExpenseArray;
end;
TJSONExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
procedure SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
function LoadExpenses(const Filename: string): TExpenseArray;
end;
TCSVExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
procedure SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
function LoadExpenses(const Filename: string): TExpenseArray;
end;
TXMLExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
procedure SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
function LoadExpenses(const Filename: string): TExpenseArray;
end;
Now the rest of your application works with IExpenseSerializer without knowing or caring whether the format is JSON, CSV, or XML:
procedure ExportData(Serializer: IExpenseSerializer;
const Expenses: TExpenseArray; const Filename: string);
begin
WriteLn('Exporting ', Length(Expenses), ' expenses to ', Filename);
Serializer.SaveExpenses(Expenses, Filename);
WriteLn('Export complete.');
end;
This is the same dependency inversion principle we applied to unit design in Chapter 33, now applied to file format handling.
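With the interface in place, switching formats becomes a one-line change at the call site. Because the serializers descend from TInterfacedObject, the interface reference counts them and frees each one automatically (LoadCurrentExpenses is a hypothetical stand-in for however your application obtains its data):

```pascal
var
  Expenses: TExpenseArray;
begin
  Expenses := LoadCurrentExpenses; { hypothetical data source }
  { Same call, three formats; no format-specific code at this level }
  ExportData(TJSONExpenseSerializer.Create, Expenses, 'expenses.json');
  ExportData(TCSVExpenseSerializer.Create, Expenses, 'expenses.csv');
  ExportData(TXMLExpenseSerializer.Create, Expenses, 'expenses.xml');
end;
```

Adding a fourth format later means writing one new class; ExportData and everything above it stay untouched.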
Implementing TJSONExpenseSerializer
Here is the complete implementation for the JSON serializer, demonstrating the converter pattern in practice:
type
TJSONExpenseSerializer = class(TInterfacedObject, IExpenseSerializer)
private
function ExpenseToObj(const E: TExpense): TJSONObject;
function ObjToExpense(Obj: TJSONObject): TExpense;
public
procedure SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
function LoadExpenses(const Filename: string): TExpenseArray;
end;
function TJSONExpenseSerializer.ExpenseToObj(const E: TExpense): TJSONObject;
begin
Result := TJSONObject.Create;
Result.Add('id', E.ID);
Result.Add('description', E.Description);
Result.Add('amount', Double(E.Amount));
Result.Add('category', CategoryToStr(E.Category));
Result.Add('date', FormatDateTime('yyyy-mm-dd', E.ExpenseDate));
Result.Add('recurring', E.IsRecurring);
end;
function TJSONExpenseSerializer.ObjToExpense(Obj: TJSONObject): TExpense;
begin
Result.ID := Obj.Get('id', 0);
Result.Description := Obj.Get('description', '');
Result.Amount := SafeGetAmount(Obj, 'amount');
Result.Category := StrToCategory(Obj.Get('category', 'other'));
Result.ExpenseDate := SafeGetDate(Obj, 'date');
Result.IsRecurring := Obj.Get('recurring', False);
end;
procedure TJSONExpenseSerializer.SaveExpenses(const Expenses: TExpenseArray;
const Filename: string);
var
Root: TJSONObject;
Arr: TJSONArray;
I: Integer;
SL: TStringList;
begin
Root := TJSONObject.Create;
try
Root.Add('format', 'PennyWise Expenses');
Root.Add('version', 1);
Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
Root.Add('count', Length(Expenses));
Arr := TJSONArray.Create;
for I := 0 to High(Expenses) do
Arr.Add(ExpenseToObj(Expenses[I]));
Root.Add('expenses', Arr);
SL := TStringList.Create;
try
SL.Text := Root.FormatJSON;
SL.SaveToFile(Filename);
finally
SL.Free;
end;
finally
Root.Free;
end;
end;
function TJSONExpenseSerializer.LoadExpenses(
const Filename: string): TExpenseArray;
var
Data: TJSONData;
Root: TJSONObject;
Arr: TJSONArray;
I: Integer;
begin
SetLength(Result, 0);
if not FileExists(Filename) then Exit;
Data := LoadJSONFromFile(Filename);
try
if not (Data is TJSONObject) then Exit;
Root := Data as TJSONObject;
{ Check format version for forward compatibility }
if Root.Get('version', 0) > 1 then
WriteLn('Warning: file was created by a newer version of PennyWise');
Arr := Root.Arrays['expenses'];
SetLength(Result, Arr.Count);
for I := 0 to Arr.Count - 1 do
Result[I] := ObjToExpense(Arr.Objects[I]);
finally
Data.Free;
end;
end;
The serializer checks the file's version field before loading. If the file was created by a newer version of PennyWise, it warns the user but still attempts to load — because ObjToExpense uses defaults for missing fields, it will gracefully handle any fields that the current version does not recognize.
Handling Format Differences Gracefully
Real-world data is messy. A robust deserializer must handle:
- Missing fields: Use defaults. Do not crash if a field is absent.
- Extra fields: Ignore them. Forward compatibility means accepting data from newer versions.
- Type mismatches: An amount might be a string "85.50" instead of a number 85.50. Handle both.
- Date format variations: Try multiple date formats before giving up.
- Encoding issues: UTF-8 BOM, different line endings (CR, LF, CRLF).
function SafeGetAmount(Obj: TJSONObject; const Key: string): Currency;
var
Data: TJSONData;
begin
Data := Obj.Find(Key);
if Data = nil then
Exit(0);
case Data.JSONType of
jtNumber: Result := Data.AsFloat;
jtString:
begin
{ Amount stored as a string, e.g. "85.50" }
if not TryStrToCurr(Data.AsString, Result) then
Result := 0;
end;
else
Result := 0;
end;
end;
function SafeGetDate(Obj: TJSONObject; const Key: string): TDateTime;
var
S: string;
Formats: array[0..3] of string;
I: Integer;
Settings: TFormatSettings;
begin
S := Obj.Get(Key, '');
if S = '' then
Exit(Now);
Formats[0] := 'yyyy-mm-dd';
Formats[1] := 'mm/dd/yyyy';
Formats[2] := 'dd.mm.yyyy';
Formats[3] := 'dd/mm/yyyy';
Settings := DefaultFormatSettings;
for I := 0 to High(Formats) do
begin
Settings.ShortDateFormat := Formats[I];
{ TryStrToDate also honours DateSeparator, so match it to the format }
if Pos('-', Formats[I]) > 0 then
Settings.DateSeparator := '-'
else if Pos('.', Formats[I]) > 0 then
Settings.DateSeparator := '.'
else
Settings.DateSeparator := '/';
if TryStrToDate(S, Result, Settings) then
Exit;
end;
Result := Now; { All formats failed — use current date }
end;
34.8 Project Checkpoint: PennyWise Export/Import
PennyWise has been storing its data in a format that only PennyWise can read. This is like writing a journal in a private cipher — useful for you, but useless for anyone (or anything) else. In this checkpoint, we give PennyWise the ability to speak the world's data languages, transforming it from a closed system into an interoperable one.
We will add three capabilities:
- Export expenses to JSON, XML, and CSV
- Import bank transactions from CSV
- Configuration via INI file
The Export System
We create a FinanceExport unit (extending the modular architecture from Chapter 33):
unit FinanceExport;
{$mode objfpc}{$H+}
interface
uses
SysUtils, Classes, FinanceCore;
procedure ExportToJSON(const Expenses: TExpenseArray; const Filename: string);
procedure ExportToCSV(const Expenses: TExpenseArray; const Filename: string);
procedure ExportToXML(const Expenses: TExpenseArray; const Filename: string);
implementation
uses
fpjson, DOM, XMLWrite;
{ --- JSON Export --- }
procedure ExportToJSON(const Expenses: TExpenseArray; const Filename: string);
var
Root: TJSONObject;
Arr: TJSONArray;
Exp: TJSONObject;
I: Integer;
SL: TStringList;
begin
Root := TJSONObject.Create;
try
Root.Add('format', 'PennyWise Export');
Root.Add('version', '3.4');
Root.Add('exported', FormatDateTime('yyyy-mm-dd"T"hh:nn:ss', Now));
Root.Add('count', Length(Expenses));
Arr := TJSONArray.Create;
for I := 0 to High(Expenses) do
begin
Exp := TJSONObject.Create;
Exp.Add('id', Expenses[I].ID);
Exp.Add('description', Expenses[I].Description);
Exp.Add('amount', Double(Expenses[I].Amount));
Exp.Add('category', CategoryToStr(Expenses[I].Category));
Exp.Add('date', FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate));
Exp.Add('recurring', Expenses[I].IsRecurring);
Arr.Add(Exp);
end;
Root.Add('expenses', Arr);
SL := TStringList.Create;
try
SL.Text := Root.FormatJSON;
SL.SaveToFile(Filename);
finally
SL.Free;
end;
finally
Root.Free;
end;
end;
{ --- CSV Export --- }
function QuoteCSV(const S: string): string;
begin
if (Pos(',', S) > 0) or (Pos('"', S) > 0) or (Pos(#10, S) > 0) or (Pos(#13, S) > 0) then
Result := '"' + StringReplace(S, '"', '""', [rfReplaceAll]) + '"'
else
Result := S;
end;
procedure ExportToCSV(const Expenses: TExpenseArray; const Filename: string);
var
F: TextFile;
I: Integer;
begin
AssignFile(F, Filename);
Rewrite(F);
try
WriteLn(F, 'ID,Date,Description,Amount,Category,Recurring');
for I := 0 to High(Expenses) do
WriteLn(F,
Expenses[I].ID, ',',
FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate), ',',
QuoteCSV(Expenses[I].Description), ',',
Format('%.2f', [Double(Expenses[I].Amount)]), ',',
QuoteCSV(CategoryToStr(Expenses[I].Category)), ',',
BoolToStr(Expenses[I].IsRecurring, 'yes', 'no')
);
finally
CloseFile(F);
end;
end;
{ --- XML Export --- }
procedure ExportToXML(const Expenses: TExpenseArray; const Filename: string);
var
Doc: TXMLDocument;
Root, ExpensesNode, ExpNode, Child: TDOMElement;
I: Integer;
begin
Doc := TXMLDocument.Create;
try
Root := Doc.CreateElement('pennywise');
Root.SetAttribute('version', '3.4');
Root.SetAttribute('exported', FormatDateTime('yyyy-mm-dd', Now));
Doc.AppendChild(Root);
ExpensesNode := Doc.CreateElement('expenses');
ExpensesNode.SetAttribute('count', IntToStr(Length(Expenses)));
Root.AppendChild(ExpensesNode);
for I := 0 to High(Expenses) do
begin
ExpNode := Doc.CreateElement('expense');
ExpNode.SetAttribute('id', IntToStr(Expenses[I].ID));
Child := Doc.CreateElement('description');
Child.TextContent := Expenses[I].Description;
ExpNode.AppendChild(Child);
Child := Doc.CreateElement('amount');
Child.TextContent := Format('%.2f', [Double(Expenses[I].Amount)]);
ExpNode.AppendChild(Child);
Child := Doc.CreateElement('category');
Child.TextContent := CategoryToStr(Expenses[I].Category);
ExpNode.AppendChild(Child);
Child := Doc.CreateElement('date');
Child.TextContent := FormatDateTime('yyyy-mm-dd', Expenses[I].ExpenseDate);
ExpNode.AppendChild(Child);
ExpensesNode.AppendChild(ExpNode);
end;
WriteXMLFile(Doc, Filename);
finally
Doc.Free;
end;
end;
end.
The Bank CSV Importer
Importing bank CSV is more complex because every bank's format is different. Our importer is configurable:
type
TCSVImportConfig = record
Delimiter: Char;
HasHeader: Boolean;
DateColumn: Integer;
DescColumn: Integer;
AmountColumn: Integer;
DateFormat: string;
SkipLines: Integer;
DecimalSeparator: Char;
NegativeInParens: Boolean;
end;
function ImportFromBankCSV(const Filename: string;
const Config: TCSVImportConfig): TExpenseArray;
Here is the implementation:
function ImportFromBankCSV(const Filename: string;
const Config: TCSVImportConfig): TExpenseArray;
var
F: TextFile;
Line: string;
Fields: TStringArray;
LineNum, Count, I: Integer;
E: TExpense;
AmountStr: string;
DateSettings: TFormatSettings;
begin
SetLength(Result, 0);
Count := 0;
LineNum := 0;
DateSettings := DefaultFormatSettings;
DateSettings.ShortDateFormat := Config.DateFormat;
{ Derive the date separator from the format string itself; a fixed
position like DateFormat[3] would fail for 'yyyy-mm-dd' }
DateSettings.DateSeparator := '/';
for I := 1 to Length(Config.DateFormat) do
if not (Config.DateFormat[I] in ['a'..'z', 'A'..'Z']) then
begin
DateSettings.DateSeparator := Config.DateFormat[I];
Break;
end;
AssignFile(F, Filename);
Reset(F);
try
while not System.EOF(F) do
begin
ReadLn(F, Line);
Inc(LineNum);
{ Skip initial lines (bank headers, etc.) }
if LineNum <= Config.SkipLines then Continue;
{ Skip column header row }
if Config.HasHeader and (LineNum = Config.SkipLines + 1) then Continue;
if Trim(Line) = '' then Continue;
Fields := ParseCSVLineWithDelimiter(Line, Config.Delimiter, '"');
if (Length(Fields) <= Config.DateColumn) or
(Length(Fields) <= Config.DescColumn) or
(Length(Fields) <= Config.AmountColumn) then
Continue; { Skip malformed rows }
E.ID := Count + 1;
E.Description := Trim(Fields[Config.DescColumn]);
{ Parse amount with locale awareness }
AmountStr := Trim(Fields[Config.AmountColumn]);
if Config.NegativeInParens and (Pos('(', AmountStr) > 0) then
begin
AmountStr := StringReplace(AmountStr, '(', '-', []);
AmountStr := StringReplace(AmountStr, ')', '', []);
end;
{ Normalise the amount: strip thousands separators first, then make
'.' the decimal separator so TryStrToCurr can parse it }
if Config.DecimalSeparator = ',' then
begin
AmountStr := StringReplace(AmountStr, '.', '', [rfReplaceAll]);
AmountStr := StringReplace(AmountStr, ',', '.', []);
end
else
AmountStr := StringReplace(AmountStr, ',', '', [rfReplaceAll]);
if not TryStrToCurr(AmountStr, E.Amount) then
Continue;
{ Parse date }
if not TryStrToDate(Fields[Config.DateColumn], E.ExpenseDate,
DateSettings) then
Continue;
E.Category := ecOther; { User classifies later }
E.IsRecurring := False;
Inc(Count);
SetLength(Result, Count);
Result[Count - 1] := E;
end;
finally
CloseFile(F);
end;
WriteLn(Format('Imported %d transactions from %s', [Count, Filename]));
end;
Rosa downloads her bank statement, tells PennyWise which columns contain the date, description, and amount, and the importer maps each row to a TExpense. Unknown categories are marked as ecOther for manual classification later.
What Tomas Discovered
Tomas downloads his bank's CSV file and opens it in PennyWise. His bank uses semicolons instead of commas (common in European exports), puts amounts with a comma as the decimal separator ("85,50" instead of "85.50"), and includes a two-line header that is not a column row.
"This is exactly why we made the importer configurable," Rosa says, updating the import settings. The CSV parser handles all of these variations because it was designed for them.
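Tomas's bank might be described like this; the column positions are invented for illustration, so check them against your own bank's export:

```pascal
var
  Cfg: TCSVImportConfig;
  Imported: TExpenseArray;
begin
  Cfg.Delimiter := ';';          { semicolon-delimited export }
  Cfg.HasHeader := True;         { column header row follows the bank header }
  Cfg.SkipLines := 2;            { two-line bank header before the columns }
  Cfg.DateColumn := 0;           { invented positions for this example }
  Cfg.DescColumn := 1;
  Cfg.AmountColumn := 2;
  Cfg.DateFormat := 'dd.mm.yyyy';
  Cfg.DecimalSeparator := ',';   { "85,50" rather than "85.50" }
  Cfg.NegativeInParens := False;
  Imported := ImportFromBankCSV('statement.csv', Cfg);
end;
```

Nine fields of configuration absorb what would otherwise be nine hard-coded assumptions.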
But then Tomas notices something else. Three of his bank transactions have descriptions containing semicolons — "Payment; ref 12345" — and the parser treats the semicolons as delimiters, splitting the description across two columns. The fix is to configure the parser to expect quoted fields: his bank wraps descriptions containing semicolons in double quotes.
"Every bank CSV is a little different," Tomas sighs.
"That is why we do not hard-code anything," Rosa replies. "The configuration record handles all the variation. Once you set it up for your bank, it works forever — or until the bank changes its format."
This kind of real-world messiness is exactly what Chapter 34 prepares you for. Data interchange is rarely clean. The formats themselves are simple — JSON has six value types, CSV is rows and columns, XML is tags and text. The complexity lives in the variations, edge cases, and inconsistencies of real-world data producers. A good developer designs for the messy reality, not the clean specification.
When to Use Each Format: A Decision Flowchart
Making the right format choice at the start of a project saves significant effort later. Here is a practical decision process:
- Is the data tabular (rows and columns)? Yes → CSV (or TSV). No → continue.
- Is the data simple key-value configuration? Yes → INI. No → continue.
- Does it need schema validation? Yes → XML (with XSD). No → continue.
- Does it need to be human-editable? Yes → JSON (or YAML, but Pascal has no built-in YAML support). No → continue.
- Is performance critical (millions of records)? Yes → Custom binary. No → JSON.
In practice, JSON covers the majority of use cases. It is the safe default when you do not have a specific reason to choose something else. PennyWise uses INI for configuration, JSON for data exchange with the web dashboard, CSV for bank imports, and binary (potentially) for its internal database.
34.9 Summary
This chapter covered the major structured data formats and how to work with them in Free Pascal.
INI files are perfect for simple key-value configuration. The TINIFile class in the IniFiles unit handles reading, writing, sections, and defaults. The ReadSections and ReadSection methods let you enumerate all configuration data programmatically.
JSON is the lingua franca of modern data interchange. Free Pascal's fpjson and jsonparser units provide TJSONObject, TJSONArray, and TJSONParser for building and parsing JSON documents. The FormatJSON method produces human-readable output; AsJSON produces compact output. Memory management follows parent-child ownership: freeing the root frees all children. The ExpenseToJSON and JSONToExpense converter functions demonstrate the round-trip serialization pattern that PennyWise uses for data exchange.
XML is more verbose but more expressive than JSON, with support for attributes, namespaces, and schema validation. The DOM and XMLRead/XMLWrite units provide a DOM-based API for building and traversing XML documents.
CSV is the simplest tabular format and the universal language of spreadsheets and bulk data. Correct CSV parsing must handle quoted fields, escaped quotes, and newlines within fields. Writing CSV requires quoting fields that contain delimiters. Bank CSV imports require configurable parsing to handle the many format variations in the wild.
Custom binary formats offer the best performance but the worst interoperability. Design them with a magic number, version field, and header. Use packed record to control byte layout. Support format versioning to handle evolution. Store currency as integer cents, never as floating-point.
Serialization patterns — particularly the converter pattern with a common interface — decouple your domain logic from specific file formats. This makes it easy to add new formats without modifying existing code. Robust deserializers handle missing fields, type mismatches, and date format variations gracefully.
PennyWise gained three export formats (JSON, XML, CSV), a configurable bank CSV importer, and an INI-based configuration file. The modular architecture from Chapter 33 kept these additions clean: the new FinanceExport unit depends on FinanceCore but nothing else, and the main application code did not change.
Data formats are the bridges between applications. The program that can only talk to itself is an island. The program that speaks JSON, XML, and CSV is a citizen of the software world.