Chapter 34 Exercises: File Formats and Serialization

Part A: Conceptual Understanding

A.1. Compare JSON and XML for representing the same data. What are two advantages of JSON over XML? What are two advantages of XML over JSON?

Guidance

JSON advantages: (1) Less verbose — no closing tags, shorter syntax. (2) Simpler parsing — fewer rules, fewer edge cases. XML advantages: (1) Attributes on elements allow mixing metadata with data naturally. (2) Schema validation (XSD, DTD) allows formal specification of document structure. Other valid answers: XML has namespaces for avoiding conflicts between vocabularies; JSON maps more naturally to programming language data structures.

A.2. Why do we use packed record instead of plain record when defining binary file format structures? What problem does packed solve?

Guidance

Without `packed`, the compiler may insert padding bytes between fields to align them on word boundaries (e.g., a Word field after a Byte field might have 1 byte of padding). This improves memory access speed but means the record's memory layout does not match the intended byte layout of the file format. `packed` eliminates all padding, ensuring each field immediately follows the previous one. This is essential for binary file formats where byte positions must be exact.
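
A minimal sketch of the difference, assuming Free Pascal; the record names and fields are hypothetical and chosen only to mirror the Byte-then-Word example above. The packed size is the sum of the field sizes, while the plain size depends on the compiler and its alignment settings.

```pascal
program PackedDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical header layout, used only for illustration.
  TPlainHeader = record
    Tag: Byte;
    Count: Word;
    Size: LongWord;
  end;

  TPackedHeader = packed record
    Tag: Byte;       // 1 byte
    Count: Word;     // 2 bytes
    Size: LongWord;  // 4 bytes
  end;

begin
  // Plain record: size depends on alignment settings (often larger than 7).
  WriteLn('plain:  ', SizeOf(TPlainHeader));
  // Packed record: no padding, so 1 + 2 + 4 = 7 bytes.
  WriteLn('packed: ', SizeOf(TPackedHeader));
end.
```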

A.3. Explain the "parent owns children" memory management rule in fpjson. Why is it important, and what happens if you violate it?

Guidance

When you add a TJSONObject or TJSONArray as a child of another JSON object/array, the parent takes ownership. Freeing the root object automatically frees all children. If you violate this by freeing a child object separately, the parent will later try to free it again (double-free), causing a crash or memory corruption. The rule: only free the root object. Let it cascade to children automatically.
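
The ownership rule can be sketched as follows, assuming the fpjson unit that ships with Free Pascal; the key name and values are made up for illustration.

```pascal
program OwnershipDemo;
{$mode objfpc}{$H+}

uses
  fpjson;

var
  Root: TJSONObject;
  Tags: TJSONArray;
begin
  Root := TJSONObject.Create;
  try
    Tags := TJSONArray.Create;
    Tags.Add('food');
    Tags.Add('travel');
    Root.Add('tags', Tags);  // from here on, Root owns Tags
    WriteLn(Root.AsJSON);
    // Do NOT call Tags.Free: Root.Free below releases it as well.
  finally
    Root.Free;               // frees Root and every child it owns
  end;
end.
```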

A.4. What is a "magic number" in a binary file format? Why is it important?

Guidance

A magic number is a fixed sequence of bytes at the beginning of a file that identifies the file type. For example, JPEG files start with FF D8 FF, PNG files with 89 50 4E 47. It is important because: (1) It lets you verify the file type before attempting to parse it. (2) It prevents accidentally opening the wrong type of file and misinterpreting the data. (3) It provides a quick first sanity check: a missing or wrong magic number shows that the file is damaged or of the wrong type, although a correct one does not prove the rest of the file is intact.
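
As an illustration, a check for the PNG prefix quoted above might look like this. It is a sketch only: the function name is hypothetical, and a TMemoryStream stands in for a real file so the program runs without any input.

```pascal
program MagicDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // First four bytes of a PNG file, as listed in the guidance.
  PngMagic: array[0..3] of Byte = ($89, $50, $4E, $47);

// Hypothetical helper: True if the stream begins with the PNG prefix.
function StartsWithPngMagic(S: TStream): Boolean;
var
  Buf: array[0..3] of Byte;
begin
  S.Position := 0;
  // A short read (file smaller than the magic number) counts as a mismatch.
  Result := (S.Read(Buf, SizeOf(Buf)) = SizeOf(Buf)) and
            CompareMem(@Buf, @PngMagic, SizeOf(Buf));
end;

var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    M.WriteBuffer(PngMagic, SizeOf(PngMagic));
    WriteLn(StartsWithPngMagic(M));
  finally
    M.Free;
  end;
end.
```

With a real file, the same function can be passed a TFileStream opened with fmOpenRead.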

A.5. Describe three edge cases that make CSV parsing more complex than splitting a string on commas.

Guidance

(1) Commas within quoted fields: "Portland, OR" should be one field, not two. (2) Escaped quotes within quoted fields: "She said ""hello""" contains a literal quote. (3) Newlines within quoted fields: a field can span multiple lines if quoted. Other valid answers: empty fields between consecutive commas, different delimiters (semicolons), BOM markers, mixed line endings.

Part B: Applied Analysis

B.1. You need to store application settings that include: a list of recent files (up to 10), the window position (x, y, width, height), and a set of user preferences (theme, font size, language). Design the storage format. Would you choose INI, JSON, or XML? Justify your choice and show the file structure.

Guidance

JSON is the best choice: the data includes an array (recent files) that INI cannot represent cleanly, yet it is not complex enough to need XML's features. Structure:

{
  "recentFiles": ["file1.dat", "file2.dat"],
  "window": {"x": 100, "y": 50, "width": 800, "height": 600},
  "preferences": {"theme": "dark", "fontSize": 14, "language": "en"}
}

INI would require awkward workarounds for the array (RecentFile1, RecentFile2, ...). XML would work but adds unnecessary verbosity.

B.2. A bank provides CSV exports in this format:

"Transaction Date";"Amount";"Description";"Balance"
"15.03.2026";"-85,50";"GROCERY STORE";"1.234,50"
"15.03.2026";"-45,00";"TRANSIT AUTHORITY";"1.189,50"

Identify all the ways this differs from "standard" CSV and explain how your parser would need to handle each difference.

Guidance

Differences: (1) Semicolons instead of commas as field delimiters. (2) All fields are quoted, even when not necessary. (3) Dates in DD.MM.YYYY format instead of YYYY-MM-DD. (4) Negative amounts indicated by a leading minus sign. (5) Comma as decimal separator instead of period ("85,50" not "85.50"). (6) Period as thousands separator ("1.234,50" not "1234.50"). Each requires specific handling: configurable delimiter, strip unnecessary quotes, configurable date format, sign detection, and locale-aware number parsing (remove the period thousands separators, then replace the decimal comma with a period before converting to float).
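
The number handling is the easiest part to get wrong, so here is a minimal sketch of it. The function name is hypothetical, and it assumes the quotes have already been stripped and that every period in the field is a thousands separator.

```pascal
program EuroNumberDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: converts '1.234,50' style text to a Double.
function ParseEuroAmount(const S: string): Double;
var
  FS: TFormatSettings;
  Clean: string;
begin
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.';  // parse independently of the system locale
  Clean := StringReplace(S, '.', '', [rfReplaceAll]);       // drop thousands separators
  Clean := StringReplace(Clean, ',', '.', [rfReplaceAll]);  // decimal comma -> period
  Result := StrToFloat(Clean, FS);
end;

begin
  WriteLn(ParseEuroAmount('1.234,50'):0:2);  // prints 1234.50
  WriteLn(ParseEuroAmount('-85,50'):0:2);    // prints -85.50
end.
```

The leading minus sign needs no special code here because StrToFloat accepts it directly.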

Part C: Code Exercises

C.1. Write a program that creates an INI file with at least three sections, reads it back, modifies two values, and writes it again. Verify the file contents by reading the final version.

Guidance

Create sections like [User], [Display], [Paths]. Write values using WriteString, WriteInteger, WriteBool. Read them back using the corresponding Read methods with defaults. Modify values and call UpdateFile. Read again to verify. Consider printing the INI file contents to the console to visually confirm the result.
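
A possible starting point, assuming the TIniFile class from the IniFiles unit; the file name, keys, and values are placeholders. It covers one write-then-read round trip, leaving the modification step to the exercise.

```pascal
program IniDemo;
{$mode objfpc}{$H+}

uses
  IniFiles;

var
  Ini: TIniFile;
begin
  // 'settings.ini' is a placeholder name; it is created in the current directory.
  Ini := TIniFile.Create('settings.ini');
  try
    Ini.WriteString('User', 'Name', 'Ada');
    Ini.WriteInteger('Display', 'FontSize', 12);
    Ini.WriteBool('Display', 'DarkMode', True);
    Ini.WriteString('Paths', 'DataDir', 'data');
    Ini.UpdateFile;  // make sure pending changes reach the disk

    // The last argument of each Read call is the default for a missing key.
    WriteLn(Ini.ReadString('User', 'Name', 'unknown'));
    WriteLn(Ini.ReadInteger('Display', 'FontSize', 10));
    WriteLn(Ini.ReadBool('Display', 'DarkMode', False));
  finally
    Ini.Free;
  end;
end.
```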

C.2. Write a program that creates a JSON object representing a student with fields for name (string), age (integer), graduated (boolean), and courses (array of strings). Write it to a file with FormatJSON, then read it back and display each field.

Guidance

Use TJSONObject.Create, Add() for simple fields, and a TJSONArray for courses. Put the FormatJSON output into a TStringList and call SaveToFile. Read with TFileStream + TJSONParser, then navigate with Get() and Arrays[]. Remember to Free the root object.
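
One way the round trip could look. This sketch deviates from the guidance in two places, both by assumption: it reads through the GetJSON convenience function (with the jsonparser unit in the uses clause) instead of constructing a TJSONParser directly, and it navigates with the typed properties Strings[], Integers[], and Booleans[] rather than Get(). The file name and field values are placeholders.

```pascal
program StudentJsonDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, fpjson, jsonparser;

var
  Student, Loaded: TJSONObject;
  Courses: TJSONArray;
  Lines: TStringList;
  Stream: TFileStream;
  I: Integer;
begin
  // Build the object and write it out.
  Student := TJSONObject.Create;
  Lines := TStringList.Create;
  try
    Student.Add('name', 'Maria');
    Student.Add('age', 21);
    Student.Add('graduated', False);
    Courses := TJSONArray.Create;
    Courses.Add('Algorithms');
    Courses.Add('Databases');
    Student.Add('courses', Courses);  // Student now owns Courses
    Lines.Text := Student.FormatJSON;
    Lines.SaveToFile('student.json');
  finally
    Lines.Free;
    Student.Free;
  end;

  // Read it back and display each field.
  Stream := TFileStream.Create('student.json', fmOpenRead);
  try
    Loaded := GetJSON(Stream) as TJSONObject;
    try
      WriteLn('name: ', Loaded.Strings['name']);
      WriteLn('age: ', Loaded.Integers['age']);
      WriteLn('graduated: ', Loaded.Booleans['graduated']);
      for I := 0 to Loaded.Arrays['courses'].Count - 1 do
        WriteLn('course: ', Loaded.Arrays['courses'].Strings[I]);
    finally
      Loaded.Free;  // frees the root and all children
    end;
  finally
    Stream.Free;
  end;
end.
```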

C.3. Implement a robust CSV parser that correctly handles: (a) quoted fields with commas, (b) escaped quotes within quoted fields, (c) empty fields. Test with the following line:

"Smith, John",42,"Said ""hello""",,"Portland"

Expected result: 5 fields: Smith, John | 42 | Said "hello" | (empty) | Portland

Guidance

Use the state-machine approach from Section 34.5: track an InQuote boolean. When InQuote is true, a comma is literal (added to the field). When InQuote is false, a comma terminates the field. Two consecutive double quotes inside a quoted field represent one literal quote; a single double quote ends the quoted section. Consecutive commas produce empty fields. Test each edge case independently and then all together.
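
The following is one possible shape for that state machine, not necessarily the one from Section 34.5. The procedure name is hypothetical, and the sketch handles a single line only, so quoted newlines are out of scope.

```pascal
program CsvLineDemo;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: splits one CSV line into Fields.
procedure ParseCsvLine(const Line: string; Fields: TStrings);
var
  I: Integer;
  Field: string;
  InQuote: Boolean;
begin
  Fields.Clear;
  Field := '';
  InQuote := False;
  I := 1;
  while I <= Length(Line) do
  begin
    if InQuote then
    begin
      if Line[I] = '"' then
      begin
        if (I < Length(Line)) and (Line[I + 1] = '"') then
        begin
          Field := Field + '"';  // doubled quote: one literal quote
          Inc(I);                // skip the second quote of the pair
        end
        else
          InQuote := False;      // single quote: end of quoted section
      end
      else
        Field := Field + Line[I];  // commas are literal while quoted
    end
    else if Line[I] = '"' then
      InQuote := True
    else if Line[I] = ',' then
    begin
      Fields.Add(Field);         // unquoted comma terminates the field
      Field := '';
    end
    else
      Field := Field + Line[I];
    Inc(I);
  end;
  Fields.Add(Field);             // the last field has no trailing comma
end;

var
  Fields: TStringList;
  I: Integer;
begin
  Fields := TStringList.Create;
  try
    ParseCsvLine('"Smith, John",42,"Said ""hello""",,"Portland"', Fields);
    for I := 0 to Fields.Count - 1 do
      WriteLn(I + 1, ': [', Fields[I], ']');
  finally
    Fields.Free;
  end;
end.
```

On the test line this yields five fields, with the fourth one empty.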

C.4. Write a program that stores 10,000 random expense records in three formats: JSON, CSV, and binary (packed records). Measure the file size and read/write time for each format. Display a comparison table.

Guidance

Generate 10,000 random TExpense records (random descriptions, amounts, categories, dates). Use GetTickCount64 or Now before and after each operation. Write to all three formats, then read from all three. Display results as a table with columns for format, file size (from FileSize or FileInfo), write time, and read time.
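
A sketch of the measurement pattern for the binary format only; the other two formats would follow the same start-tick, work, end-tick shape. The TExpense layout and the file name are assumptions made for this example, since the chapter's actual record definition is not shown here.

```pascal
program TimingDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Assumed layout for illustration; substitute the chapter's TExpense.
  TExpense = packed record
    Amount: Double;
    Category: Byte;
    Description: string[31];
  end;

const
  RecordCount = 10000;

var
  Stream: TFileStream;
  Rec: TExpense;
  I: Integer;
  StartTick: QWord;
begin
  Randomize;
  StartTick := GetTickCount64;
  Stream := TFileStream.Create('expenses.bin', fmCreate);
  try
    for I := 1 to RecordCount do
    begin
      FillChar(Rec, SizeOf(Rec), 0);  // avoid writing uninitialized bytes
      Rec.Amount := Random * 500;
      Rec.Category := Random(8);
      Rec.Description := 'Expense ' + IntToStr(I);
      Stream.WriteBuffer(Rec, SizeOf(Rec));
    end;
    WriteLn('size (bytes):    ', Stream.Size);
  finally
    Stream.Free;
  end;
  WriteLn('write time (ms): ', GetTickCount64 - StartTick);
end.
```

Because GetTickCount64 has millisecond resolution at best, a single run over 10,000 records may report very small values; repeating the operation several times and averaging gives a steadier comparison.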

Part D: Challenge Problems

D.1. Implement a JSON-to-XML converter. The program should read any valid JSON file and produce a reasonable XML representation. Design rules for mapping JSON objects to elements, arrays to repeated elements, and primitive values to text content.

Guidance

Mapping rules: JSON object → XML element with child elements for each key. JSON array → repeated elements with a generic name (e.g., "item"). JSON string/number/boolean → text content of an element. JSON null → empty element. Key challenge: JSON array items have no names, so you need to invent element names (use the parent key name, singularized if possible). Test with nested structures.
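
The mapping rules can be sketched as a recursive function over fpjson data. This is a simplified illustration under several assumptions: output is built as a plain string rather than through an XML library, array items always use the generic name "item", JSON keys are assumed to be valid XML element names, and the function names are hypothetical.

```pascal
program JsonToXmlDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, fpjson, jsonparser;

// Escapes the characters that are not allowed in XML text content.
function EscapeXml(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
end;

// Hypothetical converter: wraps Data in an element called Name.
function ToXml(const Name: string; Data: TJSONData): string;
var
  I: Integer;
begin
  case Data.JSONType of
    jtObject:
      begin
        // Object: one child element per key.
        Result := '<' + Name + '>';
        for I := 0 to Data.Count - 1 do
          Result := Result + ToXml(TJSONObject(Data).Names[I], Data.Items[I]);
        Result := Result + '</' + Name + '>';
      end;
    jtArray:
      begin
        // Array: repeated elements with a generic name.
        Result := '<' + Name + '>';
        for I := 0 to Data.Count - 1 do
          Result := Result + ToXml('item', Data.Items[I]);
        Result := Result + '</' + Name + '>';
      end;
    jtNull:
      Result := '<' + Name + '/>';  // null: empty element
  else
    // String, number, or boolean: text content.
    Result := '<' + Name + '>' + EscapeXml(Data.AsString) + '</' + Name + '>';
  end;
end;

var
  Data: TJSONData;
begin
  Data := GetJSON('{"name":"Ana","tags":["a","b"],"note":null}');
  try
    WriteLn(ToXml('root', Data));
  finally
    Data.Free;
  end;
end.
```

A fuller solution would also sanitize key names and derive array item names from the parent key, as the guidance suggests.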

D.2. Build a "universal import" system that can auto-detect whether a file is JSON, XML, CSV, or INI by examining its content (not its extension). The system should read the first few bytes/lines, determine the format, and parse accordingly.

Guidance

Detection heuristics: JSON starts with '{' or '[' (after whitespace/BOM). XML starts with '