Chapter 7 Key Takeaways

Multi-Row Operations

  1. Thread switching is the hidden tax on single-row SQL. Every SQL call crosses the application-to-DB2 address space boundary at a cost of 0.01–0.05 ms. At 500 million calls, that's hours of pure overhead. Multi-row operations reduce this by 99%+ by batching rows into single calls.

  2. Multi-row FETCH requires three changes: WITH ROWSET POSITIONING on the cursor declaration, COBOL host variable arrays (OCCURS), and FETCH NEXT ROWSET FROM... FOR n ROWS syntax. Always check SQLERRD(3) for actual row count — not the requested rowset size.

  3. The three most common multi-row bugs are: (a) processing stale data in array positions beyond SQLERRD(3), (b) forgetting indicator arrays for nullable columns and checking single values instead of indexed values, and (c) forgetting the final flush in multi-row INSERT.

  4. Rowset sizing is empirical. 100–1000 is the typical sweet spot. Below 100, the overhead reduction is modest. Above 1000, WORKING-STORAGE pressure and diminishing returns dominate. Benchmark against your specific workload and row width.

  5. ATOMIC vs. NOT ATOMIC is a business decision. Use ATOMIC for financial/regulatory data where partial batches are unacceptable. Use NOT ATOMIC for logging/audit where maximum coverage matters more than completeness.

MERGE

  1. MERGE replaces the SELECT-then-decide-then-write anti-pattern with a single atomic statement. One thread switch instead of three. No concurrency gap between the existence check and the write.

  2. The MERGE source must not contain duplicates on the ON clause columns. SQLCODE -788 means your source has duplicate join keys. Aggregate before merging.

  3. Combine MERGE with global temporary tables for batch upsert. Multi-row INSERT into the temp table, then MERGE from the temp table into the target.

Temporal Tables

  1. System-period temporal tables require zero application changes. DB2 handles history archival transparently on UPDATE and DELETE. Existing COBOL programs continue to work without modification (unless they use SELECT *).

  2. The three temporal query forms answer different questions: AS OF returns the single version valid at a point in time. FROM...TO returns all versions overlapping a range (exclusive end). BETWEEN...AND returns all versions overlapping a range (inclusive end).

  3. Bi-temporal tables are essential when the data has business-meaningful validity periods. System-period alone answers "when was this current in the database." Business-period answers "when was this valid in the real world." Regulatory audits often require both perspectives.

  4. History tables grow without bound. Plan for it. Use range partitioning by SYS_END for efficient purging. Budget DASD capacity for (current_rows * avg_updates_per_row * retention_years) rows in the history table.

Recursive CTEs

  1. Recursive CTEs resolve hierarchies in one SQL statement — replacing multi-pass cursor loops with stack management. The anchor member provides root nodes; the recursive member traverses each level via UNION ALL and self-join.

  2. Always include a depth limit. A single cyclic row in your data will cause unbounded recursion that consumes all TEMP space. Add a level counter and filter on it in the recursive member's WHERE clause.

  3. Recursive CTEs execute within DB2's work file processing. They use TEMP database space, not your program's WORKING-STORAGE. Size TEMP for your largest expected hierarchy traversal.

OLAP Functions

  1. OLAP functions push row-comparison and accumulation logic into SQL. Running totals (SUM OVER), previous/next row access (LAG/LEAD), ranking (ROW_NUMBER/RANK/DENSE_RANK), and moving aggregates replace the most complex COBOL cursor-processing patterns.

  2. PARTITION BY defines independent groups; ORDER BY defines sequence within groups. The window frame (ROWS BETWEEN / RANGE BETWEEN) defines which rows contribute to the aggregate. Get these three clauses right and the function does the rest.

  3. ROWS vs. RANGE matters with duplicate ORDER BY values. ROWS counts physical positions. RANGE groups rows with the same ORDER BY value. Use ROWS when your ORDER BY includes a unique tiebreaker; use RANGE when you intentionally want tie grouping.

Decision Framework

  1. Use set-based SQL for data transformations; use COBOL for business logic. The boundary is: if it's about computing values from other rows, SQL does it better. If it's about making decisions based on business rules, external service calls, or side effects, COBOL handles it.

  2. The hybrid approach is the production approach. Real batch programs combine multi-row FETCH (input), COBOL business logic (processing), multi-row INSERT/MERGE (output), OLAP functions (analytics), and cursor-based handling (exceptions). The techniques are complementary, not competing.

Production Reminders

  • SQLERRD(3) is your row count for multi-row operations. Memorize this.
  • Indicator arrays must be declared for every nullable column. Check them by index, not as single values.
  • Temporal history purge jobs are your responsibility. DB2 doesn't auto-purge.
  • TEMP database sizing must account for recursive CTEs and OLAP sorts.
  • EXPLAIN every MERGE, recursive CTE, and OLAP query before production deployment. These constructs generate access paths that may surprise you.