Appendix A: Mathematical Foundations for DB2 Practitioners
Database systems are, at their core, applied mathematics. This appendix consolidates the formal underpinnings that DB2's optimizer, storage manager, and buffer pool rely on every time you execute a query. You do not need to memorize these formulas, but understanding them will make you a sharper diagnostician when EXPLAIN output looks wrong or capacity plans fall short.
A.1 Relational Algebra
Every SQL statement you write is translated by DB2 into an internal tree of relational algebra operators before the optimizer considers access paths. The six fundamental operators follow.
A.1.1 SELECT (Restriction) --- σ
The SELECT operator filters rows based on a predicate. It does not choose columns; that is PROJECT.
σ_predicate(R)
Example. Given a relation EMPLOYEE(EMPNO, NAME, DEPT, SALARY):
σ_{DEPT='D01'}(EMPLOYEE)
Returns only those tuples where DEPT = 'D01'. In SQL this maps to the WHERE clause:
SELECT * FROM EMPLOYEE WHERE DEPT = 'D01';
Properties:
- Commutative: σ_p(σ_q(R)) = σ_q(σ_p(R))
- Cascadable: σ_{p AND q}(R) = σ_p(σ_q(R))

The optimizer exploits commutativity to push selective predicates down the plan tree, reducing intermediate result sizes early.
A.1.2 PROJECT --- π
PROJECT removes columns (attributes) from a relation, producing a result with only the specified attributes.
π_{A1, A2, ...}(R)
Example:
π_{EMPNO, NAME}(EMPLOYEE)
In SQL:
SELECT DISTINCT EMPNO, NAME FROM EMPLOYEE;
Note the DISTINCT. Relational algebra works on sets; SQL works on multisets (bags) unless you explicitly request DISTINCT. DB2's optimizer knows when duplicate elimination is necessary and when it can be skipped.
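The set semantics of σ and π can be mimicked directly with Python sets. This is a toy illustration with made-up sample rows (the EMPNO/NAME values are illustrative, not catalog data), not a model of how DB2 actually executes anything:

```python
# EMPLOYEE relation as a set of (EMPNO, NAME, DEPT, SALARY) tuples -- sample values.
EMPLOYEE = {
    ("000010", "HAAS", "D01", 52750),
    ("000020", "THOMPSON", "D01", 41250),
    ("000030", "KWAN", "D11", 38250),
}

def select(relation, predicate):
    """sigma: keep tuples satisfying the predicate (maps to WHERE)."""
    return {t for t in relation if predicate(t)}

def project(relation, *indexes):
    """pi: keep only the listed attributes; sets eliminate duplicates (DISTINCT)."""
    return {tuple(t[i] for i in indexes) for t in relation}

d01 = select(EMPLOYEE, lambda t: t[2] == "D01")   # sigma_{DEPT='D01'}(EMPLOYEE)
names = project(d01, 0, 1)                        # pi_{EMPNO, NAME}(...)
```

Because `project` builds a set, duplicate (EMPNO, NAME) pairs collapse automatically, which is exactly why the SQL equivalent needs DISTINCT.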
A.1.3 JOIN --- ⋈
JOIN combines tuples from two relations based on a matching condition.
Theta Join:
R ⋈_{condition} S
Equijoin (theta join where the condition uses equality):
EMPLOYEE ⋈_{EMPLOYEE.DEPT = DEPARTMENT.DEPTNO} DEPARTMENT
Natural Join (equijoin on all identically named columns, projecting away duplicates):
R ⋈ S
Join cardinality estimation (critical for the optimizer):
For an equijoin on attribute A between relations R and S:
|R ⋈ S| = (|R| * |S|) / max(V(A, R), V(A, S))
Where V(A, R) is the number of distinct values of attribute A in relation R. This formula assumes uniform distribution and attribute independence---assumptions that DB2's RUNSTATS-collected statistics refine with histograms and frequent-value data.
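A minimal sketch of this estimate in Python (the cardinalities below are illustrative):

```python
def estimated_join_cardinality(card_r, card_s, distinct_r, distinct_s):
    """|R join S| = |R| * |S| / max(V(A,R), V(A,S)),
    assuming uniform distribution and attribute independence."""
    return (card_r * card_s) / max(distinct_r, distinct_s)

# 10,000 employees joined to 50 departments on DEPT,
# with 50 distinct DEPT values on each side:
est = estimated_join_cardinality(10_000, 50, 50, 50)
# Each employee matches exactly one department, so ~10,000 rows is the right answer.
```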
A.1.4 UNION
UNION combines tuples from two union-compatible relations, eliminating duplicates.
R ∪ S
Both R and S must have the same degree (number of attributes) and domain-compatible attributes. In SQL:
SELECT EMPNO, NAME FROM EMPLOYEE_NYC
UNION
SELECT EMPNO, NAME FROM EMPLOYEE_LON;
Cardinality bounds: max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|
A.1.5 INTERSECT
INTERSECT returns tuples present in both relations.
R ∩ S
Cardinality bound: |R ∩ S| ≤ min(|R|, |S|)
A.1.6 DIFFERENCE
DIFFERENCE returns tuples in R that are not in S.
R - S
Cardinality bound: |R - S| ≤ |R|
In SQL this corresponds to EXCEPT:
SELECT EMPNO FROM ALL_EMPLOYEES
EXCEPT
SELECT EMPNO FROM TERMINATED_EMPLOYEES;
A.1.7 Derived Operators
Several common operators are defined in terms of the six fundamentals:
| Operator | Definition |
|---|---|
| Semi-join R ⋉ S | π_{attrs(R)}(R ⋈ S) |
| Anti-join R ▷ S | R - (R ⋉ S) |
| Division R ÷ S | Tuples of R that pair with every tuple of S ("for all" queries) |
| Outer Join | Extension that preserves non-matching tuples |
DB2's optimizer internally uses semi-join and anti-join transformations heavily, especially when rewriting EXISTS and NOT EXISTS subqueries.
A.2 Set Theory Basics for SQL
SQL operates on multisets (bags), not pure sets, unless DISTINCT is specified. This distinction matters for aggregation and for understanding why UNION ALL is cheaper than UNION.
Key multiset operations:
| Operation | Set Semantics | Bag Semantics (SQL default) |
|---|---|---|
| UNION | Eliminates duplicates | UNION ALL preserves them |
| INTERSECT | Eliminates duplicates | INTERSECT ALL preserves min count |
| EXCEPT | Eliminates duplicates | EXCEPT ALL subtracts counts |
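The bag-versus-set distinction is easy to demonstrate with Python's `collections.Counter`, which behaves like an SQL multiset (values below are illustrative):

```python
from collections import Counter

R = Counter({"a": 2, "b": 1})   # bag: "a" appears twice
S = Counter({"a": 1, "c": 1})

union_all     = R + S           # UNION ALL: counts add (a:3, b:1, c:1)
intersect_all = R & S           # INTERSECT ALL: min of counts (a:1)
except_all    = R - S           # EXCEPT ALL: counts subtract, floored at zero (a:1, b:1)

set_union = set(R) | set(S)     # plain UNION: duplicates eliminated
```

This is also why UNION ALL is cheaper than UNION: the bag operations above never need a sort or hash to detect duplicates.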
NULL handling in set operations. Two NULLs are considered duplicates for purposes of UNION, INTERSECT, and EXCEPT. This differs from the general SQL rule that NULL = NULL evaluates to UNKNOWN, and it catches practitioners off guard during data reconciliation work.
Three-valued logic. SQL uses three truth values: TRUE, FALSE, and UNKNOWN. Any comparison involving NULL yields UNKNOWN. The WHERE clause keeps only rows where the predicate evaluates to TRUE.
| p | q | p AND q | p OR q | NOT p |
|---|---|---|---|---|
| T | T | T | T | F |
| T | U | U | T | F |
| T | F | F | T | F |
| U | U | U | U | U |
| U | F | F | U | U |
| F | F | F | F | T |
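The truth table can be encoded directly, using Python's `None` to stand in for UNKNOWN. Note that Python's own `and`/`or` operators do not follow SQL semantics with `None`, hence the explicit functions:

```python
def tv_and(p, q):
    """SQL three-valued AND; None represents UNKNOWN."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def tv_or(p, q):
    """SQL three-valued OR."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def tv_not(p):
    """SQL three-valued NOT; NOT UNKNOWN is still UNKNOWN."""
    return None if p is None else not p

def where_keeps(p):
    """WHERE keeps a row only when the predicate is exactly TRUE."""
    return p is True
```

`where_keeps(None)` returning False is the formal reason a row with a NULL comparison silently drops out of a WHERE clause.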
A.3 Cost-Based Optimizer Math
DB2's optimizer is cost-based on both z/OS and LUW. It estimates the cost of each candidate access path in terms of I/O, CPU, and (in partitioned LUW environments) communications cost. The core inputs are statistics gathered by RUNSTATS.
A.3.1 Filter Factors
A filter factor (FF) is the estimated fraction of rows that satisfy a predicate. It ranges from 0.0 to 1.0.
Equal predicate on a column with uniform distribution:
FF(col = value) = 1 / COLCARDF
Where COLCARDF is the column cardinality (number of distinct values) stored in SYSIBM.SYSCOLUMNS on z/OS; LUW records the equivalent statistic as COLCARD in SYSCAT.COLUMNS.
Range predicates:
FF(col > value) = (HIGH2KEY - value) / (HIGH2KEY - LOW2KEY)
FF(col BETWEEN v1 AND v2) = (v2 - v1) / (HIGH2KEY - LOW2KEY)
These assume uniform distribution between LOW2KEY and HIGH2KEY.
IN list:
FF(col IN (v1, v2, ..., vn)) = n / COLCARDF
Capped at 1.0.
LIKE predicate:
FF(col LIKE 'prefix%') ≈ 1 / COLCARDF (if prefix is selective)
FF(col LIKE '%substring%') ≈ default filter factor (typically 1/10 to 1/20)
DB2 uses a default filter factor when it cannot estimate selectivity. On z/OS, the classic default is 1/25 for equal predicates on unindexed columns without statistics.
Compound predicates:
FF(p1 AND p2) = FF(p1) * FF(p2) -- assumes independence
FF(p1 OR p2) = FF(p1) + FF(p2) - FF(p1) * FF(p2)
FF(NOT p) = 1 - FF(p)
The independence assumption is often wrong in practice. DB2 on z/OS v12 and Db2 LUW v11.5 both support multi-column statistics (column groups) that capture correlations, improving estimates when columns are dependent.
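A small Python sketch of these filter-factor rules. The statistics plugged in below are illustrative; real DB2 refines such estimates with histograms and column-group statistics:

```python
def ff_equal(colcard):
    """FF(col = value) = 1 / COLCARDF, assuming uniform distribution."""
    return 1.0 / colcard

def ff_range_gt(value, low2key, high2key):
    """FF(col > value), assuming uniform distribution between the key bounds."""
    return (high2key - value) / (high2key - low2key)

def ff_in(n_values, colcard):
    """FF(col IN (...)) = n / COLCARDF, capped at 1.0."""
    return min(1.0, n_values / colcard)

def ff_and(ff1, ff2):
    return ff1 * ff2                  # independence assumption

def ff_or(ff1, ff2):
    return ff1 + ff2 - ff1 * ff2

# WORKDEPT = 'D01' AND SEX = 'M' on a 10,000-row table,
# with COLCARDF(WORKDEPT) = 50 and COLCARDF(SEX) = 2:
ff = ff_and(ff_equal(50), ff_equal(2))   # 0.02 * 0.5 = 0.01
est_rows = 10_000 * ff                   # ~100 qualifying rows
```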
A.3.2 Cardinality Estimation
The estimated number of qualifying rows after applying predicates:
Estimated_Rows = CARD(table) * FF(predicate)
Where CARD(table) is the table cardinality from catalog statistics.
For a join between R and S on column A:
Estimated_Join_Rows = CARD(R) * CARD(S) * FF(join_predicate)
FF(R.A = S.A) = 1 / max(COLCARDF(A, R), COLCARDF(A, S))
Cascading joins compound the estimation error. If each single-table estimate is off by a factor of 2, a five-way join estimate can be off by 2^5 = 32x. This is why complex queries sometimes get catastrophically bad plans---and why keeping statistics current with RUNSTATS is non-negotiable.
A.3.3 Cost Formulas (Simplified)
Table scan cost:
Cost_tablescan = Pages(T) * IO_cost_per_page + CARD(T) * CPU_cost_per_row
Index scan cost (matching index):
Cost_indexscan = Levels(I) * IO_cost_per_page -- index traversal
+ MatchingPages(I) * IO_cost_per_page -- leaf page scan
+ QualifyingRows * IO_cost_per_page -- data page fetch (worst case)
+ QualifyingRows * CPU_cost_per_row -- row processing
In practice, buffer pool caching dramatically reduces the effective I/O cost term, which is where buffer pool hit ratio enters the equation.
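To make the trade-off concrete, here is a rough Python version of the two cost shapes. The unit costs are invented for illustration; DB2's internal cost units and constants differ:

```python
def tablescan_cost(pages, card, io_cost=1.0, cpu_cost=0.001):
    """Cost_tablescan = Pages * IO + CARD * CPU."""
    return pages * io_cost + card * cpu_cost

def indexscan_cost(levels, matching_pages, qualifying_rows,
                   io_cost=1.0, cpu_cost=0.001):
    """Worst case: one data-page I/O per qualifying row
    (no clustering, no buffer pool caching)."""
    return ((levels + matching_pages + qualifying_rows) * io_cost
            + qualifying_rows * cpu_cost)

# 100,000-row table on 5,000 pages; a predicate qualifying 100 rows
# via a 3-level index that touches 2 leaf pages:
scan = tablescan_cost(5_000, 100_000)
idx = indexscan_cost(3, 2, 100)
# The index scan wins by a wide margin at this selectivity;
# as qualifying_rows grows, the table scan eventually becomes cheaper.
```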
A.4 B+ Tree Height Calculations
DB2 indexes are B+ trees. Understanding the height tells you the maximum number of I/Os needed to traverse from root to leaf.
A.4.1 Basic Formula
Height = ⌈log_F(N)⌉ + 1
Where:
- F = fanout (number of entries per index page)
- N = number of leaf pages
- The "+1" accounts for the leaf level itself
A.4.2 Computing Fanout
F = floor(PageSize / (KeySize + RID_Size + Overhead))
Typical values:
- PageSize: 4096 bytes (4 KB, the z/OS default) or larger
- RID_Size: 4 bytes (z/OS traditional) or 6 bytes (LUW with large RIDs)
- Overhead: approximately 10-12 bytes per entry for pointers and headers
Example. For a 4 KB page, an 8-byte integer key, and 4-byte RID with 10-byte overhead:
F = floor(4096 / (8 + 4 + 10)) = floor(4096 / 22) = 186
A.4.3 Height Examples
| Rows | Fanout | Leaf Pages | Height | Max I/O to Leaf |
|---|---|---|---|---|
| 1,000 | 186 | 6 | 2 | 2 |
| 100,000 | 186 | 538 | 3 | 3 |
| 10,000,000 | 186 | 53,764 | 4 | 4 |
| 1,000,000,000 | 186 | 5,376,345 | 4 | 4 |
The practical takeaway: even a billion-row table needs only four I/Os (at most) to traverse from root to leaf. Since the root page and often the first non-leaf level are cached in the buffer pool, the effective cost is usually two physical I/Os---one for the lower non-leaf page and one for the leaf page.
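As a sanity check, a short Python routine can stack the index levels explicitly using the fanout model above. (Real non-leaf pages often hold more entries per page than leaf pages, so actual trees can be shallower than this model predicts.)

```python
import math

def fanout(page_size, key_size, rid_size, overhead):
    """Entries per index page under the simple sizing model."""
    return page_size // (key_size + rid_size + overhead)

def btree_height(rows, f):
    """Return (leaf_pages, height): stack non-leaf levels
    until a single root page remains."""
    leaf_pages = math.ceil(rows / f)
    height, pages = 1, leaf_pages
    while pages > 1:
        pages = math.ceil(pages / f)
        height += 1
    return leaf_pages, height

f = fanout(4096, 8, 4, 10)                    # 186, as in the worked example
# e.g. btree_height(1_000_000_000, f) reports height 4 for a billion rows
```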
A.5 Buffer Pool Hit Ratio Math
The buffer pool is DB2's in-memory cache for data and index pages. The hit ratio measures how often a page request is satisfied from memory without physical I/O.
A.5.1 Basic Hit Ratio
Hit_Ratio = (Logical_Reads - Physical_Reads) / Logical_Reads * 100
Or equivalently:
Hit_Ratio = (1 - Physical_Reads / Logical_Reads) * 100
Target values:
- Data pages: > 80% for OLTP workloads; 95%+ is achievable for well-tuned systems
- Index pages: > 95% (index pages are reused heavily and should almost always be in cache)
A.5.2 Marginal Benefit of Adding Memory
The relationship between buffer pool size and hit ratio is not linear. It follows a curve often modeled as:
Hit_Ratio ≈ 1 - (1 - Buffer_Pool_Pages / Working_Set_Pages)^α, for Buffer_Pool_Pages ≤ Working_Set_Pages
Where α depends on the access pattern. For random access, α ≈ 1. For patterns with temporal locality, α > 1, meaning the hit ratio improves faster.
Practical rule of thumb: Doubling the buffer pool size from 50% to 100% of the working set yields diminishing returns. The biggest gains come from ensuring the buffer pool is large enough to hold the "hot" set of frequently accessed pages.
A.5.3 z/OS-Specific: GETPAGE and Read Counts
On z/OS, you monitor hit ratio using:
-DISPLAY BUFFERPOOL(BP0) DETAIL
Key fields:
- GETPAGE: Total logical page requests
- SYNCHRONOUS READ: Physical reads (synchronous I/O)
- PREFETCH READ: Physical reads via sequential or list prefetch
Hit_Ratio = (GETPAGE - SYNC_READ - PREFETCH_READ) / GETPAGE * 100
Some practitioners exclude prefetch reads from the denominator since prefetch is expected to cause physical I/O. The formula above is the conservative (honest) version.
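Both variants are one-liners; a Python sketch with illustrative counter values:

```python
def hit_ratio(getpage, sync_read, prefetch_read=0):
    """Conservative hit ratio: counts prefetch reads as misses when supplied.
    Pass prefetch_read=0 to measure synchronous (random) reads only."""
    if getpage == 0:
        return 0.0
    return (getpage - sync_read - prefetch_read) / getpage * 100

# 1,000,000 GETPAGEs, 20,000 synchronous reads, 30,000 prefetch reads:
conservative = hit_ratio(1_000_000, 20_000, 30_000)   # counts all physical reads
random_only = hit_ratio(1_000_000, 20_000)            # excludes prefetch
```

The gap between the two numbers tells you how much of your physical I/O is planned (prefetch) versus demand (synchronous), which is itself a useful tuning signal.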
A.5.4 LUW: BPHR from Snapshot or MON_GET
SELECT BP_NAME,
POOL_DATA_L_READS,
POOL_DATA_P_READS,
CASE WHEN POOL_DATA_L_READS > 0
THEN DECIMAL(1.0 - (FLOAT(POOL_DATA_P_READS) / POOL_DATA_L_READS), 5, 3) * 100
ELSE 0
END AS DATA_HIT_RATIO
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T;
A.6 Capacity Planning Formulas
A.6.1 Storage Sizing
Table space size estimation:
Row_Size = Σ(column_sizes) + overhead_per_row
Rows_Per_Page = floor((PageSize - Page_Overhead) / (Row_Size + Row_Overhead))
Pages_Needed = ceil(Total_Rows / Rows_Per_Page)
Table_Space_Size = Pages_Needed * PageSize
Typical overhead values:
- z/OS page overhead: 22 bytes (4 KB page)
- z/OS row overhead: 6 bytes (non-LOB)
- LUW page overhead: 91 bytes (4 KB page)
- LUW row overhead: 8-10 bytes depending on null indicators
Include free space:
Effective_Rows_Per_Page = Rows_Per_Page * (1 - PCTFREE / 100)
Adjusted_Pages = ceil(Total_Rows / Effective_Rows_Per_Page)
Index size estimation:
Index_Entry_Size = KeySize + RID_Size + Entry_Overhead
Entries_Per_Leaf = floor((PageSize - Leaf_Overhead) / Index_Entry_Size)
Leaf_Pages = ceil(Total_Rows / Entries_Per_Leaf)
Total_Index_Pages ≈ Leaf_Pages * (1 + 1/Fanout + 1/Fanout^2 + ...) ≈ Leaf_Pages * Fanout / (Fanout - 1)
For high fanout (typical), the non-leaf overhead is less than 1% of the leaf level.
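The sizing formulas above translate directly into a rough Python estimator. The default overhead values below follow the z/OS-style figures quoted earlier and are modeling assumptions, not exact DB2 internals:

```python
import math

def table_pages(total_rows, row_size, page_size=4096,
                page_overhead=22, row_overhead=6, pctfree=0):
    """Pages needed for a table, adjusted for PCTFREE."""
    rows_per_page = (page_size - page_overhead) // (row_size + row_overhead)
    effective = rows_per_page * (1 - pctfree / 100)
    return math.ceil(total_rows / effective)

def index_pages(total_rows, key_size, rid_size=4, entry_overhead=10,
                page_size=4096, leaf_overhead=22):
    """Leaf pages plus the ~1/F geometric-series allowance for non-leaf levels."""
    entry = key_size + rid_size + entry_overhead
    per_leaf = (page_size - leaf_overhead) // entry
    leaf = math.ceil(total_rows / per_leaf)
    f = page_size // entry
    return math.ceil(leaf * f / (f - 1))

# 10 million 120-byte rows with 10% free space:
pages = table_pages(10_000_000, 120, pctfree=10)
size_gb = pages * 4096 / 1024**3          # roughly 1.3 GB under these assumptions
```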
A.6.2 IOPS Estimation
OLTP workload:
IOPS = TPS * Avg_IO_Per_Transaction * (1 - Hit_Ratio)
Where:
- TPS = transactions per second
- Avg_IO_Per_Transaction = average logical I/Os per transaction (from accounting or monitoring)
- Hit_Ratio = buffer pool hit ratio as a decimal
Example. 5,000 TPS, 20 logical I/Os per transaction, 95% hit ratio:
IOPS = 5000 * 20 * (1 - 0.95) = 5000 * 20 * 0.05 = 5,000
You need a storage subsystem capable of sustaining 5,000 random IOPS, a load that a small number of modern enterprise SSDs handles comfortably.
A.6.3 Log Space Estimation
Log_Rate = TPS * Avg_Log_Bytes_Per_Transaction
Daily_Log_Volume = Log_Rate * Seconds_Per_Day
Example. 2,000 TPS, 500 bytes average log per transaction:
Log_Rate = 2000 * 500 = 1,000,000 bytes/sec = ~1 MB/sec
Daily_Log_Volume = 1 MB/sec * 86,400 = ~84 GB/day
On z/OS, this determines how many active and archive log data sets you need. On LUW, this determines log file size and number of primary/secondary logs.
A.6.4 Backup Window Estimation
Backup_Time = Database_Size / Backup_Throughput + Overhead
Typical backup throughput with modern storage:
- Disk-to-disk: 100-500 MB/sec per stream
- Disk-to-tape (z/OS): 50-150 MB/sec per tape unit
- Parallel streams multiply throughput roughly linearly, with diminishing returns above 8-12 streams
Example. 2 TB database, 4 parallel streams at 200 MB/sec each:
Backup_Time = 2,048,000 MB / (4 * 200 MB/sec) = 2,560 sec ≈ 43 minutes
Add 10-20% for catalog operations, compression overhead, and log archival.
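The IOPS, log-volume, and backup-window formulas above bundle into a quick Python estimator. The unit conventions (binary GB, a default 15% backup overhead) are assumptions for the sketch:

```python
def required_iops(tps, logical_io_per_txn, hit_ratio):
    """Physical IOPS demand: IOPS = TPS * IO/Txn * (1 - hit ratio)."""
    return tps * logical_io_per_txn * (1 - hit_ratio)

def daily_log_gb(tps, log_bytes_per_txn):
    """Daily log volume in binary GB: TPS * bytes/txn * 86,400 seconds."""
    return tps * log_bytes_per_txn * 86_400 / 1024**3

def backup_minutes(db_size_mb, streams, mb_per_sec_per_stream, overhead_pct=15):
    """Backup window with a percentage allowance for catalog work,
    compression, and log archival."""
    seconds = db_size_mb / (streams * mb_per_sec_per_stream)
    return seconds * (1 + overhead_pct / 100) / 60

iops = required_iops(5_000, 20, 0.95)     # the 5,000 TPS worked example
log_gb = daily_log_gb(2_000, 500)         # ~80 GB/day in binary units
bkp = backup_minutes(2_048_000, 4, 200)   # ~43 min raw, ~49 min with overhead
```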
A.6.5 Recovery Time Estimation
Recovery time is harder to predict and depends on the recovery strategy:
Recovery_Time = Restore_Time + Log_Apply_Time + Validation
Log_Apply_Time = Log_Volume_Since_Backup / Log_Apply_Rate
Log apply rates are typically 2-5x slower than log generation rate due to random I/O patterns during redo processing. z/OS system-level recovery (using RECOVER TABLESPACE) benefits from parallel log apply. LUW's HADR standby replays logs continuously, so failover time is measured in seconds rather than hours.
A.7 Summary of Key Formulas
| Formula | Purpose |
|---|---|
| FF = 1/COLCARDF | Equal predicate filter factor |
| Est_Rows = CARD * FF | Cardinality estimation |
| Height = ⌈log_F(N)⌉ + 1 | B+ tree levels |
| HR = (LR - PR) / LR | Buffer pool hit ratio |
| IOPS = TPS * IO/Txn * (1-HR) | Physical I/O demand |
| Pages = ceil(Rows / RPP) | Table storage sizing |
| Log/day = TPS * bytes/txn * 86400 | Log volume planning |
These formulas are approximations. Real systems exhibit non-uniform distributions, correlated columns, skewed access patterns, and caching effects that complicate the math. But these approximations are what DB2's optimizer starts with, and understanding them is the first step to understanding why the optimizer chose the plan it chose---and what to do when it chose wrong.
A.8 Join Cost Models
Understanding the cost models for each join method helps you interpret EXPLAIN output and predict when the optimizer will switch strategies.
A.8.1 Nested-Loop Join (NLJOIN)
For each row of the outer table, read the matching rows from the inner table (typically via an index).
Cost_NLJ = CARD(outer) * (Index_Probe_Cost + Matching_Rows * Data_Page_Cost)
Where:
- Index_Probe_Cost = number of index levels traversed per probe (typically 2-3 I/Os, often cached to 0-1)
- Matching_Rows = average matching rows in inner table per outer row
- Data_Page_Cost = cost to fetch data pages (reduced by buffer pool caching)
Nested-loop join is optimal when the outer table is small and the inner table has a selective index. It is the only join method that supports non-equijoin predicates efficiently.
A.8.2 Hash Join (HSJOIN)
Build a hash table from the smaller (build) input, then probe it with each row of the larger (probe) input.
Cost_HJ = Pages(build) + Pages(probe) + CPU_hash_cost
+ Spill_Cost (if hash table exceeds available memory)
Hash join is optimal for large equijoins where both tables are large and no useful index exists. It requires the build input to fit in memory (the sort heap); if it does not, DB2 spills partitions to disk, significantly increasing cost.
A.8.3 Merge Join (MSJOIN)
Both inputs are sorted on the join key, then merged in a single pass.
Cost_MJ = Sort_Cost(R) + Sort_Cost(S) + Pages(R) + Pages(S)
If one or both inputs are already sorted (e.g., delivered by an index scan in key order), the sort cost drops to zero for that input, making merge join very efficient. DB2's optimizer considers this "sort avoidance" when evaluating merge join.
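The three cost shapes can be compared side by side in a toy Python sketch. The unit costs are illustrative, not DB2's internal constants:

```python
def nlj_cost(outer_card, probe_cost, matching_rows, page_cost):
    """Nested-loop: one index probe plus matching-row fetches per outer row."""
    return outer_card * (probe_cost + matching_rows * page_cost)

def hash_cost(build_pages, probe_pages, spill=0.0):
    """Hash join: scan both inputs once, plus any spill cost."""
    return build_pages + probe_pages + spill

def merge_cost(pages_r, pages_s, sort_r=0.0, sort_s=0.0):
    """Merge join: sort costs (zero if an index delivers key order) plus one pass."""
    return sort_r + sort_s + pages_r + pages_s

# 100-row outer probing a 3-level index with one match per row,
# versus hashing or merging two tables of 5,000 and 20,000 pages:
nlj = nlj_cost(100, 3, 1, 1)          # small outer, indexed inner
hsj = hash_cost(5_000, 20_000)        # both tables large, no index
msj = merge_cost(5_000, 20_000)       # both inputs already sorted
```

With a small outer table the nested-loop cost is tiny; scale `outer_card` up to 100,000 and the scan-based joins win, which is exactly the crossover the optimizer is searching for.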
A.8.4 Join Method Selection Rules of Thumb
| Condition | Likely Best Join |
|---|---|
| Small outer, indexed inner | Nested-loop |
| Large tables, no useful indexes | Hash join |
| Both inputs pre-sorted on join key | Merge join |
| Non-equijoin (>, <, BETWEEN) | Nested-loop |
| Memory-constrained environment | Merge join (lower memory than hash) |
A.9 Compression Ratio Estimation
DB2 supports row-level compression (LUW and z/OS) and columnar compression (BLU Acceleration on LUW). Estimating compression ratios helps with capacity planning.
A.9.1 Row Compression
Row compression uses a dictionary built from repeating patterns in the data. The benefit is usually quoted as a fractional space reduction:
Compressed_Size = Original_Size * (1 - Savings_Fraction)
Typical savings:
- Tables with many repeated values (status codes, dates, short strings): 60-80% reduction
- Tables with unique or high-entropy data (GUIDs, hashes): 10-20% reduction
- Average across a typical OLTP database: 40-60% reduction
On LUW, estimate compression before enabling:
-- Estimate compression savings for a table
SELECT * FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('MYSCHEMA', 'MYTABLE', 'ESTIMATE')) AS T;
A.9.2 Columnar Compression (BLU)
Column-organized tables typically achieve 3-10x compression ratios compared to uncompressed row-organized tables, because:
- Column-homogeneous data compresses better than mixed-type rows
- Sorted columns exhibit long runs of identical values
- Frequency encoding replaces common values with short codes
- Dictionary encoding eliminates redundant string storage
A.10 Little's Law and Its Application to DB2
Little's Law is a fundamental queueing theory result that relates throughput, concurrency, and response time:
L = λ * W
Where:
- L = average number of concurrent requests (threads in DB2)
- λ = throughput (requests per second)
- W = average response time (seconds per request)
Application to DB2 thread sizing:
If your target throughput is 2,000 TPS and average transaction response time is 10 ms:
L = 2000 * 0.010 = 20 concurrent threads
You need at least 20 concurrent database connections to sustain this throughput. In practice, add headroom for variability: 2-3x the calculated value.
Application to connection pool sizing:
Pool_Size = Target_TPS * Avg_Response_Time_Seconds * Safety_Factor
For 5,000 TPS, 15 ms average response time, and a 2x safety factor:
Pool_Size = 5000 * 0.015 * 2 = 150 connections
Little's Law explains why adding connections beyond the saturation point does not increase throughput---it only increases response time and resource contention.
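Both calculations are trivial to script. A Python sketch using the worked numbers from above:

```python
import math

def concurrency(tps, response_time_s):
    """Little's Law: L = lambda * W."""
    return tps * response_time_s

def pool_size(tps, response_time_s, safety_factor=2.0):
    """Connection pool sizing: Little's Law result times a safety factor."""
    return math.ceil(concurrency(tps, response_time_s) * safety_factor)

threads = concurrency(2_000, 0.010)   # 2,000 TPS at 10 ms -> 20 concurrent threads
pool = pool_size(5_000, 0.015)        # 5,000 TPS at 15 ms, 2x headroom -> 150
```

Note that `pool_size` grows with response time: if latency degrades under load, the pool fills up at the same throughput, which is the mechanism behind the saturation behavior described above.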