Case Study 2: Hierarchical Data — Organizational Chart Processing
Background
Meridian National Bank's EMPLOYEE table contains a self-referencing hierarchy. Each employee has a MANAGER_ID column that points to another employee's EMPLOYEE_ID. The CEO has a NULL MANAGER_ID. The hierarchy is typically 5-7 levels deep, from the CEO down through regional managers, branch managers, department leads, and individual contributors.
The bank's HR department needs several reports that require traversing this hierarchy:
- Full organizational path for each employee (e.g., "CEO > VP Operations > Regional Manager > Branch Manager > Teller").
- Total headcount under each manager, including indirect reports.
- Depth analysis — identify the deepest reporting chains and flag any that exceed the bank's policy maximum of 6 levels.
Flat SQL — without recursive capabilities — cannot traverse an unknown-depth hierarchy. This is where recursive CTEs come in.
Recursive CTEs in DB2
A recursive CTE consists of two parts:
- Anchor member: A SELECT that defines the starting rows (the base case). This runs once.
- Recursive member: A SELECT that references the CTE itself, joined to another table, to produce the next level. This runs repeatedly until it produces no new rows.
The general syntax:
WITH RECURSIVE cte_name (column_list) AS (
-- Anchor member
SELECT ...
FROM ...
WHERE ... -- starting condition
UNION ALL
-- Recursive member
SELECT ...
FROM cte_name -- self-reference
JOIN ... -- table with hierarchy
WHERE ... -- traversal condition
)
SELECT * FROM cte_name;
DB2 supports the RECURSIVE keyword, though in many DB2 versions it is optional — the optimizer detects the self-reference. However, including RECURSIVE makes the intent explicit and is recommended for clarity.
Report 1: Full Organizational Path
WITH RECURSIVE org_path (EMPLOYEE_ID, EMPLOYEE_NAME, MANAGER_ID,
ORG_LEVEL, PATH) AS (
-- Anchor: start with the CEO (MANAGER_ID IS NULL)
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
e.MANAGER_ID,
1,
CAST(e.FIRST_NAME || ' ' || e.LAST_NAME AS VARCHAR(2000))
FROM EMPLOYEE e
WHERE e.MANAGER_ID IS NULL
UNION ALL
-- Recursive: find direct reports of the current level
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
e.MANAGER_ID,
op.ORG_LEVEL + 1,
CAST(op.PATH || ' > ' || e.FIRST_NAME || ' ' || e.LAST_NAME
AS VARCHAR(2000))
FROM EMPLOYEE e
JOIN org_path op ON e.MANAGER_ID = op.EMPLOYEE_ID
)
SELECT EMPLOYEE_ID,
EMPLOYEE_NAME,
ORG_LEVEL,
PATH
FROM org_path
ORDER BY PATH;
How It Works
Iteration 0 (Anchor): The anchor selects the CEO — the one employee with MANAGER_ID IS NULL. ORG_LEVEL is 1. PATH is just the CEO's name.
Iteration 1 (First recursion): The recursive member joins EMPLOYEE to org_path on MANAGER_ID = EMPLOYEE_ID. This finds everyone who reports directly to the CEO. ORG_LEVEL becomes 2. PATH concatenates the CEO's name with " > " and the direct report's name.
Iteration 2: Finds everyone who reports to the CEO's direct reports. ORG_LEVEL becomes 3. PATH grows longer.
This continues until a recursion pass produces no new rows — meaning we have reached the leaf-level employees who have no reports.
Important Notes
- The CAST to VARCHAR(2000) on the PATH column is necessary because DB2 must know the column's maximum length for the recursive union. Without the CAST, the first iteration defines the column width as the CEO's name length, and subsequent concatenations could truncate.
- ORDER BY PATH produces a depth-first display of the org chart — employees are listed under their manager, which reads naturally as an indented hierarchy.
Report 2: Total Headcount Under Each Manager
WITH RECURSIVE subordinates (MANAGER_ID, SUBORDINATE_ID, DEPTH) AS (
-- Anchor: direct reports
SELECT e.MANAGER_ID,
e.EMPLOYEE_ID AS SUBORDINATE_ID,
1 AS DEPTH
FROM EMPLOYEE e
WHERE e.MANAGER_ID IS NOT NULL
UNION ALL
-- Recursive: reports of reports
SELECT s.MANAGER_ID,
e.EMPLOYEE_ID,
s.DEPTH + 1
FROM EMPLOYEE e
JOIN subordinates s ON e.MANAGER_ID = s.SUBORDINATE_ID
)
SELECT m.EMPLOYEE_ID,
m.FIRST_NAME || ' ' || m.LAST_NAME AS MANAGER_NAME,
COUNT(DISTINCT s.SUBORDINATE_ID) AS TOTAL_HEADCOUNT,
MAX(s.DEPTH) AS MAX_DEPTH
FROM EMPLOYEE m
JOIN subordinates s ON m.EMPLOYEE_ID = s.MANAGER_ID
GROUP BY m.EMPLOYEE_ID, m.FIRST_NAME, m.LAST_NAME
ORDER BY TOTAL_HEADCOUNT DESC;
Design Decision
Notice that the anchor's MANAGER_ID is the original manager. The recursive member keeps propagating upward — each time we find a deeper subordinate, we associate them back to the original manager. This means if the CEO has 3 direct reports and each has 10 reports, the CEO's row will show a TOTAL_HEADCOUNT of 33 (3 direct + 30 indirect).
The COUNT(DISTINCT s.SUBORDINATE_ID) is important because the same subordinate can appear through multiple recursion paths if the data has any irregularities. DISTINCT ensures accurate counting.
Alternative Approach: Bottom-Up Aggregation
An alternative is to traverse top-down, producing one row per (manager, subordinate) pair, then aggregating. The approach above does exactly that but anchors at the direct-report relationship and recurses downward. For very large organizations (tens of thousands of employees), the intermediate result set can be large. DB2 handles this efficiently for typical corporate hierarchies (hundreds to low thousands of employees).
Report 3: Depth Analysis and Policy Violations
WITH RECURSIVE chain (EMPLOYEE_ID, EMPLOYEE_NAME, CHAIN_DEPTH,
ROOT_MANAGER_ID) AS (
-- Anchor: start from the top
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
1,
e.EMPLOYEE_ID
FROM EMPLOYEE e
WHERE e.MANAGER_ID IS NULL
UNION ALL
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
c.CHAIN_DEPTH + 1,
c.ROOT_MANAGER_ID
FROM EMPLOYEE e
JOIN chain c ON e.MANAGER_ID = c.EMPLOYEE_ID
),
depth_violations AS (
SELECT EMPLOYEE_ID,
EMPLOYEE_NAME,
CHAIN_DEPTH
FROM chain
WHERE CHAIN_DEPTH > 6
)
SELECT dv.EMPLOYEE_ID,
dv.EMPLOYEE_NAME,
dv.CHAIN_DEPTH,
dv.CHAIN_DEPTH - 6 AS LEVELS_OVER_POLICY
FROM depth_violations dv
ORDER BY dv.CHAIN_DEPTH DESC;
This query first builds the full hierarchy with depth tracking, then a second CTE filters for employees whose reporting chain exceeds the 6-level policy maximum.
Cycle Detection and Safety
A critical concern with recursive CTEs is infinite loops. If the data contains a cycle — employee A reports to B, who reports to C, who reports to A — the recursive CTE will loop forever (or until DB2's recursion limit is hit).
DB2 provides protection through a maximum recursion depth. You can also guard against cycles explicitly:
WITH RECURSIVE org_safe (EMPLOYEE_ID, EMPLOYEE_NAME, ORG_LEVEL,
VISITED) AS (
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
1,
CAST(CHAR(e.EMPLOYEE_ID) AS VARCHAR(4000))
FROM EMPLOYEE e
WHERE e.MANAGER_ID IS NULL
UNION ALL
SELECT e.EMPLOYEE_ID,
e.FIRST_NAME || ' ' || e.LAST_NAME,
os.ORG_LEVEL + 1,
CAST(os.VISITED || ',' || CHAR(e.EMPLOYEE_ID) AS VARCHAR(4000))
FROM EMPLOYEE e
JOIN org_safe os ON e.MANAGER_ID = os.EMPLOYEE_ID
WHERE LOCATE(CHAR(e.EMPLOYEE_ID), os.VISITED) = 0
)
SELECT * FROM org_safe ORDER BY ORG_LEVEL;
The VISITED column accumulates a comma-separated list of visited EMPLOYEE_IDs. The WHERE clause in the recursive member checks that the next employee has not already been visited. If a cycle exists, the recursion for that path stops instead of looping.
This is a pragmatic safeguard. In production, you should also have a CHECK constraint or trigger that prevents cyclic manager assignments at the data level.
Performance Considerations
Recursive CTEs can be expensive for deep or wide hierarchies:
- Width: If each manager has many direct reports, the intermediate result set grows exponentially with depth. For a binary tree of depth 20, that is over a million rows. For a bank's org chart, this is rarely an issue.
- Depth: Each recursion level requires a pass through the data. DB2 uses a work table internally to store intermediate results. Deep hierarchies (20+ levels) may require tuning of sort heap and temp space.
- Indexes: An index on
EMPLOYEE(MANAGER_ID)is essential. Without it, each recursion level performs a full table scan of EMPLOYEE to find subordinates. With the index, it is an indexed lookup per parent.
For Meridian's EMPLOYEE table (estimated 500-2,000 employees, 5-7 levels deep), recursive CTEs execute in milliseconds with appropriate indexes.
Key Takeaways
-
Recursive CTEs solve hierarchical queries that flat SQL cannot. Any self-referencing table with a parent-child relationship is a candidate.
-
The anchor + recursive member pattern is mechanical. Anchor selects the root(s). Recursive member joins the CTE to the base table to find the next level.
-
CAST string columns to a sufficient length in the anchor member to prevent truncation during concatenation.
-
Guard against cycles with a visited-list check or rely on DB2's built-in recursion limits.
-
An index on the parent-key column (MANAGER_ID) is essential for recursive CTE performance.
-
Recursive CTEs are a natural extension of the CTE concept covered in this chapter — the WITH clause gains the power to express iteration, not just naming.