Case Study 1: Cross-System Data Integration at Meridian National Bank
Background
Meridian National Bank completed a merger with Pacific Coast Savings in Q3 2024. The merger brought 35,000 new customers, 62,000 accounts, and 1.4 million historical transactions into Meridian's environment. The challenge: Pacific Coast Savings ran Oracle Database, while Meridian runs IBM DB2 for LUW. The data had been extracted, transformed, and loaded into staging tables on the DB2 side, but the integration was far from done.
Rebecca Torres, the lead data engineer, faced a problem that went beyond simple data migration. The two banks had incompatible schemas, overlapping customer IDs, different account type codes, and no shared keys. Her job was to reconcile the two systems using SQL joins as the primary analytical tool before committing any data to the production tables.
The Challenge
The staging tables from Pacific Coast Savings sat alongside Meridian's production tables:
-- Pacific Coast staging tables
PCS_CUSTOMER (PCS_CUST_ID, FULL_NAME, SSN_HASH, DOB, ADDRESS, CITY, STATE, ZIP)
PCS_ACCOUNT (PCS_ACCT_ID, PCS_CUST_ID, ACCT_CODE, BALANCE, OPEN_DATE)
PCS_TRANSACTION (PCS_TXN_ID, PCS_ACCT_ID, TYPE_CODE, AMT, TXN_TIMESTAMP)
-- Meridian production tables
CUSTOMER (CUSTOMER_ID, FIRST_NAME, LAST_NAME, SSN_HASH, DATE_OF_BIRTH, ...)
ACCOUNT (ACCOUNT_ID, CUSTOMER_ID, BRANCH_ID, ACCOUNT_TYPE, BALANCE, ...)
TRANSACTION (TXN_ID, ACCOUNT_ID, TXN_TYPE, AMOUNT, TXN_DATE, ...)
Three fundamental problems needed solving:
- Customer deduplication: Some Pacific Coast customers were already Meridian customers. These needed to be matched, not duplicated.
- Account code translation: Pacific Coast used codes like 'CHK', 'SAV', 'MMA', 'TDA'. Meridian used 'CHECKING', 'SAVINGS', 'MONEY_MARKET', 'CD'.
- Orphan detection: Some Pacific Coast accounts referenced customers that did not exist in the Pacific Coast data (data quality issues from the source system).
Phase 1: Customer Matching with FULL OUTER JOIN
The first task was identifying which Pacific Coast customers already existed in Meridian's system. Both systems stored a hashed Social Security Number (SSN_HASH) and date of birth, providing a reliable matching key.
SELECT m.CUSTOMER_ID AS MERIDIAN_ID,
m.FIRST_NAME || ' ' || m.LAST_NAME AS MERIDIAN_NAME,
p.PCS_CUST_ID AS PACIFIC_ID,
p.FULL_NAME AS PACIFIC_NAME,
CASE
WHEN m.CUSTOMER_ID IS NOT NULL AND p.PCS_CUST_ID IS NOT NULL THEN 'DUPLICATE'
WHEN m.CUSTOMER_ID IS NOT NULL AND p.PCS_CUST_ID IS NULL THEN 'MERIDIAN_ONLY'
WHEN m.CUSTOMER_ID IS NULL AND p.PCS_CUST_ID IS NOT NULL THEN 'PACIFIC_ONLY'
END AS MATCH_STATUS
FROM CUSTOMER m
FULL OUTER JOIN PCS_CUSTOMER p
ON m.SSN_HASH = p.SSN_HASH
AND m.DATE_OF_BIRTH = p.DOB;
Results summary:
| MATCH_STATUS | COUNT |
|---|---|
| DUPLICATE | 4,218 |
| MERIDIAN_ONLY | 47,892 |
| PACIFIC_ONLY | 30,782 |
Over 4,200 customers existed in both systems. Without this step, the bank would have created duplicate customer records — a compliance risk and an operational headache.
Rebecca created a mapping table from these results:
CREATE TABLE CUSTOMER_MAPPING AS (
SELECT m.CUSTOMER_ID AS MERIDIAN_CUSTOMER_ID,
p.PCS_CUST_ID AS PCS_CUSTOMER_ID,
CASE
WHEN m.CUSTOMER_ID IS NOT NULL THEN m.CUSTOMER_ID
ELSE NEXT VALUE FOR CUSTOMER_ID_SEQ
END AS FINAL_CUSTOMER_ID
FROM CUSTOMER m
FULL OUTER JOIN PCS_CUSTOMER p
ON m.SSN_HASH = p.SSN_HASH
AND m.DATE_OF_BIRTH = p.DOB
) WITH DATA;
Phase 2: Account Code Translation with INNER JOIN
Pacific Coast used short codes for account types. Rebecca created a translation table and joined it in:
CREATE TABLE ACCT_TYPE_MAP (
PCS_CODE CHAR(3),
MER_TYPE VARCHAR(15)
);
INSERT INTO ACCT_TYPE_MAP VALUES
('CHK', 'CHECKING'),
('SAV', 'SAVINGS'),
('MMA', 'MONEY_MARKET'),
('TDA', 'CD');
Then she validated that every Pacific Coast account had a valid translation:
-- Find accounts with untranslatable codes
SELECT p.PCS_ACCT_ID,
p.ACCT_CODE,
p.BALANCE
FROM PCS_ACCOUNT p
LEFT OUTER JOIN ACCT_TYPE_MAP atm
ON p.ACCT_CODE = atm.PCS_CODE
WHERE atm.PCS_CODE IS NULL;
This anti-join query revealed 47 accounts with code 'IRA' — a type Pacific Coast offered but Meridian did not yet support. The business team decided to map IRA accounts to a new 'RETIREMENT' type, and the mapping table was updated.
Phase 3: Orphan Detection with Anti-Joins
Before migrating, Rebecca checked for data integrity issues in the Pacific Coast data:
-- Pacific Coast accounts with no valid customer
SELECT p_acct.PCS_ACCT_ID,
p_acct.PCS_CUST_ID,
p_acct.BALANCE
FROM PCS_ACCOUNT p_acct
LEFT OUTER JOIN PCS_CUSTOMER p_cust
ON p_acct.PCS_CUST_ID = p_cust.PCS_CUST_ID
WHERE p_cust.PCS_CUST_ID IS NULL;
Result: 23 orphaned accounts totaling $184,000 in balances. These were flagged for manual review by the compliance team.
-- Pacific Coast transactions referencing non-existent accounts
SELECT COUNT(*) AS ORPHAN_TXN_COUNT,
SUM(ABS(p_txn.AMT)) AS TOTAL_AMOUNT
FROM PCS_TRANSACTION p_txn
LEFT OUTER JOIN PCS_ACCOUNT p_acct
ON p_txn.PCS_ACCT_ID = p_acct.PCS_ACCT_ID
WHERE p_acct.PCS_ACCT_ID IS NULL;
Result: 1,247 orphaned transactions. These were archived to a quarantine table for investigation.
Phase 4: The Final Migration Query
With mappings, translations, and orphans handled, the actual data migration used a complex multi-table join:
INSERT INTO ACCOUNT (ACCOUNT_ID, CUSTOMER_ID, BRANCH_ID, ACCOUNT_TYPE, BALANCE, OPENED_DATE, STATUS)
SELECT NEXT VALUE FOR ACCOUNT_ID_SEQ,
cm.FINAL_CUSTOMER_ID,
105, -- assigned to new "Pacific Coast Integration" branch
atm.MER_TYPE,
p.BALANCE,
p.OPEN_DATE,
'ACTIVE'
FROM PCS_ACCOUNT p
INNER JOIN CUSTOMER_MAPPING cm
ON p.PCS_CUST_ID = cm.PCS_CUSTOMER_ID
INNER JOIN ACCT_TYPE_MAP atm
ON p.ACCT_CODE = atm.PCS_CODE
WHERE p.PCS_ACCT_ID NOT IN (
-- Exclude orphaned accounts
SELECT pa.PCS_ACCT_ID
FROM PCS_ACCOUNT pa
LEFT OUTER JOIN PCS_CUSTOMER pc ON pa.PCS_CUST_ID = pc.PCS_CUST_ID
WHERE pc.PCS_CUST_ID IS NULL
);
Phase 5: Post-Migration Validation
After migration, Rebecca ran a FULL OUTER JOIN between the migrated data and the source data to ensure nothing was lost or duplicated:
SELECT CASE
WHEN p.PCS_ACCT_ID IS NOT NULL AND a.ACCOUNT_ID IS NOT NULL THEN 'Migrated'
WHEN p.PCS_ACCT_ID IS NOT NULL AND a.ACCOUNT_ID IS NULL THEN 'Missing'
WHEN p.PCS_ACCT_ID IS NULL AND a.ACCOUNT_ID IS NOT NULL THEN 'Unexpected'
END AS STATUS,
COUNT(*) AS CNT
FROM PCS_ACCOUNT p
FULL OUTER JOIN (
SELECT a.ACCOUNT_ID, am.PCS_ACCT_ID
FROM ACCOUNT a
INNER JOIN ACCOUNT_MIGRATION_MAP am ON a.ACCOUNT_ID = am.NEW_ACCOUNT_ID
) a ON p.PCS_ACCT_ID = a.PCS_ACCT_ID
GROUP BY CASE
WHEN p.PCS_ACCT_ID IS NOT NULL AND a.ACCOUNT_ID IS NOT NULL THEN 'Migrated'
WHEN p.PCS_ACCT_ID IS NOT NULL AND a.ACCOUNT_ID IS NULL THEN 'Missing'
WHEN p.PCS_ACCT_ID IS NULL AND a.ACCOUNT_ID IS NOT NULL THEN 'Unexpected'
END;
Final results:
| STATUS | CNT |
|---|---|
| Migrated | 61,977 |
| Missing | 23 |
| Unexpected | 0 |
The 23 "Missing" entries were the orphaned accounts, correctly excluded by design. Zero unexpected records confirmed no phantom data was created.
Lessons Learned
-
FULL OUTER JOIN is the reconciliation workhorse. When comparing two data sources, it reveals exactly what exists on each side and what overlaps. No other join type gives you the complete picture in a single query.
-
Anti-join patterns catch data quality issues before they become production problems. The orphaned accounts and transactions would have caused foreign key violations or, worse, silently incorrect data if not caught.
-
Translation/mapping tables with INNER JOIN enforce strict validation. Any source row that does not match the mapping is excluded — if you get fewer rows than expected, the mapping table is incomplete.
-
Multi-column join conditions improve match accuracy. Matching customers on SSN_HASH alone would have produced false positives (hash collisions). Adding DATE_OF_BIRTH as a second join condition virtually eliminated this risk.
-
The order of operations matters. Rebecca built mapping tables first, validated them, then used them in the migration. Attempting to do everything in a single query would have been unmaintainable and unverifiable.
Discussion Questions
- Why did the team choose FULL OUTER JOIN instead of two separate LEFT OUTER JOIN queries (one from each direction)?
- What risks would arise if the SSN_HASH alone were used as the join key without DATE_OF_BIRTH?
- How would you modify the migration query to handle Pacific Coast customers who had accounts at multiple branches?
- What additional validation queries would you run after the transaction migration?