Case Study 5.1: The Data Migration That Wasn't — Rafael's Painful Education

The Situation

- Organization: Meridian Capital (fictional)
- Context: Mid-implementation crisis during the AML platform replacement
- Problem: Data quality issues discovered during migration that were not visible before the project started
- Time: Month 5 of an 8-month implementation


Background

By the time Rafael Torres signed the contract with the new AML vendor (an integrated monitoring and case management platform replacing the 8-year-old rules-based system), he had done significant due diligence. He had tested the vendor's model against historical Meridian data. He had visited three reference clients. He had negotiated contractual protections around implementation timeline and data migration support.

What he had not done — and what he would later describe as his most significant oversight — was conduct a comprehensive assessment of Meridian's data quality before committing to the implementation timeline.


Month 5: The Discovery

The original implementation plan called for a full data migration in Month 3: all five years of historical transaction data, all customer records, and all case histories from the legacy system would be migrated to the new platform. This historical data was necessary for the new ML model to be trained on Meridian-specific transaction patterns.

In Month 3, the migration began. By Month 5, it was still not complete. The data engineering team had encountered a cascade of data quality problems that the pre-migration assessment had not identified:

Problem 1: Inconsistent customer identifiers

Meridian's core banking system used one customer ID format. The legacy AML platform used a different one. The mapping between them was maintained in a spreadsheet by one analyst who had retired in 2022. The spreadsheet was found, but it had not been updated since 2021, meaning approximately 8,400 customers added after 2021 did not have a mapping.
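A gap of this kind can be surfaced early by comparing the core banking customer list against the mapping table. The following is a minimal sketch, assuming both have been exported into plain Python collections; the ID formats, the sample data, and the function name `find_unmapped_customers` are hypothetical and not Meridian's actual formats.

```python
def find_unmapped_customers(core_ids, mapping):
    """Return core banking IDs with no usable entry in the legacy mapping.

    core_ids: iterable of customer IDs from the core banking system.
    mapping:  dict of core banking ID -> legacy AML ID (e.g. loaded from
              the mapping spreadsheet).
    """
    # An ID counts as unmapped if it is absent or maps to an empty value.
    return sorted(cid for cid in set(core_ids) if not mapping.get(cid))


# Hypothetical sample data; the ID formats are invented for illustration.
core_ids = ["CB-000101", "CB-000102", "CB-000103", "CB-000104"]
mapping = {"CB-000101": "AML/7731", "CB-000102": "AML/7732", "CB-000103": ""}

unmapped = find_unmapped_customers(core_ids, mapping)
coverage = 1 - len(unmapped) / len(set(core_ids))
print(f"{len(unmapped)} unmapped IDs, mapping coverage {coverage:.0%}")
```

Run against the full customer list before the timeline is agreed, a check like this turns an unknown gap into a counted one that can be scheduled and budgeted.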

Problem 2: Missing transaction data from the acquired retail broker-dealer

When Meridian acquired a retail brokerage five years earlier, the transaction data from the acquired firm's systems had been technically migrated to Meridian's core banking system, but with a different account numbering convention that had never been fully reconciled. Approximately 18 months of historical transaction data for the acquired firm's customers existed in a format that the new AML platform could not consume without transformation.

Problem 3: Corrupted case history data

The legacy AML system's case history database had been migrated to a new server in 2020 as part of an unrelated IT project. During that migration, approximately 2,300 case records had been partially corrupted: readable, but with key fields missing or truncated. These records could not be migrated to the new platform without manual remediation.
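Records that are readable but incomplete can be separated from clean ones automatically, so that manual remediation effort can be sized in advance. The sketch below illustrates one way to do that; the field names, the minimum-length rule used to detect truncation, and the sample records are all assumptions made for illustration, not the legacy system's actual schema.

```python
# Hypothetical schema: fields every case record is expected to carry.
REQUIRED_FIELDS = ("case_id", "customer_id", "opened_on", "disposition")
# Hypothetical rule: values shorter than this are treated as truncated.
MIN_LENGTHS = {"case_id": 8}


def record_defects(record):
    """List the defects found in one case record (empty list means clean)."""
    defects = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            defects.append(f"{field}: missing")
        elif len(str(value)) < MIN_LENGTHS.get(field, 1):
            defects.append(f"{field}: truncated")
    return defects


def triage(records):
    """Split records into those safe to migrate and those needing remediation."""
    clean, needs_remediation = [], []
    for record in records:
        defects = record_defects(record)
        if defects:
            needs_remediation.append((record.get("case_id"), defects))
        else:
            clean.append(record)
    return clean, needs_remediation


# Invented sample records.
cases = [
    {"case_id": "CASE-0001", "customer_id": "C1", "opened_on": "2019-03-02", "disposition": "closed"},
    {"case_id": "CASE-", "customer_id": "C2", "opened_on": "2019-05-11", "disposition": "closed"},
    {"case_id": "CASE-0003", "customer_id": "", "opened_on": "2019-07-30", "disposition": None},
]
clean, needs_remediation = triage(cases)
print(f"{len(clean)} clean, {len(needs_remediation)} need remediation")
```

The per-record defect list gives the compliance team a worklist, and the counts give the project a basis for estimating analyst-hours.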

Problem 4: Reference data inconsistencies

The legacy system had been configured with a country risk classification scheme that was no longer aligned with Meridian's current risk framework. The new platform used a different classification. Approximately 40 monitoring scenarios needed to be reconfigured to account for the reclassification.
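The set of scenarios touched by a reclassification can be derived mechanically once both classifications are available side by side. The following sketch assumes the two schemes can be expressed as simple country-to-rating lookups and that each scenario lists the countries it references; the placeholder country codes, ratings, and scenario names are hypothetical.

```python
def countries_reclassified(legacy_risk, current_risk):
    """Return {country: (legacy_rating, current_rating)} where the ratings differ."""
    return {
        country: (legacy_risk[country], current_risk.get(country))
        for country in legacy_risk
        if legacy_risk[country] != current_risk.get(country)
    }


def scenarios_to_review(scenarios, changed_countries):
    """Return names of scenarios that reference at least one reclassified country."""
    return sorted(
        name
        for name, countries in scenarios.items()
        if set(countries) & set(changed_countries)
    )


# Placeholder codes and ratings, invented for illustration.
legacy_risk = {"AA": "high", "BB": "medium", "CC": "low"}
current_risk = {"AA": "high", "BB": "high", "CC": "medium"}
scenarios = {
    "wire_to_high_risk": ["AA", "BB"],
    "cash_structuring": [],
    "low_risk_exemption": ["CC"],
    "legacy_watch": ["AA"],
}

changed = countries_reclassified(legacy_risk, current_risk)
affected = scenarios_to_review(scenarios, changed)
print(f"{len(changed)} countries reclassified, {len(affected)} scenarios to review")
```

If the two schemes do not share a comparable rating scale, a translation table between them would be needed first; that step is outside this sketch.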


The Impact

The data problems pushed the implementation timeline from 8 months to 14 months, a 75% overrun. They also required:

- Three additional data engineers hired from the vendor's professional services team (additional cost: $280,000)
- Manual remediation of the 2,300 corrupted case records by the compliance team (estimated 600 analyst-hours)
- A parallel operation period of 4 months (both old and new systems running simultaneously) to ensure continuity
- A delay in the ML model training, which had to begin with a smaller historical dataset than planned and be retrained once the full migration was complete

Total cost overrun: approximately $480,000.

The regulatory implications were manageable: Rafael proactively disclosed the implementation delay to the FCA (Meridian's European business was FCA-regulated), and the FCA noted it without a formal finding. But the business case Rafael had presented to the board had assumed 8 months; the slip to 14 months carried a significant reputational cost internally.


What Rafael Learned

When Priya's firm asked Rafael to speak at a client event about "lessons from large-scale RegTech implementations," his preparation notes became an internal Meridian memo that every compliance technology project team was required to read.

The key lessons:

Lesson 1: Data assessment is not a nice-to-have pre-implementation step. It is the critical path. "We spent four weeks assessing the vendor's technology. We spent two days assessing our data. That was exactly backwards. The technology worked. Our data didn't."

Lesson 2: The 'mapping spreadsheet' risk is real. Institutions routinely keep critical data mappings in spreadsheets maintained by a single person. When that person leaves, the knowledge leaves. Systematic data lineage documentation is not just a BCBS 239 compliance exercise; it is business continuity.

Lesson 3: M&A data debt is real and accumulates. "Every acquisition creates data debt. When we acquired the retail broker-dealer, we accepted the IT team's assurance that the data migration was complete. It was complete in the sense that no records were lost. It was not complete in the sense that the data was fit for compliance analytics use."

Lesson 4: Run a full data quality assessment before signing an implementation contract. "A pre-implementation data assessment would have cost us roughly $80,000 and identified all four data quality problems before we were contractually committed to an 8-month timeline. Instead, we discovered them after the migration had started in Month 3, at a cost of $480,000 and six months of timeline."


The Process Rafael Put in Place

After the implementation, Rafael mandated a Data Assessment Protocol for all compliance technology projects:

Phase 1 (pre-RFP): Inventory all data sources that will feed the new system. Document source systems, data formats, data owners, and refresh frequencies.

Phase 2 (pre-contract): Conduct a data quality assessment on a representative sample (at least 10%) of the historical data that will be migrated. Measure all six quality dimensions. Document findings in a risk register.
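A Phase 2 assessment of this kind could be prototyped as follows. This is a sketch under stated assumptions: it uses a simple random sample (a stratified sample may be more representative in practice), it measures only two illustrative dimensions, completeness and uniqueness, because the six dimensions are not enumerated in this case study, and the field names, thresholds, and synthetic data are hypothetical.

```python
import math
import random


def sample_records(records, fraction=0.10, seed=0):
    """Draw a reproducible simple random sample of at least `fraction` of the records."""
    size = min(len(records), math.ceil(len(records) * fraction))
    return random.Random(seed).sample(records, size)


def completeness(records, field):
    """Share of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) not in (None, "")) / len(records)


def uniqueness(records, field):
    """Share of the non-empty values for `field` that are distinct."""
    values = [r.get(field) for r in records if r.get(field) not in (None, "")]
    return len(set(values)) / len(values) if values else 0.0


# Synthetic data: every 20th transaction is missing its customer ID.
transactions = [
    {"txn_id": f"T{i:05d}", "customer_id": "" if i % 20 == 0 else f"C{i % 300:04d}"}
    for i in range(1000)
]

sample = sample_records(transactions)

# Hypothetical acceptance thresholds per (field, dimension).
THRESHOLDS = {("customer_id", "completeness"): 0.99, ("txn_id", "uniqueness"): 1.0}
measured = {
    ("customer_id", "completeness"): completeness(sample, "customer_id"),
    ("txn_id", "uniqueness"): uniqueness(sample, "txn_id"),
}

# Anything below its threshold becomes a risk register entry.
risk_register = [
    {"field": field, "dimension": dim, "measured": round(value, 3)}
    for (field, dim), value in measured.items()
    if value < THRESHOLDS[(field, dim)]
]
print(risk_register)
```

Fixing the random seed keeps the sample reproducible, so a finding in the risk register can be traced back to the exact records that produced it.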

Phase 3 (contract): Ensure the implementation contract includes:

- An allowance for data remediation (time and budget)
- A data migration acceptance criterion (definition of "migration complete")
- A parallel operation period sufficient to validate the migration

Phase 4 (implementation): Implement data quality monitoring on the migration pipeline — automated checks that flag data quality failures during migration rather than discovering them when the migration is "complete."
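One possible shape for such in-pipeline checks is sketched below: each batch is validated before it is loaded, and the migration halts at the first batch whose failure rate exceeds a threshold. The field names, the 1% threshold, and the functions `check_batch` and `migrate` are hypothetical, and the load step is represented only by recording the batch number.

```python
def check_batch(batch, required_fields, known_customer_ids):
    """Return a list of (row_index, problem) tuples for one migration batch."""
    failures = []
    for i, row in enumerate(batch):
        for field in required_fields:
            if row.get(field) in (None, ""):
                failures.append((i, f"missing {field}"))
        customer_id = row.get("customer_id")
        if customer_id and customer_id not in known_customer_ids:
            failures.append((i, "customer_id not in mapping"))
    return failures


def migrate(batches, required_fields, known_customer_ids, max_failure_rate=0.01):
    """Check each batch before loading; stop at the first batch over the threshold.

    Returns (loaded_batch_numbers, report), where report is None on success.
    """
    loaded = []
    for n, batch in enumerate(batches):
        failures = check_batch(batch, required_fields, known_customer_ids)
        bad_rows = {i for i, _ in failures}
        rate = len(bad_rows) / len(batch) if batch else 0.0
        if rate > max_failure_rate:
            return loaded, {"batch": n, "failure_rate": rate, "failures": failures}
        # Placeholder for the real load step. A production pipeline would
        # also quarantine the individual bad rows in batches under the threshold.
        loaded.append(n)
    return loaded, None


# Invented sample batches.
required = ("txn_id", "customer_id", "amount")
known_ids = {"C1", "C2"}
batches = [
    [{"txn_id": "T1", "customer_id": "C1", "amount": "100.00"},
     {"txn_id": "T2", "customer_id": "C2", "amount": "55.10"}],
    [{"txn_id": "T3", "customer_id": "C1", "amount": "12.00"},
     {"txn_id": "T4", "customer_id": "C2", "amount": ""}],
]
loaded, report = migrate(batches, required, known_ids)
print(loaded, report)
```

Halting at the failing batch means problems are reported with a batch number and row indices while the migration is still in progress, which is the behaviour Phase 4 asks for.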


Discussion Questions

1. Rafael describes the decision to spend four weeks assessing vendor technology and two days assessing data quality as "exactly backwards." What organizational pressures might explain this allocation of assessment effort? How would you design an assessment process that resists these pressures?

2. The "mapping spreadsheet" problem — critical business knowledge maintained in a spreadsheet by one person — is extremely common in financial institutions. What governance processes would prevent this situation from arising in the first place?

3. Rafael frames the M&A data problem as "data debt." What is the appropriate time to address M&A data debt: immediately after acquisition, or when a specific need (like a compliance platform migration) makes it necessary? What factors determine the right answer?

4. The data assessment protocol Rafael put in place has four phases. What additional governance would you add to this protocol to ensure it is followed consistently across projects (not just on the projects where Rafael is directly involved)?

5. Rafael disclosed the implementation delay to the FCA proactively. What is the regulatory basis for this disclosure? Are there circumstances where an implementation delay would not require regulatory notification?