Case Study 12.1: Maya's Technology Risk Remediation — Verdant Bank's Operational Resilience Journey
The Situation
Organization: Verdant Bank (fictional UK challenger bank)
Challenge: Following the KYC and AML program transformations of 2021–2022, Maya Osei turns her attention to the operational and technology risk gaps that have emerged as Verdant's systems complexity has grown
Timeline: Q1 2023 to Q4 2023
Regulatory backdrop: FCA and PRA operational resilience requirements, with DORA creating new EU-linked obligations through Verdant's planned EU expansion
Background: Growth Created Risk
By early 2023, Verdant Bank had grown significantly since Maya joined. Customer count: 340,000 (from 110,000 at founding). The technology stack that supported this growth:
- Core banking platform: third-party SaaS (Finacle, licensed 2020)
- KYC automation: third-party SaaS (vendor implemented 2021)
- AML transaction monitoring: third-party SaaS with proprietary ML layer (implemented 2022)
- Sanctions screening: third-party SaaS (implemented 2021)
- Regulatory reporting: in-house (Excel/Power BI pipeline, built 2020)
- Customer-facing mobile and web banking: in-house development + AWS infrastructure
- Data warehouse: AWS Redshift
The technology profile was typical for a UK challenger bank: heavy reliance on third parties for regulated functions; significant AWS dependency; limited operational risk management maturity.
The trigger for Maya's technology risk program was an FCA letter in January 2023 asking Verdant to provide its operational resilience self-assessment, specifically its identification of "important business services," its "impact tolerances" for each, and the results of testing whether it could remain within those tolerances. This was the FCA's supervisory follow-up to PS21/3: checking whether regulated firms had actually implemented the operational resilience requirements, not just documented them.
Maya's honest assessment: Verdant's operational resilience documentation was aspirational. The actual testing had not been done.
Phase 1: Identifying Important Business Services
The FCA's operational resilience framework requires firms to identify their "important business services" — services where disruption would cause harm to customers or market integrity.
Maya convened a working group: CTO, Head of Operations, CCO (herself), Chief Risk Officer, Head of Customer. The group used a structured methodology, evaluating each business service Verdant provided on three dimensions: (1) impact on customers if disrupted; (2) impact on market integrity if disrupted; (3) financial impact on Verdant.
Important Business Services identified:
- Payment processing: the ability to process customer payments (debits, BACS, Faster Payments, international wires). Disruption impact: high (customers cannot pay bills, access funds).
- Customer account access: the ability for customers to access their accounts (mobile app, web banking). Disruption impact: high (customers cannot check balances, conduct transactions).
- Customer onboarding: the ability to onboard new customers through the KYC process. Disruption impact: medium (new customers cannot be served; existing customers unaffected).
- AML monitoring and reporting: the ability to generate, review, and file SAR reports. Disruption impact: medium regulatory (failed SARs are a compliance risk, not immediate customer harm).
- Regulatory reporting: the ability to submit required regulatory returns to the FCA/PRA. Disruption impact: medium regulatory.
Phase 2: Setting Impact Tolerances
For each important business service, Verdant set an "impact tolerance" — the maximum level of disruption, measured in time, that the firm was prepared to accept.
| Important Business Service | Impact Tolerance | Basis |
|---|---|---|
| Payment processing | 4 hours maximum outage | Regulatory expectation; significant customer harm beyond 4 hours |
| Customer account access | 4 hours maximum outage | Consistent with payment processing |
| Customer onboarding | 48 hours maximum outage | New customer harm; existing customers unaffected |
| AML monitoring and reporting | 24 hours maximum outage | SAR deadline implications; regulatory expectation |
| Regulatory reporting | 24 hours maximum outage | Filing deadline management |
The impact tolerances were calibrated against regulatory expectations, customer harm assessment, and benchmarks from the FCA's consultation papers.
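The tolerances in the table can be held as data so that every test result is compared against the same figures. The following is a minimal sketch under that assumption; the dictionary keys and the `within_tolerance` helper are illustrative names, not part of Verdant's actual tooling.

```python
from datetime import timedelta

# Impact tolerances from the table above (maximum outage per service).
IMPACT_TOLERANCES = {
    "payment_processing": timedelta(hours=4),
    "customer_account_access": timedelta(hours=4),
    "customer_onboarding": timedelta(hours=48),
    "aml_monitoring_and_reporting": timedelta(hours=24),
    "regulatory_reporting": timedelta(hours=24),
}


def within_tolerance(service: str, observed_outage: timedelta) -> bool:
    """Return True if an observed or simulated outage stays within tolerance."""
    return observed_outage <= IMPACT_TOLERANCES[service]


# Hypothetical scenario result: a simulated 6-hour outage.
print(within_tolerance("payment_processing", timedelta(hours=6)))   # False
print(within_tolerance("customer_onboarding", timedelta(hours=6)))  # True
```

The same six-hour outage breaches the tolerance for one service and not another, which is why tolerances are set per service rather than firm-wide.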
Phase 3: Testing — What Verdant Discovered
The impact tolerances were set. Now Verdant had to test whether they could actually meet them.
Test 1: Payment processing
Simulation: Core banking platform becomes unavailable. Can Verdant process payments within 4 hours?
Discovery: The core banking platform SLA guaranteed 99.5% uptime, which permits approximately 44 hours of downtime per year. The SLA did not guarantee recovery within 4 hours for any individual outage. In a worst-case scenario under the SLA, an outage could persist for up to 72 hours before the vendor was in breach.
Gap: Verdant's 4-hour impact tolerance was not contractually supported by the vendor SLA.
Resolution: negotiate contractual enhancement (4-hour RTO guarantee) or develop manual processing fallback.
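The arithmetic behind the Test 1 finding can be checked in a few lines. This sketch assumes the uptime percentage is measured over a full calendar year of 8,760 hours; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime that an availability SLA permits over the period, in hours."""
    return (1 - uptime_pct / 100) * period_hours


annual_budget = allowed_downtime_hours(99.5)
print(round(annual_budget, 1))  # 43.8

impact_tolerance_hours = 4
# An availability percentage caps total downtime over the period, not the
# length of any single outage, so one incident can far exceed the tolerance.
print(annual_budget > impact_tolerance_hours)  # True
```

The point of the comparison is that an availability percentage and a recovery-time commitment measure different things: only an explicit per-incident RTO bounds the length of a single outage.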
Test 2: Customer account access
Simulation: AWS Redshift (data warehouse) experiences outage. Can customer account access be maintained?
Discovery: Customer-facing systems (mobile app) depended on Redshift for real-time balance queries. An AWS Redshift outage would render the mobile app unable to display account balances. The web banking fallback relied on the same data source.
Gap: Both primary and secondary customer access channels shared a single point of failure (Redshift).
Resolution: implement a read-replica cache that can serve balance data for up to 4 hours without refreshing from Redshift.
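The idea of serving balances from a cache for a bounded period can be sketched as follows. This is an in-memory illustration only, not the design Verdant implemented; `BalanceCache`, the `fetch` callable, and the use of `ConnectionError` to signal an outage are all assumptions made for the example.

```python
import time
from typing import Callable, Optional

MAX_STALENESS_SECONDS = 4 * 60 * 60  # matches the 4-hour impact tolerance


class BalanceCache:
    """Serves the last known balance when the primary source is unavailable."""

    def __init__(self, fetch: Callable[[str], float],
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch    # hypothetical call to the primary data source
        self._clock = clock    # injectable so staleness can be tested
        self._entries: dict[str, tuple[float, float]] = {}  # id -> (balance, fetched_at)

    def get(self, account_id: str) -> Optional[float]:
        try:
            balance = self._fetch(account_id)
            self._entries[account_id] = (balance, self._clock())
            return balance
        except ConnectionError:
            cached = self._entries.get(account_id)
            if cached and self._clock() - cached[1] <= MAX_STALENESS_SECONDS:
                return cached[0]   # stale, but still within tolerance
            return None            # nothing usable; caller shows an outage notice
```

A production version would also need to tell the customer how old the displayed balance is, since a cached figure will not reflect payments made during the outage.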
Test 3: AML monitoring
Simulation: Third-party AML SaaS vendor is unavailable. Can SAR reporting continue within 24 hours?
Discovery: Verdant had no documented manual processing fallback for AML monitoring. If the vendor's system was unavailable, the entire alert generation process halted. Analysts had no access to transaction data independently of the vendor platform.
Gap: No fallback for AML monitoring.
Resolution: implement a read access mechanism for transaction data directly from the data warehouse, enabling manual alert generation if the vendor system is unavailable; document a manual review protocol.
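A manual fallback of this kind could start from a simple rule applied to transactions read from the warehouse. The sketch below is purely illustrative: the `Transaction` fields, the threshold value, and the sample rows are invented for the example, and a real review protocol would define its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    account_id: str
    amount_gbp: float


# Hypothetical threshold for illustration only; not a regulatory figure.
REVIEW_THRESHOLD_GBP = 10_000.0


def fallback_alerts(transactions: list[Transaction]) -> list[Transaction]:
    """Return transactions an analyst should review manually, largest first."""
    flagged = [t for t in transactions if t.amount_gbp >= REVIEW_THRESHOLD_GBP]
    return sorted(flagged, key=lambda t: t.amount_gbp, reverse=True)


# Rows as they might arrive from a read-only warehouse query (made-up data).
rows = [
    Transaction("T1", "A1", 250.00),
    Transaction("T2", "A2", 12_500.00),
    Transaction("T3", "A1", 48_000.00),
]
print([t.txn_id for t in fallback_alerts(rows)])  # ['T3', 'T2']
```

A single threshold rule is far cruder than the vendor's ML layer. Its purpose is only to keep analysts reviewing the highest-risk activity while the primary system is down.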
Phase 4: Third-Party Contract Review
The testing process revealed that Verdant's third-party contracts — particularly for the core banking platform and AML monitoring — did not adequately support the impact tolerances.
Maya commissioned a contract review for all five third-party providers supporting important business services. The review revealed:
- None of the five contracts included explicit RTO (Recovery Time Objective) or RPO (Recovery Point Objective) commitments
- Three of the five contracts had no audit rights provision
- None of the five contracts had an exit assistance clause (requiring the vendor to support transition to an alternative provider)
- The AML monitoring vendor contract did not require notification to Verdant within a specific time window upon a service interruption
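The four provisions examined in the review lend themselves to a simple checklist that can be re-run at each renewal. The sketch below assumes a hand-maintained record per contract; the class, field names, and the example vendor are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VendorContract:
    vendor: str
    has_rto_rpo: bool
    has_audit_rights: bool
    has_exit_assistance: bool
    has_incident_notification_window: bool


def contract_gaps(contract: VendorContract) -> list[str]:
    """List the resilience provisions missing from a contract."""
    checks = {
        "RTO/RPO commitment": contract.has_rto_rpo,
        "audit rights": contract.has_audit_rights,
        "exit assistance clause": contract.has_exit_assistance,
        "incident notification window": contract.has_incident_notification_window,
    }
    return [name for name, present in checks.items() if not present]


# Hypothetical contract record, not one of Verdant's actual vendors.
example = VendorContract("ExampleVendor", False, True, False, True)
print(contract_gaps(example))  # ['RTO/RPO commitment', 'exit assistance clause']
```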
Renegotiation outcomes (six months of negotiation):
- Core banking platform: 4-hour RTO SLA for critical incidents; audit rights added; exit assistance clause negotiated
- AML monitoring vendor: 2-hour incident notification; 8-hour RTO SLA; audit rights added; exit assistance clause deferred to next renewal
- KYC automation: audit rights added; exit assistance clause added; no RTO commitment obtained (vendor declined)
- Sanctions screening: 4-hour RTO; full contract re-paper (existing contract was 2019, pre-DORA)
- Regulatory reporting: in-house, no third party
Phase 5: Model Risk Gap
During the operational resilience review, Maya identified a related gap: Verdant's ML-based transaction monitoring model — deployed in September 2022 — had never been formally validated.
The model was vendor-supplied but had been customized with Verdant-specific training data. Under SR 11-7 equivalent principles (applicable in the UK through FCA model governance guidance), Verdant should have:
1. Documented the model in its model inventory
2. Conducted or commissioned an independent validation of the model before deployment
3. Established ongoing performance monitoring
None of these steps had been completed. The model inventory did not exist. Validation had not been conducted.
Maya commissioned a model validation by an external model risk firm. Key validation findings:
- The model was conceptually sound and implemented correctly
- Training data: the positive class (confirmed SAR cases) was very small (67 cases); the validator flagged this as a limitation but did not recommend withdrawal
- Ongoing monitoring: no monitoring metrics were in place to detect model performance degradation
- Recommendation: implement monitoring metrics (precision, recall, and AUC-ROC on a labeled test set) measured monthly; re-validate annually
The model validation was completed in 8 weeks and the model inventory was populated. Monthly monitoring reporting was automated through the data warehouse pipeline.
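The metrics named in the validator's recommendation can be computed from a labeled test set with the standard library alone. This is a minimal sketch, not Verdant's monitoring pipeline; the function names, the 0.5 alert threshold, and the sample labels and scores are illustrative.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision and recall for binary labels (1 = confirmed suspicious)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc_roc(y_true: list[int], scores: list[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up monthly test set: labels, model scores, and alerts at a 0.5 threshold.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
alerts = [1 if s >= 0.5 else 0 for s in scores]

print(precision_recall(labels, alerts))   # (0.75, 1.0)
print(round(auc_roc(labels, scores), 3))  # 0.889
```

With a positive class as small as the one the validator flagged, month-to-month movement in these figures will be noisy, so trends over several months are more informative than any single reading.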
Results: FCA Response
In December 2023, Maya submitted Verdant's updated operational resilience self-assessment to the FCA. Key changes from the January request:
| Dimension | January 2023 | December 2023 |
|---|---|---|
| Important business services identified | No formal list | 5 documented with rationale |
| Impact tolerances set | None | Set for all 5 services |
| Testing conducted | None | All 5 services tested; gaps documented |
| Third-party contracts reviewed | Not reviewed | All 5 reviewed; gaps remediated |
| Model inventory | Non-existent | Created; ML model validated |
| Incident reporting to FCA | 2 missed (discovered) | Process in place for future incidents |
The FCA's response: "Verdant Bank's operational resilience self-assessment demonstrates meaningful progress from January 2023. The identification of important business services, setting of impact tolerances, and testing program are consistent with the expectations set out in PS21/3. The remaining gap in the KYC automation vendor contract (lack of RTO commitment) should be addressed at the next contract renewal. The model validation and inventory program represents good practice."
No enforcement action. One recommendation (KYC vendor RTO) to address at renewal.
Discussion Questions
1. The core banking platform SLA allowed up to 72 hours of downtime before vendor breach, while Verdant's impact tolerance was 4 hours. How common is this gap between vendor SLAs and institution impact tolerances in the industry? What negotiating leverage does a challenger bank have compared to a major bank?
2. The ML-based transaction monitoring model had not been validated before deployment. Who in Verdant's governance structure should have caught this gap before the model was put into production? What governance controls should prevent similar gaps in the future?
3. The manual processing fallback for AML monitoring — enabling alert generation directly from the data warehouse if the vendor is unavailable — requires maintaining documentation and training. Estimate the ongoing operational cost of maintaining this fallback capability, and assess whether this cost is justified by the compliance benefit.
4. Maya's operational resilience program took approximately 12 months from FCA letter to updated submission. For a larger institution, would this timeline be achievable? What factors would extend or compress the timeline?
5. The FCA's December 2023 response noted the KYC automation vendor's lack of RTO commitment as a remaining gap. From a vendor management perspective, what options does Verdant have if the KYC vendor refuses to include RTO commitments even at next renewal?