Chapter 5 Exercises

Data Architecture for Regulatory Compliance


Exercise 5.1: Data Quality Dimension Identification

Difficulty: Introductory

For each of the following scenarios, identify which data quality dimension(s) have failed and describe the compliance consequence.

a) A sanctions screening system checks transactions against a list that is updated weekly. A counterparty was added to the OFAC SDN list on Monday; the system's list was last updated the previous Friday.

b) Verdant Bank's core banking system stores customer names in UPPER CASE; the KYC system stores them in Title Case. When the two systems are integrated, "JOHN SMITH" and "John Smith" are not recognized as the same customer.

c) During a system migration, 8 months of historical transaction records for a specific product line were not migrated to the new platform.

d) A customer's occupation is recorded as "Business Owner" in the CRM, but the KYC system requires selection from a standardized dropdown list and the analyst chose "Self-Employed" as the closest match.

e) The same corporate client appears in the customer database as both "Acme Corp" and "Acme Corporation" because two different relationship managers onboarded the same company at different times for different products.
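
Scenarios (b) and (e) both involve names that differ only in presentation. As a rough illustration of how a matching key can be derived before two records are compared, here is a minimal sketch. The helper `normalize_name` and the small suffix map are hypothetical and do not come from the chapter; a real matching process would need a much fuller set of rules.

```python
import re

# Hypothetical suffix map for illustration only; a production list would be far longer.
LEGAL_SUFFIXES = {"corp": "corporation", "inc": "incorporated", "ltd": "limited"}

def normalize_name(name: str) -> str:
    """Return a case-insensitive, whitespace-collapsed matching key."""
    # Replace punctuation with spaces, fold case, then split on any whitespace.
    tokens = re.sub(r"[^\w\s]", " ", name).casefold().split()
    # Expand known abbreviations so "Corp" and "Corporation" produce the same key.
    return " ".join(LEGAL_SUFFIXES.get(token, token) for token in tokens)

print(normalize_name("JOHN SMITH") == normalize_name("John Smith"))        # True
print(normalize_name("Acme Corp") == normalize_name("Acme Corporation"))   # True
```

A matching key like this only proposes candidate duplicates; whether two records really describe the same customer still has to be confirmed against other attributes.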


Exercise 5.2: Building a BCBS 239 Assessment

Difficulty: Intermediate

You are a compliance consultant assessing a regional bank's readiness for a regulatory examination that will specifically review its data governance against BCBS 239 principles.

The bank has provided the following information:

  • It has 12 source systems that feed data to its risk reporting function
  • Risk data aggregation for the quarterly Pillar 2 report takes approximately 3 weeks of manual work
  • There is no documented data lineage for most report fields; analysts know the process from experience
  • The data governance committee meets annually to review data policies
  • The bank has experienced three regulatory report corrections in the past 12 months due to data errors
  • There is no formal data owner role; IT is generally considered "responsible" for data

Assess this bank against five of the eleven BCBS 239 principles that apply to banks (Principles 1 to 11; the choice of five is yours) and rate each as: Compliant / Partially Compliant / Non-Compliant. For each non-compliant or partially compliant rating, describe the remediation required.


Exercise 5.3: Designing a Customer Golden Record

Difficulty: Intermediate

Design the data model for a customer golden record at a retail bank that offers current accounts, savings, and personal loans.

a) What fields should the golden record contain? Organize them into categories (identity, KYC, risk, accounts, etc.)

b) Suppose the bank has three source systems:

  • Core banking (has account data, basic identity, account type)
  • KYC platform (has verified identity data, risk rating, beneficial ownership)
  • CRM (has relationship manager assignment, product history, preferences)

For each field in your golden record, identify the authoritative source system.

c) The KYC platform shows the customer's address as "123 High Street, London"; the core banking system shows "123 High St, London." Which record should win (the "survivorship rule")? What is your reasoning?

d) How would you handle a customer who has accounts at the bank under two slightly different names (e.g., "Elizabeth Johnson" on the savings account and "Liz Johnson" on the loan)? What process would confirm these are the same person?
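
For part (c), it may help to see what a survivorship rule looks like once it is written down precisely. The sketch below is one possible rule, not the chapter's prescribed answer. It assumes a hypothetical source-priority ranking (`SOURCE_PRIORITY`) and a `verified_on` date attached to each candidate value. Your own reasoning may lead to a different ordering.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical priority for the address field: a lower number wins.
SOURCE_PRIORITY = {"kyc": 0, "core_banking": 1, "crm": 2}

@dataclass(frozen=True)
class Candidate:
    source: str
    value: str
    verified_on: Optional[date]  # None when the value was never verified

def surviving_value(candidates: List[Candidate]) -> Candidate:
    """Prefer verified values, then higher-priority sources, then the most recent date."""
    def rank(c: Candidate):
        return (
            c.verified_on is None,                      # verified values sort first
            SOURCE_PRIORITY.get(c.source, 99),          # unknown sources sort last
            -(c.verified_on or date.min).toordinal(),   # newer verification sorts first
        )
    return min(candidates, key=rank)

winner = surviving_value([
    Candidate("core_banking", "123 High St, London", None),
    Candidate("kyc", "123 High Street, London", date(2024, 3, 1)),
])
print(winner.source, "->", winner.value)
```

Writing the rule as an explicit ranking makes it auditable: an examiner can see why one value survived and which rule decided it.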


Exercise 5.4: Data Architecture Design

Difficulty: Applied

Maya is designing Verdant Bank's compliance data architecture. The bank has:

  • A core banking system (vendor system, on-premise)
  • A KYC platform (cloud-hosted, vendor system)
  • An AML monitoring platform (cloud-hosted, to be procured)
  • A case management system (internal, on-premise)
  • A regulatory reporting system (cloud-hosted, vendor system)
  • Customer data in the UK only (no cross-border data movement required)

Design a compliance data architecture for Verdant that addresses:

a) How raw data flows from source systems to compliance applications

b) Where the customer golden record lives and how it is maintained

c) How data lineage is tracked across the pipeline

d) How GDPR requirements are met (data minimization, subject access requests, right to erasure) within this architecture

e) Cloud vs. on-premise considerations given Verdant's specific situation

Draw the architecture as a diagram (text notation is fine) and describe each component.
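
For part (c), it can be useful to decide what a single lineage record would contain before drawing the diagram. The following is a minimal sketch. The `LineageEvent` fields, the `trace` helper, and the field names used in the example are all hypothetical, chosen only to mirror the systems listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass(frozen=True)
class LineageEvent:
    target_field: str     # fully qualified field produced downstream
    source_system: str    # system the input value came from
    source_field: str     # fully qualified field that was read
    transformation: str   # human-readable description of the step
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def trace(events: List[LineageEvent], target_field: str) -> List[LineageEvent]:
    """Walk upstream from target_field and return the hops, nearest first."""
    hops, frontier, seen = [], [target_field], set()
    while frontier:
        current = frontier.pop(0)
        if current in seen:          # guard against circular lineage entries
            continue
        seen.add(current)
        for event in events:
            if event.target_field == current:
                hops.append(event)
                frontier.append(event.source_field)
    return hops

events = [
    LineageEvent("reg_report.customer_risk", "golden_record",
                 "golden_record.risk_rating", "copied unchanged"),
    LineageEvent("golden_record.risk_rating", "kyc_platform",
                 "kyc_platform.risk_rating", "mapped to a three-level scale"),
]
for hop in trace(events, "reg_report.customer_risk"):
    print(f"{hop.target_field} <- {hop.source_field} ({hop.transformation})")
```

Recording one event per hop means any report field can be walked back to its source system, which is the question an examiner is likely to ask.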


Coding Exercise 5.5: Extend the Data Quality Assessment

Difficulty: Coding — Intermediate

Open code/example-01-data-quality.py (create this file based on the code examples in Section 5.4). Extend the assessment with two additional checks:

Check 4: Referential Integrity — Verify that every account_id in the transactions table corresponds to a valid customer_id in the customers table. (Generate synthetic data where some transactions reference non-existent customer IDs.)
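
A minimal sketch of Check 4 follows. Section 5.4's code is not reproduced here, so this version uses only the standard library and plain lists of dictionaries. Following the exercise wording, it assumes that `account_id` on a transaction holds the same identifier as `customer_id` on a customer row. The function names, the row layout, and the 10% orphan rate are illustrative assumptions; adjust the `fk` and `pk` arguments if your data is shaped differently.

```python
import random

def make_synthetic(n_customers=50, n_transactions=200, orphan_rate=0.10, seed=42):
    """Build customers and transactions, planting some references to unknown IDs."""
    rng = random.Random(seed)
    customers = [{"customer_id": f"C{i:04d}"} for i in range(n_customers)]
    valid_ids = [c["customer_id"] for c in customers]
    transactions = []
    for i in range(n_transactions):
        if rng.random() < orphan_rate:
            ref = f"X{rng.randrange(10_000):04d}"   # "X" IDs never exist in customers
        else:
            ref = rng.choice(valid_ids)
        transactions.append({"transaction_id": f"T{i:05d}", "account_id": ref})
    return customers, transactions

def check_referential_integrity(customers, transactions, fk="account_id", pk="customer_id"):
    """Report transactions whose foreign key has no matching customer row."""
    known = {c[pk] for c in customers}
    orphans = [t for t in transactions if t[fk] not in known]
    total = len(transactions)
    return {
        "check": "referential_integrity",
        "total": total,
        "failed": len(orphans),
        "pass_rate": (total - len(orphans)) / total if total else 1.0,
        "orphans": orphans,
    }

customers, transactions = make_synthetic()
result = check_referential_integrity(customers, transactions)
print(f"{result['failed']} of {result['total']} transactions reference an unknown customer")
```

Returning the orphaned rows alongside the pass rate lets the result feed both a summary dashboard and a remediation queue.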

Check 5: Date Logic Validation — Verify that kyc_verified_date is always before or on the account opening date. (Generate synthetic data where some KYC dates come after account opening. Where verification must precede onboarding, such a sequence should not occur and therefore indicates a data error.)
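
One possible shape for Check 5 is sketched below, again using only the standard library. The column name `account_open_date` is an assumption (the exercise names only `kyc_verified_date`), and the four hand-written rows stand in for the larger synthetic dataset the exercise asks you to generate.

```python
from datetime import date

def check_kyc_before_opening(rows, kyc_col="kyc_verified_date", open_col="account_open_date"):
    """Flag rows whose KYC verification date is later than the account opening date."""
    violations, unevaluable = [], []
    for row in rows:
        kyc, opened = row.get(kyc_col), row.get(open_col)
        if kyc is None or opened is None:
            unevaluable.append(row)    # missing dates are reported separately
        elif kyc > opened:
            violations.append(row)     # same-day verification is allowed
    evaluated = len(rows) - len(unevaluable)
    return {
        "check": "kyc_before_account_opening",
        "evaluated": evaluated,
        "failed": len(violations),
        "unevaluable": len(unevaluable),
        "pass_rate": (evaluated - len(violations)) / evaluated if evaluated else 1.0,
        "violations": violations,
    }

# Hand-written synthetic rows: C003 is the planted error, C004 has no KYC date.
customers = [
    {"customer_id": "C001", "kyc_verified_date": date(2023, 1, 10), "account_open_date": date(2023, 1, 12)},
    {"customer_id": "C002", "kyc_verified_date": date(2023, 2, 1), "account_open_date": date(2023, 2, 1)},
    {"customer_id": "C003", "kyc_verified_date": date(2023, 3, 20), "account_open_date": date(2023, 3, 5)},
    {"customer_id": "C004", "kyc_verified_date": None, "account_open_date": date(2023, 4, 1)},
]
result = check_kyc_before_opening(customers)
print(result["failed"], "of", result["evaluated"], "evaluated rows fail the date check")
# prints: 1 of 3 evaluated rows fail the date check
```

Rows with a missing date are counted separately rather than treated as passes or failures, so that a gap in the data is not mistaken for a date-ordering error.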


Research Exercise 5.6: Cloud Regulatory Guidance Review

Difficulty: Research-required

Download and review one of the following documents:

  • FCA PS21/3 (Building Operational Resilience) — fca.org.uk
  • EBA Guidelines on ICT and Security Risk Management — eba.europa.eu
  • OCC Bulletin 2020-10: Third-Party Relationships — occ.gov

For your chosen document:

a) Summarize the five most important requirements it imposes on financial institutions using cloud services.

b) Identify two requirements that would be challenging for a small fintech (like Verdant Bank) to meet.

c) For one of those challenging requirements, describe a pragmatic approach that a small firm with limited resources could take.