Chapter 4 Exercises

DataField.Dev

Technology Foundations: AI, ML, NLP, and Automation in Compliance

Exercise 4.1: Technology-to-Problem Matching

Difficulty: Introductory

Match each compliance problem to the most appropriate technology approach. Multiple technologies may be relevant; identify the primary approach.

Compliance Problem	Technology Options
a) Verifying that a passport photograph matches a selfie submitted during digital onboarding	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
b) Identifying new regulatory requirements in a 150-page consultation paper	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
c) Populating a Basel IV capital report template with data from 12 different source systems	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
d) Detecting that three customer accounts with no obvious connection are all controlled by the same person	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
e) Identifying transactions that are structurally unusual compared to the customer's historical behavior	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
f) Automatically downloading updated OFAC sanctions lists daily and loading them into the screening system	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA
g) Determining whether a customer's business description on their onboarding form is consistent with their actual transaction patterns	Rules-based; Computer Vision/ML; NLP; Graph Analytics; RPA

Exercise 4.2: Precision and Recall Calculation

Difficulty: Introductory-Intermediate

A transaction monitoring system processed 200,000 transactions last month. Review of the alerts produced the following results:

True Positives (genuinely suspicious, correctly flagged): 450
False Positives (legitimate, incorrectly flagged): 4,050
True Negatives (legitimate, correctly cleared): 195,200
False Negatives (genuinely suspicious, not flagged): 300

Calculate the following, then answer part f:

a) Precision
b) Recall
c) False Positive Rate (% of legitimate transactions that were incorrectly flagged)
d) False Negative Rate (% of suspicious transactions that were missed)
e) F1 Score
f) What does the false negative rate tell you about the regulatory risk this system creates?
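If you want to check your arithmetic, the definitions can be written as a small helper. This is a minimal sketch: the function name `confusion_metrics` and the sample counts are illustrative assumptions and are deliberately not the exercise data.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return the standard rates derived from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)            # share of alerts that were genuinely suspicious
    recall = tp / (tp + fn)               # share of suspicious activity that was flagged
    false_positive_rate = fp / (fp + tn)  # share of legitimate transactions flagged
    false_negative_rate = fn / (fn + tp)  # share of suspicious activity missed (1 - recall)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "f1": f1,
    }


# Illustrative counts only (hypothetical, not the figures from the exercise).
example = confusion_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Substitute the four counts from the exercise to verify parts a) to e). Note that the false positive rate uses legitimate transactions as its denominator, while precision uses alerts.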

Exercise 4.3: The Threshold Decision

Difficulty: Intermediate

Rafael is reviewing the performance of Meridian Capital's upgraded AML model. The data science team presents the following options for the alert threshold:

Threshold	Daily Alerts	False Positive Rate (% of alerts)	Suspicious Activity Caught (%)
0.30	1,200	90%	97%
0.50	600	82%	91%
0.65	350	74%	85%
0.80	180	60%	72%

Meridian has a team of 8 analysts who can review approximately 75 alerts per day each.

a) At which threshold(s) does the daily alert volume exceed the team's review capacity?
b) At which threshold does the analyst team have the highest ratio of true positives to review capacity? Why does this matter?
c) The data science team recommends the 0.65 threshold because it gives the "optimal" F1 score, but 15% of suspicious activity is missed at this threshold. Write the compliance rationale for choosing either a higher or a lower threshold.
d) What process governance should surround the threshold decision? Who needs to sign off, and what needs to be documented?
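The capacity comparison in parts a) and b) can be set up as a short calculation. The sketch below rests on two assumptions: that the "False Positive Rate" column is the share of alerts that turn out to be false positives, and that "ratio of true positives to review capacity" means expected true positives per day divided by the number of alerts the team can review. The names `ThresholdOption`, `daily_capacity`, and `expected_true_positives` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdOption:
    threshold: float
    daily_alerts: int
    false_positive_share: float  # assumption: fraction of alerts that are false positives
    caught: float                # fraction of suspicious activity that is flagged


# Rows copied from the table in the exercise.
OPTIONS = [
    ThresholdOption(0.30, 1200, 0.90, 0.97),
    ThresholdOption(0.50, 600, 0.82, 0.91),
    ThresholdOption(0.65, 350, 0.74, 0.85),
    ThresholdOption(0.80, 180, 0.60, 0.72),
]

ANALYSTS = 8
ALERTS_PER_ANALYST_PER_DAY = 75


def daily_capacity(analysts: int = ANALYSTS,
                   per_analyst: int = ALERTS_PER_ANALYST_PER_DAY) -> int:
    """Number of alerts the team can review in one day."""
    return analysts * per_analyst


def expected_true_positives(option: ThresholdOption) -> float:
    """Alerts per day that are expected to be genuinely suspicious."""
    return option.daily_alerts * (1 - option.false_positive_share)


capacity = daily_capacity()
for option in OPTIONS:
    true_positives = expected_true_positives(option)
    print(
        f"threshold {option.threshold:.2f}: "
        f"alerts={option.daily_alerts}, "
        f"over capacity={option.daily_alerts > capacity}, "
        f"expected true positives/day={true_positives:.0f}, "
        f"true positives per unit of capacity={true_positives / capacity:.3f}"
    )
```

This simple ratio ignores the fact that alerts beyond capacity go unreviewed, so true positives sitting in the backlog are effectively missed. Consider that when you interpret the numbers for part b).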

Exercise 4.4: Graph Analytics Scenario

Difficulty: Intermediate

You are analyzing the following transaction network for AML red flags:

Account A receives $50,000 from an external source
Account A sends $24,500 to Account B and $24,500 to Account C
Account B sends $24,000 to Account D
Account C sends $24,000 to Account D
Account D sends $47,500 to an external beneficiary

a) Draw this transaction network as a graph (you can use text notation: A→B with amount).
b) Identify the AML typology this pattern most resembles.
c) What information would you want to investigate further about the accounts involved?
d) Would this pattern be detected by a rules-based system monitoring individual transactions? Why or why not?
e) What graph analytics technique would be most useful for detecting this pattern at scale across millions of accounts?
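One possible way to explore the network programmatically is to store it as a list of directed, weighted edges and compute simple flow statistics per account. The following is a sketch using only the standard library. The node labels `EXT_IN` and `EXT_OUT`, the function names, and the 0.9 pass-through ratio are illustrative assumptions, not part of the chapter's code.

```python
from collections import defaultdict

# (source, target, amount) edges from the exercise.
# "EXT_IN" and "EXT_OUT" are hypothetical labels for the external parties.
EDGES = [
    ("EXT_IN", "A", 50_000),
    ("A", "B", 24_500),
    ("A", "C", 24_500),
    ("B", "D", 24_000),
    ("C", "D", 24_000),
    ("D", "EXT_OUT", 47_500),
]


def flow_summary(edges):
    """Map each node to (total inflow, total outflow)."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for source, target, amount in edges:
        outflow[source] += amount
        inflow[target] += amount
    nodes = sorted(set(inflow) | set(outflow))
    return {node: (inflow[node], outflow[node]) for node in nodes}


def pass_through_accounts(edges, min_ratio=0.9):
    """Accounts that forward at least min_ratio of what they receive."""
    return [
        node
        for node, (received, sent) in flow_summary(edges).items()
        if received > 0 and sent > 0 and sent / received >= min_ratio
    ]


def fan_out_fan_in(edges):
    """Find (origin, collector, intermediaries) where the origin pays two or
    more intermediaries that each pay the same collector."""
    successors = defaultdict(set)
    for source, target, _ in edges:
        successors[source].add(target)
    patterns = []
    for origin, intermediaries in successors.items():
        paid_via = defaultdict(set)
        for mid in intermediaries:
            for collector in successors.get(mid, ()):
                paid_via[collector].add(mid)
        for collector, via in paid_via.items():
            if len(via) >= 2:
                patterns.append((origin, collector, sorted(via)))
    return patterns


print(flow_summary(EDGES))
print(pass_through_accounts(EDGES))
print(fan_out_fan_in(EDGES))
```

A two-hop search like this works on a toy network. For part e), think about what would have to change to run a comparable search over millions of accounts and longer chains.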

Exercise 4.5: NLP Obligation Extraction

Difficulty: Applied

Read the following excerpt from a fictional regulatory circular:

"Firms must ensure that all personal account dealing (PAD) requests from employees are submitted to the compliance team no later than two business days before the intended trade. The compliance team is required to review and respond to all PAD requests within one business day of receipt. Any PAD request involving a security that is currently on the firm's restricted list must be escalated to the CCO for approval before any response is provided. Firms that fail to maintain adequate PAD controls may be subject to supervisory action."

a) Identify every compliance obligation in this text. Write them in structured format: [Who] must [do what] [by when/under what conditions].
b) Identify any ambiguities in the text that a compliance team would need to resolve before implementing the requirements.
c) How would an NLP system be trained to extract obligations like these from regulatory text? What type of NLP task is this?
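As a point of comparison for part c), here is a deliberately naive keyword baseline. It is not a trained NLP model: it splits text into sentences and looks for a short, hand-picked list of obligation cues. The cue list, the function names, and the abridged sample text are assumptions made for illustration.

```python
import re

# Assumption: a small hand-picked set of obligation cues. A trained model
# would learn such signals from labelled examples instead.
OBLIGATION_CUES = re.compile(
    r"\b(must|shall|is required to|are required to)\b", re.IGNORECASE
)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after a full stop that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def candidate_obligations(text: str) -> list[dict[str, str]]:
    """Return one record per sentence that contains an obligation cue."""
    records = []
    for sentence in split_sentences(text):
        match = OBLIGATION_CUES.search(sentence)
        if match:
            records.append({
                "actor": sentence[:match.start()].strip(),          # text before the cue
                "cue": match.group(0).lower(),
                "action": sentence[match.end():].strip().rstrip("."),  # text after the cue
            })
    return records


# Abridged from the circular excerpt above.
SAMPLE = (
    "Firms must ensure that all PAD requests are submitted two business days "
    "before the trade. "
    "The compliance team is required to review requests within one business day. "
    "Firms that fail to maintain controls may be subject to supervisory action."
)

for record in candidate_obligations(SAMPLE):
    print(record)
```

Try it on the full excerpt and note where it breaks down. In the passive sentence about the restricted list, for example, the text before the cue is the PAD request, not the party who has to act. Failures like this are a useful starting point for describing what a trained extraction model needs to learn.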

Coding Exercise 4.6: Extend the Rule-Based Monitor

Difficulty: Coding — Beginner/Intermediate

Open code/example-01-rule-based-monitoring.py. Add two new monitoring rules:

Rule 6: Flag any transaction where the customer has had a prior SAR filed against them (simulate this with a prior_sars field > 0 on the Transaction object) AND the transaction amount exceeds $5,000.

Rule 7: Flag round-number transactions (amounts that are exact multiples of $1,000) above $15,000 from accounts less than 90 days old, as round-number laundering patterns are a recognized typology.

Test your rules with appropriate test transactions. Document each rule with a comment explaining the typology it addresses.
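If you are unsure how to structure the new rules, the sketch below shows one possible shape. It does not reproduce the chapter's example file: the `Transaction` class here is a hypothetical stand-in, and apart from `prior_sars` (named in Rule 6) the field names `amount` and `account_age_days` are assumptions that you should map onto whatever the real `Transaction` object provides. Amounts are held as `Decimal` so that the multiple-of-1,000 check is exact.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    # Hypothetical stand-in for the Transaction object in the example file.
    amount: Decimal
    prior_sars: int = 0          # number of SARs previously filed on the customer
    account_age_days: int = 365  # assumed field: days since the account was opened


def rule_6_prior_sar(tx: Transaction) -> bool:
    """Typology: customers with SAR history warrant a lower alert threshold.
    Flags amounts strictly above $5,000 when at least one prior SAR exists."""
    return tx.prior_sars > 0 and tx.amount > Decimal("5000")


def rule_7_round_amount_new_account(tx: Transaction) -> bool:
    """Typology: round-number transfers from newly opened accounts.
    Flags exact multiples of $1,000 above $15,000 on accounts under 90 days old."""
    is_round = tx.amount % Decimal("1000") == 0
    return is_round and tx.amount > Decimal("15000") and tx.account_age_days < 90


# Boundary-focused checks: each rule is tested at and just beyond its limits.
assert rule_6_prior_sar(Transaction(Decimal("5000.01"), prior_sars=1))
assert not rule_6_prior_sar(Transaction(Decimal("5000"), prior_sars=1))
assert not rule_6_prior_sar(Transaction(Decimal("9000"), prior_sars=0))

assert rule_7_round_amount_new_account(Transaction(Decimal("16000"), account_age_days=89))
assert not rule_7_round_amount_new_account(Transaction(Decimal("15000"), account_age_days=30))
assert not rule_7_round_amount_new_account(Transaction(Decimal("16000.50"), account_age_days=30))
assert not rule_7_round_amount_new_account(Transaction(Decimal("16000"), account_age_days=90))
print("all rule checks passed")
```

The sketch reads "exceeds" and "above" as strict inequalities and "less than 90 days old" as excluding day 90. State your own interpretation in the rule comments, since boundary choices like these are exactly what a reviewer will ask about.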

Research Exercise 4.7: The AI Readiness Self-Assessment

Difficulty: Research/Applied

Apply the four-dimension AI readiness framework from Section 4.7 to a financial services organization you are familiar with (or to a publicly described institution from a case study or news article).

For each dimension (Data, Technology, Governance, People):

a) Rate the organization's readiness: Low / Medium / High
b) Provide specific evidence for your rating
c) Identify the most important improvement needed in that dimension

Conclude with a paragraph on which dimension you would prioritize for investment and why.