Case Study 2: Pinnacle Health's Claims Calculation UDFs

Turning Business Rules into SQL-Callable Functions

Background

Pinnacle Health processes 4.2 million medical claims per month. Each claim requires a series of calculations: the allowed amount (what the insurance plan pays for the procedure), the copay (the patient's fixed payment), the deductible allocation (how much of the patient's annual deductible this claim consumes), and the coinsurance split (the percentage split between plan and patient after the deductible). These four calculations are collectively called the "adjudication waterfall."

For fifteen years, the adjudication waterfall lived in a single COBOL batch program called CLMADJ00. It ran nightly, processing the day's claims. It worked. But then three things happened:

First, Pinnacle launched a real-time claims estimation portal for providers. Before submitting a claim, a provider could enter the procedure code and patient information and get an estimate of the allowed amount and patient responsibility. This required the adjudication calculations to be available online, not just in batch.

Second, the finance team needed the adjudication calculations in their quarterly reporting queries. They were running SQL against the claims tables and needed to recalculate allowed amounts for "what-if" scenarios: "What would our exposure be if we changed the tier 3 provider rate by 5%?"

Third, a regulatory audit found discrepancies between the batch adjudication results and the numbers produced by a separate reporting program that had its own implementation of the allowed amount calculation. The discrepancy was small — $0.02 per claim on average — but across 50 million annual claims, it aggregated to a $1 million reporting variance.

Ahmad Patel proposed user-defined functions. "The adjudication waterfall is four calculations," he said. "Each takes inputs, reads lookup tables, and returns a number. That's what functions are for. If we implement them as UDFs, every SQL query in the enterprise can call them. The batch program, the online portal, the finance reports — they all use the same function."

Diane Morrison was skeptical. "I've seen UDFs destroy performance. One bad function in a WHERE clause and your query runs for six hours."

"Then we design them correctly," Ahmad said. "And we educate the developers about where to use them and where not to."

Design Phase

Ahmad designed four scalar UDFs, each implementing one stage of the adjudication waterfall:

  1. PINNACLE.CALC_ALLOWED_AMT — Takes procedure code, provider tier, and plan type. Returns the allowed amount. Reads from PROCEDURE_RATE_SCHEDULE, PROVIDER_TIER_TABLE, and PLAN_ADJUSTMENT_TABLE.

  2. PINNACLE.CALC_COPAY — Takes plan type and procedure category. Returns the copay amount. Reads from COPAY_SCHEDULE.

  3. PINNACLE.CALC_DEDUCTIBLE — Takes patient ID, plan year, and claim amount. Returns the deductible applied to this claim. Reads from DEDUCTIBLE_TRACKER and PLAN_DEDUCTIBLE_TABLE. This is the most complex function because it must account for the patient's year-to-date deductible consumption.

  4. PINNACLE.CALC_COINSURANCE — Takes plan type, allowed amount, copay, and deductible applied. Returns the coinsurance amount (plan's share). Pure calculation, no table reads.

Each function was designed to be independent. You could call them individually or compose them:

SELECT C.CLAIM_ID,
       C.BILLED_AMOUNT,
       PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
           C.PROVIDER_TIER, C.PLAN_TYPE) AS ALLOWED,
       PINNACLE.CALC_COPAY(C.PLAN_TYPE,
           C.PROC_CATEGORY) AS COPAY,
       PINNACLE.CALC_DEDUCTIBLE(C.PATIENT_ID,
           YEAR(C.SERVICE_DATE),
           PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
               C.PROVIDER_TIER, C.PLAN_TYPE)) AS DEDUCTIBLE,
       PINNACLE.CALC_COINSURANCE(C.PLAN_TYPE,
           PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
               C.PROVIDER_TIER, C.PLAN_TYPE),
           PINNACLE.CALC_COPAY(C.PLAN_TYPE,
               C.PROC_CATEGORY),
           PINNACLE.CALC_DEDUCTIBLE(C.PATIENT_ID,
               YEAR(C.SERVICE_DATE),
               PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
                   C.PROVIDER_TIER, C.PLAN_TYPE)))
           AS PLAN_PAYMENT
FROM CLAIMS C
WHERE C.CLAIM_ID = :WS-CLAIM-ID;

Ahmad noticed the problem immediately: CALC_ALLOWED_AMT is called three times for the same claim — once directly and twice as input to other functions. With DETERMINISTIC declared, DB2 could theoretically cache the result. But Ahmad didn't trust "theoretically" for production workloads.

Design Decision: The Wrapper Stored Procedure. For the batch adjudication program and the online portal — callers that need all four calculations for a single claim — Ahmad created a wrapper stored procedure that calls each UDF once, stores intermediate results, and returns all four values:

CREATE PROCEDURE PINNACLE.ADJUDICATE_CLAIM
    (IN  P_CLAIM_ID        CHAR(15),
     OUT P_ALLOWED_AMT     DECIMAL(11,2),
     OUT P_COPAY           DECIMAL(11,2),
     OUT P_DEDUCTIBLE      DECIMAL(11,2),
     OUT P_COINSURANCE     DECIMAL(11,2),
     OUT P_PLAN_PAYMENT    DECIMAL(11,2),
     OUT P_PATIENT_RESP    DECIMAL(11,2),
     OUT P_STATUS_CODE     INTEGER,
     OUT P_STATUS_MSG      VARCHAR(200))
    LANGUAGE COBOL
    EXTERNAL NAME CLMADJSP
    PARAMETER STYLE GENERAL
    NOT DETERMINISTIC
    READS SQL DATA
    WLM ENVIRONMENT WLMCLM
    DYNAMIC RESULT SETS 0
    COLLID PINCOLL
    COMMIT ON RETURN NO
    ASUTIME LIMIT 3000;

The stored procedure internally calls each UDF via SQL:

       2000-CALCULATE-WATERFALL.
           EXEC SQL
               SET :WS-ALLOWED-AMT =
                   PINNACLE.CALC_ALLOWED_AMT(
                       :WS-PROC-CODE,
                       :WS-PROVIDER-TIER,
                       :WS-PLAN-TYPE)
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 8 TO LS-STATUS-CODE
               STRING 'ALLOWED AMT CALC FAILED: SQLCODE='
                      WS-SQLCODE
                      DELIMITED BY SIZE
                      INTO LS-STATUS-MSG
               GOBACK
           END-IF

           EXEC SQL
               SET :WS-COPAY =
                   PINNACLE.CALC_COPAY(
                       :WS-PLAN-TYPE,
                       :WS-PROC-CATEGORY)
           END-EXEC

           EXEC SQL
               SET :WS-DEDUCTIBLE =
                   PINNACLE.CALC_DEDUCTIBLE(
                       :WS-PATIENT-ID,
                       :WS-PLAN-YEAR,
                       :WS-ALLOWED-AMT)
           END-EXEC

           EXEC SQL
               SET :WS-COINSURANCE =
                   PINNACLE.CALC_COINSURANCE(
                       :WS-PLAN-TYPE,
                       :WS-ALLOWED-AMT,
                       :WS-COPAY,
                       :WS-DEDUCTIBLE)
           END-EXEC

           COMPUTE WS-PLAN-PAYMENT =
               WS-ALLOWED-AMT - WS-COPAY -
               WS-DEDUCTIBLE + WS-COINSURANCE

           COMPUTE WS-PATIENT-RESP =
               WS-COPAY + WS-DEDUCTIBLE -
               WS-COINSURANCE

           MOVE WS-ALLOWED-AMT  TO LS-ALLOWED-AMT
           MOVE WS-COPAY        TO LS-COPAY
           MOVE WS-DEDUCTIBLE   TO LS-DEDUCTIBLE
           MOVE WS-COINSURANCE  TO LS-COINSURANCE
           MOVE WS-PLAN-PAYMENT TO LS-PLAN-PAYMENT
           MOVE WS-PATIENT-RESP TO LS-PATIENT-RESP
           .

This gave callers two options: call the wrapper stored procedure for a complete adjudication, or call individual UDFs inline in SQL for ad-hoc queries.

The Performance Incident

Six weeks after deployment, Diane's worst fear materialized. A financial analyst on the reporting team wrote this query:

SELECT C.PROVIDER_ID,
       COUNT(*) AS CLAIM_COUNT,
       SUM(C.BILLED_AMOUNT) AS TOTAL_BILLED,
       SUM(PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
           C.PROVIDER_TIER, C.PLAN_TYPE)) AS TOTAL_ALLOWED
FROM CLAIMS C
WHERE C.SERVICE_DATE BETWEEN '2024-01-01' AND '2024-12-31'
  AND PINNACLE.CALC_ALLOWED_AMT(C.PROC_CODE,
      C.PROVIDER_TIER, C.PLAN_TYPE) > C.BILLED_AMOUNT
GROUP BY C.PROVIDER_ID
ORDER BY TOTAL_ALLOWED DESC;

The query was looking for providers whose allowed amounts exceeded billed amounts — a valid audit question. But the CLAIMS table had 31 million rows for the year. The UDF in the WHERE clause meant DB2 had to call CALC_ALLOWED_AMT for every row that passed the date filter (approximately 31 million calls) before it could apply the allowed-amount predicate. The UDF in the SELECT's SUM required another 31 million calls.

The query ran for four hours before the DBA killed it. It consumed 47 minutes of CPU and held read locks on the PROCEDURE_RATE_SCHEDULE table for the entire duration, blocking rate schedule updates.

Ahmad responded with a three-part remediation:

Part 1: Materialized Column. For the CALC_ALLOWED_AMT function specifically, Pinnacle added a materialized column to the CLAIMS table:

ALTER TABLE CLAIMS
    ADD COLUMN CALC_ALLOWED_AMOUNT DECIMAL(11,2);

A nightly batch updates this column for all new claims using the UDF:

UPDATE CLAIMS
SET CALC_ALLOWED_AMOUNT =
    PINNACLE.CALC_ALLOWED_AMT(PROC_CODE,
        PROVIDER_TIER, PLAN_TYPE)
WHERE CALC_ALLOWED_AMOUNT IS NULL;

Reporting queries use the materialized column instead of calling the UDF:

SELECT PROVIDER_ID, COUNT(*), SUM(BILLED_AMOUNT),
       SUM(CALC_ALLOWED_AMOUNT)
FROM CLAIMS
WHERE SERVICE_DATE BETWEEN '2024-01-01' AND '2024-12-31'
  AND CALC_ALLOWED_AMOUNT > BILLED_AMOUNT
GROUP BY PROVIDER_ID;

This query runs in 8 seconds.

Part 2: Usage Guidelines Document. Ahmad wrote a one-page document that became mandatory reading for anyone writing SQL at Pinnacle:

UDF Usage Rules: 1. Never use a UDF in a WHERE clause on a table with more than 10,000 rows unless other predicates reduce the candidate set to under 10,000 rows first. 2. If you need UDF results for reporting, check if a materialized column exists first. 3. For ad-hoc analysis, use the UDF in SELECT for a small result set. Don't use it to filter a large table. 4. If your query with a UDF runs for more than 30 seconds, stop and ask the DBA for a better approach.

Part 3: Resource Limit Facility (RLF). Diane configured DB2's Resource Limit Facility to set a CPU time limit on dynamic SQL statements that reference UDFs. Any dynamic SQL calling a PINNACLE UDF is limited to 120 CPU-seconds. This prevents runaway queries without affecting the batch adjudication program (which uses static SQL and is exempt from RLF limits).

The CALC_DEDUCTIBLE Complexity

The deductible UDF was the most challenging implementation. Unlike the other calculations, it depends on state: the patient's year-to-date deductible consumption. Two claims for the same patient in the same batch could affect each other's deductible allocation.

Ahmad's solution:

For the batch adjudication (via the wrapper stored procedure), the stored procedure processes claims for each patient in date order and tracks the running deductible consumption in working storage. The CALC_DEDUCTIBLE UDF is called with the current accumulated deductible as a parameter, not just the patient ID.

For ad-hoc queries (calling the UDF directly), the UDF reads the current DEDUCTIBLE_TRACKER table, which reflects the last batch run. This means ad-hoc UDF calls are accurate to the last batch, not real-time. Ahmad documented this limitation prominently.

For the online portal (real-time estimation), the wrapper stored procedure reads the current tracker AND any pending claims in the CLAIMS_PENDING table to give a real-time estimate.

       4000-CALC-DEDUCTIBLE-LOGIC.
      *    READ THE PATIENT'S ANNUAL DEDUCTIBLE LIMIT
           EXEC SQL
               SELECT ANNUAL_DEDUCTIBLE
               INTO :WS-ANNUAL-DEDUCTIBLE
               FROM PLAN_DEDUCTIBLE_TABLE
               WHERE PLAN_TYPE = :LS-PLAN-TYPE
                 AND EFFECTIVE_YEAR = :LS-PLAN-YEAR
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 0 TO LS-RESULT
               MOVE 0 TO LS-RESULT-IND
               MOVE '00000' TO LS-SQLSTATE
               GOBACK
           END-IF

      *    READ THE PATIENT'S YEAR-TO-DATE CONSUMPTION
           EXEC SQL
               SELECT COALESCE(YTD_DEDUCTIBLE, 0)
               INTO :WS-YTD-CONSUMED
               FROM DEDUCTIBLE_TRACKER
               WHERE PATIENT_ID = :LS-PATIENT-ID
                 AND PLAN_YEAR = :LS-PLAN-YEAR
           END-EXEC

           IF SQLCODE = +100
               MOVE 0 TO WS-YTD-CONSUMED
           END-IF

      *    CALCULATE REMAINING DEDUCTIBLE
           COMPUTE WS-REMAINING =
               WS-ANNUAL-DEDUCTIBLE - WS-YTD-CONSUMED

           IF WS-REMAINING <= 0
      *        DEDUCTIBLE FULLY MET
               MOVE 0 TO LS-RESULT
           ELSE
               IF LS-CLAIM-AMOUNT <= WS-REMAINING
      *            ENTIRE CLAIM APPLIES TO DEDUCTIBLE
                   MOVE LS-CLAIM-AMOUNT TO LS-RESULT
               ELSE
      *            PARTIAL - ONLY REMAINING AMOUNT APPLIES
                   MOVE WS-REMAINING TO LS-RESULT
               END-IF
           END-IF

           MOVE 0 TO LS-RESULT-IND
           MOVE '00000' TO LS-SQLSTATE
           .

The CALC_COINSURANCE Function: A Pure Calculation UDF

The coinsurance UDF is the simplest of the four, and it illustrates an important design principle: not every UDF needs to read tables. CALC_COINSURANCE is a pure calculation based on its input parameters and the plan type's coinsurance percentage.

Ahmad declared it with special attributes:

CREATE FUNCTION PINNACLE.CALC_COINSURANCE
    (P_PLAN_TYPE     CHAR(4),
     P_ALLOWED_AMT   DECIMAL(11,2),
     P_COPAY         DECIMAL(11,2),
     P_DEDUCTIBLE    DECIMAL(11,2))
    RETURNS DECIMAL(11,2)
    LANGUAGE COBOL
    EXTERNAL NAME CALCCOIN
    PARAMETER STYLE SQL
    DETERMINISTIC
    READS SQL DATA
    SPECIFIC CALC_COINS_V1
    NO EXTERNAL ACTION
    WLM ENVIRONMENT WLMCLM;

Wait — it reads SQL data? Ahmad marked it READS SQL DATA because the COBOL implementation reads the coinsurance percentage from a lookup table. If the percentage were a parameter instead, he could have used CONTAINS SQL or even NO SQL, which would enable DB2 to parallelize calls to this function. This is a design trade-off: parameter simplicity versus optimizer freedom.

Results After One Year

Metric Before UDFs After UDFs Notes
Logic implementations 4 (batch, online, reporting, audit) 1 (UDF set + wrapper) Single source of truth
Reporting variance $1M/year | $0 Eliminated by using same functions
Batch adjudication time 3.2 hours 2.8 hours 12.5% faster (wrapper SP reduces round-trips)
Online estimation response 420 ms 180 ms Portal calls wrapper SP instead of replicating logic
Ad-hoc reporting capability Not possible Available Finance team can use UDFs in SQL
Regulatory audit findings 3 findings/year 0 findings/year Consistent calculations across all paths
Rate schedule updates 4-week deployment cycle 1-day deployment cycle Update lookup tables; UDFs pick up changes automatically

The rate schedule update improvement was unexpected. Previously, changing a procedure rate required modifying and redeploying the batch program, the online transaction, and the reporting program. With UDFs, the rate data lives in tables. The UDFs read the tables. Update the table, and every caller immediately gets the new rate. The UDF code itself only changes when the calculation logic changes, not when the data changes.

Lessons Learned

1. UDFs and stored procedures are complementary. The UDFs handle individual calculations that can be used inline in SQL. The wrapper stored procedure composes them for callers that need the complete waterfall. Neither alone was sufficient.

2. Materialized columns are the UDF performance escape valve. When you need UDF results for large-scale reporting, materialize them. The UDF is still the single source of truth — it populates the materialized column. But queries read the column, not the function.

3. The deductible problem reveals the limits of stateless functions. A scalar UDF is stateless by design. When calculations depend on running state (year-to-date accumulation), the UDF can only see the last persisted state. Real-time state requires a stored procedure that manages the sequence.

4. Developer education is as important as the code. The performance incident happened because a developer used a UDF in a way that was technically correct but operationally disastrous. The UDF usage guidelines and RLF limits are part of the solution, not an afterthought.

5. Separate the data from the logic. Ahmad's biggest insight: the UDFs read lookup tables. The calculation logic is in COBOL. The calculation data (rates, tiers, caps) is in DB2 tables. When rates change, update the tables. When rules change, update the COBOL. This separation made the system dramatically more maintainable.


Discussion Questions

  1. Ahmad designed four separate UDFs rather than one monolithic function that returns all four values. What are the advantages of this decomposition? Are there scenarios where a single function would be preferable?

  2. The CALC_DEDUCTIBLE function depends on the patient's year-to-date deductible consumption. This creates a temporal accuracy problem: the UDF called in an ad-hoc query reflects the last batch run, not real-time state. How would you communicate this limitation to users? Is there a design that provides real-time accuracy without the wrapper stored procedure?

  3. The materialized column solution (Part 1 of the performance remediation) introduces a new problem: the materialized column can become stale if the nightly update fails or if the lookup tables change mid-day. Design a staleness detection mechanism.

  4. Ahmad declared CALC_COINSURANCE as READS SQL DATA even though a redesign could make it NO SQL. What optimizer freedoms does NO SQL unlock? Is the trade-off (adding a coinsurance percentage parameter) worth it?

  5. Diane configured Resource Limit Facility (RLF) limits for dynamic SQL calling UDFs but exempted static SQL. Why is this distinction appropriate? Could a static SQL program also cause the same performance problem?

  6. The case mentions that the reporting variance was "$0.02 per claim on average" which aggregated to "$1 million over 50 million claims." What does this reveal about the importance of precision in financial calculations? How do COBOL's COMP-3 (packed decimal) fields and DB2's DECIMAL type help prevent this kind of drift?