Case Study 2: Government Tax Processing — The Federal Tax Authority of Valdoria


Background

The Federal Tax Authority of Valdoria (FTAV) is the government agency responsible for collecting taxes from the 62 million citizens and 4.8 million registered businesses of Valdoria, a fictional European nation with an economy comparable in size to that of the Netherlands or Switzerland.

FTAV processes approximately 150 million tax filings per year — a number that includes individual income tax returns, corporate tax filings, value-added tax (VAT) submissions, and quarterly estimated payments. Processing occurs year-round, but 65% of individual filings arrive during a six-week peak season from March 15 to April 30.

The stakes are unambiguous: FTAV collected EUR 312 billion in tax revenue last year, representing 94% of the national government's total revenue. Any failure in the tax processing system directly impacts the government's ability to fund public services — hospitals, schools, defense, infrastructure.

The Current System: "Atlas"

FTAV's core tax processing system, "Atlas," has been in operation since 1998. It runs on a cluster of four IBM z15 mainframes in a Parallel Sysplex configuration, with DB2 12 for z/OS as the database layer. A disaster recovery site, 300 kilometers away, maintains a standby Sysplex with continuous data replication.

Metric                                    Value
----------------------------------------  ----------------------------------------------
Total tables                              22,600
Total stored procedures                   11,200
Total indexes                             48,000
Database size (compressed)                86 TB
Database size (uncompressed equivalent)   310 TB
Data retention period                     10 years (legal requirement); 20 years (audit trail)
Daily transactions (off-peak)             8 million
Daily transactions (peak season)          65 million
Peak transactions per second              14,000
Batch processing (nightly, off-peak)      4.2 hours
Batch processing (nightly, peak)          7.8 hours
Concurrent online users (peak)            180,000
Unplanned downtime (last 10 years)        22 minutes total
DB2 z/OS DBA staff                        24
System programmers                        12
Application developers (COBOL)            85
Application developers (Java)             42

Atlas is a monument to careful engineering. In 28 years of operation, it has processed over 3.8 billion tax filings without a single filing being lost or irreversibly corrupted. Its 22 minutes of total unplanned downtime over ten years translates to roughly 99.9996% availability: comfortably better than five nines, though short of six.
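
The availability figure follows directly from the downtime total. The sketch below is a minimal illustration, assuming an average year of 365.25 days; the function names are hypothetical and chosen only for this example.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability_pct(downtime_minutes: float, years: float) -> float:
    """Return availability as a percentage over the given period."""
    total_minutes = years * MINUTES_PER_YEAR
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


def allowed_downtime_minutes(nines: int, years: float) -> float:
    """Return the downtime budget in minutes for an N-nines target."""
    return years * MINUTES_PER_YEAR * 10.0 ** (-nines)


atlas = availability_pct(downtime_minutes=22, years=10)
print(f"Atlas availability over 10 years: {atlas:.4f}%")
for nines in (5, 6):
    budget = allowed_downtime_minutes(nines, years=10)
    print(f"{nines} nines allows about {budget:.1f} minutes over 10 years")
```

Under these assumptions, 22 minutes sits between the ten-year budget for five nines (about 53 minutes) and the budget for six nines (about 5 minutes).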

The Challenge

Dr. Ingrid Sorensen, FTAV's Director of Information Technology, faces a set of challenges that are different from those of a private-sector bank but no less pressing.

1. Legislative Complexity

Valdoria's parliament passes an average of 14 tax law changes per year. Each change must be reflected in Atlas within the legislatively mandated implementation period, typically 90 days for minor changes and 180 days for major restructurings. A comprehensive VAT reform completed last year required changes to 340 tables, 890 stored procedures, and 1,200 application programs. The project took 18 months and consumed 65,000 person-hours.

The tax code's complexity is growing. Twenty years ago, the average individual tax return had 12 data elements. Today, it has 47 — reflecting new deductions, credits, investment reporting requirements, cryptocurrency declarations, and international income reporting. Each new data element ripples through the database schema, application logic, validation rules, and reporting systems.

2. Citizen Expectations

Valdorian citizens increasingly expect real-time interaction with the tax authority. The "My Tax Portal" — a web application launched in 2019 — allows citizens to file returns electronically, check refund status, and communicate with tax inspectors. During peak season, it handles 180,000 concurrent users.

The portal runs on a set of Java application servers that connect to DB2 for z/OS via DRDA (Distributed Relational Database Architecture). Response times are acceptable (under 2 seconds for most operations), but the architecture creates a tight coupling between the web layer and the mainframe. When a peak-season batch cycle runs long and consumes more DB2 resources than planned, portal response times degrade — and citizens notice.

3. Analytics and Fraud Detection

FTAV's analytics team wants to deploy machine-learning models for fraud detection. Currently, the agency identifies potentially fraudulent filings using rule-based scoring in DB2 stored procedures — a system that catches approximately 72% of confirmed fraud cases. The analytics team believes that ML models trained on historical filing patterns could increase the detection rate to 88-92%.

The challenge: ML model training requires access to large volumes of historical tax data (10+ years, hundreds of terabytes uncompressed) in a format suitable for data science tools (Python, R, TensorFlow). This data currently resides in DB2 for z/OS, and extracting it at the required volume is operationally difficult. Batch extract jobs that pull historical data compete with production workloads for I/O and CPU resources.

4. Compliance and Sovereignty

Valdorian law requires that all tax data be stored within Valdorian borders, processed by Valdorian government employees or vetted contractors, and encrypted at rest and in transit. EU GDPR also applies, giving citizens the right to access, correct, and (in limited circumstances) delete their tax data.

These requirements effectively rule out public cloud infrastructure for the core tax database, though they do not prevent cloud usage for non-tax workloads such as the public website, internal collaboration tools, or development/testing environments.

5. Succession Planning

The age profile of Atlas's technical staff mirrors the broader mainframe industry. Of the 24 DB2 DBAs, 16 are over 50. Of the 85 COBOL developers, 62 are over 50. FTAV has a statutory obligation to maintain operational capability — unlike a private company, a government tax authority cannot accept a period of degraded capability while it retrains staff.

Dr. Sorensen has implemented a graduate training program that recruits computer science graduates from Valdorian universities and trains them in mainframe technology. The program produces 6-8 new mainframe-capable staff per year, but retirements are projected to exceed 12 per year starting in 2027.

The Modernization Strategy

Dr. Sorensen assembles a cross-functional team to develop a modernization strategy. After 12 months of analysis, they produce a plan called "Atlas Next" with four pillars:

Pillar 1: Database Layer Separation

Atlas Next introduces Db2 for LUW on Linux as an operational data layer between the mainframe core and citizen-facing applications. The architecture:

Citizens (Web/Mobile)
        |
    [Load Balancers]
        |
    [Java Application Servers]
        |
    [Db2 for LUW - Operational Data Layer]    <-- NEW
        |                    |
    [Read Queries]     [Write Queries]
        |                    |
    [Local Cache]     [DB2 for z/OS - Core]
                            |
                    [Batch Processing]

Citizen-facing read operations (account status, filing history, refund tracking) are served from Db2 for LUW, which is refreshed from z/OS in near-real-time via IBM Data Replication. Write operations (filing submissions, amendments) are routed through to DB2 for z/OS to maintain a single authoritative source.

This architecture decouples the web tier's read traffic from the mainframe, removing most of the contention between portal queries and batch processing. Write operations still reach DB2 for z/OS directly.
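
The read/write split described above can be sketched as a small routing function. This is a minimal illustration, not FTAV's implementation: the operation names and the `Target` enum are hypothetical, derived only from the examples listed in the text.

```python
from enum import Enum


class Target(Enum):
    LUW_REPLICA = "Db2 for LUW (operational data layer)"
    ZOS_CORE = "DB2 for z/OS (core)"


# Hypothetical operation names based on the examples in the text.
READ_OPERATIONS = {"account_status", "filing_history", "refund_tracking"}
WRITE_OPERATIONS = {"submit_filing", "amend_filing"}


def route(operation: str) -> Target:
    """Send reads to the replica and writes to the authoritative core."""
    if operation in READ_OPERATIONS:
        return Target.LUW_REPLICA
    if operation in WRITE_OPERATIONS:
        return Target.ZOS_CORE
    # Refuse to guess: an unclassified operation must be categorized first.
    raise ValueError(f"unclassified operation: {operation!r}")


for op in ("refund_tracking", "submit_filing"):
    print(f"{op} -> {route(op).value}")
```

Rejecting unclassified operations, rather than defaulting them to one side, keeps a new write path from being silently served by the replica.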

Pillar 2: Analytics Data Platform

A separate Db2 Warehouse instance receives full historical data extracts from DB2 for z/OS — initially via batch ETL, transitioning to change data capture (CDC) for lower latency. The analytics team gets their sandbox for ML development without impacting production.

The data is anonymized and masked for development and testing. Production fraud-detection models, once validated, are deployed as DB2 stored procedures on z/OS — bringing the ML inference back to where the data lives, rather than moving the data to where the models live.
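
To illustrate why change data capture lowers latency compared with full batch extracts, the sketch below applies a stream of change events to a target copy. It is a simplified model under stated assumptions: the `ChangeEvent` fields and the in-memory dictionary target are hypothetical and do not reflect the event format of any particular replication product.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    sequence: int                          # position in the source change log
    op: str                                # "insert", "update", or "delete"
    key: str                               # primary key of the affected row
    row: Optional[dict[str, Any]] = None   # new row image (None for deletes)


def apply_changes(target: dict[str, dict[str, Any]],
                  events: list[ChangeEvent],
                  last_applied: int) -> int:
    """Apply events in log order and return the new checkpoint."""
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence <= last_applied:
            continue  # already applied, so replaying a batch is harmless
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"event {event.sequence} has no row image")
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        last_applied = event.sequence
    return last_applied


warehouse: dict[str, dict[str, Any]] = {}
batch = [
    ChangeEvent(1, "insert", "F-1", {"status": "received"}),
    ChangeEvent(2, "update", "F-1", {"status": "assessed"}),
    ChangeEvent(3, "insert", "F-2", {"status": "received"}),
    ChangeEvent(4, "delete", "F-2"),
]
checkpoint = apply_changes(warehouse, batch, last_applied=0)
print(checkpoint, warehouse)
```

Only the rows that changed are moved, which is what lets a CDC feed avoid the large extract jobs that compete with production for I/O and CPU.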

Pillar 3: Schema Modernization

Rather than migrating away from DB2 for z/OS, Atlas Next invests in making the z/OS database more maintainable:

  • Temporal tables for tracking all data changes, replacing custom audit-trail code that accounts for 15% of stored procedure complexity
  • Row-level and column-level access control using DB2 12's built-in features, replacing application-level security checks
  • Online schema changes to reduce the implementation time for legislative changes
  • JSON support for new data elements that change frequently, reducing the need for ALTER TABLE operations
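
The temporal-table item above is easier to evaluate with a concrete picture of what the custom audit-trail code has to do today. The sketch below imitates system-period versioning in plain Python: each update closes the current row version and opens a new one, so the state at any past moment can be reconstructed. In DB2 the database engine does this bookkeeping itself; the class, field names, and statuses here are hypothetical and serve only to show the semantics.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

FAR_FUTURE = datetime.max  # marks the version that is currently valid


@dataclass
class RowVersion:
    taxpayer_id: str
    filing_status: str
    sys_start: datetime                # when this version became current
    sys_end: datetime = FAR_FUTURE     # when it was superseded (exclusive)


class VersionedTable:
    """Keeps every version of a row instead of overwriting it."""

    def __init__(self) -> None:
        self._versions: list[RowVersion] = []

    def upsert(self, taxpayer_id: str, filing_status: str, at: datetime) -> None:
        for version in self._versions:
            if version.taxpayer_id == taxpayer_id and version.sys_end == FAR_FUTURE:
                version.sys_end = at  # close the old version
        self._versions.append(RowVersion(taxpayer_id, filing_status, at))

    def as_of(self, taxpayer_id: str, at: datetime) -> Optional[str]:
        """Return the status that was current at the given moment."""
        for version in self._versions:
            if (version.taxpayer_id == taxpayer_id
                    and version.sys_start <= at < version.sys_end):
                return version.filing_status
        return None


table = VersionedTable()
table.upsert("T-1", "submitted", datetime(2025, 4, 1))
table.upsert("T-1", "assessed", datetime(2025, 5, 1))
print(table.as_of("T-1", datetime(2025, 4, 15)))
print(table.as_of("T-1", datetime(2025, 6, 1)))
```

Every stored procedure that hand-codes this close-and-append logic is a candidate for simplification once the database maintains the history itself.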

Pillar 4: Workforce Transformation

This is the most difficult pillar. Atlas Next invests in:

  • Expanding the graduate program from 6-8 to 15 graduates per year
  • Cross-training existing mainframe staff on LUW, Linux, and cloud technologies
  • Cross-training distributed-platform staff on z/OS fundamentals
  • Creating a "dual-platform DBA" career track with premium compensation
  • Partnering with two Valdorian universities to include DB2 and mainframe technology in their curricula

The goal is not to eliminate mainframe skills but to create a workforce that is fluent in both worlds.
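
The staffing arithmetic behind this pillar can be made explicit with a simple projection. The sketch below is deliberately crude and rests on labeled assumptions: it pools the 24 DBAs, 12 system programmers, and 85 COBOL developers into one mainframe headcount, holds retirements at 12 per year (the text says they will exceed 12, so this is a floor), ignores all other attrition, and treats every graduate as fully productive on arrival.

```python
def project_headcount(start: int, hires_per_year: int,
                      retirements_per_year: int, years: int) -> list[int]:
    """Return headcount at the start and after each projected year."""
    headcount = [start]
    for _ in range(years):
        next_value = headcount[-1] + hires_per_year - retirements_per_year
        headcount.append(max(0, next_value))
    return headcount


# Assumption: DBAs + system programmers + COBOL developers form one pool.
MAINFRAME_STAFF = 24 + 12 + 85
RETIREMENTS = 12  # floor; the projection in the text is "more than 12"

for label, hires in (("current program, low", 6),
                     ("current program, high", 8),
                     ("expanded program", 15)):
    trend = project_headcount(MAINFRAME_STAFF, hires, RETIREMENTS, years=5)
    print(f"{label:>22}: {trend}")
```

Even the expanded program leaves only a small margin at exactly 12 retirements per year, which is why the assumptions listed above deserve scrutiny before the plan is considered safe.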

The First Year

Atlas Next's first year produces measurable results:

  • Portal response time: Average response time during peak season drops from 1.8 seconds to 0.6 seconds after the Db2 for LUW operational layer absorbs read traffic.
  • Batch window: The nightly batch window during peak season shrinks from 7.8 hours to 6.1 hours, a reduction attributed to the removal of portal query contention.
  • Fraud detection: The analytics team trains an initial ML model on 5 years of anonymized data. In parallel testing against the rule-based system, the ML model identifies 84% of confirmed fraud cases versus the rule system's 72%, a gain of 12 percentage points (about 17% in relative terms). Full production deployment is scheduled for the next tax season.
  • Legislative agility: The first legislative change implemented under Atlas Next (a new cryptocurrency reporting requirement) takes 11 weeks from law passage to production, compared to a projected 16 weeks under the old process. Temporal tables and JSON support reduce the schema-change overhead significantly.
  • Staffing: The expanded graduate program recruits 14 candidates, exceeding the target. Retention of existing staff improves after the announcement of the dual-platform career track and associated compensation adjustments.
  • Cost: Total first-year investment is EUR 18 million — within the EUR 15-22 million budget. Mainframe CPU consumption decreases by 8% due to workload offloading, translating to approximately EUR 2.4 million in annual software license savings.
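
The first-year figures above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the list; the payback calculation is a rough assumption of this example, since it counts nothing but the annual license savings and ignores both ongoing operating costs and benefits such as improved fraud recovery.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Return the change from before to after as a percentage of before."""
    return 100.0 * (after - before) / before


# Fraud detection: percentage points versus relative improvement.
fraud_points = 84 - 72
fraud_relative = relative_change_pct(72, 84)

# Portal response time (seconds) and peak batch window (hours).
response_change = relative_change_pct(1.8, 0.6)
batch_change = relative_change_pct(7.8, 6.1)

# Simple payback on license savings alone (EUR millions); an assumption.
payback_years = 18 / 2.4

print(f"Fraud detection: +{fraud_points} points, +{fraud_relative:.1f}% relative")
print(f"Portal response time: {response_change:.1f}%")
print(f"Peak batch window: {batch_change:.1f}%")
print(f"Payback on license savings alone: {payback_years:.1f} years")
```

Separating percentage points from relative change matters when comparing detection rates, because the two framings of the same result differ by several units.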

Remaining Challenges

Atlas Next is not without problems:

  • Data latency: The near-real-time replication from z/OS to LUW introduces a latency of 2-8 seconds. For most portal operations, this is acceptable. For a citizen who has just submitted a filing and immediately checks its status, a lag of up to 8 seconds can cause confusion. The team is evaluating solutions including synchronous replication for critical data flows (which would impact mainframe performance) and UI-level messaging ("Your filing has been received and will appear in your account within one minute").

  • Dual-platform complexity: Operating two DB2 platforms requires two sets of monitoring tools, two backup strategies, and two sets of operational procedures. The DBAs report that incident response is more complex because they must determine which platform is the source of a problem before they can resolve it.

  • Scope creep: Success with the operational data layer has generated enthusiasm. Business stakeholders are requesting that additional workloads — pension calculations, customs processing, social-benefit eligibility — be integrated into the Atlas Next architecture. Each additional workload increases complexity and cost.
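
One application-level option for the data latency problem is a "read your own writes" overlay: the portal remembers filings it has just submitted and reports them as received until the replica catches up. The sketch below is one possible shape for that idea, not a solution named in the case study. The class, status strings, and 60-second window are hypothetical, and a dictionary stands in for the Db2 for LUW read layer.

```python
import time
from typing import Callable

# Assumption: mirrors the "within one minute" wording of the UI message.
PENDING_WINDOW_SECONDS = 60.0


class FilingStatusService:
    def __init__(self, replica: dict[str, str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._replica = replica                # stand-in for the read replica
        self._pending: dict[str, float] = {}   # filing id -> submission time
        self._clock = clock

    def record_submission(self, filing_id: str) -> None:
        """Call only after the core system has acknowledged the write."""
        self._pending[filing_id] = self._clock()

    def status(self, filing_id: str) -> str:
        if filing_id in self._replica:
            self._pending.pop(filing_id, None)  # replica has caught up
            return self._replica[filing_id]
        submitted_at = self._pending.get(filing_id)
        if submitted_at is None:
            return "not found"
        if self._clock() - submitted_at <= PENDING_WINDOW_SECONDS:
            return "received, processing"
        return "received, delayed"  # replication is slower than expected


fake_now = [0.0]
replica_rows: dict[str, str] = {}
service = FilingStatusService(replica_rows, clock=lambda: fake_now[0])
service.record_submission("F-100")
print(service.status("F-100"))   # replica has not caught up yet
replica_rows["F-100"] = "received"
print(service.status("F-100"))   # now served from the replica
```

This approach adds no load to the mainframe, but the pending state lives in the web tier, so it has to be shared across application servers for a citizen whose requests land on different nodes.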


Discussion Questions

Question 1: Scale Comparison (Beginner)

Compare FTAV's Atlas system to Meridian National Bank from Chapter 1. Create a table comparing transaction volumes, database sizes, staff sizes, and availability. Which organization faces the greater technical challenge? Which faces the greater organizational challenge? Are these the same?

Question 2: Data Sovereignty (Intermediate)

Valdoria's data-sovereignty requirements effectively prohibit using public cloud for the core tax database. How would your modernization strategy change if these requirements were relaxed? Would cloud deployment be advisable even if it were permitted? Consider both the technical and political dimensions.

Question 3: The Replication Latency Problem (Intermediate)

The 2-8 second replication latency causes confusion for citizens who check their filing status immediately after submission. Propose at least three different solutions to this problem, and evaluate the trade-offs (cost, complexity, performance impact, user experience) of each.

Question 4: ML Model Deployment Architecture (Advanced)

Atlas Next deploys ML fraud-detection models as DB2 stored procedures on z/OS — "bringing the model to the data." An alternative architecture would extract data to a separate ML platform and score filings there. Evaluate both approaches. Consider data volume, latency, security, and operational complexity. Which approach would you recommend, and why?

Question 5: The Workforce Pipeline (Advanced)

FTAV's expanded graduate program aims to produce 15 mainframe-capable staff per year, against retirements projected at 12 or more per year starting in 2027. Is this sustainable? What assumptions could invalidate the plan? What contingency measures should Dr. Sorensen put in place?

Question 6: Government vs. Private Sector (Intermediate)

How does FTAV's modernization strategy differ from what a private-sector bank (like CUB in Case Study 1) might pursue? Consider regulatory constraints, risk tolerance, procurement processes, and organizational culture. Are there lessons that each sector could learn from the other?

Question 7: The 28-Year System (Advanced)

Atlas has been in production for 28 years. Is this a sign of success or a sign of stagnation? Argue both sides. At what point (if ever) should a well-functioning system be replaced purely due to age? What criteria would you use to make that determination?

