
Chapter 21: API-First COBOL — Exposing Mainframe Services via z/OS Connect, API Mediation Layer, and OpenAPI

"Every COBOL program you've written is an API waiting to happen. The question isn't whether to expose it — it's whether you'll do it on your terms or let someone else do it badly on theirs." — Yuki Tanaka, SecureFirst Insurance, Enterprise Architecture Review, 2024

Spaced Review

Before we begin, revisit three concepts that form the foundation of this chapter:

From Chapter 13 — CICS Architecture: CICS regions host the transaction programs that will become your API backends. Remember the distinction between terminal-owning regions (TORs) and application-owning regions (AORs). When we expose COBOL services as APIs, those API calls land on a TOR and route to the AOR where your program runs. The CICS communication area (COMMAREA) and channels/containers you mastered in Chapter 13 become the data contracts your APIs fulfill.

From Chapter 14 — Web Services: You built SOAP and REST services directly in CICS using DFHWS2LS and the CICS web services pipeline. That was the first generation. This chapter takes you to the second generation — where z/OS Connect sits in front of CICS and provides a cleaner separation between API definition and backend implementation. If you skipped Chapter 14, you can still follow this chapter, but you'll miss some context on why the earlier approach had limitations.

From Chapter 19 — MQ Integration: Message queues give you asynchronous communication patterns. APIs are synchronous by default — a client sends a request and waits for a response. But real-world mainframe APIs often need to bridge both patterns. The request-reply pattern you built with MQ in Chapter 19 becomes the foundation for asynchronous API patterns in Section 21.1.


21.1 The API-First Mindset for Mainframe

Let me be direct: if your mainframe shop is still exchanging data via flat file transfers, FTP batches, and screen-scraping, you're operating on borrowed time. Not because those methods don't work — they've worked for decades — but because every other system in your enterprise has moved to API-based integration, and your mainframe is becoming an island.

Why APIs, Not File Transfers

Here's the conversation that happens in every modernization meeting. Someone from the digital team says, "We need real-time account balance data for the mobile app." The mainframe team says, "We run a batch extract every night at 2 AM." The digital team says, "We need it in real time." The mainframe team says, "We can run the batch every hour." Nobody is happy.

File transfers have three fundamental problems that APIs solve:

Latency. A file extract captures a point-in-time snapshot. By the time the downstream system processes it, the data is stale. When Kwame's team at CNB tried to build a real-time fraud detection system on hourly batch extracts, they were detecting fraud that had already cleared.

Coupling. File formats create tight coupling disguised as loose coupling. You think you're decoupled because you're exchanging files — but change a field length in the COBOL copybook, and every downstream consumer breaks. With APIs, you version the contract explicitly.

Volume. File transfers move entire datasets when consumers often need a single record. SecureFirst was transferring 2.3 million policy records nightly to a downstream system that queried maybe 500 of them per day. That's moving 4,600 records for every one that matters.

The API-First Principle

API-first doesn't mean "add APIs to your existing architecture." It means designing your mainframe services as APIs from the beginning, then building whatever else you need (batch processes, file extracts, screen interfaces) as consumers of those APIs.

This inverts the traditional mainframe model. Traditionally:

COBOL Program → CICS Transaction → 3270 Screen
                                  → Batch Extract → File → Downstream

API-first:

COBOL Program → CICS Transaction → API (z/OS Connect)
                                       ↓
                                  → Mobile App
                                  → Web Portal
                                  → Partner System
                                  → Batch Consumer
                                  → 3270 Screen (yes, even this)

The COBOL program doesn't change. The business logic stays where it is. What changes is how you expose it.

SecureFirst's Turning Point

Yuki Tanaka had been saying this for three years before anyone listened. SecureFirst Insurance had 847 CICS transactions supporting their policy administration system — pure COBOL, rock-solid, running on a z15. When the board approved a new customer-facing portal, the initial plan was to replicate the mainframe data into a cloud database and build the portal on top of that.

Carlos Rivera, leading the API team, did the math: replicating the data meant building and maintaining a synchronization layer, handling conflicts, managing two sources of truth, and explaining to regulators which database was authoritative. The estimated timeline was 18 months and $4.2 million.

Yuki's counter-proposal: expose the existing CICS transactions as APIs using z/OS Connect. Timeline: 4 months for the first 40 critical APIs. Cost: $800K including infrastructure. The mainframe remains the single source of truth. The portal consumes APIs. So does the mobile app. So does the agent desktop. One source, many consumers.

The board approved Yuki's plan. Four months later, the first APIs were live. That project is the backbone of this chapter.

The Real-World Numbers

Let me give you the numbers that convinced SecureFirst's CFO, because business cases need more than architecture diagrams.

Before APIs (file transfer model):

  • 23 downstream systems consuming mainframe data via nightly/hourly FTP
  • Average 14 incidents per month due to failed transfers, encoding issues, format mismatches
  • 920 hours per year of developer time maintaining file transfer jobs
  • 6-8 weeks to onboard a new data consumer (new batch job, new copybook, new FTP configuration, new scheduling, testing)
  • Data freshness: 1-24 hours stale depending on batch schedule

After APIs (API-first model):

  • Same 23 consumers, now calling REST APIs
  • Average 2 incidents per month (mostly authentication-related, resolved in minutes)
  • 180 hours per year maintaining the API layer
  • 2-3 weeks to onboard a new consumer (register on portal, provision API key, test in sandbox)
  • Data freshness: real-time (current as of the API call)

The operational savings alone — 740 hours per year at $150/hour fully loaded — pay for the API infrastructure in 14 months. The data freshness improvement is harder to quantify but drove a 40% reduction in customer complaints about stale information on the web portal.

When APIs Aren't the Answer

I'm not going to pretend APIs solve everything. Batch processing isn't going away. If you need to process 10 million records with complex transformations, an API that handles one record at a time is absurd. The right approach is batch processing that's initiated via API and reports status via API, while doing the heavy lifting in traditional batch.

Similarly, if you have a high-volume, fire-and-forget data flow between two mainframe subsystems, MQ (Chapter 19) is a better fit than a synchronous API. APIs add overhead — HTTP parsing, JSON serialization, authentication — that's unnecessary for tightly integrated internal systems.

Here's a decision matrix I use when consulting with mainframe shops:

Characteristic                                   Use API   Use MQ   Use Batch
Consumer needs real-time data                    Yes
Consumer needs request-response                  Yes
Fire-and-forget messaging                                  Yes
High-volume, complex transformations                                Yes
Multiple diverse consumers                       Yes
Guaranteed delivery more important than speed              Yes
Regulatory requirement for real-time access      Yes
Internal mainframe-to-mainframe                            Yes
Consumer is a modern web/mobile app              Yes

Most real-world integrations combine these patterns. The fund transfer API in the progressive project uses all three: the API accepts the request (synchronous), puts it on an MQ queue (asynchronous messaging), and a batch-like process handles settlement (batch processing). The API-first mindset means APIs are your default integration pattern. You choose something else only when you have a specific reason.

The Cost of Doing Nothing

One more point before we move on. I hear this objection constantly: "Our file transfers work fine. Why change?" The answer isn't that your file transfers will stop working. It's that the world around them has changed.

Your cloud-native applications expect APIs. Your mobile apps expect APIs. Your partners expect APIs. Your regulators are starting to expect APIs (see the Federal Benefits case study). Every time a downstream team can't get real-time data from the mainframe, they build a workaround — a replicated database, a screen scraper, a shadow system. Each workaround creates data inconsistency, security risk, and maintenance burden.

The cost of doing nothing isn't zero. It's the accumulating cost of every workaround, every data inconsistency incident, every delayed project because "we're waiting for the mainframe batch."


21.2 z/OS Connect Enterprise Edition

z/OS Connect EE is IBM's strategic product for exposing mainframe assets as RESTful APIs. It sits between API consumers and your backend systems (CICS, IMS, DB2, MQ), handling the translation between JSON/REST on the outside and mainframe-native protocols on the inside.

Architecture Overview

z/OS Connect EE runs as a Liberty server on z/OS. It's a Java-based runtime, but don't let that bother you — your COBOL programs don't change. z/OS Connect handles the impedance mismatch between the REST/JSON world and the COBOL/COMMAREA world.

The architecture has three layers:

┌─────────────────────────────────────────────┐
│              API Consumer                    │
│     (Mobile App, Web Portal, Partner)        │
└──────────────────┬──────────────────────────┘
                   │ HTTPS / JSON
┌──────────────────▼──────────────────────────┐
│           z/OS Connect EE                    │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  │
│  │   API    │  │  Service  │  │ Service  │  │
│  │ Defini-  │→ │  Archive  │→ │ Provider │  │
│  │  tion    │  │  (.sar)   │  │          │  │
│  └──────────┘  └───────────┘  └──────────┘  │
└──────────────────┬──────────────────────────┘
                   │ COMMAREA / Channel / MQ / SQL
┌──────────────────▼──────────────────────────┐
│         Backend System                       │
│    (CICS, IMS, DB2, MQ)                     │
└─────────────────────────────────────────────┘

API Definition — An OpenAPI 3.0 specification that defines the RESTful interface. This is what external consumers see. It includes paths, operations, request/response schemas, and security requirements.

Service Archive (.sar) — A deployable unit that contains the mapping between the API's JSON structures and the backend's data structures (COBOL copybooks, MQ message formats, DB2 result sets). The SAR file is created using the z/OS Connect EE API toolkit (an Eclipse-based tool) or the command-line zconbt utility.

Service Provider — A connector that knows how to communicate with a specific backend. The CICS service provider invokes CICS programs via the CICS Transaction Gateway or IPIC connections. The MQ service provider puts/gets messages. The DB2 service provider executes SQL.

Service Providers in Detail

The most common service provider — and the one you'll use for the banking project — is the CICS service provider. Here's how a request flows:

  1. Consumer sends GET /api/v1/accounts/1234567890 with a Bearer token
  2. z/OS Connect validates the token, matches the path to an API definition
  3. The API definition maps to a service archive that specifies CICS program ACCTINQ with COMMAREA layout from copybook ACCT-INQ-AREA
  4. z/OS Connect builds the COMMAREA from the URL path parameter and any query parameters
  5. The CICS service provider invokes ACCTINQ in the target CICS region via IPIC
  6. ACCTINQ runs its standard COBOL logic, populates the COMMAREA response area
  7. z/OS Connect maps the COMMAREA response back to JSON
  8. Consumer receives a JSON response with account details

Your COBOL program ACCTINQ doesn't know it was called via an API. It received a COMMAREA and returned a COMMAREA, exactly as it would if called from a 3270 screen or another CICS program. This is the power of z/OS Connect — zero changes to existing business logic.
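Step 4, building the fixed-width COMMAREA from the URL parameters, is worth seeing concretely. The sketch below is illustrative Python, not what z/OS Connect actually runs (the real mapping is declarative, driven by the service archive); it assumes a request layout of a PIC X(10) account number followed by a PIC X(01) request type, and uses code page 037 because Python ships that EBCDIC codec (SecureFirst's 1047 behaves the same for these characters).

```python
# Illustrative sketch of the JSON-to-COMMAREA mapping that z/OS Connect
# performs from the service archive. Assumed layout: PIC X(10) account
# number, then PIC X(01) request type. cp037 stands in for the shop's
# EBCDIC code page.

def build_commarea(account_number: str, request_type: str = "F") -> bytes:
    if len(account_number) > 10:
        raise ValueError("account number exceeds PIC X(10)")
    # COBOL alphanumeric fields are fixed width and space padded
    text = f"{account_number:<10}{request_type:<1.1}"
    return text.encode("cp037")

def parse_commarea(raw: bytes) -> dict:
    # Reverse mapping: fixed offsets back to named JSON fields
    text = raw.decode("cp037")
    return {"accountNumber": text[0:10].rstrip(), "requestType": text[10]}
```

`build_commarea("1234567890")` produces an 11-byte EBCDIC buffer, which is what the CICS program receives; `parse_commarea` shows the inverse mapping that would be applied to a response area.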

The API Requester Pattern

z/OS Connect isn't just about exposing mainframe services — it also lets mainframe programs consume external APIs. The API requester pattern works in reverse:

  1. Your COBOL program calls a z/OS Connect API requester service
  2. z/OS Connect maps the COBOL data structures to a JSON request
  3. z/OS Connect calls the external REST API
  4. The JSON response is mapped back to COBOL data structures
  5. Your program processes the response

This is how SecureFirst's COBOL policy-rating program calls an external weather API to factor climate risk into premium calculations. The COBOL program doesn't know it's calling a REST API — it's just calling another program through a standard interface.

Configuration Fundamentals

z/OS Connect EE is configured through its server.xml and associated configuration files. Here's a minimal configuration for the CICS service provider:

<server>
    <featureManager>
        <feature>zosconnect:zosConnect-2.0</feature>
        <feature>zosconnect:cicsSP-2.0</feature>
        <feature>appSecurity-3.0</feature>
        <feature>ssl-1.0</feature>
    </featureManager>

    <httpEndpoint id="defaultHttpEndpoint"
                  host="*"
                  httpPort="9080"
                  httpsPort="9443" />

    <zosconnect_cicsIpicConnection id="CICS_PROD"
        host="cics-prod.securefirst.com"
        port="1490"
        applid="CICSPROD" />

    <zosconnect_services>
        <weightedRoutingPolicy>
            <weightedRoute connectionRef="CICS_PROD" weight="100" />
        </weightedRoutingPolicy>
    </zosconnect_services>
</server>

The zosconnect_cicsIpicConnection element establishes an IPIC connection to your CICS region. The connection reference is then used by individual service archives to route requests to the correct CICS region. In production, you'll have multiple connections with weighted routing for workload distribution — exactly the kind of high-availability pattern you've been building in the progressive project.

Creating a Service Archive

The service archive creation process starts with your COBOL copybook. Let's say you have this copybook for an account inquiry:

       01  ACCT-INQ-REQUEST.
           05  AIR-ACCOUNT-NUMBER     PIC X(10).
           05  AIR-REQUEST-TYPE       PIC X(01).
              88  AIR-BALANCE-ONLY    VALUE 'B'.
              88  AIR-FULL-DETAIL     VALUE 'F'.

       01  ACCT-INQ-RESPONSE.
           05  AIR-RESP-CODE          PIC 9(02).
           05  AIR-ACCOUNT-DATA.
               10  AIR-ACCT-NAME      PIC X(40).
               10  AIR-ACCT-TYPE      PIC X(01).
               10  AIR-BALANCE        PIC S9(13)V99 COMP-3.
               10  AIR-LAST-ACTIVITY  PIC X(10).
               10  AIR-STATUS         PIC X(01).
           05  AIR-ERROR-MSG          PIC X(80).

Using the z/OS Connect API toolkit, you import this copybook and it generates a JSON schema mapping. The tool handles the COBOL-to-JSON type conversions:

  • PIC X(n) → JSON string (trimmed)
  • PIC 9(n) → JSON integer
  • PIC S9(n)V9(m) COMP-3 → JSON number (decimal)
  • 88-level conditions → documented as enum values
  • OCCURS clauses → JSON arrays
  • REDEFINES → requires manual mapping (the tool can't auto-resolve which redefinition to use)

The resulting .sar file bundles the mapping, the connection configuration, and metadata about the target CICS program and transaction.
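The packed-decimal conversion in that list deserves a concrete illustration. COMP-3 stores two digits per byte with the sign in the final nibble (0xC or 0xF for positive, 0xD for negative). Here is a minimal decode sketch in Python, for understanding only, since the toolkit generates this mapping for you:

```python
from decimal import Decimal

def unpack_comp3(data: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, with the sign in the
    low nibble of the last byte (0xC/0xF positive, 0xD negative)."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()                 # final nibble is the sign
    value = int("".join(str(n) for n in nibbles))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)

# A PIC S9(3)V99 COMP-3 field holding 123.45 occupies three bytes:
print(unpack_comp3(b"\x12\x34\x5c", 2))   # 123.45
print(unpack_comp3(b"\x12\x34\x5d", 2))   # -123.45
```

Note the result is a `Decimal`, not a float; the precision implications of that choice come up again in the pitfalls below.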

Interceptors and Request Processing

z/OS Connect EE supports interceptors — Java classes that execute before and after the backend call. Interceptors are where you implement cross-cutting logic that doesn't belong in the COBOL program or the service archive:

  • Logging interceptor — Capture request/response details for audit and debugging
  • Validation interceptor — Apply business rules that aren't expressible in the OpenAPI schema
  • Transformation interceptor — Modify request or response data (e.g., date format conversion, currency code lookup)
  • Circuit breaker interceptor — Track backend failures and short-circuit requests when the backend is unhealthy
  • Caching interceptor — Cache responses for idempotent GET requests to reduce backend load

SecureFirst's custom interceptor chain processes every API request in this order:

Request → Logging → Validation → [z/OS Connect Mapping] → Backend Call
Response ← Logging ← Sanitization ← [z/OS Connect Mapping] ← Backend Response

The sanitization interceptor is critical. It examines every response for patterns that look like internal error information (CICS abend codes, COBOL response codes, program names, memory offsets) and strips them before the response reaches the consumer. This is defense in depth — even if the service archive's response code mapping misses an edge case, the sanitization interceptor catches it.
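In production these interceptors are Java classes, but the sanitization logic is easy to sketch. The patterns below are hypothetical examples of what internal diagnostics might look like on the wire; a real interceptor would be driven by the shop's own message catalog:

```python
import re

# Hypothetical leak patterns, illustrative only; a real interceptor would
# be driven by the shop's message catalog, not three regexes.
LEAK_PATTERNS = [
    re.compile(r"\bDFH[A-Z0-9]{2,8}\b"),            # CICS messages, e.g. DFHAC2206
    re.compile(r"\babend\s+[A-Z0-9]{4}\b", re.I),   # abend codes, e.g. ASRA
    re.compile(r"\bEIBRESP2?\s*=\s*\d+\b"),         # raw CICS response codes
]

def sanitize(body: str) -> str:
    """Replace anything that looks like internal diagnostics before the
    response leaves the gateway."""
    for pattern in LEAK_PATTERNS:
        body = pattern.sub("[internal detail removed]", body)
    return body

print(sanitize("Inquiry failed: DFHAC2206 after abend ASRA, EIBRESP=16"))
```

The same defense-in-depth point applies to the sketch: the response code mapping should have caught these strings earlier; the sanitizer exists for the cases it misses.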

Performance Characteristics

z/OS Connect adds overhead to every API call. Understanding this overhead helps you set realistic SLA expectations:

  • JSON parsing and serialization: 2-5ms for typical payloads (under 10KB)
  • COBOL-to-JSON mapping: 1-3ms for straightforward copybook mappings
  • IPIC connection establishment: 0ms (connection pooling — the connection is already established)
  • IPIC round-trip overhead: 5-15ms (network latency between z/OS Connect and CICS)
  • SSL/TLS handshake: 0ms for persistent connections, 50-200ms for new connections
  • Interceptor chain: 1-5ms depending on interceptor count and complexity

Total z/OS Connect overhead for a typical request: 10-30ms. If your COBOL program takes 100ms to execute, the API consumer will see approximately 110-130ms end-to-end from z/OS Connect (not counting network latency between the consumer and z/OS Connect).

This is important context for SLA negotiations. If your CICS transaction takes 200ms (p50) and the consumer expects 250ms (p50) end-to-end, you have a 50ms budget for z/OS Connect overhead and network latency. That's tight but achievable if the consumer is on the same network. If the consumer is across the internet, add another 20-100ms for network latency.

Common Pitfalls

COMP-3 precision. COBOL packed decimal fields can hold more precision than IEEE 754 floating-point. A PIC S9(15)V99 COMP-3 field has 17 significant digits — more than a JSON number (which is IEEE 754 double) can represent exactly. For financial amounts, map to JSON string with a documented decimal format, not JSON number. SecureFirst learned this the hard way when a $1,234,567,890,123.45 policy value lost its last two digits in JSON serialization.
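You can demonstrate this pitfall without a mainframe. A JSON number is read as an IEEE 754 double by most consumers (JavaScript in particular), and doubles hold every integer only up to 2^53. The values below are illustrative, not SecureFirst's:

```python
import json

# 17 digits of cents fits in a PIC S9(15)V99 COMP-3 field, but exceeds
# 2**53, beyond which an IEEE 754 double cannot hold every integer.
cents = 12345678901234567
assert cents > 2**53

# A consumer that parses JSON numbers as doubles sees a rounded value:
print(int(float(cents)) == cents)                          # False

# Carried as a JSON string instead, every digit survives the round trip:
print(json.loads(json.dumps(str(cents))) == str(cents))    # True
```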

Character encoding. COBOL on z/OS uses EBCDIC. z/OS Connect handles the conversion to UTF-8 for JSON, but watch out for code pages. If your COBOL data contains characters outside the basic Latin set (accented names, special symbols), ensure your z/OS Connect configuration specifies the correct EBCDIC code page. SecureFirst uses code page 1047, which supports the full Latin-1 character set.

COMMAREA size limits. CICS COMMAREAs are limited to 32,763 bytes. If your API needs to return more data than that, use channels and containers instead. z/OS Connect supports both, but the service archive configuration is different.

COBOL FILLER fields. COBOL copybooks frequently contain FILLER fields — unnamed placeholders for alignment or future use. These should be excluded from the JSON mapping. The z/OS Connect API toolkit handles this automatically, but if you're building mappings manually, remember to skip FILLER.

REDEFINES and conditional logic. When a COBOL record uses REDEFINES, the same bytes have different interpretations depending on a condition. z/OS Connect can't automatically determine which interpretation to use. You need to configure the mapping explicitly, usually by specifying a discriminator field that determines which REDEFINES branch applies.

Null handling. COBOL doesn't have a concept of null. An empty string field is spaces (PIC X(40) filled with X'40'). An empty numeric field is zeros. When mapping to JSON, decide whether empty COBOL values should become JSON null, empty string, or zero. Document your convention and apply it consistently. SecureFirst's convention: empty alphanumeric fields map to JSON null; zero numeric fields map to JSON 0 (or "0.00" for currency).
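The convention is mechanical enough to capture in a small rule. A sketch, with hypothetical field kinds standing in for what the copybook mapping would supply:

```python
from decimal import Decimal

# Hypothetical field kinds ("alnum", "currency", "numeric"); the real
# classification comes from the copybook mapping, not strings like these.
def to_json_value(raw: str, kind: str):
    """Apply the empty-value convention: blank alphanumerics become
    JSON null; zero numerics stay 0 (or "0.00" for currency)."""
    if kind == "alnum":
        trimmed = raw.rstrip(" ")
        return trimmed if trimmed else None       # all spaces -> JSON null
    if kind == "currency":
        # raw is a digit string with an implied V99 decimal point
        return f"{Decimal(raw).scaleb(-2):.2f}"
    return int(raw)                               # plain numeric -> JSON 0

print(to_json_value(" " * 40, "alnum"))        # None
print(to_json_value("000012345", "currency"))  # 123.45
print(to_json_value("0000000", "numeric"))     # 0
```

Whatever rule you pick matters less than applying it to every field the same way; consumers code against the convention.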


21.3 OpenAPI for COBOL

OpenAPI (formerly Swagger) is the industry standard for describing RESTful APIs. When you expose COBOL services through z/OS Connect, you need OpenAPI specifications that accurately represent your mainframe data structures, business rules, and operational characteristics.

Specification Design Principles

An OpenAPI spec for a COBOL-backed service must answer questions that don't arise with typical microservices:

What are the data boundaries? COBOL fields have fixed lengths. Your API must document these limits. An account name field that's PIC X(40) in COBOL means the API accepts at most 40 characters. If a consumer sends 41, what happens? Your spec must define this — either truncation (dangerous) or rejection with a 400 error (correct).

What are the numeric ranges? PIC 9(7) means values 0000000 through 9999999. Your OpenAPI spec should use minimum and maximum to enforce this. Don't let invalid data reach your COBOL program — validate at the API layer.

What about COBOL-specific data patterns? Dates in PIC 9(8) format (YYYYMMDD), social security numbers in PIC 9(9), policy numbers with embedded check digits — these all need pattern constraints in your OpenAPI schema.

Data Mapping: COBOL Copybook to JSON Schema

The mapping from COBOL copybooks to JSON schema is mechanical but requires judgment calls. Here's a systematic approach:

Level-01 items become JSON objects. Each subordinate item becomes a property of that object.

Group items (no PIC clause) become nested JSON objects.

Elementary items map based on their PIC clause:

COBOL PIC            JSON Type                       Notes
X(n)                 string, maxLength: n            Trim trailing spaces
9(n)                 integer, maximum: 10^n - 1
S9(n)                integer                         Include negative range
S9(n)V9(m)           string (financial) or number    Use string for precision-critical values
9(n) COMP            integer                         Binary — range depends on byte length
S9(n)V9(m) COMP-3    string or number                Packed decimal — same precision concern

OCCURS clauses become JSON arrays:

       05  AIR-TRANSACTIONS.
           10  AIR-TRANS-COUNT     PIC 9(03).
           10  AIR-TRANS-ENTRY OCCURS 50 TIMES.
               15  AIR-TRANS-DATE  PIC 9(08).
               15  AIR-TRANS-AMT   PIC S9(09)V99 COMP-3.
               15  AIR-TRANS-DESC  PIC X(30).

Becomes:

transactions:
  type: object
  properties:
    transactionCount:
      type: integer
      minimum: 0
      maximum: 999
    entries:
      type: array
      maxItems: 50
      items:
        type: object
        properties:
          date:
            type: string
            pattern: '^\d{8}$'
            description: 'Date in YYYYMMDD format'
          amount:
            type: string
            pattern: '^\-?\d{1,9}\.\d{2}$'
            description: 'Transaction amount with 2 decimal places'
          description:
            type: string
            maxLength: 30

REDEFINES are the hardest mapping problem. When a COBOL record uses REDEFINES to overlay different structures on the same memory, you can't represent that directly in JSON. You have three options:

  1. Discriminated union — Use a type field to determine which interpretation to apply, and document both schemas with oneOf in OpenAPI.
  2. Separate endpoints — Create different API paths for different record interpretations.
  3. Flatten — Return all possible fields, with unused ones set to null. This is the laziest option but often the most practical.

Naming Conventions

COBOL field names like AIR-ACCT-CUST-LST-NM are not suitable for JSON APIs. Establish a naming convention and apply it consistently:

  • Convert to camelCase: accountCustomerLastName
  • Remove prefix codes: Drop the AIR- prefix that indicated the copybook
  • Expand abbreviations: CUST → customer, NM → name, NBR → number
  • Document the mapping: Maintain a data dictionary that maps every JSON field back to its COBOL source
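These rules can be automated for a first pass, with a human still reviewing the result. A sketch with a deliberately tiny abbreviation dictionary; a real one would encode the shop's full naming standard:

```python
# Deliberately tiny, illustrative abbreviation dictionary; a real one
# would encode the shop's full naming standard.
ABBREVIATIONS = {
    "ACCT": "account", "CUST": "customer", "LST": "last",
    "NM": "name", "NBR": "number", "AMT": "amount",
}

def cobol_to_json_name(field: str, prefix: str = "AIR-") -> str:
    """AIR-ACCT-CUST-LST-NM -> accountCustomerLastName"""
    if field.startswith(prefix):
        field = field[len(prefix):]          # drop the copybook prefix
    words = [ABBREVIATIONS.get(part, part.lower()) for part in field.split("-")]
    return words[0] + "".join(word.capitalize() for word in words[1:])

print(cobol_to_json_name("AIR-ACCT-CUST-LST-NM"))  # accountCustomerLastName
print(cobol_to_json_name("AIR-BALANCE"))           # balance
```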

SecureFirst maintains this mapping in a shared spreadsheet that both the mainframe team and the API consumers can reference. Carlos insists on it — when a consumer reports that accountBalance is wrong, you need to trace it back to AIR-BALANCE in ACCTINQ without ambiguity.

Here's a sample from SecureFirst's data dictionary:

JSON Field       JSON Type          COBOL Field        COBOL PIC          Copybook        Notes
policyNumber     string (12)        PIR-POLICY-NUM     X(12)              PLCY-INQ-AREA   No transformation
holderName       string (40)        PIR-HOLDER-NAME    X(40)              PLCY-INQ-AREA   Trim trailing spaces
effectiveDate    string (date)      PIR-EFF-DATE       9(08)              PLCY-INQ-AREA   YYYYMMDD → YYYY-MM-DD
monthlyPremium   string (decimal)   PIR-PREMIUM        S9(07)V99 COMP-3   PLCY-INQ-AREA   Precision-critical
coverageAmount   string (decimal)   PIR-COVERAGE-AMT   S9(09)V99 COMP-3   PLCY-INQ-AREA   Precision-critical
status           string (enum)      PIR-STATUS         X(01)              PLCY-INQ-AREA   A→ACTIVE, I→INACTIVE, C→CANCELLED

This dictionary is the single source of truth for field mappings. When a new developer joins the API team, this is the first document they read.

Handling COBOL Response Codes

COBOL programs typically return numeric response codes — 00 for success, 12 for not found, 16 for error, and so on. These internal codes should not leak into your API. Map them to proper HTTP status codes:

COBOL 00 → HTTP 200 (OK) or 201 (Created)
COBOL 04 → HTTP 200 with warnings in response body
COBOL 12 → HTTP 404 (Not Found)
COBOL 16 → HTTP 400 (Bad Request) or 500 (Internal Server Error)
COBOL 20 → HTTP 503 (Service Unavailable — backend system down)

Your z/OS Connect service archive configuration can include response code mapping rules. But get this right at design time — changing HTTP status codes after consumers have coded against them is a breaking change.
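In z/OS Connect this mapping is declarative in the service archive, but its logic amounts to a lookup table. A sketch of the equivalent rule, using the code values from the mapping above (the function itself is illustrative):

```python
# COBOL response code -> HTTP status, per the mapping above.
RESP_CODE_MAP = {
    "00": 200,   # OK (201 Created for resource-creating operations)
    "04": 200,   # success, with warnings carried in the response body
    "12": 404,   # Not Found
    "16": 400,   # Bad Request (500 when the fault is server-side)
    "20": 503,   # Service Unavailable: backend system down
}

def http_status_for(cobol_resp_code: str) -> int:
    # Unknown codes map to 500: never leak a raw COBOL code to consumers
    return RESP_CODE_MAP.get(cobol_resp_code, 500)

print(http_status_for("12"))  # 404
print(http_status_for("99"))  # 500, an unmapped internal code
```

The default branch matters as much as the table: an unmapped code is an internal detail, and surfacing it to consumers is exactly the leak the sanitization interceptor guards against.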

The OpenAPI Specification in Practice

A complete OpenAPI spec for a COBOL-backed account inquiry service looks like the example in code/example-02-openapi-spec.yaml. The key sections are:

  • info — Version, description, contact for the mainframe team
  • servers — Your z/OS Connect endpoints (dev, test, production)
  • paths — The API operations, mapped to COBOL programs
  • components/schemas — JSON schemas derived from COBOL copybooks
  • security — Authentication requirements (OAuth2, API key)

The spec is the contract. Everything else — z/OS Connect configuration, CICS program behavior, network routing — is implementation detail that consumers shouldn't need to know.

Contract-First vs. Code-First

There are two approaches to creating the OpenAPI spec:

Code-first: Build the z/OS Connect service archive from the COBOL copybook, let z/OS Connect generate the OpenAPI spec, then clean it up. This is faster for the first iteration but produces specifications that look machine-generated — verbose, poorly documented, with COBOL-influenced naming.

Contract-first: Write the OpenAPI spec by hand (or using a design tool), get it reviewed and approved, then configure z/OS Connect to match the spec. This produces better API designs but takes longer.

Yuki mandates contract-first for all SecureFirst APIs. "The API spec is a product design document," she says. "You don't let the database schema design your user interface. You shouldn't let the COBOL copybook design your API."

In practice, the process is hybrid: use the z/OS Connect API toolkit to generate an initial mapping from the copybook, then redesign the API spec from the consumer's perspective, then reconfigure the service archive to match the redesigned spec.

Date and Time Handling

COBOL date formats are a notorious source of bugs. Your APIs must handle date conversion reliably.

Common COBOL date formats you'll encounter:

COBOL Format        PIC      Example Value     ISO 8601
YYYYMMDD            9(08)    20250115          2025-01-15
YYMMDD              9(06)    250115            2025-01-15 (century assumption!)
YYYYDDD (Julian)    9(07)    2025015           2025-01-15 (day 15 of 2025)
MM/DD/YYYY          X(10)    01/15/2025        2025-01-15
YYYYMMDDHHMMSS      9(14)    20250115142345    2025-01-15T14:23:45Z

Your API should always return ISO 8601 dates. The conversion happens in the z/OS Connect service archive or in a custom interceptor. Be especially careful with:

  • Two-digit years (PIC 9(06)) — You need a windowing rule. SecureFirst treats 00-49 as 20xx and 50-99 as 19xx.
  • Julian dates — Day-of-year requires leap year awareness. Day 60 is March 1 in a common year but February 29 in a leap year.
  • Timestamps without timezone — COBOL timestamps are typically local time with no timezone indicator. Your API must define the timezone. SecureFirst standardizes on UTC for all API timestamps.
  • Timestamps stored across multiple fields — Some COBOL programs store date in one field and time in another. The API should combine them into a single ISO 8601 datetime.
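The conversions for these formats, including the windowing rule and Julian day arithmetic, can be sketched in a few lines of Python (the 00-49/50-99 boundary is SecureFirst's choice; yours may differ):

```python
from datetime import date, timedelta

def yyyymmdd_to_iso(s: str) -> str:
    return f"{s[0:4]}-{s[4:6]}-{s[6:8]}"

def yymmdd_to_iso(s: str) -> str:
    # SecureFirst's windowing rule: 00-49 -> 20xx, 50-99 -> 19xx
    century = "20" if int(s[0:2]) <= 49 else "19"
    return yyyymmdd_to_iso(century + s)

def julian_to_iso(s: str) -> str:
    # YYYYDDD: date arithmetic handles leap years for us
    d = date(int(s[0:4]), 1, 1) + timedelta(days=int(s[4:7]) - 1)
    return d.isoformat()

print(yyyymmdd_to_iso("20250115"))  # 2025-01-15
print(julian_to_iso("2025060"))     # 2025-03-01 (day 60, common year)
print(julian_to_iso("2024060"))     # 2024-02-29 (day 60, leap year)
```

In practice these conversions live in the service archive mapping or a transformation interceptor, not in consumer code; the point is that every one of them is deterministic once the convention is written down.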

21.4 API Mediation Layer

z/OS Connect gives you API exposure. An API mediation layer gives you API management. These are different concerns, and confusing them is a common mistake.

z/OS Connect translates between REST/JSON and COBOL/COMMAREA. An API mediation layer sits in front of z/OS Connect and handles cross-cutting concerns: routing, load balancing, authentication, rate limiting, monitoring, and service discovery.

Zowe API Mediation Layer

Zowe is the open-source framework for z/OS modernization. Its API Mediation Layer (API ML) provides three core components:

API Gateway — A single entry point for all mainframe APIs. Consumers call the gateway; the gateway routes to the correct backend service. This means consumers don't need to know the hostname and port of every z/OS Connect instance — they just know the gateway.

Consumer → API Gateway (port 7554) → z/OS Connect (port 9443) → CICS

Discovery Service — A Eureka-based service registry. When a z/OS Connect instance starts, it registers itself with the discovery service. When it stops, it deregisters. The API gateway queries the discovery service to find available instances.

API Catalog — A developer portal that aggregates OpenAPI specifications from all registered services. Developers browse the catalog to find available APIs, read documentation, and try API calls interactively.

Gateway Patterns

The API gateway pattern is not unique to mainframe — it's standard in microservices architecture. But mainframe APIs have characteristics that affect how you configure the gateway:

Sticky sessions. Some CICS transactions maintain conversational state across multiple calls. If your API represents a multi-step transaction (start → update → commit), all calls in the sequence must route to the same CICS region. Configure session affinity in the gateway.

Backend health checks. The gateway needs to know if a z/OS Connect instance is healthy. But "healthy" for a mainframe API means more than "the process is running" — it means the CICS region is up, the IPIC connection is active, and the backend program is available. z/OS Connect exposes a health endpoint that checks these dependencies; configure the gateway to use it.

Timeout management. Mainframe transactions have different performance characteristics than microservices. A COBOL program that reads a VSAM file and does complex calculations might take 200ms — fast by mainframe standards, slow by microservices standards. Set gateway timeouts appropriately. SecureFirst uses 5 seconds for inquiry APIs, 15 seconds for update APIs, and 60 seconds for complex calculation APIs (like premium rating).

Service Discovery and Registration

Each z/OS Connect instance registers with the Zowe discovery service using a YAML configuration:

discoveryService:
  enabled: true
  urls: https://discovery.securefirst.com:7553/eureka
  serviceId: POLICY-API
  title: "SecureFirst Policy Administration API"
  description: "COBOL-backed policy lifecycle operations"
  routes:
    - gatewayUrl: "api/v1"
      serviceUrl: /policy
  apiInfo:
    - apiId: com.securefirst.policy
      gatewayUrl: "api/v1"
      swaggerUrl: https://zosconnect1.securefirst.com:9443/policy/api-docs
  healthCheckUrl: https://zosconnect1.securefirst.com:9443/health
  instanceBaseUrl: https://zosconnect1.securefirst.com:9443

When you have multiple z/OS Connect instances (and in production, you will — at minimum two for availability), each registers separately. The discovery service tracks them all, and the gateway load-balances across healthy instances.

IBM API Connect as an Alternative

Some shops use IBM API Connect instead of Zowe API ML. API Connect is a commercial product with more features — a full developer portal, subscription management, billing integration, advanced analytics, and a visual API design tool.

The key difference: Zowe API ML is open source and z/OS-native. API Connect is a broader API management platform that can manage mainframe and non-mainframe APIs from a single control plane. If your enterprise already uses API Connect for cloud APIs, extending it to cover mainframe APIs creates a unified management experience.

SecureFirst uses both. Zowe API ML handles the z/OS-level routing and discovery. API Connect sits above it as the enterprise-wide API management layer, providing the developer portal and analytics that the business stakeholders want.

Gateway Topology for High Availability

For the HA banking system you're building in the progressive project, the gateway topology matters:

                   ┌─────────────┐
                   │  External   │
                   │  Load       │
                   │  Balancer   │
                   └──────┬──────┘
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ Gateway  │ │ Gateway  │ │ Gateway  │
        │ Node 1   │ │ Node 2   │ │ Node 3   │
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │            │            │
             └────────────┼────────────┘
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ z/OS     │ │ z/OS     │ │ z/OS     │
        │ Connect  │ │ Connect  │ │ Connect  │
        │ LPAR A   │ │ LPAR B   │ │ LPAR C   │
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │            │            │
             ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ CICS     │ │ CICS     │ │ CICS     │
        │ Region A │ │ Region B │ │ Region C │
        └──────────┘ └──────────┘ └──────────┘

Three gateway nodes behind an external load balancer. Three z/OS Connect instances, one per LPAR, each connected to the CICS region on its LPAR. The discovery service tracks all three z/OS Connect instances. If LPAR B goes down for maintenance, its z/OS Connect instance deregisters, and the gateways route traffic to LPARs A and C.

This is the topology you'll implement for the banking system's API layer in the project checkpoint.

Handling Backend Failures

What happens when a CICS region goes down? The gateway's behavior depends on your configuration:

Without circuit breaker: The gateway sends the request to z/OS Connect, which tries to invoke the CICS program, which times out after the configured IPIC timeout (typically 5-10 seconds). The consumer waits for the timeout, then gets a 503 error. If traffic is heavy, requests queue up, and the gateway becomes a bottleneck.

With circuit breaker: After detecting N consecutive failures (configurable, typically 3-5), the circuit breaker opens. Subsequent requests to the failed backend are immediately rejected with 503, without waiting for a timeout. This protects both the consumer (fast failure) and the gateway (no queued requests).

The circuit breaker has three states:

  1. Closed (normal) — Requests flow through. Failures are counted.
  2. Open (tripped) — Requests are immediately rejected. After a cooldown period (typically 30-60 seconds), the circuit moves to half-open.
  3. Half-open (probing) — A single request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit reopens.

Configure circuit breakers at two levels: in the API gateway (for z/OS Connect instance failures) and in z/OS Connect (for CICS region failures). This way, if one CICS region goes down but the z/OS Connect instance is still running, traffic routes to other CICS regions without taking the entire z/OS Connect instance out of rotation.
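The three-state behavior described above can be sketched in a few lines of Python. This is an illustrative model, not gateway source code, and the thresholds (3 failures, 30-second cooldown) are the typical values quoted in the text, not product defaults:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: opens after `max_failures`
    consecutive failures, probes again after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock            # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, backend):
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown:
                # Fast failure: no backend call, no timeout wait.
                raise RuntimeError("503: circuit open, request rejected")
            self.state = "half-open"  # cooldown elapsed: allow one probe
        try:
            result = backend()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.state = "closed"         # any success closes the circuit
        self.failures = 0
        return result
```

Three consecutive backend failures trip the breaker; the next caller fails immediately instead of waiting out an IPIC timeout, and after the cooldown a single successful probe closes the circuit again.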

Cross-Sysplex Routing

In a Parallel Sysplex environment, your z/OS Connect instances on different LPARs can all route to a shared CICS region using cross-sysplex IPIC connections. This gives you maximum flexibility — any z/OS Connect instance can reach any CICS region — but it adds network hops and complexity.

The simpler approach, which SecureFirst uses, is affinity-based: each z/OS Connect instance preferentially routes to the CICS region on its own LPAR. Cross-sysplex routing is the fallback when the local CICS region is unavailable. This minimizes network latency for normal operations while maintaining failover capability.


21.5 API Management

Having APIs is not the same as managing APIs. API management covers the lifecycle concerns that determine whether your APIs are usable, reliable, and sustainable.

API Versioning

COBOL programs change. Copybooks get new fields. Business rules evolve. When these changes affect your API's contract, you need versioning.

Three versioning strategies, in order of recommendation:

URL path versioning — /api/v1/accounts, /api/v2/accounts. This is the clearest approach. Each version has its own OpenAPI spec, its own z/OS Connect service archive, and potentially its own COBOL program (or the same program with version-aware logic).

Header versioning — Accept: application/vnd.securefirst.v2+json. More "RESTful" but harder for consumers to implement and harder for operations to monitor and route.

Query parameter versioning — /api/accounts?version=2. Functional but cluttered. Avoid.

SecureFirst uses URL path versioning. When they added real-time premium calculation to the policy API (which required a new response format), they created /api/v2/policies/{id}/premium while keeping /api/v1/policies/{id}/premium operational for existing consumers. The v1 endpoint is scheduled for deprecation in Q3 2025 — 12 months after v2 launched.

Versioning rules at SecureFirst:

  • Additive changes (new optional fields) don't require a new version
  • Removing fields, changing types, or changing behavior requires a new version
  • Maximum two active versions at any time
  • Deprecated versions get 12 months of support
  • Version sunset dates are published in the API catalog

Rate Limiting

Mainframe capacity is not infinite, and it's expensive. A single z15 LPAR running CICS can handle thousands of transactions per second, but those transactions share capacity with batch jobs, DB2 queries, and other workloads. Uncontrolled API traffic can starve other work.

Rate limiting at the API gateway prevents this:

rateLimiting:
  policies:
    - name: standard
      requests: 100
      interval: 60  # seconds
      burstLimit: 150
    - name: premium
      requests: 1000
      interval: 60
      burstLimit: 1500
    - name: internal
      requests: 5000
      interval: 60
      burstLimit: 7500

Assign rate limit policies based on the consumer:

  • Standard — External partners get 100 requests/minute. Enough for normal usage, not enough to impact mainframe capacity.
  • Premium — Key partners (or internal high-volume services) get 1,000 requests/minute with a signed SLA.
  • Internal — Internal services get 5,000 requests/minute, with the understanding that capacity planning covers the load.

When a consumer exceeds the rate limit, the gateway returns 429 Too Many Requests with a Retry-After header. The consumer's data doesn't reach the mainframe, and mainframe capacity is protected.
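The token-bucket algorithm commonly used for this kind of policy can be sketched as follows. This is a model of the behavior, not gateway code; the numbers mirror the "standard" tier above (100 requests per 60 seconds, burst 150):

```python
class TokenBucket:
    """Per-consumer rate limiting sketch: sustained rate plus a burst
    allowance, returning 429 with a Retry-After hint when exhausted."""

    def __init__(self, requests=100, interval=60.0, burst_limit=150):
        self.rate = requests / interval   # tokens added per second
        self.capacity = burst_limit
        self.tokens = float(burst_limit)
        self.last = 0.0

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst limit.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200, None
        # Not enough tokens: reject with seconds until the next token.
        retry_after = (1.0 - self.tokens) / self.rate
        return 429, retry_after
```

A burst of 150 calls succeeds immediately; the 151st is rejected with a Retry-After value, and after roughly a second of refill the consumer can proceed again at the sustained rate.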

Dynamic rate limiting is the next evolution. Instead of fixed limits, adjust the rate based on real-time mainframe utilization. If the z/OS LPAR is at 85% CPU utilization (from RMF data), reduce external API rate limits by 50%. If utilization drops below 60%, restore full limits. SecureFirst is piloting this using a custom z/OS Connect interceptor that queries RMF data.
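The adjustment rule just described (halve above 85% CPU, restore below 60%) can be expressed as a small function. This is a sketch of the rule in the text, not SecureFirst's interceptor; the hysteresis band between 60% and 85% is an assumption to keep limits from flapping between RMF samples:

```python
def adjusted_limit(base_limit, cpu_pct, current_limit=None):
    """Dynamic rate-limit rule: halve external limits when LPAR CPU
    is at or above 85%, restore the full limit below 60%, and hold
    the current limit in between (hysteresis)."""
    current = base_limit if current_limit is None else current_limit
    if cpu_pct >= 85.0:
        return base_limit // 2
    if cpu_pct < 60.0:
        return base_limit
    return current
```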

API Analytics

You can't manage what you can't measure. API analytics answer critical questions:

  • Which APIs are being used? By whom? How often?
  • What's the average response time? 95th percentile? 99th?
  • Which consumers are approaching their rate limits?
  • Which APIs have error rates above threshold?
  • How is usage trending month over month?

The Zowe API ML gateway produces access logs that can be forwarded to your analytics platform (Splunk, Elastic, Datadog). Each log entry includes:

{
  "timestamp": "2025-01-15T14:23:45.123Z",
  "method": "GET",
  "path": "/api/v1/accounts/1234567890",
  "statusCode": 200,
  "responseTimeMs": 187,
  "consumerId": "mobile-app-prod",
  "gatewayNode": "gw-node-2",
  "backendService": "ACCT-API",
  "backendInstance": "zosconnect-lpar-a",
  "requestSizeBytes": 0,
  "responseSizeBytes": 1247,
  "rateLimitRemaining": 87
}

SecureFirst dashboards show:

  • Real-time request volume by API and consumer
  • Response time heatmaps (time of day vs. API vs. latency)
  • Error rate trending with anomaly detection
  • Capacity utilization correlation (API volume vs. LPAR CPU)

These dashboards saved them during a mobile app launch when request volume spiked to 10x normal. They caught it within minutes, increased rate limits for the mobile app consumer, and provisioned additional z/OS Connect capacity — before any user experienced an error.
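Percentile metrics such as p95 and p99 latency are derived from the responseTimeMs field in access-log entries like the one above. A minimal sketch, using the nearest-rank method over an illustrative sample:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile over a list of latency samples,
    e.g. responseTimeMs values from gateway access logs."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Illustrative responseTimeMs values pulled from access-log entries
latencies_ms = [95, 187, 120, 210, 88, 1540, 160, 175, 99, 143]
p50 = percentile(latencies_ms, 50)   # median latency
p99 = percentile(latencies_ms, 99)   # tail latency
```

Note how a single slow outlier (1,540ms here) dominates the p99 while leaving the median untouched; this is why SLAs specify both.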

API Analytics for Capacity Planning

Beyond operational monitoring, API analytics drive capacity planning decisions. When SecureFirst's API traffic grew 300% in 12 months (from the mobile app adoption), they needed to answer: "When will we need more mainframe capacity?"

The answer came from correlating API call volume with mainframe CPU utilization (from RMF data). They built a linear regression model: each additional 1,000 API calls per minute consumed approximately 2.3% of one LPAR's CPU capacity. At current growth rates, they'd exhaust their API-allocated capacity (30% of one LPAR) in 8 months. This gave them time to plan a capacity upgrade — well before users noticed degradation.

The data also revealed optimization opportunities. The account balance API (ACCTBAL) was called 3x more often than the full account inquiry (ACCTINQ), but both used the same CICS program — ACCTINQ with a flag to return only the balance fields. Carlos's team created a dedicated ACCTBAL program that read only the balance record (not the full account record), reducing the average response time from 85ms to 35ms and the CPU cost per call by 60%.

Without API analytics, they never would have known that the balance check was the dominant workload or that a targeted optimization could free up significant capacity.
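The extrapolation SecureFirst did can be sketched in a few lines. The 2.3% CPU per 1,000 calls/minute slope and the 30% capacity budget come from the numbers above; the starting volume and growth rate below are stand-ins for illustration, not figures from the text:

```python
def months_until_exhausted(calls_per_min, monthly_growth,
                           cpu_pct_per_1k_calls=2.3, budget_pct=30.0):
    """Project when API traffic consumes the allocated CPU budget,
    assuming compound monthly growth in call volume."""
    months = 0
    calls = calls_per_min
    while (calls / 1000.0) * cpu_pct_per_1k_calls < budget_pct:
        calls *= 1.0 + monthly_growth
        months += 1
        if months > 120:
            return None   # budget not reached within 10 years
    return months

# Hypothetical: 6,000 calls/min today, growing 10% per month
runway = months_until_exhausted(6000, 0.10)
```

The budget here is exhausted at roughly 13,000 calls per minute (30 / 2.3 × 1,000), so the model gives the operations team a concrete planning horizon rather than a vague "sometime soon."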

Developer Portal

The developer portal is where API consumers discover, learn about, and subscribe to your APIs. A good developer portal includes:

  • API catalog — Browseable list of all available APIs with descriptions
  • Interactive documentation — Try-it-now functionality (Swagger UI) that lets developers test API calls
  • Getting started guides — Step-by-step instructions for authentication, first API call, error handling
  • SDK downloads — Pre-generated client libraries in popular languages
  • Subscription management — Self-service API key provisioning and rate limit monitoring
  • Changelog — Version history, deprecation notices, migration guides
  • Status page — Current availability and incident history

The API catalog in Zowe API ML provides basic functionality. For a full developer portal, SecureFirst uses IBM API Connect's developer portal, which adds subscription management, analytics dashboards for consumers, and community features (forums, FAQs).


21.6 Security for Mainframe APIs

Security for mainframe APIs involves two worlds: the modern API security stack (OAuth2, JWT, API keys) and the mainframe security subsystem (RACF, ACF2, Top Secret). Your job is to bridge them seamlessly.

Authentication Patterns

OAuth 2.0 is the standard for API authentication. The flow for a mainframe API:

  1. Consumer authenticates with your OAuth authorization server (could be IBM DataPower, Okta, Azure AD, or any OIDC-compliant provider)
  2. Authorization server issues an access token (JWT)
  3. Consumer includes the token in the API request: Authorization: Bearer eyJhbG...
  4. API gateway validates the token (signature, expiration, audience)
  5. Gateway extracts the identity and permissions from the token claims
  6. Gateway forwards the request to z/OS Connect with the identity context
  7. z/OS Connect maps the identity to a RACF user ID
  8. CICS executes the program under that RACF user ID, subject to normal RACF authorization

The critical step is #7 — identity mapping. z/OS Connect's SAFCredentialMapper maps an OAuth identity to a RACF user ID. You configure this mapping based on your security requirements:

  • One-to-one mapping — Each OAuth client ID maps to a specific RACF user ID. Most granular but most administrative overhead.
  • Role-based mapping — OAuth token claims include a role (e.g., read-only, read-write, admin). Each role maps to a RACF user ID with appropriate permissions.
  • Client certificate mapping — For system-to-system calls, the client's TLS certificate maps to a RACF user ID.

API keys are simpler but less secure. They're appropriate for low-risk, read-only APIs where you need to identify the consumer but don't need user-level authorization. The API key is sent in a header (X-API-Key: abc123) and the gateway validates it against a registry. API keys should not be used for APIs that modify data or access sensitive information.

Mutual TLS (mTLS) provides the strongest authentication for system-to-system calls. Both the client and the server present certificates. The API gateway validates the client certificate against a trust store, and the certificate's Distinguished Name (DN) maps to a RACF identity.

JWT Token Validation

When the gateway validates a JWT, it checks:

1. Signature — Is the token signed by a trusted issuer?
   → Verify against the issuer's public key (JWKS endpoint)

2. Expiration — Is the token still valid?
   → Check 'exp' claim against current time (with clock skew tolerance)

3. Audience — Is the token intended for this API?
   → Check 'aud' claim matches your API's audience identifier

4. Issuer — Was the token issued by a trusted authorization server?
   → Check 'iss' claim against whitelist

5. Scopes — Does the token grant permission for this operation?
   → Check 'scope' claim includes required scope (e.g., 'accounts:read')

If any check fails, the gateway returns 401 Unauthorized and the request never reaches the mainframe. This is defense in depth — even if someone breaches the network layer, they can't call your APIs without a valid token.
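Checks 2 through 5 amount to simple claim comparisons once the payload is decoded. The sketch below shows them with the standard library only; signature verification (check 1) is deliberately omitted — in a real gateway you would use a JWT library verifying against the issuer's JWKS keys, never skip it:

```python
import base64, json, time

def validate_claims(token, audience, trusted_issuers, required_scope,
                    now=None, skew=60):
    """Sketch of JWT claim checks 2-5: expiration (with clock-skew
    tolerance), audience, issuer, and scope. Signature verification
    is NOT performed here and must never be skipped in production."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    if claims["exp"] < now - skew:
        return False, "token expired"
    if claims["aud"] != audience:
        return False, "wrong audience"
    if claims["iss"] not in trusted_issuers:
        return False, "untrusted issuer"
    if required_scope not in claims.get("scope", "").split():
        return False, "missing scope"
    return True, "ok"
```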

RACF Integration

On the mainframe side, RACF (Resource Access Control Facility) is the ultimate authority. When a request reaches CICS, it executes under a RACF user ID that determines:

  • Which CICS transactions the user can execute
  • Which CICS resources (files, queues, programs) the user can access
  • Which DB2 tables the user can query or modify
  • Which MQ queues the user can read from or write to

For API access, create dedicated RACF user IDs that represent API consumers or roles, not individual human users. These user IDs should have the minimum permissions required for the API operations they support.

RACF User ID: APIREAD
  - CICS Transaction: ACIQ (Account Inquiry) — READ
  - VSAM File: ACCTMAST — READ
  - DB2: SELECT on ACCOUNT table

RACF User ID: APIWRITE
  - Everything APIREAD has, plus:
  - CICS Transaction: ACUP (Account Update) — UPDATE
  - VSAM File: ACCTMAST — UPDATE
  - DB2: SELECT, INSERT, UPDATE on ACCOUNT table

RACF User ID: APIADMIN
  - Everything APIWRITE has, plus:
  - CICS Transaction: ACAD (Account Admin) — ALL
  - DB2: ALL on ACCOUNT, AUDIT tables

API Security Checklist

Every mainframe API should meet these security requirements before going live:

  • [ ] TLS 1.2 or higher for all connections (no plain HTTP, ever)
  • [ ] OAuth 2.0 or mTLS authentication (API keys only for low-risk read-only)
  • [ ] JWT token validation with all five checks (signature, expiration, audience, issuer, scopes)
  • [ ] RACF user ID mapping with least-privilege permissions
  • [ ] Rate limiting configured and tested
  • [ ] Input validation at the API layer (don't rely on COBOL program validation alone)
  • [ ] Audit logging of all API calls (who, what, when, from where)
  • [ ] Error responses don't leak internal details (no COBOL abend codes in HTTP responses)
  • [ ] API keys and credentials rotated on schedule
  • [ ] Penetration testing completed by security team

At SecureFirst, Yuki mandates this checklist for every API before it's registered in the production API catalog. Carlos's team runs automated security scans (OWASP ZAP) against every API in the staging environment as part of the CI/CD pipeline.

Error Response Sanitization

This deserves special emphasis. When your COBOL program abends with an ASRA (program check) or returns an unexpected response code, the API layer must not expose this to the consumer. An error response that says "CICS ABEND ASRA IN PROGRAM ACCTINQ AT OFFSET X'00001A4C'" is a security vulnerability — it tells an attacker the program name, the error type, and the memory offset.

Instead, z/OS Connect should catch these errors and return a generic response:

{
  "status": 500,
  "error": "Internal Server Error",
  "message": "The service is temporarily unavailable. Please retry.",
  "correlationId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

The correlationId lets your operations team trace the error in the z/OS Connect logs and the CICS system dump without exposing internals to the consumer.
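The sanitization step can be sketched as a small function: log everything internally under a generated correlation ID, return nothing internal to the consumer. This is an illustrative model of the pattern, not z/OS Connect configuration:

```python
import logging, uuid

def sanitize_backend_error(raw_error):
    """Log the raw backend error (e.g. a CICS abend message) server-side
    under a correlation ID, and return only a generic response body."""
    correlation_id = str(uuid.uuid4())
    # Full details go to the server-side log, never to the consumer.
    logging.error("backend failure [%s]: %s", correlation_id, raw_error)
    return {
        "status": 500,
        "error": "Internal Server Error",
        "message": "The service is temporarily unavailable. Please retry.",
        "correlationId": correlation_id,
    }
```

Operations can grep the server logs for the correlation ID the consumer reports; the consumer never sees the abend code, program name, or offset.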

Input Validation at the API Layer

Don't rely on your COBOL program to validate input. Validate at the API layer and reject bad requests before they reach the mainframe.

Why? Three reasons:

  1. Mainframe CPU costs money. Every invalid request that reaches CICS consumes CPU cycles. If an attacker sends 10,000 requests with invalid account numbers, you don't want your COBOL program processing (and rejecting) each one.

  2. Better error messages. The API layer can return precise JSON error responses ("Account ID must be exactly 10 digits"). The COBOL program returns a response code (16) that the API layer has to interpret.

  3. Defense in depth. The COBOL program should still validate input (it always has), but the API layer is an additional validation boundary. This protects against mapping errors — if the z/OS Connect mapping accidentally passes an invalid value, the COBOL program catches it.

Your OpenAPI schema provides the first level of validation — the gateway rejects requests that don't match the schema (wrong types, missing required fields, values outside declared ranges). Custom validation interceptors in z/OS Connect provide the second level — business rules that can't be expressed in the schema (e.g., "source and destination accounts cannot be the same").
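Both validation levels can be sketched together for a transfer request. The field names (sourceAccount, targetAccount, amount) are hypothetical; the 10-digit account rule and the "source and destination cannot be the same" business rule come from the examples above:

```python
import re

ACCOUNT_ID = re.compile(r"^\d{10}$")   # schema-level: exactly 10 digits

def validate_transfer(body):
    """API-layer validation sketch: schema-style checks first
    (required fields, formats, types), then a business rule the
    OpenAPI schema cannot express."""
    errors = []
    for field in ("sourceAccount", "targetAccount", "amount"):
        if field not in body:
            errors.append(f"{field} is required")
    if not errors:
        for field in ("sourceAccount", "targetAccount"):
            if not ACCOUNT_ID.match(str(body[field])):
                errors.append(f"{field} must be exactly 10 digits")
        if not isinstance(body["amount"], (int, float)) or body["amount"] <= 0:
            errors.append("amount must be a positive number")
        if body["sourceAccount"] == body["targetAccount"]:
            errors.append("source and destination accounts cannot be the same")
    return (400, errors) if errors else (200, [])
```

Invalid requests are rejected with precise messages at the gateway, and only well-formed requests spend mainframe CPU.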

Transport Security in Detail

All mainframe API traffic must use TLS 1.2 or higher. This is non-negotiable. Here's the complete transport security configuration:

TLS protocol versions: TLS 1.2 minimum. TLS 1.3 preferred if both client and server support it. TLS 1.0 and 1.1 are disabled — they have known vulnerabilities.

Cipher suites: Use only strong cipher suites. SecureFirst's allowed list:

  • TLS_AES_256_GCM_SHA384 (TLS 1.3)
  • TLS_AES_128_GCM_SHA256 (TLS 1.3)
  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (TLS 1.2)
  • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (TLS 1.2)

Disable all CBC-mode cipher suites (vulnerable to BEAST and Lucky13 attacks) and all cipher suites using RSA key exchange (no forward secrecy).

Certificate management: Server certificates should use 2048-bit RSA or 256-bit ECDSA keys, signed by a trusted CA. Certificate rotation should be automated — at SecureFirst, certificates are renewed 30 days before expiry, and the API monitoring dashboard alerts at 60 days and 30 days remaining.

HSTS (HTTP Strict Transport Security): The API gateway returns the Strict-Transport-Security header to prevent protocol downgrade attacks:

Strict-Transport-Security: max-age=31536000; includeSubDomains

21.7 API Lifecycle and Governance

APIs aren't fire-and-forget. They have a lifecycle — design, build, test, deploy, operate, version, deprecate, retire — and each stage needs governance.

The API Lifecycle

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  DESIGN  │ →  │  BUILD   │ →  │   TEST   │ →  │  DEPLOY  │
│          │    │          │    │          │    │          │
│ OpenAPI  │    │ z/OS     │    │ Contract │    │ Gateway  │
│ spec     │    │ Connect  │    │ tests    │    │ registr. │
│ review   │    │ config   │    │ Security │    │ Rate     │
│          │    │ COBOL    │    │ scan     │    │ limits   │
│          │    │ changes  │    │ Load     │    │ Catalog  │
│          │    │ (if any) │    │ test     │    │ publish  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
      ↑                                              │
      │                                              ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  RETIRE  │ ←  │DEPRECATE │ ←  │ VERSION  │ ←  │ OPERATE  │
│          │    │          │    │          │    │          │
│ Remove   │    │ Sunset   │    │ New      │    │ Monitor  │
│ from     │    │ notice   │    │ version  │    │ SLA      │
│ gateway  │    │ Consumer │    │ Old ver  │    │ Incident │
│ Archive  │    │ migration│    │ deprec.  │    │ mgmt     │
│ docs     │    │ guide    │    │ plan     │    │ Capacity │
└──────────┘    └──────────┘    └──────────┘    └──────────┘

Governance Framework

At SecureFirst, every API goes through a governance process managed by their API Center of Excellence (CoE):

Design review — Before any code is written, the OpenAPI spec is reviewed by:

  • A mainframe architect (does the data mapping make sense?)
  • A security architect (are the security requirements met?)
  • An API design reviewer (does it follow naming conventions and REST best practices?)
  • A representative consumer (does it meet their needs?)

Contract-first testing — Tests are written against the OpenAPI spec before the service is built. This ensures the spec is complete and unambiguous. Tools like Dredd or Schemathesis generate test cases from the spec automatically.

Change management — Any change to a production API goes through a formal change request:

  1. Document the change and its impact
  2. Update the OpenAPI spec
  3. Assess whether it's a breaking change (new version required) or additive (same version)
  4. Update all affected service archives
  5. Test in staging with real consumer traffic (shadow testing)
  6. Deploy with canary release (10% of traffic, then 50%, then 100%)
  7. Notify affected consumers

SLA management — Each API has a defined SLA:

Metric                 Target      Measurement
Availability           99.95%      Monthly uptime
Response time (p50)    < 200ms     Median latency
Response time (p99)    < 2,000ms   Tail latency
Error rate             < 0.1%      5xx responses / total
Time to first byte     < 100ms     Excludes network transit

SLA violations trigger an incident review. If the issue is mainframe performance, the CICS systems programmers investigate. If it's gateway configuration, the API team handles it. The SLA keeps everyone accountable.

API Documentation Standards

Documentation is part of governance. Every API in SecureFirst's catalog includes:

  • OpenAPI spec (generated, always current)
  • Getting started guide (written by the API team, reviewed by a consumer)
  • Authentication guide (how to get tokens, what scopes to request)
  • Error handling guide (every error response documented with resolution steps)
  • Data dictionary (every field explained, with business context)
  • Rate limit documentation (limits, burst allowances, what to do when throttled)
  • Changelog (every version, every change, every deprecation)
  • Migration guide (when a new version ships, how to upgrade)

Deprecation and Retirement

When you need to retire an API version:

  1. Announce — Add a Deprecation header to all responses: Deprecation: true. Add a Sunset header with the retirement date: Sunset: Sat, 01 Mar 2026 00:00:00 GMT.
  2. Document — Publish a migration guide showing consumers exactly how to update their code.
  3. Monitor — Track which consumers are still using the deprecated version. Contact them directly.
  4. Grace period — Give consumers at least 6 months (12 for external partners) to migrate.
  5. Retire — Remove the API from the gateway. Old endpoints return 410 Gone.
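On the consumer side, well-behaved clients watch for the Deprecation and Sunset headers from step 1. A minimal sketch of that check, using only the standard library (the function name is illustrative):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def deprecation_warning(headers, now=None):
    """Inspect response headers for Deprecation/Sunset (as announced
    in step 1 above) and produce a warning for the consumer's logs."""
    if headers.get("Deprecation", "").lower() != "true":
        return None
    sunset = headers.get("Sunset")
    if not sunset:
        return "API version deprecated; no sunset date announced"
    sunset_dt = parsedate_to_datetime(sunset)   # HTTP-date format
    now = now or datetime.now(timezone.utc)
    days = (sunset_dt - now).days
    return f"API version deprecated; retires in {days} days"
```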

Sandra's team at Federal Benefits learned this lesson the hard way. They retired an API version with only 30 days' notice and broke three inter-agency integrations. Now they enforce a minimum 6-month grace period, no exceptions.

API as a Product

The most mature API shops treat APIs as products, not projects. This means:

  • Product owner — Someone is responsible for the API's success, not just its implementation
  • Roadmap — The API has a published roadmap with planned features and deprecations
  • User research — The API team talks to consumers to understand their needs
  • Developer experience — The API is designed for ease of use, not just functionality
  • Metrics — Success is measured by adoption, consumer satisfaction, and business value — not just uptime

SecureFirst's policy API has a product owner (Carlos), a published roadmap, and quarterly feedback sessions with key consumers. When the mobile app team needed batch policy lookup (multiple policies in one call), they submitted a feature request through the API product process. Carlos prioritized it against other requests, designed it with consumer input, and delivered it in the next quarterly release.

This is the difference between "we have APIs" and "we have an API platform."


Progressive Project: HA Banking Transaction Processing System

Checkpoint 21: The API Layer

In Chapters 19 and 20, you built the messaging and event-driven components of the HA banking system. Now you're adding the API layer that exposes the banking services to external consumers.

Objective: Design and configure the API layer for the HA banking system, including z/OS Connect service archives, OpenAPI specifications, API gateway configuration, and security setup.

Requirements

Your banking system API layer must support these operations:

API                  Method  Path                                 Backend                              Auth
Account Inquiry      GET     /api/v1/accounts/{id}                CICS ACCTINQ                         OAuth2
Account Balance      GET     /api/v1/accounts/{id}/balance        CICS ACCTBAL                         OAuth2
Fund Transfer        POST    /api/v1/transfers                    CICS XFERINIT → MQ → CICS XFERPROC   OAuth2
Transfer Status      GET     /api/v1/transfers/{id}               CICS XFERSTS                         OAuth2
Transaction History  GET     /api/v1/accounts/{id}/transactions   CICS TXNHIST                         OAuth2
Batch Transfer       POST    /api/v1/batch/transfers              MQ → Batch XFERBAT                   OAuth2 + mTLS

Design Decisions

Fund Transfer is asynchronous. When a consumer POSTs to /api/v1/transfers, the API doesn't wait for the transfer to complete. Instead:

  1. The API validates the request and creates a transfer record (status: PENDING)
  2. The API puts a message on the transfer queue (from Chapter 19)
  3. The API returns 202 Accepted with the transfer ID and a Location header pointing to the status endpoint
  4. The consumer polls /api/v1/transfers/{id} to check progress

This is the correct pattern for operations that may take longer than an API timeout. It bridges the synchronous API world with the asynchronous MQ-driven processing you built in Chapter 19.
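The consumer side of this pattern is submit-then-poll. A sketch, with the two callables standing in for real HTTP calls to the transfer and status endpoints (the transfer ID and poll counts are illustrative):

```python
import time

def submit_and_poll(post_transfer, get_status, poll_interval=0.0, max_polls=20):
    """Consumer side of the 202 Accepted pattern: submit the transfer,
    follow the Location header, and poll until a terminal state."""
    status_code, headers = post_transfer()
    if status_code != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status_code}")
    location = headers["Location"]           # e.g. /api/v1/transfers/{id}
    for _ in range(max_polls):
        state = get_status(location)
        if state in ("COMPLETED", "FAILED"):
            return state
        time.sleep(poll_interval)            # back off between polls
    raise TimeoutError("transfer still pending after max polls")
```

In production the client would also honor Retry-After hints and use exponential backoff rather than a fixed interval.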

Batch Transfer requires mTLS in addition to OAuth2. This endpoint is for system-to-system bulk transfers from partner banks. The higher security requirement reflects the higher risk — a single call can initiate thousands of transfers.

Transaction History supports pagination. The response includes totalRecords, page, pageSize, and navigation links. The COBOL backend program handles pagination via START and READNEXT commands on the transaction VSAM file (from Chapter 13's CICS patterns).
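The pagination envelope can be sketched as follows. Field names follow the text (totalRecords, page, pageSize); the link structure and totalPages field are illustrative additions:

```python
import math

def page_envelope(total_records, page, page_size, base_path):
    """Build the pagination metadata and navigation links for a
    paged response such as the transaction history API."""
    total_pages = max(1, math.ceil(total_records / page_size))
    links = {"self": f"{base_path}?page={page}&pageSize={page_size}"}
    if page > 1:
        links["prev"] = f"{base_path}?page={page - 1}&pageSize={page_size}"
    if page < total_pages:
        links["next"] = f"{base_path}?page={page + 1}&pageSize={page_size}"
    return {"totalRecords": total_records, "page": page,
            "pageSize": page_size, "totalPages": total_pages, "links": links}
```

On the mainframe side, the page and pageSize parameters map to the STARTBR/READNEXT browse position in the COBOL backend.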

What to Build

Refer to code/project-checkpoint.md for the detailed implementation specifications. You'll produce:

  1. OpenAPI 3.0 specification for the complete banking API
  2. z/OS Connect configuration including CICS service provider connections for the HA topology
  3. API gateway configuration with rate limiting, routing, and health checks
  4. Security configuration covering OAuth2, RACF mapping, and mTLS for batch endpoints
  5. A monitoring/alerting specification for API SLA compliance

This checkpoint integrates everything from Chapters 13 (CICS), 19 (MQ), 20 (event-driven), and 21 (API layer) into a cohesive, production-ready architecture.


Chapter Summary

API-first COBOL is not about replacing mainframe technology — it's about making mainframe technology accessible to modern consumers. z/OS Connect translates between the REST/JSON world and the COBOL/COMMAREA world. An API mediation layer (Zowe API ML or API Connect) provides routing, security, rate limiting, and developer experience. OpenAPI specifications create contracts that both mainframe and cloud teams can work from.

The key insight: your COBOL programs don't change. The business logic that's been running reliably for decades stays exactly where it is. What changes is how you expose it — through well-designed, well-documented, well-managed APIs that any modern application can consume.

SecureFirst's journey from flat-file integration to API-first architecture took 18 months. Your organization's journey will be different, but the pattern is the same: start with the highest-value services, get the security right, establish governance, and iterate.

The progressive project checkpoint for this chapter puts all of these pieces together. You'll design the API layer for the banking system, including the OpenAPI specification, z/OS Connect configuration, gateway topology, security architecture, and monitoring strategy. This is the most integration-heavy checkpoint in the project — it touches everything from CICS (Chapter 13) to MQ (Chapter 19) to event-driven processing (Chapter 20).

In Chapter 22, we'll move from exposing mainframe services as APIs to consuming cloud services from COBOL — the other direction of the integration equation. The API requester pattern you glimpsed in Section 21.2 becomes the main event.