Appendix G: Production Readiness Checklists

Every production deployment is a contract. You're telling the operations team, the business, and every person who answers the phone at 2am: "This is ready." These checklists are how you keep that promise.

I've watched too many "it worked in the test region" deployments turn into Sev-1 incidents. The pattern is always the same: someone skipped a step they considered obvious, and the obvious thing is exactly what failed. Kwame Mensah keeps a framed printout on his wall at CNB that reads: "Production doesn't care about your intentions." He's right.

Use these checklists as a starting point. Every shop should maintain its own version, tailored to local standards and learned from local scars. The checklists below reflect the minimum bar for systems that process financial transactions, health claims, or government benefits — systems where failure has consequences measured in dollars, reputation, and regulatory citations.


G.1 COBOL Program Readiness Checklist

This checklist applies to every COBOL program being deployed to production, whether it's a new program or a modification to an existing one.

G.1.1 Code Quality and Standards

# Item Verified Notes
1 Program compiles clean with NOSOURCE and FLAG(W) — zero warnings
2 Compiler options match production standards (RENT, DYNAM, THREAD for CICS; NORENT for batch unless dynamic calls required)
3 COPY statements reference production copybook libraries, not development or test
4 All paragraph names follow site naming conventions
5 WORKING-STORAGE data names are meaningful (no WS-X, WS-TEMP, WS-FLAG without context)
6 No hardcoded literals for values that could change (dates, limits, thresholds) — use configuration tables or COPY members
7 No orphaned paragraphs (dead code) remaining from development
8 All PERFORM THRU ranges are contiguous and correct (Chapter 3)
9 COBOL reference modification validated — no potential out-of-bounds references
10 Program has been peer-reviewed by at least one developer who did not write it

G.1.2 Error Handling

# Item Verified Notes
11 Every SQL statement is followed by SQLCODE checking — no exceptions
12 Every CICS command includes appropriate RESP/RESP2 handling or HANDLE CONDITION (Chapter 18)
13 Every MQ API call checks COMPCODE and REASON (Chapter 19)
14 Every file OPEN/READ/WRITE/CLOSE checks FILE STATUS
15 Deadlock/timeout retry logic implemented for all DB2 update paths (Chapter 8)
16 Retry logic includes a maximum retry count and exponential backoff
17 All error paths produce a meaningful diagnostic message (program name, paragraph name, error code, key data values)
18 Abend handler is registered (CICS HANDLE ABEND or LE condition handler)
19 Program does not issue STOP RUN in a CICS environment; it ends with EXEC CICS RETURN or GOBACK
20 Resource cleanup occurs on all exit paths, including error paths (cursors closed, files closed, queues closed)
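
Items 11, 15, 16, and 17 tend to be implemented together, so here is a minimal sketch of how they might fit in one batch update path. Everything named here is hypothetical: the ACCT_BAL table, its columns, the WS- fields, the retry limit of 3, and the one-second starting delay are assumptions for illustration. The delay uses the Language Environment callable service CEE3DLY, which assumes an LE batch environment; your shop may prefer a different wait mechanism.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RETRYDMO.
      * Hypothetical sketch: table ACCT_BAL, its columns, and all
      * WS- names are illustrative, not taken from any real system.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           EXEC SQL INCLUDE SQLCA END-EXEC.
       01  WS-ACCT-ID          PIC X(10).
       01  WS-AMOUNT           PIC S9(11)V99 COMP-3.
       01  WS-RETRY-COUNT      PIC S9(4) COMP VALUE 0.
       01  WS-MAX-RETRIES      PIC S9(4) COMP VALUE 3.
       01  WS-DELAY-SECS       PIC S9(9) COMP VALUE 1.
       01  WS-LE-FC            PIC X(12).
       01  WS-SQLCODE-DISP     PIC -9(9).
       01  WS-UPDATE-SW        PIC X VALUE 'N'.
           88  UPDATE-DONE     VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
      * A real program would load WS-ACCT-ID and WS-AMOUNT first.
           PERFORM 1000-UPDATE-BALANCE
               UNTIL UPDATE-DONE
           GOBACK.
       1000-UPDATE-BALANCE.
           EXEC SQL
               UPDATE ACCT_BAL
                  SET BALANCE = BALANCE + :WS-AMOUNT
                WHERE ACCT_ID = :WS-ACCT-ID
           END-EXEC
      * +100 (no row updated) falls into WHEN OTHER on purpose.
           EVALUATE SQLCODE
               WHEN 0
                   SET UPDATE-DONE TO TRUE
               WHEN -911
               WHEN -913
                   PERFORM 2000-BACKOFF-OR-FAIL
               WHEN OTHER
                   PERFORM 9000-SQL-ERROR
           END-EVALUATE.
       2000-BACKOFF-OR-FAIL.
           ADD 1 TO WS-RETRY-COUNT
           IF WS-RETRY-COUNT > WS-MAX-RETRIES
               PERFORM 9000-SQL-ERROR
           END-IF
      * Wait, then double the delay for the next attempt.
           CALL 'CEE3DLY' USING WS-DELAY-SECS WS-LE-FC
           COMPUTE WS-DELAY-SECS = WS-DELAY-SECS * 2.
       9000-SQL-ERROR.
      * Diagnostic carries program, paragraph, code, and key.
           MOVE SQLCODE TO WS-SQLCODE-DISP
           DISPLAY 'RETRYDMO 1000-UPDATE-BALANCE SQLCODE='
               WS-SQLCODE-DISP ' ACCT=' WS-ACCT-ID
           MOVE 16 TO RETURN-CODE
           GOBACK.
```

One caution the sketch glosses over: after SQLCODE -911, DB2 has already rolled back the unit of work, so a real program has to redo everything since its last COMMIT, not just the one statement that failed. With -913 nothing has been rolled back, and the program decides whether to retry or roll back itself.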

G.1.3 Performance and Scalability

# Item Verified Notes
21 DB2 cursors use WITH HOLD where COMMIT is issued within the cursor loop (Chapter 12)
22 Commit frequency is tuned and documented (not too frequent, not too infrequent) (Chapter 8)
23 Multi-row FETCH used where appropriate for high-volume reads (Chapter 7)
24 No SELECT * — only columns actually needed are fetched
25 WORKING-STORAGE tables are appropriately sized (not allocating 100MB for a 1,000-row lookup)
26 CICS programs are quasi-reentrant — no persistent state between task invocations
27 Batch programs that do not use WITH HOLD re-open and reposition their cursors after each COMMIT (COMMIT closes them)
28 I/O buffer sizes are optimized (BUFNO, BLKSIZE) for batch sequential processing (Chapter 26)
29 CPU-intensive calculations are not repeated unnecessarily inside loops
30 Program has been tested at production-like volumes (not just with 10 records)
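
For items 21, 22, and 24, a minimal sketch of a batch cursor loop that commits at a fixed interval might look like the following. The CLAIM_QUEUE table, its columns, the cursor name, and the interval of 1,000 rows are all hypothetical; the right commit frequency is something you tune and document per job.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HOLDDEMO.
      * Hypothetical sketch: CLAIM_QUEUE, its columns, and the
      * commit interval of 1000 rows are illustrative assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           EXEC SQL INCLUDE SQLCA END-EXEC.
       01  WS-CLAIM-ID         PIC X(12).
       01  WS-CLAIM-AMT        PIC S9(9)V99 COMP-3.
       01  WS-COMMIT-FREQ      PIC S9(9) COMP VALUE 1000.
       01  WS-SINCE-COMMIT     PIC S9(9) COMP VALUE 0.
       01  WS-SQLCODE-DISP     PIC -9(9).
       01  WS-EOF-SW           PIC X VALUE 'N'.
           88  END-OF-CURSOR   VALUE 'Y'.
           EXEC SQL
               DECLARE CLAIMCSR CURSOR WITH HOLD FOR
               SELECT CLAIM_ID, CLAIM_AMT
                 FROM CLAIM_QUEUE
                ORDER BY CLAIM_ID
           END-EXEC.
       PROCEDURE DIVISION.
       0000-MAIN.
           EXEC SQL OPEN CLAIMCSR END-EXEC
           PERFORM 9000-CHECK-SQL
           PERFORM 1000-FETCH-NEXT UNTIL END-OF-CURSOR
           EXEC SQL CLOSE CLAIMCSR END-EXEC
           PERFORM 9000-CHECK-SQL
           EXEC SQL COMMIT END-EXEC
           PERFORM 9000-CHECK-SQL
           GOBACK.
       1000-FETCH-NEXT.
      * Only the two columns the program needs are fetched.
           EXEC SQL
               FETCH CLAIMCSR INTO :WS-CLAIM-ID, :WS-CLAIM-AMT
           END-EXEC
           PERFORM 9000-CHECK-SQL
           IF SQLCODE = 100
               SET END-OF-CURSOR TO TRUE
           ELSE
      * The business update for this claim would go here.
               ADD 1 TO WS-SINCE-COMMIT
               PERFORM 2000-COMMIT-IF-DUE
           END-IF.
       2000-COMMIT-IF-DUE.
      * WITH HOLD keeps the cursor open and positioned across
      * this COMMIT; without it the next FETCH would fail.
           IF WS-SINCE-COMMIT >= WS-COMMIT-FREQ
               EXEC SQL COMMIT END-EXEC
               PERFORM 9000-CHECK-SQL
               MOVE 0 TO WS-SINCE-COMMIT
           END-IF.
       9000-CHECK-SQL.
      * Negative SQLCODEs end the run; 0 and +100 pass through.
           IF SQLCODE < 0
               MOVE SQLCODE TO WS-SQLCODE-DISP
               DISPLAY 'HOLDDEMO SQL ERROR SQLCODE='
                   WS-SQLCODE-DISP ' LAST CLAIM=' WS-CLAIM-ID
               MOVE 16 TO RETURN-CODE
               GOBACK
           END-IF.
```

Keeping the interval in a data item rather than a literal makes it easy to move into a configuration table later, which is what item 6 asks for anyway.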

G.1.4 Logging and Diagnostics

# Item Verified Notes
31 Program produces a start-of-job message with program name, version, date, time, and runtime parameters
32 Program produces an end-of-job message with record counts, elapsed time, and completion status
33 Error messages include enough context to diagnose the problem without access to the source code
34 Audit trail records are written for all business-critical operations (Chapter 28)
35 Logging does not include sensitive data (SSNs, account numbers, passwords) in clear text
36 Batch programs produce checkpoint messages showing progress (every N records or N minutes)
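
As a rough illustration of items 31, 32, and 36, the sketch below writes a start message, periodic progress messages, and an end message with a record count. The program name, version literal, message layout, and the 50,000-record interval are hypothetical, and the PERFORM ... TIMES loop is only a stand-in for real record processing. Runtime parameters and a computed elapsed time are left out to keep it short; the two timestamps let operations derive the elapsed time.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. JOBMSGS.
      * Hypothetical sketch: the version literal, message layout,
      * and progress interval are assumptions, not site standards.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-VERSION          PIC X(8)  VALUE 'V01.00'.
       01  WS-TIMESTAMP        PIC X(21).
       01  WS-RECS-READ        PIC 9(9)  VALUE 0.
       01  WS-CKPT-INTERVAL    PIC 9(9)  VALUE 50000.
       01  WS-STATUS           PIC X(8)  VALUE 'OK'.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE FUNCTION CURRENT-DATE TO WS-TIMESTAMP
           DISPLAY 'JOBMSGS ' WS-VERSION ' START '
               WS-TIMESTAMP(1:8) ' ' WS-TIMESTAMP(9:6)
           PERFORM 1000-PROCESS-RECORD 120000 TIMES
           MOVE FUNCTION CURRENT-DATE TO WS-TIMESTAMP
           DISPLAY 'JOBMSGS END ' WS-TIMESTAMP(1:8) ' '
               WS-TIMESTAMP(9:6) ' READ=' WS-RECS-READ
               ' STATUS=' WS-STATUS
           GOBACK.
       1000-PROCESS-RECORD.
      * Stand-in for real record processing.
           ADD 1 TO WS-RECS-READ
           IF FUNCTION MOD(WS-RECS-READ WS-CKPT-INTERVAL) = 0
               DISPLAY 'JOBMSGS PROGRESS READ=' WS-RECS-READ
           END-IF.
```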

G.2 DB2 Readiness Checklist

This checklist covers DB2 objects (tables, indexes, tablespaces) supporting the deployed COBOL program. Lisa Tran reviews this checklist personally for every CNB production deployment — and she will send back any deployment request that has a blank cell.

G.2.1 Schema and Objects

# Item Verified Notes
1 DDL has been reviewed and approved by the DBA team
2 Table and column names follow site naming conventions
3 Primary keys are defined on all tables
4 Foreign keys are defined where referential integrity is required
5 NOT NULL WITH DEFAULT is used appropriately (no unintentional nullable columns)
6 Data types match COBOL host variable definitions exactly (no implicit conversions)
7 Tablespace type is appropriate (segmented, partitioned, universal) for the workload
8 Partitioning key and range are designed for the access pattern and growth (Chapter 6)
9 Buffer pool assignments are reviewed and approved

G.2.2 Indexing and Access Paths

# Item Verified Notes
10 All critical query access paths have been verified via EXPLAIN (Chapter 11)
11 Indexes support the most common predicates with sufficient matching columns
12 No unnecessary indexes that will slow INSERT/UPDATE/DELETE without benefit
13 Clustering index is appropriate for the dominant access pattern
14 EXPLAIN output shows no unexpected tablespace scans for high-volume queries
15 Access paths are stable — re-running EXPLAIN after RUNSTATS produces the same plan
16 PLANMGMT(EXTENDED) is set to preserve fallback access paths (Chapter 6)

G.2.3 Statistics and Maintenance

# Item Verified Notes
17 RUNSTATS has been run with appropriate column and COLGROUP statistics (Chapter 9)
18 Distribution statistics are collected for columns with skewed data
19 RUNSTATS schedule is defined and integrated with the batch stream
20 REORG schedule is defined based on CLUSTERRATIO and FARINDREF thresholds
21 Image COPY schedule is defined (full and incremental)
22 RECOVER has been tested — you can actually restore from the image copies
23 Restrictive and pending states (REORP, COPY, RBDP) are cleared on all objects

G.2.4 Locking and Concurrency

# Item Verified Notes
24 LOCKSIZE is appropriate for the concurrency profile (ROW vs. PAGE) (Chapter 8)
25 LOCKMAX is set to prevent runaway escalation
26 Isolation levels are correct for each SQL statement (CS for online, RS/RR only where required)
27 Access ordering is consistent across programs that touch the same tables (deadlock prevention)
28 Batch programs that run concurrently with online have been tested under concurrent load
29 Lock contention testing has been performed with expected concurrent user counts

G.2.5 Binding and Packages

# Item Verified Notes
30 Package has been bound with production BIND options (VALIDATE(BIND), ISOLATION, RELEASE)
31 Package owner and qualifier are correct for the production environment
32 GRANT EXECUTE ON PACKAGE has been issued to the appropriate plan owner or authorization ID
33 If using dynamic SQL, the DYNAMICRULES setting is correct (BIND for batch, RUN for CICS)
34 APREUSE(WARN) has been considered for existing packages being rebound

G.3 CICS Readiness Checklist

This checklist covers CICS region configuration, program deployment, and transaction setup. It applies to every CICS program deployment — new programs and updates alike.

G.3.1 Region and Resource Definitions

# Item Verified Notes
1 PROGRAM definition installed in all target AORs (Chapter 13)
2 TRANSACTION definition installed with correct PROGRAM, TWASIZE, TASKDATALOC
3 MAPSET definition installed (if BMS maps are used)
4 DB2CONN and DB2ENTRY/DB2TRAN definitions are correct for this program's DB2 access
5 FILE definitions are correct (if VSAM files are accessed from CICS)
6 TDQUEUE definitions are correct (if transient data queues are used)
7 TSMODEL definitions are correct for any temporary storage usage, including shared TS if Sysplex-aware (Chapter 13)
8 ENQMODEL definitions are in place if the program uses EXEC CICS ENQ
9 WEBSERVICE or URIMAP definitions are correct for REST/SOAP services (Chapter 14)
10 PIPELINE definitions are installed and tested (if web service pipelines are used)

G.3.2 Security Configuration

# Item Verified Notes
11 RACF TCICSTRN profile protects the transaction ID (Chapter 16)
12 Surrogate user security is configured for API-driven transactions
13 CICS resource-level security is enabled for files, queues, and programs as required
14 Transaction class (TRANCLASS) is set correctly (to prevent low-priority work from consuming shared resources)
15 SSL/TLS certificates are installed and valid for external-facing services

G.3.3 Monitoring and Operations

# Item Verified Notes
16 Transaction has been added to CICSPlex SM monitoring definitions (Chapter 13)
17 Response time thresholds are defined for alerting
18 Transaction has been included in workload management routing rules
19 CICS statistics will capture this transaction's performance data
20 Operations team has been briefed on the new/changed transaction and knows the escalation path
21 Runbook entry exists for this transaction's failure modes

G.3.4 Testing and Validation

# Item Verified Notes
22 Program tested in a CICS test region with production-equivalent configuration
23 Channel/container sizes validated (no truncation) (Chapter 15)
24 COMMAREA or channel data validated for boundary conditions (max length, empty, null)
25 Transaction tested for thread safety — no WORKING-STORAGE state leaks between tasks
26 Recovery tested — transaction was abended mid-flight and resubmitted to verify clean recovery (Chapter 18)
27 XA recovery tested if program participates in distributed transactions
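
Items 23 and 24 are easier to test when the program itself refuses bad input. A minimal sketch, assuming a request arrives in a container on the current channel: the container name REQUEST, the 200-byte layout, and the abend code CNT1 are all hypothetical, and a real program would more likely return an error container than abend.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONTCHK.
      * Hypothetical sketch: container name, record length, and
      * abend code are assumptions for illustration only.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CONT-NAME        PIC X(16) VALUE 'REQUEST'.
       01  WS-REQUEST          PIC X(200).
       01  WS-REQ-LEN          PIC S9(8) COMP.
       01  WS-RESP             PIC S9(8) COMP.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Tell CICS how much room the receiving area has.
           MOVE LENGTH OF WS-REQUEST TO WS-REQ-LEN
           EXEC CICS GET CONTAINER(WS-CONT-NAME)
               INTO(WS-REQUEST)
               FLENGTH(WS-REQ-LEN)
               RESP(WS-RESP)
           END-EXEC
           EVALUATE WS-RESP
               WHEN DFHRESP(NORMAL)
                   CONTINUE
               WHEN DFHRESP(LENGERR)
      * Container data was longer than WS-REQUEST: reject the
      * request rather than process a truncated copy.
                   PERFORM 9000-REJECT
               WHEN OTHER
                   PERFORM 9000-REJECT
           END-EVALUATE
      * An empty container is treated as invalid input.
           IF WS-REQ-LEN = 0
               PERFORM 9000-REJECT
           END-IF
           EXEC CICS RETURN END-EXEC.
       9000-REJECT.
           EXEC CICS ABEND ABCODE('CNT1') END-EXEC.
```

The boundary tests in item 24 then map directly onto the branches: a maximum-length payload should take the NORMAL path, an oversized one should hit LENGERR, and an empty one should fail the length check.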

G.4 Batch Readiness Checklist

This checklist covers batch jobs, JCL, scheduling, and operational procedures. Rob Calloway at CNB treats the nightly batch window like a space launch — everything on the checklist is verified before the window opens.

G.4.1 JCL Review

# Item Verified Notes
1 JCL passes a JCL checker or a TYPRUN=SCAN run with no errors
2 CLASS and MSGCLASS on the JOB statement are correct for production
3 REGION and MEMLIMIT are set appropriately (Chapter 2)
4 TIME parameter is set (no TIME=NOLIMIT in production)
5 COND or IF/THEN/ELSE step-level condition checking is in place
6 DD statements reference production dataset names (not test or development)
7 DISP parameters are correct (especially DISP=(,CATLG,DELETE) for output datasets)
8 SMS classes are assigned correctly (data class, storage class, management class) (Chapter 4)
9 DCB parameters (BLKSIZE, LRECL, RECFM) are optimized for the access pattern (Chapter 26)
10 BUFNO is tuned for sequential processing performance
11 GDG references are correct (+1 for new, 0 for current) (Chapter 23)
12 Temporary datasets use system-managed allocation (no hardcoded VOL=SER)
13 JOBLIB/STEPLIB references production load libraries (not test)

G.4.2 Checkpoint/Restart

# Item Verified Notes
14 Checkpoint/restart logic is implemented for long-running programs (Chapter 24)
15 Checkpoint frequency is tuned (Chapter 24: balance between restart time and commit overhead)
16 Restart from checkpoint has been tested end-to-end
17 Checkpoint dataset is allocated with sufficient space and retention
18 Restart JCL or procedure is documented and available to operations
19 If using DB2, restart correctly repositions the cursor (Chapter 24)
20 If using GDGs, restart logic handles the "partially written generation" scenario
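
Item 19 is the one I see fail most often, so here is a minimal sketch of one common way to make a DB2 batch program restartable: save the last processed key in a restart table inside the same unit of work as the business updates, and open the cursor past that key on every run. The JOB_RESTART and PAYMENT tables, their columns, and the interval of 500 rows are hypothetical. The sketch also assumes the restart row already exists and holds spaces before a fresh run, and that real keys sort higher than spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RESTDEMO.
      * Hypothetical sketch: JOB_RESTART, PAYMENT, their columns,
      * and the commit interval are assumptions for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           EXEC SQL INCLUDE SQLCA END-EXEC.
       01  WS-JOB-NAME         PIC X(8)  VALUE 'RESTDEMO'.
       01  WS-LAST-KEY         PIC X(12) VALUE SPACES.
       01  WS-PAY-ID           PIC X(12) VALUE SPACES.
       01  WS-SINCE-COMMIT     PIC S9(9) COMP VALUE 0.
       01  WS-COMMIT-FREQ      PIC S9(9) COMP VALUE 500.
       01  WS-SQLCODE-DISP     PIC -9(9).
       01  WS-EOF-SW           PIC X VALUE 'N'.
           88  END-OF-CURSOR   VALUE 'Y'.
           EXEC SQL
               DECLARE PAYCSR CURSOR WITH HOLD FOR
               SELECT PAY_ID
                 FROM PAYMENT
                WHERE PAY_ID > :WS-LAST-KEY
                ORDER BY PAY_ID
           END-EXEC.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Spaces in LAST_KEY mean a fresh run; anything else is
      * the last key committed before a failure.
           EXEC SQL
               SELECT LAST_KEY INTO :WS-LAST-KEY
                 FROM JOB_RESTART
                WHERE JOB_NAME = :WS-JOB-NAME
           END-EXEC
           PERFORM 9000-CHECK-SQL
           EXEC SQL OPEN PAYCSR END-EXEC
           PERFORM 9000-CHECK-SQL
           PERFORM 1000-PROCESS-NEXT UNTIL END-OF-CURSOR
           EXEC SQL CLOSE PAYCSR END-EXEC
           PERFORM 9000-CHECK-SQL
      * Normal end: reset the restart key for the next run.
           MOVE SPACES TO WS-PAY-ID
           PERFORM 2000-CHECKPOINT
           GOBACK.
       1000-PROCESS-NEXT.
           EXEC SQL FETCH PAYCSR INTO :WS-PAY-ID END-EXEC
           PERFORM 9000-CHECK-SQL
           IF SQLCODE = 100
               SET END-OF-CURSOR TO TRUE
           ELSE
      * The business update for this payment would go here.
               ADD 1 TO WS-SINCE-COMMIT
               IF WS-SINCE-COMMIT >= WS-COMMIT-FREQ
                   PERFORM 2000-CHECKPOINT
               END-IF
           END-IF.
       2000-CHECKPOINT.
      * The restart key is committed together with the business
      * updates, so the two cannot disagree after a failure.
           EXEC SQL
               UPDATE JOB_RESTART
                  SET LAST_KEY = :WS-PAY-ID
                WHERE JOB_NAME = :WS-JOB-NAME
           END-EXEC
           PERFORM 9000-CHECK-SQL
           EXEC SQL COMMIT END-EXEC
           PERFORM 9000-CHECK-SQL
           MOVE 0 TO WS-SINCE-COMMIT.
       9000-CHECK-SQL.
           IF SQLCODE < 0
               MOVE SQLCODE TO WS-SQLCODE-DISP
               DISPLAY 'RESTDEMO SQL ERROR SQLCODE='
                   WS-SQLCODE-DISP ' LAST KEY=' WS-PAY-ID
               MOVE 16 TO RETURN-CODE
               GOBACK
           END-IF.
```

The end-to-end test in item 16 for a design like this is simple to state: cancel the job partway through, rerun it unchanged, and confirm that no payment is processed twice and none is skipped.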

G.4.3 Scheduling and Dependencies

# Item Verified Notes
21 Job is registered in the enterprise scheduler (TWS/OPC, CA-7, Control-M)
22 Predecessor dependencies are correct and complete
23 Successor dependencies are correct — downstream jobs will trigger when this job completes
24 Resource requirements are defined (DB2 threads, initiator class, tape drives)
25 Expected run time is documented and alerting threshold is set
26 Critical path impact has been assessed — is this job on the critical path? (Chapter 23)
27 Month-end / quarter-end / year-end special processing is correctly conditioned
28 Holiday schedule variations are accounted for

G.4.4 Recovery and Operations

# Item Verified Notes
29 Restart procedure is documented in the operations runbook (Chapter 27)
30 Common abend codes and their resolutions are documented
31 Escalation contacts are defined (Level 1 through Level 3)
32 Expected output — record counts, file sizes, completion messages — is documented so operations can verify successful completion
33 Fallback procedure exists if the job cannot complete within the batch window
34 Impact of skipping this job is documented (what breaks downstream?)

G.5 Security Readiness Checklist

This checklist covers security configuration for the deployed application. Ahmad Rashidi at Pinnacle Health won't sign off on any deployment that hasn't passed this checklist — and he's right to insist.

G.5.1 RACF Profiles

# Item Verified Notes
1 Dataset profiles are defined with UACC(NONE) (Chapter 28)
2 Program access is restricted to authorized userids and groups
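3 DB2 DSNR class profiles control which environments (batch, CICS, distributed) can connect to the DB2 subsystem; plan and package execution is controlled by DB2 GRANTs or the RACF access control module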
4 CICS TCICSTRN profiles are defined for all new transaction IDs (Chapter 16)
5 MQ profiles (MQQUEUE, MQPROC, MQNLIST classes) are defined as needed (Chapter 19)
6 Started task userid is defined with minimum necessary privileges
7 No SPECIAL or OPERATIONS authority has been granted to application userids
8 RACF PERMIT commands use group-based access, not individual userid access
9 All profiles have been tested — authorized users can access, unauthorized users are denied

G.5.2 Encryption and Data Protection

# Item Verified Notes
10 Sensitive data is encrypted at rest (z/OS dataset encryption, DB2 column encryption) (Chapter 28)
11 Sensitive data is encrypted in transit (AT-TLS, SSL/TLS for CICS web services)
12 Encryption key labels are defined in ICSF and access is restricted via CSFKEYS class
13 No sensitive data (SSNs, PANs, passwords) appears in log files, SYSOUT, or dump datasets
14 PCI-DSS cardholder data handling requirements are met (if applicable)
15 HIPAA ePHI data handling requirements are met (if applicable)

G.5.3 Audit and Compliance

# Item Verified Notes
16 SMF recording is enabled for all access types relevant to this application (Chapter 28)
17 Application-level audit trail captures who, what, when, where, and outcome
18 Audit records cannot be modified or deleted by application userids
19 Audit data retention meets regulatory requirements (for example, 7 years under SOX, or as defined by site policy)
20 Compliance team has reviewed and approved the security configuration

G.6 Disaster Recovery Readiness Checklist

This checklist covers DR preparedness for the deployed application. Sandra Chen at FBA learned the hard way that a DR plan you haven't tested is a DR plan you don't have.

G.6.1 Backup and Replication

# Item Verified Notes
1 DB2 image copy schedule covers all tablespaces used by this application (Chapter 30)
2 VSAM backup schedule covers all clusters used by this application
3 Sequential datasets are included in DFSMShsm backup policies
4 Log datasets (DB2 active/archive logs, CICS journals) are replicated to DR site
5 If using GDPS, XRC or PPRC replication is verified for all application volumes
6 RPO (Recovery Point Objective) is documented and achievable with current replication lag

G.6.2 Recovery Procedures

# Item Verified Notes
7 RTO (Recovery Time Objective) is documented and has been validated
8 DB2 RECOVER procedure is documented and tested for all application tablespaces
9 CICS cold start / warm start procedure at DR site is documented
10 Batch restart procedure from DR site is documented
11 MQ channel configuration at DR site mirrors production
12 Network routing (DNS, VTAM, TCP/IP) at DR site is configured to receive application traffic

G.6.3 DR Testing

# Item Verified Notes
13 Application has been included in the most recent DR test
14 DR test confirmed that the application can start, process transactions, and produce correct results at the DR site
15 DR test results are documented with actual RTO and RPO achieved
16 Gaps identified during DR test have been remediated
17 Runbook for this application's DR failover has been updated within the last 6 months
18 Contact list for DR escalation is current

G.7 Deployment Sign-Off

Before any production deployment, the following roles must sign off:

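Role                              Name        Sign-Off    Date
Development Lead                  ________    ________    ________
Peer Reviewer                     ________    ________    ________
DBA                               ________    ________    ________
CICS Systems Programmer           ________    ________    ________
Security Administrator            ________    ________    ________
Operations / Batch Lead           ________    ________    ________
Business Owner / Product Owner    ________    ________    ________
Change Management                 ________    ________    ________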

Deployment window: ________

Rollback procedure documented: Yes / No

Rollback tested: Yes / No

Estimated deployment time: ________

Post-deployment validation plan: ________


G.8 Post-Deployment Validation

After deployment, verify the following within the first production cycle:

# Item Verified Notes
1 Program executed successfully (check return codes)
2 Record counts match expectations
3 CICS transaction response times are within threshold
4 No unexpected DB2 lock escalations or timeouts
5 No unexpected abends or error messages in SYSLOG
6 Audit trail records are being written correctly
7 Monitoring alerts are functioning (test by simulating a threshold breach if feasible)
8 Downstream systems received expected data
9 Business users confirm correct behavior
10 Performance baseline captured for future comparison

A note on checklist culture. Checklists work only if people actually use them, and people will only use them if they're maintained. A 200-item checklist that hasn't been updated in three years is worse than no checklist at all — it creates false confidence. Review these checklists quarterly. Remove items that no longer apply. Add items for every production incident that a checklist item could have prevented. The best checklists are living documents, scarred by experience.