Appendix G: Production Readiness Checklists
Every production deployment is a contract. You're telling the operations team, the business, and every person who answers the phone at 2am: "This is ready." These checklists are how you keep that promise.
I've watched too many "it worked in the test region" deployments turn into Sev-1 incidents. The pattern is always the same: someone skipped a step they considered obvious, and the obvious thing is exactly what failed. Kwame Mensah keeps a framed printout on his wall at CNB that reads: "Production doesn't care about your intentions." He's right.
Use these checklists as a starting point. Every shop should maintain its own version, tailored to local standards and learned from local scars. The checklists below reflect the minimum bar for systems that process financial transactions, health claims, or government benefits — systems where failure has consequences measured in dollars, reputation, and regulatory citations.
G.1 COBOL Program Readiness Checklist
This checklist applies to every COBOL program being deployed to production, whether it's a new program or a modification to an existing one.
G.1.1 Code Quality and Standards
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | Program compiles clean with NOSOURCE and FLAG(W): zero warnings | | |
| 2 | Compiler options match production standards (RENT, DYNAM, THREAD for CICS; NORENT for batch unless dynamic calls required) | | |
| 3 | COPY statements reference production copybook libraries, not development or test | | |
| 4 | All paragraph names follow site naming conventions | | |
| 5 | WORKING-STORAGE data names are meaningful (no WS-X, WS-TEMP, WS-FLAG without context) | | |
| 6 | No hardcoded literals for values that could change (dates, limits, thresholds); use configuration tables or COPY members | | |
| 7 | No orphaned paragraphs (dead code) remaining from development | | |
| 8 | All PERFORM THRU ranges are contiguous and correct (Chapter 3) | | |
| 9 | COBOL reference modification validated: no potential out-of-bounds references | | |
| 10 | Program has been peer-reviewed by at least one developer who did not write it | | |
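Item 5 is easy to check mechanically before the peer review in item 10. The Python sketch below scans fixed-format COBOL source for data description entries whose names appear on a list of vague names. It assumes standard fixed-format columns (sequence area in 1-6, indicator in 7, code in 8-72); the name list and the function name are illustrative, not a site standard.

```python
import re

# Names called out in item 5; extend with your site's own list (assumption).
VAGUE_NAMES = {"WS-X", "WS-TEMP", "WS-FLAG"}

# A data description entry: a level number (01-49, 66, 77, 88) followed by a data name.
DATA_ENTRY = re.compile(
    r"^\s*(0?[1-9]|[1-4][0-9]|66|77|88)\s+([A-Z0-9][A-Z0-9-]*)", re.IGNORECASE
)

def vague_data_names(source_lines):
    """Return (line_number, name) for each data name on the vague-name list.

    Assumes fixed-format source: columns 1-6 are the sequence area and
    column 7 is the indicator area, where '*' or '/' marks a comment line.
    """
    findings = []
    for number, line in enumerate(source_lines, start=1):
        if len(line) > 6 and line[6] in "*/":
            continue  # comment line
        code_area = line[7:72]  # Area A and Area B (columns 8-72)
        match = DATA_ENTRY.match(code_area)
        if match and match.group(2).upper() in VAGUE_NAMES:
            findings.append((number, match.group(2).upper()))
    return findings
```

A name such as WS-FLAG is not wrong in itself; the point of the scan is to make the reviewer ask what the flag means and rename it (for example, to something like WS-EOF-FLAG) before the program ships.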
G.1.2 Error Handling
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 11 | Every SQL statement is followed by SQLCODE checking, no exceptions | | |
| 12 | Every CICS command includes appropriate RESP/RESP2 handling or HANDLE CONDITION (Chapter 18) | | |
| 13 | Every MQ API call checks COMPCODE and REASON (Chapter 19) | | |
| 14 | Every file OPEN/READ/WRITE/CLOSE checks FILE STATUS | | |
| 15 | Deadlock/timeout retry logic implemented for all DB2 update paths (Chapter 8) | | |
| 16 | Retry logic includes a maximum retry count and exponential backoff | | |
| 17 | All error paths produce a meaningful diagnostic message (program name, paragraph name, error code, key data values) | | |
| 18 | Abend handler is registered (CICS HANDLE ABEND or LE condition handler) | | |
| 19 | Program does not issue STOP RUN in a CICS environment; it returns control with EXEC CICS RETURN or GOBACK | | |
| 20 | Resource cleanup occurs on all exit paths, including error paths (cursors closed, files closed, queues closed) | | |
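Items 15 and 16 describe a control flow more than a coding style. The sketch below shows that flow in Python; in the COBOL program it would typically be a PERFORM UNTIL loop around the update paragraph. It assumes -911 and -913 are the retryable SQLCODEs (deadlock or timeout). On -913 DB2 has not rolled back the unit of work, so the real program must issue ROLLBACK itself before retrying. `unit_of_work` is a hypothetical callable that returns an SQLCODE, and `sleep` is injectable so the delays can be tested.

```python
import time

# Deadlock/timeout SQLCODEs treated as retryable in this sketch:
# -911: DB2 rolled back the unit of work.
# -913: DB2 did NOT roll back; the program must ROLLBACK before retrying.
RETRYABLE_SQLCODES = {-911, -913}

def run_with_retry(unit_of_work, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run unit_of_work() until it returns a non-retryable SQLCODE or retries run out.

    Makes at most 1 + max_retries attempts. Returns (final_sqlcode, attempts_made).
    """
    attempt = 0
    while True:
        attempt += 1
        sqlcode = unit_of_work()
        if sqlcode not in RETRYABLE_SQLCODES or attempt > max_retries:
            return sqlcode, attempt
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

The cap matters as much as the backoff: a program that retries forever turns a short lock wait into a hung job, while one that gives up after a bounded number of attempts reaches the error path in item 17 with a clear diagnostic.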
G.1.3 Performance
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 21 | DB2 cursors use WITH HOLD where COMMIT is issued within the cursor loop (Chapter 12) | | |
| 22 | Commit frequency is tuned and documented: often enough to keep lock duration and restart rework short, rarely enough to keep commit overhead low (Chapter 8) | | |
| 23 | Multi-row FETCH used where appropriate for high-volume reads (Chapter 7) | | |
| 24 | No SELECT *; only columns actually needed are fetched | | |
| 25 | WORKING-STORAGE tables are appropriately sized (not allocating 100MB for a 1,000-row lookup) | | |
| 26 | CICS programs are quasi-reentrant: no persistent state between task invocations | | |
| 27 | Batch programs re-open and reposition cursors after COMMIT if not using WITH HOLD | | |
| 28 | I/O buffer sizes are optimized (BUFNO, BLKSIZE) for batch sequential processing (Chapter 26) | | |
| 29 | CPU-intensive calculations are not repeated unnecessarily inside loops | | |
| 30 | Program has been tested at production-like volumes (not just with 10 records) | | |
G.1.4 Logging and Diagnostics
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 31 | Program produces a start-of-job message with program name, version, date, time, and runtime parameters | | |
| 32 | Program produces an end-of-job message with record counts, elapsed time, and completion status | | |
| 33 | Error messages include enough context to diagnose the problem without access to the source code | | |
| 34 | Audit trail records are written for all business-critical operations (Chapter 28) | | |
| 35 | Logging does not include sensitive data (SSNs, account numbers, passwords) in clear text | | |
| 36 | Batch programs produce checkpoint messages showing progress (every N records or N minutes) | | |
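Item 35 (and item 13 of the security checklist) is easiest to enforce with one masking routine that every log write passes through. A minimal Python sketch, assuming SSNs appear in NNN-NN-NNNN form and card numbers as unformatted strings of 13 to 19 digits; both patterns are assumptions to tune against your real data, and masking keeps only the last four digits.

```python
import re

# Illustrative patterns (assumptions); tune them to the formats your data actually uses.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PAN_PATTERN = re.compile(r"\b\d{13,19}\b")  # unformatted card numbers

def mask_keep_last(value, keep=4, mask_char="*"):
    """Replace every digit except the last `keep` digits; separators are preserved."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)

def scrub_log_line(line):
    """Mask SSNs and card numbers before the line reaches SYSOUT or a log dataset."""
    line = SSN_PATTERN.sub(lambda m: mask_keep_last(m.group()), line)
    return PAN_PATTERN.sub(lambda m: mask_keep_last(m.group()), line)
```

Pattern matching is a safety net, not a substitute for design: the stronger control is never moving the sensitive field into the message buffer in the first place.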
G.2 DB2 Readiness Checklist
This checklist covers DB2 objects (tables, indexes, tablespaces) supporting the deployed COBOL program. Lisa Tran reviews this checklist personally for every CNB production deployment — and she will send back any deployment request that has a blank cell.
G.2.1 Schema and Objects
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | DDL has been reviewed and approved by the DBA team | | |
| 2 | Table and column names follow site naming conventions | | |
| 3 | Primary keys are defined on all tables | | |
| 4 | Foreign keys are defined where referential integrity is required | | |
| 5 | NOT NULL WITH DEFAULT is used appropriately (no unintentional nullable columns) | | |
| 6 | Data types match COBOL host variable definitions exactly (no implicit conversions) | | |
| 7 | Tablespace type is appropriate (segmented, partitioned, universal) for the workload | | |
| 8 | Partitioning key and range are designed for the access pattern and growth (Chapter 6) | | |
| 9 | Buffer pool assignments are reviewed and approved | | |
G.2.2 Indexing and Access Paths
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 10 | All critical query access paths have been verified via EXPLAIN (Chapter 11) | | |
| 11 | Indexes support the most common predicates with sufficient matching columns | | |
| 12 | No unnecessary indexes that will slow INSERT/UPDATE/DELETE without benefit | | |
| 13 | Clustering index is appropriate for the dominant access pattern | | |
| 14 | EXPLAIN output shows no unexpected tablespace scans for high-volume queries | | |
| 15 | Access paths are stable: re-running EXPLAIN after RUNSTATS produces the same plan | | |
| 16 | PLANMGMT(EXTENDED) is set to preserve fallback access paths (Chapter 6) | | |
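Items 14 and 15 can be checked against the PLAN_TABLE rows that EXPLAIN produces. In PLAN_TABLE, ACCESSTYPE 'R' indicates a tablespace scan, and ACCESSTYPE 'I' with MATCHCOLS = 0 indicates a non-matching index scan. The sketch assumes the rows have already been extracted (for one package and one EXPLAIN run) into Python dicts keyed by column name; the function names are hypothetical.

```python
def unexpected_scans(plan_rows, high_volume_tables):
    """Flag tablespace scans and non-matching index scans on high-volume tables."""
    findings = []
    for row in plan_rows:
        if row["TNAME"] not in high_volume_tables:
            continue
        access = row["ACCESSTYPE"].strip()
        if access == "R":
            findings.append((row["QUERYNO"], row["TNAME"], "tablespace scan"))
        elif access == "I" and row["MATCHCOLS"] == 0:
            findings.append((row["QUERYNO"], row["TNAME"], "non-matching index scan"))
    return findings

def access_path_changes(before_rows, after_rows):
    """Compare two EXPLAIN runs (for example, before and after RUNSTATS).

    Steps are keyed by (QUERYNO, QBLOCKNO, PLANNO). Returns
    {step: (before, after)} for every step whose table, access type,
    index name, or matching-column count differs, or that exists in only one run.
    """
    def index(rows):
        return {
            (r["QUERYNO"], r["QBLOCKNO"], r["PLANNO"]): (
                r["TNAME"], r["ACCESSTYPE"].strip(),
                r["ACCESSNAME"].strip(), r["MATCHCOLS"],
            )
            for r in rows
        }

    before, after = index(before_rows), index(after_rows)
    return {
        step: (before.get(step), after.get(step))
        for step in before.keys() | after.keys()
        if before.get(step) != after.get(step)
    }
```

An empty result from `access_path_changes` is what item 15 means by "stable"; any entry is a prompt to look at the statistics change that caused it before the package goes to production.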
G.2.3 Statistics and Maintenance
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 17 | RUNSTATS has been run with appropriate column and COLGROUP statistics (Chapter 9) | | |
| 18 | Distribution statistics are collected for columns with skewed data | | |
| 19 | RUNSTATS schedule is defined and integrated with the batch stream | | |
| 20 | REORG schedule is defined based on CLUSTERRATIO and FARINDREF thresholds | | |
| 21 | Image COPY schedule is defined (full and incremental) | | |
| 22 | RECOVER has been tested: you can actually restore from the image copies | | |
| 23 | No restrictive pending states (REORP, COPY-pending, RBDP) remain on the application's objects | | |
G.2.4 Locking and Concurrency
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 24 | LOCKSIZE is appropriate for the concurrency profile (ROW vs. PAGE) (Chapter 8) | | |
| 25 | LOCKMAX is set to prevent runaway escalation | | |
| 26 | Isolation levels are correct for each SQL statement (CS for online, RS/RR only where required) | | |
| 27 | Access ordering is consistent across programs that touch the same tables (deadlock prevention) | | |
| 28 | Batch programs that run concurrently with online have been tested under concurrent load | | |
| 29 | Lock contention testing has been performed with expected concurrent user counts | | |
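Item 27 is the classic deadlock-prevention rule: if every program updates shared tables in the same agreed order, no two programs can end up waiting on each other across those tables. A review-time check, sketched in Python under the assumption that the site maintains a canonical table order and that each program's update sequence has been listed (by hand or from a cross-reference report); all names are hypothetical.

```python
def ordering_violations(canonical_order, programs):
    """Report programs whose table update sequence contradicts the canonical order.

    canonical_order: table names in the site's agreed locking order.
    programs: program name -> list of tables in the order the program updates them.
    Returns program name -> description of the first problem found.
    """
    rank = {table: position for position, table in enumerate(canonical_order)}
    violations = {}
    for program, tables in programs.items():
        unknown = [t for t in tables if t not in rank]
        if unknown:
            violations[program] = f"tables not in canonical order: {unknown}"
            continue
        # Each consecutive pair must not move backwards in the canonical order.
        for first, second in zip(tables, tables[1:]):
            if rank[second] < rank[first]:
                violations[program] = f"{second} accessed after {first}"
                break
    return violations
```

Tables missing from the canonical list are reported rather than ignored, because a new table is exactly where an ordering convention tends to break down.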
G.2.5 Binding and Packages
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 30 | Package has been bound with production BIND options (VALIDATE(BIND), ISOLATION, RELEASE) | | |
| 31 | Package owner and qualifier are correct for the production environment | | |
| 32 | GRANT EXECUTE ON PACKAGE has been issued to the appropriate authorization ID (typically the owner of the plan that includes the package) | | |
| 33 | If using dynamic SQL, the DYNAMICRULES setting is correct (BIND for batch, RUN for CICS) | | |
| 34 | APREUSE(WARN) has been considered for existing packages being rebound | | |
G.3 CICS Readiness Checklist
This checklist covers CICS region configuration, program deployment, and transaction setup. It applies to every CICS program deployment — new programs and updates alike.
G.3.1 Region and Resource Definitions
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | PROGRAM definition installed in all target AORs (Chapter 13) | | |
| 2 | TRANSACTION definition installed with correct PROGRAM, TWASIZE, TASKDATALOC | | |
| 3 | MAPSET definition installed (if BMS maps are used) | | |
| 4 | DB2CONN and DB2ENTRY/DB2TRAN definitions are correct for this program's DB2 access | | |
| 5 | FILE definitions are correct (if VSAM files are accessed from CICS) | | |
| 6 | TDQUEUE definitions are correct (if transient data queues are used) | | |
| 7 | TSMODEL definitions are correct for any temporary storage usage, including shared TS if Sysplex-aware (Chapter 13) | | |
| 8 | ENQMODEL definitions are in place if EXEC CICS ENQ must serialize across regions (sysplex-wide ENQ) | | |
| 9 | WEBSERVICE or URIMAP definitions are correct for REST/SOAP services (Chapter 14) | | |
| 10 | PIPELINE definitions are installed and tested (if web service pipelines are used) | | |
G.3.2 Security Configuration
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 11 | RACF TCICSTRN profile protects the transaction ID (Chapter 16) | | |
| 12 | Surrogate user security is configured for API-driven transactions | | |
| 13 | CICS resource-level security is enabled for files, queues, and programs as required | | |
| 14 | Transaction class (TRANCLASS) is set correctly, limiting concurrent low-priority work so it cannot consume shared resources | | |
| 15 | SSL/TLS certificates are installed and valid for external-facing services | | |
G.3.3 Monitoring and Operations
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 16 | Transaction has been added to CICSPlex SM monitoring definitions (Chapter 13) | | |
| 17 | Response time thresholds are defined for alerting | | |
| 18 | Transaction has been included in workload management routing rules | | |
| 19 | CICS statistics will capture this transaction's performance data | | |
| 20 | Operations team has been briefed on the new/changed transaction and knows the escalation path | | |
| 21 | Runbook entry exists for this transaction's failure modes | | |
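For item 17, a percentile threshold is usually more useful than an average, because a few slow tasks can hide behind a healthy mean. This Python sketch assumes response times in milliseconds have already been extracted from CICS monitoring data; the 95th percentile and any threshold value passed in are illustrative choices, not CICS defaults.

```python
import statistics

def response_time_alert(samples_ms, threshold_ms, percentile=95):
    """Return (percentile_value, breached) for a list of response times in ms.

    Needs at least two samples. quantiles(n=100) returns the 99 cut points
    for percentiles 1..99, so index percentile - 1 is the requested one.
    """
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    value = cuts[percentile - 1]
    return value, value > threshold_ms
```

The same calculation, run once against the first production cycle, also gives the performance baseline that G.8 asks for.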
G.3.4 Testing and Validation
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 22 | Program tested in a CICS test region with production-equivalent configuration | | |
| 23 | Channel/container sizes validated (no truncation) (Chapter 15) | | |
| 24 | COMMAREA or channel data validated for boundary conditions (max length, empty, null) | | |
| 25 | If defined CONCURRENCY(THREADSAFE), program tested for thread safety: access to shared storage (CWA, shared GETMAIN areas, EXTERNAL data) is serialized | | |
| 26 | Recovery tested: transaction was abended mid-flight and resubmitted to verify clean recovery (Chapter 18) | | |
| 27 | XA recovery tested if program participates in distributed transactions | | |
G.4 Batch Readiness Checklist
This checklist covers batch jobs, JCL, scheduling, and operational procedures. Rob Calloway at CNB treats the nightly batch window like a space launch — everything on the checklist is verified before the window opens.
G.4.1 JCL Review
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | JCL passes the JCL checker with no errors | | |
| 2 | Job CLASS and MSGCLASS are correct for production | | |
| 3 | REGION and MEMLIMIT are set appropriately (Chapter 2) | | |
| 4 | TIME parameter is set (no TIME=NOLIMIT in production) | | |
| 5 | COND or IF/THEN/ELSE step-level condition checking is in place | | |
| 6 | DD statements reference production dataset names (not test or development) | | |
| 7 | DISP parameters are correct (especially DISP=(,CATLG,DELETE) for output datasets) | | |
| 8 | SMS classes are assigned correctly (data class, storage class, management class) (Chapter 4) | | |
| 9 | DCB parameters (BLKSIZE, LRECL, RECFM) are optimized for the access pattern (Chapter 26) | | |
| 10 | BUFNO is tuned for sequential processing performance | | |
| 11 | GDG references are correct (+1 for new, 0 for current) (Chapter 23) | | |
| 12 | Temporary datasets use system-managed allocation (no hardcoded VOL=SER) | | |
| 13 | JOBLIB/STEPLIB references production load libraries (not test) | | |
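Several JCL review items (4, 6, 12, 13) lend themselves to a simple pre-submission scan. The Python sketch below flags TIME=NOLIMIT or TIME=1440 (both mean no CPU time limit), hardcoded VOL=SER, and dataset names containing a non-production qualifier. It is a line-oriented approximation that ignores continuation rules and PROC expansion, and the `TEST` and `DEV` qualifiers are placeholders for your site's naming standard.

```python
import re

def lint_jcl(jcl_text, test_qualifiers=("TEST", "DEV")):
    """Return (line_number, message) findings for a few JCL review items."""
    findings = []
    for number, line in enumerate(jcl_text.splitlines(), start=1):
        if not line.startswith("//") or line.startswith("//*"):
            continue  # skip in-stream data and comment statements
        text = line[:71].upper()  # columns 72-80 are not part of the statement
        if re.search(r"\bTIME=\(?(NOLIMIT|1440)\b", text):
            findings.append((number, "TIME=NOLIMIT or TIME=1440 (no time limit)"))
        if re.search(r"\bVOL(UME)?=.*\bSER=", text):
            findings.append((number, "hardcoded VOL=SER"))
        for match in re.finditer(r"\bDSN(AME)?=([^,\s]+)", text):
            dsname = match.group(2)
            if dsname.startswith("&&"):
                continue  # temporary dataset
            qualifiers = dsname.split("(")[0].split(".")  # drop member/GDG suffix
            if any(q in test_qualifiers for q in qualifiers):
                findings.append((number, f"non-production dataset {dsname}"))
    return findings
```

A scan like this complements the JCL checker in item 1 rather than replacing it: the checker validates syntax, while this looks for choices that are syntactically valid but wrong for production.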
G.4.2 Checkpoint/Restart
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 14 | Checkpoint/restart logic is implemented for long-running programs (Chapter 24) | | |
| 15 | Checkpoint frequency is tuned (Chapter 24: balance between restart time and commit overhead) | | |
| 16 | Restart from checkpoint has been tested end-to-end | | |
| 17 | Checkpoint dataset is allocated with sufficient space and retention | | |
| 18 | Restart JCL or procedure is documented and available to operations | | |
| 19 | If using DB2, restart correctly repositions the cursor (Chapter 24) | | |
| 20 | If using GDGs, restart logic handles the "partially written generation" scenario | | |
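The core of items 14, 16, and 19 is that the checkpoint and the committed work move together, and that restart skips exactly what was committed. A Python sketch, assuming records arrive in key order (as from a cursor with ORDER BY) and that the hypothetical `commit` callback stands in for COMMIT plus writing the checkpoint key in the same unit of work. Skipping keys at or below the checkpoint plays the role of reopening the cursor with a `WHERE key > :last-key` predicate.

```python
def process_with_checkpoints(records, commit_every, checkpoint, apply, commit):
    """Process key-ordered records, committing every `commit_every` records.

    records: iterable of (key, payload) pairs sorted by key.
    checkpoint: last committed key from the checkpoint dataset, or None on a fresh start.
    apply(key, payload): hypothetical per-record update.
    commit(last_key): hypothetical COMMIT that also records last_key as the checkpoint.
    Returns the number of records processed in this run.
    """
    processed = 0
    since_commit = 0
    last_key = checkpoint
    for key, payload in records:
        if checkpoint is not None and key <= checkpoint:
            continue  # committed by an earlier run: reposition past it
        apply(key, payload)
        processed += 1
        since_commit += 1
        last_key = key
        if since_commit == commit_every:
            commit(last_key)
            since_commit = 0
    if since_commit:
        commit(last_key)  # final partial interval
    return processed
```

Work applied after the last commit is rolled back by DB2 when the job abends, which is why restart resumes immediately after the checkpoint key rather than after the last record the failed run touched. Testing that scenario deliberately is what item 16 asks for.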
G.4.3 Scheduling and Dependencies
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 21 | Job is registered in the enterprise scheduler (TWS/OPC, CA-7, Control-M) | | |
| 22 | Predecessor dependencies are correct and complete | | |
| 23 | Successor dependencies are correct: downstream jobs will trigger when this job completes | | |
| 24 | Resource requirements are defined (DB2 threads, initiator class, tape drives) | | |
| 25 | Expected run time is documented and alerting threshold is set | | |
| 26 | Critical path impact has been assessed: is this job on the critical path? (Chapter 23) | | |
| 27 | Month-end / quarter-end / year-end special processing is correctly conditioned | | |
| 28 | Holiday schedule variations are accounted for | | |
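Item 26 asks whether a job is on the critical path. Given expected run times (item 25) and predecessor dependencies (item 22), the critical path is the longest path through the dependency graph. A Python sketch using the standard library's `graphlib` (Python 3.9+), assuming every job, including every predecessor, has an entry in `durations`; where the scheduler reports its own critical path, prefer that.

```python
from graphlib import TopologicalSorter

def critical_path(durations, predecessors):
    """Longest path through a job dependency graph.

    durations: job name -> expected run time in minutes.
    predecessors: job name -> set of jobs that must finish first.
    Returns (total_minutes, [jobs on the critical path in run order]).
    """
    graph = {job: set(predecessors.get(job, ())) for job in durations}
    finish = {}  # earliest finish time of each job
    via = {}     # predecessor that determines each job's start time
    for job in TopologicalSorter(graph).static_order():
        preds = graph[job]
        start = max((finish[p] for p in preds), default=0)
        via[job] = max(preds, key=lambda p: finish[p]) if preds else None
        finish[job] = start + durations[job]
    last = max(finish, key=finish.get)
    total = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = via[last]
    return total, path[::-1]
```

Any delay to a job on the returned path delays the end of the batch window by the same amount; jobs off the path have slack, which is the difference between "this job is late" and "the window is at risk".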
G.4.4 Recovery and Operations
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 29 | Restart procedure is documented in the operations runbook (Chapter 27) | | |
| 30 | Common abend codes and their resolutions are documented | | |
| 31 | Escalation contacts are defined (Level 1 through Level 3) | | |
| 32 | Expected output (record counts, file sizes, completion messages) is documented so operations can verify successful completion | | |
| 33 | Fallback procedure exists if the job cannot complete within the batch window | | |
| 34 | Impact of skipping this job is documented (what breaks downstream?) | | |
G.5 Security Readiness Checklist
This checklist covers security configuration for the deployed application. Ahmad Rashidi at Pinnacle Health won't sign off on any deployment that hasn't passed this checklist — and he's right to insist.
G.5.1 RACF Profiles
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | Dataset profiles are defined with UACC(NONE) (Chapter 28) | | |
| 2 | Program access is restricted to authorized userids and groups | | |
| 3 | DB2 DSNR class profiles control subsystem connection access; if DB2 authorization is delegated to RACF, MDSNPN and MDSNPK profiles control plan and package access | | |
| 4 | CICS TCICSTRN profiles are defined for all new transaction IDs (Chapter 16) | | |
| 5 | MQ profiles (MQQUEUE, MQPROC, MQNLIST classes) are defined as needed (Chapter 19) | | |
| 6 | Started task userid is defined with minimum necessary privileges | | |
| 7 | No SPECIAL or OPERATIONS authority has been granted to application userids | | |
| 8 | RACF PERMIT commands use group-based access, not individual userid access | | |
| 9 | All profiles have been tested: authorized users can access, unauthorized users are denied | | |
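Items 1 and 8 can be checked against the RACF commands in a change package before anyone runs them. This Python sketch flags ADDSD and RDEFINE commands that do not explicitly specify UACC(NONE), and PERMIT commands that grant access to an ID outside a known group list. It assumes the commands are available as text and that the group list comes from your RACF database (for example, a LISTGRP extract); it is a text check only and does not parse every operand form.

```python
import re

# Full command names plus their usual abbreviations.
DEFINE_VERBS = {"ADDSD", "AD", "RDEFINE", "RDEF"}
PERMIT_VERBS = {"PERMIT", "PE"}

def lint_racf_commands(commands, known_groups):
    """Return (command, problem) pairs for UACC and PERMIT review items."""
    findings = []
    for cmd in commands:
        upper = cmd.upper()
        parts = upper.split()
        verb = parts[0] if parts else ""
        if verb in DEFINE_VERBS:
            uacc = re.search(r"\bUACC\((\w+)\)", upper)
            if not uacc or uacc.group(1) != "NONE":
                findings.append((cmd, "UACC is not explicitly NONE"))
        elif verb in PERMIT_VERBS:
            ids = re.search(r"\bID\(([^)]*)\)", upper)
            for ident in (ids.group(1).replace(",", " ").split() if ids else []):
                if ident not in known_groups:
                    findings.append((cmd, f"access granted to non-group ID {ident}"))
    return findings
```

Item 9 still applies after a clean scan: the only proof that a profile works is an authorized user getting in and an unauthorized one being refused.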
G.5.2 Encryption and Data Protection
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 10 | Sensitive data is encrypted at rest (z/OS dataset encryption, DB2 column encryption) (Chapter 28) | | |
| 11 | Sensitive data is encrypted in transit (AT-TLS, SSL/TLS for CICS web services) | | |
| 12 | Encryption key labels are defined in ICSF and access is restricted via CSFKEYS class | | |
| 13 | No sensitive data (SSNs, PANs, passwords) appears in log files, SYSOUT, or dump datasets | | |
| 14 | PCI-DSS cardholder data handling requirements are met (if applicable) | | |
| 15 | HIPAA ePHI data handling requirements are met (if applicable) | | |
G.5.3 Audit and Compliance
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 16 | SMF recording is enabled for all access types relevant to this application (Chapter 28) | | |
| 17 | Application-level audit trail captures who, what, when, where, and outcome | | |
| 18 | Audit records cannot be modified or deleted by application userids | | |
| 19 | Audit data retention meets regulatory requirements (7 years for SOX, as defined by policy) | | |
| 20 | Compliance team has reviewed and approved the security configuration | | |
G.6 Disaster Recovery Readiness Checklist
This checklist covers DR preparedness for the deployed application. Sandra Chen at FBA learned the hard way that a DR plan you haven't tested is a DR plan you don't have.
G.6.1 Backup and Replication
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | DB2 image copy schedule covers all tablespaces used by this application (Chapter 30) | | |
| 2 | VSAM backup schedule covers all clusters used by this application | | |
| 3 | Sequential datasets are included in DFSMShsm backup policies | | |
| 4 | Log datasets (DB2 active/archive logs, CICS journals) are replicated to DR site | | |
| 5 | If using GDPS, XRC or PPRC replication is verified for all application volumes | | |
| 6 | RPO (Recovery Point Objective) is documented and achievable with current replication lag | | |
G.6.2 Recovery Procedures
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 7 | RTO (Recovery Time Objective) is documented and has been validated | | |
| 8 | DB2 RECOVER procedure is documented and tested for all application tablespaces | | |
| 9 | CICS restart procedure at DR site (emergency restart, with cold start as the fallback) is documented | | |
| 10 | Batch restart procedure from DR site is documented | | |
| 11 | MQ channel configuration at DR site mirrors production | | |
| 12 | Network routing (DNS, VTAM, TCP/IP) at DR site is configured to receive application traffic | | |
G.6.3 DR Testing
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 13 | Application has been included in the most recent DR test | | |
| 14 | DR test confirmed that the application can start, process transactions, and produce correct results at the DR site | | |
| 15 | DR test results are documented with actual RTO and RPO achieved | | |
| 16 | Gaps identified during DR test have been remediated | | |
| 17 | Runbook for this application's DR failover has been updated within the last 6 months | | |
| 18 | Contact list for DR escalation is current | | |
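Item 15 asks for the RTO and RPO actually achieved, not the ones on paper. Measured from the moment of the (simulated) failure, achieved RTO is the time until service is restored, and achieved RPO is how far before the failure the last update that reached the DR site was written. A Python sketch, assuming those three timestamps were recorded during the test.

```python
from datetime import datetime, timedelta

def dr_test_result(failure_time, service_restored, last_replicated_update, rto, rpo):
    """Compute achieved RTO/RPO from DR test timestamps and compare with objectives.

    rto, rpo: the documented objectives, as timedelta values.
    """
    achieved_rto = service_restored - failure_time
    achieved_rpo = failure_time - last_replicated_update  # data-loss window
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }
```

Recording the achieved figures next to the objectives every test cycle shows the trend, which matters more than any single pass: replication lag and restart times tend to creep as volumes grow.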
G.7 Deployment Sign-Off
Before any production deployment, the following roles must sign off:
| Role | Name | Sign-Off | Date |
|------|------|----------|------|
| Development Lead | | | |
| Peer Reviewer | | | |
| DBA | | | |
| CICS Systems Programmer | | | |
| Security Administrator | | | |
| Operations / Batch Lead | | | |
| Business Owner / Product Owner | | | |
| Change Management | | | |
Deployment window: ________
Rollback procedure documented: Yes / No
Rollback tested: Yes / No
Estimated deployment time: ________
Post-deployment validation plan: ________
G.8 Post-Deployment Validation
After deployment, verify the following within the first production cycle:
| # | Item | Verified | Notes |
|---|------|----------|-------|
| 1 | Program executed successfully (check return codes) | | |
| 2 | Record counts match expectations | | |
| 3 | CICS transaction response times are within threshold | | |
| 4 | No unexpected DB2 lock escalations or timeouts | | |
| 5 | No unexpected abends or error messages in SYSLOG | | |
| 6 | Audit trail records are being written correctly | | |
| 7 | Monitoring alerts are functioning (test by simulating a threshold breach if feasible) | | |
| 8 | Downstream systems received expected data | | |
| 9 | Business users confirm correct behavior | | |
| 10 | Performance baseline captured for future comparison | | |
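Items 2 and 10 combine naturally: first reconcile the run's own control totals, then compare them with a captured baseline from a known-good run. A Python sketch, assuming the program's end-of-job message (item 32 of G.1) reports read, written, and rejected counts; the 10 percent tolerance is an illustrative default, not a standard.

```python
def reconcile(counts, baseline, tolerance_pct=10.0):
    """Post-deployment count checks; returns a list of problems (empty means pass).

    counts: this run's totals, e.g. {"read": ..., "written": ..., "rejected": ...}.
    baseline: the same totals from a known-good run.
    """
    problems = []
    # Internal balance: every record read must be either written or rejected.
    if counts["read"] != counts["written"] + counts["rejected"]:
        problems.append("read != written + rejected")
    # Drift against the baseline, as a percentage of the baseline value.
    for name, expected in baseline.items():
        actual = counts.get(name, 0)
        if expected == 0:
            drifted = actual != 0
        else:
            drifted = abs(actual - expected) / expected * 100 > tolerance_pct
        if drifted:
            problems.append(f"{name}: {actual} vs baseline {expected}")
    return problems
```

A balance failure is almost always a defect; a baseline drift may be legitimate (month-end volumes, a new feed), so it should prompt a question to the business owner rather than an automatic rollback.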
A note on checklist culture. Checklists work only if people actually use them, and people will only use them if they're maintained. A 200-item checklist that hasn't been updated in three years is worse than no checklist at all — it creates false confidence. Review these checklists quarterly. Remove items that no longer apply. Add items for every production incident that a checklist item could have prevented. The best checklists are living documents, scarred by experience.