Appendix J: EBCDIC and ASCII Reference
Overview
IBM mainframes use EBCDIC (Extended Binary Coded Decimal Interchange Code) as their native character encoding, while virtually every other computing platform uses ASCII (American Standard Code for Information Interchange) or its superset UTF-8. This difference has significant implications for COBOL programmers: sort orders change, string comparisons produce different results, data must be converted when transferred between platforms, and numeric data storage follows different conventions. This appendix provides a comprehensive reference for understanding and working with EBCDIC in the context of COBOL programming.
1. Character Encoding Fundamentals
EBCDIC is an 8-bit encoding (code points 0 to 255), while ASCII is a 7-bit encoding (code points 0 to 127) that is normally stored one character per byte. The two encodings assign different code points to the same characters: a byte containing X'C1' represents the letter "A" in EBCDIC, whereas ASCII places "A" at X'41'. X'C1' lies outside 7-bit ASCII altogether; in the common ISO 8859-1 extension it is "A" with an acute accent.
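The mismatch is easy to observe from the distributed side. The following Python sketch assumes code page 037 (Python's `cp037` codec) as the EBCDIC variant; the variable names are illustrative only.

```python
# Encode the same character under EBCDIC (code page 037 assumed) and ASCII.
ebcdic_a = "A".encode("cp037")
ascii_a = "A".encode("ascii")

print(ebcdic_a.hex().upper())  # C1
print(ascii_a.hex().upper())   # 41

# The byte X'C1' read back under two different encodings.
as_ebcdic = b"\xC1".decode("cp037")    # the letter A
as_latin1 = b"\xC1".decode("latin-1")  # A with an acute accent (ISO 8859-1)

# Strict 7-bit ASCII has no character at X'C1' at all.
try:
    b"\xC1".decode("ascii")
    ascii_defined = True
except UnicodeDecodeError:
    ascii_defined = False
```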
EBCDIC Origins
EBCDIC was developed by IBM in the early 1960s for the System/360 mainframe family. Its design reflects the punch card encoding system that preceded it: the code point layout maps to the zone and digit punches of the 80-column Hollerith card. This heritage explains some of EBCDIC's unusual characteristics, such as the non-contiguous letter ranges and the placement of digits after letters in the collating sequence.
ASCII Origins
ASCII was developed by the American Standards Association (now ANSI) in 1963 as a standard for data communication. Its 7-bit design (128 characters, later extended to 256 with various code pages) places characters in a more intuitive order and has become the foundation for Unicode and UTF-8.
2. Collating Sequence Differences
The most important difference between EBCDIC and ASCII for COBOL programmers is the collating sequence -- the order in which characters sort. This affects SORT statements, conditional comparisons (IF A < B), EVALUATE ranges, and any logic that depends on character ordering.
EBCDIC Collating Order (Low to High)
1. Special characters: space, period, comma, etc.
2. Lowercase letters: a-i, j-r, s-z
3. Uppercase letters: A-I, J-R, S-Z
4. Digits: 0-9
ASCII Collating Order (Low to High)
1. Special characters: space, most punctuation
2. Digits: 0-9
3. Uppercase letters: A-Z
4. Lowercase letters: a-z
Key Differences
| Comparison | EBCDIC Result | ASCII Result |
|---|---|---|
| 'a' vs 'A' | 'a' < 'A' (lowercase sorts first) | 'a' > 'A' (uppercase sorts first) |
| '1' vs 'A' | '1' > 'A' (digits sort after letters) | '1' < 'A' (digits sort before letters) |
| '1' vs 'a' | '1' > 'a' (digits sort after letters) | '1' < 'a' (digits sort before letters) |
| SPACE vs '0' | SPACE < '0' (same in both) | SPACE < '0' (same in both) |
Impact on COBOL Programs
SORT statement: A SORT on a character field produces different results on EBCDIC and ASCII systems. A file sorted alphabetically on a mainframe (EBCDIC) will have lowercase names before uppercase names, and all names before numeric codes. The same file sorted on a GnuCOBOL system (ASCII) will have uppercase names before lowercase names, and numeric codes before all names.
* These records sort differently on EBCDIC vs ASCII:
* EBCDIC order: apple, Baker, CHARLIE, 123ABC
* ASCII order: 123ABC, Baker, CHARLIE, apple
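The two orderings can be reproduced outside COBOL by sorting on the bytes each name would occupy in storage. This is a small Python sketch, assuming code page 037 for EBCDIC; it imitates a byte-wise sort and is not how either COBOL runtime is implemented.

```python
records = ["CHARLIE", "123ABC", "apple", "Baker"]

# Sort by the bytes each name would have in each encoding.
ebcdic_order = sorted(records, key=lambda s: s.encode("cp037"))
ascii_order = sorted(records, key=lambda s: s.encode("ascii"))

print(ebcdic_order)  # ['apple', 'Baker', 'CHARLIE', '123ABC']
print(ascii_order)   # ['123ABC', 'Baker', 'CHARLIE', 'apple']
```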
IF/EVALUATE comparisons: Conditions like IF WS-CODE < "A" have different meanings depending on the platform. On EBCDIC, digits are greater than "A"; on ASCII, digits are less than "A".
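INSPECT statement: INSPECT TALLYING and INSPECT REPLACING (including their BEFORE/AFTER phrases) match characters by equality, so they do not depend on collating order by themselves. They become platform-dependent when their operands are written as hexadecimal literals (for example, X'C1' for "A") or when the surrounding logic assumes that a range of characters is contiguous.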
COBOL ALPHABET Clause
To ensure consistent behavior across platforms, COBOL provides the ALPHABET clause in the SPECIAL-NAMES paragraph:
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SPECIAL-NAMES.
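*> STANDARD-1 is the standard alphabet name for ASCII;
*> some compilers, such as GnuCOBOL, also accept the word ASCII.
ALPHABET WS-ASCII IS STANDARD-1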
ALPHABET WS-EBCDIC IS EBCDIC.
The SORT statement can specify which collating sequence to use:
SORT SORT-FILE
ON ASCENDING KEY SORT-NAME
COLLATING SEQUENCE IS WS-ASCII
USING INPUT-FILE
GIVING OUTPUT-FILE.
3. Common Character Code Point Table
The following table shows code points for commonly used characters in both encodings.
Control Characters
| Character | EBCDIC Hex | ASCII Hex | Notes |
|---|---|---|---|
| Null (NUL) | X'00' | X'00' | LOW-VALUE in COBOL |
| Space | X'40' | X'20' | SPACE figurative constant |
| Line feed (LF) | X'25' | X'0A' | EBCDIC also defines a separate new-line (NL) control at X'15' |
| Carriage Return | X'0D' | X'0D' | Same in both |
| Tab (HT) | X'05' | X'09' | Horizontal tab |
Digits
| Character | EBCDIC Hex | ASCII Hex |
|---|---|---|
| 0 | X'F0' | X'30' |
| 1 | X'F1' | X'31' |
| 2 | X'F2' | X'32' |
| 3 | X'F3' | X'33' |
| 4 | X'F4' | X'34' |
| 5 | X'F5' | X'35' |
| 6 | X'F6' | X'36' |
| 7 | X'F7' | X'37' |
| 8 | X'F8' | X'38' |
| 9 | X'F9' | X'39' |
Uppercase Letters
| Character | EBCDIC Hex | ASCII Hex |
|---|---|---|
| A | X'C1' | X'41' |
| B | X'C2' | X'42' |
| C | X'C3' | X'43' |
| ... | ... | ... |
| I | X'C9' | X'49' |
| J | X'D1' | X'4A' |
| K | X'D2' | X'4B' |
| ... | ... | ... |
| R | X'D9' | X'52' |
| S | X'E2' | X'53' |
| T | X'E3' | X'54' |
| ... | ... | ... |
| Z | X'E9' | X'5A' |
Lowercase Letters
| Character | EBCDIC Hex | ASCII Hex |
|---|---|---|
| a | X'81' | X'61' |
| b | X'82' | X'62' |
| c | X'83' | X'63' |
| ... | ... | ... |
| i | X'89' | X'69' |
| j | X'91' | X'6A' |
| k | X'92' | X'6B' |
| ... | ... | ... |
| r | X'99' | X'72' |
| s | X'A2' | X'73' |
| t | X'A3' | X'74' |
| ... | ... | ... |
| z | X'A9' | X'7A' |
Special Characters
| Character | EBCDIC Hex | ASCII Hex |
|---|---|---|
| . (period) | X'4B' | X'2E' |
| , (comma) | X'6B' | X'2C' |
| $ (dollar) | X'5B' | X'24' |
| / (slash) | X'61' | X'2F' |
| - (hyphen) | X'60' | X'2D' |
| + (plus) | X'4E' | X'2B' |
| * (asterisk) | X'5C' | X'2A' |
| = (equals) | X'7E' | X'3D' |
| ( (left paren) | X'4D' | X'28' |
| ) (right paren) | X'5D' | X'29' |
| ' (apostrophe) | X'7D' | X'27' |
| " (quote) | X'7F' | X'22' |
Note on EBCDIC letter gaps: In EBCDIC, the letters are not contiguous. A through I (X'C1'--X'C9') are followed by a gap, then J through R (X'D1'--X'D9'), another gap, then S through Z (X'E2'--X'E9'). The same pattern applies to lowercase. This means you cannot reliably use the range X'C1' through X'E9' to represent "all uppercase letters": the gaps contain non-letter characters such as the closing brace (X'D0') and the backslash (X'E0') in code page 037. The COBOL class tests (ALPHABETIC, ALPHABETIC-UPPER, ALPHABETIC-LOWER) handle this correctly regardless of platform.
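To make the gaps concrete, the sketch below decodes every code point from X'C1' through X'E9' and separates the letters from everything else. It assumes code page 037; other EBCDIC code pages may place different characters in the gaps.

```python
import string

# Every code point from X'C1' ("A") through X'E9' ("Z"), decoded as
# EBCDIC code page 037 (an assumption, see above).
span = bytes(range(0xC1, 0xE9 + 1)).decode("cp037")

letters = [ch for ch in span if ch in string.ascii_uppercase]
others = [ch for ch in span if ch not in string.ascii_uppercase]

print(len(span), len(letters), len(others))  # 41 26 15
print("}" in others, "\\" in others)         # True True
```

A range test over these 41 code points therefore accepts 15 characters that are not uppercase English letters.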
4. Zoned Decimal Representation in EBCDIC
Numeric data items with USAGE DISPLAY (the default) are stored as zoned decimal in EBCDIC. Each digit occupies one byte, with the zone nibble (high 4 bits) set to X'F' and the digit nibble (low 4 bits) containing the numeric value.
Unsigned Zoned Decimal
For unsigned data (PIC 9(n)), each byte follows the pattern X'Fd' where d is the digit value:
PIC 9(5) containing 12345:
Byte 1: X'F1' (digit 1)
Byte 2: X'F2' (digit 2)
Byte 3: X'F3' (digit 3)
Byte 4: X'F4' (digit 4)
Byte 5: X'F5' (digit 5)
Signed Zoned Decimal (Sign Overpunch)
For signed data (PIC S9(n) USAGE DISPLAY), the sign is encoded in the zone nibble of the last byte. This is called sign overpunch because the sign "overwrites" the zone of the last digit:
| Sign | Zone Nibble | Example: +3 | Example: -3 |
|---|---|---|---|
| Positive | X'C' | X'C3' | -- |
| Negative | X'D' | -- | X'D3' |
| Unsigned | X'F' | X'F3' | -- |
PIC S9(5) containing +12345:
Byte 1: X'F1'
Byte 2: X'F2'
Byte 3: X'F3'
Byte 4: X'F4'
Byte 5: X'C5' <-- zone is C (positive), digit is 5
PIC S9(5) containing -12345:
Byte 1: X'F1'
Byte 2: X'F2'
Byte 3: X'F3'
Byte 4: X'F4'
Byte 5: X'D5' <-- zone is D (negative), digit is 5
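A program on the receiving side can decode this layout directly from the raw bytes. The function below is a minimal Python sketch of that idea; `decode_zoned` is a hypothetical helper name, and it handles only the trailing-overpunch layout described above.

```python
def decode_zoned(field: bytes) -> int:
    """Decode an EBCDIC zoned-decimal field with a trailing sign overpunch."""
    # Every byte except the last must carry the plain zone X'F'.
    for byte in field[:-1]:
        if byte >> 4 != 0xF:
            raise ValueError(f"unexpected zone in X'{byte:02X}'")
    digits = []
    for byte in field:
        digit = byte & 0x0F  # low nibble holds the digit
        if digit > 9:
            raise ValueError(f"invalid digit nibble in X'{byte:02X}'")
        digits.append(str(digit))
    value = int("".join(digits))
    # The zone (high nibble) of the last byte carries the sign.
    sign_zone = field[-1] >> 4
    if sign_zone == 0xD:
        return -value
    if sign_zone in (0xC, 0xF):
        return value
    raise ValueError(f"unexpected sign zone X'{sign_zone:X}'")


print(decode_zoned(bytes.fromhex("F1F2F3F4C5")))  # 12345
print(decode_zoned(bytes.fromhex("F1F2F3F4D5")))  # -12345
```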
SIGN IS LEADING and SIGN IS SEPARATE
COBOL allows alternative sign storage:
01 WS-AMOUNT PIC S9(5) SIGN IS LEADING.
*> Sign overpunch on FIRST byte instead of last
01 WS-AMOUNT PIC S9(5) SIGN IS LEADING SEPARATE.
*> Sign stored as separate byte: "+" (X'4E') or "-" (X'60')
*> Total size: 6 bytes (1 sign + 5 digits)
01 WS-AMOUNT PIC S9(5) SIGN IS TRAILING SEPARATE.
*> Sign stored as separate byte after the last digit
*> Total size: 6 bytes (5 digits + 1 sign)
The SEPARATE option stores the sign as a distinct character (+ or -) rather than overpunching a digit, making the data human-readable in dumps and displays but adding one byte to the field size.
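Because a separate sign is an ordinary character, such a field survives plain character translation and can be parsed as text. The following Python sketch illustrates this, assuming code page 037; `decode_sign_separate` is a hypothetical helper, and it is more lenient than a COBOL NUMERIC test because `int()` tolerates surrounding whitespace.

```python
def decode_sign_separate(field: bytes) -> int:
    """Decode an EBCDIC DISPLAY field with SIGN IS LEADING or TRAILING SEPARATE."""
    text = field.decode("cp037")  # code page 037 assumed
    if text[-1] in "+-":
        # Trailing sign: move it to the front so int() accepts it.
        text = text[-1] + text[:-1]
    return int(text)


leading = bytes.fromhex("4EF1F2F3F4F5")   # "+12345"
trailing = bytes.fromhex("F1F2F3F4F560")  # "12345-"
print(decode_sign_separate(leading))   # 12345
print(decode_sign_separate(trailing))  # -12345
```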
5. Packed Decimal (COMP-3) Representation
Packed decimal storage is the same on EBCDIC and ASCII platforms because it is a numeric encoding, not a character encoding. Each byte stores two decimal digits (one in each nibble), with the sign in the low nibble of the last byte.
PIC S9(7) COMP-3 containing +1234567:
Total nibbles needed: 7 digits + 1 sign = 8 nibbles = 4 bytes
Byte 1: X'12' (digits 1, 2)
Byte 2: X'34' (digits 3, 4)
Byte 3: X'56' (digits 5, 6)
Byte 4: X'7C' (digit 7, sign C=positive)
PIC S9(7) COMP-3 containing -1234567:
Byte 1: X'12'
Byte 2: X'34'
Byte 3: X'56'
Byte 4: X'7D' (digit 7, sign D=negative)
Sign nibble values in packed decimal:
- X'C' = positive (preferred)
- X'D' = negative
- X'F' = unsigned (treated as positive)
- X'A', X'E' = also treated as positive (less common)
- X'B' = also treated as negative (less common)
- Any other value (0-9) in the sign nibble causes an S0C7 ABEND
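The nibble layout and sign rules above translate into a short decoder. This is a Python sketch rather than a reference implementation; `unpack_comp3` is a hypothetical name, and raising `ValueError` for a bad nibble is only a loose stand-in for the data exception a mainframe would raise.

```python
def unpack_comp3(field: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into a Python int."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    # The final nibble is the sign; everything before it is a digit.
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    value = int("".join(str(d) for d in digits)) if digits else 0
    if sign in (0xA, 0xC, 0xE, 0xF):
        return value
    if sign in (0xB, 0xD):
        return -value
    raise ValueError(f"invalid sign nibble X'{sign:X}'")


print(unpack_comp3(bytes.fromhex("1234567C")))  # 1234567
print(unpack_comp3(bytes.fromhex("1234567D")))  # -1234567
```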
6. Data Conversion Between Platforms
When transferring data between mainframe (EBCDIC) and distributed (ASCII) systems, character data must be converted. This conversion affects COBOL programs that:
- Process files received from or sent to distributed systems
- Communicate with APIs and web services (which use ASCII/UTF-8)
- Run on GnuCOBOL (ASCII) but process data designed for mainframe (EBCDIC) formats
Character Data Conversion
Character fields (PIC X, PIC A) require byte-by-byte conversion using a translation table that maps each EBCDIC code point to its ASCII equivalent and vice versa.
IBM provides conversion through:
- FTP with translation: FTP's ASCII transfer mode performs EBCDIC-to-ASCII conversion automatically
- ICONV utility: z/OS UNIX System Services provides the iconv command for character set conversion
- COBOL INSPECT CONVERTING: Can perform custom character translation within a program
- z/OS Connect EE: Automatically converts data encoding when exposing COBOL services as REST APIs
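The translation-table approach can be sketched in a few lines of Python. The example assumes code page 037 on the EBCDIC side and ISO 8859-1 (an ASCII superset) on the other; `EBCDIC_TO_LATIN1` and `ebcdic_to_ascii` are illustrative names, and real transfers should use the CCSID the data was actually written in.

```python
# Build a 256-entry table: index = EBCDIC byte, value = ISO 8859-1 byte.
# Code page 037 is assumed here.
EBCDIC_TO_LATIN1 = bytes(range(256)).decode("cp037").encode("latin-1")


def ebcdic_to_ascii(field: bytes) -> bytes:
    """Translate a character (PIC X) field byte by byte."""
    return field.translate(EBCDIC_TO_LATIN1)


print(ebcdic_to_ascii(bytes.fromhex("C8C5D3D3D640F1F2F3")))  # b'HELLO 123'
```

Such a table is only safe for character fields; applying it to packed or binary fields changes their numeric value.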
Numeric Data Considerations
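- COMP-3 (packed decimal): No digit conversion is needed because the nibble layout is platform-independent, but the field must be transferred as binary. Character translation applied to these bytes corrupts the value.
- COMP/BINARY: Mainframes are big-endian (most significant byte first); most distributed systems are little-endian (least significant byte first). Byte order must be reversed when the receiving program reads the field in native little-endian form.
- DISPLAY numeric (zoned decimal): Requires conversion because the zone nibbles differ between EBCDIC (X'F') and ASCII (X'3'). For unsigned fields, ordinary character translation performs this conversion.
- Sign overpunch: Character translation turns an overpunched byte into a letter or brace (for example, X'C5' becomes "E"), and ASCII-based compilers do not necessarily use the same convention for signed DISPLAY fields. Data with trailing sign overpunch should be converted to a separate sign representation before transfer.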
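Two of these pitfalls can be demonstrated with raw bytes. The Python sketch below assumes code page 037 and uses made-up sample values; it shows what character translation does to an overpunched field and how byte order changes the value of a binary halfword.

```python
# A signed zoned field holding -12345 (last byte X'D5') pushed through
# ordinary character translation: the overpunched byte becomes a letter.
overpunched = bytes.fromhex("F1F2F3F4D5")
translated = overpunched.decode("cp037")
print(translated)  # 1234N

# A 2-byte COMP field holding 258 (X'0102'), read with each byte order.
halfword = bytes.fromhex("0102")
as_big_endian = int.from_bytes(halfword, "big", signed=True)
as_little_endian = int.from_bytes(halfword, "little", signed=True)
print(as_big_endian, as_little_endian)  # 258 513
```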
Practical Conversion in COBOL
* Convert EBCDIC zoned decimal to a portable format
* before sending to a distributed system:
MOVE WS-EBCDIC-AMOUNT TO WS-NUMERIC-WORK
MOVE WS-NUMERIC-WORK TO WS-ASCII-DISPLAY
* WS-ASCII-DISPLAY is PIC -(9)9.99
* which produces a readable signed decimal string
For file transfers, it is generally better to convert numeric fields to character representation (edited numeric or string formats like JSON) rather than attempting binary-level conversion. The JSON GENERATE statement converts numeric items to text as part of producing the JSON document; note that it is an IBM Enterprise COBOL extension (also implemented by some other compilers), not part of the COBOL 2014 standard.
7. HIGH-VALUE and LOW-VALUE Across Platforms
By default, the figurative constants HIGH-VALUE and LOW-VALUE have the same byte values on EBCDIC and ASCII systems, and they represent the highest and lowest positions in the native collating sequence:
| Constant | EBCDIC | ASCII | Meaning |
|---|---|---|---|
| LOW-VALUE | X'00' | X'00' | Lowest possible byte value (same on both) |
| HIGH-VALUE | X'FF' | X'FF' | Highest possible byte value (same on both) |
Although the byte values are the same, the characters they represent differ. X'FF' in EBCDIC is an unassigned code point; X'FF' in ASCII (extended) is the character y with a diaeresis. In practice, HIGH-VALUE and LOW-VALUE are used as sentinel values and should never be displayed or printed.
COBOL usage pattern (works on both platforms):
* Mark end-of-data sentinel
MOVE HIGH-VALUES TO WS-PREV-KEY
* Test for initialized state
IF WS-PREV-KEY = HIGH-VALUES
MOVE INPUT-KEY TO WS-PREV-KEY
END-IF
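The reason the sentinel pattern is portable is that X'00' and X'FF' bracket every ordinary key under a byte-wise comparison in either encoding. The following Python sketch checks that for a few sample keys, assuming code page 037 for EBCDIC; the key values are made up for illustration.

```python
KEY_LENGTH = 5
HIGH_VALUES = b"\xFF" * KEY_LENGTH
LOW_VALUES = b"\x00" * KEY_LENGTH

sample_keys = ["ZZZZZ", "zzzzz", "99999", "     "]

# Every key, in either encoding, falls strictly between the two sentinels.
in_range = all(
    LOW_VALUES < key.encode(encoding) < HIGH_VALUES
    for encoding in ("cp037", "ascii")
    for key in sample_keys
)
print(in_range)  # True
```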
8. Practical Recommendations for Cross-Platform COBOL
- Use IS NUMERIC and the alphabetic class tests (IS ALPHABETIC, IS ALPHABETIC-UPPER, IS ALPHABETIC-LOWER) rather than range comparisons (IF WS-CHAR >= "A" AND <= "Z") that depend on contiguous character ranges. EBCDIC has gaps between letter groups that make range tests unreliable.
- Use FUNCTION UPPER-CASE and FUNCTION LOWER-CASE instead of INSPECT CONVERTING with literal character ranges. The intrinsic functions handle both EBCDIC and ASCII correctly.
- Use COMP-3 for numeric data in shared files. Packed decimal is encoding-independent, avoiding the zoned decimal conversion issues, provided the transfer does not apply character translation to those bytes.
- Specify COLLATING SEQUENCE on SORT when consistent sort order across platforms is required.
- Use JSON GENERATE/PARSE for data exchange between COBOL and other systems where the compiler supports them. JSON text is Unicode (normally UTF-8), and these statements, which are IBM Enterprise COBOL extensions rather than part of the COBOL 2014 standard, perform the encoding conversion.
- Test on both platforms if your programs must run on both mainframe (EBCDIC) and distributed (ASCII) environments. GnuCOBOL operates in ASCII mode by default, which can expose collating sequence assumptions in code ported from the mainframe.
- Document encoding assumptions. When a program depends on a specific character encoding (e.g., sign overpunch processing, hex literal comparisons), document this dependency clearly so that future maintainers are aware of the platform dependency.
- Be aware of code page variants. EBCDIC has many code pages (CCSID values). US English typically uses CCSID 037 or 1140. International systems may use different EBCDIC code pages where special characters (brackets, braces, exclamation mark) have different code points. Letters and digits keep the same code points across these variants, so problems usually surface only in data or source code that contains such special characters.
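The code page differences can be seen by encoding a few characters under two EBCDIC variants. This Python sketch compares code page 037 with code page 500 (an international EBCDIC variant that happens to ship with Python as `cp500`); the choice of 500 as the second code page is purely for illustration.

```python
codes = {}
for ch in "[]!A":
    codes[ch] = (
        ch.encode("cp037").hex().upper(),
        ch.encode("cp500").hex().upper(),
    )
    print(ch, *codes[ch])
# [ BA 4A
# ] BB 5A
# ! 5A 4F
# A C1 C1
```

The brackets and the exclamation mark move between the two code pages, while the letter "A" stays at X'C1'.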