Compatible and Incompatible Encodings

Overview of Compatible and Incompatible Encodings

ASCII is the foundation for most encodings, and is used by most personal computers, minicomputers, and workstations. However, the IBM mainframe uses an EBCDIC encoding. Therefore, ASCII and EBCDIC machines and data are incompatible. Transcoding is necessary if some or all characters in one encoding are different from the characters in the other encoding.
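The incompatibility can be seen by encoding the same text both ways. The following is a minimal sketch in Python, not SAS code. It assumes that Python's cp037 codec is an acceptable stand-in for an EBCDIC encoding; the variable names are illustrative only.

```python
# Illustration only: Python's cp037 codec stands in for an EBCDIC encoding.
text = "SAS"

ascii_bytes = text.encode("ascii")    # bytes as stored on an ASCII machine
ebcdic_bytes = text.encode("cp037")   # bytes as stored on an EBCDIC machine

print(ascii_bytes.hex())    # 534153
print(ebcdic_bytes.hex())   # e2c1e2

# Reading the EBCDIC bytes with an ASCII-based encoding yields the wrong characters.
misread = ebcdic_bytes.decode("latin_1")

# Transcoding: decode with the source encoding, then encode with the target encoding.
transcoded = ebcdic_bytes.decode("cp037").encode("ascii")
```

The same three characters are stored as different byte values in the two encodings, which is why the bytes must be transcoded rather than copied as-is.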
To avoid transcoding, you can create a data set and specify an encoding value that SAS will not transcode. If you specify one of the following values in the ENCODING= data set option, or in the INENCODING= or OUTENCODING= option of the LIBNAME statement, transcoding is not performed:
  • ANY specifies that no transcoding is desired, even between EBCDIC and ASCII encodings.
    Note: ANY is a synonym for binary. Because the data is binary, the actual encoding is irrelevant.
  • ASCIIANY enables you to create a data set that is compatible with all ASCII-based encodings.
  • EBCDICANY enables you to create a data set that is compatible with all EBCDIC-based encodings.
You might want to create a SAS data set that contains mixed encodings (for example, both Latin1 and Latin2) and keep the data from being transcoded for either input or output processing. By default, data is transcoded to the current session encoding.
Data must be transcoded when the SAS file and the SAS session use incompatible encodings, for example, ASCII and EBCDIC.
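To see why mixed-encoding data is best left untranscoded, consider a single byte value that means different things in Latin1 and Latin2. The following Python sketch is an illustration only, not SAS behavior. It assumes that Python's latin_1 and iso8859_2 codecs correspond to the Latin1 and Latin2 encodings mentioned above.

```python
# Illustration only: one byte value, two different characters.
raw = b"\xa3"

as_latin1 = raw.decode("latin_1")     # POUND SIGN (U+00A3)
as_latin2 = raw.decode("iso8859_2")   # LATIN CAPITAL LETTER L WITH STROKE (U+0141)

# Transcoding the whole data set from a single assumed encoding would be
# correct for one set of values and would corrupt the other.
same_character = (as_latin1 == as_latin2)
```

Because no single source encoding describes every value in such a data set, treating the data as untranscoded bytes preserves it exactly.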
In some cases, transcoding is not required because the SAS file and the SAS session have compatible encodings.
For a list of the encodings, by operating environment, see Encoding Values for a SAS Session.

Line-feed Characters and Transferring Data between EBCDIC and ASCII

Software that runs under ASCII operating environments requires that the end of a line be marked by the line-feed character. When data is transferred from z/OS to a machine that supports ASCII encodings, formatting problems can occur, particularly in HTML output, because the EBCDIC newline character is not recognized. SAS supports two sets of EBCDIC-based encodings for z/OS:
  • The encodings that have EBCDIC in their names use the traditional mapping of EBCDIC line-feed to ASCII line-feed character, which can cause data to appear as one stream.
  • The encodings that have Open Edition in their names use the line-feed character as the end-of-line character. When the data is transferred to an operating environment that uses ASCII, the EBCDIC newline character maps to an ASCII line-feed character. This mapping enables ASCII applications to interpret the end-of-line correctly, resulting in better formatting.
For a list of the encodings, by operating environment, see Encoding Values for a SAS Session.
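The effect of the two mappings can be sketched in Python. This is an illustration under stated assumptions, not the SAS implementation: Python's cp037 codec stands in for a traditional EBCDIC encoding, and the Open Edition behavior is approximated by swapping the new-line (0x15) and line-feed (0x25) bytes before decoding. The names record and swap_nl_lf are hypothetical.

```python
# Illustration only: two lines separated by the EBCDIC new-line character (0x15).
record = "line one".encode("cp037") + b"\x15" + "line two".encode("cp037")

# Traditional mapping: 0x15 decodes to U+0085 (NEL), not to line feed (U+000A).
traditional = record.decode("cp037")

# Approximation of the Open Edition mapping: swap 0x15 and 0x25 first,
# so that the EBCDIC new-line character ends up as a line feed.
swap_nl_lf = bytes.maketrans(b"\x15\x25", b"\x25\x15")
open_edition_style = record.translate(swap_nl_lf).decode("cp037")

# An application that splits only on line feed sees one line versus two.
print(len(traditional.split("\n")))         # 1
print(len(open_edition_style.split("\n")))  # 2
```

With the traditional mapping, software that looks only for line feed treats the text as one continuous stream. With the swapped mapping, the line boundary is recognized.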

EBCDIC and OpenEdition Encodings Are Compatible

EBCDIC and OpenEdition are compatible encodings.
Encodings that contain EBCDIC in their names use the traditional mapping of EBCDIC line-feed (0x25) and new-line (0x15) characters.
Encodings that contain OPEN_ED in their names and OpenEdition in their descriptions switch the mapping of the new-line and line-feed characters. That is, they use the line-feed character as the end-of-line character.
If the two encodings use the same code page number but one is EBCDIC and the other is Open Edition, no transcoding is necessary.
Example:
If the data is encoded in EBCDIC1143 and the SAS session is encoded in OPEN_ED-1143, no transcoding is necessary because they use the same 1143 code page.
To transfer data between ASCII and EBCDIC, you can specify Open Edition encodings from the list of compatible encodings.
Note: Open Edition encodings are used by default in NONLSCOMPATMODE.

Some East Asian MBCS Encodings Are Compatible

Some East Asian double-byte character set (DBCS) encodings are compatible with one another. Each line in the following list contains a group of compatible encodings:
  • SHIFT-JIS, MS-932, IBM-942, MACOS-1
  • MS-949, MACOS-3, EUC-KR
  • EUC-CN, MS-936, MACOS-25, DEC-CN
  • EUC-TW, DEC-TW
  • MS-950, MACOS-2, BIG5
If the SAS session is encoded in one of the encodings in a group and the data set is encoded in a different encoding from the same group, then no transcoding occurs.
Example:
If the session encoding is SHIFT-JIS and the data set encoding is IBM-942, then no transcoding occurs.
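The group rule can be expressed as a simple lookup. The following Python sketch is an illustration only. The function name needs_transcoding is hypothetical, and the table covers only the East Asian groups listed above. It does not model ANY, ASCIIANY, EBCDICANY, or the EBCDIC and Open Edition pairs.

```python
# Illustration only: the compatibility groups from the list above.
COMPATIBLE_GROUPS = [
    {"SHIFT-JIS", "MS-932", "IBM-942", "MACOS-1"},
    {"MS-949", "MACOS-3", "EUC-KR"},
    {"EUC-CN", "MS-936", "MACOS-25", "DEC-CN"},
    {"EUC-TW", "DEC-TW"},
    {"MS-950", "MACOS-2", "BIG5"},
]


def needs_transcoding(session_encoding: str, data_encoding: str) -> bool:
    """Return False when both encodings are identical or share a group."""
    session = session_encoding.upper()
    data = data_encoding.upper()
    if session == data:
        return False
    # Compatible only if a single group contains both encodings.
    return not any(session in group and data in group
                   for group in COMPATIBLE_GROUPS)


print(needs_transcoding("SHIFT-JIS", "IBM-942"))  # False
print(needs_transcoding("SHIFT-JIS", "EUC-KR"))   # True
```

The first call reproduces the example above: SHIFT-JIS and IBM-942 are in the same group, so no transcoding is needed. The second call crosses groups, so transcoding is required.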