z/OS Resource Names and Encoding

The SAS language and various SAS user interfaces enable the user to specify the names associated with operating system (OS) resources such as z/OS data set names and UFS paths. SAS also displays OS resource names as part of SAS output, in messages to the SAS log, in various windows of the SAS windowing environment, and so on.

The z/OS resource names are maintained by z/OS as a sequence of binary code points, rather than as characters. In other words, on z/OS, the application programming interfaces do not associate a character encoding with z/OS resource names. The same is true for the user interfaces associated with z/OS components such as JES (JCL), ISPF, the UNIX System Services (USS) shell, and so on. When a z/OS resource name is created, the encoding used to specify the name is not stored or saved with the name itself. The following z/OS data set name illustrates how some code points are associated with different characters by different EBCDIC code pages:

PROD.ACCT#104.RAWDATA   (1)
PROD.ACCTÄ104.RAWDATA   (2)
DDDC4CCCE7FFF4DCECCEC   (3)
7964B1333B104B9164131   (4)

(1) The data set name as it is represented in EBCDIC 1047.
(2) The data set name as it is represented in EBCDIC 1143.
(3) The first (high-order) hexadecimal digit of each code point in the data set name.
(4) The second (low-order) hexadecimal digit of each code point in the data set name.

The tenth code point in the data set name is X’7B’. This code point corresponds to the # character in the U.S. English code page (EBCDIC 1047). However, the Finnish code page (EBCDIC 1143) maps the character Ä to the code point X’7B’. Therefore, the sequence of characters required to identify this OS resource is different in the EBCDIC 1047 and EBCDIC 1143 encodings.
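The relationship between characters and code points can be checked with a small sketch. To avoid depending on a platform codec for EBCDIC 1047 or 1143, it assumes hand-built, partial tables (the hypothetical helpers `invariant_chars`, `encode`, and `decode`) that cover only uppercase letters, digits, the period, and the X’7B’ code point discussed above. As the single pair of hexadecimal rows above indicates, the other characters in this name have the same code points in both code pages. The sketch encodes the name once, prints the high-order and low-order hexadecimal digits, and then decodes the same bytes with each table.

```python
# Hypothetical, partial EBCDIC tables: only the characters needed for this
# data set name. Letters, digits, and '.' share code points in both code pages;
# X'7B' is '#' in EBCDIC 1047 and 'Ä' in EBCDIC 1143.
def invariant_chars():
    table = {}
    for start, letters in ((0xC1, "ABCDEFGHI"), (0xD1, "JKLMNOPQR"), (0xE2, "STUVWXYZ")):
        for offset, ch in enumerate(letters):
            table[ch] = start + offset
    for offset, ch in enumerate("0123456789"):
        table[ch] = 0xF0 + offset
    table["."] = 0x4B
    return table


CP1047 = {**invariant_chars(), "#": 0x7B}
CP1143 = {**invariant_chars(), "Ä": 0x7B}


def encode(text, table):
    """Map each character to its code point in the given table."""
    return bytes(table[ch] for ch in text)


def decode(data, table):
    """Map each code point back to the character the table assigns to it."""
    reverse = {code: ch for ch, code in table.items()}
    return "".join(reverse[b] for b in data)


name = encode("PROD.ACCT#104.RAWDATA", CP1047)
digits = name.hex().upper()
print(digits[0::2])          # DDDC4CCCE7FFF4DCECCEC (high-order digits)
print(digits[1::2])          # 7964B1333B104B9164131 (low-order digits)
print(hex(name[9]))          # 0x7b, the tenth code point
print(decode(name, CP1047))  # PROD.ACCT#104.RAWDATA
print(decode(name, CP1143))  # PROD.ACCTÄ104.RAWDATA
```

The bytes never change; only the table used to turn them into characters does, which is the situation the two data set name rows describe.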

The difference in the sequence of characters is illustrated in the following example, which reads the external file residing in the z/OS data set. A portion of the SAS log is shown for functionally equivalent programs that were run with SAS 9.3 in the EBCDIC 1047 and EBCDIC 1143 encodings.

The following SAS log excerpt shows the EBCDIC 1047 encoding:

5    proc options option=encoding; run;

ENCODING=OPEN_ED-1047
   Specifies default encoding for internal processing of data

6    filename rawdata 'prod.acct#104.rawdata';   (1)
7    data acct;
8       account = 104;
9       infile rawdata;
10      attrib transdate format=date9. informat=date9.;
11      input transdate category $ amount;   (2)
12   run;

NOTE: The infile RAWDATA is:
      Dsname=PROD.ACCT#104.RAWDATA,   (3)
      Unit=3390,Volume=SDS012,Disp=SHR,Blksize=27920,
      Lrecl=80,Recfm=FB,Creation=2011/06/09

(1) The character # in the data set name is associated with the code point `X’7B’` in the OPEN_ED-1047 encoding.
(2) The character $ in the INPUT statement is part of SAS syntax. Its syntactic meaning is determined by the character, not by the code point.
(3) The character # in the data set name shown in the NOTE message is associated with the code point `X’7B’` in the OPEN_ED-1047 encoding.

The following SAS log excerpt shows the EBCDIC 1143 encoding:

5    proc options option=encoding; run; 

ENCODING=OPEN_ED-1143
   Specifies default encoding for internal processing of data

6    filename rawdata 'prod.acctÄ104.rawdata';   (1)
7    data acct;
8       account = 104;
9       infile rawdata;
10      attrib transdate format=date9. informat=date9.;
11      input transdate category $ amount;   (2)
12   run;

NOTE: The infile RAWDATA is:
      Dsname=PROD.ACCTÄ104.RAWDATA,   (3)
      Unit=3390,Volume=SDS012,Disp=SHR,Blksize=27920,
      Lrecl=80,Recfm=FB,Creation=2011/06/09

(1) The character Ä in the data set name is associated with the code point `X’7B’` in the OPEN_ED-1143 encoding.
(2) The character $ in the INPUT statement is part of SAS syntax. Its syntactic meaning is determined by the character, not by the code point.
(3) The character Ä in the data set name shown in the NOTE message is associated with the code point `X’7B’` in the OPEN_ED-1143 encoding.

In the preceding log excerpts, the same code point, X’7B’, is specified in the SAS program for the tenth character of the z/OS data set name, and SAS uses the same code point in the NOTE message to identify the data set that was read. However, the two log excerpts display different characters for this code point because the terminal emulator that displayed the first excerpt used a different encoding from the one that displayed the second.

SAS processes z/OS resource names differently from SAS syntax. When NONLSCOMPATMODE is in effect (NONLSCOMPATMODE is the default value of the NLSCOMPATMODE system option), SAS interprets SAS syntax according to the value of the ENCODING option. As a result, the same character can be specified in SAS syntax regardless of which SAS session encoding is in effect. For example, the $ syntax character in the INPUT statement is encoded as X’5B’ in the first log excerpt (EBCDIC 1047) and as X’67’ in the second log excerpt (EBCDIC 1143). Both SAS sessions nevertheless recognized these code points as the same character, $, because the ENCODING option told SAS how to interpret them. In NONLSCOMPATMODE, syntactic meaning is associated with the character, not with a particular code point.
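The following toy sketch models the order of operations described for NONLSCOMPATMODE: source bytes are first decoded with the session encoding, and syntax decisions are then made on characters. The hypothetical function `input_variable_types` is not how SAS parses an INPUT statement; it only illustrates the principle. The partial tables assume the $ code points given in the text (X’5B’ in EBCDIC 1047, X’67’ in EBCDIC 1143), plus lowercase letters and the space, which occupy the same positions in both code pages. The statement's semicolon is omitted to keep the tables small.

```python
# Hypothetical, partial EBCDIC tables: lowercase letters and the space share
# code points in both code pages; '$' is X'5B' in 1047 and X'67' in 1143.
def shared_chars():
    table = {" ": 0x40}
    for start, letters in ((0x81, "abcdefghi"), (0x91, "jklmnopqr"), (0xA2, "stuvwxyz")):
        for offset, ch in enumerate(letters):
            table[ch] = start + offset
    return table


CP1047 = {**shared_chars(), "$": 0x5B}
CP1143 = {**shared_chars(), "$": 0x67}


def encode(text, table):
    return bytes(table[ch] for ch in text)


def decode(data, table):
    reverse = {code: ch for ch, code in table.items()}
    return "".join(reverse[b] for b in data)


def input_variable_types(source, session_table):
    """Toy INPUT parser: decode with the session encoding, then act on characters."""
    tokens = decode(source, session_table).split()
    if not tokens or tokens[0] != "input":
        raise ValueError("expected an INPUT statement")
    types = {}
    for token in tokens[1:]:
        if token == "$":
            # '$' marks the preceding variable as a character variable.
            types[list(types)[-1]] = "character"
        else:
            types[token] = "numeric"
    return types


statement = "input transdate category $ amount"
src_1047 = encode(statement, CP1047)  # bytes typed in an EBCDIC 1047 session
src_1143 = encode(statement, CP1143)  # bytes typed in an EBCDIC 1143 session
print(src_1047 == src_1143)  # False: the '$' code points differ
print(input_variable_types(src_1047, CP1047))
print(input_variable_types(src_1143, CP1143))
```

Both calls return `{'transdate': 'numeric', 'category': 'character', 'amount': 'numeric'}`: the source bytes differ, but once each is decoded with its own session encoding, the parser sees the same characters.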

In contrast, because z/OS resource names have no inherent encoding, it is the string of code points that identifies the resource to the system, not the associated characters. The associated characters might vary depending on the encoding in which the SAS program is prepared.
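A lookup of this kind can be modeled as a table keyed by raw code points. In this sketch, `CATALOG` and `locate` are hypothetical stand-ins for the system catalog, and the partial tables reuse the code points shown earlier for this data set name. The only added fact is that EBCDIC 1047 maps Ä to X’63’, which is used to show a lookup that fails. A name typed in either session finds the data set only if the typed characters encode to the same code points.

```python
# Hypothetical, partial EBCDIC tables (uppercase letters, digits, '.', and the
# characters discussed in the text); these are not complete code pages.
def invariant_chars():
    table = {}
    for start, letters in ((0xC1, "ABCDEFGHI"), (0xD1, "JKLMNOPQR"), (0xE2, "STUVWXYZ")):
        for offset, ch in enumerate(letters):
            table[ch] = start + offset
    for offset, ch in enumerate("0123456789"):
        table[ch] = 0xF0 + offset
    table["."] = 0x4B
    return table


CP1047 = {**invariant_chars(), "#": 0x7B, "Ä": 0x63}
CP1143 = {**invariant_chars(), "Ä": 0x7B}

# Toy catalog: keys are raw code points; no encoding is stored with them.
CATALOG = {
    bytes.fromhex("D7D9D6C44BC1C3C3E37BF1F0F44BD9C1E6C4C1E3C1"): "Volume=SDS012",
}


def locate(typed_name, session_table):
    """Encode the typed characters with the session encoding, then look up the bytes."""
    key = bytes(session_table[ch] for ch in typed_name)
    return CATALOG.get(key)


print(locate("PROD.ACCT#104.RAWDATA", CP1047))  # Volume=SDS012
print(locate("PROD.ACCTÄ104.RAWDATA", CP1143))  # Volume=SDS012
print(locate("PROD.ACCTÄ104.RAWDATA", CP1047))  # None: Ä is X'63' in EBCDIC 1047
```

The third call types the "right-looking" characters for the Finnish session into a U.S. English session, so the code points, and therefore the resource identity, no longer match.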