Encoding for NLS
An encoding maps each character in a character set to a unique numeric representation, which results in a table of all code points. This table is referred to as a code page, which is an ordered set of characters in which a numeric index (code point value) is associated with each character. The position of a character on the code page determines its two-digit hexadecimal number.
For example, in the code page for the Windows Latin1 encoding, the row determines the first hexadecimal digit and the column determines the second. The numeric representation for the uppercase A is the hexadecimal number 41 (row 4, column 1), and the numeric representation for the equal sign (=) is the hexadecimal number 3D (row 3, column D).
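The row-and-column reading of a code page can be checked with a short sketch. It is written in Python rather than SAS, and it assumes that Python's cp1252 codec corresponds to the Windows Latin1 encoding described here. The function name code_point_digits is hypothetical and is introduced only for illustration.

```python
def code_point_digits(ch, encoding="cp1252"):
    """Split a single-byte character's code point into its two hex digits.

    Assumption: Python's "cp1252" codec stands in for Windows Latin1.
    Returns (first_digit, second_digit) as uppercase hex characters.
    """
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single-byte character in {encoding}")
    value = encoded[0]
    first_digit = format(value >> 4, "X")      # high four bits
    second_digit = format(value & 0x0F, "X")   # low four bits
    return first_digit, second_digit


for ch in ("A", "="):
    first, second = code_point_digits(ch)
    print(f"{ch!r}: first digit {first}, second digit {second} -> hex {first}{second}")
```

Under that assumption, the sketch reports 41 for the uppercase A and 3D for the equal sign, matching the values given above.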
A character set is the set of characters and symbols that are used by a language or group of languages. A character set includes national characters (which are characters specific to a particular nation or group of nations), special characters (such as punctuation marks), the unaccented Latin characters A-Z, the digits 0-9, and control characters that are needed by the computer.
An encoding method is a set of rules that assigns numeric representations to a set of characters. These rules govern the size of the encoding (the number of bits used to store the numeric representation of each character) and the ranges in the code page where characters appear. Encoding methods result from adherence to standards that have been developed in the computing industry. An encoding method is often specific to the computer hardware vendor.
An encoding results from applying an encoding method to a character set.
The same character can occupy different positions in different code pages. For example, the German uppercase letter Ä:
is represented as the hexadecimal number C4 in the Windows Latin1 code page (1252)
is represented as the hexadecimal number 4A in the German EBCDIC code page (1141)
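The difference between the two code pages can be reproduced with a minimal Python sketch. Python does not ship a codec for code page 1141, so the sketch uses cp273 (the German EBCDIC code page) as a stand-in. This is an assumption: 1141 is understood to be the euro-enabled variant of 273, and the two are expected to agree on the position of Ä. The helper name hex_code_point is hypothetical.

```python
def hex_code_point(ch, encoding):
    """Return the encoded bytes of ch as an uppercase hexadecimal string."""
    return ch.encode(encoding).hex().upper()


A_UMLAUT = "\u00C4"  # German uppercase letter Ä

# Windows Latin1 (code page 1252)
print("cp1252:", hex_code_point(A_UMLAUT, "cp1252"))

# German EBCDIC; cp273 is used here as a stand-in for code page 1141 (assumption)
print("cp273: ", hex_code_point(A_UMLAUT, "cp273"))
```

The same character yields a different byte value in each encoding, which is why data must be read with the encoding in which it was written.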
In the German EBCDIC code page, German is the character set and EBCDIC is the encoding method. In that code page layout, the column determines the first digit and the row determines the second digit.
Each SAS session is set to a default encoding, which can be specified by using various SAS language elements.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.