An encoding maps
each character in a character set to a unique numeric representation,
which results in a table of all code points. This table is referred
to as a code page, which is an ordered set of characters in which
a numeric index (code point value) is associated with each character.
The position of a character on the code page determines its two-digit
hexadecimal number.
For example, the
following is the code page for the Windows Latin1 encoding. In the
following example, the row determines the first digit and the column
determines the second digit. The numeric representation for the uppercase
A is the hexadecimal number 41, and the numeric representation for
the equal sign (=) is the hexadecimal number 3D.
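The row-and-column lookup described above can be checked with a short Python sketch. It assumes that Python's built-in `cp1252` codec corresponds to the Windows Latin1 code page; the helper name `code_page_position` is hypothetical and is used only for illustration.

```python
def code_page_position(char, encoding="cp1252"):
    """Return the (row, column) hexadecimal digits of a single-byte character."""
    encoded = char.encode(encoding)
    if len(encoded) != 1:
        raise ValueError("expected a character that encodes to a single byte")
    value = encoded[0]
    # The high four bits give the first digit (row),
    # the low four bits give the second digit (column).
    return format(value >> 4, "X"), format(value & 0x0F, "X")


print(code_page_position("A"))  # ('4', '1')
print(code_page_position("="))  # ('3', 'D')
```

Joining the two digits gives the code point values 41 and 3D cited in the text.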
A character set
is the set of characters and symbols that are used by a language or
group of languages. A character set includes national characters (which
are characters specific to a particular nation or group of nations),
special characters (such as punctuation marks), the unaccented Latin
characters A–Z, the digits 0–9, and control characters
that are needed by the computer.
An encoding
method is a set of rules that
assign the numeric representations to the set of characters. These
rules govern the size of the encoding (number of bits used to store
the numeric representation of the character) and the ranges in the
code page where characters appear. The encoding methods result from
the adherence to standards that have been developed in the computing
industry. An encoding method is often specific to the computer hardware
vendor.
An encoding results
from applying an encoding method to a character set.
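As a rough illustration of how the encoding method governs the size of the numeric representation, the following Python sketch encodes one character with several codecs and counts the bytes that result. The choice of codecs (`cp1252`, `utf-8`, `utf-16-le`) is an assumption made for the example and is not taken from the text above.

```python
character = "Ä"

# Number of bytes that each encoding uses to store the same character.
sizes = {
    name: len(character.encode(name))
    for name in ("cp1252", "utf-8", "utf-16-le")
}

for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")
```

The single-byte code page stores the character in one byte, while the two Unicode encodings shown here each need two bytes for it.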
An individual character
can occupy a different position in a code page, depending on the code
page used. For example, the German uppercase letter Ä:
- is represented as the hexadecimal number C4 in the Windows Latin1 code page (1252)
- is represented as the hexadecimal number 4A in the German EBCDIC code page (1141)
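The two positions listed above can be reproduced with a small Python sketch. Python's standard codecs do not appear to include code page 1141, so the sketch uses `cp273`, the older German EBCDIC code page, on the assumption that the two code pages agree at this position. Treat the EBCDIC result as an approximation of what code page 1141 would give.

```python
letter = "Ä"

# Windows Latin1 (code page 1252)
windows_latin1 = letter.encode("cp1252").hex().upper()

# German EBCDIC: cp273 is used as a stand-in for code page 1141 (assumption)
german_ebcdic = letter.encode("cp273").hex().upper()

print(windows_latin1)  # C4
print(german_ebcdic)   # 4A
```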
In the following
code page example, German is the character set and EBCDIC is the encoding
method.
In the following example,
the column determines the first digit and the row determines the second
digit.
Each SAS session is
set to a default encoding, which can be specified by using various
SAS language elements.