All data that is stored,
transmitted, or processed by a computer is in an encoding. An encoding
maps each character to a unique numeric representation. For example:
-
You press a key on a
keyboard, like the uppercase letter A.
-
The computer assigns
the character its internal numeric representation, a unique number
that is usually written in hexadecimal.
-
To display or print
the character, the computer uses the font (graphical representation)
that matches the numeric representation, in this case the shape of the
uppercase letter A.
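The three steps above can be sketched in a few lines of Python. This is
only an illustration: Python's built-in ord() and chr() work with Unicode
code points, which happen to match the Windows Latin1 value for the
uppercase letter A. The variable names are hypothetical.

```python
# Step 1: the character that arrives from the keyboard.
ch = "A"

# Step 2: its internal numeric representation (a Unicode code point here).
code = ord(ch)
hex_code = format(code, "02X")  # the same number written in hexadecimal

# Step 3: mapping the number back to the character that gets displayed.
displayed = chr(code)

print(ch, code, hex_code, displayed)  # A 65 41 A
```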
To assign the numeric
representation to a character, an encoding uses a code page, which
is an ordered set of characters in which a numeric index (code point
value) is associated with each character. The position of a character
on the code page determines its two-digit hexadecimal number. The
first digit of the hexadecimal number is determined by the column,
and the second digit by the row. For example, the following is the
code page for the Windows Latin1 encoding. The numeric representation
for the uppercase A is the hexadecimal number 41, and the numeric
representation for the equal sign (=) is the hexadecimal number 3D.
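The column-and-row rule can be checked with a short Python sketch. It
assumes that Python's "cp1252" codec corresponds to the Windows Latin1
encoding described here. The helper name code_page_position is
hypothetical, and the function handles single-byte encodings only.

```python
def code_page_position(ch, encoding="cp1252"):
    """Return (hex number, column digit, row digit) for one character."""
    raw = ch.encode(encoding)
    if len(raw) != 1:
        raise ValueError("expected a single-byte encoding")
    byte = raw[0]
    # The high four bits select the column, the low four bits the row.
    column, row = divmod(byte, 16)
    return format(byte, "02X"), format(column, "X"), format(row, "X")

print(code_page_position("A"))  # ('41', '4', '1')
print(code_page_position("="))  # ('3D', '3', 'D')
```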
An encoding is the combination
of a character set with an encoding method:
-
A character set is the repertoire
of characters and symbols that are used by a language or group of
languages. A character set includes national characters (which are
characters specific to a particular nation or group of nations), special
characters (such as punctuation marks), the unaccented Latin characters
A through Z, the digits 0 through 9, and control characters that are
needed by the computer.
-
An encoding method is the set of
rules that are used to assign the numbers to the set of characters
that are in an encoding. These rules govern such things as the size
of the encoding (number of bits used to store the numeric representation
of the character) and the ranges in the code page where characters
are allowed to appear.
When the rules of the
encoding method are followed, and numbers are assigned to the characters,
the result is called an encoding.
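A rough way to see both halves of this definition is to ask several
encodings about the same character. The sketch below uses Python's
built-in codecs ("ascii", "cp1252", "utf-16-be") as stand-ins for the
idea. The helper name describe is hypothetical, and the codec choice is an
assumption made for illustration.

```python
def describe(ch, encoding):
    """Return (hex number, size in bits), or None when the character
    is not in the character set of the given encoding."""
    try:
        raw = ch.encode(encoding)
    except UnicodeEncodeError:
        return None
    return raw.hex().upper(), len(raw) * 8

# Character set: the national character Æ is absent from ASCII
# but present in Windows Latin1.
print(describe("Æ", "ascii"))      # None
print(describe("Æ", "cp1252"))     # ('C6', 8)

# Encoding method: the size used to store one character can differ.
print(describe("A", "cp1252"))     # ('41', 8)
print(describe("A", "utf-16-be"))  # ('0041', 16)
```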
An individual character
can have different positions in the code pages of different encodings,
which results in different hexadecimal numbers. For example, the position
of the uppercase letter A in the Wlatin1 code page (shown above) results
in the hexadecimal number 41, while in the following Danish EBCDIC
code page, the position of the uppercase letter A results in the hexadecimal
number C1.
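The difference can be reproduced in Python, with one caveat: the standard
library does not appear to ship a codec for the Danish EBCDIC code page,
so this sketch substitutes "cp500" (an international EBCDIC code page),
which places the uppercase letter A at the same position. The helper name
hex_code is hypothetical, and "cp1252" is assumed to correspond to
Wlatin1.

```python
def hex_code(ch, encoding):
    """Return the hexadecimal number of a character in an encoding."""
    return ch.encode(encoding).hex().upper()

wlatin1_a = hex_code("A", "cp1252")  # Windows Latin1
ebcdic_a = hex_code("A", "cp500")    # EBCDIC stand-in, not Danish EBCDIC

print(wlatin1_a, ebcdic_a)  # 41 C1
```

Because the same character maps to different numbers, data written in one
encoding has to be read with that same encoding, or converted, for the
characters to come out as intended.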