The encoding
methods result from standards developed by various computer hardware
manufacturers and standards organizations. For more information, see
Standards Organizations for NLS Encodings. The common
encoding methods are listed here:
ASCII (American Standard Code for Information Interchange)
is a 7-bit encoding
for the United States that provides 128 character combinations. The
encoding contains characters for uppercase and lowercase English,
American English punctuation, base 10 numbers, and a few control characters.
This set of 128 characters is common to most other encodings. ASCII
is used by personal computers.
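The 7-bit limit and the shared 128-character set can be checked with a short Python sketch. The sample strings are illustrative only, and `latin-1` and `utf-8` are used here simply as two ASCII-compatible encodings that Python happens to ship.

```python
# Sketch: ASCII is a 7-bit code, so every code point lies in the range 0..127.
text = "Hello, World! 123"

ascii_bytes = text.encode("ascii")
all_seven_bit = all(b < 128 for b in ascii_bytes)

# The same text yields identical bytes in ASCII-compatible encodings.
same_in_latin1 = ascii_bytes == text.encode("latin-1")
same_in_utf8 = ascii_bytes == text.encode("utf-8")

# A character outside the 128-character set cannot be encoded as ASCII.
try:
    "caf\u00e9".encode("ascii")  # "café" with an accented e
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True

print(all_seven_bit, same_in_latin1, same_in_utf8, non_ascii_rejected)
```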
EBCDIC (Extended Binary Coded Decimal Interchange Code) family
is an 8-bit encoding
that provides 256 character combinations. There are multiple EBCDIC-based
encodings. EBCDIC is used on IBM mainframes and most IBM mid-range
computers. EBCDIC follows ISO 646 conventions to facilitate translations
between EBCDIC encodings and 7-bit (and 8-bit) ASCII-based encodings.
The 95 EBCDIC graphical characters include 82 invariant characters
(including a blank space), which occupy the same code positions across
most EBCDIC single-byte code pages, and 13 variant graphic
characters, which occupy varying code positions across most EBCDIC
single-byte code pages. For details about variant characters, see
Code Point Discrepancies among EBCDIC Encodings.
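The difference between invariant and variant characters can be seen by comparing two EBCDIC code pages. The sketch below assumes Python's built-in `cp037` and `cp500` codecs as the two code pages. The helper name `ebcdic_positions` is hypothetical, and the choice of "A" and "!" as sample characters is for illustration.

```python
# cp037 and cp500 are two EBCDIC code pages shipped with Python's codecs.
CODE_PAGES = ("cp037", "cp500")

def ebcdic_positions(char):
    """Return the single-byte code position of char in each code page."""
    return {cp: char.encode(cp)[0] for cp in CODE_PAGES}

invariant = ebcdic_positions("A")  # same position in both code pages
variant = ebcdic_positions("!")    # position differs between the two

# Translate EBCDIC bytes to an ASCII-based encoding: decode, then re-encode.
ebcdic_bytes = "HELLO".encode("cp037")
ascii_bytes = ebcdic_bytes.decode("cp037").encode("ascii")

print(invariant, variant, ascii_bytes)
```

Translation between the two families always goes through a decode step followed by an encode step, because the byte values themselves are not shared.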
ISO (International Organization for Standardization) 646 family
is a 7-bit encoding
that is an international standard and provides 128 character combinations.
The ISO 646 family of encodings is similar to ASCII except that it
has 12 code points for national variants. The 12 national variants
represent specific characters that are needed for a particular language.
ISO 8859 family and Windows family
are 8-bit extensions
of ASCII that support all of the ASCII code points and add 128 more,
providing 256 character combinations. Latin1, which is officially
named ISO-8859-1, is the most frequently used member of the ISO 8859
family of encodings. In addition to the ASCII characters, Latin1 contains
accented characters, other letters needed for languages of Western
Europe, and some special characters. HTTP and HTML protocols are based
on Latin1.
Unicode
provides up to 107,361
character combinations. Unicode can accommodate basically all of the
world's languages.
There are three Unicode
encoding forms:
UTF-8
is an MBCS (multibyte character set) encoding
that contains the Latin-script languages, Greek, Cyrillic, Arabic,
and Hebrew, and East Asian languages such as Japanese, Chinese and
Korean. The characters in UTF-8 are of varying width, from one to
four bytes. UTF-8 maintains ASCII compatibility by preserving the
ASCII characters in code positions 0 through 127.
UTF-16
is a 16-bit form that
contains all of the most common characters in all modern writing systems.
Most of the characters are uniformly represented with two bytes, although
there is extended space, called surrogate space, for additional characters
that require four bytes.
UTF-32
is a 32-bit form whose
characters each occupy 4 bytes.
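The storage widths of the three encoding forms can be compared with a small Python sketch. The four sample characters are chosen for illustration, and the helper name `encoded_sizes` is hypothetical. The `-le` codec variants are used so that no byte order mark is added to the counts.

```python
# Sample characters: ASCII letter, accented Latin letter, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_sizes(char):
    """Return the number of bytes char occupies in each encoding form."""
    # The "-le" codec names fix the byte order, so no byte order mark is added.
    return {enc: len(char.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

sizes = {ch: encoded_sizes(ch) for ch in samples}
for ch, row in sizes.items():
    print(repr(ch), row)
```

The emoji lies outside the two-byte range, so UTF-16 stores it as a surrogate pair of four bytes, while UTF-32 uses four bytes for every character.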
The ISO 8859 family
has other members that are designed for other languages. The following
table describes the other encodings that are approved by ISO.
Other Encodings Approved by ISO
Central and East European
South European, Maltese, and Esperanto
Nordic (Inuit, Sámi, Icelandic)
West European and Albanian
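Because the members of the ISO 8859 family share the ASCII half and differ only in the upper 128 positions, the same byte can decode to different characters depending on the member chosen. The following Python sketch illustrates this. The helper `decode_with` is hypothetical, and the three codecs (`iso8859_1`, `iso8859_5`, `iso8859_7`) are simply members that Python ships, picked for illustration.

```python
MEMBERS = ("iso8859_1", "iso8859_5", "iso8859_7")

def decode_with(byte_value, encodings):
    """Decode one byte under each encoding and return the resulting characters."""
    raw = bytes([byte_value])
    return {enc: raw.decode(enc) for enc in encodings}

# A byte in the ASCII half (below 0x80) decodes identically in every member.
low = decode_with(0x41, MEMBERS)

# A byte in the upper half decodes to a different letter in each member.
high = decode_with(0xE9, MEMBERS)

print(low)
print(high)
```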
Also, a number of encoding
standards have been developed for East Asian languages, some of which
are listed in the following table.
Some East Asian Language Encodings Approved by ISO
People's Republic of China
Japan Industry Standard multibyte encoding
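East Asian encodings of this kind are multibyte: ASCII characters keep a single byte, while ideographs and Hangul syllables take more. The sketch below assumes Python's `gb2312`, `shift_jis`, and `euc_kr` codecs as representative examples. The sample words and the helper `byte_widths` are illustrative assumptions, not taken from the standards themselves.

```python
# Sample text per codec: Chinese, Japanese, and Korean words (illustrative).
SAMPLES = {
    "gb2312": "\u4e2d\u6587",           # "Chinese"
    "shift_jis": "\u65e5\u672c\u8a9e",  # "Japanese"
    "euc_kr": "\ud55c\uad6d\uc5b4",     # "Korean"
}

def byte_widths(text, encoding):
    """Return the encoded length in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]

widths = {enc: byte_widths(text, enc) for enc, text in SAMPLES.items()}

# An ASCII letter still occupies a single byte in each of these encodings.
ascii_widths = {enc: len("A".encode(enc)) for enc in SAMPLES}

print(widths)
print(ascii_widths)
```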
There are other encodings
in the standards for EBCDIC and Windows that support different languages
and locales.