Common Encoding Methods

The encoding methods result from standards developed by various computer hardware manufacturers and standards organizations. For more information, see Standards Organizations for NLS Encodings. The common encoding methods are listed here:
ASCII (American Standard Code for Information Interchange)
is a 7-bit encoding for the United States that provides 128 character combinations. The encoding contains uppercase and lowercase English letters, American English punctuation, the decimal digits 0 through 9, and a set of control characters. This set of 128 characters is common to most other encodings. ASCII is used by personal computers.
EBCDIC (Extended Binary Coded Decimal Interchange Code) family
is an 8-bit encoding that provides 256 character combinations. There are multiple EBCDIC-based encodings. EBCDIC is used on IBM mainframes and most IBM mid-range computers. EBCDIC follows ISO 646 conventions to facilitate translations between EBCDIC encodings and 7-bit (and 8-bit) ASCII-based encodings. The 95 EBCDIC graphical characters consist of 82 invariant characters (including a blank space), which occupy the same code positions across most EBCDIC single-byte code pages, and 13 variant graphic characters, whose code positions differ from one EBCDIC single-byte code page to another. For details about variant characters, see Code Point Discrepancies among EBCDIC Encodings.
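The difference between invariant and variant characters can be checked with a short Python sketch. The two code pages compared here, cp037 and cp500, are picked only because Python ships codecs for them; they are an assumption of this example and are not named in the text above. The "invariant" and "variant" labels that the sketch prints hold for this pair of code pages only.

```python
# Compare code positions in two EBCDIC code pages from Python's standard codecs.
# cp037 and cp500 are illustrative picks, not code pages named by the text.
CODE_PAGES = ("cp037", "cp500")


def code_points(ch):
    """Return the single-byte code position of ch in each code page."""
    return {cp: ch.encode(cp)[0] for cp in CODE_PAGES}


def is_invariant(ch):
    """True if ch occupies the same position in every code page compared."""
    return len(set(code_points(ch).values())) == 1


for ch in ("A", "7", " ", "[", "!"):
    positions = {cp: hex(v) for cp, v in code_points(ch).items()}
    label = "invariant" if is_invariant(ch) else "variant"
    print(repr(ch), positions, label)
```

Letters, digits, and the blank keep their positions, while characters such as the square bracket move, which is why data that contains variant characters needs an explicit code page when it is transcoded.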
ISO (International Organization for Standardization) 646 family
is a 7-bit encoding that is an international standard and provides 128 character combinations. The ISO 646 family of encodings is similar to ASCII except that it has 12 code points for national variants. The 12 national variants represent specific characters that are needed for a particular language.
ISO 8859 family and Windows family
is an 8-bit extension of ASCII that supports all of the ASCII code points and adds 128 more, providing 256 character combinations. Latin1, which is officially named ISO-8859-1, is the most frequently used member of the ISO 8859 family of encodings. In addition to the ASCII characters, Latin1 contains accented characters, other letters needed for languages of Western Europe, and some special characters. Early versions of the HTTP and HTML standards used Latin1 as their default encoding.
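As a minimal sketch of how Latin1 extends ASCII, the following Python example uses the standard "ascii" and "latin-1" codecs. The helper name fits_in and the sample word are hypothetical and exist only for this illustration.

```python
def fits_in(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


word = "caf\u00e9"  # "café": three ASCII letters plus one accented letter

# The ASCII letters keep their single-byte values in Latin1.
print(word[:3].encode("ascii") == word.encode("latin-1")[:3])

# The accented letter exists only in the upper half that Latin1 adds.
print(fits_in(word, "ascii"), fits_in(word, "latin-1"))

# Every one of the 256 byte values decodes to a distinct Latin1 character.
all_chars = bytes(range(256)).decode("latin-1")
print(len(set(all_chars)))
```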
Unicode
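provides a code space of 1,114,112 code points (U+0000 through U+10FFFF). The number of assigned characters grows with each version of the standard; the figure of 107,361 reflects one earlier version. Unicode can accommodate virtually all of the world's languages.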
There are three Unicode encoding forms:
UTF-8
is an MBCS encoding that covers the Latin-script languages, Greek, Cyrillic, Arabic, and Hebrew, and East Asian languages such as Japanese, Chinese, and Korean. The characters in UTF-8 are of varying width, from one to four bytes. UTF-8 maintains ASCII compatibility by preserving the ASCII characters in code positions 0 through 127.
UTF-16
is a 16-bit form that contains all of the most common characters in all modern writing systems. Most of the characters are uniformly represented with two bytes, although there is extended space, called surrogate space, for additional characters that require four bytes.
UTF-32
is a 32-bit form whose characters each occupy 4 bytes.
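The widths described for the three encoding forms can be observed directly in Python. This is a small sketch: the sample characters are arbitrary choices, and the little-endian codec variants are used only so that no byte-order mark is added to the counts.

```python
# Sample characters: ASCII letter, accented Latin letter, CJK ideograph,
# and a character outside the Basic Multilingual Plane (U+1F600).
SAMPLES = ["A", "\u00e9", "\u65e5", "\U0001F600"]

# Little-endian variants are chosen so that no byte-order mark is prepended.
FORMS = ["utf-8", "utf-16-le", "utf-32-le"]


def widths(ch):
    """Return the number of bytes ch occupies in each encoding form."""
    return {form: len(ch.encode(form)) for form in FORMS}


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", widths(ch))
```

The last sample needs a surrogate pair in UTF-16, so it takes four bytes in all three forms, while the ASCII letter takes one byte in UTF-8 and the same byte value that it has in ASCII.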
Other encodings
The ISO 8859 family has other members that are designed for other languages. The following table describes the other encodings that are approved by ISO.
Other Encodings Approved by ISO

ISO Standard    Name of Encoding    Description
ISO 8859-1      Latin 1             US and West European
ISO 8859-2      Latin 2             Central and East European
ISO 8859-3      Latin 3             South European, Maltese and Esperanto
ISO 8859-4      Baltic              North European
ISO 8859-5      Cyrillic            Slavic languages
ISO 8859-6      Arabic              Arabic
ISO 8859-7      Greek               Modern Greek
ISO 8859-8      Hebrew              Hebrew and Yiddish
ISO 8859-9      Turkish             Turkish
ISO 8859-10     Latin 6             Nordic (Inuit, Sámi, Icelandic)
ISO 8859-11     Latin/Thai          Thai
ISO 8859-13     Latin 7             Baltic Rim
ISO 8859-14     Latin 8             Celtic
ISO 8859-15     Latin 9             West European and Albanian
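Because every member of the ISO 8859 family shares the ASCII half and reassigns the upper half, the same byte can stand for different characters depending on which member is used to read it. The following Python sketch illustrates this with three members of the family; the choice of members and the helper name decode_everywhere are assumptions made for the example.

```python
# Python codec names for three members of the ISO 8859 family.
PARTS = {
    "ISO 8859-1": "iso8859_1",
    "ISO 8859-5": "iso8859_5",
    "ISO 8859-7": "iso8859_7",
}


def decode_everywhere(raw):
    """Decode the same bytes under each ISO 8859 member listed in PARTS."""
    return {name: raw.decode(codec) for name, codec in PARTS.items()}


# A byte in the ASCII range reads the same under every member.
print(decode_everywhere(b"A"))

# A byte in the upper half reads as a different character under each member.
print(decode_everywhere(b"\xe9"))
```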
Also, a number of encoding standards have been developed for East Asian languages, some of which are listed in the following table.
Some East Asian Language Encodings Approved by ISO

Standard        Name of Encoding                                  Description
GB 2312-80      Simplified Chinese                                People's Republic of China
CNS 11643       Traditional Chinese                               Taiwan
Big-5           Traditional Chinese                               Taiwan
KS C 5601       Korean National Standard                          Korea
JIS             Japanese Industrial Standard                      Japan
Shift-JIS       Japanese Industrial Standard multibyte encoding   Japan
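Several of these encodings are available as Python codecs, which makes their multibyte behavior easy to observe. The sketch below is illustrative only: the sample strings are arbitrary, and the mapping of KS C 5601 to the euc_kr codec is an assumption of this example (EUC-KR is one common encoding of that character set).

```python
# Codec name -> sample text. The euc_kr entry stands in for KS C 5601 (assumption).
SAMPLES = {
    "gb2312": "\u4e2d\u6587",           # Chinese, simplified-character standard
    "big5": "\u4e2d\u6587",             # same two characters, traditional standard
    "euc_kr": "\ud55c\uad6d\uc5b4",     # Korean
    "shift_jis": "\u65e5\u672c\u8a9e",  # Japanese
}


def byte_len(text, codec):
    """Return how many bytes text occupies in the given codec."""
    return len(text.encode(codec))


for codec, text in SAMPLES.items():
    encoded = text.encode(codec)
    # Each sample character takes two bytes; an ASCII letter still takes one.
    print(codec, len(text), "chars ->", len(encoded), "bytes;",
          "ASCII 'A' ->", byte_len("A", codec), "byte")
```

The same two Chinese characters produce different byte sequences under gb2312 and big5, so the encoding name has to travel with the data for it to be read back correctly.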
There are other encodings in the standards for EBCDIC and Windows that support different languages and locales.