Encoding for NLS
The encoding methods result from standards developed by various computer hardware manufacturers and standards organizations. For more information, see Standards Organizations for NLS Encodings. The common encoding methods are listed here:
ASCII is a 7-bit encoding for the United States that provides 128 character combinations. The encoding contains characters for uppercase and lowercase English, American English punctuation, base-10 digits, and a few control characters. This set of 128 characters is common to most other encodings. ASCII is used by personal computers.
EBCDIC is an 8-bit encoding that provides 256 character combinations. There are multiple EBCDIC-based encodings. EBCDIC is used on IBM mainframes and most IBM mid-range computers. EBCDIC follows ISO 646 conventions to facilitate translations between EBCDIC encodings and 7-bit (and 8-bit) ASCII-based encodings. The 95 EBCDIC graphic characters consist of 82 invariant characters (including a blank space), which occupy the same code positions across most EBCDIC single-byte code pages, and 13 variant characters, which occupy varying code positions across most EBCDIC single-byte code pages. For details about variant characters, see Code Point Discrepancies among EBCDIC Encodings.
ISO 646 is a 7-bit encoding that is an international standard and provides 128 character combinations. The ISO 646 family of encodings is similar to ASCII except that it reserves 12 code points for national variants. These 12 code points represent specific characters that are needed for a particular language.
The ISO 8859 family is an 8-bit extension of ASCII that supports all 128 ASCII code points and adds 128 more, providing 256 character combinations. Latin1, which is officially named ISO-8859-1, is the most frequently used member of the ISO 8859 family of encodings. In addition to the ASCII characters, Latin1 contains accented characters, other letters needed for languages of Western Europe, and some special characters. HTTP and early versions of HTML used Latin1 as their default character set.
Unicode provides up to 99,024 character combinations. Unicode can accommodate virtually all of the world's languages.
There are three Unicode encoding forms:
UTF-8 is a multibyte character set (MBCS) encoding that covers the Latin-script languages, Greek, Cyrillic, Arabic, and Hebrew, and East Asian languages such as Japanese, Chinese, and Korean. The characters in UTF-8 are of varying width, from one to four bytes. UTF-8 maintains ASCII compatibility by preserving the ASCII characters in code positions 0 through 127.
UTF-16 is a 16-bit form that contains all of the most common characters in all modern writing systems. Most of the characters are uniformly represented with two bytes, although there is extended space, called surrogate space, for additional characters that require four bytes.
UTF-32 is a 32-bit form whose characters each occupy 4 bytes.
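As a rough illustration of the three encoding forms, the following Python sketch encodes a few sample characters and reports how many bytes each form uses. The sample characters and the `byte_widths` helper are assumptions chosen for illustration and are not part of the SAS documentation. The little-endian codec variants are used so that no byte-order mark is counted.

```python
# Sketch: compare byte widths of the three Unicode encoding forms.
# The sample characters and the helper name are illustrative assumptions.

def byte_widths(ch: str) -> dict:
    """Return the number of bytes used to encode one character in each form."""
    # The "-le" codecs are used so that no byte-order mark is prepended.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

samples = [
    "A",           # ASCII letter
    "\u00e9",      # e with acute accent (Latin1 range)
    "\u65e5",      # CJK ideograph
    "\U0001F600",  # emoji outside the Basic Multilingual Plane
]

for ch in samples:
    print(f"U+{ord(ch):04X}", byte_widths(ch))

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "NLS".encode("ascii") == "NLS".encode("utf-8")
```

In this sketch the ASCII letter takes one byte in UTF-8, while the character outside the Basic Multilingual Plane takes four bytes in every form (as a surrogate pair in UTF-16).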
The ISO 8859 family has other members that are designed for other languages. The following table describes the ISO 8859 encodings that are approved by ISO.
ISO Standard | Name of Encoding | Description |
---|---|---|
ISO 8859-1 | Latin 1 | US and West European |
ISO 8859-2 | Latin 2 | Central and East European |
ISO 8859-3 | Latin 3 | South European, Maltese and Esperanto |
ISO 8859-4 | Baltic | North European |
ISO 8859-5 | Cyrillic | Slavic languages |
ISO 8859-6 | Arabic | Arabic |
ISO 8859-7 | Greek | Modern Greek |
ISO 8859-8 | Hebrew | Hebrew and Yiddish |
ISO 8859-9 | Turkish | Turkish |
ISO 8859-10 | Latin 6 | Nordic (Inuit, Sámi, Icelandic) |
ISO 8859-11 | Latin/Thai | Thai |
ISO 8859-13 | Latin 7 | Baltic Rim |
ISO 8859-14 | Latin 8 | Celtic |
ISO 8859-15 | Latin 9 | West European and Albanian |
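
Because every member of the ISO 8859 family shares the ASCII range but assigns its own characters to the upper 128 code points, the same byte can stand for different characters depending on which member is used to decode it. The Python sketch below is one way to see this. The codec names are Python's standard aliases, and the byte values and the `decode_all` helper are assumptions chosen for illustration.

```python
# Sketch: decode one byte value under several ISO 8859 members.
# The helper name and the chosen byte values are illustrative assumptions.

CODECS = ["iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-15"]

def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each ISO 8859 member listed in CODECS."""
    return {name: raw.decode(name) for name in CODECS}

def as_code_points(decoded: dict) -> dict:
    """Show each decoded character as a U+XXXX code point for printing."""
    return {name: f"U+{ord(ch):04X}" for name, ch in decoded.items()}

# 0x41 lies in the shared ASCII range, so every member agrees on it.
print(as_code_points(decode_all(b"\x41")))

# 0xE4 lies in the upper half, where each member defines its own characters.
print(as_code_points(decode_all(b"\xe4")))

# 0xA4 is the currency sign in Latin 1 but the euro sign in Latin 9.
assert b"\xa4".decode("iso-8859-1") == "\u00a4"
assert b"\xa4".decode("iso-8859-15") == "\u20ac"
```

The practical consequence is that data must be decoded with the same ISO 8859 member that was used to write it.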
Additionally, a number of encoding standards have been developed for East Asian languages, some of which are listed in the following table.
Standard | Name of Encoding | Description |
---|---|---|
GB 2312-80 | Simplified Chinese | People's Republic of China |
CNS 11643 | Traditional Chinese | Taiwan |
Big-5 | Traditional Chinese | Taiwan |
KS C 5601 | Korean National Standard | Korea |
JIS | Japanese Industrial Standard | Japan |
Shift-JIS | Japanese Industrial Standard multibyte encoding | Japan |
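
The East Asian encodings are multibyte: ASCII characters typically stay one byte wide, while ideographs and Hangul syllables take two bytes. The following Python sketch checks this for a few codecs. Treating Python's `gb2312`, `big5`, `euc_kr`, and `shift_jis` codecs as stand-ins for the standards in the table is an assumption made for illustration, as are the sample characters and the `encoded_length` helper.

```python
# Sketch: byte lengths under some East Asian multibyte encodings.
# Codec names are Python standard-library names; mapping them onto the
# standards in the table above is an assumption made for illustration.

SAMPLES = {
    "gb2312": "\u4e2d",     # Simplified Chinese sample (U+4E2D)
    "big5": "\u4e2d",       # Traditional Chinese sample (U+4E2D)
    "euc_kr": "\ud55c",     # Hangul sample (U+D55C)
    "shift_jis": "\u65e5",  # Japanese kanji sample (U+65E5)
}

def encoded_length(text: str, codec: str) -> int:
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(codec))

for codec, ch in SAMPLES.items():
    print(
        codec,
        "| ASCII 'A':", encoded_length("A", codec), "byte(s)",
        f"| U+{ord(ch):04X}:", encoded_length(ch, codec), "byte(s)",
    )
```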
There are other encodings in the standards for EBCDIC and Windows that support different languages and locales.
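The difference between EBCDIC code pages shows up in the variant graphic characters described earlier. The Python sketch below compares two EBCDIC code pages and lists the printable ASCII characters whose code positions differ. The choice of `cp037` and `cp500` (two EBCDIC codecs that ship with Python) and the `variant_characters` helper are assumptions made for illustration.

```python
# Sketch: list graphic characters whose EBCDIC code position differs
# between two code pages. The code-page pair is an illustrative assumption.
import string

def variant_characters(page_a: str, page_b: str) -> list:
    """Return printable ASCII characters encoded differently by the two pages."""
    # 52 letters + 10 digits + 32 punctuation marks + space = 95 characters.
    graphic = string.ascii_letters + string.digits + string.punctuation + " "
    return [ch for ch in graphic if ch.encode(page_a) != ch.encode(page_b)]

# Letters sit at the same code position in both code pages.
assert "A".encode("cp037") == "A".encode("cp500") == b"\xc1"

# Print each differing character with its code position in each code page.
for ch in variant_characters("cp037", "cp500"):
    print(repr(ch), ch.encode("cp037").hex(), ch.encode("cp500").hex())
```

Any character that this comparison reports must be translated explicitly when data moves between the two code pages.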
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.