Glossary

ANSI (American National Standards Institute)

an organization in the United States that coordinates voluntary standards and conformity to those standards. ANSI works with ISO to establish global standards. See also ISO (International Organization for Standardization).

ASCII (American Standard Code for Information Interchange)

a 7-bit encoding that is the U.S. national variant of ISO 646. The ASCII encoding includes the upper- and lowercase letters A-Z, digits, symbols (such as &, #, and mathematical symbols), punctuation marks, and control characters. This set of 128 characters is also included in most other encodings. See also ISO 646 family.
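
As an illustration only (Python is used here for convenience; it is not part of SAS), the following sketch checks that a sample of ASCII characters fits in 7 bits and that the same bytes decode identically under several encodings that include ASCII. The sample string and the list of encodings are arbitrary choices.

```python
# Every ASCII character has a numeric code below 128, so it fits in 7 bits.
text = "Hello, World! #&+"
codes = [ord(ch) for ch in text]
fits_in_7_bits = all(code < 2**7 for code in codes)

# The same bytes decode to the same text under encodings that include ASCII.
raw = text.encode("ascii")
same_everywhere = all(
    raw.decode(enc) == text for enc in ("latin-1", "cp1252", "utf-8")
)

print(fits_in_7_bits, same_everywhere)
```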

BIDI (bidirectional) text

a mixture of characters that are read from left to right and characters that are read from right to left. Most Arabic and Hebrew strings of text, for example, are read from right to left, but numbers and embedded Western terms within Arabic and Hebrew text are read from left to right.
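
A minimal sketch of how the direction of each character can be inspected, assuming Python's standard `unicodedata` module. The sample string (a Hebrew word, a number, and a Latin term) is an arbitrary choice for illustration.

```python
import unicodedata

# Hebrew word, then digits, then a Latin term, stored in logical order.
sample = "\u05e9\u05dc\u05d5\u05dd 123 SAS"

# Bidirectional class of each character:
# "R" = right-to-left letter, "EN" = European number, "L" = left-to-right letter.
classes = [(ch, unicodedata.bidirectional(ch)) for ch in sample]

for ch, bidi_class in classes:
    print(f"U+{ord(ch):04X} {bidi_class}")
```

Display software uses these classes to decide the visual order of each run of text.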

CEDA (Cross-Environment Data Access)

a feature of SAS software that enables a SAS data file that was created in any directory-based operating environment (for example, Solaris, Windows, HP-UX) to be read by a SAS session that is running in another directory-based environment. You can access the SAS data files without using any intermediate conversion steps. See also data representation.

character set

the set of characters that are used by a language or group of languages. A character set includes national characters, special characters (such as punctuation marks and mathematical symbols), the digits 0-9, and control characters that are needed by the computer. Most character sets also include the unaccented upper- and lowercase letters A-Z. See also national character.

code page

an ordered character set in which a numeric index (code point) is associated with each character. See also character set.

code point

a hexadecimal value that represents a character in an encoding or that is associated with a character on a code page. See also code page, encoding.
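
For example, the following Python sketch (an illustration, not SAS code) prints the code point of the character é in Unicode and on two code pages. The choice of character and code pages is arbitrary.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

unicode_cp = ord(ch)                 # code point in Unicode
latin1_cp = ch.encode("latin-1")[0]  # code point in ISO-8859-1 (Latin 1)
cp437_cp = ch.encode("cp437")[0]     # code point on the original IBM PC code page

print(f"U+{unicode_cp:04X}", f"0x{latin1_cp:02X}", f"0x{cp437_cp:02X}")
```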

code position

the row and column location of a character in a code page. See also code page.

code table

another term for code page. See code page.

data representation

the form in which data is stored in a particular operating environment. Different operating environments use different standards or conventions for storing floating-point numbers (for example, IEEE or IBM 390); for character encoding (ASCII or EBCDIC); for the ordering of bytes in memory (big-endian or little-endian); for word alignment (4-byte boundaries or 8-byte boundaries); and for data-type length (16-bit, 32-bit, or 64-bit).
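
Two of these differences, byte order and character encoding, can be sketched in Python with the standard `struct` module. The values are arbitrary, and `cp037` is used here as one example of an EBCDIC code page.

```python
import struct

# The same IEEE double stored in big-endian and little-endian byte order.
value = 1.5
big = struct.pack(">d", value)
little = struct.pack("<d", value)
print(big.hex(), little.hex())

# The same 32-bit integer in both byte orders.
number = 258
big_int = struct.pack(">i", number)
little_int = struct.pack("<i", number)

# The same letter in an ASCII-based encoding and in an EBCDIC encoding.
ascii_a = "A".encode("ascii")[0]
ebcdic_a = "A".encode("cp037")[0]
print(hex(ascii_a), hex(ebcdic_a))
```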

DBCS (double-byte character set)

any East Asian character set (Japanese, Korean, Simplified Chinese, and Traditional Chinese) that requires a mixed-width encoding because most characters occupy more than one byte of computer memory or storage. This term is somewhat misleading because not all characters in a DBCS require more than one byte, and some DBCS characters actually require four bytes. See also character set.
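
The mixed widths can be seen in a short Python sketch. Shift-JIS and GB18030 are used here as example encodings, and the sample characters are arbitrary.

```python
# "SAS" followed by three kanji, encoded with Shift-JIS.
text = "SAS\u65e5\u672c\u8a9e"
widths = [len(ch.encode("shift_jis")) for ch in text]
print(widths)  # Latin letters take one byte each, kanji take two

# A CJK Extension B character needs four bytes in GB18030.
rare = "\U00020000"
rare_width = len(rare.encode("gb18030"))
print(rare_width)
```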

EBCDIC (Extended Binary Coded Decimal Interchange Code)

a group of 8-bit encodings that each include up to 256 characters. EBCDIC is used on IBM mainframes and on most IBM mid-range computers. EBCDIC follows ISO 646 conventions in order to facilitate transcoding between EBCDIC encodings, ASCII, the ISO 646 family of encodings, and 8-bit extensions to ASCII such as the ISO 8859 family. The 95 EBCDIC graphical characters include 82 invariant characters (including the SPACE character), which occupy the same code positions across most single-byte EBCDIC code pages, and 13 variant graphic characters, which occupy varying code positions across most single-byte EBCDIC code pages. See also ASCII (American Standard Code for Information Interchange), encoding, ISO (International Organization for Standardization), ISO 646 family, ISO 8859 family.
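
The difference between invariant and variant characters can be sketched in Python. The three code pages chosen here (`cp037`, `cp500`, and `cp273`) are examples only, and `positions` is a hypothetical helper.

```python
PAGES = ("cp037", "cp500", "cp273")  # three single-byte EBCDIC code pages


def positions(ch: str) -> dict:
    """Return the code point of ch on each example EBCDIC code page."""
    return {page: ch.encode(page)[0] for page in PAGES}


invariant = positions("A")  # the same code point on every page
variant = positions("[")    # the code point depends on the page

print({page: hex(cp) for page, cp in invariant.items()})
print({page: hex(cp) for page, cp in variant.items()})
```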

encoding

a set of characters (letters, logograms, digits, punctuation marks, symbols, and control characters) that have been mapped to hexadecimal values (called code points) that can be used by computers. An encoding results from applying an encoding method to a specific character set. Groups of encodings that apply the same encoding method to different character sets are sometimes referred to as families of encodings. For example, German EBCDIC is an encoding in the EBCDIC family, Windows Cyrillic is an encoding in the Windows family, and Latin 1 is an encoding in the ISO 8859 family. See also character set, encoding method.
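
As a sketch of how the same character maps to different code points in different encodings, the following Python example encodes the Cyrillic letter Ж with three Cyrillic encodings. The selection of encodings is arbitrary.

```python
ch = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE

# Windows Cyrillic, ISO-8859-5, and KOI8-R all contain this letter,
# but each assigns it a different code point.
encodings = ("cp1251", "iso8859_5", "koi8_r")
mapped = {enc: ch.encode(enc).hex() for enc in encodings}

print(mapped)
```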

encoding method

the set of rules that is used for assigning numeric representations to the characters in a character set. For example, these rules specify how many bits are used for storing the numeric representation of the character, as well as the ranges in the code page in which characters appear. The encoding methods are standards that have been developed in the computing industry. An encoding method is often specific to a computer hardware vendor. See also character set, encoding.

internationalization

the process of designing a software application without making assumptions that are based on a single language or locale. See also NLS (national language support).

ISO (International Organization for Standardization)

an organization that promotes the development of standards and that sponsors related activities in order to facilitate the dissemination of products and services among nations and to support the exchange of intellectual, scientific, and technological information.

ISO 646 family

a group of 7-bit encodings that are defined in the ISO 646 standard and that each include up to 128 characters. The ISO 646 encodings are similar to ASCII except for 12 code points that are used for national variants. National variants are specific characters that are needed for a particular language. See also ASCII (American Standard Code for Information Interchange), ISO (International Organization for Standardization).

ISO 8859 family

a group of 8-bit extensions to ASCII that support all 128 of the ASCII code points plus an additional 128 code points, for a total of 256 characters. ISO-8859-1 (Latin 1) is a commonly used member of the ISO 8859 family of encodings. In addition to the ASCII characters, ISO-8859-1 contains accented characters, other letters that are needed for languages of Western Europe, and some special characters. See also ASCII (American Standard Code for Information Interchange), ISO (International Organization for Standardization).
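
The relationship to ASCII can be checked with a short Python sketch. The byte value 0xE4 and the three family members compared are arbitrary examples.

```python
all_bytes = bytes(range(256))
lower, upper = all_bytes[:128], all_bytes[128:]

# The lower 128 code points are the ASCII code points.
lower_matches_ascii = lower.decode("latin-1") == lower.decode("ascii")

# Latin 1 assigns a character to each of the upper 128 code points as well.
upper_chars = upper.decode("latin-1")

# The same upper code point is a different character in other family members.
same_byte = {enc: b"\xe4".decode(enc) for enc in ("latin-1", "iso8859_5", "iso8859_7")}
print(lower_matches_ascii, len(upper_chars), same_byte)
```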

language

an aspect of locale that is not necessarily unique to any one country or geographic region. For example, Portuguese is spoken in Brazil as well as in Portugal, but there are separate locales for Portuguese_Portugal and Portuguese_Brazil. See also locale.

locale

a value that reflects the language, local conventions, and culture for a geographic region. Local conventions can include specific formatting rules for dates, times, and numbers, and a currency symbol for the country or region. Collating sequences, paper sizes, and conventions for postal addresses and telephone numbers are also typically specified for each locale. Some examples of locale values are French_Canada, Portuguese_Brazil, and Chinese_Singapore.

localization

the process of adapting a product to meet the language, cultural, and other requirements of a specific target environment or market so that customers can use their own languages and conventions when using the product. Translation of the user interface, system messages, and documentation is part of localization.

MBCS (multi-byte character set)

a synonym for DBCS. See DBCS (double-byte character set).

national character

any character that is specific to a language as it is written in a particular nation or group of nations.

NLS (national language support)

the set of features that enable a software product to function properly in every global market for which the product is targeted.

SBCS (single-byte character set)

a character set in which each character occupies only one byte of computer memory or storage. A single-byte character set can be either 7 bits (providing up to 128 characters) or 8 bits (providing up to 256 characters). An example of an 8-bit SBCS is the ISO-8859-5 character set, which includes the Cyrillic characters that are used in Russian and other languages. See also character set.
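
For illustration, the following Python sketch encodes a Russian word with ISO-8859-5 and, for contrast, with UTF-8, which is not a single-byte encoding. The sample word is an arbitrary choice.

```python
text = "\u041c\u043e\u0441\u043a\u0432\u0430"  # the Russian word for Moscow

# In a single-byte character set, one character is one byte.
sbcs_bytes = text.encode("iso8859_5")

# UTF-8 needs two bytes for each of these Cyrillic letters.
utf8_bytes = text.encode("utf-8")

print(len(text), len(sbcs_bytes), len(utf8_bytes))
```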

transcoding

the process of converting the contents of a SAS file from one encoding to another encoding. Transcoding is necessary if the session encoding and the file encoding are different, such as when transferring data from a Latin 1 encoding under UNIX to a German EBCDIC encoding on an IBM mainframe. See also encoding, translation table.
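
The general idea can be sketched in Python; this is an illustration of the concept and not how SAS performs transcoding. The helper `transcode` is hypothetical, and `cp273` is assumed here as the German EBCDIC code page.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from the source encoding to the target encoding."""
    return data.decode(source).encode(target)


text = "Gr\u00fc\u00dfe"  # German word with two non-ASCII letters
latin1_bytes = text.encode("latin-1")
ebcdic_bytes = transcode(latin1_bytes, "latin-1", "cp273")

# The byte values change, but the text that they represent does not.
print(latin1_bytes.hex(), ebcdic_bytes.hex())
```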

translation table

a SAS catalog entry that is used for transcoding data from one encoding to another encoding. SAS language elements that control locale values and encoding properties automatically invoke the appropriate translation table. Translation tables are specific to the operating environment. For example, there is a specific translation table that maps the Windows Latin 2 encoding to the ISO Latin 2 encoding. See also encoding, transcoding.
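
Conceptually, a translation table between two single-byte encodings is a 256-entry lookup from source byte to target byte. The Python sketch below builds such a table for Windows Latin 2 (`cp1250`) to ISO Latin 2 (`iso8859_2`). It does not reproduce the format of a SAS translation table, and `build_table` and its `fallback` parameter are hypothetical.

```python
def build_table(source: str, target: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-byte lookup table between two single-byte encodings."""
    table = bytearray(256)
    for byte in range(256):
        try:
            mapped = bytes([byte]).decode(source).encode(target)
        except UnicodeError:
            # The byte is undefined in the source or has no target equivalent.
            mapped = fallback
        table[byte] = mapped[0]
    return bytes(table)


win_to_iso = build_table("cp1250", "iso8859_2")

text = "\u0141\u00f3d\u017a"  # Polish city name with three non-ASCII letters
windows_bytes = text.encode("cp1250")
iso_bytes = windows_bytes.translate(win_to_iso)
print(windows_bytes.hex(), iso_bytes.hex())
```

Characters that exist only in the source encoding are replaced by the fallback byte, so a table of this kind can lose information.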

Unicode

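a character encoding standard that supports the interchange, processing, and display of characters and symbols from dozens of writing systems. Unicode was originally designed as a 16-bit encoding with up to 65,536 characters. The current standard defines a code space of 1,114,112 code points (U+0000 through U+10FFFF), which can be stored by using the UTF-8, UTF-16, or UTF-32 encoding forms. Unicode includes all characters from most modern written languages as well as characters from some historical languages.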
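
Current versions of the Unicode standard assign code points above U+FFFF, so a single 16-bit unit is not always enough for one character. The following Python sketch compares the storage that is needed for two arbitrary sample characters in the UTF-8, UTF-16, and UTF-32 encoding forms.

```python
bmp_char = "\u00e9"             # fits in 16 bits
supplementary = "\U0001F600"    # code point above U+FFFF

forms = ("utf-8", "utf-16-be", "utf-32-be")
sizes = {
    f"U+{ord(ch):04X}": {form: len(ch.encode(form)) for form in forms}
    for ch in (bmp_char, supplementary)
}

# In UTF-16, the supplementary character is stored as two 16-bit code units.
print(sizes)
```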

Unicode Consortium

an organization that develops and promotes the Unicode standard. See also Unicode.
