Glossary

accented character
a type of character that is modified by the addition of an accent mark that alters the pronunciation of the character. An example is "ñ", which results from combining the tilde (~) with the character "n".
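As an illustration only, the composition of "n" and a tilde can be sketched in Python with the standard `unicodedata` module. The sketch assumes the Unicode combining tilde (U+0303) rather than the spacing ASCII tilde (~), because only the combining form attaches to the preceding letter.

```python
import unicodedata

# "n" followed by U+0303 COMBINING TILDE: two code points, one visible character.
decomposed = "n\u0303"

# NFC normalization composes the pair into the single code point U+00F1 ("ñ").
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00f1")            # True
```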
American National Standards Institute
See ANSI.
American Standard Code for Information Interchange
See ASCII.
ANSI
the organization that coordinates the development of voluntary consensus standards for products, services, processes, systems, and personnel in the U.S. ANSI works with the International Organization for Standardization to establish global standards. Short form: ANSI.
ASCII
a 7-bit encoding standard that provides a basic set of 128 characters, supporting a variety of computer systems. ASCII encodes the uppercase and lowercase letters of the English alphabet, punctuation marks, the digits 0-9, and control characters. This set of 128 characters is also included in most other encodings. Short form: ASCII.
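A minimal Python sketch of the 7-bit limit, using only the built-in `ascii` codec. The sample strings are arbitrary and are not taken from any product documentation.

```python
text = "Hello, World! 0-9"
encoded = text.encode("ascii")

# Every ASCII code value fits in 7 bits, so each byte is below 128.
fits_in_7_bits = all(byte < 128 for byte in encoded)

# A character outside the 128-character set cannot be encoded as ASCII.
try:
    "ñ".encode("ascii")
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True

print(fits_in_7_bits, non_ascii_rejected)  # True True
```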
BIDI
pertaining to a writing system, such as Arabic or Hebrew, in which text generally runs from right to left, except for numbers and embedded text in other languages, which run from left to right. Short form: BIDI.
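The direction of individual characters can be inspected with Python's standard `unicodedata` module, which reports the Unicode bidirectional class of a character. This is a sketch of the underlying character property, not of how any particular product renders bidirectional text. The sample characters are chosen for illustration.

```python
import unicodedata

samples = {
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Latin a": "a",
    "digit 1": "1",
}

# "R" and "AL" are right-to-left classes, "L" is left-to-right,
# and "EN" (European number) runs left to right even inside right-to-left text.
classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in classes.items():
    print(f"{label}: {bidi_class}")
```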
bidi
See BIDI.
bidirectional
See BIDI.
CEDA
a feature of SAS software that enables a SAS data file that was created in any directory-based operating environment (for example, Solaris, Windows, HP-UX, OpenVMS, and z/OS) to be read by a SAS session that is running in another directory-based environment. You can access the SAS data files without using any intermediate conversion steps. Short form: CEDA.
character
the smallest component of a writing system that has semantic value, such as a letter of an alphabet, a digit, or an ideograph. A character refers to the abstract meaning rather than to a specific shape.
character set
a collection of characters that are used by a language or group of languages. A character set includes national characters, special characters, the digits 0-9, and control characters.
collating sequence
a set of rules that determine how textual data is ordered and compared.
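To illustrate how the choice of rules changes the result, the following Python sketch orders the same strings under two simple rules: raw code-point order and a case-insensitive order. Both rules are assumptions chosen for illustration. Real collating sequences are usually defined per locale or per encoding.

```python
words = ["b", "A", "a", "B"]

# Rule 1: compare by code point, so all uppercase letters sort before lowercase.
by_code_point = sorted(words)

# Rule 2: compare case-insensitively; ties keep their original relative order.
case_insensitive = sorted(words, key=str.casefold)

print(by_code_point)     # ['A', 'B', 'a', 'b']
print(case_insensitive)  # ['A', 'a', 'b', 'B']
```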
control character
a type of character that is used for control purposes rather than for information exchange. Control characters are usually nonprintable.
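One way to test for control characters, sketched in Python, is to check for the Unicode general category `Cc`. The helper name `is_control` is hypothetical and exists only for this example.

```python
import unicodedata

def is_control(ch: str) -> bool:
    """Return True if ch is in the Unicode 'Cc' (control) category."""
    return unicodedata.category(ch) == "Cc"

print(is_control("\n"))    # True  (line feed)
print(is_control("\x7f"))  # True  (delete)
print(is_control("A"))     # False (graphic character)
print(is_control(" "))     # False (the blank is a space separator, not a control)
```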
Cross-Environment Data Access
See CEDA.
data representation
the form in which data is stored in a particular operating environment. Different operating environments use different standards or conventions for storing floating-point numbers (for example, IEEE or IBM 390); for character encoding (ASCII or EBCDIC); for the ordering of bytes in memory (big-endian or little-endian); for word alignment (4-byte boundaries or 8-byte boundaries); and for data-type length (16-bit, 32-bit, or 64-bit).
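Byte ordering and data-type length can be sketched with Python's standard `struct` module. The value and format codes below are chosen only for illustration and do not describe any specific operating environment.

```python
import struct

value = 1

# The same 32-bit integer stored with the two byte orders.
big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first

print(big_endian.hex())     # 00000001
print(little_endian.hex())  # 01000000

# Standard sizes, in bytes, of 16-bit, 32-bit, and 64-bit integers.
sizes = [struct.calcsize(fmt) for fmt in ("<h", "<i", "<q")]
print(sizes)  # [2, 4, 8]
```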
DBCS
See double-byte character set.
double-byte character set
a type of encoding for which one or two bytes of computer memory are required to represent each character. Examples of languages that use double-byte character sets are Japanese, Korean, and Chinese. Short form: DBCS.
EBCDIC
a family of single-byte and multi-byte encodings for the representation of data on IBM mainframe and mid-range computers. EBCDIC encodes the uppercase and lowercase letters of the English alphabet, punctuation marks, the digits 0-9, and an extended set of control characters. Short form: EBCDIC.
encode
to represent data in a particular character encoding scheme. For example, in ASCII, the letter "A" is represented as 41 (hexadecimal).
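The example value can be checked with a short Python sketch. The EBCDIC comparison assumes Python's `cp037` codec (an EBCDIC code page for US/Canada) and is added only for contrast.

```python
# The same character encoded under two different encodings.
ascii_bytes = "A".encode("ascii")
ebcdic_bytes = "A".encode("cp037")

print(ascii_bytes.hex())   # 41
print(ebcdic_bytes.hex())  # c1
```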
encoding
a mapping of a coded character set to code values.
encoding method
the set of rules that are used for assigning numeric representations to the characters in a character set. For example, these rules specify how many bits are used for storing the numeric representation of the character, as well as the ranges in the code page in which characters appear. The encoding methods are standards that have been developed in the computing industry. An encoding method is often specific to a computer hardware vendor. Common encoding methods include ASCII, EBCDIC, the ISO 646 family, the ISO 8859 family, and Unicode.
Extended Binary Coded Decimal Interchange Code
See EBCDIC.
graphic character
a type of character that can be written, printed, or displayed.
I18N
See internationalization.
International Organization for Standardization
See ISO.
internationalization
the process of designing a software product without making assumptions that are based on a single language or locale. Internationalization ensures that international conventions (including rules for sorting strings and for formatting dates, times, numbers, and currencies) are supported. It also facilitates a consistent user experience across different language editions of a product. Short form: I18N.
ISO
an organization that promotes the development of standards and sponsors related activities that help to disseminate products and services among nations. It also supports the exchange of intellectual, scientific, and technological information. Short form: ISO.
ISO 646 family
the name of a group of 7-bit encodings that are defined in the ISO 646 standard and that each include up to 128 characters. The ISO 646 encodings are similar to ASCII except that ISO 646 has 12 code points that are used for national variants. National variants are specific characters that are needed for a particular language.
ISO 8859 family
the set of 8-bit encodings that are defined in the parts of the ISO 8859 standard (numbered 1 through 16; part 12 was never published). Each encoding contains the 128 ASCII characters plus up to 128 extended characters, which are used in the language or languages that are supported by the encoding. For example, ISO 8859-1, also called Latin-1, is a commonly used encoding in the ISO 8859 family that contains the ASCII characters as well as characters used by Western European languages.
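The split between the shared ASCII range and the encoding-specific extended range can be sketched in Python. The sketch compares ISO 8859-1 with ISO 8859-15, a pairing chosen for illustration because the two differ at byte A4 (hexadecimal).

```python
extended_byte = b"\xa4"

# The extended range (bytes 128-255) differs between members of the family.
in_8859_1 = extended_byte.decode("iso-8859-1")    # currency sign
in_8859_15 = extended_byte.decode("iso-8859-15")  # euro sign

# The ASCII range (bytes 0-127) is the same in both encodings.
ascii_shared = b"A".decode("iso-8859-1") == b"A".decode("iso-8859-15") == "A"

print(in_8859_1, in_8859_15, ascii_shared)  # ¤ € True
```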
language
an aspect of locale that is not necessarily unique to any one country or geographic region. For example, Portuguese is spoken in Brazil as well as in Portugal, but there are separate locales for Portuguese_Portugal and Portuguese_Brazil.
locale
a setting that reflects the language, local conventions, and culture for a geographic region. Local conventions can include specific formatting rules for paper sizes, dates, times, and numbers, and a currency symbol for the country or region. Some examples of locale values are French_Canada, Portuguese_Brazil, and Chinese_Singapore.
localization
the process of adapting a product to meet the language, cultural, and other requirements of a specific target environment or market so that customers can use their own languages and conventions when using the product. Translation of the user interface, system messages, and documentation is part of localization.
logogram
a visual symbol that represents a word or morpheme rather than a speech sound. An example of a logogram in the Chinese language is 山 for the word "mountain".
MBCS
See multi-byte character set.
multi-byte character set
a type of encoding for which one or more bytes of computer memory are required to represent each character. Examples of languages that use multi-byte character sets are Japanese, Korean, and Chinese. Short form: MBCS.
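The variable byte count can be seen in a short Python sketch. It assumes the `shift_jis` and `utf-8` codecs as representative multi-byte encodings. The sample characters are arbitrary.

```python
# Number of bytes needed per character under two multi-byte encodings.
byte_counts = {
    ch: (len(ch.encode("shift_jis")), len(ch.encode("utf-8")))
    for ch in ("A", "山")
}

print(byte_counts["A"])   # (1, 1)
print(byte_counts["山"])  # (2, 3)
```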
national character
a character (letter, ideograph, or pictograph) that belongs to a writing system, but is not a Latin character (A-Z and a-z).
national language support
See NLS.
NLS
the set of features that enable a software product to function properly in every global market for which the product is targeted. Short form: NLS.
SBCS
See single-byte character set.
single-byte character set
a type of encoding for which each character is represented using one byte of computer memory. An example of a single-byte character set is Latin-1. Short form: SBCS.
special character
a type of character other than alphanumeric characters, the underscore (_), and the blank. An example is the asterisk (*).
transcoding
the process of converting the contents of a SAS file from one encoding to another encoding. Transcoding is necessary if the session encoding and the file encoding are different, such as when transferring data from a Latin-1 encoding under UNIX to a German EBCDIC encoding on an IBM mainframe.
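The general idea of transcoding (decode from the source encoding, then encode to the target encoding) can be sketched in Python on a plain byte string. This is not how SAS transcodes files. The helper `transcode` is hypothetical, and `cp273` is assumed here as a stand-in for a German EBCDIC encoding.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another by way of Unicode text."""
    return data.decode(source).encode(target)

latin1_data = "Grüße".encode("latin-1")

# Same characters, different byte values after transcoding.
ebcdic_data = transcode(latin1_data, "latin-1", "cp273")

# Transcoding back recovers the original bytes because both
# encodings contain every character in the sample text.
round_trip = transcode(ebcdic_data, "cp273", "latin-1")

print(latin1_data != ebcdic_data)  # True
print(round_trip == latin1_data)   # True
```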
translation table
an operating environment-specific SAS catalog entry that is used to translate the value of one character to another. Translation tables often are needed to support the use of multiple national languages in an application. An example of a translation table is one that converts characters from EBCDIC to ASCII-ISO.
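The concept of a byte-for-byte translation table can be sketched in Python with `bytes.translate`, which takes a 256-byte table. This is an analogy only, not the SAS catalog entry format. The table name `EBCDIC_TO_LATIN1` is hypothetical, and `cp037` is assumed as the EBCDIC code page.

```python
ALL_BYTES = bytes(range(256))

# Position i of the table holds the Latin-1 byte for EBCDIC byte i.
# errors="replace" keeps the table at exactly 256 bytes even if a
# character had no Latin-1 equivalent (it would become "?").
EBCDIC_TO_LATIN1 = ALL_BYTES.decode("cp037").encode("latin-1", errors="replace")

def translate(data: bytes) -> bytes:
    """Translate each byte through the table, one value to another."""
    return data.translate(EBCDIC_TO_LATIN1)

ebcdic = "HELLO 123".encode("cp037")
print(translate(ebcdic))  # b'HELLO 123'
```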
Unicode
a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Unicode includes more than 109,000 characters covering dozens of scripts, as well as specifications for character properties (such as uppercase and lowercase), for rendering bidirectional scripts, and for a number of related items.