Glossary
- accented character
-
a type of character that is modified by the addition
of an accent mark that alters the pronunciation of the character.
An example is "ñ", which results from combining the tilde (~)
with the character "n".
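As a small illustration of how an accent combines with a base letter, the following Python sketch uses Unicode normalization. It assumes the accent is the combining tilde (U+0303) rather than the spacing tilde "~" (U+007E); the variable names are illustrative only.

```python
import unicodedata

# Assumption: U+0303 COMBINING TILDE is used, not the spacing tilde "~".
base_letter = "n"
combining_tilde = "\u0303"

decomposed = base_letter + combining_tilde            # two code points
composed = unicodedata.normalize("NFC", decomposed)   # one precomposed code point

print(len(decomposed), len(composed))
print(unicodedata.name(composed))
```

Both forms display as the same accented character, which is why text is often normalized before it is compared.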
- American National Standards Institute
-
See ANSI.
- American Standard Code for Information Interchange
-
See ASCII.
- ANSI
-
the organization that coordinates the development
of voluntary consensus standards for products, services, processes,
systems, and personnel in the U.S. ANSI works with the International
Organization for Standardization to establish global standards. Short
form: ANSI.
- ASCII
-
a 7-bit encoding standard that provides a basic
set of 128 characters, supporting a variety of computer systems. ASCII
encodes the uppercase and lowercase letters of the English alphabet,
punctuation marks, the digits 0-9, and control characters. This set
of 128 characters is also included in most other encodings. Short
form: ASCII.
- BIDI
-
pertaining to a writing system, such as Arabic
or Hebrew, that generally runs from right to left, except for numbers
and embedded text in other languages, which run from left to
right. Short form: BIDI.
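One way to see the mix of directions is to look up the Unicode bidirectional class of individual characters. The sketch below uses Python's standard `unicodedata` module; the sample characters are chosen for illustration and are not taken from the glossary.

```python
import unicodedata

# Illustrative sample characters (an assumption, not part of the glossary).
samples = {
    "Latin letter a": "a",
    "Hebrew letter alef": "\u05d0",
    "Arabic letter alef": "\u0627",
    "European digit 1": "1",
}

# L = left to right, R = right to left, AL = Arabic letter (right to left),
# EN = European number.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in bidi_classes.items():
    print(f"{label}: {bidi_class}")
```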
- bidi
-
See BIDI.
- bidirectional
-
See BIDI.
- CEDA
-
a feature of SAS software that enables a SAS data
file that was created in any directory-based operating environment
(for example, Solaris, Windows, HP-UX, OpenVMS, and z/OS) to be read
by a SAS session that is running in another directory-based environment.
You can access the SAS data files without using any intermediate conversion
steps. Short form: CEDA.
- character
-
the smallest component of a writing system that
has semantic value, such as a letter of an alphabet, a digit, or
an ideograph. A character refers to the abstract meaning rather than
to a specific shape.
- character set
-
a collection of characters that are used by a
language or group of languages. A character set includes national
characters, special characters, the digits 0-9, and control characters.
- collating sequence
-
a set of rules that determine how textual data
is ordered and compared.
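To show that the same strings can be ordered differently under different rules, here is a minimal Python sketch. It contrasts ordering by code point with a simple case-insensitive ordering; it does not reproduce any particular SAS collating sequence, and the word list is made up.

```python
# Hypothetical sample data.
words = ["banana", "Zebra", "apple"]

# Rule 1: compare by code point, so uppercase letters sort before lowercase.
by_code_point = sorted(words)

# Rule 2: ignore case when comparing.
case_insensitive = sorted(words, key=str.casefold)

print(by_code_point)
print(case_insensitive)
```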
- control character
-
a type of character that is used for control purposes
rather than for information exchange. Control characters are usually
nonprintable.
- Cross-Environment Data Access
-
See CEDA.
- data representation
-
the form in which data is stored in a particular
operating environment. Different operating environments use different
standards or conventions for storing floating-point numbers (for example,
IEEE or IBM 390); for character encoding (ASCII or EBCDIC); for the
ordering of bytes in memory (big-endian or little-endian); for word
alignment (4-byte boundaries or 8-byte boundaries); and for data-type
length (16-bit, 32-bit, or 64-bit).
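Byte ordering is the easiest of these differences to demonstrate. The following Python sketch packs the same 32-bit integer in both byte orders with the standard `struct` module, then shows what happens when the bytes are read with the wrong assumption. The value 1 is an arbitrary choice.

```python
import struct

value = 1

big_endian = struct.pack(">i", value)      # most significant byte first
little_endian = struct.pack("<i", value)   # least significant byte first

print(big_endian.hex(), little_endian.hex())

# Reading big-endian bytes as if they were little-endian gives a different number.
misread = struct.unpack("<i", big_endian)[0]
print(misread)
```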
- DBCS
-
See double-byte character set.
- double-byte character set
-
a type of encoding for which one or two bytes
of computer memory are required to represent each character. Double-byte
character sets are used for languages such as Japanese, Korean, and
Chinese. Short form: DBCS.
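The "one or two bytes" behavior can be observed with Python's built-in `shift_jis` codec, used here as an assumed example of a Japanese double-byte encoding. The sample text is illustrative.

```python
# Assumption: Shift JIS stands in for a double-byte character set.
text = "A日本"

# Number of bytes that each character needs in this encoding.
encoded_lengths = {ch: len(ch.encode("shift_jis")) for ch in text}
total_bytes = len(text.encode("shift_jis"))

print(encoded_lengths)
print(total_bytes)
```

The ASCII letter takes one byte and each kanji takes two, so the number of bytes differs from the number of characters.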
- EBCDIC
-
a family of single-byte and multi-byte encodings
for the representation of data on IBM mainframe and mid-range computers.
EBCDIC encodes the uppercase and lowercase letters of the English
alphabet, punctuation marks, the digits 0-9, and an extended set of
control characters. Short form: EBCDIC.
- encode
-
to represent data in a particular character encoding
scheme. For example, in ASCII, the letter "A" is represented as 41
(hexadecimal).
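The ASCII value above can be checked in Python. For contrast, the sketch also encodes the same letter with `cp037`, an EBCDIC code page that ships with Python; choosing that particular code page is an assumption made for illustration.

```python
letter = "A"

ascii_hex = letter.encode("ascii").hex()    # ASCII code value for "A"
ebcdic_hex = letter.encode("cp037").hex()   # same letter in one EBCDIC code page

print(ascii_hex, ebcdic_hex)
```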
- encoding
-
a mapping of a coded character set to code values.
- encoding method
-
the set of rules that are used for assigning numeric
representations to the characters in a character set. For example,
these rules specify how many bits are used for storing the numeric
representation of the character, as well as the ranges in the code
page in which characters appear. The encoding methods are standards
that have been developed in the computing industry. An encoding method
is often specific to a computer hardware vendor. Common encoding methods
include ASCII, EBCDIC, the ISO 646 family, the ISO 8859 family, and
Unicode.
- Extended Binary Coded Decimal Interchange Code
-
See EBCDIC.
- graphic character
-
a type of character that can be written, printed,
or displayed.
- I18N
-
See internationalization.
- International Organization for Standardization
-
See ISO.
- internationalization
-
the process of designing a software product without
making assumptions that are based on a single language or locale.
Internationalization ensures that international conventions (including
rules for sorting strings and for formatting dates, times, numbers,
and currencies) are supported. It also facilitates a consistent user
experience across different language editions of a product. Short
form: I18N.
- ISO
-
an organization that promotes the development
of standards and sponsors related activities that help to disseminate
products and services among nations. It also supports the exchange
of intellectual, scientific, and technological information. Short
form: ISO.
- ISO 646 family
-
the name of a group of 7-bit encodings that are
defined in the ISO 646 standard and that each include up to 128 characters.
The ISO 646 encodings are similar to ASCII except that ISO 646 has
12 code points that are used for national variants. National variants
are specific characters that are needed for a particular language.
- ISO 8859 family
-
the set of 8-bit encodings that are defined
in the parts of the ISO 8859 standard (numbered 1 through 16; part 12
was never published). Each encoding contains the 128 ASCII
characters and up to 128 extended characters, which are used in the
language or languages that are supported by the encoding. For example,
ISO 8859-1, also called Latin-1, is a commonly used encoding in the
ISO 8859 family that contains the ASCII characters as well as characters
used by Western European languages.
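A short Python sketch can show both halves of an ISO 8859 encoding: the ASCII range is shared, while the upper range depends on the family member. ISO 8859-15 is brought in here only for comparison and is not mentioned in the glossary.

```python
ascii_text = "SAS"

# The lower 128 code values are identical to ASCII.
same_as_ascii = ascii_text.encode("ascii") == ascii_text.encode("iso-8859-1")

# An extended character used by Western European languages.
e_acute = "é".encode("iso-8859-1")

# The same upper-range byte decodes differently in two family members.
upper_byte = b"\xa4"
in_latin1 = upper_byte.decode("iso-8859-1")    # currency sign
in_latin9 = upper_byte.decode("iso-8859-15")   # euro sign

print(same_as_ascii, e_acute.hex(), in_latin1, in_latin9)
```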
- language
-
an aspect of locale that is not necessarily unique
to any one country or geographic region. For example, Portuguese is
spoken in Brazil as well as in Portugal, but there are separate locales
for Portuguese_Portugal and Portuguese_Brazil.
- locale
-
a setting that reflects the language, local conventions,
and culture for a geographic region. Local conventions can include
specific formatting rules for paper sizes, dates, times, and numbers,
and a currency symbol for the country or region. Some examples of
locale values are French_Canada, Portuguese_Brazil, and Chinese_Singapore.
- localization
-
the process of adapting a product to meet the
language, cultural, and other requirements of a specific target environment
or market so that customers can use their own languages and conventions
when using the product. Translation of the user interface, system
messages, and documentation is part of localization.
- logogram
-
a visual symbol that represents a word or morpheme
rather than a speech sound. An example of a logogram in the Chinese
language is 山 for the word "mountain".
- MBCS
-
See multi-byte character set.
- multi-byte character set
-
a type of encoding for which one or more bytes
of computer memory are required to represent each character. Multi-byte
character sets are used for languages such as Japanese, Korean, and
Chinese. Short form: MBCS.
- national character
-
a character (letter, ideograph, or pictograph)
that belongs to a writing system, but is not a Latin character (A-Z
and a-z).
- national language support
-
See NLS.
- NLS
-
the set of features that enable a software product
to function properly in every global market for which the product
is targeted. Short form: NLS.
- SBCS
-
See single-byte character set.
- single-byte character set
-
a type of encoding for which each character is
represented using one byte of computer memory. An example of a single-byte
character set is Latin-1. Short form: SBCS.
- special character
-
a type of character other than alphanumeric characters,
the underscore (_), and the blank. An example is the asterisk (*).
- transcoding
-
the process of converting the contents of a SAS
file from one encoding to another encoding. Transcoding is necessary
if the session encoding and the file encoding are different, such
as when transferring data from a Latin-1 encoding under UNIX to a
German EBCDIC encoding on an IBM mainframe.
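The same idea can be sketched outside SAS on a plain byte string. The Python example below converts Latin-1 bytes to Python's `cp273` codec, which is assumed here to stand in for a German EBCDIC encoding. It illustrates the concept only and is not how SAS transcodes files.

```python
# Hypothetical sample text that contains German national characters.
text = "Größe"

latin1_bytes = text.encode("latin-1")

# Transcode: interpret the bytes in the source encoding, then write them
# out in the target encoding.
ebcdic_bytes = latin1_bytes.decode("latin-1").encode("cp273")

# Decoding with the target encoding recovers the original text.
round_trip = ebcdic_bytes.decode("cp273")

print(latin1_bytes.hex(), ebcdic_bytes.hex(), round_trip)
```

The characters are unchanged, but every byte value differs between the two encodings, which is why reading the data with the wrong encoding produces garbled text.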
- translation table
-
an operating environment-specific SAS catalog
entry that is used to translate the value of one character to another.
Translation tables often are needed to support the use of multiple
national languages in an application. An example of a translation
table is one that converts characters from EBCDIC to ASCII-ISO.
- Unicode
-
a computing industry standard for the consistent
encoding, representation, and handling of text expressed in most of
the world's writing systems. Unicode includes more than 109,000 characters
covering dozens of scripts, plus standards for character properties
such as uppercase and lowercase, for rendering bidirectional text,
and for a number of related items.
Copyright © SAS Institute Inc. All rights reserved.