The SAS Quality Knowledge Base (QKB) is a collection of files that store
the data and logic used to define data management operations such as
parsing, standardization,
and matching. SAS software products refer to the QKB when performing
data management operations, also referred to as data cleansing, on
your data. Each SAS QKB is defined by a locale that specifies the
language or character set that is used for managing different types
of data. The examples in this chapter are based on the English, USA
(ENUSA) locale.
SAS QKB contains several types of definitions. The following definition
types are exposed in DS2:
- Case Definitions: Use case definitions to apply uppercase and lowercase lettering using context-sensitive rules.
- Extraction Definitions: Extraction definitions are used to extract specific entities or attributes from a text string.
- Gender Definitions: Use gender definitions to determine the gender of a person from his or her name or other information.
- Identification Definitions: Identification definitions determine the type of data that is represented by a text string.
- Match Definitions: Use match definitions to generate a matchcode for a text string.
- Parse Definitions: Use parse definitions to segment a string into several parts.
- Pattern Definitions: Use pattern definitions to return a simple representation of a character pattern based on a text string.
- Standardization Definitions: Standardization definitions generate a preferred standard representation of a string, presenting a consistent format for data.
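To make the idea of a pattern definition more concrete, the following is a minimal conceptual sketch in Python. It is not SAS QKB logic and does not use any SAS API. The function name `char_pattern` and the symbol mapping (`A` for an uppercase letter, `a` for a lowercase letter, `9` for a digit, other characters left unchanged) are assumptions chosen only for illustration. The patterns that an actual QKB pattern definition returns depend on the definition and the locale.

```python
def char_pattern(text: str) -> str:
    """Return a simplified character pattern for text.

    Illustrative mapping (an assumption, not the QKB's rules):
    uppercase letter -> "A", lowercase letter -> "a", digit -> "9",
    any other character is kept as-is.
    """
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(char_pattern("Cary, NC 27513"))  # prints: Aaaa, AA 99999
```

A representation like this makes it easy to see which values in a column share a layout, for example by grouping values on their pattern string.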
For complete details,
see the Help that is delivered with SAS QKB.