About QKB

The SAS Quality Knowledge Base (QKB) is a collection of files that store the data and logic that define data management operations such as parsing, standardization, and matching. SAS software products refer to the QKB when they perform data management operations (also called data cleansing) on your data. Each SAS QKB is defined by a locale that specifies the language or character set that is used for managing different types of data. The examples in this chapter are based on the English, USA (ENUSA) locale.
SAS QKB contains several types of definitions. The following definition types are exposed in DS2:
  • Case Definitions: Use case definitions to apply uppercase and lowercase lettering using context-sensitive rules.
  • Extraction Definitions: Use extraction definitions to extract specific entities or attributes from a text string.
  • Gender Definitions: Use gender definitions to determine the gender of a person from the person's name or other information.
  • Identification Definitions: Use identification definitions to determine the type of data that a text string represents.
  • Match Definitions: Use match definitions to generate a matchcode for a text string.
  • Parse Definitions: Use parse definitions to segment a string into several parts.
  • Pattern Definitions: Use pattern definitions to return a simple representation of a character pattern based on a text string.
  • Standardization Definitions: Use standardization definitions to generate a preferred standard representation of a string so that data is presented in a consistent format.
For complete details, see the Help that is delivered with SAS QKB.
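To make the idea of a pattern definition more concrete, the following Python sketch shows one way that a "simple representation of a character pattern" could be derived from a text string. It is only an illustration of the concept and is not the QKB logic or a SAS API. The function name char_pattern and the symbols that it emits (A for an uppercase letter, a for a lowercase letter, 9 for a digit, * for any other non-space character) are assumptions made for this example. The actual output of a pattern definition depends on the definition and locale in your QKB.

```python
def char_pattern(text: str) -> str:
    """Return a character-by-character pattern for text.

    Hypothetical convention for this sketch only (not taken from the QKB):
      A = uppercase letter, a = lowercase letter, 9 = digit,
      * = any other non-space character; spaces are kept as-is.
    """
    symbols = []
    for ch in text:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha() and ch.isupper():
            symbols.append("A")
        elif ch.isalpha():
            symbols.append("a")
        elif ch.isspace():
            symbols.append(" ")
        else:
            symbols.append("*")
    return "".join(symbols)


if __name__ == "__main__":
    # Values with the same layout produce the same pattern, which is
    # what makes a pattern useful for profiling a column of data.
    for value in ["Cary, NC 27513", "Apex, NC 27502", "919-555-0100"]:
        print(f"{value!r:20} -> {char_pattern(value)}")
```

Grouping values by the pattern that they produce is a common way to spot entries that do not follow the expected layout before you apply parsing or standardization.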
Last updated: March 6, 2018