SAS Quality Knowledge Base

The SAS Quality Knowledge Base (QKB) is a collection of files that store the data and logic defining data management operations such as parsing, standardization, and matching. SAS software products consult the QKB whenever you perform tasks such as data cleansing or address parsing.
There are two QKB products: QKB for Contact Information and QKB for Product Data. QKB for Contact Information supports the management of commonly used contact information for individuals and organizations, such as names, addresses, company names, and phone numbers. QKB for Product Data supports the management of common attributes of products and services, such as dimensions, color, materials, packaging terms, and part numbers.
A QKB contains hundreds of thousands of data points and rules that enable the computer to analyze and correct data much as a human would. This collection of data quality rules is shared across the entire SAS suite, supporting a "write once, use anywhere" data quality strategy. The same set of rules can be used in-stream, in-database, or in-memory.
You can modify and extend a QKB to cleanse virtually any type of data by modifying existing pattern libraries or creating new ones. You can also use the QKB to manage components such as word vocabularies, phonetic match rules, and standardization rules. When you upgrade the QKB, the installation automatically identifies user-defined modifications and merges them into the new release.
Each QKB is supported and licensed by locale. A locale combines a language and a country (for example, English, United States; English, Canada; and French, Canada). SAS supports QKB locales for more than 40 language regions, including French, German, Italian, Russian, Chinese, and Polish. You can license one or more locales for each QKB in your enterprise. To process data that originates in a specific locale, license that locale for the QKB that handles that type of data.
The data quality algorithms in the QKB are fully tunable through Customize, an application that lets you modify the QKBs used in your DataFlux Data Management Studio data flow jobs. Customize enables you to add or modify the following components:
  • pattern rules
  • word vocabularies
  • standardization or recode rules
  • phonetic match rules
  • regular expression rules
You can use the customization interface to teach the SAS engine how to parse, match, and standardize content that includes product names, descriptions, numeric information, and more. Customization also lets you create entirely new data quality definitions that better meet your project's needs. Definitions added to the Quality Knowledge Base become available immediately across the entire SAS suite.
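The QKB's own rule formats are proprietary and are edited through Customize, so the following Python sketch is not the QKB's implementation. It only illustrates, under stated assumptions, what two of the component types listed above do conceptually: a standardization (recode) rule modeled as a token lookup table, and a phonetic match rule, for which the classic American Soundex algorithm is swapped in as a stand-in. The names RECODE, standardize, soundex, and match_key are hypothetical, and the recode entries are invented examples.

```python
import re

# Hypothetical recode table: maps variant tokens to a standard form.
# (Illustrative only; not taken from any QKB locale.)
RECODE = {
    "STREET": "St", "ST": "St",
    "AVENUE": "Ave", "AVE": "Ave",
    "INCORPORATED": "Inc", "INC": "Inc",
    "CORPORATION": "Corp", "CORP": "Corp",
}

# Soundex digit groups (classic American Soundex, used here as a stand-in
# for a phonetic match rule).
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def standardize(text):
    """Split into alphanumeric tokens and replace known variants via RECODE."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return " ".join(RECODE.get(tok.upper(), tok.capitalize()) for tok in tokens)


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # Vowels reset the previous code; H and W do not separate duplicates.
        if ch not in "HW":
            prev = code
    return ("".join(result) + "000")[:4]


def match_key(text):
    """Standardize, then build a phonetic key so spelling variants collide."""
    return " ".join(soundex(tok) for tok in standardize(text).split())


if __name__ == "__main__":
    print(standardize("123 main STREET"))
    print(match_key("Jon Smith"), match_key("John Smyth"))
```

In this sketch, "Jon Smith" and "John Smyth" produce the same match key, which is the kind of behavior a match definition is meant to provide; real QKB match definitions combine parsing, vocabularies, and sensitivity settings in ways this toy example does not attempt to reproduce.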