SAS Quality Knowledge
Base (QKB) is a collection of files that store data and logic that
define data management operations such as parsing, standardization,
and matching. SAS software products refer to the QKB when you perform
data management operations such as data cleansing or address parsing.
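The following toy sketch in Python makes the three operations concrete for street addresses. It splits a value into tokens (parsing), replaces known variants with a standard form (standardization), and compares the standardized results (matching). The scheme entries and the names `parse_tokens`, `standardize`, and `is_match` are hypothetical illustrations. They are not QKB definitions or a SAS API, and real QKB definitions are far richer and locale-specific.

```python
import re

# Hypothetical standardization scheme: maps variant tokens to a standard form.
# Real QKB schemes are much larger and locale-specific; these entries are illustrative.
STANDARDIZATION_SCHEME = {
    "ST": "STREET",
    "STR": "STREET",
    "AVE": "AVENUE",
    "AV": "AVENUE",
    "RD": "ROAD",
}


def parse_tokens(value: str) -> list:
    """Parsing: split a raw value into uppercase word tokens, dropping punctuation."""
    return re.findall(r"[A-Z0-9]+", value.upper())


def standardize(value: str) -> str:
    """Standardization: replace each token with its standard form when the scheme has one."""
    return " ".join(STANDARDIZATION_SCHEME.get(t, t) for t in parse_tokens(value))


def is_match(a: str, b: str) -> bool:
    """Matching: treat two values as the same when their standardized forms are equal."""
    return standardize(a) == standardize(b)


print(standardize("123 Main St."))              # 123 MAIN STREET
print(is_match("42 Oak Ave", "42 oak avenue"))  # True
```

Exact comparison after standardization is the simplest possible matching rule. Production matching typically layers fuzzier techniques, such as the phonetic match rules mentioned later in this section, on top of it.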
There are two QKB products,
QKB for Contact Information and QKB for Product Data. QKB for Contact
Information supports the management of commonly used contact information
for individuals and organizations, such as names, addresses, company
names, and phone numbers. QKB for Product Data supports the management
of common attributes related to products and services, such as dimensions,
color, materials, packaging terms, and part numbers.
A QKB contains hundreds
of thousands of data points and rules that enable software to analyze
and correct data much as a human expert would. This collection of data quality rules
is shared across the entire SAS suite, supporting a “write
once, use anywhere” data quality rules strategy. This same
set of rules can be used in-stream, in-database, or in-memory.
You can modify and extend
a QKB to cleanse virtually any type of data by modifying existing
pattern libraries or creating new ones. You can also use it to manage
components such as vocabularies, phonetic match rules, and standardization
rules. When you upgrade the QKB, the installation automatically identifies
user-defined modifications and merges them into the new
release.
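Conceptually, carrying user-defined modifications into a new release resembles a three-way merge. The release you started from, your customized copy, and the new release are compared to decide what to keep. The Python sketch below models each release as a dictionary of rule names to values. That model, the function names, and the policy that the user's value wins when both sides changed a rule are assumptions made for illustration. They do not describe how the SAS installer actually performs the merge.

```python
# Hypothetical model: a QKB release is a dict mapping rule names to rule values.


def user_modifications(old_release: dict, customized: dict) -> dict:
    """Entries the user added or changed relative to the release they started from.

    Deleted entries are ignored in this simplified sketch."""
    return {
        name: value
        for name, value in customized.items()
        if name not in old_release or old_release[name] != value
    }


def merge_upgrade(old_release: dict, customized: dict, new_release: dict):
    """Apply the user's modifications on top of the new release.

    Returns the merged rules plus the names of rules that both the user and
    the new release changed to different values (the user's value is kept)."""
    mods = user_modifications(old_release, customized)
    conflicts = sorted(
        name
        for name, value in mods.items()
        if name in old_release
        and name in new_release
        and new_release[name] != old_release[name]
        and new_release[name] != value
    )
    merged = {**new_release, **mods}  # later keys (user changes) override release values
    return merged, conflicts


old = {"ST": "STREET", "AVE": "AVENUE"}
custom = {"ST": "STREET", "AVE": "AVENUE", "BLVD": "BOULEVARD"}  # user added BLVD
new = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}            # release added RD
merged, conflicts = merge_upgrade(old, custom, new)
print(merged)     # {'ST': 'STREET', 'AVE': 'AVENUE', 'RD': 'ROAD', 'BLVD': 'BOULEVARD'}
print(conflicts)  # []
```

Reporting conflicts instead of silently resolving them keeps the hypothetical "user wins" policy visible. A real upgrade process would also need a policy for rules that the user deleted.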
Each QKB is supported and
licensed by locale. A locale is defined by a language and a country
(for example, English, United States; English, Canada; and French,
Canada). SAS supports QKB locales for more than 40 language regions,
including French, German, Italian, Russian, Chinese, and Polish. You
can license support for one or more locales for each QKB for your
enterprise. To process data that originates in specific locales, license
those locales for the QKB that handles that type of data.
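The licensing rule above amounts to a coverage check: every locale present in your data needs a corresponding licensed locale for the relevant QKB. The following Python sketch assumes records with hypothetical `language` and `country` fields. It represents locales as (language, country) pairs that mirror the examples in the text, not as SAS locale identifiers.

```python
from typing import Iterable

# Hypothetical licensed locales, written as (language, country) pairs;
# these are illustrative and are not SAS locale identifiers.
LICENSED_LOCALES = {("English", "United States"), ("English", "Canada")}


def required_locales(records: Iterable[dict]) -> set:
    """Collect the (language, country) locale of every record."""
    return {(r["language"], r["country"]) for r in records}


def unlicensed_locales(records: Iterable[dict], licensed: set) -> set:
    """Return locales that appear in the data but are not licensed."""
    return required_locales(records) - licensed


customers = [
    {"name": "Ann Lee", "language": "English", "country": "United States"},
    {"name": "Luc Roy", "language": "French", "country": "Canada"},
]
print(unlicensed_locales(customers, LICENSED_LOCALES))  # {('French', 'Canada')}
```

In this example, the French-Canadian record would require licensing the French, Canada locale before the data could be processed.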
The data quality algorithms
used in the QKB are fully tunable through Customize, an application
that enables you to modify the QKBs used in your DataFlux
Data Management Studio data flow jobs. With Customize, you can
add or modify the following components:
-
-
- standardization or recode rules
-
-
You can use the customization
interface to teach the SAS engine how to parse, match, and standardize
content that includes product names, descriptions, numeric information,
and more. You can also create entirely
new data quality definitions to better meet the needs of your projects.
New definitions that you add to the Quality Knowledge Base
are immediately available across the entire SAS suite.
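The following Python sketch shows what teaching a parse for numeric product information involves. It uses a regular expression to extract length, width, height, and unit from dimension strings such as "12 x 8 x 4 in". The pattern and the `parse_dimensions` function are hypothetical. In a QKB, this kind of logic is defined through Customize and the QKB's own definitions, not through Python code.

```python
import re
from typing import Optional

# Hypothetical pattern for dimension strings such as "12 x 8 x 4 in".
DIMENSIONS = re.compile(
    r"(?P<length>\d+(?:\.\d+)?)\s*x\s*"
    r"(?P<width>\d+(?:\.\d+)?)\s*x\s*"
    r"(?P<height>\d+(?:\.\d+)?)\s*(?P<unit>in|cm|mm)\b",
    re.IGNORECASE,
)


def parse_dimensions(text: str) -> Optional[dict]:
    """Return the first length/width/height/unit found in text, or None."""
    m = DIMENSIONS.search(text)
    if m is None:
        return None
    return {
        "length": float(m["length"]),
        "width": float(m["width"]),
        "height": float(m["height"]),
        "unit": m["unit"].lower(),  # normalize the unit token
    }


print(parse_dimensions("Shipping box, 12 x 8 x 4 in, brown"))
# {'length': 12.0, 'width': 8.0, 'height': 4.0, 'unit': 'in'}
```

A single fixed pattern like this breaks quickly on real product descriptions (for example, "12in x 8in" or spelled-out units). That brittleness is the reason to express such rules in editable, shared definitions rather than hard-coding them.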