Most of the data quality transformations ask you to select a source
column, a locale, and a definition. A locale represents a distinct
alphabetic language combined with a specified regional usage of that
language. For example, the English, United States locale applies only
to that region. The English, England locale addresses different usage
or data content for the same alphabetic language.

A locale consists of a collection of definitions. Definitions tell SAS
how to cleanse data. For example, the Street Address definition for the
English, United States locale describes the structure of the first part
of an American mailing address. In the Spanish, Mexico locale, the
Street Address definition accommodates differences in mailing address
structure as well as differences in language and alphabet.
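
The relationship between locales and definitions can be pictured as a
two-level lookup: first by locale, then by definition name. The sketch
below is a hypothetical, simplified Python model for illustration only.
The Locale and QualityKnowledgeBase classes, the lookup method, and the
placeholder rule strings are assumptions, not part of SAS Data Loader or
the Quality Knowledge Base.

```python
from dataclasses import dataclass, field
from typing import Dict


# Hypothetical, simplified model for illustration only; not a SAS API.
@dataclass(frozen=True)
class Locale:
    language: str
    region: str


@dataclass
class QualityKnowledgeBase:
    # Maps each locale to its definitions, keyed by definition name.
    # The string values are placeholders standing in for real QKB cleansing logic.
    definitions: Dict[Locale, Dict[str, str]] = field(default_factory=dict)

    def lookup(self, locale: Locale, definition: str) -> str:
        """Return the rules for one definition within one locale."""
        try:
            return self.definitions[locale][definition]
        except KeyError:
            raise KeyError(f"{definition!r} is not defined for {locale}") from None


en_us = Locale("English", "United States")
es_mx = Locale("Spanish", "Mexico")

qkb = QualityKnowledgeBase({
    en_us: {"Street Address": "placeholder: US street address rules"},
    es_mx: {"Street Address": "placeholder: Mexican street address rules"},
})

# Same definition name, different rules depending on the locale.
us_rules = qkb.lookup(en_us, "Street Address")
mx_rules = qkb.lookup(es_mx, "Street Address")
```

Because the same definition name resolves to different rules in each
locale, a transformation needs both a locale and a definition to know
how to cleanse a column.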

Locales and definitions make up a SAS Quality Knowledge Base. A Quality
Knowledge Base is deployed on your Hadoop cluster. When you run a data
cleansing job in Hadoop, the SAS software on your cluster accesses the
Quality Knowledge Base to transform your data.

In SAS Data Loader, you specify a default locale, which should match the
typical locale of your source data. You select the default locale in the
QKB panel of the Configuration window, as described in QKB Panel. You
can override the default locale in any of the data quality
transformations. The override applies only to the current
transformation.
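
The scope of an override can be summarized as a simple resolution rule:
a transformation uses its own locale if one is set, and otherwise falls
back to the default. The following Python sketch is hypothetical and
only restates that rule; the effective_locale function and the
DEFAULT_LOCALE value are assumptions for illustration, not SAS Data
Loader settings or APIs.

```python
from typing import Optional

# Hypothetical illustration of the override rule; not a SAS API.
# Assumed value standing in for the default chosen in the QKB panel.
DEFAULT_LOCALE = "English, United States"


def effective_locale(default_locale: str, override: Optional[str] = None) -> str:
    """Return the locale that a single transformation uses.

    An override applies only to this call; the default itself is never changed.
    """
    return override if override is not None else default_locale


# Three transformations evaluated with the same configuration:
first = effective_locale(DEFAULT_LOCALE)                      # uses the default
second = effective_locale(DEFAULT_LOCALE, "Spanish, Mexico")  # overridden here only
third = effective_locale(DEFAULT_LOCALE)                      # default again
```

The third call shows that the override in the second transformation does
not carry over to later transformations.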

To learn more about the Quality Knowledge Base, see the related titles
in Recommended Reading.

To learn about the output that a given definition generates, see the
topic Global Definitions in the online Help for the SAS Quality
Knowledge Base.