SAS enterprise software offerings such as SAS Data Management and SAS Data Integration Server include SAS Data Integration Studio, SAS Data Quality Server, and the DataFlux Data Management Platform. SAS Data Quality Server consists of a Quality Knowledge Base (QKB) and SAS language elements (functions, procedures, and macros) for cleansing data. The DataFlux Data Management Platform provides a single environment for managing data quality, data discovery, and master data management (MDM).
Many of the features
in SAS Data Quality Server and the DataFlux Data Management Platform
can be used in SAS Data Integration Studio jobs. For example, you
can use DataFlux standardization schemes and definitions in SAS Data
Integration Studio jobs. You can also execute DataFlux jobs, profiles,
and services from SAS Data Integration Studio.
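For example, if SAS Data Quality Server is licensed at your site, user-written code in a SAS Data Integration Studio job can call the data quality functions directly. The following is a minimal sketch, not site-ready code: the QKB location (C:\qkb) and the “Name” standardization definition are assumptions about the local configuration.

   /* Load an English (US) locale from the QKB into memory */
   %dqload(dqlocale=(ENUSA), dqsetuploc='C:\qkb');

   /* Standardize a name column with an ENUSA standardization definition */
   data work.customers_std;
      set work.customers;
      name_std = dqStandardize(name, 'Name');
   run;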
If your site has licensed
the appropriate DataFlux software, you can take advantage of the following
components:
DataFlux Web Studio
a web-based application
with separately licensed modules that enable you to perform data management
tasks.
DataFlux Data Management Studio
a desktop client that
combines data quality and data discovery features. You can use this
client to create jobs, profiles, standardization schemes, and other
resources that can be included in SAS Data Integration Studio jobs.
DataFlux Data Management Server
provides a scalable server environment for large DataFlux Data Management Studio jobs. Jobs can be uploaded from DataFlux Data Management Studio to a DataFlux Data Management Server, where they are executed. SAS Data Integration Studio can also execute DataFlux jobs on this server.
data job
a DataFlux job that
specifies a set of data cleansing and enrichment operations that flow
from source to target.
data service
a data job that has
been configured as a real-time service and deployed to a DataFlux
Data Management Server.
process job
a DataFlux job that
combines data processing with conditional processing. The process
flow in the job supports logical decisions, looping, events, and other
features that are not available in a data job flow.
profile
a job that executes one or more data profiling operations and displays a report based on the results of these operations. Data profiling encompasses discovery and audit activities that help you assess the composition, organization, and quality of databases. (A brief profiling sketch follows this list.)
DataFlux Quality Knowledge Base (QKB)
a collection of files and reference sources that enables Blue Fusion, the underlying DataFlux engine, and consequently all DataFlux software to perform parsing, standardization, analysis, matching, and other processes. A QKB includes locales, standardization schemes, and other resources. (A locale-loading sketch follows this list.)
locale
a collection of data types and definitions that are pertinent to a particular language or language convention. A locale for English – UK, for example, has a different address parse definition than a locale for English – US, because the address formats differ significantly even though the language is essentially the same. (A parsing sketch follows this list.)
standardization scheme
a file that contains pairs of data values and standardized values. Schemes are used to standardize columns by providing a set of acceptable values: each column value that appears in the scheme is replaced with its standardized value. (A scheme-application sketch follows this list.)
standardization definition
a set of logic used to standardize an element within a string. For example, a definition could be used to expand all instances of “Univ.” to “University” without having to specify every literal instance, such as “Univ. Arizona” and “Oxford Univ.”, in a scheme. (A standardization sketch follows this list.)
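The sketches that follow illustrate several of the terms defined above; all data set names, paths, and QKB definition names are placeholders or assumptions about the local setup. First, profiling. This is not the DataFlux profiling engine, only a Base SAS approximation of a few column-level measures that a profile report typically includes (row counts, missing values, distinct values):

   /* Minimal column-level profiling measures for one column */
   proc sql;
      select count(*)               as row_count,
             count(*) - count(city) as missing_city,
             count(distinct city)   as distinct_city
      from work.customers;
   quit;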
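Before any SAS Data Quality Server language element can run, one or more locales must be loaded from the QKB into memory. A minimal sketch, assuming that the QKB is installed at C:\qkb:

   /* Load two QKB locales into memory */
   %dqload(dqlocale=(ENUSA ENGBR), dqsetuploc='C:\qkb');

   /* List the locales that are currently loaded */
   data _null_;
      loaded = dqLocaleInfoGet('loaded');
      put loaded=;
   run;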
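Parsing shows why locales matter: the same definition name resolves to different logic in different locales. A sketch, assuming that an ENUSA locale is loaded as shown above and that the QKB provides an “Address” parse definition with a “Street Name” token (both names are assumptions):

   data _null_;
      /* Parse a street address, then extract one token from the result */
      parsed = dqParse('921 W Main St', 'Address', 'ENUSA');
      street = dqParseTokenGet(parsed, 'Street Name', 'Address', 'ENUSA');
      put street=;
   run;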
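Applying a scheme replaces each column value that appears in the scheme with its standardized value. A sketch that uses the APPLY statement of PROC DQSCHEME; the input data set and the scheme are placeholders, and MODE= is an assumption about how the scheme should be matched:

   proc dqscheme data=work.vendors out=work.vendors_std;
      /* MODE=PHRASE matches each full value against the scheme;
         MODE=ELEMENT would match word by word */
      apply scheme=work.vendor_scheme var=vendor_name mode=phrase;
   run;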
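By contrast, a standardization definition applies logic rather than a lookup table, so variants are handled without being listed individually. A sketch, assuming that the loaded ENUSA QKB includes an “Organization” standardization definition:

   data _null_;
      org = dqStandardize('Univ. Arizona', 'Organization');
      put org=;   /* the standardized form depends on the QKB rules */
   run;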