Enterprise bundles that
include SAS Data Integration Studio also include the SAS Data Quality
Server and the DataFlux Data Management Platform. The SAS Data Quality
Server consists of a Quality Knowledge Base (QKB) and SAS language
elements. The DataFlux Data Management Platform provides a single
environment for managing data quality, data discovery, and master
data management (MDM).
Many of the features
in SAS Data Quality Server and the DataFlux Data Management Platform
can be used in SAS Data Integration Studio jobs. For example, you
can use DataFlux standardization schemes and definitions in SAS Data
Integration Studio jobs. You can also execute DataFlux jobs, profiles,
and services from SAS Data Integration Studio.
If your site has licensed
the appropriate DataFlux software, you can take advantage of the following
components and resources:
DataFlux Data Management Studio
a desktop client that
combines data quality and data discovery features. You can use this
client to create jobs, profiles, standardization schemes, and other
resources that can be included in SAS Data Integration Studio jobs.
DataFlux Data Management Server
a server that provides a scalable
environment for large DataFlux Data Management Studio jobs.
Jobs can be uploaded from DataFlux Data Management Studio to a DataFlux
Data Management Server, where the jobs are executed. SAS Data Integration
Studio can execute DataFlux jobs on this server.
data job
a DataFlux job that
specifies a set of data cleansing and enrichment operations that flow
from source to target.
data service
a data job that has
been configured as a real-time service and deployed to a DataFlux
Data Management Server.
process job
a DataFlux job that
combines data processing with conditional processing. The process
flow in the job supports logical decisions, looping, events, and other
features that are not available in a data job flow.
profile
a job that executes
one or more data profiling operations and displays a report based
on the result of these operations. Data profiling encompasses discovery
and audit activities that help you assess the composition, organization,
and quality of databases.
DataFlux Quality Knowledge Base (QKB)
a collection of files
and reference sources that allow Blue Fusion, and consequently all
DataFlux software, to perform parsing, standardization, analysis, matching,
and other processes. A QKB includes locales, standardization schemes,
and other resources.
locale
a collection of data
types and definitions that are pertinent to a particular language
or language convention. For example, the address parse definition in the
English – UK locale differs from the one in the English – US locale because
the address formats are significantly different even though the language
is similar.
standardization scheme
a file that contains
pairs of data values and standardized values. Schemes are used to
standardize columns by providing a set of acceptable values.
standardization definition
a set of logic used
to standardize an element within a string. For example, a definition
could be used to expand all instances of “Univ.” to
“University” without having to specify every literal
instance such as “Univ. Arizona” and “Oxford
Unv.” in a scheme.
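
The difference between a standardization scheme and a standardization definition can be illustrated with a minimal Python sketch. This is not DataFlux or QKB code. The names SCHEME, apply_scheme, UNIV_ABBREVIATION, and apply_definition are hypothetical, and the lookup table and regular expression are assumptions chosen only to mirror the "Univ." example above.

```python
import re

# Hypothetical scheme: each literal data value is paired with its standardized value.
# Every variant that should be standardized must be listed explicitly.
SCHEME = {
    "univ. arizona": "University Arizona",
    "oxford unv.": "Oxford University",
}


def apply_scheme(value: str) -> str:
    """Return the standardized value for an exact (case-insensitive) match.

    Values that are not listed in the scheme are returned unchanged.
    """
    return SCHEME.get(value.strip().lower(), value)


# Hypothetical definition: one rule that rewrites an element wherever it
# appears inside a string, so individual values do not have to be listed.
# The pattern matches the abbreviations "Univ." and "Unv.".
UNIV_ABBREVIATION = re.compile(r"\bUni?v\.", re.IGNORECASE)


def apply_definition(value: str) -> str:
    """Expand the abbreviation to "University" anywhere in the string."""
    return UNIV_ABBREVIATION.sub("University", value)


if __name__ == "__main__":
    for raw in ["Univ. Arizona", "Oxford Unv.", "Univ. Texas"]:
        print(raw, "|", apply_scheme(raw), "|", apply_definition(raw))
```

In this sketch, the scheme leaves "Univ. Texas" unchanged because that literal value is not in the table, while the rule-based definition expands it without any additional entries.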