Integration with DataFlux Data Management Platform

Overview

Enterprise bundles that include SAS Data Integration Studio also include the SAS Data Quality Server and the DataFlux Data Management Platform. The SAS Data Quality Server consists of a Quality Knowledge Base (QKB) and SAS language elements. The DataFlux Data Management Platform provides a single environment for managing data quality, data discovery, and master data management (MDM).
Many of the features in SAS Data Quality Server and the DataFlux Data Management Platform can be used in SAS Data Integration Studio jobs. For example, you can use DataFlux standardization schemes and definitions in SAS Data Integration Studio jobs. You can also execute DataFlux jobs, profiles, and services from SAS Data Integration Studio.
If your site has licensed the appropriate DataFlux software, you can take advantage of the following components and resources:
DataFlux Data Management Studio
a desktop client that combines data quality and data discovery features. You can use this client to create jobs, profiles, standardization schemes, and other resources that can be included in SAS Data Integration Studio jobs.
DataFlux Data Management Server
provides a scalable server environment for large DataFlux Data Management Studio jobs. Jobs can be uploaded from DataFlux Data Management Studio to a DataFlux Data Management Server, where the jobs are executed. SAS Data Integration Studio can execute DataFlux jobs on this server.
data job
a DataFlux job that specifies a set of data cleansing and enrichment operations that flow from source to target.
data service
a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server.
process job
a DataFlux job that combines data processing with conditional processing. The process flow in the job supports logical decisions, looping, events, and other features that are not available in a data job flow.
profile
a job that executes one or more data profiling operations and displays a report based on the result of these operations. Data profiling encompasses discovery and audit activities that help you assess the composition, organization, and quality of databases.
DataFlux Quality Knowledge Base (QKB)
a collection of files and reference sources that enable Blue Fusion, and consequently all DataFlux software, to perform parsing, standardization, analysis, matching, and other processes. A QKB includes locales, standardization schemes, and other resources.
locale
a collection of data types and definitions that are pertinent to a particular language or language convention. For example, the address parse definition in an English – UK locale differs from the address parse definition in an English – US locale, because the address formats are significantly different even though the language is similar.
standardization scheme
a file that contains pairs of data values and standardized values. Schemes are used to standardize columns by providing a set of acceptable values.
standardization definition
a set of logic used to standardize an element within a string. For example, a definition could be used to expand all instances of “Univ.” to “University” without having to specify every literal instance, such as “Univ. Arizona” and “Oxford Univ.”, in a scheme.
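The difference between a standardization scheme and a standardization definition can be illustrated with a minimal Python sketch. This is not how DataFlux software or the QKB implements either resource; the names SCHEME, apply_scheme, UNIV_TOKEN, and apply_definition are hypothetical and exist only for illustration. The sketch assumes that a scheme behaves like a lookup of whole values and that a definition behaves like a rule applied to an element inside a string.

```python
import re

# Hypothetical scheme: each listed data value is paired with its standardized value.
SCHEME = {
    "Univ. Arizona": "University Arizona",
    "Oxford Univ.": "Oxford University",
}


def apply_scheme(value, scheme):
    """Return the paired standardized value, or the input unchanged if it is not listed."""
    return scheme.get(value, value)


# Hypothetical definition: one rule that rewrites the element "Univ." wherever it occurs.
UNIV_TOKEN = re.compile(r"\bUniv\.(?=\s|$)")


def apply_definition(value):
    """Expand every standalone 'Univ.' element to 'University'."""
    return UNIV_TOKEN.sub("University", value)


if __name__ == "__main__":
    for raw in ["Univ. Arizona", "Oxford Univ.", "Univ. of Leeds"]:
        print(raw, "|", apply_scheme(raw, SCHEME), "|", apply_definition(raw))
```

In this sketch, a value that is not listed in the scheme, such as “Univ. of Leeds”, passes through the scheme lookup unchanged, whereas the rule-based definition still expands it.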

Transformations in the Data Quality Folder

The Transformations tree in SAS Data Integration Studio includes a Data Quality folder. In general, you can use Apply Lookup Standardization, Create Match Code, and Standardize with Definition for data cleansing operations. You can use DataFlux Batch Job and DataFlux Data Service to perform tasks in which DataFlux software specializes, such as profiling, monitoring, or address verification. The folder includes the following transformations:
Apply Lookup Standardization
enables you to select and apply DataFlux schemes that standardize the format, casing, and spelling of character columns in a source table. Requires SAS Data Quality Server 9.3.
Create Match Code
enables you to analyze source data and generate match codes based on common information shared by clusters of records. Comparing match codes instead of actual data enables you to identify records that are in fact the same entity, despite minor variations in the data. Requires SAS Data Quality Server 9.3.
DataFlux Batch Job
enables you to select and execute a DataFlux job that is stored on a DataFlux Data Management Server. You can execute DataFlux Data Management Studio data jobs, process jobs, and profiles. You can also execute Architect jobs that were created with DataFlux® dfPower® Studio. Requires DataFlux Data Management Platform 2.1.
DataFlux Data Service
enables you to select and execute a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server. Requires DataFlux Data Management Platform 2.1.
Standardize with Definition
enables you to select and apply DataFlux standardization definitions to elements within a text string. For example, you might want to change all instances of “Mister” to “Mr.” but only when “Mister” is used as a salutation. Requires SAS Data Quality Server 9.3.
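The idea behind comparing match codes instead of actual data can be sketched in a few lines of Python. This is a toy illustration only: real match codes are generated by SAS Data Quality Server from QKB match definitions, and the normalization rules, the TITLES set, and the function names below are assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical list of salutations to ignore when building the toy code.
TITLES = {"MR", "MRS", "MS", "DR", "MISTER"}


def toy_match_code(name):
    """Build a simplified code: uppercase the letters, drop titles, sort the tokens."""
    tokens = re.findall(r"[A-Za-z]+", name.upper())
    kept = sorted(token for token in tokens if token not in TITLES)
    return "$".join(kept)


def cluster_by_match_code(names):
    """Group records whose toy match codes are identical."""
    clusters = defaultdict(list)
    for name in names:
        clusters[toy_match_code(name)].append(name)
    return dict(clusters)


if __name__ == "__main__":
    records = ["Mr. John Smith", "Smith, John", "Jane Doe"]
    for code, members in cluster_by_match_code(records).items():
        print(code, members)
```

Records that differ only in word order, punctuation, or salutation receive the same toy code and fall into the same cluster, which is the property that the Create Match Code transformation relies on, using far more sophisticated logic.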