Integration with DataFlux Data Management Platform

Overview

SAS enterprise software offerings such as SAS Data Management and SAS Data Integration Server include SAS Data Integration Studio, SAS Data Quality Server, and the DataFlux Data Management Platform. The SAS Data Quality Server consists of a Quality Knowledge Base (QKB) and SAS language elements. The DataFlux Data Management Platform provides a single environment for managing data quality, data discovery, and master data management (MDM).
Many of the features in SAS Data Quality Server and the DataFlux Data Management Platform can be used in SAS Data Integration Studio jobs. For example, you can use DataFlux standardization schemes and definitions in SAS Data Integration Studio jobs. You can also execute DataFlux jobs, profiles, and services from SAS Data Integration Studio.
If your site has licensed the appropriate DataFlux software, you can take advantage of the following components and resources:
DataFlux Web Studio
a web-based application with separately licensed modules that enable you to perform data management tasks.
DataFlux Data Management Studio
a desktop client that combines data quality and data discovery features. You can use this client to create jobs, profiles, standardization schemes, and other resources that can be included in SAS Data Integration Studio jobs.
DataFlux Data Management Server
provides a scalable server environment for large DataFlux Data Management Studio jobs. Jobs can be uploaded from DataFlux Data Management Studio to a DataFlux Data Management Server, where the jobs are executed. SAS Data Integration Studio can execute DataFlux jobs on this server.
data job
a DataFlux job that specifies a set of data cleansing and enrichment operations that flow from source to target.
data service
a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server.
process job
a DataFlux job that combines data processing with conditional processing. The process flow in the job supports logical decisions, looping, events, and other features that are not available in a data job flow.
profile
a job that executes one or more data profiling operations and displays a report based on the result of these operations. Data profiling encompasses discovery and audit activities that help you assess the composition, organization, and quality of databases.
DataFlux Quality Knowledge Base (QKB)
a collection of files and reference sources that enable Blue Fusion, and consequently all DataFlux software, to perform parsing, standardization, analysis, matching, and other processes. A QKB includes locales, standardization schemes, and other resources.
locale
a collection of data types and definitions that are pertinent to a particular language or language convention. A locale for English – UK, for example, has an address parse definition that differs from the English – US address parse definition. The address formats differ significantly even though the language is similar.
standardization scheme
a file that contains pairs of data values and standardized values. Schemes are used to standardize columns by providing a set of acceptable values.
standardization definition
a set of logic used to standardize an element within a string. For example, a definition could be used to expand all instances of “Univ.” to “University” without having to specify every literal instance such as “Univ. Arizona” and “Oxford Univ.” in a scheme.
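The difference between a standardization scheme and a standardization definition can be illustrated with a minimal Python sketch. This is not DataFlux or QKB code: the names SCHEME, apply_scheme, and apply_definition are hypothetical, the sample values are invented, and a real definition contains far richer logic than a single regular expression. The sketch assumes only that a scheme is a table of value pairs and that a definition is a rule applied to an element inside a string.

```python
import re

# Hypothetical scheme: each literal data value is paired with a standardized value.
# Only values that appear in the table are changed.
SCHEME = {
    "Univ. Arizona": "University Arizona",
}


def apply_scheme(value, scheme=SCHEME):
    """Return the paired standardized value, or the input unchanged if it is not listed."""
    return scheme.get(value, value)


# Hypothetical definition: one rule that rewrites the element "Univ." wherever it
# appears as a separate token, so individual values do not have to be listed.
_UNIV_TOKEN = re.compile(r"\bUniv\.(?=\s|$)")


def apply_definition(value):
    """Expand the token "Univ." to "University" anywhere in the string."""
    return _UNIV_TOKEN.sub("University", value)


if __name__ == "__main__":
    for raw in ["Univ. Arizona", "Univ. Texas", "State Univ."]:
        print(raw, "|", apply_scheme(raw), "|", apply_definition(raw))
```

In this sketch the scheme changes only the one value that it lists, whereas the rule-based function also handles values that were never enumerated. That is the trade-off described in the two entries above.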

Transformations in the Data Quality Folder

The Transformations tree in SAS Data Integration Studio includes a Data Quality folder. This folder includes the following transformations. In general, you can use Apply Lookup Standardization, Create Match Code, and Standardize with Definition for data cleansing operations. You can use DataFlux Batch Job and DataFlux Data Service to perform tasks in which DataFlux software specializes, such as profiling, monitoring, or address verification.
Apply Lookup Standardization
enables you to select and apply DataFlux schemes that standardize the format, casing, and spelling of character columns in a source table.
Create Match Code
enables you to analyze source data and generate match codes based on common information shared by clusters of records. Comparing match codes instead of actual data enables you to identify records that are in fact the same entity, despite minor variations in the data.
DataFlux Batch Job
enables you to select and execute a DataFlux job that is stored on a DataFlux Data Management Server. You can execute DataFlux Data Management Studio data jobs, process jobs, and profiles. You can also execute Architect jobs that were created with DataFlux® dfPower® Studio.
DataFlux Data Service
enables you to select and execute a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server.
Standardize with Definition
enables you to select and apply DataFlux standardization definitions to elements within a text string. For example, you might want to change all instances of “Mister” to “Mr.” but only when “Mister” is used as a salutation. This transformation requires SAS Data Quality Server.
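The idea behind the Create Match Code transformation, comparing derived codes instead of the raw values, can be sketched in Python as follows. This is a toy illustration only. The function toy_match_code is hypothetical and does not reproduce the DataFlux match code algorithm, which relies on QKB match definitions and locale-specific logic. The sketch assumes a deliberately crude reduction: ignore case, punctuation, and word order, drop vowels after the first letter, and collapse repeated letters.

```python
import re
from collections import defaultdict


def toy_match_code(name):
    """Hypothetical stand-in for a match code; NOT the DataFlux algorithm."""
    # Keep alphabetic tokens only and uppercase them, so case and punctuation do not matter.
    tokens = re.findall(r"[A-Za-z]+", name.upper())
    # Sort the tokens so that word order does not matter.
    tokens.sort()
    parts = []
    for tok in tokens:
        # Keep the first letter, then drop vowels (and Y) from the rest of the token.
        reduced = tok[0] + re.sub(r"[AEIOUY]", "", tok[1:])
        # Collapse runs of a repeated letter into a single letter.
        reduced = re.sub(r"(.)\1+", r"\1", reduced)
        parts.append(reduced)
    return "".join(parts)


def cluster_by_match_code(names):
    """Group records whose toy match codes are identical."""
    clusters = defaultdict(list)
    for n in names:
        clusters[toy_match_code(n)].append(n)
    return dict(clusters)


if __name__ == "__main__":
    records = ["John Smith", "smith, john", "Johnn Smyth", "Jane Smith"]
    for code, members in cluster_by_match_code(records).items():
        print(code, members)
```

Records that differ only in casing, punctuation, word order, or minor spelling fall into the same cluster because they share a code, while a genuinely different name receives a different code. A cruder reduction produces more false matches, which is one reason real match definitions are tuned per data type and locale.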
If you export and import SAS Data Integration Studio jobs that contain DataFlux Batch Job transformations or DataFlux Data Service transformations, some special considerations apply. For more information, see Preparing to Import or Export SAS Package Metadata.