Integrating DataFlux Software with SAS Offerings

Overview

SAS has fully integrated the DataFlux suite of data quality, data integration, data governance, and master data management solutions into its SAS offerings. This helps customers build a more integrated information management approach that goes beyond data management and governance to support analytics and decision management.
Certain SAS offerings, such as SAS Data Management, include SAS Data Integration Studio, SAS Data Quality Server, and SAS/ACCESS interfaces as well as the DataFlux data management products. The SAS Data Quality offering, for example, consists of SAS Data Quality Server, a Quality Knowledge Base (QKB), and SAS language elements. Certain DataFlux products, when used together with SAS products, also enable you to manage data profiling, quality, integration, monitoring, and enrichment.
For example, many of the features in SAS Data Quality Server and DataFlux Data Management Studio can be used in SAS Data Integration Studio jobs. You can also execute DataFlux jobs, profiles, and services from SAS Data Integration Studio.
If your site has licensed the appropriate SAS offerings, you can take advantage of the following components and the resources associated with them:
DataFlux Data Management Studio
a desktop client that combines data quality and data discovery features. You can use this client to create jobs, profiles, standardization schemes, and other resources that can be included in SAS Data Integration Studio jobs.
DataFlux Data Management Server
a server that provides a scalable environment for large DataFlux Data Management Studio jobs. Jobs can be uploaded from DataFlux Data Management Studio to a DataFlux Data Management Server, where the jobs are executed. SAS Data Integration Studio can execute DataFlux jobs on this server.
DataFlux Web Studio
a web-based application with separately licensed modules that enable you to perform data management tasks.
data job
a DataFlux job that specifies a set of data cleansing and enrichment operations that flow from source to target.
data service
a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server.
process job
a DataFlux job that combines data processing with conditional processing. The process flow in the job supports logical decisions, looping, events, and other features that are not available in a data job flow.
profile
a job that executes one or more data profiling operations and displays a report based on the result of these operations. Data profiling encompasses discovery and audit activities that help you assess the composition, organization, and quality of databases.
Quality Knowledge Base (QKB)
a collection of files and reference sources that enable Blue Fusion, and consequently all DataFlux software, to perform parsing, standardization, analysis, matching, and other processes. A QKB includes locales, standardization schemes, and other resources.
locale
a collection of data types and definitions that are pertinent to a particular language or language convention. For example, the English – UK locale has an address parse definition that differs from the one in the English – US locale, because the address formats differ significantly even though the language is similar.
standardization scheme
a file that contains pairs of data values and standardized values. Schemes are used to standardize columns by providing a set of acceptable values.
standardization definition
a set of logic used to standardize an element within a string. For example, a definition could expand all instances of “Univ.” to “University” without requiring a scheme to list every literal value, such as “Univ. Arizona” and “Oxford Univ.”
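The difference between a standardization scheme and a standardization definition can be illustrated with a small Python sketch. This is not DataFlux or QKB code. The names `SCHEME`, `apply_scheme`, and `apply_definition`, and the sample values, are hypothetical and exist only to show the idea: a scheme is a lookup of whole data values, while a definition is a rule applied to one element inside a string.

```python
import re

# Hypothetical scheme: pairs of literal data values and standardized values.
SCHEME = {
    "univ. arizona": "University of Arizona",
    "u of arizona": "University of Arizona",
    "oxford univ.": "University of Oxford",
}


def apply_scheme(value, scheme):
    """Return the standardized value when the whole value is listed in the
    scheme; otherwise return the input unchanged."""
    return scheme.get(value.strip().lower(), value)


# Hypothetical definition: logic that targets one element within the string.
UNIV_ABBREVIATION = re.compile(r"\bUniv\.(?=\s|$)")


def apply_definition(value):
    """Expand every standalone 'Univ.' token to 'University'."""
    return UNIV_ABBREVIATION.sub("University", value)


if __name__ == "__main__":
    for raw in ["Univ. Arizona", "Univ. Texas", "Oxford Univ."]:
        print(raw, "|", apply_scheme(raw, SCHEME), "|", apply_definition(raw))
```

In this sketch the scheme leaves "Univ. Texas" untouched because that literal value is not listed, whereas the rule handles it without any new entry. That is the trade-off the two definitions above describe: a scheme gives exact control over the accepted values, and a definition generalizes to values that nobody has listed.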

Transformations in the Data Quality Folder

The Transformations tree in SAS Data Integration Studio includes a Data Quality folder. In general, you can use Apply Lookup Standardization, Create Match Code, and Standardize with Definition for data cleansing operations. You can use DataFlux Batch Job and DataFlux Data Service to perform tasks that are a specialty of DataFlux software, such as profiling, monitoring, or address verification. The folder includes the following transformations:
Apply Lookup Standardization
enables you to select and apply DataFlux schemes that standardize the format, casing, and spelling of character columns in a source table.
Create Match Code
enables you to analyze source data and generate match codes based on common information shared by clusters of records. Comparing match codes instead of actual data enables you to identify records that are in fact the same entity, despite minor variations in the data.
DataFlux Batch Job
enables you to select and execute a DataFlux job that is stored on a DataFlux Data Management Server. You can execute DataFlux Data Management Studio data jobs, process jobs, and profiles. You can also execute Architect jobs that were created with DataFlux® dfPower® Studio.
DataFlux Data Service
enables you to select and execute a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server.
Standardize with Definition
enables you to select and apply DataFlux standardization definitions to elements within a text string. For example, you might want to change all instances of “Mister” to “Mr.” but only when “Mister” is used as a salutation. Requires SAS Data Quality Server.
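The idea behind the Create Match Code transformation, comparing derived codes rather than raw values, can be sketched in Python. This is a deliberately simplified illustration and not the algorithm that DataFlux software uses, which relies on match definitions in the QKB. The functions `match_code` and `cluster_by_match_code`, the `NOISE_WORDS` set, and the sample names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical noise words that are ignored when building a code.
NOISE_WORDS = {"mr", "mrs", "ms", "dr", "inc", "corp", "co", "ltd"}


def match_code(value):
    """Build a simplified comparison key: lowercase the value, drop
    punctuation and noise words, sort the remaining tokens, and remove
    vowels that follow each token's first character."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    kept = sorted(t for t in tokens if t not in NOISE_WORDS)
    return "".join(t[0] + re.sub(r"[aeiou]", "", t[1:]) for t in kept).upper()


def cluster_by_match_code(values):
    """Group values that share the same code into clusters."""
    clusters = defaultdict(list)
    for value in values:
        clusters[match_code(value)].append(value)
    return dict(clusters)


if __name__ == "__main__":
    records = ["Mr. John Smith", "Smith, John", "Jane Doe"]
    for code, members in cluster_by_match_code(records).items():
        print(code, members)
```

Here "Mr. John Smith" and "Smith, John" receive the same code and fall into one cluster even though the raw strings differ. A real match definition is locale-aware and tunable, so treat this only as a picture of why comparing codes is more tolerant than comparing the data itself.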
If you export and import SAS Data Integration Studio jobs that contain DataFlux Batch Job transformations or DataFlux Data Service transformations, some special considerations apply. For more information, see Preparing to Import or Export SAS Package Metadata.