DataFlux Data Management Studio 2.8: User Guide

Using QKB Definitions in Jobs, Profiles, and Explorations

Using QKB Definitions in Data Jobs

Data jobs are the main way to process data in DataFlux Data Management Studio. Each data job specifies a set of data-processing operations that flow from a source to a target.

The Quality folder in the node tree for data jobs includes a number of nodes that apply a QKB definition to analyze or transform data. For example, the next display shows the sample data job df_sample_parse.

Parsing Node in a Data Job Uses the English (United States) Locale

The Parsing node in the job specifies the English (United States) locale. The node uses the Name parse definition in that locale to parse names stored in a data-source field called CONTACT.

Using QKB Definitions in Data Profiles

A data profile is a job that analyzes a data source and produces reports. These reports reveal patterns, identify sparseness in the data, and calculate frequency and basic statistics. For example, the next display shows a report for the sample profile df_sample_profile.

Data Profile Report

If you use the Pattern Frequency Distribution metric in a profile, you can use a QKB pattern analysis definition instead of the default pattern-generation feature for profiles. You select a definition on the Quality Knowledge Base tab in the profile Options dialog.

Character Analysis Pattern is Selected for a Profile
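To make the idea of a pattern frequency distribution concrete, the following Python sketch derives a character-class pattern for each value and counts how often each pattern occurs. It is only an illustration of the general technique, not the profile's built-in pattern generator or any QKB pattern analysis definition. The symbols (A, a, 9), the function names, and the sample phone values are assumptions made for this example.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Replace each character with a class symbol (illustrative convention):
    'A' for uppercase letters, 'a' for other letters, '9' for digits.
    Punctuation and spaces are kept unchanged."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)
    return "".join(symbols)


def pattern_frequency(values):
    """Count how many values share each character pattern."""
    return Counter(char_pattern(v) for v in values)


# Hypothetical field values, invented for this sketch.
phones = ["919-555-0101", "919-555-0142", "(919) 555-0199", "555-0123"]
freq = pattern_frequency(phones)

for pattern, count in freq.most_common():
    print(pattern, count)
```

In a distribution like this, rare patterns usually point to values that were entered in a nonstandard format.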

By assigning tasks for individual data fields in a profile report, you can also create a process job that contains a data job with Quality nodes.

Assigning a QKB Definition to a Field in a Profile Report

Finally, you can generate a QKB scheme from the output of a profile report, as shown in the next display.

Build a Scheme Option for a Profile

That scheme becomes part of your QKB and can later be used to standardize data values.
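As a rough illustration of what standardizing with a scheme means, the Python sketch below treats a scheme as a lookup table that maps variant data values to one standard value. The storage format and matching options of a real QKB scheme are not described here, so the dictionary, the normalization rule, and the sample company names are all assumptions made for this example.

```python
def normalize(value: str) -> str:
    """Collapse runs of whitespace and ignore case so that trivial
    variations still hit the same lookup key (an assumed rule)."""
    return " ".join(value.split()).casefold()


# Hypothetical scheme: variant data value -> standard value.
RAW_SCHEME = {
    "Intl Business Machines": "IBM",
    "I.B.M.": "IBM",
    "IBM Corp": "IBM",
}
SCHEME = {normalize(key): standard for key, standard in RAW_SCHEME.items()}


def apply_scheme(value: str, scheme=SCHEME) -> str:
    """Return the standard value if the scheme has an entry for the input;
    otherwise return the input unchanged."""
    return scheme.get(normalize(value), value)


for raw in ["  i.b.m. ", "IBM  Corp", "Acme Ltd"]:
    print(repr(raw), "->", repr(apply_scheme(raw)))
```

Values that have no entry in the lookup table pass through unchanged, so the sketch never discards data that it does not recognize.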

Using QKB Definitions in Data Explorations

A data exploration reads data from databases and assigns the fields in the selected tables to categories that have been predefined in a QKB. Data explorations perform this categorization by matching column names. You can also sample the data in a table to determine whether its content belongs to one of the categories in the QKB.
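The two approaches described above, matching on column names and sampling field contents, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how a data exploration or a QKB definition is implemented. The category names, name hints, regular expressions, and the 80 percent threshold are all hypothetical.

```python
import re

# Hypothetical categories with substrings expected in column names.
NAME_HINTS = {
    "PHONE": ("phone", "tel"),
    "EMAIL": ("email", "e_mail"),
    "NAME": ("name", "contact"),
}

# Hypothetical content checks used when sampling field values.
CONTENT_PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE": re.compile(r"^[\d\s().+-]{7,}$"),
}


def categorize_by_name(column: str):
    """Return the first category whose hint appears in the column name."""
    lowered = column.lower()
    for category, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            return category
    return None


def categorize_by_sample(values, threshold: float = 0.8):
    """Return a category if at least `threshold` of the non-empty
    sampled values match that category's content pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for category, pattern in CONTENT_PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return category
    return None


def categorize(column: str, sample=()):
    """Try the column name first, then fall back to sampled contents."""
    return categorize_by_name(column) or categorize_by_sample(sample)


print(categorize("CONTACT"))
print(categorize("COL7", ["a@example.com", "b@example.org"]))
```

Sampling is useful for columns such as COL7, whose names carry no hint about their contents.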

You can use match definitions and identification analysis definitions from your QKB to match and analyze database field names and field contents in a data exploration. To do so, select the QKB definition in the Analysis Methods section of the property dialog for the exploration, as shown in the next display.

QKB Definitions Specified in a Data Exploration

