DataFlux Data Management Studio 2.7: User Guide
Creating and analyzing a data profile can be a useful step in planning and monitoring your data management processes. When you profile your data, you can perform the following tasks more efficiently:
Data profiling encompasses discovery and audit activities that help you assess the composition, organization, and quality of databases. Thus, a typical data profiling process helps you to recognize patterns, identify scarcity in the data, and calculate frequency and basic statistics. Data profiling can also aid in identifying redundant data across tables and cross-column dependencies. All of these tasks are critical to optimal planning and monitoring.
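The metrics named above (patterns, missing values, frequency, and basic statistics) can be illustrated with a short Python sketch. This is not how Data Management Studio computes a profile. The function names `char_pattern` and `profile_column`, the report keys, and the pattern symbols (`9` for a digit, `A` for an uppercase letter, `a` for a lowercase letter) are assumptions chosen for illustration only.

```python
from collections import Counter
import statistics


def char_pattern(value):
    """Reduce a string to a character pattern (assumed symbols: 9, A, a)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile_column(values):
    """Return basic profile metrics for one column.

    `values` is a list; None and the empty string count as missing.
    """
    present = [v for v in values if v is not None and v != ""]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    report = {
        "count": len(values),
        "null_count": len(values) - len(present),
        "unique_count": len(set(present)),
        # Frequency distribution, most common value first
        "frequency": Counter(present).most_common(),
        # Pattern distribution, most common pattern first
        "patterns": Counter(char_pattern(str(v)) for v in present).most_common(),
    }
    if numeric:
        # Basic statistics apply only to numeric values
        report["min"] = min(numeric)
        report["max"] = max(numeric)
        report["mean"] = statistics.mean(numeric)
    return report


# Hypothetical sample columns
phones = ["919-555-0100", "919-555-0101", None, "555-0102", ""]
ages = [34, 41, None, 34, 29]

phone_report = profile_column(phones)
age_report = profile_column(ages)
print(phone_report["patterns"])
print(age_report["min"], age_report["max"], age_report["mean"])
```

In this sample, the pattern distribution for `phones` shows that one value lacks an area code, and the null count shows that two of the five values are missing. These are the kinds of findings that a profile report surfaces before you build a data job on the column.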
For example, a profiling analysis might indicate the following:
Suppose that you are marketing a line of high-end women's shoes and accessories. After data profiling, you would know that your data set is not an appropriate tool to drive sales for these products.
You can also use the profiling data job nodes in a data job. When embedded in a data job, these nodes enable you to perform basic statistical analysis, data validation, or similar tasks. You can then repeat the profiling tasks every time you run the job that contains the nodes. For an example of a data job that includes profiling nodes, see Creating a Data Job in the Folders Tree.
Profiles can also be executed from the command line, as described in Running Jobs from the Command Line. Finally, you can add a Profile Reference node to add a profile to a process job. The node enables you to run an actual profile operation in the context of the job. For example, you could pre-process the data before profiling it by combining data from different data sources into a single table and then profiling that table. For more information, see the Help topic on the Profile Reference Node.
Profiles use Quality Knowledge Bases (QKBs). For information about setting QKB options, see How can I specify Quality Knowledge Base options for profiles and data explorations? Profiles also support database catalogs.
Automatic data-type conversions take place when a profile reads SAS data sets. For more information, see How are SAS Data Types Converted When DataFlux Software Reads or Writes SAS Data?
You can read an XML file in a profile. For information, see How Can I Read an XML File In a Profile?
Note: You must have a license that enables you to create a profile. However, you can read a profile without this license.
Performance tips and usage notes related to profiling are collected in Jobs, Profiles, Data Explorations.
A Monitor license is required to use custom metrics or business rules in a profile. A Quality license is required to use match codes inside a redundant data analysis (RDA).
Note: Jobs previously coded without monitor license information now require that information to run.
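To make the idea of a redundant data analysis (RDA) more concrete, the following Python sketch compares the distinct values of two columns and counts how many appear only on one side or on both. It is a simplified illustration, not the product's implementation: the function name `overlap_report` and the sample columns are hypothetical, and the comparison uses exact values, whereas an RDA that uses match codes would compare fuzzy match keys instead.

```python
def overlap_report(left, right):
    """Count distinct values found only in left, in both, or only in right.

    Missing values (None) are ignored on both sides.
    """
    left_set = {v for v in left if v is not None}
    right_set = {v for v in right if v is not None}
    return {
        "left_only": len(left_set - right_set),
        "common": len(left_set & right_set),
        "right_only": len(right_set - left_set),
    }


# Hypothetical key columns from two tables
customer_ids = [1, 2, 3, 4]
order_customer_ids = [3, 4, 4, 5, None]

rda = overlap_report(customer_ids, order_customer_ids)
print(rda)
```

A large `common` count suggests that the two columns hold the same data. A nonzero `right_only` count in this sample points to order rows whose customer ID has no match in the customer table.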
Doc ID: dfU_T_ProfileOver.html