Setting Data Exploration Properties
Overview
You can set the properties of a data exploration to specify the data sources and analysis methods used to generate its report.
Click the Properties tab and set the properties for the report that you want to generate.
Specify Data Sources
You can use the Data sources field to review the sources that are available for inclusion in your data exploration. Then, you can click Add Table to specify the sources that you need. You can either open a table and select the check boxes next to individual fields, or select the check box next to the table to include all of its fields.
For example, you could select all of the fields in the Client_Info, Client_Merge_Data, and Contacts tables by selecting the check boxes next to those tables. Then, you could add only the Address field from the NC_Customer table by opening the table and selecting the check box next to that field.
Specify Analysis Methods
You can specify the analysis methods used to generate your data exploration report. The following analysis methods are available:
- Field name matching - Uses match definitions and a locale to analyze the name of each field and discover which names match each other. Match definitions use a context-specific algorithm that combines parsing, normalization, standardization, and phonetics to identify potential duplicate records in a database table. For this example, you could select the Address match definition, the ENUSA locale, and a sensitivity level of 85.
- Field name analysis - Uses identification analysis definitions and a locale to analyze the name of each field and determine which identity to assign to it. Identification analysis definitions use an algorithm designed to infer the likely identity of a data element from its value. The definition uses a vocabulary to look up words that are known to be associated with certain identities. For this example, you could use the Contact Info identification analysis definition.
- Sample data analysis - Uses identification analysis definitions and a locale to determine which identity to assign to each field. Identification analysis definitions contain the logic and reference data used to make this determination; for this example, keep the Contact Info identification analysis definition. Unlike the field name matching and field name analysis methods, which examine metadata, the sample data analysis method examines a small sample of physical data. Use the Sample size (records) field to specify the number of records included in the data sample.
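To make the difference between the three methods concrete, the following Python sketch mimics them on a toy scale. It is an illustration under stated assumptions only, not how DataFlux match definitions or identification analysis definitions work internally: the abbreviation table (STANDARDIZE), the vocabulary (VOCABULARY), the value patterns (VALUE_PATTERNS), and all function names are hypothetical; American Soundex stands in for the phonetic step; and the locale and sensitivity level are not modeled. The first two functions look only at field names (metadata), while sample_data_analysis looks only at the first sample_size values (physical data).

```python
import re
from collections import Counter, defaultdict

# Hypothetical abbreviation table standing in for a standardization scheme.
STANDARDIZE = {"addr": "address", "adr": "address", "ph": "phone",
               "tel": "phone", "nm": "name"}

# Hypothetical vocabulary mapping known words to identities.
VOCABULARY = {"address": "ADDRESS", "street": "ADDRESS", "city": "ADDRESS",
              "phone": "PHONE", "fax": "PHONE", "email": "EMAIL",
              "name": "NAME", "contact": "NAME"}

# Hypothetical value patterns used to inspect sampled physical data.
VALUE_PATTERNS = {"EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.I),
                  "PHONE": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")}

# American Soundex digit for each consonant; vowels, h, w, and y have none.
SOUNDEX_CODES = {ch: digit
                 for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                        "4": "l", "5": "mn", "6": "r"}.items()
                 for ch in letters}


def tokens(field_name: str) -> list[str]:
    """Parse (split camelCase, underscores, punctuation), lowercase, standardize."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", field_name)
    parts = [p.lower() for p in re.split(r"[^A-Za-z0-9]+", spaced) if p]
    return [STANDARDIZE.get(p, p) for p in parts]


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits (e.g. 'Robert' -> 'R163')."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0].upper(), SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def match_key(field_name: str) -> str:
    """Phonetic key of the standardized tokens; equal keys count as a match."""
    return " ".join(k for k in (soundex(t) for t in tokens(field_name)) if k)


def field_name_matching(fields: list[str]) -> list[list[str]]:
    """Metadata only: group field names whose match keys are identical."""
    groups = defaultdict(list)
    for field in fields:
        groups[match_key(field)].append(field)
    return [group for group in groups.values() if len(group) > 1]


def field_name_analysis(field_name: str) -> str:
    """Metadata only: identity of the first token found in the vocabulary."""
    for token in tokens(field_name):
        if token in VOCABULARY:
            return VOCABULARY[token]
    return "UNKNOWN"


def sample_data_analysis(values: list[str], sample_size: int = 100) -> str:
    """Physical data only: vote on an identity using the first sample_size values."""
    votes = Counter()
    for value in values[:sample_size]:
        for identity, pattern in VALUE_PATTERNS.items():
            if pattern.fullmatch(value.strip()):
                votes[identity] += 1
    return votes.most_common(1)[0][0] if votes else "UNKNOWN"
```

For example, field_name_matching(["Address", "ADDR", "Client_Addr"]) groups Address with ADDR because both standardize to address and share the Soundex key A362, while Client_Addr carries an extra token and gets a different key. Exact key equality is a deliberate simplification here; it replaces the graded sensitivity level that a real match definition would apply.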
Note: You can save your data exploration under a different name by clicking Save Exploration As on the File menu. However, the properties of the data exploration are set only after the original data exploration report has been generated. To generate the report, click the Report tab. You can then click Save Exploration As to save a copy of the data exploration with its properties intact.