DataFlux Data Management Studio 2.5: User Guide
Here are answers to some questions you may have regarding the Parse Definition Quick Editor:
Both the Parse Definition Quick Editor and Customize require access to the Quality Knowledge Base (QKB); therefore, you cannot have both tools open at the same time.
The Parse Definition Quick Editor allows you to create a new parse definition using a step-by-step process. When you view the results of the new parse definition, you can determine the changes you need to make. You can also use the Parse Definition Quick Editor to apply existing parse definitions to your data and edit them using the chop table, regex, grammar, or vocabulary components.
To make your new parse definition available, open Customize and set your Preprocessing and Token Mapping.
Note: Both Customize and the Parse Definition Quick Editor need to access the QKB.
You can access an ODBC data source, SQL Query, Delimited Text File, Fixed Width Text File, SAS Data Sets, and Profile Reports.
You will configure your data sources the same way as you configure them for the DataFlux Data Management Studio data job input nodes.
A basic category is a category that represents a single word. Basic categories are the basic building blocks of Grammar rules. Every basic category in a Grammar corresponds to a category in an ordered word list. A derived category is a category composed of one or more other categories. The makeup of a derived category is described using rules.
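As a rough illustration of this distinction (not how DataFlux stores a grammar internally), the two kinds of category can be sketched in Python. The category names and the `is_basic` and `expand` helpers are hypothetical and are not taken from a real QKB.

```python
# Hypothetical sketch: names are illustrative, not taken from a real QKB.
# Basic categories label single words.
BASIC_CATEGORIES = {"NUM", "STREET_NAME", "STREET_TYPE"}

# Derived categories are described by rules; each rule is a sequence of
# one or more other categories (basic or derived).
DERIVED_RULES = {
    "STREET": [["STREET_NAME", "STREET_TYPE"]],
    "ADDRESS": [["NUM", "STREET"]],
}


def is_basic(category):
    """A category is basic if it labels a single word."""
    return category in BASIC_CATEGORIES


def expand(category):
    """Return the sequences of basic categories a category can stand for."""
    if is_basic(category):
        return [[category]]
    results = []
    for rule in DERIVED_RULES[category]:
        # Combine the expansions of each part of the rule, left to right.
        partial = [[]]
        for part in rule:
            partial = [p + e for p in partial for e in expand(part)]
        results.extend(partial)
    return results
```

In this sketch, `expand("ADDRESS")` resolves the derived category down to the basic categories `NUM`, `STREET_NAME`, and `STREET_TYPE`, in that order.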
To include all parse changes, click Options > Refresh Parse Results (from the main menu). This allows you to re-parse the data content with all the changes. You are prompted to save the parse definition as well.
The Parse Definition Quick Editor does not allow you to make changes directly to the vocabularies, chop tables, or regex libraries within a parse definition. To do this, use Customize.
The root category is where all other rules (basic and derived) begin.
Click the Categorization and Rule Building tab to view the result after your new parse definition is applied to your data. This table provides:
When no rule is matched, the message "No Parse Solution" appears next to the entry.
This message appears when the Categorization and Rule Building tab cannot apply a grammar rule to an entry.
This illustrates how the Parse Definition Quick Editor finds the correct grammar rule for the selected entry. Each step displays how the final definition arrived at the results from the previous definitions and processed components.
To create a filter, click Edit > Filter List.
When an entry does not fall under a particular rule category, it is difficult to select the correct category. When this happens, there is no category associated with the entry.
The Parse Definition Quick Editor does not support categorizing regexlibs.
Yes, this is expected. When you remove a rule from the root category, only the basic types remain in the list.
The Parse Definition Quick Editor does not display a derived rule unless it falls into a root category.
Element Analysis shows how often each word is listed along with the category, based on the parse definition vocabulary.
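The kind of tally described here can be approximated with a short Python sketch. It is only an illustration under stated assumptions: words are split on whitespace and upper-cased before lookup, whereas the real tool chops entries with the chop table. The `element_analysis` function and the sample vocabulary are hypothetical.

```python
from collections import Counter


def element_analysis(entries, vocabulary):
    """Count each word across all entries and attach its vocabulary category.

    Assumption: words are separated by whitespace and compared in upper case.
    Words missing from the vocabulary get None as their category.
    """
    counts = Counter(
        word.upper() for entry in entries for word in entry.split()
    )
    return [
        (word, count, vocabulary.get(word))
        for word, count in counts.most_common()
    ]


# Hypothetical vocabulary and data for illustration only.
vocabulary = {"MAIN": "STREET_NAME", "ST": "STREET_TYPE"}
entries = ["100 Main St", "200 Main Ave"]
report = element_analysis(entries, vocabulary)
```

In this sketch, a word that is absent from the vocabulary, such as `AVE`, is still counted but carries no category.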
Yes, right-click on an entry and select Assign Basic Category > Create Category.
To unassign a basic category, right-click on an entry and select Unassign Basic Category.
Some word entries are not included in the vocabulary associated with a grammar. However, for these entries, the grammar assigns a default guess category, and the grammar rule is still applied based on that default category. Entries that are not listed in the vocabulary are not assigned a category in Element Analysis.
The category list on the Element Analysis tab is built from the vocabulary. If the number does not appear in the vocabulary with the NUM category assigned, then the NUM category does not appear on the Element Analysis tab. Some numbers appear correctly categorized as NUM on the Categorization and Rule Building tab, even though they are not in the vocabulary. This happens because some parse definitions have a categorization regexlib that can match numbers with the NUM category.
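The difference between the two tabs can be sketched in Python. This is a simplified model, not the product's actual lookup logic: it assumes the vocabulary is consulted first and the categorization regexlib second, and the vocabulary contents, the pattern, and the function names are all hypothetical.

```python
import re

# Hypothetical vocabulary: word -> category. No number is listed as NUM.
VOCABULARY = {"MAIN": "STREET_NAME", "ST": "STREET_TYPE"}

# Hypothetical categorization regex library: (pattern, category) pairs.
CATEGORIZATION_REGEXLIB = [(re.compile(r"\d+"), "NUM")]


def categorize(word):
    """Assumed order: vocabulary first, then the regex library."""
    key = word.upper()
    if key in VOCABULARY:
        return VOCABULARY[key]
    for pattern, category in CATEGORIZATION_REGEXLIB:
        if pattern.fullmatch(word):
            return category
    return None


def vocabulary_categories():
    """Categories that a vocabulary-only listing would be able to show."""
    return sorted(set(VOCABULARY.values()))
```

Here `categorize("42")` yields `NUM` through the regex library, while `vocabulary_categories()` contains no `NUM` entry because no number is stored in the vocabulary.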
No, you must unassign basic categories one at a time.
Filters allow you to narrow down entries shown in the Categorization and Rule Building, Element Analysis, and Vocabulary tabs.
With filters, you can build "AND" logic with conditions on any of the available columns within each tab. The available filter operations depend on the data type of the content: string, numeric, or date.
Yes. To use a regular expression when creating a filter, use the "LIKE" operation.
This functionality is not available at this time.
You can filter your content on the Categorization and Rule Building, Element Analysis, and Vocabulary tabs.
By default, filtering is case insensitive unless you select Match Case.
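The filter behavior described above (conditions combined with "AND", a regex-based "LIKE" operation, and case-insensitive matching unless Match Case is selected) can be sketched in Python. This is an illustration only: the `like` and `apply_filter` helpers, the column names, and the choice of partial rather than whole-value matching are assumptions.

```python
import re


def like(value, pattern, match_case=False):
    """Regex-based LIKE; case insensitive unless match_case is set.

    Assumption: the pattern may match anywhere in the value.
    """
    flags = 0 if match_case else re.IGNORECASE
    return re.search(pattern, str(value), flags) is not None


def apply_filter(rows, conditions, match_case=False):
    """Keep the rows that satisfy every (column, pattern) condition."""
    return [
        row
        for row in rows
        if all(like(row[col], pat, match_case) for col, pat in conditions)
    ]


# Hypothetical table contents for illustration only.
rows = [
    {"Word": "Main", "Category": "STREET_NAME"},
    {"Word": "main", "Category": "CITY"},
    {"Word": "Maple", "Category": "STREET_NAME"},
]
```

For example, `apply_filter(rows, [("Word", r"^main$"), ("Category", r"^STREET")])` keeps only the first row, because both conditions must hold and case is ignored by default.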
The Chop Table in the Parse Definition Quick Editor provides functionality similar to the separate Chop Table Editor. You can edit and create chop tables within the parse definition.
Unlike the Chop Table Editor, the Chop Table in the Parse Definition Quick Editor does not let you test your chop table.
Yes, the changes immediately appear on the Categorization and Rule Building tab. You can also click Options > Refresh Parse Results.
Yes, you can make changes to only one item at a time.
Use the toolbar options Increase Font Size and Decrease Font Size to adjust the font in the Parse Definition Quick Editor.
Yes, to locate a character using the decimal or hexadecimal value, click Search > Go To.
Implicit separators are an optional feature of string chopping. For each parse definition, you can enable or disable implicit separators. If implicit separators are enabled, a separator mark is placed wherever the classification of a character differs from the classification of the previous character in a string. See Customize - Chop Table Editor - Character Level Options for additional information.
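The idea of implicit separators can be sketched in Python. This is a minimal sketch under stated assumptions, not the chop table implementation: the four character classifications in `classify` are hypothetical stand-ins for whatever classifications a real chop table defines, and the sketch ignores per-character operations.

```python
def classify(ch):
    """Hypothetical character classification; real chop tables define their own."""
    if ch.isdigit():
        return "NUMBER"
    if ch.isalpha():
        return "LETTER"
    if ch.isspace():
        return "SPACE"
    return "PUNCT"


def chop_with_implicit_separators(text):
    """Start a new chunk wherever the classification differs from the previous character."""
    chunks = []
    for ch in text:
        if chunks and classify(chunks[-1][-1]) == classify(ch):
            chunks[-1] += ch  # same classification: extend the current chunk
        else:
            chunks.append(ch)  # classification changed: implicit separator
    return chunks
```

With these assumed classifications, the string `123Main St` is split between `123` and `Main` even though no explicit separator character appears between them.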
The Normalization Regexlib in the Parse Definition Quick Editor provides functionality similar to the separate Regex Editor. You can edit and create new regex libraries within the parse definition.
Unlike the Regex Editor, the Normalization Regexlib in the Parse Definition Quick Editor does not let you test your regex libraries.
Yes, the changes immediately appear on the Categorization and Rule Building tab. You can also click Options > Refresh Parse Results.
For additional information about creating regular expressions, refer to the DataFlux Expression Language Reference Guide (click Start > Programs > DataFlux Data Management Studio 2.5 > Help > Expression Language Reference Guide).
The Grammar tab in the Parse Definition Quick Editor provides the same functionality as the separate Grammar Editor. You can edit and create new grammars within the parse definition.
Yes, both Grammar options provide the same functionality.
The Vocabulary tab in the Parse Definition Quick Editor provides functionality similar to the separate Vocabulary Editor. You can edit and create new vocabularies for the parse definition.
Unlike the Vocabulary Editor, the Vocabulary tab does not provide the option to import vocabularies.
Refer to your Grammar basic categories, which include your vocabulary categories.
You have an option to begin building a new rule with an existing parse definition and regex library.
You can create a parse definition without a regex library, but you must have at least a chop table definition.
From the Categorization and Rule Building tab, click an entry to select it. At the bottom of the screen, you should see a Word/Category listing for the entry. Select a word, right-click, and select Assign Basic Category. You can follow these steps to create a derived category and assign that derived category to a root category.
To assign a word to a basic category from the Categorization and Rule Building tab or the Element Analysis tab, select the word, right-click, and select Assign Basic Category.
You can create derived categories after you have assigned basic categories to your words. Select a basic category or multiple basic categories, right-click and select Add Rule To Derived Category.
You can create right recursive rules such as:
Address
Street Address
But not left recursive rules such as:
Address
Address Street
Left recursive rules yield an infinite loop.
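To see why a left recursive rule loops, consider a naive top-down matcher sketched in Python. This is not how the DataFlux parser is implemented; it only illustrates the general problem. The `parse` function and the extra base rule `Address -> Street` (added so that the recursion has somewhere to stop) are assumptions made for this example.

```python
def parse(category, tokens, pos, rules):
    """Naive top-down matcher.

    Returns the set of positions where a match of `category`
    starting at `pos` can end.
    """
    if category not in rules:  # basic category: match exactly one token
        if pos < len(tokens) and tokens[pos] == category:
            return {pos + 1}
        return set()
    ends = set()
    for rule in rules[category]:
        positions = {pos}
        for part in rule:
            positions = {
                end for p in positions for end in parse(part, tokens, p, rules)
            }
        ends |= positions
    return ends


tokens = ["Street", "Street", "Street"]

# Right recursive: Street is consumed before Address is tried again.
RIGHT = {"Address": [["Street", "Address"], ["Street"]]}

# Left recursive: Address is tried again at the same position, forever.
LEFT = {"Address": [["Address", "Street"], ["Street"]]}
```

With `RIGHT`, each recursive call starts one token further along, so the matcher reaches the end of the input and stops. With `LEFT`, the first step of the rule calls `parse("Address", ...)` again at the same position without consuming any input, and Python eventually raises `RecursionError`.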
This error message appears when you try to assign a derived category to itself on the Categorization and Rule Building tab.
Yes. Under Word/Category, select the word, right-click, and select Adjust Chopping. Click the operation drop-down list and select USE, SUPPRESS, or TRIM for this word.