DataFlux Data Management Studio 2.7: User Guide

Parse Definition Quick Editor Usage Notes

General

Why would I use the Parse Definition Quick Editor?

The Parse Definition Quick Editor allows you to create a new parse definition using a step-by-step process. When you view the results of the new parse definition you can determine the changes you need to make. You can also use the Parse Definition Quick Editor to apply existing parse definitions to your data and edit using the chop table, regex, grammar, or vocabulary components.

Once I have created my parse definition using the Parse Definition Quick Editor, what else should I configure to complete the parse definition process?

You will need to open Customize and set your Preprocessing and Token Mapping to have your new parse definition available.

Note: Both Customize and the Parse Definition Quick Editor need to access the QKB.

What data sources are available in the Parse Definition Quick Editor for my incoming data?

You can access an ODBC data source, SQL Query, Delimited Text File, Fixed Width Text File, SAS Data Sets, and Profile Reports.

How are my data sources configured?

You will configure your data sources the same way as you configure them for the DataFlux Data Management Studio data job input nodes.

What are Basic and Derived Categories?

A basic category is a category that represents a single word. Basic categories are the basic building blocks of Grammar rules. Every basic category in a Grammar corresponds to a category in an ordered word list. A derived category is a category composed of one or more other categories. The makeup of a derived category is described using rules.
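The relationship between basic and derived categories can be sketched as a small data structure. This is only an illustration, not the product's internal representation: the `Grammar` class, its `add_rule` method, and the category names `NUM`, `WORD`, and `STREET` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Grammar:
    """Hypothetical model of a grammar, not the QKB's actual format."""

    # Basic categories: each one stands for a single word.
    basic: set = field(default_factory=set)
    # Derived categories: name -> list of rules; each rule is a tuple of
    # other category names (basic or derived).
    derived: dict = field(default_factory=dict)

    def add_rule(self, name, *parts):
        """Describe the makeup of a derived category with one rule."""
        if not parts:
            raise ValueError("a rule needs at least one category")
        unknown = [
            p for p in parts
            if p not in self.basic and p not in self.derived and p != name
        ]
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")
        self.derived.setdefault(name, []).append(tuple(parts))


grammar = Grammar(basic={"NUM", "WORD"})
grammar.add_rule("STREET", "NUM", "WORD")  # STREET is composed of NUM then WORD
```

Keeping basic categories in a flat set and derived categories in a rule table mirrors the description above: basic categories are the building blocks, and rules describe how derived categories are composed from them.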

Are word entries automatically re-parsed and updated with the changes in my parse definition?

To include all parse changes, click Options > Refresh Parse Results (from the main menu). This allows you to re-parse the data content with all the changes. You are prompted to save the parse definition as well.

When I have the Parse Definition Quick Editor open, can I have other vocabularies, chop tables, or regex libraries open within the parse definition?

The Parse Definition Quick Editor does not allow you to make changes directly to the vocabularies, chop tables, or regex libraries within a parse definition. To do this, use Customize.

What is a root category?

The root category is where all other rules (basic and derived) begin.

Common Errors

Error: Please select a locale before creating a new definition
Description: You must install a Quality Knowledge Base (QKB) and select the locale you want available in the Parse Definition Quick Editor prior to creating a new definition.

Error: Unable to load locale
Description: You cannot have more than one instance of the Parse Definition Quick Editor running at one time. You also cannot have Customize and the Parse Definition Quick Editor open at the same time. Both tools require access to the QKB.

Error: You cannot create left recursive rules
Description: This error message appears when you try to assign a derived category to itself on the Categorization and Rule Building tab.

Error: No Parse Definition Solution
Description: This message appears when the Categorization and Rule Building tab cannot apply a grammar rule to an entry.

Error: An error occurred while setting the Parse Definition sought category: Unable to determine sought category index
Description: Check your grammar abbreviation. You may have typed the abbreviation incorrectly.

Categorization and Rule Building

What is Categorization and Rule Building?

Click the Categorization and Rule Building tab to view the result after your new parse definition is applied to your data. This table provides:

What root category rule is assigned when there is no rule matched?

When there is no rule matched, you will see the message, "No Parse Solution" next to the entry.

What does "No Parse Definition Solution" mean?

This message appears when the Categorization and Rule Building tab cannot apply a grammar rule to an entry.

What do the Word/Category fields listed at the bottom of the Parse Definition Quick Editor mean when I select an entry in the Categorization and Rule Building tab?

These fields illustrate how the Parse Definition Quick Editor parsed the selected entry and found the grammar rule that applies to it. Each step shows how the final result was derived from the preceding definitions and processed components.

How can I narrow down the number of entries under Categorization and Rule Building?

To create a filter, click Edit > Filter List.

Why is there sometimes no category listed for words that are defined in my vocabulary?

When an entry does not fall under a particular rule category, it is difficult to select the correct category. When this happens there is no category associated with the entry.

Does Parse Definition Quick Editor have support for categorization regexlibs?

The Parse Definition Quick Editor does not support categorizing regexlibs.

When I remove a rule from the Root Category the derived rule also becomes unlisted. Is this expected behavior?

Yes, this is expected. When you remove a rule from Root Category, only the basic types remain in the list.

The Parse Definition Quick Editor Categorization and Rule Building tab does not display derived rules. Is this expected behavior?

Parse Definition Quick Editor does not display the derived rules unless the derived rule falls into a root category.

Element Analysis

What is Element Analysis?

Element Analysis shows how often each word is listed along with the category, based on the parse definition vocabulary.

Can I assign a basic category to a word?

Yes, right-click on an entry and select Assign Basic Category > Create Category.

How do I unassign an existing basic category?

To unassign a basic category, right-click on an entry and select Unassign Basic Category.

Why is there no category next to an entry, but the entry falls within a grammar?

Some word entries are not included in the vocabulary associated with a grammar. For these entries, however, the grammar assigns a default guess category, and the grammar rule is still applied based on that default category. The entries that are not listed in the vocabulary are not assigned a category in Element Analysis.

Why are some numbers listed under the NUM category in Element Analysis while others are not?

The category list on the Element Analysis tab is built from the vocabulary. If the number does not appear in the vocabulary with the NUM category assigned, then the NUM category does not appear on the Element Analysis tab. Some numbers appear correctly categorized as NUM on the Categorization and Rule Building tab, even though they are not in the vocabulary. This happens because some parse definitions have a categorization regexlib that can match numbers with the NUM category.
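The behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `element_analysis` function is hypothetical, whitespace splitting stands in for the chop table, and upper-casing words before the vocabulary lookup is an assumption rather than documented behavior.

```python
from collections import Counter


def element_analysis(entries, vocabulary):
    """Count each word and look up its category in the vocabulary only.

    Words missing from the vocabulary get None as their category, even if
    another component (such as a categorization regexlib) could match them.
    """
    counts = Counter(
        word.upper() for entry in entries for word in entry.split()
    )
    return [
        (word, count, vocabulary.get(word))
        for word, count in counts.most_common()
    ]


# Hypothetical sample data: "100" is in the vocabulary as NUM, "200" is not.
vocabulary = {"100": "NUM", "MAIN": "WORD", "ST": "STREET_TYPE"}
rows = element_analysis(["100 Main St", "200 Main Ave"], vocabulary)
```

In this sketch, `100` is reported with the `NUM` category while `200` has no category, because only the vocabulary is consulted when the list is built.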

Can I unassign basic categories from multiple entries at one time, especially when the entries have the same type?

No, you must unassign basic categories one at a time.

Filters

What are filters?

Filters allow you to narrow down entries shown in the Categorization and Rule Building, Element Analysis, and Vocabulary tabs.

How can I apply filters?

With filters you can build "AND" logic with conditions on any of the available columns within each tab. The available filter operations depend on the data type of the content: string, numeric, or date.

Can I use regular expressions in my filtering?

Yes. To use a regular expression when creating a filter, select the "LIKE" operation, which allows you to apply a regular expression.

Can I use "OR" logic in filters?

This functionality is not available at this time.

Which Parse Definition Quick Editor tabs use filtering?

You can filter your content on the Categorization and Rule Building, Element Analysis, and Vocabulary tabs.

Is filtering case sensitive?

By default, filtering is case insensitive unless you select Match Case.
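The filter behavior described in this section (conditions combined with "AND", a regular expression operation, and case-insensitive matching unless Match Case is selected) can be sketched as follows. The `Condition` class and `apply_filter` function are hypothetical, and the sketch assumes that the regular expression may match anywhere in the value; the product's exact "LIKE" semantics are not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Condition:
    """One filter condition on a single column (hypothetical structure)."""

    column: str
    pattern: str              # regular expression, as with the "LIKE" operation
    match_case: bool = False  # case insensitive unless Match Case is selected

    def matches(self, row):
        flags = 0 if self.match_case else re.IGNORECASE
        return re.search(self.pattern, str(row[self.column]), flags) is not None


def apply_filter(rows, conditions):
    """Keep only the rows that satisfy every condition ("AND" logic)."""
    return [row for row in rows if all(c.matches(row) for c in conditions)]


# Hypothetical rows resembling tab contents.
rows = [
    {"Word": "Main", "Category": "WORD"},
    {"Word": "main", "Category": "NUM"},
    {"Word": "100", "Category": "NUM"},
]
filtered = apply_filter(
    rows, [Condition("Word", "^ma"), Condition("Category", "^WORD$")]
)
```

Because every condition must hold, there is no way to express "OR" in this model other than running separate filters, which matches the limitation noted above.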

Chop Table

What is Chop Table?

The Chop Table in the Parse Definition Quick Editor provides functionality similar to the separate Chop Table Editor. You can edit and create chop tables within the parse definition.

Does the Chop Table option in the Parse Definition Quick Editor provide the same functionality available in Chop Table Editor?

Unlike the Chop Table Editor, the Chop Table in the Parse Definition Quick Editor does not provide the ability to test your chop table.

Do my changes to the Chop Table appear immediately on the Categorization and Rule Building tab?

Yes, the changes immediately appear on the Categorization and Rule Building tab. You can also click Options > Refresh Parse Results.

I can select only one item at a time in Chop Table Editor. Is this by design?

Yes, you can make changes to only one item at a time.

How do I make the font larger or smaller?

Use the toolbar options Increase Font Size and Decrease Font Size to adjust the font in the Parse Definition Quick Editor.

Can I locate a character based on the decimal or hexadecimal value?

Yes, to locate a character using the decimal or hexadecimal value, click Search > Go To.

What is implicit separation?

Implicit separators are an optional feature of string chopping. For each parse definition, you can enable or disable implicit separators. If implicit separators are enabled, a separator mark is placed wherever the classification of a character differs from the classification of the previous character in a string. See Chop Table Editor - Character Level Options for additional information.
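As a rough sketch of the idea, the following code places a separator wherever the classification of a character differs from that of the previous character. The four classifications used here (`LETTER`, `DIGIT`, `SPACE`, `OTHER`) and the function names are assumptions made for illustration; an actual chop table defines its own character classifications and operations.

```python
def classify(ch):
    """Assign a hypothetical classification to a single character."""
    if ch.isalpha():
        return "LETTER"
    if ch.isdigit():
        return "DIGIT"
    if ch.isspace():
        return "SPACE"
    return "OTHER"


def implicit_separator_positions(text):
    """Return the indexes where a character's class differs from the previous one."""
    return [
        i for i in range(1, len(text))
        if classify(text[i]) != classify(text[i - 1])
    ]


def chop(text):
    """Split the text at every implicit separator position."""
    if not text:
        return []
    bounds = [0, *implicit_separator_positions(text), len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


print(chop("12B Main"))  # ['12', 'B', ' ', 'Main']
```

With implicit separators disabled, a string such as `12B` would stay together unless an explicit separator character split it.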

Normalization Regexlib

What is Normalization Regexlib?

The Normalization Regexlib in the Parse Definition Quick Editor provides functionality similar to the separate Regex Editor. You can edit and create new regex libraries within the parse definition.

Does the Normalization Regexlib option in the Parse Definition Quick Editor provide the same functionality available in the Regex Editor?

Unlike the Regex Editor, the Normalization Regexlib in the Parse Definition Quick Editor does not provide the ability to test your regex libraries.

Do my changes to the Normalization Regexlib appear immediately on the Categorization and Rule Building tab?

Yes, the changes immediately appear on the Categorization and Rule Building tab. You can also click Options > Refresh Parse Results.

Where can I find more information about using regular expressions in DataFlux products?

For additional information about creating regular expressions, refer to the DataFlux Expression Language 2.5: Reference Guide.

Grammar

What is Grammar?

The Grammar tab in the Parse Definition Quick Editor provides the same functionality as the separate Grammar Editor. You can edit and create new grammars within the parse definition.

Does the Grammar option in the Parse Definition Quick Editor provide the same functionality as the separate Grammar Editor?

Yes, both Grammar options provide the same functionality.

Vocabulary

What is Vocabulary?

The Vocabulary tab in the Parse Definition Quick Editor provides functionality similar to the separate Vocabulary Editor. You can edit and create new vocabularies for the parse definition.

Does the Vocabulary tab in the Parse Definition Quick Editor provide the same functionality as the Vocabulary Editor?

Unlike the Vocabulary Editor, the Vocabulary tab does not provide the option to import vocabularies.

How can I see all the different categories that I have for my Vocabularies?

Refer to your Grammar basic categories, which include your vocabulary categories.

Creating a New Grammar and Rule

What components can I reuse or start with when building a new rule?

You have an option to begin building a new rule with an existing parse definition and regex library.

Can I start creating a new parse definition without using any regex libraries or chop definitions?

You can create a parse definition without a regex library but you must have at least a chop table definition.

How do I start creating grammars?

From the Categorization and Rule Building tab, click an entry to select it. At the bottom of the screen, you should see a Word/Category listing for the entry. Select a word, right-click, and select Assign Basic Category. You can follow similar steps to create a derived category and assign that derived category to a root category.

How do I add a basic category to a word?

To assign a word to a basic category from the Categorization and Rule Building tab or the Element Analysis tab, select the word, right-click, and select Assign Basic Category.

How do I create derived categories from Categorization and Rule Building?

You can create derived categories after you have assigned basic categories to your words. Select a basic category or multiple basic categories, right-click and select Add Rule To Derived Category.

Can I create recursive rules to avoid repetition?

You can create right recursive rules such as:

Address
Street Address

But not left recursive rules such as:

Address
Address Street

Left recursive rules yield an infinite loop.
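The distinction can be illustrated with a minimal check for direct left recursion, that is, a rule whose first category is the derived category being defined. A naive top-down parser that expands such a rule would try to expand the same category again before consuming any input, which is why it never terminates. The function names and the dictionary-based grammar below are hypothetical, and the sketch does not detect indirect left recursion through other categories.

```python
def is_left_recursive(category, rule):
    """True when the rule starts with the category it defines."""
    return len(rule) > 0 and rule[0] == category


def add_rule(grammar, category, rule):
    """Add a rule to a derived category, rejecting direct left recursion."""
    if is_left_recursive(category, rule):
        raise ValueError("You cannot create left recursive rules")
    grammar.setdefault(category, []).append(tuple(rule))


grammar = {}
add_rule(grammar, "Address", ["Street", "Address"])  # right recursive: accepted

try:
    add_rule(grammar, "Address", ["Address", "Street"])  # left recursive
except ValueError as error:
    print(error)  # You cannot create left recursive rules
```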

What does the message "You cannot create left recursive rules" mean?

This error message appears when you try to assign a derived category to itself on the Categorization and Rule Building tab.

Can I change the chopping of a word from the Categorization and Rule Building tab?

Yes. Under Word/Category, select the word, right-click, and select Adjust Chopping. Click the operation drop-down list and select USE, SUPPRESS, or TRIM for this word.

Doc ID: DMCust_QKBUsage_PDQuick_Ed.html