Parse Definition Quick Editor

DataFlux Data Management Studio 2.8: User Guide

Parse Definition Quick Editor - Chopper Tab

The Chopper tab in the Parse Definition Quick Editor provides functionality similar to the separate Chop Table Editor. You can create and edit chop tables within the parse definition.

The following unique menu items are available in the Edit menu of this tab:

Disable Table - This option allows you to edit and create chop tables within the parse definition.

Disable Rules - This option allows you to update the Chop Rules. See Chop Table - Rules.

Implicit Separation - Implicit separators are an optional feature of string chopping. For each parse definition, you can enable or disable implicit separators. If implicit separators are enabled, a separator mark is placed wherever the classification of a character differs from the classification of the previous character in a string. See Chop Table Editor - Character Level Options.
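
The effect of implicit separation can be illustrated with a minimal sketch. This is not the product's implementation: the classify function is a hypothetical, simplified stand-in for a chop table that knows only two classifications, and implicit_chop is an illustrative name.

```python
def classify(ch: str) -> str:
    """Hypothetical, simplified classifier: digits versus everything else."""
    return "NUMBER" if ch.isdigit() else "LETTER/SYMBOL"


def implicit_chop(text: str) -> list[str]:
    """Split text wherever a character's classification differs
    from the classification of the previous character."""
    if not text:
        return []
    pieces = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if classify(cur) != classify(prev):
            pieces.append(cur)   # classification changed: start a new piece
        else:
            pieces[-1] += cur    # same classification: extend the current piece
    return pieces


print(implicit_chop("APT12B"))  # ['APT', '12', 'B']
```

With implicit separators disabled, the same string would stay in one piece unless an explicit separator character appeared in it.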

The Chopper tab includes the following elements:

Chopper used in definition - Specifies the name of a single chopper that is used by the parse definition. When a new definition is created, a chopper with a placeholder name New Chopper is listed here. You can change the chopper library file or edit the chopper using the buttons next to the field. When you choose a different chopper, you are prompted to either save or discard the existing definition. Then, you can either select an existing chopper with a drop-down menu or create a new chopper in the Chopper tab.

The Table sub-tab includes the following elements:

Unicode block - the Unicode block drop-down list provides a list of character subsets. See Unicode Block for the complete list.

Character Name - the Character Name is the actual name of the character, for example, semicolon.

Character - the Character represents the actual appearance of the character. For example, if the Character Name is semicolon, the Character is ;

Classification - the Classification drop-down list includes:

LETTER/SYMBOL - a letter or non-separating symbol
NUMBER - a numeric digit (0-9)
LEAD SEPARATOR - a delimiter attached to the beginning of a word (for example, the left parenthesis)
TRAIL SEPARATOR - a delimiter attached at the end of a word (for example, a period)
FULL SEPARATOR - a delimiting character (for example, space, dash, and comma)

Operation - the Operation drop-down list includes:

USE - use the character as-is in the word list and output tokens
TRIM - omit from word list; trim leading/trailing characters in output tokens
SUPPRESS - omit from the word list and output tokens
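
One possible reading of the three operations is sketched below. The function name apply_operations and the per-character operations mapping are hypothetical, and the sketch assumes that a TRIM character is dropped from the word-list form everywhere but is removed from the output token only at its ends.

```python
def apply_operations(token: str, operations: dict[str, str]) -> tuple[str, str]:
    """Return (word_list_form, output_token) for one chopped token.

    `operations` maps a character to "USE", "TRIM", or "SUPPRESS";
    characters that are not listed default to "USE".
    """
    def op(ch: str) -> str:
        return operations.get(ch, "USE")

    # Word list: only USE characters survive.
    word = "".join(ch for ch in token if op(ch) == "USE")

    # Output token: drop SUPPRESS characters everywhere ...
    out = "".join(ch for ch in token if op(ch) != "SUPPRESS")
    # ... then strip TRIM characters from both ends only.
    trim_chars = "".join(ch for ch, o in operations.items() if o == "TRIM")
    if trim_chars:
        out = out.strip(trim_chars)

    return word, out


print(apply_operations("U.S.", {".": "TRIM"}))                      # ('US', 'U.S')
print(apply_operations("O'Neil.", {".": "TRIM", "'": "SUPPRESS"}))  # ('ONeil', 'ONeil')
```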

Value - each character is assigned a value

Hex Value - this field represents the character code for the character on that line
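
The relationship between the Character, Character Name, and Hex Value columns can be reproduced outside the product with Python's standard unicodedata module. This is only an illustration of the concepts: the describe function is a hypothetical helper, the four-digit hexadecimal format is an assumption, and the product may display names and codes differently.

```python
import unicodedata


def describe(ch: str) -> dict[str, str]:
    """Return the character, its Unicode name, and its code point in hex."""
    return {
        "character": ch,
        "name": unicodedata.name(ch, "<unnamed>"),
        "hex_value": f"{ord(ch):04X}",
    }


print(describe(";"))
# {'character': ';', 'name': 'SEMICOLON', 'hex_value': '003B'}
```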

The Rules sub-tab uses a rules-based chopping algorithm that matches portions of an input string from left to right using search criteria and states. Both are specified by rules that you define on the Rules tab. The rules are processed from top to bottom, and the search criterion for each rule is one of the following:

A vocabulary of words
A single regular expression

The system first tries to match the input string at the current position using the search criterion of the first rule. If that fails, it proceeds to the next rule and attempts to match using the criterion for that rule, and so on. If no rules match, the input string position is advanced by one character and the algorithm is run again. This process is repeated until the end of the input string is reached.

If the system is able to successfully match a substring, it then attempts to validate the state. At any point in time, the system maintains a state of zero or more flags, which are just variables that either exist or not. Each rule has the ability to check for the existence of flags in the current state using a simple Boolean syntax. This is called the Prerequisite State Condition. If this validation fails, the rule fails to match and the next rule is checked, and so on. If it succeeds, then the following occurs:

The input string is advanced to the end of the successful match in the Search criterion.
A new state (Output State) consisting of zero or more flags is set. The new state replaces the old state.
The string is chopped before the current match, after the current match, at both points, or not at all.
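
The matching loop described above can be sketched as follows. This is a simplified illustration rather than the product's implementation. It supports only regular-expression criteria (vocabulary lookup is omitted), it reduces the Prerequisite State Condition to "all listed flags must exist" instead of a full Boolean expression, and the names Rule, requires, output, chop, and chop_string are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str                        # regex search criterion
    requires: frozenset = frozenset()   # simplified prerequisite: flags that must exist
    output: frozenset = frozenset()     # output state set when the rule matches
    chop: str = "none"                  # "before", "after", "both", or "none"


def chop_string(text, rules, initial_flags=frozenset()):
    state = frozenset(initial_flags)    # initial state before any rule runs
    cuts = set()
    pos = 0
    while pos < len(text):
        for rule in rules:              # rules are tried from top to bottom
            m = re.compile(rule.pattern).match(text, pos)
            if not m or m.end() == pos:
                continue                # no match (or empty match): try the next rule
            if not rule.requires <= state:
                continue                # prerequisite state check failed
            if rule.chop in ("before", "both"):
                cuts.add(m.start())
            if rule.chop in ("after", "both"):
                cuts.add(m.end())
            state = rule.output         # the new state replaces the old state
            pos = m.end()               # advance to the end of the match
            break
        else:
            pos += 1                    # no rule matched: advance one character
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(text)) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


rules = [
    Rule(r"\d+", output=frozenset({"after_number"}), chop="both"),
    Rule(r"[a-z]+", requires=frozenset({"after_number"}), chop="after"),
    Rule(r"[a-z]+"),
]
print(chop_string("12oz can", rules))  # ['12', 'oz', ' can']
```

In this example the second rule fires for "oz" only because the first rule set the after_number flag. The same pattern at "can" falls through to the third rule, which does not chop.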

The Initial Flags section of the Rules tab sets up an initial state before any rule processing is performed. This is useful for establishing a pre-existing state in order to force certain rules to be applied first.

The Initial Flags list has two controls. Click the Add icon to open the Add New Flag dialog, and then enter the name of the new initial flag. Each new flag is added to the list in alphabetical order. To delete an existing flag, select the flag and click the Delete icon.

The Rules sub-tab includes the following elements:

Initial Flags - Displays the initial flags associated with the chopper definition. You can use the neighboring buttons to delete and rearrange the flags.

Method - The Method is either Vocab or Regex, depending on the type of criterion for the rule. This determines the format for the Search Criterion column.

Regular Expression - This field allows you to type the regex directly into the accompanying field.

Vocabulary - The Vocabulary drop-down list allows you to select one of the available vocabularies.

Category - The Category drop-down list displays all of the possible categories for the selected vocabulary.

Prerequisite State Condition - This is a Boolean text expression.

Output Flags - The Output Flags list can be populated with flag names just like Initial Flags.

Chop Mode - The Chop Mode drop-down list allows you to select one of the possible modes.

Notes - The Notes field allows you to add up to 128 characters of text for documentation purposes.
