
DataFlux Data Management Studio 2.5: User Guide

Pattern Logic Node

The Pattern Logic Node uses the categories previously assigned to input words by morphological analysis to generate possible parse solutions. At the end of the morph analysis step, each word (or substring) in the input has been categorized. However, it is likely that:

The Pattern Logic Node solves both of these problems by considering each word and its categories in the context provided by the full string, and with the aid of a grammar that supplies information about the allowed structure of inputs.

Used in:

Properties

All Definition Types

The following properties are common to all definitions that include Pattern Logic Nodes.

Grammar

Select a grammar to use. A grammar describes all the categories that words might have within a particular context, such as names or addresses. A grammar also describes the relationships between categories. Some categories are derived from a combination of others, and there may be many different ways to derive a particular category.

By default, only files from the locale and its ancestors appear in the drop-down. If you do not see the desired library, click Tools > Options. Click Display and select Show files for all locales under the Library file selection drop-down lists to view QKB files from all locales.

Click Edit to edit the selected file or to create a new file. The appropriate editor opens.

For example, in the United States, a phrase that represents a person's family name (the derived category) can commonly be seen in the following constructions:

In the simplest case, the structure of a grammar looks like a single tree. Some grammars may be made of multiple, unconnected trees.
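The tree idea can be made concrete with the following Python sketch. It models a grammar as a mapping from each derived category to the alternative sequences of subcategories that it can be built from. The category names and the dictionary format are hypothetical and are not the QKB grammar file format. A derived category that never appears inside another rule is the root of its own tree. A grammar with more than one such category is therefore made of multiple, unconnected trees.

```python
# Hypothetical toy grammar: each derived category maps to the alternative
# sequences of subcategories it can be built from (names are illustrative).
GRAMMAR: dict[str, list[list[str]]] = {
    "NAME": [["GIVEN", "FAMILY"], ["FAMILY", "COMMA", "GIVEN"]],
    "FAMILY": [["SURNAME"], ["PREFIX", "SURNAME"]],  # two ways to derive FAMILY
    "ADDRESS": [["NUMBER", "STREET"]],
}


def roots(grammar):
    """Return the derived categories that never appear on the right-hand
    side of any rule; each one is the root of a separate tree."""
    used = {cat for alts in grammar.values() for seq in alts for cat in seq}
    return sorted(cat for cat in grammar if cat not in used)


print(roots(GRAMMAR))  # ['ADDRESS', 'NAME'] -> two unconnected trees
```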

Root Category

This is the category at the root of the desired sub-tree of the category hierarchy, as defined by the grammar. Depending on the intent, it may be the true root of a single-tree grammar, the root of one tree in a multiple-tree grammar, or any other category.
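The following minimal sketch shows which categories fall inside the sub-tree selected by a root category. It starts from the chosen category and follows every rule downward. It uses a hypothetical dictionary format for the grammar, not the QKB file format. Choosing an intermediate category such as FAMILY limits the sub-tree to that part of the hierarchy.

```python
# Hypothetical grammar with two unconnected trees (illustrative names only).
GRAMMAR = {
    "NAME": [["GIVEN", "FAMILY"]],
    "FAMILY": [["SURNAME"], ["PREFIX", "SURNAME"]],
    "ADDRESS": [["NUMBER", "STREET"]],
}


def subtree(grammar, root):
    """Collect every category reachable from `root` by following rules."""
    seen, stack = set(), [root]
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        for seq in grammar.get(cat, []):  # basic categories have no rules
            stack.extend(seq)
    return seen


print(sorted(subtree(GRAMMAR, "FAMILY")))  # ['FAMILY', 'PREFIX', 'SURNAME']
```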

Optimization Parameters

Parse resource limit - The maximum amount of computational resources that the parser should use.

Solution tree depth - The maximum depth within a solution tree down to which the parser will recurse.

Input length - Optimizes processing for different lengths of input string.
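The following sketch shows how a grammar can filter the per-word categories from morphological analysis, and where limits like the first two optimization parameters could act. It makes several assumptions. It is a simple top-down recursive parser, not the DataFlux implementation. `max_steps` stands in for the parse resource limit and `max_depth` stands in for the solution tree depth. The input length setting is not modelled. The grammar format and category names are hypothetical.

```python
# Hypothetical grammar: derived category -> alternative subcategory sequences.
GRAMMAR = {
    "NAME": [["GIVEN", "FAMILY"]],
    "FAMILY": [["SURNAME"], ["PREFIX", "SURNAME"]],
}


class BudgetExceeded(Exception):
    """Raised when the parse uses more steps than allowed."""


def parse(grammar, word_cats, root, max_depth=10, max_steps=10_000):
    """Return every solution tree in which `root` covers all words.

    word_cats[k] is the set of categories assigned to word k; a word's
    ambiguity is resolved by which of its categories fit the grammar.
    Leaves are category names; derived nodes are (category, children).
    """
    steps = 0

    def spans(cat, start, depth):
        # Yield (end, tree) for each way `cat` can cover word_cats[start:end].
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise BudgetExceeded
        if start < len(word_cats) and cat in word_cats[start]:
            yield start + 1, cat  # the word itself carries this category
        if depth >= max_depth:
            return  # solution tree depth limit reached
        for seq in grammar.get(cat, []):
            for end, kids in seq_spans(seq, start, depth + 1):
                yield end, (cat, kids)

    def seq_spans(seq, start, depth):
        # Yield (end, trees) for each way the sequence covers consecutive words.
        if not seq:
            yield start, []
            return
        for mid, first in spans(seq[0], start, depth):
            for end, rest in seq_spans(seq[1:], mid, depth):
                yield end, [first] + rest

    return [tree for end, tree in spans(root, 0, 0) if end == len(word_cats)]


# "Mary Van Dyke": "Van" was tagged both PREFIX and GIVEN by morphology.
cats = [{"GIVEN", "SURNAME"}, {"PREFIX", "GIVEN"}, {"SURNAME"}]
print(parse(GRAMMAR, cats, "NAME"))
# [('NAME', ['GIVEN', ('FAMILY', ['PREFIX', 'SURNAME'])])]
```

Only the PREFIX reading of "Van" fits the grammar, so one solution remains. In this sketch the depth limit also guarantees termination for left-recursive rules.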

Identification Analysis and Locale Guess Definitions only
Sought category

This is the category of interest within the sub-tree rooted at the Root Category.

Stop searching if this pattern is found

If this check box is selected, processing of patterns stops after this node if the node matches.

Search for pattern
Identification Analysis Definition only
Identity

The identity to be assigned if the pattern matches.

Weight

This number weights the final calculation in the pattern analysis.
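The following sketch shows one way that the Identity, Weight, and Stop searching settings could interact when an Identification Analysis definition evaluates its Pattern Logic Nodes in order. This is an assumption, not the documented algorithm. Each node's grammar match is replaced by a plain predicate. The weights of matching nodes are summed per identity, and the highest total wins. `PatternNode` and `identify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PatternNode:
    """Hypothetical stand-in for one Pattern Logic Node's settings."""
    matches: Callable[[str], bool]  # replaces the grammar-based match
    identity: str                   # Identity assigned if the pattern matches
    weight: float                   # Weight of this node in the final score
    stop_if_found: bool = False     # "Stop searching if this pattern is found"


def identify(text, nodes):
    """Evaluate nodes in order and return the identity with the highest
    summed weight (assumed rule), or None if nothing matched."""
    scores = {}
    for node in nodes:
        if not node.matches(text):
            continue
        scores[node.identity] = scores.get(node.identity, 0.0) + node.weight
        if node.stop_if_found:
            break  # later nodes are not evaluated
    return max(scores, key=scores.get) if scores else None


nodes = [
    PatternNode(lambda s: "," in s, "NAME", 2.0),
    PatternNode(lambda s: s[:1].isdigit(), "ADDRESS", 3.0, stop_if_found=True),
    PatternNode(lambda s: "St" in s, "ADDRESS", 1.0),
]
print(identify("12 Main St", nodes))   # ADDRESS
print(identify("Smith, John", nodes))  # NAME
```

In this example, the second node has its stop flag set, so evaluation ends as soon as that node matches. Later nodes cannot change the result.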

Locale Guess Definition only
Likelihood

The likelihood that the input is in the definition's locale, if the pattern matches. Setting it to "Never" means that an input matching this pattern can never belong to the definition's locale.
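The following minimal sketch shows how a "Never" likelihood could veto a locale when several matching patterns report likelihoods. This guide names only "Never". The other levels in `LEVELS` are assumptions for illustration, as is the rule of taking the highest remaining level.

```python
# Hypothetical likelihood scale; only "Never" is named in this guide.
LEVELS = ["Never", "Low", "Medium", "High"]


def locale_likelihood(matched):
    """Combine the likelihoods of the patterns that matched an input.

    Assumed rule: any "Never" rules the locale out; otherwise the
    highest level wins. Returns None if no pattern matched.
    """
    if not matched:
        return None
    if "Never" in matched:
        return "Never"
    return max(matched, key=LEVELS.index)


print(locale_likelihood(["Low", "High"]))    # High
print(locale_likelihood(["High", "Never"]))  # Never
```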

Extraction Definition only
Weight

This number weights the final calculation in the pattern analysis.

Category token mappings

This table displays an ordered list of categories to match against input substrings, using the specified grammar. The best category match in the list causes the specified token to be applied to the input string. The table can therefore include multiple rows for the same category, each with a different token value. To reposition rows for easier reading, click the up and down arrows. To add and remove rows, use the plus and minus buttons.

Sought categories

Use this field to select categories and add them to the table.

Token

Use this field to specify the token that is mapped to a category in the table.

Maximum matches per pattern

The maximum number of matches that will be processed for this pattern.
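The following sketch shows the category token mappings and the maximum matches setting under several assumptions. The grammar's output is modelled as hypothetical (substring, category, score) candidates. The earliest row for a category supplies its token, and the best-scoring matches are kept first. The sketch does not model how DataFlux resolves several rows for the same category.

```python
# Hypothetical table rows, in display order: (sought category, token).
MAPPINGS = [("FAMILY", "Family Name"), ("GIVEN", "Given Name")]


def map_tokens(candidates, mappings, max_matches):
    """Map (substring, category, score) candidates to (substring, token) pairs.

    Assumptions: the earliest row for a category supplies its token, matches
    whose category has no row are ignored, and at most `max_matches`
    best-scoring results are kept.
    """
    tokens = {}
    for category, token in mappings:
        tokens.setdefault(category, token)  # keep the first row per category
    hits = [(sub, tokens[cat], score) for sub, cat, score in candidates if cat in tokens]
    hits.sort(key=lambda hit: hit[2], reverse=True)  # best score first; sort is stable
    return [(sub, token) for sub, token, _ in hits[:max_matches]]


candidates = [("Smith", "FAMILY", 0.9), ("John", "GIVEN", 0.8), ("Acme", "ORG", 0.95)]
print(map_tokens(candidates, MAPPINGS, max_matches=1))  # [('Smith', 'Family Name')]
```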

Search for pattern

Output

Message

One of the following:

Solution trees

If any solutions were found for a test string, a number of solution trees appear in the output pane. Depending on the definition, they will be organized and displayed in different ways.

Solution tree structure

--+ ROOT CATEGORY (Likelihood)
  |
  |--o string
  |
  |--o SUBCATEGORY1 SUBCATEGORY2...

The top of the tree shows the root category and its likelihood. The first bullet (blue) shows the string. The second bullet (yellow) shows the subcategories that form the root category when combined according to the rules of the grammar. Similar sub-trees follow for each of the subcategories. The structure then recurses into each subcategory until only basic (non-derived) categories remain in the tree, or until the maximum solution tree depth specified by the node is reached.

If multiple solutions are generated for a pattern, the one with the highest score determines the final result.
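The following Python sketch renders a solution tree in roughly the layout described above. It stops at basic categories or at a maximum depth, and it picks the highest-scoring solution. The `Node` class, the exact text layout, and the (score, tree) pairing are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical solution-tree node: a category and the text it covers."""
    category: str
    text: str
    children: list[Node] = field(default_factory=list)


def render(node, likelihood=None, depth=0, max_depth=10, indent=""):
    """Return text lines for `node`, recursing into subcategories until a
    basic (childless) category or `max_depth` is reached."""
    head = node.category if likelihood is None else f"{node.category} ({likelihood})"
    lines = [f"{indent}--+ {head}", f"{indent}  |", f"{indent}  |--o {node.text}"]
    if node.children and depth < max_depth:
        subcats = " ".join(child.category for child in node.children)
        lines += [f"{indent}  |", f"{indent}  |--o {subcats}"]
        for child in node.children:
            lines += render(child, depth=depth + 1, max_depth=max_depth, indent=indent + "    ")
    return lines


def best(solutions):
    """Pick the tree from the (score, tree) pair with the highest score."""
    return max(solutions, key=lambda pair: pair[0])[1]


tree = Node("NAME", "Mary Smith", [Node("GIVEN", "Mary"), Node("FAMILY", "Smith")])
winner = best([(0.4, Node("FAMILY", "Mary Smith")), (0.9, tree)])
print("\n".join(render(winner, likelihood=0.9)))
```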

Parse Definition

The Parse Definition has only one pattern, so at most one set of solutions appears.

Identification Analysis Definition

Each pattern that generated solutions (up to and including the currently selected node) has a row of solution trees. For convenience, some information about the pattern is shown to the left of its row of solution trees.

Extraction Definition

The Extraction Definition processes the input string in parallel across all Pattern Logic Nodes. If a node detects a word that matches its category, that word is extracted from the string. A second iteration then examines the new, shorter substring. Iterations continue until all words have been extracted. In the Testing area, the left side of the output pane displays a substring tree in which each node represents an iteration. Expanding an iteration displays the words that were extracted in that iteration. The same word or substring may appear more than once in the tree if multiple-token extraction is selected in the Extraction Definition Head Node.

Click a word to display its solution tree on the right side of the output pane. Included in the solution tree is the number of the Pattern Logic Node that generated the solution. Note that the test output for the Pattern Logic Summary Node combines the output from all Pattern Logic Nodes.
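The iteration described for the Extraction Definition can be sketched as a loop. Each pass removes whatever the current iteration extracts and then re-examines the shorter remainder. In this hypothetical sketch, `step` stands in for all Pattern Logic Nodes running over the remaining words and returns the indices to extract. The loop also stops if an iteration extracts nothing. That guard is added here so that the sketch cannot loop forever.

```python
def iterative_extract(words, step):
    """Run extraction iterations until no words remain.

    `step` (hypothetical) stands in for all Pattern Logic Nodes: given the
    remaining words, it returns the set of indices to extract this iteration.
    Returns the words extracted per iteration and any words left over.
    """
    iterations = []
    remaining = list(words)
    while remaining:
        picked = step(remaining)
        if not picked:
            break  # guard added for this sketch: nothing more can be extracted
        iterations.append([remaining[i] for i in sorted(picked)])
        remaining = [w for i, w in enumerate(remaining) if i not in picked]
    return iterations, remaining


# Example rule: extract only the first remaining word on each iteration.
print(iterative_extract(["John", "Smith", "Acme"], lambda ws: {0}))
# ([['John'], ['Smith'], ['Acme']], [])
```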

Locale Guess Definition

On the left of the output pane is a tree of patterns, with the substrings extracted by each pattern. The score for each pattern is displayed next to the pattern index. Click a pattern to display information about it, or click a substring to display its solution trees.

Documentation Feedback: yourturn@sas.com
Note: Always include the Doc ID when providing documentation feedback.

Doc ID: dfU_Cstm_12327.html