DataFlux Data Management Studio 2.6: User Guide
The Pattern Logic Node uses the categories previously assigned to input words by morphological analysis to generate possible parse solutions. At the end of the morphological analysis step, each word (or substring) in the input has been categorized. However, it is likely that:
The Pattern Logic Node solves both of these problems by considering each word and its categories in the context provided by the full string, and with the aid of a grammar that supplies information about the allowed structure of inputs.
Used in:
The following properties are common to all definitions that include Pattern Logic Nodes.
Select a grammar to use. A grammar describes all the categories that words might have within a particular context, such as names or addresses. A grammar also describes the relationships between categories. Some categories are derived from a combination of others, and there may be many different ways to derive a particular category.
By default, only files from the locale and its ancestors appear in the drop-down. If you do not see the desired library, click Tools > Options. Click Display and select Show files for all locales under the Library file selection drop-down lists to view QKB files from all locales.
Click Edit to edit the selected file or to create a new file. The appropriate editor opens.
For example, in the United States, a phrase that represents a person's family name (the derived category) can commonly be seen in the following constructions:
- single family-name word (for example, "John SMITH")
- two family-name words (for example, "Helena BONHAM CARTER")
- two family-name words with a hyphen (for example, "Mary JONES-SMITH")
- multiple family-name words, with or without hyphens
- single initial (for example, "John S.", with the last name abbreviated to protect a person's identity)
The list expands greatly as different languages, countries, cultures, and customs are taken into account.
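The family-name constructions listed above can be read as alternative derivation rules for one derived category. The following is a minimal sketch of that idea in Python. It is not QKB grammar syntax, and the category names (FAMILY_NAME, FNW, HYPHEN, INITIAL) and the `derives` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration only: this is not QKB grammar syntax.
# A derived category maps to a list of alternative rules. Each rule is a
# sequence of basic categories that together form the derived category.
GRAMMAR = {
    "FAMILY_NAME": [
        ["FNW"],                   # single family-name word
        ["FNW", "FNW"],            # two family-name words
        ["FNW", "HYPHEN", "FNW"],  # two family-name words with a hyphen
        ["INITIAL"],               # single initial
    ],
}


def derives(category, word_categories, grammar=GRAMMAR):
    """Return True if the category sequence matches any rule for the derived category."""
    return list(word_categories) in grammar.get(category, [])


print(derives("FAMILY_NAME", ["FNW", "HYPHEN", "FNW"]))
print(derives("FAMILY_NAME", ["HYPHEN"]))
```

The sketch enumerates fixed-length rules only. A real grammar would also need a way to express an arbitrary number of family-name words, which is omitted here.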
In the simplest case, the structure of a grammar looks like a single tree. Some grammars may be made of multiple, unconnected trees.
This is the category at the root of the desired sub-tree of the category hierarchy, as defined by the grammar. Depending on the intent, this may be the true root of a single-tree grammar, the root of one tree in a multiple-tree grammar, or simply any category.
Parse resource limit - The maximum amount of computational resources that the parser should use.
Solution tree depth - The maximum depth within a solution tree down to which the parser will recurse.
Input length - Optimizes processing for different lengths of input string.
This is the category of interest within that sub-tree whose root is the root category.
If this check box is selected, processing of patterns stops after this node if the node matches.
The identity to be assigned if the pattern matches.
This number weights the final calculation in the pattern analysis.
The likelihood of the input being in the definition's locale, if the pattern matches. Setting this to "Never" implies that an input that matches this pattern could never belong to the definition's locale.
This number weights the final calculation in the pattern analysis.
This table displays an ordered list of categories that are sought to match with input substrings, using the specified grammar. The best category match in the list causes the specified token to be applied to the input string. The table can therefore include multiple rows for the same category, each with a different token value. In the table you can reposition rows vertically for easier reading by clicking the up and down arrows. To add and remove rows, use the plus and minus arrows.
Use this field to select categories and add them to the table.
Use this field to specify the token that is mapped to a category in the table.
The maximum number of matches that will be processed for this pattern.
One of the following:
If any solutions were found for a test string, a number of solution trees appear in the output pane. Depending on the definition, they will be organized and displayed in different ways.
--+ ROOT CATEGORY (Likelihood)
|
|--o string
|
|--o SUBCATEGORY1 SUBCATEGORY2...
The top of the tree shows the root category and the likelihood associated with it. The first bullet (blue) shows the string. The second bullet (yellow) shows the subcategories that form the root category when combined according to the rules of the grammar. Following this are similar sub-trees for each of the subcategories. The structure then recurses into each category until only basic (non-derived) categories are part of the tree, or until the maximum solution tree depth specified by the node is reached.
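The recursive layout described above can be sketched as follows. This is an illustrative model only: the `Solution` class, the `render` function, the sample categories, and the convention that the root sits at depth 0 are all assumptions, not the product's internal representation or its exact display format.

```python
from dataclasses import dataclass, field


# Hypothetical data model for illustration; not the product's internal format.
@dataclass
class Solution:
    category: str                                  # category name, for example "NAME"
    text: str                                      # substring covered by this category
    children: list = field(default_factory=list)   # empty for basic (non-derived) categories


def render(node, max_depth, depth=0):
    """Return display lines, expanding at most max_depth levels below the root."""
    pad = "    " * depth
    lines = [f"{pad}--+ {node.category}", f"{pad}  |--o {node.text}"]
    # Recurse only into derived categories, and only while under the depth limit.
    if node.children and depth < max_depth:
        lines.append(f"{pad}  |--o " + " ".join(c.category for c in node.children))
        for child in node.children:
            lines.extend(render(child, max_depth, depth + 1))
    return lines


tree = Solution("NAME", "John Smith", [
    Solution("GIVEN_NAME", "John"),
    Solution("FAMILY_NAME", "Smith"),
])
print("\n".join(render(tree, max_depth=1)))
```

With `max_depth=0` in this sketch, only the root category and its string are shown, which mirrors how a depth limit cuts the tree off before basic categories are reached.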
If there are multiple solutions generated for a pattern, the one with the highest score is used to determine the final result.
The Parse Definition has only one pattern. Therefore, at most one set of solutions appears.
Each pattern that generated solutions (up to and including the currently selected Node) has a row of solution trees. For viewing convenience, some information about the pattern is shown to the left of its solution tree row.
The Extraction Definition processes the input string in parallel across all Pattern Logic Nodes. If a node detects a word that matches its category, that word is extracted from the string. A second iteration then examines the new, shorter substring. Iterations continue until all words have been extracted. In the Testing area, the left side of the output pane displays a substring tree. Each node in the tree represents an iteration. Iterations that expand display the words that were extracted in that iteration. The same word or substring may appear more than once in the tree if multiple-token extraction is selected in the Extraction Definition Head Node.
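The iteration behavior described above can be approximated with a short sketch. Everything here is an assumption made for illustration: the `extract_iteratively` function, the use of simple predicates in place of Pattern Logic Nodes, the rule that each node claims at most one word per iteration, and the choice to stop when no node matches anything.

```python
def extract_iteratively(words, nodes):
    """Simulate repeated extraction passes.

    words: list of input words.
    nodes: dict mapping a token name to a predicate that accepts a word
           (a stand-in for a Pattern Logic Node; hypothetical).
    Returns (iterations, remaining): the (token, word) pairs extracted in
    each pass, and the words that no node claimed.
    """
    remaining = list(words)
    iterations = []
    while remaining:
        taken = {}  # position in `remaining` -> token of the node that claimed it
        for token, matches in nodes.items():
            for i, word in enumerate(remaining):
                if i not in taken and matches(word):
                    taken[i] = token
                    break  # assumption: one word per node per iteration
        if not taken:
            break  # nothing matched; stop rather than loop forever
        iterations.append([(taken[i], remaining[i]) for i in sorted(taken)])
        # The next pass examines the shorter string that is left over.
        remaining = [w for i, w in enumerate(remaining) if i not in taken]
    return iterations, remaining


nodes = {
    "COLOR": lambda w: w in {"red", "blue"},
    "SIZE": lambda w: w in {"large", "small"},
}
passes, leftover = extract_iteratively("large red blue widget".split(), nodes)
print(passes)
print(leftover)
```

Each inner list in `passes` corresponds to one iteration, similar in spirit to one node of the substring tree shown in the Testing area.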
Click a word to display its solution tree on the right side of the output pane. Included in the solution tree is the number of the Pattern Logic Node that generated the solution. Note that the test output for the Pattern Logic Summary Node combines the output from all Pattern Logic Nodes.
On the left of the output pane is a tree of patterns, with the substrings extracted by each pattern. The score for each pattern is displayed next to the pattern index. If you click on the pattern, it displays some information about the pattern. If you click on a substring, it displays the solution trees for that substring.
Doc ID: DMCust_12327.html