DataFlux Data Management Studio 2.6: User Guide

Vocabulary Node

The Vocabulary Node performs a lookup in the vocabulary file to retrieve the categories and likelihoods for each input substring.

Used in:

Properties

Vocabulary

Select a Vocabulary to use. Click Edit to open the selected vocabulary in the Vocabulary Editor.

Stop searching if found in this vocabulary

If this check box is selected and the input substring is found in the vocabulary of the current node, subsequent Vocabulary Nodes will not process that word. This lets you decide whether two or more vocabularies that contain the same word should all contribute their categories, or whether only the first vocabulary's categories should be considered.

Perform fuzzy lookups

Select this check box to also retrieve, from the specified categories, the likelihoods of vocabulary words that are similar to the input words.

Fuzzy lookup threshold

Specify the degree of similarity that must exist before a similar word from a selected category is included in the output. Higher threshold values indicate higher degrees of similarity.

Available categories

Lists all of the categories in the selected Vocabulary.

Applied categories

Click the left and right arrows to select the subset of categories that will be searched for substrings that are similar to input words.
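The interaction between the fuzzy lookup threshold and the applied categories can be illustrated with a minimal Python sketch. This is not the product's implementation: the similarity measure (Python's difflib ratio on a 0.0 to 1.0 scale), the function name fuzzy_lookup, the category names, and the likelihood labels are all assumptions chosen for illustration, and the scale used by the Fuzzy lookup threshold property may differ.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(word, vocabulary, applied_categories, threshold):
    """Return {category: likelihood} for vocabulary words similar to `word`.

    Assumption: similarity is difflib's ratio (0.0 to 1.0); the product's
    actual similarity measure is not documented here.
    """
    matches = {}
    for candidate, categories in vocabulary.items():
        similarity = SequenceMatcher(None, word.lower(), candidate.lower()).ratio()
        if similarity < threshold:
            continue  # not similar enough to be included in the output
        for category, likelihood in categories.items():
            # Only categories moved to "Applied categories" are searched.
            if category in applied_categories:
                matches[category] = likelihood
    return matches


# Hypothetical vocabulary: word -> {category: likelihood label}.
vocabulary = {
    "Jonathan": {"GIVEN_NAME": "high"},
    "Street": {"STREET_TYPE": "high"},
}

loose = fuzzy_lookup("Jonathon", vocabulary, {"GIVEN_NAME"}, threshold=0.8)
strict = fuzzy_lookup("Jonathon", vocabulary, {"GIVEN_NAME"}, threshold=0.95)
```

In this sketch, the misspelled input "Jonathon" picks up the GIVEN_NAME category at the lower threshold but not at the higher one, which mirrors the statement that higher threshold values require higher degrees of similarity.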

Output

Message

If any word in the Vocabulary matches the input, the message is "Changes were applied". Otherwise, the message is "No changes were applied".

Result

A table with four columns:

Vocabulary outputs are cumulative, so a single node's output includes the outputs from previous vocabularies (if applicable).
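The cumulative behavior, together with the "Stop searching if found in this vocabulary" option, can be sketched in Python as follows. This is a simplified model under stated assumptions, not the product's code: each node is represented as a pair of a plain dictionary and a Boolean flag, and the function name, category names, and likelihood labels are hypothetical.

```python
def run_vocabulary_nodes(word, nodes):
    """Accumulate categories for `word` across Vocabulary Nodes in order.

    `nodes` is a list of (vocabulary, stop_if_found) pairs, where each
    vocabulary maps word -> {category: likelihood}. Hypothetical model.
    """
    result = {}
    for vocabulary, stop_if_found in nodes:
        found = vocabulary.get(word)
        if found:
            result.update(found)  # output includes earlier vocabularies
            if stop_if_found:
                break  # later nodes do not process this word
    return result


# Two hypothetical vocabularies that contain the same word.
first = {"Taylor": {"GIVEN_NAME": "low"}}
second = {"Taylor": {"FAMILY_NAME": "high"}}

all_categories = run_vocabulary_nodes("Taylor", [(first, False), (second, False)])
first_only = run_vocabulary_nodes("Taylor", [(first, True), (second, False)])
```

With the flag cleared on the first node, both vocabularies contribute their categories; with it set, only the first vocabulary's categories appear in the result.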

Notes

A vocabulary is a table containing a list of words. For each word, one or more categories are assigned, and a likelihood is attached to each assignment.
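As a rough mental model, a vocabulary can be pictured as a mapping from each word to its categories, with one likelihood per category. The Python sketch below is only an illustration of that structure: the class name, the example word, the category names, and the likelihood labels are invented, and the format in which the product stores vocabularies is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Vocabulary:
    """Hypothetical in-memory vocabulary: word -> {category: likelihood}."""

    name: str
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add(self, word: str, category: str, likelihood: str) -> None:
        # A word can carry several categories, each with its own likelihood.
        self.entries.setdefault(word, {})[category] = likelihood

    def lookup(self, word: str) -> Dict[str, str]:
        # Return a copy so callers cannot modify the vocabulary by accident.
        return dict(self.entries.get(word, {}))


names = Vocabulary("names")
names.add("Taylor", "GIVEN_NAME", "low")     # illustrative values only
names.add("Taylor", "FAMILY_NAME", "high")   # illustrative values only

taylor = names.lookup("Taylor")
missing = names.lookup("Zzyzx")
```

A lookup for a word that is in the vocabulary returns every category assigned to it, and a lookup for an unknown word returns nothing.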

Categories

A category indicates the function of the word in the context for which the vocabulary is intended.

For example:

In the context of people's names, some possible categories might be:

Likelihoods

A likelihood indicates the presumptive probability of that word belonging to that category.

For example:

In the context of people's names, assuming the English language, you could say that, given no other information than our general knowledge of English, "Judy" has:

Relationship to Grammars

In many definitions, vocabularies (using Vocabulary Nodes) are used in conjunction with Grammars (through various Pattern Nodes). For this reason, the categories of a vocabulary that is intended for use in morphological analysis generally correspond to the categories defined in a related grammar.

FAQ - Multiple vocabularies with the same word

If you have two vocabularies and the same word is in each vocabulary with different categories, are both categories assigned to the word?

Yes, assuming the "Stop searching if found in this vocabulary" check box is not selected on the first Vocabulary Node.

What if the word appears in two vocabularies with the same category but different likelihoods?

That category will appear one time and the last likelihood encountered will be used (that is, the duplicate overwrites the original).
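The overwrite rule can be pictured with a small Python sketch. The function name, category names, and likelihood labels are hypothetical, and the dictionaries only model the behavior described in the answer above.

```python
def merge_categories(accumulated, incoming):
    """Merge a later vocabulary's categories into the accumulated output.

    A category that is already present keeps a single entry, and its
    likelihood is replaced by the one encountered last.
    """
    merged = dict(accumulated)
    merged.update(incoming)
    return merged


from_first = {"GIVEN_NAME": "low"}
from_second = {"GIVEN_NAME": "high", "FAMILY_NAME": "medium"}

merged = merge_categories(from_first, from_second)
```

Here GIVEN_NAME appears once in the merged output with the likelihood from the second vocabulary, so the result depends on the order of the nodes.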

Note: This situation tends to cause confusion; avoid it if possible.

Doc ID: DMCust_12321.html