DataFlux Data Management Studio 2.7: User Guide

Vocabulary Node

The Vocabulary Node performs a lookup in the vocabulary file to retrieve the categories and likelihoods for each input substring.

Used in:

Properties

Vocabulary:

Select a Vocabulary to use.

Click Open Vocabulary to open the selected vocabulary in the Vocabulary Editor.

Stop searching vocabularies if the word is found in this vocabulary

If this check box is selected and the input substring is found in the vocabulary of the current node, subsequent Vocabulary Nodes do not process that word. This lets you decide whether two or more vocabularies that contain the same word should all contribute their categories, or whether only the first vocabulary's categories should be considered.
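The effect of this check box can be sketched in Python. This is a minimal illustration, not the product's implementation: the dictionary model of a vocabulary, the function name `run_vocabulary_nodes`, the category names, and the numeric likelihood values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model of a vocabulary: word -> {category: likelihood}.
Vocabulary = Dict[str, Dict[str, float]]


def run_vocabulary_nodes(
    word: str, nodes: List[Tuple[Vocabulary, bool]]
) -> Dict[str, float]:
    """Look up `word` in each node in order.

    Each node is a (vocabulary, stop_if_found) pair. When the word is found
    in a node whose stop flag is set, later nodes are skipped for that word.
    """
    categories: Dict[str, float] = {}
    for vocabulary, stop_if_found in nodes:
        entry = vocabulary.get(word)
        if entry is None:
            continue  # word not in this vocabulary, try the next node
        categories.update(entry)
        if stop_if_found:
            break  # later Vocabulary Nodes do not process this word
    return categories


# Illustrative data only; the category names and likelihoods are made up.
given_names: Vocabulary = {"JORDAN": {"GIVEN_NAME": 0.6}}
family_names: Vocabulary = {"JORDAN": {"FAMILY_NAME": 0.7}}

both = run_vocabulary_nodes("JORDAN", [(given_names, False), (family_names, False)])
first_only = run_vocabulary_nodes("JORDAN", [(given_names, True), (family_names, False)])
```

With the flag cleared on the first node, `both` holds the categories from both vocabularies; with the flag set, `first_only` holds only the first vocabulary's category.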

Perform fuzzy lookups for all categories (if no selection) or specified categories

By default, a word must match an entry in a vocabulary exactly in order to be called a match. When fuzzy lookups are activated, words that are sufficiently similar to entries in the specified vocabulary also match. This option lets you restrict fuzzy matching to a specific category or categories of words from the vocabulary; if you make no selection, all categories are used.

Threshold

The threshold specifies the degree of similarity that must exist for the word to match. The higher the number, the more similar the word must be to the vocabulary entry. If you want an exact match, select 100, which turns off fuzzy lookups.
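The interaction between the threshold and the category selection can be sketched as follows. This guide does not state which similarity measure the product uses, so the sketch substitutes `difflib.SequenceMatcher` from the Python standard library, scaled to 0-100, as a stand-in. The function names, the sample words, the categories, and the likelihood values are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Set

# Hypothetical in-memory model of a vocabulary: word -> {category: likelihood}.
Vocabulary = Dict[str, Dict[str, float]]


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (stand-in metric, not the product's)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def fuzzy_lookup(
    word: str,
    vocabulary: Vocabulary,
    threshold: float,
    fuzzy_categories: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, float]]:
    """Return matching entries and the categories each one contributes.

    Exact matches always count. Fuzzy matches count only when the threshold
    is below 100, and only for the selected categories (all categories when
    `fuzzy_categories` is None).
    """
    matches: Dict[str, Dict[str, float]] = {}
    for entry, categories in vocabulary.items():
        if entry == word:
            matches[entry] = dict(categories)
            continue
        if threshold >= 100:
            continue  # a threshold of 100 turns off fuzzy lookups
        eligible = {
            category: likelihood
            for category, likelihood in categories.items()
            if fuzzy_categories is None or category in fuzzy_categories
        }
        if eligible and similarity(word, entry) >= threshold:
            matches[entry] = eligible
    return matches


# Illustrative data only.
vocabulary: Vocabulary = {
    "SMITH": {"FAMILY_NAME": 0.9},
    "SAM": {"GIVEN_NAME": 0.8},
}

strict = fuzzy_lookup("SMYTH", vocabulary, threshold=100)
close = fuzzy_lookup("SMYTH", vocabulary, threshold=70)
given_only = fuzzy_lookup("SMYTH", vocabulary, threshold=40, fuzzy_categories={"GIVEN_NAME"})
```

Under this stand-in metric, "SMYTH" scores 80 against "SMITH" and 50 against "SAM", so a threshold of 70 admits only "SMITH", while a threshold of 100 admits neither.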

Output

Message

If any word in the vocabulary matches the input, the message is "Changes applied". Otherwise, the message is "No changes applied".

Result

A table with four columns:

Vocabulary outputs are cumulative, so a single node's output includes the outputs from previous vocabularies (if applicable).

Notes

A vocabulary is a table containing a list of words. For each word, one or more categories are assigned, and a likelihood is attached to each assignment.
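One way to picture this structure is as rows of word, category, and likelihood. The sketch below is an assumption made for illustration and does not reflect the vocabulary file format; the row type, the sample word, the category names, and the numeric likelihoods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class VocabularyRow(NamedTuple):
    """One assignment of a category to a word, with its likelihood."""
    word: str
    category: str
    likelihood: float


# Illustrative rows only: one word may carry several categories.
rows: List[VocabularyRow] = [
    VocabularyRow("LEE", "GIVEN_NAME", 0.4),
    VocabularyRow("LEE", "FAMILY_NAME", 0.6),
    VocabularyRow("JR", "SUFFIX", 0.9),
]


def build_index(table: List[VocabularyRow]) -> Dict[str, Dict[str, float]]:
    """Group rows by word so a lookup returns every category and likelihood."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in table:
        index[row.word][row.category] = row.likelihood
    return dict(index)


index = build_index(rows)
lee_categories = index["LEE"]
```

Looking up a word in `index` then yields all of its categories with their likelihoods, which is the information the node retrieves for each input substring.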

Categories

A category indicates the function of the word in the context for which the vocabulary is intended.

For example:

In the context of people's names, some possible categories might be:

Likelihoods

A likelihood indicates the presumptive probability of that word belonging to that category.

For example:

In the context of people's names, assuming the English language, you could say that, given no other information than our general knowledge of English, "Judy" has:

Relationship to Grammars

In many definitions, vocabularies (using Vocabulary Nodes) are used in conjunction with Grammars (through various Pattern Nodes). For this reason, the categories of a vocabulary that is intended for use in morphological analysis generally correspond to the categories defined in a related grammar.

FAQ - Multiple vocabularies with the same word

If you have two vocabularies and the same word is in each vocabulary with different categories, are both categories assigned?

Yes, assuming the "stop if found" flag is not set on the first vocabulary.

What if the word appears in two vocabularies with the same category but different likelihoods?

That category will appear one time and the last likelihood encountered will be used (that is, the duplicate overwrites the original).

Note: This situation tends to cause confusion; it should be avoided, if possible.
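To avoid that situation, it can help to check two vocabularies for conflicting entries before using them together. The sketch below is one possible approach and is not a feature of the product: it assumes the vocabularies have been exported into Python dictionaries, and the function name, words, categories, and likelihood values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model of a vocabulary: word -> {category: likelihood}.
Vocabulary = Dict[str, Dict[str, float]]


def find_conflicts(
    first: Vocabulary, second: Vocabulary
) -> List[Tuple[str, str, float, float]]:
    """List (word, category, first_likelihood, second_likelihood) for every
    word that has the same category in both vocabularies with different
    likelihoods."""
    conflicts = []
    for word in sorted(first.keys() & second.keys()):
        shared = first[word].keys() & second[word].keys()
        for category in sorted(shared):
            a, b = first[word][category], second[word][category]
            if a != b:
                conflicts.append((word, category, a, b))
    return conflicts


# Illustrative data only.
first_vocab: Vocabulary = {"JORDAN": {"GIVEN_NAME": 0.6, "FAMILY_NAME": 0.3}}
second_vocab: Vocabulary = {"JORDAN": {"FAMILY_NAME": 0.7}, "LEE": {"FAMILY_NAME": 0.6}}

conflicts = find_conflicts(first_vocab, second_vocab)

# The overwrite described above: the later likelihood replaces the earlier one.
merged = {**first_vocab["JORDAN"], **second_vocab["JORDAN"]}
```

Here `conflicts` reports the one clashing assignment, and `merged` shows the result after both vocabularies have been applied: the category appears once, with the likelihood from the second vocabulary.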

Doc ID: DMCust_12321.html