DataFlux Data Management Studio 2.5: User Guide

N-Gram Scheme Node

The N-Gram Scheme Node holds training data known to be in the language of interest. The data is stored in the form of N-Grams, short segments of the input text produced by sliding a window of size N, moving one character at a time, over the text. N refers to the number of characters in the segment (for example, 2, 3, or 4).

For example:

The N-Grams of size 3 in the string "Hello Bob" are: "Hel", "ell", "llo", "lo ", "o B", " Bo", and "Bob". The space counts as a character.

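If bookends are used, there are two additional 3-Grams: one made up of the opening bookend followed by "He", and one made up of "ob" followed by the closing bookend.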

The bookends represent the beginnings and ends of lines.
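
The sliding-window idea can be sketched in a few lines of Python. This is only an illustration of the definition given here, not the product's implementation. The function name ngrams and the "^" and "$" bookend markers are assumptions made for this sketch; this topic does not say which symbols the product uses for bookends.

```python
def ngrams(text, n, bookends=False, start="^", end="$"):
    """Return the character N-Grams of text, in order of appearance.

    The "^" and "$" bookend markers are placeholders chosen for this
    sketch; they are not taken from the product documentation.
    """
    if bookends:
        # Mark the beginning and the end of the line.
        text = start + text + end
    # Slide a window of n characters over the text, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("Hello Bob", 3))
print(ngrams("Hello Bob", 3, bookends=True))
```

With bookends=True the call returns nine 3-Grams instead of seven, because one extra segment is produced at each end of the line.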

Used in:

Properties

Scheme

Select a Scheme to use. By default, only files from the current locale and its ancestors appear in the drop-down list. If the file that you want is not listed, click Tools > Options, click Display, and then select Show files for all locales under the Library file selection drop-down lists. QKB files from all locales are then displayed.

Click Edit to edit the selected file or to create a new file. The appropriate editor opens.

Output

Individual N-Gram Scheme Nodes have no output, because all the schemes are combined to produce the output.

Doc ID: dfU_Cstm_12336.html