DataFlux Data Management Studio 2.5: User Guide

Document Extraction Node

You can add a Document Extraction node to identify textual entities and their usage within a given text input. The node identifies the terms (words) found in the input text and categorizes how each term is used, for example as a vehicle, person, title, or company. For an example of how this node can be used, see Converting and Extracting a Document.

Once you have added the node, you can double-click it to open its properties dialog. The properties dialog includes the following elements:

Name - Specifies a name for the node.

Notes - Enables you to open the Notes dialog. You use the dialog to enter optional details or any other relevant information for the input.

Source Field - Specifies the name of the input field from the parent node.

Language - Specifies the language of the input text. The default is EN.

Output null rows - Specifies whether an output row should be generated when the input field value contains no terms.

Number of rows to read - Specifies the maximum number of rows to read.

Exclude source field from output - Specifies whether to exclude the source field contents from the node's output.
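
The following Python sketch illustrates how the Source Field, Output null rows, Number of rows to read, and Exclude source field from output properties might interact. It is a conceptual model only, not the product's implementation: the function name extract_rows, the TOY_LEXICON lookup table, and the output field names term and category are all hypothetical, and the sketch assumes that the row limit applies to input rows and that one output row is produced per identified term.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical toy lexicon. The real node relies on its own
# language-specific definitions, which are not modeled here.
TOY_LEXICON = {
    "toyota": "vehicle",
    "smith": "person",
    "dr": "title",
    "acme": "company",
}


def extract_rows(
    rows: Iterable[Dict[str, object]],
    source_field: str,
    output_null_rows: bool = False,
    max_rows: Optional[int] = None,
    exclude_source_field: bool = False,
) -> Iterator[Dict[str, object]]:
    """Yield one output row per recognized term (assumed behavior)."""
    for index, row in enumerate(rows):
        # "Number of rows to read": stop after max_rows input rows.
        if max_rows is not None and index >= max_rows:
            break

        text = str(row.get(source_field) or "")

        # "Exclude source field from output": drop the source column if set.
        base = {
            key: value
            for key, value in row.items()
            if not (exclude_source_field and key == source_field)
        }

        # Look up each word in the toy lexicon, ignoring case.
        matches = [
            (word, TOY_LEXICON[word.lower()])
            for word in re.findall(r"[A-Za-z]+", text)
            if word.lower() in TOY_LEXICON
        ]

        if not matches:
            # "Output null rows": emit a placeholder row when no terms are found.
            if output_null_rows:
                yield {**base, "term": None, "category": None}
            continue

        for term, category in matches:
            yield {**base, "term": term, "category": category}


sample = [
    {"id": 1, "text": "Dr Smith drives a Toyota"},
    {"id": 2, "text": "nothing to find here"},
]
for out in extract_rows(sample, "text", output_null_rows=True):
    print(out)
```

In this sketch, the second sample row appears in the output only because output_null_rows is set; with the default setting it would be dropped.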

You can access the following advanced properties by right-clicking the Document Extraction node:

Doc ID: dfDMStd_CF_ContextualExtract.html