
DataFlux Data Management Studio 2.5: User Guide

Customize - Extraction Definitions

An Extraction Definition extracts parts of the input string and assigns them to corresponding tokens of the associated data type.

Input: a string

Example:

"100 Slightly used green Acme XJF-100 raygun $100 c/w lots of shiny buttons"

Output: a mapping between tokens and substrings

Example:

Quantity => "100"

Brand => "Acme"

Model => "XJF-100"

Color => "green"

Price => "$100"

Description => "Slightly used raygun c/w lots of shiny buttons"
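
The input/output relationship above can be illustrated with a toy sketch in Python. This is not how Data Management Studio implements an Extraction Definition; the `BRANDS` and `COLORS` sets, the regular expressions, and the `extract` function are all hypothetical stand-ins for the vocabularies, schemes, and pattern logic that a real definition would use. It only shows the idea of assigning each part of the input string to one token.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only).
BRANDS = {"acme"}
COLORS = {"green", "red", "blue"}

TOKEN_NAMES = ["Quantity", "Brand", "Model", "Color", "Price", "Description"]


def extract(text):
    """Toy extraction: assign each whitespace-delimited word to one token."""
    tokens = {name: [] for name in TOKEN_NAMES}
    for position, word in enumerate(text.split()):
        lower = word.lower()
        if position == 0 and word.isdigit():
            tokens["Quantity"].append(word)      # leading number is the quantity
        elif lower in BRANDS:
            tokens["Brand"].append(word)         # table lookup
        elif lower in COLORS:
            tokens["Color"].append(word)         # table lookup
        elif re.fullmatch(r"\$\d+(\.\d{2})?", word):
            tokens["Price"].append(word)         # pattern: currency amount
        elif re.fullmatch(r"[A-Z]+-\d+", word):
            tokens["Model"].append(word)         # pattern: letters, hyphen, digits
        else:
            tokens["Description"].append(word)   # everything left over
    return {name: " ".join(words) for name, words in tokens.items()}


result = extract(
    "100 Slightly used green Acme XJF-100 raygun $100 c/w lots of shiny buttons"
)
for name, value in result.items():
    print(f'{name} => "{value}"')
```

Note that the Description token collects the words not claimed by any other token, which is why its value is not a contiguous substring of the input.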

Nodes

Hierarchy  Node/Group                            Container Group                       Count
1          Extraction Definition Head Node                                             1
2          Preprocessing Regex Library Group                                           1
2.1        Preprocessing Regex Library Node      Preprocessing Group                   0 or more
3          Chopping Group                                                              1
3.1        Chop Table Node                       Chopping Group                        1
4          Table-Based Extraction Group                                                1
4.1        Extraction Scheme Node                Table-Based Extraction Group          0 or more
5          Pattern-Based Extraction Group                                              1
5.1        Morph Analysis Group                  Pattern-Based Extraction Group        1 (*)
5.1.1      Lookup Normalization Group            Morph Analysis Group                  1 (*)
5.1.1.1    Uppercasing Node                      Lookup Normalization Group            1 (*)
5.1.1.2    Normalization Regex Libraries Group   Lookup Normalization Group            1 (*)
5.1.1.2.1  Normalization Regex Library Node      Normalization Regex Libraries Group   0 or more (*)
5.1.2      Vocabularies Group                    Morph Analysis Group                  1 (*)
5.1.2.1    Vocabulary Node                       Vocabularies Group                    1 or more (*)
5.1.3      Categorization Regex Libraries Group  Morph Analysis Group                  1 (*)
5.1.3.1    Categorization Regex Library Node     Categorization Regex Libraries Group  0 or more (*)
5.1.4      Number Check Node                     Morph Analysis Group                  1 (*)
5.1.5      Default Categories Node               Morph Analysis Group                  1 (*)
5.2        Pattern Recognition Group             Pattern-Based Extraction Group        1 (*)
5.2.1      Pattern Logic Node                    Pattern Recognition Group             1 or more
6          Token Mappings Node                                                         1
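
The Hierarchy numbers in the table encode the containment shown in the Container Group column: dropping the last component of a number gives the number of its container. A minimal Python sketch of that relationship follows. The `NODES` dictionary copies a few rows from the table; the `container_of` and `path_to` helpers are hypothetical names introduced only for this illustration and are not part of the product.

```python
# A few rows copied from the Nodes table (Hierarchy -> Node/Group).
NODES = {
    "5": "Pattern-Based Extraction Group",
    "5.1": "Morph Analysis Group",
    "5.1.1": "Lookup Normalization Group",
    "5.1.1.2": "Normalization Regex Libraries Group",
    "5.1.1.2.1": "Normalization Regex Library Node",
}


def container_of(hierarchy):
    """Return the name of the containing group, or None for a top-level entry."""
    parent, _, _ = hierarchy.rpartition(".")  # "5.1.1" -> "5.1"; "5" -> ""
    return NODES.get(parent)


def path_to(hierarchy):
    """List the names from the top-level group down to the given entry."""
    parts = hierarchy.split(".")
    prefixes = [".".join(parts[: i + 1]) for i in range(len(parts))]
    return [NODES[prefix] for prefix in prefixes]


print(container_of("5.1.1.2.1"))
print(" > ".join(path_to("5.1.1.2")))
```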
