DataFlux Data Management Studio 2.5: User Guide
An Extraction Definition extracts parts of the input string and assigns them to corresponding tokens of the associated data type.
Input: a string
Example:
"100 Slightly used green Acme XJF-100 raygun $100 c/w lots of shiny buttons"
Output: mapping between tokens and substrings
Example:
Quantity => 100
Brand => "Acme"
Model => "XJF-100"
Color => "green"
Price => "$100"
Description => "Slightly used raygun c/w lots of shiny buttons"
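
The mapping above can be illustrated with a minimal Python sketch. This is not how Data Management Studio implements an Extraction Definition; the lookup sets (`BRANDS`, `COLORS`), the two regular expressions, and the function names `classify` and `extract` are assumptions made up for this illustration. It only shows the general idea of assigning substrings to tokens and collecting the remainder as the description.

```python
import re

# Hypothetical lookup tables standing in for real vocabularies or schemes.
BRANDS = {"acme"}
COLORS = {"green", "red", "blue"}

# Hypothetical patterns for prices such as "$100" and models such as "XJF-100".
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")
MODEL_RE = re.compile(r"[A-Z]+-\d+")


def classify(word):
    """Return the token name a word belongs to, or None for leftover text."""
    if PRICE_RE.fullmatch(word):
        return "Price"
    if word.isdigit():
        return "Quantity"
    if word.lower() in BRANDS:
        return "Brand"
    if word.lower() in COLORS:
        return "Color"
    if MODEL_RE.fullmatch(word):
        return "Model"
    return None


def extract(text):
    """Map each token name to the substring extracted from text."""
    result = {}
    leftover = []
    for word in text.split():
        token = classify(word)
        if token is not None and token not in result:
            result[token] = word  # the first match for a token wins
        else:
            leftover.append(word)  # unmatched words form the description
    result["Description"] = " ".join(leftover)
    return result


mapping = extract(
    "100 Slightly used green Acme XJF-100 raygun $100 c/w lots of shiny buttons"
)
```

In this sketch every extracted value stays a substring of the input, so `Quantity` is the string `"100"` rather than a number.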
| Hierarchy | Node/Group | Container Group | Count |
|---|---|---|---|
| 1 | Extraction Definition Head Node | | 1 |
| 2 | Preprocessing Regex Library Group | | 1 |
| 2.1 | Preprocessing Regex Library Node | Preprocessing Regex Library Group | 0 or more |
| 3 | Chopping Group | | 1 |
| 3.1 | Chop Table Node | Chopping Group | 1 |
| 4 | Table-Based Extraction Group | | 1 |
| 4.1 | Extraction Scheme Node | Table-Based Extraction Group | 0 or more |
| 5 | Pattern-Based Extraction Group | | 1 |
| 5.1 | Morph Analysis Group | Pattern-Based Extraction Group | 1 (*) |
| 5.1.1 | Lookup Normalization Group | Morph Analysis Group | 1 (*) |
| 5.1.1.1 | Uppercasing Node | Lookup Normalization Group | 1 (*) |
| 5.1.1.2 | Normalization Regex Libraries Group | Lookup Normalization Group | 1 (*) |
| 5.1.1.2.1 | Normalization Regex Library Node | Normalization Regex Libraries Group | 0 or more (*) |
| 5.1.2 | Vocabularies Group | Morph Analysis Group | 1 (*) |
| 5.1.2.1 | Vocabulary Node | Vocabularies Group | 1 or more (*) |
| 5.1.3 | Categorization Regex Libraries Group | Morph Analysis Group | 1 (*) |
| 5.1.3.1 | Categorization Regex Library Node | Categorization Regex Libraries Group | 0 or more (*) |
| 5.1.4 | Number Check Node | Morph Analysis Group | 1 (*) |
| 5.1.5 | Default Categories Node | Morph Analysis Group | 1 (*) |
| 5.2 | Pattern Recognition Group | Pattern-Based Extraction Group | 1 (*) |
| 5.2.1 | Pattern Logic Node | Pattern Recognition Group | 1 or more |
| 6 | Token Mappings Node | | 1 |
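
As a rough illustration of how the hierarchy numbers in the table relate to nesting, the sketch below models the nodes and groups as a nested Python list and derives each number from a node's position under its parent. The node names are copied from the table; the `TREE` structure and the `walk` function are assumptions for this example and are not part of the product.

```python
# (name, children) pairs; the names are copied from the table above.
TREE = [
    ("Extraction Definition Head Node", []),
    ("Preprocessing Regex Library Group", [
        ("Preprocessing Regex Library Node", []),
    ]),
    ("Chopping Group", [("Chop Table Node", [])]),
    ("Table-Based Extraction Group", [("Extraction Scheme Node", [])]),
    ("Pattern-Based Extraction Group", [
        ("Morph Analysis Group", [
            ("Lookup Normalization Group", [
                ("Uppercasing Node", []),
                ("Normalization Regex Libraries Group", [
                    ("Normalization Regex Library Node", []),
                ]),
            ]),
            ("Vocabularies Group", [("Vocabulary Node", [])]),
            ("Categorization Regex Libraries Group", [
                ("Categorization Regex Library Node", []),
            ]),
            ("Number Check Node", []),
            ("Default Categories Node", []),
        ]),
        ("Pattern Recognition Group", [("Pattern Logic Node", [])]),
    ]),
    ("Token Mappings Node", []),
]


def walk(nodes, prefix="", container=""):
    """Yield (hierarchy number, name, container group) in table order."""
    for index, (name, children) in enumerate(nodes, start=1):
        number = f"{prefix}{index}"
        yield number, name, container
        # Children inherit this node's number as prefix and its name as container.
        yield from walk(children, prefix=number + ".", container=name)


rows = list(walk(TREE))
for number, name, container in rows:
    print(f"{number:<10} {name} ({container or 'top level'})")
```

The sketch records only the nesting. It does not capture the Count column, such as "0 or more" or the entries marked with (*).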