SAS Quality Knowledge Base for Contact Information 26
Parse definitions specify data and logic that are used to parse a data string.
The output of a parse definition is a set of tokens. We define a token as a semantically atomic component of a data value. For example, the set of tokens defined for the Name parse definition might be:
Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
When a parse definition is applied to a data string, the string is analyzed and split into substrings that are assigned to the output tokens. As an example, consider the results of applying a Name parse definition to the following string:
Dr. James Goodnight, President and CEO
When the parse definition is applied, the string is split into tokens as follows:
Token Name | Token Value |
---|---|
Prefix | Dr. |
Given Name | James |
Middle Name | |
Family Name | Goodnight |
Suffix | |
Title/Additional Info | President and CEO |
Notice that not all of the output tokens are populated with values in this example. A token receives a value only when the input string contains a matching substring. Here, the Middle Name and Suffix tokens are left empty.
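The mapping from an input string to a fixed set of output tokens can be sketched in a few lines of Python. This is only a toy illustration and not the logic that the QKB parse definition uses. The function name `toy_parse_name`, the `PREFIXES` and `SUFFIXES` vocabularies, and the comma rule are all assumptions made for this example.

```python
# Toy illustration only: a real parse definition relies on far richer data and logic.
TOKEN_NAMES = [
    "Prefix",
    "Given Name",
    "Middle Name",
    "Family Name",
    "Suffix",
    "Title/Additional Info",
]

# Hypothetical vocabularies, assumed for this sketch.
PREFIXES = {"Dr.", "Mr.", "Mrs.", "Ms."}
SUFFIXES = {"Jr.", "Sr.", "II", "III"}


def toy_parse_name(value: str) -> dict:
    """Split a name string into the six tokens; unmatched tokens stay empty."""
    tokens = {name: "" for name in TOKEN_NAMES}

    # Assumption: anything after the first comma is title/additional info.
    name_part, _, extra = value.partition(",")
    tokens["Title/Additional Info"] = extra.strip()

    words = name_part.split()
    if words and words[0] in PREFIXES:
        tokens["Prefix"] = words.pop(0)
    if words and words[-1] in SUFFIXES:
        tokens["Suffix"] = words.pop()
    if words:
        tokens["Given Name"] = words.pop(0)
    if words:
        tokens["Family Name"] = words.pop()
    # Whatever remains between the given and family names is the middle name.
    tokens["Middle Name"] = " ".join(words)
    return tokens


result = toy_parse_name("Dr. James Goodnight, President and CEO")
for name in TOKEN_NAMES:
    print(f"{name} | {result[name]}")
```

Every token name is always present in the result, and tokens with no matching substring hold an empty string, which mirrors the empty Middle Name and Suffix rows in the table above.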
Parse definitions are useful when you want to break data strings into substrings to better organize your data source, or when you want to perform analytics on specific elements of strings in a table. For instance, to do a frequency count on all of the given names in a table, you could use a parse definition to parse your name data and then perform a frequency count on the field that contains the Given Name token.
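As a rough sketch of the frequency count described here, the following Python snippet counts the Given Name values in a handful of already-parsed records. The records in `parsed_rows` are made-up sample data, and the snippet assumes the parse step has already produced one dictionary of tokens per row. It does not call any SAS software.

```python
from collections import Counter

# Hypothetical parsed output: one dict of tokens per input row (sample data only).
parsed_rows = [
    {"Given Name": "James", "Family Name": "Goodnight"},
    {"Given Name": "Maria", "Family Name": "Lopez"},
    {"Given Name": "James", "Family Name": "Chen"},
    {"Given Name": "", "Family Name": "Okafor"},  # Given Name token not populated
]

# Count only rows where the Given Name token has a value.
given_name_counts = Counter(
    row["Given Name"] for row in parsed_rows if row["Given Name"]
)

for given_name, count in given_name_counts.most_common():
    print(f"{given_name}: {count}")
```

Rows with an empty Given Name token are skipped so that unpopulated tokens do not show up as a category of their own in the counts.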