
SAS Quality Knowledge Base for Contact Information 26

Parse Definitions

Parse definitions specify data and logic that are used to parse a data string.

The output of a parse definition is a set of tokens. We define a token as a semantically atomic component of a data value. For example, the set of tokens defined for the Name parse definition might be:

Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info

When a parse definition is applied to a data string, the string is analyzed and split into substrings that are assigned to the output tokens. As an example, consider the results of applying a Name parse definition to the following string:

Dr. James Goodnight, President and CEO

When the parse definition is applied, the string is split into tokens as follows:

Token Name              Token Value
Prefix                  Dr.
Given Name              James
Middle Name             (empty)
Family Name             Goodnight
Suffix                  (empty)
Title/Additional Info   President and CEO

Notice that not all of the output tokens are populated with values in this example. A token receives a value only when the input string contains a substring that belongs to it; here, Middle Name and Suffix are left empty.
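To make the idea of token assignment concrete, here is a minimal Python sketch that splits the example string into the six Name tokens. It is a toy illustration only: the prefix and suffix lists, the comma rule, and the function name parse_name are assumptions made for this example, and they do not represent the data or logic that an actual parse definition uses.

```python
# Toy illustration only: a real parse definition relies on the data and logic
# stored in the Quality Knowledge Base, not on these hard-coded rules.
TOKENS = ["Prefix", "Given Name", "Middle Name", "Family Name", "Suffix",
          "Title/Additional Info"]
PREFIXES = {"Dr.", "Mr.", "Mrs.", "Ms."}   # hypothetical lookup list
SUFFIXES = {"Jr.", "Sr.", "II", "III"}     # hypothetical lookup list


def parse_name(text):
    """Split a name string into the six Name tokens using simplified rules."""
    result = {token: "" for token in TOKENS}
    # Assumption: everything after the first comma is title/additional info.
    name_part, _, extra = text.partition(",")
    result["Title/Additional Info"] = extra.strip()
    words = name_part.split()
    if words and words[0] in PREFIXES:
        result["Prefix"] = words.pop(0)
    if words and words[-1] in SUFFIXES:
        result["Suffix"] = words.pop()
    if words:
        result["Family Name"] = words.pop()   # last remaining word
    if words:
        result["Given Name"] = words.pop(0)   # first remaining word
    result["Middle Name"] = " ".join(words)   # whatever is left, possibly empty
    return result


parsed = parse_name("Dr. James Goodnight, President and CEO")
for token in TOKENS:
    print(f"{token}: {parsed[token]}")
```

Every token is always present in the result, and tokens with no matching substring keep an empty string, which mirrors the unpopulated Middle Name and Suffix tokens in the example above.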

Parse definitions are useful when you want to break data strings into substrings to better organize your data source, or when you want to perform analytics on specific elements of strings in a table. For instance, you might want to do a frequency count on all of the given names in a table. To do this, you could use a parse definition to parse your name data and then perform a frequency count on the field that contains the Given Name token.
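The frequency count described here can be sketched in Python as follows. The rows are hypothetical sample data, written as if a Name parse definition had already been applied and each token stored in its own field; the function name given_name_frequencies is likewise an assumption for illustration.

```python
from collections import Counter

# Hypothetical rows: the output of a Name parse, one field per token.
parsed_rows = [
    {"Given Name": "James", "Family Name": "Goodnight"},
    {"Given Name": "Mary", "Family Name": "Smith"},
    {"Given Name": "James", "Family Name": "Jones"},
    {"Given Name": "", "Family Name": "Lee"},  # Given Name token not populated
]


def given_name_frequencies(rows):
    """Count how often each non-empty Given Name value occurs."""
    return Counter(row["Given Name"] for row in rows if row.get("Given Name"))


freq = given_name_frequencies(parsed_rows)
for name, count in freq.most_common():
    print(name, count)
```

Rows whose Given Name token is empty are skipped, so unpopulated tokens do not appear in the count.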