Parse Definitions

SAS Quality Knowledge Base for Contact Information 26

Parse definitions specify data and logic that are used to parse a data string.

The output of a parse definition is a set of tokens. We define a token as a semantically atomic component of a data value. For example, the set of tokens defined for the Name parse definition might be:

Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info

When a parse definition is applied to a data string, the string is analyzed and split into substrings that are assigned to the output tokens. As an example, consider the results of applying a Name parse definition to the following string:

Dr. James Goodnight, President and CEO

When the parse definition is applied, the string is split into tokens as follows:

Token Name	Token Value
Prefix	Dr.
Given Name	James
Middle Name
Family Name	Goodnight
Suffix
Title/Additional Info	President and CEO

Notice that not all of the output tokens are populated in this example. A token receives a value only when the input string contains a substring that corresponds to it. Here, Middle Name and Suffix are left empty.
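To make the shape of this output concrete, the following is a minimal Python sketch of a toy name parser. It is not how the Name parse definition works internally: the function `parse_name`, the `PREFIXES` and `SUFFIXES` word lists, and the splitting rules are all assumptions made for illustration. An actual parse definition relies on its own data and logic. The sketch only shows that every token is always present in the result and that unused tokens stay empty.

```python
# Toy illustration only: the names and rules below are hypothetical and
# are NOT part of the SAS Quality Knowledge Base.
TOKENS = [
    "Prefix",
    "Given Name",
    "Middle Name",
    "Family Name",
    "Suffix",
    "Title/Additional Info",
]

# Hypothetical word lists used to recognize prefixes and suffixes.
PREFIXES = {"Dr.", "Mr.", "Mrs.", "Ms."}
SUFFIXES = {"Jr.", "Sr.", "II", "III"}


def parse_name(value):
    """Split a name string into a dict with one entry per output token."""
    # Start with every token present but empty.
    result = {token: "" for token in TOKENS}

    # Assumption: text after the first comma is title/additional info.
    name_part, _, extra = value.partition(",")
    result["Title/Additional Info"] = extra.strip()

    words = name_part.split()
    if words and words[0] in PREFIXES:
        result["Prefix"] = words.pop(0)
    if words and words[-1] in SUFFIXES:
        result["Suffix"] = words.pop()
    if words:
        result["Given Name"] = words.pop(0)
    if words:
        result["Family Name"] = words.pop()
    # Whatever remains between given and family name is the middle name.
    result["Middle Name"] = " ".join(words)
    return result


parsed = parse_name("Dr. James Goodnight, President and CEO")
for token in TOKENS:
    print(f"{token}\t{parsed[token]}")
```

Because of the comma rule, this sketch would misplace a suffix written after a comma (for example, "Jr."). Cases like that are why a real parse definition needs more than simple splitting.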

Parse definitions are useful when you want to break data strings into substrings to better organize your data source, or when you want to analyze specific elements of strings in a table. For instance, to produce a frequency count of all of the given names in a table, you could use a parse definition to parse your name data and then count the values in the field that contains the Given Name token.
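The frequency count described above can be sketched in a few lines of Python. This example assumes the name data has already been parsed into one dictionary per row, keyed by token name. The rows in `parsed_rows` are made-up sample data, and `given_name_frequencies` is a hypothetical helper, not a product function.

```python
from collections import Counter

# Hypothetical parsed output: one dict per input row, keyed by token name.
parsed_rows = [
    {"Given Name": "James", "Family Name": "Goodnight"},
    {"Given Name": "Mary", "Family Name": "Smith"},
    {"Given Name": "James", "Family Name": "Jones"},
    {"Given Name": "", "Family Name": "Lee"},  # Given Name not populated
]


def given_name_frequencies(rows):
    """Count how often each non-empty Given Name value occurs."""
    return Counter(
        row["Given Name"] for row in rows if row.get("Given Name")
    )


freq = given_name_frequencies(parsed_rows)
for name, count in freq.most_common():
    print(name, count)
```

Rows with an empty Given Name token are skipped, so the counts reflect only the names that were actually parsed.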