SAS Quality Knowledge Base for Contact Information 25

English, United States Definitions

Definitions for the English, United States locale are described below.

Case Definitions
Extraction Definitions
Gender Analysis Definitions

Identification Analysis Definitions

Match Definitions

Parse Definitions

Pattern Analysis Definitions

Standardization Definitions

Inherited Definitions

Case Definitions

Proper (Address)
Description The Proper (Address) case definition propercases addresses.
Examples
Input         Output
11TH FLOOR    11th Floor
po box 125    PO Box 125
Remarks  

 

Proper (City - State/Province - Postal Code)
Description

The Proper (City - State/Province - Postal Code) case definition propercases last line address information.

Example
Input             Output
cary, nc 27513    Cary, NC 27513
Remarks  

Extraction Definitions

Contact Info
Description

The Contact Info extraction definition extracts the Name, Organization, Address, E-mail, and Phone from a string.

Possible Outputs
Name
Organization
Address
E-mail
Phone
Additional Info
Example
Input: Mr John Smith, 100 SAS Campus Dr, PO Box 12345, Cary, NC 27513, 919-531-0000
Output:
NAME               Mr John Smith
ORGANIZATION
ADDRESS            100 SAS Campus Dr, PO Box 12345, Cary, NC 27513
E-MAIL
PHONE              919-531-0000
ADDITIONAL INFO    ,; ,
Remarks  

Gender Analysis Definitions

None.

Identification Analysis Definitions

Contact Info
Description

The Contact Info identification analysis identifies the contact information that is represented by a string.

Possible Outputs
NAME
ORGANIZATION
E-MAIL
PHONE
ADDRESS
BLANK
MIXED
UNKNOWN
Examples
Input                                                                Output
SAS Institute                                                        ORGANIZATION
John Smith                                                           NAME
Joe Smith DBA Some Company Name                                      MIXED
john.smith@sas.com                                                   E-MAIL
919-531-0000                                                         PHONE
100 SAS Campus Dr, Cary, NC 27513                                    ADDRESS
(blank string)                                                       BLANK
John Smith, 100 SAS Campus Dr, Cary, NC 27513, john.smith@sas.com    MIXED
Fisher                                                               UNKNOWN
Remarks  

 

Phone (Validation)
Description

The Phone (Validation) identification analysis determines whether a string represents a valid phone number.

Possible Outputs
VALID
INVALID
Examples
Input           Output
919-447-3000    VALID
888 8888        VALID
888 888 8888    INVALID
2468            INVALID
Remarks  

Match Definitions

Address
Description The Address match definition generates match codes which can be used to cluster records containing addresses.
Max Length of Match Code 44 characters
Examples
Input                              Cluster ID
52 Commerce Street                 0
52 Commerce St                     0
52 Comerce St                      0
52 Commerce Street, PO Box 1234    1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

 

Address (Long)
Description

The Address (Long) match definition generates match codes which can be used to cluster records containing addresses.

Max Length of Match Code 20 characters
Examples
Input                   Cluster ID
420 Park Royale Rd      0
420 Park Royale Road    0
Remarks

Note: The results listed above reflect the default match sensitivity (85).

The Address (Long) match definition is deprecated and no longer supported. It will be removed in a future release of the QKB. Change your jobs to use the Address match definition instead.

 

City
Description

The City match definition generates match codes which can be used to cluster records containing city names.

Max Length of Match Code 15 characters
Examples
Input     Cluster ID
Cary      0
Carie     0
Durham    1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

City - State/Province - Postal Code
Description

The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information.

Max Length of Match Code 15 characters
Examples
Input                           Cluster ID
Cary, NC 27653                  0
Durham, NC 27713                1
Durham, North Carolina 27713    1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Name (with Suggestions)
Description

The Name (with Suggestions) match definition generates match codes which can be used to cluster records containing names of individuals.

Max Length of Match Code 21 characters
Examples
Input              Cluster ID
PRAIS HILTON       1
PARIS HILTON       1
HENRY NICKELSON    2
HENRY NICKERSON    2
NIKI WONG          3
ANIKI WONG         3
NIKI WONG          4
NICLOE WONG        4
Remarks

This match definition generates one or more match codes for each input string. Each match code represents a suggestion for what might be the true value of the input string; this enables two strings to be matched even when one or both strings contain a spelling mistake. For example, the name PRAIS might match the name PARIS, or the name NICLOE might match the name NIKI.

Note that a consequence of the generation of multiple match codes is that a record might be placed in more than one cluster by a subsequent clustering operation. Therefore, special attention should be given to the entity resolution process when using this definition.

For more information on suggestion-based matching, refer to the Suggestion-Based Matching section of the DataFlux Data Management Studio Online Help.

Note: The results listed above reflect the default match sensitivity (85).
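
The remark about a record landing in more than one cluster can be illustrated with a small Python sketch. This is not QKB code: the function name cluster_by_match_codes and the match code strings (MC_A, MC_B) are invented for illustration, and real match codes look different. The sketch only assumes that each record carries a list of match codes and that two records cluster together when they share at least one code.

```python
from collections import defaultdict


def cluster_by_match_codes(records):
    """Group record IDs under every match code they carry.

    records maps a record ID to its list of match codes.
    Returns a dict of match code -> sorted record IDs, keeping only
    codes shared by two or more records.
    """
    by_code = defaultdict(set)
    for rec_id, codes in records.items():
        for code in codes:
            by_code[code].add(rec_id)
    return {code: sorted(ids) for code, ids in by_code.items() if len(ids) > 1}


# Hypothetical match codes, invented for this example only.
records = {
    "NIKI WONG": ["MC_A", "MC_B"],   # two suggestions, so two match codes
    "ANIKI WONG": ["MC_A"],
    "NICLOE WONG": ["MC_B"],
}

clusters = cluster_by_match_codes(records)
for code, members in clusters.items():
    print(code, members)
```

Here NIKI WONG appears in both resulting clusters, which mirrors the example table above and is the situation that calls for extra care during entity resolution.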

 

Phone
Description

The Phone match definition generates match codes which can be used to cluster records containing phone numbers.

Max Length of Match Code 22 characters
Examples
Input                Cluster ID
1-800-DATAFLUX       0
1 (800) 328-2358     0
345-6789             1
345-6780             1
345-6700             2
447-3000 ext 1234    3
447-3000 ext 1266    3
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Postal Code
Description

The Postal Code match definition generates match codes which can be used to cluster records containing postal codes.

Max Length of Match Code 15 characters
Examples
Input         Cluster ID
27653         0
27653-0001    0
27713         1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

State/Province
Description

The State/Province match definition generates match codes which can be used to cluster records containing states and provinces.

Max Length of Match Code 15 characters
Examples
Input             Cluster ID
Florida           0
FL                0
North Carolina    1
N. Carolina       1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Text
Description

The Text match definition generates match codes which can be used to cluster records containing general text strings.

Max Length of Match Code 15 characters
Examples
Input        Cluster ID
they went    0
you are      1
you're       1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Text (Long)
Description

The Text (Long) match definition generates match codes which can be used to cluster records containing longer general text strings.

Max Length of Match Code 30 characters
Examples
Input        Cluster ID
they went    0
you are      1
you're       1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

Parse Definitions

Address
Description The Address parse definition parses addresses into a set of tokens.
Output Tokens
Recipient
Building/Site
Street
Extension
PO Box
Additional Info
Example
Input: Mr John Smith, Building T, Unit 102, 100 SAS Campus Dr, PO Box 12345
Output:
Recipient          Mr John Smith
Building/Site      Building T
Street             100 SAS Campus Dr
Extension          Unit 102
PO Box             PO Box 12345
Additional Info
Remarks  

 

Address (Full)
Description The Address (Full) parse definition parses complete two-line addresses into a set of tokens.
Output Tokens
Recipient
Building/Site
Street
Extension
PO Box
City
State/Province
Postal Code
Country
Additional Info
Example
Input: Mr John Smith, Building T, Unit 102, 100 SAS Campus Dr, PO Box 12345, Cary, NC 27513 USA
Output:
Recipient          Mr John Smith
Building/Site      Building T
Street             100 SAS Campus Dr
Extension          Unit 102
PO Box             PO Box 12345
City               Cary
State/Province     NC
Postal Code        27513
Country            USA
Additional Info
Remarks  

 

Address (Global)
Description

The Address (Global) parse definition parses addresses into a globally recognized set of tokens.

Output Tokens
Recipient
Building/Site
Street
Extension
PO Box
Additional Info
Example 1
Input: Mr John Smith, Building T, Unit 102, 100 SAS Campus Dr, PO Box 12345
Output:
Recipient          Mr John Smith
Building/Site      Building T
Street             100 SAS Campus Dr
Extension          Unit 102
PO Box             PO Box 12345
Additional Info
Example 2
Input: 420 Park Ridge Rd
Output:
Recipient
Building/Site
Street             420 Park Ridge Rd
Extension
PO Box
Additional Info
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB.

The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), change them back to use Address (Global).

 

Address (Global) (v23)
Description

The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens.

Output Tokens
Recipient
Building/Site
Street
Extension
PO Box
Additional Info
Example 1
Input: Mr John Smith, Building T, Unit 102, 100 SAS Campus Dr, PO Box 12345
Output:
Recipient          Mr John Smith
Building/Site      Building T
Street             100 SAS Campus Dr
Extension          Unit 102
PO Box             PO Box 12345
Additional Info
Example 2
Input: 420 Park Ridge Rd
Output:
Recipient
Building/Site
Street             420 Park Ridge Rd
Extension
PO Box
Additional Info
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB.

The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), change them back to use Address (Global).

 

City - State/Province - Postal Code
Description

The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens.

Output Tokens
City
State
ZIP
Example
Input: Cary, NC 27653
Output:
City     Cary
State    NC
ZIP      27653
Remarks  

 

City - State/Province - Postal Code (Global)
Description

The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens.

Output Tokens
City
State/Province
Postal Code
Additional Info
Example
Input: Cary, NC 27653
Output:
City               Cary
State/Province     NC
Postal Code        27653
Additional Info
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

 

Name (Address Update)
Description

The Name (Address Update) parse definition parses names of individuals into a set of tokens.

Output Tokens
Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
Example 1
Input: ANDY JEFFERSON
Output:
Prefix
Given Name               ANDY
Middle Name
Family Name              JEFFERSON
Suffix
Title/Additional Info
Example 2
Input: MR CHIP BALTZER
Output:
Prefix                   MR
Given Name               CHIP
Middle Name
Family Name              BALTZER
Suffix
Title/Additional Info
Example 3
Input: ANTONIO FULLEN I
Output:
Prefix
Given Name               ANTONIO
Middle Name
Family Name              FULLEN I
Suffix
Title/Additional Info
Remarks This definition is intended for use only with the Address Update Lookup feature in the DataFlux Data Management Platform.

 

Name/Organization
Description The Name/Organization parse definition parses strings that contain the names of individuals and organizations into a set of tokens.
Output Tokens
Name
Organization
Example
Input: Bob Brauer, DataFlux Corporation
Output:
Name            Bob Brauer
Organization    DataFlux Corporation
Remarks  

 

Phone
Description

The Phone parse definition parses phone numbers into a set of tokens.

Output Tokens
Country Code
Area Code
Base Number
Extension
Line Type
Additional Info
Example 1
Input: Work: +1 (919) 447-3000 Ext 3118 (ask for John)
Output:
Country Code       +1
Area Code          919
Base Number        447-3000
Extension          3118
Line Type          Work:
Additional Info    (ask for John)
Example 2
Input: 011 44 2012345000
Output:
Country Code       011 44
Area Code
Base Number        2012345000
Extension
Line Type
Additional Info
Example 3
Input: Toll Free: (800) 888-4257
Output:
Country Code
Area Code          800
Base Number        888-4257
Extension
Line Type          Toll Free:
Additional Info
Example 4
Input: 310 424-1442 3118 (w)
Output:
Country Code
Area Code          310
Base Number        424-1442
Extension          3118
Line Type          (w)
Additional Info
Remarks  

 

Phone (Global)
Description

The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens.

Output Tokens
Country Code
Area Code
Base Number
Extension
Line Type
Additional Info
Example 1
Input: Work: +1 (919) 447-3000 Ext 3118 (ask for John)
Output:
Country Code       +1
Area Code          919
Base Number        447-3000
Extension          3118
Line Type          Work:
Additional Info    (ask for John)
Example 2
Input: 011 44 2012345000
Output:
Country Code       011 44
Area Code
Base Number        2012345000
Extension
Line Type
Additional Info
Example 3
Input: Toll Free: (800) 888-4257
Output:
Country Code
Area Code          800
Base Number        888-4257
Extension
Line Type          Toll Free:
Additional Info
Example 4
Input: 310 424-1442 3118 (w)
Output:
Country Code
Area Code          310
Base Number        424-1442
Extension          3118
Line Type          (w)
Additional Info
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

 

Postal Code
Description The Postal Code parse definition parses ZIP codes into a set of tokens.
Output Tokens
ZIP
ZIP Add-on
Example
Input: 27653-0001
Output:
ZIP           27653
ZIP Add-on    0001
Remarks  
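
As a rough illustration of the ZIP and ZIP Add-on split shown in the example, the following Python sketch separates a ZIP+4 value with a regular expression. It is an assumption-laden stand-in, not the QKB parse logic: the names ZIP_PATTERN and parse_zip are hypothetical, and the pattern accepts only five digits optionally followed by a four-digit add-on.

```python
import re

# Five digits, then an optional four-digit add-on that may be separated
# by a hyphen and/or spaces. Hypothetical pattern, not taken from the QKB.
ZIP_PATTERN = re.compile(r"^\s*(\d{5})(?:\s*-?\s*(\d{4}))?\s*$")


def parse_zip(value):
    """Return a dict with 'ZIP' and 'ZIP Add-on', or None if no match."""
    match = ZIP_PATTERN.match(value)
    if match is None:
        return None
    # An absent add-on is reported as an empty string, like an empty token.
    return {"ZIP": match.group(1), "ZIP Add-on": match.group(2) or ""}


print(parse_zip("27653-0001"))
print(parse_zip("27653"))
```

Values that do not fit the five-digit shape return None in this sketch; how the actual definition treats such input is not stated on this page.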

Pattern Analysis Definitions

None.

Standardization Definitions

Address
Description The Address standardization definition standardizes addresses.
Examples
Input                 Output
52 Commerce Street    52 Commerce St
5TH avenue            5th Ave
Remarks  

 

City
Description

The City standardization definition standardizes city names.

Examples
Input    Output
cary     Cary
NY       New York
Remarks Common city abbreviations are expanded into full names.

 

City - State/Province - Postal Code
Description The City - State/Province - Postal Code standardization definition standardizes last line address information.
Example
Input            Output
cary nc 27513    Cary, NC 27513
Remarks  

 

Phone
Description

The Phone standardization definition standardizes phone numbers for domestic use.

Examples
Input                     Output
+1 602 961-1317           (602) 961 1317
0044 (0)20 12345000       +44 2012345000
4159745060                (415) 974 5060
(212) 987-7654 EXT 456    (212) 987 7654 x456
Remarks  

 

Phone (Electronic)
Description

The Phone (Electronic) standardization definition standardizes phone numbers for automated calling systems.

Examples
Input                                  Output
Work: (301) 889-9200 (ask for Mary)    +13018899200
Fax: 602 9611317                       +16029611317
0044 (0)20 12345000                    +442012345000
612.736.4436                           +16127364436
Remarks  
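
To make the target format concrete, here is a toy Python normalizer that reproduces the four example outputs above. It is only a sketch under stated assumptions and does not reflect how the QKB definition works internally: the function name to_electronic and its default_country_code parameter are hypothetical, "00" and "011" are treated as international dialing prefixes, and a "(0)" trunk prefix is dropped.

```python
import re


def to_electronic(phone, default_country_code="1"):
    """Toy normalizer: return '+<digits>' or None for unrecognized shapes."""
    # Drop a trunk prefix written as "(0)", as in "0044 (0)20 12345000".
    cleaned = phone.replace("(0)", "")
    # Keep digits only; labels such as "Work:" and digit-free comments vanish.
    digits = re.sub(r"\D", "", cleaned)
    if digits.startswith("00"):
        # Assumed international prefix "00".
        return "+" + digits[2:]
    if digits.startswith("011"):
        # Assumed international prefix "011".
        return "+" + digits[3:]
    if len(digits) == 10:
        # Assumed domestic number: prepend the default country code.
        return "+" + default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    return None


for sample in ("Work: (301) 889-9200 (ask for Mary)", "612.736.4436"):
    print(sample, "->", to_electronic(sample))
```

Because the sketch keeps every digit, a comment that itself contains digits (for example "after 4pm") would defeat it; handling such cases is left out on purpose.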

 

Phone (with Country Code)
Description

The Phone (with Country Code) standardization definition standardizes phone numbers for international use.

Examples
Input                         Output
(212) 987-7654                +1 212 987 7654
1(303)5466306                 +1 303 546 6306
(414) 242-8202 - after 4pm    +1 414 242 8202, After 4PM
0044 (0)20 12345000           +44 2012345000
011-81-53-460-2871            +81 534602871
Remarks  

 

Postal Code
Description

The Postal Code standardization definition standardizes postal codes.

Examples
Input           Output
27653 - 0001    27653-0001
27653 0001      27653-0001
Remarks  

 

State/Province (Abbreviation)
Description

The State/Province (Abbreviation) standardization definition standardizes state names to their abbreviations.

Examples
Input    Output
nc       NC
fla      FL
Remarks  

 

State/Province (Full Name)
Description The State/Province (Full Name) standardization definition standardizes state names to their complete form.
Examples
Input         Output
n carolina    North Carolina
fla           Florida
Remarks  

Inherited Definitions

In addition to the definitions listed on this page, the English, United States locale also inherits all definitions for the English language and all Global definitions.