SAS Quality Knowledge Base for Contact Information 26
Definitions for the English, South Africa locale are described below.
Case Definitions
Extraction Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (PostBox) | ||
---|---|---|
Description | The Proper (PostBox) case definition propercases PostBox information in an address. | |
Examples | Input | Output |
PO BOX 18940 | PO Box 18940 | |
Remarks |
Extraction Definitions: None.
ID Number | ||
---|---|---|
Description | The ID Number gender analysis definition determines the gender of an individual based on their personal ID number. | |
Possible Outputs | M, F, U | |
Examples | Input | Output |
7604210011088 | F | |
7604206143085 | M | |
Remarks |
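The two examples above are consistent with the usual layout of 13-digit South African ID numbers, in which the seventh digit indicates gender (0 to 4 female, 5 to 9 male). The following Python sketch illustrates that rule only. It is not the QKB implementation, and the function name `gender_from_id` is hypothetical.

```python
def gender_from_id(id_number: str) -> str:
    """Return 'M', 'F', or 'U' for a South African ID number.

    Assumption: the 7th digit (index 6) encodes gender,
    0-4 meaning female and 5-9 meaning male.
    Input that is not exactly 13 ASCII digits yields 'U' (unknown).
    """
    digits = id_number.strip()
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return "U"
    return "F" if int(digits[6]) <= 4 else "M"


if __name__ == "__main__":
    for value in ("7604210011088", "7604206143085", "ABC"):
        print(value, gender_from_id(value))
```

Applied to the two example inputs, the sketch returns F and M respectively, and U for any string that is not a 13-digit number.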
Organization Registration | ||
---|---|---|
Description | The Organization Registration identification analysis definition identifies the type of ID represented by the string. | |
Possible Outputs | ORG_REG, ID, UNKNOWN | |
Examples | Input | Output |
CC/94/00199 | ORG_REG | |
7604210011088 | ID | |
ABC | UNKNOWN | |
Remarks | The result ID means that the string represents a personal ID number. |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 77 characters | |
Examples | Input | Cluster ID |
173 BLOUWILDEBEES STRAAT | 0 | |
173 Blouwildebees St | 0 | |
Remarks |
|
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 110 characters | |
Examples | Input | Cluster ID |
SANLAM CENTRE, C/O JEPPE & VON WIELLIGH STS, JOHANNESBURG, 2001 | 1 | |
4TH FLOOR SANLAM CENTRE, CNR JEPPE & VON WIELLIGH STR, JOBURG, 2001 | 1 | |
Remarks |
|
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
BANTRYBAAI | 0 | |
BANTRY BAY | 0 | |
Remarks |
|
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 41 characters | |
Examples | Input | Cluster ID |
Tygervalley 7536 | 1 | |
Tyger Valley, 7536 | 1 | |
Remarks |
|
ID Number | ||
---|---|---|
Description | The ID Number match definition generates match codes which can be used to cluster records containing ID card numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
5705025087080 | 0 | |
5705025087056 | 0 | |
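Remarks | Fuzziness is built in: the match code ignores digits near the end of the ID number, so numbers that differ only in those digits (as in the examples above) fall into the same cluster. | |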
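The idea of clustering ID numbers while ignoring digits near the end can be sketched in Python as follows. This is an illustration under stated assumptions, not the QKB algorithm: the source does not say how many digits are ignored, so the `ignored_trailing_digits` parameter (default 2, chosen to fit the examples above) and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def id_match_key(id_number: str, ignored_trailing_digits: int = 2) -> str:
    """Build a comparison key from an ID number.

    Assumption: non-digit characters are irrelevant, and the last
    `ignored_trailing_digits` digits are dropped before comparison.
    """
    digits = "".join(ch for ch in id_number if ch in "0123456789")
    if ignored_trailing_digits <= 0:
        return digits
    return digits[:-ignored_trailing_digits]


def cluster_ids(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group ID numbers that share the same key."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        clusters[id_match_key(value)].append(value)
    return dict(clusters)


if __name__ == "__main__":
    print(cluster_ids(["5705025087080", "5705025087056"]))
```

Both example inputs reduce to the same 11-digit key, so they land in one group. A real match code is an encoded string rather than a truncated number.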
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 40 characters | |
Examples | Input | Cluster ID |
DataFlux Corporation | 0 | |
DataFlex LLC | 0 | |
SAS Institute | 0 | |
SAS Institute, Canada | 0 | |
Remarks |
|
Organization Registration | ||
---|---|---|
Description | The Organization Registration match definition generates match codes which can be used to cluster records containing organization registration numbers. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
1989 / 008114/23 | 1 | |
1989/008114/23 | 1 | |
Remarks |
|
Organization (with Site) | ||
---|---|---|
Description | The Organization (with Site) match definition generates match codes which can be used to cluster records containing organization names including the site information. | |
Max Length of Match Code | 30 characters | |
Examples | Input | Cluster ID |
MOMENTUM GROUP LTD M.C.B | 1 | |
The MOMENTUM GROUP | 1 | |
Momentum Group Limited | 1 | |
NABUILD (PTY) LTD - KENILWORTH | 2 | |
Nabuild (Pty) Ltd - Empangeni | 3 | |
Remarks | With this definition, the same organization name at different sites does not match, as clusters 2 and 3 in the examples show. | |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
011 4890292 | 0 | |
+27 (011) 4890292 | 0 | |
Remarks |
|
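The two example inputs differ only in punctuation and in the presence of the country code. A minimal Python sketch of a normalization that makes them compare equal is shown below. It assumes that a leading 27 on a number longer than ten digits is the South African country code and that a single leading 0 is the national trunk prefix. The function name `phone_match_key` is hypothetical, and the actual match code generated by the QKB is an encoded string, not this key.

```python
import re


def phone_match_key(phone: str) -> str:
    """Reduce a phone number to a comparable string of digits."""
    # Keep digits only; spaces, parentheses, '+' and '-' are dropped.
    digits = re.sub(r"\D", "", phone)
    # Assumption: a leading 27 on a long number is the country code.
    if digits.startswith("27") and len(digits) > 10:
        digits = digits[2:]
    # Assumption: one leading 0 is the trunk prefix (011 -> 11).
    if digits.startswith("0"):
        digits = digits[1:]
    return digits


if __name__ == "__main__":
    for value in ("011 4890292", "+27 (011) 4890292"):
        print(repr(value), "->", phone_match_key(value))
```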
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
3370 | 1 | |
'3370' | 1 | |
Remarks |
|
Address | |||
---|---|---|---|
Description | The Address parse definition parses addresses into a set of tokens. | ||
Output Tokens | Prefix, Building Group, Building Name, Street/Erf/Postbox, Additional Info |
||
Example 1 | Input | Output Token | Output |
173 Blouwildebees Straat Unit 2 | Prefix | ||
Building Group | |||
Building Name | |||
Street/Erf/Postbox | 173 Blouwildebees Straat | ||
Additional Info | Unit 2 | ||
Example 2 | Input | Output Token | Output |
JEPPE MENS HOSTEL L ROOM 136 BLOCK 2 100 IMPALA STREET | Prefix | L ROOM 136 | |
Building Group | JEPPE MENS HOSTEL | ||
Building Name | BLOCK 2 | ||
Street/Erf/Postbox | 100 IMPALA STREET | ||
Additional Info | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | The Address (Full) parse definition parses complete two-line addresses into a set of tokens. | ||
Output Tokens | Prefix, Building Group, Building Name, Street/Erf/Postbox, Suburb/Township, Town, Area/Metropol, Province, Postcode, Additional Info |
||
Example 1 | Input | Output Token | Output |
4TH FLOOR SANLAM CENTRE, CNR JEPPE & VON WIELLIGH STR, JOHANNESBURG, 2001 | Prefix | 4TH FLOOR | |
Building Group | |||
Building Name | SANLAM CENTRE | ||
Street/Erf/Postbox | CNR JEPPE & VON WIELLIGH STR | ||
Suburb/Township | |||
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2001 | ||
Additional Info | |||
Example 2 | Input | Output Token | Output |
JEPPE MENS HOSTEL, L ROOM 136, BLOCK 2 EAST, 100 IMPALA STREET, JEPPESTOWN, JOHANNESBURG, 2001 | Prefix | L ROOM 136 | |
Building Group | JEPPE MENS HOSTEL | ||
Building Name | BLOCK 2 EAST | ||
Street/Erf/Postbox | 100 IMPALA STREET | ||
Suburb/Township | JEPPESTOWN | ||
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2001 | ||
Additional Info | |||
Remarks |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
||
Example 1 | Input | Output Token | Output |
JEPPE MENS HOSTEL L ROOM 136 BLOCK 2 100 IMPALA STREET | Recipient | ||
Building/Site | BLOCK 2 JEPPE MENS HOSTEL | ||
Street | 100 IMPALA STREET | ||
Extension | L ROOM 136 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output Token | Output |
173 Blouwildebees Straat, P.bus 123 | Recipient | ||
Building/Site | |||
Street | 173 Blouwildebees Straat | ||
Extension | |||
PO Box | P.bus 123 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens. | ||
Output Tokens | Suburb/Township, Town, Area/Metropol, Province, Postcode, Additional Info |
||
Example | Input | Output Token | Output |
LOWER HOUGHTON, JOHANNESBURG 2198 | Suburb/Township | LOWER HOUGHTON | |
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2198 | ||
Additional Info | |||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info |
||
Example | Input | Output Token | Output |
LOWER HOUGHTON, JOHANNESBURG 2198 | City | LOWER HOUGHTON JOHANNESBURG | |
State/Province | |||
Postal Code | 2198 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
ID Number | |||
---|---|---|---|
Description | The ID Number parse definition parses personal identification numbers into a set of tokens. | ||
Output Tokens | Year, Month, Day, Gender, DOB/Gender Sequence Number, Citizenship, Check Digit |
||
Example | Input | Output Token | Output |
7604206147185 | Year | 76 | |
Month | 04 | ||
Day | 20 | ||
Gender | 6 | ||
DOB/Gender Sequence Number | 147 | ||
Citizenship | 18 | ||
Check Digit | 5 | ||
Remarks |
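Because the ID number has a fixed 13-digit layout, the token boundaries in the example above can be reproduced with simple slicing. The Python sketch below mirrors the example row by row. The function name `parse_id_number` is hypothetical, and the sketch does not validate the date or the check digit.

```python
from typing import Dict


def parse_id_number(id_number: str) -> Dict[str, str]:
    """Split a 13-digit ID number into the tokens listed above."""
    value = id_number.strip()
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected exactly 13 digits")
    return {
        "Year": value[0:2],
        "Month": value[2:4],
        "Day": value[4:6],
        "Gender": value[6],
        "DOB/Gender Sequence Number": value[7:10],
        "Citizenship": value[10:12],
        "Check Digit": value[12],
    }


if __name__ == "__main__":
    for token, text in parse_id_number("7604206147185").items():
        print(f"{token}: {text}")
```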
Organization Registration | |||
---|---|---|---|
Description | The Organization Registration parse definition parses organization registration numbers into a set of tokens. | ||
Output Tokens | Year, Sequence, Type, Alphabetic Info, Additional Info |
||
Example 1 | Input | Output Token | Output |
1997/123456/07 | Year | 1997 | |
Sequence | 123456 | ||
Type | 07 | ||
Alphabetic Info | |||
Additional Info | |||
Example 2 | Input | Output Token | Output |
7604206147185 | Year | ||
Sequence | |||
Type | |||
Alphabetic Info | |||
Additional Info | 7604206147185 | ||
Remarks | The Additional Info token is used to store incorrect or extra information in the string. |
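The two examples above suggest a simple rule: a string in the form year/sequence/type is split into its three numeric parts, and anything else is placed in Additional Info. The Python sketch below implements only that rule, assuming a four-digit year, a six-digit sequence, and a two-digit type. The pattern and the name `parse_org_registration` are hypothetical, and the Alphabetic Info token is left empty because the source gives no example of it.

```python
import re
from typing import Dict

# Assumed shape: YYYY/NNNNNN/TT, with optional spaces around the slashes.
ORG_REG_PATTERN = re.compile(r"^\s*(\d{4})\s*/\s*(\d{6})\s*/\s*(\d{2})\s*$")


def parse_org_registration(value: str) -> Dict[str, str]:
    """Split a registration number into tokens, or flag it as unparsed."""
    tokens = {
        "Year": "",
        "Sequence": "",
        "Type": "",
        "Alphabetic Info": "",
        "Additional Info": "",
    }
    match = ORG_REG_PATTERN.match(value)
    if match:
        tokens["Year"], tokens["Sequence"], tokens["Type"] = match.groups()
    else:
        # Unrecognized input is kept whole, as in Example 2.
        tokens["Additional Info"] = value.strip()
    return tokens


if __name__ == "__main__":
    print(parse_org_registration("1997/123456/07"))
    print(parse_org_registration("7604206147185"))
```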
Phone | |||
---|---|---|---|
Description | The Phone parse definition parses phone numbers into a set of tokens. | ||
Output Tokens | Prefix, Country Code, Area Code, Exchange, Station, Extension ID, Extension, Suffix |
||
Example 1 | Input | Output Token | Output |
Mobile 27 (11) 123-4567 ext 23 | Prefix | Mobile | |
Country Code | 27 | ||
Area Code | 11 | ||
Exchange | 123 | ||
Station | 4567 | ||
Extension ID | ext | ||
Extension | 23 | ||
Suffix | |||
Example 2 | Input | Output Token | Output |
011 555-9987 Home | Prefix | ||
Country Code | |||
Area Code | 011 | ||
Exchange | 555 | ||
Station | 9987 | ||
Extension ID | |||
Extension | |||
Suffix | Home | ||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info |
||
Example 1 | Input | Output Token | Output |
Mobile 27 (11) 123-4567 ext 23 | Country Code | 27 | |
Area Code | 11 | ||
Base Number | 123-4567 | ||
Extension | ext 23 | ||
Line Type | Mobile | ||
Additional Info | |||
Example 2 | Input | Output Token | Output |
011 555-9987 Home | Country Code | ||
Area Code | 011 | ||
Base Number | 555-9987 | ||
Extension | |||
Line Type | Home | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone (Multiple Number) | |||
---|---|---|---|
Description | The Phone (Multiple Number) parse definition parses input that contains multiple phone numbers into a set of tokens. | ||
Output Tokens | Phone 1, Phone 2, Phone 3 |
||
Example 1 | Input | Output Token | Output |
(17) 610-8555 / (17) 611-9772 | Phone 1 | 17 610 8555 | |
Phone 2 | 17 611 9772 | ||
Phone 3 | |||
Example 2 | Input | Output Token | Output |
+27 (11) 392-1646/7/8 | Phone 1 | 27 11 392 164 6 | |
Phone 2 | 27 11 392 164 7 | ||
Phone 3 | 27 11 392 164 8 | ||
Example 3 | Input | Output Token | Output |
+27(22) 972-1600 | Phone 1 | 27 22 972 1600 | |
Phone 2 | |||
Phone 3 | |||
Remarks | Some loss of punctuation may occur when this definition is used. |
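Example 2 above shows a shorthand in which `/7/8` stands for two further numbers that differ from the first only in their final digit. One way to expand such input is sketched below in Python. The heuristic (a slash-separated part shorter than the first number replaces the same number of trailing digits) is an assumption, the function name `split_phone_numbers` is hypothetical, and the output is a plain digit string rather than the spaced format shown in the table.

```python
import re
from typing import List


def split_phone_numbers(text: str, max_numbers: int = 3) -> List[str]:
    """Expand slash-separated phone input into up to three digit strings."""
    # Keep only the digits of each slash-separated part.
    parts = [re.sub(r"\D", "", part) for part in text.split("/")]
    parts = [part for part in parts if part]
    if not parts:
        return []
    first = parts[0]
    numbers = [first]
    for part in parts[1:]:
        if len(part) < len(first):
            # Shorthand: overwrite the trailing digits of the first number.
            numbers.append(first[: len(first) - len(part)] + part)
        else:
            # A complete second number.
            numbers.append(part)
    return numbers[:max_numbers]


if __name__ == "__main__":
    for value in ("(17) 610-8555 / (17) 611-9772",
                  "+27 (11) 392-1646/7/8",
                  "+27(22) 972-1600"):
        print(repr(value), "->", split_phone_numbers(value))
```

As the remark above notes for the QKB definition, punctuation from the input is not preserved by this approach either.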
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | The Address standardization definition standardizes addresses. | |
Examples | Input | Output |
14 NAPIER RD SETTLERS HEIGHTS | Settlers Heights, 14 Napier Road | |
Impalastraat 36 | 36 Impala Street | |
Remarks |
Address (Full) | ||
---|---|---|
Description | The Address (Full) standardization definition standardizes complete two-line addresses. | |
Example | Input | Output |
IMPALASTRAAT 36 SEC J MAMELODI WEST,PRET 2941 | 36 Impala Street, Section J, Mamelodi, Pretoria, 2941 | |
Remarks |
City | ||
---|---|---|
Description | The City standardization definition standardizes city names. | |
Examples | Input | Output |
JHB | Johannesburg | |
Capetown | Cape Town | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code standardization definition standardizes last line address information. | |
Examples | Input | Output |
ST HELENABAAI 7390 | St Helena Bay, 7390 | |
MOUNT FRERE,5090 | Mt Frere, 5090 | |
Remarks |
Phone | ||
---|---|---|
Description | The Phone standardization definition standardizes phone numbers for domestic use. | |
Example | Input | Output |
27119785895 | +27 (011) 978-5895 | |
Remarks |
Postal Code | ||
---|---|---|
Description | The Postal Code standardization definition standardizes postal codes. | |
Example | Input | Output |
'3370' | 3370 | |
Remarks |
In addition to the definitions listed on this page, the English, South Africa locale also inherits all definitions for the English language and all Global definitions.
Doc ID: QKBCI_ENZAF_defs.html |