SAS Quality Knowledge Base for Contact Information 25
Definitions for the English, South Africa locale are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (PostBox) | ||
---|---|---|
Description | The Case definition for Proper (PostBox) converts PostBox information in an address to proper case. | |
Examples | Input | Output |
PO BOX 18940 | PO Box 18940 | |
Remarks |
ID Number | ||
---|---|---|
Description | The Gender Analysis definition for ID Number determines an individual's gender based on a personal ID Number. | |
Possible Outputs | M F U |
|
Examples | Input | Output |
7604210011088 | F | |
7604206143085 | M | |
Remarks |
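The examples above are consistent with the commonly described layout of South African ID numbers, in which the seventh digit indicates gender. The sketch below illustrates that rule only. It assumes that a seventh digit of 0-4 means female and 5-9 means male, and the function name `gender_from_id` is hypothetical. It is not the QKB's implementation.

```python
def gender_from_id(id_number: str) -> str:
    """Return 'F', 'M', or 'U' for a South African ID number.

    Assumption (not stated in the QKB documentation): the 7th digit
    encodes gender, 0-4 for female and 5-9 for male.
    """
    digits = id_number.strip()
    # Anything that is not exactly 13 ASCII digits is reported as unknown.
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return "U"
    return "F" if int(digits[6]) < 5 else "M"


print(gender_from_id("7604210011088"))
print(gender_from_id("7604206143085"))
```

Under the stated assumption, the two documented inputs yield F and M respectively, and malformed input yields U.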
Organization Registration | ||
---|---|---|
Description | The Identification Analysis definition for Organization Registration determines whether a string represents an organization registration number. | |
Possible Outputs | ORG_REG ID UNKNOWN |
|
Examples | Input | Output |
CC/94/00199 | ORG_REG | |
7604210011088 | ID | |
ABC | UNKNOWN | |
Remarks | The result ID means that the string represents a personal ID number. |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 77 characters | |
Examples | Input | Cluster ID |
173 BLOUWILDEBEES STRAAT | 0 | |
173 Blouwildebees St | 0 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 110 characters | |
Examples | Input | Cluster ID |
SANLAM CENTRE, C/O JEPPE & VON WIELLIGH STS, JOHANNESBURG, 2001 | 1 | |
4TH FLOOR SANLAM CENTRE, CNR JEPPE & VON WIELLIGH STR, JOBURG, 2001 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
BANTRYBAAI | 0 | |
BANTRY BAY | 0 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 41 characters | |
Examples | Input | Cluster ID |
Tygervalley 7536 | 1 | |
Tyger Valley, 7536 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
ID Number | ||
---|---|---|
Description | The ID Number match definition generates match codes which can be used to cluster records containing ID card numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
5705025087080 | 0 | |
5705025087056 | 0 | |
Remarks |
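This definition builds in fuzziness by ignoring digits near the end of the number, so ID numbers that differ only in their final digits, as in the examples above, receive the same match code. Note: The results listed above reflect the default match sensitivity (85). |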
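The idea of ignoring trailing digits can be sketched as follows. This is an illustration only: the documentation does not say how many digits the definition ignores or how the match code is encoded, so the default of two ignored digits and the helper name `id_match_key` are assumptions.

```python
def id_match_key(id_number: str, ignored_tail: int = 2) -> str:
    """Build a simple comparison key that drops the last few digits.

    Assumption: `ignored_tail` (2 by default) is a guess chosen so that
    the two documented examples fall into the same cluster.
    """
    digits = "".join(ch for ch in id_number if ch in "0123456789")
    keep = max(len(digits) - ignored_tail, 0)
    return digits[:keep]


# Both documented inputs share the same key under this assumption.
print(id_match_key("5705025087080") == id_match_key("5705025087056"))
```

A real match code also depends on the chosen sensitivity, which this sketch does not model.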
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 40 characters | |
Examples | Input | Cluster ID |
DataFlux Corporation | 0 | |
DataFlex LLC | 0 | |
SAS Institute | 0 | |
SAS Institute, Canada | 0 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Organization Registration | ||
---|---|---|
Description | The Organization Registration match definition generates match codes which can be used to cluster records containing organization registration numbers. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
1989 / 008114/23 | 1 | |
1989/008114/23 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Organization (with Site) | ||
---|---|---|
Description | The Organization (with Site) match definition generates match codes that include site information and can be used to cluster records containing organization names. | |
Max Length of Match Code | 30 characters | |
Examples | Input | Cluster ID |
MOMENTUM GROUP LTD M.C.B | 1 | |
The MOMENTUM GROUP | 1 | |
Momentum Group Limited | 1 | |
NABUILD (PTY) LTD - KENILWORTH | 2 | |
Nabuild (Pty) Ltd - Empangeni | 3 | |
Remarks |
Organization names at different sites will not match if this definition is used. Note: The results listed above reflect the default match sensitivity (85). |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
011 4890292 | 0 | |
+27 (011) 4890292 | 0 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
3370 | 1 | |
'3370' | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Address | |||
---|---|---|---|
Description | The Parse definition for Address parses addresses into a set of tokens. | ||
Output Tokens | Prefix Building Group Building Name Street/Erf/Postbox Additional Info |
||
Example 1 | Input | Output | |
173 Blouwildebees Straat Unit 2 | Prefix | ||
Building Group | |||
Building Name | |||
Street/Erf/Postbox | 173 Blouwildebees Straat | ||
Additional Info | Unit 2 | ||
Example 2 | Input | Output | |
JEPPE MENS HOSTEL L ROOM 136 BLOCK 2 100 IMPALA STREET | Prefix | L ROOM 136 | |
Building Group | JEPPE MENS HOSTEL | ||
Building Name | BLOCK 2 | ||
Street/Erf/Postbox | 100 IMPALA STREET | ||
Additional Info | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | The Parse definition for Address (Full) parses a full two-line address into a set of tokens. | ||
Output Tokens | Prefix Building Group Building Name Street/Erf/Postbox Suburb/Township Town Area/Metropol Province Postcode Additional Info |
||
Example 1 | Input | Output | |
4TH FLOOR SANLAM CENTRE, CNR JEPPE & VON WIELLIGH STR, JOHANNESBURG, 2001 | Prefix | 4TH FLOOR | |
Building Group | |||
Building Name | SANLAM CENTRE | ||
Street/Erf/Postbox | CNR JEPPE & VON WIELLIGH STR | ||
Suburb/Township | |||
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2001 | ||
Additional Info | |||
Example 2 | Input | Output | |
JEPPE MENS HOSTEL, L ROOM 136, BLOCK 2 EAST, 100 IMPALA STREET, JEPPESTOWN, JOHANNESBURG, 2001 | Prefix | L ROOM 136 | |
Building Group | JEPPE MENS HOSTEL | ||
Building Name | BLOCK 2 EAST | ||
Street/Erf/Postbox | 100 IMPALA STREET | ||
Suburb/Township | JEPPESTOWN | ||
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2001 | ||
Additional Info | |||
Remarks |
Address (Global) | |||
---|---|---|---|
Description |
The Address (Global) parse definition parses addresses into a globally recognized set of tokens. |
||
Output Tokens | Recipient Building/Site Street Extension PO Box Additional Info |
||
Example 1 | Input | Output | |
JEPPE MENS HOSTEL L ROOM 136 BLOCK 2 100 IMPALA STREET | Recipient | ||
Building/Site | BLOCK 2 JEPPE MENS HOSTEL | ||
Street | 100 IMPALA STREET | ||
Extension | L ROOM 136 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output | |
173 Blouwildebees Straat, P.bus 123 | Recipient | ||
Building/Site | |||
Street | 173 Blouwildebees Straat | ||
Extension | |||
PO Box | P.bus 123 | ||
Additional Info | |||
Remarks |
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
||
The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global). |
Address (Global) (v23) | |||
---|---|---|---|
Description | The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient Building/Site Street Extension PO Box Additional Info |
||
Example 1 | Input | Output | |
JEPPE MENS HOSTEL L ROOM 136 BLOCK 2 100 IMPALA STREET | Recipient | ||
Building/Site | BLOCK 2 JEPPE MENS HOSTEL | ||
Street | 100 IMPALA STREET | ||
Extension | L ROOM 136 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output | |
173 Blouwildebees Straat, P.bus 123 | Recipient | ||
Building/Site | |||
Street | 173 Blouwildebees Straat | ||
Extension | |||
PO Box | P.bus 123 | ||
Additional Info | |||
Remarks |
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
||
The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global). |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code parses address "last line" data into a set of tokens. | ||
Output Tokens | Suburb/Township Town Area/Metropol Province Postcode Additional Info |
||
Example | Input | Output | |
LOWER HOUGHTON, JOHANNESBURG 2198 | Suburb/Township | LOWER HOUGHTON | |
Town | |||
Area/Metropol | JOHANNESBURG | ||
Province | |||
Postcode | 2198 | ||
Additional Info | |||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code (Global) parses address "last line" data into a globally recognized set of tokens. | ||
Output Tokens | City State/Province Postal Code Additional Info |
||
Example | Input | Output | |
LOWER HOUGHTON, JOHANNESBURG 2198 | City | LOWER HOUGHTON JOHANNESBURG | |
State/Province | |||
Postal Code | 2198 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
ID Number | |||
---|---|---|---|
Description | The Parse definition for ID Number parses personal identification numbers into a set of tokens. | ||
Output Tokens | Year Month Day Gender DOB/Gender Sequence Number Citizenship Check Digit |
||
Example | Input | Output | |
7604206147185 | Year | 76 | |
Month | 04 | ||
Day | 20 | ||
Gender | 6 | ||
DOB/Gender Sequence Number | 147 | ||
Citizenship | 18 | ||
Check Digit | 5 | ||
Remarks |
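The token layout in the example above can be reproduced with fixed-width slicing. The sketch below is a minimal illustration, assuming the widths implied by that single example (2, 2, 2, 1, 3, 2, 1). The function name `parse_id_number` is hypothetical, and the actual definition may handle malformed input differently.

```python
# Token names come from the documentation; widths are inferred from the example.
ID_TOKEN_WIDTHS = [
    ("Year", 2),
    ("Month", 2),
    ("Day", 2),
    ("Gender", 1),
    ("DOB/Gender Sequence Number", 3),
    ("Citizenship", 2),
    ("Check Digit", 1),
]


def parse_id_number(id_number: str) -> dict:
    """Split a 13-digit ID number into the documented tokens."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        raise ValueError("expected exactly 13 digits")
    tokens = {}
    pos = 0
    for name, width in ID_TOKEN_WIDTHS:
        tokens[name] = id_number[pos:pos + width]
        pos += width
    return tokens


for name, value in parse_id_number("7604206147185").items():
    print(f"{name}: {value}")
```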
Organization Registration | |||
---|---|---|---|
Description | The Parse definition for Organization Registration parses organization registration numbers into a set of tokens. | ||
Output Tokens | Year Sequence Type Alphabetic Info Additional Info |
||
Example 1 | Input | Output | |
1997/123456/07 | Year | 1997 | |
Sequence | 123456 | ||
Type | 07 | ||
Alphabetic Info | |||
Additional Info | |||
Example 2 | Input | Output | |
7604206147185 | Year | ||
Sequence | |||
Type | |||
Alphabetic Info | |||
Additional Info | 7604206147185 | ||
Remarks | The Additional Info token is used to store incorrect or extra information in the string. |
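The behavior shown in the two examples, where a well-formed number is split into tokens and anything else falls into Additional Info, can be sketched with a regular expression. This is an illustration only: the pattern covers just the year/sequence/type form seen in Example 1, it never fills Alphabetic Info, and the names `ORG_REG_PATTERN` and `parse_org_registration` are hypothetical.

```python
import re

# Assumed shape: 4-digit year / 6-digit sequence / 2-digit type,
# with optional spaces around the slashes.
ORG_REG_PATTERN = re.compile(r"^\s*(\d{4})\s*/\s*(\d{6})\s*/\s*(\d{2})\s*$")


def parse_org_registration(text: str) -> dict:
    """Return the documented tokens for an organization registration number."""
    tokens = {
        "Year": "",
        "Sequence": "",
        "Type": "",
        "Alphabetic Info": "",
        "Additional Info": "",
    }
    match = ORG_REG_PATTERN.match(text)
    if match:
        tokens["Year"], tokens["Sequence"], tokens["Type"] = match.groups()
    else:
        # Unrecognized input is kept whole, as in Example 2.
        tokens["Additional Info"] = text.strip()
    return tokens


print(parse_org_registration("1997/123456/07"))
print(parse_org_registration("7604206147185"))
```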
Phone | |||
---|---|---|---|
Description | The Parse definition for Phone parses South African phone numbers into a set of tokens. | ||
Output Tokens | Prefix Country Code Area Code Exchange Station Extension ID Extension Suffix |
||
Example 1 | Input | Output | |
Mobile 27 (11) 123-4567 ext 23 | Prefix | Mobile | |
Country Code | 27 | ||
Area Code | 11 | ||
Exchange | 123 | ||
Station | 4567 | ||
Extension ID | ext | ||
Extension | 23 | ||
Suffix | |||
Example 2 | Input | Output | |
011 555-9987 Home | Prefix | ||
Country Code | |||
Area Code | 011 | ||
Exchange | 555 | ||
Station | 9987 | ||
Extension ID | |||
Extension | |||
Suffix | Home | ||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | The Parse definition for Phone (Global) parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code Area Code Base Number Extension Line Type Additional Info |
||
Example 1 | Input | Output | |
Mobile 27 (11) 123-4567 ext 23 | Country Code | 27 | |
Area Code | 11 | ||
Base Number | 123-4567 | ||
Extension | ext 23 | ||
Line Type | Mobile | ||
Additional Info | |||
Example 2 | Input | Output | |
011 555-9987 Home | Country Code | ||
Area Code | 011 | ||
Base Number | 555-9987 | ||
Extension | |||
Line Type | Home | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone (Multiple Number) | |||
---|---|---|---|
Description | The Parse definition for Phone (Multiple Number) parses strings that contain one, two, or three South African phone numbers into a set of tokens. | ||
Output Tokens | Phone 1 Phone 2 Phone 3 |
||
Example 1 | Input | Output | |
(17) 610-8555 / (17) 611-9772 | Phone 1 | 17 610 8555 | |
Phone 2 | 17 611 9772 | ||
Phone 3 | |||
Example 2 | Input | Output | |
+27 (11) 392-1646/7/8 | Phone 1 | 27 11 392 164 6 | |
Phone 2 | 27 11 392 164 7 | ||
Phone 3 | 27 11 392 164 8 | ||
Example 3 | Input | Output | |
+27(22) 972-1600 | Phone 1 | 27 22 972 1600 | |
Phone 2 | |||
Phone 3 | |||
Remarks | Some loss of punctuation may occur when this definition is used. |
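The shorthand in Example 2, where `/7/8` stands for alternative final digits of the preceding number, can be expanded as sketched below. This is an assumption-laden illustration, not the QKB's logic: the helper name `split_phone_numbers` is hypothetical, and the sketch keeps the final digit group intact (for example `1647`), whereas the documented output separates the varying digit with a space (`164 7`).

```python
import re


def split_phone_numbers(text: str, limit: int = 3) -> list:
    """Split a string holding several phone numbers into digit-group strings."""
    results = []
    base_groups = []  # digit groups of the most recent full number
    for piece in text.split("/"):
        groups = re.findall(r"\d+", piece)
        if not groups:
            continue
        is_suffix = (
            bool(base_groups)
            and len(groups) == 1
            and len(groups[0]) < len(base_groups[-1])
        )
        if is_suffix:
            # Shorthand such as "/7": reuse the previous full number and
            # replace the end of its last group with the new digits.
            tail = groups[0]
            stem = base_groups[-1][: -len(tail)]
            results.append(" ".join(base_groups[:-1] + [stem + tail]))
        else:
            base_groups = groups
            results.append(" ".join(groups))
    return results[:limit]


print(split_phone_numbers("(17) 610-8555 / (17) 611-9772"))
print(split_phone_numbers("+27 (11) 392-1646/7/8"))
```

As the remark above notes for the real definition, punctuation such as parentheses and hyphens is lost in this approach as well.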
No Pattern Analysis definitions are provided for this locale.
Address | ||
---|---|---|
Description | The Standardization definition for Address standardizes address data. | |
Examples | Input | Output |
14 NAPIER RD SETTLERS HEIGHTS | Settlers Heights, 14 Napier Road | |
Impalastraat 36 | 36 Impala Street | |
Remarks |
Address (Full) | ||
---|---|---|
Description | The Standardization definition for Address (Full) standardizes full two-line address data. | |
Example | Input | Output |
IMPALASTRAAT 36 SEC J MAMELODI WEST,PRET 2941 | 36 Impala Street, Section J, Mamelodi, Pretoria, 2941 | |
Remarks |
City | ||
---|---|---|
Description | The Standardization definition for City standardizes city names. | |
Examples | Input | Output |
JHB | Johannesburg | |
Capetown | Cape Town | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The Standardization definition for City - State/Province - Postal Code standardizes address "last line" data. | |
Examples | Input | Output |
ST HELENABAAI 7390 | St Helena Bay, 7390 | |
MOUNT FRERE,5090 | Mt Frere, 5090 | |
Remarks |
Phone | ||
---|---|---|
Description | The Standardization definition for Phone standardizes South African phone numbers. | |
Example | Input | Output |
27119785895 | +27 (011) 978-5895 | |
Remarks |
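The transformation in the example above can be approximated as follows. The sketch assumes a nine-digit national number made of a two-digit area code and a seven-digit subscriber number, optionally preceded by the country code 27 or the trunk prefix 0. The function name `standardize_phone` is hypothetical, and input that does not fit this shape is returned unchanged.

```python
import re


def standardize_phone(raw: str) -> str:
    """Format a South African number as '+27 (0AA) EEE-SSSS'."""
    digits = re.sub(r"\D", "", raw)
    # Treat a leading 27 as the country code only when the number is long
    # enough to contain one (assumption: national numbers have 9 digits).
    if len(digits) >= 11 and digits.startswith("27"):
        digits = digits[2:]
    if digits.startswith("0"):
        digits = digits[1:]  # drop the trunk prefix
    if len(digits) != 9:
        return raw
    area, exchange, station = digits[:2], digits[2:5], digits[5:]
    return f"+27 (0{area}) {exchange}-{station}"


print(standardize_phone("27119785895"))
```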
Postal Code | ||
---|---|---|
Description | The Standardization definition for Postal Code standardizes postal codes. | |
Example | Input | Output |
'3370' | 3370 | |
Remarks |
In addition to the definitions listed on this page, the English, South Africa locale also inherits all definitions for the English language and all Global definitions.
Doc ID: QKBCI_ENZAF_defs.html |