SAS Quality Knowledge Base for Contact Information 25
Definitions for the English, India locale are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (Address) | ||
---|---|---|
Description | The Proper (Address) case definition propercases addresses. | |
Examples | Input | Output |
405 SAMEDH TOWER SHAM NAGAR NR ADARASH PETROL PUMP | 405 Samedh Tower Sham Nagar nr Adarash Petrol Pump | |
30 OMKAR HOUSE C G ROAD NAVRANGPURA | 30 Omkar House C G Road Navrangpura | |
Remarks |
Proper (Name) | ||
---|---|---|
Description | The Proper (Name) case definition propercases names of individuals. | |
Examples | Input | Output |
AMRITLAL M DHAKA | Amritlal M Dhaka | |
pravin champalal bokadia | Pravin Champalal Bokadia | |
Remarks |
Name | ||
---|---|---|
Description | The Name gender analysis definition determines the gender of a name. | |
Possible Outputs | M (male), F (female), U (unknown) | |
Examples | Input | Output |
Miss. Neha Patel | F | |
Vora Praveen | M | |
Dr. Pandey | U | |
Remarks |
Identification Analysis Definitions: None.
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 86 characters | |
Examples | Input | Cluster ID |
B 8 Kapol Society Marve Road Malad Mumbai | 0 | |
A 310 Kapol Nivas Marve Road Near Green Park Malad | 0 | |
JIMIT APARTMENT 2ND FLOOR NEAR KAPOL SOC MARVE RD ROHINI | 1 | |
IBM LTD JIMIT APARTMENT 4TH FLOOR MARVE RD NEAR KASTURI NAGAR ROHINI | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
|
Execution time for matching Indian addresses is driven mainly by the parsing that this match definition performs internally. If your input is already parsed, you can save a significant amount of execution time by using the Match Codes (Parsed) node in DataFlux Data Management Studio. |
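The Cluster ID columns in these tables reflect a simple idea: records whose match codes are identical end up in the same cluster. The minimal Python sketch below illustrates only that grouping step. It assumes match codes have already been generated by a match definition; the `assign_cluster_ids` helper and the `MC-A`/`MC-B` code values are hypothetical placeholders, not QKB output or a SAS API.

```python
from typing import Dict, Hashable, List


def assign_cluster_ids(match_codes: List[Hashable]) -> List[int]:
    """Give records with identical match codes the same cluster ID.

    IDs are numbered 0, 1, 2, ... in order of first appearance, mirroring the
    Cluster ID columns in the examples. Illustrative sketch only.
    """
    ids: Dict[Hashable, int] = {}
    result: List[int] = []
    for code in match_codes:
        if code not in ids:
            ids[code] = len(ids)  # unseen match code -> next unused cluster ID
        result.append(ids[code])
    return result


# Hypothetical match codes standing in for real match definition output.
print(assign_cluster_ids(["MC-A", "MC-A", "MC-B", "MC-B"]))  # [0, 0, 1, 1]
```

Real clustering jobs can combine several match code fields and conditions; this sketch covers the single-field case only.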
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Mumbai | 0 | |
Bombay | 0 | |
Ambala Sadar | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 22 characters | |
Examples | Input | Cluster ID |
Buria Pin 110025 Mizoram | 0 | |
Mizoram Buria - 110025 | 0 | |
Charkhi Dadri Nagaland - 110026 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 20 characters | |
Examples | Input | Cluster ID |
Navneet Nischal | 0 | |
Mr. Navnit Nischal, Dir | 0 | |
Deepak Parekh | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Name (with Extensions) | ||
---|---|---|
Description | The Name (with Extensions) match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 50 characters | |
Examples | Input | Cluster ID |
V V S Laxman | 0 | |
Mr. V V P D Laxxman ,CEO | 0 | |
Mr. A V D Rajendra | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 16 characters | |
Examples | Input | Cluster ID |
MOB-9869000595 | 0 | |
98690-00595 | 0 | |
2871565 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
PinCode | ||
---|---|---|
Description | The PinCode match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
PIN 110035 | 0 | |
110035 | 0 | |
110018 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
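To see why `PIN 110035` and `110035` fall into the same cluster while `110018` does not, it can help to think of a match code as a normalized key. The sketch below is not the QKB's match code algorithm, which is internal and also depends on match sensitivity. It is a hypothetical stand-in (`pin_key` is an illustrative name) that assumes a PIN is a run of exactly six digits and ignores surrounding text such as a `PIN` label.

```python
import re
from typing import Optional

# Exactly six consecutive digits that are not part of a longer digit run.
_PIN_RE = re.compile(r"(?<!\d)\d{6}(?!\d)")


def pin_key(value: str) -> Optional[str]:
    """Hypothetical normalized key for an Indian PIN (not the QKB algorithm)."""
    match = _PIN_RE.search(value)
    return match.group(0) if match else None


for raw in ["PIN 110035", "110035", "110018"]:
    print(raw, "->", pin_key(raw))
# PIN 110035 -> 110035
# 110035 -> 110035
# 110018 -> 110018
```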
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
PIN 110035 | 0 | |
PIN 110035 | 0 | |
110018 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
|
This definition behaves the same as the PinCode match definition. Because a definition with this name is implemented in all locales, you can invoke postal code matching across locales with a single definition call. |
State | ||
---|---|---|
Description | The State match definition generates match codes which can be used to cluster records containing names of states and union territories. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
J & K | 0 | |
Jammu and Kashir | 0 | |
Andaman | 1 | |
Andaman & Nicobar | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Address (Full) | |||
---|---|---|---|
Description | The Address (Full) parse definition parses complete two-line addresses into a set of tokens. | ||
Output Tokens | Extension, Building/Wing Number, Building Name, Plot, Phase, Street, Sector, Part, Neighborhood, At Post, Locality, Landmark, Scheme, Layout, Survey, Cross, Main, Stage, Block, Tehsil, City, Pincode, State, Additional Info |
Example 1 | Input | Output | |
B-8 Kapol Soc Marve Road Malad West Mumbai Maharashtra 400060 | Extension | B-8 | |
Building/Wing Number | |||
Building Name | Kapol Soc | ||
Plot | |||
Phase | |||
Street | Marve Road | ||
Sector | |||
Part | |||
Neighborhood | |||
At Post | |||
Locality | Malad West | ||
Landmark | |||
Scheme | |||
Layout | |||
Survey | |||
Cross | |||
Main | |||
Stage | |||
Block | |||
Tehsil | |||
City | Mumbai | ||
Pincode | 400060 | ||
State | Maharashtra | ||
Additional Info | |||
Example 2 | Input | Output | |
A/1212 Saraswati CHS Plot No 1 23 rd Phase Sector No 12 Mahakali Road Govind Nagar Kandivali Mumbai 400064 Maharashtra India | Extension | A/1212 | |
Building/Wing Number | |||
Building Name | Saraswati CHS | ||
Plot | Plot No 1 | ||
Phase | 23 rd Phase | ||
Street | Mahakali Road | ||
Sector | Sector No 12 | ||
Part | |||
Neighborhood | Govind Nagar | ||
At Post | |||
Locality | Kandivali | ||
Landmark | |||
Scheme | |||
Layout | |||
Survey | |||
Cross | |||
Main | |||
Stage | |||
Block | |||
Tehsil | |||
City | Mumbai | ||
Pincode | 400064 | ||
State | MAHARASHTRA | ||
Additional Info | India | ||
Example 3 | Input | Output | |
C/1 Appejay House Plot No 12 Jyoti Scheme Karvaya Layout Pocket 12 Survey No 1 Nr K C College Mumbai 400064 | Extension | C/1 | |
Building/Wing Number | |||
Building Name | Appejay House | ||
Plot | Plot No 12 | ||
Phase | |||
Street | |||
Sector | |||
Part | |||
Neighborhood | |||
At Post | |||
Locality | |||
Landmark | Nr K C College | ||
Scheme | Jyoti Scheme | ||
Pocket 12 | |||
Layout | Karvaya Layout | ||
Survey | Survey No 1 | ||
Cross | |||
Main | |||
Stage | |||
Block | |||
Tehsil | |||
City | Mumbai | ||
Pincode | 400064 | ||
State | |||
Additional Info | |||
Remarks |
Due to the complexity of Indian addresses, the software might require a higher memory limit to complete processing of an input string. The Parse Resource Limit for this definition is set to INTENSIVE in the QKB to obtain the most accurate results possible when parsing Indian addresses. Higher values for the Parse Resource Limit setting can increase the execution time of your jobs; however, lower values can cause some records to receive a result code of ABANDONED. You can experiment with different values to find the best balance of accuracy and performance for your data. In Data Management Studio, a NULL value for the Parse Resource Limit means that the software uses the value saved in the QKB. To override the saved value, change the value in your job from NULL to one of the other available values: VERY_LOW, LOW, MEDIUM, HIGH, VERY_HIGH, or INTENSIVE. To obtain the most accurate results when executing definitions that parse Indian addresses, we recommend that you either leave the Parse Resource Limit set to NULL in Data Management Studio or explicitly set it to INTENSIVE. |
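The NULL-versus-override rule described above can be summarized as a small resolution function. This Python sketch is purely illustrative: `ParseResourceLimit` and `resolve_parse_resource_limit` are hypothetical names, not a SAS or DataFlux API. Only the list of values, the NULL behavior, and the INTENSIVE value saved for this definition come from this page; NULL is modeled as Python `None`.

```python
from enum import Enum
from typing import Optional


class ParseResourceLimit(Enum):
    # Values listed on this page for the Parse Resource Limit setting.
    VERY_LOW = "VERY_LOW"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    VERY_HIGH = "VERY_HIGH"
    INTENSIVE = "INTENSIVE"


def resolve_parse_resource_limit(
    job_value: Optional[str],
    qkb_saved: ParseResourceLimit = ParseResourceLimit.INTENSIVE,
) -> ParseResourceLimit:
    """Return the effective limit (hypothetical helper).

    None (NULL in the job) defers to the value saved in the QKB; any other
    value overrides it. Unknown names raise KeyError.
    """
    if job_value is None:
        return qkb_saved
    return ParseResourceLimit[job_value.upper()]


print(resolve_parse_resource_limit(None).value)      # INTENSIVE
print(resolve_parse_resource_limit("MEDIUM").value)  # MEDIUM
```

Leaving the job value as `None` yields INTENSIVE for this definition, matching the recommendation above; an explicit override such as MEDIUM trades accuracy for speed and can produce more ABANDONED result codes.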
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
Example 1 | Input | Output | |
1051 3RD FLOOR MAHAVIR BHAWAN,SITA RAM NAGAR,NEW DELHI 200021 | Recipient | ||
Building/Site | MAHAVIR BHAWAN | ||
Street | SITA RAM NAGAR | ||
Extension | 1051 3RD FLOOR | ||
PO Box | |||
Additional Info | NEW DELHI 200021 | ||
Example 2 | Input | Output | |
61 SAWYAM SIDHA COLONY WEST AVEVUE ROAD ,NEW DELHI | Recipient | ||
Building/Site | SAWYAM SIDHA COLONY | ||
Street | WEST AVEVUE ROAD | ||
Extension | 61 | ||
PO Box | |||
Additional Info | NEW DELHI | ||
Example 3 | Input | Output | |
D-54 IST FLOOR, BATLA HOUSE, OKHLA ,NEW DELHI | Recipient | ||
Building/Site | BATLA HOUSE | ||
Street | OKHLA | ||
Extension | D-54 IST FLOOR | ||
PO Box | |||
Additional Info | NEW DELHI | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), we suggest that you change them back to Address (Global). |
Address (Global) (v23) | |||
---|---|---|---|
Description | The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
Example 1 | Input | Output | |
1051 3RD FLOOR MAHAVIR BHAWAN,SITA RAM NAGAR,NEW DELHI 200021 | Recipient | ||
Building/Site | MAHAVIR BHAWAN | ||
Street | SITA RAM NAGAR | ||
Extension | 1051 3RD FLOOR | ||
PO Box | |||
Additional Info | NEW DELHI 200021 | ||
Example 2 | Input | Output | |
61 SAWYAM SIDHA COLONY WEST AVEVUE ROAD ,NEW DELHI | Recipient | ||
Building/Site | SAWYAM SIDHA COLONY | ||
Street | WEST AVEVUE ROAD | ||
Extension | 61 | ||
PO Box | |||
Additional Info | NEW DELHI | ||
Example 3 | Input | Output | |
D-54 IST FLOOR, BATLA HOUSE, OKHLA ,NEW DELHI | Recipient | ||
Building/Site | BATLA HOUSE | ||
Street | OKHLA | ||
Extension | D-54 IST FLOOR | ||
PO Box | |||
Additional Info | NEW DELHI | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), we suggest that you change them back to Address (Global). |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens. | ||
Output Tokens | City, PinCode, State/Union Territory |
Example | Input | Output | |
Buria Mizoram - 110025 | City | Buria | |
PinCode | 110025 | ||
State/Union Territory | Mizoram | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info |
Example | Input | Output | |
Buria Mizoram - 110025 | City | Buria | |
State/Province | Mizoram | ||
Postal Code | 110025 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
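The examples for the City - State/Province - Postal Code parse definition and its (Global) counterpart return the same values under different token names. The sketch below shows how locale-specific output could be renamed to the global token set so that it fits the same database fields used for other locales. The token mapping is inferred from the two examples on this page; `to_global` is a hypothetical helper, and routing unrecognized tokens into Additional Info is an assumption of this sketch. In practice you would simply call the (Global) definition.

```python
from typing import Dict, List

# Token correspondence inferred from the two examples on this page.
LOCAL_TO_GLOBAL: Dict[str, str] = {
    "City": "City",
    "PinCode": "Postal Code",
    "State/Union Territory": "State/Province",
}

GLOBAL_TOKENS = ("City", "State/Province", "Postal Code", "Additional Info")


def to_global(local: Dict[str, str]) -> Dict[str, str]:
    """Rename locale-specific tokens to global ones (hypothetical helper).

    Assumption: values of unrecognized tokens are appended to Additional Info.
    """
    result = {token: "" for token in GLOBAL_TOKENS}
    extras: List[str] = []
    for token, value in local.items():
        if token in LOCAL_TO_GLOBAL:
            result[LOCAL_TO_GLOBAL[token]] = value
        elif value:
            extras.append(value)
    result["Additional Info"] = " ".join(extras)
    return result


print(to_global({"City": "Buria", "PinCode": "110025", "State/Union Territory": "Mizoram"}))
# {'City': 'Buria', 'State/Province': 'Mizoram', 'Postal Code': '110025', 'Additional Info': ''}
```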
Name | |||
---|---|---|---|
Description | The Name parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
Example 1 | Input | Output | |
Mrs. Asha A Ghelani | Prefix | Mrs. | |
Given Name | Asha | ||
Middle Name | A | ||
Family Name | Ghelani | ||
Suffix | |||
Title/Additional Info | |||
Example 2 | Input | Output | |
Jason Santosh Rebello,CTO Accenture India | Prefix | ||
Given Name | Jason | ||
Middle Name | Santosh | ||
Family Name | Rebello | ||
Suffix | |||
Title/Additional Info | CTO Accenture India | ||
Example 3 | Input | Output | |
Mrs. Tina Peter D'mello Jr. | Prefix | Mrs. | |
Given Name | Tina | ||
Middle Name | Peter | ||
Family Name | D'mello | ||
Suffix | Jr. | ||
Title/Additional Info | |||
Remarks |
Name (Matching) | |||
---|---|---|---|
Description | The Name (Matching) parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
Example 1 | Input | Output | |
Mrs. Asha A Ghelani | Prefix | Mrs. | |
Given Name | Asha A | ||
Middle Name | |||
Family Name | Ghelani | ||
Suffix | |||
Title/Additional Info | |||
Example 2 | Input | Output | |
Jason Santosh Rebello,CTO Accenture India | Prefix | ||
Given Name | Jason Santosh | ||
Middle Name | |||
Family Name | Rebello | ||
Suffix | |||
Title/Additional Info | CTO Accenture India | ||
Example 3 | Input | Output | |
Mrs. Tina Peter D'mello Jr. | Prefix | Mrs. | |
Given Name | Tina Peter | ||
Middle Name | |||
Family Name | D'mello | ||
Suffix | Jr. | ||
Title/Additional Info | |||
Remarks | This definition is currently used by the Name match definition. All given names are parsed into the Given Name token, and the Middle Name token is never used. |
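Comparing the Name and Name (Matching) examples shows the relationship: the Middle Name value produced by Name is folded into Given Name, and Middle Name is left empty. The sketch below reproduces that relationship on already-parsed token dictionaries. It is inferred from the examples and does not describe how the QKB implements Name (Matching); `to_matching_tokens` is a hypothetical helper.

```python
from typing import Dict


def to_matching_tokens(name_tokens: Dict[str, str]) -> Dict[str, str]:
    """Fold Middle Name into Given Name, as in the Name (Matching) examples."""
    result = dict(name_tokens)  # copy so the caller's dict is not modified
    given = name_tokens.get("Given Name", "")
    middle = name_tokens.get("Middle Name", "")
    # Join the non-empty parts with a single space; Middle Name is never used.
    result["Given Name"] = " ".join(part for part in (given, middle) if part)
    result["Middle Name"] = ""
    return result


parsed = {"Prefix": "Mrs.", "Given Name": "Asha", "Middle Name": "A", "Family Name": "Ghelani"}
print(to_matching_tokens(parsed)["Given Name"])  # Asha A
```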
Name (with Extensions) | |||
---|---|---|---|
Description | The Name (with Extensions) parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix, Extension Initial 1, Extension Initial 2, Extension Initial 3, Extension Initial 4, Given Name, Given Name Extension, Middle Name, Middle Name Extension, Family Name, Suffix, Title/Additional Info |
Example 1 | Input | Output | |
Mr. Yogesh Kumar Mahesh Kumar Chabbaria, President | Prefix | Mr. | |
Extension Initial 1 | |||
Extension Initial 2 | |||
Extension Initial 3 | |||
Extension Initial 4 | |||
Given Name | Yogesh | ||
Given Name Extension | Kumar | ||
Middle Name | Mahesh | ||
Middle Name Extension | Kumar | ||
Family Name | Chabbaria | ||
Suffix | |||
Title/Additional Info | President | ||
Example 2 | Input | Output | |
Mrs. F D Devi Pratap Rajput | Prefix | Mrs. | |
Extension Initial 1 | F | ||
Extension Initial 2 | D | ||
Extension Initial 3 | |||
Extension Initial 4 | |||
Given Name | Devi | ||
Given Name Extension | |||
Middle Name | Pratap | ||
Middle Name Extension | |||
Family Name | Rajput | ||
Suffix | |||
Title/Additional Info | |||
Example 3 | Input | Output | |
Mr. V V S Laxman Narayan Nayar, CEO ICICI | Prefix | Mr. | |
Extension Initial 1 | V | ||
Extension Initial 2 | V | ||
Extension Initial 3 | S | ||
Extension Initial 4 | |||
Given Name | Laxman | ||
Given Name Extension | |||
Middle Name | Narayan | ||
Middle Name Extension | |||
Family Name | Nayar | ||
Suffix | |||
Title/Additional Info | CEO ICICI | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
Example 1 | Input | Output | |
Mr. Dinkar Sathe | Prefix | Mr. | |
Given Name | Dinkar | ||
Middle Name | |||
Family Name | Sathe | ||
Suffix | |||
Title/Additional Info | |||
Example 2 | Input | Output | |
Mr. Deepak Keshavji Chheda | Prefix | Mr. | |
Given Name | Deepak | ||
Middle Name | Keshavji | ||
Family Name | Chheda | ||
Suffix | |||
Title/Additional Info | |||
Example 3 | Input | Output | |
Mrs. Dyna Peter D'souza, Vice President | Prefix | Mrs. | |
Given Name | Dyna | ||
Middle Name | Peter | ||
Family Name | D'souza | ||
Suffix | |||
Title/Additional Info | Vice President | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone | |||
---|---|---|---|
Description | The Phone parse definition parses phone numbers into a set of tokens. | ||
Output Tokens | Prefix, Country Code, Area Code, Base Number/Cell Number, Extension Word, Extension, Suffix, Additional Info |
Example 1 | Input | Output | |
+91-2027290597 x 211 | Prefix | ||
Country Code | +91 | ||
Area Code | 20 | ||
Base Number/Cell Number | 27290597 | ||
Extension Word | x | ||
Extension | 211 | ||
Suffix | |||
Additional Info | |||
Example 2 | Input | Output | |
6422381/6431746 | Prefix | ||
Country Code | |||
Area Code | |||
Base Number/Cell Number | 6422381 | ||
Extension Word | |||
Extension | |||
Suffix | |||
Additional Info | 6431746 | ||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info |
Example | Input | Output | |
+91-2027290597 x 211 | Country Code | +91 | |
Area Code | 20 | ||
Base Number | 27290597 | ||
Extension | 211 | ||
Line Type | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | The Address standardization definition standardizes addresses. | |
Examples | Input | Output |
B/8 Kapol Soc Marve Rd Malad Mumbai | B/8, Kapol Society, Marve Road, Malad, Mumbai | |
B/8 Jimit Apt Marve Road Bangur Ngr Goregaon East | B/8, Jimit Apartment, Marve Road, Bangur Nagar, Goregaon East | |
Remarks | This standardization definition performs context-specific transformations based on an internal parse of the input string. Successful parsing is vital for accurate standardization results, but the complexity of Indian addresses makes parsing difficult in many cases. We therefore recommend the following method for standardizing Indian addresses:
1. Parse the address strings using the Address (Full) parse definition. Store the result code of each parse along with the parse output.
2. Separate the parsed output into two branches. One branch contains the output of successful parses (result code OK). The other branch contains the output of unsuccessful parses (result codes NO SOLUTION and ABANDONED).
3. Standardize the successfully parsed records using the Address standardization definition. To avoid a time-consuming internal parse, input the pre-parsed token values into the standardization definition.
4. Standardize the records in the second branch using the Address (Generic) standardization definition.
5. Merge the standardized results from both branches.
For more information about address parsing, see the Remarks section of the Address (Full) parse definition. |
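The five-step method above can be sketched as a small pipeline. The parse and the two standardizations are passed in as functions because they stand in for the QKB definitions you run in your own environment; `ParseResult` and `standardize_addresses` are hypothetical names, not a SAS API. Only the result codes OK, NO SOLUTION, and ABANDONED come from this page. The sketch branches per record rather than materializing two separate lists, which keeps the merged output in input order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParseResult:
    result_code: str  # "OK", "NO SOLUTION", or "ABANDONED"
    tokens: Dict[str, str] = field(default_factory=dict)  # empty if parse failed


def standardize_addresses(
    records: List[str],
    parse: Callable[[str], ParseResult],
    standardize_parsed: Callable[[Dict[str, str]], str],
    standardize_generic: Callable[[str], str],
) -> List[str]:
    """Branch on each parse result code and merge the standardized output."""
    merged: List[str] = []
    for raw in records:
        parsed = parse(raw)  # step 1: parse and keep the result code
        if parsed.result_code == "OK":  # step 2: successful-parse branch
            # step 3: pass pre-parsed tokens to avoid a second internal parse
            merged.append(standardize_parsed(parsed.tokens))
        else:  # NO SOLUTION or ABANDONED branch
            merged.append(standardize_generic(raw))  # step 4: generic fallback
    return merged  # step 5: merged results, in input order
```

To use it, supply wrappers around your parse and standardization calls; for example, `standardize_generic` would wrap the Address (Generic) standardization definition.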
Address (Generic) | ||
---|---|---|
Description | The Address (Generic) standardization definition standardizes addresses. | |
Examples | Input | Output |
B/8 Kapol Soc Marve Rd Malad Mumbai | B/8 Kapol Society Marve Road Malad Mumbai | |
B/8 Jimit Apt Marve Road Bangur Ngr Goregaon East | B/8 Jimit Apartment Marve Road Bangur Nagar Goregaon East | |
Remarks | The Address (Generic) standardization definition performs only simple transformations; it does not attempt complex transformations based on internal parsing. It is recommended for standardizing addresses that cannot be parsed successfully. For more information, see the Remarks section of the Address standardization definition. In some cases, if changing the data before parsing is appropriate, you might get better parse results by standardizing the data with this definition before you parse it. |
City | ||
---|---|---|
Description | The City standardization definition standardizes city names. | |
Examples | Input | Output |
Bombay | Mumbai | |
Calcutta | Kolkatta | |
Remarks | Common city abbreviations are expanded into full names. |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code standardization definition standardizes last line address information. | |
Examples | Input | Output |
Faridabad Uttaranchal - 110034 | Faridabad, 110034, Uttaranchal | |
Remarks |
Name | ||
---|---|---|
Description | The Name standardization definition standardizes names of individuals. | |
Examples | Input | Output |
CEO Nishit V Shah | Nishit V Shah, CEO | |
Professor Ranjan Verma | Prof Ranjan Verma | |
Remarks |
Phone | ||
---|---|---|
Description | The Phone standardization definition standardizes phone numbers for domestic use. | |
Examples | Input | Output |
91 22-6983000 | +91-22-6983000 | |
911452572 ext. 45 | +91-14-52572 X 45 | |
Remarks |
PinCode | ||
---|---|---|
Description | The PinCode standardization definition standardizes postal codes. | |
Example | Input | Output |
PO411006 | PO 411006 | |
Remarks |
Postal Code | ||
---|---|---|
Description | The Postal Code standardization definition standardizes postal codes. | |
Example | Input | Output |
PO411006 | PO 411006 | |
Remarks | This definition behaves the same as PinCode. Because a standardization definition with this name is implemented in all locales, you can invoke postal code standardization across locales with a single definition call. |
State | ||
---|---|---|
Description | The State standardization definition standardizes state and union territory names. | |
Examples | Input | Output |
A.P. | Andhra Pradesh | |
nicobar | Andaman and Nicobar Islands | |
Remarks |
In addition to the definitions listed on this page, the English, India locale also inherits all definitions for the English language and all Global definitions.
Documentation Feedback: yourturn@sas.com
Doc ID: QKBCI_ENIND_defs.html