
SAS Quality Knowledge Base for Contact Information 25

English, India Definitions

Definitions for the English, India locale are described below.

Case Definitions

Gender Analysis Definitions

Identification Analysis Definitions

Match Definitions

Parse Definitions

Pattern Analysis Definitions

Standardization Definitions

Inherited Definitions

Case Definitions

Proper (Address)
Description The Proper (Address) case definition propercases addresses.
Examples Input Output
405 SAMEDH TOWER SHAM NAGAR NR ADARASH PETROL PUMP 405 Samedh Tower Sham Nagar nr Adarash Petrol Pump
30 OMKAR HOUSE C G ROAD NAVRANGPURA 30 Omkar House C G Road Navrangpura
Remarks  

 

Proper (Name)
Description The Proper (Name) case definition propercases names of individuals.
Examples Input Output
AMRITLAL M DHAKA Amritlal M Dhaka
pravin champalal bokadia Pravin Champalal Bokadia
Remarks  

Gender Analysis Definitions

Name
Description The Name gender analysis definition determines the gender of a name.
Possible Outputs M
F
U
Examples Input Output
Miss. Neha Patel F
Vora Praveen M
Dr. Pandey U
Remarks  

Identification Analysis Definitions

None.

Match Definitions

Address
Description The Address match definition generates match codes which can be used to cluster records containing addresses.
Max Length of Match Code 86 characters
Examples Input Cluster ID
B 8 Kapol Society Marve Road Malad Mumbai 0
A 310 Kapol Nivas Marve Road Near Green Park Malad 0
JIMIT APARTMENT 2ND FLOOR NEAR KAPOL SOC MARVE RD ROHINI 1
IBM LTD JIMIT APARTMENT 4TH FLOOR MARVE RD NEAR KASTURI NAGAR ROHINI 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

The time it takes to execute matching for Indian addresses is mainly driven by the parsing that takes place internally in this match definition. If your input is already parsed, you can save a significant amount of execution time when you use the Match Codes (Parsed) node in DataFlux Data Management Studio.

 

City
Description The City match definition generates match codes which can be used to cluster records containing city names.
Max Length of Match Code 15 characters
Examples Input Cluster ID
Mumbai 0
Bombay 0
Ambala Sadar 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

City - State/Province - Postal Code
Description The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information.
Max Length of Match Code 22 characters
Examples Input Cluster ID
Buria Pin 110025 Mizoram 0
Mizoram Buria - 110025 0
Charkhi Dadri Nagaland - 110026 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Name
Description The Name match definition generates match codes which can be used to cluster records containing names of individuals.
Max Length of Match Code 20 characters
Examples Input Cluster ID
Navneet Nischal 0
Mr. Navnit Nischal, Dir 0
Deepak Parekh 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Name (with Extensions)
Description The Name (with Extensions) match definition generates match codes which can be used to cluster records containing names of individuals.
Max Length of Match Code 50 characters
Examples Input Cluster ID
V V S Laxman 0
Mr. V V P D Laxxman ,CEO 0
Mr. A V D Rajendra 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Phone
Description The Phone match definition generates match codes which can be used to cluster records containing phone numbers.
Max Length of Match Code 16 characters
Examples Input Cluster ID
MOB-9869000595 0
98690-00595 0
2871565 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).
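
As an illustration of how match codes can be used to cluster records, the following Python sketch assigns the same cluster ID to every record that shares a match code. The function `toy_match_code` is a hypothetical stand-in that keeps only digits; it is not the algorithm used by the Phone match definition, and real match codes would come from the QKB.

```python
from typing import Callable, Dict, Iterable


def cluster_by_match_code(
    records: Iterable[str], match_code: Callable[[str], str]
) -> Dict[str, int]:
    """Assign a cluster ID to each record; equal match codes share an ID."""
    cluster_ids: Dict[str, int] = {}   # match code -> cluster ID
    assignments: Dict[str, int] = {}   # record -> cluster ID
    for record in records:
        code = match_code(record)
        if code not in cluster_ids:
            # First time this code is seen: open a new cluster.
            cluster_ids[code] = len(cluster_ids)
        assignments[record] = cluster_ids[code]
    return assignments


def toy_match_code(value: str) -> str:
    """Hypothetical stand-in: keep digits only (NOT the QKB algorithm)."""
    return "".join(ch for ch in value if ch.isdigit())


phones = ["MOB-9869000595", "98690-00595", "2871565"]
print(cluster_by_match_code(phones, toy_match_code))
# {'MOB-9869000595': 0, '98690-00595': 0, '2871565': 1}
```

With this toy function, the first two inputs reduce to the same digit string and therefore land in cluster 0, which mirrors the cluster IDs shown in the examples above.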

 

PinCode
Description The PinCode match definition generates match codes which can be used to cluster records containing postal codes.
Max Length of Match Code 15 characters
Examples Input Cluster ID
PIN 110035 0
110035 0
110018 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Postal Code
Description The Postal Code match definition generates match codes which can be used to cluster records containing postal codes.
Max Length of Match Code 15 characters
Examples Input Cluster ID
PIN 110035 0
PIN 110035 0
110018 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

This is the same behavior as PinCode. This definition allows the user to invoke postal code matching across locales with a single definition call, because a definition with this name is implemented on all locales.

 

State
Description The State match definition generates match codes which can be used to cluster records containing names of states and union territories.
Max Length of Match Code 15 characters
Examples Input Cluster ID
J & K 0
Jammu and Kashir 0
Andaman 1
Andaman & Nicobar 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

Parse Definitions

Address
Description The Address parse definition parses addresses into a set of tokens.
Output Tokens Extension
Building/Wing Number
Building Name
Plot
Phase
Street
Sector
Part
Neighborhood
At Post
Locality
Landmark
Scheme
Pocket
Layout
Survey
Cross
Main
Stage
Block
Tehsil
Additional Info
Example 1 Input Output
402 Shalimar Manzil Layout No 2 Survey No 3 Tehsil Manas Thane Extension 402
Building/Wing Number  
Building Name Shalimar Manzil
Plot  
Phase  
Street  
Sector  
Part  
Neighborhood  
At Post  
Locality  
Landmark  
Scheme  
Pocket  
Layout Layout No 2
Survey Survey No 3
Cross  
Main  
Stage  
Block  
Tehsil Tehsil Manas
Additional Info Thane
Example 2 Input Output
Shah Associates Ltd B/8/1 A Wing Anand Dham Soc Sector 4 Part No 2 12 A Main Survey No 1 Layout No 12 Shankar Rd Malad West Mumbai Maharashtra 400064 India Extension B/8/1
Building/Wing Number A Wing
Building Name Anand Dham Soc
Plot  
Phase  
Street Shankar Rd
Sector Sector 4
Part Part No 2
Neighborhood  
At Post  
Locality Malad West
Landmark  
Scheme  
Pocket  
Layout Layout No 12
Survey Survey No 1
Cross  
Main 12 A Main
Stage  
Block  
Tehsil  
Additional Info Shah Associates Ltd Mumbai Maharashtra 400064 India
Example 3 Input Output
  TCS Ltd Third Floor Amardeep Plaza 12 Stage Scheme No - 12 Pocket Z Block 12 12 A Cross Opp Sai Temple Kala Nagar Bangalore Karnataka India Extension Third Floor
Building/Wing Number  
Building Name Amardeep Plaza
Plot  
Phase  
Street  
Sector  
Part  
Neighborhood Kala Nagar
At Post  
Locality  
Landmark Opp Sai Temple
Scheme Scheme No - 12
Pocket Pocket Z
Layout  
Survey  
Cross 12 A Cross
Main  
Stage 12 Stage
Block Block 12
Tehsil  
Additional Info TCS Ltd Bangalore Karnataka India
Remarks

Due to the complexity of Indian addresses, the software might require a higher memory limit to complete processing of an input string. This definition sets the Parse Resource Limit to INTENSIVE in the QKB to obtain the most accurate results possible when executing definitions that involve parsing of Indian addresses. Higher values for the Parse Resource Limit setting can increase execution times for your jobs; however, lower values cause some records to receive a result code of ABANDONED. You can experiment with different values for the Parse Resource Limit setting to determine which value gives the best balance of accuracy and performance for your data.

In Data Management Studio, the NULL value for the Parse Resource Limit means that the software will use the value for the Parse Resource Limit that is saved in the QKB. To override the value saved in the QKB, change the value in your job from NULL to one of the other available values: VERY_LOW, LOW, MEDIUM, HIGH, VERY_HIGH, or INTENSIVE.

It is recommended that you leave the Parse Resource Limit set to NULL in Data Management Studio or explicitly set the value to INTENSIVE to obtain the most accurate results possible when executing definitions involving parsing of Indian addresses.
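
The NULL behavior described above can be summarized in a short Python sketch. This only models the rule stated in these remarks; the function name `effective_parse_resource_limit` and the use of `None` to represent NULL are assumptions made for illustration and do not correspond to any Data Management Studio API.

```python
from typing import Optional

# Values listed in the remarks above; None models the NULL setting in a job.
ALLOWED_LIMITS = ("VERY_LOW", "LOW", "MEDIUM", "HIGH", "VERY_HIGH", "INTENSIVE")


def effective_parse_resource_limit(job_value: Optional[str], qkb_value: str) -> str:
    """Return the limit that applies: the job value, or the QKB value when NULL."""
    if job_value is None:
        # NULL in the job means "use the value saved in the QKB".
        return qkb_value
    if job_value not in ALLOWED_LIMITS:
        raise ValueError(f"unknown Parse Resource Limit: {job_value!r}")
    # Any explicit value overrides the value saved in the QKB.
    return job_value


print(effective_parse_resource_limit(None, "INTENSIVE"))   # INTENSIVE
print(effective_parse_resource_limit("LOW", "INTENSIVE"))  # LOW
```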

 

Address (Full)
Description The Address (Full) parse definition parses addresses containing complete two-line addresses into a set of tokens.
Output Tokens Extension
Building/Wing Number
Building Name
Plot
Phase
Street
Sector
Part
Neighborhood
At Post
Locality
Landmark
Scheme
Pocket
Layout
Survey
Cross
Main
Stage
Block
Tehsil
City
Pincode
State
Additional Info
Example 1 Input Output
B-8 Kapol Soc Marve Road Malad West Mumbai Maharashtra 400060 Extension B-8
Building/Wing Number  
Building Name Kapol Soc
Plot  
Phase  
Street Marve Road
Sector  
Part  
Neighborhood  
At Post  
Locality Malad West
Landmark  
Scheme  
Pocket  
Layout  
Survey  
Cross  
Main  
Stage  
Block  
Tehsil
City Mumbai
Pincode 400060
State Maharashtra
Additional Info  
Example 2 Input Output
A/1212 Saraswati CHS Plot No 1 23 rd Phase Sector No 12 Mahakali Road Govind Nagar Kandivali Mumbai 400064 Maharashtra India Extension A/1212
Building/Wing Number  
Building Name Saraswati CHS
Plot Plot No 1
Phase 23 rd Phase
Street Mahakali Road
Sector Sector No 12
Part  
Neighborhood Govind Nagar
At Post  
Locality Kandivali
Landmark  
Scheme  
Pocket  
Layout  
Survey  
Cross  
Main  
Stage  
Block  
Tehsil  
City Mumbai
Pincode 400064
State MAHARASHTRA
Additional Info India
Example 3 Input Output
C/1 Appejay House Plot No 12 Jyoti Scheme Karvaya Layout Pocket 12 Survey No 1 Nr K C College Mumbai 400064 Extension C/1
Building/Wing Number  
Building Name Appejay House
Plot Plot No 12
Phase  
Street  
Sector  
Part  
Neighborhood  
At Post  
Locality  
Landmark Nr K C College
Scheme Jyoti Scheme
Pocket Pocket 12
Layout Karvaya Layout
Survey Survey No 1
Cross  
Main  
Stage  
Block  
Tehsil  
City Mumbai
Pincode 400064
State  
Additional Info  
Remarks

Due to the complexity of Indian addresses, the software might require a higher memory limit to complete processing of an input string. This definition sets the Parse Resource Limit to INTENSIVE in the QKB to obtain the most accurate results possible when executing definitions that involve parsing of Indian addresses. Higher values for the Parse Resource Limit setting can increase execution times for your jobs; however, lower values cause some records to receive a result code of ABANDONED. You can experiment with different values for the Parse Resource Limit setting to determine which value gives the best balance of accuracy and performance for your data.

In Data Management Studio, the NULL value for the Parse Resource Limit means that the software will use the value for the Parse Resource Limit that is saved in the QKB. To override the value saved in the QKB, change the value in your job from NULL to one of the other available values: VERY_LOW, LOW, MEDIUM, HIGH, VERY_HIGH, or INTENSIVE.

It is recommended that you leave the Parse Resource Limit set to NULL in Data Management Studio or explicitly set the value to INTENSIVE to obtain the most accurate results possible when executing definitions involving parsing of Indian addresses.

 

Address (Global)
Description

The Address (Global) parse definition parses addresses into a globally recognized set of tokens.

Output Tokens Recipient
Building/Site
Street
Extension
PO Box
Additional Info
Example 1 Input Output
1051 3RD FLOOR MAHAVIR BHAWAN,SITA RAM NAGAR,NEW DELHI 200021 Recipient  
Building/Site MAHAVIR BHAWAN
Street SITA RAM NAGAR
Extension 1051 3RD FLOOR
PO Box  
Additional Info NEW DELHI 200021
Example 2 Input Output
61 SAWYAM SIDHA COLONY WEST AVEVUE ROAD ,NEW DELHI Recipient  
Building/Site SAWYAM SIDHA COLONY
Street WEST AVEVUE ROAD
Extension 61
PO Box  
Additional Info NEW DELHI
Example 3 Input Output
D-54 IST FLOOR, BATLA HOUSE, OKHLA ,NEW DELHI Recipient  
Building/Site BATLA HOUSE
Street OKHLA
Extension D-54 IST FLOOR
PO Box  
Additional Info NEW DELHI
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB.

The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global).

 

Address (Global) (v23)
Description

The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens.

Output Tokens Recipient
Building/Site
Street
Extension
PO Box
Additional Info
Example 1 Input Output
1051 3RD FLOOR MAHAVIR BHAWAN,SITA RAM NAGAR,NEW DELHI 200021 Recipient  
Building/Site MAHAVIR BHAWAN
Street SITA RAM NAGAR
Extension 1051 3RD FLOOR
PO Box  
Additional Info NEW DELHI 200021
Example 2 Input Output
61 SAWYAM SIDHA COLONY WEST AVEVUE ROAD ,NEW DELHI Recipient  
Building/Site SAWYAM SIDHA COLONY
Street WEST AVEVUE ROAD
Extension 61
PO Box  
Additional Info NEW DELHI
Example 3 Input Output
D-54 IST FLOOR, BATLA HOUSE, OKHLA ,NEW DELHI Recipient  
Building/Site BATLA HOUSE
Street OKHLA
Extension D-54 IST FLOOR
PO Box  
Additional Info NEW DELHI
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

The Address (Global) (v23) parse definition is now deprecated and will be removed in a future release of the QKB.

The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global).

 

City - State/Province - Postal Code
Description

The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens.

Output Tokens City
PinCode
State/Union Territory
Example Input Output
Buria Mizoram - 110025 City Buria
PinCode 110025
State/Union Territory Mizoram
Remarks  

 

City - State/Province - Postal Code (Global)
Description The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens.
Output Tokens City
State/Province
Postal Code
Additional Info
Example Input Output
Buria Mizoram - 110025 City Buria
State/Province Mizoram
Postal Code 110025
Additional Info  
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

 

Name
Description

The Name parse definition parses names of individuals into a set of tokens.

Output Tokens Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
Example 1 Input Output
Mrs. Asha A Ghelani Prefix Mrs.
Given Name Asha
Middle Name A
Family Name Ghelani
Suffix  
Title/Additional Info  
Example 2 Input Output
Jason Santosh Rebello,CTO Accenture India Prefix  
Given Name Jason
Middle Name Santosh
Family Name Rebello
Suffix  
Title/Additional Info CTO Accenture India
Example 3 Input Output
Mrs. Tina Peter D'mello Jr. Prefix Mrs.
Given Name Tina
Middle Name Peter
Family Name D'mello
Suffix Jr.
Title/Additional Info  
Remarks  

 

Name (Matching)
Description The Name (Matching) parse definition parses names of individuals into a set of tokens.
Output Tokens Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
Example 1 Input Output
Mrs. Asha A Ghelani Prefix Mrs.
Given Name Asha A
Middle Name  
Family Name Ghelani
Suffix  
Title/Additional Info  
Example 2 Input Output
Jason Santosh Rebello,CTO Accenture India Prefix  
Given Name Jason Santosh
Middle Name
Family Name Rebello
Suffix  
Title/Additional Info CTO Accenture India
Example 3 Input Output
Mrs. Tina Peter D'mello Jr. Prefix Mrs.
Given Name Tina Peter
Middle Name  
Family Name D'mello
Suffix Jr.
Title/Additional Info  
Remarks This definition is currently used by the Name match definition. All given names are parsed into the Given Name token, and the Middle Name token is never used.
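
The relationship between the Name and Name (Matching) outputs in the examples above can be sketched in Python as follows. This is only an illustration of the token layout, assuming a parse result is held in a dictionary keyed by token name; `to_matching_tokens` is a hypothetical helper and not how the QKB produces its results.

```python
from typing import Dict


def to_matching_tokens(name_tokens: Dict[str, str]) -> Dict[str, str]:
    """Fold Middle Name into Given Name, leaving Middle Name empty."""
    merged = dict(name_tokens)  # copy so the input is not modified
    parts = (name_tokens.get("Given Name", ""), name_tokens.get("Middle Name", ""))
    merged["Given Name"] = " ".join(part for part in parts if part)
    merged["Middle Name"] = ""
    return merged


# Token values taken from Example 1 of the Name parse definition.
name_parse = {
    "Prefix": "Mrs.",
    "Given Name": "Asha",
    "Middle Name": "A",
    "Family Name": "Ghelani",
    "Suffix": "",
    "Title/Additional Info": "",
}
print(to_matching_tokens(name_parse)["Given Name"])  # Asha A
```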

 

Name (with Extensions)
Description The Name (with Extensions) parse definition parses names of individuals into a set of tokens.
Output Tokens Prefix
Extension Initial 1
Extension Initial 2
Extension Initial 3
Extension Initial 4
Given Name
Given Name Extension
Middle Name
Middle Name Extension
Family Name
Suffix
Title/Additional Info
Example 1 Input Output
Mr. Yogesh Kumar Mahesh Kumar Chabbaria, President Prefix Mr.
Extension Initial 1  
Extension Initial 2  
Extension Initial 3  
Extension Initial 4  
Given Name Yogesh
Given Name Extension Kumar
Middle Name Mahesh
Middle Name Extension Kumar
Family Name Chabbaria
Suffix  
Title/Additional Info President
Example 2 Input Output
Mrs. F D Devi Pratap Rajput Prefix Mrs.
Extension Initial 1 F
Extension Initial 2 D
Extension Initial 3  
Extension Initial 4  
Given Name Devi
Given Name Extension  
Middle Name Pratap
Middle Name Extension  
Family Name Rajput
Suffix  
Title/Additional Info  
Example 3 Input Output
Mr. V V S Laxman Narayan Nayar, CEO ICICI Prefix Mr.
Extension Initial 1 V
Extension Initial 2 V
Extension Initial 3 S
Extension Initial 4  
Given Name Laxman
Given Name Extension  
Middle Name Narayan
Middle Name Extension  
Family Name Nayar
Suffix  
Title/Additional Info CEO ICICI
Remarks

 

 

Name (Global)
Description The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens.
Output Tokens Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
Example 1 Input Output
Mr. Dinkar Sathe Prefix Mr.
Given Name Dinkar
Middle Name  
Family Name Sathe
Suffix  
Title/Additional Info  
Example 2 Input Output
Mr. Deepak Keshavji Chheda Prefix Mr.
Given Name Deepak
Middle Name Keshavji
Family Name Chheda
Suffix  
Title/Additional Info  
Example 3 Input Output
Mrs. Dyna Peter D'souza, Vice President Prefix Mrs.
Given Name Dyna
Middle Name Peter
Family Name D'souza
Suffix  
Title/Additional Info Vice President
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

 

Phone
Description

The Phone parse definition parses phone numbers into a set of tokens.

Output Tokens Prefix
Country Code
Area Code
Base Number/Cell Number
Extension Word
Extension
Suffix
Additional Info
Example 1 Input Output
+91-2027290597 x 211 Prefix  
Country Code +91
Area Code 20
Base Number/Cell Number 27290597
Extension Word x
Extension 211
Suffix  
Additional Info  
Example 2 Input Output
6422381/6431746 Prefix  
Country Code  
Area Code  
Base Number/Cell Number 6422381
Extension Word  
Extension  
Suffix  
Additional Info 6431746
Remarks  

 

Phone (Global)
Description The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens.
Output Tokens Country Code
Area Code
Base Number
Extension
Line Type
Additional Info
Example Input Output
+91-2027290597 x 211 Country Code +91
Area Code 20
Base Number 27290597
Extension 211
Line Type  
Additional Info  
Remarks Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

Pattern Analysis Definitions

None.

Standardization Definitions

Address
Description The Address standardization definition standardizes addresses.
Examples Input Output
B/8 Kapol Soc Marve Rd Malad Mumbai B/8, Kapol Society, Marve Road, Malad, Mumbai
B/8 Jimit Apt Marve Road Bangur Ngr Goregaon East B/8, Jimit Apartment, Marve Road, Bangur Nagar, Goregaon East
Remarks This standardization definition performs context-specific transformations based on an internal parse of the input data string. Successful parsing is vital to achieve accurate results for the standardization. However, the complexity of Indian addresses makes parsing difficult in many cases. Therefore, we recommend the following method for standardizing Indian addresses:

1. Parse address strings using the Address Parse definition. Store the result code for the parse along with the parse output.

2. Separate the parsed output in two branches. One branch should contain the output for successful parses (result code OK). The other branch should contain the output for unsuccessful parses (result codes NO SOLUTION and ABANDONED).

3. Standardize the successfully parsed records using the Address Standardization definition. (It is recommended that you input the pre-parsed token values into the Standardization definition to avoid a time-consuming internal parse.)

4. Standardize the records in the second branch using the Address (Generic) Standardization definition.

5. Merge the standardized results from each branch.

For further information about address parsing, please review the documentation for the Parse definition for Address (Remarks section).
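
The five steps recommended above can be sketched in Python as a routing function. The three callables (`parse`, `standardize_parsed`, `standardize_generic`) are hypothetical placeholders for calls to the Address parse definition, the Address standardization definition with pre-parsed tokens, and the Address (Generic) standardization definition; their names and signatures are assumptions, not part of any SAS API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ParseResult = Tuple[str, Dict[str, str]]  # (result code, parsed tokens)


def standardize_addresses(
    addresses: Iterable[str],
    parse: Callable[[str], ParseResult],
    standardize_parsed: Callable[[Dict[str, str]], str],
    standardize_generic: Callable[[str], str],
) -> List[str]:
    """Route each address by parse result code and merge in input order."""
    results: List[str] = []
    for address in addresses:
        result_code, tokens = parse(address)               # step 1
        if result_code == "OK":                            # step 2: successful branch
            results.append(standardize_parsed(tokens))     # step 3
        else:                                              # NO SOLUTION or ABANDONED
            results.append(standardize_generic(address))   # step 4
    return results                                         # step 5: merged output


# Hypothetical stubs, for demonstration only.
def demo_parse(address: str) -> ParseResult:
    if "Road" in address:
        return "OK", {"Street": address}
    return "NO SOLUTION", {}


demo = standardize_addresses(
    ["Marve Road", "???"],
    parse=demo_parse,
    standardize_parsed=lambda tokens: "parsed:" + tokens["Street"],
    standardize_generic=lambda text: "generic:" + text,
)
print(demo)  # ['parsed:Marve Road', 'generic:???']
```

Passing the pre-parsed tokens to `standardize_parsed` reflects the recommendation in step 3 to avoid a second, time-consuming internal parse.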

 

Address (Generic)
Description The Address (Generic) standardization definition standardizes addresses.
Examples Input Output
B/8 Kapol Soc Marve Rd Malad Mumbai B/8 Kapol Society Marve Road Malad Mumbai
B/8 Jimit Apt Marve Road Bangur Ngr Goregaon East B/8 Jimit Apartment Marve Road Bangur Nagar Goregaon East
Remarks This standardization definition for Address (Generic) performs only simple transformations; it does not attempt complex transformations based on internal parsing. It is recommended for use when standardizing addresses that cannot be parsed successfully. For more information, refer to the Standardization definition for Address (Remarks section).

In some cases, if changing the data before parsing is appropriate, you may get better parse results if you standardize the data using this definition before doing the parse.

 

City
Description

The City standardization definition standardizes city names.

Examples Input Output
Bombay Mumbai
Calcutta Kolkatta
Remarks Common city abbreviations are expanded into full names.

 

City - State/Province - Postal Code
Description The City - State/Province - Postal Code standardization definition standardizes last line address information.
Examples Input Output
Faridabad Uttaranchal - 110034 Faridabad, 110034, Uttaranchal
Remarks  

 

Name
Description The Name standardization definition standardizes names of individuals.
Examples Input Output
CEO Nishit V Shah Nishit V Shah, CEO
Professor Ranjan Verma Prof Ranjan Verma
Remarks  

 

Phone
Description The Phone standardization definition standardizes phone numbers for domestic use.
Examples Input Output
91 22-6983000 +91-22-6983000
911452572 ext. 45 +91-14-52572 X 45
Remarks  

 

PinCode
Description The PinCode standardization definition standardizes postal codes.
Example Input Output
PO411006 PO 411006
Remarks  

 

Postal Code
Description The Postal Code standardization definition standardizes postal codes.
Example Input Output
PO411006 PO 411006
Remarks This is the same behavior as PinCode. This Standardization definition allows you to invoke postal code standardization across locales with a single definition call, because a definition with this name is implemented on all locales.

 

State
Description The State standardization definition standardizes state and union territory names.
Examples Input Output
A.P. Andhra Pradesh
nicobar Andaman and Nicobar Islands
Remarks  

Inherited Definitions

In addition to the definitions listed on this page, the English, India locale also inherits all definitions for the English language and all Global definitions.