SAS Quality Knowledge Base for Contact Information 25
Definitions for the Chinese, China locale are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (Name) | ||
---|---|---|
Description | Propercases names written in the Latin alphabet. | |
Input | Output | |
Examples | XIANG LI CHEN | Xiang Li Chen |
(赛仕总经理)LIDAWEI | (赛仕总经理)Lidawei | |
Remarks |
Upper (Address) | ||
---|---|---|
Description | Uppercases Latin characters found in address data. | |
Input | Output | |
Examples | 益田新村106栋22g | 益田新村106栋22G |
黄兴路2005弄1号23号楼a座 | 黄兴路2005弄1号23号楼A座 | |
cbd商务外环路1号蓝码地王大厦3001单元 | CBD商务外环路1号蓝码地王大厦3001单元 | |
Remarks |
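The behavior in the examples above can be approximated in a few lines of Python. This is only an illustrative sketch, not the QKB's implementation: the function name `upper_latin` is hypothetical, and it relies on the fact that Chinese characters and digits have no case, so Python's built-in `str.upper()` leaves them untouched.

```python
def upper_latin(address: str) -> str:
    """Uppercase cased letters in an address string.

    Chinese characters and digits have no uppercase form, so they pass
    through unchanged. Hypothetical helper, not part of the QKB.
    """
    return address.upper()


if __name__ == "__main__":
    print(upper_latin("益田新村106栋22g"))
```

Note that `str.upper()` also affects cased letters outside the Latin alphabet, which the definition above does not describe.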
Upper (Organization) | ||
---|---|---|
Description | Uppercases Latin characters found in organization names. Well-known words are propercased where appropriate. | |
Input | Output | |
Examples | 上海mwb互感器有限公司 | 上海MWB互感器有限公司 |
沃尔玛深国投百货有限公司成都sm广场分店 | 沃尔玛深国投百货有限公司成都SM广场分店 | |
Remarks | Certain well-known company names are propercased. |
ID Number | ||
---|---|---|
Description | Determines the gender associated with an ID number. | |
Possible Outputs | M F U |
|
Input | Output | |
Examples | 130503196704010012 | M |
330108198503179268 | F | |
0174685503(D) | U | |
Remarks | Gender is determined from the sequence code within the ID number. |
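The rule in the remark can be sketched as follows, assuming the 18-character resident identity number layout, in which positions 15-17 hold the sequence code and the last digit of that code is odd for males and even for females. The function name `gender_from_id` is hypothetical, and how the QKB treats other ID formats or validates the check character is not described here, so this sketch simply returns `U` for anything that is not an 18-character number.

```python
def gender_from_id(id_number: str) -> str:
    """Return 'M', 'F', or 'U' for a resident ID number (illustrative only)."""
    s = id_number.strip()
    # Positions 1-17 must be digits; position 18 is a check character
    # (a digit or X) and is not inspected in this sketch.
    if len(s) != 18 or not s[:17].isdigit():
        return "U"
    # Position 17 is the last digit of the sequence code:
    # odd means male, even means female.
    return "M" if int(s[16]) % 2 == 1 else "F"


if __name__ == "__main__":
    for value in ("130503196704010012", "330108198503179268", "0174685503(D)"):
        print(value, gender_from_id(value))
```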
Individual/Organization | ||
---|---|---|
Description | Determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | INDIVIDUAL ORGANIZATION UNKNOWN |
|
Input | Output | |
Examples | 张晓东 | INDIVIDUAL |
李大伟 赛仕(北京) | INDIVIDUAL | |
司徒怀(先生) | INDIVIDUAL | |
深圳海王药业有限公司 | ORGANIZATION | |
李宁 | INDIVIDUAL | |
李宁有限公司 | ORGANIZATION | |
曲 | UNKNOWN | |
Remarks |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 237 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
人民北路群星广场开元大厦A单元2层2306室(底商) | 1 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 2 | |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 3 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 4 | |
Remarks | All components of the address are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
人民北路群星广场开元大厦A单元2层2306室(底商) | 1 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 1 | |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 2 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 3 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 4 | |
Remarks | Street, Block/Lane, Building, Unit, Floor, and Room are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 1 |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 1 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 2 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 3 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 4 | |
Remarks | Street, Block/Lane, Building, Unit, and Floor are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 1 |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 2 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Street, Block/Lane, Building, and Unit are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 1 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 2 | |
人民北路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Street, Block/Lane, and Building are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
人民公社北路群星广场开元大厦B单元3层2307室(商铺) | 1 |
人民公社北路群星广场开天大厦B单元3层2307室(商铺) | 1 | |
人民公社南路群星广场开天大厦B单元3层2307室(商铺) | 2 | |
人民公社北路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
人民公社南路群众广场开天大厦B单元3层2307室(商铺) | 4 | |
Remarks | Street and Block/Lane are evaluated. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 7 Sensitivities 50-69 |
人民公社北路群星广场开元大厦B单元3层2307室(商铺) | 1 |
人民公社南路群星广场开天大厦B单元3层2307室(商铺) | 1 | |
人民公社北路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
人民公社南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
Remarks | Fewer characters in Street and the same characters in Block/Lane are evaluated, because Block/Lane occurs more often than Street in the sample data. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. |
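The way cluster IDs follow from the components evaluated at each sensitivity can be sketched in Python. This is a simplified illustration, not the QKB's match-code algorithm: the addresses are parsed by hand into tuples, the names `KEPT`, `address_key`, and `cluster_ids` are hypothetical, and bands below 70 are left out because they compare partial characters rather than whole components.

```python
# Components in the order the remarks list them:
# (street, block/lane, building, unit, floor, room, additional info)

# Assumed mapping: lowest sensitivity of a band -> leading components compared.
KEPT = [(95, 7), (90, 6), (85, 5), (80, 4), (75, 3), (70, 2)]


def address_key(parsed: tuple, sensitivity: int) -> tuple:
    """Keep only the components that are evaluated at this sensitivity."""
    for lower, count in KEPT:
        if sensitivity >= lower:
            return parsed[:count]
    raise ValueError("bands below 70 are not modeled in this sketch")


def cluster_ids(keys) -> list:
    """Number clusters in order of first appearance; equal keys share an ID."""
    seen = {}
    return [seen.setdefault(key, len(seen) + 1) for key in keys]


# Hand-parsed rows from Example 2 (sensitivities 90-94).
ROWS = [
    ("人民北路", "群星广场", "开元大厦", "A单元", "2层", "2306室", "底商"),
    ("人民北路", "群星广场", "开元大厦", "A单元", "2层", "2306室", "商铺"),
    ("人民北路", "群星广场", "开元大厦", "A单元", "2层", "2307室", "商铺"),
    ("人民北路", "群星广场", "开元大厦", "A单元", "3层", "2307室", "商铺"),
    ("人民北路", "群星广场", "开元大厦", "B单元", "3层", "2307室", "商铺"),
]

if __name__ == "__main__":
    print(cluster_ids(address_key(row, 90) for row in ROWS))
```

Records that produce the same key at a given sensitivity land in the same cluster, which is how the Cluster ID columns in the examples should be read.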
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 223 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2306室(底商) 123456 | 1 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2306室(商铺) 123456 | 1 | |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2306室(商铺) 邮编:123456 | 1 | |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2306室(商铺) 100052 | 2 | |
Remarks | All components of the address are evaluated except for additional info. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2306室(商铺) 100052 | 1 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2307室(商铺) 100052 | 1 | |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元3层2307室(商铺) | 2 | |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦B单元3层2307室(商铺) | 3 | |
Remarks | Additional info, postal code, and room info are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元2层2307室(商铺) | 1 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 | |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦B单元3层2307室(商铺) | 2 | |
广东省深圳市宝安区西乡镇人民北路群星广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Additional info, postal code, room, and floor info are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦B单元3层2307室(商铺) | 1 | |
广东省深圳市宝安区西乡镇人民北路群星广场开天大厦B单元3层2307室(商铺) | 2 | |
广东省深圳市宝安区西乡镇人民北路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Additional info, postal code, room, floor, and unit info are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
广东省深圳市宝安区西乡镇人民北路群星广场开元大厦B单元3层2307室(商铺) | 1 |
广东省深圳市宝安区西乡镇人民北路群星广场开天大厦B单元3层2307室(商铺) | 1 | |
广东省深圳市宝安区西乡镇人民北路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
广东省深圳市宝安区西乡镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Additional info, postal code, room, floor, unit, and building info are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
广东省深圳市宝安区西乡镇人民北路群星广场开天大厦B单元3层2307室(商铺) | 1 |
广东省深圳市宝安区西乡镇人民北路群众广场开天大厦B单元3层2307室(商铺) | 1 | |
广东省深圳市宝安区西乡镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
广东省深圳市宝安区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Only province, city, district/prefecture/county, town/village, and street info are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
广东省深圳市宝安区西乡镇人民北路群众广场开天大厦B单元3层2307室(商铺) | 1 |
广东省深圳市宝安区西乡镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 | |
广东省深圳市宝安区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
广东省深圳市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Only province, city, district/prefecture/county, and town/village info are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
广东省深圳市宝安区西乡镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 |
广东省深圳市宝安区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 | |
广东省深圳市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
广东省中山市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Only province, city, and district/prefecture/county info are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 9 Sensitivities 55-59 |
广东省深圳市宝安区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 |
广东省深圳市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 | |
广东省中山市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
广西省中山市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Only province and city info are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 10 Sensitivities 50-54 |
广东省深圳市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 |
广东省中山市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 1 | |
广西省中山市南山区西风镇人民南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
Remarks | Only province info is evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. |
Address (PO Box Only) | ||
---|---|---|
Description | The Address (PO Box Only) match definition generates match codes which can be used to cluster records containing the PO Box portion of an address. | |
Max Length of Match Code | 23 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
北京国际邮电局邮政信箱“100600-9082”号 | 1 |
北京国际邮电局邮政信箱第100600-9082号 | 1 | |
北京邮政信箱100600-9082号 | 1 | |
北京邮政一零零六零零-九零八二信箱 | 1 | |
北京邮政100600-9082信箱 | 1 | |
北京邮政100600-9010信箱 | 2 | |
北京邮政100600-9999信箱 | 3 | |
Remarks | The first 13 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
北京邮政100600-9082信箱 | 1 |
北京邮政100600-9010信箱 | 1 | |
北京邮政100600-9999信箱 | 2 | |
北京邮政100600-8888信箱 | 3 | |
Remarks | The first 9 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
北京邮政100600-9010信箱 | 1 |
北京邮政100600-9999信箱 | 1 | |
北京邮政100600-8888信箱 | 2 | |
北京邮政1006007777信箱 | 3 | |
Remarks | The first 8 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
北京邮政100600-9999信箱 | 1 |
北京邮政100600-8888信箱 | 1 | |
北京邮政1006007777信箱 | 2 | |
北京邮政100606-6666信箱 | 3 | |
Remarks | The first 7 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
北京邮政100600-8888信箱 | 1 |
北京邮政1006007777信箱 | 1 | |
北京邮政100606-6666信箱 | 2 | |
北京邮政100655-5555信箱 | 3 | |
Remarks | The first 6 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
北京邮政1006007777信箱 | 1 |
北京邮政100606-6666信箱 | 1 | |
北京邮政100655-5555信箱 | 2 | |
北京邮政100444-4444信箱 | 3 | |
Remarks | The first 5 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
北京邮政100606-6666信箱 | 1 |
北京邮政100655-5555信箱 | 1 | |
北京邮政100444-4444信箱 | 2 | |
北京邮政103333-3333信箱 | 3 | |
Remarks | The first 4 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
北京邮政100655-5555信箱 | 1 |
北京邮政100444-4444信箱 | 1 | |
北京邮政103333-3333信箱 | 2 | |
北京邮政122222-2222信箱 | 3 | |
Remarks | The first 3 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 9 Sensitivities 55-59 |
北京邮政100444-4444信箱 | 1 |
北京邮政103333-3333信箱 | 1 | |
北京邮政122222-2222信箱 | 2 | |
北京邮政211111-1111信箱 | 3 | |
京211111-1111信箱 | 3 | |
天津100101-88信箱 | 4 | |
Remarks | The first 2 digits of the PO Box number and the first 2 characters of the city are evaluated. | |
Input | Cluster ID | |
Example 10 Sensitivities 50-54 |
北京邮政103333-3333信箱 | 1 |
北京邮政122222-2222信箱 | 1 | |
北京邮政211111-1111信箱 | 2 | |
京211111-1111信箱 | 2 | |
天津100101-88信箱 | 3 | |
津100101-88信箱 | 3 | |
上海100101-88信箱 | 4 | |
Remarks | The first digit of the PO Box number and the first 2 characters of the city are evaluated. |
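The prefix behavior across these examples can be sketched in Python. This is a rough illustration under several assumptions, not the QKB's algorithm. The names `CN_DIGITS`, `PREFIX_LEN`, and `po_box_key` are hypothetical. The lengths 6 and 5 for the 75-79 and 70-74 bands are inferred from Examples 5 and 6. The examples appear to count the hyphen as one position, so the sketch truncates characters of the box number rather than digits only. The city is taken naively as the first two characters of the input, so abbreviations such as 京 for 北京 are not handled.

```python
import re

# Map Chinese numerals to ASCII digits so both spellings compare equal.
CN_DIGITS = str.maketrans("零一二三四五六七八九", "0123456789")

# Assumed mapping: lowest sensitivity of a band -> leading box-number characters kept.
PREFIX_LEN = [(95, 13), (90, 9), (85, 8), (80, 7), (75, 6),
              (70, 5), (65, 4), (60, 3), (55, 2), (50, 1)]


def po_box_key(text: str, sensitivity: int) -> str:
    """Build a comparison key from a PO Box string (illustrative only)."""
    normalized = text.translate(CN_DIGITS)
    # Take the first run of digits, optionally followed by "-" and more digits.
    match = re.search(r"[0-9]+(?:-[0-9]+)?", normalized)
    if match is None:
        raise ValueError("no box number found")
    box = match.group()
    for lower, length in PREFIX_LEN:
        if sensitivity >= lower:
            return text[:2] + "|" + box[:length]
    raise ValueError("sensitivity must be at least 50")


if __name__ == "__main__":
    print(po_box_key("北京邮政100600-9082信箱", 90))
```

Two inputs cluster together at a given sensitivity when their keys are equal.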
Address (Street Only) | ||
---|---|---|
Description | The Address (Street Only) match definition generates match codes which can be used to cluster records containing the street portion of an address. Because addresses containing PO Box information are rare in mainland China, this definition is a copy of the Address match definition. | |
Max Length of Match Code | 237 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
人民北路群星广场开元大厦A单元2层2306室(底商) | 1 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 2 | |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 3 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 4 | |
Remarks | All components of the address are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
人民北路群星广场开元大厦A单元2层2306室(底商) | 1 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 1 | |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 2 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 3 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 4 | |
Remarks | Street, Block/Lane, Building, Unit, Floor, and Room are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
人民北路群星广场开元大厦A单元2层2306室(商铺) | 1 |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 1 | |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 2 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 3 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 4 | |
Remarks | Street, Block/Lane, Building, Unit, and Floor are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
人民北路群星广场开元大厦A单元2层2307室(商铺) | 1 |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 | |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 2 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Street, Block/Lane, Building, and Unit are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
人民北路群星广场开元大厦A单元3层2307室(商铺) | 1 |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 1 | |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 2 | |
人民北路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Street, Block/Lane, and Building are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
人民北路群星广场开元大厦B单元3层2307室(商铺) | 1 |
人民北路群星广场开天大厦B单元3层2307室(商铺) | 1 | |
人民北路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
人民南路群众广场开天大厦B单元3层2307室(商铺) | 3 | |
Remarks | Street and Block/Lane are evaluated. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 7 Sensitivities 50-69 |
人民幸福北路群星广场开天大厦B单元3层2307室(商铺) | 1 |
人民幸福北路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
人民幸福南路群众广场开天大厦B单元3层2307室(商铺) | 2 | |
Remarks | Fewer characters in Street and the same characters in Block/Lane are evaluated, because Block/Lane occurs more often than Street in the sample data. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. |
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 85 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
乌鲁木齐市克孜勒苏柯尔克孜自治州 | 1 |
乌鲁木齐市克孜勒苏柯尔克孜地区 | 2 | |
昆明市西双版纳傣族自治州 | 3 | |
昆明市西双版纳傣族地区 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 11 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
乌鲁木齐市克孜勒苏柯尔克孜自治州 | 1 |
乌鲁木齐市克孜勒苏柯尔克孜地区 | 1 | |
昆明市西双版纳傣族自治州 | 2 | |
昆明市西双版纳傣族地区 | 3 | |
呼和浩特市阿拉善左旗 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 8 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
乌鲁木齐市克孜勒苏柯尔克孜自治州 | 1 |
乌鲁木齐市克孜勒苏柯尔克孜地区 | 1 | |
昆明市西双版纳傣族自治州 | 2 | |
昆明市西双版纳傣族地区 | 2 | |
呼和浩特市阿拉善左旗 | 3 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 6 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
乌鲁木齐市克孜勒苏柯尔克孜自治州 | 1 |
乌鲁木齐市克孜勒苏柯尔克孜地区 | 1 | |
昆明市西双版纳傣族自治州 | 2 | |
昆明市西双版纳傣族地区 | 2 | |
呼和浩特市阿拉善左旗 | 3 | |
呼和浩特市阿拉善右旗 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 4 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
昆明市西双版纳傣族地区 | 1 |
呼和浩特市阿拉善左旗 | 2 | |
呼和浩特市阿拉善右旗 | 2 | |
北京市昌平区 | 3 | |
北京市宣武区 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 3 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
昆明市西双版纳傣族地区 | 1 |
呼和浩特市阿拉善左旗 | 2 | |
呼和浩特市阿拉善右旗 | 2 | |
北京市昌平区 | 3 | |
北京市宣武区 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 2 Chinese characters of the district/prefecture/county and the first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
昆明市西双版纳傣族地区 | 1 |
呼和浩特市阿拉善左旗 | 2 | |
呼和浩特市阿拉善右旗 | 2 | |
北京市昌平区 | 3 | |
北京市宣武区 | 3 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 6 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
北京市昌平区 | 1 |
北京市宣武区 | 1 | |
重庆市合川市 | 2 | |
重庆市涪陵区 | 3 | |
张家口市桥西区 | 4 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 4 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 9 Sensitivities 55-59 |
北京市昌平区 | 1 |
北京市宣武区 | 1 | |
重庆市合川市 | 2 | |
重庆市涪陵区 | 2 | |
张家口市桥西区 | 3 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 3 Chinese characters of the city are evaluated. | |
Input | Cluster ID | |
Example 10 Sensitivities 50-54 |
北京市昌平区 | 1 |
北京市宣武区 | 1 | |
重庆市合川市 | 2 | |
重庆市涪陵区 | 2 | |
张家口市桥西区 | 3 | |
Beijing | 5 | |
北京 | 5 | |
Remarks | The first 2 Chinese characters of the city are evaluated. |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 36 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
福建省泉州市丰田572000 | 1 |
福建省泉州市丰泽572000 | 1 | |
福建省泉州市丰泽572001 | 2 | |
福建省泉州市丰泽572012 | 3 | |
Remarks | Province, City, and Postal Code are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
福建省泉州市丰泽572000 | 1 |
福建省泉州市丰泽572001 | 1 | |
福建省泉州市丰泽572012 | 2 | |
福建省泉州市丰泽572123 | 3 | |
Remarks | Province, City, and the first 5 digits of Postal Code are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
福建省泉州市丰泽572001 | 1 |
福建省泉州市丰泽572012 | 1 | |
福建省泉州市丰泽572123 | 2 | |
福建省泉州市丰泽573234 | 3 | |
Remarks | Province, City, and the first 4 digits of Postal Code are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
福建省泉州市丰泽572012 | 1 |
福建省泉州市丰泽572123 | 1 | |
福建省泉州市丰泽573234 | 2 | |
福建省泉州市丰泽584345 | 3 | |
Remarks | Province, City, and the first 3 digits of Postal Code are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
福建省泉州市丰泽572123 | 1 |
福建省泉州市丰泽573234 | 1 | |
福建省泉州市丰泽584345 | 2 | |
福建省泉州市丰泽FR-12345 | 3 | |
Remarks | Province, City, and the first 2 digits of Postal Code are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
福建省泉州市丰泽573234 | 1 |
福建省泉州市丰泽584345 | 1 | |
福建省泉州市丰泽FR-12345 | 2 | |
福建省泉州市丰泽12345 | 2 | |
Remarks | Province, City, and the first digit of Postal Code are evaluated. | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
福建省泉州市丰泽012345 | 1 |
福建省厦门市朝阳234567 | 2 | |
厦门市朝阳234567 | 3 | |
广东省深圳市福田345678 | 4 | |
Remarks | Only Province and City are evaluated. | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
福建省泉州市丰泽012345 | 1 |
福建省厦门市朝阳234567 | 2 | |
厦门市朝阳234567 | 2 | |
广东省深圳市福田345678 | 3 | |
Remarks | Only City is evaluated. |
Date | ||
---|---|---|
Description | The Date match definition generates match codes which can be used to cluster records containing date information. | |
Max Length of Match Code | 15 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 85-100 |
2009-10-21 | 1 |
2009年10月21日 | 1 | |
2009/10/21 | 1 | |
10/21/2009 | 1 | |
21-Oct-09 | 1 | |
二零零九年十月二十一日 | 1 | |
2009/10/22 | 2 | |
2009/10/23 | 3 | |
Remarks | All digits of the year, month, and day are evaluated. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. Names of months match the corresponding digits that represent those months. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-84 |
2009/10/21 | 1 |
2009/10/22 | 1 | |
2009/10/12 | 2 | |
Remarks | All digits of the year and month are evaluated. Only one digit of the day is evaluated. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. Names of months match the corresponding digits that represent those months. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 3 Sensitivities 75-79 |
2009/10/15 | 1 |
2009/10/22 | 1 | |
2009/11/12 | 2 | |
Remarks | All digits of the year and month are evaluated. The day is ignored. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. Names of months match the corresponding digits that represent those months. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 4 Sensitivities 70-74 |
2009/10/15 | 1 |
2009/11/22 | 1 | |
2009/09/12 | 2 | |
Remarks | All digits of the year are evaluated. Only one digit of the month is evaluated. The day is ignored. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. Names of months match the corresponding digits that represent those months. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 5 Sensitivities 65-69 |
2009/10/15 | 1 |
2009/11/22 | 1 | |
2008/09/12 | 2 | |
Remarks | All digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 6 Sensitivities 60-64 |
2009/10/15 | 1 |
2008/11/22 | 1 | |
2012/09/12 | 2 | |
Remarks | Only the first 3 digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. Names of months match the corresponding digits that represent those months. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. | |
Input | Cluster ID | |
Example 7 Sensitivities 50-59 |
2009/10/15 | 1 |
2012/11/22 | 1 | |
1990/09/12 | 2 | |
Remarks | Only the first 2 digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Chinese numerals and Arabic numerals match. Any separators (including Chinese characters) match. When the day and year are ambiguous, it is assumed that the last number is the year. It is assumed that two-digit sequences in the range 00-29 represent years in the range 2000-2029. It is assumed that two-digit sequences in the range 30-99 represent the years 1930-1999. |
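Two rules from the Date remarks lend themselves to a short Python sketch: the two-digit year pivot, and the way lower sensitivities compare fewer leading digits of the date. This is an illustration only. The names `expand_year`, `KEY_LEN`, and `date_key` are hypothetical, the `YYYYMMDD` key is a stand-in for the real match code (whose encoding is not documented here), and "one digit" of the day or month is assumed to mean the first digit, which agrees with Examples 2 and 4.

```python
def expand_year(two_digit: int) -> int:
    """Expand a two-digit year: 00-29 -> 2000-2029, 30-99 -> 1930-1999."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + two_digit if two_digit <= 29 else 1900 + two_digit


# Assumed mapping: lowest sensitivity of a band -> leading characters of YYYYMMDD compared.
KEY_LEN = [(85, 8), (80, 7), (75, 6), (70, 5), (65, 4), (60, 3), (50, 2)]


def date_key(year: int, month: int, day: int, sensitivity: int) -> str:
    """Truncate a YYYYMMDD string to the part evaluated at this sensitivity."""
    full = f"{year:04d}{month:02d}{day:02d}"
    for lower, length in KEY_LEN:
        if sensitivity >= lower:
            return full[:length]
    raise ValueError("sensitivity must be at least 50")


if __name__ == "__main__":
    # "21-Oct-09": the last number is taken as the year and expanded.
    print(date_key(expand_year(9), 10, 21, 85))
```

Parsing the different input notations (separators, month names, Chinese numerals) is deliberately left out of this sketch.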
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 21 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
李友琴先生 | 1 |
李友琴 | 1 | |
李友勤女士 | 2 | |
黎友琴(总经理) | 3 | |
LIYOUQIN | 4 | |
Remarks | The family name and given name are evaluated. The 5-bit pinyin code to the Family Name token and the 5-bit pinyin code to the Given Name token are applied. The first 11 digits for family name pinyin code and the 12th-21st digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 85-89 |
李期勤 | 1 |
李期 | 2 | |
李奇 | 3 | |
黎友 | 4 | |
Remarks | The family name and given name are evaluated. The 5-bit pinyin code to the Family Name token and the 5-bit pinyin code to the Given Name token are applied. The first 10 digits for family name pinyin code and the 12th-21st digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 80-84 |
李友琴 | 1 |
李友勤 | 1 | |
黎友琴 | 2 | |
LIYOUQIN | 3 | |
黎又青 | 4 | |
Remarks | The family name and given name are evaluated. The 5-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 9 digits for family name pinyin code and the 12th-17th digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 75-79 |
黎友琴 | 1 |
黎又青 | 1 | |
李期勤 | 2 | |
李期 | 3 | |
黎友 | 4 | |
Remarks | The family name and given name are evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 8 digits for family name pinyin code and the 12th-16th digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 70-74 |
李期 | 1 |
李奇 | 1 | |
黎友 | 2 | |
欧阳修 | 3 | |
欧阳休期 | 4 | |
Remarks | The family name and given name are evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 7 digits for family name pinyin code and the 12th-15th digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 65-69 |
李期勤 | 1 |
李期 | 1 | |
李奇 | 1 | |
欧阳修 | 2 | |
欧阳休期 | 2 | |
Remarks | The family name and given name are evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 6 digits for family name pinyin code and the 12th-14th digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 7 Sensitivities 60-64 |
李友琴 | 1 |
李友勤 | 1 | |
黎友 | 1 | |
李期勤 | 2 | |
李奇 | 2 | |
Remarks | The family name and given name are evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 5 digits for family name pinyin code and the 12th-13th digits for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 8 Sensitivities 55-59 |
李奇 | 1 |
黎友 | 2 | |
Remarks | The family name and given name are evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 4 digits for family name pinyin code and the 12th digit for given name pinyin code are evaluated. | |
Input | Cluster ID | |
Example 9 Sensitivities 50-54 |
李奇 | 1 |
黎友 | 1 | |
Remarks | Only the family name is evaluated. The 3-bit pinyin code to the Family Name token and the 3-bit pinyin code to the Given Name token are applied. The first 3 digits for family name pinyin code are evaluated. |
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 100 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 2 | |
中国石化 | 3 | |
中国石油化工股份有限公司 | 3 | |
西门子医疗设备有限公司广东分公司 | 4 | |
广东西门子医疗设备有限公司 | 5 | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 2 | |
中国石化 | 3 | |
中国石油化工股份有限公司 | 3 | |
广东西门子医疗设备有限公司 | 4 | |
西门子医疗设备有限公司广东分公司 | 4 | |
西门子医疗设备有限公司广东省分公司 | 5 | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 2 | |
中国石化 | 3 | |
中国石油化工股份有限公司 | 3 | |
广东西门子医疗设备有限公司 | 4 | |
西门子医疗设备有限公司广东分公司 | 4 | |
西门子医疗设备有限公司广东省分公司 | 4 | |
Input | Cluster ID | |
Example 4 Sensitivities 75-84 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 1 | |
中国石化 | 2 | |
中国石油化工股份有限公司 | 2 | |
广东西门子医疗设备有限公司 | 3 | |
西门子医疗设备有限公司广东分公司 | 3 | |
西门子医疗设备有限公司广东省分公司 | 3 | |
上海西门子医疗设备有限公司广东分公司 | 4 | |
Input | Cluster ID | |
Example 5 Sensitivities 65-74 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 1 | |
中国石化 | 2 | |
中国石油化工股份有限公司 | 2 | |
广东西门子医疗设备有限公司 | 3 | |
西门子医疗设备有限公司广东分公司 | 3 | |
西门子医疗设备有限公司广东省分公司 | 3 | |
上海西门子医疗设备有限公司广东分公司 | 3 | |
天津三星视界移动有限公司 | 4 | |
天津三星视界有限公司 | 5 | |
Input | Cluster ID | |
Example 6 Sensitivities 55-64 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 1 | |
中国石化 | 2 | |
中国石油化工股份有限公司 | 2 | |
广东西门子医疗设备有限公司 | 3 | |
西门子医疗设备有限公司广东分公司 | 3 | |
西门子医疗设备有限公司广东省分公司 | 3 | |
上海西门子医疗设备有限公司广东分公司 | 3 | |
天津三星视界移动有限公司 | 4 | |
天津三星视界有限公司 | 4 | |
上海三星通信技术有限公司 | 5 | |
Input | Cluster ID | |
Example 7 Sensitivities 50-54 |
中国农业银行上海市分行 | 1 |
中国农业银行北京市分行 | 1 | |
中国石化 | 1 | |
中国石油化工股份有限公司 | 1 | |
广东西门子医疗设备有限公司 | 2 | |
西门子医疗设备有限公司广东分公司 | 2 | |
西门子医疗设备有限公司广东省分公司 | 2 | |
上海西门子医疗设备有限公司广东分公司 | 2 | |
天津三星视界移动有限公司 | 3 | |
天津三星视界有限公司 | 3 | |
上海三星通信技术有限公司 | 3 | |
Remarks | For sensitivities 85-100, name and site information are evaluated. Legal forms and additional info are ignored. For sensitivities 50-84, only name is evaluated. Note that fewer characters are considered as the sensitivity is lowered. |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 22 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
+682 356648 | 1 |
+683 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 4 | |
3456789 | 5 | |
3456780 | 6 | |
34567891 | 7 | |
34567890 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 10 | |
Remarks | First three digits of the country code are evaluated. All digits of the area code are evaluated. All seven digits of a 7-digit base number are evaluated. All eight digits of an 8-digit base number are evaluated. First four digits of the extension are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
+682 356648 | 1 |
+683 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 4 | |
3456789 | 5 | |
3456780 | 6 | |
34567891 | 7 | |
34567890 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 9 | |
Remarks | First three digits of the country code are evaluated. All digits of the area code are evaluated. All seven digits of a 7-digit base number are evaluated. All eight digits of an 8-digit base number are evaluated. First two digits of the extension are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
+682 356648 | 1 |
+683 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 4 | |
3456789 | 5 | |
3456780 | 5 | |
3456700 | 6 | |
34567891 | 7 | |
34567890 | 7 | |
34567800 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 9 | |
Remarks | First three digits of the country code are evaluated. All digits of the area code are evaluated. First six digits of a 7-digit base number are evaluated. First seven digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 4 | |
3456789 | 5 | |
3456780 | 5 | |
3456700 | 6 | |
34567891 | 7 | |
34567890 | 7 | |
34567800 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 9 | |
Remarks | First two digits of the country code are evaluated. All digits of the area code are evaluated. First six digits of a 7-digit base number are evaluated. First seven digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 4 | |
3456789 | 5 | |
3456780 | 5 | |
3456700 | 6 | |
34567891 | 7 | |
34567890 | 7 | |
34567800 | 7 | |
34567000 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 9 | |
Remarks | First two digits of the country code are evaluated. All digits of the area code are evaluated. First six digits of a 7-digit base number are evaluated. First six digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 2 | |
593 6784569 | 3 | |
594 6784569 | 3 | |
580 6784569 | 4 | |
3456789 | 5 | |
3456780 | 5 | |
3456700 | 5 | |
3456000 | 6 | |
34567891 | 7 | |
34567890 | 7 | |
34567800 | 7 | |
34567000 | 8 | |
4473000 ext 12345 | 9 | |
4473000 ext 12346 | 9 | |
4473000 ext 12356 | 9 | |
Remarks | First two digits of the country code are evaluated. First two digits of the area code are evaluated. First five digits of a 7-digit base number are evaluated. First six digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 1 | |
593 6784569 | 2 | |
594 6784569 | 2 | |
580 6784569 | 2 | |
633 6784569 | 3 | |
3456789 | 4 | |
3456780 | 4 | |
3456700 | 4 | |
3456000 | 4 | |
3450000 | 5 | |
34567891 | 6 | |
34567890 | 6 | |
34567800 | 6 | |
34567000 | 6 | |
34560000 | 7 | |
4473000 ext 12345 | 8 | |
4473000 ext 12346 | 8 | |
4473000 ext 12356 | 8 | |
Remarks | Country code is not evaluated. First digit of the area code is evaluated. First four digits of a 7-digit base number are evaluated. First five digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 1 | |
593 6784569 | 2 | |
594 6784569 | 2 | |
580 6784569 | 2 | |
633 6784569 | 2 | |
3456789 | 3 | |
3456780 | 3 | |
3456700 | 3 | |
3456000 | 3 | |
3450000 | 5 | |
34567891 | 6 | |
34567890 | 6 | |
34567800 | 6 | |
34567000 | 6 | |
34560000 | 6 | |
34500000 | 7 | |
4473000 ext 12345 | 8 | |
4473000 ext 12346 | 8 | |
4473000 ext 12356 | 8 | |
Remarks | Country code is not evaluated. Area code is not evaluated. First four digits of a 7-digit base number are evaluated. First four digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 9 Sensitivities 55-59 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 1 | |
593 6784569 | 2 | |
594 6784569 | 2 | |
580 6784569 | 2 | |
633 6784569 | 2 | |
3456789 | 3 | |
3456780 | 3 | |
3456700 | 3 | |
3456000 | 3 | |
3450000 | 3 | |
3400000 | 5 | |
34567891 | 6 | |
34567890 | 6 | |
34567800 | 6 | |
34567000 | 6 | |
34560000 | 6 | |
34500000 | 6 | |
34000000 | 7 | |
4473000 ext 12345 | 8 | |
4473000 ext 12346 | 8 | |
4473000 ext 12356 | 8 | |
Remarks | Country code is not evaluated. Area code is not evaluated. First three digits of a 7-digit base number are evaluated. First three digits of an 8-digit base number are evaluated. Extension is not evaluated. | |
Input | Cluster ID | |
Example 10 Sensitivities 50-54 |
+682 356648 | 1 |
+683 356648 | 1 | |
+62 356648 | 1 | |
593 6784569 | 2 | |
594 6784569 | 2 | |
580 6784569 | 2 | |
633 6784569 | 2 | |
3456789 | 3 | |
3456780 | 3 | |
3456700 | 3 | |
3456000 | 3 | |
3450000 | 3 | |
3400000 | 3 | |
3000000 | 5 | |
34567891 | 6 | |
34567890 | 6 | |
34567800 | 6 | |
34567000 | 6 | |
34560000 | 6 | |
34500000 | 6 | |
34000000 | 6 | |
30000000 | 7 | |
4473000 ext 12345 | 8 | |
4473000 ext 12346 | 8 | |
4473000 ext 12356 | 8 | |
Remarks | Country code is not evaluated. Area code is not evaluated. First two digits of a 7-digit base number are evaluated. First two digits of an 8-digit base number are evaluated. Extension is not evaluated. |
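
The Remarks rows for the ten Phone sensitivity bands amount to a table of how many leading digits of each token are kept. The sketch below restates that table in Python. It is an illustration only, not the QKB's match code algorithm: `Rule`, `RULES`, and `phone_key` are hypothetical names, the input is assumed to be already parsed into tokens, and base numbers shorter than 8 digits are assumed to follow the 7-digit rule.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    low: int              # lowest sensitivity covered by this band
    country: int          # leading country-code digits kept
    area: Optional[int]   # leading area-code digits kept (None keeps all)
    base7: int            # leading digits kept from a 7-digit base number
    base8: int            # leading digits kept from an 8-digit base number
    ext: int              # leading extension digits kept


# One row per Remarks entry, highest band first.
RULES = [
    Rule(95, 3, None, 7, 8, 4),
    Rule(90, 3, None, 7, 8, 2),
    Rule(85, 3, None, 6, 7, 0),
    Rule(80, 2, None, 6, 7, 0),
    Rule(75, 2, None, 6, 6, 0),
    Rule(70, 2, 2, 5, 6, 0),
    Rule(65, 0, 1, 4, 5, 0),
    Rule(60, 0, 0, 4, 4, 0),
    Rule(55, 0, 0, 3, 3, 0),
    Rule(50, 0, 0, 2, 2, 0),
]


def phone_key(sensitivity, country="", area="", base="", ext=""):
    """Build a simplified comparison key from pre-parsed phone tokens."""
    if not 50 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 50 and 100")
    rule = next(r for r in RULES if sensitivity >= r.low)
    # Assumption: base numbers shorter than 8 digits follow the 7-digit rule.
    keep = rule.base8 if len(base) >= 8 else rule.base7
    area_part = area if rule.area is None else area[:rule.area]
    # The base-number length is part of the key because the examples never
    # merge 7-digit and 8-digit numbers, even when their kept prefixes agree.
    return (country[:rule.country], area_part, len(base), base[:keep], ext[:rule.ext])
```

For instance, `phone_key(85, base="3456789")` and `phone_key(85, base="3456780")` return the same tuple, which mirrors the shared cluster for those two inputs in Example 3, while `phone_key(85, base="3456700")` differs.
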
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 16 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 2 | |
Remarks | All 6 digits of the domestic postal code are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 1 | |
654332 | 2 | |
Remarks | The first 5 digits of the domestic postal code are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 80-89 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 1 | |
654332 | 1 | |
654432 | 2 | |
Remarks | The first 4 digits of the domestic postal code are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 70-79 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 1 | |
654332 | 1 | |
654432 | 1 | |
655432 | 2 | |
Remarks | The first 3 digits of the domestic postal code are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 55-69 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 1 | |
654332 | 1 | |
654432 | 1 | |
655432 | 1 | |
665432 | 2 | |
Remarks | The first 2 digits of the domestic postal code are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 50-54 |
654321 | 1 |
654321 | 1 | |
CN-654321 | 1 | |
邮编654321 | 1 | |
654322 | 1 | |
654332 | 1 | |
654432 | 1 | |
655432 | 1 | |
665432 | 1 | |
765432 | 2 | |
Remarks | The first digit of the domestic postal code is evaluated. |
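
The Postal Code examples reduce to one rule: extract the 6-digit domestic code and keep a prefix whose length depends on the sensitivity band. The following sketch illustrates that reading of the Remarks rows. `BANDS` and `postal_key` are hypothetical names, and the regular expression is an assumption about how labels such as "CN-" or "邮编" might be skipped; the QKB's own logic is not documented here.

```python
import re

# (lowest sensitivity in band, number of leading digits evaluated)
BANDS = [(95, 6), (90, 5), (80, 4), (70, 3), (55, 2), (50, 1)]

# A run of exactly six digits that is not part of a longer number.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def postal_key(text, sensitivity):
    """Return the evaluated prefix of the domestic postal code, or None."""
    if not 50 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 50 and 100")
    match = SIX_DIGITS.search(text)
    if match is None:
        return None
    keep = next(n for low, n in BANDS if sensitivity >= low)
    return match.group()[:keep]
```

Under this sketch, `654321` and `654322` produce different keys at sensitivity 95 but the same key at sensitivity 90, matching Examples 1 and 2.
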
State/Province | ||
---|---|---|
Description | The State/Province match definition generates match codes which can be used to cluster records containing states and provinces. | |
Max Length of Match Code | 40 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 4 | |
广东 | 4 | |
Remarks | The first 8 Chinese characters of the province are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 85-94 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 4 | |
广东 | 4 | |
Remarks | The first 5 Chinese characters of the province are evaluated. | |
Input | Cluster ID | |
Example 3 Sensitivities 80-84 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 4 | |
广东 | 4 | |
Remarks | The first 4 Chinese characters of the province are evaluated. | |
Input | Cluster ID | |
Example 4 Sensitivities 70-79 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 4 | |
广东 | 4 | |
Remarks | The first 3 Chinese characters of the province are evaluated. | |
Input | Cluster ID | |
Example 5 Sensitivities 65-69 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 4 | |
广东 | 4 | |
Remarks | The first 2 Chinese characters of the province are evaluated. | |
Input | Cluster ID | |
Example 6 Sensitivities 50-64 |
新疆 | 1 |
新疆省 | 1 | |
新疆维吾尔自治区 | 1 | |
新 疆 | 1 | |
澳门特别行政区 | 2 | |
澳门行政区 | 2 | |
澳门特区 | 2 | |
澳 | 2 | |
广西壮族自治区 | 3 | |
广西省 | 3 | |
广西 | 3 | |
广东省 | 3 | |
广东 | 3 | |
Remarks | The first Chinese character of the province is evaluated. |
Address | |||
---|---|---|---|
Description | The Parse definition for Address parses address information. | ||
Output Tokens | Street, Block/Lane, Building, Unit, Floor, Room, Additional Info | ||
Input | Output | ||
Example 1 | 星光大道62号海王星科技大厦A座6楼 | Street | 星光大道62号 |
Block/Lane | |||
Building | 海王星科技大厦 | ||
Unit | A座 | ||
Floor | 6楼 | ||
Room | |||
Additional Info | |||
Input | Output | ||
Example 2 | 佳和国小区24号楼2单元602 | Street | |
Block/Lane | 佳和国小区 | ||
Building | 24号楼 | ||
Unit | 2单元 | ||
Floor | |||
Room | 602 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 建设路295号云南天达光伏科技股份有限公司组装车 | Street | 建设路295号 |
Block/Lane | |||
Building | |||
Unit | |||
Floor | |||
Room | |||
Additional Info | 云南天达光伏科技股份有限公司组装车 | ||
Input | Output | ||
Example 4 | 芳园南里西区8号楼C段 | Street | |
Block/Lane | 芳园南里西区 | ||
Building | 8号楼 | ||
Unit | C段 | ||
Floor | |||
Room | |||
Additional Info | |||
Input | Output | ||
Example 5 | 东风西路195号广州医学院教学学术交流中心大厦A座101室、202室 | Street | 东风西路195号 |
Block/Lane | 广州医学院 | ||
Building | 教学学术交流中心大厦 | ||
Unit | A座 | ||
Floor | |||
Room | 101室、202室 | ||
Additional Info | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | The Parse definition for Address (Full) parses full two-line addresses. | ||
Output Tokens | Province, City, District/Prefecture/County, Town/Village, Street, Block/Lane, Building, Unit, Floor, Room, Additional Info, Postal Code | ||
Input | Output | ||
Example 1 | 北京宣武区宣武门外大街10号庄胜广场北翼19层205 | Province | |
City | 北京 | ||
District/Prefecture/County | 宣武区 | ||
Town/Village | |||
Street | 宣武门外大街10号 | ||
Block/Lane | 庄胜广场 | ||
Building | 北翼 | ||
Unit | |||
Floor | 19层 | ||
Room | 205 | ||
Additional Info | |||
Postal Code | |||
Input | Output | ||
Example 2 | 北京市门头沟区永定镇侯庄子村76号 | Province | |
City | 北京市 | ||
District/Prefecture/County | 门头沟区 | ||
Town/Village | 永定镇侯庄子村76号 | ||
Street | |||
Block/Lane | |||
Building | |||
Unit | |||
Floor | |||
Room | |||
Additional Info | |||
Postal Code | |||
Input | Output | ||
Example 3 | 北京市宣武区新安中里5-1-101号 | Province | |
City | 北京市 | ||
District/Prefecture/County | 宣武区 | ||
Town/Village | |||
Street | |||
Block/Lane | 新安中里 | ||
Building | |||
Unit | |||
Floor | |||
Room | 5-1-101号 | ||
Additional Info | |||
Postal Code | |||
Input | Output | ||
Example 4 | 深圳龙岗区雅豪苑6栋3单元301号 | Province | |
City | 深圳 | ||
District/Prefecture/County | 龙岗区 | ||
Town/Village | |||
Street | |||
Block/Lane | 雅豪苑 | ||
Building | 6栋 | ||
Unit | 3单元 | ||
Floor | |||
Room | 301号 | ||
Additional Info | |||
Postal Code | |||
Input | Output | ||
Example 5 | 邮政编码:014000 包头市九原区哈林格尔镇(滨河路河西生态化工基地1号) | Province | |
City | 包头市 | ||
District/Prefecture/County | 九原区 | ||
Town/Village | 哈林格尔镇 | ||
Street | |||
Block/Lane | |||
Building | |||
Unit | |||
Floor | |||
Room | |||
Additional Info | (滨河路河西生态化工基地1号) | ||
Postal Code | 邮政编码:014000 | ||
Input | Output | ||
Example 6 | 北京市东城区安定门西滨河路22号(神华大厦)五、六层100000 | Province | |
City | 北京市 | ||
District/Prefecture/County | 东城区 | ||
Town/Village | |||
Street | 安定门西滨河路22号 | ||
Block/Lane | |||
Building | |||
Unit | |||
Floor | 五、六层 | ||
Room | |||
Additional Info | (神华大厦) | ||
Postal Code | 100000 | ||
Input | Output | ||
Example 7 | 北京市西城区复兴门内大街28号凯晨世贸中心中座8层 邮编:100000 | Province | |
City | 北京市 | ||
District/Prefecture/County | 西城区 | ||
Town/Village | |||
Street | 复兴门内大街28号 | ||
Block/Lane | 凯晨世贸中心 | ||
Building | |||
Unit | 中座 | ||
Floor | 8层 | ||
Room | |||
Additional Info | |||
Postal Code | 邮编:100000 | ||
Input | Output | ||
Example 8 | P.C. (100000) 宣武区鸭子桥路24号510室 | Province | |
City | |||
District/Prefecture/County | 宣武区 | ||
Town/Village | |||
Street | 鸭子桥路24号 | ||
Block/Lane | |||
Building | |||
Unit | |||
Floor | |||
Room | 510室 | ||
Additional Info | |||
Postal Code | P.C. (100000) | ||
Remarks | Province, city, and district/prefecture/county names are recognized with or without an indicator keyword. Postal codes can only be recognized as 6-digit numeric strings at the beginning or end of the full address. |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info | ||
Input | Output | ||
Example 1 | 星光大道62号海王星科技大厦A座6楼 | Recipient | |
Building/Site | 海王星科技大厦A座 | ||
Street | 星光大道62号 | ||
Extension | 6楼 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | 佳和国小区24号楼2单元602 | Recipient | |
Building/Site | 24号楼2单元 | ||
Street | 佳和国小区 | ||
Extension | 602 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 3 | 建设路295号云南天达光伏科技股份有限公司组装车 | Recipient | |
Building/Site | |||
Street | 建设路295号 | ||
Extension | |||
PO Box | |||
Additional Info | 云南天达光伏科技股份有限公司组装车 | ||
Input | Output | ||
Example 4 | 芳园南里西区8号楼C段 | Recipient | |
Building/Site | 8号楼C段 | ||
Street | 芳园南里西区 | ||
Extension | |||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 5 | 东风西路195号广州医学院教学学术交流中心大厦A座101室、202室 | Recipient | |
Building/Site | 教学学术交流中心大厦A座 | ||
Street | 东风西路195号广州医学院 | ||
Extension | 101室、202室 | ||
PO Box | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global). |
Address (Global) (v23) | |||
---|---|---|---|
Description | The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info | ||
Input | Output | ||
Example 1 | 星光大道62号海王星科技大厦A座6楼 | Recipient | |
Building/Site | 海王星科技大厦A座 | ||
Street | 星光大道62号 | ||
Extension | 6楼 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | 佳和国小区24号楼2单元602 | Recipient | |
Building/Site | 24号楼2单元 | ||
Street | 佳和国小区 | ||
Extension | 602 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 3 | 建设路295号云南天达光伏科技股份有限公司组装车 | Recipient | |
Building/Site | |||
Street | 建设路295号 | ||
Extension | |||
PO Box | |||
Additional Info | 云南天达光伏科技股份有限公司组装车 | ||
Input | Output | ||
Example 4 | 芳园南里西区8号楼C段 | Recipient | |
Building/Site | 8号楼C段 | ||
Street | 芳园南里西区 | ||
Extension | |||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 5 | 东风西路195号广州医学院教学学术交流中心大厦A座101室、202室 | Recipient | |
Building/Site | 教学学术交流中心大厦A座 | ||
Street | 东风西路195号广州医学院 | ||
Extension | 101室、202室 | ||
PO Box | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), it is suggested that you change them back to Address (Global). |
City | |||
---|---|---|---|
Description | The Parse definition for City parses city and district/prefecture/county names. | ||
Output Tokens | City, District/Prefecture/County, Additional Info | ||
Input | Output | ||
Example 1 | 北京市昌平区* | City | 北京市 |
District/Prefecture/County | 昌平区 | ||
Additional Info | |||
Input | Output | ||
Example 2 | 深圳市宝安区3区 | City | 深圳市 |
District/Prefecture/County | 宝安区3区 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 北京市(密云县) | City | 北京市 |
District/Prefecture/County | (密云县) | ||
Additional Info | |||
Input | Output | ||
Example 4 | 北京市宣武区南部,西部 | City | 北京市 |
District/Prefecture/County | 宣武区 | ||
Additional Info | 南部,西部 | ||
Remarks | Recognizes city names with or without identifier keywords ("市"). |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code parses address last line data, which typically includes province, city, and postal code information. | ||
Output Tokens | City, State/Province, Additional Info, Postal Code | ||
Input | Output | ||
Example 1 | 北京市100020* | City | 北京市 |
State/Province | |||
Additional Info | |||
Postal Code | 100020 | ||
Input | Output | ||
Example 2 | 江苏扬州(仪征市区西)678300 | City | 扬州 |
State/Province | 江苏 | ||
Additional Info | (仪征市区西) | ||
Postal Code | 678300 | ||
Input | Output | ||
Example 3 | 邮编:231300 遵义市 | City | 遵义市 |
State/Province | |||
Additional Info | |||
Postal Code | 邮编:231300 | ||
Input | Output | ||
Example 4 | 淄博市 邮政编码242200 | City | 淄博市 |
State/Province | |||
Additional Info | |||
Postal Code | 邮政编码242200 | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code (Global) parses address last line data into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info | ||
Input | Output | ||
Example 1 | 北京市100020* | City | 北京市 |
State/Province | |||
Postal Code | 100020 | ||
Additional Info | |||
Input | Output | ||
Example 2 | 江苏扬州(仪征市区西)678300 | City | 扬州 |
State/Province | 江苏 | ||
Postal Code | 678300 | ||
Additional Info | (仪征市区西) | ||
Input | Output | ||
Example 3 | 邮编:231300 遵义市 | City | 遵义市 |
State/Province | |||
Postal Code | 邮编:231300 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Date | |||
---|---|---|---|
Description | The Parse definition for Date parses date information. | ||
Output Tokens | Year, Month, Day | ||
Input | Output | ||
Example 1 | 2009/10/21 | Year | 2009 |
Month | 10 | ||
Day | 21 | ||
Input | Output | ||
Example 2 | 二零零九年十月二十一日 | Year | 二零零九年 |
Month | 十月 | ||
Day | 二十一日 | ||
Input | Output | ||
Example 3 | 14Mar, 2001 | Year | 2001 |
Month | Mar | ||
Day | 14 | ||
Input | Output | ||
Example 4 | 20091021 | Year | 2009 |
Month | 10 | ||
Day | 21 | ||
Remarks |
ID Number | |||
---|---|---|---|
Description | The Parse definition for ID Number parses ID number information. | ||
Output Tokens | Province Code, City/Prefecture Code, District/County Code, Birth Year, Birth Month, Birth Day, Sequence Code, Validation Code | ||
Input | Output | ||
Example 1 | 130503196704010012 | Province Code | 13 |
City/Prefecture Code | 05 | ||
District/County Code | 03 | ||
Birth Year | 1967 | ||
Birth Month | 04 | ||
Birth Day | 01 | ||
Sequence Code | 001 | ||
Validation Code | 2 | ||
Input | Output | ||
Example 2 | 130503670401001 | Province Code | 13 |
City/Prefecture Code | 05 | ||
District/County Code | 03 | ||
Birth Year | 67 | ||
Birth Month | 04 | ||
Birth Day | 01 | ||
Sequence Code | 001 | ||
Validation Code | |||
Remarks |
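
Both ID Number examples are consistent with a fixed-position split: six digits of region codes, a birth date with a 4-digit year (18-digit IDs) or a 2-digit year (15-digit IDs), a 3-digit sequence code, and a validation character that is present only in the 18-digit form. The sketch below shows that split. `parse_id_number` is a hypothetical helper written for illustration, not the QKB implementation, and it performs no checksum or date validation.

```python
def parse_id_number(id_number):
    """Split a 15- or 18-character ID number into the documented tokens."""
    s = id_number.strip()
    if len(s) == 18:
        year, rest = s[6:10], s[10:]   # 4-digit birth year
    elif len(s) == 15:
        year, rest = s[6:8], s[8:]     # 2-digit birth year, no validation code
    else:
        raise ValueError("expected a 15- or 18-character ID number")
    return {
        "Province Code": s[0:2],
        "City/Prefecture Code": s[2:4],
        "District/County Code": s[4:6],
        "Birth Year": year,
        "Birth Month": rest[0:2],
        "Birth Day": rest[2:4],
        "Sequence Code": rest[4:7],
        "Validation Code": rest[7:8],  # empty string for 15-digit IDs
    }
```
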
Name | |||
---|---|---|---|
Description | The Parse definition for Name parses names of individuals. | ||
Output Tokens | Family Name, Given Name, Suffix, Title/Additional Info | ||
Input | Output | ||
Example 1 | 陈胜华 | Family Name | 陈 |
Given Name | 胜华 | ||
Suffix | |||
Title/Additional Info | |||
Input | Output | ||
Example 2 | 李大伟,博士(中国区总裁) | Family Name | 李 |
Given Name | 大伟 | ||
Suffix | |||
Title/Additional Info | 博士(中国区总裁) | ||
Input | Output | ||
Example 3 | 司徒怀,先生(中国区总裁) | Family Name | 司徒 |
Given Name | 怀 | ||
Suffix | 先生 | ||
Title/Additional Info | (中国区总裁) | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | The Parse definition for Name (Global) parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info | ||
Input | Output | ||
Example | 陈胜华 | Prefix | |
Given Name | 胜华 | ||
Middle Name | |||
Family Name | 陈 | ||
Suffix | |||
Title/Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Organization | |||
---|---|---|---|
Description | The Parse definition for Organization parses organization names. | ||
Output Tokens | Name, Legal Form, Site, Additional Info | ||
Input | Output | ||
Example 1 | 无锡市城市环境卫生有限公司 | Name | 城市环境卫生 |
Legal Form | 有限公司 | ||
Site | 无锡市 | ||
Additional Info | |||
Input | Output | ||
Example 2 | 国华(呼伦贝尔)风电有限公司南京办事处 | Name | 国华(呼伦贝尔)风电 |
Legal Form | 有限公司 | ||
Site | 南京办事处 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 国华(呼伦贝尔)风电有限公司开发一处 | Name | 国华 风电 |
Legal Form | 有限公司 | ||
Site | (呼伦贝尔) | ||
Additional Info | 开发一处 | ||
Input | Output | ||
Example 4 | 香港华艺设计顾问(深圳)有限公司 | Name | 香港华艺设计顾问 |
Legal Form | 有限公司 | ||
Site | (深圳) | ||
Additional Info | |||
Input | Output | ||
Example 5 | 神华集团包头矿业有限责任公司运销处集装站 | Name | 神华集团包头矿业 |
Legal Form | 有限责任公司 | ||
Site | |||
Additional Info | 运销处集装站 | ||
Input | Output | ||
Example 6 | 北京中铁特货冷藏物流有限公司(已于2009年1月8日撤消) | Name | 中铁特货冷藏物流 |
Legal Form | 有限公司 | ||
Site | 北京 | ||
Additional Info | (已于2009年1月8日撤消) | ||
Input | Output | ||
Example 7 | 北京大学计算机学院 | Name | 北京大学 |
Legal Form | |||
Site | |||
Additional Info | 计算机学院 | ||
Input | Output | ||
Example 8 | 赛仕软件研究开发(北京)有限公司(收) | Name | 赛仕软件研究开发 |
Legal Form | 有限公司 | ||
Site | (北京) | ||
Additional Info | |||
Remarks |
Organization (Global) | |||
---|---|---|---|
Description | The Parse definition for Organization (Global) parses organization names into a globally recognized set of tokens. | ||
Output Tokens | Name, Legal Form, Site, Additional Info | ||
Input | Output | ||
Example 1 | 无锡市城市环境卫生有限公司 | Name | 城市环境卫生 |
Legal Form | 有限公司 | ||
Site | 无锡市 | ||
Additional Info | |||
Input | Output | ||
Example 2 | 长安汽车(集团)有限责任公司北京分公司 | Name | 长安汽车(集团) |
Legal Form | 有限责任公司 | ||
Site | 北京分公司 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 国华(呼伦贝尔)风电有限公司南京办事处 | Name | 国华(呼伦贝尔)风电 |
Legal Form | 有限公司 | ||
Site | 南京办事处 | ||
Additional Info | |||
Input | Output | ||
Example 4 | 国华(呼伦贝尔)风电有限公司开发一处 | Name | 国华 风电 |
Legal Form | 有限公司 | ||
Site | (呼伦贝尔) | ||
Additional Info | 开发一处 | ||
Input | Output | ||
Example 5 | 香港华艺设计顾问(深圳)有限公司 | Name | 香港华艺设计顾问 |
Legal Form | 有限公司 | ||
Site | (深圳) | ||
Additional Info | |||
Input | Output | ||
Example 6 | 神华集团包头矿业有限责任公司运销处集装站 | Name | 神华集团包头矿业 |
Legal Form | 有限责任公司 | ||
Site | |||
Additional Info | 运销处集装站 | ||
Input | Output | ||
Example 7 | 北京中铁特货冷藏物流有限公司(已于2009年1月8日撤消) | Name | 中铁特货冷藏物流 |
Legal Form | 有限公司 | ||
Site | 北京 | ||
Additional Info | (已于2009年1月8日撤消) | ||
Input | Output | ||
Example 8 | 北京大学计算机学院 | Name | 北京大学 |
Legal Form | |||
Site | |||
Additional Info | 计算机学院 | ||
Input | Output | ||
Example 9 | 赛仕软件研究开发(北京)有限公司(收) | Name | 赛仕软件研究开发 |
Legal Form | 有限公司 | ||
Site | (北京) | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone | |||
---|---|---|---|
Description | The Parse definition for Phone parses phone numbers into a set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info | ||
Input | Output | ||
Example 1 | Tel(+86)10 8319 3355-3636 办公电话 | Country Code | 86 |
Area Code | 10 | ||
Base Number | 8319 3355 | ||
Extension | 3636 | ||
Line Type | Tel | ||
Additional Info | 办公电话 | ||
Input | Output | ||
Example 2 | (+86)10 8319 3355-3636 办公电话 | Country Code | 86 |
Area Code | 10 | ||
Base Number | 8319 3355 | ||
Extension | 3636 | ||
Line Type | 办公电话 | ||
Additional Info | |||
Input | Output | ||
Example 3 | TEL(0319)7456537 | Country Code | |
Area Code | 0319 | ||
Base Number | 7456537 | ||
Extension | |||
Line Type | TEL | ||
Additional Info | |||
Input | Output | ||
Example 4 | 手机13412345678 | Country Code | |
Area Code | 134 | ||
Base Number | 12345678 | ||
Extension | |||
Line Type | 手机 | ||
Additional Info | |||
Input | Output | ||
Example 5 | +1 919-447-3000 | Country Code | 1 |
Area Code | |||
Base Number | 919-447-3000 | ||
Extension | |||
Line Type | |||
Additional Info | |||
Remarks | Mobile vendor ID is parsed into the Area Code token. |
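
Example 4 shows what the remark means in practice: the leading three digits of an 11-digit mobile number land in the Area Code token and the remaining eight in Base Number. A minimal sketch of that split follows. `split_mobile` is a hypothetical helper, and the test "11 digits starting with 1" is an assumption about how a mobile number might be detected, not documented QKB behavior.

```python
import re


def split_mobile(number):
    """Split an 11-digit mobile number into vendor prefix and base number."""
    digits = re.sub(r"\D", "", number)  # drop labels such as "手机"
    if len(digits) == 11 and digits.startswith("1"):
        return {"Area Code": digits[:3], "Base Number": digits[3:]}
    return None  # not recognized as a mobile number by this sketch
```
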
Phone (Global) | |||
---|---|---|---|
Description | The Parse definition for Phone (Global) parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info | ||
Input | Output | ||
Example 1 | Tel(+86)10 8319 3355-3636 办公电话 | Country Code | +86 |
Area Code | 10 | ||
Base Number | 8319 3355 | ||
Extension | 3636 | ||
Line Type | Tel | ||
Additional Info | 办公电话 | ||
Input | Output | ||
Example 2 | (+86)10 8319 3355-3636 办公电话 | Country Code | +86 |
Area Code | 10 | ||
Base Number | 8319 3355 | ||
Extension | 3636 | ||
Line Type | 办公电话 | ||
Additional Info | |||
Input | Output | ||
Example 3 | TEL(0319)7456537 | Country Code | |
Area Code | 0319 | ||
Base Number | 7456537 | ||
Extension | |||
Line Type | TEL | ||
Additional Info | |||
Input | Output | ||
Example 4 | 手机13412345678 | Country Code | |
Area Code | 134 | ||
Base Number | 12345678 | ||
Extension | |||
Line Type | 手机 | ||
Additional Info | |||
Input | Output | ||
Example 5 | +1 919-447-3000 | Country Code | +1 |
Area Code | |||
Base Number | 919-447-3000 | ||
Extension | |||
Line Type | |||
Additional Info | |||
Remarks | Mobile vendor ID is parsed into the Area Code token. Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | Standardizes address information. | |
Input | Output | |
Examples | "青年大道3号" | 青年大道3号 |
临江路2号(工商银行六楼) | 临江路2号 工商银行六楼 | |
凯旋路451号1楼 , 4楼 | 凯旋路451号1层, 4层 | |
益田新村106栋22G | 益田新村106栋22G | |
福中路15号大院肆层4-803 | 福中路15号大院4层4-803 | |
黄兴路2005弄1号23号楼a座 | 黄兴路2005弄1号23号楼A座 | |
Remarks | The floor identifier is standardized to "层". Full-width alphanumeric characters are converted to half-width characters. Non-logical characters, such as quotes and blanks, are removed. Chinese numerals in unit, floor, and room information are converted to Arabic numerals. All English letters are converted to uppercase. |
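
Two of the rules in the remarks, the full-width to half-width conversion and the uppercasing of English letters, can be illustrated with plain Unicode arithmetic. The sketch below is an assumption about one way to get that effect, not the QKB implementation, and `to_halfwidth_upper` is a hypothetical name. It relies on the fact that the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from their ASCII counterparts.

```python
def to_halfwidth_upper(text):
    """Convert full-width ASCII variants to half-width, then uppercase."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:   # full-width '!' through '~'
            ch = chr(code - 0xFEE0)    # shift into the ASCII range
        elif code == 0x3000:           # ideographic (full-width) space
            ch = " "
        out.append(ch)
    # Chinese characters have no case, so upper() affects only letters.
    return "".join(out).upper()
```

Chinese characters pass through unchanged, so only Latin letters and full-width digits or punctuation are affected.
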
Address (Full) | ||
---|---|---|
Description | Standardizes full two-line addresses. | |
Input | Output | |
Examples | 广东深圳福田益田村 | 广东省深圳市福田区益田村 |
杭州市凯旋路451号1楼 , 4楼 | 杭州市凯旋路451号1层, 4层 | |
广东省深圳市福田区益田村106栋22G | 广东省深圳市福田区益田村106栋22G | |
广东省深圳市福田区福中路15号大院肆层4-803 | 广东省深圳市福田区福中路15号大院4层4-803 | |
邮政编码:014000 包头市九原区哈林格尔镇(滨河路河西生态化工基地1号) | 包头市九原区哈林格尔镇 滨河路河西生态化工基地1号 014000 | |
北京市西城区复兴门内大街28号凯晨世贸中心中座8层 邮编:100000 | 北京市西城区复兴门内大街28号凯晨世贸中心中座8层 100000 | |
P.C. (100000) 宣武区鸭子桥路24号510室 | 宣武区鸭子桥路24号510室 100000 | |
Remarks | Province, city and district/prefecture/county identifier keywords are added when possible. Floor identifier is standardized to "层". Full-width alphanumeric characters are converted to half-width characters. Non-logical characters are removed: quotes, blanks, and so on. Chinese numerals within unit, floor, and room information are converted to Arabic numerals. All English letters are converted to upper case. |
City | ||
---|---|---|
Description | Standardizes city and district/prefecture/county names. | |
Input | Output | |
Examples | 北京市(密云县) | 北京市密云县 |
北京市宣武区南部,西部 | 北京市宣武区, 南部, 西部 | |
深圳盐田 | 深圳市盐田区 | |
北京市密云县南行10公里 | 北京市密云县, 南行10公里 | |
Remarks | Adds city identifier keywords when possible. Removes non-logical characters: quotes, blanks, and so on. |
City - State/Province - Postal Code | ||
---|---|---|
Description | Standardizes address "last line" data, which typically includes province, city and postal code information. | |
Input | Output | |
Examples | 北京市100052 | 北京市 100052 |
福建省泉州市572000 | 福建省泉州市 572000 | |
北京市CN-100052 | 北京市 100052 | |
邮政编码242200 安徽省蚌埠市 | 安徽省蚌埠市 242200 | |
安徽安庆潜山(邮编)242500 | 安徽省安庆市 (潜山) 242500 | |
Remarks | Adds province and city identifier keywords when possible. Removes non-logical characters: quotes, blanks, and so on. |
Date (Chinese Calendar) | |||
---|---|---|---|
Description | Standardizes date expressions to Chinese calendar format. | ||
Input | Output | Explanation | |
Examples | 2009/10/21 | 2009年10月21日 | Standardize calendar identifier to YYYY年MM月DD日 format. |
二零零九年十月二十一日 | 2009年10月21日 | ||
14-Mar-01 | 2001年03月14日 | Standardize month name. When the day and year are ambiguous, consider the last number to be the year. | |
２００９／１０／２１ | 2009年10月21日 | Convert full-width to half-width. |
20091021 | 2009年10月21日 | 8 digits are considered to be YYYYMMDD format. | |
Remarks | Supports dates from 1901 to 2050. Assumes two-digit years 00-29 are 2000-2029. Assumes two-digit years 30-99 are 1930-1999. |
Date (Western Calendar) | |||
---|---|---|---|
Description | Standardizes date expressions to Western calendar format. | ||
Input | Output | Explanation | |
Examples | (2009/10/21) | 2009/10/21 | Standardize calendar identifier to YYYY/MM/DD format. |
二零零九年十月二十一日 | 2009/10/21 | ||
14-Mar-01 | 2001/03/14 | Standardize month name. When the day and year are ambiguous, consider the last number to be the year. | |
２００９／１０／２１ | 2009/10/21 | Convert full-width to half-width. |
20091021 | 2009/10/21 | 8 digits are considered to be YYYYMMDD format. | |
Remarks | Supports dates from 1901 to 2050. Assumes two-digit years 00-29 are 2000-2029. Assumes two-digit years 30-99 are 1930-1999. |
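
Both Date definitions share the same two-digit-year rule and supported range, and differ only in the output pattern. The sketch below expresses those rules directly. `expand_year` and `format_date` are hypothetical names, the input is assumed to be already parsed into numeric year, month, and day, and no calendar validation (for example, days per month) is attempted.

```python
def expand_year(two_digit):
    """Apply the documented pivot: 00-29 -> 2000-2029, 30-99 -> 1930-1999."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + two_digit if two_digit <= 29 else 1900 + two_digit


def format_date(year, month, day, style="western"):
    """Format a parsed date as YYYY/MM/DD or YYYY年MM月DD日."""
    if year < 100:
        year = expand_year(year)
    if not 1901 <= year <= 2050:
        raise ValueError("year outside the supported range 1901-2050")
    if style == "chinese":
        return f"{year}年{month:02d}月{day:02d}日"
    return f"{year}/{month:02d}/{day:02d}"
```

With these rules, the parsed values of `14-Mar-01` (year 01, month 3, day 14) format as `2001/03/14` in the Western style and `2001年03月14日` in the Chinese style, in line with the examples.
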
ID Number | ||
---|---|---|
Description | Standardizes ID numbers. | |
Input | Output | |
Examples | 130503196704010012 | 130503196704010012 |
(130503196704010012) | 130503196704010012 | |
Remarks |
Name | ||
---|---|---|
Description | Standardizes names of individuals. | |
Input | Output | |
Examples | 李大伟先生 | 李大伟 先生 |
“刘丽” | 刘丽 | |
司徒怀,先生(中国区总裁) | 司徒怀 先生 中国区总裁 | |
Remarks |
Organization | ||
---|---|---|
Description | Standardizes organization names. | |
Input | Output | |
Examples | 碧丽服装有限公司 | 碧丽服装 有限公司 |
香港华艺设计顾问(深圳)有限公司 | 香港华艺设计顾问 有限责任公司, 深圳 | |
DATAFLUX, INC | DataFlux Inc | |
中国石化集团洛阳石油化工工程公司 | 中国石油化工集团, 洛阳, 石油化工工程公司 | |
上海mwb互感器有限公司 | MWB互感器 有限责任公司, 上海 | |
碧丽服装(北京)有限公司上海分公司 | 碧丽服装(北京) 有限责任公司, 上海分公司 | |
Remarks | Full-width ASCII characters are transformed to half-width. |
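The full-width to half-width conversion mentioned in the Remarks can be sketched in Python as a character translation. This is a minimal illustration, not the QKB implementation: it assumes "full-width ASCII" means the Unicode block U+FF01 to U+FF5E plus the ideographic space U+3000, and the name `to_half_width` is hypothetical.

```python
# Each full-width form U+FF01..U+FF5E sits exactly 0xFEE0 above its
# ASCII counterpart U+0021..U+007E.
FULLWIDTH_TO_HALFWIDTH = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_HALFWIDTH[0x3000] = 0x20  # ideographic space -> ASCII space


def to_half_width(text: str) -> str:
    """Replace full-width ASCII variants with half-width characters.

    Chinese characters and all other code points pass through unchanged.
    """
    return text.translate(FULLWIDTH_TO_HALFWIDTH)


print(to_half_width("上海ｍｗｂ互感器有限公司"))  # 上海mwb互感器有限公司
```

An explicit table is used here rather than `unicodedata.normalize("NFKC", ...)` because NFKC also rewrites other compatibility characters, which may be more than a name standardizer wants.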
Phone | ||
---|---|---|
Description | Standardizes phone numbers for domestic use. | |
Input | Output | |
Examples | 采购部电话010-12345678 | (010) 12345678, 采购部电话 |
82741510 转 345 | 82741510 x345 | |
+86 03197456537 | (0319) 7456537 | |
13512345678 | 135 12345678 | |
1082741510 | (010) 82741510 | |
0044 (0)20 12345000 | +44 2012345000 | |
Remarks |
Phone (Electronic) | ||
---|---|---|
Description | Standardizes phone numbers for automated calling systems. | |
Input | Output | |
Example | 采购部电话010-12345678 内線123 | +861012345678 |
Remarks |
Phone (with Country Code) | ||
---|---|---|
Description | Standardizes phone numbers for international use. | |
Input | Output | |
Example | 采购部电话010-12345678 | +86 10 12345678, 采购部电话 |
Remarks |
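The relationship between the domestic, international, and electronic phone formats can be sketched in Python. Everything here is an assumption inferred from the examples on this page, not the QKB rules: the pattern covers only landline numbers written as area code, hyphen, subscriber number; the trunk prefix `0` is dropped after `+86`; and the trailing department label that the real definitions append is omitted. The names `to_international` and `to_electronic` are hypothetical.

```python
import re

# Assumed shape: trunk prefix 0 + 2-3 digit area code, hyphen, 7-8 digit number.
DOMESTIC_LANDLINE = re.compile(r"(0[0-9]{2,3})-([0-9]{7,8})")


def to_international(text: str):
    """Return '+86 <area> <number>' for the first landline found, else None."""
    m = DOMESTIC_LANDLINE.search(text)
    if m is None:
        return None
    area, subscriber = m.groups()
    return f"+86 {area[1:]} {subscriber}"  # drop the leading trunk 0


def to_electronic(text: str):
    """Return a dialable string with no spaces or extension, else None."""
    international = to_international(text)
    return None if international is None else international.replace(" ", "")


print(to_international("采购部电话010-12345678"))        # +86 10 12345678
print(to_electronic("采购部电话010-12345678 内線123"))   # +861012345678
```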
Postal Code | ||
---|---|---|
Description | Standardizes postal codes. | |
Input | Output | |
Examples | 邮编242500 | 242500 |
242-500 | 242500 | |
CN-100052 | 100052 | |
FR 12345 | FR-12345 | |
37100 | 037100 | |
Remarks | Identifies domestic postal code patterns with potentially missing leading zeroes and adds them to the input, as in the final example. |
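The leading-zero rule can be illustrated with a small Python sketch. It assumes that domestic postal codes have six digits, that a five-digit input is missing exactly one leading zero, and that any Latin letters other than a `CN` prefix mark a foreign code, which this sketch does not handle. The function name `standardize_domestic_postal_code` is hypothetical.

```python
import re


def standardize_domestic_postal_code(raw: str):
    """Return a six-digit domestic postal code, or None if it is not domestic."""
    text = raw.strip().upper()
    text = re.sub(r"^CN[\s-]*", "", text)  # drop an explicit CN prefix
    if re.search(r"[A-Z]", text):
        return None  # other letters suggest a foreign code, e.g. "FR 12345"
    digits = re.sub(r"[^0-9]", "", text)  # drop labels, hyphens, blanks
    if len(digits) == 5:
        digits = "0" + digits  # restore the assumed missing leading zero
    return digits if len(digits) == 6 else None


for sample in ["邮编242500", "242-500", "CN-100052", "37100"]:
    print(sample, "->", standardize_domestic_postal_code(sample))
```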
Postal Code (with Country Code) | ||
---|---|---|
Description | Standardizes postal codes for international use. | |
Input | Output | |
Examples | 邮编242500 | CN-242500 |
242-500 | CN-242500 | |
CN 100052 | CN-100052 | |
FR 12345 | FR-12345 | |
37100 | CN-037100 | |
Remarks | Identifies domestic postal code patterns with potentially missing leading zeroes and adds them to the input, as in the final example. Uses international formatting, with no spaces. |
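A possible way to produce the country-prefixed form is sketched below in Python. It is inferred from the five examples only: an optional two-letter country code is split off, `CN` is assumed when none is present, and zero-padding is applied to `CN` codes alone. The name `postal_code_with_country` and the `None` return for unrecognized input are assumptions of this sketch.

```python
import re

# Optional two-letter country code, optional separator, then the code itself.
POSTAL_PATTERN = re.compile(r"^(?:([A-Z]{2})[\s-]*)?([0-9][0-9\s-]*)$")


def postal_code_with_country(raw: str, default_country: str = "CN"):
    """Return '<CC>-<digits>' with no spaces, or None if the input does not fit."""
    # Drop any leading label such as "邮编" before matching.
    text = re.sub(r"^[^0-9A-Za-z]+", "", raw.strip()).upper()
    m = POSTAL_PATTERN.match(text)
    if m is None:
        return None
    country = m.group(1) or default_country
    digits = re.sub(r"[^0-9]", "", m.group(2))
    if country == "CN":
        if len(digits) == 5:
            digits = "0" + digits  # restore the assumed missing leading zero
        if len(digits) != 6:
            return None
    return f"{country}-{digits}"


for sample in ["邮编242500", "242-500", "CN 100052", "FR 12345", "37100"]:
    print(sample, "->", postal_code_with_country(sample))
```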
State/Province | ||
---|---|---|
Description | Standardizes province information. | |
Input | Output | |
Examples | 内蒙古 | 内蒙古自治区 |
"浙江省" | 浙江省 | |
江西 | 江西省 | |
香港 | 香港特别行政区 | |
鲁 | 山东省 | |
Remarks | Adds province identifier keywords when possible. Converts aliases to full names. Removes non-logical characters: quotes, blanks, and so on. |
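The three steps in the Remarks (strip non-logical characters, resolve aliases, add the identifier keyword) can be sketched as a lookup in Python. The table below contains only the provinces that appear in the examples above; a real vocabulary would cover every province-level division. The names `PROVINCE_ALIASES` and `standardize_province` are hypothetical.

```python
# Short forms and aliases mapped to full names (examples from this page only).
PROVINCE_ALIASES = {
    "内蒙古": "内蒙古自治区",
    "浙江": "浙江省",
    "江西": "江西省",
    "香港": "香港特别行政区",
    "鲁": "山东省",
}
FULL_NAMES = set(PROVINCE_ALIASES.values())

# Characters treated as non-logical: blanks plus straight and curly quotes.
NON_LOGICAL = set(" \t\u3000\"'\u201c\u201d\u2018\u2019")


def standardize_province(raw: str):
    """Return the full province name, or None if the input is not recognized."""
    cleaned = "".join(ch for ch in raw if ch not in NON_LOGICAL)
    if cleaned in FULL_NAMES:
        return cleaned  # already carries its identifier keyword
    return PROVINCE_ALIASES.get(cleaned)


for sample in ["内蒙古", '"浙江省"', "江西", "香港", "鲁"]:
    print(sample, "->", standardize_province(sample))
```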
In addition to the definitions listed on this page, the Chinese, China locale also inherits all definitions for the Chinese language and all Global definitions.
Doc ID: QKBCI_ZHCHN_defs.html