SAS Quality Knowledge Base for Contact Information 25
Definitions for the Japanese, Japan locale are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Upper (Address) | ||
---|---|---|
Description | The Case definition for Upper (Address) uppercases Latin characters found in address data. | |
Input | Output | |
Examples | 1-13-1 イヌイビル・カチドキ 8f 501 | 1-13-1 イヌイビル・カチドキ 8F 501 |
3丁目53-2Biz原宿2F | 3丁目53-2BIZ原宿2F | |
2丁目jrタワーオフィス15階札幌駅総合開発内 | 2丁目JRタワーオフィス15階札幌駅総合開発内 | |
Remarks |
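
The behavior shown in the examples above (Latin letters are uppercased, Japanese text is left alone) can be illustrated with a minimal Python sketch. The function name `upper_latin` is hypothetical and this is not the QKB implementation; it only assumes that "Latin characters" means ASCII and full-width a-z.

```python
def upper_latin(text: str) -> str:
    """Uppercase ASCII and full-width Latin letters; leave everything else unchanged."""
    out = []
    for ch in text:
        # ASCII a-z, or full-width a-z (U+FF41..U+FF5A)
        if "a" <= ch <= "z" or "\uff41" <= ch <= "\uff5a":
            out.append(ch.upper())
        else:
            out.append(ch)
    return "".join(out)


print(upper_latin("3丁目53-2Biz原宿2F"))
print(upper_latin("2丁目jrタワーオフィス15階札幌駅総合開発内"))
```

Kana and Kanji have no case, so only the Latin runs change.
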
Upper (Organization) | ||
---|---|---|
Description | The Case definition for Upper (Organization) uppercases Latin characters found in organization names. Well-known words are propercased where appropriate. | |
Input | Output | |
Examples | Aigスター生命保険株式会社 | AIGスター生命保険株式会社 |
cfg株式会社 | CFG株式会社 | |
Necエレクトロニクス株式会社 | NECエレクトロニクス株式会社 | |
dataflux, a sas company | DataFlux, A SAS Company | |
Remarks | Certain well-known company names are propercased. |
Name | ||
---|---|---|
Description | The Gender Analysis definition for Name makes a best guess at the genders of names. | |
Possible Outputs | M, F, U | |
Input | Output | |
Examples | 鈴木一郎 | M |
山田花子 | F | |
マークスミス | U | |
Remarks | Non-Japanese names are not evaluated; they produce the output U. |
Individual/Organization | ||
---|---|---|
Description | The Identification Analysis definition for Individual/Organization determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | INDIVIDUAL, ORGANIZATION, UNKNOWN | |
Input | Output | |
Examples | 株式会社ソニー | ORGANIZATION |
田中健二 | INDIVIDUAL | |
田中健二商店 | ORGANIZATION | |
リチャードパシフィック | UNKNOWN | |
Remarks |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 108 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
1丁目14の9タナカ大宮桜木ビル5F502(総務部内) | 1 |
1丁目14の9タナカ大宮桜木町ビル5F502(開発部内) | 2 | |
1-2-3 新宿郵便局 私書箱第456号 | 3 | |
1-2-3 郵便事業株式会社 新宿支店 私書箱第456号 | 4 | |
Remarks | All components of the address are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-89 |
1丁目14の9タナカ大宮桜木ビル5F502(総務部内) | 1 |
1丁目14の9タナカ大宮桜木町ビル5F502(開発部内) | 1 | |
一丁目四-二二大山生命宇都宮南ビル16階1623号室 | 2 | |
一丁目四-二二大山生命宇都宮北ビル16階1635号室 | 3 | |
1-2-3 新宿郵便局 私書箱第456号 | 4 | |
1-2-3 郵便事業株式会社 新宿支店 私書箱第456号 | 4 | |
1-2-3 新宿東郵便局 私書箱第456号 | 5 | |
1-2-3 新宿西郵便局 私書箱第456号 | 5 | |
Remarks | Block, building, floor, room, and PO Box info are evaluated. Different forms of some words will match. | |
Input | Cluster ID | |
Example 3 Sensitivities 70-79 |
一丁目四-二二大山生命宇都宮南ビル16階1623号室 | 1 |
一丁目四-二二大山生命宇都宮北ビル16階1635号室 | 1 | |
2丁目6の3五ツ橋ビル3階302 | 2 | |
2丁目6の3五ツ橋ビル5階502 | 3 | |
1-2-3 新宿東郵便局 私書箱第456号 | 4 | |
1-2-3 新宿西郵便局 私書箱第456号 | 4 | |
Remarks | Block, building, floor, room, and PO Box info are evaluated. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 60-69 |
2丁目6の3五ツ橋ビル3階302 | 1 |
2丁目6の3五ツ橋ビル5階502 | 1 | |
6丁目27-18ABC横浜別館 | 2 | |
6丁目27-19ABD会館 | 3 | |
1-2-3 新宿東郵便局 私書箱第456号 | 4 | |
1-2-3 新宿西郵便局 私書箱第456号 | 4 | |
Remarks | Block, building, and PO Box info are evaluated. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 50-59 |
6丁目27-18ABC横浜別館 | 1 |
6丁目27-19ABD会館 | 1 | |
1丁目3-4 | 2 | |
1丁目3-5 | 3 | |
Remarks | Block and PO Box info are evaluated. Different forms of some words will match. Note that fewer characters in the address are considered as the sensitivity is lowered. |
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 195 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
104-0054 東京都中央区勝どき1-13-1 イヌイビルカチドキ8F 801号室 (開発部内) | 1 |
104ー0054 とうきょうと中央区勝どき1-13-1 イヌイビル勝どき8階 801号室 | 2 | |
123-4567 渋谷新宿西郵便局 私書箱第456号 | 3 | |
123-4567 郵便事業株式会社 渋谷新宿東支店 私書箱第456号 | 4 | |
Remarks | All components of the address are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-89 |
104-0054 東京都中央区勝どき1-13-1 イヌイビルカチドキ8F 801号室 (開発部内) | 1 |
104ー0054 とうきょうと中央区勝どき1-13-1 イヌイビル勝どき8階 801号室 | 1 | |
123-4567 渋谷新宿西郵便局 私書箱第456号 | 2 | |
123-4567 郵便事業株式会社 渋谷新宿東支店 私書箱第456号 | 2 | |
104-0052 東京都中央区月島1-13-1 ファミリータワー8F 801号室 | 3 | |
104-0052 中央区月島1-13-1 8F 801号室 | 4 | |
123-4567 新宿西郵便局 私書箱第456号 | 5 | |
123-4567 新宿東郵便局 私書箱第456号 | 5 | |
Remarks | All components of the address are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 70-79 |
104-0052 東京都中央区月島1-13-1 ファミリータワー8F 801号室 | 1 |
104-0052 中央区月島1-13-1 8F 801号室 | 1 | |
104-0053 中央区月島1-13-1 8F 801号室 | 2 | |
123-4567 新宿西郵便局 私書箱第456号 | 3 | |
123-4567 新宿東郵便局 私書箱第456号 | 3 | |
123-8901 渋谷郵便局 私書箱第456号 | 4 | |
Remarks | Prefecture and building name information are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 60-69 |
104-0052 中央区月島1-13-1 8F 801号室 | 1 |
104-0053 中央区月島1-13-1 8F 801号室 | 1 | |
123-4567 新宿東郵便局 私書箱第456号 | 2 | |
123-8901 渋谷郵便局 私書箱第456号 | 2 | |
460-1234 名古屋市緑区1-13-1 8F 801号室 | 3 | |
460-5678 名古屋市中区1-13-2 9F 901号室 | 4 | |
Remarks | Prefecture, building name, and PO Box information are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 50-59 |
460-1234 名古屋市緑区1-13-1 8F 801号室 | 1 |
460-5678 名古屋市中区1-13-2 9F 901号室 | 1 | |
Remarks | Only postal code, primary city, block, and PO Box numbers are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. |
Address (PO Box Only) | ||
---|---|---|
Description | The Address (PO Box Only) match definition generates match codes which can be used to cluster records containing the PO Box portion of an address. | |
Max Length of Match Code | 40 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
郵便事業株式会社 新東京支店 私書箱123号 | 1 |
郵便事業株式会社 新東京支店 私書箱123号 | 1 | |
新東京郵便局 私書箱123号 | 2 | |
Remarks | Full-width and half-width expressions match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 70-89 |
郵便事業株式会社 新東京支店 私書箱123号 | 1 |
新東京郵便局 私書箱123号 | 1 | |
横浜北郵便局 私書箱1234号 | 2 | |
横浜南郵便局 私書箱1234号 | 3 | |
Remarks | Different representations of the same post office match. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 50-69 |
横浜北郵便局 私書箱1234号 | 1 |
横浜南郵便局 私書箱1234号 | 1 | |
Remarks | Only PO Box numbers are evaluated. Post office names are ignored. Note that fewer characters in the address are considered as the sensitivity is lowered. |
Address (Street Only) | ||
---|---|---|
Description | The Address (Street Only) match definition generates match codes which can be used to cluster records containing the street portion of an address. | |
Max Length of Match Code | 68 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
1丁目14の9タナカ大宮桜木ビル5F502(総務部内) | 1 |
1丁目14の9タナカ大宮桜木町ビル5F502(開発部内) | 2 | |
Remarks | All components of the address are evaluated except for PO Box info. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-89 |
1丁目14の9タナカ大宮桜木ビル5F502(総務部内) | 1 |
1丁目14の9タナカ大宮桜木町ビル5F502(開発部内) | 1 | |
一丁目四-二二大山生命宇都宮南ビル16階1623号室 | 2 | |
一丁目四-二二大山生命宇都宮北ビル16階1635号室 | 3 | |
Remarks | Block, building, floor, and room information are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 70-79 |
一丁目四-二二大山生命宇都宮南ビル16階1623号室 | 1 |
一丁目四-二二大山生命宇都宮北ビル16階1635号室 | 1 | |
2丁目6の3五ツ橋ビル3階302 | 2 | |
2丁目6の3五ツ橋ビル5階502 | 3 | |
Remarks | Block, building, floor, and room information are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 60-69 |
2丁目6の3五ツ橋ビル3階302 | 1 |
2丁目6の3五ツ橋ビル5階502 | 1 | |
6丁目27-18ABC横浜別館 | 2 | |
6丁目27-19ABD会館 | 3 | |
Remarks | Only block and building information are evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 50-59 |
6丁目27-18ABC横浜別館 | 1 |
6丁目27-19ABD会館 | 1 | |
1丁目3-4 | 2 | |
1丁目3-5 | 3 | |
Remarks | Only block information is evaluated. Note that fewer characters in the address are considered as the sensitivity is lowered. |
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 80 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
なごやしちくさくいまいけみなみ | 1 |
ナゴヤシチクサクイマイケミナミ | 1 | |
名古屋市千種区今池南 | 1 | |
札幌市白石区菊水元町八条 | 2 | |
札幌市白石区菊水元町九条 | 3 | |
Remarks | City and town names are evaluated. Kanji and kana are matched. | |
Input | Cluster ID | |
Example 2 Sensitivities 85-89 |
札幌市白石区菊水元町八条 | 1 |
札幌市白石区菊水元町九条 | 1 | |
札幌市豊平区美園三条 | 2 | |
札幌市豊平区美園四条 | 3 | |
Remarks | City and town names are evaluated. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 75-84 |
札幌市白石区菊水元町八条 | 1 |
札幌市豊平区美園三条 | 2 | |
札幌市豊平区美園四条 | 2 | |
Remarks | City and town names are evaluated. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 65-74 |
札幌市白石区菊水元町八条 | 1 |
札幌市白石区北郷三条 | 1 | |
札幌市中央区中島公園 | 2 | |
利尻郡利尻町 | 3 | |
利尻郡利尻富士町 | 4 | |
帯広市空港南町 | 5 | |
Remarks | City names are evaluated. Town names are ignored. Kanji and kana are matched. | |
Input | Cluster ID | |
Example 5 Sensitivities 55-64 |
札幌市白石区菊水元町八条 | 1 |
札幌市白石区北郷三条 | 1 | |
札幌市中央区中島公園 | 2 | |
利尻郡利尻町 | 3 | |
利尻郡利尻富士町 | 3 | |
帯広市空港南 | 4 | |
Remarks | City names are evaluated. Town names are ignored. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 6 Sensitivities 50-54 |
札幌市白石区菊水元町八条 | 1 |
札幌市中央区中島公園 | 1 | |
帯広市空港南町 | 2 | |
Remarks | Only primary city names are evaluated. Secondary city and town information are ignored. |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information, which typically includes postal code, prefecture, and city information. | |
Max Length of Match Code | 104 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
104-0054 東京都中央区勝どき | 1 |
104-0054 トウキョウトチュウオウクカチドキ | 1 | |
アバシリグンオオゾラチョウメマンベツニシ2ジョウ | 2 | |
アバシリグンオオゾラチョウメマンベツニシ3ジョウ | 3 | |
Remarks | All components of the input string are evaluated. Kanji and kana are matched. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-89 |
104-0054 東京都中央区勝どき | 1 |
104-0054 トウキョウトチュウオウクカチドキ | 1 | |
ホッカイドウアバシリグンオオゾラチョウメマンベツニシ2ジョウ | 2 | |
ホッカイドウアバシリグンオオゾラチョウメマンベツニシ3ジョウ | 2 | |
神奈川県横浜市鶴見区市場下町 | 3 | |
神奈川県横浜市鶴見区市場東中町 | 4 | |
Remarks | All components of the input string are evaluated. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 3 Sensitivities 70-79 |
〒230 神奈川県横浜市鶴見区市場下町 | 1 |
〒230 神奈川県横浜市鶴見区市場東中町 | 1 | |
〒230 神奈川県横浜市鶴見区汐入町 | 1 | |
〒230 神奈川県横浜市磯子区杉田 | 2 | |
Remarks | All components of the input string are evaluated except for the town name. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 60-69 |
〒230-0043 神奈川県横浜市鶴見区汐入町 | 1 |
〒230-0073 神奈川県横浜市磯子区杉田 | 1 | |
〒212 神奈川県川崎市幸区南加瀬 | 2 | |
〒212 神奈川県鎌倉市幸区南加瀬 | 3 | |
Remarks | Only the postal code, prefecture, and primary city name are evaluated. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 5 Sensitivities 50-59 |
〒212 神奈川県川崎市幸区南加瀬 | 1 |
〒212 神奈川県鎌倉市幸区南加瀬 | 1 | |
Remarks | Only the postal code, prefecture, and primary city name are evaluated. Kanji and kana are matched. Note that fewer characters in the address are considered as the sensitivity is lowered. |
Date | ||
---|---|---|
Description | The Date match definition generates match codes which can be used to cluster records containing date information. | |
Max Length of Match Code | 15 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 85-100 |
2001/3/14 | 1 |
2001年3月14日 | 1 | |
3/14/2001 | 1 | |
14-mar-01 | 1 | |
H13.3.14 | 1 | |
平成十三年三月十四日 | 1 | |
2001/3/14 | 1 | |
平成元年六月三十日 | 2 | |
昭和64年六月三十日 | 2 | |
2001/12/14 | 3 | |
2001/12/15 | 4 | |
Remarks | All digits of the year, month, and day are evaluated. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-84 |
2001/12/14 | 1 |
2001/12/15 | 1 | |
2001/12/24 | 2 | |
Remarks | All digits of the year and month are evaluated. Only one digit of the day is evaluated. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 3 Sensitivities 75-79 |
2001/12/15 | 1 |
2001/12/24 | 1 | |
2001/11/14 | 2 | |
Remarks | All digits of the year and month are evaluated. The day is ignored. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 4 Sensitivities 70-74 |
2001/12/24 | 1 |
2001/11/14 | 1 | |
2001/9/14 | 2 | |
Remarks | All digits of the year are evaluated. Only one digit of the month is evaluated. The day is ignored. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 5 Sensitivities 65-69 |
2001/11/14 | 1 |
2001/9/14 | 1 | |
2002/12/14 | 2 | |
Remarks | All digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 6 Sensitivities 60-64 |
2001/9/14 | 1 |
2002/12/14 | 1 | |
2012/12/14 | 2 | |
Remarks | Only the first three digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. | |
Input | Cluster ID | |
Example 7 Sensitivities 50-59 |
2002/12/14 | 1 |
2012/12/14 | 1 | |
1990/12/14 | 2 | |
Remarks | Only the first two digits of the year are evaluated. The month and day are ignored. Full-width and half-width characters match. Kanji numerals and Arabic numerals match. Any separators (including Japanese characters) match. Names of months match the corresponding digits that represent those months. Japanese Nengo years and Western years match. When the day and year are ambiguous, the last number is assumed to be the year. Two-digit sequences in the range 00-29 are assumed to represent the years 2000-2029, and two-digit sequences in the range 30-99 the years 1930-1999. When a year belongs to two Japanese Nengo, the two Nengo years match. |
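
The year rules in the Date remarks can be made concrete with a short Python sketch. Everything here is illustrative: the names `expand_two_digit_year`, `nengo_to_western`, and `date_key` are hypothetical, the era table covers only the two eras that appear in the examples, and the real match code format is not documented on this page. The sketch also assumes that "one digit" of the day or month means the leading digit of the zero-padded value, which is consistent with the example clusters but is not stated in the source.

```python
# Offset such that era year 1 maps to the first Western year of the era.
# Illustrative subset: Showa began in 1926, Heisei in 1989.
ERA_OFFSET = {"昭和": 1925, "S": 1925, "平成": 1988, "H": 1988}

# (lowest sensitivity of the band, digits of YYYYMMDD kept), per the Remarks.
KEPT_DIGITS = [(85, 8), (80, 7), (75, 6), (70, 5), (65, 4), (60, 3), (50, 2)]


def expand_two_digit_year(yy: int) -> int:
    """Pivot from the Remarks: 00-29 -> 2000-2029, 30-99 -> 1930-1999."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= 29 else 1900 + yy


def nengo_to_western(era: str, era_year: int) -> int:
    """Convert a Japanese era year to a Western year (元年 is era year 1)."""
    return ERA_OFFSET[era] + era_year


def date_key(year: int, month: int, day: int, sensitivity: int) -> str:
    """Hypothetical comparison key that keeps fewer digits as sensitivity drops."""
    ymd = f"{year:04d}{month:02d}{day:02d}"
    for floor, kept in KEPT_DIGITS:
        if sensitivity >= floor:
            return ymd[:kept]
    raise ValueError("sensitivity must be at least 50")


print(nengo_to_western("H", 13))        # H13.3.14 falls in 2001
print(nengo_to_western("昭和", 64))      # same Western year as 平成元年
print(date_key(2001, 12, 14, 80) == date_key(2001, 12, 15, 80))
```

Under these assumptions, 平成元年 and 昭和64年 both resolve to 1989, which is why they share a cluster in Example 1.
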
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 20 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
田中サチコ | 1 |
田中サチコ | 1 | |
田中さちこ | 2 | |
渡辺二郎 | 3 | |
渡邊二郎 | 4 | |
Remarks | The family name and given name are evaluated. Half-width and Full-width Katakana are matched. | |
Input | Cluster ID | |
Example 2 Sensitivities 85-89 |
田中サチコ | 1 |
田中さちこ | 1 | |
渡辺二郎 | 2 | |
渡邊二郎 | 2 | |
伊藤三郎 | 3 | |
伊東三郎 | 4 | |
いとう三郎 | 5 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. | |
Input | Cluster ID | |
Example 3 Sensitivities 80-84 |
伊藤三郎 | 1 |
伊東三郎 | 1 | |
いとう三郎 | 1 | |
佐藤美幸 | 2 | |
佐藤武 | 3 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. Note that fewer characters in the name are considered as the sensitivity is lowered. | |
Input | Cluster ID | |
Example 4 Sensitivities 75-79 |
伊藤三郎 | 1 |
伊東三郎 | 1 | |
いとう三郎 | 1 | |
佐藤美幸 | 2 | |
佐藤武 | 3 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. | |
Input | Cluster ID | |
Example 5 Sensitivities 70-74 |
伊藤三郎 | 1 |
伊東三郎 | 1 | |
いとう三郎 | 1 | |
佐藤美幸 | 2 | |
佐藤武 | 3 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. | |
Input | Cluster ID | |
Example 6 Sensitivities 65-69 |
伊藤三郎 | 1 |
伊東三郎 | 1 | |
いとう三郎 | 1 | |
佐藤美幸 | 2 | |
佐藤武 | 3 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. | |
Input | Cluster ID | |
Example 7 Sensitivities 60-64 |
伊藤三郎 | 1 |
伊東三郎 | 1 | |
いとう三郎 | 1 | |
佐藤美幸 | 2 | |
佐藤武 | 3 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. | |
Input | Cluster ID | |
Example 8 Sensitivities 55-59 |
佐藤美幸 | 1 |
佐藤武 | 2 | |
Remarks | The family name and given name are evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. | |
Input | Cluster ID | |
Example 9 Sensitivities 50-54 |
佐藤美幸 | 1 |
佐藤武 | 1 | |
Remarks | Only the family name is evaluated. "Old style" Kanji representations are matched to modern Kanji representations. Different variations of Kanji are matched. Half-width and Full-width Katakana are matched. Katakana, Hiragana, and Romaji are matched. Some Kanji family names are matched with their Katakana, Hiragana, and Romaji representations. |
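
The kana matching described in the Name remarks (half-width and full-width Katakana match; Katakana and Hiragana match at sensitivities 85-89 and below) can be approximated with Unicode normalization. The following is a minimal sketch, not the QKB algorithm: `fold_kana` is a hypothetical helper, and it does not attempt the Kanji variant or Romaji matching that the remarks also mention.

```python
import unicodedata


def fold_kana(text: str) -> str:
    """Fold half-width katakana to full-width, then hiragana to katakana."""
    # NFKC maps half-width katakana (U+FF66..U+FF9F) to full-width katakana.
    text = unicodedata.normalize("NFKC", text)
    out = []
    for ch in text:
        # Hiragana U+3041..U+3096 sit exactly 0x60 below their katakana counterparts.
        if "\u3041" <= ch <= "\u3096":
            out.append(chr(ord(ch) + 0x60))
        else:
            out.append(ch)
    return "".join(out)


print(fold_kana("田中さちこ") == fold_kana("田中サチコ"))
```

Kanji pass through unchanged, so pairs such as 渡辺 and 渡邊 would still need a separate variant table.
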
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 35 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
タナカ鉄工株式会社 | 1 |
タナカ鉄工株式会社 | 1 | |
ソニー株式会社大阪第1工場 | 2 | |
ソニー株式会社大阪第2工場 | 3 | |
㈱ソニー北九州工場 | 4 | |
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
ソニー株式会社大阪第1工場 | 1 |
ソニー株式会社大阪第2工場 | 1 | |
㈱ソニー北九州工場 | 2 | |
國學院大學 | 3 | |
国学院大学 | 4 | |
株式会社くぼた | 5 | |
株式会社クボタ | 6 | |
Remarks | For sensitivities 90-100, Organization name and site information are evaluated. Half-width and full-width Katakana are matched. Company legal forms are ignored. | |
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
ソニー株式会社大阪第1工場 | 1 |
ソニー株式会社大阪第2工場 | 1 | |
㈱ソニー北九州工場 | 1 | |
國學院大學 | 2 | |
国学院大学 | 2 | |
株式会社くぼた | 3 | |
株式会社クボタ | 3 | |
インシュアランスとうきょう | 4 | |
インシュアランスとうきゅう | 5 | |
Input | Cluster ID | |
Example 4 Sensitivities 80-84 |
インシュアランスとうきょう | 1 |
インシュアランスとうきゅう | 1 | |
エービーシープライム生命保険 | 2 | |
エービーシープライム損害保険 | 3 | |
Input | Cluster ID | |
Example 5 Sensitivities 75-79 |
エービーシープライム生命保険 | 1 |
エービーシープライム損害保険 | 1 | |
KOGAファイナンス | 2 | |
KOGAファイナンシング | 3 | |
Input | Cluster ID | |
Example 6 Sensitivities 70-74 |
KOGAファイナンス | 1 |
KOGAファイナンシング | 1 | |
わたなべ金属加工株式会社 | 2 | |
わたなべ金属鍛造株式会社 | 3 | |
Input | Cluster ID | |
Example 7 Sensitivities 65-69 |
わたなべ金属加工株式会社 | 1 |
わたなべ金属鍛造株式会社 | 1 | |
(株)テレビキャスト | 2 | |
(株)テレビキョーワ | 3 | |
Input | Cluster ID | |
Example 8 Sensitivities 60-64 |
(株)テレビキャスト | 1 |
(株)テレビキョーワ | 1 | |
トヨタ自動車 | 2 | |
トヨタホーム | 3 | |
Input | Cluster ID | |
Example 9 Sensitivities 55-59 |
トヨタ自動車 | 1 |
トヨタホーム | 1 | |
パナソニック | 2 | |
パナホーム | 3 | |
Input | Cluster ID | |
Example 10 Sensitivities 50-54 |
パナソニック | 1 |
パナホーム | 1 | |
Remarks | For sensitivities 50-89, Organization name is evaluated. Half-width and full-width Katakana are matched. Company legal forms are ignored. Old style Kanji and modern Kanji are matched. Katakana and Hiragana are matched. |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 22 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 95-100 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
0521234567 ext.123 | 2 | |
0521234567 ext.124 | 3 | |
0521234567 | 4 | |
1234567 | 5 | |
0591234567 | 6 | |
0541231234 | 7 | |
0541231235 | 8 | |
Remarks | ||
Input | Cluster ID | |
Example 2 Sensitivities 90-94 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
0521234567 ext.123 | 2 | |
0521234567 ext.124 | 2 | |
0521234567 | 3 | |
1234567 | 4 | |
0591234567 | 5 | |
0541231234 | 6 | |
0541231235 | 7 | |
Remarks | ||
Input | Cluster ID | |
Example 3 Sensitivities 85-89 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
0521234567 ext.123 | 2 | |
0521234567 ext.124 | 2 | |
0521234567 | 2 | |
1234567 | 3 | |
0591234567 | 4 | |
0541231234 | 5 | |
0541231235 | 5 | |
Remarks | ||
Input | Cluster ID | |
Example 4 Sensitivities 70-84 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
0521234567 ext.123 | 2 | |
0521234567 ext.124 | 2 | |
0521234567 | 2 | |
1234567 | 3 | |
0591234567 | 4 | |
0541231234 | 5 | |
Remarks | ||
Input | Cluster ID | |
Example 5 Sensitivities 65-69 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
0521234567 ext.123 | 2 | |
0521234567 ext.124 | 2 | |
0521234567 | 2 | |
0591234567 | 2 | |
0541231234 | 3 | |
1234567 | 4 | |
Remarks | ||
Input | Cluster ID | |
Example 6 Sensitivities 50-64 |
(03)1234-5678 | 1 |
(03)1234-5678 | 1 | |
直通(+81)3-12345678 | 1 | |
(04)12345678 ext.123 | 1 | |
(04)12345678 ext.124 | 1 | |
(04)12345678 | 1 | |
12345678 | 1 | |
(06)12345678 | 1 | |
(06)12349999 | 1 | |
Remarks | Note that the number of digits retained in the match code for the Country Code, Area Code, and Extension tokens depends on the sensitivity level. The number of digits retained in the match code for the Base Number depends on the sensitivity level and the number of digits in the Base Number input. |
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 85-100 |
104-0052 | 1 |
〒104-0052 | 1 | |
郵便番号一〇四の〇〇五二 | 1 | |
1040053 | 2 | |
1040054 | 3 | |
1040065 | 4 | |
1040123 | 5 | |
1050123 | 6 | |
Remarks | Primary and secondary postal codes are evaluated. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-84 |
104-0052 | 1 |
〒104-0052 | 1 | |
郵便番号一〇四の〇〇五二 | 1 | |
1040053 | 1 | |
1040054 | 1 | |
1040065 | 2 | |
1040123 | 3 | |
1050123 | 4 | |
Remarks | The last digit of the secondary postal code is ignored. | |
Input | Cluster ID | |
Example 3 Sensitivities 70-79 |
104-0052 | 1 |
〒104-0052 | 1 | |
郵便番号一〇四の〇〇五二 | 1 | |
1040053 | 1 | |
1040054 | 1 | |
1040065 | 1 | |
1040123 | 2 | |
1050123 | 3 | |
Remarks | The last two digits of the secondary postal code are ignored. | |
Input | Cluster ID | |
Example 4 Sensitivities 50-69 |
104-0052 | 1 |
〒104-0052 | 1 | |
郵便番号一〇四の〇〇五二 | 1 | |
1040053 | 1 | |
1040054 | 1 | |
1040065 | 1 | |
1040123 | 1 | |
1050123 | 2 | |
Remarks | Only the primary postal code is evaluated. |
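
The four Postal Code examples can be reproduced with a small Python sketch that normalizes the input to digits and then keeps fewer digits as the sensitivity drops. This is an illustration under stated assumptions, not the actual match code algorithm: the names `postal_digits`, `postal_key`, and `cluster_ids` are hypothetical, Kanji numerals are treated as positional digits only, and the first three digits are taken to be the primary postal code.

```python
import unicodedata

# Positional Kanji numerals only; forms such as 十 are not handled in this sketch.
KANJI_DIGITS = str.maketrans("〇一二三四五六七八九", "0123456789")


def postal_digits(text: str) -> str:
    """Reduce a postal code string to its ASCII digits."""
    # NFKC converts full-width digits to ASCII; Kanji numerals are mapped explicitly.
    text = unicodedata.normalize("NFKC", text).translate(KANJI_DIGITS)
    return "".join(ch for ch in text if ch in "0123456789")


def postal_key(text: str, sensitivity: int) -> str:
    """Keep 7, 6, 5, or 3 digits, following the sensitivity bands in the Remarks."""
    if sensitivity >= 85:
        kept = 7
    elif sensitivity >= 80:
        kept = 6
    elif sensitivity >= 70:
        kept = 5
    else:
        kept = 3
    return postal_digits(text)[:kept]


def cluster_ids(values, sensitivity):
    """Assign cluster IDs in first-seen order to values that share a key."""
    ids = {}
    return [ids.setdefault(postal_key(v, sensitivity), len(ids) + 1) for v in values]


codes = ["104-0052", "〒104-0052", "郵便番号一〇四の〇〇五二",
         "1040053", "1040054", "1040065", "1040123", "1050123"]
print(cluster_ids(codes, 80))  # [1, 1, 1, 1, 1, 2, 3, 4], as in Example 2
```

Records that share a key receive the same cluster ID, which is the role match codes play in the tables above.
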
Prefecture | ||
---|---|---|
Description | The Prefecture match definition generates match codes which can be used to cluster records containing prefecture names. | |
Max Length of Match Code | 15 characters | |
Input | Cluster ID | |
Example 1 Sensitivities 90-100 |
愛知県 | 1 |
あいちけん | 1 | |
アイチケン | 1 | |
アイチ | 1 | |
カナガワケン | 2 | |
カナガクケン | 3 | |
カナカワケン | 4 | |
Remarks | Kanji and kana are matched. | |
Input | Cluster ID | |
Example 2 Sensitivities 80-89 |
愛知県 | 1 |
あいちけん | 1 | |
アイチケン | 1 | |
アイチ | 1 | |
カナガワケン | 2 | |
カナガクケン | 2 | |
カナカワケン | 3 | |
Remarks | Kanji and kana are matched. Note that fewer characters are considered as the sensitivity decreases. | |
Input | Cluster ID | |
Example 3 Sensitivities 50-79 |
愛知県 | 1 |
あいちけん | 1 | |
アイチケン | 1 | |
アイチ | 1 | |
カナガワケン | 2 | |
カナガクケン | 2 | |
カナカワケン | 2 | |
Remarks | Kanji and kana are matched. Note that fewer characters are considered as the sensitivity decreases. |
Address | |||
---|---|---|---|
Description | The Parse definition for Address parses address information. | ||
Output Tokens | Block, Building, Floor, Room, PO Box, Additional Info | ||
Input | Output | ||
Example 1 | 1-13-1 イヌイビル・カチドキ8F | Block | 1-13-1 |
Building | イヌイビル・カチドキ | ||
Floor | 8F | ||
Room | |||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | 二丁目五番八十二号山王パークビルB館5階502 (総務部内) | Block | 二丁目五番八十二号 |
Building | 山王パークビルB館 | ||
Floor | 5階 | ||
Room | 502 | ||
PO Box | |||
Additional Info | (総務部内) | ||
Input | Output | ||
Example 3 | 新東京郵便局 私書箱123 | Block | |
Building | |||
Floor | |||
Room | |||
PO Box | 新東京郵便局 私書箱123 | ||
Additional Info | |||
Remarks | The Parse definition for Address recognizes both ASCII and Kanji numerals. |
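
To show what a token-based parse result looks like, here is a toy Python sketch that handles only the three input shapes in the examples above. It is not how the QKB parses addresses: the regular expression, the `parse_address` function, and the fallback of placing unrecognized input in Additional Info are all assumptions made for illustration.

```python
import re

TOKENS = ("Block", "Building", "Floor", "Room", "PO Box", "Additional Info")

# Toy pattern: block, building, floor, optional room, optional trailing note.
STREET = re.compile(
    r"^(?P<block>[\d〇一二三四五六七八九十丁目番号の\-]+)\s*"
    r"(?P<building>.*?)\s*"
    r"(?P<floor>\d+(?:F|階))"
    r"(?P<room>\d*)\s*"
    r"(?P<info>.*)$"
)


def parse_address(text: str) -> dict:
    """Split an address line into the six output tokens (empty string if absent)."""
    tokens = dict.fromkeys(TOKENS, "")
    text = text.strip()
    if "私書箱" in text:
        # PO Box lines are kept whole, as in Example 3.
        tokens["PO Box"] = text
        return tokens
    m = STREET.match(text)
    if m is None:
        # Assumption of this sketch: unrecognized input goes to Additional Info.
        tokens["Additional Info"] = text
        return tokens
    tokens.update({
        "Block": m["block"],
        "Building": m["building"],
        "Floor": m["floor"],
        "Room": m["room"],
        "Additional Info": m["info"],
    })
    return tokens


for token, value in parse_address("二丁目五番八十二号山王パークビルB館5階502 (総務部内)").items():
    print(token, "|", value)
```

The character class for the block accepts both ASCII and Kanji numerals, mirroring the remark above.
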
Address (Full) | |||
---|---|---|---|
Description | The Parse definition for Address (Full) parses full two-line addresses. | ||
Output Tokens | Postal Code, Prefecture, City, Town, Block, Building, Floor, Room, PO Box, Additional Info | ||
Input | Output | ||
Example 1 | 104-0054 東京都中央区勝どき1-13-1 イヌイビルカチドキ8F 801号室 (開発部内) | Postal Code | 104-0054 |
Prefecture | 東京都 | ||
City | 中央区 | ||
Town | 勝どき | ||
Block | 1-13-1 | ||
Building | イヌイビルカチドキ | ||
Floor | 8F | ||
Room | 801号室 | ||
PO Box | |||
Additional Info | 開発部内 | ||
Input | Output | ||
Example 2 | 三四五-六七八九 新潟南魚沼郡湯沢町神立二丁目五番八十二号山王パークビルB館5階502 (総務部内) | Postal Code | 三四五-六七八九 |
Prefecture | 新潟 | ||
City | 南魚沼郡湯沢町 | ||
Town | 神立 | ||
Block | 二丁目五番八十二号 | ||
Building | 山王パークビルB館 | ||
Floor | 5階 | ||
Room | 502 | ||
PO Box | |||
Additional Info | 総務部内 | ||
Input | Output | ||
Example 3 | 〒163-8799 東京都新宿区 新宿郵便局 私書箱第123号 | Postal Code | 〒163-8799 |
Prefecture | 東京都 | ||
City | 新宿区 | ||
Town | |||
Block | |||
Building | |||
Floor | |||
Room | |||
PO Box | 新宿郵便局 私書箱第123号 | ||
Additional Info | |||
Remarks | Prefecture and city names are recognized with or without an indicator keyword. Both ASCII and Kanji numerals are recognized. |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info | ||
Input | Output | ||
Example 1 | 1-13-1 イヌイビル・カチドキ8F | Recipient | |
Building/Site | イヌイビル・カチドキ | ||
Street | 1-13-1 | ||
Extension | 8F | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | 二丁目五番八十二号山王パークビルB館5階502 (総務部内) | Recipient | |
Building/Site | 山王パークビルB館 | ||
Street | 二丁目五番八十二号 | ||
Extension | 5階 502 | ||
PO Box | |||
Additional Info | (総務部内) | ||
Input | Output | ||
Example 3 | 新東京郵便局 私書箱123 | Recipient | |
Building/Site | |||
Street | |||
Extension | |||
PO Box | 新東京郵便局 私書箱123 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), change them back to Address (Global). |
Address (Global) (v23) | |||
---|---|---|---|
Description | The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient Building/Site Street Extension PO Box Additional Info |
||
Input | Output | ||
Example 1 | 1-13-1 イヌイビル・カチドキ8F | Recipient | |
Building/Site | イヌイビル・カチドキ | ||
Street | 1-13-1 | ||
Extension | 8F | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | 二丁目五番八十二号山王パークビルB館5階502 (総務部内) | Recipient | |
Building/Site | 山王パークビルB館 | ||
Street | 二丁目五番八十二号 | ||
Extension | 5階 502 | ||
PO Box | |||
Additional Info | (総務部内) | ||
Input | Output | ||
Example 3 | 新東京郵便局 私書箱123 | Recipient | |
Building/Site | |||
Street | |||
Extension | |||
PO Box | 新東京郵便局 私書箱123 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), change them back to Address (Global). |
City | |||
---|---|---|---|
Description | The Parse definition for City parses city and town names. | ||
Output Tokens | City Town |
||
Input | Output | ||
Example 1 | 横浜市神奈川区白幡東町 | City | 横浜市神奈川区 |
Town | 白幡東町 | ||
Input | Output | ||
Example 2 | チュウオウクカチドキ | City | チュウオウク |
Town | カチドキ | ||
Input | Output | ||
Example 3 | 名古屋千種今池南 | City | 名古屋千種 |
Town | 今池南 | ||
Remarks | This definition recognizes Kanji, Hiragana, Full-width Katakana, and Half-width Katakana. It recognizes city names with or without identifier keywords ("市", "区", and so on). |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code parses address "last line" data, which typically includes postal code, prefecture, and city information. | ||
Output Tokens | Postal Code Prefecture City Town |
||
Input | Output | ||
Example 1 | 104-0054 東京都中央区勝どき | Postal Code | 104-0054 |
Prefecture | 東京都 | ||
City | 中央区 | ||
Town | 勝どき | ||
Input | Output | ||
Example 2 | 〒1040054 トウキョウトチュウオウクカチドキ | Postal Code | 〒1040054 |
Prefecture | トウキョウト | ||
City | チュウオウク | ||
Town | カチドキ | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The Parse definition for City - State/Province - Postal Code (Global) parses address "last line" data into a globally recognized set of tokens. | ||
Output Tokens | City State/Province Postal Code Additional Info |
||
Input | Output | ||
Example 1 | 3280011 栃木県栃木市大宮町 | City | 栃木市大宮町 |
State/Province | 栃木県 | ||
Postal Code | 3280011 | ||
Additional Info | |||
Input | Output | ||
Example 2 | 〒三二八ー〇〇一一栃木県栃木市大宮町 | City | 栃木市大宮町 |
State/Province | 栃木県 | ||
Postal Code | 〒三二八ー〇〇一一 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name | |||
---|---|---|---|
Description | The Parse definition for Name parses names of individuals. | ||
Output Tokens | Family Name Given Name Name Suffix Title/Additional Info |
||
Input | Output | ||
Example 1 | 鈴木一郎様 | Family Name | 鈴木 |
Given Name | 一郎 | ||
Name Suffix | 様 | ||
Title/Additional Info | |||
Input | Output | ||
Example 2 | 医学博士すずきいちろう殿 | Family Name | すずき |
Given Name | いちろう | ||
Name Suffix | 殿 | ||
Title/Additional Info | 医学博士 | ||
Input | Output | ||
Example 3 | 営業課長スズキイチロウ (一級建築士) | Family Name | スズキ |
Given Name | イチロウ | ||
Name Suffix | |||
Title/Additional Info | 営業課長 (一級建築士) | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | The Parse definition for Name (Global) parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix Given Name Middle Name Family Name Suffix Title/Additional Info |
||
Input | Output | ||
Example 1 | 鈴木一郎様 | Prefix | |
Given Name | 一郎 | ||
Middle Name | |||
Family Name | 鈴木 | ||
Suffix | 様 | ||
Title/Additional Info | |||
Input | Output | ||
Example 2 | 医学博士すずきいちろう殿 | Prefix | |
Given Name | いちろう | ||
Middle Name | |||
Family Name | すずき | ||
Suffix | 殿 | ||
Title/Additional Info | 医学博士 | ||
Input | Output | ||
Example 3 | 営業課長スズキイチロウ (一級建築士) | Prefix | |
Given Name | イチロウ | ||
Middle Name | |||
Family Name | スズキ | ||
Suffix | |||
Title/Additional Info | 営業課長 (一級建築士) | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Organization | |||
---|---|---|---|
Description | The Parse definition for Organization parses organization names. | ||
Output Tokens | Name Legal Form Site Additional Info |
||
Input | Output | ||
Example 1 | 株式会社SASジャパン東京本社 開発部 | Name | SASジャパン |
Legal Form | 株式会社 | ||
Site | 東京本社 | ||
Additional Info | 開発部 | ||
Input | Output | ||
Example 2 | 山田製造(株)西大阪支社 (カスタマーサポート係) | Name | 山田製造 |
Legal Form | (株) | ||
Site | 西大阪支社 | ||
Additional Info | (カスタマーサポート係) | ||
Remarks |
Organization (Global) | |||
---|---|---|---|
Description | The Parse definition for Organization (Global) parses organization names into a globally recognized set of tokens. | ||
Output Tokens | Name Legal Form Site Additional Info |
||
Input | Output | ||
Example 1 | 株式会社SASジャパン東京本社 開発部 | Name | SASジャパン |
Legal Form | 株式会社 | ||
Site | 東京本社 | ||
Additional Info | 開発部 | ||
Input | Output | ||
Example 2 | 山田製造(株)西大阪支社 (カスタマーサポート係) | Name | 山田製造 |
Legal Form | (株) | ||
Site | 西大阪支社 | ||
Additional Info | (カスタマーサポート係) | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone | |||
---|---|---|---|
Description | The Parse definition for Phone parses phone numbers into a set of tokens. | ||
Output Tokens | Country Code Area Code Base Number Extension Line Type Additional Info |
||
Input | Output | ||
Example 1 | Tel(+81) 03-1234-5678 ext.123 (evenings and weekends) | Country Code | (+81) |
Area Code | 03 | ||
Base Number | 1234-5678 | ||
Extension | 123 | ||
Line Type | Tel | ||
Additional Info | (evenings and weekends) | ||
Input | Output | ||
Example 2 | 携帯 03ー1234ー5678 | Country Code | |
Area Code | 03 | ||
Base Number | 1234ー5678 | ||
Extension | |||
Line Type | 携帯 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 0521234567123 | Country Code | |
Area Code | 052 | ||
Base Number | 1234567 | ||
Extension | 123 | ||
Line Type | |||
Additional Info | |||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | The Parse definition for Phone (Global) parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code Area Code Base Number Extension Line Type Additional Info |
||
Input | Output | ||
Example 1 | Tel(+81) 03-1234-5678 ext.123 (evenings and weekends) | Country Code | (+81) |
Area Code | 03 | ||
Base Number | 1234-5678 | ||
Extension | 123 | ||
Line Type | Tel | ||
Additional Info | (evenings and weekends) | ||
Input | Output | ||
Example 2 | 携帯 03ー1234ー5678 | Country Code | |
Area Code | 03 | ||
Base Number | 1234ー5678 | ||
Extension | |||
Line Type | 携帯 | ||
Additional Info | |||
Input | Output | ||
Example 3 | 0521234567123 | Country Code | |
Area Code | 052 | ||
Base Number | 1234567 | ||
Extension | 123 | ||
Line Type | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | The Standardization definition for Address standardizes address information. | |
Input | Output | |
Example 1 | 1ー13ー1 イヌイビル・カチドキ8F 501号室 | 1-13-1 イヌイビル・カチドキ 8F 501 |
Input | Output | |
Example 2 | "二丁目五番八十二号山王パークビルB館5階502(総務部内)" | 2-5-82 山王パークビルB館 5F 502 総務部内 |
Remarks | Numeric expressions are standardized. Full-width alphanumeric characters are converted to half-width characters. Building names written in kana are not transliterated (except that half-width Katakana is converted to full-width Katakana). |
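The numeric standardization of block numbers shown in Example 2 (二丁目五番八十二号 becomes 2-5-82) can be illustrated with a minimal Python sketch. This is not the QKB implementation: the function names `kanji_to_int` and `standardize_block`, the list of block keywords, and the limit to numbers below 1000 are all assumptions made for illustration.

```python
import re

# Kanji digits and the multiplier characters handled by this sketch.
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100}

def kanji_to_int(text):
    """Convert a kanji numeral below 1000 (for example 八十二) to an int."""
    if not any(ch in UNITS for ch in text):
        # No multiplier characters: read the digits positionally (一二 -> 12).
        return int("".join(str(DIGITS[ch]) for ch in text))
    total, current = 0, 0
    for ch in text:
        if ch in DIGITS:
            current = DIGITS[ch]
        else:
            # A bare 十 or 百 counts as 1 x the unit.
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current

# Hypothetical keyword list: a numeral followed by a block keyword.
BLOCK_PART = re.compile(r"([〇一二三四五六七八九十百]+)(丁目|番地|番|号)")

def standardize_block(block):
    """Join the numeric parts of a kanji block expression with hyphens."""
    parts = [str(kanji_to_int(num)) for num, _ in BLOCK_PART.findall(block)]
    return "-".join(parts)

print(standardize_block("二丁目五番八十二号"))
```

The sketch covers only the block portion of an address; building names, floors, and rooms would need separate handling.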
Address (Full) | ||
---|---|---|
Description | The Standardization definition for Address (Full) standardizes full two-line addresses. | |
Input | Output | |
Examples | 1040052 トウキョウ都中央区勝どき1-13-1 イヌイビルカチドキ8F 801号室 (開発部内) | 104-0054 東京都中央区勝どき 1-13-1 イヌイビルカチドキ 8F 801 開発部内 |
"三四五-六七八九 新潟南魚沼郡湯沢町神立二丁目五番八十二号山王パークビルB館5階502 (総務部内) " | 345-6789 新潟南魚沼郡湯沢町神立 2-5-82 山王パークビルB館 5F 502 総務部内 | |
〒163-8799 東京都新宿区 新宿郵便局 私書箱第123号 | 163-8799 東京都新宿区 郵便事業株式会社 新宿支店 私書箱123号 | |
Remarks | Numeric expressions are standardized. Prefecture and city names are converted from kana to kanji. Full-width alphanumeric characters are converted to half-width characters. Prefecture and city identifier keywords are added when possible. Non-logical characters are removed: quotes, blanks, and so on. Arabic numerals in city and town names are converted to kanji (for example, 三番町). Kanji numerals in block numbers are converted to Arabic numerals (for example, 4-3-5). Building names written in kana are not transliterated (except that half-width katakana is converted to full-width katakana). |
City | ||
---|---|---|
Description | The Standardization definition for City standardizes city and town names. | |
Input | Output | |
Examples | 名古屋千種今池南 | 名古屋市千種区今池南 |
チュウオウクカチドキ | 中央区勝どき | |
"札幌市 中央区 北1条東" | 札幌市中央区北一条東 | |
Remarks | Adds city identifier keywords when possible. Converts kana names to kanji. Removes non-logical characters: quotes, blanks, and so on. Makes a best guess of Kanji or Arabic numbers in city and town names depending on the context (for example, 一条, 三番町, 1区). |
City - State/Province - Postal Code | ||
---|---|---|
Description | The Standardization definition for City - State/Province - Postal Code standardizes Postal Code, Prefecture, and City. | |
Input | Output | |
Example 1 | 〒1051234 "東京港浜松町" | 105-1234 東京都港区浜松町 |
Remarks | This definition adds identifier keywords when possible. It removes non-logical characters: quotes, blanks, and so on. | |
Input | Output | |
Example 2 | 一〇四ノ〇〇五四 トウキョウチュウオウクカチドキ | 104-0054 東京都中央区勝どき |
Remarks | This definition converts kanji numerals to Arabic numerals. It converts prefecture and city names from kana to kanji. |
Date (Japanese Calendar) | |||
---|---|---|---|
Description | The Standardization definition for Date (Japanese Calendar) standardizes date expressions to Japanese calendar format (Nengo). | ||
Input | Output | Explanation | |
Examples | 2001/3/14 | 平成13年03月14日 | Standardize fullwidth characters to halfwidth. |
2001年3月14日 | 平成13年03月14日 | Standardize calendar identifier to EEYY年MM月DD日 format. | |
3/14/2001 | 平成13年03月14日 | Consider 4-digit number as year. | |
14-mar-01 | 平成13年03月14日 | Standardize month name. When the day and year are ambiguous, consider the last number to be the year. | |
H13.3.14 | 平成13年03月14日 | Standardize Nengo expression. | |
平成十三年三月十四日 | 平成13年03月14日 | Standardize Kanji numbers. | |
大正十五年一二月二五日 | 昭和元年12月25日 | When a day is assigned to two nengo, the later of the two nengo is used. For example, 1926/12/25 is in 昭和元年, 1912/07/30 is in 大正元年. | |
S01年12月25日 | 昭和元年12月25日 | 01年 is expressed as 元年. | |
Remarks | This definition supports dates from 1901 (明治34年) to 2050 (平成62年). The definition assumes two-digit years 00-29 are 2000-2029. The definition assumes two-digit years 30-99 are 1930-1999. |
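The era assignment described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the QKB's logic: the function name `to_nengo` is hypothetical, the 昭和 and 大正 start dates are the ones quoted in the table, the 平成 start date (1989-01-08) and the 明治 base year (1868) are added from general knowledge, and the two-digit zero-padded era year is inferred from the EEYY pattern. The era table stops at 平成 to match the documented range, which maps 2050 to 平成62年.

```python
from datetime import date

# Era table, newest first. A date on a boundary falls into the later era.
ERAS = [
    ("平成", date(1989, 1, 8)),
    ("昭和", date(1926, 12, 25)),
    ("大正", date(1912, 7, 30)),
    ("明治", date(1868, 1, 1)),  # only the base year matters from 1901 onward
]

def to_nengo(d):
    """Format a date as EEYY年MM月DD日, writing year 1 as 元年."""
    for name, start in ERAS:
        if d >= start:
            year = d.year - start.year + 1
            year_text = "元" if year == 1 else f"{year:02d}"
            return f"{name}{year_text}年{d.month:02d}月{d.day:02d}日"
    raise ValueError("date is before the first era in the table")

print(to_nengo(date(2001, 3, 14)))
print(to_nengo(date(1926, 12, 25)))
```

Listing the eras newest first means the first match is always the later of two eras that share a boundary day, which mirrors the rule in the 大正十五年 example.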
Date (Western Calendar) | |||
---|---|---|---|
Description | The Standardization definition for Date (Western Calendar) standardizes date expressions to Western calendar format. | ||
Input | Output | Explanation | |
Examples | 2001/3/14 | 2001/03/14 | Standardize Fullwidth characters to Halfwidth. |
2001年3月14日 | 2001/03/14 | Standardize calendar identifier to YYYY/MM/DD format. | |
3/14/2001 | 2001/03/14 | Consider 4-digit number as year. | |
14-mar-01 | 2001/03/14 | Standardize month name to digit. When the day and year are ambiguous, consider the last number to be the year. | |
H13.3.14 | 2001/03/14 | Convert Nengo year to western year. | |
平成13年3月14日 | 2001/03/14 | Convert Nengo year to western year. | |
平成十三年三月十四日 | 2001/03/14 | Standardize Kanji numbers. | |
平成元年三月十四日 | 1989/03/14 | ||
Remarks | This definition supports dates from 1901 (明治34年) to 2050 (平成62年). The definition assumes two-digit years 00-29 are 2000-2029. The definition assumes two-digit years 30-99 are 1930-1999. |
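The two-digit year rule in the Remarks, and the Nengo-to-western conversion in the H13.3.14 example, can be sketched in Python. The function names are hypothetical, only the abbreviated `H13.3.14` style is handled, and the `M` and `T` abbreviations are assumptions (the examples on this page show only `H` and `S`).

```python
import re

# First western year of each era, keyed by its Latin abbreviation.
ERA_FIRST_YEAR = {"M": 1868, "T": 1912, "S": 1926, "H": 1989}

def expand_two_digit_year(yy):
    """Apply the documented pivot: 00-29 -> 2000-2029, 30-99 -> 1930-1999."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= 29 else 1900 + yy

def nengo_abbrev_to_western(text):
    """Convert a string such as 'H13.3.14' to 'YYYY/MM/DD'."""
    m = re.fullmatch(r"([MTSH])(\d{1,2})\.(\d{1,2})\.(\d{1,2})", text)
    if m is None:
        raise ValueError(f"unsupported date expression: {text!r}")
    era = m.group(1)
    yy, mm, dd = int(m.group(2)), int(m.group(3)), int(m.group(4))
    # Era year 1 is the era's first western year, hence the -1.
    year = ERA_FIRST_YEAR[era] + yy - 1
    return f"{year:04d}/{mm:02d}/{dd:02d}"

print(expand_two_digit_year(1))
print(nengo_abbrev_to_western("H13.3.14"))
```

The sketch does not validate that the month and day form a real calendar date, and it does not check era boundaries within a year.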
Name | ||
---|---|---|
Description | The Standardization definition for Name standardizes names of individuals. | |
Input | Output | |
Examples | 鈴木一郎様 | 鈴木 一郎 |
"営業課長鈴木 一郎様 (一級建築士)" | 鈴木 一郎, 営業課長 一級建築士 | |
すずきいちろう | すずき いちろう | |
Remarks | Name suffixes are discarded. A single white space is placed between the family name and the given name. Titles and additional information are placed after name. Titles are converted to Kanji. Names written in Katakana are left in Katakana (they are not transliterated to Kanji). Similarly, names written in Hiragana are left in Hiragana. |
Organization | ||
---|---|---|
Description | The Standardization definition for Organization standardizes organization names. | |
Input | Output | |
Examples | (株)SASジャパン東京本社 開発部 | SASジャパン 株式会社, 東京本社 開発部 |
"ツキシマクミンセンター" | 月島区民センター | |
Remarks | Half-width Katakana is transformed to full-width Katakana. Full-width ASCII characters are transformed to half-width. Legal form information appearing at the beginning of a string is moved to the end of the organization name. Legal forms are converted to long-form. Kana forms of well-known company names are converted to Kanji. |
Phone | ||
---|---|---|
Description | The Standardization definition for Phone standardizes phone numbers for domestic use. | |
Input | Output | |
Examples | 電話 (03)1234-5678 内線123 | (03) 1234 5678 x123, Tel |
"81312345678" | (03) 1234 5678 | |
+1 (919) 447-3000 | +1 9194473000 | |
Remarks |
Phone (Electronic) | ||
---|---|---|
Description | The Standardization definition for Phone (Electronic) standardizes phone numbers for automated calling systems. | |
Input | Output | |
Examples | 電話 (03)1234-5678 内線123 | +81312345678 |
+1-800-HOLIDAY | +18004654329 | |
+1-800-DATAFLUX | +180032823589 | |
0120 VISITJAPAN | +811208474852726 | |
Remarks |
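The letter-to-digit outputs in the examples above are consistent with the standard telephone keypad mapping (ABC=2 through WXYZ=9). A minimal Python sketch of that idea follows. The function names are hypothetical, the default country code of 81 and the removal of one leading zero from domestic numbers are assumptions read off the 0120 example, and extension handling (as in the 内線123 example) is not modeled.

```python
# Standard telephone keypad layout.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}

def vanity_to_digits(text):
    """Keep ASCII digits, map Latin letters to keypad digits, drop the rest."""
    out = []
    for ch in text.upper():
        if ch.isascii() and ch.isdigit():
            out.append(ch)
        elif ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
        # Spaces, hyphens, and plus signs are dropped here.
    return "".join(out)

def to_electronic(text, country_code="81"):
    """Return a '+'-prefixed digit string suitable for automated dialing."""
    digits = vanity_to_digits(text)
    if text.strip().startswith("+"):
        # The input already carries a country code.
        return "+" + digits
    if digits.startswith("0"):
        # Assumed domestic number: drop the trunk prefix.
        digits = digits[1:]
    return "+" + country_code + digits

print(to_electronic("+1-800-HOLIDAY"))
print(to_electronic("0120 VISITJAPAN"))
```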
Phone (with Country Code) | ||
---|---|---|
Description | The Standardization definition for Phone (with Country Code) standardizes phone numbers for international use. | |
Input | Output | |
Examples | 電話 (03)1234-5678 内線123 | +81 3 1234 5678 x123, Tel |
81312345678 (after 4pm) | +81 3 1234 5678, After 4PM | |
0044 (0)20 12345000 | +44 2012345000 | |
Remarks |
Postal Code | ||
---|---|---|
Description | The Standardization definition for Postal Code standardizes postal codes. | |
Input | Output | |
Examples | 〒1040052 | 104-0052 |
〶一〇四の〇〇五二 | 104-0052 | |
"907-11" | 907-11 | |
Remarks | The Standardization definition for Postal Code inserts a hyphen after the first three digits. Converts kanji numerals to Arabic numerals. Converts full-width ASCII to half-width. Removes postal code marker characters. |
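The four steps listed in the Remarks can be sketched in Python as shown below. This is an illustration, not the QKB implementation; the function name `standardize_postal_code` is hypothetical, and the sketch simply discards every character that is not a digit after conversion.

```python
import re
import unicodedata

# Kanji numerals mapped one-to-one onto ASCII digits.
KANJI_DIGITS = str.maketrans("〇一二三四五六七八九", "0123456789")

def standardize_postal_code(text):
    """Normalize a Japanese postal code to NNN-NNNN (or NNN-NN) form."""
    # NFKC folds full-width digits and letters to their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Convert kanji numerals to Arabic numerals.
    text = text.translate(KANJI_DIGITS)
    # Remove postal marks, quotes, separators, and anything else non-numeric.
    digits = re.sub(r"[^0-9]", "", text)
    if len(digits) <= 3:
        return digits
    # Insert a hyphen after the first three digits.
    return digits[:3] + "-" + digits[3:]

print(standardize_postal_code("〒1040052"))
print(standardize_postal_code("〶一〇四の〇〇五二"))
print(standardize_postal_code('"907-11"'))
```

Because the hyphen is inserted by position rather than by total length, older five-digit codes such as 907-11 pass through in the same form as the third example in the table.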
Prefecture | ||
---|---|---|
Description | The Standardization definition for Prefecture standardizes prefecture names. | |
Input | Output | |
Examples | 愛知 | 愛知県 |
あいちけん | 愛知県 | |
"アイチ" | 愛知県 | |
Remarks | This definition adds prefecture identifier keywords when possible. Converts names from kana to kanji. Removes non-logical characters: quotes, blanks, and so on. |
In addition to the definitions listed on this page, the Japanese, Japan locale also inherits all definitions for the Japanese language and all Global definitions.
Documentation Feedback: yourturn@sas.com
Doc ID: QKBCI_JAJPN_defs.html