SAS Quality Knowledge Base for Contact Information 26
Definitions for the Hungarian, Hungary locale are described below.
Case Definitions
Extraction Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (City) | ||
---|---|---|
Description | The Proper (City) case definition propercases city names. | |
Examples | Input | Output |
BUDAPEST | Budapest | |
Szeged-tápé | Szeged-Tápé | |
Remarks |
Proper (Name) | ||
---|---|---|
Description | The Proper (Name) case definition propercases names of individuals. | |
Examples | Input | Output |
Kis-juhász Béla | Kis-Juhász Béla | |
KOVÁCS ISTVÁN | Kovács István | |
Szabó ferenc | Szabó Ferenc | |
Remarks |
Proper (Organization) | ||
---|---|---|
Description | The Proper (Organization) case definition propercases organization names. | |
Examples | Input | Output |
ECODEV Kft. | Ecodev Kft. | |
aranyalma bt. | Aranyalma Bt. | |
Remarks |
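The case examples above all capitalize the first letter of each word and of each hyphen-separated part. A minimal Python sketch of that behavior follows. It is an illustration only, not the QKB's actual logic, and the function name `propercase` is hypothetical.

```python
import re


def propercase(text: str) -> str:
    """Capitalize every run of characters delimited by whitespace or hyphens."""
    # str.capitalize() uppercases the first character and lowercases the rest,
    # and it handles accented Hungarian letters such as "á" and "é".
    return re.sub(r"[^\s-]+", lambda m: m.group(0).capitalize(), text)


for sample in ["BUDAPEST", "Szeged-tápé", "Kis-juhász Béla", "aranyalma bt."]:
    print(sample, "->", propercase(sample))
```

A real definition would also need exception lists (for example, acronyms that must stay uppercase), which this sketch does not attempt.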
Extraction Definitions: None.
Name | ||
---|---|---|
Description | The Name gender analysis definition determines the gender of a name. | |
Possible Outputs | M, F, U |
|
Examples | Input | Output |
Kovács János | M | |
Kovács Jánosné | F | |
John Smith | M | |
Gálfi G. | U | |
Remarks |
Individual/Organization | ||
---|---|---|
Description | The Individual/Organization identification analysis definition determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | ORGANIZATION, INDIVIDUAL, UNKNOWN |
|
Examples | Input | Output |
Kovács János | INDIVIDUAL | |
Kovács János Bt. | ORGANIZATION | |
MOL Rt. | ORGANIZATION | |
MOL | UNKNOWN | |
Remarks |
Name | ||
---|---|---|
Description |
The Name identification analysis definition determines whether a string represents a correct Hungarian name. |
|
Possible Outputs | HUN, FOREIGN, OTHER |
|
Examples | Input | Output |
Kovács János | HUN | |
Kovács Jánso | FOREIGN | |
John Smith | FOREIGN | |
Szimán | OTHER | |
Remarks | The Name identification analysis definition is used to identify incorrect Hungarian names. It has a limited capability to distinguish between Hungarian and foreign names. A misspelled given name with a common family name might be identified as "FOREIGN" rather than "OTHER" (see the second example). |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 22 characters | |
Examples | Input | Cluster ID |
Cserje u. 18 fszt. 1. | 0 | |
Bajcsy Zs. u. 21-23 | 1 | |
Bajcsi Zsilinszky. út 21/a. | 1 | |
Nagyvárad tér 1/b. | 2 | |
Remarks |
Diacritics transliteration is applied at a sensitivity level of 84 and below. |
|
|
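The remark above says that diacritics are transliterated at lower sensitivity levels, so that accented and unaccented spellings can share a match code. The QKB's own transliteration rules are not described on this page. The Python sketch below shows one common way to fold diacritics using Unicode decomposition. The function name `fold_diacritics` is hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Characters such as "ő" decompose into a base letter plus a combining
    # mark; dropping the combining marks leaves the unaccented base letter.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("Nagyvárad tér"))
print(fold_diacritics("Győr"))
```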
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 41 characters | |
Examples | Input | Cluster ID |
1025 Budapest, Cserje u. 18. | 0 | |
1051 Bp., Bajcsy-Zsilinszky E. u. 21-23. | 1 | |
1051 Budapest, Bajcsy-Zs. E. u. 21-23. | 1 | |
9092 Győr, Arany J. tér 19. | 2 | |
Remarks |
|
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
TÁPÉ | 0 | |
SZEGED-TÁPÉ | 0 | |
BUDAPEST | 1 | |
BUDA-PEST | 1 | |
BP | 1 | |
Remarks |
|
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 19 characters | |
Examples | Input | Cluster ID |
1025 Budapest | 0 | |
1038 Bp | 1 | |
Remarks |
|
ID Number | ||
---|---|---|
Description | The ID Number match definition generates match codes which can be used to cluster records containing ID card numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
AU-VII. 255435 | 0 | |
AU255435 | 0 | |
255435AU | 0 | |
Remarks |
|
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 20 characters | |
Examples | Input | Cluster ID |
dr. Kovács István | 0 | |
Kováts István | 0 | |
özv. Kovács Istvánné | 0 | |
Kovács Istvánné Szabó Mária | 1 | |
Szabó Mária | 1 | |
Remarks |
|
|
Note that this definition does not produce the same match code for two strings such that the first is a man's name and the second includes the man's name as the marriage name portion of a woman's name. To match names according to the marriage name, use the Name (Marriage Name Only) match definition. |
Name (Marriage Name Only) | ||
---|---|---|
Description | The Name (Marriage Name Only) match definition generates match codes which can be used to cluster records containing names of individuals, based on the marriage name portion of the name string. | |
Max Length of Match Code | 20 characters | |
Examples | Input | Cluster ID |
özv. Kovács Istvánné | 0 | |
Kovács Istvánné Mária | 0 | |
Kovács Istvánné Szabó Mária | 0 | |
Kovács Istvánné Judit | 0 | |
Bakó Istvánné Szabó Mária | 1 | |
Remarks |
|
|
Note that this definition might cause some false positives, as when strings representing two women married to men of the same name produce the same match code. Also, this definition does not match two names that do not share a marriage name, even if other portions of the names are identical. |
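The clustering behavior in the Name (Marriage Name Only) examples can be approximated by keying each record on the part of the name up to and including the word that ends in -né. The Python sketch below is a simplified illustration under that assumption, not the QKB's match code algorithm. The names `marriage_key`, `cluster_ids`, and the `IGNORED_PREFIXES` set are hypothetical.

```python
# Hypothetical, deliberately tiny list of prefix words to ignore.
IGNORED_PREFIXES = {"özv.", "dr."}


def marriage_key(name: str):
    """Return the marriage name portion (up to the -né word), or None."""
    tokens = [t for t in name.split() if t.lower() not in IGNORED_PREFIXES]
    for i, token in enumerate(tokens):
        if token.lower().endswith("né"):
            return " ".join(tokens[: i + 1]).lower()
    return None  # no marriage name found


def cluster_ids(names):
    """Assign the same cluster ID to names that share a marriage key."""
    seen = {}
    ids = []
    for name in names:
        key = marriage_key(name)
        # Names without a marriage name are left unclustered (None).
        ids.append(None if key is None else seen.setdefault(key, len(seen)))
    return ids


names = [
    "özv. Kovács Istvánné",
    "Kovács Istvánné Mária",
    "Kovács Istvánné Szabó Mária",
    "Kovács Istvánné Judit",
    "Bakó Istvánné Szabó Mária",
]
print(cluster_ids(names))
```

The sketch reproduces the false-positive risk described in the remarks: two different women married to men named Kovács István receive the same key.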
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Macska 2007 Kereskedelmi és Szolgáltató kft | 0 | |
Macska bt. | 1 | |
Macska Kereskedelmi és Szolgáltató kft | 1 | |
R and R Kft. | 2 | |
Remarks |
|
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 16 characters | |
Examples | Input | Cluster ID |
+36-1-326-0573 | 0 | |
326-05-73 | 0 | |
30/9456-455 | 1 | |
06-30-945-6455 | 1 | |
06-70-4532321 | 2 | |
Remarks |
|
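The Phone examples suggest that formatting characters, the domestic trunk prefix 06, and the country code +36 do not affect clustering, and that a seven-digit number without an area code clusters with the same number in area code 1. The Python sketch below encodes those observations. It is an assumption-laden illustration, not the QKB's match code logic, and `phone_key` and its `default_area` parameter are hypothetical.

```python
import re


def phone_key(raw: str, default_area: str = "1") -> str:
    """Reduce a Hungarian phone number to area code plus base number."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if raw.strip().startswith("+36"):
        digits = digits[2:]  # drop the country code
    elif digits.startswith("06"):
        digits = digits[2:]  # drop the domestic trunk prefix
    if len(digits) == 7:
        # Assumption: a bare seven-digit number belongs to the default area.
        digits = default_area + digits
    return digits


for number in ["+36-1-326-0573", "326-05-73", "30/9456-455", "06-30-945-6455"]:
    print(number, "->", phone_key(number))
```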
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
1025. | 0 | |
1 0 2 5 | 0 | |
Remarks |
|
Text | ||
---|---|---|
Description | The Text match definition generates match codes which can be used to cluster records containing general text strings. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Dell Számítógép | 0 | |
DELL-SZÁMITOGEP | 0 | |
Remarks |
|
Address | |||
---|---|---|---|
Description | The Address parse definition parses addresses into a set of tokens. | ||
Output Tokens | Street Name, Street Type, Building Number, Extension, Extension Number, Additional Information |
||
Input | Output Token | Output | |
Example 1 | Cserje u. 18 fszt. 1. | Street Name | Cserje |
Street Type | u | ||
Building Number | 18 | ||
Extension | fszt. 1 | ||
Extension Number | |||
Additional Information | |||
Input | Output Token | Output | |
Example 2 | József A. ltp. 4/C 1/3 | Street Name | József A |
Street Type | ltp | ||
Building Number | 4/C | ||
Extension | 1/3 | ||
Extension Number | |||
Additional Information | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | The Address (Full) parse definition parses complete two-line addresses into a set of tokens. | ||
Output Tokens | Street Name, Street Type, Building Number, Extension, Extension Number, Postal Code, City, Additional Information |
||
Input | Output Token | Output | |
Example 1 | 2324 Szeged-Tápé, Kossuth u. 13. | Street Name | Kossuth |
Street Type | u | ||
Building Number | 13 | ||
Extension | |||
Extension Number | |||
Postal Code | 2324 | ||
City | Szeged-Tápé | ||
Additional Information | |||
Input | Output Token | Output | |
Example 2 | 1038 Budapest, Magyar Televízió PF:138 | Street Name | Magyar Televízió |
Street Type | |||
Building Number | |||
Extension | PF | ||
Extension Number | 138 | ||
Postal Code | 1038 | ||
City | Budapest | ||
Additional Information | |||
Remarks |
Address (Global) | |||
---|---|---|---|
Description |
The Address (Global) parse definition parses addresses into a globally recognized set of tokens. |
||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
||
Input | Output Token | Output | |
Example | Cserje u. 18. fszt. 1. | Recipient | |
Building/Site | |||
Street | Cserje u. 18 | ||
Extension | fszt. 1 | ||
PO Box | |||
Additional Info | |||
Remarks |
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
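Because the Global token set is the same in every locale, parsed results from different locales can share one record layout. The Python sketch below illustrates that idea with a dataclass whose fields mirror the Address (Global) output tokens. The class name `GlobalAddress` and the field names are hypothetical; only the token names come from this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class GlobalAddress:
    """One record layout for Address (Global) results from any locale."""
    recipient: str = ""
    building_site: str = ""
    street: str = ""
    extension: str = ""
    po_box: str = ""
    additional_info: str = ""


# Values taken from the Address (Global) example on this page.
hungarian = GlobalAddress(street="Cserje u. 18", extension="fszt. 1")
print(asdict(hungarian))
```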
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens. | ||
Output Tokens | Postal Code, City |
||
Input | Output Token | Output | |
Example | 1025 Budapest | Postal Code | 1025 |
City | Budapest | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info |
||
Input | Output Token | Output | |
Example | 1025 Budapest | City | Budapest |
State/Province | |||
Postal Code | 1025 | ||
Additional Info | |||
Remarks |
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name | |||
---|---|---|---|
Description | The Name parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix, Marriage Name, Family Name, Given Name, Suffix, Title/Additional Info |
||
Input | Output Token | Output | |
Example 1 | dr. Kiss Józsefné | Prefix | dr. |
Marriage Name | |||
Family Name | Kiss | ||
Given Name | Józsefné | ||
Suffix | |||
Title/Additional Info | |||
Input | Output Token | Output | |
Example 2 | ifj. Kiss Lászlóné dr. Sipos Gabriella | Prefix | dr. |
Marriage Name | ifj. Kiss Lászlóné | ||
Family Name | Sipos | ||
Given Name | Gabriella | ||
Suffix | |||
Title/Additional Info | |||
Input | Output Token | Output | |
Example 3 | Dézsi István Zoltán (ifj.) | Prefix | ifj. |
Marriage Name | |||
Family Name | Dézsi | ||
Given Name | István Zoltán | ||
Suffix | |||
Title/Additional Info | |||
Remarks | When a woman's name is represented as her husband's name extended with the suffix -né, then the husband's family name and given name are parsed into the Family Name and Given Name tokens (see example 1). If a woman's name contains her maiden family name, then the woman's maiden name and given name are parsed into Family Name and Given Name, and the husband's name is parsed into the Marriage Name token (example 2). If a woman's name appears with the woman's maiden name and the husband's name, and the husband's name has a title or generational indicator, the husband's title or generational indicator is parsed into the Marriage Name token along with the rest of the husband's name (example 2). If the title appears between the husband's name and a woman's maiden name, the title is considered to apply to the woman, and is therefore parsed into the Prefix token (example 2). Generational indicator words appearing with a man's name are always parsed into the Prefix token (example 3). |
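The rules in the remarks above can be sketched in Python as follows. This is a heavily simplified illustration that reproduces only the three examples shown. The function `parse_name` and the `TITLES` set are hypothetical, and the real definition relies on vocabularies and grammar rules that are not described on this page.

```python
# Hypothetical, minimal list of title and generational indicator words.
TITLES = {"dr.", "ifj.", "id.", "özv."}


def parse_name(name: str) -> dict:
    tokens = name.split()
    # Locate the first word carrying the -né suffix (husband's given name).
    ne_index = next(
        (i for i, t in enumerate(tokens) if t.lower().endswith("né")), None
    )
    if ne_index is not None and ne_index < len(tokens) - 1:
        # A maiden name follows, so everything up to -né is the Marriage Name.
        marriage, rest = tokens[: ne_index + 1], tokens[ne_index + 1 :]
    else:
        # The -né form is the whole name (or absent): no Marriage Name token.
        marriage, rest = [], tokens
    cleaned = [t.strip("()") for t in rest]
    prefix = [t for t in cleaned if t.lower() in TITLES]
    names = [t for t in cleaned if t.lower() not in TITLES]
    return {
        "Prefix": " ".join(prefix),
        "Marriage Name": " ".join(marriage),
        "Family Name": names[0] if names else "",
        "Given Name": " ".join(names[1:]),
    }


print(parse_name("ifj. Kiss Lászlóné dr. Sipos Gabriella"))
```

Given names that happen to end in "né" would be misread by this sketch, which is one reason a production parser needs name vocabularies rather than suffix checks alone.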
Name (Global) | |||
---|---|---|---|
Description | The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
||
Input | Output Token | Output | |
Example 1 | dr. Kiss Józsefné | Prefix | dr. |
Given Name | Józsefné | ||
Middle Name | |||
Family Name | Kiss | ||
Suffix | |||
Title/Additional Info | |||
Input | Output Token | Output | |
Example 2 | ifj. Kiss Lászlóné dr. Sipos Gabriella | Prefix | dr. |
Given Name | Gabriella | ||
Middle Name | |||
Family Name | Sipos | ||
Suffix | |||
Title/Additional Info | ifj. Kiss Lászlóné | ||
Input | Output Token | Output | |
Example 3 | Dézsi István Zoltán (ifj.) | Prefix | |
Given Name | István Zoltán | ||
Middle Name | |||
Family Name | Dézsi | ||
Suffix | ifj. | ||
Title/Additional Info | |||
Remarks | When a woman's name is represented as her husband's name extended with the suffix -né, then the husband's family name and given name are parsed into the Family Name and Given Name tokens (see example 1). If a woman's name contains her maiden family name, then the woman's maiden name and given name are parsed into Family Name and Given Name, and the husband's name is parsed into the Title/Additional Info token (example 2). If a woman's name appears with the woman's maiden name and the husband's name, and the husband's name has a title or generational indicator, the husband's title or generational indicator is parsed into the Title/Additional Info token along with the rest of the husband's name (example 2). If the title appears between the husband's name and a woman's maiden name, the title is considered to apply to the woman, and is therefore parsed into the Prefix token (example 2). Generational indicator words appearing with a man's name are always parsed into the Suffix token (example 3). The Middle Name token is not used. Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name (Multiple Name) | |||
---|---|---|---|
Description | The Name (Multiple Name) parse definition parses strings that contain the names of two individuals into a set of tokens. | ||
Output Tokens | Name 1, Name 2 |
||
Input | Output Token | Output | |
Example 1 | László Kővári / József Barsi | Name 1 | László Kővári |
Name 2 | József Barsi | ||
Input | Output Token | Output | |
Example 2 | Demjén Zsolt és Szabó Albert | Name 1 | Demjén Zsolt |
Name 2 | Szabó Albert | ||
Remarks |
Name (with Infix) | |||
---|---|---|---|
Description | The Name (with Infix) parse definition parses names of individuals, using an Infix token for embedded title words. | ||
Output Tokens | Prefix, Marriage Name, Infix, Family Name, Given Name, Suffix, Title/Additional Info |
||
Input | Output Token | Output | |
Example 1 | dr. Kiss Józsefné | Prefix | dr. |
Marriage Name | |||
Infix | |||
Family Name | Kiss | ||
Given Name | Józsefné | ||
Suffix | |||
Title/Additional Info | |||
Input | Output Token | Output | |
Example 2 | ifj. Kiss Lászlóné dr. Sipos Gabriella | Prefix | ifj. |
Marriage Name | Kiss Lászlóné | ||
Infix | dr. | ||
Family Name | Sipos | ||
Given Name | Gabriella | ||
Suffix | |||
Title/Additional Info | |||
Input | Output Token | Output | |
Example 3 | Dézsi István Zoltán (ifj.) | Prefix | ifj. |
Marriage Name | |||
Infix | |||
Family Name | Dézsi | ||
Given Name | István Zoltán | ||
Suffix | |||
Title/Additional Info | |||
Remarks | When a woman's name is represented as her husband's name extended with the suffix -né, then the husband's family name and given name are parsed into the Family Name and Given Name tokens (see example 1). If a woman's name contains her maiden family name, then the woman's maiden name and given name are parsed into Family Name and Given Name, and the husband's name is parsed into the Marriage Name token (example 2). If a name contains a title at the beginning of the name, the title is parsed into the Prefix token (examples 1 and 2). If the title appears between the husband's name and a woman's maiden name, the title is parsed into the Infix token (example 2). Generational indicator words such as ifj. are always parsed into the Prefix token (example 3). |
Organization | |||
---|---|---|---|
Description | The Organization parse definition parses organization names into a set of tokens. | ||
Output Tokens | Name, Legal Form, Additional Info, Site, Description |
||
Input | Output Token | Output | |
Example | ECODEV Gazdasági Fejlesztő és Tanácsadó Kft. | Name | ECODEV |
Legal Form | Kft | ||
Additional Info | |||
Site | |||
Description | Gazdasági Fejlesztő és Tanácsadó | ||
Remarks | The Site token is unused. It is reserved for further development. |
Phone | |||
---|---|---|---|
Description | The Phone parse definition parses phone numbers into a set of tokens. | ||
Output Tokens | Prefix, Country Code, Area Code, Base Number, Extension |
||
Input | Output Token | Output | |
Example 1 | 06-85-560-020/123 | Prefix | |
Country Code | |||
Area Code | 85 | ||
Base Number | 560020 | ||
Extension | 123 | ||
Input | Output Token | Output | |
Example 2 | fax: 82/553-113 | Prefix | fax |
Country Code | |||
Area Code | 82 | ||
Base Number | 553113 | ||
Extension | |||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description |
The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens. |
||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info |
||
Input | Output Token | Output | |
Example 1 | 06-85-560-020/123 | Country Code | |
Area Code | 85 | ||
Base Number | 560020 | ||
Extension | 123 | ||
Line Type | |||
Additional Info | |||
Input | Output Token | Output | |
Example 2 | fax: 82/553-113 | Country Code | |
Area Code | 82 | ||
Base Number | 553113 | ||
Extension | |||
Line Type | fax | ||
Additional Info | |||
Remarks |
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Hungarian Word Analysis | ||
---|---|---|
Description |
The Hungarian Word Analysis pattern analysis definition determines the pattern of words in the input string. |
|
Examples | Input | Output |
Árvíztűrő tükörfúrógép | A A | |
1941-ben ismertem meg Dániel | 9*A A A A | |
MB3 Kft. | M A* | |
Grätzer György | X A | |
Remarks | The Hungarian Word Analysis definition is similar to the Word Analysis definition, but it marks words that contain non-Hungarian letters (for example, German, Slovak, or Polish special characters) with an X. |
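The pattern symbols are not defined on this page. Judging from the examples, A appears to mark an alphabetic word, 9 a number, M a mixed alphanumeric word, * a punctuation character, and X a word containing non-Hungarian letters. The Python sketch below implements that inferred reading and reproduces the four example outputs. The function names and the one-asterisk-per-punctuation-character rule are assumptions.

```python
import re

# Lowercase letters treated as Hungarian for this sketch (an assumption).
HUNGARIAN_LETTERS = set("abcdefghijklmnopqrstuvwxyzáéíóöőúüű")


def classify_run(run: str) -> str:
    """Map one run of characters to a pattern symbol."""
    if run.isdigit():
        return "9"
    if run.isalpha():
        return "A" if set(run.lower()) <= HUNGARIAN_LETTERS else "X"
    if run.isalnum():
        return "M"  # letters and digits mixed
    return "*"  # a single punctuation character


def word_pattern(text: str) -> str:
    """Build the pattern for each whitespace-separated chunk of the input."""
    patterns = []
    for chunk in text.split():
        # Split into alphanumeric runs and individual other characters.
        runs = re.findall(r"[^\W_]+|[\W_]", chunk)
        patterns.append("".join(classify_run(r) for r in runs))
    return " ".join(patterns)


for sample in ["1941-ben ismertem meg Dániel", "MB3 Kft.", "Grätzer György"]:
    print(sample, "->", word_pattern(sample))
```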
Address | ||
---|---|---|
Description | The Address standardization definition standardizes addresses. | |
Examples | Input | Output |
KOSSUTH TÉR 11 | Kossuth Lajos tér 11. | |
Széchenyi u. 1/A | Széchenyi István utca 1/a. | |
pf 89 | Pf.: 89 | |
Remarks |
Address (Full) | ||
---|---|---|
Description | The Address (Full) standardization definition standardizes complete two-line addresses. | |
Input | Output | |
Example | 1025 Bp Cserje u. 18 | 1025 Budapest, Cserje utca 18. |
Remarks |
City | ||
---|---|---|
Description | The City standardization definition standardizes city names. | |
Input | Output | |
Examples | Bp | Budapest |
Szfvar | Székesfehérvár | |
SZEGED-TAPE | Szeged-Tápé | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code standardization definition standardizes last line address information. | |
Input | Output | |
Examples | 1245 bp | 1245 Budapest |
3234GYOR | 3234 Győr | |
Remarks |
Name | ||
---|---|---|
Description | The Name standardization definition standardizes names of individuals. | |
Input | Output | |
Examples | DR. KOVÁCS P JÓZSEF | dr Kovács P József |
Bánáti János (Ifj.) | ifj Bánáti János | |
Remarks |
Organization | ||
---|---|---|
Description | The Organization standardization definition standardizes organization names. | |
Input | Output | |
Examples | ARANYALMA 2000 BETÉTI TÁRS | ARANYALMA 2000 Bt. |
Zsivány Ker És Szolg KFT | Zsivány Kereskedelmi És Szolgáltató Kft. | |
Remarks |
Phone | ||
---|---|---|
Description | The Phone standardization definition standardizes phone numbers for domestic use. | |
Input | Output | |
Examples | 06-85-560-020/123 | 85-560020/123 |
fax: 82/553-113 | Fax:82-553113 | |
Remarks |
Postal Code | ||
---|---|---|
Description |
The Postal Code standardization definition standardizes postal codes. |
|
Input | Output | |
Example | 1025Bp | 1025 |
Remarks |
In addition to the definitions listed on this page, the Hungarian, Hungary locale also inherits all definitions for the Hungarian language and all Global definitions.
Doc ID: QKBCI_HUHUN_defs.html |