SAS Quality Knowledge Base for Contact Information 26
Definitions for the Polish, Poland locale are described below.
Case Definitions
Extraction Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Lower (Phone) | ||
---|---|---|
Description | The Lower (Phone) case definition lowercases text in phone data. | |
Examples | Input | Output |
08013BBANK | 08013bbank | |
6327958 SOLARIUM | 6327958 solarium | |
7178455 dom | 7178455 dom | |
Remarks |
Proper (Address) | ||
---|---|---|
Description | The Proper (Address) case definition propercases addresses. | |
Examples | Input | Output |
Pl Orlat Lwowskich 1 | Pl Orlat Lwowskich 1 | |
ul Wladyslawa IV 1 | ul Wladyslawa IV 1 | |
Remarks |
Proper (City - State/Province - Postal Code) | ||
---|---|---|
Description | The Proper (City - State/Province - Postal Code) case definition propercases last line address information. | |
Examples | Input | Output |
ZLOTORYJA 59-500 | Zlotoryja 59-500 | |
ZYWIEC 34-300 | Zywiec 34-300 | |
Remarks |
Proper (Name) | ||
---|---|---|
Description | The Proper (Name) case definition propercases names of individuals. | |
Examples | Input | Output |
JAN SZYMON VON DEKER | Jan Szymon von Deker | |
k kowalski | K Kowalski | |
Remarks |
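The examples above suggest that propercasing capitalizes each word but leaves name particles such as "von" in lowercase. The following is a minimal sketch of that behavior, not the QKB's implementation; the function name `proper_name` and the particle list are hypothetical.

```python
# Hypothetical list of name particles kept in lowercase; not taken from the QKB.
LOWERCASE_PARTICLES = {"von", "van", "de"}


def proper_name(text: str) -> str:
    """Capitalize each word except known lowercase particles."""
    words = []
    for word in text.split():
        lowered = word.lower()
        if lowered in LOWERCASE_PARTICLES:
            words.append(lowered)
        else:
            words.append(lowered.capitalize())
    return " ".join(words)


print(proper_name("JAN SZYMON VON DEKER"))  # Jan Szymon von Deker
print(proper_name("k kowalski"))            # K Kowalski
```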
Proper (Organization) | ||
---|---|---|
Description | The Proper (Organization) case definition propercases organization names. | |
Examples | Input | Output |
gtc rail poland SP. z o.o. | GTC Rail Poland sp. z o.o. | |
XXVIII LICEUM IM. JANA NOWAKA | XXVIII Liceum im. Jana Nowaka | |
Remarks |
Extraction Definitions: None.
Name | ||
---|---|---|
Description | The Name gender analysis definition determines the gender of a name. | |
Possible Outputs | M, F, U | |
Examples | Input | Output |
Beata Krawczak | F | |
Marcin Giebultowicz | M | |
T. Soszynski | M | |
T. Soszyńska | F | |
Dziecko | U | |
Remarks |
Individual/Organization | ||
---|---|---|
Description | The Individual/Organization identification analysis definition determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | INDIVIDUAL, ORGANIZATION, UNKNOWN | |
Examples | Input | Output |
Action Sp. z o.o. | ORGANIZATION | |
Janusz Wojdecki | INDIVIDUAL | |
Telekomunikacja Polska Spółka Akcyjna | ORGANIZATION | |
Action Budniak | UNKNOWN | |
Remarks |
Name (Single/Multiple) | ||
---|---|---|
Description | The Name (Single/Multiple) identification analysis definition determines whether a string represents the name of one person or more than one person. | |
Possible Outputs | Single, Multiple | |
Examples | Input | Output |
DEZYDERAT I WYSZESLAW PONIKWIA | Multiple | |
Wyszeslaw Ponikwia | Single | |
KORDIAN GAJEWSKI I HORACY TYRKIEL | Multiple | |
Remarks |
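In the examples above, the inputs classified as Multiple all contain the Polish conjunction "i" ("and") between two names. A rough sketch of that single signal follows; the function `single_or_multiple` is hypothetical and far simpler than the actual definition, which is not described on this page.

```python
def single_or_multiple(name: str) -> str:
    """Return "Multiple" when the conjunction "i" separates two name parts."""
    words = name.lower().split()
    # The conjunction must have at least one word on each side of it.
    return "Multiple" if "i" in words[1:-1] else "Single"


for text in (
    "DEZYDERAT I WYSZESLAW PONIKWIA",
    "Wyszeslaw Ponikwia",
    "KORDIAN GAJEWSKI I HORACY TYRKIEL",
):
    print(text, "->", single_or_multiple(text))
```

A heuristic this simple would misclassify a single name that contains the middle initial "I", which is one reason to treat it only as an illustration.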
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 16 characters | |
Examples | Input | Cluster ID |
AL. Armii Ludowej 26 | 0 | |
Focus Building Al. Armii Ludowej 26 | 0 | |
ul Dekoracyjna 3 | 1 | |
Dekoracyjna 3 | 1 | |
Remarks |
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Kłodzka | 0 | |
Kłodska | 0 | |
Zielonka | 1 | |
Zialonka | 1 | |
Zielonk | 1 | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 77 characters | |
Examples | Input | Cluster ID |
00-925 Warszawa, Mazowieckie | 0 | |
44-100 Gliwice, Śląskie | 1 | |
44-100 GLIWICE WOJ. SLASKIE | 1 | |
Remarks |
Country | ||
---|---|---|
Description | The Country match definition generates match codes which can be used to cluster records containing country names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
POLSKA | 0 | |
WARSZAWA | 0 | |
PL | 0 | |
BELGIA | 1 | |
BELGIUM | 1 | |
Remarks |
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 22 characters | |
Examples | Input | Cluster ID |
Jerzy Topolski | 0 | |
Pan Jerzy Topolski | 0 | |
Pan Jerzy Cieśliński | 1 | |
Prof. Janusz Siwy | 2 | |
Pan Janusz Sawa | 2 | |
Remarks |
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 35 characters | |
Examples | Input | Cluster ID |
Przedsiębiorstwo Usługowo-Handlowe TOMAI Sp. z o.o. | 0 | |
Przedsiębiorstwo Handlowe TOMI | 0 | |
Polskie Wydawnictwa Profesjonalne | 1 | |
Polskie Wydawnictwa Profesjonalne Sp. z o.o. KiK Konieczny | 1 | |
Remarks |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
815-67-78 | 0 | |
8156778 | 0 | |
kom. 8833377 | 1 | |
tel.komor. 8833377 | 1 | |
służ. 888867875 | 2 | |
Remarks |
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
44-100 | 0 | |
44100 | 0 | |
00-925 | 1 | |
00 925 | 1 | |
Remarks |
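The Cluster ID column in the match tables shows which inputs would receive the same match code. The sketch below illustrates the idea for postal codes, using a digits-only string as a stand-in key. The real match codes are generated by the QKB and their encoding is not described here, so `postal_match_key` and `cluster` are hypothetical helpers.

```python
def postal_match_key(value: str) -> str:
    """Keep only digits so that separator differences do not matter."""
    return "".join(ch for ch in value if ch.isdigit())


def cluster(values):
    """Pair each input with a cluster ID; IDs are assigned in order of first appearance."""
    ids = {}
    result = []
    for value in values:
        key = postal_match_key(value)
        cluster_id = ids.setdefault(key, len(ids))
        result.append((value, cluster_id))
    return result


for value, cluster_id in cluster(["44-100", "44100", "00-925", "00 925"]):
    print(value, cluster_id)
```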
Text | ||
---|---|---|
Description | The Text match definition generates match codes which can be used to cluster records containing general text strings. | |
Max Length of Match Code | 20 characters | |
Examples | Input | Cluster ID |
Data Management Studio | 0 | |
Przewodniczący | 1 | |
Przewodnicząca | 1 | |
Remarks |
Address | |||
---|---|---|---|
Description | The Address parse definition parses addresses into a set of tokens. | ||
Output Tokens | Street Type, Street Name, Building Number, Extension, Additional Info | ||
Example 1 | Input | Output Token | Output |
ul. Gdańska 27/31 nr. 4, III piętro | Street Type | ul. | |
Street Name | Gdańska | ||
Building Number | 27/31 | ||
Extension | nr. 4 | ||
Additional Info | III piętro | ||
Example 2 | Input | Output Token | Output |
os. Gdańskie 15, blok 1 | Street Type | os. | |
Street Name | Gdańskie | ||
Building Number | 15 | ||
Extension | |||
Additional Info | blok 1 | ||
Remarks |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info | ||
Example 1 | Input | Output Token | Output |
ul. Gdańska 27/31 nr. 4, III piętro | Recipient | ||
Building/Site | |||
Street | ul. Gdańska 27/31 | ||
Extension | nr. 4 | ||
PO Box | |||
Additional Info | III piętro | ||
Example 2 | Input | Output Token | Output |
os. Gdańskie 15, blok 1 | Recipient | ||
Building/Site | |||
Street | os. Gdańskie 15 | ||
Extension | |||
PO Box | |||
Additional Info | blok 1 | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens. | ||
Output Tokens | Postal Code, City, Neighboring City, Commune, Region, Province | ||
Example 1 | Input | Output Token | Output |
34 550 Olkusz k/Częstochowy, gm. Gąbin | Postal Code | 34550 | |
City | Olkusz | ||
Neighboring City | Częstochowy | ||
Commune | Gąbin | ||
Region | |||
Province | |||
Example 2 | Input | Output Token | Output |
44-100 Knurów, Gliwicki, woj. Śląskie | Postal Code | 44-100 | |
City | Knurów | ||
Neighboring City | |||
Commune | |||
Region | Gliwicki | ||
Province | Śląskie | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info | ||
Example 1 | Input | Output Token | Output |
34 550 Olkusz k/Częstochowy, gm. Gąbin | City | Olkusz k/Częstochowy | |
State/Province | gm. Gąbin | ||
Postal Code | 34550 | ||
Additional Info | |||
Example 2 | Input | Output Token | Output |
44-100 Knurów, Gliwicki, woj. Śląskie | City | Knurów, | |
State/Province | Gliwicki, woj. Śląskie | ||
Postal Code | 44-100 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name | |||
---|---|---|---|
Description | The Name parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix/Title, Given Name, Middle Name, Family Name, Suffix | ||
Example 1 | Input | Output Token | Output |
Pan Prof. Wojciech M. Kulik | Prefix/Title | Pan Prof. | |
Given Name | Wojciech | ||
Middle Name | M. | ||
Family Name | Kulik | ||
Suffix | |||
Example 2 | Input | Output Token | Output |
ks. prow. Jan Szymon von Deker II | Prefix/Title | ks. prow. | |
Given Name | Jan | ||
Middle Name | Szymon | ||
Family Name | von Deker | ||
Suffix | II | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info | ||
Example 1 | Input | Output Token | Output |
Pan Prof. Wojciech M. Kulik | Prefix | Pan Prof. | |
Given Name | Wojciech | ||
Middle Name | M. | ||
Family Name | Kulik | ||
Suffix | |||
Title/Additional Info | |||
Example 2 | Input | Output Token | Output |
dr Janina Nowak-Kowalska | Prefix | dr | |
Given Name | Janina | ||
Middle Name | |||
Family Name | Nowak-Kowalska | ||
Suffix | |||
Title/Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name (Multiple Name) | |||
---|---|---|---|
Description | The Name (Multiple Name) parse definition parses strings that contain the names of two individuals into a set of tokens. | ||
Output Tokens | Name 1, Name 2 | ||
Examples | Input | Output Token | Output |
1 | Łukasz Leszewski i Magdalena Pawłowska | Name 1 | Łukasz Leszewski |
Name 2 | Magdalena Pawłowska | ||
2 | Katarzyna Potocka | Name 1 | Katarzyna Potocka |
Name 2 | |||
3* | Łukasz Leszewski i Magdalena Leszewska | Name 1 | Łukasz Leszewski |
Name 2 | Magdalena Leszewska | ||
Remarks | If only one name is present in the input, the first token is used. Because Polish family names use feminine, masculine, and plural variations, strings containing multiple names should be standardized with the Name (Multiple Name) standardization definition before being processed with the Name (Multiple Name) parse definition. Otherwise, the results of the parse may show incorrect family name variations for individual names. * In Example 3, the input is the output of Example 5 of the Name (Multiple Name) standardization definition. The original input was Łukasz i Magdalena Leszewscy. |
Organization | |||
---|---|---|---|
Description | The Organization parse definition parses organization names into a set of tokens. | ||
Output Tokens | Name, Legal Form, Site, Additional Info | ||
Example 1 | Input | Output Token | Output |
SAS Sp. z o.o. BUFIN oddział w Warszawie | Name | SAS | |
Legal Form | Sp. z o.o. | ||
Site | oddział w Warszawie | ||
Additional Info | BUFIN | ||
Example 2 | Input | Output Token | Output |
DataFlux sc w Górze Kalwarii | Name | DataFlux | |
Legal Form | sc | ||
Site | w Górze Kalwarii | ||
Additional Info | |||
Remarks |
Organization (Global) | |||
---|---|---|---|
Description | The Organization (Global) parse definition parses organization names into a globally recognized set of tokens. | ||
Output Tokens | Name, Legal Form, Site, Additional Info | ||
Example 1 | Input | Output Token | Output |
SAS Sp. z o.o. BUFIN oddział w Warszawie | Name | SAS | |
Legal Form | Sp. z o.o. | ||
Site | oddział w Warszawie | ||
Additional Info | BUFIN | ||
Example 2 | Input | Output Token | Output |
DataFlux sc w Górze Kalwarii | Name | DataFlux | |
Legal Form | sc | ||
Site | w Górze Kalwarii | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Phone | |||
---|---|---|---|
Description | The Phone parse definition parses phone numbers into a set of tokens. | ||
Output Tokens | Prefix, Country Code, Area Code, Base Number, Extension | ||
Example 1 | Input | Output Token | Output |
tel. +48 (22) 1234567 w. 89 | Prefix | tel. | |
Country Code | 48 | ||
Area Code | 22 | ||
Base Number | 1234567 | ||
Extension | w. 89 | ||
Example 2 | Input | Output Token | Output |
603584911 | Prefix | ||
Country Code | |||
Area Code | |||
Base Number | 603584911 | ||
Extension | |||
Remarks |
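To make the token layout concrete, here is a small regular-expression sketch that splits the two example inputs into the five Phone tokens. It is only an approximation written for these examples, not the QKB's parsing logic; the pattern, the function `parse_phone`, and the assumption that extensions start with "w." are all illustrative.

```python
import re

# Illustrative pattern covering only the input shapes shown in the examples.
PHONE_RE = re.compile(
    r"""^\s*
    (?P<prefix>[^\d+(]*?)\s*          # leading label such as "tel."
    (?:\+(?P<country>\d{1,3})\s*)?    # country code written as "+48"
    (?:\((?P<area>\d+)\)\s*)?         # area code written as "(22)"
    (?P<base>\d[\d\s-]*\d)            # base number
    \s*(?P<ext>w\.\s*\d+)?            # extension written as "w. 89"
    \s*$""",
    re.VERBOSE,
)


def parse_phone(text: str):
    """Return a dict of Phone tokens, or None if the input does not fit the pattern."""
    match = PHONE_RE.match(text)
    if match is None:
        return None
    return {
        "Prefix": match.group("prefix") or "",
        "Country Code": match.group("country") or "",
        "Area Code": match.group("area") or "",
        "Base Number": match.group("base") or "",
        "Extension": match.group("ext") or "",
    }


print(parse_phone("tel. +48 (22) 1234567 w. 89"))
print(parse_phone("603584911"))
```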
Phone (Global) | |||
---|---|---|---|
Description | The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info | ||
Example 1 | Input | Output Token | Output |
tel. +48 (22) 1234567 w. 89 | Country Code | 48 | |
Area Code | 22 | ||
Base Number | 1234567 | ||
Extension | w. 89 | ||
Line Type | tel. | ||
Additional Info | |||
Example 2 | Input | Output Token | Output |
603584911 | Country Code | ||
Area Code | |||
Base Number | 603584911 | ||
Extension | |||
Line Type | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
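Comparing the Phone and Phone (Global) examples shows the same values under different token names: the Prefix token "tel." appears as Line Type in the Global output. The sketch below expresses that correspondence as a lookup table. The mapping is inferred from the examples on this page rather than documented, and `to_global` is a hypothetical helper.

```python
# Mapping inferred from the examples on this page; not an official specification.
PHONE_TO_GLOBAL = {
    "Prefix": "Line Type",
    "Country Code": "Country Code",
    "Area Code": "Area Code",
    "Base Number": "Base Number",
    "Extension": "Extension",
}

GLOBAL_TOKENS = [
    "Country Code",
    "Area Code",
    "Base Number",
    "Extension",
    "Line Type",
    "Additional Info",
]


def to_global(tokens: dict) -> dict:
    """Rename Phone tokens to the Phone (Global) token set, leaving unused tokens empty."""
    result = {name: "" for name in GLOBAL_TOKENS}
    for name, value in tokens.items():
        result[PHONE_TO_GLOBAL[name]] = value
    return result


local = {
    "Prefix": "tel.",
    "Country Code": "48",
    "Area Code": "22",
    "Base Number": "1234567",
    "Extension": "w. 89",
}
print(to_global(local))
```

Because every locale exposes the same Global token names, a record shaped like the result above could share one set of database columns with results from other locales.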
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | The Address standardization definition standardizes addresses. | |
Examples | Input | Output |
ULICA GDAŃSKA 27-31 | ul. Gdańska 27/31 | |
Trabki ul Osadnicza 8 | ul. Osadnicza 8, Trabki | |
ul Chlodna 51 XVI Floor | ul. Chłodna 51, XVI Floor | |
Remarks |
City | ||
---|---|---|
Description | The City standardization definition standardizes city names. | |
Examples | Input | Output |
Lubiana k Koscierzny | Łubiana k Koscierzny | |
Grodzisk Mazowiecki | Grodzisk Mazowiecki | |
SRODA SLASKA | Środa Śląska | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code standardization definition standardizes last line address information. | |
Examples | Input | Output |
Pila 64920 | 64-920 Piła | |
Skarzysko-Kamienna 26-110 | 26-110 Skarżysko-Kamienna | |
Wegierska Górka 34-350 | 34-350 Węgierska Górka | |
Remarks |
Country | ||
---|---|---|
Description | The Country standardization definition standardizes country names. | |
Examples | Input | Output |
Niemcy | Niemcy | |
PL | Polska | |
NETHERLANDS | Holandia | |
Remarks |
Country (ISO 2 Char) | ||
---|---|---|
Description | The Country (ISO 2 Char) standardization definition standardizes country names into the ISO-3166 two-character designation. | |
Examples | Input | Output |
Dania | DK | |
CZECHY | CZ | |
LEGIONOWO | PL | |
Remarks |
Name | ||
---|---|---|
Description | The Name standardization definition standardizes names of individuals. | |
Examples | Input | Output |
INZYNIER LUKASZ KOWALKOWSKI | inż. Łukasz Kowalkowski | |
bartos, adam | Adam Bartos | |
Remarks |
Name (Multiple Name) | ||
---|---|---|
Description | The Name (Multiple Name) standardization definition standardizes input data that contains two names. | |
Examples | Input | Output |
DEZYDERAT I WYSZESLAW PONIKWIA | Dezyderat Ponikwia i Wyszeslaw Ponikwia | |
CIESZYSLAW I LESLAW NOWINSCY | Cieszyslaw Nowinski i Leslaw Nowinski | |
KORDIAN GAJEWSKI I HORACY TYRKIEL | Kordian Gajewski i Horacy Tyrkiel | |
LONGIN ZALEWSKA I GEMMA WITKOWSKI | Longin Zalewska i Gemma Witkowski | |
Łukasz i Magdalena Leszewscy | Łukasz Leszewski i Magdalena Leszewska | |
Remarks | This definition splits plural variations of Polish family names into individual feminine and/or masculine variations. It should be used to standardize strings containing multiple names before the Name (Multiple Name) parse definition is used to parse those strings. |
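The splitting of a shared plural family name can be illustrated with a toy example. The sketch below handles only inputs of the form "Given1 i Given2 Family" and two plural endings, and it guesses gender from a given name ending in "a". All of these are simplifying assumptions for illustration; the endings table and function names are hypothetical and do not reflect how the QKB implements the definition.

```python
# Hypothetical ending table: plural ending -> (masculine, feminine) singular endings.
PLURAL_ENDINGS = {"scy": ("ski", "ska"), "ccy": ("cki", "cka")}


def singular_family_name(family: str, feminine: bool) -> str:
    """Convert a plural family name to a singular form, in lowercase."""
    lowered = family.lower()
    for plural, (masc, fem) in PLURAL_ENDINGS.items():
        if lowered.endswith(plural):
            return lowered[: -len(plural)] + (fem if feminine else masc)
    return lowered  # not a recognized plural form; keep as-is


def split_shared_family_name(text: str) -> str:
    """Expand "Given1 i Given2 Family" into two full names joined by "i"."""
    words = text.split()
    if len(words) != 4 or words[1].lower() != "i":
        return text  # other input shapes are out of scope for this sketch
    given1, _, given2, family = words
    parts = []
    for given in (given1, given2):
        # Assumption: given names ending in "a" are feminine.
        feminine = given.lower().endswith("a")
        surname = singular_family_name(family, feminine).capitalize()
        parts.append(f"{given.capitalize()} {surname}")
    return " i ".join(parts)


print(split_shared_family_name("Łukasz i Magdalena Leszewscy"))
print(split_shared_family_name("CIESZYSLAW I LESLAW NOWINSCY"))
```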
Organization | ||
---|---|---|
Description | The Organization standardization definition standardizes organization names. | |
Examples | Input | Output |
Y M C D SA | YMCD S.A. | |
Amazon.COM | Amazon.com | |
A.T.W.Products Sp. z o.o. | ATW Products sp. z o.o. | |
Remarks |
Phone | ||
---|---|---|
Description | The Phone standardization definition standardizes phone numbers for domestic use. | |
Examples | Input | Output |
868-58-16 | 868 58 16 | |
0501-712-050 | 501 712 050 | |
6337928 | 633 79 28 | |
Remarks |
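The three examples above follow a visible pattern: separators are removed, a leading 0 is dropped, seven-digit numbers are grouped 3-2-2, and nine-digit numbers are grouped 3-3-3. A minimal sketch of that pattern follows. It is inferred from the examples only, and the function `standardize_phone` is hypothetical.

```python
def standardize_phone(value: str) -> str:
    """Format 7- and 9-digit numbers in the groupings shown in the examples."""
    digits = "".join(ch for ch in value if ch.isdigit())
    # Assumption: a leading 0 on a 10-digit number is a trunk prefix to drop.
    if len(digits) == 10 and digits.startswith("0"):
        digits = digits[1:]
    if len(digits) == 7:
        return f"{digits[:3]} {digits[3:5]} {digits[5:]}"
    if len(digits) == 9:
        return f"{digits[:3]} {digits[3:6]} {digits[6:]}"
    return value  # leave anything else untouched


for raw in ("868-58-16", "0501-712-050", "6337928"):
    print(raw, "->", standardize_phone(raw))
```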
Postal Code | ||
---|---|---|
Description | The Postal Code standardization definition standardizes postal codes. | |
Examples | Input | Output |
12345 | 12-345 | |
11/234 | 11-234 | |
990-00 | 99-000 | |
9-9000 | 99-000 | |
12-123 | 12-123 | |
Remarks |
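All five examples above end in the NN-NNN layout regardless of where the input placed its separator. The sketch below reproduces that result by keeping the digits and reinserting the hyphen. It is an illustration based on the examples, not the QKB's logic, and `standardize_postal_code` is a hypothetical name.

```python
def standardize_postal_code(value: str) -> str:
    """Rewrite any five-digit input as NN-NNN; return other inputs unchanged."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if len(digits) != 5:
        return value
    return f"{digits[:2]}-{digits[2:]}"


for raw in ("12345", "11/234", "990-00", "9-9000", "12-123"):
    print(raw, "->", standardize_postal_code(raw))
```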
In addition to the definitions listed on this page, the Polish, Poland locale also inherits all definitions for the Polish language and all Global definitions.
Doc ID: QKBCI_PLPOL_defs.html |