SAS Quality Knowledge Base for Contact Information 25
Definitions for the Russian, Russia locale are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (Name) | ||
---|---|---|
Description | Propercases names. | |
Examples | Input | Output |
Александр Бородин | Александр Бородин | |
николай григорьевич рубинштейн | Николай Григорьевич Рубинштейн | |
РИМСКИЙ-КОРСАКОВ НИКОЛАЙ | Римский-Корсаков Николай | |
Remarks |
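As a rough illustration only, Python's built-in `str.title()` happens to reproduce the three examples above. This is a naive sketch and not the QKB Proper (Name) definition; the function name `naive_propercase` is hypothetical.

```python
def naive_propercase(name: str) -> str:
    # str.title() uppercases the first letter of every alphabetic run
    # (including the part after a hyphen) and lowercases the rest.
    return name.title()


examples = [
    "Александр Бородин",
    "николай григорьевич рубинштейн",
    "РИМСКИЙ-КОРСАКОВ НИКОЛАЙ",
]

for text in examples:
    print(naive_propercase(text))
```

Simple title-casing knows nothing about names, so it should be read as a picture of the input/output relationship rather than a substitute for the definition.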
Name | ||
---|---|---|
Description | Determines an individual's gender based on a name. | |
Possible Outputs | M, F, U | |
Examples | Input | Output |
Пафнутий Львович Чебышев | M | |
Софья Ковалевская | F | |
Арнольд В.И. | U | |
Remarks |
City | ||
---|---|---|
Description | Determines whether a string represents a Russian city name. | |
Possible Outputs | CITY, UNK | |
Examples | Input | Output |
Москва | CITY | |
Пушкин | CITY | |
Вентилятор | UNK | |
Remarks |
Individual/Organization | ||
---|---|---|
Description | Determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | Organization, Individual, Unknown | |
Examples | Input | Output |
Федор Достоевский | Individual | |
ОАО У Швейка | Organization | |
ООО Промкнигторг | Organization | |
Remarks |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
Verkhniy Tagansky Tupik, 4 | 0 | |
Verhniy Taganskiy tup., 4 | 0 | |
Тимура Фрунзе, д. 12, кв. 34 | 1 | |
Тимура Фрунзе, 12, 34 | 1 | |
ул. Новорязанская.,31/7, 3-й этаж, building 2 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
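The idea of clustering by match code can be sketched as follows: records that produce the same match code receive the same cluster ID. The function `toy_match_code` below is a hypothetical stand-in (lowercasing, stripping punctuation, dropping a few address markers); it is not the QKB algorithm and ignores match sensitivity entirely.

```python
import re

# Assumption: a tiny list of address markers to ignore. The real
# match definition uses its own, far richer, processing.
DROP_TOKENS = {"ул", "д", "кв"}


def toy_match_code(address: str) -> str:
    # Lowercase, split into word tokens (Unicode-aware), drop markers.
    tokens = re.findall(r"\w+", address.lower())
    return " ".join(t for t in tokens if t not in DROP_TOKENS)


def assign_cluster_ids(records, match_code_fn):
    # Records with identical match codes share a cluster ID.
    # IDs are handed out in order of first appearance, starting at 0.
    seen = {}
    cluster_ids = []
    for record in records:
        code = match_code_fn(record)
        if code not in seen:
            seen[code] = len(seen)
        cluster_ids.append(seen[code])
    return cluster_ids


records = [
    "Тимура Фрунзе, д. 12, кв. 34",
    "Тимура Фрунзе, 12, 34",
    "ул. Новорязанская.,31/7, 3-й этаж, building 2",
]
print(assign_cluster_ids(records, toy_match_code))
```

The cluster IDs produced here are assigned by this sketch and need not equal the IDs shown in the table; only the grouping is meant to correspond.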
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
119146 Москва, Комсомольский пр-т, дом 14/2, кв. 58 | 0 | |
119200 Москва, Комсомольский пр-т, дом 14/4, кв. 70 | 1 | |
123100, Москва, улица 1905 года, 1, кв 15 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Address (PO Box Only) | ||
---|---|---|
Description | The Address (PO Box Only) match definition generates match codes which can be used to cluster records containing the PO Box portion of an address. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
ул. Новая Басманная, д. 2, а/я 12 | 0 | |
ул. Тверская 5, а/я 12 | 0 | |
Remarks |
The Match definition for Address (PO Box Only) ignores street name information. Note: The results listed above reflect the default match sensitivity (85). |
Address (Street Only) | ||
---|---|---|
Description | The Address (Street Only) match definition generates match codes which can be used to cluster records containing the street portion of an address. | |
Max Length of Match Code | 23 characters | |
Examples | Input | Cluster ID |
ул. Новая Басманная, д. 2, а/я 12 | 1 | |
ул. Новая Басманная, д. 2, а/я 17 | 1 | |
ул. Новая Басманная, д. 2 | 1 | |
Remarks |
The Match definition for Address (Street Only) ignores PO Box information. Note: The results listed above reflect the default match sensitivity (85). |
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
город Дубна | 1 | |
г. Дубна | 1 | |
Новосибирск | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 32 characters | |
Examples | Input | Cluster ID |
ТЮМЕНСКАЯ ОБЛ,ЯМАЛО-НЕНЕЦКИЙ АВТ ОКРУГ Г ПЫТЬ-ЯХ | 1 | |
ТЮМЕНСКАЯ ОБЛ,ХАНТЫ-МАНСИЙСКИЙ АВТ ОКРУГ Г ПЫТЬ-ЯХ | 1 | |
121069 RUSSIA MOSCOW UL B MOLCHANOVKA, 36 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 24 characters | |
Examples | Input | Cluster ID |
Agafonova Anna | 1 | |
Агафонова Анна | 1 | |
Aleksandr Bukatin | 2 | |
Саша Букатин | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 60 characters | |
Examples | Input | Cluster ID |
ООО "У пескаря" | 0 | |
ПБОЮЛ Сидорова А.Н. | 1 | |
Банк Жилищного Финансирования | 2 | |
БАНК ЖИЛИЩНОГО ФИНАНСИРОВАНИЯ | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Passport | ||
---|---|---|
Description | The Passport match definition generates match codes which can be used to cluster records containing passport numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Паспорт 65 00 523259 | 1 | |
серия 6500 номер 523259 | 1 | |
98 56 659821 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
32-65-99 | 1 | |
8(8352)32-65-99 | 1 | |
+79093129896 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
123100 | 1 | |
123104 | 1 | |
607750 | 2 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
Address | |||
---|---|---|---|
Description | Parses addresses. | ||
Output Tokens | Street, House, Building, Flat, PO Box, Additional Info |
||
Example 1 | Input | Output | |
ул. Профсоюзная, дом 45, этаж 8 кв. 34 а/я 34 | Street | ул. Профсоюзная | |
House | дом 45 | ||
Building | |||
Flat | кв. 34 | ||
PO Box | а/я 34 | ||
Additional Info | этаж 8 | ||
Example 2 | Input | Output | |
пр-т Воробьёвых, 18 корпус 4, 9 | Street | пр-т Воробьёвых | |
House | 18 | ||
Building | 4 | ||
Flat | 9 | ||
PO Box | |||
Additional Info | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | Parses full two-line addresses. | ||
Output Tokens | Postal Code, Country, Region, Province, City, Street, House, Building, Flat, PO Box, Additional Info |
||
Example 1 | Input | Output | |
185547 Тверская обл. г. Торопец ул. Лесная дом 4, офис 12 | Postal Code | 185547 | |
Country | |||
Region | Тверская обл. | ||
Province | |||
City | г. Торопец | ||
Street | ул. Лесная | ||
House | дом 4 | ||
Building | |||
Flat | офис 12 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output | |
г. Волгоград ул. Московская дом 48 корпус 3, кв. 14 этаж 2, а/я 65975 | Postal Code | | |
Country | |||
Region | |||
Province | |||
City | г. Волгоград | ||
Street | ул. Московская | ||
House | дом 48 | ||
Building | 3 | ||
Flat | кв. 14 | ||
PO Box | а/я 65975 | ||
Additional Info | этаж 2 | ||
Remarks |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
||
Input | Output | ||
Example 1 | Пр-т Юных Ленинцев, 78, кв 23, этаж 3 | Recipient | |
Building/Site | |||
Street | Пр-т Юных Ленинцев, 78 | ||
Extension | кв 23, этаж 3 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | ул. Профсоюзная, дом 45, этаж 8 кв. 34 а/я 34 | Recipient | |
Building/Site | |||
Street | ул. Профсоюзная, дом 45 | ||
Extension | этаж 8 кв. 34 | ||
PO Box | а/я 34 | ||
Additional Info | |||
Input | Output | ||
Example 3 | пр-т Воробьёвых, 18 корпус 4, 9 | Recipient | |
Building/Site | |||
Street | пр-т Воробьёвых, 18 | ||
Extension | корпус 4, 9 | ||
PO Box | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), we suggest that you change them back to Address (Global). |
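The remark about a consistent token set can be pictured with a single record type that holds Address (Global) results regardless of locale. The sketch below is an assumption about how a consumer might store the tokens; the class name `GlobalAddress` and its field names are hypothetical and not part of the QKB.

```python
from dataclasses import dataclass, asdict


@dataclass
class GlobalAddress:
    # One field per Address (Global) output token.
    recipient: str = ""
    building_site: str = ""
    street: str = ""
    extension: str = ""
    po_box: str = ""
    additional_info: str = ""


# Token values taken from Example 2 above; unparsed tokens stay empty.
row = GlobalAddress(
    street="ул. Профсоюзная, дом 45",
    extension="этаж 8 кв. 34",
    po_box="а/я 34",
)

# The same column list would serve results from any other locale.
columns = list(asdict(row).keys())
print(columns)
```

Because every locale fills the same six fields, rows parsed under different locales can share one table without per-locale columns.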
Address (Global) (v23) | |||
---|---|---|---|
Description | The Address (Global) (v23) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
||
Input | Output | ||
Example 1 | Пр-т Юных Ленинцев, 78, кв 23, этаж 3 | Recipient | |
Building/Site | |||
Street | Пр-т Юных Ленинцев, 78 | ||
Extension | кв 23, этаж 3 | ||
PO Box | |||
Additional Info | |||
Input | Output | ||
Example 2 | ул. Профсоюзная, дом 45, этаж 8 кв. 34 а/я 34 | Recipient | |
Building/Site | |||
Street | ул. Профсоюзная, дом 45 | ||
Extension | этаж 8 кв. 34 | ||
PO Box | а/я 34 | ||
Additional Info | |||
Input | Output | ||
Example 3 | пр-т Воробьёвых, 18 корпус 4, 9 | Recipient | |
Building/Site | |||
Street | пр-т Воробьёвых, 18 | ||
Extension | корпус 4, 9 | ||
PO Box | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. | ||
The Address (Global) (v23) parse definition is deprecated and will be removed in a future release of the QKB. The Address (Global) parse definition has been replaced with a copy of the Address (Global) (v23) definition, which takes advantage of the new tokens and updated processing. If you changed your jobs to use Address (Global) (v23), we suggest that you change them back to Address (Global). |
City - State/Province - Postal Code | |||
---|---|---|---|
Description | Parses address "last line" data. | ||
Output Tokens | Postal Code, Country, Region, Province, City |
||
Example | Input | Output | |
258488, Новосибирская область, г. Бердск | Postal Code | 258488 | |
Country | |||
Region | Новосибирская область | ||
Province | |||
City | г. Бердск | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | Parses address "last line" data into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info |
||
Example | Input | Output | |
258488, Новосибирская область, г. Бердск | City | г. Бердск | |
State/Province | Новосибирская область | ||
Postal Code | 258488 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name | |||
---|---|---|---|
Description | Parses names of individuals. | ||
Output Tokens | Prefix, Given Name, Patronym, Family Name, Additional Info |
||
Example 1 | Input | Output | |
ФОКИН АНДРЕЙ ВЛАДИМИРОВИЧ | Prefix | ||
Given Name | АНДРЕЙ | ||
Patronym | ВЛАДИМИРОВИЧ | ||
Family Name | ФОКИН | ||
Additional Info | |||
Example 2 | Input | Output | |
Профессор Ф.Ф. Преображенский | Prefix | ||
Given Name | Ф | ||
Patronym | Ф | ||
Family Name | Преображенский | ||
Additional Info | Профессор | ||
Example 3 | Input | Output | |
Доцент г-н Дальф Ёжиков | Prefix | г-н | |
Given Name | Дальф | ||
Patronym | |||
Family Name | Ёжиков | ||
Additional Info | Доцент | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | Parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
||
Example 1 | Input | Output | |
Доктор технических наук госпожа Синичкина | Prefix | госпожа | |
Given Name | |||
Middle Name | |||
Family Name | Синичкина | ||
Suffix | |||
Title/Additional Info | Доктор технических наук | ||
Example 2 | Input | Output | |
Эйнштейн Альберт Германович, член РАН | Prefix | ||
Given Name | Альберт | ||
Middle Name | Германович | ||
Family Name | Эйнштейн | ||
Suffix | |||
Title/Additional Info | член РАН | ||
Remarks | Patronymic names are parsed into the Middle Name token. |
||
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Passport | |||
---|---|---|---|
Description | Parses passport number data. | ||
Output Tokens | Series, Number |
||
Example | Input | Output | |
паспорт №345678 серия 46 00 | Series | 46 00 | |
Number | 345678 | ||
Remarks |
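To make the Series and Number tokens concrete, the following minimal sketch pulls both out of the example input with regular expressions. It assumes the series is introduced by "серия" and the number by "№" or "номер"; the function `toy_parse_passport` is hypothetical and does not reflect how the QKB parse definition works internally.

```python
import re


def toy_parse_passport(text: str) -> dict:
    # Assumption: series is four digits, optionally split as "NN NN".
    series = re.search(r"серия\s*(\d{2}\s?\d{2})", text, re.IGNORECASE)
    # Assumption: number is six digits after "№" or "номер".
    number = re.search(r"(?:№|номер)\s*(\d{6})", text, re.IGNORECASE)
    return {
        "Series": series.group(1) if series else "",
        "Number": number.group(1) if number else "",
    }


print(toy_parse_passport("паспорт №345678 серия 46 00"))
```

Inputs that omit the keywords, such as a bare string of digits, are returned with empty tokens by this sketch.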
Phone | |||
---|---|---|---|
Description | Parses Russian phone numbers. | ||
Output Tokens | Prefix, Long Distance Code, Country Code, Area Code, Base Number, Extension, Suffix |
||
Example 1 | Input | Output | |
+7499568-58-96 | Prefix | ||
Long Distance Code | |||
Country Code | +7 | ||
Area Code | 499 | ||
Base Number | 568-58-96 | ||
Extension | |||
Suffix | |||
Example 2 | Input | Output | |
8 495 1235485 доб. 1285 | Prefix | ||
Long Distance Code | 8 | ||
Country Code | |||
Area Code | 495 | ||
Base Number | 1235485 | ||
Extension | |||
Suffix | доб. 1285 | ||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | Parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info |
||
Example 1 | Input | Output | |
+7 (812) 3198565 доб. 11 | Country Code | +7 | |
Area Code | 812 | ||
Base Number | 3198565 | ||
Extension | |||
Line Type | |||
Additional Info | доб. 11 | ||
Example 2 | Input | Output | |
fax +7 (812) 3198565 | Country Code | +7 | |
Area Code | 812 | ||
Base Number | 3198565 | ||
Extension | |||
Line Type | |||
Additional Info | fax | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Pattern Analysis Definitions: None.
Address | ||
---|---|---|
Description | Standardizes address data. | |
Examples | Input | Output |
ул. Карла Маркса, дом 13 стр.2, кв. 15 этаж 4 а/я 14 | ул. Карла Маркса, дом 13, корп 2, кв. 15, а/я 14, этаж 4 | |
ул. Тверская дом 5 подъезд 4 этаж 1 кв 45 | ул. Тверская, дом 5, кв 45, под 4 | |
Remarks |
Address (Full) | ||
---|---|---|
Description | Standardizes full two-line address data. | |
Example | Input | Output |
123321, г. норильск, ул. Теплая, 19, корп.2, кв. 14, а/я 125521 | 123321, г. Норильск, ул. Теплая, д 19, корп 2, Кв 14, а/я 125521 | |
Remarks |
City | ||
---|---|---|
Description | Standardizes city names. | |
Examples | Input | Output |
ТамБов | Тамбов | |
город Дубна | г Дубна | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | Standardizes city name and postal code combinations. | |
Examples | Input | Output |
958859 Московская область город Серпухов | 958859, Московская обл, г Серпухов | |
198367, КАЛИНИНГРАД | 198367, Калининград | |
Remarks |
Name | ||
---|---|---|
Description | Standardizes names of individuals. | |
Examples | Input | Output |
Доктор господин Сидоров А.Р. | г-н Сидоров А.Р. доктор | |
Анатолий Сергеевич ЛОГИНОВ | Логинов Анатолий Сергеевич | |
Remarks |
Organization | ||
---|---|---|
Description | Standardizes organization names. | |
Examples | Input | Output |
ООО «ПАРУС» | «ПАРУС», ООО | |
Remarks |
Passport | ||
---|---|---|
Description | Standardizes passport numbers. | |
Example | Input | Output |
95 06 359 569 | 95-06 359569 | |
Remarks |
Phone | ||
---|---|---|
Description | Standardizes Russian phone numbers. | |
Examples | Input | Output |
+7 495 485 95 96 | +7 (495) 485-95-96 | |
18568 | 1-85-68 | |
Remarks |
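The output formats in the two examples above can be imitated with a small formatter. This is a sketch under stated assumptions, not the QKB standardization logic: it handles only 11-digit numbers beginning with 7 and bare 5-digit local numbers, and the function name `toy_standardize_phone` is hypothetical.

```python
import re


def toy_standardize_phone(raw: str) -> str:
    # Keep digits only; spacing and punctuation in the input are ignored.
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits[0] == "7":
        # Assumption: country code 7, three-digit area code, seven-digit base.
        area, base = digits[1:4], digits[4:]
        return f"+7 ({area}) {base[:3]}-{base[3:5]}-{base[5:]}"
    if len(digits) == 5:
        # Five-digit local number grouped as N-NN-NN.
        return f"{digits[0]}-{digits[1:3]}-{digits[3:]}"
    # Anything else is returned unchanged by this sketch.
    return raw


print(toy_standardize_phone("+7 495 485 95 96"))
print(toy_standardize_phone("18568"))
```

Any input outside those two shapes passes through untouched, which keeps the sketch from guessing at formats the examples do not show.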
Phone (Replace Obsolete Area Codes) | ||
---|---|---|
Description | Standardizes Russian phone numbers, replacing old area codes with new area codes. | |
Example | Input | Output |
(08242) 40537 | (08242) 4-05-37 | |
Remarks |
Postal Code | ||
---|---|---|
Description | Standardizes postal codes. | |
Example | Input | Output |
«123100» | 123100 | |
Remarks |
In addition to the definitions listed on this page, the Russian, Russia locale also inherits all definitions for the Russian language and all Global definitions.
Doc ID: QKBCI_RURUS_defs.html |