SAS Quality Knowledge Base for Contact Information 26
Definitions for the Russian, Russia locale are described below.
Case Definitions
Extraction Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions
Proper (Name) | ||
---|---|---|
Description | The Proper (Name) case definition propercases names of individuals. | |
Examples | Input | Output |
Александр Бородин | Александр Бородин | |
николай григорьевич рубинштейн | Николай Григорьевич Рубинштейн | |
РИМСКИЙ-КОРСАКОВ НИКОЛАЙ | Римский-Корсаков Николай | |
Remarks |
Extraction Definitions
None.
Name | ||
---|---|---|
Description | The Name gender analysis definition determines the gender of a name. | |
Possible Outputs | M, F, U | |
Examples | Input | Output |
Пафнутий Львович Чебышев | M | |
Софья Ковалевская | F | |
Арнольд В.И. | U | |
Remarks |
City | ||
---|---|---|
Description | The City identification analysis definition determines whether a string represents a Russian city. | |
Possible Outputs | CITY, UNK | |
Examples | Input | Output |
Москва | CITY | |
Пушкин | CITY | |
Вентилятор | UNK | |
Remarks |
Individual/Organization | ||
---|---|---|
Description | The Individual/Organization identification analysis definition determines whether a string represents the name of an individual or an organization. | |
Possible Outputs | Organization, Individual, Unknown | |
Examples | Input | Output |
Федор Достоевский | Individual | |
ОАО У Швейка | Organization | |
ООО Промкнигторг | Organization | |
Remarks |
Address | ||
---|---|---|
Description | The Address match definition generates match codes which can be used to cluster records containing addresses. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
Verkhniy Tagansky Tupik, 4 | 0 | |
Verhniy Taganskiy tup., 4 | 0 | |
Тимура Фрунзе, д. 12, кв. 34 | 1 | |
Тимура Фрунзе, 12, 34 | 1 | |
ул. Новорязанская.,31/7, 3-й этаж, building 2 | 2 | |
Remarks |
|
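As a rough illustration of how match codes are typically consumed downstream, the sketch below groups records that share an identical match code and gives each group a cluster ID. The match-code strings and the `assign_cluster_ids` helper are hypothetical placeholders; real match codes are produced by the match definition, not by this code.

```python
from typing import Dict, List, Tuple


def assign_cluster_ids(records: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Give records with the same match code the same cluster ID.

    `records` is a list of (input_text, match_code) pairs. Cluster IDs are
    assigned in order of first appearance, starting at 0.
    """
    code_to_cluster: Dict[str, int] = {}
    clustered = []
    for text, code in records:
        if code not in code_to_cluster:
            code_to_cluster[code] = len(code_to_cluster)
        clustered.append((text, code_to_cluster[code]))
    return clustered


# Match codes below are invented placeholders, not real QKB output.
records = [
    ("Verkhniy Tagansky Tupik, 4", "CODE_A"),
    ("Verhniy Taganskiy tup., 4", "CODE_A"),
    ("Тимура Фрунзе, д. 12, кв. 34", "CODE_B"),
    ("Тимура Фрунзе, 12, 34", "CODE_B"),
]
for text, cluster_id in assign_cluster_ids(records):
    print(cluster_id, text)
```

Grouping on exact equality of the code is the simplest possible choice; the fuzziness lives in how the match code itself is generated.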
Address (Full) | ||
---|---|---|
Description | The Address (Full) match definition generates match codes which can be used to cluster records containing complete two-line addresses. | |
Max Length of Match Code | 27 characters | |
Examples | Input | Cluster ID |
119146 Москва, Комсомольский пр-т, дом 14/2, кв. 58 | 0 | |
119200 Москва, Комсомольский пр-т, дом 14/4, кв. 70 | 1 | |
123100, Москва, улица 1905 года, 1, кв 15 | 2 | |
Remarks |
|
Address (PO Box Only) | ||
---|---|---|
Description | The Address (PO Box Only) match definition generates match codes which can be used to cluster records containing the PO Box portion of an address. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
ул. Новая Басманная, д. 2, а/я 12 | 0 | |
ул. Тверская 5, а/я 12 | 0 | |
Remarks | The Address (PO Box Only) match definition ignores street name information. |
Address (Street Only) | ||
---|---|---|
Description | The Address (Street Only) match definition generates match codes which can be used to cluster records containing the street portion of an address. | |
Max Length of Match Code | 23 characters | |
Examples | Input | Cluster ID |
ул. Новая Басманная, д. 2, а/я 12 | 1 | |
ул. Новая Басманная, д. 2, а/я 17 | 1 | |
ул. Новая Басманная, д. 2 | 1 | |
Remarks | The Address (Street Only) match definition ignores PO Box information. |
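To make the difference between the PO Box Only and Street Only variants concrete, the following sketch builds a grouping key from a chosen subset of already-parsed address tokens. The token values are hand-written from the examples above, and `key_for` and `group_by` are hypothetical helpers; the real definitions generate match codes rather than the exact-match keys used here.

```python
from collections import defaultdict

# Hand-parsed tokens for the example inputs (assumed, not QKB output).
addresses = [
    {"Street": "Новая Басманная", "House": "2", "PO Box": "12"},
    {"Street": "Новая Басманная", "House": "2", "PO Box": "17"},
    {"Street": "Тверская", "House": "5", "PO Box": "12"},
]


def key_for(address, tokens):
    """Build a grouping key from the selected tokens only."""
    return tuple(address.get(token, "") for token in tokens)


def group_by(addresses, tokens):
    """Return lists of record indexes that share the same key."""
    groups = defaultdict(list)
    for index, address in enumerate(addresses):
        groups[key_for(address, tokens)].append(index)
    return list(groups.values())


# PO Box only: the street is ignored, so records 0 and 2 fall together.
print(group_by(addresses, ["PO Box"]))           # [[0, 2], [1]]
# Street only: the PO Box is ignored, so records 0 and 1 fall together.
print(group_by(addresses, ["Street", "House"]))  # [[0, 1], [2]]
```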
City | ||
---|---|---|
Description | The City match definition generates match codes which can be used to cluster records containing city names. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
город Дубна | 1 | |
г. Дубна | 1 | |
Новосибирск | 2 | |
Remarks |
|
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code match definition generates match codes which can be used to cluster records containing last line address information. | |
Max Length of Match Code | 32 characters | |
Examples | Input | Cluster ID |
ТЮМЕНСКАЯ ОБЛ,ЯМАЛО-НЕНЕЦКИЙ АВТ ОКРУГ Г ПЫТЬ-ЯХ | 1 | |
ТЮМЕНСКАЯ ОБЛ,ХАНТЫ-МАНСИЙСКИЙ АВТ ОКРУГ Г ПЫТЬ-ЯХ | 1 | |
121069 RUSSIA MOSCOW UL B MOLCHANOVKA, 36 | 2 | |
Remarks |
|
Name | ||
---|---|---|
Description | The Name match definition generates match codes which can be used to cluster records containing names of individuals. | |
Max Length of Match Code | 24 characters | |
Examples | Input | Cluster ID |
Agafonova Anna | 1 | |
Агафонова Анна | 1 | |
Aleksandr Bukatin | 2 | |
Саша Букатин | 2 | |
Remarks |
|
Organization | ||
---|---|---|
Description | The Organization match definition generates match codes which can be used to cluster records containing organization names. | |
Max Length of Match Code | 60 characters | |
Examples | Input | Cluster ID |
ООО "У пескаря" | 0 | |
ПБОЮЛ Сидорова А.Н. | 1 | |
Банк Жилищного Финансирования | 2 | |
БАНК ЖИЛИЩНОГО ФИНАНСИРОВАНИЯ | 2 | |
Remarks |
|
Passport | ||
---|---|---|
Description | The Passport match definition generates match codes which can be used to cluster records containing passport numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
Паспорт 65 00 523259 | 1 | |
серия 6500 номер 523259 | 1 | |
98 56 659821 | 2 | |
Remarks |
|
Phone | ||
---|---|---|
Description | The Phone match definition generates match codes which can be used to cluster records containing phone numbers. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
32-65-99 | 1 | |
8(8352)32-65-99 | 1 | |
+79093129896 | 2 | |
Remarks |
|
Postal Code | ||
---|---|---|
Description | The Postal Code match definition generates match codes which can be used to cluster records containing postal codes. | |
Max Length of Match Code | 15 characters | |
Examples | Input | Cluster ID |
123100 | 1 | |
123104 | 1 | |
607750 | 2 | |
Remarks |
|
Address | |||
---|---|---|---|
Description | The Address parse definition parses addresses into a set of tokens. | ||
Output Tokens | Street, House, Building, Flat, PO Box, Additional Info |
||
Example 1 | Input | Output Token | Output |
ул. Профсоюзная, дом 45, этаж 8 кв. 34 а/я 34 | Street | ул. Профсоюзная | |
House | дом 45 | ||
Building | |||
Flat | кв. 34 | ||
PO Box | а/я 34 | ||
Additional Info | этаж 8 | ||
Example 2 | Input | Output Token | Output |
пр-т Воробьёвых, 18 корпус 4, 9 | Street | пр-т Воробьёвых | |
House | 18 | ||
Building | 4 | ||
Flat | 9 | ||
PO Box | |||
Additional Info | |||
Remarks |
Address (Full) | |||
---|---|---|---|
Description | The Address (Full) parse definition parses addresses containing complete two-line addresses into a set of tokens. | ||
Output Tokens | Postal Code, Country, Region, Province, City, Street, House, Building, Flat, PO Box, Additional Info |
||
Example 1 | Input | Output Token | Output |
185547 Тверская обл. г. Торопец ул. Лесная дом 4, офис 12 | Postal Code | 185547 | |
Country | |||
Region | Тверская обл. | ||
Province | |||
City | г. Торопец | ||
Street | ул. Лесная | ||
House | дом 4 | ||
Building | |||
Flat | офис 12 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output Token | Output |
г. Волгоград ул. Московская дом 48 корпус 3, кв. 14 этаж 2, а/я 65975 | Postal Code | ||
Country | |||
Region | |||
Province | |||
City | г. Волгоград | ||
Street | ул. Московская | ||
House | дом 48 | ||
Building | 3 | ||
Flat | кв. 14 | ||
PO Box | а/я 65975 | ||
Additional Info | этаж 2 | ||
Remarks |
Address (Global) | |||
---|---|---|---|
Description | The Address (Global) parse definition parses addresses into a globally recognized set of tokens. | ||
Output Tokens | Recipient, Building/Site, Street, Extension, PO Box, Additional Info |
||
Example 1 | Input | Output Token | Output |
Пр-т Юных Ленинцев, 78, кв 23, этаж 3 | Recipient | ||
Building/Site | |||
Street | Пр-т Юных Ленинцев, 78 | ||
Extension | кв 23, этаж 3 | ||
PO Box | |||
Additional Info | |||
Example 2 | Input | Output Token | Output |
ул. Профсоюзная, дом 45, этаж 8 кв. 34 а/я 34 | Recipient | ||
Building/Site | |||
Street | ул. Профсоюзная, дом 45 | ||
Extension | этаж 8 кв. 34 | ||
PO Box | а/я 34 | ||
Additional Info | |||
Example 3 | Input | Output Token | Output |
пр-т Воробьёвых, 18 корпус 4, 9 | Recipient | ||
Building/Site | |||
Street | пр-т Воробьёвых, 18 | ||
Extension | корпус 4, 9 | ||
PO Box | |||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
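One practical consequence of a locale-independent token set is that a single table layout can hold results from any locale. The sketch below is illustrative only: `GLOBAL_ADDRESS_TOKENS` mirrors the output tokens listed above, while `to_row` and the sample dictionary are hypothetical stand-ins for however parsed results reach your code.

```python
GLOBAL_ADDRESS_TOKENS = (
    "Recipient", "Building/Site", "Street",
    "Extension", "PO Box", "Additional Info",
)


def to_row(parsed):
    """Turn a token -> value mapping into a fixed-order row.

    Missing tokens become empty strings; unknown tokens are rejected so
    that every locale writes into the same set of columns.
    """
    unknown = set(parsed) - set(GLOBAL_ADDRESS_TOKENS)
    if unknown:
        raise ValueError(f"unexpected tokens: {sorted(unknown)}")
    return tuple(parsed.get(token, "") for token in GLOBAL_ADDRESS_TOKENS)


# Values copied from Example 2 above.
parsed_ru = {
    "Street": "ул. Профсоюзная, дом 45",
    "Extension": "этаж 8 кв. 34",
    "PO Box": "а/я 34",
}
print(to_row(parsed_ru))
```

Rejecting unknown token names is a design choice of this sketch: it surfaces a locale-specific token (for example `House`) early instead of silently dropping it.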
City - State/Province - Postal Code | |||
---|---|---|---|
Description | The City - State/Province - Postal Code parse definition parses last line address information into a set of tokens. | ||
Output Tokens | Postal Code, Country, Region, Province, City |
||
Example | Input | Output Token | Output |
258488, Новосибирская область, г. Бердск | Postal Code | 258488 | |
Country | |||
Region | Новосибирская область | ||
Province | |||
City | г. Бердск | ||
Remarks |
City - State/Province - Postal Code (Global) | |||
---|---|---|---|
Description | The City - State/Province - Postal Code (Global) parse definition parses last line address information into a globally recognized set of tokens. | ||
Output Tokens | City, State/Province, Postal Code, Additional Info |
||
Example | Input | Output Token | Output |
258488, Новосибирская область, г. Бердск | City | г. Бердск | |
State/Province | Новосибирская область | ||
Postal Code | 258488 | ||
Additional Info | |||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Name | |||
---|---|---|---|
Description | The Name parse definition parses names of individuals into a set of tokens. | ||
Output Tokens | Prefix, Given Name, Patronym, Family Name, Additional Info |
||
Example 1 | Input | Output Token | Output |
ФОКИН АНДРЕЙ ВЛАДИМИРОВИЧ | Prefix | ||
Given Name | АНДРЕЙ | ||
Patronym | ВЛАДИМИРОВИЧ | ||
Family Name | ФОКИН | ||
Additional Info | |||
Example 2 | Input | Output Token | Output |
Профессор Ф.Ф. Преображенский | Prefix | ||
Given Name | Ф | ||
Patronym | Ф | ||
Family Name | Преображенский | ||
Additional Info | Профессор | ||
Example 3 | Input | Output Token | Output |
Доцент г-н Дальф Ёжиков | Prefix | г-н | |
Given Name | Дальф | ||
Patronym | |||
Family Name | Ёжиков | ||
Additional Info | Доцент | ||
Remarks |
Name (Global) | |||
---|---|---|---|
Description | The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens. | ||
Output Tokens | Prefix, Given Name, Middle Name, Family Name, Suffix, Title/Additional Info |
||
Example 1 | Input | Output Token | Output |
Доктор технических наук госпожа Синичкина | Prefix | госпожа | |
Given Name | |||
Middle Name | |||
Family Name | Синичкина | ||
Suffix | |||
Title/Additional Info | Доктор технических наук | ||
Example 2 | Input | Output Token | Output |
Эйнштейн Альберт Германович, член РАН | Prefix | ||
Given Name | Альберт | ||
Middle Name | Германович | ||
Family Name | Эйнштейн | ||
Suffix | |||
Title/Additional Info | член РАН | ||
Remarks | Patronymic names are parsed into the Middle Name token. |
||
Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
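The patronymic remark can be pictured as a renaming of tokens. The sketch below maps the locale-specific Name tokens onto the Name (Global) token set. The mapping table and `to_global_name` are assumptions made for illustration (in particular, sending Additional Info to Title/Additional Info is a guess), not the QKB's internal logic.

```python
# Hypothetical mapping from Name tokens to Name (Global) tokens.
NAME_TO_GLOBAL = {
    "Prefix": "Prefix",
    "Given Name": "Given Name",
    "Patronym": "Middle Name",  # stated in the remark above
    "Family Name": "Family Name",
    "Additional Info": "Title/Additional Info",  # assumed
}

GLOBAL_NAME_TOKENS = (
    "Prefix", "Given Name", "Middle Name",
    "Family Name", "Suffix", "Title/Additional Info",
)


def to_global_name(parsed):
    """Rename locale-specific Name tokens to their Global counterparts."""
    result = {token: "" for token in GLOBAL_NAME_TOKENS}
    for token, value in parsed.items():
        result[NAME_TO_GLOBAL[token]] = value
    return result


# Values copied from Example 1 of the Name parse definition.
parsed = {
    "Given Name": "АНДРЕЙ",
    "Patronym": "ВЛАДИМИРОВИЧ",
    "Family Name": "ФОКИН",
}
print(to_global_name(parsed)["Middle Name"])  # ВЛАДИМИРОВИЧ
```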
Passport | |||
---|---|---|---|
Description | The Passport parse definition parses passport number data into a set of tokens. | ||
Output Tokens | Series, Number |
||
Example | Input | Output Token | Output |
паспорт №345678 серия 46 00 | Series | 46 00 | |
Number | 345678 | ||
Remarks |
Phone | |||
---|---|---|---|
Description | The Phone parse definition parses phone numbers into a set of tokens. | ||
Output Tokens | Prefix, Long Distance Code, Country Code, Area Code, Base Number, Extension, Suffix |
||
Example 1 | Input | Output Token | Output |
+7499568-58-96 | Prefix | ||
Long Distance Code | |||
Country Code | +7 | ||
Area Code | 499 | ||
Base Number | 568-58-96 | ||
Extension | |||
Suffix | |||
Example 2 | Input | Output Token | Output |
8 495 1235485 доб. 1285 | Prefix | ||
Long Distance Code | 8 | ||
Country Code | |||
Area Code | 495 | ||
Base Number | 1235485 | ||
Extension | |||
Suffix | доб. 1285 | ||
Remarks |
Phone (Global) | |||
---|---|---|---|
Description | The Phone (Global) parse definition parses phone numbers into a globally recognized set of tokens. | ||
Output Tokens | Country Code, Area Code, Base Number, Extension, Line Type, Additional Info |
||
Example 1 | Input | Output Token | Output |
+7 (812) 3198565 доб. 11 | Country Code | +7 | |
Area Code | 812 | ||
Base Number | 3198565 | ||
Extension | |||
Line Type | |||
Additional Info | доб. 11 | ||
Example 2 | Input | Output Token | Output |
fax +7 (812) 3198565 | Country Code | +7 | |
Area Code | 812 | ||
Base Number | 3198565 | ||
Extension | |||
Line Type | |||
Additional Info | fax | ||
Remarks | Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales. |
Pattern Analysis Definitions
None.
Address | ||
---|---|---|
Description | The Address standardization definition standardizes addresses. | |
Examples | Input | Output |
ул. Карла Маркса, дом 13 стр.2, кв. 15 этаж 4 а/я 14 | ул. Карла Маркса, дом 13, корп 2, кв. 15, а/я 14, этаж 4 | |
ул. Тверская дом 5 подъезд 4 этаж 1 кв 45 | ул. Тверская, дом 5, кв 45, под 4 | |
Remarks |
Address (Full) | ||
---|---|---|
Description | The Address (Full) standardization definition standardizes complete two-line addresses. | |
Example | Input | Output |
123321, г. норильск, ул. Теплая, 19, корп.2, кв. 14, а/я 125521 | 123321, г. Норильск, ул. Теплая, д 19, корп 2, Кв 14, а/я 125521 | |
Remarks |
City | ||
---|---|---|
Description | The City standardization definition standardizes city names. | |
Examples | Input | Output |
ТамБов | Тамбов | |
город Дубна | г Дубна | |
Remarks |
City - State/Province - Postal Code | ||
---|---|---|
Description | The City - State/Province - Postal Code standardization definition standardizes last line address information. | |
Examples | Input | Output |
958859 Московская область город Серпухов | 958859, Московская обл, г Серпухов | |
198367, КАЛИНИНГРАД | 198367, Калининград | |
Remarks |
Name | ||
---|---|---|
Description | The Name standardization definition standardizes names of individuals. | |
Examples | Input | Output |
Доктор господин Сидоров А.Р. | г-н Сидоров А.Р. доктор | |
Анатолий Сергеевич ЛОГИНОВ | Логинов Анатолий Сергеевич | |
Remarks |
Organization | ||
---|---|---|
Description | The Organization standardization definition standardizes organization names. | |
Example | Input | Output |
ООО «ПАРУС» | «ПАРУС», ООО | |
Remarks |
Passport | ||
---|---|---|
Description | The Passport standardization definition standardizes passport numbers. | |
Example | Input | Output |
95 06 359 569 | 95-06 359569 | |
Remarks |
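The Passport example can be approximated with a few lines of string handling. This sketch reproduces only the input/output pair shown above, assuming a 4-digit series followed by a 6-digit number. `format_passport` is a hypothetical helper and does not handle labelled input such as `серия` or `№`, which the real definition may treat differently.

```python
import re


def format_passport(raw):
    """Format a 10-digit passport string as 'SS-SS NNNNNN'."""
    # Drop spaces and any other non-digit characters.
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        raise ValueError("expected 4 series digits and 6 number digits")
    return f"{digits[0:2]}-{digits[2:4]} {digits[4:]}"


print(format_passport("95 06 359 569"))  # 95-06 359569
```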
Phone | ||
---|---|---|
Description | The Phone standardization definition standardizes phone numbers for domestic use. | |
Examples | Input | Output |
+7 495 485 95 96 | +7 (495) 485-95-96 | |
18568 | 1-85-68 | |
Remarks |
Phone (Replace Obsolete Area Codes) | ||
---|---|---|
Description | The Phone (Replace Obsolete Area Codes) standardization definition standardizes phone numbers, replacing old area codes with new area codes. | |
Example | Input | Output |
(08242) 40537 | (08242) 4-05-37 | |
Remarks |
Postal Code | ||
---|---|---|
Description | The Postal Code standardization definition standardizes postal codes. | |
Example | Input | Output |
«123100» | 123100 | |
Remarks |
In addition to the definitions listed on this page, the Russian, Russia locale also inherits all definitions for the Russian language and all Global definitions.
Doc ID: QKBCI_RURUS_defs.html |