SAS Quality Knowledge Base for Contact Information 25

Hebrew Definitions

In the SAS Quality Knowledge Base, the Hebrew definitions are shared by all Hebrew-language locales. Shared Hebrew definitions are described below.

Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Inherited Definitions

Case Definitions

None.

Gender Analysis Definitions

Name
Description The Name gender analysis definition determines the gender of a name.
Possible Outputs M
F
U
  Input Output
Examples בן-שששש סיקולובסקי M
בת-שששש סיקולובסקי F
דוד סיקולובסקי M
יעל סיקולובסקי F
עדי סיקולובסקי U
אדון עדי סיקולובסקי M
גברת עדי סיקולובסקי F
סיקולובסקי U
מלכה וינשטיין F
שאול וינשטיין M
Remarks Since a very large percentage of Hebrew names are not gender specific, an 80% threshold is applied. A gender is identified if the name is associated with that gender at least 80% of the time.
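
The threshold rule described in the Remarks can be pictured with a minimal sketch. The function name and the per-name usage counts are hypothetical; this page does not describe how the QKB stores or computes name frequencies, so this only illustrates the 80% rule as worded above.

```python
def classify_gender(male_count: int, female_count: int, threshold: float = 0.80) -> str:
    """Return "M", "F", or "U" for a name, given hypothetical usage counts."""
    total = male_count + female_count
    if total == 0:
        return "U"  # name never observed: gender unknown
    if male_count / total >= threshold:
        return "M"
    if female_count / total >= threshold:
        return "F"
    return "U"  # neither gender reaches the threshold


# Illustrative counts only, not taken from the QKB vocabulary.
print(classify_gender(95, 5))
print(classify_gender(10, 90))
print(classify_gender(55, 45))
```

A name that is used for both genders in roughly equal proportion falls below the threshold on both sides and is reported as U, which matches the behavior shown for עדי in the examples.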

Identification Analysis Definitions

Field Name
Description

The Field Name identification analysis definition identifies database column names.

Possible Outputs NAME
ORGANIZATION
ADDRESS
CITY
STATE/PROVINCE
POSTALCODE
COUNTRY
PHONE
EMAIL
DATE
UNKNOWN
URL
GENDER
MATCHCODE
PERSONAL_ID
ORGANIZATION_ID
GENERIC_ID
COUNTY
MARITAL_STATUS
  Input Output
Examples Company Name ORGANIZATION
HEVRA ORGANIZATION
Address ADDRESS
כתובת ADDRESS
Telephone PHONE
MISPAR_AVODAH PHONE
Remarks

This definition is recommended to determine the type of data stored in a database column based on the name of the column.

The Field Name (v23) identification analysis definition is now deprecated and will be removed in a future release of the QKB.

The Field Name identification analysis definition has been replaced with a copy of the Field Name (v23) definition, which takes advantage of updated processing. If you changed your jobs to use the Field Name (v23) definition, change them back to use the Field Name definition.

 

Field Name (v23)
Description The Field Name (v23) identification analysis definition identifies database column names.
Possible Outputs NAME
ORGANIZATION
ADDRESS
CITY
STATE/PROVINCE
POSTALCODE
COUNTRY
PHONE
EMAIL
DATE
UNKNOWN
URL
GENDER
MATCHCODE
PERSONAL_ID
ORGANIZATION_ID
GENERIC_ID
COUNTY
MARITAL_STATUS
  Input Output
Examples Company Name ORGANIZATION
HEVRA ORGANIZATION
Address ADDRESS
כתובת ADDRESS
Telephone PHONE
MISPAR_AVODAH PHONE
Remarks

This definition is recommended to determine the type of data stored in a database column based on the name of the column.

The Field Name (v23) identification analysis definition is now deprecated and will be removed in a future release of the QKB.

The Field Name identification analysis definition has been replaced with a copy of the Field Name (v23) definition, which takes advantage of updated processing. If you changed your jobs to use the Field Name (v23) definition, change them back to use the Field Name definition.

Match Definitions

Date (DMY)
Description The Date (DMY) match definition generates match codes which can be used to cluster records containing dates that have the format DMY.
Max Length of Match Code 15 characters
  Input Cluster ID
Examples 05March1969 1
5-3-1969 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Date (DMY) (with Combinations)
Description The Date (DMY) (with Combinations) match definition generates match codes, each with an associated score, which can be used to cluster records containing dates that have the format DMY.
Max Length of Match Code 15 characters
Example 1 (Sensitivities 50 - 100, Weight 100)
  Input Cluster ID
01/02/2013 0
Feb-01-2013 0
Example 2 (Sensitivities 50 - 100, Weight 40)
  Input Cluster ID
Feb-01-2013 1
01/02/2013 1
02/01/2013 1
Remarks

A date in DMY format can match a date in MDY format.

This definition generates one or more match codes for each input string. The number of match codes generated for an input string depends on the content of the string. Each match code represents a combination of different parts of the input string; this enables two strings to be matched even when some parts of one or both of the strings differ. See the examples above for an illustration of clusters that can be produced using match codes generated by this definition.

Note that a consequence of generating multiple match codes is that a record can be placed in more than one cluster by a subsequent clustering operation. Therefore, special attention should be given to the entity resolution process when using this definition.

Generation of multiple match codes is achieved through the use of token-combination rules in the match definition. Each match code generated by the definition is associated with one token-combination rule. There is a weight assigned to each rule; each rule's weight is used to calculate a score that is assigned to the match code that is generated by that rule. The score for a match code is equal to the weight of the rule used to generate the match code times the sensitivity that is selected when the definition is executed.

When a record is clustered, the score for the record’s match code represents the confidence with which we can assert that the record belongs in the cluster. Note that when different rules lead to identical clustering results, the scores of the match codes generated by the different rules can be aggregated using the Cluster Aggregation node in a Data Job. The Cluster Aggregation node allows several different methods for aggregating match code scores, such as minimum, maximum, or mean across instances of a record, or minimum, maximum, or mean across all records in a cluster. For information on the Cluster Aggregation node, refer to the documentation provided with the DataFlux Data Management Studio installation.
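
The scoring and aggregation described above can be sketched as follows. This is an illustration only: the page states that the score is the rule weight times the sensitivity but does not state the scale, so the sketch assumes the weight is a percentage. The function name is hypothetical, and the aggregation shown is plain Python, not the Cluster Aggregation node itself.

```python
from statistics import mean


def match_code_score(rule_weight: float, sensitivity: float) -> float:
    # Assumption: the rule weight is a percentage, so a weight of 100
    # leaves the sensitivity value unchanged.
    return rule_weight * sensitivity / 100.0


# Hypothetical case: one record reaches the same cluster through two rules
# (weights 100 and 40) at the default sensitivity of 85.
scores = [match_code_score(100, 85), match_code_score(40, 85)]

# Three of the aggregation methods named in the Remarks.
aggregated = {
    "minimum": min(scores),
    "maximum": max(scores),
    "mean": mean(scores),
}
print(aggregated)
```

Which aggregation to use depends on how conservative the entity resolution process needs to be: the minimum treats a record as only as trustworthy as its weakest rule, while the maximum accepts the strongest evidence available.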

 

Date (MDY)
Description The Date (MDY) match definition generates match codes which can be used to cluster records containing dates that have the format MDY.
Max Length of Match Code 15 characters
  Input Cluster ID
Examples 8.30.1997 0
August 30th, 1997 0
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Date (MDY) (with Combinations)
Description The Date (MDY) (with Combinations) match definition generates match codes, each with an associated score, which can be used to cluster records containing dates that have the format MDY.
Max Length of Match Code 15 characters
Example 1 (Sensitivities 50 - 100, Weight 100)
  Input Cluster ID
01/02/2013 0
Jan-02-2013 0
Example 2 (Sensitivities 50 - 100, Weight 40)
  Input Cluster ID
Jan-02-2013 1
02/01/2013 1
Remarks

A date in MDY format can match a date in DMY format.

This definition generates one or more match codes for each input string. The number of match codes generated for an input string depends on the content of the string. Each match code represents a combination of different parts of the input string; this enables two strings to be matched even when some parts of one or both of the strings differ. See the examples above for an illustration of clusters that may be produced using match codes generated by this definition.

Note that a consequence of generating multiple match codes is that a record can be placed in more than one cluster by a subsequent clustering operation. Therefore, special attention should be given to the entity resolution process when using this definition.

Generation of multiple match codes is achieved through the use of token-combination rules in the match definition. Each match code generated by the definition is associated with one token-combination rule. There is a weight assigned to each rule; each rule's weight is used to calculate a score that is assigned to the match code that is generated by that rule. The score for a match code is equal to the weight of the rule used to generate the match code times the sensitivity that is selected when the definition is executed.

When a record is clustered, the score for the record’s match code represents the confidence with which we can assert that the record belongs in the cluster. Note that when different rules lead to identical clustering results, the scores of the match codes generated by the different rules can be aggregated using the Cluster Aggregation node in a Data Job. The Cluster Aggregation node allows several different methods for aggregating match code scores, such as minimum, maximum, or mean across instances of a record, or minimum, maximum, or mean across all records in a cluster. For information on the Cluster Aggregation node, refer to the documentation provided with the DataFlux Data Management Studio installation.
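
To see why a record can land in more than one cluster, consider the following minimal sketch. The record IDs and match code strings are placeholders (real QKB match codes look different), and the grouping function is hypothetical; it simply groups records that share a match code.

```python
from collections import defaultdict

# Hypothetical input: each record carries one match code per
# token-combination rule that produced a code for it.
record_codes = {
    "rec1": ["CODE_A", "CODE_B"],
    "rec2": ["CODE_A"],
    "rec3": ["CODE_B"],
}


def cluster_by_match_code(codes_by_record):
    """Group record IDs by shared match code; keep codes shared by 2+ records."""
    clusters = defaultdict(list)
    for record_id, codes in codes_by_record.items():
        for code in codes:
            clusters[code].append(record_id)
    return {code: ids for code, ids in clusters.items() if len(ids) > 1}


clusters = cluster_by_match_code(record_codes)
print(clusters)
```

Here rec1 appears in both resulting clusters because it carries two match codes. This is the situation the Remarks warn about when they recommend paying special attention to the entity resolution process.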

 

Date (YMD)
Description The Date (YMD) match definition generates match codes which can be used to cluster records containing dates that have the format YMD.
Max Length of Match Code 15 characters
  Input Cluster ID
Examples 2002dec31 1
2002.12.31 1
Remarks

Note: The results listed above reflect the default match sensitivity (85).

 

Field Name
Description The Field Name match definition generates match codes which can be used to cluster records containing database field names.
Max Length of Match Code 15 characters
  Input Cluster ID
Examples Company Name 0
HEVRA 0
Address 1
כתובת 1
MISPAR_AVODAH 2
Phone 2
  Remarks This definition should be used to find potential matches between database column names.

Note: The results listed above reflect the default match sensitivity (85).

 

Name
Description The Name match definition generates match codes which can be used to cluster records containing names of individuals.
Max Length of Match Code 27 characters
  Input Cluster ID
Examples דוד משה נתניהו 0
דוד ת. נתניהו 0
דוד נתניהו 0
רונית לפידור-בלו 1
רונית בלו 1
בָרוּךְ נָתַן קֵיְיְ 2
ברוך נתן קיי 2
ח'אלד זשצ'ירינסקי 3
חאלד זשצירינסקי 3
איתן מ' ר' הלוי 4
איתן מ. ר. הלוי 4
מיקי מורשת 5
מיכאל מורשת 5
Remarks Note: The results listed above reflect the default match sensitivity (85).

 

Organization
Description The Organization match definition generates match codes which can be used to cluster records containing organization names.
Max Length of Match Code 35 characters
  Input Cluster ID
Examples מ.י.ה. מחשבים בע"מ, סניף הרצליה 0
מ.י.ה. מחשבים בע"מ, תל אביב 0
מ.י.ה. מחשבים בע"מ 0
מ.י.ה. מחשבים 0
החברה ג.ג.ג. 1
ג.ג.ג. חברה 1
חברת ג.ג.ג. 1
ג.ג.ג. ושותפיו 1
ג.ג.ג. 1
הבנק הבינ"ל הראשון 2
הבנק הבינ"ל ה-1 2
א.ב.ג. עמותה 3
א.ב.ג. תאגיד 3
א.ב.ג. בע"מ 3
א.ב.ג. 3
Remarks Note: The results listed above reflect the default match sensitivity (85).

Parse Definitions

Date (DMY)
Description The Date (DMY) parse definition parses dates with format DMY into a set of tokens.
Output Tokens Day

Month

Year
  Input Output
Example 05March1969 Day 05
Month March
Year 1969
Remarks  
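
As a rough illustration of what splitting a DMY date into Day, Month, and Year tokens involves, here is a minimal regular-expression sketch. It is not the QKB parse logic: the pattern, the function name, and the set of accepted separators are all assumptions, and it handles only simple inputs such as the example above.

```python
import re

# Assumed input shape: a 1-2 digit day, then a month (name or 1-2 digits),
# then a four-digit year, optionally separated by '-', '/', '.', or spaces.
DMY_PATTERN = re.compile(
    r"^\s*(\d{1,2})[-/. ]*([A-Za-z]+|\d{1,2})[-/. ]*(\d{4})\s*$"
)


def parse_dmy(text: str) -> dict:
    """Return Day/Month/Year tokens, or empty tokens if the input does not fit."""
    match = DMY_PATTERN.match(text)
    if match is None:
        return {"Day": "", "Month": "", "Year": ""}
    day, month, year = match.groups()
    return {"Day": day, "Month": month, "Year": year}


print(parse_dmy("05March1969"))
```

The tokens are returned exactly as they appear in the input, without standardization, which mirrors the example above where the Month token keeps the value March.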

 

Date (MDY)
Description The Date (MDY) parse definition parses dates with format MDY into a set of tokens.
Output Tokens Day

Month

Year
  Input Output
Example 03-05-1969 Day 05
Month 03
Year 1969
Remarks  

 

Date (YMD)
Description The Date (YMD) parse definition parses dates with format YMD into a set of tokens.
Output Tokens Day

Month

Year
  Input Output
Example 1969.3.5 Day 5
Month 3
Year 1969
Remarks  

 

Name
Description The Name parse definition parses names of individuals into a set of tokens.
Output Tokens Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
  Input Output
Example 1 ד"ר יוסף יצחק קליין Prefix ד"ר
Given Name יוסף
Middle Name יצחק
Family Name קליין
Suffix  
Title/Additional Info  
  Input Output
Example 2 Dr. James Goodnight, CEO Prefix Dr.
Given Name James
Middle Name  
Family Name Goodnight
Suffix  
Title/Additional Info CEO
  Input Output
Example 3 יצחק-דוד פרלמן Prefix  
Given Name יצחק-דוד
Middle Name  
Family Name פרלמן
Suffix  
Title/Additional Info  
  Input Output
Example 4 בן ציון נתניהו Prefix  
Given Name בן ציון
Middle Name  
Family Name נתניהו
Suffix  
Title/Additional Info  
  Input Output
Example 5 רונית לפידור בלו Prefix  
Given Name רונית
Middle Name  
Family Name לפידור בלו
Suffix  
Title/Additional Info  
  Input Output
Example 6 חיים משה Prefix  
Given Name חיים
Middle Name  
Family Name משה
Suffix  
Title/Additional Info  
  Input Output
Example 7 משה חיים Prefix  
Given Name משה
Middle Name  
Family Name חיים
Suffix  
Title/Additional Info  
Remarks  

 

Name (Global)
Description The Name (Global) parse definition parses names of individuals into a globally recognized set of tokens.
Output Tokens Prefix
Given Name
Middle Name
Family Name
Suffix
Title/Additional Info
  Input Output
Example 1 ד"ר יוסף יצחק קליין Prefix ד"ר
Given Name יוסף
Middle Name יצחק
Family Name קליין
Suffix  
Title/Additional Info  
  Input Output
Example 2 Dr. James Goodnight, CEO Prefix Dr.
Given Name James
Middle Name  
Family Name Goodnight
Suffix  
Title/Additional Info CEO
  Input Output
Example 3 יצחק-דוד פרלמן Prefix  
Given Name יצחק-דוד
Middle Name  
Family Name פרלמן
Suffix  
Title/Additional Info  
  Input Output
Example 4 בן ציון נתניהו Prefix  
Given Name בן ציון
Middle Name  
Family Name נתניהו
Suffix  
Title/Additional Info  
  Input Output
Example 5 רונית לפידור בלו Prefix  
Given Name רונית
Middle Name  
Family Name לפידור בלו
Suffix  
Title/Additional Info  
  Input Output
Example 6 חיים משה Prefix  
Given Name חיים
Middle Name  
Family Name משה
Suffix  
Title/Additional Info  
  Input Output
Example 7 משה חיים Prefix  
Given Name משה
Middle Name  
Family Name חיים
Suffix  
Title/Additional Info  
Remarks

Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

 

Name (Multiple Name)
Description The Name (Multiple Name) parse definition parses strings that contain the names of two individuals into a set of tokens.
Output Tokens Name 1
Name 2
  Input Output
Example 1 אדון אייל וגברת יעל מורשת Name 1 אדון אייל מורשת
Name 2 גברת יעל מורשת
  Input Output
Example 2 מר וגב' נתניהו Name 1 מר נתניהו
Name 2 גב' נתניהו
  Input Output
Example 3 מר וגב' בנימין נתניהו Name 1 מר בנימין נתניהו
Name 2 גב' נתניהו
  Input Output
Example 4 משה פרלמן ואייל מורשת Name 1 משה פרלמן
Name 2 אייל מורשת
  Input Output
Example 5 אייל מורשת Name 1 אייל מורשת
Name 2  
  Input Output
Example 6 יוסף אהרון ורד Name 1 יוסף אהרון ורד
Name 2  
Remarks If only one name is present in the input, it is assigned to the Name 1 token and the Name 2 token is left empty.

 

Organization
Description The Organization parse definition parses company and organization information into a set of tokens.
Output Tokens Name
Legal Form
Site
Additional Info
  Input Output
Example 1 מ.י.ה. מחשבים בע"מ סניף הרצליה (תוכנה) Name מ.י.ה. מחשבים
Legal Form בע"מ
Site סניף הרצליה
Additional Info (תוכנה)
  Input Output
Example 2 אוניברסיטת בן גוריון, שלוחת אילת Name אוניברסיטת בן גוריון
Legal Form  
Site שלוחת אילת
Additional Info  
  Input Output
Example 3 מסעדת עזרא ובניו, ירושלים Name מסעדת עזרא ובניו
Legal Form  
Site ירושלים
Additional Info  
  Input Output
Example 4 מסעדת ירושלים Name מסעדת ירושלים
Legal Form  
Site  
Additional Info  
Remarks  

 

Organization (Global)
Description The Organization (Global) parse definition parses company and organization names into a globally recognized set of tokens.
Output Tokens Name
Legal Form
Site
Additional Info
  Input Output
Example 1 מ.י.ה. מחשבים בע"מ סניף הרצליה (תוכנה) Name מ.י.ה. מחשבים
Legal Form בע"מ
Site סניף הרצליה
Additional Info (תוכנה)
  Input Output
Example 2 אוניברסיטת בן גוריון, שלוחת אילת Name אוניברסיטת בן גוריון
Legal Form  
Site שלוחת אילת
Additional Info  
  Input Output
Example 3 מסעדת עזרא ובניו, ירושלים Name מסעדת עזרא ובניו
Legal Form  
Site ירושלים
Additional Info  
  Input Output
Example 4 מסעדת ירושלים Name מסעדת ירושלים
Legal Form  
Site  
Additional Info  
Remarks

Parse definitions named with the Global keyword use a set of output tokens that is consistent across every locale. Results obtained from these definitions can be stored in the same database fields as the results obtained from definitions of the same name in other locales.

Pattern Analysis Definitions

None.

Standardization Definitions

Date (DMY)
Description The Date (DMY) standardization definition standardizes dates that have format DMY. The output is a zero-padded two-digit day, followed by a zero-padded two-digit month, followed by a four-digit year. The day, month, and year are separated by spaces.
  Input Output
Examples 04/07/02 04 07 2002
04July05 04 07 1905
04.07.05 04 07 1905
04July2005 04 07 2005
04-07-2005 04 07 2005
Remarks If the input year is a two-digit value, it is assumed to be within the hundred year span with 2019 as the end of the span. For example, a year of 19 will be 2019, but a year of 20 will be 1920.
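
The two-digit year rule stated in the Remarks (a hundred-year span ending in 2019) can be written out as a short sketch. The function name and the span_end parameter are hypothetical. The sketch follows the wording of the Remarks only and is not meant to reproduce the definition's output for every row in the Examples.

```python
def expand_two_digit_year(yy: int, span_end: int = 2019) -> int:
    """Map a two-digit year into the hundred-year span ending at span_end."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    span_start = span_end - 99               # 1920 when the span ends in 2019
    century = span_start - span_start % 100  # first year of that century
    year = century + yy
    if year < span_start:
        year += 100                          # roll forward into the span
    return year


print(expand_two_digit_year(19))
print(expand_two_digit_year(20))
```

With the default span, 19 is the last two-digit value that maps to the 2000s and 20 is the first that maps to the 1900s, which is the boundary the Remarks describe.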

 

Date (MDY)
Description The Date (MDY) standardization definition standardizes dates that have format MDY. The output is a zero-padded two-digit month, followed by a zero-padded two-digit day, followed by a four-digit year. The month, day, and year are separated by spaces.
  Input Output
Examples July04, 02 07 04 2002
07/04/02 07 04 2002
July04, 05 07 04 1905
07.04.05 07 04 1905
July 4, 2005 07 04 2005
07-04-2005 07 04 2005
Remarks If the input year is a two-digit value, it is assumed to be within the hundred year span with 2019 as the end of the span. For example, a year of 19 will be 2019, but a year of 20 will be 1920.

 

Date (YMD)
Description The Date (YMD) standardization definition standardizes dates that have format YMD. The output is a four-digit year, followed by a zero-padded two-digit month, followed by a zero-padded two-digit day. The year, month, and day are separated by spaces.
  Input Output
Examples 02July04 2002 07 04
02/07/04 2002 07 04
05July04 1905 07 04
05.07.04 1905 07 04
2005July04 2005 07 04
2005-07-04 2005 07 04
Remarks If the input year is a two-digit value, it is assumed to be within the hundred year span with 2019 as the end of the span. For example, a year of 19 will be 2019, but a year of 20 will be 1920.

 

Name
Description

The Name standardization definition standardizes names of individuals.

  Input Output
Examples גונן אייל אייל גונן
דר דניאל לוין ד"ר דניאל לוין
מ ש נתניהו מ.ש. נתניהו
ברוך קיי (מנכ"ל) ברוך קיי, מנכ"ל
Remarks  

 

Nikud Removal
Description The Nikud Removal standardization definition removes Hebrew diacritics.
  Input Output
Examples נְקֻדּוֹת נקדות
חֲטַף סֶגּוֹל חטף סגול
קָמַץ מָלֵא קמץ מלא
שַׁלְשֶׁ֓לֶת שלשלת
פַּשְׁטָא֙ פשטא
דָּוִד בֶּן-גּוּרִיּוֹן‎ דוד בן-גוריון‎
Remarks  
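
Outside the QKB, a comparable effect can be approximated with Python's standard unicodedata module. This sketch is an assumption about what diacritic removal involves, not the definition's implementation: it drops nonspacing marks in the Hebrew block (vowel points and cantillation marks) and leaves letters and punctuation untouched.

```python
import unicodedata


def remove_nikud(text: str) -> str:
    """Strip Hebrew points and cantillation marks, keeping base letters."""
    # NFD splits precomposed presentation forms into a base letter plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        # U+0591..U+05C7 holds the Hebrew accents and points; only characters
        # of category Mn (nonspacing mark) in that range are dropped.
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


# The first example above (nun, shva, qof, qubuts, dalet, dagesh, vav, holam, tav),
# written with escapes so the marks are visible in the source.
print(remove_nikud("\u05e0\u05b0\u05e7\u05bb\u05d3\u05bc\u05d5\u05b9\u05ea"))
```

Filtering by Unicode category rather than deleting the whole code point range keeps punctuation such as the maqaf (U+05BE), which falls inside the same range but is not a diacritic.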

 

Organization
Description The Organization standardization definition standardizes organization names.
  Input Output
Examples מ.י.ה. מחשבים בערבון מוגבל מ.י.ה. מחשבים בע"מ
דני וולך, עו"ד עו"ד דני וולך
מ.י.ה. מחשבים בע"מ הרצליה מ.י.ה. מחשבים בע"מ, הרצליה
א.ד.מטלון א.ד. מטלון
דרור אורטס - שפיגל דרור אורטס-שפיגל
Remarks  

Inherited Definitions

In addition to the definitions listed on this page, all Hebrew-language locales also inherit all Global definitions.