SAS Quality Knowledge Base for Contact Information 27

Word (Script Identification)

Pattern Analysis Definition

Word (Script Identification)
Description

The Word (Script Identification) pattern analysis definition determines the Unicode script of each word in the input, and outputs a character representing that script.

Output Symbols

Symbol    Meaning
L         Latin character
漢        Kanji/Han
ア        Katakana
          Hiragana
          Hangul
Я         Cyrillic
Θ         Greek
          Thai
أ         Arabic
א         Hebrew
9         Numeric digit
*         Other (punctuation, and so on)
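
The symbol mapping above can be approximated with a short Python sketch. This is an illustration only, not the QKB's implementation: it assumes that a character's script can be inferred from the prefix of its Unicode character name, that consecutive characters with the same symbol collapse into one symbol, and that words are separated by whitespace. The names SCRIPT_SYMBOLS, char_symbol, word_pattern, and script_pattern are hypothetical. Only symbols that appear in this section are mapped, so Hiragana, Hangul, and Thai letters fall through to "*" here, and the W warning described under Remarks is not implemented.

```python
import unicodedata

# Unicode name prefix -> output symbol. Illustrative assumption: the real
# definition's rules are not published in this section.
SCRIPT_SYMBOLS = [
    ("LATIN ", "L"),
    ("CJK UNIFIED IDEOGRAPH", "漢"),
    ("KATAKANA LETTER", "ア"),
    ("CYRILLIC ", "Я"),
    ("GREEK ", "Θ"),
    ("ARABIC LETTER", "\u0623"),  # Arabic symbol from the table
    ("HEBREW LETTER", "\u05d0"),  # Hebrew symbol from the table
]


def char_symbol(ch):
    """Return the output symbol for a single character."""
    category = unicodedata.category(ch)
    if category == "Nd":          # decimal digit
        return "9"
    if category.startswith("L"):  # any kind of letter
        name = unicodedata.name(ch, "")
        for prefix, symbol in SCRIPT_SYMBOLS:
            if name.startswith(prefix):
                return symbol
    return "*"                    # punctuation and anything unmapped


def word_pattern(word):
    """Collapse runs of identical symbols within one word."""
    out = []
    for ch in word:
        sym = char_symbol(ch)
        if not out or out[-1] != sym:
            out.append(sym)
    return "".join(out)


def script_pattern(text):
    """Build the pattern for whitespace-separated words."""
    return " ".join(word_pattern(w) for w in text.split())


print(script_pattern("14Mar, 2001"))      # 9L* 9
print(script_pattern("Fax:01082741510"))  # L*9
```

Checking the letter category before the name prefix keeps punctuation that belongs to a script block (for example, a Katakana middle dot) mapped to "*" rather than to the script symbol.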
Examples

Input                                       Output
1ー13ー1 イヌイビル・カチドキ8F 501号室     9*9*9 ア*ア9L 9漢
JOHN DOE                                    L L
(7F, SAS Institute)スズキイチロウ           *9L* L L*ア
李大伟 赛仕(北京)                           漢 漢*漢*
爱新觉罗·溥仪                               漢*漢
陈耀昌(Chan,Ed Yiu-Cheong)                  漢*L*L L*L*
星光大道62号海王星科技大厦A座6楼            漢9漢L漢9漢
珠海市 245400(玫瑰楼)                       漢 9*漢*
二零零九年十月二十一日
14Mar, 2001                                 9L* 9
2009/10/21                                  9*9*9
H134981(5)------                            L9*9*
0174685503(D)                               9*L*
22020319691106184X                          9L
碧丽服装(北京)有限公司                      漢*漢*漢
电话(+86)10-12345678                        漢*9*9*9
Fax:01082741510                             L*9
(010)82741510-345                           *9*9*9
רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990)    א א א א *א L* L *9*
ΑNDREΑS ZIΑKΑS                              W W
Remarks

If a word mixes glyphs from two of the Latin, Greek, and Cyrillic scripts (Greek and Cyrillic, Latin and Cyrillic, or Latin and Greek), this definition outputs W to warn of potentially fraudulent data. In the final example, the character Α is the Greek "Alpha" glyph, not the Latin letter A.
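
The mixed-script check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the definition's actual logic: it infers a letter's script from the prefix of its Unicode character name, and the names CONFUSABLE_SCRIPTS, scripts_in_word, and mixed_script_words are hypothetical.

```python
import unicodedata

# Scripts whose glyphs are easily confused with one another.
CONFUSABLE_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_in_word(word):
    """Return the set of confusable scripts used by letters in the word."""
    found = set()
    for ch in word:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            for script in CONFUSABLE_SCRIPTS:
                if name.startswith(script + " "):
                    found.add(script)
    return found


def mixed_script_words(text):
    """Return the whitespace-separated words that mix two or more scripts."""
    return [w for w in text.split() if len(scripts_in_word(w)) >= 2]


# \u0391 is GREEK CAPITAL LETTER ALPHA, which looks like the Latin letter A.
suspicious = mixed_script_words("\u0391NDRE\u0391S ZI\u0391K\u0391S")
print(len(suspicious))                 # 2
print(mixed_script_words("JOHN DOE"))  # []
```

Both words in the first call contain Greek Alpha alongside Latin letters, so both are reported, which corresponds to the "W W" output in the final example.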