SAS Quality Knowledge Base for Contact Information 27
Word (Script Identification) | ||
---|---|---|
Description | The Word (Script Identification) pattern analysis definition identifies the Unicode script of the characters in each word of the input and outputs one symbol for each run of same-script characters, as the examples show. | |
Output Symbols | Symbol | Meaning |
| | L | Latin character |
| | 漢 | Kanji/Han |
| | ア | Katakana |
| | あ | Hiragana |
| | 가 | Hangul |
| | Я | Cyrillic |
| | Θ | Greek |
| | ก | Thai |
| | أ | Arabic |
| | א | Hebrew |
| | 9 | Numeric digit |
| | * | Other (punctuation, and so on) |
Examples | Input | Output |
| | 1ー13ー1 イヌイビル・カチドキ8F 501号室 | 9*9*9 ア*ア9L 9漢 |
| | JOHN DOE | L L |
| | (7F, SAS Institute)スズキイチロウ | *9L* L L*ア |
| | 李大伟 赛仕(北京) | 漢 漢*漢* |
| | 爱新觉罗·溥仪 | 漢*漢 |
| | 陈耀昌(Chan,Ed Yiu-Cheong) | 漢*L*L L*L* |
| | 星光大道62号海王星科技大厦A座6楼 | 漢9漢L漢9漢 |
| | 珠海市 245400(玫瑰楼) | 漢 9*漢* |
| | 二零零九年十月二十一日 | 漢 |
| | 14Mar, 2001 | 9L* 9 |
| | 2009/10/21 | 9*9*9 |
| | H134981(5)------ | L9*9* |
| | 0174685503(D) | 9*L* |
| | 22020319691106184X | 9L |
| | 碧丽服装(北京)有限公司 | 漢*漢*漢 |
| | 电话(+86)10-12345678 | 漢*9*9*9 |
| | Fax:01082741510 | L*9 |
| | (010)82741510-345 | *9*9*9 |
| | רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990) | א א א א *א L* L *9* |
| | ΑNDREΑS ZIΑKΑS | W W |
Remarks | If a word mixes Greek and Cyrillic, Latin and Cyrillic, or Latin and Greek characters, this definition outputs W for that word as a warning of potentially fraudulent data. In the final example, each Α is the Greek capital letter alpha, not the Latin letter A. |
Doc ID: QKBCI_GB_Pattern_Word-ScriptID.html |
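
The behavior described above can be approximated outside SAS for experimentation. The Python sketch below is not part of the QKB and does not show how SAS implements the definition; it only tries to reproduce the documented examples. It assumes that a word is a whitespace-delimited token, that a character's script can be inferred from the first word of its Unicode name (via the standard `unicodedata` module), that only letters in categories Lu, Ll, Lt, and Lo carry a script (so marks such as ー fall under "other", as the first example suggests), and that a token mixing Latin, Greek, or Cyrillic letters collapses to a single W. The names `SCRIPT_SYMBOLS`, `char_symbol`, `word_pattern`, and `script_pattern` are hypothetical.

```python
import unicodedata
from itertools import groupby

# Unicode-name prefix -> output symbol, taken from the Output Symbols table.
SCRIPT_SYMBOLS = {
    "LATIN": "L",
    "CJK": "漢",
    "KATAKANA": "ア",
    "HIRAGANA": "あ",
    "HANGUL": "가",
    "CYRILLIC": "Я",
    "GREEK": "Θ",
    "THAI": "ก",
    "ARABIC": "أ",
    "HEBREW": "א",
}

# Scripts whose letters look alike; mixing them in one word yields "W".
CONFUSABLE = {SCRIPT_SYMBOLS[s] for s in ("LATIN", "CYRILLIC", "GREEK")}


def char_symbol(ch):
    """Map one character to an output symbol (assumption: name-based lookup)."""
    category = unicodedata.category(ch)
    if category == "Nd":  # decimal digits of any script
        return "9"
    if category in ("Lu", "Ll", "Lt", "Lo"):  # ordinary letters only
        name = unicodedata.name(ch, "")
        for prefix, symbol in SCRIPT_SYMBOLS.items():
            if name.startswith(prefix):
                return symbol
    return "*"  # punctuation, marks, unlisted scripts, and so on


def word_pattern(word):
    """Collapse a word into one symbol per run of same-script characters."""
    symbols = [char_symbol(ch) for ch in word]
    # Assumption: a mixed Latin/Greek/Cyrillic word is replaced entirely by W.
    if len(CONFUSABLE.intersection(symbols)) > 1:
        return "W"
    return "".join(symbol for symbol, _ in groupby(symbols))


def script_pattern(text):
    """Apply word_pattern to each whitespace-delimited word of the input."""
    return " ".join(word_pattern(word) for word in text.split())
```

For example, `script_pattern("14Mar, 2001")` returns `"9L* 9"`, which matches the corresponding row in the examples table. Inputs containing fullwidth letters or combining marks may not match SAS output, because the name-prefix lookup is only a rough stand-in for the Unicode Script property.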