Word (Script Identification)

Pattern Analysis Definition

Description

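The Word (Script Identification) pattern analysis definition determines the Unicode script of each word in the input and outputs a symbol representing that script. As the examples show, a word that contains several scripts yields one symbol for each consecutive run of same-script characters, and the spaces between words are preserved.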

Output Symbols

| Symbol | Meaning |
| --- | --- |
| L | Latin character |
| 漢 | Kanji/Han |
| ア | Katakana |
|  | Hiragana |
|  | Hangul |
| Я | Cyrillic |
| Θ | Greek |
|  | Thai |
| أ | Arabic |
| א | Hebrew |
| 9 | Numeric digit |
| * | Other (punctuation, and so on) |

Examples

| Input | Output |
| --- | --- |
| 1ー13ー1 イヌイビル・カチドキ8F 501号室 | 9*9*9 ア*ア9L 9漢 |
| JOHN DOE | L L |
| (7F, SAS Institute)スズキイチロウ | *9L* L L*ア |
| 李大伟 赛仕(北京) | 漢 漢*漢* |
| 爱新觉罗·溥仪 | 漢*漢 |
| 陈耀昌(Chan,Ed Yiu-Cheong) | 漢*L*L L*L* |
| 星光大道62号海王星科技大厦A座6楼 | 漢9漢L漢9漢 |
| 珠海市 245400(玫瑰楼) | 漢 9*漢* |
| 二零零九年十月二十一日 |  |
| 14Mar, 2001 | 9L* 9 |
| 2009/10/21 | 9*9*9 |
| H134981(5)------ | L9*9* |
| 0174685503(D) | 9*L* |
| 22020319691106184X | 9L |
| 碧丽服装(北京)有限公司 | 漢*漢*漢 |
| 电话(+86)10-12345678 | 漢*9*9*9 |
| Fax:01082741510 | L*9 |
| (010)82741510-345 | *9*9*9 |
| רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990) | א א א א *א L* L *9* |
| ΑNDREΑS ZIΑKΑS | W W |

Remarks

If a word contains a mix of Greek and Cyrillic, Latin and Cyrillic, or Latin and Greek glyphs, this definition outputs a W for that word, warning of potentially fraudulent data. In the final example, the character Α is the Greek capital letter alpha, not the Latin letter A.
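
The behavior described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the product's implementation. The function names (`char_symbol`, `word_pattern`, `script_pattern`) are hypothetical. Because the Python standard library does not expose the Unicode Script property, the sketch assumes that the first word of a character's Unicode name is an adequate proxy for its script. Hiragana, Hangul, and Thai are left out because their output symbols are not shown in the table above, so they fall through to `*` here. The sketch also assumes that modifier letters such as ー count as "other", as the first example suggests, and that a mixed-script word is reported as a single W.

```python
import unicodedata
from itertools import groupby

# First word of the Unicode character name -> output symbol.
# Assumption: the name prefix stands in for the Unicode Script property.
SCRIPT_SYMBOLS = {
    "LATIN": "L",
    "CJK": "漢",
    "KATAKANA": "ア",
    "CYRILLIC": "Я",
    "GREEK": "Θ",
    "ARABIC": "أ",
    "HEBREW": "א",
}

# Scripts whose look-alike glyphs trigger the W warning when mixed in one word.
CONFUSABLE = {"L", "Я", "Θ"}


def char_symbol(ch):
    """Map a single character to its output symbol."""
    category = unicodedata.category(ch)
    if category == "Nd":  # decimal digit
        return "9"
    if category in ("Lu", "Ll", "Lt", "Lo"):  # letters, excluding modifier letters
        prefix = unicodedata.name(ch, "").split(" ")[0]
        return SCRIPT_SYMBOLS.get(prefix, "*")
    return "*"  # punctuation, symbols, and anything unrecognized


def word_pattern(word):
    """Collapse runs of the same symbol; return W for a confusable script mix."""
    symbols = [char_symbol(ch) for ch in word]
    if len(CONFUSABLE.intersection(symbols)) > 1:
        return "W"
    return "".join(symbol for symbol, _ in groupby(symbols))


def script_pattern(text):
    """Apply word_pattern to each whitespace-separated word."""
    return " ".join(word_pattern(word) for word in text.split())


print(script_pattern("14Mar, 2001"))
print(script_pattern("\u0391NDRE\u0391S ZI\u0391K\u0391S"))  # \u0391 is Greek alpha
```

Checking for a script mix before collapsing the runs keeps the warning logic separate from the pattern formatting, so the set of scripts treated as confusable can be changed without touching the rest of the sketch.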