Word (Script Identification)
Pattern Analysis Definition
| Word (Script Identification) | ||
|---|---|---|
| Description | The Word (Script Identification) pattern analysis definition determines the Unicode script of each word in the input and outputs a character representing that script. | |
| Output Symbols | Symbol | Meaning |
| | L | Latin character |
| | 漢 | Kanji/Han |
| | ア | Katakana |
| | あ | Hiragana |
| | 가 | Hangul |
| | Я | Cyrillic |
| | Θ | Greek |
| | ก | Thai |
| | أ | Arabic |
| | א | Hebrew |
| | 9 | Numeric digit |
| | * | Other (punctuation and so on) |
| | W | Warning: the word mixes Latin, Greek, or Cyrillic glyphs (see Remarks) |
| Examples | Input | Output |
| | 1ー13ー1 イヌイビル・カチドキ8F 501号室 | 9*9*9 ア*ア9L 9漢 |
| | JOHN DOE | L L |
| | (7F, SAS Institute)スズキイチロウ | *9L* L L*ア |
| | 李大伟 赛仕(北京) | 漢 漢*漢* |
| | 爱新觉罗·溥仪 | 漢*漢 |
| | 陈耀昌(Chan,Ed Yiu-Cheong) | 漢*L*L L*L* |
| | 星光大道62号海王星科技大厦A座6楼 | 漢9漢L漢9漢 |
| | 珠海市 245400(玫瑰楼) | 漢 9*漢* |
| | 二零零九年十月二十一日 | 漢 |
| | 14Mar, 2001 | 9L* 9 |
| | 2009/10/21 | 9*9*9 |
| | H134981(5)------ | L9*9* |
| | 0174685503(D) | 9*L* |
| | 22020319691106184X | 9L |
| | 碧丽服装(北京)有限公司 | 漢*漢*漢 |
| | 电话(+86)10-12345678 | 漢*9*9*9 |
| | Fax:01082741510 | L*9 |
| | (010)82741510-345 | *9*9*9 |
| | רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990) | א א א א *א L* L *9* |
| | ΑNDREΑS ZIΑKΑS | W W |
| Remarks | If a word mixes Greek and Cyrillic, Latin and Cyrillic, or Latin and Greek glyphs (as in the final example, where the character Α is the Greek capital alpha rather than the Latin letter A), this definition outputs a W to warn of potentially fraudulent data. | |
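
The behavior shown in the examples can be approximated with a short Python sketch. This is not the definition's actual implementation. The function names `classify` and `script_pattern` are hypothetical. Because the Python standard library does not expose the Unicode Script property, the sketch assumes the first word of each character's Unicode name is an adequate proxy for its script. The rules for collapsing runs and for emitting W are inferred from the examples and remarks above.

```python
import unicodedata
from itertools import groupby

# Assumption: the first word of a character's Unicode name stands in for its script.
SCRIPT_SYMBOLS = {
    "LATIN": "L", "CJK": "漢", "KATAKANA": "ア", "HIRAGANA": "あ",
    "HANGUL": "가", "CYRILLIC": "Я", "GREEK": "Θ", "THAI": "ก",
    "ARABIC": "أ", "HEBREW": "א",
}

# Scripts whose glyphs are easily confused; mixing them in one run yields "W".
CONFUSABLE = {"L", "Я", "Θ"}


def classify(ch):
    """Map one character to an output symbol (hypothetical helper)."""
    if ch.isspace():
        return " "
    category = unicodedata.category(ch)
    if category == "Nd":                 # decimal digits
        return "9"
    if category.startswith("L"):         # letters only; punctuation and marks fall through
        first_word = unicodedata.name(ch, "").split(" ")[0]
        return SCRIPT_SYMBOLS.get(first_word, "*")
    return "*"


def script_pattern(text):
    """Collapse runs of same-script characters into one symbol each."""
    symbols = [classify(ch) for ch in text]
    out = []
    # Treat adjacent Latin/Greek/Cyrillic letters as one run so mixtures can be detected.
    for confusable, run in groupby(symbols, key=lambda s: s in CONFUSABLE):
        run = list(run)
        if confusable:
            out.append("W" if len(set(run)) > 1 else run[0])
        else:
            # Collapse each stretch of identical symbols (digits, Han, "*", spaces).
            out.extend(symbol for symbol, _ in groupby(run))
    return "".join(out)


print(script_pattern("JOHN DOE"))                        # L L
print(script_pattern("14Mar, 2001"))                     # 9L* 9
print(script_pattern("\u0391NDRE\u0391S ZI\u0391K\u0391S"))  # W W (U+0391 is Greek Alpha)
```

In this sketch, combining marks, fullwidth Latin letters, and other characters whose names do not start with one of the listed script words all fall into the `*` class, which may differ from what the real definition does.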