Problem Note 50696: ANORM420 function provides options for correctly supporting the Arabic characters encoded in EBCDIC 420
Arabic is a complex script that is written from right to left. Each character has up to four basic forms, depending on its position within a word: initial, medial, final, or isolated. Arabic also uses ligatures, which are single glyphs formed from two or more characters. Most notably, the lam-alef ligature combines lam (initial or medial form) with alef (final form).
In some encodings, such as EBCDIC 420, the lam-alef ligature has a single code point, whereas others, such as Windows 1256 or ISO-8859-6, encode lam and alef as separate code points. When SAS transcodes data from EBCDIC 420 to one of the other Arabic single-byte encodings, the combined lam-alef character cannot be mapped to a single code point, so a substitution character is placed in the data instead.
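The mismatch can be illustrated outside SAS. The following Python sketch is not SAS code and does not use EBCDIC 420 directly. As an assumption made for illustration, it uses the Unicode presentation form U+FEFB (isolated lam-alef ligature) as a stand-in for the single EBCDIC 420 ligature code point. The helper name `encode_naively` is hypothetical. The sketch shows that neither Windows 1256 nor ISO-8859-6 has a single code point for the ligature, so a naive conversion produces a substitution character.

```python
# U+FEFB is the Unicode presentation form for the isolated lam-alef ligature.
# Assumption: it stands in for the single-code-point ligature of EBCDIC 420.
LAM_ALEF_LIGATURE = "\uFEFB"


def encode_naively(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for characters the target cannot represent."""
    return text.encode(encoding, errors="replace")


# Neither target encoding has a code point for the combined character,
# so each conversion yields the substitution character '?' (0x3F).
for target in ("cp1256", "iso8859_6"):
    print(target, encode_naively(LAM_ALEF_LIGATURE, target).hex())
```

Both lines of output show `3f`, the substitution character, which is the kind of data loss that the note describes.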
A new function, ANORM420, is introduced in SAS® 9.4 TS1M1. It correctly maps the EBCDIC 420 lam-alef to two separate code points in your data. The function also adds a space after code points that represent the final form of a character and converts Arabic-Indic numerals to digits. Option arguments enable you to turn off each of these behaviors.
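The idea behind the mapping can be sketched in Python. This is a conceptual illustration under stated assumptions, not the ANORM420 implementation: the Unicode presentation forms U+FEFB and U+FEFC stand in for the EBCDIC 420 ligature code points, ASCII digits are assumed as the target of the numeral conversion, and the names `normalize_arabic`, `LIGATURE_TO_PAIR`, and `ARABIC_INDIC_TO_ASCII` are hypothetical.

```python
# Assumption: Unicode lam-alef presentation forms stand in for the
# single-code-point ligature of EBCDIC 420.
LIGATURE_TO_PAIR = {
    0xFEFB: "\u0644\u0627",  # isolated lam-alef -> lam + alef
    0xFEFC: "\u0644\u0627",  # final lam-alef    -> lam + alef
}

# Assumption: Arabic-Indic digits U+0660..U+0669 are mapped to ASCII 0..9.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}


def normalize_arabic(text: str, map_digits: bool = True) -> str:
    """Split lam-alef ligatures into two code points; optionally map digits."""
    table = dict(LIGATURE_TO_PAIR)
    if map_digits:
        table.update(ARABIC_INDIC_TO_ASCII)
    return text.translate(table)


# Ligature followed by the Arabic-Indic digit three.
sample = "\uFEFB\u0663"
encoded = normalize_arabic(sample).encode("cp1256")
print(encoded.hex())  # e1c733: lam (E1), alef (C7), ASCII '3' (33)
```

After the ligature is split, the text encodes to Windows 1256 without any substitution character.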
The following example shows how to use the ANORM420 function:
data _null_;
   /* Sample input: bytes encoded in EBCDIC 420 */
   a = '59CD57BC577745'x;
   /* No options specified: default behavior */
   s1 = anorm420(a);
   /* Turn off addition of space and mapping of Arabic-Indic numbers */
   s2 = anorm420(a, "si");
   /* Turn off transcoding */
   s3 = anorm420(a, 't');
   put s1= $hex20. / s2= $hex20. / s3= $hex20.;
run;
Here is the resulting output in the SAS log:
s1=C8E5C7E3C7D320A02020
s2=C8E5C7E3C7D3A0202020
s3=59CD57BC577740454040
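To make the three results easier to compare, the following Python sketch inspects the hexadecimal values shown above. It only compares the bytes printed in the log. The helper name `first_difference` is hypothetical, and the reading of the inserted byte as the added space is an interpretation based on the comments in the example.

```python
# Values copied from the log output and the input literal in the example.
source = bytes.fromhex("59CD57BC577745")
s1 = bytes.fromhex("C8E5C7E3C7D320A02020")
s2 = bytes.fromhex("C8E5C7E3C7D3A0202020")
s3 = bytes.fromhex("59CD57BC577740454040")


def first_difference(left: bytes, right: bytes):
    """Return the index of the first differing byte, or None if there is none."""
    for index, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return index
    return None


idx = first_difference(s1, s2)
print(idx, hex(s1[idx]))  # 6 0x20: s1 has an extra ASCII space at offset 6

# Dropping that byte from s1 gives the leading bytes of s2.
print(s2.startswith(s1[:idx] + s1[idx + 1:]))  # True

# s3 is not transcoded: it is the input with 0x40 (the EBCDIC space)
# inserted at the same offset, followed by padding.
print(hex(s3[idx]), (s3[:idx] + s3[idx + 1:]).startswith(source))  # 0x40 True
```

The comparison suggests that s1 and s2 differ only by the space that the "s" option suppresses, and that s3 keeps the original EBCDIC 420 bytes while still receiving the added space.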
Click the Hot Fix tab in this note to access the hot fix for this issue.
Operating System and Release Information
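Product Family: | SAS System |
Product: | Base SAS |
System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
Z64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |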
Microsoft® Windows® for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.4 | | 9.4 TS1M0 | |
Microsoft Windows 8 Enterprise 32-bit | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows 8 Enterprise x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows 8 Pro 32-bit | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows 8 Pro x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows Server 2008 R2 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows Server 2008 for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows Server 2012 Datacenter | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Microsoft Windows Server 2012 Std | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Windows 7 Enterprise x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Windows 7 Professional x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
64-bit Enabled AIX | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
64-bit Enabled HP-UX | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
64-bit Enabled Solaris | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
HP-UX IPF | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Linux for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
Solaris for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | high |
Date Modified: | 2013-11-18 13:55:25 |
Date Created: | 2013-08-08 14:34:17 |