This section discusses how to supplement the
standard locales (
"S370"
and
"POSIX"
) and the three
nonstandard locales supplied by the SAS/C Library by creating your own locales.
Following is a discussion of the user-added structures that correspond to
the standard locale structures, a listing of the header file
<localeu.h>
and an example locale (
"SAMP"
), and discussions of user-defined
strcoll
and
strxfrm
functions.
Creating a New Locale
Creating a new locale involves two steps:
Once these tasks are completed, the necessary locale
information in object form can be properly loaded, enabling a program to use
the locale with a call to
setlocale
.
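As a minimal sketch of the calling side, the following shows how a program might request a user-added locale and fall back to the "C" locale when the request fails. The helper names select_locale and current_decimal_point are hypothetical and are not part of the library; only setlocale and localeconv are standard.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to switch every category to the named  */
/* locale. If the locale cannot be loaded, fall back to the "C"    */
/* locale. Returns the locale string reported by setlocale.        */
const char *select_locale(const char *name)
{
    const char *result;

    assert(name != NULL);
    result = setlocale(LC_ALL, name);
    if (result == NULL)
        result = setlocale(LC_ALL, "C");
    return result;
}

/* Hypothetical helper: return the decimal-point string of the     */
/* locale that is currently in effect.                             */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

A program could call select_locale("SAMP") at startup and then read formatting values through localeconv as usual. On a system where "SAMP" is not installed, the sketch leaves the program in the "C" locale.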
The header file
<localeu.h>
maps the various data structures and routines required
for a locale. You must include it as well as
<locale.h>
when compiling a locale. Locale source code should be
compiled with the
RENT
or
RENTEXT
compiler option
and link-edited
RENT
(for
re-entrant).
Each category for
setlocale
has a structure mapped within
<localeu.h>
that corresponds to it. The following table describes
the user-supplied categories that correspond to the categories for
setlocale
.
Category | Structure Name | Description |
---|---|---|
LC_NUMERIC
|
_lc_numeric
|
LC_NUMERIC
contains nonmonetary numeric formatting items; see the description
of
localeconv
in Chapter
15, "Localization Functions." |
||||||||||
LC_MONETARY
|
_lc_monetary
|
LC_MONETARY
contains monetary formatting items; see the description of
localeconv
in Chapter 15. |
||||||||||
LC_TIME
|
_lc_time
|
LC_TIME
contains function pointers to locale-specific date and time formatting
routines, pointers to month and weekday names, and a.m. and p.m. designation;
all are used with various
strftime
formats. |
||||||||||
LC_CTYPE
|
_lc_ctype
|
LC_CTYPE
contains a flag for enablement of DBCS processing and string recognition
by other library routines, including recognition of multibyte characters in
formatted I/O format strings such as those used in
printf
.
LC_CTYPE
also contains a pointer to the character type table
that affects the behavior of the character type functions such as
isalpha
and
tolower
. |
||||||||||
LC_COLLATE
|
_lc_collate
|
LC_COLLATE
contains a mode flag indicating the processing mode and collation
table pointer for
strxfrm
and
strcoll
. Mode flag
meanings are as follows:
strxfrm
and
strcoll
functions use the collation table if supplied. In double-byte and multibyte
mode, any use of the table is strictly left up to the user-supplied routines.
The library functions use a standard double-byte collation in double-byte
mode when the collation table pointer is
NULL
.
Included in this category are function pointers to locale-specific
versions of
strcoll
and
strxfrm
.
|
||||||||||
LC_ALL
|
void *_lc_all[5]
|
LC_ALL
is an array of pointers to the other
_lc
structures in the following order: _lc_collate, _lc_ctype, _lc_monetary, _lc_numeric, _lc_time.
|
When the pointer to a structure that corresponds to
a category is
NULL
, the
name returned by
setlocale
reflects the new locale's name. However, it has the default "
C
" locale characteristics for that category.
Similarly, if individual elements of a structure (pointers) are
NULL
or binary
0
, that piece of the locale also exhibits "
C
" locale behavior.
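The fallback rule can be pictured with a small sketch. The structure demo_numeric and the functions below are hypothetical stand-ins written for illustration; they are not the library's implementation, which is not shown in this section. The sketch assumes only what the text states: a NULL category pointer or a NULL element yields the "C" locale value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a numeric category structure. */
struct demo_numeric {
    const char *decimal_point;
    const char *thousands_sep;
    const char *grouping;
};

/* Return value if it is present, otherwise the "C" default. */
static const char *or_default(const char *value, const char *c_default)
{
    assert(c_default != NULL);
    return (value != NULL) ? value : c_default;
}

/* A NULL category pointer means the whole category behaves as "C". */
const char *effective_decimal_point(const struct demo_numeric *cat)
{
    if (cat == NULL)
        return ".";
    return or_default(cat->decimal_point, ".");
}

/* A NULL element means only that item behaves as "C". */
const char *effective_thousands_sep(const struct demo_numeric *cat)
{
    if (cat == NULL)
        return "";
    return or_default(cat->thousands_sep, "");
}
```

With a structure whose thousands_sep is "," and whose decimal_point is NULL, the sketch reports "," for the separator and the "C" default "." for the decimal point.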
The following is a listing of the
<localeu.h>
header file required for compiling
a user-added locale.
/* This header file defines additions to the ANSI locale.h header */
/* file that are required for compiling both user-added locale    */
/* value table load modules and several library functions. The    */
/* "C" defaults appear as comments for the _lc_numeric and        */
/* _lc_monetary categories.                                       */

static struct _lc_numeric {
   char *decimal_point;        /* "." */
   char *thousands_sep;        /* ""  */
   char *grouping;             /* ""  */
};

static struct _lc_monetary {
   char *int_curr_symbol;      /* "" */
   char *currency_symbol;      /* "" */
   char *mon_decimal_point;    /* "" */
   char *mon_thousands_sep;    /* "" */
   char *mon_grouping;         /* "" */
   char *positive_sign;        /* "" */
   char *negative_sign;        /* "" */
   char int_frac_digits;       /* CHAR_MAX */
   char frac_digits;           /* CHAR_MAX */
   char p_cs_precedes;         /* CHAR_MAX */
   char p_sep_by_space;        /* CHAR_MAX */
   char n_cs_precedes;         /* CHAR_MAX */
   char n_sep_by_space;        /* CHAR_MAX */
   char p_sign_posn;           /* CHAR_MAX */
   char n_sign_posn;           /* CHAR_MAX */
};

static const struct _lc_time {
   /* locale's date and time conversion routine */
   char *(*_lct_datetime_conv)();
   /* address of locale's day conversion routine */
   char *(*_lct_date_conv)();
   /* address of locale's time conversion routine */
   char *(*_lct_time_conv)();
   /* address of weekday abbreviation table */
   char *_lct_wday_name [7] ;
   /* address of full weekday name table */
   char *_lct_weekday_name [7] ;
   /* address of month abbreviation table */
   char *_lct_mon_name [12] ;
   /* address of full month name table */
   char *_lct_month_name [12] ;
   /* locale's before-noon designation */
   char *_lct_am;
   /* locale's after-noon designation */
   char *_lct_pm;
};

#define SBCS 0   /* single-byte character set */
#define DBCS 1   /* double-byte character set */

static const struct _lc_collate {
   /* single-, double-, or multibyte character indicator */
   int _lcc_cmode;
   /* pointer to collation table */
   void *_lcc_colltab;
   /* pointer to user-added strcoll function */
   int (*_lcc_strcoll)();
   /* pointer to user-added strxfrm function */
   size_t (*_lcc_strxfrm)();
};

static const struct _lc_ctype {
   /* single-, double-, or multibyte character indicator */
   int _lcc_cmode;
   /* character type table pointer */
   void *_lcc_ctab;
};

/* If _lcc_cmode is set to DBCS, it only has an impact on the ANSI */
/* multibyte character handling functions, not on isalpha, and     */
/* so on. _lcc_ctab is for single-byte characters only, per the    */
/* ANSI ctype.h-allowed representation of "unsigned char," and     */
/* has no relation to _lcc_cmode.                                  */

static const void *_lc_all [5] ;   /* pointers to _lc struct */
                                   /* [0] - &_lc_collate     */
                                   /* [1] - &_lc_ctype       */
                                   /* [2] - &_lc_monetary    */
                                   /* [3] - &_lc_numeric     */
                                   /* [4] - &_lc_time        */
Example locales
L$CLSAMP
("
SAMP
")
and
L$CLDBEX
("
DBEX
") are provided in source form with the compiler
and library to serve as skeleton locales. You can easily modify
these locales to create new locales. Ask your SAS Software Representative
for SAS/C compiler products for information about obtaining copies of these
programs. Here is an abbreviated listing of the "
SAMP
" locale, illustrating the data structures and routine formats
required for a locale. The
L$CLDBEX
example is a double-byte example locale (not shown) with sample
strcoll
and
strxfrm
routines.
#title l$clsamp -- "SAMP" sample locale

/* This is the "SAMP" locale value module table, which       */
/* provides a skeleton example to modify for a               */
/* particular locale. For those locales requiring            */
/* double-byte character support, see the "DBEX" locale      */
/* (L$CLDBEX) for examples of setting up a double-byte       */
/* LC_CTYPE, LC_COLLATE, strcoll, and strxfrm.               */
/*                                                           */
/* Any addresses of functions or tables not specified        */
/* with a category use the "C" locale equivalent             */
/* function or table. If a whole category is not specified   */
/* and the locale is requested for that category with        */
/* setlocale, effectively the "C" locale is used, although   */
/* the locale string returned contains the locale's name.    */

#include <stddef.h>
#include <locale.h>
#include <localeu.h>
#include <dynam.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#eject

/*                                                           */
/* ENTRY:     <=== _dynamn (externally visible) needs to be  */
/*                 compiled with SNAME l$clsam.              */
/*                                                           */
/* USAGE:     <=== Prototype call: _dynamn (dynamically      */
/*                 loaded with the loadm function from the   */
/*                 setlocale function).                      */
/*                                                           */
/*                 Needs to be compiled with RENT or RENText */
/*                 option and link-edited RENT.              */
/*                                                           */
/*                 For example, the following is a call made */
/*                 to setlocale:                             */
/*                    setlocale(LC_ALL, "SAMP");             */
/*                 The following code is executed within     */
/*                 setlocale code to load L$CLSAMP and then  */
/*                 call it:                                  */
/*                    loadm(L$CLSAMP, &fp)                   */
/*                    fncptr = (char *** )(*fp)();           */
/*                                                           */
/* ARGUMENTS: <=== None                                      */
/*                                                           */
/* RETURNS:   <=== A pointer to an array of pointers         */
/*                                                           */
/*                 static const void *lc_all_samp[5] =       */
/*                    &collate,   collate pointer            */
/*                    &ctype,     ctype pointer              */
/*                    &monetary   monetary pointer           */
/*                    &numeric    numeric pointer            */
/*                    &time       time format pointer        */
/*                                                           */
/*                                                           */
/* END                                                       */
#eject

/*-----------------COLLATION category-----------------------*/
static const unsigned char sbcs_collate_table_samp [256] = {
   /* If a collation table is specified, that is, its address  */
   /* is nonzero, a locale strxfrm function is not coded, and  */
   /* the locale is not a multibyte (double-byte) locale, then */
   /* the collation array must have 256 elements that          */
   /* translate any character's 8-bit representation to its    */
   /* proper place in the locale's collating sequence.         */
   0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,  /* 0x00-0f */
   0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
   0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,  /* 0x10-1f */
   0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
   0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,  /* 0x20-2f */
   0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
   . . . . . . . .                                  /* 0x30-ef */
   . . . . . . . .
   . . . . . . . .
   0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,  /* 0xf0-ff */
   0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

#define SBCS 0
#define DBCS 1

static const struct _lc_collate lc_collate_samp = {
   SBCS,                      /* single-byte character mode */
   sbcs_collate_table_samp,   /* collation table address    */
   0,                         /* locale strcoll collation   */
                              /* function pointer           */
   0                          /* locale strxfrm transform   */
                              /* function pointer           */
};
/* See L$CLDBEX for DBCS example of strcoll and strxfrm functions. */
#eject

/*-----------------CTYPE category--------------------------*/
#define U    1   /* uppercase      */
#define L    2   /* lowercase      */
#define N    4   /* number         */
#define W    8   /* white space    */
#define P   16   /* punctuation    */
#define S   32   /* blank          */
#define AX  64   /* alpha extender */
#define X  128   /* hexadecimal    */

static const unsigned char lc_ctab_samp[513] = {
   /* The character type table array, if coded, must contain */
   /* 513 single char elements. The first element is the EOF */
   /* representation (-1 or 0xff) followed by 256 elements   */
   /* that contain the char types for any 8-bit character    */
   /* returned by functions isalpha, isnumeric, and so on.   */
   /* The next 256 elements contain the mappings for the     */
   /* tolower and toupper string transformation functions.   */
   0,     /* -1 = EOF */
   0,     /* 00 = nul */
   0,     /* 01 = soh */
   0,     /* 02 = stx */
   0,     /* 03 = etx */
   0,     /* 04 = sel */
   W,     /* 05 = ht  */
   0,     /* 06 = rnl */
   0,     /* 07 = del */
   0,     /* 08 = ge  */
   0,     /* 09 = sps */
   0,     /* 0a = rpt */
   W,     /* 0B = vt  */
   W,     /* 0C = ff  */
   W,     /* 0D = cr  */
   0,     /* 0E = so  */
   0,     /* 0F = si  */
   0,     /* 10 = dle */
   0,     /* 11 = dcl */
   0,     /* 12 = dc2 */
   0,     /* 13 = dc3 */
   .
   .
   .
   U,     /* E6 = W */
   U,     /* E7 = X */
   U,     /* E8 = Y */
   U,     /* E9 = Z */
   0,     /* EA */
   0,     /* EB */
   0,     /* EC */
   0,     /* ED */
   0,     /* EE */
   0,     /* EF */
   N|X,   /* F0 = 0 */
   N|X,   /* F1 = 1 */
   N|X,   /* F2 = 2 */
   N|X,   /* F3 = 3 */
   N|X,   /* F4 = 4 */
   N|X,   /* F5 = 5 */
   N|X,   /* F6 = 6 */
   N|X,   /* F7 = 7 */
   N|X,   /* F8 = 8 */
   N|X,   /* F9 = 9 */
   0,     /* FA */
   0,     /* FB */
   0,     /* FC */
   0,     /* FD */
   0,     /* FE */
   0,     /* FF = eo */

   /* Lower 257 bytes contain char types, next */
   /* 256 contain the tolower and toupper      */
   /* character mappings.                      */
   0x00,  /* 00 = nul */
   0x01,  /* 01 = soh */
   0x02,  /* 02 = stx */
   0x03,  /* 03 = etx */
   .
   .
   .
   0x7d,  /* 7D = ' */
   0x7e,  /* 7E = = */
   0x7f,  /* 7F = " */
   0x80,  /* 80 */
   0xc1,  /* 81 = a -> C1 = A */
   0xc2,  /* 82 = b -> C2 = B */
   0xc3,  /* 83 = c -> C3 = C */
   0xc4,  /* 84 = d -> C4 = D */
   0xc5,  /* 85 = e -> C5 = E */
   0xc6,  /* 86 = f -> C6 = F */
   0xc7,  /* 87 = g -> C7 = G */
   0xc8,  /* 88 = h -> C8 = H */
   0xc9,  /* 89 = i -> C9 = I */
   .
   .
   .
   0xbc,  /* BC */
   0xbd,  /* BD = ] (close bracket) */
   0xbe,  /* BE */
   0xbf,  /* BF */
   0xc0,  /* C0 = (open brace) */
   0x81,  /* C1 = A -> 81 = a */
   0x82,  /* C2 = B -> 82 = b */
   0x83,  /* C3 = C -> 83 = c */
   0x84,  /* C4 = D -> 84 = d */
   0x85,  /* C5 = E -> 85 = e */
   0x86,  /* C6 = F -> 86 = f */
   0x87,  /* C7 = G -> 87 = g */
   0x88,  /* C8 = H -> 88 = h */
   0x89,  /* C9 = I -> 89 = i */
   0xca,  /* CA = shy */
   0xcb,  /* CB */
   0xcc,  /* CC */
   0xcd,  /* CD */
   0xce,  /* CE */
   .
   .
   .
   0xf7,  /* F7 = 7 */
   0xf8,  /* F8 = 8 */
   0xf9,  /* F9 = 9 */
   0xfa,  /* FA */
   0xfb,  /* FB */
   0xfc,  /* FC */
   0xfd,  /* FD */
   0xfe,  /* FE */
   0xff   /* FF = eo */
};

static const struct _lc_ctype lc_ctype_samp = {
   SBCS,            /* single-byte character mode */
   &lc_ctab_samp    /* ctype table pointer        */
};
#eject

/*------------NUMERIC category--------------------*/
const static struct _lc_numeric lc_numeric_samp = {
   ".",    /* decimal_point */
   ",",    /* thousands_sep */
   "\3"    /* grouping      */
};

/*------------MONETARY category---------------------*/
static const struct _lc_monetary lc_monetary_samp = {
   "DOL",  /* int_curr_symbol   */
   "$",    /* currency_symbol   */
   ".",    /* mon_decimal_point */
   ",",    /* mon_thousands_sep */
   "\3",   /* mon_grouping      */
   "",     /* positive_sign     */
   "-",    /* negative_sign     */
   2,      /* int_frac_digits   */
   2,      /* frac_digits       */
   1,      /* p_cs_precedes     */
   0,      /* p_sep_by_space    */
   1,      /* n_cs_precedes     */
   0,      /* n_sep_by_space    */
   1,      /* p_sign_posn       */
   1       /* n_sign_posn       */
};
#eject

/*------------TIME category------------------------------*/
char *sampdcnv(struct tm *tp);

static const struct _lc_time lc_time_samp = {
   0,          /* pointer to date and time conversion */
               /* routine function pointer            */
   &sampdcnv,  /* pointer to date conversion          */
               /* routine function pointer            */
   0,          /* pointer to time conversion          */
               /* routine function pointer            */

   /* weekday name abbreviations */
   "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",

   /* weekday full names */
   "Sunday", "Monday", "Tuesday", "Wednesday",
   "Thursday", "Friday", "Saturday",

   /* month name abbreviations */
   "Jan", "Feb", "Mar", "Apr", "May", "Jun",
   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",

   /* month full names */
   "January", "February", "March", "April", "May", "June",
   "July", "August", "September", "October", "November",
   "December",

   "AM",   /* locale's "AM" equivalent */
   "PM"    /* locale's "PM" equivalent */
};

char *sampdcnv(struct tm *tp)
{
   /* SAMP date conversion routine           */
   /* Function returns the date in the form: */
   /*    wkd mon dd 'yy                      */
   /* for example, Thu Oct 10 '85.           */
   char *time_format;

   time_format = asctime(tp);
   memcpy(time_format + 11, " '", 2);
   memcpy(time_format + 13, time_format + 22, 2);
   *(time_format + 15) = '\0';
   return time_format;
}  /* End sampdcnv. */
#eject

/* ALL category - array of pointers to category structures */
static const void *lc_all_samp[5] = {
   &lc_collate_samp ,    /* pointer to collate category  */
   &lc_ctype_samp ,      /* pointer to ctype category    */
   &lc_monetary_samp ,   /* pointer to monetary category */
   &lc_numeric_samp ,    /* pointer to numeric samp      */
   &lc_time_samp         /* pointer to time samp         */
};

/*-----------------Return category pointers---------------*/
void *_dynamn()          /* executable entry point */
{
   return (void *)&lc_all_samp;   /* Return address of ALL array. */
}
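The "SAMP" numeric category sets thousands_sep to "," and grouping to "\3". To show what those values mean to a formatting routine, here is a minimal sketch that applies a localeconv-style grouping string to an unsigned integer. The function format_grouped is hypothetical and is not a library routine. It assumes the standard C interpretation of grouping: each byte is a group size counted from the right, the last size repeats, and CHAR_MAX ends grouping.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write value into out using sep between     */
/* digit groups whose sizes come from grouping. Returns 0 on       */
/* success or -1 if a buffer is too small.                         */
int format_grouped(unsigned long value, const char *sep,
                   const char *grouping, char *out, size_t outsize)
{
    char digits[32];
    char rev[128];                 /* result built right to left   */
    size_t seplen, n = 0, k;
    int ndigits, i, in_group = 0;
    int group;                     /* current group size, 0 = none */

    assert(sep != NULL && grouping != NULL && out != NULL);
    seplen = strlen(sep);
    group = (unsigned char)*grouping;
    ndigits = sprintf(digits, "%lu", value);

    for (i = ndigits - 1; i >= 0; i--) {
        if (group != 0 && group != CHAR_MAX && in_group == group) {
            if (n + seplen >= sizeof rev)
                return -1;
            for (k = seplen; k > 0; k--)   /* separator, reversed  */
                rev[n++] = sep[k - 1];
            in_group = 0;
            if (grouping[1] != '\0') {     /* else repeat last size */
                grouping++;
                group = (unsigned char)*grouping;
            }
        }
        if (n + 1 >= sizeof rev)
            return -1;
        rev[n++] = digits[i];
        in_group++;
    }
    if (n + 1 > outsize)
        return -1;
    for (k = 0; k < n; k++)                /* undo the reversal    */
        out[k] = rev[n - 1 - k];
    out[n] = '\0';
    return 0;
}
```

With the "SAMP" values, format_grouped(1234567UL, ",", "\3", buf, sizeof buf) leaves "1,234,567" in buf. With the "C" default grouping of "", the digits are written without separators.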
If the library
strcoll
function is not adequate for the needs of a locale, you can write
and use your own routine to do the collation. You include this routine as
part of the
LC_COLLATE
category of a locale to be called from the library
strcoll
function after
setlocale
has loaded the locale. Because it is your own routine, you
can give it any legal name, as long as you are consistent in its use. (For
instance, the following example uses the name
loclcoll
.)
The locale's routine can make use of any information
available in the locale, such as the mode and collation tables. In addition,
if the
LC_COLLATE
mode
is not
0
, the collation
tables coded as part of the locale are not restricted to any format as long
as the locale's
strxfrm
and
strcoll
routines can
understand them.
The locale's routine is invoked from the library's
strcoll
function with the equivalent
of the following call:
/* library strcoll function */
int strcoll(const char *str1, const char *str2)
{
   .
   .
   .
   /* Return locale's strcoll value to library's */
   /* strcoll caller.                            */
   return loclcoll(str1, str2);
}
The
loclcoll
function code appears as part of the
LC_COLLATE
category within the locale source code:
.
.
.  /* collation tables, transformation tables, */
   /* and other locale data                    */

int loclcoll(const char *str1, const char *str2)
/* ARG     DCL             DESCRIPTION                      */
/*                                                          */
/* str1    const char *    pointer to first input string    */
/*                                                          */
/* str2    const char *    pointer to second input string   */
/*                                                          */
/* RETURNS: <=== str1 < str2   a negative value             */
/*               str1 = str2   0                            */
/*               str1 > str2   a positive value             */
/*                                                          */
{
   .
   .  /* locale's equivalent strcoll function code */
   .
   return x;   /* Return a result, x. */
}
.
.  /* more locale data, routines, and so on */
.
For an example of a locale routine for
strcoll
, see the
L$CLDBEX
source code member distributed with the compiler and library.
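For readers without access to L$CLDBEX, the following is a minimal single-byte sketch of what the body of such a routine might look like. It is not the distributed example. The table demo_colltab and the function demo_init_colltab are hypothetical; the table is filled at run time with toupper so that uppercase and lowercase letters collate together, whereas a real locale would normally code a static 256-element table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical collation table: one weight per 8-bit character. */
static unsigned char demo_colltab[256];

/* Fill the table so that letters collate without regard to case. */
void demo_init_colltab(void)
{
    int i;

    for (i = 0; i < 256; i++)
        demo_colltab[i] = (unsigned char)toupper(i);
}

/* Compare two strings by collation weight, not by character code. */
/* Returns a negative, zero, or positive value, like strcoll.      */
int loclcoll(const char *str1, const char *str2)
{
    const unsigned char *p = (const unsigned char *)str1;
    const unsigned char *q = (const unsigned char *)str2;

    assert(str1 != NULL && str2 != NULL);
    while (*p != '\0' && *q != '\0' &&
           demo_colltab[*p] == demo_colltab[*q]) {
        p++;
        q++;
    }
    return (int)demo_colltab[*p] - (int)demo_colltab[*q];
}
```

After demo_init_colltab has run, loclcoll("apple", "APPLE") returns 0 and loclcoll("apple", "Banana") returns a negative value, because the comparison uses the table weights.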
If the library
strxfrm
function is not adequate for the needs of a locale, you
can
write and use your own routine to do the transformation. You include this
routine as part of the
LC_COLLATE
category of a locale to be called from the library
strxfrm
function after being loaded with an appropriate
setlocale
call. Because it is
your own routine, you can give it any legal name, as long as you are consistent
in its use. (For instance, the following example uses the name
loclxfrm
.)
There is one main difference in behavior requirements
for the locale equivalent of the
strxfrm
function and behavior requirements for the library version.
After the output buffer is filled, the locale routine stops scanning the input string and
returns the size of the filled output buffer rather than the total transformed
length. It also places the number of characters consumed from the input string
in the area addressed by an additional fourth parameter. The locale's routine
can use any information available in the locale, such as the mode and collation
tables. In addition, if the
LC_COLLATE
mode is not
0
, the collation and transformation tables coded as part of the locale
are not restricted to any format as long as the locale's
strxfrm
and
strcoll
routines can understand them.
The reason the behavior requirements of the library
and locale
strxfrm
routines
differ is to allow the
strcoll
function to call
strxfrm
with a limited buffer that might permit only partial transformation
of the whole string. Theoretically, any number of output characters can be
produced by
strxfrm
from
any number of input characters.
The locale's
strxfrm
routine is invoked from within
strxfrm
with an equivalent of the following call. (The choice of
"loclxfrm"
is arbitrary. It could
be any legal name as long as it is consistent.)
size_t strxfrm(char *str1, const char *str2, size_t n)
{
   size_t nchar_xfrmed, used;
   .
   .
   .
   nchar_xfrmed = loclxfrm(str1, str2, n, &used);
   .
   .
   .
}
The
loclxfrm
function code appears as part of the
LC_COLLATE
category within the locale source code:
.
.  /* collation tables, transformation tables, and other */
   /* locale data                                        */
.

size_t loclxfrm(char *str1, const char *str2, size_t n, size_t *used)
/* ARG    DCL            DESCRIPTION                            */
/*                                                              */
/* str1   char *         pointer to the transformed             */
/*                       string output array                    */
/*                                                              */
/* str2   const char *   pointer to input string array          */
/*                                                              */
/* n      size_t         maximum number of bytes                */
/*                       (characters) written to str1           */
/*                       including the terminating null.        */
/*                       If n or more characters are            */
/*                       required for the transformed           */
/*                       string, only the first n are           */
/*                       written and the string is not          */
/*                       null-terminated.                       */
/*                                                              */
/* used   size_t *       pointer to size_t (unsigned int)       */
/*                       value where the number of input        */
/*                       characters consumed from the           */
/*                       input string str2 is returned          */
/*                                                              */
/* RETURNS: <=== The number of characters placed in the         */
/*               output transformation array is returned.       */
/*                                                              */
/*               Also, the number of characters consumed        */
/*               (scanned) from the input string str2 is        */
/*               placed in the size_t (unsigned int)            */
/*               value pointed to by used.                      */
/*                                                              */
/*               The total number of characters required        */
/*               for the transformation is obtained by a        */
/*               special call:                                  */
/*               total_loclxfrm_len = loclxfrm(0, s2, 0, &used) */
{
   .
   .  /* locale's strxfrm code */
}
.
.
.  /* more locale data, routines, and so on */
.
For an example of a locale routine for
strxfrm
, see the
L$CLDBEX
source code member distributed with the compiler and library.
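The calling convention described above can be sketched for the simplest case, a one-to-one single-byte transformation. This is not the L$CLDBEX code. The weight function demo_weight is a hypothetical stand-in for a locale's transformation table, and two details are assumptions rather than documented behavior: the return value excludes the terminating null, and the special length-query call sets the consumed count to the full input length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transformation table lookup. */
static unsigned char demo_weight(unsigned char c)
{
    return (unsigned char)toupper(c);
}

/* Transform str2 into str1, writing at most n bytes. Stops when   */
/* the output is full, returns the number of characters placed in  */
/* str1, and stores the number of input characters consumed in     */
/* *used. A call with str1 == 0 and n == 0 returns the total       */
/* transformed length (assumption: *used is set to that length).   */
size_t loclxfrm(char *str1, const char *str2, size_t n, size_t *used)
{
    size_t len, count, i;

    assert(str2 != NULL && used != NULL);
    len = strlen(str2);
    if (str1 == NULL || n == 0) {      /* length query only        */
        *used = len;
        return len;
    }
    count = (len < n) ? len : n;
    for (i = 0; i < count; i++)
        str1[i] = (char)demo_weight((unsigned char)str2[i]);
    if (count < n)
        str1[count] = '\0';            /* room left for the null   */
    *used = count;
    return count;
}
```

Called with a 4-byte buffer and the input "hello", the sketch fills the buffer with four transformed characters, leaves it without a terminating null, and reports four input characters consumed. A caller such as strcoll can then resume from str2 + used with the same buffer.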
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.