
User-Added Locales

This section discusses how to supplement the standard locales ("S370" and "POSIX") and the three nonstandard locales supplied by the SAS/C Library by creating your own locales. Following is a discussion of the user-added structures that correspond to the standard locale structures, a listing of the header file <localeu.h> and an example locale ("SAMP"), and discussions of user-defined strcoll and strxfrm functions.


Creating a New Locale

Creating a new locale involves two steps:

Once these tasks are completed, the necessary locale information in object form can be properly loaded, enabling a program to use the locale with a call to setlocale .
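
As an illustration of the final step, the following minimal sketch shows how a program might request a user-added locale and fall back to the "C" locale when the locale cannot be loaded. The helper name select_locale is hypothetical and is not part of the library; only setlocale itself is standard.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to select a locale for all categories.   */
/* If setlocale fails (for example, the locale module cannot be      */
/* loaded), the current locale is unchanged, so "C" is requested     */
/* explicitly.  Returns the locale string in effect, never NULL.     */
const char *select_locale(const char *name)
{
   const char *result = setlocale(LC_ALL, name);

   if (result == NULL)
      result = setlocale(LC_ALL, "C");

   return result != NULL ? result : "C";
}
```

A program would call select_locale("SAMP") once at start-up and could log the returned string to confirm which locale is active.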

The header file <localeu.h> maps the various data structures and routines required for a locale. You must include it as well as <locale.h> when compiling a locale. Locale source code should be compiled with the RENT or RENTEXT compiler option and link-edited RENT (for re-entrant).

Each category for setlocale has a structure mapped within <localeu.h> that corresponds to it. The following table describes the user-supplied categories that correspond to the categories for setlocale .

Category Structure Name Description
LC_NUMERIC _lc_numeric LC_NUMERIC contains nonmonetary numeric formatting items; see the description of localeconv in Chapter 15, "Localization Functions."
LC_MONETARY _lc_monetary LC_MONETARY contains monetary formatting items; see the description of localeconv in Chapter 15.
LC_TIME _lc_time LC_TIME contains function pointers to locale-specific date and time formatting routines, pointers to month and weekday names, and a.m. and p.m. designations; all are used with various strftime formats.
LC_CTYPE _lc_ctype LC_CTYPE contains a flag for enablement of DBCS processing and string recognition by other library routines, including recognition of multibyte characters in formatted I/O format strings such as those used in printf. LC_CTYPE also contains a pointer to the character type table that affects the behavior of the character type functions such as isalpha and tolower.
LC_COLLATE _lc_collate LC_COLLATE contains a mode flag indicating the processing mode and collation table pointer for strxfrm and strcoll . Mode flag meanings are as follows:
0 indicates single-byte mode.
1 indicates double-byte mode.
>1 indicates multibyte mode.
For single-byte mode, the library's strxfrm and strcoll functions use the collation table if supplied. In double-byte and multibyte mode, any use of the table is strictly left up to the user-supplied routines. The library functions use a standard double-byte collation in double-byte mode when the collation table pointer is NULL.

Included in this category are function pointers to locale-specific versions of strxfrm and strcoll . The locale-specific version of the strxfrm function requires an additional fourth parameter. This parameter is a pointer to a size_t variable where the number of characters consumed from the input string by the function is placed. Also, the value returned by the locale's strxfrm is the number of characters placed in the output array, not necessarily the total transformed string length.

LC_ALL void *_lc_all[5] LC_ALL is an array of pointers to the other _lc structures in the following order:
[0] &_lc_collate
[1] &_lc_ctype
[2] &_lc_monetary
[3] &_lc_numeric
[4] &_lc_time .

When the pointer to a structure that corresponds to a category is NULL, the name returned by setlocale reflects the new locale's name. However, it has the default "C" locale characteristics for that category. Similarly, if individual elements of a structure (pointers) are NULL or binary 0, that piece of the locale also exhibits "C" locale behavior.
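
The fallback rule above can be pictured with a short sketch. The structure numeric_info is only a stand-in for the _lc_numeric structure mapped in <localeu.h>, redeclared so that the example compiles on its own, and effective_decimal_point is a hypothetical helper. It illustrates the rule; it is not the library's actual lookup code.

```c
#include <stddef.h>

/* Stand-in for _lc_numeric (an assumption made so that this sketch  */
/* is self-contained; a real locale includes <localeu.h> instead).   */
struct numeric_info {
   const char *decimal_point;
   const char *thousands_sep;
   const char *grouping;
};

/* Hypothetical helper: a NULL category pointer or a NULL element    */
/* selects the "C" locale value, which for decimal_point is ".".     */
const char *effective_decimal_point(const struct numeric_info *num)
{
   if (num == NULL || num->decimal_point == NULL)
      return ".";                 /* "C" locale default              */
   return num->decimal_point;
}
```

Under this reading, a locale that supplies only thousands_sep and grouping still formats numbers with the "C" decimal point.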

The <localeu.h> Header File

The following is a listing of the <localeu.h> header file required for compiling a user-added locale.

/* This header file defines additions to the ANSI locale.h header  */
/* file that are required for compiling both user-added  locale    */
/* value table load modules and several library functions. The     */
/* "C" defaults appear as comments for the _lc_numeric and         */
/*  _lc_monetary categories.                                       */
static struct _lc_numeric {
   char *decimal_point;         /* "."      */
   char *thousands_sep;         /* ""       */
   char *grouping;              /* ""       */
};

static struct _lc_monetary  {
   char *int_curr_symbol;       /* ""       */
   char *currency_symbol;       /* ""       */
   char *mon_decimal_point;     /* ""       */
   char *mon_thousands_sep;     /* ""       */
   char *mon_grouping;          /* ""       */
   char *positive_sign;         /* ""       */
   char *negative_sign;         /* ""       */
   char int_frac_digits;        /* CHAR_MAX */
   char frac_digits;            /* CHAR_MAX */
   char p_cs_precedes;          /* CHAR_MAX */
   char p_sep_by_space;         /* CHAR_MAX */
   char n_cs_precedes;          /* CHAR_MAX */
   char n_sep_by_space;         /* CHAR_MAX */
   char p_sign_posn;            /* CHAR_MAX */
   char n_sign_posn;            /* CHAR_MAX */
};

static const struct _lc_time  {
/* locale's date and time conversion routine                       */
   char *(*_lct_datetime_conv)();
/* address of locale's day conversion routine                      */
   char *(*_lct_date_conv)();
/* address of locale's time conversion routine                     */
   char *(*_lct_time_conv)();
/* address of weekday abbreviation table                           */
   char *_lct_wday_name [7] ;
/* address of full weekday name table                              */
   char *_lct_weekday_name [7] ;
/* address of month abbreviation table                             */
   char *_lct_mon_name [12] ;
/* address of full month name table                                */
   char *_lct_month_name [12] ;
/* locale's before-noon designation                                */
   char *_lct_am;
/* locale's after-noon designation                                 */
   char *_lct_pm;
};

#define SBCS 0           /* single-byte character set              */
#define DBCS 1           /* double-byte character set              */

static const struct _lc_collate {
/* single-, double-, or multibyte character indicator              */
   int _lcc_cmode;
/* pointer to collation table                                      */
   void *_lcc_colltab;
/* pointer to user-added strcoll function                          */
   int (*_lcc_strcoll)();
/* pointer to user-added strxfrm function                          */
   size_t (*_lcc_strxfrm)();
};

static const struct _lc_ctype {
/* single-, double-, or multibyte character indicator              */
   int _lcc_cmode;

/* character type table pointer                                    */
   void *_lcc_ctab;
};

/* If _lcc_cmode is set to DBCS, it only has an impact on the ANSI */
/* multibyte character handling functions, not on isalpha, and     */
/* so on. _lcc_ctab is for single-byte characters only, per the    */
/* ANSI ctype.h-allowed representation of "unsigned char," and     */
/* has no relation to _lcc_cmode.                                 */


static const void *_lc_all [5] ;         /* pointers to _lc struct */
                                         /* [0]  -  &_lc_collate   */
                                         /* [1]  -  &_lc_ctype     */
                                         /* [2]  -  &_lc_monetary  */
                                         /* [3]  -  &_lc_numeric   */
                                         /* [4]  -  &_lc_time      */


Example Locales

Example locales L$CLSAMP ("SAMP") and L$CLDBEX ("DBEX") are provided in source form with the compiler and library to serve as skeleton locales. You can easily modify these locales to create new locales. Ask your SAS Software Representative for SAS/C compiler products for information about obtaining copies of these programs. Here is an abbreviated listing of the "SAMP" locale, illustrating the data structures and routine formats required for a locale. The L$CLDBEX example is a double-byte example locale (not shown) with sample strcoll and strxfrm routines.

The "SAMP" locale

#title l$clsamp -- "SAMP" sample locale

/* This is the "SAMP" locale value module table, which      */
/* provides a skeleton example to modify for a              */
/* particular locale.  For those locales requiring          */
/* double-byte character support, see the "DBEX" locale     */
/* (L$CLDBEX) for examples of setting up a double-byte      */
/* LC_CTYPE, LC_COLLATE, strcoll, and strxfrm.              */
/*                                                          */
/* Any addresses of functions or tables not specified       */
/* with a category use the "C" locale equivalent            */
/* function or table. If a whole category is not specified  */
/* and the locale is requested for that category with       */
/* setlocale, effectively the "C" locale is used, although  */
/* the locale string returned contains the locale's name.   */


#include <stddef.h>
#include <locale.h>
#include <localeu.h>
#include <dynam.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#eject

/*                                                          */
/* ENTRY:  <=== _dynamn (externally visible) needs to be    */
/*              compiled with SNAME l$clsam.                */
/*                                                          */
/* USAGE:  <=== Prototype call: _dynamn (dynamically        */
/*              loaded with the loadm function from the     */
/*              setlocale function).                        */
/*                                                          */

/*              Needs to be compiled with RENT or RENText   */
/*              option and link-edited RENT.                */
/*                                                          */
/*              For example, the following is a call made   */
/*              to setlocale:                               */
/*                 setlocale(LC_ALL, "SAMP");               */
/*              The following code is executed within       */
/*              setlocale code to load L$CLSAMP and then    */
/*              call it:                                    */
/*                 loadm(L$CLSAMP, &fp)                     */
/*                 fncptr = (char *** )(*fp)();             */
/*                                                          */
/* ARGUMENTS:  <=== None                                    */
/*                                                          */
/* RETURNS:    <=== A pointer to an array of pointers       */
/*                                                          */
/* static const void *lc_all_samp[5]  =                     */
/*    &collate,             collate pointer                 */
/*    &ctype,               ctype pointer                   */
/*    &monetary             monetary pointer                */
/*    &numeric              numeric pointer                 */
/*    &time                 time format pointer             */
/*                                                          */
/*                                                          */
/* END                                                      */

#eject

/*-----------------COLLATION category-----------------------*/

static const unsigned char sbcs_collate_table_samp [256]  = {

/* If a collation table is specified, that is, its address  */
/* is nonzero, a locale strxfrm function is not coded, and  */
/* the locale is not a multibyte (double-byte) locale, then */
/* the collation array must have 256 elements that          */
/* translate any character's 8-bit representation to its    */
/* proper place in the locale's collating sequence.         */

0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,  /* 0x00-0f */
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,

0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,  /* 0x10-1f */
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,

0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,  /* 0x20-2f */
0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,

.     .     .     .     .     .     .     .      /* 0x30-ef */

.     .     .     .     .     .     .     .

.     .     .     .     .     .     .     .

0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,  /* 0xf0-ff */
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

#define SBCS 0
#define DBCS 1

static const struct _lc_collate lc_collate_samp  =  {
   SBCS,                     /* single-byte character mode */
   sbcs_collate_table_samp,  /* collation table address    */
   0,                        /* locale strcoll collation   */
                             /* function pointer           */
   0                         /* locale strxfrm transform   */
                             /* function pointer           */
};
/* See L$CLDBEX for DBCS example of strcoll and strxfrm functions. */

#eject

/*-----------------CTYPE category--------------------------*/

#define U 1    /* uppercase       */
#define L 2    /* lowercase       */
#define N 4    /* number          */
#define W 8    /* white space     */
#define P 16   /* punctuation     */
#define S 32   /* blank           */
#define AX 64  /* alpha extender  */
#define X 128  /* hexadecimal     */

static const unsigned char lc_ctab_samp[513]  =
{

/* The character type table array, if coded, must contain  */
/* 513 single char elements. The first element is the EOF  */
/* representation (-1 or 0xff) followed by 256 elements    */
/* that contain the char types for any 8-bit character     */
/* returned by functions isalpha, isnumeric, and so on.    */
/* The next 256 elements contain the mappings for the      */
/* tolower and toupper string transformation functions.    */

0,       /* -1 = EOF */
0,       /* 00 = nul */
0,       /* 01 = soh */
0,       /* 02 = stx */
0,       /* 03 = etx */
0,       /* 04 = sel */
W,       /* 05 = ht  */
0,       /* 06 = rnl */
0,       /* 07 = del */
0,       /* 08 = ge  */
0,       /* 09 = sps */
0,       /* 0a = rpt */
W,       /* 0B = vt  */
W,       /* 0C = ff  */
W,       /* 0D = cr  */
0,       /* 0E = so  */
0,       /* 0F = si  */
0,       /* 10 = dle */
0,       /* 11 = dcl */
0,       /* 12 = dc2 */
0,       /* 13 = dc3 */
.        .
.        .
.        .
U,       /* E6 = W   */
U,       /* E7 = X   */
U,       /* E8 = Y   */
U,       /* E9 = Z   */
0,       /* EA       */
0,       /* EB       */
0,       /* EC       */
0,       /* ED       */
0,       /* EE       */
0,       /* EF       */
N|X,     /* F0 = 0   */
N|X,     /* F1 = 1   */
N|X,     /* F2 = 2   */
N|X,     /* F3 = 3   */
N|X,     /* F4 = 4   */
N|X,     /* F5 = 5   */
N|X,     /* F6 = 6   */
N|X,     /* F7 = 7   */
N|X,     /* F8 = 8   */
N|X,     /* F9 = 9   */
0,       /* FA       */
0,       /* FB       */
0,       /* FC       */
0,       /* FD       */
0,       /* FE       */
0,       /* FF = eo  */

   /* Lower 257 bytes contain char types, next */
   /* 256 contain the tolower and toupper      */
   /* character mappings.                      */
0x00,    /* 00 = nul */
0x01,    /* 01 = soh */
0x02,    /* 02 = stx */
0x03,    /* 03 = etx */
.        .
.        .
.        .
0x7d,    /* 7D = '           */
0x7e,    /* 7E = =           */
0x7f,    /* 7F = "           */
0x80,    /* 80               */
0xc1,    /* 81 = a -> C1 = A */
0xc2,    /* 82 = b -> C2 = B */
0xc3,    /* 83 = c -> C3 = C */
0xc4,    /* 84 = d -> C4 = D */
0xc5,    /* 85 = e -> C5 = E */
0xc6,    /* 86 = f -> C6 = F */
0xc7,    /* 87 = g -> C7 = G */
0xc8,    /* 88 = h -> C8 = H */
0xc9,    /* 89 = i -> C9 = I */
.        .
.        .
.        .
0xbc,    /* BC                      */
0xbd,    /* BD = ]  (close bracket) */
0xbe,    /* BE                      */
0xbf,    /* BF                      */
0xc0,    /* C0 =  (open brace)      */
0x81,    /* C1 = A -> 81 = a        */
0x82,    /* C2 = B -> 82 = b        */
0x83,    /* C3 = C -> 83 = c        */
0x84,    /* C4 = D -> 84 = d        */
0x85,    /* C5 = E -> 85 = e        */
0x86,    /* C6 = F -> 86 = f        */
0x87,    /* C7 = G -> 87 = g        */
0x88,    /* C8 = H -> 88 = h        */
0x89,    /* C9 = I -> 89 = i        */
0xca,    /* CA = shy                */
0xcb,    /* CB                      */
0xcc,    /* CC                      */
0xcd,    /* CD                      */
0xce,    /* CE                      */
.        .
.        .
.        .
0xf7,    /* F7 = 7   */
0xf8,    /* F8 = 8   */
0xf9,    /* F9 = 9   */
0xfa,    /* FA       */
0xfb,    /* FB       */
0xfc,    /* FC       */
0xfd,    /* FD       */
0xfe,    /* FE       */
0xff     /* FF = eo  */
};


static const struct _lc_ctype lc_ctype_samp = {
   SBCS,             /* single-byte character mode */
   &lc_ctab_samp     /* ctype table pointer */
   };

#eject

/*------------NUMERIC category--------------------*/

const static struct _lc_numeric lc_numeric_samp = {
   ".",                     /* decimal_point */
   ",",                     /* thousands_sep */
   "\3"                    /* grouping */
};

/*------------MONETARY category---------------------*/

static const struct _lc_monetary lc_monetary_samp  = {
   "DOL",                   /* int_curr_symbol    */
   "$",                     /* currency_symbol    */
   ".",                     /* mon_decimal_point  */
   ",",                     /* mon_thousands_sep  */
   "\3",                    /* mon_grouping       */
   "",                      /* positive_sign      */
   "-",                     /* negative_sign      */
   2,                       /* int_frac_digits    */
   2,                       /* frac_digits        */
   1,                       /* p_cs_precedes      */
   0,                       /* p_sep_by_space     */
   1,                       /* n_cs_precedes      */
   0,                       /* n_sep_by_space     */
   1,                       /* p_sign_posn        */
   1                        /* n_sign_posn        */
};

#eject

/*------------TIME category------------------------------*/

char *sampdcnv(struct tm *tp);

static const struct _lc_time  lc_time_samp =  {
   0,             /* pointer to date and time conversion */
                  /* routine function pointer            */
   &sampdcnv,     /* pointer to date conversion          */
                  /* routine function pointer            */
   0,             /* pointer to time conversion          */
                  /* routine function pointer            */
      /* weekday name abbreviations                      */
   "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
      /* weekday full names                              */
   "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
      /* month name abbreviations                        */
   "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
      /* month full names */
   "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November",
    "December",
   "AM",                    /* locale's "AM" equivalent  */
   "PM"                     /* locale's "PM" equivalent  */
};

char *sampdcnv(struct tm *tp)
{

/* SAMP date conversion routine           */
/* Function returns the date in the form: */
/* wkd mon dd 'yy                         */
/* for example, Thu Oct 10 '85.           */

char *time_format;

time_format = asctime(tp);

memcpy(time_format + 11, " '", 2);
memcpy(time_format + 13, time_format + 22, 2);
*(time_format + 15) = '\0';

return time_format;
} /* End sampdcnv.  */

#eject

/* ALL category - array of pointers to category structures */

static const void *lc_all_samp[5]  = {
   &lc_collate_samp ,  /* pointer to collate category      */
   &lc_ctype_samp ,    /* pointer to ctype category        */
   &lc_monetary_samp , /* pointer to monetary category     */
   &lc_numeric_samp ,  /* pointer to numeric category      */
   &lc_time_samp       /* pointer to time category         */
};

/*-----------------Return category pointers---------------*/

void *_dynamn()  /* executable entry point */
{
   return (void *)&lc_all_samp;  /* Return address of ALL array. */
}
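
For comparison, a date conversion routine in the same spirit as sampdcnv could be sketched with strftime rather than by editing the asctime result in place. This is an alternative technique, not the code shipped in L$CLSAMP, and the name date_conv_sketch is hypothetical. The format string deliberately avoids %x and %c, on the assumption that those are the formats the library would route back to the locale's own conversion routines.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical date conversion routine.  Formats the date as        */
/*    wkd mon dd 'yy                                                 */
/* for example, Thu Oct 10 '85.  Returns a pointer to a static       */
/* buffer, or NULL if the formatted date does not fit.               */
char *date_conv_sketch(const struct tm *tp)
{
   static char buf[32];

   if (strftime(buf, sizeof buf, "%a %b %d '%y", tp) == 0)
      return NULL;
   return buf;
}
```

Because the result lives in a static buffer, each call overwrites the previous result, which matches the behavior a caller of an asctime-based routine would already expect.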


LOCALE strcoll EQUIVALENT

If the library strcoll function is not adequate for the needs of a locale, you can write and use your own routine to do the collation. You include this routine as part of the LC_COLLATE category of a locale to be called from the library strcoll function after setlocale has loaded the locale. Because it is your own routine, you can give it any legal name, as long as you are consistent in its use. (For instance, the following example uses the name loclcoll .)

The locale's routine can make use of any information available in the locale, such as the mode and collation tables. In addition, if the LC_COLLATE mode is not 0 , the collation tables coded as part of the locale are not restricted to any format as long as the locale's strxfrm and strcoll routines can understand them.

The locale's routine is invoked from the library's strcoll function with the equivalent of the following call:

/* library strcoll function                   */
   int strcoll(const char *str1, const char *str2)
   {
   .
   .
   .

      /* Return locale's strcoll value to library's */
      /* strcoll caller.                            */
   return loclcoll(str1, str2);
   }

The loclcoll function code appears as part of the LC_COLLATE category within the locale source code:

.
   .
   .
      /* collation tables, transformation tables,             */
      /* and other locale data                                */
   int loclcoll(const char *str1, const char *str2)

   /*   ARG    DCL              DESCRIPTION                   */
   /*                                                         */
   /*   str1   const char *    pointer to first input string  */
   /*                                                         */
   /*   str2   const char *    pointer to second input string */
   /*                                                         */
   /*  RETURNS:  <=== str1 < str2      a negative value       */
   /*                 str1 = str2      0                      */
   /*                 str1 > str2      a positive value       */
   /*                                                         */

   {
   .
   . /* locale's equivalent strcoll function code            */
   .
   return x;    /* Return a result, x.              */
   }
      .
      . /* more locale data, routines, and so on             */
      .

For an example of a locale routine for strcoll , see the L$CLDBEX source code member distributed with the compiler and library.
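
As a rough illustration of what such a routine might look like in single-byte mode, the following sketch compares two strings through a 256-entry weight table. Everything here is hypothetical: the names colltab_sketch, init_colltab_sketch, and loclcoll_sketch are invented for the example, and the table is filled at run time with uppercase equivalents only to keep the sketch short. A real locale would code its table as a constant array, as the "SAMP" listing does.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical collation table: one weight per 8-bit character.     */
static unsigned char colltab_sketch[256];

/* Fill the table so that letters collate without regard to case.    */
void init_colltab_sketch(void)
{
   int c;

   for (c = 0; c < 256; c++)
      colltab_sketch[c] = (unsigned char)toupper(c);
}

/* Hypothetical locale strcoll equivalent: compare by table weight.  */
/* Returns a negative value, 0, or a positive value.                 */
int loclcoll_sketch(const char *str1, const char *str2)
{
   const unsigned char *a = (const unsigned char *)str1;
   const unsigned char *b = (const unsigned char *)str2;

   while (*a != '\0' && *b != '\0') {
      int diff = (int)colltab_sketch[*a] - (int)colltab_sketch[*b];

      if (diff != 0)
         return diff;
      a++;
      b++;
   }
   /* One or both strings ended: the shorter string sorts first.     */
   return (*a != '\0') - (*b != '\0');
}
```

After init_colltab_sketch() has run, loclcoll_sketch("apple", "APPLE") compares equal, because both strings reduce to the same sequence of weights.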

LOCALE strxfrm EQUIVALENT

If the library strxfrm function is not adequate for the needs of a locale, you can write and use your own routine to do the transformation. You include this routine as part of the LC_COLLATE category of a locale to be called from the library strxfrm function after being loaded with an appropriate setlocale call. Because it is your own routine, you can give it any legal name, as long as you are consistent in its use. (For instance, the following example uses the name loclxfrm .)

The behavior required of the locale's strxfrm equivalent differs from that of the library version in one main way. Once the output buffer is full, the locale's routine stops scanning the input string and returns the number of characters placed in the output buffer rather than the total transformed length. It also places the number of characters consumed from the input string in the area addressed by an additional fourth parameter. The locale's routine can use any information available in the locale, such as the mode and collation tables. In addition, if the LC_COLLATE mode is not 0, the collation and transformation tables coded as part of the locale are not restricted to any format as long as the locale's strxfrm and strcoll routines can understand them.

The reason the behavior requirements of the library and locale strxfrm routines differ is to allow the strcoll function to call strxfrm with a limited buffer that might permit only partial transformation of the whole string. Theoretically, any number of output characters can be produced by strxfrm from any number of input characters.

The locale's strxfrm routine is invoked from within strxfrm with an equivalent of the following call. (The choice of "loclxfrm" is arbitrary. It could be any legal name as long as it is consistent.)

size_t strxfrm(char *str1, const char *str2, size_t n)
{
      size_t nchar_xfrmed, used;
   .
   .
   .
   nchar_xfrmed = loclxfrm(str1, str2, n, &used);
   .
   .
   .
}

The loclxfrm function code appears as part of the LC_COLLATE category within the locale source code:

.
. /* collation tables, transformation tables, and other   */
  /* locale data                                          */
.
size_t loclxfrm(char *str1, const char *str2,
                size_t n, size_t *used)

/* ARG    DCL            DESCRIPTION                      */
/*                                                        */
/* str1   char *         pointer to the transformed       */
/*                       string output array              */
/*                                                        */
/* str2   const char *   pointer to input string array    */
/*                                                        */
/* n      size_t         maximum number of bytes          */
/*                       (characters) written to str1     */
/*                       including the terminating null.  */
/*                       If n or more characters are      */
/*                       required for the transformed     */
/*                       string, only the first n are     */
/*                       written and the string is not    */
/*                       null-terminated.                 */
/*                                                        */
/* used   size_t *       pointer to size_t (unsigned int) */
/*                       value where the number of input  */
/*                       characters consumed from the     */
/*                       input string str2 is returned    */
/*                                                        */
/* RETURNS: <=== The number of characters placed in the   */
/*               output transformation array is returned. */
/*                                                        */
/*               Also, the number of characters consumed  */
/*               (scanned) from the input string str2 is  */
/*               placed in the size_t (unsigned int)      */
/*               value pointed to by used.                */
/*                                                        */
/*               The total number of characters required  */
/*               for the transformation is obtained by a  */
/*               special call:                            */
/*   total_loclxfrm_len = loclxfrm(0, str2, 0, &used)     */

{
.
. /* locale's strxfrm code                               */
}
.
.
. /* more locale data, routines, and so on               */
.

For an example of a locale routine for strxfrm , see the L$CLDBEX source code member distributed with the compiler and library.
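
The four-parameter interface described above can be sketched as follows. This is not the L$CLDBEX code. The names loclxfrm_sketch and chunked_coll_sketch are hypothetical, the transformation (one uppercase byte per input byte) is chosen only for brevity, and the sketch assumes that the returned count excludes any terminating null. The second function shows why partial transformation is useful: a comparison can proceed through a small fixed buffer one chunk at a time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical locale strxfrm equivalent.  Assumed contract:        */
/*  - at most n bytes are written to out; a null is added only if    */
/*    there is room for it;                                          */
/*  - the return value is the number of bytes placed in out,         */
/*    not counting any terminating null;                             */
/*  - *used receives the number of input bytes consumed;             */
/*  - with out == NULL and n == 0, nothing is written and the        */
/*    total transformed length is returned.                          */
size_t loclxfrm_sketch(char *out, const char *in, size_t n, size_t *used)
{
   size_t i = 0;

   if (out == NULL) {             /* length query                    */
      *used = strlen(in);
      return *used;
   }
   while (i < n && in[i] != '\0') {
      out[i] = (char)toupper((unsigned char)in[i]);
      i++;
   }
   if (i < n)
      out[i] = '\0';
   *used = i;
   return i;
}

/* Compare two strings by transforming them four bytes at a time.    */
/* The length test below relies on this sketch's one-to-one          */
/* transformation: a shorter chunk means the input string ended.     */
int chunked_coll_sketch(const char *s1, const char *s2)
{
   char b1[4], b2[4];
   size_t n1, n2, u1, u2, m;
   int diff;

   for (;;) {
      n1 = loclxfrm_sketch(b1, s1, sizeof b1, &u1);
      n2 = loclxfrm_sketch(b2, s2, sizeof b2, &u2);
      m = n1 < n2 ? n1 : n2;
      diff = memcmp(b1, b2, m);
      if (diff != 0)
         return diff;
      if (n1 != n2)
         return n1 < n2 ? -1 : 1;
      if (n1 == 0)
         return 0;                /* both strings are exhausted      */
      s1 += u1;
      s2 += u2;
   }
}
```

A locale whose transformation can expand or contract characters would need a more careful comparison loop than this one, since chunk lengths would no longer indicate where the input ended.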



Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.