mblen
mbstowcs
mbtowc
wcstombs
wctomb
There are two kinds of DBCS sequences, mixed and pure (in Standard terminology, multibyte and wide; this discussion uses DBCS terms). Mixed sequences may contain both single- and double-byte characters, while pure sequences contain only double-byte characters.
A mixed DBCS sequence must follow these rules:

- Double-byte characters are enclosed between the shift-out (SO) and shift-in (SI) control characters. The value for SO is \x0E and the value for SI is \x0F. For example, the following is a mixed DBCS string in hex:

  \x81\x82\x83\x0E\x41\x52\x0F\x81

  The \x41\x52 between the \x0E and \x0F is a double-byte character. The other characters are single-byte.
- In the single-byte (initial) shift state, each character is represented by 1 byte. For example, 'a' is \x81 in hex.
- In the double-byte state, each character is represented by 2 bytes. Double-byte characters must conform to the following constraints:
  - The first byte must be in the range \x41 through \xFE, except for the encoding of the blank space.
  - The second byte must be in the range \x41 through \xFE, except for the encoding of the blank space.
  - The blank space is encoded as \x40\x40.
- An SO immediately followed by an SI (\x0E\x0F) is not permitted. For example, the following sequence (in hex), which might be construed as a single multibyte character, is not valid:

  \x0E\x0F\x0E\x0F\x0E\x0F\x0E\x41\x81\x0F

  This restriction is imposed because, without it, the number of bytes used to represent a multibyte character would in theory be unbounded, whereas the Standard requires an implementation to define a maximum byte length for a multibyte character.

  On the other hand, consecutive SI/SO pairs (\x0F\x0E) are permitted because they may result from string concatenation. For example, the following sequence (in hex) is valid:

  \x0E\x41\x81\x0F\x0E\x41\x83\x0F
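The rules above can be checked mechanically by tracking the shift state while scanning the bytes. The following is a minimal sketch, not part of the SAS/C Library: mixed_count is a hypothetical helper that counts the characters in a null-terminated mixed sequence and returns -1 if the sequence breaks one of the rules. It also assumes, beyond what the rules state, that a valid sequence must end in the single-byte state.

```c
#include <assert.h>
#include <stddef.h>

#define SO 0x0E   /* shift-out: enter the double-byte state */
#define SI 0x0F   /* shift-in: return to the single-byte state */

/* True if b may appear in a (non-blank) double-byte character. */
static int dbcs_byte_ok(unsigned char b)
{
    return b >= 0x41 && b <= 0xFE;
}

/* Hypothetical helper: count the characters in a null-terminated
   mixed DBCS sequence. Returns -1 if the sequence breaks a rule. */
int mixed_count(const unsigned char *s)
{
    int count = 0;
    int dbcs = 0;       /* 0 = single-byte state, 1 = double-byte state */
    int since_so = 0;   /* double-byte characters seen since the last SO */

    assert(s != NULL);
    while (*s != '\0') {
        if (!dbcs) {
            if (*s == SO) {
                dbcs = 1;
                since_so = 0;
            } else if (*s == SI) {
                return -1;          /* SI without a matching SO */
            } else {
                count++;            /* ordinary single-byte character */
            }
            s++;
        } else if (*s == SI) {
            if (since_so == 0)
                return -1;          /* empty SO/SI pair (\x0E\x0F) */
            dbcs = 0;
            s++;
        } else {
            unsigned char b1 = s[0], b2 = s[1];
            int blank = (b1 == 0x40 && b2 == 0x40);

            /* Also rejects a pair cut off by the null (b2 == 0). */
            if (!blank && !(dbcs_byte_ok(b1) && dbcs_byte_ok(b2)))
                return -1;
            count++;
            since_so++;
            s += 2;
        }
    }
    return dbcs ? -1 : count;       /* assumed: must end in single-byte state */
}
```

Applied to the examples above, the first string yields 5 characters, the sequence with empty SO/SI pairs yields -1, and the concatenated sequence yields 2.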
The type of a pure DBCS (wide) character, wchar_t, is implementation-defined as an integer type capable of representing all the codes for the largest character set in locales supported by the implementation. The SAS/C Library defines wchar_t in <stddef.h> as follows:
typedef unsigned short wchar_t;
A pure DBCS sequence is stored as an array of wchar_t elements. When a mixed character sequence contains characters that require only a single byte, these characters are converted to wchar_t, but their values are unchanged. For example, the mixed string "abc" is represented as follows:

\x81\x82\x83\x00

When converted to a pure DBCS sequence, the string becomes the following:

\x00\x81\x00\x82\x00\x83\x00\x00

Use the mbtowc function to convert one multibyte character to a double-byte character. Use the mbstowcs function to convert a sequence of multibyte characters to a double-byte sequence; this function assumes the sequence is terminated by the null character, \x00. You can also use regular string-handling functions with mixed DBCS sequences. For example, you can use strlen to determine the byte length of a sequence, as long as the sequence is null-terminated.
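To make the mixed-to-pure mapping concrete, the sketch below performs it by hand instead of calling mbstowcs. It is a hypothetical helper, mixed_to_pure, and it rests on one assumption the text does not state: that a double-byte character with bytes b1 b2 becomes the wide value (b1 << 8) | b2. It uses its own type name, dbcs_wchar (unsigned short, matching the SAS/C typedef), so that it compiles alongside any host wchar_t. In real programs, call mbstowcs with a DBCS locale in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the SAS/C wchar_t; a separate name avoids clashing with
   the host compiler's own wchar_t. */
typedef unsigned short dbcs_wchar;

#define SO 0x0E
#define SI 0x0F

/* Hypothetical helper: convert a null-terminated mixed DBCS string to
   pure form in "out" (n elements). Single-byte characters keep their
   value; a double-byte character b1 b2 is ASSUMED to become
   (b1 << 8) | b2. SO and SI are consumed, not copied. Returns the
   number of characters stored (excluding the terminating 0), or -1 if
   "out" is too small or a double-byte character is cut off. The other
   mixed-sequence rules are not checked. */
long mixed_to_pure(const unsigned char *s, dbcs_wchar *out, size_t n)
{
    size_t count = 0;
    int dbcs = 0;                      /* current shift state */

    assert(s != NULL && out != NULL);
    if (n == 0)
        return -1;
    while (*s != '\0') {
        if (*s == SO || *s == SI) {
            dbcs = (*s == SO);         /* shift change produces no output */
            s++;
            continue;
        }
        if (count + 1 >= n)            /* keep room for the terminating 0 */
            return -1;
        if (dbcs) {
            if (s[1] == '\0')
                return -1;             /* second byte missing */
            out[count++] = (dbcs_wchar)((s[0] << 8) | s[1]);
            s += 2;
        } else {
            out[count++] = *s++;       /* value unchanged, widened */
        }
    }
    out[count] = 0;
    return (long)count;
}
```

For the string "abc" (\x81\x82\x83), this stores 0x0081, 0x0082, 0x0083, 0x0000, matching the pure sequence shown above.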
When converting from pure to mixed, SO/SI pairs are added to the sequence as necessary. Use the wctomb function to convert one double-byte character to a multibyte character. Use the wcstombs function to convert a sequence of double-byte characters to a multibyte sequence; this function assumes the sequence is terminated by the null wide character, \x00\x00.
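The reverse direction can be sketched the same way: track the shift state and emit SO before each run of double-byte characters and SI after it. As in the previous direction, pure_to_mixed is a hypothetical helper, and the (b1 << 8) | b2 layout of double-byte values is an assumption, not something the text specifies; in practice wcstombs does this work.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short dbcs_wchar;   /* mirrors the SAS/C wchar_t */

/* Hypothetical helper: convert a 0-terminated pure DBCS sequence to
   mixed form in "out" (n bytes). Values above 0xFF are ASSUMED to be
   double-byte characters stored as (b1 << 8) | b2. SO (0x0E) is
   emitted before each run of them and SI (0x0F) after it. Returns the
   byte count excluding the null, or -1 if "out" is too small. */
long pure_to_mixed(const dbcs_wchar *in, unsigned char *out, size_t n)
{
    size_t len = 0;
    int dbcs = 0;                        /* current shift state */

    assert(in != NULL && out != NULL);
    if (n == 0)
        return -1;
    for (; *in != 0; in++) {
        int wide = (*in > 0xFF);
        size_t need = (size_t)(wide != dbcs) + (wide ? 2 : 1);

        /* Leave room for a closing SI and the terminating null. */
        if (len + need + 2 > n)
            return -1;
        if (wide != dbcs) {              /* add SO or SI as needed */
            out[len++] = wide ? 0x0E : 0x0F;
            dbcs = wide;
        }
        if (wide) {
            out[len++] = (unsigned char)(*in >> 8);
            out[len++] = (unsigned char)(*in & 0xFF);
        } else {
            out[len++] = (unsigned char)*in;
        }
    }
    if (dbcs)
        out[len++] = 0x0F;               /* end in the initial shift state */
    out[len] = '\0';
    return (long)len;
}
```

Two adjacent double-byte characters share one SO/SI pair, so the output never contains the empty \x0E\x0F pair that the mixed-sequence rules forbid.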
CRABDBCS bit in
CRABFLGM in your start-up
routine or in L$UMAIN.
Mixed DBCS sequences are supported in the format strings of printf, sprintf, scanf, sscanf, and strftime, as required by the Standard. Recognition of a mixed sequence within a format string requires that a double-byte locale such as "DBCS" be in effect.
Mixed sequences in a format string are treated like any other character sequence: they are copied unchanged to output or matched on scanf input. The one exception is that an invalid sequence may cause the function to terminate prematurely. The % conversion specifier, and the specifications associated with it that are embedded within the format string, are recognized only in single-byte mode, which is the initial shift state at the beginning of the format string.

The "S370" and "POSIX" locales do not support DBCS sequences. The default locale, "", may or may not support DBCS sequences, depending on the values of locale-related environment variables. Of the three locales supplied by the SAS/C Library, "DBCS" and "DBEX" support DBCS sequences, while "SAMP" does not.
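The examples in this section call setlocale(LC_ALL, "dbcs") without checking the result. Because setlocale returns NULL when the requested locale is not available, a program can detect that case and fall back to a locale in which SO and SI are ordinary control characters. A minimal sketch; select_locale is a hypothetical helper name.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to make "preferred" the current locale for
   all categories. setlocale returns NULL when a locale is not
   available; in that case fall back to the "C" locale.
   Returns 1 if "preferred" is now in effect, 0 if "C" is. */
int select_locale(const char *preferred)
{
    assert(preferred != NULL);
    if (setlocale(LC_ALL, preferred) != NULL)
        return 1;
    setlocale(LC_ALL, "C");   /* "C" is always available */
    return 0;
}
```

A caller might write `if (!select_locale("DBCS"))` and then treat its input as single-byte data rather than aborting.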
The macro MB_CUR_MAX, defined in <stdlib.h>, gives the longest sequence of bytes needed to represent a single multibyte character in the current locale. The macro MB_LEN_MAX, defined in <limits.h>, is not locale-dependent: it gives the longest multibyte character permitted across all supported locales.
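One practical consequence of this difference: MB_LEN_MAX is a constant expression, so it can size an array at compile time, while MB_CUR_MAX is evaluated at run time and may change when the locale changes. The sketch below uses an MB_LEN_MAX buffer to hold one converted character; wchar_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: return how many bytes the multibyte form of wc
   needs in the current locale, counting any shift bytes needed from the
   initial shift state, or -1 if wc has no multibyte form. */
int wchar_bytes(wchar_t wc)
{
    char buf[MB_LEN_MAX];      /* constant: valid as an array bound */

    /* MB_CUR_MAX is a run-time value that never exceeds MB_LEN_MAX. */
    assert(MB_CUR_MAX <= (size_t)MB_LEN_MAX);
    wctomb(NULL, 0);           /* start from the initial shift state */
    return wctomb(buf, wc);
}
```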

#include <stdlib.h>
int mblen(const char *s, size_t n);
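mblen determines how many bytes are needed to represent the multibyte character pointed to by s. n specifies the maximum number of bytes of the multibyte character sequence to examine.

If s is not NULL, the return value is as follows:
- 0 if s points to the null character.
- The number of bytes in the multibyte character, if the next n or fewer bytes constitute a valid multibyte character.
- -1 if the next n or fewer bytes do not constitute a valid multibyte character.

If s is NULL, the return value is 0 if multibyte characters do not have state-dependent encodings in the current locale, and nonzero if they do. A call with a NULL s also resets mblen to the initial shift state.

A diagnostic is not issued if mblen encounters invalid data; a return value of -1 is the only indication of an error.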
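The example that follows passes MB_LEN_MAX as n, which allows mblen to look up to MB_LEN_MAX bytes ahead. When the length of the buffer is known, a variant can pass the number of bytes remaining instead, so that mblen never examines bytes past the end of the buffer, and can report an invalid sequence to the caller instead of aborting. A minimal sketch; mb_count_n is a hypothetical name, and it works in whatever locale is current (it does not call setlocale).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count multibyte characters in the first len
   bytes of s, stopping early at a null character. Returns -1 if a
   character is invalid or is cut off by the end of the buffer. */
long mb_count_n(const char *s, size_t len)
{
    long count = 0;
    int charlen;

    assert(s != NULL);
    mblen(NULL, 0);                  /* reset to the initial shift state */
    while (len > 0) {
        /* Never let mblen look beyond the bytes that remain. */
        charlen = mblen(s, len);
        if (charlen == 0)            /* reached the null character */
            break;
        if (charlen < 0)             /* invalid or incomplete character */
            return -1;
        s += charlen;
        len -= (size_t)charlen;
        count++;
    }
    return count;
}
```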
/* This example counts multibyte characters (not including  */
/* terminating null) in a DBCS mixed string using mblen().  */
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

/* "strptr" points to the beginning of a DBCS MIXED string. */
/* RETURNS: number of multibyte characters                  */
int count1(char *strptr)
{
    int i = 0;      /* number of multibyte characters found */
    int charlen;    /* byte length of current character     */

    /* Inform library that we will be accepting a DBCS string. */
    /* That is, SO and SI are not regular control characters:  */
    /* they indicate a change in shift state.                  */
    setlocale(LC_ALL, "dbcs");

    /* Reset to initial shift state. (A valid mixed string     */
    /* must begin in initial shift state.)                     */
    mblen(NULL, 0);

    /* One loop iteration per character. Advance "strptr" by   */
    /* number of bytes consumed; mblen returns 0 at the null.  */
    while ((charlen = mblen(strptr, MB_LEN_MAX)) != 0) {
        if (charlen < 0) {
            fputs("Invalid MIXED DBCS string\n", stderr);
            fclose(stderr);   /* flush first: abort does not return */
            abort();
        }
        strptr += charlen;
        i++;
    }
    return i;
}

#include <stdlib.h>
size_t mbstowcs(wchar_t *pwcs, const char *s, size_t n);
mbstowcs converts a sequence of multibyte characters (a mixed DBCS sequence) pointed to by s into a sequence of corresponding wide characters (a pure DBCS sequence) and stores the output sequence in the array pointed to by pwcs. The multibyte character sequence is assumed to begin in the initial shift state. n specifies the maximum number of wide characters to be stored.

mbstowcs returns the number of elements of pwcs that were modified, excluding the terminating 0 code, if any. If the return value equals n, no terminating 0 code was stored. If the sequence of multibyte characters is invalid, mbstowcs returns (size_t)-1.
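Each multibyte character is converted as if by a call to mbtowc, except that the shift state of mbtowc is not affected.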
If copying takes place between objects that overlap, the behavior of
mbstowcs is undefined.
A diagnostic is not issued if mbstowcs encounters invalid data; a
return value of -1 is the only indication of an error.
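Because mbstowcs reports an invalid sequence as (size_t)-1 and a full output array by returning n (in which case no terminating 0 code is stored), a caller should check for both before using the result. A minimal sketch of those checks; to_wide and its return codes are hypothetical, not part of the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert the mixed string s into out (n wide
   characters). Returns 0 on success, -1 if s contains an invalid
   multibyte sequence, or -2 if out was too small to hold the
   terminating 0 code. */
int to_wide(wchar_t *out, size_t n, const char *s)
{
    size_t count;

    assert(out != NULL && s != NULL && n > 0);
    count = mbstowcs(out, s, n);
    if (count == (size_t)-1)
        return -1;        /* invalid multibyte sequence */
    if (count == n)
        return -2;        /* all n elements used: not terminated */
    return 0;
}
```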
The following example uses mbstowcs and wcstombs to replace one character with another in a mixed DBCS string.
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

#define MAX_CHARACTERS 81

/* "old_string" is the input MIXED DBCS string. "new_string"   */
/* is the output MIXED DBCS string (at least MAX_CHARACTERS    */
/* bytes). "old_wchar" is the wide (pure DBCS) value of the    */
/* character to be replaced. "new_wchar" is the wide value of  */
/* the character to replace it with.                           */
void mbsrepl(char *old_string, char *new_string,
             wchar_t old_wchar, wchar_t new_wchar)
{
    wchar_t work[MAX_CHARACTERS];
    size_t nchars;
    size_t i;

    /* Inform library that we will be accepting a DBCS string. */
    /* That is, SO and SI are not regular control characters:  */
    /* they indicate a change in shift state.                  */
    setlocale(LC_ALL, "dbcs");

    nchars = mbstowcs(work, old_string, MAX_CHARACTERS);
    if (nchars == (size_t)-1) {
        fputs("Invalid DBCS string.\n", stderr);
        fclose(stderr);
        abort();
    }
    /* If every element was filled, no terminating 0 was stored. */
    if (nchars == MAX_CHARACTERS) {
        fputs("Input string too large.\n", stderr);
        fclose(stderr);
        abort();
    }

    /* Perform the actual substitution. */
    for (i = 0; i < nchars; i++)
        if (work[i] == old_wchar)
            work[i] = new_wchar;

    /* Convert back to MIXED format. */
    nchars = wcstombs(new_string, work, MAX_CHARACTERS);
    if (nchars == (size_t)-1) {
        fputs("Invalid replacement character.\n", stderr);
        fclose(stderr);
        abort();
    }
    /* If every byte was filled, the replacement caused the     */
    /* string to overflow (no terminating null was stored).     */
    if (nchars == MAX_CHARACTERS) {
        fputs("Replacement string too large.\n", stderr);
        fclose(stderr);
        abort();
    }
}

#include <stdlib.h>
int mbtowc(wchar_t *pwc, const char *s, size_t n);
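mbtowc determines how many bytes are needed to represent the multibyte character pointed to by s. If s is not NULL and the character is valid, mbtowc also stores the corresponding wide character in the object pointed to by pwc, provided pwc is not NULL. n specifies the maximum number of bytes to examine in the array pointed to by s.

If s is not NULL, the return value is as follows:
- 0 if s points to the null character.
- The number of bytes in the multibyte character, if the next n or fewer bytes constitute a valid multibyte character.
- -1 if the next n or fewer bytes do not constitute a valid multibyte character.

If s is NULL, the return value is 0 if multibyte characters do not have state-dependent encodings in the current locale, and nonzero if they do. A call with a NULL s also resets mbtowc to the initial shift state.

A diagnostic is not issued if mbtowc encounters invalid data; a return value of -1 is the only indication of an error.

The following example uses mbtowc to search a mixed DBCS string for a given character.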
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

/* "begstr" points to the beginning of a DBCS MIXED string.  */
/* "mbc_sought" is the character value we're looking for.    */
/* RETURNS: 1 if the character is found, 0 if it is not.     */
int mbfind(char *begstr, wchar_t mbc_sought)
{
    int mbclen;     /* length (in bytes) of current character */
    wchar_t mbc;    /* value of current character             */
    char *strptr;   /* pointer to current location in string  */

    strptr = begstr;

    /* Inform library that we will be accepting a DBCS string. */
    /* That is, SO and SI are not regular control characters:  */
    /* they indicate a change in shift state.                  */
    setlocale(LC_ALL, "dbcs");

    /* Reset to initial shift state. (A valid mixed string     */
    /* must begin in initial shift state.)                     */
    mbtowc((wchar_t *)NULL, NULL, 0);

    /* One loop iteration per character. Advance "strptr" by   */
    /* number of bytes consumed; mbtowc returns 0 at the null. */
    while ((mbclen = mbtowc(&mbc, strptr, MB_LEN_MAX)) != 0) {
        if (mbclen < 0) {
            fputs("Invalid MIXED DBCS string\n", stderr);
            abort();
        }
        if (mbc == mbc_sought)
            break;
        strptr += mbclen;
    }

    /* Last character was not '\0' -- must have found it. */
    if (mbclen) {
        printf("MBFIND: found at byte offset %ld\n",
               (long)(strptr - begstr));
        return 1;
    }
    else {
        puts("MBFIND: character not found");   /* puts adds '\n' */
        return 0;
    }
}

#include <stdlib.h>
size_t wcstombs(char *s, const wchar_t *pwcs, size_t n);
wcstombs converts a sequence of wide characters (pure DBCS
sequence) to a sequence of multibyte characters (mixed DBCS sequence).
The wide characters are in the array pointed to by pwcs, and the
resulting multibyte characters are stored in the array pointed to by
s. The resulting multibyte character sequence begins in the
initial shift state.
n specifies the maximum number of bytes to be filled with
multibyte characters. The conversion stops if a multibyte character
would exceed the limit of n bytes or if a null character is
stored.
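wcstombs returns the number of bytes of s that were modified, excluding the terminating 0 byte, if any. If the return value equals n, no terminating 0 byte was stored. If a wide character that does not correspond to a valid multibyte character is encountered, wcstombs returns (size_t)-1.
If copying takes place between objects that overlap, the behavior of wcstombs is undefined.
A diagnostic is not issued if wcstombs encounters invalid data; a return value of -1 is the only indication of an error.
See the example for mbstowcs, which also uses wcstombs.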
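When the output buffer is allocated dynamically, MB_CUR_MAX gives a safe size. Each wide character needs at most MB_CUR_MAX bytes, and converting the terminating null (together with any SI needed to return to the initial shift state) needs at most MB_CUR_MAX more, so (length + 1) * MB_CUR_MAX bytes suffice. A minimal sketch under that reasoning; to_multibyte is a hypothetical helper, and it does not guard against overflow in the size computation for extremely long strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert ws to a newly allocated multibyte
   string. Returns NULL if ws contains a wide character with no
   multibyte form or if memory cannot be allocated. The caller must
   free the result. */
char *to_multibyte(const wchar_t *ws)
{
    size_t size, count;
    char *buf;

    assert(ws != NULL);
    /* At most MB_CUR_MAX bytes per character, plus MB_CUR_MAX for */
    /* the terminator and any final shift back to initial state.    */
    size = (wcslen(ws) + 1) * MB_CUR_MAX;
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    count = wcstombs(buf, ws, size);
    if (count == (size_t)-1 || count == size) {  /* invalid or unterminated */
        free(buf);
        return NULL;
    }
    return buf;
}
```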

#include <stdlib.h>
int wctomb(char *s, wchar_t wchar);
wctomb determines how many bytes are needed to represent the
multibyte character corresponding to the wide (pure DBCS) character
whose value is wchar, including any change in shift state. It
stores the multibyte character representation in the array pointed to
by s, assuming s is not NULL. If the value of
wchar is 0, wctomb is left in the initial shift state.
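If s is not NULL, the return value is the number of bytes that make up the multibyte character corresponding to the value of wchar, or -1 if wchar does not correspond to a valid multibyte character.
If s is NULL, the return value is 0 if multibyte characters do not have state-dependent encodings in the current locale, and nonzero if they do. A call with a NULL s also resets wctomb to the initial shift state.
In no case is the return value greater than the value of the MB_CUR_MAX macro.
A diagnostic is not issued if wctomb encounters invalid data; a return value of -1 is the only indication of an error.
The following example uses wctomb to convert a line of pure DBCS characters to a mixed DBCS string.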
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

/* "pure_string" is the input PURE DBCS string; conversion   */
/* stops after a newline (L'\n') or at a null wide character.*/
/* "mixed_string" is the output MIXED DBCS string.           */
void mbline(wchar_t *pure_string, char *mixed_string)
{
    int i;
    int mbclen;
    wchar_t wc;

    /* Inform library that we will be accepting a DBCS string. */
    /* That is, SO and SI are not regular control characters:  */
    /* they indicate a change in shift state.                  */
    setlocale(LC_ALL, "dbcs");
    wctomb(NULL, 0);    /* Reset to initial shift state. */

    /* One loop iteration per character. Advance "mixed_string" */
    /* by number of bytes in character.                          */
    i = 0;
    do {
        wc = pure_string[i++];
        if (wc == L'\0')
            break;      /* end of string reached before a newline */
        mbclen = wctomb(mixed_string, wc);
        if (mbclen < 0) {
            puts("Invalid PURE DBCS string.");
            fclose(stdout);
            abort();
        }
        mixed_string += mbclen;
    } while (wc != L'\n');

    /* Converting the null wide character stores any SI needed  */
    /* to return to the initial shift state, followed by '\0'.  */
    wctomb(mixed_string, L'\0');
}
Copyright (c) 1998 SAS Institute Inc. Cary, NC, USA. All rights reserved.