
Accessing Shared Executable Libraries from SAS

Special Considerations When Using Shared Libraries


32-Bit and 64-Bit Considerations


Compatibility between Your Shared Libraries and SAS

Starting in SAS 9, SAS is a 64-bit application that runs on all supported UNIX environments that are 64-bit enabled. The only exception is the Linux version of SAS, which is a 32-bit application. When you call an external routine in a shared library, the shared library needs to be compatible with SAS.

For example, suppose that you are running a 64-bit version of SAS on Solaris and you need to call a routine that is located in libc.so. For this shared library to be compatible with SAS, it must be a 64-bit shared library.

To determine whether a vendor-supplied library is 32-bit or 64-bit, you can use the file command. The following output shows the results of using this command on Solaris, first for a 32-bit library and then for a 64-bit library:

$ file libc.so
libc.so:  ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, 
not stripped

$ file ./libc.so
./libc.so:  ELF 64-bit MSB dynamic lib SPARCV9 Version 1, dynamically linked,
not stripped
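
The same check can be sketched in C. The following is an illustration only, not something that SAS provides: it inspects the identification bytes at the start of an ELF file, where the fifth byte (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit object. The function name elf_bits is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 32 or 64 for an ELF header, or 0 if the bytes are not ELF.
   elf_bits is a hypothetical helper, not a SAS or system routine. */
int elf_bits(const unsigned char *ident, size_t len)
{
   static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

   if (len < 5 || memcmp(ident, magic, 4) != 0)
      return 0;                 /* too short, or not an ELF file */
   if (ident[4] == 1)
      return 32;                /* ELFCLASS32 */
   if (ident[4] == 2)
      return 64;                /* ELFCLASS64 */
   return 0;                    /* unrecognized class */
}
```

To use it, read the first five bytes of the library with fread and pass them to elf_bits. A result that does not match the SAS executable suggests that the library cannot be loaded.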


Memory Storage Allocated by the Shared Library

When specifying your SAS format and informat for each routine argument in the FORMAT attribute of the ARG statement, you need to consider the amount of memory the shared library allocates for the parameters that it receives and returns. To determine how much storage is reserved for the input and return parameters of the routine in the external shared library, you can use the C sizeof operator.

The following table lists the typical memory allocations for C data types for 32-bit and 64-bit systems:

Memory Allocations for C Data Types
Type       32-Bit System Size (Bytes)  32-Bit System Size (Bits)   64-Bit System Size (Bytes)  64-Bit System Size (Bits)
char       1                           8                           1                           8
short      2                           16                          2                           16
int        4                           32                          4                           32
long       4                           32                          8                           64
long long  8                           64                          8                           64
float      4                           32                          4                           32
double     8                           64                          8                           64
pointer    4                           32                          8                           64
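
Because these sizes are only typical, it can be worth confirming them with the compiler and options that were used to build the shared library. The following is a minimal sketch; the function names print_sizes and pointer_bits are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Width in bits of a pointer in the current compilation mode. */
int pointer_bits(void)
{
   return (int)(sizeof(void *) * CHAR_BIT);
}

/* Print the number of bytes that this compiler reserves for each type. */
void print_sizes(void)
{
   printf("char       %zu\n", sizeof(char));
   printf("short      %zu\n", sizeof(short));
   printf("int        %zu\n", sizeof(int));
   printf("long       %zu\n", sizeof(long));
   printf("long long  %zu\n", sizeof(long long));
   printf("float      %zu\n", sizeof(float));
   printf("double     %zu\n", sizeof(double));
   printf("pointer    %zu\n", sizeof(void *));
}
```

Compile the sketch with the same 32-bit or 64-bit setting as the shared library, and call print_sizes from a small main function. The values for long and pointer are the ones most likely to differ between the two modes.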

For information about the SAS formats to use for your data types, see Specifying Formats and Informats to Use with MODULE Arguments.


Naming Considerations When Using Shared Libraries

SAS loads external shared libraries whose names meet the following naming constraints: the name is eight characters or fewer, and the name does not contain a period.

If the name of your external shared library is greater than eight characters or contains a period, then you can create a symbolic link to point to the destination of the shared library. Once the link is created, you can add the name of the symbolic link to the MODULE statement in the SASCBTBL attribute table. When you are ready to execute your SAS program, use the PATH system option to point to the directory that contains the symbolic link.
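
The naming rule is simple enough to check before you decide whether a symbolic link is needed. The following C sketch assumes only the two constraints stated in this section; the function name name_is_loadable is hypothetical.

```c
#include <string.h>

/* Return 1 if name is eight characters or fewer and contains no
   period, otherwise 0. name_is_loadable is a hypothetical helper. */
int name_is_loadable(const char *name)
{
   size_t len = strlen(name);

   return len > 0 && len <= 8 && strchr(name, '.') == NULL;
}
```

For example, name_is_loadable("libc.sl") returns 0 because of the period, while a link name such as libclnk passes the check.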


Example of Creating a Symbolic Link

The Hewlett-Packard shared library libc.sl that is installed in the /usr/lib/pa20_64 directory contains a period in the name. Before SAS will load this shared library, you need to create a symbolic link that meets the naming convention of eight characters or less and no period. The symbolic link shown in the following example points to the target location of libc.sl:

$ ln -s /usr/lib/pa20_64/libc.sl /tmp/libclnk

After the symbolic link is created, you can then update the MODULE= option in the SASCBTBL attribute table, as shown in the following code:

routine name minarg=2 maxarg=2 returns=short module=libclnk;
arg 1 char output byaddr fdstart format=$cstr9.;
arg 2 char output format=$cstr9.;

To load the shared library during your invocation of SAS, type the following command:

/usr/local/sasv91/sas -path /tmp module.sas


Using PEEKLONG Functions to Access Character String Arguments

Because the SAS language does not provide pointers as data types, you can use the SAS PEEKLONG functions to access the data stored at these address values.

For example, the following program demonstrates how the address of a pointer is supplied and how it can set the pointer to the address of a static table containing the contiguous integers 1, 2, and 3. It also calls the useptr routine in the useptr shared library on a 64-bit operating system.

static struct MYTABLE {
   int value1;
   int value2;
   int value3;
} mytable = {1,2,3};

/* Set the caller's pointer to the address of the static table */
void useptr(char **toset)
{
   *toset = (char *)&mytable;
}

The following is the SASCBTBL attribute table entry:

routine useptr minarg=1 maxarg=1;
arg 1 char update format=$char20.;

The following is the SAS code:

data _null_;
   length ptrval $20 thedata $12;
   call module('*i','useptr',ptrval);
   thedata=peekclong(ptrval,12);

   /* Converts hexadecimal data to character data */
   put thedata=$hex24.;

   /* Converts hexadecimal positive binary values to fixed or floating point value */
   put ptrval=hex40.;
run;

SAS writes the following output to the log.

Log Output for Accessing Character Strings with the PEEKCLONG Function

thedata=000000010000000200000003 ptrval=800003FFFF0C

In this example, the PEEKCLONG function is given two arguments, a pointer via a numeric variable and a length in bytes. PEEKCLONG returns a character string of the specified length containing the characters at the pointer location.
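
To see what the 12 bytes contain without involving SAS, the same memory can be read in C. The following sketch is a rough analogue only: peek_bytes and to_hex are hypothetical names that imitate the roles of PEEKCLONG and the $HEX format, and the sketch assumes that int occupies 4 bytes.

```c
#include <stdio.h>
#include <string.h>

static struct MYTABLE {
   int value1;
   int value2;
   int value3;
} mytable = {1, 2, 3};

/* Same role as the useptr routine: hand back the table's address. */
void useptr(char **toset)
{
   *toset = (char *)&mytable;
}

/* Rough analogue of PEEKCLONG: copy len bytes starting at addr. */
void peek_bytes(const char *addr, size_t len, unsigned char *out)
{
   memcpy(out, addr, len);
}

/* Write len bytes as uppercase hexadecimal text.
   hex must have room for 2 * len + 1 characters. */
void to_hex(const unsigned char *bytes, size_t len, char *hex)
{
   size_t i;

   hex[0] = '\0';
   for (i = 0; i < len; i++)
      sprintf(hex + 2 * i, "%02X", bytes[i]);
}
```

On a big-endian host the hexadecimal text matches the log above (000000010000000200000003). On a little-endian host the bytes of each integer appear in reverse order.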

For more information about the PEEKLONG functions, see PEEKLONG Function: UNIX.


Accessing Shared Libraries Efficiently

The MODULE function reads the attribute table that is referenced by the SASCBTBL fileref once per step (DATA step, PROC IML step, or SCL step). It parses the table and stores the attribute information for future use during the step. When you use the MODULE function, SAS searches the stored attribute information for the matching routine and module names. The first time you access a shared library during a step, SAS loads the shared library and determines the address of the requested routine. Each shared library you invoke stays loaded for the duration of the step, and is not reloaded in subsequent calls. All modules and routines are unloaded at the end of the step.

In the following example, the attribute table has the following basic form:

* routines XYZ and BBB in FIRST.Shared Library;
routine XYZ minarg=1 maxarg=1 module=FIRST;
arg 1 num input;
routine BBB minarg=1 maxarg=1 module=FIRST;
arg 1 num input;
* routines ABC and DDD in SECOND.Shared Library;
routine ABC minarg=1 maxarg=1 module=SECOND;
arg 1 num input;
routine DDD minarg=1 maxarg=1 module=SECOND;
arg 1 num input;

The DATA step code looks like the following:

filename sascbtbl 'myattr.tbl';
data _null_;
   do i=1 to 50;
      /* FIRST.Shared Library is loaded only once */
      value = modulen('XYZ',i);
      /* SECOND.Shared Library is loaded only once */
      value2 = modulen('ABC',value);
      put i= value= value2=;
   end;
run;

In this example, MODULEN parses the attribute table during DATA step compilation. In the first loop iteration (i=1), FIRST.Shared Library is loaded and the XYZ routine is accessed when MODULEN calls for it. Next, SECOND.Shared Library is loaded and the ABC routine is accessed. For subsequent loop iterations (starting when i=2), FIRST.Shared Library and SECOND.Shared Library remain loaded, so the MODULEN function simply accesses the XYZ and ABC routines. SAS unloads both shared libraries at the end of the DATA step.

Note that the attribute table can contain any number of descriptions for routines that are not accessed for a given step. The presence of the attribute table does not cause any additional overhead (apart from a few bytes of internal memory to hold the attribute descriptions). In the above example, BBB and DDD are in the attribute table but are not accessed by the DATA step.
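
The load-once behavior described in this section can be pictured as a small cache that is keyed by module name. The following C sketch is a simulation under stated assumptions, not the SAS implementation: the load is represented by a counter rather than by a call to the system's dynamic loader, and the names access_module, end_of_step, and total_loads are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_MODULES 8

static char loaded[MAX_MODULES][9];  /* names of modules loaded in this step */
static int  n_loaded = 0;
int total_loads = 0;                 /* number of simulated loads so far */

/* Return the slot for name, loading the module only on first access.
   Returns -1 if the table is full. */
int access_module(const char *name)
{
   int i;

   for (i = 0; i < n_loaded; i++)
      if (strcmp(loaded[i], name) == 0)
         return i;                   /* already loaded: reuse it */
   if (n_loaded == MAX_MODULES)
      return -1;
   snprintf(loaded[n_loaded], sizeof loaded[n_loaded], "%s", name);
   total_loads++;                    /* first access: load once */
   return n_loaded++;
}

/* End of step: forget every module, so the next step loads again. */
void end_of_step(void)
{
   n_loaded = 0;
}
```

Calling access_module("FIRST") and access_module("SECOND") in a loop of 50 iterations leaves total_loads at 2, which mirrors the DATA step example above.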


Grouping SAS Variables as Structure Arguments

A common need when calling external routines is to pass a pointer to a structure. Some parts of the structure might be used as input to the routine, while other parts might be replaced or filled in by the routine. Even though SAS does not have structures in its language, you can indicate to the MODULE function that you want a particular set of arguments grouped into a single structure. You indicate this grouping by using the FDSTART option of the ARG statement to flag the argument that begins the structure in the attribute table. SAS gathers that argument and all the arguments that follow (until encountering another FDSTART option) into a single contiguous block, and passes a pointer to the block as an argument to the shared library routine.
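
The gathering step can be sketched in C as copying each argument, back to back, into one buffer and then passing the buffer's address. This is an illustration of the idea only; gather_fields is a hypothetical name, and the sketch inserts no padding, so it matches a C structure only when the compiler also inserts none between the members.

```c
#include <stddef.h>
#include <string.h>

/* Copy n fields, each widths[i] bytes long, back to back into block.
   Each fields[i] must point to at least widths[i] bytes.
   Returns the total number of bytes written. */
size_t gather_fields(const char *const fields[], const size_t widths[],
                     size_t n, char *block)
{
   size_t i, offset = 0;

   for (i = 0; i < n; i++) {
      memcpy(block + offset, fields[i], widths[i]);
      offset += widths[i];           /* next field starts right after */
   }
   return offset;
}
```

A structure that consists only of character arrays has no padding between its members, so this kind of packing lines up with it directly.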


Example: Grouping Your System Information as Structure Arguments

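This example uses the uname routine, which is part of the /usr/lib/pa20_64/libc.sl shared library in the HP-UX operating environment. This routine returns information about your computer system, including the system name, node name, operating system release and version, machine type, and an identification number.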

The following is the C prototype for this routine:

int uname(struct utsname *name);

In C, the utsname structure is defined with the following members:

#define UTSLEN 9
#define SNLEN 15

char sysname[UTSLEN];
char nodename[UTSLEN];
char release[UTSLEN];
char version[UTSLEN];
char machine[UTSLEN];
char idnumber[SNLEN];

Each of the above structure members is a null-terminated string.

To call this routine using the MODULE function, you use the following attribute table entries:

* attribute table entry;
routine uname minarg=6 maxarg=6 returns=short module=libc;
arg 1 char output byaddr fdstart format=$cstr9.;
arg 2 char output                format=$cstr9.;
arg 3 char output                format=$cstr9.;
arg 4 char output                format=$cstr9.;
arg 5 char output                format=$cstr9.;
arg 6 char output                format=$cstr15.;

The following example shows the SAS source code to call the uname routine from within the DATA step:

x 'if [ ! -L ./libc ]; then ln -s /usr/lib/pa20_64/libc.sl ./libc ; fi' ;
x 'setenv LD_LIBRARY_PATH .:/usr/lib:/lib:/usr/lib/pa20_64';

data _null_;
   length sysname $9 nodename $9 release $9 version $9 machine $9 idnumber $15;
   retain sysname nodename release version machine idnumber " ";
   rc=modulen('uname', sysname, nodename, release, version, machine, idnumber);
   put rc = ;
   put sysname = ;
   put nodename = ;
   put release  = ;
   put version  = ;
   put machine  = ;
   put idnumber = ;
run;

SAS writes the following output to the log:

Grouping SAS Variables as a Structure

rc=0
sysname=HP-UX
nodename=garage
release=B.11.00
version=u
machine=9000/800
idnumber=103901537
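
For comparison, the same routine can be called directly from C. The following sketch uses the POSIX header sys/utsname.h and prints only the members that POSIX defines; the idnumber member shown above is not part of POSIX, so it is omitted. The function name show_uname is hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Call uname() and print the POSIX-defined members.
   Returns the value returned by uname(): non-negative on success,
   -1 on failure. */
int show_uname(void)
{
   struct utsname u;
   int rc = uname(&u);

   if (rc >= 0) {
      printf("sysname=%s\n",  u.sysname);
      printf("nodename=%s\n", u.nodename);
      printf("release=%s\n",  u.release);
      printf("version=%s\n",  u.version);
      printf("machine=%s\n",  u.machine);
   }
   return rc;
}
```

Comparing this output with the SAS log is a quick way to confirm that the attribute table widths match the structure on your system.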

Using Constants and Expressions as Arguments to the MODULE Function

You can pass any kind of expression as an argument to the MODULE function. The attribute table indicates whether the argument is for input, output, or update.

You can specify input arguments as constants and arithmetic expressions. However, because output and update arguments must be able to be modified and returned, you can pass only a variable for them. If you specify a constant or expression where a value that can be updated is expected, SAS issues a warning message pointing out the error. Processing continues, but the MODULE function cannot perform the update (meaning that the value of the argument you wanted to update will be lost).

Consider these examples. Here is the attribute table:

* attribute table entry for ABC;
routine abc minarg=2 maxarg=2;
arg 1 input format=ib4.;
arg 2 output format=ib4.;

Here is the DATA step with the MODULE calls:

data _null_;
  x=5;
  /* passing a variable as the    */
  /*   second argument - OK       */
  call module('abc',1,x);

  /* passing a constant as the    */
  /*   second argument - INVALID  */
  call module('abc',1,2);

  /* passing an expression as the */
  /*   second argument - INVALID  */
  call module('abc',1,x+1);
run;

In the above example, the first call to MODULE is correct because x is updated by the value that the abc routine returns for the second argument. The second call to MODULE is not correct because a constant is passed. MODULE issues a warning indicating you have passed a constant, and passes a temporary area instead. The third call to MODULE is not correct because an arithmetic expression is passed, which causes a temporary location from the DATA step to be used, and the returned value to be lost.
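
The difference between the valid and invalid calls can be illustrated in C. In the following sketch the body of abc is invented for the illustration (the actual routine is not shown in this document), and call_with_variable and call_with_expression are hypothetical names.

```c
/* Stand-in for the abc routine: reads argument 1, writes argument 2.
   The computation is an assumption made for this illustration. */
void abc(int in, int *out)
{
   *out = in * 10;
}

/* Mimics passing the variable itself: the update is kept. */
int call_with_variable(int x)
{
   abc(1, &x);
   return x;
}

/* Mimics passing the expression x+1: the routine writes to a
   temporary, and the caller's variable never sees the result. */
int call_with_expression(int x)
{
   int temp = x + 1;   /* temporary location that holds the expression */
   abc(1, &temp);      /* the update lands in temp                     */
   return x;           /* x is unchanged                               */
}
```

With x set to 5, call_with_variable returns the updated value, while call_with_expression returns 5 because the result was written to the temporary.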


Specifying Formats and Informats to Use with MODULE Arguments

You specify the SAS format and informat for each shared library routine argument by specifying the FORMAT attribute in the ARG statement. The format indicates how numeric and character values should be passed to the shared library routine and how they should be read back upon completion of the routine.

Usually, the format you use corresponds to a variable type for a given programming language. The following sections describe the proper formats that correspond to different variable types in various programming languages.


C Language Formats

C Type          SAS Format/Informat for 32-Bit Systems   SAS Format/Informat for 64-Bit Systems
double          RB8.                                     RB8.
float           FLOAT4.                                  FLOAT4.
signed int      IB4.                                     IB4.
signed short    IB2.                                     IB2.
signed long     IB4.                                     IB8.
char *          IB4.                                     IB8.
unsigned int    PIB4.                                    PIB4.
unsigned short  PIB2.                                    PIB2.
unsigned long   PIB4.                                    PIB8.
char[w]         $CHARw. or $CSTRw. (see $CSTRw. Format)  $CHARw. or $CSTRw. (see $CSTRw. Format)

Note: For information about passing character data other than as pointers to character strings, see $BYVALw. Format.


FORTRAN Language Formats

FORTRAN Type  SAS Format/Informat
integer*2     IB2.
integer*4     IB4.
real*4        RB4.
real*8        RB8.
character*w   $CHARw.

The MODULE function can support FORTRAN character arguments only if they are not expected to be passed by a descriptor.


PL/I Language Formats

PL/I Type      SAS Format/Informat
FIXED BIN(15)  IB2.
FIXED BIN(31)  IB4.
FLOAT BIN(21)  RB4.
FLOAT BIN(31)  RB8.
CHARACTER(w)   $CHARw.

The PL/I descriptions are added here for completeness. These descriptions do not guarantee that you will be able to invoke PL/I routines.


COBOL Language Formats

COBOL Format       SAS Format/Informat  Description
PIC Sxxxx BINARY   IBw.                 integer binary
COMP-2             RB8.                 double-precision floating point
COMP-1             RB4.                 single-precision floating point
PIC xxxx or Sxxxx  Fw.                  printable numeric
PIC yyyy           $CHARw.              character

The following COBOL specifications might not match properly with the formats supplied by SAS because zoned and packed decimal are not truly defined for systems based on Intel architecture.

COBOL Format              SAS Format/Informat  Description
PIC Sxxxx DISPLAY         ZDw.                 zoned decimal
PIC Sxxxx PACKED-DECIMAL  PDw.                 packed decimal

The following COBOL specifications do not have true native equivalents and are usable only in conjunction with the corresponding S370Fxxx informat and format, which enables IBM mainframe-style representations to be read and written in the UNIX environment.

COBOL Format                              SAS Format/Informat  Description
PIC xxxx DISPLAY                          S370FZDUw.           zoned decimal unsigned
PIC Sxxxx DISPLAY SIGN LEADING            S370FZDLw.           zoned decimal leading sign
PIC Sxxxx DISPLAY SIGN LEADING SEPARATE   S370FZDSw.           zoned decimal leading sign separate
PIC Sxxxx DISPLAY SIGN TRAILING SEPARATE  S370FZDTw.           zoned decimal trailing sign separate
PIC xxxx BINARY                           S370FIBUw.           integer binary unsigned
PIC xxxx PACKED-DECIMAL                   S370FPDUw.           packed decimal unsigned


$CSTRw. Format

If you pass a character argument as a null-terminated string, use the $CSTRw. format. This format looks for the last non-blank character of your character argument and passes a copy of the string with a null terminator after the last non-blank character. For example, consider the following attribute table entry:

* attribute table entry;
routine abc minarg=1 maxarg=1;
arg 1 input char format=$cstr10.;

With this entry, you can use the following DATA step:

data _null_;
   call module('abc','my string');
run;

The $CSTR format adds a null terminator to the character string 'my string' before passing it to the abc routine. This is equivalent to adding the null terminator yourself and passing the string with the $CHAR format, as in the following attribute entry:

* attribute table entry;
routine abc minarg=1 maxarg=1;
arg 1 input char format=$char10.;

The entry would have the following DATA step:

data _null_;
   call module('abc','my string'||'00'x);
run;

The first example is easier to understand and easier to use when using variable or expression arguments.

The $CSTR informat converts a null-terminated string into a blank-padded string of the specified length. If the shared library routine is supposed to update a character argument, use the $CSTR informat in the argument attribute.
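
The two conversions described in this section can be sketched in C. This is an illustration of the described behavior, not the SAS implementation; cstr_out and cstr_in are hypothetical names for the format and informat directions.

```c
#include <stddef.h>
#include <string.h>

/* Format direction: blank-padded value of width w -> C string.
   out must have room for w + 1 bytes. */
void cstr_out(const char *sas, size_t w, char *out)
{
   size_t len = w;

   while (len > 0 && sas[len - 1] == ' ')
      len--;                    /* find the last non-blank character */
   memcpy(out, sas, len);
   out[len] = '\0';             /* terminator goes right after it */
}

/* Informat direction: C string -> blank-padded value of width w.
   Characters beyond w are dropped, and no terminator is stored. */
void cstr_in(const char *c, size_t w, char *sas)
{
   size_t len = strlen(c);

   if (len > w)
      len = w;
   memcpy(sas, c, len);
   memset(sas + len, ' ', w - len);
}
```

For a width of 10, cstr_out turns the blank-padded value 'my string ' into the nine-character C string "my string", and cstr_in turns "abc" into 'abc' followed by seven blanks.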


$BYVALw. Format

When you use a MODULE function to pass a single character by value, the argument is automatically promoted to an integer. If you want to use a character expression in the MODULE call, you must use the special format/informat called $BYVALw. The $BYVALw. format/informat expects a single character and will produce a numeric value, the size of which depends on w. $BYVAL2. produces a short, $BYVAL4. produces a long, and $BYVAL8. produces a double. Consider this example using the C language:

long xyz(a,b)
  long a; double b;
  {
  static char c = 'Y';
  if (a == 'X')
     return(1);
  else if (b == c)
     return(2);
  else return(3);
  } 

In this example, the xyz routine expects two arguments, a long and a double. If the long is an X, the actual value of the long is 88 in decimal. This result happens because an ASCII X is stored as hexadecimal 58, and this value is promoted to a long, represented as 0x00000058 (or 88 decimal). If the value of a is X, or 88, then 1 is returned. If the second argument, a double, is Y (which is interpreted as 89), then 2 is returned.

To pass characters as the arguments to xyz in the C language, you would invoke the routine as follows:

x = xyz('X',(double)'Z');
y = xyz('Q',(double)'Y');

The routine is invoked in this way because the X and Q values are automatically promoted to integers (which are the same as longs for the sake of this example), and the integer values corresponding to Z and Y are cast to doubles.

To call xyz using the MODULEN function, your attribute table must reflect the fact that you want to pass characters:

routine xyz minarg=2 maxarg=2 returns=long;
arg 1 input char byvalue format=$byval4.;
arg 2 input char byvalue format=$byval8.;

Note that it is important that the BYVALUE option appears in the ARG statement as well. Otherwise, MODULEN assumes that you want to pass a pointer to the routine, instead of a value.

Here is the DATA step that invokes MODULEN and passes it characters:

data _null_;
     x = modulen('xyz','X','Z');
     put x= ' (should be 1)';
     y = modulen('xyz','Q','Y');
     put y= ' (should be 2)';
run;
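
Before you involve SAS, the expected return values can be confirmed with a plain C version of the routine. The following sketch restates xyz with ANSI-style parameters and otherwise the same logic; it assumes an ASCII character set.

```c
/* The xyz routine from the example, with ANSI-style parameters. */
long xyz(long a, double b)
{
   static char c = 'Y';

   if (a == 'X')
      return 1;        /* first argument is the character X (88) */
   else if (b == c)
      return 2;        /* second argument is the character Y (89) */
   else
      return 3;
}
```

Calling xyz('X',(double)'Z') returns 1 and calling xyz('Q',(double)'Y') returns 2, which are the values that the DATA step above expects.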


Understanding MODULE Log Messages

If you specify i in the control string parameter to MODULE, SAS prints several informational messages to the log. You can use these messages to determine whether you have passed incorrect arguments or coded the attribute table incorrectly.

Consider this example that uses MODULEIN from within the IML procedure. It uses the MODULEIN function to invoke the changi routine (which is stored in a hypothetical TRYMOD.so). In the example, MODULEIN passes the constant 6 and the matrix x2, which is a 4x5 matrix to be converted to an integer matrix. The attribute table for changi is as follows:

routine changi module=trymod returns=long;
arg 1 input num format=ib4. byvalue;
arg 2 update num format=ib4.;

The following IML step invokes MODULEIN:

proc iml;
   x1 = J(4,5,0);
   do i=1 to 4;
      do j=1 to 5;
         x1[i,j] = i*10+j+3;
      end;
   end;
   y1 = x1;
   x2 = x1;
   y2 = y1;
   rc = modulein('*i','changi',6,x2);
   ....

The '*i' control string causes the lines shown in the following output to be printed in the log.

MODULEIN Log Output

---PARM LIST FOR MODULEIN ROUTINE---
CHR PARM 1 885E0AA8 2A69 (*i)
CHR PARM 2 885E0AD0 6368616E6769 (changi)
NUM PARM 3 885E0AE0 0000000000001840
NUM PARM 4 885E07F0
0000000000002C400000000000002E40000000000000304000000000000031400000000000003240
000000000000384000000000000039400000000000003A400000000000003B400000000000003C40
0000000000004140000000000080414000000000
---ROUTINE changi LOADED AT ADDRESS 886119B8 (PARMLIST AT 886033A0)---
PARM 1 06000000     <CALL-BY-VALUE>
PARM 2 88604720
0E0000000F00000010000000110000001200000018000000190000001A0000001B0000001C000000
22000000230000002400000025000000260000002C0000002D0000002E0000002F00000030000000
---VALUES UPON RETURN FROM changi ROUTINE---
PARM 1 06000000     <CALL-BY-VALUE>
PARM 2 88604720
140000001F0000002A0000003500000040000000820000008D00000098000000A3000000AE000000
F0000000FB00000006010000110100001C0100005E01000069010000740100007F0100008A010000
---VALUES UPON RETURN FROM MODULEIN ROUTINE---
NUM PARM 3 885E0AE0 0000000000001840
NUM PARM 4 885E07F0
00000000000034400000000000003F4000000000000045400000000000804A400000000000005040
00000000004060400000000000A06140000000000000634000000000006064400000000000C06540
0000000000006E400000000000606F4000000000

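The output is divided into four sections, each introduced by a header line: the parameter list passed to MODULEIN, the parameter list passed to the changi routine (along with the address at which the routine was loaded), the values upon return from changi, and the values upon return from MODULEIN.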
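
The hexadecimal values in the log can be decoded by hand or with a short program. The following C sketch assumes that the log was produced on a little-endian host, which is an inference from values such as 06000000 for the integer 6, and that floating-point values use the IEEE 754 double format. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read nbytes bytes of little-endian hexadecimal text into an integer. */
static uint64_t le_hex_to_u64(const char *hex, int nbytes)
{
   uint64_t value = 0;
   int i;

   for (i = 0; i < nbytes; i++) {
      unsigned int byte = 0;
      sscanf(hex + 2 * i, "%2x", &byte);
      value |= (uint64_t)byte << (8 * i);   /* first byte is lowest */
   }
   return value;
}

/* Decode 16 hex digits as an 8-byte floating-point value. */
double decode_double(const char *hex)
{
   uint64_t bits = le_hex_to_u64(hex, 8);
   double d;

   memcpy(&d, &bits, sizeof d);
   return d;
}

/* Decode 8 hex digits as a 4-byte signed integer (IB4.). */
int32_t decode_int32(const char *hex)
{
   return (int32_t)le_hex_to_u64(hex, 4);
}
```

Under these assumptions, 0000000000001840 decodes to the floating-point value 6 that MODULEIN received, and 06000000 decodes to the 4-byte integer 6 that was passed to changi. Likewise, the first matrix element is 14 before the call (0000000000002C40 and 0E000000) and 20 after it (14000000 and 0000000000003440).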
