Accessing External Shared Images from SAS
32-Bit and 64-Bit Considerations
Starting in SAS 9, SAS is a 64-bit application that runs on an operating system that is 64-bit enabled. When you call external routines in shared images using the MODULE functions, the shared image needs to be of the same bit family as the version of SAS you are running. If you are running SAS 9 or any SAS version released after SAS 9, then the shared image needs to be 64-bit. If you are using a SAS version before SAS 9, then the shared image needs to be 32-bit.
For information about how to compile and link to a 64-bit shared image, see Using PEEKLONG Functions to Access Character String Arguments.
Note: SAS does not support return types for 64-bit pointers.
When specifying your SAS format and informat for each routine argument in the FORMAT attribute of the ARG statement, you need to consider the amount of memory storage the external shared image allocates for the parameters that it receives and returns. The data types of the external routine determine the SAS format and informat to be used in the SASCBTBL attribute table. To determine how much storage is being reserved for the input and return parameters of the routine in the external shared image, you can use the C sizeof operator.
The following table lists the typical memory allocations for C data types for 32-bit and 64-bit systems.
Type | 32-Bit System Size (Bytes) | 32-Bit System Size (Bits) | 64-Bit System Size (Bytes) | 64-Bit System Size (Bits) |
---|---|---|---|---|
char | 1 | 8 | 1 | 8 |
short | 2 | 16 | 2 | 16 |
int | 4 | 32 | 4 | 32 |
long | 4 | 32 | 4 | 32 |
long long | 8 | 64 | 8 | 64 |
float | 4 | 32 | 4 | 32 |
double | 8 | 64 | 8 | 64 |
pointer | 4 | 32 | 8 | 64 |
For information about the SAS formats to use for your data types, see Specifying Formats and Informats to Use with MODULE Arguments.
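As a minimal sketch of the sizeof check described above, the following C functions report the storage that a compiler reserves for each type in the table. The function names (storage_bits, print_row, print_type_sizes) are hypothetical and are not part of SAS or of the C run-time library; the results depend on the compiler and the qualifiers used to build it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a storage size in bytes to a size in bits. */
size_t storage_bits(size_t bytes)
{
    return bytes * CHAR_BIT;
}

/* Print one row: type name, size in bytes, size in bits. */
void print_row(const char *type_name, size_t bytes)
{
    printf("%-10s %2lu bytes %3lu bits\n", type_name,
           (unsigned long)bytes, (unsigned long)storage_bits(bytes));
}

/* Report the storage that this compiler reserves for each type. */
void print_type_sizes(void)
{
    print_row("char", sizeof(char));
    print_row("short", sizeof(short));
    print_row("int", sizeof(int));
    print_row("long", sizeof(long));
    print_row("long long", sizeof(long long));
    print_row("float", sizeof(float));
    print_row("double", sizeof(double));
    print_row("pointer", sizeof(void *));
}
```

Calling print_type_sizes() from a small test program that is compiled with the same qualifiers as the shared image shows which sizes apply to your routine arguments.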
Naming Considerations When Using Shared Images
SAS automatically loads external shared images that conform to the following naming conventions:
If the name of your external shared image is longer than eight characters or contains a period, you can create a logical name that points to the location of the shared image. The following code shows how to define the logical name LIBCLNK:
$ define libclnk $1$disk:[tmp]libraryclink.exe
After the new logical name is created, you can update the MODULE= option of the ROUTINE statement in the attribute table. In the following example, MODULE= is set to the new logical name, LIBCLNK.
    routine name minarg=2 maxarg=2 returns=short module=libclnk;
    arg 1 char output byaddr fdstart format=$cstr9.;
    arg 2 char output format=$cstr9.;
Using PEEKLONG Functions to Access Character String Arguments
Because the SAS language does not provide pointers as data types, an external routine can return an address only as a raw value. You can use the SAS PEEKLONG functions to access the data stored at that address.
For example, the following C routine, useptr, receives the address of a pointer and sets that pointer to the address of a static table containing the contiguous integers 1, 2, and 3. The routine is built into the vmslib shared image on a 64-bit operating system and is called from the DATA step shown later in this section.
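    #include <stdarg.h>
    #include <stdio.h>

    /* Static table whose address is handed back to the caller */
    static struct MYTABLE {
       int value1;
       int value2;
       int value3;
    } mytable = {1,2,3};

    /* Sets the caller's pointer to the address of the static table */
    void useptr(char **toset)
    {
       *toset = (char *)&mytable;
    }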
The following code shows how the C source code is then compiled and linked using a DCL command file, such as VMSLIB.COM.
    $! Compiles are required to use /float=IEEE because SAS treats
    $! floats as IEEE S or D-Floats on OpenVMS.
    $ cc/decc/float=ieee/ieee=fast/obj=vmslib.o/arch=generic/name=(short,as_is)/pointer=64=argv vmslib.c
    $ open/write optfile vmslib.opt
    $ write optfile "CASE_SENSITIVE=YES"
    $ write optfile "SYMBOL_VECTOR=(useptr=PROCEDURE)"
    $ write optfile "vmslib.o"
    $ close optfile
    $ link/exe=vmslib.exe/share/bpage=13/map/cross/full vmslib.opt/opt
The SAS source code needed to create the SASCBTBL attribute table and call the routine from within the DATA step is the following:
    filename sascbtbl 'sas$worklib:temp.dat';
    data _null_;
       file sascbtbl;
       input;
       put _infile_;
       datalines4;
    routine useptr minarg=1 maxarg=1 module=vmslib;
    arg 1 char update format=$char20.;
    ;;;;
    data _null_;
       length ptrval $20 thedata $12;
       call module('*i','useptr',ptrval);
       thedata=peekclong(ptrval,12);
       /* Writes the 12 bytes of character data as 24 hexadecimal digits */
       put thedata=$hex24.;
       /* Writes the 20 bytes of the pointer variable as 40 hexadecimal digits */
       put ptrval=$hex40.;
    run;
Note: PEEKCLONG is used in this example because SAS 9.2 is 64-bit enabled.
The SAS log output would be the following:
Log Output for Using PEEKLONG Functions To Access Character Strings
    thedata=010000000200000003000000
    ptrval=0000D90300000000202020202020202020202020
In this example, the PEEKCLONG function is given two arguments: the pointer value (ptrval) and the number of bytes to read (12). PEEKCLONG returns a character string of the specified length containing the characters that are at the pointer location.
For more information, see PEEKLONG Function: OpenVMS.
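The following C sketch illustrates, under stated assumptions, what the PEEKCLONG call and the $HEX24. format do with the 12 bytes of the table. The helper names peek_bytes and to_hex are hypothetical; they only mimic the behavior described above and are not SAS interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes starting at addr into out,
   much as PEEKCLONG returns the bytes found at a pointer location. */
void peek_bytes(const void *addr, size_t len, unsigned char *out)
{
    memcpy(out, addr, len);
}

/* Hypothetical helper: render len bytes as uppercase hexadecimal digits,
   comparable to writing a character value with the $HEXw. format.
   hex must have room for 2 * len + 1 characters. */
void to_hex(const unsigned char *bytes, size_t len, char *hex)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < len; i++) {
        hex[2 * i]     = digits[bytes[i] >> 4];
        hex[2 * i + 1] = digits[bytes[i] & 0x0F];
    }
    hex[2 * len] = '\0';
}
```

On a little-endian system, the three 4-byte integers 1, 2, and 3 occupy the bytes 01 00 00 00, 02 00 00 00, and 03 00 00 00, which is why the log shows the least significant byte of each integer first.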
Accessing External Shared Images Efficiently
The MODULE function reads the attribute table that is referenced by the SASCBTBL fileref once per step (DATA step, PROC IML step, or SCL step). It parses the table and stores the attribute information for future use during the step. When you use the MODULE function, SAS searches the stored attribute information for the matching routine and module names. The first time you access a shared image during a step, SAS loads the shared image, and determines the address of the requested routine. Each shared image you invoke stays loaded for the duration of the step, and is not reloaded in subsequent calls. For example, suppose the attribute table had the following basic form:
    * routines XYZ and BBB in FIRST.EXE;
    routine XYZ minarg=1 maxarg=1 module=first;
    arg 1 num input;
    routine BBB minarg=1 maxarg=1 module=first;
    arg 1 num input;
    * routines ABC and DDD in SECOND.EXE;
    routine ABC minarg=1 maxarg=1 module=second;
    arg 1 num input;
    routine DDD minarg=1 maxarg=1 module=second;
    arg 1 num input;
and the DATA step looked like the following:
    filename sascbtbl 'myattr.tbl';
    data _null_;
       do i=1 to 50;
          /* FIRST.EXE is loaded only once */
          value = modulen('XYZ',i);
          /* SECOND.EXE is loaded only once */
          value2 = modulen('ABC',value);
          put i= value= value2=;
       end;
    run;
In this example, MODULEN parses the attribute table during DATA step compilation. In the first loop iteration (i=1), FIRST.EXE is loaded and the XYZ routine is accessed when MODULEN calls for it. Next, SECOND.EXE is loaded and the ABC routine is accessed. For subsequent loop iterations (starting when i=2), FIRST.EXE and SECOND.EXE remain loaded, so the MODULEN function simply accesses the XYZ and ABC routines.
Note that the attribute table can contain any number of descriptions for routines that are not accessed for a given step. This does not cause any additional overhead (apart from a few bytes of internal memory to hold the attribute descriptions). In the above example, BBB and DDD are in the attribute table but are not accessed by the DATA step.
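The load-once behavior described in this section can be pictured with the following C sketch. It is a simplified model only: the image_cache structure and the get_image function are hypothetical and do not reflect how SAS is implemented internally.

```c
#include <string.h>

#define MAX_IMAGES 8

/* Hypothetical bookkeeping for the images loaded during one step. */
struct image_cache {
    const char *names[MAX_IMAGES];  /* names of the loaded images */
    int count;                      /* number of images currently loaded */
    int loads;                      /* number of real load operations */
};

/* Return the slot for the named image, loading it only on first use.
   Returns -1 if the cache is full. */
int get_image(struct image_cache *cache, const char *name)
{
    int i;

    for (i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], name) == 0)
            return i;               /* already loaded: no reload */

    if (cache->count == MAX_IMAGES)
        return -1;

    cache->names[cache->count] = name;
    cache->loads++;                 /* the only place a load happens */
    return cache->count++;
}
```

In this model, 50 loop iterations that each request the images first and second perform only two load operations, which mirrors the DATA step above.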
Grouping SAS Variables as Structure Arguments
A common need when calling external routines is to pass a pointer to a structure. Some parts of the structure might be used as input to the routine, while other parts might be replaced or filled in by the routine. Even though SAS does not have structures in its language, you can indicate to the MODULE function that you want a particular set of arguments grouped into a single structure. You indicate this by using the FDSTART option of the ARG statement to flag the argument that begins the structure in the attribute table. SAS gathers that argument and all that follow (until encountering another FDSTART option) into a single contiguous block, and passes a pointer to the block as an argument to the shared image routine.
This example uses the uname routine that is part of the HP C Run-Time Library in the OpenVMS environment. This routine returns information about your computer system in the fields of the utsname structure: sysname, release, version, machine, nodename, and arch.
Because a shared image name is required when using the MODULE functions, you first need to create your own shared image that contains a routine that calls the uname routine for you.
The C source code for VMSLIB.C is the following:
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>
    #include <utsname.h>

    /*
       The header file <utsname.h> declares the uname prototype and
       defines the utsname struct as:

       int uname(struct utsname *name);

       struct utsname {
          char sysname [31+1];
          char release [31+1];
          char version [31+1];
          char machine [31+1];
          char nodename[1024+1];
       #ifndef _POSIX_C_SOURCE
          char arch [15+1];
          char __spare [256+1];
       #else
          char __spare [15+1+256+1];
       #endif
       };
    */

    int vmsuname(struct utsname *name)
    {
       int rc;
       struct utsname vmsname;

       /* The HP C Run-Time Library function "uname()" is one of the few
          functions that does not accept 64-bit pointers, so you need to
          declare a 32-bit pointer &vmsname to pass into uname() in order
          to access the necessary information */
       if ((rc=uname(&vmsname))!=0)
          perror("vmslib");
       else {
          /* printf's used for debugging:
          printf("vmsname(%d) = %d\n",sizeof(vmsname),&vmsname);
          printf("sysname(%d) = %s\n",sizeof(vmsname.sysname),vmsname.sysname);
          printf("release(%d) = %s\n",sizeof(vmsname.release),vmsname.release);
          printf("version(%d) = %s\n",sizeof(vmsname.version),vmsname.version);
          printf("machine(%d) = %s\n",sizeof(vmsname.machine),vmsname.machine);
          printf("nodename(%d)= %s\n",sizeof(vmsname.nodename),vmsname.nodename);
          printf("arch(%d) = %s\n",sizeof(vmsname.arch), vmsname.arch);
          printf("spare(%d) = %s\n",sizeof(vmsname.__spare),"Filler");
          printf("\n");
          */

          /* Since a 32-bit address &vmsname was used to get the
             information, it is necessary to copy that information over
             to the 64-bit location (name) sent in as an argument to
             vmsuname */
          strcpy(name->sysname,vmsname.sysname);
          strcpy(name->release,vmsname.release);
          strcpy(name->version,vmsname.version);
          strcpy(name->machine,vmsname.machine);
          strcpy(name->nodename,vmsname.nodename);
          strcpy(name->arch,vmsname.arch);
          strcpy(name->__spare,vmsname.__spare);
       }
       return(rc);
    }
The following example shows how the C source code could be compiled and linked using a DCL Command file, such as VMSLIB.COM:
    $ cc/decc/obj=vmslib.o/arch=generic/name=(short,as_is)/pointer=64=argv vmslib.c
    $ open/write optfile vmslib.opt
    $ write optfile "CASE_SENSITIVE=YES"
    $ write optfile "SYMBOL_VECTOR=(vmsuname=PROCEDURE)"
    $ write optfile "vmslib.o"
    $ close optfile
    $ link/exe=vmslib.exe/share/bpage=13/map/cross/full vmslib.opt/opt
The SAS source code used to create the SASCBTBL attribute table and call the routine from within the DATA step is the following:
    filename sascbtbl 'sas$worklib:temp.dat';
    data _null_;
       file sascbtbl;
       input;
       put _infile_;
       datalines4;
    routine vmsuname minarg=7 maxarg=7 returns=short module=vmslib;
    arg 1 char output byaddr fdstart format=$cstr32.;
    arg 2 char output format=$cstr32.;
    arg 3 char output format=$cstr32.;
    arg 4 char output format=$cstr32.;
    arg 5 char output format=$cstr1025.;
    arg 6 char output format=$cstr16.;
    arg 7 char output format=$cstr257.;
    ;;;;
    data _null_;
       length sysname $32 release $32 version $32 machine $32
              nodename $1025 arch $16 spare $257;
       retain sysname release version machine nodename arch spare " ";
       rc=modulen('vmsuname',sysname,release,version,machine,nodename,arch,spare);
       put rc =;
       put sysname =;
       put release =;
       put version =;
       put machine =;
       put nodename=;
       put arch =;
    run;
The SAS log output would be the following:
Log Output for Grouping SAS Variables as Structure Arguments
    rc=0
    sysname=OpenVMS
    release=0
    version=V8.3
    machine=HP_rx4640__(1.60GHz/9.0MB)
    nodename=IT4640
    arch=IA64
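The grouping that the FDSTART option requests can be pictured with the following C sketch: separate values are gathered into one contiguous block, a pointer to the block is passed to the routine, and the results are copied back. The pair structure and the fill_pair and call_with_grouped_args functions are hypothetical illustrations, not SAS internals.

```c
#include <string.h>

/* Hypothetical two-field structure that an external routine expects. */
struct pair {
    char first[9];
    char second[9];
};

/* Hypothetical external routine: fills both fields through the pointer. */
void fill_pair(struct pair *p)
{
    strcpy(p->first, "alpha");
    strcpy(p->second, "beta");
}

/* Gather two separate 9-byte values into one contiguous block, pass a
   pointer to the block, and then scatter the results back. */
void call_with_grouped_args(char first[9], char second[9])
{
    struct pair block;

    memcpy(block.first, first, sizeof block.first);
    memcpy(block.second, second, sizeof block.second);

    fill_pair(&block);              /* one pointer covers both arguments */

    memcpy(first, block.first, sizeof block.first);
    memcpy(second, block.second, sizeof block.second);
}
```

The caller continues to work with two separate variables, which corresponds to passing separate SAS variables to MODULEN while the routine sees a single structure.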
Using Constants and Expressions as Arguments to MODULE
You can pass any kind of expression as an argument to the MODULE function. The attribute table indicates whether the argument is for input, output, or update.
You can specify input arguments as constants and arithmetic expressions. However, because output and update arguments must be able to be modified and returned, you can pass only a variable for them. If you specify a constant or expression where a value that can be updated is expected, SAS issues a warning message pointing out the error. Processing continues, but the MODULE function cannot perform the update (meaning that the value of the argument you wanted to update will be lost).
Consider these examples. Here is the attribute table:
    * attribute table entry for ABC;
    routine abc minarg=2 maxarg=2;
    arg 1 input format=ib4.;
    arg 2 output format=ib4.;
Here is the DATA step with the MODULE calls:
    data _null_;
       x=5;
       /* passing a variable as the second argument - OK */
       call module('abc',1,x);
       /* passing a constant as the second argument - INVALID */
       call module('abc',1,2);
       /* passing an expression as the second argument - INVALID */
       call module('abc',1,x+1);
    run;
In the above example, the first call to MODULE is correct because x is updated by the value that the abc routine returns for the second argument. The second call to MODULE is not correct because a constant is passed. MODULE issues a warning indicating you have passed a constant, and passes a temporary area instead. The third call to MODULE is not correct because an arithmetic expression is passed, which causes a temporary location from the DATA step to be used, and the returned value to be lost.
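The reason that an expression cannot receive an output value can be sketched in C. Here abc is a hypothetical routine that writes a result through its second argument; the temporary variable plays the role of the temporary location that the DATA step supplies for an expression.

```c
/* Hypothetical routine: reads in and writes a result through out. */
void abc(int in, int *out)
{
    *out = in * 10;
}

/* Passing a variable: the caller sees the value that abc stored. */
int call_with_variable(void)
{
    int x = 5;

    abc(1, &x);
    return x;
}

/* Passing an expression: the result lands in a temporary and is lost. */
int call_with_expression(void)
{
    int x = 5;
    int temp = x + 1;   /* temporary location holding the expression */

    abc(1, &temp);      /* abc updates temp, not x */
    return x;           /* x is unchanged */
}
```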
Specifying Formats and Informats to Use with MODULE Arguments
You specify the SAS format and informat for each shared image routine argument by specifying the FORMAT attribute in the ARG statement. The format indicates how numeric and character values should be passed to the shared image routine and how they should be read back upon completion of the routine.
Usually, the format you use corresponds to a variable type for a given programming language. The following tables list the formats that correspond to different variable types in various programming languages.
C Type | SAS Format/Informat |
---|---|
double | RB8. |
float | FLOAT4. |
signed int | IB4. |
signed short | IB2. |
signed long | IB4. |
char * | IB8. |
unsigned int | PIB4. |
unsigned short | PIB2. |
unsigned long | PIB4. |
char[w] | $CHARw. or $CSTRw. (see $CSTRw. Format) |
Note: For information about passing character data other than as pointers to character strings, see $BYVALw. Format.
FORTRAN Type | SAS Format/Informat |
---|---|
integer*2 | IB2. |
integer*4 | IB4. |
real*4 | RB4. |
real*8 | RB8. |
character*w | $CHARw. |
The MODULE function can support FORTRAN character arguments only if they are not expected to be passed by a descriptor.
PL/I Type | SAS Format/Informat |
---|---|
FIXED BIN(15) | IB2. |
FIXED BIN(31) | IB4. |
FLOAT BIN(21) | RB4. |
FLOAT BIN(31) | RB8. |
CHARACTER(w) | $CHARw. |
The PL/I descriptions are added here for completeness. This does not guarantee that you will be able to invoke PL/I routines.
The following COBOL specifications might not match properly with the formats supplied by SAS because zoned and packed decimal are not truly defined for systems based on Intel architecture.
COBOL Format | SAS Format/Informat | Description |
---|---|---|
PIC Sxxxx DISPLAY | ZDw. | zoned decimal |
PIC Sxxxx PACKED-DECIMAL | PDw. | packed decimal |
The following COBOL specifications do not have true native equivalents and are usable only with the corresponding S370Fxxx informat and format, which enables IBM mainframe-style representations to be read and written in the PC environment.
$CSTRw. Format
If you pass a character argument as a null-terminated string, use the $CSTRw. format. This format looks for the last nonblank character of your character argument and passes a copy of the string with a null terminator after the last nonblank character. For example, given the following attribute table entry:
    * attribute table entry;
    routine abc minarg=1 maxarg=1;
    arg 1 input char format=$cstr10.;
you can use the following DATA step:
    data _null_;
       call module('abc','my string');
    run;
The $CSTR format adds a null terminator to the character string my string before passing it to the abc routine. This is equivalent to the following attribute entry:
    * attribute table entry;
    routine abc minarg=1 maxarg=1;
    arg 1 input char format=$char10.;
    data _null_;
       call module('abc','my string'||'00'x);
    run;
The first example is easier to understand and easier to use when using variable or expression arguments.
The $CSTR informat converts a null-terminated string into a blank-padded string of the specified length. If the shared image routine is supposed to update a character argument, use the $CSTR informat in the argument attribute.
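The two conversions described above can be sketched in C as follows. The function names cstr_format and cstr_informat are hypothetical, and the sketch assumes that the null terminator must fit within the w bytes of the field; it illustrates the described behavior and is not the SAS implementation.

```c
#include <stddef.h>
#include <string.h>

/* Format direction: find the last nonblank character in a blank-padded
   field of width w and copy the text to out, followed by a null
   terminator. out must hold w bytes; at most w - 1 characters are kept
   so that the terminator fits (an assumption of this sketch). */
void cstr_format(const char *field, size_t w, char *out)
{
    size_t len = w;

    if (w == 0)
        return;
    while (len > 0 && field[len - 1] == ' ')
        len--;                      /* skip trailing blanks */
    if (len > w - 1)
        len = w - 1;
    memcpy(out, field, len);
    out[len] = '\0';
}

/* Informat direction: convert a null-terminated string into a
   blank-padded field of exactly w bytes (no terminator). */
void cstr_informat(const char *cstring, size_t w, char *field)
{
    size_t len = strlen(cstring);

    if (len > w)
        len = w;                    /* truncate to the field width */
    memcpy(field, cstring, len);
    memset(field + len, ' ', w - len);
}
```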
$BYVALw. Format
When you use the MODULE function to pass a single character by value, the argument is automatically promoted to an integer. If you want to use a character expression in the MODULE call, you must use the special format/informat called $BYVALw. The $BYVALw. format/informat expects a single character and produces a numeric value, the size of which depends on w, the value of width. $BYVAL2. produces a short, $BYVAL4. produces a long, and $BYVAL8. produces a double. Consider this example using the C language:
    long xyz(long a, double b)
    {
       static char c = 'Y';
       if (a == 'X')
          return(1);
       else if (b == c)
          return(2);
       else
          return(3);
    }
In this example, the xyz routine expects two arguments, a long and a double. If the long is an X, the actual value of the long is 88 in decimal. This is because an ASCII X is stored as hexadecimal 58, and this is promoted to a long, represented as 0x00000058 (or 88 decimal). If the value of a is X, or 88, then 1 is returned. If the second argument, a double, is Y (which is interpreted as 89), then 2 is returned. Otherwise, 3 is returned.
Now suppose that you want to pass characters as the arguments to xyz. In C, you would invoke it as follows:
    x = xyz('X',(double)'Z');
    y = xyz('Q',(double)'Y');
This is because the X and Q values are automatically promoted to integers (which are the same as longs for the sake of this example), and the integer values corresponding to Z and Y are cast to doubles.
To call xyz using the MODULEN function, your attribute table must reflect the fact that you want to pass characters:
    routine xyz minarg=2 maxarg=2 returns=long;
    arg 1 input char byvalue format=$byval4.;
    arg 2 input char byvalue format=$byval8.;
Note that it is important that the BYVALUE option appears in the ARG statement as well. Otherwise, MODULEN assumes that you want to pass a pointer to the routine, instead of a value.
Here is the DATA step that invokes MODULEN and passes it characters:
    data _null_;
       x = modulen('xyz','X','Z');
       put x= ' (should be 1)';
       y = modulen('xyz','Q','Y');
       put y= ' (should be 2)';
    run;
Understanding MODULE Log Messages
If you specify i in the control string parameter to MODULE, SAS prints several informational messages to the log. You can use these messages to determine whether you have passed incorrect arguments or coded the attribute table incorrectly.
Consider this example that uses MODULEIN from within the IML procedure. It uses the MODULEIN function to invoke the changi routine (which is stored in the hypothetical TRYMOD.EXE shared image). In the example, MODULEIN passes the constant 6 and the matrix x2, which is a 4x5 matrix to be converted to an integer matrix. The attribute table for changi is as follows:
    routine changi module=trymod returns=long;
    arg 1 input num format=ib4. byvalue;
    arg 2 update num format=ib4.;
The following PROC IML step invokes MODULEIN:
    proc iml;
       x1 = J(4,5,0);
       do i=1 to 4;
          do j=1 to 5;
             x1[i,j] = i*10+j+3;
          end;
       end;
       y1= x1; x2 = x1; y2 = y1;
       rc = modulein('*i','changi',6,x2);
       ....
The '*i' control string causes the lines shown in the following output to be printed in the log.
    ---PARM LIST FOR MODULEIN ROUTINE---
    CHR PARM 1 885E0AA8 2A69 (*i)
    CHR PARM 2 885E0AD0 6368616E6769 (changi)
    NUM PARM 3 885E0AE0 0000000000001840
    NUM PARM 4 885E07F0
    0000000000002C400000000000002E40000000000000304000000000000031400000000000003240
    000000000000384000000000000039400000000000003A400000000000003B400000000000003C40
    0000000000004140000000000080414000000000
    ---ROUTINE changi LOADED AT ADDRESS 886119B8 (PARMLIST AT 886033A0)---
    PARM 1 06000000 <CALL-BY-VALUE>
    PARM 2 88604720
    0E0000000F00000010000000110000001200000018000000190000001A0000001B0000001C000000
    22000000230000002400000025000000260000002C0000002D0000002E0000002F00000030000000
    ---VALUES UPON RETURN FROM changi ROUTINE---
    PARM 1 06000000 <CALL-BY-VALUE>
    PARM 2 88604720
    140000001F0000002A0000003500000040000000820000008D00000098000000A3000000AE000000
    F0000000FB00000006010000110100001C0100005E01000069010000740100007F0100008A010000
    ---VALUES UPON RETURN FROM MODULEIN ROUTINE---
    NUM PARM 3 885E0AE0 0000000000001840
    NUM PARM 4 885E07F0
    00000000000034400000000000003F4000000000000045400000000000804A400000000000005040
    00000000004060400000000000A06140000000000000634000000000006064400000000000C06540
    0000000000006E400000000000606F4000000000
The output is divided into four sections.
The first section describes the arguments passed to MODULEIN.
The 'CHR PARM n' portion indicates that character parameter n was passed. In the example, 885E0AA8 is the actual address of the first character parameter to MODULEIN. The value at the address is hexadecimal 2A69, and the ASCII representation of that value ('*i') is in parentheses after the hexadecimal value. The second parameter is printed in the same way. Only these first two arguments have their ASCII equivalents printed, because other arguments might contain unreadable binary data.
The remaining parameters appear with only hexadecimal representations of their values (NUM PARM 3 and NUM PARM 4 in the example).
The third parameter to MODULEIN is numeric, and it is at address 885E0AE0. The hexadecimal representation of the floating point number 6 is shown. The fourth parameter is at address 885E07F0, which points to an area containing all the values for the 4x5 matrix. The *i option prints the entire argument. Be careful if you use this option with large matrices, because the log might become quite large.
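To relate the hexadecimal digits in the log to a floating-point value, consider the following C sketch. It assumes 8-byte IEEE 754 doubles and prints the bytes least significant first, which matches the byte order shown in this log. The function name double_to_log_hex is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the 8 bytes of a double, least significant byte first, as 16
   uppercase hexadecimal digits. hex needs room for 17 characters. */
void double_to_log_hex(double value, char hex[17])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &value, sizeof bits);   /* assumes 8-byte IEEE doubles */
    for (i = 0; i < 8; i++)
        sprintf(hex + 2 * i, "%02X",
                (unsigned)((bits >> (8 * i)) & 0xFF));
}
```

With this helper, the value 6 yields 0000000000001840 (NUM PARM 3), and the value 14, which is the first element of the matrix, yields 0000000000002C40.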
The second section of the log lists the arguments that are to be passed to the requested routine and, in this case, changed. This section is important for determining whether the arguments are being passed to the routine correctly. The first line of this section contains the name of the routine and its address in memory. It also contains the address of the location of the parameter block that MODULEIN created.
The log contains the status of each argument as it is passed. For example, the first parameter in the example is call-by-value (as indicated in the log). The second parameter is the address of the matrix. The log shows the address, along with the data to which it points.
Note that all the values in the first parameter and in the matrix are long integers because the attribute table states that the format is IB4.
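The effect of the IB4. format on these values can be sketched in C in the same way. The function name ib4_to_log_hex is hypothetical; the sketch converts a numeric value to a 4-byte integer and prints its bytes least significant first, as they appear in this log.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a numeric value to a 4-byte integer and write its bytes,
   least significant byte first, as 8 uppercase hexadecimal digits.
   hex needs room for 9 characters. */
void ib4_to_log_hex(double value, char hex[9])
{
    uint32_t bits = (uint32_t)(int32_t)value;   /* integer part only */
    int i;

    for (i = 0; i < 4; i++)
        sprintf(hex + 2 * i, "%02X",
                (unsigned)((bits >> (8 * i)) & 0xFF));
}
```

The constant 6 becomes 06000000 (PARM 1), and the first matrix element, 14, becomes 0E000000 (the first four bytes of PARM 2).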
In the third section, the log contains the argument values upon return from changi. The call-by-value argument is unchanged, but the other argument (the matrix) contains different values.
The last section of the log output contains the values of the arguments as they are returned to the MODULEIN calling routine.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.