Usage Note 46847: Run a procedure or macro once for each data set, or each data subset (BY group), variable, or value specified in a data set

Some situations require you to run a procedure or a macro many times. In the special case where you need to run a procedure on separate blocks of observations in a data set, you can simply add a BY statement to the procedure step; most procedures support it. Many macros, however, were not written with BY-group processing built in. Repeated runs are also needed when you want to fit a series of regression models that differ in the response variable, the predictors, or both. In these situations, you want to avoid writing a separate macro call or procedure step for each run.

A general macro, RunBY, is available that can run other macros, procedures, or specialized code successively for BY groups in your data.

The following presents another approach that is also quite flexible. It uses the CALL EXECUTE routine in the DATA step to generate the multiple procedure steps and submit them for execution. With this approach, you can run the procedure or macro once for each observation in a data set, using the information that you place in the observations.

Three common uses of the CALL EXECUTE approach are illustrated below. For examples of doing similar tasks with the RunBY macro, see its documentation.

Run a procedure for each variable in a data set

Suppose you want to fit a series of single-predictor logistic regression models using the following data set. Each model uses one of the predictors to model the response variable, REMISS. When complete, you want a single data set that contains the intercepts and slopes for all of the models, along with standard errors and Wald tests for the parameters. These statements create the data set.

      data remiss;
         input remiss cell smear infil li blast temp;
         datalines;
      1   .8   .83  .66  1.9  1.1     .996
      1   .9   .36  .32  1.4   .74    .992
      0   .8   .88  .7    .8   .176   .982
      0  1     .87  .87   .7  1.053   .986
      1   .9   .75  .68  1.3   .519   .98
      0  1     .65  .65   .6   .519   .982
      1   .95  .97  .92  1    1.23    .992
      0   .95  .87  .83  1.9  1.354  1.02
      0  1     .45  .45   .8   .322   .999
      0   .95  .36  .34   .5  0      1.038
      0   .85  .39  .33   .7   .279   .988
      0   .7   .76  .53  1.2   .146   .982
      0   .8   .46  .37   .4   .38   1.006
      0   .2   .39  .08   .8   .114   .99
      0  1     .9   .9   1.1  1.037   .99
      1  1     .84  .84  1.9  2.064  1.02
      0   .65  .42  .27   .5   .114  1.014
      0  1     .75  .75  1    1.322  1.004
      0   .5   .44  .22   .6   .114   .99
      1  1     .63  .63  1.1  1.072   .986
      0  1     .33  .33   .4   .176  1.01
      0   .9   .93  .84   .6  1.591  1.02
      1  1     .58  .58  1     .531  1.002
      0   .95  .32  .3   1.6   .886   .988
      1  1     .6   .6   1.7   .964   .99
      1  1     .69  .69   .9   .398   .986
      0  1     .73  .73   .7   .398   .986
      ;

Since PROC LOGISTIC will be run once for each of the predictor variables, we need a data set that contains a variable holding the names of the predictors. This could be done by specifying all of the names as data in a DATA step.

      data names;
        input var $ @@;
        datalines;
      cell smear infil li blast temp
      ;

The above approach might not be practical for an existing data set with a large number of variables. Alternatively, use PROC TRANSPOSE to transpose the predictor variables and keep only the _NAME_ variable, which contains the variable names. Since all the variables in the data set are predictors except for REMISS (the response), the DROP= option is used to ignore REMISS. The RENAME= option renames _NAME_ to VAR so that the resulting data set matches the one created above and can be used by the DATA step that follows.

      proc transpose data=remiss(drop=remiss) out=names(keep=_name_ rename=(_name_=var));
        run;

If the predictors are a contiguous group of variables, you could instead use a VAR statement with a variable list specifying only the first and last variables in the group. PROC CONTENTS with the VARNUM option can be used to see the order of variables in a data set. As before, _NAME_ is renamed to VAR.

      proc transpose data=remiss out=names(keep=_name_ rename=(_name_=var));
        var cell--temp;
        run;

Regardless of which method is used to create the data set of predictor names, the following step generates the statements for each PROC LOGISTIC step and submits them for execution. The ODS OUTPUT statement with the PERSIST option accumulates the Parameter Estimates tables in data set PE across all of the PROC LOGISTIC runs. Because this DATA step only produces and submits code rather than creating a data set, DATA _NULL_ is specified. The SET statement ensures that the DATA step iterates once for each observation in the NAMES data set, that is, once for each predictor name.

The CALL EXECUTE statement submits the code in parentheses. The code is a simple concatenation of three text strings: the PROC LOGISTIC statement and the MODEL statement up to the equal sign, the predictor name in the current observation of the NAMES data set, and the end of the MODEL statement and the RUN statement. The double vertical bars ( || ) are the string concatenation operator. On each iteration of the DATA step, the variable VAR in the NAMES data set contains the next predictor name, which is inserted into the MODEL statement. The generated PROC LOGISTIC steps are queued by CALL EXECUTE and run, in order, after the DATA step finishes, each fitting a model with one predictor. Each resulting Parameter Estimates table is added to the PE data set by the ODS OUTPUT statement.

      ods output parameterestimates(persist)=pe; 
      data _null_;
        set names;
        call execute("proc logistic data=remiss; 
                        model remiss =" || var || ";
                        run;"
                    );
        run;
      ods output close;

The SAS Log shows the execution of each PROC LOGISTIC step (notes produced by each executed step are also displayed but are omitted here for clarity):

1   + proc logistic data=remiss;                  model remiss =cell    ;
      run;

2   + proc logistic data=remiss;                  model remiss =smear   ;
      run;

3   + proc logistic data=remiss;                  model remiss =infil   ;
      run;

4   + proc logistic data=remiss;                  model remiss =li      ;
      run;

5   + proc logistic data=remiss;                  model remiss =blast   ;
      run;

6   + proc logistic data=remiss;                  model remiss =temp    ;
      run;

At completion, the accumulated parameter estimates tables are in data set PE.

      proc print data=pe noobs;
        run;

Each pair of observations contains the intercept and slope for one of the six models fit by the steps that the preceding DATA step generated.

_Proc_	_Run_	Variable	DF	Estimate	StdErr	WaldChiSq	ProbChiSq
Logistic	1	Intercept	1	5.6362	4.0971	1.8924	0.1689
Logistic	1	cell	1	-5.4114	4.3446	1.5514	0.2129
Logistic	1	Intercept	1	2.0454	1.4164	2.0853	0.1487
Logistic	1	smear	1	-2.0787	2.0329	1.0455	0.3065
Logistic	1	Intercept	1	2.2476	1.2851	3.0587	0.0803
Logistic	1	infil	1	-2.6123	1.9594	1.7776	0.1824
Logistic	1	Intercept	1	3.7771	1.3786	7.5064	0.0061
Logistic	1	li	1	-2.8973	1.1868	5.9594	0.0146
Logistic	1	Intercept	1	1.8258	0.8061	5.1306	0.0235
Logistic	1	blast	1	-1.5262	0.8683	3.0894	0.0788
Logistic	1	Intercept	1	-24.2018	31.1283	0.6045	0.4369
Logistic	1	temp	1	24.9941	31.2814	0.6384	0.4243

In this simple example, only a single substitution (the predictor name) is made to the code of the procedure at each run. But you could use this method for much more complex situations requiring more substitutions. You would simply need more variables in the input data set to specify all the information needed for each run.
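As a rough sketch of that idea, the following assumes a hypothetical control data set, MODELS, with one observation per model and two variables, RESPONSE and PREDICTORS. These names, and the variable CODE, are illustrative and not part of the original example. The generated step is built in a character variable, written to the log with PUT so that it can be checked, and then passed to CALL EXECUTE.

```sas
/* Hypothetical control data set: one observation per model to fit. */
data models;
  length response $32 predictors $200;
  infile datalines dlm='|';
  input response $ predictors $;
  datalines;
remiss|li
remiss|li temp
remiss|cell smear infil
;

data _null_;
  set models;
  length code $400;
  /* CATX trims each piece and joins the pieces with a single blank. */
  code = catx(' ', 'proc logistic data=remiss;',
                   'model', response, '=', predictors, ';',
                   'run;');
  put code;            /* write the generated step to the log for review */
  call execute(code);  /* queue the step to run after this DATA step ends */
run;
```

Building the code in a variable first makes it possible to comment out the CALL EXECUTE statement and inspect the generated steps in the log before anything is run.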

Replicating the BY statement in a procedure

One common application is the equivalent of BY processing. This is useful with a procedure that does not support a BY statement, or when an error in one BY group terminates the procedure before all BY groups are processed. Unlike BY processing with a BY statement, this method does not require sorting the data set.

See the documentation of the RunBY macro for a macro-based approach to doing BY group processing.

Using the Nonparametric Logistic Regression example in the PROC GAMPL documentation, the GAMPL procedure is run separately for the subjects in the training (TEST=0) and test (TEST=1) groups. PROC FREQ is used to produce a data set (GROUPS) containing the unique values of TEST. The DATA step then iterates once for each unique TEST value and submits a PROC GAMPL step for that value. Note that, inside the double-quoted string, each pair of double quotation marks in the TITLE statement produces one double quotation mark in the code that is executed.

      proc freq data=Pima;
        table test / out=groups;
        run;
        
      data _null_;
        set groups;
        call execute("proc gampl data=Pima seed=12345; where test="|| test ||"; 
                      model diabetes(event='1')=spline(Glucose) spline(Pressure)/dist=binary; 
                      title ""Test="|| test ||""";
                      run;");
        run;
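In the step above, the concatenation operator converts the numeric variable TEST to character automatically, which pads the value with leading blanks and writes a conversion note to the log. A possible variant, shown here only as a sketch, builds the same step with the CATS function, which converts numeric arguments itself and removes leading and trailing blanks. The variable name CODE is introduced for illustration.

```sas
data _null_;
  set groups;
  length code $400;
  /* CATS converts TEST to text and strips blanks, so the value is not padded. */
  code = cats('proc gampl data=Pima seed=12345; where test=', test, ';')
         || ' model diabetes(event=''1'')=spline(Glucose) spline(Pressure)/dist=binary;'
         || cats('title "Test=', test, '";')
         || ' run;';
  call execute(code);
run;
```

Because the outer strings here are single-quoted, the quotation marks around the event level are doubled instead of those in the TITLE statement.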

Adding BY processing to a macro

Another common usage of this technique is to add the capability of a BY statement to a macro.

See the documentation of the RunBY macro for a macro-based approach to doing BY group processing.

In this example, the technique is used to do the equivalent of BY processing with the MAGREE macro. Data set A contains the ratings of five raters (R) on each of ten subjects (S) for each of two questions (QUESTION).

      data a;
        do question=1 to 2;
        do s=1 to 10;
        do r=1 to 5;
          input y @@;
          output;
        end; end; end;
        datalines;
      1 2 2 2 2 
      1 1 3 3 3
      3 3 3 3 3
      1 1 1 1 3
      1 1 1 3 3
      1 2 2 2 2
      1 1 1 1 1
      2 2 2 2 3
      1 3 3 3 3
      1 1 1 3 3
      1 2 2 2 2 
      1 1 3 3 3
      3 3 3 2 3
      1 1 1 1 3
      1 1 2 3 3
      1 2 2 2 2
      1 1 1 2 1
      2 2 2 2 3
      1 3 3 3 3
      1 1 1 3 3
      ;

PROC FREQ is used to produce a data set (QUESTIONS) containing the unique values of QUESTION. The DATA step then iterates once for each value of QUESTION and submits a subsetting DATA step followed by a call to the MAGREE macro for that question. The subsetting DATA step creates data set B, which contains only the data for one question. Note that, inside the single-quoted string, each pair of single quotation marks in the TITLE statement produces one single quotation mark in the code that is executed. The %NRSTR macro function is also needed: it masks the macro call so that MAGREE is not executed immediately when CALL EXECUTE processes the string. Instead, the macro runs after the generating DATA step finishes, following the subsetting DATA step for that question.

      proc freq data=a;
        table question / out=questions;
        run;
        
      data _null_;
        set questions;
        call execute('data b; set a; where question='|| question ||'; title ''Question ='|| question ||''';run;
                      %nrstr(%magree(data=b, items=s, raters=r, response=y, stat=kappa))'
                    );
        run;
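The effect of %NRSTR is easiest to see with a macro that creates a macro variable in a DATA step and then uses it. The following sketch uses a hypothetical macro, SHOWCOUNT, written only for illustration; it is not part of MAGREE, and it assumes data set A from the example above.

```sas
/* Hypothetical macro: stores the number of observations in a macro */
/* variable with a DATA step, then reports it with %PUT.            */
%macro showcount(ds);
  %local nobs;
  data _null_;
    if 0 then set &ds nobs=n;
    call symputx('nobs', n);
    stop;
  run;
  %put NOTE: Data set &ds has &nobs observations.;
%mend showcount;

data _null_;
  /* With %NRSTR, the macro call is queued as text and the macro runs */
  /* after this DATA step ends, so its inner DATA step finishes       */
  /* before %PUT resolves NOBS.                                       */
  call execute('%nrstr(%showcount(a))');

  /* Without %NRSTR, the macro would execute immediately: %PUT would  */
  /* run while the inner DATA step is still only queued, so NOBS      */
  /* would be empty.                                                  */
  /* call execute('%showcount(a)'); */
run;
```

Any macro that relies on values produced by its own DATA or PROC steps is exposed to the same timing issue, which is why wrapping the call in %NRSTR is a reasonable default when a macro is invoked through CALL EXECUTE.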

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows 2012
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX
SAS System	SAS/ETS	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows 2012
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> analytics SAS Reference ==> Functions ==> Macro ==> CALL EXECUTE

Date Modified:	2020-07-09 11:31:22
Date Created:	2012-06-19 17:35:36
