Usage Note 23977: How can I randomly permute the observations in a data set one or more times?
The following example starts by creating the data set NAMES. Suppose we want three random permutations of the observations in this data set. In the PROC PLAN step, the FACTORS statement generates 120 (N=120) permutations of the numbers 1 through 5 (OBS=5 PERM). Other counts would also run, but 120 is used because there are 5! = 120 possible permutations of the numbers 1 through 5, so specifying 120 generates every possible permutation, of which only three will be randomly chosen. Note that PROC PLAN produces the requested number of permutations algorithmically rather than randomly: (1 2 3 4 5) is always the first permutation, (1 2 3 5 4) is always the second, and so on. A smaller count would therefore yield only the first permutations in that fixed sequence.
The randomness comes from the N=120 specification: because no selection type is given for N, PROC PLAN assigns its values in random order, so each permutation receives a random value between 1 and 120. In the OUTPUT statement, the WHERE= data set option keeps the three permutations with N=1, 2, or 3 (any set of three values would do) and writes them to the data set PERMS. PERMS has one observation for each value in each permutation, for a total of 3*5 = 15 observations. The first PROC PRINT step displays PERMS. In the DATA step, each record of PERMS is read in turn, and its permutation value in the variable OBS is used by the POINT= option to read the corresponding observation from NAMES and write it to the final data set, PERMNAMES. The second PROC PRINT step displays PERMNAMES, which contains the three random permutations of the data set NAMES.
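/* Create the example data set. */
data names;
   input name $ weight;
   datalines;
john 185
mary 102
kristin 112
allen 178
bill 195
;
run;

/* Generate all 120 permutations of 1-5; N labels each one with a random value. */
/* The WHERE= option keeps only the permutations labeled 1, 2, or 3. */
proc plan;
   factors n=120 obs=5 perm / noprint;
   output out=perms (where=(n<=3));
run;

proc print data=perms;
run;

/* For each row of PERMS, read observation number OBS from NAMES. */
data permnames;
   set perms;
   set names point=obs;
run;

proc print data=permnames;
run;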
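The random labels, and therefore the selected permutations, change from run to run. As a minimal sketch, assuming a repeatable result is wanted, the SEED= option in the PROC PLAN statement can fix the random number stream. The seed value 23977 and the data set name PERMS_SEEDED are arbitrary choices made for this illustration and are not part of the original example. The PROC FREQ step is an optional check that each selected value of N is paired exactly once with each value of OBS.

```sas
/* Same plan as in the example, with an assumed seed for repeatability. */
proc plan seed=23977;
   factors n=120 obs=5 perm / noprint;
   output out=perms_seeded (where=(n<=3));
run;

/* Optional check: every cell of the 3-by-5 table should have a count of 1. */
proc freq data=perms_seeded;
   tables n*obs / norow nocol nopercent;
run;
```

Rerunning the step with the same seed should select the same three permutations; changing or omitting the seed gives a different selection.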
For numeric data, an alternative is to use the permutation resampling capabilities of PROC MULTTEST, which can create a data set in which the values of one or more variables are randomly permuted.
In the following example, the WEIGHT variable in the original data set, NAMES, is randomly permuted three times. In the PROC MULTTEST statement, the PERM option requests permutation resampling, and the N= option gives the number of resamples to generate; specify N=1 if you want only a single random permutation of the observations. The OUTSAMP= option names the output data set that will contain all of the resamples. Because MULTTEST is designed for the larger purpose of computing tests and adjusting the p-values of those tests, it requires a CLASS variable to define groups for comparison and a TEST statement to indicate the type of test. For our purpose, add a constant variable to the original data set for use in the CLASS statement (variable G in this example) and pick any test (MEAN in this example). To permute multiple numeric variables, list all of them in the MEAN() list.
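/* Add a constant grouping variable for the CLASS statement. */
data names;
   set names;
   g = 1;
run;

/* Request three permutation resamples of WEIGHT; save them in MULTPERMS. */
proc multtest data=names perm n=3 outsamp=multperms;
   class g;
   test mean(weight);
run;

proc print data=multperms;
run;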
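The multiple-variable case mentioned above can be sketched as follows. This is an illustration under stated assumptions, not part of the original example: the variable HEIGHT and its values are hypothetical, the data set names NAMES2 and MULTPERMS2 are invented here, and the SEED= value is an arbitrary choice added only to make the resamples repeatable.

```sas
/* Hypothetical data: HEIGHT is invented for this illustration. */
data names2;
   input name $ weight height;
   g = 1;   /* constant grouping variable for the CLASS statement */
   datalines;
john 185 70
mary 102 62
kristin 112 65
allen 178 71
bill 195 73
;
run;

/* List every variable to be permuted in the MEAN() list. */
proc multtest data=names2 perm n=3 seed=23977 outsamp=multperms2;
   class g;
   test mean(weight height);
run;

proc print data=multperms2;
run;
```

Printing MULTPERMS2 shows how the resamples are laid out, which is the easiest way to confirm the output has the structure you need before using it further.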
Operating System and Release Information
SAS System | Base SAS | All | n/a | |
SAS System | SAS/STAT | All | n/a | |
Type: | Usage Note |
Priority: | low |
Topic: | Analytics; SAS Reference ==> Procedures ==> PLAN; SAS Reference ==> Procedures ==> MULTTEST; Analytics ==> Multivariate Analysis; Analytics ==> Exact Methods |
Date Modified: | 2004-06-03 12:22:15 |
Date Created: | 2004-05-12 15:26:33 |