Beginning with SAS/STAT® 13.1 in SAS 9.4 TS1M1, the GROUPS= option in the PROC SURVEYSELECT statement randomly assigns observations to groups. If you specify a number of groups, the observations are divided among the groups equally, or as equally as possible. You can also specify a different size for each group in the GROUPS= option.
For example, suppose you want to divide the ten observations in the following data set into three groups.
data one;
   do x=1 to 10;
      output;
   end;
run;
Specifying the GROUPS=3 option in PROC SURVEYSELECT divides the ten observations into three groups as evenly as possible. The results of this example can be reproduced by specifying the same value in the SEED= option.
proc surveyselect data=one groups=3 seed=49201 out=RandomGroups noprint;
run;

proc freq data=RandomGroups;
   tables GroupID;
run;
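The option of specifying different group sizes, mentioned above, could look like the following sketch. The sizes 5, 3, and 2 and the output data set name UnequalGroups are illustrative assumptions, not part of the original note. The sizes are chosen to sum to the ten observations in data set ONE, which the procedure is understood to require.

```sas
/* Sketch: request three groups of sizes 5, 3, and 2 (illustrative values). */
/* The sizes sum to the 10 observations in data set ONE.                    */
proc surveyselect data=one groups=(5 3 2) seed=49201
                  out=UnequalGroups noprint;
run;

/* Check the resulting group sizes. */
proc freq data=UnequalGroups;
   tables GroupID;
run;
```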
Prior to SAS/STAT 13.1, you can use PROC SURVEYSELECT to randomly divide a data set into two groups as described in this note. For more than two groups, you can use PROC PLAN to randomly assign each observation to a group such that the groups are of equal size, or as equal as possible when the data set is not evenly divisible by the number of groups.
For example, suppose you want to divide the ten observations in data set ONE (above) into three groups. The following statements create data set A, which consists of four sets of three observations. Each set contains a random arrangement of the values 1, 2, and 3. Since three groups are desired, specify GROUP=3. To accommodate ten observations you need four sets (4 × 3 = 12 assignments, since three sets would supply only nine), so specify SET=4. The results of this example can be reproduced by specifying the same value in the SEED= option.
proc plan seed=4233;
   factors set=4 group=3 / noprint;
   output out=a;
run;
In the following DATA step, the RANUNI function is used to add a random number between 0 and 1 to each observation. Again, by using the same seed the results of this example can be reproduced. The IF statement removes the two extra observations created by PROC PLAN.
data a;
   set a;
   random=ranuni(2342);
   if _n_>10 then stop;
run;
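As a variation on the step above, the random number can also be drawn with the RAND function, seeded through CALL STREAMINIT, instead of RANUNI. This is only a sketch of an alternative, not the method used in this note, and it is meant to be run in place of the preceding DATA step rather than after it. The seed value is reused for illustration, and the random numbers (and therefore the final assignments) will differ from those produced by RANUNI.

```sas
/* Sketch: same step as above, using RAND('UNIFORM') instead of RANUNI. */
data a;
   /* Seed the RAND stream once, before the first call to RAND. */
   if _n_=1 then call streaminit(2342);
   set a;
   /* Keep only the first ten plan rows; PROC PLAN generated twelve. */
   if _n_>10 then stop;
   random=rand('uniform');
run;
```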
Sorting by the random variable randomizes the group numbers across the entire data set.
proc sort data=a;
   by random;
run;
The final data set consisting of the ten observations with assigned group numbers is created by merging the randomized data set of group numbers with the original data set.
data RandomGroups;
   merge one a;
run;

proc print;
   id x;
   var group;
run;
PROC FREQ can be used to verify the sizes of the groups.
proc freq data=RandomGroups;
   tables group;
run;
Note that groups 1 and 3 each have three observations and group 2 was randomly given a fourth observation. The group assignment for each observation is completely random.
Alternatively, suppose you want each consecutive set of G observations to contribute exactly one randomly chosen observation to each group, where G is the number of groups. This is often desired when the total number of observations is not known in advance. In this example, if you did not know how many observations you would end up with, you might randomly assign the first three observations one to each group, and then do the same for each subsequent set of three observations as they become available. To do this, specify a sufficiently large value for SET in the PROC PLAN step above and omit the DATA and SORT steps that follow it.
Suppose you have twelve subjects and want to assign them to three groups. You expect an unknown number of additional subjects to become available, and they will also need to be randomly assigned. When all observations are collected, you want the group sizes to be as equal as possible. The following statements produce random assignments, in sets of three, for up to 10 × 3 = 30 observations. As subjects become available after the 12th, they can be assigned to groups according to the plan. Within each additional set of three observations, one observation is randomly assigned to each group.
data one;
   do id=1 to 12;
      output;
   end;
run;

proc plan seed=58349;
   factors set=10 group=3 / noprint;
   output out=a;
run;

data RandomGroups;
   merge one a;
run;

proc print;
   id id;
   var group;
run;
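One way to confirm that the plan assigns one observation to each group within every set of three is to cross-tabulate SET by GROUP in the plan data set A. This check is a sketch added for illustration and is not part of the original note. Every cell of the table should contain a count of 1.

```sas
/* Sketch: each SET should contain each GROUP value exactly once. */
proc freq data=a;
   tables set*group / norow nocol nopercent;
run;
```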
If more than 30 observations become available, simply run PROC PLAN again to generate more randomized sets of three.
proc plan seed=39352;
   factors set=10 group=3 / noprint;
   output out=a;
run;

proc print noobs;
   var group;
run;
Product Family: | SAS System |
Product: | SAS/STAT |
System: | z/OS; OpenVMS VAX; Microsoft® Windows® for 64-Bit Itanium-based Systems; Microsoft Windows Server 2003 Datacenter 64-bit Edition; Microsoft Windows Server 2003 Enterprise 64-bit Edition; Microsoft Windows XP 64-bit Edition; Microsoft® Windows® for x64; OS/2; Microsoft Windows 95/98; Microsoft Windows 2000 Advanced Server; Microsoft Windows 2000 Datacenter Server; Microsoft Windows 2000 Server; Microsoft Windows 2000 Professional; Microsoft Windows NT Workstation; Microsoft Windows Server 2003 Datacenter Edition; Microsoft Windows Server 2003 Enterprise Edition; Microsoft Windows Server 2003 Standard Edition; Microsoft Windows Server 2008; Microsoft Windows XP Professional; Windows Millennium Edition (Me); Windows Vista; 64-bit Enabled AIX; 64-bit Enabled HP-UX; 64-bit Enabled Solaris; ABI+ for Intel Architecture; AIX; HP-UX; HP-UX IPF; IRIX; Linux; Linux for x64; Linux on Itanium; OpenVMS Alpha; OpenVMS on HP Integrity; Solaris; Solaris for x64; Tru64 UNIX |
Type: | Usage Note |
Priority: | |
Topic: | Analytics SAS Reference ==> Procedures ==> SURVEYSELECT SAS Reference ==> Procedures ==> PLAN |
Date Modified: | 2009-07-16 11:01:51 |
Date Created: | 2009-06-26 13:29:38 |