36383 - Randomly assign the observations in a data set to two or more groups

Usage Note 36383: Randomly assign the observations in a data set to two or more groups

SAS® 9.4 TS1M1 or later

Beginning with SAS/STAT® 13.1 in SAS 9.4 TS1M1, the GROUPS= option in the PROC SURVEYSELECT statement randomly assigns observations to groups. If you specify a number of groups, the observations are divided among the groups as evenly as possible. You can also specify a list of group sizes in the GROUPS= option to create groups of different sizes.

For example, suppose you want to divide the ten observations in the following data set into three groups.

      data one;
        do x=1 to 10;
          output;
        end;
        run;

Specifying the GROUPS=3 option in PROC SURVEYSELECT divides the ten observations into three groups as evenly as possible. The results of this example can be reproduced by specifying the same value in the SEED= option.

      proc surveyselect data=one groups=3 seed=49201 out=RandomGroups noprint;
        run;
      proc freq data=RandomGroups;
        tables GroupID;
        run;

Group ID Number
GroupID	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	3	30.00	3	30.00
2	3	30.00	6	60.00
3	4	40.00	10	100.00
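
The GROUPS= option also accepts a list of group sizes, as noted above. The following is a minimal sketch rather than part of the original example: the sizes 2, 3, and 5 and the output data set name UnequalGroups are chosen only for illustration, and the sketch assumes that the listed sizes must sum to the number of observations in the input data set (ten here).

```sas
      /* Illustrative sketch: the group sizes 2, 3, and 5 are assumed values */
      proc surveyselect data=one groups=(2 3 5) seed=49201 out=UnequalGroups noprint;
        run;
      /* Check the resulting group sizes */
      proc freq data=UnequalGroups;
        tables GroupID;
        run;
```

As in the equal-size example, the assigned group is stored in the variable GroupID.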

Releases before SAS® 9.4 TS1M1

Prior to SAS/STAT 13.1, you can use PROC SURVEYSELECT to randomly divide a data set into two groups as described in this note. For more than two groups, you can use PROC PLAN to randomly assign each observation to a group such that the groups are of equal size, or as equal as possible when the data set is not evenly divisible by the number of groups.

For example, suppose you want to divide the ten observations in data set ONE (above) into three groups. The following statements create data set A, which consists of four sets of three observations. Each set contains a random arrangement of the values 1, 2, and 3. Since three groups are desired, specify GROUP=3. To accommodate ten observations, you need four sets (4 × 3 = 12 observations, because three sets would supply only nine), so specify SET=4. The results of this example can be reproduced by specifying the same value in the SEED= option.

      proc plan seed=4233;
        factors set=4 group=3 / noprint;
        output out=a;
        run;

In the following DATA step, the RANUNI function adds a random number between 0 and 1 to each observation. Again, using the same seed reproduces the results of this example. The IF statement stops the DATA step after ten observations, which drops the two extra observations created by PROC PLAN.

      data a; 
        set a; 
        random=ranuni(2342);
        if _n_>10 then stop;
        run;

Sorting by the random variable randomizes the group numbers across the entire data set.

      proc sort data=a; 
        by random;
        run;

The final data set consisting of the ten observations with assigned group numbers is created by merging the randomized data set of group numbers with the original data set.

      data RandomGroups;
        merge one a;
        run;
      proc print;
        id x;
        var group;
        run;

x	group
1	1
2	2
3	2
4	3
5	2
6	3
7	1
8	3
9	2
10	1

PROC FREQ can be used to verify the sizes of the groups.

      proc freq data=RandomGroups;
        tables group;
        run;

group	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	3	30.00	3	30.00
2	4	40.00	7	70.00
3	3	30.00	10	100.00

Note that groups 1 and 3 each have three observations and group 2 was randomly given a fourth observation. The group assignment for each observation is completely random.
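
For the two-group case in releases before SAS/STAT 13.1, one common approach is to select a simple random sample of half the observations and keep the unselected observations in the output as well. The sketch below assumes this approach, which may differ in detail from the note referenced at the start of this section. The seed value and the data set name TwoGroups are illustrative.

```sas
      /* OUTALL keeps every observation and adds the 0/1 indicator variable Selected */
      proc surveyselect data=one method=srs samprate=0.5 outall
                        seed=12345 out=TwoGroups noprint;
        run;
      /* Check the sizes of the two groups */
      proc freq data=TwoGroups;
        tables Selected;
        run;
```

Observations with Selected=1 form one group, and observations with Selected=0 form the other.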

Unknown Number of Observations

Suppose you want one observation from each consecutive set of G observations to be randomly assigned to each group, where G is the number of groups. This is often desired when the total number of observations is not known in advance. In this example, if you did not know how many observations you would end up with, you might randomly assign the first three observations one to each group, and then do the same for each subsequent set of three observations as they become available. To do this, specify a sufficiently large value for SET in the PROC PLAN step above and omit the DATA and SORT steps that follow it.

Suppose you have twelve subjects and want to assign them to three groups. You expect an unknown number of additional subjects to become available that will also need to be randomly assigned. When all observations are collected, you want the group sizes to be as equal as possible. The following statements produce random assignments, in sets of three, for up to 10 × 3 = 30 observations. As subjects become available after the 12th, they can be assigned to groups according to the plan. Each additional set of three observations is randomly assigned, one to each group.

      data one;
        do id=1 to 12;
          output;
        end;
        run;
      proc plan seed=58349;
        factors set=10 group=3 / noprint;
        output out=a;
        run;
      data RandomGroups;
        merge one a;
        run;
      proc print;
        id id;
        var group;
        run;
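
To confirm that every set in the plan assigns exactly one observation to each group, you can cross-tabulate the two factors in the PROC PLAN output. This check is an added sketch, not part of the original note. It uses the data set A created by the PROC PLAN step above.

```sas
      /* Each SET-by-GROUP cell should show a frequency of 1 */
      proc freq data=a;
        tables set*group / norow nocol nopercent;
        run;
```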

If more than 30 observations become available, run PROC PLAN again with a different seed to generate more randomized sets of three.

      proc plan seed=39352;
        factors set=10 group=3 / noprint;
        output out=a;
        run;
      proc print noobs;
        var group;
        run;

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2008
		Microsoft Windows XP Professional
		Windows Millennium Edition (Me)
		Windows Vista
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics; SAS Reference ==> Procedures ==> SURVEYSELECT; SAS Reference ==> Procedures ==> PLAN

Date Modified:	2009-07-16 11:01:51
Date Created:	2009-06-26 13:29:38
