The REPS= (or REP=) option in PROC SURVEYSELECT selects independent samples (replicates) using the same sample design, so each replicate is drawn from the same original frame or input data set. As a result, the same unit can appear in more than one replicate.
To create mutually exclusive (nonoverlapping) samples with PROC SURVEYSELECT, first select a single sample, without replacement, whose size equals the total number of units needed across all subsamples. Then randomly divide this sample into distinct subsamples.
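To see why REPS= alone does not guarantee distinct samples, a minimal sketch along the following lines can list any units that land in more than one replicate. The data set names FRAME and REPS are hypothetical and chosen only for illustration; the Replicate variable is the one PROC SURVEYSELECT adds to the OUT= data set when REPS= is specified. Whether any overlap actually occurs depends on the seed and the sample sizes, so the query may return no rows.

```sas
/* Hypothetical frame of 1,000 units (illustration only) */
data frame;
   do x=1 to 1000;
      output;
   end;
run;

/* Five independent replicates of 10 units each, all drawn from the same frame */
proc surveyselect data=frame out=reps method=srs sampsize=10 reps=5
                  seed=48922 noprint;
run;

/* List units that were selected in more than one replicate, if any */
proc sql;
   select x, count(distinct Replicate) as n_replicates
      from reps
      group by x
      having count(distinct Replicate) > 1;
quit;
```

Within a single replicate METHOD=SRS never repeats a unit; the possible duplication is across replicates only.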
Example
Suppose you want to select 5 subsamples of 10 observations from the following data set of 1,000 observations:
data a;
   do x=1 to 1000;
      output;
   end;
run;
Use PROC SURVEYSELECT to randomly select 5 subsamples × 10 observations per subsample = 50 observations. SAMPSIZE= specifies the sample size. METHOD=SRS, the default, requests a simple random sample without replacement so that no observation is selected more than once. The OUT= data set contains the selected sample. The SEED= option is specified to allow the results of this example to be reproduced.
proc surveyselect data=a out=samples method=srs sampsize=50 seed=48922 noprint;
run;
To randomly assign each of the selected observations to one of five subsamples, start by adding a uniform random number to the data set using the RANUNI function. Specifying a fixed positive seed in RANUNI allows the results of this example to be reproduced.
data samples;
set samples;
random=ranuni(93821);
run;
Sorting the sample by the random numbers randomizes the order of the observations.
proc sort data=samples;
by random;
run;
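Using the CEIL function and the automatic variable _N_, this DATA step creates the variable SampleID, which identifies the subsample number for each observation: observations 1 through 10 receive SampleID=1, observations 11 through 20 receive SampleID=2, and so on. The data set now contains the desired number of random samples from the original data set, with no observation appearing in more than one sample.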
data samples;
   set samples;
   SampleID=ceil(_n_/10);
   drop random;
run;
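As a quick sanity check on the result, something like the following could be run after the steps above. It is only a sketch and assumes the SAMPLES data set and the variables X and SampleID exactly as created in this example.

```sas
/* Each SampleID value should account for exactly 10 observations */
proc freq data=samples;
   tables SampleID / nocum nopercent;
run;

/* If no unit is repeated, the number of rows equals the number of distinct X values */
proc sql;
   select count(*)          as n_rows,
          count(distinct x) as n_distinct_x
      from samples;
quit;
```

Because the 50 observations were drawn without replacement and X is unique in the input data set, the two counts should agree.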
The following table displays the X values contained in the five samples in side-by-side columns.
| 818 | 542 | 691 | 571 | 871 |
| 888 | 405 | 883 | 25 | 192 |
| 551 | 764 | 703 | 313 | 195 |
| 287 | 228 | 569 | 196 | 358 |
| 57 | 244 | 185 | 803 | 53 |
| 443 | 249 | 601 | 714 | 503 |
| 676 | 217 | 648 | 635 | 783 |
| 379 | 944 | 775 | 729 | 282 |
| 234 | 23 | 178 | 751 | 693 |
| 830 | 613 | 984 | 547 | 726 |
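One way to produce a side-by-side layout like this is to number the observations within each subsample and transpose. The sketch below assumes the SAMPLES data set from this example, still in SampleID order; the names NUMBERED, WIDE, and Row are hypothetical and introduced only for illustration.

```sas
/* Number the observations 1-10 within each subsample */
data numbered;
   set samples;
   by SampleID;
   if first.SampleID then Row=0;
   Row+1;
run;

/* Transposing BY Row requires the data to be sorted by Row */
proc sort data=numbered;
   by Row SampleID;
run;

/* One output row per Row value, one column per subsample (Sample1-Sample5) */
proc transpose data=numbered out=wide(drop=_name_) prefix=Sample;
   by Row;
   id SampleID;
   var x;
run;

proc print data=wide noobs;
run;
```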
Operating System and Release Information
| SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
| OpenVMS VAX | | |
| Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
| z/OS | | |
| Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
| Microsoft Windows XP 64-bit Edition | | |
| Microsoft® Windows® for x64 | | |
| OS/2 | | |
| Microsoft Windows 95/98 | | |
| Microsoft Windows 2000 Advanced Server | | |
| Microsoft Windows 2000 Datacenter Server | | |
| Microsoft Windows 2000 Server | | |
| Microsoft Windows 2000 Professional | | |
| Microsoft Windows NT Workstation | | |
| Microsoft Windows Server 2003 Datacenter Edition | | |
| Microsoft Windows Server 2003 Enterprise Edition | | |
| Microsoft Windows Server 2003 Standard Edition | | |
| Microsoft Windows Server 2008 | | |
| Microsoft Windows XP Professional | | |
| Windows Millennium Edition (Me) | | |
| Windows Vista | | |
| 64-bit Enabled AIX | | |
| 64-bit Enabled HP-UX | | |
| 64-bit Enabled Solaris | | |
| ABI+ for Intel Architecture | | |
| AIX | | |
| HP-UX | | |
| HP-UX IPF | | |
| IRIX | | |
| Linux | | |
| Linux for x64 | | |
| Linux on Itanium | | |
| OpenVMS Alpha | | |
| OpenVMS on HP Integrity | | |
| Solaris | | |
| Solaris for x64 | | |
| Tru64 UNIX | | |