Sample 24760: Stratified random sample without replacement, equal allocation
Select a fixed number of observations from each of several
groups (strata), with no observation chosen more than once.
Note:
Method 1 uses PROC SURVEYSELECT, which is part of the
SAS/STAT package in Version 7 and above. If you do not have
SAS/STAT licensed, or if you are running Version 6 of SAS,
see Methods 2 and 3.
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
/* Create a sample data set of student grade point averages from */
/* East High School, Grades 9 through 12, with 100 to 300 students */
/* per grade. */
data EastHigh;
   format GPA 3.1;
   do Grade=9 to 12;
      do StudentID=1 to 100+int(201*ranuni(432098));
         GPA=2.0 + (2.1*ranuni(34280));
         output;
      end;
   end;
run;
/* Method 1: Using PROC SURVEYSELECT */
/* */
/* METHOD=SRS requests simple random sampling without replacement. */
/* N= is the number of observations to select from each group. */
/* The STRATA statement names the variable that is used for */
/* grouping; the input data set must be sorted by that variable */
/* (EastHigh already is). OUT= names the SAS data set that the */
/* procedure creates. */
proc surveyselect data=EastHigh method=srs n=5 out=sample;
   strata Grade;
run;
title 'Method 1: PROC SURVEYSELECT';
proc print data=sample;
run;
/* Method 2: Using Base SAS */
/* */
/* Generate a random number for each observation in the data set. */
/* Sort the data set, adding the new variable to the end of the BY */
/* statement, so that observations are in random order within each */
/* BY group. In the next DATA step, reset a counter every time the */
/* sampling group changes, and output observations until the counter */
/* reaches the number that you want in each sample group. */
data temp;
   set EastHigh;
   abc=ranuni(15243);
run;
proc sort data=temp;
   by grade abc;
run;
data sample;
   set temp;
   by grade;
   if first.grade then count=0;
   count+1;
   if count <=5 then output;
   drop count abc;
run;
title "Method 2: Using BASE SAS with sort on random variable ";
proc print data=sample;
run;
/* Method 3: Using Base SAS, no extra sort required */
/* */
/* Because the sample is built by choosing a small number of */
/* observations from each subgroup of the data, you need to do */
/* the following: */
/* - count the observations in each category */
/* - make sure the observations are sorted by category */
/*   (EastHigh is already in Grade order, so no sort is run here) */
/* - combine the sorted data set with the category counts */
/* - select the observations for the sample. */
/* Use PROC FREQ to count the number of observations in each */
/* category. */
proc freq data=EastHigh;
   tables grade / out=bycount noprint;
run;
/* Combine the data sets by GRADE */
data sample(drop=k count);
   merge EastHigh bycount(drop=percent);
   by Grade;
   /* Use a RETAIN statement to create a variable that will contain */
   /* the number of observations that you still want from each */
   /* category. Initialize the new variable to that number each time */
   /* the category changes by testing FIRST.GRADE. */
   retain k;
   if first.grade then k=5;
   /* COUNT, the number of observations in each category, comes from */
   /* the PROC FREQ step. BYCOUNT has one row per grade, so COUNT is */
   /* read once per BY group and keeps its value between iterations. */
   /* Select the current observation with probability K/COUNT (the */
   /* number still needed divided by the number remaining), then */
   /* decrease COUNT by 1 on each iteration of the DATA step to */
   /* reflect the number of observations remaining in the group. */
   if ranuni(15243)<=k/count then do;
      output;
      k=k-1;
   end;
   count=count-1;
run;
title "Method 3: Using BASE SAS with no extra sort on random variable ";
proc print data=sample;
run;
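Method 3 is a form of what is often called selection sampling: each observation is kept with probability k/count, where k is the number still needed and count is the number of observations remaining in the group. The short derivation below is an illustrative sketch, not part of the original sample. It checks that the first two observations of a group have the same overall selection probability. The symbols n (sample size per group), N (group size), and p_1, p_2 (selection probabilities of the first and second observations) are notation introduced here.

```latex
\begin{align*}
p_1 &= \frac{n}{N},\\[4pt]
p_2 &= \frac{n}{N}\cdot\frac{n-1}{N-1}
      + \left(1-\frac{n}{N}\right)\cdot\frac{n}{N-1}
     = \frac{n(n-1) + (N-n)\,n}{N(N-1)}
     = \frac{n(N-1)}{N(N-1)}
     = \frac{n}{N}.
\end{align*}
```

With n=5 and N=120 (the Grade 10 size implied by the sampling weight of 24.0 in the Method 1 output), both probabilities are 5/120, or about 0.041667, which matches the selection probability that PROC SURVEYSELECT reports for that grade. The same argument extends by induction to later observations in the group.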
Method 1: PROC SURVEYSELECT 14:27 Thursday, November 4, 2004 61
Obs Grade GPA StudentID SelectionProb SamplingWeight
1 9 2.4 17 0.017123 58.4
2 9 2.9 19 0.017123 58.4
3 9 3.4 66 0.017123 58.4
4 9 3.4 141 0.017123 58.4
5 9 3.5 291 0.017123 58.4
6 10 3.6 15 0.041667 24.0
7 10 3.7 21 0.041667 24.0
8 10 2.1 32 0.041667 24.0
9 10 3.5 92 0.041667 24.0
10 10 2.8 119 0.041667 24.0
11 11 2.2 24 0.030120 33.2
12 11 3.1 63 0.030120 33.2
13 11 3.5 90 0.030120 33.2
14 11 3.0 96 0.030120 33.2
15 11 2.8 139 0.030120 33.2
16 12 2.2 6 0.022422 44.6
17 12 3.5 23 0.022422 44.6
18 12 3.4 25 0.022422 44.6
19 12 3.1 68 0.022422 44.6
20 12 2.1 114 0.022422 44.6
Method 2: Using BASE SAS with sort on random variable 14:27 Thursday, November 4, 2004 62
Obs GPA Grade StudentID
1 3.0 9 144
2 3.7 9 81
3 2.6 9 9
4 3.3 9 39
5 3.3 9 4
6 2.6 10 11
7 4.1 10 102
8 3.6 10 28
9 2.8 10 108
10 2.6 10 20
11 2.7 11 98
12 3.4 11 91
13 2.2 11 49
14 2.1 11 50
15 3.3 11 97
16 2.3 12 75
17 4.0 12 9
18 4.1 12 195
19 3.0 12 148
20 3.3 12 80
Method 3: Using BASE SAS with no extra sort on random variable 14:27 Thursday, November 4, 2004 63
Obs GPA Grade StudentID
1 3.3 9 4
2 2.6 9 9
3 3.3 9 39
4 3.0 9 144
5 2.5 9 290
6 2.6 10 11
7 2.6 10 20
8 3.6 10 28
9 4.1 10 102
10 2.8 10 108
11 3.7 11 45
12 2.2 11 49
13 3.4 11 91
14 2.7 11 98
15 3.1 11 165
16 2.8 12 7
17 4.0 12 9
18 4.1 12 38
19 2.3 12 75
20 3.0 12 148
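Each listing above shows five students per grade. The sketch below is not part of the original sample. It reruns Method 1 with a fixed seed so that the same selection can be reproduced, then checks the result. It assumes that the EastHigh data set from the program has already been created. The seed value 15243 and the data set names sample_seeded, sample_check, and dups are placeholders chosen for illustration, and the DUPOUT= option assumes a SAS 9 or later release.

```sas
/* Rerun Method 1 with SEED= so that repeated runs select the same */
/* observations. The seed value here is arbitrary.                  */
proc surveyselect data=EastHigh method=srs n=5 seed=15243
                  out=sample_seeded;
   strata Grade;
run;

/* Check 1: count the selected observations in each stratum.       */
/* Each Grade should show a frequency of 5.                         */
title 'Check: observations selected per grade';
proc freq data=sample_seeded;
   tables Grade / nocum nopercent;
run;

/* Check 2: confirm that no student appears twice within a grade.  */
/* NODUPKEY drops rows that repeat a Grade/StudentID key, and       */
/* DUPOUT= collects the dropped rows, so DUPS should be empty.      */
proc sort data=sample_seeded out=sample_check nodupkey dupout=dups;
   by Grade StudentID;
run;
title;
```

If both checks pass, the PROC FREQ table shows a frequency of 5 for each grade and the DUPS data set is empty. The same two checks could be pointed at the SAMPLE data set produced by Method 2 or Method 3.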
| Type: | Sample |
| Topic: | Analytics ==> Survey Sampling and Analysis; SAS Reference ==> Procedures ==> SURVEYSELECT; SAS Reference ==> DATA Step |
| Date Modified: | 2005-12-08 11:34:31 |
| Date Created: | 2004-09-30 14:09:11 |
Operating System and Release Information
| SAS System | SAS/STAT | All | n/a | n/a |
| SAS System | Base SAS | All | n/a | n/a |