Beginning with SAS 9.2, several common allocation methods, including proportional allocation, are available in PROC SURVEYSELECT. Before SAS 9.2, you can achieve the same result by manually allocating the total sample size according to the population stratum proportions. In proportional allocation, the sampling fraction in each stratum equals the overall sampling fraction. That is, $n_h/N_h = n/N$ for every stratum $h$, where $n_h$ and $N_h$ are the sample and population sizes of stratum $h$, and $n$ and $N$ are the total sample and population sizes.
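Written out, proportional allocation sets each stratum's sample size from its share of the population. The second line is the integer version that the manual approach in this note computes, with $N_h/N$ taken from the PERCENT value that PROC FREQ reports divided by 100. The tilde notation for the rounded size is introduced here only for clarity.

```latex
\begin{aligned}
n_h &= n \, \frac{N_h}{N}
  \quad\Longleftrightarrow\quad \frac{n_h}{N_h} = \frac{n}{N}
  && \text{(exact proportional allocation)} \\
\tilde{n}_h &= \operatorname{round}\!\left( n \, \frac{N_h}{N} \right)
  && \text{(integer sample size actually used)}
\end{aligned}
```

Because each $\tilde{n}_h$ is rounded separately, $\sum_h \tilde{n}_h$ can differ slightly from $n$.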
The following example shows the manual approach. Suppose you have a data set of 3000 observations divided among three groups:
```sas
data a;
  do i=1 to 3000;
    /* each RANUNI call returns a new random number from the same stream */
    if ranuni(2345) le .4 then Group=1;
    else if ranuni(2345) ge .8 then Group=2;
    else Group=3;
    output;
  end;
run;
```
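Note that the ELSE IF branch calls RANUNI a second time, so it compares a new uniform draw, not the first one, with 0.8. Working from the code (this derivation is not part of the original note), the expected group proportions are as follows, which agree closely with the realized percentages of 39.67%, 12.5%, and 47.83% for this data set.

```latex
\begin{aligned}
P(\text{Group}=1) &= 0.4 \\
P(\text{Group}=2) &= (1-0.4)\times 0.2 = 0.12 \\
P(\text{Group}=3) &= (1-0.4)\times(1-0.2) = 0.48
\end{aligned}
```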
Suppose further that you want a stratified sample of 10 observations, selected by simple random sampling within each group, such that the group proportions in the sample match the group proportions in the population.
For convenience, set a macro variable to your total sample size.

```sas
%let TotalSampleSize=10;
```
To make the sample stratum proportions equal, or nearly equal, to the population stratum proportions, compute the necessary stratum sample sizes and pass them to PROC SURVEYSELECT. First, determine the population stratum proportions and save them to a data set.
```sas
title "Group distribution in the population";
proc freq data=a;
  /* OUT= saves one row per Group with its COUNT and PERCENT */
  tables Group / out=freqout;
run;
title;
```
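The results in this example are as follows:

The FREQ Procedure

| Group | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 1190 | 39.67 | 1190 | 39.67 |
| 2 | 375 | 12.50 | 1565 | 52.17 |
| 3 | 1435 | 47.83 | 3000 | 100.00 |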
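For the sample to retain this distribution, you want 39.67% of the sample to be in Group 1, 12.5% in Group 2, and 47.83% in Group 3. With a total sample size of 10, the stratum sample sizes are therefore:

- Group 1: 0.3967 × 10 = 3.967, which rounds to 4
- Group 2: 0.125 × 10 = 1.25, which rounds to 1
- Group 3: 0.4783 × 10 = 4.783, which rounds to 5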
Because the sample sizes must be integers, use a rounding or truncating function when computing them. The following DATA step computes the proportionally allocated stratum sample sizes and stores them in the variable _NSIZE_, creating the SAMPSIZE= input data set for PROC SURVEYSELECT.
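```sas
data sampsize;
  set freqout;
  /* population share of the stratum times the total sample size,
     rounded to an integer; PROC SURVEYSELECT reads the stratum
     sizes from the _NSIZE_ variable of a SAMPSIZE= data set */
  _nsize_=round((Percent/100) * &TotalSampleSize);
run;
```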
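Because each stratum size is rounded independently, it can be worth confirming that the sizes still add up to the requested total before you select the sample. The following DATA step is a small sketch of such a check, not part of the original note; it assumes the SAMPSIZE data set and the &TotalSampleSize macro variable created above and writes the result to the log.

```sas
/* Sum the rounded stratum sizes and compare with the requested total */
data _null_;
  set sampsize end=last;
  total + _nsize_;              /* running sum of rounded stratum sizes */
  if last then do;
    if total ne &TotalSampleSize then
      put "NOTE: rounded allocation totals " total
          "rather than &TotalSampleSize..";
    else put "NOTE: rounded allocation totals " total ".";
  end;
run;
```

For the Group example, the rounded sizes 4, 1, and 5 add up to 10, so the totals agree.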
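Sort the DATA= data set by the stratum variable if it is not already sorted. For stratified sampling, PROC SURVEYSELECT requires the DATA= data set to be sorted by the STRATA variables, and the strata in the SAMPSIZE= data set (or value list) must appear in the same order. FREQOUT, and therefore SAMPSIZE, is already in ascending Group order because PROC FREQ lists the levels in sorted order by default, so only the DATA= data set needs sorting.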
```sas
proc sort data=a;
  by group;
run;
```
Finally, select the sample.
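```sas
/* No METHOD= option is given, so PROC SURVEYSELECT uses its default,
   simple random sampling, within each stratum */
proc surveyselect data=a out=sample sampsize=sampsize;
  strata Group;
run;
```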
If you want to verify the distribution of the sample, use PROC FREQ.
```sas
title "Group distribution in the sample";
proc freq data=sample;
  tables Group;
run;
title;
```
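The desired result is achieved:

The FREQ Procedure

| Group | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 4 | 40.00 | 4 | 40.00 |
| 2 | 1 | 10.00 | 5 | 50.00 |
| 3 | 5 | 50.00 | 10 | 100.00 |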
The overall sampling fraction of 10/3000 = 0.0033 is retained as closely as possible in each stratum: the stratum sampling fractions are 4/1190 = 0.0034, 1/375 = 0.0027, and 5/1435 = 0.0035.
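In SAS 9.2 and later, PROC SURVEYSELECT can compute the proportional allocation itself through the ALLOC=PROP option in the STRATA statement, which makes the PROC FREQ and SAMPSIZE DATA steps unnecessary; SAMPSIZE= then gives the total sample size rather than a data set of stratum sizes. The following is a minimal sketch, assuming data set A and the &TotalSampleSize macro variable from the example above; the output data set name SAMPLE_PROP and the SEED= value are arbitrary choices for illustration. The procedure rounds the stratum sizes to integers itself; see the SURVEYSELECT documentation for the exact rounding rule.

```sas
/* Proportional allocation done by PROC SURVEYSELECT (SAS 9.2 and later).
   Assumes data set A and &TotalSampleSize from the example above. */
proc sort data=a;                            /* STRATA requires DATA= sorted */
  by Group;
run;

proc surveyselect data=a out=sample_prop
                  sampsize=&TotalSampleSize  /* total n across all strata */
                  seed=2345;                 /* arbitrary seed, for repeatability */
  strata Group / alloc=prop;                 /* n_h proportional to N_h */
run;
```

For the two-variable example that follows, the corresponding STRATA statement would be `strata Gender AgeGroup / alloc=prop;` after sorting by both variables.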
The manual approach extends easily to two or more stratification variables. For example, suppose you want a sample of 25 observations from the following data set of 500 observations, such that the proportion of each Gender-by-AgeGroup combination (males and females in each of three age groups) matches its proportion in the population.
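```sas
data b;
  do id=1 to 500;
    /* CEIL(2*RANUNI()) is 1 or 2; SCAN's default delimiters include
       the comma and blank, so the words are 'male' and 'female' */
    Gender=scan('male, female', ceil(2*ranuni(430982)));
    /* RANTBL returns 1, 2, or 3 with probabilities 0.20, 0.50, 0.30 */
    AgeGroup=scan('18-25,25-50,Over 50', rantbl(839247,0.20,0.50,0.30),',');
    output;
  end;
run;

%let TotalSampSize = 25;

title "Gender and Age Group distribution in the population";
proc freq data=b;
  tables Gender*AgeGroup / out=freqout;
run;
title;
```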
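Then compute the stratum sample sizes, sort the data by both stratification variables, select the sample, and check its distribution, exactly as in the first example: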
```sas
data sampsize;
  set freqout;
  _nsize_=round((Percent/100) * &TotalSampSize);
run;

proc sort data=b;
  by Gender AgeGroup;
run;

proc surveyselect data=b out=sample sampsize=sampsize noprint;
  strata Gender AgeGroup;
run;

title "Gender and Age Group distribution in the sample";
proc freq data=sample;
  tables Gender * AgeGroup;
run;
title;
```
Notice that the final sample size in this example is 26 rather than the intended 25. This is caused by the rounding in the _NSIZE_ computation. Before rounding, the _NSIZE_ values give stratum sampling fractions exactly equal to the overall sampling fraction; after each value is rounded to the nearest integer, their sum can differ from the requested total. Because integer sample sizes are required, the final sample size is close to, but not necessarily equal to, the specified size, while each stratum sampling fraction stays as close as possible to the overall sampling fraction.
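If the total must be exactly 25, one option is largest-remainder rounding, a different technique from the ROUND() call used in this note: round every unrounded allocation down, then give one extra unit to each of the strata with the largest fractional remainders until the requested total is reached. (Rounding each of the six Gender-by-AgeGroup sizes separately moves each one by at most 0.5, so the total can drift from 25 by as much as 3.) The sketch below is a hypothetical replacement for the SAMPSIZE DATA step, assuming FREQOUT and &TotalSampSize from the example above; the data set ALLOC and the variables EXACT and REMAIN are names introduced here. Like the original step, it can assign a size of 0 to a very small stratum.

```sas
/* Hypothetical largest-remainder allocation; assumes FREQOUT and
   &TotalSampSize from the Gender*AgeGroup example */
data alloc;
  set freqout;
  exact   = (Percent/100) * &TotalSampSize;  /* unrounded allocation */
  _nsize_ = floor(exact);                    /* round every stratum down */
  remain  = exact - _nsize_;                 /* fractional part left over */
run;

/* Number of units still unallocated after rounding down */
data _null_;
  set alloc end=last;
  total + _nsize_;
  if last then call symputx('extra', &TotalSampSize - total);
run;

/* Give one extra unit to the strata with the largest remainders */
proc sort data=alloc;
  by descending remain;
run;

data alloc;
  set alloc;
  if _n_ <= &extra then _nsize_ = _nsize_ + 1;
run;

/* Restore strata order to match the sorted DATA= data set */
proc sort data=alloc out=sampsize(keep=Gender AgeGroup _nsize_);
  by Gender AgeGroup;
run;
```

After this runs, the PROC SORT and PROC SURVEYSELECT steps of the example can be rerun unchanged.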
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
|---|---|---|---|---|
| SAS System | SAS/STAT | All | n/a | |
| Type: | Usage Note |
| Priority: | low |
| Topic: | Analytics ==> Survey Sampling and Analysis SAS Reference ==> Procedures ==> SURVEYSELECT |
| Date Modified: | 2017-05-18 15:49:01 |
| Date Created: | 2007-03-12 15:27:38 |


