The SURVEYMEANS Procedure
Suppose that the sample of students was selected using stratified random sampling. In stratified sampling, the study population is divided into nonoverlapping strata, and samples are selected from each stratum independently.
The list of students in this junior high school was stratified by grade, yielding three strata: grades 7, 8, and 9. A simple random sample of students was selected from each grade. Table 13.1 shows the total number of students in each grade.
Table 13.1: Number of Students by Grade
| Grade | Number of Students |
| 7 | 1,824 |
| 8 | 1,025 |
| 9 | 1,151 |
| Total | 4,000 |
A sample of 40 students was selected from the entire student population. Each student selected for the sample was asked how much he or she spends on ice cream per week, on average. The SAS data set named IceCream stores the responses of the 40 students:
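data IceCream;
   /* Read (Grade, Spending) pairs; @@ holds the input line so that */
   /* several pairs can be read from the same line                  */
   input Grade Spending @@;
   /* Flag whether the student spends less than $10 per week */
   if (Spending < 10) then Group='less';
      else Group='more';
   datalines;
7 7 7 7 8 12 9 10 7 1 7 10 7 3 8 20 8 19 7 2
7 2 9 15 8 16 7 6 7 6 7 6 9 15 8 17 8 14 9 8
9 8 9 7 7 3 7 12 7 4 9 14 8 18 9 9 7 2 7 1
7 4 7 11 9 8 8 10 8 13 7 2 9 6 9 11 7 2 7 9
;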
The variable Grade contains a student's grade. The variable Spending contains a student's response on how much was spent per week for ice cream, in dollars. The variable Group is created to indicate whether a student spends at least $10 weekly for ice cream: Group='more' if a student spends at least $10, or Group='less' if a student spends less than $10.
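Before weighting the data, it can help to confirm how many sampled students fall in each stratum, because those counts determine the selection probabilities. A minimal check, assuming the IceCream data set has been created as shown above, might be:

```sas
/* Tabulate the sample by stratum (Grade) and by spending group */
proc freq data=IceCream;
   tables Grade Group;
run;
```

For the data listed above, the Grade table should show 20, 9, and 11 students in grades 7, 8, and 9, respectively.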
To analyze this stratified sample, you need to provide the population totals for each stratum to PROC SURVEYMEANS. The SAS data set named StudentTotals contains the information from Table 13.1:
data StudentTotals;
   input Grade _total_;
   datalines;
7 1824
8 1025
9 1151
;
The variable Grade is the stratum identification variable, and the variable _TOTAL_ contains the total number of students for each stratum. PROC SURVEYMEANS requires you to use the variable name _TOTAL_ for the stratum population totals.
The procedure uses the stratum population totals to adjust variance estimates for the effects of sampling from a finite population. If you do not provide population totals or sampling rates, then the procedure assumes that the proportion of the population in the sample is very small, and the computation does not involve a finite population correction.
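As a sketch of what the finite population correction does, the textbook variance estimator for the mean under stratified simple random sampling is shown here. The symbols are notation introduced only for this illustration: $N_h$ is the population total for stratum $h$, $N$ is the overall population total, $n_h$ is the stratum sample size, and $s_h^2$ is the stratum sample variance. The variance estimator that the procedure actually computes may differ in detail.

```latex
\widehat{V}(\bar{y}_{st}) \;=\; \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^{2}
   \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^{2}}{n_h}
```

The factor $1 - n_h/N_h$ is the finite population correction. When the sampling fraction $n_h/N_h$ is close to zero, the factor is close to 1, which corresponds to the case the procedure assumes when neither totals nor rates are supplied.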
In a stratified sample design, when the sampling rates in the strata are unequal, you need to use sampling weights to reflect this information in order to produce an unbiased mean estimator. In this example, the appropriate sampling weights are the reciprocals of the probabilities of selection. You can use the following DATA step to create the sampling weights:
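data IceCream;
   set IceCream;
   /* Selection probability = stratum sample size / stratum population total */
   if Grade=7 then Prob=20/1824;
   else if Grade=8 then Prob=9/1025;
   else if Grade=9 then Prob=11/1151;
   /* Sampling weight is the reciprocal of the selection probability */
   Weight=1/Prob;
run;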
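One way to sanity-check the weights, offered here as a sketch rather than as part of the original example, is to sum them within each stratum. Each weight equals the stratum total divided by the stratum sample size (91.2 for grade 7, about 113.89 for grade 8, and about 104.64 for grade 9), so the sums should reproduce the stratum totals in Table 13.1.

```sas
/* The sum of the sampling weights in each stratum should match */
/* the stratum population total (1824, 1025, and 1151)          */
proc means data=IceCream n sum;
   class Grade;
   var Weight;
run;
```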
If you use PROC SURVEYSELECT to select your sample, it creates these sampling weights for you.
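A minimal sketch of such a selection follows. The data set StudentList (a sampling frame with one row per student and a Grade variable), the output data set name IceCreamSample, and the seed value are all hypothetical; only the stratum sample sizes of 20, 9, and 11 come from this example.

```sas
/* StudentList is a hypothetical sampling frame: one row per student. */
/* The input must be sorted by the stratification variable.           */
proc sort data=StudentList;
   by Grade;
run;

/* Draw a simple random sample of 20, 9, and 11 students */
/* from grades 7, 8, and 9, respectively                 */
proc surveyselect data=StudentList method=srs
                  n=(20 9 11) seed=12345 out=IceCreamSample;
   strata Grade;
run;
```

In the output data set, the selection probability and the sampling weight are expected to appear as the variables SelectionProb and SamplingWeight; check the PROC SURVEYSELECT documentation for your release before relying on those names.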
The following SAS statements perform the stratified analysis of the survey data:
title1 'Analysis of Ice Cream Spending';
title2 'Stratified Simple Random Sample Design';
proc surveymeans data=IceCream total=StudentTotals;
   strata Grade / list;
   var Spending Group;
   weight Weight;
run;
The PROC SURVEYMEANS statement invokes the procedure. The DATA= option names the SAS data set IceCream as the input data set to be analyzed. The TOTAL= option names the data set StudentTotals as the input data set containing the stratum population totals. Notice that the TOTAL=StudentTotals option is used here instead of the TOTAL=4000 option. In this stratified sample design, the population totals are different for different strata, and so you need to provide them to PROC SURVEYMEANS in a SAS data set.
The STRATA statement identifies the stratification variable Grade. The LIST option in the STRATA statement requests that the procedure display stratum information. The WEIGHT statement tells the procedure that the variable Weight contains the sampling weights.
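The discussion of the finite population correction notes that sampling rates can be supplied in place of population totals. A sketch of that alternative follows, assuming the RATE= option with a _RATE_ variable as described in the PROC SURVEYMEANS documentation. The data set name StudentRates and the variable names SampleSize and PopSize are hypothetical.

```sas
/* Hypothetical data set of stratum sampling rates */
data StudentRates;
   input Grade SampleSize PopSize;
   /* Sampling rate = stratum sample size / stratum population total */
   _rate_ = SampleSize / PopSize;
   keep Grade _rate_;
   datalines;
7 20 1824
8 9 1025
9 11 1151
;

/* Same analysis, with sampling rates instead of population totals */
proc surveymeans data=IceCream rate=StudentRates;
   strata Grade / list;
   var Spending Group;
   weight Weight;
run;
```

Either form gives the procedure the same sampling fractions for the finite population correction, so the choice mostly depends on which quantity you already have on hand.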
Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.