In a stratified sampling design, when the sampling fraction is the same in all strata, the simple sample mean ȳ is the same as the stratified estimate of the population mean ȳst. Because PROC SURVEYMEANS assigns a weight of 1 to every observation by default, it computes the mean as ȳ even under a stratified design. To compute ȳst when the sampling fractions differ across strata, you must supply an appropriate weight variable in a WEIGHT statement. If the STRATA statement is specified without a WEIGHT statement, PROC SURVEYMEANS issues the following message:
NOTE: You are using unequal sampling rates in a stratified design but did not
specify a WEIGHT statement. Unless you also specify a WEIGHT statement,
the analysis will assume equal weights for all observations.
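For reference, the two estimators can be written out as follows. This is a sketch in standard sampling notation, and the symbols are assumptions introduced here rather than taken from the note: H strata, N_h population units and n_h sampled units in stratum h, N and n the corresponding totals, and \bar{y}_h the sample mean in stratum h.

```latex
\bar{y} = \frac{1}{n}\sum_{h=1}^{H}\sum_{i=1}^{n_h} y_{hi}
        = \sum_{h=1}^{H}\frac{n_h}{n}\,\bar{y}_h ,
\qquad
\bar{y}_{st} = \sum_{h=1}^{H}\frac{N_h}{N}\,\bar{y}_h .

% If the sampling fraction is the same in every stratum,
% n_h / N_h = n / N, then N_h / N = n_h / n and therefore
\bar{y}_{st} = \sum_{h=1}^{H}\frac{n_h}{n}\,\bar{y}_h = \bar{y}.
```

When the sampling fractions differ, the shares n_h/n and N_h/N no longer agree, and the two estimates generally differ.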
The following example demonstrates the issue using the data from the example titled "Stratified Cluster Sample Design" in the PROC SURVEYMEANS documentation. The data set contains ice cream spending for students sampled from three strata: Grade=7, 8, and 9.
data IceCream;
input Grade Spending @@;
if (Spending < 10) then Group='less';
else Group='more';
datalines;
7 7 7 7 8 12 9 10 7 1 7 10 7 3 8 20 8 19 7 2
7 2 9 15 8 16 7 6 7 6 7 6 9 15 8 17 8 14 9 8
9 8 9 7 7 3 7 12 7 4 9 14 8 18 9 9 7 2 7 1
7 4 7 11 9 8 8 10 8 13 7 2 9 6 9 11 7 2 7 9
;
Based on the total number of students in each stratum, the TOTAL= data set is as follows:
data StudentTotal;
input Grade _total_;
datalines;
7 1824
8 1025
9 1151
;
The sampling rate in each stratum is as follows:
Grade    Sampling Rate
7        20/1824 = 0.011
8         9/1025 = 0.009
9        11/1151 = 0.010
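For ȳst to be an unbiased estimate of the population mean, each observation must be weighted appropriately. In this case, the weight for an observation is the inverse of the sampling rate for the stratum to which the observation belongs, that is, the stratum population size divided by the stratum sample size. Multiplying every weight by the overall sampling rate (40/4000=0.01), which gives the ratio of the overall sampling rate to the stratum sampling rate, only rescales the weights by a constant and yields the same estimated mean. The weights are constructed as follows: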
data IceCream;
set IceCream;
if Grade=7 then Weight=1/(20/1824);
if Grade=8 then Weight=1/(9/1025);
if Grade=9 then Weight=1/(11/1151);
run;
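Typing the rates by hand works for three strata but is easy to get wrong as the design grows. As a minimal sketch, assuming the IceCream and StudentTotal data sets shown above, the same inverse-sampling-rate weights could be derived by joining the per-stratum sample counts to the population totals. The names IceCreamWeighted, SampleSize, and SamplingWeight are hypothetical and chosen only for illustration.

```sas
/* Sketch: derive the weights from the data instead of hard-coding them. */
/* IceCreamWeighted, SampleSize, and SamplingWeight are hypothetical     */
/* names used only for this illustration.                                */
proc sql;
   create table IceCreamWeighted as
   select s.*,
          /* population size / sample size = 1 / sampling rate */
          t._total_ / c.SampleSize as SamplingWeight
   from IceCream as s
        inner join StudentTotal as t
           on s.Grade = t.Grade
        /* number of sampled students in each stratum */
        inner join (select Grade, count(*) as SampleSize
                    from IceCream
                    group by Grade) as c
           on s.Grade = c.Grade;
quit;
```

The resulting variable could then be named in the WEIGHT statement in place of a hand-built weight.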
The following statements estimate the stratified sample mean ȳst:
proc surveymeans data=IceCream total=StudentTotal;
stratum Grade / list;
var Spending Group;
weight Weight;
run;
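As a cross-check on the weighted analysis, the two means can also be computed directly from the stratum means. The following is a sketch rather than part of the documented example; it assumes the IceCream and StudentTotal data sets above, and the names StratumMeans, StratumMean, PopSize, UnweightedMean, and StratifiedMean are hypothetical.

```sas
proc sql;
   /* Per-stratum sample means joined to the stratum population totals */
   create table StratumMeans as
   select s.Grade,
          mean(s.Spending) as StratumMean,
          t._total_        as PopSize
   from IceCream as s
        inner join StudentTotal as t
           on s.Grade = t.Grade
   group by s.Grade, t._total_;

   /* Unweighted mean of all 40 observations */
   select mean(Spending) as UnweightedMean format=8.4
   from IceCream;

   /* Stratum means combined in proportion to stratum population size */
   select sum(PopSize * StratumMean) / sum(PopSize) as StratifiedMean format=8.4
   from StratumMeans;
quit;
```

With the data above, the stratum totals are 100, 139, and 111 for Grades 7, 8, and 9, so the unweighted mean is 350/40 = 8.75, while the stratified mean is (1824(5.00) + 1025(15.44) + 1151(10.09))/4000, or approximately 9.14. The difference arises because Grade 8, which has the highest spending, is sampled at the lowest rate.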
Reference
Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley & Sons.