In a stratified sample, it is possible that some strata might have only one sampling unit. When this happens, PROC SURVEYREG collapses the strata that contain a single sampling unit into a pooled stratum. For more detailed information about stratum collapse, see the section Stratum Collapse.
Suppose that you have the following data:
data Sample;
   input Stratum X Y W;
   datalines;
10 0 0 5
10 1 1 5
11 1 1 10
11 1 2 10
12 3 3 16
33 4 4 45
14 6 7 50
12 3 4 16
;
The variable Stratum is again the stratification variable, the variable X is the independent variable, and the variable Y is the dependent variable. You want to regress Y on X. In the data set Sample, the strata Stratum=33 and Stratum=14 each contain only one observation. By default, PROC SURVEYREG collapses these strata into one pooled stratum in the regression analysis.
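One quick way to see which strata would trigger a collapse is to count the observations in each stratum before you run the regression. The following is a minimal sketch that uses PROC FREQ. The output data set name StratumCounts is an assumption introduced here for illustration, and the check relies on the fact that this design has no clusters, so each observation is one sampling unit.

```sas
/* Count the observations in each stratum of Sample.           */
/* StratumCounts is a hypothetical name used only in this sketch. */
proc freq data=Sample noprint;
   tables Stratum / out=StratumCounts;
run;

/* List the strata that contain a single sampling unit.        */
/* The OUT= data set stores the frequency in the variable COUNT. */
proc print data=StratumCounts;
   where Count = 1;
   var Stratum Count;
run;
```

For the data above, the listing should contain Stratum=14 and Stratum=33, the two strata that the procedure collapses by default.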
To input the finite population correction information, you create the SAS data set StratumTotals:
data StratumTotals;
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15  .
66 70
;
The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals. The data set StratumTotals contains more strata than the data set Sample. Also in the data set StratumTotals, more than one observation contains the stratum totals for Stratum=33:
33 40
33 45
PROC SURVEYREG allows this type of input. The procedure simply ignores strata that are not present in the data set Sample; for the multiple entries of a stratum, the procedure uses the first observation. In this example, Stratum=33 has the stratum total _TOTAL_=40.
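To preview which totals the procedure would end up using, you can mimic the two rules described above yourself. This is only an illustrative sketch, not how PROC SURVEYREG implements the lookup. The data set name TotalsUsed is hypothetical, and the sketch assumes the default EQUALS behavior of PROC SORT, under which NODUPKEY keeps the first observation in input order for each BY value.

```sas
/* Rule 1: for repeated strata, keep only the first entry.       */
/* TotalsUsed is a hypothetical name used only in this sketch.   */
proc sort data=StratumTotals out=TotalsUsed nodupkey;
   by Stratum;
run;

/* Rule 2: ignore strata that do not appear in Sample.           */
proc sql;
   select t.Stratum, t._TOTAL_
      from TotalsUsed as t
      where t.Stratum in (select distinct Stratum from Sample)
      order by t.Stratum;
quit;
```

With the data sets above, the query should return totals of 10, 20, 32, 50, and 40 for strata 10, 11, 12, 14, and 33, respectively; strata 15 and 66 are dropped.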
The following SAS statements perform the regression analysis:
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / list;
   model Y = X;
   weight W;
run;
Output 114.6.1 shows that the input data set contains five strata, two of which are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom is 4 (see the section Denominator Degrees of Freedom).
Output 114.6.1: Summary of Data and Regression
Output 114.6.2 displays the stratification information, including stratum collapse. Under the column Collapsed, the fourth stratum (Stratum=14) and the fifth (Stratum=33) are marked as 'Yes', which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum is 2% (see the section Sampling Rate of the Pooled Stratum from Collapse).
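The 2% figure can be checked by hand. The sketch below assumes that, when stratum totals are supplied, the pooled sampling rate is the pooled sample size divided by the pooled stratum total; the symbols f_0, n_h, and N_h are notation introduced here, and the referenced section gives the procedure's exact definition.

```latex
f_0 \;=\; \frac{n_{14} + n_{33}}{N_{14} + N_{33}}
    \;=\; \frac{1 + 1}{50 + 40}
    \;=\; \frac{2}{90}
    \;\approx\; 0.022
```

That is roughly 2%. The total for Stratum=33 is taken as 40, the first of its two entries in StratumTotals.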
Output 114.6.3 displays the parameter estimates and the tests of the significance of the model effects.
Output 114.6.2: Stratification Information
Output 114.6.3: Parameter Estimates and Effect Tests
Alternatively, if you prefer not to collapse strata with a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement:
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / list nocollapse;
   model Y = X;
   weight W;
run;
Output 114.6.4 does not contain the stratum collapse information displayed in Output 114.6.1, and the denominator degrees of freedom are 3 instead of 4.
Output 114.6.4: Summary of Data and Regression
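The change in the denominator degrees of freedom follows from the number of strata that are counted. As a worked check, assume the default rule for a stratified design without clusters: the number of observations n minus the number of strata H. The subscripted names below are notation introduced here.

```latex
\mathrm{df}_{\text{collapse}}   \;=\; n - H \;=\; 8 - 4 \;=\; 4
\qquad
\mathrm{df}_{\text{nocollapse}} \;=\; n - H \;=\; 8 - 5 \;=\; 3
```

With collapse, strata 14 and 33 count as one pooled stratum, so H = 4; with NOCOLLAPSE, all five strata are counted separately.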
In Output 114.6.5, although the fourth and fifth strata each contain only one observation, no stratum collapse occurs.
Output 114.6.5: Stratification Information
As a result of not collapsing strata, the standard error estimates of the parameters, shown in Output 114.6.6, are different from those in Output 114.6.3, as are the tests of the significance of model effects.
Output 114.6.6: Parameter Estimates and Effect Tests
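If you want to compare the two sets of estimates side by side rather than reading them from the displayed output, one option is to capture the parameter estimates tables with ODS OUTPUT. The following is a sketch under the assumption that the ODS table name is ParameterEstimates; the data set names PE_Collapse and PE_NoCollapse are hypothetical.

```sas
/* Capture the parameter estimates from the default (collapsed) analysis. */
/* PE_Collapse and PE_NoCollapse are hypothetical data set names.         */
ods output ParameterEstimates=PE_Collapse;
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum;
   model Y = X;
   weight W;
run;

/* Capture the parameter estimates from the NOCOLLAPSE analysis. */
ods output ParameterEstimates=PE_NoCollapse;
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / nocollapse;
   model Y = X;
   weight W;
run;

/* Print both tables to compare the standard errors. */
title1 'Parameter Estimates With Stratum Collapse';
proc print data=PE_Collapse;
run;

title1 'Parameter Estimates Without Stratum Collapse';
proc print data=PE_NoCollapse;
run;
```

Printing the full data sets avoids relying on specific column names in the captured tables.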