The SURVEYREG Procedure

Example 98.6 Stratum Collapse

In a stratified sample, some strata might contain only one sampling unit. When this happens, PROC SURVEYREG by default collapses the strata that contain a single sampling unit into a pooled stratum. For more information about stratum collapse, see the section Stratum Collapse.

Suppose that you have the following data:

data Sample; 
   input Stratum X Y W; 
   datalines;
10 0 0 5 
10 1 1 5 
11 1 1 10
11 1 2 10
12 3 3 16
33 4 4 45
14 6 7 50
12 3 4 16
;

The variable Stratum is again the stratification variable, the variable X is the independent variable, and the variable Y is the dependent variable. You want to regress Y on X. In the data set Sample, Stratum=33 and Stratum=14 each contain only one observation. By default, PROC SURVEYREG collapses these two strata into one pooled stratum in the regression analysis.
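Before you fit the model, it can help to confirm which strata contain a single observation. The following PROC SQL step is a minimal sketch of such a check. It is not part of the original example, and it only lists the strata in Sample that have exactly one observation:

```sas
/* Sketch: list the strata in Sample that contain exactly one observation. */
proc sql;
   select Stratum, count(*) as NObs
      from Sample
      group by Stratum
      having count(*) = 1;
quit;
```

For the data shown above, the query lists Stratum=14 and Stratum=33.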

To input the finite population correction information, you create the SAS data set StratumTotals:

data StratumTotals; 
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15  .
66 70
;

The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals. The data set StratumTotals contains more strata than the data set Sample: Stratum=15 and Stratum=66 do not appear in Sample. In addition, two observations in StratumTotals supply a stratum total for Stratum=33:

33 40
33 45

PROC SURVEYREG allows this type of input. The procedure ignores strata that are not present in the data set Sample; when a stratum has multiple entries, the procedure uses the first observation. In this example, the procedure therefore uses the stratum total _TOTAL_=40 for Stratum=33.
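To see which totals are effectively used, you can mimic the first-observation rule outside the procedure. The following step is only an illustrative sketch: the output data set name TotalsUsed is hypothetical, and PROC SURVEYREG does not require this preprocessing.

```sas
/* Sketch: keep the first row for each value of Stratum.               */
/* NODUPKEY drops later rows that repeat a BY value. EQUALS keeps rows */
/* with equal BY values in their original relative order, so the row   */
/* retained for Stratum=33 is the one with _TOTAL_=40.                 */
proc sort data=StratumTotals out=TotalsUsed nodupkey equals;
   by Stratum;
run;

proc print data=TotalsUsed noobs;
run;
```

The rows for Stratum=15 and Stratum=66 remain in TotalsUsed. They are harmless, because the procedure ignores strata that do not occur in Sample.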

The following SAS statements perform the regression analysis:

title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum/list;
   model Y=X;
   weight W;
run;

Output 98.6.1 shows that the input data set contains five strata, two of which are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom are 4 (see the section Denominator Degrees of Freedom).
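The value 4 can be checked by hand. This check assumes the rule for a stratified design without clusters, in which the denominator degrees of freedom equal the number of observations minus the number of strata counted after collapse. The symbols n and H are notation introduced here for illustration:

```latex
\[
  \mathrm{df} = n - H = 8 - (3 + 1) = 4
\]
```

The three strata with two observations each count separately, and the pooled stratum counts once. If the two single-unit strata were counted separately instead, H would be 5 and the result would be 3.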

Output 98.6.1: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse

The SURVEYREG Procedure
 
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations 8
Sum of Weights 157.00000
Weighted Mean of Y 4.31210
Weighted Sum of Y 677.00000

Design Summary
Number of Strata 5
Number of Strata Collapsed 2

Fit Statistics
R-Square 0.9564
Root MSE 0.5111
Denominator DF 4
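The Data Summary values can be cross-checked independently of PROC SURVEYREG. The following PROC MEANS step is a sketch of one such check and is not part of the original example. With a WEIGHT statement, the SUMWGT, SUM, and MEAN statistics give the sum of weights, the weighted sum of Y, and the weighted mean of Y:

```sas
/* Sketch: reproduce the Data Summary figures with weighted statistics. */
proc means data=Sample n sumwgt sum mean;
   var Y;
   weight W;
run;
```

Apart from display formatting, the results should agree with the values 8, 157, 677, and 4.31210 in Output 98.6.1.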


Output 98.6.2 displays the stratification information, including stratum collapse. In the Collapsed column, the fourth stratum (Stratum=14) and the fifth stratum (Stratum=33) are marked 'Yes', which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum is 2.22% (see the section Sampling Rate of the Pooled Stratum from Collapse).

Output 98.6.3 displays the parameter estimates and the tests of the significance of the model effects.

Output 98.6.2: Stratification Information

Stratum Information
Stratum Index  Collapsed  Stratum  N Obs  Population Total  Sampling Rate
1                         10       2      10                20.0%
2                         11       2      20                10.0%
3                         12       2      32                6.25%
4              Yes        14       1      50                2.00%
5              Yes        33       1      40                2.50%
0              Pooled              2      90                2.22%

Note: Strata with only one observation are collapsed into the stratum with Stratum Index "0".
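The pooled-stratum row in Output 98.6.2 can be reproduced from the two collapsed rows. This check assumes that the pooled sampling rate is the pooled number of sampled units divided by the pooled population total. The symbols are notation introduced here, with f_0 denoting the pooled rate:

```latex
\[
  f_0 = \frac{n_{14} + n_{33}}{N_{14} + N_{33}}
      = \frac{1 + 1}{50 + 40}
      = \frac{2}{90} \approx 2.22\%
\]
```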



Output 98.6.3: Parameter Estimates and Effect Tests

Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 173.01 0.0002
Intercept 1 0.00 0.9961
X 1 173.01 0.0002

Note: The denominator degrees of freedom for the F tests is 4.


Estimated Regression Coefficients
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 0.00179469 0.34306373 0.01 0.9961
X 1.12598708 0.08560466 13.15 0.0002

Note: The denominator degrees of freedom for the t tests is 4.
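The two tables in Output 98.6.3 are consistent with each other. Each t value is the estimate divided by its standard error, and because the effect X has one numerator degree of freedom, its F value is the square of its t value. A quick check for X, using rounded arithmetic:

```latex
\[
  t = \frac{1.12598708}{0.08560466} \approx 13.153,
  \qquad
  F = t^2 \approx 173.0
\]
```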



Alternatively, if you prefer not to collapse strata with a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement:

title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum/list nocollapse;
   model Y = X;
   weight W;
run;

Output 98.6.4 does not contain the stratum collapse information displayed in Output 98.6.1, and the denominator degrees of freedom are 3 instead of 4.

Output 98.6.4: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse

The SURVEYREG Procedure
 
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations 8
Sum of Weights 157.00000
Weighted Mean of Y 4.31210
Weighted Sum of Y 677.00000

Design Summary
Number of Strata 5

Fit Statistics
R-Square 0.9564
Root MSE 0.5111
Denominator DF 3


In Output 98.6.5, although the fourth stratum and the fifth stratum each contain only one observation, no stratum collapse occurs.

Output 98.6.5: Stratification Information

Stratum Information
Stratum Index  Stratum  N Obs  Population Total  Sampling Rate
1              10       2      10                20.0%
2              11       2      20                10.0%
3              12       2      32                6.25%
4              14       1      50                2.00%
5              33       1      40                2.50%


As a result of not collapsing strata, the standard error estimates of the parameters, shown in Output 98.6.6, are different from those in Output 98.6.3, as are the tests of the significance of model effects.

Output 98.6.6: Parameter Estimates and Effect Tests

Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 347.27 0.0003
Intercept 1 0.00 0.9962
X 1 347.27 0.0003

Note: The denominator degrees of freedom for the F tests is 3.


Estimated Regression Coefficients
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 0.00179469 0.34302581 0.01 0.9962
X 1.12598708 0.06042241 18.64 0.0003

Note: The denominator degrees of freedom for the t tests is 3.
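To compare the standard errors from the two analyses side by side, you can capture the parameter estimate tables as data sets. The following statements are a sketch under two assumptions that you should verify against the ODS Table Names section: that the ODS table is named ParameterEstimates, and that nothing else in your session redirects ODS output. The data set names PE_Collapse and PE_NoCollapse are hypothetical.

```sas
/* Sketch: save the parameter estimates from each analysis.        */
/* Assumption: the ODS table name is ParameterEstimates.           */
ods output ParameterEstimates=PE_Collapse;
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum;                 /* default: single-unit strata are collapsed */
   model Y = X;
   weight W;
run;

ods output ParameterEstimates=PE_NoCollapse;
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / nocollapse;    /* keep single-unit strata separate */
   model Y = X;
   weight W;
run;

/* Print both saved tables for comparison. */
proc print data=PE_Collapse noobs;
run;

proc print data=PE_NoCollapse noobs;
run;
```

The point estimates should be identical in the two data sets. Only the standard errors, t values, and p-values differ, as in Output 98.6.3 and Output 98.6.6.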