The SURVEYSELECT Procedure

Specifying the Margin of Error

Instead of specifying the total sample size to allocate among the strata, you can specify the desired margin of error for estimating the overall mean from the stratified sample. Based on the requested allocation method and the stratum variances that you provide, PROC SURVEYSELECT computes the stratum sample sizes that are required to achieve this margin of error. You specify the margin of error in the MARGIN= option in the STRATA statement, and you provide stratum variances in the VAR= option. You can use the MARGIN= option with any allocation method (proportional, optimal, or Neyman) or with allocation proportions that you provide (ALLOC=(values) or ALLOC=SAS-data-set).

The margin of error e is the half-width of the $100(1-\alpha )$% confidence interval for the overall mean based on the stratified sample,

\[  e = z_{\alpha /2} \times \sqrt {\mr{Var}(\bar{y}_{\mi{str}})}  \]

where $\mr{Var}(\bar{y}_{\mi{str}})$ is the variance of the estimate of the mean from the stratified sample and $z_{\alpha /2}$ is the $100(1-\alpha /2)$th percentile of the standard normal distribution. You can specify the value of $\alpha $ in the ALPHA= option in the STRATA statement. By default, PROC SURVEYSELECT uses a 95% confidence interval (ALPHA=0.05).
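As a rough numerical illustration of this definition, the following Python sketch computes $z_{\alpha /2}$ and the resulting margin of error from a given variance of the stratified mean. The function names `z_critical` and `margin_of_error` are hypothetical helpers introduced here for illustration; they are not part of PROC SURVEYSELECT.

```python
from statistics import NormalDist

def z_critical(alpha=0.05):
    """Return z_{alpha/2}, the 100(1 - alpha/2)th percentile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(var_mean, alpha=0.05):
    """Half-width of the 100(1 - alpha)% confidence interval for the mean.

    var_mean is Var(y_bar_str), the variance of the stratified mean estimate.
    """
    return z_critical(alpha) * var_mean ** 0.5
```

With the default ALPHA=0.05, `z_critical()` is approximately 1.96, so the margin of error is roughly 1.96 standard errors of the estimated mean.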

For the specified margin of error e, PROC SURVEYSELECT computes the target stratum sample sizes $n_ h^{*}$ for without-replacement selection methods as

\[  n_ h^{*} = f_ h^{*} ~  \left( \sum _{i=1}^ H{ N_ i^2 S_ i^2 / f_ i^{*} } \right) ~  / ~  \left( ( e N / z_{\alpha /2} )^2 + \sum _{i=1}^ H{ N_ i S_ i^2 } \right)  \]

where $N_ i$ is the number of sampling units in stratum i, $S_ i^2$ is the variance within stratum i, N is the total number of sampling units for all strata, and H is the total number of strata.

The values of $f_ h^{*}$ are the stratum allocation proportions, which PROC SURVEYSELECT computes according to the allocation method that you request. For more information, see the sections Proportional Allocation, Optimal Allocation, and Neyman Allocation.

For with-replacement selection methods, PROC SURVEYSELECT computes the target stratum sample sizes as

\[  n_ h^{*} = f_ h^{*} ~  \left( \sum _{i=1}^ H{ N_ i^2 S_ i^2 / f_ i^{*} } \right) ~  / ~  \left( e N / z_{\alpha /2} \right)^2  \]

For more information, see Lohr (2010, p. 91), Cochran (1977, Chapter 5), and Arkin (1984, Chapter 10).
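The two target sample size formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure's implementation: `target_sizes` is a hypothetical helper, the three-stratum population is made up, and the allocation proportions are taken to be proportional to stratum size purely as an example.

```python
from statistics import NormalDist

def target_sizes(N_h, S2_h, f_h, margin, alpha=0.05, with_replacement=False):
    """Target (possibly fractional) stratum sample sizes n_h* for a margin of error.

    N_h  : stratum sizes N_i
    S2_h : stratum variances S_i^2
    f_h  : allocation proportions f_i* (assumed positive and summing to 1)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    N = sum(N_h)
    numer = sum(Ni**2 * Si2 / fi for Ni, Si2, fi in zip(N_h, S2_h, f_h))
    denom = (margin * N / z) ** 2
    if not with_replacement:
        # Without replacement, the denominator gains the sum of N_i * S_i^2.
        denom += sum(Ni * Si2 for Ni, Si2 in zip(N_h, S2_h))
    return [fh * numer / denom for fh in f_h]

# Hypothetical population; proportions proportional to stratum size.
N_h = [1000, 2000, 3000]
S2_h = [25.0, 16.0, 9.0]
f_h = [Ni / sum(N_h) for Ni in N_h]

wor = target_sizes(N_h, S2_h, f_h, margin=0.5)
wr = target_sizes(N_h, S2_h, f_h, margin=0.5, with_replacement=True)
```

Because the without-replacement denominator is larger, each without-replacement target is smaller than the corresponding with-replacement target for the same margin of error.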

The target sample size values $n_ h^{*}$ might not be integers, but the stratum sample sizes are required to be integers. PROC SURVEYSELECT rounds all fractional target sample sizes up to integer sample sizes. If you specify a minimum stratum sample size $n_{\mi{min}}$ in the ALLOCMIN= option in the STRATA statement, then all stratum sample sizes $n_ h$ are required to be at least $n_{\mi{min}}$.
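A minimal Python sketch of this rounding step, assuming the targets have already been computed, might look like the following. The helper name `integer_sizes` and its `n_min` argument (standing in for the ALLOCMIN= value) are hypothetical.

```python
import math

def integer_sizes(targets, n_min=None):
    """Round fractional targets up, then enforce an optional minimum size."""
    sizes = [math.ceil(t) for t in targets]
    if n_min is not None:
        # Raise any stratum below the minimum up to n_min.
        sizes = [max(n, n_min) for n in sizes]
    return sizes
```

Rounding up rather than to the nearest integer means that the allocated sample can only meet or improve on the requested margin of error, never fall short of it.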

For without-replacement selection methods, a stratum sample size cannot exceed the number of units in the stratum. If a target stratum sample size does exceed the number of units in the stratum, the procedure sets $n_ h = N_ h$ for that stratum, removes the stratum from the variance computation (because it contributes nothing to the sampling error), revises the allocation proportions $f_ h^{*}$ for the remaining strata, and computes the stratum sample sizes again. If a stratum sample size equals the number of units in its stratum, the procedure also removes that stratum from the variance computation and revises the sample sizes for the remaining strata. For more information, see Cochran (1977, p. 104) and Arkin (1984, p. 176).
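The iterative adjustment described above can be sketched as follows. This is only one plausible reading of the text: it assumes that "revising the allocation proportions" means renormalizing the original proportions over the strata that are not yet fully sampled, and it works on the fractional targets rather than on rounded sizes. The function `capped_targets_wor` is hypothetical, and the details of the procedure's actual algorithm may differ.

```python
from statistics import NormalDist

def capped_targets_wor(N_h, S2_h, f_h, margin, alpha=0.05):
    """Without-replacement targets with n_h capped at N_h (illustrative sketch)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    N = sum(N_h)                      # N stays the total over all strata
    n = [None] * len(N_h)
    free = set(range(len(N_h)))       # strata not yet fixed at n_h = N_h
    while free:
        # Assumed revision rule: renormalize proportions over remaining strata.
        f_sum = sum(f_h[i] for i in free)
        f = {i: f_h[i] / f_sum for i in free}
        # Fully sampled strata add no sampling error, so sums run over 'free'.
        numer = sum(N_h[i] ** 2 * S2_h[i] / f[i] for i in free)
        denom = (margin * N / z) ** 2 + sum(N_h[i] * S2_h[i] for i in free)
        trial = {i: f[i] * numer / denom for i in free}
        over = {i for i in free if trial[i] >= N_h[i]}
        if not over:
            for i in free:
                n[i] = trial[i]
            break
        for i in over:
            n[i] = float(N_h[i])      # take every unit in this stratum
        free -= over
    return n

# Hypothetical example: a small, highly variable first stratum.
N_h = [20, 500, 1000]
S2_h = [400.0, 4.0, 4.0]
f_h = [400 / 3400, 1000 / 3400, 2000 / 3400]
n = capped_targets_wor(N_h, S2_h, f_h, margin=0.1)
```

In this example the first stratum's initial target exceeds its 20 units, so it is sampled completely and the remaining two strata are recomputed to meet the margin on their own.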

When you specify the STATS option with the MARGIN= option in the STRATA statement, PROC SURVEYSELECT displays the expected margin of error for the sample allocation. The expected margin of error (for the overall mean based on the stratified sample) is computed from the stratum sizes ($N_ i$), the stratum variances that you provide ($S_ i^2$), and the allocated stratum sample sizes that the procedure computes ($n_ i$). For without-replacement selection methods, the expected margin of error is

\[  e = z_{\alpha /2} \times \frac{1}{N} \sqrt { \sum _{i=1}^ H{ \frac{N_ i^2 S_ i^2}{n_ i} ~  \left( 1 - \frac{n_ i}{N_ i} \right) } }  \]

For with-replacement selection methods, the expected margin of error is

\[  e = z_{\alpha /2} \times \frac{1}{N} \sqrt { \sum _{i=1}^ H{ \frac{N_ i^2 S_ i^2}{n_ i} } }  \]
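The expected margin of error can be checked numerically with a short Python sketch such as the one below. The helper `expected_margin` is hypothetical, and the stratum sizes, variances, and allocated sample sizes are made-up values. For without-replacement selection, the sketch applies the stratum-level finite population correction $1 - n_ i / N_ i$.

```python
from statistics import NormalDist

def expected_margin(N_h, S2_h, n_h, alpha=0.05, with_replacement=False):
    """Expected margin of error for allocated integer stratum sample sizes n_h."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    N = sum(N_h)
    total = 0.0
    for Ni, Si2, ni in zip(N_h, S2_h, n_h):
        # Finite population correction applies only without replacement.
        fpc = 1.0 if with_replacement else 1.0 - ni / Ni
        total += Ni ** 2 * Si2 / ni * fpc
    return z / N * total ** 0.5

# Hypothetical allocation: targets for a margin of 0.5, rounded up.
N_h = [1000, 2000, 3000]
S2_h = [25.0, 16.0, 9.0]
n_h = [35, 70, 104]

e_wor = expected_margin(N_h, S2_h, n_h)
e_wr = expected_margin(N_h, S2_h, n_h, with_replacement=True)
```

For the same sample sizes, the with-replacement margin is larger than the without-replacement margin because no finite population correction reduces the variance.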

The expected margin of error should be less than or equal to the value specified in the MARGIN= option. Any difference between the expected margin and the specified value is due to rounding the target stratum sample sizes up to integer values and increasing stratum sample sizes to equal the required minimum value (ALLOCMIN=).