The SURVEYMEANS Procedure

Definitions and Notation

For a stratified clustered sample design, together with the sampling weights, the sample can be represented by an $n \times (P+1)$ matrix

$\begin{array}{rcl} (\mb{w}, \mb{Y}) & = & \left( w_{hij}, \mb{y}_{hij} \right) \\ & = & \left( w_{hij}, y_{hij}^{(1)}, y_{hij}^{(2)}, \ldots , y_{hij}^{(P)} \right) \end{array}$

where

$h=1, 2, \ldots , H$ is the stratum index
$i=1, 2, \ldots , n_ h$ is the cluster index within stratum h
$j=1, 2, \ldots , m_{hi}$ is the unit index within cluster i of stratum h
$p=1, 2, \ldots , P$ is the analysis variable number, with a total of P variables
$n=\sum _{h=1}^ H \sum _{i=1}^{n_ h} {m_{hi}}$ is the total number of observations in the sample
$w_{hij}$ denotes the sampling weight for unit j in cluster i of stratum h
$\mb {y}_{hij}=\left( y_{hij}^{(1)}, y_{hij}^{(2)}, \ldots , y_{hij}^{(P)}\right)$ are the observed values of the analysis variables for unit j in cluster i of stratum h, including both the values of numerical variables and the values of indicator variables for levels of categorical variables.
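The notation above can be illustrated with a minimal Python sketch that flattens a nested stratum/cluster/unit structure into the $n \times (P+1)$ matrix of weights and analysis values. The toy data, the dictionary layout, and the helper name `build_matrix` are all hypothetical and chosen only for illustration; they are not part of PROC SURVEYMEANS.

```python
# Hypothetical toy sample: stratum h -> cluster i -> list of units j.
# Each unit is (w_hij, [y_hij^(1), ..., y_hij^(P)]); all values are made up.
sample = {
    1: {
        1: [(10.0, [3.0, 1.0]), (10.0, [5.0, 0.0])],  # m_11 = 2 units
        2: [(12.0, [4.0, 1.0])],                      # m_12 = 1 unit
    },
    2: {
        1: [(8.0, [2.0, 0.0]), (8.0, [6.0, 1.0]), (8.0, [1.0, 0.0])],  # m_21 = 3
    },
}


def build_matrix(sample):
    """Flatten the design into rows (w_hij, y_hij^(1), ..., y_hij^(P)).

    Returns the (h, i, j) index of every row alongside the rows themselves.
    """
    index, rows = [], []
    for h in sorted(sample):
        for i in sorted(sample[h]):
            for j, (w, y) in enumerate(sample[h][i], start=1):
                index.append((h, i, j))
                rows.append([w] + list(y))
    return index, rows


index, matrix = build_matrix(sample)

# n is the sum of the cluster sizes m_hi over all strata and clusters.
n = sum(len(units) for clusters in sample.values() for units in clusters.values())

# Every row holds one weight plus P analysis values.
P = len(matrix[0]) - 1
```

In this sketch the row count of `matrix` equals `n`, and each row has `P + 1` entries, matching the matrix dimensions stated above.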

For a categorical variable C, let l denote the number of levels of C, and denote the level values as $c_1, c_2, \ldots , c_ l$. Let $y^{(q)}$ $(q\in \{ 1, 2, \ldots , P\} )$ be an indicator variable for the category $C=c_ k$ $(k=1, 2, \ldots , l)$ with the observed value in unit j in cluster i of stratum h:

$y_{hij}^{(q)} = I_{\{ C=c_ k\} }(h,i,j) = \left\{ \begin{array}{ll} 1 & \mbox{if $C_{hij}=c_ k$ } \\ 0 & \mbox{otherwise} \end{array} \right.$

Note that the indicator variable $y_{hij}^{(q)}$ is set to missing when $C_{hij}$ is missing. Therefore, the total number of analysis variables, P, is the total number of numerical variables plus the total number of levels of all categorical variables.
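As a rough illustration of the indicator coding described above, the following Python sketch builds one indicator column per level of a categorical variable and propagates missing values. The function name `indicator_columns`, the use of `None` to represent a missing value, and the sample data are assumptions made for this example only.

```python
def indicator_columns(values, levels):
    """Return one 0/1 indicator column per level c_k.

    A missing observation (None here, by assumption) yields a missing
    indicator in every column rather than 0.
    """
    columns = {}
    for c_k in levels:
        columns[c_k] = [
            None if v is None else (1 if v == c_k else 0) for v in values
        ]
    return columns


# Hypothetical categorical variable C observed on five units; None = missing.
C = ["a", "b", None, "a", "c"]
levels = ["a", "b", "c"]
ind = indicator_columns(C, levels)

# P counts the numerical variables plus all levels of all categorical
# variables. The count of two numerical variables is assumed for the example.
num_numeric = 2
P = num_numeric + len(levels)
```

With these inputs, the third unit is missing in every indicator column, and one categorical variable with three levels contributes three analysis variables to `P`.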

The sampling rate $f_h$ for stratum h, which is used in Taylor series variance estimation, is the fraction of first-stage units (PSUs) selected for the sample. You can use the TOTAL= or RATE= option to input population totals or sampling rates. See the section Specification of Population Totals and Sampling Rates for details. If you input stratum totals, PROC SURVEYMEANS computes $f_h$ as the ratio of the stratum sample size to the stratum total. If you input stratum sampling rates, PROC SURVEYMEANS uses these values directly for $f_h$. If you do not specify the TOTAL= or RATE= option, then the procedure assumes that the stratum sampling rates are negligible, and a finite population correction is not used when computing variances. Replication methods specified by the VARMETHOD=BRR or the VARMETHOD=JACKKNIFE option do not use this finite population correction.
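The three cases described above (stratum totals supplied, stratum rates supplied, or neither) can be sketched in Python as follows. This is only an illustration of the logic, not the procedure's implementation: the function names, the dictionary inputs, the symbol `f_h` for the stratum sampling rate, and the form `1 - f_h` for the finite population correction factor are assumptions of this sketch.

```python
def sampling_rates(n_h, totals=None, rates=None):
    """Return the assumed sampling rate f_h for each stratum h.

    n_h    : number of sampled first-stage units (PSUs) per stratum
    totals : stratum totals (mimics supplying TOTAL=), or None
    rates  : stratum sampling rates (mimics supplying RATE=), or None
    """
    if totals is not None and rates is not None:
        raise ValueError("supply totals or rates, not both")
    if totals is not None:
        # Ratio of the stratum sample size to the stratum total.
        return {h: n_h[h] / totals[h] for h in n_h}
    if rates is not None:
        # Supplied rates are used directly.
        return {h: rates[h] for h in n_h}
    # Neither supplied: rates are treated as negligible.
    return {h: 0.0 for h in n_h}


def fpc_factors(f_h):
    """Finite population correction factor, assumed here to be 1 - f_h."""
    return {h: 1.0 - f for h, f in f_h.items()}


# Hypothetical design: 2 PSUs sampled in stratum 1, 1 PSU in stratum 2.
n_h = {1: 2, 2: 1}

f_from_totals = sampling_rates(n_h, totals={1: 20, 2: 5})
f_from_rates = sampling_rates(n_h, rates={1: 0.25, 2: 0.5})
f_default = sampling_rates(n_h)

fpc_default = fpc_factors(f_default)
fpc_totals = fpc_factors(f_from_totals)
```

When neither input is supplied, every factor in `fpc_default` is 1, which corresponds to computing variances without a finite population correction.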