The SURVEYREG Procedure

Notation

For a stratified clustered sample design, observations are represented by an $n \times (p+2)$ matrix

\[  (\mb {w, y, X}) = (w_{hij}, y_{hij}, \mb {x}_{hij})  \]

where

  • $\mb {w}$ denotes the sampling weight vector

  • $\mb {y}$ denotes the dependent variable

  • $\mb {X}$ denotes the $n\times p$ design matrix. (When an effect contains only classification variables, the columns of $\mb {X}$ that correspond to this effect contain only 0s and 1s; no reparameterization is made.)

  • $h=1, 2, \ldots , H$ is the stratum index

  • $i=1, 2, \ldots , n_ h$ is the cluster index within stratum $h$

  • $j=1, 2, \ldots , m_{hi}$ is the unit index within cluster $i$ of stratum $h$

  • $p$ is the total number of parameters (including an intercept if the INTERCEPT effect is included in the MODEL statement)

  • $n=\sum _{h=1}^ H \sum _{i=1}^{n_ h} {m_{hi}}$   is the total number of observations in the sample
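The indexing above can be illustrated with a small sketch. The Python below is not part of PROC SURVEYREG; the cluster sizes and the names `cluster_sizes` and `total_observations` are hypothetical, chosen only to show how $n$ follows from the $m_{hi}$.

```python
# Hypothetical cluster sizes m_hi, keyed by stratum h and then by cluster i.
cluster_sizes = {
    1: {1: 3, 2: 2},        # stratum 1 has n_1 = 2 clusters
    2: {1: 4, 2: 1, 3: 2},  # stratum 2 has n_2 = 3 clusters
}


def total_observations(m):
    """Return n, the sum of m_hi over all strata h and clusters i."""
    return sum(size for clusters in m.values() for size in clusters.values())


# n_h: the number of sampled clusters in each stratum.
clusters_per_stratum = {h: len(clusters) for h, clusters in cluster_sizes.items()}

n = total_observations(cluster_sizes)
print(n)  # 12
```

With these illustrative sizes, $H=2$, $n_1=2$, $n_2=3$, and $n=3+2+4+1+2=12$, so the observation matrix would have 12 rows.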

Also, $f_ h$ denotes the sampling rate for stratum $h$. You can use the TOTAL= or RATE= option to input population totals or sampling rates. See the section "Specification of Population Totals and Sampling Rates" for details. If you input stratum totals, PROC SURVEYREG computes $f_ h$ as the ratio of the stratum sample size to the stratum total. If you input stratum sampling rates, PROC SURVEYREG uses these values directly for $f_ h$. If you do not specify the TOTAL= or RATE= option, then the procedure assumes that the stratum sampling rates $f_ h$ are negligible, and a finite population correction is not used when computing variances.
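The three cases for $f_ h$ can be summarized in a short sketch. This is illustrative Python, not the procedure's implementation. The function name `sampling_rate` and its arguments are hypothetical, the negligible rate is represented as zero, and rejecting both inputs at once is a simplification of the sketch rather than documented behavior.

```python
def sampling_rate(sample_size, total=None, rate=None):
    """Return f_h for one stratum under the three cases described above.

    sample_size -- stratum sample size, counted in the same units as `total`
    total       -- stratum total, as would be supplied through TOTAL=
    rate        -- stratum sampling rate, as would be supplied through RATE=
    """
    if total is not None and rate is not None:
        # Assumption of this sketch only: accept one input or the other.
        raise ValueError("supply at most one of total and rate")
    if total is not None:
        # Stratum total given: ratio of stratum sample size to stratum total.
        return sample_size / total
    if rate is not None:
        # Stratum rate given: used directly.
        return rate
    # Neither given: the rate is treated as negligible (represented as 0 here),
    # so no finite population correction would be applied.
    return 0.0


print(sampling_rate(20, total=400))  # 0.05
print(sampling_rate(20, rate=0.1))   # 0.1
print(sampling_rate(20))             # 0.0
```

For example, a stratum with a sample size of 20 and a stratum total of 400 gives $f_ h = 20/400 = 0.05$.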