The SURVEYLOGISTIC Procedure

Notation

Let Y be the response variable with categories $1,2, \ldots , D, D+1$ . The p covariates are denoted by a p-dimensional row vector $\mb {x}$ .

For a stratified clustered sample design, each observation is represented by a row vector, $(w_{hij}, \mb {y}_{hij}', y_{hij(D+1)}, \mb {x}_{hij})$ , where

$h=1, 2, \ldots , H$ is the stratum index
$i=1, 2, \ldots , n_ h$ is the cluster index within stratum h
$j=1, 2, \ldots , m_{hi}$ is the unit index within cluster i of stratum h
$w_{hij}$ denotes the sampling weight
$\mb {y}_{hij}$ is a D-dimensional column vector whose elements are indicator variables for the first D categories for variable Y. If the response of the jth unit of the ith cluster in stratum h falls in category d, the dth element of the vector is one, and the remaining elements of the vector are zero, where $d=1, 2, \ldots , D$ .
$y_{hij(D+1)}$ is the indicator variable for the $(D+1)$th category of variable Y
$\mb {x}_{hij}$ denotes the k-dimensional row vector of explanatory variables for the jth unit of the ith cluster in stratum h. If there is an intercept, then $x_{hij1}\equiv 1$ .
$\tilde n=\sum _{h=1}^ H n_ h$ is the total number of clusters in the sample
$n=\sum _{h=1}^ H \sum _{i=1}^{n_ h} {m_{hi}}$ is the total sample size
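
The indexing and indicator coding defined above can be illustrated with a minimal Python sketch. The toy sample, the value of D, and the helper name `encode_response` are hypothetical and chosen only for illustration; they are not part of the procedure.

```python
from collections import defaultdict

D = 2  # assumption: the response Y has D + 1 = 3 categories

# Hypothetical toy sample, one tuple per observation:
# (stratum h, cluster i, unit j, sampling weight w_hij, response category of Y)
sample = [
    (1, 1, 1, 10.0, 1),
    (1, 1, 2, 10.0, 3),
    (1, 2, 1, 12.5, 2),
    (2, 1, 1, 8.0, 3),
    (2, 1, 2, 8.0, 1),
    (2, 1, 3, 8.0, 2),
]


def encode_response(category, D):
    """Return (y, y_last) for one unit.

    y is the D-dimensional indicator vector for categories 1..D,
    y_last is the indicator for category D + 1.
    """
    if not 1 <= category <= D + 1:
        raise ValueError("category must be in 1..D+1")
    y = [1 if category == d else 0 for d in range(1, D + 1)]
    y_last = 1 if category == D + 1 else 0
    return y, y_last


# Collect the cluster labels seen in each stratum and the unit count per cluster.
clusters_per_stratum = defaultdict(set)
units_per_cluster = defaultdict(int)
for h, i, j, w, category in sample:
    clusters_per_stratum[h].add(i)
    units_per_cluster[(h, i)] += 1

n_h = {h: len(clusters) for h, clusters in clusters_per_stratum.items()}
n_tilde = sum(n_h.values())         # total number of clusters in the sample
n = sum(units_per_cluster.values())  # total sample size (sum of all m_hi)

print(encode_response(1, D))  # category 1 sets the first element of y
print(encode_response(3, D))  # category D + 1 leaves y all zero
print(n_h, n_tilde, n)
```

In this sketch a unit in the last category is represented by an all-zero vector for the first D categories together with a one in the separate indicator, which mirrors the split between $\mb {y}_{hij}$ and $y_{hij(D+1)}$.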

The following notation is also used:

$f_ h$ denotes the sampling rate for stratum h
$\bpi _{hij}$ is the expected vector of the response variable:

$\begin{eqnarray*} {\bpi }_{hij} & =& E(\mb {y}_{hij}|\mb {x}_{hij}) \\ & =& (\pi _{hij1}, \pi _{hij2}, \ldots , \pi _{hijD})' \\ \pi _{hij(D+1)} & =& E(y_{hij(D+1)}|\mb {x}_{hij}) \end{eqnarray*}$

Note that $\pi _{hij(D+1)}=1-\mb {1}' {\bpi }_{hij}$ , where $\mb {1}$ is a D-dimensional column vector whose elements are all 1.
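
The complement relation for the last category can be checked numerically. The following is a minimal sketch in which the probabilities are made-up values and the function name `last_category_probability` is hypothetical; exact fractions are used only to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction


def last_category_probability(pi):
    """Return 1 - 1'pi, the probability of category D + 1.

    pi is the D-dimensional vector of probabilities for categories 1..D.
    """
    total = sum(pi, Fraction(0))  # 1'pi: sum of the first D probabilities
    if total > 1 or any(p < 0 for p in pi):
        raise ValueError("pi must be nonnegative and sum to at most 1")
    return 1 - total


# Hypothetical probabilities for D = 2 categories of one unit.
pi_hij = [Fraction(1, 5), Fraction(1, 2)]
pi_last = last_category_probability(pi_hij)

print(pi_last)                 # probability of category D + 1
print(sum(pi_hij) + pi_last)   # all D + 1 probabilities sum to one
```

The same relation holds for the indicators themselves: because exactly one of the D + 1 indicators equals one, the last indicator equals one minus the sum of the first D.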