The SURVEYLOGISTIC Procedure

Notation

Let Y be the response variable with categories $1,2, \ldots , D, D+1$. The p covariates are denoted by a p-dimensional row vector $\mb {x}$.

For a stratified clustered sample design, each observation is represented by a row vector, $ (w_{hij}, \mb {y}_{hij}', y_{hij(D+1)}, \mb {x}_{hij}) $, where

  • $h=1, 2, \ldots , H$ is the stratum index

  • $i=1, 2, \ldots , n_ h$ is the cluster index within stratum h

  • $j=1, 2, \ldots , m_{hi}$ is the unit index within cluster i of stratum h

  • $w_{hij}$ denotes the sampling weight

  • $\mb {y}_{hij}$ is a D-dimensional column vector whose elements are indicator variables for the first D categories for variable Y. If the response of the jth unit of the ith cluster in stratum h falls in category d, the dth element of the vector is one, and the remaining elements of the vector are zero, where $d=1, 2, \ldots , D$.

  • $y_{hij(D+1)}$ is the indicator variable for the $(D+1)$th category of variable Y

  • $\mb {x}_{hij}$ denotes the k-dimensional row vector of explanatory variables for the jth unit of the ith cluster in stratum h. If there is an intercept, then $x_{hij1}\equiv 1$.

  • $\tilde n=\sum _{h=1}^ H n_ h$ is the total number of clusters in the sample

  • $n=\sum _{h=1}^ H \sum _{i=1}^{n_ h} {m_{hi}}$ is the total sample size
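
As an illustration of the notation above, the following Python sketch encodes a small, made-up stratified clustered sample and computes $n_ h$, $\tilde n$, and $n$. The data values, the nested-list layout, and the helper name `indicator_vector` are assumptions chosen for illustration; they are not part of the procedure.

```python
# Hypothetical toy sample: sample[h][i] is the list of units in
# cluster i of stratum h (0-based here, 1-based in the notation).
# Each unit is a pair (sampling weight w_hij, response category in 1..D+1).
D = 2  # the response Y has D + 1 = 3 categories

sample = [
    [  # stratum h = 1
        [(1.5, 1), (1.5, 3)],            # cluster i = 1, m_11 = 2
        [(2.0, 2), (2.0, 2), (2.0, 1)],  # cluster i = 2, m_12 = 3
    ],
    [  # stratum h = 2
        [(3.0, 3)],                      # cluster i = 1, m_21 = 1
    ],
]


def indicator_vector(category, D):
    """Return (y_hij, y_hij(D+1)) for a response in the given category.

    y_hij holds the indicators for the first D categories;
    y_hij(D+1) is the indicator for the last category.
    """
    y = [1 if category == d else 0 for d in range(1, D + 1)]
    y_last = 1 if category == D + 1 else 0
    return y, y_last


# n_h: number of clusters in each stratum
n_h = [len(stratum) for stratum in sample]

# n-tilde: total number of clusters in the sample
n_tilde = sum(n_h)

# n: total sample size, the sum of m_hi over all clusters
n = sum(len(cluster) for stratum in sample for cluster in stratum)
```

In this sketch a unit in category $D+1$ has an all-zero $\mb {y}_{hij}$, so exactly one of the $D+1$ indicators is one for every unit.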

The following notation is also used:

  • $f_ h$ denotes the sampling rate for stratum h

  • $\bpi _{hij}$ is the expected vector of the response variable:

    $\displaystyle {\bpi }_{hij} = E(\mb {y}_{hij}|\mb {x}_{hij}) = (\pi _{hij1}, \pi _{hij2}, \ldots , \pi _{hijD})' $

    $\displaystyle \pi _{hij(D+1)} = E(y_{hij(D+1)}|\mb {x}_{hij}) $

Note that $\pi _{hij(D+1)}=1-\mb {1}' {\bpi }_{hij}$, where $\mb {1}$ is a D-dimensional column vector whose elements are all 1.
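
The identity above can be checked numerically with a minimal Python sketch. The probability values below are hypothetical and stand in for the expected response vector of a single unit with $D=3$; the product of the vector of ones with ${\bpi }_{hij}$ is simply the sum of its elements.

```python
# Hypothetical expected response vector pi_hij for one unit, with D = 3.
pi_hij = [0.2, 0.5, 0.1]

# The vector of ones times pi_hij is the sum of the D elements,
# so the probability of category D + 1 is the remainder.
pi_last = 1.0 - sum(pi_hij)

# All D + 1 category probabilities for this unit.
probs = pi_hij + [pi_last]
```

Up to floating-point rounding, the $D+1$ entries of `probs` sum to one, which is why only the first D categories need to be carried in ${\bpi }_{hij}$.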