The SURVEYFREQ Procedure

Definitions and Notation

For a stratified clustered sample design, define the following:

\[  \begin{array}{lcll} h &  = &  1, 2, \ldots , H &  \mbox{is the stratum number,} \\ & & &  \mbox{with a total of $H$ strata} \\[0.08in] i &  = &  1, 2, \ldots , n_ h &  \mbox{is the cluster number within stratum $h$,} \\ & & &  \mbox{with a total of $n_ h$ sample clusters in stratum $h$} \\[0.08in] j &  = &  1, 2, \ldots , m_{hi} &  \mbox{is the unit number within cluster $i$ of stratum $h$,} \\ & & &  \mbox{with a total of $m_{hi}$ sample units from cluster $i$ of stratum $h$} \\[0.08in] n &  = &  \sum _{h=1}^ H ~  \sum _{i=1}^{n_ h} ~  {m_{hi}} &  \mbox{is the total number of observations in the sample} \end{array}  \]
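To make the nested counts concrete, the following minimal sketch derives $H$, $n_ h$, $m_{hi}$, and $n$ from a list of sampled observations, each labeled only by its (stratum, cluster) pair. The helper name `design_counts` and the input layout are assumptions for illustration, not part of PROC SURVEYFREQ; cluster labels are assumed to be unique only within a stratum, so each cluster is keyed by the pair $(h, i)$.

```python
from collections import Counter, defaultdict


def design_counts(obs):
    """Hypothetical helper: return (H, n_h, m_hi, n) for (stratum, cluster) labels.

    Each element of obs is one sampled unit, identified by its (h, i) pair.
    """
    m_hi = Counter(obs)            # m_hi[(h, i)] = units sampled from cluster i of stratum h
    n_h = defaultdict(int)
    for (h, _i) in m_hi:
        n_h[h] += 1                # count distinct sample clusters in stratum h
    H = len(n_h)                   # number of strata present in the sample
    n = sum(m_hi.values())         # n = sum over h and i of m_hi
    return H, dict(n_h), dict(m_hi), n


sample = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2)]
H, n_h, m_hi, n = design_counts(sample)
print(H, n_h, n)  # 2 {1: 2, 2: 2} 7
```

Summing $m_{hi}$ over clusters and then over strata gives the same $n$ as simply counting the observations, which is what the definition of $n$ states.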

\[  \begin{array}{lcl} f_ h &  = &  \mbox{first-stage sampling rate for stratum $h$} \\[0.08in] W_{hij} &  = &  \mbox{sampling weight of unit $j$ in cluster $i$ of stratum $h$} \end{array}  \]

The sampling rate $f_ h$, which is used in Taylor series variance estimation, is the fraction of first-stage units (PSUs) in stratum $h$ that are selected for the sample. You can specify the stratum sampling rates with the RATE= option. Alternatively, if you specify population totals with the TOTAL= option, PROC SURVEYFREQ computes $f_ h$ as the ratio of the number of sample PSUs in stratum $h$ to the total number of PSUs in stratum $h$. See the section Population Totals and Sampling Rates for details. If you specify neither the RATE= option nor the TOTAL= option, the procedure assumes that the stratum sampling rates $f_ h$ are negligible and does not use a finite population correction when computing variances.
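The three cases in the preceding paragraph (rates given directly, rates derived from PSU totals, or no rates at all) can be sketched as follows. The function `sampling_rates` and its arguments are hypothetical stand-ins for the RATE= and TOTAL= options, not SAS syntax; rejecting both inputs at once is a choice of this sketch. The per-stratum correction factor $1 - f_ h$ computed at the end is an assumption about the usual form of the finite population correction, shown only to illustrate why $f_ h = 0$ means "no correction."

```python
def sampling_rates(n_h, totals=None, rates=None):
    """Hypothetical helper returning the first-stage sampling rate f_h per stratum.

    n_h:    dict, stratum -> number of sample PSUs in that stratum
    totals: dict, stratum -> number of population PSUs (plays the role of TOTAL=)
    rates:  dict, stratum -> sampling rate (plays the role of RATE=)
    """
    if totals is not None and rates is not None:
        # This sketch accepts only one source of rates at a time.
        raise ValueError("give either totals or rates, not both")
    if rates is not None:
        return dict(rates)
    if totals is not None:
        # f_h = (sample PSUs in stratum h) / (population PSUs in stratum h)
        return {h: n_h[h] / totals[h] for h in n_h}
    # Neither given: rates are treated as negligible (no finite population correction).
    return {h: 0.0 for h in n_h}


n_h = {1: 2, 2: 4}
f = sampling_rates(n_h, totals={1: 20, 2: 16})
# Assumed form of the per-stratum finite population correction factor.
fpc = {h: 1.0 - f[h] for h in f}
print(f, fpc)
```

With no rates supplied, every $f_ h$ is 0, so the assumed factor $1 - f_ h$ equals 1 and leaves the variance unchanged.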

This notation also applies to other sample designs. For example, for a design without stratification, you can let $H = 1$; for a design without clustering, you can let $m_{hi} = 1$ for every $h$ and $i$, so that each cluster consists of a single sampling unit.
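As a small illustration of the special cases above, the following sketch encodes a simple random sample of five units (hypothetical labels `a` through `e`) in the stratified clustered notation: a single stratum ($H = 1$) in which every unit is its own cluster ($m_{hi} = 1$). The nested-dictionary layout `design[h][i] = m_hi` is an assumption chosen for readability.

```python
# A simple random sample of five units, written in the stratified clustered notation:
# one stratum (H = 1), and each unit forms its own cluster (m_hi = 1).
units = ["a", "b", "c", "d", "e"]
design = {1: {i: 1 for i in range(1, len(units) + 1)}}  # design[h][i] = m_hi

H = len(design)
n_h = {h: len(clusters) for h, clusters in design.items()}
n = sum(m for clusters in design.values() for m in clusters.values())
print(H, n_h[1], n)  # 1 5 5
```

Because every $m_{hi}$ is 1, the number of clusters $n_ 1$ and the number of observations $n$ coincide.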

For a two-way table representing the crosstabulation of two variables, where there are $R$ levels of the row variable and $C$ levels of the column variable, define the following:

\[  \begin{array}{lcll} r &  = &  1, 2, \ldots , R &  \mbox{is the row number, with a total of $R$ rows} \\[0.08in] c &  = &  1, 2, \ldots , C &  \mbox{is the column number, with a total of $C$ columns} \\[0.08in] N_{rc} & & &  \mbox{is the population total in row $r$ and column $c$} \\[0.08in] N_{r \cdot } &  = &  \sum _{c=1}^{C} {N_{rc}} &  \mbox{is the total in row $r$} \\[0.08in] N_{\cdot c} &  = &  \sum _{r=1}^{R} {N_{rc}} &  \mbox{is the total in column $c$} \\[0.08in] N &  = &  \sum _{r=1}^{R} ~  \sum _{c=1}^{C} {N_{rc}} &  \mbox{is the overall total} \end{array}  \]
\[  \begin{array}{lcll} P_{rc} &  = &  N_{rc} ~  / ~  N &  \mbox{is the population proportion in row $r$ and column $c$} \\[0.08in] P_{r \cdot } &  = &  N_{r \cdot } ~  / ~  N &  \mbox{is the proportion in row $r$} \\[0.08in] P_{\cdot c} &  = &  N_{\cdot c} ~  / ~  N &  \mbox{is the proportion in column $c$} \\[0.08in] P_{rc}^{~ r} &  = &  N_{rc} ~  / ~  N_{r \cdot } &  \mbox{is the row proportion for table cell $(r, c)$} \\[0.08in] P_{rc}^{~ c} &  = &  N_{rc} ~  / ~  N_{\cdot c} &  \mbox{is the column proportion for table cell $(r, c)$} \end{array}  \]
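The margins and proportions defined above follow mechanically from the table of cell totals $N_{rc}$. The following minimal sketch computes all of them for a small $2 \times 2$ table of hypothetical population totals. The function name `table_summaries` and the dictionary keys are illustrative assumptions; the cell totals are assumed to be integers so that `Fraction` yields exact proportions.

```python
from fractions import Fraction


def table_summaries(N_rc):
    """Margins and proportions for an R x C table of population totals N_rc.

    N_rc: list of R rows, each a list of C integer cell totals.
    """
    R, C = len(N_rc), len(N_rc[0])
    N_r = [sum(row) for row in N_rc]                               # row totals N_{r.}
    N_c = [sum(N_rc[r][c] for r in range(R)) for c in range(C)]    # column totals N_{.c}
    N = sum(N_r)                                                   # overall total N
    return {
        "N_r": N_r,
        "N_c": N_c,
        "N": N,
        # P_rc = N_rc / N
        "P": [[Fraction(N_rc[r][c], N) for c in range(C)] for r in range(R)],
        # P_{r.} = N_{r.} / N and P_{.c} = N_{.c} / N
        "P_r": [Fraction(x, N) for x in N_r],
        "P_c": [Fraction(x, N) for x in N_c],
        # Row proportion P_rc^r = N_rc / N_{r.}
        "row_prop": [[Fraction(N_rc[r][c], N_r[r]) for c in range(C)] for r in range(R)],
        # Column proportion P_rc^c = N_rc / N_{.c}
        "col_prop": [[Fraction(N_rc[r][c], N_c[c]) for c in range(C)] for r in range(R)],
    }


s = table_summaries([[30, 10], [20, 40]])
print(s["N"], s["P"][0][0], s["row_prop"][0][0], s["col_prop"][0][0])  # 100 3/10 3/4 3/5
```

Each row of row proportions sums to 1, and each column of column proportions sums to 1. A row or column whose total is 0 leaves its row or column proportions undefined, which this sketch reports as a `ZeroDivisionError`.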

For a given observation (identified by stratum, cluster, and unit number within the cluster), define the following indicator of whether that observation belongs to cell $(r, c)$, the cell in row $r$ and column $c$ of the two-way table, for $r = 1, 2, \ldots , R$ and $c = 1, 2, \ldots , C$:

\[  \delta _{hij} (r,c) = \left\{  \begin{array}{lcl} 1 & &  \mbox{if observation $(hij)$ is in cell $(r, c)$} \\[0.10in] 0 & &  \mbox{otherwise} \\ \end{array} \right.  \]
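One reason for defining $\delta _{hij} (r,c)$ is that it turns "sum over the observations in cell $(r, c)$" into a sum over all observations. The following sketch assumes a weighted cell total of the form $\sum _ h \sum _ i \sum _ j W_{hij} \, \delta _{hij} (r,c)$ purely to illustrate the indicator; this section itself only defines $\delta$ and $W_{hij}$. The record layout (keys `h`, `i`, `j`, `row`, `col`, `W`) and the function names are hypothetical.

```python
def delta_cell(obs, r, c):
    """Indicator delta_hij(r, c): 1 if the observation is in cell (r, c), else 0."""
    return 1 if (obs["row"], obs["col"]) == (r, c) else 0


def weighted_cell_total(sample, r, c):
    """Sum of W_hij * delta_hij(r, c) over every observation in the sample."""
    return sum(obs["W"] * delta_cell(obs, r, c) for obs in sample)


# Hypothetical sample: each record carries its design labels, its table cell, and its weight.
sample = [
    {"h": 1, "i": 1, "j": 1, "row": 1, "col": 1, "W": 10.0},
    {"h": 1, "i": 1, "j": 2, "row": 1, "col": 2, "W": 10.0},
    {"h": 1, "i": 2, "j": 1, "row": 1, "col": 1, "W": 12.5},
    {"h": 2, "i": 1, "j": 1, "row": 2, "col": 1, "W": 8.0},
]
print(weighted_cell_total(sample, 1, 1))  # 22.5
```

Observations outside cell $(r, c)$ contribute $W_{hij} \times 0$, so only the two records in cell $(1, 1)$ add to that total.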

Similarly, define the following functions to indicate the observation’s row and column classification:

\[  \delta _{hij} (r ~  \cdot ) = \left\{  \begin{array}{lcl} 1 & &  \mbox{if observation $(hij)$ is in row $r$} \\[0.10in] 0 & &  \mbox{otherwise} \\ \end{array} \right.  \]
\[  \delta _{hij} (\cdot ~  c) = \left\{  \begin{array}{lcl} 1 & &  \mbox{if observation $(hij)$ is in column $c$} \\[0.10in] 0 & &  \mbox{otherwise} \\ \end{array} \right.  \]
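The row and column indicators can be expressed through the cell indicators. Assuming the observation has both a row level and a column level, it lies in exactly one of the $R \times C$ cells, because the cells partition the table. Summing the cell indicators across a row or down a column therefore recovers the row or column indicator, and summing over all cells gives 1:

\[  \delta _{hij} (r ~  \cdot ) = \sum _{c=1}^{C} \delta _{hij} (r,c), \qquad \delta _{hij} (\cdot ~  c) = \sum _{r=1}^{R} \delta _{hij} (r,c), \qquad \sum _{r=1}^{R} ~  \sum _{c=1}^{C} \delta _{hij} (r,c) = 1  \]

These identities parallel the definitions of the row totals $N_{r \cdot }$ and column totals $N_{\cdot c}$ as sums of the cell totals $N_{rc}$.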