The SURVEYMEANS Procedure

Statistical Computations

Definitions and Notation

For a stratified clustered sample design, together with the sampling weights, the sample can be represented by an n × (P+1) matrix
(w, Y) = ( w_{hij}, y_{hij} ) = ( w_{hij}, y_{hij}^{(1)}, y_{hij}^{(2)}, \ldots, y_{hij}^{(P)} )

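where

h = 1, 2, ..., H is the stratum number, with a total of H strata
i = 1, 2, ..., n_h is the cluster number within stratum h, with a total of n_h clusters
j = 1, 2, ..., m_{hi} is the unit number within cluster i of stratum h, with a total of m_{hi} units
q = 1, 2, ..., P is the analysis variable number, with a total of P variables
n = \sum_{h=1}^H \sum_{i=1}^{n_h} m_{hi} is the total number of observations in the sample
w_{hij} is the sampling weight for observation j in cluster i of stratum h
y_{hij} = ( y_{hij}^{(1)}, y_{hij}^{(2)}, \ldots, y_{hij}^{(P)} ) are the observed values of the analysis variables for observation j in cluster i of stratum h, including both numerical variables and indicator variables for the levels of categorical variables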

For a categorical variable C, let l denote the number of levels of C, and denote the level values as c_1, c_2, ..., c_l. Then there are l indicator variables associated with these levels. That is, for each level C = c_k (k = 1, 2, ..., l), one of the analysis variables y^{(q)} (q \in \{1, 2, ..., P\}) contains the values of the indicator variable for the category C = c_k. Its value for observation j in cluster i of stratum h is

y_{hij}^{(q)} = I_{\{C=c_k\}}(h,i,j) = \begin{cases} 1 & \text{if } C_{hij} = c_k \\ 0 & \text{otherwise} \end{cases}
Therefore, the total number of analysis variables, P, is the total number of numerical variables plus the total number of levels of all categorical variables.
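
The indicator coding described above can be illustrated with a short Python sketch. This is not the procedure's internal code; the helper name expand_indicators and the sample data are hypothetical, and the levels are simply sorted to give a reproducible order.

```python
def expand_indicators(values):
    """Return (levels, columns), where columns[k][j] is 1 if values[j] == levels[k], else 0."""
    levels = sorted(set(values))  # level values c_1, ..., c_l (sorted here for reproducibility)
    columns = [[1 if v == level else 0 for v in values] for level in levels]
    return levels, columns


# Hypothetical sample: one numerical variable and one categorical variable C.
income = [52.0, 31.5, 47.2, 60.1]
region = ["east", "west", "east", "north"]

levels, indicators = expand_indicators(region)

# P = number of numerical variables + number of levels of all categorical variables.
P = 1 + len(levels)
```

In this sketch each level of the categorical variable contributes one 0/1 column, so one numerical variable plus a three-level categorical variable gives four analysis variables.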

Also, f_h denotes the sampling rate for stratum h. You can use the TOTAL= option or the RATE= option to input population totals or sampling rates. If you input stratum totals, PROC SURVEYMEANS computes f_h as the ratio of the stratum sample size to the stratum total. If you input stratum sampling rates, PROC SURVEYMEANS uses these values directly for f_h. If you do not specify the TOTAL= option or the RATE= option, then the procedure assumes that the stratum sampling rates f_h are negligible, and a finite population correction is not used when computing variances.
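
The three cases for f_h can be sketched in Python as follows. The function name sampling_rates and its arguments are hypothetical and only mirror the TOTAL= and RATE= inputs. Representing a negligible rate as f_h = 0 (so that 1 - f_h = 1) and rejecting both inputs at once are choices of this sketch, not documented behavior.

```python
def sampling_rates(sample_sizes, totals=None, rates=None):
    """Return {stratum: f_h} from stratum sample sizes and optional totals or rates.

    sample_sizes: {stratum: stratum sample size}
    totals:       {stratum: stratum population total}, analogous to TOTAL=
    rates:        {stratum: stratum sampling rate}, analogous to RATE=
    """
    if totals is not None and rates is not None:
        # Choice of this sketch: accept at most one of the two inputs.
        raise ValueError("give totals or rates, not both")
    if totals is not None:
        # f_h = stratum sample size / stratum total
        return {h: sample_sizes[h] / totals[h] for h in sample_sizes}
    if rates is not None:
        # Rates are used directly.
        return {h: rates[h] for h in sample_sizes}
    # Neither given: treat f_h as negligible, so no finite population correction.
    return {h: 0.0 for h in sample_sizes}


sizes = {"A": 10, "B": 5}  # hypothetical strata
f_from_totals = sampling_rates(sizes, totals={"A": 100, "B": 20})
f_default = sampling_rates(sizes)
```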

This notation is also applicable to other sample designs. For example, for a sample design without stratification, you can let H = 1; for a sample design without clusters, you can let m_{hi} = 1 for every h and i.

Ratio

When you use a RATIO statement, the procedure produces statistics requested by the statistics-keywords in the PROC SURVEYMEANS statement.

Suppose that you want to calculate the ratio of variable Y over variable X. Let x_{hij} be the value of variable X for the jth member in cluster i in the hth stratum.

The ratio of Y over X is

\hat{R} = \frac{ \sum_{h=1}^H \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} y_{hij} }{ \sum_{h=1}^H \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} x_{hij} }

PROC SURVEYMEANS uses the Taylor series expansion method to estimate the variance of the ratio \hat{R} as

\hat{V}(\hat{R})=\sum_{h=1}^H { \frac{n_h(1-f_h)}{n_h-1} \sum_{i=1}^{n_h} {(g_{hi\cdot}-\bar{g}_{h\cdot\cdot})^2}}
where
g_{hi\cdot} = \frac{ \sum_{j=1}^{m_{hi}} w_{hij} (y_{hij} - x_{hij}\hat{R}) }{ \sum_{h=1}^H \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} x_{hij} }
\bar{g}_{h\cdot\cdot} = \left( \sum_{i=1}^{n_h} g_{hi\cdot} \right) / n_h
The standard error of the ratio is the square root of the estimated variance:
\mathrm{StdErr}(\hat{R}) = \sqrt{\hat{V}(\hat{R})}
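
As a rough numerical illustration of the ratio estimate and its Taylor series variance, the following Python sketch evaluates the formulas in this section directly. It is not the procedure's implementation. The function name ratio_estimate, the row layout (stratum, cluster, weight, y, x), and the sample data are assumptions made for the example. Raising an error for a stratum with a single cluster is a choice of this sketch, because the factor n_h / (n_h - 1) is undefined there.

```python
from collections import defaultdict
from math import sqrt


def ratio_estimate(rows, rates=None):
    """Return (R_hat, variance, std_err) for the ratio of Y over X.

    rows:  iterable of (stratum, cluster, weight, y, x) tuples (hypothetical layout)
    rates: optional {stratum: f_h}; a missing stratum is treated as f_h = 0
    """
    rows = list(rows)
    total_y = sum(w * y for _, _, w, y, _ in rows)  # weighted total of Y
    total_x = sum(w * x for _, _, w, _, x in rows)  # weighted total of X
    r_hat = total_y / total_x

    # g[h][i] accumulates sum_j w_hij * (y_hij - x_hij * R_hat) / total_x for cluster i.
    g = defaultdict(lambda: defaultdict(float))
    for h, i, w, y, x in rows:
        g[h][i] += w * (y - x * r_hat) / total_x

    variance = 0.0
    for h, clusters in g.items():
        n_h = len(clusters)
        if n_h < 2:
            # Choice of this sketch: the formula divides by n_h - 1.
            raise ValueError(f"stratum {h!r} has fewer than two clusters")
        f_h = rates.get(h, 0.0) if rates else 0.0
        g_bar = sum(clusters.values()) / n_h
        ss = sum((v - g_bar) ** 2 for v in clusters.values())
        variance += n_h * (1 - f_h) / (n_h - 1) * ss

    return r_hat, variance, sqrt(variance)


# Hypothetical data: one stratum, two clusters, one unit per cluster, unit weights.
rows = [("s", 1, 1.0, 2.0, 1.0), ("s", 2, 1.0, 4.0, 1.0)]
r_hat, variance, std_err = ratio_estimate(rows)
```

With x = 1 for every unit, the ratio reduces to the weighted mean of Y, so this small example can be checked by hand: the estimate is 3 and the variance is 1, which matches the usual variance of a mean of two observations.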

Coefficient of Variation

If you specify the keyword CV, PROC SURVEYMEANS computes the coefficient of variation, which is the ratio of the standard error of the mean to the estimated mean:
cv(\bar{Y}) = \mathrm{StdErr}(\hat{\bar{Y}}) / \hat{\bar{Y}}

If you specify the keyword CVSUM, PROC SURVEYMEANS computes the coefficient of variation for the estimated total, which is the ratio of the standard deviation of the sum to the estimated total:

cv(Y) = \mathrm{Std}(\hat{Y}) / \hat{Y}
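
Both coefficients of variation are the same arithmetic applied to different estimates, as the following Python sketch shows. The function name and the numbers are hypothetical, and rejecting a zero estimate is a choice of this sketch.

```python
def coefficient_of_variation(std, estimate):
    """Return std / estimate, the ratio used for both CV and CVSUM."""
    if estimate == 0:
        # Choice of this sketch: the ratio is undefined for a zero estimate.
        raise ValueError("estimate must be nonzero")
    return std / estimate


# Hypothetical values: an estimated mean with its standard error,
# and an estimated total with its standard deviation.
cv_mean = coefficient_of_variation(2.5, 50.0)
cv_sum = coefficient_of_variation(400.0, 10000.0)
```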

Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.