The SURVEYMEANS Procedure

Geometric Mean

For a continuous variable Y that has positive values, the SURVEYMEANS procedure can compute its geometric mean and the associated standard error and confidence limits. To request these statistics, you can specify statistic-keywords such as GEOMEAN, GMSTDERR, and GMCLM.

The geometric mean of Y from a sample is computed as

$\displaystyle \widehat{\bar{Y}}_G = \left( \prod_{h=1}^H \prod_{i=1}^{n_h} \prod_{j=1}^{m_{hi}} y_{hij}^{\, w_{hij}} \right)^{1 / w_{\cdot\cdot\cdot}} = \exp\left( \frac{1}{w_{\cdot\cdot\cdot}} \sum_{h=1}^H \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \ln(y_{hij}) \right)$

where

$w_{\cdot \cdot \cdot } = \sum _{h=1}^ H\sum _{i=1}^{n_ h}\sum _{j=1}^{m_{hi}}w_{hij}$

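is the sum of the weights over all observations in the data set. Here h indexes strata, i indexes clusters within stratum h, and j indexes observations within cluster i.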
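The two forms of the estimator above can be checked numerically. The following is a minimal Python sketch, not the procedure's implementation; the function name `weighted_geometric_mean` and the data values are hypothetical, and the stratum and cluster indices are flattened into a single list because they do not affect the point estimate.

```python
import math

def weighted_geometric_mean(values, weights):
    """Weighted geometric mean in log form: exp(sum(w * ln(y)) / sum(w))."""
    if any(y <= 0 for y in values):
        raise ValueError("the geometric mean requires positive values")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(y) for y, w in zip(values, weights))
    return math.exp(log_sum / total_weight)

# Hypothetical data: three observations with unequal weights.
y = [2.0, 8.0, 4.0]
w = [1.0, 1.0, 2.0]

gm = weighted_geometric_mean(y, w)

# Product form of the same estimator, for comparison.
product_form = math.prod(yi ** wi for yi, wi in zip(y, w)) ** (1.0 / sum(w))

print(gm, product_form)
```

The log form is usually preferable in practice because the product form can overflow or underflow when the weights are large.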

When you use the Taylor series method, the variance of the geometric mean is estimated as

$\displaystyle \hat{V}(\widehat{\bar{Y}}_G) = \left( \widehat{\bar{Y}}_G \right)^2 \sum_{h=1}^H \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} (r_{hi\cdot} - \bar{r}_{h\cdot\cdot})^2$

where

$\displaystyle r_{hi\cdot} = \left( \sum_{j=1}^{m_{hi}} w_{hij} \left( \ln(y_{hij}) - \ln(\widehat{\bar{Y}}_G) \right) \right) / \, w_{\cdot\cdot\cdot}$

$\displaystyle \bar{r}_{h\cdot\cdot} = \left( \sum_{i=1}^{n_h} r_{hi\cdot} \right) / \, n_h$
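The Taylor series variance formula can be traced step by step in a short Python sketch. This is an illustration under stated assumptions rather than the procedure's own code: the function name `taylor_variance_geomean`, the record layout `(stratum, cluster, weight, y)`, and the sample data are hypothetical, the sampling fraction f_h defaults to 0 when not supplied, and a stratum with fewer than two clusters raises an error because the formula is undefined there.

```python
import math
from collections import defaultdict

def taylor_variance_geomean(records, sampling_fraction=None):
    """Return (geometric mean, estimated variance).

    records: iterable of (stratum, cluster, weight, y) tuples.
    sampling_fraction: optional dict mapping stratum -> f_h (assumed 0 if absent).
    """
    records = list(records)
    sampling_fraction = sampling_fraction or {}

    total_w = sum(w for _, _, w, _ in records)
    log_gm = sum(w * math.log(y) for _, _, w, y in records) / total_w
    gm = math.exp(log_gm)

    # r_hi. : weighted sum of centered logs within each cluster, over total weight
    r = defaultdict(lambda: defaultdict(float))
    for h, i, w, y in records:
        r[h][i] += w * (math.log(y) - log_gm) / total_w

    stratum_sum = 0.0
    for h, clusters in r.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two clusters")
        f_h = sampling_fraction.get(h, 0.0)
        r_bar = sum(clusters.values()) / n_h
        squares = sum((r_hi - r_bar) ** 2 for r_hi in clusters.values())
        stratum_sum += n_h * (1.0 - f_h) / (n_h - 1) * squares

    return gm, gm ** 2 * stratum_sum

# Hypothetical sample: one stratum, four single-observation clusters, unit weights.
sample = [("A", k, 1.0, math.exp(k)) for k in range(4)]
gm, variance = taylor_variance_geomean(sample)
print(gm, variance, math.sqrt(variance))
```

With unit weights and one observation per cluster, the sum over strata reduces to the familiar s²/n of the log values, which is a convenient sanity check.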

The standard error of the geometric mean is the square root of the estimated variance:

$\mbox{StdErr}(\widehat{\bar{Y}}_G) = \sqrt{\hat{V}(\widehat{\bar{Y}}_G)}$

The confidence limits for the geometric mean are computed on the log scale, from the confidence limits for the log transformation of the variable Y, and then transformed back:

$\left( \exp ( \ln (\widehat{\bar{Y}}_ G) ~ - ~ \gamma ), ~ ~ ~ ~ \exp ( \ln (\widehat{\bar{Y}}_ G) ~ + ~ \gamma ) \right)$

where

$\gamma = t_{\mathit{df},\, \alpha/2} \cdot \mbox{StdErr}(\widehat{\bar{Y}}_G) / \widehat{\bar{Y}}_G$

and $t_{\mathit{df},\, \alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution, with df calculated as described in the section t Test for the Mean.
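The back-transformation can be sketched in a few lines of Python. The function name `geomean_confidence_limits` and the input values are hypothetical; in particular, the critical value 2.228 is the approximate 97.5th percentile of the t distribution with 10 degrees of freedom, supplied by hand here rather than derived from a survey design.

```python
import math

def geomean_confidence_limits(gm, std_err, t_crit):
    """Confidence limits for a geometric mean, built on the log scale.

    gm: estimated geometric mean (must be positive)
    std_err: standard error of the geometric mean
    t_crit: t critical value for the chosen df and alpha (supplied by the caller)
    """
    half_width = t_crit * std_err / gm      # gamma: half-width on the log scale
    log_gm = math.log(gm)
    return math.exp(log_gm - half_width), math.exp(log_gm + half_width)

# Hypothetical inputs, for illustration only.
lower, upper = geomean_confidence_limits(gm=4.0, std_err=0.5, t_crit=2.228)
print(lower, upper)
```

Because the interval is symmetric on the log scale, it is asymmetric around the geometric mean itself, and the lower limit is always positive.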

If you use replication methods to estimate the variance by specifying VARMETHOD=BRR or VARMETHOD=JACKKNIFE, the procedure computes the variance of the geometric mean, $\hat{V_ R}(\widehat{\bar{Y}}_G)$, from the variability among the replicate estimates. See the section Replication Methods for Variance Estimation for more information.

The standard error is then the square root of this estimated variance:

$\mbox{StdErr}_R(\widehat{\bar{Y}}_G) = \sqrt{\hat{V_ R}(\widehat{\bar{Y}}_G)}$
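As a rough illustration of the replication idea, the Python sketch below applies a delete-one-cluster jackknife to the geometric mean. Several things here are assumptions rather than statements about the procedure: the design is treated as unstratified, each replicate contributes with coefficient (R-1)/R, deviations are taken from the full-sample estimate, and the names `weighted_geomean` and `jackknife_variance_geomean` as well as the data are hypothetical. The section Replication Methods for Variance Estimation remains the authoritative description.

```python
import math

def weighted_geomean(pairs):
    """Weighted geometric mean of (weight, y) pairs."""
    total_w = sum(w for w, _ in pairs)
    return math.exp(sum(w * math.log(y) for w, y in pairs) / total_w)

def jackknife_variance_geomean(clusters):
    """Delete-one-cluster jackknife (assumed unstratified design).

    clusters: list of clusters, each a list of (weight, y) pairs.
    Returns (full-sample geometric mean, jackknife variance estimate).
    """
    n_reps = len(clusters)
    if n_reps < 2:
        raise ValueError("at least two clusters are required")
    full = weighted_geomean([p for c in clusters for p in c])
    coefficient = (n_reps - 1) / n_reps  # assumed jackknife coefficient

    variance = 0.0
    for dropped in range(n_reps):
        # Rescaling the kept weights by a common factor would cancel in the
        # geometric mean, so the original weights are used directly.
        kept = [p for k, c in enumerate(clusters) if k != dropped for p in c]
        variance += coefficient * (weighted_geomean(kept) - full) ** 2
    return full, variance

# Hypothetical data: three single-observation clusters with unit weights.
clusters = [[(1.0, 1.0)], [(1.0, 2.0)], [(1.0, 4.0)]]
full, variance = jackknife_variance_geomean(clusters)
print(full, variance, math.sqrt(variance))
```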

As with the Taylor series method, the confidence limits for the geometric mean are computed on the log scale, from the confidence limits for the log transformation of the variable Y, and then transformed back:

$\left( \exp ( \ln (\widehat{\bar{Y}}_ G) ~ - ~ \lambda ), ~ ~ ~ ~ \exp ( \ln (\widehat{\bar{Y}}_ G) ~ + ~ \lambda ) \right)$

where

$\lambda = t_{\mathit{df},\, \alpha/2} \cdot \mbox{StdErr}_R(\widehat{\bar{Y}}_G) / \widehat{\bar{Y}}_G$

and $t_{\mathit{df},\, \alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution, with df calculated as described in the section t Test for the Mean.