Geometric means are widely used in a variety of scientific disciplines. They are the natural parameter of interest for a lognormal random variable because a ratio of lognormal random variables has a known lognormal distribution, and the geometric mean of a lognormal ratio is equal to the ratio of the individual geometric means. Some common uses for the geometric mean with survey data include estimating average population growth rates, bacterial contamination rates, and chemical concentration rates.
If you use SAS/STAT® 12.1 or later, you can estimate geometric means by specifying the ALLGEO statistic keyword in the PROC SURVEYMEANS statement. If you use a release prior to SAS/STAT 12.1, none of the SAS/STAT survey procedures directly compute geometric means. However, with a little programming you can still estimate a geometric mean and its variance from sample survey data.
Following Wolter (1985), suppose $\bar{Y}$ denotes the population mean of a characteristic $y$ and $\bar{y}$ denotes an estimator of $\bar{Y}$ based on a sample of fixed size $n$. The natural estimator for the exponential function $e^{\bar{Y}}$ is

$$e^{\bar{y}}$$

Suppose $v(\bar{y})$ denotes an estimator of the variance of $\bar{y}$ that is appropriate to the particular sampling design. Then, the Taylor series estimator of variance is

$$v\left(e^{\bar{y}}\right) = e^{2\bar{y}}\, v(\bar{y})$$

These results can be applied directly to the problem of estimating the geometric mean of a finite population characteristic because the geometric mean is the exponentiation of the mean of the natural logarithm. That is, the geometric mean can be expressed as a finite population quantity by

$$\theta = \left(\prod_{i=1}^{N} X_i\right)^{1/N}$$

where $X_i > 0$ for $i = 1, \ldots, N$. Substituting $Y_i = \ln(X_i)$, then

$$\theta = e^{\bar{Y}}$$

and the natural estimator of $\theta$ is

$$\hat{\theta} = e^{\bar{y}}$$

You can estimate the variance of $\hat{\theta}$ by

$$v(\hat{\theta}) = \hat{\theta}^{2}\, v(\bar{y})$$

where the estimator $v(\bar{y})$ is appropriate to the particular sampling design and estimator and is based on the variable $y = \ln(x)$. You can use PROC SURVEYMEANS to estimate the arithmetic mean of $y$; then using the result, you can compute the geometric mean of $x$. Similarly, you can use the variance of the arithmetic mean of $y$ and the computed estimate of the geometric mean of $x$ to compute the variance and standard error of the geometric mean of $x$.
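The Taylor series variance estimator can be motivated by a first-order expansion. The following is a sketch of that reasoning, not a reproduction of Wolter's derivation. The notation is assumed here: $\bar{y}$ is the estimated mean of the log-transformed variable, $\bar{Y}$ is its population value, and $v(\bar{y})$ is the design-based variance estimate of $\bar{y}$.

```latex
\begin{align*}
e^{\bar{y}} &\approx e^{\bar{Y}} + e^{\bar{Y}}\,(\bar{y} - \bar{Y})
  && \text{(first-order Taylor expansion about } \bar{Y}\text{)} \\
\operatorname{Var}\!\left(e^{\bar{y}}\right) &\approx e^{2\bar{Y}}\,\operatorname{Var}(\bar{y})
  && \text{(variance of the linear term)} \\
v\!\left(e^{\bar{y}}\right) &= e^{2\bar{y}}\, v(\bar{y})
  && \text{(substitute the sample estimates)} \\
\operatorname{se}\!\left(e^{\bar{y}}\right) &= e^{\bar{y}}\,\sqrt{v(\bar{y})}
  && \text{(standard error of the geometric mean)}
\end{align*}
```

The last line says that the standard error of the geometric mean is the geometric mean multiplied by the standard error of the mean of the logarithms.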
This example uses hypothetical data that represent survey results from an inspection of meat processing plants. The sampling plan is designed to estimate the levels of the Clostridium perfringens bacteria in processed ground beef. The survey has a stratified, two-stage sampling design and includes the variables PLANT, SHIFT, CLOSTRIDIUM, and WGT. PLANT identifies the strata, SHIFT identifies the clusters, WGT contains the sampling weights, and CLOSTRIDIUM measures the levels of bacteria per gram of product.
data example;
   input plant shift clostridium wgt;
   datalines;
1 1 60 10.3676
1 1 129 11.4145
1 1 5 10.3055
1 2 159 10.3626
1 2 38 11.1399
1 2 72 10.7285
2 1 6335 11.2469
2 1 605 10.7137
2 1 1 11.4907

... more lines ...

20 2 1017 10.7235
20 2 23 11.1932
20 2 3 11.1999
20 2 1 11.6239
;
run;
After you put your data in a SAS data set, generate a variable that contains the natural logarithm of the variable CLOSTRIDIUM:
data example;
   set example;
   logClostridium = log(clostridium);
run;
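The geometric mean is defined only for positive values, and the SAS LOG function returns a missing value (with a note in the log) for a zero or negative argument. The following variant is a minimal sketch of a guarded transformation. The data set name EXAMPLE_LOG and the choice to leave the logarithm missing are assumptions made for illustration, not part of the original example.

```sas
/* Sketch only: EXAMPLE_LOG is a hypothetical data set name, and     */
/* leaving the log missing is one possible handling, not the only one */
data example_log;
   set example;
   if clostridium > 0 then logClostridium = log(clostridium);
   else do;
      /* Leave the log missing and flag the record in the SAS log */
      logClostridium = .;
      put "WARNING: nonpositive value, log not computed: "
          plant= shift= clostridium=;
   end;
run;
```

If you use a variant like this, later steps would read EXAMPLE_LOG instead of EXAMPLE. By default, observations with a missing analysis variable are left out of the analysis of that variable, which changes the estimate, so how to treat such records is an analytic decision rather than a programming one.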
Use PROC SURVEYMEANS to estimate the arithmetic mean of the newly generated variable LOGCLOSTRIDIUM. The WEIGHT statement specifies that the variable WGT contains the sampling weights. The STRATA statement specifies that the variable PLANT identifies the strata. The CLUSTER statement specifies that the variable SHIFT identifies the clusters. The VAR statement specifies the variable LOGCLOSTRIDIUM as the variable whose mean you want to estimate. An ODS OUTPUT statement generates a data set that contains the estimation results.
proc surveymeans data=example;
   weight wgt;
   strata plant;
   cluster shift;
   var logClostridium;
   ods output statistics=estimates;
run;
Data Summary

Number of Strata               20
Number of Clusters             40
Number of Observations        168
Sum of Weights          1854.5865

Statistics

Variable        N    Mean      Std Error of Mean  Lower 95% CL for Mean  Upper 95% CL for Mean
logClostridium  168  3.277231  0.176325           2.90942422             3.64503856
Table 1 shows the contents of the ODS output data set ESTIMATES.
Variable Name  Contents

VarName        Variable name
N              Number of observations
Mean           Estimated mean
StdErr         Standard error
LowerCLMean    Lower confidence limit
UpperCLMean    Upper confidence limit
Use the following DATA step to perform the required transformations on the estimates in the ESTIMATES output data set. The first assignment statement replaces the contents of the variable MEAN with its exponentiated value; this transforms the estimated arithmetic mean of the variable LOGCLOSTRIDIUM into the geometric mean of the variable CLOSTRIDIUM. The next assignment statement transforms the standard error of the arithmetic mean of LOGCLOSTRIDIUM into the standard error of the geometric mean of CLOSTRIDIUM. The next two assignment statements replace the lower and upper confidence limits of the arithmetic mean of LOGCLOSTRIDIUM with their exponentiated values; this transforms the confidence limits for the arithmetic mean of LOGCLOSTRIDIUM into confidence limits for the geometric mean of CLOSTRIDIUM. The last assignment statement changes the contents of VARNAME from LOGCLOSTRIDIUM to CLOSTRIDIUM. The LABEL statement relabels the variables Mean, StdErr, LowerCLMean, and UpperCLMean.
data estimates;
   set estimates;
   Mean = exp(Mean);                       /* geometric mean of CLOSTRIDIUM */
   StdErr = sqrt((Mean**2)*(StdErr**2));   /* Mean is already exponentiated here */
   LowerCLMean = exp(LowerCLMean);
   UpperCLMean = exp(UpperCLMean);
   VarName = 'Clostridium';
   label Mean='Geometric Mean'
         StdErr='Standard Error'
         LowerCLMean='Lower 95% Confidence Limit'
         UpperCLMean='Upper 95% Confidence Limit';
run;
Finally, print the transformed output data set. The output from the PRINT procedure in Figure 2 displays the variable name, the estimate of the geometric mean, the standard error of the estimated geometric mean, and the lower and upper 95% confidence limits.
proc print data=estimates label noobs; run;
Variable Name  N    Geometric Mean  Standard Error  Lower 95% Confidence Limit  Upper 95% Confidence Limit
Clostridium    168  26.502297       4.673013        18.3462321                  38.2842492
The estimated geometric mean for Clostridium perfringens bacteria per gram of product is 26.50, the standard error of the estimate is 4.67, and a 95% confidence interval for the geometric mean is (18.35, 38.28).
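As a check on the arithmetic, the reported values can be recomputed from the log-scale mean and standard error in the PROC SURVEYMEANS output. The following DATA step is a sketch under stated assumptions: the data set name CHECK and its variable names are hypothetical, and the degrees of freedom are assumed to be the number of clusters minus the number of strata (40 - 20 = 20).

```sas
/* Sketch: recompute the geometric mean, its standard error, and the  */
/* two-sided 95% limits. CHECK and its variable names are hypothetical */
data check;
   mean_log = 3.277231;          /* mean of logClostridium            */
   se_log   = 0.176325;          /* standard error of that mean       */
   df       = 40 - 20;           /* assumed: clusters minus strata    */
   t        = tinv(0.975, df);   /* two-sided 95% critical value      */

   geo_mean = exp(mean_log);
   se_geo   = geo_mean * se_log; /* same as sqrt(geo_mean**2 * se_log**2) */
   lower    = exp(mean_log - t*se_log);
   upper    = exp(mean_log + t*se_log);
run;

proc print data=check noobs;
   var geo_mean se_geo lower upper;
run;
```

Under these assumptions the printed values should agree with the reported estimates up to the rounding of the two inputs. Note that the confidence limits are obtained by exponentiating the log-scale limits, so the interval is not symmetric about the geometric mean.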
If you are using SAS/STAT 12.1 or later, you can estimate the geometric mean of CLOSTRIDIUM simply by specifying the ALLGEO statistic keyword in the PROC SURVEYMEANS statement; this requests all available statistics associated with geometric means.
proc surveymeans data=example allgeo;
   weight wgt;
   strata plant;
   cluster shift;
   var Clostridium;
run;
Data Summary

Number of Strata               20
Number of Clusters             40
Number of Observations        168
Sum of Weights          1854.5865

Geometric Means

Variable     Geometric Mean  Std Error  Lower 95% CL for Mean  Upper 95% CL for Mean  Lower 95% One-Sided CL for Mean  Upper 95% One-Sided CL for Mean
clostridium  26.502297       4.673013   18.3462321             38.2842492             19.552843                        35.921718
The estimated geometric mean for Clostridium perfringens bacteria per gram of product is 26.50, the standard error of the estimate is 4.67, a two-sided 95% confidence interval for the geometric mean is (18.35, 38.28), and the lower and upper one-sided 95% confidence limits are 19.55 and 35.92, respectively.
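The one-sided limits differ from the two-sided limits only in the critical value: each uses the 0.95 quantile of the t distribution rather than the 0.975 quantile. The following DATA step sketches that calculation from the log-scale estimates. As before, the 20 degrees of freedom (clusters minus strata) and all variable names are assumptions made for illustration.

```sas
/* Sketch: one-sided 95% limits for the geometric mean.              */
/* Variable names are hypothetical; df = 40 - 20 is an assumption    */
data _null_;
   mean_log  = 3.277231;         /* mean of logClostridium          */
   se_log    = 0.176325;         /* standard error of that mean     */
   df        = 40 - 20;          /* assumed: clusters minus strata  */
   t_one     = tinv(0.95, df);   /* one-sided 95% critical value    */

   lower_one = exp(mean_log - t_one*se_log);
   upper_one = exp(mean_log + t_one*se_log);
   put lower_one= upper_one=;    /* writes both limits to the SAS log */
run;
```

Each limit is meant to be read on its own: the lower limit bounds the geometric mean from below with 95% confidence, and the upper limit bounds it from above.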