The MIANALYZE Procedure

Examples of the Complete-Data Inferences

For a given parameter of interest, it is not always possible to compute the estimate and associated covariance matrix directly from a SAS procedure. This section describes examples of parameters with their estimates and associated covariance matrices, which provide the input to the MIANALYZE procedure. Some are straightforward, and others require special techniques.

Means

For a population mean vector $\bmu $, the usual estimate is the sample mean vector

\[  \overline{\mb{y}} = \frac{1}{n} \sum {\mb{y}_{i}}  \]

A variance estimate for $\overline{\mb{y}}$ is $\frac{1}{n}\mb{S}$, where $\mb{S}$ is the sample covariance matrix

\[  \mb{S} = \frac{1}{n-1} \sum { ( \mb{y}_{i} - \overline{\mb{y}} ) ( \mb{y}_{i} - \overline{\mb{y}} )^{\prime} }  \]

These statistics can be computed with a procedure such as PROC CORR. This approach is illustrated in Example 64.2.
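For example, a minimal sketch of this approach (not the code of Example 64.2 itself) is shown below; the imputed data set outmi, the imputation index _Imputation_, the analysis variables y1-y3, and the complete-data sample size n = 31 (so that EDF=30) are placeholder assumptions.

   proc corr data=outmi cov nocorr out=outcov(type=cov);
      var y1 y2 y3;
      by _Imputation_;                  /* one mean vector and covariance matrix per imputation */
   run;

   proc mianalyze data=outcov edf=30;   /* EDF= gives the complete-data degrees of freedom, n-1 */
      modeleffects y1 y2 y3;
   run;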

Regression Coefficients

Many SAS procedures are available for regression analysis. Among them, PROC REG provides the most general analysis capabilities, and others, such as PROC LOGISTIC and PROC MIXED, provide more specialized analyses.

Some regression procedures, such as REG and LOGISTIC, can create an EST-type data set that contains both the parameter estimates for the regression coefficients and their associated covariance matrix. You can read an EST-type data set into the MIANALYZE procedure with the DATA= option. This approach is illustrated in Example 64.3.
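For example, with placeholder names (outmi, y, x1, x2), PROC REG writes such a data set when you specify the OUTEST= and COVOUT options; the sketch below is one way to set this up.

   proc reg data=outmi outest=outreg covout noprint;
      model y = x1 x2;
      by _Imputation_;                  /* one set of regression estimates per imputation */
   run;

   proc mianalyze data=outreg;          /* DATA= reads the EST-type data set              */
      modeleffects Intercept x1 x2;
   run;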

Other procedures, such as GLM, MIXED, and GENMOD, do not generate EST-type data sets for regression coefficients. For PROC MIXED and PROC GENMOD, you can use the ODS OUTPUT statement to save the parameter estimates in one data set and the associated covariance matrix in a separate data set. These data sets are then read into the MIANALYZE procedure with the PARMS= and COVB= options, respectively. This approach is illustrated in Example 64.4 for PROC MIXED and in Example 64.5 for PROC GENMOD.
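For example, a PROC MIXED sketch along these lines (the data set and variable names are placeholders; the SolutionF and CovB tables correspond to the SOLUTION and COVB options in the MODEL statement) saves and then combines the fixed-effects estimates.

   proc mixed data=outmi;
      model y = x1 x2 / solution covb;
      by _Imputation_;
      ods output SolutionF=mixparms CovB=mixcovb;
   run;

   proc mianalyze parms=mixparms covb(effectvar=rowcol)=mixcovb;
      /* EFFECTVAR=ROWCOL because the MIXED CovB table uses Row and Col variables */
      modeleffects Intercept x1 x2;
   run;

For PROC GENMOD, the analogous ODS tables are ParameterEstimates and CovB, and the ParmInfo table is also saved and supplied to PROC MIANALYZE with the PARMINFO= option so that the parameter names in the covariance table can be matched to the model effects.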

PROC GLM does not display tables for covariance matrices. However, you can use the ODS OUTPUT statement to save the parameter estimates and associated standard errors in one data set and the associated $(\mb{X}^\prime \mb{X})^{-1}$ matrix in a separate data set. These data sets are then read into the MIANALYZE procedure with the PARMS= and XPXI= options, respectively. This approach is illustrated in Example 64.6.
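A sketch of this approach with placeholder names is shown below; the INVERSE option in the MODEL statement requests the $(\mb{X}^\prime \mb{X})^{-1}$ matrix, which ODS provides as the InvXPX table.

   proc glm data=outmi;
      model y = x1 x2 / inverse;
      by _Imputation_;
      ods output ParameterEstimates=glmparms InvXPX=glmxpxi;
   run;

   proc mianalyze parms=glmparms xpxi=glmxpxi;
      modeleffects Intercept x1 x2;
   run;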

For univariate inference, only parameter estimates and associated standard errors are needed. You can use the ODS OUTPUT statement to save the parameter estimates and associated standard errors in a data set. This data set is then read into the MIANALYZE procedure with the PARMS= option. This approach is illustrated in Example 64.4.
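Continuing the PROC MIXED sketch above, the univariate combination needs only the saved parameter estimates.

   proc mianalyze parms=mixparms;       /* estimates and standard errors come from mixparms */
      modeleffects Intercept x1 x2;
   run;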

Correlation Coefficients

For the population correlation coefficient $\rho $, a point estimate is the sample correlation coefficient $r$. However, for nonzero $\rho $, the distribution of $r$ is skewed.

The distribution of $r$ can be normalized through Fisher’s $z$ transformation

\[  z(r) = \frac{1}{2} \,  \mr{log} \left( \frac{1+r}{1-r} \right)  \]

$z(r)$ is approximately normally distributed with mean $z(\rho )$ and variance $1/(n-3)$.

With a point estimate $\hat{z}$ and an approximate 95% confidence interval $(z_{1}, z_{2})$ for $z(\rho )$, a point estimate $\hat{r}$ and a 95% confidence interval $(r_{1}, r_{2})$ for $\rho $ can be obtained by applying the inverse transformation

\[  r = \mr{tanh}(z) = \frac{e^{2z} - 1}{e^{2z} + 1}  \]

to $z = \hat{z}, z_{1}$, and $z_{2}$.

This approach is illustrated in Example 64.10.
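A sketch of the whole sequence (not the code of Example 64.10 itself) is shown below. The data set outmi, the variables y1 and y2, the assumed complete-data sample size n = 30, and the column names Estimate, LCLMean, and UCLMean in the MIANALYZE ParameterEstimates table are assumptions that should be checked against your data and SAS release.

   proc corr data=outmi noprint outp=outr;
      var y1 y2;
      by _Imputation_;
   run;

   /* Keep the correlation of y1 with y2 and apply Fisher's z transformation */
   data outz(keep=_Imputation_ z stdz);
      set outr;
      if _TYPE_='CORR' and upcase(_NAME_)='Y1';
      z    = 0.5 * log( (1 + y2) / (1 - y2) );    /* z(r)                        */
      stdz = 1 / sqrt(30 - 3);                    /* sqrt(1/(n-3)) with n = 30   */
   run;

   proc mianalyze data=outz;
      ods output ParameterEstimates=zparms;
      modeleffects z;
      stderr stdz;
   run;

   /* Apply the inverse transformation to the combined estimate and its limits */
   data rparms;
      set zparms;
      r       = tanh(Estimate);
      r_lower = tanh(LCLMean);                    /* assumed column names        */
      r_upper = tanh(UCLMean);
   run;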

Ratios of Variable Means

For the ratio ${\mu _{1}}/{\mu _{2}}$ of means for variables $Y_{1}$ and $Y_{2}$, the point estimate is $\overline{y}_{1}/ \overline{y}_{2}$, the ratio of the sample means. The Taylor expansion and delta method can be applied to the function ${y_{1}}/{y_{2}}$ to obtain the variance estimate (Schafer 1997, p. 196)

\[  \frac{1}{n} \left[ {\left( \frac{\overline{y}_{1}}{\overline{y}_{2}^{2}} \right)}^{2} s_{22} - 2 {\left( \frac{\overline{y}_{1}}{\overline{y}_{2}^{2}} \right)} {\left( \frac{1}{\overline{y}_{2}} \right)} s_{12} + {\left( \frac{1}{\overline{y}_{2}} \right)}^{2} s_{11} \right]  \]

where $s_{11}$ and $s_{22}$ are the sample variances of $Y_{1}$ and $Y_{2}$, respectively, and $s_{12}$ is the sample covariance between $Y_{1}$ and $Y_{2}$.

A ratio of sample means will be approximately unbiased and normally distributed if the coefficient of variation of the denominator (the standard error for the mean divided by the estimated mean) is 10% or less (Cochran 1977, p. 166; Schafer 1997, p. 196).
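A sketch of this computation with placeholder names (outmi, y1, y2) is shown below: the per-imputation means, variances, and covariance are read from a PROC CORR output data set, the ratio and its delta-method standard error are formed in a DATA step, and the results are combined with PROC MIANALYZE.

   proc corr data=outmi cov nocorr out=ratcov(type=cov);
      var y1 y2;
      by _Imputation_;
   run;

   data ratio(keep=_Imputation_ ratio sdratio);
      set ratcov;
      by _Imputation_;
      retain m1 m2 s11 s12 s22 n;
      if first._Imputation_ then call missing(m1, m2, s11, s12, s22, n);
      if _TYPE_='MEAN' then do; m1 = y1; m2 = y2; end;
      else if _TYPE_='N' then n = y1;               /* complete-data sample size   */
      else if _TYPE_='COV' and upcase(_NAME_)='Y1' then do; s11 = y1; s12 = y2; end;
      else if _TYPE_='COV' and upcase(_NAME_)='Y2' then s22 = y2;
      if last._Imputation_ then do;
         ratio   = m1 / m2;                         /* point estimate              */
         v       = (   (m1/m2**2)**2 * s22
                     - 2*(m1/m2**2)*(1/m2) * s12
                     + (1/m2)**2 * s11 ) / n;       /* delta-method variance       */
         sdratio = sqrt(v);
         output;
      end;
   run;

   proc mianalyze data=ratio;
      modeleffects ratio;
      stderr sdratio;
   run;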