PROC CALIS: Estimation Criteria :: SAS/STAT(R) 9.22 User's Guide

The CALIS Procedure

Estimation Criteria

The following six estimation methods are available in PROC CALIS:

unweighted least squares (ULS)
full information maximum likelihood (FIML)
generalized least squares (GLS)
normal-theory maximum likelihood (ML)
weighted least squares (WLS, ADF)
diagonally weighted least squares (DWLS)

Default weight matrices $\mathbf{W}$ are computed for GLS, WLS, and DWLS estimation. You can also provide your own weight matrices by using an INWGT= data set.

PROC CALIS does not implement all estimation methods in the field. As mentioned in the section Overview: CALIS Procedure, partial least squares (PLS) is not implemented. The PLS method is developed under less restrictive statistical assumptions. It circumvents some computational and theoretical problems encountered by the existing estimation methods in PROC CALIS; however, PLS estimates are less efficient in general. When the statistical assumptions of PROC CALIS are tenable (for example, large sample size, correct distributional assumptions, and so on), ML, GLS, or WLS methods yield better estimates than the PLS method. Note that there is a SAS/STAT procedure called PROC PLS that employs the partial least squares technique, but for a different class of models than those of PROC CALIS. For example, in a PROC CALIS model each latent variable is typically associated with only a subset of manifest variables (predictor or outcome variables). However, in PROC PLS latent variables are not prescribed with subsets of manifest variables. Rather, they are extracted from linear combinations of all manifest predictor variables. Therefore, for general path analysis with latent variables you should use PROC CALIS.

ULS, GLS, and ML Discrepancy Functions

In each estimation method, the parameter vector is estimated iteratively by a nonlinear optimization algorithm that minimizes a discrepancy function $F$, which is also known as the fit function in the literature. With $p$ denoting the number of manifest variables, $\mathbf{S}$ the sample $p \times p$ covariance matrix for a sample with size $N$, $\bar{\mathbf{x}}$ the $p \times 1$ vector of sample means, $\boldsymbol{\Sigma}$ the fitted covariance matrix, and $\boldsymbol{\mu}$ the vector of fitted means, the discrepancy function for unweighted least squares (ULS) estimation is:

$F_{ULS} = 0.5 \, \mathrm{Tr}[(\mathbf{S} - \boldsymbol{\Sigma})^2] + (\bar{\mathbf{x}} - \boldsymbol{\mu})'(\bar{\mathbf{x}} - \boldsymbol{\mu})$

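The discrepancy function for generalized least squares (GLS) estimation is:

$F_{GLS} = 0.5 \, \mathrm{Tr}[(\mathbf{W}^{-1}(\mathbf{S} - \boldsymbol{\Sigma}))^2] + (\bar{\mathbf{x}} - \boldsymbol{\mu})' \mathbf{W}^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu})$

By default, $\mathbf{W} = \mathbf{S}$ is assumed so that $F_{GLS}$ is the normal-theory generalized least squares discrepancy function.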

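The discrepancy function for normal-theory maximum likelihood (ML) estimation is:

$F_{ML} = \mathrm{Tr}(\mathbf{S}\boldsymbol{\Sigma}^{-1}) - p + \ln(|\boldsymbol{\Sigma}|) - \ln(|\mathbf{S}|) + (\bar{\mathbf{x}} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu})$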

In each of the discrepancy functions, $\mathbf{S}$ and $\bar{\mathbf{x}}$ are considered to be given, and $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$ are functions of the model parameter vector $\boldsymbol{\Theta}$. That is:

$F = F(\boldsymbol{\Sigma}(\boldsymbol{\Theta}), \boldsymbol{\mu}(\boldsymbol{\Theta}); \mathbf{S}, \bar{\mathbf{x}})$

Estimating $\boldsymbol{\Theta}$ by using a particular estimation method amounts to choosing a vector $\boldsymbol{\theta}$ that minimizes the corresponding discrepancy function $F$.

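When the mean structures are not modeled or when the mean model is saturated by parameters, the last term of each fit function vanishes. That is, they become:

$F_{ULS} = 0.5 \, \mathrm{Tr}[(\mathbf{S} - \boldsymbol{\Sigma})^2]$
$F_{GLS} = 0.5 \, \mathrm{Tr}[(\mathbf{W}^{-1}(\mathbf{S} - \boldsymbol{\Sigma}))^2]$
$F_{ML} = \mathrm{Tr}(\mathbf{S}\boldsymbol{\Sigma}^{-1}) - p + \ln(|\boldsymbol{\Sigma}|) - \ln(|\mathbf{S}|)$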
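As a minimal numerical sketch (this is not PROC CALIS code), the covariance-only discrepancy functions can be evaluated by hand for a small example. The sketch assumes the standard forms 0.5 Tr[(S - Sigma)^2] for ULS, 0.5 Tr[(W^{-1}(S - Sigma))^2] with W = S by default for GLS, and Tr(S Sigma^{-1}) - p + ln|Sigma| - ln|S| for ML. The function names and the example matrices are hypothetical, and the helpers handle only 2 x 2 matrices to keep the example short.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def subtract(a, b):
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]


def f_uls(s, sigma):
    """ULS: half the trace of the squared residual matrix."""
    r = subtract(s, sigma)
    return 0.5 * trace(matmul(r, r))


def f_gls(s, sigma, w=None):
    """GLS: residuals are premultiplied by W^{-1}; W defaults to S."""
    w = s if w is None else w
    m = matmul(inv2(w), subtract(s, sigma))
    return 0.5 * trace(matmul(m, m))


def f_ml(s, sigma):
    """Normal-theory ML discrepancy without the mean term."""
    p = len(s)
    return (trace(matmul(s, inv2(sigma))) - p
            + math.log(det2(sigma)) - math.log(det2(s)))


# Hypothetical sample and fitted covariance matrices.
S = [[2.0, 0.8], [0.8, 1.0]]
SIGMA = [[2.0, 0.5], [0.5, 1.0]]

if __name__ == "__main__":
    print(f_uls(S, SIGMA), f_gls(S, SIGMA), f_ml(S, SIGMA))
```

All three functions return zero when the fitted matrix reproduces the sample matrix exactly and a positive value otherwise, which is the property the optimizer relies on.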

If, instead of being a covariance matrix, $\mathbf{S}$ is a correlation matrix in the discrepancy functions, $\boldsymbol{\Sigma}$ would naturally be interpreted as the fitted correlation matrix. Although whether $\mathbf{S}$ is a covariance or correlation matrix makes no difference in minimizing the discrepancy functions, correlational analyses that use these functions are problematic because of the following issues:

The diagonal of the fitted correlation matrix $\boldsymbol{\Sigma}$ might contain values other than ones, which violates the requirement of being a correlation matrix.
Whenever available, standard errors computed for correlation analysis in PROC CALIS are straightforward generalizations of those of covariance analysis. In very limited cases these standard errors are good approximations. However, in general they are not even asymptotically correct.
The model fit chi-square statistic for correlation analysis might not follow the theoretical distribution, thus making model fit testing difficult.

Despite these issues in correlation analysis, if your primary interest is to obtain the estimates in the correlation models, you might still find PROC CALIS results for correlation analysis useful.

The statistical techniques used in PROC CALIS are primarily developed for the analysis of covariance structures, and hence COVARIANCE is the default option. Depending on the nature of your research, you can add the mean structures in the analysis by specifying mean and intercept parameters in your models. However, you cannot analyze mean structures simultaneously with correlation structures (see the CORRELATION option) in PROC CALIS.

FIML Discrepancy Function

The full information maximum likelihood method (FIML) assumes multivariate normality of the data. Suppose that you analyze a model that contains $p$ observed variables. The discrepancy function for FIML is

$F_{FIML} = \frac{1}{N} \sum_{j=1}^{N} \left( \ln(|\boldsymbol{\Sigma}_j|) + (\mathbf{x}_j - \boldsymbol{\mu}_j)' \boldsymbol{\Sigma}_j^{-1} (\mathbf{x}_j - \boldsymbol{\mu}_j) + K_j \right)$

where $\mathbf{x}_j$ is a data vector for observation $j$, and $K_j$ is a constant term (to be defined explicitly later) independent of the model parameters $\boldsymbol{\Theta}$. In the current formulation, the $\mathbf{x}_j$'s are not required to have the same dimensions. For example, $\mathbf{x}_1$ could be a complete vector with all $p$ variables present while $\mathbf{x}_2$ is a $(p-1) \times 1$ vector with one missing value that has been excluded from the original $p \times 1$ data vector. As a consequence, subscript $j$ is also used in $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ to denote the submatrices that are extracted from the entire $p \times 1$ structured mean vector $\boldsymbol{\mu}$ ($\boldsymbol{\mu} = \boldsymbol{\mu}(\boldsymbol{\Theta})$) and $p \times p$ covariance matrix $\boldsymbol{\Sigma}$ ($\boldsymbol{\Sigma} = \boldsymbol{\Sigma}(\boldsymbol{\Theta})$). In other words, in the current formulation $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ do not mean that each observation is fitted by distinct mean and covariance structures (although theoretically it is possible to formulate FIML in such a way). The notation simply signifies that the dimensions of $\mathbf{x}_j$ and of the associated mean and covariance structures could vary from observation to observation.

Let $p_j$ be the number of variables without missing values for observation $j$. Then $\mathbf{x}_j$ denotes a $p_j \times 1$ data vector, $\boldsymbol{\mu}_j$ denotes a $p_j \times 1$ vector of means (structured with model parameters), $\boldsymbol{\Sigma}_j$ is a $p_j \times p_j$ matrix for variances and covariances (also structured with model parameters), and $K_j$ is defined by the following formula, which is a constant term independent of model parameters:

$K_j = p_j \ln(2\pi)$

As a general estimation method, the FIML method is based on the same statistical principle as the ordinary maximum likelihood (ML) method for multivariate normal data; that is, both methods maximize the normal-theory likelihood function given the data. In fact, $F_{FIML}$ used in PROC CALIS is related to the log-likelihood function $L$ by the following formula:

$F_{FIML} = -\frac{2L}{N}$

Because the FIML method can deal with observations with various levels of information available, it is primarily developed as an estimation method that could deal with data with random missing values. See the section Relationships among Estimation Criteria for more details about the relationship between FIML and ML methods.
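The casewise nature of FIML can be illustrated with a minimal sketch (not PROC CALIS code). It assumes that each observation contributes ln|Sigma_j| + (x_j - mu_j)' Sigma_j^{-1} (x_j - mu_j) + p_j ln(2 pi), averaged over N observations, and that missing values are marked with `None`. The function names and the data layout are hypothetical. A Cholesky factor is used only to obtain the log determinant and the quadratic form without an explicit matrix inverse.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def fiml_discrepancy(data, mu, sigma):
    """Average casewise discrepancy; None marks a missing value.

    Assumes every row has at least one observed value.
    """
    total = 0.0
    for row in data:
        # Indices of the variables observed for this case.
        obs = [i for i, v in enumerate(row) if v is not None]
        resid = [row[i] - mu[i] for i in obs]
        # Submatrix of the model covariance matrix for the observed variables.
        sub = [[sigma[i][k] for k in obs] for i in obs]
        low = cholesky(sub)
        # Forward substitution: solve low * z = resid, so z'z is the quadratic form.
        z = []
        for i in range(len(obs)):
            z.append((resid[i] - sum(low[i][k] * z[k] for k in range(i)))
                     / low[i][i])
        log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(obs)))
        quad = sum(v * v for v in z)
        k_j = len(obs) * math.log(2.0 * math.pi)
        total += log_det + quad + k_j
    return total / len(data)
```

For example, `fiml_discrepancy([[0.1, -0.3], [0.4, None]], [0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]])` uses both variables for the first case and only the first variable for the second case.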

Whenever you use the FIML method, the mean structures are automatically assumed in the analysis. This is due to the fact that there is no closed-form formula to obtain the saturated mean vector in the FIML discrepancy function if missing values are present in the data. You can provide explicit specification of the mean parameters in the model by specifying intercepts in the LINEQS statement or means and intercepts in the MEAN or MATRIX statement. However, you usually do not need the explicit specification if all you need to achieve is to saturate the mean structures with $p$ parameters (that is, the same number as the number of observed variables in the model). With METHOD=FIML, PROC CALIS uses certain default parameterizations for the mean structures automatically. For example, all intercepts of endogenous observed variables and all means of exogenous observed variables are default parameters in the model, making the explicit specification of these mean structure parameters unnecessary.

WLS and ADF Discrepancy Functions

Another important discrepancy function to consider is the weighted least squares (WLS) function. Let $\mathbf{u} = (\mathbf{s}, \bar{\mathbf{x}})$ be a $p(p+3)/2$ vector containing all nonredundant elements in the sample covariance matrix $\mathbf{S}$ and sample mean vector $\bar{\mathbf{x}}$, with $\mathbf{s} = \mathrm{vecs}(\mathbf{S})$ representing the vector of the $p(p+1)/2$ lower triangle elements of the symmetric matrix $\mathbf{S}$, stacking row by row. Similarly, let $\boldsymbol{\eta} = (\boldsymbol{\sigma}, \boldsymbol{\mu})$ be a $p(p+3)/2$ vector containing all nonredundant elements in the fitted covariance matrix $\boldsymbol{\Sigma}$ and the fitted mean vector $\boldsymbol{\mu}$, with $\boldsymbol{\sigma} = \mathrm{vecs}(\boldsymbol{\Sigma})$ representing the vector of the $p(p+1)/2$ lower triangle elements of the symmetric matrix $\boldsymbol{\Sigma}$.

The WLS discrepancy function is:

$F_{WLS} = (\mathbf{u} - \boldsymbol{\eta})' \mathbf{W}^{-1} (\mathbf{u} - \boldsymbol{\eta})$

where $\mathbf{W}$ is a positive definite symmetric weight matrix with $p(p+3)/2$ rows and columns. Because $\boldsymbol{\eta}$ is a function of the model parameter vector $\boldsymbol{\Theta}$ under the structural model, you can write the WLS function as:

$F_{WLS} = (\mathbf{u} - \boldsymbol{\eta}(\boldsymbol{\Theta}))' \mathbf{W}^{-1} (\mathbf{u} - \boldsymbol{\eta}(\boldsymbol{\Theta}))$
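The stacking and the quadratic form can be sketched as follows (an illustration only, not PROC CALIS code). It assumes that the lower triangle is stacked row by row, that the means follow the covariance elements in both vectors, and that the supplied weight matrix is positive definite. The names `vecs` and `f_wls` are hypothetical.

```python
import math


def vecs(m):
    """Stack the lower-triangle elements of a symmetric matrix row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def f_wls(s, xbar, sigma, mu, w):
    """Quadratic form (u - eta)' W^{-1} (u - eta) for p(p+3)/2 elements."""
    u = vecs(s) + list(xbar)
    eta = vecs(sigma) + list(mu)
    diff = [a - b for a, b in zip(u, eta)]
    if len(w) != len(diff):
        raise ValueError("weight matrix must have p(p+3)/2 rows and columns")
    low = cholesky(w)
    # Solve low * z = diff; then z'z equals diff' W^{-1} diff.
    z = []
    for i in range(len(diff)):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i)))
                 / low[i][i])
    return sum(v * v for v in z)
```

With an identity weight matrix the function reduces to the plain sum of squared differences between the stacked sample and fitted moments.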

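Suppose that $\mathbf{u}$ converges to $\boldsymbol{\eta}_o = (\boldsymbol{\sigma}_o, \boldsymbol{\mu}_o)$ with increasing sample size, where $\boldsymbol{\sigma}_o$ and $\boldsymbol{\mu}_o$ denote the population covariance matrix and mean vector, respectively. By default, the WLS weight matrix $\mathbf{W}$ in PROC CALIS is computed from the raw data as a consistent estimate of the asymptotic covariance matrix $\boldsymbol{\Gamma}$ of $\sqrt{N}(\mathbf{u} - \boldsymbol{\eta}_o)$, with $\boldsymbol{\Gamma}$ partitioned as

$\boldsymbol{\Gamma} = \begin{pmatrix} \boldsymbol{\Gamma}_{ss} & \boldsymbol{\Gamma}_{\bar{x}s}' \\ \boldsymbol{\Gamma}_{\bar{x}s} & \boldsymbol{\Gamma}_{\bar{x}\bar{x}} \end{pmatrix}$

where $\boldsymbol{\Gamma}_{ss}$ denotes the $p(p+1)/2 \times p(p+1)/2$ asymptotic covariance matrix for $\sqrt{N}(\mathbf{s} - \boldsymbol{\sigma}_o)$, $\boldsymbol{\Gamma}_{\bar{x}\bar{x}}$ denotes the $p \times p$ asymptotic covariance matrix for $\sqrt{N}(\bar{\mathbf{x}} - \boldsymbol{\mu}_o)$, and $\boldsymbol{\Gamma}_{\bar{x}s}$ denotes the $p \times p(p+1)/2$ asymptotic covariance matrix between $\sqrt{N}(\bar{\mathbf{x}} - \boldsymbol{\mu}_o)$ and $\sqrt{N}(\mathbf{s} - \boldsymbol{\sigma}_o)$.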

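To compute the default weight matrix $\mathbf{W}$ as a consistent estimate of $\boldsymbol{\Gamma}$, define a similar partition of the weight matrix $\mathbf{W}$ as:

$\mathbf{W} = \begin{pmatrix} \mathbf{W}_{ss} & \mathbf{W}_{\bar{x}s}' \\ \mathbf{W}_{\bar{x}s} & \mathbf{W}_{\bar{x}\bar{x}} \end{pmatrix}$

Each of the submatrices in the partition can now be computed from the raw data. First, with $x_{ri}$ denoting the value of variable $i$ for observation $r$, define the biased sample covariance for variables $i$ and $j$ as:

$s_{ij} = \frac{1}{N} \sum_{r=1}^{N} (x_{ri} - \bar{x}_i)(x_{rj} - \bar{x}_j)$

and the sample fourth-order central moment for variables $i$, $j$, $k$, and $l$ as:

$s_{ij,kl} = \frac{1}{N} \sum_{r=1}^{N} (x_{ri} - \bar{x}_i)(x_{rj} - \bar{x}_j)(x_{rk} - \bar{x}_k)(x_{rl} - \bar{x}_l)$

The submatrices in $\mathbf{W}$ are computed by:

$[\mathbf{W}_{ss}]_{ij,kl} = s_{ij,kl} - s_{ij} s_{kl}$
$[\mathbf{W}_{\bar{x}s}]_{i,kl} = \frac{1}{N} \sum_{r=1}^{N} (x_{ri} - \bar{x}_i)(x_{rk} - \bar{x}_k)(x_{rl} - \bar{x}_l)$
$[\mathbf{W}_{\bar{x}\bar{x}}]_{ij} = s_{ij}$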

Assuming the existence of finite eighth-order moments, this default weight matrix $\mathbf{W}$ is a consistent but biased estimator of the asymptotic covariance matrix $\boldsymbol{\Gamma}$.
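The covariance block of such a weight matrix can be sketched from raw data as follows. This is an illustration under stated assumptions, not the PROC CALIS implementation: it uses the textbook asymptotically distribution-free form, in which the element for the pairs (i, j) and (k, l) is the fourth-order central moment minus the product of the two biased covariances, with all moments divided by N. The function name and the row-by-row pair ordering are hypothetical choices.

```python
def adf_weight_ss(data):
    """Covariance block of an ADF-style weight matrix from raw data.

    data is a list of observations, each a list of p numeric values.
    Rows and columns follow the lower-triangle pairs (i, j), j <= i,
    stacked row by row.
    """
    n = len(data)
    p = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(p)]
    # Deviations from the sample means, one list per observation.
    dev = [[row[i] - means[i] for i in range(p)] for row in data]
    pairs = [(i, j) for i in range(p) for j in range(i + 1)]

    def moment(indices):
        """Biased central moment (divisor N) for the given variable indices."""
        total = 0.0
        for d in dev:
            prod = 1.0
            for i in indices:
                prod *= d[i]
            total += prod
        return total / n

    return [[moment((i, j, k, l)) - moment((i, j)) * moment((k, l))
             for (k, l) in pairs]
            for (i, j) in pairs]
```

For p variables the result has p(p+1)/2 rows and columns, so storage grows roughly with the fourth power of p, which is why this weight matrix becomes expensive for large models.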

By using the ASYCOV= option, you can use Browne’s unbiased estimator (Browne; 1984, formula (3.8)) of $\text{[math]}$ as:

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

There is no guarantee that $\text{[math]}$ computed this way is positive semidefinite. However, the second part is of order $\text{[math]}$ and does not destroy the positive semidefinite first part for sufficiently large $\text{[math]}$ . For a large number of independent observations, default settings of the weight matrix $\text{[math]}$ result in asymptotically distribution-free parameter estimates with unbiased standard errors and a correct $\text{[math]}$ test statistic (Browne; 1982, 1984).

With the default weight matrix $\mathbf{W}$ computed by PROC CALIS, the WLS estimation is also called the asymptotically distribution-free (ADF) method. In fact, as options in PROC CALIS, METHOD=WLS and METHOD=ADF are equivalent, even though WLS in general might include cases with special weight matrices other than the default weight matrix.

When the mean structures are not modeled, the WLS discrepancy function is still the same quadratic form statistic. However, with only the elements in the covariance matrix being modeled, the dimensions of $\mathbf{u}$ and $\boldsymbol{\eta}$ are both reduced to $p(p+1)/2 \times 1$, and the dimension of the weight matrix is now $p(p+1)/2 \times p(p+1)/2$. That is, the WLS discrepancy function for covariance structure models is:

$F_{WLS} = (\mathbf{s} - \boldsymbol{\sigma})' \mathbf{W}_{ss}^{-1} (\mathbf{s} - \boldsymbol{\sigma})$

where $\mathbf{W}_{ss}$ is the part of the weight matrix that corresponds to the covariance elements.

If $\text{[math]}$ is a correlation rather than a covariance matrix, the default setting of the $\text{[math]}$ is a consistent estimator of the asymptotic covariance matrix $\text{[math]}$ of $\text{[math]}$ (Browne and Shapiro; 1986; DeLeeuw; 1983), with $\text{[math]}$ and $\text{[math]}$ representing vectors of sample and population correlations, respectively. Elementwise, $\text{[math]}$ is expressed as:

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

where

$\text{[math]}$

and

$\text{[math]}$

The asymptotic variances of the diagonal elements of a correlation matrix are 0. That is,

$\text{[math]}$

for all $\text{[math]}$ . Therefore, the weight matrix computed this way is always singular. In this case, the discrepancy function for weighted least squares estimation is modified to:

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

where $\text{[math]}$ is the penalty weight specified by the WPENALTY= $\text{[math]}$ option and the $\text{[math]}$ are the elements of the inverse of the reduced $\text{[math]}$ weight matrix that contains only the nonzero rows and columns of the full weight matrix $\text{[math]}$ .

The second term is a penalty term to fit the diagonal elements of the correlation matrix $\text{[math]}$ . The default value of $\text{[math]}$ can be decreased or increased by the WPENALTY= option. The often used value of $\text{[math]}$ seems to be too small in many cases to fit the diagonal elements of a correlation matrix properly.

Note that when you model correlation structures, no mean structures can be modeled simultaneously in the same model.

DWLS Discrepancy Functions

Storing and inverting the huge weight matrix $\mathbf{W}$ in WLS estimation requires considerable computer resources. A compromise is found by implementing the diagonally weighted least squares (DWLS) method that uses only the diagonal of the weight matrix $\mathbf{W}$ from the WLS estimation in the following discrepancy function:

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

When only the covariance structures are modeled, the discrepancy function becomes:

$\text{[math]}$

For correlation models, the discrepancy function is:

$\text{[math]}$

where $\text{[math]}$ is the penalty weight specified by the WPENALTY= $\text{[math]}$ option. Note that no mean structures can be modeled simultaneously with correlation structures when using the DWLS method.

As the statistical properties of DWLS estimates are still not known, standard errors for estimates are not computed for the DWLS method.
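The saving that DWLS offers can be seen in a minimal sketch (not PROC CALIS code). It assumes that the discrepancy is the sum of squared differences between stacked sample and fitted moments, each divided by the corresponding diagonal element of the weight matrix, so no matrix inversion is needed. The function name and the inputs are hypothetical, and the penalty term for correlation models is omitted.

```python
def f_dwls(u, eta, w):
    """Diagonally weighted sum of squared residuals.

    u and eta are stacked sample and fitted moments; w is the full weight
    matrix, of which only the diagonal is used.
    """
    if not (len(u) == len(eta) == len(w)):
        raise ValueError("u, eta, and w must have matching dimensions")
    total = 0.0
    for i, (observed, fitted) in enumerate(zip(u, eta)):
        if w[i][i] <= 0.0:
            raise ValueError("diagonal weights must be positive")
        total += (observed - fitted) ** 2 / w[i][i]
    return total
```

Because the off-diagonal weights never enter the computation, an implementation that stores only the diagonal needs memory proportional to the number of moments rather than to its square.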

Input Weight Matrices

In GLS, WLS, or DWLS estimation you can change from the default settings of weight matrices $\mathbf{W}$ by using an INWGT= data set. The CALIS procedure requires a positive definite weight matrix that has positive diagonal elements.

Multiple-Group Discrepancy Function

Suppose that there are $k$ independent groups in the analysis and $N_1$, $N_2$, ..., $N_k$ are the sample sizes for the groups. The overall discrepancy function $F(\boldsymbol{\Theta})$ is expressed as a weighted sum of the individual discrepancy functions $F_i(\boldsymbol{\Theta})$ for the groups:

$F(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i F_i(\boldsymbol{\Theta})$

where

$t_i = \frac{N_i - 1}{N - k}$

is the weight of the discrepancy function for group $i$, and

$N = \sum_{i=1}^{k} N_i$

is the total number of observations in all groups. In PROC CALIS, all discrepancy functions $F_i$ in the overall discrepancy function must belong to the same estimation method. You cannot specify different estimation methods for the groups in a multiple-group analysis. In addition, the same analysis type must be applied to all groups: that is, you can analyze covariance structures, covariance and mean structures, or correlation structures for all groups, but you cannot mix these types across groups.
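The weighted sum can be illustrated with a short sketch (not PROC CALIS code). It assumes that the weight for each group is (N_i - 1) / (N - k), where N is the total sample size and k is the number of groups, so that the weights sum to one. The function names and the numbers in the usage note are hypothetical.

```python
def group_weights(sizes):
    """Weights (N_i - 1) / (N - k) for k groups with sample sizes N_i."""
    k = len(sizes)
    n = sum(sizes)
    if n <= k:
        raise ValueError("total sample size must exceed the number of groups")
    return [(n_i - 1) / (n - k) for n_i in sizes]


def overall_discrepancy(values, sizes):
    """Weighted sum of per-group discrepancy values."""
    if len(values) != len(sizes):
        raise ValueError("one discrepancy value is required per group")
    return sum(t * f for t, f in zip(group_weights(sizes), values))
```

For instance, `overall_discrepancy([0.3, 0.6], [101, 201])` weights the second group twice as heavily as the first because it contributes twice as many degrees of freedom.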
