The CALIS Procedure

Overview: CALIS Procedure

Structural equation modeling is an important statistical tool in economics and behavioral sciences. Structural equations express relationships among several variables that can be either directly observed variables (manifest variables) or unobserved hypothetical variables (latent variables). For an introduction to latent variable models, refer to Loehlin (1987), Bollen (1989b), Everitt (1984), or Long (1983). For manifest variable models, refer to Fuller (1987).

In structural models, as opposed to functional models, all variables are taken to be random rather than having fixed levels. For maximum likelihood (default) and generalized least squares estimation in PROC CALIS, the random variables are assumed to have an approximately multivariate normal distribution. Nonnormality, especially high kurtosis, can produce poor estimates and grossly incorrect standard errors and hypothesis tests, even in large samples. Consequently, the assumption of normality is much more important than in models with nonstochastic exogenous variables. You should remove outliers and consider transformations of nonnormal variables before using PROC CALIS with maximum likelihood (default) or generalized least squares estimation. If the number of observations is sufficiently large, Browne’s asymptotically distribution-free (ADF) estimation method can be used.
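Before you choose ML or GLS estimation, it can help to quantify how far the data depart from multivariate normality. One widely used summary is Mardia's multivariate kurtosis coefficient, whose expected value under normality is approximately p(p + 2) for p variables. The following NumPy sketch computes it from a raw data array. It is an illustration independent of PROC CALIS, and the function name `mardia_kurtosis` is our own.

```python
import numpy as np


def mardia_kurtosis(x):
    """Return Mardia's multivariate kurtosis b_{2,p} and its normal-theory value p(p + 2).

    x is an (n, p) array with one row per observation.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    # Maximum likelihood covariance estimate (divisor n, not n - 1).
    s = centered.T @ centered / n
    s_inv = np.linalg.inv(s)
    # Squared Mahalanobis distance of each observation from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centered, s_inv, centered)
    return float(np.mean(d2 ** 2)), p * (p + 2)


rng = np.random.default_rng(0)
b2_normal, reference = mardia_kurtosis(rng.standard_normal((50000, 2)))
b2_heavy, _ = mardia_kurtosis(rng.laplace(size=(50000, 2)))
print(reference, round(b2_normal, 2), round(b2_heavy, 2))
```

For simulated normal data the coefficient should land close to the reference value 8 (p = 2). For Laplace-distributed data, whose marginal kurtosis is 6 rather than 3, it should be clearly larger (about 14 in expectation), which is the kind of departure that can distort ML and GLS standard errors and test statistics.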

You can use the CALIS procedure to estimate parameters and test hypotheses for constrained and unconstrained problems in the following types of models:

multiple and multivariate linear regression
linear measurement-error models
path analysis and causal modeling
simultaneous equation models with reciprocal causation
exploratory and confirmatory factor analysis of any order
canonical correlation
a wide variety of other (non)linear latent variable models

The parameters can be estimated by using the following criteria:

unweighted least squares (ULS)
generalized least squares (GLS, with optional weight matrix input)
maximum likelihood (ML, for multivariate normal data)
weighted least squares (WLS, ADF, with optional weight matrix input)
diagonally weighted least squares (DWLS, with optional weight matrix input)

The default weight matrix for generalized least squares estimation is the sample covariance or correlation matrix. The default weight matrix for weighted least squares estimation is an estimate of the asymptotic covariance matrix of the sample covariance or correlation matrix. In this case, weighted least squares estimation is equivalent to Browne’s (1982, 1984) asymptotically distribution-free estimation. The default weight matrix for diagonally weighted least squares estimation is an estimate of the asymptotic variances of the elements of the input sample covariance or correlation matrix. You can also use an input data set to specify the weight matrix in GLS, WLS, and DWLS estimation.
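The ULS, GLS, and ML criteria can each be written as a discrepancy function between the sample covariance matrix S and the model-implied matrix C for p manifest variables. The NumPy sketch below uses the forms commonly given in the structural equation modeling literature, assuming the default GLS weight matrix W = S. It is meant to show what each criterion measures, not to reproduce how PROC CALIS computes them internally; the function names are hypothetical.

```python
import numpy as np


def f_uls(s, c):
    """Unweighted least squares: 0.5 * tr[(S - C)^2]."""
    r = s - c
    return 0.5 * np.trace(r @ r)


def f_gls(s, c, w=None):
    """Generalized least squares: 0.5 * tr[(W^{-1}(S - C))^2], with W = S by default."""
    w = s if w is None else w
    m = np.linalg.solve(w, s - c)  # W^{-1}(S - C) without forming the inverse
    return 0.5 * np.trace(m @ m)


def f_ml(s, c):
    """Normal-theory maximum likelihood: tr(S C^{-1}) - p + ln det C - ln det S."""
    p = s.shape[0]
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_s = np.linalg.slogdet(s)
    return np.trace(s @ np.linalg.inv(c)) - p + logdet_c - logdet_s


S = np.array([[2.0, 0.5], [0.5, 1.0]])  # toy sample covariance matrix
C = np.diag([2.0, 1.0])                 # independence model with matched variances
print(f_uls(S, C), f_gls(S, C), f_ml(S, C))
```

All three functions return 0 when C equals S and grow as the model-implied matrix moves away from the data. For the toy matrices above they give 0.25, 9/49 (about 0.184), and ln(8/7) (about 0.134).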

Estimation methods implemented in PROC CALIS do not exhaust all alternatives in the field. For example, partial least squares (PLS) is not implemented. See the section Estimation Criteria for details about the estimation criteria used in PROC CALIS. Note that there is a SAS/STAT procedure called PROC PLS, which employs the partial least squares technique but for a class of models different from those of PROC CALIS. For general path analysis or structural equation models with latent variables, you should consider using PROC CALIS.

You can specify the model in several ways:

If you have a set of structural equations to describe the model, you can use an equation-type LINEQS statement similar to that originally developed by Bentler (1985).
You can specify simple path models by using an easily formulated list-type RAM statement similar to that originally developed by McArdle (McArdle and McDonald 1984).
You can do a constrained (confirmatory) first-order factor analysis or component analysis by using the FACTOR statement.
You can analyze a broad family of matrix models by using COSAN and MATRIX statements that are similar to the COSAN program of McDonald and Fraser (McDonald 1978, 1980). These statements enable you to specify complex matrix models, including nonlinear equation models and higher-order factor models.
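Whichever statement you use, the model ultimately implies a covariance matrix for the manifest variables. For a confirmatory first-order factor model, the kind the FACTOR statement fits, that matrix is commonly written as ΛΦΛ' + Ψ, with factor loadings Λ, factor covariances Φ, and diagonal unique variances Ψ. The following NumPy sketch evaluates this expression for hypothetical parameter values; the notation and the function name are ours, not PROC CALIS's.

```python
import numpy as np


def factor_implied_cov(loadings, phi, psi):
    """Model-implied covariance of a first-order factor model: L Phi L' + diag(psi)."""
    lam = np.asarray(loadings, dtype=float)   # (p, m) factor loadings
    phi = np.asarray(phi, dtype=float)        # (m, m) factor covariance matrix
    return lam @ phi @ lam.T + np.diag(psi)   # psi: p unique variances


# One factor with unit variance and three indicators (hypothetical values).
implied = factor_implied_cov([[0.8], [0.7], [0.6]], [[1.0]], [0.36, 0.51, 0.64])
print(implied)
```

Because the unique variances are chosen as 1 minus the squared loadings, the implied matrix has a unit diagonal, and each off-diagonal entry is the product of two loadings (0.56, 0.48, and 0.42).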

You can specify linear and nonlinear equality and inequality constraints on the parameters with several different statements, depending on the type of input. Lagrange multiplier test indices are computed for simple constant and equality parameter constraints and for active boundary constraints. General equality and inequality constraints can be formulated using programming statements. For more information, see the section SAS Programming Statements.

PROC CALIS offers a variety of methods for the automatic generation of initial values for the optimization process:

two-stage least squares estimation
instrumental variable factor analysis
approximate factor analysis
ordinary least squares estimation
McDonald’s (McDonald and Hartmann 1992) method

In many common applications, these initial values prevent computational problems and save computer time.
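To illustrate the idea behind ordinary least squares start values, consider a single equation in which a variable y is regressed on manifest predictors x. The OLS coefficients can be computed from the covariance matrix alone as b = Sxx^{-1} sxy, with residual variance syy - b' sxy. This NumPy sketch assumes that textbook form; it is not PROC CALIS's own start-value routine, and the function name and arguments are hypothetical.

```python
import numpy as np


def ols_start_values(cov, y_index, x_indices):
    """OLS coefficients and residual variance for y on x, from a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    sxx = cov[np.ix_(x_indices, x_indices)]  # covariances among the predictors
    sxy = cov[x_indices, y_index]            # covariances of predictors with y
    b = np.linalg.solve(sxx, sxy)
    resid_var = cov[y_index, y_index] - sxy @ b
    return b, resid_var


# Variables ordered (y, x1, x2); hypothetical covariance matrix.
cov = [[2.0, 0.5, 2.0],
       [0.5, 1.0, 0.0],
       [2.0, 0.0, 4.0]]
b, resid_var = ols_start_values(cov, 0, [1, 2])
print(b, resid_var)
```

Here both coefficients equal 0.5 and the residual variance is 0.75. Start values of this kind are cheap to compute and are often already close to the final estimates in well-specified recursive models.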

Because numerical problems can occur in the (non)linearly constrained optimization process, the CALIS procedure offers several optimization algorithms:

Levenberg-Marquardt algorithm (Moré 1978)
trust-region algorithm (Gay 1983)
Newton-Raphson algorithm with line search
ridge-stabilized Newton-Raphson algorithm
various quasi-Newton and dual quasi-Newton algorithms: Broyden-Fletcher-Goldfarb-Shanno and Davidon-Fletcher-Powell, including a sequential quadratic programming algorithm for processing nonlinear equality and inequality constraints
various conjugate gradient algorithms: automatic restart algorithm of Powell (1977), Fletcher-Reeves, Polak-Ribiere, and conjugate descent algorithm of Fletcher (1980)

The quasi-Newton and conjugate gradient algorithms can be modified by several line-search methods. All of the optimization techniques can impose simple boundary and general linear constraints on the parameters. Only the dual quasi-Newton algorithm is able to impose general nonlinear equality and inequality constraints.
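The interplay between a Newton-Raphson direction and a line search is easiest to see in one dimension. The sketch below minimizes the ML discrepancy for a hypothetical one-parameter model C(θ) = θI, whose minimizer is known in closed form (θ = tr(S)/p), so the result can be checked. It uses a simple backtracking (Armijo) line search and falls back to steepest descent where the curvature is not positive. These choices are illustrative assumptions, not a description of the line-search methods in PROC CALIS, and all names are hypothetical.

```python
import math


def newton_line_search(f, grad, hess, x0, tol=1e-10, max_iter=100):
    """Minimize a smooth scalar function by Newton-Raphson with backtracking.

    f should return math.inf outside its domain so that the line search
    rejects infeasible trial points.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        h = hess(x)
        # Newton step where the curvature is positive; otherwise fall back
        # to steepest descent so that d is always a descent direction.
        d = -g / h if h > 0 else -g
        fx = f(x)
        step = 1.0
        # Halve the step until the Armijo sufficient-decrease test holds.
        while f(x + step * d) > fx + 1e-4 * step * g * d:
            step *= 0.5
            if step < 1e-12:
                return x  # no acceptable step; stop at the current point
        x = x + step * d
    return x


def make_ml_scale_problem(trace_s, logdet_s, p):
    """F_ML for the toy model C(theta) = theta * I_p, defined for theta > 0."""
    def f(theta):
        if theta <= 0:
            return math.inf
        return trace_s / theta - p + p * math.log(theta) - logdet_s

    def grad(theta):
        return -trace_s / theta ** 2 + p / theta

    def hess(theta):
        return 2 * trace_s / theta ** 3 - p / theta ** 2

    return f, grad, hess


# S = diag(2, 4): tr(S) = 6, det(S) = 8, p = 2, so the minimizer is theta = 3.
f, grad, hess = make_ml_scale_problem(trace_s=6.0, logdet_s=math.log(8.0), p=2)
print(newton_line_search(f, grad, hess, 1.0), newton_line_search(f, grad, hess, 5.0))
```

Starting from θ = 5, the full Newton step would land at θ = -5, an impossible variance, so the line search shortens it before Newton iteration resumes. Safeguards of this sort are what make line-search variants more robust than plain Newton iteration.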

The procedure creates an OUTRAM= output data set that completely describes the model (except for programming statements) and also contains parameter estimates. This data set can be used as input for another execution of PROC CALIS. Small model changes can be made by editing this data set, so you can exploit the old parameter estimates as starting values in a subsequent analysis. An OUTEST= data set contains information about the optimal parameter estimates (parameter estimates, gradient, Hessian, projected Hessian and Hessian of Lagrange function for constrained optimization, the information matrix, and standard errors). The OUTEST= data set can be used as an INEST= data set to provide starting values and boundary and linear constraints for the parameters. An OUTSTAT= data set contains residuals and, for exploratory factor analysis, the rotated and unrotated factor loadings.

Automatic variable selection (using only those variables from the input data set that are used in the model specification) is performed in connection with the RAM and LINEQS input statements or when these models are recognized in an input model file. Also in these cases, the covariances of the exogenous manifest variables are recognized as given constants. With the PREDET option, you can display the predetermined pattern of constant and variable elements in the predicted model matrix before the minimization process starts. For more information, see the section Automatic Variable Selection and the section Exogenous Manifest Variables.

PROC CALIS offers an analysis of linear dependencies in the information matrix (approximate Hessian matrix) that might be helpful in detecting unidentified models. You also can save the information matrix and the approximate covariance matrix of the parameter estimates (inverse of the information matrix), together with parameter estimates, gradient, and approximate standard errors, in an output data set for further analysis.
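The NumPy sketch below shows what such a linear dependency looks like. It assumes a hypothetical one-factor model with three indicators whose unique variances are fixed, while the three loadings λ and the factor variance φ are free. Four parameters against six distinct moments pass a simple counting check, yet multiplying the loadings by c and dividing the factor variance by c² leaves the implied covariance unchanged. The Jacobian J of the implied moments therefore has rank 3, and J'J (a Gauss-Newton approximation to the Hessian; any matrix J'WJ with positive definite W has the same rank) has a zero eigenvalue whose eigenvector points along (λ, -2φ). All names and values are illustrative.

```python
import numpy as np

PSI = np.diag([0.36, 0.51, 0.64])  # unique variances, fixed (hypothetical values)


def implied_cov(theta):
    """phi * lam lam' + PSI, with theta = (lam1, lam2, lam3, phi)."""
    lam = np.asarray(theta[:3])
    return theta[3] * np.outer(lam, lam) + PSI


def vech(m):
    """Stack the lower triangle (including the diagonal) of a square matrix."""
    rows, cols = np.tril_indices(m.shape[0])
    return m[rows, cols]


def numerical_jacobian(theta, h=1e-6):
    """Central-difference Jacobian of vech(implied_cov) with respect to theta."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        diff = vech(implied_cov(theta + e)) - vech(implied_cov(theta - e))
        columns.append(diff / (2 * h))
    return np.column_stack(columns)


theta = np.array([0.8, 0.7, 0.6, 1.0])
jac = numerical_jacobian(theta)
info = jac.T @ jac  # approximate information matrix, up to a positive weighting
eigenvalues, eigenvectors = np.linalg.eigh(info)  # eigenvalues in ascending order
print(eigenvalues)
print(eigenvectors[:, 0])  # direction of the dependency among the parameters
```

The smallest eigenvalue is zero up to rounding error, and its eigenvector is proportional to (0.8, 0.7, 0.6, -2.0). Fixing either the factor variance or one loading removes the dependency.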

PROC CALIS does not provide the analysis of multiple samples with different sample sizes or a generalized algorithm for missing values in the data. However, multiple samples with equal sample sizes can be analyzed by using a moment supermatrix that contains the individual moment matrices as block-diagonal submatrices.
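Forming the supermatrix is mechanical: the individual moment matrices sit on the diagonal, and the off-diagonal blocks are zero because the samples are independent. The pure-Python sketch below uses a hypothetical function name. The model you specify for the supermatrix would typically need the same block-diagonal structure, with equality constraints across blocks for any parameters assumed equal in the groups.

```python
def block_diagonal(*blocks):
    """Place square matrices (lists of lists) on the diagonal of a zero matrix."""
    size = sum(len(block) for block in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        k = len(block)
        for i in range(k):
            for j in range(k):
                out[offset + i][offset + j] = block[i][j]
        offset += k
    return out


# Two groups with two manifest variables each (hypothetical covariance matrices).
group1 = [[1.0, 0.3], [0.3, 2.0]]
group2 = [[1.5, 0.2], [0.2, 2.5]]
supermatrix = block_diagonal(group1, group2)
for row in supermatrix:
    print(row)
```

Because a single data set carries one number of observations, this device works only when all groups have the same sample size, which is why the restriction above applies.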

The new experimental procedure TCALIS is now available. Except for the COSAN model specification, PROC TCALIS supports almost all model specification methods that are available in the CALIS procedure. In addition, there are many new features in PROC TCALIS, including a PATH statement that enables you to specify models by using a path-like syntax, an MSTRUCT statement that enables you to specify patterned covariance structures directly, multiple-group analysis, enhanced mean and covariance structure analysis, a priori parametric function testing, effect analysis with standard error estimates, and more. For more information, see Chapter 88, The TCALIS Procedure.

The CALIS procedure uses ODS Graphics to create graphs as part of its output. High-quality residual histograms are available in PROC CALIS. See Chapter 21, Statistical Graphics Using ODS, for general information about ODS Graphics. See the section ODS Graphics and the PLOTS= option for specific information about the statistical graphics available with the CALIS procedure.

Structural Equation Models
