The CALIS Procedure

Overview: CALIS Procedure

Structural equation modeling is an important statistical tool in the social and behavioral sciences. Structural equations express relationships among a system of variables that can be either observed variables (manifest variables) or unobserved hypothetical variables (latent variables). For an introduction to latent variable models, see Loehlin (2004), Bollen (1989b), Everitt (1984), or Long (1983); for manifest variables with measurement errors, see Fuller (1987).

In structural models, as opposed to functional models, all variables are taken to be random rather than having fixed levels. For maximum likelihood (ML, the default) and generalized least squares (GLS) estimation in PROC CALIS, the random variables are assumed to have an approximately multivariate normal distribution. Nonnormality, especially high kurtosis, can produce poor estimates and grossly incorrect standard errors and hypothesis tests, even in large samples. Consequently, the assumption of normality is much more important than in models with nonstochastic exogenous variables. You should remove outliers and consider transformations of nonnormal variables before using PROC CALIS with maximum likelihood (default) or generalized least squares estimation.

Alternatively, several approaches are available to deal with nonnormality. If the number of observations is sufficiently large, you can use Browne’s asymptotically distribution-free (ADF; Browne 1982) estimation method. However, there is no definite guideline for how large the sample must be for ADF estimation; simulation studies usually show that several thousand observations might be required. If you use maximum likelihood estimation, the Satorra-Bentler scaled chi-square test statistic and the associated sandwich-type standard error estimates (Satorra and Bentler 1994) might be a viable solution even if your data are not normal. In PROC CALIS, you can apply the Satorra-Bentler method by using the METHOD=MLSB option. In the psychometric literature, the Satorra-Bentler method is sometimes referred to as robust ML. However, PROC CALIS reserves the term "robust" for a different estimation procedure: when you use METHOD=ML (the default) with the ROBUST option, the iterative estimation downweights model outliers so that they have less impact on the estimates. In the CALIS procedure, "robust ML" therefore refers to this method.
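The general idea of downweighting outliers during iterative estimation can be illustrated with a generic iteratively reweighted estimate of a mean vector and covariance matrix. The following Python sketch assumes Huber-type case weights: weight 1 for a case whose Mahalanobis distance d is within a cutoff `c`, and weight `c/d` beyond it. The function name, the cutoff, and the weight function are illustrative assumptions. The sketch downweights cases that lie far from the center of the data rather than cases with large model residuals, so it conveys only the idea and is not the algorithm that PROC CALIS implements for the ROBUST option.

```python
import numpy as np

def huber_weighted_moments(X, c=3.0, max_iter=200, tol=1e-10):
    """Iteratively reweighted mean and covariance with Huber-type case weights.

    A case whose Mahalanobis distance d from the current center exceeds c
    receives weight c / d; every other case receives weight 1.
    Illustrative only: no consistency correction is applied.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False, bias=True)
    for _ in range(max_iter):
        diff = X - mu
        # Mahalanobis distance of every case from the current center
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff))
        w = np.where(d <= c, 1.0, c / np.maximum(d, 1e-12))
        # Weighted mean and covariance: outlying cases count less
        new_mu = w @ X / w.sum()
        diff = X - new_mu
        new_sigma = (w[:, None] * diff).T @ diff / w.sum()
        converged = (np.allclose(new_mu, mu, atol=tol)
                     and np.allclose(new_sigma, sigma, atol=tol))
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma, w

# Usage: 200 well-behaved cases plus one gross outlier
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])
mu, sigma, w = huber_weighted_moments(X)   # w[-1] is the outlier's final weight
```

The outlier's weight ends up well below 1, so the weighted mean stays much closer to the mean of the clean cases than the ordinary mean does.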

When your data contain missing values, you might consider the full information maximum likelihood (FIML) method. Assuming that the data are missing at random (MAR; see Rubin 1976), the FIML method uses all available data to perform estimation within the maximum likelihood framework. In contrast, all other estimation methods perform listwise deletion of incomplete observations before estimation. Hence, when the data are scarce and you need to use as much information as possible, FIML estimation might provide a viable solution.
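Under the MAR assumption, FIML maximizes the sum of casewise normal log-likelihoods. Each case i contributes -0.5 [k_i log(2 pi) + log|Sigma_i| + (x_i - mu_i)' Sigma_i^(-1) (x_i - mu_i)], where x_i holds only the k_i variables observed for that case, and mu_i and Sigma_i are the matching parts of the model-implied mean vector and covariance matrix. The following Python sketch evaluates this objective for given model-implied moments. The function name `fiml_loglik` is hypothetical, and PROC CALIS might organize the computation differently, for example by grouping cases that share a missing pattern.

```python
import numpy as np

def fiml_loglik(X, mu, sigma):
    """Sum of casewise normal log-likelihoods using only the observed entries.

    X: (n, p) data with np.nan marking missing values.
    mu: (p,) model-implied means; sigma: (p, p) model-implied covariance.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in X:
        obs = ~np.isnan(row)               # variables observed in this case
        k = obs.sum()
        if k == 0:
            continue                        # a fully missing case carries no information
        r = row[obs] - mu[obs]
        s = sigma[np.ix_(obs, obs)]         # covariance submatrix of observed variables
        _, logdet = np.linalg.slogdet(s)
        quad = r @ np.linalg.solve(s, r)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
    return total

# Usage: the second case is missing its second variable
X = [[0.3, -1.2], [1.1, np.nan]]
ll = fiml_loglik(X, mu=[0.0, 0.0], sigma=[[1.0, 0.5], [0.5, 2.0]])
```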

You can use the CALIS procedure to estimate parameters and test hypotheses for constrained and unconstrained problems in various situations, including but not limited to the following:

exploratory and confirmatory factor analysis of any order
linear measurement-error models or regression with errors in variables
multiple and multivariate linear regression
multiple-group structural equation modeling with mean and covariance structures
path analysis and causal modeling
latent curve modeling
simultaneous equation models with reciprocal causation
structured covariance and mean matrices in various forms

To specify models in PROC CALIS, you can use a variety of modeling languages:

COSAN: a generalized version of the COSAN program (McDonald 1978, 1980); uses general mean and covariance structures to define models
FACTOR: supports the input of latent factor and observed variable relations
LINEQS: like the EQS program (Bentler 1995), uses equations to describe variable relationships
LISMOD: uses LISREL (Jöreskog and Sörbom 1985) model matrices to define models
MSTRUCT: supports direct parameterizations in the mean and covariance matrices
PATH: provides an intuitive causal path specification interface
RAM: uses the formulation of the reticular action model (McArdle and McDonald 1984) to define models
REFMODEL: provides a quick way for model referencing and respecification
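As an illustration of the RAM formulation, the model-implied covariance matrix of the observed variables is Sigma = F (I - A)^(-1) S (I - A)^(-T) F'. Here A holds the single-headed path coefficients, S holds the variances and covariances (double-headed arrows), and F is a 0/1 filter matrix that selects the observed variables. The following Python sketch computes Sigma for a hypothetical one-factor model with three indicators. The loadings and variances are made-up values, and the code shows only the algebra. It does not show how PROC CALIS reads RAM input.

```python
import numpy as np

def ram_implied_cov(A, S, F):
    """Model-implied covariance of the observed variables under RAM.

    A[i, j]: single-headed path coefficient from variable j to variable i.
    S: variances and covariances of all variables (double-headed arrows).
    F: 0/1 filter matrix that selects the observed variables.
    """
    m = A.shape[0]
    B = np.linalg.inv(np.eye(m) - A)   # (I - A)^(-1)
    return F @ B @ S @ B.T @ F.T

# Hypothetical one-factor model; variable order is y1, y2, y3, f (f is latent)
A = np.zeros((4, 4))
A[0:3, 3] = [1.0, 0.8, 0.6]            # loadings of y1..y3 on f
S = np.diag([0.5, 0.4, 0.3, 2.0])      # error variances and the factor variance
F = np.hstack([np.eye(3), np.zeros((3, 1))])
Sigma = ram_implied_cov(A, S, F)       # equals 2.0 * lam @ lam.T + diag(0.5, 0.4, 0.3)
```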

Various modeling languages are provided to suit a wide range of researchers’ backgrounds and modeling philosophies. However, in some statistical situations one modeling language might be more convenient than the others. This is discussed in the section Which Modeling Language?

In addition to basic model specification, you can set various parameter constraints in PROC CALIS. You can impose equality constraints on parameters simply by giving them the same parameter names in different parts of the model. Boundary, linear, and nonlinear constraints are supported as well. If parameters in the model depend on additional parameters, you can define the dependence by using the PARAMETERS statement and SAS programming statements.
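Reusing a parameter name works like this: every matrix entry that carries the same name is filled from one shared value, so the estimation has fewer free parameters. The following Python sketch uses a hypothetical dictionary-based specification and a hypothetical helper `build_matrix` to illustrate the concept. It does not reflect PROC CALIS syntax.

```python
import numpy as np

# Hypothetical specification: each free loading is labeled by a parameter name.
# Entries that share a name (here "lam2") are constrained to be equal.
loading_spec = {
    (0, 0): "lam1",
    (1, 0): "lam2",
    (2, 0): "lam2",   # same name as entry (1, 0): equal loadings
    (3, 1): "lam3",
}

def build_matrix(spec, values, shape):
    """Fill a matrix from named parameter values; unlabeled entries stay fixed at 0."""
    M = np.zeros(shape)
    for (i, j), name in spec.items():
        M[i, j] = values[name]
    return M

# Four labeled entries, but only three distinct parameters to estimate
free_params = sorted(set(loading_spec.values()))
Lambda = build_matrix(loading_spec, {"lam1": 1.0, "lam2": 0.7, "lam3": 0.9}, (4, 2))
```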

Before analyzing the data, researchers might want to study some of their statistical properties. PROC CALIS can provide the following statistical summaries of the data:

covariance and mean matrices and their properties
descriptive statistics like means, standard deviations, univariate skewness, and kurtosis measures
multivariate measures of kurtosis
coverage of covariances and means, a summary of missing patterns, and the means of each missing pattern when FIML estimation is used
weight matrix and its descriptive properties
robust covariance and mean matrices with the robust methods
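Mardia's coefficient is a widely used measure of multivariate kurtosis: b2,p = (1/n) sum of d_i^4. Here d_i is the Mahalanobis distance of case i from the sample mean, computed with the divide-by-n covariance matrix. Under multivariate normality, b2,p is approximately p(p + 2), and (b2,p - p(p + 2)) / sqrt(8 p (p + 2) / n) is approximately standard normal in large samples. The following Python sketch computes both quantities. The function name is an assumption, and the kurtosis measures that PROC CALIS prints might include other variants or scalings.

```python
import numpy as np

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis b2p and its large-sample normalized value."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = diff.T @ diff / n                                          # divide-by-n covariance
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)   # squared Mahalanobis distances
    b2p = np.mean(d2 ** 2)
    expected = p * (p + 2)                                         # approximate value under normality
    z = (b2p - expected) / np.sqrt(8 * p * (p + 2) / n)
    return b2p, z

# Usage on simulated normal data: b2p should be near 15 and z near 0
rng = np.random.default_rng(1)
b2p, z = mardia_kurtosis(rng.normal(size=(500, 3)))
```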

After a model is fitted and accepted by the researcher, PROC CALIS can provide the following supplementary statistical analysis:

computing squared multiple correlations and determination coefficients
direct and indirect effects partitioning with standard error estimates
model modification tests such as Lagrange multiplier and Wald tests
computing fit summary indices
computing predicted moments of the model
residual analysis on the covariances and means
case-level residual diagnostics with graphical plots
factor rotations
standardized solutions with standard errors
testing parametric functions, individually or simultaneously
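Consider a model written as eta = B eta + Gamma xi + zeta, with endogenous variables eta and exogenous variables xi. The total effects are (I - B)^(-1) - I among the endogenous variables and (I - B)^(-1) Gamma from the exogenous to the endogenous variables. Indirect effects are the total effects minus the direct effects (B or Gamma). The following Python sketch applies this decomposition to a hypothetical mediation model x -> m -> y that also has a direct path x -> y. It assumes a recursive or otherwise stable system and omits the standard errors that PROC CALIS also reports. The function name `effects` is hypothetical.

```python
import numpy as np

def effects(B, Gamma):
    """Total, direct, and indirect effects for eta = B @ eta + Gamma @ xi + zeta.

    B[i, j]: direct effect of endogenous j on endogenous i.
    Gamma[i, k]: direct effect of exogenous k on endogenous i.
    Returns (total, direct, indirect) triples; standard errors are not computed.
    """
    m = B.shape[0]
    inv = np.linalg.inv(np.eye(m) - B)
    total_eta = inv - np.eye(m)   # among endogenous variables
    total_xi = inv @ Gamma        # exogenous -> endogenous
    return {
        "eta_on_eta": (total_eta, B, total_eta - B),
        "xi_on_eta": (total_xi, Gamma, total_xi - Gamma),
    }

# Endogenous order: m, y. Paths: x -> m = 0.5, m -> y = 0.4, x -> y = 0.2
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])
Gamma = np.array([[0.5],
                  [0.2]])
total, direct, indirect = effects(B, Gamma)["xi_on_eta"]
# Total effect of x on y is 0.2 + 0.5 * 0.4 = 0.4, of which 0.2 is indirect
```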

When fitting a model, you need to choose an estimation method. The following estimation methods are supported in the CALIS procedure:

diagonally weighted least squares (DWLS, with optional weight matrix input)
full information maximum likelihood (FIML, which can handle observations with values missing at random)
generalized least squares (GLS, with optional weight matrix input)
maximum likelihood (ML, for multivariate normal data); this is the default method
maximum likelihood with Satorra-Bentler scaled model fit chi-square statistic and sandwich-type standard error estimation (MLSB)
robust estimation with maximum likelihood model evaluation (ROBUST option with METHOD=ML)
unweighted least squares (ULS)
weighted least squares or asymptotically distribution-free method (WLS or ADF, with optional weight matrix input)
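For covariance structures without mean structures, three of these criteria are commonly written as follows:

F_ULS = 0.5 tr[(S - Sigma)^2]
F_GLS = 0.5 tr[(W^(-1)(S - Sigma))^2], with W = S as the usual choice
F_ML = log|Sigma| - log|S| + tr(S Sigma^(-1)) - p

Here S is the sample covariance matrix, Sigma is the model-implied covariance matrix, and p is the number of observed variables. The following Python sketch evaluates these textbook forms. The function names are assumptions. For the criteria PROC CALIS actually uses, including those with mean structures and the WLS and DWLS weight matrices, see the section Estimation Criteria.

```python
import numpy as np

def f_uls(S, Sigma):
    """Unweighted least squares discrepancy."""
    R = S - Sigma
    return 0.5 * np.trace(R @ R)

def f_gls(S, Sigma, W=None):
    """Generalized least squares discrepancy; W defaults to S."""
    W = S if W is None else W
    R = np.linalg.solve(W, S - Sigma)   # W^(-1) (S - Sigma)
    return 0.5 * np.trace(R @ R)

def f_ml(S, Sigma):
    """Normal-theory maximum likelihood discrepancy (zero when Sigma equals S)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(np.linalg.solve(Sigma, S)) - p

# Usage: one observed variable with sample variance 4 and model variance 2
S = np.array([[4.0]])
Sigma = np.array([[2.0]])
values = (f_uls(S, Sigma), f_gls(S, Sigma), f_ml(S, Sigma))
```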

Estimation methods implemented in PROC CALIS do not exhaust all alternatives in the field. For example, the partial least squares (PLS) method is not implemented. See the section Estimation Criteria for details about estimation criteria used in PROC CALIS. Note that there is a SAS/STAT procedure called PROC PLS, which employs the partial least squares technique but for a different class of models than those of PROC CALIS. For general path analysis with latent variables, consider using PROC CALIS.

All estimation methods need starting values for the parameter estimates. You can provide starting values for any parameters. For each parameter without a user-supplied starting value, PROC CALIS determines one by using one or a combination of the following methods:

approximate factor analysis
default initial values
instrumental variable method
matching observed moments of exogenous variables
McDonald’s method (McDonald and Hartmann 1992)
ordinary least squares estimation
random number generation, if a seed is provided
two-stage least squares estimation

Although no methods for initial estimates are completely foolproof, the initial estimation methods provided by PROC CALIS behave reasonably well in most common applications.

Starting from the initial estimates, PROC CALIS iteratively updates the solution to reach the optimum defined by the estimation criterion, a process known as optimization. Because numerical problems can occur in any optimization process, the CALIS procedure offers several optimization algorithms, so that you can switch to an alternative algorithm when the one being used fails. The following optimization algorithms are supported in PROC CALIS:

Levenberg-Marquardt algorithm (Moré 1978)
trust-region algorithm (Gay 1983)
Newton-Raphson algorithm with line search
ridge-stabilized Newton-Raphson algorithm
various quasi-Newton and dual quasi-Newton algorithms: Broyden-Fletcher-Goldfarb-Shanno and Davidon-Fletcher-Powell, including a sequential quadratic programming algorithm for processing nonlinear equality and inequality constraints
various conjugate gradient algorithms: automatic restart algorithm of Powell (1977), Fletcher-Reeves, Polak-Ribiere, and conjugate descent algorithm of Fletcher (1980)
iteratively reweighted least squares for robust estimation
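The following Python sketch shows one of the listed algorithms, Newton-Raphson with a backtracking line search. It is applied to the ML discrepancy of the simplest possible model: a single observed variable whose variance theta is the only parameter. In that case F(theta) = log(theta) - log(s) + s/theta - 1, and the minimizer is theta = s. The function names, the tolerance, and the Armijo constant 1e-4 are assumptions, and the line-search and convergence criteria in PROC CALIS differ.

```python
import math

def f_ml_1d(theta, s):
    """ML discrepancy for one observed variable with model variance theta."""
    return math.log(theta) - math.log(s) + s / theta - 1.0

def newton_line_search(s, theta=1.0, tol=1e-8, max_iter=100):
    """Minimize f_ml_1d over theta > 0 by Newton-Raphson with backtracking."""
    for _ in range(max_iter):
        g = 1.0 / theta - s / theta**2            # first derivative
        if abs(g) < tol:
            break
        h = -1.0 / theta**2 + 2.0 * s / theta**3  # second derivative
        # Newton direction if the curvature is positive, else steepest descent
        d = -g / h if h > 0 else -g
        f0 = f_ml_1d(theta, s)
        step = 1.0
        # Halve the step until it stays feasible (theta > 0) and meets the Armijo condition
        while step > 1e-12:
            trial = theta + step * d
            if trial > 0 and f_ml_1d(trial, s) <= f0 + 1e-4 * step * g * d:
                break
            step *= 0.5
        theta += step * d
    return theta

# Usage: the estimate converges to the sample variance s
estimate = newton_line_search(4.0)
```

The fallback to steepest descent covers the region theta > 2s, where the second derivative is negative and a pure Newton step would point away from the minimum.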

In addition to the ability to save output tables as data sets by using the ODS OUTPUT statement, PROC CALIS supports the following types of output data sets so that you can save your analysis results for later use:

OUTEST= data sets for storing parameter estimates and their covariance estimates
OUTFIT= data sets for storing fit indices and some pertinent modeling information
OUTMODEL= data sets for storing model specifications and final estimates
OUTSTAT= data sets for storing descriptive statistics, robust covariances and means, residuals, predicted moments, and latent variable scores regression coefficients
OUTWGT= data sets for storing the weight matrices used in the modeling

The OUTEST=, OUTMODEL=, and OUTWGT= data sets can be used as input data sets for subsequent analyses. That is, in addition to the input data provided by the DATA= option, PROC CALIS supports the following input data sets for various purposes in the analysis:

INEST= data sets for providing initial parameter estimates. An INEST= data set could be an OUTEST= data set created from a previous analysis.
INMODEL= data sets for providing model specifications and initial estimates. An INMODEL= data set could be an OUTMODEL= data set created from a previous analysis.
INWGT= data sets for providing the weight matrices. An INWGT= data set could be an OUTWGT= data set created from a previous analysis.

The CALIS procedure uses ODS Graphics to create high-quality graphs as part of its output. You can produce the following graphical output by specifying the PLOTS= option or the PATHDIAGRAM statement:

histogram for mean, covariance, or correlation residuals
histogram for case-level residual M-distances
case-level residual diagnostic plots such as residual by leverage plot, residual by predicted plot, PP-plot, and QQ-plot
path diagram for initial model specification, unstandardized solution, or standardized solution

See Chapter 21: Statistical Graphics Using ODS, for general information about ODS Graphics. See the section ODS Graphics and the PLOTS= option for specific information about the statistical graphics available with the CALIS procedure. For more information about producing customized path diagrams, see the options of the PATHDIAGRAM statement.