The CALIS Procedure

Overview: CALIS Procedure

Structural equation modeling is an important statistical tool in social and behavioral sciences. Structural equations express relationships among a system of variables that can be either observed variables (manifest variables) or unobserved hypothetical variables (latent variables). For an introduction to latent variable models, see Loehlin (2004), Bollen (1989b), Everitt (1984), or Long (1983); and for manifest variables with measurement errors, see Fuller (1987).

In structural models, as opposed to functional models, all variables are taken to be random rather than having fixed levels. For maximum likelihood (default) and generalized least squares estimation in PROC CALIS, the random variables are assumed to have an approximately multivariate normal distribution. Nonnormality, especially high kurtosis, can produce poor estimates and grossly incorrect standard errors and hypothesis tests, even in large samples. Consequently, the assumption of normality is much more important than in models with nonstochastic exogenous variables. You should remove outliers and consider transformations of nonnormal variables before using PROC CALIS with maximum likelihood (default) or generalized least squares estimation. If the number of observations is sufficiently large, Browne’s asymptotically distribution-free (ADF) estimation method can be used. If your data sets contain random missing data, the full information maximum likelihood (FIML) method can be used.

You can use the CALIS procedure to estimate parameters and test hypotheses for constrained and unconstrained problems in various situations, including but not limited to the following:

  • exploratory and confirmatory factor analysis of any order

  • linear measurement-error models or regression with errors in variables

  • multiple and multivariate linear regression

  • multiple-group structural equation modeling with mean and covariance structures

  • path analysis and causal modeling

  • simultaneous equation models with reciprocal causation

  • structured covariance and mean matrices in various forms

To specify models in PROC CALIS, you can use a variety of modeling languages:

  • COSAN—a generalized version of the COSAN program (McDonald; 1978, 1980), uses general mean and covariance structures to define models

  • FACTOR—supports the input of latent factor and observed variable relations

  • LINEQS—like the EQS program (Bentler; 1995), uses equations to describe variable relationships

  • LISMOD—utilizes LISREL (Jöreskog and Sörbom; 1985) model matrices to define models

  • MSTRUCT—supports direct parameterizations in the mean and covariance matrices

  • PATH—provides an intuitive causal path specification interface

  • RAM—utilizes the formulation of the reticular action model (McArdle and McDonald; 1984) to define models

  • REFMODEL—provides a quick way for model referencing and respecification

Various modeling languages are provided to suit a wide range of researchers’ backgrounds and modeling philosophies. However, statistical situations might arise where one modeling language is more convenient than the others. The section Which Modeling Language? discusses this choice.
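
As a rough illustration of how two of these modeling languages can express the same model, the following sketch fits a two-predictor regression first with the PATH language and then with the LINEQS language. The data set work.scores, its variables (y, x1, x2), and the parameter names b1 and b2 are hypothetical and simulated here only so that the steps can run. The arrow notation <=== follows recent SAS/STAT releases; older releases might require a shorter arrow such as <-.

```sas
/* Hypothetical data: simulated only so the steps below can run */
data work.scores;
   call streaminit(2024);
   do i = 1 to 200;
      x1 = rand('normal');
      x2 = rand('normal');
      y  = 0.5*x1 + 0.3*x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* PATH language: each entry is a directed path between variables */
proc calis data=work.scores;
   path
      y <=== x1 = b1,
      y <=== x2 = b2;
run;

/* LINEQS language: the same model written as an equation.    */
/* E1 is the error term (error names start with the letter E) */
proc calis data=work.scores;
   lineqs
      y = b1 * x1 + b2 * x2 + E1;
run;
```

Both steps should describe the same covariance structure, so the choice between them in a case like this is mostly a matter of notation.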

In addition to basic model specification, you can set various parameter constraints in PROC CALIS. Equality constraints on parameters can be achieved by simply giving the same parameter names in different parts of the model. Boundary, linear, and nonlinear constraints are supported as well. If parameters in the model are dependent on additional parameters, you can define the dependence by using the PARAMETERS and the SAS programming statements.
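
The constraint mechanisms described above might be sketched as follows. The data set work.scores, its variables (y1, y2, x1, x2), and the parameter names beta, b1, b2, and delta are all hypothetical. The first step imposes an equality constraint by reusing one parameter name; the second defines a dependent parameter with the PARAMETERS statement and a programming statement, and bounds the additional parameter.

```sas
/* Equality constraint: both paths share the parameter name beta, */
/* so the two path coefficients are estimated as a single value   */
proc calis data=work.scores;
   path
      y1 <=== x1 = beta,
      y2 <=== x2 = beta;
run;

/* Dependent parameter: b2 is defined as a function of b1 and the */
/* additional parameter delta, which is bounded below by zero     */
proc calis data=work.scores;
   path
      y1 <=== x1 = b1,
      y2 <=== x2 = b2;
   parameters delta;       /* additional independent parameter   */
   b2 = b1 + delta;        /* SAS programming statement          */
   bounds 0 <= delta;      /* boundary constraint                */
run;
```

In the second step, the bound on delta together with the programming statement amounts to requiring that b2 be at least as large as b1.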

Before the data are analyzed, researchers might be interested in studying some statistical properties of the data. PROC CALIS can provide the following statistical summary of the data:

  • covariance and mean matrices and their properties

  • descriptive statistics like means, standard deviations, univariate skewness, and kurtosis measures

  • multivariate measures of kurtosis

  • coverage of covariances and means, a summary of missing patterns, and means of the missing patterns when FIML estimation is used

  • weight matrix and its descriptive properties
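
A minimal sketch of requesting some of these data summaries, assuming a hypothetical raw data set work.scores with numeric variables y, x1, and x2. The SIMPLE and KURTOSIS options in the PROC CALIS statement are used here; because skewness and kurtosis are computed from individual observations, the sketch assumes raw data rather than a covariance matrix as input.

```sas
/* SIMPLE   : means, standard deviations, and related univariate statistics */
/* KURTOSIS : univariate and multivariate kurtosis measures                  */
proc calis data=work.scores simple kurtosis;
   path
      y <=== x1 x2;
run;
```

Inspecting the kurtosis output before interpreting the fit is one way to judge whether the normality assumption behind ML or GLS estimation is plausible for the data at hand.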

After a model is fitted and accepted by the researcher, PROC CALIS can provide the following supplementary statistical analyses:

  • squared multiple correlations and determination coefficients

  • partitioning of direct and indirect effects, with standard error estimates

  • model modification tests such as Lagrange multiplier and Wald tests

  • fit summary indices

  • predicted moments of the model

  • residual analysis

  • factor rotations

  • standardized solutions with standard errors

  • tests of parametric functions, individually or simultaneously
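
Several of these analyses are requested through options in the PROC CALIS statement. The following sketch assumes a hypothetical data set work.scores with variables x1, y1, and y2, and fits a simple mediation chain so that an indirect effect exists and the model has a degree of freedom left for modification tests.

```sas
/* MODIFICATION : Lagrange multiplier and Wald tests             */
/* RESIDUAL     : residual analysis                              */
/* TOTEFF       : total, direct, and indirect effects            */
proc calis data=work.scores modification residual toteff;
   path
      y1 <=== x1,      /* direct effect of x1 on y1              */
      y2 <=== y1;      /* x1 affects y2 only indirectly, via y1  */
run;
```

Here the omitted direct path from x1 to y2 is the kind of restriction that a Lagrange multiplier test would be expected to evaluate.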

When fitting a model, you need to choose an estimation method. The following estimation methods are supported in the CALIS procedure:

  • diagonally weighted least squares (DWLS, with optional weight matrix input)

  • full information maximum likelihood (FIML, which can treat observations with random missing values)

  • generalized least squares (GLS, with optional weight matrix input)

  • maximum likelihood (ML, for multivariate normal data); this is the default method

  • unweighted least squares (ULS)

  • weighted least squares or asymptotically distribution-free method (WLS or ADF, with optional weight matrix input)
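
The estimation method is selected with the METHOD= option in the PROC CALIS statement. As a brief sketch, assuming a hypothetical raw data set work.scores (variables y, x1, x2) in which some values are missing:

```sas
/* METHOD=FIML uses all available observations, including those   */
/* with missing values; it requires raw data as input.            */
/* Other values include ML (the default), GLS, WLS (or ADF), ULS, */
/* and DWLS.                                                      */
proc calis data=work.scores method=fiml;
   path
      y <=== x1 x2;
run;
```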

The estimation methods implemented in PROC CALIS do not exhaust all alternatives in the field. For example, the partial least squares (PLS) method is not implemented. A separate SAS/STAT procedure, PROC PLS, employs the partial least squares technique, but for a different class of models than those of PROC CALIS. For general path analysis with latent variables, consider using PROC CALIS. See the section Estimation Criteria for details about the estimation criteria used in PROC CALIS.

All estimation methods need starting values for the parameter estimates. You can provide starting values for any parameters. For any parameter without a user-supplied starting value, PROC CALIS determines the starting value by using one or any combination of the following methods:

  • approximate factor analysis

  • default initial values

  • instrumental variable method

  • matching observed moments of exogenous variables

  • McDonald’s method (McDonald and Hartmann; 1992)

  • ordinary least squares estimation

  • random number generation, if a seed is provided

  • two-stage least squares estimation

Although no method for computing initial estimates is completely foolproof, the initial estimation methods provided by PROC CALIS behave reasonably well in most common applications.
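
When the automatic starting values do not lead to convergence, you can supply your own. The following sketch, which assumes a hypothetical data set work.scores (variables y, x1, x2) and hypothetical parameter names b1 and b2, gives one parameter a starting value in parentheses and leaves the other to PROC CALIS.

```sas
proc calis data=work.scores;
   path
      y <=== x1 = b1 (0.5),   /* b1 starts at 0.5                    */
      y <=== x2 = b2;         /* b2 starts at a value PROC CALIS picks */
run;
```

A number in parentheses after a parameter name is a starting value for a free parameter; a number without a name and without parentheses would instead fix the coefficient at that value.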

Starting from the initial estimates, PROC CALIS iterates to reach the optimum solution as defined by the estimation criterion. This process is known as optimization. Because numerical problems can occur in any optimization process, the CALIS procedure offers several optimization algorithms so that you can switch to an alternative when the one being used fails. The following optimization algorithms are supported in PROC CALIS:

  • Levenberg-Marquardt algorithm (Moré; 1978)

  • trust-region algorithm (Gay; 1983)

  • Newton-Raphson algorithm with line search

  • ridge-stabilized Newton-Raphson algorithm

  • various quasi-Newton and dual quasi-Newton algorithms: Broyden-Fletcher-Goldfarb-Shanno and Davidon-Fletcher-Powell, including a sequential quadratic programming algorithm for processing nonlinear equality and inequality constraints

  • various conjugate gradient algorithms: automatic restart algorithm of Powell (1977), Fletcher-Reeves, Polak-Ribiere, and conjugate descent algorithm of Fletcher (1980)
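
The optimization algorithm is chosen with the OMETHOD= option in the PROC CALIS statement. The following sketch assumes a hypothetical data set work.scores (variables y1, y2, x1, x2) and switches to a quasi-Newton algorithm with a larger iteration limit, as one might try after a failed run with another algorithm.

```sas
/* OMETHOD=QUANEW selects a quasi-Newton algorithm. Other values    */
/* include LEVMAR (Levenberg-Marquardt), TRUREG (trust region),     */
/* NEWRAP (Newton-Raphson with line search), NRRIDG (ridge-         */
/* stabilized Newton-Raphson), and CONGRA (conjugate gradient).     */
/* MAXITER= raises the limit on the number of iterations.           */
proc calis data=work.scores omethod=quanew maxiter=500;
   path
      y1 <=== x1 = beta,
      y2 <=== x2 = beta;
run;
```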

In addition to the ability to save output tables as data sets by using the ODS OUTPUT statement, PROC CALIS supports the following types of output data sets so that you can save your analysis results for later use:

  • OUTEST= data sets for storing parameter estimates and their covariance estimates

  • OUTFIT= data sets for storing fit indices and some pertinent modeling information

  • OUTMODEL= data sets for storing model specifications and final estimates

  • OUTSTAT= data sets for storing descriptive statistics, residuals, predicted moments, and latent variable scores regression coefficients

  • OUTWGT= data sets for storing the weight matrices used in the modeling

The OUTEST=, OUTMODEL=, and OUTWGT= data sets can be used as input data sets for subsequent analyses. That is, in addition to the input data provided by the DATA= option, PROC CALIS supports the following input data sets for various purposes in the analysis:

  • INEST= data sets for providing initial parameter estimates. An INEST= data set could be an OUTEST= data set created from a previous analysis.

  • INMODEL= data sets for providing model specifications and initial estimates. An INMODEL= data set could be an OUTMODEL= data set created from a previous analysis.

  • INWGT= data sets for providing the weight matrices. An INWGT= data set could be an OUTWGT= data set created from a previous analysis.
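
The round trip between output and input data sets might look like the following sketch. All data set names (work.scores, work.scores2, work.est, work.model) and parameter names are hypothetical; work.scores2 stands for a second sample that contains the same variables as the first.

```sas
/* First analysis: save the estimates and the model specification */
proc calis data=work.scores outest=work.est outmodel=work.model;
   path
      y <=== x1 = b1,
      y <=== x2 = b2;
run;

/* Second analysis: read the model specification and its final    */
/* estimates from the OUTMODEL= data set, so no model statements  */
/* are repeated here; the saved estimates serve as initial values */
proc calis data=work.scores2 inmodel=work.model;
run;
```

The OUTEST= data set could be passed to a later step through the INEST= option in the same way when only starting values, not the model specification, need to be carried over.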

The CALIS procedure uses ODS Graphics to create graphs as part of its output. High-quality residual histograms are available in PROC CALIS. See Chapter 21, Statistical Graphics Using ODS, for general information about ODS Graphics. See the section ODS Graphics and the PLOTS= option for specific information about the statistical graphics available with the CALIS procedure.
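
A minimal sketch of requesting the residual graphics, assuming a hypothetical data set work.scores (variables y1, y2, x1, x2) and assuming that PLOTS=RESIDUALS is the appropriate request in your release; check the PLOTS= option for the values that your release supports. The model is constrained so that the residuals are not all zero.

```sas
ods graphics on;

/* RESIDUAL requests the residual analysis; PLOTS=RESIDUALS asks */
/* for the corresponding residual histograms                     */
proc calis data=work.scores residual plots=residuals;
   path
      y1 <=== x1 = beta,
      y2 <=== x2 = beta;
run;

ods graphics off;
```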