The CALIS Procedure

Estimation Criteria

The following seven estimation methods are available in PROC CALIS:

  • unweighted least squares (ULS)

  • full information maximum likelihood (FIML)

  • generalized least squares (GLS)

  • normal-theory maximum likelihood (ML)

  • weighted least squares (WLS, ADF)

  • diagonally weighted least squares (DWLS)

  • robust estimation (ROBUST) under the normal theory (ML)

Default weight matrices $\mb{W}$ are computed for GLS, WLS, and DWLS estimation. You can also provide your own weight matrices by using an INWGT= data set. The weight matrices in these estimation methods provide weights for the moment matrices. In contrast, weights that are applied to individual observations are computed in robust estimation. These observation weights are updated during iteration steps of robust estimation. The ULS, GLS, ML, WLS, ADF, and DWLS methods can analyze sample moment matrices as well as raw data, while the FIML and robust methods must analyze raw data.

PROC CALIS does not implement all estimation methods in the field. As mentioned in the section Overview: CALIS Procedure, partial least squares (PLS) is not implemented. The PLS method is developed under less restrictive statistical assumptions. It circumvents some computational and theoretical problems encountered by the existing estimation methods in PROC CALIS; however, PLS estimates are less efficient in general. When the statistical assumptions of PROC CALIS are tenable (for example, large sample size, correct distributional assumptions, and so on), ML, GLS, or WLS methods yield better estimates than the PLS method. Note that there is a SAS/STAT procedure called PROC PLS that employs the partial least squares technique, but for a different class of models than those of PROC CALIS. For example, in a PROC CALIS model each latent variable is typically associated with only a subset of manifest variables (predictor or outcome variables). However, in PROC PLS latent variables are not prescribed with subsets of manifest variables. Rather, they are extracted from linear combinations of all manifest predictor variables. Therefore, for general path analysis with latent variables you should use PROC CALIS.

ULS, GLS, and ML Discrepancy Functions

In each estimation method, the parameter vector is estimated iteratively by a nonlinear optimization algorithm that minimizes a discrepancy function F, which is also known as the fit function in the literature. With p denoting the number of manifest variables, $\mb{S}$ the sample $p \times p$ covariance matrix for a sample with size N, $\mb{\bar{x}}$ the $p \times 1$ vector of sample means, $\bSigma $ the fitted covariance matrix, and $\bmu $ the vector of fitted means, the discrepancy function for unweighted least squares (ULS) estimation is:

\[  F_{\mathit{ULS}} = 0.5 \mr{Tr} [ (\mb{S} - \bSigma )^2 ] + (\mb{\bar{x}} - \bmu )^{\prime }(\mb{\bar{x}} - \bmu )  \]

The discrepancy function for generalized least squares estimation (GLS) is:

\[  F_{\mathit{GLS}} = 0.5 \mr{Tr} [ ( \mb{W}^{-1}(\mb{S} - \bSigma ) )^2 ] + (\mb{\bar{x}} - \bmu )^{\prime }\mb{W}^{-1}(\mb{\bar{x}} - \bmu )  \]

By default, $\mb{W} = \mb{S}$ is assumed so that $F_{\mathit{GLS}}$ is the normal theory generalized least squares discrepancy function.

The discrepancy function for normal-theory maximum likelihood estimation (ML) is:

\[  F_{\mathit{ML}} = \mr{Tr}(\mb{S} \bSigma ^{-1}) - p + \ln (|\bSigma |) - \ln (|\mb{S}|) + (\mb{\bar{x}} - \bmu )^{\prime }\bSigma ^{-1}(\mb{\bar{x}} - \bmu )  \]

In each of the discrepancy functions, $\mb{S}$ and $\mb{\bar{x}}$ are considered to be given and $\bSigma $ and $\bmu $ are functions of model parameter vector $\bTheta $. That is:

\[  F=F(\bSigma (\bTheta ),\bmu (\bTheta );\mb{S},\mb{\bar{x}})  \]

Estimating $\bTheta $ by using a particular estimation method amounts to choosing a vector $\btheta $ that minimizes the corresponding discrepancy function F.

When the mean structures are not modeled or when the mean model is saturated by parameters, the last term of each fit function vanishes. That is, they become:

\[  F_{\mathit{ULS}} = 0.5 \mr{Tr} [ (\mb{S} - \bSigma )^2 ]  \]
\[  F_{\mathit{GLS}} = 0.5 \mr{Tr} [ ( \mb{W}^{-1}(\mb{S} - \bSigma ) )^2 ]  \]
\[  F_{\mathit{ML}} = \mr{Tr}(\mb{S} \bSigma ^{-1}) - p + \ln (|{\bSigma }|) - \ln (|\mb{S}|)  \]
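With the three functions defined, the following minimal NumPy sketch (an illustration only, not the PROC CALIS implementation) evaluates the covariance-structure versions of $F_{\mathit{ULS}}$, $F_{\mathit{GLS}}$, and $F_{\mathit{ML}}$ for a given sample covariance matrix S and fitted matrix Sigma:

import numpy as np

def f_uls(S, Sigma):
    # F_ULS = 0.5 * Tr[(S - Sigma)^2]
    D = S - Sigma
    return 0.5 * np.trace(D @ D)

def f_gls(S, Sigma, W=None):
    # F_GLS = 0.5 * Tr[(W^{-1}(S - Sigma))^2]; W = S by default (normal theory)
    W = S if W is None else W
    A = np.linalg.solve(W, S - Sigma)
    return 0.5 * np.trace(A @ A)

def f_ml(S, Sigma):
    # F_ML = Tr(S Sigma^{-1}) - p + ln|Sigma| - ln|S|
    p = S.shape[0]
    return (np.trace(np.linalg.solve(Sigma, S)) - p
            + np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(S)[1])

In an actual fit, Sigma would be recomputed from the current parameter vector at each step of the nonlinear optimization, and the minimizing parameter values would be the estimates.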

If, instead of being a covariance matrix, $\mb{S}$ is a correlation matrix in the discrepancy functions, $\bSigma $ would naturally be interpreted as the fitted correlation matrix. Although whether $\mb{S}$ is a covariance or correlation matrix makes no difference in minimizing the discrepancy functions, correlational analyses that use these functions are problematic because of the following issues:

  • The diagonal of the fitted correlation matrix $\bSigma $ might contain values other than ones, which violates the requirement of being a correlation matrix.

  • Whenever available, standard errors computed for correlation analysis in PROC CALIS are straightforward generalizations of those of covariance analysis. In very limited cases these standard errors are good approximations. However, in general they are not even asymptotically correct.

  • The model fit chi-square statistic for correlation analysis might not follow the theoretical distribution, thus making model fit testing difficult.

Despite these issues in correlation analysis, if your primary interest is to obtain the estimates in the correlation models, you might still find PROC CALIS results for correlation analysis useful.

The statistical techniques used in PROC CALIS are primarily developed for the analysis of covariance structures, and hence COVARIANCE is the default option. Depending on the nature of your research, you can add the mean structures in the analysis by specifying mean and intercept parameters in your models. However, you cannot analyze mean structures simultaneously with correlation structures (see the CORRELATION option) in PROC CALIS.

FIML Discrepancy Function

The full information maximum likelihood method (FIML) assumes multivariate normality of the data. Suppose that you analyze a model that contains p observed variables. The discrepancy function for FIML is

\[  F_{\mathit{FIML}} = \frac{1}{N}\sum _{j=1}^ N (\ln (|\bSigma _ j|) +(\mb{x}_ j - \bmu _ j)^{\prime }\bSigma _ j^{-1}(\mb{x}_ j - \bmu _ j) + K_ j)  \]

where $\mb{x}_ j$ is a data vector for observation j, and $K_ j$ is a constant term (to be defined explicitly later) independent of the model parameters $\bTheta $. In the current formulation, $\mb{x}_ j$’s are not required to have the same dimensions. For example, $\mb{x}_1$ could be a complete vector with all p variables present while $\mb{x}_2$ is a $(p-1) \times 1$ vector with one missing value that has been excluded from the original $p \times 1$ data vector. As a consequence, subscript j is also used in $\bmu _ j$ and $\bSigma _ j$ to denote the submatrices that are extracted from the entire $p \times 1$ structured mean vector $\bmu $ ($\bmu = \bmu (\bTheta )$) and $p \times p$ covariance matrix $\bSigma $ ($\bSigma = \bSigma (\bTheta )$). In other words, in the current formulation $\bmu _ j$ and $\bSigma _ j$ do not mean that each observation is fitted by distinct mean and covariance structures (although theoretically it is possible to formulate FIML in such a way). The notation simply signifies that the dimensions of $\mb{x}_ j$ and of the associated mean and covariance structures could vary from observation to observation.

Let $p_ j$ be the number of variables without missing values for observation j. Then $\mb{x}_ j$ denotes a $p_ j \times 1$ data vector, $\bmu _ j$ denotes a $p_ j \times 1$ vector of means (structured with model parameters), $\bSigma _ j$ is a $p_ j \times p_ j$ matrix for variances and covariances (also structured with model parameters), and $K_ j$ is defined by the following formula, which is a constant term independent of model parameters:

\[  K_ j = p_ j \ln (2\pi )  \]
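Because each observation contributes its own mean subvector and covariance submatrix, $F_{\mathit{FIML}}$ is naturally evaluated casewise. The following minimal NumPy sketch (an illustration only) assumes that missing values in the $N \times p$ raw data array X are coded as np.nan, and that mu and Sigma are the fitted mean vector and covariance matrix computed elsewhere:

import numpy as np

def f_fiml(X, mu, Sigma):
    N = X.shape[0]
    total = 0.0
    for x in X:
        obs = ~np.isnan(x)                  # variables present for this case
        xj, mj = x[obs], mu[obs]            # subset the data and mean vectors
        Sj = Sigma[np.ix_(obs, obs)]        # submatrix of the fitted covariance
        Kj = obs.sum() * np.log(2 * np.pi)  # K_j = p_j * ln(2*pi)
        r = xj - mj
        total += (np.linalg.slogdet(Sj)[1]
                  + r @ np.linalg.solve(Sj, r) + Kj)
    return total / N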

As a general estimation method, the FIML method is based on the same statistical principle as the ordinary maximum likelihood (ML) method for multivariate normal data—that is, both methods maximize the normal theory likelihood function given the data. In fact, $F_{\mathit{FIML}}$ used in PROC CALIS is related to the log-likelihood function L by the following formula:

\[  F_{\mathit{FIML}} = \frac{-2L}{N}  \]

Because the FIML method can use observations with different amounts of available information, it was developed primarily as an estimation method for data that contain values missing at random. See the section Relationships among Estimation Criteria for more details about the relationship between the FIML and ML methods.

Whenever you use the FIML method, the mean structures are automatically assumed in the analysis. This is due to the fact that there is no closed-form formula for the saturated mean vector in the FIML discrepancy function when missing values are present in the data. You can certainly specify the mean parameters explicitly by specifying intercepts in the LINEQS statement or means and intercepts in the MEAN or MATRIX statement. However, you usually do not need the explicit specification if all you need is to saturate the mean structures with p parameters (that is, the same number as the number of observed variables in the model). With METHOD=FIML, PROC CALIS uses certain default parameterizations for the mean structures automatically. For example, all intercepts of endogenous observed variables and all means of exogenous observed variables are default parameters in the model, making the explicit specification of these mean structure parameters unnecessary.

WLS and ADF Discrepancy Functions

Another important discrepancy function to consider is the weighted least squares (WLS) function. Let $\mb{u}=(\mb{s},\mb{\bar{x}})$ be a $p(p+3)/2$ vector containing all nonredundant elements in the sample covariance matrix $\mb{S}$ and sample mean vector $\mb{\bar{x}}$, with $\mb{s}=\mr{vecs}(\mb{S})$ representing the vector of the $p(p+1)/2$ lower-triangle elements of the symmetric matrix $\mb{S}$, stacked row by row. Similarly, let $\bm {\eta }=(\bsigma ,\bmu )$ be a $p(p+3)/2$ vector containing all nonredundant elements in the fitted covariance matrix $\bSigma $ and the fitted mean vector $\bmu $, with $\bsigma =\mr{vecs}(\bSigma )$ representing the vector of the $p(p+1)/2$ lower-triangle elements of the symmetric matrix $\bSigma $.

The WLS discrepancy function is:

\[  F_{\mathit{WLS}} = (\mb{u} - \bm {\eta })^{\prime } \mb{W}^{-1} (\mb{u} - \bm {\eta })  \]

where $\mb{W}$ is a positive definite symmetric weight matrix with $(p(p+3)/2)$ rows and columns. Because $\bm {\eta }$ is a function of model parameter vector $\bTheta $ under the structural model, you can write the WLS function as:

\[  F_{\mathit{WLS}} = (\mb{u} - \bm {\eta }(\bTheta ))^{\prime } \mb{W}^{-1} (\mb{u} - \bm {\eta }(\bTheta ))  \]

Suppose that $\mb{u}$ converges to $\bm {\eta }_ o=(\bsigma _ o,\bmu _ o)$ with increasing sample size, where $\bsigma _ o$ and $\bmu _ o$ denote the population covariance matrix and mean vector, respectively. By default, the WLS weight matrix $\mb{W}$ in PROC CALIS is computed from the raw data as a consistent estimate of the asymptotic covariance matrix $\bGamma $ of $\sqrt {N}(\mb{u} - \bm {\eta }_ o)$, with $\bGamma $ partitioned as

\[  \bGamma = \left( \begin{matrix}  \bGamma _{ss}   &  \bGamma ^{\prime }_{\bar{x}s}   \\ \bGamma _{\bar{x}s}   &  \bGamma _{\bar{x} \bar{x}}   \\ \end{matrix} \right)  \]

where $\bGamma _{ss}$ denotes the $(p(p+1)/2) \times (p(p+1)/2) $ asymptotic covariance matrix for $\sqrt {N}(\mb{s} - \bsigma _ o)$, $\bGamma _{\bar{x} \bar{x}}$ denotes the $p \times p$ asymptotic covariance matrix for $\sqrt {N}(\bar{\mb{x}} - \bmu _ o)$, and $\bGamma _{\bar{x}s}$ denotes the $p \times (p(p+1)/2)$ asymptotic covariance matrix between $\sqrt {N}(\bar{\mb{x}} - \bmu _ o)$ and $\sqrt {N}(\mb{s} - \bsigma _ o)$.

To compute the default weight matrix $\mb{W}$ as a consistent estimate of $\bGamma $, define a similar partition of the weight matrix $\mb{W}$ as:

\[  \mb{W} = \left( \begin{matrix}  \mb{W}_{ss}   &  \mb{W}^{\prime }_{\bar{x}s}   \\ \mb{W}_{\bar{x}s}   &  \mb{W}_{\bar{x} \bar{x}}   \\ \end{matrix} \right)  \]

Each of the submatrices in the partition can now be computed from the raw data. First, define the biased sample covariance for variables i and j as:

\[  \mb{t}_{ij} = \frac{1}{N} \sum _{r=1}^ N{(x_{ri} - \bar{x}_ i) (x_{rj} - \bar{x}_ j)}  \]

and the sample fourth-order central moment for variables i, j, k, and l as:

\[  \mb{t}_{ij,kl} = \frac{1}{N} \sum _{r=1}^ N{(x_{ri} - \bar{x}_ i) (x_{rj} - \bar{x}_ j)(x_{rk} - \bar{x}_ k) (x_{rl} - \bar{x}_ l)}  \]

The submatrices in $\mb{W}$ are computed by:

\[  [\mb{W}_{ss}]_{ij,kl} = \mb{t}_{ij,kl} - \mb{t}_{ij} \mb{t}_{kl}  \]
\[  [\mb{W}_{\bar{x}s}]_{i,kl} = \frac{1}{N} \sum _{r=1}^ N{(x_{ri} - \bar{x}_ i) (x_{rk} - \bar{x}_ k) (x_{rl} - \bar{x}_ l)}  \]
\[  [\mb{W}_{\bar{x} \bar{x}}]_{ij} = \mb{t}_{ij}  \]

Assuming the existence of finite eighth-order moments, this default weight matrix $\mb{W}$ is a consistent but biased estimator of the asymptotic covariance matrix $\bGamma $.
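As an illustration of these formulas (a sketch only, not the PROC CALIS internals), the default weight matrix can be assembled from a raw data array X with no missing values as follows, with the vecs ordering stacking the lower triangle row by row:

import numpy as np

def default_weight_matrix(X):
    N, p = X.shape
    Z = X - X.mean(axis=0)                     # centered data
    t2 = (Z.T @ Z) / N                         # biased covariances t_ij
    pairs = [(i, j) for i in range(p) for j in range(i + 1)]   # vecs order
    m = len(pairs)                             # p(p+1)/2 moment pairs
    W_ss = np.empty((m, m))
    W_xs = np.empty((p, m))
    for a, (i, j) in enumerate(pairs):
        # third-order central moments give the mean/covariance block
        W_xs[:, a] = (Z * Z[:, [i]] * Z[:, [j]]).mean(axis=0)
        for b, (k, l) in enumerate(pairs):
            t4 = (Z[:, i] * Z[:, j] * Z[:, k] * Z[:, l]).mean()   # t_ij,kl
            W_ss[a, b] = t4 - t2[i, j] * t2[k, l]
    W_xx = t2
    # assemble the partitioned weight matrix for u = (s, xbar)
    return np.block([[W_ss, W_xs.T], [W_xs, W_xx]])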

By using the ASYCOV= option, you can use Browne’s unbiased estimator (Browne, 1984, formula (3.8)) of $\bGamma _{ss}$ as:

\begin{eqnarray*}  [\mb{W}_{ss}]_{ij,kl} & =&  \frac{N(N-1)}{(N-2)(N-3)} (\mb{t}_{ij,kl} - \mb{t}_{ij}\mb{t}_{kl}) \\ & &  - \frac{N}{(N-2)(N-3)} (\mb{t}_{ik} \mb{t}_{jl} + \mb{t}_{il} \mb{t}_{jk} - \frac{2}{N-1} \mb{t}_{ij} \mb{t}_{kl}) \end{eqnarray*}

There is no guarantee that $\mb{W}_{ss}$ computed this way is positive semidefinite. However, the second part is of order $O(N^{-1})$ and does not destroy the positive semidefinite first part for sufficiently large N. For a large number of independent observations, default settings of the weight matrix $\mb{W}$ result in asymptotically distribution-free parameter estimates with unbiased standard errors and a correct $\chi ^2$ test statistic (Browne, 1982, 1984).
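For illustration, Browne's correction can be sketched as follows; the centered data Z, biased covariances t2, and vecs-order index pairs are assumed to come from the previous sketch and are passed in explicitly:

import numpy as np

def browne_unbiased_Wss(Z, t2, pairs, N):
    # Z: centered data; t2: biased covariances; pairs: vecs-order index pairs
    m = len(pairs)
    W = np.empty((m, m))
    c1 = N * (N - 1) / ((N - 2) * (N - 3))
    c2 = N / ((N - 2) * (N - 3))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            t4 = (Z[:, i] * Z[:, j] * Z[:, k] * Z[:, l]).mean()
            W[a, b] = (c1 * (t4 - t2[i, j] * t2[k, l])
                       - c2 * (t2[i, k] * t2[j, l] + t2[i, l] * t2[j, k]
                               - 2.0 / (N - 1) * t2[i, j] * t2[k, l]))
    return W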

With the default weight matrix $\mb{W}$ computed by PROC CALIS, the WLS estimation is also called the asymptotically distribution-free (ADF) method. In fact, as options in PROC CALIS, METHOD=WLS and METHOD=ADF are completely equivalent, even though WLS in general might include cases with special weight matrices other than the default weight matrix.

When the mean structures are not modeled, the WLS discrepancy function is still the same quadratic form statistic. However, with only the elements of the covariance matrix being modeled, the dimensions of $\mb{u}$ and $\bm {\eta }$ are both reduced to $p(p+1)/2 \times 1$, and the dimension of the weight matrix becomes $(p(p+1)/2) \times (p(p+1)/2)$. That is, the WLS discrepancy function for covariance structure models is:

\[  F_{\mathit{WLS}} = (\mb{s} - \bsigma )^{\prime } \mb{W}_{ss}^{-1} (\mb{s} - \bsigma )  \]
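A minimal sketch of this reduced quadratic form, assuming a vecs() helper that stacks the lower triangle row by row in the same order used for $\mb{W}_{ss}$ above:

import numpy as np

def vecs(A):
    i, j = np.tril_indices(A.shape[0])
    return A[i, j]                        # lower-triangle elements, row by row

def f_wls_cov(S, Sigma, W_ss):
    d = vecs(S) - vecs(Sigma)
    return d @ np.linalg.solve(W_ss, d)   # (s - sigma)' W_ss^{-1} (s - sigma)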

If $\mb{S}$ is a correlation rather than a covariance matrix, the default setting of the $\mb{W}_{ss}$ is a consistent estimator of the asymptotic covariance matrix $\bGamma _{ss}$ of $\sqrt {N}(\mb{s} - \bsigma _ o)$ (Browne and Shapiro, 1986; De Leeuw, 1983), with $\mb{s}$ and $\bsigma _ o$ representing vectors of sample and population correlations, respectively. Elementwise, $\mb{W}_{ss}$ is expressed as:

\begin{eqnarray*}  [\mb{W}_{ss}]_{ij,kl} &  = &  r_{ij,kl} - \frac{1}{2} r_{ij}(r_{ii,kl} + r_{jj,kl}) - \frac{1}{2} r_{kl}(r_{kk,ij} + r_{ll,ij}) \\ & &  + \frac{1}{4} r_{ij}r_{kl} (r_{ii,kk} + r_{ii,ll} + r_{jj,kk} + r_{jj,ll}) \end{eqnarray*}

where

\[  r_{ij} = \frac{\mb{t}_{ij}}{\sqrt {\mb{t}_{ii} \mb{t}_{jj}}}  \]

and

\[  r_{ij,kl} = \frac{\mb{t}_{ij,kl}}{\sqrt {\mb{t}_{ii} \mb{t}_{jj} \mb{t}_{kk} \mb{t}_{ll}}}  \]

The asymptotic variances of the diagonal elements of a correlation matrix are 0. That is,

\[  [\mb{W}_{ss}]_{ii,ii} = 0  \]

for all i. Therefore, the weight matrix computed this way is always singular. In this case, the discrepancy function for weighted least squares estimation is modified to:

\begin{eqnarray*}  F_{\mathit{WLS}} &  = &  {\sum _{i=2}^ p \sum _{j=1}^{i-1} \sum _{k=2}^ p \sum _{l=1}^{k-1} [\mb{W}_{ss}]^{ij,kl} ([\mb{S}]_{ij} - [\bSigma ]_{ij})([\mb{S}]_{kl} - [\bSigma ]_{kl}) } \\ & &  + r \sum _{i=1}^ p ([\mb{S}]_{ii} - [\bSigma ]_{ii})^2 \end{eqnarray*}

where r is the penalty weight specified by the WPENALTY= r option and the $[\mb{W}_{ss}]^{ij,kl}$ are the elements of the inverse of the reduced $(p(p-1)/2) \times (p(p-1)/2)$ weight matrix that contains only the nonzero rows and columns of the full weight matrix $\mb{W}_{ss}$.

The second term is a penalty term to fit the diagonal elements of the correlation matrix $\mb{S}$. The default value of r = 100 can be decreased or increased by the WPENALTY= option. The often-used value of r = 1 is in many cases too small to fit the diagonal elements of a correlation matrix properly.

Note that when you model correlation structures, no mean structures can be modeled simultaneously in the same model.

DWLS Discrepancy Functions

Storing and inverting the huge weight matrix $\mb{W}$ in WLS estimation requires considerable computer resources. The diagonally weighted least squares (DWLS) method offers a compromise: it uses only the diagonal of the weight matrix $\mb{W}$ from the WLS estimation in the following discrepancy function:

\begin{eqnarray*}  F_{\mathit{DWLS}} &  = &  (\mb{u} - \bm {\eta })^{\prime } [\mr{diag}(\mb{W})]^{-1} (\mb{u} - \bm {\eta }) \\ &  = &  \sum _{i=1}^ p \sum _{j=1}^ i [\mb{W}_{ss}]_{ij,ij}^{-1} ([\mb{S}]_{ij} - [\bSigma ]_{ij})^2 + \sum _{i=1}^ p [\mb{W}_{\bar{x} \bar{x}}]_{ii}^{-1}(\bar{\mb{x}}_ i - \bmu _ i)^2 \end{eqnarray*}

When only the covariance structures are modeled, the discrepancy function becomes:

\[  F_{\mathit{DWLS}} = \sum _{i=1}^ p \sum _{j=1}^ i [\mb{W}_{ss}]_{ij,ij}^{-1} ([\mb{S}]_{ij} - [\bSigma ]_{ij})^2  \]

For correlation models, the discrepancy function is:

\[  F_{\mathit{DWLS}} = \sum _{i=2}^ p \sum _{j=1}^{i-1} [\mb{W}_{ss}]_{ij,ij}^{-1} ([\mb{S}]_{ij} - [\bSigma ]_{ij})^2 + r \sum _{i=1}^ p ([\mb{S}]_{ii} - [\bSigma ]_{ii})^2  \]

where r is the penalty weight specified by the WPENALTY= r option. Note that no mean structures can be modeled simultaneously with correlation structures when using the DWLS method.

Because the statistical properties of DWLS estimates are still unknown, standard errors for the estimates are not computed for the DWLS method.
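For illustration, a minimal NumPy sketch of the covariance-only DWLS function, which needs only the diagonal elements of $\mb{W}_{ss}$ (same vecs ordering as in the earlier sketches):

import numpy as np

def f_dwls_cov(S, Sigma, W_ss):
    i, j = np.tril_indices(S.shape[0])
    d = S[i, j] - Sigma[i, j]             # s - sigma in vecs order
    return np.sum(d * d / np.diag(W_ss))  # weight by the reciprocal diagonal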

Input Weight Matrices

In GLS, WLS, or DWLS estimation you can change from the default settings of weight matrices $\mb{W}$ by using an INWGT= data set. The CALIS procedure requires a positive definite weight matrix that has positive diagonal elements.

Multiple-Group Discrepancy Function

Suppose that there are k independent groups in the analysis and $N_1$, $N_2$, …, $N_ k$ are the sample sizes for the groups. The overall discrepancy function $F(\bTheta )$ is expressed as a weighted sum of individual discrepancy functions $F_ i$’s for the groups:

\[  F(\bTheta ) = \sum _{i=1}^ k t_ i F_ i(\bTheta )  \]

where

\[  t_ i = \frac{N_ i-1}{N-k}  \]

is the weight of the discrepancy function for group i, and

\[  N = \sum _{i=1}^ k N_ i  \]

is the total number of observations in all groups. In PROC CALIS, all discrepancy functions $F_ i$ in the overall discrepancy function must belong to the same estimation method. You cannot specify different estimation methods for the groups in a multiple-group analysis. In addition, the same analysis type must be applied to all groups. That is, you analyze either covariance structures, covariance and mean structures, or correlation structures for all groups.
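The weighting scheme is simple enough to sketch directly; here the per-group values $F_i$ are assumed to come from one of the discrepancy functions sketched earlier:

def overall_discrepancy(F_values, group_sizes):
    # F = sum_i t_i F_i with t_i = (N_i - 1)/(N - k)
    k, N = len(group_sizes), sum(group_sizes)
    return sum((Ni - 1) / (N - k) * Fi
               for Fi, Ni in zip(F_values, group_sizes))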

Robust Estimation

Two robust estimation methods that are proposed by Yuan and Zhong (2008) and Yuan and Hayashi (2010) are implemented in PROC CALIS. The first method is the two-stage robust method, which estimates robust covariance and mean matrices in the first stage and then uses them (in place of the ordinary sample covariance and mean matrices) for ML estimation in the second stage. Weighting of the observations is done only in the first stage. The ROBUST=SAT option invokes the two-stage robust estimation. The second method is the direct robust method, which iteratively estimates model parameters with simultaneous weighting of the observations. The ROBUST, ROBUST=RES(E), or ROBUST=RES(F) option invokes the direct robust estimation method.

The procedural difference between the two robust methods results in differential treatments of model outliers and leverage observations (or leverage points). In producing the robust covariance and mean matrices in the first stage, the two-stage robust method downweights outlying observations in all variable dimensions without regard to the model structure. This method downweights potential model outliers and leverage observations (which are not necessarily model outliers) in essentially the same way before the ML estimation in the second stage.

However, the direct robust method downweights only the model outliers. "Good" leverage observations (those that are not also model outliers) are not downweighted for model estimation. Therefore, it could be argued that the direct robust method is more desirable if you can be sure that the model is a reasonable one. The reason is that the direct robust method retains the information from the "good" leverage observations for estimation, while the two-stage robust method downweights all leverage observations indiscriminately during its first stage. However, if the model itself is uncertain, the two-stage robust estimation method might be more foolproof.

Both robust methods employ weights on the observations. Weights are functions of the Mahalanobis distances (M-distances) of the observations and are computed differently for the two robust methods. The following two sections describe the weighting scheme and the estimation procedure of the two robust methods in more detail.

Two-Stage Robust Method

For the two-stage robust method, the following conventional M-distance $d_{s}$ for an observed random vector $\mb{x}$ is computed as

\[  d_{s} = \sqrt {(\mb{x} - \bmu )^{\prime }\bSigma ^{-1}(\mb{x} - \bmu )}  \]

where $\bmu $ and $\bSigma $ are the unstructured mean and covariance matrices, respectively.

Two sets of weights are computed as functions of the M-distances of the observations. The weighting functions are essentially the same form as that of Huber (see, for example, Huber 1981). Let $d_ i$ be the M-distance of observation i, computed from the $d_{s}$ formula. The first set of weights $w_1(d_ i)$ applies to the first moments of the data and is defined as

\begin{eqnarray*}  w_1(d_ i) &  = &  1 \quad \quad \mbox{(if $d_ i \leq \rho $)} \\ &  = &  \rho / d_ i \quad \mbox{(if $d_ i > \rho $)} \end{eqnarray*}

where $\rho $ is the critical value that corresponds to the $(1-\varphi )$ quantile of the $\chi _ r = \sqrt {{\chi _ r}^{2}}$ distribution, with r being the degrees of freedom. For the two-stage robust method, r is simply the number of observed variables in the analysis. The tuning parameter $\varphi $ controls the approximate proportion of observations to be downweighted (that is, with $w_1(d_ i)$ less than 1). The default $\varphi $ value is 0.05. You can override this value by using the ROBPHI= option.

The second set of weights $w_2(d_ i)$ applies to the second moments of the data and is defined as

\[  w_2(d_ i) = (w_1(d_ i))^2/\kappa  \]

where $\kappa $ is a constant that adjusts the sum $\sum _{i=1}^ N w_2(d_ i)$ to 1 approximately. After the tuning parameter $\varphi $ is determined, the critical value $\rho $ and the adjustment $\kappa $ are computed automatically by PROC CALIS.
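The weighting scheme can be sketched as follows. SciPy's chi2.ppf supplies the chi-square quantile whose square root is the critical value $\rho $; the simple normalization used for $\kappa $ here is only a stand-in, because PROC CALIS computes $\kappa $ automatically:

import numpy as np
from scipy.stats import chi2

def huber_weights(d, r, phi=0.05):
    # rho: critical value on the chi (distance) scale
    rho = np.sqrt(chi2.ppf(1.0 - phi, df=r))
    w1 = np.where(d <= rho, 1.0, rho / d)     # weights for first moments
    w2 = w1 ** 2
    w2 = w2 / w2.mean()                       # crude stand-in for the kappa adjustment
    return w1, w2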

With these two sets of weights, the two-stage robust method (ROBUST=SAT) estimates the mean and covariance by the so-called iteratively reweighted least squares (IRLS) algorithm. Specifically, the updating formulas at iteration j+1 are

\begin{eqnarray*}  \bmu ^{(j+1)} &  = &  \frac{\sum _{i=1}^ N w_1({d_ i}^{(j)})\mb{x}_ i}{\sum _{i=1}^ N w_1({d_ i}^{(j)})} \\ \bSigma ^{(j+1)} &  = &  \frac{1}{N} \sum _{i=1}^ N w_2({d_ i}^{(j)})(\mb{x}_ i - \bmu ^{(j)})(\mb{x}_ i - \bmu ^{(j)})^{\prime } \end{eqnarray*}

where ${d_ i}^{(j)}$ is the M-distance evaluated at the $\bmu ^{(j)}$ and $\bSigma ^{(j)}$ obtained in the jth iteration. The iterations are carried out until $\bmu $ and $\bSigma $ converge; the final iteration yields the robust estimates of the mean and covariance matrices. PROC CALIS uses the relative parameter convergence criterion for the IRLS algorithm, with a default criterion value of 1E–8. See the XCONV= option for the definition of the relative parameter convergence criterion. After the IRLS algorithm converges in the first stage, the two-stage robust method proceeds to treat the robust mean and covariance estimates as if they were the sample mean and covariance matrices for a maximum likelihood estimation (METHOD=ML) of the model.
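Putting the pieces together, the first-stage IRLS loop can be sketched as follows (an illustration only; it reuses huber_weights from the previous sketch and, for brevity, uses an absolute rather than relative convergence check):

import numpy as np

def robust_mean_cov(X, phi=0.05, tol=1e-8, max_iter=200):
    N, p = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)   # starting values
    for _ in range(max_iter):
        R = X - mu
        # conventional M-distances d_s at the current mu and Sigma
        d = np.sqrt(np.einsum('ij,ij->i', R @ np.linalg.inv(Sigma), R))
        w1, w2 = huber_weights(d, r=p, phi=phi)
        mu_new = (w1[:, None] * X).sum(axis=0) / w1.sum()
        Sigma_new = (w2[:, None] * R).T @ R / N
        done = (np.max(np.abs(mu_new - mu)) < tol and
                np.max(np.abs(Sigma_new - Sigma)) < tol)
        mu, Sigma = mu_new, Sigma_new
        if done:
            break
    return mu, Sigma   # treated as sample moments for ML in the second stage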

Direct Robust Method

The direct robust method computes the following residual M-distance $d_{r}$ for an observation with residual random vector $\hat{\mb{e}}$ (say, of dimension $h \times 1$, where h is the number of dependent observed variables $\mb{y}$):

\[  d_{r} = \sqrt {(\mb{L}\hat{\mb{e}})^{\prime }(\mb{L}\Omega _{\hat{\mb{e}}}\mb{L}^{\prime })^{-1}(\mb{L}\hat{\mb{e}})}  \]

where $\mb{L}$ ($(h-q)\times h$) is a loading matrix that reduces $\hat{\mb{e}}$ to $(h-q)$ independent components and q is the number of independent factors to be estimated from the dependent observed variables $\mb{y}$. The reduction of the residual vector into independent components is necessary when the number of factors q is not zero in the model. For $q > 0$, the residual covariance matrix $\Omega _{\hat{\mb{e}}}$ is not invertible and cannot be used in computing the residual M-distances. Hence, the covariance matrix $\mb{L}\Omega _{\hat{\mb{e}}}\mb{L}^{\prime }$ of independent components $\mb{L}\hat{\mb{e}}$ is used instead. See Yuan and Hayashi (2010) for details about the computation of the residuals in the context of structural equation modeling.

The direct robust method also computes two sets of weights as functions of the residual M-distances. Let $d_ i$ be the M-distance of observation i, computed from the $d_{r}$ formula. The first set of weights $w_1(d_ i)$ applies to parameters in the first moments and is defined as

\begin{eqnarray*}  w_1(d_ i) &  = &  1 \quad \quad \mbox{(if $d_ i \leq \rho $)} \\ &  = &  \rho / d_ i \quad \mbox{(if $d_ i > \rho $)} \end{eqnarray*}

The second set of weights $w_2(d_ i)$ applies to the parameters in the second moments and is defined as

\[  w_2(d_ i) = (w_1(d_ i))^2/\kappa  \]

These are essentially the same Huber-type weighting functions as those for the two-stage robust method. The only difference is that the $d_{r}$ formula, instead of the $d_{s}$ formula, is used in the weighting functions for the direct robust method. The definition of $\rho $ is also the same as that in the two-stage robust method, but it is now based on a different theoretical chi-square distribution. That is, in the direct robust method, $\rho $ is the critical value that corresponds to the $(1-\varphi )$ quantile of the $\chi _ r = \sqrt {{\chi _ r}^{2}}$ distribution, with $r=(h-q)$ being the degrees of freedom. Again, $\varphi $ is a tuning parameter and is set to 0.05 by default. You can override this value by using the ROBPHI= option. The calculation of the number of "independent factors" q depends on the variant of the direct robust estimation that you choose. With the ROBUST=RES(E) option, q is the same as the number of exogenous factors specified in the model. With the ROBUST=RES(F) option, the disturbances of the endogenous factors in the model are also treated as "independent factors," so q is the total number of latent factors specified in the model.

The direct robust method (ROBUST=RES(E) or ROBUST=RES(F)) employs the IRLS algorithm in model estimation. Let the following expression be a vector of nonredundant first and second moments structured in terms of model parameters $\btheta $:

\begin{eqnarray*}  \mb{m}(\btheta ) = \left( \begin{array}{l} \bmu (\btheta ) \\ \mbox{vech}(\bSigma (\btheta )) \end{array} \right) \end{eqnarray*}

where vech() extracts the lower-triangular nonredundant elements in $\bSigma (\btheta )$. The updating formula for $\btheta $ at iteration j+1 is

\[  \btheta ^{(j+1)} = \btheta ^{(j)} + \Delta \btheta ^{(j)}  \]

where $\btheta ^{(j)}$ is the vector of parameter values at the jth iteration and $\Delta \btheta ^{(j)}$ is defined by the following formula:

\[  \Delta \btheta ^{(j)} = (\dot{\mb{m}}^{\prime }(\btheta ^{(j)})\mb{W}(\btheta ^{(j)})\dot{\mb{m}}(\btheta ^{(j)}))^{-1}\dot{\mb{m}}^{\prime }(\btheta ^{(j)})\mb{W}(\btheta ^{(j)})\mb{g}_ j  \]

where $\dot{\mb{m}}(\btheta ) = \frac{\partial \mb{m}(\btheta )}{\partial \btheta ^{\prime }}$ is the model Jacobian, $\mb{W}(\btheta )$ is the (normal theory) weight matrix for the moments (see Yuan and Zhong 2008 for the formula of the weight matrix), and $\mb{g}_ j$ is defined as

\begin{eqnarray*}  \mb{g}_ j = \left( \begin{array}{l} \frac{\sum _{i=1}^ N w_1(d_ i) \mb{x}_ i}{\sum _{i=1}^ N w_1(d_ i)} - \bmu (\btheta ^{(j)}) \\ \frac{1}{N}\sum _{i=1}^ N w_2(d_ i) \mbox{vech}[(\mb{x}_ i - \bmu (\btheta ^{(j)}))(\mb{x}_ i - \bmu (\btheta ^{(j)}))^{\prime }] - \mbox{vech}(\bSigma (\btheta ^{(j)})) \end{array} \right) \end{eqnarray*}

Starting with some reasonable initial estimates for $\btheta $, PROC CALIS iterates the updating formulas until the relative parameter convergence criterion of the IRLS algorithm is satisfied. The default criterion value is 1E–8. This essentially means that the IRLS algorithm converges when $\Delta \btheta $ is sufficiently small. See the XCONV= option for the definition of the relative parameter convergence criterion.
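A schematic sketch of a single update step is shown below. The moment function m, its Jacobian mdot, and the weight matrix W are hypothetical caller-supplied callables (see Yuan and Zhong 2008 for the actual normal-theory weight matrix), and the weights w1 and w2 are assumed to have been computed from the residual M-distances:

import numpy as np

def vech(A):
    i, j = np.tril_indices(A.shape[0])
    return A[i, j]      # lower-triangular nonredundant elements, row by row

def direct_robust_step(theta, X, w1, w2, m, mdot, W):
    # m(theta) -> (mu, Sigma); mdot(theta) -> Jacobian of the stacked moments;
    # W(theta) -> moment weight matrix (all hypothetical helpers for this sketch)
    mu, Sigma = m(theta)
    xbar_w = (w1[:, None] * X).sum(axis=0) / w1.sum()   # weighted means
    R = X - mu
    S_w = (w2[:, None] * R).T @ R / X.shape[0]          # weighted second moments
    g = np.concatenate([xbar_w - mu, vech(S_w) - vech(Sigma)])
    J, Wm = mdot(theta), W(theta)
    delta = np.linalg.solve(J.T @ Wm @ J, J.T @ Wm @ g) # Delta theta
    return theta + delta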

Although the iterative formulas and the IRLS steps for robust estimation have been presented for single-group analysis, they are easily generalizable to multiple-group analysis. For the two-stage robust method, you only need to repeat the robust estimation of the means and covariances for the groups and then apply the obtained robust moments as if they were sample moments for regular maximum likelihood estimation (METHOD=ML). For the direct robust method, you need to expand the dimensions of the model Jacobian matrix $\dot{\mb{m}}(\btheta )$, the weight matrix $\mb{W}(\btheta )$, and the vector $\mb{g}$ to include moments from several groups/models. Therefore, the multiple-group formulas are conceptually quite simple but tedious to present. For this reason, they are omitted here.