The CALIS Procedure

Gradient, Hessian, Information Matrix, and Approximate Standard Errors

For a single-sample setting with a discrepancy function $F=F(\bSigma (\bTheta ),\bmu (\bTheta );\mb {S},\mb {\bar{x}})$, the gradient is defined as the vector of first partial derivatives of the discrepancy function with respect to the model parameters $\bTheta $:

\[  g(\bTheta ) = \frac{\partial }{\partial \bTheta } F(\bTheta )  \]

The Hessian is defined as the matrix of second partial derivatives of the discrepancy function with respect to the model parameters $\bTheta $:

\[  H(\bTheta ) = \frac{\partial ^2 }{\partial \bTheta \partial \bTheta ^{\prime }} F(\bTheta )  \]
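Inside PROC CALIS these derivatives are obtained analytically, but the definitions can be illustrated numerically. The following Python sketch (not PROC CALIS code; the quadratic function F is a hypothetical stand-in for a real discrepancy function) approximates $g(\bTheta )$ and $H(\bTheta )$ by central finite differences:

```python
import numpy as np

def num_gradient(F, theta, h=1e-5):
    """Central-difference approximation to the gradient of F at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (F(theta + e) - F(theta - e)) / (2.0 * h)
    return g

def num_hessian(F, theta, h=1e-4):
    """Central-difference approximation to the Hessian of F at theta."""
    theta = np.asarray(theta, dtype=float)
    t = theta.size
    H = np.zeros((t, t))
    for j in range(t):
        for k in range(t):
            ej = np.zeros(t); ej[j] = h
            ek = np.zeros(t); ek[k] = h
            H[j, k] = (F(theta + ej + ek) - F(theta + ej - ek)
                       - F(theta - ej + ek) + F(theta - ej - ek)) / (4.0 * h * h)
    return 0.5 * (H + H.T)  # symmetrize against rounding error

# Hypothetical discrepancy function for illustration only
F = lambda th: (th[0] - 1.0) ** 2 + 2.0 * (th[1] + 0.5) ** 2
g = num_gradient(F, [0.0, 0.0])   # approx [-2.0, 2.0]
H = num_hessian(F, [0.0, 0.0])    # approx [[2, 0], [0, 4]]
```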

Suppose that the mean and covariance structures fit perfectly with $\bTheta =\bTheta _ o$ in the population. The information matrix is defined as

\[  I(\bTheta _ o) = \frac{1}{2} \mathcal{E}(H(\bTheta _ o))  \]

where the expectation $\mathcal{E}(\cdot )$ is taken over the sampling space of $\mb {S},\mb {\bar{x}}$.

The information matrix plays a significant role in statistical theory. Under certain regularity conditions, the inverse of the information matrix $I^{-1}(\bTheta _ o)$ is the asymptotic covariance matrix for $\sqrt {N}(\hat{\bTheta }-\bTheta _ o)$, where N denotes the sample size and $\hat{\bTheta }$ is an estimator of $\bTheta _ o$.
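Concretely, if $\sqrt {N}(\hat{\bTheta }-\bTheta _ o)$ has asymptotic covariance matrix $I^{-1}(\bTheta _ o)$, then for large N the covariance matrix of $\hat{\bTheta }$ itself is approximately

\[  \mathrm{Cov}(\hat{\bTheta }) \approx \frac{1}{N} I^{-1}(\bTheta _ o)  \]

which motivates the sample-size-adjusted inversion formulas that follow (with $N-1$, or $N-k$ in the multiple-group case, in place of N).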

In practice, $\bTheta _ o$ is never known and can only be estimated. The information matrix is therefore estimated by the so-called empirical information matrix:

\[  I(\hat{\bTheta }) = \frac{1}{2} H(\hat{\bTheta })  \]

which is evaluated at the values of the sample estimates $\hat{\bTheta }$. Notice that this empirical information matrix, rather than the unknown $I(\bTheta _ o)$, is the information matrix displayed in PROC CALIS output.

By inverting the empirical information matrix with a sample size adjustment, PROC CALIS approximates the covariance matrix of $\hat{\bTheta }$ by:

\[  ((N-1)I(\hat{\bTheta }))^{-1} = ((N - 1) \frac{1}{2} H(\hat{\bTheta }))^{-1} = \frac{2}{N-1} H^{-1}(\hat{\bTheta })  \]

Approximate standard errors for $\hat{\bTheta }$ can then be computed as the square roots of the diagonal elements of the estimated covariance matrix. The theory about the empirical information matrix, the approximate covariance matrix of the parameter estimates, and the approximate standard errors applies to all but the ULS and DWLS estimation methods. Standard errors are therefore not computed with the ULS and DWLS estimation methods.
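Numerically, this amounts to a single scaled matrix inversion. The following Python sketch restates the formula above (a minimal illustration, not PROC CALIS internals; the Hessian and sample size are made up):

```python
import numpy as np

def approx_std_errors(H_hat, N):
    """Approximate standard errors from the Hessian of the discrepancy
    function evaluated at the parameter estimates (single-sample case)."""
    # Empirical information matrix: I(theta_hat) = H(theta_hat) / 2,
    # so ((N - 1) * I)^{-1} = 2 / (N - 1) * H^{-1}.
    cov = (2.0 / (N - 1)) * np.linalg.inv(H_hat)
    return np.sqrt(np.diag(cov)), cov

# Hypothetical 2x2 Hessian at the solution, sample size N = 200
H_hat = np.array([[8.0, 1.0],
                  [1.0, 4.0]])
se, cov = approx_std_errors(H_hat, N=200)
```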

If a given Hessian or information matrix is singular, PROC CALIS offers two ways to compute a generalized inverse of the matrix and, therefore, two ways to compute approximate standard errors of implicitly constrained parameter estimates, t values, and modification indices. Depending on the G4= specification, either a Moore-Penrose inverse or a G2 inverse is computed. The expensive Moore-Penrose inverse computes an estimate of the null space by using an eigenvalue decomposition. The cheaper G2 inverse is produced by sweeping the linearly independent rows and columns and zeroing out the dependent ones.
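To make the distinction concrete, the following Python sketch shows the eigenvalue-decomposition route to a Moore-Penrose inverse (a simplified illustration; the relative tolerance rule shown is an assumption, not PROC CALIS's actual singularity criterion):

```python
import numpy as np

def mp_inverse(A, tol=1e-8):
    """Moore-Penrose inverse of a symmetric matrix via eigendecomposition.
    Eigenvalues whose magnitude falls below tol * max|eigenvalue| are
    treated as zero, i.e., as spanning the null space."""
    w, V = np.linalg.eigh(A)
    cutoff = tol * np.max(np.abs(w))
    inv_w = np.array([1.0 / x if abs(x) > cutoff else 0.0 for x in w])
    return V @ np.diag(inv_w) @ V.T

# Rank-deficient example: second row is a multiple of the first
A = np.array([[4.0, 2.0],
              [2.0, 1.0]])
A_mp = mp_inverse(A)          # eigendecomposition route
A_svd = np.linalg.pinv(A)     # SVD-based pseudoinverse, same result here
```

A G2 inverse, by contrast, is produced by sweeping rather than by a full decomposition, which is why it is the cheaper choice for large matrices.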

Multiple-Group Extensions

In the section Multiple-Group Discrepancy Function, the overall discrepancy function for multiple-group analysis is defined. The same notation is applied here. To begin with, the overall discrepancy function $F(\bTheta )$ is expressed as a weighted sum of the individual discrepancy functions $F_ i$ for the groups:

\[  F(\bTheta ) = \sum _{i=1}^ k t_ i F_ i(\bTheta )  \]

where

\[  t_ i = \frac{N_ i-1}{N-k}  \]

is the weight for group i,

\[  N = \sum _{i=1}^ k N_ i  \]

is the total sample size, and $N_ i$ is the sample size for group i.
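For example, in a hypothetical two-group analysis with $N_1 = 101$ and $N_2 = 201$, so that $N = 302$ and $k = 2$, the weights are

\[  t_1 = \frac{101-1}{302-2} = \frac{1}{3}, \quad t_2 = \frac{201-1}{302-2} = \frac{2}{3}  \]

Notice that the weights always sum to 1, because $\sum _{i=1}^ k (N_ i - 1) = N - k$.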

The gradient $g(\bTheta )$ and the Hessian $H(\bTheta )$ are now defined as weighted sums of the corresponding individual group functions. That is,

\[  g(\bTheta ) = \sum _{i=1}^ k t_ i g_ i(\bTheta ) = \sum _{i=1}^ k t_ i \frac{\partial }{\partial \bTheta } F_ i(\bTheta )  \]

and

\[  H(\bTheta ) = \sum _{i=1}^ k t_ i H_ i(\bTheta ) = \sum _{i=1}^ k t_ i \frac{\partial ^2 }{\partial \bTheta \partial \bTheta ^{\prime }} F_ i(\bTheta )  \]

Suppose that the mean and covariance structures fit perfectly with $\bTheta =\bTheta _ o$ in the population. If each $t_ i$ converges to a fixed constant $\tau _ i$ ($\tau _ i > 0$) with increasing total sample size, the information matrix can be written as:

\[  I(\bTheta _ o) = \frac{1}{2} \sum _{i=1}^ k \tau _ i \mathcal{E}(H_ i(\bTheta _ o))  \]

To approximate this information matrix, an empirical counterpart is used:

\[  I(\hat{\bTheta }) = \frac{1}{2} \sum _{i=1}^ k t_ i H_ i(\hat{\bTheta })  \]

which is evaluated at the values of the sample estimates $\hat{\bTheta }$. Again, this empirical information matrix, rather than the unknown $I(\bTheta _ o)$, is the information matrix displayed in PROC CALIS output.

By inverting the empirical information matrix with a sample size adjustment, PROC CALIS approximates the covariance matrix of $\hat{\bTheta }$ in multiple-group analysis by:

\[  ((N-k)I(\hat{\bTheta }))^{-1} = ((N-k) \frac{1}{2} H(\hat{\bTheta }))^{-1} = \frac{2}{N-k} H^{-1}(\hat{\bTheta })  \]

Approximate standard errors for $\hat{\bTheta }$ can then be computed as the square roots of the diagonal elements of the estimated covariance matrix. Again, the theory does not apply to ULS and DWLS estimation, so standard errors are not computed for these methods.
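As in the single-sample case, the computation reduces to forming the weighted Hessian and inverting it once. The following Python sketch illustrates this (with hypothetical group Hessians and sample sizes):

```python
import numpy as np

def multigroup_std_errors(H_list, N_list):
    """Approximate standard errors for a k-group analysis, given the
    per-group Hessians H_i evaluated at the common estimates."""
    k = len(H_list)
    N = sum(N_list)
    # Group weights t_i = (N_i - 1) / (N - k)
    t = [(N_i - 1.0) / (N - k) for N_i in N_list]
    # Overall Hessian is the weighted sum: H = sum_i t_i H_i
    H = sum(t_i * H_i for t_i, H_i in zip(t, H_list))
    # Covariance: ((N - k) * H / 2)^{-1} = 2 / (N - k) * H^{-1}
    cov = (2.0 / (N - k)) * np.linalg.inv(H)
    return np.sqrt(np.diag(cov))

# Hypothetical Hessians for two groups sharing the same two parameters
H1 = np.array([[6.0, 0.5], [0.5, 3.0]])
H2 = np.array([[7.0, 0.8], [0.8, 3.5]])
se = multigroup_std_errors([H1, H2], [101, 201])
```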

Testing Rank Deficiency in the Approximate Covariance Matrix for Parameter Estimates

Computing the approximate covariance matrix, and hence the standard errors, for the parameter estimates involves inverting the scaled information matrix (equivalently, the Hessian). The numerical condition of the information matrix can be very poor in many practical applications, especially for the analysis of unscaled covariance data. The following four-step strategy is used for the inversion of the information matrix.

  1. The inversion (usually of a normalized matrix $\mb {D}^{-1}\mb {H}\mb {D}^{-1}$) is tried using a modified form of the Bunch and Kaufman (1977) algorithm, which allows the specification of a different singularity criterion for each pivot. The following three criteria for the detection of rank loss in the information matrix are used to specify thresholds (a sketch of how such thresholds might combine appears after this list):

    • ASING specifies absolute singularity.

    • MSING specifies relative singularity depending on the whole matrix norm.

    • VSING specifies relative singularity depending on the column matrix norm.

    If no rank loss is detected, the inverse of the information matrix is used for the covariance matrix of parameter estimates, and the next two steps are skipped.

  2. The linear dependencies among the parameter subsets are displayed based on the singularity criteria.

  3. If the number of parameters t is smaller than the value specified by the G4= option (the default value is 60), the Moore-Penrose inverse is computed based on the eigenvalue decomposition of the information matrix. If you do not specify the NOPRINT option, the distribution of eigenvalues is displayed, and those eigenvalues that are set to zero in the Moore-Penrose inverse are indicated. You should inspect this eigenvalue distribution carefully.

  4. If PROC CALIS did not set the right subset of eigenvalues to zero, you can specify the COVSING= option to set a larger or smaller subset of eigenvalues to zero in a further run of PROC CALIS.
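The three singularity criteria in step 1 translate into per-pivot thresholds. The following Python sketch shows one plausible way such thresholds could combine (the take-the-largest rule and the use of the diagonal element as the pivot are simplifying assumptions for illustration, not PROC CALIS's documented algorithm):

```python
import numpy as np

def pivot_is_singular(H, j, asing=1e-12, msing=1e-10, vsing=1e-8):
    """Illustrative per-pivot singularity test for a symmetric matrix H.
    ASING: absolute threshold; MSING: relative to the whole-matrix norm;
    VSING: relative to the norm of column j.  The combination rule here
    (take the largest of the three thresholds) is an assumption."""
    threshold = max(asing,
                    msing * np.linalg.norm(H),        # whole-matrix norm
                    vsing * np.linalg.norm(H[:, j]))  # column norm
    return abs(H[j, j]) < threshold                   # simplified pivot
```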