The CALIS Procedure

The LINEQS Model

The LINEQS modeling language is adapted from the EQS (equations) program by Bentler (1995). The statistical models that LINEQS or EQS analyzes are essentially the same as those analyzed by other general modeling languages such as LISMOD, RAM, and PATH. However, the terminology and approach of the LINEQS or EQS modeling language differ from those of the other languages; they are based on the theoretical model developed by Bentler and Weeks (1980). For convenience, models that are analyzed by using the LINEQS modeling language are called LINEQS models. Note that these so-called LINEQS models can also be analyzed by other general modeling languages in PROC CALIS.

In the LINEQS (or the original EQS) model, relationships among variables are represented by a system of equations. For example:

\[ Y_1 = a_0 + a_1 X_1 + a_2 X_2 + E_1 \]
\[ Y_2 = b_0 + b_1 X_1 + b_2 Y_1 + E_2 \]

On the left-hand side of each equation, an outcome variable is hypothesized to be a linear function of one or more predictor variables and an error, which are all specified on the right-hand side of the equation. The parameters specified in an equation are the effects (or regression coefficients) of the predictor variables. For example, in the preceding equations, $Y_1$ and $Y_2$ are outcome variables; $E_1$ and $E_2$ are error variables; $a_1$, $a_2$, $b_1$, and $b_2$ are effect parameters (or regression coefficients); and $a_0$ and $b_0$ are intercept parameters. Variables $X_1$ and $X_2$ serve as predictors in the first equation, while variables $X_1$ and $Y_1$ serve as predictors in the second equation.
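As a minimal numeric sketch of how the two example equations chain together, the following Python code computes $Y_1$ first and then uses it as a predictor of $Y_2$. The function name `evaluate_system` and all coefficient values are hypothetical and serve only as an illustration; they are not part of PROC CALIS.

```python
def evaluate_system(x1, x2, e1, e2, a, b):
    """Evaluate the two example equations for one observation.

    a = (a0, a1, a2) and b = (b0, b1, b2) are hypothetical coefficient values.
    """
    a0, a1, a2 = a
    b0, b1, b2 = b
    # First equation: Y1 is a linear function of X1, X2, and its error E1.
    y1 = a0 + a1 * x1 + a2 * x2 + e1
    # Second equation: Y2 is a linear function of X1, Y1, and its error E2.
    y2 = b0 + b1 * x1 + b2 * y1 + e2
    return y1, y2


y1, y2 = evaluate_system(x1=2, x2=3, e1=0, e2=1, a=(1, 2, 3), b=(0, 1, 2))
print(y1, y2)
```

Note that $Y_1$ is an outcome in the first equation and a predictor in the second, which an ordinary single-equation regression cannot express.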

This is almost the same representation as in multiple regression models. However, the LINEQS model entails more. It supports a system of equations that can also include latent variables, measurement errors, and correlated errors.

Types of Variables in the LINEQS Model

The distinction between dependent and independent variables is important in the LINEQS model.

A variable is dependent if it appears on the left-hand side of an equation in the model. A dependent variable might be observed (manifest) or latent. It might or might not appear on the right-hand side of other equations, but it cannot appear on the left-hand sides of two or more equations. Error variables cannot be dependent in the LINEQS model.

A variable in the LINEQS model is independent if it is not dependent. Independent variables can be observed (manifest) or latent. All error variables must be independent in the LINEQS model.

Dependent variables are also referred to as endogenous variables; these names are interchangeable. Similarly, independent variables are interchangeable with exogenous variables.

Whereas an outcome variable in any equation must be a dependent variable, a predictor variable in an equation is not necessarily an independent variable in the entire LINEQS model. For example, $Y_1$ is a predictor variable in the second equation of the preceding example, but it is a dependent variable in the LINEQS model. In summary, the predictor-outcome nature of a variable is determined within a single equation, while the exogenous-endogenous (independent-dependent) nature of a variable is determined within the entire system of equations.
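The system-wide nature of the dependent-independent distinction can be sketched in a few lines of Python. The dictionary layout and the function name `classify_variables` are assumptions made for this illustration; PROC CALIS performs the equivalent analysis internally.

```python
def classify_variables(equations):
    """Split variables into dependent and independent sets.

    `equations` maps each outcome variable (left-hand side) to the list of
    variables on the right-hand side of its equation.
    """
    # A variable is dependent if it appears on the left-hand side of any equation.
    dependent = set(equations)
    all_vars = set(equations)
    for predictors in equations.values():
        all_vars.update(predictors)
    # Every remaining variable, including each error term, is independent.
    independent = all_vars - dependent
    return dependent, independent


equations = {
    "Y1": ["X1", "X2", "E1"],
    "Y2": ["X1", "Y1", "E2"],
}
dependent, independent = classify_variables(equations)
print(sorted(dependent))    # ['Y1', 'Y2']
print(sorted(independent))  # ['E1', 'E2', 'X1', 'X2']
```

Although Y1 is a predictor in the second equation, it is classified as dependent because the classification looks at the whole system.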

In addition to the dependent-independent variable distinction, variables in the LINEQS model are distinguished according to whether they are observed in the data. Variables that are observed in research are called observed or manifest variables. Hypothetical variables that are not observed in the LINEQS model are latent variables.

Two types of latent variables should be distinguished: error variables and non-error variables. An error variable is unique to an equation. It serves as the unsystematic source of effect for the outcome variable in an equation. If the outcome variable in the equation is latent, the corresponding error variable is also called a disturbance. In contrast, non-error or systematic latent variables are called factors. Factors are unmeasured hypothetical constructs in your model. They are systematic sources that explain or describe functional relationships in your model.

Both manifest variables and latent factors can be dependent or independent. However, error or disturbance terms must be independent (or exogenous) variables in your model.

Naming Variables in the LINEQS Model

Whether a variable in each equation is an outcome or a predictor variable is prescribed by the modeler. Whether a variable is independent or dependent can be determined by analyzing the entire system of equations in the model. Whether a variable is observed or latent can be determined if it is referenced in your data set. However, whether a latent variable serves as a factor or an error can be determined only if you provide the specific information.

To distinguish latent factors from errors and both from manifest variables, the following rules for naming variables in the LINEQS model are followed:

  • Manifest variables are referenced in the input data set. You use their names in the LINEQS model specification directly. There is no additional naming rule for the manifest variables in the LINEQS model beyond those required by the SAS System.

  • Latent factor variables must start with the letter F or f (for factor).

  • Error variables must start with the letter E or e (for error), or D or d (for disturbance). Although you might enforce the use of D- (or d-) variables for disturbances, it is not required. For flexibility, disturbance variables can also start with the letter E or e in the LINEQS model.

  • The names of latent variables, errors, and disturbances (F-, E-, and D-variables) should not coincide with the names of manifest variables.

  • You should not use Intercept as a name for any variable. This name is reserved for the intercept specification in LINEQS model equations.

See the section Naming Variables and Parameters for the general rules about naming variables and parameters.
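The naming rules above can be summarized as a small lookup. The following Python sketch is only an illustration: the function name `variable_role` is hypothetical, and the case-insensitive comparison reflects the fact that SAS variable names are not case sensitive.

```python
def variable_role(name, manifest_names):
    """Classify a LINEQS variable name as 'manifest', 'factor', or 'error'.

    `manifest_names` holds the variable names found in the input data set.
    Names are compared without regard to case.
    """
    if name.lower() == "intercept":
        raise ValueError("Intercept is reserved for the intercept specification")
    # A name that is referenced in the data set is a manifest variable.
    if name.lower() in {m.lower() for m in manifest_names}:
        return "manifest"
    prefix = name[:1].upper()
    if prefix == "F":
        return "factor"
    if prefix in ("E", "D"):
        return "error"  # E- and D-variables are both treated as error terms
    raise ValueError(f"{name!r} is neither a manifest variable nor an F-, E-, or D-variable")


data_set_variables = ["X1", "X2", "Y1"]
for name in ["X1", "F_verbal", "E1", "d2"]:
    print(name, variable_role(name, data_set_variables))
```

Because manifest names are checked first, a latent name that coincides with a data set variable would be treated as manifest, which is why such coincidences should be avoided.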

Matrix Representation of the LINEQS Model

As a programming language, the LINEQS model uses equations to describe relationships among variables. But as a mathematical model, the LINEQS model is more conveniently described in matrix terms. In this section, the LINEQS matrix model is described.

Suppose in a LINEQS model that there are $n_ i$ independent variables and $n_ d$ dependent variables. The vector of the independent variables is denoted by $\bxi $, in the order of manifest variables, latent factors, and error variables. The vector of dependent variables is denoted by $\bm {\eta }$, in the order of manifest variables and latent factors. The LINEQS model matrices are defined as follows:

  • $\balpha $ $(n_ d \times 1)$: intercepts of dependent variables

  • $\bbeta $ $(n_ d \times n_ d)$: effects of dependent variables (in columns) on dependent variables (in rows)

  • $\bgamma $ $(n_ d \times n_ i)$: effects of independent variables (in columns) on dependent variables (in rows)

  • $\bPhi $ $(n_ i \times n_ i)$: covariance matrix of independent variables

  • $\bnu $ $(n_ i \times 1)$: means of independent variables

The model equation of the LINEQS model is

\[ \bm {\eta } = \balpha + \bbeta \bm {\eta } + \bgamma \bxi \]

Assuming that $(\mb{I} - \bbeta )$ is invertible, under the model the covariance matrix of all variables $(\bm {\eta }^{\prime },\bxi ^{\prime })^{\prime }$ is structured as

\[ \bSigma _ a = \left( \begin{matrix} (\mb{I} - \bbeta )^{-1} \bgamma \bPhi \bgamma ^{\prime } (\mb{I} - \bbeta )^{-1 \prime } & (\mb{I} - \bbeta )^{-1} \bgamma \bPhi \\ \bPhi \bgamma ^{\prime } (\mb{I} - \bbeta )^{-1 \prime } & \bPhi \\ \end{matrix} \right) \]

The mean vector of all variables $(\bm {\eta }^{\prime },\bxi ^{\prime })^{\prime }$ is structured as

\[ \bmu _ a = \left( \begin{matrix} (\mb{I} - \bbeta )^{-1} (\balpha + \bgamma \bnu ) \\ \bnu \\ \end{matrix} \right) \]

As is shown in the structured covariance and mean matrices, the means and covariances of independent variables are direct model parameters in $\bnu $ and $\bPhi $, whereas the means and covariances of dependent variables are functions of various model matrices and hence functions of model parameters.
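These structures follow directly from the model equation. As a brief derivation sketch, which assumes only that $(\mb{I} - \bbeta )$ is invertible and that $\bxi $ has mean vector $\bnu $ and covariance matrix $\bPhi $, solving the model equation for $\bm {\eta }$ gives the reduced form

\[ \bm {\eta } = (\mb{I} - \bbeta )^{-1} (\balpha + \bgamma \bxi ) \]

Taking expectations yields the upper block of $\bmu _ a$:

\[ \mr{E}(\bm {\eta }) = (\mb{I} - \bbeta )^{-1} (\balpha + \bgamma \bnu ) \]

Because $\balpha $ is constant, only the term in $\bxi $ contributes to the covariances:

\[ \mr{Cov}(\bm {\eta }) = (\mb{I} - \bbeta )^{-1} \bgamma \bPhi \bgamma ^{\prime } (\mb{I} - \bbeta )^{-1 \prime }, \quad \mr{Cov}(\bm {\eta }, \bxi ) = (\mb{I} - \bbeta )^{-1} \bgamma \bPhi \]

These are the upper-left and upper-right blocks of $\bSigma _ a$; the lower-left block is the transpose of the upper-right block. (The operator notation $\mr{E}$ and $\mr{Cov}$ is introduced here for the sketch only.)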

The covariance and mean structures of all observed variables are obtained by selecting the elements in $\bSigma _ a$ and $\bmu _ a$. Mathematically, define a selection matrix $\mb{G}$ of dimensions $n \times (n_ d+n_ i)$, where $n$ is the number of observed variables in the model. The selection matrix $\mb{G}$ contains zeros and ones as its elements. Each row of $\mb{G}$ has exactly one nonzero element, at the position that corresponds to the location of an observed variable in $\bSigma _ a$ or $\bmu _ a$. With each row of $\mb{G}$ selecting a distinct observed variable, the structured covariance matrix of all observed variables is represented by

\[ \bSigma = \mb{G} \bSigma _ a \mb{G}^{\prime } \]

The structured mean vector of all observed variables is represented by

\[ \bmu = \mb{G} \bmu _ a \]
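The following Python sketch works through the covariance structure and the selection step numerically for the two-equation example at the beginning of this section. It uses only the standard library, with exact fractions and a small Gauss-Jordan inverse. All parameter values, the helper names (`matmul`, `transpose`, `inverse`), and the choice of an identity matrix for $\bPhi $ are assumptions made for illustration; this is not how PROC CALIS is implemented.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    # Augment the matrix with the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place (fails if the matrix is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# eta = (Y1, Y2); xi = (X1, X2, E1, E2). All numeric values are hypothetical.
beta = [[0, 0],
        [2, 0]]            # b2 = 2: effect of Y1 on Y2
gamma = [[1, 2, 1, 0],     # a1 = 1, a2 = 2, and a fixed 1 for E1
         [1, 0, 0, 1]]     # b1 = 1, and a fixed 1 for E2
phi = [[int(i == j) for j in range(4)] for i in range(4)]  # unit variances, no covariances

i_minus_beta = [[int(i == j) - beta[i][j] for j in range(2)] for i in range(2)]
total = matmul(inverse(i_minus_beta), gamma)     # (I - beta)^{-1} gamma

top_left = matmul(matmul(total, phi), transpose(total))
top_right = matmul(total, phi)
# Assemble the 6 x 6 covariance matrix of (eta', xi')' block by block.
sigma_a = ([tl + tr for tl, tr in zip(top_left, top_right)]
           + [bl + p for bl, p in zip(transpose(top_right), phi)])

# G selects the observed variables Y1, Y2, X1, X2 (positions 0 to 3) and drops E1, E2.
observed_positions = [0, 1, 2, 3]
g = [[int(j == pos) for j in range(6)] for pos in observed_positions]
sigma = matmul(matmul(g, sigma_a), transpose(g))

for row in sigma:
    print([str(v) for v in row])
```

Under these assumed values, the variance of Y1 in `sigma` is the sum of the contributions of X1, X2, and E1, which shows how moments of dependent variables are functions of the model matrices rather than parameters themselves.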

Partitions of Some LINEQS Model Matrices and Their Restrictions

Some of the LINEQS model matrices are subject to restrictions. Although these restrictions do not affect the derivation of the covariance and mean structures, they are enforced in the LINEQS model specification.

Model Restrictions on the $\bbeta $ Matrix

The diagonal of the $\bbeta $ matrix must be zeros. This prevents the direct regression of dependent variables on themselves. Hence, in the LINEQS statement you cannot specify the same variable on both the left-hand and the right-hand sides of the same equation.

Partitions of the $\bgamma $ Matrix and the Associated Model Restrictions

The columns of the $\bgamma $ matrix refer to the variables in $\bxi $, in the order of manifest variables, latent factors, and error variables. In the LINEQS model, the following partition of the $\bgamma $ matrix is assumed:

\[ \bgamma = \left( \begin{matrix} \bgamma _0 & \mb{E} \\ \end{matrix} \right) \]

where $\bgamma _0$ is an $n_ d \times (n_ i - n_ d)$ matrix for the effects of independent manifest variables and latent factors on the dependent variables and $\mb{E}$ is an $n_ d \times n_ d$ permutation matrix for the effects of errors on the dependent variables.

The dimension of submatrix $\mb{E}$ is $n_ d \times n_ d$ because in the LINEQS model each dependent variable signifies an equation with an error term. In addition, because $\mb{E}$ is a permutation matrix (which is formed by exchanging rows of an identity matrix of the same order), the partition of the $\bgamma $ matrix ensures that each dependent variable is associated with a unique error term and that the effect of each error term on its associated dependent variable is 1.

As a result of the error term restriction, in the LINEQS statement you must specify a unique error term in each equation. The coefficient associated with the error term can only be a fixed value at one, either explicitly (with 1.0 inserted immediately before the error term) or implicitly (with no coefficient specified).
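As an illustration, consider the two-equation example at the beginning of this section, with the variables ordered as described above (this ordering of the errors is an assumption of the sketch). Here $n_ d = 2$ and $n_ i = 4$, and the vectors and matrices are

\[ \bm {\eta } = \left( \begin{matrix} Y_1 \\ Y_2 \\ \end{matrix} \right), \quad \bxi = \left( \begin{matrix} X_1 \\ X_2 \\ E_1 \\ E_2 \\ \end{matrix} \right), \quad \balpha = \left( \begin{matrix} a_0 \\ b_0 \\ \end{matrix} \right), \quad \bbeta = \left( \begin{matrix} 0 & 0 \\ b_2 & 0 \\ \end{matrix} \right) \]

\[ \bgamma = \left( \begin{matrix} \bgamma _0 & \mb{E} \\ \end{matrix} \right) = \left( \begin{matrix} a_1 & a_2 & 1 & 0 \\ b_1 & 0 & 0 & 1 \\ \end{matrix} \right) \]

The first two columns form $\bgamma _0$, which is $n_ d \times (n_ i - n_ d) = 2 \times 2$. The last two columns form $\mb{E}$, which in this ordering is the $2 \times 2$ identity matrix (the simplest permutation matrix): each equation has exactly one error term, and its coefficient is 1. The diagonal of $\bbeta $ is zero, in agreement with the restriction on the $\bbeta $ matrix.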

Partitions of the $\bnu $ Vector and the Associated Model Restrictions

The $\bnu $ vector contains the means of independent variables, in the order of the manifest, latent factor, and error variables. In the LINEQS model, the following partition of the $\bnu $ vector is assumed:

\[ \bnu = \left( \begin{matrix} \bnu _0 \\ 0 \\ \end{matrix} \right) \]

where $\bnu _0$ is an $(n_ i - n_ d) \times 1$ vector for the means of independent manifest variables and latent factors and 0 is a null vector of dimension $n_ d$ for the means of errors or disturbances. Again, the dimension of the null vector is $n_ d$ because each dependent variable is associated uniquely with an error term. This partition restricts the means of errors or disturbances to zeros.

Hence, when specifying a LINEQS model, you cannot specify the means of errors (or disturbances) as free parameters or as fixed values other than zero in the MEAN statement.

Partitions of the $\bPhi $ Matrix

The $\bPhi $ matrix is for the covariances of the independent variables, in the order of the manifest, latent factor, and error variables. The following partition of the $\bPhi $ matrix is assumed:

\[ \bPhi = \left( \begin{matrix} \bPhi _{11} & \bPhi ^{\prime }_{21} \\ \bPhi _{21} & \bPhi _{22} \\ \end{matrix} \right) \]

where $\bPhi _{11}$ is an $(n_ i - n_ d) \times (n_ i - n_ d)$ covariance matrix for the independent manifest variables and latent factors, $\bPhi _{22}$ is an $n_ d \times n_ d$ covariance matrix for the errors, and $\bPhi _{21}$ is an $n_ d \times (n_ i - n_ d)$ covariance matrix for the errors with other independent variables in the LINEQS model. Because $\bPhi $ is symmetric, $\bPhi _{11}$ and $\bPhi _{22}$ are also symmetric.

There are actually no model restrictions placed on the submatrices of the partition. However, in most statistical applications, errors represent unsystematic sources of effects and therefore they are not to be correlated with other systematic sources. This implies that submatrix $\bPhi _{21}$ is a null matrix. However, $\bPhi _{21}$ being null is not enforced in the LINEQS model specification. If you ever specify a covariance between an error variable and a non-error independent variable in the COV statement, as a workaround trick or otherwise, you should provide your own theoretical justifications.

Summary of Matrices and Submatrices in the LINEQS Model

Let $n_ d$ be the number of dependent variables and $n_ i$ be the number of independent variables. The names, roles, and dimensions of the LINEQS model matrices and submatrices are summarized in the following table.

Model Matrices

| Matrix | Name | Description | Dimensions |
|---|---|---|---|
| $\balpha $ | _EQSALPHA_ | Intercepts of dependent variables | $n_ d \times 1$ |
| $\bbeta $ | _EQSBETA_ | Effects of dependent (column) variables on dependent (row) variables | $n_ d \times n_ d$ |
| $\bgamma $ | _EQSGAMMA_ | Effects of independent (column) variables on dependent (row) variables | $n_ d \times n_ i$ |
| $\bnu $ | _EQSNU_ | Means of independent variables | $n_ i \times 1$ |
| $\bPhi $ | _EQSPHI_ | Covariance matrix of independent variables | $n_ i \times n_ i$ |

Submatrices

| Matrix | Name | Description | Dimensions |
|---|---|---|---|
| $\bgamma _0$ | _EQSGAMMA_SUB_ | Effects of independent variables, excluding errors, on dependent variables | $n_ d \times (n_ i - n_ d)$ |
| $\bnu _0$ | _EQSNU_SUB_ | Means of independent variables, excluding errors | $(n_ i - n_ d) \times 1$ |
| $\bPhi _{11}$ | _EQSPHI11_ | Covariance matrix of independent variables, excluding errors | $(n_ i - n_ d) \times (n_ i - n_ d)$ |
| $\bPhi _{21}$ | _EQSPHI21_ | Covariances of errors with other independent variables | $n_ d \times (n_ i - n_ d)$ |
| $\bPhi _{22}$ | _EQSPHI22_ | Covariance matrix of errors | $n_ d \times n_ d$ |

Specification of the LINEQS Model

Specification in Equations

In the LINEQS statement, you specify intercepts and effect parameters (or regression coefficients) along with the variable relationships in equations. In terms of model matrices, you specify the $\balpha $ vector and the $\bbeta $ and $\bgamma $ matrices in the LINEQS statement without using any matrix language.

For example:

\[ Y = b_0 + b_1 * X_1 + b_2 * F_2 + E_1 \]

In this equation, you specify Y as an outcome variable, $X_1$ and $F_2$ as predictor variables, and $E_1$ as an error variable. The parameters in the equation are the intercept $b_0$ and the path coefficients (or effects) $b_1$ and $b_2$.

This kind of model equation is specified in the LINEQS statement. For example, the previous equation translates into the following LINEQS statement specification:

lineqs Y = b0 * Intercept + b1 * X1 + b2 * F2 + E1;

If the mean structures of the model are not of interest, the intercept term can be omitted. The specification becomes:

lineqs Y = b1 * X1 + b2 * F2 + E1;

See the LINEQS statement for the details about the syntax.

Because of the LINEQS model restrictions (see the section Partitions of Some LINEQS Model Matrices and Their Restrictions), you must also follow these rules when specifying LINEQS model equations:

  • A dependent variable can appear only on the left-hand side of an equation once. In other words, you must put all predictor variables for a dependent variable in one equation. This is different from some econometric models where a dependent variable can appear on the left-hand sides of two equations to represent an equilibrium point. However, this limitation can be resolved by reparameterization in some cases. See Example 29.18.

  • A dependent variable that appears on the left-hand side of an equation cannot appear on the right-hand side of the same equation. If you measure the same characteristic at different time points and the previous measurement serves as a predictor of the next measurement, you should use different variable names for the measurements so as to comply with this rule.

  • An error term must be specified in each equation and must be unique. The same error name cannot appear in two or more equations. When an equation is truly intended to have no error term, it should be represented equivalently in the LINEQS equation by introducing an error term with zero variance (specified in the VARIANCE statement).

  • The regression coefficient (effect) that is associated with an error term must be fixed at one (1.0). This is done automatically by omitting any fixed constants or parameters that are associated with the error terms. Inserting a parameter or a fixed value other than 1 immediately before an error term is not allowed.
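The four rules above can be expressed as a small checker. The following Python sketch is an illustration only: the function name `check_lineqs_rules` and the tuple-based equation layout are hypothetical, and error terms are recognized purely by their E or D prefix, so the sketch assumes that no manifest variable uses those prefixes.

```python
def check_lineqs_rules(equations):
    """Return a list of rule violations for a set of LINEQS-style equations.

    `equations` is a list of (outcome, [(coefficient, variable), ...]) pairs.
    A coefficient of None means that no coefficient was written.
    """
    problems = []
    seen_outcomes = set()
    seen_errors = set()
    for outcome, terms in equations:
        # Rule 1: a dependent variable appears on the left-hand side only once.
        if outcome in seen_outcomes:
            problems.append(f"{outcome}: appears on the left-hand side more than once")
        seen_outcomes.add(outcome)
        # Rule 2: the outcome cannot be its own predictor.
        if outcome in [var for _, var in terms]:
            problems.append(f"{outcome}: appears on both sides of its equation")
        # Rule 3: each equation has one error term, and that term is unique.
        errors = [(c, v) for c, v in terms if v[:1].upper() in ("E", "D")]
        if len(errors) != 1:
            problems.append(f"{outcome}: needs one error term, found {len(errors)}")
        for coef, var in errors:
            if var in seen_errors:
                problems.append(f"{var}: error term used in more than one equation")
            seen_errors.add(var)
            # Rule 4: the error coefficient is omitted or fixed at 1.
            if coef not in (None, 1, 1.0):
                problems.append(f"{var}: error coefficient must be fixed at 1")
    return problems


valid = [
    ("Y1", [("a1", "X1"), ("a2", "X2"), (None, "E1")]),
    ("Y2", [("b1", "X1"), ("b2", "Y1"), (1.0, "E2")]),
]
print(check_lineqs_rules(valid))  # []
```

An equation that is intended to have no error would still pass this check once it includes an error term whose variance is fixed at zero in the VARIANCE statement, as described above.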

Mean, Variance, and Covariance Parameter Specification

In addition to the intercept and effect parameters that are specified in equations, the means, variances, and covariances among all independent variables are parameters in the LINEQS model. An exception is that the means of all error variables are restricted to fixed zeros in the LINEQS model. To specify the mean, variance, and covariance parameters, you use the MEAN, VARIANCE, and COV statements, respectively.

The means, variances, and covariances among dependent variables are not parameters themselves in the model. Rather, they are complex functions of the model parameters. See the section Matrix Representation of the LINEQS Model for mathematical details.

Default Parameters in the LINEQS Model

There are two types of default parameters of the LINEQS model, as implemented in PROC CALIS. One is the free parameters; the other is the fixed constants.

The following sets of parameters are free parameters by default:

  • the variances of all exogenous (independent) observed or latent variables (including error and disturbance variables)

  • the covariances among all exogenous (independent) manifest or latent variables (excluding error and disturbance variables)

  • the means of all exogenous (independent) observed variables if the mean structures are modeled

  • the intercepts of all endogenous (dependent) manifest variables if the mean structures are modeled

PROC CALIS names the default free parameters with the _Add prefix and a unique integer suffix. You can override the default free parameters by explicitly specifying them as free, constrained, or fixed parameters in the COV, LINEQS, MEAN, or VARIANCE statement.

Parameters that are not default free parameters in the LINEQS model are fixed constants by default. You can override almost all of the default fixed constants of the LINEQS model by using the COV, LINEQS, MEAN, or VARIANCE statement. You cannot override the following two sets of fixed constants:

  • fixed zero parameters for the direct effects (path coefficients) of variables on their own. You cannot have an equation in the LINEQS statement that has the same variable specified on the left-hand and the right-hand sides.

  • fixed one effects from the error or disturbance variables. You cannot set the path coefficient (effect) of the error or disturbance term to any value other than 1 in the LINEQS statement.

These two sets of fixed parameters reflect the LINEQS model restrictions and therefore cannot be modified. Other than these two sets of default fixed parameters, all other default fixed parameters are zeros. You can override these default zeros by explicitly specifying them as free, constrained, or fixed parameters in the COV, LINEQS, MEAN, or VARIANCE statement.
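To make the default free parameter sets concrete, the following Python sketch enumerates them for a given classification of variables. The function name `default_free_parameters` and the tuple labels are hypothetical; the sketch only mirrors the four sets listed in this section and does not reproduce the _Add parameter names that PROC CALIS generates.

```python
from itertools import combinations


def default_free_parameters(indep_manifest, indep_factors, errors,
                            dep_manifest, mean_structure=False):
    """List the default free parameters described in this section."""
    params = []
    # Variances of all independent variables, including errors and disturbances.
    for v in indep_manifest + indep_factors + errors:
        params.append(("variance", v))
    # Covariances among independent variables, excluding errors and disturbances.
    for v, w in combinations(indep_manifest + indep_factors, 2):
        params.append(("covariance", v, w))
    if mean_structure:
        # Means of independent observed variables.
        for v in indep_manifest:
            params.append(("mean", v))
        # Intercepts of dependent manifest variables.
        for v in dep_manifest:
            params.append(("intercept", v))
    return params


# The two-equation example: X1 and X2 are independent, Y1 and Y2 are dependent.
for p in default_free_parameters(["X1", "X2"], [], ["E1", "E2"], ["Y1", "Y2"],
                                 mean_structure=True):
    print(p)
```

Everything that this enumeration leaves out, such as a covariance between E1 and E2, is a fixed zero by default unless you specify it explicitly.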