PROC TCALIS: The LINEQS Model :: SAS/STAT(R) 9.2 User's Guide, Second Edition

The TCALIS Procedure

The LINEQS Model

The LINEQS modeling language is adapted from the EQS program by Bentler (1995). The statistical models that LINEQS or EQS analyzes are essentially the same as those analyzed by the other general modeling languages in PROC TCALIS, such as LISMOD, RAM, and PATH. However, the terminology and approach of the LINEQS or EQS modeling language differ from those of the other languages; they are based on the theoretical model developed by Bentler and Weeks (1980). For convenience, models analyzed using the LINEQS modeling language are called LINEQS models. Note that these LINEQS models can also be analyzed by the other general modeling languages in PROC TCALIS.

In the LINEQS (or the original EQS) model, relationships among variables are represented by a system of equations. For example:

$\begin{aligned} Y_1 &= a_0 + a_1 X_1 + a_2 X_2 + E_1 \\ Y_2 &= b_0 + b_1 X_1 + b_2 Y_1 + E_2 \end{aligned}$

On the left-hand side of each equation, an outcome variable is hypothesized to be a linear function of one or more predictor variables and an error, which are all specified on the right-hand side of the equation. The parameters specified in an equation are the effects (or regression coefficients) of the predictor variables. For example, in the preceding equations, $Y_1$ and $Y_2$ are outcome variables; $E_1$ and $E_2$ are error variables; $a_1$, $a_2$, $b_1$, and $b_2$ are effect parameters (or regression coefficients); and $a_0$ and $b_0$ are intercept parameters. Variables $X_1$ and $X_2$ serve as predictors in the first equation, while variables $X_1$ and $Y_1$ serve as predictors in the second equation.

This is almost the same representation as in multiple regression models. However, the LINEQS model entails more. It supports a system of equations that can also include latent variables, measurement errors, and correlated errors.
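As a rough sketch, a two-equation system of this kind might be written in PROC TCALIS as follows. The data set name work.mydata and the variable and parameter names (Y1, Y2, X1, X2, a0 through b2) are assumptions chosen for illustration; the intercept terms use the reserved name Intercept, which is described in the naming rules later in this section.

```sas
/* Hypothetical data set work.mydata with manifest variables Y1, Y2, X1, X2 */
proc tcalis data=work.mydata;
   lineqs
      Y1 = a0 Intercept + a1 X1 + a2 X2 + E1,   /* first equation */
      Y2 = b0 Intercept + b1 X1 + b2 Y1 + E2;   /* second equation: Y1 is a predictor here */
run;
```

Equations are separated by commas inside a single LINEQS statement, and each equation ends with its own error term.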

Types of Variables in the LINEQS Model

The distinction between dependent and independent variables is important in the LINEQS model.

A variable is dependent if it appears on the left-hand side of an equation in the model. A dependent variable might be observed (manifest) or latent. It might or might not appear on the right-hand side of other equations, but it cannot appear on the left-hand sides of two or more equations. Error variables cannot be dependent in the LINEQS model.

A variable in the LINEQS model is independent if it is not dependent. Independent variables can be observed (manifest) or latent. All error variables must be independent in the LINEQS model.

Dependent variables are also referred to as endogenous variables. These names are interchangeable. Similarly, independent variables are interchangeable with exogenous variables.

Whereas an outcome variable in any equation must be a dependent variable, a predictor variable in an equation is not necessarily an independent variable in the entire LINEQS model. For example, $Y_1$ is a predictor variable in the second equation of the preceding example, but it is a dependent variable in the LINEQS model. In sum, the predictor-outcome nature of a variable is determined within a single equation, while the exogenous-endogenous (independent-dependent) nature of a variable is determined within the entire system of equations.

In addition to the dependent-independent variable distinction, variables in the LINEQS model are distinguished according to whether they are observed in the data. Variables that are observed in research are called observed or manifest variables. Hypothetical variables that are not observed in the LINEQS model are latent variables.

Two types of latent variables should be distinguished: error variables and non-error variables. An error variable is unique to an equation. It serves as the unsystematic source of effect for the outcome variable in an equation. If the outcome variable in the equation is latent, the corresponding error variable is also called a disturbance. In contrast, non-error or systematic latent variables are called factors. Factors are unmeasured hypothetical constructs in your model. They are systematic sources that explain or describe functional relationships in your model.

Both manifest variables and latent factors can be dependent or independent. However, error or disturbance terms must be independent (or exogenous) variables in your model.

Naming Variables in the LINEQS Model

Whether a variable in each equation is an outcome or a predictor variable is prescribed by the modeler. Whether a variable is independent or dependent can be determined by analyzing the entire system of equations in the model. Whether a variable is observed or latent can be determined if it is referenced in your data set. However, whether a latent variable serves as a factor or an error can be determined only if you provide the specific information.

To distinguish latent factors from errors and both from manifest variables, the following rules for naming variables in the LINEQS model are followed:

Manifest variables are referenced in the input data set. You use their names in the LINEQS model specification directly. There is no additional naming rule for the manifest variables in the LINEQS model beyond those required by the SAS system.
Latent factor variables must start with the letter F or f (for factor).
Error variables must start with the letter E or e (for error), or D or d (for disturbance). Although you can choose to reserve D- (or d-) variables for disturbances, this is not required. For flexibility, disturbance variables can also start with the letter E or e in the LINEQS model.
The names of latent variables, errors, and disturbances (F-, E-, and D- variables) should not coincide with the names of manifest variables.
You should not use Intercept as a name for any variable. This name is reserved for intercept specification in LINEQS model equations.
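The naming rules above can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the data set work.survey, the manifest variables x1-x3 and y1-y3, the parameter names, and the way the factor scales are fixed (a unit variance for F1 and a unit loading for F2).

```sas
/* Hypothetical data set work.survey with manifest variables x1-x3 and y1-y3 */
proc tcalis data=work.survey;
   lineqs
      x1 = lx1 F1 + E1,      /* x1: manifest; F1: latent factor; E1: error        */
      x2 = lx2 F1 + E2,
      x3 = lx3 F1 + E3,
      y1 = 1.0 F2 + E4,      /* loading fixed at 1.0 to set the scale of F2       */
      y2 = ly2 F2 + E5,
      y3 = ly3 F2 + E6,
      F2 = b1  F1 + D1;      /* D1: disturbance, because the outcome F2 is latent */
   std F1 = 1.0;             /* variance of F1 fixed to set its scale (assumed)   */
run;
```

Because names that start with F, E, or D are read as latent factors, errors, or disturbances, none of them should match a variable in the input data set.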

Matrix Representation of the LINEQS Model

As a programming language, the LINEQS model uses equations to describe relationships among variables. As a mathematical model, however, it is more conveniently described in matrix terms. This section describes the LINEQS matrix model.

Suppose in a LINEQS model that there are $n_i$ independent variables and $n_d$ dependent variables. The vector of the independent variables is denoted by $\boldsymbol{\xi}$ , in the order of manifest variables, latent factors, and error variables. The vector of dependent variables is denoted by $\boldsymbol{\eta}$ , in the order of manifest variables and latent factors. The LINEQS model matrices are defined as follows:

$\boldsymbol{\alpha}$ $(n_d \times 1)$ :: intercepts of dependent variables
$\boldsymbol{\beta}$ $(n_d \times n_d)$ :: effects of dependent variables (in columns) on dependent variables (in rows)
$\boldsymbol{\gamma}$ $(n_d \times n_i)$ :: effects of independent variables (in columns) on dependent variables (in rows)
$\boldsymbol{\Phi}$ $(n_i \times n_i)$ :: covariance matrix of independent variables
$\boldsymbol{\nu}$ $(n_i \times 1)$ :: means of independent variables

The model equation of the LINEQS model is:

$\boldsymbol{\eta} = \boldsymbol{\alpha} + \boldsymbol{\beta}\boldsymbol{\eta} + \boldsymbol{\gamma}\boldsymbol{\xi}$

Assuming that $(\mathbf{I} - \boldsymbol{\beta})$ is invertible, under the model the covariance matrix of all variables $(\boldsymbol{\eta}^{\prime}, \boldsymbol{\xi}^{\prime})^{\prime}$ is structured as:

$\boldsymbol{\Sigma}_a = \begin{pmatrix} (\mathbf{I} - \boldsymbol{\beta})^{-1}\boldsymbol{\gamma}\boldsymbol{\Phi}\boldsymbol{\gamma}^{\prime}\left((\mathbf{I} - \boldsymbol{\beta})^{-1}\right)^{\prime} & (\mathbf{I} - \boldsymbol{\beta})^{-1}\boldsymbol{\gamma}\boldsymbol{\Phi} \\ \boldsymbol{\Phi}\boldsymbol{\gamma}^{\prime}\left((\mathbf{I} - \boldsymbol{\beta})^{-1}\right)^{\prime} & \boldsymbol{\Phi} \end{pmatrix}$

The mean vector of all variables $(\boldsymbol{\eta}^{\prime}, \boldsymbol{\xi}^{\prime})^{\prime}$ is structured as:

$\boldsymbol{\mu}_a = \begin{pmatrix} (\mathbf{I} - \boldsymbol{\beta})^{-1}(\boldsymbol{\alpha} + \boldsymbol{\gamma}\boldsymbol{\nu}) \\ \boldsymbol{\nu} \end{pmatrix}$

As is shown in the structured covariance and mean matrices, the means and covariances of independent variables are direct model parameters in $\boldsymbol{\nu}$ and $\boldsymbol{\Phi}$ , whereas the means and covariances of dependent variables are functions of various model matrices and hence functions of model parameters.

The covariance and mean structures of all observed variables are obtained by selecting the elements in $\boldsymbol{\Sigma}_a$ and $\boldsymbol{\mu}_a$ . Mathematically, define a selection matrix $\mathbf{G}$ of dimensions $n \times (n_d + n_i)$ , where $n$ is the number of observed variables in the model. The selection matrix $\mathbf{G}$ contains zeros and ones as its elements. Each row of $\mathbf{G}$ has exactly one nonzero element at the position that corresponds to the location of an observed row variable in $\boldsymbol{\Sigma}_a$ or $\boldsymbol{\mu}_a$ . With each row of $\mathbf{G}$ selecting a distinct observed variable, the structured covariance matrix of all observed variables is represented by the following:

$\boldsymbol{\Sigma} = \mathbf{G}\boldsymbol{\Sigma}_a\mathbf{G}^{\prime}$

The structured mean vector of all observed variables is represented by:

$\boldsymbol{\mu} = \mathbf{G}\boldsymbol{\mu}_a$
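To make the matrix representation concrete, the following worked sketch writes out the model matrices for a small two-equation system. The system itself and all symbol names are assumptions for illustration: the equations are $Y_1 = a_0 + a_1 X_1 + a_2 X_2 + E_1$ and $Y_2 = b_0 + b_1 X_1 + b_2 Y_1 + E_2$, with $\boldsymbol{\alpha}$ for the intercepts, $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ for the effects of dependent and independent variables, $\boldsymbol{\nu}$ for the means of independent variables, and $\mathbf{G}$ for the selection matrix.

```latex
% Assumed example system:
%   Y1 = a0 + a1 X1 + a2 X2 + E1
%   Y2 = b0 + b1 X1 + b2 Y1 + E2
% Two dependent variables (Y1, Y2) and four independent variables (X1, X2, E1, E2).
\[
\boldsymbol{\eta} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad
\boldsymbol{\xi} = \begin{pmatrix} X_1 \\ X_2 \\ E_1 \\ E_2 \end{pmatrix}, \quad
\boldsymbol{\alpha} = \begin{pmatrix} a_0 \\ b_0 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} 0 & 0 \\ b_2 & 0 \end{pmatrix}, \quad
\boldsymbol{\gamma} = \begin{pmatrix} a_1 & a_2 & 1 & 0 \\ b_1 & 0 & 0 & 1 \end{pmatrix}
\]

% (I - beta) is unit lower triangular here, so its inverse can be written down directly.
\[
(\mathbf{I} - \boldsymbol{\beta})^{-1} = \begin{pmatrix} 1 & 0 \\ b_2 & 1 \end{pmatrix}
\]

% Implied means of Y1 and Y2, with nu = (nu_1, nu_2, 0, 0)' because error means are zero.
\[
(\mathbf{I} - \boldsymbol{\beta})^{-1}(\boldsymbol{\alpha} + \boldsymbol{\gamma}\boldsymbol{\nu})
= \begin{pmatrix}
a_0 + a_1 \nu_1 + a_2 \nu_2 \\
b_0 + b_1 \nu_1 + b_2 (a_0 + a_1 \nu_1 + a_2 \nu_2)
\end{pmatrix}
\]

% All variables are ordered (Y1, Y2, X1, X2, E1, E2); the first four are observed.
\[
\mathbf{G} = \begin{pmatrix} \mathbf{I}_4 & \mathbf{0}_{4 \times 2} \end{pmatrix}
\]
```

The second row of the implied mean vector shows how the mean of $Y_2$ picks up both the direct effect of $X_1$ and the indirect effects transmitted through $Y_1$.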

Partitions of Some LINEQS Model Matrices and Their Restrictions

There are some restrictions in some of the LINEQS model matrices. Although these restrictions do not affect the derivation of the covariance and mean structures, they are enforced in the LINEQS model specification.

Model Restrictions on the $\boldsymbol{\beta}$ Matrix

The diagonal of the $\boldsymbol{\beta}$ matrix must be zeros. This prevents the direct regression of dependent variables on themselves. Hence, in the LINEQS statement you cannot specify the same variable on both the left-hand and the right-hand sides of the same equation.

Partitions of the $\boldsymbol{\gamma}$ Matrix and the Associated Model Restrictions

The columns of the $\boldsymbol{\gamma}$ matrix refer to the variables in $\boldsymbol{\xi}$ , in the order of manifest variables, latent factors, and error variables. In the LINEQS model, the following partition of the $\boldsymbol{\gamma}$ matrix is assumed:

$\boldsymbol{\gamma} = \begin{pmatrix} \boldsymbol{\gamma}^{*} & \mathbf{E} \end{pmatrix}$

where $\boldsymbol{\gamma}^{*}$ is an $n_d \times (n_i - n_d)$ matrix for the effects of independent manifest variables and latent factors on the dependent variables and $\mathbf{E}$ is an $n_d \times n_d$ permutation matrix for the effects of errors on the dependent variables.

The dimension of submatrix $\mathbf{E}$ is $n_d \times n_d$ because in the LINEQS model each dependent variable signifies an equation with an error term. In addition, because $\mathbf{E}$ is a permutation matrix (which is formed by exchanging rows of an identity matrix of the same order), the partition of the $\boldsymbol{\gamma}$ matrix ensures that each dependent variable is associated with a unique error term and that the effect of each error term on its associated dependent variable is $1$ .

As a result of the error term restriction, in the LINEQS statement you must specify a unique error term in each equation. The coefficient associated with the error term can only be fixed at one, either explicitly (with $1.0$ inserted immediately before the error term) or implicitly (with no coefficient specified).

Partitions of the $\boldsymbol{\nu}$ Vector and the Associated Model Restrictions

The $\boldsymbol{\nu}$ vector contains the means of independent variables, in the order of the manifest, latent factor, and error variables. In the LINEQS model, the following partition of the $\boldsymbol{\nu}$ vector is assumed:

$\boldsymbol{\nu} = \begin{pmatrix} \boldsymbol{\nu}^{*} \\ \mathbf{0} \end{pmatrix}$

where $\boldsymbol{\nu}^{*}$ is an $(n_i - n_d) \times 1$ vector for the means of independent manifest variables and latent factors and $\mathbf{0}$ is a null vector of dimension $n_d$ for the means of errors or disturbances. Again, the dimension of the null vector is $n_d$ because each dependent variable is associated uniquely with an error term. This partition restricts the means of errors or disturbances to zeros.

Hence, when specifying a LINEQS model, you cannot specify the means of errors (or disturbances) as free parameters or as fixed values other than zero in the MEAN statement.

Partitions of the $\boldsymbol{\Phi}$ Matrix

The $\boldsymbol{\Phi}$ matrix is for the covariances of the independent variables, in the order of the manifest, latent factor, and error variables. The following partition of the $\boldsymbol{\Phi}$ matrix is assumed:

$\boldsymbol{\Phi} = \begin{pmatrix} \boldsymbol{\Phi}_{11} & \boldsymbol{\Phi}_{21}^{\prime} \\ \boldsymbol{\Phi}_{21} & \boldsymbol{\Phi}_{22} \end{pmatrix}$

where $\boldsymbol{\Phi}_{11}$ is an $(n_i - n_d) \times (n_i - n_d)$ covariance matrix for the independent manifest variables and latent factors, $\boldsymbol{\Phi}_{22}$ is an $n_d \times n_d$ covariance matrix for the errors, and $\boldsymbol{\Phi}_{21}$ is an $n_d \times (n_i - n_d)$ covariance matrix for the errors with other independent variables in the LINEQS model. Because $\boldsymbol{\Phi}$ is symmetric, $\boldsymbol{\Phi}_{11}$ and $\boldsymbol{\Phi}_{22}$ are also symmetric.

There are actually no model restrictions placed on the submatrices of the partition. However, in most statistical applications, errors represent unsystematic sources of effects and therefore they are not to be correlated with other systematic sources. This implies that submatrix $\boldsymbol{\Phi}_{21}$ is a null matrix. However, $\boldsymbol{\Phi}_{21}$ being null is not enforced in the LINEQS model specification. If you ever specify a covariance between an error variable and a non-error independent variable in the COV statement, as a workaround trick or otherwise, you should provide your own theoretical justifications.
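The partitions described in this section can be written out for a small case. The sketch below assumes a hypothetical two-equation system, $Y_1 = a_0 + a_1 X_1 + a_2 X_2 + E_1$ and $Y_2 = b_0 + b_1 X_1 + b_2 Y_1 + E_2$, with independent variables ordered as $(X_1, X_2, E_1, E_2)$. The symbols $\phi_{jk}$ and $\theta_j$ for the individual variances and covariances are notation introduced here, and the errors are assumed to be uncorrelated with each other and with $X_1$ and $X_2$.

```latex
% Assumed example system:
%   Y1 = a0 + a1 X1 + a2 X2 + E1
%   Y2 = b0 + b1 X1 + b2 Y1 + E2
% Independent variables in order: X1, X2 (manifest), then E1, E2 (errors).

% Effects of independent variables: non-error block followed by the error block.
% The error block is the 2 x 2 identity, a special case of a permutation matrix.
\[
\boldsymbol{\gamma} =
\left(\begin{array}{cc|cc}
a_1 & a_2 & 1 & 0 \\
b_1 & 0   & 0 & 1
\end{array}\right)
\]

% Means of independent variables: the error means are restricted to zero.
\[
\boldsymbol{\nu} =
\left(\begin{array}{c}
\nu_1 \\ \nu_2 \\ \hline 0 \\ 0
\end{array}\right)
\]

% Covariances of independent variables: the lower-left block (errors with X1, X2)
% is null by assumption, not by a model restriction.
\[
\boldsymbol{\Phi} =
\left(\begin{array}{cc|cc}
\phi_{11} & \phi_{21} & 0 & 0 \\
\phi_{21} & \phi_{22} & 0 & 0 \\ \hline
0 & 0 & \theta_1 & 0 \\
0 & 0 & 0 & \theta_2
\end{array}\right)
\]
```

In this example there are four independent and two dependent variables, so the non-error blocks have two columns (or rows) and the error blocks have two.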

Summary of Matrices and Submatrices in the LINEQS Model

Let $n_d$ be the number of dependent variables and $n_i$ be the number of independent variables. The names, roles, and dimensions of the LINEQS model matrices and submatrices are summarized in the following table:

Matrix	Name	Description	Dimensions
Model Matrices
$\boldsymbol{\alpha}$	_EQSALPHA_	intercepts of dependent variables	$n_d \times 1$
$\boldsymbol{\beta}$	_EQSBETA_	effects of dependent (column) variables on dependent (row) variables	$n_d \times n_d$
$\boldsymbol{\gamma}$	_EQSGAMMA_	effects of independent (column) variables on dependent (row) variables	$n_d \times n_i$
$\boldsymbol{\nu}$	_EQSNU_	means of independent variables	$n_i \times 1$
$\boldsymbol{\Phi}$	_EQSPHI_	covariance matrix of independent variables	$n_i \times n_i$
Submatrices
$\boldsymbol{\gamma}^{*}$	_EQSGAMMA_SUB_	effects of independent variables, excluding errors, on dependent variables	$n_d \times (n_i - n_d)$
$\boldsymbol{\nu}^{*}$	_EQSNU_SUB_	means of independent variables, excluding errors	$(n_i - n_d) \times 1$
$\boldsymbol{\Phi}_{11}$	_EQSPHI11_	covariance matrix of independent variables, excluding errors	$(n_i - n_d) \times (n_i - n_d)$
$\boldsymbol{\Phi}_{21}$	_EQSPHI21_	covariances of errors with other independent variables	$n_d \times (n_i - n_d)$
$\boldsymbol{\Phi}_{22}$	_EQSPHI22_	covariance matrix of errors	$n_d \times n_d$

Specification of the LINEQS Model

Specification in Equations

In the LINEQS statement, you specify intercepts and effect parameters (or regression coefficients) along with the variable relationships in equations. In terms of model matrices, you specify the $\boldsymbol{\alpha}$ vector and the $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ matrices in the LINEQS statement without using any matrix language.

For example:

$Y = b_0 + b_1 X_1 + b_2 F_2 + E_1$

In this equation, you specify $Y$ as an outcome variable, $X_1$ and $F_2$ as predictor variables, and $E_1$ as an error variable. The parameters in the equation are the intercept $b_0$ and the path coefficients (or effects) $b_1$ and $b_2$ .

This kind of model equation is specified in the LINEQS statement. For example, the previous equation translates into the following LINEQS statement specification:

   lineqs Y = b0 Intercept + b1 X1 + b2 F2 + E1;

If the mean structures of the model are not of interest, the intercept term can be omitted. The specification becomes:

   lineqs Y =  b1 X1 + b2 F2 + E1;

See the LINEQS statement for the details in syntax.

Because of the LINEQS model restrictions (see the section Partitions of Some LINEQS Model Matrices and Their Restrictions), you must also follow these rules when specifying LINEQS model equations:

A dependent variable can appear only on the left-hand side of an equation once. In other words, you must put all predictor variables for a dependent variable in one equation. This is different from some econometric models where a dependent variable can appear on the left-hand sides of two equations to represent an equilibrium point. This limitation, however, can be resolved by reparameterization in some cases. See Example 88.2.
A dependent variable that appears on the left-hand side of an equation cannot appear on the right-hand side of the same equation. If you measure the same characteristic at different time points and the previous measurement serves as a predictor of the next measurement, you should use different variable names for the measurements so as to comply with this rule.
An error term must be specified in each equation and must be unique. The same error name cannot appear in two or more equations. When an equation is truly intended to have no error term, it should be represented equivalently in the LINEQS equation by introducing an error term with zero variance (specified in the STD statement).
The regression coefficient or effect associated with an error term must be fixed at one ( $1.0$ ). This is done automatically by omitting any fixed constants or parameters associated with the error terms. Inserting a parameter or a fixed value other than one immediately before an error term is not allowed.
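The rules above can be seen together in a short sketch. The data set work.study, the variable names, and the parameter names are hypothetical, and the model (a latent variable F1 that is an exact linear combination of X1 and X2, measured by Y1 and Y2) is only one assumed way to illustrate an equation that is intended to have no error.

```sas
/* Hypothetical data set work.study with manifest variables X1, X2, Y1, Y2 */
proc tcalis data=work.study;
   lineqs
      F1 = c1 X1 + c2 X2 + D1,   /* all predictors of F1 appear in this one equation     */
      Y1 = 1.0 F1 + E1,          /* loading fixed at 1.0 to set the scale of F1          */
      Y2 = l2  F1 + E2;          /* each equation has its own, uniquely named error term */
   std D1 = 0.0;                 /* F1 is meant to have no disturbance: zero variance    */
run;
```

No coefficient is placed before D1, E1, or E2, so their effects are implicitly fixed at one, and no variable appears on both sides of the same equation.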

Mean, Variance, and Covariance Parameter Specification

In addition to the intercept and effect parameters specified in equations, the means, variances, and covariances among all independent variables are parameters in the LINEQS model. An exception is that the means of all error variables are restricted to fixed zeros in the LINEQS model. To specify the mean, variance, and covariance parameters, you use the MEAN, STD, and the COV statements, respectively.

The means, variances, and covariances among dependent variables are not parameters themselves in the model. Rather, they are complex functions of the model parameters. See the section Matrix Representation of the LINEQS Model for mathematical details.
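As an illustration, the mean, variance, and covariance parameters of the independent variables might be named explicitly as in the following sketch. The data set, variables, and parameter names are hypothetical, and the sketch assumes that including Intercept terms or a MEAN statement is enough for mean structures to be modeled.

```sas
/* Hypothetical data set work.mydata with manifest variables Y1, Y2, X1, X2 */
proc tcalis data=work.mydata;
   lineqs
      Y1 = a0 Intercept + a1 X1 + a2 X2 + E1,
      Y2 = b0 Intercept + b1 X1 + b2 Y1 + E2;
   std
      X1 = vx1, X2 = vx2,      /* variances of the independent manifest variables   */
      E1 = ve1, E2 = ve2;      /* variances of the error variables                  */
   cov
      X1 X2 = cx1x2;           /* covariance between X1 and X2                      */
   mean
      X1 = mx1, X2 = mx2;      /* means of X1 and X2; error means stay fixed at zero */
run;
```

Y1 and Y2 do not appear in the STD, COV, or MEAN statements because their means, variances, and covariances are functions of the other parameters.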

Default Parameters in the LINEQS Model

Model-restricted values in the LINEQS model include the zero direct effects of any variables on themselves (that is, the diagonal of the $\boldsymbol{\beta}$ matrix contains zeros only), the predetermined effects of error variables (that is, each element of the $\mathbf{E}$ submatrix in $\boldsymbol{\gamma}$ is always either one or zero), and the predetermined means of error variables (that is, the $\boldsymbol{\nu}$ elements pertaining to error means are always zero). These fixed values are always enforced and cannot be specified differently. All other elements or locations in the LINEQS model matrices can be specified as parameters (free, constrained, or fixed) in the LINEQS and the subsidiary model specification statements, as described in the previous section.

If a location (or an element) in a LINEQS model matrix is not model-restricted and you do not specify a parameter for it, a default parameter will be assumed for this location. There are two types of default parameters. One is automatic free parameters and the other is fixed zeros.

Automatic Free Parameters

In the LINEQS model, automatic free parameters are generated for the variances of all independent variables (manifest variables, latent factors, and latent errors), the covariances among all manifest independent variables (but not those involving latent factors or errors), and the means of all manifest independent variables if the mean structures are modeled. The name of each automatic free parameter is generated with the _Add prefix and appended with a unique integer.

In terms of LINEQS model matrices, automatic parameters are applied to all diagonal elements of the $\boldsymbol{\Phi}_{11}$ submatrix, to all off-diagonal elements of the $\boldsymbol{\Phi}_{11}$ submatrix pertaining to the manifest independent variables in rows and in columns, to the diagonal elements of the $\boldsymbol{\Phi}_{22}$ submatrix, and to the elements in the $\boldsymbol{\nu}^{*}$ subvector pertaining to the manifest independent variables in rows.

Default Fixed Zeros

Unspecified locations (elements) of LINEQS model matrices that are neither model-restricted nor automatic free parameters will be fixed at zeros by default.
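The following sketch shows how the default rules would apply to a specification that contains only a LINEQS statement. The data set and all names are hypothetical, and the comments restate the rules described above rather than actual procedure output. The sketch also assumes that the Intercept terms cause mean structures to be modeled.

```sas
/* Hypothetical data set work.mydata with manifest variables Y1, Y2, X1, X2 */
proc tcalis data=work.mydata;
   lineqs
      Y1 = a0 Intercept + a1 X1 + a2 X2 + E1,
      Y2 = b0 Intercept + b1 X1 + b2 Y1 + E2;
   /* No STD, COV, or MEAN statement is given, so under the default rules:
        variances of X1, X2, E1, E2     -> automatic free parameters
        covariance between X1 and X2    -> automatic free parameter
        means of X1 and X2              -> automatic free parameters
        covariances involving E1 or E2  -> fixed zeros by default
        means of E1 and E2              -> zeros (model-restricted)       */
run;
```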

Rationale of the Default Parameters in the LINEQS Model

To explain the automatic free parameters in the LINEQS model, note that manifest independent variables are explanatory variables in your model. Their means, variances, and covariances are not functions of other parameters in the model. Rather, together with other parameters, they explain the means, variances, and covariances of the dependent variables. Therefore, means, variances, and covariances of manifest independent variables should be saturated as free parameters unless you are also testing a theoretical pattern of these parameters. Otherwise, failing to do this might add unnecessary constraints to your model.

The same argument applies to the latent independent variables. However, because latent variable means, variances, and covariances are not observed, the treatment is different. Only the variances of latent independent variables (factors or errors) are automatic free parameters. The means and covariances among latent independent variables are fixed zeros by default.

You can override the default parameters by manual specification in the STD, COV, and MEAN statements. Manual specification of these parameters becomes necessary when you are testing theoretical covariance or mean patterns or when these parameters are constrained with other parameters in your model. For example, when you test a fixed covariance matrix pattern, you have to manually specify the fixed values for the variances of, and covariances among, exogenous manifest variables in the STD and COV statements.
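As a minimal sketch of such an override, the following specification fixes the variances of two exogenous manifest variables and the covariance between them instead of leaving them as automatic free parameters. The data set, variable names, and fixed values are all hypothetical and carry no substantive meaning.

```sas
/* Hypothetical data set work.mydata with manifest variables Y1, X1, X2 */
proc tcalis data=work.mydata;
   lineqs
      Y1 = a1 X1 + a2 X2 + E1;
   std
      X1 = 1.0, X2 = 1.0;      /* fixed variances replace the automatic free parameters */
   cov
      X1 X2 = 0.5;             /* fixed covariance (illustrative value only)            */
run;
```

The variance of E1 is not mentioned in the STD statement, so it remains an automatic free parameter.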

Note: This procedure is experimental.
