The CALIS Procedure

LINEQS Statement

LINEQS <equation <, equation ...>> ;

where equation represents:

dependent = term < + term ...>

and each term represents one of the following:

coefficient-name < (number) > < * > variable-name
prefix-name < (number) > < * > variable-name
< number > < * > variable-name

The LINEQS statement is a main model specification statement that invokes the LINEQS modeling language. You can specify at most one LINEQS statement in a model, within the scope of either the PROC CALIS statement or a MODEL statement. To completely specify a LINEQS model, you might need to add some subsidiary model specification statements such as the VARIANCE, COV, and MEAN statements. The syntax for the LINEQS modeling language is as follows:

LINEQS <equation <, equation ...>> ;

VARIANCE partial-variance-parameters ;

COV covariance-parameters ;

MEAN mean-parameters ;

In the LINEQS statement, you use equations to specify the linear functional relations among manifest and latent variables. Equations in the LINEQS statement are separated by commas.

In the VARIANCE statement, you specify the variance parameters. In the COV statement, you specify the covariance parameters. In the MEAN statement, you specify the mean parameters. For details of these subsidiary model specification statements, see the syntax of these statements.
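
As a rough sketch of how the four statements fit together, the following specification describes a one-factor model with four indicators. The data set name mydata and the parameter names load2–load4, phi1, ve1–ve4, and cve34 are hypothetical and are chosen only for illustration:

proc calis data=mydata;             /* mydata is a hypothetical data set      */
   lineqs
      V1 = 1.    * F1 + E1,         /* fixed loading sets the scale of F1     */
      V2 = load2 * F1 + E2,
      V3 = load3 * F1 + E3,
      V4 = load4 * F1 + E4;
   variance
      F1    = phi1,                 /* variance of the latent factor          */
      E1-E4 = ve1-ve4;              /* error variances                        */
   cov
      E3 E4 = cve34;                /* one correlated pair of errors          */
run;

No MEAN statement is needed in this sketch because only the covariance structure is modeled.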

In the LINEQS statement, in addition to the functional relations among variables, you specify the coefficient parameters of interest in the equations. There are five types of parameters you can specify in equations, as shown in the following example:

lineqs
   V1 =         * F1 + E1,
   V2 = (.5)    * F1 + E2,
   V3 = 1.      * F1 + E3,
   V4 = b4      * F1 + E4,
   V5 = b5 (.4) * F1 + E5;

In this example, you have manifest variables V1–V5, which are related to a latent factor, denoted by F1, as specified in the equations. In each equation, you have one outcome variable (V-variable), one predictor variable (F1, which is assumed to be a latent factor, the so-called F-variable), and one error variable (E-variable). The following five types of parameters have been specified:

an unnamed free parameter

The effect of F1 on V1 in the first equation is an unnamed free parameter. Although you specify nothing before the asterisk sign, the effect parameter is effectively specified. For an unnamed free parameter, PROC CALIS generates a parameter name by appending a unique integer to the _Parm prefix (for example, _Parm1, _Parm2, and so on).
an initial value

The effect of F1 on V2 in the second equation is an unnamed free parameter with an initial estimate of 0.5. PROC CALIS also generates a parameter name for this specification. Notice that you must use a pair of parentheses for the initial value specification because it is interpreted as a fixed value otherwise, as described in the next case.
a fixed value

The effect of F1 on V3 in the third equation is fixed at 1.0. A fixed value remains the same in the estimation. There is no parameter name for a fixed constant in the model.
a free parameter with a name

The effect of F1 on V4 in the fourth equation is a free parameter named b4. You do not provide an initial estimate for this free parameter.
a free parameter with a name and an initial estimate

The effect of F1 on V5 in the fifth equation is a free parameter named b5 with an initial estimate of 0.4. Parameters with no starting values are initialized by various heuristic and effective methods in PROC CALIS. See the section Initial Estimates for details.

Notice that there must be an error term in each equation. The error terms in the equations must start with the prefix 'E', 'e', 'D', or 'd'. See the section Representing Latent Variables in the LINEQS Model for details about naming the factors and error terms. The effect or the path coefficient attached to an error term must be 1.0. This is implicitly specified in the preceding example. For example, there is neither a parameter specification nor an asterisk sign before the error term E1 in the first equation, as shown in the following:

   V1 =         * F1 + E1,

This specification is the same as the following explicit specification with a fixed constant 1.0 for the effect of the error term E1:

   V1 =         * F1 + 1. * E1,

The equivalence shown here implies that you can also specify the third equation in the following equivalent way:

   V3 =  F1 + E3,

This implicitly specifies a constant 1.0 for the effect of F1 on V3. You must be very careful about the distinction between this specification and the following one with an asterisk before F1:

   V3 =  * F1 + E3,

With an asterisk sign, the effect of F1 on V3 becomes an unnamed free parameter in the current specification. This interpretation is very different from the preceding one without an asterisk sign before F1, which assumes a fixed constant of 1.0.

Except for the unnamed free parameter specification, you can omit the asterisk signs in all other types of parameter specifications. That is, you can use the following equivalent statement for the preceding LINEQS specification:

lineqs
   V1 =         * F1 + E1,
   V2 = (.5)      F1 + E2,
   V3 = 1.        F1 + E3,
   V4 = b4        F1 + E4,
   V5 = b5 (.4)   F1 + E5;

Again, you cannot omit the asterisk in the first equation because it is intended to denote an unnamed free parameter.

If your model contains many unconstrained parameters and it is too cumbersome to find different parameter names, you can specify all those parameters by the same prefix-name. A prefix-name is a short name called the "root" followed by two underscores __. Whenever a prefix-name is encountered, the CALIS procedure generates a parameter name by appending a unique integer to the root. Hence, the prefix-name should have few characters so that the generated parameter name is not longer than thirty-two characters. To avoid unintentional equality constraints, the prefix-names should not coincide with explicitly defined parameter names. The following statement illustrates the use of prefix-names:

lineqs
   V1 = 1.   * F1 + E1,
   V2 = b__  * F1 + E2,
   V3 = b__  * F1 + E3,
   V4 = b__  * F1 + E4,
   V5 = b__  * F1 + E5;

In the five equations, only the first equation has a fixed constant 1.0 for the effect of F1 on V1. For all the remaining equations, the effects of F1 on the variables are all free parameters with the prefix b. The generated parameter names for these effects have unique integers appended to this prefix. For example, b1, b2, b3, and b4 are the parameter names for these effects.

Representing Latent Variables in the LINEQS Model

Because latent variables are widely used in structural equation modeling, PROC CALIS needs a way to identify different types of latent variables that are specified in the LINEQS model. This is accomplished by following some naming conventions for the latent variables. See the section Naming Variables in the LINEQS Model for details about these naming rules. Essentially, latent factors (systematic sources) must start with the letter 'F' or 'f'. Error terms must start with the letter 'E', 'e', 'D', or 'd'. Prefix 'E' or 'e' represents the error term of an endogenous manifest variable. Prefix 'D' or 'd' represents the disturbance (or error) term of an endogenous latent variable. Although D- and E- variables are conceptually different, for modeling purposes 'D' and 'E' prefixes are interchangeable in the LINEQS modeling language. Essentially, only the distinction between latent factors (systematic sources) and errors or disturbances (unsystematic sources) is critical in specifying a proper LINEQS model. Manifest variables in the LINEQS model do not need to follow additional naming rules beyond those required by the general SAS System—they are recognized by PROC CALIS by referring to the variables in the input data sets.
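
The naming conventions can be sketched with a small two-factor specification. The variables and the model itself are hypothetical and serve only to show which prefix goes where: F1 and F2 are latent factors, E1–E4 are error terms of the endogenous manifest variables V1–V4, and D2 is the disturbance of the endogenous latent factor F2:

lineqs
   V1 = 1. * F1 + E1,     /* E-variable: error of a manifest outcome      */
   V2 =    * F1 + E2,
   V3 = 1. * F2 + E3,
   V4 =    * F2 + E4,
   F2 =    * F1 + D2;     /* D-variable: disturbance of a latent outcome  */

Because the 'D' and 'E' prefixes are interchangeable for modeling purposes, naming the last term E5 instead of D2 would be expected to define the same model.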

Types of Variables and Semantic Rules of Equations

Depending on their roles in the system of equations, variables in a LINEQS model can be classified into endogenous or exogenous. An endogenous variable is a variable that serves as an outcome variable (left-hand side of an equation) in one of the equations. All other variables are exogenous variables, including those manifest variables that do not appear in any equations but are included in the model because they are specified in the VAR statement for the analysis.

Merely following the syntactic rules described so far is not sufficient to define a proper system of equations that PROC CALIS can analyze. You also need to observe the following semantic rules:

Only manifest or latent variables can be endogenous. This means that you cannot specify any error or disturbance variables on the left-hand side of the equations. This also means that error and disturbance variables are always exogenous in the LINEQS model.
An endogenous variable that appears on the left-hand side of an equation cannot appear on the left-hand side of another equation. In other words, you have to specify all the predictors for an endogenous variable in a single equation.
An endogenous variable that appears on the left-hand side of an equation cannot appear on the right-hand side of the same equation. This prevents a variable from having a direct effect on itself (although an indirect effect on itself is possible).
Each equation must contain one and only one unique error term, be it an E-variable for a manifest outcome variable or a D-variable for a latent outcome variable. If, indeed, you want to specify an equation without an error term, you can equivalently set the variance of the error term to a fixed zero in the VARIANCE statement.
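
The last rule can be sketched as follows. In this hypothetical fragment, all predictors of V1 appear in a single equation, the equation carries its own error term E1, and the error variance is then fixed to zero so that the equation effectively has no error. The parameter names b1 and b2 are assumptions made for illustration:

lineqs
   V1 = b1 * V2 + b2 * V3 + E1;   /* one equation, one unique error term  */
variance
   E1 = 0.;                       /* fixed zero: V1 has no error variance */

Whether a zero error variance is sensible depends on the model; the fragment shows only the mechanics.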

Mean Structures in Equations

To fit a LINEQS model with mean structures, you can specify the MEANSTR option in the PROC CALIS statement or the associated MODEL statement. This generates the default mean and intercept parameters for the model (see the section Default Parameters). Alternatively, you can specify the intercept parameters with the Intercept variable in the equations or the mean parameters in the MEAN statement. The Intercept variable in the LINEQS model is a special "variable" that contains the value 1 for each observation. You do not need to have this variable in your input data set, nor do you need to generate it in the DATA step. It serves as a notational convenience in the LINEQS modeling language. The actual intercept parameter is expressed as a coefficient parameter attached to the Intercept variable. For example, consider the following LINEQS model specification:

lineqs
   V1 = a1 (10) * Intercept +  1.0     * F1 + E1,
   V2 =         * Intercept +          * F1 + E2,
   V3 =                     +  b2      * F1 + E3,
   V4 = a2      * Intercept +  b2      * F1 + E4,
   V5 = a2      * Intercept +  b4 (.4) * F1 + E5;

In the first equation, a1, with a starting value of 10, is the intercept parameter of V1. In the second equation, the intercept parameter of V2 is an unnamed free parameter. In the third equation, although you do not specify the Intercept variable, the intercept parameter of manifest variable V3 is assumed to be a free parameter by default. See the section Default Parameters for more details about default parameters. In the fourth and the fifth equations, the intercept parameters are both named a2. This means that these intercepts are constrained to be the same in the estimation.
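
If you do not need to name or constrain the intercepts, the MEANSTR option mentioned at the beginning of this section is the shorter route. The following is a minimal sketch, assuming a hypothetical data set named mydata; no Intercept terms are written, and the default mean and intercept parameters are generated by the option:

proc calis data=mydata meanstr;   /* mydata is a hypothetical data set     */
   lineqs
      V1 = 1. * F1 + E1,          /* no Intercept term is written; the    */
      V2 =    * F1 + E2,          /* default intercepts are generated by  */
      V3 =    * F1 + E3;          /* the MEANSTR option                   */
run;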

In some cases, you might need to set the intercepts to fixed constants such as zeros. You can use the following syntax:

lineqs
   V1 = 0 * Intercept +  F_intercept + a2 * F_slope + E1;

This sets the intercept parameter of V1 to a fixed zero. An example of this application is the analysis of a latent growth curve model, in which you define the intercept as a random variable represented by a latent factor (for example, F_intercept in the specification). See Example 26.24 for a detailed example.

To complete the specification of the mean structures in the LINEQS model, you might want to use the MEAN statement to specify the mean parameters. For example, the following statements specify the means of F_intercept and F_slope as unnamed free parameters in the LINEQS model:

lineqs
   V1 = 0 * Intercept +  F_intercept + 1 * F_slope + E1;
mean
   F_intercept F_slope;

See the MEAN statement for details.
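
Putting the fixed intercepts and the MEAN statement together, a linear growth model for three measurement occasions might be sketched as follows. The data set name growth, the variables V1–V3, and the fixed time scores 0, 1, and 2 are assumptions made for illustration and are not taken from Example 26.24:

proc calis data=growth;           /* growth is a hypothetical data set     */
   lineqs
      V1 = 0 * Intercept + F_intercept               + E1,  /* time score 0 */
      V2 = 0 * Intercept + F_intercept + 1 * F_slope + E2,  /* time score 1 */
      V3 = 0 * Intercept + F_intercept + 2 * F_slope + E3;  /* time score 2 */
   mean
      F_intercept F_slope;        /* factor means as unnamed free parameters */
run;

Because the intercepts of V1–V3 are fixed at zero, the means of the observed variables are carried entirely by the means of F_intercept and F_slope.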

Default Parameters

It is important to understand the default parameters in the LINEQS model. First, if you know which parameters are default free parameters, you can make your specification more efficient by omitting the specifications of those parameters that can be set by default. For example, because all variances and covariances among exogenous variables (excluding error terms) are free parameters by default, you do not need to specify them with the COV and VARIANCE statements if these variances and covariances are not constrained. Second, if you know which parameters are default fixed zero parameters, you can specify your model accurately. For example, because all error covariances in the LINEQS model are fixed zeros by default, you must use the COV statement to specify the covariances among the errors if you want to fit a model with correlated errors. See the section Default Parameters in the LINEQS Model for details about the default parameters of the LINEQS model.
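
For example, the following hypothetical fragment relies on the defaults described above. The variance of the exogenous factor F1 is not specified because it is a free parameter by default, whereas the covariance between E1 and E2 must be specified explicitly because error covariances are fixed zeros by default. The parameter name cve12 is an assumption made for illustration:

lineqs
   V1 = 1. * F1 + E1,
   V2 =    * F1 + E2,
   V3 =    * F1 + E3,
   V4 =    * F1 + E4;
cov
   E1 E2 = cve12;         /* frees a covariance that is zero by default */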

Modifying a LINEQS Model from a Reference Model

This section assumes that you use a REFMODEL statement within the scope of a MODEL statement and that the reference model (or base model) is a LINEQS model. The reference model is called the old model, and the model being defined is called the new model. If the new model is not intended to be an exact copy of the old model, you can use the extended LINEQS modeling language described in this section to make modifications within the scope of the MODEL statement for the new model.

The syntax of the extended LINEQS modeling language is the same as that of the ordinary LINEQS modeling language (see the section LINEQS Statement):

LINEQS <equation <, equation ...>> ;

VARIANCE partial-variance-parameters ;

COV covariance-parameters ;

MEAN mean-parameters ;

The new model is formed by integrating with the old model in the following ways:

Duplication: If you do not specify in the new model an equation with an outcome variable (that is, a variable on the left side of the equal sign) that exists in the old model, the equation with that outcome variable in the old model is duplicated in the new model. For specifications other than the LINEQS statement, if you do not specify in the new model a parameter location that exists in the old model, the old parameter specification is duplicated in the new model.
Addition: If you specify in the new model an equation with an outcome variable that does not exist as an outcome variable in the equations of the old model, the equation is added in the new model. For specifications other than the LINEQS statement, if you specify in the new model a parameter location that does not exist in the old model, the new parameter specification is added in the new model.
Deletion: If you specify in the new model an equation with an outcome variable that also exists as an outcome variable in an equation of the old model and you specify the missing value '.' as the only term on the right-hand side of the equation in the new model, the equation with the same outcome variable in the old model is not copied into the new model. For specifications other than the LINEQS statement, if you specify in the new model a parameter location that also exists in the old model and the new parameter is denoted by the missing value '.', the old parameter specification is not copied into the new model.
Replacement: If you specify in the new model an equation with an outcome variable that also exists as an outcome variable in an equation of the old model and the right-hand side of the equation in the new model is not denoted by the missing value '.', the new equation replaces the old equation with the same outcome variable in the new model. For specifications other than the LINEQS statement, if you specify in the new model a parameter location that also exists in the old model and the new parameter is not denoted by the missing value '.', the new parameter specification replaces the old one in the new model.

For example, the following two-group analysis specifies Model 2 by referring to Model 1 in the REFMODEL statement:

proc calis;
   group 1 / data=d1;
   group 2 / data=d2;
   model 1 / group=1;
      lineqs
         V1   =      1 * F1   + E1,
         V2   =  load1 * F1   + E2,
         V3   =  load2 * F1   + E3,
         F1   =     b1 * V4   + b2 * V5 + b3 * V6 + D1;
      variance 
         E1-E3  = ve1-ve3,
         D1     = vd1,
         V4-V6  = phi4-phi6;
      cov
         E1 E2 = cve12;
   model 2 / group=2;
      refmodel 1;
      lineqs
         V3   = load1 * F1 + E3;
      cov
         E1 E2 = ., 
         E2 E3 = cve23;
   run;

Model 2 is the new model, which refers to the old model, Model 1. This example illustrates the four types of model integration:

Duplication: All equations, except the one with outcome variable V3, in the old model are duplicated in the new model. All specifications in the VARIANCE and COV statements, except the covariance between E1 and E2, in the old model are also duplicated in the new model.
Addition: The parameter cve23 for the covariance between E2 and E3 is added in the new model.
Deletion: The specification of covariance between E1 and E2 in the old model is not copied into the new model, as indicated by the missing value '.' specified in the new model.
Replacement: The equation with V3 as the outcome variable in the old model is replaced with a new equation in the new model. The new equation uses parameter load1 so that it is now constrained to be the same as the regression coefficient in the equation with V2 as the outcome variable.
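
The preceding example deletes a covariance but not an equation. Based on the Deletion rule stated earlier, deleting an equation might look like the following fragment, which is a sketch only and is not part of the original example. It assumes the same Model 1 and group definitions as before and would take the place of the Model 2 specification:

   model 2 / group=2;
      refmodel 1;
      lineqs
         F1 = .;          /* the old equation for F1 is not copied        */
      variance
         D1 = .;          /* the old variance of D1 is not copied either  */

With its equation removed, F1 is no longer an outcome variable in the new model. The variance specification for D1 is deleted as well because D1 appears only in the deleted equation.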