VAR Statement
The VAR statement defines and limits the set of observed variables that are available for the corresponding model analysis. It is one of the subsidiary group specification statements. You can use the VAR statement no more than once within the scope of each GROUP or the PROC CALIS statement. The set of variables in the VAR statement must be present in the data set specified in the associated GROUP or the PROC CALIS statement.
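As a minimal sketch of this scoping rule, the following statements place one VAR statement within the scope of each GROUP statement. The data set names women and men and the single-path models are hypothetical and serve only as illustration:

```sas
proc calis;
   group 1 / data=women;   /* hypothetical data set, assumed to contain v1-v3 */
      var v1-v3;           /* observed variables available to group 1 */
   group 2 / data=men;     /* hypothetical data set, assumed to contain v1-v3 */
      var v1-v3;           /* observed variables available to group 2 */
   model 1 / group=1;
      path v1 <- v2;
   model 2 / group=2;
      path v1 <- v2;
run;
```

Each VAR statement here applies only to the GROUP statement that precedes it. A second VAR statement within the same GROUP scope would violate the once-per-scope rule.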
The VAR statement should not be confused with the PARAMETERS statement. In the PARAMETERS statement, you specify additional parameters in the model. Parameters are population quantities that characterize the functional relationships, variations, or covariation among variables. Unfortunately, parameters are sometimes referred to as "variables" in the optimization context. You must make sure that all variables specified in the VAR statement refer to variables in the input data set, while the parameters specified in the PARAMETERS statement are population quantities that characterize the distributions of the variables and their relationships.
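The following sketch contrasts the two statements in one specification. The data set work.scores, the parameter name alpha, and the functional form of b1 are assumptions made for illustration only:

```sas
proc calis data=work.scores;   /* hypothetical input data set */
   var v1-v3;                  /* observed variables: must exist in the data set */
   lineqs v1 = b1 * v2 + e1;   /* b1 is a model parameter, not a data set variable */
   parameters alpha;           /* additional parameter: a population quantity */
   b1 = 2 * alpha;             /* b1 defined as a function of alpha */
run;
```

In this sketch, v1–v3 name columns of the input data set, whereas alpha and b1 name quantities to be estimated and do not correspond to any column.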
In some modeling languages of PROC CALIS, you can also specify the observed variables either directly (for example, through the VAR= or similar option in some main model specification statements) or indirectly (for example, through the specification of functional relationships between observed variables). How do the VAR statement specifications interact with the observed variables specified in the model? This depends on the type of model specified. Four different cases are considered in the following.
Case 1. Exploratory Factor Models With No VAR= option in the FACTOR statement. For exploratory factor models specified using the FACTOR statement, it is important for you to use the VAR statement to select and limit the set of observed variables for analysis. The reason is simply that there are no other options in the FACTOR statement that serve the same purpose. For example, you analyze only v1–v3 in the following exploratory factor model even though there might be more observed variables available in the data set:
proc calis; var v1-v3; factor n=1;
If you do not specify the VAR statement, PROC CALIS simply selects all numerical variables for analysis. However, to avoid confusion, it is good practice to specify the observed variables explicitly in the VAR statement.
Case 2. Models With a VAR= or Similar Option for Defining the Set of Observed Variables for Analysis. The classes of models considered here are: COSAN, LISMOD, MSTRUCT, and RAM. For the COSAN, MSTRUCT, and RAM models, you can specify the observed variables in the model by using the VAR= option in the respective main model specification statement. For the LISMOD models, you can specify all observed variables that should be included in the model in the XVAR= and YVAR= options of the LISMOD statement. Therefore, the use of the VAR statement for these models might become unnecessary. For example, the following MSTRUCT statement specifies the observed variables v1–v6 in the VAR= option:
proc calis; mstruct var=v1-v6;
It would have been redundant to use a VAR statement to specify v1–v6 additionally. The same conclusion applies to the COSAN and the RAM models.
Another example is when you specify a LISMOD model. In the following LISMOD specification, variables v1–v8 would be the set of observed variables for analysis:
proc calis; lismod xvar = v1-v4, yvar = v5-v8, eta = factor1, xi = factor2;
Again, there is no need to add a VAR statement merely repeating the specification of variables v1–v8.
If you do specify the VAR statement in addition to the specification of variable lists in these models, PROC CALIS will check the consistency between the lists. Conflicts arise if the two lists do not match.
For example, the following statements will generate an error in model specification because v6 specified in the MSTRUCT model is not defined as an observed variable available for analysis in the VAR statement (even if v6 might indeed be present in the data set):
proc calis; var v1-v5; mstruct var=v1-v6;
So it is an error when you specify fewer observed variables in the VAR statement than in the VAR= option in the model. What if you specify more variables in the VAR statement? PROC CALIS will also generate an error because the extra variables in the VAR statement will not be well-defined in the model. For example, v7–v10 specified in the VAR statement are supposed to be included in the model, but they are not listed on either the XVAR= or YVAR= list in the following LISMOD statement:
proc calis; var v1-v10; lismod xvar = v1-v3, yvar = v4-v6, eta = factor1, xi = factor2;
Therefore, if you must specify the VAR statement for these models, the specifications of the observed variables must be consistent in the VAR statement and in the relevant model options. However, to avoid potential conflicts in these situations, it is recommended that you specify the observed variables in the VAR=, XVAR=, or YVAR= lists only.
When the VAR= option is not specified in the COSAN, MSTRUCT, or RAM statement, the VAR statement specification is used as the list of observed variables in the model. If both the VAR= option and the VAR statement are absent, then all numerical variables in the associated data set are used in the model. However, to avoid confusion, the preferred method is to specify the list of observed variables explicitly in the VAR=, XVAR=, or YVAR= option of the main model specification statement.
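As a minimal sketch of the fallback described here, the following MSTRUCT specification omits the VAR= option, so the VAR statement supplies the list of observed variables. The data set work.scores is hypothetical, and the sketch assumes that no further matrix specifications are needed for illustration:

```sas
proc calis data=work.scores;   /* hypothetical data set, assumed to contain v1-v6 */
   var v1-v6;                  /* used as the model's observed variables */
   mstruct;                    /* no VAR= option specified here */
run;
```

If the VAR statement were also dropped, all numerical variables in work.scores would enter the model, which is why an explicit list is generally safer.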
Case 3. Models With Certain Indirect Ways to Include the Set of Observed Variables for Analysis. Two types of models are considered here: LINEQS and PATH. For these models, the main use of the VAR statement is to include those observed variables that are not mentioned in model specifications.
For example, in the following statements for a LINEQS model, variable v3 is not mentioned in the LINEQS statement:
proc calis; var v1-v3; lineqs v1 = a1 * v2 + e1;
With the specification in the VAR statement, however, variable v3 is included in the model as an exogenous manifest variable. The same applies to the following PATH model specification:
proc calis; var v1-v3; path v1 <- v2;
Again, variable v3 is included in the PATH model because it is specified in the VAR statement.
The two preceding examples also suggest that you do not need to use the VAR statement when your model specification already mentions all observed variables. For example, if your target set of observed variables is v1–v3, the use of the VAR statement in the following specification is unnecessary:
proc calis; var v1-v3; path v1 <- v2; pvar v3;
For the two types of models considered here, you can also use the VAR statement to define and limit the set of observed variables for analysis. For example, you might have v1, v2, v3 in your data set as observed variables for analysis; but somehow in your model v2 should be treated as a latent variable. You might use the following code to exclude v2 as an observed variable in the model:
proc calis; var v1 v3; path v1 <- v2; pvar v3;
The role of the VAR statement here is to define and limit the set of observed variables available for the model. Hence, only variables v1 and v3 are supposed to be observed variables in the model while variable v2 in the PATH model is treated as latent.
In sum, in the current situation the use of the VAR statement should depend on whether a variable should or should not be included as an observed variable in your theoretical model.
Case 4. Confirmatory Factor Model With the FACTOR statement. In this case, the VAR statement still limits the set of observed variables being analyzed in the confirmatory factor model. However, because all observed variables in a confirmatory factor analysis must be loaded on (or related to) some factors through the specification of factor-variable relations in the FACTOR statement, all observed variables in the model should have been specified (or mentioned) in the FACTOR statement already, making it redundant to use the VAR statement for the same purpose.
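The following sketch illustrates this case under stated assumptions: the data set work.scores and the factor names f1 and f2 are hypothetical, and the arrow notation follows the style of the PATH examples in this section and might differ across releases:

```sas
proc calis data=work.scores;   /* hypothetical data set */
   factor
      f1 -> v1-v3,   /* v1-v3 load on the hypothetical factor f1 */
      f2 -> v4-v6;   /* v4-v6 load on the hypothetical factor f2 */
run;
```

Because every observed variable v1–v6 already appears in a factor-variable relation, adding a VAR statement that lists v1–v6 would only repeat the same information.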