The CALIS Procedure

Latent Variable Scores

Analysis of Covariance Structures Only

Although latent variable or factor scores are unobserved, you can estimate them after you use the CALIS procedure to fit your model. If your model contains covariance structures only, then latent variable or factor scores are estimated as linear combinations of observed variables, weighted by the latent variable (factor) score regression coefficients. This section covers the case of covariance structures. The next section covers the case of mean and covariance structures.

You can use the PLATCOV option to display the latent variable (factor) score regression coefficients. You can also save these coefficients in an output data set by using the OUTSTAT= option. You can then provide these coefficients to the SCORE procedure to compute the latent variable (factor) scores in a data set.

To summarize, follow these steps to compute latent variable (factor) scores for each observation:

  • Create an output data set by using the OUTSTAT= option in the PROC CALIS statement.

  • Run the SCORE procedure, using both the raw data and the OUTSTAT= data set.

For example, the following statements fit a one-factor model to the data set raw and store the latent variable (factor) score regression coefficients in the OUTSTAT= data set named ostat:

proc calis data=raw outstat=ostat;
   lineqs
      v1 = a1 f1 + e1,
      v2 = a2 f1 + e2,
      v3 = a3 f1 + e3;
   std
      f1 = 1.,
      e1-e3 = evar1-evar3;
run;

Then, in the PROC SCORE statement, you specify the raw data set in the DATA= option and the ostat data set in the SCORE= option:

proc score data=raw score=ostat out=scores;
   var v1-v3;
run;

The data set that you specify in the OUT= option stores the latent variable (factor) scores.
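
In the covariance-structure case, the arithmetic behind this PROC SCORE step can be sketched outside SAS. As the section Factor Scores in PROC FACTOR and PROC CALIS explains, the ostat data set has no _TYPE_=STD observation, so each score is a weighted sum of deviation scores. Each observed value is first reduced by the mean in the _TYPE_=MEAN observation and then weighted by the coefficient in the _TYPE_=SCORE observation. The Python function factor_scores and the toy data below are hypothetical. They illustrate only that arithmetic and are not part of SAS.

```python
def factor_scores(rows, means, coefs):
    """Return one latent variable score per observation (hypothetical helper).

    rows  -- observations, each a list of observed values (for example v1-v3)
    means -- values from the _TYPE_=MEAN observation
    coefs -- values from the _TYPE_=SCORE observation for one factor
    """
    scores = []
    for row in rows:
        # Deviation form: center each variable but do not rescale it.
        deviations = [x - m for x, m in zip(row, means)]
        scores.append(sum(b * d for b, d in zip(coefs, deviations)))
    return scores


# Hypothetical raw data and scoring coefficients, for illustration only.
raw = [[1.0, 2.0, 3.0],
       [3.0, 2.0, 1.0]]
means = [sum(col) / len(raw) for col in zip(*raw)]  # [2.0, 2.0, 2.0]
coefs = [0.2, 0.3, 0.4]
scores = factor_scores(raw, means, coefs)
print(scores)
```

Because the toy data are centered at their own sample means, the resulting scores average to zero.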

Although you can get the latent variable (factor) score regression coefficients by analyzing either raw data or covariance (correlation) matrices in PROC CALIS, you must provide the raw data to the SCORE procedure in order to compute latent variable (factor) scores. For a more detailed example, see Example 100.1 in Chapter 100: The SCORE Procedure. Although that example uses PROC FACTOR, the scoring procedure that it demonstrates also applies to PROC CALIS. For the conceptual differences between factor scores in the FACTOR and CALIS procedures, see the section Factor Scores in PROC FACTOR and PROC CALIS.

Analysis of Mean and Covariance Structures

When you model the mean and covariance structures simultaneously, computing latent variable (factor) scores is not as straightforward as in the case where you analyze covariance structures only. However, if the mean structures in your model are saturated, you can still use the steps that are described in the preceding section. For example, if you specify the MEANSTR option and do not specify any other mean parameters or constraints on means, then the mean structures in your model are saturated with default parameters. As another example, if you specify METHOD=FIML but do not specify any mean parameters or mean constraints, PROC CALIS adds default mean parameters to your model so that the mean structures are also saturated.

If the mean structures in your model are not saturated, you need to adjust the OUTSTAT= data set that contains the scoring coefficients before you submit it to the SCORE procedure to compute latent variable (factor) scores. For example, the OUTSTAT= data set for a one-factor mean and covariance structure model of x1–x3 might look like the following:

    OBS    _TYPE_      _NAME_          x1          x2          x3       f1

      1    OBSERVED                1.0000      1.0000      1.0000    0.00000
      2    MEAN                    0.8674      0.9787      1.0114     .     
      3    SKEWNESS               -0.1240     -0.1181     -0.8996     .     
      4    KURTOSIS               -0.2892      0.4518      1.1604     .     
      5    N                      50.0000     50.0000     50.0000     .     
      6    SUMWGT                 50.0000     50.0000     50.0000     .     
      7    VARDIV                 49.0000     49.0000     49.0000     .     
      8    COV         x1          1.4845      1.1501      1.3554     .     
      9    COV         x2          1.1501      1.5138      1.4084     .     
     10    COV         x3          1.3554      1.4084      1.9416     .     
     11    MAXRES      _Mean_     -0.0314      0.0384     -0.0118     .     
     12    MAXRES      x1          0.0318     -0.0039      0.0319     .     
     13    MAXRES      x2         -0.0039     -0.0448     -0.0204     .     
     14    MAXRES      x3          0.0319     -0.0204      0.0151     .     
     15    MAXASRES    _Mean_     -0.7066      0.7065     -0.7065     .     
     16    MAXASRES    x1          0.6853     -0.5405      0.6984     .     
     17    MAXASRES    x2         -0.5405     -0.7306     -0.6912     .     
     18    MAXASRES    x3          0.6984     -0.6912      0.7000     .     
     19    MAXPRED     _Mean_      0.8987      0.9403      1.0232    0.50558
     20    MAXPRED     x1          1.4527      1.1540      1.3236    1.03392
     21    MAXPRED     x2          1.1540      1.5586      1.4288    1.11613
     22    MAXPRED     x3          1.3236      1.4288      1.9265    1.28014
     23    MAXPRED     f1          1.0339      1.1161      1.2801    1.00000
     24    SCORE       f1          0.2001      0.2650      0.3305     .     
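
As a consistency check on this table, the coefficients in observation 24 agree, to within the rounding of the printed values, with the regression formula b = Σ⁻¹ σ_xf. Here Σ is the model-predicted covariance matrix of x1–x3 (columns x1–x3 of observations 20–22), and σ_xf holds the predicted covariances between x1–x3 and f1 (column f1 of the same observations). This agreement was checked only against the printed numbers. Treat the formula as an interpretation of this output, not as a documented description of how PROC CALIS computes its coefficients internally. The following Python sketch solves Σb = σ_xf with a small hand-written Gaussian elimination. The helper solve is hypothetical and needs no third-party package.

```python
# Printed values from the OUTSTAT= data set above (MAXPRED rows x1-x3).
# Model-predicted covariance matrix of x1-x3 (columns x1-x3 of rows 20-22).
sigma = [[1.4527, 1.1540, 1.3236],
         [1.1540, 1.5586, 1.4288],
         [1.3236, 1.4288, 1.9265]]
# Predicted covariances of x1-x3 with f1 (column f1 of rows 20-22).
cov_xf = [1.03392, 1.11613, 1.28014]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


coefs = solve(sigma, cov_xf)
print([round(b, 4) for b in coefs])  # compare with observation 24
```

The recovered values agree closely with 0.2001, 0.2650, and 0.3305 in the _TYPE_=SCORE observation.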

Observation 24 in this OUTSTAT= data set contains the scoring coefficients for computing the factor scores of f1. Observation 2 contains the sample means of the observed variables x1–x3. These are the means that the SCORE procedure uses to compute the deviation scores of the observed variables (before the deviation scores are multiplied by the scoring coefficients). However, when your model contains mean structures, you should use the predicted variable means instead of the observed variable means to compute the deviation scores. In the OUTSTAT= data set, the predicted means are stored in the observation that has _TYPE_=MAXPRED and _NAME_=_Mean_ (observation 19). In order to "trick" the SCORE procedure into using the predicted means, you can use the following statements:

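data ostat2;
   set ostat;
   /* keep the sample means, but under a _TYPE_ value that PROC SCORE */
   /* no longer treats as the mean observation                        */
   if _TYPE_='MEAN' then _TYPE_='OBSMEAN';
   /* relabel the predicted means so that PROC SCORE centers on them  */
   if _TYPE_='MAXPRED' & _NAME_='_Mean_' then _TYPE_='MEAN';
run;

proc score data=raw score=ostat2 out=Scores2;
   var x1-x3;
run;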

The DATA step creates the data set ostat2 by copying the original OUTSTAT= data set, ostat. It relabels the sample means in observation 2 as _TYPE_=OBSMEAN, and it changes the value of the _TYPE_ variable in observation 19 so that this observation becomes

     19    MEAN        _Mean_      0.8987      0.9403      1.0232    0.50558

The SCORE procedure then uses the means in this observation to compute the deviation scores and hence the latent variable (factor) scores. It saves the scores in a new OUT= data set, Scores2. If the means of the factors are not important in subsequent analyses of the factor scores (for example, exploratory factor analysis), you can use the latent variable (factor) scores in the Scores2 data set without further processing.
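
Switching from the sample means (observation 2) to the predicted means (observation 19) changes every score by the same amount. The two deviation scores for an observation differ only by the fixed vector of mean differences. The following Python sketch uses the printed values from the table together with a hypothetical observation, and it computes this constant, b'(sample means - predicted means), directly. The helper name score is hypothetical.

```python
# Printed values from the OUTSTAT= data set shown earlier.
coefs = [0.2001, 0.2650, 0.3305]        # observation 24 (_TYPE_=SCORE, f1)
obs_means = [0.8674, 0.9787, 1.0114]    # observation 2 (_TYPE_=MEAN)
pred_means = [0.8987, 0.9403, 1.0232]   # observation 19 (MAXPRED, _Mean_)


def score(row, means):
    """Weighted sum of deviations of one observation from the given means."""
    return sum(b * (x - m) for b, x, m in zip(coefs, row, means))


row = [1.2, 0.7, 1.5]  # a hypothetical observation of x1-x3
shift = score(row, pred_means) - score(row, obs_means)

# The shift equals b'(sample means - predicted means), whatever row is.
constant = sum(b * (o - p) for b, o, p in zip(coefs, obs_means, pred_means))
print(shift, constant)
```

For these printed values the constant is very small, so the two sets of f1 scores nearly coincide. The sketch does not show how large the shift would be for other models.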

However, if the factor means are important, you need to perform one more step, as shown in the following statements:

data Scores3;
   set Scores2;
   /* add the predicted mean of f1 (observation 19 of ostat) */
   f1 = f1 + 0.50558;
run;

The data set Scores3 is essentially a copy of the Scores2 data set, except that it adds the constant 0.50558 to the factor variable f1. This constant is the predicted mean of f1, as shown in observation 19 of the original OUTSTAT= data set. Before this constant is added, the expected value of f1 is 0. In the new score data set, Scores3, the expected value becomes 0.50558, which is consistent with the original model results. If your model has more latent variables (factors), add the predicted mean of each latent variable (factor) when you create the target data set for the latent variable (factor) scores.
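
When both adjustments are applied, each latent variable score in Scores3 is the predicted factor mean plus the weighted sum of deviations from the predicted means of the observed variables. The following Python sketch stores one dictionary entry per factor, so a model with additional factors needs only additional entries. The function name and the second observation are hypothetical. The coefficients and means are the printed values from the OUTSTAT= data set shown earlier.

```python
def scores_with_means(rows, coefs_by_factor, pred_means, factor_means):
    """For each factor: predicted factor mean + b'(x - predicted means)."""
    out = []
    for row in rows:
        dev = [x - m for x, m in zip(row, pred_means)]
        out.append({
            f: factor_means[f] + sum(b * d for b, d in zip(coefs, dev))
            for f, coefs in coefs_by_factor.items()
        })
    return out


coefs_by_factor = {"f1": [0.2001, 0.2650, 0.3305]}  # observation 24
pred_means = [0.8987, 0.9403, 1.0232]               # observation 19, x1-x3
factor_means = {"f1": 0.50558}                      # observation 19, f1

rows = [[0.8987, 0.9403, 1.0232],  # an observation at the predicted means
        [1.2, 0.7, 1.5]]           # a hypothetical observation
result = scores_with_means(rows, coefs_by_factor, pred_means, factor_means)
print(result)
```

An observation that lies exactly at the predicted means receives the score 0.50558, which is the predicted mean of f1.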

Factor Scores in PROC FACTOR and PROC CALIS

Conceptually, the scoring coefficients that you obtain from PROC FACTOR and PROC CALIS are applied differently. The scoring coefficients that PROC FACTOR produces are applied to observed variables in the standardized form (with mean 0 and standard deviation 1), whereas the scoring coefficients that PROC CALIS produces are applied to observed variables in the deviation form (with mean 0). By default, PROC SCORE standardizes the observed variables, so it is compatible with the scoring coefficients that PROC FACTOR produces. However, the OUTSTAT= data set that PROC CALIS creates does not contain a _TYPE_=STD observation (unless you specify the CORR option), so PROC SCORE has no standard deviations with which to scale the observed variables. Hence PROC SCORE "automatically" uses the correct deviation form of the observed variables when it computes latent variable (factor) scores.

When you specify the CORR option in the PROC CALIS statement, the estimation results are based on the analysis of correlation structures. The factor scoring coefficients are now applied to observed variables in the standardized form rather than the default deviation form. If you also specify the OUTSTAT= option, the OUTSTAT= data set for the correlation structure analysis contains the _TYPE_=STD observation. Consequently, when you use the SCORE procedure to compute latent variable (factor) scores, the standard deviations of the variables are available through the _TYPE_=STD observation in the OUTSTAT= data set. Therefore, even if you use PROC CALIS to perform correlational analysis (instead of the default covariance analysis), PROC SCORE is still able to use the correct standardized form for the observed variables "automatically" in computing latent variable (factor) scores.
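
The difference between the two forms depends on whether the centered variables are divided by their standard deviations. That division happens only when the SCORE= data set contains a _TYPE_=STD observation. The following Python sketch mimics only that choice. The function name and all numbers are hypothetical.

```python
def score_observation(row, coefs, means, stds=None):
    """Score one observation in deviation or standardized form.

    stds=None mimics a SCORE= data set without a _TYPE_=STD observation
    (the default PROC CALIS OUTSTAT= data set): variables are only centered.
    Supplying stds mimics a data set that has one (for example, with the
    CORR option): centered variables are also divided by their standard
    deviations.
    """
    if stds is None:
        z = [x - m for x, m in zip(row, means)]
    else:
        z = [(x - m) / s for x, m, s in zip(row, means, stds)]
    return sum(b * v for b, v in zip(coefs, z))


# Hypothetical values for two observed variables.
row, coefs = [3.0, 5.0], [0.5, 0.25]
means, stds = [1.0, 1.0], [2.0, 4.0]
deviation = score_observation(row, coefs, means)           # deviation form
standardized = score_observation(row, coefs, means, stds)  # standardized form
print(deviation, standardized)
```

With these hypothetical values, the deviation form gives 2.0 and the standardized form gives 0.75. Applying a set of coefficients to the wrong form of the variables would therefore change the scores.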