Although latent variable or factor scores are unobserved, you can estimate them after you use the CALIS procedure to fit your model. If your model contains covariance structures only, then latent variable or factor scores are estimated as linear combinations of observed variables, weighted by the latent variable (factor) score regression coefficients. This section covers the case of covariance structures. The next section covers the case of mean and covariance structures.
You can use the PLATCOV option to display the latent variable (factor) score regression coefficients. You can also save these coefficients in an output data set by using the OUTSTAT= option. You can then provide these coefficients to the SCORE procedure to compute the latent variable (factor) scores in a data set.
To summarize, follow these steps to compute latent variable (factor) scores for each observation:
1. Create an output data set by using the OUTSTAT= option in the PROC CALIS statement.
2. Run the SCORE procedure, using both the raw data and the OUTSTAT= data set.
For example, you can use the following statements to estimate the latent variable (factor) score regression coefficients, which are stored in the OUTSTAT= data set named ostat:
   proc calis data=raw outstat=ostat;
      lineqs
         v1 = a1 f1 + e1,
         v2 = a2 f1 + e2,
         v3 = a3 f1 + e3;
      std
         f1 = 1.,
         e1-e3 = evar1-evar3;
   run;
Then, in the PROC SCORE statement, you specify the raw data set in the DATA= option and the ostat data set in the SCORE= option:
   proc score data=raw score=ostat out=scores;
      var v1-v3;
   run;
The data set that you specify in the OUT= option stores the latent variable (factor) scores.
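If you want to inspect the scoring coefficients before you compute any scores, the following statements are a minimal sketch of one way to do so. They assume the same raw data set and one-factor model as in the preceding statements. The PLATCOV option displays the coefficients in the PROC CALIS output, and the PROC PRINT step lists only the _TYPE_=SCORE observations of the OUTSTAT= data set.

```sas
/* Sketch: display and list the factor score regression coefficients. */
/* Assumes the data set raw and the one-factor model shown above.     */
proc calis data=raw platcov outstat=ostat;
   lineqs
      v1 = a1 f1 + e1,
      v2 = a2 f1 + e2,
      v3 = a3 f1 + e3;
   std
      f1 = 1.,
      e1-e3 = evar1-evar3;
run;

/* List only the rows that hold the scoring coefficients */
proc print data=ostat;
   where _TYPE_ = 'SCORE';
run;
```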
Although you can get the latent variable (factor) score regression coefficients by analyzing either raw data or covariance (correlation) matrices in PROC CALIS, you must provide the raw data to the SCORE procedure in order to compute latent variable (factor) scores. For a more detailed example, see Example 100.1 in Chapter 100: The SCORE Procedure. Although that example uses PROC FACTOR, the scoring procedure that is demonstrated in the example is also applicable to PROC CALIS. For the conceptual differences of factor scores between the FACTOR and CALIS procedures, see the section Factor Scores in PROC FACTOR and PROC CALIS.
When you model the mean and covariance structures simultaneously, the computation of latent variable (factor) scores is not as straightforward as in the case where you analyze covariance structures only. However, if the mean structures in your model are saturated, you can still use the steps that are described in the preceding section. For example, if you specify the MEANSTR option and do not specify any other mean parameters or constraints on means, then the mean structures in your model are saturated with default parameters. Another example is when you specify METHOD=FIML but do not specify any mean parameters or mean constraints. In this case, PROC CALIS adds default mean parameters to your model so that the mean structures are also saturated.
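As an illustration of the saturated case, the following sketch adds the MEANSTR option to the one-factor model from the preceding section without specifying any mean parameters or mean constraints. The data set names ostat_m and scores_m are hypothetical, and the sketch assumes that the raw data set contains the variables v1-v3.

```sas
/* Sketch: saturated mean structures through MEANSTR alone.           */
/* No mean parameters or mean constraints are specified, so the       */
/* two-step scoring procedure is used unchanged.                      */
/* ostat_m and scores_m are hypothetical data set names.              */
proc calis data=raw meanstr outstat=ostat_m;
   lineqs
      v1 = a1 f1 + e1,
      v2 = a2 f1 + e2,
      v3 = a3 f1 + e3;
   std
      f1 = 1.,
      e1-e3 = evar1-evar3;
run;

proc score data=raw score=ostat_m out=scores_m;
   var v1-v3;
run;
```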
If the mean structures in your model are not saturated, you need to adjust the OUTSTAT= data set that contains the scoring coefficients before submitting it to the SCORE procedure to compute latent variable (factor) scores. For example, your OUTSTAT= data set for a mean and covariance structure model might look like the following:
   OBS  _TYPE_    _NAME_        x1        x2        x3       f1
     1  OBSERVED            1.0000    1.0000    1.0000  0.00000
     2  MEAN                0.8674    0.9787    1.0114        .
     3  SKEWNESS           -0.1240   -0.1181   -0.8996        .
     4  KURTOSIS           -0.2892    0.4518    1.1604        .
     5  N                  50.0000   50.0000   50.0000        .
     6  SUMWGT             50.0000   50.0000   50.0000        .
     7  VARDIV             49.0000   49.0000   49.0000        .
     8  COV       x1        1.4845    1.1501    1.3554        .
     9  COV       x2        1.1501    1.5138    1.4084        .
    10  COV       x3        1.3554    1.4084    1.9416        .
    11  MAXRES    _Mean_   -0.0314    0.0384   -0.0118        .
    12  MAXRES    x1        0.0318   -0.0039    0.0319        .
    13  MAXRES    x2       -0.0039   -0.0448   -0.0204        .
    14  MAXRES    x3        0.0319   -0.0204    0.0151        .
    15  MAXASRES  _Mean_   -0.7066    0.7065   -0.7065        .
    16  MAXASRES  x1        0.6853   -0.5405    0.6984        .
    17  MAXASRES  x2       -0.5405   -0.7306   -0.6912        .
    18  MAXASRES  x3        0.6984   -0.6912    0.7000        .
    19  MAXPRED   _Mean_    0.8987    0.9403    1.0232  0.50558
    20  MAXPRED   x1        1.4527    1.1540    1.3236  1.03392
    21  MAXPRED   x2        1.1540    1.5586    1.4288  1.11613
    22  MAXPRED   x3        1.3236    1.4288    1.9265  1.28014
    23  MAXPRED   f1        1.0339    1.1161    1.2801  1.00000
    24  SCORE     f1        0.2001    0.2650    0.3305        .
Observation 24 in this OUTSTAT= data set contains the scoring coefficients for computing the factor scores of f1. Observation 2 contains the sample means of the observed variables x1–x3. These are the means that the SCORE procedure uses to compute the deviation scores of the observed variables (before the deviation scores are multiplied by the scoring coefficients). However, when your model contains mean structures, you should use the predicted variable means instead of the observed variable means to compute the deviation scores. In the OUTSTAT= data set, the predicted means are stored in the observation that has _TYPE_=MAXPRED and _NAME_=_Mean_ (observation 19). To "trick" the SCORE procedure into using the predicted means, you can use the following statements:
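   data ostat2;
      set ostat;
      /* Relabel the sample means so that PROC SCORE no longer uses them */
      if _TYPE_='MEAN' then _TYPE_='OBSMEAN';
      /* Relabel the predicted means as the means that PROC SCORE uses */
      if _TYPE_='MAXPRED' & _NAME_='_Mean_' then _TYPE_='MEAN';
   run;

   proc score data=raw score=ostat2 out=Scores2;
      /* Variable names must match those in the scoring data set */
      var x1-x3;
   run;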
The DATA step creates the data set ostat2 by copying the original OUTSTAT= data set, ostat, but it also changes the value of the _TYPE_ variable in observation 2 from MEAN to OBSMEAN and in observation 19 from MAXPRED to MEAN, so that observation 19 becomes

    19  MEAN      _Mean_    0.8987    0.9403    1.0232  0.50558
The SCORE procedure then uses the means in this observation to compute the deviation scores and hence the latent variable (factor) scores. It saves the scores in a new OUT= data set, Scores2. If the means of the factors are not important in subsequent analyses of the factor scores (for example, exploratory factor analysis), you can use the latent variable (factor) scores in the Scores2 data set without further processing.
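To make the arithmetic concrete, the following DATA step is a sketch of the same computation done by hand, using the values in the OUTSTAT= listing shown earlier: each observed variable is centered at its predicted mean (observation 19) and then weighted by its scoring coefficient (observation 24). The data set name check and the variable name f1_manual are hypothetical, and the sketch assumes that the raw data set contains x1-x3. Because the listed values are rounded to four decimal places, the result should agree with the PROC SCORE output only approximately.

```sas
/* Sketch: hand computation of the f1 scores from the listed values. */
/* check and f1_manual are hypothetical names.                        */
data check;
   set raw;
   /* coefficient * (observed value - predicted mean), summed over x1-x3 */
   f1_manual = 0.2001*(x1 - 0.8987)
             + 0.2650*(x2 - 0.9403)
             + 0.3305*(x3 - 1.0232);
run;
```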
However, if the factor means are important, you need to perform one more step, as shown in the following statements:
   data Scores3;
      set Scores2;
      f1 = f1 + 0.50558;
   run;
The data set Scores3 is essentially a copy of the Scores2 data set, but it adds the constant 0.50558 to the factor variable f1. This constant value is the predicted factor mean of f1, as you can see from observation 19 in the original OUTSTAT= data set. Before this constant is added, the expected value of f1 is 0. However, the new score data set, Scores3, changes this expected value to 0.50558, which is consistent with the original model results. If you have more latent variables (factors) in your model, you need to add the predicted mean for each of the latent variables (factors) when you create the target data set for the latent variable (factor) scores.
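Instead of typing the predicted factor mean by hand, you could read it from the original OUTSTAT= data set. The following statements are a sketch of that approach, assuming the ostat and Scores2 data sets described above. The macro variable name f1mean is hypothetical. For a model with several factors, you would store and add one predicted mean per factor in the same way.

```sas
/* Sketch: read the predicted mean of f1 from the original OUTSTAT=  */
/* data set (not ostat2, where this row has been relabeled).          */
/* f1mean is a hypothetical macro variable name.                      */
data _null_;
   set ostat;
   where _TYPE_ = 'MAXPRED' and _NAME_ = '_Mean_';
   call symputx('f1mean', f1);
run;

/* Shift the factor scores so that their expected value is the       */
/* predicted factor mean                                              */
data Scores3;
   set Scores2;
   f1 = f1 + &f1mean;
run;
```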
Conceptually, the scoring coefficients that you obtain from PROC FACTOR and PROC CALIS are different in their scoring applications. The scoring coefficients that PROC FACTOR produces are applied to observed variables in the standardized form (with mean 0 and standard deviation 1), whereas the scoring coefficients that PROC CALIS produces are applied to observed variables in the deviation form (with mean 0). By default, PROC SCORE uses the standardized variables, so it is compatible with the scoring coefficients that are obtained from PROC FACTOR. However, because the CALIS procedure does not contain the _TYPE_=STD observation in the OUTSTAT= data set (unless you specify the CORR option), PROC SCORE does not scale the observed variables by the standard deviations. Hence, in effect, PROC SCORE is able to use the correct deviation form for the observed variables "automatically" in computing latent variable (factor) scores.
When you specify the CORR option in the PROC CALIS statement, the estimation results are based on the analysis of correlation structures. The factor scoring coefficients are now applied to observed variables in the standardized form rather than the default deviation form. If you also specify the OUTSTAT= option, the OUTSTAT= data set for the correlation structure analysis contains the _TYPE_=STD observation. Consequently, when you use the SCORE procedure to compute latent variable (factor) scores, the standard deviations of the variables are available through the _TYPE_=STD observation in the OUTSTAT= data set. Therefore, even if you use PROC CALIS to perform correlational analysis (instead of the default covariance analysis), PROC SCORE is still able to use the correct standardized form for the observed variables "automatically" in computing latent variable (factor) scores.
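The following statements sketch the correlation-structure case, reusing the one-factor model from the first section. The data set names ostat_c and scores_c are hypothetical. Apart from the CORR option, the two steps are unchanged, because the standardization relies on the _TYPE_=STD observation that the OUTSTAT= data set contains in this case.

```sas
/* Sketch: scoring after a correlation-structure analysis.           */
/* ostat_c and scores_c are hypothetical data set names.              */
proc calis data=raw corr outstat=ostat_c;
   lineqs
      v1 = a1 f1 + e1,
      v2 = a2 f1 + e2,
      v3 = a3 f1 + e3;
   std
      f1 = 1.,
      e1-e3 = evar1-evar3;
run;

/* PROC SCORE finds the _TYPE_=STD row and standardizes v1-v3 */
proc score data=raw score=ostat_c out=scores_c;
   var v1-v3;
run;
```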