
The TCALIS Procedure

Example 88.4 Confirmatory Factor Analysis: Cognitive Abilities

In this example, cognitive abilities of 64 students from a middle school were measured. The fictitious data contain nine cognitive test scores. Three of the scores were for reading skills, three others were for math skills, and the remaining three were for writing skills. The covariance matrix for the nine variables was obtained. A confirmatory factor analysis with three factors was conducted. The following is the input data set and the PROC TCALIS specification for the analysis:

   title "Confirmatory Factor Analysis Using the FACTOR Modeling Language";
   title2 "Cognitive Data";    
   data cognitive1(type=cov);
      _type_='cov';
      input _name_ $ reading1 reading2 reading3 math1 math2 math3 
            writing1 writing2 writing3;
      datalines;
   reading1 83.024    .      .      .      .      .      .      .      .   
   reading2 50.924 108.243   .      .      .      .      .      .      .   
   reading3 62.205  72.050 99.341   .      .      .      .      .      .   
   math1    22.522  22.474 25.731 82.214   .      .      .      .      .   
   math2    14.157  22.487 18.334 64.423 96.125   .      .      .      .   
   math3    22.252  20.645 23.214 49.287 58.177 88.625   .      .      .   
   writing1 33.433  42.474 41.731 25.318 14.254 27.370 90.734   .      .   
   writing2 24.147  20.487 18.034 22.106 26.105 22.346 53.891 96.543   .   
   writing3 13.340  20.645 23.314 19.387 28.177 38.635 55.347 52.999 98.445  
   ;


   proc tcalis data=cognitive1 nobs=64 modification;
      factor
         Read_Factor   -> reading1-reading3  = load1-load3,
         Math_Factor   -> math1-math3        = load4-load6,
         Write_Factor  -> writing1-writing3  = load7-load9;
      pvar
         Read_Factor Math_Factor Write_Factor = 3 * 1.,
         reading1-reading3 math1-math3 writing1-writing3 = errvar1-errvar9;
   run;

In the PROC TCALIS statement, the number of observations is specified with the NOBS= option. The MODIFICATION option in the same statement requests that model modification indices be computed.

The FACTOR modeling language is the handiest tool for specifying confirmatory factor models. The FACTOR statement invokes this modeling language. Entries in the FACTOR statement specify the factor-variable relationships and are separated by commas. In each entry, you first specify a latent factor, followed by the right arrow sign ->. Then you specify the observed variables that have nonzero loadings on the factor. Next, an equal sign signals the specification of the loading parameters that follow. The loading parameters can be names (parameters without initial estimates), numbers (fixed values), or names followed by parenthesized numbers (parameters with initial values). In this example, there are three factors: Read_Factor, Math_Factor, and Write_Factor. These factors have simple cluster structures with the nine observed variables: each observed variable loads on exactly one factor, yielding a total of nine loading parameters named load1-load9. No initial estimates are specified for them; they are computed by PROC TCALIS.
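
For example, the following FACTOR statement is a sketch that illustrates the three forms of loading specifications. It is not part of this example, and the fixed value 1. and the initial estimate 7. are arbitrary numbers chosen only to show the syntax:

   factor
      Read_Factor   -> reading1-reading3  = load1(7.) load2 load3,  /* load1 has an initial estimate      */
      Math_Factor   -> math1-math3        = 1. load5 load6,         /* first loading fixed at 1           */
      Write_Factor  -> writing1-writing3  = load7-load9;            /* free parameters, no initial values */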

In the PVAR statement, you specify the variances of the factors and the error variances of the observed variables. The factor variances in this model are fixed at 1 for identification purposes. The error variances of the observed variables are free parameters without initial estimates, named errvar1-errvar9, respectively.

The covariances of the factors are not specified in the model, which means that the factors are uncorrelated by default. Certainly, this might not be reasonable, but for illustration purposes this uncorrelated factor model is fitted first. With the MODIFICATION option in the PROC TCALIS statement, LM (Lagrange multiplier) tests are conducted. The results of the LM tests can suggest the inclusion of additional parameters for a better model fit. If the uncorrelated factor model is indeed unreasonable, this should be reflected in the results of the LM tests.
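
If you prefer to make this default explicit, fixing the factor covariances at zero with a COV statement is equivalent to omitting the statement altogether. The following sketch shows this alternative specification, which is not used in this example:

   cov Read_Factor Math_Factor Write_Factor = 3 * 0.;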

In Output 88.4.1, the initial model specification is echoed in matrix form. The observed variables and factors are also displayed.

Output 88.4.1 Uncorrelated Factor Model Specification
Variables in the Model
Variables reading1 reading2 reading3 math1 math2 math3 writing1 writing2 writing3
Factors Read_Factor Math_Factor Write_Factor
Number of Variables = 9
Number of Factors = 3

Initial Factor Loading Matrix
            Read_Factor   Math_Factor   Write_Factor
reading1    . [load1]     0             0
reading2    . [load2]     0             0
reading3    . [load3]     0             0
math1       0             . [load4]     0
math2       0             . [load5]     0
math3       0             . [load6]     0
writing1    0             0             . [load7]
writing2    0             0             . [load8]
writing3    0             0             . [load9]

Initial Factor Covariance Matrix
  Read_Factor Math_Factor Write_Factor
Read_Factor 1.0000 0 0
Math_Factor 0 1.0000 0
Write_Factor 0 0 1.0000

Initial Error Variances
Variable Parameter Estimate
reading1 errvar1 .
reading2 errvar2 .
reading3 errvar3 .
math1 errvar4 .
math2 errvar5 .
math3 errvar6 .
writing1 errvar7 .
writing2 errvar8 .
writing3 errvar9 .

In the table for the initial factor loading matrix, the nine loading parameters are shown to have simple cluster relations with the factors. In the table for the initial factor covariance matrix, the diagonal matrix shows that the factors are not correlated. The diagonal elements are fixed at 1 so that this matrix is also a correlation matrix for the factors. In the table for initial error variances, the nine variance parameters are shown. No initial estimates were specified, as indicated by the missing values '.'.

In Output 88.4.2, initial estimates are generated by the instrumental variable method and the McDonald method.

Output 88.4.2 Optimization of the Uncorrelated Factor Model: Initial Estimates
Initial Estimation Methods
1 Instrumental Variables Method
2 McDonald Method

Optimization Start
Parameter Estimates
N Parameter Estimate Gradient
1 load1 7.15372 0.00851
2 load2 7.80225 -0.00170
3 load3 8.70856 -0.00602
4 load4 7.68637 0.00272
5 load5 8.01765 -0.01096
6 load6 7.05012 0.00932
7 load7 8.76776 -0.0009955
8 load8 5.96161 -0.01335
9 load9 7.23168 0.01665
10 errvar1 31.84831 -0.00179
11 errvar2 47.36790 0.0003461
12 errvar3 23.50199 0.00257
13 errvar4 23.13374 -0.0008384
14 errvar5 31.84224 0.00280
15 errvar6 38.92075 -0.00167
16 errvar7 13.86035 -0.00579
17 errvar8 61.00217 0.00115
18 errvar9 46.14784 -0.00300
Value of Objective Function = 0.9103815918

These initial estimates turn out to be quite good, in the sense that only three more iterations are needed to converge to the maximum likelihood estimates and the final function value (0.7837) does not change much from the initial function value (0.9104), as shown in Output 88.4.3.

Output 88.4.3 Optimization of the Uncorrelated Factor Model: Iteration Summary
Iteration  Restarts  Function Calls  Active Constraints  Objective Function  Objective Function Change  Max Abs Gradient Element  Lambda  Ratio Between Actual and Predicted Change
1          0         4               0                   0.78792             0.1225                     0.00175                   0       0.932
2          0         6               0                   0.78373             0.00419                    0.000037                  0       1.051
3          0         8               0                   0.78373             5.087E-7                   3.715E-9                  0       1.001

Optimization Results
Iterations 3 Function Calls 11
Jacobian Calls 5 Active Constraints 0
Objective Function 0.783733415 Max Abs Gradient Element 3.7146571E-9
Lambda 0 Actual Over Pred Change 1.0006660673
Radius 0.0025042942    

Convergence criterion (ABSGCONV=0.00001) satisfied.

The fit summary is shown in Output 88.4.4.

Output 88.4.4 Fit of the Uncorrelated Factor Model
Fit Summary
Modeling Info N Observations 64
  N Variables 9
  N Moments 45
  N Parameters 18
  N Active Constraints 0
  Independence Model Chi-Square 272.0467
  Independence Model Chi-Square DF 36
Absolute Index Fit Function 0.7837
  Chi-Square 49.3752
  Chi-Square DF 27
  Pr > Chi-Square 0.0054
  Z-Test of Wilson & Hilferty 2.5474
  Hoelter Critical N 53
  Root Mean Square Residual (RMSR) 19.5739
  Standardized RMSR (SRMSR) 0.2098
  Goodness of Fit Index (GFI) 0.8555
Parsimony Index Adjusted GFI (AGFI) 0.7592
  Parsimonious GFI 0.6416
  RMSEA Estimate 0.1147
  RMSEA Lower 90% Confidence Limit 0.0617
  RMSEA Upper 90% Confidence Limit 0.1646
  Probability of Close Fit 0.0271
  ECVI Estimate 1.4630
  ECVI Lower 90% Confidence Limit 1.2069
  ECVI Upper 90% Confidence Limit 1.8687
  Akaike Information Criterion -4.6248
  Bozdogan CAIC -89.9146
  Schwarz Bayesian Criterion -62.9146
  McDonald Centrality 0.8396
Incremental Index Bentler Comparative Fit Index 0.9052
  Bentler-Bonett NFI 0.8185
  Bentler-Bonett Non-normed Index 0.8736
  Bollen Normed Index Rho1 0.7580
  Bollen Non-normed Index Delta2 0.9087
  James et al. Parsimonious NFI 0.6139

Using the chi-square model test criterion, the uncorrelated factor model should be rejected at α = 0.01 (p = 0.0054). The RMSEA estimate is 0.1147, which is not indicative of a good fit according to Browne and Cudeck (1993). Other indices might suggest only a marginally good fit. For example, Bentler's comparative fit index and Bollen's non-normed index delta2 are both above 0.90. However, many other indices do not attain this 0.90 level. For example, the adjusted GFI is only 0.7592. It is thus safe to conclude that the model fit could be improved.

The MODIFICATION option in the PROC TCALIS statement requests the computation of the LM test indices for model modifications. The results are shown in Output 88.4.5.

Output 88.4.5 Lagrange Multiplier Tests
Rank Order of the 10 Largest LM Stat for Factor Loadings
Variable Factor LM Stat Pr > ChiSq Parm Change
writing1 Read_Factor 9.76596 0.0018 2.95010
math3 Write_Factor 3.58077 0.0585 1.89703
math1 Read_Factor 2.15312 0.1423 1.17976
writing3 Math_Factor 1.87637 0.1707 1.41298
math3 Read_Factor 1.02954 0.3103 0.95427
reading2 Write_Factor 0.91230 0.3395 0.99933
writing2 Math_Factor 0.86221 0.3531 0.95672
reading1 Write_Factor 0.63403 0.4259 0.73916
math1 Write_Factor 0.55602 0.4559 0.63906
reading2 Math_Factor 0.55362 0.4568 0.74628

Rank Order of the 3 Largest LM Stat for Covariances of Factors
Var1 Var2 LM Stat Pr > ChiSq Parm Change
Write_Factor Read_Factor 8.95268 0.0028 0.44165
Write_Factor Math_Factor 7.07904 0.0078 0.40132
Math_Factor Read_Factor 4.61896 0.0316 0.30411

Rank Order of the 10 Largest LM Stat for Error Variances and Covariances
Error of  Error of  LM Stat  Pr > ChiSq  Parm Change
writing1 math2 5.45986 0.0195 -13.16822
writing1 math1 5.05573 0.0245 12.32431
writing3 math3 3.93014 0.0474 13.59149
writing3 math1 2.83209 0.0924 -9.86342
writing2 reading1 2.56677 0.1091 10.15901
writing2 math2 1.94879 0.1627 8.40273
writing2 reading3 1.75181 0.1856 -7.82777
writing3 reading1 1.57978 0.2088 -7.97915
writing1 reading2 1.34894 0.2455 7.77158
writing2 math3 1.11704 0.2906 -7.23762

Three different tables ranking the LM test results are shown. In the first table, the new loading parameters that would improve the model fit the most are shown first. For example, the first row suggests that a new factor loading of writing1 on Read_Factor would improve the model fit the most. The LM Stat value is 9.7660. This is an approximation of the chi-square drop if this parameter were included in the model. The Pr > ChiSq value of 0.0018 indicates a significant improvement in model fit at α = 0.01. Nine more new loading parameters are suggested in the table, with less and less statistical significance in the change of the model fit chi-square. Note that these approximate chi-squares are one-at-a-time chi-square changes. That means that the overall chi-square drop is not a simple sum of the individual chi-square changes when you include two or more new parameters in the modified model.

The other two tables in Output 88.4.5 show the new parameters in factor covariances, error variances, or error covariances that would result in a better model fit. The table for the new parameters of the factor covariance matrix indicates that adding each of the covariances among factors might lead to a statistically significant improvement in model fit. This is consistent with the initial argument that an uncorrelated factor model might not be reasonable in this case: it fails to explain the covariances among the observed variables through the correlations among the latent factors. The largest LM Stat value in this table is 8.9527, which is smaller than the largest LM Stat for the factor loading parameters. Despite this, it is more reasonable to add the covariance parameters among factors first to determine whether that improves the model fit. To do this, you add a COV statement that specifies the covariances among the factors to the original code. The following statements specify the modified factor model with covariances among factors:

   proc tcalis data=cognitive1 nobs=64;
      factor
         Read_Factor   -> reading1-reading3  = load1-load3,
         Math_Factor   -> math1-math3        = load4-load6,
         Write_Factor  -> writing1-writing3  = load7-load9;
      pvar
         Read_Factor Math_Factor Write_Factor = 3 * 1.,
         reading1-reading3 math1-math3 writing1-writing3 = errvar1-errvar9;
      cov Read_Factor Math_Factor Write_Factor = fcov1-fcov3;
   run;

The fit summary is shown in Output 88.4.6.

Output 88.4.6 Fit of the Correlated Factor Model
Fit Summary
Modeling Info N Observations 64
  N Variables 9
  N Moments 45
  N Parameters 21
  N Active Constraints 0
  Independence Model Chi-Square 272.0467
  Independence Model Chi-Square DF 36
Absolute Index Fit Function 0.4677
  Chi-Square 29.4667
  Chi-Square DF 24
  Pr > Chi-Square 0.2031
  Z-Test of Wilson & Hilferty 0.8320
  Hoelter Critical N 79
  Root Mean Square Residual (RMSR) 5.7038
  Standardized RMSR (SRMSR) 0.0607
  Goodness of Fit Index (GFI) 0.9109
Parsimony Index Adjusted GFI (AGFI) 0.8330
  Parsimonious GFI 0.6073
  RMSEA Estimate 0.0601
  RMSEA Lower 90% Confidence Limit .
  RMSEA Upper 90% Confidence Limit 0.1244
  Probability of Close Fit 0.3814
  ECVI Estimate 1.2602
  ECVI Lower 90% Confidence Limit .
  ECVI Upper 90% Confidence Limit 1.5637
  Akaike Information Criterion -18.5333
  Bozdogan CAIC -94.3465
  Schwarz Bayesian Criterion -70.3465
  McDonald Centrality 0.9582
Incremental Index Bentler Comparative Fit Index 0.9768
  Bentler-Bonett NFI 0.8917
  Bentler-Bonett Non-normed Index 0.9653
  Bollen Normed Index Rho1 0.8375
  Bollen Non-normed Index Delta2 0.9780
  James et al. Parsimonious NFI 0.5945

The model fit chi-square value is 29.4667, which is about 19.9 less than that of the model with uncorrelated factors. The p-value is 0.2031, indicating a fairly satisfactory model fit. The RMSEA value is 0.0601, which is close to 0.05, a value recommended as an indication of good model fit by Browne and Cudeck (1993). More fit indices that did not attain the 0.90 level with the uncorrelated factor model now have values close to or above 0.90. These include the goodness-of-fit index (GFI), McDonald centrality, Bentler-Bonett NFI, and the Bentler-Bonett non-normed index. By all counts, the correlated factor model is a much better fit than the uncorrelated factor model.
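
Because the uncorrelated factor model is nested within the correlated factor model, you can also gauge the improvement with a chi-square difference test. The following DATA step is a minimal sketch of this computation, using the chi-square values and degrees of freedom from Output 88.4.4 and Output 88.4.6; it is not part of the PROC TCALIS output:

   data chisq_diff;
      diff_chisq = 49.3752 - 29.4667;                /* chi-square difference = 19.9085        */
      diff_df    = 27 - 24;                          /* difference in degrees of freedom = 3   */
      p          = 1 - probchi(diff_chisq, diff_df); /* p-value is well below 0.001            */
   run;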

In Output 88.4.7, the estimation results for factor loadings are shown. All these loadings are statistically significant, indicating non-chance relationships with the factors.

Output 88.4.7 Estimation of the Factor Loading Matrix
Factor Loading Matrix: Estimate/StdErr/t-value
            Read_Factor                        Math_Factor                        Write_Factor
reading1    6.7657 / 1.0459 / 6.4689 [load1]   0                                  0
reading2    7.8579 / 1.1890 / 6.6090 [load2]   0                                  0
reading3    9.1344 / 1.0712 / 8.5269 [load3]   0                                  0
math1       0                                  7.5488 / 1.0128 / 7.4536 [load4]   0
math2       0                                  8.4401 / 1.0838 / 7.7874 [load5]   0
math3       0                                  6.8194 / 1.0910 / 6.2506 [load6]   0
writing1    0                                  0                                  7.9677 / 1.1254 / 7.0797 [load7]
writing2    0                                  0                                  6.8742 / 1.1986 / 5.7350 [load8]
writing3    0                                  0                                  7.0949 / 1.2057 / 5.8844 [load9]

In Output 88.4.8, the factor covariance matrix is shown. Because the diagonal elements are all 1, the off-diagonal elements are correlations among the factors. The correlations range from 0.3272 to 0.4810. These factors are moderately correlated.

Output 88.4.8 Estimation of the Correlations of Factors
Factor Covariance Matrix: Estimate/StdErr/t-value
              Read_Factor                        Math_Factor                        Write_Factor
Read_Factor   1.0000                             0.3272 / 0.1311 / 2.4955 [fcov1]   0.4810 / 0.1208 / 3.9813 [fcov2]
Math_Factor   0.3272 / 0.1311 / 2.4955 [fcov1]   1.0000                             0.3992 / 0.1313 / 3.0417 [fcov3]
Write_Factor  0.4810 / 0.1208 / 3.9813 [fcov2]   0.3992 / 0.1313 / 3.0417 [fcov3]   1.0000

In Output 88.4.9, the error variances for variables are shown.

Output 88.4.9 Estimation of the Error Variances
Error Variances
Variable Parameter Estimate Standard Error t Value
reading1 errvar1 37.24939 8.33997 4.46637
reading2 errvar2 46.49695 10.69869 4.34604
reading3 errvar3 15.90447 9.26097 1.71737
math1 errvar4 25.22889 7.72269 3.26685
math2 errvar5 24.89032 8.98327 2.77074
math3 errvar6 42.12110 9.20362 4.57658
writing1 errvar7 27.24965 10.36489 2.62903
writing2 errvar8 49.28881 11.39812 4.32429
writing3 errvar9 48.10684 11.48868 4.18733

All t values except the one for reading3 are bigger than 2, a value close to the critical t value at α = 0.05. This means that the error variance for reading3 could have been zero in the population, or it could have been nonzero but the current sample produced this nonsignificant value by chance (that is, a Type 2 error). Further research is needed to confirm either conclusion.
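
If you want an approximate p-value for this t value, you can treat it as a standard normal variate, which is the usual large-sample assumption for such tests. The following DATA step is a sketch of that computation for the reading3 error variance; it is not part of the PROC TCALIS output:

   data t_check;
      t = 1.71737;                      /* t value for errvar3 in Output 88.4.9           */
      p = 2 * (1 - probnorm(abs(t)));   /* approximate two-sided p-value, about 0.086     */
   run;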

In addition to the parameter estimation results, PROC TCALIS also outputs supplementary results that could be useful for interpretations. In Output 88.4.10, the squared multiple correlations and the factor scores regression coefficients are shown.

Output 88.4.10 Supplementary Estimation Results
Squared Multiple Correlations
Variable Error Variance Total Variance R-Square
reading1 37.24939 83.02400 0.5513
reading2 46.49695 108.24300 0.5704
reading3 15.90447 99.34100 0.8399
math1 25.22889 82.21400 0.6931
math2 24.89032 96.12500 0.7411
math3 42.12110 88.62500 0.5247
writing1 27.24965 90.73400 0.6997
writing2 49.28881 96.54300 0.4895
writing3 48.10684 98.44500 0.5113

Factor Scores Regression Coefficients
  Read_Factor Math_Factor Write_Factor
reading1 0.02001 0.0006807 0.00198
reading2 0.01861 0.0006334 0.00185
reading3 0.06326 0.00215 0.00628
math1 0.00112 0.04035 0.00281
math2 0.00127 0.04572 0.00318
math3 0.0006068 0.02183 0.00152
writing1 0.00319 0.00274 0.05128
writing2 0.00152 0.00131 0.02446
writing3 0.00161 0.00138 0.02587

The percentages of variance of the observed variables that can be explained by the factors are shown in the R-Square column of the table for squared multiple correlations (R-squares). These R-squares can be interpreted meaningfully because there are no reciprocal relationships among variables or correlated errors in the model. All estimates of the R-squares are bounded between 0 and 1.
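
Each R-square is computed as one minus the ratio of the error variance to the total variance of the corresponding variable. The following DATA step is a sketch that verifies the R-square for reading1 from the values in Output 88.4.10:

   data rsq_check;
      errvar = 37.24939;              /* error variance of reading1                    */
      total  = 83.02400;              /* total variance of reading1                    */
      rsq    = 1 - errvar / total;    /* = 0.5513, matching the R-Square column        */
   run;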

In the table for factor scores regression coefficients, the entries are coefficients for the variables that you can use to create the factor scores. The larger the coefficient, the more influence the corresponding variable has in creating the factor scores. It makes intuitive sense to see the cluster pattern of these coefficients: for example, the reading measures are the most important variables for creating the latent variable scores of Read_Factor, and so on.


Note: This procedure is experimental.
