|The TCALIS Procedure|
In this example, confirmatory higher-order and hierarchical factor models are fitted by PROC TCALIS.
In higher-order factor models, factors are at different levels. The higher-order factors explain the relationships among factors at the next lower level, in the same way that the first-order factors explain the relationships among manifest variables. For example, in a two-level higher factor model you have nine manifest variables V1–V9 with three first-order factors F1–F3. The first-order factor pattern of the model might appear like the following:
F1 F2 F3 V1 x V2 x V3 x V4 x V5 x V6 x V7 x V8 x V9 x
where each "x" marks a nonzero factor loading and all other unmarked entries are fixed zeros in the model. To explain the correlations among the first-order factors, a second-order factor F4 is hypothesized with the following second-order factor pattern:
F4 F1 x F2 x F3 x
If substantiated by your theory, you might have higher-order factor models with more than two levels.
In hierarchical factor models, all factors are at the same (first-order) level but are different in their clusters of manifest variables related. Using the terminology of Yung, Thissen, and McLeod (1999), factors in hierarchical factor models are classified into "layers." The factors in the first layer partition the manifest variables into clusters so that each factor has a distinct cluster of related manifest variables. This part of the factor pattern of the hierarchical factor model is similar to that of the first-order factor model for manifest variables. The next layer of factors in the hierarchical factor model again partitions the manifest variables into clusters. However, this time each cluster contains at least two clusters of manifest variables that are formed in the previous layer. For example, the following is a factor pattern of a confirmatory hierarchical factor model with two layers:
First Layer | Second Layer F1 F2 F3 | F4 V1 x | x V2 x | x V3 x | x V4 x | x V5 x | x V6 x | x V7 x | x V8 x | x V9 x | x
F1–F3 are first-layer factors and F4 is the only second-layer factor. This special kind of two-layer hierarchical pattern is also known as a bifactor solution. In a bifactor solution, there are two classes of factors—group factors and a general factor. For example, in the preceding hierarchical factor pattern F1–F3 are group factors for different abilities and F4 is a general factor such as "intelligence" (see, for example, Holzinger and Swineford 1937). See Mulaik and Quartetti (1997) for more examples and distinctions among various types of hierarchical factor models. Certainly, if substantiated by your theory, hierarchical factor models with more than two layers are possible.
In this example, you use PROC TCALIS to fit these two types of confirmatory factor models. First, you fit a second-order factor model to a real data set. Then you fit a bifactor model to the same data set. In the final section of this example, an informal account of the relationship between the higher-order and hierarchical factor models is attempted. Techniques for constraining parameters using PROC TCALIS are also shown. This final section might be too technical in the first reading. Interested readers are referred to articles by Mulaik and Quartetti (1997), Schmid and Leiman (1957), and Yung, Thissen, and McLeod (1999), for more details.
In this section, a second-order confirmatory factor analysis model is applied to a correlation matrix of Thurstone reported by McDonald (1985). The correlation matrix is read into a SAS data set in the following statements:
data Thurst(type=corr); title "Example of THURSTONE resp. McDONALD (1985, p.57, p.105)"; _type_ = 'corr'; input _name_ $ V1-V9; label V1='Sentences' V2='Vocabulary' V3='Sentence Completion' V4='First Letters' V5='Four-letter Words' V6='Suffices' V7='Letter series' V8='Pedigrees' V9='Letter Grouping'; datalines; V1 1. . . . . . . . . V2 .828 1. . . . . . . . V3 .776 .779 1. . . . . . . V4 .439 .493 .460 1. . . . . . V5 .432 .464 .425 .674 1. . . . . V6 .447 .489 .443 .590 .541 1. . . . V7 .447 .432 .401 .381 .402 .288 1. . . V8 .541 .537 .534 .350 .367 .320 .555 1. . V9 .380 .358 .359 .424 .446 .325 .598 .452 1. ;
Variables in this data set are measures of cognitive abilities. Three factors are assumed for these nine variable V1–V9. These three factors are the first-order factors in the analysis. A second-order factor is also assumed to explain the correlations among the three first-order factors.
The following statements define a second-order factor model by using the LINEQS modeling language.
proc tcalis corr data=Thurst method=max nobs=213 nose nostand; lineqs V1 = X11 Factor1 + E1, V2 = X21 Factor1 + E2, V3 = X31 Factor1 + E3, V4 = X42 Factor2 + E4, V5 = X52 Factor2 + E5, V6 = X62 Factor2 + E6, V7 = X73 Factor3 + E7, V8 = X83 Factor3 + E8, V9 = X93 Factor3 + E9, Factor1 = L1g FactorG + E10, Factor2 = L2g FactorG + E11, Factor3 = L3g FactorG + E12; std FactorG = 1. , E1-E12 = U1-U9 W1-W3; bounds 0. <= U1-U9; fitindex ON(ONLY)=[chisq df probchi]; /* SAS Programming Statements: Dependent parameter definitions */ W1 = 1. - L1g * L1g; W2 = 1. - L2g * L2g; W3 = 1. - L3g * L3g; run;
In the first nine equations of the LINEQS statement, variables V1–V3 are manifest indicators of latent factor Factor1, variables V4–V6 are manifest indicators of latent factor Factor2, and variables V7–V9 are manifest indicators of latent factor Factor3. In the last three equations of the LINEQS statement, the three first-order factors Factor1–Factor3 are predicted by a common source: FactorG. Hence, Factor1–Factor3 are correlated due to the common source FactorG in the model.
An error term is added to each equation in the LINEQS statement. These error terms E1–E12 are needed because the factors are not assumed to be perfect predictors of the corresponding outcome variables.
In the STD statement, you specify variance parameters for all independent or exogenous variables in the model: FactorG, and E1–E12. The variance of FactorG is fixed at one for identification. Variances for E1–E9 are given parameter names U1–U9, respectively. Variances for E10–E12 are given parameter names W1–W3, respectively. Note that for model identification purposes, W1–W3 are defined as dependent parameters in the SAS programming statements. That is,
These dependent parameter definitions ensure that the variances for Factor1–Factor3 are fixed at ones for identification.
In the BOUNDS statement, you specify that variance parameters U1–U9 must be positive in the solution.
In addition to the statements for model specification, options are used to control the output. In the PROC TCALIS statement, the NOSE and NOSTAND options suppress the display of standard errors and standardized results. In the FITINDEX statement, the ON(ONLY)= option requests only the model fit chi-square and its associated degrees of freedom and -value be shown in the fit summary table. Using options in PROC TCALIS to reduce the amount the of printout is a good practice. It makes your output more focused, as you output only what you need in a particular situation.
In Output 88.8.1, parameters and their initial values, gradients, and bounds are shown.
|N||Parameter||Estimate||Gradient||Lower Bound||Upper Bound|
|Value of Objective Function = 0.5693888709|
The first table contains all the independent parameters. There are twenty-one in total. Parameters W1–W3, which are defined in the SAS programming statements, are shown in the next table for dependent parameters. Their initial values are computed as functions of initial independent parameters.
Output 88.8.2 shows information about optimization—iteration history and the convergence status.
|Active Constraints||0||Objective Function||0.5693888709|
|Max Abs Gradient Element||0.5749163348||Radius||1.8533033852|
|Jacobian Calls||13||Active Constraints||0|
|Objective Function||0.1801712146||Max Abs Gradient Element||8.5605681E-6|
|Lambda||0||Actual Over Pred Change||1.3969225014|
First, there are 21 independent parameters in the optimization by using 45 "Functions (Observations)." The so-called functions refer to the moments in the model that are structured with parameters. Nine lower bounds, which are specified for the error variance parameters, are specified in the optimization. The next table for iteration history shows that the optimization stops in 11 iterations. The notes at the bottom of table show that the solution converges without problems.
In the fit summary table shown in Output 88.8.3, the chi-square model fit value is 38.196, with =24, and =0.033. This indicates a satisfactory model fit.
The fitted equations with final estimates are shown in Output 88.8.4. Interpretations of these loadings are not done here. The last table in this output shows various variance estimates. These estimates are classified by whether they are for the latent variables, error variables, or disturbance variables.
For illustration purposes, you might check whether the model constraints put on the variances of Factor1–Factor3 are honored (although this should have been taken care of in the optimization). For example, the variance of Factor1 should be:
Extracting the estimates from the output, you have:
So, this model constraint is verified.
A bifactor model (or a hierarchical factor model with two layers) for the same data set is now considered. In this model, the same set of factors as in the preceding higher-order factor model are used. The most notable difference is that the second-order factor FactorG in the higher-order factor model is no longer a factor of the first-order factors Factor1–Factor3. Instead, FactorG, like Factor1–Factor3, now also serves as a factor of the observed variable V1–V9. Unlike Factor1–Factor3, FactorG is considered to be a general factor in the sense that all observed variables have direct functional relationships with it. In contrast, Factor1–Factor3 are group factors in the sense that each of them has a direct functional relationship with only one group of observed variables. Because of the coexistence of a general factor and group factors at the same factor level, such a hierarchical model is also called a bifactor model.
The bifactor model is specified in the following statements:
proc tcalis corr data=Thurst method=max nobs=213 nose nostand; lineqs V1 = X11 Factor1 + X1g FactorG + E1, V2 = X21 Factor1 + X2g FactorG + E2, V3 = X31 Factor1 + X3g FactorG + E3, V4 = X42 Factor2 + X4g FactorG + E4, V5 = X52 Factor2 + X5g FactorG + E5, V6 = X62 Factor2 + X6g FactorG + E6, V7 = X73 Factor3 + X7g FactorG + E7, V8 = X83 Factor3 + X8g FactorG + E8, V9 = X93 Factor3 + X9g FactorG + E9; std Factor1-Factor3 = 3 * 1., FactorG = 1. , E1-E9 = U1-U9; bounds 0. <= U1-U9; fitindex ON(ONLY)=[chisq df probchi]; run;
In the LINEQS statement, there are only nine equations for the manifest variables in the model. Unlike the second-order factor model fitted previously, Factor1–Factor3 are no longer functionally related to FactorG and therefore there are no equations with Factor1–Factor3 as outcome variables.
All factors in the bifactor model are uncorrelated. The factor variances are all fixed at in the STD statement. The variance parameters for E1–E9 are named U1–U9, respectively. The BOUNDS statement, again, is specified so that only positive estimates are accepted for error variance estimates. Like the previous PROC TCALIS run, options are specified in the PROC TCALIS and the FITINDEX statements to reduce the amount of default output.
There are more parameters in this model than in the preceding higher-order factor model, as shown in Output 88.8.5, which shows the optimization information.
|Active Constraints||0||Objective Function||0.8380304146|
|Max Abs Gradient Element||2.4076251809||Radius||20.596787596|
|Jacobian Calls||16||Active Constraints||0|
|Objective Function||0.1142278162||Max Abs Gradient Element||4.9090342E-6|
|Lambda||0||Actual Over Pred Change||1.4423534599|
There are parameters in the bifactor model: nine for the loadings on the group factors Factor1–Factor3, nine for the loadings on the general factor FactorG, and nine for the variances of errors E1–E9. The optimization converges in iterations without problems.
A fit summary table is shown in Output 88.8.6
The fit of this model is quite good. The chi-square value is , with =18 and =0.148. This is expected because the bifactor model has more parameters than the second-order factor model, which already has a good fit with fewer parameters.
Estimation results are shown in Output 88.8.7. Estimates are left uninterpreted because they are not the main interest of this example.
One might ask whether this bifactor (hierarchical) model provides a significantly better fit than the previous second-order model. Can one use a chi-square difference test for nested models to answer this question? The answer is yes.
Although it is not obvious that the previous second-order factor model is nested within the current bifactor model, a general nested relationship between the higher-order factor and the hierarchical factor model is formally proved by Yung, Thissen, and McLeod (1999). Therefore, a chi-square difference test can be conducted using the following DATA step:
data _null_; df0 = 24; chi0 = 38.1963; df1 = 18; chi1 = 24.2163; diff = chi0-chi1; p = 1.-probchi(chi0-chi1,df0-df1); put 'Chi-square difference = ' diff; put 'p-value = ' p; run;
The results are shown in the following:
The chi-square difference is , with =6 and =0.02986. If -level is set at , the bifactor model indicates a significantly better fit. But if -level is set at , statistically the two models fit equally well to the data.
In the next section, it is demonstrated that the second-order factor model is indeed nested within the bifactor model, and hence the chi-square test conducted in the previous section is justified. In addition, through the demonstration of the nested relationship between the two classes of models, you can see how some parameter constraints in structural equation model can be set up in PROC TCALIS.
For some practical researchers, the technical details involved in the next section might not be of interest and therefore could be skipped.
To demonstrate that the second-order factor model is indeed nested within the bifactor model, a constrained bifactor model is fitted in this section. This constrained bifactor model is essentially the same as the preceding bifactor model, but with additional constraints on the factor loadings. Hence, the constrained bifactor model is nested within the unconstrained bifactor model.
Furthermore, if it can be shown that the constrained bifactor model is equivalent to the previous second-order factor, then the second-order factor model must also be nested within the unconstrained bifactor model. As a result, it justifies the chi-square difference test conducted in the previous section.
The construction of such a constrained bifactor model is based on Yung, Thissen, and McLeod (1999). In the following statements, a constrained bifactor model is specified.
proc tcalis corr data=Thurst method=max nobs=213 nose nostand; lineqs V1 = X11 Factor1 + X1g FactorG + E1, V2 = X21 Factor1 + X2g FactorG + E2, V3 = X31 Factor1 + X3g FactorG + E3, V4 = X42 Factor2 + X4g FactorG + E4, V5 = X52 Factor2 + X5g FactorG + E5, V6 = X62 Factor2 + X6g FactorG + E6, V7 = X73 Factor3 + X7g FactorG + E7, V8 = X83 Factor3 + X8g FactorG + E8, V9 = X93 Factor3 + X9g FactorG + E9; std Factor1-Factor3 = 3 * 1., FactorG = 1. , E1-E9 = U1-U9; bounds 0. <= U1-U9; fitindex ON(ONLY)=[chisq df probchi]; parameters p1 (.5) p2 (.5) p3 (.5); /* Proportionality constraints */ X1g = p1 * X11; X2g = p1 * X21; X3g = p1 * X31; X4g = p2 * X42; X5g = p2 * X52; X6g = p2 * X62; X7g = p3 * X73; X8g = p3 * X83; X9g = p3 * X93; run;
In this constrained model, a PARAMETERS statement and nine SAS programming statements are added to the previous bifactor model. In the PARAMETERS statement, three new independent parameters are added: p1, p2, and p3. These parameters represent the proportions that constrain the factor loadings of the observed variables on the group factors Factor1–Factor3 and the general factor FactorG. They are all free parameters and have initial values at . The next nine SAS programming statements represent the proportionality constraints imposed. For example, X1g–X3g are now dependent parameters expressed as functions of p1, X11, X21, and X31. Adding three new parameters (in the PARAMETERS statement) and redefining nine original parameters as dependent (in the SAS programming statements) is equivalent to adding six () constraints to the original bifactor model. Mathematically, the additional statements in specifying the constrained bifactor model realizes the following six constraints:
Obviously, with these six constraints the current constrained bifactor model is nested within the unconstrained version. What remains to be shown is that this constrained bifactor model is indeed equivalent to the previous second-order factor model. If so, the second-order factor model is also nested within the unconstrained bifactor model. One evidence for the equivalence of the current constrained bifactor model and the second-order factor model is from the fit summary table shown in Output 88.8.10. But first, it is also useful to consider the optimization information of the constrained bifactor model, which is shown in Output 88.8.9.
|Active Constraints||0||Objective Function||12.813623548|
|Max Abs Gradient Element||0.2163121033||Radius||1|
|Jacobian Calls||20||Active Constraints||0|
|Objective Function||0.1801712146||Max Abs Gradient Element||9.004196E-6|
|Lambda||0||Actual Over Pred Change||1.3918856965|
As shown Output 88.8.9, there are 21 independent parameters in the constrained bifactor model for the "Functions (Observations)." These numbers match those of the second-order factor model exactly. The optimization shows some problems in initial iterations. The iteration numbers with asterisks indicate that the Hessian matrix is not positive definite in those iterations. But as long as the final converged iteration is not marked with this asterisk, the problems exhibited in early iterations do not raise any concern, as in the current case. Next, the fit summary is shown in Output 88.8.10.
In Output 88.8.10, the chi-square value in the fit summary table is , with =24, and =0.033. Again, these numbers match those of the second-order factor model exactly. These matches (same model fit with the same number of parameters) are necessary (but not sufficient) to show that the constrained bifactor model is equivalent to the second-order model. Stronger evidence is now presented.
In Output 88.8.11, estimation results of the constrained bifactor model are shown.
According to Yung, Thissen, and McLeod (1999), two models are equivalent if there is a one-to-one correspondence of the parameters in the models. This fact is illustrated for the constrained bifactor model and the second-order factor model.
First, the error variances for E1–E9 in the second-order factor model are transformed directly (using an identity map) to that of the bifactor models. The nine error variances in Output 88.8.4 for the second-order factor model match those of the constrained bifactor model exactly in Output 88.8.11. In addition, the variances of factors are fixed at one in both models.
The error variances and the factor loadings at both factor levels in Output 88.8.4 for the second-order factor model are now transformed to yield the loading estimates in the constrained bifactor model. Denote as the first-order factor loading matrix, as the second-order factor loading matrix, and be the matrix of variances for disturbances. That is, for the second-order factor model,
According to Yung, Thissen, and McLeod (1999), the transformation to obtain the estimates in the equivalent constrained bifactor model is:
where is the matrix of the first-layer factor loadings (that is, loadings on group factors Factor1–Factor3), and is the matrix of the second-layer factor loadings (that is, loadings on FactorG) in the constrained bifactor model. Carrying out the matrix calculations for and shows that:
With very minor numerical differences and ignorable sign changes, these transformation results match the estimated loadings observed in Output 88.8.11 for the constrained bifactor model. Therefore, the second-order factor model is shown to be equivalent to the constrained bifactor model, and hence is nested within the unconstrained bifactor model.
Note: This procedure is experimental.