The CALIS Procedure

Example 29.16 Comparing the ML and FIML Estimation

This example uses the complete data set from Example 29.12 to illustrate that the maximum likelihood (ML) and full information maximum likelihood (FIML) methods are theoretically equivalent when you apply them to a data set without missing values. In Example 29.15, you apply a confirmatory factor model to a data set with missing values and find that METHOD=FIML gives more stable estimates than METHOD=ML (the default estimation method). Near the end of Example 29.15, you learn that ML and FIML are theoretically equivalent estimation methods when you apply them to data sets without missing values.

However, the ML and FIML methods have two major computational differences in their implementations in PROC CALIS. First, with METHOD=FIML the first-order properties (that is, the means of the variables) of the data are automatically included in the analysis, whereas by default METHOD=ML analyzes only the second-order properties (that is, the covariances of the variables). Second, METHOD=FIML uses the biased sample covariance formula (with N as the variance divisor), whereas METHOD=ML uses the unbiased sample covariance formula (with DF = N – 1 as the variance divisor). See the section Relationships among Estimation Criteria for more details about the similarities and differences between the ML and FIML methods.
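To see the effect of the variance divisor by itself, you can compute the sample covariance matrix under each divisor. The following sketch is for illustration only; it assumes that the scores data set from Example 29.12 is available. It uses PROC CORR with the COV option, once with the default divisor (VARDEF=DF) and once with VARDEF=N:

proc corr data=scores cov vardef=df;   /* unbiased formula: divisor is DF = N - 1 (the default) */
   var x1-x3 y1-y3;
run;

proc corr data=scores cov vardef=n;    /* biased formula: divisor is N, as used by METHOD=FIML */
   var x1-x3 y1-y3;
run;

The two covariance matrices differ only by the constant factor (N – 1)/N.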

If you take care of these two differences between ML and FIML in PROC CALIS, you can obtain exactly the same results with these two methods when you apply them to data sets without missing values.

For example, with the complete data set scores from Example 29.12, you specify the FIML estimation in the following statements:

proc calis method=fiml data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

An equivalent specification with the ML method is shown in the following statements:

proc calis method=ml meanstr vardef=n data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

In the PROC CALIS statement, you specify two options that make the ML estimation exactly equivalent to the FIML estimation in PROC CALIS. First, the MEANSTR option requests that the first-order properties (the mean structures) be analyzed together with the covariance structures. Second, the VARDEF=N option sets the variance divisor to N instead of the default DF, which is N – 1. Together, these two options make the ML estimation equivalent to the FIML estimation.

Output 29.16.1 and Output 29.16.2 show some fit summary statistics under the FIML and ML methods, respectively.

Output 29.16.1: Model Fitting by the FIML Method: Scores Data

Fit Summary
Fit Function 31.7837
Chi-Square 10.1215
Chi-Square DF 8
Pr > Chi-Square 0.2566
Standardized RMR (SRMR) 0.0571
RMSEA Estimate 0.0910
Bentler Comparative Fit Index 0.9872
NOTE: Saturated mean structure parameters are
excluded from the computations of fit indices.



Output 29.16.2: Model Fitting by the ML Method: Scores Data

Fit Summary
Fit Function 0.3163
Chi-Square 10.1215
Chi-Square DF 8
Pr > Chi-Square 0.2566
Standardized RMR (SRMR) 0.0504
RMSEA Estimate 0.0910
Bentler Comparative Fit Index 0.9872



Except for the fit function values, the FIML and ML methods produce the same set of fit statistics. The difference in the fit function values is expected because the FIML function contains a constant term that is derived from the likelihood function and that does not depend on the model parameters. Hence, the FIML and ML discrepancy functions that are used in PROC CALIS are equivalent (up to this constant) when you use VARDEF=N with the ML method to analyze the mean and covariance structures.
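The following display is a rough, hedged sketch of this relationship rather than an exact derivation from the PROC CALIS documentation. For complete data, a saturated mean structure, and VARDEF=N (so that both methods work with the same sample covariance matrix S and the same sample means), the FIML discrepancy function differs from the ML mean-and-covariance-structure discrepancy function approximately by a data-dependent constant, where p is the number of observed variables:

$$ F_{\mathrm{FIML}} \;\approx\; F_{\mathrm{ML}} + p\bigl(\ln(2\pi) + 1\bigr) + \ln\lvert S\rvert $$

Because the added term depends only on the data (through p and S) and not on the model parameters, the two functions attain their minima at the same parameter values, which is why all the fit statistics other than the fit function value agree.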

The parameter estimates are shown in Output 29.16.3 and Output 29.16.4 for the FIML and ML methods, respectively. Except for very tiny numerical differences in some estimates, the FIML and ML estimates match.

Output 29.16.3: Parameter Estimates by the FIML Method: Scores Data

Factor Loading Matrix: Estimate/StdErr/t-value/p-value

Variable  Factor  Estimate  StdErr  t Value  p Value  Parameter
x1        verbal    5.7486  0.9692   5.9315   <.0001  _Parm1
x2        verbal    5.7265  0.9278   6.1720   <.0001  _Parm2
x3        verbal    4.5886  0.7562   6.0676   <.0001  _Parm3
y1        math      5.1972  0.6796   7.6477   <.0001  _Parm4
y2        math      4.1342  0.6036   6.8489   <.0001  _Parm5
y3        math      3.7004  0.6177   5.9904   <.0001  _Parm6
(All other loadings are fixed at 0.)

Factor Covariance Matrix: Estimate/StdErr/t-value/p-value

Covariance       Estimate  StdErr  t Value  p Value   Parameter
verbal <-> math    0.5175  0.1433   3.6113  0.000305  _Add01
(The variances of verbal and math are fixed at 1.0000.)

Intercepts
Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1 _Add02 19.90625 1.17540 16.9357 <.0001
x2 _Add03 18.81250 1.14089 16.4893 <.0001
x3 _Add04 18.68750 0.92749 20.1486 <.0001
y1 _Add05 17.90625 0.93161 19.2208 <.0001
y2 _Add06 17.84375 0.78823 22.6377 <.0001
y3 _Add07 17.75000 0.76419 23.2272 <.0001

Error Variances
Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1 _Add08 11.16406 4.19243 2.6629 0.0077
x2 _Add09 8.85978 3.78083 2.3433 0.0191
x3 _Add10 6.47248 2.45789 2.6334 0.0085
y1 _Add11 0.76135 1.32719 0.5737 0.5662
y2 _Add12 2.79060 1.08465 2.5728 0.0101
y3 _Add13 4.99466 1.48010 3.3745 0.0007



Output 29.16.4: Parameter Estimates by the ML Method: Scores Data

Factor Loading Matrix: Estimate/StdErr/t-value/p-value

Variable  Factor  Estimate  StdErr  t Value  p Value  Parameter
x1        verbal    5.7486  0.9651   5.9567   <.0001  _Parm1
x2        verbal    5.7265  0.9239   6.1981   <.0001  _Parm2
x3        verbal    4.5885  0.7570   6.0617   <.0001  _Parm3
y1        math      5.1972  0.6779   7.6662   <.0001  _Parm4
y2        math      4.1341  0.6025   6.8612   <.0001  _Parm5
y3        math      3.7004  0.6143   6.0238   <.0001  _Parm6
(All other loadings are fixed at 0.)

Factor Covariance Matrix: Estimate/StdErr/t-value/p-value

Covariance       Estimate  StdErr  t Value  p Value   Parameter
verbal <-> math    0.5175  0.1406   3.6800  0.000233  _Add01
(The variances of verbal and math are fixed at 1.0000.)

Intercepts
Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1 _Add02 19.90625 1.17540 16.9357 <.0001
x2 _Add03 18.81250 1.14089 16.4893 <.0001
x3 _Add04 18.68750 0.92749 20.1486 <.0001
y1 _Add05 17.90625 0.93161 19.2208 <.0001
y2 _Add06 17.84375 0.78823 22.6377 <.0001
y3 _Add07 17.75000 0.76419 23.2272 <.0001

Error Variances
Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1 _Add08 11.16365 4.06567 2.7458 0.0060
x2 _Add09 8.85925 3.65397 2.4246 0.0153
x3 _Add10 6.47288 2.47689 2.6133 0.0090
y1 _Add11 0.76124 1.23420 0.6168 0.5374
y2 _Add12 2.79066 1.04307 2.6754 0.0075
y3 _Add13 4.99461 1.40024 3.5670 0.0004
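To check the agreement between the two solutions programmatically instead of by inspection, one possible approach is sketched below. It assumes that the OUTEST= data sets capture the parameter estimates in comparable form; the data set names est_fiml and est_ml are arbitrary:

proc calis method=fiml data=scores outest=est_fiml noprint;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

proc calis method=ml meanstr vardef=n data=scores outest=est_ml noprint;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

proc compare base=est_fiml compare=est_ml method=absolute criterion=0.001;
run;

PROC COMPARE then lists the values whose absolute difference exceeds the stated criterion, which makes it easy to see how closely the two solutions agree.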



The equivalence between METHOD=ML and METHOD=FIML implies that if your data contain no missing values, you can simply use METHOD=ML, because it is computationally more efficient than the FIML method.

Although the equivalence between ML and FIML is established here by using the VARDEF= and MEANSTR options (for data without missing values), in practice it is not necessary to use these options with METHOD=ML. The VARDEF= option is used in this example only to demonstrate the theoretical equivalence between METHOD=ML and METHOD=FIML; it has very little effect when you have at least a moderate sample size (for example, 30 or more observations).

Merely adding the MEANSTR option to an analysis for data without missing values amounts to adding a saturated mean structure to a covariance structure analysis. In this case, the MEANSTR option only gives you more estimates that pertain to the mean structures, but the parameter estimates that pertain to the covariance structures do not change. Therefore, use the MEANSTR option only when you need to estimate certain mean structure parameters or when you fit models with nonsaturated mean structures.
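In practice, therefore, a complete-data analysis of this model would typically use the default settings, as in the following sketch, which omits the MEANSTR and VARDEF= options:

proc calis method=ml data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

This specification analyzes only the covariance structures with the unbiased covariance formula, and it is what you would typically use when the data are complete.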

However, use METHOD=FIML when your data contain missing values and you want to use all the information in the incomplete observations, under the assumption that the values are missing at random.