The CALIS Procedure

Example 29.16 Comparing the ML and FIML Estimation

This example uses the complete data set from Example 29.12 to illustrate that the maximum likelihood (ML) and full information maximum likelihood (FIML) methods are theoretically equivalent when you apply them to a data set without missing values. In Example 29.15, you apply a confirmatory factor model to a data set with missing values and find that METHOD=FIML gives more stable estimates than METHOD=ML (which is the default estimation method). Near the end of Example 29.15, you learn that ML and FIML are theoretically equivalent estimation methods when you apply them to data sets without missing values.

However, the ML and FIML methods have two major computational differences in their implementations in PROC CALIS. First, with METHOD=FIML the first-order properties (that is, the means of the variables) of the data are automatically included in the analysis. By contrast, with METHOD=ML you analyze only the second-order properties (that is, the covariances of the variables) by default. Second, the biased sample covariance formula (with N as the variance divisor) is used with METHOD=FIML, while the unbiased sample covariance formula (with DF=N – 1 as the variance divisor) is used with METHOD=ML. See the section Relationships among Estimation Criteria for more details about the similarities and differences between the ML and FIML methods.
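The second difference can be written out explicitly. The following is a sketch in standard notation; the symbols (x_i for the i-th observation vector, x-bar for the sample mean vector, S_N and S_{N-1} for the two covariance estimates) are introduced here for illustration and do not appear in the PROC CALIS output.

```latex
\begin{align*}
\mathbf{S}_{N}   &= \frac{1}{N}\sum_{i=1}^{N}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\prime}
  && \text{(divisor } N \text{, used with METHOD=FIML)} \\
\mathbf{S}_{N-1} &= \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\prime}
  && \text{(divisor } N-1 \text{, default with METHOD=ML)} \\
\mathbf{S}_{N}   &= \frac{N-1}{N}\,\mathbf{S}_{N-1}
\end{align*}
```

The two matrices differ only by the scalar factor (N – 1)/N, which approaches 1 as the sample size grows.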

If you take care of these two differences between ML and FIML in PROC CALIS, you can obtain exactly the same results with these two methods when you apply them to data sets without missing values.

For example, with the complete data set scores from Example 29.12, you specify the FIML estimation in the following statements:

proc calis method=fiml data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

An equivalent specification with the ML method is shown in the following statements:

proc calis method=ml meanstr vardef=n data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

In the PROC CALIS statement, you specify two options to make the ML estimation exactly equivalent to the FIML estimation in PROC CALIS. First, the MEANSTR option requests that the first-order properties (the mean structures) be analyzed together with the covariance structures. Second, the VARDEF=N option sets the variance divisor to N, instead of the default DF, which is the same as N–1.
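If you want to inspect the effect of the variance divisor on the input covariance matrix itself, one possible approach is to compute the matrix both ways with PROC CORR. This is only an illustrative sketch: it assumes the scores data set contains the analysis variables x1-x3 and y1-y3, and it is not part of the original example.

```sas
/* Covariance matrix with the default divisor DF = N - 1 */
/* (the divisor that METHOD=ML uses by default)          */
proc corr data=scores cov nocorr;
   var x1-x3 y1-y3;
run;

/* Covariance matrix with the divisor N        */
/* (the divisor that METHOD=FIML always uses)  */
proc corr data=scores cov nocorr vardef=n;
   var x1-x3 y1-y3;
run;
```

Each element of the second matrix should equal the corresponding element of the first matrix multiplied by (N – 1)/N.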

Output 29.16.1 and Output 29.16.2 show some fit summary statistics under the FIML and ML methods, respectively.

Output 29.16.1: Model Fitting by the FIML Method: Scores Data

Fit Summary
Fit Function	31.7837
Chi-Square	10.1215
Chi-Square DF	8
Pr > Chi-Square	0.2566
Standardized RMR (SRMR)	0.0571
RMSEA Estimate	0.0910
Bentler Comparative Fit Index	0.9872
NOTE: Saturated mean structure parameters are excluded from the computations of fit indices.

Output 29.16.2: Model Fitting by the ML Method: Scores Data

Fit Summary
Fit Function	0.3163
Chi-Square	10.1215
Chi-Square DF	8
Pr > Chi-Square	0.2566
Standardized RMR (SRMR)	0.0504
RMSEA Estimate	0.0910
Bentler Comparative Fit Index	0.9872

Except for the fit function values, the FIML and ML methods produce the same model chi-square, degrees of freedom, p-value, RMSEA estimate, and comparative fit index. The difference in the fit function values is expected because the FIML function has a constant term that is derived from the likelihood function. This constant term does not depend on the model parameters. Hence, the FIML and ML discrepancy functions that are used in PROC CALIS are equivalent when VARDEF=N is used in the ML method for analyzing mean and covariance structures.
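The relationship between the two fit function values can be sketched as follows. The notation is an assumption made for illustration: Sigma and mu are the model-implied covariance matrix and mean vector, S is the sample covariance matrix computed with divisor N, x-bar is the sample mean vector, p is the number of variables, and K is a constant from the normal likelihood. The FIML expression is the complete-data simplification of the casewise likelihood, not a formula quoted from this example.

```latex
\begin{align*}
F_{\mathrm{ML}}   &= \ln|\boldsymbol{\Sigma}| - \ln|\mathbf{S}|
                    + \operatorname{tr}(\mathbf{S}\boldsymbol{\Sigma}^{-1}) - p
                    + (\bar{\mathbf{x}}-\boldsymbol{\mu})^{\prime}\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}-\boldsymbol{\mu}) \\
F_{\mathrm{FIML}} &= \ln|\boldsymbol{\Sigma}|
                    + \operatorname{tr}(\mathbf{S}\boldsymbol{\Sigma}^{-1})
                    + (\bar{\mathbf{x}}-\boldsymbol{\mu})^{\prime}\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}-\boldsymbol{\mu}) + K \\
F_{\mathrm{FIML}} - F_{\mathrm{ML}} &= \ln|\mathbf{S}| + p + K
\end{align*}
```

The difference depends only on the data and on constants, so both functions are minimized by the same parameter values. For the values reported in Output 29.16.1 and Output 29.16.2, the difference is 31.7837 – 0.3163 = 31.4674. As a further consistency check, the ratio of the ML chi-square to the ML fit function value, 10.1215 / 0.3163, is approximately 32.0, which is what you would expect if the chi-square is computed as N times the fit function value under VARDEF=N and the data set has N = 32 observations (the sample size is not stated in this example and is inferred here).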

The parameter estimates are shown in Output 29.16.3 and Output 29.16.4 for the FIML and ML methods, respectively. Except for tiny numerical differences in some estimates, the FIML and ML estimates match.

Output 29.16.3: Parameter Estimates by the FIML Method: Scores Data

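Factor Loadings
Variable	Factor	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	verbal	_Parm1	5.7486	0.9692	5.9315	<.0001
x2	verbal	_Parm2	5.7265	0.9278	6.1720	<.0001
x3	verbal	_Parm3	4.5886	0.7562	6.0676	<.0001
y1	math	_Parm4	5.1972	0.6796	7.6477	<.0001
y2	math	_Parm5	4.1342	0.6036	6.8489	<.0001
y3	math	_Parm6	3.7004	0.6177	5.9904	<.0001

Factor Covariances
Factor	Factor	Parameter	Estimate	Standard Error	t Value	Pr > |t|
verbal	verbal		1.0000			
verbal	math	_Add01	0.5175	0.1433	3.6113	0.000305
math	math		1.0000			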

Intercepts
Variable	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	_Add02	19.90625	1.17540	16.9357	<.0001
x2	_Add03	18.81250	1.14089	16.4893	<.0001
x3	_Add04	18.68750	0.92749	20.1486	<.0001
y1	_Add05	17.90625	0.93161	19.2208	<.0001
y2	_Add06	17.84375	0.78823	22.6377	<.0001
y3	_Add07	17.75000	0.76419	23.2272	<.0001

Error Variances
Variable	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	_Add08	11.16406	4.19243	2.6629	0.0077
x2	_Add09	8.85978	3.78083	2.3433	0.0191
x3	_Add10	6.47248	2.45789	2.6334	0.0085
y1	_Add11	0.76135	1.32719	0.5737	0.5662
y2	_Add12	2.79060	1.08465	2.5728	0.0101
y3	_Add13	4.99466	1.48010	3.3745	0.0007

Output 29.16.4: Parameter Estimates by the ML Method: Scores Data

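Factor Loadings
Variable	Factor	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	verbal	_Parm1	5.7486	0.9651	5.9567	<.0001
x2	verbal	_Parm2	5.7265	0.9239	6.1981	<.0001
x3	verbal	_Parm3	4.5885	0.7570	6.0617	<.0001
y1	math	_Parm4	5.1972	0.6779	7.6662	<.0001
y2	math	_Parm5	4.1341	0.6025	6.8612	<.0001
y3	math	_Parm6	3.7004	0.6143	6.0238	<.0001

Factor Covariances
Factor	Factor	Parameter	Estimate	Standard Error	t Value	Pr > |t|
verbal	verbal		1.0000			
verbal	math	_Add01	0.5175	0.1406	3.6800	0.000233
math	math		1.0000			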

Intercepts
Variable	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	_Add02	19.90625	1.17540	16.9357	<.0001
x2	_Add03	18.81250	1.14089	16.4893	<.0001
x3	_Add04	18.68750	0.92749	20.1486	<.0001
y1	_Add05	17.90625	0.93161	19.2208	<.0001
y2	_Add06	17.84375	0.78823	22.6377	<.0001
y3	_Add07	17.75000	0.76419	23.2272	<.0001

Error Variances
Variable	Parameter	Estimate	Standard Error	t Value	Pr > |t|
x1	_Add08	11.16365	4.06567	2.7458	0.0060
x2	_Add09	8.85925	3.65397	2.4246	0.0153
x3	_Add10	6.47288	2.47689	2.6133	0.0090
y1	_Add11	0.76124	1.23420	0.6168	0.5374
y2	_Add12	2.79066	1.04307	2.6754	0.0075
y3	_Add13	4.99461	1.40024	3.5670	0.0004

The equivalence between METHOD=ML and METHOD=FIML implies that if your data contain no missing values, you can just use METHOD=ML because it is computationally more efficient than the FIML method.

While the equivalence between ML and FIML is established here with the use of the VARDEF= and MEANSTR options (for data without missing values), it is not necessary in practice to use these options with METHOD=ML. The VARDEF= option is used in this example only to demonstrate the theoretical equivalence between METHOD=ML and METHOD=FIML. The VARDEF= option has very little effect if you have at least a moderate sample size (for example, 30 or more observations).
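For comparison, a routine ML analysis of the same complete data would simply omit both options. The following sketch reuses the model specification from this example; it analyzes covariance structures only and uses the default variance divisor DF = N – 1, so its standard errors and chi-square are not expected to match the FIML output exactly.

```sas
/* Default ML analysis: covariance structures only,  */
/* with the default variance divisor DF = N - 1      */
proc calis method=ml data=scores;
   factor
      verbal ===> x1-x3,
      math   ===> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;
```

Because METHOD=ML is the default estimation method, specifying it here is optional and is shown only to make the contrast with METHOD=FIML explicit.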

Merely adding the MEANSTR option to an analysis for data without missing values amounts to adding a saturated mean structure to a covariance structure analysis. In this case, the MEANSTR option only gives you more estimates that pertain to the mean structures, but the parameter estimates that pertain to the covariance structures do not change. Therefore, use the MEANSTR option only when you need to estimate certain mean structure parameters or when you fit models with nonsaturated mean structures.

However, use METHOD=FIML when there are missing values in your data and you need to use all available information from the incomplete observations with random missing values.