
The CALIS Procedure

Example 25.13 The Full Information Maximum Likelihood Method

This example shows how you can fully utilize all available information in the data when a high proportion of observations have randomly missing values. You use the full information maximum likelihood method for model estimation.

In Example 25.11, 32 students take six tests. These six tests are indicator measures of two ability factors: verbal and math. You conduct a confirmatory factor analysis in Example 25.11 based on a data set without any missing values. The path diagram for the confirmatory factor model is shown in the following figure:


Suppose now that, because of sickness or unexpected events, some students cannot take part in one or more of these tests. The data set now contains missing values at various locations, as indicated by the following DATA step:

data missing;
   input x1 x2 x3 y1 y2 y3;
   datalines;
 23   .  16  15  14  16
 29  26  23  22  18  19
 14  21   .  15  16  18
 20  18  17  18  21  19
 25  26  22   .  21  26
 26  19  15  16  17  17
  .  17  19   4   6   7
 12  17  18  14  16   .
 25  19  22  22  20  20
  7  12  15  10  11   8
 29  24   .  14  13  16
 28  24  29  19  19  21
 12   9  10  18  19   .
 11   .  12  15  16  16
 20  14  15  24  23  16
 26  25   .  24  23  24
 20  16  19  22  21  20
 14   .  15  17  19  23
 14  20  13  24   .   .
 29  24  24  21  20  18
 26   .  26  28  26  23
 20  23  24  22  23  22
 23  24  20  23  22  18
 14   .  17   .  16  14
 28  34  27  25  21  21
 17  12  10  14  12  16
  .   1  13  14  15  14
 22  19  19  13  11  14
 18  21   .  15  18  19
 12  12  10  13  13  16
 22  14  20  20  18  19
 29  21  22  13  17   .
;

This data set is similar to the scores data set used in Example 25.11, except that some values are replaced at random with missing values. You can still fit the same confirmatory factor-analysis model described in Example 25.11 to this data set by using the default maximum likelihood (ML) method, as shown in the following statements:

proc calis data=missing;
   factor
      verbal ---> x1-x3,
      math   ---> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

The data set, the number of observations, the model type, and the analysis type are shown in the first table of Output 25.13.1. Although PROC CALIS reads all 32 records in the data set, only 16 of these records are used. Each of the remaining 16 records contains at least one missing test score and is discarded from the analysis. Therefore, the maximum likelihood method uses only the 16 observations without missing values.

Output 25.13.1 Modeling Information of the CFA Model: Missing Data
Confirmatory Factor Model With Missing Data: ML
FACTOR Model Specification

The CALIS Procedure
Covariance Structure Analysis: Model and Initial Values

Modeling Information
Data Set WORK.MISSING
N Records Read 32
N Records Used 16
N Obs 16
Model Type FACTOR
Analysis Covariances
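
The record counts in this table can be cross-checked directly from the data. The following statements are a minimal sketch and are not part of the original example; the data set name misscheck and the variable names n_missing and complete are illustrative choices.

```sas
/* Sketch: count the missing values in each record of WORK.MISSING. */
/* MISSCHECK, N_MISSING, and COMPLETE are illustrative names.       */
data misscheck;
   set missing;
   n_missing = nmiss(of x1-x3 y1-y3);   /* number of missing test scores   */
   complete  = (n_missing = 0);         /* 1 = complete, 0 = incomplete    */
run;

/* Tabulate complete versus incomplete records and the number of    */
/* missing values per record.                                       */
proc freq data=misscheck;
   tables complete n_missing;
run;
```

For the data set listed earlier, the table for complete should show 16 records in each category, which agrees with the value of N Records Used.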

Output 25.13.2 shows the parameter estimates.

Output 25.13.2 Parameter Estimates of the CFA Model: Missing Data
Factor Loading Matrix: Estimate/StdErr/t-value
        verbal      math
x1      5.1110      0
        1.3110
        3.8984
        [_Parm1]
x2      5.6261      0
        1.2561
        4.4790
        [_Parm2]
x3      4.8739      0
        1.1410
        4.2717
        [_Parm3]
y1      0           4.4529
                    0.8530
                    5.2205
                    [_Parm4]
y2      0           3.8562
                    0.8303
                    4.6444
                    [_Parm5]
y3      0           2.6338
                    0.7416
                    3.5513
                    [_Parm6]

Factor Covariance Matrix: Estimate/StdErr/t-value
        verbal      math
verbal  1.0000      0.7050
                    0.1464
                    4.8165
                    [_Add1]
math    0.7050      1.0000
        0.1464
        4.8165
        [_Add1]

Error Variances
Variable  Parameter  Estimate   Standard Error  t Value
x1        _Add2      11.27773   5.19739         2.16988
x2        _Add3       6.33003   4.25356         1.48817
x3        _Add4       6.47402   3.61040         1.79316
y1        _Add5       0.57143   1.51781         0.37648
y2        _Add6       2.57992   1.47618         1.74770
y3        _Add7       4.59651   1.77777         2.58555

Most of the factor loading estimates shown in Output 25.13.2 are similar to those estimated from the data set without missing values, as shown in Output 25.11.4. The loading estimate of y3 on the math factor shows the largest discrepancy: it is 2.6338 in the current analysis, which uses only half of the data, but 3.7596 in the analysis with no missing data (Output 25.11.4). Another obvious difference between the two sets of results is that the standard error estimates for the loadings are consistently larger in the current analysis than in the analysis in Example 25.11, where there are no missing data. This is expected because only half of the data set is available in the current analysis.

Similarly, the estimates for the factor covariance and error variances are mostly similar to those in the analysis with complete data, but the standard error estimates in the current analysis are consistently higher.

The maximum likelihood method, as implemented in PROC CALIS, excludes from the estimation all observations that have at least one missing value. In a sense, the partially available information in these deleted observations is wasted. This greatly reduces the efficiency of the estimation, which results in higher standard error estimates.

To fully utilize all available information from a data set in the presence of missing values, you can use the full information maximum likelihood (FIML) method in PROC CALIS, as shown in the following statements:


proc calis method=fiml data=missing;
   factor
      verbal ---> x1-x3,
      math   ---> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;

In the PROC CALIS statement, you use METHOD=FIML to request the full-information maximum likelihood method. Instead of deleting observations with missing values, the full-information maximum likelihood method uses all available information in all observations. Output 25.13.3 shows some modeling information of the FIML estimation of the confirmatory factor model on the missing data.

Output 25.13.3 Modeling Information of the CFA Model with FIML: Missing Data
Confirmatory Factor Model With Missing Data: FIML
FACTOR Model Specification

The CALIS Procedure
Mean and Covariance Structures: Model and Initial Values

Modeling Information
Data Set WORK.MISSING
N Records Read 32
N Complete Records 16
N Incomplete Records 16
N Complete Obs 16
N Incomplete Obs 16
Model Type FACTOR
Analysis Means and Covariances

PROC CALIS shows you that the number of complete observations is 16 and the number of incomplete observations is 16 in the data set. All these observations are included in the estimation. The analysis type is 'Means and Covariances' because with full information maximum likelihood, the sample means have to be analyzed during the estimation.
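
The role of the means can be seen from the general form of the casewise log-likelihood that FIML methods typically maximize under multivariate normality. The expression below is a sketch in assumed notation; it is not taken from the PROC CALIS output, and the exact function that PROC CALIS optimizes might differ by constants or scaling. Here $y_i$ denotes the vector of observed (nonmissing) scores for student $i$, $p_i$ is the number of those scores, and $\mu_i(\theta)$ and $\Sigma_i(\theta)$ are the model-implied mean vector and covariance matrix restricted to the variables observed for student $i$.

```latex
\ell(\theta) = -\frac{1}{2} \sum_{i=1}^{N}
  \left[ p_i \log(2\pi)
       + \log \left| \Sigma_i(\theta) \right|
       + \bigl( y_i - \mu_i(\theta) \bigr)^{\top}
         \Sigma_i(\theta)^{-1}
         \bigl( y_i - \mu_i(\theta) \bigr) \right]
```

Because each observation contributes its own term with its own subset of variables, the likelihood cannot be written in terms of a single sample covariance matrix, and the mean structure has to be part of the model.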

Output 25.13.4 shows the parameter estimates.

Output 25.13.4 Parameter Estimates of the CFA Model with FIML: Missing Data
Factor Loading Matrix: Estimate/StdErr/t-value
        verbal      math
x1      5.5003      0
        1.0025
        5.4867
        [_Parm1]
x2      5.7134      0
        0.9956
        5.7385
        [_Parm2]
x3      4.4417      0
        0.7669
        5.7918
        [_Parm3]
y1      0           4.9277
                    0.6798
                    7.2491
                    [_Parm4]
y2      0           4.1215
                    0.5716
                    7.2100
                    [_Parm5]
y3      0           3.3834
                    0.6145
                    5.5058
                    [_Parm6]

Factor Covariance Matrix: Estimate/StdErr/t-value
        verbal      math
verbal  1.0000      0.5014
                    0.1473
                    3.4029
                    [_Add01]
math    0.5014      1.0000
        0.1473
        3.4029
        [_Add01]

Error Variances
Variable  Parameter  Estimate   Standard Error  t Value
x1        _Add08     12.72770   4.77627         2.66478
x2        _Add09      9.35994   4.48806         2.08552
x3        _Add10      5.67393   2.69872         2.10246
y1        _Add11      1.86768   1.36676         1.36650
y2        _Add12      1.49942   0.97322         1.54067
y3        _Add13      5.24973   1.54121         3.40623

First, you can compare the current FIML results with the results in Example 25.11, where the maximum likelihood method is used with the complete data set. Overall, the estimates of loadings, factor covariance, and error variances are similar in the two analyses. Next, you can compare the current FIML results with the results in Output 25.13.2, where the default ML method is applied to the same data set with missing values. The standard error estimate of the factor covariance is very similar with ML and FIML (0.1464 versus 0.1473). For all six loadings and for five of the six error variances, the standard error estimates with FIML are smaller than those with ML in Output 25.13.2; the exception is the error variance of x2 (4.48806 with FIML versus 4.25356 with ML). This means that with FIML, you generally improve the estimation efficiency by including the partial information in those observations with missing values.

When you have a data set with no missing values, the ML and FIML methods, as implemented in PROC CALIS, are theoretically the same. Both are equally efficient and produce similar estimates (see Example 25.14). FIML and ML are the same estimation technique: both maximize the likelihood function under the multivariate normal distribution. However, in PROC CALIS, the distinction between ML and FIML concerns the treatment of missing values. With METHOD=ML, all observations with one or more missing values are discarded from the analysis. With METHOD=FIML, all observations with at least one nonmissing value are included in the analysis.
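
One informal way to see this equivalence with the current data is to apply FIML to the complete records only, so that neither method has any missing values to handle. The following statements are a sketch under that assumption and are not part of the original example; the WHERE= data set option is simply one illustrative way to select the complete records.

```sas
/* Sketch: fit the same model by FIML, restricted to records in     */
/* which none of the six test scores is missing. The WHERE=         */
/* condition is an illustrative way to select those records.        */
proc calis method=fiml
           data=missing(where=(nmiss(x1, x2, x3, y1, y2, y3) = 0));
   factor
      verbal ---> x1-x3,
      math   ---> y1-y3;
   pvar
      verbal = 1.,
      math   = 1.;
run;
```

If the two methods behave as described, the estimates from this run should be similar to the ML estimates in Output 25.13.2, which are based on the same 16 complete records. Small numerical differences are possible.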
