 The CALIS Procedure

## Example 25.13 The Full Information Maximum Likelihood Method

This example shows how you can use all available information in the data when a high proportion of observations have randomly missing values. You use the full-information maximum likelihood (FIML) method for model estimation.

In Example 25.11, 32 students take six tests. These six tests are indicator measures of two ability factors: verbal and math. In Example 25.11, you conduct a confirmatory factor analysis on a data set without any missing values. In the path diagram for this confirmatory factor model, the verbal factor has paths to x1, x2, and x3, and the math factor has paths to y1, y2, and y3.
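In matrix form, the confirmatory factor model of Example 25.11 can be sketched as follows. This sketch assumes the usual PROC CALIS defaults for a FACTOR model (a free covariance φ between the two factors, and uncorrelated error terms whose free variances form the diagonal matrix Ψ); the symbols λ1 to λ6 are introduced here only for illustration and correspond to the loading parameters _Parm1 to _Parm6 in the outputs of this example. The unit diagonal of Φ reflects the factor variances fixed at 1 in the PVAR statement.

```latex
% Measurement model; lambda_k corresponds to _Parmk (illustrative notation)
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
=
\underbrace{\begin{pmatrix}
\lambda_1 & 0 \\ \lambda_2 & 0 \\ \lambda_3 & 0 \\
0 & \lambda_4 \\ 0 & \lambda_5 \\ 0 & \lambda_6
\end{pmatrix}}_{\Lambda}
\begin{pmatrix} \text{verbal} \\ \text{math} \end{pmatrix}
+ \mathbf{e},
\qquad
\Phi = \begin{pmatrix} 1 & \phi \\ \phi & 1 \end{pmatrix},
\qquad
\Psi = \operatorname{diag}(\psi_1, \ldots, \psi_6),
\qquad
\Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Psi
```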

Suppose now that, because of sickness or other unexpected events, some students cannot take one or more of these tests. The data set now contains missing values at various locations, as shown in the following DATA step:

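```sas
/* x1-x3 measure verbal ability, y1-y3 measure math ability;   */
/* a period (.) in the data lines denotes a missing test score */
data missing;
   input x1 x2 x3 y1 y2 y3;
   datalines;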
23   .  16  15  14  16
29  26  23  22  18  19
14  21   .  15  16  18
20  18  17  18  21  19
25  26  22   .  21  26
26  19  15  16  17  17
.  17  19   4   6   7
12  17  18  14  16   .
25  19  22  22  20  20
7  12  15  10  11   8
29  24   .  14  13  16
28  24  29  19  19  21
12   9  10  18  19   .
11   .  12  15  16  16
20  14  15  24  23  16
26  25   .  24  23  24
20  16  19  22  21  20
14   .  15  17  19  23
14  20  13  24   .   .
29  24  24  21  20  18
26   .  26  28  26  23
20  23  24  22  23  22
23  24  20  23  22  18
14   .  17   .  16  14
28  34  27  25  21  21
17  12  10  14  12  16
.   1  13  14  15  14
22  19  19  13  11  14
18  21   .  15  18  19
12  12  10  13  13  16
22  14  20  20  18  19
29  21  22  13  17   .
;
```
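Before fitting any model, it can help to see how the missing values are spread across the six tests. A minimal sketch using PROC MEANS with the N and NMISS statistic keywords, assuming the WORK.MISSING data set created by the preceding DATA step:

```sas
/* Count nonmissing (N) and missing (NMISS) scores for each test */
proc means data=missing n nmiss;
   var x1-x3 y1-y3;
run;
```

Counting the periods in the data lines by hand gives 2, 5, 4, 2, 1, and 4 missing values for x1, x2, x3, y1, y2, and y3, respectively: 18 missing values spread over 16 of the 32 records.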

This data set is the same as the scores data set in Example 25.11, except that some values have been replaced at random with missing values. You can still fit the confirmatory factor model described in Example 25.11 to this data set by using the default maximum likelihood (ML) method, as shown in the following statements:

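```sas
proc calis data=missing;        /* METHOD=ML is the default           */
   factor
      verbal ---> x1-x3,        /* verbal factor loads on x1, x2, x3  */
      math   ---> y1-y3;        /* math factor loads on y1, y2, y3    */
   pvar
      verbal = 1.,              /* fix factor variances at 1 to       */
      math   = 1.;              /* set the scale of the factors       */
run;
```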

The data set, the number of observations, the model type, and the analysis type are shown in the first table of Output 25.13.1. Although PROC CALIS reads all 32 records in the data set, it uses only 16 of them. Each of the remaining 16 records contains at least one missing test score, so it is discarded from the analysis (listwise deletion). Therefore, the maximum likelihood method uses only the 16 observations without missing values.
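Listwise deletion can be made explicit. The following sketch assumes the WORK.MISSING data set and uses a hypothetical output data set named COMPLETE; it keeps only the records for which the NMISS function finds no missing score:

```sas
/* COMPLETE is a hypothetical name: it holds only the complete records */
data complete;
   set missing;
   if nmiss(of x1-x3 y1-y3) = 0;   /* subsetting IF: drop incomplete records */
run;
```

The SAS log should report that WORK.COMPLETE has 16 observations. Because the default ML method in PROC CALIS uses exactly these records, running the same PROC CALIS step with DATA=COMPLETE should, in principle, reproduce the estimates in Output 25.13.2.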

Output 25.13.1 Modeling Information of the CFA Model: Missing Data
Confirmatory Factor Model With Missing Data: ML FACTOR Model Specification

The CALIS Procedure
Covariance Structure Analysis: Model and Initial Values

| Modeling Information | |
|---|---|
| Data Set | WORK.MISSING |
| N Records Used | 16 |
| N Obs | 16 |
| Model Type | FACTOR |
| Analysis | Covariances |

Output 25.13.2 shows the parameter estimates.

Output 25.13.2 Parameter Estimates of the CFA Model: Missing Data
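Factor Loading Matrix: Estimate/StdErr/t-value

| | verbal | math |
|---|---|---|
| x1 | 5.1110 / 1.3110 / 3.8984 [_Parm1] | 0 |
| x2 | 5.6261 / 1.2561 / 4.4790 [_Parm2] | 0 |
| x3 | 4.8739 / 1.1410 / 4.2717 [_Parm3] | 0 |
| y1 | 0 | 4.4529 / 0.8530 / 5.2205 [_Parm4] |
| y2 | 0 | 3.8562 / 0.8303 / 4.6444 [_Parm5] |
| y3 | 0 | 2.6338 / 0.7416 / 3.5513 [_Parm6] |

Factor Covariance Matrix: Estimate/StdErr/t-value

| | verbal | math |
|---|---|---|
| verbal | 1.0000 | |
| math | | 1.0000 |

Error Variances

| Variable | Parameter | Estimate | Standard Error | t Value |
|---|---|---|---|---|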

Most of the factor loading estimates in Output 25.13.2 are similar to those estimated from the data set without missing values, which are shown in Output 25.11.4. The loading of y3 on the math factor shows the largest discrepancy: with only half of the data used in the current estimation, it is 2.6338, compared with 3.7596 when no data are missing. Another obvious difference between the two sets of results is that the standard error estimates for the loadings are consistently larger in the current analysis than in Example 25.11, where no data are missing. This is expected because only half of the data set is available in the current analysis.
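As a rough check on the effect of using only half of the data, assume the common large-sample approximation that a standard error shrinks in proportion to 1/√N. Halving N from 32 to 16 would then inflate each standard error by about √2. This is a rule of thumb for intuition only, not a quantity that PROC CALIS reports:

```latex
\frac{\mathrm{SE}_{N=16}}{\mathrm{SE}_{N=32}} \approx \sqrt{\frac{32}{16}} = \sqrt{2} \approx 1.41
```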

Likewise, the estimates of the factor covariance and the error variances are close to those from the complete-data analysis, but their standard error estimates are consistently higher in the current analysis.

The maximum likelihood method, as implemented in PROC CALIS, excludes from estimation every observation that has at least one missing value. The observed values in these deleted observations are therefore wasted. This greatly reduces the efficiency of the estimation and results in higher standard error estimates.

To use all the available information in a data set that contains missing values, you can use the full-information maximum likelihood (FIML) method in PROC CALIS, as shown in the following statements:

```sas
proc calis method=fiml data=missing;   /* full-information ML        */
   factor
      verbal ---> x1-x3,               /* same factor model as above */
      math   ---> y1-y3;
   pvar
      verbal = 1.,                     /* factor variances fixed at 1 */
      math   = 1.;
run;
```

In the PROC CALIS statement, you use METHOD=FIML to request the full-information maximum likelihood method. Instead of deleting observations with missing values, the full-information maximum likelihood method uses all available information in all observations. Output 25.13.3 shows some modeling information of the FIML estimation of the confirmatory factor model on the missing data.
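The phrase "uses all available information" can be written as a casewise log-likelihood. The following is the standard textbook form of FIML under multivariate normality, given as a sketch; the notation is introduced here, and the Details section of the CALIS documentation may write or scale the objective function differently. Observation i contributes only its p_i observed values x_i, together with the matching subvector μ_i(θ) of the model-implied mean vector and the matching submatrix Σ_i(θ) of the model-implied covariance matrix:

```latex
\log L(\theta) = \sum_{i=1}^{N} \log L_i(\theta),
\qquad
\log L_i(\theta) = -\frac{1}{2}\Big[\, p_i \log(2\pi)
  + \log\bigl|\Sigma_i(\theta)\bigr|
  + \bigl(\mathbf{x}_i - \boldsymbol{\mu}_i(\theta)\bigr)^{\top}
    \Sigma_i(\theta)^{-1}
    \bigl(\mathbf{x}_i - \boldsymbol{\mu}_i(\theta)\bigr) \Big]
```

For a complete record, p_i = 6 and Σ_i(θ) is the full 6×6 matrix. For the student whose x2 and y1 scores are missing, p_i = 4 and Σ_i(θ) is the 4×4 submatrix for x1, x3, y2, and y3, so that student's four observed scores still contribute to the estimation instead of being discarded.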

Output 25.13.3 Modeling Information of the CFA Model with FIML: Missing Data
Confirmatory Factor Model With Missing Data: FIML FACTOR Model Specification

The CALIS Procedure
Mean and Covariance Structures: Model and Initial Values

| Modeling Information | |
|---|---|
| Data Set | WORK.MISSING |
| N Complete Records | 16 |
| N Incomplete Records | 16 |
| N Complete Obs | 16 |
| N Incomplete Obs | 16 |
| Model Type | FACTOR |
| Analysis | Means and Covariances |

PROC CALIS shows that the data set contains 16 complete observations and 16 incomplete observations, and all 32 observations are included in the estimation. The analysis type is 'Means and Covariances' because, with full-information maximum likelihood, the means must be analyzed during the estimation: different observations contribute different subsets of variables, so the means generally cannot be replaced by the sample means before the covariance structure is estimated.
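Because FIML treats each missing-data pattern with its own subset of variables, it can be useful to tabulate the patterns. A sketch, assuming the WORK.MISSING data set and a hypothetical data set named PATTERNS, that encodes each record as a six-character string (X = observed, . = missing) in the order x1, x2, x3, y1, y2, y3:

```sas
/* PATTERNS is a hypothetical name: one pattern string per record */
data patterns;
   set missing;
   length pattern $6;
   array score{6} x1-x3 y1-y3;
   do j = 1 to 6;
      /* mark each test as observed (X) or missing (.) */
      substr(pattern, j, 1) = ifc(missing(score{j}), '.', 'X');
   end;
   drop j;
run;

/* Frequency of each missing-data pattern */
proc freq data=patterns;
   tables pattern / nocum;
run;
```

From the data lines, this should yield eight distinct patterns, with the complete pattern XXXXXX occurring 16 times, in agreement with the 16 complete and 16 incomplete records reported in Output 25.13.3.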

Output 25.13.4 shows the parameter estimates.

Output 25.13.4 Parameter Estimates of the CFA Model with FIML: Missing Data
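Factor Loading Matrix: Estimate/StdErr/t-value

| | verbal | math |
|---|---|---|
| x1 | 5.5003 / 1.0025 / 5.4867 [_Parm1] | 0 |
| x2 | 5.7134 / 0.9956 / 5.7385 [_Parm2] | 0 |
| x3 | 4.4417 / 0.7669 / 5.7918 [_Parm3] | 0 |
| y1 | 0 | 4.9277 / 0.6798 / 7.2491 [_Parm4] |
| y2 | 0 | 4.1215 / 0.5716 / 7.2100 [_Parm5] |
| y3 | 0 | 3.3834 / 0.6145 / 5.5058 [_Parm6] |

Factor Covariance Matrix: Estimate/StdErr/t-value

| | verbal | math |
|---|---|---|
| verbal | 1.0000 | |
| math | | 1.0000 |

Error Variances

| Variable | Parameter | Estimate | Standard Error | t Value |
|---|---|---|---|---|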
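To quantify the efficiency gain, you can compare the loading standard errors from the two analyses. A short sketch that copies the values from Output 25.13.2 (ML with listwise deletion) and Output 25.13.4 (FIML) into a hypothetical data set named SE_COMPARE and computes their ratio:

```sas
/* SE_COMPARE is a hypothetical name; values copied from    */
/* Output 25.13.2 (se_ml) and Output 25.13.4 (se_fiml)      */
data se_compare;
   input parm $ se_ml se_fiml;
   ratio = se_ml / se_fiml;   /* ratio > 1: FIML standard error is smaller */
   datalines;
_Parm1 1.3110 1.0025
_Parm2 1.2561 0.9956
_Parm3 1.1410 0.7669
_Parm4 0.8530 0.6798
_Parm5 0.8303 0.5716
_Parm6 0.7416 0.6145
;

proc print data=se_compare noobs;
   format ratio 6.3;
run;
```

All six ratios exceed 1, ranging from about 1.21 for _Parm6 (y3) to about 1.49 for _Parm3 (x3), so in this example FIML gives a smaller standard error for every loading.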