The HPIMPUTE procedure can use four methods to impute missing values of numeric variables. This example uses all four imputation methods that are available in the IMPUTE statement. The following SAS DATA step creates the SAS data set ex1, which has six variables. The first four variables (a, b, c, and d) all have some missing values, the fifth variable (freq) is the frequency variable, and the last variable (id) is an index variable.
data ex1;
   input a b c d freq id;
   cards;
2 3 1 1 2 1
2 2 2 2 3 2
. 0 3 . 0 3
2 3 . . . 4
2 . . . -5 5
. 6 . . 3 6
. 4 . . 4 7
2 5 . . 3 8
. 6 9 9 1 9
2 3 10 10 3 10
;
run;
The following statements include four IMPUTE statements, each of which specifies a different imputation method. The INPUT statement specifies the input variables, the ID statement carries the index variable id into the output data set, and the FREQ statement names freq as the frequency variable. Because the input variables are numeric, PROC HPIMPUTE assumes that they have an interval level of measurement.
proc hpimpute data=ex1 out=out1;
   id id;
   input a b c d;
   impute a / value=0.1;
   impute b / method=pmedian;
   impute c / method=random;
   impute d / method=mean;
   freq freq;
run;
Figure 7.1 shows the imputation results. The Variable column shows the original variable names from the input data set. The Imputation Indicator column shows the name of the indicator variable in the output data set; that variable is 1 for an observation whose value was imputed and 0 otherwise. The Imputed Variable column shows the names of the imputed variables in the output data set. The N Missing column shows the number of observations in which each variable is missing. The Type of Imputation column shows the imputation method: Given Value, Pseudo Median, Random (between the minimum and the maximum of the nonmissing values), and Mean. For random imputation, the last column shows the imputation seed. For the other methods, it shows the value that replaces the missing values.
Figure 7.1: HPIMPUTE Getting Started Example Output
Imputation Results

| Variable | Imputation Indicator | Imputed Variable | N Missing | Type of Imputation | Imputation Value (Seed) |
|---|---|---|---|---|---|
| a | M_a | IM_a | 4 | Given value | 0.10000 |
| b | M_b | IM_b | 1 | Pseudo Median | 4.00000 |
| c | M_c | IM_c | 5 | Random | 5.00000 |
| d | M_d | IM_d | 6 | Mean | 5.22222 |
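As a worked check on the Mean row, the value 5.22222 for d agrees with a frequency-weighted mean of the nonmissing values of d. In this weighted mean, each observation counts freq times. This is a hand calculation from the ex1 data, not a description of the procedure's internals. Only observations 1, 2, 9, and 10 have nonmissing d, and their freq values are 2, 3, 1, and 3:

$$
\bar{d}_{\text{freq}} = \frac{\sum_i f_i d_i}{\sum_i f_i}
= \frac{2(1) + 3(2) + 1(9) + 3(10)}{2 + 3 + 1 + 3}
= \frac{47}{9} \approx 5.22222
$$

An unweighted mean of the same four values would be (1 + 2 + 9 + 10)/4 = 5.5. The frequency variable therefore changes the imputed value.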