The HPIMPUTE Procedure

Getting Started: HPIMPUTE Procedure

The HPIMPUTE procedure can use four methods to impute missing numeric values. This example applies all four imputation methods that are available in the IMPUTE statement to a single data set. The following SAS DATA step creates the SAS data set ex1, which has six variables: the first four variables (a, b, c, and d) each have some missing values, the fifth variable (freq) is the frequency variable, and the last variable (id) is an index variable.

data ex1;
   input a b c d freq id;
   cards;
 2  3   1   1   2   1
 2  2   2   2   3   2
 .  0   3   .   0   3
 2  3   .   .   .   4
 2  .   .   .  -5   5
 .  6   .   .   3   6
 .  4   .   .   4   7
 2  5   .   .   3   8
 .  6   9   9   1   9
 2  3  10  10   3  10
;
run;

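The following statements include four IMPUTE statements, each of which specifies a different imputation method. The INPUT statement specifies the input variables, the FREQ statement names freq as the frequency variable, and the ID statement names id as the variable to carry over to the output data set. PROC HPIMPUTE assumes that the input variables have an interval level of measurement because they are numeric.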

proc hpimpute data=ex1 out=out1;
   id id;
   input a b c d;
   impute a / value=0.1;
   impute b / method=pmedian;
   impute c / method=random;
   impute d / method=mean;
   freq freq;
run;
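
To see the effect of the imputation on individual observations, the output data set can be printed. The following is a minimal sketch that uses only the standard PRINT procedure and the data set name out1 from the OUT= option above. It lists every variable that the procedure wrote and makes no assumption about their names.

```sas
/* Print all variables in the output data set created by PROC HPIMPUTE */
proc print data=out1;
run;
```

The printed listing can then be compared row by row with the original values in ex1.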

Figure 7.1 shows the imputation results. The Variable column shows the original variable names from the input data set. The Imputation Indicator column shows the name of the indicator variable in the output data set, which is 1 for observations whose value was imputed and 0 otherwise. The Imputed Variable column shows the names of the imputed variables in the output data set. The N Missing column shows the number of missing values for each variable. The Type of Imputation column shows the imputation method: Given Value, Pseudo Median, Random (between the minimum value and the maximum value of the nonmissing values), or Mean. For random imputation, the last column shows the imputation seed. For the other imputation methods, the last column shows the value that replaces missing values.

Figure 7.1: HPIMPUTE Getting Started Example Output

The HPIMPUTE Procedure

Imputation Results
Variable   Imputation Indicator   Imputed Variable   N Missing   Type of Imputation   Imputation Value (Seed)
a          M_a                    IM_a               4           Given value          0.10000
b          M_b                    IM_b               1           Pseudo Median        4.00000
c          M_c                    IM_c               5           Random               5.00000
d          M_d                    IM_d               6           Mean                 5.22222
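
The Mean value reported for d can be checked by hand. The sketch below assumes that freq acts as a repetition count for each observation. The symbols f_i (frequency) and d_i (value of d) are introduced here for illustration and do not appear in the procedure output. Only observations 1, 2, 9, and 10 have a nonmissing value of d (1, 2, 9, and 10), with frequencies 2, 3, 1, and 3:

```latex
\bar{d} = \frac{\sum_i f_i \, d_i}{\sum_i f_i}
        = \frac{2 \cdot 1 + 3 \cdot 2 + 1 \cdot 9 + 3 \cdot 10}{2 + 3 + 1 + 3}
        = \frac{47}{9} \approx 5.22222
```

This agrees with the imputation value shown for d in the Imputation Results table.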