Problem Note 62873: The "Missing Values and Imputed Values" example in the chapter "The HPFOREST Procedure" gives incorrect information
In the SAS® Enterprise Miner™ High-Performance Procedures documentation, the chapter "The HPFOREST Procedure" contains an example called "Missing Values and Imputed Values". The stated purpose of the example is as follows:
This example uses the Home Equity data from the SAS sample library to illustrate the difference between using missing values and using imputed values.
However, the conclusion that a difference exists is based on a false premise. You can ignore the example.
DETAILS
proc hpimpute data=sampsio.hmeq out=imout;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq / method=mean;
run;

data job_reason;
   set sampsio.hmeq;
   if job='' then job="Other";
   if reason='' then reason="DebtCon";
run;

data imout;
   merge imout job_reason;
run;
The DATA step that creates the merged table imout does not pair the values that were imputed by the PROC HPIMPUTE invocation with the correct observations. The mismatch occurs because the OUT= data set that is created by PROC HPIMPUTE does not preserve the observation order of the DATA= data set when the procedure runs in multithreaded mode (the default). Because the MERGE statement in the DATA step has no BY statement, observations are combined by position, and so the merged observations are not correctly matched.
The only way to guarantee data order is by using a PERFORMANCE statement with the NTHREADS=1 option in the PROC HPIMPUTE invocation.
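As a minimal sketch of that workaround, the PROC HPIMPUTE step from the example can be rerun with a PERFORMANCE statement added. Everything except the PERFORMANCE statement is copied from the example code above; the rest of the example (the job_reason DATA step and the MERGE) is assumed to stay unchanged.

```sas
/* Run the imputation single-threaded so that, per this note, the     */
/* OUT= data set keeps the observation order of the DATA= data set.   */
proc hpimpute data=sampsio.hmeq out=imout;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq / method=mean;
   performance nthreads=1;   /* the only change from the documented example */
run;
```

With the order preserved, the positional MERGE in the example pairs each imputed row with its original observation.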
If the MERGE is done correctly, then the variable-importance rankings and misclassification rates are similar with and without imputation. In that case, the premise "imputing variables reduce the predictive power of the variables" is no longer valid.
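A different way to make the MERGE correct, sketched here as an assumption rather than as the documented fix, is to stop relying on row order and match on a key instead. The variable rowid and the table hmeq_keyed are hypothetical names introduced for illustration. The sketch assumes that the ID statement of PROC HPIMPUTE copies rowid into the OUT= data set. This approach does not guarantee the order of the OUT= data set; it restores the pairing afterward with a sort and a BY statement.

```sas
/* Add a row key (hypothetical name: rowid) and fill the missing      */
/* class values exactly as the original example does.                 */
data hmeq_keyed;
   set sampsio.hmeq;
   rowid = _n_;
   if job='' then job="Other";
   if reason='' then reason="DebtCon";
run;

/* Carry the key through the imputation (assumes the ID statement     */
/* transfers rowid to the OUT= data set).                             */
proc hpimpute data=hmeq_keyed out=imout;
   id rowid;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq / method=mean;
run;

/* The OUT= order is not guaranteed in multithreaded mode, so sort    */
/* by the key before merging.                                         */
proc sort data=imout;
   by rowid;
run;

/* Match observations on the key instead of on position. */
data imout;
   merge imout hmeq_keyed;
   by rowid;
run;
```

Because the match is made on rowid, the result does not depend on how many threads PROC HPIMPUTE uses.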
Operating System and Release Information
Product Family: | SAS System |
Product: | SAS High-Performance Data Mining |
System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed* |
Microsoft® Windows® for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
64-bit Enabled AIX | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
64-bit Enabled Solaris | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
Linux for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
Solaris for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | medium |
Date Modified: | 2018-09-06 16:04:37 |
Date Created: | 2018-09-06 12:51:39 |