The default neural network model does not perform any better than the
regression model. If both models performed poorly compared to the
decision tree model, poor performance might be due to how missing
values are handled. The tree model directly handles observations with
missing values while the regression and neural network models ignore
those observations. This is why the Regression model is significantly
worse than the Decision Tree model.
In the Neural Network and Regression (3) models, you transformed and
binned the variables before fitting the models. When the variables
were binned, classification variables were created and missing values
were assigned to a level for each of the new classification variables.
The Regression (2) model uses imputation to handle missing values.
Imputation replaces a missing value (perhaps an unusual value for the
variable) with an imputed value (a typical value for the variable).
It can therefore change an observation from being somewhat unusual
with respect to a particular variable to very typical with respect to
that variable.
For example, if someone applied for a loan and had a missing value for
INCOME, the Impute node would replace that value with the mean value
of INCOME by default. In practice, however, someone who has an average
value for INCOME would often be evaluated differently than someone
with a missing value for INCOME. Any models that follow the Impute
node would not be able to distinguish between these two applicants.
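The loss of information can be seen in a small sketch outside
Enterprise Miner. The Python below is illustrative only: the INCOME
values and the helper impute_mean are hypothetical, and the code mimics
only the default mean replacement described above, not the Impute
node's actual implementation.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical INCOME values; None marks a missing value.
income = [30000, 50000, None, 70000, 50000]
imputed = impute_mean(income)

# The applicant at index 2 (originally missing) now carries the same
# INCOME as the applicant at index 1, who really reported the mean.
print(imputed[1], imputed[2])
```

After imputation, nothing in the INCOME column separates the two
applicants, which is the problem described above.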
One solution to this problem is to create missing value indicator
variables. These variables indicate whether an observation had a
missing value before imputation was performed. The missing value
indicators enable the Regression and Neural Network nodes to
differentiate between observations that originally had missing values
and observations that did not. The addition of missing value
indicators can greatly improve a neural network or regression model.
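As a rough illustration of the idea, in Python rather than in
Enterprise Miner, the sketch below builds a 0/1 indicator alongside
the imputed values. The function name, variable names, and INCOME
values are hypothetical, and the names that the Impute node gives to
its indicator variables may differ.

```python
def impute_with_indicator(values):
    """Mean-impute None entries and return (imputed, indicator).

    indicator[i] is 1 if values[i] was missing before imputation, else 0.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    indicator = [1 if v is None else 0 for v in values]
    imputed = [fill if v is None else v for v in values]
    return imputed, indicator

# Hypothetical INCOME values; None marks a missing value.
income = [30000, 50000, None, 70000, 50000]
income_imputed, income_was_missing = impute_with_indicator(income)

# Applicants 1 and 2 share the same imputed INCOME, but the indicator
# column still tells them apart.
print(list(zip(income_imputed, income_was_missing)))
```

A model that receives both columns as inputs can assign a separate
effect to the indicator, so an originally missing INCOME is no longer
treated as an average INCOME.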
Recall that you chose not to use indicator variables earlier. You are
going to reverse that decision now. In your diagram workspace, select
the Impute node.
Locate the Type property in the Indicator Variables subgroup. Set the
value of this property to Unique.
Set the value of the Source property to Missing Values. Set the value
of the Role property to Input.
In the diagram workspace, select the Model Comparison node.
Right-click the Model Comparison node and click Run.
In the Confirmation window, click Yes.
In the Run Status window, click Results.
Notice that the Regression (2) model now outperforms the Neural
Network model. In fact, it is the best model in the first decile of
the training data.
Close the Results window.
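The first-decile comparison mentioned in the steps above can be made
concrete with a short calculation. The Python sketch below assumes
that the statistic of interest is cumulative lift in the top 10% of
observations ranked by predicted score; the function name, scores, and
targets are made up for illustration and are not taken from the
tutorial data.

```python
def first_decile_lift(scores, targets):
    """Cumulative lift in the top 10% of observations by predicted score.

    Lift is the event rate in the top decile divided by the overall
    event rate. targets holds 1 for an event and 0 otherwise.
    """
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    overall_rate = sum(targets) / len(targets)
    return top_rate / overall_rate

# Hypothetical data: 20 observations, 4 of which are events.
scores = [i / 20 for i in range(20)]
targets = [1 if i in (3, 10, 18, 19) else 0 for i in range(20)]

lift = first_decile_lift(scores, targets)
print(lift)
```

Computing the same quantity for each candidate model on the same data
gives one way to rank the models in the first decile.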
In general, it is impossible to know in advance which model will
provide the best results when it is applied to new data. For this data
set, or any other, a good analyst considers many variations of each
model and identifies the best model according to their own criteria.
In this case, assume that Regression (3) is the selected model.