Working with Nodes That Modify, Model, and Explore

About Missing Values

Many of the input variables in the Donor data set that you have been using have missing values. If an observation contains a missing value, then by default that observation is not used for modeling by nodes such as Variable Selection, Neural Network, or Regression.
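To see how quickly this default can shrink the training data, here is a minimal sketch in plain Python of dropping every observation that has any missing value. It illustrates the general idea only and is not Enterprise Miner code. The rows, the `complete_cases` helper, and the use of `None` to mark a missing value are all hypothetical.

```python
from typing import Optional

# Hypothetical observations: each row holds input values; None marks a missing value.
Row = list[Optional[float]]


def complete_cases(rows: list[Row]) -> list[Row]:
    """Keep only rows that have no missing values (the rest are ignored)."""
    return [row for row in rows if all(v is not None for v in row)]


training: list[Row] = [
    [25.0, 3.0, 1.0],
    [None, 5.0, 0.0],   # missing first input -> row is dropped
    [40.0, None, 1.0],  # missing second input -> row is dropped
    [33.0, 2.0, 0.0],
]

usable = complete_cases(training)
print(f"{len(usable)} of {len(training)} rows usable")  # 2 of 4 rows usable
```

A single missing input is enough to exclude a row, so when several variables have missing values, the set of fully observed rows can be much smaller than the data set itself.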

Whether missing values cause problems depends on the type of predictive model that you build. Decision trees handle missing values directly, so missing values pose no problem for a model that is based on a decision tree.

However, in Enterprise Miner, regression and neural network models ignore observations that contain missing values. If many observations have at least one missing value, ignoring them substantially reduces the size of the training data set, which can weaken these predictive models. It is therefore wise to impute missing values before you fit a regression model or a neural network model. When you replace missing values with imputed data, regression and neural network algorithms can perform whole-case analysis on the entire training data set. If you do not impute missing values for these models, the model is fit to fewer observations and might be inferior to one fit to the full training data. Imputation is also important if you plan to compare a regression model or neural network model with a decision tree model, because models are more appropriately compared when they are built on the same set of observations.
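As a rough illustration of imputation, the following sketch replaces each missing value with the mean of the observed values in that column, so that every row becomes usable. Mean imputation is only one simple technique, chosen here for brevity; this is not how Enterprise Miner implements imputation. The rows and the `impute_means` helper are hypothetical, and the sketch assumes every column has at least one observed value.

```python
from statistics import mean
from typing import Optional

# Hypothetical observations: None marks a missing value.
Row = list[Optional[float]]


def impute_means(rows: list[Row]) -> list[list[float]]:
    """Replace each missing value with its column's mean of observed values."""
    n_cols = len(rows[0])
    col_means: list[float] = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: each column has at least one observed value
        # (statistics.mean raises StatisticsError on an empty list).
        col_means.append(mean(observed))
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


training: list[Row] = [
    [25.0, 3.0, 1.0],
    [None, 5.0, 0.0],
    [40.0, None, 1.0],
    [33.0, 2.0, 0.0],
]

imputed = impute_means(training)
print(f"{len(imputed)} of {len(training)} rows usable")  # 4 of 4 rows usable
```

Because no row is discarded, a regression or neural network fit to `imputed` sees the same observations that a decision tree would. Note that filling in a constant mean understates the variability of the imputed variable, which is one reason more refined imputation methods exist.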
