Input data often contains
an extremely large number of variables. In general, using all of the
variables in a predictive model is not practical, so variable selection
plays a critical role in modeling. The previous chapter used stepwise
regression to perform variable selection. However, this method might
not perform well when you are evaluating data sets that contain hundreds
of potential input variables. Furthermore, the stepwise
selection method is available only in the Regression node. Variable
selection is often more critical for the Neural Network node than
it is for the other modeling nodes, because a neural network generates
many more parameters than a regression model that uses the same number
of variables.
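
To make the parameter comparison concrete, the following sketch counts the parameters in a regression model and in a small neural network for the same number of inputs. It is an illustration only, not a description of how the Neural Network node is configured: it assumes a fully connected network with a single hidden layer, a single output, and bias terms, and the function names and the choice of three hidden units are hypothetical.

```python
def regression_param_count(n_inputs: int) -> int:
    """Parameters in a linear or logistic regression: one weight per input plus an intercept."""
    return n_inputs + 1


def mlp_param_count(n_inputs: int, n_hidden: int) -> int:
    """Parameters in an assumed network with one hidden layer and one output unit."""
    hidden_layer = n_hidden * (n_inputs + 1)  # weights plus one bias for each hidden unit
    output_layer = n_hidden + 1               # one weight per hidden unit plus an output bias
    return hidden_layer + output_layer


if __name__ == "__main__":
    # Compare the two counts as the number of input variables grows.
    for n_inputs in (10, 100, 500):
        print(
            n_inputs,
            regression_param_count(n_inputs),
            mlp_param_count(n_inputs, n_hidden=3),
        )
```

Under these assumptions, every additional input adds one parameter to the regression model but one parameter per hidden unit to the network, so reducing the number of inputs shrinks the network much faster than it shrinks the regression.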

Because variable selection is a critical phase in model building,
SAS Enterprise Miner provides
the Variable Selection node. Variables selected for analysis in the
Variable Selection node are available to any subsequent nodes. No
single variable selection method is universally best, so it is often
useful to compare several methods when you evaluate the importance of
each variable.

This chapter demonstrates how to use the Variable Selection node to
identify important variables.
For convenience, this chapter builds on the process flow diagram that
you created in the previous chapter. Here, variable selection is the
only step that you perform before you create a neural network model.
This model is then compared to the other models created in the
previous chapter.