Introduction to Variable Selection

Input data often contains an extremely large number of variables. In general, using all of the variables in a predictive model is not practical, so variable selection plays a critical role in modeling. The previous chapter used stepwise regression to perform variable selection. However, this method might not perform well when you evaluate data sets that contain hundreds of potential input variables. Furthermore, keep in mind that the stepwise selection method is available only in the Regression node. Variable selection is often more critical for the Neural Network node than it is for the other modeling nodes, because a neural network estimates many more parameters than a regression model that uses the same number of input variables.
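To make the parameter comparison concrete, the following Python sketch counts the parameters in a regression model and in a fully connected neural network with one hidden layer. It is an illustration only and is not part of SAS Enterprise Miner. The function names and the choice of three hidden units are assumptions made for this example, and the counts assume that every input is a single numeric variable (a class input would contribute one parameter per dummy level instead).

```python
def regression_param_count(n_inputs: int) -> int:
    """One coefficient per input plus an intercept."""
    return n_inputs + 1


def mlp_param_count(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights and biases for a fully connected network with one hidden layer."""
    # Each hidden unit has one weight per input plus a bias.
    hidden_layer = n_hidden * (n_inputs + 1)
    # Each output unit has one weight per hidden unit plus a bias.
    output_layer = n_outputs * (n_hidden + 1)
    return hidden_layer + output_layer


if __name__ == "__main__":
    # Hypothetical input counts; three hidden units is an assumed setting.
    for n_inputs in (10, 50, 200):
        reg = regression_param_count(n_inputs)
        mlp = mlp_param_count(n_inputs, n_hidden=3)
        print(f"{n_inputs} inputs: regression={reg}, neural network={mlp}")
```

Under these assumptions, each input adds one parameter to the regression model but one parameter per hidden unit to the network, so removing unimportant inputs shrinks the neural network considerably more.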
Because variable selection is a critical phase in model building, SAS Enterprise Miner provides the Variable Selection node. Variables selected for analysis in the Variable Selection node are available to any subsequent nodes. No single method of variable selection is universally better than any other method. It is often useful to consider many types of variable selection methods when evaluating the importance of each variable.
This chapter demonstrates how to use the Variable Selection node to identify important variables. For convenience, consider the process flow diagram that you created in the previous chapter. In this chapter, you perform only variable selection before creating a neural network model. This model is then compared to the other models created in the previous chapter.