Working with Nodes That Modify, Model, and Explore
Overview
SAS Enterprise Miner provides numerous predictive modeling tools. The Regression node automatically performs either a logistic or ordinary least squares regression, depending on the target measurement level. Like the Decision Tree and Neural Network nodes, the Regression node supports binary, nominal, ordinal, and continuous targets.
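As a rough illustration of the choice described above, the following Python sketch maps a target measurement level to a regression family. It is not SAS Enterprise Miner code; the function name and the level strings are assumptions made for this example only.

```python
def choose_regression_type(target_level: str) -> str:
    """Return the regression family for a target measurement level.

    Illustrative only: the level names accepted here are assumptions,
    not the exact labels used by SAS Enterprise Miner.
    """
    level = target_level.strip().lower()
    if level in {"binary", "nominal", "ordinal"}:
        # Categorical targets are modeled with logistic regression.
        return "logistic"
    if level in {"interval", "continuous"}:
        # Continuous targets are modeled with ordinary least squares.
        return "ordinary least squares"
    raise ValueError(f"unsupported target level: {target_level!r}")


# TARGET_B separates donors from non-donors, so it is assumed to be binary here.
target_b_model = choose_regression_type("binary")
print(target_b_model)
```

Under that assumption, the model built for TARGET_B in this task is a logistic regression.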
This task builds a regression model that uses the partitioned, imputed and transformed DONOR_RAW_DATA data set.
Drag a Regression node from the Model tab of the toolbar into the Diagram Workspace, and connect it to the Transform Variables node.
Create Histograms of Transformed Variables
It might be useful to view the distributions of newly transformed variables before you set the properties in the Regression node Properties panel.
Select the Regression node in the Diagram Workspace to view the node settings in the Properties panel.
Click the ellipsis button to the right of the Variables property to open the Variables - Reg window.
The transformed variables that you created begin with variable prefixes LOG_ and OPT_. The imputed variables that you created begin with an IMP_ prefix.
Note: If you do not see these variables, close the Variables - Reg window, right-click the Regression node and select Update.
Select the TARGET_B variable, as well as the variables that have the prefixes IMP_, LOG_, and OPT_, in order to create a histogram or bar chart of all the transformed variables.
Click Explore.
Maximize the Explore window. Then, from the menu, select Window > Tile to improve the visual layout of the plots.
The plots for each variable are created using a sample size of 2,000 observations. In the Sample Properties window, set the Sample Method to Random and then apply the change to plot a random sample of the data.
Double-click each level of the target variable to view how the donors and non-donors are distributed across each of the transformed variables.
Note that some of the heavily skewed variables are more normally distributed after you apply the logarithmic transformation.
Close the Explore window and then close the Variables - Reg window.
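The effect noted in the steps above, where a heavily skewed variable looks more normally distributed after a logarithmic transformation, can be checked numerically. The following Python sketch uses synthetic right-skewed data, not DONOR_RAW_DATA, and a plain natural logarithm; the exact transformation formula that the Transform Variables node applies is not stated here, so treat that part as an assumption.

```python
import math
import random


def skewness(values):
    """Return the moment-based sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, strictly positive, right-skewed values (illustrative only).
# The sample size of 2,000 mirrors the Explore window sample size.
rng = random.Random(0)
raw = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]

# Assumed transformation: a plain natural log of each value.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

A skewness near zero after the transformation corresponds to the more symmetric histograms seen in the Explore window.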
Set Regression Properties
The Regression node can select a model from a set of candidate terms by using one of several methods and criteria. In this task, you specify a model selection criterion that you use during training.
Select the Regression node in the Diagram Workspace. In the Regression node Properties panel, set the Selection Model property to Stepwise. This setting systematically adds variables to and deletes variables from the model, based on the Entry Significance Level and Stay Significance Level properties (both default to 0.05).
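The add-and-delete logic of stepwise selection can be sketched in Python as follows. This is a simplified illustration, not the procedure that the Regression node runs internally: the `p_value` callable, the term names, and the toy p-values are all hypothetical, and a real implementation would refit the model to obtain each significance test.

```python
from typing import Callable, List, Sequence


def stepwise_select(
    candidates: Sequence[str],
    p_value: Callable[[str, List[str]], float],
    entry_level: float = 0.05,
    stay_level: float = 0.05,
    max_steps: int = 100,
) -> List[str]:
    """Select terms by alternating forward (entry) and backward (stay) steps.

    p_value(term, others) is a hypothetical callable that returns the
    significance of `term` in a model that also contains `others`.
    """
    selected: List[str] = []
    remaining = list(candidates)
    for _ in range(max_steps):
        changed = False
        # Forward step: add the most significant remaining term if it
        # meets the entry significance level.
        if remaining:
            best = min(remaining, key=lambda t: p_value(t, selected))
            if p_value(best, selected) < entry_level:
                selected.append(best)
                remaining.remove(best)
                changed = True
        # Backward step: drop the least significant selected term if it
        # no longer meets the stay significance level.
        if selected:
            def p_in_model(term: str) -> float:
                return p_value(term, [t for t in selected if t != term])

            worst = max(selected, key=p_in_model)
            if p_in_model(worst) > stay_level:
                selected.remove(worst)
                remaining.append(worst)
                changed = True
        if not changed:
            break
    return selected


# Toy p-values (hypothetical): LOG_X1 stops being significant once
# LOG_X2 is in the model, so it enters first and is later removed.
base = {"LOG_X1": 0.01, "LOG_X2": 0.02, "OPT_X3": 0.40}


def toy_p_value(term: str, others: List[str]) -> float:
    if term == "LOG_X1" and "LOG_X2" in others:
        return 0.30
    return base[term]


chosen = stepwise_select(list(base), toy_p_value)
print(chosen)
```

In this toy run, a term that entered early is removed after a later term makes it redundant, which is the behavior that distinguishes stepwise selection from purely forward selection.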
Run the Regression node and then view the results.
By default, the Regression node displays the following:
A table of fit statistics for both the training and validation data. Examine the average profit in the validation data.
A cumulative lift plot (score rankings) across the various deciles for the training and validation data sources. The plot lift values are very consistent for both the training and validation data. You can change the plotting variables on this chart as you did when viewing the lift plot for the decision tree. You can change the vertical axis (Y) to display profit.
An effects plot that shows the model effects in order by size. The model effects are sized according to their absolute coefficients. The color of the model effect indicates the sign of the coefficient. When you hold your mouse pointer over the effect bars, you will see that some of the transformed inputs and one of the imputed variables have significant effects in the stepwise selection.
A detailed output window. The detailed output window provides several statistics in addition to a summary of the stepwise selection process.
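To make the cumulative lift plot listed above more concrete, the following Python sketch computes cumulative lift from predicted scores and observed outcomes. The scores and outcomes are made-up toy values, the function name is hypothetical, and the binning may differ in detail from how SAS Enterprise Miner builds its score rankings.

```python
def cumulative_lift(scores, outcomes, bins=10):
    """Return the cumulative lift at each of `bins` equal-depth cutoffs.

    Lift at a cutoff is the response rate among the top-scored cases
    divided by the overall response rate.
    """
    # Rank observed outcomes from highest to lowest predicted score.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    overall_rate = sum(ranked) / n
    lifts = []
    for b in range(1, bins + 1):
        cutoff = max(1, round(n * b / bins))
        top = ranked[:cutoff]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


# Toy data (illustrative only): 1 = donor, 0 = non-donor.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

lifts = cumulative_lift(scores, outcomes, bins=5)
for depth, lift in zip(range(20, 101, 20), lifts):
    print(f"top {depth:3d}%: lift {lift:.3f}")
```

The last value is always 1, because the full data set has the overall response rate by definition. Running the same calculation on training and validation data and comparing the results corresponds to checking whether the two curves in the plot are consistent.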
Close the Results window.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.