Input data is sometimes more informative on a scale other than the one on which it was originally collected. For example, variable transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and counter non-normality. Therefore, for many models, transforming the input data (either dependent or independent variables) can lead to a better model fit. A transformation can be a function of a single variable or of more than one variable.
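As a rough illustration outside Enterprise Miner, the following Python sketch shows how a single-variable transformation can counter non-normality. It computes the moment-based skewness of a hypothetical right-skewed variable before and after a base-10 log transformation. The data and the `skewness` helper are assumptions made for this example; Enterprise Miner computes and displays its own statistics.

```python
import math


def skewness(xs):
    """Moment-based (population) skewness: third central moment / sd**3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed input: values that grow geometrically (1, 2, 4, ..., 1024).
raw = [2 ** k for k in range(11)]
# Base-10 log transformation of each value.
logged = [math.log10(x) for x in raw]

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after log10: {skewness(logged):.3f}")
```

Because the logged values are evenly spaced, their skewness is approximately zero, while the raw values are strongly right-skewed. Real variables rarely become perfectly symmetric, but the direction of the effect is the same.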
To use the Transform Variables node to make variables better suited for logistic regression models and neural networks:
-
From the Modify tab on the Toolbar, select the Transform Variables node icon. Drag the node into the Diagram Workspace.
-
Connect the Impute node to the Transform Variables node.
Tip
To align a process flow diagram vertically, as in the image above, right-click anywhere in the Diagram Workspace, and select Layout > Vertically from the resulting menu.
-
Select the Transform Variables node. In the Properties Panel, scroll down to view the Train properties, and click the ellipsis button (...) that represents the value of Formulas. The Formulas window appears.
-
In the variables table, click the Role column heading to sort the variables in ascending order by their role, as seen in the image above.
-
You can select any row in the variables table to display a histogram of that variable in the panel above the table. Look at the histograms for all variables that have the role Input. Notice that several variables have skewed distributions.
-
Click OK to close the Formulas window.
-
In the Properties Panel, scroll down to view the Train properties, and click the ellipsis button (...) that represents the value of Variables. The Variables — Trans window appears.
-
The common (base-10) log transformation is often used to control skewness. For each of the following interval variables, click its value in the Method column and select Log 10 from the drop-down menu that appears:
Tip
You can hold down the CTRL key to select multiple rows. Then, when you select a new Method for one of the selected variables, the new method is applied to all of the selected variables.
-
For each of the following interval variables, click its value in the Method column and select Optimal Binning from the drop-down menu that appears:
The optimal binning transformation is useful when there is a nonlinear relationship between an input variable and the target. For more information about this transformation, see the SAS Enterprise Miner Help.
-
In the Diagram Workspace, right-click the Transform Variables node, and select Run from the resulting menu. Click Yes in the confirmation window that opens.
-
In the window that appears when processing completes, click OK.
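The Optimal Binning method chooses bin boundaries with reference to the target, which is why it can capture a nonlinear relationship. The exact algorithm that Enterprise Miner uses is described in its Help; the Python sketch below is a simplified stand-in, not the SAS implementation. It assumes a binary (0/1) target and recursively picks the cut point that minimizes weighted Gini impurity, much like a small decision tree. The data and all function names (`gini`, `best_split`, `supervised_cuts`, `assign_bin`) are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Pair = Tuple[float, int]  # (input value, binary target 0/1)


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 target values."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(pairs: List[Pair]) -> Optional[float]:
    """Return the cut point that most reduces weighted Gini impurity,
    or None if no cut improves on the unsplit segment."""
    pairs = sorted(pairs)
    n = len(pairs)
    best_cut: Optional[float] = None
    best_score = gini([y for _, y in pairs])
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


def supervised_cuts(pairs: List[Pair], depth: int) -> List[float]:
    """Recursively split segments (like a small decision tree) and
    return the cut points in ascending order."""
    if depth == 0:
        return []
    cut = best_split(pairs)
    if cut is None:
        return []
    left = [p for p in pairs if p[0] <= cut]
    right = [p for p in pairs if p[0] > cut]
    return supervised_cuts(left, depth - 1) + [cut] + supervised_cuts(right, depth - 1)


def assign_bin(x: float, cuts: List[float]) -> int:
    """Bin index for x; values equal to a cut fall in the lower bin."""
    return bisect_left(cuts, x)


# Hypothetical U-shaped data: the target is 1 only for mid-range inputs.
xs = list(range(1, 13))
ys = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

cuts = supervised_cuts(list(zip(xs, ys)), depth=2)
bins = [assign_bin(x, cuts) for x in xs]
print(cuts)  # [4.5, 8.5]
print(bins)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

With `depth=2`, at most four bins can result; here three suffice. The two cuts isolate the middle range where the target is 1, a pattern that a single linear term in the original input could not represent.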
Note: In the data that is exported from the Transform Variables node, a new variable is created for each variable that is transformed. The original variable is not overwritten. Instead, the new variable has the same name as the original variable, prefixed with an identifier of the transformation. For example, variables to which the log transformation has been applied are prefixed with LOG_, and variables to which the optimal binning transformation has been applied are prefixed with OPT_. The original version of each variable also exists in the exported data and has the role Rejected.
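The naming convention in the note above can be summarized as a small mapping from exported variable names to roles. The Python sketch below is illustrative only: the LOG_ and OPT_ prefixes and the Rejected role for originals come from the note, whereas the variable names `X1` and `X2`, the helper `exported_roles`, and the assumption that each new variable has the role Input are hypothetical.

```python
# Prefixes taken from the note above; other transformation methods
# (and their prefixes) are deliberately not covered in this sketch.
PREFIX = {"Log 10": "LOG_", "Optimal Binning": "OPT_"}


def exported_roles(methods: dict) -> dict:
    """Map each exported variable name to its role.

    methods: original variable name -> transformation method.
    Assumption: the new, transformed variable has the role Input.
    """
    roles = {}
    for name, method in methods.items():
        roles[name] = "Rejected"                # original kept, not overwritten
        roles[PREFIX[method] + name] = "Input"  # new variable with prefix
    return roles


# Hypothetical variable names, for illustration only.
print(exported_roles({"X1": "Log 10", "X2": "Optimal Binning"}))
```

Downstream modeling nodes then see both versions of each variable, but only the transformed one is used as an input by default.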