Sometimes, input data
is more informative on a scale other than the one on which it was originally
collected. For example, variable transformations can be used to stabilize
variance, remove nonlinearity, improve additivity, and counter non-normality.
Therefore, for many models, transformations of the input data (either
dependent or independent variables) can lead to a better model fit.
These transformations can be functions of either a single variable
or of more than one variable.
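As a rough illustration of why a change of scale can help, the following Python sketch compares the skewness of a small right-skewed sample before and after a base-10 log transformation. The sample values and the `skewness` helper are made up for this illustration and are not part of SAS Enterprise Miner.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (all values strictly positive).
raw = [1, 2, 3, 5, 8, 13, 21, 100, 1000]
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

On this sample the skewness of the logged values is noticeably smaller than that of the raw values, which is the effect the log transformation is intended to have on skewed inputs.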
To use the Transform
Variables node to make variables better suited
for logistic regression models and neural networks:
-
From the
Modify tab
on the Toolbar, select the
Transform Variables node
icon. Drag the node into the Diagram Workspace.
-
Connect the
Impute node
to the
Transform Variables node.
Tip
To align a process flow diagram
vertically, as in the image above, right-click anywhere in the Diagram
Workspace, and select
Layout > Vertically from the resulting
menu.
-
Select the
Transform
Variables node. In the Properties Panel, scroll down
to view the
Train properties, and click
the ellipsis button for the value of
Formulas.
The
Formulas window appears.
-
In the variables table,
click the
Role column heading to sort the variables
in ascending order by their role.
-
You can select any row
in the variable table to display the histogram of a variable in the
panel above. Look at the histograms for all variables that have the
role Input. Notice that several variables have
skewed distributions.
-
Close the
Formulas window.
-
In the Properties Panel,
scroll down to view the
Train properties,
and click the ellipsis button for the value of
Variables.
The
Variables — Trans window appears.
-
The common log transformation
is often used to control skewness. For each of
the following interval variables, click the value in the
Method column and select
Log 10 from
the drop-down menu that appears:
Tip
You can hold down the Ctrl
key to select multiple rows. Then, when you select a new
Method for
one of the selected variables, the new method will apply to all of
the selected variables.
-
For each of
the following interval variables, click the value in the
Method column and select
Optimal Binning from
the drop-down menu that appears:
The optimal binning
transformation is useful when there is a nonlinear relationship between
an input variable and the target. For more information about this
transformation, see the SAS Enterprise Miner Help.
-
In the Diagram Workspace,
right-click the Transform Variables node, and select
Run from
the resulting menu. Click
Yes in the
Confirmation window
that opens.
-
In the window that appears
when processing completes, click
OK.
Note: In the data that is exported
from the Transform Variables node, a new variable is created for each
variable that is transformed. The original variable is not overwritten.
Instead, the new variable has the same name as the original variable
but is prefixed with an identifier of the transformation. For example,
variables to which the log transformation has been applied are prefixed
with LOG_, and variables to which the optimal binning transformation
has been applied are prefixed with OPT_. The original version of
each variable also exists in the exported data and has the role
Rejected.
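The naming convention described in the note can be sketched in Python as follows. This is only an illustration of the convention, not the node's implementation: the variable names `AMOUNT` and `AGE` are hypothetical, plain `math.log10` on strictly positive values is assumed (the node's exact formula is not given here), and it is assumed that the new variable takes over the role that the original variable had.

```python
import math

def add_log10_columns(rows, names):
    """Return copies of `rows` with a LOG_<name> value added for each name.

    The original values are kept, mirroring the note that the original
    variable is not overwritten. Assumes all inputs are strictly positive.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original values survive
        for name in names:
            new_row["LOG_" + name] = math.log10(row[name])
        result.append(new_row)
    return result

def update_roles(roles, names):
    """Give each original variable the role Rejected.

    Assumption: the transformed variable inherits the original's role.
    """
    new_roles = dict(roles)
    for name in names:
        new_roles["LOG_" + name] = roles[name]
        new_roles[name] = "Rejected"
    return new_roles

# Hypothetical data and roles, for illustration only.
rows = [{"AMOUNT": 100.0, "AGE": 40}, {"AMOUNT": 1000.0, "AGE": 55}]
roles = {"AMOUNT": "Input", "AGE": "Input"}

transformed = add_log10_columns(rows, ["AMOUNT"])
new_roles = update_roles(roles, ["AMOUNT"])
print(transformed[0])
print(new_roles)
```

Variables that are not transformed, such as `AGE` in this sketch, keep their original names and roles.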