Sometimes, input data is more informative on a scale other than the one on which it was originally collected. For example, variable transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and counter non-normality. Therefore, for many models, transforming the input data (either dependent or independent variables) can lead to a better model fit. These transformations can be functions of a single variable or of more than one variable.
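As a rough illustration of how a log transformation can counter skewness, the following Python sketch (not part of SAS Enterprise Miner) draws a hypothetical right-skewed sample from a log-normal distribution and compares the skewness before and after applying log10. The sample, the seed, and the `skewness` helper are assumptions made for this example only.

```python
import math
import random


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed input, for example a monetary amount.
rng = random.Random(0)
raw = [rng.lognormvariate(3.0, 1.0) for _ in range(5000)]

# Common (base-10) log transformation; valid here because every value is > 0.
logged = [math.log10(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}")
print(f"skewness after log10: {log_skew:.2f}")
```

The raw sample should show strong positive skewness, while the transformed sample should be close to symmetric, because the logarithm of a log-normal variable is normally distributed.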
To use the Transform Variables node to make variables better suited for logistic regression models and neural networks:
-
From the Modify tab on the Toolbar, select the Transform Variables node icon. Drag the node into the Diagram Workspace.
-
Connect the Impute node to the Transform Variables node.
Tip: To align a process flow diagram vertically, as in the image above, right-click anywhere in the Diagram Workspace, and select Layout > Vertically from the resulting menu.
-
Select the Transform Variables node. In the Properties Panel, scroll down to view the Train properties, and click the ellipsis button that represents the value of Formulas. The Formulas window appears.
-
In the variables table, click the Role column heading to sort the variables in ascending order by their role.
-
You can select any row in the variables table to display the histogram of that variable in the panel above. Look at the histograms for all variables that have the role Input. Notice that several variables have skewed distributions.
-
Close the Formulas window.
-
In the Properties Panel, scroll down to view the Train properties, and click the ellipsis button that represents the value of Variables.
The
Variables — Trans window appears.
-
The common (base-10) log transformation is often used to control skewness. Select the transformation Method for each of the following interval variables, and select Log 10 from the drop-down menu that appears:
Tip: You can hold down the Ctrl key to select multiple rows. Then, when you select a new Method for one of the selected variables, the new method applies to all of the selected variables.
-
Select the transformation Method for each of the following interval variables, and select Optimal Binning from the drop-down menu that appears:
The optimal binning transformation is useful when there is a nonlinear relationship between an input variable and the target. For more information about this transformation, see the SAS Enterprise Miner Help.
-
In the Diagram Workspace, right-click the Transform Variables node, and select Run from the resulting menu. Click Yes in the Confirmation window that opens.
-
In the window that appears when processing completes, click OK.
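The idea behind binning an input against the target, as in the Optimal Binning step above, can be sketched in Python. This is not the algorithm that SAS Enterprise Miner uses; it is a simplified stand-in that recursively picks cut points by Gini impurity reduction. The toy data, the function names, and the parameters `depth`, `min_size`, and `min_gain` are all assumptions for illustration. In the toy data, the event occurs only in the middle of the input's range, a relationship that a single linear term cannot capture.

```python
import bisect


def gini(positives, count):
    """Gini impurity of a group with `positives` events out of `count` rows."""
    p = positives / count
    return 2.0 * p * (1.0 - p)


def best_cut(pairs, min_size, min_gain):
    """Return the cut on x that most reduces weighted Gini impurity, or None."""
    pairs = sorted(pairs)
    n = len(pairs)
    if n < 2 * min_size:
        return None
    total_pos = sum(y for _, y in pairs)
    parent = gini(total_pos, n)
    best_gain, best_value = min_gain, None
    left_pos = 0
    for i in range(1, n):  # i = number of rows in the left group
        left_pos += pairs[i - 1][1]
        if i < min_size or n - i < min_size:
            continue
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        children = (i * gini(left_pos, i)
                    + (n - i) * gini(total_pos - left_pos, n - i)) / n
        gain = parent - children
        if gain > best_gain:
            best_gain = gain
            best_value = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_value


def supervised_cuts(pairs, depth=2, min_size=5, min_gain=1e-9):
    """Recursively split on x; returns sorted cut points (at most 2**depth bins)."""
    if depth == 0:
        return []
    cut = best_cut(pairs, min_size, min_gain)
    if cut is None:
        return []
    left = [p for p in pairs if p[0] <= cut]
    right = [p for p in pairs if p[0] > cut]
    return (supervised_cuts(left, depth - 1, min_size, min_gain) + [cut]
            + supervised_cuts(right, depth - 1, min_size, min_gain))


# Hypothetical toy data: the event (y = 1) occurs only for 40 <= x < 60.
data = [(float(x), 1 if 40 <= x < 60 else 0) for x in range(100)]
cuts = supervised_cuts(data)

# Assign each row to a bin and compute the event rate per bin.
counts = [0] * (len(cuts) + 1)
events = [0] * (len(cuts) + 1)
for x, y in data:
    b = bisect.bisect_right(cuts, x)
    counts[b] += 1
    events[b] += y
rates = [e / c for e, c in zip(events, counts)]

print(cuts)   # [39.5, 59.5]
print(rates)  # [0.0, 1.0, 0.0]
```

A model that receives the bin membership instead of the raw value can fit a separate effect for each bin, which is why a binned input can help when the relationship with the target is not linear.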
Note: In the data that is exported from the Transform Variables node, a new variable is created for each variable that is transformed. The original variable is not overwritten. Instead, the new variable has the same name as the original variable but is prefixed with an identifier of the transformation. For example, variables to which the log transformation has been applied are prefixed with LOG_, and variables to which the optimal binning transformation has been applied are prefixed with OPT_. The original version of each variable also exists in the exported data and has the role Rejected.
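The naming convention described in the note can be mimicked outside SAS Enterprise Miner with a short Python sketch. The variable names `GIFT_AMT` and `AGE`, the `apply_log10` helper, and the dictionary layout are hypothetical. The sketch also assumes that the transformed copy takes over the original variable's role, which the note does not state, and that all values are strictly positive.

```python
import math


def apply_log10(rows, roles, variables):
    """Add LOG_<name> columns and mark the originals as Rejected.

    rows: list of dicts, one per observation.
    roles: dict that maps a variable name to its role.
    Both inputs are copied, so the caller's data is left unchanged.
    """
    new_rows = [dict(r) for r in rows]
    new_roles = dict(roles)
    for name in variables:
        new_name = "LOG_" + name
        for r in new_rows:
            # Assumption: values are strictly positive, so log10 is defined.
            r[new_name] = math.log10(r[name])
        # Assumption: the transformed copy inherits the original role.
        new_roles[new_name] = roles[name]
        new_roles[name] = "Rejected"
    return new_rows, new_roles


# Hypothetical data with one variable to transform and one left alone.
rows = [{"GIFT_AMT": 10.0, "AGE": 42}, {"GIFT_AMT": 1000.0, "AGE": 35}]
roles = {"GIFT_AMT": "Input", "AGE": "Input"}

new_rows, new_roles = apply_log10(rows, roles, ["GIFT_AMT"])
print(sorted(new_rows[0]))
print(new_roles)
```

The original column stays in every row next to its `LOG_` counterpart, and only its role changes, which mirrors the behavior that the note describes for the exported data.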