To use the StatExplore node to produce a statistical summary of the input data:
- Select the Explore tab on the Toolbar.
- Select the StatExplore node icon. Drag the node into the Diagram Workspace.
  Tip: To determine which node an icon represents, position the mouse pointer over the icon and read the tooltip.
- Connect the DONOR_RAW_DATA input data source node to the StatExplore node.
  To connect the two nodes, position the mouse pointer over the right edge of the input data source node until the pointer becomes a pencil. With the left mouse button held down, drag the pencil to the left edge of the StatExplore node. Then, release the mouse button. An arrow between the two nodes indicates a successful connection.
- Select the StatExplore node. In the Properties Panel, scroll down to view the Chi-Square Statistics properties. Click the value of the Interval Variables property and select Yes from the drop-down menu that appears.
  Chi-square statistics are always computed for categorical variables. Changing the selection for interval variables causes SAS Enterprise Miner to distribute each interval variable into bins (five by default) and compute chi-square statistics for the binned variables when you run the node.
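
To make the binning step concrete, the following Python sketch distributes an interval variable into five bins and computes a Pearson chi-square statistic against a class target. It is an illustration only, not SAS Enterprise Miner's implementation: the equal-width binning rule, the function names, and the sample data are all assumptions.

```python
from collections import Counter


def equal_width_bins(values, n_bins=5):
    """Assign each value to one of n_bins equal-width bins (assumed binning rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant input
    # The maximum value lands on the upper boundary, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_square(rows, cols):
    """Pearson chi-square statistic for the cross-tabulation of two class variables."""
    n = len(rows)
    observed = Counter(zip(rows, cols))
    row_totals, col_totals = Counter(rows), Counter(cols)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


# Hypothetical data: an interval input and a binary target.
gift_amount = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]
donated = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

bins = equal_width_bins(gift_amount)
statistic = chi_square(bins, donated)
print(bins, statistic)
```

Once the interval variable is binned, it is treated exactly like any other class variable, which is why the same chi-square routine serves both cases.
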
- In the Diagram Workspace, right-click the StatExplore node, and select Run from the resulting menu. Click Yes in the confirmation window that opens.
  When you run a node, all of the nodes preceding it in the process flow are also run in order, beginning with the first node that has changed since the flow was last run. If no nodes other than the one that you select have changed since the last run, then only the node that you select is run. You can watch the icons in the process flow diagram to monitor the status of execution:
  - Nodes that are outlined in green are currently running.
  - Nodes that are denoted with a check mark inside a green circle have successfully run.
  - Nodes that are outlined in red have failed to run due to errors.
  In this example, the DONOR_RAW_DATA input data node had not yet been run. Therefore, both nodes are run when you choose to run the StatExplore node.
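
The run rule described above can be sketched in a few lines of Python. This is a simplified model, not how SAS Enterprise Miner schedules nodes internally: it assumes a strictly linear flow, treats a node that has never been run as changed, and uses a hypothetical function name.

```python
def nodes_to_run(flow, changed, selected):
    """Return the nodes that run when `selected` is run in a linear flow.

    flow: node names in process-flow order.
    changed: names of nodes that changed (or never ran) since the last run.
    """
    upstream = flow[: flow.index(selected) + 1]
    for position, node in enumerate(upstream):
        if node in changed:
            # Start at the first changed node and run through the selected node.
            return upstream[position:]
    # Nothing changed, so only the selected node runs.
    return [selected]


flow = ["DONOR_RAW_DATA", "StatExplore"]
first_run = nodes_to_run(flow, {"DONOR_RAW_DATA", "StatExplore"}, "StatExplore")
rerun = nodes_to_run(flow, set(), "StatExplore")
print(first_run)
print(rerun)
```

In this model, the first call corresponds to the example in this step, where both nodes run, and the second call corresponds to rerunning StatExplore when nothing has changed.
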
- In the window that appears when processing completes, click Results. The Results window appears.
  Note: Panels in Results windows might not have the same arrangement on your screen, due to window resizing. When the Results window is resized, SAS Enterprise Miner redistributes panels for optimal viewing.
  The Results window displays the following:
  - a plot that orders the variables by their worth in predicting the target variable.
    Note: In the StatExplore node, SAS Enterprise Miner calculates variable worth using the Gini split worth statistic that would be generated by building a decision tree of depth 1. For detailed information about Gini split worth, see the SAS Enterprise Miner Help.
  - the SAS output from the node.
  - a plot that orders the top 20 variables by their chi-square statistics. You can also choose to view the top 20 variables ordered by their Cramer's V statistics on this plot.
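
As a rough illustration of the idea behind the variable worth plot, the Python sketch below ranks class inputs by the reduction in Gini impurity from a single split on each input, which is what a depth-1 tree would use. Treating worth as impurity reduction from a multiway split is an assumption made for this example; the exact statistic that SAS Enterprise Miner computes is described in its Help. The function names, variable names, and data are hypothetical.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split_worth(feature, target):
    """Impurity reduction from splitting the target on every level of the feature."""
    groups = defaultdict(list)
    for level, label in zip(feature, target):
        groups[level].append(label)
    n = len(target)
    weighted_child = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(target) - weighted_child


# Hypothetical target and two hypothetical class inputs.
target = [0, 0, 1, 1]
inputs = {
    "informative_input": ["a", "a", "b", "b"],
    "uninformative_input": ["a", "b", "a", "b"],
}

worth = {name: gini_split_worth(values, target) for name, values in inputs.items()}
ranking = sorted(worth, key=worth.get, reverse=True)
print(ranking)
```

An input whose levels separate the target classes cleanly leaves little impurity in the child groups and so ranks near the top of the ordering.
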
  Tip: In SAS Enterprise Miner, you can select graphs, tables, and rows within tables, and then select Copy from the right-click pop-up menu to copy these items for pasting into other applications such as Microsoft Word and Microsoft Excel.
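
For readers unfamiliar with the Cramer's V statistic offered on the chi-square plot, the Python sketch below computes it from the standard definition: the square root of the chi-square statistic divided by the number of observations times the smaller table dimension minus one. It is a sketch for orientation, not the SAS code; the function name and the handling of a constant variable are assumptions.

```python
import math
from collections import Counter


def cramers_v(rows, cols):
    """Cramer's V for the cross-tabulation of two class variables."""
    n = len(rows)
    observed = Counter(zip(rows, cols))
    row_totals, col_totals = Counter(rows), Counter(cols)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    if smaller_dim == 0:
        return 0.0  # assumption: report no association for a constant variable
    return math.sqrt(chi2 / (n * smaller_dim))


# Hypothetical class input and binary target.
region = ["north", "north", "south", "south", "east", "east"]
donated = [1, 1, 0, 0, 1, 0]
print(cramers_v(region, donated))
```

Because the statistic is scaled to lie between 0 and 1, it allows variables with different numbers of levels to be compared more fairly than raw chi-square values do.
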
- Expand the Output window, and then scroll to the Class Variable Summary Statistics and the Interval Variable Summary Statistics sections of the output.
  - Notice that four variables have missing values. Later in the example, you will impute values to use in place of the missing values for these variables.
  - Notice that several variables have relatively large standard deviations. Later in the example, you will plot the data and explore transformations that can reduce the variances of these variables.
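
The two observations above can be reproduced on a small scale in Python. The sketch below counts missing values and compares the standard deviation of a skewed variable before and after a log transformation. The log transformation is only one possible choice and is an assumption here, as are the function names and the data; the transformations actually used in this example are introduced later.

```python
import math
import statistics


def summarize(values):
    """Count missing values and compute basic statistics on the rest."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),
    }


def log_transform(values):
    """Apply log(1 + x) to non-missing values; missing values stay missing."""
    return [None if v is None else math.log1p(v) for v in values]


# Hypothetical interval variable with two missing values and one large outlier.
gift_amount = [5, 10, None, 20, 500, None, 15]

raw = summarize(gift_amount)
logged = summarize(log_transform(gift_amount))
print(raw)
print(logged)
```

The single large value dominates the raw standard deviation, while the transformed values are much closer together, which is the effect that the later transformation step aims for.
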
- Close the Results window.