Generate Descriptive Statistics

To use the StatExplore node to produce a statistical summary of the input data:
  1. Select the Explore tab on the Toolbar.
  2. Select the StatExplore node icon. Drag the node into the Diagram Workspace.
    Tip
    To determine which node an icon represents, position the mouse pointer over the icon and read the tooltip.
  3. Connect the DONOR_RAW_DATA input data source node to the StatExplore node.
    To connect the two nodes, position the mouse pointer over the right edge of the input data source node until the pointer becomes a pencil. With the left mouse button held down, drag the pencil to the left edge of the StatExplore node. Then, release the mouse button. An arrow between the two nodes indicates a successful connection.
    Process Flow Diagram
  4. Select the StatExplore node. In the Properties Panel, scroll down to view the Chi-Square Statistics properties group. Click on the value of Interval Variables and select Yes from the drop-down menu that appears.
    Chi-square statistics are always computed for categorical variables. Setting Interval Variables to Yes causes SAS Enterprise Miner to distribute each interval variable into bins (five by default) and to compute chi-square statistics for the binned variables when you run the node.
    Properties Panel
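To make the binning idea in step 4 concrete, the following is a minimal sketch in Python, not the StatExplore implementation. It assumes equal-width bins (the binning method that SAS Enterprise Miner uses is not stated here), and the variable names and data are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins=5):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # constant variable: everything in one bin
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_square(rows, cols):
    """Pearson chi-square statistic for two equally long lists of labels."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = Counter(zip(rows, cols))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


# Hypothetical data: an interval input and a binary target.
gift_amount = [5, 7, 10, 12, 15, 18, 20, 25, 40, 50]
donated = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

bins = equal_width_bins(gift_amount, n_bins=5)
print(bins)
print(chi_square(bins, donated))
```

Once the interval variable is binned, it is treated like any other categorical variable, which is why the same chi-square function applies to both.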
  5. In the Diagram Workspace, right-click the StatExplore node, and select Run from the resulting menu. Click Yes in the Confirmation window that opens.
    When you run a node, all of the nodes preceding it in the process flow are also run in order, beginning with the first node that has changed since the flow was last run. If no nodes other than the one that you select have changed since the last run, then only the node that you select is run. You can watch the icons in the process flow diagram to monitor the status of execution.
    • Nodes that are outlined in green are currently running.
    • Nodes that are marked with a check mark inside a green circle have run successfully.
    • Nodes that are outlined in red have failed to run due to errors.
    In this example, the DONOR_RAW_DATA input data node had not yet been run. Therefore, both nodes are run when you run the StatExplore node.
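The run rule described in step 5 can be sketched as a small function. This is an illustration only: it assumes a simple linear flow (real process flows can branch), and the function name and arguments are hypothetical rather than part of SAS Enterprise Miner.

```python
def nodes_to_run(flow, changed, selected):
    """Return the nodes that execute when `selected` is run.

    flow     -- node names in process-flow order (a simple linear flow)
    changed  -- set of nodes modified (or never run) since the last run
    selected -- the node that the user chose to run
    """
    # The selected node together with everything that precedes it.
    upstream = flow[: flow.index(selected) + 1]
    for position, node in enumerate(upstream):
        if node in changed:
            # Everything from the first changed node onward is run, in order.
            return upstream[position:]
    # Nothing upstream changed: only the selected node is run.
    return [selected]


flow = ["DONOR_RAW_DATA", "StatExplore"]

# First run: neither node has been run yet, so both are treated as changed.
print(nodes_to_run(flow, {"DONOR_RAW_DATA", "StatExplore"}, "StatExplore"))

# Later run: nothing has changed, so only the selected node runs.
print(nodes_to_run(flow, set(), "StatExplore"))
```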
  6. In the window that appears when processing completes, click Results. The Results window appears.
    StatExplore Node Results Window
    Note: The panels in the Results window might be arranged differently on your screen. When the Results window is resized, SAS Enterprise Miner redistributes the panels for optimal viewing.
    The Results window displays the following:
    • a plot that orders the variables by their worth in predicting the target variable.
      Note: In the StatExplore node, SAS Enterprise Miner calculates variable worth using the Gini split worth statistic that would be generated by building a decision tree of depth 1. For detailed information about Gini split worth, see the SAS Enterprise Miner Help.
    • the SAS output from the node.
    • a plot that orders the top 20 variables by their chi-square statistics. You can also choose to view the top 20 variables ordered by their Cramer's V statistics on this plot.
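As a rough illustration of the two ranking statistics mentioned above, the following Python sketch computes a Gini-based worth for a depth-1 split and Cramer's V for a pair of categorical variables. It is not the StatExplore computation: it assumes one branch per input level and no rescaling, and the variable names and data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split_worth(inputs, target):
    """Impurity reduction from a depth-1 split with one branch per input level."""
    branches = defaultdict(list)
    for level, label in zip(inputs, target):
        branches[level].append(label)
    n = len(target)
    weighted_child = sum(len(b) / n * gini_impurity(b) for b in branches.values())
    return gini_impurity(target) - weighted_child


def cramers_v(inputs, target):
    """Cramer's V: the chi-square statistic rescaled to the range 0 to 1."""
    n = len(target)
    in_totals, tg_totals = Counter(inputs), Counter(target)
    observed = Counter(zip(inputs, target))
    chi2 = 0.0
    for a, a_total in in_totals.items():
        for b, b_total in tg_totals.items():
            expected = a_total * b_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    smaller_dim = min(len(in_totals), len(tg_totals)) - 1
    if smaller_dim == 0:
        return 0.0  # a variable with a single level carries no association
    return sqrt(chi2 / (n * smaller_dim))


# Hypothetical class inputs and a binary target.
donated = [1, 1, 1, 0, 0, 0, 0, 1]
candidates = {
    "home_owner": ["Y", "Y", "Y", "Y", "N", "N", "N", "N"],
    "region": ["A", "B", "A", "B", "A", "B", "A", "B"],
}

# Order the inputs from most to least worth, as the variable worth plot does.
ranked = sorted(candidates, key=lambda name: gini_split_worth(candidates[name], donated), reverse=True)
for name in ranked:
    print(name, gini_split_worth(candidates[name], donated), cramers_v(candidates[name], donated))
```

Both statistics are near zero when an input tells you nothing about the target and grow as the input separates the target classes more cleanly, which is why either one can be used to order variables.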
    Tip
    In SAS Enterprise Miner, you can select graphs, tables, and rows within tables, and then choose Copy from the right-click pop-up menu to copy these items for pasting into other applications such as Microsoft Word and Microsoft Excel.
  7. Expand the Output window, and then scroll to the Class Variable Summary Statistics and the Interval Variable Summary Statistics sections of the output.
    • Notice that two class variables and two interval variables have missing values. Later in the example, you will impute values to use in place of the missing values for these variables.
    • Notice that several variables have relatively large standard deviations. Later in the example, you will plot the data and explore transformations that can reduce the variances of these variables.
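The two quantities to look for in step 7, the number of missing values and the standard deviation, can be sketched for a single interval variable as follows. This is a minimal Python illustration with a hypothetical column, where missing values are represented as None. It is not the code that produces the StatExplore output.

```python
from statistics import mean, stdev


def summarize_interval(values):
    """Count missing values (None) and summarize the non-missing ones."""
    present = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(present),
        "mean": mean(present),
        "std_dev": stdev(present),  # sample standard deviation (n - 1 divisor)
    }


# Hypothetical column with two missing values and one unusually large value.
median_income = [310, 285, None, 420, 1500, None, 298, 333]
summary = summarize_interval(median_income)
print(summary)
```

A single extreme value, such as 1500 in this hypothetical column, is enough to inflate the standard deviation, which is the kind of pattern that motivates exploring transformations later.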
  8. Close the Results window.