Working with Nodes That Sample, Explore, and Modify

Create Exploratory Plots

Enterprise Miner enables you to generate numerous data visualization graphics in order to reveal extreme values in the data and to discover patterns and trends. You use the MultiPlot node to visualize your data from a wide range of perspectives. With MultiPlot you can graphically explore large volumes of data, observe data distributions, and examine relationships among the variables. The MultiPlot node uses all of the observations for plotting.

In this task, you add a MultiPlot node to your diagram.

  1. Select the Explore tab from the node toolbar and drag a MultiPlot node into the Diagram Workspace. Connect the StatExplore node to the MultiPlot node.

  2. Select the MultiPlot node in the Diagram Workspace. In the Properties panel, set the Type of Charts property to Both in order to generate both scatter and bar charts.

  3. In the Diagram Workspace, right-click the MultiPlot node, and select Run.

  4. After the run is complete, select Results from the Run Status window.

  5. In the Results window, maximize the Train Graphs window.

    Click First, Previous, Next, or Last at the bottom of the window to scroll through the graphs. To view a specific graph, select its variable in the selection box to the right of Last.

    The graphs reveal several points about the data:

    • One value for the variable DONOR_GENDER is incorrectly recorded as an A.

    • There are several heavily skewed variables, such as FILE_AVG_GIFT, LAST_GIFT_AMT, LIFETIME_AVG_GIFT_AMT, LIFETIME_GIFT_AMOUNT, MOR_HIT_RATE, PCT_ATTRIBUTE1, and PCT_OWNER_OCCUPIED. You might want to consider a log transformation later.

    • Increasing values of LIFETIME_CARD_PROM, RECENT_RESPONSE_PROP, LIFETIME_GIFT_AMOUNT, LIFETIME_GIFT_COUNT, MEDIAN_HOME_VALUE, MEDIAN_HOUSEHOLD_INCOME, PER_CAPITA_INCOME, and RECENT_STAR_STATUS tend to be associated with donors, and these variables are also heavily skewed. You might want to consider a bucket transformation whose bins are chosen according to each variable's relationship with the target.

    • Other variables, such as MONTHS_SINCE_LAST_PROM_RESP and NUMBER_PROM_12, show some good separation of the target values at both tails of the distribution.

  6. Close the Results window.
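
The log and bucket transformations suggested in step 5 can be illustrated outside Enterprise Miner with a small Python sketch. This is not how Enterprise Miner implements its transformations; it is a minimal stand-in under stated assumptions. The data is synthetic, the names `gift` and `donor` are hypothetical, and the bucketing uses simple equal-count bins rather than bins optimized against the target.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Apply log(1 + x), which is defined at zero; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


def bucket_rates(values, targets, n_buckets=4):
    """Sort observations by value, split them into equal-count buckets,
    and return the mean of the binary target in each bucket."""
    pairs = sorted(zip(values, targets))
    size = len(pairs) / n_buckets
    rates = []
    for b in range(n_buckets):
        chunk = pairs[int(b * size):int((b + 1) * size)]
        rates.append(sum(t for _, t in chunk) / len(chunk))
    return rates


# Synthetic, made-up data standing in for a heavily skewed gift-amount variable.
rng = random.Random(0)
gift = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]
# Hypothetical binary target: larger gift values respond more often.
donor = [1 if rng.random() < min(0.9, 0.1 + g / 200) else 0 for g in gift]

raw_skew = skewness(gift)
log_skew = skewness(log_transform(gift))
rates = bucket_rates(gift, donor)

print(f"skewness before log: {raw_skew:.2f}, after log: {log_skew:.2f}")
print("target rate by bucket:", [round(r, 3) for r in rates])
```

The log transformation pulls in the long right tail, so the skewness of the transformed values should be much closer to zero. The per-bucket target rates show how a binned version of a variable can expose its relationship with the target even when the raw distribution is hard to read.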
