Example Process Flow Diagram

Task 14. Viewing the Expected Losses in the Score Data Set

  1. Add a Distribution Explorer node to the Diagram Workspace.

  2. Connect the Score node to the Distribution Explorer node.

  3. Open the configuration interface to the Distribution Explorer node and select the Data tab. The Score node exports the training, validation, and score data sets to the Distribution Explorer node. By default, the Distribution Explorer node selects the training data set as the active data set. To view the distribution of the expected loss values, you must first set the score data set as the active data set. In the Data tab, click Select, and then find and select the score data set in the Imports Map window. The score data set name contains the prefix "SD_". Click OK.

    [Imports Map window showing training, validation, and score data set in a GUI hierarchy viewer]

  4. Select the Variables tab. When you scored SAMPSIO.DMAGESCR, Enterprise Miner automatically created several score variables, such as predicted values, residuals, and classifications. You will plot two of these variables:

    • EL_GOOD_BAD_ -- contains the expected loss values for making the good decision.

    • D_GOOD_BAD_ -- assigns either the accept or reject decision status to an applicant in the score data set.

    Note:   For a complete listing of the variable names that are written to the scored data sets, see the Predictive Modeling section in the Enterprise Miner online reference documentation:

    Help → EM Reference → Predictive Modeling

  5. To assign EL_GOOD_BAD_ as the X-axis variable, right-click in the Axis cell for this variable, select Set Axis, and then select X. Repeat these steps to assign D_GOOD_BAD_ as the Y-axis variable. You will use the D_GOOD_BAD_ variable in code that you write in the SAS Code node. Write down the name of this variable for future reference.

    [Variables tab of the Distribution Explorer window listing available variables for use in plots]

  6. To view a histogram of the expected losses for making the good decision, select the X Axis tab.

    [X Axis tab of the Distribution Explorer window showing expected loss histogram with 16 bins]

    Note:   The metadata sample is used to create the histogram in the X Axis tab and the bar chart in the Y Axis tab. For this example, there are only 75 applicants in the score data set. If the scored data set contains more observations than the metadata sample, then you should examine the histogram that is created when you run the node. The node uses all of the observations in the score data set to generate the graphs that are shown in the Results Browser.

    Applicants who have negative expected loss values (the yellow bars) pose a good credit risk. All of these applicants are assigned to the accept decision (D_GOOD_BAD_=accept). The orange and red bars represent the applicants who pose a bad credit risk. Because these applicants have positive expected loss values, they are assigned to the reject decision (D_GOOD_BAD_=reject).

  7. To view a bar chart of the accepted and rejected customers in the score data set, select the Y Axis tab.

    [Y Axis tab of Distribution Explorer window showing bar chart of accepted and rejected customers for the variable GOOD_BAD]

    Note:   The data in both charts is based on the metadata sample.

    To determine the percentage of accepted and rejected customers in the score data set, select the Probe tool icon and then click a bar. In this example, 68% of the applicants were assigned to the accept decision, and 32% were assigned to the reject decision.

    You can use the SAS Code node to create a data set that contains only those customers who pose a good credit risk (accept status). Alternatively, you could use the Filter Outliers node to create a data set that contains only the good credit applicants.

  8. To create a three-dimensional histogram of the expected losses for the accepted and rejected applicants, use the Tools menu to select Run Distribution Explorer.

    [Chart tab of the Distribution Explorer Results window showing 3D histogram of expected losses for accepted and rejected applicants.]

  9. After the node runs, click Yes in the message window to display the histogram. To display frequencies on the vertical axis, use the View menu to select Axes Statistics → Response Axis → Frequency.

    [Same chart as above, but with Z-axis set to Frequency instead of Percentage.]

  10. Close the Results Browser and then close the Distribution Explorer node.
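
The relationship between the two plotted variables can also be sketched outside Enterprise Miner. The following Python fragment is only an illustration and is not part of the Enterprise Miner workflow. The four records and their expected loss values are invented, and the sign rule simply mirrors the pattern described in step 6; how a loss of exactly zero is treated is an assumption. Only the variable names EL_GOOD_BAD_ and D_GOOD_BAD_ come from this example.

```python
from collections import Counter

# Hypothetical scored records; the expected loss values are invented.
scored = [
    {"EL_GOOD_BAD_": -0.8},
    {"EL_GOOD_BAD_": -0.3},
    {"EL_GOOD_BAD_": 0.5},
    {"EL_GOOD_BAD_": -0.1},
]


def decision(expected_loss):
    """Mirror the pattern in step 6: negative expected loss means accept.

    Treating a loss of exactly zero as reject is an assumption.
    """
    return "accept" if expected_loss < 0 else "reject"


# Attach a decision to every record, analogous to D_GOOD_BAD_.
for row in scored:
    row["D_GOOD_BAD_"] = decision(row["EL_GOOD_BAD_"])

# Percentage of applicants per decision, like the values the Probe tool reports.
counts = Counter(row["D_GOOD_BAD_"] for row in scored)
shares = {status: 100.0 * n / len(scored) for status, n in counts.items()}

# Keep only the applicants with the accept status.
good_applicants = [row for row in scored if row["D_GOOD_BAD_"] == "accept"]

print(shares)
print(len(good_applicants))
```

With these made-up records, three of the four applicants fall into the accept group. The actual score data set produces the 68% and 32% split reported in step 7.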
