Example Process Flow Diagram
Add a Distribution Explorer node to the Diagram Workspace.
Connect the Score node to the Distribution Explorer node.
Open the configuration interface of the Distribution Explorer node and select the Data tab. The Score node exports the training, validation, and score data sets to the Distribution Explorer node. By default, the Distribution Explorer node selects the training data set as the active data set. To view the distribution of the expected loss values, you must first set the score data set as the active data set. In the Data tab, open the Imports Map window, and then find and select the score data set. The score data set name contains the prefix "SD_". Confirm the selection to close the window.
Select the Variables tab. When you scored SAMPSIO.DMAGESCR, Enterprise Miner automatically created several score variables, such as predicted values, residuals, and classifications. Two important variables that you will plot are:
EL_GOOD_BAD_ -- contains the expected loss values for making the good decision.
D_GOOD_BAD_ -- assigns either the accept or reject decision status to an applicant in the score data set.
Note: For a complete listing of the variable names that are written to the scored data sets, see the Predictive Modeling section in the Enterprise Miner online reference documentation (Help > EM Reference > Predictive Modeling).
To assign EL_GOOD_BAD_ as the X-axis variable, right-click in the Axis cell for this variable, select Set Axis, and then select X. Repeat these steps to assign D_GOOD_BAD_ as the Y-axis variable. You will use the D_GOOD_BAD_ variable in code that you write in the SAS Code node. Write down the name of this variable for future reference.
To view a histogram of the expected losses for making the good decision, select the X Axis tab.
Note: The metadata sample is used to create the histogram in the X Axis tab and the bar chart in the Y Axis tab. For this example, there are only 75 applicants in the score data set. If the score data set contains more observations than the metadata sample, then you should examine the histogram that is created when you run the node. The node uses all of the observations in the score data set to generate the graphs that are shown in the Results Browser.
Applicants who have negative expected loss values (the yellow bars) pose a good credit risk. All of these applicants are assigned to the accept decision (D_GOOD_BAD_=accept). The orange and red bars represent the applicants who pose a bad credit risk. Because these applicants have positive expected loss values, they are assigned to the reject decision (D_GOOD_BAD_=reject).
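The relationship between the sign of the expected loss and the decision can be illustrated outside Enterprise Miner. The following is a minimal Python sketch, not the logic that Enterprise Miner itself runs: the expected loss values are hypothetical, the function name is invented for illustration, and the handling of a value of exactly zero is an assumption, because the text describes only negative and positive values.

```python
# Hypothetical expected loss values for the "good" decision.
# These numbers are illustrative and are not taken from SAMPSIO.DMAGESCR.
expected_losses = [-0.85, -0.10, 0.35, 1.20]


def decision_from_expected_loss(el_good):
    """Mirror the pattern described in the text: negative loss means accept."""
    # Assumption: a value of exactly zero is treated as reject.
    # The text only covers strictly negative and strictly positive values.
    return "accept" if el_good < 0 else "reject"


decisions = [decision_from_expected_loss(value) for value in expected_losses]
print(decisions)
```

In the scored data set, the decision is already stored in D_GOOD_BAD_, so a check like this is useful only for confirming that the two variables agree.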
To view a bar chart of the accepted and rejected customers in the score data set, select the Y Axis tab.
Note: The data in both charts is based on the metadata sample.
To determine the percentage of accepted and rejected customers in the score data set, select the Probe tool icon and click a bar. In this example, 68% of the applicants were assigned to the accept decision, and 32% of the applicants were assigned to the reject decision.
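The percentages that the Probe tool reports are relative frequencies of the decision variable. As a rough sketch in Python, assuming a hypothetical split of 51 accepted and 24 rejected applicants (counts chosen only because they are consistent with 68% and 32% of 75 applicants; the text does not state the counts), the calculation looks like this:

```python
from collections import Counter

# Assumed counts: 51 accepts and 24 rejects among 75 applicants.
# The text reports only the percentages, so these counts are hypothetical.
decisions = ["accept"] * 51 + ["reject"] * 24


def decision_percentages(values):
    """Return the percentage of observations at each decision level."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


percentages = decision_percentages(decisions)
for level, pct in sorted(percentages.items()):
    print(f"{level}: {pct:.0f}%")
```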
You can use the SAS Code node to create a data set that contains only those customers who pose a good credit risk (accept status). Alternatively, you could use the Filter Outliers node to create a data set that contains only the good credit applicants.
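In the process flow, this subsetting step would be written in SAS inside the SAS Code node. Purely to illustrate the filtering logic, here is a minimal Python sketch. Only the variable names EL_GOOD_BAD_ and D_GOOD_BAD_ come from this example; the records, the "applicant" key, and the function name are hypothetical, and the case-insensitive comparison is an assumption about how the decision values are stored.

```python
# Hypothetical scored records. Only the two variable names are from the text.
scored = [
    {"applicant": 1, "EL_GOOD_BAD_": -0.80, "D_GOOD_BAD_": "accept"},
    {"applicant": 2, "EL_GOOD_BAD_": 0.45, "D_GOOD_BAD_": "reject"},
    {"applicant": 3, "EL_GOOD_BAD_": -0.20, "D_GOOD_BAD_": "accept"},
    {"applicant": 4, "EL_GOOD_BAD_": 1.10, "D_GOOD_BAD_": "reject"},
]


def keep_accepted(rows, decision_var="D_GOOD_BAD_"):
    """Return only the rows whose decision variable equals 'accept'."""
    # Assumption: compare case-insensitively and ignore surrounding blanks,
    # because the stored casing of the decision value is not stated.
    return [
        row for row in rows
        if str(row[decision_var]).strip().lower() == "accept"
    ]


accepted = keep_accepted(scored)
print([row["applicant"] for row in accepted])
```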
To create a three-dimensional histogram of the expected losses for the accepted and rejected applicants, use the Tools menu to select Run Distribution Explorer.
After the node runs, click the confirmation button in the message window to display the histogram. To display frequencies on the vertical axis, use the View menu to select Axes Statistics > Response Axis > Frequency.
Close the Results Browser and then close the Distribution Explorer node.
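Conceptually, a three-dimensional histogram with frequencies on the vertical axis is a table of counts for each combination of an expected loss bin and a decision level. The sketch below shows one way to build such a table in Python. The data values, the bin width of 0.5, and the function name are all assumptions made for illustration, and the binning that the Distribution Explorer node actually uses may differ.

```python
import math
from collections import Counter

# Hypothetical (expected loss, decision) pairs, not taken from the score data.
rows = [
    (-0.9, "accept"), (-0.4, "accept"), (-0.1, "accept"),
    (0.3, "reject"), (0.8, "reject"), (0.9, "reject"),
]


def bin_bounds(value, width=0.5):
    """Return the (lower, upper) bounds of the bin that contains value."""
    # Assumption: fixed-width bins aligned at zero, closed on the lower bound.
    lower = math.floor(value / width) * width
    return (lower, lower + width)


# Count the observations in each (expected loss bin, decision) cell.
frequencies = Counter((bin_bounds(el), decision) for el, decision in rows)

for (bounds, decision), count in sorted(frequencies.items()):
    print(f"{bounds[0]:>5.1f} to {bounds[1]:>4.1f}  {decision:<6}  {count}")
```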
Copyright © 2006 by SAS Institute Inc., Cary, NC, USA. All rights reserved.