Example Process Flow Diagram

Task 11. Assessing the Models

You use the Assessment node to judge the generalization properties of each predictive model based on characteristics such as predictive power, lift, sensitivity, and profit or loss.

  1. Add an Assessment node to the Diagram Workspace.

  2. Connect each modeling node to the Assessment node. Assessment statistics are automatically calculated by each modeling node during training. The Assessment node assembles these statistics, thus enabling you to compare the models with assessment charts.

  3. Open the Assessment node. The Models tab of the Assessment Tool window is displayed.

  4. Select all three models by dragging your mouse pointer across each model row entry.

    [Models tab of Assessment Tool window showing Tree, Neural Network, and Regression models highlighted for selection.]

  5. To create a lift chart (gains chart) for the models, select Lift Chart from the Tools pull-down menu. Alternatively, you can create a lift chart by selecting the Draw Lift Chart tool (the second tool on the Toolbox). By default, the validation data set is used to create the assessment charts. For a binary target, the lift chart does not adjust for the expected loss; it considers only the event posterior probabilities.

    [Lift Chart window plotting cumulative % response vs. Percentile for Baseline, Tree, Neural, and Regression Models.]

    By default, the Assessment node displays a cumulative %Response lift chart. In this example chart, the customer cases are ordered from left to right, starting with the individuals that each model predicts as most likely to have good credit. The ordered cases are then divided into ten deciles along the X axis; the left-most decile represents the 10% of the customers that a model scores as most likely to have good credit. The vertical axis shows the actual cumulative response rate, that is, the observed percentage of good credit risks through each decile. (A sketch of this calculation appears at the end of this step.)

    The lift chart displays the cumulative % response values for a baseline model and for the three predictive models. The legend at the bottom of the display corresponds to each of the models in the chart. Note that the default legend might not have enough room to display all models.

    To resize the legend, select the Move and Resize Legend tool icon ( [Resize Legend Tool Icon] ), click and hold the mouse pointer on a legend resize handle, and then drag the handle until the legend is the size that you want.

    You measure the performance of each model by determining how well the models capture the good credit risk applicants across the various deciles. To display a text box that shows the % response for a point, select the View Info tool ( [The View Info Tool Icon]), and then click and hold on a point. For the regression model, the second decile contains 97.92% good credit risk applicants.

    [Lift chart window with view info popup showing Percentile, % Response, and Tool Name information for the selected model and datapoint on the plot.]

    The regression model captures a few more good applicants in the second decile than the neural network model does. In the other deciles, the performance of the neural network model is as good as or better than that of the regression model.

    For this data, the tree model does not perform quite as well as the regression and neural network models. Note that since a different random seed is used to generate the starting values for training the neural network, your results may differ slightly from the neural network results that are displayed in this lift chart.
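    If you want to reproduce the cumulative %Response numbers outside the chart, the following SAS sketch computes them from a scored validation data set. The data set and variable names (WORK.SCORED, P_GOOD, GOOD) are hypothetical stand-ins for your own scored output, not names that the Assessment node uses internally.

      /* Rank the cases into ten deciles, highest P_GOOD first. */
      proc rank data=work.scored out=work.ranked groups=10 descending;
         var p_good;        /* posterior probability of good credit */
         ranks decile;      /* 0 = top decile, 9 = bottom decile */
      run;

      proc sort data=work.ranked;
         by decile;
      run;

      /* Count the actual good credit risks in each decile. */
      proc means data=work.ranked noprint;
         by decile;
         var good;          /* binary target: 1 = good credit risk */
         output out=work.perdecile sum=events n=count;
      run;

      /* Accumulate counts to get the cumulative %Response per decile. */
      data work.lift;
         set work.perdecile;
         cum_events + events;    /* sum statements retain across rows */
         cum_count  + count;
         cum_pct_response = 100 * cum_events / cum_count;
      run;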

  6. To create a loss chart, select Loss.

    [Lift Chart window with plot of Cumulative Loss vs Percentile for Baseline, Tree, and Neural models.]

    The loss chart shows the expected loss across all deciles for each model and for the baseline model. In the first and second deciles, the regression model provides minimal expected loss values of about -70 and -64 cents, respectively (remember that you can use the View Info tool to probe a point on the loss chart). In the remaining deciles, the neural network model provides smaller expected loss values than the regression model does. The performance of the tree model is not as good as that of the neural network model or the regression model in the earlier deciles. The regression model becomes the poorest performing model beyond the fifth decile; in fact, beyond the seventh decile, it yields positive loss values. (A sketch of the underlying expected loss calculation appears at the end of this step.)

    There does not seem to be a clear-cut champion model to use for subsequent scoring. In fact, the German credit data set is a highly noisy data set, which makes it difficult to develop a strongly predictive model; this is typical of data mining problems. The selection of the champion model for scoring ultimately depends on how many applicants you intend to target. For this example, you will use the neural network model to score the SAMPSIO.DMAGESCR data set.
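    As a rough cross-check of the loss chart, the following sketch computes the average expected loss per decile from the posterior probabilities. The loss matrix values (-1 for accepting a good risk, 5 for accepting a bad risk) are placeholders; substitute the loss matrix that is defined in your target profile. The sketch reuses the hypothetical WORK.RANKED data set from the previous sketch.

      /* Expected loss of accepting each applicant under an assumed
         loss matrix: P(good)*loss_if_good + P(bad)*loss_if_bad.
         The -1 and 5 values are placeholders, not the values from
         this example's target profile. */
      data work.exploss;
         set work.ranked;
         exp_loss = p_good * (-1) + (1 - p_good) * 5;
      run;

      /* Average expected loss per decile (WORK.RANKED was already
         sorted by decile in the previous sketch). */
      proc means data=work.exploss noprint;
         by decile;
         var exp_loss;
         output out=work.lossbydecile mean=avg_exp_loss;
      run;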

  7. Select the model for subsequent scoring. To export the neural network model to subsequent nodes in the process flow (for example, to the Score node), follow these steps:

    1. Select the Output tab.

    2. Click on the Neural Network entry in the Highlight Model for Output list box.

    [Output Tab of Assessment Tool window showing Neural Network model highlighted in list for selection.]

  8. Close the Assessment node.
