Next, you will build a new regression model based on the imputed data
set. From the Model tab, drag a Regression node to your diagram
workspace. Connect the Impute node to the new Regression node.
Right-click the Regression (2) node and click Run. In the Confirmation
window, click Yes. After the node has successfully run, click Results
in the Run Status window.
In the Score Rankings Overlay window, click Cumulative % Response on
the drop-down menu in the upper left corner. Notice that this model
shows a smooth decrease in the cumulative percent response, which
contrasts with the previous model. This suggests that the new model
is better than the first model at predicting who will default on a
loan.
The discussion of the remaining charts refers to those who defaulted
on a loan as defaulters or respondents. This is because the target
level of interest is BAD=1.
Position the mouse over a point on the Cumulative % Response plot to
see the cumulative percent response for that point on the curve.
Notice that at the first decile (the top 10% of the data),
approximately 69% of the loan recipients defaulted on their loan.
On the drop-down menu in the upper left corner of the Score Rankings
Overlay window, click % Response. This chart shows the non-cumulative
percentage of loan recipients who defaulted at each level of the
input data.
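The difference between the cumulative and non-cumulative charts can be
sketched in Python. This is a minimal illustration, not the way SAS
Enterprise Miner computes its charts: the function name
response_by_bin, the number of bins, and the ten scored records are
all hypothetical and were made up for this example.

```python
def response_by_bin(scored, n_bins=10):
    """Rank records by predicted probability and summarize each bin.

    scored: list of (predicted_probability, actual) pairs, actual in {0, 1}.
    Returns one dict per bin with the depth, the non-cumulative percent
    response, and the cumulative percent response.
    """
    n = len(scored)
    if n < n_bins:
        raise ValueError("need at least one record per bin")

    # Highest predicted probability of default first.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    rows = []
    cumulative_events = 0
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        actuals = [actual for _, actual in ranked[start:end]]
        events = sum(actuals)
        cumulative_events += events
        rows.append({
            "depth_pct": 100 * end / n,
            # Defaulters in this bin only.
            "pct_response": 100 * events / len(actuals),
            # Defaulters in this bin and all higher-ranked bins.
            "cum_pct_response": 100 * cumulative_events / end,
        })
    return rows


# Hypothetical scored records: (predicted probability, actual BAD value).
example = [
    (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.50, 0),
    (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0), (0.05, 0),
]
rows = response_by_bin(example, n_bins=5)
for row in rows:
    print(row)
```

In this made-up data, the non-cumulative values jump from bin to bin
while the cumulative values change more gradually, because each
cumulative value averages over every record ranked at or above that
depth.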
On the drop-down menu, click Cumulative Lift.
Lift charts plot the same information on a different scale. As
discussed earlier, the overall response rate is approximately 20%.
You calculate lift by dividing the response rate in a given group by
the overall response rate. The percentage of respondents in the first
decile was approximately 69%, so the lift for that decile is
approximately 69/20 = 3.45. Position the cursor over the cumulative
lift chart at the first decile to see that the calculated lift for
that point is 3.4. This indicates that the response rate in the first
decile is more than three times the response rate in the population.
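As a quick check of this arithmetic, the following Python sketch
divides a group's response rate by the overall response rate. The
function name lift is a hypothetical helper, and the two rates are the
approximate values quoted in the text, not values read from the
software.

```python
def lift(group_response_rate, overall_response_rate):
    """Lift = response rate in a group / overall response rate."""
    if overall_response_rate <= 0:
        raise ValueError("overall response rate must be positive")
    return group_response_rate / overall_response_rate


# Approximate values from the text: 69% in the first decile, 20% overall.
first_decile_lift = lift(0.69, 0.20)
print(round(first_decile_lift, 2))
```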
Instead of asking the question, “What percentage of those in a bin
were defaulters?”, you could ask the question, “What percentage of the
total number of defaulters are in a bin?” The latter question can be
evaluated by using the Cumulative % Captured Response curve. Click
Cumulative % Captured Response on the drop-down menu.
You can calculate lift from this chart as well. If you were to take a
random sample of 10% of the observations, you would expect to capture
10% of the defaulters. Likewise, if you take a random sample of 20% of
the data, you would expect to capture 20% of the defaulters. To
calculate lift, divide the percentage of the defaulters that were
captured by the percentage of the data that you have chosen for action
(rejection of the loan application).
Note that at the 20th
percentile, approximately 55% of those who defaulted are identified.
At the 30th percentile, approximately 68% of those who defaulted are
identified. The corresponding lift values for those two percentiles
are approximately 2.75 and 2.27, respectively. Observe that lift depends
on the proportion of those who have been chosen for action. Lift generally
decreases as you choose larger proportions of the data for action.
When you compare two models on the same proportion of the data, the
model that has the higher lift is often preferred, excluding issues
that involve model complexity and interpretability.
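The two lift values quoted above can be reproduced with a short Python
sketch. The function name lift_from_captured is a hypothetical helper,
and the inputs are the approximate percentages from the text.

```python
def lift_from_captured(captured_pct, depth_pct):
    """Lift = percent of defaulters captured / percent of data chosen for action."""
    if depth_pct <= 0:
        raise ValueError("depth must be positive")
    return captured_pct / depth_pct


# Approximate values from the text.
lift_at_20 = lift_from_captured(55, 20)  # 20th percentile, 55% captured
lift_at_30 = lift_from_captured(68, 30)  # 30th percentile, 68% captured
print(round(lift_at_20, 2), round(lift_at_30, 2))
```

A random selection would capture the same percentage of defaulters as
the percentage of data chosen, which corresponds to a lift of 1 at
every depth.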
Note: A model that performs best
in one decile might perform poorly in other deciles. Therefore, when
you compare competing models, your choice of the final model might
depend on the proportion of individuals that you have chosen for action.
As with the initial Regression node results, you can view the Effects
Plot for this model. Note that in this model, the most important
variables are DELINQ, JOB, DEROG, NINQ, and REASON.