| Category | Description |
|---|---|
| Model Stability | Tracks the change in distribution between the modeling data and the scoring data. |
| Model Performance | |
| Model Calibration | Checks the accuracy of the PD and LGD models by comparing the quantification of the risk components with the available standards. |

| Measure | Description | PD Report | LGD Report |
|---|---|---|---|
| System Stability Index (SSI) | SSI monitors the score distribution over a time period. | Yes | Yes |
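
The table above describes SSI only in words. As a rough illustration, the sketch below uses the population-stability form that is common in credit scoring: the sum over score bins of (actual share - expected share) * ln(actual share / expected share). That formula, the function name `stability_index`, the `eps` floor, and the sample bin counts are all assumptions for illustration and are not taken from the product documentation.

```python
import math


def stability_index(expected_counts, actual_counts, eps=1e-6):
    """Assumed SSI form: sum((a - e) * ln(a / e)) over score bins.

    expected_counts: accounts per score bin in the modeling data.
    actual_counts:   accounts per score bin in the scoring data.
    eps:             floor on each share so empty bins do not break the log.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same score bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    index = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        index += (a_share - e_share) * math.log(a_share / e_share)
    return index


# Hypothetical bin counts for the modeling and scoring samples.
modeling_bins = [100, 200, 400, 200, 100]
scoring_bins = [120, 220, 380, 190, 90]
print(round(stability_index(modeling_bins, scoring_bins), 6))
```

Under this form, identical distributions give an index of zero, and the index grows as the scoring-data distribution drifts away from the modeling-data distribution.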

| Measure | Description | PD Report | LGD Report |
|---|---|---|---|
| Accuracy | Accuracy is the proportion of the total number of predictions that were correct. | Yes | No |
| Accuracy Ratio (AR) | AR is the summary index of the Cumulative Accuracy Profile (CAP) and is also known as the Gini coefficient. It shows the performance of the model being evaluated by depicting the percentage of defaulted accounts that the model captures across different scores. | Yes | Yes |
| Area Under Curve (AUC) | AUC can be interpreted as the average ability of the rating model to accurately classify non-default accounts and default accounts. It represents the discrimination between the two populations. A higher area denotes higher discrimination. An AUC of 0.5 means that non-default accounts and default accounts are classified at random, and an AUC of 1 means that the scoring model classifies non-default accounts and default accounts perfectly. Thus, for a model that performs no worse than random classification, the AUC ranges between 0.5 and 1. | Yes | No |
| Bayesian Error Rate (BER) | BER is the proportion of the whole sample that is misclassified when the rating system is in optimal use. For a perfect rating model, the BER has a value of zero. A model's BER depends on the probability of default. The lower the BER, the lower the classification error and the better the model. | Yes | No |
| D Statistic | The D Statistic is the mean difference of scores between default accounts and non-default accounts, weighted by the relative distribution of those scores. | Yes | No |
| Error Rate | The Error Rate is the proportion of the total number of predictions that were incorrect. | Yes | No |
| Information Statistic (I) | The Information Statistic value is a weighted sum of the difference between conditional default and conditional non-default rates. The higher the value, the more likely a model can predict a default account. | Yes | No |
| Kendall's Tau-b | Kendall's tau-b is a nonparametric measure of association based on the number of concordances and discordances in paired observations. Values range between -1 and +1. A positive association indicates that the ranks of both variables increase together. A negative association indicates that as the rank of one variable increases, the rank of the other variable decreases. | Yes | No |
| Kullback-Leibler Statistic (KL) | KL is a non-symmetric measure of the difference between the distributions of default accounts and non-default accounts. This score has similar properties to the information value. | Yes | No |
| Kolmogorov-Smirnov Statistic (KS) | KS is the maximum distance between two population distributions. This statistic helps discriminate default accounts from non-default accounts. It is also used to determine the best cutoff in application scoring. The best cutoff maximizes KS, which becomes the best differentiator between the two populations. The KS value can range between 0 and 1, where 1 implies that the model is perfectly accurate in predicting default accounts or separating the two populations. A higher KS denotes a better model. | Yes | No |
| 1-PH Statistic (1-PH) | 1-PH is the percentage of cumulative non-default accounts for the cumulative 50% of the default accounts. | Yes | No |
| Mean Square Error (MSE), Mean Absolute Deviation (MAD), and Mean Absolute Percent Error (MAPE) | MSE, MAD, and MAPE are generated for LGD reports. These statistics measure the differences between the actual LGD and the predicted LGD. | No | Yes |
| Pietra Index | The Pietra Index is a summary index of Receiver Operating Characteristic (ROC) statistics. It is defined as the maximum area of a triangle that can be inscribed between the ROC curve and the diagonal of the unit square.<br>The Pietra Index can take values between 0 and 0.353. As a rating model's performance improves, the value moves closer to 0.353. The index can be interpreted as the maximum difference between the cumulative frequency distributions of default accounts and non-default accounts. | Yes | No |
| Precision | Precision is the proportion of the actual default accounts among the predicted default accounts. | Yes | No |
| Sensitivity | Sensitivity is the ability to correctly classify default accounts that have actually defaulted. | Yes | No |
| Somers' D (p-value) | Somers' D is a nonparametric measure of association that is based on the number of concordances and discordances in paired observations. It is an asymmetric modification of Kendall's tau. Somers' D differs from Kendall's tau in that it uses a correction only for pairs that are tied on the independent variable. Values range between -1 and +1. A positive association indicates that the ranks of both variables increase together. A negative association indicates that as the rank of one variable increases, the rank of the other variable decreases. | Yes | No |
| Specificity | Specificity is the ability to correctly classify non-default accounts that have not defaulted. | Yes | No |
| Validation Score | The Validation Score is the average scaled value of seven distance measures, anchored to a scale of 1 to 13, lowest to highest. The seven measures are the mean difference (D), the percentage of cumulative non-default accounts for the cumulative 50% of the default accounts (1-PH), the maximum deviation (KS), the Gini coefficient (G), the Information Statistic (I), the Area Under the Curve (AUC) or Receiver Operating Characteristic (ROC) statistic, and the Kullback-Leibler statistic (KL). | Yes | No |
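
Several of the measures in the table above follow directly from their definitions. The sketch below is a minimal, pure-Python illustration rather than the product's implementation. It assumes that `1` marks a default account, that a higher score means a higher predicted risk of default, and that AR is related to AUC by the standard identity AR = 2 * AUC - 1. The function names, the sample data, and the 0.5 cutoff are hypothetical.

```python
def confusion_measures(actual, predicted):
    """Accuracy, error rate, precision, sensitivity, specificity (1 = default)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def split_scores(actual, scores):
    """Separate scores into default and non-default groups."""
    defaults = [s for a, s in zip(actual, scores) if a == 1]
    non_defaults = [s for a, s in zip(actual, scores) if a == 0]
    return defaults, non_defaults


def auc(actual, scores):
    """Share of (default, non-default) pairs ranked correctly; ties count half."""
    defaults, non_defaults = split_scores(actual, scores)
    wins = 0.0
    for d in defaults:
        for n in non_defaults:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(defaults) * len(non_defaults))


def accuracy_ratio(actual, scores):
    """AR (Gini coefficient) from the identity AR = 2 * AUC - 1."""
    return 2.0 * auc(actual, scores) - 1.0


def ks_statistic(actual, scores):
    """Maximum gap between the two empirical cumulative score distributions."""
    defaults, non_defaults = split_scores(actual, scores)
    best = 0.0
    for cutoff in sorted(set(scores)):
        cdf_default = sum(s <= cutoff for s in defaults) / len(defaults)
        cdf_non_default = sum(s <= cutoff for s in non_defaults) / len(non_defaults)
        best = max(best, abs(cdf_default - cdf_non_default))
    return best


# Hypothetical sample: 1 = default, scores are predicted PDs.
actual = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # illustrative 0.5 cutoff

print(confusion_measures(actual, predicted))
print(auc(actual, scores), accuracy_ratio(actual, scores), ks_statistic(actual, scores))
```

The pairwise AUC loop is quadratic in the number of accounts, which is acceptable for a small illustration but would normally be replaced by a rank-based calculation on production-sized data.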

| Measure | Description | PD Report | LGD Report |
|---|---|---|---|
| Binomial Test | The Binomial Test evaluates whether the PD of a pool is correctly estimated. It does not take into account correlated defaults, and it generally yields an overestimate of the significance of deviations in the realized default rate from the forecast rate. The Modified Binomial Test addresses this overestimate by taking correlated defaults into account (footnote 1).<br>The default correlation coefficient in SAS Decision Manager is 0.04. By using past banking evaluations, you can use these rho values (footnote 2):<br>If the number of default accounts per pool exceeds either the low limit (binomial test at 0.95 confidence) or the high limit (binomial test at 0.99 confidence), the test suggests that the model is poorly calibrated.<br>To change the default rho value, contact your application administrator. The value is a report option in SAS Management Console. | Yes | No |
| Brier Skill Score (BSS) | BSS measures the accuracy of probability assessments at the account level. It measures the average squared deviation between predicted probabilities for a set of events and their outcomes. Therefore, a lower score represents a higher accuracy. | Yes | No |
| Confidence Interval | The Confidence Interval indicates the confidence interval (CI) band of the PD or LGD for a pool. The Probability of Default report compares the actual and estimated PD rates with the CI limits of the estimate. If the estimated PD lies within the CI limits of the actual PD, the PD model performs better in estimating actual outcomes.<br>For the Loss Given Default (LGD) report, confidence intervals are based on the pool-level average of the estimated LGD, plus or minus the pool-level standard deviation multiplied by the 1-(alpha/2) quantile of the standard normal distribution. | Yes | Yes |
| Correlation Analysis | The model validation report for LGD provides a correlation analysis of the estimated LGD with the actual LGD. This correlation analysis is an important measure of a model's usefulness. The Pearson correlation coefficients are provided at the pool and overall levels for each time period that is examined. | No | Yes |
| Hosmer-Lemeshow Test (p-value) | The Hosmer-Lemeshow test is a statistical goodness-of-fit test for classification models. The test assesses whether the observed event rates match the expected event rates in pools. Models for which expected and observed event rates in pools are similar are well calibrated. The p-value of this test is a measure of the accuracy of the estimated default probabilities. The closer the p-value is to zero, the poorer the calibration of the model. | Yes | No |
| Mean Absolute Deviation (MAD) | MAD is the distance between the account-level estimated LGD and the actual LGD, averaged at the pool level. | No | Yes |
| Mean Absolute Percent Error (MAPE) | MAPE is the absolute value of the account-level difference between the estimated and actual LGD, divided by the estimated LGD, and averaged at the pool level. | No | Yes |
| Mean Squared Error (MSE) | MSE is the squared distance between the account-level estimated LGD and the actual LGD, averaged at the pool level. | No | Yes |
| Normal Test | The Normal Test compares the normalized difference of predicted and actual default rates per pool with two limits estimated over multiple observation periods. This test measures the pool stability over time. If a majority of the pools lie in the rejection region, to the right of the limits, then the pooling strategy should be revisited. | Yes | No |
| Observed versus Estimated Index | The Observed versus Estimated Index is a measure of the closeness of the observed and estimated default rates. It measures the model's ability to predict default rates. The closer the index is to zero, the better the model performs in predicting default rates. | Yes | No |
| Traffic Lights Test | The Traffic Lights Test evaluates whether the PD of a pool is underestimated, but unlike the binomial test, it does not assume that cross-pool performance is statistically independent. If the number of default accounts per pool exceeds either the low limit (Traffic Lights Test at 0.95 confidence) or the high limit (Traffic Lights Test at 0.99 confidence), the test suggests that the model is poorly calibrated. | Yes | No |
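
To make a few of the calibration measures above concrete, the sketch below computes the low and high limits of a plain binomial test for one pool, the account-level LGD error measures, and the LGD confidence band. It is a minimal illustration under stated assumptions, not the product's implementation: defaults are treated as independent (so the Modified Binomial Test and its rho value are not modeled), the sample standard deviation is assumed for the pool-level standard deviation, and all function names and sample figures are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def binomial_limit(n_accounts, pd_estimate, confidence):
    """Smallest k with P(X <= k) >= confidence for X ~ Binomial(n, p).

    Assumes independent defaults. Observing more than k defaults in the
    pool is evidence, at that confidence level, that the PD is too low.
    """
    cumulative = 0.0
    for k in range(n_accounts + 1):
        cumulative += (
            math.comb(n_accounts, k)
            * pd_estimate ** k
            * (1.0 - pd_estimate) ** (n_accounts - k)
        )
        if cumulative >= confidence:
            return k
    return n_accounts


def lgd_errors(estimated, actual):
    """MSE, MAD, and MAPE over one pool, following the table's definitions.

    MAPE divides by the estimated LGD, as the table states.
    """
    n = len(estimated)
    diffs = [e - a for e, a in zip(estimated, actual)]
    mse = sum(d * d for d in diffs) / n
    mad = sum(abs(d) for d in diffs) / n
    mape = sum(abs(d) / e for d, e in zip(diffs, estimated)) / n
    return mse, mad, mape


def lgd_confidence_interval(estimated, alpha=0.05):
    """Pool mean of estimated LGD +/- z(1 - alpha/2) * pool standard deviation."""
    mean = fmean(estimated)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    spread = z * stdev(estimated)  # sample standard deviation is an assumption
    return mean - spread, mean + spread


# Hypothetical pool: 100 accounts, estimated PD of 2%, 7 observed defaults.
low_limit = binomial_limit(100, 0.02, 0.95)
high_limit = binomial_limit(100, 0.02, 0.99)
observed_defaults = 7
print(low_limit, high_limit, observed_defaults > high_limit)

# Hypothetical account-level LGD values for one pool.
estimated_lgd = [0.4, 0.5, 0.2]
actual_lgd = [0.5, 0.4, 0.2]
print(lgd_errors(estimated_lgd, actual_lgd))
print(lgd_confidence_interval(estimated_lgd))
```

Summing the binomial probabilities term by term keeps the example dependency-free; for large pools a library cumulative distribution function would be the more practical choice.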