The Characteristic report
detects and quantifies shifts in the distribution of variable
values in the input data over time. Shifts in the distribution of
input variables can point to significant changes in customer behavior that
are due to new technology, competition, marketing promotions, new
laws, or other influences.
To find shifts, the
Characteristic report compares the distributions of the variables
in these two data sets:
- the training data set that was used to develop the model
- the current data set
If large enough shifts
occur in the distribution of variable values over time, the original
model might not be the best predictive or classification tool to use
with the current data.
The Characteristic report
uses a deviation index to quantify the shift in the distribution of
a variable's values between the training data set and the
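current data set. The deviation index is computed for each predictor
variable in the data set, using this equation:

\[ \text{Deviation Index} = \sum_{i} \left(\%\text{Actual}_i - \%\text{Expected}_i\right) \ln\left(\frac{\%\text{Actual}_i}{\%\text{Expected}_i}\right) \]

Here, %Expected is the percentage of training observations in bin i,
%Actual is the percentage of current observations in the same bin,
and the sum is taken over all bins of the variable.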
Numeric predictor variable
values are placed into bins for frequency analysis. Outlier values
are removed to facilitate better placement of values and to avoid
scenarios that can aggregate most observations into a single bin.
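
The binning and index computation can be illustrated with a minimal Python sketch. It is not the report's implementation. It assumes that the deviation index takes the stability-index form, the sum over bins of (%Actual - %Expected) * ln(%Actual / %Expected), that bin edges are supplied by the caller, and that values outside the outermost edges are treated as outliers and dropped. The names bin_counts, deviation_index, and eps are hypothetical.

```python
import math
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin. Bins are [edges[i], edges[i+1]); the last bin
    also includes its upper edge. Values outside the outer edges are
    dropped (assumption: they are treated as outliers)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outlier: excluded from the frequency analysis
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def deviation_index(expected_counts, actual_counts, eps=1e-6):
    """Sum over bins of (%actual - %expected) * ln(%actual / %expected).
    eps (an assumption of this sketch) floors empty bins so that the
    logarithm stays defined."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both data sets must use the same bins")
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    if exp_total == 0 or act_total == 0:
        raise ValueError("each data set needs at least one observation")
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / exp_total, eps)  # share of training observations
        pa = max(a / act_total, eps)  # share of current observations
        total += (pa - pe) * math.log(pa / pe)
    return total
```

For example, deviation_index([50, 50], [50, 50]) returns 0.0 because the two distributions are identical, and the value grows as the current counts move away from the training counts.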
If the training data
set and the current data set have identical distributions for a variable,
the variable's deviation index is equal to 0. A variable with a deviation
index value that is P1>2 is classified as having a mild deviation.
The Characteristic report uses the performance measure P1 to count
the number of variables that receive a deviation index value that
is greater than 0.1.
A variable that has
a deviation index value that is P1>5 or P25>0 is classified
as having a significant deviation. The performance measure P25 counts
the number of variables that have significant deviations, that is,
the number of input variables that receive a deviation index
value that is greater than or equal to 0.25.
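
The two performance measures can be sketched as simple counts over the per-variable deviation index values. This minimal Python example uses the thresholds stated above (greater than 0.1 for P1, greater than or equal to 0.25 for P25). The function name count_deviations, the variable names, and the index values are hypothetical and for illustration only.

```python
def count_deviations(indexes):
    """Return (p1, p25) for a mapping of variable name -> deviation index.
    p1 counts values greater than 0.1; p25 counts values greater than or
    equal to 0.25, matching the thresholds described in the text."""
    p1 = sum(1 for d in indexes.values() if d > 0.1)
    p25 = sum(1 for d in indexes.values() if d >= 0.25)
    return p1, p25


# Hypothetical deviation index values, for illustration only.
example = {"age": 0.05, "income": 0.12, "region": 0.25, "tenure": 0.40}
p1, p25 = count_deviations(example)
```

With these illustrative values, three variables exceed 0.1 and two reach 0.25, so p1 is 3 and p25 is 2.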