From the Project Panel, drag the CS_REJECTS data source into the
Diagram Workspace. Connect the CS_REJECTS data set to the
Reject Inference node.
The Reject Inference node attempts to infer the behavior (good or bad),
or performance, of the rejected applicants using one of three
industry-accepted inference methods. You can set the inference method
using the Inference Method property.
The following inference methods are supported in SAS Enterprise Miner:
-
Fuzzy —
Fuzzy classification uses partial classifications of “good” and “bad”
to classify the rejected applicants in the augmented data set. Instead
of classifying each observation as either “good” or “bad,” fuzzy
classification assigns weights to observations in the augmented data
set. Each weight reflects the observation's tendency to be “good”
or “bad.”
The partial classification information is the probability of being
good or bad, obtained by applying the model built with the CS_ACCEPTS
data set to the CS_REJECTS data set. Fuzzy classification multiplies
these probabilities by the user-specified Rejection Rate property to
form frequency variables. This results in two observations for each
observation in the CS_REJECTS data set. Let p(good) be the probability
that an observation represents a good applicant and p(bad) be the
probability that an observation represents a bad applicant. The first
observation has a frequency variable defined as
(Rejection Rate)*p(good) and a target value of 0. The second
observation has a frequency variable defined as
(Rejection Rate)*p(bad) and a target value of 1.
Fuzzy is
the default inference method.
-
Hard Cutoff —
Hard Cutoff classification assigns each observation a status of either
good or bad based on a cutoff score. If you choose Hard Cutoff as your
inference method, you must specify a Cutoff Score in the Hard Cutoff
properties. Any score below the cutoff value is allocated a status of
bad. You must also specify the Rejection Rate in the General
properties. The Rejection Rate is applied to the CS_REJECTS data set
as a frequency variable.
-
Parceling —
Parceling distributes binned, scored rejected applicants into either a
good bin or a bad bin. Distribution is based on the expected bad rates
that are calculated from the scores from the logistic regression
model. The parameters that must be defined for parceling vary
according to the Score Range Method that you select in the Parceling
properties group. All parceling classifications require that you
specify the Rejection Rate, Score Range Method, Min Score, Max Score,
and Score Buckets properties.
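The Fuzzy method described above can be illustrated with a short Python sketch. This is not SAS Enterprise Miner code; the function name `fuzzy_augment`, the input layout, and the dictionary keys are assumptions made for illustration. It only reproduces the arithmetic stated in the text: each rejected applicant becomes two weighted rows.

```python
def fuzzy_augment(rejects, rejection_rate=0.3):
    """Expand each rejected applicant into two weighted rows.

    `rejects` is a list of (applicant_id, p_good) pairs, where p_good is
    the probability of being good from the model built on the accepts.
    The names and layout here are hypothetical, for illustration only.
    """
    augmented = []
    for applicant_id, p_good in rejects:
        p_bad = 1.0 - p_good
        # First row: target 0 (good), frequency = Rejection Rate * p(good).
        augmented.append(
            {"id": applicant_id, "target": 0, "freq": rejection_rate * p_good}
        )
        # Second row: target 1 (bad), frequency = Rejection Rate * p(bad).
        augmented.append(
            {"id": applicant_id, "target": 1, "freq": rejection_rate * p_bad}
        )
    return augmented


rows = fuzzy_augment([("A", 0.8), ("B", 0.25)], rejection_rate=0.3)
for row in rows:
    print(row)
```

The two frequencies for each applicant sum to the Rejection Rate, so every rejected applicant carries the same total weight regardless of its predicted probability.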
You must specify a value for the Rejection Rate property when you use
either the Hard Cutoff or Parceling inference method. The Rejection
Rate is used as a frequency variable. The rate of bad applicants is
defined as the number of bad applicants divided by the total number of
applicants. The value for the Rejection Rate property must be a real
number between 0.0001 and 1. The default value is 0.3.
The Cutoff Score property is used when you specify Hard Cutoff as the
inference method. The Cutoff Score is the threshold score that is used
to classify good and bad observations in the Hard Cutoff method.
Scores below the threshold value are assigned a status of bad; all
other observations are classified as good.
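As a rough illustration of the Hard Cutoff rule, the following Python sketch classifies scored rejects against a threshold. It is not SAS Enterprise Miner code: the function name `hard_cutoff`, the example scores, and the output layout are hypothetical. The target coding (0 for good, 1 for bad) follows the coding that the Fuzzy description uses.

```python
def hard_cutoff(scored_rejects, cutoff_score, rejection_rate=0.3):
    """Classify scored rejects as good (0) or bad (1) using a threshold.

    `scored_rejects` is a list of (applicant_id, score) pairs. The layout
    is an assumption for illustration, not the node's actual output.
    """
    rows = []
    for applicant_id, score in scored_rejects:
        # Scores strictly below the cutoff are bad; all others are good.
        target = 1 if score < cutoff_score else 0
        # The Rejection Rate is attached to every reject as its frequency.
        rows.append({"id": applicant_id, "target": target, "freq": rejection_rate})
    return rows


classified = hard_cutoff([("A", 180), ("B", 200), ("C", 240)], cutoff_score=200)
for row in classified:
    print(row)
```

Note that an applicant whose score equals the cutoff is classified as good, because only scores below the threshold are assigned a status of bad.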
The Parceling properties group is available when you specify Parceling
as the inference method.
The following properties are available in the Parceling properties
group:
-
Score Range Method —
Use the Score Range Method property to specify how you want to define
the range of scores to be bucketed. The available methods are as
follows:
-
Accepts —
The Accepts score range method distributes the rejected applicants
into equal-sized buckets based on the score range of the CS_ACCEPTS
data set.
-
Rejects —
The Rejects score range method distributes the rejected applicants
into equal-sized buckets based on the score range of the CS_REJECTS
data set.
-
Scorecard —
The Scorecard score range method distributes the rejected applicants
into equal-sized buckets based on the score range of the augmented
data set.
-
Manual —
The Manual score range method distributes the rejected applicants into
equal-sized buckets based on the range that you input.
-
Score Buckets —
Use the Score Buckets property to specify the number of buckets into
which you want to parcel the data set during attribute classification.
Permissible Score Buckets property values are integers between 1 and
100. The default setting for the Score Buckets property is 25.
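The bucketing that these properties control can be sketched in Python as follows. This is a minimal sketch that assumes "equal-sized" means equal-width intervals over the range from Min Score to Max Score; the source does not state how SAS Enterprise Miner handles boundaries, so the clamping of out-of-range scores and the function name `bucket_index` are assumptions.

```python
def bucket_index(score, min_score, max_score, n_buckets=25):
    """Return the 0-based index of the equal-width bucket holding `score`.

    Assumes equal-width buckets over [min_score, max_score]. Scores outside
    the range are clamped to the first or last bucket (an assumption).
    """
    if not 1 <= n_buckets <= 100:
        raise ValueError("Score Buckets must be an integer between 1 and 100")
    if max_score <= min_score:
        raise ValueError("Max Score must be greater than Min Score")
    width = (max_score - min_score) / n_buckets
    index = int((score - min_score) // width)
    # Clamp so that max_score and out-of-range scores fall in a valid bucket.
    return max(0, min(n_buckets - 1, index))


# Example with a hypothetical score range of 100 to 350 and 25 buckets.
for s in (100, 110, 349, 350):
    print(s, bucket_index(s, min_score=100, max_score=350))
```

With the Manual method, `min_score` and `max_score` would be the values that you enter; with the other methods, they would be taken from the score range of the corresponding data set.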
When you use the Parceling inference method, you must also specify the
Event Rate Increase property. The proportion of bad and good
observations in the CS_REJECTS data set is not expected to approximate
the proportion of bad and good observations in the CS_ACCEPTS data
set. Logically, the bad rate of the CS_REJECTS data set should be
higher than that of the CS_ACCEPTS data set, so it is appropriate to
use a multiplier that classifies a higher proportion of rejected
applicants as bad. When you use Parceling, the observed event rate of
the accepts data is multiplied by the value of the Event Rate Increase
property to determine the event rate for the rejects data. To
configure a 20% increase, set the Event Rate Increase property to 1.2.
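The Event Rate Increase arithmetic can be checked with a small Python sketch. The function name `rejects_event_rate` and the example rate of 10% are hypothetical, and capping the result at 1.0 is an assumption added so that the value remains a valid rate; the source does not describe how the node handles that case.

```python
def rejects_event_rate(accepts_event_rate, event_rate_increase):
    """Scale the observed accepts event rate to get the rejects event rate.

    The cap at 1.0 is an assumption so the result stays a valid proportion.
    """
    return min(1.0, accepts_event_rate * event_rate_increase)


# An observed bad rate of 10% among accepts with a 20% increase (1.2)
# gives an expected bad rate of about 12% among rejects.
rate = rejects_event_rate(0.10, 1.2)
print(round(rate, 4))
```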