Problem Formulation

Overview of the Predictive Modeling Case

A financial services company offers a home equity line of credit to its clients. The company has extended several thousand lines of credit in the past, and approximately 20% of these accepted applicants have defaulted on their loans. By using geographic, demographic, and financial variables, the company wants to build a model to predict whether an applicant will default.

Input Data Source

After analyzing the data, the company selected a subset of 12 predictor (or input) variables to model whether each applicant defaulted. The response (or target) variable BAD indicates whether an applicant defaulted on the home equity line of credit. These variables, along with their model role, measurement level, and description, are shown in the following table.
Note: This book uses uppercase for variable names. SAS accepts mixed case and lowercase variable names as well.

Name      Model Role   Measurement Level   Description
BAD       Target       Binary              A value of 1 indicates that the client defaulted on the loan or is seriously delinquent. A value of 0 indicates that the client paid off the loan.
CLAGE     Input        Interval            Age of the oldest credit line, measured in months
CLNO      Input        Interval            Number of credit lines
DEBTINC   Input        Interval            Debt-to-income ratio
DELINQ    Input        Interval            Number of delinquent credit lines
DEROG     Input        Interval            Number of major derogatory reports
JOB       Input        Nominal             Occupational categories
LOAN      Input        Interval            Amount requested for the loan
MORTDUE   Input        Interval            Amount due on the existing mortgage
NINQ      Input        Interval            Number of recent credit inquiries
REASON    Input        Binary              The value DebtCon indicates that the loan was intended for debt consolidation. The value HomeImp indicates that the loan was for home improvement.
VALUE     Input        Interval            Value of the current property
YOJ       Input        Interval            Years at the applicant’s current job

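The Model Role and Measurement Level columns are metadata that tell a modeling tool how to treat each variable. As a minimal sketch in Python (not SAS code, and not how SAS itself stores variable metadata), the table can be held as a list of records and queried, for example to group the inputs by measurement level. The names VariableSpec, HMEQ_VARIABLES, and inputs_by_level are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VariableSpec:
    """One row of the HMEQ variable table (hypothetical structure)."""
    name: str
    role: str   # "Target" or "Input"
    level: str  # "Binary", "Interval", or "Nominal"


# Name, model role, and measurement level, transcribed from the table.
HMEQ_VARIABLES: List[VariableSpec] = [
    VariableSpec("BAD", "Target", "Binary"),
    VariableSpec("CLAGE", "Input", "Interval"),
    VariableSpec("CLNO", "Input", "Interval"),
    VariableSpec("DEBTINC", "Input", "Interval"),
    VariableSpec("DELINQ", "Input", "Interval"),
    VariableSpec("DEROG", "Input", "Interval"),
    VariableSpec("JOB", "Input", "Nominal"),
    VariableSpec("LOAN", "Input", "Interval"),
    VariableSpec("MORTDUE", "Input", "Interval"),
    VariableSpec("NINQ", "Input", "Interval"),
    VariableSpec("REASON", "Input", "Binary"),
    VariableSpec("VALUE", "Input", "Interval"),
    VariableSpec("YOJ", "Input", "Interval"),
]


def inputs_by_level(specs: List[VariableSpec]) -> Dict[str, List[str]]:
    """Group input (predictor) variable names by measurement level."""
    grouped: Dict[str, List[str]] = {}
    for spec in specs:
        if spec.role == "Input":  # skip the target BAD
            grouped.setdefault(spec.level, []).append(spec.name)
    return grouped


if __name__ == "__main__":
    groups = inputs_by_level(HMEQ_VARIABLES)
    print(sum(len(names) for names in groups.values()))  # 12
    print(groups["Nominal"], groups["Binary"])  # ['JOB'] ['REASON']
```

Grouping by level is useful because interval inputs can be used as numbers directly, whereas nominal and binary inputs such as JOB and REASON typically need to be encoded (for example, as dummy variables) before they enter a regression-type model.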
The SAMPSIO.HMEQ data set contains 5,960 observations for building and comparing competing models. The data set is split into training, validation, and test data sets for analysis.
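The text does not state how the data are partitioned or in what proportions. One common approach is stratified random partitioning: each BAD class is shuffled and split separately, so the roughly 20% default rate carries over into the training, validation, and test data sets. The following is a minimal Python sketch of that idea, not a SAS implementation. The 40/30/30 fractions, the seed, and the function name stratified_split are assumptions for illustration, and the demo uses synthetic BAD values rather than SAMPSIO.HMEQ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    targets: Sequence[int],
    fractions: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # assumed train/valid/test shares
    seed: int = 12345,
) -> Tuple[List[int], List[int], List[int]]:
    """Return sorted row indices for the training, validation, and test sets,
    splitting each target class separately (stratified random sampling)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(targets):
        by_class[y].append(i)

    rng = random.Random(seed)  # fixed seed makes the partition reproducible
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * fractions[0])
        n_valid = min(round(n * fractions[1]), n - n_train)
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:n_train + n_valid])
        test.extend(indices[n_train + n_valid:])  # remainder goes to test
    return sorted(train), sorted(valid), sorted(test)


if __name__ == "__main__":
    # Synthetic BAD values: 5,960 rows with a 20% default rate (not the real data).
    bad = [1] * 1192 + [0] * 4768
    for name, part in zip(("train", "valid", "test"), stratified_split(bad)):
        rate = sum(bad[i] for i in part) / len(part)
        print(name, len(part), f"{rate:.3f}")
    # prints: train 2384 0.200 / valid 1788 0.200 / test 1788 0.200
```

Stratifying on the target matters most when one class is relatively rare, as defaults are here; a purely random split can leave the validation or test set with a noticeably different default rate than the training set.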