The following example shows 10% stratified sampling, in which the HPSAMPLE procedure
uses the target variable BAD as the stratification variable:
proc hpsample data=Sampsio.Hmeq out=Smp samppct=10 seed=1234 partition;
   var loan derog mortdue value yoj delinq clage ninq clno debtinc;
   class bad reason job;
   target bad;
run;

proc print data=Smp;
run;
The input data set Sampsio.Hmeq includes information about 5,960 fictitious mortgages. Each observation represents an
applicant for a home equity loan, and all applicants have an existing mortgage. The SAMPPCT=10 option specifies that 10% of
the input data be sampled. The SEED= option specifies that the random seed used in the sampling process be 1234. The
PARTITION option specifies that the output data set, Smp, include an indicator that shows whether each observation is
selected for the sample (1) or not (0). The VAR statement specifies 10 numeric input variables, and the CLASS statement
specifies three classification input variables. All these variables are included in the output sample. The binary target
variable BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent. The TARGET statement
triggers stratified sampling, which enables you to sample each subpopulation (stratum) defined by a level of the target
variable independently. The displayed output contains a performance table (Figure 9.1) that shows the performance
environment information and a frequency table (Figure 9.2) that shows the frequency of observations in each level of BAD.
Figure 9.1: Performance Information
Figure 9.2: Frequency Table
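One way to confirm that the sampling was stratified is to cross-tabulate BAD against the partition indicator in the output data set. The following step is a minimal sketch, not part of the original example. It assumes that the indicator variable that the PARTITION option adds is named _PartInd_; if your output data set uses a different name, substitute it.

```sas
/* Sketch: check the sampling rate within each level of BAD.        */
/* Assumption: the PARTITION indicator in Smp is named _PartInd_.   */
proc freq data=Smp;
   /* Row percentages show the share of each BAD level that was     */
   /* selected (_PartInd_=1) versus not selected (_PartInd_=0).     */
   tables bad*_PartInd_ / nocol nopercent;
run;
```

If the strata were sampled independently at the requested rate, the row percentage for the selected observations should be close to 10% in both levels of BAD.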