The following example performs 10% stratified sampling, in which the HPSAMPLE procedure uses the levels of the target variable BAD as strata:
```sas
proc hpsample data=Sampsio.Hmeq out=Smp samppct=10 seed=1234 partition;
   var loan derog mortdue value yoj delinq clage ninq clno debtinc;
   class bad reason job;
   target bad;
run;

proc print data=Smp;
run;
```
The input data set Sampsio.Hmeq contains information about 5,960 fictitious mortgages. Each observation represents an applicant for a home equity loan, and all applicants have an existing mortgage. The SAMPPCT=10 option requests that 10% of the input data be sampled. The SEED= option sets the random seed used in the sampling process to 1234. The PARTITION option specifies that the output data set, Smp, include an indicator that shows whether each observation is selected for the sample (1) or not (0). The VAR statement specifies 10 numeric input variables, and the CLASS statement specifies 3 classification input variables. All of these variables are included in the output data set. The binary TARGET variable BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent. The displayed output contains a performance information table (Figure 8.1), which describes the execution environment, and a frequency table (Figure 8.2), which shows the number of observations in each level of BAD in the input data and in the sample.
Figure 8.1: Performance Information
| Performance Information | |
|---|---|
| Execution Mode | Single-Machine |
| Number of Threads | 1 |
Figure 8.2: Frequency Table
Stratified Sampling Frequency Table

| Target Level | Number of Obs | Number of Sample |
|---|---|---|
| 0 | 4771 | 478 |
| 1 | 1189 | 118 |
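As a worked check of the counts in Figure 8.2, assume proportional allocation: each stratum $h$ with $N_h$ input observations contributes roughly $n_h \approx f N_h$ sampled observations, where $f = 0.10$ comes from SAMPPCT=10. The notation $f$, $N_h$, and $n_h$ is introduced here for illustration only.

$$
\begin{aligned}
f N_0 &= 0.10 \times 4771 = 477.1 \quad (\text{reported } n_0 = 478) \\
f N_1 &= 0.10 \times 1189 = 118.9 \quad (\text{reported } n_1 = 118) \\
f (N_0 + N_1) &= 0.10 \times 5960 = 596 = n_0 + n_1
\end{aligned}
$$

The total sample size equals 10% of the 5,960 input observations exactly, and each level's share of the sample stays close to its share of the input (478/596 ≈ 0.802 versus 4771/5960 ≈ 0.801 for BAD=0). The output does not state how HPSAMPLE distributes the fractional per-stratum counts, so the 478/118 split should not be read as a general rounding rule.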