The following example uses the HPSAMPLE procedure to draw a 10% stratified sample, with the target variable BAD serving as the stratification variable:
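/* Draw a 10% sample stratified by BAD; PARTITION flags the sampled rows */
proc hpsample data=Sampsio.Hmeq out=Smp samppct=10 seed=1234 partition;
   /* Numeric input variables */
   var loan derog mortdue value yoj delinq
       clage ninq clno debtinc;
   /* Classification variables, including the target */
   class bad reason job;
   /* Stratification (target) variable */
   target bad;
run;

/* List the output data set, including the selection indicator */
proc print data=Smp;
run;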
The input data set Sampsio.Hmeq contains information about 5,960 fictitious mortgages. Each observation represents an applicant for a home equity loan, and
all applicants have an existing mortgage. The SAMPPCT=10 option requests a sample of 10% of the input data, and the SEED=1234 option
sets the random seed that is used in the sampling process. The PARTITION option specifies that the output data
set, Smp, include an indicator that shows whether each observation is selected for the sample (1) or not (0). The VAR statement
specifies 10 numeric input variables, and the CLASS statement specifies three classification variables (BAD, REASON, and JOB). All these variables
are included in the output data set. The binary target variable BAD, which is named in the TARGET statement, indicates whether an applicant eventually defaulted or was ever seriously delinquent. Because BAD is the target, the sample is stratified by its levels. The displayed output contains a performance
table (Figure 8.1) that shows the performance environment information and a frequency table (Figure 8.2) that shows, for each level of BAD, the number of observations and the number selected for the sample.
Figure 8.1: Performance Information
| Performance Information | |
|---|---|
| Execution Mode | Single-Machine |
| Number of Threads | 1 |
Figure 8.2: Frequency Table
| Stratified Sampling Frequency Table | | |
|---|---|---|
| Target Level | Number of Obs | Number of Sample |
| 0 | 4771 | 478 |
| 1 | 1189 | 118 |
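The counts in Figure 8.2 can be cross-checked against the output data set, and the sampled rows can be separated from the rest. The following is a minimal sketch, not part of the original example. It assumes that the PARTITION option names its indicator variable `_PartInd_`; confirm the name with PROC CONTENTS before relying on it. The data set name SmpOnly is hypothetical.

```sas
/* Assumption: PARTITION adds an indicator named _PartInd_      */
/* (1 = selected for the sample, 0 = not selected).              */
/* List the variable names in Smp to confirm the indicator name. */
proc contents data=Smp short;
run;

/* Per-stratum counts and sampling rate.                         */
/* Because the indicator is 0/1, its sum is the number sampled   */
/* and its mean is the fraction sampled in each level of BAD.    */
proc sql;
   select bad,
          count(*)        as n_obs,
          sum(_PartInd_)  as n_sample,
          mean(_PartInd_) as sample_rate format=percent8.2
   from Smp
   group by bad;
quit;

/* Keep only the sampled observations (SmpOnly is a hypothetical name). */
data SmpOnly;
   set Smp;
   where _PartInd_ = 1;
run;
```

If the assumption about the indicator name holds, the n_obs and n_sample columns should agree with the Number of Obs and Number of Sample columns in Figure 8.2, and the sampling rate in each level of BAD should be close to the requested 10%.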