The HPSAMPLE Procedure

Getting Started: HPSAMPLE Procedure

The following example draws a 10% stratified sample, with the HPSAMPLE procedure using the target variable BAD as the stratification variable:

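 /* Draw a 10% sample stratified on BAD; PARTITION flags the sampled rows */
 proc hpsample data=Sampsio.Hmeq out=Smp samppct=10 seed=1234 partition;
     var loan derog mortdue value yoj delinq
         clage ninq clno debtinc;
     class bad reason job;
     target bad;
 run;

 /* Print the output data set */
 proc print data=Smp;
 run;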

The input data set Sampsio.Hmeq includes information about 5,960 fictitious mortgages. Each observation represents an applicant for a home equity loan, and all applicants have an existing mortgage.

The SAMPPCT=10 option requests that 10% of the input data be sampled. The SEED=1234 option sets the random seed that is used in the sampling process. The PARTITION option specifies that the output data set, Smp, include an indicator that shows whether each observation is selected for the sample (1) or not (0).

The VAR statement specifies 10 numeric input variables, and the CLASS statement specifies 3 classification input variables. All of these variables are included in the output sample. The binary TARGET variable BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent.

The displayed output contains a performance table (Figure 8.1) that shows the performance environment information and a frequency table (Figure 8.2) that shows the frequency of observations in each level of BAD.
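
One way to check the partition indicator is to cross-tabulate it against BAD in the output data set. The following is a minimal sketch, not part of the original example. It assumes that the indicator variable added by the PARTITION option is named _PartInd_; confirm the actual name in your Smp data set (for example, with PROC CONTENTS) before running it.

 /* Assumption: the PARTITION indicator in Smp is named _PartInd_ */
 proc freq data=Smp;
     /* Count sampled (1) and unsampled (0) rows in each level of BAD */
     tables bad*_PartInd_ / norow nocol nopercent;
 run;

If the assumption holds, the cell counts for the sampled rows should agree with the sample counts that the frequency table reports for each level of BAD.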

Figure 8.1: Performance Information

The HPSAMPLE Procedure

Performance Information

Execution Mode       Single-Machine
Number of Threads    1


Figure 8.2: Frequency Table

Stratified Sampling Frequency Table

Target Level    Number of Obs    Number of Sample
0               4771             478
1               1189             118
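
The sample counts sum to 478 + 118 = 596, which is exactly 10% of the 5,960 input observations, and each stratum is sampled at a rate close to 10%. As a rough check, the per-stratum rates can be computed from the counts in Figure 8.2 with a short DATA step. This is an illustrative sketch only; the variable names pct0, pct1, and pct_all are hypothetical and are not produced by PROC HPSAMPLE.

 data _null_;
     /* Counts copied from Figure 8.2 */
     pct0    = 100 * 478 / 4771;                     /* sampling rate for BAD=0 */
     pct1    = 100 * 118 / 1189;                     /* sampling rate for BAD=1 */
     pct_all = 100 * (478 + 118) / (4771 + 1189);    /* overall sampling rate   */
     put pct0= 6.2 pct1= 6.2 pct_all= 6.2;           /* write the rates to the log */
 run;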