The HPSAMPLE Procedure

Getting Started: HPSAMPLE Procedure

The following example draws a 10% stratified sample, in which the HPSAMPLE procedure uses the target variable BAD as the stratification variable:

 proc hpsample data=Sampsio.Hmeq out=Smp samppct=10 seed=1234 partition;
     var loan derog mortdue value yoj delinq
         clage ninq clno debtinc;
     class bad reason job;
     target bad;
 run;

 proc print data=Smp;
 run;

The input data set Sampsio.Hmeq includes information about 5,960 fictitious mortgages. Each observation represents an applicant for a home equity loan, and all applicants have an existing mortgage.

The SAMPPCT=10 option specifies that 10% of the input data be sampled. The SEED=1234 option sets the random seed used in the sampling process to 1234. The PARTITION option specifies that the output data set, Smp, include an indicator that shows whether each observation is selected for the sample (1) or not (0). The VAR statement specifies 10 numeric input variables, and the CLASS statement specifies three classification input variables. All these variables are included in the output data set. The binary target variable BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent. The TARGET statement triggers stratified sampling, in which each subpopulation (stratum) defined by a level of the target variable is sampled independently.

The displayed output contains a performance table (Figure 9.1) that shows the performance environment information and a frequency table (Figure 9.2) that shows the frequency of observations in each level of BAD.
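As an optional check on the stratification described above, the partition indicator in Smp can be cross-tabulated against BAD. The following is a minimal sketch, not part of the original example. It assumes that the indicator variable is named _PartInd_; that name is not stated in this section, so the PROC CONTENTS step is included first to confirm the actual variable name in your output data set.

```sas
/* List the variables in Smp to confirm the name of the partition indicator */
proc contents data=Smp varnum;
run;

/* Assumption: the indicator is named _PartInd_ (1 = sampled, 0 = not sampled). */
/* Replace the name if PROC CONTENTS shows a different one.                    */
proc freq data=Smp;
    tables bad*_PartInd_ / norow nocol nopercent;
run;
```

If the assumption holds, the cells for which the indicator equals 1 should correspond to the per-stratum sample counts that the procedure reports.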

Figure 9.1: Performance Information

The HPSAMPLE Procedure

Performance Information
Execution Mode       Single-Machine
Number of Threads    1

Figure 9.2: Frequency Table

One Target Stratified Sampling Frequency
Target Level    Number of Obs    Number of Samples
0               4771             478
1               1189             118
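The counts in Figure 9.2 can be checked against the requested sampling percentage. The two strata contribute 478 + 118 = 596 sampled observations, which is 10% of the 5,960 input observations. The following sketch recomputes the per-stratum sampling rate from the figures in the table. The data set name strata_check and its variable names are hypothetical and are used only for illustration.

```sas
/* Recompute the sampling rate in each stratum from the Figure 9.2 counts. */
/* strata_check, n_obs, n_sample, and pct_sampled are illustrative names.  */
data strata_check;
    input bad n_obs n_sample;
    pct_sampled = 100 * n_sample / n_obs;
    format pct_sampled 6.2;
    datalines;
0 4771 478
1 1189 118
;
run;

proc print data=strata_check noobs;
run;
```

Both rates come out close to 10% (slightly above for BAD=0 and slightly below for BAD=1), which is consistent with each stratum being sampled independently at the requested percentage.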