The HPSAMPLE Procedure

Example 9.3 Running with Data on the SAS Appliance

This example uses the same data set as in Example 9.1. It demonstrates how to use PROC HPSAMPLE to perform oversampling.

When the input data set resides on the SAS appliance, the SAS appliance performs all of the sampling, writes the sample to the SAS appliance, and reports the frequency results back to the client. In the following statements, the input data resides in the MyLib library (which is a distributed data source), and the output data set is a distributed data set in the same library. The PROC DELETE step removes the output table if it already exists on the SAS appliance; you can also use PROC DATASETS for this purpose. The ODS OUTPUT FreqTable=Freqtab; statement saves the frequency table to a SAS data set called Freqtab on the client.

 /* MyLib is a libref for a distributed data source.
    In this case, the computation is automatically done
    on the SAS appliance. */

 option set=GRIDHOST       = "&GRIDHOST";
 option set=GRIDINSTALLLOC = "&GRIDINSTALLLOC";
 option set=GRIDMODE = "&GRIDMODE";
 libname MyLib &LIBTYPE
         server  ="&GRIDDATASERVER"
         user    =&USER
         password=&PASSWORD
         database=&DATABASE;

 proc delete data=MyLib.out_smp_ex3; run;

 proc hpsample data=MyLib.hmeq out=MyLib.out_smp_ex3 seed=13579 partition
    samppctevt=80  eventprop=.2 event="SALES";
     var loan value delinq derog;
     class job;
     target job;
     ods output FreqTable=Freqtab;
 run;
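
Because the ODS OUTPUT statement saves the frequency table on the client, you can inspect it there with an ordinary SAS step. The following is a minimal sketch, assuming that the preceding PROC HPSAMPLE step ran successfully and that Freqtab was created in the default WORK library; the column names in Freqtab are not assumed here.

```sas
/* Print the frequency table that ODS OUTPUT saved on the client. */
/* Assumes the PROC HPSAMPLE step above created WORK.Freqtab.     */
proc print data=Freqtab;
run;
```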

Output 9.3.1 shows the performance environment information.

Output 9.3.1: Performance Information

The HPSAMPLE Procedure

Performance Information
Host Node                     greenarrow.unx.sas.com
Execution Mode                Distributed
Grid Mode                     Symmetric
Number of Compute Nodes       16
Number of Threads per Node    1


Output 9.3.2 shows the number of observations for each level of the target variable JOB in the data set MyLib.Hmeq and in the sample. After oversampling, the proportion of the SALES level in the sample is approximately 20%, compared with the original 1.8% in the population.

Output 9.3.2: Frequency Table

Oversampling Frequency Table
Target Level    Number of Obs    Number of Samples
                          279                   15
MGR                       767                   47
OFFICE                    948                   56
OTHER                    2388                  142
PROFEXE                  1276                   77
SALES                     109                   87
SELF                      193                   15
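
The proportions quoted for Output 9.3.2 can be checked directly from the counts in the table. The following DATA step is an illustrative sketch only: the counts are typed in by hand from Output 9.3.2 rather than read from Freqtab, and the variable names are hypothetical. It computes the share of SALES in the population (109/5960, about 1.8%), the share of SALES in the sample (87/439, about 19.8%), and the fraction of SALES observations that were sampled (87/109, about 79.8%), which correspond to the EVENTPROP=.2 and SAMPPCTEVT=80 settings in the PROC HPSAMPLE statement.

```sas
/* Counts copied by hand from Output 9.3.2; nothing is read from MyLib. */
data _null_;
   pop_total = 279 + 767 + 948 + 2388 + 1276 + 109 + 193;  /* 5960 observations */
   smp_total = 15 + 47 + 56 + 142 + 77 + 87 + 15;          /* 439 sampled rows  */
   pop_event = 109;   /* SALES observations in MyLib.Hmeq */
   smp_event = 87;    /* SALES observations in the sample */

   pop_prop = pop_event / pop_total;   /* event share before sampling */
   smp_prop = smp_event / smp_total;   /* event share after sampling  */
   evt_rate = smp_event / pop_event;   /* share of events selected    */

   /* Write the three ratios to the SAS log as percentages. */
   put pop_prop= percent8.1 smp_prop= percent8.1 evt_rate= percent8.1;
run;
```

The sample share is close to, rather than exactly, 20% because the counts are whole numbers.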