This example uses the same data set as Example 9.1 and demonstrates how to use PROC HPSAMPLE to perform stratified sampling.
When the input data set resides on the client and a PERFORMANCE statement with a NODES= option is specified, as in the following statements, PROC HPSAMPLE copies the data set to the SAS appliance, where the sampling is performed:
    /* Perform the computation on the SAS appliance using 2 nodes */
    option set=GRIDHOST="&GRIDHOST";
    option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";

    proc hpsample data=sampsio.hmeq out=out2 samppct=10 seed=13579 partition;
       var loan value delinq derog;
       class job reason;
       target job;
       performance nodes = 2;
    run;

    proc print data=out2(obs=15);
    run;
Output 9.2.1 shows the performance environment information.
Output 9.2.2 shows the frequency information for each level of the target variable JOB in the data set Sampsio.Hmeq and in the sample.
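One way to cross-check these per-level frequencies is to tabulate JOB against the partition indicator in the output data set. The following is a minimal sketch, not part of the original example. It assumes that the preceding PROC HPSAMPLE step has run and created Work.Out2 with the variable "_PartInd_". The MISSING option is included so that observations with a blank JOB value are counted as their own level rather than excluded from the table.

```sas
/* Sketch: cross-tabulate the target variable against the partition indicator. */
/* Assumes Work.Out2 exists and contains JOB and _PartInd_.                     */
proc freq data=out2;
   tables job*_partind_ / missing norow nocol nopercent;
run;
```

For each level of JOB, the column for "_PartInd_" equal to 1 gives the number of observations selected for the sample, and the row total gives the number of observations in the input data.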
Output 9.2.3 shows the first 15 observations of the output data set. Each observation includes the partition indicator variable "_PartInd_", which indicates whether the observation is selected for the sample (1) or not (0).
Output 9.2.3: Sample Output with Partition Indicator
Obs | JOB | REASON | LOAN | VALUE | DELINQ | DEROG | _PartInd_ |
---|---|---|---|---|---|---|---|
1 | Other | HomeImp | 1100 | 39025 | 0 | 0 | 0 |
2 | Other | HomeImp | 1300 | 68400 | 2 | 0 | 0 |
3 | Other | HomeImp | 1500 | 16700 | 0 | 0 | 0 |
4 | | | 1500 | . | . | . | 0 |
5 | Office | HomeImp | 1700 | 112000 | 0 | 0 | 0 |
6 | Other | HomeImp | 1700 | 40320 | 0 | 0 | 0 |
7 | Other | HomeImp | 1800 | 57037 | 2 | 3 | 0 |
8 | Other | HomeImp | 1800 | 43034 | 0 | 0 | 1 |
9 | Other | HomeImp | 2000 | 46740 | 2 | 0 | 0 |
10 | Sales | HomeImp | 2000 | 62250 | 0 | 0 | 0 |
11 | | | 2000 | . | . | . | 0 |
12 | Office | HomeImp | 2000 | 29800 | 1 | 0 | 1 |
13 | Other | HomeImp | 2000 | 55000 | 0 | 0 | 0 |
14 | Mgr | | 2000 | 87400 | 0 | 0 | 0 |
15 | Other | HomeImp | 2100 | 83850 | 1 | 0 | 0 |
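Because the output data set keeps every input observation and only flags the selected ones, a follow-up step is needed if you want the sample by itself. The following is a minimal sketch, assuming that Work.Out2 was created as shown earlier. The data set name Sample_Only is hypothetical and is used here only for illustration.

```sas
/* Sketch: keep only the observations that were selected for the sample. */
/* Sample_Only is a hypothetical name; Work.Out2 is assumed to exist.    */
data sample_only;
   set out2;
   where _partind_ = 1;
run;

/* Print the first 15 selected observations */
proc print data=sample_only(obs=15);
run;
```

In the 15 observations listed above, only observations 8 and 12 have "_PartInd_" equal to 1, so these would be the first two rows of the subset.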