The HPBIN Procedure

Example 4.2 Pseudo–Quantile Binning in Distributed Mode

This example shows pseudo–quantile binning that is executed in distributed mode. The following DATA step generates 1,000,000 observations of two uniformly distributed variables, x1 on the interval (0, 1) and x2 on the interval (0, 10):

    data ex12;
        length id 8;
        do id=1 to 1000000;
            x1 = ranuni(101);
            x2 = 10*ranuni(201);
            output;
        end;
    run;
    

To run this program in distributed mode, you need to define two macro variables, GRIDHOST and GRIDINSTALLLOC, which supply the values of the HOST= and INSTALL= options in the PERFORMANCE statement:

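    /* Save each ODS table in a data set */
    ods output BinInfo=bininfo;
    ods output PerformanceInfo=perfInfo;
    ods output Mapping=mapTable;
    ods output Summary=Summary;
    ods output Quantile=Quantile;
    ods listing close;
    /* Bin x1 and x2 into 10 pseudo-quantile bins; also request */
    /* summary statistics and quantiles                         */
    proc hpbin data=ex12 output=out numbin=10 pseudo_quantile
               computestats computequantile;
        input x1-x2;
        /* Distributed mode: 4 nodes, 8 threads per node */
        performance nodes=4 nthreads=8 details
                    host="&GRIDHOST" install="&GRIDINSTALLLOC";
    run;
    ods listing;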
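
The macro variables that the PERFORMANCE statement references must exist before the PROC HPBIN step is submitted. A minimal sketch of defining them follows; both values are placeholders chosen for illustration, not real settings, and must be replaced with the host name and installation path of your own grid.

```sas
/* Placeholder values (assumptions): substitute your own grid settings. */
%let GRIDHOST       = your.grid.host;
%let GRIDINSTALLLOC = /your/grid/install/location;
```

Because the macro variables are resolved inside double quotation marks, the values are assigned here without quotation marks.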

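The Performance Information table in Output 4.2.1 shows the grid settings. The number of compute nodes and the number of threads per node match the NODES=4 and NTHREADS=8 options in the PERFORMANCE statement.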

Output 4.2.1: PROC HPBIN Performance Information

Performance Information
Host Node                    << your grid host >>
Install Location             << your grid install location >>
Execution Mode               Distributed
Grid Mode                    Symmetric
Number of Compute Nodes      4
Number of Threads per Node   8


The Binning Information table in Output 4.2.2 shows the binning method, number of bins, and number of variables.

Output 4.2.2: PROC HPBIN Binning Information

Binning Information
Method                     Pseudo-Quantile Binning
Number of Bins Specified   10
Number of Variables        2


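The Mapping table in Output 4.2.3 shows the level mapping of the input variables: the range of each bin and the frequency and proportion of observations that fall into it. Each of the 10 bins holds close to 100,000 of the 1,000,000 observations, a proportion of about 0.10.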

Output 4.2.3: PROC HPBIN Mapping

Mapping
Variable  Binned Variable  Range                      Frequency  Proportion
x1        BIN_x1           x1 < 0.099900                 100046     0.10005
                           0.099900 <= x1 < 0.199500     100029     0.10003
                           0.199500 <= x1 < 0.299300     100016     0.10002
                           0.299300 <= x1 < 0.399500      99939     0.09994
                           0.399500 <= x1 < 0.500000     100049     0.10005
                           0.500000 <= x1 < 0.599800      99989     0.09999
                           0.599800 <= x1 < 0.700400      99975     0.09998
                           0.700400 <= x1 < 0.800300     100014     0.10001
                           0.800300 <= x1 < 0.900299     100007     0.10001
                           0.900299 <= x1                 99936     0.09994
x2        BIN_x2           x2 < 0.997008                 100006     0.10001
                           0.997008 <= x2 < 1.995006     100025     0.10003
                           1.995006 <= x2 < 2.994005      99986     0.09999
                           2.994005 <= x2 < 3.995004     100034     0.10003
                           3.995004 <= x2 < 4.999002      99990     0.09999
                           4.999002 <= x2 < 5.998001     100063     0.10006
                           5.998001 <= x2 < 6.993000      99929     0.09993
                           6.993000 <= x2 < 7.998998     100008     0.10001
                           7.998998 <= x2 < 8.999997     100010     0.10001
                           8.999997 <= x2                 99949     0.09995
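
The frequencies in the Mapping table can be cross-checked against the OUTPUT= data set. The following is a minimal sketch, not part of the original example. It assumes that the data set out is available in the client session after the step finishes and that it contains the binned variables BIN_x1 and BIN_x2 named in the Mapping table.

```sas
/* Tabulate the binned variables written by PROC HPBIN.             */
/* Assumption: WORK.OUT exists and contains BIN_x1 and BIN_x2.      */
proc freq data=out;
    tables BIN_x1 BIN_x2 / nocum;   /* NOCUM suppresses cumulative columns */
run;
```

If those assumptions hold, the counts for each bin level should agree with the Frequency column of the Mapping table.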


The Summary Statistics table in Output 4.2.4 displays basic statistics for each variable: the number of observations, number of missing observations, mean, median, standard deviation, minimum, maximum, and number of bins.

Output 4.2.4: PROC HPBIN Summary Statistics Table

Summary Statistics
Variable        N  N Missing     Mean   Median  Std Dev     Minimum   Maximum  N Bins
x1        1000000          0  0.49984  0.49991  0.28895  2.24449E-7   1.00000      10
x2        1000000          0  4.99688  4.99852  2.88736  9.10833E-6  10.00000      10


The Quantiles and Extremes table in Output 4.2.5 shows the quantile computation of the given variables. The ODS table is generated only when the COMPUTEQUANTILE option is specified in the PROC HPBIN statement.

Output 4.2.5: PROC HPBIN Quantile Computation

Quantiles and Extremes
Variable  Quantile Level    Quantile
x1        Max                1.00000
          .99                0.99012
          .95                0.95025
          .90                0.90024
          .75 Q3             0.75032
          .50 Median         0.49991
          .25 Q1             0.24932
          .10                0.09986
          .05                0.04954
          .01                0.01001
          Min             2.24449E-7
x2        Max               10.00000
          .99                9.90137
          .95                9.49989
          .90                8.99939
          .75 Q3             7.49894
          .50 Median         4.99852
          .25 Q1             2.49432
          .10                0.99692
          .05                0.49879
          .01                0.10062
          Min             9.10833E-6
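
For comparison, the same quantile levels can be requested from PROC MEANS on the input data set. This is a sketch added for illustration and is not part of the original example. It runs on the client against the ex12 data set created earlier, and small differences from the PROC HPBIN values would not be surprising, because the two procedures do not necessarily compute quantiles by the same method.

```sas
/* Quantiles and extremes of x1 and x2 computed by PROC MEANS.      */
/* The keywords match the levels in the Quantiles and Extremes      */
/* table: 1%, 5%, 10%, 25% (Q1), 50% (median), 75% (Q3), 90%,       */
/* 95%, 99%, plus the minimum and maximum.                          */
proc means data=ex12 min p1 p5 p10 q1 median q3 p90 p95 p99 max maxdec=5;
    var x1 x2;
run;
```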