The HPBIN Procedure

Example 4.2 Pseudo–Quantile Binning in Distributed Mode

This example shows pseudo–quantile binning that is executed in distributed mode. The following DATA step generates 1,000,000 observations:

data ex12;
   length id 8;
   do id=1 to 1000000;
      x1 = ranuni(101);      /* uniform on (0,1) */
      x2 = 10*ranuni(201);   /* uniform on (0,10) */
      output;
   end;
run;

You can run PROC HPBIN in distributed mode by specifying valid values for the NODES=, INSTALL=, and HOST= options in the PERFORMANCE statement. Instead of specifying the INSTALL= and HOST= options in the PERFORMANCE statement, you can set appropriate values for the GRIDHOST and GRIDINSTALLLOC environment variables by using OPTIONS SET commands. See the section "Processing Modes" in Chapter 3, "Shared Concepts and Topics," for details about setting these options or environment variables.
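
As a rough illustration of the environment-variable alternative, the following statements set GRIDHOST and GRIDINSTALLLOC by using OPTIONS SET=. The host name and install path shown here are placeholders, not values from this example; substitute the values for your own grid.

```sas
/* Placeholder values: replace with your grid host and install location */
options set=GRIDHOST="my.grid.host.example.com";
options set=GRIDINSTALLLOC="/opt/TKGrid";
```

With these environment variables set, the HOST= and INSTALL= options can be left out of the PERFORMANCE statement.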

The following statements provide an example. To run these statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to macro variables with appropriate values.

/* Capture each ODS table in a data set */
ods output BinInfo=bininfo;
ods output PerformanceInfo=perfInfo;
ods output Mapping=mapTable;
ods output Summary=Summary;
ods output Quantile=Quantile;
ods listing close;
/* Pseudo-quantile binning into 10 bins, with summary statistics and quantiles */
proc hpbin data=ex12 output=out numbin=10 pseudo_quantile
   computestats computequantile ;
   input x1-x2;
   /* Distributed mode: 4 nodes, 8 threads per node */
   performance nodes=4 nthreads=8
      host="&GRIDHOST" install="&GRIDINSTALLLOC";
run;
ods listing;
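
The PERFORMANCE statement above refers to the macro variables GRIDHOST and GRIDINSTALLLOC, which must be defined before the step runs. A minimal sketch of one way to define them follows; the values are hypothetical placeholders, not values from this example.

```sas
/* Hypothetical values: define these before the PROC HPBIN step */
%let GRIDHOST = my.grid.host.example.com;
%let GRIDINSTALLLOC = /opt/TKGrid;
```

Because the references appear inside double quotation marks, they resolve to these values when the PERFORMANCE statement is processed.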

The "Performance Information" table in Output 4.2.1 shows the grid settings.

Output 4.2.1: PROC HPBIN Performance Information

Performance Information
Host Node                   << your grid host >>
Install Location            << your grid install location >>
Execution Mode              Distributed
Number of Compute Nodes     4
Number of Threads per Node  8



The "Binning Information" table in Output 4.2.2 shows the binning method, number of bins, and number of variables.

Output 4.2.2: PROC HPBIN Binning Information

Binning Information
Method                    Pseudo-Quantile Binning
Number of Bins Specified  10
Number of Variables       2



The "Mapping" table in Output 4.2.3 shows the level mapping of the input variables.

Output 4.2.3: PROC HPBIN Mapping

Mapping
Variable  Binned Variable  Range                              Frequency  Proportion
x1        BIN_x1           x1 < 0.0999001409                     100046  0.10004600
                           0.0999001409 <= x1 < 0.1995000577     100029  0.10002900
                           0.1995000577 <= x1 < 0.2992999743     100016  0.10001600
                           0.2992999743 <= x1 < 0.3994998905      99939  0.09993900
                           0.3994998905 <= x1 < 0.4999998065     100049  0.10004900
                           0.4999998065 <= x1 < 0.5997997231      99989  0.09998900
                           0.5997997231 <= x1 < 0.700399639       99975  0.09997500
                           0.700399639 <= x1 < 0.8002995555      100014  0.10001400
                           0.8002995555 <= x1 < 0.9002994719     100007  0.10000700
                           0.9002994719 <= x1                     99936  0.09993600
x2        BIN_x2           x2 < 0.9970077388                     100006  0.10000600
                           0.9970077388 <= x2 < 1.9950063678     100025  0.10002500
                           1.9950063678 <= x2 < 2.9940049955      99986  0.09998600
                           2.9940049955 <= x2 < 3.9950036204     100034  0.10003400
                           3.9950036204 <= x2 < 4.9990022412      99990  0.09999000
                           4.9990022412 <= x2 < 5.9980008689     100063  0.10006300
                           5.9980008689 <= x2 < 6.992999502       99929  0.09992900
                           6.992999502 <= x2 < 7.9989981201      100008  0.10000800
                           7.9989981201 <= x2 < 8.999996745      100010  0.10001000
                           8.999996745 <= x2                      99949  0.09994900
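
One way to cross-check the frequencies in the "Mapping" table is to tabulate the binned variables in the OUTPUT= data set. The following sketch assumes that the data set out from the example contains the variables BIN_x1 and BIN_x2, as the "Binned Variable" column suggests; it is an illustrative check, not part of the original example.

```sas
/* Tabulate the binned variables written by OUTPUT=out in the example */
proc freq data=out;
   tables BIN_x1 BIN_x2 / nocum;
run;
```

Each bin level should then show a frequency and percentage that correspond to one row of the "Mapping" table.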



The "Summary Statistics" table in Output 4.2.4 displays basic statistics for each input variable: the number of observations, number of missing observations, mean, median, standard deviation, minimum, maximum, and number of bins.

Output 4.2.4: PROC HPBIN Summary Statistics Table

Summary Statistics
Variable        N  N Missing        Mean      Median     Std Dev     Minimum     Maximum  N Bins
x1        1000000          0  0.49984213  0.49991238  0.28894736  2.24449E-7  0.99999939      10
x2        1000000          0  4.99688234  4.99851593  2.88736227  9.10833E-6  9.99999537      10



The "Quantiles and Extremes" table in Output 4.2.5 shows the quantile computation of the variables. The ODS table is generated only when the COMPUTEQUANTILE option is specified in the PROC HPBIN statement.

Output 4.2.5: PROC HPBIN Quantile Computation

Quantiles and Extremes
Variable  Quantile Level  Quantile
x1        Max             0.99999939
          .99             0.99011639
          .95             0.95024946
          .90             0.90023557
          .75 Q3          0.75032495
          .50 Median      0.49991238
          .25 Q1          0.24931534
          .10             0.09985729
          .05             0.04954403
          .01             0.01000524
          Min             2.24449E-7
x2        Max             9.99999537
          .99             9.90136979
          .95             9.49989152
          .90             8.99939011
          .75 Q3          7.49894200
          .50 Median      4.99851593
          .25 Q1          2.49431827
          .10             0.99691767
          .05             0.49879104
          .01             0.10062442
          Min             9.10833E-6
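
As an independent cross-check of the quantile levels listed above, the same percentiles can be requested from PROC MEANS on the input data set. This is only a sketch for comparison and is not part of the original example; it runs on the client rather than in distributed mode.

```sas
/* Sample percentiles of x1 and x2 at the levels shown in the table above */
proc means data=ex12 min p1 p5 p10 q1 median q3 p90 p95 p99 max;
   var x1 x2;
run;
```

The values can then be compared level by level with the "Quantiles and Extremes" table.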