The HPBIN Procedure

Example 4.3 Quantile Binning in Distributed Mode

This example shows quantile binning executed in distributed mode. Most of this example is the same as the pseudo–quantile binning example (see Example 4.2: Pseudo–Quantile Binning in Distributed Mode), so you can easily compare the two binning methods. The following DATA step generates 1,000,000 observations of two uniformly distributed variables: x1 on the interval (0, 1) and x2 on the interval (0, 10).

data ex12;
   length id 8;
   do id=1 to 1000000;
      x1 = ranuni(101);
      x2 = 10*ranuni(201);
      output;
   end;
run;    
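Before binning, a quick summary confirms the sample size and the ranges of the generated variables. This step is an optional check added for illustration, not part of the original example:

proc means data=ex12 n min max mean;
   var x1 x2;   /* summarize both variables that will be binned */
run;

Each variable should have N = 1,000,000, with x1 between 0 and 1 and x2 between 0 and 10.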

You can run PROC HPBIN in distributed mode by specifying valid values for the NODES=, INSTALL=, and HOST= options in the PERFORMANCE statement. Instead of specifying the INSTALL= and HOST= options in the PERFORMANCE statement, you can set the GRIDINSTALLLOC and GRIDHOST environment variables by using OPTIONS SET= statements. For details about setting these options or environment variables, see the section Processing Modes in Chapter 3: Shared Concepts and Topics.
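A minimal sketch of the environment-variable approach follows. The host name grid-head.example.com and the path /opt/TKGrid are hypothetical placeholders; substitute the values for your own grid:

options set=GRIDHOST="grid-head.example.com";   /* hypothetical grid host name */
options set=GRIDINSTALLLOC="/opt/TKGrid";       /* hypothetical install location */

proc hpbin data=ex12 output=out numbin=10 quantile;
   input x1-x2;
   /* HOST= and INSTALL= are omitted; the environment variables supply them */
   performance nodes=4 nthreads=8;
run;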

The following statements run the quantile binning in distributed mode. To run these statements successfully, define the macro variables GRIDHOST and GRIDINSTALLLOC so that they resolve to your grid host and installation location, or replace the macro variable references with those values directly.
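For example, %LET statements such as the following define the two macro variables. The values shown are hypothetical placeholders:

%let GRIDHOST=grid-head.example.com;   /* hypothetical grid host name */
%let GRIDINSTALLLOC=/opt/TKGrid;       /* hypothetical install location */

The references "&GRIDHOST" and "&GRIDINSTALLLOC" in the PERFORMANCE statement then resolve to these values.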

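/* Save the Binning Information, Performance Information, and Mapping tables as data sets */
ods output BinInfo=bininfo;
ods output PerformanceInfo=perfInfo;
ods output Mapping=mapTable;
ods listing close;   /* close the LISTING destination while the procedure runs */
proc hpbin data=ex12 output=out numbin=10 quantile;   /* 10 quantile bins */
   input x1-x2;
   /* Distributed mode: 4 compute nodes, 8 threads per node */
   performance nodes=4 nthreads=8
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
run;
ods listing;   /* reopen the LISTING destination */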
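To compare the two methods directly, a sketch like the following reruns the same data with pseudo-quantile binning. It assumes that the PSEUDO_QUANTILE keyword in the PROC HPBIN statement selects that method; the names out_pq and mapTablePQ are hypothetical choices that keep the quantile results above intact:

ods output Mapping=mapTablePQ;   /* hypothetical name for the pseudo-quantile mapping */
proc hpbin data=ex12 output=out_pq numbin=10 pseudo_quantile;
   input x1-x2;
   performance nodes=4 nthreads=8
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
run;

Comparing mapTable with mapTablePQ then shows how the cut points and bin frequencies differ between the two methods.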

The "Performance Information" table in Output 4.3.1 shows the grid settings.

Output 4.3.1: PROC HPBIN Performance Information

Performance Information
Host Node                   << your grid host >>
Install Location            << your grid install location >>
Execution Mode              Distributed
Number of Compute Nodes     4
Number of Threads per Node  8

The "Binning Information" table in Output 4.3.2 shows the binning method, number of bins, and number of variables.

Output 4.3.2: PROC HPBIN Binning Information

Binning Information
Method                    Quantile Binning
Number of Bins Specified  10
Number of Variables       2

The "Mapping" table in Output 4.3.3 shows the level mapping of the input variables. When the binning method is quantile, PROC HPBIN places the cut points so that each bin contains, as nearly as possible, the same number of observations. Here each of the 10 bins for x1 and for x2 contains 100,000 observations, or 10% of the data.
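The equal frequencies follow from how quantile cut points are defined. As a worked sketch of the general definition (not necessarily the exact algorithm PROC HPBIN uses), let n be the number of observations, B the number of bins, \hat{F}_n the empirical distribution function, c_k the kth cut point, and n_k the count in bin k; this notation is introduced here and is not from the original:

$$
c_k \approx \hat{F}_n^{-1}\!\left(\tfrac{k}{B}\right), \quad k = 1, \dots, B-1,
\qquad
n_k \approx \frac{n}{B} = \frac{1{,}000{,}000}{10} = 100{,}000 .
$$

Because x1 is uniform on (0, 1) and x2 = 10 times a uniform (0, 1) variable, the inverse distribution functions are F^{-1}(p) = p and F^{-1}(p) = 10p, respectively:

$$
x_1:\; c_k \approx \tfrac{k}{10}, \qquad x_2:\; c_k \approx k, \qquad k = 1, \dots, 9 .
$$

These expected cut points are close to the boundaries in the Mapping table, for example 0.0998588647 and 0.9969235519 for k = 1.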

Output 4.3.3: PROC HPBIN Mapping

Mapping
Variable  Binned Variable  Range                              Frequency  Proportion
x1        BIN_x1           x1 < 0.0998588647                     100000  0.10000000
                           0.0998588647 <= x1 < 0.1994129534     100000  0.10000000
                           0.1994129534 <= x1 < 0.2992100247     100000  0.10000000
                           0.2992100247 <= x1 < 0.3994717134     100000  0.10000000
                           0.3994717134 <= x1 < 0.4999128976     100000  0.10000000
                           0.4999128976 <= x1 < 0.5997462776     100000  0.10000000
                           0.5997462776 <= x1 < 0.7003605509     100000  0.10000000
                           0.7003605509 <= x1 < 0.8002305945     100000  0.10000000
                           0.8002305945 <= x1 < 0.9002355914     100000  0.10000000
                           0.9002355914 <= x1                    100000  0.10000000
x2        BIN_x2           x2 < 0.9969235519                     100000  0.10000000
                           0.9969235519 <= x2 < 1.9947160254     100000  0.10000000
                           1.9947160254 <= x2 < 2.9937471882     100000  0.10000000
                           2.9937471882 <= x2 < 3.9946339088     100000  0.10000000
                           3.9946339088 <= x2 < 4.998519884      100000  0.10000000
                           4.998519884 <= x2 < 5.9970218949      100000  0.10000000
                           5.9970218949 <= x2 < 6.9926729901     100000  0.10000000
                           6.9926729901 <= x2 < 7.9985574996     100000  0.10000000
                           7.9985574996 <= x2 < 8.9993908461     100000  0.10000000
                           8.9993908461 <= x2                    100000  0.10000000
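You can also check the bin frequencies in the OUTPUT= data set directly. This sketch assumes that the data set out contains the binned variables BIN_x1 and BIN_x2 named in the Mapping table:

proc freq data=out;
   /* one-way frequency tables of the bin levels for each binned variable */
   tables BIN_x1 BIN_x2;
run;

The counts for each level should match the Frequency column of the Mapping table.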