This example shows pseudo-quantile binning that is executed in distributed mode. The following DATA step generates 1,000,000 observations:

    data ex12;
       length id 8;
       do id=1 to 1000000;
          x1 = ranuni(101);
          x2 = 10*ranuni(201);
          output;
       end;
    run;

To run this program in distributed mode, you need to define two macro variables, GRIDHOST and GRIDINSTALLLOC, whose values are passed to the HOST= and INSTALL= options in the PERFORMANCE statement:

    ods output BinInfo=bininfo;
    ods output PerformanceInfo=perfInfo;
    ods output Mapping=mapTable;
    ods output Summary=Summary;
    ods output Quantile=Quantile;
    ods listing close;
    proc hpbin data=ex12 output=out numbin=10 pseudo_quantile
               computestats computequantile;
       input x1-x2;
       performance nodes=4 nthreads=8 details
                   host="&GRIDHOST" install="&GRIDINSTALLLOC";
    run;
    ods listing;

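The PERFORMANCE statement references `&GRIDHOST` and `&GRIDINSTALLLOC`, so both macro variables must be defined before the PROC HPBIN step is submitted. A minimal sketch follows; the host name and installation path are hypothetical placeholders, not values from this example, and must be replaced with your site's settings.

```sas
/* Hypothetical placeholder values: substitute your own grid host */
/* and grid installation location before running PROC HPBIN.      */
%let GRIDHOST       = mygridhost.example.com;
%let GRIDINSTALLLOC = /opt/grid/install;

/* Optional: write the resolved values to the log to confirm them. */
%put NOTE: GRIDHOST=&GRIDHOST GRIDINSTALLLOC=&GRIDINSTALLLOC;
```

Defining the values once with `%let` keeps site-specific settings out of the procedure code itself.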
The “Performance Information” table in Output 3.2.1 shows the grid setting.
Output 3.2.1: PROC HPBIN Performance Information
Performance Information | |
---|---|
Host Node | << your grid host >> |
Install Location | << your grid install location >> |
Execution Mode | Distributed |
Grid Mode | Symmetric |
Number of Compute Nodes | 4 |
Number of Threads per Node | 8 |
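
Because the program closes the listing destination and routes each table to a data set with ODS OUTPUT, the results can be inspected afterward from those data sets. The following is a minimal sketch that assumes the PROC HPBIN step completed and created the data sets named in the ODS OUTPUT statements; the TITLE text is illustrative only.

```sas
/* Print the tables that the ODS OUTPUT statements saved. */
title "Saved Performance Information";
proc print data=perfInfo noobs;
run;

title "Saved Binning Information";
proc print data=bininfo noobs;
run;

title "Saved Mapping Table";
proc print data=mapTable noobs;
run;

title;  /* clear the title */
```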
The “Binning Information” table in Output 3.2.2 shows the binning method, number of bins, and number of variables.
Output 3.2.2: PROC HPBIN Binning Information
Binning Information | |
---|---|
Method | Pseudo-Quantile Binning |
Number of Bins Specified | 10 |
Number of Variables | 2 |
The “Mapping” table in Output 3.2.3 shows the level mapping of the input variables.
Output 3.2.3: PROC HPBIN Mapping
Mapping | ||||
---|---|---|---|---|
Variable | Binned Variable | Range | Frequency | Proportion |
x1 | BIN_x1 | x1 < 0.099900 | 100046 | 0.10005 |
| | | 0.099900 <= x1 < 0.199500 | 100029 | 0.10003 |
| | | 0.199500 <= x1 < 0.299300 | 100016 | 0.10002 |
| | | 0.299300 <= x1 < 0.399500 | 99939 | 0.09994 |
| | | 0.399500 <= x1 < 0.500000 | 100049 | 0.10005 |
| | | 0.500000 <= x1 < 0.599800 | 99989 | 0.09999 |
| | | 0.599800 <= x1 < 0.700400 | 99975 | 0.09998 |
| | | 0.700400 <= x1 < 0.800300 | 100014 | 0.10001 |
| | | 0.800300 <= x1 < 0.900299 | 100007 | 0.10001 |
| | | 0.900299 <= x1 | 99936 | 0.09994 |
x2 | BIN_x2 | x2 < 0.997008 | 100006 | 0.10001 |
| | | 0.997008 <= x2 < 1.995006 | 100025 | 0.10003 |
| | | 1.995006 <= x2 < 2.994005 | 99986 | 0.09999 |
| | | 2.994005 <= x2 < 3.995004 | 100034 | 0.10003 |
| | | 3.995004 <= x2 < 4.999002 | 99990 | 0.09999 |
| | | 4.999002 <= x2 < 5.998001 | 100063 | 0.10006 |
| | | 5.998001 <= x2 < 6.993000 | 99929 | 0.09993 |
| | | 6.993000 <= x2 < 7.998998 | 100008 | 0.10001 |
| | | 7.998998 <= x2 < 8.999997 | 100010 | 0.10001 |
| | | 8.999997 <= x2 | 99949 | 0.09995 |
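
One way to cross-check the Frequency and Proportion columns is to tabulate the binned variables in the output data set. This is a sketch rather than part of the original example; it assumes that the OUT data set created by the OUTPUT= option contains the binned variables BIN_x1 and BIN_x2 named in the Mapping table.

```sas
/* Count observations per bin for each binned variable.     */
/* Assumes OUT contains BIN_x1 and BIN_x2 (see lead-in).     */
proc freq data=out;
   tables BIN_x1 BIN_x2 / nocum;
run;
```

If the assumption holds, the counts and percentages that PROC FREQ reports for each bin should agree with the corresponding rows of the Mapping table.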
The “Summary Statistics” table in Output 3.2.4 displays basic statistics for each input variable: the number of observations, number of missing observations, mean, pseudo-median, standard deviation, minimum, maximum, and number of bins.
Output 3.2.4: PROC HPBIN Summary Statistics Table
Summary Statistics | ||||||||
---|---|---|---|---|---|---|---|---|
Variable | N | N Missing | Mean | Pseudo Median | Std Dev | Minimum | Maximum | N Bins |
x1 | 1000000 | 0 | 0.49984 | 0.49990 | 0.28895 | 2.24449E-7 | 1.00000 | 10 |
x2 | 1000000 | 0 | 4.99688 | 4.99801 | 2.88736 | 9.10833E-6 | 10.00000 | 10 |
The “Estimated Quantiles and Extremes” table in Output 3.2.5 shows the estimated quantiles of the input variables. This ODS table is generated only when the COMPUTEQUANTILE option is specified in the PROC HPBIN statement.
Output 3.2.5: PROC HPBIN Quantile Estimation
Estimated Quantiles and Extremes | ||
---|---|---|
Variable | Quantile Level | Quantile |
x1 | Max | 1.00000 |
| | .99 | 0.99010 |
| | .95 | 0.95020 |
| | .90 | 0.90020 |
| | .75 Q3 | 0.75030 |
| | .50 Median | 0.49990 |
| | .25 Q1 | 0.24930 |
| | .10 | 0.09980 |
| | .05 | 0.04950 |
| | .01 | 0.01000 |
| | Min | 2.24449E-7 |
x2 | Max | 10.00000 |
| | .99 | 9.90102 |
| | .95 | 9.49900 |
| | .90 | 8.99901 |
| | .75 Q3 | 7.49800 |
| | .50 Median | 4.99801 |
| | .25 Q1 | 2.49401 |
| | .10 | 0.99601 |
| | .05 | 0.49802 |
| | .01 | 0.10001 |
| | Min | 9.10833E-6 |
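
Because the quantiles in this table are estimates, it can be useful to compare them with quantiles computed by another procedure on the same data. The following sketch is not part of the original example; it uses PROC MEANS on the input data set ex12 to request the same quantile levels, and it runs on the client rather than on the grid.

```sas
/* Compute the same quantile levels with PROC MEANS for comparison */
/* against the estimated quantiles that PROC HPBIN reports.        */
proc means data=ex12 min p1 p5 p10 q1 median q3 p90 p95 p99 max maxdec=5;
   var x1 x2;
run;
```

Small differences between the two sets of values are to be expected, since PROC HPBIN reports estimated quantiles.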