This example shows the use of the HPQLIM procedure with an emphasis on processing a large data set and on the performance improvements that are achieved by executing in the high-performance distributed environment.
The following DATA step generates 5 million replicates from a censored model. The model contains seven variables.
```sas
data simulate;
   call streaminit(12345);
   array vars x1-x7;
   array parms{7} (3 4 2 4 -3 -5 -3);
   intercept=2;
   do i=1 to 5000000;
      sum_xb=0;
      do j=1 to 7;
         vars[j]=rand('NORMAL',0,1);
         sum_xb=sum_xb+parms[j]*vars[j];
      end;
      y=intercept+sum_xb+400*rand('NORMAL',0,1);
      if y>400 then y=400;
      if y<0 then y=0;
      output;
   end;
   keep y x1-x7;
run;
```
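For readers less familiar with the DATA step, the same data-generating process can be sketched in Python. This is only an illustrative analogue of the SAS step above, not part of the example itself; it uses a smaller sample size for speed, and the random streams naturally differ from SAS's.

```python
# Illustrative Python analogue of the SAS DATA step: draw x1-x7 ~ N(0,1),
# form the latent outcome y* = 2 + x'b + 400*e, and censor it to [0, 400].
import numpy as np

rng = np.random.default_rng(12345)
parms = np.array([3, 4, 2, 4, -3, -5, -3], dtype=float)
intercept = 2.0
n = 100_000  # the SAS step uses 5,000,000

x = rng.standard_normal((n, 7))                        # seven regressors
y_latent = intercept + x @ parms + 400 * rng.standard_normal(n)
y = np.clip(y_latent, 0.0, 400.0)                      # censor at lb=0, ub=400
```

Because the error standard deviation (400) dwarfs the regression signal, a large fraction of the latent values fall outside [0, 400], so both censoring limits are heavily exercised.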
The following statements estimate a censored model. The model is executed in the distributed computing environment with two threads and only one node. These settings are used to obtain a hypothetical environment that might resemble running the HPQLIM procedure on a desktop workstation with a dual-core CPU. To run these statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to the macro variables in the example with the appropriate values.
```sas
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";
```
```sas
proc hpqlim data=simulate;
   performance nthreads=2 nodes=1 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
run;
```
Output 22.1.1 shows that the censored model was estimated in a distributed environment on the grid (whose host is defined in the GRIDHOST macro variable), using only one node with two threads.
Output 22.1.1: Censored Model with One Node and Two Threads: Performance Table
Output 22.1.2 shows the estimation results for the censored model. The "Model Fit Summary" table shows detailed information about the model and indicates that all 5 million observations were used to fit the model. All parameter estimates in the "Parameter Estimates" table are highly significant and close to the theoretical values that were set in the data-generating process. The optimization of the model with 5 million observations took 45.4 seconds.
Output 22.1.2: Censored Model with One Node and Two Threads: Summary
**Parameter Estimates**

| Parameter | DF | Estimate | Standard Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 2.220379 | 0.222201 | 9.99 | <.0001 |
| x1 | 1 | 3.055533 | 0.201620 | 15.15 | <.0001 |
| x2 | 1 | 4.000176 | 0.201570 | 19.85 | <.0001 |
| x3 | 1 | 1.852740 | 0.201555 | 9.19 | <.0001 |
| x4 | 1 | 4.170266 | 0.201533 | 20.69 | <.0001 |
| x5 | 1 | -3.010679 | 0.201458 | -14.94 | <.0001 |
| x6 | 1 | -5.176016 | 0.201541 | -25.68 | <.0001 |
| x7 | 1 | -2.695948 | 0.201671 | -13.37 | <.0001 |
| _Sigma | 1 | 399.997845 | 0.261930 | 1527.12 | <.0001 |
In the following statements, the PERFORMANCE statement is modified to use a grid with 10 nodes, with each node capable of spawning eight threads:
```sas
proc hpqlim data=simulate;
   performance nthreads=8 nodes=10 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
run;
```
The second model, which was run on a grid of 10 nodes with eight threads each (Output 22.1.3), took only 1.4 seconds instead of 45.4 seconds to optimize.
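A quick back-of-envelope calculation from the two reported timings puts this speedup in perspective. The arithmetic below is only illustrative; "parallel efficiency" here is the conventional ratio of observed speedup to the increase in total thread count, and overheads such as data distribution are ignored.

```python
# Speedup implied by the two runs: 2 threads on one node versus
# 8 threads on each of 10 nodes (80 threads in total).
t_small, t_grid = 45.4, 1.4            # optimization times in seconds
threads_small, threads_grid = 2, 80

speedup = t_small / t_grid              # observed speedup, about 32x
resource_ratio = threads_grid / threads_small   # 40x more threads
efficiency = speedup / resource_ratio   # roughly 0.81 parallel efficiency
```

An efficiency near 0.8 on 40 times the resources suggests the optimization scales well for a data set of this size.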
Output 22.1.3: Censored Model on Ten Nodes with Eight Threads Each: Performance Table
Because the two models being estimated are identical, it is reasonable to expect that Output 22.1.2 and Output 22.1.4 would show the same results except for the performance. However, in certain circumstances, you might observe slight numerical differences in the results (depending on the number of nodes and threads) because the order in which partial results are accumulated, the limits of numerical precision, and the propagation of error in numerical computations can make a difference in the final result.
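The effect of accumulation order is easy to demonstrate in isolation. The following toy Python snippet (illustrative only, unrelated to the HPQLIM computations themselves) sums the same three terms in two groupings and gets two different answers in IEEE double precision, because a small term is absorbed when it is added to a much larger one first:

```python
# Floating-point addition is not associative: regrouping the same terms
# changes the result when magnitudes differ greatly.
a, b, c = 1e20, 1.0, -1e20

left_to_right = (a + b) + c   # 1.0 is absorbed by 1e20, so the sum is 0.0
regrouped     = (a + c) + b   # the large terms cancel first, so the sum is 1.0
```

In a distributed run, the grouping of partial sums depends on how observations are split across nodes and threads, which is why the number of nodes and threads can perturb the last digits of the estimates.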
Output 22.1.4: Censored Model on Ten Nodes with Eight Threads Each: Summary
**Parameter Estimates**

| Parameter | DF | Estimate | Standard Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 2.220358 | 0.222201 | 9.99 | <.0001 |
| x1 | 1 | 3.055491 | 0.201620 | 15.15 | <.0001 |
| x2 | 1 | 4.000196 | 0.201570 | 19.85 | <.0001 |
| x3 | 1 | 1.852735 | 0.201555 | 9.19 | <.0001 |
| x4 | 1 | 4.170323 | 0.201533 | 20.69 | <.0001 |
| x5 | 1 | -3.010670 | 0.201458 | -14.94 | <.0001 |
| x6 | 1 | -5.176019 | 0.201541 | -25.68 | <.0001 |
| x7 | 1 | -2.695886 | 0.201671 | -13.37 | <.0001 |
| _Sigma | 1 | 399.997846 | 0.261930 | 1527.12 | <.0001 |
As this example suggests, increasing the number of nodes and the number of threads per node improves performance significantly. When you use the parallelism that a high-performance distributed environment affords, you can see an even more dramatic reduction in the time required for the optimization as the number of observations in the data set increases. When the data set is extremely large, the computations might not even be possible with the typical memory resources and computational constraints of a desktop computer. Under such circumstances the high-performance distributed environment becomes a necessity.