This example shows the use of the HPQLIM procedure with an emphasis on processing a large data set and on the performance improvements that are achieved by executing in the high-performance distributed environment.
The following DATA step generates 5 million replicates from a censored model. The model contains seven variables.
```sas
data simulate;
   call streaminit(12345);
   array vars x1-x7;
   array parms{7} (3 4 2 4 -3 -5 -3);
   intercept=2;
   do i=1 to 5000000;
      sum_xb=0;
      do j=1 to 7;
         vars[j]=rand('NORMAL',0,1);
         sum_xb=sum_xb+parms[j]*vars[j];
      end;
      y=intercept+sum_xb+400*rand('NORMAL',0,1);
      if y>400 then y=400;
      if y<0 then y=0;
      output;
   end;
   keep y x1-x7;
run;
```
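For readers less familiar with the DATA step, the same data-generating process can be sketched in Python. This is only an illustrative analogue of the SAS step above, not part of the example itself; it uses a smaller sample size for speed, and the random streams naturally differ from SAS's.

```python
# Illustrative Python analogue of the SAS DATA step: draw x1-x7 ~ N(0,1),
# form the latent outcome y* = 2 + x'b + 400*e, and censor it to [0, 400].
import numpy as np

rng = np.random.default_rng(12345)
parms = np.array([3, 4, 2, 4, -3, -5, -3], dtype=float)
intercept = 2.0
n = 100_000  # the SAS step uses 5,000,000

x = rng.standard_normal((n, 7))                        # seven regressors
y_latent = intercept + x @ parms + 400 * rng.standard_normal(n)
y = np.clip(y_latent, 0.0, 400.0)                      # censor at lb=0, ub=400
```

Because the error standard deviation (400) dwarfs the regression signal, a large fraction of the latent values fall outside [0, 400], so both censoring limits are heavily exercised.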
The following statements estimate a censored model. The model is executed in the distributed computing environment with two threads and only one node. These settings are used to obtain a hypothetical environment that might resemble running the HPQLIM procedure on a desktop workstation with a dual-core CPU. To run these statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to the macro variables in the example with the appropriate values.
```sas
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";
```
```sas
proc hpqlim data=simulate;
   performance nthreads=2 nodes=1 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
run;
```
Output 22.1.1 shows that the censored model was estimated in a distributed environment on the grid (whose host is defined in the GRIDHOST macro variable), using only one node with two threads.
Output 22.1.1: Censored Model with One Node and Two Threads: Performance Table
Output 22.1.2 shows the estimation results for the censored model. The "Model Fit Summary" table shows detailed information about the model and indicates that all 5 million observations were used to fit the model. All parameter estimates in the "Parameter Estimates" table are highly significant and close to the theoretical values that were set in the data-generating process. The optimization of the model with 5 million observations took 45.4 seconds.
Output 22.1.2: Censored Model with One Node and Two Threads: Summary
**Parameter Estimates**

| Parameter | DF | Estimate | Standard Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 2.220379 | 0.222201 | 9.99 | <.0001 |
| x1 | 1 | 3.055533 | 0.201620 | 15.15 | <.0001 |
| x2 | 1 | 4.000176 | 0.201570 | 19.85 | <.0001 |
| x3 | 1 | 1.852740 | 0.201555 | 9.19 | <.0001 |
| x4 | 1 | 4.170266 | 0.201533 | 20.69 | <.0001 |
| x5 | 1 | -3.010679 | 0.201458 | -14.94 | <.0001 |
| x6 | 1 | -5.176016 | 0.201541 | -25.68 | <.0001 |
| x7 | 1 | -2.695948 | 0.201671 | -13.37 | <.0001 |
| _Sigma | 1 | 399.997845 | 0.261930 | 1527.12 | <.0001 |
In the following statements, the PERFORMANCE statement is modified to use a grid with 10 nodes, with each node capable of spawning eight threads:
```sas
proc hpqlim data=simulate;
   performance nthreads=8 nodes=10 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
run;
```
The second model, which was run on a grid of 10 nodes with eight threads each (Output 22.1.3), took only 1.4 seconds instead of 45.4 seconds to optimize.
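A quick back-of-envelope calculation from the two reported timings puts this speedup in perspective. The arithmetic below is only illustrative; "parallel efficiency" here is the conventional ratio of observed speedup to the increase in total thread count, and overheads such as data distribution are ignored.

```python
# Speedup implied by the two runs: 2 threads on one node versus
# 8 threads on each of 10 nodes (80 threads in total).
t_small, t_grid = 45.4, 1.4            # optimization times in seconds
threads_small, threads_grid = 2, 80

speedup = t_small / t_grid              # observed speedup, about 32x
resource_ratio = threads_grid / threads_small   # 40x more threads
efficiency = speedup / resource_ratio   # roughly 0.81 parallel efficiency
```

An efficiency near 0.8 on 40 times the resources suggests the optimization scales well for a data set of this size.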
Output 22.1.3: Censored Model on Ten Nodes with Eight Threads Each: Performance Table
Because the two models being estimated are identical, it is reasonable to expect that Output 22.1.2 and Output 22.1.4 would show the same results except for the performance. However, in certain circumstances, you might observe slight numerical differences in the results (depending on the number of nodes and threads) because the order in which partial results are accumulated, the limits of numerical precision, and the propagation of error in numerical computations can make a difference in the final result.
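The effect of accumulation order is easy to demonstrate in isolation. The following toy Python snippet (illustrative only, unrelated to the HPQLIM computations themselves) sums the same three terms in two groupings and gets two different answers in IEEE double precision, because a small term is absorbed when it is added to a much larger one first:

```python
# Floating-point addition is not associative: regrouping the same terms
# changes the result when magnitudes differ greatly.
a, b, c = 1e20, 1.0, -1e20

left_to_right = (a + b) + c   # 1.0 is absorbed by 1e20, so the sum is 0.0
regrouped     = (a + c) + b   # the large terms cancel first, so the sum is 1.0
```

In a distributed run, the grouping of partial sums depends on how observations are split across nodes and threads, which is why the number of nodes and threads can perturb the last digits of the estimates.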
Output 22.1.4: Censored Model on Ten Nodes with Eight Threads Each: Summary
**Parameter Estimates**

| Parameter | DF | Estimate | Standard Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 2.220358 | 0.222201 | 9.99 | <.0001 |
| x1 | 1 | 3.055491 | 0.201620 | 15.15 | <.0001 |
| x2 | 1 | 4.000196 | 0.201570 | 19.85 | <.0001 |
| x3 | 1 | 1.852735 | 0.201555 | 9.19 | <.0001 |
| x4 | 1 | 4.170323 | 0.201533 | 20.69 | <.0001 |
| x5 | 1 | -3.010670 | 0.201458 | -14.94 | <.0001 |
| x6 | 1 | -5.176019 | 0.201541 | -25.68 | <.0001 |
| x7 | 1 | -2.695886 | 0.201671 | -13.37 | <.0001 |
| _Sigma | 1 | 399.997846 | 0.261930 | 1527.12 | <.0001 |
As this example suggests, increasing the number of nodes and the number of threads per node improves performance significantly. When you use the parallelism that a high-performance distributed environment affords, you can see an even more dramatic reduction in the time required for the optimization as the number of observations in the data set increases. When the data set is extremely large, the computations might not even be possible with the typical memory resources and computational constraints of a desktop computer. Under such circumstances the high-performance distributed environment becomes a necessity.