This example demonstrates the Bayesian analysis that is available in the HPQLIM procedure, with an emphasis on processing a large data set and on the performance improvements that you gain by executing in a high-performance distributed environment.
The model and the data set are the same as in Example 8.1, and the priors are set to the defaults.
The model is executed in the distributed computing environment with two threads and only one node. These settings are used to obtain a hypothetical environment that might resemble running the HPQLIM procedure on a desktop workstation with a dual-core CPU. To run the following statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to the macro variables in the example with the appropriate values.
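For instance, the two macro variables might be assigned with %LET statements before the example is submitted. This is a minimal sketch: the host name and installation path shown here are hypothetical placeholders, not values from this example, and must be replaced with the values for your own grid.

```sas
/* Hypothetical values: substitute the host name and installation */
/* path that apply to your own grid environment.                  */
%let GRIDHOST       = mygrid.example.com;
%let GRIDINSTALLLOC = /opt/TKGrid;
```

With these assignments in place, the references "&GRIDHOST" and "&GRIDINSTALLLOC" in the statements resolve to the assigned text.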
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";

proc hpqlim data=simulate;
   bayes nbi=10000 nmc=30000;
   performance nthreads=2 nodes=1 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
   ods output PerformanceInfo=perfInfo;
   ods output Timing=time;
run;
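The ODS OUTPUT statements in the step save the performance and timing tables as the data sets perfInfo and time. Assuming the step completes and both data sets are created in the WORK library, a sketch like the following could be used to inspect them. The data set name time_1node is a hypothetical name chosen for illustration.

```sas
/* Print the data sets that the ODS OUTPUT statements create */
proc print data=perfInfo noobs;
run;

proc print data=time noobs;
run;

/* Keep a copy under a hypothetical name so that a later run   */
/* that writes to the same data set name does not overwrite it */
data time_1node;
   set time;
run;
```

Saving a copy matters only if you plan to compare runs, because a later step that writes to the same data set names replaces the earlier contents.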
Output 8.2.1 shows a summary of the posterior distribution that is associated with the censored model when you use diffuse prior distributions.
Output 8.2.1: Posterior Summary for Bayesian Censored Model
Estimating a Tobit model

Posterior Summaries

Parameter | N | Mean | Standard Deviation | 25th Percentile | 50th Percentile | 75th Percentile |
---|---|---|---|---|---|---|
Intercept | 30000 | 2.2240 | 0.2206 | 2.0730 | 2.2209 | 2.3725 |
x1 | 30000 | 3.0489 | 0.2006 | 2.9127 | 3.0419 | 3.1803 |
x2 | 30000 | 3.9984 | 0.1978 | 3.8667 | 4.0014 | 4.1284 |
x3 | 30000 | 1.8443 | 0.2018 | 1.7069 | 1.8456 | 1.9822 |
x4 | 30000 | 4.1748 | 0.2000 | 4.0419 | 4.1753 | 4.3104 |
x5 | 30000 | -3.0096 | 0.1998 | -3.1447 | -3.0114 | -2.8736 |
x6 | 30000 | -5.1686 | 0.2003 | -5.3041 | -5.1680 | -5.0348 |
x7 | 30000 | -2.6953 | 0.2099 | -2.8375 | -2.6955 | -2.5545 |
_Sigma | 30000 | 400.0 | 0.2615 | 399.8 | 400.0 | 400.2 |
Output 8.2.2 shows a summary of the performance when you use a distributed computing environment with one node and two threads.
Output 8.2.2: Performance Analysis for Bayesian Censored Model on One Node with Two Threads
Estimating a Tobit model

Performance Information

Setting | Value |
---|---|
Host Node | << your grid host >> |
Install Location | << your grid install location >> |
Execution Mode | Distributed |
Grid Mode | Symmetric |
Number of Compute Nodes | 1 |
Number of Threads per Node | 2 |
Procedure Task Timing

Task | Seconds | Percent |
---|---|---|
Reading and Levelizing Data | 0.95 | 0.00% |
Communication to Client | 0.15 | 0.00% |
Bayesian Analysis: Likelihood for MCMC | 30819.21 | 99.86% |
Bayesian Analysis: MCMC | 0.22 | 0.00% |
Optimization | 43.41 | 0.14% |
Post-optimization | 0.00 | 0.00% |
Finally, Output 8.2.3 shows the diagnostic and summary plots that are associated with X1.
In the following statements, the PERFORMANCE statement is modified to use a grid with 10 nodes, where each node spawns eight threads:
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";

proc hpqlim data=simulate;
   bayes nbi=10000 nmc=30000;
   performance nthreads=8 nodes=10 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
   model y=x1-x7 / censored(lb=0 ub=400);
   ods output PerformanceInfo=perfInfo;
   ods output Timing=time;
run;
Output 8.2.4 shows a summary of the performance for this run. The two models are identical, but the second run, which used a grid of 10 nodes with eight threads each, took only about 21 minutes instead of about 8.5 hours to sample from the same posterior distribution.
Output 8.2.4: Performance Analysis for Bayesian Censored Model on Ten Nodes with Eight Threads Each
Estimating a Tobit model

Performance Information

Setting | Value |
---|---|
Host Node | << your grid host >> |
Install Location | << your grid install location >> |
Execution Mode | Distributed |
Grid Mode | Symmetric |
Number of Compute Nodes | 10 |
Number of Threads per Node | 8 |
Procedure Task Timing

Task | Seconds | Percent |
---|---|---|
Reading and Levelizing Data | 0.07 | 0.01% |
Communication to Client | 0.23 | 0.02% |
Bayesian Analysis: Likelihood for MCMC | 1242.54 | 99.84% |
Bayesian Analysis: MCMC | 0.17 | 0.01% |
Optimization | 1.49 | 0.12% |
Post-optimization | 0.00 | 0.00% |
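As a rough worked comparison, the task times in Output 8.2.2 and Output 8.2.4 can be summed to estimate the speedup. This is only a sketch. It assumes that the listed task times add up to the total run time. The symbols T_2 and T_80 (total seconds with 2 and 80 threads), S (speedup), and E (efficiency relative to the two-thread run) are introduced here for illustration and are not part of the procedure's output.

```latex
\begin{align*}
T_{2}  &= 0.95 + 0.15 + 30819.21 + 0.22 + 43.41 + 0.00
        = 30863.94 \text{ s} \approx 8.57 \text{ h} \\
T_{80} &= 0.07 + 0.23 + 1242.54 + 0.17 + 1.49 + 0.00
        = 1244.50 \text{ s} \approx 20.7 \text{ min} \\
S &= \frac{T_{2}}{T_{80}} = \frac{30863.94}{1244.50} \approx 24.8 \\
E &= \frac{S}{80/2} = \frac{24.8}{40} \approx 0.62
\end{align*}
```

Under these assumptions, 40 times as many threads give roughly a 25-fold reduction in run time. In both runs, more than 99.8% of the time is spent evaluating the likelihood for MCMC.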