One of the key features of the HPSEVERITY procedure is that it takes advantage of distributed and multithreaded computing machinery to solve a given problem faster. This example illustrates the benefits of using multithreading and distributed computing.
The example uses a simulated data set Work.Largedata, which contains 10,000,000 observations, some of which are right-censored or left-truncated. The losses are affected by three external effects. The DATA step program that generates this data set is available in the accompanying sample program hsevex06.sas.
The following PROC HPSEVERITY step fits all the predefined distributions to the data in the Work.Largedata data set on the client machine with just one thread of computation:
    /* Fit all predefined distributions without any multithreading or distributed computing */
    proc hpseverity data=largedata criterion=aicc initsample(size=20000);
       loss y / lt=threshold rc=limit;
       scalemodel x1-x3;
       dist _predef_;
       performance nthreads=1 bufsize=1000000 details;
    run;
The NTHREADS=1 option in the PERFORMANCE statement specifies that just one thread of computation be used. The absence of the NODES= option in the PERFORMANCE statement specifies that the single-machine mode of execution be used. That is, this step does not use any multithreading or distributed computing. The BUFSIZE= option in the PERFORMANCE statement specifies the number of observations to read at one time. Specifying a larger value tends to decrease the time it takes to load the data. The DETAILS option in the PERFORMANCE statement enables reporting of the timing information. The INITSAMPLE option in the PROC HPSEVERITY statement specifies that a uniform random sample of at most 20,000 observations be used for parameter initialization.
The "Performance Information" and "Procedure Task Timing" tables that PROC HPSEVERITY creates are shown in Output 23.6.1. The "Performance Information" table contains the information about the execution environment. The "Procedure Task Timing" table indicates the total time and relative time taken by each of the four main steps of PROC HPSEVERITY. As that table shows, it takes around 28.2 minutes for the task of estimating parameters, which is usually the most time-consuming of all the tasks.
Output 23.6.1: Performance for Single-Machine Mode with No Multithreading
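If you want to compare timings across runs programmatically, one option is to capture the timing table in a data set. The following is a minimal sketch that is not part of the original example. The ODS table name Timing and the output data set name Work.Timing1Thread are assumptions, so the sketch turns on ODS TRACE first, which writes the actual table names to the SAS log so that you can confirm them.

```sas
/* Write the name of each ODS table that the step produces to the SAS log */
ods trace on;

/* ASSUMPTION: the "Procedure Task Timing" table has the ODS name Timing.
   Confirm the name in the ODS TRACE output and adjust it if it differs.
   Work.Timing1Thread is a hypothetical output data set name. */
ods output Timing=work.timing1thread;

proc hpseverity data=largedata criterion=aicc initsample(size=20000);
   loss y / lt=threshold rc=limit;
   scalemodel x1-x3;
   dist _predef_;
   performance nthreads=1 bufsize=1000000 details;
run;

ods trace off;
```

If the assumed table name does not match, the ODS OUTPUT request produces no data set, and the trace output in the log shows the name to use instead.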
If the grid appliance is not available, you can improve the performance by using multiple threads of computation; this is in fact the default. The following PROC HPSEVERITY step fits all the predefined distributions by using all the logical CPU cores of the machine:
    /* Specify that all the logical CPU cores on the machine be used */
    options cpucount=actual;

    /* Fit all predefined distributions with multithreading, but no distributed computing */
    proc hpseverity data=largedata criterion=aicc initsample(size=20000);
       loss y / lt=threshold rc=limit;
       scalemodel x1-x3;
       dist _predef_;
       performance bufsize=1000000 details;
    run;
When you do not specify the NTHREADS= option in the PERFORMANCE statement, the HPSEVERITY procedure uses the value of the CPUCOUNT= system option to decide the number of threads to use in single-machine mode. Setting the CPUCOUNT= option to ACTUAL before the PROC HPSEVERITY step enables the procedure to use all the logical cores of the machine. The machine that is used to obtain these results (and the earlier results in Output 23.6.1) has four physical CPU cores, each with a clock speed of 3.4 GHz. Hyperthreading is enabled on the CPUs to yield eight logical CPU cores; this number is confirmed by the "Performance Information" table in Output 23.6.2. The results in the "Procedure Task Timing" table in Output 23.6.2 indicate that the use of multithreading has improved the performance by reducing the time to estimate parameters to around 5.7 minutes.
Output 23.6.2: Performance for Single-Machine Mode with Eight Threads
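Because the number of threads in single-machine mode depends on the CPUCOUNT= system option when NTHREADS= is omitted, it can help to check the current setting before you run the step. The following sketch shows two common ways to display it in the SAS log; it is an illustration and not part of the original example.

```sas
/* Display the current setting of the CPUCOUNT= system option in the log */
proc options option=cpucount;
run;

/* Alternatively, retrieve the value by using the GETOPTION function */
%put NOTE: CPUCOUNT is currently %sysfunc(getoption(cpucount));
```

Running these statements after `options cpucount=actual;` lets you compare the reported value with the number of logical cores that you expect on your machine.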
When a grid appliance is available, performance can be further improved by using more than one node in the distributed mode of execution. Large data sets are usually predistributed on the grid appliance that hosts a distributed database. In other words, large problems are best suited for the alongside-the-database model of execution. However, for the purpose of illustration, this example assumes that the data set is available on the client machine and is then distributed to the grid nodes by the HPSEVERITY procedure according to the options that are specified in the PERFORMANCE statement.
The next few PROC HPSEVERITY steps are run on a grid appliance by varying the number of nodes and the number of threads that are used within each node.
You can specify your distributed computing environment by using SAS environment variables, by specifying options in the PERFORMANCE statement, or by a combination of these methods. For example, you can submit the following statements to specify the appliance host (GRIDHOST= SAS environment variable) and the installation location of shared libraries on the appliance (GRIDINSTALLLOC= SAS environment variable):
    /* Set the appliance host and installation location that are appropriate for your distributed mode setup */
    option set=GRIDHOST="&GRIDHOST";
    option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";
To run the preceding statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to macro variables with the appropriate values. Alternatively, you can specify the HOST= and INSTALL= options in the PERFORMANCE statement; this method is used in the PROC HPSEVERITY steps of this example. You can use other SAS environment variables and PERFORMANCE statement options to describe your distributed computing environment. For more information, see the section PERFORMANCE Statement.
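As a minimal sketch of the first approach, you might define the two macro variables before you submit the OPTION statements. The host name and installation path shown here are hypothetical placeholders, not values from the original example; replace them with the values for your own grid appliance.

```sas
/* HYPOTHETICAL values: replace these with the host name and the
   installation path of your own grid appliance */
%let GRIDHOST=grid001.example.com;
%let GRIDINSTALLLOC=/opt/TKGrid;

/* Set the SAS environment variables from the macro variables */
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";

/* Confirm in the log that the macro variables resolve as intended */
%put NOTE: GRIDHOST=&GRIDHOST GRIDINSTALLLOC=&GRIDINSTALLLOC;
```

The same macro variables can then be reused in the HOST= and INSTALL= options of the PERFORMANCE statement, so the environment is described in one place.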
To establish a reference point for the performance of one CPU of a grid node, the results of using only one node of the grid appliance without any multithreading are presented first. The particular grid appliance that is used to obtain these results has more than 16 nodes. Each node has 8 dual-core CPUs with a clock speed of 2.7 GHz. The following PROC HPSEVERITY step fits all the predefined distributions to the data in the Work.Largedata data set:
    /* Fit all predefined distributions on 1 grid node without any multithreading */
    proc hpseverity data=largedata criterion=aicc initsample(size=20000);
       loss y / lt=threshold rc=limit;
       scalemodel x1-x3;
       dist _predef_;
       performance nodes=1 nthreads=1 details
                   host="&GRIDHOST" install="&GRIDINSTALLLOC";
    run;
The PERFORMANCE statement specifies that only one node be used to fit the models, with only one thread of computation on that node. The "Performance Information" and "Procedure Task Timing" tables that PROC HPSEVERITY creates are shown in Output 23.6.3. It takes around 33.5 minutes to complete the task of estimating parameters. Note that this time is longer than the time taken in single-machine mode with one thread of computation (around 28.2 minutes), because the CPUs of an individual grid node are slower than the CPUs of the machine that is used in single-machine mode. In addition, the grid node was shared among multiple users when the performance was measured, unlike the machine that is used in single-machine mode.
Output 23.6.3: Performance on One Grid Appliance Node with No Multithreading
The computational effort and the time taken to fit each model are shown in the "Estimation Details" table of Output 23.6.4, which is generated whenever you specify the DETAILS option in the PERFORMANCE statement. This table can be useful for comparing the relative effort required to fit each model and for drawing some broader conclusions. For example, even though the Pareto distribution takes a larger number of iterations, function calls, and gradient and Hessian updates than the gamma distribution, it takes less time to complete; this indicates that the individual PDF and CDF computations of the gamma distribution are more expensive than those of the Pareto distribution.
Output 23.6.4: Estimation Details
Distribution | Converged | Iterations | Function Calls | Gradient Updates | Hessian Updates | Time (Seconds) |
---|---|---|---|---|---|---|
Burr | Yes | 11 | 28 | 104 | 90 | 325.96 |
Exp | Yes | 4 | 12 | 27 | 20 | 29.56 |
Gamma | Yes | 6 | 16 | 44 | 35 | 722.06 |
Igauss | Yes | 4 | 16 | 27 | 20 | 215.40 |
Logn | Yes | 4 | 12 | 27 | 20 | 112.60 |
Pareto | Yes | 39 | 113 | 902 | 860 | 397.75 |
Gpd | Yes | 6 | 17 | 44 | 35 | 132.98 |
Weibull | Yes | 4 | 12 | 27 | 20 | 72.57 |
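One rough way to make the comparison between distributions concrete is to divide each model's time by its number of iterations and function calls. The following sketch does this with values copied from the "Estimation Details" table in Output 23.6.4. The data set name Work.EstDetails and the variable names are hypothetical, and the per-call time is only a crude proxy for the cost of the underlying PDF and CDF computations.

```sas
/* Values copied from the "Estimation Details" table in Output 23.6.4.
   Work.EstDetails and the variable names are hypothetical. */
data work.estdetails;
   input Distribution :$8. Iterations FunctionCalls Seconds;
   /* Rough proxies for the cost of one evaluation of each distribution */
   SecPerIteration = Seconds / Iterations;
   SecPerCall      = Seconds / FunctionCalls;
   datalines;
Burr    11  28 325.96
Exp      4  12  29.56
Gamma    6  16 722.06
Igauss   4  16 215.40
Logn     4  12 112.60
Pareto  39 113 397.75
Gpd      6  17 132.98
Weibull  4  12  72.57
;

/* List the most expensive distributions per function call first */
proc sort data=work.estdetails;
   by descending SecPerCall;
run;

proc print data=work.estdetails noobs;
   format SecPerIteration SecPerCall 8.2;
run;
```

By this measure, the gamma distribution costs roughly 45 seconds per function call and the Pareto distribution roughly 3.5 seconds, which is consistent with the observation that the gamma computations are more expensive.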
To obtain the next reference point for performance, the following PROC HPSEVERITY step specifies that 16 computation threads be used on one node of the grid appliance:
    /* Fit all predefined distributions on 1 grid node with multithreading */
    proc hpseverity data=largedata criterion=aicc initsample(size=20000);
       loss y / lt=threshold rc=limit;
       scalemodel x1-x3;
       dist _predef_;
       performance nodes=1 nthreads=16 details
                   host="&GRIDHOST" install="&GRIDINSTALLLOC";
    run;
The performance tables that are created by the preceding statements are shown in Output 23.6.5. As the "Procedure Task Timing" table shows, use of multithreading has improved the performance significantly over that of the single-threaded case. Now, it takes around 2.9 minutes to complete the task of estimating parameters.
Output 23.6.5: Performance Information with Multithreading but No Distributed Computing
You can combine the power of multithreading and distributed computing by specifying that multiple nodes of the grid be used to accomplish the task. The following PROC HPSEVERITY step specifies that 16 nodes of the grid appliance be used:
    /* Fit all predefined distributions with distributed computing and multithreading within each node */
    proc hpseverity data=largedata criterion=aicc initsample(size=20000);
       loss y / lt=threshold rc=limit;
       scalemodel x1-x3;
       dist _predef_;
       performance nodes=16 nthreads=16 details
                   host="&GRIDHOST" install="&GRIDINSTALLLOC";
    run;
When the DATA= data set is local to the client machine, as it is in this example, you must specify a nonzero value for the NODES= option in the PERFORMANCE statement in order to enable the distributed mode of execution. In other words, when the procedure is not executing alongside the database, omitting the NODES= option is equivalent to specifying NODES=0, which requests single-machine mode.
The performance tables that are created by the preceding statements are shown in Output 23.6.6. If you compare these tables to the tables in Output 23.6.3 and Output 23.6.5, you see that the task that would have taken a long time with a single thread of execution on a single machine (over half an hour) can be performed in a much shorter time (around 15 seconds) by using the computational resources of the grid appliance to combine the power of multithreaded and distributed computing.
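To put the reported timings side by side, the following sketch computes the speedup of each multithreaded or distributed configuration relative to its single-threaded baseline. The times are the approximate parameter estimation times quoted in this example, converted to minutes (15 seconds is entered as 0.25 minutes). The data set name Work.Speedup and the variable names are hypothetical, and Nodes=0 denotes single-machine mode.

```sas
/* Approximate parameter estimation times quoted in this example, in minutes.
   Work.Speedup and the variable names are hypothetical. */
data work.speedup;
   input Mode :$14. Nodes Threads Minutes BaselineMinutes;
   /* Speedup relative to the single-threaded run in the same environment */
   Speedup = BaselineMinutes / Minutes;
   datalines;
SingleMachine  0  8  5.7  28.2
Grid           1 16  2.9  33.5
Grid          16 16  0.25 33.5
;

proc print data=work.speedup noobs;
   format Speedup 8.1;
run;
```

Because the underlying times are approximate, the resulting ratios (roughly 5, 12, and 130) are best read as orders of magnitude and not as precise measurements.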
Output 23.6.6: Performance Information with Distributed Computing and Multithreading
The machines that were used to obtain these performance results are relatively modest machines, and PROC HPSEVERITY was run in a multiuser environment; that is, background processes were running in single-machine mode or other users were using the grid in distributed mode. For time-critical applications, you can use a larger, dedicated grid that consists of more powerful machines to achieve more dramatic performance improvement.