The HPCORR Procedure

Example 4.2 Computing the Pearson Measure of Association in Distributed Mode

The real power of PROC HPCORR becomes apparent when the computation runs with multiple threads or in a distributed environment.

You can switch to running in distributed mode simply by specifying valid values for the NODES=, INSTALL=, and HOST= options in the PERFORMANCE statement.
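
As a minimal sketch of such a PERFORMANCE statement, the following step requests eight nodes explicitly. The data set name Work.Sample, the host name, and the install path are hypothetical placeholders, not values from this example; replace them with values that are valid at your site.

proc hpcorr data=work.sample;   /* hypothetical client-side data set */
   var x1-x5;
   /* hypothetical host and install path; NODES= sets the node count */
   performance nodes=8 host="your-grid-host" install="/your/grid/install/path";
run;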

An alternative to specifying the INSTALL= and HOST= options in the PERFORMANCE statement is to set appropriate values for the GRIDHOST and GRIDINSTALLLOC environment variables by using OPTIONS SET commands. For more information about setting these options or environment variables, see the section Processing Modes in Chapter 2: Shared Concepts and Topics.
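
For instance, the two environment variables might be set as follows before the procedure runs. This is only a sketch; the host name and path are hypothetical placeholders.

/* hypothetical values; substitute your grid host and install location */
option set=GRIDHOST="your-grid-host";
option set=GRIDINSTALLLOC="/your/grid/install/path";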

The following statements provide an example. To run these statements successfully, you need to set the macro variables GRIDHOST and GRIDINSTALLLOC to resolve to appropriate values, or you can replace the references to macro variables with appropriate values.
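
One way to define the two macro variables is with %LET statements, sketched here with hypothetical placeholder values. Because the PERFORMANCE statement in this example references the macro variables without quotation marks, this sketch assumes that the quotation marks are part of the macro values.

%let GRIDHOST       = "your-grid-host";            /* hypothetical host name */
%let GRIDINSTALLLOC = "/your/grid/install/path";   /* hypothetical path      */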

The macro variable BRECLIB is the name of a libref to a billion-record database.

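title 'HPCORR processing 1-Billion record database';
proc hpcorr data=&BRECLIB;
   /* compute Pearson correlations among x1 through x5 */
   var x1-x5;
   /* grid host and install location for distributed execution */
   performance host=&GRIDHOST install=&GRIDINSTALLLOC;
run;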

The execution mode in the Performance Information table shown in Output 4.2.1 indicates that the calculations were performed in a distributed environment that uses 8 nodes; the data are predistributed using a Greenplum parallel database.

Output 4.2.1: Performance Information in Distributed Mode

HPCORR processing 1-Billion record database

Performance Information
Host Node                  << your grid host >>
Install Location           << your grid install lo
Execution Mode             Distributed
Grid Mode                  Symmetric
Number of Compute Nodes    8


Another indication of distributed execution is the following message in the SAS log, which all high-performance analytical procedures issue:

NOTE: The HPCORR procedure is executing in the distributed
      computing environment with 8 worker nodes.

Because the sample database uses random data, the results are not meaningful. The value of high-performance analytics here is that the test completes in a matter of minutes instead of hours.