The HPSUMMARY Procedure

Single-Machine and Distributed Execution Modes

The HPSUMMARY procedure enables you to perform analyses either on a single computer (single-machine mode) or on multiple computers that are connected in a grid configuration (distributed mode). For more information about these execution modes, see the section "Processing Modes" in Chapter 3, "Shared Concepts and Topics."

In single-machine mode, you can take advantage of multiple processors and cores in a single machine, and you can control the number of parallel threads.
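As a minimal sketch of single-machine mode, the following step requests four threads by using the NTHREADS= option in the PERFORMANCE statement. The data set Work.Grades, its variables Status and Score, and the output names are hypothetical and serve only as an illustration. The thread count is likewise arbitrary.

```sas
/* Hypothetical input data; names and values are for illustration only */
data work.grades;
   input Status $ Score;
   datalines;
Pass 88
Fail 52
Pass 91
Fail 47
;

proc hpsummary data=work.grades;
   /* NTHREADS= sets the number of parallel threads;          */
   /* DETAILS requests a timing breakdown for the procedure   */
   performance nthreads=4 details;
   class Status;
   var Score;
   output out=work.summary mean=MeanScore;
run;
```

If you omit the NTHREADS= option, the procedure chooses the number of threads itself, typically based on the number of CPUs that are available on the machine.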

In distributed mode, you can take advantage of the collective processing resources of multiple machines. You can control both the number of parallel threads per execution node and the number of nodes to engage. One or more copies of the summarization code are executed in parallel on each node. You can read data in parallel from and write data in parallel to a supported database management system (DBMS) on each node in the grid, thus greatly reducing processing time for large volumes of data. The distributed mode of execution has two variations:

  • In the client-data (local-data) model of distributed execution, the input data are not stored on the grid computing appliance but are distributed to it from the client during execution of the HPSUMMARY procedure.

  • In the alongside-the-database model of distributed execution, the data source is the database on the appliance. The data are stored in the distributed database, and the summarization code that runs on each node can read and write the data in parallel during execution of the procedure. Instead of being moved across the network and possibly back to the client machine, data are passed locally between the processes on each node of the appliance. In general, especially with large data sets, the best PROC HPSUMMARY performance can be achieved if execution is alongside the database.
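The two variations can be sketched as follows. The host name, installation path, database credentials, librefs, and data set names are all placeholders, not values from this documentation. The LIBNAME statement assumes a Greenplum appliance that is accessed through the SAS/ACCESS GREENPLM engine; other supported databases use different engines and options. Neither step runs without a configured grid, and whether the second step actually executes alongside the database depends on how your installation is set up.

```sas
/* Client-data model: the input data set is on the client and   */
/* is distributed to the grid when the procedure runs.          */
proc hpsummary data=work.grades;
   performance host="grid.example.com"        /* placeholder grid host          */
               install="/opt/TKGrid"          /* placeholder install location   */
               nodes=4 nthreads=8;            /* nodes and threads per node     */
   class Status;
   var Score;
   output out=work.summary mean=MeanScore;
run;

/* Alongside-the-database model: input and output tables are    */
/* referenced through a libref for the appliance database.      */
libname gplib greenplm
        server="grid.example.com"             /* placeholder values throughout  */
        user=myuser password=mypass
        database=mydb;

proc hpsummary data=gplib.grades;
   performance host="grid.example.com"
               install="/opt/TKGrid";
   class Status;
   var Score;
   output out=gplib.summary mean=MeanScore;
run;
```

In the second step, both the DATA= input and the OUT= output refer to tables in the database, so the data do not need to travel to the client and back.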