The HPDS2 Procedure

Single-Machine and Distributed Execution Modes

The HPDS2 procedure controls both the number of nodes that are engaged and the number of parallel threads that each node uses to execute the DS2 language statements. In contrast to the DS2 THREADS PACKAGE (whose syntax provides single-node scalability as part of the DS2 syntax), PROC HPDS2 provides threading that operates outside the syntax of the language.

In single-machine mode, one or more copies of the DS2 program can be executed in parallel in multiple threads on the client machine.

In distributed mode, one or more copies of the DS2 program are executed in parallel on each machine in the distributed computing environment. The distributed mode of execution has two variations:

  • In the client-data (local-data) model of distributed execution, the input data are not stored on the appliance but are distributed to the distributed computing environment by the SAS High-Performance Analytics infrastructure during execution of the HPDS2 procedure.

  • In the alongside-the-database model of distributed execution, the data source is the database on the appliance. The data are stored in the distributed database, and the DS2 program that is run on each node is able to read and write the data in parallel during execution of the procedure. Instead of data being moved across the network and possibly back to the client machine, data are passed locally between the processes on each node of the appliance. In general, especially with large data sets, the best HPDS2 performance can be achieved if execution is alongside the database.
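The client-data model described above might be requested as in the following minimal sketch. It assumes that a distributed computing environment is available; the host name hpa.example.com, the installation path /opt/TKGrid, the data set names, and the variables a, b, and c are hypothetical placeholders, not values taken from this chapter. The NODES=, HOST=, and INSTALL= options belong to the PERFORMANCE statement that is shared by the high-performance procedures.

```sas
/* Hypothetical input data on the client machine */
data work.d1;
   do i = 1 to 1000;
      a = i;
      b = 2 * i;
      output;
   end;
run;

/* Client-data model: work.d1 resides on the client and is        */
/* distributed to the nodes while the procedure executes.         */
/* HOST= and INSTALL= values are placeholders for your own site.  */
proc hpds2 data=work.d1 out=work.d2;
   performance nodes=4 host="hpa.example.com" install="/opt/TKGrid";
   data DS2GTF.out;
      dcl double c;
      method run();
         set DS2GTF.in;
         c = a + b;
      end;
   enddata;
run;
```

Because each node processes its own portion of the input, the order of the rows in work.d2 should not be relied on.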

By default, the number of copies of the DS2 program that are executed in parallel on a given host (that is, client machine or grid node) is determined by the HPDS2 procedure based on the number of CPUs (cores) available on the host machine. The default is to execute one instance of the DS2 program in a dedicated thread per CPU. You can change the default by specifying the NTHREADS= option in the PERFORMANCE statement. For example, if you specify NTHREADS=n, then the HPDS2 procedure runs n copies of the DS2 program in parallel on each machine.
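As an illustration of the NTHREADS= option, the following minimal sketch runs four copies of a small DS2 program in single-machine mode. The data set names work.in_data and work.out_data and the variables x and y are hypothetical and are used only for this example.

```sas
/* Hypothetical input data */
data work.in_data;
   do x = 1 to 100;
      output;
   end;
run;

/* Override the default of one thread per CPU: */
/* run four copies of the DS2 program.         */
proc hpds2 data=work.in_data out=work.out_data;
   performance nthreads=4;
   data DS2GTF.out;
      dcl double y;
      method run();
         set DS2GTF.in;
         y = 2 * x;
      end;
   enddata;
run;
```

Because the rows are divided among the threads, the order of the rows in the output data set can differ from the order in the input.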

For information about the available modes of execution and how to switch between them, see the section Processing Modes in Chapter 3: Shared Concepts and Topics.