The HPCDM Procedure (Experimental)

DISTBY Statement

DISTBY replication-id-variable ;

A DISTBY statement is necessary if and only if you specify an ID= variable in the EXTERNALCOUNTS statement, and the replication-id-variable must be the same as that ID= variable. This is especially important in the distributed mode of execution. When the observations in the DATA= data set are distributed to the grid nodes, specifying the replication-id-variable as a DISTBY variable instructs PROC HPCDM to keep all observations that have the same value of the replication-id-variable together on one grid node. This is required for correct simulation of the CDM in the presence of the ID= variable.
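The pairing of the two statements can be sketched as follows. This is a minimal illustration, not a complete analysis: the data set names (work.countSample, work.sevEst, work.aggLossSample) and the variable names (numLoss, replicateId, aggLoss) are hypothetical, and the sketch assumes that work.sevEst contains severity parameter estimates for a lognormal model that were produced beforehand.

```sas
/* Hypothetical sketch: externally simulated counts with a replication ID. */
/* All data set and variable names here are assumptions for illustration.  */
proc hpcdm data=work.countSample      /* one or more rows per replication  */
           severityest=work.sevEst    /* assumed: estimates for LOGN model */
           seed=13579;
   severitymodel logn;
   /* ID= names the variable that identifies each replication */
   externalcounts count=numLoss id=replicateId;
   /* DISTBY must name the same variable as ID= above */
   distby replicateId;
   output out=work.aggLossSample samplevar=aggLoss;
run;
```

If the ID= option were removed from the EXTERNALCOUNTS statement in this sketch, the DISTBY statement would have to be removed as well.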

Contrast this with the BY variables that you specify in the BY statement. The observations of a BY group might be split across the nodes of the grid, but the observations of a DISTBY group, which is nested within a BY group, are never split across nodes.

The replication-id-variable must not appear in the BY statement. However, the DATA= data set must be sorted as if the replication-id-variable were listed after the BY variables in the BY statement.
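The sort requirement can be met with a PROC SORT step before the PROC HPCDM step. The following sketch assumes a hypothetical data set work.countSample, a single hypothetical BY variable named region, and a replication-id-variable named replicateId.

```sas
/* Hypothetical sketch: sort by the BY variables first, then by the      */
/* replication ID, so that each DISTBY group is nested in its BY group.  */
proc sort data=work.countSample;
   by region replicateId;
run;

/* In the subsequent PROC HPCDM step, the two variables are then listed  */
/* in separate statements:                                               */
/*    by region;                                                         */
/*    distby replicateId;                                                */
```

Only the PROC SORT step lists both variables in one BY statement; in the PROC HPCDM step, replicateId appears in the DISTBY statement alone.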

Even though the DISTBY statement is important primarily in distributed mode, you must also specify it in single-machine mode whenever you specify an ID= variable in the EXTERNALCOUNTS statement.