High-Performance Features of the OPTGRAPH Procedure


Distributing Input Data to the Appliance

As described in the section Alongside-the-Database Distributed Mode, the OPTGRAPH procedure in distributed mode supports the alongside-the-database model of execution. Before you can run PROC OPTGRAPH alongside the database, you must distribute the data to the appliance.

To run community detection in distributed mode, you must distribute the links data set. Community detection requires that all links that share the same from node be colocated on the same node of the appliance. Therefore, before you run community detection, you must distribute the links data set by the column that represents the from node. Community detection in distributed mode also requires a directed graph. If your graph is undirected, you must first convert it to a directed graph by adding, for each link, a second link in the opposite direction.
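The two preparation steps for community detection can be illustrated with a small sketch. It is written in plain Python for illustration only; it is not SAS code and does not reflect how an appliance actually distributes rows. The function names `to_directed` and `partition_by_from`, the worker count, and the use of a CRC32 hash are all assumptions made for this example.

```python
from collections import defaultdict
from zlib import crc32

def to_directed(links):
    """Return each undirected link (u, v) as two directed links."""
    directed = []
    for u, v in links:
        directed.append((u, v))  # original direction
        directed.append((v, u))  # same link repeated in the opposite direction
    return directed

def partition_by_from(links, num_workers):
    """Assign each link to a worker by hashing its from node.

    Hypothetical stand-in for distributing a table by the from column:
    links with the same from node always hash to the same worker.
    """
    parts = defaultdict(list)
    for frm, to in links:
        worker = crc32(str(frm).encode("utf-8")) % num_workers
        parts[worker].append((frm, to))
    return dict(parts)

undirected = [("A", "B"), ("B", "C"), ("A", "C")]
directed = to_directed(undirected)
parts = partition_by_from(directed, num_workers=2)
```

Because the worker is chosen from the from node alone, every link that leaves a given node ends up in the same partition, which is the colocation property that community detection requires.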

To run centrality computations in distributed mode, the links data set must contain an additional column, cluster, which identifies the community that corresponds to the from node and to node for each link. PROC OPTGRAPH does not require the links data set to be distributed by the cluster column before you run centrality computations, because it redistributes the links data internally if it detects that the data set is not distributed by that column.
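As a rough illustration of what a links table with a cluster column might look like, the following Python sketch tags each link with the community of its endpoints. It is not SAS code. The helper `add_cluster_column` and the `community_of` mapping are hypothetical, and the handling of links whose endpoints lie in different communities is an assumption: this section does not say what to do with them, so the sketch simply leaves them out.

```python
def add_cluster_column(links, community_of):
    """Return (from, to, cluster) rows for links whose endpoints share a community."""
    rows = []
    for frm, to in links:
        c_from = community_of[frm]
        c_to = community_of[to]
        # Assumption: a link gets a single cluster value, so links that
        # cross communities are omitted from the result.
        if c_from == c_to:
            rows.append((frm, to, c_from))
    return rows

# Hypothetical input: directed links and a node-to-community assignment.
links = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]
community_of = {"A": 1, "B": 1, "C": 2, "D": 2}
links_with_cluster = add_cluster_column(links, community_of)
```

In this example the link from B to C is dropped because its endpoints belong to different communities, and the remaining three links each carry the community of both of their endpoints.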

The following sections describe how to distribute a data set to a Greenplum, Teradata, or Hadoop appliance. For more information about each data access engine, see SAS/ACCESS for Relational Databases: Reference.