High-Performance Features of the OPTGRAPH Procedure


Recommended Workflow

For graphs that contain hundreds of millions or billions of links, such as a typical telecommunications network, minimizing data movement is crucial to achieving maximum performance in a distributed computing environment. Therefore, the following workflow is recommended when you use PROC OPTGRAPH to perform community detection and to compute centrality metrics on a high-performance appliance:

  1. Distribute the links data set to the appliance as described in the section Distributing Input Data to the Appliance. The links data set must be distributed by the variable that represents the from node of each link.

  2. Run PROC OPTGRAPH with the COMMUNITY statement to perform community detection as described in the section Community Detection. Write the resulting OUT_INTRA_COMM_LINKS= output data set to the appliance.

    Repeat this step as many times as desired by using different options in the COMMUNITY statement to control the parallel community detection algorithm. For example, you might want to try different values for the maximum community size or the maximum number of iterations. For information about the options that are available when you run community detection, see the section Community Detection.

  3. After running community detection, run PROC OPTGRAPH again with the CENTRALITY statement to compute centrality metrics by cluster as described in the section Centrality Computation by Cluster. As the links input data set, use the OUT_INTRA_COMM_LINKS= output data set that was created in step 2.
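
The data flow behind the three steps can be illustrated with a small sketch. It is written in plain Python and is not PROC OPTGRAPH syntax; the toy links, the community assignment, the hashing rule, and all function names are hypothetical and chosen only for illustration. It assumes that intra-community links are the links whose two end nodes belong to the same community, and it uses node degree as a stand-in for the centrality metrics.

```python
from collections import defaultdict

# Hypothetical toy links (from node, to node); not taken from the documentation.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]


def distribute_by_from_node(links, num_workers):
    """Step 1 analogue: place each link on the worker selected by its from node."""
    workers = defaultdict(list)
    for from_node, to_node in links:
        # Deterministic stand-in for the appliance's distribution rule (an assumption).
        worker_id = sum(from_node.encode()) % num_workers
        workers[worker_id].append((from_node, to_node))
    return dict(workers)


def intra_community_links(links, community_of):
    """Step 2 analogue: keep only links whose end nodes share a community."""
    grouped = defaultdict(list)
    for from_node, to_node in links:
        if community_of[from_node] == community_of[to_node]:
            grouped[community_of[from_node]].append((from_node, to_node))
    return dict(grouped)


def degree_by_cluster(grouped):
    """Step 3 analogue: compute a simple metric (undirected degree) per community."""
    result = {}
    for community, community_links in grouped.items():
        degree = defaultdict(int)
        for from_node, to_node in community_links:
            degree[from_node] += 1
            degree[to_node] += 1
        result[community] = dict(degree)
    return result


# Hypothetical community assignment, standing in for the result of community detection.
community_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}

placement = distribute_by_from_node(links, num_workers=2)
grouped = intra_community_links(links, community_of)
metrics = degree_by_cluster(grouped)
```

In this sketch, all links that share a from node land on the same worker, and the link between communities (C to D) is dropped before the per-community pass, so each community can be processed independently without consulting links that are stored elsewhere.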