High-Performance Features of the OPTGRAPH Procedure


Centrality Computation by Cluster

The centrality of a node indicates its relative importance within a graph. In the field of network analysis, many different centrality metrics are used to measure how prominent a node is. For more information, see the section "Centrality" in SAS OPTGRAPH Procedure: Graph Algorithms and Network Analysis.

When running in distributed mode, you can use the CENTRALITY statement in PROC OPTGRAPH with the BY_CLUSTER option to process induced subgraphs. These subgraphs can be defined either by the output of the community detection algorithm or by any general partition of the links in the graph. The typical use case of the BY_CLUSTER option is described in the section "Processing by Cluster" in SAS OPTGRAPH Procedure: Graph Algorithms and Network Analysis. The main difference in distributed mode is that the cluster variable is defined in the links input data set (which corresponds to the DATA_LINKS= option), not in the nodes input data set (which corresponds to the DATA_NODES= option). In distributed mode, the DATA_NODES= option is not needed.
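
To make the idea of a cluster variable on the links concrete, the following is a minimal conceptual sketch in Python, not SAS code and not a description of how PROC OPTGRAPH computes centrality internally. It assumes an undirected graph, uses degree as the centrality metric, and represents each link as a hypothetical (from, to, cluster) tuple. The function name degree_centrality_by_cluster is illustrative only. Note that the node set of each subgraph is recovered from the links alone, which is why no separate nodes table is required in this sketch.

```python
from collections import defaultdict

def degree_centrality_by_cluster(links):
    """Compute degree per node within each link-induced subgraph.

    links: iterable of (from_node, to_node, cluster) tuples (hypothetical layout).
    Returns {cluster: {node: degree}}, treating links as undirected.
    """
    # neighbors[cluster][node] is the set of nodes adjacent to node in that cluster
    neighbors = defaultdict(lambda: defaultdict(set))
    for src, dst, cluster in links:
        if src == dst:
            continue  # self-loops are ignored in this sketch
        neighbors[cluster][src].add(dst)
        neighbors[cluster][dst].add(src)
    return {
        cluster: {node: len(adjacent) for node, adjacent in nodes.items()}
        for cluster, nodes in neighbors.items()
    }

# Five links partitioned into two clusters; the cluster value lives on the link.
links = [
    ("A", "B", 1),
    ("B", "C", 1),
    ("A", "C", 1),
    ("D", "E", 2),
    ("E", "F", 2),
]
result = degree_centrality_by_cluster(links)
for cluster in sorted(result):
    print(cluster, dict(sorted(result[cluster].items())))
```

Each cluster is processed independently of the others, which is what makes this kind of computation a natural fit for distributed execution.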

As in the process described previously for community detection, you can predistribute the links data set to the grid by cluster by using one of the methods described in the section "Distributing Input Data to the Appliance." However, as mentioned in the section "Recommended Workflow," the preferred approach is to use the OUT_INTRA_COMM_LINKS= output data set that results from running community detection as input to the centrality algorithm. This data set already contains the cluster variable, which identifies the assigned community for each link, and the data set is already stored on the appliance.
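
As a rough illustration of what distributing links "by cluster" means, the following Python sketch places all links that share a cluster value on the same worker. This is an assumption-laden toy model: the round-robin assignment, the function name distribute_links_by_cluster, and the worker numbering are all hypothetical and do not describe how the appliance actually places data.

```python
from collections import defaultdict

def distribute_links_by_cluster(links, num_workers):
    """Co-locate links by cluster (toy model, not the appliance's method).

    links: iterable of (from_node, to_node, cluster) tuples (hypothetical layout).
    Returns {worker_id: [links]} where every cluster lives on exactly one worker.
    """
    cluster_to_worker = {}
    workers = defaultdict(list)
    for src, dst, cluster in links:
        if cluster not in cluster_to_worker:
            # Assign clusters to workers round-robin, in order of first appearance.
            cluster_to_worker[cluster] = len(cluster_to_worker) % num_workers
        workers[cluster_to_worker[cluster]].append((src, dst, cluster))
    return dict(workers)

sample_links = [
    ("A", "B", 1),
    ("D", "E", 2),
    ("B", "C", 1),
    ("G", "H", 3),
]
placement = distribute_links_by_cluster(sample_links, num_workers=2)
```

Because no cluster is split across workers in this model, each worker can process its clusters without exchanging links with other workers.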

The following sections provide two examples of running centrality in distributed mode. The first example shows how to use the output from community detection as input to the centrality algorithm. It then shows an alternative, manual process of predistributing the data by cluster for use as input to the centrality algorithm. In the second example, the cluster variable need not come from the community detection algorithm; it can represent any partition of the graph.