The OPTGRAPH Procedure

COMMUNITY Statement

  • COMMUNITY < options >;

The COMMUNITY statement invokes an algorithm that detects communities in the input graph. Community detection is described in the section Community.

You can specify the following options in the COMMUNITY statement:

ALGORITHM=LOUVAIN | LABEL_PROP | PARALLEL_LABEL_PROP

specifies whether to use the Louvain algorithm (LOUVAIN), the label propagation algorithm (LABEL_PROP), or the parallel label propagation algorithm (PARALLEL_LABEL_PROP). The Louvain algorithm is the default.

For more information about this option, see the sections Community and Parallel Community Detection.

LINK_REMOVAL_RATIO=number

defines the threshold, as a percentage of the largest link weight around each node, below which small-weight links in that node's neighborhood are removed. A link is usually removed if its weight is much smaller than the weights of neighboring links. Suppose that node A links to node B and to node C, link $A \rightarrow B$ has a weight of 100, and link $A \rightarrow C$ has a weight of 1. When nodes are grouped into communities, link $A \rightarrow B$ is much more important than link $A \rightarrow C$ because it contributes much more to the overall modularity value. Therefore, link $A \rightarrow C$ can be dropped from the network if dropping it does not disconnect node C from the network. If the LINK_REMOVAL_RATIO= option is specified, then the links that are incident to each node are examined. If the weight of a link is less than (number/100)*max_link_weight, where max_link_weight is the maximum link weight among all links incident to this node, the link is removed, provided that its removal does not disconnect any node from the network. This option can often dramatically reduce the running time on large graphs. The valid range is between 0 and 100. The default value is 10.
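The following Python sketch illustrates the removal rule described above on a small weighted graph. It is not the OPTGRAPH implementation: the function names are hypothetical, the graph is treated as undirected, links are examined from lightest to heaviest (an assumed order), and "does not disconnect any node" is interpreted as "does not increase the number of connected components."

```python
from collections import defaultdict


def count_components(nodes, links):
    """Count connected components; links is a set of 2-element frozensets."""
    adj = defaultdict(set)
    for link in links:
        u, v = tuple(link)
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components


def remove_small_links(weights, ratio):
    """Hypothetical helper: weights maps (u, v) -> weight (no self-loops).

    A link is a removal candidate if its weight is below (ratio/100) times the
    largest weight incident to either endpoint; it is removed only if the
    number of connected components does not grow (assumed interpretation).
    """
    links = {frozenset(edge): w for edge, w in weights.items()}
    nodes = {n for link in links for n in link}
    # max_link_weight per node, computed once on the original graph
    max_weight = defaultdict(float)
    for link, w in links.items():
        for n in link:
            max_weight[n] = max(max_weight[n], w)
    kept = set(links)
    base = count_components(nodes, kept)
    for link in sorted(links, key=links.get):  # lightest first (assumption)
        w = links[link]
        if any(w < ratio / 100 * max_weight[n] for n in link):
            trial = kept - {link}
            if count_components(nodes, trial) == base:
                kept = trial
    return {tuple(sorted(link)): links[link] for link in kept}
```

With links A–B (100), A–C (1), and B–C (50), the default ratio of 10 removes A–C because C stays connected through B. With only A–B and A–C, the same link is kept, because removing it would isolate C.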

LOGLEVEL=number | string

controls the amount of information that is displayed in the SAS log. Table 1.22 describes the valid values for this option.

Table 1.22: Values for LOGLEVEL= Option

  Number   String       Description
  0        NONE         Turns off all algorithm-related messages in the SAS log
  1        BASIC        Displays a basic summary of the algorithmic processing
  2        MODERATE     Displays a summary of the algorithmic processing
  3        AGGRESSIVE   Displays a detailed summary of the algorithmic processing

The default is the value that you specify in the LOGLEVEL= option in the PROC OPTGRAPH statement (or BASIC if that option is not specified).

MAXITER=number

specifies the maximum number of iterations allowed in the algorithm. The default is 20 when ALGORITHM=LOUVAIN and 100 when ALGORITHM=LABEL_PROP or ALGORITHM=PARALLEL_LABEL_PROP.

OUT_COMM_LINKS=SAS-data-set

specifies the output data set that describes the links between communities.

OUT_COMMUNITY=SAS-data-set

specifies the output data set that contains the number of nodes in each community.

OUT_LEVEL=SAS-data-set

specifies the output data set that contains community information at different resolution levels.

OUT_OVERLAP=SAS-data-set

specifies the output data set that describes the intensity of each node.

RANDOM_FACTOR=number

specifies the random factor for the parallel label propagation algorithm. Specify a number between 0 and 1. At each iteration, number $\times$ 100% of the nodes are randomly selected to skip the label propagation step. The default is 0.15, which means that 15% of the nodes skip the label propagation step at each iteration.

RANDOM_SEED=number

specifies the random seed for the parallel label propagation algorithm. At each iteration, some nodes are randomly selected to skip the label propagation step, based on the value that you specify in the RANDOM_FACTOR= option. To obtain a different random selection of nodes, specify a different value in the RANDOM_SEED= option. By default, RANDOM_SEED=1234.
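As a rough illustration of how RANDOM_FACTOR= and RANDOM_SEED= interact, the following Python sketch draws the set of skipping nodes for each iteration from a single seeded generator. The function name is hypothetical, and rounding number $\times$ (number of nodes) to an exact count is an assumption; PROC OPTGRAPH's own sampling scheme may differ.

```python
import random


def skip_schedule(nodes, iterations, random_factor=0.15, seed=1234):
    """Hypothetical helper: for each iteration, the set of nodes that skip
    the label propagation step."""
    if not 0 <= random_factor <= 1:
        raise ValueError("random_factor must be between 0 and 1")
    rng = random.Random(seed)               # same seed -> same selections
    k = round(random_factor * len(nodes))   # number x 100% of the nodes (rounded)
    ordered = sorted(nodes)                 # fixed order so results depend only on the seed
    return [set(rng.sample(ordered, k)) for _ in range(iterations)]
```

For 100 nodes and the default factor of 0.15, each iteration selects 15 nodes. Repeating a run with the same seed reproduces the same selections, and changing the seed changes them.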

RECURSIVE (options)

requests that the algorithm recursively break down large communities into smaller ones until the specified conditions are satisfied. This option starts with the keyword RECURSIVE, followed by any combination of the three suboptions in Table 1.23 enclosed in parentheses: for example, RECURSIVE (MAX_COMM_SIZE=500) or RECURSIVE (MAX_COMM_SIZE=1000 MAX_DIAMETER=3 RELATION=AND).

Table 1.23: RECURSIVE options

  Suboption        Description
  MAX_COMM_SIZE=   Specifies the maximum number of nodes to be contained in any community.
  MAX_DIAMETER=    Specifies the maximum number of links on the shortest path between any pair of nodes in any community.
  RELATION=        Specifies the relationship between the MAX_COMM_SIZE= and MAX_DIAMETER= conditions.
                   If RELATION=AND, then recursive splitting continues until both the MAX_COMM_SIZE= and MAX_DIAMETER= conditions are satisfied.
                   If RELATION=OR, then recursive splitting continues until either the MAX_COMM_SIZE= or the MAX_DIAMETER= condition is satisfied.
                   The valid values are AND and OR. The default is OR.

The MAX_DIAMETER= option is ignored when you specify ALGORITHM=PARALLEL_LABEL_PROP.
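The stopping test that RELATION= controls can be sketched in Python as follows. The function names are hypothetical; the sketch assumes that a community is a connected set of nodes, that a suboption left unspecified (None) imposes no condition, and that a community that fails the test would be split again.

```python
from collections import deque


def diameter(members, adj):
    """Largest shortest-path length, in links, between nodes of a connected
    community; adj maps each node to its set of neighbors."""
    best = 0
    for source in members:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search restricted to the community
            node = queue.popleft()
            for nbr in adj.get(node, ()):
                if nbr in members and nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        best = max(best, max(dist.values()))
    return best


def is_satisfied(members, adj, max_comm_size=None, max_diameter=None,
                 relation="OR"):
    """Hypothetical check: True means recursive splitting can stop."""
    checks = []
    if max_comm_size is not None:
        checks.append(len(members) <= max_comm_size)
    if max_diameter is not None:
        checks.append(diameter(members, adj) <= max_diameter)
    if not checks:
        return True
    return all(checks) if relation == "AND" else any(checks)
```

For a five-node path (diameter 4) with MAX_COMM_SIZE=10 and MAX_DIAMETER=3, only the size condition holds, so the community is accepted under RELATION=OR but would be split further under RELATION=AND.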

RESOLUTION_LIST=num_list

specifies a list of resolution values that are separated by spaces (for example, 4.3 2.1 1.0 0.6 0.2). The OPTGRAPH procedure interprets the RESOLUTION_LIST= option differently depending on the value of the ALGORITHM= option:

  • When ALGORITHM=LOUVAIN, specifying multiple resolution values enables you to see how communities are merged at various resolution levels. A larger parameter value indicates a higher resolution. For example, resolution 4.3 produces more communities than resolution 0.2. By default, RESOLUTION_LIST=1.0. When you also specify the RECURSIVE option, the first value in the resolution list is used and the other values are ignored.

  • When ALGORITHM=LABEL_PROP, PROC OPTGRAPH ignores the RESOLUTION_LIST= option and uses a resolution value of 1.0.

  • When ALGORITHM=PARALLEL_LABEL_PROP, specifying multiple resolution values requests that the OPTGRAPH procedure perform community detection multiple times, each time with a different resolution value. By default, RESOLUTION_LIST=0.001. In this case, the RESOLUTION_LIST= option is fully compatible with the RECURSIVE option: unlike with ALGORITHM=LOUVAIN, all values in the list are used when you also specify RECURSIVE.

For more information about the use of the RESOLUTION_LIST= option, see the section Large Community.
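To see why a larger resolution value tends to produce more communities, the following Python sketch evaluates the resolution-parameterized modularity commonly used with the Louvain method, $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \gamma\frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, where $\gamma$ is the resolution. This formula is an assumption; the definition that PROC OPTGRAPH uses is given in the section Community. The graph is unweighted and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(links, communities, resolution=1.0):
    """links: undirected, unweighted (u, v) pairs; communities: node -> community id.

    Uses the per-community form of Q: for each community c,
    (links inside c) / m  -  resolution * (total degree of c / 2m) ** 2.
    """
    m = len(links)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of links inside each community
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1
    total_degree = defaultdict(int)
    for node, k in degree.items():
        total_degree[communities[node]] += k
    return sum(internal[c] / m - resolution * (total_degree[c] / (2 * m)) ** 2
               for c in set(communities.values()))


# Two triangles (a, b, c) and (d, e, f) joined by the link c-d.
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
one = {n: 0 for n in "abcdef"}
two = {n: (0 if n in "abc" else 1) for n in "abcdef"}
```

Under this formula, the single community scores 0.8 and the two-triangle split scores about 0.757 at resolution 0.2, while at resolution 4.3 the split scores higher (about -1.29 versus -3.3). This is consistent with the statement that resolution 4.3 favors more communities than resolution 0.2.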

TOLERANCE=number
MODULARITY=number

specifies the tolerance value that determines when to stop iterating. When you specify ALGORITHM=LOUVAIN, the algorithm stops when the percentage modularity gain between two consecutive iterations falls within the specified tolerance value. When you specify ALGORITHM=LABEL_PROP or ALGORITHM=PARALLEL_LABEL_PROP, the algorithm stops when the percentage of nodes in the graph whose labels change falls within the tolerance specified by number. The valid range is strictly between 0 and 1. By default, TOLERANCE=0.01.
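The two stopping rules can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions: the "percentage modularity gain" is taken as the gain relative to the previous iteration's modularity, and the tolerance is compared against fractions (so 0.01 corresponds to 1%).

```python
def check_tolerance(tolerance):
    if not 0 < tolerance < 1:
        raise ValueError("tolerance must be strictly between 0 and 1")


def louvain_converged(prev_q, curr_q, tolerance=0.01):
    """Stop when the relative modularity gain is within the tolerance."""
    check_tolerance(tolerance)
    gain = curr_q - prev_q
    if prev_q == 0:
        return gain <= 0  # no baseline to compare against; stop only if nothing improved
    return gain / abs(prev_q) <= tolerance


def label_prop_converged(old_labels, new_labels, tolerance=0.01):
    """Stop when the fraction of nodes whose label changed is within the tolerance."""
    check_tolerance(tolerance)
    changed = sum(1 for node, label in old_labels.items()
                  if new_labels[node] != label)
    return changed / len(old_labels) <= tolerance
```

For example, a modularity increase from 0.40 to 0.402 is a 0.5% relative gain and stops the Louvain iterations at the default tolerance, whereas an increase to 0.45 does not. For label propagation on 200 nodes, one label change (0.5%) stops the iterations, but five changes (2.5%) do not.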