High-Performance Features of the OPTGRAPH Procedure


COMMUNITY Statement

  • COMMUNITY < options >;

You can specify the following options in the COMMUNITY statement when running PROC OPTGRAPH in distributed mode.

ALGORITHM=PARALLEL_LABEL_PROP

specifies which algorithm to use. Currently, only the parallel label propagation algorithm (PARALLEL_LABEL_PROP) is supported in distributed mode.

LOGLEVEL=number | string

controls the amount of information that is displayed in the SAS log. Table 1.1 describes the valid values for this option.

Table 1.1: Values for LOGLEVEL= Option

number    string        Description
0         NONE          Turns off all algorithm-related messages in the SAS log
1         BASIC         Displays a basic summary of the algorithmic processing
2         MODERATE      Displays a summary of the algorithmic processing
3         AGGRESSIVE    Displays a detailed summary of the algorithmic processing


The default is the value that you specify in the LOGLEVEL= option in the PROC OPTGRAPH statement (or BASIC if that option is not specified).
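The fallback order described above can be pictured with a short Python sketch. It is only an illustration of the documented precedence, not SAS code: the function name resolve_loglevel and its arguments are hypothetical, and case-insensitive matching of the string values is an assumption.

```python
# Illustrative sketch only; names are hypothetical, not part of SAS.
LOGLEVELS = {0: "NONE", 1: "BASIC", 2: "MODERATE", 3: "AGGRESSIVE"}


def normalize(value):
    """Map a LOGLEVEL= number or string to its string form (Table 1.1)."""
    if isinstance(value, int):
        if value not in LOGLEVELS:
            raise ValueError(f"invalid LOGLEVEL number: {value}")
        return LOGLEVELS[value]
    name = str(value).upper()  # assumption: strings are case-insensitive
    if name not in LOGLEVELS.values():
        raise ValueError(f"invalid LOGLEVEL string: {value}")
    return name


def resolve_loglevel(statement_value=None, proc_value=None):
    """COMMUNITY value first, then the PROC OPTGRAPH value, then BASIC."""
    if statement_value is not None:
        return normalize(statement_value)
    if proc_value is not None:
        return normalize(proc_value)
    return "BASIC"
```

For example, under these assumptions, a COMMUNITY statement without LOGLEVEL= in a procedure call that specifies LOGLEVEL=2 resolves to MODERATE.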

MAXITER=number

specifies the maximum number of iterations that the algorithm allows. The default is 100.

OUT_COMM_LINKS=SAS-data-set

specifies the output data set that describes the links between communities.

OUT_COMMUNITY=SAS-data-set

specifies the output data set that contains the number of nodes in each community.

OUT_INTRA_COMM_LINKS=SAS-data-set

specifies the output data set that describes the links within each community.

OUT_LEVEL=SAS-data-set

specifies the output data set that contains community information at different resolution levels.

OUT_OVERLAP=SAS-data-set

specifies the output data set that describes the intensity of each node.

RANDOM_FACTOR=number

specifies the random factor for the parallel label propagation algorithm. Specify a number between 0 and 1. At each iteration, number × 100% of the nodes are randomly selected to skip the label propagation step. The default is 0.15, which means that 15% of the nodes skip the label propagation step at each iteration.

RANDOM_SEED=number

specifies the initial seed for the random number generator that the parallel label propagation algorithm uses. At each iteration, some nodes are randomly selected to skip the label propagation step, based on the value that you specify in the RANDOM_FACTOR= option. To generate a different sequence of random numbers, specify a different value in the RANDOM_SEED= option. The default is 1234.
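The interaction between RANDOM_FACTOR= and RANDOM_SEED= can be sketched in Python as follows. This is a minimal illustration, not the procedure's implementation: the function skip_schedule is hypothetical, the rounding of number × (node count) to a whole number of nodes is an assumption, and Python's random number generator does not reproduce the sequences that SAS generates.

```python
import random


def skip_schedule(nodes, iterations, random_factor=0.15, seed=1234):
    """Return, for each iteration, the nodes that skip label propagation.

    Hypothetical helper: one generator is seeded once (RANDOM_SEED=) and
    then draws a fresh random subset of the nodes at every iteration
    (RANDOM_FACTOR=).
    """
    if not 0 <= random_factor <= 1:
        raise ValueError("random_factor must be between 0 and 1")
    rng = random.Random(seed)
    # Assumption: the skipped share is rounded to a whole number of nodes.
    n_skip = round(random_factor * len(nodes))
    return [sorted(rng.sample(nodes, n_skip)) for _ in range(iterations)]
```

In this sketch, calling skip_schedule twice with the same seed yields the same subsets in the same order, which is the reproducibility that a fixed seed is meant to provide. Changing the seed changes which nodes are drawn, but not how many.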

RECURSIVE(MAX_COMM_SIZE=number)

requests that the algorithm recursively break down large communities into smaller ones until every community contains at most number nodes. This option starts with the keyword RECURSIVE, followed by the MAX_COMM_SIZE= suboption enclosed in parentheses (for example, RECURSIVE (MAX_COMM_SIZE=200)). MAX_COMM_SIZE= specifies the maximum number of nodes that any community can contain.

For information about using the RECURSIVE (MAX_COMM_SIZE=) option, see the section Large Community.
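The recursive breakdown can be pictured with the following Python sketch. It illustrates only the control flow. The functions split_recursively and halve are hypothetical, halve is a stand-in that cuts a node list in two and is not a community detection method, and the guard that stops when a community cannot be split further is an assumption rather than documented behavior.

```python
def split_recursively(community, max_comm_size, detect):
    """Split a community until every part has at most max_comm_size nodes.

    community is a list of nodes; detect is any function that partitions
    a list of nodes into sub-communities.
    """
    if len(community) <= max_comm_size:
        return [community]
    parts = detect(community)
    if len(parts) <= 1:
        # Assumption: stop if no further split is found, to avoid
        # infinite recursion.
        return [community]
    result = []
    for part in parts:
        result.extend(split_recursively(part, max_comm_size, detect))
    return result


def halve(nodes):
    """Placeholder splitter (NOT community detection): cut the list in two."""
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]
```

With these placeholders, a 10-node community and a maximum size of 3 end up as four parts, none larger than 3 nodes.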

RESOLUTION_LIST=number_list

specifies a list of resolution values that are separated by spaces (for example, 1.0 0.6 0.2). Multiple resolution values enable you to run community detection multiple times, each time with a different resolution value. Valid values are any nonnegative numbers; the default is 0.001.

For more information about using the RESOLUTION_LIST= option, see the section Large Community.

TOLERANCE=number

stops iterating when the fraction of nodes in the graph whose labels change during an iteration falls within the tolerance specified by number. Specify a number between 0 and 1. The default is 0.01, which corresponds to 1% of the nodes.
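The following Python sketch shows how the MAXITER= and TOLERANCE= stopping rules could interact in a simple, single-machine label propagation loop. It is not the distributed algorithm that PROC OPTGRAPH runs. The function propagate_labels is hypothetical, and the synchronous update, the tie-breaking rule (smallest label wins), and the reading of "within the tolerance" as "less than or equal to" are all assumptions. Random skipping (RANDOM_FACTOR=) is left out to keep the example short.

```python
from collections import Counter


def propagate_labels(adjacency, max_iter=100, tolerance=0.01):
    """Simplified label propagation with two stopping rules.

    adjacency maps each node to a list of its neighbors. Returns the
    final labels and the number of iterations that were run.
    """
    if not adjacency:
        return {}, 0
    labels = {node: node for node in adjacency}  # every node starts alone
    iterations = 0
    for iterations in range(1, max_iter + 1):    # MAXITER= cap
        new_labels = dict(labels)
        changed = 0
        for node, neighbors in adjacency.items():
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            top = max(counts.values())
            # Assumption: break ties by taking the smallest label.
            choice = min(lab for lab, c in counts.items() if c == top)
            if choice != labels[node]:
                new_labels[node] = choice
                changed += 1
        labels = new_labels
        # TOLERANCE= rule: stop once few enough nodes changed label.
        if changed / len(adjacency) <= tolerance:
            break
    return labels, iterations
```

In this sketch, a larger tolerance ends the loop earlier at the cost of less settled labels, and max_iter bounds the run even if the labels never settle.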