specifies whether to use the Louvain algorithm (LOUVAIN), the label propagation algorithm (LABEL_PROP), or the parallel label
propagation algorithm (PARALLEL_LABEL_PROP). The Louvain algorithm is the default.
LINK_REMOVAL_RATIO=number
defines the percentage threshold for removing small-weight links around each node neighborhood. A link is usually removed if its
weight is relatively smaller than the weights of neighboring links. Suppose that node A links to node B and to node C, link A-B has a weight of 100, and link A-C has a weight of 1. When nodes are grouped into communities, link A-B is much more important than link A-C because it contributes much more to the overall modularity value. Therefore, link A-C can be dropped from the network if dropping it does not disconnect node C from the network. If the LINK_REMOVAL_RATIO= option is specified, then the links that are incident to each node are examined.
If the weight of any link is less than (number/100)*max_link_weight, where max_link_weight is the maximum link weight among all links incident to this node, the link is removed, provided that its removal does not disconnect
any node from the network. This option can often dramatically reduce the running time on large graphs. The valid range is
between 0 and 100. The default value is 10.
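The threshold rule above can be sketched in Python. This is only an illustration and not the OPTGRAPH implementation: the function name prune_links, the lightest-first processing order, and the use of "never remove a node's last remaining link" as the test for disconnecting a node are all assumptions made for the example.

```python
from collections import defaultdict


def prune_links(links, ratio=10):
    """Return the links kept after the small-weight rule is applied.

    links: iterable of (u, v, weight) tuples for an undirected graph
           with nonnegative weights.
    ratio: stand-in for the LINK_REMOVAL_RATIO= value (0 to 100).
    """
    links = list(links)
    max_weight = defaultdict(float)  # largest incident weight per node
    degree = defaultdict(int)        # remaining incident links per node
    for u, v, w in links:
        max_weight[u] = max(max_weight[u], w)
        max_weight[v] = max(max_weight[v], w)
        degree[u] += 1
        degree[v] += 1

    kept = []
    # Examine the lightest links first (an assumption about ordering).
    for u, v, w in sorted(links, key=lambda link: link[2]):
        threshold_u = (ratio / 100) * max_weight[u]
        threshold_v = (ratio / 100) * max_weight[v]
        is_small = w < threshold_u or w < threshold_v
        # Assumed safety check: never remove a node's last remaining link.
        if is_small and degree[u] > 1 and degree[v] > 1:
            degree[u] -= 1
            degree[v] -= 1
        else:
            kept.append((u, v, w))
    return kept


# The example from the text: A-B has weight 100 and A-C has weight 1.
two_links = [("A", "B", 100), ("A", "C", 1)]
# The same graph with an extra B-C link, so C stays connected without A-C.
triangle = two_links + [("B", "C", 50)]

kept_two = prune_links(two_links)
kept_triangle = prune_links(triangle)
```

In the two-link graph, A-C falls below the threshold at node A (1 < 10) but is kept because it is the only link of node C. In the triangle, A-C is dropped because node C remains attached through B-C.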
LOGLEVEL=number | string
controls the amount of information that is displayed in the SAS log. Table 1.22 describes the valid values for this option.
Table 1.22: Values for LOGLEVEL= Option
number   string       Description
0        NONE         Turns off all algorithm-related messages in the SAS log
1        BASIC        Displays a basic summary of the algorithmic processing
2        MODERATE     Displays a summary of the algorithmic processing
3        AGGRESSIVE   Displays a detailed summary of the algorithmic processing
The default is the value that you specify in the LOGLEVEL= option in the PROC OPTGRAPH statement (or BASIC if that option is not specified).
MAXITER=number
specifies the maximum number of iterations allowed in the algorithm. The default is 20 when ALGORITHM=LOUVAIN
and 100 when ALGORITHM=LABEL_PROP or ALGORITHM=PARALLEL_LABEL_PROP.
OUT_COMM_LINKS=SAS-data-set
specifies the output data set that describes the links between communities.
OUT_COMMUNITY=SAS-data-set
specifies the output data set that contains the number of nodes in each community.
OUT_LEVEL=SAS-data-set
specifies the output data set that contains community information at different resolution levels.
OUT_OVERLAP=SAS-data-set
specifies the output data set that describes the intensity of each node.
RANDOM_FACTOR=number
specifies the random factor for the parallel label propagation algorithm. Specify a number between 0 and 1. At each iteration, number × 100% of the nodes are randomly selected to skip the label propagation step. The default is 0.15, which means that 15% of
the nodes skip the label propagation step at each iteration.
RANDOM_SEED=number
specifies the random seed for the parallel label propagation algorithm. At each iteration, some nodes are randomly selected
to skip the label propagation step, based on the value that you specify in the RANDOM_FACTOR= option. To choose a different
set of random samples, specify a number in the RANDOM_SEED= option. By default, RANDOM_SEED=1234.
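The interaction between a random factor and a random seed can be illustrated with a short Python sketch. It is not the OPTGRAPH implementation: the function name, the choice to skip exactly round(factor × node count) nodes per iteration, and the use of Python's random module are assumptions made for the example.

```python
import random


def nodes_skipped_per_iteration(nodes, iterations, random_factor=0.15, seed=1234):
    """Return one set of skipped nodes per iteration.

    random_factor and seed stand in for the RANDOM_FACTOR= and
    RANDOM_SEED= values; the sampling scheme itself is an assumption.
    """
    nodes = sorted(nodes)
    rng = random.Random(seed)  # one generator, seeded once per run
    skip_count = round(random_factor * len(nodes))
    return [set(rng.sample(nodes, skip_count)) for _ in range(iterations)]


nodes = range(100)
run_a = nodes_skipped_per_iteration(nodes, iterations=3)
run_b = nodes_skipped_per_iteration(nodes, iterations=3)           # same seed
run_c = nodes_skipped_per_iteration(nodes, iterations=3, seed=42)  # new seed
```

With the same seed, run_a and run_b select identical nodes in every iteration, which makes a run reproducible. Changing the seed, as in run_c, draws a different sequence of samples while the skipped fraction stays at 15%.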
RECURSIVE (options)
requests that the algorithm recursively break down large communities into smaller ones until the specified conditions are
satisfied. This option starts with the keyword RECURSIVE, followed by any combination of three suboptions enclosed in parentheses. For
example, you can specify RECURSIVE (MAX_COMM_SIZE=500) or RECURSIVE (MAX_COMM_SIZE=1000 MAX_DIAMETER=3 RELATION=AND).
Table 1.23: RECURSIVE options
option           Description
MAX_COMM_SIZE=   Specifies the maximum number of nodes to be contained in any community.
MAX_DIAMETER=    Specifies the maximum number of links on the shortest paths between any pair of nodes in any community.
RELATION=        Specifies the relationship between the values of MAX_COMM_SIZE and MAX_DIAMETER options. If RELATION=AND, then recursive splitting continues until both MAX_COMM_SIZE and MAX_DIAMETER conditions are satisfied. If RELATION=OR, then recursive splitting continues until either the MAX_COMM_SIZE or the MAX_DIAMETER condition is satisfied. The valid values are AND and OR. The default is OR.
The MAX_DIAMETER= option is ignored when you specify ALGORITHM=PARALLEL_LABEL_PROP.
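The effect of RELATION= on the stopping condition can be written as a small Python predicate. This is a sketch of the logic described in Table 1.23 only; the function name needs_split and its parameters are hypothetical, and the sketch assumes that a community's size and diameter have already been computed.

```python
def needs_split(size, diameter, max_comm_size=None, max_diameter=None, relation="OR"):
    """Return True if a community should be split again.

    size, diameter: measured properties of one community.
    max_comm_size, max_diameter, relation: stand-ins for the
    MAX_COMM_SIZE=, MAX_DIAMETER=, and RELATION= suboptions.
    """
    conditions = []
    if max_comm_size is not None:
        conditions.append(size <= max_comm_size)
    if max_diameter is not None:
        conditions.append(diameter <= max_diameter)
    if not conditions:
        return False  # no limits specified, nothing to enforce
    if relation.upper() == "AND":
        satisfied = all(conditions)  # both limits must hold
    else:
        satisfied = any(conditions)  # either limit is enough
    return not satisfied


# A community of 1,200 nodes with diameter 2, checked against
# MAX_COMM_SIZE=1000 and MAX_DIAMETER=3.
split_with_and = needs_split(1200, 2, 1000, 3, relation="AND")
split_with_or = needs_split(1200, 2, 1000, 3, relation="OR")
```

The same community is split again under RELATION=AND, because the size limit is still violated, but is accepted under RELATION=OR, because the diameter limit is already met.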
RESOLUTION_LIST=num_list
specifies a list of resolution values that are separated by spaces (for example, 4.3 2.1 1.0 0.6 0.2). The OPTGRAPH procedure
interprets the RESOLUTION_LIST= option differently depending on the value of the ALGORITHM=
option:
When ALGORITHM=LOUVAIN, specifying multiple resolution values enables you to see how communities are merged at various resolution
levels. A larger parameter value indicates a higher resolution. For example, resolution 4.3 produces more communities than
resolution 0.2. By default, RESOLUTION_LIST=1.0. When you also specify the RECURSIVE
option, the first value in the resolution list is used and the other values are ignored.
When ALGORITHM=LABEL_PROP, PROC OPTGRAPH ignores the RESOLUTION_LIST= option. It uses the default value of 1.0.
When ALGORITHM=PARALLEL_LABEL_PROP, specifying multiple resolution values requests that the OPTGRAPH procedure perform community
detection multiple times, each time with a different resolution value. By default, RESOLUTION_LIST=0.001. In this case, the
RESOLUTION_LIST= option is fully compatible with the RECURSIVE option.
For more information about the use of the RESOLUTION_LIST= option, see the section Large Community.
TOLERANCE=number
MODULARITY=number
specifies the tolerance value for when to stop iterations. When you specify ALGORITHM=LOUVAIN,
the algorithm stops iterating when the percentage modularity gain between two consecutive iterations falls within
the specified tolerance value. When you specify ALGORITHM=LABEL_PROP or ALGORITHM=PARALLEL_LABEL_PROP, the algorithm stops
iterating when the percentage of label changes for all nodes in the graph falls within the tolerance specified by number. The valid range is strictly between 0 and 1. By default, TOLERANCE=0.01.
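The two stopping tests can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure's actual code: the function names are hypothetical, "percentage modularity gain" is read as the relative gain over the previous iteration, and "falls within" is read as less than or equal to the tolerance.

```python
def louvain_converged(prev_modularity, curr_modularity, tolerance=0.01):
    """Assumed test: relative modularity gain is at most the tolerance."""
    if prev_modularity == 0:
        # Avoid dividing by zero; treat "no change" as converged.
        return curr_modularity == 0
    gain = (curr_modularity - prev_modularity) / abs(prev_modularity)
    return gain <= tolerance


def label_prop_converged(changed_labels, node_count, tolerance=0.01):
    """Assumed test: fraction of nodes that changed label is at most the tolerance."""
    return changed_labels / node_count <= tolerance


# Modularity rises from 0.400 to 0.402: a relative gain of about 0.5%.
small_gain_stops = louvain_converged(0.400, 0.402)
# Modularity rises from 0.40 to 0.45: a relative gain of 12.5%.
large_gain_stops = louvain_converged(0.40, 0.45)
# 5 of 1,000 nodes changed label in the last iteration (0.5%).
few_changes_stop = label_prop_converged(5, 1000)
```

In either case, the iteration also ends when the MAXITER= limit is reached, whichever comes first.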