The OPTGRAPH Procedure

CENTRALITY Statement

CENTRALITY < options >;

The CENTRALITY statement enables you to select which centrality metrics to calculate for the given input graph. It also enables you to specify options for particular metrics. The resulting metrics are included in the node output data set (specified in the OUT_NODES= option) or the link output data set (specified in the OUT_LINKS= option).

The centrality metrics are described in the section Centrality.
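
As a minimal sketch of how the statement fits into a procedure call, the following example requests several node metrics in one pass. The data set names LinkSetIn and NodeSetOut and the link data are hypothetical, and the example assumes that the link variables use the names from, to, and weight.

```sas
/* Hypothetical undirected link list. The variable names from, to,
   and weight are assumed here, not taken from this section. */
data LinkSetIn;
   input from $ to $ weight;
   datalines;
A B 1
A C 2
B C 1
C D 3
D E 2
;

/* Request out-degree, unweighted closeness, and the clustering
   coefficient; the results are written to the OUT_NODES= data set. */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      degree = out
      close  = unweight
      clustering_coef;
run;

proc print data=NodeSetOut noobs;
run;
```

Each requested metric should appear as one or more columns in NodeSetOut, next to the node identifier.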

You can specify the following options in the CENTRALITY statement.

AUTH=WEIGHT | UNWEIGHT | BOTH

specifies which type of authority centrality to calculate.

Table 1.10: Values for the AUTH= Option

Option Value	Description
WEIGHT	Calculates authority centrality based on the weighted graph.
UNWEIGHT	Calculates authority centrality based on the unweighted graph.
BOTH	Calculates authority centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for directed graphs. The authority centrality metric is described in the section Hub and Authority Scoring.

BETWEEN=WEIGHT | UNWEIGHT | BOTH

specifies which type of betweenness centrality to calculate.

Table 1.11: Values for the BETWEEN= Option

Option Value	Description
WEIGHT	Calculates betweenness centrality based on the weighted graph.
UNWEIGHT	Calculates betweenness centrality based on the unweighted graph.
BOTH	Calculates betweenness centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). If the OUT_NODES= option is specified in the PROC OPTGRAPH statement, the node betweenness metric is produced. If the OUT_LINKS= option is specified, the link betweenness metric is produced. The betweenness centrality metric is described in the section Betweenness Centrality.
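
The following sketch shows one way to request both node and link betweenness by specifying both output data sets. The data set names and link data are hypothetical, and the link variable names from, to, and weight are assumed.

```sas
/* Hypothetical weighted link list (assumed variable names). */
data LinkSetIn;
   input from $ to $ weight;
   datalines;
A B 1
B C 2
C D 1
B D 4
D E 1
;

/* OUT_NODES= requests node betweenness and OUT_LINKS= requests
   link betweenness. BETWEEN_NORM=NO turns off normalization. */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut
   out_links  = LinkSetOut;
   centrality
      between      = weight
      between_norm = no;
run;
```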

BETWEEN_NORM=YES | NO

specifies whether to normalize the betweenness centrality metrics.

Table 1.12: Values for the BETWEEN_NORM= Option

Option Value	Description
YES	Normalizes the betweenness metrics. This is the default.
NO	Does not normalize the betweenness metrics.

The normalization factor for betweenness centrality is described in the section Betweenness Centrality.

BY_CLUSTER

decomposes the calculations by cluster (or subgraph). If this option is specified, PROC OPTGRAPH looks for a definition of the clusters in the input data set specified by the DATA_NODES= option in the PROC OPTGRAPH statement. The use of the BY_CLUSTER option is described in the section Processing by Cluster.
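
A minimal sketch of processing by cluster follows. It assumes that the node data set identifies nodes in a variable named node and clusters in a variable named cluster; if your data use different names, they would need to be mapped, which this sketch does not show. All data set names and values are hypothetical.

```sas
/* Hypothetical node data set that assigns each node to a cluster.
   The variable names node and cluster are an assumption. */
data NodeSetIn;
   input node $ cluster;
   datalines;
A 1
B 1
C 1
D 2
E 2
F 2
;

/* Hypothetical links; each link stays inside one cluster. */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
B C
A C
D E
E F
;

/* BY_CLUSTER asks for the metrics to be computed per subgraph. */
proc optgraph
   data_nodes = NodeSetIn
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      by_cluster
      degree = out
      close  = unweight;
run;
```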

CLOSE=WEIGHT | UNWEIGHT | BOTH

specifies which type of closeness centrality to calculate.

Table 1.13: Values for the CLOSE= Option

Option Value	Description
WEIGHT	Calculates closeness centrality based on the weighted graph.
UNWEIGHT	Calculates closeness centrality based on the unweighted graph.
BOTH	Calculates closeness centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). The closeness centrality metric is described in the section Closeness Centrality.

CLOSE_NOPATH=NNODES | DIAMETER | ZERO | HARMONIC

specifies how to account for the shortest path distance between two nodes when no path exists between them (disconnected nodes).

Table 1.14: Values for the CLOSE_NOPATH= Option

Option Value	Description
NNODES	Uses the number of nodes as the shortest path distance between disconnected nodes. This value cannot be used in calculating weighted closeness centrality.
DIAMETER	Uses the graph diameter (plus one) as the shortest path distance between disconnected nodes. This is the default.
ZERO	Uses zero as the shortest path distance between disconnected nodes.
HARMONIC	Uses the harmonic formula for closeness centrality.

For each option value, there is a slight variation in the formula for the closeness centrality metric. These differences are described in the section Closeness Centrality.
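
To see the effect of this option, it helps to use a graph that has more than one connected component. The sketch below uses hypothetical data with two components and requests the harmonic variant; rerunning it with NNODES, DIAMETER, or ZERO should show how the closeness values change. The link variable names from and to are assumed.

```sas
/* Hypothetical link list with two components: {A,B,C} and {D,E}. */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
B C
D E
;

/* Unweighted closeness, using the harmonic formula to account for
   node pairs that have no connecting path. */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      close        = unweight
      close_nopath = harmonic;
run;
```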

CLUSTERING_COEF

calculates the node clustering coefficient. The clustering coefficient is described in the section Clustering Coefficient.

DEGREE=IN | OUT | BOTH

specifies which type of degree centrality to calculate for the input graph.

Table 1.15: Values for the DEGREE= Option

Option Value	Description
IN	Calculates degree based on in-links.
OUT	Calculates degree based on out-links.
BOTH	Calculates degree based on in-links and out-links.

For an undirected graph, the option values IN and BOTH are ignored, because there is only one notion of degree, which corresponds to the degree of out-links. The degree centrality metric is described in the section Degree Centrality.
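
Because in-degree and out-degree differ only on directed graphs, the sketch below declares the graph as directed before requesting both. The data are hypothetical, and the link variable names from and to are assumed.

```sas
/* Hypothetical directed link list: each row is a link from -> to. */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
A C
B C
C A
D C
;

/* DEGREE=BOTH requests the in-link and out-link degree metrics. */
proc optgraph
   graph_direction = directed
   data_links      = LinkSetIn
   out_nodes       = NodeSetOut;
   centrality
      degree = both;
run;
```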

EIGEN=WEIGHT | UNWEIGHT | BOTH

specifies which type of eigenvector centrality to calculate.

Table 1.16: Values for the EIGEN= Option

Option Value	Description
WEIGHT	Calculates eigenvector centrality based on the weighted graph.
UNWEIGHT	Calculates eigenvector centrality based on the unweighted graph.
BOTH	Calculates eigenvector centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for undirected graphs. The eigenvector centrality metric is described in the section Eigenvector Centrality.

EIGEN_ALGORITHM=AUTOMATIC | JACOBI_DAVIDSON | POWER

specifies the algorithm to use in calculating centrality metrics that require solving eigensystems (EIGEN, HUB, and AUTH).

Table 1.17: Values for the EIGEN_ALGORITHM= Option

Option Value	Description
AUTOMATIC	Requests that PROC OPTGRAPH automatically determine the eigensolver to use. This is the default.
JACOBI_DAVIDSON (JD)	Uses a variant of the Jacobi-Davidson algorithm for solving eigensystems (Sleijpen and van der Vorst 2000). This is used as the default for the eigenvector metric on undirected graphs and the hub and authority metrics.
POWER	Uses the power method to calculate eigenvectors. This is used as the default for the eigenvector metric on directed graphs.

EIGEN_MAXITER=number

specifies the maximum number of iterations to use for eigenvector calculations to limit the amount of computation time spent when convergence is slow. By default, EIGEN_MAXITER=10,000.
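
The eigenvector options can be combined as in the following sketch, which names the eigensolver explicitly and lowers the iteration limit. The value 500 is an arbitrary illustration rather than a recommendation, and the data and data set names are hypothetical.

```sas
/* Hypothetical undirected link list (assumed variable names). */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
A C
B C
C D
D E
;

/* Select the Jacobi-Davidson eigensolver and cap the number of
   iterations at 500 instead of using the default. */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      eigen           = unweight
      eigen_algorithm = jacobi_davidson
      eigen_maxiter   = 500;
run;
```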

HUB=WEIGHT | UNWEIGHT | BOTH

specifies which type of hub centrality to calculate.

Table 1.18: Values for the HUB= Option

Option Value	Description
WEIGHT	Calculates hub centrality based on the weighted graph.
UNWEIGHT	Calculates hub centrality based on the unweighted graph.
BOTH	Calculates hub centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for directed graphs. The hub centrality metric is described in the section Hub and Authority Scoring.
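
Hub and authority scores are often requested together, because both apply only to directed graphs. The following is a sketch under that assumption, using hypothetical data and assumed link variable names.

```sas
/* Hypothetical directed link list: each row is a link from -> to. */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
A C
B C
D C
C E
;

/* Request unweighted hub and authority scores on a directed graph. */
proc optgraph
   graph_direction = directed
   data_links      = LinkSetIn
   out_nodes       = NodeSetOut;
   centrality
      hub  = unweight
      auth = unweight;
run;
```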

INFLUENCE=WEIGHT | UNWEIGHT | BOTH

specifies which type of influence centrality to calculate.

Table 1.19: Values for the INFLUENCE= Option

Option Value	Description
WEIGHT	Calculates influence centrality based on the weighted graph.
UNWEIGHT	Calculates influence centrality based on the unweighted graph.
BOTH	Calculates influence centrality based on both weighted and unweighted graphs.

If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). The influence centrality metric is described in the section Influence Centrality.

LOGFREQNODE=number

controls the frequency for displaying iteration logs for some of the centrality metrics. For computationally intensive algorithms such as betweenness and closeness centrality, this option displays progress each time that number nodes have been processed. If you also specify the BY_CLUSTER option in this statement or a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement, this option is ignored and the display frequency is determined by the LOGFREQTIME= option instead. The value of number can be any integer greater than or equal to 1; the default is determined automatically based on the size of the graph. Setting this value too low can hurt performance on large-scale graphs.

LOGFREQTIME=number

controls the frequency for displaying iteration logs for some of the centrality metrics. For computationally intensive algorithms such as betweenness and closeness centrality, this option displays progress at intervals of number seconds. If you specify a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement, PROC OPTGRAPH displays the number of nodes that have completed. If you specify the BY_CLUSTER option, PROC OPTGRAPH displays the number of subgraphs that have completed. The value of number can be any integer greater than or equal to 1; the default is 5. Setting this value too low can hurt performance on large-scale graphs.

LOGLEVEL=number | string

controls the amount of information that is displayed in the SAS log. Table 1.20 describes the valid values for this option.

Table 1.20: Values for the LOGLEVEL= Option

number	string	Description
0	NONE	Turns off all algorithm-related messages in the SAS log
1	BASIC	Displays a basic summary of the algorithmic processing
2	MODERATE	Displays a summary of the algorithmic processing including a progress log using the interval that is specified in the LOGFREQNODE= or LOGFREQTIME= option
3	AGGRESSIVE	Displays a detailed summary of the algorithmic processing including a progress log using the interval that is specified in the LOGFREQNODE= or LOGFREQTIME= option

The default is the value that is specified in the LOGLEVEL= option in the PROC OPTGRAPH statement (or BASIC if that option is not specified).
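
The logging options work together, as the following sketch suggests: LOGLEVEL=MODERATE turns on the progress log, and LOGFREQTIME= sets its interval. The data are hypothetical; on a graph this small the run is likely to finish before any progress line is due, so the interval matters only for large graphs.

```sas
/* Hypothetical undirected link list (assumed variable names). */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
B C
C D
D E
A E
;

/* Ask for a progress log at 10-second intervals while the
   betweenness and closeness metrics are computed. */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      loglevel    = moderate
      logfreqtime = 10
      between     = unweight
      close       = unweight;
run;
```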

SUBSIZESWITCH=number

specifies the subgraph size (number of nodes) above which subgraphs are run separately when you also specify the BY_CLUSTER option in this statement and a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement. When PROC OPTGRAPH processes centrality metrics by subgraph, it uses threads to process n subgraphs simultaneously, where n is the number of threads specified in the NTHREADS= option in the PERFORMANCE statement. Subgraphs that have more nodes than number are processed sequentially, which enables the threading to be done at the centrality metric level. The default is 10,000.
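
The following sketch combines the three settings that this option depends on: BY_CLUSTER, a thread count greater than 1, and the size threshold itself. The thread count of 4 and the threshold of 5000 are arbitrary illustrations, the data are hypothetical, and the node variable names node and cluster are assumed.

```sas
/* Hypothetical node-to-cluster assignment (assumed variable names). */
data NodeSetIn;
   input node $ cluster;
   datalines;
A 1
B 1
C 1
D 2
E 2
;

/* Hypothetical links inside each cluster. */
data LinkSetIn;
   input from $ to $;
   datalines;
A B
B C
A C
D E
;

/* With BY_CLUSTER and four threads, subgraphs that have more than
   5000 nodes are processed one at a time. */
proc optgraph
   data_nodes = NodeSetIn
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   performance
      nthreads = 4;
   centrality
      by_cluster
      subsizeswitch = 5000
      between       = unweight;
run;
```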

WEIGHT2=column

specifies the data set variable name for a second link weight. The value of column must be numeric. The use of this option is described in more detail in the section Weight Interpretation.