The OPTGRAPH Procedure

CENTRALITY Statement

  • CENTRALITY < options >;

The CENTRALITY statement enables you to select which centrality metrics to calculate for the given input graph. It also enables you to specify options for particular metrics. The resulting metrics are included in the node output data set (specified in the OUT_NODES= option) or the link output data set (specified in the OUT_LINKS= option).

The centrality metrics are described in the section Centrality.
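As a rough illustration of where the statement sits in a procedure call, the following sketch requests out-degree and unweighted closeness for a small graph. The data set names LinkSetIn and NodeSetOut and the link data are hypothetical. The variable names from, to, and weight are assumed to match the default link variable names of the procedure.

```sas
/* Hypothetical link data: one link per (from, to, weight) triple */
data LinkSetIn;
   input from $ to $ weight @@;
   datalines;
A B 1  A C 2  B C 1  C D 3
;

/* Request two centrality metrics; results go to the node output data set */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      degree = out
      close  = unweight;
run;
```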

You can specify the following options in the CENTRALITY statement.

AUTH=WEIGHT | UNWEIGHT | BOTH

specifies which type of authority centrality to calculate.

Table 1.10: Values for the AUTH= Option

Option Value   Description
WEIGHT         Calculates authority centrality based on the weighted graph.
UNWEIGHT       Calculates authority centrality based on the unweighted graph.
BOTH           Calculates authority centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for directed graphs. The authority centrality metric is described in the section Hub and Authority Scoring.

BETWEEN=WEIGHT | UNWEIGHT | BOTH

specifies which type of betweenness centrality to calculate.

Table 1.11: Values for the BETWEEN= Option

Option Value   Description
WEIGHT         Calculates betweenness centrality based on the weighted graph.
UNWEIGHT       Calculates betweenness centrality based on the unweighted graph.
BOTH           Calculates betweenness centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). If the OUT_NODES= option is specified in the PROC OPTGRAPH statement, the node betweenness metric is produced. If the OUT_LINKS= option is specified, the link betweenness metric is produced. The betweenness centrality metric is described in the section Betweenness Centrality.

BETWEEN_NORM=YES | NO

specifies whether to normalize the betweenness centrality metrics.

Table 1.12: Values for the BETWEEN_NORM= Option

Option Value   Description
YES            Normalizes the betweenness metrics. This is the default.
NO             Does not normalize the betweenness metrics.


The normalization factor for betweenness centrality is described in the section Betweenness Centrality.
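The following sketch shows one way the betweenness options might be combined: both output data sets are specified, so node and link betweenness are both produced, and normalization is turned off. The data set names and link data are hypothetical.

```sas
/* Hypothetical weighted link data */
data LinkSetIn;
   input from $ to $ weight @@;
   datalines;
A B 1  B C 2  C D 1  A D 4
;

/* OUT_NODES= yields node betweenness; OUT_LINKS= yields link betweenness */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut
   out_links  = LinkSetOut;
   centrality
      between      = weight
      between_norm = no;
run;
```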

BY_CLUSTER

decomposes the calculations by cluster (or subgraph). If this option is specified, PROC OPTGRAPH looks for a definition of the clusters in the input data set specified by the DATA_NODES= option in the PROC OPTGRAPH statement. The use of the BY_CLUSTER option is described in the section Processing by Cluster.
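A minimal sketch of cluster-wise processing follows. It assumes that the cluster assignment is stored in a column named cluster of the node data set and that the DATA_NODES_VAR statement accepts a CLUSTER= option to name that column. All data set names and data values are hypothetical.

```sas
/* Hypothetical links; each link stays inside one cluster */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  B C  A C  D E
;

/* Hypothetical cluster definition: one row per node */
data NodeSetIn;
   input node $ cluster @@;
   datalines;
A 1  B 1  C 1  D 2  E 2
;

proc optgraph
   data_links = LinkSetIn
   data_nodes = NodeSetIn
   out_nodes  = NodeSetOut;
   data_nodes_var
      cluster = cluster;   /* assumed option naming the cluster column */
   centrality
      by_cluster
      close = unweight;
run;
```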

CLOSE=WEIGHT | UNWEIGHT | BOTH

specifies which type of closeness centrality to calculate.

Table 1.13: Values for the CLOSE= Option

Option Value   Description
WEIGHT         Calculates closeness centrality based on the weighted graph.
UNWEIGHT       Calculates closeness centrality based on the unweighted graph.
BOTH           Calculates closeness centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). The closeness centrality metric is described in the section Closeness Centrality.

CLOSE_NOPATH=NNODES | DIAMETER | ZERO | HARMONIC

specifies how to account for the shortest path distance between two nodes when no path exists between them (disconnected nodes).

Table 1.14: Values for the CLOSE_NOPATH= Option

Option Value   Description
NNODES         Uses the number of nodes as the shortest path distance between disconnected nodes. This option cannot be used in calculating weighted closeness centrality.
DIAMETER       Uses the graph diameter (plus one) as the shortest path distance between disconnected nodes. This is the default.
ZERO           Uses zero as the shortest path distance between disconnected nodes.
HARMONIC       Uses the harmonic formula for closeness centrality.


For each option, there is a slight variation in the formula for the closeness centrality metric. These differences are described in the section Closeness Centrality.
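The CLOSE_NOPATH= option matters only when the graph has disconnected nodes. The following sketch uses a hypothetical graph with two components and requests the harmonic variant. The data set names and link data are assumptions for illustration.

```sas
/* Hypothetical graph with two components: {A, B, C} and {D, E} */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  B C  D E
;

/* No path exists between the two components, so CLOSE_NOPATH= applies */
proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      close        = unweight
      close_nopath = harmonic;
run;
```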

CLUSTERING_COEF

calculates the node clustering coefficient. The clustering coefficient is described in the section Clustering Coefficient.

DEGREE=IN | OUT | BOTH

specifies which type of degree centrality to calculate for the input graph.

Table 1.15: Values for the DEGREE= Option

Option Value   Description
IN             Calculates degree based on in-links.
OUT            Calculates degree based on out-links.
BOTH           Calculates degree based on in-links and out-links.


For an undirected graph, the option values IN and BOTH are ignored, because there is only one notion of degree, which corresponds to the degree of out-links. The degree centrality metric is described in the section Degree Centrality.
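Because IN and BOTH are meaningful only when links have a direction, a sketch that requests both degree types would declare the graph as directed. The example assumes the GRAPH_DIRECTION= option in the PROC OPTGRAPH statement. The data set names and link data are hypothetical.

```sas
/* Hypothetical directed links: each row points from -> to */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  A C  B C  C A
;

/* On a directed graph, DEGREE=BOTH reports in-degree and out-degree */
proc optgraph
   graph_direction = directed
   data_links      = LinkSetIn
   out_nodes       = NodeSetOut;
   centrality
      degree = both;
run;
```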

EIGEN=WEIGHT | UNWEIGHT | BOTH

specifies which type of eigenvector centrality to calculate.

Table 1.16: Values for the EIGEN= Option

Option Value   Description
WEIGHT         Calculates eigenvector centrality based on the weighted graph.
UNWEIGHT       Calculates eigenvector centrality based on the unweighted graph.
BOTH           Calculates eigenvector centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for undirected graphs. The eigenvector centrality metric is described in the section Eigenvector Centrality.

EIGEN_ALGORITHM=AUTOMATIC | JACOBI_DAVIDSON | POWER

specifies the algorithm to use in calculating centrality metrics that require solving eigensystems (EIGEN, HUB, and AUTH).

Table 1.17: Values for the EIGEN_ALGORITHM= Option

Option Value           Description
AUTOMATIC              Requests that PROC OPTGRAPH automatically determine the eigensolver to use. This is the default.
JACOBI_DAVIDSON (JD)   Uses a variant of the Jacobi-Davidson algorithm for solving eigensystems (Sleijpen and van der Vorst 2000). This is used as the default for the eigenvector metric on undirected graphs and the hub and authority metrics.
POWER                  Uses the power method to calculate eigenvectors. This is used as the default for the eigenvector metric on directed graphs.


EIGEN_MAXITER=number

specifies the maximum number of iterations to use for eigenvector calculations to limit the amount of computation time spent when convergence is slow. By default, EIGEN_MAXITER=10,000.
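The eigensolver options can be set alongside the metric they affect. The following sketch requests unweighted eigenvector centrality on an undirected graph, overrides the automatic solver choice with the power method, and lowers the iteration cap. The value 5000, the data set names, and the link data are arbitrary choices for illustration.

```sas
/* Hypothetical undirected link data */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  A C  B C  C D  D E
;

proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   centrality
      eigen           = unweight
      eigen_algorithm = power   /* override the automatic choice */
      eigen_maxiter   = 5000;   /* stop earlier than the default cap */
run;
```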

HUB=WEIGHT | UNWEIGHT | BOTH

specifies which type of hub centrality to calculate.

Table 1.18: Values for the HUB= Option

Option Value   Description
WEIGHT         Calculates hub centrality based on the weighted graph.
UNWEIGHT       Calculates hub centrality based on the unweighted graph.
BOTH           Calculates hub centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). This centrality metric can be used only for directed graphs. The hub centrality metric is described in the section Hub and Authority Scoring.
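Hub and authority scores are typically requested together, and both require a directed graph. The following sketch assumes the GRAPH_DIRECTION= option in the PROC OPTGRAPH statement. The data set names and link data are hypothetical.

```sas
/* Hypothetical directed links, for example pages linking to pages */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  A C  B C  D C  C A
;

proc optgraph
   graph_direction = directed
   data_links      = LinkSetIn
   out_nodes       = NodeSetOut;
   centrality
      hub  = unweight
      auth = unweight;
run;
```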

INFLUENCE=WEIGHT | UNWEIGHT | BOTH

specifies which type of influence centrality to calculate.

Table 1.19: Values for the INFLUENCE= Option

Option Value   Description
WEIGHT         Calculates influence centrality based on the weighted graph.
UNWEIGHT       Calculates influence centrality based on the unweighted graph.
BOTH           Calculates influence centrality based on both weighted and unweighted graphs.


If the input graph does not contain weights, then WEIGHT and UNWEIGHT both give the same results (using 1.0 for each link weight). The influence centrality metric is described in the section Influence Centrality.

LOGFREQNODE=number

controls the frequency for displaying iteration logs for some of the centrality metrics. For computationally intensive algorithms such as betweenness and closeness centrality, this option displays progress every number nodes. If you also specify the BY_CLUSTER option in this statement or a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement, this option is ignored and the display frequency is determined by using the LOGFREQTIME= option instead. The value of number can be any integer greater than or equal to 1; the default is determined automatically based on the size of the graph. Setting this value too low can hurt performance on large-scale graphs.

LOGFREQTIME=number

controls the frequency for displaying iteration logs for some of the centrality metrics. For computationally intensive algorithms such as betweenness and closeness centrality, this option displays progress every number seconds. If you specify a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement, PROC OPTGRAPH displays the number of nodes that have completed. If you specify the BY_CLUSTER option, PROC OPTGRAPH displays the number of subgraphs that have completed. The value of number can be any integer greater than or equal to 1; the default is 5. Setting this value too low can hurt performance on large-scale graphs.

LOGLEVEL=number | string

controls the amount of information that is displayed in the SAS log. Table 1.20 describes the valid values for this option.

Table 1.20: Values for LOGLEVEL= Option

number   string       Description
0        NONE         Turns off all algorithm-related messages in the SAS log
1        BASIC        Displays a basic summary of the algorithmic processing
2        MODERATE     Displays a summary of the algorithmic processing including a progress log using the interval that is specified in the LOGFREQNODE= or LOGFREQTIME= option
3        AGGRESSIVE   Displays a detailed summary of the algorithmic processing including a progress log using the interval that is specified in the LOGFREQNODE= or LOGFREQTIME= option


The default is the value that is specified in the LOGLEVEL= option in the PROC OPTGRAPH statement (or BASIC if that option is not specified).
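The logging options interact with threading, so a sketch helps show which one takes effect. Here NTHREADS= is greater than 1, so LOGFREQNODE= would be ignored and progress is reported on the LOGFREQTIME= interval. The interval of 10 seconds, the thread count, the data set names, and the link data are arbitrary choices for illustration.

```sas
/* Hypothetical link data; a real progress log needs a much larger graph */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  B C  C D  D E  E A
;

proc optgraph
   data_links = LinkSetIn
   out_nodes  = NodeSetOut;
   performance
      nthreads = 2;            /* more than one thread: time-based logging */
   centrality
      between     = unweight
      loglevel    = moderate   /* level 2 includes the progress log */
      logfreqtime = 10;        /* report progress every 10 seconds */
run;
```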

SUBSIZESWITCH=number

specifies the size of the subgraphs (number of nodes) to run separately when you also specify the BY_CLUSTER option in this statement and a value greater than 1 for the NTHREADS= option in the PERFORMANCE statement. When PROC OPTGRAPH processes centrality metrics by subgraph, it uses threading to process n subgraphs simultaneously, where n is the number of threads specified in the NTHREADS= option in the PERFORMANCE statement. Subgraphs that have more nodes than number are processed sequentially, which enables the threading to be done at the centrality metric level instead. The default is 10,000.
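The following sketch shows the three settings that must appear together for SUBSIZESWITCH= to have an effect. It assumes that the DATA_NODES_VAR statement accepts a CLUSTER= option to name the cluster column. The threshold of 5000, the thread count, the data set names, and the data values are hypothetical.

```sas
/* Hypothetical links and cluster assignments */
data LinkSetIn;
   input from $ to $ @@;
   datalines;
A B  B C  A C  D E
;

data NodeSetIn;
   input node $ cluster @@;
   datalines;
A 1  B 1  C 1  D 2  E 2
;

proc optgraph
   data_links = LinkSetIn
   data_nodes = NodeSetIn
   out_nodes  = NodeSetOut;
   data_nodes_var
      cluster = cluster;       /* assumed option naming the cluster column */
   performance
      nthreads = 4;            /* up to 4 small subgraphs at a time */
   centrality
      by_cluster
      between       = unweight
      subsizeswitch = 5000;    /* larger subgraphs are run one at a time */
run;
```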

WEIGHT2=column

specifies the data set variable name for a second link weight. The value of column must be numeric. The use of this option is described in more detail in the section Weight Interpretation.