The OPTNET Procedure

Graph Input Data

This section describes how to input a graph for analysis by PROC OPTNET. Let $G = (N,A)$ define a graph with a set N of nodes and a set A of links.

Consider the directed graph shown in Figure 2.5.

Figure 2.5: A Simple Directed Graph

Notice that each node and link has associated attributes: a node label and a link weight.

Node Input Data

The DATA_NODES= option in the PROC OPTNET statement defines the data set that contains the list of nodes in the graph. This data set is used to assign node weights.

The nodes data set is expected to contain some combination of the following possible variables:

  • node: the node label (this variable can be numeric or character)

  • weight: the node weight (this variable must be numeric)

  • weight2: the auxiliary node weight (this variable must be numeric)

You can use any names that you want for the data set variables. If you use nonstandard names, you must identify the variables by using the DATA_NODES_VAR statement, as described in the section DATA_NODES_VAR Statement.
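As a minimal sketch of nonstandard variable names, the following statements assume a hypothetical node data set NodeSetCustom whose label and weight variables are named id and cost, and a hypothetical link data set LinkSetCustom that uses the standard names. The NODE= and WEIGHT= options in the DATA_NODES_VAR statement map the nonstandard names to the roles that PROC OPTNET expects:

```sas
/* Hypothetical node data with nonstandard variable names (id, cost) */
data NodeSetCustom;
   input id $ cost;
   datalines;
A 10
B 20
C 30
;

/* Hypothetical link data with the standard names from, to, and weight */
data LinkSetCustom;
   input from $ to $ weight;
   datalines;
A B 1
B C 2
;

proc optnet
   data_nodes = NodeSetCustom
   data_links = LinkSetCustom
   out_nodes  = NodeSetCustomOut;
   /* Map the nonstandard names to the node label and node weight roles */
   data_nodes_var
      node   = id
      weight = cost;
run;
```

The data set and variable names in this sketch are illustrative only; any names work as long as the DATA_NODES_VAR statement identifies them.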

The data set that is specified in the DATA_LINKS= option defines the set of nodes that are incident to some link. If the graph contains a node that has no links (called a singleton node), then this node must be defined in the DATA_NODES= data set. The following is an example of a graph with three links but four nodes, including a singleton node D:

data NodeSetIn;
   input node $ @@;
   datalines;
A B C D
;

data LinkSetInS;
   input from $ to $ weight;
   datalines;
A B 1
A C 2
B C 1
;

If you specify duplicate entries in the node data set, PROC OPTNET takes the first occurrence of the node and ignores the others. A warning is printed to the log.
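The duplicate-handling rule can be tried with a small sketch such as the following, which assumes hypothetical data sets NodeSetDup and LinkSetDup. Node A appears twice in the node data set; according to the rule above, PROC OPTNET should keep the first entry (weight 1), ignore the second (weight 5), and print a warning to the log:

```sas
/* Hypothetical node data in which node A is listed twice */
data NodeSetDup;
   input node $ weight;
   datalines;
A 1
B 2
A 5
;

/* Hypothetical link data that connects the two distinct nodes */
data LinkSetDup;
   input from $ to $ weight;
   datalines;
A B 1
;

/* Read both data sets; check the log for the duplicate-node warning */
proc optnet
   data_nodes = NodeSetDup
   data_links = LinkSetDup
   out_nodes  = NodeSetDupOut;
run;
```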

Node Subset Input Data

For some algorithms you might want to process only a subset of the nodes that appear in the input graph. You can accomplish this by using the DATA_NODES_SUB= option in the PROC OPTNET statement. You can use the node subset data set in conjunction with the SHORTPATH statement (see the section Shortest Path). The node subset data set is expected to contain some combination of the following variables:

  • node: the node label (this variable can be numeric or character)

  • source: whether to process this node as a source node in shortest path algorithms (this variable must be numeric)

  • sink: whether to process this node as a sink node in shortest path algorithms (this variable must be numeric)

The values in the node subset data set determine how to process nodes when the SHORTPATH statement is processed. A value of 0 for the source variable designates that the node is not to be processed as a source; a value of 1 designates that the node is to be processed as a source. The same values can be used for the sink variable to designate whether the node is to be processed as a sink. The missing indicator (.) can also be used in place of 0 to designate that a node is not to be processed.

A representative example of a node subset data set that might be used with the graph in Figure 2.5 is as follows:

data NodeSubSetIn;
   input node $ source sink;
   datalines;
A 1 .
F . 1
E 1 .
;

The data set NodeSubSetIn indicates that you want to process the shortest paths for the source-sink pairs in $\{ A,E\}  \times \{ F\} $.
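As a rough sketch of how the node subset data set might be passed to the shortest path algorithm, the following statements use the NodeSubSetIn data set defined above together with a hypothetical directed link data set LinkSetSub. The links are invented for illustration and are not the links of Figure 2.5; the output data set names ShortPathP and ShortPathW are likewise arbitrary:

```sas
/* Hypothetical directed links over nodes A through F */
/* (illustrative only; not the links shown in Figure 2.5) */
data LinkSetSub;
   input from $ to $ weight;
   datalines;
A B 1
A C 2
B E 2
B F 5
C E 1
E F 2
;

proc optnet
   graph_direction = directed
   data_links      = LinkSetSub
   data_nodes_sub  = NodeSubSetIn;
   /* Only the source-sink pairs flagged in NodeSubSetIn are processed */
   shortpath
      out_paths   = ShortPathP
      out_weights = ShortPathW;
run;
```

With this input, the output data sets should be restricted to the pairs (A,F) and (E,F) rather than covering all pairs of nodes.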

Standardized Labels

For large-scale graphs, the processing stage that reads the nodes and links into memory can be time-consuming. Under the following assumptions, you can use the STANDARDIZED_LABELS option in the PROC OPTNET statement to greatly speed up this stage:

  1. The link data set variables from and to are numeric type.

  2. The node and node subset data set variable node is numeric type.

  3. The node labels start from 0 and are consecutive nonnegative integers.

Consider the following links data set that uses numeric labels:

data LinkSetIn;
   input from to weight;
   datalines;
0 1 1
3 0 2
1 5 1
;

Using default settings, the following statements echo back link and node data sets that contain three links and four nodes, respectively:

proc optnet
   data_links = LinkSetIn
   out_nodes  = NodeSetOut
   out_links  = LinkSetOut;
run;

The log is shown in Figure 2.10.

Figure 2.10: PROC OPTNET Log: A Simple Undirected Graph

NOTE: ------------------------------------------------------------------------------------------
NOTE: Running OPTNET version 14.1.                                                              
NOTE: ------------------------------------------------------------------------------------------
NOTE: The OPTNET procedure is executing in single-machine mode.                                 
NOTE: ------------------------------------------------------------------------------------------
NOTE: Data input used 0.01 (cpu: 0.00) seconds.                                                 
NOTE: The number of nodes in the input graph is 4.                                              
NOTE: The number of links in the input graph is 3.                                              
NOTE: ------------------------------------------------------------------------------------------
NOTE: Data output used 0.00 (cpu: 0.00) seconds.                                                
NOTE: ------------------------------------------------------------------------------------------
NOTE: The data set WORK.NODESETOUT has 4 observations and 1 variables.                          
NOTE: The data set WORK.LINKSETOUT has 3 observations and 3 variables.                          



The data set NodeSetOut, shown in Figure 2.11, contains the unique numeric node labels, $\{ 0,1,3,5\} $.

Figure 2.11: Node Data Set of a Simple Undirected Graph

Obs node
1 0
2 1
3 3
4 5



Using standardized labels, the same input data set defines a graph that has six (not four) nodes:

proc optnet
   standardized_labels
   data_links = LinkSetIn
   out_nodes  = NodeSetOut
   out_links  = LinkSetOut;
run;

The log that results from using standardized labels is shown in Figure 2.12.

Figure 2.12: PROC OPTNET Log: A Simple Undirected Graph Using Standardized Labels

NOTE: ------------------------------------------------------------------------------------------
NOTE: Running OPTNET version 14.1.                                                              
NOTE: ------------------------------------------------------------------------------------------
NOTE: The OPTNET procedure is executing in single-machine mode.                                 
NOTE: ------------------------------------------------------------------------------------------
NOTE: Data input used 0.00 (cpu: 0.00) seconds.                                                 
NOTE: The number of nodes in the input graph is 6.                                              
NOTE: The number of links in the input graph is 3.                                              
NOTE: The number of singleton nodes in the input graph is 2.                                    
NOTE: ------------------------------------------------------------------------------------------
NOTE: Data output used 0.00 (cpu: 0.00) seconds.                                                
NOTE: ------------------------------------------------------------------------------------------
NOTE: The data set WORK.NODESETOUT has 6 observations and 1 variables.                          
NOTE: The data set WORK.LINKSETOUT has 3 observations and 3 variables.                          



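The data set NodeSetOut, shown in Figure 2.13, now contains all node labels from 0 to 5. This follows from the assumptions of the STANDARDIZED_LABELS option: the labels are taken to be the consecutive integers from 0 through the largest label that appears in the input, so the labels 2 and 4, which are not incident to any link, are treated as singleton nodes.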

Figure 2.13: Node Data Set of a Simple Undirected Graph Using Standardized Labels

Obs node
1 0
2 1
3 2
4 3
5 4
6 5



When you use standardized labels, the DATA_NODES= input order (which can be arbitrary) is not preserved in the OUT_NODES= output data set. Instead, the order is ascending, starting from zero.
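The ordering behavior can be observed with a sketch such as the following, which assumes a hypothetical node data set NodeSetOrd that lists the six nodes of the preceding example in arbitrary order and reuses the LinkSetIn data set defined earlier in this section. The node weights and the output data set name NodeSetOrdOut are invented for illustration:

```sas
/* Hypothetical node data: labels 0 through 5 listed in arbitrary order */
data NodeSetOrd;
   input node weight;
   datalines;
3 30
0 5
5 50
1 10
4 40
2 20
;

/* With STANDARDIZED_LABELS, the output order is expected to be */
/* ascending by label, not the input order shown above          */
proc optnet
   standardized_labels
   data_nodes = NodeSetOrd
   data_links = LinkSetIn
   out_nodes  = NodeSetOrdOut;
run;

proc print data=NodeSetOrdOut;
run;
```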