The OPTGRAPH Procedure

Example 1.11 Linear Assignment Problem, Sparse Format versus Dense Format

This example looks at the problem of assigning swimmers to strokes based on their best times. However, in this case certain swimmers are not eligible to perform certain strokes. A missing (.) value in the data matrix identifies an ineligible assignment. For example:

data RelayTimesMatrix;
   input name $ sex $ back breast fly free;
   datalines;
Sue     F   .  36.7 28.3 36.1
Karen   F 34.6    .    . 26.2
Jan     F 31.3    . 27.1    .
Andrea  F 28.6    . 29.1    .
Carol   F 32.9    . 26.6    .
;

Recall that the linear assignment problem can also be interpreted as the minimum-weight matching in a bipartite graph. The eligible assignments define links between the rows (swimmers) and the columns (strokes), as in Figure 1.149.

Figure 1.149: Bipartite Graph for Linear Assignment Problem

You can represent the same data in RelayTimesMatrix by using a links data set as follows:

data RelayTimesLinks;
   input name $ attr $ cost;
   datalines;
Sue     breast 36.7
Sue     fly    28.3
Sue     free   36.1
Karen   back   34.6
Karen   free   26.2
Jan     back   31.3
Jan     fly    27.1
Andrea  back   28.6
Andrea  fly    29.1
Carol   back   32.9
Carol   fly    26.6
;

This graph must be bipartite (such that S and T are disjoint). If it is not, PROC OPTGRAPH returns an error.

Now, you can use either input format to solve the same problem, as follows:

proc optgraph
   data_matrix = RelayTimesMatrix;
   linear_assignment
      out      = LinearAssignMatrix
      weight   = (back--free)
      id       = (name sex);
run;

proc optgraph
   graph_direction = directed
   data_links      = RelayTimesLinks;
   data_links_var
      from         = name
      to           = attr
      weight       = cost;
   linear_assignment
      out          = LinearAssignLinks;
run;

When you use the graph input format, the LINEAR_ASSIGNMENT options WEIGHT= and ID= are not used directly.

The data sets LinearAssignMatrix and LinearAssignLinks now contain the optimal assignments, as shown in Output 1.11.1 and Output 1.11.2.

Output 1.11.1: Optimal Assignments for Swim Times (Dense Input)

name	sex	assign	cost
Sue	F	breast	36.7
Karen	F	free	26.2
Andrea	F	back	28.6
Carol	F	fly	26.6
			118.1

Output 1.11.2: Optimal Assignments for Swim Times (Sparse Input)

name	attr	cost
Sue	breast	36.7
Karen	free	26.2
Andrea	back	28.6
Carol	fly	26.6
		118.1

The optimal assignments are shown graphically in Figure 1.150.

Figure 1.150: Optimal Assignments for Swim Times

For large problems where a number of links are forbidden, the sparse format can be faster and can save a great deal of memory. Consider an example that uses the format of the DATA_MATRIX= option with 15,000 columns ( $|S|=15,000$ ) and 4,000 rows ( $|T|=4,000$ ). To store the dense matrix in memory, PROC OPTGRAPH needs to allocate approximately $|S| \cdot |T| \cdot 8/1024/1024=457$ MB. If the data have mostly ineligible links, then the sparse (graph) format that uses the DATA_LINKS= option is much more efficient with respect to memory. For example, if the data have only 5% of the eligible links ( $15,000 \cdot 4,000 \cdot 0.05 = 3,000,000$ ), then the dense storage would still need 457 MB. The sparse storage for the same example needs approximately $|S| \cdot |T| \cdot 0.05 \cdot 12/1024/1024 = 34$ MB. If the problem is fully dense (all links are eligible), then the dense format that uses the DATA_MATRIX= option is the most efficient.