Getting Started: NPSC Problems

To solve NPSC problems using PROC INTPOINT, you save a representation of the network and the side constraints in three SAS data sets. These data sets are then passed to PROC INTPOINT for solution. There are various forms that a problem’s data can take. You can use any one or a combination of several of these forms.

The NODEDATA= data set contains the names of the supply and demand nodes and the supply or demand associated with each. These are the elements in the column vector $ b$ in the NPSC problem (see the section Mathematical Description of NPSC).

The ARCDATA= data set contains information about the variables of the problem. Usually these are arcs, but there can also be data related to nonarc variables in the ARCDATA= data set.

An arc is identified by the names of its tail node (where it originates) and head node (where it is directed). Each observation can be used to identify an arc in the network and, optionally, the cost per flow unit across the arc, the arc’s capacity, lower flow bound, and name. These data are associated with the matrix $ F$ and the vectors $ c$, $ l$, and $ u$ in the NPSC problem (see the section Mathematical Description of NPSC).

Note: Although $ F$ is a node-arc incidence matrix, it is specified in the ARCDATA= data set by arc definitions. Do not explicitly specify these flow conservation constraints as constraints of the problem.

In addition, the ARCDATA= data set can be used to specify information about nonarc variables, including objective function coefficients, lower and upper value bounds, and names. These data are the elements of the vectors $ d$, $ m$, and $ v$ in the NPSC problem (see the section Mathematical Description of NPSC). Data for an arc or nonarc variable can be given in more than one observation.

Supply and demand data also can be specified in the ARCDATA= data set. In such a case, the NODEDATA= data set may not be needed.

The CONDATA= data set describes the side constraints and their right-hand sides. These data are elements of the matrices $ H$ and $ Q$ and the vector $ r$. Constraint types are also specified in the CONDATA= data set. You can include in this data set upper bound values or capacities, lower flow or value bounds, and costs or objective function coefficients. It is possible to give all information about some or all nonarc variables in the CONDATA= data set.

An arc is identified in this data set by its name. If you specify an arc’s name in the ARCDATA= data set, then this name is used to associate data in the CONDATA= data set with that arc. Each arc also has a default name that is the name of the tail and head node of the arc concatenated together and separated by an underscore character; tail_head, for example.

If you use the dense side constraint input format (described in the section CONDATA= Data Set), and want to use the default arc names, these arc names are names of SAS variables in the VAR list of the CONDATA= data set.

If you use the sparse side constraint input format (see the section CONDATA= Data Set) and want to use the default arc names, these arc names are values of the COLUMN list variable of the CONDATA= data set.

PROC INTPOINT reads the data from the NODEDATA= data set, the ARCDATA= data set, and the CONDATA= data set. Error checking is performed, and the model is converted into an equivalent LP. This LP is preprocessed. Preprocessing is optional but highly recommended. Preprocessing analyzes the model and tries to determine before optimization whether variables can be fixed to their optimal values. Knowing that, the model can be modified and these variables dropped out. It can be determined that some constraints are redundant. Sometimes, preprocessing succeeds in reducing the size of the problem, thereby making the subsequent optimization easier and faster.

The optimal solution to the equivalent LP is then found. This LP is converted back to the original NPSC problem, and the optimum for this is derived from the optimum of the equivalent LP. If the problem was preprocessed, the model is now post-processed, where fixed variables are reintroduced. The solution can be saved in the CONOUT= data set.

Introductory NPSC Example

Consider the following transshipment problem for an oil company. Crude oil is shipped to refineries where it is processed into gasoline and diesel fuel. The gasoline and diesel fuel are then distributed to service stations. At each stage, there are shipping, processing, and distribution costs. Also, there are lower flow bounds and capacities.

In addition, there are two sets of side constraints. The first set is that two times the crude from the Middle East cannot exceed the throughput of a refinery plus 15 units. (The phrase plus 15 units that finishes the last sentence is used to enable some side constraints in this example to have a nonzero rhs.) The second set of constraints are necessary to model the situation that one unit of crude mix processed at a refinery yields three-fourths of a unit of gasoline and one-fourth of a unit of diesel fuel.

Because there are two products that are not independent in the way in which they flow through the network, an NPSC is an appropriate model for this example (see Figure 4.6). The side constraints are used to model the limitations on the amount of Middle Eastern crude that can be processed by each refinery and the conversion proportions of crude to gasoline and diesel fuel.

Figure 4.6: Oil Industry Example

LaTeX defined picture


To solve this problem with PROC INTPOINT, save a representation of the model in three SAS data sets. In the NODEDATA= data set, you name the supply and demand nodes and give the associated supplies and demands. To distinguish demand nodes from supply nodes, specify demands as negative quantities. For the oil example, the NODEDATA= data set can be saved as follows:

title  'Oil Industry Example';
title3 'Setting Up Nodedata = Noded For PROC INTPOINT';
data noded;
   input   _node_&$15. _sd_;
   datalines;
middle east       100
u.s.a.             80
servstn1 gas      -95
servstn1 diesel   -30
servstn2 gas      -40
servstn2 diesel   -15
;

The ARCDATA= data set contains the rest of the information about the network. Each observation in the data set identifies an arc in the network and gives the cost per flow unit across the arc, the capacities of the arc, the lower bound on flow across the arc, and the name of the arc.

title3 'Setting Up Arcdata = Arcd1 For PROC INTPOINT';
data arcd1;
   input _from_&$11. _to_&$15. _cost_ _capac_ _lo_ _name_ $;
   datalines;
middle east    refinery 1        63     95   20    m_e_ref1
middle east    refinery 2        81     80   10    m_e_ref2
u.s.a.         refinery 1        55      .    .    .
u.s.a.         refinery 2        49      .    .    .
refinery 1     r1               200    175   50    thruput1
refinery 2     r2               220    100   35    thruput2
r1             ref1 gas           .    140    .    r1_gas
r1             ref1 diesel        .     75    .    .
r2             ref2 gas           .    100    .    r2_gas
r2             ref2 diesel        .     75    .    .
ref1 gas       servstn1 gas      15     70    .    .
ref1 gas       servstn2 gas      22     60    .    .
ref1 diesel    servstn1 diesel   18      .    .    .
ref1 diesel    servstn2 diesel   17      .    .    .
ref2 gas       servstn1 gas      17     35    5    .
ref2 gas       servstn2 gas      31      .    .    .
ref2 diesel    servstn1 diesel   36      .    .    .
ref2 diesel    servstn2 diesel   23      .    .    .
;

Finally, the CONDATA= data set contains the side constraints for the model:

title3 'Setting Up Condata = Cond1 For PROC INTPOINT';
data cond1;
   input m_e_ref1 m_e_ref2 thruput1 r1_gas thruput2 r2_gas
         _type_ $ _rhs_;
   datalines;
-2  .  1 .  . . >= -15
 . -2  . .  1 . GE -15
 .  . -3 4  . . EQ   0
 .  .  . . -3 4  =   0
;

Note that the SAS variable names in the CONDATA= data set are the names of arcs given in the ARCDATA= data set. These are the arcs that have nonzero constraint coefficients in side constraints. For example, the proportionality constraint that specifies that one unit of crude at each refinery yields three-fourths of a unit of gasoline and one-fourth of a unit of diesel fuel is given for refinery 1 in the third observation and for refinery 2 in the last observation. The third observation requires that each unit of flow on the arc thruput1 equals three-fourths of a unit of flow on the arc r1_gas. Because all crude processed at refinery 1 flows through thruput1 and all gasoline produced at refinery 1 flows through r1_gas, the constraint models the situation. It proceeds similarly for refinery 2 in the last observation.

To find the minimum cost flow through the network that satisfies the supplies, demands, and side constraints, invoke PROC INTPOINT as follows:

proc intpoint
   bytes=1000000
   nodedata=noded        /* the supply and demand data */
   arcdata=arcd1         /* the arc descriptions       */
   condata=cond1         /* the side constraints       */
   conout=solution;      /* the solution data set      */
   run;

The following messages, which appear on the SAS log, summarize the model as read by PROC INTPOINT and note the progress toward a solution.

NOTE: Number of nodes= 14 .                                                     
NOTE: Number of supply nodes= 2 .                                               
NOTE: Number of demand nodes= 4 .                                               
NOTE: Total supply= 180 , total demand= 180 .                                   
NOTE: Number of arcs= 18 .                                                      
NOTE: Number of <= side constraints= 0 .                                        
NOTE: Number of == side constraints= 2 .                                        
NOTE: Number of >= side constraints= 2 .                                        
NOTE: Number of side constraint coefficients= 8 .                               
NOTE: The following messages relate to the equivalent Linear Programming        
      problem solved by the Interior Point algorithm.                           
NOTE: Number of <= constraints= 0 .                                             
NOTE: Number of == constraints= 16 .                                            
NOTE: Number of >= constraints= 2 .                                             
NOTE: Number of constraint coefficients= 44 .                                   
NOTE: Number of variables= 18 .                                                 
NOTE: After preprocessing, number of <= constraints= 0.                         
NOTE: After preprocessing, number of == constraints= 3.                         
NOTE: After preprocessing, number of >= constraints= 2.                         
NOTE: The preprocessor eliminated 13 constraints from the problem.              
NOTE: The preprocessor eliminated 33 constraint coefficients from the problem.  
NOTE: After preprocessing, number of variables= 5.                              
NOTE: The preprocessor eliminated 13 variables from the problem.                
NOTE: 4 columns, 0 rows and 4 coefficients were added to the problem to handle  
      unrestricted variables, variables that are split, and constraint slack or 
      surplus variables.                                                        
NOTE: There are 10 sub-diagonal nonzeroes in the unfactored A Atranspose matrix.
NOTE: The 5 factor nodes make up 1 supernodes                                   
NOTE: There are 0 nonzero sub-rows or sub-columns outside the supernodal        
      triangular regions along the factors leading diagonal.                    
NOTE: Bound feasibility attained by iteration 1.                                
NOTE: Dual feasibility attained by iteration 1.                                 
NOTE: Constraint feasibility attained by iteration 1.                           
NOTE: The Primal-Dual Predictor-Corrector Interior Point algorithm performed 6  
      iterations.                                                               
NOTE: Optimum reached.                                                          
NOTE: Objective= 50875.                                                         
NOTE: The data set WORK.SOLUTION has 18 observations and 10 variables.          
NOTE: There were 18 observations read from the data set WORK.ARCD1.             
NOTE: There were 6 observations read from the data set WORK.NODED.              
NOTE: There were 4 observations read from the data set WORK.COND1.              

The first set of messages shows the size of the problem. The next set of messages provides statistics on the size of the equivalent LP problem. The number of variables may not equal the number of arcs if the problem has nonarc variables. This example has none. To convert a network to the equivalent LP problem, a flow conservation constraint must be created for each node (including an excess or bypass node, if required). This explains why the number of equality constraints and the number of constraint coefficients differ from the number of equality side constraints and the number of coefficients in all side constraints.

If the preprocessor was successful in decreasing the problem size, some messages will report how well it did. In this example, the model size was cut approximately in half!

The next set of messages describes aspects of the interior point algorithm. Of particular interest are those concerned with the Cholesky factorization of $ A A^ T$ where $ A$ is the coefficient matrix of the final LP. It is crucial to preorder the rows and columns of this matrix to prevent fill-in and reduce the number of row operations to undertake the factorization. See the section Interior Point Algorithmic Details for a more extensive explanation.

Unlike PROC LP, which displays the solution and other information as output, PROC INTPOINT saves the optimum in the output SAS data set that you specify. For this example, the solution is saved in the SOLUTION data set. It can be displayed with the PRINT procedure as

title3 'Optimum'; 
proc print data=solution;
   var _from_ _to_ _cost_ _capac_ _lo_ _name_ 
       _supply_ _demand_ _flow_ _fcost_;
   sum _fcost_;
   run;

Figure 4.7: CONOUT=SOLUTION

Optimum

Obs _from_ _to_ _cost_ _capac_ _lo_ _name_ _SUPPLY_ _DEMAND_ _FLOW_ _FCOST_
1 refinery 1 r1 200 175 50 thruput1 . . 145.000 29000.00
2 refinery 2 r2 220 100 35 thruput2 . . 35.000 7700.00
3 r1 ref1 diesel 0 75 0   . . 36.250 0.00
4 r1 ref1 gas 0 140 0 r1_gas . . 108.750 0.00
5 r2 ref2 diesel 0 75 0   . . 8.750 0.00
6 r2 ref2 gas 0 100 0 r2_gas . . 26.250 0.00
7 middle east refinery 1 63 95 20 m_e_ref1 100 . 80.000 5040.00
8 u.s.a. refinery 1 55 99999999 0   80 . 65.000 3575.00
9 middle east refinery 2 81 80 10 m_e_ref2 100 . 20.000 1620.00
10 u.s.a. refinery 2 49 99999999 0   80 . 15.000 735.00
11 ref1 diesel servstn1 diesel 18 99999999 0   . 30 30.000 540.00
12 ref2 diesel servstn1 diesel 36 99999999 0   . 30 0.000 0.00
13 ref1 gas servstn1 gas 15 70 0   . 95 68.750 1031.25
14 ref2 gas servstn1 gas 17 35 5   . 95 26.250 446.25
15 ref1 diesel servstn2 diesel 17 99999999 0   . 15 6.250 106.25
16 ref2 diesel servstn2 diesel 23 99999999 0   . 15 8.750 201.25
17 ref1 gas servstn2 gas 22 60 0   . 40 40.000 880.00
18 ref2 gas servstn2 gas 31 99999999 0   . 40 0.000 0.00
                    50875.00


Notice that, in CONOUT=SOLUTION (Figure 4.7), the optimal flow through each arc in the network is given in the variable named _FLOW_, and the cost of flow through each arc is given in the variable _FCOST_.

Figure 4.8: Oil Industry Solution

LaTeX defined picture