The LP Procedure

Integer Programming

Formulations of mathematical programs often require that some of the decision variables take only integer values. Consider the formulation

\[  \begin{array}{ll} \mr {minimize} &  c^ Tx \\ \mr {subject\  to} \quad &  A x \,  \{ \geq , =, \leq \}  \,  b \\ &  \ell \leq x \leq u \\ &  x_ i \text { is integer}, i \in {\mathcal S} \end{array}  \]

The set of indices $ {\mathcal S}$ identifies those variables that must take only integer values. When $ {\mathcal S}$ does not contain all of the integers between 1 and $ n$, inclusive, this problem is called a mixed-integer program (MIP). Otherwise, it is known as an integer program. Let $ x^{opt}$(MIP) denote an optimal solution to (MIP). An integer variable with bounds between 0 and 1 is also called a binary variable.

Specifying the Problem

An integer or mixed-integer problem can be solved with PROC LP. To solve this problem, you must identify the integer variables. You can do this with a row in the input data set that has the keyword 'INTEGER' for the type variable. Any variable that has a nonmissing and nonzero value for this row is interpreted as an integer variable. It is important to note that integer variables must have upper bounds explicitly defined using the 'UPPERBD' keyword. The values in the 'INTEGER' row not only identify those variables that must be integers, but they also give an ordering to the integer variables that can be used in the solution technique.

You can identify binary variables in the same way, using the keyword 'BINARY' for the type variable. For binary variables, there is no need to supply any upper bounds.
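
For example, the following dense-format sketch declares x1 and x2 as integer variables (the values 1 and 2 in the 'INTEGER' row also give their ordering) and x3 as a binary variable; the 'UPPERBD' row supplies the explicit bounds required for the integer variables. The data set and variable names are placeholders, and the row-label, type, and right-hand-side variables are assumed here to use the conventional names _id_, _type_, and _rhs_.

   data mipmodel;
      /* each observation is one row of the model */
      input _id_ $ x1 x2 x3 _type_ $ _rhs_;
      datalines;
   object    2  3  4  max       .
   resource  1  2  1  le       10
   intord    1  2  .  integer   .
   upper     8  5  .  upperbd   .
   binvar    .  .  1  binary    .
   ;

   proc lp data=mipmodel;
   run;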

Following the rules of sparse data input format, you can also identify individual integer or binary variables.

The Branch-and-Bound Technique

The branch-and-bound approach is used to solve integer and mixed-integer problems. The following discussion outlines the approach and explains how to use several options to control the procedure.

The branch-and-bound technique solves an integer program by solving a sequence of linear programs. The sequence can be represented by a tree, with each node in the tree being identified with a linear program that is derived from the problems on the path leading to the root of the tree. The root of the tree is identified with a linear program that is identical to (MIP), except that $ \mathcal{S}$ is empty. This relaxed version of (MIP), called (LP($0$)), can be written as

\[  \begin{array}{ll} \mr {minimize} &  c^ Tx \\ \mr {subject\  to} \quad &  A x \,  \{ \geq , =, \leq \}  \,  b \\ &  \ell \leq x \leq u \end{array}  \]

The branch-and-bound approach generates linear programs along the nodes of the tree using the following scheme. Consider $ x^{opt}(0)$, the optimal solution to (LP($0$)). If $ x^{opt}(0)_ i$ is integer for all $i\in \mathcal{S}$, then $ x^{opt}(0)$ is optimal in (MIP). Suppose for some $i\in \mathcal{S}$, $ x^{opt}(0)_ i$ is nonintegral. In that case, define two new problems (LP($1$)) and (LP($2$)), descendants of the parent problem (LP($0$)). The problem (LP($1$)) is identical to (LP($0$)) except for the additional constraint

\[  x_ i\leq \lfloor x^{opt}(0)_ i \rfloor  \]

and the problem (LP($2$)) is identical to (LP($0$)) except for the additional constraint

\[  x_ i\geq \lceil x^{opt}(0)_ i \rceil  \]

The notation $\lceil y \rceil $ means the smallest integer greater than or equal to $ y$, and the notation $\lfloor y \rfloor $ means the largest integer less than or equal to $ y$. Note that neither of the two new problems has $ x^{opt}(0)$ as a feasible solution; however, because the component $ x_ i$ of any solution to (MIP) must be integral, $ x^{opt}$(MIP) must satisfy one of the two new constraints. The two problems thus defined are called active nodes in the branch-and-bound tree, and the variable $ x_ i$ is called the branching variable.
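
For a purely illustrative instance, if the branching variable has value $ x^{opt}(0)_ i = 2.4$, then (LP($1$)) and (LP($2$)) append, respectively, the constraints

\[  x_ i \leq \lfloor 2.4 \rfloor = 2 \quad \mr {and} \quad x_ i \geq \lceil 2.4 \rceil = 3  \]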

Next, the algorithm chooses one of the problems associated with an active node and attempts to solve it using the dual simplex algorithm. The problem may be infeasible, in which case it is dropped. If it can be solved and it in turn does not have an integer solution (that is, a solution for which $ x_ i$ is integer for all $i\in \mathcal{S}$), then it defines two new problems. These new problems each contain all of the constraints of the parent problem plus the appropriate additional one.

Branching continues in this manner until either there are no active nodes or an integer solution is found. When an integer solution is found, its objective value provides a bound for the objective of (MIP). In particular, if $ z$ is the objective value of the current best integer solution, then any active problems whose parent problem has objective value $\geq z$ can be discarded (assuming that the problem is a minimization). This can be done because all problems that descend from this parent will also have objective value $\geq z$. This technique is known as fathoming. When there are no active nodes remaining to be solved, the current integer solution is optimal in (MIP). If no integer solution has been found, then (MIP) is (integer) infeasible.

It is important to realize that integer programs are NP-complete. Roughly speaking, this means that the effort required to solve them grows exponentially with the size of the problem. For example, a problem with 10 binary variables can, in the worst case, generate $2^{10} =1024$ nodes in the branch-and-bound tree. A problem with 20 binary variables can, in the worst case, generate $2^{20} =1048576$ nodes in the branch-and-bound tree. Although the algorithm is unlikely to have to generate every single possible node, the need to explore even a small fraction of the potential number of nodes for a large problem can be resource intensive.

The Integer Iteration Log

To help monitor the growth of the branch-and-bound tree, the LP procedure reports on the status of each problem that is solved. The report, displayed in the Integer Iteration Log, can be used to reconstruct the branch-and-bound tree. Each row in the report describes the results of the attempted solution of the linear program at a node in the tree. In the following discussion, a problem on a given line in the log is called the current problem. The following columns are displayed in the report:

Iter

identifies the number of the branch-and-bound iteration.

Problem

identifies how the current problem fits in the branch-and-bound tree.

Condition

reports the result of the attempted solution of the current problem. Values for Condition are:

  • ACTIVE: The current problem was solved successfully.

  • INFEASIBLE: The current problem is infeasible.

  • FATHOMED: The current problem cannot lead to an improved integer solution and therefore it is dropped.

  • SINGULAR: A singular basis was encountered in attempting to solve the current problem. Solution of this relaxed problem is suspended and will be attempted later if necessary.

  • SUBOPTIMAL: The current problem has an integer feasible solution.

Objective

reports the objective value of the current problem.

Branched

names the variable that is branched in subtrees defined by the descendants of this problem.

Value

gives the current value of the variable named in the column labeled Branched.

Sinfeas

gives the sum of the integer infeasibilities in the optimal solution to the current problem.

Active

reports the total number of nodes currently active in the branch-and-bound tree.

Proximity  

reports the gap between the best integer solution and the current lower (upper for maximizations) bound of all active nodes.

To reconstruct the branch-and-bound tree from this report, consider the interpretation of iteration $ j$. If Iter=$ j$ and Problem=$ k$, then the problem solved on iteration $ j$ is identical to the problem solved on iteration $| \,  k \,  |$ with one additional constraint. If $ k>0$, then the additional constraint is an upper bound on the variable named in the Branched column on iteration $| \,  k \,  |$. If $ k<0$, then the additional constraint is a lower bound on that variable. In either case, the value of the bound is obtained by rounding the Value reported on iteration $| \,  k \,  |$ down (for an upper bound) or up (for a lower bound), as described in the section The Branch-and-Bound Technique.

Example 5.8 in the section Examples: LP Procedure shows an Integer Iteration Log in its output.

Controlling the Branch-and-Bound Search

There are several options you can use to control branching. This is accomplished by controlling the program’s choice of the branching variable and of the next active node. In the discussion that follows, let

\[  f_ i(k)= x^{opt}(k)_ i-\lfloor x^{opt}(k)_ i \rfloor  \]

where $ x^{opt}(k)$ is the optimal solution to the problem solved in iteration $ k$.
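
As a purely illustrative instance, a component with value $ x^{opt}(k)_ i = 3.7$ gives

\[  f_ i(k) = 3.7 - \lfloor 3.7 \rfloor = 0.7  \]

that is, $ f_ i(k)$ is the fractional part of the $ i$th component of $ x^{opt}(k)$.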

The CANSELECT= option directs the choice of the next active node. Valid keywords for this option include LIFO, FIFO, OBJ, PROJECT, PSEUDOC, and ERROR. The following list describes the action that each of these causes when the procedure must choose a problem to solve from the list of active nodes.

LIFO

chooses the last problem added to the tree of active nodes. This search has the effect of a depth-first search of the branch-and-bound tree.

FIFO

chooses the first node added to the tree of active nodes. This search has the effect of a breadth-first search of the branch-and-bound tree.

OBJ

chooses the problem whose parent has the smallest (largest if the problem is a maximization) objective value.

PROJECT

chooses the problem with the largest (smallest if the problem is a maximization) projected objective value. The projected objective value is evaluated using the sum of integer infeasibilities, $ s(k)$, associated with an active node (LP($ k$)), defined by

\[  s(k) = \sum _{i\in {\mathcal S}}\mr {min} \{ f_ i(k), 1-f_ i(k)\}   \]

An empirical measure of the rate of increase (decrease) in the objective value is defined as

\[  \lambda = (z^*-z(0)) / s(0)  \]

where

  • $ z(k)$ is the optimal objective value for (LP($ k$))

  • $ z^*$ is the objective value of the current best integer solution

The projected objective value for problems (LP($ k+1$)) and (LP($ k+2$)) is defined as

\[  z(k)+ \lambda s(k)  \]

PSEUDOC

chooses the problem with the largest (smallest if the problem is a maximization) projected pseudocost. The projected pseudocost is evaluated using the weighted sum of infeasibilities $ s_ w(k)$ associated with an active problem (LP($ k$)), defined by

\[  s_ w(k)=\sum _{i\in {\mathcal S}}\mr {min} \{ d_ i(k)f_ i(k), u_ i(k)(1-f_ i(k))\}   \]

The weights $ u_ i$ and $ d_ i$ are initially equal to the absolute value of the $ i$th objective coefficient and are updated at each integer iteration. They are modified by examining the empirical marginal change in the objective as additional constraints are placed on the variables in $ {\mathcal S}$ along the path from (LP($0$)) to a node associated with an integer feasible solution. In particular, if the definition of problems (LP($ k+1$)) and (LP($ k+2$)) from parent (LP($ k$)) involves the addition of constraints $x_ i \leq \lfloor x^{opt}(k)_ i\rfloor $ and $x_ i \geq \lceil x^{opt}(k)_ i\rceil $, respectively, and one of them is on a path to an integer feasible solution, then only one of the following is true:

\[  d_ i(k)=(z(k+1)-z(k))/f_ i(k)  \]
\[  u_ i(k)=(z(k+2)-z(k))/(1-f_ i(k))  \]

Note the similarity between $ s_ w(k)$ and $ s(k)$. The weighted quantity $ s_ w(k)$ accounts to some extent for the influence of the objective function. The projected pseudocost for problems (LP($ k+1$)) and (LP($ k+2$)) is defined as

\[  z_ w(k) \equiv z(k)+s_ w(k)  \]

ERROR

chooses the problem with the largest error. The error associated with problems (LP($ k+1$)) and (LP($ k+2$)) is defined as

\[  (z^*-z_ w(k))/(z^*-z(k))  \]

The BACKTRACK= option controls the search for the next problem. This option can take the same values as the CANSELECT= option. In addition to the case outlined under the DELTAIT= option, backtracking is required as follows based on the CANSELECT= option in effect:

  • If CANSELECT=LIFO and there is no active node in the portion of the active tree currently under exploration with a bound better than the value of WOBJECTIVE=, then the procedure must backtrack.

  • If CANSELECT=FIFO, PROJECT, PSEUDOC, or ERROR and the bound corresponding to the node under consideration is not better than the value of WOBJECTIVE=, then the procedure must backtrack.

The default value is OBJ.
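
As an illustrative sketch, both the node-selection and backtracking rules are set on the PROC LP statement; the data set name, the BACKTRACK= choice, and the WOBJECTIVE= value below are placeholders, not recommendations:

   proc lp data=mipmodel canselect=fifo backtrack=error wobjective=100;
   run;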

The VARSELECT= option directs the choice of branching variable. Valid keywords for this option include CLOSE, FAR, PRIOR, PSEUDOC, PRICE, and PENALTY. The following list describes the action that each of these causes when $ x^{opt}(k)$, an optimal solution of problem (LP($ k$)), is used to define active problems (LP($ k+1$)) and (LP($ k+2$)).

CLOSE

chooses as branching variable the variable $ x_ i$ such that $ i$ minimizes

\[  \{ \min \{ f_ i(k),1-f_ i(k)\} \mid i \in {\mathcal S} \mr {\  and\  } \mbox{IEPSILON} \leq f_ i(k)\leq 1 - \mbox{IEPSILON} \}   \]

FAR

chooses as branching variable the variable $ x_ i$ such that $ i$ maximizes

\[  \{ \min \{ f_ i(k),1-f_ i(k)\}  \mid i\in {\mathcal S} \mr {\  and\  } \mbox{IEPSILON} \leq f_ i(k)\leq 1- \mbox{IEPSILON} \}   \]

PRIOR

chooses as branching variable $ x_ i$ such that $i\in {\mathcal S}$, $ x^{opt}(k)_ i$ is nonintegral, and variable $ x_ i$ has the minimum value in the INTEGER row in the input data set. This choice for the VARSELECT= option is recommended when you have enough insight into the model to identify those integer variables that have the most significant effect on the objective value.

PENALTY

chooses as branching variable $ x_ i$ such that $i\in {\mathcal S}$ and a bound on the increase in the objective of (LP($ k$)) (penalty) resulting from adding the constraint

\[  x_ i \leq \lfloor x^{opt}(k)_ i\rfloor \quad \mr {or} \quad x_ i \geq \lceil x^{opt}(k)_ i\rceil  \]

is maximized. The bound is calculated without pivoting using techniques of sensitivity analysis (Garfinkel and Nemhauser 1972). Because the cost of calculating the maximum penalty can be large if $ {\mathcal S}$ is large, you may want to limit the number of variables in $ {\mathcal S}$ for which the penalty is calculated. The penalty is calculated for PENALTYDEPTH= variables in $ {\mathcal S}$.

PRICE

chooses as branching variable $ x_ i$ such that $i\in {\mathcal S}$, $ x^{opt}(k)_ i$ is nonintegral, and variable $ x_ i$ has the minimum price coefficient (maximum for maximization).

PSEUDOC

chooses as branching variable the variable $ x_ i$ such that $ i$ maximizes

\[  \{ \min \{ d_ if_ i(k),u_ i(1-f_ i(k))\}  \mid i\in {\mathcal S} \mr {\  and\  } \mbox{IEPSILON} \leq f_ i(k)\leq 1- \mbox{IEPSILON} \}   \]

The weights $ u_ i$ and $ d_ i$ are initially equal to the absolute value of the $ i$th objective coefficient and are updated whenever an integer feasible solution is encountered. See the discussion on the CANSELECT= option for details on the method of updating the weights.
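
The branching rule is requested in the same way as the node-selection rule. The following sketch (the data set name and the PENALTYDEPTH= value are placeholders) asks for penalty-based branching while limiting the potentially expensive penalty calculation to 50 variables:

   proc lp data=mipmodel varselect=penalty penaltydepth=50;
   run;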

Customizing Search Heuristics

Often a good heuristic for searching the branch-and-bound tree of a problem can be found. It is tempting to continue using this heuristic when the problem data change but the problem structure remains constant. The ability to reset procedure options interactively enables you to experiment with search techniques in an attempt to identify approaches that perform well. Then you can easily reapply these techniques to subsequent problems.

For example, the PIP branch-and-bound strategy (Crowder, Johnson, and Padberg 1983) describes one such heuristic. The following program uses a similar strategy. Here, the OBJ rule (choose the active node whose parent has the smallest objective value, in the case of a minimization problem) is used for selecting the next active node to be solved until an integer feasible solution is found. Once such a solution is found, the search procedure is changed to the LIFO rule: choose the problem most recently placed in the list of active nodes.

   proc lp canselect=obj ifeasiblepause=1;
   run;
      /* after the first integer feasible solution is found, */
      /* switch to the LIFO rule and resume the search       */
      reset canselect=lifo ifeasiblepause=9999999;
   run;

Further Discussion on AUTO and CONTROL= options

Consider a minimization problem. At each integer iteration, PROC LP will select a node to solve from a pool of active nodes. The best bound strategy (CANSELECT=OBJ) will pick the node with the smallest projected objective value. This strategy improves the lower bound of the integer program and usually takes fewer integer iterations. One disadvantage is that PROC LP must recalculate the inverse of the basis matrix at almost every integer iteration; such recalculation is relatively expensive. Another disadvantage is that this strategy does not pay attention to improving the upper bound of the integer program. Thus the number of active nodes tends to grow rapidly if PROC LP cannot quickly find an optimal integer solution.

On the other hand, the LIFO strategy is very efficient and does not need to calculate the inverse of the basis matrix unless the previous node is fathomed. It is a depth-first strategy, so it tends to find an integer feasible solution quickly. However, this strategy picks nodes locally and usually takes more integer iterations than the best bound strategy.

There is another strategy that is often overlooked. Here it is called the best upper bound strategy. With this strategy, each time you select an active node, instead of picking the node with the smallest projected objective value, you select the one with the largest projected objective value. This strategy is as efficient as the LIFO strategy. Moreover, it selects active nodes globally. This strategy tries to improve the upper bound of the integer program by searching for new integer feasible solutions. It also fathoms active nodes quickly and keeps the total number of active nodes below the current level. A disadvantage is that this strategy may evaluate more nodes that have no potential to lead to an optimal integer solution.

The best bound strategy has the advantage of improving the lower bound. The LIFO strategy has the advantages of efficiency and of finding a local integer feasible solution quickly. The best upper bound strategy has the advantages of keeping the number of active nodes under control and, at the same time, trying to identify potential integer feasible solutions globally.

Although the best bound strategy is generally preferred, in some instances other strategies may be more effective. For example, if you have found an integer optimal solution but do not yet know that it is optimal, you still have to enumerate all of the remaining active nodes to prove its optimality. In that situation the three strategies take essentially the same number of integer iterations, and since the LIFO and best upper bound strategies are very efficient per integer iteration, both will outperform the best bound strategy.

Since no one strategy suits all situations, a hybrid strategy has been developed to increase applicability. The CONTROL= option combines the three strategies naturally and provides a single control parameter in [0, 1] for dealing with different integer programming problems and different solution situations. The AUTO option automatically sets and adjusts the CONTROL= parameter so that you do not need to know the problem structure or decide on a node selection strategy in advance.

Since the LIFO strategy is less costly, it should be used as much as possible in the combined strategy. The following process is called a diving process: starting from an active node, apply the LIFO strategy as much as you can until the current node yields an integer feasible solution, is fathomed, or reaches a preset depth limit. During this process, no inverse of the basis matrix has to be calculated except for the first node. When the diving process is over, apply one of the three strategies to select the next starting node. A set of such node selections and diving processes is called a cycle.

The control parameter $r$ controls the frequency with which the three strategies are applied and the depth of the diving process in a cycle. It starts with a pure best bound strategy at $r=0$, and then gradually increases the frequency of the diving processes and their depths as $r$ increases. At $r=0.5$, one cycle contains a best bound strategy plus a full diving process. After $r=0.5$, the number of diving processes in a cycle gradually increases. In addition, the best upper bound strategy joins the cycle. As $r$ continues to increase, the frequency of the best upper bound strategy increases. At $r=1$, it becomes a pure best upper bound strategy.

The AUTO option will automatically adjust the value of the CONTROL= option. At the start, it sets CONTROL=0.7, which emphasizes finding an upper bound. After an integer feasible solution is found, it sets CONTROL=0.5, which emphasizes efficiency and lower bound improvement. When the number of active nodes grows beyond the default or user-defined limit $m$, this indicates that a better upper bound is needed, and the AUTO option starts to increase the value of the CONTROL= option from 0.5. If the number of active nodes continues to grow, so does the value of the CONTROL= option. When the number of active nodes reaches the default or user-defined limit $n$, CONTROL= is set to 1 and the growth of active nodes stops. When the number of active nodes decreases, the AUTO option decreases the value of the CONTROL= option.

You can use other strategies to improve the lower bound by setting CANSELECT= to other options.
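
As a sketch (the data set name and the CONTROL= value are placeholders), you can either let the procedure manage the node-selection mix or fix it yourself:

   /* let PROC LP set and adjust the CONTROL= parameter itself */
   proc lp data=mipmodel auto;
   run;

   /* or fix the mix; values closer to 1 favor upper bound improvement */
   proc lp data=mipmodel control=0.7;
   run;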

Saving and Restoring the List of Active Nodes

The list of active nodes can be saved in a SAS data set for use at a subsequent invocation of PROC LP. The ACTIVEOUT= option in the PROC LP statement names the data set into which the current list of active nodes is saved when the procedure terminates due to an error termination condition. Examples of such conditions are time limit exceeded, integer iterations exceeded, and phase 3 iterations exceeded. The ACTIVEIN= option in the PROC LP statement names a data set that can be used to initialize the list of active nodes. To achieve the greatest benefit when restarting PROC LP, use the PRIMALOUT= and PRIMALIN= options in conjunction with the ACTIVEOUT= and ACTIVEIN= options. See Example 5.10 in the section Examples: LP Procedure for an illustration.
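
For example, the following sketch saves the search state when the first run stops and resumes it in a later run. The data set names and the iteration limit are placeholders, and IMAXIT= is assumed here to be the option that caps the number of integer iterations:

   /* first run: stop after a limited number of integer iterations, */
   /* saving the active nodes and the current primal solution       */
   proc lp data=mipmodel imaxit=100 activeout=act primalout=prim;
   run;

   /* later run: restart the branch-and-bound search where it left off */
   proc lp data=mipmodel activein=act primalin=prim;
   run;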