The Interior Point Algorithm

The simplex algorithm, developed shortly after World War II, was for many years the main method used to solve linear programming problems. Over the last fifteen years, however, the interior point algorithm has been developed. This algorithm also solves linear programming problems. From the start it showed great theoretical promise, and considerable research in the area resulted in practical implementations that performed competitively with the simplex algorithm. More recently, interior point algorithms have evolved to become superior to the simplex algorithm, in general, especially when the problems are large.

There are many variations of interior point algorithms. PROC INTPOINT uses the Primal-Dual with Predictor-Corrector algorithm. More information on this particular algorithm and related theory can be found in the texts by Roos, Terlaky, and Vial (1997), Wright (1996), and Ye (1996).

Interior Point Algorithmic Details

After preprocessing, the linear program to be solved is

\[  \begin{array}{ll} \mr {minimize} &  c^ T x \\ \mr {subject\  to} &  A x = b \\ &  x \geq 0 \\ \end{array}  \]

This is the primal problem. The matrices $ d$, $ z$, and $ Q$ of NPSC have been renamed $ c$, $ x$, and $ A$, respectively, as these symbols are by convention used more, the problem to be solved is different from the original because of preprocessing, and there has been a change of primal variable to transform the LP into one whose variables have zero lower bounds. To simplify the algebra here, assume that variables have infinite upper bounds, and constraints are equalities. (Interior point algorithms do efficiently handle finite upper bounds, and it is easy to introduce primal slack variables to change inequalities into equalities.) The problem has $ n$ variables; $ i$ is a variable number; $ k$ is an iteration number, and if used as a subscript or superscript it denotes of iteration $ k$.

There exists an equivalent problem, the dual problem, stated as

\[  \begin{array}{ll} \mr {maximize} &  b^ T y \\ \mr {subject\  to} &  A^ T y + s = c \\ &  s \geq 0 \\ \end{array}  \]

where $ y$ are dual variables, and $ s$ are dual constraint slacks.

The interior point algorithm solves the system of equations to satisfy the Karush-Kuhn-Tucker (KKT) conditions for optimality:

$ A x = b$

$ A^ T y + s = c $

$ X S e = 0 $

$ x \ge 0 $

$ s \ge 0 $

where

$ S = \mr {diag}(s) \:  $ (that is, $ S_{i,j} = s_ i$ if $ i=j, S_{i,j} = 0 \, $ otherwise)

$ X = \mr {diag}(x)$

$ e_ i = 1 \:  \forall i $

These are the conditions for feasibility, with the complementarity condition $ X S e = 0$ added. Complementarity forces the optimal objectives of the primal and dual to be equal, $ c^ T x_{opt} = b^ T y_{opt}$, as

$ 0 = x^ T_{opt} s_{opt} = s^ T_{opt} x_{opt} = (c - A^ T y_{opt})^ T x_{opt} = c^ T x_{opt} - y^ T_{opt} (A x_{opt}) = c^ T x_{opt} - b^ T y_{opt}$

Before the optimum is reached, a solution $ (x, y, s)$ may not satisfy the KKT conditions:

  • Primal constraints may be violated, $\mi {infeas}_ c = b - A x \neq 0$.

  • Dual constraints may be violated, $\mi {infeas}_ d = c - A^ T y - s \neq 0$.

  • Complementarity may not be satisfied, $x^ T s = c^ T x - b^ T y \neq 0$. This is called the duality gap.

The interior point algorithm works by using Newton’s method to find a direction to move $(\Delta x^ k, \Delta y^ k, \Delta s^ k)$ from the current solution $ (x^ k, y^ k, s^ k)$ toward a better solution:

$(x^{k+1}, y^{k+1}, s^{k+1}) = (x^ k, y^ k, s^ k) + \alpha (\Delta x^ k, \Delta y^ k, \Delta s^ k)$

where $\alpha $ is the step length and is assigned a value as large as possible but not so large that an $ x^{k+1}_ i$ or $ s^{k+1}_ i$ is too close to zero. The direction in which to move is found using

$A \Delta x^ k = \mi {infeas}_ c$

$A^ T \Delta y^ k + \Delta s^ k = \mi {infeas}_ d$

$S^ k \Delta x^ k + X^ k \Delta s^ k = - X^ k S^ k e $

To greatly improve performance, the third equation is changed to

$S^ k \Delta x^ k + X^ k \Delta s^ k = - X^ k S^ k e + \sigma _ k \mu _ k e $

where $\mu _ k = (x^ k)^ T s^ k / n$, the average complementarity, and $0 \leq \sigma _ k \leq 1.$

The effect now is to find a direction in which to move to reduce infeasibilities and to reduce the complementarity toward zero, but if any $ x^ k_ i s^ k_ i$ is too close to zero, it is nudged out to $\mu $, and any $ x^ k_ i s^ k_ i$ that is larger than $\mu $ is nudged into $\mu $. A $\sigma _ k$ close to or equal to 0.0 biases a direction toward the optimum, and a value of $\sigma _ k$ close to or equal to 1.0 centers the direction toward a point where all pairwise products $x^ k_ i s^ k_ i = \mu $. Such points make up the central path in the interior. Although centering directions make little, if any, progress in reducing $\mu $ and moving the solution closer to the optimum, substantial progress toward the optimum can usually be made in the next iteration.

The central path is crucial to why the interior point algorithm is so efficient. As $\mu $ is decreased, this path guides the algorithm to the optimum through the interior of feasible space. Without centering, the algorithm would find a series of solutions near each other close to the boundary of feasible space. Step lengths along the direction would be small and many more iterations would probably be required to reach the optimum.

That in a nutshell is the primal-dual interior point algorithm. Varieties of the algorithm differ in the way $\alpha $ and $\sigma _ k$ are chosen and the direction adjusted during each iteration. A wealth of information can be found in the texts by Roos, Terlaky, and Vial (1997), Wright (1996), and Ye (1996).

The calculation of the direction is the most time-consuming step of the interior point algorithm. Assume the $ k$th iteration is being performed, so the subscript and superscript $ k$ can be dropped from the algebra:

$A \Delta x = \mi {infeas}_ c$

$A^ T \Delta y + \Delta s = \mi {infeas}_ d$

$S \Delta x + X \Delta s = - X S e + \sigma \mu e $

Rearranging the second equation,

$\Delta s = \mi {infeas}_ d - A^ T \Delta y $

Rearranging the third equation,

$\Delta s = X^{-1}(- S \Delta x - X S e + \sigma \mu e)$

$\Delta s = - \Theta \Delta x - S e + X^{-1} \sigma \mu e $

where $\Theta = S X^{-1}.$

Equating these two expressions for $\Delta s$ and rearranging,

$-\Theta \Delta x - S e + X^{-1} \sigma \mu e = \mi {infeas}_ d - A^ T \Delta y$

$-\Theta \Delta x = S e - X^{-1} \sigma \mu e + \mi {infeas}_ d - A^ T \Delta y$

$\Delta x = \Theta ^{-1}(-S e + X^{-1} \sigma \mu e - \mi {infeas}_ d + A^ T \Delta y)$

$\Delta x = \rho + \Theta ^{-1} A^ T \Delta y $

where $\rho = \Theta ^{-1}(-S e + X^{-1} \sigma \mu e - \mi {infeas}_ d).$

Substituting into the first direction equation,

$A \Delta x = \mi {infeas}_ c$

$A (\rho + \Theta ^{-1} A^ T \Delta y) = \mi {infeas}_ c$

$A \Theta ^{-1} A^ T \Delta y = \mi {infeas}_ c - A \rho $

$\Delta y = (A \Theta ^{-1} A^ T)^{-1}(\mi {infeas}_ c - A \rho ) $

$\Theta $, $\rho $, $\Delta y$, $\Delta x$, and $\Delta s$ are calculated in that order. The hardest term is the factorization of the $(A \Theta ^{-1} A^ T)$ matrix to determine $\Delta y$. Fortunately, although the values of $(A \Theta ^{-1} A^ T)$ are different for each iteration, the locations of the nonzeros in this matrix remain fixed; the nonzero locations are the same as those in the matrix $ (A A^ T)$. This is because $\Theta ^{-1} = X S^{-1}$ is a diagonal matrix that has the effect of merely scaling the columns of $ (A A^ T)$.

The fact that the nonzeros in $A \Theta ^{-1} A^ T$ have a constant pattern is exploited by all interior point algorithms and is a major reason for their excellent performance. Before iterations begin, $ A A^ T$ is examined and its rows and columns are symmetrically permuted so that during Cholesky factorization, the number of fill-ins created is smaller. A list of arithmetic operations to perform the factorization is saved in concise computer data structures (working with memory locations rather than actual numerical values). This is called symbolic factorization. During iterations, when memory has been initialized with numerical values, the operations list is performed sequentially. Determining how the factorization should be performed again and again is unnecessary.

The Primal-Dual Predictor-Corrector Interior Point Algorithm

The variant of the interior point algorithm implemented in PROC INTPOINT is a Primal-Dual Predictor-Corrector interior point algorithm. At first, Newton’s method is used to find a direction $(\Delta x^ k_{\mi {aff}}, \Delta y^ k_{\mi {aff}}, \Delta s^ k_{\mi {aff}})$ to move, but calculated as if $\mu $ is zero, that is, as a step with no centering, known as an affine step:

$A \Delta x^ k_{\mi {aff}} = \mi {infeas}_ c$

$A^ T \Delta y^ k_{\mi {aff}} + \Delta s^ k_{\mi {aff}} = \mi {infeas}_ d$

$S^ k \Delta x^ k_{\mi {aff}} + X^ k \Delta s^ k_{\mi {aff}} = - X^ k S^ k e$

$(x^ k_{\mi {aff}}, y^ k_{\mi {aff}}, s^ k_{\mi {aff}}) = (x^ k, y^ k, s^ k) + \alpha (\Delta x^ k_{\mi {aff}}, \Delta y^ k_{\mi {aff}}, \Delta s^ k_{\mi {aff}}) $

where $\alpha $ is the step length as before.

Complementarity $ x^ T s$ is calculated at $ (x^ k_{\mi {aff}}, y^ k_{\mi {aff}}, s^ k_{\mi {aff}})$ and compared with the complementarity at the starting point $ (x^ k, y^ k, s^ k)$, and the success of the affine step is gauged. If the affine step was successful in reducing the complementarity by a substantial amount, the need for centering is not great, and $\sigma _ k$ in the following linear system is assigned a value close to zero. If, however, the affine step was unsuccessful, centering would be beneficial, and $\sigma _ k$ in the following linear system is assigned a value closer to 1.0. The value of $\sigma _ k$ is therefore adaptively altered depending on the progress made toward the optimum.

A second linear system is solved to determine a centering vector $(\Delta x^ k_{c}, \Delta y^ k_{c}, \Delta s^ k_{c})$ from $ (x^ k_{\mi {aff}}, y^ k_{\mi {aff}}, s^ k_{\mi {aff}})$:

$A \Delta x^ k_{c} = 0$

$A^ T \Delta y^ k_{c} + \Delta s^ k_{c} = 0$

$S^ k \Delta x^ k_{c} + X^ k \Delta s^ k_{c} = - X^ k_{\mi {aff}} S^ k_{\mi {aff}} e + \sigma _ k \mu _ k e $

Then

$(\Delta x^ k, \Delta y^ k, \Delta s^ k) = (\Delta x^ k_{\mi {aff}}, \Delta y^ k_{\mi {aff}}, \Delta s^ k_{\mi {aff}}) + (\Delta x^ k_{c}, \Delta y^ k_{c}, \Delta s^ k_{c})$

$(x^{k+1}, y^{k+1}, s^{k+1}) = (x^ k, y^ k, s^ k) + \alpha (\Delta x^ k, \Delta y^ k, \Delta s^ k) $

where, as before, $\alpha $ is the step length assigned a value as large as possible but not so large that an $ x^{k+1}_ i$ or $ s^{k+1}_ i$ is too close to zero.

Although the Predictor-Corrector variant entails solving two linear systems instead of one, fewer iterations are usually required to reach the optimum. The additional overhead of calculating the second linear system is small, as the factorization of the $(A \Theta ^{-1} A^ T)$ matrix has already been performed to solve the first linear system.

Interior Point: Upper Bounds

If the LP had upper bounds ($0 \leq x \leq u$ where $ u$ is the upper bound vector), then the primal and dual problems, the duality gap, and the KKT conditions would have to be expanded.

The primal linear program to be solved is

\[  \begin{array}{ll} \mr {minimize} &  c^ T x \\ \mr {subject\  to} &  A x = b \\ &  0 \leq x \leq u \\ \end{array}  \]

where $0 \leq x \leq u$ is split into $x \geq 0$ and $x \leq u$. Let $ z$ be primal slack so that $ x + z = u$, and associate dual variables $ w$ with these constraints. The interior point algorithm solves the system of equations to satisfy the Karush-Kuhn-Tucker (KKT) conditions for optimality:

$ A x = b$

$ x + z = u$

$ A^ T y + s - w = c$

$ X S e = 0$

$ Z W e = 0$

$ x,s,z,w \geq 0 $

These are the conditions for feasibility, with the complementarity conditions $ X S e = 0$ and $ Z W e = 0$ added. Complementarity forces the optimal objectives of the primal and dual to be equal, $ c^ T x_{opt} = b^ T y_{opt} - u^ T w_{opt}$, as

$ 0 = z^ T_{opt} w_{opt} = (u - x_{opt})^ T w_{opt} = u^ T w_{opt} - x^ T_{opt} w_{opt}$

$ 0 = x^ T_{opt} s_{opt} = s^ T_{opt} x_{opt} = (c - A^ T y_{opt} + w_{opt})^ T x_{opt} = c^ T x_{opt} - y^ T_{opt} (A x_{opt}) + w_{opt})^ T x_{opt}= c^ T x_{opt} - b^ T y_{opt} + u^ T w_{opt}$

Before the optimum is reached, a solution $ (x, y, s, z, w)$ might not satisfy the KKT conditions:

  • Primal bound constraints may be violated, $\mi {infeas}_ b = u - x - z \neq 0$.

  • Primal constraints may be violated, $\mi {infeas}_ c = b - A x \neq 0$.

  • Dual constraints may be violated, $\mi {infeas}_ d = c - A^ T y - s + w\neq 0$.

  • Complementarity conditions may not be satisfied, $x^ T s \neq 0$ or $z^ T w \neq 0$.

The calculations of the interior point algorithm can easily be derived in a fashion similar to calculations for when an LP has no upper bounds. See the paper by Lustig, Marsten, and Shanno (1992).

In some iteration $ k$, the affine step system that must be solved is

$\Delta x_{\mi {aff}} + \Delta z_{\mi {aff}} = \mi {infeas}_ b$

$A \Delta x_{\mi {aff}} = \mi {infeas}_ c$

$A^ T \Delta y_{\mi {aff}} + \Delta s_{\mi {aff}} - \Delta w_{\mi {aff}} = \mi {infeas}_ d$

$S \Delta x_{\mi {aff}} + X \Delta s_{\mi {aff}} = - X S e$

$Z \Delta w_{\mi {aff}} + W \Delta z_{\mi {aff}} = - Z W e $

Therefore, the computations involved in solving the affine step are

$\Theta =S X^{-1} + W Z^{-1}$

$\rho = \Theta ^{-1} (\mi {infeas}_ d + (S-W)e - Z^{-1} W \:  \mi {infeas}_ b)$

$\Delta y_{\mi {aff}} = (A \Theta ^{-1} A^ T)^{-1} (\mi {infeas}_ c + A \rho )$

$\Delta x_{\mi {aff}} = \Theta ^{-1} A^ T \Delta y_{\mi {aff}} - \rho $

$\Delta z_{\mi {aff}} = \mi {infeas}_ b - \Delta x_{\mi {aff}}$

$\Delta w_{\mi {aff}} = -We - Z^{-1} W \Delta z_{\mi {aff}}$

$\Delta s_{\mi {aff}} = -Se - X^{-1} S \Delta x_{\mi {aff}}$

$(x_{\mi {aff}}, y_{\mi {aff}}, s_{\mi {aff}}, z_{\mi {aff}}, w_{\mi {aff}}) = (x, y, s, z, w) + \alpha (\Delta x_{\mi {aff}}, \Delta y_{\mi {aff}}, \Delta s_{\mi {aff}}, \Delta z_{\mi {aff}}, \Delta w_{\mi {aff}})$

and $\alpha $ is the step length as before.

A second linear system is solved to determine a centering vector $(\Delta x_ c, \Delta y_ c, \Delta s_ c, \Delta z_ c, \Delta w_ c)$ from $ (x_{\mi {aff}}, y_{\mi {aff}}, s_{\mi {aff}}, z_{\mi {aff}}, w_{\mi {aff}})$:

$\Delta x_ c + \Delta z_ c = 0$

$A \Delta x_ c = 0$

$A^ T \Delta y_ c + \Delta s_ c - \Delta w_ c = 0$

$S \Delta x_ c + X \Delta s_ c = - X_{\mi {aff}} S_{\mi {aff}} e + \sigma \mu e$

$Z \Delta w_ c + W \Delta z_ c = - Z_{\mi {aff}} W_{\mi {aff}} e + \sigma \mu e $

where

$\zeta _{\mi {start}} = x^ T s + z^ T w$, complementarity at the start of the iteration

$\zeta _{\mi {aff}} = x_{\mi {aff}}^ T s_{\mi {aff}} + z_{\mi {aff}}^ T w_{\mi {aff}}$, the affine complementarity

$\mu = \zeta _{\mi {aff}} / 2n$, the average complementarity

$\sigma = (\zeta _{\mi {aff}} / \zeta _{\mi {start}})^3 $

Therefore, the computations involved in solving the centering step are

$\rho =\Theta ^{-1} (\sigma \mu (X^{-1} - Z^{-1}) e - X^{-1} X_{\mi {aff}} S_{\mi {aff}} e + Z^{-1} Z_{\mi {aff}} W_{\mi {aff}} e)$

$\Delta y_ c = (A \Theta ^{-1} A^ T)^{-1} A \rho $

$\Delta x_ c = \Theta ^{-1} A^ T \Delta y_ c - \rho $

$\Delta z_ c = -\Delta x_ c$

$\Delta w_ c = \sigma \mu Z^{-1} e - Z^{-1} Z_{\mi {aff}} W_{\mi {aff}} e - Z^{-1} W_{\mi {aff}} \Delta z_ c$

$\Delta s_ c = \sigma \mu X^{-1} e - X^{-1} X_{\mi {aff}} S_{\mi {aff}} e - X^{-1} S_{\mi {aff}} \Delta x_ c $

Then

$(\Delta x, \Delta y, \Delta s, \Delta z, \Delta w) = (\Delta x_{\mi {aff}}, \Delta y_{\mi {aff}}, \Delta s_{\mi {aff}}, \Delta z_{\mi {aff}}, \Delta w_{\mi {aff}}) + (\Delta x_{c}, \Delta y_{c}, \Delta s_{c}, \Delta z_{c}, \Delta w_{c})$

$(x^{k+1}, y^{k+1}, s^{k+1}, z^{k+1}, w^{k+1}) = (x^ k, y^ k, s^ k, z^ k, w^ k) + \alpha (\Delta x, \Delta y, \Delta s, \Delta z, \Delta w)$

where, as before, $\alpha $ is the step length assigned a value as large as possible but not so large that an $ x^{k+1}_ i$, $ s^{k+1}_ i$, $ z^{k+1}_ i$, or $ w^{k+1}_ i$ is too close to zero.

The algebra in this section has been simplified by assuming that all variables have finite upper bounds. If the number of variables with finite upper bounds $ n_ u < n$, you need to change the algebra to reflect that the $ Z$ and $ W$ matrices have dimension $ n_ u \times 1 $ or $ n_ u \times n_ u$. Other computations need slight modification. For example, the average complementarity is

$\mu = x_{\mi {aff}}^ T s_{\mi {aff}} / n + z_{\mi {aff}}^ T w_{\mi {aff}} / n_ u$

An important point is that any upper bounds can be handled by specializing the algorithm and not by generating the constraints $x \leq u$ and adding these to the main primal constraints $ A x = b$.