Active Set Methods

The parameter vector $x \in {\mathcal R}^ n$ may be subject to a set of $ m$ linear equality and inequality constraints:

\[  \begin{array}{ll} \sum \limits _{j=1}^ n a_{ij} x_ j = b_ i, &  i=1,\dots , m_ e \\ \sum \limits _{j=1}^ n a_{ij} x_ j \geq b_ i, &  i=m_ e+1,\dots , m \end{array}  \]

The coefficients $ a_{ij}$ and right-hand sides $ b_ i$ of the equality and inequality constraints are collected in the $ m \times n$ matrix $ A$ and the $ m$-vector $ b$.

The $ m$ linear constraints define a feasible region $ {\mathcal G}$ in $ {\mathcal R}^ n$ that must contain the point $ x^{*}$ that minimizes the objective function. If the feasible region $ {\mathcal G}$ is empty, no solution to the optimization problem exists.

All optimization techniques in PROC NLP (except those processing nonlinear constraints) are active set methods. The iteration starts with a feasible point $ x^{(0)}$, which either is provided by the user or can be computed by the Schittkowski and Stoer (1979) algorithm implemented in PROC NLP. The algorithm then moves from one feasible point $ x^{(k-1)}$ to a better feasible point $ x^{(k)}$ along a feasible search direction $ s^{(k)}$:

\[  x^{(k)} = x^{(k-1)} + \alpha ^{(k)} s^{(k)} \: , \quad \alpha ^{(k)} > 0  \]

Theoretically, the path of points $ x^{(k)}$ never leaves the feasible region $ {\mathcal G}$ of the optimization problem, but it can hit its boundaries. The active set $ {\mathcal A}^{(k)}$ of point $ x^{(k)}$ is defined as the index set of all linear equality constraints and those inequality constraints that are satisfied with equality at $ x^{(k)}$. If no constraint is active for $ x^{(k)}$, the point is located in the interior of $ {\mathcal G}$, and the active set $ {\mathcal A}^{(k)} $ is empty. If the point $ x^{(k)}$ in iteration $ k$ hits the boundary of inequality constraint $ i$, this constraint $ i$ becomes active and is added to $ {\mathcal A}^{(k)}$. Each equality or active inequality constraint reduces the dimension (degrees of freedom) of the optimization problem.
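As an illustration of how a step can hit the boundary, the following Python sketch computes the largest step length $\alpha$ for which $x + \alpha s$ still satisfies every inequality constraint, and reports which constraint would become active. This is the standard ratio test, shown only as an assumption about how such a step could be bounded; the function name `max_feasible_step` and its arguments are hypothetical and are not part of PROC NLP.

```python
def max_feasible_step(A, b, x, s, n_eq=0):
    """Largest alpha >= 0 such that A[i].(x + alpha*s) >= b[i] holds for
    every inequality row i >= n_eq. Equality rows (i < n_eq) are skipped:
    the direction s is assumed to keep them satisfied already.
    Returns (alpha_max, index of the blocking constraint or None)."""
    alpha_max = float("inf")
    blocking = None
    for i in range(n_eq, len(A)):
        # Slack of constraint i at x; nonnegative when x is feasible.
        slack = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        # Change of the slack per unit step along s.
        rate = sum(a * sj for a, sj in zip(A[i], s))
        if rate < 0:  # the step moves toward this boundary
            alpha = slack / -rate
            if alpha < alpha_max:
                alpha_max, blocking = alpha, i
    return alpha_max, blocking


# Feasible region: x1 >= 0, x2 >= 0, -x1 - x2 >= -4
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, -4.0]
x = [1.0, 1.0]   # interior point: the active set is empty
s = [1.0, 0.0]   # search direction

alpha_max, blocking = max_feasible_step(A, b, x, s)
print(alpha_max, blocking)
```

In this example the third constraint blocks the step, so a full step of length `alpha_max` would add that constraint to the active set.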

In practice, the active constraints can be satisfied only with finite precision. The LCEPSILON=$ r$ option specifies the range for active and violated linear constraints. If the point $ x^{(k)}$ satisfies the condition

\[  \left| \sum _{j=1}^ n a_{ij} x_ j^{(k)} - b_ i \right| \leq t  \]

where $t = r \times (|b_ i| + 1)$, the constraint $ i$ is recognized as an active constraint. Otherwise, the constraint $ i$ is either an inactive inequality or a violated inequality or equality constraint. Because rounding errors accumulate when the projected search direction is computed, an iterate $ x^{(k)}$ can step out of the feasible region. In those cases, PROC NLP may try to pull the iterate $ x^{(k)}$ back into the feasible region. However, in some cases the algorithm needs to enlarge the feasible region by increasing the LCEPSILON=$ r$ value. If this happens, a message is displayed in the log output.
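The tolerance test above can be sketched in a few lines of Python. The function name `classify_constraint` and the value `r=1e-8` are illustrative assumptions, not PROC NLP internals; the sketch only applies the condition $|\sum_j a_{ij} x_j - b_i| \leq r \times (|b_i| + 1)$ to a single constraint.

```python
def classify_constraint(a, b_i, x, r=1e-8, is_equality=False):
    """Classify one linear constraint a.x >= b_i (or a.x = b_i) at x.

    r plays the role of the LCEPSILON= value; 1e-8 is only an
    illustrative choice for this sketch."""
    residual = sum(aj * xj for aj, xj in zip(a, x)) - b_i
    t = r * (abs(b_i) + 1.0)          # range for active constraints
    if abs(residual) <= t:
        return "active"
    if is_equality or residual < 0:
        return "violated"
    return "inactive"                  # inequality satisfied with slack


a, b_i = [1.0, 1.0], 2.0               # constraint: x1 + x2 >= 2
on_boundary = classify_constraint(a, b_i, [1.0, 1.0 + 1e-9])
interior = classify_constraint(a, b_i, [2.0, 1.0])
outside = classify_constraint(a, b_i, [0.5, 1.0])
broken_equality = classify_constraint(a, b_i, [2.0, 1.0], is_equality=True)
print(on_boundary, interior, outside, broken_equality)
```

Scaling the tolerance by $|b_i| + 1$ makes the test relative for constraints with large right-hand sides while keeping it absolute when $b_i$ is near zero.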

If no improvement in the value of the objective function can be expected from moving from an active constraint back into the interior of the feasible region, the algorithm treats this inequality constraint as an equality constraint in the next iteration. That is, the active set $ {\mathcal A}^{(k+1)}$ still contains the constraint $ i$. Otherwise, the algorithm releases the active inequality constraint, which increases the dimension of the optimization problem in the next iteration.

A serious numerical problem can arise when some of the active constraints become (nearly) linearly dependent. Linearly dependent equality constraints are removed before entering the optimization. You can use the LCSINGULAR= option to specify a criterion $ r$ used in the update of the QR decomposition that decides whether an active constraint is linearly dependent relative to a set of other active constraints.

If the final parameter set $ x^{*}$ is subjected to $ n_{\mathrm{act}}$ linear equality or active inequality constraints, the QR decomposition of the $ n \times n_{\mathrm{act}}$ matrix $ \hat{A}^ T$ of the linear constraints is computed by $ \hat{A}^ T = QR$, where $ Q$ is an $ n \times n$ orthogonal matrix and $ R$ is an $ n \times n_{\mathrm{act}}$ upper triangular matrix. The $ n$ columns of matrix $ Q$ can be separated into two matrices, $ Q=[Y,Z]$, where $ Y$ contains the first $ n_{\mathrm{act}}$ orthogonal columns of $ Q$ and $ Z$ contains the last $ n-n_{\mathrm{act}}$ orthogonal columns of $ Q$. The $ n \times (n-n_{\mathrm{act}})$ column-orthogonal matrix $ Z$ is also called the nullspace matrix of the active linear constraints $ \hat{A}^ T$. The $ n - n_{\mathrm{act}}$ columns of the $ n \times (n - n_{\mathrm{act}})$ matrix $ Z$ form a basis orthogonal to the rows of the $ n_{\mathrm{act}} \times n$ matrix $ \hat{A}$.

At the end of the iteration process, PROC NLP can display the projected gradient

\[  g_ Z = Z^ Tg  \]

In the case of boundary-constrained optimization, the elements of the projected gradient correspond to the gradient elements of the free parameters. A necessary condition for $ x^{*}$ to be a local minimum of the optimization problem is

\[  g_ Z(x^{*}) = Z^ Tg(x^{*}) = 0  \]
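The split $Q=[Y,Z]$ and the condition $Z^Tg = 0$ can be illustrated with a small pure-Python sketch. It builds the orthonormal columns with Gram-Schmidt orthogonalization rather than an updated QR decomposition, so it is a stand-in for illustration only and not the procedure's numerical method. The names `split_basis` and `projected_gradient` are hypothetical, and $Y$ and $Z$ are stored as lists of column vectors.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def split_basis(A_hat, tol=1e-10):
    """Return (Y, Z) as lists of orthonormal n-vectors. Y spans the rows
    of A_hat (the active constraints); Z spans their orthogonal
    complement, so dot(row, z) is ~0 for every row and every z in Z."""
    n = len(A_hat[0])
    basis = []

    def try_add(v):
        v = list(v)
        for _ in range(2):              # second pass improves orthogonality
            for q in basis:
                c = dot(q, v)
                v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:                  # skip (nearly) dependent vectors
            basis.append([vi / norm for vi in v])

    for row in A_hat:                   # columns of Y
        try_add(row)
    n_act = len(basis)                  # counts independent rows only
    for j in range(n):                  # complete the basis: columns of Z
        try_add([1.0 if i == j else 0.0 for i in range(n)])
    return basis[:n_act], basis[n_act:]


def projected_gradient(Z, g):
    """g_Z = Z^T g, one entry per column of Z."""
    return [dot(z, g) for z in Z]


A_hat = [[1.0, 1.0, 1.0]]               # one active constraint, n = 3
Y, Z = split_basis(A_hat)

g = [1.0, 2.0, 3.0]                     # gradient at a non-stationary point
g_Z = projected_gradient(Z, g)

g_star = [2.0, 2.0, 2.0]                # gradient parallel to the constraint
g_Z_star = projected_gradient(Z, g_star)
print(g_Z, g_Z_star)
```

For `g_star` the projected gradient vanishes up to rounding, because the gradient is a multiple of the single active constraint row; for `g` it does not, so that point cannot be a constrained minimum.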

The symmetric $ (n - n_{\mathrm{act}}) \times (n - n_{\mathrm{act}})$ matrix

\[  G_ Z = Z^ TGZ  \]

is called a projected Hessian matrix. A second-order necessary condition for $ x^{*}$ to be a local minimizer requires that the projected Hessian matrix be positive semidefinite. If available, the projected gradient and projected Hessian matrix can be displayed and written to an OUTEST= data set.

Those elements of the $ n_{\mathrm{act}}$-vector of first-order estimates of Lagrange multipliers

\[  \lambda = (\hat{A}\hat{A}^ T)^{-1} \hat{A} g  \]

which correspond to active inequality constraints indicate whether an improvement of the objective function can be obtained by releasing the corresponding active constraint. For minimization (maximization), a significant negative (positive) Lagrange multiplier indicates that a possible reduction (increase) of the objective function can be obtained by releasing this active linear constraint. The LCDEACT=$ r$ option can be used to specify a threshold $r$ for the Lagrange multiplier that decides whether an active inequality constraint remains active or can be deactivated. The Lagrange multipliers are displayed (and written to an OUTEST= data set) only if linear constraints are active at the solution $ x^{*}$. (In the case of boundary-constrained optimization, the Lagrange multipliers for active lower (upper) constraints are the negative (positive) gradient elements corresponding to the active parameters.)
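The release decision for minimization can be sketched as follows. The sketch assumes constraints of the form $\sum_j a_{ij} x_j \geq b_i$ and uses a common least-squares multiplier estimate, the solution of $(\hat{A}\hat{A}^T)\lambda = \hat{A}g$; whether PROC NLP computes its estimates exactly this way is not confirmed here. The function names and the `threshold` value are hypothetical; `threshold` only stands in for the role of the LCDEACT= value.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def multiplier_estimates(A_hat, g):
    """Least-squares estimate: solve (A_hat A_hat^T) lam = A_hat g.
    Assumes the active rows in A_hat are linearly independent."""
    AAt = [[dot(ri, rj) for rj in A_hat] for ri in A_hat]
    Ag = [dot(ri, g) for ri in A_hat]
    return solve(AAt, Ag)


def release_candidates(lam, threshold):
    """For minimization: active inequalities whose multiplier is
    significantly negative (below -threshold)."""
    return [i for i, li in enumerate(lam) if li < -threshold]


# Active constraints: x1 + x2 >= 1 and x1 - x2 >= 0
A_hat = [[1.0, 1.0], [1.0, -1.0]]
g = [1.0, 3.0]                           # gradient at the current point

lam = multiplier_estimates(A_hat, g)
candidates = release_candidates(lam, threshold=0.01)
print(lam, candidates)
```

Here the second multiplier is negative: moving along $(1,-1)$ keeps the first constraint active, moves off the second, and has a negative directional derivative, so releasing that constraint can reduce the objective.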