The NLMIXED Procedure

Active Set Methods

The parameter vector $\btheta \in \mc{R}^ n$ can be subject to a set of m linear equality and inequality constraints:

\begin{align*} \sum _{j=1}^ n a_{ij} \theta _ j & = b_ i \quad \quad \quad i=1,\ldots , m_ e \\ \sum _{j=1}^ n a_{ij} \theta _ j & \geq b_ i \quad \quad \quad i=m_ e+1,\ldots , m \end{align*}

The coefficients $a_{ij}$ and right-hand sides $b_ i$ of the equality and inequality constraints are collected in the $m \times n$ matrix $\mb{A}$ and the m vector $\mb{b}$.

The m linear constraints define a feasible region $\mc{G}$ in $\mc{R}^ n$ that must contain the point $\btheta _{*}$ that minimizes the problem. If the feasible region $\mc{G}$ is empty, no solution to the optimization problem exists.

In PROC NLMIXED, all optimization techniques use active set methods. The iteration starts with a feasible point $\btheta ^{(0)}$, which you can provide or which can be computed by the Schittkowski and Stoer (1979) algorithm implemented in PROC NLMIXED. The algorithm then moves from one feasible point $\btheta ^{(k-1)}$ to a better feasible point $\btheta ^{(k)}$ along a feasible search direction $\mb{s}^{(k)}$,

\[ \btheta ^{(k)} = \btheta ^{(k-1)} + \alpha ^{(k)} \mb{s}^{(k)} , \quad \alpha ^{(k)} > 0 \]

Theoretically, the path of points $\btheta ^{(k)}$ never leaves the feasible region $\mc{G}$ of the optimization problem, but it can reach its boundaries. The active set $\mc{A}^{(k)}$ of point $\btheta ^{(k)}$ is defined as the index set of all linear equality constraints and those inequality constraints that are satisfied at $\btheta ^{(k)}$. If no constraint is active $\btheta ^{(k)}$, the point is located in the interior of $\mc{G}$, and the active set $\mc{A}^{(k)} = \emptyset $ is empty. If the point $\btheta ^{(k)}$ in iteration k hits the boundary of inequality constraint i, this constraint i becomes active and is added to $\mc{A}^{(k)}$. Each equality constraint and each active inequality constraint reduce the dimension (degrees of freedom) of the optimization problem.

In practice, the active constraints can be satisfied only with finite precision. The LCEPSILON= r option specifies the range for active and violated linear constraints. If the point $\btheta ^{(k)}$ satisfies the condition

\[ \left| \sum _{j=1}^ n a_{ij} \theta _ j^{(k)} - b_ i \right| \leq t \]

where $t = r (|b_ i| + 1)$, the constraint i is recognized as an active constraint. Otherwise, the constraint i is either an inactive inequality or a violated inequality or equality constraint. Due to rounding errors in computing the projected search direction, error can be accumulated so that an iterate $\btheta ^{(k)}$ steps out of the feasible region.

In those cases, PROC NLMIXED might try to pull the iterate $\btheta ^{(k)}$ back into the feasible region. However, in some cases the algorithm needs to increase the feasible region by increasing the LCEPSILON= r value. If this happens, a message is displayed in the log output.

If the algorithm cannot improve the value of the objective function by moving from an active constraint back into the interior of the feasible region, it makes this inequality constraint an equality constraint in the next iteration. This means that the active set $\mc{A}^{(k+1)}$ still contains the constraint i. Otherwise, it releases the active inequality constraint and increases the dimension of the optimization problem in the next iteration.

A serious numerical problem can arise when some of the active constraints become (nearly) linearly dependent. PROC NLMIXED removes linearly dependent equality constraints before starting optimization. You can use the LCSINGULAR= option to specify a criterion r used in the update of the QR decomposition that determines whether an active constraint is linearly dependent relative to a set of other active constraints.

If the solution $\btheta ^*$ is subjected to $n_{\mathit{act}}$ linear equality or active inequality constraints, the QR decomposition of the $n \times n_{\mathit{act}}$ matrix $\hat{\mb{A}}^\prime $ of the linear constraints is computed by $\hat{\mb{A}}^\prime = \mb{QR}$, where $\mb{Q}$ is an $n \times n$ orthogonal matrix and $\mb{R}$ is an $n \times n_{\mathit{act}}$ upper triangular matrix. The n columns of matrix $\mb{Q}$ can be separated into two matrices, $\mb{Q}=[\mb{Y},\mb{Z}]$, where $\mb{Y}$ contains the first $n_{\mathit{act}}$ orthogonal columns of $\mb{Q}$ and $\mb{Z}$ contains the last $n-n_{\mathit{act}}$ orthogonal columns of $\mb{Q}$. The $n \times (n-n_{\mathit{act}})$ column-orthogonal matrix $\mb{Z}$ is also called the null-space matrix of the active linear constraints $\hat{\mb{A}}^\prime $. The $n - n_{\mathit{act}}$ columns of the $n \times (n - n_{\mathit{act}})$ matrix $\mb{Z}$ form a basis orthogonal to the rows of the $n_{\mathit{act}} \times n$ matrix $\hat{\mb{A}}$.

At the end of the iterating, PROC NLMIXED computes the projected gradient $\mb{g}_ Z$,

\[ \mb{g}_ Z = \mb{Z}^\prime \mb{g} \]

In the case of boundary-constrained optimization, the elements of the projected gradient correspond to the gradient elements of the free parameters. A necessary condition for $\btheta ^{*}$ to be a local minimum of the optimization problem is

\[ \mb{g}_ Z(\btheta ^{*}) = \mb{Z}^\prime \mb{g}(\btheta ^{*}) = \mb{0} \]

The symmetric $n_{\mathit{act}} \times n_{\mathit{act}}$ matrix $\mb{G}_ Z$,

\[ \mb{G}_ Z = \mb{Z}^\prime \mb{GZ} \]

is called a projected Hessian matrix. A second-order necessary condition for $\btheta ^{*}$ to be a local minimizer requires that the projected Hessian matrix is positive semidefinite.

Those elements of the $n_{\mathit{act}}$ vector of first-order estimates of Lagrange multipliers,

\[ \lambda = (\hat{\mb{A}}\hat{\mb{A}}^\prime )^{-1} \hat{\mb{A}} \mb{ZZ}^\prime \mb{g} \]

that correspond to active inequality constraints indicate whether an improvement of the objective function can be obtained by releasing this active constraint. For minimization, a significant negative Lagrange multiplier indicates that a possible reduction of the objective function can be achieved by releasing this active linear constraint. The LCDEACT= r option specifies a threshold r for the Lagrange multiplier that determines whether an active inequality constraint remains active or can be deactivated. (In the case of boundary-constrained optimization, the Lagrange multipliers for active lower (upper) constraints are the negative (positive) gradient elements corresponding to the active parameters.)