Termination Criteria :: SAS/IML(R) 13.1 User's Guide

Termination Criteria

Subsections:

Criteria Used by All Techniques
Termination Criteria for NLPNMS
Termination Criteria for Unconstrained and Linearly Constrained Techniques
Termination Criteria for Nonlinearly Constrained Techniques

The input argument tc specifies a vector of bounds that correspond to a set of termination criteria that are tested in each iteration. If you do not specify an IML module with the “ptit” argument, these bounds determine when the optimization process stops.

If you specify the “ptit” argument, the “tc” argument is ignored. The module specified by the “ptit” argument replaces the subroutine that is used by default to test the termination criteria. The module is called in each iteration with the current location, $x$ , and the value, $f$ , of the objective function at $x$ . The module must give a return code, $rc$ , that decides whether the optimization process is to be continued or terminated. As long as the module returns $rc = 0$ , the optimization process continues. When the module returns $rc \neq 0$ , the optimization process stops.

If you use the tc vector, the optimization techniques stop the iteration process when at least one of the corresponding set of termination criteria are satisfied. Table 14.3 and Table 14.4 indicate the criterion associated with each element of the tc vector. There is a default for each criterion, and if you specify a missing value for the corresponding element of the tc vector, the default value is used. You can avoid termination with respect to the ABSFTOL, ABSGTOL, ABSXTOL, FTOL, FTOL2, GTOL, GTOL2, and XTOL criteria by specifying a value of zero for the corresponding element of the tc vector.

Table 14.3: Termination Criteria for the NLPNMS Subroutine

Index	Description
1	maximum number of iterations (MAXIT)
2	maximum number of function calls (MAXFU)
3	absolute function criterion (ABSTOL)
4	relative function criterion (FTOL)
5	relative function criterion (FTOL2)
6	absolute function criterion (ABSFTOL)
7	FSIZE value used in FTOL criterion
8	relative parameter criterion (XTOL)
9	absolute parameter criterion (ABSXTOL)
9	size of final trust-region radius $\rho$ (COBYLA algorithm)
10	XSIZE value used in XTOL criterion

Table 14.4: Termination Criteria for Other Subroutines

Index	Description
1	maximum number of iterations (MAXIT)
2	maximum number of function calls (MAXFU)
3	absolute function criterion (ABSTOL)
4	relative gradient criterion (GTOL)
5	relative gradient criterion (GTOL2)
6	absolute gradient criterion (ABSGTOL)
7	relative function criterion (FTOL)
8	predicted function reduction criterion (FTOL2)
9	absolute function criterion (ABSFTOL)
10	FSIZE value used in GTOL and FTOL criterion
11	relative parameter criterion (XTOL)
12	absolute parameter criterion (ABSXTOL)
13	XSIZE value used in XTOL criterion

Criteria Used by All Techniques

The following list indicates the termination criteria that are used with all the optimization techniques:

tc[1] specifies the maximum number of iterations in the optimization process (MAXIT). The default values are

NLPNMS:		MAXIT=1000
NLPCG:		MAXIT=400
Others:		MAXIT=200

tc[2] specifies the maximum number of function calls in the optimization process (MAXFU). The default values are

NLPNMS:		MAXFU=3000
NLPCG:		MAXFU=1000
Others:		MAXFU=500

tc[3] specifies the absolute function convergence criterion (ABSTOL). For minimization, termination requires $f^{(k)} = f(x^{(k)}) \leq$ ABSTOL, while for maximization, termination requires $f^{(k)} = f(x^{(k)}) \geq$ ABSTOL. The default values are the negative and positive square roots of the largest double precision value, for minimization and maximization, respectively.

These criteria are useful when you want to divide a time-consuming optimization problem into a series of smaller problems.

Termination Criteria for NLPNMS

Since the Nelder-Mead simplex algorithm does not use derivatives, no termination criteria are available that are based on the gradient of the objective function.

When the NLPNMS subroutine implements Powell’s COBYLA algorithm, it uses only one criterion other than the three used by all the optimization techniques. The COBYLA algorithm is a trust-region method that sequentially reduces the radius, $\rho$ , of a spheric trust region from the start radius, $\rho _{beg}$ , which is controlled with the par[2] argument, to the final radius, $\rho _{end}$ , which is controlled with the tc[9] argument. The default value for tc[9] is $\rho _{end} =$ 1E $-$ 4. Convergence to small values of $\rho _{end}$ can take many calls of the function and constraint modules and might result in numerical problems.

In addition to the criteria used by all techniques, the original Nelder-Mead simplex algorithm uses several other termination criteria, which are described in the following list:

tc[4] specifies the relative function convergence criterion (FTOL). Termination requires a small relative difference between the function values of the vertices in the simplex with the largest and smallest function values.

${ | f_{hi}^{(k)} - f_{lo}^{(k)} | \over \max (|f_{hi}^{(k)})|, \mathit{FSIZE}) } \leq \mathit{FTOL}$

where FSIZE is defined by tc[7]. The default value is tc $[4] = 10^{-\mbox{\scriptsize FDIGITS}}$ , where FDIGITS is controlled by the par[8] argument. The par[8] argument has a default value of $\log _{10}(\epsilon )$ , where $\epsilon$ is the machine precision. Hence, the default value for FTOL is $\epsilon$ .
tc[5] specifies another relative function convergence criterion (FTOL2). Termination requires a small standard deviation of the function values of the $n + 1$ simplex vertices $x_0^{(k)}, \ldots , x_ n^{(k)}$ .

$\sqrt {{1 \over n+1} \sum _ l (f(x_ l^{(k)}) - \overline{f}(x^{(k)}))^2 } \leq \mathit{FTOL2}$

where $\overline{f}(x^{(k)}) = {1 \over n+1} \sum _ l f(x_ l^{(k)})$ . If there are $a$ active boundary constraints at $x^{(k)}$ , the mean and standard deviation are computed only for the $n+1-a$ unconstrained vertices. The default is tc $[5]=$ 1E $-$ 6.
tc[6] specifies the absolute function convergence criterion (ABSFTOL). Termination requires a small absolute difference between the function values of the vertices in the simplex with the largest and smallest function values.

$| f_{hi}^{(k)} - f_{lo}^{(k)} | \leq \mathit{ABSFTOL}$

The default is tc $[6]=0$ .
tc[7] specifies the FSIZE value used in the FTOL termination criterion. The default is tc $[7]=0$ .
tc[8] specifies the relative parameter convergence criterion (XTOL). Termination requires a small relative parameter difference between the vertices with the largest and smallest function values.

${\max _ j |x_ j^{lo} - x_ j^{hi}| \over \max (|x_ j^{lo}|,|x_ j^{hi}|, \mathit{XSIZE})} \leq \mathit{XTOL}$

The default is tc $[8]=$ 1E $-$ 8.
tc[9] specifies the absolute parameter convergence criterion (ABSXTOL). Termination requires either a small length, $\alpha ^{(k)}$ , of the vertices of a restart simplex or a small simplex size, $\delta ^{(k)}$ .

$\begin{eqnarray*} \alpha ^{(k)} & \leq & \mbox{\Emph{ABSXTOL}} \\ \delta ^{(k)} & \leq & \mbox{\Emph{ABSXTOL}} \\ \end{eqnarray*}$

where $\delta ^{(k)}$ is defined as the L1 distance of the simplex vertex with the smallest function value, $y^{(k)}$ , to the other $n$ simplex points, $x_ l^{(k)} \neq y$ .

$\delta ^{(k)} = \sum _{x_ l \neq y} \parallel x_ l^{(k)} - y^{(k)} \parallel _1$

The default is tc $[9]=$ 1E $-$ 8.
tc[10] specifies the XSIZE value used in the XTOL termination criterion. The default is tc $[10]=0$ .

Termination Criteria for Unconstrained and Linearly Constrained Techniques

tc[4] specifies the relative gradient convergence criterion (GTOL). For all techniques except the NLPCG subroutine, termination requires that the normalized predicted function reduction is small.

${ g(x^{(k)})^ T [G^{(k)}]^{-1} g(x^{(k)}) \over \max (|f(x^{(k)})|,\mathit{FSIZE}) } \leq \mathit{GTOL}$

where FSIZE is defined by tc[10]. For the NLPCG technique (where a reliable Hessian estimate is not available),

${ \parallel g(x^{(k)}) \parallel _2^2 \quad \parallel s(x^{(k)}) \parallel _2 \over \parallel g(x^{(k)}) - g(x^{(k-1)}) \parallel _2 \max (|f(x^{(k)})|, \mathit{FSIZE}) } \leq \mathit{GTOL}$

is used. The default is tc $[4]=$ 1E $-$ 8.
tc[5] specifies another relative gradient convergence criterion (GTOL2). This criterion is used only by the NLPLM subroutine.

$\max _ j {|g_ j(x^{(k)})| \over \sqrt {f(x^{(k)}) G_{j,j}^{(k)}} } \leq \mathit{GTOL2}$

The default is tc[5]=0.
tc[6] specifies the absolute gradient convergence criterion (ABSGTOL). Termination requires that the maximum absolute gradient element be small.

$\max _ j |g_ j(x^{(k)})| \leq \mathit{ABSGTOL}$

The default is tc $[6]=$ 1E $-$ 5.
tc[7] specifies the relative function convergence criterion (FTOL). Termination requires a small relative change of the function value in consecutive iterations.

${ |f(x^{(k)}) - f(x^{(k-1)})| \over \max (|f(x^{(k-1)})|,FSIZE) } \leq \mathit{FTOL}$

where $FSIZE$ is defined by tc[10]. The default is tc $[7] = 10^{-\mbox{\scriptsize FDIGITS}}$ , where FDIGITS is controlled by the par[8] argument. The par[8] argument has a default value of $\log _{10}(\epsilon )$ , where $\epsilon$ is the machine precision. Hence, the default for FTOL is $\epsilon$ .
tc[8] specifies another function convergence criterion (FTOL2). For least squares problems, termination requires a small predicted reduction of the objective function, $df^{(k)} \approx f(x^{(k)}) - f(x^{(k)} + s^{(k)})$ . The predicted reduction is computed by approximating the objective function by the first two terms of the Taylor series and substituting the Newton step, $s^{(k)}=-G^{(k)-1}g^{(k)}$ , as follows:

$\begin{eqnarray*} df^{(k)} & = & -g^{(k)T}s^{(k)}-{1 \over 2} s^{(k)T} G^{(k)} s^{(k)} \\ & = & -\frac{1}{2} s^{(k)T} g^{(k)} \\ & \leq & \mbox{\Emph{FTOL2}} \end{eqnarray*}$

The FTOL2 criterion is the unscaled version of the GTOL criterion. The default is tc[8]=0.
tc[9] specifies the absolute function convergence criterion (ABSFTOL). Termination requires a small change of the function value in consecutive iterations.

$|f(x^{(k-1)}) - f(x^{(k)})| \leq \mathit{ABSFTOL}$

The default is tc[9]=0.
tc[10] specifies the FSIZE value used in the GTOL and FTOL termination criteria. The default is tc[10]=0.
tc[11] specifies the relative parameter convergence criterion (XTOL). Termination requires a small relative parameter change in consecutive iterations.

${\max _ j |x_ j^{(k)} - x_ j^{(k-1)}| \over \max (|x_ j^{(k)}|,|x_ j^{(k-1)}|,\mathit{XSIZE})} \leq \mathit{XTOL}$

The default is tc[11]=0.
tc[12] specifies the absolute parameter convergence criterion (ABSXTOL). Termination requires a small Euclidean distance between parameter vectors in consecutive iterations.

$\parallel x^{(k)} - x^{(k-1)} \parallel _2 \leq \mathit{ABSXTOL}$

The default is tc[12]=0.
tc[13] specifies the XSIZE value used in the XTOL termination criterion. The default is tc[13]=0.

Termination Criteria for Nonlinearly Constrained Techniques

The only algorithm available for nonlinearly constrained optimization other than the NLPNMS subroutine is the NLPQN subroutine, when you specify the “nlc” module argument. This method, unlike the other optimization methods, does not monotonically reduce the value of the objective function or some kind of merit function that combines objective and constraint functions. Instead, the algorithm uses the watchdog technique with backtracking of Chamberlain et al. (1982). Therefore, no termination criteria are implemented that are based on the values $x$ or $f$ in consecutive iterations. In addition to the criteria used by all optimization techniques, there are three other termination criteria available; these are based on the Lagrange function

$L(x,\lambda ) = f(x) - \sum _{i=1}^ m \lambda _ i c_ i(x)$

and its gradient

$\nabla _ x L(x,\lambda ) = g(x) - \sum _{i=1}^ m \lambda _ i \nabla _ x c_ i(x)$

where $m$ denotes the total number of constraints, $g=g(x)$ is the gradient of the objective function, and $\lambda$ is the vector of Lagrange multipliers. The Kuhn-Tucker conditions require that the gradient of the Lagrange function is zero at the optimal point $(x^*,\lambda ^*)$ , as follows:

$\nabla _ x L(x^*,\lambda ^*) = 0$

tc[4] specifies the GTOL criterion, which requires that the normalized predicted function reduction be small.

${ { |g(x^{(k)}) s(x^{(k)})| + \sum _{i=1}^ m |\lambda _ i c_ i(x^{(k)})| } \over { \max (|f(x^{(k)})|, \mathit{FSIZE}) } } \leq \mathit{GTOL}$

where FSIZE is defined by the tc[10] argument. The default is tc $[4]=$ 1E $-$ 8.
tc[6] specifies the ABSGTOL criterion, which requires that the maximum absolute gradient element of the Lagrange function be small.

$\max _ j | \{ \nabla _ x L(x^{(k)},\lambda ^{(k)})\} _ j | \leq \mathit{ABSGTOL}$

The default is tc $[6]=$ 1E $-$ 5.
tc[8] specifies the FTOL2 criterion, which requires that the predicted function reduction be small.

$| g(x^{(k)}) s(x^{(k)})| + \sum _{i=1}^ m |\lambda _ i c_ i| \leq \mathit{FTOL2}$

The default is tc $[8]=$ 1E $-$ 6. This is the criterion used by the programs VMCWD and VF02AD of Powell (1982b).