Finite-Difference Approximations of Derivatives

If the optimization technique needs first- or second-order derivatives and you do not specify the corresponding IML module grd, hes, jac, or jacnlc, the derivatives are approximated by finite-difference formulas that use only calls to the fun module. If the optimization technique needs second-order derivatives and you specify the grd module but not the hes module, the subroutine approximates the second-order derivatives by finite differences, using $n$ or $2n$ calls to the grd module.
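
As a minimal sketch (the objective function, starting point, and option values are illustrative and not part of the subroutine definitions), the following statements call the quasi-Newton subroutine NLPQN with only a fun module, so the gradient is approximated by finite differences of fun:

   proc iml;
      /* illustrative objective: Rosenbrock's function in two variables */
      start fun(x);
         f = 100*(x[2] - x[1]##2)##2 + (1 - x[1])##2;
         return(f);
      finish fun;

      x0  = {-1.2 1};       /* starting point                                  */
      opt = {0 2};          /* opt[1]=0: minimize; opt[2]=2: moderate printing */

      /* no grd module is specified, so first-order derivatives are            */
      /* approximated by finite differences that use only calls to fun         */
      call nlpqn(rc, xres, "fun", x0, opt);
      print xres;
   quit;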

The eighth element of the opt argument specifies the type of finite-difference approximation used to compute first- or second-order derivatives and whether the finite-difference intervals, $h$, should be computed by an algorithm of Gill et al. (1983). The value of opt[8] is a two-digit integer, $ij$.

  • If opt[8] is missing or $j=0$, the fast but less precise forward-difference formulas are used; if $j \neq 0$, the numerically more expensive but more accurate central-difference formulas are used.

  • If opt[8] is missing or $i \neq 1,2,\mbox{ or }3$, the finite-difference intervals $h$ are based only on the values of par[8] and par[9], which specify the number of accurate digits used in evaluating the objective function and the nonlinear constraints, respectively. If $i=1,2,\mbox{ or }3$, the intervals are computed with an algorithm by Gill et al. (1983). For $i=1$, the interval is based on the behavior of the objective function; for $i=2$, the interval is based on the behavior of the nonlinear constraint functions; and for $i=3$, the interval is based on the behavior of both the objective function and the nonlinear constraint functions.
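
For example (the specific values are illustrative, and the vectors are passed through the opt and par arguments of the optimization call), the following assignments request central-difference formulas with intervals based on par[8]:

   opt    = j(1, 8, .);     /* missing entries keep their default values       */
   opt[1] = 0;              /* request minimization                            */
   opt[8] = 1;              /* ij = 01: central differences, intervals from    */
                            /* par[8] (no Gill et al. interval algorithm)      */
   /* opt[8] = 11 would instead request central differences with intervals     */
   /* computed by the Gill et al. (1983) algorithm from the objective function */

   par    = j(1, 9, .);
   par[8] = 10;             /* about 10 accurate digits in evaluations of fun  */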

Forward-Difference Approximations

  • First-order derivatives: $n$ additional function calls are needed. (An IML sketch of the forward-difference formulas appears after this list.)

    \[  g_ i = {\partial f \over \partial x_ i} = {f(x + h_ ie_ i) - f(x) \over h_ i}  \]
  • Second-order derivatives based on function calls only, when the grd module is not specified (Dennis and Schnabel, 1983): for a dense Hessian matrix, $n+n^2/2$ additional function calls are needed.

    \[  {\partial ^2 f \over \partial x_ i \partial x_ j} = {f(x+h_ ie_ i+h_ je_ j) - f(x+h_ ie_ i) - f(x+h_ je_ j) + f(x) \over h_ i h_ j}  \]
  • Second-order derivatives based on gradient calls, when the grd module is specified (Dennis and Schnabel, 1983): $n$ additional gradient calls are needed.

    \[  {\partial ^2 f \over \partial x_ i \partial x_ j} = {g_ i(x + h_ je_ j) - g_ i(x) \over 2h_ j} + {g_ j(x + h_ ie_ i) - g_ j(x) \over 2h_ i}  \]
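
The following IML modules sketch the forward-difference formulas above for a user-supplied fun module. The module names, the assumption that $x$ is a row vector, and the scalar argument eta (the constant $\eta$ described later in this section) are illustrative choices, not part of the subroutine interface.

   /* forward-difference gradient: n calls to fun beyond f(x)                */
   start fwdgrad(x, eta);
      n  = ncol(x);
      g  = j(1, n, 0);
      f0 = fun(x);
      do i = 1 to n;
         h    = sqrt(eta) * (1 + abs(x[i]));   /* h_i for first-order terms  */
         xp   = x;   xp[i] = xp[i] + h;
         g[i] = (fun(xp) - f0) / h;
      end;
      return(g);
   finish fwdgrad;

   /* forward-difference Hessian from function calls only:                   */
   /* about n + n*n/2 calls to fun beyond f(x)                               */
   start fwdhess(x, eta);
      n    = ncol(x);
      hmat = j(n, n, 0);
      f0   = fun(x);
      h    = eta##(1/3) # (1 + abs(x));        /* h_j for second-order terms */
      fi   = j(1, n, 0);
      do i = 1 to n;                           /* n calls: f(x + h_i e_i)    */
         xi    = x;   xi[i] = xi[i] + h[i];
         fi[i] = fun(xi);
      end;
      do i = 1 to n;                           /* about n*n/2 mixed calls    */
         do j = 1 to i;
            xij = x;
            xij[i] = xij[i] + h[i];
            xij[j] = xij[j] + h[j];
            hmat[i,j] = (fun(xij) - fi[i] - fi[j] + f0) / (h[i]*h[j]);
            hmat[j,i] = hmat[i,j];
         end;
      end;
      return(hmat);
   finish fwdhess;

Caching the values $f(x + h_ie_i)$ in the first loop keeps the Hessian sketch close to the $n+n^2/2$ additional function calls quoted above.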

Central-Difference Approximations

  • First-order derivatives: $2n$ additional function calls are needed. (An IML sketch appears after this list.)

    \[  g_ i = {\partial f \over \partial x_ i} = {f(x + h_ ie_ i) - f(x - h_ ie_ i) \over 2h_ i}  \]
  • Second-order derivatives based on function calls only, when the grd module is not specified (Abramowitz and Stegun, 1972): for a dense Hessian matrix, $2n+2n^2$ additional function calls are needed.

    \begin{eqnarray*} {\partial^2 f \over \partial x^2_i} & = & {-f(x + 2h_ie_i) + 16f(x + h_ie_i) - 30f(x) + 16f(x - h_ie_i) - f(x - 2h_ie_i) \over 12h^2_i} \\ {\partial^2 f \over \partial x_i \partial x_j} & = & {f(x+h_ie_i+h_je_j) - f(x+h_ie_i-h_je_j) - f(x-h_ie_i+h_je_j) + f(x-h_ie_i-h_je_j) \over 4h_ih_j} \end{eqnarray*}

  • Second-order derivatives based on gradient calls, when the grd module is specified: $2n$ additional gradient calls are needed.

    \[  {\partial ^2 f \over \partial x_ i \partial x_ j} = {g_ i(x + h_ je_ j) - g_ i(x - h_ je_ j) \over 4h_ j} + {g_ j(x + h_ ie_ i) - g_ j(x - h_ ie_ i) \over 4h_ i}  \]
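
A corresponding sketch of the central-difference gradient, under the same illustrative assumptions as the forward-difference modules above, is:

   /* central-difference gradient: 2n calls to fun, with a smaller           */
   /* truncation error than the forward-difference version                   */
   start ctrgrad(x, eta);
      n = ncol(x);
      g = j(1, n, 0);
      do i = 1 to n;
         h    = eta##(1/3) * (1 + abs(x[i]));  /* central-difference h_i     */
         xp   = x;   xp[i] = xp[i] + h;
         xm   = x;   xm[i] = xm[i] - h;
         g[i] = (fun(xp) - fun(xm)) / (2*h);
      end;
      return(g);
   finish ctrgrad;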

The step sizes $h_ j$, $j=1,\ldots ,n$, are defined as follows:

  • For the forward-difference approximation of first-order derivatives using only function calls and for second-order derivatives using only gradient calls, $h_ j = \sqrt [2]{\eta _ j} (1 + |x_ j|)$.

  • For the forward-difference approximation of second-order derivatives using only function calls and for central-difference formulas, $h_ j = \sqrt [3]{\eta _ j} (1 + |x_ j|)$.

If the algorithm of Gill et al. (1983) is not used to compute $\eta _ j$, a constant value $\eta = \eta _ j$ is used; its value depends on par[8] as follows:

  • If the number of accurate digits is specified by par$[8]=k_1$, then $\eta $ is set to $10^{-k_1}$.

  • If par[8] is not specified, $\eta $ is set to the machine precision, $\epsilon $.
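
For example, if par[8] $= 10$, then $\eta = 10^{-10}$; the forward-difference interval for first-order derivatives is then $h_j = \sqrt{10^{-10}}\,(1 + |x_j|) = 10^{-5}(1 + |x_j|)$, and the interval used for the central-difference formulas and for the forward-difference second-order formulas that use only function calls is $h_j = \sqrt[3]{10^{-10}}\,(1 + |x_j|) \approx 4.6\times 10^{-4}\,(1 + |x_j|)$.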

If central-difference formulas are not specified, the optimization algorithm switches automatically from the forward-difference formula to a corresponding central-difference formula during the iteration process if one of the following two criteria is satisfied:

  • The absolute maximum gradient element is less than or equal to 100 times the ABSGTOL threshold.

  • The term on the left-hand side of the GTOL criterion is less than or equal to $\max (1\mbox{E}{-}6,\; 100\times \mbox{GTOL threshold})$. The 1E$-$6 term ensures that the switch is performed even if you set the GTOL threshold to zero.

The algorithm of Gill et al. (1983) that computes the finite-difference intervals $h_ j$ can be very expensive in the number of function calls it uses. If this algorithm is required, it is performed twice, once before the optimization process starts and once after the optimization terminates.

Many applications need considerably more time for computing second-order derivatives than for computing first-order derivatives. In such cases, you should use a quasi-Newton or conjugate gradient technique, which requires only first-order derivatives.

If you specify a vector, $c$, of $nc$ nonlinear constraints with the nlc module but you do not specify the jacnlc module, the first-order formulas can be used to compute finite-difference approximations of the $nc \times n$ Jacobian matrix of the nonlinear constraints.

\[  ( \nabla c_ i ) = \left( \frac{\partial c_ i}{\partial x_ j} \right) , \quad i= 1,\ldots ,nc, ~ ~  j=1,\ldots ,n  \]

You can specify the number of accurate digits in the constraint evaluations with par[9]. This specification also defines the step sizes $h_ j$, $j=1,\ldots ,n$.
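
As an illustrative sketch (the module and variable names are assumptions, and the nlc module is assumed to return a vector of the $nc$ constraint values), a forward-difference approximation of this Jacobian can be written as follows, where eta9 plays the same role for the constraints that $\eta$ plays for the objective function and is derived from par[9]:

   /* forward-difference Jacobian of the nonlinear constraints:              */
   /* n calls to nlc beyond c(x)                                             */
   start fdjacnlc(x, eta9);
      n   = ncol(x);
      c0  = shape(nlc(x), 0, 1);         /* nc x 1 vector of constraint values */
      nc  = nrow(c0);
      jac = j(nc, n, 0);
      do j = 1 to n;
         h        = sqrt(eta9) * (1 + abs(x[j]));
         xp       = x;   xp[j] = xp[j] + h;
         jac[, j] = (shape(nlc(xp), 0, 1) - c0) / h;
      end;
      return(jac);
   finish fdjacnlc;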

Note: If you are not able to specify analytic derivatives and the finite-difference approximations provided by the subroutines are not precise enough to solve your optimization problem, you might be able to implement better finite-difference approximations yourself and supply them with the grd, hes, jac, and jacnlc module arguments.