The NLP Procedure

Testing the Gradient Specification

There are three main ways to check the correctness of derivative specifications:

  • Specify the FD= or FDHESSIAN= option in the PROC NLP statement to compute finite-difference approximations of first- and second-order derivatives. In many applications, the finite-difference approximations are computed with high precision and differ only slightly from the derivatives that are computed by the specified formulas, so comparing the two sets of values can reveal errors in the specification.

  • Specify the GRADCHECK option in the PROC NLP statement to compute and display a test vector and a test matrix of the gradient values at the starting point $x^{(0)}$ by the method of Wolfe (1982), as in the sketch that follows this list. If you do not specify the GRADCHECK option, a fast derivative test identical to the GRADCHECK=FAST specification is performed by default.

  • If the default analytical derivative compiler is used or if derivatives are specified using the GRADIENT or JACOBIAN statement, the gradient or Jacobian computed at the initial point $x^{(0)}$ is tested by default using finite-difference approximations. In some cases, the relative test can show significant differences between the two forms of derivatives and produce a warning message indicating that the specified derivatives might be wrong, even when they are correct. This happens especially when the magnitude of the gradient at the starting point $x^{(0)}$ is small.
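
For illustration, the following sketch shows where these options and statements are specified. It uses a two-variable Rosenbrock-type test function; the starting values, the variable names g1 and g2, and the gradient formulas are choices made for this example rather than part of the preceding text. The GRADCHECK option requests the derivative test, and the GRADIENT statement names the variables that contain the analytic first derivatives to be tested:

   proc nlp gradcheck;
      min f;
      decvar x1 = -1.2,
             x2 =  1;
      gradient g1 g2;
      /* objective function */
      f = 0.5 * (100*(x2 - x1*x1)**2 + (1 - x1)**2);
      /* analytic first derivatives with respect to x1 and x2 */
      g1 = -200*x1*(x2 - x1*x1) - (1 - x1);
      g2 = 100*(x2 - x1*x1);
   run;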

The algorithm of Wolfe (1982) is used to check whether the gradient $ g(x)$ specified by a GRADIENT statement (or indirectly by a JACOBIAN statement) is appropriate for the objective function $ f(x)$ specified by the program statements.

Using function and gradient evaluations in the neighborhood of the starting point $ x^{(0)}$, second derivatives are approximated by finite-difference formulas. Forward differences of gradient values are used to approximate the Hessian element $ G_{jk}$,

\[  G_{jk} \approx H_{jk} = \frac{g_j(x + \delta e_k) - g_j(x)}{\delta}  \]

where $\delta$ is a small step length and $e_k = (0,\ldots,0,1,0,\ldots,0)^T$ is the unit vector along the $k$th coordinate axis. The test vector $s$, with

\[  s_j = H_{jj} - \frac{2}{\delta} \left\{  \frac{f(x + \delta e_j) - f(x)}{\delta} - g_j(x) \right\}   \]

contains the differences between two sets of finite-difference approximations for the diagonal elements of the Hessian matrix

\[  G_{jj} = \partial^2 f(x^{(0)}) / \partial x_j^2 \, , \quad j=1,\ldots,n  \]

The test matrix $\Delta H$ contains the absolute differences of symmetric elements of the approximate Hessian, $|H_{jk} - H_{kj}|$, $j,k=1,\ldots,n$, where $H$ is generated by forward differences of the gradient elements.
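
As a purely illustrative sketch (a DATA step rather than PROC NLP output; the step length delta and the starting point are assumed values), these quantities can be evaluated directly for the two-variable example shown earlier:

   data _null_;
      delta = 1e-4;                 /* assumed step length        */
      x1 = -1.2;  x2 = 1;           /* assumed starting point x(0) */

      /* objective function and analytic gradient at x(0) */
      f0 = 0.5*(100*(x2 - x1*x1)**2 + (1 - x1)**2);
      g1 = -200*x1*(x2 - x1*x1) - (1 - x1);
      g2 =  100*(x2 - x1*x1);

      /* forward differences of the gradient: columns of H */
      x1p = x1 + delta;  x2p = x2 + delta;
      h11 = ((-200*x1p*(x2 - x1p*x1p) - (1 - x1p)) - g1) / delta;
      h21 = (( 100*(x2 - x1p*x1p))                 - g2) / delta;
      h12 = ((-200*x1*(x2p - x1*x1) - (1 - x1))    - g1) / delta;
      h22 = (( 100*(x2p - x1*x1))                  - g2) / delta;

      /* forward differences of the function for the diagonal test */
      f1p = 0.5*(100*(x2 - x1p*x1p)**2 + (1 - x1p)**2);
      f2p = 0.5*(100*(x2p - x1*x1)**2 + (1 - x1)**2);

      /* test vector s and the single off-diagonal element of the test matrix */
      s1 = h11 - (2/delta)*((f1p - f0)/delta - g1);
      s2 = h22 - (2/delta)*((f2p - f0)/delta - g2);
      dh12 = abs(h12 - h21);
      put s1= s2= dh12=;
   run;

If the gradient formulas are coded correctly, s1, s2, and dh12 should be small relative to the corresponding Hessian elements; a mistyped gradient formula makes the associated elements large.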

If the specification of the first derivatives is correct, the elements of the test vector and test matrix should be relatively small. The location of large elements in the test matrix points to erroneous coordinates in the gradient specification. For very large optimization problems, this algorithm can be too expensive in terms of computer time and memory.