The NLP Procedure

Covariance Matrix

The COV= option must be specified to compute an approximate covariance matrix for the parameter estimates under asymptotic theory for least squares, maximum-likelihood, or Bayesian estimation, with or without corrections for degrees of freedom as specified by the VARDEF= option.

Two groups of six different forms of covariance matrices (and therefore approximate standard errors) can be computed corresponding to the following two situations:

  • The LSQ statement is specified, which means that least squares estimates are being computed:

    \[  \min f(x) = \sum _{i=1}^ m f^2_ i(x)  \]
  • The MIN or MAX statement is specified, which means that maximum-likelihood or Bayesian estimates are being computed:

    \[  \textrm{opt} \,  f(x) = \sum _{i=1}^ m f_ i(x)  \]

    where $ \mr {opt}$ is either $ \min $ or $ \max $.

In either case, the following matrices are used:

\[  G = \nabla ^2 f(x)  \]
\[  J(f) = (\nabla f_1,\ldots ,\nabla f_ m) = \left( \frac{\partial f_ i}{\partial x_ j} \right)  \]
\[  JJ(f) = J(f)^ T J(f)  \]
\[  V = J(f)^ T \textrm{diag}(f_ i^2) J(f)  \]
\[  W = J(f)^ T \textrm{diag}(f_ i^{\dagger }) J(f)  \]

where

\[  f_ i^{\dagger } = \left\{  \begin{array}{ll} 0, &  \mbox{if }f_ i=0 \\ 1 / f_ i, &  \mbox{otherwise} \end{array} \right.  \]

For unconstrained minimization, or when none of the final parameter estimates are subjected to linear equality or active inequality constraints, the formulas of the six types of covariance matrices are as follows:

Table 7.3: Central-Difference Approximations

COV

MIN or MAX Statement

LSQ Statement

1

M

$ (\mr {\_ NOBS\_ }/d) G^{-1} JJ(f) G^{-1}$

$ (\mr {\_ NOBS\_ }/d) G^{-1} V G^{-1}$

2

H

$ (\mr {\_ NOBS\_ }/d) G^{-1}$

$ \sigma ^2 G^{-1}$

3

J

$ (1/d) W^{-1}$

$ \sigma ^2 JJ(f)^{-1}$

4

B

$ (1/d) G^{-1} W G^{-1}$

$ \sigma ^2 G^{-1} JJ(f) G^{-1}$

5

E

$ (\mr {\_ NOBS\_ }/d) JJ(f)^{-1}$

$ (1/d) V^{-1}$

6

U

$ (\mr {\_ NOBS\_ }/d) W^{-1} JJ(f) W^{-1}$

$ (\mr {\_ NOBS\_ }/d) JJ(f)^{-1} V JJ(f)^{-1}$


The value of $ d$ depends on the VARDEF= option and on the value of the _NOBS_ variable:

\[  d = \left\{  \begin{array}{ll} \max (1, \_ \text {NOBS}\_  - \_ \text {DF}\_ ), &  \mbox{for VARDEF=DF} \\ \_ \text {NOBS}\_ , &  \mbox{for VARDEF=N} \end{array} \right.  \]

where _DF_ is either set in the program statements or set by default to $ n$ (the number of parameters) and _NOBS_ is either set in the program statements or set by default to nobs $\times $ mfun, where nobs is the number of observations in the data set and mfun is the number of functions listed in the LSQ, MIN, or MAX statement.

The value $\sigma ^2$ depends on the specification of the SIGSQ= option and on the value of $ d$:

\[  \sigma ^2 = \left\{  \begin{array}{ll} sq \times \_ NOBS\_  / d, &  \mbox{if SIGSQ=$sq$ is specified} \\ 2 f(x^*) / d, &  \mbox{if SIGSQ= is not specified} \end{array} \right.  \]

where $ f(x^*)$ is the value of the objective function at the optimal parameter estimates $ x^*$.

The two groups of formulas distinguish between two situations:

  • For least squares estimates, the error variance can be estimated from the objective function value and is used in three of the six different forms of covariance matrices. If you have an independent estimate of the error variance, you can specify it with the SIGSQ= option.

  • For maximum-likelihood or Bayesian estimates, the objective function should be the logarithm of the likelihood or of the posterior density when using the MAX statement.

For minimization, the inversion of the matrices in these formulas is done so that negative eigenvalues are considered zero, resulting always in a positive semidefinite covariance matrix.

In small samples, estimates of the covariance matrix based on asymptotic theory are often too small and should be used with caution.

If the final parameter estimates are subjected to $ n_\mi {act} > 0$ linear equality or active linear inequality constraints, the formulas of the covariance matrices are modified similar to Gallant (1987) and Cramer (1986, p. 38) and additionally generalized for applications with singular matrices. In the constrained case, the value of $ d$ used in the scalar factor $\sigma ^2$ is defined by

\[  d = \left\{  \begin{array}{ll} \max (1, \_ \text {NOBS}\_  - \_ \text {DF}\_  + n_\mi {act}), &  \mbox{for VARDEF=DF} \\ \_ \text {NOBS}\_ , &  \mbox{for VARDEF=N} \end{array} \right.  \]

where $ n_\mi {act}$ is the number of active constraints and _NOBS_ is set as in the unconstrained case.

For minimization, the covariance matrix should be positive definite; for maximization it should be negative definite. There are several options available to check for a rank deficiency of the covariance matrix:

  • The ASINGULAR=, MSINGULAR=, and VSINGULAR= options can be used to set three singularity criteria for the inversion of the matrix $ A$ needed to compute the covariance matrix, when $ A$ is either the Hessian or one of the crossproduct Jacobian matrices. The singularity criterion used for the inversion is

    \[  |d_{j,j}| \le \max (\emph{ASING}, \emph{VSING} \times |A_{j,j}|, \emph{MSING} \times \max (|A_{1,1}|,\ldots ,|A_{n,n}|))  \]

    where $ d_{j,j}$ is the diagonal pivot of the matrix $ A$, and ASING, VSING and MSING are the specified values of the ASINGULAR=, VSINGULAR=, and MSINGULAR= options. The default values are

    • ASING: the square root of the smallest positive double precision value

    • MSING: $1$E$-12$ if the SINGULAR= option is not specified and $\max (10 \times \epsilon ,1\mr {E}-4 \times \mi {SINGULAR})$ otherwise, where $\epsilon $ is the machine precision

    • VSING: $1$E$-8$ if the SINGULAR= option is not specified and the value of SINGULAR otherwise

    Note: In many cases, a normalized matrix $ D^{-1}AD^{-1}$ is decomposed and the singularity criteria are modified correspondingly.

  • If the matrix $ A$ is found singular in the first step, a generalized inverse is computed. Depending on the G4= option, a generalized inverse is computed that satisfies either all four or only two Moore-Penrose conditions. If the number of parameters $ n$ of the application is less than or equal to G4=$ i$, a G4 inverse is computed; otherwise only a G2 inverse is computed. The G4 inverse is computed by (the computationally very expensive but numerically stable) eigenvalue decomposition; the G2 inverse is computed by Gauss transformation. The G4 inverse is computed using the eigenvalue decomposition $ A = Z \Lambda Z^ T$, where $ Z$ is the orthogonal matrix of eigenvectors and $\Lambda $ is the diagonal matrix of eigenvalues, $ \Lambda = \mr {diag}(\lambda _1,\ldots ,\lambda _ n) $. If the PEIGVAL option is specified, the eigenvalues $\lambda _ i$ are displayed. The G4 inverse of $ A$ is set to

    \[  A^- = Z \Lambda ^- Z^ T  \]

    where the diagonal matrix $\Lambda ^- = \mr {diag}(\lambda ^-_1,\ldots ,\lambda ^-_ n)$ is defined using the COVSING= option:

    \[  \lambda ^-_ i = \left\{  \begin{array}{ll} 1 / \lambda _ i, &  \mbox{if }|\lambda _ i| > \mi {COVSING} \\ 0, &  \mbox{if }|\lambda _ i| \le \mi {COVSING} \end{array} \right.  \]

    If the COVSING= option is not specified, the $ \mi {nr}$ smallest eigenvalues are set to zero, where $ \mi {nr}$ is the number of rank deficiencies found in the first step.

For optimization techniques that do not use second-order derivatives, the covariance matrix is usually computed using finite-difference approximations of the derivatives. By specifying TECH=NONE, any of the covariance matrices can be computed using analytical derivatives. The covariance matrix specified by the COV= option can be displayed (using the PCOV option) and is written to the OUTEST= data set.