LAV Call

CALL LAV(rc, xr, a, b <, x0> <, opt>);

The LAV subroutine performs linear least absolute value regression by solving the $L_1$ norm minimization problem.

The LAV subroutine returns the following values:

rc

is a scalar return code that indicates the reason for optimization termination.

    rc              Termination
    0               Successful
    1               Successful, but approximate covariance matrix and standard errors cannot be computed
    $-1$ or $-3$    Unsuccessful: error in the input arguments
    $-2$            Unsuccessful: matrix $A$ is rank-deficient ($\mathrm{rank}(A)<n$)
    $-4$            Unsuccessful: maximum iteration limit exceeded
    $-5$            Unsuccessful: no solution found for ill-conditioned problem

xr

is a vector or matrix with $n$ columns. If the optimization does not terminate successfully, xr is a row vector with $n$ missing values. If termination is successful and the opt[3] option is not set, xr is a row vector that contains the optimal estimate, $x^*$. If termination is successful and the opt[3] option is specified, xr is an $(n+2) \times n$ matrix that contains the optimal estimate, $x^*$, in the first row, the asymptotic standard errors in the second row, and the $n \times n$ covariance matrix of parameter estimates in the remaining rows.
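The layout of xr can be unpacked as in the following minimal sketch. It reuses the data from the example later in this section; the names est, ase, and cov are illustrative and are not part of the LAV interface. Because the shape of xr for rc$=1$ is not stated here, the sketch checks the number of rows before splitting the matrix.

```sas
proc iml;
/* Data from the Lee and Gentle (1986) example in this section */
a = j(6, 1, 1) || {0, 1, -1, -1, 2, 2};   /* intercept column and one regressor */
b = {1, 2, 1, -1, 2, 4};

/* opt[1] missing: default iteration limit; opt[2]=0: no printed output;  */
/* opt[3]=0: McKean-Schrader covariance; opt[4] missing: default          */
opt = {. 0 0 .};
call lav(rc, xr, a, b, , opt);

n = ncol(a);
if rc = 0 & nrow(xr) = n + 2 then do;
   est = xr[1, ];           /* optimal estimate x*                 */
   ase = xr[2, ];           /* asymptotic standard errors          */
   cov = xr[3:(n+2), ];     /* n x n covariance matrix             */
   print est, ase, cov;
end;
else print "No covariance matrix was returned; rc =" rc;
quit;
```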

The input arguments to the LAV subroutine are as follows:

a

specifies an $m \times n$ matrix $A$ with $m \geq n$ and full column rank, $\mathrm{rank}(A)=n$. If you want to include an intercept in the model, you must include a column of ones in the matrix $A$.

b

specifies the $m \times 1$ vector $b$.

x0

specifies an optional $n \times 1$ vector that contains the starting point of the optimization. If x0 is not specified, the least squares ($L_2$ norm) solution is used as the starting point.

opt

is an optional vector used to specify options. If an element of the opt vector is missing, the default value is used.

  • opt[1] specifies the maximum number maxi of outer iterations (this corresponds to the number of changes of the Huber parameter $\gamma $). The default is $\mbox{maxi}=\min (100,10n)$. (The number of inner iterations is restricted by an internal threshold. If the number of inner iterations exceeds this threshold, a new outer iteration is started with an increased value of $\gamma $.)

  • opt[2] specifies the amount of printed output. Higher values request additional output and include the output of lower values.

    opt[2]   Printed Output
    0        No output is printed.
    1        Error and warning messages are printed.
    2        The iteration history is printed (this is the default).
    3        The $n$ least squares ($L_2$ norm) estimates are printed if no starting point is specified, the $L_1$ norm estimates are always printed, and if opt[3] is set, the estimates are printed together with the asymptotic standard errors.
    4        The $n \times n$ approximate covariance matrix of parameter estimates is printed if opt[3] is set.
    5        The residual and predicted values for all $m$ rows (equations) of $A$ are printed.

  • opt[3] specifies which estimate of the variance of the median of nonzero residuals is used as a factor for the approximate covariance matrix of parameter estimates and for the approximate standard errors (ASE). If opt[3]$=0$, the McKean-Schrader (1987) estimate is used; if opt[3]$>0$, the Cox-Hinkley (1974) estimate, with $v=$opt[3], is used. By default (opt[3] missing or not specified), the covariance matrix is not computed.

  • opt[4] specifies whether a computationally expensive test for necessary and sufficient optimality of the solution $x$ is executed. The default behavior (opt[4]$=0$) is that the convergence test is not performed.

Missing values are not permitted in the a or b argument. The x0 argument is ignored if it contains any missing values. Missing values in the opt argument cause the default value to be used.

The LAV subroutine is designed for solving the unconstrained linear $L_1$ norm minimization problem,

\[  \min _{x} L_1(x) \;  \mbox{where} \;  L_1(x) = \|  A x - b \| _1 = \sum _{i=1}^ m \left| \sum _{j=1}^ n a_{ij} x_ j - b_ i \right|  \]

for $m$ equations with $n$ (unknown) parameters $x=(x_1,\ldots ,x_ n)$. This is equivalent to estimating the unknown parameter vector, $x$, by least absolute value regression in the model

\[  b = A x + \epsilon  \]

where $b$ is the vector of $m$ observations, $A$ is the design matrix, and $\epsilon $ is a random error term.

An algorithm by Madsen and Nielsen (1993) is used, which can be faster for large values of $m$ and $n$ than the Barrodale and Roberts (1974) algorithm. The current version of the algorithm assumes that $A$ has full column rank. Also, constraints cannot be imposed on the parameters in this version.

The $L_1$ norm minimization problem is more difficult to solve than the least squares ($L_2$ norm) minimization problem because the objective function of the $L_1$ norm problem is not continuously differentiable (the first derivative has jumps). A function that is continuous but not continuously differentiable is called nonsmooth. By using PROC NLP and the nonlinear optimization subroutines, you can obtain the estimates in linear and nonlinear $L_1$ norm estimation (even subject to linear or nonlinear constraints) as long as the number of parameters, $n$, is small. Using the nonlinear optimization subroutines, there are two ways to solve the nonlinear $L_ p$ norm, $p \geq 1$, problem:

  • For small values of $n$, you can implement the Nelder-Mead simplex algorithm with the NLPNMS subroutine to solve the minimization problem in its original specification. The Nelder-Mead simplex algorithm does not assume a smooth objective function, does not take advantage of any derivatives, and therefore does not require continuous differentiability of the objective function. See the section NLPNMS Call for details.

  • Gonin and Money (1989) describe how an original $L_ p$ norm estimation problem can be modified to an equivalent optimization problem with nonlinear constraints which has a simple differentiable objective function. You can invoke the NLPQN subroutine, which implements a quasi-Newton algorithm, to solve the nonlinearly constrained $L_ p$ norm optimization problem. See the section NLPQN Call for details about the NLPQN subroutine.

Both approaches are successful only for a small number of parameters and good initial estimates. If you cannot supply good initial estimates, the optimal results of the corresponding nonlinear least squares ($L_2$ norm) estimation can provide fairly good initial estimates.
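The first approach can be sketched as follows for the linear $L_1$ problem, using the small data set from the example later in this section. The module name L1obj, the starting point, and the option values are illustrative assumptions; this is a minimal sketch rather than a recommended setup, and for linear problems the LAV subroutine is the intended tool.

```sas
proc iml;
a = j(6, 1, 1) || {0, 1, -1, -1, 2, 2};
b = {1, 2, 1, -1, 2, 4};

/* Objective in its original (nonsmooth) form: L1(x) = sum |A x - b|.  */
/* The NLP subroutines pass x as a row vector, hence the transpose.    */
start L1obj(x) global(a, b);
   r = a * t(x) - b;
   return( sum(abs(r)) );
finish L1obj;

x0   = {0 0};      /* deliberately crude starting point (assumption)   */
optn = {0 0};      /* optn[1]=0: minimize; optn[2]=0: no printed output */
call nlpnms(rc, xres, "L1obj", x0, optn);

L1 = L1obj(xres);  /* objective value at the returned point */
print rc, xres, L1;
quit;
```

A positive return code from NLPNMS indicates successful termination. Comparing L1 with the value reported by the LAV subroutine for the same data is a simple check on whether the simplex search reached the minimum.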

Gonin and Money (1989) show that the nonlinear $L_1$ norm estimation problem

\[  \min _{x} \sum _{i=1}^ m |f_ i(x)|  \]

can be reformulated as a linear optimization problem with nonlinear constraints in either of the following ways:

  • as a linear optimization problem with $2m$ nonlinear inequality constraints in $m+n$ variables $u_ i$ and $x_ j$,

    \[  \min _{x} \sum _{i=1}^ m u_ i \mbox{ subject to } \left. \begin{array}{lcr} f_ i(x) - u_ i &  \leq &  0 \\ f_ i(x) + u_ i &  \geq &  0 \\ u_ i &  \geq &  0 \end{array} \right\}  \quad i=1,\ldots ,m  \]

  • as a linear optimization problem with $m$ nonlinear equality constraints (plus $2m$ nonnegativity bounds) in $2m+n$ variables $y_ i$, $z_ i$, and $x_ j$,

    \[  \min _{x} \sum _{i=1}^ m (y_ i + z_ i) \mbox{ subject to } \left. \begin{array}{lcr} f_ i(x) + y_ i - z_ i &  = &  0 \\ y_ i &  \geq &  0 \\ z_ i &  \geq &  0 \end{array} \right\}  \quad i=1,\ldots ,m  \]

For linear functions $f_ i(x) = \sum _{j=1}^ n a_{ij} x_ j - b_ i$, $i=1,\ldots ,m$, you obtain linearly constrained linear optimization problems, for which the number of variables and constraints is on the order of the number of observations, $m$. The advantage that the algorithm by Madsen and Nielsen (1993) has over the Barrodale and Roberts (1974) algorithm is that its computational cost increases only linearly with $m$, so it can be faster for large values of $m$.
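As a worked instance of the first reformulation in the linear case (a sketch that combines the two inequality constraints for each $i$ into a single two-sided bound), the $L_1$ problem becomes the linear program

\[  \min _{x,u} \sum _{i=1}^ m u_ i \mbox{ subject to } -u_ i \;  \leq \;  \sum _{j=1}^ n a_{ij} x_ j - b_ i \;  \leq \;  u_ i, \quad i=1,\ldots ,m  \]

At an optimum, each $u_ i$ equals $\left| \sum _{j=1}^ n a_{ij} x_ j - b_ i \right|$, because any larger value of $u_ i$ would increase the objective. The optimal objective value is therefore $L_1(x^*)$. The program has $m+n$ variables and $2m$ inequality constraints, which is why its size grows with the number of observations.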

In addition to computing an optimal solution $x^*$ that minimizes $L_1(x)$, you can also compute approximate standard errors and the approximate covariance matrix of $x^*$. The standard errors can be used to compute confidence limits.

The following example is the same one used for illustrating the LAV subroutine by Lee and Gentle (1986). $A$ and $b$ are as follows:

\[  A = \left[ \begin{array}{rr} 1 &  0 \\ 1 &  1 \\ 1 &  -1 \\ 1 &  -1 \\ 1 &  2 \\ 1 &  2 \end{array} \right] \quad b = \left[ \begin{array}{r} 1 \\ 2 \\ 1 \\ -1 \\ 2 \\ 4 \end{array} \right]  \]

The following statements specify the matrix $A$, the vector $b$, and the options vector opt. The options vector specifies that all output is printed (opt[2]$=5$), that the asymptotic standard errors and covariance matrix are computed based on the McKean-Schrader (1987) estimate $\lambda $ of the variance of the median (opt[3]$=0$), and that the convergence test be performed (opt[4]$=1$).

a = { 0,  1, -1, -1,  2,  2 };
m = nrow(a);
a = j(m, 1, 1.) || a;
b = { 1,  2,  1, -1,  2,  4 };

opt= { . 5  0 1 };
call lav(rc, xr, a, b, , opt);

The first part of the output is shown in Figure 24.183. This output displays the least squares solution, which is used as the starting point. The estimates of the largest and smallest nonzero eigenvalues of $A^{\prime } A$ give only an idea of the magnitude of these values, and they can be very crude approximations.

Figure 24.183: Least Squares Solution

LS Solution
Est 1 1


The second part of the printed output shows the iteration history. It is shown in Figure 24.184.

Figure 24.184: Iteration History

LAV (L1) Estimation
Start with LS Solution
Start Iter: gamma=1 ActEqn=6
Iter  N Huber  Act Eqn  Rank   Gamma     L1(x)  F(Gamma)
   1        1        2     2  0.9000  4.000000  2.200000
   1        1        2     2  0.0000  4.000000  2.200000


The third part of the output is shown in Figure 24.185. This output displays the $L_1$ norm solution (first row) together with asymptotic standard errors (second row) and the asymptotic covariance matrix of parameter estimates. The ASEs are the square roots of the diagonal elements of this covariance matrix.

Figure 24.185: Parameter and Covariance Estimates

L1 Solution with ASE
Est 1 1
ASE 0.4482711811 0.3310702082

Cov Matrix: McKean-Schrader
0.2009470518 -0.054803741
-0.054803741 0.1096074828
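The relationship between the covariance matrix and the ASEs, and the use of the ASEs for confidence limits, can be illustrated with the values in Figure 24.185. The following sketch assumes a normal approximation and a 95% confidence level; neither choice is prescribed by the LAV subroutine, and with only six observations the resulting limits are rough.

```sas
proc iml;
/* Values copied from Figure 24.185 */
est = {1 1};
cov = {0.2009470518 -0.054803741,
       -0.054803741  0.1096074828};

ase = t( sqrt(vecdiag(cov)) );    /* square roots of the diagonal elements */

/* Normal-approximation limits (assumption: 95% two-sided) */
z     = quantile("Normal", 0.975);
lower = est - z * ase;
upper = est + z * ase;
print est, ase, lower, upper;
quit;
```

The computed ase row should agree with the ASE row in the figure up to rounding.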


The last part of the printed output shows the predicted values and residuals, as in Lee and Gentle (1986). It is shown in Figure 24.186.

Figure 24.186: Predicted and Residual Values

Predicted Values and Residuals
N Observed Predicted Residual
1 1.0000 1.0000 0
2 2.0000 2.0000 0
3 1.0000 0.0000 1.000000
4 -1.0000 0.0000 -1.000000
5 2.0000 3.0000 -1.000000
6 4.0000 3.0000 1.000000
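
The predicted values, the residuals, and the value $L_1(x)=4$ in the iteration history can be checked directly from the estimate $x^*=(1,1)$. The following minimal sketch uses illustrative variable names (pred, resid, L1) and takes the residual as observed minus predicted, which matches the signs in Figure 24.186.

```sas
proc iml;
a = j(6, 1, 1) || {0, 1, -1, -1, 2, 2};
b = {1, 2, 1, -1, 2, 4};
x = {1, 1};                  /* L1 estimate from Figure 24.185 */

pred  = a * x;               /* predicted values A x           */
resid = b - pred;            /* observed minus predicted       */
L1    = sum(abs(resid));     /* L1 norm of the residual vector */
print b pred resid, L1;
quit;
```

Two residuals are zero and the other four have absolute value 1, so L1 equals 4.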