The QUANTREG Procedure

Quantile Regression as an Optimization Problem

The model for linear quantile regression is

\[  \mb{y} = \bA ^{\prime }\bbeta + \bepsilon  \]

where $\mb{y}=(y_1,\ldots ,y_ n)^{\prime }$ is the $(n \times 1)$ vector of responses, $\bA ^{\prime }=(\mb{x}_1,\ldots ,\mb{x}_ n)^{\prime }$ is the $(n \times p)$ regressor matrix, $\bbeta = (\beta _1,\ldots ,\beta _ p)^{\prime }$ is the $(p \times 1)$ vector of unknown parameters, and $\bepsilon = (\epsilon _1,\ldots ,\epsilon _ n)^{\prime }$ is the $(n \times 1)$ vector of unknown errors.

$L_1$ regression, also known as median regression, is a natural extension of the sample median when the response is conditioned on the covariates. In $L_1$ regression, the least absolute residuals estimate ${\hat\bbeta }_{\mathit{LAR}}$, referred to as the $L_1$-norm estimate, is obtained as the solution of the following minimization problem:

\[  \min _{\bbeta \in \mb{R}^ p} \sum _{i=1}^ n | y_ i - \mb{x}_ i^{\prime }\bbeta |  \]

More generally, for quantile regression, Koenker and Bassett (1978) defined the $\tau $ regression quantile, $0<\tau <1$, as any solution of the following minimization problem:

\[  \min _{\bbeta \in \mb{R}^ p} \left[\sum _{i\in \{ i: y_ i\geq \mb{x}_ i^{\prime }\bbeta \} } \tau |y_ i-\mb{x}_ i^{\prime }\bbeta | + \sum _{i\in \{ i: y_ i< \mb{x}_ i^{\prime }\bbeta \} } (1-\tau ) |y_ i-\mb{x}_ i^{\prime }\bbeta |\right]  \]
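As a rough illustration of this objective (not a description of how PROC QUANTREG computes its estimates), the following Python sketch evaluates the sum directly for a line with an intercept and a slope, and minimizes it by brute force. It relies on the known property that, for a full-rank design, some solution passes exactly through $p$ observations (here $p=2$), so it is enough to try every line through two data points. The function names and the data are hypothetical, and the search assumes that at least two observations have distinct regressor values.

```python
from itertools import combinations

def check_loss(residual, tau):
    """tau * |r| for r >= 0, (1 - tau) * |r| for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def objective(beta, xs, ys, tau):
    """Sum of check losses for the line y = beta[0] + beta[1] * x."""
    return sum(check_loss(y - (beta[0] + beta[1] * x), tau)
               for x, y in zip(xs, ys))

def fit_line_quantile(xs, ys, tau):
    """Try every line through two observations and keep the best one."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        value = objective((intercept, slope), xs, ys, tau)
        if best is None or value < best[0]:
            best = (value, (intercept, slope))
    return best[1], best[0]

xs = [1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # the last point is an outlier
beta, value = fit_line_quantile(xs, ys, tau=0.5)
print(beta, value)  # (0.0, 1.0) 47.5
```

With $\tau = 1/2$ the fitted line follows the four collinear points and ignores the outlier. The brute-force search costs $O(n^3)$ for a line and is meant only to make the objective concrete.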

A solution is denoted by $\hat{\bbeta }(\tau )$, and the $L_1$-norm estimate corresponds to $\hat{\bbeta }(1\slash 2)$. The $\tau $ regression quantile extends the $\tau $ sample quantile $\hat\xi (\tau )$ to the regression setting; the sample quantile can be formulated as the solution of

\[  \min _{\xi \in \mb{R}} \left[ \sum _{i\in \{ i: y_ i\geq \xi \} } \tau |y_ i-\xi | + \sum _{i\in \{ i: y_ i< \xi \} } (1-\tau ) |y_ i-\xi | \right]  \]
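The sample-quantile formulation can be checked numerically with a short Python sketch. Because the objective is piecewise linear in $\xi$ with kinks only at the data values, a minimizer can always be found among the observations, so the sketch simply evaluates the objective at each one. The function names and the data are hypothetical. When the minimizer is not unique, the sketch returns one of the minimizing observations.

```python
def quantile_objective(xi, ys, tau):
    """tau * |y - xi| when y >= xi, (1 - tau) * |y - xi| otherwise, summed."""
    return sum(tau * (y - xi) if y >= xi else (1 - tau) * (xi - y)
               for y in ys)

def sample_quantile(ys, tau):
    """Return an observation that minimizes the objective."""
    return min(sorted(ys), key=lambda xi: quantile_objective(xi, ys, tau))

ys = [2.0, 7.0, 1.0, 9.0, 4.0]
print(sample_quantile(ys, 0.5))  # 4.0, the sample median
print(sample_quantile(ys, 0.9))  # 9.0
```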

If you specify weights $w_ i, i=1,\ldots ,n$, with the WEIGHT statement, weighted quantile regression is carried out by solving

\[  \min _{\bbeta _ w \in \mb{R}^ p} \left[\sum _{i\in \{ i: y_ i\geq \mb{x}_ i^{\prime }\bbeta _ w\} } w_ i \tau |y_ i-\mb{x}_ i^{\prime }\bbeta _ w| + \sum _{i\in \{ i: y_ i< \mb{x}_ i^{\prime }\bbeta _ w\} } w_ i (1-\tau ) |y_ i-\mb{x}_ i^{\prime }\bbeta _ w|\right]  \]

Weighted regression quantiles, the solutions $\hat{\bbeta }_ w$ of the weighted problem, can be used for L-estimation (Koenker and Zhao, 1994).
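One way to read the weights in the weighted objective: when every $w_ i$ is a positive integer, the weighted sum equals the unweighted sum computed on a data set in which observation $i$ appears $w_ i$ times. The following Python sketch checks this numerically. The function name, the data, and the trial value of the coefficient vector are hypothetical, and the sketch only evaluates the objective rather than minimizing it.

```python
def weighted_objective(beta, rows, ys, weights, tau):
    """Weighted sum of check losses; each row holds the regressors x_i."""
    total = 0.0
    for x, y, w in zip(rows, ys, weights):
        r = y - sum(xj * bj for xj, bj in zip(x, beta))
        total += w * (tau * r if r >= 0 else (tau - 1) * r)
    return total

rows = [(1.0, 0.5), (1.0, 1.5), (1.0, 3.0)]  # first column is the intercept
ys = [1.0, 2.5, 2.0]
weights = [1, 2, 3]
beta = (0.5, 0.6)  # arbitrary trial coefficients
tau = 0.3

# Repeat observation i exactly weights[i] times and use unit weights.
rep_rows = [x for x, w in zip(rows, weights) for _ in range(w)]
rep_ys = [y for y, w in zip(ys, weights) for _ in range(w)]

weighted = weighted_objective(beta, rows, ys, weights, tau)
replicated = weighted_objective(beta, rep_rows, rep_ys, [1] * len(rep_ys), tau)
print(abs(weighted - replicated) < 1e-12)  # True
```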