The QUANTREG Procedure

Leverage Point and Outlier Detection

The QUANTREG procedure uses robust multivariate location and scale estimates for leverage-point detection.

Mahalanobis distance is defined as

\[ \mr{MD}(\mb{x}_ i) = [(\mb{x}_ i - \bar{\mb{x}})^{\prime } \bar{\bC }(\bA )^{-1}(\mb{x}_ i - {\bar{\mb{x}}})]^{1 / 2}  \]

where ${\bar{\mb{x}}} = {\frac1n} \sum _{i=1}^ n \mb{x}_ i$ and $ \bar{\bC }(\bA ) = {\frac1{n-1}}\sum _{i=1}^ n (\mb{x}_ i - {\bar{\mb{x}}}) (\mb{x}_ i - {\bar{\mb{x}}})^{\prime }$ are the empirical multivariate location and scale, respectively. Here, $\mb{x}_ i=(x_{i1},\ldots ,x_{i(p-1)})^{\prime }$ does not include the intercept variable. The relationship between the Mahalanobis distance $\mr{MD}(\mb{x}_ i)$ and the matrix $\bH =(h_{ij})=\bA ^{\prime }(\bA \bA ^{\prime })^{-1}\bA $ is

\[  h_{ii} = {\frac1{n-1}} \mr{MD}(\mb{x}_ i)^2 + {\frac1n}  \]
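The relationship between the hat-matrix diagonal and the Mahalanobis distance can be checked numerically. The following Python sketch is an illustration only, not part of the QUANTREG procedure. It assumes the simplest case of a single regressor plus an intercept, and the data values and function names are hypothetical.

```python
import math

def mahalanobis_1d(x):
    """Classical Mahalanobis distances for one regressor (intercept excluded)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)  # empirical scale, divisor n - 1
    return [abs(v - mean) / math.sqrt(var) for v in x]

def hat_diagonal_1d(x):
    """Diagonal of the hat matrix for rows (1, x_i), via the 2x2 cross-product matrix."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx  # determinant of [[n, sx], [sx, sxx]]
    # (1, v) * inverse([[n, sx], [sx, sxx]]) * (1, v)'
    return [(sxx - 2.0 * v * sx + n * v * v) / det for v in x]

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical regressor values
n = len(x)
md = mahalanobis_1d(x)
h = hat_diagonal_1d(x)

for md_i, h_i in zip(md, h):
    # h_ii = MD_i^2 / (n - 1) + 1 / n
    assert math.isclose(h_i, md_i ** 2 / (n - 1) + 1.0 / n)
```

In this example the last observation has the largest diagonal element, which reflects its distance from the bulk of the regressor values.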

Robust distance is defined as

\[  \mr{RD}(\mb{x}_ i) = [(\mb{x}_ i - \bT (\bA ))^{\prime } \bC (\bA )^{-1}(\mb{x}_ i - \bT (\bA ))]^{1 / 2}  \]

where $\bT (\bA )$ and $\bC (\bA )$ are robust multivariate location and scale estimates that are computed according to the minimum covariance determinant (MCD) method of Rousseeuw and Van Driessen (1999).
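To illustrate why robust estimates matter for leverage detection, the following Python sketch computes a raw MCD estimate for a single regressor. It is a simplified stand-in, not the FAST-MCD algorithm of Rousseeuw and Van Driessen (1999): in one dimension the subset of size h with the smallest variance consists of h contiguous order statistics, so an exhaustive scan is enough. The subset size, the divisor h - 1, the omission of consistency factors and reweighting, and all names and data values are assumptions made for this example.

```python
import math

def mcd_1d(x, h):
    """Raw univariate MCD: mean and variance of the h contiguous order
    statistics with the smallest variance (no consistency factor, no reweighting)."""
    xs = sorted(x)
    best = None
    for start in range(len(xs) - h + 1):
        window = xs[start:start + h]
        mean = sum(window) / h
        var = sum((v - mean) ** 2 for v in window) / (h - 1)
        if best is None or var < best[1]:
            best = (mean, var)
    return best

def robust_distance_1d(x, h):
    """Distances of each point from the MCD location, in MCD scale units."""
    loc, var = mcd_1d(x, h)
    return [abs(v - loc) / math.sqrt(var) for v in x]

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical regressor values
rd = robust_distance_1d(x, h=4)
```

For these values the classical Mahalanobis distance of the last point is about 1.7, because that point inflates the empirical scale, whereas its robust distance in this sketch is about 5.8.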

These distances are used to detect leverage points. You can use the LEVERAGE and DIAGNOSTICS options in the MODEL statement to request leverage-point and outlier diagnostics, respectively. Two new variables, Leverage and Outlier, are created and saved in the output data set that is specified in the OUTPUT statement.

Let $C(p) = {\sqrt {\chi ^2_{p; 1-\alpha }}}$ be the cutoff value. The variable LEVERAGE is defined as

\[  {\mbox{LEVERAGE }} = \left\{  \begin{array}{ll} 0 &  {\mbox{ if }} \mr{RD}(\mb{x}_ i) \leq C(p) \\ 1 &  {\mbox{ otherwise }} \end{array} \right.  \]

You can specify a cutoff value in the LEVERAGE option in the MODEL statement.
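The cutoff rule can be sketched in Python as follows. This is an illustration only and does not reproduce the procedure's implementation. It covers only 1 or 2 degrees of freedom, where the chi-square quantile has a closed form, and the value of $\alpha$, the robust distances, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def leverage_cutoff(df, alpha):
    """Square root of the chi-square quantile at 1 - alpha (df = 1 or 2 only)."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal variable.
        return NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if df == 2:
        # A chi-square(2) variable has CDF 1 - exp(-q / 2).
        return math.sqrt(-2.0 * math.log(alpha))
    raise ValueError("this sketch covers only df = 1 or df = 2")

def leverage_flags(robust_distances, cutoff):
    """Return 0 if RD <= cutoff and 1 otherwise, for each observation."""
    return [0 if rd <= cutoff else 1 for rd in robust_distances]

cutoff = leverage_cutoff(df=2, alpha=0.025)          # alpha is an assumed value
flags = leverage_flags([0.8, 1.9, 2.5, 3.4], cutoff)  # hypothetical robust distances
```

With these assumed inputs the cutoff is about 2.72, so only the last observation is flagged.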

Residuals $r_ i, i=1,\ldots ,n$, that are based on quantile regression estimates are used to detect vertical outliers. The variable OUTLIER is defined as

\[  {\mbox{OUTLIER }} = \left\{  \begin{array}{ll} 0 &  {\mbox{ if }} |r_ i| \leq k\sigma \\ 1 &  {\mbox{ otherwise }} \end{array} \right.  \]

You can specify the multiplier $k$ of the cutoff value in the CUTOFF= option and the scale $\sigma $ in the SCALE= option in the MODEL statement. By default, $k = 3$ and the scale $\sigma $ is computed as the corrected median of the absolute residuals:

\[  \sigma = {\mbox{median}} \{  |r_ i|/ \beta _0, i=1,\ldots ,n \}  \]

where $\beta _0 = \Phi ^{-1}(0.75) \approx 0.6745$ is an adjustment constant that makes $\sigma $ a consistent estimate of the standard deviation when the residuals are normally distributed.
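The default outlier rule can be sketched in Python as follows. This is an illustration of the formulas above, not the procedure's code. The residual values and the function name are hypothetical, and the residuals are assumed to come from a quantile regression fit that is already available.

```python
from statistics import NormalDist, median

def outlier_flags(residuals, k=3.0, scale=None):
    """Flag residuals with |r_i| > k * sigma.

    If no scale is given, sigma is the corrected median of the absolute residuals.
    """
    if scale is None:
        beta0 = NormalDist().inv_cdf(0.75)  # adjustment constant, about 0.6745
        scale = median(abs(r) for r in residuals) / beta0
    flags = [0 if abs(r) <= k * scale else 1 for r in residuals]
    return flags, scale

residuals = [0.1, -0.2, 0.15, -0.1, 5.0, 0.05, -0.12]  # hypothetical residuals
flags, sigma = outlier_flags(residuals)
```

Because the scale is based on the median of the absolute residuals, the single large residual in this example has almost no effect on $\sigma $ and is the only value flagged.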

An ODS table called DIAGNOSTICS contains the Leverage and Outlier variables.