The ROBUSTREG Procedure

High-Breakdown-Value Estimation

The breakdown value of an estimator is defined as the smallest fraction of contamination that can cause the estimator to take on values that are arbitrarily far from its value on the uncontaminated data. The breakdown value of an estimator can be used as a measure of the robustness of the estimator. Rousseeuw and Leroy (1987) and others introduced the following high-breakdown-value estimators for linear regression.

LTS Estimate

The least trimmed squares (LTS) estimate proposed by Rousseeuw (1984) is defined as the p-vector

\[  {\hat\btheta }_{\mathit{LTS}} = \arg \min _{\btheta } Q_{\mathit{LTS}}(\btheta ) \mbox{ with } Q_{\mathit{LTS}}(\btheta ) = \sum _{i=1}^ h r_{(i)}^2  \]

where $ r_{(1)}^2 \leq r_{(2)}^2 \leq \ldots \leq r_{(n)}^2$ are the ordered squared residuals $ r_{i}^2 = (y_ i - \mb {x}_ i’ \btheta )^2$, $ i=1,\ldots ,n$, and h is defined in the range ${n \over 2} + 1 \leq h \leq {3n+p +1 \over 4}$.

You can specify the parameter h with the H= option in the PROC ROBUSTREG statement. By default, $h= [{3n + p + 1 \over 4}]$. The breakdown value is ${n-h\over n}$ for the LTS estimate.
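The LTS criterion can be sketched in a few lines of Python. This is an illustration of the formulas above, not code from the ROBUSTREG procedure; the function name and the toy residuals are made up for the example.

```python
def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals: the LTS criterion Q_LTS."""
    r2 = sorted(r * r for r in residuals)
    return sum(r2[:h])

# Illustrative data: n = 10 residuals, one gross outlier
residuals = [0.1, -0.2, 0.05, 0.3, -0.1, 0.15, -0.25, 0.2, -0.05, 50.0]
n, p = len(residuals), 2
h = (3 * n + p + 1) // 4         # default h = [(3n + p + 1)/4] = 8 here
breakdown = (n - h) / n          # breakdown value (n - h)/n = 0.2 here
q = lts_objective(residuals, h)  # the outlier never enters the sum
```

Because only the h smallest squared residuals are summed, the gross outlier at 50.0 has no influence on the criterion value.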

The ROBUSTREG procedure computes LTS estimates by using the FAST-LTS algorithm of Rousseeuw and Van Driessen (2000). The estimates are often used to detect outliers in the data, which are then downweighted in the resulting weighted LS regression.

Algorithm

Least trimmed squares (LTS) regression is based on the subset of h observations (out of a total of n observations) whose least squares fit possesses the smallest sum of squared residuals. The coverage h can be set between ${n\over 2}$ and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator with breakdown value ${n-h \over n}$. The ROBUSTREG procedure uses the FAST-LTS algorithm given by Rousseeuw and Van Driessen (2000). The intercept adjustment technique is also used in this implementation. However, because this adjustment is expensive to compute, it is optional. You can use the IADJUST= option in the PROC ROBUSTREG statement to request or suppress the intercept adjustment. By default, PROC ROBUSTREG does intercept adjustment for data sets with fewer than 10,000 observations. The steps of the algorithm are described briefly as follows. See Rousseeuw and Van Driessen (2000) for details.

  1. The default h is $[{3n+p+1 \over 4}]$, where p is the number of independent variables. You can specify any integer h with $[{n \over 2}] + 1 \leq h \leq [{3n+p+1 \over 4}]$ with the H= option in the MODEL statement. The breakdown value for LTS, ${n-h \over n}$, is reported. The default h is a good compromise between breakdown value and statistical efficiency.

  2. If p = 1 (single regressor), the procedure uses the exact algorithm of Rousseeuw and Leroy (1987, p. 172).

  3. If $p\geq 2$, PROC ROBUSTREG uses the following algorithm. If n < 2 ssubs, where ssubs is the size of the subgroups (you can specify ssubs by using the SUBGROUPSIZE= option in the PROC ROBUSTREG statement; by default, ssubs = 300), PROC ROBUSTREG draws a random p-subset and computes the regression coefficients by using these p points (if the regression is degenerate, another p-subset is drawn). The absolute residuals for all observations in the data set are computed, and the first h points with smallest absolute residuals are selected. From this selected h-subset, PROC ROBUSTREG carries out nsteps C-steps (concentration steps; see Rousseeuw and Van Driessen (2000) for details). You can specify nsteps with the CSTEP= option in the PROC ROBUSTREG statement; by default, nsteps = 2. PROC ROBUSTREG redraws p-subsets and repeats the preceding computation nrep times, and then finds the nbsol (at most) solutions with the lowest sums of h squared residuals. You can specify nrep with the NREP= option in the PROC ROBUSTREG statement; by default, NREP=$\min \{  500, {n\choose p} \} $. For small n and p, all ${n\choose p}$ subsets are used and the NREP= option is ignored (Rousseeuw and Hubert, 1996). You can specify nbsol with the NBEST= option in the PROC ROBUSTREG statement; by default, NBEST=10. For each of these nbsol best solutions, C-steps are taken until convergence and the best final solution is found.

  4. If $n \geq 5 \mi {ssubs}$, construct five disjoint random subgroups with size ssubs. If $ 2 \mi {ssubs} < n < 5 \mi {ssubs}$, the data are split into at most four subgroups with ssubs or more observations in each subgroup, so that each observation belongs to a subgroup and the subgroups have roughly the same size. Let nsubs denote the number of subgroups. Inside each subgroup, PROC ROBUSTREG repeats the step 3 algorithm nrep / nsubs times, keeps the nbsol best solutions, and pools the subgroups, yielding the merged set of size $ n_{\mathit{merged}}$. In the merged set, for each of the $\mi {nsubs} \times \mi {nbsol}$ best solutions, nsteps C-steps are carried out by using $ n_{\mathit{merged}}$ and $h_{\mathit{merged}} = [n_{\mathit{merged}}{h \over n}]$ and the nbsol best solutions are kept. In the full data set, for each of these nbsol best solutions, C-steps are taken by using n and h until convergence and the best final solution is found.

Note: At step 3 in the algorithm, a randomly selected p-subset might be degenerate (that is, its design matrix might be singular). If the total number of p-subsets from any subgroup is more than 4,000 and the ratio of degenerate p-subsets is more than the threshold specified in the FAILRATIO= option, the algorithm is terminated with an error message.
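The concentration step (C-step) at the heart of FAST-LTS can be sketched for simple linear regression as follows. This is a minimal, self-contained illustration of the idea described in step 3, not the procedure's implementation; the helper names and toy data are made up for the example.

```python
def ols_fit(pts):
    """Closed-form simple linear regression; returns (intercept a, slope b)."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def c_step(data, theta, h):
    """One concentration step: keep the h points with the smallest absolute
    residuals under the current fit theta, then refit least squares on them."""
    a, b = theta
    ranked = sorted(data, key=lambda pt: abs(pt[1] - a - b * pt[0]))
    return ols_fit(ranked[:h])

# Toy data on the line y = 1 + 2x, plus two gross outliers
data = [(float(x), 1.0 + 2.0 * x) for x in range(8)] + [(1.0, 30.0), (2.0, -20.0)]
theta = (0.0, 1.0)            # a crude starting fit
for _ in range(2):            # nsteps = 2 C-steps (the default CSTEP= value)
    theta = c_step(data, theta, h=8)
```

Each C-step can only decrease the sum of the h smallest squared residuals, which is why repeating it from many random starts concentrates on the clean subset; here two steps already recover the line y = 1 + 2x exactly.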

R-Square

The robust version of R-square for the LTS estimate is defined as

\[  R^2_{\mathit{LTS}} = 1 - {s^2_{\mathit{LTS}}(\bX , \mb {y}) \over s^2_{\mathit{LTS}}({\mbox{\Strong{1}}},\mb {y})}  \]

for models with the intercept term and as

\[  R^2_{\mathit{LTS}} = 1 - {s^2_{\mathit{LTS}}(\bX ,\mb {y}) \over s^2_{\mathit{LTS}}({\mbox{\Strong{0}} },\mb {y})}  \]

for models without the intercept term, where

\[  s_{\mathit{LTS}}(\bX ,\mb {y}) = d_{h,n} \sqrt { {1\over h} \sum _{i=1}^ h r^2_{(i)} }  \]

Note that $s_{\mathit{LTS}}$ is a preliminary estimate of the parameter $\sigma $ in the distribution function $L(\cdot / \sigma )$.

Here $d_{h,n}$ is chosen to make $s_{\mathit{LTS}}$ consistent, assuming a Gaussian model. Specifically,

\[  d_{h,n} = 1 / \sqrt { 1 - {2n \over h c_{h,n}} \phi ({1 / c_{h,n}})}  \]

\[  c_{h,n} = 1 / \Phi ^{-1}\left({h+n\over 2n}\right)  \]

with $\Phi $ and $\phi $ being the distribution function and the density function of the standard normal distribution, respectively.
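The consistency factor and the preliminary scale can be computed directly from these formulas with the Python standard library. This is an illustrative sketch, not ROBUSTREG code; the function name is made up for the example.

```python
import math
from statistics import NormalDist

def lts_scale(residuals, h):
    """Preliminary LTS scale s_LTS with the Gaussian consistency factor d_{h,n}."""
    n = len(residuals)
    nd = NormalDist()
    c = 1.0 / nd.inv_cdf((h + n) / (2.0 * n))                     # c_{h,n}
    d = 1.0 / math.sqrt(1.0 - (2.0 * n / (h * c)) * nd.pdf(1.0 / c))  # d_{h,n}
    r2 = sorted(r * r for r in residuals)
    return d * math.sqrt(sum(r2[:h]) / h)
```

The factor $d_{h,n}$ exceeds 1 because averaging only the h smallest squared residuals underestimates the scale for Gaussian data; as h approaches n the factor shrinks toward 1.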

Final Weighted Scale Estimator

The ROBUSTREG procedure displays two scale estimators, $s_{\mathit{LTS}}$ and Wscale. The estimator Wscale is a more efficient scale estimator based on the preliminary estimate $s_{\mathit{LTS}}$; it is defined as

\[  {\mbox{ Wscale}} = \sqrt {{\sum _ i{w_ i {r_ i}^2}} \over {\sum _ i{w_ i - p}}}  \]

where

\[  w_ i = \left\{  \begin{array}{ll} 0 &  {\mbox{ if }} |r_ i| / s_{\mathit{LTS}} > k \\ 1 &  {\mbox{ otherwise }} \end{array} \right.  \]

You can specify k with the CUTOFF= option in the MODEL statement. By default, k = 3.
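The weighted scale estimator is a short computation given the preliminary scale. The following sketch follows the formulas above; the function name and the input value for $s_{\mathit{LTS}}$ are made up for the example.

```python
import math

def wscale(residuals, s_lts, p, k=3.0):
    """Final weighted scale: observations with |r_i|/s_LTS > k get weight 0."""
    w = [0.0 if abs(r) / s_lts > k else 1.0 for r in residuals]
    num = sum(wi * r * r for wi, r in zip(w, residuals))
    den = sum(w) - p
    return math.sqrt(num / den)

# The outlier at 50.0 gets weight 0 and drops out of the scale estimate
r = [0.1, -0.2, 0.05, 0.3, -0.1, 0.15, -0.25, 0.2, -0.05, 50.0]
scale = wscale(r, s_lts=0.2, p=2)  # s_lts = 0.2 is a hypothetical value
```

Because the trimmed observation is excluded, Wscale is driven entirely by the clean residuals, which is what makes it more efficient than $s_{\mathit{LTS}}$ when the clean data are close to Gaussian.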

S Estimate

The S estimate proposed by Rousseeuw and Yohai (1984) is defined as the p-vector

\[  {\hat\btheta }_ S = \arg \min _\theta S(\btheta )  \]

where the dispersion $S(\btheta )$ is the solution of

\[  {1\over n-p} \sum _{i=1}^ n \chi \left({y_ i-\mb {x}_ i’\btheta \over S}\right) = \beta  \]

Here $\beta $ is set to $\int \chi (s)d \Phi (s)$ such that ${\hat\btheta }_ S$ and $S({\hat\btheta }_ S)$ are asymptotically consistent estimates of $\btheta $ and $\sigma $ for the Gaussian regression model. The breakdown value of the S estimate is

\[ {\beta \over \max _ s\chi (s)}  \]

The ROBUSTREG procedure provides two choices for $\chi $: Tukey’s bisquare function and Yohai’s optimal function.

Tukey’s bisquare function, which you can specify with the option CHIF=TUKEY, is

\[  \chi _{k_0}(s) = \left\{  \begin{array}{ll} 3({s\over k_0})^2 - 3({s\over k_0})^4 + ({s\over k_0})^6,&  {\mbox{if }} |s| \leq k_0 \\ 1 &  {\mbox{ otherwise }} \end{array} \right.  \]

The constant $ k_0$ controls the breakdown value and efficiency of the S estimate. If you specify the efficiency by using the EFF= option, you can determine the corresponding $ k_0$. The default $ k_0$ is 2.9366 such that the breakdown value of the S estimate is 0.25 with a corresponding asymptotic efficiency for the Gaussian model of 75.9%.
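The relationship between $k_0$, $\beta$, and the breakdown value can be checked numerically. The sketch below evaluates $\beta = \int \chi(s)\, d\Phi(s)$ by simple quadrature for the default $k_0 = 2.9366$; since $\max_s \chi(s) = 1$ for the bisquare function, $\beta$ equals the breakdown value. This is an illustration, not procedure code.

```python
from statistics import NormalDist

def chi_tukey(s, k0=2.9366):
    """Tukey's bisquare chi function (CHIF=TUKEY)."""
    if abs(s) > k0:
        return 1.0
    u = (s / k0) ** 2
    return 3 * u - 3 * u ** 2 + u ** 3

# beta = E[chi(Z)] under the standard normal, by Riemann-sum quadrature;
# with max chi = 1, beta is also the breakdown value of the S estimate.
nd = NormalDist()
grid = [-8.0 + i * 0.001 for i in range(16001)]
beta = sum(chi_tukey(s) * nd.pdf(s) * 0.001 for s in grid)
```

The computed $\beta$ comes out close to 0.25, consistent with the stated 25% breakdown value for the default $k_0$.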

The Yohai function, which you can specify with the option CHIF=YOHAI, is

\[  \chi _{k_0}(s) = \left\{  \begin{array}{ll} {s^2 \over 2} &  {\mbox{if }} |s| \leq 2 k_0 \\ k_0^2 \left[ b_0 + b_1({s\over k_0})^2 + b_2({s\over k_0})^4 + b_3({s\over k_0})^6 + b_4({s\over k_0})^8\right] &  {\mbox{if }} 2 k_0 < |s| \leq 3 k_0\\ 3.25 k_0^2 &  {\mbox{if }} |s| >3 k_0 \end{array} \right.  \]

where $ b_0 = 1.792$, $ b_1= -0.972$, $ b_2= 0.432$, $ b_3 = -0.052$, and $ b_4 = 0.002$. If you specify the efficiency by using the EFF= option, you can determine the corresponding $ k_0$. By default, $ k_0$ is set to 0.7405 such that the breakdown value of the S estimate is 0.25 with a corresponding asymptotic efficiency for the Gaussian model of 72.7%.
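The stated coefficients make the three pieces of the Yohai function join continuously, which can be verified directly. This sketch is an illustration of the formula, not procedure code.

```python
B = [1.792, -0.972, 0.432, -0.052, 0.002]   # b_0 .. b_4

def chi_yohai(s, k0=0.7405):
    """Yohai's optimal chi function (CHIF=YOHAI)."""
    t = abs(s) / k0
    if t <= 2.0:
        return s * s / 2.0
    if t <= 3.0:
        return k0 * k0 * sum(b * t ** (2 * i) for i, b in enumerate(B))
    return 3.25 * k0 * k0

# At |s| = 2*k0 the polynomial piece equals 2*k0^2 (matching s^2/2),
# and at |s| = 3*k0 it equals 3.25*k0^2 (matching the constant piece).
```

Evaluating the polynomial at $t = 2$ gives $b_0 + 4b_1 + 16b_2 + 64b_3 + 256b_4 = 2$ exactly, and at $t = 3$ it gives 3.25, so $\chi_{k_0}$ is continuous at both breakpoints.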

Algorithm

The ROBUSTREG procedure implements the algorithm by Marazzi (1993) for the S estimate, which is a refined version of the algorithm proposed by Ruppert (1992). The refined algorithm is briefly described as follows.

Initialize iter = 1.

  1. Draw a random q-subset of the total n observations and compute the regression coefficients by using these q observations (if the regression is degenerate, draw another q-subset), where $q\geq p$ can be specified with the SUBSIZE= option. By default, $ q=p$.

  2. Compute the residuals: $r_ i = y_ i -\sum _{j=1}^ p x_{ij} \theta _ j$ for $ i=1,\ldots ,n$. If iter = 1, set $s^* = 2 {\mbox{median}} \{ |r_ i|, i=1,\ldots ,n \} $; if $ s^* = 0$, set $s^* = \min \{ |r_ i|, i=1,\ldots ,n\} $; otherwise, while $\sum _{i=1}^ n \chi (r_ i / s^*) > (n-p)\beta $, set $ s^* = 1.5 s^*$; then go to step 3. If iter > 1 and $\sum _{i=1}^ n \chi (r_ i / s^*) \leq (n-p)\beta $, go to step 3; otherwise, go to step 5.

  3. Solve for s the equation

    \[  {1\over n-p} \sum _{i=1}^ n \chi (r_ i / s)= \beta  \]

    using an iterative algorithm.

  4. If iter > 1 and $ s > s^*$, go to step 5. Otherwise, set $ s^*=s$ and $\btheta ^* = \btheta $. If $ s^* < \mbox{TOLS}$, return $ s^*$ and $\btheta ^*$; otherwise, go to step 5.

  5. If iter < NREP, set $\mi {iter}=\mi {iter}+1$ and return to step 1; otherwise, return $ s^*$ and $\btheta ^*$.
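The scale equation in step 3 can be solved with a standard fixed-point iteration for M scales. The sketch below uses Tukey's bisquare $\chi$ with its default constant; the iteration scheme shown is one common choice for "an iterative algorithm" and is not necessarily the one PROC ROBUSTREG uses internally. The data are made up for the example.

```python
import math

def chi_tukey(s, k0=2.9366):
    """Tukey's bisquare chi function."""
    if abs(s) > k0:
        return 1.0
    u = (s / k0) ** 2
    return 3 * u - 3 * u ** 2 + u ** 3

def solve_scale(residuals, p, beta=0.25, s0=1.0, tol=1e-10):
    """Fixed-point iteration for s solving (1/(n-p)) * sum chi(r_i/s) = beta.
    Each update rescales s by sqrt of the ratio of the two sides."""
    n = len(residuals)
    s = s0
    for _ in range(200):
        g = sum(chi_tukey(r / s) for r in residuals) / ((n - p) * beta)
        s_new = s * math.sqrt(g)
        if abs(s_new - s) < tol * s:
            break
        s = s_new
    return s

r = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1, -0.9, 0.4, -1.5, 0.7]
s = solve_scale(r, p=2)
```

Because $\chi$ is bounded and nondecreasing in $|r|/s$, the left side of the equation decreases continuously from its maximum to 0 as s grows, so the root is unique and the iteration converges from a positive start.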

The ROBUSTREG procedure does the following refinement step by default. You can request that this refinement not be done by using the NOREFINE option in the PROC ROBUSTREG statement.

  1. Let $\psi = \chi ’$. Using the values $ s^*$ and $\btheta ^*$ from the previous steps, compute M estimates $\btheta _ M$ and $\sigma _ M$ of $\btheta $ and $\sigma $ with the setup for M estimation that is described in the section M Estimation. If $\sigma _ M > s^*$, give a warning and return $ s^*$ and $\btheta ^*$; otherwise, return $\sigma _ M$ and $\btheta _ M$.

You can specify TOLS with the TOLERANCE= option; by default, TOLERANCE=0.001. Alternatively, you can specify NREP with the NREP= option, or you can use NREP=NREP0 or NREP=NREP1 to set NREP according to the following table. NREP=NREP0 is the default.

Table 80.10: Default NREP

     P    NREP0    NREP1
     1      150      500
     2      300     1000
     3      400     1500
     4      500     2000
     5      600     2500
     6      700     3000
     7      850     3000
     8     1250     3000
     9     1500     3000
    >9     1500     3000


Note: At step 1 in the algorithm, a randomly selected q-subset might be degenerate. If the total number of q-subsets from any subgroup is more than 4,000 and the ratio of degenerate q-subsets is more than the threshold specified in the FAILRATIO= option, the algorithm is terminated with an error message.

R-Square and Deviance

The robust version of R-square for the S estimate is defined as

\[  R^2_{S} = 1 - {(n-p)S_ p^2 \over (n-1)S_\mu ^2}  \]

for the model with the intercept term and

\[  R^2_{S} = 1 - {(n-p)S_ p^2 \over n S_0^2}  \]

for the model without the intercept term, where $ S_ p$ is the S estimate of the scale in the full model, $ S_\mu $ is the S estimate of the scale in the regression model with only the intercept term, and $ S_0$ is the S estimate of the scale without any regressor. The deviance D is defined as the optimal value of the objective function on the $\sigma ^2$ scale:

\[  D = S_ p^2  \]

Asymptotic Covariance and Confidence Intervals

Because the S estimate satisfies the same first-order necessary conditions as the M estimate, it has the same asymptotic covariance as the M estimate. All three estimators of the asymptotic covariance for the M estimate in the section Asymptotic Covariance and Confidence Intervals can be used for the S estimate. In addition, the weighted covariance estimator H4 described in that section is available and is the default. Confidence intervals for estimated parameters are computed from the diagonal elements of the estimated asymptotic covariance matrix.