The ROBUSTREG Procedure
Robust regression and outlier detection techniques are widely applicable in econometrics. The following example from Zaman, Rousseeuw, and Orhan (2001) shows how these techniques substantially improve on the ordinary least squares (OLS) results for the growth study of De Long and Summers.
De Long and Summers (1991) studied the national growth of 61 countries from 1960 to 1985 by applying OLS to the following data set, growth:
data growth;
input country$ GDP LFG EQP NEQ GAP @@;
datalines;
Argentin 0.0089 0.0118 0.0214 0.2286 0.6079
Austria 0.0332 0.0014 0.0991 0.1349 0.5809
Belgium 0.0256 0.0061 0.0684 0.1653 0.4109
Bolivia 0.0124 0.0209 0.0167 0.1133 0.8634
Botswana 0.0676 0.0239 0.1310 0.1490 0.9474
Brazil 0.0437 0.0306 0.0646 0.1588 0.8498
Cameroon 0.0458 0.0169 0.0415 0.0885 0.9333
Canada 0.0169 0.0261 0.0771 0.1529 0.1783
Chile 0.0021 0.0216 0.0154 0.2846 0.5402
Colombia 0.0239 0.0266 0.0229 0.1553 0.7695
CostaRic 0.0121 0.0354 0.0433 0.1067 0.7043
Denmark 0.0187 0.0115 0.0688 0.1834 0.4079
Dominica 0.0199 0.0280 0.0321 0.1379 0.8293
Ecuador 0.0283 0.0274 0.0303 0.2097 0.8205
ElSalvad 0.0046 0.0316 0.0223 0.0577 0.8414
Ethiopia 0.0094 0.0206 0.0212 0.0288 0.9805
Finland 0.0301 0.0083 0.1206 0.2494 0.5589
France 0.0292 0.0089 0.0879 0.1767 0.4708
Germany 0.0259 0.0047 0.0890 0.1885 0.4585
Greece 0.0446 0.0044 0.0655 0.2245 0.7924
Guatemal 0.0149 0.0242 0.0384 0.0516 0.7885
Honduras 0.0148 0.0303 0.0446 0.0954 0.8850
HongKong 0.0484 0.0359 0.0767 0.1233 0.7471
India 0.0115 0.0170 0.0278 0.1448 0.9356
Indonesi 0.0345 0.0213 0.0221 0.1179 0.9243
Ireland 0.0288 0.0081 0.0814 0.1879 0.6457
Israel 0.0452 0.0305 0.1112 0.1788 0.6816
Italy 0.0362 0.0038 0.0683 0.1790 0.5441
IvoryCoa 0.0278 0.0274 0.0243 0.0957 0.9207
Jamaica 0.0055 0.0201 0.0609 0.1455 0.8229
Japan 0.0535 0.0117 0.1223 0.2464 0.7484
Kenya 0.0146 0.0346 0.0462 0.1268 0.9415
Korea 0.0479 0.0282 0.0557 0.1842 0.8807
Luxembou 0.0236 0.0064 0.0711 0.1944 0.2863
Madagasc -0.0102 0.0203 0.0219 0.0481 0.9217
Malawi 0.0153 0.0226 0.0361 0.0935 0.9628
Malaysia 0.0332 0.0316 0.0446 0.1878 0.7853
Mali 0.0044 0.0184 0.0433 0.0267 0.9478
Mexico 0.0198 0.0349 0.0273 0.1687 0.5921
Morocco 0.0243 0.0281 0.0260 0.0540 0.8405
Netherla 0.0231 0.0146 0.0778 0.1781 0.3605
Nigeria -0.0047 0.0283 0.0358 0.0842 0.8579
Norway 0.0260 0.0150 0.0701 0.2199 0.3755
Pakistan 0.0295 0.0258 0.0263 0.0880 0.9180
Panama 0.0295 0.0279 0.0388 0.2212 0.8015
Paraguay 0.0261 0.0299 0.0189 0.1011 0.8458
Peru 0.0107 0.0271 0.0267 0.0933 0.7406
Philippi 0.0179 0.0253 0.0445 0.0974 0.8747
Portugal 0.0318 0.0118 0.0729 0.1571 0.8033
Senegal -0.0011 0.0274 0.0193 0.0807 0.8884
Spain 0.0373 0.0069 0.0397 0.1305 0.6613
SriLanka 0.0137 0.0207 0.0138 0.1352 0.8555
Tanzania 0.0184 0.0276 0.0860 0.0940 0.9762
Thailand 0.0341 0.0278 0.0395 0.1412 0.9174
Tunisia 0.0279 0.0256 0.0428 0.0972 0.7838
U.K. 0.0189 0.0048 0.0694 0.1132 0.4307
U.S. 0.0133 0.0189 0.0762 0.1356 0.0000
Uruguay 0.0041 0.0052 0.0155 0.1154 0.5782
Venezuel 0.0120 0.0378 0.0340 0.0760 0.4974
Zambia -0.0110 0.0275 0.0702 0.2012 0.8695
Zimbabwe 0.0110 0.0309 0.0843 0.1257 0.8875
;
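The regression equation they used is
$$\mathrm{GDP} = \beta_0 + \beta_1\,\mathrm{LFG} + \beta_2\,\mathrm{GAP} + \beta_3\,\mathrm{EQP} + \beta_4\,\mathrm{NEQ} + \epsilon$$
where the response variable is the growth in gross domestic product per worker (GDP) and the regressors are labor force growth (LFG), relative GDP gap (GAP), equipment investment (EQP), and nonequipment investment (NEQ).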
The following statements invoke the REG procedure (Chapter 73, The REG Procedure) for the OLS analysis:
proc reg data=growth;
model GDP = LFG GAP EQP NEQ ;
run;
| Parameter Estimates | |||||
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
| Intercept | 1 | -0.01430 | 0.01028 | -1.39 | 0.1697 |
| LFG | 1 | -0.02981 | 0.19838 | -0.15 | 0.8811 |
| GAP | 1 | 0.02026 | 0.00917 | 2.21 | 0.0313 |
| EQP | 1 | 0.26538 | 0.06529 | 4.06 | 0.0002 |
| NEQ | 1 | 0.06236 | 0.03482 | 1.79 | 0.0787 |
The OLS analysis shown in Output 74.3.1 indicates that GAP and EQP have a significant influence on GDP at the 5% level.
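As a check on how the significance statement follows from the table, each t value is the parameter estimate divided by its standard error, referred to a t distribution with 61 − 5 = 56 error degrees of freedom. The worked arithmetic below uses only the rounded values printed in the table; the critical value of roughly 2.00 is an approximation supplied here, not part of the original output.

```latex
\begin{aligned}
t_{\mathrm{EQP}} &= \frac{0.26538}{0.06529} \approx 4.06, \\
t_{\mathrm{GAP}} &= \frac{0.02026}{0.00917} \approx 2.21, \\
t_{\mathrm{NEQ}} &= \frac{0.06236}{0.03482} \approx 1.79, \\
t_{0.975,\,56} &\approx 2.00 \quad \text{(two-sided 5\% critical value, approximate)}
\end{aligned}
```

Only the EQP and GAP statistics exceed the critical value in absolute size. NEQ falls just short, which matches its p-value of 0.0787.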
The following statements invoke the ROBUSTREG procedure with the default M estimation:
ods graphics on;
proc robustreg data=growth plots=all;
model GDP = LFG GAP EQP NEQ / diagnostics leverage;
id country;
run;
ods graphics off;
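For orientation, M estimation replaces the sum of squared residuals with a loss function that grows more slowly for large residuals, so a single aberrant observation has limited pull on the fit. The sketch below assumes the bisquare (Tukey) weight function, which is understood to be the ROBUSTREG default. The symbols θ, σ, c, ρ, and w are notation introduced here for illustration and do not appear in the original example.

```latex
\hat{\theta}_{M} = \arg\min_{\theta} \sum_{i=1}^{n}
  \rho\!\left(\frac{y_i - \mathbf{x}_i^{\top}\theta}{\sigma}\right),
\qquad
w(s) = \frac{\rho'(s)}{s} =
\begin{cases}
  \left(1 - (s/c)^2\right)^2, & |s| \le c \\
  0, & |s| > c
\end{cases}
```

In an iteratively reweighted least squares fit, w(s) is the weight given to an observation whose scaled residual is s. Observations with scaled residuals beyond the tuning constant c receive weight zero; a commonly used value is c ≈ 4.685.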
Output 74.3.2 displays model information and summary statistics for variables in the model.
| Model Information | |
|---|---|
| Data Set | WORK.GROWTH |
| Dependent Variable | GDP |
| Number of Independent Variables | 4 |
| Number of Observations | 61 |
| Method | M Estimation |
| Summary Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | Q1 | Median | Q3 | Mean | Standard Deviation | MAD |
| LFG | 0.0118 | 0.0239 | 0.0281 | 0.0211 | 0.00979 | 0.00949 |
| GAP | 0.5796 | 0.8015 | 0.8863 | 0.7258 | 0.2181 | 0.1778 |
| EQP | 0.0265 | 0.0433 | 0.0720 | 0.0523 | 0.0296 | 0.0325 |
| NEQ | 0.0956 | 0.1356 | 0.1812 | 0.1399 | 0.0570 | 0.0624 |
| GDP | 0.0121 | 0.0231 | 0.0310 | 0.0224 | 0.0155 | 0.0150 |
Output 74.3.3 displays the M estimates. Besides GAP and EQP, the robust analysis also indicates that NEQ is significant. This new finding is explained by Output 74.3.4, which shows that Zambia, the 60th country in the data, is an outlier. Output 74.3.4 also identifies leverage points based on the robust MCD distances; however, there are no serious high-leverage points in this data set.
| Parameter Estimates | |||||||
|---|---|---|---|---|---|---|---|
| Parameter | DF | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Chi-Square | Pr > ChiSq |
| Intercept | 1 | -0.0247 | 0.0097 | -0.0437 | -0.0058 | 6.53 | 0.0106 |
| LFG | 1 | 0.1040 | 0.1867 | -0.2619 | 0.4699 | 0.31 | 0.5775 |
| GAP | 1 | 0.0250 | 0.0086 | 0.0080 | 0.0419 | 8.36 | 0.0038 |
| EQP | 1 | 0.2968 | 0.0614 | 0.1764 | 0.4172 | 23.33 | <.0001 |
| NEQ | 1 | 0.0885 | 0.0328 | 0.0242 | 0.1527 | 7.29 | 0.0069 |
| Scale | 1 | 0.0099 | |||||
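The chi-square statistics and confidence limits in the table can be cross-checked from the printed estimates and standard errors. The following DATA step is a minimal sketch that assumes the statistics are Wald-type, that is, the squared ratio of estimate to standard error referred to a chi-square distribution with 1 degree of freedom, with limits based on the normal quantile. The data set name wald_check and its variable names are made up for illustration. Because the inputs are rounded to four decimals, the results should agree with the table only approximately.

```sas
data wald_check;
   /* estimates and standard errors copied from the M-estimation table */
   input parameter :$9. estimate stderr;
   chisq  = (estimate / stderr)**2;   /* Wald chi-square with 1 df (assumed) */
   pvalue = 1 - probchi(chisq, 1);    /* upper-tail probability              */
   z      = probit(0.975);            /* normal quantile, about 1.96         */
   lower  = estimate - z * stderr;    /* approximate 95% confidence limits   */
   upper  = estimate + z * stderr;
   datalines;
Intercept -0.0247 0.0097
LFG        0.1040 0.1867
GAP        0.0250 0.0086
EQP        0.2968 0.0614
NEQ        0.0885 0.0328
;

proc print data=wald_check noobs;
   var parameter estimate stderr lower upper chisq pvalue;
   format lower upper 8.4 chisq 8.2 pvalue pvalue6.4;
run;
```

For EQP, for example, (0.2968 / 0.0614)² is about 23.4, against 23.33 in the table; the small gap is consistent with rounding of the printed inputs.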
| Diagnostics | ||||||
|---|---|---|---|---|---|---|
| Obs | country | Mahalanobis Distance | Robust MCD Distance | Leverage | Standardized Robust Residual | Outlier |
| 1 | Argentin | 2.6083 | 4.0639 | * | -0.9424 | |
| 5 | Botswana | 3.4351 | 6.7391 | * | 1.4200 | |
| 8 | Canada | 3.1876 | 4.6843 | * | -0.1972 | |
| 9 | Chile | 3.6752 | 5.0599 | * | -1.8784 | |
| 17 | Finland | 2.6024 | 3.8186 | * | -1.7971 | |
| 23 | HongKong | 2.1225 | 3.8238 | * | 1.7161 | |
| 27 | Israel | 2.6461 | 5.0336 | * | 0.0909 | |
| 31 | Japan | 2.9179 | 4.7140 | * | 0.0216 | |
| 53 | Tanzania | 2.2600 | 4.3193 | * | -1.8082 | |
| 57 | U.S. | 3.8701 | 5.4874 | * | 0.1448 | |
| 58 | Uruguay | 2.5953 | 3.9671 | * | -0.0978 | |
| 59 | Venezuel | 2.9239 | 4.1663 | * | 0.3573 | |
| 60 | Zambia | 1.8562 | 2.7135 | | -4.9798 | * |
| 61 | Zimbabwe | 1.9634 | 3.9128 | * | -2.5959 | |
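The flags in the diagnostics table are consistent with a cutoff of 3 on the absolute standardized robust residual and a cutoff of about 3.34 on the robust distance (the square root of the 0.975 chi-square quantile with 4 degrees of freedom); these appear to be the procedure defaults. To work with the diagnostics programmatically rather than reading them off the listing, they can be written to a data set. The following is a sketch, not part of the original example: the OUTPUT statement keywords shown (R=, SR=, RD=, OUTLIER=, LEVERAGE=) should be checked against the ROBUSTREG documentation for your release, the indicator variables are assumed to be coded 0/1, and the names robout, resid, stdres, robdist, is_outlier, and is_leverage are illustrative.

```sas
proc robustreg data=growth;
   model GDP = LFG GAP EQP NEQ / diagnostics leverage;
   id country;
   /* save residuals, robust distances, and the two indicator flags */
   output out=robout r=resid sr=stdres rd=robdist
          outlier=is_outlier leverage=is_leverage;
run;

/* list only the observations flagged as outliers or leverage points */
/* (assumes the flags are coded 1 = flagged, 0 = not flagged)        */
proc print data=robout noobs;
   where is_outlier = 1 or is_leverage = 1;
   var country robdist stdres is_leverage is_outlier;
run;
```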
Output 74.3.5 displays robust versions of goodness-of-fit statistics for the model.
The PLOTS=ALL option generates four diagnostic plots. Output 74.3.6 and Output 74.3.7 are for outlier and leverage-point diagnostics. Output 74.3.8 and Output 74.3.9 are a histogram and a Q-Q plot of the standardized robust residuals, respectively.




The following statements invoke the ROBUSTREG procedure with LTS estimation, which was used by Zaman, Rousseeuw, and Orhan (2001). The results are consistent with those of M estimation.
proc robustreg method=lts(h=33) fwls data=growth;
model GDP = LFG GAP EQP NEQ / diagnostics leverage ;
id country;
run;
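For reference, least trimmed squares (LTS) fits the regression to the h observations that the candidate fit describes best and ignores the rest. The statement below is the standard definition, written with notation introduced here (θ for the coefficient vector, r for residuals); it is a sketch of the criterion, not a description of the search algorithm the procedure uses.

```latex
\hat{\theta}_{\mathrm{LTS}} = \arg\min_{\theta} \sum_{i=1}^{h} r^{2}_{(i)}(\theta),
\qquad
r^{2}_{(1)}(\theta) \le r^{2}_{(2)}(\theta) \le \dots \le r^{2}_{(n)}(\theta),
\qquad
r_i(\theta) = y_i - \mathbf{x}_i^{\top}\theta
% in this example n = 61 and h = 33, so the 28 largest squared residuals
% are left out of the objective
```

With h = 33 of n = 61 observations, the 28 worst-fitting observations do not enter the objective at all, which is what lets the fit resist an outlier such as Zambia.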
| LTS Parameter Estimates | ||
|---|---|---|
| Parameter | DF | Estimate |
| Intercept | 1 | -0.0249 |
| LFG | 1 | 0.1123 |
| GAP | 1 | 0.0214 |
| EQP | 1 | 0.2669 |
| NEQ | 1 | 0.1110 |
| Scale (sLTS) | 0 | 0.0076 |
| Scale (Wscale) | 0 | 0.0109 |
Output 74.3.10 displays the LTS estimates.
| Diagnostics | ||||||
|---|---|---|---|---|---|---|
| Obs | country | Mahalanobis Distance | Robust MCD Distance | Leverage | Standardized Robust Residual | Outlier |
| 1 | Argentin | 2.6083 | 4.0639 | * | -1.0715 | |
| 5 | Botswana | 3.4351 | 6.7391 | * | 1.6574 | |
| 8 | Canada | 3.1876 | 4.6843 | * | -0.2324 | |
| 9 | Chile | 3.6752 | 5.0599 | * | -2.0896 | |
| 17 | Finland | 2.6024 | 3.8186 | * | -1.6367 | |
| 23 | HongKong | 2.1225 | 3.8238 | * | 1.7570 | |
| 27 | Israel | 2.6461 | 5.0336 | * | 0.2334 | |
| 31 | Japan | 2.9179 | 4.7140 | * | 0.0971 | |
| 53 | Tanzania | 2.2600 | 4.3193 | * | -1.2978 | |
| 57 | U.S. | 3.8701 | 5.4874 | * | 0.0605 | |
| 58 | Uruguay | 2.5953 | 3.9671 | * | -0.0857 | |
| 59 | Venezuel | 2.9239 | 4.1663 | * | 0.4113 | |
| 60 | Zambia | 1.8562 | 2.7135 | | -4.4984 | * |
| 61 | Zimbabwe | 1.9634 | 3.9128 | * | -2.1201 | |
Output 74.3.11 displays outlier and leverage-point diagnostics based on the LTS estimates.
| Parameter Estimates for Final Weighted Least Squares Fit | |||||||
|---|---|---|---|---|---|---|---|
| Parameter | DF | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Chi-Square | Pr > ChiSq |
| Intercept | 1 | -0.0222 | 0.0093 | -0.0405 | -0.0039 | 5.65 | 0.0175 |
| LFG | 1 | 0.0446 | 0.1771 | -0.3026 | 0.3917 | 0.06 | 0.8013 |
| GAP | 1 | 0.0245 | 0.0082 | 0.0084 | 0.0406 | 8.89 | 0.0029 |
| EQP | 1 | 0.2824 | 0.0581 | 0.1685 | 0.3964 | 23.60 | <.0001 |
| NEQ | 1 | 0.0849 | 0.0314 | 0.0233 | 0.1465 | 7.30 | 0.0069 |
| Scale | 0 | 0.0116 | |||||
Output 74.3.12 displays the final weighted least squares estimates, which are identical to those reported in Zaman, Rousseeuw, and Orhan (2001).
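One way to see what the final weighted least squares step does: if, as is usual for this kind of reweighting, observations flagged as outliers by the LTS fit receive weight 0 and all others receive weight 1, then with Zambia as the only flagged outlier the fit reduces to OLS on the remaining 60 countries. The following sketch rests on that assumption. Its coefficient estimates should be close to those in the table above, while the standard errors and tests may differ somewhat because PROC REG reports t tests with its own scale estimate.

```sas
/* OLS with the single LTS-flagged outlier removed (assumes 0/1 weights) */
proc reg data=growth;
   where country ne 'Zambia';
   model GDP = LFG GAP EQP NEQ;
run;
quit;
```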
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.