Distribution Analyses
The sample standard deviation is a commonly used estimator of the population scale. However, it is sensitive to outliers and may not remain bounded when a single data point is replaced by an arbitrary number. With robust scale estimators, the estimates remain bounded even when a portion of the data points are replaced by arbitrary numbers.
A simple robust scale estimator is the interquartile range, which is the difference between the upper and lower quartiles. For a normal population, the standard deviation can be estimated by dividing the interquartile range by 1.34898.
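The interquartile-range estimate can be sketched in a few lines of Python. This is a minimal illustration, not the software's own routine: the function name `iqr_sigma` is hypothetical, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is an assumption. Other quartile definitions can give slightly different values on small samples.

```python
import statistics

def iqr_sigma(data):
    """Estimate the standard deviation as IQR / 1.34898.

    The divisor is the interquartile range of a standard normal
    distribution, so the estimate targets sigma only for normal data.
    The quartile definition (method="inclusive") is an assumption.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 1.34898

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(iqr_sigma(sample))            # IQR-based estimate
print(statistics.stdev(sample))     # sample standard deviation, for comparison
```

Replacing the largest value in `sample` with an arbitrarily large number leaves the quartiles, and therefore the estimate, unchanged, while the sample standard deviation grows without bound.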
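Gini's mean difference is also a robust estimator of the standard deviation $\sigma$. It is computed as
$$G = \frac{1}{\binom{n}{2}} \sum_{i<j} |y_i - y_j|$$
where $y_1, \ldots, y_n$ are the observations.
If the observations are from a normal distribution, then $\sqrt{\pi}\, G / 2$ is an unbiased estimator of the standard deviation $\sigma$.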
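As an illustration, Gini's mean difference is the average absolute difference over all pairs of observations, and for normal data it is rescaled by the factor sqrt(pi)/2 to estimate the standard deviation. The sketch below uses a direct pairwise loop; the function names are hypothetical and the quadratic running time is acceptable only for small samples.

```python
import math
from itertools import combinations

def gini_mean_difference(data):
    """Average of |y_i - y_j| over all pairs i < j."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    total = sum(abs(a - b) for a, b in combinations(data, 2))
    return total / math.comb(n, 2)

def gini_sigma(data):
    """Estimate of sigma that is unbiased when the data are normal."""
    return math.sqrt(math.pi) / 2 * gini_mean_difference(data)

print(gini_mean_difference([1, 2, 4]))  # pairs differ by 1, 3, 2
print(gini_sigma([1, 2, 4]))
```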
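A very robust scale estimator is the median absolute deviation (MAD) about the median (Hampel 1974):
$$\mathrm{MAD} = \mathrm{med}_i \left( \left| y_i - \mathrm{med}_j (y_j) \right| \right)$$
where the inner median, $\mathrm{med}_j(y_j)$, is the median of the $n$ observations and the outer median is the median of the $n$ absolute deviations about that median.
For a normal distribution, $1.4826 \cdot \mathrm{MAD}$ can be used to estimate the standard deviation $\sigma$.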
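The MAD takes two medians: first the median of the data, then the median of the absolute deviations from it. A minimal sketch follows, with hypothetical function names and the standard library's `statistics.median` assumed as the median definition (it averages the two middle values when the count is even).

```python
import statistics

def mad(data):
    """Median absolute deviation about the median."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])

def mad_sigma(data):
    """MAD rescaled by 1.4826 to estimate sigma for normal data."""
    return 1.4826 * mad(data)

sample = [1, 2, 3, 4, 100]          # one gross outlier
print(mad_sigma(sample))            # barely affected by the outlier
print(statistics.stdev(sample))     # dominated by the outlier
```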
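The MAD statistic has low efficiency for normal distributions, and it may not be appropriate for symmetric distributions. Rousseeuw and Croux (1993) proposed two new statistics as alternatives to the MAD statistic, $S_n$ and $Q_n$.
The first statistic is computed as
$$S_n = 1.1926 \, \mathrm{med}_i \left( \mathrm{med}_j \left( |y_i - y_j| \right) \right)$$
where, for each $i$, the inner median is taken over the $n$ values $|y_i - y_j|$, $j = 1, \ldots, n$, and the outer median is the median of the resulting $n$ inner medians.
To reduce small-sample bias, $c_{sn} S_n$ is used to estimate the standard deviation $\sigma$, where $c_{sn}$ is a correction factor (Croux and Rousseeuw 1992).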
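The second statistic is computed as
$$Q_n = 2.2219 \, \left\{ |y_i - y_j| ;\; i < j \right\}_{(k)}$$
where $k = \binom{h}{2}$, $h = \lfloor n/2 \rfloor + 1$, and $\lfloor n/2 \rfloor$ is the integer part of $n/2$. That is, $Q_n$ is 2.2219 times the $k$th order statistic of the $\binom{n}{2}$ distances between data points.
The bias-corrected statistic $c_{qn} Q_n$ is used to estimate the standard deviation $\sigma$, where $c_{qn}$ is the correction factor.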
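The uncorrected Qn statistic can be sketched directly from its usual definition: sort the absolute differences between all pairs of observations, pick the k-th smallest with k = C(h, 2) and h = floor(n/2) + 1, and multiply by 2.2219. The function name is hypothetical, and the correction factor is omitted because its values are not given in this section.

```python
import math
from itertools import combinations

def qn_scale(data):
    """Uncorrected Qn: 2.2219 times the k-th smallest pairwise distance.

    No small-sample correction factor is applied (assumption: the
    caller multiplies by the appropriate factor if one is needed).
    """
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    h = n // 2 + 1
    k = math.comb(h, 2)
    distances = sorted(abs(a - b) for a, b in combinations(data, 2))
    return 2.2219 * distances[k - 1]   # k is a 1-based rank

print(qn_scale([1, 2, 3, 4, 100]))
```

Sorting all pairwise distances is simple but needs memory and time quadratic in the sample size, so this form is meant for small samples only.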
A Robust Measures of Scale table includes the interquartile range, Gini's mean difference, MAD, $S_n$, and $Q_n$, with their corresponding estimates of $\sigma$, as shown in Figure 38.14.
Figure 38.14: Robust Measures of Scale and Tests for Normality
Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.