Modeling in the Presence of Outliers
In practice, modeling and forecasting time series data in the presence of outliers is difficult
for several reasons. Outliers can adversely affect the model identification
and estimation steps, and outliers close to the end of the observation period can
seriously degrade the forecasting performance of the model.
In some cases, level shifts are associated with changes in the mechanism driving the observation process,
and separate models may be appropriate for different sections of the data.
In view of all these difficulties, diagnostic tools such as outlier detection and residual analysis are
essential in any modeling process.
The following modeling strategy, which incorporates level-shift detection into the familiar Box-Jenkins
modeling methodology, seems to work well in many cases:
- Proceed with model identification and estimation as usual. Suppose this results in a tentative ARIMA model,
say M.
- Check for suspected level shifts unaccounted for by model M using the OUTLIER statement. In this step,
unless there is evidence to justify it, the number of level shifts searched for should be kept small.
- Augment the original dataset with the regression variables corresponding to the detected outliers.
- Include the first few of these regression variables in M, and call this model M1. Re-estimate all the
parameters of M1. To avoid over-fitting, do not include too many of these outlier variables
in the model.
- Check the adequacy of M1 by examining the parameter estimates, analyzing the residuals, and
repeating the outlier detection. Refine the model further if necessary.
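The detection and augmentation steps above can be illustrated with a small sketch. It is written in Python for illustration only and is not the algorithm used by the OUTLIER statement. It assumes that the residuals of the tentative model M are approximately white noise, so that a level shift at time t0 can be tested by comparing the residual means before and after t0. The function names, the threshold of 3.0, and the margin parameter are hypothetical choices made for this example.

```python
import math
from statistics import mean


def level_shift_regressor(n, t0):
    """Step dummy for a level shift: 0 before index t0, 1 from t0 onward."""
    return [0.0 if t < t0 else 1.0 for t in range(n)]


def shift_statistic(resid, t0):
    """Return (estimated shift, t-like statistic) for a shift at t0.

    Assumption: the residuals are white noise, so the least-squares
    estimate of the shift is the difference of the two segment means.
    """
    before, after = resid[:t0], resid[t0:]
    m_before, m_after = mean(before), mean(after)
    shift = m_after - m_before
    # Pooled residual variance around the two segment means.
    ss = sum((r - m_before) ** 2 for r in before)
    ss += sum((r - m_after) ** 2 for r in after)
    sigma = math.sqrt(ss / (len(resid) - 2))
    se = sigma * math.sqrt(1.0 / len(before) + 1.0 / len(after))
    if se == 0.0:
        return shift, (math.inf if shift != 0.0 else 0.0)
    return shift, shift / se


def detect_level_shifts(resid, max_num=2, threshold=3.0, margin=2):
    """Search for at most max_num level shifts, one at a time.

    Each pass picks the time point with the largest absolute statistic,
    stops if it is below the threshold, and otherwise removes the
    estimated shift before searching again.
    """
    resid = list(resid)
    n = len(resid)
    found = []
    for _ in range(max_num):
        candidates = [
            (t0,) + shift_statistic(resid, t0)
            for t0 in range(margin, n - margin + 1)
        ]
        t0, shift, stat = max(candidates, key=lambda c: abs(c[2]))
        if abs(stat) < threshold:
            break
        found.append((t0, shift))
        step = level_shift_regressor(n, t0)
        resid = [r - shift * d for r, d in zip(resid, step)]
    return found


# Illustrative residuals: alternating noise with a shift of 5 from t = 25 on.
residuals = [0.5 * (-1) ** t + (5.0 if t >= 25 else 0.0) for t in range(40)]
shifts = detect_level_shifts(residuals, max_num=2)

# Regression variables to add to the data set before re-estimating M1.
regressors = [level_shift_regressor(len(residuals), t0) for t0, _ in shifts]
```

Keeping max_num small mirrors the advice to search for only a few level shifts. The step dummies in regressors play the role of the regression variables that are added to the data set, and the re-estimation of M1 with those variables as inputs is left to the ARIMA software.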
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.