# SAS/ETS Examples

## Forecasting a Seasonal ARMA Process


# Overview

Many economic and business variables are affected by seasonal factors. For example, power usage is highest in the months when temperatures are most extreme. The most common type of seasonality is variation due to the time of year, but other types of seasonality are also found in time series data.

Seasonal models are often multiplicative rather than additive. A multiplicative model includes the product of one or more nonseasonal parameters with one or more seasonal parameters. For example, a multiplicative model with both autoregressive and moving average terms (an ARMA model) and with yearly seasonality for a monthly time series, $y_t$, can be written as:

$$(1 - \phi_1 B)(1 - \phi_{12} B^{12})(y_t - \mu) = (1 - \theta_1 B)(1 - \theta_{12} B^{12})\,\epsilon_t$$

where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\epsilon_t$ is the random error term, and

- $\mu$ is the intercept parameter.
- $\phi_1$ is the nonseasonal first-order autoregressive parameter.
- $\phi_{12}$ is the seasonal autoregressive parameter.
- $\theta_1$ is the nonseasonal first-order moving average parameter.
- $\theta_{12}$ is the seasonal moving average parameter.
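
Multiplying out the factors shows where the "product" terms of a multiplicative model come from. The following is a sketch under stated assumptions: it uses backshift notation ($B y_t = y_{t-1}$), writes $\tilde{y}_t = y_t - \mu$, and uses the illustrative symbol names $\phi_1$, $\phi_{12}$, $\theta_1$, $\theta_{12}$ for the nonseasonal and seasonal AR and MA parameters, with the moving average factors written as $(1 - \theta B)$.

$$(1 - \phi_1 B)(1 - \phi_{12} B^{12}) = 1 - \phi_1 B - \phi_{12} B^{12} + \phi_1 \phi_{12} B^{13}$$

$$\tilde{y}_t = \phi_1 \tilde{y}_{t-1} + \phi_{12} \tilde{y}_{t-12} - \phi_1 \phi_{12} \tilde{y}_{t-13} + \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_{12} \epsilon_{t-12} + \theta_1 \theta_{12} \epsilon_{t-13}$$

The lag 13 coefficients are not free parameters. They are fixed by the products $\phi_1 \phi_{12}$ and $\theta_1 \theta_{12}$, which is what distinguishes the multiplicative form from an additive model with separate lag 1 and lag 12 terms.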

To identify a seasonal model, you need to examine the autocorrelation function (ACF) and the inverse autocorrelation function (IACF) plots. For multiplicative MA processes, there are small spikes in the ACF plot q lags before and after the seasonal lag, where q is the number of nonseasonal MA parameters necessary to model the data. These small spikes are usually in the opposite direction of the seasonal spike. For example, a multiplicative MA(1, 12) process typically has small spikes at lags 11 and 13 on either side of, and in the opposite direction of, a large spike at lag 12.

An additive MA process typically has small spikes q lags before the seasonal lag, where q is the number of nonseasonal MA parameters necessary to model the data. For example, an additive MA(1, 12) process typically has a small spike at lag 11 and a larger spike at lag 12.
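
The spike patterns described above can be checked against the theoretical autocovariances. This is a sketch for a pure MA(1, 12) process $z_t$ with error variance $\sigma^2$, assuming the $(1 - \theta B)$ sign convention; the symbols $\theta_1$ and $\theta_{12}$ are illustrative names for the nonseasonal and seasonal MA parameters.

For the multiplicative process $z_t = (1 - \theta_1 B)(1 - \theta_{12} B^{12})\,\epsilon_t$:

$$\gamma(1) = -\theta_1 (1 + \theta_{12}^2)\,\sigma^2, \qquad \gamma(11) = \gamma(13) = \theta_1 \theta_{12}\,\sigma^2, \qquad \gamma(12) = -\theta_{12} (1 + \theta_1^2)\,\sigma^2$$

For the additive process $z_t = (1 - \theta_1 B - \theta_{12} B^{12})\,\epsilon_t$:

$$\gamma(1) = -\theta_1\,\sigma^2, \qquad \gamma(11) = \theta_1 \theta_{12}\,\sigma^2, \qquad \gamma(12) = -\theta_{12}\,\sigma^2, \qquad \gamma(13) = 0$$

In the multiplicative case the ratio $\gamma(11)/\gamma(12) = -\theta_1 / (1 + \theta_1^2)$ is negative whenever $\theta_1 > 0$, which is why the side spikes usually point in the opposite direction from the seasonal spike. In the additive case there is no lag 13 spike at all.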

To identify an AR process, look for the patterns described previously in the IACF plot rather than in the ACF plot. If a process contains both AR and MA components, the patterns may appear in both the ACF and IACF plots.

This example develops an ARMA model for steel shipments from U.S. steel mills.

# Analysis

The identification and estimation of Autoregressive Integrated Moving Average (ARIMA) models is more of an art than a science. Generally, the most parsimonious model fitting the data is considered the best. This example uses steel shipments data taken from Metal Statistics 1993. The values represent monthly totals of steel products shipped from U.S. steel mills, in thousands of net tons, for the period from January 1984 to December 1991. The following statements create the data set STEEL.

   data steel;
      input date:monyy5. steelshp @@;
      format date monyy5.;
      title 'U.S. Steel Shipments Data';
      title2 '(thousands of net tons)';
      datalines;
   JAN84 5980 FEB84 6150 MAR84 7240 APR84 6472 MAY84 6948 JUN84 6686
   JUL84 5820 AUG84 6033 SEP84 5454 OCT84 6087 NOV84 5317 DEC84 4867
   ... more data lines ...
   ;
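
Before moving to identification, it can help to confirm that the data set was read as intended. The following check is not part of the original example; it is a minimal sketch that assumes the complete set of data lines has been supplied. Monthly data from January 1984 to December 1991 should give 96 nonmissing observations.

```sas
/* Sanity check (illustrative, not part of the original example) */
proc means data=steel n nmiss min max mean;
   var steelshp;            /* expect N=96 once all data lines are present */
run;

/* List the first year of observations to verify the date values */
proc print data=steel(obs=12);
run;
```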


The analysis performed by the ARIMA procedure is divided into three stages, corresponding to the stages described by Box and Jenkins (1976). The IDENTIFY, ESTIMATE, and FORECAST statements perform these three stages. In the identification stage, you use the IDENTIFY statement to specify the response series and identify candidate ARIMA models for it. The IDENTIFY statement reads time series that are to be used in later statements, possibly differencing them, and computes autocorrelations, inverse autocorrelations, partial autocorrelations, and cross correlations. The analysis of this output usually suggests one or more ARIMA models that could be fit. The VAR= option specifies the variable to be identified.

   proc arima data=steel;
      identify var=steelshp;
   run;


 U.S. Steel Shipments Data (thousands of net tons)

 The ARIMA Procedure

                                  Autocorrelations

    Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1  Std Error
      0      406442      1.00000  |                    |********************|          0
      1      262630      0.64617  |                .   |*************       |   0.102062
      2      261597      0.64363  |              .     |*************       |   0.138258
      3      235909      0.58042  |             .      |************        |   0.166570
      4      168515      0.41461  |             .      |********            |   0.186451
      5      201896      0.49674  |            .       |**********          |   0.195820
      6      129000      0.31739  |            .       |****** .            |   0.208533
      7      152701      0.37570  |           .        |********.           |   0.213506
      8      113117      0.27831  |           .        |******  .           |   0.220285
      9      127532      0.31378  |           .        |******  .           |   0.223918
     10      137000      0.33707  |           .        |******* .           |   0.228452
     11      130723      0.32163  |           .        |******  .           |   0.233575
     12      200408      0.49308  |          .         |**********          |   0.238144
     13      112496      0.27678  |          .         |******   .          |   0.248551
     14      135119      0.33244  |          .         |*******  .          |   0.251741
     15      103295      0.25414  |          .         |*****    .          |   0.256273
     16   62982.090      0.15496  |          .         |***      .          |   0.258885
     17      108381      0.26666  |          .         |*****    .          |   0.259850
     18   42836.479      0.10539  |         .          |**        .         |   0.262685
     19   65840.039      0.16199  |         .          |***       .         |   0.263125
     20   37765.859      0.09292  |         .          |**        .         |   0.264162
     21   27790.106      0.06837  |         .          |*         .         |   0.264502
     22   40303.846      0.09916  |         .          |**        .         |   0.264686
     23   46097.710      0.11342  |         .          |**        .         |   0.265073
     24   76317.464      0.18777  |         .          |****      .         |   0.265578

 "." marks two standard errors

The large spike at lag 12 in the ACF plot provides evidence that the steel shipments time series has a seasonal autoregressive component. The lack of a large spike at lag 24 indicates that the series is stationary at the seasonal level.
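
For contrast, a series that was not stationary at the seasonal level would typically be differenced at the seasonal lag before identification. The statements below are a hypothetical alternative that this example does not use. They show the `VAR=variable(lag)` differencing syntax of the IDENTIFY statement, with `NLAG=24` chosen here only for illustration.

```sas
/* Hypothetical alternative, not needed for the steel shipments series:
   take a lag-12 (seasonal) difference before computing the ACF and IACF */
proc arima data=steel;
   identify var=steelshp(12) nlag=24;
run;
quit;
```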

                           Inverse Autocorrelations

    Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
      1     -0.37291  |             *******|   .                |
      2      0.08136  |                .   |** .                |
      3     -0.31032  |              ******|   .                |
      4      0.16197  |                .   |***.                |
      5     -0.20750  |                ****|   .                |
      6      0.16115  |                .   |***.                |
      7     -0.02341  |                .   |   .                |
      8      0.06910  |                .   |*  .                |
      9      0.00628  |                .   |   .                |
     10      0.02046  |                .   |   .                |
     11      0.02875  |                .   |*  .                |
     12     -0.23279  |               *****|   .                |
     13      0.03755  |                .   |*  .                |
     14      0.04050  |                .   |*  .                |
     15      0.03498  |                .   |*  .                |
     16      0.09969  |                .   |** .                |
     17     -0.10703  |                . **|   .                |
     18      0.04901  |                .   |*  .                |
     19     -0.08634  |                . **|   .                |
     20      0.02281  |                .   |   .                |
     21      0.00844  |                .   |   .                |
     22      0.10510  |                .   |** .                |
     23     -0.10923  |                . **|   .                |
     24      0.02676  |                .   |*  .                |

The spikes at lags 1 and 3 in the IACF plot indicate that other components are necessary to fit an adequate model. The autocorrelation check for white noise resoundingly rejects the null hypothesis that the series is white noise: the p-value is less than .0001 at every group of lags.

                      Autocorrelation Check for White Noise

     To     Chi-           Pr >
    Lag   Square    DF    ChiSq   -------------Autocorrelations-------------
      6   170.51     6   <.0001   0.646  0.644  0.580  0.415  0.497  0.317
     12   255.47    12   <.0001   0.376  0.278  0.314  0.337  0.322  0.493
     18   296.96    18   <.0001   0.277  0.332  0.254  0.155  0.267  0.105
     24   309.34    24   <.0001   0.162  0.093  0.068  0.099  0.113  0.188
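
The chi-square values in this table can be reproduced from the printed autocorrelations. The DATA step below is a sketch that assumes the statistic takes the Ljung-Box form, Q = n(n+2) Σ r_k² / (n−k), with n = 96 observations. The variable names are illustrative. Using the six autocorrelations from the identification output, the result should come out close to the 170.51 reported for lags 1 through 6.

```sas
/* Recompute the lag-6 white noise statistic, assuming the Ljung-Box form */
data _null_;
   n = 96;                                  /* number of observations      */
   array r[6] _temporary_                   /* autocorrelations, lags 1-6  */
      (0.64617 0.64363 0.58042 0.41461 0.49674 0.31739);
   s = 0;
   do k = 1 to 6;
      s = s + r[k]**2 / (n - k);            /* weighted sum of squared r_k */
   end;
   q = n * (n + 2) * s;                     /* chi-square statistic        */
   p = 1 - probchi(q, 6);                   /* upper-tail p-value, 6 DF    */
   put q=8.2 p=pvalue6.4;
run;
```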

In the estimation and diagnostic checking stage, you use the ESTIMATE statement to specify the ARIMA model to fit to the variable specified in the previous IDENTIFY statement and to estimate the parameters of that model. The ESTIMATE statement also produces diagnostic statistics to help you judge the adequacy of the model.

Significance tests for parameter estimates indicate whether some terms in the model may be unnecessary. Goodness-of-fit statistics aid in comparing this model to others. Tests for white noise residuals indicate whether the residual series contains additional information that might be used by a more complex model. If the diagnostic tests indicate problems with the model, you try another model, then repeat the estimation and diagnostic checking stage.

The following statement fits a seasonal ARMA model to the time series. In the syntax of the ESTIMATE statement, the two multiplicative AR terms, denoted by the P= option, are enclosed in separate parentheses. The two additive MA terms, denoted by the Q= option, are separated by a space within a single set of parentheses.

      estimate p=(2)(12) q=(1 3);
   run;
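
Written out, the `P=(2)(12) Q=(1 3)` specification corresponds to the following model. This is a sketch that assumes backshift notation and the $(1 - \theta B)$ sign convention for moving average terms; the subscripts follow the factor and term labels that PROC ARIMA prints (AR1,1, AR2,1, MA1,1, MA1,2).

$$(1 - \phi_{1,1} B^{2})(1 - \phi_{2,1} B^{12})(y_t - \mu) = (1 - \theta_{1,1} B - \theta_{1,2} B^{3})\,\epsilon_t$$

Had the moving average terms been given in separate parentheses, as in `Q=(1)(3)`, the right-hand side would instead be the product $(1 - \theta_{1,1} B)(1 - \theta_{2,1} B^{3})\,\epsilon_t$, which implies an additional lag 4 term with coefficient $\theta_{1,1}\theta_{2,1}$.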


                       Autocorrelation Check of Residuals

     To     Chi-           Pr >
    Lag   Square    DF    ChiSq   ----------------Autocorrelations----------------
      6     2.42     2   0.2979   -0.009  -0.051   0.071   0.070   0.104   0.018
     12     3.63     8   0.8891   -0.084   0.032  -0.024   0.013  -0.033  -0.035
     18    11.86    14   0.6176   -0.082   0.168   0.014  -0.137   0.107   0.073
     24    16.16    20   0.7066    0.023   0.019  -0.010  -0.047   0.174  -0.000

    Model for variable steelshp

    Estimated Mean    6057.122

    Autoregressive Factors
    Factor 1:  1 - 0.54234 B**(2)
    Factor 2:  1 - 0.64802 B**(12)

    Moving Average Factors
    Factor 1:  1 + 0.55505 B**(1) + 0.43689 B**(3)

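The Autocorrelation Check of Residuals shows that none of the Q statistics is statistically significant; the smallest p-value is 0.2979. The degrees of freedom for each test equal the number of lags minus the four estimated ARMA parameters. Because the hypothesis of white noise residuals cannot be rejected, the model appears to provide an adequate fit to the data.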

                     Conditional Least Squares Estimation

                                Standard                Approx
    Parameter     Estimate         Error    t Value   Pr > |t|    Lag
    MU              6057.1     232.96713      26.00     <.0001      0
    MA1,1         -0.55505       0.08021      -6.92     <.0001      1
    MA1,2         -0.43689       0.07936      -5.51     <.0001      3
    AR1,1          0.54234       0.09903       5.48     <.0001      2
    AR2,1          0.64802       0.09392       6.90     <.0001     12

    Constant Estimate       975.739
    Variance Estimate        126334
    Std Error Estimate      355.435
    AIC                     1404.98
    SBC                     1417.81
    Number of Residuals          96

All of the estimated parameters have t values greater than 5 in absolute value, with approximate p-values below .0001, which indicates that none of these parameters can be omitted from the model.
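
Several of the summary statistics can be cross-checked by hand from the printed estimates. The relationships below are a sketch: they assume that the constant estimate is the mean multiplied by the autoregressive factors evaluated at $B = 1$, that the standard error estimate is the square root of the variance estimate, and that each t value is the estimate divided by its standard error. Small discrepancies are expected because the printed estimates are rounded.

$$\text{Constant} = \hat{\mu}\,(1 - \hat{\phi}_{1,1})(1 - \hat{\phi}_{2,1}) = 6057.122 \times (1 - 0.54234) \times (1 - 0.64802) \approx 975.7$$

$$\text{Std Error Estimate} = \sqrt{126334} \approx 355.43$$

$$t_{\text{AR1,1}} = \frac{0.54234}{0.09903} \approx 5.48, \qquad t_{\text{MA1,1}} = \frac{-0.55505}{0.08021} \approx -6.92$$

These agree with the reported values of 975.739, 355.435, 5.48, and -6.92.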

In the forecasting stage, you use the FORECAST statement to forecast future values of the time series and to generate confidence intervals for these forecasts from the ARIMA model produced by the preceding ESTIMATE statement.

The following statement produces forecasts and upper and lower 95% confidence limits for 12 future periods and creates the output data set STEEL2. The NOPRINT option suppresses the printed listing of the forecasts.

      forecast lead=12
               out=steel2
               id=date
               interval=month
               noprint;
   run;
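
Because the NOPRINT option suppresses the forecast listing, the forecasts are available only in the output data set. The following step is a minimal sketch, not part of the original example, for listing the 12 forecast periods. It assumes the OUT= data set contains the variables FORECAST, STD, L95, and U95 alongside the ID variable and the response series.

```sas
/* List only the future periods (January 1992 onward) from the OUT= data set */
proc print data=steel2;
   where date ge '01jan92'd;
   var date steelshp forecast std l95 u95;
run;
```

In these rows the response variable is missing, because no actual shipments are available for the forecast horizon.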


To prepare the output data set for plotting, change the values for the forecasts and confidence limits to missing for all dates prior to the future forecast periods.

   data steel3;
      set steel2;
      if date lt '01jan92'd then do;
         forecast=.;
         l95=.;
         u95=.;
      end;
   run;


Use the GPLOT procedure to plot the data.

   proc gplot data=steel3;
      format date year4.;
      plot steelshp*date=1
           forecast*date=2
           l95*date=3
           u95*date=3 / overlay cframe=ligr
                        haxis=axis1 vaxis=axis2
                        vminor=1 href='01jan92'd;
      title 'U.S. Steel Shipments Data';
      title2 '(thousands of net tons)';
      axis1 offset=(1 cm)
            label=('Year') minor=none
            order=('01jan84'd to '01jan93'd by year);
      axis2 label=(angle=90 'Steel Shipments')
            order=(4500 to 8500 by 1000);
      symbol1 c=blue  i=join l=1 v=star;
      symbol2 c=red   i=join l=1 v=F;
      symbol3 c=green i=join l=20;
   run;
   quit;
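
On releases of SAS that include the SGPLOT procedure, a similar plot can be drawn with fewer statements. The following is a hedged alternative sketch, not the method used in this example. It assumes the STEEL3 data set created above and draws the confidence limits as a shaded band rather than as dashed lines; the legend labels are illustrative.

```sas
/* Alternative sketch using PROC SGPLOT (assumes a release that provides it) */
proc sgplot data=steel3;
   band x=date lower=l95 upper=u95 /
        transparency=0.5 legendlabel='95% Confidence Limits';
   series x=date y=steelshp / legendlabel='Actual';
   series x=date y=forecast / legendlabel='Forecast';
   refline '01jan92'd / axis=x;          /* start of the forecast horizon */
   xaxis label='Year';
   yaxis label='Steel Shipments';
   format date year4.;
run;
```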

The values of the original steel shipments time series are plotted with the star symbol. The forecasts are plotted with the F symbol, and the upper and lower 95% confidence limits for the forecasts are plotted with dashed lines.

Because the model fit to the steel shipments data includes a seasonal component, the forecasts do not follow a simple linear trend. Instead, the forecasts show variability due to the season (month of the year).

# References

Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Chilton Publications (1993), Metal Statistics 1993, New York: Chilton Publications.

Hamilton, J. (1994), Time Series Analysis, Princeton, NJ: Princeton University Press.

SAS Institute Inc. (1996), Forecasting Examples for Business and Economics Using the SAS System, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.