Forecasting a Seasonal ARMA Process

Overview

Many economic and business variables are affected by seasonal factors. For example, power usage is highest in the months when temperatures are most extreme. The most common type of seasonality is variation due to the time of year, but other types of seasonality are also found in time series data.

Seasonal models are often multiplicative rather than additive. A multiplicative model includes the product of one or more nonseasonal parameters with one or more seasonal parameters. For example, a multiplicative model with both autoregressive and moving average terms (an ARMA model) and with yearly seasonality for a time series, y_t, can be written as:
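
$$(1 - \phi_1 B)(1 - \Phi_{12} B^{12})\, y_t = \mu + (1 - \theta_1 B)(1 - \Theta_{12} B^{12})\, \varepsilon_t$$

where

$\mu$ is the intercept parameter.

$\phi_1$ is the nonseasonal first-order autoregressive parameter.

$\Phi_{12}$ is the seasonal autoregressive parameter.

$\theta_1$ is the nonseasonal first-order moving average parameter.

$\Theta_{12}$ is the seasonal moving average parameter.

$B$ is the backshift operator, defined by $B^k y_t = y_{t-k}$, and $\varepsilon_t$ is a white noise error term.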
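Multiplying out the nonseasonal and seasonal factors shows what makes the model multiplicative. Writing $\phi_1$ and $\Phi_{12}$ for the AR parameters, $\theta_1$ and $\Theta_{12}$ for the MA parameters, $\mu$ for the intercept, and $B$ for the backshift operator, the product of the lag-1 and lag-12 factors creates a lag-13 term whose coefficient is fixed by the other two parameters rather than estimated separately:

$$
\begin{aligned}
(1 - \phi_1 B)(1 - \Phi_{12} B^{12}) &= 1 - \phi_1 B - \Phi_{12} B^{12} + \phi_1 \Phi_{12} B^{13} \\
y_t &= \mu + \phi_1 y_{t-1} + \Phi_{12} y_{t-12} - \phi_1 \Phi_{12}\, y_{t-13} \\
    &\quad + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \Theta_{12} \varepsilon_{t-12} + \theta_1 \Theta_{12}\, \varepsilon_{t-13}
\end{aligned}
$$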

To identify a seasonal model, you need to examine the autocorrelation function (ACF) and the inverse autocorrelation function (IACF) plots. For multiplicative MA processes, there are small spikes in the ACF plot q lags before and after the seasonal lag, where q is the number of nonseasonal MA parameters necessary to model the data. These small spikes are usually in the opposite direction of the seasonal spike. For example, a multiplicative MA(1, 12) process typically has small spikes at lags 11 and 13 on either side of, and in the opposite direction of, a large spike at lag 12.

An additive MA process typically has small spikes q lags before the seasonal lag, where q is the number of nonseasonal MA parameters necessary to model the data. For example, an additive MA(1, 12) process typically has a small spike at lag 11 and a larger spike at lag 12.

To identify an AR process, look for the patterns described previously in the IACF plot rather than in the ACF plot. If a process contains both AR and MA components, the patterns may appear in both the ACF and IACF plots.
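The spike patterns described above follow from the theoretical autocorrelations of an MA process $y_t = \psi(B)\varepsilon_t$, namely $\rho_k = \sum_j \psi_j \psi_{j+k} / \sum_j \psi_j^2$. The Python sketch below computes them for a multiplicative and an additive MA process with lags 1 and 12, using the hypothetical values $\theta_1 = 0.5$ and $\Theta_{12} = 0.6$ and the sign convention $1 - \theta B$. Because the IACF of an AR process $\phi(B) y_t = \varepsilon_t$ equals the ACF of the MA process $y_t = \phi(B)\varepsilon_t$, the same function also predicts the IACF patterns for AR terms. The helper names are illustrative only.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = lag)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def lag_poly(coefs):
    """Build 1 - sum(c * B**lag) from a {lag: c} dict (sign convention 1 - theta*B)."""
    out = [0.0] * (max(coefs) + 1)
    out[0] = 1.0
    for lag, c in coefs.items():
        out[lag] -= c
    return out


def ma_acf(psi, max_lag):
    """Theoretical ACF of y_t = psi(B) e_t, where e_t is white noise."""
    gamma = [sum(psi[j] * psi[j + k] for j in range(len(psi) - k))
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


def spikes(r, tol=1e-12):
    """Lags (>= 1) whose autocorrelation is nonzero."""
    return [k for k in range(1, len(r)) if abs(r[k]) > tol]


theta1, theta12 = 0.5, 0.6  # hypothetical parameter values for illustration

# Multiplicative MA(1)(12): (1 - 0.5B)(1 - 0.6B^12)
acf_mult = ma_acf(poly_mul(lag_poly({1: theta1}), lag_poly({12: theta12})), 24)
# Additive MA(1,12): 1 - 0.5B - 0.6B^12
acf_add = ma_acf(lag_poly({1: theta1, 12: theta12}), 24)

print("multiplicative spikes:", spikes(acf_mult))
print("additive spikes:      ", spikes(acf_add))
print("lags 11-13 (multiplicative):", [round(acf_mult[k], 3) for k in (11, 12, 13)])

# For an AR process phi(B) y_t = e_t, ma_acf(phi_coefficients, max_lag) gives its IACF.
```

With these values the multiplicative process has a negative spike at lag 12 and positive spikes at lags 11 and 13, while the additive process has no spike at lag 13. In this sign convention the side spikes point opposite to the seasonal spike whenever $\theta_1 > 0$; with $\theta_1 < 0$ they would share its sign.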

This example develops an ARMA model for steel shipments from U.S. steel mills.

Analysis

The identification and estimation of Autoregressive Integrated Moving Average (ARIMA) models are more of an art than a science. Generally, the most parsimonious model that adequately fits the data is considered the best. This example uses steel shipments data taken from Metal Statistics 1993. The values represent monthly totals of steel products shipped from U.S. steel mills, in thousands of net tons, for the period from January 1984 to December 1991 (96 observations). The following statements create the data set STEEL.

   data steel;
      input date:monyy5. steelshp @@;
      format date monyy5.;
      title 'U.S. Steel Shipments Data';
      title2 '(thousands of net tons)';
      datalines;
   JAN84 5980 FEB84 6150 MAR84 7240 APR84 6472 MAY84 6948 JUN84 6686
   JUL84 5820 AUG84 6033 SEP84 5454 OCT84 6087 NOV84 5317 DEC84 4867
   ... more data lines ...
   ;

The analysis performed by the ARIMA procedure is divided into three stages, corresponding to the stages described by Box and Jenkins (1976). The IDENTIFY, ESTIMATE, and FORECAST statements perform these three stages. In the identification stage, you use the IDENTIFY statement to specify the response series and identify candidate ARIMA models for it. The IDENTIFY statement reads time series that are to be used in later statements, possibly differencing them, and computes autocorrelations, inverse autocorrelations, partial autocorrelations, and cross correlations. The analysis of this output usually suggests one or more ARIMA models that could be fit. The VAR= option specifies the variable to be identified.

   proc arima data=steel;
      /* Identification stage: ACF, IACF, PACF, and white noise check */
      identify var=steelshp;
   run;
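The Correlation column in the output that follows is each sample autocovariance divided by the lag-0 autocovariance; for example, 262630 / 406442 ≈ 0.64617 at lag 1. The Python sketch below shows that estimator, assuming the usual convention of dividing every lag's cross-product sum by the series length n. It illustrates the calculation and is not PROC ARIMA's implementation; `sample_acf` is a hypothetical helper.

```python
def sample_acf(y, nlag):
    """Sample autocorrelations r_0..r_nlag; each lag's sum is divided by n (assumed convention)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]          # deviations from the sample mean
    c0 = sum(d * d for d in dev) / n     # lag-0 autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(nlag + 1)]


# Usage: the twelve 1984 values listed in the DATA step. The printed table is
# based on all 96 observations, so these numbers will not match it.
y1984 = [5980, 6150, 7240, 6472, 6948, 6686, 5820, 6033, 5454, 6087, 5317, 4867]
print([round(r, 3) for r in sample_acf(y1984, 3)])
```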

U.S. Steel Shipments Data

(thousands of net tons)

The ARIMA Procedure

Autocorrelations
Lag	Covariance	Correlation	-1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1	Std Error
0	406442	1.00000	| |********************|	0
1	262630	0.64617	| . |************* |	0.102062
2	261597	0.64363	| . |************* |	0.138258
3	235909	0.58042	| . |************ |	0.166570
4	168515	0.41461	| . |******** |	0.186451
5	201896	0.49674	| . |********** |	0.195820
6	129000	0.31739	| . |****** . |	0.208533
7	152701	0.37570	| . |********. |	0.213506
8	113117	0.27831	| . |****** . |	0.220285
9	127532	0.31378	| . |****** . |	0.223918
10	137000	0.33707	| . |******* . |	0.228452
11	130723	0.32163	| . |****** . |	0.233575
12	200408	0.49308	| . |********** |	0.238144
13	112496	0.27678	| . |****** . |	0.248551
14	135119	0.33244	| . |******* . |	0.251741
15	103295	0.25414	| . |***** . |	0.256273
16	62982.090	0.15496	| . |*** . |	0.258885
17	108381	0.26666	| . |***** . |	0.259850
18	42836.479	0.10539	| . |** . |	0.262685
19	65840.039	0.16199	| . |*** . |	0.263125
20	37765.859	0.09292	| . |** . |	0.264162
21	27790.106	0.06837	| . |* . |	0.264502
22	40303.846	0.09916	| . |** . |	0.264686
23	46097.710	0.11342	| . |** . |	0.265073
24	76317.464	0.18777	| . |**** . |	0.265578

"." marks two standard errors

The large spike at lag 12 in the ACF plot provides evidence that the steel shipments time series has a seasonal autoregressive component. The lack of a large spike at lag 24 (0.18777, well within two standard errors) indicates that the series is stationary at the seasonal level, so seasonal differencing is not needed.
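The Std Error column, which places the dots at two standard errors, can be reproduced from the correlations with Bartlett's approximation, $se(r_k) = \sqrt{(1 + 2\sum_{j<k} r_j^2)/n}$. The Python sketch below assumes that formula and n = 96; up to rounding it matches the printed column and lists the lags whose correlations fall outside two standard errors.

```python
import math

# Correlations at lags 1-24, copied from the Autocorrelations table
r = [0.64617, 0.64363, 0.58042, 0.41461, 0.49674, 0.31739, 0.37570, 0.27831,
     0.31378, 0.33707, 0.32163, 0.49308, 0.27678, 0.33244, 0.25414, 0.15496,
     0.26666, 0.10539, 0.16199, 0.09292, 0.06837, 0.09916, 0.11342, 0.18777]
n = 96  # monthly observations, January 1984 through December 1991


def bartlett_se(r, n):
    """Bartlett standard error of r_k, assuming zero autocorrelation beyond lag k-1."""
    return [math.sqrt((1 + 2 * sum(x * x for x in r[:k - 1])) / n)
            for k in range(1, len(r) + 1)]


se = bartlett_se(r, n)
outside = [k for k, (rk, sk) in enumerate(zip(r, se), start=1) if abs(rk) > 2 * sk]
print([round(s, 6) for s in se[:3]])  # compare with the Std Error column
print("lags beyond two standard errors:", outside)
```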

Inverse Autocorrelations
Lag	Correlation	-1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1	-0.37291	| *******| . |
2	0.08136	| . |** . |
3	-0.31032	| ******| . |
4	0.16197	| . |***. |
5	-0.20750	| ****| . |
6	0.16115	| . |***. |
7	-0.02341	| . | . |
8	0.06910	| . |* . |
9	0.00628	| . | . |
10	0.02046	| . | . |
11	0.02875	| . |* . |
12	-0.23279	| *****| . |
13	0.03755	| . |* . |
14	0.04050	| . |* . |
15	0.03498	| . |* . |
16	0.09969	| . |** . |
17	-0.10703	| . **| . |
18	0.04901	| . |* . |
19	-0.08634	| . **| . |
20	0.02281	| . | . |
21	0.00844	| . | . |
22	0.10510	| . |** . |
23	-0.10923	| . **| . |
24	0.02676	| . |* . |

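The spikes at lags 1 and 3 in the IACF plot indicate that other components are necessary to fit an adequate model, and the spike at lag 12 is consistent with a seasonal autoregressive component. In the autocorrelation check for white noise that follows, the null hypothesis that the series is white noise is resoundingly rejected (Pr > ChiSq < .0001 at every lag checked).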

Autocorrelation Check for White Noise
To Lag	Chi-Square	DF	Pr > ChiSq	Autocorrelations
6	170.51	6	<.0001	0.646	0.644	0.580	0.415	0.497	0.317
12	255.47	12	<.0001	0.376	0.278	0.314	0.337	0.322	0.493
18	296.96	18	<.0001	0.277	0.332	0.254	0.155	0.267	0.105
24	309.34	24	<.0001	0.162	0.093	0.068	0.099	0.113	0.188
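The Chi-Square column is the Ljung-Box portmanteau statistic $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, which is compared with a chi-square distribution with m degrees of freedom when no model has been fitted. The Python sketch below recomputes the lag-6 value from the correlations in the ACF table, assuming n = 96; it should agree with the reported 170.51 up to rounding.

```python
# Correlations at lags 1-6, copied from the Autocorrelations table
r = [0.64617, 0.64363, 0.58042, 0.41461, 0.49674, 0.31739]
n = 96  # number of observations


def ljung_box(r, n):
    """Ljung-Box Q = n(n+2) * sum over k=1..m of r_k^2 / (n-k), with m = len(r)."""
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


q6 = ljung_box(r, n)
print(round(q6, 2))  # compare with the lag-6 Chi-Square value
```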

In the estimation and diagnostic checking stage, you use the ESTIMATE statement to specify the ARIMA model to fit to the variable specified in the previous IDENTIFY statement and to estimate the parameters of that model. The ESTIMATE statement also produces diagnostic statistics to help you judge the adequacy of the model.

Significance tests for parameter estimates indicate whether some terms in the model may be unnecessary. Goodness-of-fit statistics aid in comparing this model to others. Tests for white noise residuals indicate whether the residual series contains additional information that might be used by a more complex model. If the diagnostic tests indicate problems with the model, you try another model and repeat the estimation and diagnostic checking stage.

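The following statement fits a seasonal ARMA model to the time series. Because PROC ARIMA is an interactive procedure, it remains active after the RUN statement, so the ESTIMATE statement can be submitted without repeating the PROC ARIMA statement; it applies to the series named in the preceding IDENTIFY statement. In the syntax of the ESTIMATE statement, the two AR terms specified with the P= option are enclosed in separate parentheses, which makes them multiplicative factors. The two MA terms specified with the Q= option are separated by a space within a single set of parentheses, which makes them additive terms in a single factor.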

      /* Multiplicative AR factors at lags 2 and 12; additive MA terms at lags 1 and 3 */
      estimate p=(2)(12) q=(1 3);
   run;

Autocorrelation Check of Residuals
To Lag	Chi-Square	DF	Pr > ChiSq	Autocorrelations
6	2.42	2	0.2979	-0.009	-0.051	0.071	0.070	0.104	0.018
12	3.63	8	0.8891	-0.084	0.032	-0.024	0.013	-0.033	-0.035
18	11.86	14	0.6176	-0.082	0.168	0.014	-0.137	0.107	0.073
24	16.16	20	0.7066	0.023	0.019	-0.010	-0.047	0.174	-0.000

Model for variable steelshp
Estimated Mean	6057.122

Autoregressive Factors
Factor 1:	1 - 0.54234 B**(2)
Factor 2:	1 - 0.64802 B**(12)

Moving Average Factors
Factor 1:	1 + 0.55505 B**(1) + 0.43689 B**(3)
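Because the AR factors are multiplicative, the fitted AR polynomial also contains a lag-14 term whose coefficient is the product of the two estimates, while the single MA factor contains only lags 1 and 3. The Python sketch below multiplies out the two printed AR factors; `poly_mul` is a small helper written for this illustration.

```python
def poly_mul(a, b):
    """Multiply polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Estimated autoregressive factors from the output above
ar_lag2 = [1.0, 0.0, -0.54234]                # 1 - 0.54234 B**2
ar_lag12 = [1.0] + [0.0] * 11 + [-0.64802]    # 1 - 0.64802 B**12
ar = poly_mul(ar_lag2, ar_lag12)

# Nonzero coefficients of the combined AR polynomial, keyed by lag
print({lag: round(c, 5) for lag, c in enumerate(ar) if c != 0.0})
```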

The Autocorrelation Check of Residuals shows that none of the Q-statistics is statistically significant (all p-values exceed 0.29), so there is no evidence of autocorrelation left in the residuals. This indicates that the model provides an adequate fit to the data.
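The DF column in the residual check equals the number of lags checked minus the four estimated AR and MA parameters (the mean is not counted), for example 6 - 4 = 2. Because every DF value here is even, the chi-square tail probability has a closed form, $P(\chi^2_\nu > x) = e^{-x/2}\sum_{i=0}^{\nu/2-1} (x/2)^i / i!$. The Python sketch below uses it to approximately reproduce the Pr > ChiSq column from the rounded Q values; `chi2_sf_even` is a hypothetical helper that only handles even degrees of freedom.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability P(X > x); closed form valid only for even df."""
    if df % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


n_params = 4  # AR lags 2 and 12, MA lags 1 and 3
# (lag, chi-square) pairs copied from the Autocorrelation Check of Residuals
checks = [(6, 2.42), (12, 3.63), (18, 11.86), (24, 16.16)]
for lag, q in checks:
    df = lag - n_params
    print(lag, df, round(chi2_sf_even(q, df), 4))
```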

Conditional Least Squares Estimation
Parameter	Estimate	Standard Error	t Value	Approx Pr > |t|	Lag
MU	6057.1	232.96713	26.00	<.0001	0
MA1,1	-0.55505	0.08021	-6.92	<.0001	1
MA1,2	-0.43689	0.07936	-5.51	<.0001	3
AR1,1	0.54234	0.09903	5.48	<.0001	2
AR2,1	0.64802	0.09392	6.90	<.0001	12

Constant Estimate	975.7391
Variance Estimate	126334.1
Std Error Estimate	355.4351
AIC	1404.983
SBC	1417.805
Number of Residuals	96

All of the estimated parameters have large t-statistics (absolute values above 5, with approximate p-values below .0001), which indicates that none of these terms can be omitted from the model.
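Several of the summary statistics follow directly from the parameter estimates. The Python sketch below, using numbers copied from the output, checks that the Constant Estimate equals the Estimated Mean (MU) times the AR polynomial evaluated at B = 1, that the Std Error Estimate is the square root of the Variance Estimate, and that each t Value is the estimate divided by its standard error. It also checks that SBC - AIC = k(ln n - 2), assuming the definitions AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n with k = 5 estimated parameters and n = 96 residuals. Small discrepancies come from the rounding of the printed estimates.

```python
import math

# Values copied from the Conditional Least Squares Estimation output
mu = 6057.122                     # Estimated Mean (MU)
phi2, phi12 = 0.54234, 0.64802    # AR estimates at lags 2 and 12
variance = 126334.1
aic, sbc = 1404.983, 1417.805
n_resid, k = 96, 5                # k counts MU, two MA, and two AR parameters

# Constant = mean times the AR polynomial evaluated at B = 1
constant = mu * (1 - phi2) * (1 - phi12)
std_error = math.sqrt(variance)
# Under the assumed definitions, SBC - AIC depends only on k and n
sbc_minus_aic = k * (math.log(n_resid) - 2)

# t value = estimate / standard error
params = {"MU": (6057.1, 232.96713), "MA1,1": (-0.55505, 0.08021),
          "MA1,2": (-0.43689, 0.07936), "AR1,1": (0.54234, 0.09903),
          "AR2,1": (0.64802, 0.09392)}
t_values = {name: est / se for name, (est, se) in params.items()}

print(round(constant, 2), round(std_error, 2), round(sbc_minus_aic, 3))
print({name: round(t, 2) for name, t in t_values.items()})
```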

In the forecasting stage, you use the FORECAST statement to forecast future values of the time series and to generate confidence intervals for these forecasts from the ARIMA model produced by the preceding ESTIMATE statement.
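PROC ARIMA computes the forecasts internally. To show how the fitted model generates them, the Python sketch below writes the model in deviations from the Estimated Mean, $(1 - 0.54234B^2)(1 - 0.64802B^{12})(y_t - \text{MU}) = (1 + 0.55505B + 0.43689B^3)\varepsilon_t$, and forecasts recursively: observed values and residuals are used where available, and future shocks are set to zero. It is a simplified illustration under these assumptions, not the procedure's algorithm; the function `point_forecasts` and the inputs in the usage example are hypothetical.

```python
# Estimates from the fitted model (Estimated Mean and the AR/MA factors)
MU = 6057.122
PHI2, PHI12 = 0.54234, 0.64802   # AR factors (1 - PHI2 B**2)(1 - PHI12 B**12)
MA1, MA3 = 0.55505, 0.43689      # MA factor 1 + MA1 B + MA3 B**3


def point_forecasts(history, residuals, lead):
    """Recursive point forecasts for the fitted model.

    history   -- observed values, oldest first (at least 14 of them)
    residuals -- one-step residuals aligned with history
    """
    if len(history) < 14 or len(residuals) != len(history):
        raise ValueError("need >= 14 observations and matching residuals")
    y = [v - MU for v in history]    # work with deviations from the mean
    e = list(residuals)
    n = len(y)
    for _ in range(lead):
        t = len(y)
        y.append(PHI2 * y[t - 2] + PHI12 * y[t - 12] - PHI2 * PHI12 * y[t - 14]
                 + MA1 * e[t - 1] + MA3 * e[t - 3])
        e.append(0.0)                # future shocks are replaced by their mean, 0
    return [MU + v for v in y[n:]]


# Usage with hypothetical inputs: 24 months at the mean, last residual = +100
hist = [MU] * 24
res = [0.0] * 23 + [100.0]
print([round(f, 1) for f in point_forecasts(hist, res, 3)])
```

If the recent history sits at the mean and all residuals are zero, every forecast equals the mean; departures from the mean feed forward through the lag-2, lag-12, and lag-14 terms, which is how the seasonal factor shapes the forecast path.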

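The following statements produce forecasts and upper and lower 95% confidence limits for 12 future periods and create the output data set STEEL2. The ID= and INTERVAL= options let the procedure extend the DATE variable into the forecast period, and NOPRINT suppresses the printed forecast table.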

      /* Forecast 12 months ahead; write forecasts and 95% limits to STEEL2 */
      forecast lead=12
               out=steel2
               id=date
               interval=month
               noprint;
   run;
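The L95 and U95 limits in STEEL2 are the forecasts plus or minus roughly 1.96 forecast standard errors. Under the usual approximation that treats the estimated parameters as known, the h-step standard error is $\sigma\sqrt{\psi_0^2 + \dots + \psi_{h-1}^2}$, where the $\psi_j$ are the weights of the model's infinite MA form, $\psi(B) = \theta(B)/\phi(B)$. The Python sketch below computes these from the printed estimates, taking $\sigma$ from the Std Error Estimate; the helper names are hypothetical and the result is an approximation, not the procedure's output.

```python
import math
from statistics import NormalDist

SIGMA = 355.4351                  # Std Error Estimate from the ESTIMATE output
PHI2, PHI12 = 0.54234, 0.64802
MA = {1: 0.55505, 3: 0.43689}     # printed MA factor: 1 + 0.55505 B + 0.43689 B**3


def psi_weights(n_weights):
    """psi_j of the MA(infinity) form, from phi(B) psi(B) = theta(B)."""
    # phi(B) = (1 - PHI2 B^2)(1 - PHI12 B^12) = 1 - sum(a[lag] * B**lag)
    a = {2: PHI2, 12: PHI12, 14: -PHI2 * PHI12}
    psi = []
    for j in range(n_weights):
        value = 1.0 if j == 0 else MA.get(j, 0.0)
        value += sum(c * psi[j - lag] for lag, c in a.items() if lag <= j)
        psi.append(value)
    return psi


def interval_half_widths(lead, level=0.95):
    """Half-width of the forecast interval at horizons 1..lead, ignoring parameter uncertainty."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    psi = psi_weights(lead)
    return [z * SIGMA * math.sqrt(sum(p * p for p in psi[:h])) for h in range(1, lead + 1)]


print([round(w, 1) for w in interval_half_widths(12)])
```

The one-step half-width is simply 1.96 times the Std Error Estimate; later horizons widen as more psi weights enter the sum.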

To prepare the output data set for plotting, change the values for the forecasts and confidence limits to missing for all dates prior to the future forecast periods.

   data steel3;
      set steel2;
      if date lt '01jan92'd then do;
         forecast=.;
         l95=.;
         u95=.;
      end;
   run;

Use the GPLOT procedure to plot the data.

   proc gplot data=steel3;
      format date year4.;
      plot steelshp*date=1
           forecast*date=2
           l95*date=3
           u95*date=3 / overlay cframe=ligr
                        haxis=axis1 vaxis=axis2
                        vminor=1 href='01jan92'd;
      title 'U.S. Steel Shipments Data';
      title2 '(thousands of net tons)';
      axis1 offset=(1 cm)
            label=('Year') minor=none
            order=('01jan84'd to '01jan93'd by year);
      axis2 label=(angle=90 'Steel Shipments')
            order=(4500 to 8500 by 1000);
      symbol1 c=blue  i=join l=1 v=star;
      symbol2 c=red   i=join l=1 v=F;
      symbol3 c=green i=join l=20;
   run;
   quit;

The values of the original steel shipments time series are plotted with the star symbol. The forecasts are plotted with the F symbol, and the upper and lower 95% confidence limits for the forecasts are plotted with dashed lines.

Because the model fit to the steel shipments data includes a seasonal component, the forecasts do not follow a simple linear trend. Instead, the forecasts show variability due to the season (month of the year).

References

Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Chilton Publications (1993), Metal Statistics 1993, New York: Chilton Publications.

Hamilton, J. (1994), Time Series Analysis, Princeton, NJ: Princeton University Press.

SAS Institute Inc. (1996), Forecasting Examples for Business and Economics Using the SAS System, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.