Example 27.2 Backcasting, Forecasting, and Interpolation

This example illustrates how you can do model-based extrapolation (backcasting, forecasting, or interpolation) of a response variable. To do so, augment the input data set with observations that contain the relevant ID and predictor values, and assign missing values to the response variable in these observations. Suppose that the response variable is named Y. Then, after an appropriate model is fit to the data, the variable smoothed_Y in the output data set (the data set specified in the OUT= option in the OUTPUT statement) contains the estimates of the extrapolated Y, and the variable stderr_smoothed_Y contains the corresponding standard errors.

The following DATA step creates one such augmented data set from a well-known data set that contains annual measurements of the Nile river water level for the years 1871 to 1970. Suppose you want to backcast the Nile water level for the two years before 1871, forecast it for the two years after 1970, and interpolate its value for the year 1921, assuming that this value is missing in the available data set.

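data nile;
   input level @@;
   /* observation 1 is 1869; each later observation advances one year */
   year = intnx( 'year', '1jan1869'd, _n_-1 );
   format year year4.;
   /* treat the 1921 measurement as missing so that it is interpolated */
   if year = '1jan1921'd then level=.;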
datalines;
. .
1120  1160  963  1210  1160  1160  813  1230   1370  1140
995   935   1110 994   1020  960   1180 799    958   1140
1100  1210  1150 1250  1260  1220  1030 1100   774   840
874   694   940  833   701   916   692  1020   1050  969
831   726   456  824   702   1120  1100 832    764   821
768   845   864  862   698   845   744  796    1040  759
781   865   845  944   984   897   822  1010   771   676
649   846   812  742   801   1040  860  874    848   890
744   749   838  1050  918   986   797  923    975   815
1020  906   901  1170  912   746   919  718    714   740
. . 
;
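
Before fitting a model, it can help to confirm that the augmented data set has missing responses exactly where extrapolation is wanted. The following step is a small sketch of such a check. It is not part of the original example and uses only the nile data set created above.

```sas
/* list the observations whose response is missing */
proc print data=nile noobs;
   var year level;
   where level = .;
run;
```

With the data shown above, this should list five observations: 1869, 1870, 1921, 1971, and 1972.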

It is also known that for this time span the Nile water level can be reasonably modeled as a sum of a random walk trend, a level shift in the year 1899, and the observation error. The following statements fit this model to the data. The resulting estimates are stored in the data set nileOut.

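proc ssm data=nile;
   id year interval=year;
   /* dummy regressor: 1 from 1899 onward, 0 before */
   shift1899 = ( year >= '1jan1899'd );
   /* random walk trend component named RW */
   trend rw(rw);
   /* observation error component named wn */
   irregular wn;
   model level = shift1899 RW wn;
   /* store the smoothed estimates in nileOut */
   output out=nileOut;
run;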
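
In equation form, the model described above can be sketched as follows. The symbols are notation assumed here for illustration and do not appear in the original example: y_t is the water level, x_t is the shift1899 dummy, mu_t is the random walk trend, and epsilon_t is the observation error.

```latex
\begin{aligned}
y_t       &= \beta\, x_t + \mu_t + \epsilon_t, & \epsilon_t &\sim N(0, \sigma^2_{\epsilon}), \\
\mu_{t+1} &= \mu_t + \eta_{t+1},               & \eta_t     &\sim N(0, \sigma^2_{\eta}), \\
x_t       &= \begin{cases} 1 & \text{if } t \ge 1899, \\ 0 & \text{otherwise.} \end{cases}
\end{aligned}
```

Because the model defines y_t at every time point, the smoother can estimate it wherever the response is missing. Backcasting, interpolation, and forecasting differ only in where the missing values are placed.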

The following statements print the extrapolated values of the Nile water level, as shown in Output 27.2.1.

proc print data=nileOut noobs;
   var year level smoothed_level stderr_smoothed_level;
   /* keep only the observations whose response was missing */
   where level=.;
run;

Output 27.2.1 Extrapolated Nile Water Level
year    level    Smoothed_level    StdErr_Smoothed_level
1869        .           1097.75                  130.231
1870        .           1097.75                  130.231
1921        .            851.13                  128.864
1971        .            851.13                  128.864
1972        .            851.13                  128.864
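
The standard errors can be turned into approximate interval estimates. The following DATA step is a minimal sketch, assuming that normal-theory limits are adequate. The data set name nileLimits and the variables z, lower95, and upper95 are hypothetical names introduced for illustration.

```sas
/* approximate 95% limits for the extrapolated values (normality assumed) */
data nileLimits;
   set nileOut;
   where level = .;
   z = probit(0.975);   /* standard normal quantile, about 1.96 */
   lower95 = smoothed_level - z * stderr_smoothed_level;
   upper95 = smoothed_level + z * stderr_smoothed_level;
   keep year smoothed_level stderr_smoothed_level lower95 upper95;
run;

proc print data=nileLimits noobs;
run;
```

The WHERE statement keeps the same five observations that are printed in Output 27.2.1, so the limits line up with the values shown there.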


Note: This procedure is experimental.