SAS/ETS Examples

Accounting for Missing Observations in Time Series Data





Overview

A critical assumption in most time series models is that the observations are sampled at a constant frequency. Unfortunately, values of the variable of interest are often missing or unavailable for certain dates in the sample period. There are several ways of dealing with the problem, including aggregation and interpolation, which are illustrated in the example "Transforming the Frequency of Time Series Data."

Often, however, you merely want to flag the missing values and proceed with the analysis using only those observations for which you have data. For instance, suppose you are interested in analyzing a company's cash balances at the beginning of each month. Perusal of the records reveals that, before August 1996, entries in the books are sporadic. If the data set is large, it may not be obvious which months are missing.

In this example, the EXPAND procedure is used to insert observations with missing values for the omitted dates in an aperiodic data set.



Analysis

Suppose that you want to analyze the amount of cash in a company's account at the beginning of each month from August 1995 through April 1997. Inspection of the books yields the data set shown in Figure 1.



   data cash;
      input date : monyy. balance @@;
      label balance = "Cash Account Balance";
      format date monyy.;
   datalines;
   aug95 84 sep95 52 oct95  8 dec95 98 jan96 61 feb96 24 may96 67 jun96 58
   aug96 43 sep96  3 oct96 73 nov96 90 dec96 89 jan97 55 feb97 86 mar97 79
   apr97 23
   ;


   proc print data=cash;
      title 'Cash Account Balances - Original Data';
   run;

Cash Account Balances - Original Data

   Obs    date     balance
     1    AUG95         84
     2    SEP95         52
     3    OCT95          8
     4    DEC95         98
     5    JAN96         61
     6    FEB96         24
     7    MAY96         67
     8    JUN96         58
     9    AUG96         43
    10    SEP96          3
    11    OCT96         73
    12    NOV96         90
    13    DEC96         89
    14    JAN97         55
    15    FEB97         86
    16    MAR97         79
    17    APR97         23





Figure 1: Cash Account Balances - Original Data

Notice that four months have no entry: November 1995, March 1996, April 1996, and July 1996.
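
When a data set is too large to scan by eye, the gaps can also be located programmatically. The following DATA step is a minimal sketch and not part of the original example: it assumes that CASH is sorted by DATE, and the names GAPS, PREV, and SKIPPED are illustrative. It compares each date with the previous one and keeps only the observations that follow a gap.

```sas
   /* Sketch: flag observations that follow one or more missing months. */
   /* Assumes CASH is sorted by DATE; GAPS, PREV, and SKIPPED are       */
   /* illustrative names.                                               */
   data gaps;
      set cash;
      prev = lag(date);                       /* date of the previous observation */
      if _n_ > 1 then
         skipped = intck('month', prev, date) - 1;   /* months with no entry */
      if skipped > 0;                         /* keep only rows that follow a gap */
      format prev monyy.;
   run;

   proc print data=gaps;
      title 'Observations That Follow a Gap';
   run;
```

For the data in Figure 1, this should list the DEC95, MAY96, and AUG96 observations, with SKIPPED equal to 1, 2, and 1, respectively.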

If you want to create a data set that includes an observation for each month with no entry, with the cash balance set to missing in those observations, you can use the EXPAND procedure with the following options:



   proc expand data=cash out=cash2 to=month method=none;
      id date;
   run;

   proc print data=cash2;
      title 'Cash Account Balances - Expanded Data';
   run;

The DATA= option specifies the input SAS data set as CASH, and the OUT= option creates an output data set named CASH2. The ID statement identifies DATE as the variable that dates each observation. The TO= option determines the frequency of the output data set; in this case, the observations are monthly. The METHOD=NONE option specifies that no interpolation be performed, so the inserted observations keep missing values for BALANCE. The METHOD=NONE option cannot be used when frequency conversion is specified; in this case, however, no FROM= option is given and no conversion is requested: you are interested only in including missing values for the omitted observations. The modified data set appears in Figure 2.

Cash Account Balances - Expanded Data

   Obs    date       balance
     1    AUG1995         84
     2    SEP1995         52
     3    OCT1995          8
     4    NOV1995          .
     5    DEC1995         98
     6    JAN1996         61
     7    FEB1996         24
     8    MAR1996          .
     9    APR1996          .
    10    MAY1996         67
    11    JUN1996         58
    12    JUL1996          .
    13    AUG1996         43
    14    SEP1996          3
    15    OCT1996         73
    16    NOV1996         90
    17    DEC1996         89
    18    JAN1997         55
    19    FEB1997         86
    20    MAR1997         79
    21    APR1997         23





Figure 2: Cash Account Balances - Expanded Data
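
To make clear what the expansion does, the same result can be approximated without PROC EXPAND by building a complete monthly calendar and merging it with the original data. This is a sketch under stated assumptions, not the method used in this example: the data set names CALENDAR and CASH_MERGED are hypothetical, the start and end dates are hard-coded from Figure 1, and CASH is assumed to be sorted by DATE.

```sas
   /* Sketch: build one observation per month from AUG95 through APR97. */
   /* CALENDAR and CASH_MERGED are illustrative names.                  */
   data calendar;
      do i = 0 to intck('month', '01aug95'd, '01apr97'd);
         date = intnx('month', '01aug95'd, i);   /* first day of each month */
         output;
      end;
      drop i;
      format date monyy.;
   run;

   /* Months absent from CASH receive a missing value for BALANCE. */
   data cash_merged;
      merge calendar cash;
      by date;
   run;
```

The merge relies on the MONYY. informat having read each date in CASH as the first day of its month, which matches the default alignment of the INTNX function.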

The following code plots the expanded data, with the missing observations highlighted, as shown in Figure 3.



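   /* Add a plotting variable that marks each missing month.         */
   /* 98 is the largest observed balance, so the markers are drawn   */
   /* at the top of the plot.                                        */
   data graph;
      set cash2;
      if balance=. then unknown=98;
   run;

   proc gplot data=graph;
      /* HREF= takes a list of values: one reference line per missing month */
      plot balance*date unknown*date / overlay vaxis=axis2
                                       href='01nov95'd '01mar96'd
                                            '01apr96'd '01jul96'd
                                       chref=red lhref=2
                                       cframe=ligr;
      /* TITLE options apply to the text that follows them */
      title1 h=3 "Plot of Cash Balance at Beginning of Month";
      axis2 label=(angle=90 'Cash Account Balance');
      symbol1 c=blue  interpol=join value=star;                  /* observed balances */
      symbol2 c=black interpol=none font=complex h=7 value="?";  /* missing months    */
   run;
   quit;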



Figure 3: Beginning of Month Cash Balances with Missing Data



You can now proceed with the analysis using a data set that contains an observation, numerical or missing, for every date in the sampling period.
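
Before moving on, it can be worth confirming that the expanded data set has the expected number of observed and missing values. The following step is a minimal sketch of such a check, not part of the original example; it uses the N and NMISS statistics of the MEANS procedure on the CASH2 data set created above.

```sas
   /* Sketch: count nonmissing and missing balances in the expanded data */
   proc means data=cash2 n nmiss;
      var balance;
      title 'Cash Account Balances - Missing Value Check';
   run;
```

For the data in Figure 2, this should report 17 nonmissing values and 4 missing values.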



References

SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.