A critical assumption in most time series models is that the observations are sampled at the same frequency. Unfortunately, values of the variable of interest are often missing or unavailable for certain dates in the sample period. There are several ways of dealing with this problem, including aggregation and interpolation, which are illustrated in the example "Transforming the Frequency of Time Series Data."
Often, however, you merely want to note the missing values and proceed with the analysis using only the observations for which you have data. For instance, suppose you are interested in analyzing a company's cash balances at the beginning of each month. Perusal of the records reveals that, before August 1996, entries in the books are sporadic. If the data set is large, it may not be obvious which months are missing.
In this example, the EXPAND procedure is used to insert observations with missing values for the omitted dates in an aperiodic data set.
Suppose that you want to analyze the amount of cash in a company's account at the beginning of each month from August 1995 through April 1997. Inspection of the books yields the data set shown in Figure 1.
data cash;
   input date : monyy. balance @@;
   label balance = "Cash Account Balance";
   format date monyy.;
   datalines;
aug95 84 sep95 52 oct95 8 dec95 98 jan96 61 feb96 24 may96 67 jun96 58
aug96 43 sep96 3 oct96 73 nov96 90 dec96 89 jan97 55 feb97 86 mar97 79
apr97 23
;

proc print data=cash;
   title 'Cash Account Balances - Original Data';
run;
Figure 1: Cash Account Balances - Original Data
Notice that four months have no entry: November 1995, March 1996, April 1996, and July 1996.
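In a larger data set the gaps are harder to spot by eye. The following DATA step is a minimal sketch of one way to list them, assuming that CASH is sorted by DATE as it is here. The data set name GAPS and the variables PREV and SKIPPED are hypothetical names introduced only for this illustration.

```sas
/* Sketch: flag each entry that follows one or more months with no entry. */
/* Assumes CASH is sorted by DATE; GAPS, PREV, and SKIPPED are            */
/* illustrative names, not part of the original example.                  */
data gaps;
   set cash;
   prev = lag(date);                      /* date of the previous entry   */
   if _n_ > 1 then
      skipped = intck('month', prev, date) - 1;   /* months with no entry */
   if skipped > 0;                        /* keep only rows after a gap   */
   format prev monyy.;
run;

proc print data=gaps;
   var prev date skipped;
   title 'Gaps in Cash Account Entries';
run;
```

With the data in Figure 1, this should list three gaps: one month skipped before DEC95, two before MAY96, and one before AUG96.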
To create a data set that includes an observation for each month with no entry, with the balance left missing in those observations, you can use the EXPAND procedure with the following options:
proc expand data=cash out=cash2 to=month method=none;
   id date;
run;

proc print data=cash2;
   title 'Cash Account Balances - Expanded Data';
run;
The DATA= option names CASH as the input SAS data set, and the OUT= option creates the output data set CASH2. The ID statement identifies DATE as the variable that dates each observation. The TO= option sets the frequency of the output data set; in this case, the observations are monthly. The METHOD=NONE option specifies that no interpolation be performed, so the balances for the added months remain missing. METHOD=NONE cannot be used when frequency conversion is specified; that restriction is not a concern here, because the goal is only to include missing values for the omitted observations. The modified data set appears in Figure 2.
Figure 2: Cash Account Balances - Expanded Data
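As a cross-check on the expanded data, the same result can be approximated with Base SAS alone by building a monthly calendar and merging it with the original data. This is a sketch of a different technique, not the method used in this example. It assumes the sample period runs from August 1995 through April 1997, and the data set names MONTHS and CASH_ALT are hypothetical.

```sas
/* Sketch: DATA step alternative to PROC EXPAND for this data set.       */
/* MONTHS and CASH_ALT are illustrative names; the start and end dates   */
/* are taken from the first and last entries in CASH.                    */
data months;
   do i = 0 to intck('month', '01aug95'd, '01apr97'd);
      date = intnx('month', '01aug95'd, i);   /* first day of each month */
      output;
   end;
   drop i;
   format date monyy.;
run;

/* Months that have no match in CASH keep a missing BALANCE. */
data cash_alt;
   merge months cash;
   by date;
run;
```

The merge relies on both data sets being sorted by DATE and on the MONYY. informat reading each entry as the first day of its month.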
The following code creates the plot of the data in Figure 3 with the missing observations highlighted. The DATA step sets the variable UNKNOWN to 98, the largest observed balance, for each month with a missing balance so that a "?" symbol can be plotted there, and the HREF= options draw reference lines at the four missing months.
data graph;
   set cash2;
   if balance=. then unknown=98;
run;

proc gplot data=graph;
   plot balance*date unknown*date / overlay vaxis=axis2
        href='01nov95'd
        href='01mar96'd
        href='01apr96'd
        href='01jul96'd
        chref=red lhref=2
        cframe=ligr;
   title1 "Plot of Cash Balance at Beginning of Month" h=3;
   axis2 label=(angle=90 'Cash Account Balance');
   symbol1 c=blue interpol=join value=star;
   symbol2 c=black interpol=none font=complex h=7 value="?";
run;
quit;
Figure 3: Beginning of Month Cash Balances with Missing Data
You can now proceed with the analysis using a data set that contains an observation, numerical or missing, for every date in the sampling period.
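Before moving on, it can be useful to confirm that the expanded data set has the expected number of observed and missing balances. The following step is a small sketch of such a check; the title text is an arbitrary choice.

```sas
/* Sketch: count nonmissing (N) and missing (NMISS) balances in CASH2. */
proc means data=cash2 n nmiss;
   var balance;
   title 'Observed and Missing Balances in CASH2';
run;
```

For the data in this example, the step should report 17 nonmissing balances and 4 missing balances, one for each of the 21 months from August 1995 through April 1997.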
SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
/*-----------------------------------------------------------------
Example: Accounting for Missing Observations in Time Series Data
Requires: SAS/ETS, SAS/GRAPH
Version: 9.0
------------------------------------------------------------------*/
ods trace on;
data cash;
input date : monyy. balance @@;
label balance = "Cash Account Balance";
format date monyy.;
datalines;
aug95 84 sep95 52 oct95 8 dec95 98 jan96 61 feb96 24 may96 67 jun96 58
aug96 43 sep96 3 oct96 73 nov96 90 dec96 89 jan97 55 feb97 86 mar97 79
apr97 23
;
proc print data=cash;
title 'Cash Account Balances - Original Data';
run;
proc expand data=cash out=cash2 to=month method=none;
id date;
run;
proc print data=cash2;
title 'Cash Account Balances - Expanded Data';
run;
data graph;
set cash2;
if balance=. then unknown=98;
run;
proc gplot data=graph;
plot balance*date unknown*date / overlay vaxis=axis2
href='01nov95'd
href='01mar96'd
href='01apr96'd
href='01jul96'd
chref=red lhref=2
cframe=ligr;
title1 "Plot of Cash Balance at Beginning of Month" h=3;
axis2 label=(angle=90 'Cash Account Balance');
symbol1 c=blue interpol=join value=star;
symbol2 c=black interpol=none font=complex h=7 value="?";
run;
quit;
Type: Sample
Topic: SAS Reference ==> Procedures ==> EXPAND
Date Modified: 2017-01-19 14:56:25
Date Created: 2017-01-19 14:39:52
Product Family: SAS System
Product: SAS/ETS
Host (no starting or ending SAS release is listed):
64-bit Enabled Solaris
64-bit Enabled HP-UX
64-bit Enabled AIX
Windows Vista for x64
Windows Millennium Edition (Me)
Windows Vista
Windows 7 Ultimate x64
Windows 7 Ultimate 32 bit
Windows 7 Professional x64
Windows 7 Professional 32 bit
Windows 7 Home Premium x64
Windows 7 Home Premium 32 bit
Windows 7 Enterprise x64
Windows 7 Enterprise 32 bit
Microsoft Windows XP Professional
Microsoft Windows Server 2012 Std
Microsoft Windows Server 2012 R2 Std
Microsoft Windows Server 2012 R2 Datacenter
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2008 R2
Microsoft Windows Server 2008
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows NT Workstation
Microsoft Windows 2000 Professional
Microsoft Windows 2000 Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Advanced Server
Microsoft Windows 95/98
Microsoft Windows 10
Microsoft Windows 8.1 Pro x64
Microsoft Windows 8.1 Pro 32-bit
Microsoft Windows 8.1 Enterprise x64
Microsoft Windows 8.1 Enterprise 32-bit
Microsoft Windows 8 Pro x64
Microsoft Windows 8 Pro 32-bit
Microsoft Windows 8 Enterprise x64
Microsoft Windows 8 Enterprise 32-bit
OS/2
Microsoft® Windows® for x64
Microsoft Windows XP 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft® Windows® for 64-Bit Itanium-based Systems
OpenVMS VAX
z/OS 64-bit
z/OS
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX