This section outlines the use of the TIMESERIES procedure and gives a cursory description of some of the analysis techniques that can be performed on time-stamped transactional data.
Given an input data set that contains numerous transaction variables recorded over time at no specific frequency, the TIMESERIES procedure can form time series as follows:
PROC TIMESERIES DATA=<input-data-set> OUT=<output-data-set>;
   ID <time-ID-variable> INTERVAL=<frequency> ACCUMULATE=<statistic>;
   VAR <time-series-variables>;
RUN;
The TIMESERIES procedure forms time series from the input time-stamped transactional data. It can provide results in output data sets or in other output formats by using the Output Delivery System (ODS).
Time-stamped transactional data are often recorded at no fixed interval. Analysts often want to use time series analysis techniques that require fixed-time intervals. Therefore, the transactional data must be accumulated to form a fixed-interval time series.
Suppose that a bank wants to analyze the transactions associated with each of its customers over time. Further, suppose that the data set WORK.TRANSACTIONS contains four variables that are related to these transactions: CUSTOMER, DATE, WITHDRAWALS, and DEPOSITS. The following examples illustrate possible ways to analyze these transactions by using the TIMESERIES procedure.
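To try the examples that follow, a small data set with the layout described above can be created in a DATA step. This is only a sketch: the customer codes, dates, and amounts below are invented for illustration and do not come from any real data.

```sas
/* Hypothetical sample data: all values are invented for illustration. */
/* Each row is one transaction; several rows can share the same date.  */
data transactions;
   input customer $ date :date9. withdrawals deposits;
   format date date9.;
   datalines;
A 02JAN2023 40 .
A 02JAN2023 25 .
A 04JAN2023 . 500
B 03JAN2023 60 .
B 03JAN2023 . 200
B 07JAN2023 15 .
;
run;

/* BY-group processing expects the data sorted by CUSTOMER;     */
/* sorting by DATE as well keeps the time ID values in order.   */
proc sort data=transactions;
   by customer date;
run;
```

The period (.) marks a missing value, so a row can record a withdrawal without a deposit and vice versa.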
To accumulate the time-stamped transactional data to form a daily time series based on the accumulated daily totals of each type of transaction (WITHDRAWALS and DEPOSITS), the following TIMESERIES procedure statements can be used:
proc timeseries data=transactions out=timeseries;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
The OUT=TIMESERIES option specifies that the resulting time series data for each customer is to be stored in the data set WORK.TIMESERIES. The INTERVAL=DAY option specifies that the transactions are to be accumulated on a daily basis. The ACCUMULATE=TOTAL option specifies that the sum of the transactions is to be calculated. After the transactional data are accumulated into a time series format, many of the procedures provided with SAS/ETS software can be used to analyze the resulting time series data.
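The same statements can accumulate the data at a different frequency or treat empty periods differently. The following sketch forms weekly totals and, as an assumption about how the bank wants to treat periods with no transactions, uses the SETMISSING=0 option to record them as zero instead of missing. The output data set name WEEKLY is hypothetical.

```sas
/* Sketch: weekly totals, with empty weeks recorded as 0.      */
/* WEEKLY is a hypothetical output data set name.              */
proc timeseries data=transactions out=weekly;
   by customer;
   id date interval=week accumulate=total setmissing=0;
   var withdrawals deposits;
run;
```

Whether zero is the right replacement depends on the analysis; leaving the values missing may be preferable when an empty period means that no data were collected.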
For example, the ARIMA procedure can be used to model and forecast each customer’s withdrawal data with an ARIMA(0,1,1)(0,1,1) model (where the seasonal period is s=7, the number of days in a week) by using the following statements:
proc arima data=timeseries;
   by customer;   /* fit and forecast each customer's series separately */
   identify var=withdrawals(1,7) noprint;
   estimate q=(1)(7) outest=estimates noprint;
   forecast id=date interval=day out=forecasts;
quit;
The OUTEST=ESTIMATES data set contains the parameter estimates of the model specified. The OUT=FORECASTS data set contains forecasts based on the model specified. See the ARIMA procedure documentation in SAS/ETS for more detail.
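For reference, the ARIMA(0,1,1)(0,1,1) model with s=7 that these statements request can be written in backshift notation. The symbols are introduced here for illustration and are not part of the procedure syntax: y_t is the daily withdrawals total, B is the backshift operator (B y_t = y_{t-1}), a_t is a white noise error, theta_1 and Theta_1 are the nonseasonal and seasonal moving-average parameters, and mu is the constant term that PROC ARIMA estimates unless the NOCONSTANT option is specified.

```latex
(1 - B)(1 - B^{7})\, y_t = \mu + (1 - \theta_1 B)(1 - \Theta_1 B^{7})\, a_t
```

The differencing list (1,7) in the IDENTIFY statement corresponds to the two factors on the left side, and Q=(1)(7) in the ESTIMATE statement corresponds to the two moving-average factors on the right side.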
A single set of transactions can be very large and must be summarized in order to be analyzed effectively. Analysts often want to examine transactional data for trends and seasonal variation. To analyze transactional data for trends and seasonality, statistics must be computed for each time period and season of concern. For each observation, the time period and season must be determined and the data must be analyzed based on this determination.
The following statements illustrate how to use the TIMESERIES procedure to perform trend and seasonal analysis of time-stamped transactional data.
proc timeseries data=transactions out=out outseason=season outtrend=trend;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
Because the INTERVAL=DAY option is specified, the length of the seasonal cycle is seven (7), where the first season is Sunday and the last season is Saturday. The output data set specified by the OUTSEASON=SEASON option contains the seasonal statistics for each day of the week for each customer. The output data set specified by the OUTTREND=TREND option contains the trend statistics for each day of the calendar for each customer.
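The statistics written to these data sets can be narrowed with the SEASON and TREND statements. The following sketch requests only the sum and the mean; the choice of these two statistics is an assumption made for illustration.

```sas
/* Sketch: restrict the seasonal and trend output to two statistics. */
proc timeseries data=transactions out=out outseason=season outtrend=trend;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
   season sum mean;   /* statistics for each day of the week     */
   trend  sum mean;   /* statistics for each day of the calendar */
run;
```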
Analysts often want to decompose a time series into seasonal, trend, cycle, and irregular components or to seasonally adjust a time series. These techniques describe how the changing seasons influence the time series.
The following statements illustrate how to use the TIMESERIES procedure to perform seasonal adjustment/decomposition analysis of time-stamped transactional data.
proc timeseries data=transactions out=out outdecomp=decompose;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
The output data set specified by the OUTDECOMP=DECOMPOSE option contains the decomposed/adjusted time series for each customer.
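The components that are written and the form of the decomposition can be selected with the DECOMP statement. The following sketch requests the trend-cycle component (TCC), the seasonal component (SC), and the seasonally adjusted series (SA). MODE=ADD is an assumption made here because daily totals can be zero, which a multiplicative decomposition does not handle well.

```sas
/* Sketch: additive decomposition with three selected components. */
proc timeseries data=transactions out=out outdecomp=decompose;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
   decomp tcc sc sa / mode=add;
run;
```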
A single time series can be very large. Often, a time series must be summarized with respect to time lags in order to be efficiently analyzed using time domain techniques. These techniques help describe how a current observation is related to the past observations with respect to the time (season) lag.
The following statements illustrate how to use the TIMESERIES procedure to perform time domain analysis of time-stamped transactional data.
proc timeseries data=transactions out=out outcorr=timedomain;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
The output data set specified by the OUTCORR=TIMEDOMAIN option contains the time domain statistics, such as sample autocorrelations and partial autocorrelations, for each customer.
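The CORR statement controls which time domain statistics are computed and how many lags are considered. The following sketch requests the autocorrelations and partial autocorrelations up to lag 14; the value 14 (two weekly cycles) is an assumption chosen for illustration.

```sas
/* Sketch: autocorrelation statistics for lags up to two weeks. */
proc timeseries data=transactions out=out outcorr=timedomain;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
   corr lag n acf pacf / nlag=14;
run;
```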
Sometimes time series data contain underlying patterns that can be identified using spectral analysis techniques. Two kinds of spectral analyses on univariate data can be performed using the TIMESERIES procedure. They are singular spectrum analysis and Fourier spectral analysis.
Singular spectrum analysis (SSA) is a technique for decomposing a time series into additive components and categorizing these components based on the magnitudes of their contributions. SSA uses a single parameter, the window length, to quantify patterns in a time series without relying on prior information about the series’ structure. The window length represents the maximum lag that is considered in the analysis, and it corresponds to the dimensionality of the principal component analysis (PCA) on which SSA is based. The components are combined into groups to categorize their roles in the SSA decomposition.
Fourier spectral analysis decomposes a time series into a sum of harmonics. In the discrete Fourier transform, the contributions of components at evenly spaced frequencies are quantified in a periodogram and summarized in spectral density estimates.
The following statements illustrate how to use the TIMESERIES procedure to analyze time-stamped transactional data without prior information about the series’ structure.
proc timeseries data=transactions outssa=ssa outspectra=spectra;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
The output data set specified by the OUTSSA=SSA option contains a singular spectrum analysis of the withdrawals and deposits data. The output data set specified by the OUTSPECTRA=SPECTRA option contains a Fourier spectral decomposition of the same data.
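The window length and the grouping of SSA components are set in the SSA statement, and the spectral statistics are selected in the SPECTRA statement. The following sketch is illustrative only: the window length of 14 and the two groups are assumptions, and suitable values depend on the length and structure of each customer's series.

```sas
/* Sketch: explicit SSA window length and grouping, plus a periodogram. */
/* LENGTH=14 and the GROUP= lists are illustrative assumptions.         */
proc timeseries data=transactions outssa=ssa outspectra=spectra;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
   ssa / length=14 group=(1)(2 3);   /* group 1: component 1; group 2: components 2 and 3 */
   spectra period p / adjmean;       /* period and periodogram of the mean-adjusted series */
run;
```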
By default, the TIMESERIES procedure produces no printed output.
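Printed and graphical output can be requested with the PRINT= and PLOTS= options in the PROC TIMESERIES statement. The following sketch is one possible combination; the particular tables and plots chosen are assumptions made for illustration.

```sas
/* Sketch: request printed tables and plots instead of output data sets. */
ods graphics on;   /* needed for the PLOTS= option to produce graphs */

proc timeseries data=transactions out=_null_
                print=(descstats seasons)
                plots=(series corr);
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;
```

OUT=_NULL_ suppresses the output data set so that only the requested tables and plots are produced.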