The SIMILARITY Procedure

Example 24.1 Accumulating Transactional Data into Time Series Data

This example uses the SIMILARITY procedure to illustrate the accumulation of time-stamped transactional data that has been recorded at no particular frequency into time series data at a specific frequency. After the time series is created, the various SAS/ETS procedures related to time series analysis, similarity analysis, seasonal adjustment and decomposition, modeling, and forecasting can be used to further analyze the time series data.

Suppose that the input data set `WORK.RETAIL` contains variables `STORE` and `TIMESTAMP` and numerous other numeric transaction variables. The BY variable `STORE` contains values that break up the transactions into groups (BY groups). The time ID variable `TIMESTAMP` contains SAS date values recorded at no particular frequency. The other data set variables contain the numeric transaction values to be analyzed. It is further assumed that the input data set is sorted by the variables `STORE` and `TIMESTAMP`.
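To make the assumed layout concrete, the following is a minimal sketch of an input data set with this structure. The data values are invented for illustration, and `STOREITEM` stands in for one of the numeric transaction variables; only the variable roles (`STORE` as the BY variable, `TIMESTAMP` as a SAS date value) come from the description above. The PROC SORT step establishes the assumed sort order.

```sas
/* Hypothetical sample data: values are invented for illustration only. */
/* STOREITEM stands in for one numeric transaction variable.            */
data work.retail;
   input store timestamp :date9. storeitem;
   format timestamp date9.;
   datalines;
1 03JAN1998 12
1 17JAN1998 8
1 25MAR1998 15
2 09FEB1998 20
2 10FEB1998 22
;
run;

/* Sort by the BY variable and then by the time ID variable. */
proc sort data=work.retail;
   by store timestamp;
run;
```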

The following statements form monthly time series from the transactional data based on the median value (ACCUMULATE=MEDIAN) of the transactions recorded within each time period. The accumulated time series values for time periods with no transactions are set to zero instead of missing (SETMISS=0). Only transactions recorded between the first day of 1998 (START='01JAN1998'D) and the last day of 2000 (END='31DEC2000'D) are considered, and, if needed, the series are extended to cover this range.

```
proc similarity data=work.retail out=mseries;
   by store;
   id timestamp interval=month
                accumulate=median
                setmiss=0
                start='01jan1998'd
                end  ='31dec2000'd;
   target _NUMERIC_;
run;
```

The monthly time series data are stored in the data set `WORK.MSERIES`. Each BY group associated with the BY variable `STORE` contains an observation for each of the 36 months associated with the years 1998, 1999, and 2000. Each observation contains the variables `STORE` and `TIMESTAMP` and each of the analysis variables in the input DATA= data set.
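One way to check the shape of the accumulated data is to count the observations in each BY group and inspect the first and last time periods. The following is a minimal sketch; the column names `n_months`, `first_period`, and `last_period` are chosen here for illustration. If the output matches the description above, each store should report 36 months.

```sas
/* Count the monthly observations per store and show the date range. */
proc sql;
   select store,
          count(*)       as n_months,
          min(timestamp) as first_period format=date9.,
          max(timestamp) as last_period  format=date9.
   from work.mseries
   group by store;
quit;
```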

After each set of transactions has been accumulated to form the corresponding time series, the accumulated time series can be analyzed by using various time series analysis techniques. For example, exponentially weighted moving averages can be used to smooth each series. The following statements use the EXPAND procedure to smooth the analysis variable named `STOREITEM`.

```
proc expand data=mseries
            out=smoothed
            from=month;
   by store;
   id timestamp;
   convert storeitem=smooth / transform=(ewma 0.1);
run;
```

The smoothed series is stored in the data set `WORK.SMOOTHED`. The variable `SMOOTH` contains the smoothed series.
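To illustrate what an exponentially weighted moving average with weight 0.1 does, the following DATA step sketches the textbook recursion, in which each smoothed value is 0.1 times the current value plus 0.9 times the previous smoothed value. This is an illustration under stated assumptions, not a reproduction of the EXPAND procedure's internals: the data set name `SMOOTHED_CHECK`, the variable name `SMOOTH_MANUAL`, and the choice to start each series at its first observed value are all assumptions, so the results might differ from `SMOOTH`, particularly near the start of each series.

```sas
/* Textbook EWMA recursion with weight 0.1, restarted for each store.   */
/* Assumption: the first smoothed value equals the first series value.  */
data work.smoothed_check;
   set work.mseries;
   by store;
   retain smooth_manual;
   if first.store then smooth_manual = storeitem;
   else smooth_manual = 0.1*storeitem + 0.9*smooth_manual;
run;
```

Because SETMISS=0 was used during accumulation, `STOREITEM` has no missing values in `WORK.MSERIES`, so the recursion does not propagate missing values.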

If the time ID variable `TIMESTAMP` contains SAS datetime values instead of SAS date values, the INTERVAL=, START=, and END= options in the SIMILARITY procedure must be changed accordingly. The following statements could be used to accumulate the datetime transactions to a monthly interval:

```
proc similarity data=work.retail
                out=tseries;
   by store;
   id timestamp interval=dtmonth
                accumulate=median
                setmiss=0
                start='01jan1998:00:00:00'dt
                end  ='31dec2000:00:00:00'dt;
   target _NUMERIC_;
run;
```

The monthly time series data are stored in the data set `WORK.TSERIES`, and the time ID values use a SAS datetime representation.
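Because datetime values are stored as numbers, it can help to display them with a datetime format when inspecting the result. The following sketch prints the first few observations; the choice of the DATETIME19. format and the OBS=5 limit are illustrative assumptions.

```sas
/* Display the first few observations with readable datetime values. */
proc print data=work.tseries(obs=5);
   format timestamp datetime19.;
run;
```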