The SIMILARITY Procedure

Example 31.3 Sliding Similarity Analysis

This example illustrates how to use sliding similarity analysis to compare two time sequences. The SASHELP.WORKERS data set contains two similar time series variables (ELECTRIC and MASONRY), which represent employment over time. The following statements create an example data set that contains two time series of differing lengths, where the variable MASONRY has the first 12 and last 7 observations set to missing to simulate the lack of data associated with the target series:

data workers;
   set sashelp.workers;
   /* Keep MASONRY only for JAN1978 through DEC1981; set it to missing elsewhere */
   if not ('01JAN1978'D <= date < '01JAN1982'D) then masonry = .;
run;
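
Before running the analysis, it can be useful to confirm that the example data set has the intended pattern of missing values. The following statements are a minimal sketch that assumes only the WORK.WORKERS data set created above; the column aliases FIRST_DATE and LAST_DATE are names chosen for this illustration. Given the preceding description, MASONRY should show 48 nonmissing and 19 missing values, spanning JAN1978 through DEC1981:

```sas
/* Count nonmissing (N) and missing (NMISS) values for both series */
proc means data=workers n nmiss;
   var electric masonry;
run;

/* Report the first and last dates on which MASONRY is nonmissing */
proc sql;
   select min(date) as first_date format=monyy7.,
          max(date) as last_date  format=monyy7.
   from workers
   where masonry is not missing;
quit;
```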

The goal of sliding similarity analysis is to find the slide index that corresponds to the most similar subsequence of the input series when compared to the target sequence. The following statements perform sliding similarity analysis on the example data set:

proc similarity data=workers out=_NULL_ print=(slides summary);
   id date interval=month;
   input  electric;
   target masonry / slide=index measure=msqrdev
                    expand=(localabs=3 globalabs=3)
                    compress=(localabs=3 globalabs=3);
run;

The DATA=WORKERS option specifies that the input data set WORK.WORKERS is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The PRINT=(SLIDES SUMMARY) option specifies that the ODS tables related to the sliding similarity measures and their summary be produced. The INPUT statement specifies that the input variable is ELECTRIC. The TARGET statement specifies that the target variable is MASONRY and that the similarity measure be computed using mean squared deviation (MEASURE=MSQRDEV). The SLIDE=INDEX option specifies observation index sliding. The COMPRESS=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute compression to 3. The EXPAND=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute expansion to 3.

Output 31.3.1: Summary of the Slide Measures

The SIMILARITY Procedure

Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

Slide           Slide Target     Slide Input      Slide Warping  Slide Minimum
Index  DATE     Sequence Length  Sequence Length  Amount         Measure
    0  JAN1977               48               51              3       497.6737
    1  FEB1977               48               51              1       482.6777
    2  MAR1977               48               51              0       474.1251
    3  APR1977               48               51              0       490.7792
    4  MAY1977               48               51             -2       533.0788
    5  JUN1977               48               51             -3       605.8198
    6  JUL1977               48               51             -3       701.7138
    7  AUG1977               48               51              3       646.5918
    8  SEP1977               48               51              3       616.3258
    9  OCT1977               48               51              3       510.9836
   10  NOV1977               48               51              3       382.1434
   11  DEC1977               48               51              3       340.4702
   12  JAN1978               48               51              2       327.0572
   13  FEB1978               48               51              1       322.5460
   14  MAR1978               48               51              0       325.2689
   15  APR1978               48               51             -1       351.4161
   16  MAY1978               48               51             -2       398.0490
   17  JUN1978               48               50             -3       471.6931
   18  JUL1978               48               49             -3       590.8089
   19  AUG1978               48               48              0       595.2538
   20  SEP1978               48               47             -1       689.2233
   21  OCT1978               48               46             -2       745.8891
   22  NOV1978               48               45             -3       679.1907



Output 31.3.2: Minimum Measure

Minimum Measure Summary

Input Variable     MASONRY
ELECTRIC          322.5460



This analysis results in 23 slides based on the observation index. The minimum measure (322.5460) occurs at slide index 13, which corresponds to the time value FEB1978. Recall that the example data set was created from SASHELP.WORKERS so that MASONRY is nonmissing only from JAN1978 (slide index 12) through DEC1981; the best match is therefore offset by one month from the start of the target data. This similarity analysis supports the belief, based on time series cross-correlation analysis, that ELECTRIC lags MASONRY by one month, despite the lack of target data (MASONRY).
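
The printed slide summary can also be saved for further processing. The following statements are a sketch that repeats the same analysis and adds the OUTMEASURE= option of the PROC SIMILARITY statement to write the similarity measures to a data set. The data set name SLIDE_MEASURES is chosen for this illustration, and no variable names in that data set are assumed here; PROC CONTENTS is used to inspect them first:

```sas
/* Same sliding analysis, additionally saving the measures to a data set. */
/* SLIDE_MEASURES is a hypothetical name used only for this illustration. */
proc similarity data=workers out=_NULL_ outmeasure=slide_measures
                print=(slides summary);
   id date interval=month;
   input  electric;
   target masonry / slide=index measure=msqrdev
                    expand=(localabs=3 globalabs=3)
                    compress=(localabs=3 globalabs=3);
run;

/* Inspect the columns before relying on any variable names */
proc contents data=slide_measures;
run;

proc print data=slide_measures;
run;
```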

The goal of seasonal sliding similarity measures is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements repeat the preceding similarity analysis on the example data set with seasonal sliding:

proc similarity data=workers out=_NULL_ print=(slides summary);
   id date interval=month;
   input  electric;
   target masonry / slide=season measure=msqrdev;
run;

Output 31.3.3: Summary of the Seasonal Slide Measures

The SIMILARITY Procedure

Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

Slide           Slide Target     Slide Input      Slide Warping  Slide Minimum
Index  DATE     Sequence Length  Sequence Length  Amount         Measure
    0  JAN1977               48               48              0       1040.086
   12  JAN1978               48               48              0        641.927



Output 31.3.4: Seasonal Minimum Measure

Minimum Measure Summary

Input Variable     MASONRY
ELECTRIC          641.9273



This analysis differs from the previous analysis in that the slides are based on the seasonal index (SLIDE=SEASON) and no warping is applied. With a seasonality of 12, two seasonal slides are considered, at slide indices 0 and 12. The minimum measure (641.9273) occurs at slide index 12, which corresponds to the time value JAN1978. Recall that the example data set was created from SASHELP.WORKERS so that MASONRY is nonmissing only from JAN1978 through DEC1981. This similarity analysis supports the belief, based on seasonal decomposition analysis, that ELECTRIC and MASONRY have similar seasonal properties, despite the lack of target data (MASONRY).
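
To make the idea of sliding without warping concrete, the following DATA step is an illustrative sketch, not a description of how PROC SIMILARITY is implemented. It assumes that the unwarped measure is the plain average of squared differences between the nonmissing MASONRY values and an equally long window of ELECTRIC, and that WORK.WORKERS has at most 200 observations. The data set name UNWARPED_MSD and the variables SLIDE and MSD are names chosen for this illustration. Only slides where a full-length input window fits are computed, so later slide indices that rely on a shorter input sequence are not covered:

```sas
/* Illustrative sketch: unwarped mean squared deviation per slide index. */
/* Assumes WORK.WORKERS from the example and at most 200 observations.   */
data unwarped_msd(keep=slide msd);
   array e[200] _temporary_;   /* input series (ELECTRIC)            */
   array m[200] _temporary_;   /* nonmissing target values (MASONRY) */

   /* Read the whole data set within a single DATA step iteration */
   do until (eof);
      set workers end=eof;
      ne + 1;
      e[ne] = electric;
      if not missing(masonry) then do;
         nm + 1;
         m[nm] = masonry;
      end;
   end;

   /* Slide the target along the input and average the squared differences */
   do slide = 0 to ne - nm;
      ss = 0;
      do j = 1 to nm;
         ss = ss + (e[slide + j] - m[j])**2;
      end;
      msd = ss / nm;
      output;
   end;
   stop;
run;

proc print data=unwarped_msd noobs;
run;
```

If the assumption about the unwarped measure holds, the rows for slide indices 0 and 12 can be compared against the seasonal slide output above as a sanity check.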