The SIMILARITY Procedure
This example illustrates how to compare two time sequences using sliding similarity analysis. The SASHELP.WORKERS data set contains two similar time series variables (ELECTRIC and MASONRY), which represent employment over time. The following statements create an example data set that contains two time series of differing lengths, where the variable MASONRY has the first 12 and last 7 observations set to missing to simulate the lack of data associated with the target series.
```sas
data workers;
   set sashelp.workers;
   /* Keep MASONRY only from JAN1978 through DEC1981; set it to missing elsewhere */
   if '01JAN1978'D <= date < '01JAN1982'D then masonry = masonry;
   else masonry = .;
run;
```
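To see why this DATA step leaves the first 12 and last 7 MASONRY values missing, the following Python sketch counts the masked months. It assumes, as the slide table later in this example implies, that SASHELP.WORKERS holds 67 monthly observations from JAN1977 through JUL1982; the helper `month_starts` is illustrative only and is not part of SAS.

```python
from datetime import date

def month_starts(start_year, start_month, count):
    """Return the first day of `count` consecutive months."""
    months = []
    y, m = start_year, start_month
    for _ in range(count):
        months.append(date(y, m, 1))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months

# Assumption: 67 monthly observations, JAN1977 through JUL1982.
dates = month_starts(1977, 1, 67)

# Same condition as the DATA step: keep MASONRY only for JAN1978 <= date < JAN1982.
kept = [date(1978, 1, 1) <= d < date(1982, 1, 1) for d in dates]

leading_missing = kept.index(True)          # months masked before JAN1978
trailing_missing = kept[::-1].index(True)   # months masked after DEC1981
print(leading_missing, sum(kept), trailing_missing)  # 12 48 7
```

The 48 retained values are the target sequence length reported in the tables that follow.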
The goal of sliding similarity analysis is to find the slide index that corresponds to the most similar subsequence of the input series when compared to the target sequence. The following statements perform sliding similarity analysis on the example data set:
```sas
proc similarity data=workers out=_NULL_ print=(slides summary);
   id date interval=month;
   input electric;
   target masonry / slide=index measure=msqrdev
          expand=(localabs=3 globalabs=3)
          compress=(localabs=3 globalabs=3);
run;
```
The DATA=WORKERS option specifies that the input data set WORK.WORKERS is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The PRINT=(SLIDES SUMMARY) option specifies that the ODS tables related to the sliding similarity measures and their summary are produced. The ID statement specifies that the time ID variable is DATE and that the data are monthly (INTERVAL=MONTH). The INPUT statement specifies that the input variable is ELECTRIC. The TARGET statement specifies that the target variable is MASONRY and that the similarity measure is computed using mean squared deviation (MEASURE=MSQRDEV). The SLIDE=INDEX option specifies observation index sliding. The COMPRESS=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute compression to 3. The EXPAND=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute expansion to 3.
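As a rough illustration of what index sliding with a mean squared deviation measure computes, the following Python sketch slides the target along the input one observation at a time and takes the mean squared deviation over each aligned window. It deliberately omits the warping (compression and expansion, up to 3 here) that PROC SIMILARITY applies, so it would not reproduce the SAS values; the function names `msqrdev` and `sliding_msqrdev` are hypothetical.

```python
def msqrdev(x, y):
    """Mean squared deviation between two equal-length, non-empty sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def sliding_msqrdev(input_series, target, step=1):
    """Return {slide_index: measure} for every window that fully covers the target.

    step=1 mimics observation-index sliding; step=12 would mimic seasonal
    sliding for monthly data. No warping is applied (simplifying assumption).
    """
    m = len(target)
    return {
        k: msqrdev(input_series[k:k + m], target)
        for k in range(0, len(input_series) - m + 1, step)
    }

# Toy data: the target matches the input exactly starting at index 2.
inp = [5.0, 1.0, 2.0, 4.0, 6.0, 3.0]
tgt = [2.0, 4.0, 6.0]
measures = sliding_msqrdev(inp, tgt)
best = min(measures, key=measures.get)
print(best, measures[best])  # 2 0.0
```

The slide with the smallest measure is the one reported as most similar; warping lets each slide stretch or shrink the alignment before that measure is taken.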
Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

| Slide Index | DATE | Slide Target Sequence Length | Slide Input Sequence Length | Slide Warping Amount | Slide Minimum Measure |
|---|---|---|---|---|---|
| 0 | JAN1977 | 48 | 51 | 3 | 497.6737 |
| 1 | FEB1977 | 48 | 51 | 1 | 482.6777 |
| 2 | MAR1977 | 48 | 51 | 0 | 474.1251 |
| 3 | APR1977 | 48 | 51 | 0 | 490.7792 |
| 4 | MAY1977 | 48 | 51 | -2 | 533.0788 |
| 5 | JUN1977 | 48 | 51 | -3 | 605.8198 |
| 6 | JUL1977 | 48 | 51 | -3 | 701.7138 |
| 7 | AUG1977 | 48 | 51 | 3 | 646.5918 |
| 8 | SEP1977 | 48 | 51 | 3 | 616.3258 |
| 9 | OCT1977 | 48 | 51 | 3 | 510.9836 |
| 10 | NOV1977 | 48 | 51 | 3 | 382.1434 |
| 11 | DEC1977 | 48 | 51 | 3 | 340.4702 |
| 12 | JAN1978 | 48 | 51 | 2 | 327.0572 |
| 13 | FEB1978 | 48 | 51 | 1 | 322.5460 |
| 14 | MAR1978 | 48 | 51 | 0 | 325.2689 |
| 15 | APR1978 | 48 | 51 | -1 | 351.4161 |
| 16 | MAY1978 | 48 | 51 | -2 | 398.0490 |
| 17 | JUN1978 | 48 | 50 | -3 | 471.6931 |
| 18 | JUL1978 | 48 | 49 | -3 | 590.8089 |
| 19 | AUG1978 | 48 | 48 | 0 | 595.2538 |
| 20 | SEP1978 | 48 | 47 | -1 | 689.2233 |
| 21 | OCT1978 | 48 | 46 | -2 | 745.8891 |
| 22 | NOV1978 | 48 | 45 | -3 | 679.1907 |
This analysis results in 23 slides based on the observation index, with the minimum measure (322.5460) occurring at slide index 13, which corresponds to the time value FEB1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978, so the retained MASONRY values begin one month before the best-matching ELECTRIC subsequence. This similarity analysis therefore justifies the belief, based on time series cross-correlation analysis, that ELECTRIC lags MASONRY by one month, despite the lack of target data (MASONRY).
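The slide count, the input sequence lengths, and the location of the minimum can be checked against the table with a little arithmetic. The Python sketch below assumes 67 input observations (JAN1977 through JUL1982), a 48-observation target, and that each slide uses at most 48 + 3 input observations and needs at least 48 - 3, where 3 is the warping limit. These rules are inferred from the table, not taken from the SAS documentation; the measures are copied from the table, and `slide_date` is a hypothetical helper.

```python
n_input, n_target, warp = 67, 48, 3  # inferred from the table (assumption)

# A slide starting at index k has n_input - k observations left; it is usable
# while at least n_target - warp remain, and it uses at most n_target + warp.
slides = [k for k in range(n_input) if n_input - k >= n_target - warp]
input_lengths = [min(n_target + warp, n_input - k) for k in slides]

# "Slide Minimum Measure" column, in slide-index order.
measures = [497.6737, 482.6777, 474.1251, 490.7792, 533.0788, 605.8198,
            701.7138, 646.5918, 616.3258, 510.9836, 382.1434, 340.4702,
            327.0572, 322.5460, 325.2689, 351.4161, 398.0490, 471.6931,
            590.8089, 595.2538, 689.2233, 745.8891, 679.1907]

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def slide_date(k, start_year=1977):
    """Map an observation index to its MONYYYY label, counting from JAN1977."""
    return f"{MONTHS[k % 12]}{start_year + k // 12}"

best = min(range(len(measures)), key=measures.__getitem__)
print(len(slides), input_lengths[0], input_lengths[-1])  # 23 51 45
print(best, slide_date(best), measures[best])            # 13 FEB1978 322.546
```

Under these assumptions the computed input lengths (51 for slides 0 through 16, then 50 down to 45) match the "Slide Input Sequence Length" column.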
The goal of seasonal sliding similarity analysis is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements repeat the preceding similarity analysis on the example data set with seasonal sliding:
```sas
proc similarity data=workers out=_NULL_ print=(slides summary);
   id date interval=month;
   input electric;
   target masonry / slide=season measure=msqrdev;
run;
```
Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

| Slide Index | DATE | Slide Target Sequence Length | Slide Input Sequence Length | Slide Warping Amount | Slide Minimum Measure |
|---|---|---|---|---|---|
| 0 | JAN1977 | 48 | 48 | 0 | 1040.086 |
| 12 | JAN1978 | 48 | 48 | 0 | 641.927 |
This analysis differs from the previous one in that the slides are performed based on the seasonal index (SLIDE=SEASON) with no warping. With a seasonality of 12, two seasonal slides are considered, at slide indices 0 and 12, with the minimum measure (641.9273) occurring at slide index 12, which corresponds to the time value JAN1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978. This similarity analysis justifies the belief, based on seasonal decomposition analysis, that ELECTRIC and MASONRY have similar seasonal properties, despite the lack of target data (MASONRY).
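Why only slide indices 0 and 12 appear follows from similar arithmetic: with SLIDE=SEASON the slide start moves in steps of the seasonal length, and without warping each slide needs a full 48 input observations. The Python sketch below again assumes 67 monthly input observations starting at JAN1977 (inferred from the first table, not stated by SAS); the measures are copied from the seasonal summary table.

```python
n_input, n_target, season = 67, 48, 12  # assumptions inferred from the tables

# Seasonal slides start at multiples of the season length; with no warping,
# a slide is usable only if a full target-length window fits in the input.
seasonal_slides = [k for k in range(0, n_input, season) if k + n_target <= n_input]

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
labels = [f"{MONTHS[k % 12]}{1977 + k // 12}" for k in seasonal_slides]

# Measures copied from the seasonal summary table.
measures = {0: 1040.086, 12: 641.927}
best = min(measures, key=measures.get)
print(seasonal_slides, labels, best)  # [0, 12] ['JAN1977', 'JAN1978'] 12
```

A third seasonal slide at index 24 would need input observations through index 71, beyond the 67 available, which is why it is not reported.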
Note: This procedure is experimental.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.