The SIMILARITY Procedure

Details: SIMILARITY Procedure

You can use the SIMILARITY procedure to perform the following operations, which are carried out in the order shown. First, you can form time series data from transactional data by using the following options:

  1. accumulation

    ACCUMULATE= option

  2. missing value interpretation

    SETMISSING= option

  3. zero value interpretation

    ZEROMISS= option
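The three steps above can be illustrated with a minimal Python sketch. It is not the procedure's implementation: the function `accumulate_monthly` and its parameters `set_missing` and `zero_is_missing` are hypothetical names chosen for illustration, the accumulation statistic is assumed to be a monthly total, and treating every zero as missing is a simplification of whatever the ZEROMISS= option actually permits.

```python
from datetime import date

def accumulate_monthly(transactions, start, end, set_missing=None, zero_is_missing=False):
    """Sum timestamped values into monthly buckets from start to end (inclusive).

    transactions: iterable of (date, value) pairs, in any order.
    start, end: (year, month) tuples bounding the output series.
    set_missing: value assigned to months that have no transactions.
    zero_is_missing: if True, zeros in the series are replaced by None.
    """
    # Step 1: accumulation (here: total per month).
    totals = {}
    for day, value in transactions:
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0.0) + value

    series = []
    year, month = start
    while (year, month) <= end:
        # Step 2: missing value interpretation for empty periods.
        value = totals.get((year, month), set_missing)
        # Step 3: zero value interpretation (simplified assumption).
        if zero_is_missing and value == 0:
            value = None
        series.append(((year, month), value))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series

transactions = [
    (date(2023, 1, 5), 10.0),
    (date(2023, 1, 20), 5.0),
    (date(2023, 3, 2), 7.0),
    (date(2023, 4, 9), 0.0),
]
monthly = accumulate_monthly(transactions, (2023, 1), (2023, 4),
                             set_missing=None, zero_is_missing=True)
# monthly == [((2023, 1), 15.0), ((2023, 2), None), ((2023, 3), 7.0), ((2023, 4), None)]
```

The steps are applied in the same order as in the list: totals are formed first, empty periods are filled next, and zeros are reinterpreted last.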

Next, you can transform the accumulated time series to form the working time series. Transformations are useful when you want to stabilize the time series before computing the similarity measures. Simple and seasonal differencing are useful when you want to detrend or deseasonalize the time series before computing the similarity measures. Often, but not always, the TRANSFORM=, DIF=, and SDIF= options should be specified in the same way for both the target and input variables. The following options control these steps:

  1. time series transformation

    TRANSFORM= option

  2. time series differencing

    DIF= and SDIF= options

  3. time series missing value trimming

    TRIMMISSING= option

  4. time series descriptive statistics

    PRINT=DESCSTATS option
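As a rough illustration of how a working series might be formed, the following Python sketch applies a log transformation, a simple difference, and trimming of leading and trailing missing values. The helper names (`log_transform`, `difference`, `trim_missing`) are hypothetical, the log is only one possible transformation, and missing values are represented by `None`; none of this is taken from the procedure itself.

```python
import math

def log_transform(series):
    """Apply a natural log to each non-missing value."""
    return [None if x is None else math.log(x) for x in series]

def difference(series, lag=1):
    """Return series[t] - series[t - lag]; entries without a valid pair are None."""
    out = []
    for t, x in enumerate(series):
        if t < lag or x is None or series[t - lag] is None:
            out.append(None)
        else:
            out.append(x - series[t - lag])
    return out

def trim_missing(series):
    """Drop leading and trailing missing values; embedded ones are kept."""
    start, end = 0, len(series)
    while start < end and series[start] is None:
        start += 1
    while end > start and series[end - 1] is None:
        end -= 1
    return series[start:end]

# A series growing 10% per period: log + simple differencing removes the trend.
accumulated = [100.0, 110.0, 121.0, 133.1]
working = trim_missing(difference(log_transform(accumulated), lag=1))
# Each element of working is approximately log(1.1).
```

Seasonal differencing can be sketched with the same helper by setting `lag` to the seasonal period, for example `difference(series, lag=12)` for monthly data. Applying the same chain to both the input and the target series keeps them comparable, which matches the advice above about specifying the options consistently.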

After the working series is formed, you can treat it as an ordered sequence that can be normalized or scaled. Normalizations are useful when you want to compare the "shape" or "profile" of the time series. Scaling is useful when you want to compare the input sequence to the target sequence while discounting the variation of the target sequence.

  1. normalization

    NORMALIZE= option

  2. scaling

    SCALE= option
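The distinction between normalization and scaling can be sketched as follows. This is an illustrative Python reading, not the procedure's definition: `normalize_standard`, `normalize_absolute`, and `scale_to_target` are hypothetical names, and the specific formulas (z-score, min-max, and matching the target's mean and standard deviation) are assumptions about what such options commonly mean.

```python
import statistics

def normalize_standard(seq):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(seq)
    sd = statistics.stdev(seq)
    return [(x - mean) / sd for x in seq]

def normalize_absolute(seq):
    """Map the sequence onto [0, 1] using its own minimum and maximum."""
    lo, hi = min(seq), max(seq)
    return [(x - lo) / (hi - lo) for x in seq]

def scale_to_target(input_seq, target_seq):
    """Rescale the input so its mean and spread match those of the target."""
    t_mean = statistics.mean(target_seq)
    t_sd = statistics.stdev(target_seq)
    return [z * t_sd + t_mean for z in normalize_standard(input_seq)]

input_seq = [1.0, 2.0, 3.0]
target_seq = [10.0, 20.0, 30.0]

# Normalization uses each sequence's own statistics, so only shape remains.
same_shape = normalize_standard(input_seq), normalize_standard(target_seq)

# Scaling uses the target's statistics to rescale the input.
scaled_input = scale_to_target(input_seq, target_seq)
```

In this sketch the two normalized sequences coincide because they have the same profile, while scaling moves the input onto the target's level and spread and leaves the target unchanged.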

After the working sequences are formed, you can compute similarity measures between input and target sequences:

  1. sliding

    SLIDE= option

  2. warping

    COMPRESS= and EXPAND= options

  3. similarity measure

    MEASURE= and PATH= options

The SLIDE= option specifies observation-index sliding, seasonal-index sliding, or no sliding. The COMPRESS= and EXPAND= options specify the warping limits. The MEASURE= and PATH= options specify how the similarity measures are computed.
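To make sliding and warping concrete, here is a small Python sketch under stated assumptions. It uses squared deviation as the similarity measure, observation-index sliding of a shorter input along the target, and a textbook dynamic time warping recursion. The single symmetric limit `max_warp` is a simplification of separate compression and expansion limits, and all function names are hypothetical; the procedure's own algorithms may differ.

```python
def squared_deviation(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_slide(input_seq, target_seq):
    """Slide the input along the target; return (offset, measure) of the best fit."""
    n = len(input_seq)
    best = None
    for offset in range(len(target_seq) - n + 1):
        measure = squared_deviation(input_seq, target_seq[offset:offset + n])
        if best is None or measure < best[1]:
            best = (offset, measure)
    return best

def dtw(input_seq, target_seq, max_warp=None):
    """Dynamic time warping cost with squared deviation as the local cost.

    max_warp, if given, forbids pairing indices i and j with |i - j| > max_warp.
    Returns float('inf') when no path satisfies the limit.
    """
    n, m = len(input_seq), len(target_seq)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if max_warp is not None and abs(i - j) > max_warp:
                continue
            local = (input_seq[i - 1] - target_seq[j - 1]) ** 2
            # Extend the cheapest of: diagonal match, repeat target, repeat input.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]

offset, measure = best_slide([1, 2, 3], [0, 0, 1, 2, 3, 0])   # best fit at offset 2
unwarped = dtw([1, 2, 3], [2, 3, 4], max_warp=0)              # no warping allowed
warped = dtw([1, 2, 3], [2, 3, 4])                            # warping allowed
```

With no warping allowed, the comparison reduces to a point-by-point squared deviation. Relaxing the limit lets one sequence be locally compressed or expanded relative to the other, which can only lower or preserve the measure.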