
The SIMILARITY Procedure

PROC SIMILARITY Statement

PROC SIMILARITY options ;

The following options can be used in the PROC SIMILARITY statement.

DATA=SAS-data-set

names the SAS data set that contains the time series, transactional, or sequence input data for the procedure. If the DATA= option is not specified, the most recently created SAS data set is used.
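The sketch below shows a minimal complete PROC SIMILARITY step that reads its input through DATA=. It assumes the SASHELP.WORKERS sample data set, with a monthly DATE variable and the series ELECTRIC and MASONRY. The ID, INPUT, and TARGET statements and the TARGET statement MEASURE= option are documented with those statements, not in this section.

```sas
/* Minimal sketch: compare the ELECTRIC series (input) with the MASONRY */
/* series (target). Assumes SASHELP.WORKERS contains DATE, ELECTRIC,    */
/* and MASONRY.                                                         */
proc similarity data=sashelp.workers print=summary;
   id date interval=month;          /* monthly time ID                  */
   input electric;                  /* input sequence                   */
   target masonry / measure=absdev; /* target sequence and measure      */
run;
```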

ORDER=option

specifies the order in which the variables listed in the INPUT and TARGET statements are processed. This ordering affects the OUTSEQUENCE=, OUTPATH=, OUTMEASURE=, and OUTSUM= data sets, as well as the printed and graphical output. The SORTNAMES option also affects the ordering of the analysis. The following values are available:

INPUT

specifies that each INPUT variable is processed and then the TARGET variables are processed. The results are stored and printed based only on the INPUT variables.

INPUTTARGET

specifies that each INPUT variable is processed and then the TARGET variables are processed. The results are stored and printed based on both the INPUT and TARGET variables. This is the default.

TARGET

specifies that each TARGET variable is processed and then the INPUT variables are processed. The results are stored and printed based only on the TARGET variables.

TARGETINPUT

specifies that each TARGET variable is processed and then the INPUT variables are processed. The results are stored and printed based on both the TARGET and INPUT variables.

OUT=SAS-data-set

names the output data set to contain the time series variables specified in the subsequent INPUT and TARGET statements. If an ID variable is specified, it is also included in the OUT= data set. The values are accumulated based on the ID statement INTERVAL= option, the ACCUMULATE= options, or both. The values are then transformed based on the INPUT or TARGET statement TRANSFORM=, DIF=, and/or SDIF= options, applied in that order. The OUT= data set is particularly useful when you want to further analyze, model, or forecast the resulting time series with other SAS/ETS procedures.

OUTMEASURE=SAS-data-set

names the output data set to contain the detailed similarity measures by time ID value. The form of the OUTMEASURE= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.

OUTPATH=SAS-data-set

names the output data set to contain the path used to compute the similarity measures for each slide and warp. The form of the OUTPATH= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.

OUTSEQUENCE=SAS-data-set

names the output data set to contain the sequences used to compute the similarity measures for each slide and warp. The form of the OUTSEQUENCE= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.

OUTSUM=SAS-data-set

names the output data set to contain the similarity measure summary. The OUTSUM= data set is particularly useful when you analyze large numbers of series and need only the summary of the results. The form of the OUTSUM= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.
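As a sketch of how the ORDER= option and the output data set options combine, the step below writes the accumulated series, the similarity measure summary, and the detailed measures, processing the TARGET variable first. It assumes the SASHELP.WORKERS sample data set (DATE, ELECTRIC, MASONRY); the output data set names WORK.SERIES, WORK.SIMSUM, and WORK.SIMMEAS are hypothetical.

```sas
/* Sketch: request several output data sets and process TARGET variables */
/* before INPUT variables. Output data set names are hypothetical.       */
proc similarity data=sashelp.workers
                out=work.series         /* accumulated, transformed series */
                outsum=work.simsum      /* similarity measure summary      */
                outmeasure=work.simmeas /* measures by time ID value       */
                order=targetinput;
   id date interval=month;
   input electric;
   target masonry / measure=absdev;
run;

/* Inspect the summary data set. */
proc print data=work.simsum;
run;
```

With ORDER=TARGETINPUT, the layout of WORK.SIMSUM and WORK.SIMMEAS follows the TARGET-then-INPUT ordering rather than the default INPUTTARGET ordering.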

PLOTS=option
PLOTS=(options ...)

specifies the graphical output desired. The options are separated by spaces. By default, the SIMILARITY procedure produces no graphical output. The following graphical options are available:

ALL

same as PLOTS=(INPUTS TARGETS SEQUENCES NORMALIZED SCALED DISTANCES PATHS MAPS WARPS COSTS MEASURES).

COSTS

plots time warp cost graphics.

DISTANCES

plots the absolute and relative similarity distance graphics (OUTPATH= data set).

INPUTS

plots the input variable time series graphics (OUT= data set).

MAPS

plots time warp map graphics (OUTPATH= data set).

MEASURES

plots similarity measure graphics (OUTMEASURE= data set).

NORMALIZED

plots both the input and target variable normalized sequence graphics. These plots are displayed only when the INPUT or TARGET statement NORMALIZE= option is specified.

PATHS

plots time warp path graphics (OUTPATH= data set).

SCALED

plots the input variable scaled sequence graphics. These plots are displayed only when the INPUT statement SCALE= option is specified.

SEQUENCES

plots both the input and target variable sequence graphics (OUTSEQUENCE= data set).

TARGETS

plots the target variable time series graphics (OUT= data set).

WARPS

plots time warp graphics (OUTPATH= data set).
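The sketch below requests a subset of the plots listed above. Because the NORMALIZED plots are displayed only when NORMALIZE= is specified, it sets NORMALIZE=STANDARD on both the INPUT and TARGET statements. It assumes the SASHELP.WORKERS sample data set and assumes that ODS Graphics must be enabled for PLOTS= output, as with other SAS/ETS procedures of this release.

```sas
/* Sketch: request sequence, normalized, path, and warp plots.           */
/* Assumes ODS Graphics must be enabled for PLOTS= output.               */
ods graphics on;

proc similarity data=sashelp.workers
                plots=(sequences normalized paths warps);
   id date interval=month;
   /* NORMALIZE= is required for the NORMALIZED plots to appear. */
   input electric / normalize=standard;
   target masonry / normalize=standard measure=absdev;
run;

ods graphics off;
```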

PRINT=option
PRINT=(options ...)

specifies the printed output desired. The options are separated by spaces. By default, the SIMILARITY procedure produces no printed output. The following printing options are available:

DESCSTATS

prints the descriptive statistics for the working time series.

PATHS

prints the path statistics table.

COSTS

prints the cost statistics table.

WARPS

prints the warp summary table.

SLIDES

prints the slides summary table.

SUMMARY

prints the similarity measure summary table.

ALL

same as PRINT=(DESCSTATS PATHS COSTS WARPS SLIDES SUMMARY).

PRINTDETAILS

specifies that output requested with the PRINT= option be printed in greater detail.
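The sketch below combines a PRINT= list with PRINTDETAILS to print the descriptive statistics, the warp summary, and the similarity measure summary in greater detail. It assumes the SASHELP.WORKERS sample data set (DATE, ELECTRIC, MASONRY).

```sas
/* Sketch: print selected tables, in greater detail. */
proc similarity data=sashelp.workers
                print=(descstats warps summary)
                printdetails;
   id date interval=month;
   input electric;
   target masonry / measure=absdev;
run;
```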

SEASONALITY=integer

specifies the length of the seasonal cycle, where integer ranges from 1 to 10,000. For example, SEASONALITY=3 means that every group of three time periods forms a seasonal cycle. By default, the length of the seasonal cycle is one (no seasonality) or the length implied by the INTERVAL= option specified in the ID statement. For example, INTERVAL=MONTH implies that the length of the seasonal cycle is twelve.
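The sketch below overrides the seasonal cycle length that INTERVAL=MONTH would otherwise imply (twelve) with SEASONALITY=6. It assumes the SASHELP.WORKERS sample data set and assumes a TARGET statement option SLIDE=SEASON, documented with the TARGET statement rather than here, that slides the input sequence by seasonal index and therefore depends on the cycle length.

```sas
/* Sketch: INTERVAL=MONTH alone would imply a seasonal cycle of 12;      */
/* SEASONALITY=6 replaces it with a cycle of 6.                           */
proc similarity data=sashelp.workers
                seasonality=6
                print=(slides summary);
   id date interval=month;
   input electric;
   /* Assumed option: SLIDE=SEASON slides by seasonal index, so the      */
   /* slides reported depend on the SEASONALITY= value.                  */
   target masonry / slide=season measure=absdev;
run;
```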

SORTNAMES

specifies that the variables specified in the INPUT and TARGET statements are processed in order sorted by the variable names. By default, the SIMILARITY procedure processes the variables in the order they are listed. The ORDER= option also affects the ordering of the analysis.
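The sketch below illustrates SORTNAMES. The data set WORK.STORES and its variables DATE, ZETA, ALPHA, and BENCHMARK are hypothetical. Without SORTNAMES, the input variables would be processed in the listed order (ZETA, then ALPHA); with SORTNAMES, they are processed in name order (ALPHA, then ZETA), and the printed output follows that order.

```sas
/* Sketch: WORK.STORES and all variable names are hypothetical.          */
/* Listed order:  ZETA, ALPHA                                            */
/* SORTNAMES:     ALPHA, ZETA                                            */
proc similarity data=work.stores sortnames print=summary;
   id date interval=day;
   input zeta alpha;
   target benchmark / measure=absdev;
run;
```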


Note: This procedure is experimental.
