The SIMILARITY Procedure

INPUT Statement

INPUT variable-list < / options> ;

The INPUT statement lists the input numeric variables in the DATA= data set whose values are to be accumulated to form the time series or represent ordered numeric sequences (when no ID statement is specified).

An input data set variable can be specified in only one INPUT or TARGET statement. Any number of INPUT statements can be used. The following options can be used with an INPUT statement:

ACCUMULATE=option

specifies how the data set observations are accumulated within each time period for the variables listed in the INPUT statement. If the ACCUMULATE= option is not specified in the INPUT statement, accumulation is determined by the ACCUMULATE= option of the ID statement. If the ACCUMULATE= option is not specified in the ID statement or the INPUT statement, no accumulation is performed. See the ID statement ACCUMULATE= option for more details.

DIF=(numlist)

specifies the differencing to be applied to the accumulated time series. The list of differencing orders must be separated by spaces or commas. For example, DIF=(1,3) specifies first, then third order, differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the DIF= option. Simple differencing is useful when you want to detrend the time series before computing the similarity measures.

NORMALIZE=option

specifies the sequence normalization to be applied to the working input sequence. The following normalization options are provided:

NONE: No normalization is applied. This option is the default.
ABSOLUTE: Absolute normalization is applied.
STANDARD: Standard normalization is applied.
User-Defined: Normalization is computed by a user-defined subroutine that is created using the FCMP procedure, where User-Defined is the subroutine name.

Normalization is applied to the working input sequence, which can be a subset of the working input time series if the SLIDE=INDEX or SLIDE=SEASON option is specified.

SCALE=option

specifies the scaling of the working input sequence with respect to the working target sequence. Scaling is performed after normalization. The following scaling options are provided:

NONE: No scaling is applied. This option is the default.
ABSOLUTE: Absolute scaling is applied.
STANDARD: Standard scaling is applied.
User-Defined: Scaling is computed by a user-defined subroutine that is created using the FCMP procedure, where User-Defined is the subroutine name.

Scaling is applied to the working input sequence, which can be a subset of the working input time series if the SLIDE=INDEX or SLIDE=SEASON option is specified.

SDIF=(numlist)

specifies the seasonal differencing to be applied to the accumulated time series. The list of seasonal differencing orders must be separated by spaces or commas. For example, SDIF=(1,3) specifies first, then third, order seasonal differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the SDIF= option. Seasonal differencing is useful when you want to deseasonalize the time series before computing the similarity measures.

SETMISSING=option | number SETMISS=option | number

specifies how missing values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the INPUT statement. If the SETMISSING= option is not specified in the INPUT statement, missing values are set based on the SETMISSING= option in the ID statement. If the SETMISSING= option is not specified in the ID statement or the INPUT statement, no missing value interpretation is performed. See the ID statement SETMISSING= option for more details.

TRANSFORM=option

specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided:

NONE: No transformation is applied. This option is the default.
LOG: Logarithmic transformation is applied.
SQRT: Square-root transformation is applied.
LOGISTIC: Logistic transformation is applied.
BOXCOX(number): Box-Cox transformation with parameter is applied, where the real number is between –5 and 5.
User-Defined: Transformation is computed by a user-defined subroutine that is created using the FCMP procedure, where User-Defined is the subroutine name.

When the TRANSFORM= option is specified, the time series must be strictly positive unless a user-defined function is used.

TRIMMISSING=option TRIMMISSING=option

specifies how missing values (either actual or accumulated) are trimmed from the accumulated time series or ordered sequence for variables that are listed in the INPUT statement. The following trimming options are provided:

NONE: No missing value trimming is applied.
LEFT: Beginning missing values are trimmed.
RIGHT: Ending missing values are trimmed.
BOTH: Both beginning and ending missing value are trimmed. This is the default.

ZEROMISS=option

specifies how beginning and ending zero values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the INPUT statement. If the ZEROMISS= option is not specified in the INPUT statement, beginning and ending zero values are set based on the ZEROMISS= option of the ID statement. If the ZERO= option is not specified in the ID statement or the INPUT statement, no zero value interpretation is performed. See the ID statement ZEROMISS= option for more details.