The SIMILARITY Procedure
TARGET Statement
The TARGET statement lists the numeric target variables in the DATA= data set whose values are to be accumulated to form the time series or represent ordered numeric sequences (when no ID statement is specified).
An input data set variable can be specified in only one INPUT or TARGET statement. Any number of TARGET statements can be used. The following options can be used with a TARGET statement.
ACCUMULATE=option
specifies how the data set observations are accumulated within each time period for the variables listed in the TARGET statement. If the ACCUMULATE= option is not specified in the TARGET statement, accumulation is determined by the ACCUMULATE= option of the ID statement. If the ACCUMULATE= option is specified in neither the ID statement nor the TARGET statement, no accumulation is performed. See the ID statement ACCUMULATE= option for more details.
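As a minimal sketch of this precedence, assume a hypothetical transactional data set WORK.TRANS with a date variable SALEDATE and numeric variables UNITS and REVENUE (none of these names come from this documentation). The ID statement sets a default accumulation, and the TARGET statement overrides it for its own variables:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.trans outsum=work.simsum;
   /* ID-level default: observations are totaled within each month */
   id saledate interval=month accumulate=total;
   input units;
   /* TARGET-level override: REVENUE is averaged within each month */
   target revenue / accumulate=average;
run;
```

Under the rule described above, REVENUE would be averaged within each month because the TARGET statement setting is used in place of the ID statement setting.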
COMPRESS=(option-list)
specifies the sliding sequence (global) and warping (local) compression range of the target sequence with respect to the input sequence. Compression of the target sequence is the same as expansion of the input sequence and vice versa. The compression limits are defined based on the length of the target sequence and are imposed on the target sequence. The following compression options are provided:
GLOBALABS=integer
specifies the absolute global compression, where integer ranges from zero to 10,000. GLOBALABS=0 implies no global compression, which is the default unless the GLOBALPCT= option is specified.
GLOBALPCT=number
specifies global compression as a percentage of the length of the target sequence, where number ranges from zero to 100. GLOBALPCT=0 implies no global compression, which is the default unless the GLOBALABS= option is specified. GLOBALPCT=100 implies maximum allowable global compression.
LOCALABS=integer
specifies the absolute local compression, where integer ranges from zero to 10,000. The default is the maximum allowable absolute local compression unless the LOCALPCT= option is specified.
LOCALPCT=number
specifies local compression as a percentage of the length of the target sequence, where number ranges from zero to 100. The percentage specified by the LOCALPCT= option must be less than the percentage specified by the GLOBALPCT= option. LOCALPCT=0 implies no local compression. LOCALPCT=100 implies maximum allowable local compression. The default is LOCALPCT=100.
If the SLIDE=NONE or the SLIDE=SEASON option is specified in the TARGET statement, the global compression options are ignored. To disallow local compression, use the option COMPRESS=(LOCALPCT=0 LOCALABS=0).
If the SLIDE=INDEX option is specified, the global compression options are not ignored. To completely disallow both global and local compression, use the option COMPRESS=(GLOBALPCT=0 LOCALPCT=0) or COMPRESS=(GLOBALABS=0 LOCALABS=0). To allow only local compression, use the option COMPRESS=(GLOBALPCT=0 GLOBALABS=0); this is the default compression setting.
The above options can be used in combination to specify the desired amount of global and local compression as the following examples illustrate.
COMPRESS=(GLOBALPCT=20) allows the global and local compression to range from zero to a limit based on 20% of the length of the target sequence.
COMPRESS=(GLOBALPCT=20 GLOBALABS=10) allows the global and local compression to range from zero to a limit based on both 20% of the length of the target sequence and the absolute value 10.
COMPRESS=(LOCALPCT=10) allows the local compression to range from zero to a limit based on the 10% setting.
COMPRESS=(LOCALPCT=20 LOCALABS=5) allows the local compression to range from zero to a limit based on both the 20% setting and the absolute value 5.
COMPRESS=(GLOBALPCT=20 LOCALPCT=20) allows the global compression and the local compression each to range from zero to a limit based on its own 20% setting.
COMPRESS=(GLOBALPCT=20 GLOBALABS=10 LOCALPCT=10 LOCALABS=5) allows the global compression to range from zero to a limit based on both the 20% setting and the absolute value 10, and allows the local compression to range from zero to a limit based on both the 10% setting and the absolute value 5.
The valid global compression limit is always bounded by the length of the target sequence.
The valid local compression limit is always bounded by the lengths of the input and target sequences.
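The following sketch shows where a COMPRESS= specification sits in a complete step. The data set WORK.SERIES and the variables DATE, X, and Y are hypothetical names chosen for illustration. SLIDE=INDEX is used because the global compression options are ignored under SLIDE=NONE and SLIDE=SEASON:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* Global compression: percentage and absolute settings together.
      Local compression: absolute setting only. */
   target y / slide=index
              compress=(globalpct=20 globalabs=10 localabs=5);
run;
```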
DIF=(numlist)
specifies the differencing to be applied to the accumulated time series. The differencing orders in the list must be separated by spaces or commas. For example, DIF=(1,3) specifies first-order, then third-order, differencing. Differencing is applied after the time series transformation; that is, the TRANSFORM= option is applied before the DIF= option. Simple differencing is useful when you want to detrend the time series before computing the similarity measures.
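A short sketch of the order of operations, using the hypothetical data set WORK.SERIES and hypothetical variables DATE, X, and Y: the target is log-transformed first and then differenced once, regardless of the order in which the two options are written.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* TRANSFORM= is applied before DIF=, so this takes the first
      difference of log(Y). Y must be strictly positive for LOG. */
   target y / dif=(1) transform=log;
run;
```

The INPUT variable is left unmodified here only to keep the sketch focused on the TARGET statement options.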
EXPAND=(option-list)
specifies the sliding sequence (global) and warping (local) expansion range of the target sequence with respect to the input sequence. Expansion of the target sequence is the same as compression of the input sequence and vice versa. The expansion limits are defined based on the length of the input sequence, but are imposed on the target sequence. The following expansion options are provided:
GLOBALABS=integer
specifies the absolute global expansion, where integer ranges from zero to 10,000. GLOBALABS=0 implies no global expansion, which is the default unless the GLOBALPCT= option is specified.
GLOBALPCT=number
specifies global expansion as a percentage of the length of the target sequence, where number ranges from zero to 100. GLOBALPCT=0 implies no global expansion, which is the default unless the GLOBALABS= option is specified. GLOBALPCT=100 implies maximum allowable global expansion.
LOCALABS=integer
specifies the absolute local expansion, where integer ranges from zero to 10,000. The default is the maximum allowable absolute local expansion unless the LOCALPCT= option is specified.
LOCALPCT=number
specifies local expansion as a percentage of the length of the target sequence, where number ranges from zero to 100. LOCALPCT=0 implies no local expansion. LOCALPCT=100 implies maximum allowable local expansion. The default is LOCALPCT=100.
If the SLIDE=NONE or the SLIDE=SEASON option is specified in the TARGET statement, the global expansion options are ignored. To disallow local expansion, use the option EXPAND=(LOCALPCT=0 LOCALABS=0).
If the SLIDE=INDEX option is specified, the global expansion options are not ignored. To completely disallow both global and local expansion, use the option EXPAND=(GLOBALPCT=0 LOCALPCT=0) or EXPAND=(GLOBALABS=0 LOCALABS=0). To allow only local expansion, use the option EXPAND=(GLOBALPCT=0 GLOBALABS=0); this is the default expansion setting.
The above options can be used in combination to specify the desired amount of global and local expansion as the following examples illustrate.
EXPAND=(GLOBALPCT=20) allows the global and local expansion to range from zero to a limit based on 20% of the length of the target sequence.
EXPAND=(GLOBALPCT=20 GLOBALABS=10) allows the global and local expansion to range from zero to a limit based on both 20% of the length of the target sequence and the absolute value 10.
EXPAND=(LOCALPCT=10) allows the local expansion to range from zero to a limit based on the 10% setting.
EXPAND=(LOCALPCT=10 LOCALABS=5) allows the local expansion to range from zero to a limit based on both the 10% setting and the absolute value 5.
EXPAND=(GLOBALPCT=20 LOCALPCT=10) allows the global expansion to range from zero to a limit based on the 20% setting and allows the local expansion to range from zero to a limit based on the 10% setting.
EXPAND=(GLOBALPCT=20 GLOBALABS=10 LOCALPCT=10 LOCALABS=5) allows the global expansion to range from zero to a limit based on both the 20% setting and the absolute value 10, and allows the local expansion to range from zero to a limit based on both the 10% setting and the absolute value 5.
The valid global expansion limit is always bounded by the length of the input time series.
The valid local expansion limit is always bounded by the lengths of the input and target sequences.
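As an illustration of turning warping off entirely, the sketch below combines the "disallow local" settings given above for both EXPAND= and COMPRESS= under SLIDE=NONE, where the global options are ignored anyway. The data set WORK.SERIES and the variables DATE, X, and Y are hypothetical:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* No sliding, no local expansion, no local compression */
   target y / slide=none
              expand=(localpct=0 localabs=0)
              compress=(localpct=0 localabs=0);
run;
```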
MEASURE=option
specifies the similarity measure to be computed by using the working input and target sequences. The following similarity measures are provided:
SQRDEV
squared deviation. This option is the default.
ABSDEV
absolute deviation
MSQRDEV
mean squared deviation
MSQRDEVINP
mean squared deviation relative to the length of the input sequence
MSQRDEVTAR
mean squared deviation relative to the length of the target sequence
MSQRDEVMIN
mean squared deviation relative to the minimum valid path length
MSQRDEVMAX
mean squared deviation relative to the maximum valid path length
MABSDEV
mean absolute deviation
MABSDEVINP
mean absolute deviation relative to the length of the input sequence
MABSDEVTAR
mean absolute deviation relative to the length of the target sequence
MABSDEVMIN
mean absolute deviation relative to the minimum valid path length
MABSDEVMAX
mean absolute deviation relative to the maximum valid path length
User-Defined
measure computed by a user-defined function created by using the FCMP procedure, where User-Defined is the function name
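For example, a mean absolute deviation could be requested for two target variables at once. This is a sketch only: the data set WORK.SERIES and the variables DATE, X, Y, and Z are hypothetical, and the keyword MABSDEV is assumed to be the option value for mean absolute deviation.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* Assumed keyword: MABSDEV = mean absolute deviation.
      The same measure applies to every variable in this statement. */
   target y z / measure=mabsdev;
run;
```

Because any number of TARGET statements can be used, variables that need different measures can be placed in separate TARGET statements.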
NORMALIZE=option
specifies the sequence normalization to be applied to the working target sequence. The following normalization options are provided:
NONE
No normalization is applied. This option is the default.
ABSOLUTE
Absolute normalization is applied.
STANDARD
Standard normalization is applied.
User-Defined
Normalization is computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name.
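A sketch of requesting standard normalization for a target, assuming the option is spelled NORMALIZE= and accepts the value STANDARD. The data set WORK.SERIES and the variables DATE, X, and Y are hypothetical:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* Assumed spelling: NORMALIZE=STANDARD applies standard
      normalization to the working target sequence. */
   target y / normalize=standard;
run;
```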
PATH=option
specifies the similarity measure and warping path information to be computed by using the working input and target sequences. The following similarity measure and warping path option is provided:
User-Defined
measure and path computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name
For computational efficiency, the PATH= option should be used only when both the similarity measure and the warping path information are needed. If only the similarity measure is needed, use the MEASURE= option. If both the MEASURE= and PATH= options are specified in the TARGET statement, the PATH= option takes precedence.
SDIF=(numlist)
specifies the seasonal differencing to be applied to the accumulated time series. The seasonal differencing orders in the list must be separated by spaces or commas. For example, SDIF=(1,3) specifies first-order, then third-order, seasonal differencing. Differencing is applied after the time series transformation; that is, the TRANSFORM= option is applied before the SDIF= option. Seasonal differencing is useful when you want to deseasonalize the time series before computing the similarity measures.
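A minimal sketch of seasonal differencing on monthly data, using the hypothetical data set WORK.SERIES and hypothetical variables DATE, X, and Y. The seasonal period is assumed to follow from the INTERVAL= value of the ID statement:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   /* Monthly interval; the seasonal period is assumed to be
      implied by INTERVAL=MONTH. */
   id date interval=month;
   input x;
   /* One simple difference and one seasonal difference */
   target y / dif=(1) sdif=(1);
run;
```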
SETMISSING=option | number
specifies how missing values (either actual or accumulated) are interpreted in the accumulated time series for variables listed in the TARGET statement. If the SETMISSING= option is not specified in the TARGET statement, missing values are set based on the SETMISSING= option of the ID statement. If the SETMISSING= option is specified in neither the ID statement nor the TARGET statement, no missing value interpretation is performed. See the ID statement SETMISSING= option for more details.
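As a sketch, missing values in the accumulated target series could be replaced by a constant. This assumes that SETMISSING= accepts a number, as the ID statement option does; the data set WORK.TRANS and the variables SALEDATE, UNITS, and REVENUE are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.trans outsum=work.simsum;
   id saledate interval=month accumulate=total;
   input units;
   /* Assumption: a numeric value is accepted, so months with no
      REVENUE observations are treated as zero rather than missing. */
   target revenue / setmissing=0;
run;
```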
SLIDE=option
specifies the sliding of the target sequence with respect to the input sequence. The following slides are provided:
NONE
No sequence sliding. The input time series is compared with the target sequence directly with no sliding. This option is the default.
INDEX
Slide by time index. The input time series is compared with the target sequence by observation index.
SEASON
Slide by seasonal index. The input time series is compared with the target sequence by seasonal index.
Note: The SLIDE= option takes precedence over the COMPRESS= and EXPAND= options.
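The sketch below slides the target by seasonal index on monthly data. The data set WORK.SERIES and the variables DATE, X, and Y are hypothetical. Because SLIDE=SEASON is specified, any global settings in COMPRESS= or EXPAND= would be ignored, so only local settings are given:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* Slide by seasonal index; only local warping limits apply */
   target y / slide=season
              expand=(localabs=2)
              compress=(localabs=2);
run;
```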
TRANSFORM=option
specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided:
NONE
No transformation is applied. This option is the default.
LOG
Logarithmic transformation is applied.
SQRT
Square-root transformation is applied.
LOGISTIC
Logistic transformation is applied.
BOXCOX(number)
Box-Cox transformation with parameter number is applied, where number is a real number between -5 and 5.
User-Defined
Transformation is computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name.
When the TRANSFORM= option is specified, the time series must be strictly positive unless a user-defined function is used.
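A sketch of a Box-Cox transformation on the target, assuming the option value is written BOXCOX(number). The data set WORK.SERIES and the variables DATE, X, and Y are hypothetical, and Y is assumed to be strictly positive:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.series outsum=work.simsum;
   id date interval=month;
   input x;
   /* Assumed syntax: BOXCOX(number), with the parameter
      between -5 and 5. Y must be strictly positive. */
   target y / transform=boxcox(0.5);
run;
```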
TRIMMISSING=option
specifies how missing values (either actual or accumulated) are trimmed from the accumulated time series or ordered sequence for variables listed in the TARGET statement. The following trimming options are provided:
NONE
No missing value trimming is applied.
LEFT
Beginning missing values are trimmed.
RIGHT
Ending missing values are trimmed.
BOTH
Both beginning and ending missing values are trimmed. This option is the default.
ZEROMISS=option
specifies how beginning and/or ending zero values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the TARGET statement. If the ZEROMISS= option is not specified in the TARGET statement, beginning and/or ending values are set based on the ZEROMISS= option of the ID statement. See the ID statement ZEROMISS= option for more details.
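The two options for leading and trailing values can be combined, as in the sketch below. Several things here are assumptions rather than statements from this page: the trimming option is assumed to be spelled TRIMMISSING=, ZEROMISS= is assumed to accept the value LEFT and to treat beginning zeros as missing, and the data set WORK.TRANS with variables SALEDATE, UNITS, and REVENUE is hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc similarity data=work.trans outsum=work.simsum;
   id saledate interval=month accumulate=total;
   input units;
   /* Assumed values: ZEROMISS=LEFT treats beginning zeros as missing;
      TRIMMISSING=LEFT then trims beginning missing values. */
   target revenue / zeromiss=left trimmissing=left;
run;
```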
Note: This procedure is experimental.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.