IMSTAT Procedure (Analytics)

FORECAST Statement

The FORECAST statement computes predicted values, measures of precision, and confidence limits for observed and future (forecast) values of a time series.

Syntax

FORECAST timestamp-variable </ options>;
FORECAST DATA=libref.member-name timestamp-variable </ options>;

Required Arguments

timestamp-variable

specifies the name of the SAS datetime variable to use.

libref.member-name

specifies the libref and table name of a SAS data set when you specify the DATA= option. The data set must contain the timestamp variable and one or more of the analysis variables. The procedure then sends these values to the server to request the forecast calculation.

FORECAST Statement Options

AGGREGATE=(list-of-aggregators)

specifies the aggregation methods that are applied to the analysis variables within each unique value of the timestamp variable. The following aggregators are valid:
CSS corrected sum of squares
CV coefficient of variation
MAX maximum value
MEAN arithmetic mean
MIN minimum value
N number of observations
PROBT p-value for the t-statistic
STD standard deviation
STDERR standard error
SUM sum of the nonmissing values
TSTAT t-statistic for the null hypothesis that the mean equals zero
USS uncorrected sum of squares
VAR sample variance
Each analysis variable can be associated with a different aggregator. For example, the following statement forecasts the sum of expenses and the mean of revenue:
forecast ts / vars=(expenses revenue)
              aggregate=(sum mean);
The default aggregation is the mean of the analysis variables within unique values of the timestamp variable.
Interaction This option has no effect if you specify a data set with the DATA= option.

FORMATS=("format-specification")

specifies the format for the timestamp variable. If you do not specify the FORMATS= option, the default format of the timestamp variable is applied.

Interaction This option has no effect if you specify a data set with the DATA= option.

FRAME=LEAD | HORIZON

FRAME=TAIL | HISTORY

FRAME=BOTH

specifies how to compose the main result table. The default is FRAME=BOTH and the result set contains the observed series (the history) as well as the forecast (the horizon). If you specify FRAME=LEAD (or FRAME=HORIZON), then only the future values are returned. You can control the length of the horizon with the LEAD= option.

If you specify FRAME=TAIL (or FRAME=HISTORY), then only the results for the historical values are returned. The returned values are the aggregated values, their predicted values, residuals, prediction standard errors, and confidence limits. You can control the number of historical records with the TAIL= option.
Alias WINDOW=
Default BOTH
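
As a minimal sketch of how FRAME= and LEAD= combine, the following request returns only the forecast horizon for one variable. The libref, host name, port, tag, table name, and variable names (lasr, grid001.example.com, 10010, hps, sales, ts, revenue) are hypothetical and need to be replaced with values from your own deployment.

```sas
/* Hypothetical connection details and table name; adjust for your site */
libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";

proc imstat;
   table lasr.sales;              /* make lasr.sales the active table      */
   forecast ts / vars=(revenue)   /* ts is assumed to be a datetime column */
                 frame=lead       /* return only the future values         */
                 lead=6;          /* horizon of six time intervals         */
run;
quit;
```

Specifying FRAME=TAIL instead would be expected to return only the historical portion of the result table, whose length can be adjusted with the TAIL= option.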

HOST="host-name"

specifies the machine to which you want to connect to produce the forecast when you specify the DATA= option in the FORECAST statement. If you do not specify the host information, it is determined from the active table.

PORT=number

specifies the port of the server to use to produce the forecast when you specify the DATA= option in the FORECAST statement. You can use this option with the HOST= option to use a specific server. If you do not specify a PORT= value, the behavior of the FORECAST statement depends on whether a table is active. If there is no active table, then the IMSTAT procedure tries to connect to the server using the LASRPORT macro variable. If a table is active, then a connection is made to the server that has the active table.

INDEP=variable-name

INDEP=(variable-list)

specifies the independent variables used in automatic modeling. When you specify one or more independent variables, the server performs model selection and determines the best-fitting time series model and the important independent variables. If any variables are selected, a table is generated to show the actual and predicted values for each variable. Specify the INFO option to view the Forecast Information table that displays the selected time series model.

Alias INDEPVARS=

INFO

specifies to display a forecast information table for each analysis variable. Each table provides informational details about the forecast. For example, you can learn from this table what time units were applied and which method was used to compute the forecast.
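
The following sketch shows one way that the INDEP= and INFO options might be used together, so that the Forecast Information table reports which model the server selected. The libref, connection details, table name, and variable names (lasr, sales, ts, revenue, price, promo) are hypothetical.

```sas
/* Hypothetical connection details and table name; adjust for your site */
libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";

proc imstat;
   table lasr.sales;                  /* active table with a datetime column ts */
   forecast ts / vars=(revenue)       /* dependent (analysis) variable          */
                 indep=(price promo)  /* candidate independent variables        */
                 info;                /* request the Forecast Information table */
run;
quit;
```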

LEAD=n

specifies the forecast horizon (in number of time intervals).

Default 12
Interaction This option has no effect if you specify a data set with the DATA= option.

NOPREPARSE

specifies to prevent the procedure from pre-parsing and pre-generating code for temporary expressions, scoring programs, and other user-written SAS statements.

When this option is specified, the user-written statements are sent to the server "as-is" and then the server attempts to generate code from them. If the server detects problems with the code, the error messages might not be as detailed as the messages that are generated by the SAS client. If you are debugging your user-written program, then you might want to pre-parse and pre-generate code in the procedure. However, if your SAS statements compile and run as you want them to, then you can specify this option to avoid the work of parsing and generating code on the SAS client.
Alias NOPREP
Interaction This option has no effect if you specify a data set with the DATA= option.

STAMPLIMIT=m

specifies a hard limit for the number of time stamps. If that number reaches m, then execution stops and the server generates an error message. This option is useful to protect against the generation of very large result sets. You can also limit the number of time stamps used in the forecast with the TAIL= option. Using the TAIL= option also reduces the size of the result set.

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

TAIL=k

specifies the number of most recent time intervals on which to base the estimation of the predicted and forecasted values. The TAIL= option enables you to restrict the length of the series that is used in the forecast.

For example, if the aggregation results in 500 unique values of the time stamp, then specifying TAIL=30 uses only the thirty most recent values in the estimation procedure. If you do not specify the TAIL= option, then all the aggregated time stamps are used in the estimation procedure. This option can also limit the size of the result set since at most k observations are used in the computation of the forecast.
Interaction This option has no effect if you specify a data set with the DATA= option.
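
To illustrate how TAIL= and STAMPLIMIT= can be combined, the sketch below bases the estimation on the 30 most recent intervals and stops the request if the number of time stamps reaches 1000. The connection details, table name, variable names, and the limit of 1000 are hypothetical choices, not recommended values.

```sas
/* Hypothetical connection details and table name; adjust for your site */
libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";

proc imstat;
   table lasr.sales;
   forecast ts / vars=(revenue)
                 tail=30           /* use only the 30 most recent intervals */
                 stamplimit=1000;  /* error out if 1000 time stamps are hit */
run;
quit;
```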

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expressions that define the temporary variables or a file reference to an external file that contains the SAS statements.

Alias TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias TN=
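
As a hedged sketch of how TEMPNAMES= and TEMPEXPRESS= work together, the request below defines a temporary variable and then forecasts it. It assumes that a temporary variable can be listed in VARS= like a regular column; the connection details, table name, and the names profit, revenue, and expenses are hypothetical.

```sas
/* Hypothetical connection details and table name; adjust for your site */
libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";

proc imstat;
   table lasr.sales;
   forecast ts / tempnames=(profit)                         /* declare the temporary variable */
                 tempexpress="profit = revenue - expenses;"  /* define it with a SAS statement */
                 vars=(profit)                               /* assumed: forecast the new variable */
                 aggregate=(sum);                            /* sum profit within each time stamp  */
run;
quit;
```

Because the expression is parsed on the client by default, syntax errors in the quoted statement should surface with client-side messages unless NOPREPARSE is specified.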

VARS=variable-name

VARS=(variable-list)

specifies one or more numeric analysis variables to forecast. If you do not specify the VARS= option, a forecast is produced for all numeric variables in the active table. If you specify a data set with the DATA= option, you must specify the analysis variables in the VARS= option. If you do not, the server generates an error.

Alias DEPVARS=

Details

There are two ways to use the FORECAST statement. You can use the active table or you can specify a data set with the DATA= option. The following paragraphs provide more information about these choices.

When you use the active table, the server forms a time series by aggregating the values of the analysis variables according to the unique (formatted) values of a numeric timestamp variable. The timestamp variable must be a SAS datetime variable. The aggregated series (one for each analysis variable) are then used to compute predicted values of the series. The predicted values can cover the observed time interval or can apply to future observations. Measures of precision (standard errors of prediction and confidence limits) are also available. You can produce forecasts for multiple variables and you can vary the method for aggregating values on a variable-by-variable basis.

Alternatively, you can specify a SAS data set with the DATA= option. The data set must have a timestamp variable and one or more of the analysis variables. In this case, the data are sent to the server for the forecast calculation, and there is no aggregation because the values read from the data set are assumed to constitute the series of interest. You can produce forecasts for multiple variables when you use the DATA= option, but you cannot specify the aggregation technique for the variables or specify the TAIL= and LEAD= options in the FORECAST statement.
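
The DATA= form can be sketched as follows. The example builds a small monthly series in a hypothetical data set, work.monthly, and sends it to a server identified by HOST= and PORT=. The host name, port, data set name, variable names, and the generated values are illustrative only.

```sas
/* Build a hypothetical 24-month series with a datetime timestamp */
data work.monthly;
   format ts datetime20.;
   do i = 0 to 23;
      ts = intnx('dtmonth', '01JAN2022:00:00:00'dt, i);  /* first of each month  */
      revenue = 100 + 5*i + 10*sin(i);                   /* made-up trend values */
      output;
   end;
   drop i;
run;

proc imstat;
   /* No active table is needed: the rows of work.monthly are sent to the server */
   forecast data=work.monthly ts / vars=(revenue)              /* VARS= is required with DATA= */
                                   host="grid001.example.com"  /* hypothetical server host     */
                                   port=10010                  /* hypothetical server port     */
                                   info;
run;
quit;
```

Options such as AGGREGATE=, FORMATS=, LEAD=, and TAIL= are omitted here because they have no effect when a data set is supplied with DATA=.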