IMSTAT Procedure (Analytics)

FORECAST Statement

The FORECAST statement computes predicted values, measures of precision, and confidence limits for observed and future (forecast) values of a time series. The models generated by the FORECAST statement belong to the exponential smoothing method (ESM) and autoregressive integrated moving average (ARIMA) families.

Examples: Forecasting and Automatic Modeling; Forecasting with Goal Seeking

Syntax

FORECAST timestamp-variable </ options>;
FORECAST DATA=libref.member-name timestamp-variable </ options>;

Required Arguments

timestamp-variable

specifies the name of the SAS datetime variable to use.

DATA=libref.member-name

specifies the libref and table name of a SAS data set that supplies the series to forecast. The data set must contain the timestamp variable and one or more of the analysis variables. The procedure sends these values to the server to request the forecast calculation. With this option, there is no aggregation, because the values read from the data set are assumed to constitute the series of interest.

You can produce forecasts for multiple variables when you specify DATA=, but you cannot specify an aggregation method for the variables or specify the TAIL= and HEAD= options in the FORECAST statement.
When you specify the DATA= option, and a data set is sent to the server, you can also request a goal seeking analysis. The data set then must contain variables that identify the goal variable, the control variable, possibly bounds for the control variables, and a weight variable.

FORECAST Statement Options

AGGREGATE=(list-of-aggregators)

specifies the aggregation methods that are applied to the analysis variables within each unique value of the timestamp variable.

The available aggregation methods are as follows:
CSS      corrected sum of squares
CV       coefficient of variation
MAX      maximum value
MEAN     arithmetic mean
MIN      minimum value
N        number of observations
PROBT    p-value for the t-statistic
STD      standard deviation
STDERR   standard error
SUM      sum of the nonmissing values
TSTAT    t-statistic for the null hypothesis that the mean equals zero
USS      uncorrected sum of squares
VAR      sample variance
Each analysis variable can be associated with a different aggregate method. For example, the following statement forecasts the sum of expenses and the mean of revenue:
forecast ts / vars     =(expenses revenue)
              aggregate=(sum mean);
The default aggregation is the mean of the analysis variables within unique values of the timestamp variable.
Interaction This option has no effect if you specify a data set with the DATA= option.

CONTROLVARS=(variable1-name <variable2-name...>)

specifies the controllable variables used in goal seeking. Control variables act like independent variables in the automatic modeling step. Only control variables are passed to the optimization step in goal seeking. The optimization determines the best values for the control variables that meet the values of the GOAL= variable.

Variables listed as control variables cannot appear in the list of independent variables.
When you also specify INDEP= variables, the goal-seeking analysis gives precedence to controllable variables over non-controllable (specified with the INDEP= option) for its variable selection. Relative precedence of controllable variables is maintained, as is relative precedence of non-controllable variables.
Alias CONTROL=

FORMATS=("format-specification")

specifies the format for the time stamp variable. The observations are grouped by the formatted values of the time stamp variable. If multiple values map to the same formatted value, the smallest is kept as the representative value. These values form the time stamps for the forecast.

If you do not specify the FORMATS= option, the default format is applied for the time stamp variable.
Interaction This option has no effect if you specify a data set with the DATA= option.
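
The grouping that FORMATS= performs can be sketched as follows. This is a minimal illustration, not a definitive program: the host name, port, libref, table name, and the variables ts and revenue are hypothetical, and the example assumes that ts holds SAS datetime values so that the DTMONYY7. format maps every value in a month to the same formatted value.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;                    /* make lasr.sales the active table      */
   forecast ts / vars     =(revenue)
                 aggregate=(sum)        /* sum revenue within each month         */
                 formats  =("dtmonyy7.");  /* group time stamps by month and year */
run;
quit;
```

With this specification, all observations whose time stamps share a formatted value (for example, the same month) contribute to one point of the aggregated series.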

FRAME=LEAD | HORIZON

FRAME=TAIL | HISTORY

FRAME=BOTH

specifies how to compose the main result table. The default is FRAME=BOTH, in which case the result set contains the observed series (the history) as well as the forecast (the horizon). If you specify FRAME=LEAD (or FRAME=HORIZON), then only the future values are returned. You can control the length of the horizon with the LEAD= option.

If you specify FRAME=TAIL (or FRAME=HISTORY), then only the results for the historic values are returned. The returned values are the aggregated values, their predicted values, residuals, prediction standard errors, and confidence limits. You can control the number of the historical records with the TAIL= option.
Alias WINDOW=
Default BOTH
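
As a hedged sketch of how FRAME= and LEAD= combine, the following request asks only for the future values of a six-interval horizon. The connection details, the table lasr.sales, and the variables ts and revenue are hypothetical names used for illustration.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;                /* active table that holds the raw data */
   forecast ts / vars =(revenue)
                 frame=lead         /* return only the forecast horizon     */
                 lead =6;           /* six time intervals into the future   */
run;
quit;
```

Replacing frame=lead with frame=tail would instead return only the historical records, and omitting the option returns both parts.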

GOALVAR=variable-name

specifies the variable in the active table that contains the goal (the desired forecast) for goal seeking.

Alias GOAL=
Interaction You must use the DATA= option to perform forecasting with goal seeking.
Example Forecasting with Goal Seeking

HOST="host-name"

specifies the machine to which you want to connect to produce the forecast when you specify the DATA= option in the FORECAST statement. If you do not specify the host information, it is determined from the active table.

INDEP=variable-name

INDEP=(variable-list)

specifies the independent variables used in automatic modeling. When you specify one or more independent variables, the server performs model selection automatically and determines the best-fitting time series model and the important independent variables. If any variables are selected, a table is generated to show the actual and predicted values for each variable. Specify the INFO option to view the Forecast Information table that displays the selected time series model.

Variables that are listed as independent variables cannot appear in the list of control variables.
Alias INDEPVARS=
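
A request for automatic modeling with independent variables might look like the following sketch. The connection details, the table lasr.sales, and the variables ts, sale, price, and promo are hypothetical; only the INDEP= and INFO options themselves come from this section.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;                    /* active table                          */
   forecast ts / vars =(sale)
                 indep=(price promo)    /* candidate independent variables       */
                 info;                  /* show the Forecast Information table   */
run;
quit;
```

The INFO option is included so that the selected time series model is visible alongside any independent variables that the server selects.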

INFO

specifies to display a forecast information table for each analysis variable. Each table provides informational details about the forecast. For example, you can learn from this table what time units were applied and which method was used to compute the forecast.

The server performs automatic model selection. The available methods and the associated ARIMA models are as follows:
Method                               Equivalent ARIMA model
Damped-trend exponential smoothing   ARIMA(1, 1, 2)
Linear exponential smoothing         ARIMA(0, 2, 2)
Seasonal exponential smoothing       ARIMA(0, 1, p + 1)(0, 1, 0)_p
Simple exponential smoothing         ARIMA(0, 1, 1)
Winters method (additive)            ARIMA(0, 1, p + 1)(0, 1, 0)_p
Winters method (multiplicative)      There is no ARIMA equivalent.

LEAD=n

specifies the forecast horizon (in number of time intervals).

Default 12
Interaction This option has no effect if you specify a data set with the DATA= option.

LOWERBOUNDS<=>(boundary-specification1 <, boundary-specification2 ...>)

specifies lower boundary variables for the control variables. A boundary-specification is specified with the following form:

control-variable = boundary-variable
For example, in the following FORECAST statement the variable Pricelb in data set Merged2 contains the lower boundary values for the control variable Price, and the variable Priceub contains the upper boundary values for the control variable Price.
Alias LOWER=
Example
forecast data=merged2 date / dep    =sale
                             control=price
                             lower(price=pricelb)
                             upper(price=priceub)
                             goal   =gsale
                             lead   =12;

NOPREPARSE

specifies to prevent the procedure from pre-parsing and pre-generating code for temporary expressions, scoring programs, and other user-written SAS statements.

When this option is specified, the user-written statements are sent to the server "as-is" and the server then attempts to generate code from them. If the server detects problems with the code, the error messages might not be as detailed as the messages that are generated by the SAS client. If you are debugging your user-written program, then you might want to pre-parse and pre-generate code in the procedure. However, if your SAS statements compile and run as you want them to, then you can specify this option to avoid the work of parsing and generating code on the SAS client.
Alias NOPREP
Interaction This option has no effect if you specify a data set with the DATA= option.

PORT=number

specifies to use the server that is listening on the specified port to produce the forecast when you specify the DATA= option in the FORECAST statement. You can use this option with the HOST= option to use a specific server. If you do not specify a PORT= value, the behavior of the FORECAST statement depends on whether a table is active. If there is no active table, then the IMSTAT procedure tries to connect to the server using the LASRPORT macro variable. If a table is active, then a connection is made to the server that has the active table.

STAMPLIMIT=m

specifies a hard limit for the number of time stamps. If that number reaches m, then execution stops and the server generates an error message. This option is useful to protect against the generation of very large result sets. You can also limit the number of time stamps used in the forecast with the TAIL= option. Using the TAIL= option also reduces the size of the result set.

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.
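
The following sketch shows one way the SAVE= option could be combined with later statements. It assumes that the REPLAY and FREE statements accept the saved table name directly; the connection details, the table lasr.sales, the variables ts and revenue, and the name fc are hypothetical.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;
   forecast ts / vars=(revenue) save=fc;   /* keep the result table as fc   */
run;
   replay fc;                              /* display the saved result again */
run;
   free fc;                                /* release fc; the name can be reused */
run;
quit;
```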

TAIL=k

specifies the number of most recent time intervals on which to base the estimation of the predicted and forecasted values. The TAIL= option enables you to restrict the length of the series that is used in the forecast.

For example, if the aggregation results in 500 unique values of the time stamp, then specifying TAIL=30 uses only the thirty most recent values in the estimation procedure. If you do not specify the TAIL= option, then all the aggregated time stamps are used in the estimation procedure. This option can also limit the size of the result set since at most k observations are used in the computation of the forecast.
Interaction This option has no effect if you specify a data set with the DATA= option.
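
To illustrate how TAIL= and STAMPLIMIT= can work together, the sketch below bases the estimation on the 30 most recent aggregated time stamps and stops with an error if the number of time stamps reaches 1000. The connection details, the table lasr.sales, the variables ts and revenue, and the limit values are hypothetical.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;
   forecast ts / vars      =(revenue)
                 tail      =30      /* use only the 30 most recent intervals   */
                 stamplimit=1000    /* hard limit on the number of time stamps */
                 lead      =12;     /* forecast horizon (the default)          */
run;
quit;
```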

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias TN=
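
A temporary variable can be sketched as follows. The example assumes that a variable named in TEMPNAMES= can also be listed in VARS= as an analysis variable; the connection details, the table lasr.sales, and the variables ts, revenue, expenses, and margin are hypothetical.

```sas
/* Hypothetical connection details and table name; adjust for your site. */
libname lasr sasiomem host="lasr.example.com" port=10010;

proc imstat;
   table lasr.sales;
   forecast ts / vars       =(margin)
                 tempnames  =(margin)                          /* declare the temporary variable */
                 tempexpress="margin = revenue - expenses;"    /* SAS statement that defines it  */
                 aggregate  =(sum);
run;
quit;
```

The temporary variable exists only for the duration of the request, so the active table itself is not modified.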

UPPERBOUNDS<=>(boundary-specification1 <, boundary-specification2 ...>)

specifies upper boundary variables for the control variables. A boundary-specification is specified with the following form:

control-variable = boundary-variable
The boundary specification has the same form as in the LOWERBOUNDS= option.
Alias UPPER=

VARS=variable-name

VARS=(variable-list)

specifies one or more numeric analysis variables to forecast. If you do not specify the VARS= option, a forecast is produced for all numeric variables in the active table. If you specify a data set with the DATA= option, you must specify the analysis variables in the VARS= option. If you do not, the server generates an error.

Alias DEPVARS=

WEIGHTVAR=variable-name

specifies the optional weight variable for goal-seeking analysis.

Alias WEIGHT=

Details

Accessing Data with the FORECAST Statement

There are two ways to use the FORECAST statement. You can use the active table or you can specify a data set with the DATA= option. The following paragraphs provide more information about these choices. In either case, the table does not need to be sorted by values of the time stamp variable.
When you use the active table, the server forms a time series by aggregating the values of the analysis variables according to the unique (formatted) values of a numeric time stamp variable. The time stamp variable must be a SAS datetime type. The aggregate series (one for each analysis variable) are then used to compute predicted values of the series. The predicted values can cover the observed time interval or can apply to future observations. Measures of precision (standard errors of prediction and confidence limits) are also available. You can produce forecasts for multiple variables and you can vary the method for aggregating values on a variable-by-variable basis.
Alternatively, you can specify a SAS data set with the DATA= option. The data set must have a time stamp variable and one or more of the analysis variables. In this case, the data are sent to the server for the forecast calculation, and there is no aggregation because the values read from the data set are assumed to constitute the series of interest. You can produce forecasts for multiple variables when you use the DATA= option, but you cannot specify the aggregation technique for the variables or specify the TAIL= and HEAD= options in the FORECAST statement.
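
The second access path can be sketched as follows. The data set work.series, its variables date and sale, and the host and port values are hypothetical. Because no table is active in this sketch, the server is identified explicitly with the HOST= and PORT= options.

```sas
/* Hypothetical client-side data set that already holds one value per time stamp. */
proc imstat;
   forecast data=work.series date / vars=(sale)                /* required with DATA=       */
                                    host="lasr.example.com"    /* hypothetical server host  */
                                    port=10010                 /* hypothetical server port  */
                                    info;                      /* show forecast information */
run;
quit;
```

Because the values are used as supplied, options that govern aggregation of the active table, such as AGGREGATE= and FORMATS=, are left out of this request.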

ODS Table Names

The FORECAST statement generates the following ODS tables.
ODS Table Name         Description                                             Option
Forecast               Results from a series forecast                          Default
ForecastInfo           Information about a series forecast                     INFO
ForecastSelectedVars   Selected independent variables from a series forecast   INDEP=
For information about using the ODS tables with the SAVE= option, see the Details section of the STORE statement.