The HPFDIAGNOSE Procedure

PROC HPFDIAGNOSE Statement

PROC HPFDIAGNOSE options ;
PROC HPFDIAG options ;

The following options can be used in the PROC HPFDIAGNOSE or HPFDIAG statement.

ALPHA=value

specifies the significance level to use in computing the confidence limits in the model selection list files. The ALPHA= value must be in the interval (0, 1). The default is ALPHA=0.05, which produces 95% confidence intervals.

BACK=number

specifies the number of observations before the end of the data. If BACK=n and the number of observations is T, then the first T - n observations are used to diagnose a series. The default is BACK=0.
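
As an illustrative sketch, the following step holds back the last 12 observations of a series, so that only the earlier observations are used for diagnosis. The SASHELP.AIR sample data set (144 monthly observations, which leaves the first 132 for diagnosis), the ID and FORECAST statements, and the ARIMAX statement are assumptions chosen for the example; they are not prescribed by this section.

   /* Assumed example data: SASHELP.AIR with variables DATE and AIR */
   proc hpfdiagnose data=sashelp.air back=12 print=short;
      id date interval=month;
      forecast air;
      arimax;
   run;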

BASENAME=SAS-name

prefixes the model specification file name or the model selection list file name or both. If BASENAME=MYSPEC is specified, then the model specification files or the model selection list files or both are named MYSPEC0, ..., MYSPEC9999999999. The default SAS-name starts with DIAG, such as DIAG0, ..., DIAG9999999999. The model specification files or the model selection list files or both are stored in the model repository defined by the REPOSITORY= option.

CRITERION=option

specifies the model selection criterion to select the best model. This option is often used in conjunction with the HOLDOUT= and HOLDOUTPCT= options. The default is CRITERION=RMSE. The following statistics of fit are provided:

SSE

sum of squared errors

MSE

mean squared error

RMSE

root mean squared error

UMSE

unbiased mean squared error

URMSE

unbiased root mean squared error

MAXPE

maximum percent error

MINPE

minimum percent error

MPE

mean percent error

MAPE

mean absolute percent error

MDAPE

median absolute percent error

GMAPE

geometric mean absolute percent error

MAPES

mean absolute error percent of standard deviation

MDAPES

median absolute error percent of standard deviation

GMAPES

geometric mean absolute error percent of standard deviation

MINPPE

minimum predictive percent error

MAXPPE

maximum predictive percent error

MPPE

mean predictive percent error

MAPPE

mean absolute predictive percent error

MDAPPE

median absolute predictive percent error

GMAPPE

geometric mean absolute predictive percent error

MINSPE

minimum symmetric percent error

MAXSPE

maximum symmetric percent error

MSPE

mean symmetric percent error

SMAPE

symmetric mean absolute percent error

MDASPE

median absolute symmetric percent error

GMASPE

geometric mean absolute symmetric percent error

MINRE

minimum relative error

MAXRE

maximum relative error

MRE

mean relative error

MRAE

mean relative absolute error

MDRAE

median relative absolute error

GMRAE

geometric mean relative absolute error

MAXERR

maximum error

MINERR

minimum error

ME

mean error

MAE

mean absolute error

MASE

mean absolute scaled error

RSQUARE

R-square

ADJRSQ

adjusted R-square

AADJRSQ

Amemiya’s adjusted R-square

RWRSQ

random walk R-square

AIC

Akaike information criterion

AICC

corrected Akaike information criterion

SBC

Schwarz Bayesian information criterion

APC

Amemiya’s prediction criterion
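
As an illustrative sketch, the following step selects the best model by MAPE computed over a 12-observation holdout sample. The SASHELP.AIR sample data set, the ID and FORECAST statements, and the ARIMAX statement are assumptions chosen for the example; they are not prescribed by this section.

   /* Assumed example data: SASHELP.AIR with variables DATE and AIR */
   proc hpfdiagnose data=sashelp.air criterion=mape holdout=12 print=short;
      id date interval=month;
      forecast air;
      arimax;
   run;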

DATA=SAS data set

specifies the name of the SAS data set that contains the time series. If the DATA= option is not specified, the most recently created SAS data set is used.

DELAYEVENT=number

specifies the delay lag for the events. If the option is not specified, the delay lag for the events is set to zero by default.

DELAYINPUT=number

specifies the delay lag for the inputs. If the option is not specified, the delay lag for the inputs is appropriately chosen by the procedure.

ENTRYPCT=number

specifies a threshold to check the percentage increment of the criterion between two candidate models. The ENTRYPCT=value should be in (0,100); the default is ENTRYPCT=0.1.

ERRORCONTROL=( SEVERITY= ( severity-options) STAGE= ( stage-options) MAXMESSAGE=number)

allows finer control of message printing. The error severity level and the HPFDIAGNOSE procedure processing stages are set independently. The MAXMESSAGE=number option controls the number of messages printed. A message is printed only if it satisfies all of the specified options (a logical AND).

Available severity-options are as follows:

LOW

specifies low severity, minor issues

MEDIUM

specifies medium severity problems

HIGH

specifies severe errors

ALL

specifies all severity levels of LOW, MEDIUM, and HIGH options

NONE

specifies that no messages from PROC HPFDIAGNOSE are printed

Available stage-options are as follows:

PROCEDURELEVEL

specifies that the procedure stage is option processing and validation

DATAPREP

specifies the accumulation of data and the application of SETMISS= and ZEROMISS= options

DIAGNOSE

specifies the diagnostic process

ALL

specifies all PROCEDURELEVEL, DATAPREP, and DIAGNOSE options

Examples are as follows.

The following statement prints high- and medium-severity errors at any processing stage of PROC HPFDIAGNOSE:

   errorcontrol=(severity=(high medium) stage=all)

The following statement prints high-severity errors only during the data preparation:

   errorcontrol=(severity=high stage=dataprep)

Each of the following statements turns off messages from PROC HPFDIAGNOSE:

   errorcontrol=(severity=none stage=all)
   errorcontrol=(maxmessage=0)

Each of the following statements specifies the default behavior:

   errorcontrol=( severity=(high medium low)
                  stage=(procedurelevel dataprep diagnose) )
   errorcontrol=(severity=all stage=all)

EVENTBY=SAS data set

specifies the name of the event data set that contains the events for specific BY groups that are created by DATA steps. The events in the EVENT statement are used in all BY groups, but the events in the EVENTBY= data set are used in the specific BY group.

EXCEPTIONS=except-option

specifies the desired handling of arithmetic exceptions during the run. You can specify except-option as one of the following:

IGNORE

specifies that PROC HPFDIAGNOSE stop on an arithmetic exception. No recovery is attempted. This is the default behavior if the EXCEPTIONS= option is not specified.

CATCH

specifies that PROC HPFDIAGNOSE skip the generation of diagnostic output for the variable that produces the exception in the current BY group. PROC HPFDIAGNOSE generates a record to the OUTEST= data set with a blank select list name in the _SELECT_ column. The blank select list name reflects the handled exception on that combination of variable and BY group.
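
A minimal sketch of how the CATCH behavior might be checked afterward: run the procedure with EXCEPTIONS=CATCH, then keep the OUTEST= records whose _SELECT_ value is blank. The SASHELP.AIR sample data set, the WORK.EST and WORK.FAILED data set names, and the ID, FORECAST, and ARIMAX statements are assumptions chosen for the example.

   /* Assumed example data: SASHELP.AIR with variables DATE and AIR */
   proc hpfdiagnose data=sashelp.air outest=work.est exceptions=catch;
      id date interval=month;
      forecast air;
      arimax;
   run;

   /* Keep only records with a blank select list name, */
   /* which mark a handled arithmetic exception        */
   data work.failed;
      set work.est;
      if missing(_select_);
   run;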

HOLDOUT=number

specifies the size of the holdout sample to be used for model selection. The holdout sample is a subset of the dependent time series that ends at the last nonmissing observation. The statistics of a model selection criterion are computed using only the holdout sample. The default is HOLDOUT=0.

HOLDOUTPCT=value

specifies the size of the holdout sample as a percentage of the length of the dependent time series. If HOLDOUT=5 and HOLDOUTPCT=10, the size of the holdout sample is min(5, 0.1T), where T is the length of the dependent time series with beginning and ending missing values removed. The default is HOLDOUTPCT=0.
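
The arithmetic can be sketched in a DATA step. This sketch assumes that the holdout size is the smaller of the HOLDOUT= value and the HOLDOUTPCT= percentage of the series length T; the series length of 40 is hypothetical, and how the procedure rounds a fractional result is not stated in this section.

   data _null_;
      T          = 40;   /* hypothetical series length              */
      holdout    = 5;    /* HOLDOUT= value                          */
      holdoutpct = 10;   /* HOLDOUTPCT= value                       */
      /* Assumed rule: smaller of HOLDOUT= and HOLDOUTPCT% of T     */
      size = min(holdout, holdoutpct * T / 100);
      put size=;         /* writes size=4 to the SAS log            */
   run;

Here 10% of 40 is 4, which is smaller than HOLDOUT=5, so the percentage bound determines the holdout size.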

INEST=SAS data set

contains information that maps forecast variables to models or selection lists, and data set variables to model variables.

INEVENT=SAS data set

specifies the name of the event data set that contains the event definitions created by the HPFEVENTS procedure. If the INEVENT= data set is not specified, only SAS predefined event definitions can be used in the EVENT statement.

For more information about the INEVENT= option, see Chapter 7, The HPFEVENTS Procedure.

INPUTMISSINGPCT=value

specifies the maximum allowed proportion of missing observations, as a percentage of the length of the input time series. If INPUTMISSINGPCT=50, then an input time series that has more than 50% missing data is ignored in the model. The default is INPUTMISSINGPCT=10.

INSELECTNAME=SAS-name

specifies the name of a catalog entry that serves as a model selection list. This selection list refers to existing model specification files. A selection list created by the HPFDIAGNOSE procedure also includes these existing model specification files.

MINOBS=(SEASON=number TREND=number)

SEASON=

specifies that no seasonal model is fitted to any series with fewer nonmissing observations than number times the season length. The value of number must be greater than or equal to 1. The default is number = 2.

TREND=

specifies that no trend model is fitted to any series with fewer nonmissing observations than number. The value of number must be greater than or equal to 1. The default is number = 1.

NODIAGNOSE

specifies that the series is not diagnosed. If the INSELECTNAME= option and the OUTEST= option are specified, the existing model specification files are written to the OUTEST= data set.

NOINESTOPTS

specifies that the selection lists referred to by the INEST= option are not used in the diagnosed version.

OUTEST=SAS data set

contains information that maps data set variables to model symbols and references model specification files and model selection list files.

OUTOUTLIER=SAS data set

contains information that is associated with the detected outliers.

OUTPROCINFO= SAS-data-set

names the output data set to contain the summary information of the processing done by PROC HPFDIAGNOSE. It is particularly useful for programmatic assessment of the status of the procedure’s execution via a data set instead of looking at or parsing the SAS log.
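
As a hedged sketch, the following steps write the processing summary to a data set and then print it for inspection. The SASHELP.AIR sample data set, the WORK.PROCINFO data set name, and the ID, FORECAST, and ARIMAX statements are assumptions chosen for the example; the columns of the summary data set are not described in this section, so the sketch prints all of them.

   /* Assumed example data: SASHELP.AIR with variables DATE and AIR */
   proc hpfdiagnose data=sashelp.air outprocinfo=work.procinfo;
      id date interval=month;
      forecast air;
      arimax;
   run;

   /* Inspect the processing summary instead of parsing the log */
   proc print data=work.procinfo;
   run;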

PREFILTER=MISSING | YES | EXTREME | BOTH

specifies how missing and extreme values are handled prior to diagnostic tests.

MISSING

Smoothed values for missing data are applied for tentative order selection and missing values are used for the final diagnostics.

YES

Smoothed values for missing data are applied to overall diagnoses. This option is the default.

EXTREME

Extreme values are set to missing for a tentative ARIMA model, and extreme values are used for the final ARIMAX model diagnostics.

BOTH

This value is equivalent to both YES and EXTREME.

If the input variables have missing values, they are always smoothed for the diagnostics.

PRINT=NONE | SHORT | LONG | ALL

specifies the print option.

NONE

suppresses the printed output. This option is the default.

SHORT

prints the model specifications. This option also prints only the significant input variables, events, and outliers.

LONG

prints the summary of the transform, the stationarity test, and the determination of ARMA order in addition to all of the information printed by PRINT=SHORT.

ALL

prints the details of the stationarity test and the determination of ARMA order. This option prints the detailed information about all input variables and events under consideration.

REPOSITORY=catalog

contains information about model specification files and model selection list files. The REPOSITORY= option can also be specified as MODELREPOSITORY=, MODELREP=, or REP=. The default model repository is SASUSER.HPFDFLT.


RETAINCHOOSE=YES | NO

RETAINCHOOSE=TRUE | FALSE

specifies that the CHOOSE= option in the HPFSELECT procedure is respected when re-diagnosing series. The default is RETAINCHOOSE=YES.

SEASONALITY=number

specifies the length of the seasonal cycle. The number should be a positive integer. For example, SEASONALITY=3 means that every group of three observations forms a seasonal cycle. By default, the length of the seasonal cycle is 1 (no seasonality) or the length implied by the INTERVAL= option specified in the ID statement. For example, INTERVAL=MONTH implies that the length of the seasonal cycle is 12.
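
As an illustrative sketch, the following step sets the seasonal cycle length explicitly rather than relying on an INTERVAL= option. It assumes that the ID statement can be omitted, in which case no cycle length is implied by an interval; the SASHELP.AIR sample data set and the FORECAST and ARIMAX statements are also assumptions chosen for the example.

   /* Assumed example data: SASHELP.AIR, monthly series AIR */
   /* No ID statement, so the cycle length is given directly */
   proc hpfdiagnose data=sashelp.air seasonality=12 print=short;
      forecast air;
      arimax;
   run;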

SELECTINPUT=SELECT | ALL | number

specifies the maximum number of the input variables to select.

SELECT

selects the input variables that satisfy the criteria (noncollinearity, nonnegative delay, smaller AIC). This option is the default.

ALL

selects the input variables that satisfy the criteria (noncollinearity, nonnegative delay).

number

selects the best number input variables that satisfy the criteria (noncollinearity, nonnegative delay).
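
As a hedged sketch, the following step offers six candidate input variables and asks the procedure to keep at most the best two. The SASHELP.CITIMON sample data set, its variable names, and the ID, FORECAST, INPUT, and ARIMAX statements are assumptions chosen for the example.

   /* Assumed example data: SASHELP.CITIMON, monthly series */
   proc hpfdiagnose data=sashelp.citimon selectinput=2 print=short;
      id date interval=month;
      forecast conb;
      input cciutc eec eegp exvus fm1 fm1d82;
      arimax;
   run;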

SELECTEVENT=SELECT | ALL | number

specifies the maximum number of events to select.

SELECT

selects the events that satisfy the criteria (noncollinearity, smaller AIC). This option is the default.

ALL

selects the events that satisfy the criteria (noncollinearity).

number

selects the best number of events that satisfy the criteria (noncollinearity).

SIGLEVEL=value

specifies the cutoff value for all diagnostic tests such as log transformation, stationarity, tentative ARMA order selection, and significance of UCM components. The SIGLEVEL= value should be in the interval (0,1); the default is SIGLEVEL=0.05. The SIGLEVEL= options in the TRANSFORM, TREND, ARIMAX, and UCM statements control testing independently.

SELECTBASE=SAS-name

prefixes the model selection list file name. If SELECTBASE=MYSELECT is specified, then the model selection list files are named MYSELECT0, MYSELECT1, and so on. The default SAS-name starts with DIAG, such as DIAG0, DIAG1, and so on. The model selection list files are stored in the model repository defined by the REPOSITORY= option.

SPECBASE=SAS-name

prefixes the model specification file name. If SPECBASE=MYSPEC is specified, then the model specification files are named MYSPEC0, MYSPEC1, and so on. The default SAS-name starts with DIAG, such as DIAG0, DIAG1, and so on. The model specification files are stored in the model repository defined by the REPOSITORY= option.
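
As an illustrative sketch, the following steps store the generated files in a chosen repository with separate name prefixes, then list the catalog entries so that the resulting names can be checked. The SASHELP.AIR sample data set, the WORK.MYMODELS catalog name, the WORK.EST data set name, and the ID, FORECAST, and ARIMAX statements are assumptions chosen for the example.

   /* Assumed example data: SASHELP.AIR with variables DATE and AIR */
   proc hpfdiagnose data=sashelp.air outest=work.est
                    repository=work.mymodels
                    selectbase=myselect specbase=myspec;
      id date interval=month;
      forecast air;
      arimax;
   run;

   /* List the entries written to the repository catalog */
   proc catalog catalog=work.mymodels;
      contents;
   run;
   quit;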

TESTINPUT=TRANSFORM | TREND | BOTH

TRANSFORM

specifies that the log transform testing of the input variables is applied independently of the variable to be forecast.

TREND

specifies that the trend testing of the input variables is applied independently of the variable to be forecast.

BOTH

specifies that the log transform and trend testing of the input variables are applied independently of the variable to be forecast.

If this option is not specified, the same differencing is applied to the input variables as is used for the variable to be forecast, and no transformation is applied to the input variables.
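
As a hedged sketch, the following step requests that log transform and trend testing be applied to the input variable independently of the variable to be forecast. The SASHELP.CITIMON sample data set, its variable names, and the ID, FORECAST, INPUT, and ARIMAX statements are assumptions chosen for the example.

   /* Assumed example data: SASHELP.CITIMON, monthly series */
   proc hpfdiagnose data=sashelp.citimon testinput=both print=short;
      id date interval=month;
      forecast conb;
      input cciutc;
      arimax;
   run;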