REGRESSION Statement
REGRESSION PREDEFINED= variables < / B=(value <F>) > ;
REGRESSION USERVAR= variables < / B=(value <F>) USERTYPE=option > ;

The REGRESSION statement includes regression variables in a regARIMA model or specifies regression variables whose effects are to be removed by the IDENTIFY statement to aid in ARIMA model identification. Predefined regression variables are selected with the PREDEFINED= option. User-defined regression variables are specified with the USERVAR= option. The currently available predefined variables are listed in Table 37.3.

Table A6 in the displayed output generated by the X12 procedure provides information related to trading day effects. Table A7 provides information related to holiday effects. Tables A8, A8AO, A8LS, and A8TC provide information related to outlier factors. Ramps and level shifts are combined in the A8LS table. The A8AO, A8LS, and A8TC tables are available only when more than one outlier type is present in the model. Table A9 provides information about user-defined regression effects. Table A10 provides information about the user-defined seasonal component.

Missing values in the span of an input series automatically create missing value regressors. See the NOTRIMMISS option in the PROC X12 statement and the section Missing Values for further details about missing values.

Combining your model with additional predefined regression variables can result in a singularity problem. If a singularity occurs, then you might need to alter either the model or the choices of the predefined regressors in order to successfully perform the regression.

In order to seasonally adjust a series that uses a regARIMA model, the factors derived from regression are used as multiplicative or additive factors based on the mode of seasonal decomposition. Therefore, regressors should be defined that are appropriate to the mode of the seasonal decomposition, so that meaningful combined adjustment factors can be derived and adjustment diagnostics can be generated. For example, if a regARIMA model is applied to a log-transformed series, then the regression factors are expressed as ratios, which match the form of the seasonal factors that are generated by the multiplicative or log-additive adjustment modes. Conversely, if a regARIMA model is fit to the original series, then the regression factors are measured on the same scale as the original series, which matches the scale of the seasonal factors that are generated by the additive adjustment mode. Note that the default transformation (no transformation) and the default seasonal adjustment mode (multiplicative) are in conflict. Thus when you specify the X11 statement and any of the REGRESSION, INPUT, or EVENT statements, you must also specify either a transformation by using the TRANSFORM statement or a different mode by using the MODE= option in the X11 statement in order to seasonally adjust the data that uses the regARIMA model.
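The following is a minimal sketch of one way to resolve the conflict between the default transformation and the default adjustment mode. It assumes that the SASHELP.AIR sample data set (variables DATE and AIR) is available; the ARIMA model and the choice of the TD regressor are illustrative assumptions, not recommendations.

```sas
/* Sketch only: SASHELP.AIR, the TD regressor, and the airline model */
/* are assumptions chosen for illustration.                          */
proc x12 data=sashelp.air date=date;
   var air;
   transform function=log;          /* log scale matches the multiplicative mode */
   regression predefined=td;
   arima model=((0,1,1)(0,1,1));
   estimate;
   x11;                             /* default mode (multiplicative) is kept */
run;
```

Alternatively, you can omit the TRANSFORM statement and specify `x11 mode=add;` so that the regression factors and the seasonal factors are both on the scale of the original series.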

According to Ladiray and Quenneville (2001), "X-12-ARIMA is based on the same principle [as the X-11 method] but proposes, in addition, a complete module, called Reg-ARIMA, that allows for the initial series to be corrected for all sorts of undesirable effects. These effects are estimated using regression models with ARIMA errors (Findley et al. [23])." The REGRESSION, INPUT, and EVENT statements specify these regression effects. Predefined effects that can be corrected in this manner are listed in the PREDEFINED= option. You can create your own definitions to remove other effects by using the USERVAR= option and the EVENT statement.

Either the PREDEFINED= option or the USERVAR= option can be specified in a single REGRESSION statement, but not both. Multiple REGRESSION statements can be used.

The following options can appear in the REGRESSION statement.

PREDEFINED=CONSTANT
PREDEFINED=EASTER(value)
PREDEFINED=LABOR(value)
PREDEFINED=LOM
PREDEFINED=LOMSTOCK
PREDEFINED=LOQ
PREDEFINED=LPYEAR
PREDEFINED=SCEASTER(value)
PREDEFINED=SEASONAL
PREDEFINED=SINCOS(value ...)
PREDEFINED=TD
PREDEFINED=TD1COEF
PREDEFINED=TD1NOLPYEAR
PREDEFINED=TDNOLPYEAR
PREDEFINED=TDSTOCK(value)
PREDEFINED=THANK(value)

lists the predefined regression variables to be included in the model. Data values for these variables are calculated by the program, mostly as functions of the calendar. Table 37.3 gives definitions for the available predefined variables. The values LOM and LOQ are equivalent: the actual regression is controlled by the SEASONS= option in the PROC X12 statement. Multiple predefined regression variables can be used. The syntax for using both a length-of-month and a seasonal regression can be in one of the following forms:

   regression predefined=lom seasonal;

   regression predefined=(lom seasonal);

   regression predefined=lom predefined=seasonal;

Certain restrictions apply when you use more than one predefined regression variable. Only one of TD, TDNOLPYEAR, TD1COEF, or TD1NOLPYEAR can be specified. LPYEAR cannot be used with TD, TD1COEF, LOM, LOMSTOCK, or LOQ. LOM or LOQ cannot be used with TD or TD1COEF.

The following restriction also applies to the SINCOS predefined regression variable. If SINCOS is specified, then the INTERVAL= option or the SEASONS= option in the PROC X12 statement must also be specified because there are restrictions to this regression variable based on the frequency of the data.

The predefined regression variables TDSTOCK, SCEASTER, EASTER, LABOR, THANK, and SINCOS require extra parameters. Only one TDSTOCK regressor can be implemented in the regression model. If multiple TDSTOCK variables are specified, PROC X12 uses the last TDSTOCK variable specified. For SCEASTER, EASTER, LABOR, THANK, and SINCOS, multiple regressors can be implemented in the model by specifying the variables with different parameters. For example, the following statement specifies two EASTER regressors with widths 7 and 14:

   regression predefined=easter(7) easter(14);

For SINCOS, specifying a parameter includes both the sine and the cosine regressor, except at the highest order allowed (2 for quarterly data and 6 for monthly data). At the highest order, the sine term is identically zero, so only the cosine regressor is included. The most common use of the SINCOS variable for quarterly data is

   regression predefined=sincos(1,2);

and for monthly data is

   regression predefined=sincos(1,2,3,4,5,6);

These statements include 3 and 11 regressors in the model, respectively.
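The regressor counts can be checked with a short calculation. The sketch below assumes the usual trigonometric form of the SINCOS regressors, with frequencies $\omega_j = 2\pi j / s$, where $s$ is the seasonal period and $t$ is an integer time index.

```latex
% Quarterly data, s = 4, SINCOS(1,2):
%   j = 1: \sin(\pi t / 2), \cos(\pi t / 2)      -> 2 regressors
%   j = 2: \cos(\pi t); \sin(\pi t) = 0 for all integer t -> 1 regressor
\[
  \underbrace{2}_{j=1} + \underbrace{1}_{j=2} = 3
\]
% Monthly data, s = 12, SINCOS(1,2,3,4,5,6):
%   j = 1,...,5: sine and cosine                 -> 2 regressors each
%   j = 6: \cos(\pi t) only                      -> 1 regressor
\[
  \underbrace{5 \times 2}_{j=1,\dots,5} + \underbrace{1}_{j=6} = 11
\]
```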

Table 37.3 Predefined Regression Variables in X-12-ARIMA

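Regression Effect: Variable
   Variable Definition

Trend constant: CONSTANT
   $(1-B)^{-d}(1-B^{s})^{-D} I(t \ge 1)$, where $I(t \ge 1) = 1$ for $t \ge 1$ and $I(t \ge 1) = 0$ for $t < 1$.

Easter holiday: EASTER($w$)
   $E(w,t) = \frac{1}{w} \times n_t$, where $n_t$ is the number of the $w$ days before Easter that fall in month (or quarter) $t$. (Note: This variable is 0 except in February, March, and April (or first and second quarter). It is nonzero in February only for $w > 22$.) Restriction: $1 \le w \le 25$.

Labor Day: LABOR($w$)
   $L(w,t) = \frac{1}{w} \times$ (number of the $w$ days before Labor Day that fall in month $t$). (Note: This variable is 0 except in August and September.) Restriction: $1 \le w \le 25$.

Length-of-month (monthly flow): LOM
   $m_t - \bar{m}$, where $m_t$ = length of month $t$ (in days) and $\bar{m} = 30.4375$ (average length of month).

Stock length-of-month: LOMSTOCK
   $SLOM_t = m_t - \bar{m} - \mu(l)$ for $t = 1$, and $SLOM_t = SLOM_{t-1} + m_t - \bar{m}$ otherwise, where $\bar{m}$ and $m_t$ are defined in LOM and $\mu(l)$ is $0.375$, $0.125$, $-0.125$, or $-0.375$ when the first, second, third, or fourth February in the series, respectively, is in a leap year.

Length-of-quarter (quarterly flow): LOQ
   $q_t - \bar{q}$, where $q_t$ = length of quarter $t$ (in days) and $\bar{q} = 91.3125$ (average length of quarter).

Leap year (monthly and quarterly flow): LPYEAR
   $LY_t = 0.75$ in a leap-year February (first quarter), $-0.25$ in other Februaries (first quarter), and $0$ otherwise.

Statistics Canada Easter (monthly or quarterly flow): SCEASTER($w$)
   If Easter falls before April $w$, let $n_E$ be the number of the $w$ days on or before Easter that fall in March. Then $E(w,t) = n_E / w$ in March, $-n_E / w$ in April, and $0$ otherwise. If Easter falls on or after April $w$, then $E(w,t) = 0$. (Note: This variable is 0 except in March and April (or first and second quarter).) Restriction: $1 \le w \le 24$.

Fixed seasonal: SEASONAL
   $M_{1,t} = 1$ in January, $-1$ in December, and $0$ otherwise; ...; $M_{11,t} = 1$ in November, $-1$ in December, and $0$ otherwise.

Fixed seasonal: SINCOS($j$)
   $\sin(\omega_j t)$ and $\cos(\omega_j t)$, where $\omega_j = 2\pi j / s$, $1 \le j \le s/2$, and $s$ is the seasonal period (drop $\sin(\omega_j t) \equiv 0$ for $j = s/2$). Restrictions: $1 \le j \le s/2$, $s = 4$ or $s = 12$.

Trading day: TD, TDNOLPYEAR
   $T_{1,t}$ = (number of Mondays) $-$ (number of Sundays); ...; $T_{6,t}$ = (number of Saturdays) $-$ (number of Sundays).

One coefficient trading day: TD1COEF, TD1NOLPYEAR
   (number of weekdays) $- \frac{5}{2} \times$ (number of Saturdays and Sundays).

Stock trading day: TDSTOCK($w$)
   $D_{1,t} = 1$ if the $\tilde{w}$th day of month $t$ is a Monday, $-1$ if it is a Sunday, and $0$ otherwise; ...; $D_{6,t} = 1$ if the $\tilde{w}$th day of month $t$ is a Saturday, $-1$ if it is a Sunday, and $0$ otherwise, where $\tilde{w}$ is the smaller of $w$ and the length of month $t$. For end-of-month stock series, set $w$ to 31; that is, specify TDSTOCK(31). Restriction: $1 \le w \le 31$.

Thanksgiving: THANK($w$)
   $ThC(w,t)$ = proportion of days from $w$ days before Thanksgiving through December 24 that fall in month $t$ (negative values of $w$ indicate days after Thanksgiving). (Note: This variable is 0 except in November and December.) Restriction: $-8 \le w \le 17$.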

USERVAR=(variables)

specifies variables in the PROC X12 DATA= or AUXDATA= data set that are to be used as regressors. The variables in the data set should contain the values for each observation that define the regressor. Regression variables should also include future values in the data set for the forecast horizon if the time series is to be extended with regARIMA forecasts. Missing values are not permitted within the data span, including forecasts, of the user-defined regressors. Example 37.6 shows how to create an input data set that contains both the series to be seasonally adjusted and a user-defined input variable. All regression variables in the USERVAR= option apply to all time series to be seasonally adjusted unless the MDLINFOIN= data set specifies different regression information. The PREDEFINED= option and the USERVAR= option cannot be specified in the same REGRESSION statement; however, multiple REGRESSION statements can be specified.
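The following sketch illustrates one way to build an input data set whose user-defined regressor extends over the forecast horizon. The data set name WORK.AIR_X, the regressor X (a hypothetical one-year effect during 1955), and the 12-month horizon are all assumptions made for illustration. The sketch also assumes that the trailing missing values of the series are trimmed by default, as described for the NOTRIMMISS option.

```sas
/* Hypothetical example: WORK.AIR_X and the regressor X are illustrative. */
data work.air_x;
   set sashelp.air end=last;
   x = (year(date) = 1955);           /* 1 during 1955, 0 elsewhere */
   output;
   if last then do i = 1 to 12;       /* append 12 future months */
      date = intnx('month', date, 1);
      air  = .;                       /* series value is unknown in the future */
      x    = 0;                       /* regressor must be nonmissing */
      output;
   end;
   drop i;
run;

proc x12 data=work.air_x date=date;
   var air;
   transform function=log;
   regression uservar=x;
   arima model=((0,1,1)(0,1,1));
   estimate;
   forecast lead=12;                  /* horizon covered by the future values of X */
   x11;
run;
```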

The following options can be specified with the PREDEFINED= and USERVAR= options after the slash.

B=(value <F> ...)

specifies initial or fixed values for the regression parameters in the order in which they appear in a PREDEFINED= or USERVAR= option. Each B= list applies to the PREDEFINED= or USERVAR= variable list that immediately precedes the slash.


For example, the following statements set an initial value for the user-defined regressor, x, of 1:

   regression predefined=LOM ;
   regression uservar=x / b=1 2 ;

In this example, the B= option applies only to the USERVAR= variable list in the second REGRESSION statement; the LOM regressor receives no initial value. The value 2 is discarded because there is only one variable in the USERVAR= list.

To assign an initial value of 1 to the LOM regressor and 2 to the x regressor, use the following statements:

   regression predefined=LOM / b=1;
   regression uservar=x / b=2 ;

An F immediately following the numerical value indicates that this is not an initial value, but a fixed value. See Example 37.8 for an example that uses fixed parameters. In PROC X12, individual parameters can be fixed while other parameters in the same model are estimated.
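As a brief sketch that mirrors the form of the preceding examples, the following statements fix one parameter and estimate the other. The numerical values and the variable x are hypothetical.

```sas
/* Hypothetical values: the LOM coefficient is held fixed at 1,       */
/* and the coefficient of x is estimated, starting from the value 2.  */
regression predefined=LOM / b=1 F;
regression uservar=x / b=2;
```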

USERTYPE=(value)

enables a user-defined variable to be processed in the same manner as a U.S. Census predefined variable. value can be AO, CONSTANT, EASTER, HOLIDAY, LABOR, LOM, LOMSTOCK, LOQ, LPYEAR, LS, RP, SCEASTER, SEASONAL, TC, TD, TDSTOCK, THANKS, or USER. For example, the U.S. Census Bureau EASTER() regression effects are included in the "RegARIMA Holiday Component" table (A7). Specify USERTYPE=EASTER to include a user-defined variable that is processed exactly as the U.S. Census predefined EASTER() variable, including inclusion in the A7 table. Each USERTYPE= list applies to the USERVAR= variable list that immediately precedes the slash. USERTYPE= does not apply to U.S. Census predefined variables.

The same rules for assigning B= values to regression variables apply for USERTYPE= options. For example, the following statements specify that the user-defined regressor in the variable MyEaster be processed exactly as the U.S. Census predefined LOM variable:

   regression uservar=MyLOM;
   regression uservar=MyEaster / usertype=LOM EASTER;

In this example, the USERTYPE= option applies only to the MyEaster variable in the second REGRESSION statement. The USERTYPE value EASTER is discarded since there is only one variable in the USERVAR= list.

To assign the USERTYPE value LOM to the MyLOM variable and EASTER to the MyEaster variable, use the following statements:

   regression uservar=MyLOM / usertype=LOM;
   regression uservar=MyEaster / usertype=EASTER;