REGRESSION regression-group-options ;
REGRESSION PREDEFINED= variables < / B=(value <F> …) > ;
REGRESSION USERVAR= variables < / B=(value <F> …) USERTYPE=(values) > ;
The REGRESSION statement includes regression variables in a regARIMA model or specifies regression variables whose effects
are to be removed by the IDENTIFY statement to aid in ARIMA model identification. Include the PREDEFINED= option to select predefined regression variables. Include the USERVAR= option to specify user-defined regression variables.
Table 38.3 shows the X-12-ARIMA tables that contain regression factors. Tables A8AO, A8LS, and A8TC are available only when more than
one outlier type is present in the model.
Table 38.3: X-12-ARIMA Regression Effects Tables

Table    Regression Effects
A6       Trading day effects
A7       Holiday effects, including Easter, Labor Day, and Thanksgiving-Christmas
A8       Combined effects of outliers, level shifts, ramps, and temporary changes
A8AO     Point outlier effects; available only when more than one outlier type is present in the model
A8LS     Level shift and ramp effects; available only when more than one outlier type is present in the model
A8TC     Temporary change effects; available only when more than one outlier type is present in the model
A9       User-defined regression effects
A10      User-defined seasonal component effects

Missing values in the span of an input series automatically create missing value regressors. See the NOTRIMMISS option in the PROC X12 statement and the section Missing Values for further details about missing values.
Combining your model with additional predefined regression variables can result in a singularity problem. To successfully
perform the regression if a singularity occurs, you might need to alter either the model or the choices of the regressors.
To seasonally adjust a series that uses a regARIMA model, the factors derived from regression are used as multiplicative or
additive factors, depending on the mode of seasonal decomposition. Therefore, regressors that are appropriate to the mode
of the seasonal decomposition should be defined, so that meaningful combined adjustment factors can be derived and adjustment
diagnostics can be generated. For example, if a regARIMA model is applied to a log-transformed series, then the regression
factors are expressed as ratios, which match the form of the seasonal factors that are generated by the multiplicative or
log-additive adjustment modes. Conversely, if a regARIMA model is fit to the original series, then the regression factors
are measured on the same scale as the original series, which matches the scale of the seasonal factors that are generated
by the additive adjustment mode. Note that the default transformation (no transformation) and the default seasonal adjustment
mode (multiplicative) are in conflict. Thus, when you specify the X11 statement and any of the REGRESSION, INPUT, or EVENT
statements, you must also either use the TRANSFORM statement to specify a transformation or use the MODE= option in the X11 statement to specify a different mode to seasonally adjust the data that uses the regARIMA model.
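As a minimal sketch of the two ways to resolve this conflict, the following steps show each alternative. The SASHELP.AIR data set (variables DATE and AIR), the trading day regressor, and the ARIMA orders are assumptions chosen only for illustration; they are not prescribed by this section.

```sas
/* Alternative 1: log-transform the series so that the regression factors */
/* are ratios, which match the default multiplicative adjustment mode.    */
proc x12 data=sashelp.air date=date;
   var air;
   transform function=log;
   regression predefined=td;
   arima model=((0,1,1)(0,1,1));
   estimate;
   x11;
run;

/* Alternative 2: leave the series untransformed and request additive     */
/* adjustment so that the factors are on the scale of the original data.  */
proc x12 data=sashelp.air date=date;
   var air;
   regression predefined=td;
   arima model=((0,1,1)(0,1,1));
   estimate;
   x11 mode=add;
run;
```

Which alternative is appropriate depends on whether the seasonal variation in the series grows with its level (alternative 1) or stays roughly constant (alternative 2).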
According to Ladiray and Quenneville (2001), “X-12-ARIMA is based on the same principle [as the X-11 method] but proposes, in addition, a complete module, called RegARIMA,
that allows for the initial series to be corrected for all sorts of undesirable effects. These effects are estimated using
regression models with ARIMA errors (Findley et al. [23]).” The REGRESSION, INPUT, and EVENT statements specify these regression effects. Predefined effects that can be corrected in
this manner are listed in the PREDEFINED= option. You can create your own definitions to remove other effects by using the USERVAR= option and the EVENT statement.
You can specify either the PREDEFINED= option or the USERVAR= option, but not both, in a single REGRESSION statement. You
can use multiple REGRESSION statements.
You can specify the following regression-group-options in the REGRESSION statement. The regression-group-options apply to all regression variables in a regression group. For predefined regression variables, the regression group is predefined.
For user-defined regression variables, you can specify the regression group in the USERTYPE= option.

AICTEST=(EASTER | TD | TD1COEF | TD1NOLPYEAR | TDNOLPYEAR | TDSTOCK | USER)

specifies that an AIC-based selection be used to determine whether a given set of regression variables is to be included
with the specified regARIMA model. For example, if you specify a trading day model selection, then AIC values (with a correction
for the length of the series, henceforth referred to as AICC) are derived for models with and without the specified trading
day variable. By default, the model with the smaller AICC is used to generate forecasts, identify outliers, and so on. If you
specify more than one type of regressor, the AIC tests are performed sequentially in this order: (a) trading day regressors,
(b) Easter regressors, (c) user-defined regressors. If there are several variables of the same type (for example, several
trading day regressors), then AIC-based selection is applied to them as a group. That is, either all variables of this type
or none are included in the final model. If you do not specify this option, no automatic AIC-based selection is performed.
If you use the AUTOMDL statement to identify the model and you also specify this option, then this option affects the model selection process in
the following manner:

1. AIC-based selection tests are performed on the default model.

2. A new series is created by removing the regression effects that are identified in the default model from the original series.
   The automatic model identification process attempts to identify a model that is based on the new series.

3. After a model is automatically identified, AIC-based selection tests that use the automatically identified model are performed
   on the original series.

4. The default model, including regressors that are identified by using AIC-based selection, is compared to the automatically
   identified model, which also might include regressors that are identified by using AIC-based selection. The regressors for
   the two models can differ.

For more information about the X-12-ARIMA automatic modeling method, see section 7.2 of the X-12-ARIMA Reference Manual (U.S. Bureau of the Census, 2009c).
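The following sketch illustrates how an AIC-based test of trading day and Easter regressors might be requested. It assumes that the group option can be given in its own REGRESSION statement, as the first form in the syntax summary suggests; the SASHELP.AIR data set, the Easter width of 8, and the ARIMA orders are illustrative assumptions.

```sas
proc x12 data=sashelp.air date=date;
   var air;
   transform function=log;
   /* Group option in a separate statement (assumed placement):          */
   /* trading day regressors are tested first, then Easter regressors.   */
   regression aictest=(td easter);
   /* Candidate regressors that are subject to the tests */
   regression predefined=td easter(8);
   arima model=((0,1,1)(0,1,1));
   estimate;
   x11;
run;
```

Because selection is applied by type, the six trading day regressors are kept or dropped together in this sketch.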

NOAPPLY=(AO | HOLIDAY | LS | TC | TD | USER | USERSEASONAL)

specifies a list of the types of regression effects whose model-estimated values are not to be removed from the original series
before performing the seasonal adjustment calculations that are specified by the X11 statement. The NOAPPLY= option applies
to the regression component values displayed in the X11 seasonal adjustment method regARIMA component tables as shown in Table 38.4.
Table 38.4: NOAPPLY= Types and Regression Effects

NOAPPLY= Option   Regression Effects Table   Description
AO                A8AO                       Point outliers
HOLIDAY           A7                         Easter, Labor Day, and Thanksgiving-to-Christmas holiday effects
LS                A8LS                       Level changes and ramps
TC                A8TC                       Temporary changes
TD                A6                         Trading day effects
USER              A9                         User-defined regression effects
USERSEASONAL      A10                        User-defined seasonal regression effects

You can specify the following regression variable specification options in the REGRESSION statement.

PREDEFINED=CONSTANT | EASTER(value) | LABOR(value) | LOM | LOMSTOCK | LOQ | LPYEAR
PREDEFINED=SCEASTER(value) | SEASONAL | SINCOS(value …) | TD | TD1COEF
PREDEFINED=TD1NOLPYEAR | TDNOLPYEAR | TDSTOCK(value) | THANK(value)

lists the predefined regression variables to be included in the model. Data values for these variables are calculated by the program, mostly as functions of the calendar. Table 38.5 gives definitions for the available predefined variables. The values LOM and LOQ are equivalent: the actual regression is
controlled by the SEASONS= option in the PROC X12 statement. You can specify multiple predefined regression variables. The
syntax for using both a length-of-month and a seasonal regression can be in one of the following forms:
regression predefined=lom seasonal;
regression predefined=(lom seasonal);
regression predefined=lom predefined=seasonal;
The following restrictions apply when you use more than one predefined regression variable:

You can specify only one of TD, TDNOLPYEAR, TD1COEF, or TD1NOLPYEAR.

You cannot specify LPYEAR with TD, TD1COEF, LOM, LOMSTOCK, or LOQ.

You cannot specify LOM or LOQ with TD or TD1COEF.

If you specify the SINCOS predefined regression variable, then you must also specify the INTERVAL= option or the SEASONS=
option in the PROC X12 statement because there are restrictions on this regression variable that are based on the frequency
of the data.
The predefined regression variables, EASTER, LABOR, SCEASTER, SINCOS, TDSTOCK, and THANK, require extra parameters. Only one
TDSTOCK regressor can be implemented in the regression model. If you specify multiple TDSTOCK variables, PROC X12 uses the
last TDSTOCK variable specified. For EASTER, LABOR, SCEASTER, SINCOS, and THANK, you can specify the variables with different
parameters to implement multiple regressors in the model. For example, the following statement specifies two EASTER regressors
with widths 7 and 14:
regression predefined=easter(7) easter(14);
For SINCOS, specifying a parameter includes both the sine and the cosine regressor except for the highest order allowed (2
for quarterly data and 6 for monthly data), for which only the cosine regressor is included. For quarterly data, the following statement is the most common use of the SINCOS
variable; it includes three regressors in the model:
regression predefined=sincos(1,2);
For monthly data, the following statement is the most common use of the SINCOS variable; it includes 11 regressors in the
model:
regression predefined=sincos(1,2,3,4,5,6);
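As a sketch of the monthly case in context, the following step specifies the frequency of the data in the PROC X12 statement, as the SINCOS restriction requires, and fits the 11 trigonometric regressors. The SASHELP.AIR data set and the nonseasonal ARIMA orders are assumptions for illustration; seasonal differencing is deliberately omitted here because the fixed seasonal pattern is carried by the regressors.

```sas
proc x12 data=sashelp.air date=date interval=month;
   var air;
   transform function=log;
   /* Orders 1-5 contribute a sine and a cosine term each; */
   /* order 6 contributes only a cosine term: 11 in total. */
   regression predefined=sincos(1,2,3,4,5,6);
   arima model=((0,1,1));
   estimate;
run;
```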
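Table 38.5: Predefined Regression Variables in X-12-ARIMA

Regression Effect: Variable Definitions

Trend constant
CONSTANT
    $(1-B)^{-d}(1-B^{s})^{-D} I(t \ge 1)$, where $I(t \ge 1) = 1$ for $t \ge 1$ and $I(t \ge 1) = 0$ for $t < 1$.

Easter holiday
EASTER($w$)
    $E(w,t) = \frac{1}{w} \times n_t$, and $n_t$ is the number of the $w$ days before Easter that fall in month (or quarter) $t$.
    (Note: This variable is 0 except in February, March, and April (or first and second quarter).
    It is nonzero in February only for $w > 22$.)
    Restriction: $1 \le w \le 25$.

Labor Day
LABOR($w$)
    $L(w,t) = \frac{1}{w} \times$ [number of the $w$ days before Labor Day that fall in month $t$].
    (Note: This variable is 0 except in August and September.)
    Restriction: $1 \le w \le 25$.

Length-of-month (monthly flow)
LOM
    $m_t - \bar{m}$, where $m_t$ = length of month $t$ (in days) and $\bar{m} = 30.4375$ (average length of month).

Stock length-of-month
LOMSTOCK
    $SLOM_t = m_t - \bar{m} - \mu(l)$ for $t = 1$, and $SLOM_t = SLOM_{t-1} + m_t - \bar{m}$ otherwise,
    where $\bar{m}$ and $m_t$ are defined in LOM and
    $\mu(l) = 0.375$ when the first February in the series is a leap year,
    $\mu(l) = 0.125$ when the second February in the series is a leap year,
    $\mu(l) = -0.125$ when the third February in the series is a leap year,
    $\mu(l) = -0.375$ when the fourth February in the series is a leap year.

Length-of-quarter (quarterly flow)
LOQ
    $q_t - \bar{q}$, where $q_t$ = length of quarter $t$ (in days) and $\bar{q} = 91.3125$ (average length of quarter).

Leap year (monthly and quarterly flow)
LPYEAR
    $LY_t = 0.75$ in a leap-year February (first quarter), $LY_t = -0.25$ in other Februaries (first quarter), and $LY_t = 0$ otherwise.

Statistics Canada Easter (monthly or quarterly flow)
SCEASTER($w$)
    If Easter falls before April $w$, let $n_E$ be the number of the $w$ days on or before Easter that fall in March. Then:
    $E(w,t) = n_E / w$ in March, $E(w,t) = -n_E / w$ in April, and $E(w,t) = 0$ otherwise.
    If Easter falls on or after April $w$, then $E(w,t) = 0$.
    (Note: This variable is 0 except in March and April (or first and second quarter).)
    Restriction: $1 \le w \le 24$.

Fixed seasonal
SEASONAL
    $M_{1,t} = 1$ in January, $-1$ in December, 0 otherwise; …; $M_{11,t} = 1$ in November, $-1$ in December, 0 otherwise.

Fixed seasonal
SINCOS($j$)
SINCOS($j_1, \ldots, j_n$)
    $\sin(\omega_j t)$, $\cos(\omega_j t)$, where $\omega_j = 2\pi j / s$, and $s$ is the seasonal period
    (drop $\sin(\omega_j t) \equiv 0$ for $j = s/2$).
    Restrictions: $1 \le j_i \le s/2$, $1 \le n \le s/2$.

Trading day
TD, TDNOLPYEAR
    $T_{1,t}$ = (number of Mondays) $-$ (number of Sundays), …, $T_{6,t}$ = (number of Saturdays) $-$ (number of Sundays).

One coefficient trading day
TD1COEF, TD1NOLPYEAR
    (number of weekdays) $- \frac{5}{2} \times$ (number of Saturdays and Sundays).

Stock trading day
TDSTOCK($w$)
    $D_{1,t} = 1$ if the $\tilde{w}$th day of month $t$ is a Monday, $-1$ if it is a Sunday, 0 otherwise; …;
    $D_{6,t} = 1$ if the $\tilde{w}$th day of month $t$ is a Saturday, $-1$ if it is a Sunday, 0 otherwise,
    where $\tilde{w}$ is the smaller of $w$ and the length of month $t$.
    For end-of-month stock series, set $w$ to 31; that is, specify TDSTOCK(31).
    Restriction: $1 \le w \le 31$.

Thanksgiving
THANK($w$)
    $ThC(w,t)$ = proportion of days from $w$ days before Thanksgiving through December 24 that fall in month $t$
    (negative values of $w$ indicate days after Thanksgiving).
    (Note: This variable is 0 except in November and December.)
    Restriction: $-8 \le w \le 17$.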

USERVAR=(variables)

specifies variables in the DATA= or AUXDATA= data set (which are specified in the PROC X12 statement) that are to be used
as regressors. The variables in the data set should contain the values for each observation that define the regressor. Regression
variables should also include future values in the data set for the forecast horizon if the time series is to be extended
with regARIMA forecasts. Regression variables should include past values if the time series is to be extended with regARIMA
backcasts. Missing values are not permitted within the data span, including backcasts and forecasts, of the user-defined regressors.
Example 38.6 shows how to create an input data set that contains both the series to be seasonally adjusted and a user-defined input variable.
Example 38.11 shows how to create an auxiliary data set that contains a user-defined input variable. For more information about specifying
user-defined regression variables, see the section User-Defined Regression Variables.
All regression variables in the USERVAR= option apply to all time series to be seasonally adjusted unless the MDLINFOIN= data
set specifies different regression information. You cannot specify the PREDEFINED= option and the USERVAR= option in the same
REGRESSION statement; however, you can specify multiple REGRESSION statements.
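A minimal sketch of the USERVAR= workflow follows. The data set name AIR_REG, the variable PROMO, and the three-month window that it flags are hypothetical; SASHELP.AIR and the ARIMA orders are likewise assumptions chosen for illustration.

```sas
/* Add a hypothetical 0/1 regressor to the input data set. */
/* It has no missing values within the data span.          */
data air_reg;
   set sashelp.air;
   promo = ('01jun1955'd <= date <= '01aug1955'd);
run;

proc x12 data=air_reg date=date;
   var air;
   transform function=log;
   regression uservar=promo / usertype=user;
   arima model=((0,1,1)(0,1,1));
   estimate;
run;
```

The sketch stops at the ESTIMATE statement. Extending the series with regARIMA forecasts would additionally require values of PROMO for the forecast horizon, as described above.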
You can specify the following options for individual regression variables. Individual regression variable options are specified in the PREDEFINED= and USERVAR=
options after the slash. The B= option can be specified in both the PREDEFINED= and USERVAR= options. Because the regression
group is predefined for predefined variables, you can specify the USERTYPE= option only in the USERVAR= option.

B=(value <F> …)

specifies initial or fixed values for the regression parameters in the order in which they appear in a PREDEFINED= or USERVAR=
option. Each B= list applies to the PREDEFINED= or USERVAR= variable list that immediately precedes the slash.
For example, the following statements set an initial value of 1 for the user-defined regressor, x:
regression predefined=LOM ;
regression uservar=x / b=1 2 ;
In this example, the B= option applies only to the USERVAR= option. The value 2 is discarded because there is only one variable
in the USERVAR= list.
To assign an initial value of 1 to the LOM regressor and 2 to the x regressor, use the following statements:
regression predefined=LOM / b=1;
regression uservar=x / b=2 ;
An F immediately following the numerical value indicates that this is not an initial value, but a fixed value. See Example 38.8 for an example that uses fixed parameters. In PROC X12, individual parameters can be fixed while other parameters in the
same model are estimated.
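The following sketch mixes a fixed and an estimated parameter in one B= list. The numeric values are arbitrary placeholders, the choice of regressors and the SASHELP.AIR data set are assumptions, and the spacing between a value and its F marker follows the syntax summary rather than a tested example.

```sas
proc x12 data=sashelp.air date=date;
   var air;
   transform function=log;
   /* First value: fixed at 0.02 for EASTER(8) (marked with F). */
   /* Second value: initial value of -0.01 for LABOR(8), which  */
   /* is then estimated.                                        */
   regression predefined=easter(8) labor(8) / b=0.02 F -0.01;
   arima model=((0,1,1)(0,1,1));
   estimate;
run;
```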

USERTYPE=(values)

enables a variable that you define to be processed in the same manner as a U.S. Census predefined variable. You can specify
the following values: AO, CONSTANT, EASTER, HOLIDAY, LABOR, LOM, LOMSTOCK, LOQ, LPYEAR, LS, RP, SCEASTER, SEASONAL, TC, TD, TDSTOCK, THANKS, or
USER. For example, the U.S. Census Bureau EASTER() regression effects are included in the “RegARIMA Holiday Component” table (A7). Specify USERTYPE=EASTER to define a variable that is processed exactly as the U.S. Census predefined EASTER() variable, including inclusion in the A7 table. Each USERTYPE= list applies to the USERVAR= variable list that immediately
precedes the slash. USERTYPE= does not apply to U.S. Census predefined variables.
The same rules for assigning B= values to regression variables apply for USERTYPE= options. For example, the following statements
specify that the user-defined regressor in the variable MyEaster be processed exactly as the U.S. Census predefined LOM variable:
regression uservar=MyLOM;
regression uservar=MyEaster / usertype=LOM EASTER;
In this example, the USERTYPE= option applies only to the MyEaster variable in the second REGRESSION statement. The USERTYPE value EASTER is discarded because there is only one variable in
the USERVAR= list.
To assign the USERTYPE value LOM to the MyLOM variable and EASTER to the MyEaster variable, use the following statements:
regression uservar=MyLOM / usertype=LOM;
regression uservar=MyEaster / usertype=EASTER;
The following USERTYPE= options specify that the regression effect be removed from the seasonally adjusted series: EASTER,
HOLIDAY, LABOR, LOM, LOMSTOCK, LOQ, LPYEAR, SCEASTER, SEASONAL, TD, TDSTOCK, THANKS, and USER. When a regression effect is
removed from the seasonally adjusted series, the level (mean) of the seasonally adjusted series can be altered. It is often
desirable to use a zero-mean (mean-adjusted) regressor for effects that are to be removed from the seasonally adjusted series.
See Example 38.6 for an example that specifies a zero-mean regressor.
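One way to obtain a zero-mean regressor is to subtract its sample mean before passing it to the USERVAR= option. The following sketch uses PROC STANDARD for that step; it is not the approach of Example 38.6, and the data set names and the PROMO variable are hypothetical.

```sas
/* Hypothetical 0/1 regressor added to an illustrative input data set */
data air_reg;
   set sashelp.air;
   promo = ('01jun1955'd <= date <= '01aug1955'd);
run;

/* Subtract the sample mean so that PROMO averages to zero over the span */
proc standard data=air_reg mean=0 out=air_centered;
   var promo;
run;

/* Check: the reported mean of PROMO should be zero up to rounding */
proc means data=air_centered mean;
   var promo;
run;
```

The centered variable in AIR_CENTERED can then be named in a REGRESSION USERVAR= statement in place of the original regressor.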