The VARMAX Procedure

Vector Error Correction Modeling

This section discusses the implication of cointegration for the autoregressive representation. Assume that the cointegrated series can be represented by a vector error correction model according to the Granger representation theorem (Engle and Granger 1987). Consider the vector autoregressive process with Gaussian errors defined by

$\displaystyle  \mb {y} _ t = \sum _{i=1}^ p\Phi _ i\mb {y} _{t-i} + \bepsilon _ t  $

or

$\displaystyle  \Phi (B) \mb {y} _ t = \bepsilon _ t  $

where the initial values, $\mb {y} _{-p+1},\ldots ,\mb {y} _0$, are fixed and $\bepsilon _ t \sim N(0,\Sigma )$. Since the AR operator $\Phi (B)$ can be re-expressed as $\Phi (B) = \Phi ^*(B)(1-B)+\Phi (1)B$, where $\Phi ^*(B)=I_ k-\sum _{i=1}^{p-1}\Phi ^*_ iB^ i$ with $\Phi ^*_ i= - \sum _{j=i+1}^ p \Phi _ j$, the vector error correction model is

\[  \Phi ^*(B)(1-B)\mb {y} _ t=\balpha \bbeta ’\mb {y} _{t-1} +\bepsilon _ t  \]

or

\[  \Delta \mb {y} _ t = \balpha \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + \bepsilon _ t  \]

where $\balpha \bbeta ’ = -\Phi (1)= -I_ k+\Phi _{1}+\Phi _{2}+\cdots +\Phi _{p}$.
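As a quick numerical check of this reparameterization, the following Python sketch (with hypothetical VAR(2) coefficient matrices, not values from any fitted model) computes $\Pi =-\Phi (1)$ and $\Phi ^*_1$ and verifies that the VECM recursion reproduces the VAR recursion:

```python
import numpy as np

# Hypothetical VAR(2) coefficient matrices Phi_1, Phi_2 for a bivariate system
Phi = [np.array([[0.6, 0.2], [0.1, 0.5]]),
       np.array([[0.3, -0.1], [0.0, 0.2]])]
k, p = 2, len(Phi)

# alpha*beta' = -Phi(1) = -(I_k - Phi_1 - ... - Phi_p)
Pi = -(np.eye(k) - sum(Phi))

# Phi*_i = -(Phi_{i+1} + ... + Phi_p), i = 1, ..., p-1
Phi_star = [-sum(Phi[i + 1:]) for i in range(p - 1)]

# Check: the VECM form reproduces the VAR recursion for arbitrary lagged values
rng = np.random.default_rng(0)
y_lag1, y_lag2 = rng.standard_normal(k), rng.standard_normal(k)
var_step = Phi[0] @ y_lag1 + Phi[1] @ y_lag2
vecm_step = y_lag1 + Pi @ y_lag1 + Phi_star[0] @ (y_lag1 - y_lag2)
assert np.allclose(var_step, vecm_step)
```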

One motivation for the VECM($p$) form is to consider the relation $\bbeta ’\mb {y} _{t} = \mb {c} $ as defining the underlying economic relations and to assume that the agents react to the disequilibrium error $\bbeta ’\mb {y} _{t} - \mb {c} $ through the adjustment coefficient $\balpha $ to restore equilibrium; that is, they satisfy the economic relations. The columns of the cointegrating matrix $\bbeta $ are sometimes called the long-run parameters.

You can consider a vector error correction model with a deterministic term. The deterministic term $D_ t$ can contain a constant, a linear trend, and seasonal dummy variables. Exogenous variables can also be included in the model.

$\displaystyle  \Delta \mb {y} _ t = \Pi \mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i}+ A D_ t + \sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} + \bepsilon _ t  $

where $\Pi = \balpha \bbeta ’$.

The alternative vector error correction representation considers the error correction term at lag $t-p$ and is written as

\[  \Delta \mb {y} _ t=\sum _{i=1}^{p-1}\Phi ^{\sharp }_ i\Delta \mb {y} _{t-i} +\Pi ^{\sharp } \mb {y} _{t-p} + A D_ t +\sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} +\bepsilon _ t  \]

If the matrix $\Pi $ has full rank ($r=k$), all components of $\mb {y} _ t$ are $I(0)$. On the other hand, if $\mr {rank} (\Pi )=0$, $\mb {y} _ t$ is stationary in differences. When the rank of the matrix $\Pi $ is $r < k$, there are $k-r$ linear combinations that are nonstationary and $r$ stationary cointegrating relations. Note that the $r$ linearly independent components of $\mb {z} _ t=\bbeta ’\mb {y} _ t$ are stationary, and this transformation is not unique unless $r=1$. There does not exist a unique cointegrating matrix $\bbeta $, since the coefficient matrix $\Pi $ can also be decomposed as

$\displaystyle  \Pi = \balpha MM^{-1}\bbeta ’ = \balpha ^{*}\bbeta ^{*’}  $

where $M$ is an $r\times r$ nonsingular matrix.
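This non-uniqueness can be illustrated with a small Python sketch (the $\balpha $ and $\bbeta $ values below are hypothetical):

```python
import numpy as np

# Hypothetical k x r adjustment coefficients and cointegrating vectors (k=2, r=1)
alpha = np.array([[-0.46], [0.18]])
beta = np.array([[1.0], [-2.05]])
Pi = alpha @ beta.T

# Any nonsingular r x r matrix M gives another valid decomposition:
# alpha* = alpha M and beta*' = M^{-1} beta'
M = np.array([[3.0]])
alpha_star = alpha @ M
beta_star = beta @ np.linalg.inv(M).T
assert np.allclose(Pi, alpha_star @ beta_star.T)
```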

Test for Cointegration

The cointegration rank test determines the number of linearly independent columns of $\Pi $. Johansen (1988, 1995a) and Johansen and Juselius (1990) proposed cointegration rank tests that use reduced rank regression.

Different Specifications of Deterministic Trends

When you construct the VECM($p$) form from the VAR($p$) model, the deterministic terms in the VECM($p$) form can differ from those in the VAR($p$) model. When there are deterministic cointegrated relationships among variables, deterministic terms in the VAR($p$) model are not present in the VECM($p$) form. On the other hand, if there are stochastic cointegrated relationships in the VAR($p$) model, deterministic terms appear in the VECM($p$) form via the error correction term or as an independent term in the VECM($p$) form. There are five different specifications of deterministic trends in the VECM($p$) form.

  • Case 1: There is no separate drift in the VECM($p$) form.

    \[  \Delta \mb {y} _ t = \balpha \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} +\bepsilon _ t  \]
  • Case 2: There is no separate drift in the VECM($p$) form, but a constant enters only via the error correction term.

    \[  \Delta \mb {y} _ t = \balpha (\bbeta ’, \beta _0)(\mb {y} _{t-1}’,1)’ + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + \bepsilon _ t  \]
  • Case 3: There is a separate drift and no separate linear trend in the VECM($p$) form.

    \[  \Delta \mb {y} _ t = \balpha \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + \bdelta _0 + \bepsilon _ t  \]
  • Case 4: There is a separate drift and no separate linear trend in the VECM($p$) form, but a linear trend enters only via the error correction term.

    \[  \Delta \mb {y} _ t = \balpha (\bbeta ’, \beta _1)(\mb {y} _{t-1}’,t)’ + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + \bdelta _0 + \bepsilon _ t  \]
  • Case 5: There is a separate linear trend in the VECM($p$) form.

    \[  \Delta \mb {y} _ t = \balpha \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + \bdelta _0 + \bdelta _1t + \bepsilon _ t  \]

First, focus on Cases 1, 3, and 5 to test the null hypothesis that there are at most $r$ cointegrating vectors. Let

\[  \begin{aligned}
Z_{0t} &= \Delta \mb {y} _ t \\
Z_{1t} &= \mb {y} _{t-1} \\
Z_{2t} &= [\Delta \mb {y} _{t-1}’,\ldots ,\Delta \mb {y} _{t-p+1}’,D_ t]’ \\
Z_{0} &= [Z_{01}, \ldots , Z_{0T}]’ \\
Z_{1} &= [Z_{11}, \ldots , Z_{1T}]’ \\
Z_{2} &= [Z_{21}, \ldots , Z_{2T}]’
\end{aligned}  \]

where $D_ t$ can be empty for Case 1, 1 for Case 3, and $(1,t)$ for Case 5.

In Case 2, $Z_{1t}$ and $Z_{2t}$ are defined as

\[  \begin{aligned}
Z_{1t} &= [ \mb {y} _{t-1}’, 1]’ \\
Z_{2t} &= [\Delta \mb {y} _{t-1}’,\ldots ,\Delta \mb {y} _{t-p+1}’]’
\end{aligned}  \]

In Case 4, $Z_{1t}$ and $Z_{2t}$ are defined as

\[  \begin{aligned}
Z_{1t} &= [ \mb {y} _{t-1}’, t]’ \\
Z_{2t} &= [\Delta \mb {y} _{t-1}’,\ldots ,\Delta \mb {y} _{t-p+1}’, 1]’
\end{aligned}  \]

Let $\Psi $ be the matrix of parameters consisting of $\Phi ^{*}_1$, …, $\Phi ^{*}_{p-1}$, $A$, and $\Theta ^*_0$, …, $\Theta ^{*}_ s$, where the parameter matrix $A$ corresponds to the regressors $D_ t$. Then the VECM($p$) form is rewritten in these variables as

\[  Z_{0t}=\balpha \bbeta ’ Z_{1t} +\Psi Z_{2t} +\bepsilon _ t  \]

The log-likelihood function is given by

\[  \begin{aligned}
\ell = {} & - \frac{kT}{2} \log 2\pi -\frac{T}{2} \log |\Sigma | \\
& - \frac{1}{2} \sum _{t=1}^ T(Z_{0t} - \balpha \bbeta ’ Z_{1t} -\Psi Z_{2t})’\Sigma ^{-1} (Z_{0t} -\balpha \bbeta ’ Z_{1t} -\Psi Z_{2t})
\end{aligned}  \]

The residuals, $R_{0t}$ and $R_{1t}$, are obtained by regressing $Z_{0t}$ and $Z_{1t}$ on $Z_{2t}$, respectively. The regression equation of residuals is

\[  R_{0t} = \balpha \bbeta ’ R_{1t} + \hat{ \bepsilon }_ t  \]

The crossproduct matrices are computed as

\[  S_{ij} = \frac{1}{T}\sum _{t=1}^{T}R_{it}R_{jt}’,~ ~ i,j=0,1  \]

Then the maximum likelihood estimator for $\bbeta $ is obtained from the eigenvectors that correspond to the $r$ largest eigenvalues of the following equation:

\[  |\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}| = 0  \]

The eigenvalues of the preceding equation are squared canonical correlations between $R_{0t}$ and $R_{1t}$, and the eigenvectors that correspond to the $r$ largest eigenvalues are the $r$ linear combinations of $\mb {y} _{t-1}$, which have the largest squared partial correlations with the stationary process $\Delta \mb {y} _{t}$ after correcting for lags and deterministic terms. Such an analysis calls for a reduced rank regression of $\Delta \mb {y} _{t}$ on $\mb {y} _{t-1}$ corrected for $(\Delta \mb {y} _{t-1},\ldots ,\Delta \mb {y} _{t-p+1},D_ t)$, as discussed by Anderson (1951). Johansen (1988) suggests two test statistics to test the null hypothesis that there are at most $r$ cointegrating vectors

\[  \mbox{H}_0: \lambda _ i=0 \mr {~ ~ for~ ~ } i=r+1,\ldots ,k  \]
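The reduced rank regression behind this test can be sketched in Python. The bivariate series below is simulated (not PROC VARMAX output), with one cointegrating relation and a constant in $Z_{2t}$ as in Case 3 with $p=2$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
y2 = np.cumsum(rng.standard_normal(T))   # common stochastic trend (random walk)
y1 = 2.0 * y2 + rng.standard_normal(T)   # cointegrated with y2
y = np.column_stack([y1, y2])

dy = np.diff(y, axis=0)
Z0 = dy[1:]                                        # Z_0t = Delta y_t
Z1 = y[1:-1]                                       # Z_1t = y_{t-1}
Z2 = np.column_stack([dy[:-1], np.ones(len(Z0))])  # lagged differences, constant

# Residuals R_0t, R_1t from regressing Z_0t and Z_1t on Z_2t
R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]

n = len(R0)
S00 = R0.T @ R0 / n
S01 = R0.T @ R1 / n
S11 = R1.T @ R1 / n

# Eigenvalues of |lambda*S11 - S10 S00^{-1} S01| = 0,
# i.e., of S11^{-1} S10 S00^{-1} S01, sorted in descending order
lam = np.sort(np.linalg.eigvals(
    np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))).real)[::-1]
# With one cointegrating relation, lam[0] is large and lam[1] is near zero
```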

Trace Test

The trace statistic for testing the null hypothesis that there are at most $r$ cointegrating vectors is as follows:

\[  \lambda _{trace} = -T\sum _{i=r+1}^{k}\log (1-\lambda _ i)  \]

The asymptotic distribution of this statistic is given by

\[  tr\left\{  \int _0^1 (dW){\tilde W}’ \left(\int _0^1 {\tilde W}{\tilde W}’dr\right)^{-1}\int _0^1 {\tilde W}(dW)’ \right\}   \]

where $tr(A)$ is the trace of a matrix $A$, $W$ is a $(k-r)$-dimensional Brownian motion, and $\tilde W$ is the Brownian motion itself, or the demeaned or detrended Brownian motion, according to the specification of deterministic trends in the vector error correction model.

Maximum Eigenvalue Test

The maximum eigenvalue statistic for testing the null hypothesis that there are at most $r$ cointegrating vectors is as follows:

\[  \lambda _{max} = -T\log (1-\lambda _{r+1})  \]

The asymptotic distribution of this statistic is given by

\[  max\left\{  \int _0^1 (dW){\tilde W}’ \left(\int _0^1 {\tilde W}{\tilde W}’dr\right)^{-1}\int _0^1 {\tilde W}(dW)’ \right\}   \]

where $max(A)$ is the maximum eigenvalue of a matrix $A$. Osterwald-Lenum (1992) provided detailed tables of the critical values of these statistics.
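Given the eigenvalues from the reduced rank regression, both statistics are simple transformations. The following Python sketch uses illustrative eigenvalues and a sample size loosely based on Figure 36.52, not values from an actual fit:

```python
import numpy as np

T = 100                              # illustrative sample size
lam = np.array([0.4644, 0.0056])     # illustrative eigenvalues
k = len(lam)

# Trace statistic for H0: rank <= r, for each r
trace = np.array([-T * np.sum(np.log(1.0 - lam[r:])) for r in range(k)])

# Maximum eigenvalue statistic for H0: rank = r against rank = r + 1
lam_max = np.array([-T * np.log(1.0 - lam[r]) for r in range(k)])

# Sequential testing: reject rank 0 if trace[0] exceeds its critical value,
# then move on to rank 1, and so on.
```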

The following statements use the JOHANSEN option to compute the Johansen cointegration rank trace test of integrated order 1:

proc varmax data=simul2;
   model y1 y2 / p=2 cointtest=(johansen=(normalize=y1));
run;

Figure 36.52 shows the output based on the model specified in the MODEL statement; an intercept term is assumed. In the Cointegration Rank Test Using Trace table, the column Drift In ECM indicates that there is no separate drift in the error correction model, and the column Drift In Process indicates that the process has a constant drift before differencing. The Cointegration Rank Test Using Trace table shows the trace statistics based on Case 3, and the Cointegration Rank Test Using Trace under Restriction table shows the trace statistics based on Case 2. The output indicates that the series are cointegrated with rank 1 because, in both Case 2 and Case 3, the trace statistic for rank 0 exceeds its critical value while the trace statistic for rank 1 is smaller than its critical value.

Figure 36.52: Cointegration Rank Test (COINTTEST=(JOHANSEN=) Option)

The VARMAX Procedure

Cointegration Rank Test Using Trace

H0:     H1:                          5% Critical  Drift     Drift in
Rank=r  Rank>r  Eigenvalue  Trace    Value        in ECM    Process
0       0       0.4644      61.7522  15.34        Constant  Linear
1       1       0.0056      0.5552   3.84

Cointegration Rank Test Using Trace Under Restriction

H0:     H1:                          5% Critical  Drift     Drift in
Rank=r  Rank>r  Eigenvalue  Trace    Value        in ECM    Process
0       0       0.5209      76.3788  19.99        Constant  Constant
1       1       0.0426      4.2680   9.13


Figure 36.53 shows which case, Case 2 (the null hypothesis H0) or Case 3 (the alternative hypothesis H1), is appropriate at a given significance level. Since the cointegration rank is chosen to be 1 by the result in Figure 36.52, look at the last row, which corresponds to rank=1. Since the $p$-value is 0.054, Case 2 cannot be rejected at the 5% significance level, but it can be rejected at the 10% significance level. For the models fitted under Case 2 and Case 3, see Figure 36.56 and Figure 36.57.

Figure 36.53: Cointegration Rank Test Continued

Hypothesis of the Restriction

Hypothesis  Drift in ECM  Drift in Process
H0(Case 2)  Constant      Constant
H1(Case 3)  Constant      Linear

Hypothesis Test of the Restriction

Rank  Eigenvalue  Restricted Eigenvalue  DF  Chi-Square  Pr > ChiSq
0     0.4644      0.5209                 2   14.63       0.0007
1     0.0056      0.0426                 1   3.71        0.0540


Figure 36.54 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 3.

Figure 36.54: Cointegration Rank Test Continued

Beta
Variable 1 2
y1 1.00000 1.00000
y2 -2.04869 -0.02854

Alpha
Variable 1 2
y1 -0.46421 -0.00502
y2 0.17535 -0.01275


Because of the NORMALIZE= option, the first row of the Beta table is normalized to 1. Considering that the cointegration rank is 1, the long-run relationship of the series is

\[  \begin{aligned}
{\bbeta }’\mb {y} _ t &= \left[ \begin{array}{rr} 1 &  -2.04869 \end{array} \right] \left[ \begin{array}{r} y_{1t} \\ y_{2t} \end{array} \right] \\
&= y_{1t} - 2.04869 y_{2t} \\
y_{1t} &= 2.04869 y_{2t}
\end{aligned}  \]

Figure 36.55 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 2.

Figure 36.55: Cointegration Rank Test Continued

Beta Under Restriction
Variable 1 2
y1 1.00000 1.00000
y2 -2.04366 -2.75773
1 6.75919 101.37051

Alpha Under Restriction
Variable 1 2
y1 -0.48015 0.01091
y2 0.12538 0.03722


Considering that the cointegration rank is 1, the long-run relationship of the series is

\[  \begin{aligned}
{\bbeta }’\mb {y} _ t &= \left[ \begin{array}{rrr} 1 &  -2.04366 &  6.75919 \end{array} \right] \left[ \begin{array}{r} y_{1t} \\ y_{2t} \\ 1 \end{array} \right] \\
&= y_{1t} - 2.04366~  y_{2t} + 6.75919 \\
y_{1t} &= 2.04366~  y_{2t} - 6.75919
\end{aligned}  \]

Estimation of Vector Error Correction Model

The preceding log-likelihood function is maximized for

\[  \begin{aligned}
\hat{\bbeta } &= S_{11}^{-1/2} [v_1,\ldots ,v_ r] \\
\hat{\balpha } &= S_{01}\hat{\bbeta }(\hat{\bbeta }’ S_{11}\hat{\bbeta })^{-1} \\
\hat\Pi &= \hat{\balpha } \hat{\bbeta }’ \\
\hat\Psi ’ &= (Z_{2}’Z_{2})^{-1} Z_{2}’(Z_{0} - Z_{1} \hat\Pi ’) \\
\hat\Sigma &= (Z_{0} - Z_{2} \hat\Psi ’ - Z_{1} \hat\Pi ’)’ (Z_{0} - Z_{2} \hat\Psi ’ - Z_{1} \hat\Pi ’)/T
\end{aligned}  \]
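These closed forms can be sketched in Python. The residual matrices below are randomly generated stand-ins for $R_{0t}$ and $R_{1t}$, so only the algebra, not the estimates, is meaningful:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k, r = 500, 2, 1
R0 = rng.standard_normal((T, k))   # stand-in for the residuals R_0t
R1 = rng.standard_normal((T, k))   # stand-in for the residuals R_1t
S00 = R0.T @ R0 / T
S01 = R0.T @ R1 / T
S11 = R1.T @ R1 / T

# Symmetric inverse square root S11^{-1/2}
w, V = np.linalg.eigh(S11)
S11_mhalf = V @ np.diag(w ** -0.5) @ V.T

# The v_i are eigenvectors of the symmetrized eigenproblem;
# beta_hat = S11^{-1/2} [v_1, ..., v_r]
A = S11_mhalf @ S01.T @ np.linalg.solve(S00, S01) @ S11_mhalf
vals, vecs = np.linalg.eigh(A)
order = np.argsort(vals)[::-1]
beta_hat = S11_mhalf @ vecs[:, order[:r]]
alpha_hat = S01 @ beta_hat @ np.linalg.inv(beta_hat.T @ S11 @ beta_hat)
Pi_hat = alpha_hat @ beta_hat.T

# This choice of eigenvectors implies the normalization beta' S11 beta = I_r
assert np.allclose(beta_hat.T @ S11 @ beta_hat, np.eye(r))
```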

The estimators of the orthogonal complements of $\balpha $ and $\bbeta $ are

\[  \hat{\bbeta }_{\bot } = S_{11} [v_{r+1},\ldots ,v_{k}]  \]

and

\[  \hat{\balpha }_{\bot } = S_{00}^{-1} S_{01} [v_{r+1},\ldots ,v_{k}]  \]

The ML estimators have the following asymptotic properties:

$\displaystyle  {\sqrt T} \mr {vec} ([\hat\Pi ,\hat\Psi ] - [\Pi , \Psi ]) \stackrel{d}{\rightarrow } N(0, \Sigma _{co})  $

where

$\displaystyle  \Sigma _{co} = \Sigma \otimes \left( \left[ \begin{array}{cc} \bbeta &  0 \\ 0 &  I_ k \end{array} \right] \Omega ^{-1} \left[ \begin{array}{cc} \bbeta ’ &  0 \\ 0 &  I_ k \end{array} \right] \right)  $

and

$\displaystyle  \Omega = \mr {plim} \frac{1}{T} \left[ \begin{array}{cc} \bbeta ’Z_{1}’Z_{1}\bbeta &  \bbeta ’Z_{1}’Z_{2} \\ Z_{2}’Z_{1}\bbeta &  Z_{2}’Z_{2} \\ \end{array} \right]  $

The following statements are examples of fitting the five different cases of the vector error correction models mentioned in the previous section.

For fitting Case 1,

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1) noint;

For fitting Case 2,

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend);

For fitting Case 3,

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);

For fitting Case 4,

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend)
                 trend=linear;

For fitting Case 5,

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1) trend=linear;

Based on Figure 36.53, which uses the COINTTEST=(JOHANSEN) option, you can fit the model by using either Case 2 or Case 3 because the restriction test was not significant at the 0.05 level but was significant at the 0.10 level. Here both models are fitted to show the difference in the output display. Figure 36.56 is for Case 2, and Figure 36.57 is for Case 3.

For Case 2,

proc varmax data=simul2;
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend)
                 print=(estimates);
run;

Figure 36.56: Parameter Estimation with the ECTREND Option

The VARMAX Procedure

Parameter Alpha * Beta' Estimates

Variable  y1        y2        1
y1        -0.48015  0.98126   -3.24543
y2        0.12538   -0.25624  0.84748

AR Coefficients of Differenced Lag

DIF Lag  Variable  y1        y2
1        y1        -0.72759  -0.77463
         y2        0.38982   -0.55173

Model Parameter Estimates

Equation  Parameter  Estimate  Standard Error  t Value  Pr > |t|  Variable
D_y1      CONST1     -3.24543  0.33022                            1, EC
          AR1_1_1    -0.48015  0.04886                            y1(t-1)
          AR1_1_2    0.98126   0.09984                            y2(t-1)
          AR2_1_1    -0.72759  0.04623         -15.74   0.0001    D_y1(t-1)
          AR2_1_2    -0.77463  0.04978         -15.56   0.0001    D_y2(t-1)
D_y2      CONST2     0.84748   0.35394                            1, EC
          AR1_2_1    0.12538   0.05236                            y1(t-1)
          AR1_2_2    -0.25624  0.10702                            y2(t-1)
          AR2_2_1    0.38982   0.04955         7.87     0.0001    D_y1(t-1)
          AR2_2_2    -0.55173  0.05336         -10.34   0.0001    D_y2(t-1)


Figure 36.56 can be reported as follows:

\[  \begin{aligned}
\Delta \mb {y} _ t = {} & \left[ \begin{array}{rrr} -0.48015 &  0.98126 &  -3.24543 \\ 0.12538 &  -0.25624&  0.84748 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \\ 1 \end{array} \right] \\
& + \left[ \begin{array}{rr} -0.72759 &  -0.77463 \\ 0.38982 &  -0.55173 \end{array} \right] \Delta \mb {y} _{t-1} + \bepsilon _ t
\end{aligned}  \]

The keyword EC in the Model Parameter Estimates table means that the ECTREND option is used for fitting the model.

For fitting Case 3,

proc varmax data=simul2;
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1)
                 print=(estimates);
run;

Figure 36.57: Parameter Estimation without the ECTREND Option

The VARMAX Procedure

Parameter Alpha * Beta' Estimates

Variable  y1        y2
y1        -0.46421  0.95103
y2        0.17535   -0.35923

AR Coefficients of Differenced Lag

DIF Lag  Variable  y1        y2
1        y1        -0.74052  -0.76305
         y2        0.34820   -0.51194

Model Parameter Estimates

Equation  Parameter  Estimate  Standard Error  t Value  Pr > |t|  Variable
D_y1      CONST1     -2.60825  1.32398         -1.97    0.0518    1
          AR1_1_1    -0.46421  0.05474                            y1(t-1)
          AR1_1_2    0.95103   0.11215                            y2(t-1)
          AR2_1_1    -0.74052  0.05060         -14.63   0.0001    D_y1(t-1)
          AR2_1_2    -0.76305  0.05352         -14.26   0.0001    D_y2(t-1)
D_y2      CONST2     3.43005   1.39587         2.46     0.0159    1
          AR1_2_1    0.17535   0.05771                            y1(t-1)
          AR1_2_2    -0.35923  0.11824                            y2(t-1)
          AR2_2_1    0.34820   0.05335         6.53     0.0001    D_y1(t-1)
          AR2_2_2    -0.51194  0.05643         -9.07    0.0001    D_y2(t-1)


Figure 36.57 can be reported as follows:

\[  \begin{aligned}
\Delta \mb {y} _ t = {} & \left[ \begin{array}{rr} -0.46421 &  0.95103 \\ 0.17535 &  -0.35923 \end{array} \right] \mb {y} _{t-1} + \left[ \begin{array}{rr} -0.74052 &  -0.76305 \\ 0.34820 &  -0.51194 \end{array} \right] \Delta \mb {y} _{t-1} \\
& + \left[ \begin{array}{r} -2.60825 \\ 3.43005 \end{array} \right] + \bepsilon _ t
\end{aligned}  \]

Test for the Linear Restriction on the Parameters

Consider the example with the variables $m_ t$ log real money, $y_ t$ log real income, $i^ d_ t$ deposit interest rate, and $i^ b_ t$ bond interest rate. It seems a natural hypothesis that in the long-run relation, money and income have equal coefficients with opposite signs; that is, the cointegrating relation contains $m_ t$ and $y_ t$ only through $m_ t - y_ t$. For the analysis, you can express these restrictions in the parameterization of $H$ such that $\bbeta = H\phi $, where $H$ is a known $k\times s$ matrix and $\phi $ is the $s\times r$ $(r\leq s < k)$ parameter matrix to be estimated. For this example, $H$ is given by

\[  H = \left[ \begin{array}{rrr} 1 &  0 &  0 \\ -1 &  0 &  0 \\ 0 &  1 &  0 \\ 0 &  0 &  1 \\ \end{array} \right]  \]

Restriction $H_0\colon \bbeta = H\phi $

When the linear restriction $\bbeta = H\phi $ is given, it implies that the same restrictions are imposed on all cointegrating vectors. You obtain the maximum likelihood estimator of $\bbeta $ by reduced rank regression of $\Delta \mb {y} _ t$ on $H\mb {y} _{t-1}$ corrected for $(\Delta \mb {y} _{t-1},\ldots ,\Delta \mb {y} _{t-p+1}, D_ t)$, solving the following equation

$\displaystyle  |\rho H’S_{11}H - H’S_{10}S^{-1}_{00}S_{01}H| = 0  $

for the eigenvalues $1>\rho _1>\cdots >\rho _ s>0$ and eigenvectors $(v_1,\ldots ,v_ s)$, with $S_{ij}$ as given in the preceding section. Then choose $\hat\phi =(v_1,\ldots ,v_ r)$, which corresponds to the $r$ largest eigenvalues, and set $\hat{\bbeta } = H\hat\phi $.

The test statistic for $H_0\colon \bbeta = H\phi $ is given by

\[  T\sum _{i=1}^ r \log \{ (1-\rho _ i)/(1-\lambda _ i)\}  \stackrel{d}{\rightarrow } \chi ^2_{r(k-s)}  \]
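This likelihood ratio test can be sketched in Python on simulated data. The restriction $H=(1,-2)’$ approximately matches the simulated long-run relation $y_1 - 2y_2$, so the statistic should be small:

```python
import numpy as np

# Simulate a cointegrated pair whose long-run relation is y1 - 2*y2
rng = np.random.default_rng(3)
T = 300
y2 = np.cumsum(rng.standard_normal(T))
y1 = 2.0 * y2 + rng.standard_normal(T)
y = np.column_stack([y1, y2])
dy = np.diff(y, axis=0)
Z0, Z1 = dy[1:], y[1:-1]
Z2 = np.column_stack([dy[:-1], np.ones(len(Z0))])
R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
n = len(R0)
S00, S01, S11 = R0.T @ R0 / n, R0.T @ R1 / n, R1.T @ R1 / n

def rr_eigvals(S00, S01, S11):
    """Descending eigenvalues of |lam*S11 - S01' S00^{-1} S01| = 0."""
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    return np.sort(np.linalg.eigvals(M).real)[::-1]

r = 1
lam = rr_eigvals(S00, S01, S11)                  # unrestricted eigenvalues
H = np.array([[1.0], [-2.0]])                    # restriction beta = H*phi (s=1)
rho = rr_eigvals(S00, S01 @ H, H.T @ S11 @ H)    # restricted eigenvalues
lr = n * np.sum(np.log((1.0 - rho[:r]) / (1.0 - lam[:r])))
# lr is asymptotically chi-square with r*(k - s) degrees of freedom
```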

If the series has no deterministic trend, the constant term should be restricted by $\balpha _{\bot }’\bdelta _0 = 0$ as in Case 2. Then $H$ is given by

\[  H = \left[ \begin{array}{rrrr} 1 &  0 &  0 &  0\\ -1 &  0 &  0 &  0\\ 0 &  1 &  0 &  0\\ 0 &  0 &  1 &  0\\ 0 &  0 &  0 &  1\\ \end{array} \right]  \]

The following statements test that $2\beta _1 + \beta _2 = 0$:

proc varmax data=simul2;
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
   cointeg rank=1 h=(1,-2);
run;

Figure 36.58 shows the results of testing $H_0\colon 2\beta _1 +\beta _2 =0$. The input matrix is $H=(1, -2)’$. The adjustment coefficient is reestimated under the restriction, and the test indicates that you cannot reject the null hypothesis.

Figure 36.58: Testing of Linear Restriction (H= Option)

The VARMAX Procedure

Beta Under Restriction
Variable 1
y1 1.00000
y2 -2.00000

Alpha Under Restriction
Variable 1
y1 -0.47404
y2 0.17534

Hypothesis Test

Index  Eigenvalue  Restricted Eigenvalue  DF  Chi-Square  Pr > ChiSq
1      0.4644      0.4616                 1   0.51        0.4738


Test for the Weak Exogeneity and Restrictions of Alpha

Consider a vector error correction model:

\[  \Delta \mb {y} _ t = \balpha \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + AD_ t + \bepsilon _ t  \]

Divide the process $\mb {y} _ t$ into $(\mb {y} _{1t}’,\mb {y} _{2t}’)’$ with dimension $k_1$ and $k_2$ and the $\Sigma $ into

$\displaystyle  \Sigma = \left[ \begin{array}{cc} \Sigma _{11} &  \Sigma _{12} \\ \Sigma _{21} &  \Sigma _{22} \end{array} \right]  $

Similarly, the parameters can be decomposed as follows:

$\displaystyle  \balpha = \left[ \begin{array}{c} \balpha _1 \\ \balpha _2 \end{array} \right] ~ ~  \Phi ^*_ i = \left[ \begin{array}{c} \Phi ^*_{1i} \\ \Phi ^*_{2i} \end{array} \right] ~ ~  A = \left[ \begin{array}{c} A_{1} \\ A_{2} \end{array} \right]  $

Then the VECM(p) form can be rewritten by using the decomposed parameters and processes:

$\displaystyle  \left[ \begin{array}{c} \Delta \mb {y} _{1t} \\ \Delta \mb {y} _{2t} \end{array} \right] = \left[ \begin{array}{c} \balpha _1 \\ \balpha _2 \end{array} \right] \bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \left[ \begin{array}{c} \Phi ^*_{1i} \\ \Phi ^*_{2i} \end{array} \right] \Delta \mb {y} _{t-i} + \left[ \begin{array}{c} A_{1} \\ A_{2} \end{array} \right] D_ t + \left[ \begin{array}{c} \bepsilon _{1t} \\ \bepsilon _{2t} \end{array} \right]  $

The conditional model for $\mb {y} _{1t}$ given $\mb {y} _{2t}$ is

\[  \begin{aligned}
\Delta \mb {y} _{1t} = {} & \omega \Delta \mb {y} _{2t} + (\balpha _1-\omega \balpha _2)\bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1}(\Phi ^{*}_{1i} - \omega \Phi ^{*}_{2i})\Delta \mb {y} _{t-i} \\
& + (A_1 - \omega A_2) D_ t + \bepsilon _{1t} - \omega \bepsilon _{2t}
\end{aligned}  \]

and the marginal model of $\mb {y} _{2t}$ is

\[  \Delta \mb {y} _{2t} =\alpha _2\bbeta ’\mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^{*}_{2i}\Delta \mb {y} _{t-i} + A_2 D_ t + \bepsilon _{2t}  \]

where $\omega =\Sigma _{12}\Sigma _{22}^{-1}$.

The test of weak exogeneity of $\mb {y} _{2t}$ for the parameters $(\alpha _1, \bbeta )$ determines whether $\alpha _2=0$. Weak exogeneity means that there is no information about $\bbeta $ in the marginal model or that the variables $\mb {y} _{2t}$ do not react to a disequilibrium.

Restriction $H_0\colon \balpha =J\psi $

Consider the null hypothesis $H_0\colon \balpha = J\psi $, where $J$ is a $k\times m$ matrix with $r \leq m < k$.

From the previous residual regression equation

$\displaystyle  {R}_{0t} = \balpha \bbeta ’{R}_{1t} + \hat{\bepsilon }_ t = J\psi \bbeta ’{R}_{1t} + \hat{\bepsilon }_ t  $

you can obtain

\[  \begin{aligned}
\bar{J}’{R}_{0t} &= \psi \bbeta ’{R}_{1t} +\bar{J}’\hat{\bepsilon }_ t \\
J_{\bot }’{R}_{0t} &= J_{\bot }’\hat{\bepsilon }_ t
\end{aligned}  \]

where $\bar{J}=J(J’J)^{-1}$ and $J_{\bot }$ is orthogonal to $J$ such that $J_{\bot }’J=0$.

Define

\[  \Sigma _{JJ_{\bot }} = \bar{J}’\Sigma J_{\bot } \mr {~ ~ and~ ~ } \Sigma _{J_{\bot }J_{\bot }} = J_{\bot }’\Sigma J_{\bot }  \]

and let $\omega =\Sigma _{JJ_{\bot }}\Sigma _{J_{\bot }J_{\bot }}^{-1}$. Then $\bar{J}’{R}_{0t}$ can be written as

$\displaystyle  \bar{J}’{R}_{0t} = \psi \bbeta ’{R}_{1t} + \omega J_{\bot }’{R}_{0t} + \bar{J}’\hat{\bepsilon }_ t - \omega J_{\bot }’ \hat{\bepsilon }_ t  $

Using the marginal distribution of $J_{\bot }’{R}_{0t}$ and the conditional distribution of $\bar{J}’{R}_{0t}$, the new residuals are computed as

\[  \begin{aligned}
\tilde{R}_{Jt} &= \bar{J}’{R}_{0t} - S_{JJ_{\bot }} S_{J_{\bot }J_{\bot }}^{-1}J_{\bot }’{R}_{0t} \\
\tilde{R}_{1t} &= {R}_{1t} - S_{1J_{\bot }} S_{J_{\bot }J_{\bot }}^{-1}J_{\bot }’{R}_{0t}
\end{aligned}  \]

where

\[  S_{JJ_{\bot }} = \bar{J}’S_{00}J_{\bot }, ~ ~  S_{J_{\bot }J_{\bot }} = J_{\bot }’S_{00}J_{\bot }, ~ ~ \mr {and ~ ~ } S_{J_{\bot }1} = J_{\bot }’S_{01}  \]

In terms of $\tilde{R}_{Jt}$ and $\tilde{R}_{1t}$, the MLE of $\bbeta $ is computed by using the reduced rank regression. Let

\[  S_{ij\mb {.} J_{\bot }}=\frac{1}{T}\sum _{t=1}^{T}\tilde{{R}}_{it} \tilde{{R}}_{jt}’, \mr {~ ~ for~ ~ } i,j=1,J  \]

Under the null hypothesis $H_0\colon \balpha =J\psi $, the MLE $\tilde{\bbeta }$ is computed by solving the equation

$\displaystyle  |\rho S_{11\mb {.} J_{\bot }} - S_{1J\mb {.} J_{\bot }}S_{JJ\mb {.} J_{\bot }}^{-1} S_{J1\mb {.} J_{\bot }}| = 0  $

Then $\tilde{\bbeta }=(v_1,\ldots , v_ r)$, where the eigenvectors correspond to the $r$ largest eigenvalues and are normalized such that $ \tilde{\bbeta }’ S_{11\mb {.} J_{\bot }} \tilde{\bbeta } = I_ r $; $\tilde{\balpha }=J S_{J1\mb {.} J_{\bot }} \tilde{\bbeta }$. The likelihood ratio test for $H_0\colon \balpha =J\psi $ is

\[  T\sum _{i=1}^ r\log \{ (1-\rho _ i)/(1-\lambda _ i)\}  \stackrel{d}{\rightarrow } \chi ^2_{r(k-m)}  \]

See Theorem 6.1 in Johansen and Juselius (1990) for more details.

The test of weak exogeneity of $\mb {y} _{2t}$ is a special case of the test $\balpha =J\psi $, considering $J=(I_{k_1},0)’$. Consider the previous example with four variables ( $m_ t, y_ t, i_ t^ b, i_ t^ d$ ). If $r=1$, you formulate the weak exogeneity of ($y_ t,i_ t^ b,i_ t^ d$) for $m_ t$ as $J=[1, 0, 0, 0]’$ and the weak exogeneity of $i_ t^ d$ for ($m_ t, y_ t, i_ t^ b$) as $J = [I_3,0]’$.

The following statements test the weak exogeneity of other variables, assuming $r=1$:

proc varmax data=simul2;
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
   cointeg rank=1 exogeneity;
run;
proc varmax data=simul2;
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
   cointeg rank=1 j=exogeneity;
run;

Figure 36.59 shows that the null hypothesis of weak exogeneity is rejected for each variable.

Figure 36.59: Testing of Weak Exogeneity (EXOGENEITY Option)

The VARMAX Procedure

Testing Weak Exogeneity of Each Variable

Variable  DF  Chi-Square  Pr > ChiSq
y1        1   53.46       <.0001
y2        1   8.76        0.0031


Forecasting of the VECM

Consider the cointegrated moving-average representation of the differenced process of $\mb {y}_ t$

$\displaystyle  \Delta \mb {y} _ t = \bdelta + \Psi (B)\bepsilon _ t  $

Assume that $\mb {y} _0=0$. The linear process $\mb {y} _ t$ can be written as

$\displaystyle  \mb {y}_ t = \bdelta t + \sum _{i=1}^ t\sum _{j=0}^{t-i}\Psi _ j\bepsilon _ i  $

Therefore, for any $l > 0$,

$\displaystyle  \mb {y} _{t+l} = \bdelta (t+l) + \sum _{i=1}^ t\sum _{j=0}^{t+l-i}\Psi _ j\bepsilon _ i + \sum _{i=1}^ l\sum _{j=0}^{l-i}\Psi _ j\bepsilon _{t+i}  $

The $l$-step-ahead forecast is derived from the preceding equation:

$\displaystyle  \mb {y} _{t+l|t} = \bdelta (t+l) + \sum _{i=1}^ t\sum _{j=0}^{t+l-i}\Psi _ j\bepsilon _ i  $

Note that

\[  \lim _{l\rightarrow \infty } \bbeta ’\mb {y} _{t+l|t} = 0  \]

since $\lim _{l\rightarrow \infty }\sum _{j=0}^{t+l-i}\Psi _ j = \Psi (1)$ and $\bbeta ’ \Psi (1) = 0$. The long-run forecast of the cointegrated system shows that the cointegrated relationship holds, although there might exist some deviations from the equilibrium status in the short run. The covariance matrix of the prediction error $\mb {e} _{t+l|t}=\mb {y} _{t+l}-\mb {y} _{t+l|t}$ is

\[  \Sigma (l) = \sum _{i=1}^{l}\left[\left(\sum _{j=0}^{l-i}\Psi _ j\right)\Sigma \left(\sum _{j=0}^{l-i}\Psi _ j’\right)\right]  \]
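The covariance computation can be sketched in Python with hypothetical moving-average coefficient matrices $\Psi _ j$ and innovation covariance $\Sigma $:

```python
import numpy as np

# Hypothetical MA coefficients Psi_0, Psi_1, Psi_2 and innovation covariance
Psi = [np.eye(2),
       np.array([[0.5, 0.1], [0.0, 0.4]]),
       np.array([[0.25, 0.1], [0.0, 0.16]])]
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

def forecast_cov(l):
    # Sigma(l) = sum_{i=1}^{l} (sum_{j=0}^{l-i} Psi_j) Sigma (sum_{j=0}^{l-i} Psi_j)'
    total = np.zeros_like(Sigma)
    for i in range(1, l + 1):
        C = sum(Psi[j] for j in range(l - i + 1))
        total += C @ Sigma @ C.T
    return total

# One-step-ahead: Sigma(1) = Psi_0 Sigma Psi_0' = Sigma
assert np.allclose(forecast_cov(1), Sigma)
```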

When the linear process is represented as a VECM($p$) model, you can obtain

$\displaystyle  \Delta \mb {y} _ t = \Pi \mb {y} _{t-1} + \sum _{j=1}^{p-1} \Phi ^{*}_ j\Delta \mb {y} _{t-j} + \bdelta + \bepsilon _ t  $

The transition equation is defined as

$\displaystyle  \mb {z} _{t} = F \mb {z} _{t-1} + \mb {e} _{t}  $

where $\mb {z} _ t=(\mb {y} _{t-1}’,\Delta \mb {y} _{t}’, \Delta \mb {y} _{t-1}’, \cdots ,\Delta \mb {y} _{t-p+2}’)’$ is a state vector and the transition matrix is

$\displaystyle  F = \left[ \begin{array}{ccccc} I_ k &  I_ k &  0 &  \cdots &  0 \\ \Pi & (\Pi +\Phi ^*_1)& \Phi ^*_2 &  \cdots & \Phi ^*_{p-1} \\ 0 &  I_ k &  0 &  \cdots &  0 \\ \vdots &  \vdots &  \vdots &  \ddots &  \vdots \\ 0 &  0 &  \cdots &  I_ k &  0 \\ \end{array} \right]  $

where 0 is a $k \times k$ zero matrix. The observation equation can be written

\[  \mb {y} _ t = \bdelta t + H \mb {z} _ t  \]

where $H=[I_ k,I_ k,0,\ldots ,0]$.

The $l$-step-ahead forecast is computed as

$\displaystyle  \mb {y} _{t+l|t} = \bdelta (t+l) + H F^ l \mb {z} _ t  $
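The state-space forecast can be sketched in Python for $p=2$, with hypothetical $\Pi $ and $\Phi ^*_1$ values and no deterministic trend ($\bdelta =0$); the check iterates the VECM recursion directly:

```python
import numpy as np

# Hypothetical VECM(2) parameters (k = 2, so the state is z_t = (y_{t-1}', Dy_t')')
k = 2
Pi = np.array([[-0.3, 0.6], [0.1, -0.2]])
Phi1_star = np.array([[0.4, 0.0], [0.1, 0.3]])

# Transition matrix F and observation matrix H for p = 2
F = np.block([[np.eye(k), np.eye(k)],
              [Pi, Pi + Phi1_star]])
H = np.hstack([np.eye(k), np.eye(k)])

y_lag = np.array([1.0, 0.5])     # y_{t-1}
dy_t = np.array([0.2, -0.1])     # Delta y_t
z = np.concatenate([y_lag, dy_t])

# l-step-ahead forecast y_{t+l|t} = H F^l z_t (delta = 0 here)
l = 3
y_forecast = H @ np.linalg.matrix_power(F, l) @ z

# Check against iterating Delta y_{t+1} = Pi y_t + Phi*_1 Delta y_t directly
y_cur, dy_cur = y_lag + dy_t, dy_t
for _ in range(l):
    dy_next = Pi @ y_cur + Phi1_star @ dy_cur
    y_cur, dy_cur = y_cur + dy_next, dy_next
assert np.allclose(y_forecast, y_cur)
```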

Cointegration with Exogenous Variables

The error correction model with exogenous variables can be written as follows:

$\displaystyle  \Delta \mb {y} _{t} = \balpha \bbeta ’ \mb {y} _{t-1} + \sum _{i=1}^{p-1} \Phi ^*_ i \Delta \mb {y} _{t-i} + A D_ t + \sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} + \bepsilon _ t  $

The following statements demonstrate how to fit VECMX($p,s$), where $p=2$ and $s=1$ from the P=2 and XLAG=1 options:

   proc varmax data=simul3;
      model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1);
   run;

The following statements demonstrate how to fit a BVECMX(2,1) model:

   proc varmax data=simul3;
      model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1)
         prior=(lambda=0.9 theta=0.1);
   run;