The VARMAX Procedure

Cointegration

This section briefly introduces the concepts of cointegration (Johansen 1995b).

Definition 1.

(Engle and Granger 1987): If a series $y_ t$ with no deterministic components can be represented by a stationary and invertible ARMA process after differencing $d$ times, the series is integrated of order $d$, that is, $y_ t \sim I(d)$.

Definition 2.

(Engle and Granger 1987): If all elements of the vector $\mb {y} _ t$ are $I(d)$ and there exists a cointegrating vector $\bbeta \neq 0$ such that $\bbeta '\mb {y} _ t \sim I(d-b)$ for some $b > 0$, the vector process is said to be cointegrated $CI(d,b)$.

A simple example of a cointegrated process is the following bivariate system:

\[  \begin{aligned} y_{1t} &= \gamma y_{2t} + \epsilon _{1t} \\ y_{2t} &= y_{2,t-1} + \epsilon _{2t} \end{aligned}  \]

where $\epsilon _{1t}$ and $\epsilon _{2t}$ are uncorrelated white noise processes. The second equation says that $y_{2t}$ is a random walk: $\Delta y_{2t} = \epsilon _{2t}$, where $\Delta \equiv 1-B$ and $B$ is the backshift operator. Differencing the first equation results in

\[  \Delta y_{1t} = \gamma \Delta y_{2t} +\Delta \epsilon _{1t} = \gamma \epsilon _{2t} +\epsilon _{1t}-\epsilon _{1,t-1}  \]

Thus, provided $\gamma \neq 0$, both $y_{1t}$ and $y_{2t}$ are $I(1)$ processes, but the linear combination $y_{1t} - \gamma y_{2t} = \epsilon _{1t}$ is stationary. Hence $\mb {y} _ t =(y_{1t}, y_{2t})'$ is cointegrated with cointegrating vector $\bbeta = (1, -\gamma )'$.

In general, if the vector process $\mb {y} _ t$ has $k$ components, then there can be more than one cointegrating vector. It is assumed that there are $r$ linearly independent cointegrating vectors with $r<k$, which form the columns of the $k\times r$ matrix $\bbeta $. The rank of the matrix $\bbeta $ is $r$, which is called the cointegration rank of $\mb {y} _ t$.
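
As a small illustration of a cointegration rank greater than one, consider a hypothetical trivariate system in which two series follow the same random walk. The coefficients $\gamma _1$ and $\gamma _2$ (both assumed nonzero) and the third noise term $\epsilon _{3t}$ are introduced only for this sketch:

\[  y_{1t} = \gamma _1 y_{3t} + \epsilon _{1t}, \quad y_{2t} = \gamma _2 y_{3t} + \epsilon _{2t}, \quad y_{3t} = y_{3,t-1} + \epsilon _{3t}  \]

Here $k=3$, and the two linearly independent cointegrating vectors can be collected as

\[  \bbeta = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \\ -\gamma _1 & -\gamma _2 \end{array} \right], \quad \bbeta '\mb {y} _ t = (\epsilon _{1t}, \epsilon _{2t})'  \]

so that $\bbeta '\mb {y} _ t$ is stationary and the cointegration rank is $r=2<k$.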

Common Trends

This section briefly discusses the implication of cointegration for the moving-average representation. If $\mb {y} _ t$ is cointegrated $CI(1,1)$, then $\Delta \mb {y} _ t$ has the Wold representation

$\displaystyle  \Delta \mb {y} _ t = \bdelta + \Psi (B)\bepsilon _ t  $

where $\bepsilon _ t$ is $iid (0,\Sigma )$, $\Psi (B)=\sum _{j=0}^{\infty } \Psi _ jB^ j$ with $\Psi _0=I_ k$, and $\sum _{j=0}^{\infty }j|\Psi _ j| < \infty $.

Assume that $\bepsilon _ t = 0$ if $t\leq 0$ and $\mb {y} _0$ is a nonrandom initial value. Then the difference equation implies that

$\displaystyle  \mb {y} _ t = \mb {y} _0 + \bdelta t + \Psi (1)\sum _{i=0}^{t}\bepsilon _ i + \Psi ^{*}(B)\bepsilon _ t  $

where $\Psi ^{*}(B) = (1-B)^{-1}(\Psi (B)-\Psi (1))$ and $\Psi ^{*}(B)$ is absolutely summable.

Assume that the rank of $\Psi (1)$ is $m=k-r$. When the process $\mb {y} _ t$ is cointegrated, there is a cointegrating $k\times r$ matrix $\bbeta $ such that $\bbeta ' \mb {y} _ t$ is stationary.

Premultiplying $\mb {y} _ t$ by $\bbeta '$ results in

\[  \bbeta ' \mb {y} _ t = \bbeta '\mb {y} _0 + \bbeta ' \Psi ^{*}(B)\bepsilon _ t  \]

because $\bbeta '\Psi (1)=0$ and $\bbeta '\bdelta =0$.
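
As a sketch of how these conditions work out, take the simple bivariate system from the Cointegration section, for which $\bdelta = 0$. So that $\Psi _0 = I_2$ holds, the innovations are rewritten here as $\mb {u} _ t = (\epsilon _{1t} + \gamma \epsilon _{2t}, \epsilon _{2t})'$; the symbol $\mb {u} _ t$ is an assumption of this illustration. Then

\[  \Delta \mb {y} _ t = \mb {u} _ t + \Psi _1 \mb {u} _{t-1}, \quad \Psi _1 = \left[ \begin{array}{cc} -1 & \gamma \\ 0 & 0 \end{array} \right], \quad \Psi (1) = I_2 + \Psi _1 = \left[ \begin{array}{cc} 0 & \gamma \\ 0 & 1 \end{array} \right]  \]

and with $\bbeta = (1, -\gamma )'$,

\[  \bbeta '\Psi (1) = (1, -\gamma ) \left[ \begin{array}{cc} 0 & \gamma \\ 0 & 1 \end{array} \right] = (0, \gamma - \gamma ) = 0  \]

In this example $\Psi (1)$ has rank $m = k-r = 1$, and the accumulated-shock term drops out of $\bbeta '\mb {y} _ t$.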

Stock and Watson (1988) showed that the cointegrated process $\mb {y} _ t$ has a common trends representation derived from the moving-average representation. Since the rank of $\Psi (1)$ is $m=k-r$, there is a $k\times r$ matrix $H_1$ with rank $r$ such that $\Psi (1)H_1=0$. Let $H_2$ be a $k\times m$ matrix with rank $m$ such that $H_2'H_1=0$; then $A=\Psi (1)H_2$ has rank $m$. The matrix $H=(H_1,H_2)$ has rank $k$. By construction of $H$,

$\displaystyle  \Psi (1)H = [0, A] = A S_ m  $

where $S_ m=(0_{m\times r},I_ m)$. Since $\bbeta '\Psi (1)=0$ and $\bbeta '\bdelta =0$, $\bdelta $ lies in the column space of $\Psi (1)$ and can be written as

$\displaystyle  \bdelta = \Psi (1)\tilde{\bdelta }  $

where $\tilde{\bdelta }$ is a $k$-dimensional vector. The common trends representation is written as

\[  \begin{aligned} \mb {y} _ t &= \mb {y} _0 + \Psi (1)[\tilde{\bdelta } t + \sum _{i=0}^{t}\bepsilon _ i] + \Psi ^{*}(B)\bepsilon _ t \\ &= \mb {y} _0 + \Psi (1)H[H^{-1}\tilde{\bdelta } t + H^{-1}\sum _{i=0}^{t}\bepsilon _ i] + \mb {a} _ t \\ &= \mb {y} _0 + A\btau _ t + \mb {a} _ t \end{aligned}  \]

and

\[  \btau _ t = \pi + \btau _{t-1} + \mb {v} _ t  \]

where $\mb {a} _ t = \Psi ^{*}(B)\bepsilon _ t$, $\pi =S_ mH^{-1}\tilde{\bdelta }$, $\btau _ t= S_ m[H^{-1}\tilde{\bdelta } t + H^{-1}\sum _{i=0}^{t}\bepsilon _ i]$, and $\mb {v} _ t=S_ mH^{-1}\bepsilon _ t$.

Stock and Watson showed that the common trends representation expresses $\mb {y} _ t$ as a linear combination of $m$ random walks ($\btau _ t$) with drift $\pi $ plus $I(0)$ components ($\mb {a} _ t$).
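
The representation can be traced through for the simple bivariate system $y_{1t} = \gamma y_{2t} + \epsilon _{1t}$, $y_{2t} = y_{2,t-1} + \epsilon _{2t}$. In this sketch the innovations are written as $\mb {u} _ t = (\epsilon _{1t} + \gamma \epsilon _{2t}, \epsilon _{2t})'$ (notation assumed here so that $\Psi _0 = I_2$), which gives $\bdelta = 0$ and a matrix $\Psi (1)$ with rows $(0, \gamma )$ and $(0, 1)$. One admissible choice is $H_1 = (1, 0)'$ and $H_2 = (0, 1)'$, so that $H = I_2$ and $S_ m = (0, 1)$. Then

\[  A = \Psi (1)H_2 = (\gamma , 1)', \quad \btau _ t = S_ m \sum _{i=0}^{t}\mb {u} _ i = \sum _{i=0}^{t}\epsilon _{2i}, \quad \mb {a} _ t = (\epsilon _{1t}, 0)'  \]

and the two components load on the single random walk $\btau _ t$:

\[  y_{1t} = y_{1,0} + \gamma \btau _ t + \epsilon _{1t}, \quad y_{2t} = y_{2,0} + \btau _ t  \]

This agrees with the original system, with $m = 1$ common trend and no drift.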

Test for the Common Trends

Stock and Watson (1988) proposed statistics for common trends testing. The null hypothesis is that the $k$-dimensional time series $\mb {y} _{t}$ has $m$ common stochastic trends, where $m\leq k$, and the alternative is that it has $s$ common trends, where $s < m$. The test of $m$ versus $s$ common stochastic trends is based on the first-order serial correlation matrix of $\mb {y} _ t$. Let $\bbeta _{\bot }$ be a $k\times m$ matrix orthogonal to the cointegrating matrix such that $\bbeta _{\bot }'\bbeta = 0$ and $\bbeta _{\bot }'\bbeta _{\bot }=I_ m$. Let $\mb {z} _{t}=\bbeta '\mb {y} _ t$ and $\mb {w} _{t}=\bbeta _{\bot }'\mb {y} _ t$. Then

\[  \mb {w} _{t} = \bbeta _{\bot }'\mb {y} _0 + \bbeta _{\bot }'\bdelta t + \bbeta _{\bot }' \Psi (1)\sum _{i=0}^{t}\bepsilon _ i + \bbeta _{\bot }' \Psi ^{*}(B)\bepsilon _ t  \]

Combining the expressions for $\mb {z} _ t$ and $\mb {w} _ t$ gives

\[  \begin{aligned} \left[ \begin{array}{c} \mb {z} _ t \\ \mb {w} _ t \end{array} \right] &= \left[ \begin{array}{c} \bbeta '\mb {y} _0 \\ \bbeta _{\bot }'\mb {y} _0 \end{array} \right] + \left[ \begin{array}{c} 0 \\ \bbeta _{\bot }'\bdelta \end{array} \right] t + \left[ \begin{array}{c} 0 \\ \bbeta _{\bot }'\Psi (1) \end{array} \right] \sum _{i=0}^{t}\bepsilon _ i \\ &\quad + \left[ \begin{array}{c} \bbeta '\Psi ^{*}(B) \\ \bbeta _{\bot }'\Psi ^{*}(B) \end{array} \right] \bepsilon _ t \end{aligned}  \]

The Stock-Watson common trends test is performed based on the component $\mb {w} _ t$ by testing whether $\bbeta _{\bot }'\Psi (1)$ has rank $m$ against rank $s$.

The following statements simulate 100 observations from a bivariate VAR(2) process and perform the Stock-Watson test for common trends:

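/* Simulate the bivariate series y1, y2 */
proc iml;
   sig = 100*i(2);                               /* innovation covariance matrix */
   phi = {-0.2 0.1, 0.5 0.2, 0.8 0.7, -0.4 0.6}; /* stacked AR coefficient matrices */
   call varmasim(y,phi) sigma=sig n=100 initial=0
                        seed=45876;
   cn = {'y1' 'y2'};
   create simul2 from y[colname=cn];
   append from y;
quit;

/* Add a yearly date variable that starts in 1900 */
data simul2;
   set simul2;
   date = intnx( 'year', '01jan1900'd, _n_-1 );
   format date year4. ;
run;

/* Fit a VAR(2) model and request the Stock-Watson common trends test */
proc varmax data=simul2;
   model y1 y2 / p=2 cointtest=(sw);
run;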

In Figure 36.51, the first column is the null hypothesis that $\mb {y} _ t$ has $m\leq k$ common trends; the second column is the alternative hypothesis that $\mb {y} _ t$ has $s < m$ common trends; the third column contains the eigenvalues used for the test statistics; the fourth column contains the test statistics using AR($p$) filtering of the data; the fifth column contains the 5% critical values; and the last column shows the lag $p$. The table shows the output for the case $p=2$.

Figure 36.51: Common Trends Test (COINTTEST=(SW) Option)

The VARMAX Procedure

Common Trend Test

H0: Rank=m   H1: Rank=s   Eigenvalue    Filter   5% Critical Value   Lag
         1            0     1.000906      0.09              -14.10     2
         2            0     0.996763     -0.32               -8.80
                      1     0.648908    -35.11              -23.00


The test statistic for testing 2 versus 1 common trends (-35.11) is more negative than the 5% critical value (-23.00). Therefore, the test rejects the null hypothesis of two common trends in favor of the alternative, which means that the series has a single common trend.
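
The same test request can also be tried on the simple bivariate system from the Cointegration section, $y_{1t} = \gamma y_{2t} + \epsilon _{1t}$ with $y_{2t}$ a random walk. The following statements are a minimal sketch, not a reference implementation: the data set name coint_demo, the value $\gamma = 2$, the sample size of 200, the standard normal noise, and the seed are all arbitrary choices made for illustration.

```sas
/* Sketch: simulate y1 = gamma*y2 + e1, where y2 is a random walk.   */
/* gamma=2, n=200, and the seed are illustrative assumptions only.   */
data coint_demo;
   call streaminit(12345);
   gamma = 2;
   y2 = 0;                       /* nonrandom initial value */
   do t = 1 to 200;
      e1 = rand('normal');       /* white noise in the first equation */
      e2 = rand('normal');       /* white noise in the second equation */
      y2 = y2 + e2;              /* random walk */
      y1 = gamma*y2 + e1;        /* shares the stochastic trend of y2 */
      output;
   end;
   keep t y1 y2;
run;

/* Request the Stock-Watson test; p=2 is kept only for comparability */
proc varmax data=coint_demo;
   model y1 y2 / p=2 cointtest=(sw);
run;
```

Because the system is built from one random walk, the test would be expected to favor a single common trend, although the result for any particular simulated sample can differ.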