The VARMAX Procedure

Bayesian VAR and VARX Modeling

Consider the VAR($p$) model

$\displaystyle  \mb {y} _ t = \bdelta + \Phi _1\mb {y} _{t-1} + \cdots + \Phi _ p\mb {y} _{t-p} + \bepsilon _ t  $

or

$\displaystyle  \mb {y} = (X \otimes I_ k)\bbeta + \mb {e}  $

When the parameter vector $\bbeta $ has a prior multivariate normal distribution with known mean $\bbeta ^{*}$ and covariance matrix $V_{\beta }$, the prior density is written as

$\displaystyle  f(\bbeta ) = (\frac{1}{2\pi })^{k^2p/2}|V_{\beta }|^{-1/2} \exp [-\frac{1}{2}(\bbeta -\bbeta ^{*})' V_{\beta }^{-1}(\bbeta -\bbeta ^{*})]  $

The likelihood function for the Gaussian process becomes

$\displaystyle  \ell (\bbeta |\mb {y} ) = (\frac{1}{2\pi })^{kT/2}|I_ T\otimes \Sigma |^{-1/2} \exp [-\frac{1}{2}(\mb {y} -(X\otimes I_ k)\bbeta )' (I_ T\otimes \Sigma ^{-1})(\mb {y} -(X\otimes I_ k)\bbeta )]  $

Therefore, the posterior density is derived as

$\displaystyle  f(\bbeta |\mb {y} ) \propto \exp [-\frac{1}{2}(\bbeta -\bar{\bbeta })' \bar{\Sigma }_{\beta }^{-1}(\bbeta -\bar{\bbeta })]  $

where the posterior mean is

$\displaystyle  \bar{\bbeta } = [V_{\beta }^{-1}+(X'X\otimes \Sigma ^{-1} )]^{-1} [V_{\beta }^{-1}\bbeta ^{*}+( X'\otimes \Sigma ^{-1})\mb {y} ]  $

and the posterior covariance matrix is

$\displaystyle  \bar{\Sigma }_{\beta } = [V_{\beta }^{-1} +(X'X\otimes \Sigma ^{-1})]^{-1}  $
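The shrinkage implied by these formulas is easiest to see in the simplest special case. The following Python sketch is an illustration only, not the computation that PROC VARMAX performs: it assumes $k=1$, a single lagged regressor with no intercept, and a known error variance, so that $X'X$, $V_{\beta }$, and $\Sigma $ are all scalars. The function name `posterior_scalar` and the data values are hypothetical.

```python
def posterior_scalar(y, x, prior_mean, prior_var, sigma2):
    """Posterior mean and variance of one coefficient (k = 1, one regressor).

    Scalar version of
        mean = [V^-1 + X'X / sigma2]^-1 [V^-1 b* + X'y / sigma2]
        var  = [V^-1 + X'X / sigma2]^-1
    """
    xtx = sum(v * v for v in x)              # X'X
    xty = sum(a * b for a, b in zip(x, y))   # X'y
    precision = 1.0 / prior_var + xtx / sigma2
    mean = (prior_mean / prior_var + xty / sigma2) / precision
    return mean, 1.0 / precision


# Hypothetical series; regress y_t on y_{t-1}.
series = [1.0, 0.8, 0.9, 0.5, 0.6, 0.3]
y, x = series[1:], series[:-1]

ols = sum(a * b for a, b in zip(x, y)) / sum(v * v for v in x)
tight_mean, tight_var = posterior_scalar(y, x, 0.0, 1e-6, 1.0)  # strong prior
loose_mean, loose_var = posterior_scalar(y, x, 0.0, 1e6, 1.0)   # weak prior
mid_mean, mid_var = posterior_scalar(y, x, 0.0, 1.0, 1.0)       # in between
```

With a very small prior variance the posterior mean stays near the prior mean, and with a very large prior variance it approaches the least squares estimate; intermediate prior variances give a value between the two.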

In practice, the prior mean $\bbeta ^{*}$ and the prior variance $V_{\beta }$ need to be specified. If all the parameters are to shrink toward zero, specify a zero prior mean. According to Litterman (1986), the prior variance can be given by

$\displaystyle  v_{ij}(l) = \left\{  \begin{array}{ll} ({\lambda }/{l})^2 &  \mbox{if $i=j$} \\ ({\lambda \theta \sigma _{ii}}/{l\sigma _{jj}})^2 &  \mbox{if $i\neq j$} \end{array} \right.  $

where $v_{ij}(l)$ is the prior variance of the $(i,j)$th element of $\Phi _ l$, $\lambda $ is the prior standard deviation of the diagonal elements of $\Phi _1$ (at lag $l$ this standard deviation is $\lambda /l$), $\theta $ is a constant in the interval $(0,1)$, and $\sigma ^2_{ii}$ is the $i$th diagonal element of $\Sigma $. The deterministic terms have a diffuse prior variance. In practice, you replace each $\sigma ^2_{ii}$ with the $i$th diagonal element of the ML estimate of $\Sigma $ from the unconstrained model.

For example, for a bivariate BVAR(2) model,

$\displaystyle  y_{1t} = 0 + \phi _{1,11}y_{1,t-1} + \phi _{1,12}y_{2,t-1} + \phi _{2,11}y_{1,t-2} + \phi _{2,12}y_{2,t-2} + \epsilon _{1t}  $

$\displaystyle  y_{2t} = 0 + \phi _{1,21}y_{1,t-1} + \phi _{1,22}y_{2,t-1} + \phi _{2,21}y_{1,t-2} + \phi _{2,22}y_{2,t-2} + \epsilon _{2t}  $

with the prior covariance matrix

$\displaystyle  V_{\beta } = \mr {Diag} ( \infty , \lambda ^2, (\lambda \theta \sigma _1/\sigma _2)^2, (\lambda /2)^2, (\lambda \theta \sigma _1/2\sigma _2)^2, \infty , (\lambda \theta \sigma _2/\sigma _1)^2, \lambda ^2, (\lambda \theta \sigma _2/2\sigma _1)^2, (\lambda /2)^2 )  $
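As a numerical check of the Litterman variances, the following Python sketch evaluates $v_{ij}(l)$ and assembles the diagonal of the prior covariance matrix in the equation-by-equation order used for the bivariate BVAR(2) example. It is a sketch under stated assumptions: the function names are hypothetical, the values $\lambda =0.9$ and $\theta =0.1$ are chosen only for illustration, and the standard deviations $\sigma _1=1$ and $\sigma _2=2$ are made up.

```python
import math


def litterman_variance(i, j, lag, lam, theta, sigma):
    """Prior variance of the (i, j)th element of Phi_lag (0-based i, j)."""
    if i == j:
        return (lam / lag) ** 2
    return (lam * theta * sigma[i] / (lag * sigma[j])) ** 2


def prior_variance_diagonal(k, p, lam, theta, sigma):
    """Diagonal of the prior covariance, one equation at a time.

    Within each equation the order is: intercept, then lag 1 (all
    variables), then lag 2, and so on. The intercept gets an infinite
    (diffuse) prior variance.
    """
    diag = []
    for i in range(k):
        diag.append(math.inf)
        for lag in range(1, p + 1):
            for j in range(k):
                diag.append(litterman_variance(i, j, lag, lam, theta, sigma))
    return diag


# Assumed illustrative values: lambda = 0.9, theta = 0.1, sigma = (1, 2).
diag = prior_variance_diagonal(k=2, p=2, lam=0.9, theta=0.1, sigma=[1.0, 2.0])
```

The own-lag variances depend only on $\lambda $ and the lag, whereas the cross-variable variances are further scaled by $\theta $ and by the ratio of the two standard deviations, so coefficients on other variables are shrunk more tightly toward the prior mean.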

For the Bayesian estimation of integrated systems, the prior mean sets the coefficient on the first lag of each variable in its own equation to one and all other coefficients to zero. For example, for a bivariate BVAR(2) model,

$\displaystyle  y_{1t} = 0 + 1 ~ y_{1,t-1} + 0 ~ y_{2,t-1} + 0 ~ y_{1,t-2} + 0 ~ y_{2,t-2} + \epsilon _{1t}  $

$\displaystyle  y_{2t} = 0 + 0 ~ y_{1,t-1} + 1 ~ y_{2,t-1} + 0 ~ y_{1,t-2} + 0 ~ y_{2,t-2} + \epsilon _{2t}  $
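The same prior mean can be generated for any number of variables and lags. The following Python sketch builds it in equation-by-equation order (intercept first, then the lags), which is an assumed ordering chosen to match the display above; the function name `integrated_prior_mean` is hypothetical.

```python
def integrated_prior_mean(k, p):
    """Random-walk prior mean for a k-variable VAR(p), equation by equation.

    Each equation contributes: intercept, lag-1 coefficients on all k
    variables, lag-2 coefficients, and so on. Only the coefficient on a
    variable's own first lag is one; everything else is zero.
    """
    mean = []
    for i in range(k):
        mean.append(0.0)  # intercept
        for lag in range(1, p + 1):
            for j in range(k):
                mean.append(1.0 if (lag == 1 and i == j) else 0.0)
    return mean


prior_mean = integrated_prior_mean(k=2, p=2)
```

For the bivariate BVAR(2) case this produces ten entries, with ones in the positions of $\phi _{1,11}$ and $\phi _{1,22}$.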

Forecasting of BVAR Modeling

The mean squared error (MSE) is used to measure forecast accuracy (Litterman 1986). The MSE of the $s$-step-ahead forecast is

\[  MSE_ s = {\frac{1}{J-s+1} } \sum _{j=1}^{J-s+1} (A_{t_ j}-F^ s_{t_ j})^2  \]

where $J$ is the number specified by the NREP= option, $t_ j$ is the time index of the observation to be forecast in repetition $j$, $A_{t_ j}$ is the actual value at time $t_ j$, and $F^ s_{t_ j}$ is the forecast made $s$ periods earlier.
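The averaging in this formula can be sketched in a few lines of Python. The sketch assumes that the actual values and the matching $s$-step-ahead forecasts have already been collected, one pair per repetition; the function name `forecast_mse` and the numbers are hypothetical and are not output from the procedure.

```python
def forecast_mse(actuals, forecasts, n_rep, s):
    """MSE of s-step-ahead forecasts over n_rep - s + 1 repetitions.

    actuals[j] is the observed value at time t_j and forecasts[j] is the
    forecast of that value made s periods earlier.
    """
    n_terms = n_rep - s + 1
    if n_terms < 1:
        raise ValueError("need n_rep >= s")
    if len(actuals) != n_terms or len(forecasts) != n_terms:
        raise ValueError("expected n_rep - s + 1 actual/forecast pairs")
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / n_terms


# Hypothetical values: J = 4 repetitions, 2-step-ahead forecasts, 3 pairs.
mse_2 = forecast_mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], n_rep=4, s=2)
```

Longer horizons leave fewer actual/forecast pairs for a fixed $J$, which is why the divisor is $J-s+1$ rather than $J$.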

Bayesian VARX Modeling

The Bayesian vector autoregressive model with exogenous variables is called the BVARX($p$,$s$) model. The form of the BVARX($p$,$s$) model can be written as

$\displaystyle  \mb {y} _ t = \bdelta + \sum _{i=1}^{p} \Phi _ i\mb {y} _{t-i} + \sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} + \bepsilon _ t  $

The parameter estimates can be obtained by writing the model in the general form of the multivariate linear model,

$\displaystyle  \mb {y} = (X \otimes I_ k)\bbeta + \mb {e}  $

The prior means for the AR coefficients are the same as those specified in BVAR($p$). The prior means for the exogenous coefficients are set to zero.
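To make the multivariate linear form concrete, the following Python sketch assembles one row of $X$ for a BVARX($p$,$s$) model from the lagged dependent values and the current and lagged exogenous values. The column order (constant, then lags of $\mb {y}$, then $\mb {x} _ t$ through $\mb {x} _{t-s}$) is an assumption made for illustration, and the function name `bvarx_regressor_row` and the data are hypothetical.

```python
def bvarx_regressor_row(y, x, t, p, s):
    """Row of X for time t: [1, y_{t-1}, ..., y_{t-p}, x_t, ..., x_{t-s}].

    y and x are lists of per-period vectors (lists), indexed from 0.
    """
    if t < max(p, s) or t >= len(y) or t >= len(x):
        raise IndexError("time index lacks the required lags")
    row = [1.0]                      # constant term
    for i in range(1, p + 1):
        row.extend(y[t - i])         # lagged dependent variables
    for i in range(0, s + 1):
        row.extend(x[t - i])         # current and lagged exogenous variables
    return row


# Two dependent variables, one exogenous variable, p = 1 and s = 1.
y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [[10.0], [20.0], [30.0]]
row = bvarx_regressor_row(y, x, t=2, p=1, s=1)
```

Each row has $1 + kp + r(s+1)$ entries, where $r$ is the number of exogenous variables; the zero prior mean for the exogenous coefficients applies to the last $r(s+1)$ columns under this ordering.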

Some examples of the Bayesian VARX model are as follows:

   model y1 y2 = x1 / p=1 xlag=1 prior;
   model y1 y2 = x1 / p=(1 3) xlag=1 nocurrentx
                      prior=(lambda=0.9 theta=0.1);