The VARMAX Procedure

VARMAX Model

The vector autoregressive moving-average model with exogenous variables is called the VARMAX($p$,$q$,$s$) model. The form of the model can be written as

$\displaystyle  \mb {y} _ t = \sum _{i=1}^{p} \Phi _ i\mb {y} _{t-i} + \sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} + \bepsilon _ t - \sum _{i=1}^{q}\Theta _ i\bepsilon _{t-i}  $

where the output variables of interest, $\mb {y} _ t =(y_{1t},\ldots ,y_{kt})’$, can be influenced by other input variables, $\mb {x} _ t = (x_{1t},\ldots ,x_{rt})’$, which are determined outside of the system of interest. The variables $\mb {y} _ t$ are referred to as dependent, response, or endogenous variables, and the variables $\mb {x} _ t$ are referred to as independent, input, predictor, regressor, or exogenous variables. The unobserved noise variables, $\bepsilon _ t = (\epsilon _{1t},\ldots ,\epsilon _{kt})’$, are a vector white noise process.

The VARMAX($p$,$q$,$s$) model can be written

$\displaystyle  \Phi (B)\mb {y} _ t = \Theta ^*(B)\mb {x} _ t + \Theta (B)\bepsilon _ t  $

where

$\displaystyle  \Phi (B) = I_ k - \Phi _1 B - \cdots - \Phi _ p B^ p  $
$\displaystyle  \Theta ^*(B) = \Theta ^*_0 + \Theta ^*_1B + \cdots + \Theta ^*_ sB^ s  $
$\displaystyle  \Theta (B) = I_ k - \Theta _1B - \cdots - \Theta _ qB^ q  $

are matrix polynomials in the backshift operator $B$, such that $B^ i\mb {y} _{t}=\mb {y} _{t-i}$, the $\Phi _ i$ and $\Theta _ i$ are $k\times k$ matrices, and the $\Theta ^*_ i$ are $k\times r$ matrices.
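
To make the model form concrete, the following sketch simulates a small VARMAX(1,1,1) system (with $k=2$ dependent variables and $r=1$ exogenous variable) directly from the defining equation. The sketch is written in Python with NumPy rather than in SAS, and all coefficient values are illustrative assumptions, not quantities produced by the VARMAX procedure.

import numpy as np

rng = np.random.default_rng(1)
k, r, T = 2, 1, 200                           # 2 endogenous, 1 exogenous

Phi1   = np.array([[0.5, 0.1], [0.2, 0.3]])   # Phi_1,    k x k (illustrative)
Theta1 = np.array([[0.4, 0.0], [0.1, 0.2]])   # Theta_1,  k x k (illustrative)
Tstar0 = np.array([[1.0], [0.5]])             # Theta*_0, k x r (illustrative)
Tstar1 = np.array([[0.3], [0.2]])             # Theta*_1, k x r (illustrative)

x   = rng.normal(size=(T, r))                 # exogenous input series
eps = rng.normal(size=(T, k))                 # vector white noise
y   = np.zeros((T, k))

# direct application of the VARMAX(1,1,1) equation above
for t in range(1, T):
    y[t] = (Phi1 @ y[t-1] + Tstar0 @ x[t] + Tstar1 @ x[t-1]
            + eps[t] - Theta1 @ eps[t-1])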

The following assumptions are made:

  • $\mr {E} (\bepsilon _ t) = 0$, $\mr {E} (\bepsilon _ t\bepsilon _ t’) = \Sigma $, which is positive-definite, and $\mr {E} (\bepsilon _ t\bepsilon _{s}’) = 0$ for $t \ne s $.

  • For stationarity and invertibility of the VARMAX process, the roots of $|\Phi (z)|=0$ and $|\Theta (z)|=0$ are outside the unit circle. (A numerical check of this condition is sketched after this list.)

  • The exogenous (independent) variables $\mb {x} _{t}$ are not correlated with the residuals $\bepsilon _{t}$; that is, $\mr {E} (\mb {x} _{t}\bepsilon _{t}’) = 0$. The exogenous variables can be stochastic or nonstochastic. When the exogenous variables are stochastic and their future values are unknown, forecasts of these future values are needed in order to forecast the future values of the endogenous (dependent) variables. On occasion, future values of the exogenous variables can be assumed to be known because they are deterministic variables. The VARMAX procedure assumes that the exogenous variables are nonstochastic if their future values are available in the input data set. Otherwise, the exogenous variables are assumed to be stochastic, and before the endogenous variables are forecast, the future values of the exogenous variables are forecast under the assumption that they follow a VARMA($p$,$q$) model, where $p$ and $q$ are the same as in the VARMAX($p$,$q$,$s$) model.
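
The stationarity condition in the second assumption can be checked numerically: the roots of $|\Phi (z)|=0$ lie outside the unit circle exactly when all eigenvalues of the AR companion matrix lie strictly inside it. The following sketch applies this standard equivalence in Python with NumPy; it is an illustration, not the procedure's internal test. Passing $[\Theta _1,\ldots ,\Theta _ q]$ instead checks invertibility.

import numpy as np

def companion_stable(coefs):
    """coefs = [Phi_1, ..., Phi_p], each k x k; True if process is stable."""
    k, p = coefs[0].shape[0], len(coefs)
    comp = np.zeros((k * p, k * p))
    comp[:k, :] = np.hstack(coefs)            # first block row: Phi_1 ... Phi_p
    comp[k:, :-k] = np.eye(k * (p - 1))       # identity blocks below
    return bool(np.all(np.abs(np.linalg.eigvals(comp)) < 1.0))

Phi1 = np.array([[0.5, 0.1], [0.2, 0.3]])     # illustrative values
print(companion_stable([Phi1]))               # True: roots outside unit circle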

State-Space Representation

Another representation of the VARMAX($p$,$q$,$s$) model is in the form of a state-variable or a state-space model, which consists of a state equation

\[  \mb {z} _{t} =F\mb {z} _{t-1} +K\mb {x} _{t} +G\bepsilon _{t}  \]

and an observation equation

\[  \mb {y} _ t = H \mb {z} _{t}  \]

where

\[  \mb {z} _{t} = \left[\begin{matrix}  \mb {y} _{t}   \\ \vdots   \\ \mb {y} _{t-p+1}   \\ \mb {x} _{t}   \\ \vdots   \\ \mb {x} _{t-s+1}   \\ \bepsilon _{t}   \\ \vdots   \\ \bepsilon _{t-q+1}   \end{matrix}\right], ~ ~ K = \left[\begin{matrix}  \Theta ^*_0   \\ 0_{k\times r}   \\ \vdots   \\ 0_{k\times r}   \\ I_ r   \\ 0_{r\times r}   \\ \vdots   \\ 0_{r\times r}   \\ 0_{k\times r}   \\ \vdots   \\ 0_{k\times r}   \end{matrix}\right], ~ ~ G = \left[\begin{matrix}  I_ k   \\ 0_{k\times k}   \\ \vdots   \\ 0_{k\times k}   \\ 0_{r\times k}   \\ \vdots   \\ 0_{r\times k}   \\ I_ k   \\ 0_{k\times k}   \\ \vdots   \\ 0_{k\times k}   \end{matrix}\right]  \]

\[  F = \left[\begin{matrix}  \Phi _{1}   &  \cdots   &  \Phi _{p-1}   &  \Phi _{p}   &  \Theta ^*_{1}   &  \cdots   &  \Theta ^*_{s-1}   &  \Theta ^*_{s}   &  -\Theta _{1}   &  \cdots   &  -\Theta _{q-1}   &  -\Theta _{q}   \\ I_ k   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   \\ \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   \\ 0   &  \cdots   &  I_ k   &  0   &  0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   \\ 0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   \\ 0   &  \cdots   &  0   &  0   &  I_ r   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   \\ \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   \\ 0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  I_ r   &  0   &  0   &  \cdots   &  0   &  0   \\ 0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   \\ 0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   &  I_ k   &  \cdots   &  0   &  0   \\ \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   &  \vdots   \\ 0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  0   &  0   &  0   &  \cdots   &  I_ k   &  0   \end{matrix} \right]  \]

and

\[  H = [I_ k, 0_{k\times k}, \ldots , 0_{k\times k}, 0_{k\times r}, \ldots , 0_{k\times r}, 0_{k\times k}, \ldots , 0_{k\times k}]  \]
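
For the simplest case $p=q=s=1$, the state vector reduces to $\mb {z} _ t = [\mb {y} _ t’, \mb {x} _ t’, \bepsilon _ t’]’$, and the representation can be verified numerically. The following sketch, using the same illustrative Python/NumPy coefficients as before, checks that one application of the state equation reproduces the direct VARMAX(1,1,1) recursion.

import numpy as np

rng = np.random.default_rng(2)
k, r = 2, 1
Phi1   = np.array([[0.5, 0.1], [0.2, 0.3]])   # illustrative coefficients
Theta1 = np.array([[0.4, 0.0], [0.1, 0.2]])
Tstar0 = np.array([[1.0], [0.5]])
Tstar1 = np.array([[0.3], [0.2]])

# state-space matrices for p = q = s = 1, state z_t = [y_t', x_t', eps_t']'
F = np.zeros((2*k + r, 2*k + r))
F[:k, :k]    = Phi1
F[:k, k:k+r] = Tstar1
F[:k, k+r:]  = -Theta1
K = np.vstack([Tstar0, np.eye(r), np.zeros((k, r))])
G = np.vstack([np.eye(k), np.zeros((r, k)), np.eye(k)])
H = np.hstack([np.eye(k), np.zeros((k, r)), np.zeros((k, k))])

# one transition from an arbitrary previous state ...
z_prev = rng.normal(size=2*k + r)
x_t, eps_t = rng.normal(size=r), rng.normal(size=k)
z_t = F @ z_prev + K @ x_t + G @ eps_t

# ... must agree with the direct VARMAX(1,1,1) equation for y_t
y_prev, x_prev, eps_prev = z_prev[:k], z_prev[k:k+r], z_prev[k+r:]
y_direct = (Phi1 @ y_prev + Tstar0 @ x_t + Tstar1 @ x_prev
            + eps_t - Theta1 @ eps_prev)
assert np.allclose(H @ z_t, y_direct)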

When the exogenous variables are stochastic, it is assumed that $\mb {x} _ t$ follows a VARMA($p$,$q$) model

\[  \mb {x} _ t = \sum _{i=1}^{p} A_ i\mb {x} _{t-i} + \mb {a} _ t - \sum _{i=1}^{q}C_ i\mb {a} _{t-i}  \]

The model can also be expressed as

\[  A(B)\mb {x} _ t = C(B)\mb {a} _ t  \]

where $A(B)=I_ r - A_1B - \cdots - A_ pB^ p$ and $C(B)=I_ r - C_1B - \cdots - C_ qB^ q$ are matrix polynomials in $B$, and the $A_ i$ and $C_ i$ are $r\times r$ matrices. Without loss of generality, the AR and MA orders can be taken to be the same as those of the VARMAX($p$,$q$,$s$) model, and $\mb {a} _ t$ and $\bepsilon _ t$ are independent white noise processes.

Under suitable conditions such as stationarity, $\mb {x} _ t$ is represented by an infinite-order moving-average process

$\displaystyle  \mb {x} _ t = A(B)^{-1}C(B)\mb {a} _ t = \Psi ^ x(B) \mb {a} _ t = \sum _{j=0}^{\infty }\Psi ^ x_{j}\mb {a} _{t-j}  $

where $\Psi ^ x(B) = A(B)^{-1}C(B) = \sum _{j=0}^{\infty } \Psi ^ x_ j B^ j$.
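
The weights $\Psi ^ x_ j$ can be computed by matching coefficients of $B^ j$ on both sides of $A(B)\Psi ^ x(B) = C(B)$, which gives $\Psi ^ x_0 = I_ r$ and $\Psi ^ x_ j = \sum _{i=1}^{\min (j,p)} A_ i \Psi ^ x_{j-i} - C_ j$, with $C_ j = 0$ for $j > q$. The following Python/NumPy sketch implements this recursion; the helper name ma_weights is an illustrative assumption.

import numpy as np

def ma_weights(A, C, nlags):
    """Psi weights of A(B)^{-1} C(B); A = [A_1..A_p], C = [C_1..C_q]."""
    r = A[0].shape[0]
    psi = [np.eye(r)]                          # Psi_0 = I
    for j in range(1, nlags + 1):
        pj = -C[j - 1] if j <= len(C) else np.zeros((r, r))
        for i in range(1, min(j, len(A)) + 1):
            pj = pj + A[i - 1] @ psi[j - i]
        psi.append(pj)
    return psi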

The optimal minimum mean squared error (minimum MSE) $i$-step-ahead forecast of $\mb {x} _{t+i}$ is

$\displaystyle  \mb {x} _{t+i|t} = \sum _{j=i}^{\infty }\Psi ^ x_{j}\mb {a} _{t+i-j}  $
$\displaystyle  \mb {x} _{t+i|t+1} = \mb {x} _{t+i|t} + \Psi ^ x_{i-1}\mb {a} _{t+1}  $

For $i>q$,

\[  \mb {x} _{t+i|t} = \sum _{j=1}^ p A_ j \mb {x} _{t+i-j|t}  \]
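
A sketch of this recursion in Python/NumPy follows; the helper name forecast_x is an illustrative assumption. Because the recursion is stated for $i > q$, the residual corrections that enter the first $q$ horizons are omitted here for brevity.

def forecast_x(A, x_hist, h):
    """Forecasts x_{t+1|t}..x_{t+h|t} from A = [A_1..A_p] and at least
    p observed rows in x_hist; valid for horizons beyond the MA order."""
    xs = list(x_hist)                          # observed values, then forecasts
    for _ in range(h):
        # most recent entry multiplies A_1, the next A_2, and so on
        fc = sum(A[j] @ xs[-(j + 1)] for j in range(len(A)))
        xs.append(fc)
    return xs[len(x_hist):]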

The VARMAX($p$,$q$,$s$) model has an absolutely convergent representation as

$\displaystyle  \mb {y} _ t = \Phi (B)^{-1}\Theta ^*(B)\mb {x} _ t + \Phi (B)^{-1}\Theta (B)\bepsilon _{t} = \Psi ^{*}(B)\Psi ^ x(B) \mb {a} _ t + \Phi (B)^{-1}\Theta (B)\bepsilon _{t} = V(B) \mb {a} _ t + \Psi (B) \bepsilon _{t}  $

or

\[  \mb {y} _ t = \sum _{j=0}^{\infty }V_{j} \mb {a} _{t-j} + \sum _{j=0}^{\infty }\Psi _ j \bepsilon _{t-j}  \]

where $\Psi (B)= \Phi (B)^{-1}\Theta (B) =\sum _{j=0}^{\infty }\Psi _ j B^ j$, $\Psi ^{*}(B)= \Phi (B)^{-1}\Theta ^*(B)$, and $V(B) = \Psi ^{*}(B)\Psi ^ x(B) = \sum _{j=0}^{\infty } V_ j B^ j$.
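
Each weight sequence can be computed by the same coefficient-matching device used for $\Psi ^ x_ j$: the $\Psi _ j$ follow from applying the earlier ma_weights recursion to $(\Phi , \Theta )$, the $\Psi ^*_ j$ satisfy $\Psi ^*_ j = \Theta ^*_ j + \sum _{i=1}^{\min (j,p)} \Phi _ i \Psi ^*_{j-i}$ (with plus signs, because $\Theta ^*(B)$ has positive coefficients), and $V_ j$ is the truncated convolution of the $\Psi ^*$ and $\Psi ^ x$ sequences. A Python/NumPy sketch, with illustrative helper names:

import numpy as np

def psi_star(Phi, Tstar, nlags):
    """Weights of Phi(B)^{-1} Theta*(B); Tstar = [Theta*_0..Theta*_s]."""
    k, r = Tstar[0].shape
    ps = []
    for j in range(nlags + 1):
        pj = Tstar[j].copy() if j < len(Tstar) else np.zeros((k, r))
        for i in range(1, min(j, len(Phi)) + 1):
            pj = pj + Phi[i - 1] @ ps[j - i]
        ps.append(pj)
    return ps

def v_weights(psi_s, psi_x, nlags):
    """V_j = sum_{i=0}^j Psi*_i Psi^x_{j-i}, truncated at nlags."""
    return [sum(psi_s[i] @ psi_x[j - i] for i in range(j + 1))
            for j in range(nlags + 1)]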

The optimal (minimum MSE) $i$-step-ahead forecast of $\mb {y} _{t+i}$ is

$\displaystyle  \mb {y} _{t+i|t} = \sum _{j=i}^{\infty }V_{j}\mb {a} _{t+i-j} + \sum _{j=i}^{\infty }\Psi _{j}\bepsilon _{t+i-j}  $
$\displaystyle  \mb {y} _{t+i|t+1} = \mb {y} _{t+i|t} + V_{i-1} \mb {a} _{t+1} + \Psi _{i-1} \bepsilon _{t+1}  $

for $i=1,\ldots ,v$ with $v=\mr {max} (p,q+1)$. For $i>q$,

$\displaystyle  \mb {y} _{t+i|t} = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+i-j|t} + \sum _{j=0}^ s \Theta _ j^* \mb {x} _{t+i-j|t}  $
$\displaystyle  = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+i-j|t} + \Theta _0^* \mb {x} _{t+i|t} + \sum _{j=1}^ s \Theta _ j^* \mb {x} _{t+i-j|t}  $
$\displaystyle  = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+i-j|t} + \Theta _0^* \sum _{j=1}^ p A_ j \mb {x} _{t+i-j|t} + \sum _{j=1}^ s \Theta _ j^* \mb {x} _{t+i-j|t}  $
$\displaystyle  = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+i-j|t} + \sum _{j=1}^ u ( \Theta _0^* A_ j + \Theta _ j^* ) \mb {x} _{t+i-j|t}  $

where $u=\mr {max} (p,s)$.
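
A Python/NumPy sketch of this final recursion follows, writing $\Pi _ j = \Theta _0^* A_ j + \Theta _ j^*$ as in the next paragraph (with zero matrices padding $A_ j$ for $j > p$ and $\Theta _ j^*$ for $j > s$). The helper name forecast_y is an illustrative assumption, and x_fc denotes the exogenous forecasts from the earlier forecast_x sketch.

def forecast_y(Phi, Pi, y_hist, x_hist, x_fc, h):
    """Forecasts y_{t+1|t}..y_{t+h|t} for horizons beyond the MA order;
    Phi = [Phi_1..Phi_p], Pi = [Pi_1..Pi_u], len(x_fc) >= h - 1."""
    ys = list(y_hist)                          # observed y, then forecasts
    xs = list(x_hist) + list(x_fc)             # observed x, then x forecasts
    n = len(x_hist)
    for i in range(1, h + 1):
        # most recent y entry multiplies Phi_1, the next Phi_2, and so on
        fc = sum(Phi[j] @ ys[-(j + 1)] for j in range(len(Phi)))
        # Pi_{j+1} multiplies x_{t+i-(j+1)|t}, located at index n+i-2-j
        fc = fc + sum(Pi[j] @ xs[n + i - 2 - j] for j in range(len(Pi)))
        ys.append(fc)
    return ys[len(y_hist):]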

Define $\Pi _ j = \Theta _0^* A_ j + \Theta _ j^*$. For $i=v>q$ with $v=\mr {max} (p,q+1)$, you obtain

$\displaystyle  \mb {y} _{t+v|t} = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+v-j|t} + \sum _{j=1}^ u \Pi _ j\mb {x} _{t+v-j|t} ~ ~ \mr {for} ~ ~  u\leq v  $
$\displaystyle  \mb {y} _{t+v|t} = \sum _{j=1}^ p \Phi _ j \mb {y} _{t+v-j|t} + \sum _{j=1}^ v \Pi _ j\mb {x} _{t+v-j|t} + \sum _{j=v+1}^ u \Pi _ j\mb {x} _{t+v-j} ~ ~ \mr {for} ~ ~  u>v  $

From the preceding relations, a state equation is

\[  \mb {z} _{t+1} = F\mb {z} _{t} + K\mb {x} _{t}^* + G\mb {e} _{t+1}  \]

and an observation equation is

\[  \mb {y} _ t = H\mb {z} _ t  \]

where

\[  \mb {z} _{t} = \left[\begin{matrix}  \mb {y} _{t}   \\ \mb {y} _{t+1|t}   \\ {\vdots }   \\ \mb {y} _{t+v-1|t}   \\ \mb {x} _{t}   \\ \mb {x} _{t+1|t}   \\ {\vdots }   \\ \mb {x} _{t+v-1|t}   \end{matrix}\right], ~ ~  \mb {x} _{t}^* = \left[\begin{matrix}  \mb {x} _{t+v-u}   \\ \mb {x} _{t+v-u+1}   \\ {\vdots }   \\ \mb {x} _{t-1}   \end{matrix}\right], ~ ~  \mb {e} _{t+1} = \left[\begin{matrix}  \mb {a} _{t+1}   \\ \bepsilon _{t+1}   \end{matrix}\right]  \]
\[  F = \left[\begin{matrix}  0   &  I_ k   &  0   &  {\cdots }   &  0   &  0   &  0   &  0   &  {\cdots }   &  0   \\ 0   &  0   &  I_ k   &  {\cdots }   &  0   &  0   &  0   &  0   &  {\cdots }   &  0   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   \\ \Phi _{v}   &  \Phi _{v-1}   &  \Phi _{v-2}  & {\cdots }   &  \Phi _{1}   &  \Pi _{v}   & \Pi _{v-1}   & \Pi _{v-2}   & {\cdots }  & \Pi _{1}   \\ 0   &  0   &  0   &  {\cdots }   &  0   &  0   &  I_ r   &  0   &  {\cdots }   &  0   \\ 0   &  0   &  0   &  {\cdots }   &  0   &  0   &  0   &  I_ r   &  {\cdots }   &  0   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   \\ 0   &  0   &  0   &  {\cdots }   &  0   &  A_ v   &  A_{v-1}   &  A_{v-2}   &  {\cdots }   &  A_1   \\ \end{matrix} \right]  \]
\[  K = \left[\begin{matrix}  0   &  0   &  {\cdots }   &  0   \\ 0   &  0   &  {\cdots }   &  0   \\ {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   \\ \Pi _{u}   &  \Pi _{u-1}   &  {\cdots }   & \Pi _{v+1}   \\ 0   &  0   &  {\cdots }   &  0   \\ {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   \\ 0   &  0   &  {\cdots }   &  0   \\ \end{matrix}\right], ~ ~  G = \left[\begin{matrix}  V_{0}   &  I_ k   \\ V_{1}   & \Psi _{1}   \\ {\vdots }   &  {\vdots }   \\ V_{v-1}   &  \Psi _{v-1}   \\ I_ r   &  0_{r\times k}   \\ \Psi ^ x_{1}   &  0_{r\times k}   \\ {\vdots }   &  {\vdots }   \\ \Psi ^ x_{v-1}   &  0_{r\times k}   \\ \end{matrix}\right]  \]

and

\[  H = [I_ k, 0_{k\times k}, \ldots , 0_{k\times k}, 0_{k\times r}, \ldots , 0_{k\times r}]  \]

Note that the matrix $K$ and the input vector $\mb {x} _{t}^*$ are defined only when $u>v$.
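
As an illustration of how this forecast-based representation is assembled in the case $u\leq v$ (where $K$ and $\mb {x} _ t^*$ drop out), the following Python/NumPy sketch builds the transition matrix $F$ above, padding each coefficient list with zero matrices up to lag $v$. The helper name build_F and the padding scheme are illustrative assumptions.

import numpy as np

def build_F(Phi, Pi, A, k, r, v):
    """F for the forecast-based state space, u <= v; coefficient lists
    Phi = [Phi_1..], Pi = [Pi_1..], A = [A_1..] are zero-padded to v."""
    pad = lambda M, dim: [M[j] if j < len(M) else np.zeros(dim)
                          for j in range(v)]
    Phi, Pi, A = pad(Phi, (k, k)), pad(Pi, (k, r)), pad(A, (r, r))
    F = np.zeros(((k + r) * v, (k + r) * v))
    # identity blocks shift y_{t+j|t} and x_{t+j|t} up one slot
    F[:k*(v-1), k:k*v] = np.eye(k * (v - 1))
    F[k*v:k*v + r*(v-1), k*v + r:] = np.eye(r * (v - 1))
    for j in range(v):                         # block column j
        F[k*(v-1):k*v, k*j:k*(j+1)] = Phi[v - 1 - j]             # Phi_v..Phi_1
        F[k*(v-1):k*v, k*v + r*j:k*v + r*(j+1)] = Pi[v - 1 - j]  # Pi_v..Pi_1
        F[(k+r)*v - r:, k*v + r*j:k*v + r*(j+1)] = A[v - 1 - j]  # A_v..A_1
    return F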