The VARMAX Procedure

VARMA and VARMAX Modeling

A VARMA($p,q$) process is written as

$\displaystyle  \mb {y} _{t} = \bdelta + \sum _{i=1}^{p}\Phi _ i\mb {y} _{t-i} + \bepsilon _ t -\sum _{i=1}^{q} \Theta _ i\bepsilon _{t-i}  $

or

$\displaystyle  \Phi (B) \mb {y} _{t} = \bdelta + \Theta (B) \bepsilon _ t  $

where $\Phi (B) = I_ k - \sum _{i=1}^{p} \Phi _ i B^ i$ and $\Theta (B) = I_ k - \sum _{i=1}^{q} \Theta _ i B^ i$.
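The recursion in the first equation can be sketched directly. The following NumPy code is an illustration only (it is not part of the VARMAX procedure, and the coefficient matrices are made up for the example); it simulates a zero-mean bivariate VARMA(1,1) series:

```python
import numpy as np

rng = np.random.default_rng(34657)

def simulate_varma(delta, Phi, Theta, Sigma, T, burn=200):
    """Simulate y_t = delta + sum_i Phi_i y_{t-i} + eps_t - sum_i Theta_i eps_{t-i}."""
    k = len(delta)
    p, q = len(Phi), len(Theta)
    chol = np.linalg.cholesky(Sigma)
    eps = rng.standard_normal((T + burn, k)) @ chol.T   # eps_t ~ N(0, Sigma)
    y = np.zeros((T + burn, k))
    for t in range(max(p, q), T + burn):
        y[t] = delta + eps[t]
        for i in range(p):                    # AR terms
            y[t] += Phi[i] @ y[t - 1 - i]
        for i in range(q):                    # MA terms (note the minus sign)
            y[t] -= Theta[i] @ eps[t - 1 - i]
    return y[burn:]                           # discard burn-in

# illustrative (made-up) bivariate VARMA(1,1) coefficients
Phi = [np.array([[0.5, 0.1], [0.0, 0.3]])]
Theta = [np.array([[0.2, 0.0], [0.1, 0.1]])]
y = simulate_varma(np.zeros(2), Phi, Theta, np.eye(2), T=500)
```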

Stationarity and Invertibility

For the VARMA process to be stationary and invertible, the roots of $|\Phi (z)|=0$ and $|\Theta (z)|=0$ must lie outside the unit circle.
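This condition can be checked numerically: the roots of $|\Phi (z)|=0$ lie outside the unit circle exactly when all eigenvalues of the companion matrix built from $\Phi _1,\ldots ,\Phi _ p$ lie inside it, and likewise for $\Theta (z)$. A minimal NumPy sketch (illustrative, not part of the VARMAX procedure):

```python
import numpy as np

def is_stable(coefs):
    """True if all roots of |I_k - A_1 z - ... - A_m z^m| = 0 lie outside
    the unit circle, i.e., all eigenvalues of the companion matrix lie
    strictly inside it."""
    m, k = len(coefs), coefs[0].shape[0]
    comp = np.zeros((k * m, k * m))
    comp[:k, :] = np.hstack(coefs)           # first block row: A_1 ... A_m
    if m > 1:
        comp[k:, :-k] = np.eye(k * (m - 1))  # subdiagonal identities
    return bool(np.all(np.abs(np.linalg.eigvals(comp)) < 1.0))

Phi1 = np.array([[0.5, 0.1], [0.0, 0.3]])    # illustrative AR coefficient
Theta1 = np.array([[0.2, 0.0], [0.1, 0.1]])  # illustrative MA coefficient
stationary = is_stable([Phi1])   # checks the roots of |Phi(z)| = 0
invertible = is_stable([Theta1]) # checks the roots of |Theta(z)| = 0
```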

Parameter Estimation

Under the assumption of normality of the $\bepsilon _{t}$ with mean vector zero and nonsingular covariance matrix $\Sigma $, consider the conditional (approximate) log-likelihood function of a VARMA($p$,$q$) model with mean zero.

Define $Y=(\mb {y} _{1},\ldots ,\mb {y} _{T})’$ and $E=(\bepsilon _{1},\ldots ,\bepsilon _{T})’$ with $B^ i Y=(\mb {y} _{1-i},\ldots ,\mb {y} _{T-i})’$ and $B^ i E=(\bepsilon _{1-i},\ldots ,\bepsilon _{T-i})’$; define $\mb {y} =\mr {vec} (Y’)$ and $\mb {e} =\mr {vec} (E’)$. Then

\[  \mb {y} -\sum _{i=1}^ p (I_ T \otimes \Phi _ i)B^ i \mb {y} =\mb {e} - \sum _{i=1}^ q (I_ T \otimes \Theta _ i)B^ i \mb {e}  \]

where $B^ i \mb {y} = \mr {vec} [(B^ i Y)’]$ and $B^ i \mb {e} = \mr {vec} [(B^ i E)’]$.

Then, the conditional (approximate) log-likelihood function can be written as follows (Reinsel 1997):

\[  \ell = -\frac{T}{2} \log |\Sigma | -\frac{1}{2}\sum _{t=1}^ T \bepsilon _{t}’\Sigma ^{-1}\bepsilon _{t} = -\frac{T}{2} \log |\Sigma |-\frac{1}{2}\mb {w} ’\Theta ’^{-1} (I_ T\otimes \Sigma ^{-1})\Theta ^{-1}\mb {w}  \]

where $\mb {w} = \mb {y} -\sum _{i=1}^ p (I_ T \otimes \Phi _ i)B^ i \mb {y}$, and $\Theta $ is such that $\mb {e} -\sum _{i=1}^ q(I_ T\otimes \Theta _ i)B^ i\mb {e} =\Theta \mb {e}$.
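As a sketch of this conditional likelihood, the following NumPy function (illustrative only, not VARMAX's implementation) recovers $\bepsilon _ t$ from the recursion $\bepsilon _ t = \mb {y} _ t - \sum _ i \Phi _ i \mb {y} _{t-i} + \sum _ i \Theta _ i \bepsilon _{t-i}$, with presample values set to zero, and then evaluates $\ell $:

```python
import numpy as np

def conditional_loglik(y, Phi, Theta, Sigma):
    """Conditional (approximate) log-likelihood of a zero-mean VARMA(p,q):
    eps_t is recovered from eps_t = y_t - sum_i Phi_i y_{t-i}
    + sum_i Theta_i eps_{t-i}, with presample values set to zero."""
    T = y.shape[0]
    eps = np.zeros_like(y)
    for t in range(T):
        e = y[t].copy()
        for i, P in enumerate(Phi, start=1):
            if t - i >= 0:
                e -= P @ y[t - i]
        for i, Q in enumerate(Theta, start=1):
            if t - i >= 0:
                e += Q @ eps[t - i]
        eps[t] = e
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ti,ij,tj->', eps, np.linalg.inv(Sigma), eps)
    return -0.5 * T * logdet - 0.5 * quad

# white-noise check: with p = q = 0 the formula reduces to
# -T/2 log|Sigma| - 1/2 sum_t y_t' Sigma^{-1} y_t
ll = conditional_loglik(np.eye(2), [], [], np.eye(2))
```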

For the exact log-likelihood function of a VARMA($p$,$q$) model, the Kalman filtering method is used after transforming the VARMA process into state-space form (Reinsel 1997).

The state-space form of the VARMA($p$,$q$) model consists of a state equation

\[  \mb {z} _{t} =F\mb {z} _{t-1} + G\bepsilon _{t}  \]

and an observation equation

\[  \mb {y} _ t = H\mb {z} _{t}  \]

where for $v=\mr {max} (p,q+1)$

\[  \mb {z} _{t}=(\mb {y} _{t}’,\mb {y} _{t+1|t}’,\ldots ,\mb {y} _{t+v-1|t}’)’  \]
\[  F = \left[\begin{matrix}  0   &  I_ k   &  0   &  {\cdots }   &  0   \\ 0   &  0   &  I_ k   &  {\cdots }   &  0   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  \ddots   &  {\vdots }   \\ \Phi _{v}   &  \Phi _{v-1}   &  \Phi _{v-2}  & {\cdots }   &  \Phi _{1}   \\ \end{matrix} \right], ~ ~  G = \left[\begin{matrix}  I_ k   \\ \Psi _{1}   \\ {\vdots }   \\ \Psi _{v-1}   \\ \end{matrix}\right]  \]

and

\[  H = [I_ k, 0, \ldots , 0]  \]
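Here $\Phi _ i = 0$ for $i > p$, and the $\Psi _ i$ are taken to be the coefficient matrices of the infinite moving-average representation (an assumption of this sketch, computed from the standard recursion $\Psi _0 = I_ k$, $\Psi _ j = -\Theta _ j + \sum _{i=1}^{\min (j,p)}\Phi _ i\Psi _{j-i}$). The matrices $F$, $G$, and $H$ can then be assembled as in the following illustrative NumPy code:

```python
import numpy as np

def state_space(Phi, Theta, k):
    """Assemble F, G, H for v = max(p, q+1), padding Phi_i = 0 for i > p.
    Psi_j are MA(infinity) weights from the assumed recursion
    Psi_0 = I_k, Psi_j = -Theta_j + sum_{i=1}^{min(j,p)} Phi_i Psi_{j-i}."""
    p, q = len(Phi), len(Theta)
    v = max(p, q + 1)
    Phi_pad = Phi + [np.zeros((k, k))] * (v - p)
    Theta_pad = Theta + [np.zeros((k, k))] * (v - 1 - q)
    Psi = [np.eye(k)]
    for j in range(1, v):
        Pj = -Theta_pad[j - 1]
        for i in range(1, min(j, p) + 1):
            Pj = Pj + Phi[i - 1] @ Psi[j - i]
        Psi.append(Pj)
    F = np.zeros((k * v, k * v))
    F[:-k, k:] = np.eye(k * (v - 1))            # superdiagonal identities
    for i in range(v):                          # bottom row: Phi_v ... Phi_1
        F[-k:, i * k:(i + 1) * k] = Phi_pad[v - 1 - i]
    G = np.vstack(Psi)                          # [I_k; Psi_1; ...; Psi_{v-1}]
    H = np.hstack([np.eye(k)] + [np.zeros((k, k))] * (v - 1))
    return F, G, H
```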

The Kalman filtering approach is used to evaluate the likelihood function. The updating equation is

\[  \hat{\mb {z}}_{t|t} = {\hat{\mb {z}}}_{t|t-1} + K_ t\bepsilon _{t|t-1}  \]

with

\[  K_ t = P_{t|t-1}H’[H P_{t|t-1} H’]^{-1}  \]

and the prediction equation is

\[  \hat{\mb {z} }_{t|t-1} = F \hat{\mb {z} }_{t-1|t-1}, ~ ~  P_{t|t-1} = F P_{t-1|t-1} F’ + G \Sigma G’  \]

with $P_{t|t} = [I-K_ tH]P_{t|t-1}$ for $t=1,2,\ldots ,n$.

The log-likelihood function can be expressed as

\[  \ell = -\frac{1}{2} \sum _{t=1}^ T [ \log |\Sigma _{t|t-1}| + (\mb {y} _{t}-\hat{\mb {y} }_{t|t-1})’\Sigma _{t|t-1}^{-1} (\mb {y} _{t}-\hat{\mb {y} }_{t|t-1}) ]  \]

where $\hat{\mb {y} }_{t|t-1}$ and $\Sigma _{t|t-1}$ are determined recursively from the Kalman filter procedure. To construct the likelihood function from Kalman filtering, you obtain $\hat{\mb {y} }_{t|t-1}=H \hat{\mb {z} }_{t|t-1}$, $\hat{\bepsilon }_{t|t-1} = \mb {y} _{t}-\hat{\mb {y} }_{t|t-1}$, and $\Sigma _{t|t-1}=H P_{t|t-1} H’$.
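The recursions above can be sketched as follows. This NumPy code is illustrative only and is not VARMAX's implementation; in particular, the exact-likelihood initialization is replaced here by the assumption $\hat{\mb {z}}_{0} = 0$ with $P_0$ approximated by iterating the prediction equation to its stationary fixed point:

```python
import numpy as np

def kalman_loglik(y, F, G, H, Sigma):
    """Log-likelihood via the Kalman recursions.  Sketch only: the
    exact-likelihood initialization is replaced by z_0 = 0 with P_0 taken
    as the stationary covariance, approximated by iterating P = F P F' + Q."""
    T = y.shape[0]
    Q = G @ Sigma @ G.T
    P = Q.copy()
    for _ in range(200):                      # crude stationary P_0
        P = F @ P @ F.T + Q
    z = np.zeros(F.shape[0])
    ll = 0.0
    for t in range(T):
        S = H @ P @ H.T                       # Sigma_{t|t-1} = H P_{t|t-1} H'
        e = y[t] - H @ z                      # eps_{t|t-1} = y_t - H z_{t|t-1}
        Sinv = np.linalg.inv(S)
        sign, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (logdet + e @ Sinv @ e)
        K = P @ H.T @ Sinv                    # gain K_t
        z = z + K @ e                         # updating equation
        P = P - K @ H @ P                     # P_{t|t} = (I - K_t H) P_{t|t-1}
        z = F @ z                             # prediction equation
        P = F @ P @ F.T + Q                   # P_{t+1|t}
    return ll
```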

Define the vector $\bbeta $

\[  \bbeta = ( \phi _1’, \ldots , \phi _ p’, \theta _1’, \ldots , \theta _ q’, \mr {vech} (\Sigma ) )’  \]

where $\phi _ i=\mr {vec} (\Phi _ i)$ and $\theta _ i=\mr {vec} (\Theta _ i)$.

The log-likelihood equations are solved by iterative numerical procedures such as quasi-Newton optimization. The starting values for the AR and MA parameters are obtained from the least squares estimates.

Asymptotic Distribution of the Parameter Estimates

Under the assumptions of stationarity and invertibility for the VARMA model and the assumption that $\bepsilon _{t}$ is a white noise process, $\hat{\bbeta }$ is a consistent estimator for ${\bbeta }$ and $\sqrt {T}(\hat{\bbeta } - {\bbeta })$ converges in distribution to the multivariate normal $N(0, V^{-1})$ as $T \rightarrow \infty $, where $V$ is the asymptotic information matrix of ${\bbeta }$.

Asymptotic Distributions of Impulse Response Functions

Defining the vector $\bbeta $

\[  \bbeta = ( \phi _1’, \ldots , \phi _ p’, \theta _1’, \ldots , \theta _ q’ )’  \]

the asymptotic distribution of the impulse response function for a VARMA($p,q$) model is

\[  \sqrt {T} \mr {vec} (\hat\Psi _ j - \Psi _ j ) \stackrel{d}{\rightarrow } N(0, G_ j\Sigma _{\bbeta } G_ j’) ~ ~ j=1,2,\ldots  \]

where $\Sigma _{\bbeta }$ is the covariance matrix of the parameter estimates and

\[  G_ j= \frac{\partial \mr {vec} (\Psi _ j)}{\partial {\bbeta }} = \sum _{i=0}^{j-1} \mb {H} ’(\mb {A} ’)^{j-1-i} \otimes \mb {J} \mb {A} ^ i\mb {J} ’  \]

where $\mb {H} = [ I_ k, 0,\ldots , 0, I_ k, 0,\ldots , 0]’$ is a $k(p+q)\times k$ matrix with the second $I_ k$ following after $p$ block matrices; $\mb {J} = [ I_ k, 0,\ldots , 0 ]$ is a $k\times k(p+q)$ matrix; $\mb {A} $ is a $k(p+q)\times k(p+q)$ matrix,

$\displaystyle  \mb {A} = \left[\begin{matrix}  A_{11}   &  A_{12}   \\ A_{21}   &  A_{22}   \\ \end{matrix}\right]  $

where

$\displaystyle  A_{11} = \left[ \begin{matrix}  \Phi _1   &  \Phi _2   &  \cdots   &  \Phi _{p-1}   &  \Phi _{p}   \\ I_ k   &  0   &  \cdots   &  0   &  0   \\ 0   &  I_ k   &  \cdots   &  0   &  0   \\ \vdots   &  \vdots   &  \ddots   &  \vdots   &  \vdots   \\ 0   &  0   &  \cdots   &  I_ k   &  0   \\ \end{matrix} \right] ~ ~  A_{12} = \left[ \begin{matrix}  -\Theta _1   &  \cdots   &  -\Theta _{q-1}   &  -\Theta _{q}   \\ 0   &  \cdots   &  0   &  0   \\ 0   &  \cdots   &  0   &  0   \\ \vdots   &  \ddots   &  \vdots   &  \vdots   \\ 0   &  \cdots   &  0   &  0   \\ \end{matrix} \right]  $

$A_{21}$ is a $kq\times kp$ zero matrix, and

$\displaystyle  A_{22} = \left[ \begin{matrix}  0   &  0   &  \cdots   &  0   &  0   \\ I_ k   &  0   &  \cdots   &  0   &  0   \\ 0   &  I_ k   &  \cdots   &  0   &  0   \\ \vdots   &  \vdots   &  \ddots   &  \vdots   &  \vdots   \\ 0   &  0   &  \cdots   &  I_ k   &  0   \\ \end{matrix} \right]  $
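Given $\mb {A} $, $\mb {H} $, and $\mb {J} $, the impulse response matrices can be computed as $\Psi _ j = \mb {J} \mb {A} ^ j \mb {H} $, which is consistent with the derivative formula above (for a VARMA(1,1) model it reproduces $\Psi _1 = \Phi _1 - \Theta _1$ and $\Psi _ j = \Phi _1 \Psi _{j-1}$). The following NumPy sketch (function name and layout are illustrative, assuming $q \ge 1$) assembles the blocks and evaluates $\Psi _ j$:

```python
import numpy as np
from numpy.linalg import matrix_power

def impulse_responses(Phi, Theta, k, nlags):
    """Psi_j = J A^j H from the companion matrix A defined above.
    Sketch; assumes q >= 1 so the second identity block of H exists."""
    p, q = len(Phi), len(Theta)
    n = k * (p + q)
    A = np.zeros((n, n))
    for i, P in enumerate(Phi):                     # top block row of A_11
        A[:k, i * k:(i + 1) * k] = P
    for i, Q in enumerate(Theta):                   # top block row of A_12
        A[:k, (p + i) * k:(p + i + 1) * k] = -Q
    if p > 1:
        A[k:k * p, :k * (p - 1)] = np.eye(k * (p - 1))   # A_11 subdiagonal
    if q > 1:
        A[k * (p + 1):, k * p:-k] = np.eye(k * (q - 1))  # A_22 subdiagonal
    H = np.zeros((n, k))
    H[:k] = np.eye(k)
    H[k * p:k * (p + 1)] = np.eye(k)                # second I_k after p blocks
    J = np.zeros((k, n))
    J[:, :k] = np.eye(k)
    return [J @ matrix_power(A, j) @ H for j in range(1, nlags + 1)]
```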

An Example of a VARMA(1,1) Model

Consider a VARMA(1,1) model with mean zero

$\displaystyle  \mb {y} _{t} = \Phi _1\mb {y} _{t-1} + \bepsilon _ t - \Theta _1\bepsilon _{t-1}  $

where $\bepsilon _ t$ is the white noise process with a mean zero vector and the positive-definite covariance matrix $\Sigma $.

The following IML procedure statements simulate a bivariate vector time series from this model to provide test data for the VARMAX procedure:

proc iml;
   sig = {1.0  0.5, 0.5 1.25};
   phi = {1.2 -0.5, 0.6 0.3};
   theta = {0.5 -0.2, 0.1 0.3};
   /* to simulate the vector time series */
   call varmasim(y,phi,theta) sigma=sig n=100 seed=34657;
   cn = {'y1' 'y2'};
   create simul3 from y[colname=cn];
   append from y;
run;

The following statements fit a VARMA(1,1) model to the simulated data. You specify the order of the autoregressive model with the P= option and the order of the moving-average model with the Q= option. You specify quasi-Newton optimization as the optimization method in the NLOPTIONS statement.

proc varmax data=simul3;
   nloptions tech=qn;
   model y1 y2 / p=1 q=1 noint print=(estimates);
run;

Figure 36.46 shows the initial parameter values, which are estimated by the least squares method.

Figure 36.46: Start Parameter Estimates for the VARMA(1, 1) Model


Optimization Start
Parameter Estimates

N  Parameter  Estimate  Gradient of Objective Function
1 AR1_1_1 1.013118 -0.987110
2 AR1_2_1 0.510233 0.163904
3 AR1_1_2 -0.399051 0.826824
4 AR1_2_2 0.441344 -9.605845
5 MA1_1_1 0.295872 -1.929771
6 MA1_2_1 -0.002809 2.408518
7 MA1_1_2 -0.044216 -0.632995
8 MA1_2_2 0.425334 0.888222


Figure 36.47 shows the default option settings for the quasi-Newton optimization technique.

Figure 36.47: Default Criteria for the Quasi-Newton Optimization

Minimum Iterations 0
Maximum Iterations 200
Maximum Function Calls 2000
ABSGCONV Gradient Criterion 0.00001
GCONV Gradient Criterion 1E-8
ABSFCONV Function Criterion 0
FCONV Function Criterion 2.220446E-16
FCONV2 Function Criterion 0
FSIZE Parameter 0
ABSXCONV Parameter Change Criterion 0
XCONV Parameter Change Criterion 0
XSIZE Parameter 0
ABSCONV Function Criterion -1.34078E154
Line Search Method 2
Starting Alpha for Line Search 1
Line Search Precision LSPRECISION 0.4
DAMPSTEP Parameter for Line Search .
Singularity Tolerance (SINGULAR) 1E-8


Figure 36.48 shows the iteration history of parameter estimates.

Figure 36.48: Iteration History of Parameter Estimates

Iteration  Restarts  Function Calls  Active Constraints  Objective Function  Objective Function Change  Max Abs Gradient Element  Step Size  Slope of Search Direction
1   0 38 0   121.86537 0.1020 6.3068 0.00376 -54.483
2   0 57 0   121.76369 0.1017 6.9381 0.0100 -33.359
3   0 76 0   121.55605 0.2076 5.1526 0.0100 -31.265
4   0 95 0   121.36386 0.1922 5.7292 0.0100 -37.419
5   0 114 0   121.25741 0.1064 5.5107 0.0100 -17.708
6   0 133 0   121.22578 0.0316 4.9213 0.0240 -2.905
7   0 152 0   121.20582 0.0200 6.0114 0.0316 -1.268
8   0 171 0   121.18747 0.0183 5.5324 0.0457 -0.805
9   0 207 0   121.13613 0.0513 0.5829 0.628 -0.164
10   0 226 0   121.13536 0.000766 0.2054 1.000 -0.0021
11   0 245 0   121.13528 0.000082 0.0534 1.000 -0.0002
12   0 264 0   121.13528 2.537E-6 0.0101 1.000 -637E-8
13   0 283 0   121.13528 1.475E-7 0.000270 1.000 -3E-7


Figure 36.49 shows the final parameter estimates.

Figure 36.49: Results of Parameter Estimates for the VARMA(1, 1) Model


Optimization Results
Parameter Estimates
N Parameter Estimate
1 AR1_1_1 1.018085
2 AR1_2_1 0.391465
3 AR1_1_2 -0.386513
4 AR1_2_2 0.552904
5 MA1_1_1 0.322906
6 MA1_2_1 -0.165661
7 MA1_1_2 -0.021533
8 MA1_2_2 0.586132


Figure 36.50 shows the lag 1 AR coefficient matrix, the lag 1 MA coefficient matrix, the parameter estimates, and their significance, which is one indication of how well the model fits the data.

Figure 36.50: Parameter Estimates for the VARMA(1, 1) Model


Type of Model VARMA(1,1)
Estimation Method Maximum Likelihood Estimation

AR
Lag Variable y1 y2
1 y1 1.01809 -0.38651
  y2 0.39147 0.55290

MA
Lag Variable e1 e2
1 y1 0.32291 -0.02153
  y2 -0.16566 0.58613

Schematic Representation
Variable/Lag AR1 MA1
y1 +- +.
y2 ++ .+
+ is > 2*std error,  - is < -2*std error,  . is between,  * is N/A

Model Parameter Estimates

Equation  Parameter  Estimate  Standard Error  t Value  Pr > |t|  Variable
y1 AR1_1_1 1.01809 0.10257 9.93 0.0001 y1(t-1)
  AR1_1_2 -0.38651 0.09644 -4.01 0.0001 y2(t-1)
  MA1_1_1 0.32291 0.14530 2.22 0.0285 e1(t-1)
  MA1_1_2 -0.02153 0.14200 -0.15 0.8798 e2(t-1)
y2 AR1_2_1 0.39147 0.10062 3.89 0.0002 y1(t-1)
  AR1_2_2 0.55290 0.08421 6.57 0.0001 y2(t-1)
  MA1_2_1 -0.16566 0.15700 -1.06 0.2939 e1(t-1)
  MA1_2_2 0.58612 0.14115 4.15 0.0001 e2(t-1)


The fitted VARMA(1,1) model with estimated standard errors in parentheses is given as

$\displaystyle  \mb {y} _ t = \left( \begin{array}{rr} 1.01809 &  -0.38651 \\ (0.10256)& (0.09644)\\ 0.39147 &  0.55290 \\ (0.10062)& (0.08421)\\ \end{array} \right) \mb {y} _{t-1} + \bepsilon _ t - \left( \begin{array}{rr} 0.32291 &  -0.02153 \\ (0.14530)&  (0.14199)\\ -0.16566 &  0.58613 \\ (0.15699)&  (0.14115)\\ \end{array} \right) \bepsilon _{t-1}  $

VARMAX Modeling

A VARMAX($p,q,s$) process is written as

$\displaystyle  \mb {y} _{t} = \bdelta + \sum _{i=1}^{p}\Phi _ i\mb {y} _{t-i} + \sum _{i=0}^{s}\Theta ^*_ i\mb {x} _{t-i} + \bepsilon _ t -\sum _{i=1}^{q} \Theta _ i\bepsilon _{t-i}  $

or

$\displaystyle  \Phi (B)\mb {y} _{t} = \bdelta + \Theta ^*(B)\mb {x} _ t + \Theta (B) \bepsilon _ t  $

where $\Phi (B) = I_ k - \sum _{i=1}^{p} \Phi _ i B^ i$, $ \Theta ^*(B) = \Theta ^*_0 + \Theta ^*_1B +\cdots + \Theta ^*_ sB^ s $, and $\Theta (B) = I_ k - \sum _{i=1}^{q} \Theta _ i B^ i$.

The dimension of the state-space vector in the Kalman filtering method for parameter estimation of the VARMAX($p$,$q$,$s$) model is large, which makes the computation expensive in both time and memory. For efficiency, the parameter estimation of the VARMAX($p$,$q$,$s$) model therefore uses a two-stage method: the deterministic terms and exogenous parameters are estimated first, and then the log-likelihood function of a VARMA($p$,$q$) model is maximized.
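The first stage can be sketched as an ordinary least squares regression of $\mb {y} _ t$ on a constant and $\mb {x} _ t,\ldots ,\mb {x} _{t-s}$, with the residual series passed to a VARMA($p$,$q$) fit in the second stage. The following NumPy code illustrates this idea only; the VARMAX procedure's internal algorithm may differ in detail:

```python
import numpy as np

def first_stage(y, x, s):
    """Stage 1: least squares of y_t on a constant and x_t, ..., x_{t-s}.
    The returned residual series is what stage 2 would fit as VARMA(p,q).
    Illustrative only; VARMAX's internal algorithm may differ in detail."""
    T = y.shape[0]
    cols = [np.ones((T - s, 1))]
    for i in range(s + 1):
        cols.append(x[s - i:T - i])        # lag-i block of x
    Z = np.hstack(cols)
    B, _res, _rank, _sv = np.linalg.lstsq(Z, y[s:], rcond=None)
    resid = y[s:] - Z @ B
    return B, resid
```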

Some examples of VARMAX modeling are as follows:

   model y1 y2 = x1 / q=1;
   nloptions tech=qn;
   model y1 y2 = x1 / p=1 q=1 xlag=1 nocurrentx;
   nloptions tech=qn;