The STATESPACE Procedure

Parameter Estimation

The model is ${\mb {z}_{t+1}=\mb {Fz}_{t}+\mb {Ge}_{t+1}}$, where ${\mb {e}_{t}}$ is a sequence of independent multivariate normal innovations with mean vector 0 and variance ${{\Sigma }_{\mb {ee}}}$. The observed sequence ${\mb {x}_{t}}$ composes the first r components of ${\mb {z}_{t}}$, and thus ${\mb {x}_{t}=\mb {Hz}_{t}}$, where H is the ${r \times s}$ matrix ${\left[{\mb {I}_{r}\; \mb {0}}\right]}$.

Let ${\mb {E}}$ be the ${r \times n}$ matrix of innovations:

$\displaystyle  \Strong{E} = \left[\begin{matrix} \mb {e}_{1}   &  {\cdots }   &  \mb {e}_{n}   \end{matrix}\right] \nonumber  $

If the number of observations n is reasonably large, the log likelihood L can be approximated up to an additive constant as follows:

\[  L = -\frac{n}{2} {\ln }( {|\bSigma _{\mb {ee}}|} ) -\frac{1}{2}{trace} ( \bSigma ^{-1}_{\mb {ee}} \mb {EE} ’)  \]

The elements of ${\bSigma _{\mb {ee}}}$ are taken as free parameters and are estimated as follows:

\[  \mb {S}_{0} = \frac{1}{n}\mb {EE} ’  \]

Replacing ${\bSigma _{\mb {ee}}}$ by ${\mb {S}_{0}}$ in the likelihood equation, the log likelihood, up to an additive constant, is

\[  \mb {L} = -\frac{n}{2}{\ln }( {|\mb {S}_{0}|} )  \]

Letting B be the backshift operator, the formal relation between ${\mb {x}_{t}}$ and ${\mb {e}_{t}}$ is

\[  \mb {x}_{t}=\mb {H} (\mb {I} -{B}\mb {F} )^{-1}\mb {Ge}_{t}  \]
\[  \mb {e}_{t} = (\mb {H} (\mb {I} -{B}\mb {F} )^{-1}\mb {G} )^{-1}\mb {x}_{t} = \sum _{i=0}^{{\infty }}{\bXi _{i}\mb {x}_{t-i} }  \]

Letting ${\mb {C}_{i}}$ be the ith lagged sample covariance of ${\mb {x}_{t}}$ and neglecting end effects, the matrix ${\mb {S}_{0}}$ is

\[  \mb {S}_{0} = \sum _{i,j=0}^{{\infty }}{\bXi _{i}\mb {C}_{-i+j} \bXi ^{}_{j} }  \]

For the computation of ${\mb {S}_{0}}$, the infinite sum is truncated at the value of the KLAG= option. The value of the KLAG= option should be large enough that the sequence ${\bXi _{i}}$ is approximately 0 beyond that point.

Let ${\btheta }$ be the vector of free parameters in the ${\mb {F}}$ and ${\mb {G}}$ matrices. The derivative of the log likelihood with respect to the parameter $\btheta $ is

$\displaystyle  \frac{{\partial }L}{{\partial }\btheta } = -\frac{n}{2} \, \textrm{trace} \left( \Strong{S} ^{-1}_{0} \frac{{\partial }\Strong{S}_{0}}{{\partial }\btheta } \right) \nonumber  $

The second derivative is

$\displaystyle  \frac{{\partial }^{2}\Strong{L}}{{\partial }\btheta {\partial }\btheta } = \frac{n}{2}\left( \textrm{trace} \left( \Strong{S} ^{-1}_{0} \frac{{\partial }\Strong{S}_{0}}{{\partial }\btheta } \Strong{S} ^{-1}_{0} \frac{{\partial }\Strong{S}_{0}}{{\partial }\btheta } \right) - \textrm{trace} \left( \Strong{S} ^{-1}_{0} \frac{{\partial }^{2}\Strong{S}_{0}}{{\partial }\btheta {\partial }\btheta } \right) \right) \nonumber  $

Near the maximum, the first term is unimportant and the second term can be approximated to give the following second derivative approximation:

$\displaystyle  \frac{{\partial }^{2}L}{{\partial }\btheta {\partial }\btheta } \cong -n\,  \textrm{trace} \left( \Strong{S} ^{-1}_{0} \frac{{\partial }\Strong{E}}{{\partial }\btheta } \frac{{\partial }\Strong{E} }{{\partial }\btheta }\right) \nonumber  $

The first derivative matrix and this second derivative matrix approximation are computed from the sample covariance matrix ${\mb {C}_{0}}$ and the truncated sequence ${\bXi _{i}}$. The approximate likelihood function is maximized by a modified Newton-Raphson algorithm that employs these derivative matrices.

The matrix ${\mb {S}_{0}}$ is used as the estimate of the innovation covariance matrix, ${\bSigma _{\mb {ee}}}$. The negative of the inverse of the second derivative matrix at the maximum is used as an approximate covariance matrix for the parameter estimates. The standard errors of the parameter estimates printed in the parameter estimates tables are taken from the diagonal of this covariance matrix. The parameter covariance matrix is printed when the COVB option is specified.

If the data are nearly nonstationary, a better estimate of ${\bSigma _{\mb {ee}}}$ and the other parameters can sometimes be obtained by specifying the RESIDEST option. The RESIDEST option estimates the parameters by using conditional least squares instead of maximum likelihood.

The residuals are computed using the state space equation and the sample mean values of the variables in the model as start-up values. The estimate of ${\mb {S}_{0}}$ is then computed using the residuals from the ith observation on, where i is the maximum number of times any variable occurs in the state vector. A multivariate Gauss-Marquardt algorithm is used to minimize ${{|\mb {S}_{0}|}}$. See Harvey (1981a) for a further description of this method.