Time Series Analysis and Examples


Example 13.1 Kalman Filtering: Likelihood Function Evaluation

In the following example, the log-likelihood function of the SSM is computed by using a prediction-error decomposition. The data are the annual real gross national product (GNP) for the years 1909–1969. The GNP series $y_ t$ can be decomposed as

\[  y_ t = \mu _ t + \epsilon _ t  \]

where $\mu _ t$ is a trend component and $\epsilon _ t \sim (0, \sigma ^2_\epsilon )$ is a white noise error term. For more information about these data, see Nelson and Plosser (1982) The trend component is assumed to be generated from the stochastic equations

\begin{eqnarray*}  \mu _ t &  = &  \mu _{t-1} + \beta _{t-1} + \eta _{1t} \\ \beta _ t &  = &  \beta _{t-1} + \eta _{2t} \end{eqnarray*}

where $\eta _{1t} \sim (0, \sigma ^2_{\eta _1})$ and $\eta _{2t} \sim (0, \sigma ^2_{\eta _2})$ are independent white noise disturbances.

It is straightforward to construct the SSM of the real GNP series,

\begin{eqnarray*}  y_ t &  = &  \bH \mb{z}_ t + \epsilon _ t \\ \mb{z}_ t &  = &  \bF \mb{z}_{t-1} + \eta _ t \end{eqnarray*}

where

\begin{eqnarray*}  \bH &  = &  (1, 0) \\[0.10in] \bF &  = &  \left[ \begin{array}{cc} 1 &  1 \\ 0 &  1 \end{array} \right] \\[0.10in] \mb{z}_ t &  = &  (\mu _ t, \beta _ t)^{\prime } \\ \eta _ t &  = &  (\eta _{1t}, \eta _{2t})^{\prime } \\[0.10in] \mathrm{Var} \left( \left[ \begin{array}{c} \eta _ t \\ \epsilon _ t \end{array} \right] \right) &  = &  \left[ \begin{array}{ccc} \sigma ^2_{\eta 1} &  0 &  0 \\ 0 &  \sigma ^2_{\eta 2} &  0 \\ 0 &  0 &  \sigma ^2_\epsilon \end{array} \right] \end{eqnarray*}

When the observation noise $\epsilon _ t$ is normally distributed, the average log-likelihood function of the SSM is

\begin{eqnarray*}  \ell &  = &  \frac{1}{T} \sum _{t=1}^ T \ell _ t \\[0.10in] \ell _ t &  = &  -\frac{N_ y}{2}\log (2\pi ) - \frac{1}{2}\log (|\bC _ t|) - \frac{1}{2} \hat{\epsilon }^{\prime }_ t \bC ^{-1}_ t \hat{\epsilon }_ t \end{eqnarray*}

where $\bC _ t$ is the mean square error matrix of the prediction error $\hat{\epsilon }_ t$, such that $\bC _ t = \bH \bP _{t|t-1} \bH ^{\prime } + \bR _ t$.

As an example, consider the following annual real GNP data for 1909–1969:

title 'Likelihood Evaluation of SSM';
title2 'DATA: Annual Real GNP 1909-1969';
data gnp;
  input y @@;
datalines;
116.8 120.1 123.2 130.2 131.4 125.6 124.5 134.3
135.2 151.8 146.4 139.0 127.8 147.0 165.9 165.5
179.4 190.0 189.8 190.9 203.6 183.5 169.3 144.2
141.5 154.3 169.5 193.0 203.2 192.9 209.4 227.2
263.7 297.8 337.1 361.3 355.2 312.6 309.9 323.7
324.1 355.3 383.4 395.1 412.8 406.0 438.0 446.1
452.5 447.3 475.9 487.7 497.2 529.8 551.0 581.1
617.8 658.1 675.2 706.6 724.7
;

In the following program, the LIK module computes the average log-likelihood function. First, the average log-likelihood function is computed by using the default initial values: 0 for Z0 and the diagonal matrix $10^6I$ for VZ0. The second call of the module LIK produces the average log-likelihood function with the given initial conditions: Z0 = 0 and VZ0 = $10^{-3}I$. Output 13.1.1 shows a sizable difference between the uncertain initial condition (VZ0 = $10^6I$) and the almost deterministic initial condition (VZ0 = $10^{-3}I$).

proc iml;
start lik(y,a,b,f,h,var,z0,vz0);
    nz = nrow(f);  n = nrow(y);   k = ncol(y);
    pi = constant('pi');
    const = k*log(2*pi);
    if ( sum(z0 = .) | sum(vz0 = .) ) then
       call kalcvf(pred,vpred,filt,vfilt,y,0,a,f,b,h,var);
    else
       call kalcvf(pred,vpred,filt,vfilt,y,0,a,f,b,h,var,z0,vz0);
    et = y - pred*h`;

    sum1 = 0;  sum2 = 0;
    do i = 1 to n;
       vpred_i = vpred[(i-1)*nz+1:i*nz,];
       et_i = et[i,];
       ft = h*vpred_i*h` + var[nz+1:nz+k,nz+1:nz+k];
       sum1 = sum1 + log(det(ft));
       sum2 = sum2 + et_i*inv(ft)*et_i`;
    end;
    return(-.5*const-.5*(sum1+sum2)/n);
finish;

use gnp;
read all var {y};
close gnp;

f = {1 1, 0 1};
h = {1 0};
a = j(nrow(f),1,0);
b = j(nrow(h),1,0);
var = diag(j(1,nrow(f)+ncol(y),1e-3));

/*-- initial values are computed --*/
z0 = j(1,nrow(f),.);
vz0 = j(nrow(f),nrow(f),.);
logl = lik(y,a,b,f,h,var,z0,vz0);
print 'No initial values are provided', logl;

/*-- initial values are given --*/
z0 = j(1,nrow(f),0);
vz0 = 1e-3#i(nrow(f));
logl = lik(y,a,b,f,h,var,z0,vz0);
print 'Initial values are provided', logl;

Output 13.1.1: Average Log Likelihood of SSM

Likelihood Evaluation of SSM
DATA: Annual Real GNP 1909-1969

No initial values are provided

logl
-26313.74

Initial values are provided

logl
-91883.49



The following statements compute one-step predictions, filtered values, and real GNP series under the moderate initial condition (VZ0 = $10I$). Output 13.1.2 shows the observed data, the predicted state vectors, and the filtered state vectors for the first 16 observations.

z0 = j(1,nrow(f),0);
vz0 = 10#i(nrow(f));
call kalcvf(pred0,vpred,filt0,vfilt,y,1,a,f,b,h,var,z0,vz0);

/* print results for the first few observations */
y0 = y;
y    = y0[1:16];
pred = pred0[1:16,];
filt = filt0[1:16,];
print y pred filt;

Output 13.1.2: Filtering and One-Step Prediction

y pred   filt  
116.8 0 0 116.78832 0
120.1 116.78832 0 120.09967 3.3106857
123.2 123.41035 3.3106857 123.22338 3.1938303
130.2 126.41721 3.1938303 129.59203 4.8825531
131.4 134.47459 4.8825531 131.93806 3.5758561
125.6 135.51391 3.5758561 127.36247 -0.610017
124.5 126.75246 -0.610017 124.90123 -1.560708
134.3 123.34052 -1.560708 132.34754 3.0651076
135.2 135.41265 3.0651076 135.23788 2.9753526
151.8 138.21324 2.9753526 149.37947 8.7100967
146.4 158.08957 8.7100967 148.48254 3.7761324
139 152.25867 3.7761324 141.36208 -1.82012
127.8 139.54196 -1.82012 129.89187 -6.776195
147 123.11568 -6.776195 142.74492 3.3049584
165.9 146.04988 3.3049584 162.36363 11.683345
165.5 174.04698 11.683345 167.02267 8.075817