Example 13.2 Kalman Filtering: Likelihood Function Evaluation

In the following example, the log-likelihood function of the SSM is computed by using prediction error decomposition. The annual real GNP series, $y_ t$, can be decomposed as

\[  y_ t = \mu _ t + \epsilon _ t  \]

where $\mu _ t$ is a trend component and $\epsilon _ t$ is a white noise error with $\epsilon _ t \sim (0, \sigma ^2_\epsilon )$. Refer to Nelson and Plosser (1982) for more details about these data. The trend component is assumed to be generated from the following stochastic equations:

\begin{eqnarray*}  \mu _ t &  = &  \mu _{t-1} + \beta _{t-1} + \eta _{1t} \\ \beta _ t &  = &  \beta _{t-1} + \eta _{2t} \end{eqnarray*}

where $\eta _{1t}$ and $\eta _{2t}$ are independent white noise disturbances with $\eta _{1t} \sim (0, \sigma ^2_{\eta _1})$ and $\eta _{2t} \sim (0, \sigma ^2_{\eta _2})$.
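The following is a minimal SAS/IML sketch, not part of the original example, that simulates a series from this stochastic trend model. It assumes the same disturbance variance ($10^{-3}$ for each of $\eta _{1t}$, $\eta _{2t}$, and $\epsilon _ t$) that the program later in this example uses, and an arbitrary starting level equal to the first GNP value.

proc iml;
   /* simulation sketch (not part of the original example):         */
   /* generate mu(t), beta(t), and y(t) from the trend model above  */
   call randseed(1);
   n = 61;                          /* same length as the GNP series     */
   e = j(n,3,.);
   call randgen(e, "Normal");       /* columns: eta1(t), eta2(t), eps(t) */
   e = e # sqrt(1e-3);              /* scale each disturbance to variance 1e-3 */
   mu = j(n,1,0);                   /* trend level mu(t)                 */
   slope = j(n,1,0);                /* trend slope beta(t)               */
   ysim = j(n,1,0);                 /* simulated series y(t)             */
   mu[1] = 116.8;                   /* arbitrary starting level          */
   ysim[1] = mu[1] + e[1,3];
   do i = 2 to n;
      mu[i]    = mu[i-1] + slope[i-1] + e[i,1];
      slope[i] = slope[i-1] + e[i,2];
      ysim[i]  = mu[i] + e[i,3];
   end;
   print (ysim[1:5])[label="ysim"];
quit;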

It is straightforward to construct the SSM of the real GNP series.

\begin{eqnarray*}  y_ t &  = &  \bH \textbf{z}_ t + \epsilon _ t \\ \textbf{z}_ t &  = &  \bF \textbf{z}_{t-1} + \eta _ t \end{eqnarray*}

where

\begin{eqnarray*}  \bH &  = &  (1, 0) \\[0.10in] \bF &  = &  \left[ \begin{array}{cc} 1 &  1 \\ 0 &  1 \end{array} \right] \\[0.10in] \textbf{z}_ t &  = &  (\mu _ t, \beta _ t)^{\prime } \\ \eta _ t &  = &  (\eta _{1t}, \eta _{2t})^{\prime } \\[0.10in] \mathrm{Var} \left( \left[ \begin{array}{c} \eta _ t \\ \epsilon _ t \end{array} \right] \right) &  = &  \left[ \begin{array}{ccc} \sigma ^2_{\eta _1} &  0 &  0 \\ 0 &  \sigma ^2_{\eta _2} &  0 \\ 0 &  0 &  \sigma ^2_\epsilon \end{array} \right] \end{eqnarray*}
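Multiplying out the state equation verifies that this SSM reproduces the trend equations given earlier:

\[  \bF \textbf{z}_{t-1} + \eta _ t = \left[ \begin{array}{cc} 1 &  1 \\ 0 &  1 \end{array} \right] \left[ \begin{array}{c} \mu _{t-1} \\ \beta _{t-1} \end{array} \right] + \left[ \begin{array}{c} \eta _{1t} \\ \eta _{2t} \end{array} \right] = \left[ \begin{array}{c} \mu _{t-1} + \beta _{t-1} + \eta _{1t} \\ \beta _{t-1} + \eta _{2t} \end{array} \right]  \]

and the measurement equation gives $y_ t = \bH \textbf{z}_ t + \epsilon _ t = \mu _ t + \epsilon _ t$, as required.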

When the observation noise $\epsilon _ t$ is normally distributed, the average log-likelihood function of the SSM is

\begin{eqnarray*}  \ell &  = &  \frac{1}{T} \sum _{t=1}^ T \ell _ t \\[0.10in] \ell _ t &  = &  -\frac{N_ y}{2}\log (2\pi ) - \frac{1}{2}\log (|\bC _ t|) - \frac{1}{2} \hat{\epsilon }^{\prime }_ t \bC ^{-1}_ t \hat{\epsilon }_ t \end{eqnarray*}

where $N_ y$ is the dimension of the observation vector, $\hat{\epsilon }_ t = y_ t - \bH \textbf{z}_{t|t-1}$ is the one-step prediction error, and $\bC _ t$ is its mean square error matrix, such that $\bC _ t = \bH \bP _{t|t-1} \bH ^{\prime } + \bR _ t$.
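Because the real GNP series is univariate ($N_ y = 1$), both $\hat{\epsilon }_ t$ and $\bC _ t$ are scalars, and the average log-likelihood function reduces to

\[  \ell = -\frac{1}{2}\log (2\pi ) - \frac{1}{2T} \sum _{t=1}^{T} \left[ \log (\bC _ t) + \frac{\hat{\epsilon }_ t^{2}}{\bC _ t} \right]  \]

This is exactly the quantity that the LIK module below computes: const equals $\log (2\pi )$ because $k=1$, sum1 accumulates $\log (\bC _ t)$, and sum2 accumulates $\hat{\epsilon }_ t^{2} / \bC _ t$.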

The LIK module computes the average log-likelihood function. First, the average log-likelihood function is computed by using the default initial values: Z0=0 and VZ0=$10^6$I. The second call of the LIK module produces the average log-likelihood function with the given initial conditions: Z0=0 and VZ0=$10^{-3}$I. Note the sizable difference in Output 13.2.1 between the uncertain initial condition (VZ0=$10^6$I) and the almost deterministic initial condition (VZ0=$10^{-3}$I).
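The size of this difference comes mostly from the first term of the sum. With Z0=0, the first one-step prediction of the series is 0, so the first prediction error is the first observation itself:

\[  \hat{\epsilon }_1 = y_1 - \bH \textbf{z}_{1|0} = 116.8 , \qquad \bC _1 = \bH \bP _{1|0} \bH ^{\prime } + \sigma ^2_\epsilon  \]

Under VZ0=$10^{-3}$I, $\bC _1$ is of order $10^{-3}$, so the quadratic term $\hat{\epsilon }_1^{2} / \bC _1$ alone is of order $10^{6}$ and, after division by $2T$, contributes tens of thousands to $-\ell$. Under VZ0=$10^{6}$I, $\bC _1$ is of order $10^{6}$ and the same term is negligible.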

Finally, the first 16 observations of the one-step predictions, filtered values, and real GNP series are produced under the moderate initial condition (VZ0=$10$I). The data are the annual real GNP for the years 1909 to 1969. Here is the code:

 title 'Likelihood Evaluation of SSM';
 title2 'DATA: Annual Real GNP 1909-1969';
 data gnp;
    input y @@;
 datalines;
 116.8 120.1 123.2 130.2 131.4 125.6 124.5 134.3
 135.2 151.8 146.4 139.0 127.8 147.0 165.9 165.5
 179.4 190.0 189.8 190.9 203.6 183.5 169.3 144.2
 141.5 154.3 169.5 193.0 203.2 192.9 209.4 227.2
 263.7 297.8 337.1 361.3 355.2 312.6 309.9 323.7
 324.1 355.3 383.4 395.1 412.8 406.0 438.0 446.1
 452.5 447.3 475.9 487.7 497.2 529.8 551.0 581.1
 617.8 658.1 675.2 706.6 724.7
 ;
 run;
proc iml;
/*-- LIK: average log likelihood of the SSM by the prediction error decomposition --*/
start lik(y,a,b,f,h,var,z0,vz0);
    nz = nrow(f);                 /* dimension of the state vector */
    n = nrow(y);                  /* number of observations        */
    k = ncol(y);                  /* dimension of the observation  */
    const = k*log(8*atan(1));     /* k*log(2*pi)                   */
    if ( sum(z0 = .) | sum(vz0 = .) ) then
       /* Z0 or VZ0 missing: let KALCVF use its default initial values */
       call kalcvf(pred,vpred,filt,vfilt,y,0,a,f,b,h,var);
    else
       call kalcvf(pred,vpred,filt,vfilt,y,0,a,f,b,h,var,z0,vz0);
    et = y - pred*h`;             /* one-step prediction errors    */
    sum1 = 0;
    sum2 = 0;

    do i = 1 to n;
       vpred_i = vpred[(i-1)*nz+1:i*nz,];            /* P(t|t-1) */
       et_i = et[i,];
       ft = h*vpred_i*h` + var[nz+1:nz+k,nz+1:nz+k]; /* C(t)     */
       sum1 = sum1 + log(det(ft));
       sum2 = sum2 + et_i*inv(ft)*et_i`;
    end;
    return(-.5*const-.5*(sum1+sum2)/n);
finish;

use gnp;
read all var {y};
close gnp;

f = {1 1, 0 1};                          /* transition matrix F              */
h = {1 0};                               /* measurement matrix H             */
a = j(nrow(f),1,0);                      /* no intercept in the state eq.    */
b = j(nrow(h),1,0);                      /* no intercept in the measurement  */
var = diag(j(1,nrow(f)+ncol(y),1e-3));   /* Var(eta1, eta2, eps) = 1e-3 each */
/*-- initial values are computed --*/
z0 = j(1,nrow(f),.);
vz0 = j(nrow(f),nrow(f),.);
logl = lik(y,a,b,f,h,var,z0,vz0);
print 'No initial values are given', logl;
/*-- initial values are given --*/
z0 = j(1,nrow(f),0);
vz0 = 1e-3#i(nrow(f));
logl = lik(y,a,b,f,h,var,z0,vz0);
print 'Initial values are given', logl;
/*-- moderate initial condition: VZ0 = 10*I --*/
z0 = j(1,nrow(f),0);
vz0 = 10#i(nrow(f));
call kalcvf(pred0,vpred,filt0,vfilt,y,1,a,f,b,h,var,z0,vz0);

/*-- keep the first 16 observations for printing --*/
y = y[1:16];
pred = pred0[1:16,];
filt = filt0[1:16,];
print y pred filt;
quit;

Output 13.2.1: Average Log Likelihood of SSM

Likelihood Evaluation of SSM
DATA: Annual Real GNP 1909-1969

No initial values are given

logl
-26313.74

Initial values are given

logl
-91883.49


Output 13.2.2 shows the observed data, the predicted state vectors, and the filtered state vectors for the first 16 observations.

Output 13.2.2: Filtering and One-Step Prediction

y pred   filt  
116.8 0 0 116.78832 0
120.1 116.78832 0 120.09967 3.3106857
123.2 123.41035 3.3106857 123.22338 3.1938303
130.2 126.41721 3.1938303 129.59203 4.8825531
131.4 134.47459 4.8825531 131.93806 3.5758561
125.6 135.51391 3.5758561 127.36247 -0.610017
124.5 126.75246 -0.610017 124.90123 -1.560708
134.3 123.34052 -1.560708 132.34754 3.0651076
135.2 135.41265 3.0651076 135.23788 2.9753526
151.8 138.21324 2.9753526 149.37947 8.7100967
146.4 158.08957 8.7100967 148.48254 3.7761324
139 152.25867 3.7761324 141.36208 -1.82012
127.8 139.54196 -1.82012 129.89187 -6.776195
147 123.11568 -6.776195 142.74492 3.3049584
165.9 146.04988 3.3049584 162.36363 11.683345
165.5 174.04698 11.683345 167.02267 8.075817