 
                
               

The Kalman filter and smoother (KFS) algorithm is the main computational tool for using SSMs for data analysis. This subsection briefly describes the basic quantities generated by this algorithm and their relationship to the output generated by the SSM procedure. For proper treatment of SSMs with a diffuse initial condition or when regression variables are present, a modified version of the traditional KFS, called the diffuse Kalman filter and smoother (DKFS), is needed. A good discussion of the different variants of the traditional and diffuse KFS can be found in Durbin and Koopman (2001). The DKFS implemented in the SSM procedure closely follows the treatment in de Jong and Chu-Chun-Lin (2003). Additional details can be found in these references.

The state space model equations (see the section State Space Model and Notation) imply that the combined response data vector $\mathbf{Y}$ has a Gaussian probability distribution. This probability distribution is proper if $d$, the dimension of the diffuse vector $\pmb{\delta}$ in the initial condition, is 0 and if $k$, the number of regression variables in the observation equation, is also 0 (the regression parameter $\pmb{\beta}$ is also treated as a diffuse vector). Otherwise, this probability distribution is improper. The KFS algorithm is a combination of two iterative phases: a forward pass through the data, called filtering, and a backward pass through the data, called smoothing, that uses the quantities generated during filtering. One of the advantages of using the SSM formulation to analyze time series data is its ability to handle missing values in the response variables. The KFS algorithm appropriately handles the missing values in $\mathbf{Y}$. For additional information about how PROC SSM handles missing values, see the section Missing Values.
         
The filtering pass sequentially computes the quantities shown in Table 27.5 for $t = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, q * p_{t}$.
            
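Table 27.5: KFS: Filtering Phase

| Quantity | Description |
|---|---|
| $\hat{y}_{t,i} = E(y_{t,i} \mid y_{t,i-1}, \ldots, y_{1,1})$ | One-step-ahead prediction of the response values |
| $\nu_{t,i} = y_{t,i} - \hat{y}_{t,i}$ | One-step-ahead prediction residuals |
| $F_{t,i} = \mathrm{Var}(y_{t,i} \mid y_{t,i-1}, \ldots, y_{1,1})$ | Variance of the one-step-ahead prediction |
| $\hat{\pmb{\alpha}}_{t,i} = E(\pmb{\alpha}_{t} \mid y_{t,i-1}, \ldots, y_{1,1})$ | One-step-ahead prediction of the state vector |
| $\mathbf{P}_{t,i}$ | Covariance of $\hat{\pmb{\alpha}}_{t,i}$ |
| $\mathbf{b}_{t,i}$ | $(d + k)$-dimensional vector |
| $\mathbf{S}_{t,i}$ | $(d + k) \times (d + k)$-dimensional symmetric matrix |
| $(\hat{\pmb{\delta}}, \hat{\pmb{\beta}})_{t,i} = \mathbf{S}_{t,i}^{-1} \mathbf{b}_{t,i}$ | Estimate of $(\pmb{\delta}, \pmb{\beta})$ |
| $\mathbf{S}_{t,i}^{-1}$ | Covariance of $(\hat{\pmb{\delta}}, \hat{\pmb{\beta}})_{t,i}$ |
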
Here the notation $E(y_{t,i} \mid y_{t,i-1}, \ldots, y_{1,1})$ denotes the conditional expectation of $y_{t,i}$ given the history up to the index $(t, i-1)$: $(y_{t,i-1}, y_{t,i-2}, \ldots, y_{1,1})$. Similarly, $\mathrm{Var}(y_{t,i} \mid y_{t,i-1}, \ldots, y_{1,1})$ denotes the corresponding conditional variance. The quantity $\nu_{t,i}$ is set to missing whenever $y_{t,i}$ is missing. Note that the $\hat{y}_{t,i}$ are one-step-ahead forecasts only when the model has only one response variable and the data are a time series; in all other cases it is more appropriate to call them one-measurement-ahead forecasts (since the next measurement might be at the same time point). Despite this, the $\hat{y}_{t,i}$ are called one-step-ahead predictions (and the $\nu_{t,i}$ are called one-step-ahead residuals) throughout this document. In the diffuse case, the conditional expectations must be appropriately interpreted. The vector $\mathbf{b}_{t,i}$ and the matrix $\mathbf{S}_{t,i}$ contain some accumulated quantities that are needed for the estimation of $\pmb{\delta}$ and $\pmb{\beta}$. Of course, when $d + k = 0$ (the nondiffuse case), these quantities are not needed. In the diffuse case, because the matrix $\mathbf{S}_{t,i}$ is sequentially accumulated (starting at $t = 1, i = 1$), it might not be invertible until some index $(t^{\ast}, i^{\ast})$. The filtering process is called initialized after $(t^{\ast}, i^{\ast})$. In some situations, this initialization might not happen even after the entire sample is processed; that is, the filtering process remains uninitialized. This can happen if the regression variables are collinear or if the data are not sufficient to estimate the initial condition $\pmb{\delta}$ for some other reason.
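
The forward pass can be illustrated on the simplest possible case. The following is a minimal sketch, not the DKFS that PROC SSM implements: it assumes a univariate local level model ($y_t = \alpha_t + \epsilon_t$, $\alpha_{t+1} = \alpha_t + \eta_t$) with a proper (nondiffuse) Gaussian prior on the initial state, so $d + k = 0$ and the quantities $\mathbf{b}_{t,i}$ and $\mathbf{S}_{t,i}$ are not needed. The function name `filter_local_level` and its arguments are hypothetical; missing responses are encoded as `None`.

```python
from typing import List, Optional, Tuple


def filter_local_level(
    y: List[Optional[float]],
    a1: float,
    p1: float,
    sigma2_eps: float,
    sigma2_eta: float,
) -> Tuple[List[float], List[Optional[float]], List[float]]:
    """Forward (filtering) pass for a local level model with a proper prior.

    Returns the one-step-ahead predictions, the one-step-ahead residuals
    (None where the response is missing), and the prediction variances.
    """
    a, p = a1, p1  # predicted state mean and variance before seeing y[0]
    preds: List[float] = []
    resids: List[Optional[float]] = []
    variances: List[float] = []
    for obs in y:
        f = p + sigma2_eps  # variance of the one-step-ahead prediction
        preds.append(a)     # predicted response equals the predicted level
        variances.append(f)
        if obs is None:
            # Missing response: residual is missing and no measurement update.
            resids.append(None)
        else:
            v = obs - a         # one-step-ahead residual
            resids.append(v)
            gain = p / f        # Kalman gain
            a = a + gain * v    # measurement update of the state mean
            p = p * (1.0 - gain)
        # Time update: the level is a random walk, so only the variance grows.
        p = p + sigma2_eta
    return preds, resids, variances


preds, resids, variances = filter_local_level(
    [1.0, None, 2.0], a1=0.0, p1=1.0, sigma2_eps=1.0, sigma2_eta=1.0
)
```

Note how the missing second observation leaves the prediction unchanged while the prediction variance keeps growing, which is the sense in which the filter handles missing values without any special preprocessing.
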
            
The filtering process is used for a variety of purposes. One important use of filtering is to compute the likelihood of the data. In the model-fitting phase, the unknown model parameters $\pmb{\theta}$ are estimated by maximum likelihood. This requires repeated evaluation of the likelihood at different trial values of $\pmb{\theta}$. After $\pmb{\theta}$ is estimated, it is treated as a known vector. The filtering process is used again with the fitted model in the forecasting phase, when the one-step-ahead forecasts and residuals based on the fitted model are provided. In addition, this filtering output is needed by the smoothing phase to produce the full-sample component estimates.
            
In view of the Gaussian nature of the response vector, the likelihood of $\mathbf{Y}$, $\mathbf{L}(\mathbf{Y}, \pmb{\theta})$, can be computed by using the prediction-error decomposition, which leads to the formula
            
$$
-2 \log \mathbf{L}(\mathbf{Y}, \pmb{\theta}) = N_{0} \log 2\pi + \sum_{t=1}^{n} \sum_{i=1}^{q * p_{t}} \left( \log F_{t,i} + \frac{\nu_{t,i}^{2}}{F_{t,i}} \right) - \log \left( | \mathbf{S}_{n,p_{n}}^{-1} | \right) - \mathbf{b}_{n,p_{n}}' \mathbf{S}_{n,p_{n}}^{-1} \mathbf{b}_{n,p_{n}}
$$
where $N_{0} = (N - k - d)$, $| \mathbf{S}_{n,p_{n}}^{-1} |$ denotes the determinant of $\mathbf{S}_{n,p_{n}}^{-1}$, and $\mathbf{b}_{n,p_{n}}'$ denotes the transpose of the column vector $\mathbf{b}_{n,p_{n}}$. In the preceding formula, the terms that are associated with the missing response values $y_{t,i}$ are excluded and $N$ denotes the total number of nonmissing response values in the sample. If $\mathbf{S}_{n,p_{n}}$ is not invertible, then a generalized inverse is used in place of $\mathbf{S}_{n,p_{n}}^{-1}$, and $| \mathbf{S}_{n,p_{n}}^{-1} |$ is computed based on the nonzero eigenvalues of $\mathbf{S}_{n,p_{n}}$. Moreover, in this case $N_{0} = N - \mathrm{rank}(\mathbf{S}_{n,p_{n}})$. When $\mathbf{Y}$ has a proper distribution (that is, when $d + k = 0$), the terms that involve $\mathbf{S}_{n,p_{n}}$ and $\mathbf{b}_{n,p_{n}}$ are absent and the preceding likelihood is proper. Otherwise, it is called the diffuse likelihood or the restricted likelihood.
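
In the proper (nondiffuse) case the prediction-error decomposition reduces to a sum over the nonmissing residuals. The sketch below assumes $d + k = 0$, so the terms that involve $\mathbf{S}$ and $\mathbf{b}$ drop out and the multiplier of $\log 2\pi$ is simply the number of nonmissing responses. The function name `minus_two_log_likelihood` is hypothetical, and missing residuals are assumed to be encoded as `None`.

```python
import math
from typing import List, Optional


def minus_two_log_likelihood(
    resids: List[Optional[float]], variances: List[float]
) -> float:
    """Return -2 log L from one-step-ahead residuals and their variances.

    Assumes the nondiffuse case; terms for missing responses are excluded.
    """
    n_used = 0    # number of nonmissing response values
    total = 0.0   # running sum of log F + nu^2 / F
    for v, f in zip(resids, variances):
        if v is None:
            continue  # skip terms associated with missing responses
        n_used += 1
        total += math.log(f) + v * v / f
    return n_used * math.log(2.0 * math.pi) + total


# One observation with residual 1 and prediction variance 2, plus one
# missing value that must not contribute to the result.
value = minus_two_log_likelihood([1.0, None], [2.0, 5.0])
```

For a single Gaussian observation with mean 0 and variance 2 evaluated at 1, the density gives $-2 \log L = \log(4\pi) + 1/2$, which is what the function returns for the example inputs.
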
               
            
When the model specification contains any unknown parameters $\pmb{\theta}$, they are estimated by maximizing the preceding likelihood function. This is done by using a nonlinear optimization process that involves repeated evaluations of $\mathbf{L}(\mathbf{Y}, \pmb{\theta})$ at different values of $\pmb{\theta}$. The maximum likelihood (ML) estimate of $\pmb{\theta}$ is denoted by $\hat{\pmb{\theta}}$. When the restricted likelihood is used for computing $\hat{\pmb{\theta}}$, the estimate is called the restricted maximum likelihood (REML) estimate. Approximate standard errors of $\hat{\pmb{\theta}}$ are computed by taking the square root of the diagonal elements of its (approximate) covariance matrix. This covariance is computed as $-\mathbf{H}^{-1}$, where $\mathbf{H}$ is the Hessian (the matrix of the second-order partials) of $\log \mathbf{L}(\mathbf{Y}, \pmb{\theta})$ evaluated at the optimum $\hat{\pmb{\theta}}$.
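
The standard-error computation can be illustrated with a single parameter, where the Hessian is just a second derivative. The following sketch is only an illustration of the inverse-negative-Hessian idea: it uses a central finite difference (an assumption; the source does not say how PROC SSM obtains the Hessian) and a toy Gaussian log likelihood with known variance in place of a state space likelihood. All function names are hypothetical.

```python
import math
from typing import Callable


def second_derivative(f: Callable[[float], float], x: float, h: float = 1e-3) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def standard_error(log_lik: Callable[[float], float], theta_hat: float) -> float:
    """Square root of the inverse of the negative Hessian at the optimum."""
    hessian = second_derivative(log_lik, theta_hat)
    covariance = -1.0 / hessian  # scalar analogue of the inverse negative Hessian
    return math.sqrt(covariance)


def gaussian_log_lik(mu: float) -> float:
    """Toy log likelihood: four observations, known variance 4, unknown mean."""
    data = (1.0, 2.0, 3.0, 4.0)
    sigma2 = 4.0
    ss = sum((obs - mu) ** 2 for obs in data)
    return -0.5 * len(data) * math.log(2.0 * math.pi * sigma2) - 0.5 * ss / sigma2


# The ML estimate of the mean is the sample average, 2.5.
se = standard_error(gaussian_log_lik, 2.5)
```

Here the exact answer is $\sigma / \sqrt{n} = 2 / \sqrt{4} = 1$, so the numerical result should be very close to 1.
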
            
Let $nparm$ denote the dimension of the parameter vector $\pmb{\theta}$. After the parameter estimation is completed, a table called “Likelihood Computation Summary” is printed. It summarizes the likelihood calculations at $\hat{\pmb{\theta}}$ as shown in Table 27.6.
            
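Table 27.6: Likelihood Computation Summary

| Quantity | Formula |
|---|---|
| Nonmissing response values used | $N$ |
| Estimated parameters | $nparm$ |
| Initialized diffuse state elements | $\mathrm{rank}(\mathbf{S}_{n,p_{n}})$ |
| Normalized residual sum of squares | $\sum_{t=1}^{n} \sum_{i=1}^{q * p_{t}} \left( \nu_{t,i}^{2} / F_{t,i} \right) - \mathbf{b}_{n,p_{n}}' \mathbf{S}_{n,p_{n}}^{-1} \mathbf{b}_{n,p_{n}}$ |
| Full log likelihood | $\log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}})$ |
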
In addition, the “Likelihood Based Information Criteria” table reports a variety of information-based criteria, which are functions of $\log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}})$, $nparm$, and $N_{0}$. Table 27.7 summarizes the reported information criteria in smaller-is-better form:
            
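Table 27.7: Information Criteria

| Criterion | Formula | Reference |
|---|---|---|
| AIC | $-2 \log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}}) + 2\, nparm$ | Akaike (1974) |
| AICC | $-2 \log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}}) + 2\, nparm\, N_{0} / (N_{0} - nparm - 1)$ | Hurvich and Tsai (1989); Burnham and Anderson (1998) |
| HQIC | $-2 \log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}}) + 2\, nparm \log \log (N_{0})$ | Hannan and Quinn (1979) |
| BIC | $-2 \log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}}) + nparm \log (N_{0})$ | Schwarz (1978) |
| CAIC | $-2 \log \mathbf{L}(\mathbf{Y}, \hat{\pmb{\theta}}) + nparm\, (\log (N_{0}) + 1)$ | Bozdogan (1987) |
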
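
The criteria named in Table 27.7 are simple functions of the maximized log likelihood, the number of estimated parameters, and a sample size. The sketch below uses the commonly cited textbook forms of these criteria in smaller-is-better form. Which effective sample size PROC SSM plugs in is treated here as an assumption, so it is passed in explicitly as `n_eff`; the function name `information_criteria` is hypothetical.

```python
import math
from typing import Dict


def information_criteria(log_lik: float, nparm: int, n_eff: float) -> Dict[str, float]:
    """Smaller-is-better information criteria (textbook forms).

    log_lik : maximized log likelihood
    nparm   : number of estimated parameters
    n_eff   : effective sample size (an assumption of this sketch);
              must exceed nparm + 1 for AICC to be defined
    """
    m2ll = -2.0 * log_lik
    return {
        "AIC": m2ll + 2.0 * nparm,
        "AICC": m2ll + 2.0 * nparm * n_eff / (n_eff - nparm - 1.0),
        "HQIC": m2ll + 2.0 * nparm * math.log(math.log(n_eff)),
        "BIC": m2ll + nparm * math.log(n_eff),
        "CAIC": m2ll + nparm * (math.log(n_eff) + 1.0),
    }


ic = information_criteria(log_lik=-100.0, nparm=2, n_eff=50.0)
```

All five criteria share the same $-2 \log L$ term and differ only in the penalty, so comparing them across candidate models amounts to comparing how strongly each one penalizes extra parameters.
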
After the model-fitting phase, the filtering process is repeated to produce the model-based one-step-ahead response variable forecasts ($\hat{y}_{t,i}$), residuals ($\nu_{t,i}$), and their standard errors ($\sqrt{F_{t,i}}$). In addition, one-step-ahead forecasts of the components that are specified in the MODEL statements, and any other user-defined linear combinations of $\pmb{\alpha}_{t}$, are also produced. These forecasts are set to missing as long as the index $(t, i)$ precedes the initialization index $(t^{\ast}, i^{\ast})$ (that is, until the filtering process is initialized). If the filtering process remains uninitialized, then all the quantities that are related to the one-step-ahead forecast (such as $\hat{y}_{t,i}$ and $\nu_{t,i}$) are reported as missing. When the fitted model is appropriate, the one-step-ahead residuals $\nu_{t,i}$ form a sequence of uncorrelated normal variates. This fact can be used during the model diagnostic process.
            
After the filtering phase of KFS produces the one-step-ahead predictions of the response variables and the underlying state vectors, the smoothing phase of KFS produces the full-sample versions of these quantities; that is, rather than using the history up to $(t, i-1)$, the entire sample $\mathbf{Y}$ is used. The smoothing phase of KFS is a backward algorithm, which begins at $t = n$ and $i = q * p_{n}$ and goes back toward $t = 1$ and $i = 1$. It produces the following quantities:
            
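Table 27.8: KFS: Smoothing Phase

| Quantity | Description |
|---|---|
| $\tilde{y}_{t,i} = E(y_{t,i} \mid \mathbf{Y})$ | Interpolated response value |
| $G_{t,i} = \mathrm{Var}(y_{t,i} \mid \mathbf{Y})$ | Variance of the interpolated response value |
| $\tilde{\pmb{\alpha}}_{t} = E(\pmb{\alpha}_{t} \mid \mathbf{Y})$ | Full-sample estimate of the state vector |
| $\mathbf{C}_{t} = \mathrm{Cov}(\pmb{\alpha}_{t} \mid \mathbf{Y})$ | Covariance of $\tilde{\pmb{\alpha}}_{t}$ |
| $(\tilde{\pmb{\delta}}, \tilde{\pmb{\beta}}) = \mathbf{S}_{n,p_{n}}^{-1} \mathbf{b}_{n,p_{n}}$ | Full-sample estimate of $(\pmb{\delta}, \pmb{\beta})$ |
| $\mathbf{S}_{n,p_{n}}^{-1}$ | Covariance of $(\tilde{\pmb{\delta}}, \tilde{\pmb{\beta}})$ |
| $AO_{t,i} = y_{t,i} - E(y_{t,i} \mid \mathbf{Y}^{t,i})$ | Estimate of additive outlier |
| $\mathrm{Var}(AO_{t,i})$ | Variance of $AO_{t,i}$ |
| $\chi^{2}_{t}$ | Maximal state shock chi-square statistic |
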
Note that if $y_{t,i}$ is not missing, then the interpolated value $\tilde{y}_{t,i}$ equals $y_{t,i}$ and its variance $G_{t,i}$ is 0 because $y_{t,i}$ is completely known, given $\mathbf{Y}$. Therefore, $\tilde{y}_{t,i}$ provides nontrivial information only when $y_{t,i}$ is missing, in which case $\tilde{y}_{t,i}$ represents the best estimate of $y_{t,i}$ based on the available data. The full-sample estimates of components that are specified in the model equations are based on the corresponding linear combinations of the full-sample state estimates $\tilde{\pmb{\alpha}}_{t}$. Similarly, their standard errors are computed by using appropriate functions of the covariances $\mathbf{C}_{t}$.

The estimate of the additive outlier, $AO_{t,i} = y_{t,i} - E(y_{t,i} \mid \mathbf{Y}^{t,i})$, is the difference between the observed response value $y_{t,i}$ and its estimate or prediction by using all the data except $y_{t,i}$, which is denoted by $\mathbf{Y}^{t,i}$. The estimate $AO_{t,i}$ is missing when $y_{t,i}$ is missing. $AO_{t,i}$ is also called the prediction error, as opposed to the one-step-ahead residual, $\nu_{t,i}$. Similar to $\nu_{t,i}$, the prediction errors can be used in checking the model adequacy. The prediction errors are normally distributed; however, unlike $\nu_{t,i}$, they are not serially uncorrelated. You can request the printing of the prediction error sum of squares (PRESS) by specifying the PRESS option in the OUTPUT statement.

The maximal state shock chi-square statistic, $\chi^{2}_{t}$, is computed at each distinct time point and is described in de Jong and Penzer (1998) (the second term in the right-hand side of Equation 14). Loosely speaking, $\chi^{2}_{t}$ is a measure of the magnitude of unexpected change in the underlying state at time $t$. A large value of $\chi^{2}_{t}$, which follows a chi-square distribution with degrees of freedom equal to $m$ (the state size), can signify a change in the data generation mechanism at time $t$. For more information about the computation, precise definitions of additive outliers and maximal state shocks, and their use in the detection of structural change in the observation process, see de Jong and Penzer (1998). The computation of $\chi^{2}_{t}$ can be expensive for large state size and is not done by default. You can turn on its computation by specifying the MAXSHOCK option in the OUTPUT statement.
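
The backward pass and the interpolation of a missing response can be sketched on the same kind of toy model used for filtering. The code below assumes a univariate local level model with a proper prior and uses the classical Rauch-Tung-Striebel fixed-interval recursion, which is a swapped-in textbook technique and not necessarily the smoothing recursion that PROC SSM implements. The function name `smooth_local_level` is hypothetical, missing responses are encoded as `None`, and `sigma2_eta` is assumed to be positive so that the predicted variances never vanish.

```python
from typing import List, Optional, Tuple


def smooth_local_level(
    y: List[Optional[float]],
    a1: float,
    p1: float,
    sigma2_eps: float,
    sigma2_eta: float,
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Forward filter plus backward smoother for a local level model.

    Returns full-sample state estimates, their variances, interpolated
    response values, and the variances of the interpolated values.
    """
    n = len(y)
    a_pred, p_pred, a_filt, p_filt = [], [], [], []
    a, p = a1, p1
    for obs in y:  # forward (filtering) pass
        a_pred.append(a)
        p_pred.append(p)
        if obs is not None:
            gain = p / (p + sigma2_eps)
            a = a + gain * (obs - a)
            p = p * (1.0 - gain)
        a_filt.append(a)
        p_filt.append(p)
        p = p + sigma2_eta  # random-walk time update

    a_sm, c_sm = a_filt[:], p_filt[:]  # last smoothed value equals filtered
    for t in range(n - 2, -1, -1):     # backward (smoothing) pass
        j = p_filt[t] / p_pred[t + 1]
        a_sm[t] = a_filt[t] + j * (a_sm[t + 1] - a_pred[t + 1])
        c_sm[t] = p_filt[t] + j * j * (c_sm[t + 1] - p_pred[t + 1])

    interp, interp_var = [], []
    for t, obs in enumerate(y):
        if obs is None:
            # Missing response: best estimate is the smoothed level.
            interp.append(a_sm[t])
            interp_var.append(c_sm[t] + sigma2_eps)
        else:
            # Observed response is completely known, so its variance is 0.
            interp.append(obs)
            interp_var.append(0.0)
    return a_sm, c_sm, interp, interp_var


a_sm, c_sm, interp, interp_var = smooth_local_level(
    [1.0, None, 2.0], a1=0.0, p1=1.0, sigma2_eps=1.0, sigma2_eta=1.0
)
```

In this example the missing middle value is interpolated from both neighbors, and the smoothed state variance at that point is smaller than the filtered one because the later observation also contributes information.
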
            
If the filtering process remains uninitialized until the end of the sample (that is, if $\mathbf{S}_{n,p_{n}}$ is not invertible), some linear combinations of $\pmb{\delta}$ and $\pmb{\beta}$ are not estimable. This, in turn, implies that some linear combinations of $\pmb{\alpha}_{t}$ are also inestimable. These inestimable quantities are reported as missing. For more information about the estimability of the state effects, see Selukar (2010).