The Kalman filter and smoother (KFS) algorithm is the main computational tool for using SSMs for data analysis. This subsection briefly describes the basic quantities generated by this algorithm and their relationship to the output generated by the SSM procedure. For proper treatment of SSMs with a diffuse initial condition or when regression variables are present, a modified version of the traditional KFS, called the diffuse Kalman filter and smoother (DKFS), is needed. A good discussion of the different variants of the traditional and diffuse KFS can be found in Durbin and Koopman (2001). The DKFS implemented in the SSM procedure closely follows the treatment in de Jong and Chu-Chun-Lin (2003). Additional details can be found in these references.
The state space model equations (see the section State Space Model and Notation) imply that the combined response data vector has a Gaussian probability distribution. This probability distribution is proper if $d$, the dimension of the diffuse vector $\boldsymbol{\delta}$ in the initial condition, is 0 and if $k$, the number of regression variables in the observation equation, is also 0 (the regression parameter $\boldsymbol{\beta}$ is also treated as a diffuse vector). Otherwise, this probability distribution is improper. The KFS algorithm is a combination of two iterative phases: a forward pass through the data, called filtering, and a backward pass through the data, called smoothing, that uses the quantities generated during filtering. One of the advantages of using the SSM formulation to analyze time series data is its ability to handle missing values in the response variables. The KFS algorithm appropriately handles the missing values in the response data. For additional information about how PROC SSM handles missing values, see the section Missing Values.
The filtering pass sequentially computes the quantities shown in Table 27.5 for each time index $t$ and each response index $i$ within time $t$.
Table 27.5: KFS: Filtering Phase

| Quantity | Description |
| $\hat{y}_{t,i}$ | One-step-ahead prediction of the response values |
| $\nu_{t,i} = y_{t,i} - \hat{y}_{t,i}$ | One-step-ahead prediction residuals |
| $F_{t,i}$ | Variance of the one-step-ahead prediction |
| $\hat{\boldsymbol{\alpha}}_{t}$ | One-step-ahead prediction of the state vector |
| $\mathbf{P}_{t}$ | Covariance of $\hat{\boldsymbol{\alpha}}_{t}$ |
| $\mathbf{b}_{t,i}$ | $(d+k)$-dimensional vector |
| $\mathbf{S}_{t,i}$ | $(d+k)$-dimensional symmetric matrix |
| $(\hat{\boldsymbol{\delta}}_{t,i}, \hat{\boldsymbol{\beta}}_{t,i})$ | Estimate of $\boldsymbol{\delta}$ and $\boldsymbol{\beta}$ by using the data up to the index $(t,i)$ |
| $\mathrm{Var}(\hat{\boldsymbol{\delta}}_{t,i}, \hat{\boldsymbol{\beta}}_{t,i})$ | Covariance of $(\hat{\boldsymbol{\delta}}_{t,i}, \hat{\boldsymbol{\beta}}_{t,i})$ |
Here the notation $\hat{y}_{t,i} = E(y_{t,i} \mid \mathbf{Y}_{t,i-1})$ denotes the conditional expectation of $y_{t,i}$ given the history of the response values up to (but not including) the index $(t,i)$. Similarly, $F_{t,i}$ denotes the corresponding conditional variance. The quantity $\nu_{t,i}$ is set to missing whenever $y_{t,i}$ is missing. Note that $\hat{y}_{t,i}$ are one-step-ahead forecasts only when the model has only one response variable and the data are a time series; in all other cases it is more appropriate to call them one-measurement-ahead forecasts (since the next measurement might be at the same time point). Despite this, $\hat{y}_{t,i}$ are called one-step-ahead predictions (and $\nu_{t,i}$ are called one-step-ahead residuals) throughout this document. In the diffuse case, the conditional expectations must be appropriately interpreted. The vector $\mathbf{b}_{t,i}$ and the matrix $\mathbf{S}_{t,i}$ contain some accumulated quantities that are needed for the estimation of $\boldsymbol{\delta}$ and $\boldsymbol{\beta}$. Of course, when $d + k = 0$ (the nondiffuse case), these quantities are not needed. In the diffuse case, because the matrix $\mathbf{S}_{t,i}$ is sequentially accumulated (starting at $t = 1$), it might not be invertible until some intermediate index, say $t^{*}$. The filtering process is called initialized after $t^{*}$. In some situations, this initialization might not happen even after the entire sample is processed; that is, the filtering process remains uninitialized. This can happen if the regression variables are collinear or if the data are not sufficient to estimate the initial condition for some other reason.
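For intuition, the filtering recursions can be sketched for the simplest nondiffuse case: a univariate local level model with a known initial state. This is only an illustrative sketch, not the SSM procedure's implementation; the function name and arguments are invented for the example, and missing responses (stored as NaN) skip the measurement update while the prediction is still reported, mirroring the behavior described above.

```python
import numpy as np

def local_level_filter(y, s2_eps, s2_eta, a0, p0):
    """Kalman filtering pass for the local level model
    y_t = alpha_t + eps_t,  alpha_{t+1} = alpha_t + eta_t,
    with observation noise variance s2_eps, level noise variance s2_eta,
    and a known (nondiffuse) initial state alpha_1 ~ N(a0, p0)."""
    n = len(y)
    yhat = np.full(n, np.nan)   # one-step-ahead predictions of y_t
    nu = np.full(n, np.nan)     # one-step-ahead residuals (missing if y_t is)
    F = np.full(n, np.nan)      # variances of the one-step-ahead predictions
    a, p = a0, p0               # state prediction and its variance
    for t in range(n):
        yhat[t] = a             # E(y_t | history) = E(alpha_t | history)
        F[t] = p + s2_eps
        if np.isnan(y[t]):      # missing response: no measurement update
            p = p + s2_eta
            continue
        nu[t] = y[t] - yhat[t]
        k = p / F[t]            # Kalman gain
        a = a + k * nu[t]       # filtered state becomes the next prediction
        p = p * (1.0 - k) + s2_eta
    return yhat, nu, F
```

The same forward pass generalizes to vector states and multiple responses by replacing the scalar gain with its matrix analogue.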
The filtering process is used for a variety of purposes. One important use of filtering is to compute the likelihood of the data. In the model-fitting phase, the unknown model parameters, collected in a vector $\boldsymbol{\theta}$, are estimated by maximum likelihood. This requires repeated evaluation of the likelihood at different trial values of $\boldsymbol{\theta}$. After $\boldsymbol{\theta}$ is estimated, it is treated as a known vector. The filtering process is used again with the fitted model in the forecasting phase, when the one-step-ahead forecasts and residuals based on the fitted model are provided. In addition, this filtering output is needed by the smoothing phase to produce the full-sample component estimates.
In view of the Gaussian nature of the response vector, the likelihood of the combined response data $\mathbf{Y}$, $L(\mathbf{Y}; \boldsymbol{\theta})$, can be computed by using the prediction-error decomposition, which leads to the formula

$$-2 \log L(\mathbf{Y}; \boldsymbol{\theta}) = (n - d - k) \log 2\pi + \sum_{t,i} \left( \log F_{t,i} + \frac{\nu_{t,i}^{2}}{F_{t,i}} \right) + \log |\mathbf{S}| - \mathbf{b}^{\prime} \mathbf{S}^{-1} \mathbf{b}$$

where $\mathbf{b}$ and $\mathbf{S}$ denote the vector and the matrix accumulated by the end of the filtering pass, $|\mathbf{S}|$ denotes the determinant of $\mathbf{S}$, and $\mathbf{b}^{\prime}$ denotes the transpose of the column vector $\mathbf{b}$. In the preceding formula, the terms that are associated with the missing response values are excluded and $n$ denotes the total number of nonmissing response values in the sample. If $\mathbf{S}$ is not invertible, then a generalized inverse is used in place of $\mathbf{S}^{-1}$, and $\log |\mathbf{S}|$ is computed based on the nonzero eigenvalues of $\mathbf{S}$. Moreover, in this case $d + k$ is replaced by the rank of $\mathbf{S}$. When $\mathbf{Y}$ has a proper distribution (that is, when $d + k = 0$), the terms that involve $\mathbf{b}$ and $\mathbf{S}$ are absent and the preceding likelihood is proper. Otherwise, it is called the diffuse likelihood or the restricted likelihood.
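In the nondiffuse case the decomposition reduces to a sum over the one-step-ahead residuals and their variances. A minimal sketch, assuming the residuals for missing responses are stored as NaN (all names here are invented for the example):

```python
import numpy as np

def neg2_loglik_from_filter(nu, F):
    """-2 log likelihood via the prediction-error decomposition
    (nondiffuse case). Residuals stored as NaN, which correspond to
    missing responses, are excluded along with their variances."""
    keep = ~np.isnan(nu)
    n = keep.sum()  # number of nonmissing response values
    return n * np.log(2.0 * np.pi) + np.sum(np.log(F[keep]) + nu[keep] ** 2 / F[keep])
```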
When the model specification contains any unknown parameters $\boldsymbol{\theta}$, they are estimated by maximizing the preceding likelihood function. This is done by using a nonlinear optimization process that involves repeated evaluations of the likelihood at different values of $\boldsymbol{\theta}$. The maximum likelihood (ML) estimate of $\boldsymbol{\theta}$ is denoted by $\hat{\boldsymbol{\theta}}$. When the restricted likelihood is used for computing $\hat{\boldsymbol{\theta}}$, the estimate is called the restricted maximum likelihood (REML) estimate. Approximate standard errors of $\hat{\boldsymbol{\theta}}$ are computed by taking the square root of the diagonal elements of its (approximate) covariance matrix. This covariance is computed as the inverse of the Hessian (the matrix of the second-order partials) of the negative log likelihood evaluated at the optimum $\hat{\boldsymbol{\theta}}$.
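The mechanics can be illustrated on the simplest possible model, i.i.d. Gaussian data with an unknown variance, for which every prediction variance in the decomposition equals the variance parameter. The grid search below is only a stand-in for a real nonlinear optimizer, and all names are invented for the example; the standard-error step uses the convention that the parameter variance is approximately $2/H$ when $H$ is the second derivative of $-2\log L$ at the optimum.

```python
import numpy as np

def neg2_loglik(y, sigma2):
    # -2 log likelihood of i.i.d. N(0, sigma2) data: the simplest
    # instance of the prediction-error decomposition (every F = sigma2)
    n = len(y)
    return n * np.log(2.0 * np.pi) + n * np.log(sigma2) + np.sum(y ** 2) / sigma2

# stand-in for the optimizer: evaluate the likelihood at many trial
# parameter values and keep the best one
y = np.array([0.5, -1.2, 0.8, 0.1])
grid = np.linspace(0.05, 5.0, 1000)
sigma2_hat = grid[np.argmin([neg2_loglik(y, s) for s in grid])]

# approximate standard error from the curvature at the optimum
h = 1e-4
H = (neg2_loglik(y, sigma2_hat + h) - 2.0 * neg2_loglik(y, sigma2_hat)
     + neg2_loglik(y, sigma2_hat - h)) / h ** 2
se = np.sqrt(2.0 / H)
```

For this toy model the analytic ML estimate is the mean of the squared observations, so the grid search should land near it; in PROC SSM the same likelihood evaluations are driven by a gradient-based optimizer instead of a grid.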
Let $p$ denote the dimension of the parameter vector $\boldsymbol{\theta}$. After the parameter estimation is completed, a table, called "Likelihood Computation Summary," is printed. It summarizes the likelihood calculations at $\hat{\boldsymbol{\theta}}$, as shown in Table 27.6.
Table 27.6: Likelihood Computation Summary

| Quantity | Formula |
| Nonmissing response values used | $n$ |
| Estimated parameters | $p$ |
| Initialized diffuse state elements | $d + k$ |
| Normalized residual sum of squares | $\sum_{t,i} \nu_{t,i}^{2} / F_{t,i} - \mathbf{b}^{\prime} \mathbf{S}^{-1} \mathbf{b}$ |
| Full log likelihood | $\log L(\mathbf{Y}; \hat{\boldsymbol{\theta}})$ |
In addition, the "Likelihood Based Information Criteria" table reports a variety of information-based criteria, which are functions of the log likelihood, the number of estimated parameters $p$, and the number of nonmissing response values $n$. Table 27.7 summarizes the reported information criteria in smaller-is-better form:
Table 27.7: Information Criteria

| Criterion | Formula | Reference |
| AIC | $-2 \log L + 2p$ | Akaike (1974) |
| AICC | $-2 \log L + 2pn/(n - p - 1)$ | Hurvich and Tsai (1989); Burnham and Anderson (1998) |
| HQIC | $-2 \log L + 2p \log \log n$ | Hannan and Quinn (1979) |
| BIC | $-2 \log L + p \log n$ | Schwarz (1978) |
| CAIC | $-2 \log L + p (\log n + 1)$ | Bozdogan (1987) |
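These criteria are all simple penalized transforms of $-2 \log L$. The sketch below uses the standard textbook forms cited in the table; the exact observation count that a given software package plugs in (and any diffuse adjustment) can differ, so treat this as illustrative.

```python
import numpy as np

def info_criteria(neg2ll, p, n):
    """Smaller-is-better information criteria computed from -2 log L,
    the number of estimated parameters p, and the number of
    (nonmissing) observations n."""
    return {
        "AIC":  neg2ll + 2.0 * p,
        "AICC": neg2ll + 2.0 * p * n / (n - p - 1.0),
        "HQIC": neg2ll + 2.0 * p * np.log(np.log(n)),
        "BIC":  neg2ll + p * np.log(n),
        "CAIC": neg2ll + p * (np.log(n) + 1.0),
    }
```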
After the model-fitting phase, the filtering process is repeated again to produce the model-based one-step-ahead response variable forecasts ($\hat{y}_{t,i}$), residuals ($\nu_{t,i}$), and their standard errors ($\sqrt{F_{t,i}}$). In addition, one-step-ahead forecasts of the components that are specified in the MODEL statements, and any other user-defined linear combinations of the state vector, are also produced. These forecasts are set to missing as long as the index precedes $t^{*}$ (that is, until the filtering process is initialized). If the filtering process remains uninitialized, then all the quantities that are related to the one-step-ahead forecast (such as $\hat{y}_{t,i}$ and $\nu_{t,i}$) are reported as missing. When the fitted model is appropriate, the one-step-ahead residuals form a sequence of uncorrelated normal variates. This fact can be used during model diagnostics.
After the filtering phase of KFS produces the one-step-ahead predictions of the response variables and the underlying state vectors, the smoothing phase of KFS produces the full-sample versions of these quantities; that is, rather than using the history up to a given index, the entire sample is used. The smoothing phase of KFS is a backward algorithm, which begins at the last index of the sample and goes back toward the first. It produces the following quantities:
Table 27.8: KFS: Smoothing Phase

| Quantity | Description |
| $\tilde{y}_{t,i}$ | Interpolated response value |
| $\mathrm{Var}(\tilde{y}_{t,i})$ | Variance of the interpolated response value |
| $\tilde{\boldsymbol{\alpha}}_{t}$ | Full-sample estimate of the state vector |
| $\tilde{\mathbf{P}}_{t}$ | Covariance of $\tilde{\boldsymbol{\alpha}}_{t}$ |
| $(\tilde{\boldsymbol{\delta}}, \tilde{\boldsymbol{\beta}})$ | Full-sample estimate of $\boldsymbol{\delta}$ and $\boldsymbol{\beta}$ |
| $\mathrm{Var}(\tilde{\boldsymbol{\delta}}, \tilde{\boldsymbol{\beta}})$ | Covariance of $(\tilde{\boldsymbol{\delta}}, \tilde{\boldsymbol{\beta}})$ |
| $ao_{t,i}$ | Estimate of additive outlier |
| $\mathrm{Var}(ao_{t,i})$ | Variance of $ao_{t,i}$ |
| $M_{t}$ | Maximal state shock chi-square statistic |
Note that if $y_{t,i}$ is not missing, then $\tilde{y}_{t,i} = y_{t,i}$ and its variance is 0 because $y_{t,i}$ is completely known, given the full sample. Therefore, $\tilde{y}_{t,i}$ provides nontrivial information only when $y_{t,i}$ is missing, in which case it represents the best estimate of $y_{t,i}$ based on the available data. The full-sample estimates of components that are specified in the model equations are based on the corresponding linear combinations of $\tilde{\boldsymbol{\alpha}}_{t}$. Similarly, their standard errors are computed by using appropriate functions of the covariance of $\tilde{\boldsymbol{\alpha}}_{t}$. The estimate of the additive outlier, $ao_{t,i}$, is the difference between the observed response value $y_{t,i}$ and its estimate or prediction by using all the data except $y_{t,i}$. The estimate $ao_{t,i}$ is missing when $y_{t,i}$ is missing. $ao_{t,i}$ is also called the prediction error, as opposed to the one-step-ahead residual, $\nu_{t,i}$. Similar to the one-step-ahead residuals, the prediction errors can be used in checking the model adequacy. The prediction errors are normally distributed; however, unlike the one-step-ahead residuals, they are not serially uncorrelated. You can request the printing of the prediction error sum of squares (PRESS) by specifying the PRESS option in the OUTPUT statement. The maximal state shock chi-square statistic, $M_{t}$, is computed at each distinct time point and is described in de Jong and Penzer (1998) (the second term in the right-hand side of Equation 14). Loosely speaking, $M_{t}$ is a measure of the magnitude of unexpected change in the underlying state at time $t$. A large value of $M_{t}$, which follows a chi-square distribution with degrees of freedom equal to the state size, can signify a change in the data generation mechanism at time $t$. For more information about the computation, precise definitions of additive outliers and maximal state shocks, and their use in the detection of structural change in the observation process, see de Jong and Penzer (1998). The computation of $M_{t}$ can be expensive for a large state size and is not done by default. You can turn on its computation by specifying the MAXSHOCK option in the OUTPUT statement.
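To make the forward/backward structure concrete, the smoothing pass for the same toy local level model (complete data, known initial state) can be sketched with a Rauch-Tung-Striebel-style backward recursion. This is an illustrative sketch with invented names, not the SSM procedure's (diffuse, multivariate) algorithm; it shows only how full-sample state estimates and their reduced variances arise from the filtered quantities.

```python
import numpy as np

def local_level_smoother(y, s2_eps, s2_eta, a0, p0):
    """Forward (filter) then backward (smooth) pass for the local level
    model y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t, with a
    known initial state alpha_1 ~ N(a0, p0) and no missing values.
    Returns full-sample state estimates and their variances."""
    n = len(y)
    a_filt = np.empty(n); p_filt = np.empty(n)
    a, p = a0, p0                       # one-step-ahead state moments
    for t in range(n):
        F = p + s2_eps
        k = p / F                       # Kalman gain
        a_filt[t] = a + k * (y[t] - a)  # filtered state E(alpha_t | y_1..y_t)
        p_filt[t] = p * (1.0 - k)
        a, p = a_filt[t], p_filt[t] + s2_eta
    a_sm = np.empty(n); p_sm = np.empty(n)
    a_sm[-1], p_sm[-1] = a_filt[-1], p_filt[-1]
    for t in range(n - 2, -1, -1):      # backward recursion
        p_next = p_filt[t] + s2_eta     # predicted variance for time t + 1
        c = p_filt[t] / p_next          # smoother gain (transition = 1)
        a_sm[t] = a_filt[t] + c * (a_sm[t + 1] - a_filt[t])
        p_sm[t] = p_filt[t] + c ** 2 * (p_sm[t + 1] - p_next)
    return a_sm, p_sm
```

Because the smoothed estimates condition on the entire sample, their variances never exceed the corresponding filtered variances.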
If the filtering process remains uninitialized until the end of the sample (that is, if the accumulated matrix $\mathbf{S}$ is not invertible), some linear combinations of $\boldsymbol{\delta}$ and $\boldsymbol{\beta}$ are not estimable. This, in turn, implies that some linear combinations of the state vector are also inestimable. These inestimable quantities are reported as missing. For more information about the estimability of the state effects, see Selukar (2010).