PROC UCM: The UCMs as State Space Models :: SAS/ETS(R) 9.22 User's Guide

The UCM Procedure

The UCMs as State Space Models

The UCMs considered in PROC UCM can be thought of as special cases of more general models, called (linear) Gaussian state space models (GSSM). A GSSM can be described as follows:

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

The first equation, called the observation equation, relates the response series $\text{[math]}$ to a state vector $\text{[math]}$ that is usually unobserved. The second equation, called the state equation, describes the evolution of the state vector in time. The system matrices $\text{[math]}$ and $\text{[math]}$ are of appropriate dimensions and are known, except possibly for some unknown elements that become part of the parameter vector of the model. The noise series $\text{[math]}$ consists of independent, zero-mean, Gaussian vectors with covariance matrices $\text{[math]}$ . For most of the UCMs considered here, the system matrices $\text{[math]}$ and $\text{[math]}$ , and the noise covariances $\text{[math]}$ , are time invariant—that is, they do not depend on time. In a few cases, however, some or all of them can depend on time. The initial state vector $\text{[math]}$ is assumed to be independent of the noise series, and its covariance matrix $\text{[math]}$ can be partially diffuse. A random vector has a partially diffuse covariance matrix if it can be partitioned such that one part of the vector has a properly defined probability distribution, while the covariance matrix of the other part is infinite—that is, you have no prior information about this part of the vector. The covariance of the initial state $\text{[math]}$ is assumed to have the following form:

$\text{[math]}$

where $\text{[math]}$ and $\text{[math]}$ are nonnegative definite, symmetric matrices and $\text{[math]}$ is a constant that is assumed to be close to $\text{[math]}$ . In the case of UCMs considered here, $\text{[math]}$ is always a diagonal matrix that consists of zeros and ones, and, if a particular diagonal element of $\text{[math]}$ is one, then the corresponding row and column in $\text{[math]}$ are zero.

The state space formulation of a UCM has many computational advantages. In this formulation there are convenient algorithms for estimating and forecasting the unobserved states $\text{[math]}$ by using the observed series $\text{[math]}$. These algorithms also yield the in-sample and out-of-sample forecasts and the likelihood of $\text{[math]}$. The state space representation of a UCM is not necessarily unique. In the representation used here, the unobserved components in the UCM often appear as elements of the state vector. This makes the elements of the state interpretable and, more importantly, makes the sample estimates and forecasts of these unobserved components easy to obtain. For additional information about the computational aspects of state space modeling, see Durbin and Koopman (2001). Next, some notation is developed to describe the essential quantities computed during the analysis of state space models.

Let $\text{[math]}$ be the observed sample from a series that satisfies a state space model. Next, for $\text{[math]}$ , let the one-step-ahead forecasts of the series, the states, and their variances be defined as follows, using the usual notation to denote the conditional expectation and conditional variance:

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

These are also called the filtered estimates of the series and the states. Similarly, for $\text{[math]}$ , let the following denote the full-sample estimates of the series and the state values at time $\text{[math]}$ :

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

If the time $\text{[math]}$ is in the historical period (that is, if $\text{[math]}$), then the full-sample estimates are called the smoothed estimates; if $\text{[math]}$ lies in the future, they are called out-of-sample forecasts. Note that if $\text{[math]}$, then $\text{[math]}$ and $\text{[math]}$, unless $\text{[math]}$ is missing.

All the filtered and smoothed estimates ($\text{[math]}$, and so on) are computed by using the Kalman filtering and smoothing (KFS) algorithm, which is an iterative process. If the initial state is diffuse, as is often the case for the UCMs, its treatment requires a modification of the traditional KFS, which is called the diffuse KFS (DKFS). The details of the DKFS implemented in the UCM procedure can be found in de Jong and Chu-Chun-Lin (2003). Additional information about state space models can be found in Durbin and Koopman (2001). The likelihood formulas described in this section are taken from the latter reference.
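To make the filtering recursion concrete, the following is a minimal sketch of the traditional (nondiffuse) Kalman filter for a local level model, written in plain Python. It is not the DKFS used by the UCM procedure: the function name, its arguments, and the use of a large finite initial variance `p1` as a stand-in for a diffuse prior are all assumptions made for illustration.

```python
def kalman_filter_local_level(y, var_eps, var_eta, a1=0.0, p1=1e7):
    """One-step-ahead forecasts for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_{t+1}.

    y        : list of observations; None marks a missing value
    var_eps  : irregular (observation noise) variance
    var_eta  : level disturbance variance
    a1, p1   : mean and variance of the initial level; the large p1 is an
               ASSUMED approximation of a diffuse prior, not the exact DKFS
    """
    a, p = a1, p1
    forecasts, variances, residuals = [], [], []
    for obs in y:
        f = p + var_eps              # variance of the one-step-ahead series forecast
        forecasts.append(a)          # one-step-ahead series forecast
        variances.append(f)
        if obs is None:
            residuals.append(None)   # missing value: skip the update step
        else:
            v = obs - a              # one-step-ahead residual
            gain = p / f             # Kalman gain
            a += gain * v            # updated state estimate
            p *= (1.0 - gain)        # updated state variance
            residuals.append(v)
        p += var_eta                 # predict the next state variance
    return forecasts, variances, residuals


forecasts, variances, residuals = kalman_filter_local_level(
    [1.0, None, 1.0], var_eps=1.0, var_eta=0.0
)
```

In this sketch the very large first forecast variance plays the role of the diffuse phase: the first observation essentially fixes the level, and later forecast variances are finite.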

In the case of diffuse initial condition, the effect of the improper prior distribution of $\text{[math]}$ manifests itself in the first few filtering iterations. During these initial filtering iterations the distribution of the filtered quantities remains diffuse; that is, during these iterations the one-step-ahead series and state forecast variances $\text{[math]}$ and $\text{[math]}$ have the following form:

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

The actual number of iterations (say, $\text{[math]}$) affected by this improper prior depends on the nature of the vectors $\text{[math]}$, the number of nonzero diagonal elements of $\text{[math]}$, and the pattern of missing values in the dependent series. After $\text{[math]}$ iterations, $\text{[math]}$ and $\text{[math]}$ become zero and the one-step-ahead series and state forecasts have proper distributions. These first $\text{[math]}$ iterations constitute the initialization phase of the DKFS algorithm. The post-initialization phase of the DKFS and the traditional KFS is the same. In the state space modeling literature the initialization and post-initialization phases are sometimes called the pre-collapse and post-collapse phases of the diffuse Kalman filtering. In certain missing value patterns it is possible for $\text{[math]}$ to exceed the sample size; that is, the sample information can be insufficient to create a proper prior for the filtering process. In these cases, parameter estimation and forecasting are done on the basis of this improper prior, and some or all of the series and component forecasts can have infinite variances (or zero precision). The forecasts that have infinite variance are set to missing. The same situation can occur if the specified model contains components that are essentially multicollinear. In these situations no residual analysis is possible; in particular, no residuals-based goodness-of-fit statistics are produced.

The log likelihood of the sample ($\text{[math]}$), which takes account of this diffuse initialization step, is computed by using the one-step-ahead series forecasts as follows:

$\text{[math]}$

where $\text{[math]}$ is the number of diffuse elements in the initial state $\text{[math]}$ , $\text{[math]}$ are the one-step-ahead residuals, and

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

If $\text{[math]}$ is missing at some time $\text{[math]}$, then the corresponding summand in the log likelihood expression is deleted, and the constant term is adjusted suitably. Moreover, if the initialization step does not complete (that is, if $\text{[math]}$ exceeds the sample size), then the value of $\text{[math]}$ is reduced to the number of diffuse states that are successfully initialized.

The portion of the log likelihood that corresponds to the post-initialization period is called the nondiffuse log likelihood ( $\text{[math]}$ ). The nondiffuse log likelihood is given by

$\text{[math]}$
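As an illustration, the nondiffuse log likelihood can be sketched as the usual Gaussian prediction-error decomposition over the post-initialization residuals. The function and argument names below are hypothetical, and the exact form (including how the constant is adjusted for missing values) is an assumption based on the standard textbook expression rather than a transcription of the procedure's formula.

```python
import math


def nondiffuse_loglik(residuals, variances, d):
    """Gaussian log likelihood of the post-initialization residuals.

    residuals : one-step-ahead residuals; None marks a missing value
    variances : matching one-step-ahead forecast variances
    d         : ASSUMED number of initialization iterations to skip
    """
    terms = [
        (v, f)
        for v, f in list(zip(residuals, variances))[d:]
        if v is not None                      # missing values drop out of the sum
    ]
    constant = -0.5 * len(terms) * math.log(2.0 * math.pi)
    return constant - 0.5 * sum(math.log(f) + v * v / f for v, f in terms)


# Two post-initialization residuals of zero with unit variance.
loglik = nondiffuse_loglik([5.0, 0.0, 0.0], [1e7, 1.0, 1.0], d=1)
```

Dropping the first `d` terms mirrors the idea that only the iterations with proper forecast distributions contribute to this part of the likelihood.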

In the case of UCMs considered in PROC UCM, it often happens that the diffuse part of the likelihood, $\text{[math]}$, does not depend on the model parameters, and in these cases maximizing the nondiffuse likelihood is equivalent to maximizing the diffuse likelihood. However, in some cases, such as when the model includes lags of the dependent variable, the diffuse part does depend on the model parameters. In these cases the maximization of the diffuse and nondiffuse likelihoods can produce different parameter estimates.

In some situations it is convenient to reparameterize the nondiffuse initial state covariance $\text{[math]}$ as $\text{[math]}$ and the state noise covariance $\text{[math]}$ as $\text{[math]}$ for some common scalar parameter $\text{[math]}$ . In this case the preceding log-likelihood expression, up to a constant, can be written as

$\text{[math]}$

Solving analytically for the optimum, the maximum likelihood estimate of $\text{[math]}$ can be shown to be

$\text{[math]}$

When this expression of $\text{[math]}$ is substituted back into the likelihood formula, an expression called the profile likelihood ( $\text{[math]}$ ) of the data is obtained:

$\text{[math]}$

In some situations the parameter estimation is done by optimizing the profile likelihood (see the section Parameter Estimation by Profile Likelihood Optimization and the PROFILE option in the ESTIMATE statement).
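The idea of profiling out a common scale parameter can be sketched numerically. The code below assumes the standard result that, when every forecast variance is the scale parameter times a scale-free variance, the maximizing value is the average of the squared residuals divided by the scale-free variances over the post-initialization period. The names are hypothetical and the constant term of the likelihood is omitted.

```python
import math


def scale_estimate(residuals, scaled_variances, d):
    """Closed-form maximizer of the scale parameter (ASSUMED standard form)."""
    terms = list(zip(residuals, scaled_variances))[d:]
    return sum(v * v / f for v, f in terms) / len(terms)


def loglik_given_scale(sigma2, residuals, scaled_variances, d):
    """Nondiffuse log likelihood, up to a constant, for a given scale sigma2."""
    terms = list(zip(residuals, scaled_variances))[d:]
    return -0.5 * sum(
        math.log(sigma2 * f) + v * v / (sigma2 * f) for v, f in terms
    )


res = [9.0, 2.0, -2.0]
fstar = [1.0, 1.0, 4.0]
sigma2_hat = scale_estimate(res, fstar, d=1)          # (4/1 + 4/4) / 2
profile_value = loglik_given_scale(sigma2_hat, res, fstar, d=1)
```

Evaluating `loglik_given_scale` slightly above or below `sigma2_hat` gives a smaller value, which is what makes it possible to optimize over the remaining parameters only.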

In the remainder of this section the state space formulation of UCMs is further explained by using some particular UCMs as examples. The examples show that the state space formulation of the UCMs depends on the components in the model in a simple fashion; for example, the system matrix $\text{[math]}$ is usually a block diagonal matrix with blocks that correspond to the components in the model. The only exception to this pattern is UCMs that include lags of the dependent variable. This case is considered at the end of the section.

In what follows, $\text{[math]}$ denotes a diagonal matrix with diagonal entries $\text{[math]}$ , and the transpose of a matrix $\text{[math]}$ is denoted as $\text{[math]}$ .

Locally Linear Trend Model

Recall that the dynamics of the locally linear trend model are

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

Here $\text{[math]}$ is the response series and $\text{[math]}$ and $\text{[math]}$ are independent, zero-mean Gaussian disturbance sequences with variances $\text{[math]}$ , and $\text{[math]}$ , respectively. This model can be formulated as a state space model where the state vector $\text{[math]}$ and the state noise $\text{[math]}$ . Note that the elements of the state vector are precisely the unobserved components in the model. The system matrices $\text{[math]}$ and $\text{[math]}$ and the noise covariance $\text{[math]}$ corresponding to this choice of state and state noise vectors can be seen to be time invariant and are given by

$\text{[math]}$

The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$ . The parameter vector $\text{[math]}$ consists of all the disturbance variances—that is, $\text{[math]}$ .
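The following sketch writes out one plausible set of system matrices for the locally linear trend model with the state ordered as irregular, level, slope. The specific entries are an assumption based on the standard form of this model (the level advances by the slope, the slope is a random walk, and the irregular carries no memory); all names are illustrative.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# State ordering (ASSUMED): [irregular, level, slope].
Z = [1.0, 1.0, 0.0]                  # response = irregular + level
T = [
    [0.0, 0.0, 0.0],                 # irregular has no carry-over
    [0.0, 1.0, 1.0],                 # next level = level + slope
    [0.0, 0.0, 1.0],                 # next slope = slope
]


def noise_covariance(var_eps, var_eta, var_xi):
    """Diagonal state noise covariance built from the three disturbance variances."""
    return [[var_eps, 0.0, 0.0], [0.0, var_eta, 0.0], [0.0, 0.0, var_xi]]


def step(state, noise):
    """Advance the state by one period and add the state noise."""
    return [s + z for s, z in zip(mat_vec(T, state), noise)]


def observe(state):
    """Apply the observation vector Z to a state."""
    return sum(z * s for z, s in zip(Z, state))


next_state = step([0.0, 10.0, 2.0], [0.0, 0.0, 0.0])
y_next = observe(next_state)
```

With the disturbances switched off, a level of 10 and a slope of 2 simply produce the next point on a straight line, which is the deterministic limit of the model.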

Basic Structural Model

The basic structural model (BSM) is obtained by adding a seasonal component, $\text{[math]}$, to the local level model. To save space, the state space formulation of a BSM with a relatively short season length of 4 (quarterly seasonality) is considered here. The pattern for longer season lengths such as 12 (monthly) and 52 (weekly) is easy to see.

Let us first consider the dummy form of seasonality. In this case the state and state noise vectors are $\text{[math]}$ and $\text{[math]}$ , respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\text{[math]}$ , are lagged versions of the seasonal component $\text{[math]}$ . $\text{[math]}$ corresponds to lag zero—that is, the same as $\text{[math]}$ , $\text{[math]}$ to lag 1 and $\text{[math]}$ to lag 2. The system matrices are

$\text{[math]}$

and $\text{[math]}$ . The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$ .
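For the dummy form with season length 4, the seasonal part of the transition matrix can be sketched as a 3 x 3 block acting on the current seasonal value and its two lags. The block below is the standard construction (the new seasonal value is minus the sum of the previous three, plus noise) and is an assumption about the elided matrix; the irregular, level, and slope states are left out for brevity.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Seasonal sub-state (ASSUMED ordering): [lag 0, lag 1, lag 2].
SEASONAL_BLOCK = [
    [-1.0, -1.0, -1.0],   # new seasonal = -(sum of the last three values)
    [1.0, 0.0, 0.0],      # lag 0 shifts to lag 1
    [0.0, 1.0, 0.0],      # lag 1 shifts to lag 2
]

state = [3.0, -1.0, -4.0]
seasonal_values = []
for _ in range(4):
    seasonal_values.append(state[0])
    state = mat_vec(SEASONAL_BLOCK, state)   # disturbance set to zero here
```

Without disturbances, four consecutive seasonal values sum to zero and the state returns to its starting value after one full season.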

In the case of the trigonometric type of seasonality, $\text{[math]}$ and $\text{[math]}$ . The disturbance sequences, $\text{[math]}$ , and $\text{[math]}$ , are independent, zero-mean, Gaussian sequences with variance $\text{[math]}$ . The system matrices are

$\text{[math]}$

and $\text{[math]}$. Here $\text{[math]}$. The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$. The parameter vector in both cases is $\text{[math]}$.
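The trigonometric form can be sketched in the same way. The code assumes the usual construction: one 2 x 2 rotation block per harmonic with frequency 2*pi*j/s, and a single element equal to cos(pi) for the last harmonic when the season length s is even. The function name and state ordering are illustrative, not taken from the procedure.

```python
import math


def trig_seasonal_transition(s):
    """Block diagonal transition matrix for a trigonometric seasonal of length s."""
    dim = s - 1
    t = [[0.0] * dim for _ in range(dim)]
    pos = 0
    for j in range(1, s // 2 + 1):
        lam = 2.0 * math.pi * j / s               # ASSUMED harmonic frequency
        c, sn = math.cos(lam), math.sin(lam)
        if s % 2 == 0 and j == s // 2:
            t[pos][pos] = c                       # last harmonic: single state
            pos += 1
        else:
            t[pos][pos], t[pos][pos + 1] = c, sn
            t[pos + 1][pos], t[pos + 1][pos + 1] = -sn, c
            pos += 2
    return t


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


T4 = trig_seasonal_transition(4)
state = [1.0, 2.0, 0.5]                           # [gamma_1, gamma_1*, gamma_2]
seasonal_values = []
for _ in range(4):
    seasonal_values.append(state[0] + state[2])   # seasonal = sum of the harmonics
    state = mat_vec(T4, state)                    # disturbances set to zero here
```

As with the dummy form, the noise-free seasonal pattern repeats every four periods and sums to zero over a season, up to floating-point error.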

Seasons with Blocked Seasonal Values

Block seasonals are special seasonal components that impose a special block structure on the seasonal effects. Let us consider a BSM with monthly seasonality that has a quarterly block structure—that is, months within the same quarter are assumed to have identical effects except for some random perturbation. Such a seasonal component is a block seasonal with block size $\text{[math]}$ equal to 3 and the number of blocks $\text{[math]}$ equal to 4. The state space structure for such a model with dummy-type seasonality is as follows: The state and state noise vectors are $\text{[math]}$ and $\text{[math]}$ , respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\text{[math]}$ , are lagged versions of the seasonal component $\text{[math]}$ . $\text{[math]}$ corresponds to lag zero—that is, the same as $\text{[math]}$ , $\text{[math]}$ to lag $\text{[math]}$ and $\text{[math]}$ to lag $\text{[math]}$ . All the system matrices are time invariant, except the matrix $\text{[math]}$ . They can be seen to be $\text{[math]}$ , $\text{[math]}$ , and

$\text{[math]}$

when $\text{[math]}$ is a multiple of the block size $\text{[math]}$ , and

$\text{[math]}$

otherwise. Note that when $\text{[math]}$ is not a multiple of $\text{[math]}$, the portion of the $\text{[math]}$ matrix corresponding to the seasonal is the identity matrix. The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$.

Similarly in the case of the trigonometric form of seasonality, $\text{[math]}$ and $\text{[math]}$ . The disturbance sequences, $\text{[math]}$ , and $\text{[math]}$ , are independent, zero-mean, Gaussian sequences with variance $\text{[math]}$ . $\text{[math]}$ , $\text{[math]}$ , and

$\text{[math]}$

when $\text{[math]}$ is a multiple of the block size $\text{[math]}$ , and

$\text{[math]}$

otherwise. As before, when $\text{[math]}$ is not a multiple of $\text{[math]}$, the portion of the $\text{[math]}$ matrix corresponding to the seasonal is the identity matrix. Here $\text{[math]}$. The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$. The parameter vector in both cases is $\text{[math]}$.
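The time-varying transition for a block seasonal can be illustrated with the dummy form. In this sketch the seasonal block is applied only when the time index is a multiple of the block size, and the identity is used otherwise, so the seasonal effect stays constant within a block. The time-indexing convention (time starts at 1 and the matrix at time t carries the state to time t + 1) is an assumption, as are all names.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


SEASONAL_BLOCK = [[-1.0, -1.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def seasonal_transition(t, block_size):
    """Seasonal part of the transition matrix at time t (ASSUMED convention)."""
    return SEASONAL_BLOCK if t % block_size == 0 else IDENTITY


# Monthly data with quarterly blocks: block size 3, 4 blocks, 3 seasonal states.
state = [3.0, -1.0, -4.0]
monthly_effects = []
for t in range(1, 13):
    monthly_effects.append(state[0])
    state = mat_vec(seasonal_transition(t, 3), state)   # disturbances set to zero
```

The twelve noise-free monthly effects come out as four runs of three equal values, one run per quarter.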

Cycles and Autoregression

The preceding examples have illustrated how to build a state space model corresponding to a UCM that includes components such as irregular, trend, and seasonal. There you can see that the state vector and the system matrices have a simple block structure with blocks corresponding to the components in the model. Therefore, here only a simple model consisting of a single cycle and an irregular component is considered. The state space form for more complex UCMs consisting of multiple cycles and other components can be easily deduced from this example.

Recall that a stochastic cycle $\text{[math]}$ with frequency $\text{[math]}$ , $\text{[math]}$ , and damping coefficient $\text{[math]}$ can be modeled as

$\text{[math]}$

where $\text{[math]}$ and $\text{[math]}$ are independent, zero-mean, Gaussian disturbances with variance $\text{[math]}$ . In what follows, a state space form for a model consisting of such a stochastic cycle and an irregular component is given.

The state vector $\text{[math]}$ , and the state noise vector $\text{[math]}$ . The system matrices are

$\text{[math]}$

The distribution of the initial state vector $\text{[math]}$ is proper, with $\text{[math]}$ , where $\text{[math]}$ . The parameter vector $\text{[math]}$ .
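A quick numerical check shows why the initial state of the cycle is proper. The sketch assumes the standard cycle transition (a rotation by the frequency, scaled by the damping coefficient) and the standard stationary variance, the disturbance variance divided by one minus the squared damping coefficient. Only the two cycle states are included; the irregular state is omitted, and all names and numeric values are illustrative.

```python
import math


def cycle_transition(rho, lam):
    """Damped rotation matrix for the pair (cycle, auxiliary cycle)."""
    c, s = math.cos(lam), math.sin(lam)
    return [[rho * c, rho * s], [-rho * s, rho * c]]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


rho, lam, var_nu = 0.9, 2.0 * math.pi / 12.0, 0.5
T = cycle_transition(rho, lam)
var_psi = var_nu / (1.0 - rho ** 2)             # ASSUMED stationary variance
P = [[var_psi, 0.0], [0.0, var_psi]]

# One step of the variance recursion: T P T' + Q with Q = var_nu * I.
TPTt = mat_mul(mat_mul(T, P), transpose(T))
P_next = [
    [TPTt[i][j] + (var_nu if i == j else 0.0) for j in range(2)] for i in range(2)
]
```

Because `P_next` reproduces `P`, this covariance is a fixed point of the state recursion, which is the defining property of a stationary starting distribution when the damping coefficient is less than one in absolute value.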

An autoregression $\text{[math]}$ can be considered as a special case of cycle with frequency $\text{[math]}$ equal to $\text{[math]}$ or $\text{[math]}$ . In this case the equation for $\text{[math]}$ is not needed. Therefore, for a UCM consisting of an autoregressive component and an irregular component, the state space model simplifies to the following form.

The state vector $\text{[math]}$ , and the state noise vector $\text{[math]}$ . The system matrices are

$\text{[math]}$

The distribution of the initial state vector $\text{[math]}$ is proper, with $\text{[math]}$ , where $\text{[math]}$ . The parameter vector $\text{[math]}$ .

Incorporating Predictors of Different Kinds

In the UCM procedure, predictors can be incorporated in a UCM in a variety of ways: simple time-invariant linear predictors are specified in the MODEL statement, predictors with time-varying coefficients can be specified in the RANDOMREG statement, and predictors that have a nonlinear relationship with the response variable can be specified in the SPLINEREG statement. As with earlier examples, a simple special case illustrates how to obtain a state space form of a UCM that contains such a variety of predictors. Consider a random walk trend model with predictors $\text{[math]}$, and $\text{[math]}$. Let us assume that $\text{[math]}$ is a simple regressor specified in the MODEL statement, $\text{[math]}$ and $\text{[math]}$ are random regressors with time-varying regression coefficients that are specified in the same RANDOMREG statement, and $\text{[math]}$ is a nonlinear regressor specified in a SPLINEREG statement. Let us further assume that the spline associated with $\text{[math]}$ has degree one and is based on two internal knots. As explained in the section SPLINEREG Statement, using $\text{[math]}$ is equivalent to using $\text{[math]}$ derived (random) regressors: say, $\text{[math]}$. In all there are $\text{[math]}$ regressors, the first one being a simple regressor and the others being time-varying coefficient regressors. The time-varying regressors are in two groups, the first consisting of $\text{[math]}$ and $\text{[math]}$ and the other consisting of $\text{[math]}$, and $\text{[math]}$. The dynamics of this model are as follows:

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

All the disturbances $\text{[math]}$ and $\text{[math]}$ are independent, zero-mean, Gaussian variables, where $\text{[math]}$ share a common variance parameter $\text{[math]}$ and $\text{[math]}$ share a common variance $\text{[math]}$ . These dynamics can be captured in the state space form by taking state $\text{[math]}$ , state disturbance $\text{[math]}$ , and the system matrices

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

Note that the regression coefficients are elements of the state vector and that the system vector $\text{[math]}$ is not time invariant. The distribution of the initial state vector $\text{[math]}$ is diffuse, with $\text{[math]}$ and $\text{[math]}$. The parameters of this model are the disturbance variances, $\text{[math]}$, $\text{[math]}$, $\text{[math]}$, and $\text{[math]}$, which are estimated by maximizing the likelihood. The regression coefficients, time-invariant $\text{[math]}$ and time-varying $\text{[math]}$ and $\text{[math]}$, are implicitly estimated during the state estimation (smoothing).
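The way regressors enter the observation vector can be shown with a deliberately smaller, hypothetical model than the one above: a random walk trend with one fixed-coefficient regressor and one random-coefficient regressor. The state ordering, names, and numeric values are assumptions made only for this sketch.

```python
# State ordering (ASSUMED): [irregular, level, fixed coefficient, random coefficient].

def observation_vector(x_fixed, x_random):
    """Time-varying observation vector: the regressor values sit inside it."""
    return [1.0, 1.0, x_fixed, x_random]


# The transition matrix is time invariant: the irregular has no memory, while
# the level and both coefficients carry over (random walks or constants).
T = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def noise_variances(var_eps, var_eta, var_kappa):
    """Diagonal of the state noise covariance; the fixed coefficient gets zero."""
    return [var_eps, var_eta, 0.0, var_kappa]


state = [0.5, 10.0, 2.0, -1.0]
z_t = observation_vector(x_fixed=3.0, x_random=4.0)
y_t = sum(z * a for z, a in zip(z_t, state))      # 0.5 + 10 + 3*2 + 4*(-1)
```

A zero disturbance variance keeps a coefficient constant over time, while a positive one lets it drift as a random walk; in both cases the coefficient is recovered as part of the state rather than as a likelihood parameter.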

Reporting Parameter Estimates for Random Regressors

If the random walk disturbance variance associated with a random regressor is held fixed at zero, then its coefficient is no longer time-varying, and the UCM procedure reports the parameter estimates for that regressor differently. The following points explain how the parameter estimates are reported in the parameter estimates table and in the OUTEST= data set.

If the random walk disturbance variance associated with a random regressor is not held fixed, then its estimate is reported in the parameter estimates table and in the OUTEST= data set.
If more than one random regressor is specified in a RANDOMREG statement, then the first regressor in the list is used as a representative of the list when reporting the corresponding common variance parameter estimate.
If the random walk disturbance variance is held fixed at zero, then the parameter estimates table and the OUTEST= data set contain the corresponding regression parameter estimate rather than the variance parameter estimate.
Similar considerations apply in the case of the derived random regressors associated with a spline regressor.

ARMA Irregular Component

The state space form for the irregular component that follows an ARMA(p,q) $\text{[math]}$ (P,Q) $\text{[math]}$ model is described in this section. The notation for ARMA models is explained in the IRREGULAR statement. A number of alternative state space forms are possible in this case; the one given here is based on Jones (1980). With slight abuse of notation, let $\text{[math]}$ denote the effective autoregressive order and $\text{[math]}$ denote the effective moving average order of the model. Similarly, let $\text{[math]}$ be the effective autoregressive polynomial and $\text{[math]}$ be the effective moving average polynomial in the backshift operator with coefficients $\text{[math]}$ and $\text{[math]}$, obtained by multiplying the respective nonseasonal and seasonal factors. Then, a random sequence $\text{[math]}$ that follows an ARMA(p,q) $\text{[math]}$ (P,Q) $\text{[math]}$ model with a white noise sequence $\text{[math]}$ has a state space form with state vector of size $\text{[math]}$. The system matrices, which are time invariant, are as follows: $\text{[math]}$. The state transition matrix $\text{[math]}$, in block form, is given by

$\text{[math]}$

where $\text{[math]}$ if $\text{[math]}$ and $\text{[math]}$ is an $\text{[math]}$ dimensional identity matrix. The covariance of the state disturbance matrix $\text{[math]}$ where $\text{[math]}$ is the variance of the white noise sequence $\text{[math]}$ and the vector $\text{[math]}$ contains the first $\text{[math]}$ values of the impulse response function, that is, the first $\text{[math]}$ coefficients in the expansion of the ratio $\text{[math]}$. Since $\text{[math]}$ is a stationary sequence, the initial state is nondiffuse and $\text{[math]}$. The description of $\text{[math]}$, the covariance matrix of the initial state, is a little involved; the details are given in Jones (1980).
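The impulse response values mentioned above can be computed by power series division of the moving average polynomial by the autoregressive polynomial. In the sketch below both polynomials are passed as full coefficient lists in increasing powers of the backshift operator, signs included, so that no particular sign convention for the ARMA coefficients is assumed. The function name and the example values are hypothetical.

```python
def impulse_response(ar_poly, ma_poly, n_terms):
    """First n_terms coefficients of the expansion of ma_poly / ar_poly.

    ar_poly, ma_poly : coefficients of 1, B, B^2, ... (leading coefficient first),
                       for example [1.0, -0.5] for the polynomial 1 - 0.5 B
    """
    psi = []
    for k in range(n_terms):
        value = ma_poly[k] if k < len(ma_poly) else 0.0
        # Subtract the contribution of the earlier impulse response values.
        for i in range(1, min(k, len(ar_poly) - 1) + 1):
            value -= ar_poly[i] * psi[k - i]
        psi.append(value / ar_poly[0])
    return psi


# Example: (1 + 0.4 B) / (1 - 0.5 B)
psi = impulse_response([1.0, -0.5], [1.0, 0.4], 4)
```

For a seasonal model, the effective polynomials would first be formed by multiplying the nonseasonal and seasonal factors, and the resulting coefficient lists would then be passed to this function.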

Models with Dependent Lags

The state space form of a UCM that includes lags of the dependent variable is quite different from the state space forms considered so far. Let us consider an example to illustrate this situation. Consider a model that has a random walk trend and two simple time-invariant regressors, and that also includes a few lags (say, $\text{[math]}$) of the dependent variable. That is,

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

The state space form of this augmented model can be described in terms of the state space form of a model that has random walk trend with two simple time-invariant regressors. A superscript dagger ( $\text{[math]}$ ) has been added to distinguish the augmented model state space entities from the corresponding entities of the state space form of the random walk with predictors model. With this notation, the state vector of the augmented model $\text{[math]}$ and the new state noise vector $\text{[math]}$ , where $\text{[math]}$ is the matrix product $\text{[math]}$ . Note that the length of the new state vector is $\text{[math]}$ . The new system matrices, in block form, are

$\text{[math]}$

where $\text{[math]}$ is the $\text{[math]}$ dimensional identity matrix and

$\text{[math]}$

Note that the $\text{[math]}$ and $\text{[math]}$ matrices of the random walk with predictors model are time invariant, and in the expressions above their time indices are kept because they illustrate the pattern for more general models. The initial state vector is diffuse, with

$\text{[math]}$

The parameters of this model are the disturbance variances $\text{[math]}$ and $\text{[math]}$, the lag coefficients $\text{[math]}$, and the regression coefficients $\text{[math]}$ and $\text{[math]}$. As before, the regression coefficients are estimated during the state smoothing, and the other parameters are estimated by maximizing the likelihood.
