The SSM Procedure

Building a Complex Model Specification

In addition to being able to specify the system matrices in a flexible way, you can also build a complex model specification in a modular way by combining simpler subspecifications. Suppose that the state vector for the model to be specified is composed of subsections that are statistically independent, which is a common scenario in practical modeling situations. For example, suppose that $\pmb {\alpha }_ t$ can be divided into two disjoint subsections $\pmb {\alpha }_{t}^{a}$ and $\pmb {\alpha }_{t}^{b}$, which are statistically independent. This entails a corresponding block-diagonal structure in the system matrices $\mb{T}_{t}$, $\mb{W}_{t}$, and $\mb{Q}_{t}$ that govern the state equations. In this case, the term $ \mb{Z}_{t} \pmb {\alpha }_{t}$ that appears in the observation equation also splits into the sum $ \mb{Z}_{t}^{a} \pmb {\alpha }_{t}^{a} + \mb{Z}_{t}^{b} \pmb {\alpha }_{t}^{b}$ for appropriately partitioned matrices $ \mb{Z}_{t}^{a}$ and $ \mb{Z}_{t}^{b}$. The model specification syntax of the SSM procedure makes it easy to build an SSM from such smaller pieces.

Throughout this chapter, the linear combinations of the state subsections (such as $ \mb{Z}_{t}^{a} \pmb {\alpha }_{t}^{a}$) that appear in the observation equation are called components. An SSM specification in the SSM procedure is created by combining separate component specifications. In general, you specify a component in two steps: first you define a state subsection $\pmb {\alpha }_{t}^{a}$, and then you define a matching linear combination $ \mb{Z}_{t}^{a} \pmb {\alpha }_{t}^{a}$. For some special components, such as some commonly needed trend components, you can combine these two steps into one keyword specification.
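
The partitioned structure described above can be checked numerically. The following Python sketch is an illustration only and is not SSM procedure syntax: the helper functions block_diag and matvec and all numeric values are hypothetical, chosen so that subsection a has two elements and subsection b has one. It assembles a block-diagonal transition matrix from two blocks and confirms that $ \mb{Z}_{t} \pmb {\alpha }_{t}$ equals the sum of the two component terms.

```python
def block_diag(a, b):
    """Place matrices a and b on the diagonal and fill the rest with zeros."""
    cols_a, cols_b = len(a[0]), len(b[0])
    top = [row + [0.0] * cols_b for row in a]
    bottom = [[0.0] * cols_a + row for row in b]
    return top + bottom


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# Hypothetical transition blocks for the two state subsections.
T_a = [[1.0, 1.0], [0.0, 1.0]]
T_b = [[0.5]]
T = block_diag(T_a, T_b)

# Hypothetical state subsections and the full state vector.
alpha_a = [2.0, 3.0]
alpha_b = [4.0]
alpha = alpha_a + alpha_b  # list concatenation: [alpha_a; alpha_b]

# Partitioned observation matrix Z = [Z_a  Z_b] for one response variable.
Z_a = [[1.0, 0.0]]
Z_b = [[1.0]]
Z = [Z_a[0] + Z_b[0]]

# Z alpha computed on the full state, and as a sum of two components.
full = matvec(Z, alpha)
split = [x + y for x, y in zip(matvec(Z_a, alpha_a), matvec(Z_b, alpha_b))]

print(T)
print(full, split)
```

Because the off-diagonal blocks of the transition matrix are zero, the transition step for each subsection can be computed from that subsection alone, which is what makes it possible to specify the two pieces separately.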

The following list summarizes the (nonprogramming) SSM procedure syntax statements used for model specification:

  • The ID statement specifies the index variable ($\tau $). The data within each BY group are assumed to be sorted in ascending order of the ID variable. The SSM procedure automatically creates a variable, _ID_DELTA_, which contains the difference between successive ID values. This variable is available for use by the programming statements to define time-varying system matrices. For example, in SSMs that are used for modeling longitudinal data, the $\mb{T}_{t}$ and $\mb{Q}_{t}$ matrices often depend on _ID_DELTA_ (see Example 27.5).

  • The PARMS statement specifies variables that serve as the parameters of the model. That is, it partially defines the model parameter vector $\pmb {\theta }$. Other elements of $\pmb {\theta }$ are implicitly defined if your specification of the system matrices is incomplete.

  • The STATE statement specifies a subsection of the model state vector. Multiple STATE statements can be used in the model specification; each one defines a statistically independent subsection of the model state vector. For full customization, the $\mb{T}_{t}$, $\mb{W}_{t}$, and $\mb{Q}_{t}$ blocks that govern this subsection can be specified as lists of variables that are created by programming statements. However, you can obtain many commonly needed state subsection types simply by using the TYPE= option in this statement. For example, the use of TYPE=SEASON(LENGTH=12) results in a state subsection that can be used to define a monthly seasonal component.

  • The COMPONENT statement specifies a linear combination that matches a state subsection that is previously defined in a STATE statement. Thus, a matching pair of STATE and COMPONENT statements defines a component.

  • The TREND statement is used for easy specification of some commonly needed components that follow stochastic patterns of certain predefined types.

  • The IRREGULAR statement specifies the observation disturbance for a particular response variable.

  • The MODEL statement specifies the observation equation for one of the response variables. A separate MODEL statement is needed for each response variable in the multivariate case. The MODEL statement specifies an equation in which the left-hand side is the response variable and the right-hand side is a list that contains components and regression variables.
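
As a rough illustration of the successive differences mentioned in the description of the ID statement, the following Python sketch computes the gaps between ascending ID values and uses each gap to scale a disturbance variance. It is not SSM procedure code: the function names, the sample ID values, and the choice of a continuous-time random walk (whose increment variance over a gap of length h is the variance rate times h) are assumptions made for this example. The sketch also does not show how the procedure aligns _ID_DELTA_ values with individual observations.

```python
def id_deltas(ids):
    """Return the differences between successive ID values.

    Raises ValueError if the ID values are not in ascending order.
    """
    if any(b < a for a, b in zip(ids, ids[1:])):
        raise ValueError("ID values must be in ascending order")
    return [b - a for a, b in zip(ids, ids[1:])]


def random_walk_system(delta, sigma2):
    """Hypothetical 1x1 T and Q blocks for a random walk observed after a gap.

    Assumption for illustration: the state is carried forward unchanged
    (T = 1) and the disturbance variance grows linearly with the gap.
    """
    T = [[1.0]]
    Q = [[sigma2 * delta]]
    return T, Q


# Irregularly spaced, ascending ID values (hypothetical).
ids = [0.0, 0.5, 2.0, 2.25]
deltas = id_deltas(ids)

# One (T, Q) pair per gap; Q varies with the spacing, T does not.
systems = [random_walk_system(d, sigma2=4.0) for d in deltas]
q_values = [Q[0][0] for _, Q in systems]

print(deltas)
print(q_values)
```

With equally spaced ID values every gap is the same and the matrices reduce to the time-invariant case; the dependence on the gap matters only when the spacing is irregular.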