The AUTOREG Procedure


The modeling process consists of four stages: identification, specification, estimation, and diagnostic checking (Cromwell, Labys, and Terraza, 1994). The AUTOREG procedure supports tens of statistical tests for identification and diagnostic checking. Figure 8.15 illustrates how to incorporate these statistical tests into the modeling process.

Figure 8.15: Statistical Tests in the AUTOREG Procedure


Testing for Stationarity

Most time series theories require stationarity; therefore, it is critical to determine whether a time series is stationary. Examples of nonstationary time series include fractionally integrated time series and autoregressive series with random coefficients. More often, however, a time series is nonstationary because of an upward trend over time. The trend can be captured by either of the following two models.

  • The difference stationary process

    \[  (1- L)y_ t =\delta +\psi (L) \epsilon _ t  \]

    where $L$ is the lag operator, $\psi (1)\neq 0$, and $\epsilon _ t$ is a white noise sequence with mean zero and variance $\sigma ^2$. Hamilton (1994) also refers to this model as the unit root process.

  • The trend stationary process

    \[  y_ t =\alpha +\delta t +\psi (L) \epsilon _ t  \]

When a process has a unit root, it is said to be integrated of order one or I(1). An I(1) process is stationary after differencing once. The trend stationary process and the difference stationary process require different treatments to transform them into stationary processes for analysis. Therefore, it is important to distinguish between the two processes. Bhargava (1986) nested the two processes in the following general model

\begin{equation*}  y_ t = \gamma _0 +\gamma _1 t + \alpha ( y_{t-1} -\gamma _0 -\gamma _1(t-1)) + \psi (L)\epsilon _ t \end{equation*}

However, a difficulty is that the right-hand side is nonlinear in the parameters. Therefore, it is convenient to use a different parametrization

\begin{equation*}  y_ t =\beta _0 +\beta _1 t +\alpha y_{t-1} +\psi (L) \epsilon _ t \end{equation*}

The test of the null hypothesis that $\alpha =1$ against the one-sided alternative that $\alpha <1$ is called a unit root test.
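The distinction between the two trend models can be illustrated with a short simulation: differencing stabilizes a difference stationary series, while OLS detrending stabilizes a trend stationary series. The following Python sketch is illustrative only (it is not part of the AUTOREG procedure; numpy is assumed to be available):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
eps = rng.standard_normal(T)
delta = 0.1
t = np.arange(T)

# Difference stationary process: (1 - L) y_t = delta + eps_t  (random walk with drift)
y_ds = np.cumsum(delta + eps)

# Trend stationary process: y_t = alpha + delta*t + eps_t
y_ts = 1.0 + delta * t + eps

# Differencing the DS series recovers a stationary sequence around delta ...
d_ds = np.diff(y_ds)

# ... while OLS detrending (intercept and t) makes the TS series stationary.
X = np.column_stack([np.ones(T), t])
beta, *_ = np.linalg.lstsq(X, y_ts, rcond=None)
resid_ts = y_ts - X @ beta
```

Applying the wrong transformation (detrending a random walk, or differencing a trend stationary series) leaves serial dependence in the result, which is why distinguishing the two cases matters.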

Dickey-Fuller unit root tests are based on regression models similar to the previous model

\begin{equation*}  y_ t =\beta _0 +\beta _1 t +\alpha y_{t-1} + \epsilon _ t \end{equation*}

where $\epsilon _ t$ is assumed to be white noise. The $t$ statistic of the coefficient $\alpha $ does not follow the normal distribution asymptotically; instead, its distribution can be derived by using the functional central limit theorem. The Dickey-Fuller test considers three types of regression models, including the preceding one. The deterministic terms that are included in the other two types of regressions are either absent or a constant only.

The Dickey-Fuller unit root test requires the errors in the autoregressive model to be white noise, which is often not true. There are two popular ways to account for general serial correlation in the errors. One is the augmented Dickey-Fuller (ADF) test, which adds lagged differences to the regression model. This approach was originally proposed by Dickey and Fuller (1979) and later studied by Said and Dickey (1984) and Phillips and Perron (1988). The other method, proposed by Phillips and Perron (1988), is called the Phillips-Perron (PP) test. It adopts the original Dickey-Fuller regressions but modifies the test statistics to take account of the serial correlation and heteroscedasticity. It is called nonparametric because no specific form of the serial correlation of the errors is assumed.

A problem with the augmented Dickey-Fuller and Phillips-Perron unit root tests is that they are subject to size distortion and low power. Schwert (1989) reports that the size distortion is significant when the series contains a large moving average (MA) parameter. DeJong et al. (1992) find that, in some common settings, the ADF test has power of about one-third and the PP test has power of less than 0.1 against the trend stationary alternative. Among the more recent unit root tests that improve upon the size distortion and the low power are the tests described by Elliott, Rothenberg, and Stock (1996) and Ng and Perron (2001). These tests involve a detrending step before the test statistics are constructed and are demonstrated to perform better than the traditional ADF and PP tests.

Most testing procedures specify the unit root processes as the null hypothesis. Tests of the null hypothesis of stationarity have also been studied, among which Kwiatkowski et al. (1992) is very popular.

Economic theories often dictate that a group of economic time series are linked together by some long-run equilibrium relationship. Statistically, this phenomenon can be modeled by cointegration. When several nonstationary processes ${\mb {z}_{t} ={(z_{1t},{\cdots },z_{kt})’}}$ are cointegrated, there exists a $(k{\times }1)$ cointegrating vector $\mb {c}$ such that $\mb {c}’\mb {z}_{t}$ is stationary and $\mb {c}$ is a nonzero vector. One way to test the relationship of cointegration is the residual based cointegration test, which assumes the regression model

\[  y_{t} = {\beta }_{1} + {\mb {x} _{t}’}{\beta } + u_{t}  \]

where ${y_{t} = z_{1t}}$, ${\mb {x} _{t}=(z_{2t},{\cdots },z_{kt})’}$, and ${\beta }=({\beta }_{2},{\cdots },{\beta }_{k})’$. The OLS residuals from the regression model are used to test the null hypothesis of no cointegration. Engle and Granger (1987) suggest using the ADF test on the residuals, while Phillips and Ouliaris (1990) study tests that use the PP and other related test statistics.

Augmented Dickey-Fuller Unit Root and Engle-Granger Cointegration Testing

Common unit root tests have the null hypothesis that there is an autoregressive unit root $H_0: \alpha =1$, and the alternative is $H_ a: |\alpha |<1$, where $\alpha $ is the autoregressive coefficient of the time series

\begin{equation*}  y_ t = \alpha y_{t-1}+ \epsilon _ t \end{equation*}

This is referred to as the zero mean model. The standard Dickey-Fuller (DF) test assumes that errors $\epsilon _ t$ are white noise. There are two other types of regression models that include a constant or a time trend as follows:

\begin{align*}  y_ t &= \mu + \alpha y_{t-1}+ \epsilon _ t \\ y_ t &= \mu +\beta t+ \alpha y_{t-1}+ \epsilon _ t \end{align*}

These two models are referred to as the constant mean model and the trend model, respectively. The constant mean model includes a constant mean $\mu $ of the time series. However, the interpretation of $\mu $ depends on the stationarity of the series: $\mu $ is the mean in the stationary case when $\alpha <1$, but it acts as a drift (trend) in the integrated case when $\alpha =1$. Therefore, the null hypothesis should arguably be the joint hypothesis that $\alpha =1$ and $\mu =0$. However, the unit root test statistics are concerned with the null hypothesis of $\alpha =1$; the joint null hypothesis is not commonly used. This issue is addressed in Bhargava (1986) with a different nesting model.

There are two types of test statistics. The conventional $t$ ratio is

\[  DF_\tau = \frac{\hat{\alpha }-1}{sd(\hat{\alpha })}  \]

and the second test statistic, called $\rho $-test, is

\[  T(\hat{\alpha }-1)  \]
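Both statistics can be computed directly from an OLS fit of the trend model. The following Python sketch is illustrative (it is not the AUTOREG implementation; numpy is assumed to be available):

```python
import numpy as np

def df_statistics(y):
    """Dickey-Fuller tau and rho statistics for the trend model
    y_t = beta0 + beta1*t + alpha*y_{t-1} + eps_t, estimated by OLS."""
    T = len(y) - 1
    ylag = y[:-1]
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), ylag])
    yy = y[1:]
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    resid = yy - X @ beta
    s2 = resid @ resid / (T - X.shape[1])        # residual variance
    XtX_inv = np.linalg.inv(X.T @ X)
    alpha_hat = beta[2]
    sd_alpha = np.sqrt(s2 * XtX_inv[2, 2])
    df_tau = (alpha_hat - 1.0) / sd_alpha        # conventional t ratio
    df_rho = T * (alpha_hat - 1.0)               # rho-test statistic
    return df_tau, df_rho
```

Remember that neither statistic is compared against normal or $t$ critical values; the nonstandard limiting distributions given in the next paragraphs must be used.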

For the zero mean model, the asymptotic distributions of the Dickey-Fuller test statistics are

\begin{align*}  T(\hat{\alpha }-1) &\Rightarrow \left(\int _0^1 W(r) dW(r) \right)\left( \int _0^1 W(r)^2 dr\right)^{-1} \\ DF_\tau &\Rightarrow \left(\int _0^1 W(r) dW(r) \right)\left( \int _0^1 W(r)^2 dr\right)^{-1/2} \end{align*}

For the constant mean model, the asymptotic distributions are

\begin{align*}  T(\hat{\alpha }-1) &\Rightarrow \left( [W(1)^2-1]/2 -W(1)\int _0^1 W(r)dr \right)\left( \int _0^1 W(r)^2 dr - \left(\int _0^1 W(r)dr \right)^2\right)^{-1} \\ DF_\tau &\Rightarrow \left([W(1)^2-1]/2 -W(1)\int _0^1 W(r)dr \right)\left( \int _0^1 W(r)^2 dr - \left(\int _0^1 W(r)dr \right)^2\right)^{-1/2} \end{align*}

For the trend model, the asymptotic distributions are

\begin{equation*} \begin{split}  T(\hat{\alpha }-1) &  \Rightarrow \left[ \int _0^1 W(r)dW +12\left(\int _0^1 rW(r)dr -\frac{1}{2}\int _0^1 W(r)dr \right) \left( \int _0^1 W(r)dr - \frac12 W(1)\right) \right. \\ &  \quad \left. -W(1) \int _0^1 W(r)dr \right] D^{-1} \\ DF_\tau & \Rightarrow \left[ \int _0^1 W(r)dW +12\left(\int _0^1 rW(r)dr -\frac{1}{2}\int _0^1 W(r)dr \right) \left( \int _0^1 W(r)dr - \frac12 W(1)\right) \right. \\ &  \quad \left. -W(1) \int _0^1 W(r)dr \right] D^{-1/2} \end{split}\end{equation*}

where

\[  D= \int _0^1 W(r)^2 dr - 12\left( \int _0^1 rW(r)dr \right)^2 +12 \int _0^1 W(r)dr \int _0^1 rW(r)dr -4\left( \int _0^1 W(r) dr \right) ^2  \]
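These limiting distributions are nonstandard, but their quantiles can be approximated by Monte Carlo simulation of the discretized Brownian functionals. The following Python sketch (illustrative only; the step and replication counts are arbitrary choices) approximates the 5% critical values for the zero mean case:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 1000, 2000          # discretization steps and Monte Carlo replications
rho_stats, tau_stats = [], []
for _ in range(reps):
    e = rng.standard_normal(n)
    s = np.concatenate([[0.0], np.cumsum(e)])[:-1]   # partial sums S_{t-1}
    num = (s * e).sum() / n                          # approximates int_0^1 W dW
    den = (s * s).sum() / n**2                       # approximates int_0^1 W(r)^2 dr
    rho_stats.append(num / den)                      # T(alpha_hat - 1) limit
    tau_stats.append(num / np.sqrt(den))             # DF_tau limit

# Approximate 5% critical values of the zero mean Dickey-Fuller distributions
cv_rho = np.quantile(rho_stats, 0.05)
cv_tau = np.quantile(tau_stats, 0.05)
```

With enough replications these quantiles approach the tabulated Dickey-Fuller critical values (roughly $-8$ for the $\rho $-test and $-1.95$ for $DF_\tau $ at the 5% level in the zero mean case).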

One problem of the Dickey-Fuller and similar tests that employ the three types of regressions is the difficulty in specifying the deterministic trends. Campbell and Perron (1991) claimed that the proper handling of deterministic trends is a vital prerequisite for dealing with unit roots. However, the proper handling is not obvious, because the distribution theory of the relevant statistics about the deterministic trends is not available. Hayashi (2000) suggests using the constant mean model when you believe there is no trend, and using the trend model when you believe otherwise. However, no formal procedure is provided.

The null hypothesis of the Dickey-Fuller test is a random walk, possibly with drift. The differenced process is not serially correlated under the null of I(1). There is a great need for the generalization of this specification. The augmented Dickey-Fuller (ADF) test, originally proposed in Dickey and Fuller (1979), adjusts for the serial correlation in the time series by adding lagged first differences to the autoregressive model,

\begin{equation*}  \Delta y_ t = \mu +\delta t + \alpha y_{t-1} + \sum _{j=1}^ p \alpha _ j \Delta y_{t-j} + \epsilon _ t \end{equation*}

where the deterministic terms $\delta t$ and $\mu $ can be absent for the models without drift or linear trend. As previously, there are two types of test statistics. One is the OLS $t$ value

\begin{equation*}  \frac{\hat{\alpha }-1}{sd( \hat{\alpha })} \end{equation*}

and the other is given by

\[  \frac{T(\hat{\alpha } -1)}{1- \hat{\alpha }_1 -\ldots -\hat{\alpha }_ p}  \]

The asymptotic distributions of the test statistics are the same as those of the standard Dickey-Fuller test statistics.
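The ADF regression and its two test statistics can be sketched in Python as follows (illustrative only, not the AUTOREG implementation; numpy is assumed to be available):

```python
import numpy as np

def adf_test(y, p, trend=True):
    """Augmented Dickey-Fuller regression
    dy_t = mu + delta*t + (alpha-1)*y_{t-1} + sum_j alpha_j*dy_{t-j} + eps_t.
    Returns the OLS t value for alpha = 1 and the rho-type statistic
    T*(alpha_hat - 1) / (1 - alpha_1_hat - ... - alpha_p_hat)."""
    dy = np.diff(y)
    T = len(dy) - p
    cols = [np.ones(T)]
    if trend:
        cols.append(np.arange(T, dtype=float))
    cols.append(y[p:-1])                         # y_{t-1}
    for j in range(1, p + 1):
        cols.append(dy[p - j:-j])                # lagged differences dy_{t-j}
    X = np.column_stack(cols)
    yy = dy[p:]
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    resid = yy - X @ beta
    s2 = resid @ resid / (T - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    k = 2 if trend else 1                        # index of the y_{t-1} coefficient
    gamma = beta[k]                              # equals alpha - 1
    t_stat = gamma / np.sqrt(cov[k, k])
    rho_stat = T * gamma / (1.0 - beta[k + 1:].sum())
    return t_stat, rho_stat
```

Because the limiting distributions match the standard Dickey-Fuller case, the same critical values apply regardless of the number of augmenting lags.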

Nonstationary multivariate time series can be tested for cointegration, which means that a linear combination of these time series is stationary. Formally, denote the series by ${\mb {z}_{t} ={(z_{1t},{\cdots },z_{kt})’}}$. The series are cointegrated if there exists a nonzero vector $\mb {c}$ such that $\mb {c}’\mb {z}_{t}$ is stationary. Residual-based cointegration tests were studied in Engle and Granger (1987) and Phillips and Ouliaris (1990). The latter tests are described in the next subsection. The first step regression is

\begin{equation*}  y_{t} = {\mathbf{x} _{t}’}{\beta } + u_{t} \end{equation*}

where ${y_{t} = z_{1t}}$, ${\mathbf{x} _{t}=(z_{2t},{\cdots },z_{kt})’}$, and ${\beta }=({\beta }_{2},{\cdots },{\beta }_{k})’$. This regression can also include an intercept or an intercept with a linear trend. The residuals are used to test for the existence of an autoregressive unit root. Engle and Granger (1987) proposed an augmented Dickey-Fuller type regression, without an intercept, on the residuals to test for the unit root. When the first step OLS does not include an intercept, the asymptotic distribution of the ADF test statistic $DF_\tau $ is given by

\begin{align*}  DF_\tau &\Longrightarrow \int _0^1 \frac{Q(r)}{(\int _0^1 Q^2)^{1/2}} dS \\ Q(r) &=W_1(r) -\int _0^1 W_1 W_2’ \left(\int _0^1 W_2 W_2’\right)^{-1} W_2(r) \\ S(r) &= \frac{Q(r)}{(\kappa ’\kappa )^{1/2}} \\ \kappa ’ &= \left(1, -\int _0^1 W_1 W_2’ \left(\int _0^1 W_2 W_2’\right)^{-1} \right) \end{align*}

where $W(r)$ is a $k$ vector standard Brownian motion and

\[  W(r)=\Bigl (W_1(r), W_2(r) \Bigr )  \]

is a partition such that $W_1(r)$ is a scalar and $W_2(r)$ is $k-1$ dimensional. The asymptotic distributions of the test statistics in the other two cases have the same form as the preceding formula. If the first step regression includes an intercept, then $W(r)$ is replaced by the demeaned Brownian motion $\overline{W}(r) = W(r) -\int _0^1 W(r)dr$. If the first step regression includes a time trend, then $W(r)$ is replaced by the detrended Brownian motion. The critical values of the asymptotic distributions are tabulated in Phillips and Ouliaris (1990) and MacKinnon (1991).
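The two-step Engle-Granger procedure can be sketched as follows. This Python fragment is illustrative only (the simulated cointegrated pair and the lag-zero residual regression are assumptions for the demonstration, not the AUTOREG implementation):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 400
# Two cointegrated I(1) series: x is a random walk, y = 2x + stationary error
x = np.cumsum(rng.standard_normal(T))
y = 2.0 * x + rng.standard_normal(T)

# Step 1: OLS of y on x (with intercept); the residuals estimate c'z_t
X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta

# Step 2: Dickey-Fuller type regression on the residuals, no intercept:
# du_t = gamma * u_{t-1} + eps_t; the t value of gamma is DF_tau
du, ulag = np.diff(u), u[:-1]
gamma = (ulag @ du) / (ulag @ ulag)
resid = du - gamma * ulag
s2 = resid @ resid / (len(du) - 1)
df_tau = gamma / np.sqrt(s2 / (ulag @ ulag))
```

A strongly negative `df_tau` (judged against the Phillips-Ouliaris or MacKinnon critical values, not the ordinary Dickey-Fuller tables) leads to rejection of the null of no cointegration.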

The residual based cointegration tests have a major shortcoming: different choices of the dependent variable in the first step OLS might produce contradictory results. This can be explained theoretically. If the dependent variable is in the cointegration relationship, then the test is consistent against the alternative that there is cointegration. On the other hand, if the dependent variable is not in the cointegration system, the OLS residuals $y_{t} - {\mathbf{x} _{t}’}{\beta }$ do not converge to a stationary process. Changing the dependent variable is more likely to produce conflicting results in finite samples.

Phillips-Perron Unit Root and Cointegration Testing

Besides the ADF test, another popular unit root test that is valid under general serial correlation and heteroscedasticity was developed by Phillips (1987) and Phillips and Perron (1988). Unlike the ADF tests, these tests are constructed using AR(1) type regressions, with a corrected estimation of the long run variance of $\Delta {y}_ t$. In the case without an intercept, consider the driftless random walk process

\[  y_{t} = y_{t-1} + u_{t}  \]

where the disturbances might be serially correlated with possible heteroscedasticity. Phillips and Perron (1988) proposed the unit root test of the OLS regression model,

\[  y_{t} = {\rho }y_{t-1} + u_{t}  \]

Denote the OLS residual by $\hat u_{t}$. The asymptotic variance of ${\frac{1}{\mi {T} }\sum _{t=1}^{T}{\hat{u}^{2}_{t}}}$ can be estimated by using the truncation lag $l$:

\[  \hat{{\lambda }} = \sum _{j=0}^{l}{{\kappa }_{j}[1- j/(l+1)]\hat{{\gamma }}_{j}}  \]

where ${{\kappa }_{0}=1}$, ${{\kappa }_{j}=2}$ for ${j>0}$, and ${\hat{{\gamma }}_{j} =\frac{1}{\mi {T} }\sum _{t=j+1}^{T}{\hat{u}_{t}\hat{u}_{t-j}}}$. This is a consistent estimator suggested by Newey and West (1987).
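The Newey-West estimator $\hat{\lambda }$ can be sketched in a few lines of Python (illustrative only; numpy is assumed to be available):

```python
import numpy as np

def newey_west(u, l):
    """Newey-West estimate
    lambda_hat = sum_{j=0}^{l} kappa_j * (1 - j/(l+1)) * gamma_hat_j,
    with kappa_0 = 1, kappa_j = 2 for j > 0, and
    gamma_hat_j = (1/T) * sum_{t=j+1}^{T} u_t * u_{t-j}."""
    T = len(u)
    lam = 0.0
    for j in range(l + 1):
        gamma_j = (u[j:] * u[:T - j]).sum() / T
        kappa = 1.0 if j == 0 else 2.0
        lam += kappa * (1.0 - j / (l + 1)) * gamma_j
    return lam
```

The Bartlett weights $1 - j/(l+1)$ guarantee that the estimate is nonnegative, which matters because $\hat{\lambda }$ appears under a square root in the $\hat{\mr {Z}}_\tau $ statistic below.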

The variance of $u_ t$ can be estimated by ${s^{2} = \frac{1}{\mi {T} -k}\sum _{t=1}^{T}{\hat{u}^{2}_{t}}}$. Let ${\hat\sigma }^{2}$ be the variance estimate of the OLS estimator ${\hat\rho }$. Then the Phillips-Perron $\hat{\mr {Z}}_{\rho }$ test (zero mean case) is written

\[  \hat{\mr {Z}}_{\rho } = \mi {T} (\hat{{\rho }}-1) - \frac{1}{2} \mi {T} ^{2}\hat{{\sigma }}^{2}(\hat{{\lambda }}-\hat{{\gamma }}_{0}) / {s^{2}}  \]

The $\hat{\mr {Z}}_{\rho }$ statistic is just the ordinary Dickey-Fuller $\hat{\mr {Z}}_{\alpha }$ statistic with a correction term that accounts for the serial correlation. The correction term goes to zero asymptotically if there is no serial correlation.

Note that ${\mr {P}(\hat{{\rho }} < 1) {\approx } 0.68}$ as ${\mi {T} {\rightarrow }{\infty }}$, which shows that the limiting distribution is skewed to the left.

Let $\tau _{\rho }$ be the $\tau $ statistic for ${\hat\rho }$. The Phillips-Perron ${\hat{\mr {Z}}_ t}$ (defined here as ${\hat{\mr {Z}}_\tau }$) test is written

\[  \hat{\mr {Z}}_\tau = ({\hat{{\gamma }}_{0}}/{\hat{{\lambda }}})^{1/2}t_{\hat{{\rho }}} - \frac{1}{2} {\mi {T} \hat{{\sigma }}(\hat{{\lambda }}-\hat{{\gamma }}_{0})} /{(s \hat{{\lambda }}^{1/2})}  \]

To incorporate a constant intercept, the regression model ${y_{t}={\mu }+{\rho }y_{t-1}+u_{t}}$ is used (the single mean case); under the null hypothesis, the series is a driftless random walk with a nonzero unconditional mean. To incorporate a time trend, the regression model $y_{t}={\mu }+\delta t +\rho y_{t-1}+u_{t}$ is used; under the null hypothesis, the series is a random walk with drift.
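The way the correction term works can be seen in a direct implementation of the zero mean case. The following Python sketch is illustrative (not the AUTOREG implementation; the truncation lag `l` is supplied by the caller):

```python
import numpy as np

def pp_test(y, l):
    """Phillips-Perron Z_rho and Z_tau statistics for the zero mean case:
    OLS of y_t on y_{t-1}, with a Newey-West correction for serial correlation."""
    ylag, yy = y[:-1], y[1:]
    T = len(yy)
    rho = (ylag @ yy) / (ylag @ ylag)
    u = yy - rho * ylag
    s2 = u @ u / (T - 1)                       # s^2 with k = 1 regressor
    sigma2 = s2 / (ylag @ ylag)                # variance estimate of rho_hat
    t_rho = (rho - 1.0) / np.sqrt(sigma2)
    # Newey-West long-run variance of the residuals (Bartlett weights)
    gam = lambda j: (u[j:] * u[:T - j]).sum() / T
    lam = gam(0) + 2.0 * sum((1.0 - j / (l + 1)) * gam(j) for j in range(1, l + 1))
    z_rho = T * (rho - 1.0) - 0.5 * T**2 * sigma2 * (lam - gam(0)) / s2
    z_tau = (np.sqrt(gam(0) / lam) * t_rho
             - 0.5 * T * np.sqrt(sigma2) * (lam - gam(0)) / (np.sqrt(s2) * np.sqrt(lam)))
    return z_rho, z_tau
```

When the residuals are serially uncorrelated, $\hat{\lambda }\approx \hat{\gamma }_0$ and both statistics collapse to their Dickey-Fuller counterparts.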

The limiting distributions of the test statistics for the zero mean case are

\begin{align*}  \hat{\mr {Z}}_{\rho } &\Rightarrow \frac{\frac{1}{2}\{ [\mi {B} (1)]^{2}-1\} }{\int _{0}^{1}{[\mi {B} (x)]^{2}dx}} \\ \hat{\mr {Z}}_\tau &\Rightarrow \frac{\frac{1}{2}\{ [\mi {B} (1)]^{2}-1\} }{\{  \int _{0}^{1}{[\mi {B} (x)]^{2}dx}\} ^{1/2} } \end{align*}

where B(${\cdot }$) is a standard Brownian motion.

The limiting distributions of the test statistics for the intercept case are

\begin{align*}  \hat{\mr {Z}}_{\rho } &\Rightarrow \frac{\frac{1}{2}\{ [\mi {B} (1)]^{2}-1\}  -\mi {B} (1)\int _{0}^{1}{\mi {B} (x)dx}}{\int _{0}^{1}{[\mi {B} (x)]^{2}dx} -\left[{\int _{0}^{1}{\mi {B} (x)dx}}\right]^{2}} \\ \hat{\mr {Z}}_\tau &\Rightarrow \frac{\frac{1}{2}\{ [\mi {B} (1)]^{2}-1\}  -\mi {B} (1)\int _{0}^{1}{\mi {B} (x)dx}}{\{  \int _{0}^{1}{[\mi {B} (x)]^{2}dx} -\left[{\int _{0}^{1}{\mi {B} (x)dx}}\right]^{2} \} ^{1/2} } \end{align*}

Finally, the limiting distributions of the test statistics for the trend case can be derived as

\[  \begin{bmatrix}  0   &  c   &  0   \end{bmatrix} V^{-1} \begin{bmatrix}  \mi {B} (1)   \\ \left({\mi {B} (1)}^{2}-1 \right)/2   \\ \mi {B} (1)-\int _{0}^{1}{\mi {B} (x)dx}   \end{bmatrix}  \]

where ${c=1}$ for ${\hat{\mr {Z}}_\rho }$ and ${c=\frac{1}{\sqrt {Q}}}$ for ${\hat{\mr {Z}}_\tau }$,

\[  V = \left[\begin{matrix}  1   &  \int _{0}^{1}{\mi {B} (x)dx}   &  1/2   \\ \int _{0}^{1}{\mi {B} (x)dx}   &  \int _{0}^{1}{{\mi {B} (x)} ^{2}dx}   &  \int _{0}^{1}{x\mi {B} (x)dx}   \\ 1/2  &  \int _{0}^{1}{x\mi {B} (x)dx}   &  1/3   \end{matrix}\right]  \]
\[  Q = \left[\begin{matrix}  0   &  c   &  0   \end{matrix}\right] V^{-1} \left[\begin{matrix}  0   &  c   &  0   \end{matrix}\right]^ T  \]

The finite sample performance of the PP test is not satisfactory (see Hayashi (2000)).

When several variables ${\mb {z}_{t} ={(z_{1t},{\cdots },z_{kt})’}}$ are cointegrated, there exists a $(k{\times }1)$ cointegrating vector $\mb {c}$ such that $\mb {c}’\mb {z}_{t}$ is stationary and $\mb {c}$ is a nonzero vector. The residual based cointegration test assumes the following regression model:

\[  y_{t} = {\beta }_{1} + {\mb {x} _{t}’}{\beta } + u_{t}  \]

where ${y_{t} = z_{1t}}$, ${\mb {x} _{t}=(z_{2t},{\cdots },z_{kt})’}$, and ${\beta }=({\beta }_{2},{\cdots },{\beta }_{k})’$. You can estimate the cointegrating vector consistently by using OLS if all the variables are difference stationary, that is, I(1). The estimated cointegrating vector is ${\hat{\mb {c} }={(1,-\hat{{\beta }}_{2},{\cdots },-\hat{{\beta }}_{k})’}}$. The Phillips-Ouliaris test is computed using the OLS residuals from the preceding regression model, and it uses the PP unit root tests $ \hat{\mr {Z}}_\rho $ and $ \hat{\mr {Z}}_\tau $ developed in Phillips (1987), although Phillips and Ouliaris (1990) also derive the asymptotic distributions of some other leading unit root tests. The null hypothesis is no cointegration.

You need to refer to the tables by Phillips and Ouliaris (1990) to obtain the $p$-value of the cointegration test. Before you apply the cointegration test, you may want to perform the unit root test for each variable (see the option STATIONARITY=).

As in the Engle-Granger cointegration tests, the Phillips-Ouliaris test can give conflicting results for different choices of the regressand. There are other cointegration tests that are invariant to the order of the variables, including Johansen (1988), Johansen (1991), Stock and Watson (1988).

ERS and Ng-Perron Unit Root Tests

As mentioned earlier, ADF and PP both suffer severe size distortion and low power. There is a class of newer tests that improve both size and power. These are sometimes called efficient unit root tests, and among them tests by Elliott, Rothenberg, and Stock (1996) and Ng and Perron (2001) are prominent.

Elliott, Rothenberg, and Stock (1996) consider the data generating process

\begin{align*}  y_ t &= \beta ’ z_ t +u_ t \\ u_ t &= \alpha u_{t-1}+v_ t , \quad t=1,\ldots , T \end{align*}

where $\{ z_ t\} $ is either $\{ 1\} $ or $\{ (1,t)\} $ and $\{ v_ t\} $ is an unobserved stationary zero-mean process with positive spectral density at zero frequency. The null hypothesis is $H_0: \alpha =1$, and the alternative is $H_ a: |\alpha |<1$. The key idea of Elliott, Rothenberg, and Stock (1996) is to study the asymptotic power and asymptotic power envelope of some new tests. Asymptotic power is defined with a sequence of local alternatives. For a fixed alternative hypothesis, the power of a test usually goes to one when sample size goes to infinity; however, this says nothing about the finite sample performance. On the other hand, when the data generating process under the alternative moves closer to the null hypothesis as the sample size increases, the power does not necessarily converge to one. The local-to-unity alternatives in ERS are

\[  \alpha = 1+ \frac{c}{T}  \]

and the power against the local alternatives has a limit as $T$ goes to infinity, which is called asymptotic power. This value is strictly between 0 and 1. Asymptotic power indicates the adequacy of a test to distinguish small deviations from the null hypothesis.


For a fixed $\alpha $, define the quasi-differenced data

\begin{align*}  y_\alpha &=(y_1, (1-\alpha L)y_2,\ldots , (1-\alpha L) y_ T) \\ z_\alpha &=(z_1, (1-\alpha L)z_2,\ldots , (1-\alpha L) z_ T) \end{align*}

Let $S(\alpha )$ be the sum of squared residuals from a least squares regression of $y_\alpha $ on $z_\alpha $. Then the point optimal test against the local alternative $\bar{\alpha } = 1 + \bar{c}/T$ has the form

\[  P_ T^{GLS} =\frac{S(\bar{\alpha }) -\bar{\alpha }S(1)}{\hat{\omega }^2}  \]

where $\hat{\omega }^2$ is an estimator for $\omega ^2= \sum _{k=-\infty }^{\infty } Ev_ t v_{t-k}$. The test rejects the null when $P_ T^{GLS}$ is small. The asymptotic power function for the point optimal test that is constructed with $\bar{c}$ under local alternatives with $c$ is denoted by $\pi (c,\bar{c})$. Then the power envelope is $\pi (c,c)$, because the test formed with $\bar{c}$ is the most powerful against the alternative $c=\bar{c}$. In other words, the asymptotic power function $\pi (c,\bar{c})$ is always below the power envelope $\pi (c,c)$ except at one point, $c=\bar{c}$, where they are tangent. Elliott, Rothenberg, and Stock (1996) show that choosing some specific values for $\bar{c}$ can cause the asymptotic power function $\pi (c,\bar{c})$ of the point optimal test to be very close to the power envelope. The optimal $\bar{c}$ is $-7$ when $z_ t =1$, and $-13.5$ when $z_ t =(1,t)’$. This choice of $\bar{c}$ corresponds to the tangent point where $\pi =0.5$. This is also true of the DF-GLS test.
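The construction of $P_ T^{GLS}$ can be sketched as follows. This Python fragment is illustrative only; in particular, estimating $\omega ^2$ by the short-run residual variance is a simplification that is appropriate only when $v_ t$ is white noise (a stated assumption of the sketch):

```python
import numpy as np

def ers_p_test(y, trend=False):
    """ERS point optimal statistic P_T^GLS = (S(abar) - abar*S(1)) / omega2_hat,
    where S(a) is the sum of squared residuals of quasi-differenced y on
    quasi-differenced z. omega2_hat is the short-run residual variance here,
    an illustrative simplification valid only for white noise v_t."""
    T = len(y)
    cbar = -13.5 if trend else -7.0
    abar = 1.0 + cbar / T

    def S(a):
        # quasi-difference y and z, then take the sum of squared OLS residuals
        ya = np.concatenate([[y[0]], y[1:] - a * y[:-1]])
        z = (np.column_stack([np.ones(T), np.arange(T, dtype=float)])
             if trend else np.ones((T, 1)))
        za = np.vstack([z[:1], z[1:] - a * z[:-1]])
        beta, *_ = np.linalg.lstsq(za, ya, rcond=None)
        r = ya - za @ beta
        return r @ r

    omega2 = S(abar) / (T - (2 if trend else 1))
    return (S(abar) - abar * S(1.0)) / omega2
```

Small values of the statistic lead to rejection of the unit root null, in line with the decision rule stated above.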

Elliott, Rothenberg, and Stock (1996) also propose the DF-GLS test, given by the $t$ statistic for testing $\psi _0=0$ in the regression

\begin{equation*}  \Delta y_ t^ d = \psi _0 y_{t-1}^ d + \sum _{j=1}^ p \psi _ j \Delta y_{t-j}^ d + \epsilon _{tp} \end{equation*}

where $y_ t^ d$ is obtained in a first step detrending

\[  y_ t^ d =y_ t -\hat{\beta }_{\bar{\alpha }} ’ z_ t  \]

and $\hat{\beta }_{\bar{\alpha }} $ is the least squares regression coefficient of $y_{\bar{\alpha }}$ on $z_{\bar{\alpha }}$. Regarding the lag length selection, Elliott, Rothenberg, and Stock (1996) favor the Schwarz Bayesian information criterion. The optimal selection of the lag length $p$ and the estimation of $\omega ^2$ are further discussed in Ng and Perron (2001). The lag length is selected from the interval $[0, p_{max}]$ for some fixed $p_{max}$ by using the modified Akaike’s information criterion,

\[  \text {MAIC}(p) = \log (\hat{\sigma }_ p^2) +\frac{2(\tau _ T(p) +p)}{T- p_{max}}  \]

where $\tau _ T(p) =(\hat{\sigma }_ p^2)^{-1} \hat{\psi }_0^2 \sum _{t=p_{max}+1}^{T-1} (y^ d_{t})^2$ and $\hat{\sigma }_ p^2= (T-p_{max}-1)^{-1} \sum _{t=p_{max}+1}^{T-1} \hat{\epsilon }_{tp}^2 $. For fixed lag length $p$, an estimate of $\omega ^2$ is given by

\[  \hat{\omega }^2= \frac{(T-1-p)^{-1} \sum _{t=p+2}^{T} \hat{\epsilon }_{tp}^2 }{ \left(1- \sum _{j=1}^ p \hat{\psi }_ j \right)^2}  \]
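The two steps of the DF-GLS test, GLS detrending followed by an ADF-type regression without deterministic terms, can be sketched in Python as follows (illustrative only, not the AUTOREG implementation; the lag length is fixed by the caller rather than selected by an information criterion):

```python
import numpy as np

def dfgls(y, p=1, trend=False):
    """DF-GLS: GLS-detrend with abar = 1 + cbar/T, then compute the t statistic
    for psi_0 = 0 in  d(yd_t) = psi_0*yd_{t-1} + sum_j psi_j*d(yd_{t-j}) + eps."""
    T = len(y)
    cbar = -13.5 if trend else -7.0
    abar = 1.0 + cbar / T
    z = (np.column_stack([np.ones(T), np.arange(T, dtype=float)])
         if trend else np.ones((T, 1)))
    # first step: regression of quasi-differenced y on quasi-differenced z
    ya = np.concatenate([[y[0]], y[1:] - abar * y[:-1]])
    za = np.vstack([z[:1], z[1:] - abar * z[:-1]])
    beta, *_ = np.linalg.lstsq(za, ya, rcond=None)
    yd = y - z @ beta                     # GLS-detrended series
    # second step: ADF-type regression on yd with no deterministic terms
    dy = np.diff(yd)
    Teff = len(dy) - p
    cols = [yd[p:-1]] + [dy[p - j:-j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    yy = dy[p:]
    psi, *_ = np.linalg.lstsq(X, yy, rcond=None)
    r = yy - X @ psi
    s2 = r @ r / (Teff - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return psi[0] / np.sqrt(cov[0, 0])    # t statistic for psi_0 = 0
```

Because the deterministic terms are removed in the first step, the second-step regression contains no intercept or trend, which is the source of the test's power gain.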

DF-GLS is indeed a superior unit root test, according to Stock (1994), Schwert (1989), and Elliott, Rothenberg, and Stock (1996). In terms of the size of the test, DF-GLS is almost as good as the ADF $t$ test $\text {DF}_\tau $ and better than the PP $\hat{\mr {Z}}_\rho $ and $\hat{\mr {Z}}_\tau $ test. In addition, the power of the DF-GLS test is greater than that of both the ADF $t$ test and the $\rho $-test.

Ng and Perron (2001) also apply GLS detrending to obtain the following M-tests:

\begin{align*}  {MZ}_{\alpha } &= \left((T-1)^{-1}(y_ T^ d)^2 -\hat{\omega }^2 \right) \left(2(T-1)^{-2}\sum _{t=1}^{T-1} (y_{t}^ d)^2 \right)^{-1} \\ {MSB} &=\left( \frac{\sum _{t=1}^{T-1} (y_{t}^ d)^2}{ (T-1)^2 \hat{\omega }^2}\right)^{1/2} \\ {MZ}_ t &= {MZ}_{\alpha } \times {MSB} \end{align*}

The first one is a modified version of the Phillips-Perron $\mr {Z}_\rho $ test,

\[  MZ_\rho = \mr {Z}_\rho +\frac T2 (\hat{\alpha }-1)^2  \]

where the detrended data $\{ y_ t^ d\} $ is used. The second is a modified Bhargava (1986) $R_1$ test statistic. The third can be perceived as a modified Phillips-Perron $ \mr {Z}_\tau $ statistic because of the relationship $ \mr {Z}_\tau =MSB\times \mr {Z}_\rho $.
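Given GLS-detrended data and a long-run variance estimate, the three M statistics are simple functions of the data. The following Python sketch is illustrative (it assumes `yd` and `omega2` are produced elsewhere, for example by the detrending and $\hat{\omega }^2$ formulas above):

```python
import numpy as np

def m_tests(yd, omega2):
    """Ng-Perron M statistics from GLS-detrended data yd (yd[0] is y_1^d,
    yd[-1] is y_T^d) and a long-run variance estimate omega2."""
    T = len(yd)
    ssq = (yd[:-1] ** 2).sum()                              # sum_{t=1}^{T-1} (y_t^d)^2
    mza = ((yd[-1] ** 2) / (T - 1) - omega2) / (2.0 * ssq / (T - 1) ** 2)
    msb = np.sqrt(ssq / ((T - 1) ** 2 * omega2))
    mzt = mza * msb                                          # MZ_t = MZ_alpha * MSB
    return mza, msb, mzt
```

The identity $MZ_ t = MZ_{\alpha }\times MSB$ mirrors the relationship $\mr {Z}_\tau = MSB \times \mr {Z}_\rho $ noted above.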

The modified point optimal tests that use the GLS detrended data are

\[  \begin{array}{ll} MP_ T^{GLS}=\frac{ \bar{c}^2(T-1)^{-2} \sum _{t=1}^{T-1} (y_{t}^ d)^2 -\bar{c} (T-1)^{-1} (y_{T}^ d)^2}{\hat{\omega }^2} \quad & \text {for} \; z_ t =1 \\ MP_ T^{GLS}=\frac{ \bar{c}^2(T-1)^{-2} \sum _{t=1}^{T-1} (y_{t}^ d)^2 +(1-\bar{c}) (T-1)^{-1} (y_{T}^ d)^2}{\hat{\omega }^2}\quad & \text {for}\; z_ t =(1,t) \\ \end{array}  \]

The DF-GLS test and the $\text {MZ}_ t$ test have the same limiting distribution:

\[  \begin{array}{ll} \text {DF-GLS} \approx MZ_ t \Rightarrow 0.5\frac{(J_ c(1)^2 -1)}{ \left(\int _0^1 J_ c(r)^2dr \right)^{1/2}} \quad & \text {for} \; z_ t =1\\ \text {DF-GLS} \approx MZ_ t \Rightarrow 0.5\frac{(V_{c,\bar{c}}(1)^2 -1)}{\left( \int _0^1 V_{c,\bar{c}}(r)^2dr \right)^{1/2}} \quad & \text {for}\; z_ t =(1,t)\\ \end{array}  \]

The point optimal test and the modified point optimal test have the same limiting distribution:

\[  \begin{array}{ll} P_ T^{GLS} \approx MP_ T^{GLS} \Rightarrow \bar{c}^2 \int _0^1 J_ c(r)^2 dr -\bar{c} J_ c(1)^2 \quad & \text {for} \; z_ t =1\\ P_ T^{GLS} \approx MP_ T^{GLS} \Rightarrow \bar{c}^2 \int _0^1 V_{c,\bar{c}}(r)^2 dr +(1-\bar{c})V_{c,\bar{c}}(1)^2 \quad & \text {for}\; z_ t =(1,t) \end{array}  \]

where $W(r)$ is a standard Brownian motion and $J_ c(r)$ is an Ornstein-Uhlenbeck process defined by $dJ_ c(r) =c J_ c(r)dr +dW(r)$ with $J_ c(0)=0$, $ V_{c,\bar{c}}(r) = J_ c(r) -r\left[ \lambda J_ c(1) +3(1-\lambda ) \int _0^1 s J_ c(s) ds \right]$, and $\lambda = (1-\bar{c})/(1-\bar{c}+\bar{c}^2/3)$.

Overall, the M-tests have the smallest size distortion, followed by the ADF $t$ test. The ADF $\rho $-test, $\hat{\mr {Z}}_\rho $, and $\hat{\mr {Z}}_\tau $ have the largest size distortion. In addition, the power of the DF-GLS and M-tests is greater than that of the ADF $t$ test and $\rho $-test. The $\hat{\mr {Z}}_\rho $ test has more severe size distortion than $\hat{\mr {Z}}_\tau $, but it has more power for a fixed lag length.

Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) Unit Root Test and Shin Cointegration Test

There are fewer tests available for the null hypothesis of trend stationarity I(0). The main reason is the difficulty of theoretical development. The KPSS test was introduced in Kwiatkowski et al. (1992) to test the null hypothesis that an observable series is stationary around a deterministic trend. For consistency, the notation used here differs from the notation in the original paper. The setup of the problem is as follows: it is assumed that the series is expressed as the sum of the deterministic trend, random walk $r_ t$, and stationary error $u_ t$; that is,

\begin{align*}  y_ t &=\mu +\delta t+r_ t+u_ t \\ r_ t &=r_{t-1}+e_ t \end{align*}

where $e_ t\sim \text{iid}(0,\sigma ^2_ e)$ and $\mu $ is an intercept (in the original paper, the authors use $r_0$ instead of $\mu $; here $r_0=0$ is assumed). The null hypothesis of trend stationarity is specified by $H_0: \sigma ^2_ e =0$, and the null of level stationarity is the same model with the restriction $\delta =0$. Under the alternative that $\sigma ^2_ e \ne 0$, there is a random walk component in the observed series $y_ t$.

Under stronger assumptions of normality and iid of $u_ t$ and $e_ t$, a one-sided LM test of the null that there is no random walk ($e_ t=0,\forall t$) can be constructed as follows:

\begin{align*}  \widehat{\mi {LM}} &=\frac{1}{T^2}\sum _{t=1}^ T\frac{S_ t^2}{s^2(l)} \\ s^2(l) &=\frac{1}{T}\sum _{t=1}^ T \hat{u}^2_ t+\frac{2}{T}\sum _{s=1}^ l w(s,l)\sum _{t=s+1}^ T\hat{u}_ t\hat{u}_{t-s} \\ S_ t &=\sum _{\tau =1}^ t \hat{u}_\tau \end{align*}

Under the null hypothesis, $\hat{u}_ t$ can be estimated by ordinary least squares regression of $y_ t$ on an intercept and the time trend. Following the original work of Kwiatkowski et al. (1992), under the null ($\sigma ^2_ e=0$), the $\widehat{\mi {LM}}$ statistic converges asymptotically to three different distributions depending on whether the model is trend-stationary, level-stationary ($\delta =0$), or zero-mean stationary ($\delta =0$, $\mu =0$). The trend-stationary model is denoted by subscript $\tau $ and the level-stationary model is denoted by subscript $\mu $. The case when there is no trend and zero intercept is denoted as 0. The last case, although rarely used in practice, is considered in Hobijn, Franses, and Ooms (2004):

\begin{align*}  y_ t=u_ t:\qquad & \widehat{\mi {LM}}_0\stackrel{D}{\longrightarrow }\int _0^1B^2(r)dr \\ y_ t=\mu +u_ t:\qquad & \widehat{\mi {LM}}_\mu \stackrel{D}{\longrightarrow }\int _0^1V^2(r)dr \\ y_ t=\mu +\delta t+u_ t:\qquad & \widehat{\mi {LM}}_\tau \stackrel{D}{\longrightarrow }\int _0^1V_2^2(r)dr \end{align*}


\begin{align*}  V(r) &=B(r)-rB(1) \\ V_2(r) &=B(r)+(2r-3r^2)B(1)+(-6r+6r^2)\int _0^1B(s)ds \end{align*}

where $B(r)$ is a Brownian motion (Wiener process) and $\stackrel{D}{\longrightarrow }$ is convergence in distribution. $V(r)$ is a standard Brownian bridge, and $V_2(r)$ is a second-level Brownian bridge.
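The $\widehat{\mi {LM}}$ statistic can be sketched directly from its definition. The following Python fragment is illustrative only (not the AUTOREG implementation); it uses the Bartlett kernel of the next list, with the window width supplied by the caller:

```python
import numpy as np

def kpss_stat(y, l, trend=True):
    """KPSS LM statistic with the Bartlett (Newey-West) kernel:
    regress y on the deterministic terms, cumulate the residuals,
    and scale by the long-run variance estimate s^2(l)."""
    T = len(y)
    X = (np.column_stack([np.ones(T), np.arange(T, dtype=float)])
         if trend else np.ones((T, 1)))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                      # residuals u_hat_t
    S = np.cumsum(u)                      # partial sums S_t
    gam = lambda s: (u[s:] * u[:T - s]).sum() / T
    s2 = gam(0) + 2.0 * sum((1.0 - s / (l + 1)) * gam(s) for s in range(1, l + 1))
    return (S ** 2).sum() / (T ** 2 * s2)
```

Unlike the unit root tests above, large values of this statistic lead to rejection, because the null hypothesis here is stationarity.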

Using the notation of Kwiatkowski et al. (1992), the $\widehat{\mi {LM}}$ statistic is denoted $\hat{\eta }$. This test depends on the computational method used to compute the long-run variance $s^2(l)$; that is, the window width $l$ and the kernel type $w(\cdot ,\cdot )$. You can specify the kernel used in the test by using the KERNEL option:

  • Newey-West/Bartlett (KERNEL=NW $|$ BART) (this is the default)

    \[ w(s,l)=1-\frac{s}{l+1} \]
  • quadratic spectral (KERNEL=QS)

    \[  w(s,l) = \tilde{w}\left(\frac sl\right)=\tilde{w}(x)=\frac{25}{12\pi ^2 x^2}\left(\frac{\sin \left(6\pi x/5\right)}{6\pi x/5}-\cos \left( \frac65\pi x\right)\right) \]

You can specify the number of lags, $l$, in three different ways:

  • Schwert (SCHW = c) (default for NW, c=12)

    \[ l=\max \left\{ 1,\textmd{floor}\left[c\left(\frac{T}{100}\right)^{1/4}\right]\right\}  \]
  • manual (LAG = $l$)

  • automatic selection (AUTO) (default for QS), from Hobijn, Franses, and Ooms (2004). The number of lags, $l$, is calculated as follows. For KERNEL=NW $|$ BART,

    \[ l=\min (T,\textmd{floor}(\hat{\gamma } T^{1/3})), \qquad \hat{\gamma }=1.1447\left\{ \left(\frac{\hat{s}^{(1)}}{\hat{s}^{(0)}}\right)^2\right\} ^{1/3} \]

    and for KERNEL=QS,

    \[ l=\min (T,\textmd{floor}(\hat{\gamma } T^{1/5})), \qquad \hat{\gamma }=1.3221\left\{ \left(\frac{\hat{s}^{(2)}}{\hat{s}^{(0)}}\right)^2\right\} ^{1/5} \]

    In both cases,

    \[ \hat{s}^{(j)}=\delta _{0,j}\hat{\gamma }_0+2\sum _{i=1}^ n i^ j\hat{\gamma }_ i \]

    where $T$ is the number of observations, $\delta _{0,j} =1$ if $j=0$ and 0 otherwise, and $\hat{\gamma }_ i=\frac{1}{T}\sum _{t=1}^{T-i}u_ tu_{t+i}$.
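These quantities are simple to compute outside of PROC AUTOREG. The following Python sketch (an illustration, not the SAS implementation) computes the Schwert lag, the Bartlett-kernel long-run variance $s^2(l)$, and the $\widehat{\mi{LM}}$ statistic under the assumption that it has the usual KPSS form $T^{-2}\sum _t S_t^2 / s^2(l)$, where $S_t$ is the partial sum of the residuals:

```python
import numpy as np

def schwert_lag(T, c=12):
    # SCHW=c rule: l = max(1, floor(c * (T/100)^(1/4)))
    return max(1, int(np.floor(c * (T / 100.0) ** 0.25)))

def long_run_variance_bartlett(u, l):
    # s^2(l) = gamma_0 + 2 * sum_{s=1}^{l} (1 - s/(l+1)) * gamma_s
    u = np.asarray(u, dtype=float)
    T = len(u)
    s2 = np.sum(u * u) / T                       # gamma_0
    for s in range(1, l + 1):
        w = 1.0 - s / (l + 1.0)                  # Bartlett weight
        s2 += 2.0 * w * np.sum(u[s:] * u[:-s]) / T
    return s2

def kpss_stat(u, l):
    # Assumed form of the LM statistic: T^{-2} * sum_t S_t^2 / s^2(l)
    u = np.asarray(u, dtype=float)
    S = np.cumsum(u)                             # partial sums of residuals
    return np.sum(S ** 2) / (len(u) ** 2 * long_run_variance_bartlett(u, l))
```

Here `u` is the vector of residuals from the regression of $y_t$ on an intercept (and, for the trend-stationary case, a time trend).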

Simulation evidence shows that the KPSS test suffers from size distortion in finite samples; for an example, see Caner and Kilian (2001). The power is reduced when the sample size is large, a result that can be derived theoretically (see Breitung 1995). Another problem of the KPSS test is that its power depends on the truncation lag used in the Newey-West estimator of the long-run variance $s^2(l)$.

Shin (1994) extends the KPSS test to incorporate regressors, turning it into a test for cointegration. The cointegrating regression becomes

\[  y_ t =\mu +\delta t+ {X_ t}’\beta + r_ t+u_ t  \]
\[  r_ t =r_{t-1}+e_ t  \]

where $y_ t$ is a scalar $I(1)$ variable and $X_ t$ is an m-vector of $I(1)$ variables. There are still three cases of cointegrating regressions: without intercept and trend, with intercept only, and with intercept and trend. The null hypothesis of the cointegration test is the same as that for the KPSS test, $H_0: \sigma ^2_ e =0$. The test statistics for cointegration in the three cases of cointegrating regressions are exactly the same as those in the KPSS test, so their formulas are not repeated here. Under the null hypothesis, the statistics converge asymptotically to three different distributions:

\[  y_ t=X_ t’\beta + u_ t:\qquad \widehat{\mi {LM}}_0\stackrel{D}{\longrightarrow }\int _0^1Q_1^2(r)\, dr  \]
\[  y_ t=\mu +X_ t’\beta + u_ t:\qquad \widehat{\mi {LM}}_\mu \stackrel{D}{\longrightarrow }\int _0^1Q_2^2(r)\, dr  \]
\[  y_ t=\mu +\delta t+X_ t’\beta + u_ t:\qquad \widehat{\mi {LM}}_\tau \stackrel{D}{\longrightarrow }\int _0^1Q_3^2(r)\, dr  \]


\[  Q_1(r) = B(r) - \left( \int _0^ r{\mathbf{B_ m}(x)’dx} \right) \left( \int _0^1{\mathbf{B_ m}(x)\mathbf{B_ m}(x)’dx} \right) ^{-1} \left( \int _0^1{\mathbf{B_ m}(x)dB(x)} \right)  \]
\[  Q_2(r) = V(r) - \left( \int _0^ r{\mathbf{\bar{B}_ m}(x)’dx} \right) \left( \int _0^1{\mathbf{\bar{B}_ m}(x)\mathbf{\bar{B}_ m}(x)’dx} \right)^{-1} \left( \int _0^1{\mathbf{\bar{B}_ m}(x)dB(x)} \right)  \]
\[  Q_3(r) = V_2(r) - \left( \int _0^ r{\mathbf{B^{*}_ m}(x)’dx} \right) \left( \int _0^1{\mathbf{B^{*}_ m}(x)\mathbf{B^{*}_ m}(x)’dx} \right) ^{-1} \left( \int _0^1{\mathbf{B^{*}_ m}(x)dB(x)} \right)  \]

where $B(.)$ and $\mathbf{B_ m}(.)$ are independent scalar and m-vector standard Brownian motions, and $\stackrel{D}{\longrightarrow }$ denotes convergence in distribution. $V(r)$ is a standard Brownian bridge, $V_2(r)$ is a second-level Brownian bridge, $\mathbf{\bar{B}_ m}(r)=\mathbf{B_ m}(r) - \int _0^1{\mathbf{B_ m}(x)dx}$ is an m-vector standard demeaned Brownian motion, and $\mathbf{B^{*}_ m}(r) = \mathbf{B_ m}(r) + (6r-4)\int _0^1{\mathbf{B_ m}(x)dx} + (-12r+6)\int _0^1{x\mathbf{B_ m}(x)dx}$ is an m-vector standard demeaned and detrended Brownian motion.

The p-values that are reported for the KPSS test and Shin test are calculated via a Monte Carlo simulation of the limiting distributions, using a sample size of 2,000 and 1,000,000 replications.
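The simulation approach can be sketched in a few lines. The following Python code (an illustration, not the AUTOREG implementation, and with a much smaller replication count than the 1,000,000 used by the procedure) approximates the level-stationary limiting distribution $\int _0^1 V^2(r)\, dr$ by discretizing the Brownian bridge:

```python
import numpy as np

def simulate_level_stationary_dist(n=1000, reps=2000, seed=42):
    # Draws from the approximate distribution of int_0^1 V(r)^2 dr,
    # where V(r) = B(r) - r B(1) is a standard Brownian bridge
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    r = np.arange(1, n + 1) / n
    for i in range(reps):
        dB = rng.standard_normal(n) / np.sqrt(n)
        B = np.cumsum(dB)            # Brownian motion on the grid
        V = B - r * B[-1]            # Brownian bridge
        out[i] = np.mean(V * V)      # Riemann approximation of the integral
    return out

draws = simulate_level_stationary_dist()
crit_95 = float(np.quantile(draws, 0.95))
```

Comparing the computed $\widehat{\mi{LM}}_\mu$ statistic against `crit_95` gives an approximate 5% level test; the p-value is the fraction of draws that exceed the observed statistic.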

Testing for Statistical Independence

Independence tests are widely used in model selection, residual analysis, and model diagnostics because models are usually based on the assumption of independently distributed errors. If a given time series (for example, a series of residuals) is independent, then no deterministic model is needed for this completely random process; otherwise, some relationship in the series remains to be modeled. The following sections introduce four independence tests: the BDS test, the runs test, the turning point test, and the rank version of the von Neumann ratio test.

BDS Test

Brock, Dechert, and Scheinkman (1987) propose a test (BDS test) of independence based on the correlation dimension. Brock et al. (1996) show that the first-order asymptotic distribution of the test statistic is independent of the estimation error provided that the parameters of the model under test can be estimated $\sqrt {n}$-consistently. Hence, the BDS test can be used as a model selection tool and as a specification test.

Given the sample size $T$, the embedding dimension $m$, and the value of the radius $r$, the BDS statistic is

\[  S_{\mr {BDS}}(T,m,r)=\sqrt {T-m+1}\frac{c_{m,m,T}(r)-c_{1,m,T}^{m}(r)}{\sigma _{m,T}(r)}  \]


where

\[  c_{m,n,N}(r)=\frac{2}{(N-n+1)(N-n)}\sum _{s=n}^{N}\sum _{t=s+1}^{N}\prod _{j=0}^{m-1}I_ r(z_{s-j},z_{t-j})  \]
\[  I_ r(z_{s},z_{t})=\left\{  \begin{array}{ l l } 1 &  \text {if } |z_ s-z_ t|<r \\ 0 &  \text {otherwise} \end{array} \right.  \]
\[  \sigma _{m,T}^{2}(r)=4\left(k^ m+2\sum _{j=1}^{m-1}{k^{m-j}c^{2j}}+(m-1)^2c^{2m}-m^2kc^{2m-2}\right)  \]
\[  c=c_{1,1,T}(r)  \]
\[  k=k_ T(r)=\frac{6}{T(T-1)(T-2)}\sum _{t=1}^{T}\sum _{s=t+1}^{T}\sum _{l=s+1}^{T}{h_ r(z_ t,z_ s,z_ l)}  \]
\[  h_ r(z_ t,z_ s,z_ l)=\frac{1}{3}\left(I_ r(z_ t,z_ s)I_ r(z_ s,z_ l)+I_ r(z_ t,z_ l)I_ r(z_ l,z_ s)+I_ r(z_ s,z_ t)I_ r(z_ t,z_ l)\right)  \]

The statistic has a standard normal distribution when the sample size is large enough. For small sample sizes, the distribution can be approximated through simulation. Kanzler (1999) provides a comprehensive discussion of the implementation and empirical performance of the BDS test.
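The double sum that defines the correlation integral $c_{m,n,N}(r)$ translates directly into code. The following Python sketch (a brute-force $O(N^2 m)$ illustration, not the AUTOREG implementation) computes it, mapping the 1-based indices of the text to 0-based arrays:

```python
def correlation_integral(z, m, r, n=None):
    # c_{m,n,N}(r): scaled count of pairs of m-histories whose components
    # are all within distance r, following the double sum in the text
    N = len(z)
    if n is None:
        n = m                      # the BDS statistic uses n = m
    total = 0
    for s in range(n, N + 1):      # 1-based s = n..N
        for t in range(s + 1, N + 1):
            if all(abs(z[s - j - 1] - z[t - j - 1]) < r for j in range(m)):
                total += 1
    return 2.0 * total / ((N - n + 1) * (N - n))
```

The BDS statistic then combines `correlation_integral(z, m, r)` and `correlation_integral(z, 1, r, n=m) ** m`, scaled by the variance term $\sigma _{m,T}(r)$.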

Runs Test and Turning Point Test

The runs test and turning point test are two widely used tests for independence (Cromwell, Labys, and Terraza, 1994).

The runs test requires several steps. First, convert the original time series into a sequence of signs; that is, map $\{ z_ t\} $ into $\{ \mathrm{sign}(z_ t-z_ M)\} $, where $z_ M$ is the sample mean of $\{ z_ t\} $ and $\mathrm{sign}(x)$ is $+$ if $x$ is nonnegative and $-$ if $x$ is negative. Second, count the number of runs, $R$, in the sequence. A run of a sequence is a maximal nonempty segment of the sequence that consists of adjacent equal elements. For example, the following sequence contains $R=8$ runs:

\[  \underbrace{+++}_{1}\underbrace{---}_{2}\underbrace{++}_{3}\underbrace{--}_{4} \underbrace{+}_{5}\underbrace{-}_{6}\underbrace{+++++}_{7}\underbrace{--}_{8}  \]

Third, count the number of pluses and minuses in the sequence and denote them as $N_{+}$ and $N_{-}$, respectively. In the preceding example sequence, $N_{+}=11$ and $N_{-}=8$. Note that the sample size $T=N_{+}+N_{-}$. Finally, compute the statistic of runs test,

\[  S_{\mr {runs}}=\frac{R-\mu }{\sigma }  \]


\[  \mu =\frac{2 N_{+} N_{-}}{T}+1  \]
\[  \sigma ^2=\frac{(\mu -1)(\mu -2)}{T-1}  \]
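The four steps translate directly into code. The following Python sketch (an illustration, not the AUTOREG implementation) computes the runs test statistic, counting a value equal to the mean as $+$, as in the text:

```python
import numpy as np

def runs_test(z):
    # Map observations to signs about the sample mean (nonnegative -> '+')
    z = np.asarray(z, dtype=float)
    signs = z >= z.mean()
    R = 1 + int(np.sum(signs[1:] != signs[:-1]))   # number of runs
    n_plus = int(signs.sum())
    n_minus = len(z) - n_plus
    T = n_plus + n_minus
    mu = 2.0 * n_plus * n_minus / T + 1.0          # E[R] under independence
    sigma2 = (mu - 1.0) * (mu - 2.0) / (T - 1.0)   # Var[R] under independence
    return (R - mu) / np.sqrt(sigma2)
```

Applied to the example sequence in the text ($R=8$, $N_{+}=11$, $N_{-}=8$, $T=19$), the statistic is about $-1.097$.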

The statistic of the turning point test is defined as follows:

\[  S_{\mr {TP}}=\frac{\sum _{t=2}^{T-1}{TP_ t}-2(T-2)/3}{\sqrt {(16T-29)/90}}  \]

where the indicator function of the turning point $TP_ t$ is 1 if $z_ t>z_{t\pm 1}$ or $z_ t<z_{t\pm 1}$ (that is, both the previous and next values are greater or less than the current value); otherwise, 0.

The statistics of both the runs test and the turning point test have the standard normal distribution under the null hypothesis of independence.
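The turning point statistic admits an equally short computation. The following Python sketch (illustrative, not the AUTOREG implementation) counts strict interior local maxima and minima and standardizes the count:

```python
import numpy as np

def turning_point_test(z):
    # TP_t = 1 when z_t is a strict local max or min relative to its neighbors
    z = np.asarray(z, dtype=float)
    T = len(z)
    mid, prev, nxt = z[1:-1], z[:-2], z[2:]
    tp = ((mid > prev) & (mid > nxt)) | ((mid < prev) & (mid < nxt))
    n_tp = int(tp.sum())
    # Standardize: mean 2(T-2)/3 and variance (16T-29)/90 under independence
    return (n_tp - 2.0 * (T - 2) / 3.0) / np.sqrt((16.0 * T - 29.0) / 90.0)
```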

Rank Version of von Neumann Ratio Test

Since the runs test completely ignores the magnitudes of the observations, Bartels (1982) proposes a rank version of the von Neumann Ratio test for independence:

\[  S_{\mr {RVN}}=\frac{\sqrt {T}}{2}\left(\frac{\sum _{t=1}^{T-1}{(R_{t+1}-R_{t})^2}}{(T(T^2-1)/12)}-2\right)  \]

where $R_ t$ is the rank of the $t$th observation in the sequence of $T$ observations. For large samples, the statistic follows the standard normal distribution under the null hypothesis of independence. For small samples of size between 11 and 100, critical values obtained through simulation are more precise; for samples of size 10 or less, the exact CDF is applied.
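As a quick illustration (not the AUTOREG implementation, and ignoring the small-sample refinements and ties), the statistic can be computed as follows:

```python
import numpy as np

def rank_von_neumann(z):
    # Ranks 1..T of the observations; ties are not handled in this sketch
    z = np.asarray(z, dtype=float)
    T = len(z)
    R = (np.argsort(np.argsort(z)) + 1).astype(float)
    num = np.sum(np.diff(R) ** 2)                   # sum of squared rank diffs
    return np.sqrt(T) / 2.0 * (num / (T * (T * T - 1) / 12.0) - 2.0)
```

A monotone series yields a strongly negative statistic, reflecting the positive serial dependence of its ranks.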

Testing for Normality

Based on skewness and kurtosis, Jarque and Bera (1980) proposed the test statistic

\[  T_{N} = \left[\frac{N}{6} b^{2}_{1}+ \frac{N}{24} ( b_{2}-3)^{2}\right]  \]


\[  b_{1} = \frac{\sqrt {N}\sum _{t=1}^{N}{\hat{u}^{3}_{t}}}{{\left(\sum _{t=1}^ N \hat{u}^{2}_{t}\right)}^{\frac{3}{2}}}  \]
\[  b_{2} = \frac{N\sum _{t=1}^{N}{\hat{u}^{4}_{t}}}{\left(\sum _{t=1}^{N}{\hat{u}^{2}_{t}}\right)^{2}}  \]

The ${\chi }^{2}(2)$ distribution gives an approximation to the distribution of the normality test statistic ${T_{N}}$.
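The statistic is a direct computation from the residuals. The following Python sketch (an illustration, not the AUTOREG implementation) assumes the residuals already have mean zero, as the formulas above do:

```python
import numpy as np

def jarque_bera(u):
    # T_N = N/6 * b1^2 + N/24 * (b2 - 3)^2, with b1 and b2 as in the text;
    # residuals u are assumed to be mean zero
    u = np.asarray(u, dtype=float)
    N = len(u)
    s2 = np.sum(u ** 2)
    b1 = np.sqrt(N) * np.sum(u ** 3) / s2 ** 1.5   # skewness measure
    b2 = N * np.sum(u ** 4) / s2 ** 2              # kurtosis measure
    return N / 6.0 * b1 ** 2 + N / 24.0 * (b2 - 3.0) ** 2
```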

When the GARCH model is estimated, the normality test is obtained using the standardized residuals ${ \hat{u}_{t} = \hat{{\epsilon }}_{t}/\sqrt { h_{t}}}$. The normality test can be used to detect misspecification of the family of ARCH models.

Testing for Linear Dependence

Generalized Durbin-Watson Tests

Consider the following linear regression model:

\[  \mb {Y} =\mb {X}{\bbeta }+{\bnu }  \]

where $\mb {X}$ is an ${N\times k}$ data matrix, $\beta $ is a ${k\times 1}$ coefficient vector, and $\bnu $ is an ${N\times 1}$ disturbance vector. The error term $\bnu $ is assumed to be generated by the $j$th-order autoregressive process ${\nu }_{t}={\epsilon }_{t}-{\varphi }_{j}{\nu }_{t-j}$, where ${{| {\varphi }_{j}|} < 1}$ and ${{\epsilon }_{t}}$ is a sequence of independent normal error terms with mean 0 and variance ${\sigma }^{2}$. Usually, the Durbin-Watson statistic is used to test the null hypothesis ${H_{0}: {\varphi }_{1}=0}$ against ${H_{1}: - {\varphi }_{1}>0}$. Vinod (1973) generalized the Durbin-Watson statistic:

\[  d_{j}=\frac{\sum _{t=j+1}^{N}{(\hat{{\nu }}_{t}-\hat{{\nu }}_{t-j})^{2} }}{\sum _{t=1}^{N}{\hat{{\nu }}_{t}^{2}} }  \]

where $\hat{\bnu }$ are OLS residuals. Using the matrix notation,

\[  d_{j}=\frac{\mb {{Y}}’ \mb {M} \mb {A}’_{j} \mb {A} _{j}\mb {M} \mb {{Y}} }{\mb {{Y}}’ \mb {M} \mb {{Y}} }  \]

where ${\mb {M} =\mb {I} _{N}-\mb {X(X’X)^{-1}X’} }$ and ${\mb {A} _{j}}$ is a ${(N-j) \times N}$ matrix:

\[  \mb {A} _{j} = \begin{bmatrix}  -1   &  0   &  {\cdots }   &  0   &  1   &  0   &  {\cdots }   &  0   \\ 0   &  -1   &  0   &  {\cdots }   &  0   &  1   &  0   &  {\cdots }   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   \\ 0   &  {\cdots }   &  0   &  -1   &  0   &  {\cdots }   &  0   &  1   \end{bmatrix}  \]

and there are ${j-1}$ zeros between $-1$ and 1 in each row of matrix $\mb {A}_{j}$.
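Given residuals in hand, $d_j$ is easiest to compute from the sum form rather than by building $\mb{A}_j$ explicitly. A minimal Python sketch (illustrative, not the AUTOREG implementation):

```python
import numpy as np

def generalized_dw(resid, j=1):
    # d_j = sum_{t=j+1}^N (v_t - v_{t-j})^2 / sum_t v_t^2
    v = np.asarray(resid, dtype=float)
    return np.sum((v[j:] - v[:-j]) ** 2) / np.sum(v ** 2)
```

For perfectly alternating residuals, $d_1$ is large (near 4 for long series, indicating negative first-order autocorrelation) while $d_2$ is 0 (perfect positive second-order autocorrelation).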

The QR factorization of the design matrix $\mb {X}$ yields an ${N\times N}$ orthogonal matrix $\mb {Q}$:

\[  \mb {X} = \mb {QR}  \]

where $\mb {R}$ is an ${N\times k}$ upper triangular matrix. There exists an ${N\times (N-k)}$ submatrix $\mb {Q} _{1}$ of $\mb {Q}$ such that ${\mb {Q} _{1}\mb {Q} _{1}’=\mb {M} }$ and ${\mb {Q} _{1}’\mb {Q} _{1}=\mb {I} _{N-k}}$. Consequently, the generalized Durbin-Watson statistic is stated as a ratio of two quadratic forms:

\[  d_{j}=\frac{\sum _{l=1}^{n}{{\lambda }_{jl} {\xi }_{l}}^{2}}{\sum _{l=1}^{n}{{\xi }_{l}^{2}}}  \]

where ${{\lambda }_{j1},{\ldots },{\lambda }_{jn}}$ are the upper $n$ eigenvalues of ${{\mb {MA}’}_{j}\mb {A} _{j}\mb {M} }$, the ${ {\xi }_{l}}$ are independent standard normal variates, and ${n=\min (N-k, N-j)}$. These eigenvalues are obtained by a singular value decomposition of ${{\mb {Q}’}_{1}{\mb {A}’} _{j}}$ (Golub and Van Loan, 1989; Savin and White, 1978).

The marginal probability (or p-value) for ${d_{j}}$ given ${c_{0}}$ is

\[  \mr {Prob}(\frac{\sum _{l=1}^{n}{{\lambda }_{jl} {\xi }_{l}^{2}}}{\sum _{l=1}^{n}{{\xi }_{l}^{2}}} < c_{0})=\mr {Prob}(q_{j} < 0)  \]


\[  q_{j}=\sum _{l=1}^{n}{( {\lambda }_{jl}-c_{0}) {\xi }_{l}^{2}}  \]

When the null hypothesis ${H_{0}: {\varphi }_{j}=0}$ holds, the quadratic form ${q_{j}}$ has the characteristic function

\[  {\phi }_{j}(t)={\prod _{l=1}^ n (1-2( {\lambda }_{jl}-c_{0})it)^{-1/2}}  \]

The distribution function is uniquely determined by this characteristic function:

\[  F(x) = \frac{1}{2} + \frac{1}{2{\pi }}\int _{0}^{{\infty }}{\frac{e^{itx}{\phi }_{j}(-t)- e^{-itx}{\phi }_{j}(t)}{it}dt}  \]

For example, to test ${H_{0}: {\varphi }_{4}=0}$ given ${{\varphi }_{1}={\varphi }_{2}={\varphi }_{3}=0}$ against ${H_{1}: - {\varphi }_{4}>0}$, the marginal probability (p-value) can be used:

\[  F(0) = \frac{1}{2} + \frac{1}{2{\pi }}\int _{0}^{{\infty }}{\frac{({\phi }_{4}(-t)-{\phi }_{4}(t))}{it}dt}  \]


where

\[  {\phi }_{4}(t) = {\prod _{l=1}^ n (1-2({\lambda }_{4\mi {l} }-\hat{d}_{4})it )^{-1/2}}  \]

and ${ \hat{d}_{4}}$ is the calculated value of the fourth-order Durbin-Watson statistic.

In the Durbin-Watson test, the marginal probability indicates positive autocorrelation (${-{\varphi }_{j}>0}$) if it is less than the level of significance (${\alpha }$), while you can conclude that a negative autocorrelation (${-{\varphi }_{j}<0}$) exists if the marginal probability based on the computed Durbin-Watson statistic is greater than $1-{\alpha }$. Wallis (1972) presented tables for bounds tests of fourth-order autocorrelation, and Vinod (1973) has given tables for a 5% significance level for orders two to four. Using the AUTOREG procedure, you can calculate the exact p-values for the general order of Durbin-Watson test statistics. Tests for the absence of autocorrelation of order p can be performed sequentially; at the $j$th step, test ${H_{0}: {\varphi }_{j}=0}$ given ${ {\varphi }_{1}={\ldots } = {\varphi }_{j-1}=0}$ against ${ {\varphi }_{j} \neq 0}$. However, the size of the sequential test is not known.

The Durbin-Watson statistic is computed from the OLS residuals, while that of the autoregressive error model uses residuals that are the difference between the predicted values and the actual values. When you use the Durbin-Watson test from the residuals of the autoregressive error model, you must be aware that this test is only an approximation. See Autoregressive Error Model earlier in this chapter. If there are missing values, the Durbin-Watson statistic is computed using all the nonmissing values and ignoring the gaps caused by missing residuals. This does not affect the significance level of the resulting test, although the power of the test against certain alternatives may be adversely affected. Savin and White (1978) have examined the use of the Durbin-Watson statistic with missing values.

The Durbin-Watson probability calculations have been enhanced to compute the $p$-value of the generalized Durbin-Watson statistic for large sample sizes. Previously, the Durbin-Watson probabilities were only calculated for small sample sizes.

Consider the following linear regression model:

\[  \mb {Y} = \mb {X} \beta + \mb {u}  \]
\[  u_ t + \varphi _ j u_{t-j} = \epsilon _ t, \qquad t = 1, \dots , N  \]

where $\mb {X}$ is an $N \times k$ data matrix, $\beta $ is a $k \times 1$ coefficient vector, $\mb {u}$ is a $N \times 1$ disturbance vector, and $\epsilon _ t$ is a sequence of independent normal error terms with mean 0 and variance $\sigma ^2$.

The generalized Durbin-Watson statistic is written as

\[  \mr {DW}_ j = \frac{\hat{\mb {u}}^\prime \mb {A}_ j^\prime \mb {A}_ j \hat{\mb {u}} }{\hat{\mb {u}}^\prime \hat{\mb {u}} }  \]

where $\hat{\mb {u}}$ is a vector of OLS residuals and $\mb {A} _ j$ is the $(N-j)\times N$ matrix defined previously. The generalized Durbin-Watson statistic DW$_ j$ can be rewritten as

\[  \mr {DW} _ j = \frac{\mb {Y} ^\prime \mb {M} \mb {A} _ j^\prime \mb {A} _ j \mb {M} \mb {Y} }{\mb {Y} ^\prime \mb {M} \mb {Y} } = \frac{ \eta ^\prime ( \mb {Q} _1^\prime \mb {A} _ j^\prime \mb {A} _ j \mb {Q} _1 ) \eta }{\eta ^\prime \eta }  \]

where $\mb {Q} _1^\prime \mb {Q} _1 = \mb {I} _{N-k}$, $\mb {Q} _1^\prime \mb {X} = 0$, and $\eta = \mb {Q} _1^\prime \mb {u} $.

The marginal probability for the Durbin-Watson statistic is

\[  \Pr (\mr {DW} _ j < c) = \Pr (h < 0)  \]

where $h = \eta ^\prime ( \mb {Q} _1^\prime \mb {A} _ j^\prime \mb {A} _ j \mb {Q} _1 - c\mb {I} )\eta $.

The $p$-value or the marginal probability for the generalized Durbin-Watson statistic is computed by numerical inversion of the characteristic function $\phi (u)$ of the quadratic form $h = \eta ^\prime ( \mb {Q} _1^\prime \mb {A} _ j^\prime \mb {A} _ j \mb {Q} _1 - c\mb {I} )\eta $. The trapezoidal rule approximation to the marginal probability $\Pr (h < 0)$ is

\[  \Pr (h < 0) = \frac{1}{2} - \sum _{k=0}^ K \frac{\mr {Im} \left[ \phi ((k + \frac{1}{2})\Delta ) \right]}{\pi (k + \frac{1}{2}) } + \mr {E} _ I(\Delta ) + \mr {E} _ T(K)  \]

where $\mr {Im} \left[ \phi (\cdot ) \right]$ is the imaginary part of the characteristic function, and $\mr {E} _ I(\Delta )$ and $\mr {E} _ T(K)$ are integration and truncation errors, respectively. Refer to Davies (1973) for numerical inversion of the characteristic function.

Ansley, Kohn, and Shively (1992) proposed a numerically efficient algorithm that requires O($N$) operations for evaluation of the characteristic function $\phi (u)$. The characteristic function is

\[  \phi (u) = \left| \mb {I} - 2iu( \mb {Q} _1^\prime \mb {A} _ j^\prime \mb {A} _ j \mb {Q} _1 - c\mb {I} _{N-k} ) \right|^{-1/2} = \left| \mb {V} \right|^{-1/2} \left| \mb {X} ^{\prime } \mb {V} ^{-1} \mb {X} \right|^{-1/2} \left| \mb {X} ^\prime \mb {X} \right|^{1/2}  \]

where $\mb {V} = ( 1 + 2iuc )\mb {I} - 2iu \mb {A} _ j^\prime \mb {A} _ j$ and $i = \sqrt {-1}$. By applying the Cholesky decomposition to the complex matrix $\mb {V}$, you can obtain the lower triangular matrix $\mb {G}$ that satisfies $\mb {V} = \mb {G} \mb {G} ^\prime $. Therefore, the characteristic function can be evaluated in O($N$) operations by using the following formula:

\[  \phi (u) = \left| \mb {G} \right|^{-1} \left| \mb {X} ^{\ast \prime } \mb {X} ^\ast \right|^{-1/2} \left| \mb {X} ^\prime \mb {X} \right|^{1/2}  \]

where $\mb {X} ^\ast = \mb {G} ^{-1}\mb {X} $. Refer to Ansley, Kohn, and Shively (1992) for more information on evaluation of the characteristic function.

Tests for Serial Correlation with Lagged Dependent Variables

When regressors contain lagged dependent variables, the Durbin-Watson statistic (${d_{1}}$) for the first-order autocorrelation is biased toward 2 and has reduced power. Wallis (1972) shows that the bias in the Durbin-Watson statistic (${d_{4}}$) for the fourth-order autocorrelation is smaller than the bias in ${d_{1}}$ in the presence of a first-order lagged dependent variable. Durbin (1970) proposes two alternative statistics (Durbin $h$ and $t$) that are asymptotically equivalent. The $h$ statistic is written as

\[  h = \hat{{\rho }}\sqrt {N / (1-N\hat{V})}  \]

where ${\hat{{\rho }}=\sum _{t=2}^{N}{\hat{{\nu }}_{t}\hat{{\nu }}_{t-1}}/\sum _{t=1}^{N}{\hat{{\nu }}_{t}^{2}}}$ and $\hat{V}$ is the least squares variance estimate for the coefficient of the lagged dependent variable. Durbin’s $t$ test consists of regressing the OLS residuals ${\hat{{\nu }}_{t}}$ on the explanatory variables and ${\hat{{\nu }}_{t-1}}$ and testing the significance of the coefficient estimate for ${\hat{{\nu }}_{t-1}}$.

Inder (1984) shows that the Durbin-Watson test for the absence of first-order autocorrelation is generally more powerful than the h test in finite samples. Refer to Inder (1986) and King and Wu (1991) for the Durbin-Watson test in the presence of lagged dependent variables.

Godfrey LM test

The GODFREY= option in the MODEL statement produces the Godfrey Lagrange multiplier test for serially correlated residuals for each equation (Godfrey 1978a, 1978b). The value $r$ specified in the GODFREY=$r$ option is the maximum autoregressive order; Godfrey’s tests are computed for lags 1 through $r$. The default number of lags is four.

Testing for Nonlinear Dependence: Ramsey’s Reset Test

Ramsey’s reset test is a misspecification test associated with the functional form of models to check whether power transforms need to be added to a model. The original linear model, henceforth called the restricted model, is

\[  y_ t={\mb {x_ t}} \beta + u_ t  \]

To test for misspecification in the functional form, the unrestricted model is

\[  y_ t={\mb {x_ t}} \beta + \sum _{j=2}^{p}{\phi _ j \hat{y}_ t^ j} + u_ t  \]

where $\hat{y}_ t$ is the predicted value from the linear model and $p$ is the highest power of $\hat{y}_ t$ included in the unrestricted model, starting from 2. The number of higher-order terms to include is at the discretion of the analyst. The RESET option produces test results for $p=2$, 3, and 4.

The reset test is an F statistic for testing $H_{0}: \phi _{j}=0$, for all $j=2, \ldots , p$, against $H_{1}: \phi _{j}\neq 0$ for at least one $j=2, \ldots , p$ in the unrestricted model and is computed as follows:

\[  F_{(p-1,n-k-p+1)}= \frac{(SSE_{R}-SSE_{U})/(p-1)}{SSE_ U/(n-k-p+1)}  \]

where $SSE_ R$ is the sum of squared errors due to the restricted model, $SSE_ U$ is the sum of squared errors due to the unrestricted model, $n$ is the total number of observations, and $k$ is the number of parameters in the original linear model.
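The two-regression procedure can be sketched as follows in Python (an illustration of the F statistic above, not the AUTOREG implementation); both models are fit by ordinary least squares:

```python
import numpy as np

def reset_test(y, X, p):
    # F test of H0: phi_2 = ... = phi_p = 0 in the augmented regression
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # restricted model
    yhat = X @ beta
    sse_r = np.sum((y - yhat) ** 2)
    # Unrestricted model: add powers yhat^2, ..., yhat^p as regressors
    Z = np.column_stack([X] + [yhat ** j for j in range(2, p + 1)])
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sse_u = np.sum((y - Z @ gamma) ** 2)
    df1, df2 = p - 1, n - k - p + 1
    return ((sse_r - sse_u) / df1) / (sse_u / df2)
```

A strongly nonlinear relationship (for example, a quadratic trend fit by a straight line) produces a very large F value.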

Ramsey’s test can be viewed as a linearity test that checks whether any nonlinear transformation of the specified independent variables has been omitted, but it need not help in identifying a new relevant variable other than those already specified in the current model.

Testing for Nonlinear Dependence: Heteroscedasticity Tests

Portmanteau Q Test

For nonlinear time series models, the portmanteau test statistic based on squared residuals is used to test for independence of the series (McLeod and Li, 1983):

\[  Q(q) = N(N+2)\sum _{i=1}^{q}{\frac{r^{2}(i; \hat{{\nu }}^{2}_{t})}{(N-i)}}  \]


\[  r(i; \hat{{\nu }}^{2}_{t}) = \frac{\sum _{t=i+1}^{N}{( \hat{{\nu }}^{2}_{t}-\hat{{\sigma }}^{2}) ( \hat{{\nu }}^{2}_{t-i}-\hat{{\sigma }}^{2})}}{\sum _{t=1}^{N}{( \hat{{\nu }}^{2}_{t}- \hat{{\sigma }}^{2})^{2}}}  \]
\[  \hat{{\sigma }}^{2} = \frac{1}{N} \sum _{t=1}^{N}{\hat{{\nu }}^{2}_{t}}  \]

This Q statistic is used to test the nonlinear effects (for example, GARCH effects) present in the residuals. The GARCH$(p,q)$ process can be considered as an ARMA$(\max (p,q),p)$ process. See the section Predicting the Conditional Variance. Therefore, the Q statistic calculated from the squared residuals can be used to identify the order of the GARCH process.
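The computation mirrors the Ljung-Box statistic applied to squared residuals, with the autocorrelations entering squared as in McLeod and Li (1983). A Python sketch (illustrative, not the AUTOREG implementation):

```python
import numpy as np

def mcleod_li(resid, q):
    # Q(q) = N(N+2) * sum_{i=1}^q r_i^2 / (N - i), where r_i are the
    # autocorrelations of the squared residuals
    v2 = np.asarray(resid, dtype=float) ** 2
    N = len(v2)
    d = v2 - v2.mean()
    denom = np.sum(d ** 2)
    Q = 0.0
    for i in range(1, q + 1):
        r_i = np.sum(d[i:] * d[:-i]) / denom
        Q += r_i ** 2 / (N - i)
    return N * (N + 2) * Q
```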

Engle’s Lagrange Multiplier Test for ARCH Disturbances

Engle (1982) proposed a Lagrange multiplier test for ARCH disturbances. The test statistic is asymptotically equivalent to the test used by Breusch and Pagan (1979). Engle’s Lagrange multiplier test for the $q$th-order ARCH process is written

\[  LM(q) = \frac{N\mb {W}’ \mb {Z} (\mb {Z}’ \mb {Z} )^{-1}\mb {Z}’ \mb {W} }{\mb {W}’ \mb {W} }  \]


\[  \mb {W} = \left( \frac{\hat{{\nu }}^{2}_{1}}{\hat{{\sigma }}^{2}}-1, {\ldots }, \frac{\hat{{\nu }}^{2}_{N}}{\hat{{\sigma }}^{2}}-1\right)’  \]


\[  \mb {Z} = \begin{bmatrix}  1   &  \hat{{\nu }}_{0}^{2}   &  {\cdots }   &  \hat{{\nu }}_{-q+1}^{2}   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   \\ {\vdots }   &  {\vdots }   &  {\vdots }   &  {\vdots }   \\ 1   &  \hat{{\nu }}_{N-1}^{2}   &  {\cdots }   &  \hat{{\nu }}_{N-q}^{2}   \end{bmatrix}  \]

The presample values ( ${\nu }^{2}_{0}$,${\ldots }$, ${\nu }^{2}_{-q+1}$) have been set to 0. Note that the LM$(q)$ tests might have different finite-sample properties depending on the presample values, though they are asymptotically equivalent regardless of the presample values.
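The statistic is a projection of $\mb{W}$ onto the column space of $\mb{Z}$. A direct Python translation (illustrative, not the AUTOREG implementation), with presample squared residuals set to 0 as in the text:

```python
import numpy as np

def engle_lm(resid, q):
    # LM(q) = N * W'Z(Z'Z)^{-1}Z'W / W'W
    v = np.asarray(resid, dtype=float)
    N = len(v)
    s2 = np.mean(v ** 2)
    W = v ** 2 / s2 - 1.0
    v2pad = np.concatenate([np.zeros(q), v ** 2])   # presample values = 0
    # Row t of Z is (1, v_{t-1}^2, ..., v_{t-q}^2)
    Z = np.column_stack([np.ones(N)] +
                        [v2pad[q - i:q - i + N] for i in range(1, q + 1)])
    Pw = Z @ np.linalg.solve(Z.T @ Z, Z.T @ W)      # projection of W onto Z
    return N * (W @ Pw) / (W @ W)
```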

Lee and King’s Test for ARCH Disturbances

Engle’s Lagrange multiplier test for ARCH disturbances is a two-sided test; that is, it ignores the inequality constraints for the coefficients in ARCH models. Lee and King (1993) propose a one-sided test and prove that the test is locally most mean powerful. Let $\varepsilon _ t, t=1,...,T$, denote the residuals to be tested. Lee and King’s test checks

\[  H_0: \alpha _ i=0, i=1,...,q  \]
\[  H_1: \alpha _ i>0, i=1,...,q  \]

where $\alpha _ i, i=1,...,q,$ are in the following ARCH(q) model:

\[  \varepsilon _ t=\sqrt {h_ t}e_ t, \qquad e_ t \sim \mathrm{iid}(0,1)  \]
\[  h_ t=\alpha _0+\sum _{i=1}^{q}{\alpha _ i \varepsilon _{t-i}^2}  \]

The statistic is written as

\[  S=\frac{\sum _{t=q+1}^{T}{(\frac{\varepsilon _ t^2}{h_0}-1)\sum _{i=1}^{q}{\varepsilon _{t-i}^2}}}{\left[2\sum _{t=q+1}^{T}{(\sum _{i=1}^{q}{\varepsilon _{t-i}^2})^2}-\frac{2(\sum _{t=q+1}^{T} {\sum _{i=1}^{q}{\varepsilon _{t-i}^2}})^2}{T-q}\right]^{1/2}}  \]
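The statistic translates into code as follows. The Python sketch below is an illustration, not the AUTOREG implementation; in particular, it assumes that $h_0$, the conditional variance under the null, is estimated by the sample mean of the squared residuals:

```python
import numpy as np

def lee_king(resid, q):
    # One-sided score statistic S from the text
    e = np.asarray(resid, dtype=float)
    T = len(e)
    h0 = np.mean(e ** 2)           # assumed estimate of the null variance
    # z_t = sum_{i=1}^q e_{t-i}^2 for t = q+1..T (1-based)
    z = np.array([np.sum(e[t - q:t] ** 2) for t in range(q, T)])
    score = np.sum((e[q:] ** 2 / h0 - 1.0) * z)
    denom = 2.0 * np.sum(z ** 2) - 2.0 * np.sum(z) ** 2 / (T - q)
    return score / np.sqrt(denom)
```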
Wong and Li’s Test for ARCH Disturbances

Wong and Li (1995) propose a rank portmanteau statistic to minimize the effect of the existence of outliers in the test for ARCH disturbances. They first rank the squared residuals; that is, $R_ t=rank(\varepsilon _ t^2)$. Then they calculate the rank portmanteau statistic

\[  Q_ R=\sum _{i=1}^{q}{\frac{(r_ i-\mu _ i)^2}{\sigma _ i^2}}  \]

where $r_ i$, $\mu _ i$, and $\sigma _ i^2$ are defined as follows:

\[  r_ i=\frac{\sum _{t=i+1}^{T}{(R_{t}-(T+1)/2)(R_{t-i}-(T+1)/2)}}{T(T^2-1)/12}  \]
\[  \mu _ i=-\frac{T-i}{T(T-1)}  \]
\[  \sigma _ i^2=\frac{5T^4-(5i+9)T^3+9(i-2)T^2+2i(5i+8)T+16i^2}{5(T-1)^2T^2(T+1)}  \]
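These formulas can be coded directly. A Python sketch of the rank portmanteau statistic (illustrative, not the AUTOREG implementation, and without tie handling in the ranks):

```python
import numpy as np

def wong_li(resid, q):
    # Rank portmanteau statistic Q_R of Wong and Li (1995)
    e2 = np.asarray(resid, dtype=float) ** 2
    T = len(e2)
    R = (np.argsort(np.argsort(e2)) + 1).astype(float)  # ranks of squared residuals
    c = R - (T + 1) / 2.0                               # centered ranks
    Q = 0.0
    for i in range(1, q + 1):
        r_i = np.sum(c[i:] * c[:-i]) / (T * (T * T - 1) / 12.0)
        mu_i = -(T - i) / (T * (T - 1.0))
        s2_i = (5 * T**4 - (5 * i + 9) * T**3 + 9 * (i - 2) * T**2
                + 2 * i * (5 * i + 8) * T + 16 * i**2) \
               / (5.0 * (T - 1) ** 2 * T**2 * (T + 1))
        Q += (r_i - mu_i) ** 2 / s2_i
    return Q
```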

The Q, Engle’s LM, Lee and King’s, and Wong and Li’s statistics are computed from the OLS residuals, or from the residuals of the autoregressive error model if the NLAG= option is specified, assuming that the disturbances are white noise. The Q, Engle’s LM, and Wong and Li’s statistics have an approximate ${ {\chi }^{2}_{(q)}}$ distribution under the white-noise null hypothesis, while Lee and King’s statistic has a standard normal distribution under the white-noise null hypothesis.

Testing for Structural Change

Chow Test

Consider the linear regression model

\[  \mb {y} = \mb {X}{\beta } + \mb {u}  \]

where the parameter vector ${\beta }$ contains $k$ elements.

Split the observations for this model into two subsets at the break point specified by the CHOW= option, so that

\[  \mb {y} = ( {\mb {y}’}_{1}, {\mb {y}’}_{2})’  \]
\[  \mb {X} = ( {\mb {X}’}_{1}, {\mb {X}’}_{2})’  \]
\[  \mb {u} = ( {\mb {u}’}_{1}, {\mb {u}’}_{2})’  \]

Now consider the two linear regressions for the two subsets of the data modeled separately,

\[  \mb {y} _{1} = \mb {X}_{1}{\beta }_{1} + \mb {u} _{1}  \]
\[  \mb {y} _{2} = \mb {X}_{2}{\beta }_{2} + \mb {u} _{2}  \]

where the number of observations from the first set is ${n_{1}}$ and the number of observations from the second set is ${n_{2}}$.

The Chow test statistic is used to test the null hypothesis ${H_{0}: {\beta }_{1}={\beta }_{2}}$ conditional on the same error variance ${V(\mb {u} _{1}) = V(\mb {u} _{2})}$. The Chow test is computed using three sums of squared errors:

\[  \mr {F}_\mi {chow} = \frac{({\mb {\hat{u}} ’}\mb {\hat{u}} - {\mb {\hat{u}}’}_{1}\mb {\hat{u}} _{1} - {\mb {\hat{u}}’}_{2}\mb {\hat{u}} _{2}) / {k}}{( {\mb {\hat{u}}’}_{1} \mb {\hat{u}} _{1} + {\mb {\hat{u}}’}_{2} \mb {\hat{u}} _{2}) / (n_{1}+n_{2}-2k)}  \]

where ${\mb {\hat{u}} }$ is the regression residual vector from the full set model, ${\mb {\hat{u}} _{1}}$ is the regression residual vector from the first set model, and ${\mb {\hat{u}} _{2}}$ is the regression residual vector from the second set model. Under the null hypothesis, the Chow test statistic has an F distribution with ${k}$ and ${(n_{1}+n_{2}-2k)}$ degrees of freedom, where ${k}$ is the number of elements in ${{\beta }}$.

Chow (1960) suggested another test statistic that tests the hypothesis that the mean of prediction errors is 0. The predictive Chow test can also be used when ${n_{2} < k}$.

The PCHOW= option computes the predictive Chow test statistic

\[  \mr {F}_\mi {pchow} = \frac{({\mb {\hat{u}} ’}\mb {\hat{u}} - {\mb {\hat{u}}’}_{1}\mb {\hat{u}} _{1}) / {n_2}}{ {\mb {\hat{u}}’}_{1} \mb {\hat{u}} _{1} / (n_{1}-k)}  \]

The predictive Chow test has an F distribution with ${n_{2}}$ and ${(n_{1}-k)}$ degrees of freedom.
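Both tests reduce to fitting three OLS regressions and combining their sums of squared errors. A Python sketch (illustrative, not the AUTOREG implementation), where `n1` plays the role of the CHOW= break point:

```python
import numpy as np

def _sse(y, X):
    # Sum of squared errors from an OLS fit of y on X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def chow_test(y, X, n1):
    # F_chow with the break after observation n1
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    n2 = n - n1
    sse_full = _sse(y, X)
    sse1 = _sse(y[:n1], X[:n1])
    sse2 = _sse(y[n1:], X[n1:])
    return ((sse_full - sse1 - sse2) / k) / ((sse1 + sse2) / (n1 + n2 - 2 * k))

def predictive_chow_test(y, X, n1):
    # F_pchow, usable even when the second subset has fewer than k observations
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    n2 = n - n1
    sse_full = _sse(y, X)
    sse1 = _sse(y[:n1], X[:n1])
    return ((sse_full - sse1) / n2) / (sse1 / (n1 - k))
```

A large shift in the mean between the two subsets drives both F values far above any conventional critical value.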

Bai and Perron’s Multiple Structural Change Tests

Bai and Perron (1998) propose several kinds of multiple structural change tests: (1) the test of no break versus a fixed number of breaks ($supF$ test), (2) the equal and unequal weighted versions of double maximum tests of no break versus an unknown number of breaks given some upper bound ($UDmaxF$ test and $WDmaxF$ test), and (3) the test of $l$ versus $l+1$ breaks ($supF_{l+1|l}$ test). Bai and Perron (2003a, 2003b, 2006) also show how to implement these tests, provide commonly used critical values, and present simulation analyses of the tests.

Consider the following partial structural change model with $m$ breaks ($m+1$ regimes):

\[  y_ t = x_ t’\beta + z_ t’\delta _ j+u_ t,\quad \quad t=T_{j-1}+1,...,T_ j,j=1,...,m  \]

Here, $y_ t$ is the dependent variable observed at time $t$, $x_ t (p \times 1)$ is a vector of covariates with coefficients $\beta $ unchanged over time, and $z_ t (q \times 1)$ is a vector of covariates with coefficients $\delta _ j$ at regime $j$, $j=1,...,m$. If $p=0$ (that is, there are no $x$ regressors), the regression model becomes the pure structural change model. The indices $(T_1,...,T_ m)$ (that is, the break dates or break points) are unknown, and the convenient notation $T_0=0$ and $T_{m+1}=T$ applies. For any given $m$-partition $(T_1,...,T_ m)$, the associated least squares estimates of $\beta $ and $\delta _ j,j=1,...,m,$ are obtained by minimizing the sum of squared residuals (SSR),

\[  S_ T(T_1,...,T_ m) = \sum _{i=1}^{m+1}{\sum _{t=T_{i-1}+1}^{T_ i}{(y_ t - x_ t’\beta - z_ t’\delta _ i)^2 }}  \]

Let $\hat{S}_ T(T_1,...,T_ m)$ denote the minimized SSR for a given $(T_1,...,T_ m)$. The estimated break dates $(\hat{T}_1,...,\hat{T}_ m)$ are such that

\[  (\hat{T}_1,...,\hat{T}_ m) = \arg \min _{T_1,...,T_ m}{\hat{S}_ T(T_1,...,T_ m)}  \]

where the minimization is taken over all partitions $(T_1,...,T_ m)$ such that $T_ i-T_{i-1}\geq T\epsilon $. Bai and Perron (2003a) propose an efficient algorithm, based on the principle of dynamic programming, to estimate the preceding model.

In the case that the data are nontrending, as stated in Bai and Perron (1998), the limiting distribution of the break dates is as follows:

\[  \frac{(\Delta _ i’Q_ i\Delta _ i)^2}{\Delta _ i’\Omega _ i\Delta _ i}(\hat{T}_ i-T_ i^0)\Rightarrow \arg \max _ s{V^{(i)}(s)},\quad \quad i=1,...,m  \]


\[  V^{(i)}(s)= \left\{  \begin{array}{ l l } W_1^{(i)}(-s)-|s|/2 &  \text {if}\  s \leq 0 \\ \sqrt {\eta _ i}(\phi _{i,2}/\phi _{i,1})W_2^{(i)}(s)-\eta _ i|s|/2 &  \text {if}\  s > 0 \end{array} \right.  \]


where

\[  \Delta T_ i^0 = T_ i^0 - T_{i-1}^0  \]
\[  \Delta _ i = \delta _{i+1}^0 - \delta _ i^0  \]
\[  Q_ i = \lim {(\Delta T_ i^0)^{-1}\sum _{t=T_{i-1}^0+1}^{T_ i^0}{E(z_ tz_ t’)}}  \]
\[  \Omega _ i = \lim {(\Delta T_ i^0)^{-1}\sum _{r=T_{i-1}^0+1}^{T_ i^0}{\sum _{t=T_{i-1}^0+1}^{T_ i^0}{E(z_ rz_ t’u_ ru_ t)}}}  \]
\[  \eta _ i = \Delta _ i’Q_{i+1}\Delta _ i / \Delta _ i’Q_ i\Delta _ i  \]
\[  \phi _{i,1}^2 = \Delta _ i’\Omega _ i\Delta _ i / \Delta _ i’Q_ i\Delta _ i  \]
\[  \phi _{i,2}^2 = \Delta _ i’\Omega _{i+1}\Delta _ i / \Delta _ i’Q_{i+1}\Delta _ i  \]

Also, $W_1^{(i)}(s)$ and $W_2^{(i)}(s)$ are independent standard Wiener processes that are defined on $[0,\infty )$, starting at the origin when $s=0$; these processes are also independent across $i$. The cumulative distribution function of $\arg \max _ s{V^{(i)}(s)}$ is shown in Bai (1997). Hence, with the estimates of $\Delta _ i$, $Q_ i$, and $\Omega _ i$, the relevant critical values for the confidence intervals of the break dates $T_ i$ can be calculated. The estimate of $\Delta _ i$ is $\hat{\delta }_{i+1} - \hat{\delta }_ i$. The estimate of $Q_ i$ is either

\[  \hat{Q}_ i = (\Delta \hat{T}_ i)^{-1} \sum _{t=\hat{T}_{i-1}+1}^{\hat{T}_ i}{z_ tz_ t'}  \]

if the regressors are assumed to have heterogeneous distributions across regimes (that is, the HQ option is specified), or

\[  \hat{Q}_ i = \hat{Q} = (T)^{-1} \sum _{t=1}^{T}{z_ tz_ t’}  \]

if the regressors are assumed to have identical distributions across regimes (that is, the HQ option is not specified). The estimate of $\Omega _ i$ can likewise be constructed from the data of regime $i$ only or from the whole sample, depending on whether the vectors ${z_ t\hat{u}_ t}$ are assumed to be heterogeneously distributed across regimes (that is, whether the HO option is specified). If the HAC option is specified, $\hat{\Omega }_ i$ is estimated through the heteroscedasticity- and autocorrelation-consistent (HAC) covariance matrix estimator applied to the vectors ${z_ t\hat{u}_ t}$.
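The two choices for $\hat{Q}_ i$ can be sketched as follows; the function name and its interface are hypothetical, and the flag mirrors the per-regime versus pooled distinction described above:

```python
import numpy as np

def q_hat(z, breaks, heterogeneous):
    """Moment-matrix estimates for each regime from the regressors z (T x q).

    breaks holds the estimated break dates. With heterogeneous=True, each
    regime uses only its own observations (the per-regime choice); otherwise
    the pooled full-sample matrix is reused for every regime."""
    T, q = z.shape
    bounds = [0] + list(breaks) + [T]
    if heterogeneous:
        return [z[a:b].T @ z[a:b] / (b - a)
                for a, b in zip(bounds[:-1], bounds[1:])]
    Q = z.T @ z / T
    return [Q] * (len(bounds) - 1)
```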

The $supF$ test of no structural break $(m=0)$ versus the alternative hypothesis that there are a fixed number, $m=k$, of breaks is defined as follows:

\[  supF(k) = \frac{1}{T}\left( \frac{T-(k+1)q-p}{kq} \right) (R\hat{\theta })' (R\hat{V}(\hat{\theta })R')^{-1} (R\hat{\theta })  \]


where

\[  R_{(kq)\times (p+(k+1)q)} = \left( \begin{array}{ ccccccc } 0_{q \times p} &  I_ q &  -I_ q &  0 &  0 &  \cdots &  0 \\ 0_{q \times p} &  0 &  I_ q &  -I_ q &  0 &  \cdots &  0 \\ \vdots &  \cdots &  \ddots &  \ddots &  \ddots &  \ddots &  \cdots \\ 0_{q \times p} &  0 &  \cdots &  \cdots &  0 &  I_ q &  -I_ q \end{array} \right)  \]

and $I_ q$ is the $q\times q$ identity matrix; $\hat{\theta }$ is the coefficient vector $(\hat{\beta }' \hat{\delta }_1' ... \hat{\delta }_{k+1}')'$, which, together with the break dates $(\hat{T}_1 ... \hat{T}_ k)$, minimizes the global sum of squared residuals; and $\hat{V}(\hat{\theta })$ is an estimate of the variance-covariance matrix of $\hat{\theta }$, which can be computed with the HAC estimator or in another way, depending on how the HAC, HR, and HE options are specified.
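The restriction matrix $R$ and the resulting statistic are straightforward to assemble once $\hat{\theta }$ and $\hat{V}(\hat{\theta })$ are available. A minimal sketch, with hypothetical function names, that builds $R$ exactly as in the block matrix above and evaluates the quadratic form:

```python
import numpy as np

def restriction_matrix(k, q, p):
    """Build the (kq) x (p + (k+1)q) matrix R whose block rows contrast
    consecutive regime coefficients: each row block is (0, ..., I_q, -I_q, ..., 0)."""
    R = np.zeros((k * q, p + (k + 1) * q))
    for i in range(k):
        R[i * q:(i + 1) * q, p + i * q: p + (i + 1) * q] = np.eye(q)
        R[i * q:(i + 1) * q, p + (i + 1) * q: p + (i + 2) * q] = -np.eye(q)
    return R

def sup_f(theta_hat, V_hat, k, q, p, T):
    """supF(k) from the estimated coefficient vector and its covariance estimate."""
    R = restriction_matrix(k, q, p)
    r = R @ theta_hat
    wald = r @ np.linalg.solve(R @ V_hat @ R.T, r)
    return (T - (k + 1) * q - p) / (k * q) / T * wald
```

When the regime coefficients $\hat{\delta }_1, ..., \hat{\delta }_{k+1}$ are all equal, $R\hat{\theta }$ is zero and the statistic vanishes, as the null hypothesis of no break implies.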

There are two versions of the double maximum test of no break against an unknown number of breaks given some upper bound $M$: (1) the $UDmaxF$ test:

\[  UDmaxF(M) = \max _{1\leq m \leq M} supF(m)  \]

and (2) the $WDmaxF$ test:

\[  WDmaxF(M, \alpha ) = \max _{1\leq m \leq M} \frac{c_\alpha (m)}{c_\alpha (1)}supF(m)  \]

where $\alpha $ is the significance level and $c_\alpha (m)$ is the critical value of the $supF(m)$ test at significance level $\alpha $. Four kinds of $WDmaxF$ tests, corresponding to $\alpha = 0.100, 0.050, 0.025$, and $0.010$, are implemented.
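Given the $supF(m)$ statistics and the critical-value weights described above, both double maximum statistics reduce to simple maxima. A minimal sketch with hypothetical function names; the weights are taken as precomputed inputs rather than derived here:

```python
def ud_max_f(sup_f_stats):
    """UDmaxF(M): unweighted maximum of supF(1), ..., supF(M)."""
    return max(sup_f_stats)

def wd_max_f(sup_f_stats, weights):
    """WDmaxF(M, alpha): maximum of the supF(m) statistics after each is
    scaled by its critical-value weight for significance level alpha."""
    return max(w * f for w, f in zip(weights, sup_f_stats))
```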

The $supF(l+1|l)$ test of $l$ versus $l+1$ breaks is calculated in two ways that are asymptotically equivalent. The first method amounts to applying $(l+1)$ tests of the null hypothesis of no structural change against the alternative hypothesis of a single change, one test for each segment that contains the observations $\hat{T}_{i-1}+1$ to $\hat{T}_ i$ $(i=1, ..., l+1)$; the $supF(l+1|l)$ test statistic is the maximum of these $(l+1)$ $supF$ test statistics. In the second method, given the $l$ breaks $(\hat{T}_1, ..., \hat{T}_ l)$, the new break $\hat{T}^{(N)}$ is chosen to minimize the global SSR:

\[  \hat{T}^{(N)} = \arg \min _{T^{(N)}}{SSR(\hat{T}_1, ..., \hat{T}_ l;T^{(N)})}  \]


and the test statistic is

\[  supF(l+1|l) = (T-(l+1)q-p) \frac{SSR(\hat{T}_1, ..., \hat{T}_ l) - SSR(\hat{T}_1, ..., \hat{T}_ l;\hat{T}^{(N)})}{SSR(\hat{T}_1, ..., \hat{T}_ l)}  \]

The p-value of each test is computed by simulating the limiting distribution of that test.
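The second method of the sequential test can be sketched for a simple mean-shift model ($q=1$ regime parameter, $p=0$ common regressors); the function names and the minimum segment length `h` are illustrative assumptions:

```python
import numpy as np

def seg_ssr(y, a, b):
    """SSR of a constant-mean fit on the segment y[a:b]."""
    seg = y[a:b]
    return float(np.sum((seg - seg.mean()) ** 2))

def best_new_break(y, breaks, h):
    """Given l fixed break dates, try every admissible position for one
    additional break and return the candidate that minimizes the global SSR."""
    T = len(y)
    bounds = [0] + list(breaks) + [T]
    base = sum(seg_ssr(y, a, b) for a, b in zip(bounds[:-1], bounds[1:]))
    best_t, best_ssr = None, base
    for a, b in zip(bounds[:-1], bounds[1:]):
        for t in range(a + h, b - h + 1):
            # only the split segment's SSR changes; the rest is unchanged
            cand = base - seg_ssr(y, a, b) + seg_ssr(y, a, t) + seg_ssr(y, t, b)
            if cand < best_ssr:
                best_t, best_ssr = t, cand
    return best_t, best_ssr, base

def sup_f_seq(y, breaks, h, q=1, p=0):
    """supF(l+1|l) statistic for the mean-shift sketch."""
    T, l = len(y), len(breaks)
    _, new_ssr, base = best_new_break(y, breaks, h)
    return (T - (l + 1) * q - p) * (base - new_ssr) / base
```

A large reduction in SSR from the single added break yields a large $supF(l+1|l)$ value, signaling that $l$ breaks are not enough to describe the series.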