The PANEL Procedure

Panel Data Unit Root Tests

Subsections:

Levin, Lin, and Chu (2002)
Im, Pesaran, and Shin (2003)
Combination Tests
Breitung’s Unbiased Tests
Hadri (2000) Stationarity Tests
Harris and Tzavalis (1999) Panel Unit Root Tests

Levin, Lin, and Chu (2002)

Levin, Lin, and Chu (2002) propose a panel unit root test for the null hypothesis of unit root against a homogeneous stationary hypothesis. The model is specified as

$\begin{equation*} \Delta y_{it}= \delta y_{it-1} + \sum _{L=1}^{p_{i}}{\theta _{iL}\Delta y_{it-L}} + \alpha _{mi} d_{mt} + \varepsilon _{it} \hspace{0.2 in} m=1,2,3 \end{equation*}$

Three models are considered: (1) $d_{1t}=\phi$ (the empty set) with no individual effects, (2) $d_{2t}=\left\{ 1\right\}$ in which the series $y_{it}$ has an individual-specific mean but no time trend, and (3) $d_{3t}=\left\{ 1,t\right\}$ in which the series $y_{it}$ has an individual-specific mean and linear and individual-specific time trend. The panel unit root test evaluates the null hypothesis of $H_{0}: \delta = 0$ , for all i, against the alternative hypothesis $H_{1}: \delta <0$ for all i. The lag order $p_{i}$ is unknown and is allowed to vary across individuals. It can be selected by the methods that are described in the section Lag Order Selection in the ADF Regression. Denote the selected lag orders as $\hat{p}_{i}$ . The test is implemented in three steps.

Step 1

The ADF regressions are implemented for each individual i, and then the orthogonalized residuals are generated and normalized. That is, the following model is estimated:

$\begin{equation*} \Delta y_{it}= \delta _{i} y_{it-1} + \sum _{L=1}^{\hat{p}_{i}}{\theta _{iL}\Delta y_{it-L}} + \alpha _{mi} d_{mt} + \varepsilon _{it} \hspace{0.2 in} m=1,2,3 \end{equation*}$

The two orthogonalized residuals are generated by the following two auxiliary regressions:

$\begin{equation*} \Delta y_{it}= \sum _{L=1}^{\hat{p}_{i}}{\theta _{iL}\Delta y_{it-L}} + \alpha _{mi} d_{mi} + e_{it} \end{equation*}$

$\begin{equation*} y_{it-1}= \sum _{L=1}^{\hat{p}_{i}}{\theta _{iL}\Delta y_{it-L}} + \alpha _{mi} d_{mi} + v_{it-1} \end{equation*}$

The residuals are saved at $\hat{e}_{it}$ and $\hat{v}_{it-1}$ , respectively. To remove heteroscedasticity, the residuals $\hat{e}_{it}$ and $\hat{v}_{it-1}$ are normalized by the regression standard error from the ADF regression. Denote the standard error as $\hat{\sigma }_{\varepsilon i}^{2}=\sum _{t=\hat{p}_{i}+2}^{T}\left(\hat{e}_{it}-\hat{\delta _{i}}\hat{v}_{it-1}\right)^{2}/\left(T-p_{i}-1\right)$ , and normalize residuals as

$\begin{equation*} \tilde{e}_{it}=\frac{\hat{e}_{it}}{\hat{\sigma }_{\varepsilon i}}, \hspace{0.2 in} \tilde{v}_{it-1}=\frac{\hat{v}_{it-1}}{\hat{\sigma }_{\varepsilon i}} \end{equation*}$

Step 2

The ratios of long-run to short-run standard deviations of $\Delta y_{it}$ are estimated. Denote the ratios and the long-run variances as $s_{i}$ and $\sigma _{yi}$ , respectively. The long-run variances are estimated by the HAC (heteroscedasticity- and autocorrelation-consistent) estimators, which are described in the section Long-Run Variance Estimation. Then the ratios are estimated by $\hat{s}_{i}=\hat{\sigma }_{yi}/\hat{\sigma }_{\varepsilon i}$ . Let the average standard deviation ratio be $S_{N}=(1/N)\sum _{i=1}^{N}s_{i}$ , and let its estimator be $\hat{S}_{N}=(1/N)\sum _{i=1}^{N}\hat{s}_{i}$ .

Step 3

The panel test statistics are calculated. To calculate the t statistic and the adjusted t statistic, the following equation is estimated:

$\begin{equation*} \tilde{e}_{it} = \delta \tilde{v}_{it-1} + \tilde{\varepsilon }_{it} \end{equation*}$

The total number of observations is $N\tilde{T}$ , with $\bar{\hat{p}}=\sum _{i=1}^{N}\hat{p}_{i}/N,\tilde{T}=T-\bar{\hat{p}}-1$ . The standard t statistic for testing $\delta = 0$ is $t_{\delta }=\hat{\delta }/STD(\hat{\delta })$ , with OLS estimator $\hat{\delta }$ and standard deviation $STD(\hat{\delta })$ . However, the standard t statistic diverges to negative infinity for models (2) and (3). Let $\hat{\sigma }_{\tilde{\varepsilon }}$ be the root mean square error from the step 3 regression, and denote it as

$\begin{equation*} \hat{\sigma }_{\tilde{\varepsilon }}^{2} = \left[\frac{1}{N\tilde{T}}\sum _{i=1}^{N}\sum _{t=2+\hat{p}_{i}}^{T}(\tilde{e}_{it}-\hat{\delta }\tilde{v}_{it-1})^{2}\right] \end{equation*}$

Levin, Lin, and Chu (2002) propose the following adjusted t statistic:

$\begin{equation*} t_{\delta }^{*}=\frac{t_{\delta }- N\tilde{T}\hat{S}_{N}\hat{\sigma }_{\tilde{\varepsilon }}^{-2}STD(\hat{\delta })\mu _{m\tilde{T}}^{*}}{\sigma _{m\tilde{T}}^{*}} \nonumber \end{equation*}$

The mean and standard deviation adjustments ( $\mu _{m\tilde{T}}^{*},\sigma _{m\tilde{T}}^{*}$ ) depend on the time series dimension $\tilde{T}$ and model specification m, which can be found in Table 2 of Levin, Lin, and Chu (2002). The adjusted t statistic converges to the standard normal distribution. Therefore, the standard normal critical values are used in hypothesis testing.

Lag Order Selection in the ADF Regression

The methods for selecting the individual lag orders in the ADF regressions can be divided into two categories: selection based on information criteria and selection via sequential testing.

Lag Selection Based on Information Criteria

In this method, the following information criteria can be applied to lag order selection: AIC, SBC, HQIC (HQC), and MAIC. As with other model selection applications, the lag order is selected from 0 to the maximum $p_{max}$ to minimize the objective function, plus a penalty term, which is a function of the number of parameters in the regression. Let k be the number of parameters and $T_{o}$ be the number of effective observations. For regression models, the objective function is $T_{o}log(SSR/T_{o})$ , where SSR is the sum of squared residuals. For AIC, the penalty term equals $2k$ . For SBC, this term is $klogT_{o}$ . For HQIC, it is $2cklog\left[log(T_{o})\right]$ with c being a constant greater than 1.^[8] For MAIC, the penalty term equals $2(\tau _ T(k) +k)$ , where

$\begin{equation*} \tau _ T(k)=(SSR/T_{o})^{-1} \hat{\delta }^2 \sum _{t=p_{max}+2}^ T y_{t-1}^2 \end{equation*}$

and $\hat{\delta }$ is the estimated coefficient of the lagged dependent variable $y_{t-1}$ in the ADF regression.

Lag Selection via Sequential Testing

In this method, the lag order estimation is based on the statistical significance of the estimated AR coefficients. Hall (1994) proposed general-to-specific (GS) and specific-to-general (SG) strategies. Levin, Lin, and Chu (2002) recommend the first strategy, following Campbell and Perron (1991). In the GS modeling strategy, starting with the maximum lag order $p_{max}$ , the t test for the largest lag order in $\hat{\theta }_{i}$ is performed to determine whether a smaller lag order is preferred. Specifically, when the null of $\hat{\theta }_{iL}=0$ is not rejected given the significance level ( $5\%$ ), a smaller lag order is preferred. This procedure continues until a statistically significant lag order is reached. On the other hand, the SG modeling strategy starts with lag order 0 and moves toward the maximum lag order $p_{max}$ .

Long-Run Variance Estimation

The long-run variance of $\Delta y_{it}$ is estimated by a HAC-type estimator. For model (1), given the lag truncation parameter $\bar{K}$ and kernel weights $w_{\bar{K}L}$ , the formula is

$\begin{equation*} \hat{\sigma }_{yi}^{2}=\frac{1}{T-1}\sum _{t=2}^{T}\Delta y_{it}^{2}+2\sum _{L=1}^{\bar{K}}w_{\bar{K}L}\left[\frac{1}{T-1}\sum _{t=2+L}^{T}\Delta y_{it}\Delta y_{it-L}\right] \end{equation*}$

To achieve consistency, the lag truncation parameter must satisfy $\bar{K}/T\to 0$ and $\bar{K}\to \infty$ as $T\to \infty$ . Levin, Lin, and Chu (2002) suggest $\bar{K}=\left\lfloor {3.21T^{1/3}}\right\rfloor$ . The weights $w_{\bar{K}L}$ depend on the kernel function. Andrews (1991) proposes data-driven bandwidth (lag truncation parameter + 1 if integer-valued) selection procedures to minimize the asymptotic mean squared error (MSE) criterion. For details about the kernel functions and Andrews (1991) data-driven bandwidth selection procedure, see the section Heteroscedasticity- and Autocorrelation-Consistent Covariance Matrices for details. Because Levin, Lin, and Chu (2002) truncate the bandwidth as an integer, when LLCBAND is specified as the BANDWIDTH option, it corresponds to $\mbox{BANDWIDTH}=\left\lfloor {3.21T^{1/3}}\right\rfloor +1$ . Furthermore, kernel weights $w_{\bar{K}L}=k(L/(\bar{K}+1))$ with kernel function $k(\cdot )$ .

For model (2), the series $\Delta y_{it}$ is demeaned individual by individual first. Therefore, $\Delta y_{it}$ is replaced by $\Delta y_{it}-\overline{\Delta y_{it}}$ , where $\overline{\Delta y_{it}}$ is the mean of $\Delta y_{it}$ for individual i. For model (3) with individual fixed effects and time trend, both the individual mean and trend should be removed before the long-run variance is estimated. That is, first regress $\Delta y_{it}$ on $\{ 1,t\}$ for each individual and save the residual $\widetilde{\Delta y_{it}}$ , and then replace $\Delta y_{it}$ with the residual.

Cross-Sectional Dependence via Time-Specific Aggregate Effects

The Levin, Lin, and Chu (2002) testing procedure is based on the assumption of cross-sectional independence. It is possible to relax this assumption and allow for a limited degree of dependence via time-specific aggregate effects. Let $\theta _ t$ denote the time-specific aggregate effects; then the data generating process (DGP) becomes

$\begin{equation*} \Delta y_{it}= \delta y_{it-1} + \sum _{L=1}^{p_{i}}{\theta _{iL}\Delta y_{it-L}} + \alpha _{mi} d_{mt} + \theta _ t + \varepsilon _{it} \hspace{0.2 in} m=4,5 \end{equation*}$

Two more models are considered: (4) $d_{1t}=\phi$ (the empty set) with no individual effects, but with time effects, and (5) $d_{2t}=\left\{ 1\right\}$ in which the series $y_{it}$ has an individual-specific mean but and time-specific mean.

By subtracting the time averages $\bar{y_{t}}=\sum _{i=1}^{N}y_{it}$ from the observed dependent variable $y_{it}$ , or equivalently, by including the time-specific intercepts $\theta _ t$ in the ADF regression, the cross-sectional dependence is removed. The impact of a single aggregate common factor that has an identical impact on all individuals but changes over time can also be removed in this way. After cross-sectional dependence is removed, the three-step procedure is applied to calculate the Levin, Lin, and Chu (2002) adjusted t statistic.

Deterministic Variables

Three deterministic variables can be included in the model for the first-stage estimation: CS_FixedEffects (cross-sectional fixed effects), TS_FixedEffects (time series fixed effects), and TimeTrend (individual linear time trend). When a linear time trend is included, the individual fixed effects are also included. Otherwise the time trend is not identified. Moreover, if the time fixed effects are included, the time trend is not identified either. Therefore, we have 5 identified models: model (1), no deterministic variables; model (2), CS_FixedEffects; model (3), CS_FixedEffects and TimeTrend; model (4), TS_FixedEffects; model (5), CS_FixedEffects TS_FixedEffects. PROC PANEL outputs the test results for all 5 model specifications.

Im, Pesaran, and Shin (2003)

To test for the unit root in heterogeneous panels, Im, Pesaran, and Shin (2003) propose a standardized t-bar test statistic based on averaging the (augmented) Dickey-Fuller statistics across the groups. The limiting distribution is standard normal. The stochastic process $y_{it}$ is generated by the first-order autoregressive process. If $\Delta y_{it} = y_{it} - y_{i,t-1}$ , the data generating process can be expressed as in LLC :

$\begin{equation*} \Delta y_{it}= \beta _{i} y_{it-1} + \sum _{j=1}^{p_{i}}{\rho _{ij}\Delta y_{i,t-j}} + \alpha _{mi} d_{mt} + \varepsilon _{it} \hspace{0.2 in} m= 1,2,3 \end{equation*}$

Unlike the DGP in LLC , $\beta _{i}$ is allowed to differ across groups. The null hypothesis of unit roots is

$\begin{equation*} H_{0}: \beta _{i} = 0\hspace{0.2 in} \text {for all } i \end{equation*}$

against the heterogeneous alternative,

$\begin{equation*} H_{1}: \beta _{i} < 0\hspace{0.2 in} \text {for }i = 1, \ldots , N_{1}, \hspace{0.2 in} \beta _{i} = 0\hspace{0.2 in} \text {for }i = N_{1} + 1, \ldots , N \end{equation*}$

The Im, Pesaran, and Shin test also allows for some (but not all) of the individual series to have unit roots under the alternative hypothesis. But the fraction of the individual processes that are stationary is positive, $lim_{N\to \infty } N_{1}/N = \delta \in \left( 0, 1 \right]$ . The t-bar statistic, denoted by $t\text {-bar}_{NT}$ , is formed as a simple average of the individual t statistics for testing the null hypothesis of $\beta _{i} = 0$ . If $t_{iT} \left(p_{i},\rho _{i}\right)$ is the standard t statistic, then

$\begin{equation*} t\text {-bar}_{NT} = \frac{1}{N}\sum _{i = 1}^{N} t_{iT} \left(p_{i},\rho _{i}\right) \end{equation*}$

If $T\to \infty$ , then for each i the t statistic (without time trend) converges to the Dickey-Fuller distribution, $\eta _{i}$ , defined by

$\begin{equation*} \eta _{i} = \frac{\frac{1}{2}\{ [W_{i}(1)]^{2} - 1\} -W_{i}(1)\int _0^1W_{i}(u)du}{\int _0^1[W_{i}(u)]^{2}du-[\int _0^1W_{i}(u)du]^{2}} \end{equation*}$

where $W_{i}$ is the standard Brownian motion. The limiting distribution is different when a time trend is included in the regression (Hamilton, 1994, p. 499). The mean and variance of the limiting distributions are reported in Nabeya (1999). The standardized t-bar statistic satisfies

$\begin{equation*} Z_{tbar}(p, \rho ) = \frac{\sqrt {N} \{ t\text {-bar}_{NT} - E(\eta ) \} }{\sqrt {Var(\eta )}}\implies \mathcal{N}(0,1) \end{equation*}$

where the standard normal is the sequential limit with $T\to \infty$ followed by $N\to \infty$ . To obtain better finite sample approximations, Im, Pesaran, and Shin (2003) propose standardizing the t-bar statistic by means and variances of $t_{iT}(p_{i},0)$ under the null hypothesis $\beta _{i}=0$ . The alternative standardized t-bar statistic is

$\begin{equation*} W_{tbar}(p, \rho ) = \frac{\sqrt {N} \{ t\text {-bar}_{NT} - \sum _{i = 1}^{N}E[t_{iT}(p_{i}, 0)| \beta _{i} = 0] /N\} }{\sqrt {\sum _{i = 1}^{N}Var[t_{iT}(p_{i}, 0)| \beta _{i} = 0]/N}}\implies \mathcal{N}(0,1) \end{equation*}$

Im, Pesaran, and Shin (2003) simulate the values of $E[t_{iT}(p_{i}, 0)| \beta _{i} = 0]$ and $Var[t_{iT}(p_{i}, 0)| \beta _{i} = 0]$ for different values of T and p. The lag order in the ADF regression can be selected by the same method as in Levin, Lin, and Chu (2002). See the section Lag Order Selection in the ADF Regression for details.

When T is fixed, Im, Pesaran, and Shin (2003) assume serially uncorrelated errors, $p_{i} = 0$ ; $t_{iT}$ is likely to have finite second moment, which is not established in the paper. The t statistic is modified by imposing the null hypothesis of a unit root. Denote $\tilde{\sigma }_{iT}$ as the estimated standard error from the restricted regression ( $\beta _{i}=0$ ),

$\begin{equation*} \tilde{t}\text {-bar}_{NT}=\sum _{i = 1}^{N}\tilde{t}_{it}/N=\sum _{i = 1}^{N}\left[\hat{\beta }_{iT}\left(y_{i,-1}’M_{\tau }y_{i,-1}\right)^{1/2}/\tilde{\sigma }_{iT}\right]/N \end{equation*}$

where $\hat{\beta }_{iT}$ is the OLS estimator of $\beta _{i}$ (unrestricted model), $\tau _{T} = \left(1,1,\ldots ,1\right)’$ , $M_{\tau } = I_{T}-\tau _{T}\left(\tau _{T}’\tau _{T}\right)\tau _{T}’$ , and $y_{i,-1}=\left(y_{i0},y_{i1},\ldots ,y_{i,T-1}\right)’$ Under the null hypothesis, the standardized $\tilde{t}\text {-bar}$ statistic converges to a standard normal variate,

$\begin{equation*} Z_{\tilde{t}bar}=\frac{\sqrt {N}\{ \tilde{t}\text {-bar}_{NT}-E\left(\tilde{t}_{T}\right)\} }{\sqrt {Var\left(\tilde{t}_{T}\right)}}\implies \mathcal{N}(0,1) \end{equation*}$

where $E\left(\tilde{t}_{T}\right)$ and $Var\left(\tilde{t}_{T}\right)$ are the mean and variance of $\tilde{t}_{iT}$ , respectively. The limit is taken as $N\to \infty$ and T is fixed. Their values are simulated for finite samples without a time trend. The $Z_{\tilde{t}bar}$ is also likely to converge to standard normal.

When N and T are both finite, an exact test that assumes no serial correlation can be used. The critical values of $t\text {-bar}_{NT}$ and $\tilde{t}\text {-bar}_{NT}$ are simulated.

Similar as in section Levin, Lin, and Chu (2002), it is possible to relax this assumption of cross-sectional independence and allow for a limited degree of dependence via time-specific aggregate effects. Two more models (model 4 and model 5) with time fixed effects are considered. See section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

Combination Tests

Combining the observed significance levels (p-values) from N independent tests of the unit root null hypothesis was proposed by Maddala and Wu (1999); Choi (2001). Suppose $G_{i}$ is the test statistic to test the unit root null hypothesis for individual $i = 1,\ldots ,N$ , and $F(\cdot )$ is the cdf (cumulative distribution function) of the asymptotic distribution as $T\to \infty$ . Then the asymptotic p-value is defined as

$\begin{equation*} p_{i}=F\left(G_{i}\right) \end{equation*}$

There are different ways to combine these p-values. The first one is the inverse chi-square test (Fisher, 1932); this test is referred to as P test in Choi (2001) and $\lambda$ in Maddala and Wu (1999):

$\begin{equation*} Chi-Square=-2\sum _{i=1}^{N}\text {ln}\left(p_{i}\right) \end{equation*}$

When the test statistics $\left\{ G_{i}\right\} _{i=1,\ldots ,N}$ are continuous, $\left\{ p_{i}\right\} _{i=1,\ldots ,N}$ are independent uniform $\left(0,1\right)$ variables. Therefore, $P\Rightarrow \chi ^{2}\left(2N\right)$ as $T\to \infty$ and N fixed. But as $N\to \infty$ , P diverges to infinity in probability. Therefore, it is not applicable for large N. To derive a nondegenerate limiting distribution, the P test (Fisher test with $N\to \infty$ ) should be modified to

$\begin{equation*} P_{m}=FI=\sum _{i=1}^{N}\left(-2\text {ln}\left(p_{i}\right)-2\right)/2\sqrt {N}=-\sum _{i=1}^{N}\left(\text {ln}\left(p_{i}\right)+1\right)/\sqrt {N} \end{equation*}$

Under the null as $T_{i}\to \infty$ ,^[9] and then $N\to \infty$ , $P_{m}\Rightarrow \mathcal{N}\left(0,1\right)$ .^[10]

The second way of combining individual p-values is the inverse normal test,

$\begin{equation*} Z=\sum _{i=1}^{N}\Phi ^{-1}\left(p_{i}\right) \end{equation*}$

where $\Phi \left(\cdot \right)$ is the standard normal cdf. When $T_{i}\to \infty$ , $Z\Rightarrow \mathcal{N}\left(0,1\right)$ as N is fixed. When N and $T_{i}$ are both large, the sequential limit is also standard normal if $T_{i}\to \infty$ first and $N\to \infty$ next.

The third way of combining p-values is the logit test,

$\begin{equation*} L^{*}=\sqrt {k}L=\sqrt {k}\sum _{i=1}^{N}\text {ln}\left(\frac{p_{i}}{1-p_{i}}\right) \end{equation*}$

where $k=3\left(5N+4\right)/\left(\pi ^{2}N\left(5N+2\right)\right)$ . When $T_{i}\to \infty$ and N is fixed, $L^{*}\Rightarrow t_{5N+4}$ . In other words, the limiting distribution is the t distribution with degree of freedom $5N+4$ . The sequential limit is $L^{*}\Rightarrow \mathcal{N}\left(0,1\right)$ as $T_{i}\to \infty$ and then $N\to \infty$ . Simulation results in Choi (2001) suggest that the $Z$ test outperforms other combination tests. For the time series unit root test $G_{i}$ , Maddala and Wu (1999) apply the augmented Dickey-Fuller test. According to Choi (2006), the Elliott, Rothenberg, and Stock (1996) Dickey-Fuller generalized least squares (DF-GLS) test brings significant size and power advantages in finite samples.

Breitung’s Unbiased Tests

To account for the nonzero mean of the t statistic in the OLS detrending case, bias-adjusted t statistics were proposed by: Levin, Lin, and Chu (2002); Im, Pesaran, and Shin (2003). The bias corrections imply a severe loss of power. Breitung and associates take an alternative approach to avoid the bias, by using alternative estimates of the deterministic terms (Breitung and Meyer, 1994; Breitung, 2000; Breitung and Das, 2005). The DGP is the same as in the Im, Pesaran, and Shin approach. When serial correlation is absent, for model (2) with individual specific means, the constant terms are estimated by the initial values $y_{i1}$ . Therefore, the series $y_{it}$ is adjusted by subtracting the initial value. The equation becomes

$\begin{equation*} \Delta y_{it}=\delta ^{*}\left(y_{i,t-1}-y_{i1}\right)+v_{it} \end{equation*}$

For model (3) with individual specific means and time trends, the time trend can be estimated by $\hat{\beta }_{i}=\left(T-1\right)^{-1}\left(y_{iT}-y_{i1}\right)$ . The levels can be transformed as

$\begin{equation*} \tilde{y}_{it}=y_{it}-y_{i1}-\hat{\beta }_{i}t=y_{it}-y_{i1}-t\left(y_{iT}-y_{i1}\right)/\left(T-1\right) \end{equation*}$

The Helmert transformation is applied to the dependent variable to remove the mean of the differenced variable:

$\begin{equation*} \Delta y_{it}^{*}=\sqrt {\frac{T-t}{T-t+1}}\left[\Delta y_{it}-\left(\Delta y_{i,t+1}+\ldots +\Delta y_{iT}\right)/\left(T-t\right)\right] \end{equation*}$

The transformed model is

$\begin{equation*} \Delta y_{it}^{*}=\delta ^{*}\tilde{y}_{i,t-1}+v_{it} \end{equation*}$

The pooled t statistic has a standard normal distribution. Therefore, no adjustment is needed for the t statistic. To adjust for heteroscedasticity across cross sections, Breitung (2000) proposes a UB (unbiased) statistic based on the transformed data,

$\begin{equation*} UB=\frac{\sum _{i=1}^{N}\sum _{t=2}^{T}\Delta y_{it}^{*}\tilde{y}_{i,t-1}/\sigma _{i}^{2}}{\sqrt {\sum _{i=1}^{N}\sum _{t=2}^{T}\tilde{y}_{i,t-1}^{2}/\sigma _{i}^{2}}} \end{equation*}$

where $\sigma _{i}^{2}=E\left(\Delta y_{it}-\beta _{i}\right)^{2}$ . When $\sigma _{i}^{2}$ is unknown, it can be estimated as

$\begin{equation*} \hat{\sigma }_{i}^{2}=\sum _{t=2}^{T}\left(\Delta y_{it}-\sum _{t=2}^{T}\Delta y_{it}/\left(T-1\right)\right)^{2}/\left(T-2\right) \end{equation*}$

The $UB$ statistic has a standard normal limiting distribution as $T\to \infty$ followed by $N\to \infty$ sequentially.
To account for the short-run dynamics, Breitung and Das (2005) suggest applying the test to the prewhitened series, $\hat{y}_{it}$ . For model (1) and model (2) (constant-only case), they suggested the same method as in step 1 of Levin, Lin, and Chu (2002).^[11] For model (3) (with a constant and linear time trend), the prewhitened series can be obtained by running the following restricted ADF regression under the null hypothesis of a unit root ( $\delta =0$ ) and no intercept and linear time trend ( $\mu _{i}=0,\beta _{i}=0$ ):

$\begin{equation*} \Delta y_{it}= \sum _{L=1}^{\hat{p}_{i}}{\theta _{iL}\Delta y_{it-L}} + \mu _{i} + \varepsilon _{it} \hspace{0.2 in} \end{equation*}$

where $\hat{p}_{i}$ is a consistent estimator of the true lag order $p_{i}$ and can be estimated by the procedures listed in the section Lag Order Selection in the ADF Regression. For LLC and IPS tests, the lag orders are selected by running the ADF regressions. But for Breitung and his coauthors’ tests, the restricted ADF regressions are used to be consistent with the prewhitening method. Let $\left(\hat{\mu }_{i}, \hat{\theta }_{iL}\right)$ be the estimated coefficients.^[12] The prewhitened series can be obtained by

$\begin{equation*} \Delta \hat{y}_{it}=\Delta y_{it}-\sum _{L=1}^{\hat{p}_{i}}{\hat{\theta }_{iL}\Delta y_{it-L}} \end{equation*}$

and

$\begin{equation*} \hat{y}_{it}=y_{it}-\sum _{L=1}^{\hat{p}_{i}}{\hat{\theta }_{iL}y_{it-L}} \end{equation*}$

The transformed series are random walks under the null hypothesis,

$\begin{equation*} \Delta \hat{y}_{it}=\delta \hat{y}_{i,t-1}+v_{it} \end{equation*}$

where $y_{is}=0$ for $s<0$ . When the cross-section units are independent, the t statistic converges to standard normal under the null, as $T\to \infty$ followed by $N\to \infty$ ,

$\begin{equation*} t_{OLS}=\frac{\sum _{i=1}^{N}\sum _{t=2}^{T}y_{i,t-1}\Delta y_{it}}{\hat{\sigma }\sqrt {\sum _{i=1}^{N}\sum _{t=2}^{T}y_{i,t-1}^{2}}}\implies \mathcal{N}\left(0,1\right) \end{equation*}$

where $\hat{\sigma }^{2}=\sum _{i=1}^{N}\sum _{t=2}^{T}\left(\Delta y_{it}-\hat{\delta }y_{i,t-1}\right)^{2}/N\left(T-1\right)$ with OLS estimator $\hat{\delta }$ .
To take account for cross-sectional dependence, Breitung and Das (2005) propose the robust t statistic and a GLS version of the test statistic. Let $v_{t}=\left(v_{1t},\ldots ,v_{Nt}\right)’$ be the error vector for time t, and let $\Omega =E\left(v_{t}v_{t}’\right)$ be a positive definite matrix with eigenvalues $\lambda _{1}\geq \ldots \geq \lambda _{N}$ . Let $y_{t}=\left(y_{1t},\ldots ,y_{Nt}\right)’$ and $\Delta y_{t}=\left(\Delta y_{1t},\ldots ,\Delta y_{Nt}\right)’$ . The model can be written as a SUR-type system of equations,

$\begin{equation*} \Delta y_{t}=\delta y_{t-1} + v_{t} \end{equation*}$

The unknown covariance matrix $\Omega$ can be estimated by its sample counterpart,

$\begin{equation*} \hat{\Omega }= \sum _{t=2}^{T}\left(\Delta y_{t}-\hat{\delta } y_{t-1}\right)\left(\Delta y_{t}-\hat{\delta } y_{t-1}\right)’/\left(T-1\right) \end{equation*}$

The sequential limit $T\to \infty$ followed by $N\to \infty$ of the standard t statistic $t_{OLS}$ is normal with mean 0 and variance $v_{\Omega }=\text {lim}_{N\to \infty }\text {tr}\left(\Omega ^{2}/N\right)/\left(\text {tr}\Omega /N\right)^{2}$ . The variance $v_{\Omega }$ can be consistently estimated by $\hat{v}_{\hat{\delta }}=\left(\sum _{t=2}^{T}y_{t-1}’\hat{\Omega }y_{t-1}\right)/\left(\sum _{t=2}^{T}y_{t-1}’y_{t-1}\right)^{2}$ . Thus the robust t statistic can be calculated as

$\begin{equation*} t_{rob} =\frac{\delta }{\hat{v}_{\hat{\delta }}} =\frac{\sum _{t=2}^{T}y_{t-1}'\Delta y_{t}}{\sqrt {\sum _{t=2}^{T}y_{t-1}’\hat{\Omega }y_{t-1}}} \implies \mathcal{N}\left(0,1\right) \end{equation*}$

as $T\to \infty$ followed by $N\to \infty$ under the null hypothesis of random walk. Since the finite sample distribution can be quite different, Breitung and Das (2005) list the $1\%$ , $5\%$ , and $10\%$ critical values for different N’s.

When $T>N$ , a (feasible) GLS estimator is applied; it is asymptotically more efficient than the OLS estimator. The data are transformed by multiplying $\hat{\Omega }^{-1/2}$ as defined before, $\hat{z}_{t}=\hat{\Omega }^{-1/2}y_{t}$ . Thus the model is transformed into

$\begin{equation*} \Delta \hat{z}_{t}=\delta \hat{z}_{t-1} + e_{t} \end{equation*}$

The feasible GLS (FGLS) estimator of $\delta$ and the corresponding t statistic are obtained by estimating the transformed model by OLS and denoted by $\hat{\delta }_{GLS}$ and $t_{GLS}$ , respectively:

$\begin{equation*} t_{GLS} =\frac{\sum _{t=2}^{T}y_{t-1}'\hat{\Omega }^{-1}\Delta y_{t}}{\sqrt {\sum _{t=2}^{T}y_{t-1}’\hat{\Omega }^{-1}y_{t-1}}} \implies \mathcal{N}\left(0,1\right) \end{equation*}$

Hadri (2000) Stationarity Tests

Hadri (2000) adopts a component representation where an individual time series is written as a sum of a deterministic trend, a random walk, and a white-noise disturbance term. Under the null hypothesis of stationary, the variance of the random walk equals 0. Specifically, two models are considered:

For model (1), the time series $y_{it}$ is stationary around a level $r_{i0}$ ,

$\begin{equation*} y_{it}=r_{it}+\epsilon _{it} \hspace{0.2 in} i = 1, \ldots , N, \hspace{0.2 in}t = 1, \ldots , T \end{equation*}$
For model (2), $y_{it}$ is trend stationary,

$\begin{equation*} y_{it}=r_{it}+\beta _{i}t+\epsilon _{it} \hspace{0.2 in} i = 1, \ldots , N, \hspace{0.2 in}t = 1, \ldots , T \end{equation*}$

where $r_{it}$ is the random walk component,

$\begin{equation*} r_{it}=r_{it-1}+u_{it} \hspace{0.2 in} i = 1, \ldots , N, \hspace{0.2 in}t = 1, \ldots , T \end{equation*}$

The initial values of the random walks, $\left\{ r_{i0}\right\} _{i=1,\ldots ,N}$ , are assumed to be fixed unknowns and can be considered as heterogeneous intercepts. The errors $\epsilon _{it}$ and $u_{it}$ satisfy $\epsilon _{it}\sim \mbox{iid}\mathcal{N}\left(0,\sigma _{\epsilon }^{2}\right)$ , $u_{it}\sim \mbox{iid}\mathcal{N}\left(0,\sigma _{u}^{2}\right)$ and are mutually independent.

The null hypothesis of stationarity is $H_{0}: \sigma _{u}^{2}=0$ against the alternative random walk hypothesis $H_{1}: \sigma _{u}^{2}>0$ .

In matrix form, the models can be written as

$\begin{equation*} y_{i}=X_{i}\beta _{i}+e_{i} \end{equation*}$

where $y_{i}’=\left(y_{i1},\ldots ,y_{iT}\right)$ , $e_{i}’=\left(e_{i1},\ldots ,e_{iT}\right)$ with $e_{it}=\sum _{j=1}^{t}u_{ij}+\epsilon _{it}$ , and $X_{i}=\left(\iota _{T},a_{T}\right)$ with $\iota _{T}$ being a $T\times 1$ vector of ones, $a_{T}’=\left(1,\ldots ,T\right)$ , and $\beta _{i}’=\left(r_{i0},\beta _{i}\right)$ .

Let $\hat{\epsilon }_{it}$ be the residuals from the regression of $y_{i}$ on $X_{i}$ ; then the LM statistic is

$\begin{equation*} LM=\frac{\frac{1}{N}\sum _{i=1}^{N}\frac{1}{T^{2}}\sum _{t=1}^{T}S_{it}^{2}}{\hat{\sigma }_{\epsilon }^{2}} \end{equation*}$

where $S_{it}=\sum _{j=1}^{t}\hat{\epsilon }_{ij}$ is the partial sum of the residuals and $\hat{\sigma }_{\epsilon }^{2}$ is a consistent estimator of $\sigma _{\epsilon }^{2}$ under the null hypothesis of stationarity. With some regularity conditions,

$\begin{equation*} LM\xrightarrow {p}E\left[\int _{0}^{1}V^{2}\left(r\right)dr\right] \end{equation*}$

where $V\left(r\right)$ is a standard Brownian bridge in model (1) and a second-level Brownian bridge in model (2). Let $W\left(r\right)$ be a standard Wiener process (Brownian motion),

$\begin{equation*} V\left(r\right)=\left\{ \begin{array}{l l} W\left(r\right)-rW\left(1\right)& \text {for model (1)}\\ W\left(r\right)+\left(2r-3r^{2}\right)W\left(1\right) +6r\left(r-1\right)\int _{0}^{1}W\left(s\right)ds& \text {for model (2)} \end{array} \right. \end{equation*}$

The mean and variance of the random variable $\int V^{2}$ can be calculated by using the characteristic functions,

$\begin{equation*} \xi =E\left[\int _{0}^{1}V^{2}\left(r\right)dr\right] =\left\{ \begin{array}{l l} \frac{1}{6}& \text {for model (1)}\\ \frac{1}{15}& \text {for model (2)}\\ \end{array} \right. \end{equation*}$

and

$\begin{equation*} \zeta ^{2}=var\left[\int _{0}^{1}V^{2}\left(r\right)dr\right] =\left\{ \begin{array}{l l} \frac{1}{45}& \text {for model (1)}\\ \frac{11}{6300}& \text {for model (2)}\\ \end{array} \right. \end{equation*}$

The LM statistics can be standardized to obtain the standard normal limiting distribution,

$\begin{equation*} Z=\frac{\sqrt {N}\left(LM-\xi \right)}{\zeta }\implies \mathcal{N}\left(0,1\right) \end{equation*}$

Consistent Estimator of $\sigma _{\epsilon }^{2}$

Hadri’s (2000) test can be applied to the general case of heteroscedasticity and serially correlated disturbance errors. Under homoscedasticity and serially uncorrelated errors, $\sigma _{\epsilon }^{2}$ can be estimated as

$\begin{equation*} \hat{\sigma }_{\epsilon }^{2}=\sum _{i=1}^{N}\sum _{t=1}^{T}\hat{\epsilon }_{it}^{2}/N\left(T-k\right) \end{equation*}$

where k is the number of regressors. Therefore, $k=1$ for model (1) and $k=2$ for model (2).

When errors are heteroscedastic across individuals, the standard errors $\sigma _{\epsilon ,i}^{2}$ can be estimated by $\hat{\sigma }_{\epsilon ,i}^{2}=\sum _{t=1}^{T}\hat{\epsilon }_{it}^{2}/\left(T-k\right)$ for each individual i and the LM statistic needs to be modified to

$\begin{equation*} LM=\frac{1}{N}\sum _{i=1}^{N}\left(\frac{\frac{1}{T^{2}}\sum _{t=1}^{T}S_{it}^{2}}{\hat{\sigma }_{\epsilon ,i}^{2}}\right) \end{equation*}$

To allow for temporal dependence over t, $\sigma _{\epsilon }^{2}$ has to be replaced by the long-run variance of $\epsilon _{it}$ , which is defined as $\sigma ^{2}=\sum _{i=1}^{N}\text {lim}_{T\to \infty }T^{-1}\left(S_{iT}^{2}\right)/N$ . A HAC estimator can be used to consistently estimate the long-run variance $\sigma ^{2}$ . For more information, see the section Long-Run Variance Estimation.

Similar as in section Levin, Lin, and Chu (2002), it is possible to relax this assumption of cross-sectional independence and allow for a limited degree of dependence via time-specific aggregate effects. One more models (model 3) with time fixed effects are considered. See section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

Harris and Tzavalis (1999) Panel Unit Root Tests

Harris and Tzavalis (1999) derive the panel unit root test under fixed T and large N. Five models are considered as in Levin, Lin, and Chu (2002). Model (1) is the homogeneous panel,

$\begin{equation*} y_{it}=\varphi y_{it-1}+v_{it} \end{equation*}$

Under the null hypothesis, $\varphi =1$ . For model (2), each series is a unit root process with a heterogeneous drift,

$\begin{equation*} y_{it}=\alpha _{i}+\varphi y_{it-1}+v_{it} \end{equation*}$

Model (3) includes heterogeneous drifts and linear time trends,

$\begin{equation*} y_{it}=\alpha _{i}+\beta _{i}t+\varphi y_{it-1}+v_{it} \end{equation*}$

Let $\hat{\varphi }$ be the OLS estimator of $\varphi$ ; then

$\begin{equation*} \hat{\varphi }-1=\left[\sum _{i=1}^{N}y_{i,-1}’Q_{T}y_{i,-1}\right]^{-1}\cdot \left[\sum _{i=1}^{N}y_{i,-1}’Q_{T}v_{i}\right] \end{equation*}$

where $y_{i,-1}=\left(y_{i0},\ldots ,y_{iT-1}\right)$ , $v_{i}’=\left(v_{i1},\ldots ,v_{iT}\right)$ , and $Q_{T}$ is the projection matrix. For model (1), there are no regressors other than the lagged dependent value, so $Q_{T}$ is the identity matrix $I_{T}$ . For model (2), a constant is included, so $Q_{T}=I_{T}-e_{T}e_{T}’/T$ with $e_{T}$ a $T\times 1$ column of ones. For model (3), a constant and time trend are included. Thus $Q_{T}=I_{T}-Z_{T}\left(Z_{T}’Z_{T}\right)^{-1}Z_{T}’$ , where $Z_{T}=\left(e_{T},\tau _{T}\right)$ and $\tau _{T}=\left(1,\ldots ,T\right)’$ .

When $y_{i0}=0$ in model (1) under the null hypothesis, as $N\to \infty$

$\sqrt {NT\left(T-1\right)/2}\left(\hat{\varphi }-1\right)\xrightarrow {y_{i0}=0, H_{0}}\mathcal{N}\left(0,1\right)$

As $T\to \infty$ , it becomes $T\sqrt {N}\left(\hat{\varphi }-1\right)\overset {H_{0}}\Longrightarrow \mathcal{N}\left(0,2\right)$ .

When the drift is absent in model (2), $\alpha _{i}=0$ , under the null hypothesis, as $N\to \infty$

$\begin{equation*} \sqrt {\frac{5N\left(T+1\right)^{3}\left(T-1\right)}{3\left(17T^{2}-20T+17\right)}}\left(\hat{\varphi }-1+\frac{3}{\left(T+1\right)}\right)\xrightarrow {\alpha _{i}=0,H_{0}}\mathcal{N}\left(0,1\right) \end{equation*}$

As $T\to \infty$ , $\left(T\sqrt {N}\left(\hat{\varphi }-1\right)+3\sqrt {N}\right)/\sqrt {51/5}\overset {H_{0}}\Longrightarrow \mathcal{N}\left(0,1\right)$ .

When the time trend is absent in model (3), $\beta _{i}=0$ , under the null hypothesis, as $N\to \infty$

$\begin{equation*} \sqrt {\frac{112N\left(T+2\right)^{3}\left(T-2\right)}{15\left(193T^{2}-728T+1147\right)})}\left(\hat{\varphi }-1+\frac{15}{2\left(T+2\right)}\right)\xrightarrow {\beta _{i}=0,H_{0}}\mathcal{N}\left(0,1\right) \end{equation*}$

When $T\to \infty$ , $\left(T\sqrt {N}\left(\hat{\varphi }-1\right)+7.5\sqrt {N}\right)/\sqrt {2895/112}\overset {H_{0}}\Longrightarrow \mathcal{N}\left(0,1\right)$ .

^[8]In practice c is set to 1, following the literature (Hannan and Quinn, 1979; Hall, 1994).

^[9]The time series length T is subindexed by $i=1,\ldots ,N$ because the panel can be unbalanced.

^[10]Choi (2001) also points out that the joint limit result where N and $\left\{ T_{i}\right\} _{i=1,\ldots ,N}$ go to infinity simultaneously is the same as the sequential limit, but it requires more moment conditions.

^[11]See the section Levin, Lin, and Chu (2002) for details. The only difference is the standard error estimate $\hat{\sigma }_{\varepsilon i}^{2}$ . Breitung suggests using $T-p_{i}-2$ instead of $T-p_{i}-1$ as in LLC to normalize the standard error.

^[12]Breitung (2000) suggests the approach in step 1 of Levin, Lin, and Chu (2002), while Breitung and Das (2005) suggest the prewhitening method as described above. In Breitung’s code, to be consistent with the papers, different approaches are adopted for model (2) and (3). Meanwhile, for the order of variable transformation and prewhitening, in model (2), the initial values are deducted (variable transformation) first, and then the prewhitening was applied. For model (3), the order is reversed. The series is prewhitened and then transformed to remove the mean and linear time trend.