# The PANEL Procedure

### Panel Data Unit Root Tests

#### Levin, Lin, and Chu (2002)

Levin, Lin, and Chu (2002) propose a panel unit root test for the null hypothesis of a unit root against a homogeneous stationary alternative. The model is specified as

$$\Delta y_{it} = \delta y_{i,t-1} + \sum_{L=1}^{p_i} \theta_{iL} \Delta y_{i,t-L} + \alpha_{mi} d_{mt} + \epsilon_{it}, \quad m = 1, 2, 3$$

Three models are considered: (1) $d_{1t} = \emptyset$ (the empty set) with no individual effects, (2) $d_{2t} = \{1\}$, in which the series $y_{it}$ has an individual-specific mean but no time trend, and (3) $d_{3t} = \{1, t\}$, in which the series has an individual-specific mean and an individual-specific linear time trend. The panel unit root test evaluates the null hypothesis $H_0: \delta = 0$ for all i against the alternative hypothesis $H_1: \delta < 0$ for all i. The lag order $p_i$ is unknown and is allowed to vary across individuals. It can be selected by the methods that are described in the section Lag Order Selection in the ADF Regression. Denote the selected lag orders as $\hat{p}_i$. The test is implemented in three steps.

Step 1

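The ADF regressions are implemented for each individual i, and then the orthogonalized residuals are generated and normalized. That is, the following model is estimated:

$$\Delta y_{it} = \delta_i y_{i,t-1} + \sum_{L=1}^{\hat{p}_i} \theta_{iL} \Delta y_{i,t-L} + \alpha_{mi} d_{mt} + \epsilon_{it}$$

The two orthogonalized residuals are generated by the following two auxiliary regressions:

$$\Delta y_{it} = \sum_{L=1}^{\hat{p}_i} \tilde{\theta}_{iL} \Delta y_{i,t-L} + \tilde{\alpha}_{mi} d_{mt} + e_{it}$$

$$y_{i,t-1} = \sum_{L=1}^{\hat{p}_i} \check{\theta}_{iL} \Delta y_{i,t-L} + \check{\alpha}_{mi} d_{mt} + v_{i,t-1}$$

The residuals are saved as $\hat{e}_{it}$ and $\hat{v}_{i,t-1}$, respectively. To remove heteroscedasticity, the residuals $\hat{e}_{it}$ and $\hat{v}_{i,t-1}$ are normalized by the regression standard error from the ADF regression. Denote the standard error as $\hat{\sigma}_{\epsilon i}$, and normalize the residuals as

$$\tilde{e}_{it} = \frac{\hat{e}_{it}}{\hat{\sigma}_{\epsilon i}}, \qquad \tilde{v}_{i,t-1} = \frac{\hat{v}_{i,t-1}}{\hat{\sigma}_{\epsilon i}}$$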

Step 2

The ratios of long-run to short-run standard deviations of $\Delta y_{it}$ are estimated. Denote the ratios and the long-run variances as $s_i$ and $\sigma_{yi}^2$, respectively. The long-run variances are estimated by the HAC (heteroscedasticity- and autocorrelation-consistent) estimators, which are described in the section Long-Run Variance Estimation. Then the ratios are estimated by $\hat{s}_i = \hat{\sigma}_{yi} / \hat{\sigma}_{\epsilon i}$. Let the average standard deviation ratio be $S_N = \frac{1}{N} \sum_{i=1}^{N} s_i$, and let its estimator be $\hat{S}_N = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i$.

Step 3

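The panel test statistics are calculated. To calculate the t statistic and the adjusted t statistic, the following equation is estimated:

$$\tilde{e}_{it} = \delta \tilde{v}_{i,t-1} + \tilde{\epsilon}_{it}$$

The total number of observations is $N\tilde{T}$, with $\tilde{T} = T - \bar{p} - 1$ and $\bar{p} = \frac{1}{N} \sum_{i=1}^{N} \hat{p}_i$. The standard t statistic for testing $\delta = 0$ is $t_\delta = \hat{\delta} / \hat{\sigma}_{\hat{\delta}}$, with OLS estimator $\hat{\delta}$ and standard deviation $\hat{\sigma}_{\hat{\delta}}$. However, the standard t statistic diverges to negative infinity for models (2) and (3). Let $\hat{\sigma}_{\tilde{\epsilon}}$ be the root mean square error from the step 3 regression, defined by

$$\hat{\sigma}_{\tilde{\epsilon}}^2 = \frac{1}{N\tilde{T}} \sum_{i=1}^{N} \sum_{t=\hat{p}_i+2}^{T} \left( \tilde{e}_{it} - \hat{\delta} \tilde{v}_{i,t-1} \right)^2$$

Levin, Lin, and Chu (2002) propose the following adjusted t statistic:

$$t_\delta^* = \frac{t_\delta - N\tilde{T} \hat{S}_N \hat{\sigma}_{\tilde{\epsilon}}^{-2} \hat{\sigma}_{\hat{\delta}} \mu_{m\tilde{T}}^*}{\sigma_{m\tilde{T}}^*}$$

The mean and standard deviation adjustments ($\mu_{m\tilde{T}}^*$, $\sigma_{m\tilde{T}}^*$) depend on the time series dimension $\tilde{T}$ and model specification m; their values can be found in Table 2 of Levin, Lin, and Chu (2002). The adjusted t statistic converges to the standard normal distribution. Therefore, the standard normal critical values are used in hypothesis testing.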

##### Lag Order Selection in the ADF Regression

The methods for selecting the individual lag orders in the ADF regressions can be divided into two categories: selection based on information criteria and selection via sequential testing.

###### Lag Selection Based on Information Criteria

In this method, the following information criteria can be applied to lag order selection: AIC, SBC, HQIC (HQC), and MAIC. As with other model selection applications, the lag order is selected from 0 to the maximum $p_{max}$ to minimize the objective function plus a penalty term, which is a function of the number of parameters in the regression. Let k be the number of parameters and $T_o$ be the number of effective observations. For regression models, the objective function is $T_o \log(SSR / T_o)$, where SSR is the sum of squared residuals. For AIC, the penalty term equals $2k$. For SBC, this term is $k \log T_o$. For HQIC, it is $2kc \log(\log T_o)$ with c being a constant greater than 1.[8] For MAIC, the penalty term equals $2(\tau_T(k) + k)$, where

$$\tau_T(k) = \left( SSR / T_o \right)^{-1} \hat{\delta}^2 \sum_{t} y_{t-1}^2$$

and $\hat{\delta}$ is the estimated coefficient of the lagged dependent variable $y_{t-1}$ in the ADF regression.
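The following is a minimal Python sketch of lag selection by information criterion, not the PROC PANEL implementation. It assumes the sums of squared residuals have already been obtained from ADF regressions that share the same number of effective observations, and it uses the standard penalty forms (2k for AIC, k log T for SBC, 2kc log log T for HQIC). The function names, the `ssr_by_lag` mapping, and the `n_fixed` argument (the count of non-lag parameters) are hypothetical; MAIC is omitted because it needs the regression data as well.

```python
import math


def information_criterion(ssr, k, n_obs, kind="AIC", c=1.0):
    """Objective n_obs*log(SSR/n_obs) plus the penalty of the chosen criterion."""
    objective = n_obs * math.log(ssr / n_obs)
    if kind == "AIC":
        penalty = 2 * k
    elif kind == "SBC":
        penalty = k * math.log(n_obs)
    elif kind == "HQIC":
        # c = 1 by default, which is the value used in practice.
        penalty = 2 * k * c * math.log(math.log(n_obs))
    else:
        raise ValueError("unsupported criterion: " + kind)
    return objective + penalty


def select_lag(ssr_by_lag, n_obs, n_fixed=1, kind="AIC"):
    """Return the lag order with the smallest criterion value.

    ssr_by_lag maps a lag order p to the SSR of the ADF regression with p
    lags; the number of parameters is assumed to be k = p + n_fixed.
    """
    scores = {
        p: information_criterion(ssr, p + n_fixed, n_obs, kind)
        for p, ssr in ssr_by_lag.items()
    }
    return min(scores, key=scores.get)
```

With illustrative values such as `select_lag({0: 120.0, 1: 100.0, 2: 99.5}, n_obs=50)`, the second lag reduces the SSR too little to offset its penalty, so a lag order of 1 is selected.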

###### Lag Selection via Sequential Testing

In this method, the lag order estimation is based on the statistical significance of the estimated AR coefficients. Hall (1994) proposed general-to-specific (GS) and specific-to-general (SG) strategies. Levin, Lin, and Chu (2002) recommend the first strategy, following Campbell and Perron (1991). In the GS modeling strategy, starting with the maximum lag order $p_{max}$, the t test for the coefficient on the largest lag is performed to determine whether a smaller lag order is preferred. Specifically, when the null hypothesis that this coefficient equals 0 is not rejected at the given significance level, a smaller lag order is preferred. This procedure continues until a statistically significant lag order is reached. On the other hand, the SG modeling strategy starts with lag order 0 and moves toward the maximum lag order $p_{max}$.
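The general-to-specific search can be sketched in a few lines of Python. This is an illustration only: `t_stat_last_lag` is a hypothetical callable that returns the t statistic of the coefficient on the p-th lagged difference in an ADF regression with p lags, and the default critical value of 1.645 (a two-sided 10% normal critical value) is an assumption, not a documented default.

```python
def select_lag_general_to_specific(t_stat_last_lag, p_max, critical_value=1.645):
    """Step down from p_max until the largest lag is statistically significant."""
    for p in range(p_max, 0, -1):
        # Stop at the first lag order whose last coefficient is significant.
        if abs(t_stat_last_lag(p)) > critical_value:
            return p
    # No lag was significant, so no lagged differences are included.
    return 0
```

The specific-to-general strategy would run the same loop upward from lag order 0 instead.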

##### Long-Run Variance Estimation

The long-run variance of $\Delta y_{it}$ is estimated by a HAC-type estimator. For model (1), given the lag truncation parameter $\bar{K}$ and kernel weights $w_{\bar{K}L}$, the formula is

$$\hat{\sigma}_{yi}^2 = \frac{1}{T-1} \sum_{t=2}^{T} \Delta y_{it}^2 + 2 \sum_{L=1}^{\bar{K}} w_{\bar{K}L} \left[ \frac{1}{T-1} \sum_{t=2+L}^{T} \Delta y_{it} \Delta y_{i,t-L} \right]$$

To achieve consistency, the lag truncation parameter must satisfy $\bar{K}/T \to 0$ and $\bar{K} \to \infty$ as $T \to \infty$. Levin, Lin, and Chu (2002) suggest $\bar{K} = \lfloor 3.21 T^{1/3} \rfloor$. The weights $w_{\bar{K}L}$ depend on the kernel function. Andrews (1991) proposes data-driven bandwidth (lag truncation parameter + 1 if integer-valued) selection procedures to minimize the asymptotic mean squared error (MSE) criterion. For details about the kernel functions and the Andrews (1991) data-driven bandwidth selection procedure, see the section Heteroscedasticity- and Autocorrelation-Consistent Covariance Matrices. Because Levin, Lin, and Chu (2002) truncate the bandwidth as an integer, when LLCBAND is specified as the BANDWIDTH option, it corresponds to a bandwidth of $\lfloor 3.21 T^{1/3} \rfloor + 1$. Furthermore, the kernel weights are $w_{\bar{K}L} = k\left(L / (\bar{K} + 1)\right)$ with kernel function $k(\cdot)$.

For model (2), the series $\Delta y_{it}$ is demeaned individual by individual first. Therefore, $\Delta y_{it}$ is replaced by $\Delta y_{it} - \overline{\Delta y}_i$, where $\overline{\Delta y}_i$ is the mean of $\Delta y_{it}$ for individual i. For model (3) with individual fixed effects and time trend, both the individual mean and trend should be removed before the long-run variance is estimated. That is, first regress $\Delta y_{it}$ on $\{1, t\}$ for each individual and save the residual $\widetilde{\Delta y}_{it}$, and then replace $\Delta y_{it}$ with the residual.
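As a rough illustration of the model (1) estimator, the sketch below computes a HAC-type long-run variance for one individual's first differences. It is written under two assumptions that are not confirmed by this section: the weights are Bartlett weights, 1 - L/(K + 1), and the default truncation is the integer part of 3.21 T^(1/3) as in Levin, Lin, and Chu (2002). The function names are hypothetical. For models (2) and (3), the input would first be demeaned or detrended as described above.

```python
import math


def llc_truncation(T):
    """Assumed default lag truncation: integer part of 3.21 * T**(1/3)."""
    return int(math.floor(3.21 * T ** (1.0 / 3.0)))


def long_run_variance(dy, K=None):
    """HAC-type long-run variance of first differences dy with Bartlett weights.

    dy holds the T - 1 first differences of one individual's series.
    """
    n = len(dy)  # n = T - 1
    if K is None:
        K = llc_truncation(n + 1)
    # Short-run variance term (lag 0).
    variance = sum(d * d for d in dy) / n
    for lag in range(1, K + 1):
        weight = 1.0 - lag / (K + 1.0)  # Bartlett kernel weight (assumption)
        autocov = sum(dy[t] * dy[t - lag] for t in range(lag, n)) / n
        variance += 2.0 * weight * autocov
    return variance
```

Dividing the square root of this quantity by the ADF regression standard error gives the standard deviation ratio used in step 2 of the Levin, Lin, and Chu procedure.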

##### Cross-Sectional Dependence via Time-Specific Aggregate Effects

The Levin, Lin, and Chu (2002) testing procedure is based on the assumption of cross-sectional independence. It is possible to relax this assumption and allow for a limited degree of dependence via time-specific aggregate effects. Let $\theta_t$ denote the time-specific aggregate effects; then the data generating process (DGP) becomes

$$\Delta y_{it} = \delta y_{i,t-1} + \sum_{L=1}^{p_i} \theta_{iL} \Delta y_{i,t-L} + \alpha_{mi} d_{mt} + \theta_t + \epsilon_{it}, \quad m = 4, 5$$

Two more models are considered: (4) $d_{4t} = \emptyset$ (the empty set) with no individual effects but with time effects, and (5) $d_{5t} = \{1\}$, in which the series has an individual-specific mean and a time-specific mean.

By subtracting the time averages $\bar{y}_t = \frac{1}{N} \sum_{i=1}^{N} y_{it}$ from the observed dependent variable $y_{it}$, or equivalently, by including the time-specific intercepts $\theta_t$ in the ADF regression, the cross-sectional dependence is removed. The impact of a single aggregate common factor that has an identical impact on all individuals but changes over time can also be removed in this way. After cross-sectional dependence is removed, the three-step procedure is applied to calculate the Levin, Lin, and Chu (2002) adjusted t statistic.
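Subtracting the time averages amounts to a cross-sectional demeaning at each time period. The short Python sketch below shows the idea for a balanced panel stored as a list of per-individual series; the function name and data layout are assumptions made for illustration.

```python
def remove_time_means(panel):
    """Subtract the cross-sectional average at each time period.

    panel[i][t] is the observation for individual i at time t (balanced panel).
    """
    n = len(panel)
    T = len(panel[0])
    time_means = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    return [[panel[i][t] - time_means[t] for t in range(T)] for i in range(n)]
```

After this transformation, the values at every time period sum to zero across individuals, so a shock that is common to all individuals at time t no longer appears in the data.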

##### Deterministic Variables

Three deterministic variables can be included in the model for the first-stage estimation: CS_FixedEffects (cross-sectional fixed effects), TS_FixedEffects (time series fixed effects), and TimeTrend (individual linear time trend). When a linear time trend is included, the individual fixed effects are also included. Otherwise the time trend is not identified. Moreover, if the time fixed effects are included, the time trend is not identified either. Therefore, we have 5 identified models: model (1), no deterministic variables; model (2), CS_FixedEffects; model (3), CS_FixedEffects and TimeTrend; model (4), TS_FixedEffects; model (5), CS_FixedEffects TS_FixedEffects. PROC PANEL outputs the test results for all 5 model specifications.

#### Im, Pesaran, and Shin (2003)

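To test for the unit root in heterogeneous panels, Im, Pesaran, and Shin (2003) propose a standardized t-bar test statistic based on averaging the (augmented) Dickey-Fuller statistics across the groups. The limiting distribution is standard normal. The stochastic process $y_{it}$ is generated by the first-order autoregressive process. If $\Delta y_{it} = y_{it} - y_{i,t-1}$, the data generating process can be expressed as in LLC:

$$\Delta y_{it} = \beta_i y_{i,t-1} + \sum_{j=1}^{p_i} \rho_{ij} \Delta y_{i,t-j} + \alpha_{mi} d_{mt} + \epsilon_{it}$$

Unlike the DGP in LLC, $\beta_i$ is allowed to differ across groups. The null hypothesis of unit roots is

$$H_0: \beta_i = 0 \quad \text{for all } i$$

against the heterogeneous alternative,

$$H_1: \beta_i < 0 \ \text{ for } i = 1, \ldots, N_1; \qquad \beta_i = 0 \ \text{ for } i = N_1 + 1, \ldots, N$$

The Im, Pesaran, and Shin test also allows for some (but not all) of the individual series to have unit roots under the alternative hypothesis. But the fraction of the individual processes that are stationary is positive, $\lim_{N \to \infty} N_1 / N = \delta \in (0, 1]$. The t-bar statistic, denoted by $\bar{t}_{NT}$, is formed as a simple average of the individual t statistics for testing the null hypothesis of $\beta_i = 0$. If $t_{iT}(p_i, \rho_i)$ is the standard t statistic, then

$$\bar{t}_{NT} = \frac{1}{N} \sum_{i=1}^{N} t_{iT}(p_i, \rho_i)$$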

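If $T \to \infty$, then for each i the t statistic (without time trend) converges to the Dickey-Fuller distribution, $\eta_i$, defined by

$$\eta_i = \frac{\frac{1}{2} \left\{ [W_i(1)]^2 - 1 \right\} - W_i(1) \int_0^1 W_i(u)\,du}{\left\{ \int_0^1 [W_i(u)]^2\,du - \left[ \int_0^1 W_i(u)\,du \right]^2 \right\}^{1/2}}$$

where $W_i$ is the standard Brownian motion. The limiting distribution is different when a time trend is included in the regression (Hamilton, 1994, p. 499). The mean and variance of the limiting distributions are reported in Nabeya (1999). The standardized t-bar statistic satisfies

$$Z_{\bar{t}} = \frac{\sqrt{N} \left( \bar{t}_{NT} - E(\eta) \right)}{\sqrt{Var(\eta)}} \Rightarrow N(0, 1)$$

where the standard normal is the sequential limit with $T \to \infty$ followed by $N \to \infty$. To obtain better finite sample approximations, Im, Pesaran, and Shin (2003) propose standardizing the t-bar statistic by means and variances of $t_{iT}(p_i, 0)$ under the null hypothesis $\beta_i = 0$. The alternative standardized t-bar statistic is

$$W_{\bar{t}} = \frac{\sqrt{N} \left\{ \bar{t}_{NT} - \frac{1}{N} \sum_{i=1}^{N} E\left[ t_{iT}(p_i, 0) \mid \beta_i = 0 \right] \right\}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} Var\left[ t_{iT}(p_i, 0) \mid \beta_i = 0 \right]}} \Rightarrow N(0, 1)$$

Im, Pesaran, and Shin (2003) simulate the values of $E[t_{iT}(p_i, 0) \mid \beta_i = 0]$ and $Var[t_{iT}(p_i, 0) \mid \beta_i = 0]$ for different values of T and p. The lag order in the ADF regression can be selected by the same method as in Levin, Lin, and Chu (2002). See the section Lag Order Selection in the ADF Regression for details.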
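To make the standardization concrete, here is a small Python sketch that averages individual ADF t statistics and standardizes the average with per-individual null means and variances. It assumes those moments are supplied by the caller (for example, looked up in the simulated tables of Im, Pesaran, and Shin (2003) for each individual's T and lag order); the function name and argument names are hypothetical, and no table values are built in.

```python
import math


def standardized_t_bar(t_stats, null_means, null_variances):
    """Standardize the average of individual ADF t statistics.

    null_means[i] and null_variances[i] are the mean and variance of the
    i-th t statistic under the unit root null, supplied by the caller.
    """
    n = len(t_stats)
    t_bar = sum(t_stats) / n
    mean_adjustment = sum(null_means) / n
    variance_adjustment = sum(null_variances) / n
    return math.sqrt(n) * (t_bar - mean_adjustment) / math.sqrt(variance_adjustment)
```

The result is compared with the left tail of the standard normal distribution, because stationarity pushes the individual t statistics toward large negative values.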

When T is fixed, Im, Pesaran, and Shin (2003) assume serially uncorrelated errors, ; is likely to have finite second moment, which is not established in the paper. The t statistic is modified by imposing the null hypothesis of a unit root. Denote as the estimated standard error from the restricted regression (),

where is the OLS estimator of (unrestricted model), , , and Under the null hypothesis, the standardized statistic converges to a standard normal variate,

where and are the mean and variance of , respectively. The limit is taken as and T is fixed. Their values are simulated for finite samples without a time trend. The is also likely to converge to standard normal.

When N and T are both finite, an exact test that assumes no serial correlation can be used. The critical values of and are simulated.

As in the section Levin, Lin, and Chu (2002), the assumption of cross-sectional independence can be relaxed to allow for a limited degree of dependence via time-specific aggregate effects. Two more models (model 4 and model 5) with time fixed effects are considered. See the section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

#### Combination Tests

Combining the observed significance levels (p-values) from N independent tests of the unit root null hypothesis was proposed by Maddala and Wu (1999) and Choi (2001). Suppose $G_{iT_i}$ is the test statistic to test the unit root null hypothesis for individual $i = 1, \ldots, N$, and $F(\cdot)$ is the cdf (cumulative distribution function) of the asymptotic distribution as $T_i \to \infty$. Then the asymptotic p-value is defined as

$$p_i = F(G_{iT_i})$$

There are different ways to combine these p-values. The first one is the inverse chi-square test (Fisher, 1932); this test is referred to as the P test in Choi (2001) and as $\lambda$ in Maddala and Wu (1999):

$$P = -2 \sum_{i=1}^{N} \ln(p_i)$$

When the test statistics $\{G_{iT_i}\}$ are continuous, $\{p_i\}$ are independent uniform $(0, 1)$ variables. Therefore, $P \Rightarrow \chi_{2N}^2$ as $T_i \to \infty$ and N fixed. But as $N \to \infty$, P diverges to infinity in probability. Therefore, it is not applicable for large N. To derive a nondegenerate limiting distribution, the P test (Fisher test with $N \to \infty$) should be modified to

$$P_m = \frac{1}{2\sqrt{N}} \sum_{i=1}^{N} \left( -2 \ln(p_i) - 2 \right) = -\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \ln(p_i) + 1 \right)$$

Under the null, as $T_i \to \infty$[9] and then $N \to \infty$, $P_m \Rightarrow N(0, 1)$.[10]

The second way of combining individual p-values is the inverse normal test,

$$Z = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \Phi^{-1}(p_i)$$

where $\Phi(\cdot)$ is the standard normal cdf. When $T_i \to \infty$, $Z \Rightarrow N(0, 1)$ as N is fixed. When N and $T_i$ are both large, the sequential limit is also standard normal if $T_i \to \infty$ first and $N \to \infty$ next.

The third way of combining p-values is the logit test,

$$L^* = \sqrt{k} L = \sqrt{k} \sum_{i=1}^{N} \ln\left( \frac{p_i}{1 - p_i} \right)$$

where $k = 3(5N + 4) / \left( \pi^2 N (5N + 2) \right)$. When $T_i \to \infty$ and N is fixed, $L^* \Rightarrow t_{5N+4}$. In other words, the limiting distribution is the t distribution with $5N + 4$ degrees of freedom. The sequential limit is $L^* \Rightarrow N(0, 1)$ as $T_i \to \infty$ and then $N \to \infty$. Simulation results in Choi (2001) suggest that the Z test outperforms other combination tests. For the time series unit root test $G_i$, Maddala and Wu (1999) apply the augmented Dickey-Fuller test. According to Choi (2006), the Elliott, Rothenberg, and Stock (1996) Dickey-Fuller generalized least squares (DF-GLS) test brings significant size and power advantages in finite samples.
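The combination statistics are simple functions of the individual p-values, as the following Python sketch illustrates. It assumes the p-values have already been computed from the individual unit root tests and lie strictly between 0 and 1; the function name and the dictionary keys are hypothetical. The formulas are the inverse chi-square (Fisher) statistic, its modified version for large N, the inverse normal statistic, and the scaled logit statistic of Choi (2001).

```python
import math
from statistics import NormalDist


def combine_p_values(p_values):
    """Return the P, modified P, inverse normal Z, and scaled logit statistics."""
    n = len(p_values)
    # Inverse chi-square (Fisher) statistic: chi-square with 2N df for fixed N.
    p_stat = -2.0 * sum(math.log(p) for p in p_values)
    # Modified P statistic: centered and scaled so it stays bounded as N grows.
    pm_stat = (p_stat - 2.0 * n) / (2.0 * math.sqrt(n))
    # Inverse normal statistic.
    std_normal = NormalDist()
    z_stat = sum(std_normal.inv_cdf(p) for p in p_values) / math.sqrt(n)
    # Logit statistic, scaled by sqrt(k).
    logit_sum = sum(math.log(p / (1.0 - p)) for p in p_values)
    k = 3.0 * (5 * n + 4) / (math.pi ** 2 * n * (5 * n + 2))
    return {
        "P": p_stat,
        "Pm": pm_stat,
        "Z": z_stat,
        "Lstar": math.sqrt(k) * logit_sum,
    }
```

Small p-values make P large and make Z and the logit statistic strongly negative, so P and the modified P reject the unit root null in the right tail while the other two reject in the left tail.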

As in the section Levin, Lin, and Chu (2002), the assumption of cross-sectional independence can be relaxed to allow for a limited degree of dependence via time-specific aggregate effects. Two more models (model 4 and model 5) with time fixed effects are considered. See the section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

#### Breitung’s Unbiased Tests

To account for the nonzero mean of the t statistic in the OLS detrending case, Levin, Lin, and Chu (2002) and Im, Pesaran, and Shin (2003) propose bias-adjusted t statistics. The bias corrections imply a severe loss of power. Breitung and his coauthors take an alternative approach to avoid the bias by using alternative estimates of the deterministic terms (Breitung and Meyer, 1994; Breitung, 2000; Breitung and Das, 2005). The DGP is the same as in the Im, Pesaran, and Shin approach. When serial correlation is absent, for model (2) with individual-specific means, the constant terms are estimated by the initial values of the series. Therefore, each series is adjusted by subtracting its initial value. The equation becomes

For model (3) with individual specific means and time trends, the time trend can be estimated by . The levels can be transformed as

The Helmert transformation is applied to the dependent variable to remove the mean of the differenced variable:

The transformed model is

The pooled t statistic has a standard normal distribution. Therefore, no adjustment is needed for the t statistic. To adjust for heteroscedasticity across cross sections, Breitung (2000) proposes a UB (unbiased) statistic based on the transformed data,

where . When is unknown, it can be estimated as

The statistic has a standard normal limiting distribution as $T \to \infty$ followed by $N \to \infty$ sequentially.

To account for the short-run dynamics, Breitung and Das (2005) suggest applying the test to the prewhitened series. For model (1) and model (2) (constant-only case), they suggest the same method as in step 1 of Levin, Lin, and Chu (2002).[11] For model (3) (with a constant and linear time trend), the prewhitened series can be obtained by running the following restricted ADF regression under the null hypothesis of a unit root and no intercept and linear time trend:

where is a consistent estimator of the true lag order and can be estimated by the procedures listed in the section Lag Order Selection in the ADF Regression. For LLC and IPS tests, the lag orders are selected by running the ADF regressions. But for Breitung and his coauthors’ tests, the restricted ADF regressions are used to be consistent with the prewhitening method. Let be the estimated coefficients.[12] The prewhitened series can be obtained by

and

The transformed series are random walks under the null hypothesis,

where for . When the cross-section units are independent, the t statistic converges to standard normal under the null, as followed by ,

where with OLS estimator .

To account for cross-sectional dependence, Breitung and Das (2005) propose the robust t statistic and a GLS version of the test statistic. Let be the error vector for time t, and let be a positive definite matrix with eigenvalues . Let and . The model can be written as a SUR-type system of equations,

The unknown covariance matrix can be estimated by its sample counterpart,

The sequential limit followed by of the standard t statistic is normal with mean 0 and variance . The variance can be consistently estimated by . Thus the robust t statistic can be calculated as

as followed by under the null hypothesis of random walk. Since the finite sample distribution can be quite different, Breitung and Das (2005) list the , , and critical values for different N’s.

When , a (feasible) GLS estimator is applied; it is asymptotically more efficient than the OLS estimator. The data are transformed by multiplying as defined before, . Thus the model is transformed into

The feasible GLS (FGLS) estimator of and the corresponding t statistic are obtained by estimating the transformed model by OLS and denoted by and , respectively:

As in the section Levin, Lin, and Chu (2002), the assumption of cross-sectional independence can be relaxed to allow for a limited degree of dependence via time-specific aggregate effects. Two more models (model 4 and model 5) with time fixed effects are considered. See the section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

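#### Hadri (2000) Stationarity Tests

Hadri (2000) adopts a component representation where an individual time series is written as a sum of a deterministic trend, a random walk, and a white-noise disturbance term. Under the null hypothesis of stationarity, the variance of the random walk equals 0. Specifically, two models are considered:

• For model (1), the time series $y_{it}$ is stationary around a level $r_{i0}$,

$$y_{it} = r_{it} + \epsilon_{it}, \quad i = 1, \ldots, N, \quad t = 1, \ldots, T$$

• For model (2), $y_{it}$ is trend stationary,

$$y_{it} = r_{it} + \beta_i t + \epsilon_{it}, \quad i = 1, \ldots, N, \quad t = 1, \ldots, T$$

where $r_{it}$ is the random walk component,

$$r_{it} = r_{i,t-1} + u_{it}$$

The initial values of the random walks, $r_{i0}$, are assumed to be fixed unknowns and can be considered as heterogeneous intercepts. The errors $\epsilon_{it}$ and $u_{it}$ satisfy $\epsilon_{it} \sim iid\ N(0, \sigma_\epsilon^2)$, $u_{it} \sim iid\ N(0, \sigma_u^2)$, and are mutually independent.

The null hypothesis of stationarity is $H_0: \sigma_u^2 = 0$ against the alternative random walk hypothesis $H_1: \sigma_u^2 > 0$.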

In matrix form, the models can be written as

where , with , and with being a vector of ones, , and .

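Let $\hat{\epsilon}_{it}$ be the residuals from the regression of $y_i$ on $X_i$; then the LM statistic is

$$LM = \frac{1}{N} \sum_{i=1}^{N} \frac{\frac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat{\sigma}_\epsilon^2}$$

where $S_{it} = \sum_{j=1}^{t} \hat{\epsilon}_{ij}$ is the partial sum of the residuals and $\hat{\sigma}_\epsilon^2$ is a consistent estimator of $\sigma_\epsilon^2$ under the null hypothesis of stationarity. With some regularity conditions,

$$LM \xrightarrow{p} E\left[ \int_0^1 V^2(r)\,dr \right]$$

where $V(r)$ is a standard Brownian bridge in model (1) and a second-level Brownian bridge in model (2). Let $W(r)$ be a standard Wiener process (Brownian motion),

$$V(r) = \begin{cases} W(r) - r W(1) & \text{for model (1)} \\ W(r) + (2r - 3r^2) W(1) + 6r(r - 1) \int_0^1 W(s)\,ds & \text{for model (2)} \end{cases}$$

The mean and variance of the random variable $\int_0^1 V^2(r)\,dr$ can be calculated by using the characteristic functions,

$$\xi = E\left[ \int_0^1 V^2(r)\,dr \right] = \begin{cases} 1/6 & \text{for model (1)} \\ 1/15 & \text{for model (2)} \end{cases}$$

and

$$\zeta^2 = Var\left[ \int_0^1 V^2(r)\,dr \right] = \begin{cases} 1/45 & \text{for model (1)} \\ 11/6300 & \text{for model (2)} \end{cases}$$

The LM statistics can be standardized to obtain the standard normal limiting distribution,

$$Z = \frac{\sqrt{N} \left( LM - \xi \right)}{\zeta} \Rightarrow N(0, 1)$$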

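##### Consistent Estimator of $\sigma_\epsilon^2$

Hadri’s (2000) test can be applied to the general case of heteroscedasticity and serially correlated disturbance errors. Under homoscedasticity and serially uncorrelated errors, $\sigma_\epsilon^2$ can be estimated as

$$\hat{\sigma}_\epsilon^2 = \frac{1}{N(T - k)} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\epsilon}_{it}^2$$

where k is the number of regressors. Therefore, $k = 1$ for model (1) and $k = 2$ for model (2).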
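For model (1) under homoscedastic and serially uncorrelated errors, the whole calculation fits in a short Python sketch. It assumes a balanced panel stored as a list of per-individual series, uses the residuals from a regression on a constant (that is, the demeaned series), estimates the error variance with the degrees-of-freedom correction k = 1, and standardizes the LM statistic with the model (1) moments from Hadri (2000), mean 1/6 and variance 1/45. The function name is hypothetical.

```python
import math


def hadri_lm_level(panel):
    """Return (LM, Z) for the level-stationarity model with a common error variance.

    panel[i][t] is the observation for individual i at time t (balanced panel).
    """
    n = len(panel)
    T = len(panel[0])
    sum_sq_partial = 0.0  # sum over i and t of squared partial sums of residuals
    sum_sq_resid = 0.0    # sum over i and t of squared residuals
    for series in panel:
        mean = sum(series) / T
        partial = 0.0
        for y in series:
            resid = y - mean  # residual from regressing on a constant
            partial += resid
            sum_sq_partial += partial * partial
            sum_sq_resid += resid * resid
    sigma2 = sum_sq_resid / (n * (T - 1))  # k = 1 regressor in model (1)
    lm = sum_sq_partial / (n * T ** 2 * sigma2)
    z = math.sqrt(n) * (lm - 1.0 / 6.0) / math.sqrt(1.0 / 45.0)
    return lm, z
```

Large positive values of Z are evidence against stationarity, because a random walk component inflates the partial sums of the residuals.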

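When errors are heteroscedastic across individuals, the error variances can be estimated by $\hat{\sigma}_{\epsilon,i}^2 = \frac{1}{T - k} \sum_{t=1}^{T} \hat{\epsilon}_{it}^2$ for each individual i, and the LM statistic needs to be modified to

$$LM = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\frac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat{\sigma}_{\epsilon,i}^2} \right)$$

To allow for temporal dependence over t, $\hat{\sigma}_\epsilon^2$ has to be replaced by the long-run variance of $\epsilon_{it}$, which is defined as $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \lim_{T \to \infty} T^{-1} E(S_{iT}^2)$. A HAC estimator can be used to consistently estimate the long-run variance $\sigma^2$. For more information, see the section Long-Run Variance Estimation.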

As in the section Levin, Lin, and Chu (2002), the assumption of cross-sectional independence can be relaxed to allow for a limited degree of dependence via time-specific aggregate effects. One more model (model 3) with time fixed effects is considered. See the section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

#### Harris and Tzavalis (1999) Panel Unit Root Tests

Harris and Tzavalis (1999) derive the panel unit root test under fixed T and large N. Five models are considered as in Levin, Lin, and Chu (2002). Model (1) is the homogeneous panel,

Under the null hypothesis, . For model (2), each series is a unit root process with a heterogeneous drift,

Model (3) includes heterogeneous drifts and linear time trends,

As in the section Levin, Lin, and Chu (2002), the assumption of cross-sectional independence can be relaxed to allow for a limited degree of dependence via time-specific aggregate effects. Two more models (model 4 and model 5) with time fixed effects are considered. See the section Cross-Sectional Dependence via Time-Specific Aggregate Effects for details.

Let be the OLS estimator of ; then

where , , and is the projection matrix. For model (1), there are no regressors other than the lagged dependent value, so is the identity matrix . For model (2), a constant is included, so with a column of ones. For model (3), a constant and time trend are included. Thus , where and .

When in model (1) under the null hypothesis, as

As , it becomes .

When the drift is absent in model (2), , under the null hypothesis, as

As , .

When the time trend is absent in model (3), , under the null hypothesis, as

When , .

[8] In practice c is set to 1, following the literature (Hannan and Quinn, 1979; Hall, 1994).

[9] The time series length T is subindexed by $i$ because the panel can be unbalanced.

[10] Choi (2001) also points out that the joint limit result where N and $T_i$ go to infinity simultaneously is the same as the sequential limit, but it requires more moment conditions.

[11] See the section Levin, Lin, and Chu (2002) for details. The only difference is the standard error estimate . Breitung suggests using instead of as in LLC to normalize the standard error.

[12] Breitung (2000) suggests the approach in step 1 of Levin, Lin, and Chu (2002), while Breitung and Das (2005) suggest the prewhitening method as described above. In Breitung’s code, to be consistent with the papers, different approaches are adopted for models (2) and (3). The order of variable transformation and prewhitening also differs. In model (2), the initial values are subtracted (variable transformation) first, and then the prewhitening is applied. For model (3), the order is reversed: the series is prewhitened and then transformed to remove the mean and linear time trend.