The PANEL Procedure

Overview: PANEL Procedure

The PANEL procedure analyzes a class of linear econometric models that commonly arise when time series and cross-sectional data are combined. Pooled time series cross-sectional data of this kind are often referred to as panel data. Typical examples of panel data include observations over time on households, countries, firms, trade, and so on. For example, a panel of household income data is created by repeatedly surveying the same households in different time periods (years).

The panel data models can be grouped into several categories depending on the structure of the error term. The PANEL procedure uses the following error structures and the corresponding methods to analyze data:

one-way and two-way models
fixed-effects and random-effects models
autoregressive models
moving average models

A one-way model depends only on the cross section to which the observation belongs. A two-way model depends on both the cross section and the time period to which the observation belongs.
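The distinction can be written out as a short worked form. The notation below is an assumption made for illustration (the symbols are not defined in this overview): y_it is the response for cross section i in time period t, x_itk is the k-th regressor, and u_it is the error term whose structure separates the two cases.

```latex
\begin{align*}
y_{it} &= \sum_{k=1}^{K} x_{itk}\,\beta_k + u_{it},
  \qquad i = 1,\dots,N,\quad t = 1,\dots,T \\
\text{one-way:}\quad u_{it} &= \gamma_i + \epsilon_{it} \\
\text{two-way:}\quad u_{it} &= \gamma_i + \alpha_t + \epsilon_{it}
\end{align*}
```

Here gamma_i is the cross-sectional effect, alpha_t is the time effect, and epsilon_it is the remaining disturbance.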

Apart from whether the effect is one-way or two-way, the specifications differ in the nature of the cross-sectional or time-series effect. The models are referred to as fixed-effects models if the effects are nonrandom and as random-effects models otherwise.

If the effects are fixed, the models are essentially regression models with dummy variables that correspond to the specified effects. For fixed-effects models, ordinary least squares (OLS) estimation yields the best linear unbiased estimator. Random-effects models use a two-stage approach. In the first stage, variance components are calculated by using methods described by Fuller and Battese (1974), Wansbeek and Kapteyn (1984), Wallace and Hussain (1969), or Nerlove (1971). In the second stage, the variance components are used to standardize the data, and OLS regression is performed on the standardized data.
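The following Python sketch illustrates both ideas for a one-way model with a single regressor. It is not the PANEL procedure's implementation. All function names are hypothetical, the panel is assumed to be balanced with T periods per cross section, and the variance components sigma_e2 and sigma_u2 are assumed to be already available from the first stage. The textbook quasi-demeaning weight theta = 1 - sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2)) stands in for the standardization step. Setting theta to 1 reproduces the fixed-effects (dummy variable) slope, and theta equal to 0 reproduces pooled OLS.

```python
from collections import defaultdict
from math import sqrt


def group_means(ids, values):
    """Mean of `values` within each cross section."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def quasi_demean(ids, values, theta):
    """Subtract theta times the cross-section mean from each value."""
    means = group_means(ids, values)
    return [v - theta * means[i] for i, v in zip(ids, values)]


def fixed_effects_slope(ids, x, y):
    """Within estimator: full demeaning (theta = 1), same slope as dummies."""
    return ols_slope(quasi_demean(ids, x, 1.0), quasi_demean(ids, y, 1.0))


def random_effects_slope(ids, x, y, sigma_e2, sigma_u2, periods):
    """Second stage only: standardize with assumed variance components."""
    theta = 1.0 - sqrt(sigma_e2 / (sigma_e2 + periods * sigma_u2))
    return ols_slope(quasi_demean(ids, x, theta), quasi_demean(ids, y, theta))


# Made-up balanced panel: y = 2 * x + cross-section effect, with no noise.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 1.0, 3.0, 8.0]
effects = {1: 5.0, 2: -3.0, 3: 10.0}
y = [2.0 * xv + effects[i] for i, xv in zip(ids, x)]

fe = fixed_effects_slope(ids, x, y)
re = random_effects_slope(ids, x, y, sigma_e2=1.0, sigma_u2=4.0, periods=3)
```

Because the example data contain no noise, the fixed-effects slope recovers the coefficient 2 up to rounding. The random-effects slope lies between the pooled OLS and fixed-effects values, depending on theta.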

Two types of models in the PANEL procedure accommodate an autoregressive structure: the Parks method estimates a first-order autoregressive model with contemporaneous correlation, and the dynamic panel estimator estimates an autoregressive model with a lagged dependent variable.

The Da Silva method estimates a mixed variance-component moving-average error process. The regression parameters are estimated by using a two-step generalized least squares (GLS)-type estimator.

The PANEL procedure enhances the features that were implemented in the TSCSREG procedure. The following list shows the most important additions.

New estimation methods include between estimators, pooled estimators, and dynamic panel estimators that use the generalized method of moments (GMM). The variance components for random-effects models can be calculated for both balanced and unbalanced panels by using the methods described by Fuller and Battese (1974), Wansbeek and Kapteyn (1984), Wallace and Hussain (1969), or Nerlove (1971).
The CLASS statement creates classification variables that are used in the analysis.
The TEST statement includes new options for Wald, Lagrange multiplier, and likelihood ratio tests.
The new RESTRICT statement specifies linear restrictions on the parameters.
The FLATDATA statement enables the data to be in a compressed form, in which each observation holds all the time periods for one cross section.
Several methods that produce heteroscedasticity-consistent covariance matrices (HCCME) are added because the presence of heteroscedasticity can result in inefficient and biased estimates of the variance-covariance matrix in the OLS framework.
The LAG statement and its variants make it easy to create lagged values. Lagged variables are typically difficult to create in the panel setting: if they are created in a DATA step, several programming steps that include loops are often needed. Lagging can also generate a large number of missing values, depending on the lag order. The LAG statement leaves these values missing; the ZLAG, XLAG, SLAG, and CLAG statements replace them with zeros, the overall mean, the time mean, or the cross section mean, respectively.
The OUTPUT statement enables you to output data and estimates that can be used in other analyses.
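To illustrate why lagging is awkward in a panel, the following Python sketch lags a variable within each cross section and applies the fill rules described for the lag statements. It is an illustration only, not the procedure's algorithm. The function name panel_lag and the fill keywords are hypothetical. The sketch also assumes that the means are computed from the original (unlagged) values, that the time mean is taken across cross sections in the same period, and that the cross section mean is taken over time within the same cross section.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def panel_lag(data, lag=1, fill=None):
    """Lag a panel variable within each cross section.

    data maps (cross_section, time) -> value.
    fill is None (keep missing), "zero", "overall", "time", or "cross".
    """
    by_cs = defaultdict(list)    # cross section -> its time periods
    by_time = defaultdict(list)  # time period -> values across cross sections
    for (cs, t), value in data.items():
        by_cs[cs].append(t)
        by_time[t].append(value)

    overall = mean(data.values())
    lagged = {}
    for cs, times in by_cs.items():
        times.sort()
        cs_mean = mean(data[(cs, t)] for t in times)
        for pos, t in enumerate(times):
            if pos >= lag:
                # Look back only inside the same cross section.
                lagged[(cs, t)] = data[(cs, times[pos - lag])]
            elif fill is None:
                lagged[(cs, t)] = None
            elif fill == "zero":
                lagged[(cs, t)] = 0.0
            elif fill == "overall":
                lagged[(cs, t)] = overall
            elif fill == "time":
                lagged[(cs, t)] = mean(by_time[t])
            elif fill == "cross":
                lagged[(cs, t)] = cs_mean
            else:
                raise ValueError(f"unknown fill rule: {fill!r}")
    return lagged


# Made-up panel with two cross sections and three time periods.
panel = {
    ("A", 1): 1.0, ("A", 2): 2.0, ("A", 3): 3.0,
    ("B", 1): 10.0, ("B", 2): 20.0, ("B", 3): 30.0,
}
plain = panel_lag(panel, lag=1)
zero_filled = panel_lag(panel, lag=1, fill="zero")
```

The key point is that the first period of cross section B must not inherit the last value of cross section A, which is the mistake a naive shift over stacked data makes.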