The PANEL Procedure

Hausman-Taylor Estimation

The Hausman and Taylor (1981) model is a hybrid that combines the consistency of a fixed-effects model with the efficiency and applicability of a random-effects model. One-way random-effects models assume exogeneity of the regressors; namely, that they are independent of both the cross-sectional and observation-level errors. When some regressors are correlated with the cross-sectional errors, the random-effects model can be adjusted to deal with the endogeneity.

Consider the one-way model:

\[ y_{it}= \Strong{X}_{1it} \bbeta _1 + \Strong{X}_{2it} \bbeta _2 + \Strong{Z}_{1i} \bgamma _1 + \Strong{Z}_{2i} \bgamma _2 + \nu _ i + \epsilon _{it} \hspace{0.2 in} i = 1, \ldots , N; \; t = 1, \ldots , T_ i \]

The regressors are subdivided so that the $\Strong{X}$ variables vary within cross sections whereas the $\Strong{Z}$ variables do not and would otherwise be dropped from a fixed-effects model. The subscript 1 denotes variables that are independent of both error terms (exogenous variables), and the subscript 2 denotes variables that are independent of the observation-level errors $\epsilon _{it}$ but correlated with cross-sectional errors $\nu _ i$ (endogenous variables). The intercept term (if your model has one) is included as part of $\Strong{Z}_1$ in what follows.

The Hausman-Taylor estimator is an instrumental variables regression on data that are weighted similarly to data for random-effects estimation. In both cases, the weights are functions of the estimated variance components.

Begin with ${\mb{P} _{0}=\mr{diag}({\bar{\mb{J}}}_{T_{i}})}$ and $\mb{Q} _{0}=\mr{diag}(\mb{E} _{T_{i}})$. The mean transformation matrix is ${\bar{\mb{J}}}_{T_{i}}=\mb{J} _{T_{i}}/T_{i}$, and the deviations-from-means transformation is ${\mb{E} _{T_{i}}=\mb{I} _{T_{i}}-{\bar{\mb{J}}}_{T_{i}}}$, where $\mb{J}_{T_{i}}$ is a square matrix of ones of dimension ${T_{i}}$.
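As an illustration of these transformations (a minimal NumPy sketch, not PROC PANEL code), the following fragment builds $\mb{P}_0$ and $\mb{Q}_0$ for a small hypothetical unbalanced panel; the cross-section sizes in T are made up, and the later sketches in this section continue this running example.

```python
# Minimal sketch of the P0 and Q0 projections for a hypothetical unbalanced
# panel; the cross-section sizes T_i are arbitrary illustration values.
import numpy as np
from scipy.linalg import block_diag

T = [4, 2, 3, 5, 3, 4]                 # hypothetical cross-section sizes T_i
N, M = len(T), sum(T)

# J_bar_{T_i} = J_{T_i}/T_i replaces each observation by its cross-sectional
# mean; E_{T_i} = I_{T_i} - J_bar_{T_i} takes deviations from that mean.
P0 = block_diag(*[np.full((t, t), 1.0 / t) for t in T])
Q0 = block_diag(*[np.eye(t) - np.full((t, t), 1.0 / t) for t in T])

# P0 and Q0 are complementary projections: P0 + Q0 = I and P0 Q0 = 0.
assert np.allclose(P0 + Q0, np.eye(M))
assert np.allclose(P0 @ Q0, 0.0)
```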

The observation-level variance is estimated from a standard fixed-effects model fit. Denote by $N$ the number of cross sections and by $M = \sum _{i=1}^ N T_ i$ the total number of observations. For $\Strong{X}_ s = \{ \Strong{X}_1, \Strong{X}_2\} $, $\tilde{\Strong{X}}_ s = \Strong{Q}_0 \Strong{X}_ s$, and $\tilde{\Strong{y}} = \Strong{Q}_0 \Strong{y}$, let

\begin{eqnarray*} \tilde{\bbeta }_ s& =& \left( \tilde{\Strong{X}}_ s' \tilde{\Strong{X}}_ s \right) ^{-1} \tilde{\Strong{X}}_ s' \tilde{\Strong{y}} \\ \mbox{SSE} & =& \left(\tilde{\Strong{y}} - \tilde{\Strong{X}}_ s \tilde{\bbeta }_ s \right)' \left(\tilde{\Strong{y}} - \tilde{\Strong{X}}_ s \tilde{\bbeta }_ s \right) \\ \hat\sigma ^2_\epsilon & =& \mbox{SSE} / (M - N) \end{eqnarray*}
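A sketch of this step, continuing the example above with simulated placeholder data (the regressor counts are arbitrary), might look as follows; group-wise demeaning has the same effect as multiplying by $\Strong{Q}_0$.

```python
# Minimal sketch of the within (fixed-effects) step on simulated data;
# T, N, M come from the previous sketch, and X_s stacks X1 and X2 columnwise.
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat(np.arange(N), T)     # cross-section index of each row

X_s = rng.normal(size=(M, 3))           # placeholder [X1, X2] block
y = rng.normal(size=M)                  # placeholder dependent variable

def within(a):
    """Deviations from cross-sectional means (same effect as Q0 @ a)."""
    means = np.array([a[groups == i].mean(axis=0) for i in range(N)])
    return a - means[groups]

X_tilde, y_tilde = within(X_s), within(y)

# beta_tilde_s = (X~'X~)^{-1} X~'y~, then sigma^2_eps = SSE / (M - N)
beta_s, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
sse = float(np.sum((y_tilde - X_tilde @ beta_s) ** 2))
sigma2_eps = sse / (M - N)
```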

To estimate the cross-sectional error variance, form the mean residuals $\Strong{r} = \Strong{P}_0'( \Strong{y} - \Strong{X}_ s \tilde{\bbeta }_ s)$. You can use the mean residuals to obtain intermediate estimates of the coefficients for $\Strong{Z}_1$ and $\Strong{Z}_2$ via two-stage least squares (2SLS) estimation. At the first stage, use $\Strong{X}_1$ and $\Strong{Z}_1$ as instrumental variables to predict $\Strong{Z}_2$. At the second stage, regress $\Strong{r}$ on both $\Strong{Z}_1$ and the predicted $\Strong{Z}_2$ to obtain $\hat\bgamma ^ m_1$ and $\hat\bgamma ^ m_2$.

The cross-sectional variance estimate is then $\hat\sigma ^2_\nu = \{  R(\nu ) / N - \hat\sigma ^2_\epsilon \} /\bar{T}$, where $\bar{T} = N / (\sum _{i=1}^ N T^{-1}_ i)$ is the harmonic mean of the cross-section sizes and

\[ R(\nu ) = \left(\Strong{r} - \Strong{Z}_1 \hat\bgamma ^ m_1 - \Strong{Z}_2 \hat\bgamma ^ m_2 \right)' \left(\Strong{r} - \Strong{Z}_1 \hat\bgamma ^ m_1 - \Strong{Z}_2 \hat\bgamma ^ m_2 \right) \]
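A sketch of this intermediate step, continuing the running example, is shown below. It assumes one reading of the regression on mean residuals, namely that it is run with one (mean) observation per cross section, so that $\Strong{X}_1$ enters the instrument set through its cross-sectional means; the $\Strong{Z}_1$, $\Strong{Z}_2$, and $\Strong{X}_1$ blocks are hypothetical placeholders.

```python
# Minimal sketch of the intermediate 2SLS fit and the cross-sectional variance
# estimate; X_s, y, beta_s, sigma2_eps, T, N, groups carry over from above.
import numpy as np

rng = np.random.default_rng(2)
X1 = X_s[:, :2]                                          # treat first columns as X1
Z1 = np.column_stack([np.ones(N), rng.normal(size=N)])   # exogenous, incl. intercept
Z2 = rng.normal(size=(N, 1))                             # endogenous, time-invariant

# Mean residuals: one value per cross section (the P0 transform repeats it T_i times).
resid = y - X_s @ beta_s
r = np.array([resid[groups == i].mean() for i in range(N)])

def two_sls(y_dep, X_reg, W_instr):
    """2SLS: project the regressors on the instruments, then run OLS on the fit."""
    X_hat = W_instr @ np.linalg.lstsq(W_instr, X_reg, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y_dep, rcond=None)[0]

X1_bar = np.array([X1[groups == i].mean(axis=0) for i in range(N)])
gamma_m = two_sls(r, np.hstack([Z1, Z2]), np.hstack([X1_bar, Z1]))

# sigma^2_nu = { R(nu)/N - sigma^2_eps } / T_bar; the max() guards against a
# negative value in this toy example (not a claim about PROC PANEL's behavior).
R_nu = float(np.sum((r - np.hstack([Z1, Z2]) @ gamma_m) ** 2))
T_bar = N / np.sum(1.0 / np.array(T, dtype=float))
sigma2_nu = max((R_nu / N - sigma2_eps) / T_bar, 0.0)
```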

After variance-component estimation, transform the dependent variable into partial deviations: $y^*_{it} = y_{it} - \hat\theta _ i \bar{y}_{i.}$. Likewise, transform the regressors to form $\Strong{X}^*_{1it}$, $\Strong{X}^*_{2it}$, $\Strong{Z}^*_{1i}$, and $\Strong{Z}^*_{2i}$. The partial weights $\hat\theta _ i$ are determined by $\hat\theta _ i = 1 - \hat\sigma _\epsilon / \hat w_ i$, with $\hat w_ i^2 = T_ i\hat\sigma ^2_\nu + \hat\sigma ^2_\epsilon $.
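Continuing the sketch, the weights and the partial transformation can be written as follows; the variance estimates come from the previous steps.

```python
# Minimal sketch of the partial (quasi-) demeaning; sigma2_eps, sigma2_nu,
# T, N, groups, X_s, y carry over from the previous sketches.
import numpy as np

w = np.sqrt(np.array(T) * sigma2_nu + sigma2_eps)   # w_i^2 = T_i s2_nu + s2_eps
theta = 1.0 - np.sqrt(sigma2_eps) / w               # theta_i = 1 - s_eps / w_i

def partial_demean(a):
    """a*_{it} = a_{it} - theta_i * abar_{i.}; theta_i = 1 recovers the within transform."""
    means = np.array([a[groups == i].mean(axis=0) for i in range(N)])
    th = theta[groups] if a.ndim == 1 else theta[groups][:, None]
    return a - th * means[groups]

y_star, X_star = partial_demean(y), partial_demean(X_s)
```

For a time-invariant column, the same transformation reduces to multiplication by $(1-\hat\theta _ i)$, which is the factor that appears in the instrument list below.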

Finally, you obtain the Hausman-Taylor estimates by performing 2SLS regression of $y^*_{it}$ on $\Strong{X}^*_{1it}$, $\Strong{X}^*_{2it}$, $\Strong{Z}^*_{1i}$, and $\Strong{Z}^*_{2i}$. For the first-stage regression, use the following instruments:

  • $\tilde{\Strong{X}}_{it}$, the deviations from cross-sectional means for all time-varying variables $\Strong{X}$, for the ith cross section during time period t

  • $(1-\hat\theta _ i)\bar{\Strong{X}}_{1i.}$, where $\bar{\Strong{X}}_{1i.}$ are the means of the time-varying exogenous variables for the ith cross section

  • $(1-\hat\theta _ i)\Strong{Z}_{1i}$

Multiplication by the factor $(1-\hat\theta _ i)$ is redundant in balanced data, but necessary in the unbalanced case to produce accurate instrumentation; see Gardner (1998).
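Putting the pieces together (and continuing the running sketch, with the time-invariant blocks expanded to one row per observation), the final stage might be coded as follows; the instrument matrix mirrors the three items in the preceding list, and the two_sls, within, and partial_demean helpers come from the earlier sketches.

```python
# Minimal sketch of the final 2SLS stage; Z1, Z2, X1_bar, theta, groups and the
# helper functions carry over from the previous sketches.
import numpy as np

Z1_obs, Z2_obs = Z1[groups], Z2[groups]          # expand to observation level
scale = (1.0 - theta[groups])[:, None]           # the (1 - theta_i) factor

# Transformed regressors: partial deviations of X, and (1 - theta_i) Z for the
# time-invariant blocks (see the note after the previous sketch).
X_reg = np.hstack([partial_demean(X_s), scale * Z1_obs, scale * Z2_obs])

# Instruments: within-transformed X, (1 - theta_i) * cross-sectional means of X1,
# and (1 - theta_i) * Z1.
W = np.hstack([within(X_s), scale * X1_bar[groups], scale * Z1_obs])

# Order condition for identification: k1 >= g2 (columns of X1 vs. columns of Z2).
assert X1.shape[1] >= Z2.shape[1]

# Hausman-Taylor estimates, ordered as [beta_1, beta_2, gamma_1, gamma_2].
delta_ht = two_sls(partial_demean(y), X_reg, W)
```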

Let $k_1$ equal the number of regressors in $\Strong{X}_1$, and $g_2$ equal the number of regressors in $\Strong{Z}_2$. Then the Hausman-Taylor model is identified only if $k_1 \geq g_2$; otherwise, no estimation will take place.

Hausman and Taylor (1981) describe a specification test that compares their model to fixed effects. Under the null hypothesis that the Hausman-Taylor exogeneity assumptions are valid, both estimators are consistent but only the Hausman-Taylor estimator is efficient; Hausman’s m statistic is therefore calculated by comparing the parameter estimates and variance matrices of the two models, in the same way as for one-way random-effects models; for more information, see the section Specification Tests. The degrees of freedom of the test, however, are not based on matrix rank but instead are equal to $k_1 - g_2$, the number of overidentifying restrictions.
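As a schematic of the statistic (the coefficient covariance matrices are not computed in the running sketch above, so the inputs here are assumed to be available from the two fits), the comparison takes the familiar Hausman form:

```python
# Minimal sketch of Hausman's m statistic; b_fe/V_fe and b_ht/V_ht are assumed
# to hold the estimates and covariance matrices of the time-varying coefficients
# from the fixed-effects and Hausman-Taylor fits.
import numpy as np
from scipy import stats

def hausman_m(b_fe, V_fe, b_ht, V_ht, df):
    """m = q'(V_fe - V_ht)^{-1} q with q = b_fe - b_ht; here df = k1 - g2."""
    q = np.asarray(b_fe) - np.asarray(b_ht)
    m = float(q @ np.linalg.solve(np.asarray(V_fe) - np.asarray(V_ht), q))
    return m, stats.chi2.sf(m, df)     # statistic and p-value
```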