The QLIM Procedure

Panel Data Analysis

You can use panel data to estimate random-effects versions of linear, censored response, truncated response, discrete choice, and stochastic frontier models.

Random-Effects Models for Panel Data

The general form of a nonlinear random-effects model is defined by the density for an observed random variable, $y_{it}$, as follows[13]

\[ f(y_{it}|\mathbf{x}_{it}, \mu _ i) = g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta ) \]

where $\btheta $ is a vector of ancillary parameters such as a scale parameter or an overdispersion parameter and $\mu _ i$, $i=1,\ldots ,N$, embodies the group-specific heterogeneity, which is unobservable and has the density

\[ f(\mu _ i) = h(\mu _ i ; \btheta ) \]

There are $T_ i$ observations for group $i$. For example, in the case of a random-effects Tobit model, $y_{it}$ is specified as

\[ y^{*}_{it} = \mathbf{x}_{it}’\bbeta + \epsilon _{it}, ~ ~ ~ ~ ~ ~ ~ t=1,\ldots ,T_ i, ~ ~ ~ i=1,\ldots ,N \]
\[ y_{it} = \left\{ \begin{array}{ll} y^{*}_{it} & \mr{if}\; y^{*}_{it}>0 \\ 0 & \mr{if}\; y^{*}_{it}\leq 0 \end{array} \right. \]

where

\[ \epsilon _{it} = \mu _{i} + v_{it} \]
\[ v_{it}|\mathbf{x}_{i}, \mu _ i \sim N(0,\sigma ^{2}) \]
\[ \mu _{i}|\mathbf{x}_{i} \sim N(0,\sigma ^{2}_{\mu }) \]

where $\mathbf{x}_{i}$ contains $\mathbf{x}_{it}$ for all $t$ and $\btheta $ consists of $\sigma $ and $\sigma _{\mu }$. Therefore, for this model

\[ f(y_{it}|\mathbf{x}_{it}, \mu _ i) = \{ 1 - \Phi [(\mathbf{x}_{it}’\bbeta + \mu _{i})/\sigma ]\} ^{1[y_{it}=0]}\{ (1/\sigma )\phi [(y_{it} - \mathbf{x}_{it}’\bbeta - \mu _{i})/\sigma ]\} ^{1[y_{it}>0]} \]

and

\[ f(\mu _ i) = \phi (\mu _ i/\sigma _{\mu }) \]

where $\Phi (\cdot )$ is the cumulative distribution function and $\phi (\cdot )$ is the probability density function of the standard normal distribution, and $1[\cdot ]$ is the indicator function.
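As an illustration, the conditional Tobit density above can be evaluated directly. The following Python sketch (the function name `tobit_density` and the scalar arguments are illustrative, not part of PROC QLIM) uses the standard library's `NormalDist` for $\Phi$ and $\phi$:

```python
from statistics import NormalDist

def tobit_density(y, xb, mu, sigma):
    """Conditional density f(y_it | x_it, mu_i) for the random-effects
    Tobit model, with xb = x_it'beta and sigma the scale of v_it."""
    z = (xb + mu) / sigma
    if y == 0:
        # Censored observation: P(y* <= 0) = 1 - Phi((x'beta + mu)/sigma)
        return 1.0 - NormalDist().cdf(z)
    # Uncensored observation: (1/sigma) * phi((y - x'beta - mu)/sigma)
    return NormalDist().pdf((y - xb - mu) / sigma) / sigma

# When x'beta + mu = 0, the censoring probability is exactly 1/2
print(tobit_density(0.0, 0.3, -0.3, 1.0))   # → 0.5
```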

By construction, the $T_ i$ observations in group $i$ are correlated and jointly distributed with a distribution that does not factor into the product of the marginal distributions. To derive the joint distribution of the $T_ i + 1$ random variables, $f(y_{i1}, y_{i2},\ldots , y_{iT_ i}, \mu _ i | \mathbf{x}_{i}, \bbeta ; \btheta )$, the assumption that the $T_ i$ observations are independent conditional on $\mu _ i$ is essential. Under this assumption, the joint distribution can be written as

\begin{eqnarray*} f(y_{i1}, y_{i2},\ldots , y_{iT_ i}, \mu _ i | \mathbf{x}_{i}, \bbeta ; \btheta ) & =& f(y_{i1}, y_{i2},\ldots , y_{iT_ i} | \mathbf{x}_{i}, \bbeta ; \btheta )f(\mu _ i) \\ & =& \prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta )h(\mu _ i ; \btheta ) \end{eqnarray*}

In order to form the likelihood function for the observed data, the unobserved component, $\mu _ i$, must be integrated out. For individual $i$

\[ L_ i = f(y_{i1}, y_{i2},\ldots , y_{iT_ i} | \mathbf{x}_{i}, \bbeta ; \btheta ) = \int _{\mu } \left[\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta )\right] h(\mu _ i ; \btheta )d\mu _ i \]

Therefore, the log-likelihood function for the observed data becomes

\[ \ln L = \sum _{i=1}^ N \ln \left[ \int _{\mu } \left(\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta )\right)h(\mu _ i ; \btheta )d\mu _ i\right] \]

In most cases, the integral in the square brackets does not have a closed form. The following subsections describe three approaches to this integration.

Simulated Maximum Likelihood

You can use simulation to approximate the integral. First, note that

\[ \int _{\mu } \left(\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta )\right)h(\mu _ i ; \btheta )d\mu _ i = E[F(\mu _ i;\btheta )] \]

where $F(\mu _ i;\btheta )$ denotes the product inside the integral and the expectation is taken with respect to $h(\mu _ i ; \btheta )$. The function $F$ is smooth, continuous, and continuously differentiable. By the law of large numbers, if $(\mu _{i1}, \mu _{i2}, \ldots , \mu _{iR})$ is a sample of iid draws from $h(\mu _ i ; \btheta )$, then

\[ \mr{plim}\frac{1}{R}\sum _{r=1}^{R} F(\mu _{ir};\btheta ) = E[F(\mu _ i;\btheta )] \]

This operation is implemented by simulation that uses a random-number generator. PROC QLIM inserts the simulated integral in the log likelihood to obtain the simulated log likelihood

\[ \ln L_{Simulated} = \sum _{i=1}^ N \ln \left[ \frac{1}{R}\sum _{r=1}^{R} \left(\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _{ir}; \btheta )\right)\right] \]

and maximizes the simulated log likelihood with respect to the parameters $\bbeta $ and $\btheta $.

Under certain assumptions (Greene 2001), the simulated likelihood estimator and the maximum likelihood estimator are equivalent. For this equivalence result to hold, the number of draws, R, must increase faster than the number of observations, N. For this reason, if the NDRAW option is not specified, then by default, it is tied to the sample size by using the rule $R = N^{1+\delta }$, where $\delta =1/2$.
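A minimal sketch of the simulated log likelihood for the random-effects Tobit model can make the averaging concrete. The following Python code assumes a single scalar regressor for simplicity; the function names (`tobit_density`, `simulated_loglik`) and the toy panel are illustrative, not PROC QLIM internals:

```python
import random
from math import log
from statistics import NormalDist

def tobit_density(y, xb, mu, sigma):
    """Conditional density f(y_it | x_it, mu_i) of the Tobit model."""
    z = (xb + mu) / sigma
    if y == 0:
        return 1.0 - NormalDist().cdf(z)
    return NormalDist().pdf((y - xb - mu) / sigma) / sigma

def simulated_loglik(panel, beta, sigma, sigma_mu, R, seed=0):
    """For each group i, average the conditional likelihood over R iid
    draws mu_ir ~ N(0, sigma_mu^2), then sum the logs over groups."""
    rng = random.Random(seed)
    total = 0.0
    for group in panel:                     # group = [(y_it, x_it), ...]
        acc = 0.0
        for _ in range(R):
            mu = rng.gauss(0.0, sigma_mu)   # one simulated random effect
            like = 1.0
            for y, x in group:              # product over t within group
                like *= tobit_density(y, x * beta, mu, sigma)
            acc += like
        total += log(acc / R)               # log of the simulated integral
    return total

# Toy panel: two groups with T_1 = 2 and T_2 = 1 observations
panel = [[(0.0, 1.0), (1.2, 1.0)], [(0.4, -1.0)]]
print(simulated_loglik(panel, beta=0.5, sigma=1.0, sigma_mu=0.7, R=500))
```

In a full estimator, this simulated log likelihood would then be maximized over $\bbeta$ and $\btheta$ by a numerical optimizer.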

The use of independent random draws in simulation is conceptually straightforward, and the statistical properties of the simulated maximum likelihood estimator are easy to derive. However, simulation is a very computationally intensive technique. Moreover, the simulation method itself contributes to the variation of the simulated maximum likelihood estimator; see, for example, Geweke (1995). There are other ways to take draws that can provide greater accuracy by covering the domain of the integral more uniformly and by lowering the simulation variance (Train 2009, section 9.3). Quasi–Monte Carlo (QMC) methods, for example, are based on an integration technique that replaces the pseudorandom draws of Monte Carlo (MC) integration with a sequence of judiciously selected nonrandom points that provide more uniform coverage of the domain of the integral. Therefore, the advantage of QMC integration over MC integration is that for some types of sequences, the accuracy is far greater, convergence is much faster, and the simulation variance is smaller. QMC methods are surveyed in Bhat (2001), Sloan and Woźniakowski (1998), and Morokoff and Caflisch (1995). Besides MC simulation, PROC QLIM offers the QMC integration method that uses Halton sequences.

QMC Method Using the Halton Sequence

Halton sequences (Halton 1960) provide uniform coverage for each observation’s integral, and they decrease the simulation variance by inducing a negative correlation over the draws for each observation. A Halton sequence is constructed deterministically in terms of a prime number as its base. For example, the following sequence is the Halton sequence for base 2:

\[ 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, 9/16, \ldots \]

For more information about how to generate a Halton sequence, see Train (2009, section 9.3.3).
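The sequence can be generated with the radical-inverse construction: write the index $k$ in base $b$ and reflect its digits about the radix point. A short Python sketch (the function name `halton` is illustrative) reproduces the base-2 sequence shown above:

```python
def halton(base, n):
    """First n elements of the Halton sequence for a prime base,
    via the radical-inverse function: reflect the base-b digits
    of k = 1, 2, ..., n about the radix point."""
    seq = []
    for k in range(1, n + 1):
        f, h = 1.0, 0.0
        while k > 0:
            f /= base              # next (negative) power of the base
            h += (k % base) * f    # place the digit after the radix point
            k //= base
        seq.append(h)
    return seq

print(halton(2, 9))
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625, 0.5625]
```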

If you use the QMC method, one long Halton sequence is created, and then part of the sequence (or the whole sequence, depending on whether you decide to discard the initial elements of the sequence[14]) is used in groups. Each group of consecutive elements constitutes the “draws” for one cross-sectional observation. This way, each subsequence fills in the gaps for the previous subsequences, and the draws for one observation tend to be negatively correlated with those for the previous observation.

As the number of draws used for each observation rises, the coverage for each observation improves. This improvement in turn improves the accuracy; however, the negative covariance across observations diminishes. Because Halton draws are far more effective than random draws for simulation, a small number of Halton draws provides relatively good integration (Spanier and Maize 1991).

The Halton draws are draws from a uniform density. PROC QLIM evaluates the inverse cumulative distribution function of the standard normal distribution at each element of the Halton sequence to obtain draws from a standard normal density.
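This inverse-CDF transform can be sketched in Python with the standard library's `NormalDist.inv_cdf` (the function `halton` below is an illustrative radical-inverse generator, not QLIM's internal routine):

```python
from statistics import NormalDist

def halton(base, n):
    """First n elements of the Halton sequence for a prime base."""
    seq = []
    for k in range(1, n + 1):
        f, h = 1.0, 0.0
        while k > 0:
            f /= base
            h += (k % base) * f
            k //= base
        seq.append(h)
    return seq

# Uniform base-2 Halton draws mapped to standard normal draws
# through the inverse standard normal CDF
uniform_draws = halton(2, 7)
normal_draws = [NormalDist().inv_cdf(u) for u in uniform_draws]
print(normal_draws)
```

Because the uniform points come in complementary pairs such as 1/4 and 3/4, the resulting normal draws are symmetric about zero, which is part of what keeps the coverage balanced.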

Approximation by Hermite Quadrature

This method is the Butler and Moffitt (1982) approach, which is based on models in which $\mu _ i$ has a normal distribution. If $\mu _ i$ is normally distributed with zero mean, then

\begin{eqnarray*} & & \int _{\mu } \left(\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta )\right)h(\mu _ i ; \btheta )d\mu _ i \\ & & = \frac{1}{\sigma _{\mu }\sqrt {2\pi }} \int _{-\infty }^{+\infty } \prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta , \mu _ i; \btheta ) \exp \left(\frac{-\mu _ i^2}{2\sigma _{\mu }^2}\right)d\mu _ i \end{eqnarray*}

Let $r_ i = \mu _ i/(\sigma _{\mu }\sqrt {2})$. Then $\mu _ i = (\sigma _{\mu }\sqrt {2})r_ i$ and $d\mu _ i = (\sigma _{\mu }\sqrt {2})dr_ i$. Making the change of variable and letting the error effects be additive produce

\[ L_ i = \frac{1}{\sqrt {\pi }} \int _{-\infty }^{+\infty } \exp (-r_ i^2) \left[\prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta + (\sigma _{\mu }\sqrt {2})r_ i; \btheta )\right] dr_ i \]

This likelihood function is in a form that can be approximated accurately by using Gauss-Hermite quadrature, which eliminates the integration. Thus, the log-likelihood function can be approximated with

\[ \ln L_{h} = \sum _{i=1}^ N \ln \left[ \frac{1}{\sqrt {\pi }}\sum _{h=1}^{H} w_ h \prod _{t=1}^{T_ i} g(y_{it}, \mathbf{x}_{it}’\bbeta + (\sigma _{\mu }\sqrt {2})r_ h; \btheta )\right] \]

where $w_ h$ and $r_ h$ are the weights and nodes for the Hermite quadrature of degree H. PROC QLIM maximizes $\ln L_{h}$ when the Hermite quadrature option is specified (METHOD=HERMITE in the RANDOM statement).
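To illustrate the quadrature idea, the following Python sketch hardcodes the 3-point Gauss-Hermite rule (nodes $0, \pm\sqrt{3/2}$ with weights $2\sqrt{\pi}/3$ and $\sqrt{\pi}/6$). In practice a higher degree $H$ is used, and PROC QLIM computes the nodes and weights internally; the function name `quad_expectation` and the sanity check $E[e^{\mu_i}] = e^{\sigma_\mu^2/2}$ are illustrative, not part of the procedure:

```python
from math import sqrt, pi, exp

# 3-point Gauss-Hermite rule for the weight function exp(-r^2)
nodes   = [-sqrt(1.5), 0.0, sqrt(1.5)]
weights = [sqrt(pi) / 6, 2 * sqrt(pi) / 3, sqrt(pi) / 6]

def quad_expectation(f, sigma_mu):
    """Approximate E[f(mu)] for mu ~ N(0, sigma_mu^2) via the change
    of variable mu = sigma_mu*sqrt(2)*r, as in the derivation above."""
    return sum(w * f(sigma_mu * sqrt(2) * r)
               for w, r in zip(weights, nodes)) / sqrt(pi)

# Sanity check: E[exp(mu)] = exp(sigma_mu^2 / 2)
approx = quad_expectation(exp, 0.5)
exact  = exp(0.5 ** 2 / 2)
print(approx, exact)   # approx ≈ 1.13301, exact ≈ 1.13315
```

A 3-point rule is already exact for polynomial integrands up to degree 5, which is why so few nodes can give an accurate approximation of the smooth integrand in $L_i$.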



[13] The i subscript represents individuals, and the t subscript represents time.

[14] When sequences are created in multiple dimensions, the initial part of the series is usually eliminated because the initial terms of multiple Halton sequences are highly correlated. However, there is no such correlation for a single dimension.