The COUNTREG Procedure

Negative Binomial Regression

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation $i$. Thus, the individuals are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as

\[  E(y_{i}|\mathbf{x}_{i}, \tau _{i}) = \mu _{i} \tau _{i} = e^{\mathbf{x}_{i}\bbeta + \epsilon _{i}}  \]

where the unobserved heterogeneity term $\tau _{i} = e^{\epsilon _{i}}$ is independent of the vector of regressors $\mathbf{x}_{i}$. Then the distribution of $y_{i}$ conditional on $\mathbf{x}_{i}$ and $\tau _{i}$ is Poisson with conditional mean and conditional variance $\mu _{i} \tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i},\tau _{i}) = \frac{\exp (-\mu _{i}\tau _{i}) (\mu _{i}\tau _{i})^{y_{i}}}{y_{i}!}  \]

Let $g(\tau _{i})$ be the probability density function of $\tau _{i}$. Then, the distribution $f(y_{i}|\mathbf{x}_{i})$ (no longer conditional on $\tau _{i}$) is obtained by integrating $f(y_{i}|\mathbf{x}_{i},\tau _{i})$ with respect to $\tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i}) = \int _{0}^{\infty } f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i}) d\tau _{i}  \]

An analytical solution to this integral exists when $\tau _{i}$ is assumed to follow a gamma distribution. This solution is the negative binomial distribution. When the model contains a constant term, it is necessary to assume that $E(e^{\epsilon _{i}}) = E(\tau _{i}) =1$, in order to identify the mean of the distribution. Thus, it is assumed that $\tau _{i}$ follows a gamma($\theta , \theta $) distribution with $E(\tau _{i}) =1$ and $V(\tau _{i}) = 1 \slash \theta $,

\[  g(\tau _{i}) = \frac{\theta ^{\theta }}{\Gamma (\theta )} \tau _{i}^{\theta -1}\exp (-\theta \tau _{i})  \]

where $\Gamma (x)=\int _{0}^{\infty }z^{x-1}\exp (-z)dz$ is the gamma function and $\theta $ is a positive parameter. Then, the density of $y_{i}$ given $\mathbf{x}_{i}$ is derived as

\begin{align*}
f(y_{i}|\mathbf{x}_{i}) &= \int _{0}^{\infty }f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i})d\tau _{i} \\
&= \frac{\theta ^{\theta } \mu _{i}^{y_{i}}}{y_{i}!\Gamma (\theta )} \int _{0}^{\infty } e^{-(\mu _{i}+\theta )\tau _{i}}\tau _{i}^{\theta +y_{i}-1}d\tau _{i} \\
&= \frac{\theta ^{\theta }\mu _{i}^{y_{i}}\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )(\theta +\mu _{i})^{\theta +y_{i}}} \\
&= \frac{\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )} \left(\frac{\theta }{\theta +\mu _{i}}\right)^{\theta } \left(\frac{\mu _{i}}{\theta +\mu _{i}}\right)^{y_{i}}
\end{align*}
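The second equality collects the factors of $f(y_{i}|\mathbf{x}_{i},\tau _{i})\, g(\tau _{i})$ that do not involve $\tau _{i}$; the third applies the gamma-integral identity obtained by substituting $z = (\mu _{i}+\theta )\tau _{i}$ in the definition of $\Gamma $:

\[  \int _{0}^{\infty } e^{-(\mu _{i}+\theta )\tau _{i}}\tau _{i}^{\theta +y_{i}-1}d\tau _{i} = \frac{\Gamma (\theta +y_{i})}{(\mu _{i}+\theta )^{\theta +y_{i}}}  \]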

Making the substitution $\alpha =\frac{1}{\theta }$ ($\alpha > 0$), the negative binomial distribution can then be rewritten as

\[  f(y_{i}|\mathbf{x}_{i}) = \frac{\Gamma (y_{i}+\alpha ^{-1})}{y_{i}!\Gamma (\alpha ^{-1})} \left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}\right)^{\alpha ^{-1}} \left(\frac{\mu _{i}}{\alpha ^{-1}+\mu _{i}}\right)^{y_{i}}, \quad y_{i} = 0,1,2,\ldots  \]
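As a consistency check, letting $\alpha \to 0$ gives $\left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}\right)^{\alpha ^{-1}} = (1+\alpha \mu _{i})^{-1/\alpha } \to e^{-\mu _{i}}$, while $\frac{\Gamma (y_{i}+\alpha ^{-1})}{\Gamma (\alpha ^{-1})(\alpha ^{-1}+\mu _{i})^{y_{i}}} = \prod _{j=0}^{y_{i}-1}\frac{j+\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}$ is a product of $y_{i}$ factors each tending to 1, so that

\[  f(y_{i}|\mathbf{x}_{i}) \to \frac{e^{-\mu _{i}}\mu _{i}^{y_{i}}}{y_{i}!}  \]

which is the Poisson density. This limit underlies the test of the Poisson restriction discussed below.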

Thus, the negative binomial distribution is derived as a gamma mixture of Poisson random variables. It has conditional mean

\[  E(y_{i}|\mathbf{x}_{i})=\mu _{i} = e^{\mathbf{x}_{i}\bbeta }  \]

and conditional variance

\[  V(y_{i}|\mathbf{x}_{i}) = \mu _{i} [1+\frac{1}{\theta }\mu _{i}] = \mu _{i}[1+\alpha \mu _{i}] > E(y_{i}|\mathbf{x}_{i})  \]
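These two moments follow from the laws of total expectation and total variance, because $y_{i}$ given $\mathbf{x}_{i}$ and $\tau _{i}$ is Poisson with mean and variance $\mu _{i}\tau _{i}$, and $E(\tau _{i})=1$, $V(\tau _{i})=1/\theta $:

\[  V(y_{i}|\mathbf{x}_{i}) = E_{\tau }\left[V(y_{i}|\mathbf{x}_{i},\tau _{i})\right] + V_{\tau }\left[E(y_{i}|\mathbf{x}_{i},\tau _{i})\right] = \mu _{i}E(\tau _{i}) + \mu _{i}^{2}V(\tau _{i}) = \mu _{i} + \frac{\mu _{i}^{2}}{\theta }  \]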

The conditional variance of the negative binomial distribution exceeds the conditional mean; this overdispersion results from the neglected unobserved heterogeneity. The negative binomial model with variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i}+\alpha \mu _{i}^2$, which is quadratic in the mean, is referred to as the NEGBIN2 model (Cameron and Trivedi 1986). To estimate this model, specify DIST=NEGBIN(p=2) in the MODEL statement. The Poisson distribution is a special case of the negative binomial distribution where $\alpha =\frac{1}{\theta }=0$. A test of the Poisson distribution can be carried out by testing the hypothesis that $\alpha =0$. A Wald test of this hypothesis is provided (it is the reported $t$ statistic for the estimated $\alpha $ in the negative binomial model).
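For example, the following statements fit a NEGBIN2 model; the data set docvisits and the variables visits, age, and income are hypothetical placeholders:

   proc countreg data=docvisits;
      /* NEGBIN2: variance is mu + alpha*mu**2 */
      model visits = age income / dist=negbin(p=2);
   run;

The row for the estimated dispersion parameter $\alpha $ in the "Parameter Estimates" table reports the $t$ statistic that serves as the Wald test of the Poisson restriction $\alpha = 0$.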

The log-likelihood function of the negative binomial regression model (NEGBIN2) is given by

\begin{align*}
\mathcal{L} = \sum _{i=1}^{N} w_{i}\Bigg\{  & \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) -\ln (y_{i}!) \\
& - (y_{i}+\alpha ^{-1}) \ln (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta )) +y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta \Bigg\}
\end{align*}

where use is made of the identity

\[  \Gamma (y+a)/\Gamma (a)= \prod _{j=0}^{y-1} (j+a)  \]

which holds when $y$ is a nonnegative integer. See Poisson Regression for the definition of $w_{i}$.
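To see how the remaining terms arise, note that with $\mu _{i} = \exp (\mathbf{x}_{i}^{\prime }\bbeta )$, $\ln \left(\alpha ^{-1}/(\alpha ^{-1}+\mu _{i})\right) = -\ln (1+\alpha \mu _{i})$ and $\ln \left(\mu _{i}/(\alpha ^{-1}+\mu _{i})\right) = \ln (\alpha ) + \mathbf{x}_{i}^{\prime }\bbeta - \ln (1+\alpha \mu _{i})$, so the logarithm of the density collects to

\[  \ln f(y_{i}|\mathbf{x}_{i}) = \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) - \ln (y_{i}!) - (y_{i}+\alpha ^{-1})\ln (1+\alpha \mu _{i}) + y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta  \]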

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_{i}\frac{y_{i}-\mu _{i}}{1+\alpha \mu _{i}} \mathbf{x}_{i}  \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \alpha ^{-2} \sum _{j=0}^{y_{i}-1} \frac{1}{(j + \alpha ^{-1})} + \alpha ^{-2} \ln (1+\alpha \mu _{i}) + \frac{y_{i}-\mu _{i}}{\alpha (1+\alpha \mu _{i})}\right\}   \]
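The last term combines the product-rule piece $-(y_{i}+\alpha ^{-1})\mu _{i}/(1+\alpha \mu _{i})$ from differentiating $-(y_{i}+\alpha ^{-1})\ln (1+\alpha \mu _{i})$ with $\partial \left[y_{i}\ln (\alpha )\right]/\partial \alpha = y_{i}/\alpha $:

\[  \frac{y_{i}}{\alpha } - \frac{(y_{i}+\alpha ^{-1})\mu _{i}}{1+\alpha \mu _{i}} = \frac{y_{i}(1+\alpha \mu _{i}) - \alpha \mu _{i}y_{i} - \mu _{i}}{\alpha (1+\alpha \mu _{i})} = \frac{y_{i}-\mu _{i}}{\alpha (1+\alpha \mu _{i})}  \]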

Cameron and Trivedi (1986) consider a general class of negative binomial models with mean $\mu _{i}$ and variance function $\mu _{i}+\alpha \mu _{i}^ p$. The NEGBIN2 model, with $p =2$, is the standard formulation of the negative binomial model. Models with other values of $p$, $- \infty < p < \infty $, have the same density $f(y_{i}|\mathbf{x}_{i})$ except that $\alpha ^{-1}$ is replaced everywhere by $\alpha ^{-1} \mu _{i}^{2-p}$. The NEGBIN1 model, which sets $p=1$, has variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i} + \alpha \mu _{i}$, which is linear in the mean. To estimate this model, specify DIST=NEGBIN(p=1) in the MODEL statement.
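Continuing the hypothetical docvisits example above, only the DIST= suboption changes:

   proc countreg data=docvisits;
      /* NEGBIN1: variance is mu + alpha*mu, linear in the mean */
      model visits = age income / dist=negbin(p=1);
   run;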

The log-likelihood function of the NEGBIN1 regression model is given by

\begin{align*}
\mathcal{L} = \sum _{i=1}^{N} w_{i}\Bigg\{  & \sum _{j=0}^{y_{i}-1} \ln \left(j+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta ) \right) \\
& -\ln (y_{i}!) -\left(y_{i}+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta )\right)\ln (1+\alpha ) + y_{i}\ln (\alpha ) \Bigg\}
\end{align*}

See Poisson Regression for the definition of $w_{i}$.

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_{i}\left\{  \left( \sum _{j=0}^{y_{i}-1} \frac{\mu _{i}}{(j \alpha + \mu _{i})} \right) \mathbf{x}_{i} - \alpha ^{-1} \ln (1+\alpha )\mu _{i} \mathbf{x}_{i} \right\}   \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \left( \sum _{j=0}^{y_{i}-1} \frac{\alpha ^{-1}\mu _{i}}{(j \alpha + \mu _{i})} \right) + \alpha ^{-2} \mu _{i} \ln (1+\alpha ) - \frac{(y_{i}+\alpha ^{-1}\mu _{i})}{1+\alpha } + \frac{y_{i}}{\alpha } \right\}   \]
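The second and third terms are the two pieces of the product-rule derivative

\[  \frac{\partial }{\partial \alpha }\left[ -\left(y_{i}+\alpha ^{-1}\mu _{i}\right)\ln (1+\alpha ) \right] = \alpha ^{-2}\mu _{i}\ln (1+\alpha ) - \frac{y_{i}+\alpha ^{-1}\mu _{i}}{1+\alpha }  \]

since $\partial (\alpha ^{-1}\mu _{i})/\partial \alpha = -\alpha ^{-2}\mu _{i}$.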