The COUNTREG Procedure

Negative Binomial Regression

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation $i$. Thus, the individuals are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as

\[  E(y_{i}|\mathbf{x}_{i}, \tau _{i}) = \mu _{i} \tau _{i} = e^{\mathbf{x}_{i}\bbeta + \epsilon _{i}}  \]

where the unobserved heterogeneity term $\tau _{i} = e^{\epsilon _{i}}$ is independent of the vector of regressors $\mathbf{x}_{i}$. Then the distribution of $y_{i}$ conditional on $\mathbf{x}_{i}$ and $\tau _{i}$ is Poisson with conditional mean and conditional variance $\mu _{i} \tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i},\tau _{i}) = \frac{\exp (-\mu _{i}\tau _{i}) (\mu _{i}\tau _{i})^{y_{i}}}{y_{i}!}  \]

Let $g(\tau _{i})$ be the probability density function of $\tau _{i}$. Then, the distribution $f(y_{i}|\mathbf{x}_{i})$ (no longer conditional on $\tau _{i}$) is obtained by integrating $f(y_{i}|\mathbf{x}_{i},\tau _{i})$ with respect to $\tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i}) = \int _{0}^{\infty } f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i}) d\tau _{i}  \]

An analytical solution to this integral exists when $\tau _{i}$ is assumed to follow a gamma distribution. This solution is the negative binomial distribution. When the model contains a constant term, it is necessary to assume that $E(e^{\epsilon _{i}}) = E(\tau _{i}) =1$, in order to identify the mean of the distribution. Thus, it is assumed that $\tau _{i}$ follows a gamma($\theta , \theta $) distribution with $E(\tau _{i}) =1$ and $V(\tau _{i}) = 1 \slash \theta $,

\[  g(\tau _{i}) = \frac{\theta ^{\theta }}{\Gamma (\theta )} \tau _{i}^{\theta -1}\exp (-\theta \tau _{i})  \]

where $\Gamma (x)=\int _{0}^{\infty }z^{x-1}\exp (-z)dz$ is the gamma function and $\theta $ is a positive parameter. Then, the density of $y_{i}$ given $\mathbf{x}_{i}$ is derived as

\begin{align*}
f(y_{i}|\mathbf{x}_{i}) &= \int _{0}^{\infty }f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i})d\tau _{i} \\
&= \frac{\theta ^{\theta } \mu _{i}^{y_{i}}}{y_{i}!\Gamma (\theta )} \int _{0}^{\infty } e^{-(\mu _{i}+\theta )\tau _{i}}\tau _{i}^{\theta +y_{i}-1}d\tau _{i} \\
&= \frac{\theta ^{\theta }\mu _{i}^{y_{i}}\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )(\theta +\mu _{i})^{\theta +y_{i}}} \\
&= \frac{\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )} \left(\frac{\theta }{\theta +\mu _{i}}\right)^{\theta } \left(\frac{\mu _{i}}{\theta +\mu _{i}}\right)^{y_{i}}
\end{align*}
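The second equality collects the factors of $f(y_{i}|\mathbf{x}_{i},\tau _{i})\, g(\tau _{i})$ that do not involve $\tau _{i}$; the third applies the gamma-integral identity obtained by substituting $z = (\mu _{i}+\theta )\tau _{i}$ in the definition of $\Gamma $:

\[  \int _{0}^{\infty } e^{-(\mu _{i}+\theta )\tau _{i}}\tau _{i}^{\theta +y_{i}-1}d\tau _{i} = \frac{\Gamma (\theta +y_{i})}{(\mu _{i}+\theta )^{\theta +y_{i}}}  \]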

Making the substitution $\alpha =\frac{1}{\theta }$ ($\alpha > 0$), the negative binomial distribution can then be rewritten as

\[  f(y_{i}|\mathbf{x}_{i}) = \frac{\Gamma (y_{i}+\alpha ^{-1})}{y_{i}!\Gamma (\alpha ^{-1})} \left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}\right)^{\alpha ^{-1}} \left(\frac{\mu _{i}}{\alpha ^{-1}+\mu _{i}}\right)^{y_{i}}, \quad y_{i} = 0,1,2,\ldots  \]
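As a consistency check, letting $\alpha \to 0$ gives $\left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}\right)^{\alpha ^{-1}} = (1+\alpha \mu _{i})^{-1/\alpha } \to e^{-\mu _{i}}$, while $\frac{\Gamma (y_{i}+\alpha ^{-1})}{\Gamma (\alpha ^{-1})(\alpha ^{-1}+\mu _{i})^{y_{i}}} = \prod _{j=0}^{y_{i}-1}\frac{j+\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}$ is a product of $y_{i}$ factors each tending to 1, so that

\[  f(y_{i}|\mathbf{x}_{i}) \to \frac{e^{-\mu _{i}}\mu _{i}^{y_{i}}}{y_{i}!}  \]

which is the Poisson density. This limit underlies the test of the Poisson restriction discussed below.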

Thus, the negative binomial distribution is derived as a gamma mixture of Poisson random variables. It has conditional mean

\[  E(y_{i}|\mathbf{x}_{i})=\mu _{i} = e^{\mathbf{x}_{i}\bbeta }  \]

and conditional variance

\[  V(y_{i}|\mathbf{x}_{i}) = \mu _{i} [1+\frac{1}{\theta }\mu _{i}] = \mu _{i}[1+\alpha \mu _{i}] > E(y_{i}|\mathbf{x}_{i})  \]
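These two moments follow from the laws of total expectation and total variance, because $y_{i}$ given $\mathbf{x}_{i}$ and $\tau _{i}$ is Poisson with mean and variance $\mu _{i}\tau _{i}$, and $E(\tau _{i})=1$, $V(\tau _{i})=1/\theta $:

\[  V(y_{i}|\mathbf{x}_{i}) = E_{\tau }\left[V(y_{i}|\mathbf{x}_{i},\tau _{i})\right] + V_{\tau }\left[E(y_{i}|\mathbf{x}_{i},\tau _{i})\right] = \mu _{i}E(\tau _{i}) + \mu _{i}^{2}V(\tau _{i}) = \mu _{i} + \frac{\mu _{i}^{2}}{\theta }  \]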

The conditional variance of the negative binomial distribution exceeds the conditional mean; this overdispersion results from the neglected unobserved heterogeneity. The negative binomial model with variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i}+\alpha \mu _{i}^2$, which is quadratic in the mean, is referred to as the NEGBIN2 model (Cameron and Trivedi 1986). To estimate this model, specify DIST=NEGBIN(p=2) in the MODEL statement. The Poisson distribution is a special case of the negative binomial distribution where $\alpha =\frac{1}{\theta }=0$. A test of the Poisson distribution can be carried out by testing the hypothesis that $\alpha =0$. A Wald test of this hypothesis is provided (it is the reported $t$ statistic for the estimated $\alpha $ in the negative binomial model).
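For example, the following statements fit a NEGBIN2 model; the data set docvisits and the variables visits, age, and income are hypothetical placeholders:

   proc countreg data=docvisits;
      /* NEGBIN2: variance is mu + alpha*mu**2 */
      model visits = age income / dist=negbin(p=2);
   run;

The row for the estimated dispersion parameter $\alpha $ in the "Parameter Estimates" table reports the $t$ statistic that serves as the Wald test of the Poisson restriction $\alpha = 0$.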

The log-likelihood function of the negative binomial regression model (NEGBIN2) is given by

\begin{align*}
\mathcal{L} = \sum _{i=1}^{N} w_{i}\Bigg\{  & \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) -\ln (y_{i}!) \\
& - (y_{i}+\alpha ^{-1}) \ln (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta )) +y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta \Bigg\}
\end{align*}

where use is made of the identity

\[  \Gamma (y+a)/\Gamma (a)= \prod _{j=0}^{y-1} (j+a)  \]

which holds when $y$ is a nonnegative integer. See Poisson Regression for the definition of $w_{i}$.
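To see how the remaining terms arise, note that with $\mu _{i} = \exp (\mathbf{x}_{i}^{\prime }\bbeta )$, $\ln \left(\alpha ^{-1}/(\alpha ^{-1}+\mu _{i})\right) = -\ln (1+\alpha \mu _{i})$ and $\ln \left(\mu _{i}/(\alpha ^{-1}+\mu _{i})\right) = \ln (\alpha ) + \mathbf{x}_{i}^{\prime }\bbeta - \ln (1+\alpha \mu _{i})$, so the logarithm of the density collects to

\[  \ln f(y_{i}|\mathbf{x}_{i}) = \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) - \ln (y_{i}!) - (y_{i}+\alpha ^{-1})\ln (1+\alpha \mu _{i}) + y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta  \]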

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_{i}\frac{y_{i}-\mu _{i}}{1+\alpha \mu _{i}} \mathbf{x}_{i}  \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \alpha ^{-2} \sum _{j=0}^{y_{i}-1} \frac{1}{(j + \alpha ^{-1})} + \alpha ^{-2} \ln (1+\alpha \mu _{i}) + \frac{y_{i}-\mu _{i}}{\alpha (1+\alpha \mu _{i})}\right\}   \]
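The last term combines the product-rule piece $-(y_{i}+\alpha ^{-1})\mu _{i}/(1+\alpha \mu _{i})$ from differentiating $-(y_{i}+\alpha ^{-1})\ln (1+\alpha \mu _{i})$ with $\partial \left[y_{i}\ln (\alpha )\right]/\partial \alpha = y_{i}/\alpha $:

\[  \frac{y_{i}}{\alpha } - \frac{(y_{i}+\alpha ^{-1})\mu _{i}}{1+\alpha \mu _{i}} = \frac{y_{i}(1+\alpha \mu _{i}) - \alpha \mu _{i}y_{i} - \mu _{i}}{\alpha (1+\alpha \mu _{i})} = \frac{y_{i}-\mu _{i}}{\alpha (1+\alpha \mu _{i})}  \]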

Cameron and Trivedi (1986) consider a general class of negative binomial models with mean $\mu _{i}$ and variance function $\mu _{i}+\alpha \mu _{i}^ p$. The NEGBIN2 model, with $p =2$, is the standard formulation of the negative binomial model. Models with other values of $p$, $- \infty < p < \infty $, have the same density $f(y_{i}|\mathbf{x}_{i})$ except that $\alpha ^{-1}$ is replaced everywhere by $\alpha ^{-1} \mu _{i}^{2-p}$. The NEGBIN1 model, which sets $p=1$, has variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i} + \alpha \mu _{i}$, which is linear in the mean. To estimate this model, specify DIST=NEGBIN(p=1) in the MODEL statement.
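Continuing the hypothetical docvisits example above, only the DIST= suboption changes:

   proc countreg data=docvisits;
      /* NEGBIN1: variance is mu + alpha*mu, linear in the mean */
      model visits = age income / dist=negbin(p=1);
   run;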

The log-likelihood function of the NEGBIN1 regression model is given by

\begin{align*}
\mathcal{L} = \sum _{i=1}^{N} w_{i}\Bigg\{  & \sum _{j=0}^{y_{i}-1} \ln \left(j+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta ) \right) \\
& -\ln (y_{i}!) -\left(y_{i}+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta )\right)\ln (1+\alpha ) + y_{i}\ln (\alpha ) \Bigg\}
\end{align*}

See Poisson Regression for the definition of $w_{i}$.

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_{i}\left\{  \left( \sum _{j=0}^{y_{i}-1} \frac{\mu _{i}}{(j \alpha + \mu _{i})} \right) \mathbf{x}_{i} - \alpha ^{-1} \ln (1+\alpha )\mu _{i} \mathbf{x}_{i} \right\}   \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \left( \sum _{j=0}^{y_{i}-1} \frac{\alpha ^{-1}\mu _{i}}{(j \alpha + \mu _{i})} \right) + \alpha ^{-2} \mu _{i} \ln (1+\alpha ) - \frac{(y_{i}+\alpha ^{-1}\mu _{i})}{1+\alpha } + \frac{y_{i}}{\alpha } \right\}   \]
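The second and third terms are the two pieces of the product-rule derivative

\[  \frac{\partial }{\partial \alpha }\left[ -\left(y_{i}+\alpha ^{-1}\mu _{i}\right)\ln (1+\alpha ) \right] = \alpha ^{-2}\mu _{i}\ln (1+\alpha ) - \frac{y_{i}+\alpha ^{-1}\mu _{i}}{1+\alpha }  \]

since $\partial (\alpha ^{-1}\mu _{i})/\partial \alpha = -\alpha ^{-2}\mu _{i}$.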