PROC COUNTREG: Negative Binomial Regression :: SAS/ETS(R) 9.2 User's Guide

The COUNTREG Procedure

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation $i$. Thus, the individuals are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as

$E(y_i \mid x_i, \tau_i) = \mu_i \tau_i = e^{x_i'\beta + \epsilon_i}$

where the unobserved heterogeneity term $\tau_i = e^{\epsilon_i}$ is independent of the vector of regressors $x_i$. Then the distribution of $y_i$ conditional on $x_i$ and $\tau_i$ is Poisson with conditional mean and conditional variance $\mu_i \tau_i$:

$f(y_i \mid x_i, \tau_i) = \frac{\exp(-\mu_i \tau_i)\,(\mu_i \tau_i)^{y_i}}{y_i!}$

Let $g(\tau_i)$ be the probability density function of $\tau_i$. Then, the distribution $f(y_i \mid x_i)$ (no longer conditional on $\tau_i$) is obtained by integrating $f(y_i \mid x_i, \tau_i)$ with respect to $\tau_i$:

$f(y_i \mid x_i) = \int_0^\infty f(y_i \mid x_i, \tau_i)\, g(\tau_i)\, d\tau_i$

An analytical solution to this integral exists when $\tau_i$ is assumed to follow a gamma distribution. This solution is the negative binomial distribution. When the model contains a constant term, it is necessary to assume that $E(e^{\epsilon_i}) = E(\tau_i) = 1$, in order to identify the mean of the distribution. Thus, it is assumed that $\tau_i$ follows a gamma($\theta, \theta$) distribution with $E(\tau_i) = 1$ and $V(\tau_i) = 1/\theta$:

$g(\tau_i) = \frac{\theta^\theta}{\Gamma(\theta)}\, \tau_i^{\theta - 1} \exp(-\theta \tau_i)$

where $\Gamma(x) = \int_0^\infty z^{x-1} \exp(-z)\, dz$ is the gamma function and $\theta$ is a positive parameter. Then, the density of $y_i$ given $x_i$ is derived as

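$\begin{aligned}
f(y_i \mid x_i) &= \int_0^\infty f(y_i \mid x_i, \tau_i)\, g(\tau_i)\, d\tau_i \\
&= \frac{\theta^\theta \mu_i^{y_i}}{y_i!\, \Gamma(\theta)} \int_0^\infty e^{-(\mu_i + \theta)\tau_i}\, \tau_i^{\theta + y_i - 1}\, d\tau_i \\
&= \frac{\theta^\theta \mu_i^{y_i}\, \Gamma(y_i + \theta)}{y_i!\, \Gamma(\theta)\, (\theta + \mu_i)^{\theta + y_i}} \\
&= \frac{\Gamma(y_i + \theta)}{y_i!\, \Gamma(\theta)} \left(\frac{\theta}{\theta + \mu_i}\right)^{\theta} \left(\frac{\mu_i}{\theta + \mu_i}\right)^{y_i}
\end{aligned}$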
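As a numerical sanity check on the gamma mixture result, the following Python sketch integrates the Poisson density against a gamma($\theta, \theta$) density with a simple midpoint rule and compares the value with the closed form $\frac{\Gamma(y+\theta)}{y!\,\Gamma(\theta)} \left(\frac{\theta}{\theta+\mu}\right)^{\theta} \left(\frac{\mu}{\theta+\mu}\right)^{y}$. It is not part of PROC COUNTREG; the function names, the integration limit, and the values $y = 4$, $\mu = 3$, $\theta = 2$ are assumptions chosen only for illustration.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability of count y for a given positive mean."""
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))

def gamma_density(tau, theta):
    """Gamma(theta, theta) density, which has mean 1 and variance 1/theta."""
    return math.exp(theta * math.log(theta) - math.lgamma(theta)
                    + (theta - 1.0) * math.log(tau) - theta * tau)

def mixture_by_quadrature(y, mu, theta, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of the mixing integral over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h  # midpoint of the k-th subinterval, always > 0
        total += poisson_pmf(y, mu * tau) * gamma_density(tau, theta)
    return total * h

def negbin_closed_form(y, mu, theta):
    """Closed-form negative binomial density in the theta parameterization."""
    log_prob = (math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
                + theta * math.log(theta / (theta + mu))
                + y * math.log(mu / (theta + mu)))
    return math.exp(log_prob)

# Illustrative values only (assumed, not taken from the guide).
numeric = mixture_by_quadrature(y=4, mu=3.0, theta=2.0)
exact = negbin_closed_form(y=4, mu=3.0, theta=2.0)
print(numeric, exact)
```

The two numbers should agree to several decimal places; the truncation at `upper` is harmless here because the integrand decays exponentially in $\tau$.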

Making the substitution $\alpha = 1/\theta$ ($\alpha > 0$), the negative binomial distribution can then be rewritten as

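$f(y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{y_i!\, \Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}} \left(\frac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y_i}, \quad y_i = 0, 1, 2, \ldots$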

Thus, the negative binomial distribution is derived as a gamma mixture of Poisson random variables. It has conditional mean

$E(y_i \mid x_i) = \mu_i = e^{x_i'\beta}$

and conditional variance

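$V(y_i \mid x_i) = \mu_i \left[1 + \frac{1}{\theta}\mu_i\right] = \mu_i \left[1 + \alpha \mu_i\right] > E(y_i \mid x_i)$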
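The mean and variance can be checked directly from the density. The sketch below, which assumes the $\alpha = 1/\theta$ parameterization and uses hypothetical names and values ($\mu = 3$, $\alpha = 0.5$), sums the negative binomial probabilities over a long range of counts and compares the resulting moments with $\mu$ and $\mu(1 + \alpha\mu)$.

```python
import math

def negbin2_pmf(y, mu, alpha):
    """Negative binomial (NEGBIN2) probability of count y, with alpha = 1/theta."""
    inv = 1.0 / alpha
    log_prob = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
                + inv * math.log(inv / (inv + mu))
                + y * math.log(mu / (inv + mu)))
    return math.exp(log_prob)

def moments(mu, alpha, y_max=500):
    """Total probability, mean, and variance from a truncated sum over y."""
    probs = [negbin2_pmf(y, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return total, mean, var

# Illustrative values only (assumed, not taken from the guide).
mu, alpha = 3.0, 0.5
total, mean, var = moments(mu, alpha)
print(total, mean, var, mu * (1.0 + alpha * mu))
```

With these values the variance is 7.5, two and a half times the mean of 3, which is the overdispersion that a Poisson model cannot represent.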

The conditional variance of the negative binomial distribution exceeds the conditional mean. Overdispersion results from neglected unobserved heterogeneity. The negative binomial model with variance function $V(y_i \mid x_i) = \mu_i + \alpha \mu_i^2$, which is quadratic in the mean, is referred to as the NEGBIN2 model (Cameron and Trivedi 1986). To estimate this model, specify DIST=NEGBIN(p=2) in the MODEL statement. The Poisson distribution is a special case of the negative binomial distribution where $\alpha = 0$. A test of the Poisson distribution can be carried out by testing the hypothesis that $\alpha = 1/\theta = 0$. A Wald test of this hypothesis is provided (it is the reported $t$ statistic for the estimated $\alpha$ in the negative binomial model).

The log-likelihood function of the negative binomial regression model (NEGBIN2) is given by

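$\begin{aligned}
\mathcal{L} = \sum_{i=1}^{N} \Bigg\{ &\sum_{j=0}^{y_i - 1} \ln(j + \alpha^{-1}) - \ln(y_i!) \\
&- (y_i + \alpha^{-1}) \ln\left(1 + \alpha \exp(x_i'\beta)\right) + y_i \ln(\alpha) + y_i x_i'\beta \Bigg\}
\end{aligned}$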

where use of the following fact is made:

$\frac{\Gamma(y + a)}{\Gamma(a)} = \prod_{j=0}^{y-1} (j + a)$

if $y$ is an integer.
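To illustrate how the product identity enters the log likelihood, the following sketch evaluates the NEGBIN2 log likelihood in two ways: once with the sum $\sum_{j=0}^{y_i-1} \ln(j + \alpha^{-1})$, and once by summing the log density written with log-gamma functions. The two should match up to rounding. The small data set, the coefficient values, and the function names are hypothetical; this is not how PROC COUNTREG is implemented.

```python
import math

def loglik_product_form(beta, alpha, X, y):
    """NEGBIN2 log likelihood using sum_{j<y} ln(j + 1/alpha) for the gamma ratio."""
    inv = 1.0 / alpha
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))  # linear predictor x_i'beta
        total += (sum(math.log(j + inv) for j in range(y_i))
                  - math.lgamma(y_i + 1)             # ln(y_i!)
                  - (y_i + inv) * math.log1p(alpha * math.exp(xb))
                  + y_i * math.log(alpha)
                  + y_i * xb)
    return total

def loglik_lgamma_form(beta, alpha, X, y):
    """The same log likelihood, summing the log density with log-gamma terms."""
    inv = 1.0 / alpha
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, x_i)))
        total += (math.lgamma(y_i + inv) - math.lgamma(inv) - math.lgamma(y_i + 1)
                  + inv * math.log(inv / (inv + mu))
                  + y_i * math.log(mu / (inv + mu)))
    return total

# Hypothetical data: an intercept column plus one regressor, four observations.
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7], [1.0, 1.5]]
y = [0, 2, 1, 5]
beta = [0.3, 0.5]
alpha = 0.8

ll_product = loglik_product_form(beta, alpha, X, y)
ll_lgamma = loglik_lgamma_form(beta, alpha, X, y)
print(ll_product, ll_lgamma)
```

For $y_i = 0$ the inner sum is empty and contributes zero, which corresponds to an empty product equal to 1 in the identity.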

The gradient is

$\frac{\partial \mathcal{L}}{\partial \beta} = \sum_{i=1}^{N} \frac{y_i - \mu_i}{1 + \alpha \mu_i}\, x_i$

and

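$\frac{\partial \mathcal{L}}{\partial \alpha} = \sum_{i=1}^{N} \left\{ -\alpha^{-2} \sum_{j=0}^{y_i - 1} \frac{1}{j + \alpha^{-1}} + \alpha^{-2} \ln(1 + \alpha \mu_i) + \frac{y_i - \mu_i}{\alpha (1 + \alpha \mu_i)} \right\}$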
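A common way to validate analytic derivatives is to compare them with central finite differences of the log likelihood. The sketch below does this for the NEGBIN2 model, using $\sum_i \frac{y_i - \mu_i}{1 + \alpha\mu_i} x_i$ for $\beta$ and $\sum_i \{ -\alpha^{-2} \sum_{j<y_i} \frac{1}{j + \alpha^{-1}} + \alpha^{-2}\ln(1 + \alpha\mu_i) + \frac{y_i - \mu_i}{\alpha(1 + \alpha\mu_i)} \}$ for $\alpha$. The data, parameter values, step size, and function names are assumptions made for illustration only.

```python
import math

def loglik(beta, alpha, X, y):
    """NEGBIN2 log likelihood with mu_i = exp(x_i'beta)."""
    inv = 1.0 / alpha
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        total += (sum(math.log(j + inv) for j in range(y_i))
                  - math.lgamma(y_i + 1)
                  - (y_i + inv) * math.log1p(alpha * math.exp(xb))
                  + y_i * math.log(alpha) + y_i * xb)
    return total

def gradient(beta, alpha, X, y):
    """Analytic derivatives of the log likelihood with respect to beta and alpha."""
    inv = 1.0 / alpha
    g_beta = [0.0] * len(beta)
    g_alpha = 0.0
    for x_i, y_i in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, x_i)))
        w = (y_i - mu) / (1.0 + alpha * mu)  # weight shared by both derivatives
        for k, v in enumerate(x_i):
            g_beta[k] += w * v
        g_alpha += (-inv ** 2 * sum(1.0 / (j + inv) for j in range(y_i))
                    + inv ** 2 * math.log1p(alpha * mu)
                    + w / alpha)
    return g_beta, g_alpha

def numeric_gradient(beta, alpha, X, y, h=1e-6):
    """Central finite differences of loglik, one parameter at a time."""
    g_beta = []
    for k in range(len(beta)):
        up, dn = list(beta), list(beta)
        up[k] += h
        dn[k] -= h
        g_beta.append((loglik(up, alpha, X, y) - loglik(dn, alpha, X, y)) / (2 * h))
    g_alpha = (loglik(beta, alpha + h, X, y) - loglik(beta, alpha - h, X, y)) / (2 * h)
    return g_beta, g_alpha

# Hypothetical data: an intercept column plus one regressor, four observations.
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7], [1.0, 1.5]]
y = [0, 2, 1, 5]
beta, alpha = [0.3, 0.5], 0.8

analytic_beta, analytic_alpha = gradient(beta, alpha, X, y)
numeric_beta, numeric_alpha = numeric_gradient(beta, alpha, X, y)
print(analytic_beta, numeric_beta)
print(analytic_alpha, numeric_alpha)
```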

Cameron and Trivedi (1986) consider a general class of negative binomial models with mean $\mu_i$ and variance function $\mu_i + \alpha \mu_i^p$. The NEGBIN2 model, with $p = 2$, is the standard formulation of the negative binomial model. Models with other values of $p$, $-\infty < p < \infty$, have the same density $f(y_i \mid x_i)$ except that $\alpha^{-1}$ is replaced everywhere by $\alpha^{-1} \mu_i^{2-p}$. The negative binomial model NEGBIN1, which sets $p = 1$, has variance function $V(y_i \mid x_i) = \mu_i + \alpha \mu_i$, which is linear in the mean. To estimate this model, specify DIST=NEGBIN(p=1) in the MODEL statement.
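The effect of the power $p$ on the variance function can be seen numerically. The following sketch assumes the general density is obtained by replacing $\alpha^{-1}$ with $\alpha^{-1}\mu^{2-p}$ in the NEGBIN2 density, then computes the variance by direct summation for $p = 1$ and $p = 2$ and compares it with $\mu + \alpha\mu^p$. The function names and the values $\mu = 3$, $\alpha = 0.5$ are hypothetical.

```python
import math

def negbin_p_pmf(y, mu, alpha, p):
    """Negative binomial density with 1/alpha replaced by mu**(2 - p) / alpha."""
    theta = mu ** (2.0 - p) / alpha  # equals 1/alpha when p = 2
    log_prob = (math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
                + theta * math.log(theta / (theta + mu))
                + y * math.log(mu / (theta + mu)))
    return math.exp(log_prob)

def variance_by_summation(mu, alpha, p, y_max=500):
    """Variance of the count from a truncated sum over y = 0, ..., y_max."""
    probs = [negbin_p_pmf(y, mu, alpha, p) for y in range(y_max + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    return sum((y - mean) ** 2 * q for y, q in enumerate(probs))

# Illustrative values only (assumed, not taken from the guide).
mu, alpha = 3.0, 0.5
var_negbin1 = variance_by_summation(mu, alpha, p=1)  # linear: mu + alpha * mu
var_negbin2 = variance_by_summation(mu, alpha, p=2)  # quadratic: mu + alpha * mu**2
print(var_negbin1, mu + alpha * mu)
print(var_negbin2, mu + alpha * mu ** 2)
```

For these values NEGBIN1 gives a variance of 4.5 and NEGBIN2 gives 7.5, so the same $\alpha$ implies different amounts of overdispersion depending on $p$.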