The HPCOUNTREG Procedure

Zero-Inflated Negative Binomial Regression

The zero-inflated negative binomial (ZINB) model in PROC HPCOUNTREG is based on the negative binomial model that has a quadratic variance function (the model that is fit when DIST=NEGBIN is specified in the MODEL or PROC HPCOUNTREG statement). The ZINB model is obtained by specifying a negative binomial distribution for the data generation process referred to earlier as Process 2:

\[  g(y_{i}) = \frac{\Gamma (y_{i}+\alpha ^{-1})}{y_{i}! \Gamma (\alpha ^{-1})}\left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}} \right)^{\alpha ^{-1}}\left(\frac{\mu _{i}}{\alpha ^{-1}+\mu _{i}} \right)^{y_{i}}  \]

Thus the ZINB model is defined to be

\begin{eqnarray*}  P(y_{i}=0|\mathbf{x}_{i},\mathbf{z}_{i}) & =&  F_{i} + \left(1 - F_{i}\right)(1+\alpha \mu _{i})^{-\alpha ^{-1}} \\ P(y_{i}|\mathbf{x}_{i},\mathbf{z}_{i}) & =&  \left(1- F_{i} \right) \frac{\Gamma (y_{i}+\alpha ^{-1})}{y_{i}! \Gamma (\alpha ^{-1})}\left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}} \right)^{\alpha ^{-1}} \\ &  \times &  \left(\frac{\mu _{i}}{\alpha ^{-1}+\mu _{i}} \right)^{y_{i}} , \quad y_{i}>0 \end{eqnarray*}
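The definition above can be checked numerically. The following Python sketch is not part of PROC HPCOUNTREG; the function names `nb_pmf` and `zinb_pmf` and the parameter values are hypothetical, and `f` stands in for $F_{i}$. It evaluates the two cases of the ZINB probability and sums them over a long range of counts to confirm that they form a probability distribution.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability g(y) with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    log_g = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
             + inv * math.log(inv / (inv + mu))
             + y * math.log(mu / (inv + mu)))
    return math.exp(log_g)

def zinb_pmf(y, mu, alpha, f):
    """ZINB probability; f is the zero-inflation probability (F_i in the text)."""
    g = nb_pmf(y, mu, alpha)
    if y == 0:
        return f + (1.0 - f) * g   # extra mass at zero
    return (1.0 - f) * g           # positive counts are scaled by 1 - F_i

# Hypothetical values, chosen only for illustration.
mu, alpha, f = 2.5, 0.8, 0.3

# The tail beyond 2000 is negligible for these values.
total = sum(zinb_pmf(y, mu, alpha, f) for y in range(2000))
p_zero = zinb_pmf(0, mu, alpha, f)
p_zero_closed_form = f + (1.0 - f) * (1.0 + alpha * mu) ** (-1.0 / alpha)

print(total)
print(p_zero, p_zero_closed_form)
```

The log-gamma form is used only to avoid overflow in $\Gamma (y_{i}+\alpha ^{-1})$ for large counts; it is algebraically the same expression as $g(y_{i})$.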

In this case, the conditional expectation ($E$) and conditional variance ($V$) of $y_{i}$ are

\[  E(y_{i}|\mathbf{x}_{i},\mathbf{z}_{i}) = \mu _{i}(1 -F_{i})  \]
\[  V(y_{i}|\mathbf{x}_{i},\mathbf{z}_{i}) = E(y_{i}|\mathbf{x}_{i},\mathbf{z}_{i})\left[1+\mu _{i} (F_{i}+\alpha ) \right]  \]

Like the ZIP model, the ZINB model exhibits overdispersion because the conditional variance exceeds the conditional mean.
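As an illustration of the mean and variance formulas, the Python sketch below computes both moments by direct summation over the ZINB probabilities and compares them with the closed forms. The helper name `zinb_pmf` and the parameter values are assumptions made for this example only.

```python
import math

def zinb_pmf(y, mu, alpha, f):
    """ZINB probability of count y; f plays the role of F_i."""
    inv = 1.0 / alpha
    log_g = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
             + inv * math.log(inv / (inv + mu))
             + y * math.log(mu / (inv + mu)))
    g = math.exp(log_g)
    return f + (1.0 - f) * g if y == 0 else (1.0 - f) * g

# Hypothetical parameter values, chosen only for illustration.
mu, alpha, f = 2.5, 0.8, 0.3

# Moments by direct summation; the tail beyond 2000 is negligible here.
support = range(2000)
mean_num = sum(y * zinb_pmf(y, mu, alpha, f) for y in support)
var_num = sum((y - mean_num) ** 2 * zinb_pmf(y, mu, alpha, f) for y in support)

# Closed forms from the text.
mean_formula = mu * (1.0 - f)
var_formula = mean_formula * (1.0 + mu * (f + alpha))

print(mean_num, mean_formula)
print(var_num, var_formula)
```

For these values the closed forms give a mean of about 1.75 and a variance of about 6.5625. The factor $1+\mu _{i}(F_{i}+\alpha )$ is greater than 1 whenever $\mu _{i}>0$ and $F_{i}+\alpha >0$, which is why the variance exceeds the mean.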

ZINB Model with Logistic Link Function

In this model, the probability $\varphi _{i}$ (the zero-inflation probability that appears as $F_{i}$ in the definition above) is given by the logistic function, namely

\[  \varphi _{i}=\frac{\exp (\mathbf{z}_{i}'\bgamma )}{1+\exp (\mathbf{z}_{i}'\bgamma )}  \]
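The step from this link function to a log likelihood can be sketched as follows, under the assumptions that $F_{i}$ in the ZINB definition is this same probability $\varphi _{i}$ and that $\mu _{i}=\exp (\mathbf{x}_{i}'\bbeta )$. Because

\[  1-\varphi _{i}=\frac{1}{1+\exp (\mathbf{z}_{i}'\bgamma )}  \]

the probability of a zero count becomes

\[  P(y_{i}=0|\mathbf{x}_{i},\mathbf{z}_{i}) = \frac{\exp (\mathbf{z}_{i}'\bgamma )+(1+\alpha \exp (\mathbf{x}_{i}'\bbeta ))^{-\alpha ^{-1}}}{1+\exp (\mathbf{z}_{i}'\bgamma )}  \]

so every observation, whether zero or positive, contributes $-\ln [1+\exp (\mathbf{z}_{i}'\bgamma )]$. For positive counts, the ratio of gamma functions reduces to a finite product,

\[  \frac{\Gamma (y_{i}+\alpha ^{-1})}{\Gamma (\alpha ^{-1})} = \prod _{j=0}^{y_{i}-1}(j+\alpha ^{-1})  \]

whose logarithm is a sum of $\ln (j+\alpha ^{-1})$ terms.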

The log-likelihood function is

\begin{eqnarray*}  \mathcal{L} &  = &  \sum _{\{ i: y_{i}=0\} } \ln \left[\exp (\mathbf{z}_{i}^{\prime }\bgamma )+(1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta ))^{-\alpha ^{-1}} \right] \\ &  + &  \sum _{\{ i: y_{i}>0\} } \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) \\ &  + &  \sum _{\{ i: y_{i}>0\} } \left\{  -\ln (y_{i}!) - (y_{i}+\alpha ^{-1}) \ln (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta )) +y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta \right\}  \\ &  - &  \sum _{i=1}^{N}\ln \left[ 1 + \exp (\mathbf{z}_{i}^{\prime }\bgamma ) \right] \end{eqnarray*}
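The log-likelihood function for the logistic link can be transcribed term by term. The Python sketch below is illustrative only and is not how PROC HPCOUNTREG evaluates the likelihood internally; the function name `zinb_loglik_logistic`, the toy data, and the parameter values are all hypothetical.

```python
import math

def zinb_loglik_logistic(y, x, z, beta, gamma, alpha):
    """ZINB log likelihood with a logistic link, following the four sums above."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, xi, zi in zip(y, x, z):
        xb = sum(a * b for a, b in zip(xi, beta))     # x_i' beta
        zg = sum(a * b for a, b in zip(zi, gamma))    # z_i' gamma
        log1p_amu = math.log1p(alpha * math.exp(xb))  # ln(1 + alpha * exp(x_i' beta))
        if yi == 0:
            # First sum: zero counts.
            total += math.log(math.exp(zg) + math.exp(-inv * log1p_amu))
        else:
            # Second sum: log of Gamma(y_i + 1/alpha) / Gamma(1/alpha).
            total += sum(math.log(j + inv) for j in range(yi))
            # Third sum: remaining negative binomial terms.
            total += (-math.lgamma(yi + 1) - (yi + inv) * log1p_amu
                      + yi * math.log(alpha) + yi * xb)
        # Fourth sum: runs over all observations.
        total -= math.log1p(math.exp(zg))
    return total

# Hypothetical toy data: an intercept plus one regressor in each part of the model.
y = [0, 3, 1, 0, 7]
x = [[1.0, 0.5], [1.0, 1.2], [1.0, -0.3], [1.0, 0.0], [1.0, 2.0]]
z = [[1.0, 0.1], [1.0, -0.4], [1.0, 0.7], [1.0, 1.5], [1.0, -1.0]]
beta, gamma, alpha = [0.4, 0.6], [-0.5, 0.8], 0.7

loglik = zinb_loglik_logistic(y, x, z, beta, gamma, alpha)
print(loglik)
```

Here `math.lgamma(yi + 1)` is used for $\ln (y_{i}!)$. A production implementation would also guard against overflow in `math.exp(zg)` for large linear predictors, which this sketch does not attempt.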

ZINB Model with Standard Normal Link Function

For this model, the probability $\varphi _{i}$ is given by the standard normal cumulative distribution function (the probit link): $\varphi _{i}= \Phi (\mathbf{z}_{i}^{\prime }\bgamma )$. The log-likelihood function is

\begin{eqnarray*}  \mathcal{L} &  = &  \sum _{\{ i: y_{i}=0\} } \ln \left\{  \Phi (\mathbf{z}_{i}^{\prime }\bgamma ) + \left[ 1 - \Phi (\mathbf{z}_{i}^{\prime }\bgamma ) \right] (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta ))^{-\alpha ^{-1}} \right\}  \\ &  + &  \sum _{\{ i: y_{i}>0\} } \ln \left[ 1 - \Phi (\mathbf{z}_{i}^{\prime }\bgamma ) \right] \\ &  + &  \sum _{\{ i: y_{i}>0\} } \sum _{j=0}^{y_{i}-1} \left\{  \ln (j+\alpha ^{-1})\right\}  \\ &  - &  \sum _{\{ i: y_{i}>0\} } \ln (y_{i}!) \\ &  - &  \sum _{\{ i: y_{i}>0\} } (y_{i}+\alpha ^{-1}) \ln (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta )) \\ &  + &  \sum _{\{ i: y_{i}>0\} } y_{i}\ln (\alpha ) \\ &  + &  \sum _{\{ i: y_{i}>0\} }y_{i} \mathbf{x}_{i}^{\prime }\bbeta \end{eqnarray*}
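The same kind of transcription works for the standard normal link. In the Python sketch below, the names `std_normal_cdf` and `zinb_loglik_probit` and the toy data are hypothetical, $\Phi$ is computed from the error function, and the inner sum over $\ln (j+\alpha ^{-1})$ is replaced by the equivalent difference of log-gamma values. This is one possible way to evaluate the expression, not a description of the procedure's internals.

```python
import math

def std_normal_cdf(t):
    """Standard normal cumulative distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def zinb_loglik_probit(y, x, z, beta, gamma, alpha):
    """ZINB log likelihood with a standard normal link."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, xi, zi in zip(y, x, z):
        xb = sum(a * b for a, b in zip(xi, beta))     # x_i' beta
        zg = sum(a * b for a, b in zip(zi, gamma))    # z_i' gamma
        phi = std_normal_cdf(zg)
        log1p_amu = math.log1p(alpha * math.exp(xb))  # ln(1 + alpha * exp(x_i' beta))
        if yi == 0:
            total += math.log(phi + (1.0 - phi) * math.exp(-inv * log1p_amu))
        else:
            total += math.log(1.0 - phi)
            # Equals sum_{j=0}^{y_i - 1} ln(j + 1/alpha).
            total += math.lgamma(yi + inv) - math.lgamma(inv)
            total += (-math.lgamma(yi + 1) - (yi + inv) * log1p_amu
                      + yi * math.log(alpha) + yi * xb)
    return total

# Hypothetical toy data: an intercept plus one regressor in each part of the model.
y = [0, 3, 1, 0, 7]
x = [[1.0, 0.5], [1.0, 1.2], [1.0, -0.3], [1.0, 0.0], [1.0, 2.0]]
z = [[1.0, 0.1], [1.0, -0.4], [1.0, 0.7], [1.0, 1.5], [1.0, -1.0]]
beta, gamma, alpha = [0.4, 0.6], [-0.5, 0.8], 0.7

loglik = zinb_loglik_probit(y, x, z, beta, gamma, alpha)
print(loglik)
```

Unlike the logistic case, there is no common term over all observations: $\ln [1-\Phi (\cdot )]$ appears only for positive counts because the zero-count term does not factor. For very large values of the linear predictor, `1.0 - phi` can underflow to zero, which a more careful implementation would need to handle.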

For more information about the zero-inflated negative binomial regression model, see the section Zero-Inflated Negative Binomial Regression in SAS/ETS 13.2 User's Guide.