The HPCOUNTREG Procedure

Zero-Inflated Count Regression Overview

The main motivation for using zero-inflated count models is that count data observed in practice frequently display overdispersion and excess zeros. Zero-inflated count models can both model the excess zeros and allow for overdispersion. They assume two possible data generation processes for each observation, and the outcome of a Bernoulli trial determines which process generates it. For observation i, Process 1 is chosen with probability $\varphi _{i}$ and Process 2 with probability $1-\varphi _{i}$. Process 1 generates only zero counts. Process 2 generates counts from either a Poisson or a negative binomial model. In general,

$y_ i \sim \begin{cases} 0 & \quad \text {with probability}\quad \varphi _{i}\\ g(y_ i) & \quad \text {with probability}\quad 1-\varphi _{i} \end{cases}$

Therefore, the probability of $\{ Y_{i} = y_{i} \}$ can be described as

$\begin{eqnarray*} P(y_{i}=0|\mathbf{x}_{i}) & = & \varphi _{i} + (1-\varphi _{i})g(0) \\ P(y_{i}|\mathbf{x}_{i}) & = & (1-\varphi _{i})g(y_{i}), \quad y_{i}>0 \end{eqnarray*}$

where $g(\cdot )$ is the probability mass function of either the Poisson or the negative binomial distribution.
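The mixture formulas above can be checked numerically. The following Python sketch is illustrative only and does not reflect how PROC HPCOUNTREG computes anything: it evaluates $P(y_{i})$ for a zero-inflated Poisson (ZIP) and a zero-inflated negative binomial (ZINB) model with hypothetical parameter values, confirms that the probabilities sum to 1, and simulates the two-process mechanism directly. The negative binomial uses the NB2 parameterization (variance $\lambda + \alpha \lambda ^2$), which is an assumption here, and all function names are hypothetical. For the ZIP case the computed mean and variance should match $(1-\varphi )\lambda $ and $(1-\varphi )\lambda (1+\varphi \lambda )$; the variance exceeds the mean whenever $\varphi > 0$, so zero inflation alone already produces overdispersion.

```python
import math
import random

def poisson_pmf(y, lam):
    """Poisson g(y) = exp(-lam) * lam**y / y!, evaluated on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def negbin2_pmf(y, lam, alpha):
    """Negative binomial g(y) with mean lam and variance lam + alpha * lam**2 (assumed NB2 form)."""
    r = 1.0 / alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log(r / (r + lam)) + y * math.log(lam / (r + lam)))

def zero_inflated_pmf(y, phi, g):
    """P(Y = y): extra mass phi at zero, remaining 1 - phi spread according to g."""
    if y == 0:
        return phi + (1.0 - phi) * g(0)
    return (1.0 - phi) * g(y)

def poisson_draw(lam, rng):
    """One Poisson(lam) variate via Knuth's product-of-uniforms method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def zip_draw(phi, lam, rng):
    """A Bernoulli(phi) trial picks Process 1 (always 0); otherwise Process 2 draws a Poisson count."""
    return 0 if rng.random() < phi else poisson_draw(lam, rng)

phi, lam, alpha = 0.25, 3.0, 0.5  # hypothetical parameter values
models = {
    "ZIP": lambda y: poisson_pmf(y, lam),
    "ZINB": lambda y: negbin2_pmf(y, lam, alpha),
}
for name, g in models.items():
    probs = [zero_inflated_pmf(y, phi, g) for y in range(400)]  # tail beyond 400 is negligible
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    print(f"{name}: P(Y=0)={probs[0]:.4f}  total={sum(probs):.6f}  mean={mean:.3f}  var={var:.3f}")

rng = random.Random(2024)
draws = [zip_draw(phi, lam, rng) for _ in range(100_000)]
print(f"simulated ZIP share of zeros: {draws.count(0) / len(draws):.4f}")
```

With a fixed seed, the simulated share of zeros should land close to the ZIP value of $P(Y=0)$. Knuth's sampler is used only because it needs nothing beyond the standard library; it is slow for large $\lambda $.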

If the probability $\varphi _{i}$ depends on the characteristics of observation i, then $\varphi _{i}$ is written as a function of $\mathbf{z}_{i}'\bgamma $, where $\mathbf{z}_{i}'$ is the $1 \times (q+1)$ vector of zero-inflated covariates (its first element is 1, for the intercept) and $\bgamma $ is the $(q+1) \times 1$ vector of zero-inflated coefficients to be estimated. The zero-inflated intercept is $\gamma _0$; the coefficients for the q zero-inflated covariates are $\gamma _1, \ldots , \gamma _ q$. The function F that maps the scalar $\mathbf{z}_{i}'\bgamma $ to the probability $\varphi _{i}$ is called the zero-inflated link function,

$\varphi _{i} = F_{i} = F(\mathbf{z}_{i}'\bgamma )$

In the HPCOUNTREG procedure, the zero-inflated covariates are indicated in the ZEROMODEL statement. Furthermore, the zero-inflated link function F can be specified as either the logistic function,

$F(\mathbf{z}_{i}'\bgamma ) = \Lambda (\mathbf{z}_{i}'\bgamma ) = \frac{\exp (\mathbf{z}_{i}'\bgamma )}{1+\exp (\mathbf{z}_{i}'\bgamma )}$
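As a hedged sketch of how $\varphi _{i}$ follows from the linear predictor, the Python snippet below builds $\mathbf{z}_{i}'\bgamma $ from the intercept $\gamma _0$ plus q covariate terms and applies the logistic function. The helper names and the values of $\bgamma $ and $\mathbf{z}_{i}$ are hypothetical. The two-branch evaluation of $\Lambda $ is an implementation choice that avoids overflow in the exponential for large arguments; it is mathematically identical to the formula above.

```python
import math

def logistic(t):
    """Lambda(t) = exp(t) / (1 + exp(t)), evaluated without overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def zero_inflation_prob(z_row, gamma, link=logistic):
    """phi_i = F(z_i' gamma); z_row holds the q covariates, gamma[0] is the intercept."""
    if len(gamma) != len(z_row) + 1:
        raise ValueError("gamma must have q + 1 entries")
    linear_predictor = gamma[0] + sum(g * z for g, z in zip(gamma[1:], z_row))
    return link(linear_predictor)

gamma = [-1.0, 0.8, -0.5]  # hypothetical (gamma_0, gamma_1, gamma_2)
z_i = [1.5, 2.0]           # hypothetical zero-inflated covariates for observation i
print(zero_inflation_prob(z_i, gamma))
```

Here $\mathbf{z}_{i}'\bgamma = -1 + 0.8(1.5) - 0.5(2.0) = -0.8$, so the printed value is $\Lambda (-0.8)$, roughly 0.31.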

or the standard normal cumulative distribution function (the choice commonly referred to as the probit link),

$F(\mathbf{z}_{i}'\bgamma ) = \Phi (\mathbf{z}_{i}'\bgamma ) = \int _{-\infty }^{\mathbf{z}_{i}'\bgamma } \frac{1}{\sqrt {2 \pi }}\exp (-u^2 \slash 2)\, du$
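The standard normal cumulative distribution function has no closed form, but it can be written through the error function as $\Phi (t) = \frac{1}{2}\left(1 + \mathrm{erf}(t/\sqrt {2})\right)$. The Python sketch below (hypothetical helper names) evaluates $\Phi $ that way and cross-checks it by integrating the standard normal density with the trapezoidal rule from a far-left cutoff of -10, which stands in for $-\infty $. Integrating from $-\infty $ is what gives $\Phi (0) = 0.5$ and keeps $\Phi $ between 0 and 1 for every argument, so that it can serve as the probability $\varphi _{i}$.

```python
import math

def normal_cdf(t):
    """Phi(t): standard normal CDF via the identity Phi(t) = (1 + erf(t / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def normal_cdf_by_quadrature(t, lower=-10.0, steps=20_000):
    """Trapezoidal integral of the N(0, 1) density from `lower` (a stand-in for -inf) to t."""
    def density(u):
        return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

    h = (t - lower) / steps
    total = 0.5 * (density(lower) + density(t))
    total += sum(density(lower + k * h) for k in range(1, steps))
    return total * h

for t in (-2.0, 0.0, 0.8, 2.0):
    print(f"t={t:+.1f}  Phi={normal_cdf(t):.6f}  quadrature={normal_cdf_by_quadrature(t):.6f}")
```

The erf form is the practical choice; the quadrature is there only to confirm that the closed-form value matches the integral as written.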

You specify the zero-inflated link function by using the LINK= option in the ZEROMODEL statement. The default zero-inflated link function is the logistic function.