Zero-Inflated Count Regression Overview :: SAS/ETS(R) 12.3 User's Guide: High-Performance Procedures

Zero-Inflated Count Regression Overview

The main motivation for using zero-inflated count models is that real-life data frequently display overdispersion and excess zeros. Zero-inflated count models provide a way both to model the excess zeros and to allow for overdispersion. Specifically, each observation can arise from one of two possible data generation processes, and the result of a Bernoulli trial determines which process is used. For observation $i$, Process 1 is chosen with probability $\varphi _{i}$ and Process 2 with probability $1-\varphi _{i}$. Process 1 generates only zero counts. Process 2 generates counts from either a Poisson or a negative binomial model. In general,

$y_ i \sim \left\{ \begin{array}{l@{\quad \mbox {with probability} \quad }l} 0 & \varphi _{i} \\ g(y_ i) & 1-\varphi _{i} \end{array} \right.$
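The two-process mechanism can be illustrated with a small simulation. The sketch below is written in Python (the procedure itself is SAS), uses only the standard library, and assumes a Poisson count process with a hypothetical mean $\lambda = 2$ and a constant mixing probability $\varphi = 0.3$. The helper names are illustrative and are not part of HPCOUNTREG.

```python
import math
import random
import statistics

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def zero_inflated_draw(phi: float, lam: float, rng: random.Random) -> int:
    """One observation from the two-process model (hypothetical helper)."""
    if rng.random() < phi:           # Bernoulli trial selects Process 1 with probability phi
        return 0                     # Process 1 produces only zeros
    return poisson_draw(lam, rng)    # Process 2: g(y) is Poisson here, so it can also yield zeros

rng = random.Random(2013)            # fixed seed so the run is reproducible
phi, lam = 0.3, 2.0                  # hypothetical mixing probability and Poisson mean
draws = [zero_inflated_draw(phi, lam, rng) for _ in range(100_000)]

zero_share = sum(1 for d in draws if d == 0) / len(draws)
print(f"share of zeros: {zero_share:.3f} (plain Poisson would give {math.exp(-lam):.3f})")
print(f"mean {statistics.fmean(draws):.3f}, variance {statistics.pvariance(draws):.3f}")
```

With these values the share of zeros should land near $\varphi + (1-\varphi)e^{-\lambda} \approx 0.395$, well above the plain Poisson value $e^{-2} \approx 0.135$, and the sample variance should sit near the theoretical 2.24, above the theoretical mean of 1.4. That is the excess-zero and overdispersion pattern that motivates these models.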

Therefore, the probability of $\{ Y_{i} = y_{i} \}$ can be described as

$\begin{aligned} P(y_{i}=0 \mid \mathbf{x}_{i}) &= \varphi _{i} + (1-\varphi _{i})\, g(0) \\ P(y_{i} \mid \mathbf{x}_{i}) &= (1-\varphi _{i})\, g(y_{i}), \quad y_{i}>0 \end{aligned}$

where $g(\cdot)$ follows either the Poisson or the negative binomial distribution.
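The two probability statements translate directly into code. In the Python sketch below, `g` is passed in as a function so that one helper covers both count distributions. The negative binomial uses an assumed NB2 form with mean $\mu$ and variance $\mu + \alpha\mu^2$. That parameterization, the parameter values, and the helper names are illustrative assumptions, not a description of how HPCOUNTREG evaluates its likelihood.

```python
import math
from typing import Callable

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability g(y) with mean mu > 0, computed on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y: int, mu: float, alpha: float) -> float:
    """Assumed NB2 probability g(y): mean mu, variance mu + alpha * mu**2, alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zero_inflated_pmf(y: int, phi: float, g: Callable[[int], float]) -> float:
    """P(Y = y): extra mass phi at zero, count process g scaled by (1 - phi) everywhere."""
    if y == 0:
        return phi + (1.0 - phi) * g(0)
    return (1.0 - phi) * g(y)

phi, mu, alpha = 0.3, 2.0, 0.5   # hypothetical parameter values
zip_probs = [zero_inflated_pmf(y, phi, lambda k: poisson_pmf(k, mu)) for y in range(60)]
zinb_probs = [zero_inflated_pmf(y, phi, lambda k: negbin_pmf(k, mu, alpha)) for y in range(200)]

print(round(zip_probs[0], 4))    # 0.3947, i.e. 0.3 + 0.7 * exp(-2)
print(round(sum(zip_probs), 6), round(sum(zinb_probs), 6))
```

Summing each list over a long enough range of $y$ returns essentially 1. This confirms that placing the extra mass $\varphi_i$ on zero and scaling every count probability by $1-\varphi_i$ still yields a proper distribution.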

If the probability $\varphi _{i}$ depends on the characteristics of observation $i$, then $\varphi _{i}$ is written as a function of $\mathbf{z}_{i}'\boldsymbol{\gamma}$, where $\mathbf{z}_{i}'$ is the $1 \times (q+1)$ vector of zero-inflated covariates and $\boldsymbol{\gamma}$ is the $(q+1) \times 1$ vector of zero-inflated coefficients to be estimated. (The zero-inflated intercept is $\gamma _0$; the coefficients for the zero-inflated covariates are $\gamma _1, \ldots , \gamma _q$.) The function that relates the scalar product $\mathbf{z}_{i}'\boldsymbol{\gamma}$ to the probability $\varphi _{i}$ is called the zero-inflated link function:

$\varphi _{i} = F_{i} = F(\mathbf{z}_{i}'\boldsymbol{\gamma})$
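The Python sketch below shows how the scalar index $\mathbf{z}_i'\boldsymbol{\gamma}$ is formed: a constant 1 is prepended to the $q$ zero-inflated covariates so that $\gamma_0$ acts as the intercept, and the link $F$ is any function that maps the index into $(0,1)$. The coefficient values and the function name are hypothetical, and the logistic function is used only as an example of $F$.

```python
import math
from typing import Callable, Sequence

def zero_inflation_probability(z_covariates: Sequence[float],
                               gamma: Sequence[float],
                               link: Callable[[float], float]) -> float:
    """phi_i = F(z_i' gamma), with z_i = (1, z_i1, ..., z_iq) and gamma = (gamma_0, ..., gamma_q)."""
    z = [1.0, *z_covariates]                            # leading 1 pairs with the intercept gamma_0
    if len(z) != len(gamma):
        raise ValueError("gamma must have q + 1 entries")
    index = sum(zj * gj for zj, gj in zip(z, gamma))    # the scalar z_i' gamma
    return link(index)

def logistic(t: float) -> float:
    """Example link F; overflows for very negative t, which is fine for this illustration."""
    return 1.0 / (1.0 + math.exp(-t))

gamma = [-0.5, 1.2, -0.8]    # hypothetical gamma_0, gamma_1, gamma_2 (so q = 2)
for z_i in ([0.0, 0.0], [1.0, 0.5], [2.0, -1.0]):
    print(z_i, round(zero_inflation_probability(z_i, gamma, logistic), 3))
```

Because each observation has its own $\mathbf{z}_i$, each gets its own $\varphi_i$; when all zero-inflated covariates are zero, $\varphi_i$ reduces to $F(\gamma_0)$.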

In the HPCOUNTREG procedure, the zero-inflated covariates are indicated in the ZEROMODEL statement. Furthermore, the zero-inflated link function can be specified as either the logistic function,

$F(\mathbf{z}_{i}'\boldsymbol{\gamma}) = \Lambda (\mathbf{z}_{i}'\boldsymbol{\gamma}) = \frac{\exp (\mathbf{z}_{i}'\boldsymbol{\gamma})}{1+\exp (\mathbf{z}_{i}'\boldsymbol{\gamma})}$

or the standard normal cumulative distribution function (also called the probit function),

$F(\mathbf{z}_{i}'\boldsymbol{\gamma}) = \Phi (\mathbf{z}_{i}'\boldsymbol{\gamma}) = \int _{-\infty}^{\mathbf{z}_{i}'\boldsymbol{\gamma}} \frac{1}{\sqrt {2 \pi }}\exp (-u^2 / 2)\, du$
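To compare the two links numerically, the Python sketch below evaluates $\Lambda(t)$ and $\Phi(t)$ at a few index values $t = \mathbf{z}_i'\boldsymbol{\gamma}$. $\Phi$ is computed from the standard library's `math.erf` through the identity $\Phi(t) = \tfrac{1}{2}\left[1 + \operatorname{erf}(t/\sqrt{2})\right]$, and a plain midpoint-rule integral of the normal density from a large negative bound (standing in for $-\infty$) is included as a cross-check of the integral definition. The function names and the choice of $-12$ as the lower bound are illustrative assumptions.

```python
import math

def logistic_cdf(t: float) -> float:
    """Lambda(t) = exp(t) / (1 + exp(t)), arranged so that exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def probit_cdf(t: float) -> float:
    """Phi(t), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def probit_by_quadrature(t: float, lower: float = -12.0, n: int = 20_000) -> float:
    """Midpoint-rule integral of the N(0, 1) density from `lower` (a proxy for -infinity) to t."""
    h = (t - lower) / n
    total = 0.0
    for k in range(n):
        u = lower + (k + 0.5) * h
        total += math.exp(-u * u / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

for t in (-2.0, 0.0, 1.0):
    print(f"t={t:+.1f}  logistic={logistic_cdf(t):.4f}  "
          f"probit={probit_cdf(t):.4f}  quadrature={probit_by_quadrature(t):.4f}")
```

Both functions equal 0.5 at $t = 0$ and increase monotonically, so either one maps the index to a valid probability; they differ mainly in the tails, where the logistic function approaches 0 and 1 more slowly than the probit.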

The zero-inflated link function is specified by using the LINK= option in the ZEROMODEL statement. The default zero-inflated link function is the logistic function.
