The main motivation for using zero-inflated count models is that real-life data frequently display overdispersion and excess zeros. Zero-inflated count models provide a way both to model the excess zeros and to allow for overdispersion. In particular, there are two possible data generation processes for each observation. The result of a Bernoulli trial determines which of the two processes is used. For observation $i$, Process 1 is chosen with probability $\varphi_i$ and Process 2 with probability $1 - \varphi_i$. Process 1 generates only zero counts. Process 2 generates counts from either a Poisson or a negative binomial model. In general,
$$ y_i \sim \begin{cases} 0 & \text{with probability } \varphi_i \\ g(y_i) & \text{with probability } 1 - \varphi_i \end{cases} $$
Therefore, the probability of $\{Y_i = y_i\}$ can be described as
$$ P(Y_i = 0) = \varphi_i + (1 - \varphi_i)\, g(0) $$
$$ P(Y_i = y_i) = (1 - \varphi_i)\, g(y_i), \quad y_i > 0 $$
where $g(y_i)$ is the probability mass function of either the Poisson or the negative binomial distribution.
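The two-process mixture described above can be sketched in a few lines of Python. This is only an illustration for the Poisson case, not the procedure's implementation: the names `poisson_pmf` and `zip_pmf`, the argument `phi` (the probability that Process 1 is chosen), the argument `lam` (the Poisson mean of Process 2), and the numeric values are all assumptions made for the example.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of a non-negative integer count y with mean lam > 0."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, phi, lam):
    """Zero-inflated Poisson probability of count y.

    phi: probability that Process 1 (always zero) is chosen.
    lam: Poisson mean used by Process 2.
    """
    if y == 0:
        # A zero can come from Process 1 or from a Poisson draw of zero.
        return phi + (1.0 - phi) * poisson_pmf(0, lam)
    # Positive counts can come only from Process 2.
    return (1.0 - phi) * poisson_pmf(y, lam)

phi, lam = 0.3, 2.0  # illustrative values, not taken from the documentation
probs = [zip_pmf(y, phi, lam) for y in range(60)]
print("P(Y = 0):", probs[0])
print("sum over y = 0..59:", sum(probs))
```

The probability of a zero under the mixture is always at least as large as under the plain Poisson model with the same mean parameter, which is how the model accommodates excess zeros.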
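If the probability $\varphi_i$ depends on the characteristics of observation $i$, then $\varphi_i$ is written as a function of $z_i'\gamma$, where $z_i'$ is the vector of zero-inflated covariates and $\gamma$ is the vector of zero-inflated coefficients to be estimated. (The zero-inflated intercept is $\gamma_0$; the coefficients for the $q$ zero-inflated covariates are $\gamma_1, \ldots, \gamma_q$.) The function $F$ that relates the product $z_i'\gamma$ (which is a scalar) to the probability $\varphi_i$ is called the zero-inflated link function,
$$ \varphi_i = F_i = F(z_i'\gamma) $$
In the HPCOUNTREG procedure, the zero-inflated covariates are indicated in the ZEROMODEL statement. Furthermore, the zero-inflated link function can be specified as either the logistic function,
$$ F(z_i'\gamma) = \Lambda(z_i'\gamma) = \frac{\exp(z_i'\gamma)}{1 + \exp(z_i'\gamma)} $$
or the standard normal cumulative distribution function (also called the probit function),
$$ F(z_i'\gamma) = \Phi(z_i'\gamma) = \int_{-\infty}^{z_i'\gamma} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du $$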
The zero-inflated link function is indicated by using the LINK= option in the ZEROMODEL statement. The default ZI link function is the logistic function.
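As a rough illustration of how either link maps the scalar product of covariates and coefficients to a zero-inflation probability, the following Python sketch evaluates both functions. It does not reproduce the procedure's code: the function names, the covariate values, and the coefficient values are hypothetical, and the probit link is computed through the error function rather than by numerical integration.

```python
import math

def logistic_link(t):
    """Logistic function exp(t) / (1 + exp(t)), arranged to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def probit_link(t):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def zero_probability(z, gamma, link=logistic_link):
    """Apply a link function to the scalar product of z and gamma."""
    t = sum(z_j * g_j for z_j, g_j in zip(z, gamma))
    return link(t)

z = [1.0, 0.5, -1.2]      # hypothetical covariates; the leading 1.0 multiplies the intercept
gamma = [-0.4, 0.8, 0.3]  # hypothetical coefficients; the first entry is the intercept

print("logistic:", zero_probability(z, gamma))
print("probit:  ", zero_probability(z, gamma, link=probit_link))
```

Both links return a value strictly between 0 and 1 for moderate arguments and equal 0.5 when the product is zero; they differ mainly in how quickly the tails approach 0 and 1.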