
Usage Note 48506: Fitting hurdle models

The example titled "Modeling Zero-Inflation: Is it Better to Fish Poorly or Not to Have Fished At All?" in the FMM procedure documentation discusses zero-inflated and hurdle models for modeling count data containing excessive zeros. As noted there, the hurdle model supposes two processes at work: one that generates zeros with some probability, and another that generates events. For instance, the Poisson hurdle model is a mixture of a degenerate distribution at zero and a truncated Poisson distribution. The zero-inflated Poisson (ZIP) model also uses a degenerate zero distribution, but its second process is a regular Poisson distribution, which can generate both zeros and events. For the ZIP model, the first process therefore generates only the extra zeros beyond those of the regular Poisson distribution. For the hurdle model, the first process generates all of the zeros. Truncated Poisson and negative binomial distributions are discussed and illustrated in more detail in this note.
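
The difference between the regular and the zero-truncated Poisson distribution can be sketched with a short DATA step. This is only an illustration: the mean MU=2, the data set name TRUNCPOIS, and the variable names are arbitrary assumptions and are not taken from the FMM example. Each truncated probability is the regular Poisson probability divided by 1-P(0), so that the probabilities over the positive counts sum to one.

      data truncpois;
         mu = 2;                                /* arbitrary illustrative mean (assumption) */
         p0 = pdf('POISSON', 0, mu);            /* P(0) under the regular Poisson           */
         do k = 1 to 10;
            poisson   = pdf('POISSON', k, mu);  /* regular Poisson probability              */
            truncated = poisson / (1 - p0);     /* zero-truncated Poisson probability       */
            output;
         end;
         run;

      proc print data=truncpois noobs;
         var k poisson truncated;
         run;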

The hurdle model can also be used in cases of underdispersion in which there is less variability in the data than expected under the Poisson distribution.

The example in the FMM documentation illustrates the ZIP model. The Poisson hurdle model is just as easily fit, but uses the DIST=TRUNCPOISSON option instead of the DIST=POISSON option.

      proc fmm data=catch;
         class gender;
         model count = gender*age / dist=TruncPoisson;
         model       +            / dist=Constant;
         run;

Notice that the results are similar to the ZIP model shown in the FMM documentation.

Parameter Estimates for 'Truncated Poisson' Model

Component  Effect      gender  Estimate  Standard Error  z Value  Pr > |z|
1          Intercept           -3.1369   0.7512          -4.18    <.0001
1          age*gender  F       0.1135    0.01562         7.27     <.0001
1          age*gender  M       0.09954   0.01604         6.21     <.0001

Parameter Estimates for Mixing Probabilities

           --------------- Linked Scale ---------------
Effect     Estimate  Standard Error  z Value  Pr > |z|  Probability
Intercept  -0.1542   0.2782          -0.55    0.5795    0.4615

The above Poisson hurdle model can also be fit using PROC NLMIXED. The following statements fit the model and may help to clarify how it is specified. A logistic model containing only an intercept is used for the zeros process, as can be seen in the statements defining its linear predictor (LINPZERO) and the probability PI. Note that the log likelihood assigns log(1-PI) to zero counts, so PI is the probability of a nonzero count and 1-PI is the probability of zero. The events process is defined by the statements for its linear predictor (LINPNOZERO) and Poisson mean parameter (MUNOZERO). The next two statements define the log likelihood for the Poisson hurdle model.

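      proc nlmixed data=catch;
         parameters a0=0 a1=0 a2=0 b0=0;
         /* zeros process: intercept-only logistic model */
         linpzero   = b0;
         pi         = 1/(1+exp(-linpzero));
         /* events process: log-linear model for the Poisson mean */
         linpnozero = a0 + a1*(gender='F')*age + a2*(gender='M')*age;
         munozero   = exp(linpnozero);
         /* log likelihood: truncated Poisson if count>0, log(1-pi) if count=0 */
         logpnozero = log(pi) - log(1-exp(-munozero)) - munozero -
                      lgamma(count+1) + count*log(munozero);
         if count=0 then ll=log(1-pi); else ll=logpnozero;
         model count ~ general(ll);
         run;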
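
For reference, the probabilities implied by the LL assignment can be written out as follows. This is a restatement of the statements above rather than additional SAS code; pi and mu stand for the PI and MUNOZERO variables, and k is a positive value of COUNT.

      P(count = 0) = 1 - pi
      P(count = k) = pi * exp(-mu) * mu**k / ( k! * (1 - exp(-mu)) ),   k = 1, 2, ...

Taking the log of the second line gives log(pi) - log(1-exp(-mu)) - mu - log(k!) + k*log(mu), which is the LOGPNOZERO expression because lgamma(k+1) equals log(k!).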

Beginning in SAS® 9.3 TS1M2, hurdle models using the negative binomial distribution can also be fit using the DIST=TRUNCNEGBIN option to specify the truncated version of the distribution.
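
As a minimal sketch, assuming the same CATCH data set and the same predictors as in the Poisson hurdle model above, a negative binomial hurdle model could be requested by changing only the DIST= option in the first MODEL statement. The results of this fit are not shown in this note.

      proc fmm data=catch;
         class gender;
         model count = gender*age / dist=TruncNegBin;
         model       +            / dist=Constant;
         run;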

