Usage Note 48506: Fitting hurdle models

The example titled "Modeling Zero-Inflation: Is it Better to Fish Poorly or Not to Have Fished At All?" in the FMM procedure documentation discusses zero-inflated and hurdle models for modeling count data containing excessive zeros. As noted there, the hurdle model supposes two processes at work: one that generates zeros with some probability, and another that generates events. For instance, the Poisson hurdle model is a mixture of a degenerate distribution at zero and a zero-truncated Poisson distribution. The zero-inflated Poisson (ZIP) model also uses a degenerate distribution at zero, but its second process is a regular Poisson distribution, which can generate both zeros and events. For the ZIP model, the first process therefore generates only the extra zeros beyond those of the regular Poisson distribution. For the hurdle model, the first process generates all of the zeros. Truncated Poisson and negative binomial distributions are discussed and illustrated in more detail in this note.
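The difference between the two models can be written out as probability mass functions. The following is a sketch in notation that is assumed here and does not come from the FMM documentation: \pi is the mixing probability of the count (events) component and \lambda is the mean of the untruncated Poisson distribution.

```latex
% Notation (assumed for illustration):
%   \pi     = mixing probability of the count (events) component
%   \lambda = mean of the untruncated Poisson distribution
\begin{align*}
\text{Poisson hurdle:}\quad
  P(Y=0) &= 1-\pi, &
  P(Y=y) &= \pi\,\frac{e^{-\lambda}\lambda^{y}}{y!\,\bigl(1-e^{-\lambda}\bigr)},
  \quad y=1,2,\dots \\
\text{ZIP:}\quad
  P(Y=0) &= (1-\pi) + \pi\,e^{-\lambda}, &
  P(Y=y) &= \pi\,\frac{e^{-\lambda}\lambda^{y}}{y!},
  \quad y=1,2,\dots
\end{align*}
```

In the hurdle model, every zero comes from the degenerate component. In the ZIP model, the term \pi e^{-\lambda} shows that the Poisson component contributes zeros as well.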

The hurdle model can also be used in cases of underdispersion in which there is less variability in the data than expected under the Poisson distribution.

The example in the FMM documentation illustrates the ZIP model. The Poisson hurdle model is just as easily fit, but uses the DIST=TRUNCPOISSON option instead of the DIST=POISSON option.

      proc fmm data=catch;
         class gender;
         model count = gender*age / dist=TruncPoisson;
         model       +            / dist=Constant;
         run;

Notice that the results are similar to the ZIP model shown in the FMM documentation.

Parameter Estimates for 'Truncated Poisson' Model

Component  Effect      gender  Estimate  Standard Error  z Value  Pr > |z|
1          Intercept           -3.1369   0.7512          -4.18    <.0001
1          age*gender  F       0.1135    0.01562         7.27     <.0001
1          age*gender  M       0.09954   0.01604         6.21     <.0001

Parameter Estimates for Mixing Probabilities

           Linked Scale
Effect     Estimate  Standard Error  z Value  Pr > |z|  Probability
Intercept  -0.1542   0.2782          -0.55    0.5795    0.4615
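
The Probability column in the mixing probabilities table is the inverse logit of the linked-scale intercept. As a quick check, the following DATA step (a minimal sketch; the variable names B0 and P are chosen here for illustration) applies the inverse logit to the reported estimate.

```sas
data _null_;
   b0 = -0.1542;              /* linked-scale intercept from the table above */
   p  = 1 / (1 + exp(-b0));   /* inverse logit                               */
   put p= 6.4;                /* write the probability to the SAS log        */
run;
```

The value written to the log should agree with the reported probability of 0.4615, up to rounding of the intercept.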

The above Poisson hurdle model can also be fit using PROC NLMIXED. The following statements fit the model and may help to clarify how it is constructed. A logistic model containing only an intercept is used for the zeros process, as can be seen in the statements defining its linear predictor (LINPZERO) and the mixing probability (PI). As coded, PI is the probability of a nonzero count, so a zero count contributes LOG(1-PI) to the log likelihood; this corresponds to the mixing probability reported by PROC FMM. The events process is defined by its linear predictor (LINPNOZERO) and mean count (MUNOZERO) statements. The next two statements, the LOGPNOZERO assignment and the IF-THEN/ELSE statement, define the log likelihood for the Poisson hurdle model.

      proc nlmixed data=catch;
         parameters a0=0 a1=0 a2=0 b0=0;
         linpzero   = b0;
         pi         = 1/(1+exp(-linpzero));
         linpnozero = a0 + a1*(gender='F')*age + a2*(gender='M')*age;
         munozero   = exp(linpnozero);
         logpnozero = log(pi) - log(1-exp(-munozero)) - munozero -
                      lgamma(count+1) + count*log(munozero);
         if count=0 then ll=log(1-pi); else ll=logpnozero;
         model count ~ general(ll);
         run;
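
For reference, the per-observation log likelihood that these statements compute can be written as follows. This is a transcription of the code above into mathematical notation, not an additional result. The symbols y_i (the count), F_i and M_i (0/1 indicators for gender='F' and gender='M'), and \mu_i (MUNOZERO) are labels assumed here for illustration.

```latex
\begin{align*}
\pi &= \frac{1}{1+e^{-b_0}}, \qquad
\mu_i = \exp\bigl(a_0 + a_1\,F_i\,\mathrm{age}_i + a_2\,M_i\,\mathrm{age}_i\bigr), \\
\ell_i &=
\begin{cases}
\log(1-\pi), & y_i = 0, \\[4pt]
\log\pi - \log\bigl(1-e^{-\mu_i}\bigr) - \mu_i - \log\Gamma(y_i+1) + y_i\log\mu_i, & y_i > 0.
\end{cases}
\end{align*}
```

The term -\log(1-e^{-\mu_i}) is what truncates the Poisson distribution at zero: it rescales the Poisson probabilities so that they sum to one over the positive counts.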

Beginning in SAS® 9.3 TS1M2, hurdle models using the negative binomial distribution can also be fit using the DIST=TRUNCNEGBIN option to specify the truncated version of the distribution.
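
A negative binomial hurdle model for the same data could be specified along the following lines. This is a sketch that assumes the same CATCH data set and model effects as the Poisson hurdle example above. Only the DIST= value changes, and no results for this model are shown in this note.

```sas
proc fmm data=catch;
   class gender;
   /* events process: zero-truncated negative binomial */
   model count = gender*age / dist=TruncNegBin;
   /* zeros process: degenerate (constant) distribution at zero */
   model       +            / dist=Constant;
run;
```

Compared with the truncated Poisson component, the truncated negative binomial component adds a scale parameter, which can help when the positive counts are overdispersed.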



Operating System and Release Information

Product Family: SAS System
Product: SAS/STAT
SAS Release columns: Reported, Fixed*
System:
z/OS
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8 Pro
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.