Count data in which zero counts cannot be observed is called truncated count data. Such data can be modeled using truncated versions of the Poisson or negative binomial distributions.
Models using the truncated Poisson or truncated negative binomial distribution can be fit in PROC NLMIXED by specifying the log likelihood function. Beginning in SAS 9.3 TS1M0, models using the truncated Poisson distribution can more easily be fit in PROC FMM by specifying the DIST=TRUNCPOISSON option. Beginning in SAS 9.3 TS1M2, PROC FMM can also fit the truncated negative binomial model with the DIST=TRUNCNEGBIN option. Note that these truncated distributions are used to fit hurdle models. Fitting models with both truncated distributions using PROC FMM and PROC NLMIXED is illustrated below.
The following statements create the skunk capture data presented in Table 7.1 of Zelterman. Y is the number of times each individual skunk was captured over several trappings. Individual skunks never trapped could not be recorded, so only counts of one or more appear in the data. One goal is to estimate the number of skunks in the area for each sex in each of two years.
data skunk; input y freq year sex $ @@; female=1; if sex='M' then female=0; y77=1; if year=78 then y77=0; fem77=female*y77; datalines; 1 1 77 F 2 2 77 F 3 4 77 F 4 2 77 F 5 1 77 F 1 3 77 M 2 0 77 M 3 3 77 M 4 3 77 M 5 2 77 M 6 2 77 M 1 7 78 F 2 7 78 F 3 3 78 F 4 1 78 F 5 2 78 F 1 4 78 M 2 3 78 M 3 1 78 M ;
The statements below fit the following truncated Poisson model,
E(y) = λ = exp[β_{0} + β_{f}*I(Sex=female) + β_{77}*I(Year=1977) + β_{f77}*I(Sex=female and year=1977)]
where I(·) is an indicator variable equaling 1 when the condition is true and 0 otherwise. The variables FEMALE, Y77, and FEM77 in data set SKUNK are these indicator variables. β_{0}, β_{f}, β_{77}, and β_{f77} are the model parameters to be estimated.
The model is easily fit in PROC FMM using the DIST=TRUNCPOISSON option in the MODEL statement. The COV option and the ODS OUTPUT statement save the model information for use in the NLEstimate macro later.
proc fmm data=skunk cov; freq freq; model y = female y77 fem77 / dist=truncpoisson; ods output parameterestimates=pe cov=covb; run;
Below are the estimated parameters of the model.

In PROC NLMIXED, zero initial values for the model parameters are specified in the PARMS statement. The above model for the expected count, λ, is specified next. The log likelihood, LL, for the truncated Poisson is specified in the following line. Finally, the MODEL statement species that the observed counts, Y, are distributed according to the log likelihood previously defined.
proc nlmixed data=skunk; parms b0=0 bf=0 b77=0 bf77=0; lambda=exp(b0+bf*female+b77*y77+bf77*fem77); LL=freq*(lambda+y*log(lambda)lgamma(y+1)log(1exp(lambda))); model y ~ general(LL); run;
The fitted model is the same as from PROC FMM above.
To determine the size of the population for each sex in each year, you need to estimate the probability, p_{i}, of capturing a skunk at least once in each of the four populations, i=1, 2, 3, or 4. The probability of at least one capture is 1 p 0 i , the probability of zero captures in population i. The product of the capture probability, p_{i}, and the population size, N_{i}, is the number of skunks captured at least once (N_{i}p_{i} = n_{i}). Consequently, an estimate of the population size is the number of skunks captured at least once divided by the probability. The PDF function can be used to evaluate the probability mass or density function of many distributions. Specifying pdf('poisson',value,lambda), returns the probability of the given integer value in a Poisson distribution with parameter λ. Then, an estimate of the population size is n/(1pdf('poisson',0,lambda)), where lambda is the estimated Poisson parameter, λ, for a given population. Since this is a nonlinear function of the model parameters, it can be estimated using the NLEstimate macro.
These statements save the observed number, n_{i}, of skunks captured at least once in each population in data set OBS in variable COUNT.
proc freq data=skunk; weight freq; table y77*female / out=obs; run;
The fitted model information from PROC FMM is provided to the NLEstimate macro using the inest= and incovb= macro parameters. The data set containing the covariance matrix from PROC FMM has additional numeric variables besides those that contain the matrix. These must be removed. The following statements keep only the variables containing the covariance matrix.
data covb; set covb; keep col:; run;
With the fitted model information, the NLEstimate macro then uses PROC NLMIXED to estimate the population sizes and the delta method to obtain confidence limits. You write the function to be estimated using the parameter names and specify it in the f= macro parameter. A label can be provided in the label= parameter. See the description of the NLEstimate macro for details about displaying parameter names and using the macro. The population sizes are estimated by specifying data set OBS in the score= macro parameter. The NLEstimate macro evaluates the f= function discussed above for each observation in the score= data set. The estimated values are saved in the outscore= data set, Pr0, in a variable named PRED. Standard errors, a test statistic, pvalue, and confidence limits are also saved.
%NLEstimate(inest=pe, incovb=covb, f=count/(1pdf('poisson',0, exp(b_p1+b_p2*female+b_p3*y77+b_p4*female*y77))), score=obs, outscore=Pr0)
These statements display the estimated population sizes.
proc print data=Pr0; id y77 female; var count pred lower upper; run;
For the population of males (FEMALE=0) in 1978 (Y77=0), the estimated population size is 13 when rounded to the next largest integer. Eight were captured. A largesample 95% confidence interval for the population size is approximately (6.5, 18). Similarly for the other three populations as shown.

The data set by Long, analyzed in the PROC COUNTREG documentation, is from a study examining how several factors affect the number of articles (ART) published by scientists, including the number of articles published by a scientist's mentor (MENT). While the actual data set includes scientists who did not publish, suppose that those scientists were not reported. Then the data would contain no observations with zero counts resulting in a truncated distribution. The following statements fit this truncated negative binomial model to the nonzero data.
E(y) = mean = exp[β_{0} + β_{1}*MENT]
Beginning in SAS 9.3 TS1M2, the truncated negative binomial model can easily be fit using PROC FMM. The WHERE statement removes the observations with zero counts.
proc fmm data=long97data; where art ne 0; model art = ment / dist=truncnegbin; run;
As with the truncated Poisson model, PROC NLMIXED can also fit the truncated negative binomial model.
proc nlmixed data=long97data; where art ne 0; y=art; parms b0=1 b1=0 k=1; mean = exp(b0 + b1*ment); ll = lgamma(y+1/k)  lgamma(1/k)  lgamma(y+1) + y*log(k*mean)  (y+(1/k))*log(1+k*mean)  log(1(1+k*mean)**(1/k)); model y ~ general(ll); run;
The estimate of the β_{1} parameter (0.02483) indicates that the number of published articles increases significantly as the number of articles published by the mentor increases (p<.0001). Also, the negative binomial dispersion parameter, k, is significantly different from zero (p<.0001) indicating that a Poisson model would not be appropriate.

Since overdispersion in truncated models can result in biased parameter estimates, it may be desirable to model the dispersion as well as the mean. Modeling data exhibiting overdispersion is extensively discussed by Morel and Neerchal. This can be done by simply including the desired model for the dispersion parameter, k. In the following NLMIXED statements, k is allowed to differ between females (FEM=1) and males (FEM=0).
proc nlmixed data=long97data; where art ne 0; y=art; parms b0=1 b1=0 k0=1 k1=0; mean = exp(b0 + b1*ment); k=k0+k1*fem; ll = lgamma(y+1/k)  lgamma(1/k)  lgamma(y+1) + y*log(k*mean)  (y+(1/k))*log(1+k*mean)  log(1(1+k*mean)**(1/k)); model y ~ general(ll); run;
The significant k1 parameter (p=0.0109) indicates that there is more dispersion for males than for females.

Product Family  Product  System  SAS Release  
Reported  Fixed*  
SAS System  SAS/STAT  z/OS  
OpenVMS VAX  
Microsoft® Windows® for 64Bit Itaniumbased Systems  
Microsoft Windows Server 2003 Datacenter 64bit Edition  
Microsoft Windows Server 2003 Enterprise 64bit Edition  
Microsoft Windows XP 64bit Edition  
Microsoft® Windows® for x64  
OS/2  
Microsoft Windows 95/98  
Microsoft Windows 2000 Advanced Server  
Microsoft Windows 2000 Datacenter Server  
Microsoft Windows 2000 Server  
Microsoft Windows 2000 Professional  
Microsoft Windows NT Workstation  
Microsoft Windows Server 2003 Datacenter Edition  
Microsoft Windows Server 2003 Enterprise Edition  
Microsoft Windows Server 2003 Standard Edition  
Microsoft Windows Server 2003 for x64  
Microsoft Windows Server 2008  
Microsoft Windows Server 2008 for x64  
Microsoft Windows XP Professional  
Windows 7 Enterprise 32 bit  
Windows 7 Enterprise x64  
Windows 7 Home Premium 32 bit  
Windows 7 Home Premium x64  
Windows 7 Professional 32 bit  
Windows 7 Professional x64  
Windows 7 Ultimate 32 bit  
Windows 7 Ultimate x64  
Windows Millennium Edition (Me)  
Windows Vista  
Windows Vista for x64  
64bit Enabled AIX  
64bit Enabled HPUX  
64bit Enabled Solaris  
ABI+ for Intel Architecture  
AIX  
HPUX  
HPUX IPF  
IRIX  
Linux  
Linux for x64  
Linux on Itanium  
OpenVMS Alpha  
OpenVMS on HP Integrity  
Solaris  
Solaris for x64  
Tru64 UNIX 
Type:  Usage Note 
Priority:  
Topic:  Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Procedures ==> FMM 
Date Modified:  20181112 13:12:00 
Date Created:  20110614 13:09:02 