For any model fit in PROC GENMOD (in SAS/STAT^{®} software) or PROC COUNTREG (in SAS/ETS^{®} software), including zeroinflated models, the PRED= option in the OUTPUT statement provides the estimated mean for each observation. However, the standard error and confidence limits are not available for the mean count or rate in zeroinflated models. Currently, neither the LSMEANS statement nor the ESTIMATE statement in PROC GENMOD provides mean count or rate estimates for zeroinflated models. To obtain estimated mean counts or rates along with confidence limits, you can use the NLEstimate macro or fit the model in PROC NLMIXED and use its ESTIMATE or PREDICT statement. An easier approach, available beginning in SAS 9.4 TS1M1, is to use the SCORE statement in PROC PLM. With the NLEstimate macro or PROC NLMIXED you can also make comparisons of mean counts or rates. The use of the macro and both procedures is illustrated below.
The following statements create a hypothetical insurance claims data set in which claims are classified by age group with two levels and car type with three levels.
data insure; input c car age; datalines; 10 1 1 37 2 1 100 3 1 0 1 2 0 2 2 80 3 2 ;
The GENMOD step below fits a zeroinflated Poisson (ZIP) model to the insurance claims data. Note that the same model can be fit with PROC COUNTREG using very similar syntax or with PROC FMM as a finite mixture model with two components. Zeroinflated negative binomial (ZINB) models can be fit in GENMOD or COUNTREG using the DIST=ZINB option, or in FMM.^{NOTE1}
The LSMEANS statement requests LSmean estimates for each AGE level. The E option displays the linear combination of model parameters which produces the LSmean estimates. The ILINK option applies the inverse link function (exponentiation for loglinked models) to the estimates to provide estimates on the mean count scale. The ESTIMATE statement specifies the same linear combination of model parameters as the LSMEANS statement for AGE=1 and also estimates the zeroinflation parameter. The coefficients following @ZERO in the ESTIMATE statement estimate the zeroinflation parameter for AGE=1, CAR=2. The P= option in the OUTPUT statement saves the predicted mean count for each observation to a data set. The PZERO= option saves the estimated zeroinflation parameter for each observation. The STORE statement saves the fitted model for later use with PROC PLM. The COVB option and ODS OUTPUT statement create and save model information for later use by the NLEstimate macro.
proc genmod data=insure; class age; model c = age car / dist=zip covb; zeromodel car; lsmeans age / e ilink; estimate 'age=1, car=2 LSmean' intercept 1 age 1 0 car 2 @zero intercept 1 car 2; output out=gout pred=p pzero=p0; store out=zip1; ods output parameterestimates=mpe zeroparameterestimates=zpe covb=covb; run;
Notice that the table of LSmeans coefficients does not include any coefficients for the parameters in the zeroinflation portion of the model. The LSMEANS statement provides estimates based only on the Poisson portion of the model, not for the zeroinflated model. A Note indicating this is displayed in the SAS log. The ESTIMATE statement provides separate estimates for the Poisson and zeroinflation portions of the model. The estimate for the Poisson portion of the model (34.0898) matches the LSmean estimate for AGE=1, CAR=2.

These statements display the OUT= data set produced by the OUTPUT statement.
proc print data=gout noobs label; run;
The values in variable P (labeled "Predicted Value") are the mean counts estimated by the ZIP model. Note the difference between the estimated mean from the zeroinflated model (24.0563) provided by the OUTPUT statement and the estimated mean using only the Poisson part of the model (34.0898) provided by the ESTIMATE and LSMEANS statements. Also notice that the value for the zeroinflation parameter (labeled "Zero Inflation Probability") for AGE=1, CAR=2 is identical to the value provided by the ESTIMATE statement (0.2943).

Using PROC PLM
Beginning in SAS 9.4 TS1M1, PROC PLM can provide the mean counts along with standard errors and confidence limits. The SOURCE= option reads the saved ZIP model. The SCORE statement requests predicted means, standard errors, and confidence limits and stores them in data set PLMOUT. The ILINK option requests that these values be on the mean (count) scale rather than the linked (log count) scale.
proc plm source=zip1; score data=insure out=plmout pred stderr lclm uclm / ilink; run; proc print noobs label; run;
Note that the predicted counts match those from GENMOD.

Using the NLEstimate macro
As shown in "ZeroInflated Models" in the Details section of the GENMOD documentation, the mean count for zeroinflated models is (1ω)λ, where ω is the zeroinflation parameter and λ is the mean of the Poisson or negative binomial portion of the model. Note that a binary response model (by default, a logistic model) is used to estimate ω. The model for estimating λ is typically a loglinked model. Since the resulting mean estimator is a nonlinear function of the model parameters, it cannot be estimated by the ESTIMATE statement in PROC GENMOD, but it can be estimated by the NLEstimate macro or in PROC NLMIXED.
The NLEstimate macro uses the fitted model information (parameter estimates and covariance matrix) saved by the ODS OUTPUT statement in PROC GENMOD. It then uses PROC NLMIXED to estimate the function and the delta method to obtain confidence limits. When estimating several functions, it is useful to specify them in a data set with appropriate labels. This is done in the following statements and the resulting data set is specified in the fdata= NLEstimate macro parameter. You write the functions to be estimated using the parameter names. See the description of the NLEstimate macro for details about displaying parameter names and using the macro.
Since PROC GENMOD produces separate tables for the two parts of the zeroinflated model, the two tables are saved to separate data sets by the ODS OUTPUT statement above. The following statements combine them into a single data set for use in the NLEstimate macro. The IF statement is needed to remove rows for parameters with zero degrees of freedom since such parameters are not included in the covariance matrix. Removing the rows is needed to make the resulting vector of model parameter estimates compatible with the covariance matrix.
data pe; set mpe zpe; if df ne 0; run;
These statements produce a data set of the nonlinear functions to estimate using the formula for the mean shown above. The NLEstimate macro is then called to perform the estimation for each population in the data.
data fd; length label f $32767; infile datalines delimiter=','; input label f; datalines; car=1 age=1,(1logistic(b_p4+b_p5))*exp(b_p1+b_p2+b_p3) car=2 age=1,(1logistic(b_p4+2*b_p5))*exp(b_p1+b_p2+2*b_p3) car=3 age=1,(1logistic(b_p4+3*b_p5))*exp(b_p1+b_p2+3*b_p3) car=1 age=2,(1logistic(b_p4+b_p5))*exp(b_p1+b_p3) car=2 age=2,(1logistic(b_p4+2*b_p5))*exp(b_p1+2*b_p3) car=3 age=2,(1logistic(b_p4+3*b_p5))*exp(b_p1+3*b_p3) ; %NLEstimate(inest=pe, incovb=covb, fdata=fd, df=6)
Note that the predicted mean counts (Predicted Value) match those from PROC GENMOD. The data set also includes standard errors and confidence limits for the predicted counts. These differ from the PLM results because of the use of different methods. Note that the confidence intervals provided by the NLEstimate macro are symmetric around the predicted means while the intervals from PROC PLM are not. See "Scoring Data Sets for ZeroInflated Models" in the Details section of the PLM documentation for formulas showing the method used by PROC PLM. The NLEstimate macro uses the delta method to estimate the standard error and forms a confidence interval using a quantile from the t distribution.

Using PROC NLMIXED
Mean estimates, standard errors and confidence limits can also be obtained by fitting the zeroinflated model directly in PROC NLMIXED. The following statements fit the same ZIP model as in PROC GENMOD above. The zeroinflation part of the model, which is a logistic model, is defined in the first two assignment statements. INFPROB is the zeroinflation parameter, ω. The Poisson portion of the model is defined in the next assignment statement, and LAMBDA is the Poisson mean parameter, λ. The parameters to be estimated are inf0 and inf1 in the zero inflation model, and b0, b1, and b2 in the Poisson model. The next lines define the ZIP log likelihood. The PREDICT statement requests a data set containing a mean count estimate and confidence interval for each observation. Zeroinflated negative binomial models require different IFELSE statements to define the ZINB log likelihood.^{NOTE2}
proc nlmixed data=insure; /* define zero inflation model */ zeromodel = inf0 + inf1*car; infprob = logistic(zeromodel); /* define Poisson model */ lambda = exp(b0 + b1*(age=1) + b2*car); /* define ZIP log likelihood */ if c=0 then ll = log(infprob + (1infprob)*exp(lambda)); else ll = log((1infprob)) + c*log(lambda)  lgamma(c+1)  lambda; model c ~ general(ll); /* request predicted counts */ predict (1infprob)*lambda out=nout; run;
The results in the OUT= data set are the same as those from the NLEstimate macro.
Comparing mean counts among populations
Comparisons among the population mean counts can be done using the ESTIMATE statement. For example, suppose you want a test and confidence interval for the difference in mean counts between the AGE=1, CAR=2 population and the AGE=2, CAR=2 population. The difference in these means is (1ω_{12})λ_{12}  (1ω_{22})λ_{22}.
This estimate can be produced in the NLEstimate macro by adding this function to the fdata= data set.
Mean Count Difference,(1logistic(b_p4+2*b_p5))*exp(b_p1+b_p2+2*b_p3)(1logistic(b_p4+2*b_p5))*exp(b_p1+2*b_p3)
Equivalently, in the PROC NLMIXED step above, add these statements before the RUN statement to produce the estimate, test, and confidence interval. Note that the entire expression could be written in the ESTIMATE statement if desired, eliminating the preceding assignment statements, but that is not done here to clarify the process.
/* AGE=1, CAR=2 */ zeromodel12 = inf0 + inf1*2; infprob12 = logistic(zeromodel12); lambda12 = exp(b0 + b1 + b2*2); /* AGE=2, CAR=2 */ zeromodel22 = inf0 + inf1*2; infprob22 = logistic(zeromodel22); lambda22 = exp(b0 + b2*2); /* estimate and test difference in mean counts */ estimate "Mean Count Difference" (1infprob12)*lambda12  (1infprob22)*lambda22;
Notice that the estimated difference (5.0876) is the difference of the two estimated means above (24.0563  18.9687). The means do not differ significantly (p=0.1778).

By including an offset in a Poisson or negative binomial model, you can model a rate rather than a count. This is explained in this note for models that are not zeroinflated. The same applies to zeroinflated models. The offset is the log of the rate denominator.
These statements create another hypothetical data set of insurance claims. In each observation, the number of claims (C) and the number of policyholders (N) is recorded for each car type and age group. Since the number of policyholders varies across populations, the rates (number of claims per policyholder, or per 100 policyholders) will be modeled rather than the number of claims. The log of the number of policyholders (LN) is computed for use as an offset in the ZIP model.
data insure; input n c car$ age; ln = log(n); datalines; 500 0 small 1 1200 37 medium 1 100 0 large 1 400 101 small 2 500 73 medium 2 300 0 large 2 ;
PROC GENMOD fits a ZIP model to the claims data and saves a data set of predicted values. In this example, the mean model is a function only of age and the zeroinflation model contains only an intercept. Since the offset is included in the computation of predicted values by the PRED= option in the OUTPUT statement, the predicted values are mean counts rather than rates. Again, this model can also be fit using the COUNTREG or FMM procedure.^{NOTE3} The STORE statement saves the fitted model. The COVB option and ODS OUTPUT statement save the model information for use in the NLEstimate macro later.
proc genmod data=insure; model c = age / dist=zip offset=ln covb; zeromodel ; output out=gout pred=p; store out=zip2; ods output parameterestimates=mpe zeroparameterestimates=zpe covb=covb; run;
These are the parameter estimates of the fitted zeroinflated Poisson model.

The following DATA step computes the estimated rates by dividing the estimated counts by the number of policyholders.
data gout; set gout; prate=p/n; run; proc print noobs; run;

Using PROC PLM
These statements use PROC PLM to produce the predicted rates as well as standard errors and confidence limits for the rates. The addition of the NOOFFSET option excludes the offset from the computations so that the predicted rates are produced rather than predicted counts.
proc plm source=zip2; score data=insure out=plm2 pred=p stderr=s lclm=l uclm=u / ilink nooffset; run; proc print noobs label; run;
The predicted rates from PLM match those above and include standard errors and confidence limits.

Using the NLEstimate macro
As described in earlier, the separate tables of parameter estimates must be combined and rows for parameters with zero degrees of freedom must be removed so the parameter vector and covariance matrix are compatible.
data pe; set mpe zpe; if df ne 0; run;
Predicted rates for the two ages are then obtained from the NLEstimate macro.
data fd; length label f $32767; infile datalines delimiter=','; input label f; datalines; age=1,(1logistic(b_p3))*exp(b_p1+b_p2) age=2,(1logistic(b_p3))*exp(b_p1+2*b_p2) ; %NLEstimate(inest=pe, incovb=covb, fdata=fd, df=6)
The estimated rate for AGE=1 is 0.01559, or about 1.6 claims per 100 policyholders. For AGE=2, the rate is 0.09814, or about 9.8 claims per 100 policyholders. Differences with the PLM results are for reasons described above.

Using PROC NLMIXED
By using PROC NLMIXED to fit the zeroinflated model you can use its PREDICT statement to estimate the rates and provide these additional statistics. The following statements fit the same ZIP model as above. Notice the inclusion of LN without a coefficient in the Poisson part of the model since an offset has a coefficient fixed at 1. In the PREDICT statement, the ZIP mean is divided by the rate denominator as was done in the DATA step above. The predicted rates and confidence limits are saved in data set NLMOUT.
proc nlmixed data=insure; /* define zero inflation model */ zeromodel = a0; infprob = logistic(zeromodel); /* define Poisson model */ lambda = exp(b0 + ln + b1*age); /* define ZIP log likelihood */ if c=0 then ll = log(infprob + (1infprob)*exp(lambda)); else ll = log((1infprob)) + c*log(lambda)  lgamma(c+1)  lambda; model c ~ general(ll); /* request predicted rates */ predict (1infprob)*lambda/n out=nlmout; run;
The results in the OUT= data set match those from the NLEstimate macro.
Comparing rates among populations
A similar approach as used for comparing mean counts can be used for comparing rates. The rate difference is (1ω)λ_{1}/n_{1}  (1ω)λ_{2}/n_{2} = (1ω)exp(b_{0}+b_{1}+log(n_{1}))/n_{1}  (1ω)exp(b_{0}+2b_{1}+log(n_{2}))/n_{2} = (1ω)exp(b_{0}+b_{1})  (1ω)exp(b_{0}+2b_{1}).
The estimate of this difference is produced by the NLEstimate macro by adding this function to the fdata= data set.
Mean Rate Difference,(1logistic(b_p3))*exp(b_p1+b_p2)(1logistic(b_p3))*exp(b_p1+2*b_p2)
Comparison of the AGE level rates can also be done in PROC NLMIXED by adding the following statement in the NLMIXED step above.
estimate 'diff' (1infprob)*exp(b0 + b1*1)  (1infprob)*exp(b0 + b1*2);
The difference in rates is as expected from the rates above (0.015592  0.098136 = 0.082544) and this difference is not quite significant at the 0.05 level (p=0.0546). The results from the NLEstimate macro and PROC NLMIXED are the same.

_______
NOTE1: The following statements fit the ZIP model for counts in the COUNTREG and FMM procedures. The OUTPUT statements also save the predicted values and the inflation probabilities from the ZIP model.
proc countreg data=insure; class age; model c = age car / dist=zip; zeromodel c ~ car; output out=cout pred=p probzero=p0; run; proc fmm data=insure; class age; model c = / dist=constant; model + age car / dist=poisson; probmodel car; output out=fout pred=p mixprob=p0; run;
NOTE2: These statements can be used to define the zeroinflated negative binomial (ZINB) log likelihood in PROC NLMIXED:
/* define ZINB log likelihood */ if c=0 then ll = log( infprob + (1infprob)/(1+k*lambda)**(1/k) ); else ll = log((1infprob)) + c*log(k*lambda)  (c+(1/k))*log(1+k*lambda) + lgamma(c+1/k)  lgamma(1/k)  lgamma(c+1);
NOTE3: These statements fit the ZIP model for rates in the COUNTREG and FMM procedures.
proc countreg data=insure; model c = age / dist=zip offset=ln; zeromodel c ~; output out=cout pred=p; run; proc fmm data=insure; model c= / dist=constant; model + age / dist=poisson offset=ln; output out=fout pred=p; run;
Product Family  Product  System  SAS Release  
Reported  Fixed*  
SAS System  SAS/STAT  z/OS  
OpenVMS VAX  
Microsoft® Windows® for 64Bit Itaniumbased Systems  
Microsoft Windows Server 2003 Datacenter 64bit Edition  
Microsoft Windows Server 2003 Enterprise 64bit Edition  
Microsoft Windows XP 64bit Edition  
Microsoft® Windows® for x64  
OS/2  
Microsoft Windows 95/98  
Microsoft Windows 2000 Advanced Server  
Microsoft Windows 2000 Datacenter Server  
Microsoft Windows 2000 Server  
Microsoft Windows 2000 Professional  
Microsoft Windows NT Workstation  
Microsoft Windows Server 2003 Datacenter Edition  
Microsoft Windows Server 2003 Enterprise Edition  
Microsoft Windows Server 2003 Standard Edition  
Microsoft Windows Server 2003 for x64  
Microsoft Windows Server 2008  
Microsoft Windows Server 2008 for x64  
Microsoft Windows XP Professional  
Windows 7 Enterprise 32 bit  
Windows 7 Enterprise x64  
Windows 7 Home Premium 32 bit  
Windows 7 Home Premium x64  
Windows 7 Professional 32 bit  
Windows 7 Professional x64  
Windows 7 Ultimate 32 bit  
Windows 7 Ultimate x64  
Windows Millennium Edition (Me)  
Windows Vista  
Windows Vista for x64  
64bit Enabled AIX  
64bit Enabled HPUX  
64bit Enabled Solaris  
ABI+ for Intel Architecture  
AIX  
HPUX  
HPUX IPF  
IRIX  
Linux  
Linux for x64  
Linux on Itanium  
OpenVMS Alpha  
OpenVMS on HP Integrity  
Solaris  
Solaris for x64  
Tru64 UNIX 
Type:  Usage Note 
Priority:  
Topic:  Analytics ==> Regression SAS Reference ==> Procedures ==> GENMOD SAS Reference ==> Procedures ==> COUNTREG SAS Reference ==> Procedures ==> FMM SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Procedures ==> PLM SAS Reference ==> Macro 
Date Modified:  20111005 16:27:27 
Date Created:  20110920 16:27:37 