Usage Note 37344: Estimating rate differences (with confidence interval) using a Poisson model

You can estimate rates in PROC GENMOD using a log-linked Poisson or negative binomial model with an offset, as discussed and illustrated in this note. Because the response function is the log of the rate, such a model estimates the log rate for any setting of the predictors, and exponentiating that estimate gives the rate. Similarly, the difference between two predictor settings is an estimated difference in log rates, which is equivalent to a log rate ratio, and exponentiating it gives the rate ratio comparing the settings. In PROC GENMOD, the LSMEANS statement with the DIFF and EXP options, or the ESTIMATE statement with the equivalent linear combination of model parameters, can provide estimates of rates and rate ratios. However, the difference in rates cannot be obtained with these statements.

To estimate the difference in rates, you need to estimate a nonlinear function of the model parameters. This can be done using the NLMeans macro, the NLEstimate macro, or PROC NLMIXED. All three are illustrated in this note.

An analogous problem for a logistic model is to compute differences of probabilities rather than odds ratios. Both macros can be used for this as well.
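
As a minimal sketch of that analogous logistic case, the statements below mirror the Poisson workflow used in this note. The data set trial, its variables, and the names logcoef and logmod are hypothetical and serve only as illustration. It is also an assumption here that the NLMeans macro accepts link=logit in the same way that link=log is used for the Poisson model, so check the macro's documentation before relying on it.

```sas
/* Hypothetical data: event counts by treatment, for illustration only */
data trial;
   input trt $ y count;
   datalines;
A 1 30
A 0 70
B 1 45
B 0 55
;

proc logistic data=trial;
   freq count;
   class trt / param=glm;          /* LSMEANS requires GLM parameterization */
   model y(event='1') = trt;
   lsmeans trt / e ilink;          /* E writes the LS-means coefficients */
   ods output coef=logcoef;        /* hypothetical data set name */
   store out=logmod;               /* hypothetical item store name */
   run;

/* Assumption: link=logit is a valid value for the macro's link= parameter */
%NLMeans(instore=logmod, coef=logcoef, link=logit, title=Difference of Probabilities)
```

The structure is the same as for the rate difference: the fitted model is stored, the LS-means coefficients are saved, and the macro is told which link function the model used.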

Using a subset of the car insurance data that appears in the "Getting Started" section of the GENMOD documentation, the following statements first estimate the rates for the two levels of age and the rate ratio comparing them, as presented in this note. The observed rate is computed as the variable ObsRate, and the STORE statement saves the fitted model for later use.

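      data insure;
         input n c age;
         ln = log(n);          /* offset: log of the exposure n */
         ObsRate = c/n;        /* observed rate */
         datalines;
         500   42  1
         400  101  2
         ;

      proc genmod data=insure;
         class age;
         model c=age / dist=poisson offset=ln;
         estimate 'Age rate ratio' age 1 -1;   /* log rate ratio, age 1 vs age 2 */
         lsmeans age / e diff exp cl;          /* E displays the LS-means coefficients */
         ods output coef=coeffs;               /* coefficients saved for the NLMeans macro */
         store out=insmodel;                   /* fitted model saved for later use */
         run;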

The EXP option in the LSMEANS statement provides the rate estimates for the two age levels. Because the model is saturated in this example, the predicted rates (in the Exponentiated column) are identical to the observed rates: 0.0840 for age 1 and 0.2525 for age 2. Both the ESTIMATE statement and the DIFF option in the LSMEANS statement provide the rate ratio estimate, 0.3327, comparing age 1 to age 2. Confidence intervals for the rates and the rate ratio are also given.

Analysis Of Maximum Likelihood Parameter Estimates

Parameter    DF   Estimate   Standard   Wald 95% Confidence         Wald   Pr > ChiSq
                                Error                Limits   Chi-Square
Intercept     1    -1.3763     0.0995    -1.5714    -1.1813       191.33       <.0001
age 1         1    -1.1006     0.1836    -1.4605    -0.7407        35.93       <.0001
age 2         0     0.0000     0.0000     0.0000     0.0000            .            .
Scale         0     1.0000     0.0000     1.0000     1.0000
 
Contrast Estimate Results

Label               Mean   Mean Confidence    L'Beta  Standard  Alpha L'Beta Confidence  Chi-Square  Pr > ChiSq
                Estimate            Limits  Estimate     Error                   Limits
Age rate ratio    0.3327   0.2321   0.4768   -1.1006    0.1836   0.05  -1.4605  -0.7407       35.93      <.0001
 
age Least Squares Means

age  Estimate  Standard  z Value  Pr > |z|  Alpha    Lower    Upper  Exponentiated  Exponentiated  Exponentiated
                  Error                                                                     Lower          Upper
  1   -2.4769    0.1543   -16.05    <.0001   0.05  -2.7794  -2.1745        0.08400        0.06208         0.1137
  2   -1.3763   0.09950   -13.83    <.0001   0.05  -1.5714  -1.1813         0.2525         0.2078         0.3069
 
Differences of age Least Squares Means

age  _age  Estimate  Standard  z Value  Pr > |z|  Alpha    Lower    Upper  Exponentiated  Exponentiated  Exponentiated
                        Error                                                                     Lower          Upper
  1     2   -1.1006    0.1836    -5.99    <.0001   0.05  -1.4605  -0.7407         0.3327         0.2321         0.4768
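
The exponentiation described above can be checked by hand. The following DATA step is a minimal sketch that plugs in the rounded parameter estimates and confidence limits from the tables; the variable names are illustrative only.

```sas
/* Hand check: recover the rates and the rate ratio from the parameter estimates */
data _null_;
   b0 = -1.3763;               /* Intercept: log rate for age 2 */
   b1 = -1.1006;               /* age 1 parameter: log rate ratio, age 1 vs age 2 */
   rate1 = exp(b0 + b1);       /* rate for age 1 */
   rate2 = exp(b0);            /* rate for age 2 */
   ratio = exp(b1);            /* rate ratio */
   lower = exp(-1.4605);       /* exponentiated lower limit of the log rate ratio */
   upper = exp(-0.7407);       /* exponentiated upper limit of the log rate ratio */
   put rate1= 8.4 rate2= 8.4 ratio= 8.4 lower= 8.4 upper= 8.4;
   run;
```

Because the inputs are rounded to four decimal places, the values written to the log should agree with the Exponentiated columns to about that precision.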
 

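The SCORE statement in PROC PLM can be used to provide rate estimates and confidence intervals for each observation in a data set using the fitted model. The rate estimate is provided when the NOOFFSET and ILINK options are used: NOOFFSET leaves the offset out of the linear predictor, and ILINK applies the inverse link function so that the predictions are rates rather than log rates.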

      proc plm source=insmodel;
         score data=insure out=PredRates pred stderr lclm uclm / nooffset ilink;
         run;
   
      proc print data=PredRates noobs label;
         run;

  n     c   age        ln   ObsRate   Predicted   Standard    Lower 95%    Upper 95%
                                          Value      Error   Confidence   Confidence
                                                                  Limit        Limit
500    42     1   6.21461    0.0840      0.0840   0.012961      0.06208      0.11366
400   101     2   5.99146    0.2525      0.2525   0.025125      0.20776      0.30687

Rate difference using the NLMeans macro

While the difference in rates cannot be obtained from the LSMEANS or ESTIMATE statements as noted above, it can be estimated using the NLMeans macro. To use the macro, you need to supply the saved model from the STORE statement and a data set of coefficients that define the individual LS-means. This coefficients data set is made available by the E option in the LSMEANS statement and is saved by the ODS OUTPUT statement shown above. Finally, you specify that the link function used in the model is the log link.

      %NLMeans(instore=insmodel, coef=coeffs, link=log, title=Difference of Age Rates)

Note that the estimated rate difference is exactly the difference between the rates shown in the PredRates data set produced by PROC PLM: 0.0840 - 0.2525 = -0.1685. This difference is significantly different from zero (p<0.0001). A confidence interval for the rate difference, (-0.2239, -0.1131), is also given.

Difference of Age Rates

Label   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Alpha     Lower     Upper
1 -1     -0.1685          0.02827           35.5236       <.0001    0.05   -0.2239   -0.1131
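
The standard error and confidence limits in this table can be reproduced by hand for this small example. The DATA step below is a sketch under two stated assumptions: that the macro's standard error is a first-order (delta method) approximation, and that the model is saturated, so the variance of each estimated log rate is 1/c and the two fitted rates are uncorrelated. Variable names are illustrative.

```sas
data _null_;
   n1 = 500; c1 = 42;          /* age 1: exposure and count from the insure data */
   n2 = 400; c2 = 101;         /* age 2 */
   rate1 = c1/n1;
   rate2 = c2/n2;
   diff  = rate1 - rate2;
   /* Assumption: Var(log rate) = 1/c, so the delta method gives
      SE(rate) = rate/sqrt(c) = sqrt(c)/n */
   se1 = sqrt(c1)/n1;
   se2 = sqrt(c2)/n2;
   se  = sqrt(se1**2 + se2**2);      /* the two rates are uncorrelated here */
   z     = quantile('normal', 0.975);
   lower = diff - z*se;
   upper = diff + z*se;
   chisq = (diff/se)**2;             /* Wald chi-square */
   put diff= 8.4 se= 8.5 chisq= 8.4 lower= 8.4 upper= 8.4;
   run;
```

Under those assumptions, the values written to the log should match the Estimate, Standard Error, Wald Chi-Square, Lower, and Upper columns above. The intermediate values se1 and se2 correspond to the Standard Error column of the PredRates data set.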

Rate difference using the NLEstimate macro

Next, the NLEstimate macro is used to estimate the rate difference. Note that the estimated log rate for age=1 is Intercept+age1 and the estimated log rate for age=2 is just Intercept, where Intercept, age1, and age2 are the parameter estimates of the fitted model shown above. The difference in rates is then exp(Intercept+age1) - exp(Intercept). Specify this function of the model parameters in the f= parameter of the NLEstimate macro. In the call below, b_p1 and b_p2 are the macro's names for the Intercept and age1 parameters, respectively. See the description of the NLEstimate macro for details such as obtaining parameter names.

      %NLEstimate(instore=insmodel, label=Rate Difference,
                  f=exp(b_p1+b_p2)-exp(b_p1),
                  title=Difference of Age Rates)

The results match those from the NLMeans macro above.

Difference of Age Rates

Label              Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Alpha     Lower     Upper
Rate Difference     -0.1685          0.02827           35.5236       <.0001    0.05   -0.2239   -0.1131
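
The same f= mechanism might also be used to estimate the individual rates. The calls below are a sketch that reuses only the macro parameters appearing above (instore=, label=, f=, and title=); the labels and titles are illustrative.

```sas
/* Rate for age 1: exp(Intercept + age1) */
%NLEstimate(instore=insmodel, label=Rate Age 1,
            f=exp(b_p1+b_p2),
            title=Rate for Age 1)

/* Rate for age 2: exp(Intercept) */
%NLEstimate(instore=insmodel, label=Rate Age 2,
            f=exp(b_p1),
            title=Rate for Age 2)
```

If this works as expected, the estimates should equal the rates 0.0840 and 0.2525 reported earlier. The confidence limits may differ from those produced by PROC PLM, which exponentiates limits computed on the log scale, if the macro instead computes symmetric Wald limits directly on the rate scale.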

Rate difference using PROC NLMIXED

The rate difference can also be estimated by fitting the Poisson model using PROC NLMIXED as follows. The LAMBDA= assignment statement expresses the Poisson mean parameter, lambda, as a function of age, the offset (ln), and the model parameters (b0 and b1). The Boolean expression (age=1) evaluates to 1 for age 1 and to 0 otherwise, so the estimated log rate for age 1 is b0+b1 and the estimated log rate for age 2 is just b0. The MODEL statement indicates that the response count, c, is to be modeled as a Poisson variable with mean lambda. The ESTIMATE statement is used to estimate the rate difference. A large degrees of freedom value (df=1e8) is used to produce large-sample (Wald) statistics like those from the NLMeans and NLEstimate macros above. For details about using PROC NLMIXED, see the NLMIXED documentation.

      proc nlmixed data=insure;
         lambda = exp(b0 + b1*(age=1) + ln);
         model c ~ poisson(lambda);
         estimate "Rate Difference" exp(b0+b1)-exp(b0) df=1e8;
         run;

Again, the results match those from the macros above.

Additional Estimates

Label              Estimate   Standard Error    DF   t Value   Pr > |t|   Alpha     Lower     Upper
Rate Difference     -0.1685          0.02827   1E8     -5.96     <.0001    0.05   -0.2239   -0.1131
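
Because the ESTIMATE statement in PROC NLMIXED accepts any function of the parameters, the individual rates and the rate ratio could be requested in the same step. The following is a sketch of that extension; the PARMS statement and its starting values are assumptions added for illustration and are not part of the original example.

```sas
proc nlmixed data=insure;
   parms b0=-1 b1=0;                          /* hypothetical starting values */
   lambda = exp(b0 + b1*(age=1) + ln);        /* Poisson mean with offset ln */
   model c ~ poisson(lambda);
   estimate "Rate, age 1"     exp(b0+b1)         df=1e8;
   estimate "Rate, age 2"     exp(b0)            df=1e8;
   estimate "Rate Ratio"      exp(b1)            df=1e8;
   estimate "Rate Difference" exp(b0+b1)-exp(b0) df=1e8;
   run;
```

The point estimates should agree with those from PROC GENMOD. The confidence limits for the rates and the rate ratio will generally differ, however, because the ESTIMATE statement builds symmetric delta-method limits on the scale of the function, whereas the EXP option in the LSMEANS statement exponentiates limits computed on the log scale.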


Operating System and Release Information

Product Family: SAS System
Product: SAS/STAT
SAS Release columns: Reported, Fixed*
System:
z/OS
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 7
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2008
Microsoft Windows XP Professional
Windows Millennium Edition (Me)
Windows Vista
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.