You can estimate rates in PROC GENMOD using a log-linked Poisson or negative binomial model with an offset, as discussed and illustrated in this note. Because the log of the rate is the response function, such a model enables you to estimate the log rate for any setting of the predictors, and exponentiating that estimate gives the rate. Similarly, the difference between two predictor settings is an estimated difference in log rates, which is a log rate ratio. Exponentiating it gives the rate ratio comparing the settings. In PROC GENMOD, the EXP and DIFF options in the LSMEANS statement, or the equivalent linear combination of model parameters specified in the ESTIMATE statement, provide estimates of rates and rate ratios. However, the difference in rates cannot be obtained with these statements.
To estimate the difference in rates, you need to estimate a nonlinear function of the model parameters. This can be done using the NLMeans macro, the NLEstimate macro, or PROC NLMIXED. All three are illustrated in this note.
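The distinction between the linear and nonlinear quantities can be written out as follows. This is a sketch in notation assumed here for illustration ($\mu_i$ for the expected count, $n_i$ for the rate denominator, $\mathbf{x}_a$ and $\mathbf{x}_b$ for two predictor settings, $\hat\Sigma$ for the estimated covariance matrix of $\hat{\boldsymbol\beta}$). The last line is the usual large-sample delta-method approximation.

```latex
\begin{align*}
\log \mu_i &= \log n_i + \mathbf{x}_i^\top \boldsymbol\beta
  && \text{(log link with offset } \log n_i\text{)} \\
r(\mathbf{x}) &= \exp(\mathbf{x}^\top \boldsymbol\beta)
  && \text{(rate at setting } \mathbf{x}\text{)} \\
\log \frac{r(\mathbf{x}_a)}{r(\mathbf{x}_b)} &= (\mathbf{x}_a - \mathbf{x}_b)^\top \boldsymbol\beta
  && \text{(log rate ratio: linear in } \boldsymbol\beta\text{)} \\
d(\boldsymbol\beta) &= \exp(\mathbf{x}_a^\top \boldsymbol\beta) - \exp(\mathbf{x}_b^\top \boldsymbol\beta)
  && \text{(rate difference: nonlinear in } \boldsymbol\beta\text{)} \\
\widehat{\operatorname{Var}}\bigl(d(\hat{\boldsymbol\beta})\bigr) &\approx
  \mathbf{g}^\top \hat\Sigma\, \mathbf{g},
  \qquad \mathbf{g} = r(\mathbf{x}_a)\,\mathbf{x}_a - r(\mathbf{x}_b)\,\mathbf{x}_b
\end{align*}
```

Because the rate ratio is the exponential of a linear combination of parameters, it fits the LSMEANS and ESTIMATE statements. The rate difference does not, which is why a nonlinear estimation tool is needed.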
An analogous problem for a logistic model is to compute differences of probabilities rather than odds ratios. Both macros can be used for this as well.
Using a subset of the car insurance data that appears in the "Getting Started" section of the GENMOD documentation, the following statements begin by computing estimates of the rates for the two levels of age in the data and the rate ratio comparing the ages as presented in this note. The DATA step creates the observed rate as the variable ObsRate and the offset variable, ln, as the log of n. The STORE statement saves the fitted model for later use.
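data insure;
   input n c age;
   ln = log(n);          /* offset: log of the rate denominator n */
   ObsRate = c/n;        /* observed rate */
   datalines;
500 42 1
400 101 2
;
proc genmod data=insure;
   class age;
   model c = age / dist=poisson offset=ln;
   estimate 'Age rate ratio' age 1 -1;   /* log rate ratio, age 1 vs. age 2 */
   lsmeans age / e diff exp cl;          /* log rates, rates (EXP), and their difference */
   ods output coef=coeffs;               /* LS-means coefficients for the NLMeans macro */
   store out=insmodel;                   /* save the fitted model for later use */
run;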
The rate estimates for the two age levels are provided by the EXP option in the LSMEANS statement. Because the model is saturated in this example, the predicted rates (in the Exponentiated column) are identical to the observed rates: 0.0840 for age 1 and 0.2525 for age 2. The ESTIMATE statement and the DIFF option in the LSMEANS statement both provide the rate ratio estimate, 0.3327, comparing the two ages. Confidence intervals (95%) for the rates and the rate ratio are also given.
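Analysis Of Maximum Likelihood Parameter Estimates
Parameter | Level | DF | Estimate | Standard Error | Wald 95% Lower | Wald 95% Upper | Wald Chi-Square | Pr > ChiSq
Intercept | | 1 | -1.3763 | 0.0995 | -1.5714 | -1.1813 | 191.33 | <.0001
age | 1 | 1 | -1.1006 | 0.1836 | -1.4605 | -0.7407 | 35.93 | <.0001
age | 2 | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | .
Scale | | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | |

Contrast Estimate Results
Label | Mean Estimate | Mean Lower | Mean Upper | L'Beta Estimate | Standard Error | Alpha | L'Beta Lower | L'Beta Upper | Chi-Square | Pr > ChiSq
Age rate ratio | 0.3327 | 0.2321 | 0.4768 | -1.1006 | 0.1836 | 0.05 | -1.4605 | -0.7407 | 35.93 | <.0001

age Least Squares Means
age | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper | Exponentiated | Exponentiated Lower | Exponentiated Upper
1 | -2.4769 | 0.1543 | -16.05 | <.0001 | 0.05 | -2.7794 | -2.1745 | 0.08400 | 0.06208 | 0.1137
2 | -1.3763 | 0.09950 | -13.83 | <.0001 | 0.05 | -1.5714 | -1.1813 | 0.2525 | 0.2078 | 0.3069

Differences of age Least Squares Means
age | _age | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper | Exponentiated | Exponentiated Lower | Exponentiated Upper
1 | 2 | -1.1006 | 0.1836 | -5.99 | <.0001 | 0.05 | -1.4605 | -0.7407 | 0.3327 | 0.2321 | 0.4768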
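The rate ratio results above can be checked by hand outside SAS. The following Python sketch is only an independent arithmetic check, not part of the SAS workflow. It assumes the standard large-sample result that, for this saturated model, the variance of an estimated log rate is approximately 1/count, and it uses 1.959964 as the normal critical value for a 95% interval.

```python
import math

# Counts (c) and rate denominators (n) from the insure data set.
n1, c1 = 500, 42     # age 1
n2, c2 = 400, 101    # age 2

# Saturated model: fitted rates equal the observed rates.
rate1, rate2 = c1 / n1, c2 / n2

# Log rate ratio and its large-sample standard error.
# Assumption: Var(log rate) ~ 1/count for a Poisson count.
log_rr = math.log(rate1) - math.log(rate2)
se_log_rr = math.sqrt(1 / c1 + 1 / c2)

# Wald limits are formed on the log scale and then exponentiated.
z = 1.959964  # approximate 97.5th percentile of the standard normal
rr = math.exp(log_rr)
rr_lower = math.exp(log_rr - z * se_log_rr)
rr_upper = math.exp(log_rr + z * se_log_rr)
chi_square = (log_rr / se_log_rr) ** 2

print(f"rates: {rate1:.4f} {rate2:.4f}")
print(f"log rate ratio: {log_rr:.4f} (SE {se_log_rr:.4f})")
print(f"rate ratio: {rr:.4f} ({rr_lower:.4f}, {rr_upper:.4f})")
print(f"Wald chi-square: {chi_square:.2f}")
```

The printed values agree with the ESTIMATE and LSMEANS results to the precision shown, which illustrates that the rate ratio interval is built on the log scale and then back-transformed.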
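The SCORE statement in PROC PLM can use the fitted model to provide a rate estimate and confidence interval for each observation in a data set. The NOOFFSET option leaves the offset out of the linear predictor, so the prediction is a log rate rather than a log count, and the ILINK option applies the inverse link (exponentiation) to return the rate itself.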
proc plm source=insmodel;
   score data=insure out=PredRates pred stderr lclm uclm / nooffset ilink;
run;
proc print data=PredRates noobs label;
run;
n | c | age | ln | ObsRate | Predicted | StdErr | LCLM | UCLM
500 | 42 | 1 | 6.21461 | 0.0840 | 0.0840 | 0.012961 | 0.06208 | 0.11366
400 | 101 | 2 | 5.99146 | 0.2525 | 0.2525 | 0.025125 | 0.20776 | 0.30687
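The standard errors and limits in this table are consistent with a delta-method standard error on the rate scale and with confidence limits that are computed on the log scale and then back-transformed. The Python sketch below checks that arithmetic independently of SAS. It assumes Var(log rate) is approximately 1/count for this saturated model and is not a description of how PROC PLM is implemented.

```python
import math

z = 1.959964  # approximate 97.5th percentile of the standard normal

# (n, c) pairs from the insure data set. The model is saturated, so the
# fitted log rate for each row is log(c/n).
rows = [(500, 42), (400, 101)]

results = []
for n, c in rows:
    log_rate = math.log(c / n)
    se_log = 1 / math.sqrt(c)        # assumed large-sample SE of a log Poisson rate
    rate = math.exp(log_rate)        # inverse link applied to the estimate
    se_rate = rate * se_log          # delta method: d/dx exp(x) = exp(x)
    lower = math.exp(log_rate - z * se_log)   # limits built on the log scale,
    upper = math.exp(log_rate + z * se_log)   # then back-transformed
    results.append((rate, se_rate, lower, upper))
    print(f"{rate:.4f} {se_rate:.6f} {lower:.5f} {upper:.5f}")
```

Because the limits are back-transformed, they are not symmetric about the predicted rate.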
Rate difference using the NLMeans macro
While the difference in rates cannot be obtained from the LSMEANS or ESTIMATE statements as noted above, it can be estimated using the NLMeans macro. To use the macro, you need to supply the saved model from the STORE statement and a data set of coefficients that define the individual LS-means. This coefficients data set is made available by the E option in the LSMEANS statement and is saved by the ODS OUTPUT statement shown above. Finally, you specify that the link function used in the model is the log link.
%NLMeans(instore=insmodel, coef=coeffs, link=log, title=Difference of Age Rates)
Note that the estimated rate difference is exactly the difference between the rates shown in the PredRates data set produced by PROC PLM: 0.0840 - 0.2525 = -0.1685. This difference is significantly nonzero (p<0.0001). A confidence interval for the rate difference, (-0.2239, -0.1131), is also given.
Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Alpha | Lower | Upper
-0.1685 | 0.02827 | 35.5236 | <.0001 | 0.05 | -0.2239 | -0.1131
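These values can be reproduced with a short delta-method calculation on the rate scale. The Python sketch below is an independent check under stated assumptions, not the macro's code: it treats the two age groups as independent, approximates Var(log rate) by 1/count, and uses a symmetric Wald interval on the difference scale.

```python
import math

n1, c1 = 500, 42     # age 1
n2, c2 = 400, 101    # age 2
rate1, rate2 = c1 / n1, c2 / n2

diff = rate1 - rate2

# Delta method: Var(rate) ~ rate^2 * Var(log rate) = rate^2 / count.
# The groups are independent, so the variances add.
var_diff = rate1 ** 2 / c1 + rate2 ** 2 / c2
se_diff = math.sqrt(var_diff)

chi_square = (diff / se_diff) ** 2
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))

z = 1.959964  # approximate 97.5th percentile of the standard normal
lower, upper = diff - z * se_diff, diff + z * se_diff

print(f"difference: {diff:.4f} (SE {se_diff:.5f})")
print(f"Wald chi-square: {chi_square:.4f}, p = {p_value:.2e}")
print(f"95% limits: ({lower:.4f}, {upper:.4f})")
```

Unlike the rate ratio interval, this interval is symmetric about the estimate because it is formed directly on the difference scale.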
Rate difference using the NLEstimate macro
Next, the NLEstimate macro is used to estimate the rate difference. Note that the estimated log rate for age=1 is Intercept+age1 and the estimated log rate for age=2 is just Intercept, because age2, the parameter for the reference level, is zero. Intercept, age1, and age2 are the parameter estimates of the fitted model shown above. The difference in rates is then exp(Intercept+age1) - exp(Intercept). Specify this function of the model parameters in the f= parameter of the NLEstimate macro, where b_p1 and b_p2 refer to Intercept and age1 respectively. See the description of the NLEstimate macro for details such as obtaining parameter names.
%NLEstimate(instore=insmodel, label=Rate Difference,
            f=exp(b_p1+b_p2)-exp(b_p1),
            title=Difference of Age Rates)
The results match those from the NLMeans macro above.
Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Alpha | Lower | Upper
-0.1685 | 0.02827 | 35.5236 | <.0001 | 0.05 | -0.2239 | -0.1131
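The same standard error follows from applying the delta method directly to the function given in f=, using the gradient with respect to the two parameters. The Python sketch below illustrates this. The covariance matrix is an assumption derived here from Var(log rate) being approximately 1/count with independent groups, not output copied from SAS, although its diagonal agrees with the squared standard errors in the parameter estimates table (0.0995 and 0.1836).

```python
import math

# Parameter estimates in the GENMOD parameterization (age 2 is the reference level).
b_int = math.log(101 / 400)                          # Intercept: log rate for age 2
b_age1 = math.log(42 / 500) - math.log(101 / 400)    # age1: log rate ratio

# Assumed covariance matrix of (Intercept, age1) for this saturated model.
cov = [[1 / 101, -1 / 101],
       [-1 / 101, 1 / 42 + 1 / 101]]

def f(b1, b2):
    """Rate difference as written in the f= expression."""
    return math.exp(b1 + b2) - math.exp(b1)

# Gradient of f with respect to (b1, b2), evaluated at the estimates.
grad = [math.exp(b_int + b_age1) - math.exp(b_int),
        math.exp(b_int + b_age1)]

# Delta method: Var(f) ~ grad' * cov * grad.
var_f = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))

estimate = f(b_int, b_age1)
se = math.sqrt(var_f)
chi_square = (estimate / se) ** 2

print(f"estimate: {estimate:.4f}, SE: {se:.5f}, Wald chi-square: {chi_square:.4f}")
```

The negative covariance between Intercept and age1 cancels the cross terms, leaving the same variance as adding the two rate variances on the mean scale.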
Rate difference using PROC NLMIXED
The rate difference can also be estimated by fitting the Poisson model using PROC NLMIXED as follows. The LAMBDA= assignment statement expresses the Poisson mean parameter, lambda, as a function of age, the offset (ln), and the model parameters (b0 and b1). The MODEL statement indicates that the response count, c, is to be modeled as a Poisson variable with mean lambda. The ESTIMATE statement is used to estimate the rate difference. A large degrees of freedom value (df=1e8) is used to produce large-sample (Wald) statistics like those from the NLMeans and NLEstimate macros above. Note that the estimated log rate for age 1 is b0+b1 and the estimated log rate for age 2 is just b0. For details about using PROC NLMIXED, see the NLMIXED documentation.
proc nlmixed data=insure;
   lambda = exp(b0 + b1*(age=1) + ln);
   model c ~ poisson(lambda);
   estimate "Rate Difference" exp(b0+b1)-exp(b0) df=1e8;
run;
Again, the results match those from the macros above.
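Additional Estimates
Label | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper
Rate Difference | -0.1685 | 0.02827 | 1E8 | -5.96 | <.0001 | 0.05 | -0.2239 | -0.1131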
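To show what this fit and ESTIMATE computation amount to, the Python sketch below maximizes the same Poisson likelihood and then applies the delta method. It is an illustration under assumptions: Newton-Raphson stands in for whatever optimizer PROC NLMIXED uses, the function name fit_poisson and the starting values are hypothetical, and the very large df is mimicked by a normal reference distribution.

```python
import math

# Rows of the insure data set: (n, c, age).
data = [(500, 42, 1), (400, 101, 2)]

def fit_poisson(data, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1*(age == 1) + log(n)."""
    total_c = sum(c for _, c, _ in data)
    total_n = sum(n for n, _, _ in data)
    b0, b1 = math.log(total_c / total_n), 0.0   # start at the pooled log rate
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for n, c, age in data:
            x = 1.0 if age == 1 else 0.0
            mu = n * math.exp(b0 + b1 * x)
            g0 += c - mu             # score for b0
            g1 += (c - mu) * x       # score for b1
            h00 += mu                # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Inverse information at the current iterate: covariance of (b0, b1).
        v00, v01, v11 = h11 / det, -h01 / det, h00 / det
        b0 += v00 * g0 + v01 * g1
        b1 += v01 * g0 + v11 * g1
    return b0, b1, (v00, v01, v11)

b0, b1, (v00, v01, v11) = fit_poisson(data)

# Rate difference and its delta-method standard error.
r1, r2 = math.exp(b0 + b1), math.exp(b0)
diff = r1 - r2
g = (r1 - r2, r1)   # gradient of exp(b0+b1) - exp(b0) with respect to (b0, b1)
var = g[0] ** 2 * v00 + 2 * g[0] * g[1] * v01 + g[1] ** 2 * v11
se = math.sqrt(var)

z_stat = diff / se
p_value = math.erfc(abs(z_stat) / math.sqrt(2))   # two-sided normal p-value

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"difference = {diff:.4f}, SE = {se:.5f}, z = {z_stat:.2f}, p = {p_value:.2e}")
```

The z statistic here is the square root of the Wald chi-square reported by the macros, with the sign of the estimate, so the three approaches are the same large-sample test.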