A rate is the count of an event observed within a group or over an interval, such as deaths per 100,000 individuals, traffic accidents per year, or injuries per person-year. Unlike a proportion, which ranges from 0 to 1, a rate can have any positive value, such as 4.2 deaths per 100,000 individuals or 65 accidents per year. Following is a log-linked model for a rate as a function of some predictor variables, x:
log(μ/n) = x'β
where μ is the mean event count and n is the denominator of the rate. Note that this can be rewritten as
log(μ) - log(n) = x'β
and it can be written as a model for log(μ) as
log(μ) = x'β + log(n)
This is typically a Poisson or negative binomial model in which the additional term on the right-hand side, log(n), is called the offset. The offset is the log of the denominator of the rate. It is a model term whose parameter is fixed at 1 rather than estimated. An offset can be added to a model in PROC GENMOD by specifying the OFFSET= option in the MODEL statement.
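As a short worked step, exponentiating the log(μ) form of the model shows why the offset coefficient must be fixed at 1. The symbols are the ones defined above; nothing new is assumed.

```latex
\begin{aligned}
\log(\mu) &= x'\beta + 1\cdot\log(n) \\
\mu &= n\,\exp(x'\beta) \\
\frac{\mu}{n} &= \exp(x'\beta)
\end{aligned}
```

With the coefficient on log(n) held at 1, the expected count is exactly proportional to n, so exp(x'β) can be read directly as the rate.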
The following discussion covers rate and rate ratio estimation in a Poisson or negative binomial model. Separate usage notes illustrate how differences, rather than ratios, of rates can be estimated, and how to estimate and compare rates in zero-inflated Poisson or negative binomial models.
The insurance claim example in the "Getting Started" section of the GENMOD procedure documentation illustrates fitting a Poisson model to the rate of insurance claims per policyholder, C/N, as a function of the size of car and age of the policyholder. The variable, N, is the size of the population that has a given car size and policyholder age. The following statements fit the model. The ESTIMATE, LSMEANS, and OUTPUT statements are discussed later.
data insure;
   input n c car$ age;
   ln = log(n);   /* offset: log of the number of policyholders */
   datalines;
500 42 small 1
1200 37 medium 1
100 1 large 1
400 101 small 2
500 73 medium 2
300 14 large 2
;

proc genmod data=insure;
   class car age;
   model c = car age / dist = poisson
                       link = log
                       offset = ln;
   /* log rate and rate for the AGE=1, CAR=small population */
   estimate "Rate: age=1, small" intercept 1 age 1 0 car 0 0 1;
   /* log rate ratio and rate ratio for AGE=1 versus AGE=2 */
   estimate "Age Rate Ratio" age 1 -1;
   lsmeans age / diff exp cl;
   output out=out xbeta=xb stdxbeta=std;
run;
The preceding statements produce results that include the "Analysis of Maximum Likelihood Parameter Estimates" table, the ESTIMATE statement results, and the LSMEANS results, all of which are discussed below.
Estimates of rates for individual populations can be obtained by using the ESTIMATE statement or computed from values produced by the OUTPUT statement. However, each ESTIMATE statement can estimate the rate for only one population, and you must identify that population by correctly specifying coefficients in the statement. This requires knowing how parameters are associated with predictor levels. Labeling in the "Analysis of Maximum Likelihood Parameter Estimates" table makes this association clear.
For example, the AGE=1, CAR=SMALL population is identified in the model by β0 + β3 + β4, that is, the intercept plus the third (small) CAR parameter plus the first (1) AGE parameter. These are the parameter coefficients used in the first ESTIMATE statement above. The sum of these three parameter estimates yields the "L'Beta Estimate", -2.6367, which is the estimated log rate for this population. The estimated rate, exp(-2.6367) = 0.0716 (or about 7 claims per 100 policyholders), is labeled as the "Mean Estimate" (see the NOTE at the end) with a confidence interval of (0.0553, 0.0927), or about 5.5 to 9.3 claims per 100 policyholders.
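The same coefficient logic extends to the other populations. The following is a minimal sketch, assuming the INSURE data set created earlier and the default CLASS level ordering (CAR sorts as large, medium, small; AGE as 1, 2). The labels and the two populations chosen are illustrative only.

```sas
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* AGE=2, CAR=medium: intercept + 2nd CAR parameter + 2nd AGE parameter */
   estimate "Rate: age=2, medium" intercept 1 car 0 1 0 age 0 1;
   /* AGE=1, CAR=large: intercept + 1st CAR parameter + 1st AGE parameter */
   estimate "Rate: age=1, large"  intercept 1 car 1 0 0 age 1 0;
run;
```

If the levels are ordered differently (for example, through the ORDER= option), the positions of the 1s must be changed to match the ordering shown in the parameter estimates table.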
The OUTPUT statement can be used to compute rate estimates for all observations in the data set, and for new observations of interest if they are added to the original data set with missing response values. However, some extra steps are needed to get rate estimates and confidence intervals this way, because the OUTPUT statement provides count estimates rather than rate estimates.
In the OUTPUT statement, the XBETA= option creates a variable that contains the estimated log counts. The XBETA= variable includes the offset and, as shown in the third expression of the model above, x'β plus the offset is the log count. To estimate the log rate, we need to subtract the offset contribution because, as shown in the first expression of the model, x'β alone estimates the log rate. The standard errors of the log counts can be added to the OUT= data set by specifying the STDXBETA= option. The variances of the log rates and the log counts are the same because the two quantities differ only by an added constant (the offset). So, we can form a large-sample confidence interval for the log rate by using the standard error of the log count. Point and interval estimates for the rates are obtained by exponentiating the point estimates and confidence limits for the log rates.
The following statements compute point estimates of the rates by subtracting the offset and exponentiating. Large-sample 95% confidence limits are obtained by computing the limits around the log rate and then exponentiating those limits.
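data predrates;
   set out;
   obsrate = c/n;                           /* observed rate */
   lograte = xb - ln;                       /* subtract the offset to get the log rate */
   prate   = exp(lograte);                  /* estimated rate */
   lcl = exp(lograte - probit(.975)*std);   /* lower 95% confidence limit */
   ucl = exp(lograte + probit(.975)*std);   /* upper 95% confidence limit */
run;

proc print data=predrates noobs;
run;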
Note that the estimates for the first observation match the results from the first ESTIMATE statement above.
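Scoring new observations, as mentioned above, can be sketched as follows. This is an illustration under stated assumptions, not part of the original example: the data set names SCORE, INSURE_PLUS, and SCORED are hypothetical, and the new rows use N=1 so that the offset is zero and the predicted count is the rate itself. The PRED=, LOWER=, and UPPER= keywords request the predicted mean and its confidence limits (95% by default).

```sas
data score;
   input car $ age;
   n  = 1;        /* unit denominator, so the predicted count equals the rate */
   ln = log(n);   /* offset is 0 */
   c  = .;        /* missing response: not used in fitting, but still scored */
   datalines;
small 1
large 2
;

data insure_plus;
   set insure score;   /* append the rows to be scored */
run;

proc genmod data=insure_plus;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   output out=scored pred=prate lower=lcl upper=ucl;
run;

proc print data=scored noobs;
   where c = .;        /* show only the scored rows */
   var car age prate lcl ucl;
run;
```

Setting N=1 avoids the separate subtract-and-exponentiate step. For rows with other values of N, the predicted value is a count and must still be converted to a rate as shown earlier.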
The ESTIMATE and LSMEANS statements can be used to estimate the ratio of two rates. The LSMEANS statement is the easiest way to produce rate ratio estimates.
Under the fitted model, the difference in the log rates for the two age levels for small cars is
log(μ1/n1) - log(μ2/n2) = (β0+β3+β4) - (β0+β3+β5) = 1·β4 + (-1)·β5
The coefficients 1 and -1 are used in the second ESTIMATE statement to estimate this difference. Because β5 is set to zero by the model parameterization, the difference is simply β4. The same result occurs for medium and large cars. Since the difference in logs is the log of the ratio
log(μ1/n1) - log(μ2/n2) = log[(μ1/n1) / (μ2/n2)] ,
β4 is the log rate ratio that compares AGE=1 to AGE=2 for any car size, and exp(β4) is the rate ratio comparing AGE 1 and 2. A test of β4=0 is provided in the "Analysis of Maximum Likelihood Parameter Estimates" table. Note that testing β4=0 is equivalent to testing exp(β4)=1.
Because the difference in log rates is simply a linear combination of model parameters as shown above, the ESTIMATE statement can provide point and interval estimates of it. The "L'Beta Estimate" from the second ESTIMATE statement above estimates the difference in the log rates (or log rate ratio), -1.3199, for the two AGE levels and is equivalent to the β4 parameter estimate. The "Mean Estimate" is the estimated rate ratio, 0.2672. A test that the "L'Beta Estimate" equals zero is also provided and matches the test of the β4 parameter in the "Analysis of Maximum Likelihood Parameter Estimates" table.
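Written out, the step from the log rate ratio to the rate ratio and its large-sample limits looks like this. The point estimates are the ones quoted above. SE(β̂4) stands for the standard error reported with the estimate and is left symbolic here.

```latex
\begin{aligned}
\widehat{RR} &= \exp(\hat\beta_4) = \exp(-1.3199) \approx 0.2672 \\
\text{95\% limits} &= \exp\!\big(\hat\beta_4 \pm z_{0.975}\,\mathrm{SE}(\hat\beta_4)\big),
\qquad z_{0.975} \approx 1.96
\end{aligned}
```

In words, the estimated claim rate for AGE=1 is about 27% of the rate for AGE=2. Equivalently, the AGE=2 rate is about 1/0.2672 ≈ 3.74 times the AGE=1 rate.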
You can estimate and test custom rate ratios by using the appropriate coefficients in the ESTIMATE statement. See "Examples of Writing CONTRAST and ESTIMATE Statements" for more information about determining proper coefficients for custom contrasts.
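As a hedged illustration of custom rate ratios, the following sketch compares car sizes, and then two populations that differ in both CAR and AGE. It assumes the INSURE data set and the default level ordering (large, medium, small for CAR). The labels are made up for this example.

```sas
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* small vs. large cars at the same age: beta3 - beta1 */
   estimate "RR: small vs large" car -1 0 1;
   /* (AGE=1, small) vs. (AGE=2, large): (beta3 - beta1) + (beta4 - beta5) */
   estimate "RR: age1 small vs age2 large" car -1 0 1 age 1 -1;
run;
```

The intercept cancels in every ratio, so it does not appear in these statements. Each coefficient vector is the difference between the coefficient vectors of the two populations being compared.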
The same comparison of AGE rates can be accomplished more easily using the LSMEANS statement because you do not need to define the specific linear combination of model parameters. The LSMEANS statement above requests estimation of LS-means for each AGE level. The DIFF option provides all pairwise differences among the AGE levels. The EXP option exponentiates the LS-mean estimates and the difference estimates, which yields the estimate of the AGE rate ratio. The CL option provides confidence limits. Note that the LSMEANS statement provides the same log rate ratio and rate ratio estimates as the ESTIMATE statement, as well as the same confidence limits and test.
The LS-means for the AGE levels (not shown) do not correspond to a particular CAR and AGE population, but rather are the AGE means averaged over the CAR levels. However, as noted above, the difference in the log rates is the same regardless of the CAR setting, so the difference in LS-means provides the same results as obtained by the ESTIMATE statement.
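The same approach works for the other classification variable. The following sketch, which assumes the INSURE data set from above, requests all pairwise CAR comparisons without writing any coefficients by hand.

```sas
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* DIFF gives all pairwise CAR differences on the log scale;
      EXP exponentiates them into rate ratios; CL adds confidence limits */
   lsmeans car / diff exp cl;
run;
```

Because the model contains no CAR by AGE interaction, each exponentiated difference is a CAR rate ratio that applies at either AGE level.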
_____
NOTE: In releases prior to SAS 9.2, the EXP option is needed in the ESTIMATE statement to exponentiate the estimated linear combination, which yields the rate ratio estimate. Beginning in SAS 9.2, the EXP option is no longer needed because estimates of the contrast after applying the inverse link function (labeled "Mean") are provided by default.
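For the earlier releases mentioned in this NOTE, the request might look like the following sketch, which assumes the same INSURE data set and model as above.

```sas
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* EXP adds the exponentiated estimate (the rate ratio) to the output */
   estimate "Age Rate Ratio" age 1 -1 / exp;
run;
```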
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
| SAS System | SAS/STAT | All | n/a | |
| Type: | Usage Note |
| Priority: | low |
| Topic: | SAS Reference ==> Procedures ==> GENMOD; Analytics ==> Regression |
| Date Modified: | 2006-10-04 17:31:12 |
| Date Created: | 2004-10-22 15:07:07 |




