When the count of an event is observed within a group or over an interval, such as deaths per 100,000 individuals, traffic accidents per year, or injuries per person-year, it is called a rate. Unlike a proportion, which ranges from 0 to 1, a rate can have any positive value, such as 4.2 deaths per 100,000 individuals or 65 accidents per year. Following is a log-linked model for a rate as a function of some predictor variables, x:
   log(μ/n) = x'β
where μ is the mean event count and n is the denominator of the rate (the size of the group or the length of the interval). Note that this can be rewritten as
   log(μ) - log(n) = x'β
and it can be written as a model for log(μ) as
   log(μ) = x'β + log(n)
This is a Poisson model in which the additional term on the right-hand side, log(n), is called the offset. The offset is the log of the denominator of the rate. It enters the model as a term whose parameter is fixed at 1 rather than estimated. In PROC GENMOD, you add an offset to a Poisson model by specifying the OFFSET= option in the MODEL statement.
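As a minimal sketch of how the offset connects the log rate to the log count, the following DATA step evaluates the relationship for a single hypothetical population. The data set name offset_demo and the values n=500 and rate=0.07 are assumptions chosen only for illustration; they are not taken from any fitted model.

```sas
/* Hypothetical values, used only to illustrate the offset arithmetic */
data offset_demo;
   n     = 500;             /* denominator of the rate (population size)  */
   rate  = 0.07;            /* assumed rate per individual, exp(x'beta)   */
   xbeta = log(rate);       /* linear predictor: the log rate             */
   log_n = log(n);          /* offset term, coefficient fixed at 1        */
   logmu = xbeta + log_n;   /* log of the mean count                      */
   mu    = exp(logmu);      /* mean count, equal to rate*n                */
run;

proc print data=offset_demo noobs;
run;
```

The printed value of mu equals rate*n (35 here, up to floating-point rounding), which is the sense in which adding log(n) to the log rate gives the log of the mean count.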
The insurance claim example in the "Getting Started" section of the GENMOD procedure documentation shows a model for the rate of insurance claims per policyholder, C/N, as a function of the size of car and age of the policyholder. The variable N is the size of the population that has a given car size and policyholder age. The following statements fit the model. The ESTIMATE and OUTPUT statements are optional and are discussed later.
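data insure;
   input n c car$ age;   /* n = number of policyholders, c = number of claims */
   ln = log(n);          /* offset: log of the rate denominator */
   datalines;
500 42 small 1
1200 37 medium 1
100 1 large 1
400 101 small 2
500 73 medium 2
300 14 large 2
;

proc genmod data=insure;
   class car age;
   model c = car age / dist   = poisson
                       link   = log
                       offset = ln;
   /* Difference in log rates, AGE=1 vs. AGE=2; EXP gives the rate ratio */
   estimate "Log Age Rate Ratio" age 1 -1 / exp;
   /* Log rate for the AGE=1, CAR=SMALL population; EXP gives the rate */
   estimate "Log rate age=1, small" intercept 1 age 1 0 car 0 0 1 / exp;
   /* xb = estimated log count (includes the offset), std = its standard error */
   output out=out xbeta=xb stdxbeta=std;
run;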
The preceding statements produce results that include the following tables:
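Under the fitted model, the difference in the log of the rates for the two age levels is the AGE parameter (the β4 parameter) for any car size. For instance, for small cars,
   log(rate for AGE=1) - log(rate for AGE=2) = (β0 + β3 + β4) - (β0 + β3) = β4
because the parameter for the reference level, AGE=2, is zero.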
Because the difference in log rates is simply a linear combination of model parameters, the ESTIMATE statement can provide point and confidence interval estimates of it. With the addition of the EXP option, it can also estimate the rate ratio. The first ESTIMATE statement in the preceding program produces the estimated rate ratio 0.2672, meaning that the claim rate for AGE=1 policyholders is estimated to be about 27% of the rate for AGE=2 policyholders. Because β4=0 implies exp(β4)=1, the test that the rate ratio is equal to one is equivalent to testing that the AGE parameter is equal to zero. This test is provided by the ESTIMATE statement (p<0.0001). The same test also appears in the Parameter Estimates table. You can estimate and test custom rate ratios by using the appropriate contrast coefficients in the ESTIMATE statement. See Examples of Writing CONTRAST and ESTIMATE Statements for more information about determining proper coefficients for custom contrasts.
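As one possible illustration of custom rate ratios, the following sketch compares car sizes rather than age groups. It assumes that the insure data set created earlier is available and that the CAR parameters are ordered large, medium, small, which is consistent with the coefficients 0 0 1 that select CAR=SMALL in the ESTIMATE statement above. The labels are arbitrary.

```sas
/* Sketch: rate ratios between car sizes, assuming CAR levels are      */
/* ordered large, medium, small in the parameter vector                */
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* large vs. small: +1 on the large parameter, -1 on the small one */
   estimate "Log rate ratio, large vs small"  car 1 0 -1 / exp;
   /* medium vs. small */
   estimate "Log rate ratio, medium vs small" car 0 1 -1 / exp;
   /* large vs. medium */
   estimate "Log rate ratio, large vs medium" car 1 -1 0 / exp;
run;
```

Because the model has no CAR*AGE interaction, each of these ratios applies to both age groups.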
Estimates of rates for individual populations can be obtained by using the ESTIMATE statement or computed by using values that are obtained from the OUTPUT statement. However, each ESTIMATE statement can estimate the rate for only one population, and you must identify that population by correctly specifying coefficients in the statement. This requires knowing exactly how the model is parameterized. Using the model for this example, the AGE=1, CAR=SMALL population is identified by β0 + β3 + β4: the intercept plus the third CAR parameter plus the first AGE parameter. This results in the second ESTIMATE statement above. The estimated rate in this population is 0.0716 (or about 7 claims per 100 policyholders) with a confidence interval of (0.0553, 0.0927).
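The same pattern can be sketched for other populations. The following statements are an illustration only, assuming the insure data set from the earlier program and the same parameter ordering (CAR: large, medium, small; AGE: 1, 2). The labels are arbitrary.

```sas
/* Sketch: one ESTIMATE statement per population of interest */
proc genmod data=insure;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* AGE=2, CAR=LARGE: intercept + first CAR parameter + second AGE parameter */
   estimate "Log rate age=2, large"  intercept 1 age 0 1 car 1 0 0 / exp;
   /* AGE=1, CAR=MEDIUM: intercept + second CAR parameter + first AGE parameter */
   estimate "Log rate age=1, medium" intercept 1 age 1 0 car 0 1 0 / exp;
run;
```

In each statement the coefficient vector for a CLASS variable places a 1 in the position of the level that defines the population and 0 elsewhere.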
The OUTPUT statement can be used to compute rate estimates for all observations in the data set, and for new observations of interest if they are added to the original data set with missing response values. However, to get rate estimates and confidence intervals via the OUTPUT statement, some extra steps are needed, because the OUTPUT statement provides count estimates rather than rate estimates. In the OUTPUT statement, the XBETA= option creates a variable that contains the estimated log counts, because both the x'β and the offset terms are used and together they estimate the log count. To estimate the log rate, we need to subtract the offset contribution because, as shown above, x'β alone estimates the log rate. The standard errors of the log counts are also added to the data set by using the STDXBETA= option. The variances of the log rates and the log counts are the same because they differ only by a constant (the offset). So we can form a large-sample confidence interval for the log rate by using the standard error of the log count. Point and interval estimates for the rates are obtained by exponentiating the point estimates and confidence limits for the log rates.
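A sketch of the new-observation approach described above follows. The data set names newobs, insure2, and out2 and the population size of 1000 are hypothetical, and the steps assume that the insure data set from the earlier program exists. Because the response c is missing for the added rows, they should not contribute to the fit, but they should still receive XBETA= and STDXBETA= values as long as the predictors and the offset are nonmissing.

```sas
/* Hypothetical new populations to score; the response c is left missing */
data newobs;
   input n car$ age;
   c  = .;            /* missing response: row is scored, not fitted */
   ln = log(n);       /* the offset is still required for scoring    */
   datalines;
1000 small 1
1000 large 2
;

/* Append the new rows to the original data */
data insure2;
   set insure newobs;
run;

/* Refit the same model; out2 holds log counts for all rows */
proc genmod data=insure2;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   output out=out2 xbeta=xb stdxbeta=std;
run;
```

The rows with missing c in out2 can then be converted to rates by subtracting ln from xb and exponentiating, in the same way as for the original observations.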
The following statements compute point estimates of the rates by subtracting the offset and exponentiating. Large-sample 95% confidence limits are obtained by computing the limits around the log rate and then exponentiating those limits.
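data predrates;
   set out;
   obsrate = c/n;                               /* observed rate */
   lograte = xb - ln;                           /* log count minus offset = log rate */
   prate   = exp(lograte);                      /* predicted rate */
   lcl     = exp(lograte - probit(.975)*std);   /* lower 95% confidence limit */
   ucl     = exp(lograte + probit(.975)*std);   /* upper 95% confidence limit */
run;

proc print data=predrates noobs;
run;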
Note that the estimates for the first observation match the results from the second ESTIMATE statement above.
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed |
| SAS System | SAS/STAT | All | n/a | |
| Type: | Usage Note |
| Priority: | low |
| Topic: | SAS Reference ==> Procedures ==> GENMOD |
| Date Modified: | 2006-10-04 17:31:12 |
| Date Created: | 2004-10-22 15:07:07 |



