The relative risk (or prevalence ratio) is the ratio of event probabilities at two settings of the model predictors. When assessing the effect of a particular predictor, it is of interest to estimate the relative risk for that predictor adjusted for the effects of the other predictors. For a continuous predictor, the relative risk, p(x+1)/p(x), is interpreted as the multiplicative change in the event probability for a unit increase in the predictor. For a categorical predictor, the relative risk, p(i)/p(j), is interpreted as the multiplicative change in the event probability when changing from level j of the predictor to level i.
When the prevalence of the event is low, the odds ratio provides a good approximation of the relative risk (Agresti 2002). In this situation, the odds ratio estimates from the usual logistic model (the default logit-linked binomial model) fit by PROC LOGISTIC can be used to estimate the adjusted relative risks.
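For example, the following minimal sketch (the data set MYDATA and the variables Y, X1, and X2 are placeholders) fits the default logit-linked model; when the event is rare, the Odds Ratio Estimates table that PROC LOGISTIC displays by default can be read as approximate adjusted relative risks:

proc logistic data=mydata;
   /* default logit link; Y=1 is modeled as the event */
   model y(event='1') = x1 x2;
run;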
If the event probability is not small, then you can fit a log-linked rather than a logit-linked binomial model. As shown below, exponentiating a parameter estimate in a log-linked binomial model directly estimates the relative risk. Here is the one-variable, linear log-linked model:

   log(p) = a + bx

Under this model, let p1 be the event probability when the predictor equals x+1 and p2 the event probability when the predictor equals x. A one-unit increase in the predictor yields the following results:

   log(p1) = a + b(x+1) = a + bx + b     (1)

and

   log(p2) = a + bx     (2)

Subtract (2) from (1), and this is the result:

   log(p1) - log(p2) = b

But note that log(p1) - log(p2) = log(p1/p2) = log(relative risk), implying that the parameter estimate for the predictor, b, estimates the log relative risk. So, exponentiating the parameter estimate, e^b, provides an estimate of the relative risk.
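Equivalently, exponentiating equations (1) and (2) and taking the ratio of the two event probabilities gives the same result directly:

   p1/p2 = e^(a + bx + b) / e^(a + bx) = e^b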
You can fit the log-linked binomial model by using PROC GENMOD with the DIST=BINOMIAL and LINK=LOG options in the MODEL statement. However, using the log link can result in fitting problems because, unlike the logit link, it does not constrain predicted probabilities to the [0,1] range required for probabilities. Petersen and Lei (2003) suggest routinely specifying the MODEL statement option INTERCEPT=-4 when fitting this model. This option provides a starting value of -4 for the intercept in the maximum likelihood estimation process. The rationale is that 0 < p < 1 implies log(p) < 0; because the intercept estimates log(p) when all predictors are zero or at their reference levels, it makes sense to start its estimation in the negative range.
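For example, a minimal sketch (the data set MYDATA and the variables Y, X1, and X2 are placeholders) that requests the log-linked binomial model with the suggested starting value for the intercept:

proc genmod data=mydata descending;
   /* log-linked binomial model with a negative starting value for the intercept */
   model y = x1 x2 / dist=binomial link=log intercept=-4;
run;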
Petersen and Lei note that PROC GENMOD still might fail to fit the log-linked model when the maximum likelihood solution falls on the boundary of the parameter space. When this happens, they suggest that the solution can often be found by fitting the model to a data set consisting of many copies of the original data augmented with one copy in which the response values are reversed relative to the original data. This moves the solution into the interior of the parameter space, where the optimization algorithm can find it. While this provides good estimates of the model parameters, and therefore good estimates of the adjusted relative risks, the standard errors are artificially reduced by the replication of the data. To correct this, they multiply the standard errors by the square root of the number of copies and recompute the tests and confidence intervals. The same effect can be achieved by using weights that are normalized to the actual sample size, so that massive replication of the data and manual adjustment of the standard errors are unnecessary.
Using the example presented by the authors, the following statements fit the log-linked model to the original data augmented with a copy of the data having reversed responses. The original observations are assigned a weight of 10,000 and the reversed-response observations are assigned a weight of 1. To normalize the weights so that they sum to the original sample size, each weight is multiplied by the true sample size, 10, and divided by the sum of the weights, 100,010. The sum of the normalized weights is then the actual sample size, 10, which makes the standard errors and related statistics correct. The ESTIMATE statement is included to provide the relative risk estimate and confidence interval. In releases prior to SAS 9.2, the EXP option is needed to exponentiate the contrast (in this case, only the parameter for X), resulting in a relative risk estimate for a unit increase in X. Beginning in SAS 9.2, the EXP option is not needed because estimates of the contrast with the inverse link function applied (labeled "Mean") are provided by default.
data a;
   do x=1 to 10;
      /* read the actual data, set weights to 10000, then normalize */
      input y @@; f=10000 * 10/100010; output;
      /* create reverse-response data, set weights to 1, then normalize */
      y=1-y; f= 1 * 10/100010; output;
   end;
   datalines;
0 0 0 0 1 0 1 1 1 1
;
proc genmod data=a descending;
   weight f;
   model y=x / dist=binomial link=log;
   estimate 'X Rel. Risk' x 1;
run;
Following are abbreviated results from PROC GENMOD. The first two tables confirm that 20 observations were read (the original data and the data with reversed responses), that the weights were correctly normalized to the actual sample size of 10, and that the actual counts of the response levels were maintained.
The Parameter Estimates table shows that the true maximum likelihood solution was found as shown by Petersen and Lei. The standard errors are also correct due to normalizing the weights. The linear effect of the predictor is significant at p=.0403.
Analysis Of Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limits |         | Chi-Square | Pr > ChiSq |
|-----------|----|----------|----------------|----------------------------|---------|------------|------------|
| Intercept | 1  | -2.0934  | 1.0207         | -4.0939                    | -0.0929 | 4.21       | 0.0403     |
| x         | 1  | 0.2093   | 0.1021         | 0.0093                     | 0.4094  | 4.21       | 0.0403     |
| Scale     | 0  | 1.0000   | 0.0000         | 1.0000                     | 1.0000  |            |            |
The results from the ESTIMATE statement provide estimates and confidence intervals for both the relative risk (labeled "Mean") and the log relative risk (labeled "L'Beta"). The results indicate that the event (Y=1) is 1.23 times as likely when the predictor, X, increases by one unit, as verified in the check following the table.
Contrast Estimate Results

| Label       | Mean Estimate | Mean Confidence Limits |        | L'Beta Estimate | Standard Error | Alpha | L'Beta Confidence Limits |        | Chi-Square | Pr > ChiSq |
|-------------|---------------|------------------------|--------|-----------------|----------------|-------|--------------------------|--------|------------|------------|
| X Rel. Risk | 1.2329        | 1.0093                 | 1.5059 | 0.2093          | 0.1021         | 0.05  | 0.0093                   | 0.4094 | 4.21       | 0.0403     |
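As a check, the "Mean" row is the exponentiated "L'Beta" row: exp(0.2093) ≈ 1.2329, exp(0.0093) ≈ 1.0093, and exp(0.4094) ≈ 1.5059, consistent with the earlier result that e^b estimates the relative risk.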