If an exposure variable, such as smoking, is thought to increase the occurrence of some outcome, such as lung cancer, it is often of interest to estimate the proportional reduction in the probability of the outcome that would be achieved if the exposure were removed. This is what the population attributable risk (PAR) measures. A related measure, the attributable risk among the exposed (ARE), assesses the proportional reduction among those exposed.
This note shows how these statistics can be computed using the data in a 2×2 table of exposure by outcome. However, when there are additional factors that affect the probability of outcome, they should be adjusted for when estimating the attributable risk statistics. That adjustment can be done using stratification, which is available in PROC STDRATE. Another approach is to use a suitable model such as a logistic model on the outcome that includes the confounding variables as covariates in the model.
The example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation illustrates the stratification method to adjust for covariates. The example below uses the modeling method to estimate the attributable risk statistics.
In cross-sectional studies, the attributable risk among the exposed is defined as ARE = (Re - Ru)/Re, where Re is the risk of the outcome in the exposed group and Ru is the risk in the unexposed group. The population attributable risk is defined as PAR = (R - Ru)/R, where R is the overall risk of the outcome (see the NOTE at the end for the case where exposure lowers the risk). The ARE can also be written in terms of the relative risk, RR = Re/Ru, as ARE = (RR-1)/RR, and the PAR can be written as PAR = ARE·pe, where pe is the proportion of observed events that occurred in the exposed group.
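The two alternative forms can be checked in a few lines. In the derivation below, p denotes the proportion of the population that is exposed (a symbol introduced only for this derivation), so the overall risk is a weighted average of the two group risks and pe = p·Re/R:

$$
\begin{aligned}
\mathrm{ARE} &= \frac{R_e - R_u}{R_e} = \frac{R_e/R_u - 1}{R_e/R_u} = \frac{\mathrm{RR} - 1}{\mathrm{RR}},\\[4pt]
R &= p\,R_e + (1-p)\,R_u, \qquad p_e = \frac{p\,R_e}{R},\\[4pt]
\mathrm{ARE}\cdot p_e &= \frac{R_e - R_u}{R_e}\cdot\frac{p\,R_e}{R}
  = \frac{p\,R_e - p\,R_u}{R}
  = \frac{\bigl[p\,R_e + (1-p)\,R_u\bigr] - R_u}{R}
  = \frac{R - R_u}{R} = \mathrm{PAR}.
\end{aligned}
$$

Because pe involves only the split of the observed events between the two groups, it can be read directly from the data.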
In retrospective (case-control) studies, where the risks cannot be estimated directly, and provided the risk of the outcome is small, the ARE can be approximated by ARE = (OR-1)/OR, where OR is the odds ratio for exposure. The PAR is then computed as above, PAR = ARE·pe.
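As a minimal sketch of these 2×2 calculations, the DATA step below applies the formulas to the chemical exposure data used later in this note, pooled over age: summing the counts gives 247 events among 2,221 exposed subjects and 1,565 events among 17,603 unexposed subjects. Because age is ignored, these are crude (unadjusted) estimates. The data set name Crude2x2 and the variable names are illustrative; the odds ratio is stored as OddsR because OR is a SAS logical operator.

```sas
/* Crude attributable risk statistics from a pooled 2x2 table.
   Counts are the chemical exposure data summed over age groups. */
data Crude2x2;
   a  = 247;   n_e = 2221;          /* events, total: exposed   */
   c  = 1565;  n_u = 17603;         /* events, total: unexposed */
   Re = a / n_e;                    /* risk in the exposed      */
   Ru = c / n_u;                    /* risk in the unexposed    */
   R  = (a + c) / (n_e + n_u);      /* overall risk             */
   RR = Re / Ru;                    /* relative risk            */
   OddsR = (a / (n_e - a)) / (c / (n_u - c));   /* odds ratio   */
   pe = a / (a + c);                /* share of events in exposed group */
   ARE    = (Re - Ru) / Re;         /* same as (RR-1)/RR        */
   PAR    = (R - Ru) / R;           /* definition of PAR        */
   PAR_pe = ARE * pe;               /* identity PAR = ARE*pe    */
   ARE_OR = (OddsR - 1) / OddsR;    /* rare-outcome approximation */
   PAR_OR = ARE_OR * pe;
run;

proc print data=Crude2x2 noobs;
   var Re Ru R RR OddsR pe ARE PAR PAR_pe ARE_OR PAR_OR;
   format Re Ru R RR OddsR pe ARE PAR PAR_pe ARE_OR PAR_OR 8.4;
run;
```

With these counts the crude ARE is about 0.20 and the crude PAR about 0.027, close to the age-adjusted values reported below, so in this example adjusting for age changes the estimates only slightly.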
The following example uses the chemical exposure data from the example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation. The DATA step below creates a modified version of the Factory data set with separate observations for the exposed and unexposed groups at each age, along with their event counts (r) and totals (n). Expose is an exposure indicator variable: Expose=E represents the group exposed to a chemical agent, and Expose=NE represents nonexposure. Exposure to the agent is linked to an increased probability of the outcome event. Age is a confounding factor that should be adjusted for when assessing the effect of exposure. In the STDRATE documentation example, this adjustment is done by stratifying on the levels of Age. To adjust for Age by modeling instead, the numeric variable AgeMid, whose values are the midpoints of the age ranges, is created for use as a continuous covariate.
The following PROC LOGISTIC statements fit a logistic model to the outcome with both Expose and AgeMid as predictors. The LSMEANS statement provides estimates of the outcome risk in each exposure group, adjusted for age. The E option shows the coefficients of the linear combination of model parameters used to estimate the log odds for each group. The ILINK option applies the inverse logit function to the log odds estimates to produce the risk estimates Re and Ru. The STORE statement saves the fitted model for later use by the NLEst macro.
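```sas
data Factory;
   length Expose $2;   /* declare the length so that 'NE' is not truncated to 'N' */
   input AgeMid Age $ Event_E Count_E Event_NE Count_NE;
   /* one observation per exposure group: r events out of n subjects */
   r=Event_E;  n=Count_E;  Expose='E';  output;
   r=Event_NE; n=Count_NE; Expose='NE'; output;
   datalines;
25 20-29 31 352 143 2626
35 30-39 57 486 392 4124
45 40-49 62 538 459 4662
55 50-59 50 455 337 3622
65 60-69 38 322 199 2155
75 70+ 9 68 35 414
;

proc logistic data=Factory;
   class Expose / param=glm;
   model r/n = AgeMid Expose;   /* events/trials syntax */
   lsmeans Expose / e ilink;    /* adjusted risks at the mean of AgeMid */
   store out=log;               /* save the fitted model for NLEst */
run;
```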
The output below shows the four parameter estimates of the logistic model and the estimated odds ratios. The adjusted Expose estimate is 0.2444, and its adjusted odds ratio estimate is 1.277. The adjusted risk estimates are given in the Mean column of the Expose Least Squares Means table: Re = 0.1140 and Ru = 0.0915.
[Output: The LOGISTIC Procedure]
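Because the model is linear on the logit scale, the adjusted odds ratio does not depend on the age at which it is evaluated, while the adjusted risks do. Writing x for AgeMid and β1 to β4 for the Intercept, AgeMid, Expose=E, and Expose=NE parameters (β4 = 0 under PARAM=GLM):

$$
\begin{aligned}
\operatorname{logit} R_e(x) &= \beta_1 + \beta_2 x + \beta_3, \qquad
\operatorname{logit} R_u(x) = \beta_1 + \beta_2 x,\\[4pt]
\mathrm{OR} &= \exp\{\operatorname{logit} R_e(x) - \operatorname{logit} R_u(x)\} = e^{\beta_3} = e^{0.2444} \approx 1.277,\\[4pt]
\hat R_e &= \operatorname{logistic}(\hat\beta_1 + 50\hat\beta_2 + \hat\beta_3), \qquad
\hat R_u = \operatorname{logistic}(\hat\beta_1 + 50\hat\beta_2).
\end{aligned}
$$

So Re = 0.1140 and Ru = 0.0915 are risks at AgeMid = 50, the covariate mean that LSMEANS uses; evaluating them at another age would change Re, Ru, and therefore the ARE, while the odds ratio stays at 1.277.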
In the DATA FD step below, the first two rows express the adjusted risk for each group in terms of the LSMEANS linear-combination coefficients (shown in the Coefficients for Expose Least Squares Means table above), which multiply the model parameters, named B_p1 through B_p4. Note that the expressions involve the mean of the AgeMid covariate, 50. The inverse logit is then applied with the LOGISTIC function. The third row recomputes the odds ratio for Expose, which is simply the exponentiated third parameter estimate (Expose). The fourth and fifth rows define the ARE and PAR as given above, with PAR computed as ARE·pe. The observed value of pe, 0.1363, is obtained from the PROC FREQ step. Since the outcome event is relatively rare (around 10%, as seen above), rows 6 and 7 also approximate the attributable risks by using the odds ratio for exposure. The NLEst macro is then called to re-estimate the two group risks and the exposure odds ratio, and to estimate the ARE and PAR, along with tests and confidence intervals.
The NLEst macro is a general macro for estimating and testing linear or nonlinear combinations of model parameters. Because the expressions use the inverse logit function or exponentiation, they are nonlinear functions of the parameters. To use the macro, provide the model saved by the STORE statement and a data set containing the labeled expressions to be estimated: the name of the saved model is specified in INSTORE=, and the name of the data set of labels and expressions in FDATA=.
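For intuition about how a standard error can be attached to a nonlinear function of the parameters, consider the odds-ratio version of the ARE, which depends only on β3. A standard tool for such functions is the delta method; the sketch below illustrates that technique and is not a description of NLEst's internals:

$$
\begin{aligned}
g(\beta_3) &= \frac{e^{\beta_3} - 1}{e^{\beta_3}} = 1 - e^{-\beta_3}, \qquad
g'(\beta_3) = e^{-\beta_3},\\[4pt]
\widehat{\mathrm{SE}}\bigl(g(\hat\beta_3)\bigr) &\approx \bigl|g'(\hat\beta_3)\bigr|\,\widehat{\mathrm{SE}}(\hat\beta_3)
  = e^{-\hat\beta_3}\,\widehat{\mathrm{SE}}(\hat\beta_3).
\end{aligned}
$$

With the estimate 0.2444, g = 1 - e^(-0.2444) ≈ 0.217. For expressions involving several parameters, such as the adjusted risks, the derivative becomes a gradient vector d and the variance approximation becomes d'Vd, where V is the estimated covariance matrix of the parameters.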
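```sas
/* Observed pe: share of events (WEIGHT r) in the exposed group */
proc freq data=Factory;
   table Expose;
   weight r;
run;

/* Labeled expressions for NLEst; B_p1-B_p4 are the PROC LOGISTIC
   parameters (Intercept, AgeMid, Expose E, Expose NE) and 50 is
   the mean of AgeMid used by LSMEANS                             */
data fd;
   length label f $32767;
   infile datalines delimiter=',';
   input label f;
   datalines;
Risk(E) , logistic(B_p1+50*B_p2+B_p3)
Risk(NE) , logistic(B_p1+50*B_p2)
OR (Expose) , exp(B_p3)
ARE ,( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
PAR , 0.1363*( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
ARE OR , (exp(B_p3)-1)/exp(B_p3)
PAR OR , 0.1363*(exp(B_p3)-1)/exp(B_p3)
;

/* The NLEst macro must already be defined in the session */
%NLEst( instore=log, fdata=fd )
```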
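The value 0.1363 is typed into the expressions above by hand. As a sketch of one way to avoid that, assuming the Factory data set and the log item store created above and an already-defined NLEst macro, pe can be captured from PROC FREQ through ODS OUTPUT and substituted into the expressions. The data set names EventPct and fd_pe and the placeholder text PE are illustrative choices, not part of the original note. Because macro references are not resolved inside DATALINES, the substitution is done with the TRANWRD function.

```sas
/* Sketch: capture pe from PROC FREQ instead of typing 0.1363.
   Assumes data set Factory, item store log, and the NLEst macro
   already exist; EventPct, fd_pe, and the PE placeholder are
   illustrative names.                                          */
proc freq data=Factory;
   table Expose;
   weight r;
   ods output OneWayFreqs=EventPct;     /* one row per Expose level */
run;

data _null_;
   set EventPct;
   where Expose = 'E';
   call symputx('pe', Percent / 100);   /* proportion of events in E */
run;
%put NOTE: pe = &pe;

/* DATALINES are not scanned for macro references, so the
   placeholder PE is replaced with the value of &pe by TRANWRD. */
data fd_pe;
   length label f $32767;
   infile datalines delimiter=',';
   input label f;
   f = tranwrd(f, 'PE', "&pe");
   datalines;
PAR , PE*( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
PAR OR , PE*(exp(B_p3)-1)/exp(B_p3)
;

%NLEst( instore=log, fdata=fd_pe )
```

The other rows of the FD data set do not involve pe and can be kept as they are.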
The group risks, Re and Ru, and the exposure odds ratio agree with those in the PROC LOGISTIC results above. The estimated attributable risk, 0.197, and the estimated population attributable risk, 0.027, closely match those obtained by stratification from PROC STDRATE. Additionally, the approximate estimates based on the exposure odds ratio are similar.
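The point estimates can be checked by hand from the rounded values reported above (Re = 0.1140, Ru = 0.0915, OR = 1.277, pe = 0.1363):

$$
\begin{aligned}
\mathrm{ARE} &= \frac{0.1140 - 0.0915}{0.1140} \approx 0.197, &\quad
\mathrm{PAR} &= 0.1363 \times 0.197 \approx 0.027,\\[4pt]
\mathrm{ARE}_{OR} &= \frac{1.277 - 1}{1.277} \approx 0.217, &\quad
\mathrm{PAR}_{OR} &= 0.1363 \times 0.217 \approx 0.030.
\end{aligned}
$$

The first pair matches the NLEst estimates quoted above.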
These results indicate that nearly 3% of outcome events are attributable to exposure to the chemical agent. In other words, if exposure to the agent were removed, the probability of the outcome in the population would fall by about 3% in relative terms. Among those exposed to the agent, the probability of the outcome would fall by about 20%.
[Output: The FREQ Procedure]
Another modeling approach is to use a Poisson GEE model (Zou, 2004), which allows direct estimation of the relative risk. To fit it, the Factory data are expanded to a binary response, with separate observations for events (Y=1) and nonevents (Y=0) and their frequency counts (Count). An ID variable simply numbers the resulting observations 1, 2, and so on. PROC GENMOD with a REPEATED statement then fits a Poisson GEE model to the binary response with Expose and AgeMid as predictors, as before. The LSMEANS statement again uses the E and ILINK options to provide the risk estimates for each group and the coefficients that define them, and the DIFF and EXP options are added to provide the relative risk estimate. The STORE statement saves the model for later use.
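```sas
/* Expand to one observation per outcome level, weighted by Count */
data Factory2;
   set Factory;
   y=1; Count=r;   ID+1; output;   /* events    */
   y=0; Count=n-r; ID+1; output;   /* nonevents */
run;

proc genmod data=Factory2;
   class ID Expose;
   model y = AgeMid Expose / dist=poisson;   /* log link by default */
   freq Count;
   repeated subject=ID;                      /* GEE with empirical (robust) SEs */
   lsmeans Expose / e ilink diff exp;        /* group risks and relative risk */
   store gen;                                /* save the fitted model for NLEst */
run;
```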
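The reason a Poisson model yields relative risks is its log link. For a binary Y the mean is the risk, so with x = AgeMid and E the exposure indicator:

$$
\begin{aligned}
\log P(Y=1 \mid x, E) &= \beta_1 + \beta_2 x + \beta_3 E,\\[4pt]
\mathrm{RR} &= \frac{P(Y=1 \mid x, E=1)}{P(Y=1 \mid x, E=0)} = e^{\beta_3} \quad\text{for every } x,\\[4pt]
\operatorname{Var}(Y) &= \mu(1-\mu) < \mu = \text{Poisson variance}, \qquad 0 < \mu < 1.
\end{aligned}
$$

Because the Poisson variance overstates the variance of a binary response, model-based standard errors would be too large; Zou (2004) therefore pairs the Poisson fit with robust (sandwich) standard errors, which the REPEATED statement provides through GEE.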
Note that the estimated relative risk, 1.246 in the Exponentiated column of the Differences of Expose Least Squares Means table, is similar to the previously estimated odds ratio, 1.277, suggesting that using the odds ratio approximating formula is reasonable.
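The gap between the two measures follows from an exact identity linking them:

$$
\mathrm{OR} = \frac{R_e/(1-R_e)}{R_u/(1-R_u)} = \mathrm{RR}\cdot\frac{1-R_u}{1-R_e}.
$$

With the adjusted logistic risks Re = 0.1140 and Ru = 0.0915, RR ≈ 0.1140/0.0915 ≈ 1.246 and the factor (1 - 0.0915)/(1 - 0.1140) ≈ 1.025, so OR ≈ 1.278, consistent with the reported 1.277 up to rounding of the risks. The factor is greater than 1 whenever Re > Ru and approaches 1 as both risks approach 0, which is why the odds ratio formulas for the attributable risks are appropriate only when the outcome is rare.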
[Output: The GENMOD Procedure]
The LSMEANS coefficients table above shows, for each Expose level, the vector of coefficients that multiplies the vector of model parameters. Since the response function being modeled is log(mean) and the mean is just the probability of the outcome event, the DIFF option comparing the two groups estimates the difference in log(mean) values, or equivalently, the log relative risk. Exponentiating that difference (EXP option) produces an estimate of the relative risk. Equivalently, the difference of the two coefficient vectors can be used to multiply the parameter vector to obtain the log mean difference which can then be exponentiated. The difference of the two group vectors is simply (0 0 1 -1), but since the fourth parameter estimate is zero, the difference of log means is just 1*B_p3 = B_p3, the Expose parameter estimate. The relative risk estimate is then exp(B_p3) and this is used in the first data line in the following DATA step to recompute the relative risk. The next two data lines use the formula involving the relative risk to compute the ARE which is then used to compute the PAR.
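```sas
/* B_p3 is the Expose (E) parameter of the Poisson GEE model,
   so exp(B_p3) is the adjusted relative risk                 */
data fd;
   length label f $32767;
   infile datalines delimiter=',';
   input label f;
   datalines;
RR , exp(B_p3)
ARE RR , ( (exp(B_p3)-1)/exp(B_p3) )
PAR RR , ( 0.1363*(exp(B_p3)-1)/exp(B_p3) )
;

%NLEst( instore=gen, fdata=fd )
```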
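The coefficient-vector argument above can be written out as follows, with x̄ denoting the AgeMid value at which LSMEANS evaluates the log means (it cancels, so its value does not matter):

$$
\begin{aligned}
\mathbf{c}_E - \mathbf{c}_{NE} &= (1,\ \bar x,\ 1,\ 0) - (1,\ \bar x,\ 0,\ 1) = (0,\ 0,\ 1,\ -1),\\[4pt]
\log\hat\mu_E - \log\hat\mu_{NE} &= (\mathbf{c}_E - \mathbf{c}_{NE})\,\hat{\boldsymbol\beta} = \hat\beta_3 - \hat\beta_4 = \hat\beta_3,\\[4pt]
\widehat{\mathrm{RR}} &= e^{\hat\beta_3}, \qquad
\widehat{\mathrm{ARE}} = \frac{e^{\hat\beta_3}-1}{e^{\hat\beta_3}} = 1 - e^{-\hat\beta_3}.
\end{aligned}
$$

Because the AgeMid coefficient cancels, the estimated relative risk, and hence the ARE, is the same at every age in this model.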
The resulting estimates of the relative risk, the ARE, and the PAR are very close to those estimated directly from the separate group risks above.
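The closeness can be checked by hand. A minimal sketch, using the rounded values reported above (RR = 1.246 from PROC GENMOD and pe = 0.1363 from PROC FREQ); the data set name RRcheck is illustrative:

```sas
/* Hand check of the RR-based formulas with rounded reported values */
data RRcheck;
   RR = 1.246;                  /* relative risk from PROC GENMOD   */
   pe = 0.1363;                 /* share of events in exposed group */
   ARE_RR  = (RR - 1) / RR;     /* attributable risk, exposed       */
   ARE_alt = 1 - 1/RR;          /* algebraically identical form     */
   PAR_RR  = pe * ARE_RR;       /* population attributable risk     */
run;

proc print data=RRcheck noobs;
   format ARE_RR ARE_alt PAR_RR 8.4;
run;
```

This gives an ARE of about 0.1974 and a PAR of about 0.0269, matching the logistic-model values of 0.197 and 0.027 to the precision shown.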
__________
References:
Fleiss, J. L., Levin, B., and Paik, M. C. (2003), Statistical Methods for Rates and Proportions, 3d ed. New York: John Wiley & Sons, Inc.
Zou, G. (2004), "A Modified Poisson Regression Approach to Prospective Studies with Binary Data," Am. J. Epidemiol., 159:702-706.
NOTE: In some cases, exposure reduces the probability of the outcome event rather than increasing it. For example, exposure to the flu vaccine reduces the probability of getting the flu. In this situation, the attributable benefit among the exposed (ABE) and the population attributable benefit (PAB) can be computed with analogous formulas: ABE = (Ru - Re)/Ru and PAB = (R - Re)/R.
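These are the earlier formulas with the roles of the two groups exchanged: treating nonexposure as the harmful condition turns ARE into ABE and PAR into PAB. The identity PAR = ARE·pe carries over in the same way. Writing p for the proportion of the population that is exposed and pu for the proportion of observed events that occurred in the unexposed group (both symbols introduced here), so that R = p·Re + (1-p)·Ru and pu = (1-p)·Ru/R:

$$
\begin{aligned}
\mathrm{ABE} &= \frac{R_u - R_e}{R_u} = 1 - \mathrm{RR},\\[4pt]
\mathrm{ABE}\cdot p_u &= \frac{R_u - R_e}{R_u}\cdot\frac{(1-p)\,R_u}{R}
  = \frac{(1-p)(R_u - R_e)}{R}
  = \frac{\bigl[p\,R_e + (1-p)\,R_u\bigr] - R_e}{R}
  = \frac{R - R_e}{R} = \mathrm{PAB}.
\end{aligned}
$$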
Product Family: | SAS System |
Product: | SAS/STAT |
Type: | Usage Note |
Date Modified: | 2019-05-02 15:16:17 |
Date Created: | 2019-01-11 10:51:45 |