If an exposure variable, such as smoking, is thought to increase the occurrence of some outcome, such as lung cancer, it is often of interest to estimate the proportion reduction in the probability of the outcome that would be achieved if the exposure were removed. This is the intent of the population attributable risk, PAR. A related measure, the attributable risk among the exposed, ARE, assesses the proportion reduction among those exposed.
This note shows how these statistics can be computed using the data in a 2×2 table of exposure by outcome. However, when there are additional factors that affect the probability of outcome, they should be adjusted for when estimating the attributable risk statistics. That adjustment can be done using stratification, which is available in PROC STDRATE. Another approach is to use a suitable model such as a logistic model on the outcome that includes the confounding variables as covariates in the model.
The example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation illustrates the stratification method to adjust for covariates. The example below uses the modeling method to estimate the attributable risk statistics.
In cross-sectional studies, the attributable risk among the exposed is defined as ARE = (R_{e} - R_{u})/R_{e}, where R_{e} is the risk of the outcome in the exposed group and R_{u} is the risk in the unexposed group. The population attributable risk is defined as PAR = (R - R_{u})/R, where R is the overall risk of the outcome.^{NOTE} The ARE can also be written in terms of the relative risk, RR = R_{e}/R_{u}, as ARE = (RR - 1)/RR. PAR can be written as PAR = ARE·p_{e}, where p_{e} is the proportion of observed events that were in the exposed group.
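The two identities above follow from the definitions in a few steps. The sketch below introduces one symbol that is not in the note, \pi, assumed here to denote the proportion of the population that is exposed, so that the overall risk is a weighted average of the two group risks.

```latex
\begin{align*}
\mathrm{ARE} &= \frac{R_e - R_u}{R_e} = 1 - \frac{R_u}{R_e}
              = 1 - \frac{1}{\mathrm{RR}} = \frac{\mathrm{RR} - 1}{\mathrm{RR}} \\[4pt]
R &= \pi R_e + (1 - \pi) R_u
   \quad\Longrightarrow\quad R - R_u = \pi\,(R_e - R_u) \\[4pt]
p_e &= \frac{\pi R_e}{R}
   \qquad \text{(share of all events that occur among the exposed)} \\[4pt]
\mathrm{PAR} &= \frac{R - R_u}{R} = \frac{\pi\,(R_e - R_u)}{R}
   = \frac{R_e - R_u}{R_e} \cdot \frac{\pi R_e}{R}
   = \mathrm{ARE} \cdot p_e
\end{align*}
```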
In retrospective, case-control studies and when the risk of the outcome is small, the attributable risk can be estimated by ARE = (OR - 1)/OR, where OR is the odds ratio for exposure. The population attributable risk is then computed as above by PAR = ARE·p_{e}.
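The reason the odds ratio can stand in for the relative risk when the outcome is rare can be seen by writing the odds ratio in terms of the two group risks. This is a standard identity, shown here only as a sketch in the notation of this note.

```latex
\begin{align*}
\mathrm{OR} &= \frac{R_e / (1 - R_e)}{R_u / (1 - R_u)}
             = \mathrm{RR} \cdot \frac{1 - R_u}{1 - R_e} \\[4pt]
\text{If } R_e \text{ and } R_u \text{ are both small:}\quad
\frac{1 - R_u}{1 - R_e} &\approx 1
\quad\Longrightarrow\quad
\mathrm{OR} \approx \mathrm{RR},
\qquad
\frac{\mathrm{OR} - 1}{\mathrm{OR}} \approx \frac{\mathrm{RR} - 1}{\mathrm{RR}} = \mathrm{ARE}
\end{align*}
```

Because R_{e} > R_{u} implies (1 - R_{u})/(1 - R_{e}) > 1, the odds ratio is somewhat farther from 1 than the relative risk, so the odds-ratio form tends to overstate the ARE slightly when the outcome is not rare.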
The following example uses the chemical exposure data in the example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation. The DATA step below creates a modified version of the Factory data set that has separate observations for the exposed and nonexposed groups in each age range, along with their event and total counts. Expose is an exposure indicator variable, where Expose=E represents the group exposed to a chemical agent and Expose=NE represents nonexposure. Exposure to the agent is linked to increased probability of the outcome event. Age is a confounding factor that should be adjusted for when assessing the effect of exposure. In the STDRATE documentation example, this adjustment is done by stratifying on the levels of Age. In order to adjust for Age by modeling, the numeric variable AgeMid, with values at the midpoints of the age ranges, is created for use as a continuous covariate.
The following PROC LOGISTIC statements fit a logistic model on the outcome that includes both Expose and AgeMid as predictors. The LSMEANS statement provides estimates of the outcome risk in each exposure group adjusted for age. The E option shows the coefficients of the linear combination of model parameters used to estimate the log odds for each group. The ILINK option applies the inverse logit function to the log odds estimates to produce the risk estimates R_{e} and R_{u}. The STORE statement saves the fitted model for later use in the NLEst macro.
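data Factory;
   /* Declare Expose as $2 so that 'NE' is not truncated to 'N';  */
   /* otherwise its length is taken from the first assignment.    */
   length Expose $2;
   input AgeMid Age $ Event_E Count_E Event_NE Count_NE;
   /* One observation per exposure group in each age range:       */
   /* r = number of events, n = number at risk.                   */
   r=Event_E;  n=Count_E;  Expose='E';  output;
   r=Event_NE; n=Count_NE; Expose='NE'; output;
   datalines;
25 20-29 31 352 143 2626
35 30-39 57 486 392 4124
45 40-49 62 538 459 4662
55 50-59 50 455 337 3622
65 60-69 38 322 199 2155
75 70+    9  68  35  414
;

proc logistic data=Factory;
   class Expose / param=glm;
   model r/n = AgeMid Expose;
   lsmeans Expose / e ilink;
   store out=log;
run;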
Below are shown the four parameters of the logistic model and the estimated odds ratios. The adjusted Expose estimate is 0.2444 and its adjusted odds ratio estimate is 1.277. The adjusted risk estimates are provided in the Mean column of the Expose Least Squares Means table – R_{e} = 0.1140 and R_{u} = 0.0915.
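[Output: The LOGISTIC Procedure (parameter estimates, odds ratio estimates, Coefficients for Expose Least Squares Means, and Expose Least Squares Means tables)]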

In the DATA FD step below, the first two rows express the adjusted risk for each group in terms of the linear combination coefficients from the LSMEANS statement (shown in the Coefficients for Expose Least Squares Means table above), which multiply the model parameters (named B_p1 through B_p4). Note that the expressions involve the mean of the AgeMid covariate, 50. The inverse logit function is then applied using the LOGISTIC function. The odds ratio for Expose is recomputed in the third row and is simply the third parameter estimate (Expose), exponentiated. The fourth and fifth rows compute the attributable risk among the exposed and the population attributable risk as defined above, with PAR computed as ARE·p_{e}. The observed value of p_{e} is provided by the PROC FREQ step and is found to be 0.1363. Since the outcome event is relatively rare (around 10% as seen above), the attributable risks are also approximated using the odds ratio for exposure in rows 6 and 7 of the data set. The NLEst macro is then called to reestimate the two group risks and the exposure odds ratio, and to estimate the attributable risk and population attributable risk along with tests and confidence intervals.
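Written as formulas, the seven rows of the DATA FD step correspond to the functions below. This is only a restatement of the description above; the notation \beta_j for the parameter named B_pj is an assumption made here for readability.

```latex
\begin{align*}
\operatorname{logistic}(x) &= \frac{1}{1 + e^{-x}} \\[4pt]
\hat{R}_e &= \operatorname{logistic}(\beta_1 + 50\,\beta_2 + \beta_3),
\qquad
\hat{R}_u = \operatorname{logistic}(\beta_1 + 50\,\beta_2) \\[4pt]
\widehat{\mathrm{OR}} &= e^{\beta_3} \\[4pt]
\widehat{\mathrm{ARE}} &= \frac{\hat{R}_e - \hat{R}_u}{\hat{R}_e},
\qquad
\widehat{\mathrm{PAR}} = 0.1363 \cdot \widehat{\mathrm{ARE}} \\[4pt]
\widehat{\mathrm{ARE}}_{\mathrm{OR}} &= \frac{e^{\beta_3} - 1}{e^{\beta_3}},
\qquad
\widehat{\mathrm{PAR}}_{\mathrm{OR}} = 0.1363 \cdot \widehat{\mathrm{ARE}}_{\mathrm{OR}}
\end{align*}
```

The fourth parameter does not appear because, under the GLM parameterization, the estimate for the reference level Expose=NE is zero.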
The NLEst macro is a general macro for estimating and testing linear or nonlinear combinations of model parameters. Because the inverse logit function or exponentiation is used in the expressions, nonlinear functions must be estimated. The macro is used by providing the saved model from the STORE statement and a data set containing the expressions to be estimated along with labels. The name of the saved model is specified in INSTORE=, and the name of the data set of expressions and labels is specified in FDATA=.
proc freq data=Factory;
   table Expose;
   weight r;
run;

data fd;
   length label f $32767;
   infile datalines delimiter=',';
   input label f;
   datalines;
Risk(E) , logistic(B_p1+50*B_p2+B_p3)
Risk(NE) , logistic(B_p1+50*B_p2)
OR (Expose) , exp(B_p3)
ARE ,( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
PAR , 0.1363*( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
ARE OR , (exp(B_p3)-1)/exp(B_p3)
PAR OR , 0.1363*(exp(B_p3)-1)/exp(B_p3)
;

%NLEst( instore=log, fdata=fd )
The group risks, R_{e} and R_{u}, and the exposure odds ratio agree with those in the PROC LOGISTIC results above. The estimated attributable risk, 0.197, and the estimated population attributable risk, 0.027, closely match those obtained by stratification from PROC STDRATE. Additionally, the approximate estimates based on the exposure odds ratio are similar.
These results indicate that nearly 3% of the outcome event is attributable to the exposure to the chemical agent. In other words, the proportion of the outcome event would be reduced by about 3% in the population if exposure to the chemical agent were removed. Among those exposed to the agent, the outcome event would be reduced by about 20%.
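The point estimates can be checked by hand from the rounded values quoted in this note. The DATA step below is a minimal sketch of that arithmetic and is not part of the original analysis; the variable names are chosen here for illustration, and it gives no standard errors or confidence limits, which is what the NLEst macro adds.

```sas
/* Hand check using the rounded estimates quoted in the text. */
data _null_;
   Re        = 0.1140;  /* adjusted risk, Expose=E  (LSMEANS Mean column) */
   Ru        = 0.0915;  /* adjusted risk, Expose=NE (LSMEANS Mean column) */
   pe        = 0.1363;  /* proportion of events in the exposed group      */
   OddsRatio = 1.277;   /* adjusted odds ratio for Expose                 */

   ARE    = (Re - Ru) / Re;               /* attributable risk among exposed */
   PAR    = pe * ARE;                     /* population attributable risk    */
   ARE_OR = (OddsRatio - 1) / OddsRatio;  /* rare-outcome approximation      */
   PAR_OR = pe * ARE_OR;

   put ARE= 6.3 PAR= 6.3 ARE_OR= 6.3 PAR_OR= 6.3;
run;
```

With these inputs, ARE and PAR should come out near the 0.197 and 0.027 reported above, and the odds-ratio versions slightly higher. Small differences from the macro output are expected because the inputs are rounded.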
[Output: The FREQ Procedure (frequency table of Expose weighted by r)]

Another modeling approach is to use a Poisson GEE model (Zou, 2004) which allows for estimation of the relative risk. This is further illustrated in this note. The Factory data are expanded to have a binary response with separate observations for events (Y=1) and nonevents (Y=0) with frequency counts (Count). An ID variable is added which simply numbers the final observations 1, 2, ... . PROC GENMOD with a REPEATED statement is then used to fit a Poisson GEE model to the binary response with Expose and AgeMid as predictors as before. The LSMEANS statement is again added with the E and ILINK options to provide the risk estimates for each group and the coefficients that define them. The DIFF and EXP options are also added to provide the relative risk estimate. The STORE statement saves the model for later use.
data Factory2;
   set Factory;
   /* One observation for the events and one for the nonevents in each group */
   y=1; Count=r;   ID+1; output;
   y=0; Count=n-r; ID+1; output;
run;

proc genmod data=Factory2;
   class ID Expose;
   model y = AgeMid Expose / dist=poisson;
   freq Count;
   repeated subject=ID;
   lsmeans Expose / e ilink diff exp;
   store gen;
run;
Note that the estimated relative risk, 1.246 in the Exponentiated column of the Differences of Expose Least Squares Means table, is similar to the previously estimated odds ratio, 1.277, suggesting that using the odds ratio approximating formula is reasonable.
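[Output: The GENMOD Procedure (LSMEANS coefficients, Expose Least Squares Means, and Differences of Expose Least Squares Means tables)]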

The LSMEANS coefficients table above shows, for each Expose level, the vector of coefficients that multiplies the vector of model parameters. Since the response function being modeled is log(mean) and the mean is just the probability of the outcome event, the DIFF option comparing the two groups estimates the difference in log(mean) values, or equivalently, the log relative risk. Exponentiating that difference (EXP option) produces an estimate of the relative risk. Equivalently, the difference of the two coefficient vectors can be used to multiply the parameter vector to obtain the log mean difference, which can then be exponentiated. The difference of the two group vectors is simply (0 0 1 -1), but since the fourth parameter estimate is zero, the difference of log means is just 1*B_p3 = B_p3, the Expose parameter estimate. The relative risk estimate is then exp(B_p3), and this is used in the first data line in the following DATA step to recompute the relative risk. The next two data lines use the formula involving the relative risk to compute the ARE, which is then used to compute the PAR.
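The same argument can be written out in symbols. In this sketch, \beta_j stands for the parameter named B_pj and \mu denotes the modeled mean (the event probability); both are notation assumed here. The group coefficient vectors are taken to be (1, 50, 1, 0) for Expose=E and (1, 50, 0, 1) for Expose=NE, which is what the difference vector described above implies when AgeMid is held at its mean of 50.

```latex
\begin{align*}
\log \mu_{E}  &= \beta_1 + 50\,\beta_2 + \beta_3, \qquad
\log \mu_{NE}  = \beta_1 + 50\,\beta_2 + \beta_4, \qquad \beta_4 = 0 \\[4pt]
\log \mathrm{RR} &= \log \mu_{E} - \log \mu_{NE}
   = (0,\ 0,\ 1,\ -1)\,\boldsymbol{\beta} = \beta_3 - \beta_4 = \beta_3 \\[4pt]
\mathrm{RR} &= e^{\beta_3}, \qquad
\mathrm{ARE} = \frac{e^{\beta_3} - 1}{e^{\beta_3}}, \qquad
\mathrm{PAR} = 0.1363 \cdot \frac{e^{\beta_3} - 1}{e^{\beta_3}}
\end{align*}
```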
data fd;
   length label f $32767;
   infile datalines delimiter=',';
   input label f;
   datalines;
RR , exp(B_p3)
ARE RR , ( (exp(B_p3)-1)/exp(B_p3) )
PAR RR , ( 0.1363*(exp(B_p3)-1)/exp(B_p3) )
;

%NLEst( instore=gen, fdata=fd )
The resulting estimates for the relative risk, the ARE, and the PAR are very close to those directly estimated from the separate group risks as above.
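As a rough check of that agreement, the relative-risk formulas can be evaluated by hand from the rounded relative risk of 1.246 and the observed p_{e} of 0.1363 quoted in this note. The DATA step below is an illustrative sketch only, with variable names chosen here; it yields point estimates but none of the tests or confidence intervals that the NLEst macro provides.

```sas
/* Hand check of the relative-risk formulas using rounded values from the text. */
data _null_;
   RR = 1.246;    /* adjusted relative risk from the Poisson GEE model */
   pe = 0.1363;   /* proportion of events in the exposed group         */

   ARE_RR = (RR - 1) / RR;   /* attributable risk among the exposed */
   PAR_RR = pe * ARE_RR;     /* population attributable risk        */

   put ARE_RR= 6.3 PAR_RR= 6.3;
run;
```

Both values should land close to the 0.197 and 0.027 obtained from the logistic model.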

__________
References:
Fleiss, J. L., Levin, B., and Paik, M. C. (2003), Statistical Methods for Rates and Proportions, 3rd ed. New York: John Wiley & Sons, Inc.
Zou, G. (2004), "A Modified Poisson Regression Approach to Prospective Studies with Binary Data," Am. J. Epidemiol., 159:702-706.
NOTE: In some cases, exposure reduces the probability of the outcome event rather than increases it. For example, exposure to the flu vaccine reduces the probability of getting the flu. In this situation, the attributable benefit in the exposed and population attributable benefit can be computed using similar formulas: ABE = (R_{u} - R_{e})/R_{u} and PAB = (R - R_{e})/R.
Type:  Usage Note 
Priority:  
Topic:  SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> LOGISTIC; Analytics ==> Categorical Data Analysis; Analytics ==> Regression; SAS Reference ==> Macro
Date Modified:  2019-05-02 15:16:17
Date Created:  2019-01-11 10:51:45