63471 - Estimating attributable risks when covariates are present using logistic and other models

Usage Note 63471: Estimating attributable risks when covariates are present using logistic and other models

If an exposure variable, such as smoking, is thought to increase the occurrence of some outcome, such as lung cancer, it is often of interest to estimate the proportion reduction in the probability of the outcome that would be achieved if the exposure were removed. This is the intent of the population attributable risk, PAR. A related measure, the attributable risk among the exposed, ARE, assesses the proportion reduction among those exposed.

This note shows how these statistics can be computed using the data in a 2×2 table of exposure by outcome. However, when there are additional factors that affect the probability of outcome, they should be adjusted for when estimating the attributable risk statistics. That adjustment can be done using stratification, which is available in PROC STDRATE. Another approach is to use a suitable model such as a logistic model on the outcome that includes the confounding variables as covariates in the model.

The example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation illustrates the stratification method to adjust for covariates. The example below uses the modeling method to estimate the attributable risk statistics.

In cross-sectional studies, the attributable risk among the exposed is defined as ARE = (R_e - R_u)/R_e, where Re is the risk of the outcome in the exposed group and R_u is the risk in the unexposed group. The population attributable risk is defined as PAR = (R - R_u)/R, where R is the overall risk of the outcome.^NOTE Note that the ARE can also be written in terms of the relative risk, RR = R_e/R_u, so that ARE = (RR-1)/RR. PAR can be written as PAR = ARE·p_e, where p_e is the proportion of observed events that were in the exposed group.

In retrospective, case-control studies and when the risk of the outcome is small, the attributable risk can be estimated by ARE = (OR-1)/OR, where OR is the odds ratio for exposure. The population attributable risk is then computed as above by PAR = ARE·p_e.

Example with covariate using a logistic model

The following example uses the chemical exposure data in the Example titled "Computing Attributable Fraction Estimates" in the PROC STDRATE documentation. The modified version of the Factory data set created by the DATA step below creates separate observations for exposed and nonexposed cases at each age along with their counts. Expose is an exposure indicator variable, where Expose=E represents the group exposed to a chemical agent and Expose=NE represents non exposure. Exposure to the agent is linked to increased probability of the outcome event. Age is a confounding factor which should be adjusted for when assessing the effect of exposure. In the STDRATE documentation example, this adjustment is done by stratifying on the levels of Age. In order to adjust for Age by modeling, the numeric variable AgeMid, with values at the midpoints of the age ranges, is created for use as a continuous covariate.

A logistic model on the outcome is fitted by the following PROC LOGISTIC statements which includes both Expose and AgeMid as predictors. The LSMEANS statement provides estimates of the outcome risk in each exposure group adjusted for age. The E option shows the coefficients of the linear combination of model parameters used to estimate the log odds for each group. The ILINK option applies the inverse logit function to the log odds estimates to produce the risk estimates R_e and R_u. The STORE statement saves the fitted model for later use in the NLEst macro.

      data Factory;
         input AgeMid Age $ Event_E Count_E Event_NE Count_NE;
         r=Event_E;  n=Count_E;  Expose='E';  output;
         r=Event_NE; n=Count_NE; Expose='NE'; output;
         datalines;
      25 20-29   31  352  143  2626
      35 30-39   57  486  392  4124
      45 40-49   62  538  459  4662
      55 50-59   50  455  337  3622
      65 60-69   38  322  199  2155
      75 70+      9   68   35   414
      ;
      
      proc logistic data=Factory;
         class Expose / param=glm;
         model r/n = AgeMid Expose;
         lsmeans Expose / e ilink;
         store out=log;
         run;

Below are shown the four parameters of the logistic model and the estimated odds ratios. The adjusted Expose estimate is 0.2444 and its adjusted odds ratio estimate is 1.277. The adjusted risk estimates are provided in the Mean column of the Expose Least Squares Means table – R_e = 0.1140 and R_u = 0.0915.

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
Parameter		DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		1	-2.6406	0.0888	883.7678	<.0001
AgeMid		1	0.00691	0.00185	13.9804	0.0002
Expose	E	1	0.2444	0.0725	11.3536	0.0008
Expose	N	0	0	.	.	.

Odds Ratio Estimates
Effect	Point Estimate	95% Wald Confidence Limits
AgeMid	1.007	1.003	1.011
Expose E vs N	1.277	1.108	1.472

Coefficients for Expose Least Squares Means
Parameter	Expose	Row1	Row2
Intercept		1	1
AgeMid		50	50
Expose E	E	1
Expose N	N		1

Expose Least Squares Means
Expose	Estimate	Standard Error	z Value	Pr > \|z\|	Mean	Standard Error of Mean
E	-2.0507	0.06782	-30.24	<.0001	0.1140	0.006849
N	-2.2951	0.02757	-83.26	<.0001	0.09153	0.002292

In the DATA FD step below, the first two rows express the adjusted risk for each group in terms of the linear combination coefficients from the LSMEANS statement (shown in the Coefficients for Expose Least Squares Means table above) which multiply the model parameters (named B_p1 - B_p4). Note that the expressions involve the mean of the AgeMid covariate, 50. The inverse logit function is then applied using the LOGISTIC function. The odds ratio for Expose is recomputed in the third row and is simply the third parameter estimate (Expose), exponentiated. The fourth and fifth rows define the attributable risk and population attributable risk as defined above. PAR is computed as ARE·p_e. The observed value of p_e is provided by the PROC FREQ step and is found to be 0.1363. Since the outcome event is relatively rare (around 10% as seen above), the attributable risks are also approximated using the odds ratio for exposure in rows 6 and 7 of the data set. The NLEst macro is then called to re-estimate the two group risks, the exposure odds ratio, and to estimate the attributable risk and population attributable risk along with tests and confidence intervals.

The NLEst macro is a general macro for estimating and testing linear or nonlinear combinations of model parameters. Because the inverse logit function or exponentiation is used in the expressions, nonlinear functions must be estimated. The macro is used by providing the saved model from the STORE statement and a data set containing the expressions to be estimated along with labels. The name of the saved model is specified in INSTORE=, and the name of the data set of expressions and labels is specified in FDATA=.

      proc freq data=Factory; 
         table Expose;
         weight r;
         run;

      data fd; 
         length label f $32767; 
         infile datalines delimiter=',';
         input label f; 
         datalines;
      Risk(E)  , logistic(B_p1+50*B_p2+B_p3)
      Risk(NE) , logistic(B_p1+50*B_p2)
      OR (Expose) , exp(B_p3)
      ARE ,( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
      PAR , 0.1363*( logistic(B_p1+50*B_p2+B_p3) - logistic(B_p1+50*B_p2) ) / ( logistic(B_p1+50*B_p2+B_p3) )
      ARE OR , (exp(B_p3)-1)/exp(B_p3)
      PAR OR , 0.1363*(exp(B_p3)-1)/exp(B_p3)
      ;
      
      %NLEst( instore=log, fdata=fd )

The group risks, R_e and R_u, and the exposure odds ratio agree with those in the PROC LOGISTIC results above. The estimated attributable risk, 0.197, and the estimated population attributable risk, 0.027, closely match those obtained by stratification from PROC STDRATE. Additionally, the approximate estimates based on the exposure odds ratio are similar.

These results indicate that nearly 3% of the outcome event is attributable to the exposure to the chemical agent. In other words, the proportion of the outcome event would be reduced by about 3% in the population if exposure to the chemical agent were removed. Among those exposed to the agent, the outcome event would be reduced by about 20%.

The FREQ Procedure

Expose	Frequency	Percent	Cumulative Frequency	Cumulative Percent
E	247	13.63	247	13.63
N	1565	86.37	1812	100.00

Nonlinear Function Estimate

Label	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Alpha	Lower	Upper
Risk(E)	0.1140	0.006849	276.95	<.0001	0.05	0.1006	0.1274
Risk(NE)	0.09153	0.002292	1594.54	<.0001	0.05	0.08703	0.09602
OR (Expose)	1.2769	0.09262	190.05	<.0001	0.05	1.0953	1.4584
ARE	0.1970	0.05179	14.47	0.0001	0.05	0.09549	0.2985
PAR	0.02685	0.007059	14.47	0.0001	0.05	0.01302	0.04069
ARE OR	0.2168	0.05681	14.57	0.0001	0.05	0.1055	0.3282
PAR OR	0.02956	0.007743	14.57	0.0001	0.05	0.01438	0.04473

Using Zou's Poisson model to estimate the relative risk

Another modeling approach is to use a Poisson GEE model (Zou, 2004) which allows for estimation of the relative risk. This is further illustrated in this note. The Factory data are expanded to have a binary response with separate observations for events (Y=1) and nonevents (Y=0) with frequency counts (Count). An ID variable is added which simply numbers the final observations 1, 2, ... . PROC GENMOD with a REPEATED statement is then used to fit a Poisson GEE model to the binary response with Expose and AgeMid as predictors as before. The LSMEANS statement is again added with the E and ILINK options to provide the risk estimates for each group and the coefficients that define them. The DIFF and EXP options are also added to provide the relative risk estimate. The STORE statement saves the model for later use.

      data Factory2;
         set Factory;
         y=1; Count=r; ID+1; output;
         y=0; Count=n-r; ID+1; output;
         run;
      proc genmod data=Factory2;
         class ID Expose;
         model y = AgeMid Expose / dist=poisson;
         freq Count;
         repeated subject=ID;
         lsmeans Expose / e ilink diff exp;
         store gen;
         run;

Note that the estimated relative risk, 1.246 in the Exponentiated column of the Differences of Expose Least Squares Means table, is similar to the previously estimated odds ratio, 1.277, suggesting that using the odds ratio approximating formula is reasonable.

The GENMOD Procedure

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter		Estimate	Standard Error	95% Confidence Limits		Z	Pr > \|Z\|
Intercept		-2.7048	0.0767	-2.8552	-2.5545	-35.26	<.0001
AgeMid		0.0063	0.0016	0.0032	0.0094	3.96	<.0001
Expose	E	0.2196	0.0647	0.0929	0.3464	3.40	0.0007
Expose	N	0.0000	0.0000	0.0000	0.0000	.	.

Coefficients for Expose Least Squares Means
Parameter	Expose	Row1	Row2
Intercept		1	1
AgeMid		44.955	44.955
Expose E	E	1
Expose N	N		1

Expose Least Squares Means
Expose	Estimate	Standard Error	z Value	Pr > \|z\|	Mean	Standard Error of Mean	Exponentiated
E	-2.2036	0.06007	-36.68	<.0001	0.1104	0.006633	0.1104
N	-2.4232	0.02418	-100.22	<.0001	0.08864	0.002143	0.08864

Differences of Expose Least Squares Means
Expose	_Expose	Estimate	Standard Error	z Value	Pr > \|z\|	Exponentiated
E	N	0.2196	0.06467	3.40	0.0007	1.2456

The LSMEANS coefficients table above shows, for each Expose level, the vector of coefficients that multiplies the vector of model parameters. Since the response function being modeled is log(mean) and the mean is just the probability of the outcome event, the DIFF option comparing the two groups estimates the difference in log(mean) values, or equivalently, the log relative risk. Exponentiating that difference (EXP option) produces an estimate of the relative risk. Equivalently, the difference of the two coefficient vectors can be used to multiply the parameter vector to obtain the log mean difference which can then be exponentiated. The difference of the two group vectors is simply (0 0 1 -1), but since the fourth parameter estimate is zero, the difference of log means is just 1*B_p3 = B_p3, the Expose parameter estimate. The relative risk estimate is then exp(B_p3) and this is used in the first data line in the following DATA step to recompute the relative risk. The next two data lines use the formula involving the relative risk to compute the ARE which is then used to compute the PAR.

      data fd; 
         length label f $32767; 
         infile datalines delimiter=',';
         input label f; 
         datalines;
         RR , exp(B_p3)
         ARE RR , ( (exp(B_p3)-1)/exp(B_p3) )
         PAR RR , ( 0.1363*(exp(B_p3)-1)/exp(B_p3) )
         ;
      %NLEst( instore=gen, fdata=fd )

The resulting estimates for the relative risk, the ARE, and PAR and very close to those directly estimated from the separate group risks as above.

Nonlinear Function Estimate

Label	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Alpha	Lower	Upper
RR	1.2456	0.08055	239.139	<.0001	0.05	1.0877	1.4035
ARE RR	0.1972	0.05192	14.424	0.0001	0.05	0.09542	0.2989
PAR RR	0.02687	0.007076	14.424	0.0001	0.05	0.01301	0.04074

__________

References:

Fleiss, J. L., Levin, B., and Paik, M. C. (2003), Statistical Methods for Rates and Proportions, 3d ed. New York: John Wiley & Sons, Inc.

Zou, G. (2004), "A Modified Poisson Regression Approach to Prospective Studies with Binary Data," Am. J. Epidemiol., 159:702-706.

NOTE: In some cases, exposure reduces the probability of the outcome event rather than increases it. For example, exposure to the flu vaccine reduces the probability of getting the flu. In this situation, the attributable benefit in the exposed and population attributable benefit can be computed using similar formulas: ABE = (R_u - R_e)/R_u and PAB = (R - R_e)/R.

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	SAS/STAT	z/OS 64-bit
		z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 8.1 Pro x64
		Microsoft Windows 10
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012 Datacenter
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft Windows Server 2012 Std
		Microsoft Windows Server 2016
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	SAS Reference ==> Procedures ==> GENMOD SAS Reference ==> Procedures ==> LOGISTIC Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Macro

Date Modified:	2019-05-02 15:16:17
Date Created:	2019-01-11 10:51:45

Support

Usage Note 63471: Estimating attributable risks when covariates are present using logistic and other models

Example with covariate using a logistic model

Using Zou's Poisson model to estimate the relative risk

Operating System and Release Information