Zheng (2000) proposed a marginal R² statistic, R²_marg, that is applicable to Generalized Estimating Equations (GEE) models. It generalizes the R-square statistic used in ordinary least squares (OLS) regression models and reduces to the ordinary R² in those models. Ballinger (2004) also discusses this statistic for assessing and comparing the fit of GEE models. For a non-GEE model with a binary response, this statistic is available with the GOF option in the MODEL statement in PROC LOGISTIC and is labeled as Efron's R-square in the Model Fit Statistics table. See "Model Fitting Information" in the Details section of the LOGISTIC procedure documentation.
Note that for generalized linear models that do not involve clustering (repeated, correlated measures) and for some other models, an R² statistic based on the variance function of the response distribution is available. See the description of the RsquareV macro. Zheng's generalized R-square statistic can also be used for generalized linear and other models as illustrated in this note.
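R²_marg is computed as

$$R^2_{marg} = 1 - \frac{SS(residuals)}{SS(total)} = 1 - \frac{\sum_i \sum_t (y_{it} - \hat{y}_{it})^2}{\sum_i \sum_t (y_{it} - \bar{y})^2},$$

where SS(residuals) is the sum of squares of the raw residuals, $y_{it} - \hat{y}_{it}$, i is the cluster (or subject) index, and t is the time (or repeated measure) index. SS(total) is the sum of squared deviations of the responses from the overall mean, $\bar{y}$.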
Zheng says that R²_marg "shares the same interpretation and intuitive appeal as R² and reduces to R² for T = 1. It equals 1, its upper bound, when there is perfect prediction. It equals 0 when there is no association between the response and the predictors. It assumes a negative value when the variation is greater under the model of interest than under the null model, indicating poor prediction." Here, T is the number of time points (repeated measures). She notes that as the sample size increases, R²_marg approaches a value between 0 and 1.
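The three cases in the quotation can be checked by hand. The following worked arithmetic is only an illustration with made-up numbers (four binary responses and T = 1); it is not taken from Zheng (2000).

```latex
% Hypothetical responses: y = (0, 1, 1, 0), so \bar{y} = 0.5
SS(total) = 4 \times (0.5)^2 = 1

% Perfect prediction: \hat{y} = (0, 1, 1, 0)
SS(residuals) = 0
\quad\Rightarrow\quad R^2_{marg} = 1 - 0/1 = 1

% Predictions carry no information: \hat{y} = (0.5, 0.5, 0.5, 0.5)
SS(residuals) = 4 \times (0.5)^2 = 1
\quad\Rightarrow\quad R^2_{marg} = 1 - 1/1 = 0

% Predictions worse than the overall mean: \hat{y} = (1, 0, 0, 1)
SS(residuals) = 4 \times 1^2 = 4
\quad\Rightarrow\quad R^2_{marg} = 1 - 4/1 = -3
```

The last case shows why the statistic has no lower bound in finite samples: predictions that are farther from the responses than the overall mean make SS(residuals) exceed SS(total).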
Concerning the fact that the GEE covariance matrix is not involved in the computation of R²_marg, Zheng says, "In our opinion, goodness of fit is concerned with the agreement between the response and the prediction. The covariance matrix is only relevant to the point that it affects the fitted value through the parameter estimates, but is not of interest by itself. A measure that incorporates the covariance matrix may assume a low value because of poor fit and/or inappropriate correlation structure, and therefore confounds goodness of fit with correlation modelling. Such a measure does not serve the unique purpose of summarizing the goodness of fit and therefore is not considered."
Consider the respiratory data analyzed in Stokes et al. (2012). As described there, eligible patients in each of two centers were randomly assigned to active treatment or placebo. Respiratory status was determined at baseline and at four visits during treatment and was recorded on a five-point scale from 0 (terrible) to 4 (excellent). Potential explanatory variables in addition to treatment were center, sex, and baseline respiratory status, as well as age (in years) at the time of study entry. For this example, a dichotomized outcome is analyzed indicating whether the patient experienced a good or excellent response (dichot=1) or a worse response (dichot=0). Note that in order to compute R²_marg the response must be numeric.
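The dichotomization described above might be coded as in the following sketch. The data set RESP_LONG, its variables BASELINE and OUTCOME, and the data values are all hypothetical, and the sketch assumes that a score of 3 corresponds to "good" on the 0 to 4 scale. Only the variable names DICHOT and DI_BASE come from this note.

```sas
/* Hypothetical long-format data: one row per patient visit */
data resp_long;
   input id center visit baseline outcome;
   datalines;
1 1 1 2 3
1 1 2 2 4
2 1 1 4 1
2 1 2 4 .
;
run;

data resp2_sketch;
   set resp_long;
   /* Assumed coding: 3 (good) or 4 (excellent) becomes 1, lower scores become 0. */
   /* The MISSING checks keep missing scores missing rather than coding them as 0. */
   if not missing(outcome)  then dichot  = (outcome >= 3);
   if not missing(baseline) then di_base = (baseline >= 3);
run;
```

Because a SAS comparison such as (outcome >= 3) returns the numeric values 1 or 0, the resulting DICHOT variable is numeric, as the computation of R²_marg requires.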
The following statements fit a logistic GEE model with the dichotomized baseline response, treatment, age, center, sex, and visit as predictors of the dichotomized response at the later visits.
proc genmod data=resp2;
   class id center trt sex visit;
   model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
   repeated subject=id*center / type=un;
   /* RESRAW= saves the raw residuals (observed response minus predicted mean) */
   output out=out resraw=res pred=pred;
run;
These statements compute R²_marg. The USS and CSS functions compute the uncorrected and corrected sums of squares of their argument variables, respectively.

proc sql;
   select 1-(uss(res)/css(dichot)) as R2marg from out;
quit;

The resulting value of R²_marg is 0.25.
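To see what USS and CSS contribute to this computation, the following sketch applies the same PROC SQL expression to a small made-up data set. The data set TOY and its values are hypothetical and are unrelated to the respiratory data.

```sas
/* Hypothetical responses and predicted means */
data toy;
   input y pred;
   res = y - pred;          /* raw residual */
   datalines;
0 0.2
1 0.7
1 0.6
0 0.5
;
run;

proc sql;
   select uss(res)                as SS_residuals,  /* sum of squared residuals        */
          css(y)                  as SS_total,      /* squared deviations from mean(y) */
          1 - uss(res)/css(y)     as R2marg
   from toy;
quit;
```

Here the squared residuals sum to 0.04 + 0.09 + 0.16 + 0.25 = 0.54, the mean of y is 0.5 so the corrected sum of squares is 1, and the statistic is 1 - 0.54/1 = 0.46.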
The above computation of R²_marg uses the simple, overall average of all responses to compute SS(total). However, R² can be viewed as a comparison of the fitted model to a reference model. The reference model is generally thought of as the model containing only an intercept (the null model). For OLS models, the overall average response is the estimate of the mean under the null model. This is not necessarily the case in GEE models, although it is the case when the independence correlation structure is used. The null GEE model is easily fit and its mean estimate can be used in the computation instead.
The following statements illustrate this for the above model. The same model as above is fit first in GENMOD. The output data set, OUT, from this model is then used as input to the second GENMOD step, which fits the null GEE model. Note that the null model has no predictors and therefore contains only an intercept. The output data set from this step contains both the raw residuals, RES, from the model of interest and the predicted mean, NULLMEAN, from the null GEE model. Note that the value of NULLMEAN is the same for all observations. The expression uss(dichot-nullmean) computes SS(total) using the null mean rather than the simple average as above.
/* Model of interest: save the raw residuals */
proc genmod data=resp2;
   class id center trt sex visit;
   model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
   repeated subject=id*center / type=un;
   output out=out resraw=res;
run;

/* Null (intercept-only) GEE model: save its predicted mean */
proc genmod data=out;
   class id center;
   model dichot(event="1") = / link=logit dist=bin;
   repeated subject=id*center / type=un;
   output out=out2 pred=nullmean;
run;

/* SS(total) is now computed around the null GEE mean */
proc sql;
   select 1-(uss(res)/uss(dichot-nullmean)) as R2marg from out2;
quit;
The resulting value of R²_marg is still 0.25 since the difference between the overall average response, 0.55855, and the mean estimate from the null GEE model, 0.56174, is quite small. If the independence correlation structure, TYPE=IND, were used rather than the unstructured correlation matrix, then the mean estimate from the null GEE would be the same as the overall average response and the two values of R²_marg would be identical.
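The statement about TYPE=IND can be checked directly. The following sketch assumes that the RESP2 data set used above is available. The names NULLIND and NULLMEAN_IND are introduced here for illustration only.

```sas
/* Null GEE model with the independence working correlation structure */
proc genmod data=resp2;
   class id center;
   model dichot(event="1") = / link=logit dist=bin;
   repeated subject=id*center / type=ind;
   output out=nullind pred=nullmean_ind;
run;

/* Compare the simple average response with the null-model mean estimate */
proc sql;
   select mean(dichot)       as overall_mean,
          mean(nullmean_ind) as null_gee_mean
   from nullind;
quit;
```

Under independence, the estimating equation for an intercept-only model sets the sum of the raw residuals to zero, so the two columns are expected to agree apart from convergence tolerance.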
A partial R² can be obtained using the above concept of a reference model, where the reference model is a nested simplification of the model of interest rather than the null model. R²_marg computed with a sub-model as the reference model is equivalent to the partial R² for an OLS model, which can be obtained with the PCORR2 option in PROC REG. The partial R² is the proportion of the variation left unexplained by the reference model that is accounted for by the full model.
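The partial statistic can be related to the R² values of the two models. The following is a short derivation, not taken from the cited sources, and it assumes that both R² values are computed with the same SS(total). The subscripts "full" and "ref" are notation introduced here for the model of interest and the reference model.

```latex
R^2_{full} = 1 - \frac{SS_{full}(residuals)}{SS(total)}, \qquad
R^2_{ref}  = 1 - \frac{SS_{ref}(residuals)}{SS(total)}

R^2_{partial}
  = 1 - \frac{SS_{full}(residuals)}{SS_{ref}(residuals)}
  = \frac{SS_{ref}(residuals) - SS_{full}(residuals)}{SS_{ref}(residuals)}
  = \frac{R^2_{full} - R^2_{ref}}{1 - R^2_{ref}}
```

The last equality follows because $R^2_{full} - R^2_{ref} = (SS_{ref}(residuals) - SS_{full}(residuals))/SS(total)$ and $1 - R^2_{ref} = SS_{ref}(residuals)/SS(total)$, so SS(total) cancels in the ratio.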
The following statements assess the effect of adding age, sex, and visit to the model. The model containing only the dichotomized baseline response, treatment, and center serves as the reference model.
/* Full model: save the raw residuals */
proc genmod data=resp2;
   class id center trt sex visit;
   model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
   repeated subject=id*center / type=un;
   output out=out resraw=res;
run;

/* Reference model: NULLMEAN now holds the predicted mean from this sub-model */
proc genmod data=out;
   class id center trt;
   model dichot(event="1") = di_base trt center / link=logit dist=bin;
   repeated subject=id*center / type=un;
   output out=out2 pred=nullmean;
run;

/* Partial R-square: full-model residuals relative to reference-model residuals */
proc sql;
   select 1-(uss(res)/uss(dichot-nullmean)) as R2marg from out2;
quit;
The resulting partial R²_marg is 0.017. This small value suggests that the addition of age, sex, and visit explains only a small proportion of the variability that is left unexplained by the reference model.
Ballinger, G. A. 2004. “Using Generalized Estimating Equations for Longitudinal Data Analysis.” Organizational Research Methods. 7(2): 127–150.
Stokes, M. E., C. S. Davis, and G. G. Koch. 2012. Categorical Data Analysis Using SAS. 3rd ed. Cary, NC: SAS Institute Inc.
Zheng, B. 2000. “Summarizing the Goodness of Fit of Generalized Linear Models for Longitudinal Data.” Statistics in Medicine. 19(10): 1265–1275.
Type: Usage Note
Topic: Analytics ==> Longitudinal Analysis; SAS Reference ==> Procedures ==> GEE; SAS Reference ==> Procedures ==> GENMOD
Date Modified: 2021-11-11 12:45:56
Date Created: 2021-05-07 16:39:45