67880 - Generalized R-square for GEE and other models

Usage Note 67880: Generalized R-square for GEE and other models

Zheng (2000) proposed a marginal R² statistic, R²_marg, that is applicable to Generalized Estimating Equations (GEE) models. It generalizes the R² statistic used in ordinary least squares (OLS) regression models and equals the ordinary R² in those models. Ballinger (2004) also discusses this statistic for assessing and comparing the fit of GEE models. For a non-GEE model with a binary response, this statistic is available with the GOF option in the MODEL statement in PROC LOGISTIC and is labeled as Efron's R-square in the Model Fit Statistics table. See "Model Fitting Information" in the Details section of the LOGISTIC procedure documentation.

Note that for generalized linear models that do not involve clustering (repeated, correlated measures) and for some other models, an R² statistic based on the variance function of the response distribution is available. See the description of the RsquareV macro. Zheng's generalized R-square statistic can also be used for generalized linear and other models as illustrated in this note.
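As a minimal sketch of the non-clustered case, the following statements apply the same computation to a Poisson generalized linear model. The data set COUNTS and its values are made up for illustration and are not part of the original note; no REPEATED statement is used because there is no clustering.

```sas
/* Hypothetical toy data: a count response y and one predictor x */
data counts;
   input x y @@;
   datalines;
1 2  2 3  3 6  4 7  5 8  6 9  7 10  8 12  9 15  10 20
;
/* Ordinary Poisson GLM; RESRAW= saves the raw residuals y - pred */
proc genmod data=counts;
   model y = x / dist=poisson link=log;
   output out=glmout resraw=res;
   run;
/* Same formula as for the GEE case: 1 - SS(residuals)/SS(total) */
proc sql;
   select 1-(uss(res)/css(y)) as R2 from glmout;
   quit;
```

Only the raw residuals and the response are needed, so the same PROC SQL step applies to any model whose procedure can save raw residuals.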

R²_marg is computed as

R²_marg = 1 - SS(residuals)/SS(total),

where SS(residuals) is the sum of squares of the raw residuals, y_it - ŷ_it, i is the cluster (or subject) index, and t is the time (or repeated measure) index. SS(total) is the sum of squared deviations of the responses from the overall mean.
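Written out with explicit sums, the definition above takes the following form. The symbols ȳ (the overall mean of all responses) and N (the total number of observations across all clusters and time points) are notation assumed here for illustration.

```latex
\[
R^2_{\mathrm{marg}}
  = 1 - \frac{\sum_{i}\sum_{t} \left(y_{it} - \hat{y}_{it}\right)^2}
             {\sum_{i}\sum_{t} \left(y_{it} - \bar{y}\right)^2},
\qquad
\bar{y} = \frac{1}{N}\sum_{i}\sum_{t} y_{it}
\]
```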

Zheng says that R²_marg "shares the same interpretation and intuitive appeal as R² and reduces to R² for T = 1. It equals 1, its upper bound, when there is perfect prediction. It equals 0 when there is no association between the response and the predictors. It assumes a negative value when the variation is greater under the model of interest than under the null model, indicating poor prediction." T is the number of time points (repeated measures). She notes that as the sample size increases, R²_marg approaches a value in the interval (0,1).
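The three cases that Zheng describes can be seen on a small, made-up data set. In this sketch, the data set TOY and the prediction columns P_PERFECT, P_MEAN, and P_BAD are hypothetical: the first reproduces the response exactly, the second predicts the overall mean (0.5) for every observation, and the third predicts the opposite of each response.

```sas
/* Hypothetical toy data: response y and three sets of predictions */
data toy;
   input y p_perfect p_mean p_bad;
   datalines;
0 0 0.5 1
1 1 0.5 0
0 0 0.5 1
1 1 0.5 0
;
/* 1 - SS(residuals)/SS(total) for each set of predictions */
proc sql;
   select 1-(uss(y-p_perfect)/css(y)) as R2_perfect,
          1-(uss(y-p_mean)/css(y))    as R2_mean,
          1-(uss(y-p_bad)/css(y))     as R2_bad
      from toy;
   quit;
```

Here SS(total) is 1, and SS(residuals) is 0, 1, and 4 for the three sets of predictions, so the statistic is 1, 0, and -3, respectively.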

Concerning the fact that the GEE covariance matrix is not involved in the computation of R²_marg, Zheng says, "In our opinion, goodness of fit is concerned with the agreement between the response and the prediction. The covariance matrix is only relevant to the point that it affects the fitted value through the parameter estimates, but is not of interest by itself. A measure that incorporates the covariance matrix may assume a low value because of poor fit and/or inappropriate correlation structure, and therefore confounds goodness of fit with correlation modelling. Such a measure does not serve the unique purpose of summarizing the goodness of fit and therefore is not considered."

Example

Consider the respiratory data analyzed in Stokes et al. (2012). As described there, eligible patients in each of two centers were randomly assigned to active treatment or placebo. Respiratory status was determined at baseline and at four visits during treatment and was recorded on a five-point scale from 0 (terrible) to 4 (excellent). Potential explanatory variables in addition to treatment were center, sex, and baseline respiratory status, as well as age (in years) at the time of study entry. For this example, a dichotomized outcome is analyzed indicating whether the patient experienced a good or excellent response (dichot=1) or a worse response (dichot=0). Note that the response must be numeric in order to compute R²_marg.

The following statements fit a logistic GEE model with the dichotomized baseline response, treatment, age, center, sex, and visit as predictors of the dichotomized response at the later visits.

     proc genmod data=resp2;
        class id center trt sex visit;
        model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
        repeated subject=id*center / type=un;
        output out=out resraw=res pred=pred;
        run;

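These statements compute R²_marg. The USS and CSS functions compute the uncorrected and corrected sums of squares of their argument variables, respectively. USS(res) is therefore SS(residuals), and CSS(dichot), the sum of squared deviations of the response from its overall mean, is SS(total).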

     proc sql; 
        select 1-(uss(res)/css(dichot)) as R2marg from out;
        quit;

The resulting value of R²_marg is 0.25.

The above computation of R²_marg uses the simple, overall average of all responses to compute SS(total). However, R² can be viewed as a comparison of the fitted model to a reference model. The reference model is generally thought of as the model containing only an intercept (the null model). For OLS models, the overall average response is the estimate of the mean under the null model. This is not necessarily the case in GEE models, though it might be the case when the independence correlation structure is used. The null GEE model is easily fit and its mean estimate used in the computation.
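The reference-model view can be written out explicitly. In the sketch below, the superscript "ref" and the symbol for the null-model mean estimate are notation introduced here for illustration and do not come from Zheng (2000).

```latex
\[
R^2_{\mathrm{marg}}
  = 1 - \frac{\sum_{i}\sum_{t} \left(y_{it} - \hat{y}_{it}\right)^2}
             {\sum_{i}\sum_{t} \left(y_{it} - \hat{y}^{\mathrm{ref}}_{it}\right)^2},
\qquad
\hat{y}^{\mathrm{ref}}_{it} = \hat{\mu}_0 \ \text{for all } i, t \ \text{under the null model}
\]
```

For an OLS model, the null-model estimate equals the overall average response, which recovers the ordinary R².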

The following statements use the null GEE model as the reference for the above model. The same model as above is fit first in GENMOD. The output data set, OUT, from this model is then used as input to the second GENMOD step, which fits the null GEE model. Note that it has no predictors and therefore contains only an intercept. The output data set from this step contains both the raw residuals, RES, from the model of interest and the predicted mean, NULLMEAN, from the null GEE model. Note that the value of NULLMEAN is the same for all observations. The function, uss(dichot-nullmean), computes SS(total) using the null mean rather than the simple average as above.

     proc genmod data=resp2;
        class id center trt sex visit;
        model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
        repeated subject=id*center / type=un;
        output out=out resraw=res;
        run;
     proc genmod data=out;
        class id center;
        model dichot(event="1") =  / link=logit dist=bin;
        repeated subject=id*center / type=un;
        output out=out2 pred=nullmean;
        run;
     proc sql; 
        select 1-(uss(res)/uss(dichot-nullmean)) as R2marg from out2;
        quit;

The resulting value of R²_marg is still 0.25 since the difference between the overall average response, 0.55855, and the mean estimate from the null GEE model, 0.56174, is quite small. If the independence correlation structure, TYPE=IND, were used rather than the unstructured correlation matrix, then the mean estimate from the null GEE would be the same as the overall average response and the two values of R²_marg would be identical.
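The statement about TYPE=IND can be checked on a small, made-up example. In this sketch, the data set TOYGEE (four subjects with three visits each) is hypothetical and is used only to compare the null GEE mean estimate with the simple average of the response.

```sas
/* Hypothetical toy data: 4 subjects, 3 visits each, binary response y */
data toygee;
   input id visit y;
   datalines;
1 1 0
1 2 1
1 3 1
2 1 0
2 2 0
2 3 1
3 1 1
3 2 1
3 3 1
4 1 0
4 2 0
4 3 1
;
/* Null (intercept-only) GEE model with the independence structure */
proc genmod data=toygee;
   class id;
   model y(event="1") = / link=logit dist=bin;
   repeated subject=id / type=ind;
   output out=nullout pred=nullmean;
   run;
/* Compare the simple average of y with the null-model mean estimate */
proc sql;
   select mean(y) as OverallMean, mean(nullmean) as NullMean from nullout;
   quit;
```

The two columns should agree up to the convergence tolerance of the fitting algorithm. Changing TYPE=IND to another working correlation structure would generally allow them to differ.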

Partial R²

A partial R² can be obtained using the above concept of a reference model, but one that is a nested simplification of the model of interest rather than the null model. R²_marg computed with a sub-model as the reference model is equivalent to the partial R² for an OLS model as can be obtained with the PCORR2 option in PROC REG. The partial R² is the proportion of total variation left over from the reference model that is accounted for by the full model.
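The OLS equivalence can be illustrated with a small sketch. The data set OLS and its values are made up for illustration. The full model contains X1 and X2, and the reference model omits X2, so the computed value is the partial R² for X2.

```sas
/* Hypothetical toy data: response y and two predictors */
data ols;
   input x1 x2 y;
   datalines;
1 2 3.1
2 1 3.9
3 4 7.2
4 3 7.8
5 6 11.5
6 5 11.9
7 8 15.2
8 7 16.1
;
/* Full model; PCORR2 displays squared partial correlations (Type II) */
proc reg data=ols;
   model y = x1 x2 / pcorr2;
   output out=full r=res;
   run;
   quit;
/* Reference model omits x2; its predictions take the place of the null mean */
proc reg data=full;
   model y = x1;
   output out=both p=refmean;
   run;
   quit;
/* 1 - SS(residuals, full)/SS(residuals, reference) */
proc sql;
   select 1-(uss(res)/uss(y-refmean)) as PartialR2 from both;
   quit;
```

The value of PartialR2 should match the squared partial correlation that PCORR2 reports for X2 in the first PROC REG step.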

The following statements assess the effect of adding age, sex, and visit to the model. The model containing only the dichotomized baseline response, treatment, and center serves as the reference model.

     proc genmod data=resp2;
        class id center trt sex visit;
        model dichot(event="1") = di_base trt age center sex visit / link=logit dist=bin;
        repeated subject=id*center / type=un;
        output out=out resraw=res;
        run;
     proc genmod data=out;
        class id center trt;
        model dichot(event="1") = di_base trt center / link=logit dist=bin;
        repeated subject=id*center / type=un;
        output out=out2 pred=nullmean;
        run;
     proc sql; 
        select 1-(uss(res)/uss(dichot-nullmean)) as R2marg from out2;
        quit;

The resulting partial R²_marg is 0.017. This small value suggests that the addition of age, sex, and visit explains only a small proportion of the variability that is left unexplained by the reference model.

References

Ballinger, G. A. 2004. “Using Generalized Estimating Equations for Longitudinal Data Analysis.” Organizational Research Methods. 7(2): 127–150.

Stokes, M. E., C. S. Davis, and G. G. Koch. 2012. Categorical Data Analysis Using SAS. 3rd ed. Cary, NC: SAS Institute Inc.

Zheng, B. 2000. “Summarizing the Goodness of Fit of Generalized Linear Models for Longitudinal Data.” Statistics in Medicine. 19(10): 1265–1275.

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		z/OS 64-bit
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 8.1 Pro x64
		Microsoft Windows 10
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012 Datacenter
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft Windows Server 2012 Std
		Microsoft Windows Server 2016
		Microsoft Windows Server 2019
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> Longitudinal Analysis; SAS Reference ==> Procedures ==> GEE; SAS Reference ==> Procedures ==> GENMOD

Date Modified:	2021-11-11 12:45:56
Date Created:	2021-05-07 16:39:45
