60162 - R-square and partial R-square for generalized linear models based on the variance function

R² and partial R² for generalized linear models based on the variance function

Contents:

Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / References

PURPOSE:

R² is a popular measure of fit for ordinary regression models. The RsquareV macro provides the R²_V statistic proposed by Zhang (2017) for use with any model based on a distribution with a well-defined variance function. This includes generalized linear models and generalized additive models based on distributions such as the binomial (for logistic models), Poisson, gamma, and others. It also includes models based on quasi-likelihood functions, for which only the mean and variance functions are defined. A partial R² is provided when comparing a full model to a nested, reduced model. When the full and reduced models differ by a single parameter, the partial R for that parameter can be obtained from the partial R². A penalized R², which adjusts for the additional parameters in the full model, is also available.

HISTORY:

The version of the RsquareV macro that you are using is displayed when you specify version (or any string) as the first argument. For example:

    %RsquareV(version, <macro options>)

The RsquareV macro always attempts to check for a later version of itself. If it cannot do so (for example, when no internet connection is available), the macro issues the following message:

    RsquareV: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

Version	Update Notes
1.3	Fixed incomplete removal of observations with missing values.
1.1	Fixed error in KEEP statement when Base SAS® is used.
1.0	Initial coding

REQUIREMENTS:

Base SAS®. If the inverse Gaussian or Tweedie distribution is used, then SAS/IML® is also required.

USAGE:

Follow the instructions in the Downloads tab of this sample to save the RsquareV macro definition. Replace the text within quotation marks in the following statement with the location of the RsquareV macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the RsquareV macro and make it available for use:

   %inc "<location of your file containing the RsquareV macro>";

Following this statement, you can call the RsquareV macro. See the Results tab for examples.

Before calling the macro, fit the full model and save the response and predicted values in a data set. This is usually done by including an OUTPUT statement with the PRED= option in the modeling procedure. Use this data set as input for fitting the reduced model, and save the reduced model's predicted values in an output data set under a different variable name than the one used for the full model's predicted values. Specify this final data set, which contains the observed responses and both sets of predicted values, in the data= option of the macro. This process is illustrated in the examples in the Results tab.
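The workflow described above can be sketched as follows. This is a minimal illustration and not one of the examples from the Results tab: the data set counts, its variables y, x1, and x2, and the names fullout, bothout, pfull, and psub are all hypothetical, and PROC GENMOD with a Poisson response is only one possible modeling procedure. The sketch assumes that the macro has already been defined with the %inc statement.

```sas
/* Hypothetical data: a count response y and two predictors */
data counts;
   input y x1 x2 @@;
   datalines;
2 1 0  3 2 1  1 1 1  6 3 0  4 2 0  9 4 1
5 3 1  0 1 0  7 4 0  3 2 1  8 4 1  2 1 0
;

/* Full model: save the predicted means in variable pfull */
proc genmod data=counts;
   model y = x1 x2 / dist=poisson;
   output out=fullout pred=pfull;
run;

/* Reduced model (x2 removed), fit to the full model's output data set
   so that both sets of predicted values end up in one data set */
proc genmod data=fullout;
   model y = x1 / dist=poisson;
   output out=bothout pred=psub;
run;

/* Partial R-square for x2: full model versus reduced model */
%RsquareV(data=bothout, response=y, dist=poisson, pfull=pfull, psub=psub)
```

Fitting the reduced model to the output data set of the full model, rather than to the original data, is what places the response and both prediction variables in the single data set that the macro reads.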

The following parameters are required when using the RsquareV macro:

response=variable: Specifies the response variable that was modeled. If events/trials syntax was used to fit the model, specify the events variable in response= and the trials variable in trials=. If the response is binary and events/trials syntax was not used, then the response variable must be numeric and must be coded as 1 (for the event) or 0 (for the nonevent).
dist=poisson | binomial | normal | gamma | negbin | geometric | igauss | tweedie: Specifies the response distribution used in fitting the models.
pfull=variable: Specifies the variable containing predicted values from the full model.
psub=variable: Specifies the variable containing predicted values from the reduced model.

The following parameters are optional:

data=data-set-name: Specifies the name of the data set containing the response variable and the variables of predicted values for both the full and reduced models. If omitted, the data set last created is used.
freq=variable: Required if the FREQ statement was used in fitting the models. Specifies the variable in the FREQ statements. If omitted, all observation frequencies equal 1.
trials=variable: Required if events/trials syntax was used in fitting the models. Specifies the trials variable in the MODEL statements.
nparmfull=value: Specifies the number of parameters (or effective degrees of freedom) in the full model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, the adjusted (penalized) R² is not computed.
nparmsub=value: Specifies the number of parameters (or effective degrees of freedom) in the reduced model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, nparmsub=1 is used.
k=value: Required if dist=negbin. Specifies the estimate of the negative binomial dispersion parameter.
twpower=value: Required if dist=tweedie. Specifies the estimate of the Tweedie power parameter.
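As an illustration of the required and optional parameters together, the following sketch fits a logistic model to a binary response and requests the penalized R². The data set binary, its variables, and the output names are hypothetical. The response is numeric and coded 1 or 0, as the response= description requires, and event='1' is specified so that the predicted values are probabilities of the event. The value nparmfull=3 assumes that the DF column of the full model's parameter estimates sums to 3 (intercept plus two slopes); check the procedure output for your own model.

```sas
/* Hypothetical data: binary response y coded 1 (event) or 0 (nonevent) */
data binary;
   input y x1 x2 @@;
   datalines;
0 1 3  0 2 1  1 3 4  0 1 2  1 4 1  1 5 3
0 2 2  1 3 5  0 4 2  1 6 4  0 3 1  1 5 5
;

/* Full model: intercept and two slopes */
proc logistic data=binary;
   model y(event='1') = x1 x2;
   output out=fullout pred=pfull;
run;

/* Intercept-only model: one parameter */
proc logistic data=fullout;
   model y(event='1') = ;
   output out=bothout pred=psub;
run;

/* nparmsub=1 is the documented default and is shown only for clarity */
%RsquareV(data=bothout, response=y, dist=binomial,
          pfull=pfull, psub=psub, nparmfull=3, nparmsub=1)
```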

DETAILS:

For the justification, computation, and performance of R²_V, see Zhang (2017). Note in particular that the variance function of the response distribution defines both the total variation and the variation accounted for by a model. As a result, the statistic applies to any model based on well-defined mean and variance functions, such as generalized linear models based on likelihood or quasi-likelihood functions. Also, R²_V reduces to the usual R² statistic for ordinary regression models based on the normal distribution.
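For orientation, the definition can be summarized as below. This summary is restated from a reading of Zhang (2017), not from the macro source, so treat the notation as an assumption: V is the variance function of the response distribution, y_i is the observed response, \hat\mu_i is the predicted mean from the model of interest, and \hat\mu_i^{(0)} is the predicted mean from the intercept-only model (or from the reduced model when a partial R² is wanted).

```latex
d_V(a, b) = \left\{ \int_a^b \sqrt{1 + \left[ V'(t) \right]^2}\; dt \right\}^2 ,
\qquad
R^2_V = 1 - \frac{\sum_i d_V\!\left(y_i, \hat\mu_i\right)}
                 {\sum_i d_V\!\left(y_i, \hat\mu_i^{(0)}\right)}
```

For the normal distribution the variance function is constant, so V'(t) = 0 and d_V(a, b) = (a - b)^2. The ratio then becomes the residual sum of squares over the total sum of squares, which is why the statistic reduces to the usual R² in that case.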

R²_V for a single model is obtained by fitting the model of interest and an intercept-only model using the same data and response distribution. A data set containing the observed responses and the predicted values from both models is required. If a FREQ or WEIGHT statement (or both) is used to fit the model of interest, the same statements must be used when fitting the reduced (intercept-only) model. The FREQ variable must be included in the data set read by the macro. The WEIGHT variable is not needed by the macro.
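A sketch of the single-model case with a FREQ statement follows. The data set grouped and the variable names are hypothetical, and a Poisson model in PROC GENMOD is assumed only for illustration. The same FREQ statement appears in both fits, and the FREQ variable is carried into the final data set and named in freq=.

```sas
/* Hypothetical grouped data: count is the number of observations
   that share each (y, x) combination */
data grouped;
   input y x count @@;
   datalines;
0 1 4  1 1 3  2 1 1  1 2 2  2 2 5  3 2 2
2 3 1  3 3 4  5 3 3  4 4 2  6 4 3  7 4 1
;

/* Model of interest, fit with a FREQ statement */
proc genmod data=grouped;
   freq count;
   model y = x / dist=poisson;
   output out=fullout pred=pfull;
run;

/* Intercept-only model, fit with the same FREQ statement */
proc genmod data=fullout;
   freq count;
   model y = / dist=poisson;
   output out=bothout pred=psub;
run;

/* The FREQ variable is passed to the macro through freq= */
%RsquareV(data=bothout, response=y, dist=poisson,
          pfull=pfull, psub=psub, freq=count)
```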

Partial R²_V comparing a full model and a nested submodel can also be computed. The submodel is obtained from the full model by removing (constraining to zero) some of its parameters. Use the macro as above, but instead of the intercept-only model, fit the reduced model of interest and save its predicted values. The result is the partial R² assessing the effect of the parameters in the full model that are constrained in the reduced model. For an ordinary linear regression model with normal response, this is the same as the squared partial correlation provided by the PCORR2 option in PROC REG. If the full and reduced models differ by a single parameter, then the square root of the partial R² (with sign matching the parameter's sign) is the partial R associated with that parameter.
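The correspondence with PROC REG can be checked with a sketch like the following. The data set reg and its variables are hypothetical. The full model contains x1 and x2, and the reduced model drops x2, so the partial R² from the macro should correspond to the squared partial correlation that PCORR2 reports for x2 in the full model.

```sas
/* Hypothetical data: continuous response y and two predictors */
data reg;
   input y x1 x2 @@;
   datalines;
4.1 1 2  5.3 2 1  6.8 3 3  7.2 4 2  9.9 5 4  10.4 6 3
5.0 1 3  6.1 2 4  8.3 4 1  9.0 5 2  11.7 6 5  7.5 3 4
;

/* Full model: PCORR2 prints squared partial correlations */
proc reg data=reg;
   model y = x1 x2 / pcorr2;
   output out=fullout p=pfull;
run;
quit;

/* Reduced model: x2 removed */
proc reg data=fullout;
   model y = x1;
   output out=bothout p=psub;
run;
quit;

/* Partial R-square for x2 under the normal distribution */
%RsquareV(data=bothout, response=y, dist=normal, pfull=pfull, psub=psub)
```

Because the two models differ by a single parameter, the partial R for x2 is the square root of this partial R², taken with the sign of the x2 estimate in the full model.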

Penalized R²_V, adjusted for the additional parameters in the full model, is computed when the number of parameters in the full model is specified in nparmfull=. The number of parameters in the reduced model is taken from nparmsub=, which defaults to 1.

BY group processing

The RsquareV macro does not directly support BY group processing. However, the RunBY macro can provide this capability by running the modeling procedure and the RsquareV macro repeatedly, once for each BY group in your data. See the RunBY macro documentation for details on its use. Also see the example titled "BY group processing" in the Results tab.

LIMITATIONS:

Multinomial logit, zero-inflated, and Generalized Estimating Equations (GEE) models are not currently supported. An R² measure for GEE models, proposed by Zheng (2000), can be computed as described in this note.

MISSING VALUES:

Observations omitted by the modeling procedure because of missing values are omitted by the macro.

REFERENCES:

Zhang, D. 2017. "A Coefficient of Determination for Generalized Linear Models." The American Statistician. 71(4): 310–316.

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

Type:	Sample
Topic:	Analytics ==> Regression

Date Modified:	2021-05-13 11:02:32
Date Created:	2017-03-21 11:26:29

Product Family	Product	Host	SAS Release Starting	SAS Release Ending
SAS System	SAS/STAT	z/OS
		z/OS 64-bit
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 8.1 Pro x64
		Microsoft Windows 10
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012 Datacenter
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft Windows Server 2012 Std
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX
