
*PURPOSE:*
R^{2} is a popular measure of fit used for ordinary regression models. The RsquareV macro provides the R_{V}^{2} statistic proposed by Zhang (2017) for use with any model based on a distribution with a well-defined variance function. This includes the class of generalized linear models and generalized additive models based on distributions such as the binomial for logistic models, Poisson, gamma, and others. It also includes models based on quasi-likelihood functions for which only the mean and variance functions are defined. A partial R^{2} is provided when comparing a full model to a nested, reduced model. Partial R can be obtained from this when the difference between the full and reduced model is a single parameter. A penalized R^{2} is also available, adjusting for the additional parameters in the full model.

*HISTORY:*
The version of the RsquareV macro that you are using is displayed when you specify **version** (or any string) as the first argument. For example:

%RsquareV(version, <*macro options*>)

The RsquareV macro always attempts to check for a later version of itself. If it is unable to do this (for example, if no internet connection is available), the macro issues the following message:

RsquareV: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

| Version | Update Notes |
|---|---|
| 1.3 | Fixed incomplete removal of observations with missing values. |
| 1.1 | Fixed error in KEEP statement when Base SAS^{®} is used. |
| 1.0 | Initial coding |

*REQUIREMENTS:*
Base SAS^{®}. If the inverse Gaussian or Tweedie distribution is used, then SAS/IML^{®} is required.

*USAGE:*
Follow the instructions in the Downloads tab of this sample to save the RsquareV macro definition. Replace the text within quotation marks in the following statement with the location of the RsquareV macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the RsquareV macro and make it available for use:

%inc "<*location of your file containing the RsquareV macro*>";

Following this statement, you can call the RsquareV macro. See the Results tab for examples.

Before calling the macro, fit the full model and save the response and predicted values in a data set. This is usually accomplished by including an OUTPUT statement with the PRED= option in the modeling procedure. Use this data set as input for fitting the reduced model, and save the reduced model's predicted values in an output data set under a variable name different from the one used for the full model's predicted values. Specify this data set, which contains the observed responses and both sets of predicted values, in the **data=** option in the macro. This process is illustrated in the examples in the Results tab.

The following parameters are required when using the RsquareV macro:

**response=***variable* - Specifies the response variable that was modeled. If *events/trials* syntax was used to fit the model, specify the *events* variable in **response=** and the *trials* variable in **trials=**. If the response is binary and *events/trials* syntax was not used, then the response variable must be numeric and must be coded as 1 (for the event) or 0 (for the nonevent).

**dist=**poisson | binomial | normal | gamma | negbin | geometric | igauss | tweedie - Specifies the response distribution used in fitting the models.

**pfull=***variable* - Specifies the variable containing predicted values from the full model.

**psub=***variable* - Specifies the variable containing predicted values from the reduced model.
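The 0/1 coding requirement for a binary **response=** variable can be met with a short DATA step before the models are fit. The following is a minimal sketch only: the data set *Trial*, its character variable *Outcome* (coded "Yes"/"No"), and the new variable *y* are hypothetical names made up for illustration.

```sas
/* Hypothetical data: character response Outcome coded "Yes"/"No" */
data Trial;
   input Outcome $ dose;
   datalines;
Yes 1
No  1
No  2
Yes 2
Yes 3
No  3
Yes 4
Yes 4
;

/* Create a numeric response: 1 = event, 0 = nonevent.            */
/* A missing Outcome stays missing rather than being coded as 0.  */
data TrialNum;
   set Trial;
   if missing(Outcome) then y = .;
   else y = (upcase(Outcome) = "YES");
run;

/* Check the recoding before fitting any model */
proc freq data=TrialNum;
   tables Outcome*y / missing norow nocol nopercent;
run;
```

The recoded variable *y* would then be the variable modeled in both the full and the reduced model and the one named in **response=**.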

The following parameters are optional:

**data=***data-set-name* - Specifies the name of the data set containing the response variable and the variables of predicted values for both the full and reduced models. If omitted, the data set last created is used.

**freq=***variable* - Required if the FREQ statement was used in fitting the models. Specifies the variable in the FREQ statements. If omitted, all observation frequencies equal 1.

**trials=***variable* - Required if *events/trials* syntax was used in fitting the models. Specifies the *trials* variable in the MODEL statements.

**nparmfull=***value* - Specifies the number of parameters (or effective degrees of freedom) in the full model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, the adjusted (penalized) R^{2} is not computed.

**nparmsub=***value* - Specifies the number of parameters (or effective degrees of freedom) in the reduced model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, **nparmsub=1** is used.

**k=***value* - Required if **dist=negbin**. Specifies the estimated negative binomial dispersion parameter.

**twpower=***value* - Required if **dist=tweedie**. Specifies the estimated Tweedie power parameter.
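As an illustration of how **response=** and **trials=** fit together, the following sketch applies the steps described above to a model fit with *events/trials* syntax. The data set *Dose*, its values, and the variable names *r*, *n*, *pfull*, and *pnull* are hypothetical. The sketch assumes that the RsquareV macro has already been defined with the %inc statement and simply saves the predicted values with the PRED= option as described above.

```sas
/* Hypothetical grouped data: r events out of n trials at each dose */
data Dose;
   input dose n r;
   datalines;
1 20  2
2 20  5
3 20  9
4 20 14
5 20 18
;

/* Full model: save predicted values as pfull */
proc logistic data=Dose;
   model r/n = dose;
   output out=DosePred p=pfull;
run;

/* Intercept-only model fit to the same data: add pnull */
proc logistic data=DosePred;
   model r/n = ;
   output out=DosePred p=pnull;
run;

/* events variable in response=, trials variable in trials=.      */
/* nparmfull=2 counts the intercept and the dose parameter;       */
/* nparmsub= is omitted, so the default of 1 applies.             */
%RsquareV(data=DosePred, response=r, trials=n, dist=binomial,
          pfull=pfull, psub=pnull, nparmfull=2)
```

If a FREQ statement had been used in both PROC LOGISTIC steps, the same variable would also be named in **freq=**.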

*DETAILS:*
For details on the justification for and the computation and performance of R_{V}^{2}, see Zhang (2017). Note in particular that the variance function of the response distribution defines the total variation and the variation accounted for by models. As a result, the statistic applies to any model based on well-defined mean and variance functions, such as generalized linear models based on likelihood or quasi-likelihood functions. Also, R_{V}^{2} reduces to the usual R^{2} statistic for ordinary regression models based on the normal distribution.

R_{V}^{2} for a single model is obtained by fitting the model of interest and an intercept-only model using the same data and response distribution. A data set containing the observed responses and the predicted values from both models is required. If a FREQ and/or WEIGHT statement is used to fit the model of interest, the same must be done when fitting the reduced model. The FREQ variable must be included in the data set read by the macro. The WEIGHT variable is not needed by the macro.
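For the normal case mentioned above, the usual R^{2} can be computed directly from the observed and predicted values, which gives a convenient reference point when checking a call with **dist=normal**. The sketch below does not use the macro. It fits a simple regression to the SASHELP.CLASS sample data set (chosen here only for illustration) and computes one minus the ratio of the residual sum of squares to the corrected total sum of squares. The data set name *RegPred* and variable name *pfull* are arbitrary.

```sas
/* Ordinary regression; save predicted values as pfull */
proc reg data=sashelp.class;
   model Weight = Height;
   output out=RegPred p=pfull;
run;
quit;

/* Usual R-square = 1 - SSE / corrected total SS.       */
/* CSS() returns the corrected sum of squares of Weight. */
proc sql;
   select 1 - sum((Weight - pfull)**2) / css(Weight) as Rsquare
   from RegPred;
quit;
```

The value selected by PROC SQL should agree with the R-Square value that PROC REG displays for the same model.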

Partial R_{V}^{2} comparing a full model and a nested submodel can also be computed. The submodel is reduced from the full model by removing (constraining to zero) some of its parameters. Use the macro as above, but instead of the intercept-only model, fit the reduced model of interest and save its predicted values. The result is the partial R^{2} assessing the effect of the parameters in the full model that are constrained in the reduced model. For an ordinary linear regression model with normal response, this is the same as the squared partial correlation provided by the PCORR2 option in PROC REG. If the difference between the full and reduced models is a single parameter, then the square root of the partial R^{2} (with sign matching the parameter's sign) is the partial R associated with that parameter.

Penalized R_{V}^{2}, adjusted for the additional parameters in the full model, is provided when the numbers of parameters in the full and reduced models are specified.
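The text above does not state the penalty formula. As a rough sketch, assume the familiar degrees-of-freedom adjustment, 1 - (1 - R^{2})(n - nparmsub)/(n - nparmfull), where n is the number of observations used. This is an assumption and is not taken from the macro source, although it does reproduce the penalized value reported for the Poisson crab model in the Results tab (173 observations, **nparmfull=10**, default **nparmsub=1**, unpenalized value 0.15149).

```sas
/* Assumed penalty formula; not taken from the macro source */
data _null_;
   n         = 173;      /* observations in the crab data         */
   nparmfull = 10;       /* parameters in the full model          */
   nparmsub  = 1;        /* macro default for the reduced model   */
   r2v       = 0.15149;  /* unpenalized value reported by macro   */
   penalized = 1 - (1 - r2v) * (n - nparmsub) / (n - nparmfull);
   put penalized= 8.5;   /* writes penalized=0.10464 to the log   */
run;
```

Under this assumed form, the penalty grows as **nparmfull** approaches the number of observations, and no adjustment is made when the two models have the same number of parameters.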

#### BY group processing

While the RsquareV macro does not directly support BY group processing, this capability can be provided by the RunBY macro which can run the modeling procedure and the RsquareV macro repeatedly for each of the BY groups in your data. See the RunBY macro documentation for details on its use. Also see the example titled "BY group processing" in the Results tab above.

*LIMITATIONS:*
Multinomial logit, zero-inflated, and Generalized Estimating Equations (GEE) models are not currently supported. An R^{2} measure for GEE models, proposed by Zheng (2000), can be computed as described in this note.

*MISSING VALUES:*
Observations omitted by the modeling procedure because of missing values are omitted by the macro.

*REFERENCES:*
Zhang, D. 2017. "A Coefficient of Determination for Generalized Linear Models." *The American Statistician* 71(4): 310–316.

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

In addition to the following examples, see this note which uses the RsquareV macro to assess the relative importance of the effects in a generalized linear model.

**EXAMPLE 1: Poisson model for crab data**

Zhang (2017) analyzes data on 173 nesting horseshoe crabs with various models illustrating the computation of R_{V}^{2} and partial R_{V}^{2}.

The following statements fit a Poisson model including the *color* and *spine* categorical predictors as well as linear and quadratic contributions of the *width* and *weight* predictors. The predicted values are saved in variable *px* in data set *Preds* by the P= option in the OUTPUT statement. Note that this data set also contains all variables from input data set *Crabs*.

    proc genmod data=Crabs;
       class color spine;
       model satellite = color spine width|width weight|weight / dist=poisson;
       output out=Preds p=px;
    run;

These statements use the *Preds* data set to fit the intercept-only model. The OUTPUT statement adds the predicted values from this model as variable *p1* to data set *Preds*.

    proc genmod data=Preds;
       model satellite = / dist=poisson;
       output out=Preds p=p1;
    run;

The following statement defines the RsquareV macro and makes it available for use. Specify the path of the saved macro code on your system between the quotes.

%include "<*location of your file containing the RsquareV macro*>";

The following calls the macro and computes R_{V}^{2} and penalized R_{V}^{2}. The response variable and predicted values from both models are specified as well as the number of parameters in the model of interest.

    %RsquareV(data=Preds, response=satellite, pfull=px, psub=p1, dist=poisson, nparmfull=10)

The computed R_{V}^{2} for the first model is 0.1515. Adjusting this for the number of parameters included in the model, the penalized R_{V}^{2} is 0.1046.

R_{v}^{2} and Penalized R_{v}^{2}

| Rsquare_v | Penalized Rsquare_v |
|---|---|
| 0.15149 | 0.10464 |

Note that these results do not change when using a quasi-likelihood to fit the model. For example, adding a dispersion parameter to the Poisson likelihood using the SCALE= option in the MODEL statement results in use of a quasi-likelihood function for model estimation. The SCALE=PEARSON option produces an estimated dispersion parameter of 3.2354, which serves to inflate the parameter covariance matrix and therefore the standard errors of the model parameters. However, the model parameter estimates are unaffected, so the predicted values are the same, resulting in the same R_{V}^{2} statistics.
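The statement that the quasi-likelihood fit leaves the predicted values unchanged can be checked directly. The following sketch assumes that the *Preds* data set created above (containing the *Crabs* variables and *px*) is available. The output data set name *PredsQ* and the variable name *pxq* are made up for illustration.

```sas
/* Refit the same Poisson model with a Pearson-based dispersion estimate */
proc genmod data=Preds;
   class color spine;
   model satellite = color spine width|width weight|weight
         / dist=poisson scale=pearson;
   output out=PredsQ p=pxq;
run;

/* Compare the two sets of predicted values within one data set.     */
/* With no COMPARE= data set, VAR and WITH name the variables to     */
/* compare in the BASE= data set.                                    */
proc compare base=PredsQ method=absolute criterion=1e-8;
   var px;
   with pxq;
run;
```

If PROC COMPARE reports no unequal values, calling the macro with **pfull=pxq** would be expected to return the same statistics as the call with **pfull=px**.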

Next, partial R_{V}^{2} is computed comparing a full model containing *color*, *width*, and *weight* with a reduced model that removes the five parameters of *color* and *width*. First, the full model is fit.

    proc genmod data=Crabs;
       class color spine;
       model satellite = color width|width weight|weight / dist=poisson;
       output out=Preds p=pf;
    run;

The reduced model is fit next, and its predicted values are added to the OUTPUT OUT= data set.

    proc genmod data=Preds;
       model satellite = weight|weight / dist=poisson;
       output out=Preds p=pr;
    run;

Calling the macro produces the partial R_{V}^{2} comparing the full and reduced models.

    %RsquareV(data=Preds, response=satellite, pfull=pf, psub=pr, dist=poisson)

The partial R_{V}^{2} is 0.0205, suggesting relatively small additional contributions by the *color* and *width* predictors.

R_{v}^{2}

| Rsquare_v |
|---|
| 0.020455 |

**EXAMPLE 2: Generalized additive model with splines compared to linear regression model**

The example in the Getting Started section of the GAMPL procedure documentation uses a data set on voting in 3107 U.S. counties in 1980. The log proportion of voting in a county is the response. These statements fit a generalized additive model (GAM) using spline smooths for each of the six available predictors and save the response and predicted values in a data set. The number of effective degrees of freedom for this model is 48.71.

    proc gampl data=sashelp.Vote1980 plots seed=12345;
       model LogVoteRate = spline(Pop) spline(Edu) spline(Houses)
                           spline(Income) spline(Longitude Latitude);
       id LogVoteRate Pop Edu Houses Income Longitude Latitude;
       output out=gamout p=pgamf;
    run;

The following statements fit the intercept-only model, add its predicted values to the output data set, and call the macro.

    proc gampl data=gamout plots seed=12345;
       model LogVoteRate = ;
       id LogVoteRate Pop Edu Houses Income Longitude Latitude pgamf;
       output out=gamout p=pgam1;
    run;

    %RsquareV(data=gamout, response=LogVoteRate, pfull=pgamf, psub=pgam1, dist=normal, nparmfull=48.70944, nparmsub=2)

R_{V}^{2} for this model is 0.74.

R_{v}^{2} and Penalized R_{v}^{2}

| Rsquare_v | Penalized Rsquare_v |
|---|---|
| 0.74237 | 0.73844 |

Compare this to an ordinary regression model with linear effects for each of the predictors.

    proc reg data=gamout;
       model LogVoteRate=pop edu houses income longitude latitude;
       output out=gamregout p=preg;
    run;
    quit;

R^{2} for the linear regression model is 0.59. This macro call provides the partial and adjusted partial R_{V}^{2}.

    %RsquareV(data=gamregout, response=LogVoteRate, pfull=pgamf, psub=preg, dist=normal, nparmfull=48.71, nparmsub=7)

The partial value, 0.38, is the proportion of total variation left over from the reduced model that is accounted for by the full model, approximately (0.74 - 0.59)/(1 - 0.59). That is, the splines account for about 38% of the total variation left over from the linear regression model.
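The approximation in the preceding paragraph can be checked with a short DATA step. The sketch uses only the two rounded values quoted in the text, so it illustrates the arithmetic rather than reproducing the macro's computation. The variable names are arbitrary.

```sas
/* Partial R-square approximated from two rounded R-square values */
data _null_;
   r2_full    = 0.74;   /* GAM with splines (rounded)         */
   r2_reduced = 0.59;   /* linear regression model (rounded)  */
   partial    = (r2_full - r2_reduced) / (1 - r2_reduced);
   put partial= 8.5;    /* writes partial=0.36585 to the log  */
run;
```

The small gap between this value and the 0.38 quoted above presumably comes from using inputs rounded to two decimal places, which is why the text describes the relationship as approximate.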

R_{v}^{2} and Penalized R_{v}^{2}

| Rsquare_v | Penalized Rsquare_v |
|---|---|
| 0.37825 | 0.36977 |

**EXAMPLE 3: BY group processing**

While the RsquareV macro does not support BY processing directly, the RunBY macro can be used to run the modeling procedure and the macro on BY groups in the data.
The following uses the low birth weight data presented in Hosmer and Lemeshow (2000). In the statements below, a WHERE statement is included in the LOGISTIC modeling steps to subset the input data to one level of the BY variable, RACE. The special macro variables, _BYx and _LVLx, are used by the RunBY macro to fit the models to each BY group and then to run the RsquareV macro. The BYlabel macro variable is also used to label the displayed results with the BY group definition. Since the RsquareV macro writes its own titles, a FOOTNOTE statement is used instead of a TITLE statement to provide the label.

    %macro code();
       proc logistic data=lowbirth;
          where &_BY1=&_LVL1;
          model low(event="1")=;
          output out=lb p=pnull;
          title "&BYlabel";
       run;
       proc logistic data=lb;
          where &_BY1=&_LVL1;
          model low(event="1")=age lwt ftv;
          output out=lb p=pfull;
          title "&BYlabel";
       run;
       footnote "Above for &BYlabel";
       %RsquareV(response=low, dist=binomial, psub=pnull, pfull=pfull);
       footnote;
    %mend;
    %RunBY(data=lowbirth, by=race)

Right-click on the link below and select **Save** to save the RsquareV macro definition to a file. It is recommended that you name the file RsquareV.sas.

Download and save RsquareV.sas

The RsquareV macro provides an R-square measure for models with a well-defined variance function such as generalized linear and generalized additive models.

#### Operating System and Release Information

Type: Sample

Topic: Analytics ==> Regression

Date Modified: 2021-05-13 11:02:32

Date Created: 2017-03-21 11:26:29

| Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
|---|---|---|---|---|
| SAS System | SAS/STAT | z/OS | | |
| | | z/OS 64-bit | | |
| | | OpenVMS VAX | | |
| | | Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
| | | Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
| | | Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
| | | Microsoft Windows XP 64-bit Edition | | |
| | | Microsoft® Windows® for x64 | | |
| | | OS/2 | | |
| | | Microsoft Windows 8 Enterprise 32-bit | | |
| | | Microsoft Windows 8 Enterprise x64 | | |
| | | Microsoft Windows 8 Pro 32-bit | | |
| | | Microsoft Windows 8 Pro x64 | | |
| | | Microsoft Windows 8.1 Enterprise 32-bit | | |
| | | Microsoft Windows 8.1 Enterprise x64 | | |
| | | Microsoft Windows 8.1 Pro 32-bit | | |
| | | Microsoft Windows 8.1 Pro x64 | | |
| | | Microsoft Windows 10 | | |
| | | Microsoft Windows 95/98 | | |
| | | Microsoft Windows 2000 Advanced Server | | |
| | | Microsoft Windows 2000 Datacenter Server | | |
| | | Microsoft Windows 2000 Server | | |
| | | Microsoft Windows 2000 Professional | | |
| | | Microsoft Windows NT Workstation | | |
| | | Microsoft Windows Server 2003 Datacenter Edition | | |
| | | Microsoft Windows Server 2003 Enterprise Edition | | |
| | | Microsoft Windows Server 2003 Standard Edition | | |
| | | Microsoft Windows Server 2003 for x64 | | |
| | | Microsoft Windows Server 2008 | | |
| | | Microsoft Windows Server 2008 R2 | | |
| | | Microsoft Windows Server 2008 for x64 | | |
| | | Microsoft Windows Server 2012 Datacenter | | |
| | | Microsoft Windows Server 2012 R2 Datacenter | | |
| | | Microsoft Windows Server 2012 R2 Std | | |
| | | Microsoft Windows Server 2012 Std | | |
| | | Microsoft Windows XP Professional | | |
| | | Windows 7 Enterprise 32 bit | | |
| | | Windows 7 Enterprise x64 | | |
| | | Windows 7 Home Premium 32 bit | | |
| | | Windows 7 Home Premium x64 | | |
| | | Windows 7 Professional 32 bit | | |
| | | Windows 7 Professional x64 | | |
| | | Windows 7 Ultimate 32 bit | | |
| | | Windows 7 Ultimate x64 | | |
| | | Windows Millennium Edition (Me) | | |
| | | Windows Vista | | |
| | | Windows Vista for x64 | | |
| | | 64-bit Enabled AIX | | |
| | | 64-bit Enabled HP-UX | | |
| | | 64-bit Enabled Solaris | | |
| | | ABI+ for Intel Architecture | | |
| | | AIX | | |
| | | HP-UX | | |
| | | HP-UX IPF | | |
| | | IRIX | | |
| | | Linux | | |
| | | Linux for x64 | | |
| | | Linux on Itanium | | |
| | | OpenVMS Alpha | | |
| | | OpenVMS on HP Integrity | | |
| | | Solaris | | |
| | | Solaris for x64 | | |
| | | Tru64 UNIX | | |