Sample 60162: R-square and partial R-square for generalized linear models based on the variance function


R² and partial R² for generalized linear models based on the variance function

Contents: Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / References

 

PURPOSE:
R² is a popular measure of fit for ordinary regression models. The RsquareV macro provides the R²_V statistic proposed by Zhang (2017) for use with any model based on a distribution with a well-defined variance function. This includes generalized linear models and generalized additive models based on distributions such as the binomial (for logistic models), Poisson, gamma, and others. It also includes models based on quasi-likelihood functions, for which only the mean and variance functions are defined. A partial R² is provided when comparing a full model to a nested, reduced model. When the full and reduced models differ by a single parameter, the partial R for that parameter can be obtained from the partial R². A penalized R², adjusted for the additional parameters in the full model, is also available.
HISTORY:
The version of the RsquareV macro that you are using is displayed when you specify version (or any string) as the first argument. For example:
    %RsquareV(version, <macro options>)

The RsquareV macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message:

    RsquareV: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

Version   Update Notes
1.3       Fixed incomplete removal of observations with missing values.
1.1       Fixed error in KEEP statement when Base SAS® is used.
1.0       Initial coding
REQUIREMENTS:
Base SAS®. If the inverse Gaussian or Tweedie distribution is used then SAS/IML® is required.
USAGE:
Follow the instructions in the Downloads tab of this sample to save the RsquareV macro definition. Replace the text within quotation marks in the following statement with the location of the RsquareV macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the RsquareV macro and make it available for use:
   %inc "<location of your file containing the RsquareV macro>";

Following this statement, you can call the RsquareV macro. See the Results tab for examples.

Before calling the macro, fit the full model and save the response and predicted values in a data set. This is usually done by including an OUTPUT statement with the PRED= option in the modeling procedure. Use that output data set as input when fitting the reduced model, and save the reduced-model predicted values in a second output data set under a variable name different from the one used for the full-model predicted values. Specify this second data set, which contains the observed responses and both sets of predicted values, in the data= option of the macro. This process is illustrated in the examples in the Results tab.
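
The workflow above can be sketched as follows for a single-model R²_V. This is only an illustration under stated assumptions: the data set MYDATA, the count response Y, the predictors X1 and X2, and the output names OUTFULL, OUTBOTH, PFULL, and PNULL are all hypothetical, and PROC GENMOD with a Poisson response is just one possible modeling procedure. The Results tab contains the sample's own examples.

```sas
/* Assumes the RsquareV macro has already been defined with %inc.   */
/* MYDATA, Y, X1, X2 and all output names below are hypothetical.   */

/* Step 1: fit the full model and save its predicted means */
proc genmod data=mydata;
   model y = x1 x2 / dist=poisson;
   output out=outfull pred=pfull;
run;

/* Step 2: fit the reduced (here intercept-only) model to the      */
/* output of step 1 and save its predictions under another name    */
proc genmod data=outfull;
   model y = / dist=poisson;
   output out=outboth pred=pnull;
run;

/* Step 3: pass the data set holding Y, PFULL, and PNULL to the macro */
%RsquareV(data=outboth, response=y, dist=poisson, pfull=pfull, psub=pnull)
```

Fitting the second model to the first model's output data set keeps the response and both prediction variables together in one data set, which is what the data= option expects.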

The following parameters are required when using the RsquareV macro:

response=variable
Specifies the response variable that was modeled. If events/trials syntax was used to fit the model, specify the events variable in response= and the trials variable in trials=. If the response is binary and events/trials syntax was not used, then the response variable must be numeric and must be coded as 1 (for the event) or 0 (for the nonevent).
dist=poisson | binomial | normal | gamma | negbin | geometric | igauss | tweedie
Specifies the response distribution used in fitting the models.
pfull=variable
Specifies the variable containing predicted values from the full model.
psub=variable
Specifies the variable containing predicted values from the reduced model.

The following parameters are optional:

data=data-set-name
Specifies the name of the data set containing the response variable and the variables of predicted values for both the full and reduced models. If omitted, the data set last created is used.
freq=variable
Required if the FREQ statement was used in fitting the models. Specifies the variable in the FREQ statements. If omitted, all observation frequencies equal 1.
trials=variable
Required if events/trials syntax was used in fitting the models. Specifies the trials variable in the MODEL statements.
nparmfull=value
Specifies the number of parameters (or effective degrees of freedom) in the full model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, the adjusted (penalized) R² is not computed.
nparmsub=value
Specifies the number of parameters (or effective degrees of freedom) in the reduced model. This is the sum of the values in the DF column in the table of parameter estimates. If omitted, the default is nparmsub=1.
k=value
Required if dist=negbin. Specifies the estimate of the negative binomial dispersion parameter.
twpower=value
Required if dist=tweedie. Specifies the estimate of the Tweedie power parameter.
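
As a hedged illustration of how the optional parameters combine with the required ones, the sketch below requests a partial and penalized R² for a logistic model fit with events/trials syntax. The data set TRIAL and the variables R (events), N (trials), DOSE, and SEX are hypothetical, and the parameter counts assume that DOSE and SEX are numeric effects with one degree of freedom each; check the DF column of your own parameter estimates tables.

```sas
/* TRIAL, R, N, DOSE, SEX and all output names are hypothetical. */

/* Full model: intercept + DOSE + SEX (3 parameters assumed) */
proc logistic data=trial;
   model r/n = dose sex;
   output out=full pred=pf;
run;

/* Reduced model: intercept + SEX (2 parameters assumed) */
proc logistic data=full;
   model r/n = sex;
   output out=both pred=pr;
run;

/* response= names the events variable, trials= the trials variable */
%RsquareV(data=both, response=r, trials=n, dist=binomial,
          pfull=pf, psub=pr, nparmfull=3, nparmsub=2)
```

Because the two models differ only in DOSE, the result would be the partial R² for DOSE, with the penalized version computed from the two parameter counts.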
DETAILS:
For the justification, computation, and performance of R²_V, see Zhang (2017). Note in particular that the variance function of the response distribution defines both the total variation and the variation accounted for by a model. As a result, the statistic applies to any model based on well-defined mean and variance functions, such as generalized linear models based on likelihood or quasi-likelihood functions. Also, R²_V reduces to the usual R² statistic for ordinary regression models based on the normal distribution.
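
As a sketch of the definition, written from Zhang (2017) as understood here (the paper remains the authoritative statement), the variation between two values a and b is measured by the squared arc length of the variance function V between them, and the fitted model is compared with the intercept-only model. The symbols d_V, \hat\mu_i, and \hat\mu_{0,i} are notation chosen for this illustration, and frequencies are omitted for simplicity:

```latex
d_V(a,b) = \left\{ \int_a^b \sqrt{1 + \left[V'(t)\right]^2}\; dt \right\}^2,
\qquad
R^2_V = 1 - \frac{\sum_i d_V(y_i, \hat\mu_i)}{\sum_i d_V(y_i, \hat\mu_{0,i})}
```

For the normal distribution the variance function is constant, so V'(t) = 0 and d_V(a,b) = (a - b)^2, which gives the usual R² of ordinary regression.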

R²_V for a single model is obtained by fitting the model of interest and an intercept-only model using the same data and response distribution. A data set containing the observed responses and the predicted values from both models is required. If a FREQ and/or WEIGHT statement is used to fit the model of interest, the same must be done when fitting the reduced model. The FREQ variable must be included in the data set read by the macro. The WEIGHT variable is not needed by the macro.

Partial R²_V comparing a full model and a nested submodel can also be computed. The submodel is reduced from the full model by removing (constraining to zero) some of its parameters. Use the macro as above, but instead of the intercept-only model, fit the reduced model of interest and save its predicted values. The result is the partial R² assessing the effect of the parameters in the full model that are constrained in the reduced model. For an ordinary linear regression model with normal response, this is the same as the squared partial correlation provided by the PCORR2 option in PROC REG. If the full and reduced models differ by a single parameter, then the square root of the partial R² (with sign matching the parameter's sign) is the partial R associated with that parameter.
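
The relationship between the partial and single-model statistics can be sketched as follows. Here D(\hat\mu) denotes the total variance-function-based distance between the observed responses and a set of predicted values; the symbol D and the subscripts are introduced only for this illustration. The second equality assumes that both single-model statistics are computed against the same intercept-only model:

```latex
R^2_{V,\mathrm{partial}}
  = 1 - \frac{D(\hat\mu_{\mathrm{full}})}{D(\hat\mu_{\mathrm{sub}})}
  = 1 - \frac{1 - R^2_{V,\mathrm{full}}}{1 - R^2_{V,\mathrm{sub}}}
```

For example, if the full model has R²_V = 0.60 and the reduced model has R²_V = 0.50, the partial value is 1 - 0.40/0.50 = 0.20.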

Penalized R²_V, adjusted for the additional parameters in the full model, is provided when the numbers of parameters in the full and reduced models are specified.

BY group processing

The RsquareV macro does not directly support BY group processing. This capability can be provided by the RunBY macro, which can run the modeling procedure and the RsquareV macro repeatedly for each of the BY groups in your data. See the RunBY macro documentation for details on its use. Also see the example titled "BY group processing" in the Results tab.

LIMITATIONS:
Multinomial logit, zero-inflated, and Generalized Estimating Equations (GEE) models are not currently supported. An R² measure for GEE models, proposed by Zheng (2000), can be computed as described in this note.
MISSING VALUES:
Observations omitted by the modeling procedure because of missing values are omitted by the macro.
REFERENCES:
Zhang, D. 2017. "A Coefficient of Determination for Generalized Linear Models." The American Statistician. 71(4): 310–316.

 




These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.