The HPLOGISTIC Procedure

Generalized Coefficient of Determination

The goal of a coefficient of determination, also known as an R-square measure, is to express the agreement between a stipulated model and the data in terms of variation in the data explained by the model. In linear models, the R-square measure is based on residual sums of squares; because these are additive, a measure bounded between 0 and 1 is easily derived.

In more general models where parameters are estimated by the maximum likelihood principle, Cox and Snell (1989, pp. 208–209) and Magee (1990) proposed the following generalization of the coefficient of determination:

\[  R^2 = 1 - \biggl \{ \frac{L(\bm {0})}{L({\widehat{\bbeta }})}\biggr \} ^{\frac{2}{n}}  \]

Here, $L(\bm {0})$ is the likelihood of the intercept-only model, $L({\widehat{\bbeta }})$ is the likelihood of the specified model, and $n$ denotes the number of observations used in the analysis. This number is adjusted for frequencies if a FREQ statement is present and is based on the trials variable for binomial models.
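In practice the likelihoods themselves underflow for all but the smallest data sets, so the measure is normally evaluated from log likelihoods by using the equivalent form $R^2 = 1 - \exp \{ (2/n)(\log L(\bm {0}) - \log L({\widehat{\bbeta }}))\}$. The following Python sketch illustrates that arithmetic only; it is not the procedure's implementation, and the function name and input values are hypothetical.

```python
import math

def generalized_rsquare(loglik_null, loglik_model, n):
    """Generalized (Cox-Snell) R-square computed from two log likelihoods.

    loglik_null  -- log L(0), log likelihood of the intercept-only model
    loglik_model -- log L(beta-hat), log likelihood of the specified model
    n            -- number of observations used in the analysis
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # {L(0) / L(beta-hat)}^(2/n) = exp((2/n) * (log L(0) - log L(beta-hat)))
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is close to zero.
    return -math.expm1((2.0 / n) * (loglik_null - loglik_model))

# Hypothetical log likelihoods, chosen only for illustration.
r2 = generalized_rsquare(loglik_null=-69.3, loglik_model=-55.0, n=100)
print(round(r2, 4))
```

When the two log likelihoods are equal, the function returns 0, which matches the limiting behavior of a model whose effects do not contribute.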

As discussed in Nagelkerke (1991), this generalized R-square measure has properties similar to the coefficient of determination in linear models. If the model effects do not contribute to the analysis, $L({\widehat{\bbeta }})$ approaches $L(\bm {0})$ and $R^2$ approaches zero.

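However, $R^2$ does not have an upper limit of 1. For a discrete response the likelihood is a product of probabilities, so $L({\widehat{\bbeta }})$ cannot exceed 1 and $R^2$ cannot exceed the maximum value given below. Nagelkerke suggested a rescaled generalized coefficient of determination that achieves an upper limit of 1 by dividing $R^2$ by its maximum value,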

\[  R_{\max }^2 = 1 - \{ L(\bm {0})\} ^{\frac{2}{n}}  \]

If you specify the RSQUARE option in the MODEL statement, the HPLOGISTIC procedure computes $R^2$ and the rescaled coefficient of determination according to Nagelkerke:

\[  \tilde{R}^2 = \frac{R^2}{R_{\max }^2}  \]
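The rescaling can be sketched the same way from log likelihoods. The Python function below is a minimal illustration under the definitions above, not the procedure's own code; its name and the example values are hypothetical.

```python
import math

def rescaled_rsquare(loglik_null, loglik_model, n):
    """Nagelkerke's rescaled R-square: R^2 divided by its maximum value.

    loglik_null  -- log L(0), log likelihood of the intercept-only model
    loglik_model -- log L(beta-hat), log likelihood of the specified model
    n            -- number of observations used in the analysis
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # R^2 = 1 - exp((2/n) * (log L(0) - log L(beta-hat)))
    r2 = -math.expm1((2.0 / n) * (loglik_null - loglik_model))
    # R^2_max = 1 - L(0)^(2/n) = 1 - exp((2/n) * log L(0))
    r2_max = -math.expm1((2.0 / n) * loglik_null)
    if r2_max <= 0.0:
        raise ValueError("R-square maximum is not positive; log L(0) must be negative")
    return r2 / r2_max

# Hypothetical log likelihoods, chosen only for illustration.
print(round(rescaled_rsquare(loglik_null=-69.3, loglik_model=-55.0, n=100), 4))
```

A model that reproduces a discrete response perfectly has a log likelihood of 0, in which case $R^2$ equals $R_{\max }^2$ and the rescaled measure equals 1.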

The $R^2$ and $\tilde{R}^2$ measures are most useful for comparing competing models, fit to the same data, that are not necessarily nested (that is, models that cannot be reduced to one another by simple constraints on the parameter space). Larger values of the measures indicate better-fitting models.