The LOGISTIC Procedure

Classification Table

For binary response data, the response is either an event or a nonevent. In PROC LOGISTIC, the response with Ordered Value 1 is regarded as the event, and the response with Ordered Value 2 is the nonevent. PROC LOGISTIC models the probability of the event. From the fitted model, a predicted event probability can be computed for each observation. A method to compute a reduced-bias estimate of the predicted probability is given in the section Predicted Probability of an Event for Classification. If the predicted event probability exceeds or equals some cutpoint value $z \in [0,1]$, the observation is predicted to be an event observation; otherwise, it is predicted as a nonevent. A $2\times 2$ frequency table can be obtained by cross-classifying the observed and predicted responses. The CTABLE option produces this table, and the PPROB= option selects one or more cutpoints. Each cutpoint generates a classification table. If the PEVENT= option is also specified, a classification table is produced for each combination of PEVENT= and PPROB= values.
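For example, the following PROC LOGISTIC step requests classification tables for a range of cutpoints and two prior event probabilities; the data set Remission and the predictors cell, smear, and li are assumed purely for illustration:

   proc logistic data=remission;
      model remiss(event='1') = cell smear li /
            ctable                       /* request classification tables */
            pprob=(0.2 to 0.8 by 0.1)    /* one table per cutpoint z      */
            pevent=(0.05 0.10);          /* prior probabilities Pr(B)     */
   run;

Each of the seven PPROB= cutpoints is paired with each of the two PEVENT= values, so fourteen classification tables are produced.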

The accuracy of the classification is measured by its sensitivity (the ability to predict an event correctly) and specificity (the ability to predict a nonevent correctly). Sensitivity is the proportion of event responses that were predicted to be events. Specificity is the proportion of nonevent responses that were predicted to be nonevents. PROC LOGISTIC also computes three other conditional probabilities: false positive rate, false negative rate, and rate of correct classification. The false positive rate is the proportion of predicted event responses that were observed as nonevents. The false negative rate is the proportion of predicted nonevent responses that were observed as events. Given prior probabilities specified with the PEVENT= option, these conditional probabilities can be computed as posterior probabilities by using Bayes’ theorem.
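As a check on these definitions, the following DATA step sketch computes the five rates from the four cell counts of a hypothetical 2 x 2 classification table (the counts tp, fn, fp, and tn are assumed):

   data rates;
      tp = 40; fn = 10;   /* observed events:    predicted event, predicted nonevent */
      fp = 15; tn = 85;   /* observed nonevents: predicted event, predicted nonevent */
      sensitivity = tp / (tp + fn);       /* events correctly predicted          */
      specificity = tn / (fp + tn);       /* nonevents correctly predicted       */
      false_pos   = fp / (tp + fp);       /* predicted events that are nonevents */
      false_neg   = fn / (fn + tn);       /* predicted nonevents that are events */
      correct     = (tp + tn) / (tp + fn + fp + tn);
      put sensitivity= specificity= false_pos= false_neg= correct=;
   run;

Note that the false positive and false negative rates condition on the predicted response, whereas sensitivity and specificity condition on the observed response.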

Predicted Probability of an Event for Classification

When you classify a set of binary data, if the same observations used to fit the model are also used to estimate the classification error, the resulting error-count estimate is biased. One way of reducing the bias is to remove the binary observation to be classified from the data, reestimate the parameters of the model, and then classify the observation based on the new parameter estimates. However, it would be costly to fit the model by leaving out each observation one at a time. The LOGISTIC procedure provides a less expensive one-step approximation to the preceding parameter estimates. Let $\widehat{\bbeta }$ be the MLE of the parameter vector $(\alpha , \beta _{1},\dots ,\beta _{s})’$ based on all observations. Let $\widehat{\bbeta }_{(j)}$ denote the MLE computed without the jth observation. The one-step estimate of $\widehat{\bbeta }_{(j)}$ is given by

\[  \widehat{\bbeta }_{(j)}^1=\widehat{\bbeta }- \frac{w_ j(y_ j-{\widehat{\pi }}_ j)}{1-h_{j}}\widehat{\bV }(\widehat{\bbeta }) \left( \begin{array}{c} 1 \\ \mb {x}_ j \end{array} \right)  \]

where

$y_ j$ is 1 for an observed event response and 0 otherwise

$w_ j$ is the weight of the observation

${\widehat{\pi }}_ j$ is the predicted event probability based on $\widehat{\bbeta }$

$h_{j}$ is the hat diagonal element with $n_ j=1$ and $r_ j=y_ j$

${\widehat{\bV }}(\widehat{\bbeta })$ is the estimated covariance matrix of $\widehat{\bbeta }$
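The following PROC IML sketch illustrates the one-step computation on made-up data. In practice $\widehat{\bbeta }$, ${\widehat{\pi }}_ j$, $h_ j$, and ${\widehat{\bV }}(\widehat{\bbeta })$ come from the fitted model; the small design matrix, responses, weights, and parameter values here are assumed for illustration only:

   proc iml;
      X    = {1 0.5, 1 1.2, 1 2.3, 1 3.1};   /* rows are (1, x_j')    */
      y    = {0, 0, 1, 1};                   /* binary responses      */
      w    = {1, 1, 1, 1};                   /* observation weights   */
      beta = {-2.0, 1.1};                    /* assumed full-data MLE */

      pi = 1 / (1 + exp(-(X * beta)));       /* predicted event probabilities */

      /* estimated covariance matrix: inverse of the Fisher information */
      Wt = diag(w # pi # (1 - pi));
      V  = inv(X` * Wt * X);

      /* hat diagonals h_j = w_j pi_j (1 - pi_j) x_j' V x_j */
      h = vecdiag(sqrt(Wt) * X * V * X` * sqrt(Wt));

      /* one-step leave-one-out estimates, one row per deleted observation */
      beta1 = j(nrow(X), nrow(beta), .);
      do j = 1 to nrow(X);
         adj        = (w[j] * (y[j] - pi[j]) / (1 - h[j])) * (V * X[j, ]`);
         beta1[j, ] = (beta - adj)`;
      end;
      print beta1;
   quit;

The reduced-bias probability ${\widehat{\pi }}^*_{(j)}$ used in the next section is then obtained by scoring observation j with $\widehat{\bbeta }_{(j)}^1$ instead of $\widehat{\bbeta }$.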

False Positive, False Negative, and Correct Classification Rates Using Bayes’ Theorem

Suppose $n_1$ of n individuals experience an event, such as a disease. Let this group be denoted by ${\mc {C}}_1$, and let the group of the remaining $n_2=n-n_1$ individuals who do not have the disease be denoted by ${\mc {C}}_2$. The jth individual is classified as giving a positive response if the predicted probability of disease (${\widehat{\pi }}^*_{(j)}$) is large. The probability ${\widehat{\pi }}^*_{(j)}$ is the reduced-bias estimate based on the one-step approximation given in the preceding section. For a given cutpoint z, the jth individual is predicted to give a positive response if ${\widehat{\pi }}^*_{(j)} \geq z$.

Let B denote the event that a subject has the disease, and let $\bar{B}$ denote the event of not having the disease. Let A denote the event that the subject responds positively, and let $\bar{A}$ denote the event of responding negatively. Results of the classification are represented by two conditional probabilities, ${\Pr }(A|B)$ and ${\Pr }(A|\bar{B})$, where ${\Pr }(A|B)$ is the sensitivity and ${\Pr }(A|\bar{B})$ is one minus the specificity.

These probabilities are given by

\[  {\Pr }(A|B)= \frac{\sum _{j \in {\mc {C}}_1} I({\widehat{\pi }}^*_{(j)} \geq z)}{n_1} \qquad {\Pr }(A|\bar{B})= \frac{\sum _{j \in {\mc {C}}_2} I({\widehat{\pi }}^*_{(j)} \geq z)}{n_2}  \]

where $I(\cdot )$ is the indicator function.
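For a given cutpoint, these two proportions can be computed directly, as in the following sketch; here pred is a hypothetical data set with one row per subject, group indicating membership in ${\mc {C}}_1$ or ${\mc {C}}_2$, and pihat holding ${\widehat{\pi }}^*_{(j)}$:

   %let z = 0.5;   /* assumed cutpoint */
   proc sql;
      select sum(group = 1 & pihat >= &z) / sum(group = 1)
                as sensitivity     label = 'Pr(A|B)',
             sum(group = 2 & pihat >= &z) / sum(group = 2)
                as one_minus_spec  label = 'Pr(A|B-bar)'
      from pred;
   quit;

Each sum of logical expressions counts the subjects that satisfy the condition, matching the indicator sums above.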

Bayes’ theorem is used to compute several classification rates from these conditional probabilities. For a given prior probability ${\Pr }(B)$ of the disease, the false positive rate $P_{F+}$, the false negative rate $P_{F-}$, and the correct classification rate $P_ C$ are given by Fleiss (1981, pp. 4–5) as follows:

\[  P_{F+} = {\Pr }(\bar{B}|A) = \frac{{\Pr }(A|\bar{B})[1-{\Pr }(B)]}{{\Pr }(A|\bar{B}) + {\Pr }(B)[{\Pr }(A|B) - {\Pr }(A|\bar{B})]}  \]
\[  P_{F-} = {\Pr }(B|\bar{A}) = \frac{[1-{\Pr }(A|B)]{\Pr }(B)}{1-{\Pr }(A|\bar{B}) - {\Pr }(B)[{\Pr }(A|B) - {\Pr }(A|\bar{B})]}  \]
\[  P_ C = {\Pr }(A \cap B) + {\Pr }(\bar{A} \cap \bar{B}) = {\Pr }(A|B){\Pr }(B)+{\Pr }(\bar{A}|\bar{B})[1-{\Pr }(B)]  \]
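A DATA step sketch of these three formulas, with an assumed sensitivity, specificity, and prior probability:

   data bayes;
      sens = 0.80;  spec = 0.90;  prB = 0.05; /* assumed Pr(A|B), Pr(A-bar|B-bar), Pr(B) */
      prA_nB = 1 - spec;                      /* Pr(A|B-bar)                       */
      prA    = prA_nB + prB*(sens - prA_nB);  /* Pr(A), by total probability       */
      pFpos  = prA_nB*(1 - prB) / prA;        /* Pr(B-bar|A)                       */
      pFneg  = (1 - sens)*prB / (1 - prA);    /* Pr(B|A-bar)                       */
      pC     = sens*prB + spec*(1 - prB);     /* Pr(A and B) + Pr(A-bar and B-bar) */
      put pFpos= pFneg= pC=;
   run;

With these values the false positive rate is about 0.70: when the event is rare (${\Pr }(B)=0.05$), most predicted events are in fact nonevents even though the specificity is 0.90.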

The prior probability ${\Pr }(B)$ can be specified by the PEVENT= option. If the PEVENT= option is not specified, the sample proportion of diseased individuals is used; that is, ${\Pr }(B)=n_1/n$. In such a case, the false positive rate, the false negative rate, and the correct classification rate reduce to

\[  P_{F+} = \frac{\sum _{j \in {\mc {C}}_2} I({\widehat{\pi }}^*_{(j)} \geq z)}{\sum _{j \in {\mc {C}}_1} I({\widehat{\pi }}^*_{(j)} \geq z) + \sum _{j \in {\mc {C}}_2} I({\widehat{\pi }}^*_{(j)} \geq z)}  \]
\[  P_{F-} = \frac{\sum _{j \in {\mc {C}}_1} I({\widehat{\pi }}^*_{(j)}<z)}{\sum _{j \in {\mc {C}}_1} I({\widehat{\pi }}^*_{(j)}<z) + \sum _{j \in {\mc {C}}_2} I({\widehat{\pi }}^*_{(j)}<z)}  \]
\[  P_ C = \frac{\sum _{j \in {\mc {C}}_1} I({\widehat{\pi }}^*_{(j)} \geq z) + \sum _{j \in {\mc {C}}_2} I({\widehat{\pi }}^*_{(j)}<z)}{n}  \]

Note that for a stratified sampling situation in which $n_1$ and $n_2$ are chosen a priori, $n_1/n$ is not a desirable estimate of ${\Pr }(B)$. For such situations, the PEVENT= option should be specified.