The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimated linear predictors and their standard error estimates, the estimates of the cumulative or individual response probabilities, and the confidence limits for the cumulative probabilities. Regression diagnostic statistics and estimates of cross validated response probabilities are also available for binary response models. If you specify more than one OUTPUT statement, only the last one is used. Formulas for the statistics are given in the sections Linear Predictor, Predicted Probability, and Confidence Limits and Regression Diagnostics, and, for conditional logistic regression, in the section Conditional Logistic Regression.
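The following Python sketch shows how the linear predictor, its standard error estimate, the predicted probability, and the confidence limits relate for a binary logit model. It assumes Wald-type limits computed on the linear-predictor scale and mapped back through the inverse logit. The formulas PROC LOGISTIC actually uses are in the sections cited above. The function name and the coefficient and covariance values are hypothetical.

```python
import math
from statistics import NormalDist


def _expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def xbeta_stats(x, beta, cov, alpha=0.05):
    """Return (xbeta, stdxbeta, prob, lower, upper) for one observation.

    Hypothetical helper. x includes a leading 1 for the intercept, beta holds
    the estimated coefficients, and cov is their estimated covariance matrix.
    """
    k = len(x)
    xbeta = sum(x[i] * beta[i] for i in range(k))
    # Var(x'b) = x' V x; the standard error is its square root.
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    stdxbeta = math.sqrt(var)
    # Assumed Wald-type limits on the linear-predictor scale.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (xbeta, stdxbeta, _expit(xbeta),
            _expit(xbeta - z * stdxbeta), _expit(xbeta + z * stdxbeta))


# Hypothetical estimates for an intercept and one slope.
beta = [-1.0, 0.5]
cov = [[0.04, -0.01], [-0.01, 0.01]]
print(xbeta_stats([1.0, 2.0], beta, cov))
```

Because the limits are formed on the linear-predictor scale, they are symmetric around the probability only when the linear predictor is 0, as it happens to be in this example.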
If you use the single-trial syntax, the data set also contains a variable named _LEVEL_, which indicates the level of the response that the given row of output is referring to. For instance, the value of the cumulative probability variable is the probability that the response variable is as large as the corresponding value of _LEVEL_. For more information, see the section OUT= Output Data Set in the OUTPUT Statement.
The estimated linear predictor, its standard error estimate, all predicted probabilities, and the confidence limits for the cumulative probabilities are computed for all observations in which the explanatory variables have no missing values, even if the response is missing. By adding observations with missing response values to the input data set, you can compute these statistics for new observations or for settings of the explanatory variables not present in the data without affecting the model fit. Alternatively, the SCORE statement can be used to compute predicted probabilities and confidence intervals for new observations.
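To illustrate why adding observations with a missing response does not change the fit, the sketch below fits a one-covariate logistic model by Newton-Raphson using only rows whose response and covariate are both present, and then scores every row whose covariate is present. It is a stand-in for PROC LOGISTIC, not its algorithm. None plays the role of a SAS missing value, and the function names and data are hypothetical.

```python
import math


def fit_logistic(rows, iters=25):
    """Newton-Raphson fit of logit Pr(y=1) = b0 + b1*x.

    Only rows with both x and y present contribute to the fit.
    """
    used = [(x, y) for x, y in rows if x is not None and y is not None]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in used:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return b0, b1


def predict(rows, b0, b1):
    """Predicted probability for every row whose covariate is present."""
    return [None if x is None else 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            for x, _ in rows]


base = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1), (5.0, 1)]
extra = [(2.5, None), (6.0, None), (None, 1)]  # new settings; last lacks x
b0, b1 = fit_logistic(base + extra)
print(predict(base + extra, b0, b1))
```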
Table 60.9 summarizes the options available in the OUTPUT statement. These options can be specified after a slash (/). The statistic and diagnostic options specify the statistics to be included in the output data set and name the new variables that contain the statistics. If a STRATA statement is specified, only the PREDICTED=, RESCHI=, STDRESCHI=, DFBETAS=, and H= options are available; for more information, see the section Regression Diagnostic Details.
Table 60.9: OUTPUT Statement Options
Option | Description
---|---
ALPHA= | Specifies α for the confidence intervals
OUT= | Names the output data set
Statistic Options |
LOWER= | Names the lower confidence limit
PREDICTED= | Names the predicted probabilities
PREDPROBS= | Requests the individual, cumulative, or cross validated predicted probabilities
STDXBETA= | Names the standard error estimate of the linear predictor
UPPER= | Names the upper confidence limit
XBETA= | Names the linear predictor
Diagnostic Options for Binary Response |
C= | Names the confidence interval displacement C
CBAR= | Names the confidence interval displacement C̄
DFBETAS= | Names the standardized deletion parameter differences
DIFCHISQ= | Names the deletion chi-square goodness-of-fit change
DIFDEV= | Names the deletion deviance change
H= | Names the leverage
RESCHI= | Names the Pearson chi-square residual
RESDEV= | Names the deviance residual
RESLIK= | Names the likelihood residual
STDRESCHI= | Names the standardized Pearson chi-square residual
STDRESDEV= | Names the standardized deviance residual
The following list describes these options.
You can request any of the three types of predicted probabilities. For example, you can request both the individual predicted probabilities and the cross validated probabilities by specifying PREDPROBS=(I X).
When you specify the PREDPROBS= option, two automatic variables, _FROM_ and _INTO_, are included for the single-trial syntax, and only one variable, _INTO_, is included for the events/trials syntax. The variable _FROM_ contains the formatted value of the observed response. The variable _INTO_ contains the formatted value of the response level with the largest individual predicted probability.
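A minimal sketch of how _FROM_ and _INTO_ relate to the individual predicted probabilities: _INTO_ is the level whose probability is largest. The text does not say how ties are broken, so the sketch assumes that the earlier level in Response Profile order wins. The function name and the level labels are hypothetical.

```python
def classification_vars(ip, observed=None):
    """Build the _FROM_/_INTO_ values for one output observation (sketch).

    ip       : dict mapping formatted response levels, in Response Profile
               order, to individual predicted probabilities
    observed : formatted observed response for the single-trial syntax;
               pass None for the events/trials syntax, where only _INTO_
               is created
    """
    # max() returns the first maximal key, so ties go to the earlier level
    # (an assumption; the documentation does not state a tie rule).
    out = {"_INTO_": max(ip, key=ip.get)}
    if observed is not None:
        out["_FROM_"] = observed
    return out


probs = {"None": 0.2, "Mild": 0.5, "Severe": 0.3}
print(classification_vars(probs, observed="Severe"))
```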
If you specify PREDPROBS=INDIVIDUAL, the OUT= data set contains k additional variables representing the individual probabilities, one for each response level, where k is the maximum number of response levels across all BY groups. The names of these variables have the form IP_xxx, where xxx represents the particular level. The representation depends on the following situations:
If you specify the events/trials syntax, xxx is either 'Event' or 'Nonevent'. Thus, the variable containing the event probabilities is named IP_Event, and the variable containing the nonevent probabilities is named IP_Nonevent.
If you specify the single-trial syntax with more than one BY group, xxx is 1 for the first ordered level of the response, 2 for the second ordered level of the response, and so forth, as given in the "Response Profile" table. The variable containing the predicted probabilities Pr(Y=1) is named IP_1, where Y is the response variable. Similarly, IP_2 is the name of the variable containing the predicted probabilities Pr(Y=2), and so on.
If you specify the single-trial syntax with no BY-group processing, xxx is the left-justified formatted value of the response level (the value might be truncated so that IP_xxx does not exceed 32 characters). For example, if Y is the response variable with response levels 'None', 'Mild', and 'Severe', the variables representing the individual probabilities Pr(Y='None'), Pr(Y='Mild'), and Pr(Y='Severe') are named IP_None, IP_Mild, and IP_Severe, respectively.
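The three naming rules can be summarized in a small Python function. This is a sketch under assumptions: it does not handle formatted values that contain characters invalid in SAS names or that collide after truncation, the syntax labels are made-up strings, and the function name is hypothetical. The prefix argument lets the same rules produce CP_ names, but not the four events/trials XP_ names described for PREDPROBS=CROSSVALIDATE.

```python
MAX_NAME_LEN = 32  # name-length limit stated in the text


def ip_names(levels, syntax="single", by_groups=False, prefix="IP_"):
    """Return IP_xxx variable names following the three rules (sketch).

    levels    : formatted response levels in Response Profile order; with
                BY groups, this stands for the k levels of the largest group
    syntax    : "events/trials" or "single" (hypothetical labels)
    by_groups : True when more than one BY group is processed
    """
    if syntax == "events/trials":
        return [prefix + "Event", prefix + "Nonevent"]
    if by_groups:
        return [f"{prefix}{i}" for i in range(1, len(levels) + 1)]
    # Left-justify the formatted value (drop surrounding blanks), then
    # truncate so that the whole name fits within the limit.
    return [(prefix + str(v).strip())[:MAX_NAME_LEN] for v in levels]


print(ip_names(["None", "Mild", "Severe"]))
print(ip_names(["None", "Mild", "Severe"], by_groups=True))
```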
If you specify PREDPROBS=CUMULATIVE, the OUT= data set contains k additional variables representing the cumulative probabilities, one for each response level, where k is the maximum number of response levels across all BY groups. The names of these variables have the form CP_xxx, where xxx represents the particular response level. The naming convention is similar to that given by PREDPROBS=INDIVIDUAL. The PREDPROBS=CUMULATIVE values are the same as those output by the PREDICTED= option, but they are arranged in variables on each output observation rather than in multiple output observations.
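The relationship between the CP_ variables and the multi-row PREDICTED= layout can be sketched as follows. The sketch assumes that the cumulative probability for a level is Pr(Y ≤ level), accumulated over the levels in Response Profile order, and it writes one long-format row per level; the rows PROC LOGISTIC actually writes are described in the section OUT= Output Data Set in the OUTPUT Statement. The function names, the pred variable name, and the probabilities are hypothetical.

```python
from itertools import accumulate


def cumulative_wide(ip, prefix="CP_"):
    """One output observation: CP_xxx variables holding running sums of the
    individual probabilities over the ordered levels (assumed Pr(Y <= level))."""
    return {prefix + lvl: cp for lvl, cp in zip(ip, accumulate(ip.values()))}


def cumulative_long(ip):
    """The same values as multiple output observations, each tagged with
    _LEVEL_. "pred" stands in for the name given by PREDICTED= (hypothetical);
    one row per level is a simplifying assumption."""
    return [{"_LEVEL_": lvl, "pred": cp}
            for lvl, cp in zip(ip, accumulate(ip.values()))]


ip = {"None": 0.2, "Mild": 0.5, "Severe": 0.3}
print(cumulative_wide(ip))
print(cumulative_long(ip))
```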
If you specify PREDPROBS=CROSSVALIDATE, the OUT= data set contains k additional variables representing the cross validated predicted probabilities of the k response levels, where k is the maximum number of response levels across all BY groups. The names of these variables have the form XP_xxx, where xxx represents the particular level. The representation is the same as that given by PREDPROBS=INDIVIDUAL, except that for the events/trials syntax there are four variables for the cross validated predicted probabilities instead of two:
XP_EVENT_R1E is the cross validated predicted probability of an event when a single event is removed from the current observation.
XP_NONEVENT_R1E is the cross validated predicted probability of a nonevent when a single event is removed from the current observation.
XP_EVENT_R1N is the cross validated predicted probability of an event when a single nonevent is removed from the current observation.
XP_NONEVENT_R1N is the cross validated predicted probability of a nonevent when a single nonevent is removed from the current observation.
The cross validated predicted probabilities are precisely those used in the CTABLE option. For more information about the computation, see the section Predicted Probability of an Event for Classification.
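For the events/trials syntax, the four XP_ variables can be illustrated by refitting the model after removing one event, or one nonevent, from the current observation. This sketch refits exactly with Newton-Raphson, a swapped-in technique that may differ from the computation described in the section Predicted Probability of an Event for Classification. The one-covariate model, the function names, the data, and the use of None when there is no event (or nonevent) to remove are all assumptions.

```python
import math


def fit_grouped(rows, iters=30):
    """Newton-Raphson fit of logit(p) = b0 + b1*x to events/trials rows (x, r, n)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, r, n in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = r - n * p        # score contribution
            w = n * p * (1.0 - p)    # information weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return b0, b1


def event_prob(b, x):
    """Predicted event probability at covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * x)))


def xp_events_trials(rows, i):
    """The four XP_ values for observation i, computed by exact refitting."""
    x, r, n = rows[i]
    out = {}
    for tag, (ri, ni) in (("R1E", (r - 1, n - 1)), ("R1N", (r, n - 1))):
        if ri < 0 or ri > ni:
            # No event (R1E) or no nonevent (R1N) to remove: assumed missing.
            out["XP_EVENT_" + tag] = out["XP_NONEVENT_" + tag] = None
            continue
        reduced = list(rows)
        reduced[i] = (x, ri, ni)
        pe = event_prob(fit_grouped(reduced), x)
        out["XP_EVENT_" + tag] = pe
        out["XP_NONEVENT_" + tag] = 1.0 - pe
    return out


rows = [(0.0, 1, 10), (1.0, 3, 10), (2.0, 5, 10), (3.0, 8, 10)]
print(xp_events_trials(rows, 1))
```

Removing an event pulls the refitted event probability below the full-data value, and removing a nonevent pushes it above, which is why the R1E and R1N variants differ.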