The HPLOGISTIC Procedure

OUTPUT Statement

OUTPUT <OUT=SAS-data-set>
<COPYVARS=(variables)>
<keyword <=name>>…<keyword <=name>> </ options> ;

The OUTPUT statement creates a data set that contains observationwise statistics that are computed after fitting the model. The variables in the input data set are not included in the output data set to avoid data duplication for large data sets; however, variables specified in the ID statement or COPYVAR= option are included.

If the input data are in distributed form, where access of data in a particular order cannot be guaranteed, the HPLOGISTIC procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.

The output statistics are computed based on the final parameter estimates. If the model fit does not converge, missing values are produced for the quantities that depend on the estimates.

When there are more than two response levels, only variables named by the XBETA and PREDICTED keywords have their values computed; the other variables have missing values. These statistics are computed for every response category, and the automatic variable _LEVEL_ identifies the response category upon which the computed values are based. If you also specify the OBSCAT option, then the observationwise statistics are computed only for the observed response category, as indicated by the value of the _LEVEL_ variable.

For observations in which only the response variable is missing, values of the XBETA and PREDICTED statistics are computed even though these observations do not affect the model fit. This enables, for instance, predicted probabilities to be computed for new observations.
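The scoring use case described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the data set names (work.demo, work.scored), the variables (id, x1, x2, y, newObs), and the simulated coefficients are all hypothetical, and the response is assumed to be a numeric 0/1 variable.

```sas
/* Simulate a small hypothetical data set. The last 20 rows play the
   role of new observations: predictors are present, response is missing. */
data work.demo;
   call streaminit(12345);
   do id = 1 to 200;
      x1 = rand('normal');
      x2 = rand('uniform');
      y  = rand('bernoulli', logistic(-0.5 + x1 - x2));
      newObs = (id > 180);
      if newObs then y = .;   /* response missing: row does not affect the fit */
      output;
   end;
run;

/* Fit the model and request the linear predictor and predicted probability.
   The ID statement carries id and newObs into the output data set. */
proc hplogistic data=work.demo;
   model y(event='1') = x1 x2;
   id id newObs;
   output out=work.scored xbeta=eta pred=p_event;
run;

/* Inspect the rows that had a missing response. */
proc print data=work.scored(where=(newObs = 1) obs=5);
run;
```

If the behavior described above applies, the rows with newObs = 1 carry nonmissing values of eta and p_event even though they were excluded from estimation.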

You can specify the following syntax elements in the OUTPUT statement before the slash (/).

OUT=SAS-data-set
DATA=SAS-data-set

specifies the name of the output data set. If the OUT= (or DATA=) option is omitted, the procedure uses the DATAn convention to name the output data set.

COPYVAR=variable
COPYVARS=(variables)

transfers one or more variables from the input data set to the output data set. Variables named in an ID statement are also copied from the input data set to the output data set.
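As a minimal sketch of how COPYVARS= and the ID statement combine, the following example writes a row key and one additional input variable alongside a requested statistic. The data set names (work.demo, work.pred) and variable names (id, region, x1, y, p_event) are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input data with a row key (id) and a grouping variable (region). */
data work.demo;
   call streaminit(2024);
   do id = 1 to 200;
      region = ceil(4 * rand('uniform'));
      x1 = rand('normal');
      y  = rand('bernoulli', logistic(-0.5 + x1));
      output;
   end;
run;

proc hplogistic data=work.demo;
   model y(event='1') = x1;
   id id;                                  /* copied through the ID statement */
   output out=work.pred
          copyvars=(region)                /* copied through COPYVARS=        */
          pred=p_event;                    /* requested statistic, named      */
run;
```

Because other input variables are not written to the output data set, a key such as id is what allows work.pred to be merged back with work.demo later.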

keyword <=name>

specifies a statistic to include in the output data set and optionally assigns a name to the variable that contains it. If you do not provide a name, the HPLOGISTIC procedure assigns a default name based on the type of statistic requested.

The following are valid keywords for adding statistics to the OUTPUT data set:

LINP | XBETA

requests the linear predictor $\eta = \mathbf{x}'\boldsymbol{\beta}$.

PREDICTED | PRED | P

requests predicted values (predicted probabilities of events) for the response variable.

RESIDUAL | RESID | R

requests the raw residual, $y - \mu $, where $\mu $ is the estimate of the predicted event probability. This statistic is not computed for multinomial models.

PEARSON | PEARS | RESCHI

requests the Pearson residual, $\frac{\sqrt{wn}\,(y/n - \mu)}{\sqrt{\mu(1-\mu)}}$, where $\mu$ is the estimate of the predicted event probability, $w$ is the weight of the observation, and $n$ is the number of binomial trials ($n = 1$ for binary observations). This statistic is not computed for multinomial models.
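To make the residual formulas concrete, the following DATA step evaluates them by hand for two made-up observations. It does not call the HPLOGISTIC procedure; the data set name and the values of $y$, $n$, $w$, and $\mu$ are invented for illustration, and the raw residual is computed only for the binary row.

```sas
/* Hand computation of the residual formulas for hypothetical values. */
data work.resid_demo;
   input y n w mu;
   /* Pearson residual: sqrt(w*n) * (y/n - mu) / sqrt(mu * (1 - mu)) */
   pearson = sqrt(w * n) * (y / n - mu) / sqrt(mu * (1 - mu));
   /* Raw residual y - mu, shown here for the binary case (n = 1) only */
   if n = 1 then raw = y - mu;
   datalines;
1 1 1 0.8
3 10 1 0.4
;
run;

proc print data=work.resid_demo;
run;
```

For the first row (a binary event with $\mu = 0.8$), the raw residual is 0.2 and the Pearson residual is $0.2 / \sqrt{0.16} = 0.5$. For the second row (3 events in 10 trials with $\mu = 0.4$), the Pearson residual is $\sqrt{10}\,(0.3 - 0.4)/\sqrt{0.24} \approx -0.645$.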

You can specify the following options in the OUTPUT statement after the slash (/):

OBSCAT

requests (for multinomial models) that observationwise statistics be produced for the observed response level only. If the OBSCAT option is not specified and the response variable has $J$ levels, then the number of records that are output for every observation in the input data depends on the model: for cumulative link models, $J-1$ records are output, corresponding to the $J-1$ lower-ordered response categories; for generalized logit models, $J$ records are output, corresponding to all $J$ response categories.
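The effect of OBSCAT on the number of output records can be sketched with a three-level response and a generalized logit model. The data set names (work.multi, work.all_levels, work.obs_level) and variables (id, x1, y, p) are hypothetical, and the simulated response is unrelated to x1; the example is meant only to show the shape of the output.

```sas
/* Hypothetical data: a response with J = 3 levels (1, 2, 3). */
data work.multi;
   call streaminit(7);
   do id = 1 to 300;
      x1 = rand('normal');
      y  = rand('table', 0.3, 0.3, 0.4);
      output;
   end;
run;

/* Without OBSCAT: statistics for every response category. */
proc hplogistic data=work.multi;
   model y = x1 / link=glogit;
   id id;
   output out=work.all_levels pred=p;
run;

/* With OBSCAT: statistics for the observed response category only. */
proc hplogistic data=work.multi;
   model y = x1 / link=glogit;
   id id;
   output out=work.obs_level pred=p / obscat;
run;
```

Under the rules stated above, work.all_levels would be expected to hold $J = 3$ records per input observation, distinguished by the _LEVEL_ variable, whereas work.obs_level would hold one record per input observation.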