The OUTPUT statement creates a data set that contains observationwise statistics that are computed after the model is fitted. The variables in the input data set are not included in the output data set to avoid data duplication for large data sets; however, variables that are specified in the ID statement are included.
If the input data are in distributed form, where accessing data in a particular order cannot be guaranteed, the HPGENSELECT procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.
The computation of the output statistics is based on the final parameter estimates. If the model fit does not converge, missing values are produced for the quantities that depend on the estimates.
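As a minimal sketch of the behavior described above, the following step writes the linear predictor and the predicted mean to an output data set and uses the ID statement to carry a key variable along. The data set work.claims, its variables (nClaims, age, region, policyID), and the output names linPred and mu are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input: work.claims with response nClaims,     */
/* predictors age and region, and key variable policyID.      */
proc hpgenselect data=work.claims;
   class region;
   model nClaims = age region / dist=poisson;
   id policyID;                     /* copied to the output data set  */
   output out=work.claimsOut
          xbeta=linPred             /* linear predictor               */
          predicted=mu;             /* predicted mean                 */
run;
```

Because the input variables are not copied, work.claimsOut in this sketch would contain only policyID and the two requested statistics; policyID can then serve as the key for joining the statistics back to the input data.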
When there are more than two response levels for multinomial data, values are computed only for variables that are named by the XBETA and PREDICTED keywords; the other variables have missing values. These statistics are computed for every response category, and the automatic variable _LEVEL_ identifies the response category on which the computed values are based. If you also specify the OBSCAT option, then the observationwise statistics are computed only for the observed response category, as indicated by the value of the _LEVEL_ variable.
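The multinomial case might look like the following sketch, which assumes a hypothetical data set work.survey whose response variable rating has more than two levels. Only XBETA and PREDICTED are requested, because other statistics would be missing, and OBSCAT is placed after the slash.

```sas
/* Hypothetical input: work.survey with a multilevel response */
/* rating, predictors age and channel, and key respondentID.  */
proc hpgenselect data=work.survey;
   class channel;
   model rating = age channel / dist=multinomial;
   id respondentID;
   output out=work.surveyOut
          xbeta=linPred
          predicted=prob
          / obscat;      /* keep only the observed response category */
run;
```

Without OBSCAT, the statistics would be computed for every response category, with _LEVEL_ identifying the category in each case. With OBSCAT, they are computed only for the observed category.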
For observations in which only the response variable is missing, values of the XBETA and PREDICTED statistics are computed even though these observations do not affect the model fit. For zero-inflated models, ZBETA and PZERO are also computed. This practice enables predicted mean values or predicted probabilities to be computed for new observations.
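One way to use this for scoring, sketched under the assumption that new observations are held in a hypothetical data set work.newPolicies, is to append them to the training data with the response set to missing and then fit the model on the combined data. All data set and variable names here are illustrative.

```sas
/* Stack training data and new observations; new rows get a   */
/* missing response, so they do not affect the model fit.     */
data work.toScore;
   set work.claims work.newPolicies(in=isNew);
   if isNew then nClaims = .;
run;

proc hpgenselect data=work.toScore;
   class region;
   model nClaims = age region / dist=poisson;
   id policyID;
   output out=work.scored xbeta=linPred predicted=mu;
run;
```

In this sketch the rows that come from work.newPolicies receive values of linPred and mu in work.scored even though they contribute nothing to the parameter estimates.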
You can specify the following syntax elements in the OUTPUT statement before the slash (/).
You can specify the following options in the OUTPUT statement after the slash (/):