The ADAPTIVEREG Procedure (Experimental)

OUTPUT Statement

OUTPUT <OUT=SAS-data-set> <keyword <(keyword-options )> <=name>> …<keyword <(keyword-options )> <=name>> ;

The OUTPUT statement creates a new SAS data set to contain diagnostic measures that are calculated for the selected model. If you do not specify a keyword, then the only diagnostic included is the predicted response.
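As a minimal sketch of the default behavior, the following step specifies only OUT= and no keyword. The input data set Sashelp.Cars, the model variables, and the output data set name work.cars_out are assumptions chosen for illustration; they do not come from this section.

```sas
/* Illustrative sketch: input data, model variables, and output name are assumptions. */
proc adaptivereg data=sashelp.cars;
   model mpg_city = horsepower weight;
   /* No keyword is given, so only the predicted response is added. */
   output out=work.cars_out;
run;

/* Inspect the first few rows of the output data set. */
proc print data=work.cars_out(obs=5);
run;
```

Under these assumptions, work.cars_out should hold the original variables plus one predicted-value variable with the default name Pred.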

All the variables in the original data set are included in the new data set, along with the variables that the OUTPUT statement creates. These new variables contain the values of a variety of statistics and diagnostic measures that are calculated for each observation in the data set. If you specify a BY statement, then a variable _BY_ that indexes the BY groups is included. For each observation, the value of _BY_ is the index of the BY group to which this observation belongs.

If you have requested n-fold cross validation, then a variable _CVINDEX_ is included in the output data set. For each observation that is used for model training, the value of _CVINDEX_ is i if that observation is omitted in forming the ith subset of the training data. See the CVMETHOD= option for additional details. The value of _CVINDEX_ is 0 for all observations in the input data set that are not used for model training.
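The following sketch shows one way to inspect _CVINDEX_. It assumes that CVMETHOD= is specified as an option in the MODEL statement and that RANDOM(5) requests a random five-fold split; the data set and variable names are illustrative assumptions as well.

```sas
/* Illustrative sketch: placement and form of CVMETHOD= are assumptions. */
proc adaptivereg data=sashelp.cars;
   model mpg_city = horsepower weight / cvmethod=random(5);
   output out=work.cars_cv predicted=yhat;
run;

/* Count the observations that fall in each cross validation fold. */
proc freq data=work.cars_cv;
   tables _cvindex_;
run;
```

Observations that are not used for training, for example because of missing values, would appear with _CVINDEX_ equal to 0.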

If you have partitioned the input data by using a PARTITION statement, then a character variable _ROLE_ is included in the output data set. For each observation, the value of _ROLE_ indicates the role of that observation, as follows:

_ROLE_      Observation Role
TEST        Testing
TRAIN       Training
VALIDATE    Validation
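As a sketch of how _ROLE_ might be examined, the following step partitions the input data at random and then tabulates the roles. The FRACTION option values, the data set, and the variable names are assumptions made for illustration.

```sas
/* Illustrative sketch: fractions, data set, and names are assumptions. */
proc adaptivereg data=sashelp.cars;
   model mpg_city = horsepower weight;
   /* Reserve roughly 20% of the rows for testing and 20% for validation. */
   partition fraction(test=0.2 validate=0.2);
   output out=work.cars_part predicted=yhat;
run;

/* Tabulate how many observations received each role. */
proc freq data=work.cars_part;
   tables _role_;
run;
```

The frequency table should list the values TEST, TRAIN, and VALIDATE, with counts that vary from run to run unless a seed is fixed.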

If you want to create a permanent SAS data set, you must specify a two-level name. For more information about permanent SAS data sets, see SAS Language Reference: Concepts.
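For example, a two-level name combines a libref with a data set name. In this sketch the libref mylib and its directory path are hypothetical placeholders, and the input data and model are illustrative assumptions.

```sas
/* Hypothetical library path: replace it with a directory that exists on your system. */
libname mylib '/path/to/project/data';

proc adaptivereg data=sashelp.cars;
   model mpg_city = horsepower weight;
   /* mylib.cars_scored is a two-level name, so the data set persists after the session ends. */
   output out=mylib.cars_scored predicted=yhat;
run;
```

A one-level name such as cars_scored would instead normally be written to the temporary WORK library.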

Details about the specifications in the OUTPUT statement follow.

keyword <(keyword-options)><=name>

specifies the statistics to include in the output data set and optionally names the new variables that contain the statistics. For generalized linear models, you can use keyword-options to control which variant of a particular statistic is computed. You can specify the following keyword-options for the associated statistics:

ILINK

computes the prediction on the scale of the data, $\hat{\mu }=g^{-1}(\hat{\eta })$, where $g$ is the link function and $\hat{\eta }$ is the linear predictor.

RAW

requests the raw residual value $r=y-\hat{\mu }$.

PEARSON

requests the Pearson residual value $r=(y-\hat{\mu })/\sqrt {V(\hat{\mu })}$.

DEVIANCE

requests the deviance residual value $r=\mbox{sign}(y-\hat{\mu })\sqrt {\hat{d^2}}$, where $\hat{d^2}$ is the estimated contribution of the observation to the deviance.
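The keyword-options above can be sketched with a small simulated Poisson example. Everything in it is an assumption made for illustration: the simulated data, the DIST=POISSON model option, and the pairing of ILINK with PREDICTED and of PEARSON with RESIDUAL, which follows the keyword(keyword-options)=name pattern in the syntax line.

```sas
/* Simulate count data (illustrative assumption, not from this section). */
data work.counts;
   call streaminit(2024);
   do i = 1 to 200;
      x = rand('uniform') * 4;
      y = rand('poisson', exp(0.3 + 0.5*x));
      output;
   end;
   drop i;
run;

proc adaptivereg data=work.counts;
   model y = x / dist=poisson;
   /* ILINK: prediction on the data scale; PEARSON: Pearson residual. */
   output out=work.counts_out predicted(ilink)=mu_hat
                              residual(pearson)=pearson_res;
run;
```

With a log link, mu_hat would then hold predicted counts rather than values of the linear predictor.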

You can specify a keyword for each desired statistic (see the following list of keywords), optionally followed by an equal sign and the name of a variable to contain the statistic.

If you specify keyword=name, the new variable that contains the requested statistic has the specified name. If you omit the optional =name after a keyword, then the new variable takes the default name that is listed for that keyword.

You can specify the following keywords for the corresponding statistics:

PREDICTED | PRED | P

requests predicted values. The default name is Pred.

RESIDUAL | RESID | R

requests residuals, calculated as ACTUAL – PREDICTED. The default name is Resid.
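The following sketch combines an explicitly named keyword with one that relies on its default name. The input data set, model variables, and the names work.cars_named and yhat are illustrative assumptions.

```sas
/* Illustrative sketch: data set and variable names are assumptions. */
proc adaptivereg data=sashelp.cars;
   model mpg_city = horsepower weight;
   /* P=yhat stores predicted values in yhat; R has no =name, so it uses the default name. */
   output out=work.cars_named p=yhat r;
run;
```

Under these assumptions, work.cars_named should contain the predicted values in yhat and the residuals in a variable named Resid.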

OUT=SAS-data-set

specifies the name of the new data set to contain the diagnostic measures. If the OUT= option is omitted, the procedure uses the DATAn convention to name the output data set.