The QUANTSELECT Procedure

OUTPUT Statement

  OUTPUT <OUT=SAS-data-set> <keyword <=name>> … <keyword <=name>>;

The OUTPUT statement creates a new SAS data set that saves diagnostic measures that are calculated for the selected model. If you do not specify a keyword, then the only diagnostic included is the predicted response.

All the variables in the original data set are included in the new data set, along with variables that are created by the keyword options in the OUTPUT statement. These new variables contain the values of a variety of statistics and diagnostic measures that are calculated for each observation in the data set.
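The following is a minimal sketch that assumes a simulated data set named work.sample, with response y and predictors x1, x2, and x3. All of these names are hypothetical. The OUTPUT statement specifies no keyword, so the output data set should contain the original variables plus only the predicted response. Under the default naming rule described for keyword <=name>, that variable would be called p_y. Running PROC CONTENTS confirms the actual name on your release.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* No keyword in OUTPUT: only the predicted response is added */
proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.5;
   output out=work.pred_only;
run;

/* Lists y, x1-x3, and the single added predicted-value variable */
proc contents data=work.pred_only varnum;
run;
```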

The OUTPUT data set is created in column-wise form, so the variable _QUANTILE_ is not needed. For each applicable keyword in the OUTPUT statement, one variable is generated for each specified quantile level. These variables appear in the sorted order of the specified quantile levels.
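The sketch below shows the column-wise layout. It again uses the hypothetical data set work.sample and requests three quantile levels. The P keyword should then produce three predicted-value variables, one per level. The exact generated names are not asserted here, so inspect them with PROC CONTENTS.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* Three quantile levels: one predicted-value column per level */
proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.25 0.5 0.75;
   output out=work.wide p;
run;

/* VARNUM lists variables in creation order, so the per-quantile
   columns should appear in the sorted order of the quantile levels */
proc contents data=work.wide varnum;
run;
```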

If you specify a BY statement, then a variable _BY_ that indexes the BY groups is included. For each observation, the value of _BY_ is the index of the BY group to which this observation belongs. This variable is useful for matching BY groups with macro variables that PROC QUANTSELECT creates. See the section Macro Variables That Contain Selected Models for more information.
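The following sketch assumes a hypothetical grouping variable g with values A and B. It fits one model per BY group. A PROC FREQ listing of g by _BY_ then shows which index each BY group received. The data are sorted first because BY processing requires sorted input.

```sas
/* Hypothetical simulated data with a BY variable g */
data work.sample;
   call streaminit(1);
   do g = 'A', 'B';
      do i = 1 to 250;
         x1 = rand('normal');
         x2 = rand('normal');
         x3 = rand('normal');
         y  = 1 + 2*x1 - x2 + rand('normal');
         output;
      end;
   end;
   drop i;
run;

proc sort data=work.sample;
   by g;
run;

proc quantselect data=work.sample;
   by g;
   model y = x1 x2 x3 / quantile=0.5;
   output out=work.by_out p;
run;

/* One row per (g, _BY_) combination: maps each group to its index */
proc freq data=work.by_out;
   tables g*_BY_ / list;
run;
```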

If you have partitioned the input data with a PARTITION statement, then a character variable _ROLE_ is included in the output data set. The following table shows the value of _ROLE_ for each observation:

_ROLE_ Value    Observation Role
TEST            Testing
TRAIN           Training
VALIDATE        Validation
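As a sketch that uses the hypothetical data set work.sample, the following program randomly assigns about 30% of the observations to validation and 20% to testing. It then tabulates _ROLE_ in the output data set. Because the split is random, the counts will only approximate these fractions.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* Random partition; remaining observations are used for training */
proc quantselect data=work.sample;
   partition fraction(validate=0.3 test=0.2);
   model y = x1 x2 x3 / quantile=0.5;
   output out=work.part_out p;
run;

/* Counts of TEST, TRAIN, and VALIDATE observations */
proc freq data=work.part_out;
   tables _ROLE_;
run;
```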

If you want to create a permanent SAS data set, you must specify a two-level name. For more information about permanent SAS data sets, see the discussion in SAS Language Reference: Concepts.
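The sketch below writes the output data set to a permanent library. The libref qs and the folder path are placeholders, so replace the path with a directory that you can write to. The two-level name qs.qs_scores is what makes the data set permanent.

```sas
/* Hypothetical library location: replace with a writable folder */
libname qs '/path/to/project';

/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* Two-level name libref.member creates a permanent data set */
proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.5;
   output out=qs.qs_scores p r;
run;
```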

You can specify the following arguments in the OUTPUT statement:

keyword <=name>

specifies the statistics to include in the output data set and optionally names the new variables that contain the statistics. Specify one of the following keywords for each desired statistic, optionally followed by an equal sign and the name of a variable to contain the statistic. If you specify keyword=name, the new variable that contains the requested statistic has the specified name. If you omit the optional =name after a keyword, then the new variable name is formed from a prefix of one or more characters that identifies the statistic, followed by an underscore (_), followed by the dependent variable name.
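The sketch below mixes the two naming styles. It uses the hypothetical data set work.sample with dependent variable y. P is given without a name, so the default rule suggests a variable named p_y. R=resid assigns the residual variable an explicit name. A single quantile level is used so that the naming rule applies directly.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.5;
   /* P: default name (prefix p, underscore, dependent variable y)
      R=resid: user-chosen name for the residual variable */
   output out=work.named p r=resid;
run;

proc contents data=work.named varnum;
run;
```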

PREDICTED | PRED | P

includes predicted values in the output data set. The prefix for the default name is p.

QUANTLEVEL | QL

includes observation quantile levels in the output data set. The prefix for the default name is ql. The QL= option is available only when you specify QUANTILE=PROCESS in the MODEL statement. For more information about observation quantile levels, see the section Observation Quantile Level.

RESIDUAL | RESID | R

includes residuals, calculated as ACTUAL – PREDICTED, in the output data set. The prefix for the default name is r.
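As a sketch that uses the hypothetical data set work.sample, the following program requests predicted values and residuals under explicit names. A DATA _NULL_ step then checks the documented relationship RESIDUAL = ACTUAL - PREDICTED by logging the largest absolute discrepancy. That value should be zero or negligibly small.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.5;
   output out=work.diag p=yhat r=resid;
run;

/* Verify resid = y - yhat for every observation */
data _null_;
   set work.diag end=last;
   retain maxdiff 0;
   maxdiff = max(maxdiff, abs(resid - (y - yhat)));
   if last then put 'Largest |RESID - (Y - YHAT)|: ' maxdiff;
run;
```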

OUT=SAS-data-set

names the output data set. By default, PROC QUANTSELECT uses the DATAn convention to name the new data set.
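The following sketch omits OUT=, so the output data set receives a DATAn name. It uses the hypothetical data set work.sample. The SYSLAST automatic macro variable and the _LAST_ data set reference both point to the most recently created data set. The sketch assumes that this is the OUTPUT data set, which holds when the procedure creates no other data set.

```sas
/* Hypothetical simulated data: response y, predictors x1-x3 */
data work.sample;
   call streaminit(1);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = 1 + 2*x1 - x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* No OUT= option: SAS assigns a name such as WORK.DATA1 */
proc quantselect data=work.sample;
   model y = x1 x2 x3 / quantile=0.5;
   output p r;
run;

/* Report the generated name and print the first few rows */
%put NOTE: Last data set created: &syslast;
proc print data=_last_(obs=5);
run;
```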