The LOESS Procedure

OUTPUT Statement

  • OUTPUT <OUT=SAS-data-set> <keyword <=name>> <…keyword <=name>> </ options>;

The OUTPUT statement creates a new SAS data set that contains the predicted values and any other statistics that you request. These statistics are calculated after the models for all smoothing parameter values specified in the SMOOTH= option in the MODEL statement have been fit. If you do not specify a keyword, only the predicted response is included.

All the variables in the original data set are included in the new data set, along with variables created by the OUTPUT statement. These new variables contain the predicted values and a variety of other statistics that are calculated for each observation in the data set.

If you want to create a SAS data set in a permanent library, you must specify a two-level name. For more information about permanent libraries and SAS data sets, see SAS Language Reference: Concepts.
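As a minimal sketch of the default behavior described above, the following step fits a single loess model and writes an output data set without naming any keyword, so only the predicted response should be added to the input variables. It assumes that the sample data set Sashelp.Class is available; the output name Work.LoessOut is hypothetical.

```sas
/* Assumption: the sample data set Sashelp.Class is installed. */
proc loess data=sashelp.class;
   model Weight = Height / smooth=0.5;

   /* No keyword is given, so only the predicted response is   */
   /* added. Work.LoessOut is a hypothetical, temporary name.  */
   output out=work.LoessOut;
run;

/* List the variables in the new data set. */
proc contents data=work.LoessOut;
run;
```

To keep the results in a permanent library, replace Work.LoessOut with a two-level name whose first level is a libref that you have already assigned with a LIBNAME statement.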

You can specify the following options in the OUTPUT statement:

OUT=SAS-data-set

specifies the name of the new data set. By default, the procedure uses the DATAn convention to name the new data set.

keyword <=name>

specifies the statistics to include in the output data set as new variables and optionally names the new variables. Specify a keyword for each desired statistic (see the following list of keywords), optionally followed by an equal sign and the name of a variable to contain the statistic.

The new variables are named as follows: If you specify keyword=name, the new variable has the specified name. If you omit the optional =name after a keyword, the new variable name is formed from a default character string that identifies the statistic. In either case, if you also specify the ROWWISE option after a slash and you specify more than one dependent variable or smoothing value in the MODEL statement, an order number is appended to the variable name. For details, see the ROWWISE option.

The keywords allowed and the statistics they represent are as follows:

PREDICTED | P

creates a new variable that contains predicted values. The default name is Predicted.

RESIDUAL | R

creates a new variable that contains residual values, which are calculated as ACTUAL – PREDICTED. The default name is Residual.

STD

creates a new variable that contains standard errors of the mean predicted values. Specifying this keyword implicitly selects the MODEL statement option DFMETHOD=EXACT, even if you have not specified the DFMETHOD= option. The default name is StdErr.

T

creates a new variable that contains t statistics. Specifying this keyword implicitly selects the MODEL statement option DFMETHOD=EXACT, even if you have not specified the DFMETHOD= option. The default name is tValue.

LCLM

creates a new variable that contains the lower limit of the $100(1-\alpha)$% confidence interval for the mean predicted value. By default, 95% limits are computed; use the ALPHA= option in the MODEL statement to change the significance level. Specifying this keyword implicitly selects the MODEL statement option DFMETHOD=EXACT, even if you have not specified the DFMETHOD= option. The default name is LowerCL.

UCLM

creates a new variable that contains the upper limit of the $100(1-\alpha)$% confidence interval for the mean predicted value. By default, 95% limits are computed; use the ALPHA= option in the MODEL statement to change the significance level. Specifying this keyword implicitly selects the MODEL statement option DFMETHOD=EXACT, even if you have not specified the DFMETHOD= option. The default name is UpperCL.
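The keywords above can be combined in one OUTPUT statement. The following sketch mixes custom names (keyword=name) with a default name and sets ALPHA=0.1 in the MODEL statement, so the limits should be 90% rather than 95% limits. It assumes that Sashelp.Class is available; the data set name Work.LoessStats and the variable names Fit, Lower90, and Upper90 are hypothetical.

```sas
/* Assumption: the sample data set Sashelp.Class is installed. */
proc loess data=sashelp.class;
   /* ALPHA=0.1 requests 100(1 - 0.1)% = 90% confidence limits. */
   model Weight = Height / smooth=0.5 alpha=0.1;

   output out=work.LoessStats
          predicted=Fit       /* custom name: Fit               */
          residual            /* no =name: default is Residual  */
          lclm=Lower90        /* lower confidence limit         */
          uclm=Upper90;       /* upper confidence limit         */
run;

/* Inspect the first few observations of the result. */
proc print data=work.LoessStats(obs=5);
   var Height Weight Fit Residual Lower90 Upper90;
run;
```

Because LCLM and UCLM are requested, the procedure is expected to use DFMETHOD=EXACT for this fit even though the MODEL statement does not specify the DFMETHOD= option.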

You can specify the following options in the OUTPUT statement after a slash (/):

ALL

requests all these keywords: PREDICTED, RESIDUAL, STD, T, LCLM, and UCLM.

ROWWISE | ROW

arranges the created OUTPUT data set in rowwise format. For each dependent variable and each smoothing value specified in the SMOOTH= option in the MODEL statement, one variable is generated for each specified keyword. If the requested statistic occurs more than once, an order number is appended to the variable name. The variables appear in an order that corresponds to the specified order of the dependent variables and the smoothing values in the MODEL statement. For each variable generated, a label is also created automatically; the label contains the default name of the represented statistic, the name of the dependent variable selected to be modeled, and the smoothing value used for calculating the represented statistic.

By default, the OUTPUT data set is created in columnwise format, where the input data is repeated for each dependent variable and for each smoothing value. Three extra columns, named SmoothingParameter for smoothing parameter values, DepVar for dependent variable names, and Obs for observation numbers, are also added to the OUTPUT data set to distinguish each model.
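The difference between the two layouts is easiest to see by running the same model twice. The sketch below fits two smoothing values and writes one data set in the default columnwise format and one in rowwise format. It assumes that Sashelp.Class is available; the data set names Work.LoessLong and Work.LoessWide are hypothetical.

```sas
/* Assumption: the sample data set Sashelp.Class is installed. */

/* Columnwise (default): the input rows are repeated once per  */
/* smoothing value, and the SmoothingParameter, DepVar, and    */
/* Obs columns identify the model behind each row.             */
proc loess data=sashelp.class;
   model Weight = Height / smooth=0.4 0.8;
   output out=work.LoessLong p r;
run;

/* Rowwise: one row per input observation, with a separate     */
/* numbered variable per keyword for each smoothing value.     */
proc loess data=sashelp.class;
   model Weight = Height / smooth=0.4 0.8;
   output out=work.LoessWide p r / rowwise;
run;

/* Compare the variable lists and row counts of both layouts.  */
proc contents data=work.LoessLong;
run;
proc contents data=work.LoessWide;
run;
```

With two smoothing values, the columnwise data set should contain twice as many rows as the input data, whereas the rowwise data set should keep the input row count and carry the extra results in additional variables. The automatically generated labels in the rowwise data set record which dependent variable and smoothing value each variable belongs to.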