The OUTPUT statement creates a data set that contains observationwise statistics, which are computed after fitting the model.
If you do not specify any keyword, then only the predicted values for responses are included.
The variables in the input data set are not included in the output data set in order to avoid data duplication for large data sets; however, variables specified in the ID statement are included. If the input data are in distributed form, where accessing data in a particular order cannot be guaranteed,
the HPPLS procedure copies the distribution or partition key to the output data set so that its contents can be joined with
the input data.
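As a minimal sketch of the behavior described above, the following steps request the default output and then join it back to the input by an ID variable. The data set names (WORK.SAMPLE, WORK.PLS_OUT, WORK.JOINED), the key variable obsid, and the model variables are hypothetical and chosen only for illustration.

```sas
/* Assumed input: WORK.SAMPLE with responses y1 y2, predictors x1-x3, */
/* and a unique key variable obsid (all hypothetical names).          */
proc hppls data=work.sample;
   id obsid;                    /* copied to the output data set      */
   model y1 y2 = x1 x2 x3;
   output out=work.pls_out;     /* no keyword: predicted values only  */
run;

/* Sort copies of both data sets by the key before merging. */
proc sort data=work.sample  out=work.sample_sorted; by obsid; run;
proc sort data=work.pls_out out=work.pls_sorted;    by obsid; run;

/* Join the predicted values back to the original variables. */
data work.joined;
   merge work.sample_sorted work.pls_sorted;
   by obsid;
run;
```

Under these assumptions, WORK.PLS_OUT would hold obsid together with the predicted value variables, and the merge restores the input variables alongside them.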
You can specify the following syntax elements in the OUTPUT statement:
-
OUT=SAS-data-set
DATA=SAS-data-set
-
specifies the name of the output data set. If the OUT= (or DATA=) option is omitted, the procedure uses the DATAn convention to name the output data set.
-
keyword <=prefix>
-
specifies a statistic to include in the output data set and optionally a prefix for naming the output variables. If you do not provide a prefix, the HPPLS procedure assigns a default prefix based on the type of statistic requested. For example, for response variables y1 and y2, a specification of PREDICTED produces two predicted value variables, Pred_y1 and Pred_y2.
You can specify the following keywords for adding statistics to the OUTPUT data set:
-
H
-
requests the approximate leverage. The default prefix is H.
-
PREDICTED
PRED
P
-
requests predicted values for each response. The default prefix is Pred.
-
PRESS
-
requests approximate predicted residuals for each response. The default prefix is PRESS.
-
ROLE
-
requests numeric values that indicate the role played by each observation in fitting the model. The default prefix is _ROLE_. Table 12.2 shows the interpretation of this variable for each observation.
Table 12.2: Role Interpretation
Value | Observation Role
0 | Not used
1 | Training
2 | Testing
If you do not partition the input data by using a PARTITION statement, then the role variable value is 1 for observations that are used in fitting the model, and 0 for observations
that have at least one missing or invalid value for the responses or predictors.
-
STDX
-
requests standardized (centered and scaled) predictor values for each predictor. The default prefix is StdX.
-
STDXSSE
-
requests the sum of squares of residuals for standardized predictors. The default prefix is StdXSSE.
-
STDY
-
requests standardized (centered and scaled) response values for each response. The default prefix is StdY.
-
STDYSSE
-
requests the sum of squares of residuals for standardized responses. The default prefix is StdYSSE.
-
TSQUARE
T2
-
requests scaled sum of squares of score values. The default prefix is TSquare.
-
XRESIDUAL
XRESID
XR
-
requests residuals for each predictor. The default prefix is XResid.
-
XSCORE
-
requests extracted factors (X-scores, latent vectors, latent variables, T) for each selected model factor. The default prefix is XScore.
-
YRESIDUAL
YRESID
YR
-
requests residuals for each response. The default prefix is YResid.
-
YSCORE
-
requests extracted responses (Y-scores, U) for each selected model factor. The default prefix is YScore.
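The ROLE keyword in the list above can be inspected with a short sketch like the following. It assumes a hypothetical input data set WORK.SAMPLE with responses y1 and y2 and predictors x1 through x3, and it uses no PARTITION statement, so the role variable should take only the values 1 (used in fitting) and 0 (not used).

```sas
/* Hypothetical data set and variable names; no PARTITION statement. */
proc hppls data=work.sample;
   model y1 y2 = x1 x2 x3;
   output out=work.pls_role predicted role;  /* default prefixes */
run;

/* Count observations by role: 1 = used in fitting, 0 = not used. */
proc freq data=work.pls_role;
   tables _ROLE_;
run;
```

Because ROLE defines a single output variable, the default prefix _ROLE_ is also the variable name that the PROC FREQ step refers to.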
According to the keyword specified, the output variables that contain the requested statistic are named as follows:
-
The keywords XRESIDUAL and STDX define an output variable for each predictor, so the variables that correspond to each predictor are named
by appending a number (which starts from 1) to the prefix. For each defined variable, a label is also generated automatically;
the label contains the prefix of the variable and the name of the predictor. For example, if the model has three predictors,
then a specification of XRESIDUAL=XR produces the variables XR1, XR2, and XR3.
-
The keywords PREDICTED, YRESIDUAL, STDY, and PRESS define an output variable for each response, so the variables that correspond to each
response are named by appending the name of the response variable to the prefix. For example, if the model has response variables
y1 and y2, then a specification of PREDICTED=P produces the variables P_y1 and P_y2.
-
The keywords XSCORE and YSCORE define an output variable for each selected model factor, so the variables that correspond to each successive
factor are named by appending the factor number to the prefix. For example, if the model has three selected factors, then
a specification of XSCORE=T produces the variables T1, T2, and T3.
-
The keywords H, TSQUARE, STDXSSE, STDYSSE, and ROLE each define a single output variable, so the variable name matches the prefix.
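
The naming rules above can be seen together in one hedged sketch. The data set WORK.SAMPLE, the output name WORK.PLS_STATS, the prefix Lev, and the model variables are hypothetical. The NFAC=3 option is used here only on the assumption that all three extracted factors are also the selected factors. The comments give the variable names that the rules above imply, not output from an actual run.

```sas
/* Hypothetical data: responses y1 y2, predictors x1 x2 x3. */
proc hppls data=work.sample nfac=3;
   model y1 y2 = x1 x2 x3;
   output out=work.pls_stats
          predicted=P     /* per response:  P_y1, P_y2               */
          xresidual=XR    /* per predictor: XR1, XR2, XR3            */
          xscore=T        /* per factor:    T1, T2, T3               */
          h=Lev           /* single variable named Lev               */
          tsquare;        /* single variable, default prefix TSquare */
run;

/* List the variables that were actually created. */
proc contents data=work.pls_stats;
run;
```

Running PROC CONTENTS on the output data set is a simple way to confirm the generated names and the automatically generated labels.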