The HPPLS Procedure

OUTPUT Statement

  • OUTPUT <OUT=SAS-data-set> <keyword <=prefix>> … <keyword <=prefix>> ;

The OUTPUT statement creates a data set that contains observationwise statistics, which are computed after the model is fit. If you do not specify a keyword, only the predicted values for the responses are included.

The variables in the input data set are not included in the output data set in order to avoid data duplication for large data sets; however, variables specified in the ID statement are included. If the input data are in distributed form, where accessing data in a particular order cannot be guaranteed, the HPPLS procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.
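The following sketch illustrates the behavior described above. The data set names (work.sample, work.plsout, work.scored), the variables, and the simulated values are hypothetical and serve only as an illustration. Because no keyword is specified, the output data set is expected to hold only the predicted values plus the ID variable, which is then used to join the predictions back to the input data.

```sas
/* Hypothetical input data: two responses, three predictors, one ID variable */
data work.sample;
   call streaminit(1);
   do obsid = 1 to 100;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y1 = x1 + 0.5*x2 + rand('normal');
      y2 = x2 - x3 + rand('normal');
      output;
   end;
run;

proc hppls data=work.sample;
   model y1 y2 = x1 x2 x3;
   id obsid;                 /* ID variables are copied to the output data set */
   output out=work.plsout;   /* no keyword: predicted values only */
run;

/* Join the predictions back to the input data by the ID variable */
proc sql;
   create table work.scored as
   select s.*, p.Pred_y1, p.Pred_y2
   from work.sample as s
        inner join work.plsout as p
        on s.obsid = p.obsid;
quit;
```

An explicit join key matters here because the input variables are not carried into the output data set and the row order is not guaranteed for distributed data.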

You can specify the following syntax elements in the OUTPUT statement:

OUT=SAS-data-set
DATA=SAS-data-set

specifies the name of the output data set. If the OUT= (or DATA=) option is omitted, the procedure uses the DATAn convention to name the output data set.

keyword <=prefix>

specifies a statistic to include in the output data set and optionally a prefix for naming the output variables. If you do not provide a prefix, the HPPLS procedure assigns a default prefix based on the type of statistic requested. For example, for response variables y1 and y2, a specification of PREDICTED produces two predicted value variables Pred_y1 and Pred_y2.

You can specify the following keywords for adding statistics to the OUTPUT data set:

H

requests the approximate leverage. The default prefix is H.

PREDICTED
PRED
P

requests predicted values for each response. The default prefix is Pred.

PRESS

requests approximate predicted residuals for each response. The default prefix is PRESS.

ROLE

requests numeric values that indicate the role played by each observation in fitting the model. The default prefix is _ROLE_. Table 57.2 shows the interpretation of this variable for each observation.

Table 57.2: Role Interpretation

Value   Observation Role
0       Not used
1       Training
2       Testing


If you do not partition the input data by using a PARTITION statement, then the role variable value is 1 for observations that are used in fitting the model, and 0 for observations that have at least one missing or invalid value for the responses or predictors.
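As a minimal sketch of the unpartitioned case, the following steps request the ROLE keyword and tabulate the resulting values. The data set names (work.gaps, work.roles) and the simulated data are hypothetical; every tenth observation is given a missing predictor value so that, by the rule above, those observations would be expected to receive a role value of 0 and the rest a value of 1.

```sas
/* Hypothetical data with a missing predictor value in every 10th row */
data work.gaps;
   call streaminit(2);
   do obsid = 1 to 50;
      x1 = rand('normal');
      x2 = rand('normal');
      if mod(obsid, 10) = 0 then x2 = .;
      y1 = x1 + rand('normal');
      output;
   end;
run;

proc hppls data=work.gaps;
   model y1 = x1 x2;
   id obsid;
   output out=work.roles role;   /* no prefix given, so the variable is _ROLE_ */
run;

/* Count the observations that fall under each role value */
proc freq data=work.roles;
   tables _ROLE_;
run;
```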

STDX

requests standardized (centered and scaled) predictor values for each predictor. The default prefix is StdX.

STDXSSE

requests the sum of squares of residuals for standardized predictors. The default prefix is StdXSSE.

STDY

requests standardized (centered and scaled) response values for each response. The default prefix is StdY.

STDYSSE

requests the sum of squares of residuals for standardized responses. The default prefix is StdYSSE.

TSQUARE
T2

requests the scaled sum of squares of score values. The default prefix is TSquare.

XRESIDUAL
XRESID
XR

requests residuals for each predictor. The default prefix is XResid.

XSCORE

requests extracted factors (X-scores, latent vectors, latent variables, T) for each selected model factor. The default prefix is XScore.

YRESIDUAL
YRESID
YR

requests residuals for each response. The default prefix is YResid.

YSCORE

requests extracted responses (Y-scores, U) for each selected model factor. The default prefix is YScore.

Depending on the keyword specified, the output variables that contain the requested statistic are named as follows:

  • The keywords XRESIDUAL and STDX define an output variable for each predictor, so the variables that correspond to each predictor are named by appending a number (which starts from 1) to the prefix. For each defined variable, a label is also generated automatically; the label contains the prefix of the variable and the name of the predictor. For example, if the model has three predictors, then a specification of XRESIDUAL=XR produces the variables XR1, XR2, and XR3.

  • The keywords PREDICTED, YRESIDUAL, STDY, and PRESS define an output variable for each response, so the variables that correspond to each response are named by appending the name of the response variable to the prefix. For example, if the model has response variables y1 and y2, then a specification of PREDICTED=P produces the variables P_y1 and P_y2.

  • The keywords XSCORE and YSCORE define an output variable for each selected model factor, so the variables that correspond to each successive factor are named by appending the factor number to the prefix. For example, if the model has three selected factors, then a specification of XSCORE=T produces the variables T1, T2, and T3.

  • The keywords H, TSQUARE, STDXSSE, STDYSSE, and ROLE each define a single output variable, so the variable name matches the prefix.
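The naming rules above can be checked with a short sketch such as the following. The data set names (work.demo, work.stats), the prefixes, and the simulated data are hypothetical, and the choice of NFAC=2 without cross validation is an assumption made so that two factors are selected. The comments list the variable names that the rules above would produce; the PROC CONTENTS step lists the variables that are actually created.

```sas
/* Hypothetical input data: two responses and three predictors */
data work.demo;
   call streaminit(3);
   do i = 1 to 100;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y1 = x1 + x2 + rand('normal');
      y2 = x2 - x3 + rand('normal');
      output;
   end;
   drop i;
run;

proc hppls data=work.demo nfac=2;
   model y1 y2 = x1 x2 x3;
   output out=work.stats
          xresidual=XR   /* one per predictor: XR1, XR2, XR3       */
          predicted=P    /* one per response:  P_y1, P_y2          */
          xscore=T       /* one per selected factor: T1, T2        */
          h              /* single variable, default prefix: H     */
          tsquare;       /* single variable, default: TSquare      */
run;

/* List the output variables in creation order to confirm the names */
proc contents data=work.stats varnum;
run;
```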