The HPPRINCOMP Procedure

OUTPUT Statement

  OUTPUT <OUT=SAS-data-set>
    <keyword <=prefix>>…<keyword <=prefix>>;

The OUTPUT statement creates a data set that contains observationwise statistics, which are computed after PROC HPPRINCOMP fits the model. If you do not specify a keyword, then only the principal component scores are included.

The OUTPUT statement causes the OUT= option in the PROC HPPRINCOMP statement to be ignored.

The variables in the input data set are not included in the output data set, in order to avoid data duplication for large data sets; however, variables that you specify in the ID statement are included. If the input data are in distributed form, in which accessing data in a particular order cannot be guaranteed, the HPPRINCOMP procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.
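The following is a minimal sketch of carrying an identifier into the output data set and joining the scores back to the input. The data set names (Work.Sample, Work.Scores, Work.Joined), the variables x1 through x3, and the identifier obs_id are hypothetical and chosen only for illustration.

```sas
/* Assumption: Work.Sample holds numeric variables x1-x3 and a unique key obs_id. */
proc hpprincomp data=Work.Sample;
   var x1 x2 x3;
   id obs_id;                 /* copied to the output data set */
   output out=Work.Scores;    /* no keyword: only the component scores are written */
run;

/* Match-merge the scores with the original variables by the ID variable. */
proc sort data=Work.Sample out=Work.SampleSorted;
   by obs_id;
run;

proc sort data=Work.Scores out=Work.ScoresSorted;
   by obs_id;
run;

data Work.Joined;
   merge Work.SampleSorted Work.ScoresSorted;
   by obs_id;
run;
```

Sorting both data sets before the merge avoids relying on the order of observations in the output, which matters when the order cannot be guaranteed.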

You can specify the following syntax elements:

OUT=SAS-data-set
DATA=SAS-data-set

specifies the name of the output data set. If you omit this option, the procedure uses the DATAn convention to name the output data set.

keyword <=prefix>

specifies a statistic to include in the output data set and optionally a prefix for naming the output variables. If you do not provide a prefix, the HPPRINCOMP procedure assigns a default prefix based on the type of statistic requested. For example, for the VAR variables x1 and x2, RESIDUAL produces two residual value variables, R_x1 and R_x2.

You can specify the following keywords to add statistics to the OUTPUT data set:

H

requests the approximate leverage. The default prefix is H.

STD

requests standardized (centered and scaled) VAR variable values for each VAR variable. The default prefix is Std.

STDSSE

requests the sum of squares of residuals for standardized VAR variables. The default prefix is StdSSE.

TSQUARE
T2

requests the scaled sum of squares of score values. The default prefix is TSquare.

RESIDUAL
RESID
R

requests residuals for each VAR variable. The default prefix is R.

SCORE

requests principal component scores for each principal component. The default prefix is Score.

If you specify METHOD=EIG, the only valid keywords are RESIDUAL (if you also specify the PARTIAL statement) and SCORE. Other keywords are ignored.
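As an illustration of this restriction, the sketch below requests scores and residuals under METHOD=EIG together with a PARTIAL statement. The data set Work.Sample, the analysis variables x1 through x3, and the partialled-out variable z are hypothetical.

```sas
/* Assumption: Work.Sample contains numeric variables x1-x3 and z. */
proc hpprincomp data=Work.Sample method=eig;
   var x1 x2 x3;
   partial z;                           /* makes RESIDUAL a valid keyword under METHOD=EIG */
   output out=Work.EigOut score residual;
run;
```

Adding a keyword such as H or TSQUARE to this OUTPUT statement would, according to the rule above, have no effect.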

The output variables that contain the requested statistic are named as follows, according to the keyword that you specify:

  • The keywords RESIDUAL and STD define an output variable for each VAR variable, so the variables that correspond to each VAR variable are named by appending the name of the VAR variable to the prefix. For example, if the model has the VAR variables x1 and x2, then RESIDUAL=R produces the variables R_x1 and R_x2.

  • The keyword SCORE defines an output variable for each principal component, so the variables that correspond to each successive component are named by appending the component number to the prefix. For example, if the model has three principal components, then SCORE=T produces the variables T1, T2, and T3.

  • The keywords H, STDSSE, and TSQUARE each define a single output variable, so the variable name matches the prefix.
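The naming rules above can be sketched with one OUTPUT statement that mixes explicit and default prefixes. The data set names, the variables x1 through x3, and the choice of METHOD=NIPALS with two components are assumptions made for this example.

```sas
/* Assumption: Work.Sample contains numeric variables x1, x2, and x3. */
proc hpprincomp data=Work.Sample method=nipals n=2;
   var x1 x2 x3;
   output out=Work.Stats
          score=T        /* one variable per component: T1, T2        */
          residual=R     /* one variable per VAR variable: R_x1, ...  */
          h              /* single variable, default prefix H         */
          tsquare=TSq;   /* single variable named by the prefix: TSq  */
run;
```

Under those rules, Work.Stats would be expected to contain T1, T2, R_x1, R_x2, R_x3, H, and TSq. The input variables x1 through x3 are not copied unless they are listed in an ID statement.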