The OUTPUT statement creates a data set that contains observationwise statistics, which are computed after PROC HPPRINCOMP
fits the model. If you do not specify a keyword, then only the principal component scores are included.
The OUTPUT statement causes the OUT= option in the PROC HPPRINCOMP statement to be ignored.
The variables in the input data set are not included in the output data set, in order to avoid data duplication for large data sets; however, variables that you specify in the ID statement are included. If the input data are in distributed form, in which accessing data in a particular order cannot be guaranteed, the HPPRINCOMP procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.
You can specify the following syntax elements:
-
OUT=SAS-data-set
DATA=SAS-data-set
-
specifies the name of the output data set. If you omit this option, the procedure uses the DATAn convention to name the output data set.
-
keyword <=prefix>
-
specifies a statistic to include in the output data set and optionally a prefix for naming the output variables. If you do not provide a prefix, the HPPRINCOMP procedure assigns a default prefix based on the type of statistic requested. For example, for the VAR variables x1 and x2, RESIDUAL produces two residual value variables, R_x1 and R_x2.
You can specify the following keywords to add statistics to the OUTPUT data set:
-
H
-
requests the approximate leverage. The default prefix is H.
-
STD
-
requests standardized (centered and scaled) VAR variable values for each VAR variable. The default prefix is Std.
-
STDSSE
-
requests the sum of squares of residuals for standardized VAR variables. The default prefix is StdSSE.
-
TSQUARE
T2
-
requests the scaled sum of squares of the score values. The default prefix is TSquare.
-
RESIDUAL
RESID
R
-
requests residuals for each VAR variable. The default prefix is R.
-
SCORE
-
requests principal component scores for each principal component. The default prefix is Score.
If you specify METHOD=EIG, the only valid keywords are RESIDUAL (if you also specify the PARTIAL statement) and SCORE. Other keywords are ignored.
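The METHOD=EIG restriction can be illustrated with a minimal sketch. The data set names (work.mydata, work.eig_out) and the variables x1, x2, x3, and z1 are hypothetical and chosen only for illustration.

```sas
/* Sketch only: work.mydata, x1-x3, and z1 are hypothetical names. */
proc hpprincomp data=work.mydata method=eig;
   var x1 x2 x3;
   partial z1;                            /* the PARTIAL statement is what makes RESIDUAL valid here */
   output out=work.eig_out score residual; /* any other keyword would be ignored under METHOD=EIG */
run;
```

Because no prefixes are given, the procedure would use the default prefixes Score and R for the output variables.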
The output variables that contain the requested statistic are named as follows, according to the keyword that you specify:
-
The keywords RESIDUAL and STD define an output variable for each VAR variable, so the variables that correspond to each VAR variable are named by appending the name of the VAR variable to the prefix. For example, if the model has the VAR variables x1 and x2, then RESIDUAL=R produces the variables R_x1 and R_x2.
-
The keyword SCORE defines an output variable for each principal component, so the variables that correspond to each successive component are named by appending the component number to the prefix. For example, if the model has three principal components, then SCORE=T produces the variables T1, T2, and T3.
-
The keywords H, STDSSE, and TSQUARE each define a single output variable, so the variable name matches the prefix.
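The naming rules above can be seen together in a short sketch. It assumes a hypothetical input data set work.mydata that contains the VAR variables x1, x2, and x3 and an identifier variable obsid. The prefixes PC, R, and Lev are arbitrary choices. METHOD=NIPALS is specified only so that keywords other than SCORE and RESIDUAL are not ignored.

```sas
/* Sketch only: work.mydata, obsid, and the prefixes PC, R, and Lev are hypothetical. */
proc hpprincomp data=work.mydata method=nipals n=2;
   var x1 x2 x3;
   id obsid;                       /* ID variables are copied to the output data set */
   output out=work.pc_out
          score=PC                 /* one variable per component: PC1, PC2 */
          residual=R               /* one variable per VAR variable: R_x1, R_x2, R_x3 */
          h=Lev                    /* single variable named by the prefix: Lev */
          tsquare;                 /* no prefix given, so the default name TSquare is used */
run;
```

Under the rules described in this section, work.pc_out would contain obsid together with PC1, PC2, R_x1, R_x2, R_x3, Lev, and TSquare, but not the original x1, x2, and x3.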