By default, the predictors and the responses are centered and scaled to have mean 0 and standard deviation 1. Centering the
predictors and the responses ensures that the criterion for choosing successive factors is based on how much variation they explain in either the predictors or the responses or in both. (For more information about how different methods explain
variation, see the section Regression Methods.) Without centering, both the mean variable value and the variation around that mean are involved in selecting factors. Scaling
serves to place all predictors and responses on an equal footing relative to their variation in the data. For example, if Time and Temp are two of the predictors, then scaling says that a change of $\operatorname{std}(\text{Time})$ in Time is approximately equivalent to a change of $\operatorname{std}(\text{Temp})$ in Temp.
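The effect of default centering and scaling can be illustrated with a small Python sketch. This is not how PROC HPPLS is implemented; the data values, the `standardize` helper, and the use of the sample (n - 1) standard deviation are all assumptions made for illustration.

```python
import statistics

def standardize(values):
    """Center to mean 0 and scale to standard deviation 1.

    Assumption: the sample standard deviation (n - 1 divisor) is used.
    """
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical predictors measured on very different scales.
time = [10.0, 20.0, 30.0, 40.0, 50.0]
temp = [100.0, 101.0, 102.0, 103.0, 104.0]

z_time = standardize(time)
z_temp = standardize(temp)

# Moving Time by one standard deviation moves its scaled value by one unit.
s_time = statistics.stdev(time)
unit_step = (time[0] + s_time - statistics.mean(time)) / s_time - z_time[0]

print([round(z, 4) for z in z_time])
print([round(z, 4) for z in z_temp])
print(round(unit_step, 6))
```

Although the raw spread of Time is ten times that of Temp in this made-up data, the two scaled variables are numerically the same, so neither dominates purely because of its units.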
Usually, both the predictors and responses should be centered and scaled. However, if their values already represent variation around a nominal or target value, then you can use the NOCENTER option in the PROC HPPLS statement to suppress centering. Likewise, if the predictors or responses are already all on comparable scales, then you can use the NOSCALE option to suppress scaling.
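The following Python sketch suggests what suppressing centering or scaling means for a single variable. The `preprocess` helper and its `center` and `scale` flags are hypothetical stand-ins that only loosely mirror NOCENTER and NOSCALE. How PROC HPPLS computes the scale when centering is suppressed is not stated here, so the sketch simply divides by the sample standard deviation.

```python
import statistics

def preprocess(values, center=True, scale=True):
    """Hypothetical helper: optionally center and/or scale one variable.

    center=False loosely mirrors NOCENTER; scale=False loosely mirrors NOSCALE.
    Assumption: the scale is the sample standard deviation about the mean.
    """
    m = statistics.mean(values) if center else 0.0
    s = statistics.stdev(values) if scale else 1.0
    return [(v - m) / s for v in values]

# Hypothetical measurements already recorded as deviations from a target of 0.
deviation = [0.5, 1.0, 1.5, 2.0]

centered = preprocess(deviation)                    # default: offset from target is removed
uncentered = preprocess(deviation, center=False)    # offset from target is kept
raw = preprocess(deviation, center=False, scale=False)  # values pass through unchanged

print(round(statistics.mean(centered), 6))
print(round(statistics.mean(uncentered), 6))
print(raw)
```

In this made-up data every value lies above the target, and only the uncentered versions retain that information in their mean.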
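If the predictors involve crossproduct terms, PROC HPPLS does not standardize the variables before it standardizes the crossproduct. That is, if the ith values of two predictors are denoted $x_i^1$ and $x_i^2$, then the default standardized ith value of the crossproduct is
$$\frac{x_i^1 x_i^2 - \operatorname{mean}_j\left(x_j^1 x_j^2\right)}{\operatorname{std}_j\left(x_j^1 x_j^2\right)}$$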
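The distinction between standardizing a crossproduct of raw variables and multiplying two standardized variables can be seen in a short Python sketch. The data, the `standardize` helper, and the choice of the sample (n - 1) standard deviation are assumptions for illustration, not a description of the procedure's internals.

```python
import statistics

def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1 (assumed n - 1 divisor)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical values of two predictors.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 2.0, 5.0]

# Default described in the text: form the raw crossproduct, then standardize it.
product_then_std = standardize([a * b for a, b in zip(x1, x2)])

# Alternative (not the default): standardize each variable, then multiply.
z1, z2 = standardize(x1), standardize(x2)
std_then_product = [a * b for a, b in zip(z1, z2)]

print([round(v, 4) for v in product_then_std])
print([round(v, 4) for v in std_then_product])
```

The first result has mean 0 and standard deviation 1 by construction. The second generally has neither property, because the product of two standardized variables is not itself standardized.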
When test set validation is performed for the number of effects, some practitioners disagree as to whether the training data should be retransformed. By default, PROC HPPLS does retransform the training data, but you can suppress this behavior by specifying the NOCVSTDIZE option in the PROC HPPLS statement.