The main features of the HPPLS procedure are as follows:
supports GLM and reference parameterization for classification effects
permits any degree of interaction effects that involve classification and continuous variables
supports partitioning of data into training and testing roles
supports test set validation to choose the number of extracted factors: the model is fit to one part of the available data (the training set), and the fit is evaluated on the remainder (the test set)
produces an output data set that contains predicted values and other observationwise statistics
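The test set validation idea above can be sketched numerically. The following is an illustrative NumPy version, not the HPPLS implementation: it partitions the data into training and testing roles, extracts factors (here, principal components of the centered training predictors), and picks the number of factors whose fit has the smallest test-set error. All function and variable names are hypothetical.

```python
# Illustrative sketch (assumed, not PROC HPPLS code): choose the number of
# extracted factors by fitting on a training set and scoring on a test set.
import numpy as np

def pcr_test_mse(X_train, y_train, X_test, y_test, max_factors):
    """Return the test-set MSE of PCR fits with 1..max_factors factors."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    # SVD of the centered training predictors; rows of Vt span the factors.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    mses = []
    for k in range(1, max_factors + 1):
        # Regression coefficients using the first k principal components.
        beta = Vt[:k].T @ ((U[:, :k].T @ yc) / s[:k])
        pred = (X_test - x_mean) @ beta + y_mean
        mses.append(np.mean((y_test - pred) ** 2))
    return mses

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)
# Partition the data into training and testing roles.
train, test = slice(0, 70), slice(70, 100)
mses = pcr_test_mse(X[train], y[train], X[test], y[test], max_factors=6)
best = 1 + int(np.argmin(mses))  # number of factors with smallest test MSE
```

In PROC HPPLS the same partitioning is requested declaratively (for example, with a PARTITION statement) rather than coded by hand; the sketch only shows why evaluating the fit on held-out data guards against choosing too many factors.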
The HPPLS procedure implements the following techniques:
principal components regression, which extracts factors to explain as much predictor sample variation as possible
reduced rank regression, which extracts factors to explain as much response variation as possible. This technique, also known as (maximum) redundancy analysis, differs from multivariate linear regression only when there are multiple responses.
partial least squares regression, which balances the two objectives of explaining response variation and explaining predictor variation. Two different formulations for partial least squares are available: the original predictive method of Wold (1966) and the SIMPLS method of De Jong (1993), a straightforward implementation of a statistically inspired modification of partial least squares.
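To make the factor-extraction idea concrete, here is a minimal single-response SIMPLS sketch in NumPy, loosely following De Jong (1993). It is an assumed illustration, not the HPPLS implementation (which also handles multiple responses, classification effects, weighting, and scaling); all names are hypothetical. Each pass extracts one factor as a score that maximizes covariance with the response, then deflates the cross-product so later factors explain what remains.

```python
# Assumed single-response SIMPLS sketch; illustrative only.
import numpy as np

def simpls(X, y, n_factors):
    """Return (coefficients, intercept) of a SIMPLS fit with n_factors."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    S = Xc.T @ yc                      # cross-product, deflated each pass
    beta = np.zeros(X.shape[1])
    V = []                             # orthonormal basis of loadings
    for _ in range(n_factors):
        r = S.copy()                   # weight vector (single-response case)
        t = Xc @ r                     # score for this factor
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t
        p = Xc.T @ t                   # predictor loadings
        q = yc @ t                     # response loading
        beta += q * r                  # accumulate coefficients
        v = p.copy()                   # orthogonalize loadings, deflate S
        for v_prev in V:
            v -= v_prev * (v_prev @ p)
        v /= np.linalg.norm(v)
        S -= v * (v @ S)
        V.append(v)
    return beta, y_mean - x_mean @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.05 * rng.normal(size=50)
# Extracting all factors recovers the ordinary least squares fit.
beta, b0 = simpls(X, y, n_factors=4)
```

The balance the text describes shows up in the weight vector: `r` is proportional to the predictor-response covariance, so each score is driven by both predictor variation and response variation, unlike PCR (predictors only) or reduced rank regression (responses only).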
Because the HPPLS procedure is a high-performance analytical procedure, it also does the following:
enables you to run in distributed mode on a cluster of machines that distribute the data and the computations when you license SAS High-Performance Statistics
enables you to run in single-machine mode on the server where SAS is installed
exploits all the available cores and concurrent threads, regardless of execution mode
For more information, see the section Processing Modes.